Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

Background: Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for the detection of DNA copy number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting.
Results: To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.4%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.7%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome oligo array. However, because both platforms identified copy number variants of unclear clinical significance, we designed a systematic method for the interpretation of copy number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of those cases tested on the BAC array, 17.6% were found to have a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. In addition, we validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater.
Conclusions: Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.


Introduction
Molecular cytogenetic techniques such as array-based comparative genomic hybridization (aCGH) have revolutionized cytogenetic diagnostics and, in turn, the clinical management of patients with developmental delays and multiple congenital anomalies [1,2]. These rapid, high-resolution, and highly accurate techniques have identified numerous previously unrecognized chromosomal syndromes [3][4][5][6][7][8], refined critical regions for established genetic defects [9], and broadened our view of the "normal" diploid genome [10]. In addition, aCGH has given the clinician a greater appreciation of variability in the clinical presentation of many well-described conditions [11,12] and allowed for the discovery of new conditions with relatively mild phenotypes [13,14]. Furthermore, the application of aCGH has created a paradigm shift in genetics that has moved the description and discovery of genetic conditions from the "phenotype-first" approach, in which patients exhibiting similar clinical features are identified prior to the discovery of an underlying etiology, to a "genotype-first" approach, in which a collection of individuals with similar copy-number imbalances can be examined for common clinical features [15].
Originally, targeted microarrays constructed from bacterial artificial chromosomes (BAC) were developed for the clinical laboratory because of their ability to clearly identify copy number changes in discrete regions of the human genome known to play a role in genetic disease [16]. This "less is more" idea prevailed in the early years of clinical aCGH because the technology was new and proof of principle was required before it could be adopted for more widespread diagnostic use. Furthermore, the identification of copy number alterations of unclear clinical significance was considered undesirable to the diagnostician, the ordering physician, and the patient's family. Recently, the coverage of microarrays has expanded to include more comprehensive coverage of the human genome, leading many to suggest that whole-genome BAC or oligo arrays are the next step in the continued improvement in the detection rate of cytogenetic abnormalities.
It has been presumed that whole-genome oligonucleotide arrays, because they have higher resolution, would detect more copy number aberrations than whole-genome BAC arrays. However, to our knowledge, there has not been a systematic comparison of these two whole-genome copy number screening technologies in a clinical diagnostic environment. Therefore, to determine which platform is most effective in identifying clinically significant DNA copy number alterations, we designed a whole-genome BAC array and a whole-genome oligo array and compared the results in a blinded study of 466 clinical diagnostic specimens. In addition, we prospectively evaluated 3,443 patients by the whole-genome BAC array and 3,096 patients by the whole-genome oligo array and compared the detection rates of clinically significant abnormalities and those of unclear clinical significance. Finally, we validated our oligo array with 48 cases to determine the level of mosaicism that can be reliably detected and compared that level to our previously published cases analyzed using the BAC array.

Whole-genome BAC array design and aCGH
We constructed a whole-genome BAC array designed for clinical diagnostic use using >4,600 BAC clones. All clones were validated by FISH prior to inclusion on the array using previously described validation procedures [16]. Contigs of 3-6 overlapping clones were selected to cover 1,543 genetic loci, including >150 known microdeletion/microduplication syndromes and increased density of coverage in the 5-10 Mb surrounding the subtelomeric and pericentromeric regions of the genome. In addition, we placed contigs to cover >500 functionally significant genes such as transcription factors and other genes known to play important roles in development. This coverage also includes genome-wide representation with at least one contig in nearly every chromosomal band at the resolution of an 850-band karyotype. The mean gap size for the whole-genome BAC array is ~1.6 Mb. Microarray manufacturing and aCGH analysis using the whole-genome BAC array were performed as previously described [13]. BAC arrays were analyzed after a dye-swap, two-experiment analysis [16], using sex-mismatched controls. Results were then displayed using custom BAC aCGH analysis software (Genoglyphix™; Signature Genomic Laboratories, Spokane, WA).

Whole-genome Oligonucleotide Array Design and aCGH
Oligonucleotide-based microarray analysis was performed using a custom-designed, 105K-feature whole-genome microarray manufactured by Agilent Technologies (Santa Clara, CA) with one probe every 10 kb in regions of interest (microdeletion/microduplication syndromes, the pericentromeric regions, subtelomeres, and genes involved in important developmental pathways), for an average of 50 oligos per clinical locus. In addition, to achieve backbone coverage, we placed a probe, on average, every 35 kb throughout the rest of the genome between the regions of interest. Genomic DNA labeling was performed as described for BAC arrays, whereas array hybridization and washing were performed as specified by the manufacturer (Agilent Technologies). A dye swap was not performed for the oligo arrays, and sex-matched controls were used. Arrays were scanned and analyzed as previously described [17]. Results of aberration calls consisting of five or more consecutive oligos were then displayed using custom oligonucleotide aCGH analysis software (Genoglyphix™; Signature Genomic Laboratories). The use of five consecutive oligos achieved a resolution of 40 kb in the regions of interest and a resolution of 140 kb in the backbone.
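The stated resolutions follow from the probe spacing: five consecutive probes span four inter-probe intervals. The short Python sketch below reproduces that arithmetic. The function name and the assumption of perfectly uniform spacing are illustrative only; on the real array the spacing is an average.

```python
def min_call_span_kb(probe_spacing_kb, probes_per_call=5):
    """Smallest genomic span covered by a run of consecutive probes.

    Assumes evenly spaced probes (an idealization of the average
    spacing quoted in the text): n probes span n - 1 intervals.
    """
    return (probes_per_call - 1) * probe_spacing_kb


targeted_resolution_kb = min_call_span_kb(10)   # regions of interest, 10 kb spacing
backbone_resolution_kb = min_call_span_kb(35)   # backbone, 35 kb spacing

print(targeted_resolution_kb, backbone_resolution_kb)
```

Under these assumptions, the two values are the 40 kb and 140 kb figures given above.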

Decision Algorithm for Clinically Significant Copy Number Reporting
We developed a decision algorithm for classifying clinically significant copy number alterations, alterations of unclear clinical significance, and alterations of no currently known clinical significance. Alterations that were associated with established chromosomal syndromes, were large and affected a significant amount of gene content, or were part of a complex rearrangement such as an unbalanced translocation, insertion, or marker chromosome were characterized as clinically significant. Alterations with unclear clinical significance were most commonly those which were not currently associated with a syndrome but which affected gene content which may have contributed to the patient's phenotype and those which could not be precisely refined by the BAC array. Alterations were considered to have no known clinical significance if they were small, affected minimal gene content, and/or were present in regions where common copy-number variation was known to occur in the general population. Signature's own Genoglyphix Chromosome Aberration Database (GCAD) was used as a reference to assist in the interpretation of each alteration. GCAD is a database of >11,000 chromosomal abnormalities identified in >9,500 patients out of >40,000 patients evaluated by our laboratory and contains detailed statistics of each observed alteration (breakpoint coordinates, size, gene content, etc.) as well as clinical information pertaining to patient referral.
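The three-way classification described above can be sketched as a small rule cascade. The Python below is only an illustration of the ordering of the criteria: the field names, the numeric cutoffs `LARGE_KB` and `MANY_GENES`, and the precedence of the rules are assumptions made for this example, since the text gives no numeric thresholds, and in practice each call was reviewed against GCAD and other evidence rather than assigned automatically.

```python
from dataclasses import dataclass
from enum import Enum


class Call(Enum):
    SIGNIFICANT = "clinically significant"
    UNCLEAR = "unclear clinical significance"
    NO_KNOWN = "no known clinical significance"


@dataclass
class Alteration:
    size_kb: float
    gene_count: int
    known_syndrome: bool = False         # overlaps an established syndrome region
    complex_rearrangement: bool = False  # unbalanced translocation, insertion, marker
    common_cnv_region: bool = False      # known common population variant
    breakpoints_refined: bool = True     # False when array gaps leave the extent uncertain


# Hypothetical cutoffs chosen only for illustration; not taken from the source.
LARGE_KB = 1000.0
MANY_GENES = 10


def classify(alt):
    """Assign one of the three categories using the criteria in the text."""
    if alt.known_syndrome or alt.complex_rearrangement:
        return Call.SIGNIFICANT
    if alt.size_kb >= LARGE_KB and alt.gene_count >= MANY_GENES:
        return Call.SIGNIFICANT
    if not alt.breakpoints_refined:
        # Gene content cannot be established, so the call stays unclear.
        return Call.UNCLEAR
    if alt.common_cnv_region or alt.gene_count == 0:
        return Call.NO_KNOWN
    # Affects genes but has no established syndrome association.
    return Call.UNCLEAR


print(classify(Alteration(size_kb=160, gene_count=2)).value)
```

A rule cascade like this is useful mainly for triage; the borderline cases are exactly the ones that need parental studies and database review.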

Fluorescence in situ Hybridization (FISH)
When possible, all copy number alterations detected by microarray analysis were visualized by interphase and/or metaphase FISH using a BAC probe located within the region of gain or loss. FISH was performed as previously described [18].

Patient Clinical Testing
To validate the custom-designed 105K oligo array against the whole-genome BAC array, 466 cases were run side-by-side in a platform comparison study. In each case, the clinically validated BAC array results were used for interpretation and reporting. Specimens with known chromosome abnormalities, parental specimens, and prenatal cases were excluded from the analysis.
In addition, we conducted a prospective study of 3,443 consecutive BAC microarray analyses and 3,096 consecutive oligo microarray analyses in our clinical laboratory. The array platform used for testing in each case was chosen by the referring physician at the time of sample submission to our clinical diagnostic laboratory. Cases with previously known chromosomal abnormalities, parental samples, and prenatal specimens were again excluded from the data collection.

Mosaicism Assessment
The ability of the oligo platform to detect mosaicism was assessed on 48 patients previously known to carry mosaic abnormalities at levels as low as 5%. The alterations studied included a variety of interstitial, terminal, and whole-chromosome copy-number abnormalities, as well as marker chromosomes. The mosaic alteration in each patient was initially assessed by BAC array and the level of mosaicism determined by interphase FISH analysis when possible. In a separate experiment, mosaicism was assessed using a dilution of cells from a male with trisomy 21 with normal male control cells, as previously described [19]. After FISH verification of the dilutions, DNA was extracted from the diluted cells, labeled and hybridized to the custom-designed oligo array as described above.

Platform comparison study
From the 466 cases analyzed by the BAC array, using the previously described algorithm, we excluded 347 cases that only had aberrations located within regions that contained no genes and/or aberrations that had been established to be normal population variants by Signature Genomic Laboratories or identified in the Toronto Database of Genomic Variants (DGV, http://projects.tcag.ca/variation/). After these cases were excluded, 138 copy number alterations in 119 cases (25.5% of the original 466 cases) remained that required FISH analysis. These aberrations included subtelomeric and pericentromeric gains for which FISH was required to exclude an unbalanced translocation or a marker chromosome. After FISH was performed, 60 aberrations in 52 cases were classified as normal variants because marker chromosomes and derivative chromosomes were not identified and because these alterations were located within regions where common copy number variation is known to occur. Thus, alterations of potential clinical significance according to our algorithm were identified in 67 cases, a detection rate of 14.4%. Of these cases, 56 (12.0%) were considered to contain clinically significant copy number alterations (Table 1), and 11 (2.4%) were considered to contain copy number variants of unclear clinical significance for which parental analyses were recommended to further clarify the abnormality (Table 2). aCGH and FISH analysis performed on parental samples revealed that six alterations of unclear significance were inherited from a carrier parent and one was a de novo event in the proband. The origin of the other four unclear alterations could not be determined.
Using the oligo array, we identified 1,337 copy number variations among the same 466 cases. Using the algorithm previously described, we excluded 1,172 aberrations that were located within regions that had no gene content or those that were common copy number variants. After these exclusions were made, 165 aberrations in 138 cases (29.6%) remained that required FISH analysis. After FISH analysis was performed, aberrations of potential clinical significance were identified in 73 cases, a detection rate of 15.7%. Of these, the same 56 (12.0%) cases that were identified by the BAC platform were considered to contain clinically significant alterations (Table 1) and 17 (3.7%) were determined to contain copy number variants of unclear clinical significance. Table 3 shows the six cases for which aberrations of unclear clinical significance were identified by the oligo array but not by the BAC array. In all six cases, the aberrations either fell within the gaps in the BAC array coverage or were only partially covered by one or more BACs. The average size of the alterations that were not [...]
In two cases, the oligo microarray identified additional complexity that was not recognized by the BAC array. In patient 21566, the BAC array identified one interstitial deletion of 17p13.2p13.1, whereas oligo array analysis identified that deletion and an additional interstitial deletion in the same band (data not shown). In patient 21897, the BAC array identified a 6.8 Mb terminal deletion of 5p, whereas oligo array analysis identified that deletion and a 1.4 Mb duplication proximal to the deleted region (Figure 1).
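The case counts in the platform comparison can be cross-checked with a few lines of arithmetic. The Python sketch below uses only counts reported in this section; the variable and function names are illustrative.

```python
TOTAL_CASES = 466


def rate(cases, total=TOTAL_CASES):
    """Percentage of cases, rounded to one decimal place."""
    return round(100 * cases / total, 1)


# BAC array: cases left after excluding gene-free regions and known variants,
# then cases left after FISH reclassified some as normal variants.
bac_cases_needing_fish = TOTAL_CASES - 347
bac_potentially_significant = bac_cases_needing_fish - 52

# Oligo array: aberrations left after the same exclusions, and the
# number of cases with findings of potential clinical significance.
oligo_aberrations_needing_fish = 1337 - 1172
oligo_potentially_significant = 73

print(bac_cases_needing_fish, rate(bac_cases_needing_fish))
print(bac_potentially_significant, rate(bac_potentially_significant))
print(oligo_aberrations_needing_fish)
print(oligo_potentially_significant, rate(oligo_potentially_significant))
print(oligo_potentially_significant - bac_potentially_significant)
```

The final difference corresponds to the six cases with findings reported only by the oligo array.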

Prospective Diagnostic Comparison
Of the 3,443 diagnostic specimens analyzed using our whole-genome BAC array, 605 (17.6%) had copy number alterations. Using the previously described algorithm, 365 (10.6%) had abnormalities that were classified as clinically significant, whereas 240 (7.0%) had copy number variants of unclear clinical significance.
Of the 3,096 diagnostic specimens analyzed using our whole-genome oligo array during the same time period, 698 (22.5%) had copy number alterations. Using the previously described algorithm, 477 (15.4%) of these cases were determined to contain alterations considered to be clinically significant and 221 (7.1%) were determined to contain copy number variants of unclear clinical significance (Table 4).
The increased number of cases with clinically significant alterations detected by the oligo array was found to be statistically significant using a Fisher's Exact Test (OR = 1.5359, p < .0001). The increased number of cases with alterations of unclear significance detected by the oligo array was not statistically significant (OR = 1.0259, p = 0.8090).
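The reported odds ratios can be reproduced from the case counts above, and the direction of the p-values can be checked with a plain implementation of Fisher's exact test. The Python sketch below is an illustration, not the software used in the study: it assumes a two-sided test that sums the probabilities of all tables no more likely than the observed one, and it uses only the standard library.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denominator = log_choose(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed.
        return log_choose(row1, x) + log_choose(row2, col1 - x) - log_denominator

    observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        exp(lp)
        for lp in (log_prob(x) for x in range(lowest, highest + 1))
        if lp <= observed + 1e-7  # small tolerance for floating-point ties
    )
    return min(total, 1.0)


# Rows: oligo array, BAC array. Columns: cases with the finding, cases without.
significant = (477, 3096 - 477, 365, 3443 - 365)
unclear = (221, 3096 - 221, 240, 3443 - 240)

for label, table in (("clinically significant", significant), ("unclear", unclear)):
    print(label, round(odds_ratio(*table), 4), fisher_exact_two_sided(*table))
```

The odds ratios printed by this sketch round to the values quoted in the text; exact p-values may differ slightly between statistical packages depending on how the two-sided test is defined.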

Mosaicism Assessment
All but three of the 48 previously known mosaic alterations were detected by the oligo array. FISH analysis estimated that the proportion of uncultured cells carrying the alteration was 24% in the first case, while the proportion in cultured cells was 6%. In the second case, 5% of cells were found to carry the alteration by FISH (data not shown). The proportion of cells carrying the alteration in the third case could not be determined because FISH confirmation was not possible on the sample received by our laboratory. Certain alterations, such as tetrasomy 12p, were successfully detected in proportions of cells as low as 10% by the oligo array, although this low threshold of detection was facilitated by the tetrasomic nature of the rearrangement; the 4:2 ratio of patient to control DNA in this case was more readily detected than the 3:2 ratio typically associated with duplications. Figure 2 shows a 2.77 Mb interstitial deletion at 16q12.1 present in 23% of cultured metaphase cells that was detected by the oligo array.
In the dilution series of trisomy 21 cells, shifts in the aCGH data were distinguishable down to levels as low as 10%, but could only be readily detected at a level of 30% or greater (Figure 3). As the proportion of trisomy 21 cells was increased from 10% to 30%, the average log2 ratio of chromosome 21 increased from 0.08 to 0.21. During the prospective diagnostic comparison, 16 cases analyzed using the BAC array contained mosaic alterations, whereas only 12 mosaic cases were identified using the oligo array (Table 5). The increased number of mosaic abnormalities detected by the BAC array was determined to be not statistically significant (OR = 1.1999, p = 0.7066).
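The observed log2 ratios are close to what a simple mixing model predicts. The Python sketch below computes the expected log2 ratio for a mosaic sample as the log of the average copy number across cells divided by the diploid control. It assumes ideal hybridization with no ratio compression and a control with two copies of the region; the function name is illustrative.

```python
from math import log2


def expected_log2_ratio(abnormal_copies, mosaic_fraction, normal_copies=2):
    """Expected log2(test/control) for a mix of abnormal and normal cells."""
    mean_copies = (
        mosaic_fraction * abnormal_copies
        + (1 - mosaic_fraction) * normal_copies
    )
    return log2(mean_copies / normal_copies)


# Trisomy 21 dilution series: three copies in the abnormal cells.
print(round(expected_log2_ratio(3, 0.10), 2))  # about 0.07 (observed 0.08)
print(round(expected_log2_ratio(3, 0.30), 2))  # about 0.20 (observed 0.21)

# Tetrasomy 12p at 10%: four copies in the abnormal cells.
print(round(expected_log2_ratio(4, 0.10), 2))  # about 0.14
```

Under this model, a tetrasomy present in 10% of cells shifts the ratio roughly twice as far as a trisomy at the same level, which is consistent with the easier detection of tetrasomy 12p noted above.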

Discussion
BAC and oligo array platforms each have unique advantages and disadvantages in a diagnostic setting; these may include turnaround times, genomic coverage, and [...] (Figure 4) or fall within gaps in the BAC array coverage (Figure 5). Figure 4 shows a 44 kb deletion of 17p13.3 detected in a patient referred to our laboratory for convulsions. This deletion encompasses the first exon of PAFAH1B1 (LIS1). While it is not known whether this deletion results in a null allele or simply a truncated gene product, hemizygous deletions and mutations of this gene are found in patients with isolated lissencephaly type 1 (OMIM 607432) and have been linked to epileptic seizures and convulsions [20,21]. Although RP11-135N5 provides coverage of this region on the BAC array, FISH analysis using this clone could not confirm the deletion in any cells because of the deletion's small size compared to the FISH probe used. Thus, this clinically significant deletion could only have been reliably detected using the oligo platform. Although oligo-based aCGH has the power to detect alterations smaller than the size of a BAC probe, BAC-based aCGH has an advantage in that the analysis makes evident the appropriate probe to be used for FISH confirmation. In addition, this probe is usually readily available because of its inclusion on the microarray platform and will have a high rate of successful confirmation. When oligonucleotide-based aCGH is performed, BAC probes must be specifically selected for the FISH confirmation of each small abnormality that is detected. Once a probe has been selected, it must also be specially prepared or ordered before FISH can be performed. This process increases both the time it takes to perform FISH confirmation of oligo aCGH results and the cost associated with the analysis. Figure 5 shows a 2.9 Mb deletion of 6q14.1 detected in a patient referred to our laboratory for developmental delay and dysmorphic features.
This deletion encompasses eight genes: PHIP, HMGN3, LCA5, SH3BGRL2, ELOVL4, TTK, BCKDHB, two of which are known to be associated with human disease [22][23][24]. Although this 2.9 Mb deletion is likely to be clinically significant, it lies within a gap in the coverage of our BAC array and could only be detected using the oligo platform because of its more uniform backbone coverage. The detection rate of alterations of unclear clinical significance is also a concern during the selection of a microarray platform in a clinical diagnostic setting. Our data suggest that both the oligo and BAC platforms detect similar numbers of abnormalities of unclear significance (7.0% by BAC and 7.1% by oligo), although the circumstances leading to an unclear clinical interpretation may vary between the platforms. On the BAC platform, unclear results are often associated with gaps in coverage which prevent the precise determination of the breakpoints and gene content of an abnormal region. This lack of information prohibits definitive interpretation of the clinical significance of the alteration. Figure 6 presents a 262 kb deletion of 9q33.1 detected by BAC array in a patient referred for developmental delay, dysmorphic features, and multiple congenital anomalies. The boundaries of this alteration as defined by BAC array include only one gene, TLR4 [25]. However, gaps in BAC coverage on both sides of the alteration span 4.5 Mb proximally and 4.0 Mb distally. As a result of these coverage gaps, this alteration, though estimated to be just 262 kb, may be as large as 8.7 Mb and include up to 48 additional genes. The design of BAC arrays with dense clone coverage is possible; however, probe density is limited by the availability of BAC clones and the presence of potentially interfering genomic architecture such as segmental duplications. In addition, BAC-based microarrays will not reliably detect abnormalities smaller than the size of an individual probe (80-200 kb, on average, for BAC clones).
Although gaps in coverage and limited breakpoint-resolving power are primarily a concern for BAC platforms, both oligo and BAC platforms produce results that are unclear because a lack of published evidence prevents a conclusive association between the gene content of an alteration and the clinical features of the patient from being made. Figure 7 presents a 160 kb deletion of 4q25 detected by oligo array in a patient referred for developmental delay. Follow-up analysis performed on this patient's parents revealed that this alteration was de novo in origin. This alteration deletes two genes, PAPSS1 and SGMS2. While mutations or alterations of these genes have not been associated with disease in humans, it has been shown that PAPSS1 plays a key role in post-translational modification and SGMS2 mediates the production of sphingomyelin [26,27]. Thus, although the gene content and inheritance pattern of this deletion suggest a causative role in the patient's clinical features, a lack of published information linking the genes affected by this alteration with a distinct phenotype prevents a clear interpretation from being made based on only aCGH results. This type of unclear result, although more prominent with oligo platforms (4.2% by BAC vs. 7.1% by oligo), is an element of all aCGH analysis regardless of platform and accentuates the need for databases containing aCGH results in combination with phenotypic information. Although the number of characterized genetic disorders and genomic regions is rapidly increasing, the clinical consequences of alterations involving much of the genome still remain unclear.
The increase in the number of copy number alterations identified by higher-resolution whole-genome arrays underscores the need for a variety of tools to facilitate the interpretation of array results in a clinical diagnostic setting. We propose the use of an algorithm such as the one outlined here in conjunction with databases of normal population variants, clinically significant alterations, and those of unclear significance. Although such databases can provide invaluable context for the analysis of aCGH data, care must be taken by the diagnostician when comparing their data to pre-existing databases of copy-number variations. For example, data in the DGV are pooled from a variety of sources, platforms, and populations using a variety of different controls and without independent verification, and thus may not be appropriate for comparison in all situations. Furthermore, recent evidence suggests that most data in the DGV overestimate the size of the regions involved because they are dependent primarily on BAC array data, which has a tendency to overestimate the true size of small aberrations [28]. Thus, the most useful CNV databases may be those generated by individual laboratories using identical reference controls and array platforms. Based on our experience, we have constructed a database of abnormal copy number aberrations identified by BAC and oligo aCGH in our laboratory and a database of copy-number variations thought to have no significance. Such databases are essential for understanding the various copy number aberrations identified by microarray analysis.
Genotype-phenotype correlations in a diagnostic setting must address a variety of factors including gene content, potential position effects, aberration size, and inheritance patterns. These factors often present conflicting evidence about the potential clinical significance of a rare alteration. For instance, the size of an abnormality is commonly used as justification for its proposed clinical consequences; however, this association is not always straightforward. High-resolution microarray analysis routinely detects abnormalities smaller than 500 kb that disrupt clinically significant genes and have clear phenotypic impact (Figure 4); conversely, numerous examples of common copy-number variants have been observed that are relatively large but lie in regions with sparse gene content. In addition, although it is generally assumed that de novo abnormalities are causative and inherited abnormalities are not, this is not always the case. There are a number of regions of the genome where both inherited and de novo copy number alterations have been identified, some of which result in mild phenotypes that may be inherited from parents who have a milder or subclinical presentation. For example, deletions of distinct regions of 1q21 have been associated with both thrombocytopenia absent radius (TAR) syndrome and a variable phenotype including microcephaly/macrocephaly, developmental delay, cardiac abnormalities, and schizophrenia [29][30][31], but in many instances aberrations of these regions are inherited from phenotypically normal parents [32]. Another example is the 16p11.2 region associated with a range of cognitive, developmental, and speech delays, behavioral issues, and autism, deletions and duplications of which can be inherited or de novo [33][34][35][36]. In regions such as these, copy number changes may unmask recessive alleles or work in conjunction with various genetic modifiers, perhaps even other CNVs, to produce a clinical phenotype.
Potentially, non-paternity may also confound genotype-phenotype correlation for copy number alterations in these complex regions of the genome. These reasons underscore the need for thorough databases of normal population variants and clinically significant alterations complete with genotype-phenotype correlations. Such databases expedite the process of determining the potential significance of copy-number alterations in a diagnostic setting; aid in the elucidation of new microdeletion/duplication syndromes and new regions of benign copy-number variation; and help reduce the burden of expensive, time-consuming, and difficult follow-up necessitated by the increased number of alterations of unclear clinical significance detected by microarray analysis.
We [19] and others [37] have shown that mosaicism can be detected at low frequencies of chromosomally abnormal cells using BAC-based aCGH; however, the ability of oligo platforms to reliably detect mosaic abnormalities has not yet been well established. Our current assessments demonstrate that aCGH using either BAC or oligo platforms can easily detect mosaicism of 30% or greater for a variety of alterations and that levels as low as 10% can be detected with both platforms under optimal conditions. In addition, our retrospective analysis showed that there is no significant difference between the two types of platforms in the number of mosaic abnormalities detected in a clinical diagnostic setting (p = 0.7066). However, BAC-based arrays may still have a greater ability to detect mosaic abnormalities present at very low levels (less than 20%), perhaps due to the routine use of dye-swap experiments which can be cost-prohibitive with oligo arrays but promote the visual identification of mosaic abnormalities. The sensitivity of the BAC array is demonstrated by the detection in three cases of abnormalities in only 10% of cells during the retrospective study, whereas the lowest level of mosaicism detected by our oligo array was 21% (Table 5). The ability of an aCGH platform to detect mosaic abnormalities also depends largely on the effectiveness of the software used to analyze the data, as low-level mosaic alterations are difficult to identify using only visual inspection (Figure 3). For this reason, it is important to select analysis software which facilitates the identification of mosaic alterations.
Figure 6 (caption fragment). The nearest distal clone on chromosome 9 that is not deleted is RP11-977E8 and is approximately 4.0 Mb away from the deleted region. The nearest proximal clone on chromosome 9 that is not deleted is RP11-999I23 and is approximately 4.4 Mb away from the deleted region. Probes are ordered on the x axis according to physical mapping positions, with proximal 9q32 clones to the left and distal 9q33.2 clones to the right. Below is a schematic of the deletion region. Vertical blue lines represent the minimum size of this alteration, which encompasses one gene, TLR4.
These data suggest high-resolution oligo-based aCGH detects a higher proportion of clinically significant abnormalities than BAC-based aCGH. Our results also demonstrate the ability of microarray-based CGH to reliably produce high-yield results in a clinical setting using differing platforms, array designs, and analysis algorithms, supporting the validity of array CGH as a first-tier diagnostic screening tool [38]. Finally, the prevalence of copy number variants of unclear clinical significance detected on both platforms underscores the need for the development of readily accessible diagnostic tools in the form of databases of documented chromosome abnormalities to aid in the interpretation of microarray data.