Contribution of copy number variants (CNVs) to congenital, unexplained intellectual and developmental disabilities in Lebanese patients

Background: Chromosomal microarray analysis (CMA) is currently the most widely adopted clinical test for patients with unexplained intellectual disability (ID), developmental delay (DD), and congenital anomalies. It detects copy number variants (CNVs) as well as regions of homozygosity, whose distribution across chromosomes can indicate uniparental disomy or parental consanguinity and thus an increased probability of recessive disease.
Results: We screened 149 Lebanese probands with ID/DD and 99 healthy controls using the Affymetrix Cyto 2.7 M and SNP6.0 arrays. We report all identified CNVs, which we divided into groups. Pathogenic CNVs were identified in 12.1% of the patients. We review the genotype/phenotype correlation in a patient with a 1q44 microdeletion and refine the minimal critical regions responsible for the 10q26 and 16q monosomy syndromes. Several likely causative CNVs were also detected, including new homozygous microdeletions (9p23p24.1, 10q25.2, and 8p23.1) in 3 patients born to consanguineous parents, involving potential candidate genes. However, the clinical interpretation of several other CNVs remains uncertain, including a microdeletion affecting ATRNL1. This CNV of unknown significance was inherited from the patient's unaffected mother; therefore, additional ethnically matched controls must be screened to obtain enough evidence for classification of this CNV.
Conclusion: This study has provided supporting evidence that whole-genome analysis is a powerful method for uncovering chromosomal imbalances, regardless of consanguinity in the parents of patients and despite the challenge presented by analyzing some CNVs.
Electronic supplementary material: The online version of this article (doi:10.1186/s13039-015-0130-y) contains supplementary material, which is available to authorized users.


Background
Intellectual disability (ID) is defined as a significant limitation in both intellectual function and adaptive behavior that originates before the age of 18 [1]. Its prevalence in the general population is estimated to be between 1 and 3%, with higher disability rates in developing countries [2][3][4][5]. In Lebanon, the latest statistical study on ID showed a relatively high incidence (4.1%) [6].
The etiological factors of ID are heterogeneous. ID with a genetic origin is more frequently found in patients with an IQ < 50 (50%) than in other patients (15%), with chromosomal aberrations being the most common causes [7,8]. Screening for these imbalances is routinely performed with conventional cytogenetic and molecular tests. Nevertheless, the resolution of standard karyotyping is limited to 5 Mb; thus, imbalances are detected in fewer than 4% of patients with ID (when trisomy 21 is excluded). Other techniques such as fluorescence in situ hybridization (FISH) and multiplex ligation-dependent probe amplification (MLPA), which allow the detection of microimbalances smaller than 5 Mb in targeted regions, can each explain 3% of the patients' phenotypes [9,10].
With the introduction of chromosomal microarray analysis (CMA), which detects submicroscopic rearrangements referred to as copy number variants (CNVs) as well as regions of homozygosity (ROH) that point to uniparental disomy or consanguinity, 10-20% of patients with unexplained ID can now be provided with a molecular diagnosis [10,11].
Herein, we report the results of a large CMA project on a cohort of 149 patients with unexplained ID and a cohort of 99 controls. Our findings underscore the implication of clinically relevant CNVs in ID/DD and emphasize the ability of the technique to detect ROH that are highly suggestive of an increased likelihood of rare recessive diseases.

Results
In this cohort of 149 Lebanese patients with ID or DD, with or without CA, we found an average of 6 chromosomal imbalances per proband, of which 67.8% had previously been reported as benign. The remaining CNVs, either previously reported or newly found, comprised microdeletions (10.9% of all imbalances) and microduplications (21.3%).
Twenty causal alterations (group I, Table 1) were identified in 18 patients. Eight of these pathogenic CNVs were terminal, whereas the other 12 were interstitial. One aberration, a microdeletion of 2,262 kb in patient P15, had originally been interpreted as a balanced translocation by standard karyotyping. Eight (8/149 or 5.4%) were larger than 5 Mb and could therefore have been detected by conventional karyotyping. All CNVs belonging to group I were of de novo origin, except for one that was inherited from the father of patient P10 [12].
Six CNVs were classified in group IIa (Table 2A). In 5 of these, the de novo or inherited status could be confirmed. Three of those CNVs were found in patients with consanguineous parents; they were homozygous and segregated in the families. The other likely causative CNVs were de novo and involved morbid genes such as GRHL2 and RICTOR.
One CNV, belonging to group IIb and involving a pathogenic gene, was inherited from a normal parent and was therefore considered to be of unclear significance (Table 2B).
Twenty other CNVs were classified as familial variants (group IIc, Additional file 2: Table S1) that were most likely benign. Eleven were microduplications that ranged between 47 and 1,612 kb. One inherited microdeletion from a healthy mother encompassed intron 4 of the AUTS2 gene.
Thirty-nine CNVs, above the limited threshold set for the detection of true positive CNVs but with no parental DNA available for further investigation, were also classified as variants of uncertain clinical significance (VOUS) (group IId, Additional file 3: Table S2), of which 84.6% (33/39) were microduplications.
The arrays were also analyzed for significant regions of absence of heterozygosity to look for uniparental disomy, mosaicism, and autosomal recessive pathogenicity in patients with consanguineous parents. Our results did not suggest uniparental disomy or mosaicism but confirmed the high rate of consanguineous marriage in Lebanon: 42 of the 149 patients (28.2%) were confirmed as being born to closely related parents and had more than 66 Mb of ROH, which corresponds to an estimate of at least F = 1/32. Three of these patients had an ROH size equivalent to that of the theoretical coefficient of inbreeding F = 1/4, 10 patients had an estimated F = 1/8, 22 probands had F = 1/16, and 7 patients had F = 1/32 (Additional file 4: Table S3).

Discussion
Herein, we present our results from the first Lebanese CMA study investigating the involvement of CNVs in patients with ID/DD. We validated the Cyto 2.7 M array platform by testing 99 identified CNVs with quantitative PCR (Q-PCR). This resulted in the determination of a clinical threshold of 62 kb with at least 49 consecutive markers. Most abnormalities that did not meet these requirements could not be confirmed, and their presence depends on the "cleanness" of the signal detected from the array. This confirms results from previous studies that stressed the importance of taking array Quality Control parameters into consideration [13]. However, the genomic content should be checked in all CNVs, as it may point to candidate genes that would be missed when applying any threshold. Improved reliability for small CNVs is therefore required in newer arrays; this was addressed in the CytoScan HD array by enriching the number and type of markers in a region (oligonucleotide and SNP probes) [14].
Our results indicate an overall diagnostic yield of 12.1%, a value in agreement with previous studies that obtained a detection rate between 10% and 20% [10,15]. Other array types from the same company, such as the Affymetrix 500 K SNP array, are used with a similar threshold of 100 kb and obtain a similar diagnostic yield [16]. We confirmed, once more, the pathogenic effect of CNVs belonging to group I. In addition, we found new genotype/phenotype correlations in three group I patients, with 1q44, 10q26.11-q26.13, or 16q22.3 interstitial microdeletions. We reviewed reports of patients with aberrations that overlap these CNVs and propose new findings for these 3 aberrations.

1q44 microdeletion
Patient P2, a 48-month-old female with ID, DD, and facial dysmorphisms (Table 1), was found to have two de novo microdeletions involving the 1q44ter region and exon 5 of JARID2. The deletion of 1q44 is a recognizable clinical disorder characterized by short stature, DD, ID, microcephaly, facial dysmorphism, variable types of seizure, and partial to complete agenesis of the corpus callosum (ACC) [17][18][19]. Moreover, deletions of JARID2 are associated with cognitive impairment and facial features such as prominent supraorbital ridges, deep-set eyes, dark infraorbital circles, and midface hypoplasia [20,21]. On careful examination, patient P2 had the facial features noted in patients with either deletion, the JARID2 deletion and the 1q44 syndrome. Although the 1q44 syndrome forms a recognizable phenotype, disentangling the genetic causes of seizure, ACC, and microcephaly has been challenging. To explain these features, several critical regions in the 1q44 deletions were defined; the AKT3 gene has been proposed to be responsible for microcephaly, and the ZNF238 gene (also named ZBTB18) to have a potential role in ACC [22,23]. In patient P2, neither of these genes was deleted; however, she had microcephaly. Ballif et al. proposed a critical region containing three genes: COX20, with no known relevant function, HNRNPU, and HNRNPU-AS1. The last two, when mutated, are thought to be responsible for seizures, with the first seizure occurring no later than 4 years of age [22,24]. The deletion of HNRNPU, which is involved in embryonic brain development, is most likely pathogenic because of its haploinsufficiency. Its antisense transcript, HNRNPU-AS1, has been found to affect the expression of HNRNPU [25]. Our patient's deletion (Additional file 1: Figure S1) overlapped the critical region defined by Ballif et al., encompassing the seizure-causing genes proposed in the literature, although the patient had microcephaly but no seizures.
Seizures may not have yet started in the proband studied here, although seizure onset occurs early in the described patients, making future seizures very unlikely. The deletion may also have incomplete penetrance or variable expressivity. Finally, a combination of the 1q44 deletion and the 20 kb deletion of JARID2's exon 5 might explain the patient's phenotype and an epistatic relationship may exist between the copy number imbalances, exacerbating the proband's intellectual impairment and leading to the modification of the patient's clinical features.

10q26 microdeletion
Genital anomalies have also been associated with the deletion of the 10q25.3-q26.1 segment [26]. We suggest that the common region 10q26.12 is responsible for these defects and reduce the previously defined critical region to a region including only WDR11 and PPAPDC1A (article accepted in AJMG).

16q22.3 microdeletion
The 16q22.3 microdeletion was found in patient P15, who was known to have an apparently balanced translocation between chromosomes 1 and 16 on standard karyotype. Published reports of chromosome 16 microdeletions are very rare, and few cases involve the 16q22.3 region deleted in patient P15 [33][34][35][36][37][38][39][40]. These patients show similar characteristics involving cleft palate, ID, and psychomotor retardation. The deletion described here reduces the smallest region of overlap to 2,262 kb and suggests that the absence of a kidney and the presence of clubbed feet in the patient described by Natt and colleagues [34], as well as congenital heart defects, are not caused by the absence of genes present in the overlapping region.
One microimbalance belonging to group IIb was inherited from the healthy mother of patient P24. It is a 64 kb heterozygous deletion affecting exons 9 to 13 of ATRNL1. A 325 kb deletion adjacent to this gene, described by Stark et al., implicates this gene in cognitive impairment, autism, and several dysmorphic features. ATRNL1 is involved in the regulation of energy homeostasis by binding to melanocortins [41,42]. The two patients have some common characteristics (Table 3), especially autistic traits, skeletal abnormalities, and ID. However, the presence of the deletion in the healthy mother of patient P24 makes it difficult to assess the clinical significance of this CNV. Therefore, sequencing of ATRNL1 was performed, but no point mutation was found. This CNV can thus be considered either pathogenic with incomplete penetrance or simply benign. A study with a larger number of controls is therefore required, because only intronic deletions were found in our control database (10 CNVs, all in intron 26).
Five CNVs were found in 42 probands with consanguineous parents. This confirms previous studies that showed the importance of microarrays in the identification of causes of ID/DD in probands with consanguineous parents [43]. These CNVs have a pick-up rate of 3.4% (5/149). Three of them were new homozygous deletions: a deletion of the PTPRD gene in patient P20 and the two other CNVs characterizing two new phenotypes in patients P22 and P23. Although the clinical significance of the three CNVs is still unclear, they were considered potentially causative because they followed a pattern of autosomal recessive inheritance. A search for other patients with similar phenotypes is necessary to accurately classify these CNVs.
Additionally, we looked for the presence of parental consanguinity suggestive of autosomal recessive disorders. We therefore compared the estimated coefficient of inbreeding to the coefficient deduced from the pedigree of each family and, interestingly, noticed significant deviations from theoretical values. In 17/42 patients (40.5%), a higher degree of relationship than shown by their pedigree was observed. This variation is due to multiple loops of consanguinity or multiple generations of inbreeding observed within the Lebanese community [44], which complicates the estimation of the degree of relationship. These cases were marked for further investigation with the aim of sequencing candidate genes within the ROH regions.
Finally, we established an internal database for polymorphic CNVs to help further studies discriminate between rare polymorphisms and disease-associated variants (data not shown).

Conclusions
In conclusion, this is the first Lebanese study on ID/DD patients. It has provided supporting evidence that whole-genome analysis is a powerful method for uncovering chromosomal imbalances and genomic rearrangements, regardless of consanguinity in the parents of patients and despite the challenge presented by analyzing some CNVs.

Ethical statement
This study was carried out with protocols approved by the Institutional Review Board (IRB) on human experimentation at Saint Joseph University.
Patients and controls from all regions of Lebanon were recruited through the Medical Genetics Unit of Saint Joseph University over a period of three years. Approval for the study and informed written consent were obtained from legally authorized patient representatives and the 99 healthy subjects.

Cohort
A total of 149 Lebanese children (88 boys and 61 girls) with moderate to severe ID associated with developmental delay and/or congenital abnormalities (CA) of unknown origin were analyzed using the Affymetrix Cyto 2.7 M platform. Known syndromes were excluded using karyotyping, subtelomere FISH, MLPA, and/or fragile X testing. Copy number analysis was performed for 99 healthy Lebanese individuals using two types of arrays, SNP 6.0 and Cyto 2.7 M.

Chromosomal microarray analysis-based technologies

Cyto 2.7 M
Genomic DNA, isolated from peripheral blood samples using the salting-out technique, was amplified, purified, fragmented, denatured, and then hybridized to the Cyto 2.7 M array, following the Affymetrix® standard protocol.
A single array has a high density of 2,361,876 nonpolymorphic markers and 400,103 SNP markers, with whole-genome backbone coverage at ~1 kb spacing. The analysis of scanned chips was performed using the Affymetrix Chromosome Analysis Suite software (ChAS v.1.0.1). The software only analyzes arrays whose Median Absolute Pairwise Difference (MAPD) score, waviness segment count, and SNPQC meet the Quality Control (QC) criteria set by the manufacturer: MAPD < 0.27; SNPQC > 1.1; Wav Seg Count ≤ 30 [45]. The annotation file used in our analysis can be found on the Affymetrix website, listed as ArrayNA30.1 (hg18).
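The three QC criteria quoted above can be expressed as a simple pass/fail check. The following is a minimal sketch, not part of ChAS: the `ArrayQC` container and the `passes_qc` function are hypothetical names introduced here for illustration, and only the three cutoff values come from the text.

```python
from dataclasses import dataclass

# Cutoffs as quoted in the text (manufacturer QC criteria).
MAPD_MAX = 0.27          # MAPD must be strictly below this value
SNPQC_MIN = 1.1          # SNPQC must be strictly above this value
WAV_SEG_COUNT_MAX = 30   # waviness segment count must not exceed this value


@dataclass
class ArrayQC:
    """QC metrics for one array (hypothetical container, not a ChAS type)."""
    sample_id: str
    mapd: float
    snpqc: float
    wav_seg_count: int


def passes_qc(qc: ArrayQC) -> bool:
    """Return True only if all three QC criteria are met."""
    return (
        qc.mapd < MAPD_MAX
        and qc.snpqc > SNPQC_MIN
        and qc.wav_seg_count <= WAV_SEG_COUNT_MAX
    )


# Illustrative values only; these are not data from the study.
arrays = [
    ArrayQC("sample_A", mapd=0.20, snpqc=1.5, wav_seg_count=12),
    ArrayQC("sample_B", mapd=0.31, snpqc=1.4, wav_seg_count=8),
]
retained = [a.sample_id for a in arrays if passes_qc(a)]
```

Under these assumptions, only arrays in `retained` would proceed to CNV calling.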

SNP 6.0
This array consists of 906,600 SNP probes and 900,000 non-polymorphic oligonucleotides used for detecting CNVs with an average spacing of 0.7 kb. The preparation and hybridization of DNA samples were performed following the Affymetrix® 6.0 standard protocol.

Assessment of array parameters
Assessing the existence and causality of the small CNVs identified by this high-resolution platform was very challenging and required the consideration of array parameters. We selected ninety-nine random CNVs, with no threshold, for validation by quantitative PCR (Q-PCR). Our results (data not shown but available upon request) prompted us to select a threshold of 62 kb with at least 49 consecutive markers, which we further utilized to filter the CNV analysis results.
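As a rough illustration of how such a threshold can be applied, the sketch below keeps only calls that reach both cutoffs. The function name and the tuple layout are hypothetical, and treating both cutoffs as inclusive (size ≥ 62 kb, markers ≥ 49) is an assumption; the text does not state whether a call of exactly 62 kb is retained.

```python
MIN_SIZE_KB = 62   # size threshold from the text
MIN_MARKERS = 49   # minimum number of consecutive markers from the text


def meets_threshold(size_kb: float, marker_count: int) -> bool:
    """Assumed inclusive cutoffs: keep calls of at least 62 kb and 49 markers."""
    return size_kb >= MIN_SIZE_KB and marker_count >= MIN_MARKERS


# Hypothetical calls as (label, size in kb, consecutive markers).
calls = [
    ("call_1", 64.0, 55),
    ("call_2", 40.0, 80),    # too small
    ("call_3", 150.0, 30),   # too few markers
]
kept = [label for label, size_kb, markers in calls if meets_threshold(size_kb, markers)]
```

As noted in the Discussion, a filter of this kind can hide small CNVs that affect candidate genes, so the gene content of discarded calls may still deserve a manual check.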

Detection of uniparental disomy and consanguinity
A large ROH (≥10 Mb) observed on a single chromosome can be suggestive of uniparental disomy; however, when distributed throughout the genome, large ROHs are indicative of a consanguineous relationship between the patient's parents. Small stretches of homozygosity < 3 Mb were ignored because they are common even in outbred populations [46].
To detect possible parental consanguinity, we compared the patient's ROH size, calculated across several chromosomes (sum of ROH ≥ 3 Mb), with the theoretical ROH size, estimated by multiplying the 2,867,732,772-base total size of the autosomal haploid genome (NCBI Build 36.1 assembly (2006)) by the theoretical value of the coefficient of inbreeding [47]. We also defined a variable range of expected ROH size by using the mid-line between theoretical average sizes, as used by Fan et al. (Additional file 5: Table S4) [43].
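The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, only the classes F = 1/4 to 1/32 reported in the Results are considered, and the lower bound of each class is taken as the midpoint between its theoretical ROH size and that of the next lower class (F/2). This gives a lowest cutoff of about 67 Mb, close to but not identical to the 66 Mb quoted in the Results; the exact ranges are those of Additional file 5: Table S4, which is not reproduced here.

```python
from fractions import Fraction

AUTOSOMAL_GENOME_BP = 2_867_732_772   # autosomal haploid genome size from the text
MIN_ROH_BP = 3_000_000                # ROH segments below 3 Mb are ignored

# Inbreeding classes reported in the Results, in descending order.
F_CLASSES = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)]


def expected_roh_bp(f: Fraction) -> float:
    """Theoretical total ROH size for a given coefficient of inbreeding."""
    return float(f) * AUTOSOMAL_GENOME_BP


def total_roh_bp(segment_sizes_bp):
    """Sum autosomal ROH segments of at least 3 Mb."""
    return sum(size for size in segment_sizes_bp if size >= MIN_ROH_BP)


def estimate_f(total_bp: float):
    """Return the class whose midline-bounded range contains total_bp.

    Assumption: the lower bound of each class is the midpoint between its
    expected ROH size and that of the next lower class (F/2). Returns None
    when total_bp falls below the lowest bound.
    """
    for f in F_CLASSES:
        lower = (expected_roh_bp(f) + expected_roh_bp(f / 2)) / 2
        if total_bp >= lower:
            return f
    return None


# Hypothetical ROH segments (bp) for one proband; the 2 Mb segment is ignored.
segments = [2_000_000, 40_000_000, 35_000_000, 25_000_000]
f_estimate = estimate_f(total_roh_bp(segments))
```

Comparing `f_estimate` with the value deduced from the pedigree then flags families in which the observed homozygosity exceeds what the declared relationship predicts.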

Quantitative PCR (Q-PCR)
The array findings were confirmed, and their de novo or inherited status was determined, by Q-PCR on an ABI Prism 7500 system (Applied Biosystems, Foster City, CA, USA) using fluorescent SYBR Green dye (ABI). Specific primers targeting genes or intronic sequences were designed using Primer Express 3 Software (ABI).
PCR was performed in a 20 μl reaction volume containing 10 μl Power SYBR-Green PCR Master Mix (ABI), 10 pmol forward and reverse primers, and 10 ng of genomic DNA. The reaction cycling conditions were 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. Each sample was run in triplicate to quantify the copy number of the target sequence relative to two endogenous reference genes.
Data evaluation was carried out with the ABI Prism Sequence Detection System (SDS) using the comparative ΔΔ threshold cycle number (Ct) method. To exclude the presence of non-specific products, a melting-curve analysis of the products was performed after completion of the amplification.
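For readers unfamiliar with the comparative Ct method, the calculation can be sketched as below. Several points are assumptions made for illustration and are not stated in the text: triplicate Ct values are averaged arithmetically, the two reference genes are combined by averaging their mean Ct values, amplification efficiency is taken as 100% (a factor of 2 per cycle), and the calibrator is a control sample carrying two copies of the target. Function names are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts_by_gene):
    """ΔCt = mean Ct of the target minus the mean Ct of the reference genes."""
    target = mean(target_cts)
    reference = mean([mean(cts) for cts in reference_cts_by_gene])
    return target - reference


def relative_copy_number(sample_dct, calibrator_dct, calibrator_copies=2):
    """Copy number estimate = calibrator copies × 2^(-ΔΔCt), assuming 100% efficiency."""
    ddct = sample_dct - calibrator_dct
    return calibrator_copies * 2 ** (-ddct)


# Illustrative Ct values only (not data from the study).
patient_dct = delta_ct([26.0, 26.0, 26.0], [[25.0, 25.0, 25.0], [25.0, 25.0, 25.0]])
control_dct = delta_ct([25.0, 25.0, 25.0], [[25.0, 25.0, 25.0], [25.0, 25.0, 25.0]])
copies = relative_copy_number(patient_dct, control_dct)
```

In this illustrative case the target amplifies one cycle later in the patient than in the control, which corresponds to an estimate of one copy, the value expected for a heterozygous deletion; running the same calculation on parental DNA distinguishes de novo from inherited CNVs.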

Workflow for selecting CNVs
To assess the clinical significance of the detected CNVs, we followed the recommended steps from Miller et al. and Buysse et al. [10,47,48].
All imbalances found at least twice in the Database of Genomic Variants (DGV) and our internal database of healthy individuals were considered to be benign and excluded from further analysis. CNVs under the selected threshold, as well as those that did not involve genes or miRNAs, were also excluded. The remaining CNVs were classified into groups.
Group I contains pathogenic CNVs overlapping critical regions of known microdeletions or microduplications and/or involving genes already described as causing a phenotype, especially ID. These CNVs are found in the publicly available DECIPHER (http://decipher.sanger.ac.uk) and ISCA (www.clinicalgenome.org) databases and in published literature such as the Catalogue of Unbalanced Chromosome Aberrations in Man [9].
Group II contains genomic imbalances classified as variants of uncertain clinical significance (VOUS) because their pathogenicity is unclear. Parental studies were mandatory for the classification of these CNVs. VOUS were grouped into four categories: Group IIa contains rare, likely pathogenic CNVs that mostly occur de novo and include genes with a possible correlation to the phenotype (abnormal with a low recurrence risk); Group IIb corresponds to CNVs for which clinical interpretation remains uncertain, even after parental studies, owing to variable expressivity or incomplete penetrance; Group IIc includes all inherited CNVs, also called familial variants, that were considered to be benign; and Group IId contains VOUS that could not be further tested owing to the absence of parental DNA.
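The workflow described in this section can be summarized as a decision function. The sketch below is a simplification under stated assumptions and should not be read as the authors' exact procedure: the `CNV` fields and the `classify` function are hypothetical, hits in DGV and in the internal control database are pooled into one count, de novo CNVs without a phenotype-related gene are placed in group IIb, and homozygous CNVs segregating in a consanguineous family are placed in group IIa as in the Results.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SIZE_KB = 62        # size threshold from the text
MIN_MARKERS = 49        # marker threshold from the text
MIN_CONTROL_HITS = 2    # "found at least twice" in control databases


@dataclass
class CNV:
    """Hypothetical summary of one CNV call and its annotations."""
    size_kb: float
    marker_count: int
    control_hits: int                 # pooled DGV + internal database hits (assumption)
    involves_gene_or_mirna: bool
    overlaps_known_pathogenic: bool   # known syndrome region or disease gene
    inherited: Optional[bool]         # None when parental DNA is unavailable
    phenotype_related_gene: bool
    homozygous_segregating: bool = False


def classify(cnv: CNV) -> str:
    # Step 1: exclusions applied before grouping.
    if cnv.control_hits >= MIN_CONTROL_HITS:
        return "excluded: benign"
    if cnv.size_kb < MIN_SIZE_KB or cnv.marker_count < MIN_MARKERS:
        return "excluded: below threshold"
    if not cnv.involves_gene_or_mirna:
        return "excluded: no gene content"
    # Step 2: known pathogenic regions or genes.
    if cnv.overlaps_known_pathogenic:
        return "I"
    # Step 3: VOUS subgroups, which depend on parental studies.
    if cnv.inherited is None:
        return "IId"
    if cnv.homozygous_segregating and cnv.phenotype_related_gene:
        return "IIa"
    if not cnv.inherited:
        return "IIa" if cnv.phenotype_related_gene else "IIb"
    return "IIb" if cnv.phenotype_related_gene else "IIc"


# Example loosely modeled on the ATRNL1 deletion: 64 kb, maternally inherited,
# involving a gene linked to the phenotype. The marker count is hypothetical.
example = CNV(64, 60, 0, True, False, inherited=True, phenotype_related_gene=True)
group = classify(example)
```

In practice each step involves manual review of databases and literature, so a function like this can only record the outcome of those judgments rather than replace them.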

DNA sequencing
The coding sequences of ATRNL1 (NM_207303) were sequenced after DNA amplification by PCR. Primers were designed using Primer 3 (http://frodo.wi.mit.edu) and OLIGOS v.9.3, and checked for specificity using BLAST (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi). PCR reactions were performed using Taq DNA polymerase (Invitrogen Life Technologies, Carlsbad, Calif., USA). PCR products from genomic DNA were purified using the illustra™ GFX PCR DNA and Gel Band Purification Kit (GE Healthcare, Buckinghamshire, UK), and sequenced using the BigDye® Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, Calif., USA) under standard conditions. The labeled products were subjected to electrophoresis on an Applied Biosystems Genetic Analyzer sequencing system.