About classical molecular genetics, cytogenetic and molecular cytogenetic data not considered by Genome Reference Consortium and thus not included in genome browsers like UCSC, Ensembl or NCBI

Liehr, Thomas

doi:10.1186/s13039-021-00540-7

Commentary
Open access
Published: 20 March 2021

Thomas Liehr ORCID: orcid.org/0000-0003-1672-3054¹

Molecular Cytogenetics volume 14, Article number: 20 (2021)

Abstract

Background

The Genome Reference Consortium (GRC) has, according to its own statement, the “mission to improve the human reference genome assembly, correcting errors and adding sequence to ensure it provides the best representation of the human genome to meet basic and clinical research needs”. Data from the GRC are included in genome browsers such as UCSC (University of California, Santa Cruz), Ensembl and NCBI (National Center for Biotechnology Information) and thereby form the basis for the work of the scientific and diagnostic human genetics community.

Method

Here, long-standing knowledge derived from classical molecular genetic, cytogenetic and molecular cytogenetic data that has not yet been considered by the GRC was revisited.

Results

Three major points were identified: (1) The GRC failed to include the three chromosomal subbands defined for each of 1q32.1, 2p21, 5q13.2, 6p22.3 and 6q21 by the International System for Human Cytogenetic Nomenclature (ISCN) back in the 1980s; instead, the GRC included 6 additional subbands never recognized by ISCN. (2) The GRC defined 34 chromosomal subbands of 0.1 to 0.9 Mb in size, although cytogeneticists generally agree that chromosomal aberrations below 1–2 Mb in size are unlikely to be detected by GTG-banding. (3) None of the sequences used in routine molecular cytogenetic diagnostics to detect heterochromatic and/or pericentromeric satellite DNA within the human genome are yet included in the human reference genome. The localization and approximate sizes of those sequences were determined from the 1970s to 1990, and their inclusion could add at least ~100 Mb of human genome sequence to the genome browsers.

Conclusion

Overall, taking the points mentioned here into account, and correcting and including the corresponding data, will definitely contribute to the still unfinished mapping of the human genome.

Background

The goal of understanding ourselves as human beings, and what makes us so different from all other species on the planet, has over the centuries led to multiple lines of inquiry, including philosophy, theology, history, medicine and the natural sciences. The latter gave birth to genetics and, ~3 decades ago, initiated the effort to sequence the entire human genome, in the hope of finally reaching an in-depth breakthrough on this question.

The enthusiasm that accompanied, and still accompanies, the ‘Human Genome Project’ (HGP) can be seen in the statement on its website: “The HGP was one of the great feats of exploration in history. Rather than an outward exploration of the planet or the cosmos, the HGP was an inward voyage of discovery led by an international team of researchers looking to sequence and map all of the genes—together known as the genome—of members of our species, Homo sapiens. Beginning on October 1, 1990 and completed in April 2003, the HGP gave us the ability, for the first time, to read nature's complete genetic blueprint for building a human being” [1].

Looking at the history of human genetics, it was Gregor Mendel who suggested in 1856 that there must be “coupling groups” in cells, i.e. what was seen by Walther Flemming in 1879 and named ‘chromosomes’ by Wilhelm Waldeyer in 1888 [2]. Even though banding cytogenetics, introduced by Lore Zech in the late 1960s, provided major progress in human genetics [3], developments in molecular genetic techniques soon took over the mainstream of the field [4]. However, neither a purely base pair-oriented view (of molecular genetics) nor an isolated chromosome-oriented view (of (molecular) cytogenetics) alone will ever be sufficient to understand our genome. Accordingly, the Genome Reference Consortium (GRC), which is responsible for collecting and publishing the human genome reference sequence, aligns actual sequencing data with the chromosomal level. Recent insights from ‘second-’ and ‘third-generation sequencing’ approaches [5], together with the discovery of inter- and intrachromosomal interactions and, more specifically, of TADs (topologically associating domains) [6], have further highlighted that chromosome structure is extremely important for genome function and, if impaired, for disease [5, 6]. Most recently, insights from electron microscopy combined with cytogenetic, molecular cytogenetic and molecular genetic data led to a completely new understanding of chromosome structure itself [7]. As stated by Joan-Ramon Daban: “Experimental evidence indicates that the chromatin filament is self-organized into a multilayer planar structure that is densely stacked in metaphase and unstacked in interphase. This chromatin organization is unexpected, but it is shown that diverse supramolecular assemblies are multilayered. The mechanical strength of planar chromatin protects the genome integrity, even when double-strand breaks are produced.” He suggests “that the chromatin filament in the loops and topologically associating domains is folded within the thin layers of the multilaminar chromosomes. It is also proposed that multilayer chromatin has two states: inactive when layers are stacked and active when layers are unstacked” [7].

This paper discusses the following question: to what extent could the GRC profit in its mission to “improve the human reference genome assembly, correcting errors and adding sequence to ensure it provides the best representation of the human genome to meet basic and clinical research needs” [8] by also taking such an integrative view of the available data on the human genome as Joan-Ramon Daban did? This question is of immense practical importance, as GRC data form the basis of genome browsers such as UCSC (University of California, Santa Cruz) [9], Ensembl [10] and NCBI (National Center for Biotechnology Information) [11], which serve as the backbone for the correct interpretation of (molecular) cytogenetic and molecular genetic diagnostic results. In the following, insights from banding cytogenetics, molecular genetics and molecular cytogenetics are discussed accordingly because, as recently stated by Ye and colleagues, “karyotype coding” defines the genome system information [12].

Insights from banding cytogenetics

In the 1970s, banded human chromosomes confronted scientists with problems and questions similar to those raised by sequencing data today:

How to describe and how to denominate what we see?
How to define a worldwide valid nomenclature for what we see?
How to give a definite and reliable nomenclature for bands or DNA-sequence positions?

A binding nomenclature for the banding pattern of human chromosomes was established by 1978 at the latest [13] and refined later on, as methods progressed and banding resolution increased [14]. Over many editions of the International System for Human Cytogenomic Nomenclature (ISCN), the banding nomenclature has not changed; it refers to data from 1981 and 1994 [15]. This designation of bands was only intended to describe what is visible under a light microscope: shorter, more condensed chromosomes show fewer GTG-dark and GTG-light bands than longer, more decondensed ones (Fig. 1). However, this nomenclature by no means reflects biological reality, i.e. it does not describe which subbands at a higher chromosomal resolution derive from which more condensed bands at lower resolution. This has been shown by the seminal work of Uwe Claussen, who demonstrated that GTG-light bands represent maximally decondensed chromosomal parts and never split into further subbands. He showed this for chromosome 6 [16] and Xp [17] using chromosome stretching, and for all other chromosomes using a different approach based on fluorescence in situ hybridization (FISH) [18] (Fig. 1).

Whether it will be important for the GRC to consider that, for example, the maximally stretched short arm of the X chromosome has a total of 14 GTG-light and 13 GTG-dark bands remains to be seen. At least one can deduce from that study [17] that, if Xp represents ~2% of the human genome, there might be roughly ~700 GTG-light and ~650 GTG-dark bands in the whole human genome, which await alignment with sequencing data. In the light of the description of TADs [6] and the work of Joan-Ramon Daban [7], this will have to be considered sooner or later. Another important clue from this work [16,17,18] is that GTG-light bands, being less condensed than GTG-dark ones, should contain less DNA than the more condensed dark ones. The GTG-dark bands, which are aligned with sequencing results in genome browsers at the ~850 band level, can still be decondensed and should thus contain more DNA, and more GTG-light and GTG-dark subbands, at higher decondensation levels.
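The extrapolation from Xp to the whole genome is plain proportional scaling and can be checked with a short sketch. The function name and the decision to pass the Xp share as a percentage are illustrative assumptions; only the band counts (14 and 13) and the ~2% share are taken from the text, and uniform band density across the genome is a simplification.

```python
def extrapolate_band_count(bands_in_region: int, region_share_percent: float) -> float:
    """Scale a band count observed in one region up to the whole genome.

    Assumes band density is roughly uniform across the genome,
    which is a simplification made for this estimate only.
    """
    if region_share_percent <= 0:
        raise ValueError("region_share_percent must be positive")
    return bands_in_region * 100 / region_share_percent


# Values from the text: maximally stretched Xp, taken as ~2% of the genome.
light_total = extrapolate_band_count(14, 2)  # GTG-light bands
dark_total = extrapolate_band_count(13, 2)   # GTG-dark bands
print(light_total, dark_total)
```

Changing the assumed Xp share shifts both totals proportionally, so the result is only as reliable as the ~2% estimate.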

Missing and newly postulated subbands in GRC

Obviously, the 850 band level of a haploid human chromosome set was originally used to align the GRC sequencing data with chromosomal subbands. However, during this transfer, the splitting of 5 bands into three subbands each was missed. Thus, the 15 subbands shown in Fig. 2 are missing from all genome browsers. They are depicted there as (i) 1q32.1 instead of 1q32.11, 1q32.12 and 1q32.13, (ii) 2p21 instead of 2p21.3, 2p21.2 and 2p21.1, (iii) 5q13.2 instead of 5q13.21, 5q13.22 and 5q13.23, (iv) 6p22.3 instead of 6p22.33, 6p22.32 and 6p22.31, and (v) 6q21 instead of 6q21.1, 6q21.2 and 6q21.3.

On the other hand, subband 9q34.1 is not subdivided in ISCN, but in GRC it has 3 subbands, 9q34.11, 9q34.12 and 9q34.13; the same was done for 6p24, which is divided into 6p24.3, 6p24.2 and 6p24.1. Interestingly, up to version GRCh37/hg19, band 2q12.2 was also divided into 3 subbands (2q12.21, 2q12.22 and 2q12.23) that were never present in ISCN.
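The discrepancies listed in this section can be expressed as a comparison of two subdivision tables. The following is a minimal sketch, assuming a simple dictionary layout (parent band mapped to its list of subbands); the variable and function names are hypothetical, and the band names are only those mentioned in the text, not a complete nomenclature.

```python
# Subdivisions as listed in the text; an empty list means "not subdivided".
ISCN_SUBBANDS = {
    "1q32.1": ["1q32.11", "1q32.12", "1q32.13"],
    "2p21": ["2p21.3", "2p21.2", "2p21.1"],
    "5q13.2": ["5q13.21", "5q13.22", "5q13.23"],
    "6p22.3": ["6p22.33", "6p22.32", "6p22.31"],
    "6q21": ["6q21.1", "6q21.2", "6q21.3"],
    "9q34.1": [],
    "6p24": [],
}
GRC_SUBBANDS = {
    "1q32.1": [],
    "2p21": [],
    "5q13.2": [],
    "6p22.3": [],
    "6q21": [],
    "9q34.1": ["9q34.11", "9q34.12", "9q34.13"],
    "6p24": ["6p24.3", "6p24.2", "6p24.1"],
}


def compare_subdivisions(reference, candidate):
    """Return (missing, extra) subbands of candidate relative to reference."""
    missing, extra = [], []
    for band in sorted(set(reference) | set(candidate)):
        ref = set(reference.get(band, []))
        cand = set(candidate.get(band, []))
        missing.extend(sorted(ref - cand))  # in reference, absent in candidate
        extra.extend(sorted(cand - ref))    # in candidate, absent in reference
    return missing, extra


missing, extra = compare_subdivisions(ISCN_SUBBANDS, GRC_SUBBANDS)
print(len(missing), "subbands missing in GRC:", missing)
print(len(extra), "subbands not present in ISCN:", extra)
```

With these inputs the comparison reports 15 missing and 6 additional subbands, in line with the counts stated in the text.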

Size of chromosomal subbands cannot be changed according to sequence results

The sizes of the chromosomal subbands shown in ISCN were determined by microscopic observation. Even though ISCN [15] states that “location and width of bands are not based on any measurements”, they represent what has been seen by tens of thousands of cytogeneticists worldwide over decades; they can thus be taken as fact, and no better data are available. Accordingly, the extent of individual bands in the chromosomal idiograms depicted in the genome browsers must not be changed on the basis of sequencing results. However, this has obviously been done when the browsers were updated to new versions (Fig. 3). It could not be determined how the GRC aligns sequencing data to GTG-bands; however, the described changes in chromosomal band sizes suggest that it was possibly done as follows: in the first version, several years ago, it was defined according to the knowledge of that time that, for example, band A on a certain chromosome is located between DNA markers X and Z. Later, more or fewer DNA stretches were found to lie between these markers, and the band size was adapted accordingly. It must be repeated: this is illegitimate and must lead to incompatible molecular cytogenetic and molecular data (Fig. 3). The percentages/extents of the chromosomal subbands can only be based on the values given for each band in Additional file 1: Table 1, column C. In Additional file 1: Table 1, the sizes of chromosomal subbands were determined (in percent of total chromosomal length) according to ISCN (2020) [15] and aligned with the overall DNA content per chromosome given in UCSC [9]. The band size x was calculated as:

$$x = \frac{A\,[\text{Mb}] \times B\,[\%]}{100}$$

x = size of the chromosomal band [Mb]; A = length of the chromosome on which the band is located [Mb]; B = size of the band as a percentage of the total length of that chromosome [%].
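The formula above can be written as a one-line function. This is a minimal sketch; the function name is hypothetical, and the chromosome length and band percentage in the example are made-up round numbers for illustration, not values from Additional file 1.

```python
def band_size_mb(chromosome_length_mb: float, band_percent: float) -> float:
    """Band size in Mb: x = A [Mb] * B [%] / 100."""
    if chromosome_length_mb < 0 or band_percent < 0:
        raise ValueError("length and percentage must be non-negative")
    return chromosome_length_mb * band_percent / 100


# Hypothetical example: a 150 Mb chromosome and a band covering 0.5% of it.
example = band_size_mb(150.0, 0.5)
print(f"{example:.2f} Mb")
```

Because the band percentage comes from the idiogram and the chromosome length from the assembly, any change in assembly length rescales all bands of a chromosome by the same factor and leaves their relative extents untouched.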

Chromosomal subbands cannot be seen under the microscope if they are smaller than 1.0 megabase pair

According to a 2005 statement in the “American College of Medical Genetics guideline on the cytogenetic evaluation of the individual with developmental delay or mental retardation”, “at resolutions > 650 bands, alterations as small as 3–5 Mb can be reliably detected using chromosome analysis on peripheral blood; for the detection of subtle rearrangements in patients with either abnormal or normal karyotypes, molecular cytogenetic analysis may be useful” [19]. Furthermore, high-resolution GTG-banding can achieve a resolution of ~1400 bands; at this band level it may (theoretically) be possible to see bands of 1–2 Mb in size, and perhaps exceptionally even 0.5 Mb. From this it may optimistically be suggested that, at a resolution of 850 bands, a subband can be seen if it is at least 1 Mb in size. Table 1 lists 34 subbands that are 0.1 to 0.9 Mb in size according to GRCh38/hg38. In contrast, only 22 subbands are between 0.53 and 0.99 Mb in size according to ISCN and Additional file 1: Table 1, column C (Table 2). None of them is smaller than 0.5 Mb, and 2/3 of them are GTG-light bands; no GTG-dark band is smaller than 0.75 Mb. It is of interest here that, as mentioned above, GTG-light bands are maximally decondensed; thus, banding resolutions finer than 1 Mb seem to be possible there.
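As a rough plausibility check of these orders of magnitude, the mean band size at a given band level can be estimated by dividing the genome size by the number of bands. This is only a sketch: the function name is hypothetical, the genome size is the total base pair count quoted in this article for the GRC sequence, and a mean says nothing about the smallest band that is actually visible.

```python
GENOME_SIZE_BP = 3_091_153_988  # total quoted in this article for the GRC sequence


def mean_band_size_mb(band_count: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean band size in Mb if the genome were split into equally sized bands."""
    if band_count <= 0:
        raise ValueError("band_count must be positive")
    return genome_size_bp / band_count / 1_000_000


for band_level in (650, 850, 1400):
    print(band_level, "bands:", round(mean_band_size_mb(band_level), 2), "Mb on average")
```

The averages come out at a few megabases per band for all three band levels, which is the scale at which the text places the limit of microscopic detection.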

Table 1 Chromosomal bands smaller than 1.0 Mb in size according to the UCSC Genome Browser (GRCh38/hg38) assembly, compared to their size calculated from ISCN (2020) idiograms (see Additional file 1: Table 1)

Table 2 Chromosomal bands smaller than 1.0 Mb in size as calculated from ISCN (2020) idiograms (see Additional file 1: Table 1), compared to their size according to the UCSC Genome Browser (GRCh38/hg38) assembly. It is also indicated whether the corresponding band is GTG-dark or GTG-light

Overall, this means that band lengths in GRC should be reconsidered and that bands smaller than 0.5 Mb should not be allowed, especially as such bands are no longer visible even in UCSC (Fig. 3). Furthermore, it should be considered that GTG-light bands naturally contain less DNA than GTG-dark bands. Simply projecting the same stretch of DNA onto a GTG-light band as onto a GTG-dark band will not reflect biological reality.
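A check of this kind can be automated on band coordinate tables. The sketch below assumes input in the tab-separated layout of the UCSC cytoBand table (chrom, chromStart, chromEnd, name, gieStain); the function name is hypothetical, and the sample rows use an invented chromosome "chrN" with made-up coordinates, so they are not real assembly data.

```python
import csv
import io

# Illustrative rows only (hypothetical chromosome and coordinates).
SAMPLE_CYTOBAND = (
    "chrN\t0\t2300000\tp13.3\tgneg\n"
    "chrN\t2300000\t2700000\tp13.2\tgpos25\n"
    "chrN\t2700000\t5400000\tp13.1\tgneg\n"
    "chrN\t5400000\t6100000\tp12\tgpos50\n"
)


def bands_below(cytoband_text: str, min_size_mb: float):
    """Return (chrom, band, stain, size_mb) for every band below min_size_mb."""
    reader = csv.reader(io.StringIO(cytoband_text), delimiter="\t")
    small = []
    for chrom, start, end, name, stain in reader:
        size_mb = (int(end) - int(start)) / 1_000_000
        if size_mb < min_size_mb:
            small.append((chrom, name, stain, size_mb))
    return small


# Flag bands under the 0.5 Mb limit proposed in the text.
for chrom, name, stain, size_mb in bands_below(SAMPLE_CYTOBAND, 0.5):
    print(chrom, name, stain, f"{size_mb:.2f} Mb")
```

The stain column is carried along so that flagged bands can be separated into GTG-light (gneg) and GTG-dark (gpos) bands, which matters here because the text argues that the two classes differ in DNA content.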

Lessons to learn from classical molecular genetics and molecular cytogenetics

What about repetitive DNA?

In the early days of molecular genetics (here referred to as “classical molecular genetics”), interest in repetitive DNA within the human genome was immense, as it was readily accessible and easy to study. As summarized elsewhere [20], those studies produced a wealth of data on regions of the human genome that remain almost inaccessible to modern approaches such as second-generation sequencing [5]. Table 3 gives only a selection of the satellite DNA sequences identified decades ago. Most of them have been, and still are, used in millions of FISH experiments (here referred to as “molecular cytogenetics”), and the location of these DNA stretches is more than proven. Even the approximate sizes of these repetitive DNAs are known. If only the satellite DNAs from Table 3 were included in the genome browsers, 100 Mb of as yet unmapped DNA regions could be filled in a single step. This is also more than timely, as the expression of such satellite DNAs, at least as long non-coding RNAs, has been shown, and not only recently [21]. Additional megabases could be filled by using the information partly collected elsewhere [20], and also by adding nucleolus organizing regions to all acrocentric p-arms and telomeric sequences to the chromosomal ends. From a biologist's point of view it is somewhat surprising that each chromosome in the browsers starts and ends without the well-known telomeric repeats; they should be included, as they cannot yet be searched in UCSC, NCBI or Ensembl. Perhaps the Database of Genomic Variants [22] might also start thinking about including polymorphisms in repetitive DNAs. Finally, and interestingly, even without these megabases of repetitive DNA, the GRC reports a total of 3,091,153,988 base pairs in the human sequence (Additional file 1: Table 1), an incredibly exact number considering all the as yet unknown regions.

Table 3 Satellite DNAs with known location and sizes according to [14]

Conclusion

GRC and the human genome browsers would profit tremendously in comprehensiveness and accuracy if ‘classical (cyto)genetic’ data and ‘karyotype coding’ [12] were given more consideration. As nicely stated by Iourov, Yurov and Vorsanova in 2020 [23]: “Undoubtedly, genome-centric and gene-centric are the words to describe actual concepts in human genetics. In a world of genes and genomes, the lack of required attention to chromosomes is often observed. As a result, chromosome research gradually loses the genetic (genomic) context. Certainly, brilliant insights into chromosome biology obtained by studies dedicated to molecular/cell biology, evolution, biochemistry, biophysics, etc., are fascinating. However, genome research and human (medical) genetics miss the essential link between genes and genomes, which is determined by chromosomal analysis (i.e., cytogenetics, molecular cytogenetics, cytogenomics). This is also the case for diagnostic research, which has recently suffered problems in quality of cytogenetic diagnosis. Data on genes and genomes are useless outside the chromosomal context when intrinsic molecular and cellular pathways are highlighted in health and disease. Without the chromosomal context, genes are virtual elements interacting with each other in an elusive digital universe. Unfortunately, this situation is generally the case for numerous attempts to analyze and interpret genomic data. More dramatically, education programs in genomics and genomic medicine developed for medical/biological students, physicians, or the public generally conceal any information about the chromosome, the physical (biological) storage of genomic data” [23]. This statement is further underlined by publications of Ron Hochstenbach and colleagues [24, 25].

Yet, even after the inclusion of more data in the future, the results shown in the browsers are nothing more than a model of the human genome: they do not depict the natural human genome, nor do they describe its highly variable nature within living cells, where it exists in a three-dimensional context. Both GRC and ISCN mainly provide a unifying nomenclature with which to describe aberrations from the norm.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files.

Abbreviations

GRC: Genome Reference Consortium
GTG: G-bands by trypsin using Giemsa
HGP: Human Genome Project
ISCN: International System for Human Cytogenetic or Cytogenomic Nomenclature
NCBI: National Center for Biotechnology Information
TADs: Topologically associating domains
UCSC: University of California, Santa Cruz

References

1. https://www.genome.gov/human-genome-project. Accessed 15 Feb 2021.
2. Liehr T. Human genetics—edition 2020: a basic training package. Epubli; 2020.
3. Schlegelberger B. In memoriam: Prof. Dr. rer. nat. Dr. med. h.c. Lore Zech; 24.9.1923–13.3.2013: Honorary member of the European Society of Human Genetics, Honorary member of the German Society of Human Genetics, Doctor laureate, the University of Kiel, Germany. Mol Cytogenet. 2013;6:20.
4. Liehr T. Overview of yet available approaches used in cytogenomics. In: Liehr T, editor. Cytogenomics. Academic Press; 2021.
5. Ungelenk M. Sequencing approaches. In: Liehr T, editor. Cytogenomics. Academic Press; 2021.
6. Gross DS, Chowdhary S, Anandhakumar J, Kainth AS. Chromatin. Curr Biol. 2015;25:R1158–63.
7. Daban JR. Supramolecular multilayer organization of chromosomes: possible functional roles of planar chromatin in gene expression and DNA replication and repair. FEBS Lett. 2020;594:395–411.
8. https://www.ncbi.nlm.nih.gov/grc. Accessed 15 Feb 2021.
9. https://genome.ucsc.edu/. Accessed 15 Feb 2021.
10. https://www.ensembl.org/index.html. Accessed 15 Feb 2021.
11. https://www.ncbi.nlm.nih.gov/genome/gdv/. Accessed 15 Feb 2021.
12. Ye CJ, Stilgenbauer L, Moy A, Liu G, Heng HH. What is karyotype coding and why is genomic topology important for cancer and evolution? Front Genet. 2019;1(10):1082.
13. Lindsten JE, Klinger HP, Hamerton JL. An international system for human cytogenetic nomenclature (1978) ISCN (1978). Karger; 1978.
14. Yunis JJ, Chandler ME. High-resolution chromosome analysis in clinical medicine. Prog Clin Pathol. 1978;7:267–88.
15. McGowan-Jordan J, Hastings RJ, Moore S, editors. ISCN 2020—an International System for Human Cytogenomic Nomenclature (2020). Karger; 2020.
16. Hliscs R, Mühlig P, Claussen U. The nature of G-bands analyzed by chromosome stretching. Cytogenet Cell Genet. 1997;79:162–6.
17. Kuechler A, Mueller CR, Liehr T, Claussen U. Detection of microdeletions in the short arm of the X chromosome by chromosome stretching. Cytogenet Cell Genet. 2001;95:12–6.
18. Kosyakova N, Weise A, Mrasek K, Claussen U, Liehr T, Nelle H. The hierarchically organized splitting of chromosomal bands for all human chromosomes. Mol Cytogenet. 2009;2:4.
19. Shaffer LG, American College of Medical Genetics Professional Practice and Guidelines Committee. American College of Medical Genetics guideline on the cytogenetic evaluation of the individual with developmental delay or mental retardation. Genet Med. 2005;7:650–4.
20. Liehr T. Benign & pathological chromosomal imbalances; microscopic and submicroscopic copy number variations (CNVs) in genetics and counseling. Academic Press; 2014.
21. Hall LE, Mitchell SE, O’Neill RJ. Pericentric and centromeric transcription: a perfect balance required. Chromosome Res. 2012;20:535–46.
22. Database of genomic variants. http://dgv.tcag.ca/dgv/app/home. Accessed 15 Feb 2021.
23. Iourov IY, Yurov YB, Vorsanova SG. Chromosome-centric look at the genome. In: Iourov I, Vorsanova S, Yurov Y, editors. Human interphase chromosomes—Biomedical aspects. Springer; 2020. p. 157–70.
24. Hochstenbach R, Slunga-Tallberg A, Devlin C, Floridia G, de Alba MR, Bhola S, Rack K, Hastings R. Fading competency of cytogenetic diagnostic laboratories: the alarm bell has started to ring. Eur J Hum Genet. 2017;25(3):273–4.
25. Hochstenbach R, Liehr T, Hastings RJ. Chromosomes in the genomic age. Preserving cytogenomic competence of diagnostic genome laboratories. Eur J Hum Genet. 2020. https://doi.org/10.1038/s41431-020-00780-y.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Jena University Hospital, Friedrich Schiller University, Institute of Human Genetics, Am Klinikum 1, 07747, Jena, Germany
Thomas Liehr

Authors

Thomas Liehr

Contributions

TL conducted the study and wrote the manuscript.

Corresponding author

Correspondence to Thomas Liehr.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Here sizes of chromosomal subbands were determined (in percent of total chromosomal length) according to ISCN (2020) [15] and aligned with the overall DNA-content per chromosome given in UCSC [9].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liehr, T. About classical molecular genetics, cytogenetic and molecular cytogenetic data not considered by Genome Reference Consortium and thus not included in genome browsers like UCSC, Ensembl or NCBI. Mol Cytogenet 14, 20 (2021). https://doi.org/10.1186/s13039-021-00540-7

Received: 15 February 2021
Accepted: 08 March 2021
Published: 20 March 2021
DOI: https://doi.org/10.1186/s13039-021-00540-7
