About classical molecular genetics, cytogenetic and molecular cytogenetic data not considered by Genome Reference Consortium and thus not included in genome browsers like UCSC, Ensembl or NCBI

Background The Genome Reference Consortium (GRC) has, according to its own statement, the “mission to improve the human reference genome assembly, correcting errors and adding sequence to ensure it provides the best representation of the human genome to meet basic and clinical research needs”. Data from GRC are included in genome browsers like UCSC (University of California, Santa Cruz), Ensembl or NCBI (National Center for Biotechnology Information) and thereby form the basis for the work of the human genetics community in research and diagnostics.
Method Here, long-standing knowledge derived from classical molecular genetic, cytogenetic and molecular cytogenetic data that has not yet been considered by GRC was revisited.
Results Three major points were identified: (1) GRC failed to include three chromosomal subbands each for 1q32.1, 2p21, 5q13.2, 6p22.3 and 6q21, which were defined by the International System for Human Cytogenetic Nomenclature (ISCN) as early as the 1980s; instead, GRC included 6 additional subbands never recognized by ISCN. (2) GRC defined 34 chromosomal subbands of 0.1 to 0.9 Mb in size, although cytogeneticists generally agree that chromosomal aberrations below 1–2 Mb in size are unlikely to be detected by GTG-banding. (3) None of the sequences used in routine molecular cytogenetic diagnostics to detect heterochromatic and/or pericentromeric satellite DNA within the human genome are yet included in the human reference genome. For those sequences, localization and approximate sizes were determined between the 1970s and 1990, and if they were included, at least ~100 Mb of human genome sequence could be added to the genome browsers.
Conclusion Overall, taking into account the points mentioned here, and correcting and including the corresponding data, will definitely contribute to the still unfinished mapping of the human genome.
Supplementary Information The online version contains supplementary material available at 10.1186/s13039-021-00540-7.

natural sciences. The latter discipline gave birth to genetics and, ~3 decades ago, initiated the effort to sequence the entire human genome, in the hope of finally reaching an in-depth breakthrough concerning the above-mentioned question.
The enthusiasm that accompanied, and still accompanies, the 'Human Genome Project' (HGP) can be seen in the statement on its website: "The HGP was one of the great feats of exploration in history. Rather than an outward exploration of the planet or the cosmos, the HGP was an inward voyage of discovery led by an international team of researchers looking to sequence and map all of the genes – together known as the genome – of members of our species, Homo sapiens. Beginning on October 1, 1990 and completed in April 2003, the HGP gave us the ability, for the first time, to read nature's complete genetic blueprint for building a human being" [1].
Looking at the history of human genetics, it was Gregor Mendel who suggested in 1856 that cells must contain "coupling groups", i.e. what was seen by Walther Flemming in 1879 and named 'chromosomes' by Wilhelm Waldeyer in 1888 [2]. Even though banding cytogenetics, introduced by Lore Zech in the late 1960s, provided major progress in human genetics [3], developments in molecular genetic techniques soon took over the mainstream of the field [4]. However, neither a purely base pair-oriented view (of molecular genetics) nor an isolated chromosome-oriented view (of (molecular) cytogenetics) alone will ever be sufficient to understand our genome. Accordingly, the Genome Reference Consortium (GRC), which is responsible for collecting and publishing the human genome reference sequence, aligns actual sequencing data with the chromosomal level. Recent insights from 'second-' and 'third-generation sequencing' approaches [5], and furthermore the discovery of inter- and intrachromosomal interactions, more specifically the TADs (topologically associating domains) [6], highlighted that chromosome structure is extremely important for genome function and, if impaired, for disease [5,6]. Most recently, insights from electron microscopy combined with cytogenetic, molecular cytogenetic and molecular genetic data led to a completely new understanding of chromosome structure itself [7]. As stated by Joan-Ramon Daban: "Experimental evidence indicates that the chromatin filament is self-organized into a multilayer planar structure that is densely stacked in metaphase and unstacked in interphase. This chromatin organization is unexpected, but it is shown that diverse supramolecular assemblies are multilayered. The mechanical strength of planar chromatin protects the genome integrity, even when double-strand breaks are produced." He suggests "that the chromatin filament in the loops and topologically associating domains is folded within the thin layers of the multilaminar chromosomes. It is also proposed that multilayer chromatin has two states: inactive when layers are stacked and active when layers are unstacked" [7].
This paper discusses the following question: to what extent could GRC profit in its mission to "improve the human reference genome assembly, correcting errors and adding sequence to ensure it provides the best representation of the human genome to meet basic and clinical research needs" [8] by also taking such an integrative view of the available data on the human genome as Joan-Ramon Daban did? This question is of immense practical importance, as GRC data are the basis for genome browsers like UCSC (University of California, Santa Cruz) [9], Ensembl [10] or NCBI (National Center for Biotechnology Information) [11], which serve as the backbone for the correct interpretation of (molecular) cytogenetic and molecular genetic diagnostic results. In the following, insights to be considered from banding cytogenetics, molecular genetics and molecular cytogenetics are discussed accordingly because, as recently stated by Ye and colleagues, "karyotype coding" defines the genome system information [12].

Insights from banding cytogenetics
Banded human chromosomes confronted scientists in the 1970s with problems and questions similar to those raised by sequencing data nowadays:
• How to describe and how to denominate what we see?
• How to define a worldwide valid nomenclature for what we see?
• How to give a definite and reliable nomenclature for bands or DNA-sequence positions?
The banding pattern of human chromosomes was given a binding nomenclature by 1978 at the latest [13], which was refined later on with the progress of the methods used and higher banding resolution [14]. For many editions of the International System for Human Cytogenomic Nomenclature (ISCN), the banding nomenclature has not been changed; it refers to data from 1981 and 1994 [15]. This denomination of bands had just the goal of describing what is visible in a light microscope: shorter, more condensed chromosomes show fewer GTG-dark and -light bands than longer, more decondensed ones (Fig. 1). However, this nomenclature by no means reflects biological realities, i.e. it does not describe which subbands at a higher chromosomal resolution derive from which more condensed bands at lower resolution. This has been shown by the seminal work of Uwe Claussen, who demonstrated that GTG-light bands represent maximally decondensed chromosomal parts and never split into further subbands. He showed this for chromosomes 6 [16] and Xp [17] using chromosome stretching, and for all other chromosomes using another, fluorescence in situ hybridization (FISH)-based approach [18] (Fig. 1).
To what extent it may be important for GRC to consider that, e.g., the maximally stretched short arm of the X chromosome has overall 14 GTG-light and 13 GTG-dark bands remains to be seen. At least one could deduce from that study [17] that there might be roughly (if Xp represents ~2% of the human genome) ~700 GTG-light and ~650 GTG-dark bands in the whole human genome, which await alignment with sequencing data. In the light of the description of TADs [6] and the work of Joan-Ramon Daban [7], this has to be considered sooner or later. Another important lesson from this work [16][17][18] is that GTG-light bands, being less condensed than GTG-dark ones, should contain less DNA than the more condensed dark ones. The GTG-dark bands, which are aligned with sequencing results at the ~850-band level in genome browsers, can still be decondensed and should thus contain more DNA, as well as more GTG-light and -dark subbands at higher decondensation levels.
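The extrapolation from Xp to the whole genome is a simple linear scaling, and it can be retraced in a few lines. The following is only a minimal sketch: the band counts and the ~2% share are the figures given in the text, while the function name and the assumption that band density is uniform across the genome are introduced here for illustration.

```python
# Band counts reported for the maximally stretched short arm of the X chromosome [17].
XP_LIGHT_BANDS = 14
XP_DARK_BANDS = 13
# Rough share of the human genome represented by Xp, as assumed in the text (~2%).
XP_GENOME_FRACTION = 0.02


def extrapolate_band_count(bands_on_xp: int, genome_fraction: float) -> int:
    """Scale a band count on Xp linearly to the whole genome.

    Illustrative helper: it assumes that band density on Xp is
    representative of all chromosomes, which is a simplification.
    """
    if not 0 < genome_fraction <= 1:
        raise ValueError("genome_fraction must be in (0, 1]")
    return round(bands_on_xp / genome_fraction)


light_total = extrapolate_band_count(XP_LIGHT_BANDS, XP_GENOME_FRACTION)
dark_total = extrapolate_band_count(XP_DARK_BANDS, XP_GENOME_FRACTION)
print(f"GTG-light: ~{light_total}, GTG-dark: ~{dark_total}")
```

With the values from the text this reproduces the estimate of ~700 GTG-light and ~650 GTG-dark bands; a different assumed share for Xp would shift both numbers proportionally.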

Size of chromosomal subbands cannot be changed according to sequence results
The sizes of the chromosomal subbands shown in ISCN were determined based on microscopic observations. Even though ISCN [15] states that "location and width of bands are not based on any measurements", they represent what has been seen by tens of thousands of cytogeneticists worldwide over decades; thus they can be taken as fact, and no better data are available. Accordingly, individual band extensions in the chromosomal idiograms depicted in the genome browsers must not be changed based on sequencing results. However, this has obviously been done when the browsers were updated to new versions (Fig. 3). It could not be determined how GRC aligns sequencing data to GTG-bands; however, the described changes in chromosomal band sizes suggest that it was possibly done in the following way: in the first version, several years ago, it was defined according to the knowledge of that time that, e.g., band A on a certain chromosome is located between DNA markers X and Z. Later on, more or fewer DNA stretches were found to be located between these markers, and thus the band size had to be adapted. It must be repeated: this is illegitimate and must lead to incompatible molecular cytogenetic and molecular data (Fig. 3). The percentages / extensions of the chromosomal subbands can only be oriented on the values given for each band in Additional file 1: Table 1, column C.
In Additional file 1: Table 1, the sizes of chromosomal subbands were determined (in percent of total chromosomal length) according to ISCN (2020) [15] and

Chromosomal subbands cannot be seen in the microscope if they are smaller than 1.0 megabase pair
According to the 2005 statement "American College of Medical Genetics guideline on the cytogenetic evaluation of the individual with developmental delay or mental retardation", "at resolutions > 650 bands, alterations as small as 3-5 Mb can be reliably detected using chromosome analysis on peripheral blood; for the detection of subtle rearrangements in patients with either abnormal or normal karyotypes, molecular cytogenetic analysis may be useful" [19]. In high-resolution GTG-banding, a resolution of ~1400 bands can also be achieved; this means that at this band level it may be (theoretically) possible to see bands of 1-2 Mb in size, and maybe exceptionally even 0.5 Mb. Deduced from that, it may be optimistically suggested that at a resolution of 850 bands a subband may be seen if it is at least 1 Mb in size. In Table 1  them is smaller than 0.5 Mb and 2/3 of them are GTG-light bands; no GTG-dark band is smaller than 0.75 Mb.
Here it is of interest, as mentioned above, that GTG-light bands are maximally decondensed; thus, banding resolutions finer than 1 Mb seem to be possible for them.
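As a rough plausibility check for these size limits, one may compare them with the mean amount of DNA per band at a given banding resolution. The sketch below is an illustration only: it takes the total of 3,091,153,988 base pairs that the text quotes for the reference sequence, and it assumes that the band level counts bands per haploid set and that all bands are equally large, which real bands are not. The function name is hypothetical.

```python
# Total length of the human reference sequence as quoted in the text
# (Additional file 1: Table 1).
GENOME_SIZE_BP = 3_091_153_988


def mean_band_size_mb(genome_size_bp: int, band_level: int) -> float:
    """Return the mean band size in Mb if all bands had equal size.

    Simplifying assumption for illustration: real GTG-bands differ
    widely in size and DNA content.
    """
    if band_level <= 0:
        raise ValueError("band_level must be positive")
    return genome_size_bp / band_level / 1_000_000


for level in (650, 850, 1400):
    size = mean_band_size_mb(GENOME_SIZE_BP, level)
    print(f"{level} bands: ~{size:.1f} Mb per band on average")
```

Under these assumptions the mean band covers about 4.8 Mb at 650 bands, 3.6 Mb at 850 bands and 2.2 Mb at 1400 bands, so subbands of 0.1 to 0.9 Mb lie far below the average band size at the 850-band level.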
Overall, this means that band lengths in GRC should be reconsidered, not allowing bands smaller than 0.5 Mb, especially as such bands are no longer visible even in UCSC (Fig. 3). Furthermore, it should be considered that GTG-light bands naturally contain less DNA than GTG-dark bands. Simply projecting the same stretch of DNA onto a GTG-light band as onto a GTG-dark band will not reflect biological reality.
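A check for bands below a chosen size threshold could be sketched as follows. This is a hedged illustration, not the procedure used for Additional file 1: the rows are hypothetical and carry no real coordinates, the chromosome and band names are placeholders, and the five-column tab-separated layout (chromosome, start, end, band name, stain) is assumed to follow the cytoband tables distributed by genome browsers.

```python
from typing import List, NamedTuple


class Band(NamedTuple):
    chrom: str
    start: int  # start position in bp
    end: int    # end position in bp
    name: str
    stain: str  # "gneg" for GTG-light, "gpos..." for GTG-dark bands


# Hypothetical rows for illustration only; these are NOT real GRC coordinates.
SAMPLE_ROWS = """\
chrN\t0\t2300000\tp1.1\tgneg
chrN\t2300000\t2700000\tp1.2\tgpos25
chrN\t2700000\t7100000\tp1.3\tgneg
chrN\t7100000\t7900000\tp1.4\tgpos50
"""


def parse_bands(text: str) -> List[Band]:
    """Parse tab-separated band rows into Band records."""
    bands = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        chrom, start, end, name, stain = line.split("\t")
        bands.append(Band(chrom, int(start), int(end), name, stain))
    return bands


def bands_below(bands: List[Band], min_size_mb: float) -> List[Band]:
    """Return all bands whose length is below min_size_mb megabases."""
    limit_bp = min_size_mb * 1_000_000
    return [b for b in bands if b.end - b.start < limit_bp]


bands = parse_bands(SAMPLE_ROWS)
for band in bands_below(bands, 0.5):
    size_mb = (band.end - band.start) / 1_000_000
    print(f"{band.chrom}{band.name} ({band.stain}): {size_mb:.2f} Mb")
```

Calling `bands_below(bands, 1.0)` instead applies the more conservative 1 Mb limit discussed above; filtering on the stain column in addition would allow GTG-light and GTG-dark bands to be treated with different thresholds.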

What about repetitive DNA?
In the early times of molecular genetics (here referred to as "classical molecular genetics"), the interest in repetitive DNA within the human genome was immense. It was simply accessible and easy to study. As summarized elsewhere [20], those studies produced immense data about these regions of the human genome, which are still almost inaccessible to modern approaches (like second-generation sequencing) [5]. Table 3 gives just a selection of satellite DNA sequences identified decades ago. Most of them have been used in millions of FISH experiments (here referred to as "molecular cytogenetics"), and the location of these DNA stretches is more than proven. Even the approximate sizes of these repetitive DNAs are known. If only the satellite DNAs from Table 3 were included in the genome browsers, 100 Mb of not yet mapped DNA regions could be filled in one go. This is also more than timely, as the expression of such satellite DNAs, at least as long non-coding RNAs, has been shown, and not only recently [21]. Additional megabases could be filled by using the information partly collected elsewhere [20], and also by adding nucleolus organizing regions to all acrocentric p-arms and telomeric sequences to the chromosomal ends. From a biologist's point of view it is somewhat surprising that each chromosome in the browsers starts and ends without the well-known telomeric repeats; they should be included, as they cannot yet be searched in UCSC, NCBI or Ensembl. Maybe the Database of Genomic Variants [22] might start thinking about including polymorphisms in repetitive DNAs, too. Finally and interestingly, without taking into account these megabases of repetitive DNA, GRC nonetheless includes overall 3,091,153,988 base pairs in the human sequence (Additional file 1: Table 1); an

Conclusion
GRC and human genome browsers would profit tremendously in their comprehensiveness and accuracy if 'classical (cyto)genetic' data and 'karyotype coding' [12] were considered more. As nicely stated by Iourov, Yurov and Vorsanova in 2020 [23]: "Undoubtedly, genome-centric and gene-centric are the words to describe actual concepts in human genetics. In a world of genes and genomes, the lack of required attention to chromosomes is often observed. As a result, chromosome research gradually loses the genetic (genomic) context. Certainly, brilliant insights into chromosome biology obtained by studies dedicated to molecular/cell biology, evolution, biochemistry, biophysics, etc., are fascinating. However, genome research and human (medical) genetics miss the essential link between genes and genomes, which is determined by chromosomal analysis (i.e., cytogenetics, molecular cytogenetics, cytogenomics). This is also the case for diagnostic research, which has recently suffered problems in quality of cytogenetic diagnosis. Data on genes and genomes are useless outside the chromosomal context when intrinsic molecular and cellular pathways are highlighted in health and disease. Without the chromosomal context, genes are virtual elements interacting with each other in an elusive digital universe. Unfortunately, this situation is generally the case for numerous attempts to analyze and interpret genomic data. More dramatically, education programs in genomics and genomic medicine developed for medical/biological students, physicians, or the public generally conceal any information about the chromosome, the physical (biological) storage of genomic data" [23]. This statement is further underlined by publications of Ron Hochstenbach and colleagues [24,25].
Yet, even after the inclusion of more data in the future, the results shown in the browsers are nothing more than a model of the human genome: they do not depict the natural human genome, and they do not describe its highly variable nature within living cells, where it is present in a three-dimensional context. Both GRC and ISCN mainly provide a unifying nomenclature that makes it possible to describe aberrations from the norm.