What Determines the Size of Your DNA

Updated: Dec 14, 2021

Guanine-Cytosine Content in DNA

—New York, NY

The completion of the Human Genome Project in 2003 greatly broadened our knowledge of the genome. The project has enabled geneticists to reveal patterns hidden in our genetics. Regions once considered non-coding are now being recognized as functional units, such as microRNA and CRISPR. One valuable result of genomic analysis has been the creation of a phylogenetic chart that traces our ancestry back to the last known relative. One method of analysis is quantifying the base pair composition of an organism's genome. While this information helps map out our ancestry, it also reveals a significant amount about the organism itself, which may serve the medical and agricultural fields in the future. The base pair content of an organism matters because it reveals hidden mechanisms of DNA.

DNA is composed of pyrimidine and purine bases, paired with one another and bonded to a backbone of deoxyribose (a five-carbon sugar) and inorganic phosphate. Four bases make up DNA: adenine and thymine, which pair together (AT), and guanine and cytosine, which pair together (GC). The GC pair is more strongly bonded because it is held by three hydrogen bonds, as opposed to the two in an AT pair. Scientists hypothesize that organisms with a higher GC content in their genome will be more thermostable and able to thrive at extreme temperatures. In fact, it was from Thermus aquaticus, a thermophile, that Taq polymerase, the DNA polymerase used in PCR, was isolated. Taq polymerase is used because it withstands the high temperatures required to denature DNA in each PCR cycle. A similar enzyme, Pfu, was found in Pyrococcus furiosus, a hyperthermophile that survives in even harsher conditions. Yet while T. aquaticus has a GC content of 68%, P. furiosus has a GC content of only 40.8%. This finding indicates that more variables than the environment alone contribute to the GC content of DNA.
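As a rough illustration of what "GC content" means in practice, the sketch below computes the GC fraction of a sequence and counts the hydrogen bonds in the duplex it would form, using the three-bond (GC) and two-bond (AT) counts described above. The function names `gc_content` and `hydrogen_bonds` are made up for this example, and the input is assumed to be a plain string of A, C, G, and T.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # skip gaps, N, whitespace
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def hydrogen_bonds(seq: str) -> int:
    """Count hydrogen bonds in the duplex formed by seq and its complement.

    Each G or C position contributes 3 bonds; each A or T contributes 2.
    """
    return sum(3 if b in "GC" else 2 for b in seq.upper() if b in "ACGT")


if __name__ == "__main__":
    for name, seq in [("AT-only", "AATT"), ("mixed", "ATGC"), ("GC-only", "GGCC")]:
        print(name, f"GC={gc_content(seq):.0%}", "H-bonds:", hydrogen_bonds(seq))
```

For sequences of equal length, the GC-rich one always has more hydrogen bonds, which is the intuition behind the thermostability hypothesis. It is only part of the picture, as the P. furiosus figure shows.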

A study by Almpanis, Swain, Gatherer, and McEwan concludes that there is a positive correlation between genome size and GC content in bacteria. GC content ranges from a minimum of ~13.5% to a maximum of ~75%. The ceiling is attributed primarily to the requirement for the amino acid lysine, which is encoded only by codons that contain adenine (AAA and AAG). The correlation also extends to plasmids, which are extrachromosomal DNA. Interestingly, the GC content of a plasmid matches that of its host. One theory is that the similar GC content allows the organism to determine whether the two types of DNA are compatible.
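The point about lysine can be checked directly against the standard genetic code. The sketch below builds the standard codon table from its usual compact form (bases in T, C, A, G order) and reports, for a given amino acid, which codons encode it and the highest GC fraction any of those codons can reach. The helper names `codons_for` and `max_gc_fraction` are invented for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """Return the sorted DNA codons for a one-letter amino acid code."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def max_gc_fraction(amino_acid: str) -> float:
    """Highest GC fraction achievable by any codon for this amino acid."""
    return max(sum(b in "GC" for b in c) for c in codons_for(amino_acid)) / 3


if __name__ == "__main__":
    print("Lysine (K) codons:", codons_for("K"))
    print("Best-case GC fraction for lysine:", round(max_gc_fraction("K"), 2))
```

Because both lysine codons carry at least two adenines, any protein-coding region that needs lysine cannot be entirely G and C, whatever pressure pushes the rest of the genome toward higher GC content.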

Intracellular pathogens such as phages, however, have a lower GC content than their hosts. This may reduce the burden that the phage places on its host by sparing it from spending more energy on pyrimidine synthesis. These findings offer an avenue for identifying endogenous retroviruses by recognizing areas of the genome with higher concentrations of purines.
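One simple way to look for regions whose base composition differs from the rest of a genome is a sliding-window scan. The sketch below is a minimal version under stated assumptions: the function names, window size, step, and deviation threshold are all arbitrary choices for illustration, and real detection of viral insertions relies on far more than composition. Passing `bases="AG"` scans for purine content instead of GC content.

```python
def windowed_fraction(seq, bases="GC", window=10, step=5):
    """Return (start, fraction) pairs: share of `bases` in each window."""
    seq = seq.upper()
    targets = set(bases.upper())
    if window <= 0 or step <= 0 or len(seq) < window:
        raise ValueError("window and step must be positive and fit the sequence")
    return [
        (start, sum(b in targets for b in seq[start:start + window]) / window)
        for start in range(0, len(seq) - window + 1, step)
    ]


def flag_windows(seq, bases="GC", window=10, step=5, min_deviation=0.3):
    """Return windows whose fraction differs from the whole-sequence
    fraction by at least min_deviation (an arbitrary illustrative cutoff)."""
    targets = set(bases.upper())
    overall = sum(b in targets for b in seq.upper()) / len(seq)
    return [
        (start, frac)
        for start, frac in windowed_fraction(seq, bases, window, step)
        if abs(frac - overall) >= min_deviation
    ]


if __name__ == "__main__":
    # Toy sequence: a GC-rich "host" with an AT-rich stretch in the middle.
    toy = "GC" * 10 + "AT" * 5 + "GC" * 10
    print(flag_windows(toy, window=10, step=10))
```

On the toy sequence only the AT-rich window starting at position 20 is flagged, since it is the one stretch whose composition departs sharply from the overall average.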

Just as geneticists began to understand why certain bases are favored in bacteria, they turned to a different kingdom of life, which introduced new patterns. In Ecological and Evolutionary Significance of Genomic GC Content Diversity in Monocots, Šmarda and colleagues found that GC content in monocot plants has a quadratic relationship to genome size. Both ends of the genome size spectrum have low GC content, while the middle range has high GC content.

One theory attributes the decrease in GC content at the higher end of the spectrum to the higher biochemical cost of GC base synthesis. Genome growth itself arises predominantly from an increase in LTR (Long Terminal Repeat) retrotransposons, which contain GC-rich regions and dominate most plant genomes. Conversely, an increase in AT content may result from the need to compact a larger genome: AT compacts better than GC because of its geometry and its relationship with neighboring chemical groups.

On the lower end of the spectrum, some of the decrease in GC content can be attributed to holocentric chromosomes. These are chromosomes with multiple kinetochore sites rather than the single site seen in monocentric chromosomes. Holocentric chromosomes have very low GC content because their small, rigid size reduces recombination rates, specifically at heterologous recombination sites. This recombination is a repair mechanism in which crossover between two different chromosomes preferentially introduces GC bases. Without it, GC content remains low in smaller chromosomes. Other decreases in GC content can be attributed to higher mutation rates resulting from cytosine’s ability to be methylated.

The monocot families with increased GC content were noted to dominate in cold and dry environments, a stark contrast to thermophiles. One theory is that this relates to preventing desiccation from freezing, which these plants do by adding proteins and sugars to their water to lower its freezing point. Other characteristics include significantly fewer introns, or none at all, a higher percentage of GC in the 5′ regions of genes, and a higher TATA box content.

More than a hundred years after the publication of Mendel's laws of segregation and independent assortment, genetics has grown into a field that is leading medicine into a new era and has become more publicly accessible. Less than two decades ago, sequencing an entire human genome cost billions of dollars, whereas now the cost has fallen to a couple of thousand dollars. Ozzy Osbourne, along with Steve Jobs, was among the first to have his genome sequenced. Geneticists were interested in Osbourne’s genome in order to better understand how his body has withstood a rockstar lifestyle of drug abuse. They discovered mutations in his ADH4 gene, which enable him to metabolize alcohol more quickly than most people. As genetic testing becomes part of the battery of tests that comprise annual doctor visits, abnormal genotypes will help physicians treat diseases, both obvious and underlying, and prescribe the medications appropriate for a particular patient. One way scientists may advance this goal is to expand the genomic database beyond the medical and agricultural fields. A broader range of organisms would then be represented, and we could learn more about genes.


Almpanis, A., Swain, M., Gatherer, D., & McEwan, N. (2018, April). Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Retrieved November 25, 2020, from

Brumm, P., Monsma, S., Keough, B., Jasinovica, S., Ferguson, E., Schoenfeld, T., Lodes, M., & Mead, D. (2015). Complete genome sequence of Thermus aquaticus Y51MC23. PLoS ONE, 10(10), e0138674. doi:10.1371/journal.pone.0138674

Lehman, N., & Unrau, P. J. (2005). Recombination during in vitro evolution. Journal of Molecular Evolution, 61(2), 245–252. doi:10.1007/s00239-004-0373-4

Mandrioli, M., & Manicardi, G. (n.d.). Holocentric chromosomes. Retrieved November 25, 2020, from

Sullivan, B. (2020, May 9). Gene's addiction, or why Ozzy Osbourne is still alive. Discover Magazine.

Šmarda, P., Bureš, P., Horová, L., Leitch, I., Mucina, L., Pacini, E., . . . Rotreklová, O. (2014, September 30). Ecological and evolutionary significance of genomic GC content diversity in monocots. Retrieved November 25, 2020, from

“The Human Genome Project.”,

Yong, E. (2016, March 03). How Viruses Infiltrated Our DNA and Supercharged Our Immune System. Retrieved November 25, 2020, from
