Change we can "believe" in: morphological variation from regulatory elements versus coding regions


The mechanisms that generate morphological variation among organisms have long intrigued evolutionary biologists. Variation in the coding regions of genes has been a major focus of our understanding of evolution since the Modern Synthesis combined ideas from genetics, paleontology, and systematics. However, these concepts mostly ignored the role of embryology in the evolution of form (Carroll, 2008). More recently, the development of molecular techniques has fostered the emergence of evolutionary developmental biology (Evo-Devo), which emphasizes the role of regulatory elements in morphological evolution (Carroll, 2005). Its central proposal is that mutations associated with major morphological differences among species are more likely to occur in the regulatory elements of genes than in their protein-coding regions (Prud’homme et al. 2006, Jeong et al. 2008). This idea challenges fundamental concepts within evolutionary biology, although its generality is still contested (Hoekstra and Coyne, 2007), and it has fueled an ongoing debate among biologists about the sources and impacts of variation. Here, we review each perspective, its limitations, and its impact on the field of evolution.


Evolutionary theory has been the cornerstone of all fields within biology since the Modern Synthesis of the 1930s and 1940s. However, evolutionary biology has undergone some remarkable conceptual shifts regarding the relevance of the processes claimed to shape life on earth. For example, the neutral theory of molecular evolution (Kimura, 1983) highlighted the importance of neutral variation and genetic drift for explaining molecular evolution. This theory is a landmark of evolutionary biology because, up until that time, selection was widely regarded as the dominant agent of evolution.

More recently, an important debate has emerged within evolutionary biology. It focuses on whether changes in regulatory regions or changes in coding regions are more relevant for phenotypic evolution. To understand the question, one first needs to be introduced to the two major sources of genetic variation affecting phenotypic evolution. The coding regions of genes specify proteins. Mutations that alter the amino acid sequence, also known as structural changes, can change the function of a protein or its specificity. In addition, the expression of genes is controlled by elements that generally lie outside the coding region, called regulatory elements (Figure 1). Changes in regulatory regions can silence or activate a gene at different times and in different places in the organism. Mutations in both kinds of region affect the evolution of phenotypic traits. However, a general picture of the relative importance of each is still lacking.

After the Modern Synthesis, phenotypic evolution was considered to depend on changes in coding regions. Many studies have supported the relevance of changes in amino acid sequences to phenotypic variation (Rosenblum et al., 2004; Protas et al., 2006; Gratten et al., 2006). However, other comparative studies of genes across clades led to alternative conclusions about the process of phenotypic evolution. King and Wilson (1975) noted the close similarity in protein and DNA sequences between humans and chimpanzees, despite roughly 6 million years of independent evolution and distinctive morphological, physiological, and cultural differences between the two lineages. Their findings contrasted with the expectation that multi-level differences between humans and chimpanzees should be reflected in the gene sequences. Instead, they proposed that changes in gene regulation would be a plausible explanation for the phenotypic differences between those two primate lineages.

Since this discovery, an accumulation of data from single-gene, multi-gene, and whole-genome studies has led to the proposal of a new theory in which mutations in regulatory regions are the major agents of phenotypic evolution (Carroll 2005, 2008). However, opponents of this theory emphasize that changes in coding regions are also important for phenotypic variation, and that it is premature to consider regulatory elements the main driver of trait evolution (Hoekstra and Coyne 2007). The major arguments and counterarguments of both sides are summarized in Table 1.

Here, we examine the evidence for the relevance of both regulatory elements and coding regions as main drivers of phenotypic evolution. The goals of this review are twofold. First, we present and comment on the regulatory hypothesis and the objections of its opponents, using two case studies. Second, we address deficiencies in these and other studies, in an attempt to better understand the relative importance of each process in phenotypic variation.


Figure 1. Mutations in regulatory regions and coding regions and their possible outcomes. Changes in the regulatory region are shown in A2 to A4. A1 shows the structure of a hypothetical gene, and A1’ shows the tissues in which that gene is normally expressed. Red indicates expression of the hypothetical gene. A2: a mutation in the first regulatory element leads to depletion of expression in tissues 1 and 2 (A2’). A3: a mutation in the second regulatory element increases the expression of the gene in tissues 2 and 4 (A3’). A4: a mutation in the third regulatory element shifts the expression domain of the gene (A4’). Changes in the coding region are shown in B2 and B3. B1 depicts a hypothetical gene and B1’ its hypothetical protein. B2: a mutation in the coding region can inactivate the protein (B2’), which is likely the case for most non-synonymous mutations. B3: a mutation in another part of the coding region changes the function of the protein (B3’).


Case Study I: Forkhead Box P2 (FOXP2): An example of a structural mutation resulting in a phenotypic difference between apes and humans

Comparing the human genome with that of the chimpanzee, our closest living relative, is a relevant example for interpreting sources of between-species variation because of the close phylogenetic relationship and short divergence time of the two species. Humans and chimpanzees share about 98% of their nucleotide sequences (Chimpanzee Sequencing and Analysis Consortium 2005). Despite this genetic similarity, there are many phenotypic differences between the two species, including differences in cognition and behavior.

Phenotypic traits like speech and language separate humans from our closest relatives among the apes. While the capacity to communicate is likely conferred by many genes, FOXP2 has been found to affect the motor skills that allow us to control the mouth and throat to form a multitude of sounds, and is thus involved in the formation of speech. This gene codes for a transcription factor, and was originally discovered because speech and language deficiencies were found to be inherited in a human family (Vargha-Khadem et al. 2005). About half of the members of three generations of this family were affected. This led to the discovery that FOXP2 is linked to speech capability: the affected individuals carried a point mutation in the FOXP2 gene, which resides on chromosome 7 (Vargha-Khadem et al. 2005).

The FOXP2 protein contains 715 amino acids, including polyglutamine stretches that are prone to elevated mutation rates; however, those stretches do not appear to affect protein function (Enard et al. 2002). Disregarding those highly mutable regions, Enard and others (2002) found that the human FOXP2 protein differs from the mouse protein at three amino acid positions, whereas the proteins of two of our closest relatives, the chimpanzee and the gorilla, differ from the human sequence at two positions and from the mouse sequence at one (Figure 2).
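The pairwise counts above come from comparing aligned protein sequences position by position. The sketch below illustrates that comparison only. The three 10-residue sequences are hypothetical toy strings, constructed so that they reproduce the 3, 2 and 1 pattern reported by Enard et al. (2002); they are not real FOXP2 sequences, and the function name is our own.

```python
def amino_acid_differences(seq_a, seq_b):
    """List (position, residue_a, residue_b) for every mismatch between
    two aligned protein sequences. Positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# ASSUMPTION: hypothetical toy sequences, NOT real FOXP2 data.
toy = {
    "mouse":      "MKTAPNDLSE",
    "chimpanzee": "MKTAPNELSE",
    "human":      "MKNAPSELSE",
}

pairs = [("human", "mouse"), ("human", "chimpanzee"), ("chimpanzee", "mouse")]
for first, second in pairs:
    diffs = amino_acid_differences(toy[first], toy[second])
    print(f"{first} vs {second}: {len(diffs)} difference(s) {diffs}")
```

With real data, the same function could be applied to sequences taken from a published alignment, after masking the polyglutamine stretches as Enard and others did.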

The two substitutions that distinguish humans from chimpanzees are found at positions 303 (threonine-to-asparagine) and 325 (asparagine-to-serine), both in exon seven. The change at position 325 in humans is thought to affect phosphorylation target sites and secondary structure. Moreover, no polymorphisms at FOXP2 sites 303 or 325 were found in a geographically broad sample of 44 humans (Enard et al. 2002).

Using computer simulations, Zhang and others (2002) found that FOXP2 exhibited high rates of evolution in humans and that selection accelerated about 100,000 years ago. Further, there is evidence that a selective sweep, the elimination of variation by strong positive selection, took place in the hominid lineage (Enard et al. 2002).

This example of protein-coding changes producing phenotypic differences between species illustrates the importance of structural changes to phenotypic evolution. However, this interpretation is challenged by Carroll (2005), who argues that without knowledge of the function of FOXP2 in other tissues, regulatory control cannot be ruled out as an important process.


Figure 2. Silent and replacement nucleotide substitutions mapped on a phylogeny. Vertical bars represent nucleotide changes and grey boxes indicate amino-acid changes – reproduced from Enard et al. 2002.

Case Study II: Regulatory changes promoting phenotypic diversity in stickleback fish

Many examples of morphological evolution have arisen through mutations in regulatory regions (see Carroll 2008). In this case study, we point out the importance of a recent whole-genome study carried out in Gasterosteus aculeatus, commonly known as the threespine stickleback, that supports the aforementioned model of evolution (Jones et al. 2012).

The adaptation of sticklebacks from marine to freshwater environments was investigated by sequencing and characterizing the whole genomes of 21 individuals from marine and freshwater populations. The invasion of freshwater environments was followed by several morphological changes, such as reduced armor plates and lighter defensive spines, which arose repeatedly and over a short period of time (Jones et al. 2012). This morphological variation associated with different ecological conditions makes sticklebacks a model for testing which genomic changes are associated with morphological evolution and adaptation.

The study showed that about 0.5% of the genome clustered by ecotype rather than by geography, a pattern consistent with parallel reuse of standing genetic variation, whereas the rest of the genome reflected neutral divergence or geographic structure. The authors identified 64 marine–freshwater divergent regions, 41% of which mapped entirely to non-coding sequence and are therefore presumed to contain regulatory changes involved in the adaptation of those fish populations to freshwater environments (Fig 3-a). Moreover, genome-wide expression analysis suggests that genes with divergent expression between marine and freshwater fish are more common than expected in tissues that are strongly affected during adaptation to these environments (Fig 3-b).


Figure 3. Contributions of coding and regulatory changes to parallel marine–freshwater stickleback adaptation. a: Of the 64 marine–freshwater divergent regions in the genome with the strongest evidence of parallel evolution, 41% (26) mapped entirely to non-coding regions of the genome and presumably contain regulatory changes; b: Genome-wide expression analysis between marine and freshwater fish (observed, grey bars; expected, white bars). [adapted from Jones et al. 2012]

The EDA locus, which is involved in the development of armor plates, was one of the significantly divergent regions in the marine–freshwater comparison, and is therefore likely to be subject to strong selection. In addition, two Wnt genes, WNT7B and WNT11, which are involved in paracrine signaling in the kidney (Yu et al. 2009), also differed between the two ecotypes. Freshwater fishes produce more hypotonic urine than marine fishes, so the regulation of kidney function plays a role in the adaptation process (Jones et al. 2012). Overall, this study provides genome-wide evidence suggesting that changes in regulatory regions contribute to parallel evolution in a larger proportion than changes in coding regions. The evolution of regulatory regions therefore appears to be predominantly involved in the adaptive radiation of sticklebacks.

Limitations of current research

In our review, we noticed that important aspects were not considered in these and other studies. To better understand the roles of regulatory and coding region variation, we need to address the following caveats in our current approaches.

Taxonomic coverage
Most studies of regulatory variation need genetic tools to identify regulatory regions, and these tools are currently developed only for model species, which are not representative of biological diversity.

The genetic basis of phenotypic variation is generally dissected at the species level, be it within species or between closely related species, which limits the phylogenetic sampling to that level.

Comparing different taxonomic levels requires a common metric to assess differences in evolutionary histories, given the different temporal scales each taxon experiences.

Type of genes
The genes chosen for study are often restricted by our current knowledge and by the need for homology across species. In most cases, genes are selected from a candidate set, without considering other genes that affect the same phenotype.

Many highly conserved multi-gene families are used in comparative approaches, which biases results toward finding variation predominantly in regulatory regions. At the same time, early comparisons of protein evolution did not find conservation in the coding region because they were not searching for it, which has been referred to as the “Mendelian Blind Spot” (R. Amundson).

Another relevant aspect of gene selection is the functional role and the impact that gene has on development. Early development genes tend to be more conserved across taxa, while late development genes often have higher genetic variation (Gunter Wagner, pers. comm.).

Genes that are expressed in different cell types and tissues within an organism are likely to have regulatory variation. Therefore, comparing between species will bias towards regulatory differences as the source of phenotypic variation (Gunter Wagner, pers. comm.).

One way to circumvent this bias is to work at a genome-wide scale, despite our incomplete understanding of less studied (but still annotated) genes.

Type of traits
Due to the different goals of various groups studying model species, the traits we know most about are not necessarily homologous, or comparable on a genetic level.

As with the choice between early and late development genes, traits that develop early in ontogeny tend to be more conserved in the coding region.

Furthermore, we do not know whether the ecological role of a trait, its impact on a species' history, or its evolutionary history affects coding and regulatory regions differently.

Rates of Evolution & Independence of Coding-Regulatory Regions Evolution
Statistics are available to assess the type and rate of evolution of coding sequences, but comparable metrics do not exist for regulatory regions. This is a problem when trying to compare the impacts of mutations in these two kinds of region on phenotypic evolution.

The lack of a systematic comparison also hinders our ability to evaluate whether these regions show a coupled evolutionary history or have evolved independently of one another. Although methods for this are being developed (Castillo-Davis et al., 2004), they are sparsely applied.
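One reason such statistics exist for coding sequences is that the genetic code lets each codon difference be classified as silent (synonymous) or replacement (non-synonymous), as in Figure 2. The sketch below shows only that classification step, under stated assumptions: the two 12-nucleotide sequences are hypothetical toy strings, the function name is ours, and the counts are raw. Published rate statistics additionally normalize by the number of silent and replacement sites and correct for multiple substitutions, which this sketch does not do.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def count_codon_changes(seq_a, seq_b):
    """Return (silent, replacement) counts of differing codons between two
    aligned, in-frame coding sequences containing only A, C, G and T."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    silent = replacement = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        # Same amino acid means a silent change; otherwise a replacement.
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
        else:
            replacement += 1
    return silent, replacement


# ASSUMPTION: hypothetical toy sequences, not taken from any real gene.
ancestral = "ATGACCAACGGT"  # Met-Thr-Asn-Gly
derived = "ATGAACAGCGGC"    # Met-Asn-Ser-Gly
print(count_codon_changes(ancestral, derived))
```

No equivalent lookup table exists for regulatory sequence, which is part of why a comparable metric for regulatory regions is harder to define.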

Gene Expression
We lack much basic information about regulatory elements, such as their location in the genome (cis vs. trans), the degree to which they act on gene expression, and whether they affect the level or the pattern of expression. For example, regulatory elements can be reorganized in order, orientation, and spacing and still produce intact developmental patterns (reviewed in Castillo-Davis et al., 2004).

It is also important to have comprehensive methods that explore variation in both regulatory and coding regions in the same study, as well as to obtain functional evidence for the observed genetic variation. Although it is difficult to find regulatory regions at the sequence level, it is just as difficult to assess how coding region mutations affect protein function (Gunter Wagner, pers. comm.).


What is more important to phenotypic evolution? Our review emphasizes that mutations in coding and regulatory regions are both relevant, but with our current knowledge and tools we cannot reveal a general pattern. Additionally, we have raised many issues that should be addressed in future studies to thoroughly evaluate the impact of mutations in these regions on phenotypic evolution.
Recent discussions emphasize that phenotypic evolution most likely occurs through a small proportion of major-effect mutations, in which certain mutations have a greater impact on phenotypic evolution than others, whether they are structural or regulatory (McGregor et al., 2007; Nei, 2007). Finally, other levels of regulation are often not considered, including mutations that affect regulation at the levels of transcription, mRNA splicing and stability, and post-translational modification (Wray, 2007).


Adachi, Y., Hauck, B., Clements, J., Kawauchi, H., Kurusu, M., Totani, Y., Kang, Y.Y., Eggert, T., Walldorf, U., Furukubo-Tokunaga, K., et al. (2003). Conserved cis-regulatory modules mediate complex neural expression patterns of the eyeless gene in the Drosophila brain. Mech. Dev. 120, 1113–1126.

Carroll SB (2005) Evolution at Two Levels: On Genes and Form. PLoS Biology 3(7): e245.

Carroll, S. B. (2008) Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25–36

Castillo-Davis, C.I., Hartl, D.L., and Achaz, G. (2004). cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Research 14, 1530–1536.

Enard W, Przeworski M, Fisher SE, Lai CS, Wiebe V, Kitano T, Monaco AP, Paabo S (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418: 869–872.

Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69-87.

Haesler S, Wada K, Nshdejan A, Morrisey EE, Lints T, et al. (2004) FoxP2 expression in avian vocal learners and non-learners. J Neurosci 24: 3164–3175.

Ohno S (1970) Evolution by gene duplication. New York: Springer-Verlag. 160 p.

Jones, F.C. et al. (2012) The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484: 55-61.

Lai CSL, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP (2001) A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413: 519–523.

McGregor, A.P., Orgogozo, V., Delon, I., Zanet, J., Srinivasan, D.G., Payre, F., and Stern, D.L. (2007). Morphological evolution through multiple cis-regulatory mutations at a single gene. Nature 448, 587–590.

Nei, M. (2007). The new mutation theory of phenotypic evolution. Proceedings of the National Academy of Sciences of the United States of America 104, 12235.

Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

Sandmann, T., Girardot, C., Brehme, M., Tongprasit, W., Stolc, V., and Furlong, E.E. (2007). A core transcriptional network for early mesoderm development in Drosophila melanogaster. Genes Dev. 21, 436–449.

Wittkopp PJ, Kalay G. (2011) Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 13:59-69.

Wray, G.A. (2007). The evolutionary significance of cis-regulatory mutations. Nature Reviews Genetics 8, 206–216.

Vargha-Khadem F, Gadian DG, Copp A, Mishkin M (2005) FOXP2 and the neuroanatomy of speech and language. Nature Reviews Neuroscience 6: 131–138.

Yu, J. et al. (2009) A Wnt7b-dependent pathway regulates the orientation of epithelial cell division and establishes the cortico-medullary axis of the mammalian kidney. Development 136: 161–171.

Zhang J, Webb DM, Podlaha O (2002) Accelerated protein evolution and origins of human-specific features: Foxp2 as an example. Genetics 162: 1825–1835.


Elizabeth A. Sheets, MS Candidate, San Francisco State University
Trisha Spanbauer, PhD Candidate, University of Nebraska - Lincoln
Ronnie G. Gavilan, Postdoc, STRI, Panama
Mei-Hui Wang, PostDoc, University of California Irvine
Diogo B. Provete, PhD Candidate, Federal University of Goias, Brazil
Leila T Shirai, PhD Candidate, Gulbenkian Institute, Portugal
Thanos P. Mourikis, Undergraduate student, University of Athens, Greece
Filipe Silva, MS, Harvard University/Uppsala
