GREATER COMPLEXITY


Mutations can increase genetic complexity in addition to modifying existing genes. Genomes can change and gain new information in a number of ways, including gene duplication, segmental duplication of chromosomes, exon shuffling, lateral transfer, gene fusion, alternative splicing, and frameshift mutation. All of these have been observed in the human genome (Okamura, 2006).

EVOLUTIONARY MODEL

If evolution is correct, natural processes such as mutation can lead to an increase in complexity. New genes can be added by modifying duplicates of existing genes. Large, complex genes could be formed by uniting a series of smaller components.

CREATIONISM MODEL

If the creationism model is correct, mutations cannot produce any positive result, let alone an increase in complexity. If every gene were created in its current form and adapted for its current function, there is no reason to expect to find a series of modified duplicates of the same structure, nor shorter sequences from which larger genes could be derived.

INTELLIGENT DESIGN

If intelligent design is correct, then complex genes must appear suddenly in their final form rather than be modified over time from simpler sequences or duplicated from existing genes.


CHROMOSOMAL DUPLICATIONS
Not only do many genes seem to be present in multiple copies resulting from duplications, entire chromosome segments seem to be duplicated as well. For example, human chromosomes 7p15-12, 17q11-22, 12q12-13, and 2q31-34 contain not only homologous Hox clusters but also EGFR homologs. Human chromosomes 6p21, 9q33-34, 1q22-31, and 19p13 possess homologs of nuclear receptors, vav-like oncogenes, notch-like receptors, pbx genes, tenascin homologs, complement proteins, abl-like kinases, TNF homologs, and MHC I related clusters (Spring, 1997).


About 5% of the human genome is composed of large segmental duplications, many in the regions near centromeres and telomeres, which have occurred in the last 40 million years (Linardopoulou, 2005). About two-thirds of the segmental duplications in the human genome are shared with chimpanzees. The remaining third is unique to the human genome, much of it on chromosomes 5 and 15, and includes regions linked to human diseases such as Prader-Willi syndrome and spinal muscular atrophy. In chimpanzees these regions exist only as single-copy sequences (Cheng, 2005). About 5.2% of the human genome seems to have originated from segmental duplications (Zhang, 2006).


Intrachromosomal (but not interchromosomal) duplications are positively correlated with gene density, suggesting that chromosomal duplications are more likely to be maintained if they involve genes and thus are an important source of gene amplification. The human chromosomes with the lowest gene density also have the fewest intrachromosomal duplications (Bailey, 2002).


The gene families whose sequences have been increased through these chromosomal segment duplications include the immunoglobulin, serine protease, cytochrome P450, cytokine, globin, and major histocompatibility complex genes. As a result, these gene amplifications have probably contributed to a wide range of physiological processes including immunity, metabolism of drugs, pregnancy, and development. Human chromosomes 7, 9, 15, 16, 17, 22, and Y have the greatest numbers of interchromosomal and intrachromosomal duplications (Bailey, 2002).


DOMAIN AND EXON SHUFFLING
New genes can be created through exon shuffling. For example, an mRNA of alcohol dehydrogenase was inserted into an intron of the yellow emperor gene in Drosophila by retroposition. Its expression pattern and the rate of synonymous changes suggest it has acquired a new function (Long, 2003).

1) Not all parts of a protein are created equal: Protein Folds
Does a protein have to have a precise amino acid order to maintain its function? The answer is obviously no. If you survey the amino acid sequences of a certain protein in any one species (such as humans) or across species, it is evident that some parts of the protein are not as critical to the overall function as others (and thus more free to change). A protein may require as few as 7 amino acids to determine its tertiary structure (Reader, 2002). Many of the essential portions of proteins form a specific protein fold and it is this part of the protein that performs an essential function such as binding DNA, binding ATP, forming the active site of the enzyme, adding a phosphate group to a protein, etc. For example, the zinc finger fold binds DNA and is a requirement for all the zinc finger transcription factors, allowing them to bind DNA. The original zinc finger proteins have been duplicated hundreds of times to produce a superfamily of proteins which bind DNA. Variations between different members of the superfamily allow them to bind to specific regions of DNA while retaining the zinc finger protein fold as the essential part of the protein.


How many protein folds are there? Not as many as one might think. Only several hundred protein folds have been identified from all known gene sequences, and it is thought that the total number of protein folds throughout the kingdoms of life may be about 1,000. These folds may be central elements of different proteins: the average fold is known to be incorporated into over 100 different proteins, but some (such as the TIM barrel, the immunoglobulin fold, the Rossmann fold, the ferredoxin fold, and the helix-turn-helix bundle) are each incorporated into thousands of different proteins. The twenty-five most abundant folds are parts of 61% of proteins with structural homologues throughout all groups of life (Gerstein, 1997).


Although each major group of organisms possesses varying numbers of these folds in their genomes (for example, immunoglobulins for intercellular communication and zinc fingers for gene regulation are among the ten most abundant folds in animals but not plants or eubacteria), there are many folds which are shared. Of 229 protein folds identified in eukaryotes, 156 were shared with bacteria. Of 194 protein folds identified in animals (metazoans), 132 were shared with other eukaryotes. Of 181 protein folds identified in chordates, 131 were present in non-chordate animals. Thus, the functional portions of many human proteins are domains which evolved very early in the history of life and were subsequently duplicated and modified in more complex species (Gerstein, 1997).
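
The shared-fold counts quoted above can be turned into proportions with a few lines of arithmetic. The following is only an illustrative sketch: the counts are the ones given in the text (Gerstein, 1997), while the dictionary labels and the function name `shared_fraction` are hypothetical names chosen for this example.

```python
# Counts quoted in the text (Gerstein, 1997): (folds identified, folds shared).
# The labels and names below are illustrative, not taken from the source.
FOLD_COUNTS = {
    "eukaryote folds shared with bacteria": (229, 156),
    "animal folds shared with other eukaryotes": (194, 132),
    "chordate folds shared with non-chordate animals": (181, 131),
}


def shared_fraction(total, shared):
    """Return the fraction of a group's folds that also occur in the comparison group."""
    if total <= 0 or not 0 <= shared <= total:
        raise ValueError("counts must satisfy total > 0 and 0 <= shared <= total")
    return shared / total


for label, (total, shared) in FOLD_COUNTS.items():
    fraction = shared_fraction(total, shared)
    print(f"{label}: {shared}/{total} = {fraction:.0%}")
```

In each comparison roughly two-thirds or more of the folds are shared, which is the pattern the paragraph above describes.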


There are only a few thousand protein domains known in living organisms. Only 7% of vertebrate protein domains are unique to vertebrates (Liu, 2001; International Human Genome Sequencing Consortium, 2001). There are 21 small-molecule-binding domains (SMBDs) which bind to small intracellular molecules and are shared by at least 2 of the three main groups of organisms (eubacteria, archaea, and eukaryotes). These small domains have been incorporated into a number of unrelated proteins. For example, the T-OB domain has been incorporated into some ABC transporters, where it regulates the uptake of the substrate (Anantharaman, 2001). The human genome contains a large number of multidomain proteins in which a small number of ancestral domains have been shuffled and spliced to produce a diversity of proteins.

Exons can share homology over a number of diverse genes. For example, sequence similarities suggest a common origin of the X exon of human α1(II) collagen and the second exon of rat mannose-binding protein A; the 4th exon of human serum albumin and the 7th exon of human K6b epidermal keratin; the first exons of human apolipoprotein B-100 and EGF receptor; the 5th exon of mouse α2 type IV collagen and the first exon of human complement C1q B-chain; and the second exon of human TNF-β and the 3rd exon of rat asialoglycoprotein receptor. All known exons may be descended from only 1,000-7,000 ancestral exons (Dorit, 1990).
Since individual genes come in pieces, a cell can shuffle these pieces to produce a diversity of different proteins from one gene. In the following diagram, one pre-mRNA containing introns (magenta) and exons (various colors) can be spliced in different ways to produce a variety of mRNAs composed of different sets of exons. These mRNAs then encode different protein sequences despite their origin from the same gene. Alternative splicing of the original RNA transcript allows higher eukaryotes to generate a diverse repertoire of proteins. One of the reasons that the early estimates of the number of genes in the human genome (about 100,000) were so high was that so many genes produce alternative transcripts.
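
The combinatorial idea behind alternative splicing can be sketched as a toy model. This is not a model of real splicing regulation: it simply assumes that some exons are always retained and that every other exon may be either kept or skipped, with exon order preserved. The function name `splice_isoforms` and the exon labels E1 to E4 are hypothetical.

```python
from itertools import combinations


def splice_isoforms(exons, constitutive):
    """Yield mRNAs as tuples of exons, always in their original 5'-to-3' order.

    Toy assumption: exons listed in `constitutive` are always retained,
    and every other exon is independently kept or skipped.
    """
    optional = [exon for exon in exons if exon not in constitutive]
    for count in range(len(optional) + 1):
        for kept in combinations(optional, count):
            retained = set(kept) | set(constitutive)
            yield tuple(exon for exon in exons if exon in retained)


# One pre-mRNA with four exons; the first and last are always included.
exons = ["E1", "E2", "E3", "E4"]
isoforms = list(splice_isoforms(exons, constitutive={"E1", "E4"}))
for isoform in isoforms:
    print("-".join(isoform))
```

With two optional exons the single gene yields four distinct mRNAs, and each additional optional exon doubles that number in this simplified model. Real genes produce only a regulated subset of the possible combinations.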

The LDLR family of receptors share a number of molecular components. The green areas represent LDLR repeats (with some variation in the number of repeats) and the red areas represent EGF-precursor domains which occur in proteins outside the LDLR family.


1) Gene Duplications: Globins, Hox Genes, and Opsins
Not all humans have the same number of genes: some people have duplicated copies of genes of which others have only one copy (per haploid set). Duplications can occur through different mechanisms, but a common cause is unequal crossing over during recombination (recombination is a normal aspect of gamete formation). Often, duplicated genes will then exist as a cluster of genes in tandem on a single chromosome. There are many such examples of gene clusters in the human genome. Other chromosomal changes, such as translocation, can move one chromosome segment to another chromosome. When these mutations occur during the formation of gametes, they can affect the number of gene copies present in the offspring.
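
Unequal crossing over can be illustrated with a minimal sketch in which each chromosome is a list of gene names and the two homologs break at misaligned positions. The representation, the function name `unequal_crossover`, and the gene labels are all assumptions made for illustration.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Recombine two homologous chromosomes at (possibly misaligned) breakpoints.

    Each chromosome is a list of gene names. The left part of one homolog is
    joined to the right part of the other, giving two reciprocal products.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


parent = ["left_flank", "geneA", "right_flank"]

# Misaligned pairing: homolog A breaks after geneA, homolog B breaks before it.
duplicated, deleted = unequal_crossover(parent, parent, break_a=2, break_b=1)
print(duplicated)  # geneA now appears twice, in tandem
print(deleted)     # the reciprocal product has lost geneA
```

One product carries a tandem duplication and the other a deletion, which is why this mechanism tends to produce the gene clusters described above.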

There are many examples of recent duplications of genes in human genomes. For example, duplications of a cluster of genes on chromosome 1p36.2 have included the pseudogene MSPL. Human genomes can vary in the number of copies of this pseudogene they contain per haploid set, from 4 to 7 or more (van der Drift, 1999). The following groups of human genes are examples of gene family members which have arisen through tandem duplications. In the globin and opsin clusters, additional duplications are occurring, producing more family members in some humans than in others.

1) Red and Green Opsins

Most non-primate mammals and more primitive primates (prosimians and New World monkeys) have one opsin for long wavelengths of light. Old World monkeys and apes have two genes resulting from gene duplication: red and green. Some people have multiple copies of the red opsin gene (up to four) and the green opsin gene (up to seven).

 

GLOBINS
Not only have ancestral globin genes duplicated to form single copies of globin genes dispersed in the genome (myoglobin, neuroglobin), there are also two clusters of globin genes in the human genome: alpha and beta.

ALPHA CLUSTER
5'-----zeta-------zeta pseudogene-------alpha pseudogene------alpha 2------alpha 1-----theta----3'

BETA CLUSTER
5'---------epsilon----------gamma G---------gamma A----------eta------------delta----------beta---------3'

There are variations in the number of globin genes from the alpha and beta clusters. There are 3 copies of zeta genes in some Melanesians and Polynesians. A common variation of the zeta pseudogene makes it a functional gene. While most adults have 2 alpha hemoglobin genes, common variants include the possession of 1 or 3 alpha genes. Melanesians only have 1 alpha gene and, among African-Americans, chromosomes with one alpha gene are about as common as those with two. The function of the theta gene isn't known although it is expressed in red blood cell lineages in fetal mammals (including humans at about 5 weeks). There have been cases where it has been deleted but no effects of this are known. Most people have two gamma genes (gamma G and gamma A) but some people have 3 or even 5 (there was even one family that had 4 copies on another chromosome). Delta hemoglobin can complex with alpha hemoglobin in adults and it can be considered to be a second beta hemoglobin gene. There are examples of fusion hemoglobin genes in which the delta and beta genes have fused into one gene (and even a case of a delta-beta-delta fusion gene).
Once genes are duplicated to produce a gene family, some members can be modified for new functions. For example, an olfactory receptor gene is expressed in the notochord of the developing chick embryo, suggesting that it has been adapted for a novel signaling function (Nef, 1997).


2) HOX CLUSTERS
The importance of HOX clusters in embryonic development cannot be over-emphasized. These gene clusters appear to be multiple tandem duplications. In vertebrates, the original cluster was duplicated to produce four clusters, from which a few members were lost.

 

(Figure panels: sea urchin Hox cluster; chordate Hox cluster; subsequent duplication and gene loss produce the clusters as they exist today.)

Duplications of the same ancestral genes can occur separately in different lineages. Although tunicates and vertebrates both have duplicate gonadotropin releasing hormone receptors, they have arisen from independent duplications (Kusakabe, 2002). A number of genes have been duplicated in amphioxus since its separation from other chordate lineages. These genes include a 14th gene in the Hox cluster, a duplicate Evx (whose function has diverged from the standard function of Evx), and a duplicate Emx gene (Minguillon, 2002).
Protocadherins are the largest subfamily of cadherins and are highly expressed in the nervous system. Protocadherin homologs are known in Drosophila, and the existence of 3 clusters of protocadherin genes on human chromosome 5q31 suggests that gene duplication has produced some of the diversity of the gene subfamily (Wu, 2000). Mammalian genomes analyzed to date possess about 70 protocadherins organized in large clusters (chromosomes 5q31, 13q21, and Xq21 in humans). The union of constant and variable regions occurs in the α and γ protocadherins on 5q31. It seems that the clusters of protocadherins are unique to vertebrates (Frank, 2002).


In humans, mice, and rats, the CNR/Pcdhα family of protocadherins exists as a tandem set of genes occupying between 230 and 250 kilobases (Yanase, 2004; Sugino, 2000). Although humans lack a CNR3 gene which is known in mice, there is a homologous pseudogene in the human genome (Yanase, 2004; Sugino, 2000; Hamada, 2001).
Carp are tetraploid and possess two copies of the myc gene. The nucleotide sequence of one is evolving faster than the other, suggesting that it is adapting to a new function (Futami, 2001). It seems that extensive gene duplication occurred before the origin of animals which gave rise to new domains and domain shuffling (Muller, 2001a).
Gene duplications can occur in related species. Mice possess about 80 more genes and 55 more pseudogenes of the V1r family than rats. Almost all the V2r genes resulted from duplications after the lineages separated (Yang, 2005).


GENE FUSION
Human hexokinase enzymes resulted from the fusion of two replicas of an ancestral hexokinase. One of the subunits of the resultant enzyme was modified over time to serve a regulatory function.

The Tre2 oncogene only occurs in hominoid primates and seems to have resulted from the fusion of two genes: the highly conserved USP32 and the TBC1D3 gene which has undergone amplification in primates (Paulding, 2003).

Most ABC transporters are composed of 5 separate protein domains: 2 ATP binding regions (or nucleotide-binding regions, NBDs), 2 transmembrane regions, and a receptor. In prokaryotes, most of these domains are encoded by their own separate genes, although genes for these separate domains can be located near each other in operons. Other transporters are encoded by genes which code for two or more of these domains, resulting from a fusion of these individual genes. As a result, some ABC transporters are encoded by 4 genes, 3 genes, 2 genes, or 1 single gene. In humans, "full" ABC proteins possess 2 transmembrane and 2 ATP-binding regions encoded by one single gene and are usually located on the cell membrane. "Half" ABC proteins possess only one of each of these domains and thus two "half" proteins are required to form a functional channel. These "half" proteins are typically expressed in the membranes of organelles.


Three ABC transporters: the HisQMP2 transporter of E. coli, made of 4 subunits encoded by 3 genes; the Drosophila eye pigment transporter, formed by the products of 2 genes (each encoding half a transporter); and the chloride ion transporter responsible for cystic fibrosis (one gene).


Voltage regulated ion channels probably exist in all living organisms (Anderson, 2001). The structure of the ancestral ion channel seems to have been a small protein with two transmembrane regions on either side of a pore-forming region. This is the structure of the protein in many prokaryotic channels and eukaryotic inward rectifier channels. Before the split between prokaryotes and eukaryotes, four additional transmembrane regions were added to an ancestral channel, producing a protein with a pore-forming region and 6 transmembrane regions. Some channels resulted from a fusion of a 6 transmembrane segment channel and a 2 transmembrane segment channel (Anderson, 2001). Most potassium channels are formed by the interaction of four separate subunits composed of 6 transmembrane regions. All sodium and calcium channels are formed by a single protein which is formed by four tandem regions composed of 6 transmembrane regions (Anderson, 2001).

Potassium channels are ancient proteins and are conserved between archaea and eukaryotes (Jiang, 2002a and 2002b). The simplest channels form tetramers using four subunits encoded by the same gene (Yellen, 2002; Fleishman, 2004). As pictured below, the potassium channel is not only the simplest of the voltage regulated ion channels; its 6-transmembrane region structure (with the fourth unit being the voltage-regulated portion) is also the prototype for the more complex sodium and calcium channels, which are composed of four separate homologous regions. The potassium, sodium, and calcium voltage regulated channels are pictured below (after Darnell, p.782).
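
The relationship between the two architectures described above can be checked with simple bookkeeping. In this sketch the class `Channel` and its field names are hypothetical; the numbers (four subunits of 6 transmembrane regions for potassium channels, one protein with four tandem 6-transmembrane regions for sodium and calcium channels) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Illustrative description of a voltage regulated channel's architecture."""
    name: str
    subunits: int             # separate polypeptides in the assembled channel
    regions_per_subunit: int  # homologous regions within each polypeptide
    tm_per_region: int        # transmembrane regions in each homologous region

    def total_tm_regions(self):
        """Total transmembrane regions in the assembled channel."""
        return self.subunits * self.regions_per_subunit * self.tm_per_region


potassium = Channel("potassium", subunits=4, regions_per_subunit=1, tm_per_region=6)
sodium = Channel("sodium", subunits=1, regions_per_subunit=4, tm_per_region=6)
calcium = Channel("calcium", subunits=1, regions_per_subunit=4, tm_per_region=6)

for channel in (potassium, sodium, calcium):
    print(channel.name, channel.total_tm_regions())
```

Both arrangements give the same total of 24 transmembrane regions, consistent with the idea that the single-protein sodium and calcium channels correspond to four potassium-type units joined in tandem.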


Collagen is the most abundant protein in the human body. The basic organization of collagen is a set of three intertwined helices (which may represent the products of one, two, or three genes). Each turn of the helix contains three amino acids, and every third amino acid is a glycine residue (glycine has the smallest side chain and is the only amino acid which can fit inside the triple helix). Proline is the second most common amino acid in collagen, and many parts of the chain are repeating units of Gly-Pro-X (glycine, proline, followed by any amino acid). It is thought that the "primordial unit" of collagen is a 54 base pair DNA sequence encoding 6 Gly-X-Y amino acid triplets (X is usually proline and Y hydroxyproline). This primordial unit was then duplicated to produce multiple exons in collagen genes. In the (I) collagen gene, there are 21 exons which consist of this primordial unit, 9 exons which consist of 2 primordial units, 1 exon of three primordial units, and 10 exons which consist of 1 or 2 primordial units with 9 base pairs deleted (Darnell, p. 907). The α2(I) gene possesses 42 exons (of 52) which encode these repeats, for a total of 338 repeats (Brown, p. 473). Thus the entire gene is based on duplications of a much smaller functional unit. Each exon encodes a complete set of triplet amino acid repeats.
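
The exon sizes described above can be checked against the claim that every exon encodes whole Gly-X-Y repeats. The sketch below assumes only the figures given in the text (a 54 base pair primordial unit and 9 base pair deletions) plus the standard 3 bases per codon; the constant and function names are hypothetical.

```python
PRIMORDIAL_UNIT_BP = 54  # size of the primordial unit, from the text
BASES_PER_CODON = 3      # standard genetic code
CODONS_PER_REPEAT = 3    # one Gly-X-Y repeat is three amino acids


def gly_x_y_repeats(exon_bp):
    """Return the number of complete Gly-X-Y repeats encoded by an exon.

    Raises ValueError if the exon does not encode a whole number of repeats.
    """
    codons, leftover_bases = divmod(exon_bp, BASES_PER_CODON)
    repeats, leftover_codons = divmod(codons, CODONS_PER_REPEAT)
    if leftover_bases or leftover_codons:
        raise ValueError(f"{exon_bp} bp does not encode whole Gly-X-Y repeats")
    return repeats


# Exon classes named in the text: 1, 2, or 3 units, and 1 or 2 units minus 9 bp.
exon_sizes = [
    PRIMORDIAL_UNIT_BP,
    2 * PRIMORDIAL_UNIT_BP,
    3 * PRIMORDIAL_UNIT_BP,
    PRIMORDIAL_UNIT_BP - 9,
    2 * PRIMORDIAL_UNIT_BP - 9,
]
for size in exon_sizes:
    print(f"{size} bp exon -> {gly_x_y_repeats(size)} Gly-X-Y repeats")
```

A 9 base pair deletion removes exactly one Gly-X-Y repeat (3 codons), so even the shortened exons stay in frame and still encode a complete set of repeats.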

INTRON CAPTURE
The phosphoglycerate kinase in trypanosomes has acquired a unique amino acid sequence by the capture of an intron. No new function has yet been acquired by this sequence (Golding, 1994).

PROCESSED PSEUDOGENES
Although most processed pseudogenes seem to be incapable of function (because the reinserted mRNA lacks the portions of the original gene which initiated transcription), there are cases in which some are activated and contribute to phenotype. It has been estimated that about 1% can be activated after retroposition. Comparisons of human and mouse genomes suggest that the rate of activated pseudogenes being added to the genome is comparable to that of new gene duplications (Sakai, 2007).


DUPLICATIONS AND CHANGES IN EXPRESSION PATTERNS
Mutations can alter the sequences of existing genes and change the expression pattern of these genes throughout an organism's body. If evolution is correct, then significant aspects of an organism's complexity could be attributed to these types of changes. If creationism and intelligent design are correct, minor modifications such as these could never lead to significant increases in complexity.

INSECTS
Significant Hox differences are observed when comparing Drosophila to onychophorans (a sister group of arthropods), such as the expression pattern of Ubx and its ability to repress Dll (Distalless); such modifications might have been crucial in the evolution of the arthropod or insect body plans. The Ultrabithorax gene of onychophorans, when expressed in Drosophila, is capable of producing some, but not all, of the developmental changes induced by Drosophila Ultrabithorax. During the course of insect evolution, modifications occurred in Ultrabithorax function, expanding its activity (Grenier, 2000). While insects, onychophorans, and other arthropods possess the Ultrabithorax homeobox protein, there is a regulatory domain at the carboxy end of the insect protein which is absent in onychophorans and other arthropods.


The onychophoran Ultrabithorax protein, when expressed in Drosophila, can induce some, but not all, of the developmental changes induced by the Drosophila Ultrabithorax protein. The domain of the protein which was added in the insect lineage may have been involved in the specialization of the higher insect form. The most primitive insects (such as collembolans) lack this addition to Ultrabithorax and differ from higher insects in that they possess limbs on their abdomens and the last thoracic and first abdominal segments are not well differentiated from each other (Galant, 2002).
Hox gene expression in all arthropods divides the head into 6 segments, demonstrating a common ancestry (this is true even in spiders, which until recently were thought to be missing the first antennal segment) (Damen, 1998).


In insects, the expression pattern of Hox genes was modified to result in shorter, non-overlapping segments. Although the same Hox genes are expressed in crustacean and insect heads, there are differences in the expression of Hox genes such as lab, pb, and Dfd. Insect and crustacean heads possess modified versions of the same ancestral structures (Abzhanov, 1999). Comparisons of Hox gene expression patterns in the heads of chelicerates and other arthropods show that divergent head structures are composed of equivalent head segments. Chelicerates retain their deuterocerebral segment, which is not obvious from morphological studies (Telford, 1998).


The difference in function of the first thoracic limbs (legs in insects; maxillipeds for feeding in crustaceans) may be the result of observed differences in Hox gene expression between these groups in these appendages (Ubx and AbdA). In woodlice, the first thoracic appendage begins development as a leg and then transforms into a maxilliped; this change is associated with the expression of Scr (Averof, 1997b). Specializations in insect legs have occurred through changes in the temporal and spatial expression of Hox genes such as Ultrabithorax and abdominal-A (Mahfooz, 2004). A number of changes have been observed in invertebrates, such as tandem duplications and the splitting of Hox clusters through the insertion of other genes (Wagner, 2003).


VERTEBRATES
The vertebrate homologs Evx1 and Evx2, which resulted from a gene duplication in early vertebrate ancestry, also function in development. In vertebrates (but not Amphioxus), the role of Evx in the nervous system is augmented to include expression in the midbrain-hindbrain boundary (MHB), which is an important organizing region for the vertebrate brain (Ferrier, 2001).
The evolution of vertebrate jaws seems to have involved a modification of the expression pattern of the homeobox gene Dlx. In lampreys, Dlx is expressed in the pharyngeal arches where cartilage forms, but in jawed vertebrates there are two different regions of expression in the dorsal and ventral regions of these arches. In gnathostomes, Dlx1 and 2 are expressed throughout the region of the developing jaws, Dlx5 and 6 are expressed in the lower jaw and hyoid, and Dlx3 and 7 are expressed in the ventral lower jaw and hyoid. This change in expression pattern might underlie the ability of the ventral arches and the lower jaws to move (Neidert, 2001; Stock, 1996; Graham, 2002).