GENETIC EVIDENCE FOR EVOLUTION FROM A VARIETY OF GENE FAMILIES
One of the most significant conclusions from the analysis of genomes is the major role that gene duplication has played in creating gene families (Danielson, 1999). As discussed in other chapters, a large percentage of the genes in modern genomes appear to be homologous derivatives of ancestral genes, produced by multiple duplications and subsequent modification. This process continues today, and individuals can differ in the number of copies they carry of particular genes (such as those for globins and opsins). Such duplications offer an evolutionary opportunity to gain new functions. Extensive gene duplication seems to have occurred before the origin of animals, giving rise to new domains and to domain shuffling (Muller, 2001a).
Beyond the gene families described in previous chapters, there are many more. Nothing requires the functions performed by the members of a family to be carried out by similar molecules at all; in principle, the human genome could consist of tens of thousands of unique genes, each unrelated to the others. Instead, gene families consisting of modified duplicates of ancestral genes are the norm when genomes are analyzed.
For example, across the three domains of life, more than 2000 cytochrome P450 enzymes are known, and they have been classified into more than 235 families. These proteins are present in virtually all eukaryotic cells, where they are typically attached to membranes (such as those of the ER or mitochondria) (Williams, 2004; Jacobs, 2003).
Collagen is the most abundant protein in the human body. There are at least 19 kinds of vertebrate collagen, encoded by at least 33 genes. The “primordial unit” of collagen is thought to be a 54 base pair DNA sequence encoding 6 Gly-X-Y amino acid triplets (X is usually proline and Y hydroxyproline). Duplication of this primordial unit then produced multiple exons in collagen genes. In the α(I) collagen gene, 21 exons consist of one primordial unit, 9 exons consist of 2 primordial units, 1 exon consists of 3 primordial units, and 10 exons consist of 1 or 2 primordial units with 9 base pairs deleted (Darnell, p. 907). The α2(I) gene has 52 exons, 42 of which encode these repeats, for a total of 338 repeats (Brown, p. 473).
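The primordial-unit arithmetic can be checked with a short sketch. It assumes only the numbers stated above (6 Gly-X-Y triplets per unit, 3 base pairs per codon, a 9 base pair deletion); the constant and function names are illustrative. It suggests why the model is self-consistent: every predicted exon size (54, 108, and 162 bp, or 45 and 99 bp with the deletion) is a multiple of 9 bp, so the deletion removes exactly one Gly-X-Y triplet and leaves the reading frame intact.

```python
# Sketch of the collagen "primordial unit" arithmetic, using only the
# numbers given in the text; constant and function names are illustrative.

BP_PER_CODON = 3
RESIDUES_PER_TRIPLET = 3      # one Gly-X-Y triplet = 3 amino acids
TRIPLETS_PER_UNIT = 6         # per primordial unit
DELETION_BP = 9               # deletion found in some exons

BP_PER_TRIPLET = RESIDUES_PER_TRIPLET * BP_PER_CODON   # 9 bp
UNIT_BP = TRIPLETS_PER_UNIT * BP_PER_TRIPLET           # 54 bp


def exon_length(units: int, deleted: bool = False) -> int:
    """Length in bp of an exon built from `units` primordial units."""
    length = units * UNIT_BP
    return length - DELETION_BP if deleted else length


def triplets_encoded(length_bp: int) -> int:
    """Number of whole Gly-X-Y triplets encoded by an exon of this length."""
    if length_bp % BP_PER_TRIPLET:
        raise ValueError("length does not encode a whole number of triplets")
    return length_bp // BP_PER_TRIPLET


# Exon classes listed in the text for the α(I) gene: description -> number of exons.
EXON_CLASSES = {
    "1 unit": 21,
    "2 units": 9,
    "3 units": 1,
    "1 or 2 units minus 9 bp": 10,
}

if __name__ == "__main__":
    for units in (1, 2, 3):
        bp = exon_length(units)
        print(f"{units} unit(s): {bp} bp, {triplets_encoded(bp)} triplets")
    for units in (1, 2):
        bp = exon_length(units, deleted=True)
        print(f"{units} unit(s) minus 9 bp: {bp} bp, {triplets_encoded(bp)} triplets")
    print("repeat-encoding exons:", sum(EXON_CLASSES.values()))
```

Run as a script, it prints each exon size with its triplet count, then the total of 41 exons across the four classes listed for the α(I) gene.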
The human genome includes at least 62 genes for intermediate filaments (Karabinos, 2004). The intermediate filament gene family can be divided into five groups: two groups of keratins, vimentins, neurofilaments, and nuclear lamins. While all animals possess lamins, the lamins found in vertebrates have additional domains and an additional 42 amino acids in one ancestral domain (Reimer, 1998).
There seem to be around 20 actin genes in the human genome. For example, ACTA1 is expressed in skeletal muscle, and its mutations cause myopathy. ACTB is expressed in cells other than muscle, and there are about 20 ACTB pseudogenes (for example, ACTBP1-5). Platelet actin is expressed in platelets. Myosins also form a gene family. MYL2 is expressed in cardiac and smooth muscle, and its mutations cause hypertrophic cardiomyopathy. MYL5 is expressed in the retina, cerebellum, basal ganglia, and fetal skeletal muscle.
The secretory calcium-binding phosphoprotein (SCPP) family includes three proteins of the enamel matrix (amelogenin, enamelin, and ameloblastin), five proteins involved in the formation of dentin and bone (dentin sialophosphoprotein, dentin matrix acidic phosphoprotein 1, integrin-binding sialoprotein, matrix extracellular phosphoglycoprotein, and secreted phosphoprotein 1), caseins, and several salivary proteins. In humans, most of these genes are clustered on chromosome 4q13. This gene family seems to have arisen from the SPARC gene. SPARC, which is expressed in fish bone and scales, may have been the first gene expressed in vertebrate mineralized tissue. In invertebrates and jawless fish, SPARC is expressed where the epithelium meets the underlying connective tissue.
Members of the serine protease gene family have been adapted to a wide variety of functions. Acrosin (ACR) is the major protease in the acrosome of spermatozoa. Airway trypsin-like protease is made by serous glands in the bronchi and trachea and is present in the mucus lining the respiratory tract, at least in people with chronic disease. Chymotrypsin is a digestive enzyme; it also has a hypocalcemic (calcium-lowering) function that continues even when its protease function is blocked. Granzyme B is expressed in activated T cells and is involved in the apoptosis of target cells. Three serine proteases called MASPs (MBL-associated serine proteases) activate the complement cascade after interacting with mannose-binding lectin (MBL) or ficolins bound to a microbial surface. They are homologous to the complement factors C1r and C1s and share a common structural organization with them, including serine protease domains (Presanis, 2003).
HOMOLOGIES IN DISTANTLY RELATED ORGANISMS
If humans were not evolutionarily related to other organisms, there would be no reason to expect homologs of human genes to exist in them. Yet human cells use many of the same molecular mechanisms found in the most primitive cells.
Ribosomes, the sites of protein synthesis, are conserved across all eukaryotic and prokaryotic organisms and are essential parts of every cell.
A bacterium may obtain its nutrition from decomposing matter in the soil, and a human skeletal muscle cell from the digested products of lasagna delivered by the bloodstream, yet both begin to metabolize their food molecules through the same chemical reactions, catalyzed by the same enzymes. This series of reactions is called glycolysis. Glycolysis requires neither a specialized part of the cell nor oxygen (neither of which would have been available to the most primitive cells).
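The individual reactions of glycolysis are standard textbook material rather than something taken from the sources cited here. The sketch below records the ten steps with their usual enzyme names (a summary under that assumption, not this chapter's own table) and tallies ATP and NADH per glucose. The last five steps run twice because each glucose is split into two three-carbon molecules; `Step` and `net_yield` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    enzyme: str
    substrate: str
    product: str
    atp: int = 0      # ATP change per pass (negative = consumed)
    nadh: int = 0     # NADH produced per pass
    passes: int = 1   # times the step runs per glucose


GLYCOLYSIS = [
    # Step 1 uses glucokinase instead of hexokinase in some cells and organisms.
    Step("hexokinase", "glucose", "glucose-6-phosphate", atp=-1),
    Step("glucose-6-phosphate isomerase", "glucose-6-phosphate", "fructose-6-phosphate"),
    Step("phosphofructokinase", "fructose-6-phosphate", "fructose-1,6-bisphosphate", atp=-1),
    Step("aldolase", "fructose-1,6-bisphosphate",
         "glyceraldehyde-3-phosphate + dihydroxyacetone phosphate"),
    Step("triose phosphate isomerase", "dihydroxyacetone phosphate", "glyceraldehyde-3-phosphate"),
    # From here on, each glucose yields two molecules to process.
    Step("glyceraldehyde-3-phosphate dehydrogenase", "glyceraldehyde-3-phosphate",
         "1,3-bisphosphoglycerate", nadh=1, passes=2),
    Step("phosphoglycerate kinase", "1,3-bisphosphoglycerate", "3-phosphoglycerate",
         atp=1, passes=2),
    Step("phosphoglycerate mutase", "3-phosphoglycerate", "2-phosphoglycerate", passes=2),
    Step("enolase", "2-phosphoglycerate", "phosphoenolpyruvate", passes=2),
    Step("pyruvate kinase", "phosphoenolpyruvate", "pyruvate", atp=1, passes=2),
]


def net_yield(steps):
    """Return (net ATP, NADH) per glucose for the listed steps."""
    atp = sum(s.atp * s.passes for s in steps)
    nadh = sum(s.nadh * s.passes for s in steps)
    return atp, nadh


if __name__ == "__main__":
    for number, step in enumerate(GLYCOLYSIS, start=1):
        print(f"{number:2d}. {step.substrate} -> {step.product} ({step.enzyme})")
    print("net ATP, NADH per glucose:", net_yield(GLYCOLYSIS))
```

The tally returns (2, 2): two ATP are spent in steps 1 and 3, four are made in steps 7 and 10, and two NADH come from step 6.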
Hemoglobins are heme-containing proteins that reversibly bind oxygen. They are found in bacteria, fungi, higher plants, most invertebrates, and all vertebrates. All of them belong to the same globin gene family, having evolved from a single ancestral protein of about 17 kDa.
Tunicates possess intermediate filament (IF) proteins homologous to the IF subfamilies of keratins, vimentins, and neurofilaments (Karabinos, 2004). Keratin is expressed in the epithelia of lancelets, beginning in the larval stage (Karabinos, 2001). Lancelets possess multiple intermediate filament proteins, which seem to represent a relatively unspecialized ancestral condition (Reimer, 1998).
The actin superfamily includes actin, actin-related proteins (ARPs), heat shock proteins and heat shock protein cognates (HSP60, HSC70), sugar kinases (such as hexokinase), and a number of ATP-binding proteins known in bacteria (such as MreB, FtsA, and StbA) (Kandasamy, ). Actin is conserved across the diverse classes of eukaryotes, as are the ARPs. Some ARPs (ARP1-3) function in the cytoskeleton, where two of them (ARP2 and ARP3) are required for the polymerization of actin monomers. A number of ARPs function in the nucleus, where some have a role in remodeling chromatin. Because this nuclear role is conserved across eukaryotes, ancestral eukaryotes probably already used members of the actin family in the nucleus. ARP1, 2, 3, 4, 5, 6, 8, and 10 are known in humans and in diverse other eukaryotes, including plants and/or yeast. ARP7 and ARP9 are known only in yeast (Blessing, 2004).
The myosin gene family is known in at least 6 phyla of protists, and protist myosins include homologs of Myosin I-A, I-B, I-C, I-D, I-E, I-F, I-K, II, IV, VII, XI, XIV, Myo1, MyoJ, and MyoM. Protist myosins function in cell shape, motility, cytokinesis, nuclear division, and the movement of membranes and vacuoles (Gavin, 2001).
Because lamprey and hagfish cartilages are now known to lack collagen as a major component, any definition of true cartilage broad enough to include the skeletal tissues of jawless fish also includes a number of invertebrate cartilages. Cartilage or cartilage-like tissue is known in cnidarians, annelids, arthropods, and mollusks (Robson, 1999). Squid cartilage does contain collagen and resembles the hyaline cartilage of vertebrates, although its collagen is not the type II collagen found in vertebrate hyaline cartilage (Robson, 1999).
In E. coli, RecA is a protein that mediates recombination. In eukaryotes, the homologous protein is important both in DNA repair and in the recombination that occurs during meiosis.
The alcohol dehydrogenase which the liver produces belongs to the same gene family as yeast, plant, and prokaryotic alcohol dehydrogenases (Holmes, 1996).
THE GENE CLADOGRAM AND “IRREDUCIBLE COMPLEXITY”
One of the main arguments in “Intelligent Design” is that of “irreducible complexity.” Advocates of Intelligent Design have argued that molecular systems in living organisms involve multiple interacting genes and that such complex pathways could not have evolved gradually. Genetic analysis strongly refutes this.
Human proteins do not seem to have been “designed” for unique functions in humans; instead, they use functional folds known as domains, which are widely distributed throughout living organisms. Although each major group of organisms has a different distribution of these folds (for example, immunoglobulins for intercellular communication and zinc fingers for gene regulation are among the ten most abundant folds in animals but not in plants or eubacteria), many folds are shared. Of 229 protein folds identified in eukaryotes, 156 were shared with bacteria. Of 194 protein folds identified in animals (metazoans), 132 were shared with other eukaryotes. Of 181 protein folds identified in chordates, 131 were present in non-chordate animals (Gerstein, 1997).
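The fold counts reported from Gerstein (1997) are easier to compare as proportions. This short sketch simply divides the numbers given in the text; the names `FOLD_COUNTS` and `shared_fraction` are illustrative. Roughly 68%, 68%, and 72% of the folds in each group are shared with the comparison group.

```python
# Fold counts as reported in the text (Gerstein, 1997):
# (comparison, folds identified in the group, folds shared with the other group)
FOLD_COUNTS = [
    ("eukaryotes shared with bacteria", 229, 156),
    ("animals shared with other eukaryotes", 194, 132),
    ("chordates shared with non-chordate animals", 181, 131),
]


def shared_fraction(total: int, shared: int) -> float:
    """Fraction of a group's folds that also occur in the comparison group."""
    if not 0 <= shared <= total:
        raise ValueError("shared folds must lie between 0 and the total")
    return shared / total


if __name__ == "__main__":
    for label, total, shared in FOLD_COUNTS:
        print(f"{label}: {shared}/{total} = {shared_fraction(total, shared):.0%}")
```

Even at the chordate level, where one might expect the most novelty, nearly three quarters of the folds were already present in other animals.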
Modern cells still depend on functional RNA molecules for many processes that proteins could conceivably perform. The ribosome should be viewed as a ribozyme, an RNA enzyme. Translation factors (such as EF-G) are not strictly required for translation. The ribosome could have formed in precells that depended primarily on functional RNA molecules but had begun to use amino acid chains, perhaps as a way of stabilizing RNA molecules (Woese, 2001; Cech, 2000; Nissen, 2000). It is possible that the ribosome originally functioned in the polymerization of RNA and was later modified to enable translation. Analysis of the tRNA binding sites of ribosomes suggests that they might once have functioned in the polymerization of RNA molecules.
Even the most central mechanisms of modern cell physiology seem to have evolved in stages. The last common ancestor of all modern life had most, but not all, of its translational machinery in the forms found in modern organisms. Some ribosomal proteins are found only in bacteria, others only in eukaryotes and archaea, and others only in eukaryotes. Modern cells probably originated once the translational mechanisms were established (Woese, 2002).
Organisms in different domains of life can use nonhomologous enzymes to catalyze the same steps of DNA replication. Nonhomologous enzymes serve as DNA topoisomerase I, DNA topoisomerase II, DNA primase, DNA-dependent RNA polymerase, and thymidylate synthase in different organisms. DNA polymerases are encoded by several protein families; families A and B are related to RNA polymerase, reverse transcriptase, and the DNA polymerases of viruses (Forterre, 2002).
Although all eukaryotes use the same glycolytic pathway to degrade glucose, nonhomologous enzymes may catalyze the same step in different organisms. For example, several of the most primitive amitochondriate eukaryotes use glucokinase and glucose-phosphate isomerase enzymes similar to those of some bacteria in the early steps of glycolysis, rather than the hexokinase typical of higher eukaryotes (Henze, 2001; Wu, 2001).
Some of the Krebs cycle enzymes used in aerobic respiration, such as aconitase and isocitrate dehydrogenase, are also known in anaerobic bacteria (Baughin, 2002). Aerobic respiration in eukaryotes seems to have developed from similar, although simpler, structures in bacteria.
In ATP synthase, the bacterial F0 particle has 3 kinds of subunits (a, b, and c), while that of eukaryotes has these subunits plus 2-5 more. The bacterial F1 particle is composed of 5 kinds of subunits.
Cytochrome oxidase must have preceded the rise in atmospheric oxygen, and this enzyme is known in all three domains of living organisms. Iron rusted by oxygen in ancient sediments (red beds) is known from 2.4-2.6 billion years ago, and the modern concentration of oxygen was reached around 500 million years ago. The original enzymes may have functioned in denitrification, since denitrifying and oxygen-reducing chains are similar. The first oxidase, FixN, reduced oxygen rather than joining two NO molecules (Baltscheffsky, p. 284-90).
Hemoglobin is similar in structure to peroxidase, which removes dangerous forms of oxygen. In nematodes, hemoglobin can function both as a peroxidase and to remove NO. Bacterial flavohemoglobin can remove NO by reacting it with oxygen to form nitrate. When oxygen is not present, flavohemoglobin removes NO by promoting its conversion to N2O. These molecules therefore offer protection from NO under both aerobic and anaerobic conditions. On the ancient earth (and in communities at deep sea vents), NO is abundant while oxygen is scarce. Mycobacterium tuberculosis uses hemoglobin to protect itself from the reactive nitrogen molecules of the host’s defenses during infection. The globins of primitive worms can protect against reactive forms of oxygen. Given that hemoglobin is widespread in both prokaryotes and eukaryotes, it must have been used early in earth’s history, before substantial amounts of oxygen had accumulated in the atmosphere. Other globin molecules bind hydrogen sulfide. The ability to bind oxygen may be a secondarily acquired ability of this gene family (Hausladen, 2001; Couture, 1999; Lecomte, 2005; Wu, 2003b).
The lens of the vertebrate eye produces a number of water-soluble proteins called crystallins, which together make up 80-90% of the protein in the lens. Some crystallins are ubiquitous, present in all vertebrate eyes, while taxon-specific crystallins appear in some lineages but not in others. Interestingly, many crystallins are identical or homologous to proteins encoded by other genes and seem to have been recruited into the lens as a secondary function. Homologs of the crystallin genes, with both one and two domains, are known in slime molds and bacteria. Two human crystallin genes on chromosome 11 seem to have arisen from a duplication in which one copy is transcribed in the opposite direction from the other.
αA- and αB-crystallins are members of the small heat shock protein superfamily. They may prevent inappropriate protein aggregation in the lens (Venner, 1990). The mammalian α-crystallin/small heat shock protein family includes αA- and αB-crystallins, p20, Hsp27, and HSPL27.
The Rh protein of human red blood cell membranes seems to be a channel that allows the bidirectional transport of carbon dioxide, ammonia gas, NO, and oxygen. Although the function of these glycoproteins is not entirely known, they are homologous to ammonium transporters in bacteria, fungi, plants, invertebrates, and a number of vertebrates. In algae, mutations in an Rh homolog limit growth at high carbon dioxide concentrations (Okuda, 2002; Soupene, 2004).
Some have asked how a clotting factor cascade could evolve, given that a number of factors acting one after another are needed to produce the end result. Interestingly, the blood clotting mechanism appears to be a modified pathway that was originally involved in immunity. Intelligent design advocates have often cited the coagulation cascade as a mechanism that could not have evolved. Analysis of the proteins used in this cascade shows that they are modifications of ancestral proteins that had other functions.
SERINE PROTEASES INVOLVED IN COAGULATION
The first serine proteases are thought to have been simple digestive enzymes. Gene duplications produced multiple copies of these genes and allowed some of them to adapt to a variety of more specific functions. Serine proteases are a large family of enzymes in the human genome that function in diverse physiological processes ranging from digestion to coagulation (OMIM; Yosef, 2003). This is an ancient gene family that includes eubacterial digestive enzymes (as well as the vertebrate digestive enzymes trypsin and chymotrypsin). Most of these proteins have the amino acid proline at residue 225. In vertebrates, however, some of them have serine at residue 225, which enables them to bind sodium and gives them novel functions. Some serine proteases in blood (such as plasmin and clotting factor XIa) have proline at site 225, while others, such as thrombin and clotting factor Xa (involved in clotting) and complement protein C1r (involved in immunity), have serine. Mutations at site 225 drastically affect the function of thrombin, changing ligand recognition by up to 60,000-fold. The change that allowed some serine proteases to acquire a function in coagulation seems to stem from a single ancestral mutation at residue 225 (Guinto, 1998; Dang, 1996).
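The text attributes the coagulation-related change to a single ancestral mutation at residue 225. Assuming the standard genetic code (the sources cited here do not say which codons were involved), a short sketch can confirm that such a switch needs only one base change: each proline codon (CCN) becomes a serine codon (TCN) through a C-to-T change at the first position, whereas the other serine codons (AGT, AGC) are at least two changes away from any proline codon. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code, DNA alphabet. Assumption: the ancestral switch
# involved ordinary codons for these two amino acids.
PROLINE_CODONS = ["CCT", "CCC", "CCA", "CCG"]
SERINE_CODONS = ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_substitution_pairs(source, target):
    """All (source, target) codon pairs that differ at exactly one base."""
    return [(s, t) for s, t in product(source, target) if hamming(s, t) == 1]


if __name__ == "__main__":
    for pro, ser in single_substitution_pairs(PROLINE_CODONS, SERINE_CODONS):
        print(f"Pro {pro} -> Ser {ser}")
```

The script lists four one-step routes (CCT to TCT, CCC to TCC, CCA to TCA, CCG to TCG), consistent with the idea that one point mutation could have produced the proline-to-serine switch.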
Thrombin, the protein that converts the inactive blood protein fibrinogen into the fibrin that forms a blood clot, is a serine protease. A number of serine protease cascades (vertebrate coagulation, vertebrate complement, arthropod hemolymph clotting, and the arthropod developmental pathway that determines dorsal and ventral positions) involve three central serine proteases. The third (downstream) proteases (easter in arthropod development, the arthropod clotting enzyme, complement C2, and thrombin) cleave the precursor molecules Spätzle, coagulogen, C3, and fibrinogen, respectively, to form the Toll ligand, coagulin, C3a and C3b, and fibrin (Krem, 2002). Many of these serine proteases are homologous. Thrombin is homologous to C1r and C1s of the complement cascade. Human clotting factors VII, IX, and X are homologous to factor C of the horseshoe crab clotting cascade. The arthropod developmental cascade may be the closest to the ancestral cascade that gave rise to the others (Krem, 2002).
Enzymes of the coagulation cascade also participate in immunity, cell growth, and embryogenesis. Thrombin is expressed not only in the liver, the major site of clotting factor synthesis, but also in developing and adult rat brains. Thrombin proteolytically activates protease-activated receptors (PARs) coupled to G-protein signal transduction cascades, promoting the survival or apoptosis of glial cells and neurons, the survival of myoblasts, and neutrophil chemotaxis. Prothrombin can even promote the migration of cells through the extracellular matrix. Factor Xa can act as a growth factor (Krem, 2002).
Tissue plasminogen activator (t-PA) and urokinase-type plasminogen activator (u-PA) cleave plasminogen to form plasmin; all three are serine proteases. Plasminogen also has other roles in wound healing, inflammation, and neural degeneration through targets other than fibrin (Hervio, 2000).
Within the protease superfamily of genes, one family of related proteins includes clotting factor IX, factor X, factor VII, and protein C. The genes for factors VII and X remain linked on chromosome 13. Clotting factor XII, tissue plasminogen activator, and urokinase are related proteases, and clotting factors VIII and V (cofactors rather than proteases) are related to each other. Hagfish possess prothrombin molecules homologous to those of other vertebrates (Banfield, 1994). Coagulation factor VII interacts with tissue factor to initiate the extrinsic pathway of coagulation and is known to exist in zebrafish. Zebrafish factor VII has the domains shared by coagulation factors VII, IX, and X and protein C (Sheehan, 2001). The intergenic DNA between the factor VII and factor X genes is similar to the intergenic DNA of the trypsin cluster in Drosophila (Hanumanthaiah, 2002).
The sea urchin protein SpBf is a complement protein which possesses SCR domains, a von Willebrand factor domain, and a serine protease domain. Sea urchins possess complement C3 proteins which seem to function in opsonization and whose levels increase in response to infection (Smith, 2002).
OTHER COAGULATION PROTEINS
In the pufferfish (a teleost), proteins are known that are homologous to most of the 26 mammalian proteins involved in coagulation, although a few are absent and several exist as multiple copies. While the tunicate genome possesses gene family members (and the functional domains) of almost all of these proteins, none of them is truly homologous, indicating that the vertebrate coagulation cascade evolved after the urochordate lineage separated from that of vertebrates (Jiang, 2003). Teleosts have all the clotting factors found in mammals plus an additional factor VII-like homolog (Hanumanthaiah, 2002).
The blood of some invertebrates can clot in response to injury, but not through the cascade found in vertebrates. The invertebrate coagulogen is not related to fibrinogen (coagulogen is instead similar to nerve growth factor). However, fibrinogen-like molecules exist in both vertebrates and invertebrates, including a group of lectins. The first fibrinogen-related proteins in invertebrates were discovered by researchers who, based on an evolutionary model of life’s diversity, specifically predicted that such molecules would be found (Xu and Doolittle, 1990).
In horseshoe crabs, tachylectin 5A is produced in the hemolymph (rather than being expressed on hemolymph cells, as other tachylectins are) and causes bacteria to agglutinate (Adema, 1997; Gokudan, 1999; Kairies, 2001). Its amino acid sequence, three-dimensional shape, and calcium-binding site are homologous to those of fibrinogen. Vertebrates, including humans, have fibrinogen homologs called ficolins that function in innate immunity by recognizing carbohydrate groups on bacteria. Human ficolins bind the same molecules as tachylectin 5A and are more closely related to tachylectins than to fibrinogen (Kairies, 2001). Interestingly, von Willebrand factor, which is also involved in coagulation, is homologous to invertebrate lectins (Adema, 1997).
Several clotting factors and several proteins that regulate coagulation undergo a modification after translation in which the amino acid glutamate is converted to γ-carboxyglutamate (Gla). This reaction is not only essential for blood clotting (where it was first discovered); it also has other functions in vertebrates and occurs, for example, in several bone proteins. The reaction and the enzyme that catalyzes it (γ-glutamyl carboxylase) were thought to be found only in vertebrates. They are now known in insects and mollusks as well, where γ-carboxylation of glutamate has several roles, such as the production of venom peptides. The γ-glutamyl carboxylase gene is conserved among mammals (including humans), insects, and mollusks. In fact, the intron/exon boundaries correspond surprisingly closely, and eight of the introns appear to predate the split of the coelomate lineages in the Precambrian. Thus, an enzyme that appeared to be unique to vertebrates is a modified version of an ancestral enzyme that long predated the vertebrate clotting cascade (Bandyopadhyay, 2002).
Tissue factor (TF) serves as a cell membrane attachment (tether) for one of the protease enzymes of the clotting cascade (clotting factor VII). It is homologous to cytokine receptors: the receptors for erythropoietin, interleukins, colony stimulating factor, interferon, and several hormones. This group belongs to the immunoglobulin superfamily, one of the largest protein families in the animal kingdom. Before vertebrates evolved a coagulation cascade involving TF, receptors related to TF were already present on cell membranes, and their functions included the response to infection (such as might occur after a wound) (Bazan, 1990).
Plasmin can cleave some other substrates far more efficiently than its physiological substrates. Negative selectivity seems to have reduced the efficiency with which plasmin binds its typical substrates, properly regulating the amount of fibrinolysis (Hervio, 2000).
THE GENE CLADOGRAM
Many of the genes that humans require in order to be human evolved long before humans. The distribution of these genes among modern organisms supports the conclusion that modern groups of organisms can be organized into clades that share a common ancestry. The same clades of organisms that are supported by the analysis of signaling molecules are supported by the analysis of other genes, anatomical features, embryological development, and the fossil record. The organization of modern organisms into a nested hierarchy of clades is predicted by the evolutionary model but not by alternative models.
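A nested hierarchy means that any two groups defined by a shared feature either do not overlap or one lies entirely within the other. The toy check below, an illustration rather than a phylogenetic method, applies that test to distributions stated in the entries that follow (B = eubacteria, A = archaea, E = eukaryotes). `DISTRIBUTIONS` and `is_nested` are illustrative names, and the extra group in the last line is hypothetical.

```python
from itertools import combinations

# Distributions as stated in the entries of this section
# (B = eubacteria, A = archaea, E = eukaryotes).
DISTRIBUTIONS = {
    "RNase P": frozenset("BAE"),
    "RecA family": frozenset("BAE"),
    "ubiquitin": frozenset("AE"),
    "Sm/Lsm proteins": frozenset("AE"),
    "snoRNAs": frozenset("AE"),
    "RNase MRP": frozenset("E"),
}


def is_nested(groups) -> bool:
    """True if every pair of groups is disjoint or one contains the other."""
    for a, b in combinations(list(groups), 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


if __name__ == "__main__":
    print(is_nested(DISTRIBUTIONS.values()))  # True
    # Hypothetical feature found only in eubacteria and eukaryotes:
    print(is_nested([*DISTRIBUTIONS.values(), frozenset("BE")]))  # False
```

The listed distributions nest cleanly ({B, A, E} contains {A, E}, which contains {E}); a feature shared only by eubacteria and eukaryotes would overlap the archaea-plus-eukaryote group without containing it, which is the kind of pattern the nested-hierarchy prediction rules out.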
--Streptococcus pyogenes possesses a collagen-like sequence in its enzyme hyaluronidase (Stern, 1992).
--Bacteria possess several homologs of actin, such as MreB and ParM, which can polymerize into filaments. MreB determines the shape of the cell, and ParM functions in the movement of intracellular structures such as plasmids (Amos, 2004; Egelman, 2003). MreB seems to form a filamentous bacterial cytoskeleton under the cell membrane (Egelman, 2001).
--FtsZ, a bacterial homolog of tubulin, is known in eubacteria, archaea, and some mitochondria (Gull, 2001).
--There is some evidence of fibronectin and immunoglobulin domains in bacteria (Aravind, 2003; Angata, 2002).
--Small heat shock proteins are a family of proteins found in archaea, bacteria, and eukaryotes that share a domain of about 100 amino acids at the C-terminus (Waters, 1999; Iwaki, 1997).
--The β/γ crystallins make up the majority of lens proteins in most vertebrates and are related to microbial stress-protective proteins (Piatigorsky, 2001), to mammalian AIM1 (which is involved in melanoma tumorigenicity), to spherulin 3a of slime molds, and to the epidermis-specific EDSP of amphibians (Ray, 1997).
--Hemoglobin is widespread in both prokaryotes and eukaryotes (Hausladen, 2001; Couture, 1999).
--Serine proteases (Guinto, 1998; Dang, 1996).
--Alcohol dehydrogenase (Holmes, 1996).
--Long chain fatty acids (LCFAs) are energy sources in both prokaryotes and eukaryotes. The gene family of transporters for these fatty acids (FATPs) performs a conserved function in bacteria and vertebrates (including humans) (Hirsch, 1998).
--Electron transport system.
--Lactate dehydrogenase (LDH) is an ancient gene that is expressed in bacteria, plants, and animals (OMIM).
--A number of proteins functioning in aerobic respiration are shared between eubacteria and archaea and thus may have existed in the last common ancestor. These include cytochrome oxidase (subunits I and II), cytochrome b, Rieske iron-sulphur, blue copper protein, 2Fe-2S and 4Fe-4S ferredoxins, and succinate dehydrogenase (the iron-sulphur subunit) (Castresana, 1995).
--Cytochrome P450 enzymes (Williams, 2004; Jacobs, 2003).
--Translation occurs at ribosomes, which are made of rRNA (ribosomal RNA) and protein (it seems that 3 types of rRNA and about 60-70 proteins are needed; rRNA makes up about half the weight). Ribosomes must be assembled from subunits, which are identified by their sedimentation coefficients measured in Svedberg units (S).
--RNase P and MRP are endonucleases which are required for the processing of tRNAs and mitochondrial DNA replication, respectively. RNase P is known from all kingdoms of life while MRP is known in eukaryotes (Bompfunewerer, 2005).
--Although all modern organisms have the same requirements for DNA replication, transcription, and translation, there are some differences in these processes among the major groups of living organisms today. This suggests that LUCA’s replication, transcription, and translation mechanisms were not complete when the three domains of living organisms diverged. RNA synthesis was present in LUCA, but it was less advanced than protein synthesis (Olsen, 1997). The transcriptional mechanisms of the last common ancestor of all modern life are not shared and retained in modern organisms to the same degree as its translational mechanisms. Although the two largest subunits (β and β′) and part of another subunit (α) of eubacterial RNA polymerase are conserved in archaebacteria, additional archaeal subunits and the nature of the α subunit are shared with eukaryotes but not with eubacteria. Although LUCA possessed rudimentary transcriptional mechanisms, these were modified differently in eubacteria and archaebacteria (Woese, 2002).
--In eukaryotes, the origin recognition complex is a complex of proteins (six in yeast, four of which have homologs in humans) that binds DNA at the site where replication begins. The helicase loading factor Cdc6/Cdc18 (the names of the homologs in two different species of yeast) belongs to the same gene family as two of the proteins of the origin recognition complex. There are similarities between Cdc6/Cdc18 and the prokaryotic helicase loader DnaC. Cyclin-dependent kinases (CDKs) target Cdc6/Cdc18. Six members of the minichromosome maintenance family (MCMs) seem to function as a helicase homologous to that of prokaryotes (Leatherwood, 1998).
--Eukaryotes use RecA family members to repair double-stranded breaks and for homologous recombination, which is essential for meiosis. The RecA family includes RecA in eubacteria, RadA in archaea, and Rad51 and Dmc1 in eukaryotes (Gasior, 2001).
--3-Hydroxy-3-methylglutaryl coenzyme A reductase (HMG-CoA reductase) functions both in sterol synthesis in vertebrates and the synthesis of juvenile hormone and pheromones in insects (Tittiger, 2003). Homologous enzymes are known in eubacteria, archebacteria, plants, fungi, and animals (Istvan, 2001).
--Ubiquitin occurs in all eukaryotic cells and in archaea but is not known from eubacteria. (Gamulin, V., from Muller, 1998.)
--The Sm/Lsm proteins form ribonucleoprotein complexes that function in RNA splicing, mRNA degradation, and the maintenance of telomeres. They are known in eukaryotes and archaea (Collins, 2001).
--snoRNAs (small nucleolar RNAs) are never translated into protein; the RNA itself is functional and helps rRNA mature during ribosome formation. More than thirty kinds are known. snoRNAs can modify the bases of other RNA molecules, and they are known in both eukaryotes and archaea. Functional RNA molecules, especially those involved in peptidyl transfer in the ribosome, often contain modified nucleotides that have undergone 2′-O-ribose methylation and pseudouridylation. snoRNAs guide many of these modifications (Bachellerie, 2002). In archaea and eukaryotes, box C/D snoRNAs guide the 2′-O-methylation of nucleotides, while box H/ACA snoRNAs guide the conversion of uridine to pseudouridine. The core of the complex that modifies RNA nucleotides in archaea and eukaryotes is similar to part of the ribosome, suggesting a common origin (Tran, 2004).
--Although LUCA possessed DNA polymerases that would be inherited by all its descendants, eubacteria and archaea/eukaryotes replicate their DNA in distinct ways. The modern mechanisms for replicating DNA evolved twice from a simple ancestral pattern: once in the eubacterial lineage and again in the archaeal/eukaryotic lineage (Woese, 2002).
--In eukaryotes, DNA is coiled around histone proteins. The histone fold domain is present in a large number of proteins including transcription factors and enzymes in organisms ranging from archaea to mammals. Although histones were thought to be unique to eukaryotes, archaea are known to possess histone-like proteins (HMf, HMt) which form dimers and allow the supercoiling of DNA (Arents, 1995).
--A number of features once thought to distinguish eukaryotes from prokaryotes are now known to be shared between archaea and eukaryotes, suggesting that archaea are more closely related to eukaryotes than are eubacteria. For example, archaea transcribe DNA through mechanisms which were formerly considered to be eukaryotic. In eukaryotes, transcription factors are required to begin transcription, while in eubacteria, RNA polymerase can bind DNA strands without them. Archaea seem to require transcription factors for transcription, suggesting links to eukaryotic mechanisms (Ouzounis, 1992). Archaea possess a promoter sequence similar to the TATA box of eukaryotes, genes homologous to the eukaryotic transcription factor TBP, and a homolog of the eukaryotic transcription elongation factor TFIIS (Langer, 1995; Thomas, 2001). Archaea and eukaryotes both use the general transcription factor TFIIB (Bagby, 1995). While the large subunits of RNA polymerase are homologous in all three domains of life, archaeal promoters resemble the eukaryotic RNA polymerase II promoter (TATA box) (Langer, 1995).
Archaea possess histones and form nucleosomes homologous to those of eukaryotes (Lopez-Garcia, 1999). Archaea resemble eukaryotes more than eubacteria in their polymerases, DNA replication factors, DNA repair enzymes, cell division control proteins, proteasomes, protein transport systems, elongation factors EF-1α and EF-2, and the α and β subunits of ATPase. Archaeal mechanisms in translation (5S rRNA, initiator tRNAMet, and promoter sequences) are more similar to those of eukaryotes. Several ribosomal proteins are shared between archaea and eukaryotes but are not known from eubacteria. Six of the archaeal RNA polymerase subunits are otherwise known only in eukaryotes. Primer recognition and the use of 7S RNA are also more similar to eukaryotes. Archaea also possess a molecular chaperone protein similar to a eukaryotic mitotic spindle protein, use the amino acid hypusine (previously known only from eukaryotes), and show eukaryotic-like sensitivities to toxins and antibiotics. The archaea known as Crenarchaeota are thought to be most similar to the ancestral archaeon and also seem to be the most similar to eukaryotes (Gray, 1992; Olsen, 1997; Bult, 1996).
Sequence comparisons of many genes support the conclusion that archaea are more closely related to eukaryotes than either is to eubacteria. A grouping of the eukaryotes with archaea is supported by adenylosuccinate synthase, argininosuccinate lyase, aspartate aminotransferase, deoxyribodipyrimidine photolyase, dihydroorotate oxidase, ferredoxin, glutamate dehydrogenase, glutamine synthetase, glycine hydroxymethyltransferase, lactate dehydrogenase, pyrroline-5-carboxylate reductase, pyruvate kinase, ribose-phosphate pyrophosphokinase, threonyl-tRNA synthetase, triose-phosphate isomerase, UDP-glucose epimerase, and valyl-tRNA synthetase (Katz, 1998).
--Archaea and eukaryotes use the kinase PSTK to make selenoproteins. Selenoproteins are much more abundant in archaea and eukaryotes than in eubacteria (Diamond, 2004).
--The formation of double-strand breaks in chromosomes is one of the first steps in recombination during meiosis. This is mediated by the Spo11 protein, which belongs to a gene family that includes proteins from nematodes and even archaea. Spo11 is similar to archaeal topoisomerase enzymes, which are unlike the topoisomerases of eubacteria and eukaryotes (Keeney, 1997).
--The proteins unique to eukaryotes include cytoskeletal proteins (tubulins, actin, tubulin-associated proteins, and actin-associated proteins) and proteins associated with endocytosis (clathrin, clathrin-associated proteins, and dynamin). Other eukaryote-specific features include some ribosomal proteins, proteins of the ER and Golgi, signaling molecules, ubiquitin, ubiquitin-like proteins, ubiquitin protease, ubiquitin conjugation enzymes, 14-3-3 proteins, B cyclin, some regulators of the cell cycle, introns, and gene promoters (Hartman, 2002; Xu, 2002; Katz, 1998).
--snRNAs (small nuclear RNAs) range in size from 80 to 350 nucleotides. They exist in all eukaryotes, where they form small nuclear ribonucleoproteins (snRNPs; made with snRNAs U1 through U6) that assemble into a structure known as the spliceosome, which carries out the splicing of pre-mRNAs to produce mRNAs. The RNA is critical in this process: U2 and U6 can begin splicing even without the protein, and mutations in the RNA sequences affect the specificity of splicing. It is estimated that about 15% of the single point mutations which cause human disease affect mRNA splicing (Maniatis, 2002).
--While the presence of introns is shared by all eukaryotes, the use of introns and exon shuffling seems to have increased markedly in animals, contributing to their success (Patthy, 1999; Muller, 2002).
--lamins (Reimer, 1998).
--The eukaryotic cytoskeleton is composed of actin filaments, microtubules, intermediate filaments, and motor proteins (Elmendorf, 2003).
--While yeast possess only one known actin gene, multiple actin genes are known from all protozoa, plants, and animals studied (Hightower, 1986). Actins are highly conserved proteins which make up most of the eukaryotic cytoskeleton and can compose 10-20% of the total cellular protein. Actin filaments (thin filaments) are involved in organelle transport, cell motility, and cytokinesis (OMIM).
--In eukaryotes, the “tilt” of actin subunits is important in determining subunit interaction in filaments. This tilt is determined by two inserted sequences which are absent in bacterial homologs (Egelman, 2003).
--A number of unconventional myosin classes, such as classes I, V, VII, and XII, had already appeared in primitive eukaryotes (Baker, 1997).
--All eukaryotes appear to require α, β, and γ tubulins (Dutcher, 2001).
--In all eukaryotes, the mitotic spindle’s equator identifies the site for cytokinesis (van den Ent, 2001; Leung, 2004).
--In many, but not all, eukaryotes, α and β tubulins can be modified after translation through the addition of side chains of glycyl and glutamyl residues (polyglycylation and polyglutamylation) (Gull, 2001).
--Galectins are known in sponges and fungi but not in protozoa or plants (Muller, 2001).
--Eukaryotic ribosomes vary in size from 55S to 66S in animals to 70S to 80S in plants and fungi. The prokaryotic ribosome is 66% RNA; the eukaryotic ribosome is 60% RNA. In eukaryotes, the rRNA pieces are assembled in the nucleolus with the ribosomal proteins (which have migrated into the nucleus); the area around which the nucleolus forms (the nucleolar organizer) contains the rRNA genes.
--Microsporidia are primitive eukaryotes, and their ribosomes are similar in size to those of prokaryotes. Of all eukaryotic ribosomes, only those of microsporidia lack a 5.8S rRNA. The 5’ end of the large subunit rRNA in microsporidia and prokaryotes is homologous to the 5.8S rRNA (Vossbrinck, 1986).
--MRP is known in eukaryotes (Bompfunewerer, 2005).
--Telomerase RNA includes a core which is conserved among eukaryotes. H/ACA box snoRNAs convert uridine to pseudouridine (Bompfunewerer, 2005).
--MicroRNAs (miRNAs) are small single-stranded RNAs of about 22 nucleotides (Bompfunewerer, 2005).
--Eukaryotes possess noncoding RNA which is similar to mRNA but is never translated (Bompfunewerer, 2005).
--About a third of the introns in these genes found in a protist (the malaria parasite) also exist in at least one of the groups of higher invertebrates, indicating that many introns have been retained for over 1 billion years (Rogozin, 2003; Muller, 2002).
--PI3Ks (phosphoinositide 3-kinases) are used in yeast, slime molds, plants, nematodes, fruit flies, and mammals (Vanhaesebroeck, 1997).
--In a number of genes, such as those for the p38 and JNK kinases, the introns are highly conserved from sponges through protostomes (such as flies) and deuterostomes (such as humans). New introns specific to some lineages, especially animals and plants, are also known (Rogozin, 2003; Muller, 2002).
--Type IV collagen is known from sponges (Boute, 1997).
--Since sponges can contain both fibrillar and non-fibrillar collagens, the amplification of this gene family must already have begun in early animals (Exposito, 1990).
--Tenascin is an extracellular matrix protein known in vertebrates and invertebrates. Sponges, the most primitive animals, also possess tenascin (Humbert-David, 1993).
--A βγ crystallin is known in sponges.
--Introns in vertebrate βγ crystallins seem to have arisen after the sponge lineage separated from the lineage leading to vertebrates (Di Maro, 2002).
--Rh-like proteins are known in nematodes and sponges (Seack, 1997).
--Sponges possess homologs of the flagellar and mitochondrial creatine kinases (Pineda, 2001).
--Metazoans share a U7-snRNP mechanism which processes the pre-mRNA of histone proteins (Bompfunewerer, 2005).
are known from a variety of vertebrates as well as from nematodes. In humans, autoantibodies directed against them can result in lupus erythematosus and Sjögren’s syndrome. Although their function is not completely known, it seems they are involved in the production of 5S rRNA.
--This 21-nucleotide RNA seems to be widespread (if not universal) in bilaterian animals (including humans) but not in more primitive organisms. In C. elegans it is involved in development and seems to function in the down-regulation of genes (OMIM).
--Microcephalin (MCPH1) is related to topoisomerase II-binding protein and BRCA1. It regulates chromosome condensation in mitosis and DNA repair. Homologs exist in bilaterian animals (Ponting, 2005).
--Human ubiquitin carrier protein E2-C/UbcH10 is homologous to, and can substitute for, the corresponding clam protein (Townsley, 1997).
--Arthropods utilize a cascade of serine proteases in their clotting/immune response. A similar ancestral cascade probably gave rise to the coagulation and complement cascades in vertebrates. The more derived sequences often have additional domains which have been added, such as EGF, kringle, and LDL domains (Krem, 2000).
--The intergenic DNA between the genes for factors VII and X is similar to the intergenic DNA of the trypsin cluster in Drosophila (Hanumanthaiah, 2002).
--Diverse creatine kinases (cytoplasmic, mitochondrial, and flagellar) are known in both protostomes and deuterostomes (Sona, 2004; Ellington, 2000).
--ASPM (abnormal spindle-like microcephaly-associated) is a large protein which interacts with microtubules and is expressed in areas where new neurons are produced. Its homolog in flies is known to function in the organization of microtubules during cell division (Ponting, 2005).
--The sea urchin protein SpBf is a complement protein which possesses SCR domains, a von Willebrand factor domain, and a serine protease domain. Sea urchins possess complement C3 proteins which seem to function in opsonization and whose levels increase in response to infection (Smith, 2002).
--Tunicates possess intermediate filament proteins homologous to the IF subfamilies of keratins, vimentins, and neurofilaments (Karabinos, 2004).
--intermediate filaments which are classified with these subfamilies, unlike all other non-vertebrates (Reimer, 1998).
--Amphioxus has muscle fibers in its notochord and uses the notochord for locomotion, unlike vertebrate embryos. The actin in the amphioxus notochord is intermediate between the muscle and non-muscle forms of actin used in vertebrates (Suzuki, 2000).
--The cytokeratins of both tetrapods and fish can be grouped into the same 2 groups. Most vertebrate epithelial cells produce keratin.
--Hagfish have only a few keratin genes while higher vertebrates possess many more, presumably the result of duplications early in the vertebrate lineage (Fuchs, 1981; Markl, 1989).
--Keratin is known in fish, including jawless fish (Conrad, 1998).
--Hagfish possess prothrombin molecules homologous to those of other vertebrates (Banfield, 1994).
--Rates of carbonic anhydrase activity suggest that there have been increases in the catalytic efficiency of CA in the ancestors of agnathans and in the ancestors of higher vertebrates (Tufts, 2003).
--From the analysis of Hox sequences in hagfish, it appears that at least one of the genome duplications early in vertebrate evolution occurred before craniates evolved and it seems that additional duplications occurred in hagfish after this. (Stadler, 2004; Hahn, 1998; Escriva, 2002; Hoyle, 1998).
--The lamins found in vertebrates possess additional domains and an additional 42 amino acids in one ancestral domain (Reimer, 1998).
--The change needed for some of the serine proteases to acquire a function in coagulation seems to stem from one ancestral mutation changing the amino acid at residue 225 (Guinto, 1998; Dang, 1996).
--Vertebrate fibrinogen is composed of α, β, and γ subunits in all vertebrates studied, including lampreys (Henschen, 1983; Weissbach, 1990).
--In the gnathostome lineage, a duplication of the globin gene gave rise to myoglobin and hemoglobin; shortly afterward, a duplication of the hemoglobin gene gave rise to the alpha and beta globin genes. A duplication in the alpha genes then gave rise to some members of the alpha family which are expressed only in the embryo (in contrast, the embryonic members of the beta family originated later; embryonic beta genes in mammals and birds arose independently of each other).
--Although lampreys possess glial cells which are morphologically similar to astrocytes, they lack the intermediate filaments normally expressed in astrocytes (vimentin and GFAP) and instead express keratins. Lamprey glial cells express keratins in both the brain and spinal cord that are similar to those of the epidermis (
--Lampreys have a single lactate dehydrogenase gene which has a mixture of characteristics of the two main LDH genes in higher vertebrates: LDH-A in white skeletal muscle and LDH-B in aerobic tissues such as the heart and brain (OMIM).
--Vertebrates utilize 7SK RNP which has been shown in mammals to control transcriptional elongation (Bompfunewerer, 2005).
--MicroRNAs (miRNAs) are small single-stranded RNAs of about 22 nucleotides. They are known in both animals and plants and underwent expansion in the vertebrate lineages. Their function is not yet known (Bompfunewerer, 2005).
--SPARCL1 gave rise to amelogenin, enamelin, and ameloblastin early in the history of gnathostomes.
--There are a number of cases in which a single invertebrate gene is homologous to four vertebrate genes. This is observed in Hox clusters, syndecan, myc, BMP (5-8), EGFR/ERBB2-4, ENGR,
--coagulation and is known to exist in zebrafish. (Since little is known about the existence and extent of the coagulation cascade in fish, this was a finding which will help direct future research). The zebrafish domain structure of Factor VII possesses the shared domains found in coagulation factors VII, IX, X and protein C (Sheehan, 2001).
--In the pufferfish (a teleost), proteins are known which are homologous to most of the 26 mammalian proteins involved in coagulation, although a few are absent and several exist as multiple copies. While the tunicate genome possesses gene family members (and the functional domains) of almost all of these proteins, none of them are true orthologs, indicating that the vertebrate coagulation cascade evolved after the urochordate lineage separated from that of vertebrates (Jiang, 2003).
--Seventeen of the eighteen families of P450 enzymes present in mammals are also represented in fish (the only exception being the CYP39 family). Of the eighteen families, duplications of the CYP2 family members have produced the greatest diversity (Nelson, 2003).
--The CYP2J, CYP2N, and CYP2P subfamilies of P450 enzymes function in arachidonic acid metabolism and are members of a clade of P450 enzymes shared between fish and mammals (Oleksiak, 2003).
--ApoA-I and ApoE are expressed in the yolk sac. The duplication which produced these two genes from an ancestral gene predates the evolution of bony fish (Babin, 1997).
--Cytoglobin is the fourth type of globin known in humans, mice, and fish. It is expressed in almost all tissues and appears to be related to vertebrate myoglobin (Hankeln, 2005; DeSantis, 2004).
--In amniotes, keratins are only expressed in epithelia, while other intermediate filaments, such as vimentin, are expressed in mesenchyme (Conrad, 1998).
--Reptilian homologs of AMH, DAX1, SF1, SOX9, and WT1 seem to function similarly to those proteins in mammals (Pleau, 1999; Shimada, 1998).
--The ancestral β globin gene had duplicated to produce a proto-eta gene and a proto-beta gene prior to the separation of the lineages of therian mammals (Chiu, 1996; Meireles, 1995).
--duplication of beta globin gene to produce β and ε globin (Meireles, 1995)
--involvement of β globin after embryonic development and ε globin during embryonic development (Meireles, 1995)
--Unlike therian mammals, monotremes do not seem to undergo imprinting, and there is no evidence of X inactivation. In marsupials, some imprinting is known and X inactivation does occur, although its mechanism is not as complex as that observed in placental mammals (Grutzner, 2003; Grutzner, 2004).
--Early in the evolution of eutherian mammals, the proto-eta locus duplicated to produce ε, η, and γ globins and the proto-β locus duplicated to form β and δ globins.
--Duplications of the eta globin gene produced the ε, γ, and η genes, and duplication of the beta globin gene produced the δ and β genes, establishing an ancestral eutherian arrangement for the cluster: 5’-ε-γ-η-δ-β-3’ (Meireles, 1995).
--Mice and humans possess equivalent keratin clusters with virtually all of the same genes in the same order (
--Eta hemoglobin apparently was an embryonic hemoglobin in the ancestors of eutherian mammals. In artiodactyls (deer, cows, giraffes, etc.) it is still a functional gene. In primates, eta is a nonfunctional pseudogene. Rodents no longer have any trace of the eta hemoglobin gene.
--In the primate lineage, the η globin was mutated and became a pseudogene, and the γ globin gene was duplicated in the lineage of anthropoid primates (Chiu, 1996; Meireles, 1995).
--eta hemoglobin becomes a pseudogene due to an A-to-G substitution in the initiation codon (ATG to GTG) and other mutations (a stop codon at positions 429-31 and several deletions)
--ε and γ globin expressed in embryos (Chiu, 1996)
--The gene for L-gulonolactone oxidase is required for an organism to synthesize its own vitamin C. In primates and guinea pigs, this gene lost its function. Presumably, the ancestral primates ate enough fruit that the inability to synthesize vitamin C was no great disadvantage to them. Humans still possess a non-functional pseudogene for this enzyme on chromosome 8p21.1 (OMIM).
--One of the techniques used to determine which genes have had the greatest importance in the evolution of a specific lineage is to examine which genes have undergone an accelerated rate of change compared to homologs in related organisms. The proteins composing the electron transport chain in advanced primates have experienced this type of positive selection, suggesting that these modifications were important in primate adaptations, such as a large brain with increased oxygen requirements. Proteins of the electron transport chain have experienced positive selection in the lineage leading to higher apes (such as COX4-1, COX7AH, COX8L, and ISP), apes (COX4-1, COX8L, ISP), catarrhine primates (COX2, COX6B, COX6C, COX7C, CYCS, ISP; COX8H became a pseudogene), anthropoid primates (COX1, COX6B, COX6C, COX7C, COX8L, CYCS, CYB, ISP), and primates (COX8H) (Grossman, 2004).
--gene conversion in which δ globin acquired exons from the adjacent β globin gene (similar conversion events occurred in other mammalian lineages; in no lineage is the modern δ gene composed only of δ sequences, as some of its exons have been replaced by β exons)
--γ globin becomes the major β-family globin during fetal development (in other mammals, γ is expressed only during embryonic development and β is expressed during fetal development) (Meireles, 1995)
--γ globin gene duplicated (Chiu, 1996)
--The Y-specific DAZ cluster evolved after the divergence of the New World monkey and human lineages (Xu, 2001).
--Apolipoprotein(a) is homologous to plasminogen, and the gene possessed by humans is known only in
--17 separate insertions/deletions in intergenic DNA between ψ and δ globin (Maeda, 1988)
--insertion of intergenic DNA which increases the distance between the ε and γ genes from 5-7 kb to 13 kb (Chiu, 1996)
--β globin no longer used during fetal development (Chiu, 1996)
--Many retroviral sequences of catarrhine primates descend from a common ancestral infection (Tristem, 2000).
--amino acid substitutions occurred at positions 121, 151, 155, and 156 of myoglobin
--amino acid substitution at position 19 of alpha hemoglobin
--3 Alu sequences inserted into a globin cluster (Bailey, 1997)
--In ape lineages, both of these genes have undergone positive selection unlike that of other mammalian lineages, suggesting that they have contributed to brain growth in apes (Ponting, 2005).
--changes in eta globin shared by all higher apes: 3 deletions (positions 164, 966-70, 1,610-1,637), an insertion (245-82), and at least 24 substitutions (positions 5, 66, 383, 405, 495, 573, 780, 853, 926, 1268, 1278, 1334, 1422, 1667, 1859, 1949, 2002, 2093, 2137, 2138, 2161, 2188, 2193, and 2213)
CHIMP AND HUMAN
--Not only are the ψη-globin (eta) sequences of chimps and humans more similar to each other than to the sequence of gorillas (and humans, chimps, and gorillas are much more similar to each other than any are to orangutans), there are changes from the ancestral pattern that humans and chimps share: transitions at positions 1338 and 4473, transversions at positions 560, 5480, and 6971, deletions at positions 1287 and 3054, and one insertion at position 3272.
--a deletion in ψ globin (Koop, 1986)
--6 base pair deletion in intergenic DNA between ψ and δ globin (Maeda, 1988)
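The shared human and chimpanzee changes in the eta globin pseudogene are described as transitions and transversions. As an illustration that is not taken from the cited studies, the distinction can be sketched in Python: a transition exchanges one purine for the other (A and G) or one pyrimidine for the other (C and T), while a transversion exchanges a purine for a pyrimidine or vice versa. The function name `classify_substitution` is hypothetical, and the example base pairs are arbitrary rather than the actual bases at the listed positions.

```python
# Purines and pyrimidines, the two chemical classes of DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution (hypothetical helper).

    A transition stays within one class (purine to purine, or pyrimidine
    to pyrimidine); a transversion crosses between the two classes.
    """
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError(f"expected A, C, G or T, got {ref!r} -> {alt!r}")
    if ref == alt:
        raise ValueError("not a substitution: the bases are identical")
    # Same class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # Arbitrary example pairs, not the bases at the positions cited above.
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(ref, "->", alt, classify_substitution(ref, alt))
```

By this rule, the A-to-G change in the eta initiation codon mentioned earlier (ATG to GTG) would count as a transition.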
--KRTHAP1 is a keratin pseudogene in humans but it is an active gene in chimps and gorillas. Its loss of function is the result of one base pair change (OMIM).
--3 zeta hemoglobin genes in some Melanesians and Polynesians
--1 alpha hemoglobin in Melanesians
--1 alpha hemoglobin gene common in African Americans
--no G allele of HoxA1 in Asians
--most deep-rooted populations are African with respect to β globin gene (Long, 1990; Wainscoat, 1986)
--3 mutations of β globin unique to Jews of Kurdistan, who have been relatively isolated for 27 centuries (Rund, 1991)
--amplification of KGF gene (Kelley, 1992).