
LUCA

(the Last Universal Common Ancestor)

     What was the last common ancestor of all living things like? One way to approximate an answer is to analyze the molecular characteristics that modern cells do and do not share. Molecular mechanisms that are shared by all modern organisms are likely to have been inherited from the last common ancestor of living things. In other cases, different groups of modern organisms perform similar molecular tasks with different proteins, suggesting that these molecular mechanisms may not yet have been established in LUCA.

 

Modern cells must make protein from RNA. Heart cells must make the proteins which allow muscles to contract.

Cells lining the trachea must make the proteins of the cilia that beat mucus away from the lungs.

Cells in arterial walls must make elastic protein fibers that allow the vessel wall to stretch with each pulse.

The cells of the epidermis must make the keratin of human skin and hair.

    Other organisms, ranging from bacteria to whales, also depend on converting an RNA sequence into a protein sequence. LUCA would have possessed several major advances over the pre-cells of the RNA world. It would have used proteins to carry out cell functions, which means that the genetic code and the process of translation had evolved before LUCA.

 

THE FUNCTION OF THE FIRST PROTEINS

     When the process of translation began, the odds that an mRNA would produce the same order of amino acids in each of several peptides made from it would have been close to zero. Therefore, the value of proteins would have depended on the average function of the set of peptides produced from an RNA sequence.
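The near-zero odds follow from compounding probabilities. As a rough sketch, assume (hypothetically) that every codon is translated correctly with the same independent probability, called `fidelity` here; a peptide of `length` residues then comes out exactly right with probability fidelity ** length. The fidelity values and the 100-residue length below are illustrative assumptions, not measurements of early translation.

```python
def error_free_probability(fidelity: float, length: int) -> float:
    """Probability that all `length` residues are inserted correctly,
    assuming each position is independent and correct with probability `fidelity`."""
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be between 0 and 1")
    return fidelity ** length


if __name__ == "__main__":
    # Hypothetical per-codon accuracies for a sloppy primitive translation system.
    for fidelity in (0.99, 0.9, 0.7):
        p = error_free_probability(fidelity, 100)  # hypothetical 100-residue peptide
        print(f"fidelity {fidelity:.2f}: P(exact copy) = {p:.2e}")
```

Even at a hypothetical 90% accuracy per codon, fewer than 3 in 100,000 copies of a 100-residue peptide would be error free, which is why any early benefit would have come from the average behavior of a population of similar peptides.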

     Of what value would it have been to ancestral cells to synthesize proteins without the ability to precisely control the amino acid sequence? First of all, these precells would have had a host of catalytic RNA molecules. The original function of proteins probably would have been to stabilize the three-dimensional structure of RNA, and, arguably, precise sequences are not as critical in this function as in others, such as that of a catalytic enzyme. The earliest cells may not have needed to catalyze as many reactions as modern cells do: if the first cells existed in a rich chemical environment, they would have had far less need to synthesize new metabolites. The earliest peptides would not have replaced RNA; rather, they would have interacted with RNA to enhance its functional capability (Noller, 2004). Proteins can bind to nucleotides, and the first proteins might have been small proteins that formed this type of bond (Maurel, 2006).

     How would proteins have acquired more critical functions? It turns out that if you generate a pool of random protein sequences (or RNA or DNA sequences, for that matter), some of the sequences often possess the ability to catalyze a specific reaction, while other sequences can catalyze a different reaction. Some catalytic activity exists in random sequences. If precells began to synthesize proteins for the purpose of stabilizing RNA chains, some of these proteins would have possessed additional potential functions, even if they were generated at random. The first catalytic proteins selected for in cells would have supplemented and gradually replaced the ribozymes that were already present in the cells (Woese, 1965).

     The earliest attempts to control the amino acid order of proteins might not have required distinguishing between all 20 types of amino acids. The first step in translation could have been to distinguish between groups of amino acids. In modern organisms, the amino acids tyr, his, lys, glu, and trp (the “functional” group) perform more critical functions than others such as phe, leu, val, ala, and thr (the “nonfunctional” group) (Woese, 1965). One study proposed a chronology for the assignment of codons to the 20 amino acids, using the amounts produced under abiotic conditions, whether they are attached to tRNA by the older class II aminoacyl-tRNA synthetases, the relative abundance of each amino acid in modern proteins, abundance in the Murchison meteorite, the thermostability of codons, and a number of hypotheses on the subject. It proposes that new codons were incorporated in complementary pairs derived from the earliest codons, and that start and stop codons were developed last. Using forty criteria, nine amino acids were ranked as the most likely to have entered the code early: glycine, alanine, valine, aspartic acid, glutamic acid, proline, serine, leucine, and threonine (Trifonov, 2000).

     Some have suggested that the earliest coding RNA molecules would have been composed of many duplicates of a primordial type of codon. The proposed candidate is a codon like RNY, in which the first nucleotide is a purine (R), the second nucleotide (N) could be any of the four nucleotides, and the third nucleotide (Y) is a pyrimidine. Such codons would permit GNC, GNU, ANC, and ANU. Short molecules composed of many GNC triplets would have formed loops similar to those in tRNA. These short molecules might have performed the functions of both modern mRNA and modern tRNA (Lehman, 2004).
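The RNY pattern is small enough to enumerate directly. This sketch lists every triplet with a purine first and a pyrimidine third, then, purely for comparison, reads the four GNC codons with the modern standard code. Using today's code here is an illustrative assumption; the hypothesis concerns a primordial code whose assignments need not match the modern ones.

```python
from itertools import product

PURINES = "AG"       # R
PYRIMIDINES = "CU"   # Y
BASES = "UCAG"       # N
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def rny_codons() -> list[str]:
    """All triplets of the form purine, any base, pyrimidine."""
    return [r + n + y for r, n, y in product(PURINES, BASES, PYRIMIDINES)]


def translate_codon(codon: str) -> str:
    """One-letter amino acid for an RNA codon under the modern standard code."""
    i, j, k = (BASES.index(b) for b in codon)
    return STANDARD_CODE[16 * i + 4 * j + k]


if __name__ == "__main__":
    codons = rny_codons()
    print(len(codons), "RNY codons")
    for codon in codons:
        if codon.startswith("G") and codon.endswith("C"):  # the GNC subset
            print(codon, translate_codon(codon))
```

Under the modern code the four GNC codons read as valine, alanine, aspartic acid, and glycine, all of which also appear on the list of likely early amino acids given above.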

 

Do all living things use essentially the same genetic code? Yes.  Would this have to be true if all living things did not evolve from a common ancestor?  No.

     The DNA genetic code is practically universal among living things; it must have been present in the last common ancestor of all modern living things (LUCA). In the genetic code, triplets of DNA bases code for the amino acids of proteins. If these triplets are compared to three-letter words in a language, all living things speak the same language: with only minor exceptions, the same triplet of bases codes for the same amino acid in every organism on the planet, so a given DNA sequence (gene) would be read the same way and produce the same amino acid sequence (protein) in every organism.

      In other words, if the following genetic message were copied from DNA and sent to the cell to make protein: GGUGAUAAGAGGCGGUCGCCGCUG, all living things would insert the same amino acids in the same order (glycine, aspartic acid, lysine, arginine, arginine, serine, proline, and leucine). Just as it is not necessary that all languages on earth use the same words, it is not necessary that all living things use “CUG” to code for leucine (as opposed to another of the 20 amino acids).
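The worked example can be checked mechanically. This sketch encodes the standard genetic code as a 64-character lookup string (bases ordered U, C, A, G at each codon position, '*' marking stops) and reads the message in consecutive triplets from the first base. It ignores start codons and the minor variant codes mentioned above; the names `translate` and `THREE_LETTER` are illustrative, not from any particular library.

```python
BASES = "UCAG"
# Standard genetic code: index = 16*first + 4*second + third, bases ordered U, C, A, G.
# '*' marks the three stop codons (UAA, UAG, UGA).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(message: str) -> str:
    """Translate an mRNA (or DNA coding strand) into one-letter amino acids,
    reading triplets from the first base and stopping at the first stop codon."""
    rna = message.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("message length must be a multiple of 3")
    peptide = []
    for start in range(0, len(rna), 3):
        first, second, third = (BASES.index(base) for base in rna[start:start + 3])
        amino_acid = STANDARD_CODE[16 * first + 4 * second + third]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    peptide = translate("GGUGAUAAGAGGCGGUCGCCGCUG")
    print("-".join(THREE_LETTER[aa] for aa in peptide))
    # prints Gly-Asp-Lys-Arg-Arg-Ser-Pro-Leu
```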

     The genetic code seems to have been selected for its ability to minimize the effects of mutations. For codons rich in T and C, most translation errors occur in the third codon position. Because similar codons code for the same amino acid (the degeneracy of the genetic code), such mistakes have less effect. The second most common mistake is in the first codon position; here the genetic code tends to insert a similar amino acid. For example, a mutation changing the first position in the codon of a polar amino acid is likely to result in a different polar amino acid being inserted in its place, thus minimizing the effect of the mutation. The second codon position is the least error prone, and there are no compensatory aspects of the genetic code to lessen the impact of mistakes there (Woese, 1965). If a codon is no longer used by an organism, it may later be reassigned, and codons which are rarely used are the most likely to be reassigned (Santos, 2004).
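The position-by-position argument can be made concrete by counting, under the standard code, how many single-base changes at each codon position leave the amino acid unchanged. This sketch counts only changes starting from the 61 sense codons and treats a change to a stop codon as non-synonymous. It measures degeneracy only, not the chemical similarity of substituted amino acids that the first-position argument also relies on.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def amino_acid(codon: str) -> str:
    i, j, k = (BASES.index(b) for b in codon)
    return STANDARD_CODE[16 * i + 4 * j + k]


def synonymous_counts() -> tuple[list[int], list[int]]:
    """For codon positions 1-3, return (synonymous changes, all changes),
    counting every single-base substitution from every sense codon."""
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for triplet in product(BASES, repeat=3):
        codon = "".join(triplet)
        original = amino_acid(codon)
        if original == "*":
            continue  # stop codons are not used as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if amino_acid(mutant) == original:
                    synonymous[pos] += 1
    return synonymous, total


if __name__ == "__main__":
    syn, tot = synonymous_counts()
    for pos in range(3):
        print(f"position {pos + 1}: {syn[pos]}/{tot[pos]} synonymous")
```

Under these assumptions, 8 of 183 changes are synonymous at the first position, none at the second, and 126 at the third, the same ordering described above: the third position is heavily buffered by degeneracy, the first only slightly, and the second not at all.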

     While codon sequences are highly conserved, those of anticodons vary. In a table depicting the genetic code, there are 16 boxes which contain 4 codons each. Eight of them code for a single amino acid, and most of the remaining boxes are split between two amino acids. Because of wobble between the third codon base and the first anticodon base, only two anticodons, GNN and UNN, are found for many of the one- and two-amino acid boxes. CNN anticodons are found more frequently than ANN. Given the universal flexibility (wobble) of the third codon position, it is likely that LUCA's tRNAs also wobbled. Archaea have fewer anticodons recognizing a given codon box than bacteria or eukaryotes do. This simpler system suggests that archaea are close to the origin of cells (Tong, 2004).
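The two-anticodon pattern follows from the wobble rules. This sketch uses a simplified version of Crick's wobble pairing for the first anticodon base (G reads U or C; U reads A or G; C reads only G; unmodified A reads only U; inosine, I, reads U, C, or A) and checks which third codon bases of a four-codon box a set of anticodons covers. Real tRNAs carry many base modifications that alter these rules, so this is an approximation, not a description of any particular organism.

```python
# Simplified wobble rules: first anticodon base -> codon third bases it can read.
WOBBLE = {
    "G": {"U", "C"},
    "U": {"A", "G"},
    "C": {"G"},
    "A": {"U"},
    "I": {"U", "C", "A"},  # inosine, a modified base, shown for comparison
}


def covered_third_bases(anticodon_first_bases: str) -> set[str]:
    """Union of codon third bases read by the given anticodon first bases."""
    covered: set[str] = set()
    for base in anticodon_first_bases:
        covered |= WOBBLE[base]
    return covered


def covers_whole_box(anticodon_first_bases: str) -> bool:
    """True if these anticodons together read all four codons of a box."""
    return covered_third_bases(anticodon_first_bases) == set("UCAG")


if __name__ == "__main__":
    for combo in ("GU", "GC", "CA", "I", "IC"):
        print(combo, sorted(covered_third_bases(combo)), covers_whole_box(combo))
```

Under these rules a GNN plus UNN pair is enough to read all four codons of a box, whereas a CNN anticodon on its own reads only the G-ending codon.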

Ten amino acids were identified in the solution resulting from Miller’s experiment: alanine (Ala, A); glycine (Gly, G); aspartate (Asp, D); valine (Val, V); leucine (Leu, L); glutamate (Glu, E); serine (Ser, S); isoleucine (Ile, I); proline (Pro, P); and threonine (Thr, T). Nine of these ten are among the ten most commonly used amino acids in living organisms. The codons coding for these have the highest thermostability, suggesting that they might have been the earliest codons developed in the genetic code (Trifonov, 2008).
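The two lists of candidate early amino acids in this section, the ten identified in Miller's experiment and the nine ranked earliest by Trifonov (2000), can be compared directly with set operations. The one-letter codes below are transcribed from those two lists; the comparison adds no new data.

```python
# One-letter codes transcribed from the two lists in the text.
MILLER_PRODUCTS = {"A", "G", "D", "V", "L", "E", "S", "I", "P", "T"}
TRIFONOV_EARLY = {"G", "A", "V", "D", "E", "P", "S", "L", "T"}

shared = MILLER_PRODUCTS & TRIFONOV_EARLY         # on both lists
only_miller = MILLER_PRODUCTS - TRIFONOV_EARLY    # Miller product, not ranked early
only_trifonov = TRIFONOV_EARLY - MILLER_PRODUCTS  # ranked early, not a Miller product

if __name__ == "__main__":
    print("on both lists:", sorted(shared))
    print("Miller only:", sorted(only_miller))
    print("Trifonov only:", sorted(only_trifonov))
```

All nine of Trifonov's early amino acids appear among Miller's ten; the only difference is isoleucine, which Miller's experiment produced but which is not on the early list.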

     If the tRNAs for all 20 amino acids had been present long before LUCA, one would not expect tRNAs that bind different amino acids to still be closely related. Such close relationships are observed, most frequently in Archaea, suggesting that Archaea are the most primitive branch of life and that the genetic code was established just prior to LUCA (Xue, 2003).

 

LUCA AND TRANSLATION

     Although all modern organisms have the same requirements for DNA replication, transcription, and translation, there are some differences in these processes between the major groups of living organisms today. This suggests that LUCA’s replication, transcription, and translation mechanisms were not complete at the time when the three domains of living organisms diverged. RNA synthesis was present in LUCA, but it was less advanced than protein synthesis (Olsen, 1997). The last common ancestor of all modern life did not have transcriptional mechanisms that are shared and retained in modern organisms to the same degree that common translational mechanisms are. Although the two largest subunits (β and β′) and part of another subunit (α) of eubacterial RNA polymerase are conserved in archaebacteria, additional archaeal subunits and the nature of the α subunit are shared with eukaryotes but not with eubacteria. Although LUCA possessed rudimentary transcriptional mechanisms, these were modified differently in eubacteria and archaebacteria (Woese, 2002).

     It seems that billions of years can pass (in bacteria, for example) without fundamental changes to basic cellular processes (such as translation). This is not unexpected: once a cellular pathway is established as a basic requirement of the cell, it need not be altered. However, fundamental differences do exist between the three domains (for example, the use of fMet only in bacteria). This suggests that translation was not fixed in LUCA at the time of the divergence of the three major domains (DiGiulio, 2001).

     While many aspects of translation are common to all living organisms, others, such as the initiation of translation, are not. Translation initiation factors show slight molecular similarities across the three domains, indicating that the last common ancestor had rudiments of the modern initiation mechanisms. The bacterial IF-1 factor is a distant family relative of the γ subunit of EF-1A in eukaryotes. In bacteria, GTP regenerates spontaneously in the preinitiation complex, while eukaryotes require eIF-2B for this function. All archaea also possess the eIF-2 factor of eukaryotes, although it does not recycle GTP. Interestingly, while archaea do not demonstrate the GTP recycling function performed by eIF-2B in eukaryotes, they do possess related members of this gene family (as do bacteria and eukaryotes), although the function of these homologs is not well understood (Kyrpides, 1998).

     Ribosomal RNA similarities between organisms are greatest in comparisons among bacteria or among eukaryotes; there are fewer similarities when the two groups are compared with each other. In bacteria, the 5S RNA possesses a loop which is absent in eukaryotes. Eukaryotic ribosomes contain a 5.8S RNA which has no homolog in bacteria (Woese, 1977). Modern prokaryotes can use very different proteins in equivalent positions in the ribosome, suggesting that these proteins evolved separately after the prokaryotic lineages diverged: about a dozen nonhomologous ribosomal proteins in eubacteria and archaea stabilize the same ribosomal regions (Klein, 2004).

 

 

LUCA AND THE TRANSITION TO DNA


    Modern organisms (with the exception of many viruses) use DNA as their genetic molecule.  It is likely that the earliest precells used RNA instead.  Prior to the evolution of LUCA, a transition from an RNA genome to a DNA genome was made. DNA molecules are also capable of catalytic function although there are variations compared to the activity of RNA molecules. In vitro selection has produced a deoxyribozyme homolog of a ribozyme which served as a template. Several modifications were required since a DNA complement of the ribozyme was not functional just as an RNA copy of the deoxyribozyme was not functional (Paul, 2006).

     The ancestor of the three domains would have been far simpler than a modern prokaryote; the term progenote is frequently used for it. The conversion of RNA nucleotides to DNA nucleotides is complex, and ribozymes may have been unable to perform this reaction. Thus, it seems likely that the evolution of proteins preceded the evolution of DNA. Some have suggested that methyl RNA, which occurs naturally and is chemically intermediate between DNA and RNA in some respects, may have served as a transitional molecule as organisms made the transition from RNA to DNA as their genetic material (Poole, 2000).

     DNA is superior to RNA as a genetic molecule for two reasons. First, its sugar lacks the reactive hydroxyl group on the second carbon of the pentose. Second, DNA uses thymine rather than uracil, so a cytosine that mutates to uracil can be recognized as damage; if RNA were the genetic molecule, such mutations would go unrecognized and degrade the genetic message. Given that there are two distinct viral enzymes which synthesize thymine, DNA could have evolved as a genetic material twice (Forterre, 2006). It is possible that the original DNA genome was single stranded. The transition to DNA probably occurred before the transition to thymine; the original DNA probably contained uracil. Some viruses possess DNA molecules that use the base uracil instead of thymine, which is referred to as U-DNA. It has been suggested that the transition from an RNA world to a DNA world occurred in two steps, with U-DNA representing the first stage (Forterre, 2005). Some viruses encode their own DNA replication proteins, so it is possible that cells obtained their DNA replication proteins from a virus (Forterre, 2002).

     Eubacteria and archaebacteria (and some viruses) may possess either of two non-homologous thymidylate synthases. The distribution of the proteins used in DNA metabolism does not follow the usual pattern. Transcriptase, which occurs in all three domains, can catalyze reactions with RNA in addition to those with DNA (Forterre, 2002).

 

LUCA AND DNA REPLICATION

The following picture is of a fly polytene chromosome, which is composed of DNA.


     Some aspects of DNA replication are conserved in all organisms, such as origin recognition, bidirectional replication, the use of RNA primers, and the replication of leading and lagging strands. Other essential components of DNA replication are unrelated or only distantly related between eukaryotes and eubacteria. This suggests that LUCA did not replicate DNA the way that modern cells do, and that many aspects of replication evolved independently in the main lineages of organisms. Cells in the different domains of life can also possess different mechanisms for the synthesis of DNA nucleotides (Leipe, 1999; Forterre, 2002; Woese, 1998). Although LUCA possessed DNA polymerases which would be inherited by all its descendants, eubacteria and archaea/eukaryotes replicate their DNA in distinct ways. The modern mechanisms for replicating DNA evolved twice from a simple ancestral pattern: once in the eubacterial lineage and a second time in the archaeal/eukaryotic lineage (Woese, 2002).

      Organisms in different domains of life can use nonhomologous enzymes to catalyze the same steps of DNA replication. Non-homologous enzymes carry out the functions of DNA topoisomerase I, DNA topoisomerase II, DNA primase, DNA-dependent RNA polymerase, and thymidylate synthase. DNA polymerase is encoded by several families of proteins. Families A and B are related to RNA polymerase, reverse transcriptase, and the DNA polymerases of viruses. Families C, D, X, and Y are unrelated (Forterre, 2002).

     Topoisomerases are the only enzymes that can perform a vital step of DNA replication, relieving the twisting strain in the double helix and separating the intertwined daughter molecules, although this step would not have been required in the RNA world. DNA topoisomerases probably originated soon after the transition from RNA to DNA. Different families of topoisomerases arose independently. Some topoisomerases are homologous to proteins of other gene families, such as tyrosine recombinase, eukaryotic decatenase, and proteins used in mismatch repair. Among the possible explanations for the unusual distribution of topoisomerases among organisms is that viruses introduced enzymes into separate early cells (Forterre, 2007). There are two non-homologous families of Type II DNA topoisomerases, just as there are non-homologous proteins which perform equivalent functions in other parts of DNA replication. TopoIIA is known in bacteria, eukaryotes, viruses, and some archaea, while TopoIIB is known in archaea and some plants. It is thought that TopoIIA was present in ancestral eubacteria and TopoIIB was present in archaea. Endosymbiosis or viral transfer might explain the current distribution of the enzymes (Gadelle, 2003).

It has been proposed that some of the proteins involved in DNA replication might have arisen from viruses or plasmids (Forterre, 1999; Filee, 2002).

    DNA might have existed first as U-DNA before uracil was replaced by thymine.  Some bacterial viruses actually utilize U-DNA and might be relics from this stage of evolution (Forterre, 2005).

 

 

WERE THE FIRST CELLS WARM OR HOT?

     There is disagreement over whether the first cells lived at hot or moderate temperatures. Since RNA is not as stable at high temperatures, some feel that the ancestor of modern cells was probably not a hyperthermophile (Forterre, 2002). Some have suggested that the adaptation of archaea to high temperature was a derived condition rather than an ancestral one, and that it required the modification of ancestral cell membrane sn-1,2 glycerol ester lipids to sn-2,3 glycerol ether lipids (Xu, 2002). An opposing view suggests that LUCA lived at a high temperature. If the number of codons dedicated to each amino acid reflects the relative abundance of each in early proteins, there is evidence that later steps in the establishment of the genetic code occurred at high temperatures, given the abundance of codons for those amino acids which are most common in thermophilic organisms (DiGiulio, 2000). When a family tree of all life is generated from comparisons of small-subunit ribosomal RNA, the “root” of this tree (the sequences which seem to be closest to the points at which the major branches diverge) is occupied by archaea and eubacteria which live at high temperatures. This seems to indicate that LUCA was a hyperthermophile (DiGiulio, 2003; Barion, 2007).
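The codon-count argument starts from how many codons the standard code assigns to each amino acid. This sketch simply tallies those counts from the standard table. The text does not list which amino acids are enriched in thermophiles, so the sketch stops at the counts rather than attempting the thermophile comparison itself.

```python
from collections import Counter

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of sense codons assigned to each amino acid (one-letter codes).
codons_per_amino_acid = Counter(aa for aa in STANDARD_CODE if aa != "*")

if __name__ == "__main__":
    # Most heavily encoded amino acids first; ties broken alphabetically.
    for aa, n in sorted(codons_per_amino_acid.items(), key=lambda item: (-item[1], item[0])):
        print(aa, n)
```

Leucine, serine, and arginine each receive six codons, while methionine and tryptophan receive one; the argument in the text asks whether such differences track amino acid usage in thermophiles, which this tally alone cannot show.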

     Some feel that the transition from precellular ancestors to true cells occurred separately for the three kingdoms and that, as a result, there technically never was a “first cell” ancestral to all life today. In eubacteria, the transition between precellular ancestors and the first cells involved the formation of the unique peptidoglycan cell wall. (This cell wall is almost universal in the bacteria; its absence in Mycoplasma and Planctomyces is thought to represent secondary losses.)

      A diversity of archaeal cell wall materials (such as in the archaea depicted below) and cell membrane materials is known; the archaeal lineages diverged from each other before a common structure was established. The early archaea probably lacked a cell wall. The cell membrane in archaea is composed of L-glycerol phytanyl ether lipids, as opposed to the D-glycerol acyl ester lipids of eubacteria and eukaryotes (Kandler, 1994).


      The minimum complexity or size of the simplest organism that can truly be considered alive is not known. In addition to the strong possibility that the simplest living organisms became extinct eons ago, new discoveries continue to be made concerning modern microbes. Nanobacteria are a recently discovered group of proteobacteria which typically range in size from 0.2 to 0.5 microns and can be as small as 0.05 microns. They are known from human blood. There is some disagreement over whether they can cause the formation of calcium apatite in the environment (Kajander, 1998; Carson, 1998).