4.6 billion to 3.5 billion years ago


The DNA genetic code is practically universal among modern organisms; it must have been present in the last common ancestor of all modern living things (LUCA). Genetic analysis suggests that much of the genetic complexity of living organisms has resulted from the splicing of small genetic units to form larger ones, shuffling these small units to create a variety of gene products from the same genetic units, and adapting specific protein folds (called domains) in a variety of diverse proteins.

In the genetic code, triplets of DNA bases code for the amino acids of proteins. If these triplets are compared to three-letter words in a language, all living things speak the same language: with only minor exceptions, the same triplet of bases codes for the same amino acid, so a given DNA sequence (gene) is read the same way and produces the same amino acid sequence (protein) in every organism on the planet.
In other words, if the following genetic message were copied from DNA into RNA and sent to the cell to make protein: GGUGAUAAGAGGCGGUCGCCGCUG, all living things would insert the same amino acids in the same order (glycine, aspartic acid, lysine, arginine, arginine, serine, proline, and leucine). Just as it is not necessary that all languages on earth use the same words, it is not necessary that all living things use "CUG" to code for leucine (as opposed to another of the 20 amino acids). The fact that a certain gene sequence would result in the exact same protein in organisms as diverse as humans, oak trees, and algae (with only minor variations possible in bacteria) indicates that all living organisms have descended from a common ancestor using this genetic code.
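
The triplet reading of the message above can be checked with a short Python sketch. It is only an illustration: the names `CODON_TABLE` and `translate` are made up for this example, and the table is the standard genetic code written in one-letter amino acid symbols (`*` marks a stop codon), ignoring the minor variant codes mentioned in the text.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order for
# the first, second, and third base of each codon. '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 RNA codons to its one-letter amino acid symbol.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an RNA message three bases at a time and return the protein."""
    usable_length = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable_length, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


message = "GGUGAUAAGAGGCGGUCGCCGCUG"
print(translate(message))  # GDKRRSPL
```

In one-letter symbols, G is glycine, D aspartic acid, K lysine, R arginine, S serine, P proline, and L leucine, so the last codon, CUG, is read as leucine.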

There are three kinds of cells on earth today (the eubacteria, archaea, and eukaryotes). By analyzing the molecular mechanisms that these three types of cells have in common, it is possible to make predictions of what the last universal common ancestor of all cells alive today (LUCA) might have been like. Although all modern organisms have the same requirements of DNA replication, transcription, and translation, there are some differences in these processes between the major groups of living organisms today. This suggests that LUCA's replication, transcription, and translation mechanisms were not complete at the time when the three domains of living organisms diverged. RNA synthesis was present in LUCA but it was less advanced than protein synthesis (Olsen, 1997). Fundamental differences do exist between the three domains (the use of fMet only in bacteria, for example). This suggests that translation was not fixed in LUCA at the time of the divergence of the three major domains (DiGiulio, 2001). The genetic code seems to have been established just prior to LUCA (Xue, 2003).


What did the original genes look like? Most prokaryotic genes are continuous coding units, while most eukaryotic genes are divided structures: segments known as exons, which carry the coding sequences for proteins, are separated by intervening sequences known as introns.
intron splicing

In the process of converting a DNA message into protein, the DNA is copied into RNA. The introns of the RNA must be removed and the exons joined before the RNA can leave the nucleus and be translated into protein. How are introns removed? There are two general mechanisms: some introns require a complex known as a spliceosome and others remove themselves. Those that remove themselves show a catalytic activity which is expected of RNAs whose function preceded the transition to proteins. The spliceosome itself contains catalytic RNAs, known as snRNAs. There is another class of RNA, snoRNA, which is essential for processing the RNA found in ribosomes (Mishra, 1997). Perhaps the most interesting aspect of the snoRNAs is that most of them are not encoded by their own genes but are rather encoded by the introns of other genes. Could it be that the oldest introns were actually functional RNA molecules (Maxwell, 1995; Poole, 1998; OMIM)?
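
The removal of introns and joining of exons described above can be sketched in Python as a simple string operation. This is a toy model, not a description of the cellular machinery: the function name `splice`, the transcript, and its segment boundaries are all invented for illustration. The only biological convention borrowed is that the example introns begin with GU and end with AG, as spliceosomal introns typically do.

```python
def splice(pre_mrna, introns):
    """Remove intron intervals (0-based, end-exclusive) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical transcript: three exons separated by two introns.
segments = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUUUUCAG"),
    ("exon", "GAUAAG"),
    ("intron", "GUACGUCCCAG"),
    ("exon", "UGA"),
]
pre_mrna = "".join(sequence for _, sequence in segments)

# Work out where each intron sits within the unspliced transcript.
introns = []
offset = 0
for kind, sequence in segments:
    if kind == "intron":
        introns.append((offset, offset + len(sequence)))
    offset += len(sequence)

mature_mrna = splice(pre_mrna, introns)
print(mature_mrna)  # AUGGCCGAUAAGUGA
```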

Since genes come in pieces, a cell can shuffle these pieces to produce a diversity of different proteins.

In the original cells, large RNA molecules (whether RNAs which functioned on their own or RNAs coding for proteins) could have been assembled from smaller units which were spliced together.

exon shuffling

The exon theory of genes proposes that the original genes were small coding RNAs, each coding for 15-20 amino acids. The splicing performed by introns was thus vital in processing the first proteins (DiGiulio, 2001). One of the ways that higher eukaryotes generate such a diverse repertoire of proteins is through alternative splicing of original transcripts.
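
To give a feel for how much diversity can come from a single transcript, the following Python sketch enumerates the products of a deliberately simplified model. It assumes that the first and last exons are always kept and that every internal exon can be independently kept or skipped. The function name `isoforms` and the exon labels are hypothetical, and real splicing is regulated, so not every combination need occur in a cell.

```python
from itertools import combinations


def isoforms(exons):
    """List every exon combination that keeps the first and last exon.

    Internal exons are treated as optional, and exon order is preserved.
    """
    first, *middle, last = exons
    results = []
    for kept_count in range(len(middle) + 1):
        for kept in combinations(middle, kept_count):
            results.append([first, *kept, last])
    return results


gene = ["E1", "E2", "E3", "E4", "E5"]  # hypothetical exon labels
variants = isoforms(gene)
for variant in variants:
    print("-".join(variant))
print(len(variants))  # 8
```

Under these assumptions a gene with n optional internal exons yields 2^n possible products, so three optional exons already give eight.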

Many of the essential portions of proteins form a specific protein fold (domain), and it is this part of the protein that binds DNA, binds ATP, forms the active site of an enzyme, or carries out some other function. For example, the zinc finger fold binds DNA and is a requirement for all the zinc finger transcription factors, allowing them to bind DNA. The original zinc finger proteins have been duplicated hundreds of times to produce a superfamily of proteins which bind DNA. Variations between different members of the superfamily allow them to bind to specific regions of DNA while retaining the zinc finger protein fold as the essential part of the protein.
How many protein folds (domains) are there? Not as many as one might think. These folds may be central elements of different proteins: the average fold is known to be incorporated into over 100 different proteins, but some (such as the TIM barrel, the immunoglobulin fold, the Rossmann fold, the ferredoxin fold, and the helix-turn-helix bundle) are incorporated into thousands of different proteins each. The twenty-five most abundant folds are parts of 61% of proteins with structural homologues throughout all groups of life (Gerstein, 1997). Although each major group of organisms has a different distribution of these folds (for example, immunoglobulins for intercellular communication and zinc fingers for gene regulation are among the ten most abundant folds in animals but not in plants or eubacteria), there are many folds which are shared (Gerstein, 1997). There are only a few thousand protein domains known in living organisms. Only 7% are unique to vertebrates (Liu, 2001; International Human Genome Sequencing Consortium, 2001).