The DNA genetic code is practically universal among
modern organisms; it must have been present in the last common ancestor
of all modern living things (LUCA). Genetic analysis suggests that much
of the genetic complexity of living organisms has resulted from the splicing
of small genetic units to form larger ones, shuffling these small units
to create a variety of gene products from the same genetic units, and
adapting specific protein folds (called domains) in a variety of diverse
proteins.
In the genetic code, triplets of DNA bases code for the amino acids
of proteins. If these triplets are compared to three-letter words in a language,
all living things speak the same language: with only minor exceptions, the
same triplet of bases codes for the same amino acid in every organism on
the planet. A given DNA sequence (gene) would therefore be read the same
way and produce the same amino acid sequence (protein) in any of them.
In other words, if the following genetic message were copied from DNA into
RNA and sent to the cell to make protein: GGUGAUAAGAGGCGGUCGCCGCUG, all living
things would insert the same amino acids in the same order (glycine,
aspartic acid, lysine, arginine, arginine, serine, proline, and leucine). Just
as it is not necessary that all languages on earth use the same words,
it is not necessary that all living things use "CUG" to code
for leucine (as opposed to another of the 20 amino acids). The fact that
a certain gene sequence would result in the exact same protein in organisms
as diverse as humans, oak trees, and algae (with only minor variations
possible in bacteria) indicates that all living organisms have descended
from a common ancestor using this genetic code.
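The reading of the example message can be sketched in a few lines of Python. This is only an illustration: the table below holds just the eight codons that appear in the message (the full standard code has 64), and the names CODON_TABLE and translate are made up for this sketch.

```python
# Partial codon table: only the eight codons used in the example message.
# The full standard genetic code has 64 codons; this subset is illustrative.
CODON_TABLE = {
    "GGU": "glycine",
    "GAU": "aspartic acid",
    "AAG": "lysine",
    "AGG": "arginine",
    "CGG": "arginine",
    "UCG": "serine",
    "CCG": "proline",
    "CUG": "leucine",
}


def translate(mrna):
    """Read an RNA message three bases at a time and look up each codon."""
    if len(mrna) % 3 != 0:
        raise ValueError("message length must be a multiple of three")
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    return [CODON_TABLE[codon] for codon in codons]


message = "GGUGAUAAGAGGCGGUCGCCGCUG"
amino_acids = translate(message)
print(", ".join(amino_acids))
```

Because the lookup table is the same for nearly every organism, the same function would serve for a message taken from a human, an oak tree, or an alga.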
LUCA: THE LAST UNIVERSAL COMMON ANCESTOR
There are three kinds of cell on earth today (the eubacteria, archaea,
and eukaryotes). By analyzing the molecular mechanisms that these three
types of cell have in common, it is possible to make predictions of what
the last universal common ancestor of all cells alive today (LUCA) might
have been like. Although all modern organisms have the same requirements
of DNA replication, transcription, and translation, there are some differences
in these processes between the major groups of living organisms today.
This suggests that LUCA's replication, transcription, and translation
mechanisms were not complete at the time when the three domains of living
organisms diverged. RNA synthesis was present in LUCA but it was less
advanced than protein synthesis (Olsen, 1997). Fundamental differences
do exist between the three domains (for example, only bacteria use fMet,
formylmethionine, to start translation). This suggests that translation
was not fixed in LUCA at the time of the divergence of the three major
domains (DiGiulio, 2001).
The genetic code seems to have been established just prior to LUCA (Xue, 2003).
INTRONS AND EXONS
What did the original genes look like? Most prokaryotic genes are continuous
coding units while most eukaryotic genes are divided structures, split
into segments known as exons which include the coding sequences for proteins
and intervening sequences known as introns.
INTRONS AND THE RNA WORLD
In the process of converting a DNA message into protein, the DNA is copied
into RNA. The introns of the RNA must be removed and the exons joined
before RNA can leave the nucleus and be translated into protein. How are
introns removed? There are two general mechanisms: some require a complex
known as a spliceosome and others remove themselves. Those that remove
themselves show the kind of catalytic activity expected of RNAs whose function
preceded the transition to proteins. The spliceosome itself is built around
catalytic RNAs, known as snRNAs. There is another class of RNA, snoRNA,
which is essential for processing the RNA found in ribosomes (Mishra,
1997). Perhaps the most interesting aspect of the snoRNAs is that most
of them are not encoded by their own genes but are rather encoded by the
introns of other genes. Could it be that the oldest introns were actually
functional RNA molecules? (Maxwell, 1995; Poole, 1998; OMIM).
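The basic bookkeeping of intron removal (cut out the intervening sequences, join the exons) can be sketched as follows. The transcript and the exon coordinates are invented for illustration, and the function name splice is hypothetical; the sketch says nothing about how a spliceosome or a self-splicing intron actually finds the boundaries.

```python
def splice(pre_mrna, exon_ranges):
    """Join the exon segments of a transcript, dropping what lies between.

    exon_ranges is a list of (start, end) pairs in transcript order,
    0-based with end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exon_ranges)


# Hypothetical transcript: uppercase marks exons, lowercase marks introns.
pre_mrna = "AUGGCU" + "guaaguuuucag" + "GAUAAG" + "guacgucuuag" + "CUGUAA"
exons = [(0, 6), (18, 24), (35, 41)]

mature = splice(pre_mrna, exons)
print(mature)
```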
Since genes come in pieces, a cell can shuffle these pieces to produce
a diversity of different proteins.
In the original cells, large RNA molecules (whether RNAs which functioned
on their own or RNAs coding for proteins) could have been assembled from
smaller units which were spliced together.
The exon theory of genes proposes that the original genes were small RNAs,
each coding for 15-20 amino acids. The splicing performed by introns
was thus vital in processing the first proteins (DiGiulio, 2001). One
of the ways that higher eukaryotes generate such a diverse repertoire
of proteins is through alternative splicing of the original transcripts.
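How a handful of exons can yield several different products can be shown with a toy model. The sketch below assumes that some exons are always kept and the rest are optional, and it simply lists every combination in the original order. Real splicing is regulated and does not produce every combination; the exon labels and the function name isoforms are made up for this example.

```python
from itertools import combinations


def isoforms(exons, constitutive):
    """Yield each exon combination that keeps the constitutive exons and
    preserves the original order of the exons it retains."""
    optional = [i for i in range(len(exons)) if i not in constitutive]
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            indices = sorted(set(constitutive) | set(kept))
            yield "-".join(exons[i] for i in indices)


# Hypothetical gene with four exons; the first and last are always kept.
exons = ["E1", "E2", "E3", "E4"]
variants = list(isoforms(exons, constitutive={0, 3}))
print(variants)
```

With two optional exons the model gives four products; each additional optional exon doubles that number.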
Many of the essential portions of proteins form a specific protein fold
(domain), and it is this part of the protein that binds DNA, binds
ATP, forms the active site of an enzyme, or carries out some other function. For example,
the zinc finger fold binds DNA and is a requirement for all the zinc finger
transcription factors, allowing them to bind DNA. The original zinc finger
proteins have been duplicated hundreds of times to produce a superfamily
of proteins which bind DNA. Variations between different members of the
superfamily allow them to bind to specific regions of DNA while retaining
the zinc finger protein fold as the essential part of the protein.
How many protein folds (domains) are there? Not as many as one might think.
These folds may be central elements of different proteins: the average
fold is known to be incorporated into over 100 different proteins but
some (such as the TIM barrel, the immunoglobulin fold, the Rossmann fold,
the ferredoxin fold, and the helix-turn-helix bundle) are incorporated
into thousands of different proteins each. The twenty-five most abundant
folds are parts of 61% of proteins with structural homologues throughout
all groups of life (Gerstein, 1997). Although each major group of organisms
has a different distribution of these folds (for example, immunoglobulins
for intercellular communication and zinc fingers for gene regulation are
among the ten most abundant folds in animals but not in plants or eubacteria),
there are many folds which are shared (Gerstein, 1997). There are only
a few thousand protein domains known in living organisms. Only 7% are
unique to vertebrates (Liu, 2001; International Human Genome Sequencing