Collagen is the major extracellular protein in all animals. Collagenous sequences are also known from other vertebrate proteins such as acetylcholinesterase, C1q (a complement protein), pulmonary surfactant apoprotein, several lectins, and the type I macrophage scavenger receptor.  The bacterium Streptococcus pyogenes possesses a collagen-like sequence in the enzyme hyaluronidase (Stern, 1992).

     Until recently, collagen was known only from animals.  Collagen also exists in fungi, where it composes fimbriae which function in cell-to-cell communication.  Animal cells can interact with fungal collagen much as they interact with animal collagens (Celerin, 1996).

Collagen is the most abundant protein in the animal kingdom and in the human body. There are at least 19 kinds of vertebrate collagen coded by at least 33 genes.  A number of invertebrates have collagen fibrils very similar to those of vertebrates, and collagen fibrils are known even from cnidarians and sponges.  Sponge collagen is homologous to that of vertebrates.



   Vertebrates and invertebrates use collagen IV in their basement membranes.  Basement membranes exist in cnidarians and higher animals (Garrone, R., from Muller, 1998).

     The basic organization of collagen is a set of three intertwined helical polypeptide chains (α chains), which may represent the products of one, two, or three genes.  In each turn of the helix there are three amino acids, and every third amino acid is glycine (the amino acid with the smallest side chain and the only one which can fit inside the triple helix).  Proline is the second most common amino acid in collagen, and many parts of the chain are repeating units of Gly-Pro-X (glycine, proline, followed by any amino acid).  It is thought that the “primordial unit” of collagen is a 54 base pair DNA sequence encoding 6 Gly-X-Y amino acid triplets (X is usually proline and Y hydroxyproline).  In the α(I) collagen gene, there are 21 exons which consist of this primordial unit, 9 exons which consist of 2 primordial units, 1 exon of 3 primordial units, and 10 exons which consist of 1 or 2 primordial units with 9 base pairs deleted (Darnell, p. 907).  The α2(I) gene possesses 42 exons (of 52) which encode these repeats for a total of 338 repeats (Brown, p. 473).  Thus the entire gene is based on duplications of a much smaller functional unit.  Each exon encodes a complete set of triplet amino acid repeats.  The helical collagens also have non-helical regions which give the molecules varying binding properties and structures.
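The exon arithmetic in this paragraph can be checked with a short Python sketch. The 54 base pair unit, the 9 base pair deletion, and the exon counts come from the text; the even 5/5 split of the ten shortened exons between one-unit and two-unit forms is an assumption made for illustration, and the names used (`PRIMORDIAL_BP`, `exon_length`, `EXON_CLASSES`) are hypothetical.

```python
# Sketch of the exon sizes implied by the "primordial unit" model.
PRIMORDIAL_BP = 54   # one primordial unit: 6 Gly-X-Y triplets
DELETION_BP = 9      # deletion carried by the shortened exons
BP_PER_TRIPLET = 9   # 3 amino acids x 3 bases per codon


def exon_length(units, deleted=False):
    """Length in base pairs of an exon built from `units` primordial units."""
    return units * PRIMORDIAL_BP - (DELETION_BP if deleted else 0)


# (number of exons, primordial units per exon, carries the 9 bp deletion?)
# ASSUMPTION: the ten shortened exons are split evenly, 5 and 5.
EXON_CLASSES = [
    (21, 1, False),
    (9, 2, False),
    (1, 3, False),
    (5, 1, True),
    (5, 2, True),
]

total_exons = sum(n for n, _, _ in EXON_CLASSES)
total_bp = sum(n * exon_length(u, d) for n, u, d in EXON_CLASSES)
total_triplets = total_bp // BP_PER_TRIPLET

for n, u, d in EXON_CLASSES:
    bp = exon_length(u, d)
    print(f"{n:2d} exons x {bp:3d} bp = {bp // BP_PER_TRIPLET} triplets each")
print(total_exons, "exons,", total_bp, "bp,", total_triplets, "Gly-X-Y triplets")
```

Every exon length in this model (54, 108, 162, 45, or 99 base pairs) is a multiple of 9, which is why each exon can encode a whole number of Gly-X-Y triplets.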

One primordial unit consists of 6 repeats of an amino acid triplet (Gly-X-Y).
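As a rough illustration of this repeat, the sketch below builds one hypothetical primordial unit and checks that every third residue is glycine. The function name `is_gly_xy_repeat` is invented for this example, and the letter "O" is used here as a stand-in for hydroxyproline, which is an illustrative choice rather than something taken from the text.

```python
def is_gly_xy_repeat(peptide):
    """Return True if the peptide is a whole number of Gly-X-Y triplets."""
    if len(peptide) == 0 or len(peptide) % 3 != 0:
        return False
    # Glycine ("G") must occupy the first position of every triplet.
    return all(peptide[i] == "G" for i in range(0, len(peptide), 3))


# One hypothetical primordial unit: six Gly-Pro-Hyp triplets.
# ASSUMPTION: "O" denotes hydroxyproline in this sketch.
unit = "GPO" * 6

print(len(unit), "amino acids")        # 6 triplets x 3 residues
print(len(unit) * 3, "base pairs")     # 3 bases per codon
print(is_gly_xy_repeat(unit))
print(is_gly_xy_repeat("GPOAPO"))      # second triplet lacks glycine
```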




Twenty-one exons each code for one primordial unit (in the accompanying figure, each block represents a triplet and each set of six blocks represents a primordial unit).

Five exons consist of one primordial unit with a 9 base pair deletion (coding for one triplet fewer).

Nine exons consist of 2 primordial units.

Five exons consist of two primordial units with a 9 base pair deletion (coding for one triplet fewer).

One exon consists of three primordial units.

The head-to-tail orientation of pairs of collagen genes on three human chromosomes suggests that an ancestral duplication and inversion was followed by several rounds of duplication (Boute, 1997).



     Hemoglobins are heme-containing proteins which reversibly bind oxygen.  They are found in bacteria, fungi, higher plants, most invertebrates, and all vertebrates.  All of them belong to the same globin gene family, having evolved from a single ancestral protein of about 17 kDa.

     In bacteria and yeast, multi-domain proteins combine hemoglobin with another domain, producing a number of functional differences.  Bacterial flavohemoglobin can remove nitric oxide (NO) by reacting it with oxygen to form nitrate.  When oxygen is not present, flavohemoglobin removes NO by promoting its conversion to N2O.  Hemoglobin is an ancient molecule (given its distribution across both prokaryotes and eukaryotes) that first evolved in a world without much oxygen.  Mycobacterium tuberculosis uses hemoglobin to protect itself from the reactive nitrogen molecules of the host’s defenses during infection.  Hemoglobin is similar in structure to peroxidase, which removes dangerous forms of oxygen.  In nematodes, hemoglobin can function both as a peroxidase and to remove NO.

     Yeast hemoglobin has two domains which are both homologous to bacterial flavoheme proteins.  Plants have a variety of hemoglobin molecules, and it is even possible that all plants possess hemoglobins (Zhu, 1992; Anderson, 1996).  Some, referred to as the symbiotic hemoglobins, are found in the nodules of nitrogen-fixing plants (primarily legumes, but nonlegumes as well), where they transport oxygen to nitrogen-fixing bacteria.  Nonsymbiotic hemoglobins form a distinct group which may be present in all plants.

     There are a variety of hemoglobins in invertebrates: single polypeptide chains with one heme group (as in flies); multi-subunit proteins with two heme groups per subunit (known primarily from crustaceans); multi-subunit proteins with multiple heme groups (8 to 20 per subunit; known in crustaceans and mollusks); and multi-subunit proteins in which not all subunits contain heme and subunits can be united by disulfide bonds (this group includes some annelid worms).  Some hemoglobins in invertebrates function inside cells; others are extracellular.

    In the gnathostome lineage, a duplication of a globin gene gave rise to myoglobin and hemoglobin; shortly afterwards, a duplication of the hemoglobin gene gave rise to the alpha and beta globin genes.   A duplication within the alpha genes gave rise to members of the alpha family which are expressed only in the embryo (in contrast, the embryonic members of the beta family originated later; the embryonic beta genes of mammals and birds arose by independent duplications).


ALPHA GLOBIN FAMILY:  A family of alpha globin genes located on chromosome 16 (16pter-p13.3) arose by duplications of a single ancestral alpha globin gene.


5’-----zeta------zeta pseudogene-------alpha pseudogene------alpha 2------alpha 1-----theta----3’




1)     Zeta

     Zeta hemoglobin is functional during human embryonic development, and fetuses suffer a number of problems when the gene is deleted.  If children are homozygous for mutations in the alpha genes, their hemoglobin consists of 2 zeta chains and 2 beta chains, but they are often stillborn.

          There are 3 copies of zeta genes in some Melanesians and Polynesians.



2)     Zeta Pseudogene

Interestingly, a common variation of the zeta pseudogene makes it a functional gene.


3)  Alpha pseudogene

    There are early termination and splice junction mutations which make this gene nonfunctional.



4-5)  Alpha 2 and Alpha 1

     In adults, hemoglobin A is composed of 2 alpha chains and 2 beta chains; hemoglobin A2 is made of 2 alpha chains and 2 delta chains.  Fetal hemoglobin is composed of 2 alpha chains and 2 gamma chains.  Embryonic hemoglobin is composed of 2 alpha chains and 2 epsilon chains.

6) Theta

     This globin gene was first discovered in orangutans and later found to be in humans as well.  It is an old gene, having about as many amino acid differences from alpha hemoglobin as alpha and zeta hemoglobin differ from each other.  The promoter differs from the promoters of the other genes in the alpha family.


BETA GLOBIN FAMILY: A family of beta globin genes located on chromosome 11 (11p15.5) arose by duplications of a single ancestral beta globin gene.





5’---------epsilon----------gamma G---------gamma A-------eta----------delta----------beta---------3’




1)     Epsilon

Two epsilon chains complex with two alpha chains to form embryonic hemoglobin; 4 epsilon chains can also form a tetramer by themselves.  The amino acid sequence is similar to that of beta and delta hemoglobin, and no mutations are known in humans.  In mice, a deletion in the DNA region upstream of the epsilon gene causes the protein to be expressed in adults.



2-3)            Gamma Genes

Most people have two gamma genes (gamma G and gamma A), but some people have 3 or even 5 (there was even one family that had 4 copies on another chromosome).  Two gamma hemoglobin chains complex with two alpha chains to form fetal hemoglobin.  A small amount of this fetal hemoglobin (called hemoglobin F) is detectable in the blood of adults.  Large amounts of fetal hemoglobin may be found in cases in which beta hemoglobin is abnormal or in which there have been mutations in the promoters of the gamma genes.



4)     Eta

Eta hemoglobin apparently was an embryonic hemoglobin in the ancestors of eutherian mammals.  In artiodactyls (deer, cows, giraffes, etc.) it is still a functional gene.  In primates, eta is a nonfunctional pseudogene.  Rodents no longer have any trace of the eta hemoglobin gene.



5)     Delta

Delta hemoglobin can complex with alpha hemoglobin in adults, and the delta gene can be considered a second beta hemoglobin gene.  Thalassemias can result from mutations in the delta hemoglobin gene.  There are examples of fusion hemoglobin genes in which the delta and beta genes have fused into one gene (and even a case of a delta-beta-delta fusion gene).



6)     Beta

In adults, 2 beta chains complex with 2 alpha chains to form hemoglobin.  Some people have two beta genes, and delta chains can apparently be substituted for beta chains.  Mutations in beta globin genes cause a variety of thalassemias, ranging from minor asymptomatic cases to severe cases (in which no beta or delta hemoglobin is made) that result in severe anemia and the persistence of fetal hemoglobin and that require bone marrow transplants.
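The subunit compositions described in the two globin families can be collected into a small lookup table. The Python sketch below is only an illustration: the names `ALPHA_FAMILY`, `BETA_FAMILY`, `TETRAMERS`, and `family_counts` are hypothetical, the pseudogenes are left out, and only the four tetramers whose composition is stated in the text are listed.

```python
# Functional globin chains grouped by gene family, as described in the text.
ALPHA_FAMILY = {"zeta", "alpha", "theta"}             # chromosome 16 cluster
BETA_FAMILY = {"epsilon", "gamma", "delta", "beta"}   # chromosome 11 cluster

# Tetramers whose chain composition is stated in the text.
TETRAMERS = {
    "embryonic": ("alpha", "alpha", "epsilon", "epsilon"),
    "fetal (hemoglobin F)": ("alpha", "alpha", "gamma", "gamma"),
    "adult (hemoglobin A)": ("alpha", "alpha", "beta", "beta"),
    "adult (hemoglobin A2)": ("alpha", "alpha", "delta", "delta"),
}


def family_counts(chains):
    """Return (alpha-family chains, beta-family chains) for a tetramer."""
    n_alpha = sum(chain in ALPHA_FAMILY for chain in chains)
    n_beta = sum(chain in BETA_FAMILY for chain in chains)
    return n_alpha, n_beta


for name, chains in TETRAMERS.items():
    print(name, chains, family_counts(chains))
```

Each listed tetramer pairs two chains from the chromosome 16 family with two from the chromosome 11 family; the same holds for the zeta-beta hemoglobin mentioned under Zeta, since `family_counts(("zeta", "zeta", "beta", "beta"))` also counts two chains from each family.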



     Bacteria regulate their genes to some degree.  For example, they may only synthesize the enzymes to digest a certain food molecule when that food molecule is present.  When eukaryotic cells evolved from the simpler prokaryotic cells, gene regulation became more important (for example, for controlling stages of the more complex cell cycle).  When multicellular fungi, plants, and animals evolved from simpler eukaryotes, gene regulation became more important still.  In most multicellular organisms, cells have specialized: although all cells have the same genes, each cell type expresses a unique set of genes.  As organisms lived for longer periods, certain genes became appropriate only at particular stages of an organism’s life cycle.

     All human cells, including blood, lung, and muscle cells, have the same genes.  Human life would be impossible without gene regulation: liver cells must express genes that are not expressed in brain cells or white blood cells; the genes expressed during fetal development must be different from those expressed in the adult.  In the human genome, almost ¼ of the genes for proteins produce factors needed for the replication of DNA, its maintenance, and the control of gene expression (Brown, p. 21).  Most proteins which regulate gene expression bind to DNA and determine the activity level of genes.

     Despite the large number of genes which regulate the expression of human genes, most of the proteins they encode possess one of two structural designs: the helix-turn-helix motif or the zinc finger motif.


The first transcriptional regulators discovered in bacteria were found to have similar DNA-binding regions: two α-helices separated by a short turn.   Upon binding, the second α-helix rests in the major groove of the DNA double helix.  This helix-turn-helix (HTH) motif is only about 20 amino acids long; other parts of HTH proteins restrict the binding of the HTH region to specific sites on the DNA molecule.   HTH proteins are common in bacteria and eukaryotes; in bacteria the HTH proteins include the lactose repressor of the lac operon and the tryptophan repressor of the trp operon (a standard feature of general biology textbooks).  The helix which interacts with the DNA is typically longer in eukaryotes.

     One family of HTH proteins consists of those which possess a High Mobility Group (HMG) motif.  One member of this family is the product of the SRY gene, which determines sex in mammals.  While sexual reproduction has existed since early eukaryotes, sex chromosomes have evolved in different lineages much more recently.  The mammalian X and Y chromosomes evolved after the mammalian lineage split from that of modern reptiles.

1) SRY

SRY, located on the Y chromosome, is the testis-determining factor.  An XY individual with a mutation in SRY is female, although typically there are no menses (in one case, there were menstrual cycles but no follicles).  In mice, such females can be fertile, but the ovaries fail early in life.  An XX individual with the SRY gene translocated to one of the X chromosomes develops as a male.  Mutations in the SRY gene during development can produce true hermaphrodites: individuals with both male and female tissue.


     One subset of helix-turn-helix proteins which evolved in eukaryotes shares a conserved set of 60 amino acids called the homeodomain.  Several thousand homeodomain family proteins are known in eukaryotes, and higher eukaryotes have clearly expanded on the ancestral set: 6 homeodomain proteins are known from yeast, 82 from the worm C. elegans, 100 from Drosophila, and 160 from humans (Brown, p. 44).   This gene family includes a number of smaller subfamilies, some of which (SIX, TALE, PAX, POU, and HOX) are essential regulators of differentiation during embryonic development.


      Pax genes are important in the development of the eyes and brain.  Mice with mutations in Pax2 and Pax5 suffer a complete loss of the posterior midbrain and cerebellum.  Eyes are found in some species of Hydra which possess a single Pax gene.  In all animals whose eyes have been studied, eye development requires the Pax-6 gene.  Pax-6 is important in the development of eyes in both invertebrates (such as Drosophila) and vertebrates (including humans).  The common involvement of Pax-6 in eye development in both protostomes and deuterostomes suggests that this developmental mechanism was present in their common ancestor.  Nemertine worms, which may represent a lineage whose origin was close to the protostome-deuterostome divergence, express a Pax-6 homolog in their CNS and eye regions (Loosli, 1996).

     The eyeless mutation in fruit flies is a mutation in Pax-6 which reduces the size of the eye (and can even cause its absence).  In humans, Pax-6 mutations can also reduce eye size, and mutations in mice have caused the absence of eyes.

The HOX clusters are fascinating strings of genes which regulate the positional differentiation of cells along several axes in the body (the longitudinal axis, the axis of limbs, the brain, the GI tract, etc.).


    Cnidarians (jellyfish, coral, hydra) have a “proto-Hox” cluster resulting from a tandem duplication of a Hox gene found in sponges.  One of these genes (cnox-1) is similar to the anterior Hox genes of bilaterians, the other (cnox-9) to the posterior bilaterian genes (Peterson, 2002).  In Hydra, genes of the Hox/ParaHox cluster family are involved in the differentiation of head structures.  Comb jellies possess a three-gene Hox cluster, with a medial Hox gene.








Nematode worms have 2 Hox genes in tandem.  No medial gene has yet been identified.  Hox genes in annelids are involved in both segmentation and organogenesis.






     The ancestor of all coelomates had at least 5 Hox cluster genes: 2 anterior genes (lab/Hox1, pb/Hox2), 2 medial genes (dfd/Hox4, Ant/), and 1 posterior gene (AbdB/Hox9). The ribbonworm Lineus is close to the split between protostomes and deuterostomes.  It has one cluster of at least 6 genes: 2 of the anterior class, 3 of the middle class, and 1 of the posterior class.  After the split of the protostome and deuterostome lineages, some protostomes acquired additional Hox genes (such as Ubx, Scr, and AbdA), and vertebrate ancestors acquired additional Hox genes for a total of 13.

(Figure: Hox cluster genes, with panels labeled Drosophila homologs and Vertebrate homologs.)

Sea urchins have a single Hox cluster of 10 genes which is essentially the same as that in chordates despite the differences in body plans such as the lack of many chordate head organs.   










     There is a type of chromosomal mutation known as polyploidy in which organisms possess multiple sets of chromosomes.  Polyploidy is very common in plants and is also known in animals.  The analysis of many genes indicates that at the base of the vertebrate lineage, 2 instances of polyploidy occurred, resulting in vertebrates which possessed 4 copies of every gene that their ancestors had.  Although many of these genes were later lost, the human genome (and the genomes of other vertebrates) still carries evidence of this.  One of the best known examples is that while more primitive animals possess 1 Hox cluster to govern the specialization of cells along the axis of the body, vertebrates have 4 Hox clusters.  These additional clusters allowed the first vertebrates to increase their level of complexity.

     The first regulatory protein found to promote transcription in eukaryotes (TFIIIA) was a zinc finger protein.  Zinc finger proteins are a family of proteins whose loop structures bind ions of zinc (through cysteine and histidine amino acids occurring in conserved positions) and interact with DNA.  While several are known from bacteria, gene duplication has produced large numbers of them in eukaryotic cells.  In the most abundant family of zinc finger proteins (the C2H2 family), yeast possess 34 genes, the worm C. elegans 68, Drosophila 234, and humans 564 (Brown, p. 44).   In vertebrates, another family of zinc finger proteins, that of thyroid and steroid hormone receptors, has produced many important transcription regulators.  Multiple zinc finger regions can occur in the same protein (as many as 37 regions are found in one amphibian protein).
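To make the scale of these expansions concrete, the following Python sketch compares the C2H2 zinc finger counts above with the homeodomain counts given earlier, expressing each as a multiple of the yeast count. The gene counts are those quoted from Brown (p. 44); the names `GENE_COUNTS` and `fold_expansion`, and the choice of yeast as the baseline, are assumptions made for this illustration.

```python
# Gene counts quoted in the text (Brown, p. 44).
GENE_COUNTS = {
    "homeodomain": {
        "yeast": 6, "C. elegans": 82, "Drosophila": 100, "human": 160,
    },
    "C2H2 zinc finger": {
        "yeast": 34, "C. elegans": 68, "Drosophila": 234, "human": 564,
    },
}


def fold_expansion(counts, baseline="yeast"):
    """Express each species' gene count as a multiple of the baseline count."""
    base = counts[baseline]
    return {species: n / base for species, n in counts.items()}


for family, counts in GENE_COUNTS.items():
    for species, fold in fold_expansion(counts).items():
        print(f"{family:17s} {species:11s} {counts[species]:4d} genes "
              f"{fold:5.1f}x yeast")
```

On these figures, the human homeodomain family is roughly 27 times the size of the yeast family and the human C2H2 family roughly 17 times, so both families grew substantially, though not in step with each other.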



Nuclear receptors include the receptors for estrogen, glucocorticoids, mineralocorticoids, thyroid hormone, vitamin D, and retinoic acid in vertebrates, several receptors from insects (such as the ecdysone receptor, ftz regulatory factor 1, and the products of the genes tailless, knirps, and ultraspiracle), and the C. elegans differentiation activating factor.  They all form part of a gene family derived from an ancestral protein with ligand-binding and DNA-binding domains (Amero, 1992).

     Unlike most hormones, steroid and thyroid hormones enter cells rather than binding only on the outside.  Once a hormone has bound its receptor, the hormone-receptor complex travels to the nucleus, where it binds DNA and affects gene transcription.  Nuclear hormone receptors are known from vertebrates, echinoderms, arthropods, and nematodes.  The vertebrate retinoid X receptor (RXR) is homologous to the receptor for juvenile hormone III.  Sponges have retinoic acid and its receptor (Schacke, 1994).  Jellyfish have a receptor similar to vertebrate RXR which binds retinoic acid and then binds the DNA of crystallin genes, as occurs in both vertebrates and other invertebrates (Kostrouch, 1998).

      There were already several different subfamilies of nuclear receptor proteins at the time when protostomes and deuterostomes separated (Laudet, 1992).  Tunicates possess genes for all major peptide hormone receptors (such as those for insulin and gonadotropins), except growth hormone.  Tunicates lack steroid hormones (and the P450 enzymes which synthesize them), but they do possess nuclear receptors, such as those which bind thyroid hormones and retinoic acid (which protostomes lack).  Interestingly, there is one member of the estrogen-related receptor family in both flies and tunicates whose ligand is unknown.  Tunicates possess an iodine-sequestering endostyle, homologs of the thyroid peroxidase which synthesizes thyroid hormones in vertebrates, and iodothyronine deiodinases (which convert thyroxine to T3) (Dehal, 2002).


     The major steroid hormone receptors in mammals (2 estrogen receptors, the progesterone receptor, the androgen receptor, the glucocorticoid receptor, and the mineralocorticoid receptor) all seem to have evolved from an ancestral receptor in primitive vertebrates.  A single indeterminate steroid hormone receptor is present in hagfish, 3 are known to date from lampreys and sharks, and all 6 are known in bony vertebrates (Thornton, 2001).

     Most sex steroid receptors bind to the DNA sequence TGACCT while glucocorticoid receptors bind to the sequence TGTTCT.  There are three amino acids which function in this binding.  A mutation affecting these amino acids can affect hormone binding specificity.  For example, one glucocorticoid receptor mutant can interact with DNA regions recognized by estrogen receptors (Zilliacus, 1994).
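The two recognition sequences above differ at only two positions, which a short Python sketch can make explicit. The two six-base sequences come from the text; everything else (the names `HALF_SITES`, `mismatch_positions`, and `find_half_sites`, and the example DNA string) is hypothetical, and the scan looks only at the strand as written, not its reverse complement.

```python
# Recognition sequences quoted in the text, mapped to the receptors
# said to bind them.
HALF_SITES = {
    "TGACCT": "sex steroid receptors",
    "TGTTCT": "glucocorticoid receptors",
}


def mismatch_positions(a, b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def find_half_sites(dna):
    """Scan one strand for exact matches to the known six-base sequences."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        window = dna[i:i + 6]
        label = HALF_SITES.get(window)
        if label:
            hits.append((i, window, label))
    return hits


print(mismatch_positions("TGACCT", "TGTTCT"))

# Hypothetical stretch of DNA containing one copy of each sequence.
example = "AA" + "TGACCT" + "GG" + "TGTTCT" + "AA"
for position, site, label in find_half_sites(example):
    print(position, site, label)
```

Because only the third and fourth bases differ, a small change in the few amino acids that read those bases is enough to redirect a receptor from one sequence to the other, which fits the glucocorticoid receptor mutant described above.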

     There is an X-linked gene, DAX-1, which, when present in two copies, can cause XY individuals with SRY to develop as females.  It is a nuclear hormone receptor that binds to retinoic acid and regulates transcription.  DAX-1 is not required for normal male development (Zanaria, 1994).