Collagen is the most abundant animal protein and was previously known only in animals.  Collagen is now known to exist in fungi, where it makes up the fimbriae, which function in cell-to-cell communication.  Animal cells can interact with fungal collagen in much the same way that they interact with animal collagens (Celerin, 1996).

Collagenous sequences are known from vertebrate proteins such as acetylcholinesterase, C1q (a complement protein), pulmonary surfactant apoprotein, several lectins, and type I macrophage scavenger receptor.  The bacterium Streptococcus pyogenes possesses a collagen-like sequence in the enzyme hyaluronidase (Stern, 1992).

    A number of invertebrates synthesize collagen fibrils very similar to the type of collagen found in vertebrates; collagen fibrils are even known from the simplest animals, the sponges and cnidarians (the following images are of cells from a sponge and from the cnidarian Hydra). 


Sequence comparisons and intron-exon locations identify one of the known sponge collagens as homologous to the fibrillar collagens of deuterostomes (collagen types I, II, III, V, and XI), with the Gly-Xaa-Yaa triplet repeats needed for helix formation.  Since sponges can contain both fibrillar and non-fibrillar collagens, the amplification of this gene family had already begun in the early animals (Exposito, 1990).  Type IV collagen is known from sponges (Boute, 1997), and both vertebrates and invertebrates use collagen IV in their basement membranes.  Basement membranes exist in cnidarians and higher animals (Garrone, R., from Muller, 1998).
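The Gly-Xaa-Yaa repeat mentioned above can be illustrated with a short Python sketch that tests whether glycine occupies every third position of a peptide.  This is only an illustration: the function name and the example peptides are hypothetical, the sequences use one-letter amino acid codes, and real collagen chains also contain non-helical regions that such a simple test would reject.

```python
def is_gly_xaa_yaa(peptide: str, min_repeats: int = 2) -> bool:
    """Return True if the peptide is a whole number of Gly-Xaa-Yaa triplets.

    Glycine (G) must occupy the first position of every triplet;
    the Xaa and Yaa positions may hold any residue.
    """
    seq = peptide.upper()
    # The peptide must split evenly into triplets and be long enough.
    if len(seq) % 3 != 0 or len(seq) < 3 * min_repeats:
        return False
    # Check positions 0, 3, 6, ... for glycine.
    return all(seq[i] == "G" for i in range(0, len(seq), 3))


# Hypothetical peptides, not taken from any real collagen sequence.
print(is_gly_xaa_yaa("GPAGPKGEP"))  # glycine at every third position
print(is_gly_xaa_yaa("GPAAPKGEP"))  # second triplet starts with alanine
```

The first peptide passes the test and the second fails, because a residue larger than glycine sits at a position that would have to pack inside the triple helix.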

     Collagen is the most abundant protein in the human body. There are at least 19 kinds of vertebrate collagen coded by at least 33 genes. In humans, collagens are synthesized by a number of cell types, including fibroblasts and several kinds of epithelial cells.  The basic organization of collagen is a set of three intertwined helical α chains (which may represent the products of one, two, or three genes).  Each turn of the helix contains about three amino acids, and every third amino acid is glycine, which has the smallest side chain and is the only amino acid that can fit inside the triple helix.  Proline is the second most common amino acid in collagen, and many parts of the chain are repeating units of Gly-Pro-X (glycine, proline, followed by any amino acid).  It is thought that the “primordial unit” of collagen is a 54 base pair DNA sequence encoding 6 Gly-X-Y amino acid triplets (X is often proline and Y hydroxyproline).  This primordial unit was then duplicated to produce multiple exons in collagen genes.  In the α(I) collagen gene, there are 21 exons which consist of this primordial unit, 9 exons which consist of 2 primordial units, 1 exon of three primordial units, and 10 exons which consist of 1 or 2 primordial units with 9 base pairs deleted (Darnell, p. 907).  The α2(I) gene possesses 42 exons (of 52) which encode these repeats for a total of 338 repeats (Brown, p. 473).  Thus the entire gene is based on duplications of a much smaller functional unit.  Each exon encodes a complete set of triplet amino acid repeats.  The helical collagens also contain non-helical regions, which give the molecules varying structures and binding properties.

One primordial unit is 6 replicas of an amino acid triplet:


Twenty-one exons code for one primordial unit (each blue block below represents a triplet; each set of six boxes represents a primordial unit).

5 exons consist of one primordial unit with a 9 base pair deletion (coding for one triplet less). Nine exons consist of 2 primordial units.

5 exons consist of two primordial units with a 9 base pair deletion (coding for one triplet less).


One exon consists of three primordial units.
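The exon sizes implied by the primordial unit can be worked out with simple arithmetic, sketched here in Python.  The constants come from the text (6 triplets per unit, 3 amino acids per triplet, a 9 base pair deletion); the function and variable names are hypothetical, and the sketch assumes the standard 3 base pairs per amino acid.

```python
# One primordial unit = 6 Gly-X-Y triplets (from the text).
BP_PER_CODON = 3           # base pairs per amino acid (standard genetic code)
RESIDUES_PER_TRIPLET = 3   # Gly, X, Y
TRIPLETS_PER_UNIT = 6
UNIT_BP = BP_PER_CODON * RESIDUES_PER_TRIPLET * TRIPLETS_PER_UNIT  # 54
DELETION_BP = 9            # one triplet's worth of DNA


def exon_length(units: int, deleted_bp: int = 0) -> int:
    """Length in base pairs of an exon built from whole primordial units."""
    return units * UNIT_BP - deleted_bp


def triplets_encoded(length_bp: int) -> int:
    """Number of Gly-X-Y triplets encoded by an exon of the given length."""
    return length_bp // (BP_PER_CODON * RESIDUES_PER_TRIPLET)


# Exon classes described in the text: (label, units, deleted base pairs).
EXON_CLASSES = [
    ("one unit", 1, 0),
    ("two units", 2, 0),
    ("three units", 3, 0),
    ("one unit, 9 bp deleted", 1, DELETION_BP),
    ("two units, 9 bp deleted", 2, DELETION_BP),
]

for label, units, deleted in EXON_CLASSES:
    length = exon_length(units, deleted)
    print(f"{label}: {length} bp, {triplets_encoded(length)} triplets")

# Exon counts given in the text: 21 + 9 + 1 + 10.
total_exons = 21 + 9 + 1 + 10
print("repeat-encoding exons:", total_exons)  # 41
```

Under these assumptions the exon classes are 54, 108, 162, 45, and 99 base pairs long, and each still encodes a whole number of triplets, which is why a 9 base pair deletion removes exactly one triplet without disturbing the repeat.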


     The head-to-head orientation of the pairs of type IV collagen genes on three human chromosomes suggests that an ancestral duplication and inversion was followed by several rounds of duplication (Boute, 1997).

     In general, the animals of the Cambrian Period are stouter than those which preceded them in the Ediacaran fauna.  The tendons and ligaments involved in this change of body structure would have involved collagen.  Collagen is the major structural protein in animals and more than 16 types are known.  About one third of the types of collagen (types I, II, III, V, and XI) form fibrils made of helical trimers while the rest are nonfibrillar.  While collagens are known in invertebrates from sponges through insects (such as the damselfly in the following image), they have not been as well studied as the collagens of vertebrates.     




      The major collagen of skin, tendons, and bone is composed of two chains of collagen 1a1 and one chain of collagen 1a2.  Different tissues can modify this collagen after it is produced (by varying the degree of cross-linking between chains).  Mutations in either of these genes can result in osteogenesis imperfecta or Ehlers-Danlos syndrome.  Specific symptoms include abnormal skeletal development, rupture of membranes, hemorrhage, stillbirth, osteoporosis, and dwarfism.  Some mutations of the collagen 1a2 gene can cause Marfan syndrome.  Human compact bone is depicted in the following image.



Although type II collagen (Col2α1) was once thought to be found only in the cartilage of jawed vertebrates, it has been found in the cartilage of both hagfish and lampreys. Lancelets possess a ColA homolog which is expressed in the notochord. Some have hypothesized that the material of the notochord should be classified as a primitive form of cartilage and that notochord cells were the precursors of chondrocytes. Duplication of this ancestral ColA gene was probably important in the evolution of vertebrates (Zhang, 2006).

     The collagen 2a1 gene is expressed in cartilage (hyaline cartilage pictured below) and the eye.  Mutations can lead to problems with cartilage formation.



     Collagen 3 is expressed in blood vessels (such as those in the following image) and during fetal development.  Mutations cause Ehlers-Danlos syndrome and aneurysms of arteries.



     Collagen 4a1 forms a mesh of four chains held together at their ends.

     Collagen 4a2 is used in the basement membrane separating epithelia from connective tissue and it forms a unique loop within its triple helix. 


     Collagen 4a3 is used in the basement membrane and the glomeruli of the kidneys (as are the next two genes).  Mutations result in abnormal basement membranes, Alport syndrome, and nephritis.

      The collagen 4a4 gene is located next to the collagen 4a3 gene.  Mutations in this gene also cause Alport syndrome and nephritis.

      Collagen 4a5 is expressed in the glomeruli of the kidney (that of a fish is pictured below).  Since the gene is on the X chromosome, it is responsible for the sex-linked form of Alport syndrome.


       Collagen 4a6 contains 2 alternate promoters for 2 alternate transcripts.  One is expressed in the placenta, the other is expressed in the kidney and lung.



     Collagen 5a1 and collagen 5a2 compose the collagen of the placenta and skin (and are minor components of the collagen of other tissues).  Mutations cause Ehlers-Danlos syndrome.  The collagen of the dermis is depicted in the following image.


      Collagen 5a3 is most highly expressed in the mammary gland, placenta, and fetal heart and lung (it is expressed to a lesser degree in the adult heart and brain).



      Collagen 6a1 is expressed in the aorta and its mutations cause myopathy.  Mutations in collagen 6a2 cause myopathy and muscular dystrophy. Collagen 6a3 is expressed in the developing mammalian heart; mutations cause myopathy.



      Collagen 7 is used in extraembryonic membranes and the basement membrane under stratified squamous epithelia (as in the following image).  Mutations can cause epidermal problems.



      Collagen 8a1 is expressed in endothelial cells, the skin, the cornea, the lens, and mesenchyme. Collagen 8a2 is expressed in the cornea and collagen 8a3 is expressed in the retina.  The retina in the eye of an embryonic pig is depicted below.



     Collagen 9, a minor collagen in cartilage and other tissues, is composed of collagen 9a1, collagen 9a2, and collagen 9a3 chains.  Mutations can cause myopathy and epiphyseal dysplasia.



     Collagen 10 is a minor collagen of cartilage; mutations lead to abnormalities of cartilage.



      Collagen 11a1 and collagen 11a2 are used in cartilage.



     This collagen is one of several collagens (including 9, 12, 14, and 16) which interact with other proteins in the extracellular matrix.



     This collagen is used in skin and tendons (tendon pictured below).



     This collagen is used in blood vessels and muscle (human skeletal muscle pictured below).



     This collagen allows collagens 1 and 2 to interact with other matrix proteins.



     Collagen 17 is expressed in cell junctions called hemidesmosomes.



      This collagen is involved in interactions with other matrix proteins.





Tenascin is an extracellular matrix protein which is known in both vertebrates and invertebrates.  Sponges, the most primitive animals, also possess tenascin (Humbert-David, 1993).