The number of proteases in the human genome (including their non-protease homologs) is estimated to be 500-700, of which about one third are serine proteases (Yosef, 2003). Depending on the catalytic residue or metal ion in their active site, proteases are classified as serine, cysteine, aspartate, threonine, or metalloproteases (Yosef, 2003). Although most are secreted proteins, one group, the Type II Transmembrane Serine Proteases (TTSPs), is bound to membranes (Yosef, 2003).

     The first serine proteases are thought to have been simple digestive enzymes. Gene duplications produced multiple copies of these genes and allowed some to adapt to a variety of more specific functions. Serine proteases are a large family of enzymes in the human genome which function in diverse physiological processes ranging from digestion to coagulation (OMIM; Yosef, 2003). This is an ancient gene family that includes eubacterial digestive enzymes as well as the vertebrate digestive enzymes trypsin and chymotrypsin. Most of these proteins have the amino acid proline at residue 225. However, in vertebrates, some of these proteins possess the amino acid serine at residue 225. This substitution enabled the binding of sodium and novel protein functions. Some serine proteases in blood (such as plasmin and clotting factor XIa) possess a proline at site 225, while others such as thrombin, clotting factor Xa (involved in clotting), and complement protein C1r (involved in immunity) possess a serine. Mutations at site 225 drastically affect the function of thrombin (altering ligand recognition by up to 60,000-fold). The change that allowed some serine proteases to acquire a function in coagulation seems to stem from a single ancestral mutation that changed the amino acid at residue 225 (Guinto, 1998; Dang, 1996).

    The plant Arabidopsis possesses more than 550 protease genes in its genome.  Serine proteases and pepsin-like proteases compose two of the largest groups of proteases with 54 and 59 member genes respectively (Beers, 2004).  Serine proteases and their homologs are the second largest gene family in the genome of fruit flies (Ross, 2003).  Serine proteases are often secreted as zymogens, which must be cleaved in order to be activated.  In mammals, hepatocyte growth factor is a homolog of serine proteases but it has lost its proteolytic activity (Ross, 2003).

     Chymases are a family of serine proteases found in mast cells which process peptide hormones and function in inflammation and the reaction to parasites (Chandrasekharan, 1996). In Drosophila, the membrane protein Rhomboid (RHO) is a serine protease which regulates EGF receptors and is homologous to some bacterial proteins. Other eukaryotes also have RHO sequences (Gallio, 2002).

     The chymotrypsin family of serine proteases is involved in digestion, coagulation, immune reactions, the dissolution of clots, fertilization, and development.  Members of the chymotrypsin family are usually transported out of the cells or into vesicles. Mast cells and neutrophils use proteases which are related to the kallikreins (Krem, 2000; Ross, 2003). 

     Arthropods utilize a cascade of serine proteases in their clotting/immune response. A similar ancestral cascade probably gave rise to the coagulation and complement cascades in vertebrates. The more derived sequences have often acquired additional domains, such as EGF, kringle, and LDL domains (Krem, 2000).

     Serine proteases form 0.6% of the human genome. Uncontrolled action of serine proteases can be a factor in cancer, arthritis, and emphysema. There are at least 6 clusters of serine protease genes in the human genome, the largest of which is the cluster of 15 kallikrein genes on chromosome 19 (Yosef, 2003). Serine proteases can be involved in psoriasis. Some members of the kallikrein cluster on chromosome 19, such as KLK2, KLK3, and prostase, are expressed only in the prostate (Gan, 2000).



ACROSIN ACR is the major protease in the acrosome of spermatozoa (pictured below).


AIRWAY TRYPSIN-LIKE PROTEASE is made by serous glands in the bronchi and trachea and is present in the mucus lining the respiratory tract, at least in those with chronic disease.


CATHEPSIN G is present in neutrophil granules (a neutrophil is pictured below).


CHYMOTRYPSIN is a digestive enzyme. It also has a hypocalcemic function which continues even when its protease function is blocked.


CHYMOTRYPSIN-LIKE PROTEASE CTRL is a digestive enzyme released from the pancreas that functions over a broad pH range.


CHYMASE binds angiotensin I and II and may have a role in determining blood pressure.  Some mammals have multiple chymase genes and it is possible that some genes have been lost during primate evolution.


CORIN converts the hormone precursor pro-ANP to ANP.  Its highest expression is in the myocytes of atria although it is also present in developing kidneys and bones.


ELASTASE is a digestive enzyme and also functions in the mitochondrial inner membrane. It is expressed in the bone marrow, and neutrophils release it to degrade proteins on bacterial membranes (e.g. Shigella, Salmonella, and Yersinia). Mutations cause cyclic hematopoiesis and neutropenia.


GRANZYME A induces apoptosis without utilizing caspases and functions in the cell lysis induced by the cytotoxic and natural killer cells.  It is called the Hanukah factor because of its homology to clotting factor IX which is also known as the Christmas factor.


GRANZYME B is expressed in activated T cells and is involved in target cell apoptosis.


GRANZYME H is expressed in natural killer cells and activated lymphocytes.


GRANZYME K is expressed in the lung, spleen, thymus, and leukocytes.  It is involved in the function of natural killer cells and T lymphocytes.  Leukocytes of the thymus are depicted below.


GRANZYME M is expressed in natural killer cells and activated lymphocytes.


HEPSIN is expressed in most tissues but is most highly expressed in the liver. It may not be an essential enzyme since mutant mice appear normal. (Liver cells are pictured below.)

KALLIKREINs form a gene family of about 25-30 genes.  In mice, there are 12 functioning genes and 12 linked pseudogenes.


KLK1 generates bioactive peptides in the kidney, colon, salivary glands, pancreas, and blood vessels.  A lower expression is linked to hypertension.




KLK3 is the prostate-specific antigen (PSA) whose levels are measured in blood tests for prostate cancer. It normally functions in the liquefaction of semen and its levels are good indicators of prostate activity. Its levels increase in hirsute females.


KLK4, also known as prostase, degrades extra-cellular proteins and activates the prostate-specific antigen.


KLK5 is involved in desquamation of the skin. It is also expressed in the brain, placenta, and kidneys, and is increased in some cancers.


KLK6 is expressed in the brain and some primary tumors.


KLK7 is expressed in the stratum corneum of the skin where it functions in the loss of the superficial sheets of epithelia (desquamation) and controls the thickness of the skin.  It is expressed in other tissues as well and is more highly expressed in breast cancers.  Breast cancer cells are depicted below.


KLK8 has several tissue specific alternate transcripts and is overexpressed in many ovarian carcinomas.


KLK9 is expressed in many tissues and its levels may be useful as an indicator of ovarian cancer.


KLK10 may have a role in tumorigenesis.


KLK11 is expressed in the brain.


KLK12 is expressed in many tissues and is downregulated in some tumors.


KLK13 is primarily expressed in the mammary gland, prostate, salivary glands, and in the testes.  It is downregulated in some breast cancers.


KLK14 is expressed in normal but not malignant tissues.  One transcript is primarily expressed in the prostate while another is only expressed in skeletal muscle.



MEGSIN is produced by megakaryocytes and mesangial cells in glomeruli.


NEUTROPHIL AZUROCIDIN NAZC is homologous to proteases but functions not as a protease but as a chemotactic factor for neutrophils and lymphocytes.


PRSS1 (also known as trypsin 1) mutations cause nutritional edema, hypoproteinemia, pancreatitis, and failure to thrive.


PRSS2 (also known as trypsin 2) is expressed in the pancreas and in other tissues where its function is unknown.  It is one of 8 trypsin genes in the TCRb locus (with TCR receptor elements), of which only three are functional.


PRSS11 cleaves insulin-like growth factor.  It has an IGF binding protein domain and is most highly expressed in the placenta.


PRSS12 is most highly expressed in the lung and brain, where it localizes to the presynaptic region of synapses.


PRSS7 is an intestinal enzyme which activates the pancreatic enzymes trypsin, chymotrypsin, and carboxypeptidase A.  It is the first enzyme in a cascade which activates the digestive proenzymes.


PRSS8 (prostasin) is expressed in many tissues including the epithelia of the prostate.


PRSS15 degrades aconitase in the regulation of mitochondrial function.


PRSS16 functions in vesicles and endosomes of the thymus.


PRSS25 is a homolog of the HTRA genes in E. coli.  It is a nuclear protease involved in stress responses and heat shock in bacteria.  In eukaryotes it is involved in an apoptosis pathway that doesn’t involve caspases.



TMPRSS2 is expressed in several tissues but is highest in the prostate.


TMPRSS3 is expressed in many tissues including the fetal cochlea.  Mutations can cause deafness.


TMPRSS4 is overexpressed in pancreatic and other cancers; it may be involved in tumor invasion.


TMPRSS5 is expressed as alternate transcripts.  One transcript appears only in the brain and another is primarily expressed in the spinal cord.  The spinal cord is depicted below.


TRYPTASE is expressed in mast cells.


TRYPTASE B1 is present in mast cell granules.



Trypsin and chymotrypsin are present in the gut of Daphnia (von Elert, 2004). The serine proteases C1r, C1s, and MASPs all show functional similarities to trypsin, such as cleavage at basic amino acids (Gal, 2007).


Homologs of apoptotic proteins (such as Omi/HtrA2, the high temperature requirement protein A2, and the apoptosis-inducing factor AIF) are known in bacteria (Chose, 2003). Omi/HtrA2 is an apoptotic serine protease which is homologous to the bacterial endoprotease HtrA. In bacteria, HtrA functions in the folding and degradation of proteins. Under normal conditions Omi/HtrA2 is contained in the mitochondria, but in apoptosis it binds (and thus inactivates) the inhibitor of apoptosis proteins (IAPs). AIF and EndoG are widespread in both prokaryotes and eukaryotes. The homology of mitochondrial HtrA-like proteases to bacterial enzymes supports the endosymbiotic origin of mitochondria (Lorenzo, 2004).




     Three serine proteases called MASPs (MBL-associated serine proteases) activate the complement cascade after interacting with MBL or ficolins bound to a microbial substrate. They are homologous to the complement factors C1r and C1s, and share a common structural organization, including serine protease domains (Presanis, 2003). MASP-2 creates the C3 convertase C4b2a by cleaving the complement proteins C4 and C2.

   MASP-1 functions in the immune system but is similar to thrombin (Presanis, 2003). MASP-2 can be alternatively spliced to produce a small MAp19 protein which does not function as an enzyme but participates in the complement pathway (Presanis, 2003). In humans, the MASP1/3 gene encodes both MASP-1 and MASP-3 while the MASP-2 gene encodes both MASP-2 and sMAP (Fujita, 2004).

The liver synthesizes complement proteins, clotting factors, and MBL-associated serine proteases (MASP-1 and MASP-2) (Thiel).

(after Gadjeva, 2001)



MASP1 functions in many vertebrates in defenses against bacteria and binds to polysaccharides in enterobacteria. It activates the complement proteins C3 and C2 without involving the antibody-antigen pathway.


MASP2 activates complement proteins C4 and C2.


Diverse insects have evolved the ability to clot their hemolymph by incorporating into clots proteins that were used for other functions. The proteins which have been secondarily incorporated into clots include oral proteins of fly larvae and the silk proteins of butterflies (Korayem, 2007).


In between the red blood cells in the above pictures, you can see several small purple platelets. Some of the molecules released by platelets in response to an injury are involved in blood coagulation. Some have asked how a clotting factor cascade, in which a number of factors acting one at a time are needed to produce the end result, could have evolved. Interestingly, it seems that the mechanism of blood clotting is a modified pathway which originally was involved in immunity.




The coagulation cascade evolved from a modified duplicate of the complement cascade (Gal, 2007).


     Thrombin, the protein which converts the inactive blood protein fibrinogen into the fibrin which forms a blood clot, is a serine protease. There are a number of serine protease cascades (vertebrate coagulation, vertebrate complement, arthropod hemolymph clotting, and arthropod developmental determination of dorsal and ventral positions), each of which involves three central serine proteases. The third (downstream) proteases, easter (in arthropod development), arthropod clotting enzyme, complement C2, and thrombin, cleave the precursor molecules spatzle, coagulogen, C3, and fibrinogen, respectively, to form the Toll ligand, coagulin, C3a & C3b, and fibrin (Krem, 2002). Many of these serine proteases are homologous. Thrombin is homologous to C1r and C1s of the complement cascade. Human clotting factors VII, IX, and X are homologous to the factor C of the horseshoe crab clotting cascade. The arthropod developmental cascade may be the closest to the ancestral cascade that gave rise to the others (Krem, 2002).
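The parallel structure of these four cascades is easier to see when the terminal step of each is laid out as a record. The following is only an illustrative sketch in Python: the pairings are taken directly from the paragraph above (after Krem, 2002), while the names `CascadeStep`, `TERMINAL_STEPS`, and `products_of` are hypothetical and invented for this example.

```python
from typing import NamedTuple, Tuple


class CascadeStep(NamedTuple):
    """Terminal (third) step of a serine protease cascade, as listed in the text."""
    cascade: str
    protease: str
    substrate: str
    products: Tuple[str, ...]


# Pairings as stated in the text (Krem, 2002); the labels are illustrative only.
TERMINAL_STEPS = [
    CascadeStep("arthropod dorsal-ventral development", "easter", "spatzle", ("Toll ligand",)),
    CascadeStep("arthropod hemolymph clotting", "clotting enzyme", "coagulogen", ("coagulin",)),
    CascadeStep("vertebrate complement", "complement C2", "C3", ("C3a", "C3b")),
    CascadeStep("vertebrate coagulation", "thrombin", "fibrinogen", ("fibrin",)),
]


def products_of(protease: str) -> Tuple[str, ...]:
    """Return the cleavage products listed for the given downstream protease."""
    for step in TERMINAL_STEPS:
        if step.protease == protease:
            return step.products
    raise KeyError(protease)


if __name__ == "__main__":
    for step in TERMINAL_STEPS:
        print(f"{step.cascade}: {step.protease} cleaves {step.substrate} "
              f"to give {', '.join(step.products)}")
```

Each row has the same shape (one downstream protease, one precursor, one or two products), which is the pattern that suggests a shared ancestral cascade.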

  Thrombin can stimulate chemotaxis of monocytes and neutrophils in wound repair and promotes differentiation in macrophages (Banfield, 1992). Clotting factor C has high sequence homology to complement factors C1r and C1s. Functional linkages between development, immunity, and hemostasis are also found in vertebrates. Enzymes of the coagulation cascade participate in immunity, cell growth, and embryogenesis. Thrombin is expressed not only in the liver, the major site of clotting factor synthesis, but also in developing and adult rat brains. Thrombin proteolytically activates protease activated receptors (PARs) connected to G-protein signal transduction cascades, promoting the survival or apoptosis of glial cells and neurons, the survival of myoblasts, and neutrophil chemotaxis. Prothrombin can even promote the migration of cells through the extracellular matrix. Factor Xa can act as a growth factor…(Krem, 2002)


     Tissue plasminogen activator (t-PA) and urokinase-type plasminogen activator (u-PA) can cleave plasminogen.  All are serine proteases.  Plasminogen also has other roles in wound healing, inflammation, and neural degeneration through targets other than fibrin (Hervio, 2000).  Apolipoprotein(a) is homologous to plasminogen and the gene possessed by humans is known only in Old World monkeys and apes.  High levels of apo(a) can interfere with plasminogen activation and increase the risk of heart attack and stroke.  Some insectivores (hedgehogs) possess an apo(a)-like molecule but it seems to have evolved separately from that found in catarrhine primates (Lawn, 1997).

     Within the protease superfamily of genes, one family of related proteins includes clotting factor IX, factor X, factor VII, and protein C. Factors VII and X remain linked on chromosome 13. Clotting factor XII, tissue plasminogen activator, and urokinase are related proteases, as are clotting factors VIII and V. Hagfish possess prothrombin molecules homologous to those of vertebrates (Banfield, 1994). Coagulation factor VII interacts with tissue factor to initiate the extrinsic pathway for coagulation and is known to exist in zebrafish. (Since little is known about the existence and extent of the coagulation cascade in fish, this finding should help direct future research.) The zebrafish domain structure of Factor VII possesses the shared domains found in coagulation factors VII, IX, X and protein C (Sheehan, 2001). The intergenic DNA between factors VII and VIII is similar to the intergenic DNA of the trypsin cluster in Drosophila (Hanumanthaiah, 2002).

     Most birds lack clotting factor IX.  Ostriches also lack factors VII, X, XI, and XII.  Caimans lack additional clotting factors; only factors I, II, and X have been reported.  Ostrich clotting mechanisms seem to be intermediate between those of reptiles and those of higher birds (Frost, 1999).

     The sea urchin protein SpBf is a complement protein which possesses SCR domains, a von Willebrand factor domain, and a serine protease domain.  Sea urchins possess complement C3 proteins which seem to function in opsonization and whose levels increase in response to infection (Smith, 2002). 



In the pufferfish (a teleost), proteins are known which are homologous to most of the 26 mammalian proteins involved in coagulation, although a few are absent and several exist as multiple copies. While the tunicate genome possesses gene family members (and the functional domains) of almost all of these proteins, none of them are truly homologous, indicating that the evolution of the vertebrate coagulation cascade occurred after the separation of the urochordate lineage from that of vertebrates (Jiang, 2003). Teleosts have all the clotting factors found in mammals plus an additional VII-like homolog (Hanumanthaiah, 2002).




      The blood of some invertebrates can clot in response to injury, but not through the cascade found in vertebrates.  The invertebrate coagulogen is not related to fibrinogen (coagulogen is instead similar to nerve growth factor).   However, there are fibrinogen-like molecules, in both vertebrates and invertebrates, including a group known as lectins.  The first fibrinogen-related proteins in invertebrates were discovered by researchers who specifically predicted that such molecules would be found based on an evolutionary model for life’s diversity (Xu and Doolittle, 1990).

     In horseshoe crabs, tachylectin 5A is produced in the hemolymph (rather than being expressed on the cells of the hemolymph as are other tachylectins) and causes the agglutination of bacteria (Adema, 1997; Gokudan, 1999; Kairies, 2001). Its amino acid sequence, 3-dimensional shape, and calcium binding site are homologous to those of fibrinogen. The differences in binding (tachylectin 5A binds carbohydrates while fibrinogen binds the protein of other fibrinogen monomers) are due to 2 loops in fibrinogen (P3 and P1) which are 7 and 14 amino acids shorter than the corresponding loops in tachylectin. Tachylectin 5A molecules can function alone but in vitro (and potentially in vivo) they can combine to form tetramers (which would result in increased specificity in binding). Vertebrates, including humans, have homologues of fibrinogen called ficolins which function in innate immunity and recognize carbohydrate groups on bacteria. Human ficolins bind the same molecules as tachylectin 5A and are more closely related to tachylectins than to fibrinogen (Kairies, 2001). Interestingly, the von Willebrand factor, which is also involved in coagulation, is homologous to invertebrate lectins (Adema, 1997).

     Vertebrate fibrinogen is composed of several subunits, α, β, and γ, in all vertebrates studied (including lampreys). These subunits are homologous, suggesting that a common ancestral gene duplicated to produce the α gene and the precursor of the β and γ genes, which duplicated subsequently. The ancestral β-γ fibrinogen gene seems to have resulted from the fusion of an α-like fibrinogen gene (the N-terminus) with a second gene homologous to cytotactin and pT49 in humans (the C-terminus) (Henschen, 1983; Weissbach, 1990).

Caseins compose more than 80% of the protein in milk. Most caseins interact with calcium and are important not only as a source of amino acids for the newborn, but also for the calcium they carry. κ-casein is not homologous to other caseins and seems to have originated from a modified duplicate of a fibrinogen gene. All therian mammals express κ-casein and at least one of the calcium-sensitive caseins. Mice with mutations in their κ-casein gene fail to lactate (Shekar, 2006).


     Several clotting factors and several proteins involved in the regulation of coagulation undergo a reaction after translation that converts the amino acid glutamate to γ-carboxyglutamate (Gla). Not only is this reaction essential for blood clotting (where it was first discovered), it also has other functions in vertebrates and occurs in several bone proteins, for example. This reaction and the enzyme which catalyzes it (γ-glutamyl carboxylase) were thought to be found only in vertebrates. It is now known in insects and mollusks as well, where γ-carboxylation of glutamate has several roles, such as the production of venom peptides. The γ-glutamyl carboxylase gene is conserved between mammals (including humans), insects, and mollusks. In fact, the intron/exon boundaries correspond surprisingly closely, and eight of the introns appear to have predated the split in coelomate lineages in the Precambrian. Thus, an enzyme which appeared to have been unique to vertebrates is a modified version of an ancestral enzyme which long predated the vertebrate clotting cascade (Bandyopadhyay, 2002).



     Tissue Factor (TF) serves as a cell membrane attachment (tether) for one of the protease enzymes of the clotting cascade (clotting factor VII). It is homologous to cytokine receptors, including the receptors for erythropoietin, interleukins, colony stimulating factor, interferon, and several hormones. This group belongs to the immunoglobulin superfamily, which is one of the largest protein families in the animal kingdom. Before vertebrates evolved a coagulation cascade involving TF, receptors related to TF were already present on cell membranes and their functions included the response to infection (such as might occur after a wound) (Bazan, 1990).



          Plasmin can cleave other substrates far more effectively than its typical substrate.  A negative selectivity seems to have decreased the efficiency of its binding to properly regulate the amount of fibrinolysis (Hervio, 2000).



A number of membrane proteins are activated by the splitting of the membrane portion from a cytoplasmic portion. Once separated from its membrane tether, the cytoplasmic portion executes diverse biological functions. Four families of proteins are known to perform intramembrane proteolysis: the presenilins, the site 2 protease (S2P) family, rhomboids, and signal peptide peptidase (SPP). Presenilins are members of the aspartyl protease family. Among their targets is the Alzheimer's amyloid precursor protein (APP) (Urban, 2002).


A protein fold motif is known in a number of secreted proteins including the whey acidic protein (the primary protein in rodent milk) and in a group of serine protease inhibitors. Humans possess a group of 14 serine protease inhibitor genes on chromosome 20. These proteins function in innate immunity. Most of the genes found in humans are shared among placental mammals (and two pseudogenes in humans represent functioning genes in rodents) (Clauss, 2005).