The first transcriptional regulators discovered in bacteria were found to have similar DNA-binding regions: two α-helices separated by a short turn.  Upon binding, the second α-helix rests in the major groove of the DNA double helix.  This helix-turn-helix (HTH) motif is only about 20 amino acids long; other parts of HTH proteins restrict the binding of the HTH region to specific sites on the DNA molecule.  HTH proteins are common in bacteria and eukaryotes; in bacteria, the HTH proteins include the lactose repressor of the lac operon and the tryptophan repressor of the trp operon (standard features of general biology textbooks).  The helix which interacts with the DNA is typically longer in eukaryotes (OMIM).

     Bacteria, archaea, and eukaryotes all utilize the helix-turn-helix domain in both general and specific transcription factors, including the most widespread prokaryotic transcription factors.  The domain is present in sigma factors, the RPB10 subunit of RNA polymerases, and eukaryotic proteins such as Myb and BRCA2.  The HTH domain has proved to be very versatile, since it has also been incorporated into proteins involved in DNA replication (such as the FtsK-HerA superfamily), transposases, integrases, proteins of RNA metabolism and binding, and even enzymes which bind to proteins rather than nucleic acids (such as the Rio family of protein kinases).  The HTH domains of carbamoyl phosphate synthetase were probably present in LUCA (Aravind, 2005).

    The family of winged helix proteins was first identified in Drosophila with the forkhead gene.  Since then, more than a hundred Forkhead (or Fox) genes have been described, which can function in both embryonic development and adult physiology.  XBF-2 is a member of this gene family which promotes the formation of neural tissue through the transcriptional repression of BMP-4 (Mariani, 1998).  The mammalian forkhead transcription factors are homologous to the DAF-16 transcription factor of nematodes, which functions in the cell cycle and metabolism (Hosaka, 2004).  In animals, Forkhead is expressed in S phase and begins a cascade of transcription factors and B cyclins which regulate the G2/M and M/G1 transitions.  Forkhead genes are not known in plants (Potuschak, 2001).

     The forkhead family had produced several subfamilies prior to the evolution of sponges (Adell, 2004).  Tunicates have homologs of forkhead/HNF-3β (used in differentiation of the neural tube), Pax, and snail (DiGregorio, 1998).  Drosophila embryos utilize the forkhead gene in the embryonic development of structures such as the gut.  The vertebrate homologs consist of a group of winged-helix genes (Pintallavis, HNF-3β, and HNF-3α) which are expressed during gastrulation by the organizer region, axial mesoderm, and floor plate.  HNF-3β is required for development of the notochord (Weinstein, 1994).  One member of the forkhead protein family is HILS1, a spermatid-specific histone-like protein that may function in gene regulation and chromatin remodeling (Yan, 2003).  Mutant mice without Foxo1 gene expression fail to develop blood vessels in the yolk sac and die early in development.  Foxo3a mutations in females result in abnormal development of the ovary (Hosaka, 2004).

     One subset of helix-turn-helix proteins that evolved in eukaryotes shares a conserved sequence of 60 amino acids called the homeodomain.  Several thousand homeodomain family proteins are known in eukaryotes, and higher eukaryotes have clearly expanded on the ancestral set: 6 homeodomain proteins are known from yeast, 82 from the worm C. elegans, 100 from Drosophila, and 160 from humans (Brown, p. 44).  This gene family includes a number of smaller subfamilies, some of which (SIX, TALE, PAX, POU, and HOX) are essential regulators of differentiation during embryonic development.  One homeodomain protein, bicoid (Drosophila)/goosecoid (vertebrates), is one of the first signals expressed in a developing embryo.  The HOX clusters are fascinating strings of genes which regulate the positional differentiation of cells along several axes in the body (the longitudinal axis, the axis of limbs, the brain, the GI tract, etc.).

     Another family of HTH proteins consists of those which possess a High Mobility Group (HMG) motif.  One member of this family is the SRY gene, which determines gender in mammals.






     The HMG (high mobility group) box is found in more than 150 proteins, which include DNA repair proteins and transcription factors of the nucleus and mitochondria (Yang, 2002; Bowles, 2000).  HMG proteins can interact with other transcription factors.  Members of the HMG family contain 3 α-helices and a conserved sequence of 79 amino acids called the HMG box (most proteins have only one such sequence).  This group of transcription factors includes the SRY gene located on the Y chromosome, which begins the process of gender differentiation in male mammals.  These transcription factors bend DNA upon binding to it.

     HMG box proteins evolved before the divergence of the major eukaryotic lineages, and HMG proteins are known in animals, plants, and fungi (Nagai, 2001).  In eukaryotes, UBF (upstream binding factor) contains multiple HMG domains, as does the human mitochondrial transcription factor mtTFA (Yang, 2002).  The TCF/SOX subfamily includes two fungal proteins which determine mating type.  Five HMG proteins are known in maize (Krech, 1999), and HMG-I/Y in plants is involved in fertility (Lafleuriel, 2004).  The nematode C. elegans possesses an HMG protein which binds telomeres (Im, 2003).  Amphioxus possesses a single HMGB gene, which seems to represent the ancestral type of gene that was duplicated to produce the vertebrate HMGB1, HMGB2, and HMGB3 genes.  A single HMGB gene is also known in lampreys and sea urchins.  Two are known in trout (Liu, 2004).

     The HMGA family of HMG transcription factors is involved in cell growth, differentiation, and apoptosis and functions in a number of signal transduction pathways.  Its members are proto-oncogenes, and high constitutive expression levels of these proteins are common in diverse cancers.  Two genes, HMGA1 and HMGA2, are known in mammalian genomes (in addition to pseudogenes) (Reeves, 2001).

    Some HMG proteins possess two HMG boxes.  Flies have such a protein related to HMG1/2.  One of these HMG proteins is known in sea urchins, one in lampreys and multiple members in higher vertebrates.  It seems that a tandem duplication within an ancestral gene with one HMG box produced a gene with two boxes.  This occurred before the evolution of coelomates.  An ancestral HMG gene then duplicated to produce HMG1 and HMG2 in gnathostomes (Sharman, 1997).

      The HMG superfamily can be divided into two subfamilies.  The TCF/SOX/MATA group binds to specific DNA sequences (such as AACAAAG) with a single HMG domain, while the HMG/UBF group possesses multiple HMG domains and is less specific in its binding (Bowles, 2000; Koopman, 2004).
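
The sequence-specific binding described above can be illustrated with a small scan for the AACAAAG site on both strands of a DNA string.  This is only a sketch: the function names and the example sequence are hypothetical, and real binding-site prediction relies on position weight matrices and experimental data rather than exact string matching.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_sites(seq: str, motif: str = "AACAAAG") -> list:
    """List (position, strand) for exact motif matches on either strand.

    Positions are 0-based and refer to the leftmost base of the match
    on the given (forward) sequence.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


# Hypothetical example sequence: one site on each strand.
print(find_sites("GGAACAAAGCCCTTTGTTAA"))
```

The reverse-strand check matters because a protein bound in the opposite orientation recognizes the same site read from the complementary strand.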


HMG1 is a structural protein associated with chromatin in all mammalian nuclei.


HMG4 is expressed during embryonic development.


HMG17 and HMG14 are among the most highly conserved proteins (other than histones) in the eukaryotic nucleus (such as the nucleus of the Amoeba pictured above).  


HMG14 may be involved in some of the characteristics of Down syndrome.  It associates with nucleosomes and seems to activate chromatin for transcription.


There are more than 70 copies of HMG17 in the genome, most of which are retropseudogenes (making this the largest known family of retropseudogenes) (OMIM).


HMG20a is expressed in the spleen, testes, heart, and other tissues.

HMG20b is most highly expressed in the prostate, testis, heart, and kidney.  It is involved in the regulation of the cell cycle and with BRCA2.


HMGIY can exist in two isoforms and can cause cancer.  HMGIY is one of several HMG proteins (such as HMG14/17) which lack an HMG box (Sharman, 1997).


HMGIC can exist in different isoforms and is involved in lipomas and cancers of the uterus.


NHP2L1 is homologous to HMG-like proteins in yeast and is weakly homologous to some ribosomal proteins.  It is expressed in all human tissues.  It has 4 domains: a PWWP domain, HMG box, SET domain, and PHD zinc finger.  Mutations are involved in the Wolf-Hirschhorn syndrome.


SWI/SNF-related protein activates transcription and regulates chromatin.  The SWI/SNF complex of proteins is conserved from yeast through vertebrates and functions as a transcription factor.  In mammals, an additional unit is present which is lacking in flies and yeast.  This unit, BAF57, contains an HMG box (O’Neill, 1998).




T-cell specific transcription factor

TCF7L1, TCF7L2, and TCF7 are a subfamily of related genes that are expressed in lymphoid tissue.


TCF4 is a factor in colon cancer caused by mutations in APC.


LEF1 is expressed in lymphocytes, neural crest cells, teeth, and hair follicles.  Mutant mice lack teeth, mammary glands, whiskers and hair.



Nuclear antigen SP100 is expressed in lymphoid tissues.





     Mammalian genomes (such as those of humans and mice) possess about 20 Sox genes.  While most SOX proteins are transcriptional activators, some can serve as repressors (Argenton, 2004).  The members of this gene family can be divided into 8 groups, most of which are represented by a single gene member in invertebrates.  Single genes of the families B1, B2, C, and D had evolved by the divergence of nematode lineages from those of coelomates (Koopman, 2004).  SOXB1 family members are transcriptional activators expressed in the CNS, while SOXB2 genes seem to function as transcriptional repressors in the CNS (Bowles, 2000).  Two additional genes, members of the E and F families, had evolved by the branching of the coelomate lineages, in addition to a duplication of the B2 gene.  Teleost fish possess all of these families of SOX proteins, with multiple genes in each family.  A teleost fish is depicted below.


  By the origin of the mammalian lineage, three additional genes (Sry, Sox15, and Sox30) had evolved to form the A, G, and H groups (Koopman, 2004).  Introns are conserved in groups D, E, and F (Bowles, 2000).

     Group A contains mammalian SRY genes; Group B1 includes SOX1, 2, and 3; Group C includes SOX4, 11, 22, and 24; Group D includes SOX5, 6, 13, and 23; Group E includes SOX8, 9, and 10; Group F includes SOX7, 17, and 18; and Group G includes SOX20 and 15.  Groups H, I, and J contain one gene member each (SOX30, SOX31, and SOXJ, respectively) (Bowles, 2000).


    Not only have many of the genes themselves been conserved in coelomates, but many of their expression patterns (and presumably their functions) have been conserved as well.  In both vertebrates and Drosophila, group B and group D SOX genes are expressed in the nervous system, group B1 genes are expressed in the eye, and group C genes are expressed throughout the embryo.  Group F expression patterns do not coincide (Cremazy, 2001).


SOX1-3 are the most similar to the SRY gene and are expressed in the developing CNS. 


The Drosophila gene Dichaete is homologous to SOX2, and mouse SOX2 can replace the expression of Dichaete in flies.  This fly gene may approximate the structure of the ancestral gene of the B group of SOX genes (Bowles, 2000).  Dichaete is needed for embryological development and the development of the nervous system (Sparkes, 2001).  SOX2 is required for hair cell differentiation (Kiernan, 2005).


SOX3 is located on the X chromosome.  The marsupial homolog of SOX3 may be the gene from which the SRY gene was derived.  Both genes lack introns (O’Neill, 1998).  Marsupial SRY is more similar to the SOX3 gene on the X chromosome than placental SRY is (Nagai, 2001).  Mutations in SOX3 cause testicular defects and retardation (OMIM).


SOX4 binds to (A/T)(A/T)CAAAG regions in the enhancer elements of genes found in T and B lymphocytes.  The sequence of Sox4, which is expressed in the mammalian testis, is highly conserved in amniotes (Ganesh, 1997).
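
The degenerate site (A/T)(A/T)CAAAG can be expressed as a regular expression in which each ambiguous position becomes a character class.  The sketch below scans only the forward strand of a hypothetical example sequence; the names SOX4_SITE and find_sox4_sites are illustrative, and a match indicates only that the sequence fits the pattern, not that SOX4 binds there.

```python
import re

# (A/T)(A/T)CAAAG: two positions that may be A or T, followed by CAAAG.
# The lookahead (?=...) lets the scan report overlapping matches as well.
SOX4_SITE = re.compile(r"(?=([AT][AT]CAAAG))")


def find_sox4_sites(seq: str) -> list:
    """Return (position, matched sequence) for each forward-strand match."""
    return [(m.start(), m.group(1)) for m in SOX4_SITE.finditer(seq.upper())]


# Hypothetical example sequence containing three variants of the site.
print(find_sox4_sites("CCTTCAAAGGGATCAAAGAACAAAG"))
```

The pattern also covers AACAAAG, the site given in this article for the TCF/SOX/MATA group and for SOX13.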


SOX5 is expressed in the testes and other tissues and can be alternatively spliced in different tissues.  Cells of the testis are depicted below.


     SOX9 is expressed in the brain, heart, and testes.  Sox9 is needed for the endochondral skull elements formed by cranial neural crest cells but not for intramembranous bone or bone formed from mesoderm.  Sox9 is involved in chondrogenesis, being essential for chondrocyte development and for targeting the enhancers of chondrocyte-specific genes, such as the collagen gene Col2a1 (Mori-Akiyama, 2003; OMIM; Lefebvre, 1997/8).  Sox9 mutations cause the skeletal disorder campomelic dysplasia (CD).  In 75% of XY CD patients, sex reversal accompanies this disorder (Koopman, 1999).

     SOX9 mutations cause feminization of XY individuals and autosomal sex reversal (Jordan, 2001; Vaiman, 2000).  In birds and mammals, SOX9 is preferentially expressed in males and functions in testis differentiation.  In frogs, it is expressed in both genders, seemingly involved in the development of both ovaries and testes (Takase, 2000).  Mice with null mutations of Sox9 did not develop cartilaginous structures (such as pharyngeal arches and the nasal apparatus) which are formed by cranial neural crest cells (Mori-Akiyama, 2003).


     SOX10 is expressed in neural crest cells, the PNS, and glial precursors.  Mutations cause the loss of neural crest cells in embryonic development.  Mutations can cause microcornea, pigmentation changes, deafness, intestinal aganglionosis, mental retardation, abnormal EEG, and a white forelock.

Mutations in SOX10 cause pigmentation and neural defects in both mice and zebrafish (Argenton, 2004).



SOX11 is expressed in the nervous system (cerebral tissue is pictured below).


SOX13 binds to the sequence AACAAAG and is expressed in a variety of human tissues.  It may be an autoantigen involved in Insulin-Dependent Diabetes Mellitus (OMIM).


SOX14 is expressed in the fetal brain, spinal cord, testis, and other tissues.  It is involved in the development of limb buds and is highly conserved among the amniotes (OMIM).


SOX17 is required for endoderm development, as is a close homolog found in teleosts, Casanova (Shivdasani, 2002).


SOX18 is most highly expressed in the heart.

Sox18 is a member of the F group of SOX genes.  Mutations in Sox18 cause cardiovascular defects and a loss of vibrissae in mice (Hosking, 2001).


SOX21 is expressed in the embryonic brain in mammals and chickens (chick brain depicted below).


SOX22 is expressed in a variety of human tissues, fetal and adult.


SOX30 is expressed in a variety of fetal tissues and the testes of adults.


The gene Sox100B is conserved in coelomates and it is expressed in the gastrointestinal tract, gonads, and excretory structures of both flies and vertebrates (Loh, 2000).



     The following picture is of a set of human chromosomes.  Whether an individual with these chromosomes develops male or female patterns of protein expression is determined primarily by the presence or absence of a single gene: SRY.


    The sex-determining SRY gene is a member of the SOX gene family.  SOX genes have been conserved in coelomates, including Drosophila.  DSox14 is similar to SRY (Sparkes, 2001).  In vertebrates, a number of SOX genes can be expressed in the gonads, including Sox5, 6, 8, 9, 17, 20, 23, 24, and 30.  A duplication of a SOX gene, probably the ancestor of SOX3, gave rise to a gene named SRY which is not only expressed in the testis, but is the testis-determining factor located on the Y chromosome of therian mammals (placentals and marsupials).  To date, the SRY gene has not been identified in monotremes.  It is present in marsupials, although it has not been demonstrated to function in sex determination.  Genes other than SRY can determine gender, given that some placental mammals lack SRY and that the majority of human XY females do not have mutations in the SRY gene (Pask, 2000).

     SRY is located on Yp11 and is the tdf gene (the testis-determining factor).  An XY individual with a mutation in SRY can develop as a normal female, although typically there are no menses (in one case, menstrual cycles occurred but there were no follicles).  Some of the mutations of SRY which lead to the development of XY females are known to interfere with the protein’s ability to bend DNA rather than its ability to bind DNA (Koopman, 1999).  A fifth of human XY females possess mutations in the HMG box of SRY (Nagai, 2001).  In mice, such females can be fertile, but the ovaries fail early in life.  An XX individual with the SRY gene translocated to one of the X chromosomes develops as a male.  SRY is sufficient to initiate male development in mice which are chromosomally female (XX) (Holmes, 1996).  Mutations in the SRY gene during development can produce true hermaphrodites: individuals with both male and female tissue.  SRY interacts with a number of autosomal genes which can determine the effects of mutations.  A father with a mutant SRY gene may be male, but can produce an XY daughter due to differences in the genetic background of these autosomal genes (OMIM).

     SRY is most highly expressed in the genital ridge of the 6-week male human embryo, prior to testis formation, and is limited to the testis in adult males (Magararit, 1998).  SRY initiates the differentiation and production of Sertoli cells, the migration of cells from the mesonephros into the testis, and the development of the male pattern of blood vessels (Tilmann, 2002).  The early embryonic gonads are capable of differentiating along male-specific or female-specific pathways.  SRY causes the development of Sertoli cells from cells which would otherwise have formed follicular cells (although the presence of PGCs is also required for the formation of follicular cells) (Tilmann, 2002).

     The evolutionary rate of SRY is higher than that of other SOX genes, and this high rate complicates analysis of its phylogeny (Bowles, 2000; Nagai, 2001).