Modern organisms are made of a variety of proteins.


While modern organisms store the code for the amino acid order of proteins in their DNA, it is RNA which carries this code to the ribosomes where the proteins are actually made.  To accomplish this, sections of DNA molecules must be copied into RNA by the enzyme RNA polymerase.  This process is called transcription.  In bacteria there is one RNA polymerase which transcribes DNA, while in eukaryotes there are three.  Ten of the subunits found in the yeast polymerases are homologous to polymerase subunits in humans.  In eukaryotes, 9 of these 10 subunits are conserved among the three polymerases.  RNA polymerase II not only transcribes most protein-coding genes in eukaryotes but is also required for transcription-coupled repair of DNA.  The core enzyme subunits of RNA polymerase have been conserved from bacteria through humans (Vassylyev, 2002).
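The copying step described above can be sketched in a few lines of Python. This is only an illustration of base pairing, not a model of the enzyme: the function name `transcribe` and the nine-base template are hypothetical, and the sketch assumes the template strand is supplied in the 3'-to-5' direction so that the RNA comes out 5'-to-3'.

```python
# Each DNA template base pairs with one RNA base (uracil replaces thymine in RNA).
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') copied from a DNA template strand given 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5.upper())


# Hypothetical template used only for illustration.
rna = transcribe("TACGGATTC")
print(rna)  # AUGCCUAAG
```

The sketch ignores promoters, transcription factors, and termination; it shows only the rule by which each template base specifies one RNA base.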

     The various subunits of RNA polymerase II are scattered throughout the genome (A on 17p13, B on 4q12, C on 16q13, I on 19q12, J on 7q22, F on 22q13, L on 11p15, E on 19p13, H on 3q28, and K on 8p22).  POLR2A and B compose the active center of the enzyme while POLR2A, E, and I compose the region which grips the DNA downstream of the active center.

     The largest subunit of RNA polymerase II (RPB1) is homologous in prokaryotes and eukaryotes, but in most eukaryotes there is an additional domain at the C-terminus of the protein.  This domain interacts with other proteins to regulate gene expression, transcription elongation, and RNA processing.  In protists, this region may be simple or even absent, while in higher eukaryotes it possesses tandem repeats of seven amino acids.  This set of tandem repeats is essential for the complexity of transcription control required in higher eukaryotes (Stiller, 2002).
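A tandem repeat of seven amino acids can be detected with a short script. The sketch below is a minimal illustration under stated assumptions: the heptad `YSPTSPS` is the commonly cited consensus for this domain and is not given in the text above, the function name `longest_tandem_run` and the toy sequence are hypothetical, and only exact copies of the motif are counted even though real repeats are often degenerate.

```python
import re

# Assumed consensus heptad; it does not come from the text above.
HEPTAD = "YSPTSPS"


def longest_tandem_run(protein: str, motif: str = HEPTAD) -> int:
    """Return the largest number of back-to-back exact copies of motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", protein.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Toy sequence: three tandem heptads, a spacer, then one isolated heptad.
toy_tail = "MKL" + "YSPTSPS" * 3 + "AEE" + "YSPTSPS"
print(longest_tandem_run(toy_tail))  # 3
```

A sequence with a simple or absent repeat region, as described for some protists, would give a small count or zero.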


POLR2A is the largest subunit of RNA polymerase II with a size of 220 kD.  It is homologous to RPB1 in yeast.


POLR2B is the second largest subunit of RNA polymerase II.  It is homologous to RPB2 in yeast.


POLR2C is homologous to RPB3 in yeast.


POLR2D interacts with POLR2G and is homologous to RPB4 in yeast.


POLR2E is homologous to RPB5 in yeast.


POLR2F is homologous to ABC23 in yeast and contains a leucine zipper.


POLR2H is homologous to RPB8 in yeast.


POLR2I is homologous to RPB9 in yeast.  Yeast cells which lack this subunit cannot grow at extreme temperatures, and their transcription can begin at inappropriate sites.


POLR2J is homologous to RPB11 in yeast.


POLR2K is homologous to RPB12 in yeast.


POLR2L is homologous to RPB10 in yeast.



RNA Polymerase III transcribes shorter genes.

POLR3K is homologous to RPC11 in yeast.


     While bacteria, archaea, and eukaryotes perform transcription through multisubunit RNA polymerase enzymes, the RNA polymerase enzymes of some bacteriophages, mitochondria, and chloroplasts are single-subunit enzymes.  These single-subunit enzymes have structural similarities to the multisubunit enzymes and initiate transcription in the same manner, although there are no described sequence homologies to the multisubunit enzymes (Tahirov, 2002).



     In eubacteria, RNA polymerase can bind to DNA and initiate transcription on its own.  In eukaryotes, the situation is much more complex and RNA polymerase cannot initiate transcription without a number of additional proteins.  Archaebacteria seem to require transcription factors for transcription, suggesting links between archaeal and eukaryotic mechanisms (Ouzounis, 1992).  Interestingly, archaebacteria share some of these transcription factors (such as TBP and TFIIB) with eukaryotes (Bagby, 1995; Barinaga, 1994).



The TATA Binding Protein (TBP; represented in magenta below) and a number of TBP-associated factors (TAFs; represented in blue below) form TFIID which binds to the DNA at the TATA box.


     The mammalian TBP protein has a 180-amino-acid core which is common to all eukaryotes and an N-terminal domain which is common to all vertebrates.  In mice, this N-terminal domain is important in the maternal immunotolerance of pregnancy.  TBP is needed for the transcription of most genes, although RNA polymerase II (but not RNA polymerase I or III) does seem to function with other mechanisms which are independent of TBP.  The TBP gene possesses a long CAG repeat whose length is polymorphic in humans.  Some alleles appear to contribute to neurological disorders; one mutation causes spinocerebellar ataxia.


TAF1 seems to be important in the cell cycle stage G1 as demonstrated by temperature sensitive mutants in mice.


TAF2 is homologous to TAFII150 in Drosophila.


TAF3 possesses a histone fold sequence and interacts directly with TBP.


TAF4 interacts with the protein huntingtin and may be involved in some neurodegenerative disorders.  In at least 8 neurodegenerative disorders, proteins with variable numbers of CAG repeats (which code for glutamine and are often referred to as polyQ repeats) are involved; these polyQ regions of the proteins interact with TAF4.  In transgenic mice, TAF4 was able to offer some protection from mutant huntingtin protein.
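Because each CAG codon codes for one glutamine, the length of a polyQ tract can be estimated by counting consecutive CAG triplets in a coding sequence. The following is a minimal sketch: the function name `longest_cag_run` and the toy sequence are hypothetical, and the search ignores reading frame and any other glutamine codon, so it is an approximation rather than a clinical repeat-sizing method.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the length, in triplets, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Toy coding sequence: five CAG codons, an interrupting CCG, then two more.
toy_gene = "ATG" + "CAG" * 5 + "CCG" + "CAG" * 2 + "TAA"
print(longest_cag_run(toy_gene))  # 5
```

Comparing this count between alleles is one simple way to picture the variable repeat numbers mentioned above.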


TAF4B mutants in mice cause female infertility; TAF4B seems to be required for ovary development.


TAF5 is homologous to TAFII90 in yeast.


TAF6 is involved in programmed cell death.   In response to apoptotic signals, its expression can increase and a different isoform (TAF6 delta) is produced.  The resulting variant of TFIID promotes transcription of other genes involved in apoptosis.  Caspase can cleave TAF6 delta.


TAF6-like is part of the P/CAF complex which functions in transcription regulation.


TAF7 is expressed in all tissues.


TAF7-like is expressed only in the testes.


TAF9 interacts with p53 at the same site as MDM2, which is the major inhibitor of p53.


TAF10 is present in some but not all TFIID complexes.


TAF11 possesses a histone fold sequence.


TAF12 interacts directly with TBP.


TAF13 possesses a histone fold sequence and interacts directly with TBP.


TAF15 is similar to the proto-oncogenes EWS and FUS.  A fusion gene of TAF15 and CSMF has been implicated in chondrosarcomas.




     The complex TFIID requires TBP and at least 8 TAF proteins.  Most of a cell’s TBP, however, is incorporated into B-TFIID, which is composed of TBP and BTAF1.  BTAF1 uses ATP to dissociate TBP from DNA and is homologous to the yeast protein Mot1.


TFIID is not the only general transcription factor complex; RNA polymerase II requires others as well.  TFIIB, TFIIE, TFIII and TFIIJ bind RNA polymerase II first.



GTF2B is part of the TFIIB complex which is involved in transcription in eukaryotes, eubacteria, and archaebacteria.  It is homologous to cyclin A, and cyclin A may have evolved from proteins with a more generalized role in transcription.



GTF2E1 and E2 are the two subunits of the TFIIE complex which exists as a heterotetramer.  GTF2E1 is similar to bacterial transcription factors.



GTF2A1 and A2 are components of the TFIIA complex which also forms part of the reinitiation complex.



GTF2I associates with BTK (a protein whose mutations cause agammaglobulinemia), upstream stimulating factor-1, and other proteins.


GTF2IRD1 is one of the genes deleted in Williams syndrome and its loss may be responsible for the muscle fatiguability observed in the syndrome.



GTF2F1 is homologous to bacterial sigma factors and interacts with TFIID as RNA polymerase binds to it or just afterwards.






TFIIE helps to bind TFIIH whose kinase phosphorylates the largest subunit of RNA polymerase (POLR2A) to make the transition from transcription initiation to chain elongation.  CDK-activating kinase is part of the H complex and thus transcription can be regulated by signals affecting the cell cycle.


GTF2H2 is involved in transcription and transcription-coupled repair.




GTF2H4 is a homolog of the yeast protein TFB2.



RNA polymerase I is composed of 6-14 subunits and transcribes rRNA.  It requires the general transcription factors TBP, TAF1A, TAF1B, and TAF1C.  All three of these TAF proteins interact with TBP and each other.


RNA polymerase III is needed for the transcription of small RNAs and cytoplasmic RNAs of both cellular and viral origin, including 5S rRNA, tRNA, and adenovirus-associated RNA.

GTF3A has 9 C2H2 zinc fingers.  It was the first transcription factor identified as well as being the first zinc finger protein identified.


GTF3C1 is a subunit of TFIIIC, a general transcription factor for RNA polymerase III.


GTF3C2 is expressed in all tissues.






Archaea use transcription factors TBP and TFB, homologs of the eukaryotic transcription factors TBP and TFIIB, for the transcription of non-stress genes.  Apparently, these are the only two transcription factors needed for RNA polymerase to bind to promoters (Thomsen, 2001).  Archaeal cells are depicted below.



     In the following images of cancer cells from the breast, liver, and uterus, genes are not being regulated properly. 


As ancestral eukaryotic cells became more complex, gene regulation became more and more important.   For example, while the ability to survive in diverse environments may require new enzymes (for new food sources or to synthesize essential molecules which are no longer present in the environment), producing these enzymes under all circumstances (constitutively) is a waste of energy and resources.  Organisms which can produce only those proteins which they need at any one time are more efficient than those which produce all of their proteins all of the time.  When eukaryotic cells evolved from the simpler prokaryotic cells, gene regulation became more important (for example, for controlling stages of the more complex cell cycle) and when multicellular fungi, plants, and animals evolved from simpler eukaryotes, gene regulation became more important still.  In most multicellular organisms, cells are specialized and although all cells have the same genes, each cell type is expressing a unique set of genes.  As organisms lived for longer periods, certain genes became appropriate only at different stages of an organism’s life cycle.

     All human cells, such as those depicted in the following images, have the same genes. 





Stratified squamous epithelia of the vagina:

Cells of seminiferous tubules:




Human life would be impossible without gene regulation: liver cells must express genes that are not expressed in brain cells or white blood cells, and the genes expressed during fetal development must be different from those expressed in an adult.  In the human genome, almost ¼ of the genes for proteins produce factors needed for the replication of DNA, its maintenance, and the control of gene expression.  An additional 20% of the human genome is involved in signal transduction, which includes signals which affect gene expression (Brown, p.21).  Most proteins which regulate gene expression bind to DNA and determine whether or not an RNA transcript is made from the gene.  (Although there are other ways to control gene expression, transcriptional regulation is the primary mechanism in eukaryotes such as humans.)

     Despite the large number of genes which regulate the expression of human genes, most of the proteins they encode possess one of several structural designs such as helix-turn-helix, HMG box, helix-loop-helix, and zinc finger motifs.  These gene families may possess hundreds of members.  Some of these proteins are subunits which interact with other factors before binding DNA.  For example, a subunit which binds one other subunit forms a dimer; one which binds three others forms a tetramer.  Whether the subunits are products of the same gene or of different genes determines whether they form homodimers and homotetramers or heterodimers and heterotetramers.
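The naming rule for these complexes is mechanical enough to write down as code. The sketch below is illustrative only: the function name `classify_complex` is hypothetical, it handles just the dimer and tetramer cases named above, and the subunit lists are toy inputs (the two-plus-two arrangement for TFIIE is an assumption made for the example).

```python
# Number of subunits -> name of the assembly (only the cases named in the text).
SUFFIX = {2: "dimer", 4: "tetramer"}


def classify_complex(subunit_genes: list[str]) -> str:
    """Name a complex from the genes that encode each of its subunits."""
    count = len(subunit_genes)
    if count not in SUFFIX:
        raise ValueError(f"unsupported subunit count: {count}")
    # One distinct gene means identical subunits (homo-); otherwise hetero-.
    prefix = "homo" if len(set(subunit_genes)) == 1 else "hetero"
    return prefix + SUFFIX[count]


print(classify_complex(["GTF2E1", "GTF2E1", "GTF2E2", "GTF2E2"]))  # heterotetramer
print(classify_complex(["geneX", "geneX"]))  # homodimer
```

The first call matches the description of TFIIE as a heterotetramer built from the GTF2E1 and GTF2E2 subunits; `geneX` is a placeholder name.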

     Although all cells in the body possess the same genes, it is gene regulation which permits the differentiation of specialized cells.
