The initiation of transcription by RNA polymerase II is often the rate-limiting step in gene expression and is determined by the interaction of the enzyme with a number of nuclear factors and DNA elements in the regulatory region of the transcription unit.11,12 Two types of elements participate in regulation of transcription of the typical eukaryotic gene. The first is the cis-acting element, a DNA segment located in the regulatory region of a transcriptional unit, near the structural region of the gene. The second is the trans-acting element or factor, which usually consists of a DNA-binding protein present in the nucleus that is encoded by a gene separate from the transcriptional unit being regulated and that interacts with a cis-regulatory element. These nuclear factors are said to “act in trans” to regulate gene expression. Mutations that affect trans-activation therefore occur in distant genes, not in the gene that is being regulated. Interactions of trans-acting factors with cis-DNA elements determine tissue-specific, developmental, and regulated expression of genes.
Three important DNA elements (the TATA box, the upstream promoter element, and enhancers) occur in the regulatory region, which is commonly located at the 5′-flanking region of the transcriptional unit (see Fig. 3-6).12,13 However, regulatory regions or elements thereof may occur in other sections of the transcriptional unit, including introns and 3′-flanking regions. The first element is the TATA box, an AT-rich region located in a fixed position and orientation 25 to 30 nucleotides upstream from the cap site. This DNA element binds several TATA-binding proteins and associated factors found in the nucleus that allow for efficient interaction of RNA polymerase II with the transcriptional unit. The TATA box interactions, although not strictly required, are important for accurate and efficient initiation of transcription. The composition of the transcription initiation complex has been clarified: it includes at least seven transcription factors (i.e., TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, TFIIJ) and RNA polymerase II. A major constituent is TFIID, a large multisubunit (>700-kDa) complex with at least eight proteins. One of these, the TATA-binding protein (TBP), allows the binding of TFIID to the TATA box or related Inr (initiator) sequences. The other components of TFIID have also been partially characterized and are known as coactivators or TBP-associated factors (TAFs). They are essential for the communication of enhancer-binding protein signals to the basal transcriptional machinery and the subsequent regulation of gene expression. Basal transcription depends on the formation of a preinitiation complex involving TFIID-TFIIA-TFIIB, followed by the rapid entry of RNA polymerase II to facilitate the establishment of the transcriptional machinery.14,15,16
The second DNA element is the upstream promoter element (UPE), which is located 60 to 110 nucleotides upstream from the cap site.17 It includes elements such as the CCAAT and GC-rich (GGGCGG) boxes that bind to CCAAT-binding proteins and SP1 (a cellular DNA-binding protein that interacts with the SV40 genome), respectively. These elements also associate with DNA-binding proteins that augment the efficiency of transcription by RNA polymerase II. UPEs may or may not require specific TATA boxes to function most efficiently. Together, the TATA box and UPEs are components close to the structural region and are essential for maintenance of basal levels of gene transcription (Fig. 3-7).
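The positional rules described above (TATA box 25 to 30 nucleotides upstream of the cap site; CCAAT and GGGCGG boxes 60 to 110 nucleotides upstream) can be illustrated with a minimal Python sketch. The function name `scan_promoter`, the coordinate convention (the last base of the input is position -1), and the TATA pattern `TATA[AT]A` are assumptions made for illustration; the text specifies only "an AT-rich region," and real promoters vary considerably.

```python
import re

# Motif patterns and expected windows (positions relative to the cap site).
# CCAAT and GGGCGG come from the text; the TATA pattern is an assumed
# stand-in for "an AT-rich region" and is not a definitive consensus.
MOTIFS = {
    "TATA": (r"TATA[AT]A", -30, -25),
    "CCAAT": (r"CCAAT", -110, -60),
    "GC": (r"GGGCGG", -110, -60),
}


def scan_promoter(upstream):
    """Return (name, position, in_expected_window) for each motif hit.

    `upstream` is 5'-flanking DNA whose last base is position -1
    relative to the cap site (a hypothetical convention for this sketch).
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for name, (pattern, lo, hi) in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        for m in re.finditer(f"(?={pattern})", seq):
            pos = m.start() - n  # negative coordinate of the motif's first base
            hits.append((name, pos, lo <= pos <= hi))
    return hits


# Synthetic 120-nt flank with a CCAAT box at -80 and a TATA box at -28.
flank = "N" * 40 + "CCAAT" + "N" * 47 + "TATAAA" + "N" * 22
print(scan_promoter(flank))  # [('TATA', -28, True), ('CCAAT', -80, True)]
```

A motif match alone does not establish function; as the text notes, these elements act through the proteins that bind them, so a scan of this kind only nominates candidate sites.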
FIGURE 3-7. Regulation of transcriptional rates by interactions of trans-acting factors. Various permutations of interactions of nuclear binding proteins with various DNA elements within regulatory regions determine rates of transcription at the basal and regulated levels. Such proteins include those that bind the TATA box, the upstream promoter element (UPE), and enhancers or hormone regulatory elements (HREs).
Enhancers are located in variable positions and may act independently of orientation. They may be located more distal than the promoter elements and are found up to several thousand nucleotides upstream or downstream of the transcriptional unit. These elements also bind proteins that enhance transcriptional rates or diminish them (i.e., silencers) in an ill-defined manner and constitute the foci of regulated transcription (see Fig. 3-7). Several proposed mechanisms include the cooperative interaction of a number of DNA-binding proteins to effect efficient formation of the transcription-initiation complex of RNA polymerase II with the regulatory or promoter region.18 Another hypothesis suggests that the interaction of proteins with these elements opens up the configuration of DNA, perhaps by “bending” to allow access of the gene to the transcription machinery.
With recombinant DNA techniques, a reporter gene construct can be produced that may be transfected into foreign cells by gene transfer.19,20 This allows the expression of the reporter gene with enhanced production of an enzyme or polypeptide product that is not normally produced in eukaryotic cells. The synthesis of such products may be detected by sensitive enzyme assays or radioimmunoassays. DNA constructs in which a structural region corresponding to the enzyme alone is transfected into cells are not expressed in the absence of regulatory regions. However, if a promoter element is placed 5′ to the reporter gene, then expression may occur. Using such approaches, structural analysis of various portions of the 5′-regulatory regions of genes, including enhancer and upstream regulatory elements, may be performed.
After the RNA transcript is initiated, the RNA polymerase II continues the process of template reading by elongation of the transcript until termination occurs. The actual site of transcription termination is variable, located 50 to 200 or more nucleotides downstream from the 3′ end of the last exon or polyadenylation site.21 Although potential weak consensus sequences have been discerned that may determine the site at which the initial RNA transcript is terminated, the polyadenylation site appears to be obtained only by virtue of endonucleolytic cleavage of longer heterogeneous 3′ ends of the hnRNA. The well-conserved consensus polyadenylation site sequence AAUAAA, which is located 15 to 20 nucleotides upstream from the polyadenylation site, and other proximal downstream sequences 10 to 12 nucleotides from the polyadenylation site, may serve as points of recognition for this processing event. Although the mechanisms of polyadenylation are not well known, the presence of the consensus sequences suggests a requirement for stem-loop formation and involvement of small nuclear ribonucleoproteins (snRNPs).
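The spacing rule for the AAUAAA signal can be sketched in a few lines of Python. The function name `predict_polya_windows` is hypothetical, and the sketch assumes that the 15- to 20-nucleotide distance is counted from the last base of the hexamer, which the text does not specify. It ignores the downstream sequences and any structural requirements, so it only nominates candidate cleavage windows.

```python
def predict_polya_windows(rna, signal="AAUAAA", min_gap=15, max_gap=20):
    """Return (signal_start, window_start, window_end) for each signal found.

    Coordinates are 0-based and the window is inclusive. The gap is
    counted from the last base of the hexamer (an assumption).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    windows = []
    start = rna.find(signal)
    while start != -1:
        last_base = start + len(signal) - 1
        lo = last_base + min_gap
        hi = last_base + max_gap
        if lo < len(rna):
            # Clip the window to the end of the transcript.
            windows.append((start, lo, min(hi, len(rna) - 1)))
        start = rna.find(signal, start + 1)
    return windows


# Synthetic 3' end: the hexamer starts at index 10.
transcript = "G" * 10 + "AAUAAA" + "C" * 30
print(predict_polya_windows(transcript))  # [(10, 30, 35)]
```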
MESSENGER RNA PROCESSING
The hnRNA product of gene transcription is rapidly processed in the nucleus with a half-time (t1/2) of 5 to 20 minutes (Fig. 3-8).22 Three major events transform the large heterogeneous RNA precursors into the mature RNA. First, at the 5′ end of hnRNA, a 7-methylguanosine residue is added to the first nucleotide of the transcript by means of a 5′-5′ triphosphate bond after 20 to 30 nucleotides have been polymerized. This reaction is rapid (t1/2 < 1 minute) and is catalyzed by the 5′-capping enzymes, including guanylyl and methyl transferases. The 5′-methyl cap associates with 5′-cap binding proteins, which favors the formation of a stable 40S translation-initiation complex and increases the stability and efficiency of translation of the eventual mature mRNA.23
FIGURE 3-8. Gene transcription and RNA processing. The initial RNA transcript is known as heterogeneous nuclear RNA (hnRNA I). It contains exons and introns of the structural region and rapidly undergoes 5' capping with 7-methylguanosine (7meG) and 3' polyadenylation (An) (hnRNA II). Little heterogeneous nuclear RNA has been detected without 5' cap or 3' polyadenylated (poly[A]) tails. In a slower process, introns are removed by RNA splicing followed by religation of exon sequences. The mature messenger RNA (mRNA) is composed of fused exon sequences and contains a 5' cap and a 3' poly(A) tail.
The second modification occurs at the 3' end and involves the addition of a polyadenylate, or poly(A), tail. Polyadenylation includes the addition of 250 to 300 adenylate (A) residues at the polyadenylation site located at the 3' end of the RNA. This poly(A) tail, which is reduced to 30 to 250 residues during nuclear processing and export, may also be important for increased RNA stability. These two additions, capping and polyadenylation, occur within minutes after the synthesis of hnRNA and generally before RNA splicing; almost all isolated hnRNA contains both modifications.
The third major processing step involved in mRNA maturation is the removal of introns during RNA splicing.24,25,26 This process includes endonucleolytic cleavage of introns and religation of exons. The 5' and 3' ends of introns have consensus sequences, as shown in Figure 3-9.
FIGURE 3-9. RNA splicing: consensus intron sequences and mechanisms for intron removal. A consensus sequence has been determined for the 5' and 3' ends of intron sequences. Data suggest the potential mechanism of intron removal by means of lariat formation preceded by interaction with nuclear RNAs. (nt, nucleotides.)
These consensus sequences may be necessary for the appropriate interaction of U1 snRNP species present in the nucleus, which serve as a “splicing adapter” for the splicing process. Moreover, a polypyrimidine tract is located adjacent to the 3' AG residues, and a critical adenylate residue lies in a branch sequence about 30 nucleotides upstream of the 3' end of the intron. The first step in the splicing process involves the formation of the spliceosome, which includes the hnRNA, U1 snRNP, and other factors. The initial event is endonucleolytic cleavage at the 5' splice site, followed by the formation of a 5'-2' phosphodiester bond between the 5' G of the intron and the downstream A located in the branch sequence. This “lariat” intermediate is then cleaved at the 3' end and degraded, and the exons are ligated.
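The net result of splicing (intron removal followed by exon ligation) can be modeled as a simple string operation. The sketch below is illustrative only: the function name `splice` and the interval convention are hypothetical, intron coordinates are supplied by the caller rather than predicted, and the only check applied is the widely observed rule that introns begin with GU and end with AG. It does not model the lariat intermediate, the branch-point adenylate, or snRNP recognition.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the flanking exons.

    `introns` is a list of 0-based, half-open (start, end) intervals,
    assumed not to overlap. Each intron must begin with GU and end with
    AG; anything else raises ValueError.
    """
    rna = pre_mrna.upper().replace("T", "U")
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = rna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(rna[cursor:start])  # exon preceding this intron
        cursor = end
    exons.append(rna[cursor:])  # final exon
    return "".join(exons)


# Synthetic transcript: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCUAACUUUUUUCAG", "GGAUAA"
pre = exon1 + intron + exon2
print(splice(pre, [(len(exon1), len(exon1) + len(intron))]))  # AUGGCCGGAUAA
```

Shifting either boundary by a single nucleotide changes the joined sequence and, in a coding region, the reading frame, which is why intron removal must be precise.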
The removal of introns from hnRNA must be precise; errors can change the exon or mRNA-coding regions. The sequence of removal of multiple introns within a gene is generally nonrandom, although the mechanism is unknown. Variations in the splicing pattern in a given hnRNA transcript can occur, and tissue-specific interactions of RNA splicing-modification proteins may dictate alternate patterns of intron-RNA splicing, causing altered mRNA forms.27 In a complex transcriptional unit, an alternate exon choice, including alternative internal acceptor and donor site use, may yield different mRNA products. A complex transcriptional unit may also possess alternate transcriptional start sites in the same contiguous segment of DNA (i.e., in the same exon) or multiple transcriptional start sites in different exons contributed by alternate exon choice. Another possible mechanism for diversity in the complex transcriptional unit is alternative final exon choice (i.e., differences in polyadenylation sites).
The splicing process is another rate-limiting step and takes place over 5 to 30 minutes; it is much slower than the capping and polyadenylation reactions. What role RNA splicing plays in the informational flow is unclear, although its potential contribution to RNA diversity has been discussed. Not every transcript undergoes all three processing events: there are mRNAs that lack poly(A) tails (e.g., histone mRNAs), mRNAs that lack a 5' cap (e.g., poliovirus mRNAs), and eukaryotic genes that lack introns. Such modifications are therefore not universally essential for RNA maturation.
The newly synthesized mature mRNA is actively transferred from the nuclear to the cytoplasmic compartments by way of the nuclear pore complex (NPC). The NPC is a large multiple-component structure that is located in the nuclear envelope and serves as a channel for the movement of macromolecules such as RNAs. The mRNA and other RNAs subject to transport are closely associated with proteins and exist as ribonucleoproteins. Each RNA likely possesses distinct protein-targeting sequences that permit its export and import. This shuttling of mRNA from the nucleus to the cytoplasm is mediated by a large family of transport factors known collectively as exportins and importins. However, the precise nature of the interactions of these shuttling proteins, mRNAs, and the NPC is not well understood.28
The structure of mRNA is shown in Figure 3-10. The exons encode two major regions of the mRNA: translated and untranslated. The translated or coding region contains the open reading frame, beginning from the initiation methionine codon to the termination codon. The untranslated regions flank the coding region and are known as 5' or 3' untranslated regions. The functions of the untranslated regions are not well established, but data indicate that the 5' untranslated region may be important in determining the efficiency of translation of the mRNA.29,30 The 3' untranslated region may contain important RNA elements, especially several AU-rich sequences that determine the stability of mRNA in the cytoplasm.31 Each of these regions may mediate its effects by binding to specific RNA-binding proteins.32,33
FIGURE 3-10. Structure and translation of messenger RNA (mRNA). In most cases, the mature mRNA represents the fusion of multiple exons. These sequences encode two major regions: translated and untranslated. The translated or coding region is delimited by the translation initiator codon, AUG, at its 5' end and the termination codon, UGA, UAA, UAG, at its 3' end. This coding region represents a series of codons in an open-reading frame that determines the amino-acid sequence of its encoded polypeptide. The 5' and 3' untranslated regions are shown. The mRNA enters the cytoplasm to interact with the ribosome. There, protein synthesis is initiated, and by way of a series of several cotranslational events, secretory polypeptide hormone precursors are processed. The steps involve cleavage of the signal or leader peptide, followed by addition of asparagine-linked carbohydrate moieties in glycoprotein subunits or hormones, and intramolecular folding with the formation of disulfide linkages. These events occur within the lumen of the rough endoplasmic reticulum. These partially processed polypeptide hormones are then shuttled to the Golgi stack, where these molecules are transported, sorted, and further processed posttranslationally to yield the bioactive hormone located in secretory granules or vesicles.
The mRNA in the cytoplasm rapidly interacts with the ribosome (see Fig. 3-10). The ribosome is a complex ribonucleoprotein particle that contains 28S, 18S, 5.8S, and 5S RNAs, along with a group of ribosomal proteins. Among these proteins are factors responsible for the initiation, elongation, and termination of mRNA translation. For the typical mRNA, 3 to 15 ribosomes may be attached at any given time. As the ribosome reads the mRNA in the process of translation, amino acids are brought to the translation complex by way of adapter tRNA molecules. These molecules are differentiated by the presence of anticodon structures (i.e., RNA sequences complementary to a particular codon) at one end and attachment sites for specific amino-acid residues at the other end of the L-shaped molecule. The reading of successive codons causes the alignment of the appropriate amino acids and polymerization to yield the polypeptide chain.
Translation initiation occurs at the initiator codon, AUG, which encodes the amino acid methionine. Translation generally begins at the first AUG codon from the 5' end of the mRNA. This initiation methionine codon is normally followed by an open reading frame of codons encoding amino acids until a termination codon is reached. When a UAG, UAA, or UGA is encountered, protein synthesis stops, and the nascent polypeptide chain is released from the ribosome complex. The context of the methionine codon that is used for translation initiation has been characterized further to include a consensus sequence: 5'-CCACCAUGG-3'.
This sequence context presents the AUG as the most favorable initiation codon.34 However, examples have been found in which an AUG is located 5' of the authentic start site. In these instances, the context of the upstream AUG may not be ideal, or the AUG may be quickly followed by an in-frame termination codon. Whether peptides encoded by these short reading frames are eventually expressed is unknown.35,36
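The first-AUG rule, the reading of successive codons up to a termination codon, and the 5'-CCACCAUGG-3' context can be combined in a short Python sketch. The function names `translate_first_orf` and `context_score` are hypothetical, and scoring the context as a simple count of matching flanking bases is an assumption made for illustration rather than an established measure of initiation efficiency. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

CONSENSUS = "CCACCAUGG"                # context quoted in the text
AUG_OFFSET = CONSENSUS.index("AUG")    # position of the AUG within it


def translate_first_orf(mrna):
    """Translate from the first AUG; return (start, peptide, found_stop) or None."""
    rna = mrna.upper().replace("T", "U")
    start = rna.find("AUG")
    if start == -1:
        return None
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            return start, "".join(peptide), True
        peptide.append(aa)
    return start, "".join(peptide), False  # ran off the end without a stop


def context_score(mrna, start):
    """Count flanking bases (0 to 6) that match the consensus around an AUG."""
    rna = mrna.upper().replace("T", "U")
    score = 0
    for k, base in enumerate(CONSENSUS):
        if AUG_OFFSET <= k < AUG_OFFSET + 3:
            continue  # the AUG itself is not scored
        i = start - AUG_OFFSET + k
        if 0 <= i < len(rna) and rna[i] == base:
            score += 1
    return score


mrna = "GG" + "CCACC" + "AUGGCUUGGUAA" + "GC"
print(translate_first_orf(mrna))  # (7, 'MAW', True)
print(context_score(mrna, 7))     # 6
```

An upstream AUG followed closely by an in-frame stop codon would show up here as a very short peptide with `found_stop` set to True, which is the situation described above for AUGs located 5' of the authentic start site.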
All polypeptide hormones and almost all other proteins destined for membrane, lysosome, ER, and Golgi stack locations or for secretion are synthesized as larger polypeptide precursors. All polypeptide hormones possess a signal or leader peptide, a characteristic segment of protein located at the N-terminal end (Table 3-1).37,38 Although no consensus primary sequence has been obtained for this signal peptide, it generally possesses a hydrophobic core preceded by basic amino-acid residues within its 16- to 30-amino-acid length.
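Because the signal peptide is defined by general features rather than a consensus sequence, any computational check can only be a rough screen. The Python sketch below is a crude heuristic under stated assumptions: the function name `looks_like_signal_peptide`, the choice of residues counted as hydrophobic, and the minimum core length of seven residues are all illustrative and are not taken from the text. Only the 30-residue search limit and the requirement for a basic residue ahead of the hydrophobic core follow the description above.

```python
import re

HYDROPHOBIC = "AVLIMFWC"  # assumed hydrophobic set (one-letter codes)
BASIC = "KR"              # lysine and arginine; histidine omitted by assumption


def looks_like_signal_peptide(protein, min_core=7, max_len=30):
    """Crude screen: a hydrophobic run in the N-terminal region, preceded by K or R."""
    n_region = protein[:max_len].upper()
    core = re.search(f"[{HYDROPHOBIC}]{{{min_core},}}", n_region)
    if core is None:
        return False
    # At least one basic residue must lie N-terminal to the hydrophobic core.
    return any(residue in BASIC for residue in n_region[:core.start()])


# Synthetic sequences, not real hormone precursors.
print(looks_like_signal_peptide("MKRSLLLLAVLLAAGSQAEPTT"))  # True
print(looks_like_signal_peptide("MSDEEKGSTTPQDDSGNE"))      # False
```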
TABLE 3-1. Polypeptide Hormones: Some of Their Precursor Proteins*
Several events occur before the entire polypeptide chain is synthesized (Fig. 3-11). After the synthesis of ~70 amino acids, the signal recognition particle (SRP), a group of six proteins and a small RNA (7S), interacts with the signal peptide to momentarily halt translation elongation in the RNA–ribosome–nascent protein complex.39,40 The 7S RNA contains a signal peptide recognition domain and an elongation arrest domain. This complex then interacts with the SRP receptor, an integral membrane protein located on the cytoplasmic face of the ER. In this process, polyribosomes are attached to membranous structures associated with the endoplasmic reticulum to form the rough ER (RER). After this interaction occurs, translational arrest is relieved, and translation proceeds as usual. At this point, the signal peptide is vectorially transported through the membrane into the cisternal aspect of the ER. The newly synthesized protein has been translocated from the inside to the outside of the cell in a topologic sense.
FIGURE 3-11. Details of translational and cotranslational processes. The messenger RNA (mRNA) interacts with the ribosome where protein synthesis is initiated. In the case of polypeptide hormones, the first segment of protein synthesized is the N-terminal signal or leader peptide. As soon as the signal peptide emerges from the ribosomal complex, a protein-RNA particle known as the signal recognition particle (SRP) associates with the signal peptide. This interaction allows the ribosomal-mRNA–nascent polypeptide complex to interact with the SRP receptor located on the cytoplasmic face of the endoplasmic reticulum (ER) membrane and brings the ribosome in close apposition to the ER to form the rough ER. The momentary translational arrest that occurs on interaction of the complex with SRP is released to allow further protein synthesis. Cleavage of the signal peptide from the apoprotein by signal peptidase and other modifications, including addition of asparagine-linked carbohydrates (CHO), intramolecular folding, and disulfide linkage formation, occurs coincidentally with release of ribosomes from the ER. In this manner, the partially processed protein, although initially synthesized in the cytoplasmic space, enters the luminal space.