5 Gene Expression: Transcription

Walter Suza; Donald Lee; Philip Becraft; and Marjorie Hanneman

Learning Objectives
  1. Describe the roles that the promoter, coding region and untranslated regions of a gene play in gene expression.
  2. Describe mRNA processing steps.
  3. Draw the process of transcription and include the following in your drawing: DNA template and non-template strands, RNA polymerase, new RNA strand, and direction of RNA synthesis.
  4. Draw the process of mRNA processing and include the following in your diagram: gene (DNA), promoter, coding region, introns, exons, pre-mRNA, mature mRNA, poly(A) tail, and cap.

Introduction

Genes are DNA sequences that control traits in an organism by coding for proteins (Figure 1). Although organisms such as plants and animals have tens of thousands of genes, the information in a single gene can have a tremendous impact on an organism. Furthermore, every cell of an organism contains all of its genes, but each cell uses the information from only a subset of them, depending on the cell type and its stage of development. Therefore, controlling when and where a gene is expressed is key to its function.

 

DNA is transcribed into RNA, which is then translated into proteins.
Figure 1. The central dogma of molecular genetics. Image by Marjorie Hanneman and Walter Suza.

Gene structure and transcription

The DNA sequence contains the information to control all biological functions, including the manifestation of traits important to agriculture (yield, drought tolerance, disease resistance, etc.). How is the information contained in DNA sequences converted into the cellular activities necessary for plants and other organisms to function? DNA sequences are used to direct the synthesis of other molecules that actually perform these cellular functions.

Most genes encode proteins. The production of a protein from a gene involves several different processes (Figure 1). Transcription is the copying of the DNA nucleotide sequence into an intermediate nucleic acid molecule called RNA (ribonucleic acid). The primary RNA transcript is processed into a mature messenger RNA (mRNA), which then provides the information for the synthesis of a protein through the process of translation. Proteins are composed of amino acids connected by peptide bonds. The sequence of amino acids is determined by the sequence of nucleotide bases in the mRNA. The 20 amino acids have different chemical and physical properties, and the sequence of amino acids determines the structure and function of the protein.

Gene structure

Only particular regions of chromosomal DNA are transcribed. A gene can be defined as a region of transcribed DNA, together with the associated regions of DNA that regulate its transcription (Figure 2). A gene has several parts, each important to the function of the gene.

The regulatory region (also known as the promoter) contains DNA sequences that control where and when the gene is turned on to produce mRNA. The coding region is the part of the gene that is used as a template to produce RNA molecules in a process called transcription. Some RNA molecules perform cellular functions directly, while many others (messenger RNAs) direct the synthesis of proteins in a process called translation.

A gene within the DNA double helix has a regulatory region and a coding region; the coding region is transcribed into mRNA, which is read as codons.
Figure 2. Every gene has a promoter and a coding region. Adapted from NIH-NHGRI.

a. Gene Promoter

The signals for starting and stopping transcription are located within DNA sequences. Specific nucleotide segments called promoters are recognized by RNA polymerase to start RNA synthesis. A second DNA segment, called the terminator, signals the end of RNA synthesis and the release of RNA polymerase from the DNA template once the full-length RNA strand has been transcribed.

b. Protein-coding region

The protein-coding region of a gene is composed of the sequence of nucleotides that codes for amino acids. As described further in the section on translation, the coding region begins with an ATG start codon (AUG in RNA) and ends with one of three stop codons. The protein-coding region is made up only of exon sequence, but not all exon sequence is protein coding, because exons can also include untranslated regions.

A line showing the 5′ UTR before the translation start, the protein-coding region in the middle, and the 3′ UTR after the translation stop.
Figure 3. The protein-coding region of a gene contains nucleotides that code for amino acids. The 5′ and 3′ UTR sequences do not code for amino acids but contain regulatory sequences that influence gene expression. In mRNA, the translation start codon is AUG and the stop codon can be UAA, UAG, or UGA. Image by Walter Suza.

c. Untranslated regions (UTRs)

Mature transcripts contain some sequences that do not code for amino acids in proteins. These are referred to as untranslated regions, or UTRs. Most mRNA transcripts contain a 5′ UTR and a 3′ UTR. The 5′ UTR lies toward the 5′ end of the mRNA, before the start codon. These sequences are often important for regulating translation and sometimes have other functions. The sequences following the stop codon make up the 3′ UTR. The 3′ UTR may also have important functions, such as regulating transcript stability, directing transcript localization within cells, or sometimes even directing transport (trafficking) between cells.
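The layout shown in Figure 3 can be sketched in Python. This is a simplified illustration, not a gene-annotation tool: it assumes the mature mRNA is written 5′ to 3′, that translation starts at the first AUG, and that the coding region runs codon by codon to the first in-frame stop codon. The function name `split_mrna` and the example sequence are hypothetical.

```python
# Stop codons in mRNA, as listed in Figure 3.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Split a mature mRNA (written 5'->3') into 5' UTR, coding region, 3' UTR.

    Simplifying assumption: translation starts at the first AUG, and the
    coding region runs in frame up to and including the first stop codon.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Step along the sequence one codon (3 nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is counted as part of the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


# Hypothetical mRNA: 5' UTR + AUG GCU UCA UAA + 3' UTR
print(split_mrna("GCCACCAUGGCUUCAUAAGCCUAA"))
# ('GCCACC', 'AUGGCUUCAUAA', 'GCCUAA')
```

Note that the 3′ UTR in the example happens to contain a UAA; it is not read as a stop codon because it lies after the first in-frame stop.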

UTRs and introns are often useful in genetic studies. Protein-coding regions are under strong selective pressure to produce functional proteins, so sequence variation in them is relatively rare. UTRs and introns, on the other hand, are under less stringent selection and are therefore sources of sequence variants that can be used to develop genetic markers.

Transcription

The genetic information of DNA is transferred to an intermediate molecule called RNA, which is often translated into the amino acid sequences used to build proteins. RNA is a nucleic acid, like DNA, but with some important differences. RNA contains the sugar ribose instead of the deoxyribose found in DNA. RNA molecules are usually single stranded rather than double stranded. RNA contains the base uracil (U) instead of thymine (T); the other bases (A, C, G) are found in both RNA and DNA. Like T, U base-pairs with A.

The non-template and template strands of DNA, with mRNA between them. RNA polymerase is bound to the template strand and the growing mRNA.
Figure 4. The enzyme RNA polymerase (RNA Pol) uses the template strand to synthesize RNA in the 5′ to 3′ direction. The base T is replaced with U in RNA. Image by Walter Suza.

RNA synthesis is directed by a DNA template in a process called transcription. A protein complex containing the enzyme RNA polymerase synthesizes an RNA molecule by adding nucleotides to the 3′ end of the growing chain. The principle of base pairing is used again: each nucleotide added is complementary to the corresponding base on the DNA template, which RNA polymerase reads in the 3′ to 5′ direction. Thus, the RNA is complementary in sequence to the template strand of DNA, which is also referred to as the "antisense" or "negative" strand (Figure 4). The RNA is identical in sequence (except that U replaces T) to the other (non-template) strand, which is called the "sense" or "positive" strand. Because RNA molecules are produced by transcription, they are often referred to as transcripts.
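The base-pairing logic of transcription can be checked with a short Python sketch. It assumes both DNA strands are written 5′ to 3′, as is conventional; the function names and the example sequence are hypothetical. Because the template strand is read 3′ to 5′, `transcribe` walks it backwards and pairs each base, and the result matches the non-template strand with U in place of T.

```python
# Base that RNA polymerase adds opposite each base of the DNA template.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') made from a DNA template strand.

    The template is given 5'->3'. RNA polymerase reads it 3'->5' while
    building the RNA 5'->3', so the template is traversed in reverse.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3.upper()))


def coding_strand_to_rna(coding_5to3: str) -> str:
    """The RNA has the same sequence as the non-template strand, with U for T."""
    return coding_5to3.upper().replace("T", "U")


template = "TTATGAAGCCAT"   # hypothetical template (antisense) strand, 5'->3'
coding = "ATGGCTTCATAA"     # its complementary non-template (sense) strand, 5'->3'
print(transcribe(template))            # AUGGCUUCAUAA
print(coding_strand_to_rna(coding))    # AUGGCUUCAUAA
```

If a template strand is instead written 3′ to 5′, the reversal step is unnecessary and each base can simply be paired in order.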

RNA processing

Coding (transcribed) region

The coding (transcribed) region is the part of a gene that is transcribed by RNA polymerase; it is also known as the RNA coding region. As described below, it may include introns, which are sequences removed from the RNA molecule during RNA processing. The transcribed region is demarcated by the promoter and terminator sequences.

Introns and exons

As mentioned, and described in detail below, introns are sequences that are removed from transcripts during RNA processing. Sequences that are retained in mature transcripts are called exons. The corresponding stretches of DNA are typically referred to with the same terms. Introns are commonly found in genes of eukaryotes but are rare in prokaryotic organisms.

Intron splicing

Transcription produces a pre-mRNA that contains both introns and exons. Splicing removes the introns from the pre-mRNA and joins the exons together. A large complex of proteins and small nuclear RNAs, called the spliceosome, performs the splicing reaction.

Introns sometimes mark the boundaries of sequences encoding functional protein domains, which makes it possible to create new and variant proteins by exon shuffling. Introns also allow the production of variant RNA forms through alternative splicing, so that a single gene can give rise to more than one gene product. Some introns result from the insertion of transposable elements and may be spliced perfectly or imperfectly, providing further opportunities for new genetic diversity.
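Splicing and alternative splicing can be modeled as cutting and joining segments of a string. In this minimal sketch the exon positions are supplied by hand as hypothetical coordinates; a real spliceosome recognizes sequence signals at intron boundaries, which the sketch does not attempt to model. Leaving out exon 2 illustrates exon skipping, one form of alternative splicing.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the chosen exon segments of a pre-mRNA, discarding the rest.

    `exons` holds hypothetical (start, end) positions, 0-based with `end`
    exclusive (Python slice convention), listed in 5'->3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exon 1 | intron 1 | exon 2 | intron 2 | exon 3
pre_mrna = "AUGGCU" + "GUAAAG" + "CCAGGA" + "GUCCAG" + "UCAUAA"

isoform_1 = splice(pre_mrna, [(0, 6), (12, 18), (24, 30)])  # all three exons kept
isoform_2 = splice(pre_mrna, [(0, 6), (24, 30)])            # exon 2 skipped
print(isoform_1)  # AUGGCUCCAGGAUCAUAA
print(isoform_2)  # AUGGCUUCAUAA
```

Both isoforms come from the same pre-mRNA, yet they would encode different amino acid sequences, which is how one gene can yield more than one product.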

The transcribed region of the gene (DNA) is transcribed into pre-mRNA, which is then processed into mature mRNA.
Figure 5. DNA (gene) transcription produces precursor mRNA (pre-mRNA) that contains both introns and exons. The 5′ cap is 7-methylguanosine. The enzyme poly(A) polymerase adds the poly(A) tail. Splicing removes the introns from the pre-mRNA and joins the exons together to form mature mRNA. Image by Walter Suza.

5′ Capping

5′ capping is the addition of a 7-methylguanosine to the first nucleotide of the mRNA molecule, which is usually an adenine or guanine nucleotide. The 7-methylguanosine is attached through a 5′-to-5′ linkage containing three phosphates, rather than the usual 5′-to-3′ linkage containing one phosphate. The cap stabilizes the 5′ end of the mRNA and plays a role in translation initiation.

Polyadenylation

Transcription of a gene may proceed beyond the position that becomes the 3′ end of the mature mRNA. The 3′ end of the mRNA is therefore formed after transcription, by cleavage of the transcript. The enzyme poly(A) polymerase then adds many adenine nucleotides to the new 3′ end, forming the poly(A) tail. The poly(A) tail is necessary for proper processing of mRNA and its transport to the cytoplasm. It is also important for mRNA stability and for translation initiation in eukaryotic organisms.
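Formation of the 3′ end can be sketched the same way: cut the transcript, then append adenines. The function name `add_poly_a`, the cleavage position, and the tail length below are hypothetical inputs chosen for illustration; in cells, the cleavage site is specified by signal sequences in the RNA, and poly(A) tail length varies among organisms and transcripts.

```python
def add_poly_a(transcript: str, cleavage_site: int, tail_length: int) -> str:
    """Cleave a transcript after `cleavage_site` nucleotides and add a poly(A) tail.

    Both numeric arguments are hypothetical inputs for this illustration.
    """
    if not 0 < cleavage_site <= len(transcript):
        raise ValueError("cleavage site must fall within the transcript")
    if tail_length < 0:
        raise ValueError("tail length cannot be negative")
    # RNA beyond the cleavage site is discarded; poly(A) polymerase then
    # adds adenine nucleotides to the newly formed 3' end.
    return transcript[:cleavage_site] + "A" * tail_length


# Hypothetical transcript that runs past the future 3' end of the mRNA
transcript = "AUGGCUUCAUAA" + "GCCUAA" + "UUUGCC"
print(add_poly_a(transcript, cleavage_site=18, tail_length=10))
# AUGGCUUCAUAAGCCUAAAAAAAAAAAA
```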

Summary

The genetic information of DNA is transferred through transcription to an intermediate molecule called RNA. The signals for starting and stopping transcription are located within the DNA sequence and are referred to as promoter and terminator sequences. The coding region of a gene is composed of a sequence of nucleotides that are transcribed into RNA; these sequences include exons and introns. Exons are the sequences retained in mature mRNA; they contain the protein-coding sequence as well as the UTRs. The pre-mRNA contains both introns and exons, and the introns are removed through a process called intron splicing. The mRNA is also processed by 5′ capping and the addition of a poly(A) tail.

Learning Activities

Learning Activity 1

Given the following sequence of double-stranded DNA, predict the sequence of the RNA strand.

Learning Activity 2

License


Genetics, Agriculture, and Biotechnology Copyright © 2021 by Walter Suza and Donald Lee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.