Journal of Virology, April 2006, p. 3811-3822, Vol. 80, No. 8
0022-538X/06/$08.00+0 doi:10.1128/JVI.80.8.3811-3822.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Division of Plant Sciences, 108 Waters Hall,1 Department of Computer Science, 201 Engineering Bldg. West, University of Missouri, Columbia, Missouri 65211,2 Plant Biology Dept., The Samuel Roberts Noble Foundation, Ardmore, Oklahoma 73402,3 Dept. of Biochemistry, Dakota Wesleyan University, Mitchell, South Dakota 57301,4 Dept. of Microbiology, Miami University, Oxford, Ohio 450565
Received 13 September 2005/ Accepted 25 January 2006
FIG. 1. A comparison of genome structures of pararetroviruses and retrotransposons. (A) Genome structure of CaMV. The origin of the CaMV genome sequence is indicated at nucleotide coordinates 8034/1. The positions and approximate sizes of the six CaMV open reading frames that encode protein products are indicated as well as their functions. The positions of the 35S and 19S transcripts are indicated within the circle, and the arrows denote their 3' ends; the 35S RNA is approximately 8,200 nucleotides in length and terminally redundant. (B) Genome structure of Ty1. The major transcript is indicated by the arrow. The ORFs gag and pol encode viral internal proteins and reverse transcriptase, respectively. (C) The structure of the 35S RNA leader sequence. The origin of the CaMV genome is as indicated in 1A. The positions and relative sizes of sORF A through F are indicated as well as the positions of CaMV ORFs VII and I. Although ORF VII has the potential to encode a protein that would be 96 amino acids in length, no protein product has been detected in vivo. The positions of the shunt donor and acceptor sites are indicated in the linear sequence. The diagram on the right illustrates the juxtaposition of shunt donor and acceptor sites within the 35S RNA leader sequence that would form a stem-loop structure.
A second feature that is a hallmark of the caulimoviruses is the ribosomal shunt mechanism of translation (51). The ribosomal shunt has undoubtedly evolved over time to compensate for the complexity and length of the leader sequence of the 35S RNA. This leader sequence is approximately 604 bp in length, and it contains up to nine short ORFs (sORFs), depending on the strain, that differ in size from 9 to 102 nucleotides (Fig. 1C). Such a leader could be expected to be a formidable impediment to translation in any eukaryotic organism, as eukaryotic ribosomes generally initiate translation at the first AUG that is found in a favorable context and encountered nearest the 5' end of the transcript (29). In the majority of eukaryotic transcripts, ribosomes are unable to reinitiate the translation of a second cistron. However, extensive studies have shown that the complexity of the 35S RNA leader sequence is bypassed through the formation of a large stem-loop structure (15, 20), which brings into close proximity the essential elements of the ribosomal shunt (the shunt donor and acceptor sites). In translation of the 35S RNA, host ribosomes enter the 5' end of the 35S RNA and scan a short distance until they reach the shunt donor site (between sORFs A and B) (Fig. 1C). At this point in the sequence, ribosomes bypass the central region of the leader and land at the shunt acceptor site within sORF F (10, 16, 48). The shunt mechanism allows ribosomes to reach the start codon of gene VII. The subsequent expression of genes I through V on the 35S RNA involves alternative mechanisms, including transactivation of the translation of genes I through V by the CaMV gene VI product (18, 44) and splicing (14, 28).
In the present study, we developed a system using Saccharomyces cerevisiae to probe the expression strategy of CaMV. S. cerevisiae has proven useful for studying the transcription and replication of positive-sense plant viruses, such as Brome mosaic virus and Tomato bushy stunt virus (39, 42), and genome-wide screens have revealed yeast (Saccharomyces cerevisiae) proteins that are involved in the replication of these viruses (30, 43). Previous studies have also shown that there is a degree of conservation between the transcriptional machinery of yeasts and plants. For example, the CaMV 35S promoter is active in yeast species (21) and the CaMV 35S terminator sequence also can function as a terminator in yeast, although the sequence elements that are responsible for termination are different from those that are active in plants (23). Furthermore, the CaMV translational transactivator protein has limited activity in yeast (56).
Our study has revealed the presence of a cryptic yeast promoter within the CaMV genomic region that corresponds to a portion of the 35S RNA leader sequence. A BLAST search of the yeast genome showed that this CaMV promoter element was similar to the R region of the LTR of the yeast Ty-1 retrotransposon. In plants, the same CaMV sequence has been shown to have an essential role in the ribosomal shunt mechanism of translation, as it forms the base of the right arm of the stem-loop. Since the left arm of the stem-loop must represent an imperfect reverse copy of the right arm, we propose that the ribosomal shunt has evolved from a pair of LTRs that have become incorporated end to end into the CaMV genome.
Construction of plasmids for gene expression studies in whole leaves. To evaluate the strength of the cryptic yeast promoter in whole leaves, GUS reporter constructs were inserted into an Agrobacterium tumefaciens binary vector, pKYLX7 (53). To create pKYGUS, the GUS gene was modified by PCR to be flanked by KpnI and XhoI sites, which facilitated its cloning into KpnI/XhoI sites engineered into pKYLX7. pCMS129 was created by replacing the 35S promoter, which was present on an EcoRI/HindIII DNA fragment, with a CaMV segment from nucleotides 7672 to 131. pKYMdg was created by digesting pCMS129 with EcoRI and HindIII, filling in the sticky ends, and religating the plasmid. Triparental matings and agroinfiltration were performed as described previously by Palanichelvam and Schoelz (41). Leaf samples were evaluated for GUS expression 4 days after infiltration.
Recombinant DNA techniques and yeast transformation. Restriction enzymes and T4 DNA ligase were purchased from Promega Corp. (Madison, WI) or New England Biolabs (Beverly, MA). DNA ligation, plasmid transformation, and plasmid purification were performed according to the procedure of Maniatis et al. (33). Plasmids were propagated in Escherichia coli strain JM101 (38) grown in double yeast tryptone broth containing 100 µg of ampicillin per milliliter. Purified plasmid DNA from E. coli was transformed into S. cerevisiae strain JC746Dip (MATa/MATa ura3/ura3 his3/his3 leu2/leu2 trp/trp) by the lithium acetate method (24). Yeast transformants were grown in synthetic dextrose medium (0.67% yeast nitrogen base without amino acids, supplemented with auxotrophic requirements and 2% raffinose) lacking uracil.
Detection of the CAT mRNA in yeast by Northern blot analysis. Yeast was grown in minimal medium to an optical density at 600 nm of 1.6 to 1.9, and total RNA was isolated by the glass bead method (2) (type IV glass beads, 250 to 300 microns in diameter; Sigma, St. Louis, MO). Equal concentrations of RNA were denatured using glyoxal, sodium phosphate (pH 7.0), and dimethyl sulfoxide and run on a 1% agarose gel in the presence of 0.01 M sodium phosphate (57). Northern hybridization was performed according to the procedure of Maniatis et al. (33) by using a 32P-labeled CAT DNA probe.
Measurement of CAT protein levels in yeast extracts by CAT ELISA. The induction of the GAL1 promoter has been described previously by Schneider and Guarente (54). Yeast cells (40 ml at an optical density at 600 nm of 1.6 to 1.9) were pelleted by centrifugation, and the pellet was washed twice with distilled water. The pellet was resuspended in 200 µl of 0.25 M Tris-HCl (pH 7.8), the cells were sonicated for a total of 120 s (four times for 30 s each, with incubation on ice for 1 min between sonications), and cell debris were removed by centrifugation. The concentration of total protein was determined by using a modified Lowry protein assay (Bio-Rad, Hercules, CA), and the amount of CAT protein was measured using a CAT enzyme-linked immunosorbent assay (ELISA) kit (Boehringer Mannheim, Indianapolis, IN). CAT and GUS levels were measured by at least three tests for each plasmid, with three replicates per test.
Nucleotide sequence analysis. Nucleotide sequences of CaMV strain D4 present in clone pCMS119 (from nucleotides 7949 to 131) were determined at the DNA Sequencing Core Facility of the University of Missouri, Columbia. The genome size of 8,034 bp for D4 was determined through a comparison of the partial D4 nucleotide sequence to the complete CM1841 nucleotide sequence (17). RNA secondary structure for the CaMV strain D4 leader sequence was generated by using the mFOLD program (63). tRNA primer binding sites for the putative left LTR were found in the tRNAscan-SE genomic tRNA database (32). Multiple sequence alignments were performed using Clustal W (58).
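Several of the CaMV segments described here, such as nucleotides 7949 to 131 in pCMS119, span the genome origin at 8034/1, so their lengths cannot be read off by simple subtraction. The following Python sketch shows one way to handle such coordinates. It assumes 1-based, inclusive coordinates read in the forward direction on a circular genome of 8,034 bp; the function names are illustrative and are not part of any tool used in this study.

```python
GENOME_SIZE = 8034  # bp, size reported in the text for CaMV strain D4


def circular_segment_length(start, end, genome_size=GENOME_SIZE):
    """Inclusive length of a forward-direction segment on a circular genome.

    Coordinates are 1-based. When end < start, the segment is assumed to
    pass through the origin (genome_size/1).
    """
    if not (1 <= start <= genome_size and 1 <= end <= genome_size):
        raise ValueError("coordinates must lie within 1..genome_size")
    if end >= start:
        return end - start + 1
    return (genome_size - start + 1) + end


def segment_contains(start, end, position, genome_size=GENOME_SIZE):
    """Return True if position falls inside the segment start..end."""
    if not (1 <= position <= genome_size):
        raise ValueError("position must lie within 1..genome_size")
    if start <= end:
        return start <= position <= end
    # Segment wraps around the origin.
    return position >= start or position <= end


if __name__ == "__main__":
    print(circular_segment_length(7949, 131))   # insert carried by pCMS119
    print(circular_segment_length(7672, 131))   # segment used for pCMS129
    print(segment_contains(7949, 131, 7995))    # end of the 7949-7995 deletion
```

Under these assumptions, the 7949 to 131 segment is 217 nucleotides long and the 7672 to 131 segment is 494 nucleotides long.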
To assess the statistical significance for the homologous relationship between the R region in yeast LTR and CaMV sequences, we performed a sequence-profile analysis using HMMER (11), which applies profile-hidden Markov models for the statistical descriptions of a sequence family's consensus (60). We downloaded 297 Ty1 and Ty2 LTRs from the yeast genome database (http://www.yeastgenome.org) (27), from which we obtained 40 full LTRs. To avoid the bias from data redundancy, we kept only one copy of LTR R in the profile data set when there were multiple identical sequences. As a result, 20 LTR-R regions were used to construct the multiple sequence alignment profile LTR-R using Clustal W (58). Eight complete CaMV genome sequences were obtained from GenBank (BBC, GenBank ID M90542.1; NY8153, M90541.1; CM1841, V00140.1; CMV-1, M90543.1; B29, X79465.1; Strasbourg, V00141.1; D/H, M10376.1; and Xinjiang, AF14064.1) as well as a CaMV D4 partial genomic sequence (DQ355155). As with the yeast LTR sequences, only one copy of a CaMV sequence was retained in the instance of multiple identical sequences. Consequently, we searched the LTR-R profile against seven CaMV sequences. The E value from HMMER was used to evaluate the significance of the homologous relationship between yeast LTR-R and CaMV R regions. We utilized the E value of 0.05 suggested by HMMER as the significance threshold (12).
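The redundancy filter and the significance cutoff described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually run for the analysis: the record names and sequences are made-up placeholders, the comparison of "identical" sequences is assumed to be case-insensitive exact matching, and the HMMER search itself is not reproduced. Only the 0.05 threshold is taken from the text.

```python
E_VALUE_THRESHOLD = 0.05  # significance threshold stated in the text


def drop_identical_sequences(records):
    """Keep the first record for each distinct sequence.

    records is a list of (name, sequence) tuples. Sequences are compared
    case-insensitively; this matching rule is an assumption.
    """
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


def is_significant(e_value, threshold=E_VALUE_THRESHOLD):
    """Return True when an E value falls below the chosen threshold."""
    return e_value < threshold


if __name__ == "__main__":
    # Hypothetical placeholder records, not real LTR or CaMV sequences.
    toy_records = [
        ("elem_1", "ACGTACGT"),
        ("elem_2", "acgtacgt"),   # identical to elem_1 apart from case
        ("elem_3", "ACGTTCGT"),
    ]
    print(drop_identical_sequences(toy_records))
    print(is_significant(0.00043), is_significant(0.2))
```

Keeping the first occurrence preserves the input order, which makes it easy to trace each retained sequence back to its source record.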
Mapping the 5' end of the CAT mRNA derived from the CaMV sequence. Total RNA was extracted from yeast cells transformed with pCMS119 and used for 5'-end mapping of the CAT mRNA by using the GeneRacer Kit (Invitrogen, Carlsbad, CA). Following the dephosphorylation and decapping of the full-length mRNA, the GeneRacer RNA oligonucleotide was ligated to the 5' end and used for reverse transcription. The resulting cDNA was PCR amplified by using a GeneRacer-specific 5' primer and a gene-specific primer corresponding to the 3' end of the CAT gene. Two fragments, which were approximately the size of the CAT gene, were amplified. The two fragments were separated from each other by gel isolation and subsequently cloned by using a one-shot Topo cloning kit (Invitrogen, Carlsbad, CA). The resultant clones were submitted for sequencing at the DNA Sequencing Core Facility of the University of Missouri, Columbia to identify the transcript initiation sites.
Measurement of GUS expression in yeast cellular extracts and agroinfiltrated plant disks. After protein concentrations from yeast and plants were determined, GUS expression was measured by using a chemiluminescence GUS assay with the GUS-light kit (Tropix, Bedford, MA). GUS concentrations in experimental samples were determined by comparison with a standard curve made with purified GUS supplied by the kit.
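Reading a sample concentration off a standard curve can be illustrated with a short Python sketch. It assumes the chemiluminescent signal is linear in the amount of GUS over the range of the standards, which may not hold for the actual kit, and it uses an ordinary least-squares line. The standard amounts and signal readings below are invented placeholders, not measurements from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def signal_to_amount(signal, slope, intercept):
    """Invert the fitted line to estimate the amount from a signal."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Placeholder standards: amount of purified GUS vs. light units.
    standard_amounts = [0.0, 1.0, 2.0, 4.0]
    standard_signals = [10.0, 110.0, 210.0, 410.0]
    slope, intercept = fit_line(standard_amounts, standard_signals)
    print(signal_to_amount(260.0, slope, intercept))
```

In practice the estimate would then be divided by the total protein in the extract so that samples can be compared; that normalization step is an assumption here and is left out of the sketch.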
Nucleotide sequence accession number. The sequence for CaMV strain D4 was submitted to GenBank and assigned accession number DQ355155.
FIG. 2. Expression of CaMV-CAT fusions in yeast. The yeast GAL1 promoter and CYC1 terminator are illustrated by black and white circles, respectively. The 5' end of the CaMV sequence in all of these constructs begins at the 35S RNA transcription start site (nt 7435). The CaMV genome sequence origin is indicated as in Fig. 1. The CAT gene is indicated by stippled boxes, and solid boxes indicate CaMV ORFs fused in frame to the 5' end of CAT. The coordinates for the insertion of the CAT gene within the CaMV sequence are shown in bold. The values are the averages of at least three tests, and the standard deviations are presented.
In the construct pVII-CAT, the CAT gene was inserted in frame into the coding sequence and so we anticipated that a CAT fusion product would be produced in these cells. To determine whether the CAT protein was in fact fused to CaMV sequences within ORF VII, we altered the CaMV and CAT start codons through PCR-directed mutagenesis. A mutation of the gene VII start codon had no effect on the level of CAT expressed from pCMS100, whereas a mutation of the CAT start codon in pCMS93 almost completely abolished CAT expression (Fig. 2). We also directly compared the size and amount of CAT protein produced in yeast that contained pVII-CAT to that produced in the pMono-CAT cells. In agreement with the CAT ELISA results, a Western blot assay indicated a slightly higher level of the CAT protein in pVII-CAT compared to that in pMono-CAT. In addition, the CAT protein produced from the pVII-CAT construct had the same molecular weight as the CAT protein produced in cells containing the pMono-CAT plasmid (data not shown). These results demonstrated that the CAT start codon is used to express CAT rather than CAT being initiated from the upstream gene VII start codon.
Sequences within sORF F and gene VII form part of a cryptic yeast promoter. The high level of expression of CAT protein from pVII-CAT could be due to a ribosomal shunt mechanism, an internal ribosomal entry site (IRES), or the presence of a cryptic yeast promoter. To investigate whether the high level of CAT expression in yeast was due to a ribosomal shunt mechanism, we deleted 239 nucleotides, from the beginning of the 35S RNA to 25 nucleotides upstream of sORF D (nucleotides 7435 to 7672), to form pCMS95. The effect of this deletion would be to abolish a significant portion of one of the arms of the mRNA stem-loop structure (10, 48). However, this deletion had no effect on CAT expression (Fig. 3), and smaller deletions of sORF A and sORF B also had no effect on CAT expression (data not shown). These results indicated that ribosomal shunting was not responsible for expression of CAT in the construct pVII-CAT, and this was confirmed in the experiments described below.
FIG. 3. Identification of the cryptic yeast promoter in the CaMV large intergenic region. The yeast GAL1 promoter and CYC1 terminator are illustrated by black and white circles, respectively. The deletion of the GAL1 promoter is indicated by a diagonal line. The coordinates for CaMV sequences are indicated as well as the CaMV genome sequence origin. The CAT gene is indicated by stippled boxes, whereas the GUS gene is indicated by the open boxes. Values are the averages of at least three tests, and the standard deviations are presented.
To distinguish whether the CaMV sequences in pCMS101 functioned as a promoter or as an IRES in yeast, we deleted the yeast GAL1 promoter sequence and examined the effect of this deletion on the expression of CAT and GUS in pCMS119. Interestingly, the level of CAT protein expressed from pCMS119 was comparable to that of pCMS101 (Fig. 3), whereas GUS expression was abolished. We concluded that CAT expression was not dependent upon the GAL1 promoter and that sequences within CaMV sORF F and gene VII were part of a cryptic promoter that functioned in yeast. A further deletion of CaMV sequences from nucleotides 7949 to 7995 essentially abolished the activity of the putative CaMV promoter, as CAT protein expression from pCMS120 was comparable to that of the control plasmid pJB79 (Fig. 3). Furthermore, the addition of the GAL1 promoter to pCMS120 had no effect on the CAT protein level (Fig. 3, compare pCMS120 and pCMS121), confirming that CAT expression was driven by a cryptic promoter present in the CaMV sequences.
A comparison of the CAT transcripts produced in yeast cells provided further evidence that CaMV sequences centered around sORF F and gene VII functioned as an active promoter in yeast. Two prominent transcripts that contain CAT sequences can be seen in yeast cells transformed with pCMS101 (Fig. 4). The larger transcript, expressed from the GAL1 promoter, consisted of GUS and CAT sequences, whereas a second transcript was smaller than the pMono-CAT transcript (Fig. 4). The larger size of the CAT mRNA that was derived from the pMono-CAT plasmid could be attributed to sequences from the pYES2 polylinker within its untranslated leader sequence. Removal of the GAL1 promoter, as in constructs pCMS119 and pCMS120, abolished the larger GUS/CAT transcript (Fig. 4), demonstrating that this transcript was not responsible for the expression of CAT. In contrast, the elimination of CaMV sequences from nucleotides 7949 to 7995 abolished the smaller transcript (Fig. 4, lanes pCMS120 and pCMS121) and CAT protein expression (Fig. 3).
FIG. 4. Northern blot analysis of CAT mRNA expression. Two micrograms of total RNA was denatured, separated on a 1% agarose gel, transferred to a nylon membrane, and probed with a 32P-labeled CAT gene.
Identification of two transcript initiation sites within gene VII. The size of the CAT mRNA derived from the CaMV yeast promoter, which was detected in the Northern blot analysis of pCMS101 and pCMS119, indicated that the initiation of transcription occurs within gene VII near the CAT start codon. To precisely determine the transcript initiation site for the CAT mRNA in pCMS119, we used a GeneRacer kit. Two PCR fragments, corresponding to the 5' end of the CAT mRNA, were cloned, and their nucleotide sequences were determined. Based on the intensity of the PCR-amplified fragments and the nucleotide sequences, the most likely transcript initiation site was at the AG dinucleotide 35 nucleotides upstream of the CAT start codon (Fig. 5). A second site was located at the CT dinucleotide 56 nucleotides upstream of the CAT start codon.
FIG. 5. Identification of transcript initiation sites associated with the cryptic yeast promoter. (A) Reverse transcription-PCR bands associated with the 5' end of the transcript. Lane 1 contains DNA size standards and lane 2 illustrates the two reverse transcription-PCR products that delimit the 5' ends of the transcripts. (B) Nucleotide sequence of the cryptic yeast promoter. Transcript initiation sites are indicated by arrows. The origin of the CaMV genome sequence is indicated at nucleotide coordinates 8034/1. The start and stop codons for sORF F and ORF VII are illustrated in bold as is the start codon for the CAT gene.
FIG. 6. Analysis of the cryptic yeast promoter activity in plant cells. The GUS sequence is indicated by the closed boxes. The left border of the Agrobacterium tumefaciens T-DNA is indicated by the box labeled TL. The portion of the CaMV large intergenic region shown to have promoter activity in yeast is indicated by stippling, and its coordinates are indicated. Values are the averages of at least three tests, and the standard deviations are presented.
FIG. 7. Alignment of nucleotide sequences between CaMV strain D4, YDRCdelta3, and Ty912. (A) CaMV sequences within sORF F have sequence and structural similarities to the R regions present within the LTR of Ty1. Sequence identity is illustrated by capital letters, whereas sequence divergence is illustrated by lowercase letters and missing bases are illustrated by dashes. The R region of the LTR of Ty1 and the terminal redundancies present in the 35S RNA of CaMV are illustrated by stippled boxes. The coordinates of key elements that are discussed in the text are also indicated. (B) The CaMV sequence that is homologous to the Ty1 R region, which is illustrated in bold letters, plays a key role in the formation of the ribosomal shunt acceptor. The sORF A open reading frame is boxed. This diagram is redrawn from that of Pooggin et al. (48) using the sequence of CaMV strain D4. The D4 sequence for sORF A and the left arm is derived from Daubert and Routh (9), whereas the D4 sequence for the right arm was determined in this paper.
To assess the statistical significance for the homologous relationship between the R region in the Ty1 LTR and CaMV sequences, we performed a sequence-profile analysis. We searched the profile of yeast LTR R regions against each of the six CaMV genome sequences and the CaMV D4 partial genome sequence by using HMMER (Fig. 8) (11). Our analysis demonstrated that the E values between the profile of LTR R regions in yeast and the genomic sequences of CaMV range from 0.026 to 0.00043, all of which fall below the E value significance threshold of 0.05 suggested by HMMER (12). Among the seven CaMV sequences, the D4 strain has the most significant E value, 0.00043. Thus, the sequence profile analyses indicate that the homology between the yeast LTR R region and the R region of CaMV is statistically significant.
FIG. 8. The multiple sequence alignment among the 20 nonredundant yeast LTR R regions and 7 CaMV nonredundant R regions. Stem sections are labeled STEM1a through STEM d and bulge loops are labeled BL1a through BL 1c. The asterisks denote the sequence conservation. The stems of the right arm of the CaMV ribosome shunt landing site are indicated by black boxes and underlining with heavy dark lines. The seven CaMV sequences in the figure are BBC, CMV-1, B29, CM1841, Strasbourg, D/H, and D4 strains.
A sequence element in sORF F functions as a cryptic yeast promoter/enhancer and has homology to the R region present in the LTR of the yeast retrotransposon Ty1. In our study, we considered three different mechanisms that might explain the high level of expression of CAT from the construct pVII-CAT in baker's yeast: a ribosomal shunt, an IRES, and a cryptic promoter. We eliminated the ribosomal shunt and the IRES as possible mechanisms because both would have been dependent on the yeast GAL1 promoter, and elimination of the GAL1 promoter did not abolish CAT expression. Instead, our mutagenesis studies and Northern analysis revealed that the CaMV sequence had promoter or enhancer activity in yeast.
A BLAST analysis of the yeast genome revealed that the CaMV promoter/enhancer element had sequence homology to the delta element YDRCdelta3, which is located on yeast chromosome IV. Delta elements are solo copies of the LTR of Ty1; they result from a recombination event that occurs between the LTRs of a full-length Ty1 element. As such, delta elements represent a site in the genome where a full-length element had once been inserted. Within the yeast genome, there are 32 full-length copies of Ty1 and 185 solo Ty1 LTRs, either complete versions of the LTR or fragments (27). The BLAST search did not reveal a match between the CaMV sequence and other Ty1 LTRs, but once the initial match was made between the CaMV sORF F sequence and YDRCdelta3, it was possible to align the CaMV sequence with the original sequence of Ty1 (8). This alignment could be extended to the primer binding site for first-strand DNA synthesis for both CaMV and Ty1. In the case of the Ty1 LTR, the R region is located 41 nucleotides upstream from the Met tRNA binding site, whereas in CaMV, the distance is 39 nucleotides.
The portion of CaMV that contributes to the cryptic promoter activity corresponds to the untranslated leader region (ULR) of the Ty1 mRNA (Fig. 7A). Transcription of Ty1 begins at nucleotide position 240 within the LTR (13), and the start codon for the gag ORF is found at nucleotide position 294; the CaMV promoter/enhancer corresponds to approximately 83% of this sequence (Fig. 7A). This portion of the Ty1 LTR has not been reported previously to have enhancer activity. However, it is well established that transcriptional enhancers can be located within ULRs. For example, the copia retroelement of Drosophila contains seven transcriptional enhancer elements within a ULR that is adjacent to its 5' LTR (34, 37). The enhancer elements in the copia ULR have been shown to significantly increase the expression of a minimal heat shock (hsp) promoter element and to bind to the Drosophila CCAAT/enhancer binding protein (DmC/EBP) (37, 62). The copia 5' LTR also enhances the expression of the minimal hsp promoter, but the highest level of expression was attained when both the LTR and ULR were placed upstream from the minimal hsp promoter (37, 62).
In addition, a transcriptional enhancer was recently found within the ULR of CaMV at a point between the 35S RNA transcriptional start site and sORF A (45). It is worthwhile to note that the CaMV 35S promoter was first characterized in 1985 (40) and it is considered to be one of the most extensively studied plant promoters (45), yet the existence of this enhancer element is just coming to light now, perhaps because its presence was masked by other more powerful enhancers upstream from the 35S promoter (45). The discovery of this enhancer in the 35S leader sequence underscores how much there is to be learned about even well-characterized regulatory elements. At a minimum, our work illustrates that sequence alterations in the Ty1 ULR have the capacity to function as a promoter/enhancer element in yeast.
A model for the evolution of the ribosomal shunt mechanism of translation in CaMV from LTRs. So why might a cryptic yeast promoter/enhancer be located within the CaMV large intergenic region? It is unlikely that the promoter/enhancer function is responsible for its conservation from yeasts to plants. This sequence lacked promoter/enhancer activity in an agroinfiltration assay (Fig. 6), although we cannot eliminate the possibility that the cryptic promoter might function in plants in the presence of one or more CaMV proteins. Furthermore, the homology of the 42-nucleotide CaMV sequence to the R region in the Ty1 LTR has no discernible function as an R region for CaMV. The comparable region for CaMV, its terminal redundancy in the 35S RNA, is located 338 nucleotides upstream from the 42-nucleotide stretch that is homologous to the Ty1 R region (Fig. 7A). Nonetheless, the 42-nucleotide CaMV sequence does have an essential role in the translation of the CaMV 35S RNA in plants; it encompasses the shunt acceptor site (Fig. 7B) (10, 16, 48). Since neither of the functions attributed to the 42-nucleotide CaMV sequence in yeast is relevant to how CaMV functions in plants, we propose that the sequence has evolved from the R region of an ancestral LTR into the stem-loop structure required for the ribosomal shunt.
Once the base of the right arm of the 35S RNA stem-loop is recognized to be a remnant of an LTR, then other elements of the LTR come into focus to provide an alternate picture of the structure of the 35S RNA leader sequence (Fig. 9). Most importantly, the left arm of the stem-loop can also be recognized as the remnant of the R region of Ty1, although in an opposite orientation to the R region that is represented in the right arm (Fig. 9B). In essence, the CaMV large intergenic region can be redrawn as two LTRs in opposite orientations (Fig. 9C). The R region remnant in the left LTR extends from nucleotide 7510 to 7563. It occupies more sequence than the R remnant in the right LTR because of insertions of foreign sequences over time (Fig. 9B); these insertions contribute a larger bulge loop in the large stem-loop structure (Fig. 7B). The boundaries between U5 and R correspond to the shunt donor and acceptor sites (Fig. 9A and C).
FIG. 9. Two alternate views on the structure of the 35S RNA leader sequence. (A) A view that emphasizes the shunt donor and acceptor sites as well as sORFs A through F and ORF VII. The origin of the CaMV genome sequence is indicated at nucleotide coordinates 8034/1. (B) Sequence comparison between the yeast delta element YDRCdelta3 and the CaMV sequences that form the base of the stem-loop structure in the 35S RNA leader sequence. The sequence of the right arm matches that of the 35S RNA, whereas the sequence in the left arm is the reverse complement. An asterisk indicates the lack of a base at that position. (C) An alternate view of the 35S RNA leader sequence, which hypothesizes a derivation from two inverted LTRs. The diagram has been generated from the positions of the R region remnants as illustrated in Fig. 7B.
The boundary of U5 on the right LTR is defined by the methionine tRNA binding site, which is the primer binding site for first-strand CaMV DNA synthesis during reverse transcription (Fig. 9C) (50). An inspection of the nucleotide sequence which is upstream from the left LTR reveals one attractive candidate for an additional tRNA primer binding site, as the 3' end of a valine tRNA from S. cerevisiae provides a reasonably good fit. Interestingly, the 3' end of this primer binding site is adjacent to the transcriptional start site of the 35S RNA at nucleotide 7435. The U5 region of the left LTR is a little larger than its right counterpart because of the inclusion of two short sequence elements that enhance the transcription of the 35S RNA (45).
The alternate views of the 35S RNA leader sequence presented in Fig. 9A and C both fit the model for the formation of the ribosomal shunt (10, 16, 48). The model in Fig. 9A illustrates the essential elements of the shunt (the shunt donor and acceptor sites) as well as the sORFs in the leader sequence. Although the sORFs are a prominent feature of the leader sequence, the elimination of start codons in the leader only delays infectivity rather than abolishing it (48). Consequently, none of the sORFs is essential. Pooggin et al. (48) suggested that the sORFs may play a structural role in maintenance of the stem-loop structure or in positioning ribosomes for the shunt. In contrast, the model in Fig. 9C addresses the derivation of the ribosomal shunt mechanism; it suggests that the shunt is derived from two LTRs positioned in opposite directions. This model is consistent with the concept that sORFs might contribute to the stem-loop structure rather than encoding peptides that contribute to CaMV infections.
The structure of the CaMV large intergenic region illustrated in Fig. 9C implies that a CaMV progenitor may have served as a landing pad for other retrotransposons such that two LTRs would be juxtaposed next to the 35S promoter. There are, in fact, precedents for this arrangement of sequences. It is well established that retrotransposons of all types can be found in clusters (5, 52), and Ty1 is no exception. In a survey of transposable elements in the yeast genome, Kim et al. (27) documented 16 compound insertions in which retrotransposons had inserted within other retrotransposons. In addition, they found other cases in which Ty elements were inserted adjacent to each other and in opposite orientations. In one case, the retroelements were separated by only 10 base pairs (27).
It cannot be determined whether the tandem LTRs present in the CaMV large intergenic region are derived from a fungal or plant progenitor, although either scenario is possible. Ty1-copia-type retroelements are found in all plants (46, 59, 61), and they could have served as a source of the Ty1 LTR remnants in CaMV. Although pararetroviruses do not become integrated into their hosts as a part of their infection cycle, integrated copies of some pararetroviruses have been found in plant genomes (19, 26, 31, 49). Consequently, the LTR remnants in the CaMV large intergenic region might have arisen through recombination with a plant retroelement, at a time when a CaMV progenitor may have existed as an integrated copy in its host genome. However, attempts to align the CaMV LTR remnant with Ty1/copia-like and Ty3/Gypsy retroelements present in Arabidopsis (a total of 1,447 retroelements) failed to reveal any conservation (data not shown). Consequently, it does not appear that the CaMV LTR remnant is derived from either of these sources. In contrast, since the CaMV LTR remnants have homology with a yeast retroelement, they might have arisen from a mycorrhizal or endophytic fungus. Some fungi are capable of vectoring viruses into plants (22), although pararetroviruses have not been documented to be spread in this manner (22).
Mobile elements and segments of repetitive DNA have been documented to play a role in the evolution of their hosts through their insertion into and adjacent to host genes (6, 36, 61). Britten (6, 7) has compiled several examples that have been found in animals and insects and has suggested three criteria for inclusion in this group: (i) that the insertion occurred far in the past and that it is not a transient mutation; (ii) that the element is derived from a recognizable group of similar sequences; and (iii) that the element serves a useful function. We propose that the LTR remnants that we have identified in the CaMV large intergenic region should be included in this group. Although we cannot establish a date for when the LTRs were inserted into the CaMV genome, they must be as old as CaMV itself, as they are an integral component of the CaMV genome. The elements themselves are related to the Ty1-copia class of retrotransposons, a class that is ubiquitous in plants (59, 61). Finally, the LTR remnants form the base of the large stem-loop structure that forms within the CaMV 35S RNA leader sequence; they are a structural requirement for the ribosomal shunt (10, 16, 48). The insertion of elements adjacent to each other and in opposite orientations may represent an economical way in evolution to form a sizable RNA local secondary structure, which could be much more efficient than the coevolution of two independent sequences to form the arms of a local secondary structure. This mechanism might appear in other genomes and for other types of RNA local structural motifs.
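The idea that two copies of one element inserted in opposite orientations immediately provide the two arms of a stem can be illustrated with a small Python sketch. It makes several simplifying assumptions: sequences are written as DNA, only Watson-Crick pairs are counted, and the arms are compared position by position without gaps, so bulge loops and G-U pairs are ignored. The 10-nucleotide element is a made-up placeholder and is not taken from CaMV or Ty1.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_fraction(left_arm, right_arm):
    """Fraction of positions at which the two arms could base pair.

    The left arm pairs with the right arm when it matches the reverse
    complement of the right arm. The comparison is ungapped and runs
    over the shorter of the two sequences.
    """
    target = reverse_complement(right_arm).upper()
    length = min(len(left_arm), len(target))
    if length == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(left_arm.upper(), target))
    return matches / length


if __name__ == "__main__":
    element = "GATTACAGGC"                 # hypothetical inserted element
    inverted_copy = reverse_complement(element)
    print(inverted_copy)
    print(paired_fraction(inverted_copy, element))   # perfect inverted repeat
    diverged_copy = "GCCTGTTATC"           # same copy with one substitution
    print(paired_fraction(diverged_copy, element))
```

A perfect inverted copy pairs at every position, and each later substitution in one arm removes a single pair, which is consistent with the imperfect arms described for the 35S RNA stem-loop.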
This research was supported by the Missouri Agricultural Experiment Station, the Food for the 21st Century program at the University of Missouri, and the Research Board at the University of Missouri. The research of X.-F.W. and D.X. was supported by the U.S. Department of Energy's Genomes to Life program (www.doegenomestolife.org) under the project "Carbon Sequestration in Synechococcus sp.: from Molecular Machines to Hierarchical Modeling" (www.genomes-to-life.org).