Journal of Virology, October 2007, p. 11267-11281, Vol. 81, No. 20
0022-538X/07/$08.00+0 doi:10.1128/JVI.00007-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Department of Pathology,1 Department of Surgery,2 The Ohio State University, Columbus, Ohio 43210
Received 2 January 2007/ Accepted 27 July 2007
Progress in developing effective antiviral drugs and vaccine candidates will likely rely upon detailed knowledge of viral gene products and how they function in pathogenetic processes. For this reason, there is an intense interest in defining the gene products of HCMV and other herpesviruses. The HCMV genome of 230 kb is among the largest of the herpesvirus genomes. The genome comprises two unique regions, known as the unique long (UL) and unique short (US) regions, that are bounded by terminal (TRL and TRS) and internal (IRL and IRS) repeat regions. Although the entire sequence of the laboratory-adapted AD169 strain of HCMV first became available in 1989 (22), the precise number and nature of viral genes and gene products are still in question. After sequencing the HCMV genome, Chee and colleagues predicted approximately 200 open reading frames (ORFs) capable of coding for proteins (5, 22). It was subsequently discovered that the AD169 laboratory strain harbored a deletion of approximately 15 kb relative to clinical isolates. This region was predicted to encode 19 additional ORFs, suggesting that the HCMV genome encoded up to 220 genes (18). More recently, comparison of the HCMV genome with the chimpanzee cytomegalovirus genome led to a revised estimate of 145 protein-coding genes for AD169 (27). Likewise, application of an in silico approach based on the Bio-Dictionary gene finder algorithm to define the coding potential of HCMV supported elimination of 37 previously annotated ORFs (61). However, comparison of the AD169 genomic sequence to those of clinical isolates and use of proteomic experimental approaches predict that additional unannotated ORFs exist (62, 83).
Most studies of herpesvirus genomes have focused on protein-coding potentials of these viruses. However, rapid advances in understanding of the role of noncoding gene products and antisense (AS) transcripts are dramatically changing the paradigms applied to gene definitions, gene regulation, and gene functions (reviewed in references 51, 56, 60, and 93). In particular, the application of bioinformatics approaches to analyze expressed sequence databases has revealed that AS transcription is widespread in human and other genomes (reviewed in references 51 and 60).
Sense-antisense (S-AS) transcript pairs have been found in genomes from archaebacteria to humans (58, 79, 94), and a subset of S-AS pairs is recognized to be conserved across species (94). While some pioneering studies have estimated that between 1 and 15% of human or mouse genes were influenced by S-AS pairs (29, 47, 52, 73, 92), more recent estimates of up to 20 to 26% of human genes (24, 94) and 72% of mouse genes (45) suggest that AS-mediated gene regulation may be much more common than previously appreciated. Natural AS transcripts (NATs) are classified as cis or trans in nature. cis NATs are transcribed from opposite strands of the same genomic locus and are predicted to have longer and more perfect complementary sequences for S transcripts than trans NATs derived from separate loci. Regulatory functions of NATs are predicted to derive from double-stranded RNA-dependent and -independent mechanisms, including RNA editing, RNA interference, chromatin remodeling, transcriptional interference, and masking of RNA elements involved in splicing, localization, transport, and translation of RNAs (reviewed in references 51 and 60).
In this study, we examined the transcriptional products of HCMV during lytic infection of fibroblasts. Remarkably, of the 604 HCMV cDNA clones analyzed in this study, at least 45% were derived from genomic regions predicted to be noncoding. Of similar interest was our finding that 55% of the cDNA clones in this study were completely or partially AS to known or predicted HCMV genes. Moreover, cis NAT pairs were identified or predicted for 56 of the 191 genes currently annotated at the Los Alamos National Laboratory Sexually Transmitted Diseases Sequences Database (STD database) (now at the Oral Pathogens Sequences Database). We conclude that genomic maps based on ORF analyses and other in silico analyses may drastically underestimate the true complexity of viral gene products. In addition, the abundance of AS transcription in the HCMV viral genome raises the distinct possibility that AS-dependent gene regulatory mechanisms influence both viral gene expression and gene organization. These noncoding and AS transcripts may offer new insights into HCMV pathogenesis and may serve as novel targets for developing intervention strategies and treatments for HCMV-related diseases.
HCMV strain AD169 was obtained from ATCC and was propagated and titrated on MRC-5 cells by plaque assay (87). Cytomegalovirus strain VHL/E, originally isolated from duodenal biopsy material from a bone marrow transplant recipient (86), was propagated in HUVEC as detailed elsewhere (85) to preserve its natural endothelial cytopathogenicity.
Extraction of HCMV genomic DNA. HCMV genomic DNA was extracted as described previously (81). Briefly, confluent HFF-TEL cells in two 175-cm² flasks were exposed to 1 PFU per cell of AD169 or Towne strains. The cells were harvested at 72 h postinfection and collected by centrifugation. The cell pellets were resuspended in 5 ml of 150 mM NaCl, 10 mM Tris (pH 7.4), and 1.5 mM MgCl2. After incubation on ice, NP-40 was added to a final concentration of 0.1%. The lysate was centrifuged at 3,700 rpm for 20 min using a Beckman GS-6R centrifuge. The supernatant was collected and brought to a final concentration of 0.2% sodium dodecyl sulfate (SDS), 0.5 mM EDTA, and 50 mM β-mercaptoethanol. After incubation on ice and extraction with phenol-chloroform, the genomic DNA was precipitated with ethanol, resuspended in 1 ml of Tris-EDTA buffer, and treated with RNase (Sigma-Aldrich). The genomic DNA was further purified by centrifugation in a linear 5 to 20% (wt/vol) potassium acetate gradient at 40,000 rpm for 3.5 h at 20°C in a Beckman L7 Ultracentrifuge SW60 rotor. Following centrifugation, the DNA was collected, precipitated with ethanol, and resuspended in 50 µl distilled water. The purified genomic DNA was digested with MseI, followed by phenol-chloroform extraction and ethanol precipitation. The digested genomic DNA was finally resuspended in 50 µl sterile water.
Construction of HCMV AD169 cDNA libraries. RNA was extracted from infected HFF cells cultured in 175-cm² flasks under conditions that selected for immediate-early (IE), early (E), or late (L) viral transcripts. For all conditions, cells were exposed to 20 PFU per cell of the AD169 strain of HCMV. To select for IE transcripts, cells were treated with 100 µg/ml of cycloheximide (Sigma-Aldrich) for 1 h prior to infection and throughout the 24-h infection period, when cells were harvested for RNA isolation. To select for E transcripts, 100 µM ganciclovir (Roche Pharma) was added to the medium after the infection period, and cells were harvested 72 h later. To select for L viral transcripts, untreated infected cells were harvested 72 h after infection. Prior to isolation of total RNA, cells from a small section of the flasks were scraped and collected. Cells were disrupted in SDS-polyacrylamide gel electrophoresis sample buffer (25 mM Tris-Cl, pH 6.8, 2.5% β-mercaptoethanol, 5% glycerol, and 0.5% SDS), boiled, and sonicated. Proteins were separated by denaturing gel electrophoresis and transferred to nitrocellulose (Amersham Bioscience). Efficacies of drug treatments were verified by immunoblot analyses for IE UL122/123 products, IE1/2 (Rumbaugh Goodwin no. 1203), the early UL55 product, glycoprotein B (gB) (Rumbaugh Goodwin no. 1201), and the L UL99 product, pp28 (Rumbaugh Goodwin no. 1207) (data not shown).
Unless otherwise stated, all extractions of total cellular RNA were performed using the TRIZOL reagent (Invitrogen), following the instructions of the manufacturer. Polyadenylated mRNA was isolated using an Oligotex kit (QIAGEN) according to the manufacturer's instructions. cDNA libraries were constructed using two cloning vectors, pAcCMV (96), derived from the pAcSG2 Baculovirus transfer vector (BD Biosciences), and pcDNA3.1(+) (Invitrogen). These vectors were modified by introducing recognition sites for two restriction enzymes, PacI and PmeI, that are absent in the AD169 genome. Specifically, sequences recognized by the PacI and PmeI enzymes were inserted between the StuI and KpnI sites of vector pAcCMV using the following oligonucleotides: 5' CCTGTTTAAACCTAGGCGGCCGCTTAATTAAGGTAC and 5' CTTAATTAAGCGGCCGCCTAGGTTTAAACAGG. The PmeI site at position 1007 in pcDNA3.1(+) was replaced with sequences specifying a PacI cutting site using a site-directed mutagenesis kit (Stratagene) and the following oligonucleotides: 5' CTAGAGGGCCCGTTTAATTAAGCTGATCAGCCTCGACTG and 5' CAGTCGAGGCTGATCAGCTTAATTAAACGGGCCCTCTAG.
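The design of the oligonucleotides listed above can be checked with a short script. The following is a minimal sketch, not part of the original methods: the function and variable names are illustrative, the PacI (TTAATTAA) and PmeI (GTTTAAAC) recognition sequences are the standard ones, and the linker strands are assumed to anneal flush at the 5' end of the first oligonucleotide.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Standard recognition sequences of the two enzymes named in the text.
SITES = {"PacI": "TTAATTAA", "PmeI": "GTTTAAAC"}

# Linker oligonucleotides inserted between the StuI and KpnI sites of pAcCMV.
LINKER_TOP = "CCTGTTTAAACCTAGGCGGCCGCTTAATTAAGGTAC"
LINKER_BOTTOM = "CTTAATTAAGCGGCCGCCTAGGTTTAAACAGG"

# Mutagenesis oligonucleotides used on pcDNA3.1(+).
MUT_FWD = "CTAGAGGGCCCGTTTAATTAAGCTGATCAGCCTCGACTG"
MUT_REV = "CAGTCGAGGCTGATCAGCTTAATTAAACGGGCCCTCTAG"


def sites_present(seq: str) -> list:
    """Names of the enzymes whose recognition sequence occurs in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)


def unpaired_tail(top: str, bottom: str) -> str:
    """Bases at the 3' end of `top` left single stranded after annealing.

    Assumes the duplex is flush at the 5' end of `top`.
    """
    duplex = reverse_complement(bottom)
    if not top.startswith(duplex):
        raise ValueError("strands are not complementary from the 5' end of top")
    return top[len(duplex):]


print(sites_present(LINKER_TOP))                 # sites carried by the linker
print(unpaired_tail(LINKER_TOP, LINKER_BOTTOM))  # overhang left after annealing
print(reverse_complement(MUT_REV) == MUT_FWD)    # mutagenesis pair is complementary
print(sites_present(MUT_FWD))                    # sites carried by the mutagenesis oligo
```

Under these assumptions, the annealed linker carries both sites, is blunt at one end (as expected for ligation to a StuI-cut end), and leaves a four-base 3' tail at the other, which would be compatible with a KpnI-cut end. The mutagenesis pair carries a PacI site but no PmeI site.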
cDNA libraries were constructed by following the instruction manual for the SuperScript Plasmid System with Gateway Technology for cDNA Synthesis and Cloning (Invitrogen) with some minor modifications. Briefly, a poly(T)-tailed PacI primer-adapter was used for first-strand cDNA synthesis (5'GCGGCCGCTTAATTAACC(T)15). After second-strand synthesis, an EcoRI-PmeI adapter was added to the 5' end of the cDNA that was generated from the following oligonucleotides: 5'AATTCAGGCCTGTTTAAACG and CGTTTAAACAGGCCTG. cDNA fragments were used in ligation reactions with modified pAcCMV or pcDNA3.1(+) vectors previously digested with the EcoRI and PacI restriction enzymes. Recombinant plasmids harboring cDNA sequences were transformed into XL1-Blue Supercompetent Escherichia coli cells (Stratagene).
cDNA library screening by colony hybridization. Randomly selected transformed bacterial colonies were picked individually and transferred onto agarose grid plates (approximately 100 clones on each plate). Colonies on grid plates were transferred to Hybond-N+ nylon membranes (82 mm in diameter; Amersham Bioscience) and processed for hybridization as described by Hirsch (40). MseI-digested HCMV genomic DNA was labeled using the DIG High Prime DNA Labeling Detection Starter Kit II (Roche Applied Science). Probes were incubated with the membranes according to the manufacturer's instructions.
DNA sequencing and sequence analysis. Bacterial colonies harboring HCMV-derived cDNA sequences were inoculated into 4 ml LB broth supplemented with 50 µg/ml ampicillin. Plasmid DNA was purified from overnight culture using QIAprep Spin Miniprep kits (QIAGEN). cDNA inserts were sequenced from the 5' end using the T7 primer for pcDNA3.1(+) and a pAcCMV-specific primer (5'GGAGACGCCATCCACGCTGTTTTGACC) at the OSU Plant-Microbe Genomics Facility. In total, 870 clones were submitted for sequence analysis from the 5' ends. Sequences were compared to the AD169 genome (GenBank accession no. NC_001347) using mega BLAST (95). Matched AD169 gene sequences were downloaded and aligned to corresponding cDNA clone sequences manually using BioEdit software (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) (39). Selected clones were subjected to a second round of sequence analysis using primers specific for the 3' ends of the cDNA inserts for the pcDNA3.1(+) (5'GCACCTTCCAGGGTCAAGGAAG) and pAcCMV (5'GAGGTGCGTCTGGTGCAAAC) vectors. These primers failed to generate sequence from a subset of clones, presumably because of difficulties in reading through poly(A) tracts. Therefore, in some cases, 3' ends were sequenced using standard poly(T) primers. Selected cDNA inserts were also sequenced using specifically designed internal primers. AD169 genomic positions (accession no. NC_001347) corresponding to the sequences of the cDNA clones were determined using the SPIDEY program for cDNA to genomic alignments (http://www.ncbi.nlm.nih.gov/IEB/Research/Ostell/Spidey/).
RT-PCR for detection of AS transcripts. Confluent MRC-5 cells in six-well plastic tissue culture plates were exposed to 2 PFU per cell of the AD169 strain of HCMV. Cells were harvested at 24, 48, and 72 h after infection. Also, confluent HUVEC monolayers in six-well plastic tissue culture plates were inoculated with sonicated cell lysates containing the VHL/E strain of HCMV (1 PFU/cell). To enhance infection efficiency of endothelial cells, plates were centrifuged at 300 × g for 30 min at room temperature. Cells were then incubated for an additional 30 min before removal of inoculum, followed by two washes with phosphate-buffered saline and addition of fresh medium. Cells were harvested at 24, 48, 72, 96, and 120 h after infection. Total RNA was isolated and treated with DNase I (Invitrogen). The reverse transcription-PCR (RT-PCR) analysis was performed using a ThermoScript kit (Invitrogen), following the manufacturer's instructions. cDNA synthesis was performed in the first step using total RNA and gene-specific primers at 55°C for 35 min (Table 1, reverse primers). In the second step, PCR was performed using primers specific for the gene of interest. A complete list of PCR primers is given in Table 1. Reactions were carried out at 94°C for 2 min, 30 cycles of 94°C for 30 s, 59°C for 45 s, and 72°C for 50 s, and final extension at 72°C for 10 min. RT-PCR products were analyzed by agarose gel (1% [wt/vol]) electrophoresis. No-reverse-transcriptase and no-template controls were run in parallel.
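The PCR cycling conditions given above can be written down as a small data structure, which makes the program easy to transcribe into a thermocycler and to sanity-check. This is only an illustrative sketch: the class and constant names are hypothetical, and the computed duration counts programmed hold times only, ignoring ramp times between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time


# Values taken from the cycling conditions stated in the text.
INITIAL_DENATURATION = Step(94, 2 * 60)
CYCLE = (Step(94, 30), Step(59, 45), Step(72, 50))
N_CYCLES = 30
FINAL_EXTENSION = Step(72, 10 * 60)


def hold_time_seconds() -> int:
    """Total programmed hold time of the PCR step (ramping not included)."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (INITIAL_DENATURATION.seconds
            + N_CYCLES * per_cycle
            + FINAL_EXTENSION.seconds)


print(hold_time_seconds() / 60)  # minutes of programmed hold time
```

With these values the programmed hold time is 4,470 s, or 74.5 min, in addition to the 35-min cDNA synthesis step.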
TABLE 1. Primers for RT-PCR and 5' and 3' RACE
Plasmids pE10422 and pE104114, harboring cDNAs with gene sequences in S and AS orientation, were used to generate probes complementary to AS and S transcripts, respectively, of UL36 (see Table S1 in the supplemental material for a description of cDNA clones). Plasmids pE10422 and pE104114 were linearized with BssHII and NdeI, respectively. RNA probes were generated using T7 polymerase. The AS probe corresponds to nucleotides 49791 to 49577 and 49473 to 48961, while the S probe corresponds to nucleotides 49135 to 50065, of the AD169 genome. Plasmids pL5312 and pL8212, harboring cDNAs with gene sequences in S and AS orientation, were used to generate probes complementary to AS and S transcripts, respectively, of UL24. Plasmids pL5312 and pL8212 were linearized with SalI and NcoI, respectively. RNA probes were generated using T7 polymerase. The AS probe corresponds to nucleotides 29806 to 29136, while the S probe corresponds to nucleotides 28949 to 29362, of the AD169 genome.
Plasmid pL222 carrying the AS UL102 sequence was digested with EcoRI and XbaI (corresponding to nucleotides 150184 to 149378 of AD169), and this fragment was inserted into pBluescript II KS+ (Stratagene). The plasmid was linearized with EcoRI to generate a probe complementary to the AS transcripts of UL102 using the T7 promoter or linearized with XbaI to generate a probe complementary to the S transcripts using the T3 promoter.
Plasmid pE1033, carrying the AS RL5 and S RL4 sequences, was digested with BamHI and SalI (corresponding to nucleotides 4555 to 3958 or 185843 to 186440 of AD169) and inserted into pBluescript II KS+. This plasmid was linearized with BamHI to generate a probe complementary to the AS RL4 transcripts using the T3 promoter. Because the S-specific probe generated from this entire fragment exhibited nonspecific binding, this plasmid was linearized with AvaII to generate a probe complementary to the S transcripts of β2.7 using the T7 promoter (corresponding to nucleotides 4555 to 4399 or 185843 to 185999 of AD169).
Plasmid pE10335, carrying the AS UL61 and AS UL62 sequences, was digested with EcoRI and XhoI (corresponding to nucleotides 94467 to 94735 of AD169), and this fragment was inserted into pBluescript II KS+. This plasmid was linearized with EcoRI to generate a probe complementary to the AS UL61 and UL62 transcripts using the T3 promoter. Because the S-specific probe generated from the T7 promoter exhibited nonspecific binding, a second S-specific probe was generated by linearizing plasmid pE10335 with MluI (corresponding to nucleotides 94467 to 95174 of AD169) and using the T7 promoter.
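The lengths of the UL36 and UL24 probes, and the extent to which the S and AS probes for each gene cover the same genomic positions, follow directly from the coordinates quoted above. The sketch below is illustrative only: it assumes the quoted coordinates are inclusive, treats a descending coordinate pair as simply the same interval read on the other strand, and uses hypothetical names throughout.

```python
# Probe segments as (start, end) pairs in AD169 genome coordinates,
# copied from the text. Two pairs mean the probe spans two segments.
PROBES = {
    "UL36 AS probe": [(49791, 49577), (49473, 48961)],
    "UL36 S probe": [(49135, 50065)],
    "UL24 AS probe": [(29806, 29136)],
    "UL24 S probe": [(28949, 29362)],
}


def segment_length(start: int, end: int) -> int:
    """Length of one segment, assuming inclusive coordinates."""
    return abs(end - start) + 1


def probe_length(name: str) -> int:
    """Summed length of all segments of a probe."""
    return sum(segment_length(s, e) for s, e in PROBES[name])


def shared_positions(name_a: str, name_b: str) -> int:
    """Number of genomic positions covered by both probes."""
    total = 0
    for a_start, a_end in PROBES[name_a]:
        for b_start, b_end in PROBES[name_b]:
            lo = max(min(a_start, a_end), min(b_start, b_end))
            hi = min(max(a_start, a_end), max(b_start, b_end))
            total += max(0, hi - lo + 1)
    return total


for name in PROBES:
    print(name, probe_length(name))
print(shared_positions("UL24 AS probe", "UL24 S probe"))
print(shared_positions("UL36 AS probe", "UL36 S probe"))
```

Under these assumptions the UL36 probes are 728 and 931 nucleotides long, the UL24 probes are 671 and 414 nucleotides long, and the two probes for each gene cover partly shared genomic positions.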
RACE. Rapid amplification of cDNA ends (RACE) was performed to determine 5' and 3' ends of the AS clones of UL36 (E) and the 5' ends of AS clones for UL24 (E) and UL102 (L). Total RNA was isolated from MRC-5 cells in six-well plastic tissue culture plates exposed to 2 PFU per cell of the AD169 strain of HCMV at 48 h (for UL36 RACE) or 72 h (for UL24 and UL102 RACE) after infection. RNA was treated with DNase I (Roche Applied Science). 5' or 3' cDNA ends were amplified using the 5'/3' RACE kit (Roche Applied Science), following the manufacturer's instructions. The products of the RACE reactions were inserted into a TOPO TA vector (Invitrogen) and sequenced at the OSU Plant-Microbe Genomics Facility. Primers used for RACE experiments are listed in Table 1.
Many indicators suggest that these libraries accurately reflect temporal regulation, splicing, and abundance of viral gene products. cDNA clones isolated include S sequences overlapping 125 of the 191 unique genes currently annotated for the AD169 strain of HCMV in the STD database, including full-length sequences for 92 annotated genes. Examples of correct temporal expression and splicing of viral genes in these libraries were supported by clones representing the UL123 and UL4 genes. The major transcriptional activator (IE72) encoded by the UL123 gene is known to be expressed during IE times (78). At least 15 cDNA clones isolated from the IE library were full-length and fully spliced transcripts capable of coding for the IE72 protein. Also, transcripts with short and long 5' untranslated regions (UTRs) for the UL4 gene have been characterized to accumulate at early and late times after infection, respectively (1, 15, 20). We identified 14 transcripts harboring longer 5' UTR regions, all of which were derived from the L library. We also identified 15 clones that had shorter 5' UTRs, the majority of which (9) were found in the E library. In addition, studies from other laboratories suggest that the most abundant transcripts found in infected cells overlap the TRL4/IRL4 genes (TRL/IRL region genes are henceforth referred to as RL) (35), and this was reflected in our libraries. Indeed, 169 clones in our E and L libraries harbored RL4 gene sequences, 141 of which were classified within the same transcript group (see Table S1, group 148, in the supplemental material). Taken together, these and other indicators suggest that our libraries reasonably reflect the range and abundance of HCMV transcriptional products.
A more detailed comparison of the temporal profiles of transcripts isolated in this study relative to microarray and Northern studies from other laboratories is shown in Table S2 in the supplemental material. This table also lists the clones characterized in this study, organized by gene name, and provides references for other transcript mapping studies performed for HCMV. The primary conclusion from this analysis is that the tentative temporal class assignments made in this study are largely congruent with temporal class assignments made using Northern and microarray analyses. The major difference is that we found many more genes represented in the transcripts of the IE class relative to findings with other methods. Specifically, we identified 45 genes with at least 1 clone isolated in the IE library. For many of these genes, the majority of clones were isolated at E or L times with only one or a few clones isolated at IE times. There are two non-mutually exclusive explanations to account for these unexpected transcripts found in the IE library: (i) they represent tegument-associated viral transcripts delivered by the virion rather than newly synthesized viral transcripts; and (ii) these IE transcripts reflect leaky control of E and L gene expression in the IE period in HCMV-infected fibroblasts. Although the latter possibility could be due to incomplete blockade of protein synthesis by cycloheximide treatment, the efficacy of drug treatments used to generate the cDNA libraries was verified by measuring viral protein accumulation (data not shown). It is worthy of note that while the cDNA library approach employed in this study is subject to different biases relative to other methods (see Discussion and reference 19), it is free of bias introduced by employing gene-specific probes. 
Thus, it is possible that a cDNA library approach offers a specific advantage relative to Northern and microarray approaches in capturing minor transcripts available at various temporal phases of infection.
Another unexpected feature of our libraries was the prevalence of transcripts overlapping genes predicted to be noncoding genes. While we expected transcripts overlapping the RL4 gene to be abundant (35), we also isolated numerous cDNA clones from other repeat region genes. We found that 230 of the 604 clones analyzed mapped to the RL2-RL9 region. We also found 29 clones derived from the UL61 to UL68 gene region and 13 clones derived from the UL106 to UL111 gene region, also recently revised as likely to be noncoding regions (27). Altogether, we obtained 274 clones (45% of the total) overlapping annotated genes predicted to be noncoding.
AS transcription in the HCMV transcriptome. One of the most striking features of our transcriptome study was the prevalence of transcripts in AS orientation to known or predicted genes. For this analysis, we compared our cDNA sequences to the genomic map of the AD169 strain (GenBank accession no. NC_001347) of HCMV and the STD database. This map includes up-to-date revisions in annotation proposed by Davison and colleagues, including annotation of those genes revised as noncoding (27).
Of the 604 sequences we analyzed, 257 represented one or more genes strictly in the S orientation (Table 2). Remarkably, 347 sequences were partially or completely AS to genes annotated on the STD database map, representing 57% of the cDNA clones isolated in our libraries. Because experimental evidence verifying the existence of gene products derived from a number of viral genes is lacking, we considered the possibility that the only products derived from a subset of these genes are those we identified in our library and that our calculation for the number of AS transcripts could be overestimated. When we excluded those genes for which we could find no evidence in our libraries or in the literature for a product derived from the S orientation of the gene (orphaned AS transcripts; see Table 5), we estimated that 271 clones (45%) represented transcriptional products strictly in one orientation with respect to gene sequences (designated the S orientation) and 333 clones (55%) were completely or partially in AS orientation. The AS sequences fell into two categories: a minority (54 clones) overlapped one or more genes strictly in AS orientation, whereas 279 clones overlapped more than one gene with sequences in both S and AS orientations. ORF analysis of AS clones predicts that these are predominantly noncoding (see Table S1 in the supplemental material). When these clones are included in the calculations for coding and noncoding transcripts, we estimate that up to 49.5% of the clones isolated in this study are noncoding in nature.
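The clone counts and percentages in this paragraph can be cross-checked with simple arithmetic. The following sketch uses only the counts stated above; the variable names are illustrative, and percentages are rounded to the nearest whole number.

```python
TOTAL_CLONES = 604

# Before excluding orphaned AS transcripts.
sense_initial = 257
antisense_initial = 347

# After excluding genes with no evidence of an S-oriented product.
sense_adjusted = 271
antisense_adjusted = 333

# Breakdown of the adjusted AS category.
strictly_antisense = 54
mixed_orientation = 279


def percent(count: int, total: int = TOTAL_CLONES) -> int:
    """Share of the library, rounded to the nearest whole percent."""
    return round(100 * count / total)


print(percent(antisense_initial))      # AS share before adjustment
print(percent(sense_adjusted))         # S share after adjustment
print(percent(antisense_adjusted))     # AS share after adjustment
print(sense_adjusted - sense_initial)  # clones reassigned to the S category
```

The two orientation categories sum to 604 in both tallies, the AS subcategories sum to 333, and the adjustment moves 14 clones from the AS category to the S category.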
TABLE 2. Orientation of cDNA clones
TABLE 5. Orphaned AS transcripts
TABLE 3. Genes for which cis natural S-AS pairs were identified and their properties
TABLE 4. Genes with predicted S-AS pairs
Based upon the library in which the clones were isolated, we made predictions regarding the temporal association of S and AS transcripts derived from the same gene. Accordingly, 7 S-AS pairs were discordant relative to the library in which they were identified, whereas most (28) were concordant inasmuch as they were isolated from the same library. Keeping in mind that the E library is expected to contain both IE and E transcripts and the L library could contain transcripts expressed at IE, E, or L temporal classes, together these findings suggest that the majority of S-AS pairs are concordantly and inversely expressed during infection.
We also classified S-AS pairs with respect to the nature of the complementary overlapping sequences (Table 3). Combining classification schemes proposed by others (45, 94), we assigned each pair to one of four categories: full overlap, intronic, convergent, and divergent. Full overlap was defined as one gene sequence being completely contained within the gene sequence of the other member of the pair. Intronic was defined as one gene sequence starting within the intron of the other member of the pair and ending beyond the start of its pair. Divergent was defined as S-AS pairs exhibiting overlap in their 5' regions in a head-to-head manner. Finally, convergent was defined as S-AS pairs exhibiting overlap in their 3' regions in a tail-to-tail manner. Each potential pair was assigned to only one category, and the order of stringency was full overlap, intronic, divergent, and convergent. Using this scheme, we identified S-AS pairs that fell into each of these groups. Although the least abundant class, intronic pairs were observed for S-AS pairs overlapping seven genes. Convergent overlap was common, with 1 or more S-AS pairs for 14 genes falling into this class. Pairs exhibiting full overlap were also abundant. We found 1 or more S-AS pairs overlapping 19 genes in this class. However, it should be noted that in the absence of 3' end sequence for all of the cDNA clones, it is possible that the number of genes with S-AS pairs exhibiting full overlap is currently overestimated. Finally, we found that 29 of 38 genes included 1 or more S-AS pairs that were divergent in their overlap. Also, of those genes with multiple S-AS pairs, the divergent class was typically most abundant. Thus, while we observe a diversity of S-AS pair classes derived from HCMV genes, pairs with divergent or head-to-head overlap were most common.
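The four overlap categories and their order of stringency can be expressed as a simple decision procedure. The sketch below is one possible reading of the definitions given above, not the procedure actually used in the study: the `Transcript` type, the function names, and the toy coordinates are all hypothetical, and the intronic test is interpreted as "the 5' end of one transcript lies in an intron of the other, and its 3' end extends past the 5' end of the other".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    lo: int              # leftmost genomic coordinate
    hi: int              # rightmost genomic coordinate
    strand: str          # "+" or "-"
    introns: tuple = ()  # (lo, hi) intervals, if spliced

    @property
    def five_prime(self) -> int:
        return self.lo if self.strand == "+" else self.hi

    @property
    def three_prime(self) -> int:
        return self.hi if self.strand == "+" else self.lo


def contains(outer: Transcript, inner: Transcript) -> bool:
    return outer.lo <= inner.lo and inner.hi <= outer.hi


def is_intronic(a: Transcript, b: Transcript) -> bool:
    """a starts inside an intron of b and runs past the 5' end of b."""
    starts_in_intron = any(lo <= a.five_prime <= hi for lo, hi in b.introns)
    if b.strand == "+":
        passes_start = a.three_prime < b.five_prime
    else:
        passes_start = a.three_prime > b.five_prime
    return starts_in_intron and passes_start


def classify_pair(s: Transcript, a: Transcript):
    """Assign one category, testing in the stated order of stringency."""
    if s.strand == a.strand:
        raise ValueError("an S-AS pair must lie on opposite strands")
    if s.hi < a.lo or a.hi < s.lo:
        return None  # no overlap, so not a cis S-AS pair
    if contains(s, a) or contains(a, s):
        return "full overlap"
    if is_intronic(s, a) or is_intronic(a, s):
        return "intronic"
    plus, minus = (s, a) if s.strand == "+" else (a, s)
    # Minus-strand transcript to the left: the 5' ends overlap (head to head).
    # Plus-strand transcript to the left: the 3' ends overlap (tail to tail).
    return "divergent" if minus.lo < plus.lo else "convergent"


# Toy coordinates for illustration only.
sense = Transcript(100, 500, "+")
print(classify_pair(sense, Transcript(50, 200, "-")))   # head-to-head case
print(classify_pair(sense, Transcript(400, 800, "-")))  # tail-to-tail case
print(classify_pair(sense, Transcript(200, 300, "-")))  # contained case
```

Testing the categories in a fixed order is what guarantees that each pair receives exactly one label, even when a pair would satisfy more than one definition.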
Finally, we classified S-AS pairs according to functional class of known or predicted gene products (Table 6). Several S-AS pairs overlap genes known to be involved in DNA replication and packaging, including those genes (UL70 and UL102) that encode the components of the HCMV helicase-primase complex. We also found S-AS pairs for the viral inhibitors of apoptosis encoded by UL36 and UL37 genes and the recently described noncoding β2.7 transcript overlapping the RL4 gene (69). Additionally, we identified S-AS pairs that overlap genes encoding tegument proteins, glycoproteins, and proteins involved in subversion of immune responses (UL111A and RL11) (2, 54) or cellular antiviral defense mechanisms (TRS1 and IRS1) (17). Finally, we identified S-AS pairs for at least 32 genes of unknown function, most of which are predicted to be noncoding. This aside, these findings indicate that there is not a clear bias of S-AS pairs for genes of specific functional classes and that S-AS pairs exist for both coding and noncoding genes.
TABLE 6. Functional classes of genes with verified or predicted cis natural S-AS pairs
Our analysis of cis NATs from the HCMV genome relied upon isolation of virally derived transcripts from human foreskin-derived fibroblasts. If this is a truly robust phenomenon, we predict that cis NATs will be observed in other infected cell types and in cells infected with other strains of HCMV. To test this, we used RT-PCR to identify AS transcripts in lung-derived human fibroblasts infected with the laboratory-adapted strain AD169 of HCMV and endothelial cells infected with the VHL/E clinical strain of HCMV. We designed primers specific for AS transcripts from both coding and noncoding viral genes. We also utilized primers specific for S-oriented viral UL55 transcripts coding for gB or cellular glyceraldehyde-3-phosphate dehydrogenase (GAPDH) transcripts. For specificity controls, RNA isolated from virus-infected and mock-infected cells was analyzed in the presence and in the absence of reverse transcriptase.
In the first set of experiments, RT-PCR was used to amplify AS transcripts identified in the E and L libraries using RNA isolated from AD169-infected MRC-5 fibroblasts at 48 and 72 h after infection, respectively. As shown in Fig. 1A, we specifically amplified portions of AS transcripts found in the early library derived from genes known or predicted to be protein coding (UL36 and US30/31) and genes predicted to be noncoding (UL61, UL67, RL2/3, and RL8/9). Similarly, we specifically amplified products of expected sizes for AS transcripts derived from genes known to be coding (UL27, UL70, UL87, and UL102) and predicted to be noncoding (RL3) from RNA isolated at 72 h after infection (Fig. 1B). Finally, we specifically amplified products of expected sizes for AS transcripts derived from genes specifying tegument proteins (UL23, UL24, UL25, UL47, and UL88) (Fig. 1C). Except for the cellular GAPDH, we could not amplify these transcripts from mock-infected cells. The absence of bands of predicted sizes in reactions conducted without reverse transcriptase ensures that these results do not reflect contamination of our reactions with viral DNA. We conclude from these experiments that virally derived AS transcripts are generated during lytic HCMV infections of fibroblast cells and that these AS transcripts do not reflect DNA contamination during the cDNA library construction.
FIG. 1. Verification of virally derived AS transcripts. Digital images of agarose gels used to separate PCR products specific for virally derived AS and S transcripts. Confluent MRC-5 cells in six-well tissue culture plates were exposed to 2 PFU per cell of the AD169 strain of HCMV (top panels) or were mock infected (bottom panels). (A) RT-PCR of AS transcripts identified in the E library using total RNA isolated at 48 h after infection. (B) RT-PCR of AS transcripts identified in the L library using total RNA isolated at 72 h after infection. (C) RT-PCR of AS transcripts specific to tegument genes using total RNA isolated at 24 h after infection (UL47) or 72 h after infection (UL23, UL24, UL25, and UL85). Total RNA was isolated and subjected to RT-PCR using primers specific to virally derived AS transcripts (Table 1). Primers specific for the S transcripts of viral UL55 (gB) or cellular GAPDH were included as positive controls. No-reverse-transcriptase and no-template (NT) controls were run in parallel. At 48 h, the no-template control included primers specific for AS US30/31. At 72 h, the no-template control included primers specific for AS UL87/88. In panel C, the no-template control included primers specific for AS UL23. RT-PCR products were separated by agarose gel (1% [wt/vol]) electrophoresis, and bands were visualized by exposure to UV light.
FIG. 2. Verification of AS transcripts in endothelial cells infected with a clinical strain of HCMV. Digital image of an agarose gel used to separate PCR products specific for virally derived AS and S transcripts. Confluent HUVEC monolayers in six-well tissue culture plates were inoculated with the VHL/E strain of HCMV (1 PFU/cell). Cells were harvested at 24, 48, 72, 96, and 120 h postinfection or were mock infected for 120 h. Total RNA was isolated and subjected to RT-PCR using primers specific to virally derived AS transcripts for UL36/37 (A) or UL102 (B). Primers specific for the S transcripts of viral UL55 (gB) or cellular GAPDH were included as positive controls. No-reverse-transcriptase and no-template controls were run in parallel. RT-PCR products were separated by agarose gel (1% [wt/vol]) electrophoresis, and bands were visualized by exposure to UV light.
FIG. 3. Northern blot analysis of S and AS transcripts. Film images of S and AS transcripts analyzed by Northern blotting and schematic depictions of select S and AS transcripts. Confluent MRC-5 cells in six-well tissue culture plates were exposed to 2 PFU per cell of the AD169 strain of HCMV or were mock infected (M). Cells were harvested at 24, 48, and 72 h after infection. Total RNA was isolated, subjected to denaturing agarose gel electrophoresis, and transferred to nylon membranes. Prelabeled RNA molecular mass markers were loaded for each group (L). Membranes were incubated with probes specific for the S and AS transcripts of UL24, 3 µg RNA/lane (A); UL36, 3 µg RNA/lane (B); UL102, 4 µg RNA/lane (C and F); UL61, 5 µg RNA/lane (D and F); or RL4, 4 µg RNA/lane (E and F) as described in Materials and Methods. Exposure times are indicated at the top of each panel. A schematic of transcripts represented by select cDNA clones isolated in our library relative to the genome is shown in the right panels. Gene regions and intergenic regions are depicted by thick arrows and white boxes, respectively. Transcripts cloned in this study are represented as thin arrows below the gene regions. 5' ends of transcripts are depicted with filled circles. Clones in which we identified the poly(A) tails are indicated (AAA). The genomic positions of the 5' and 3' ends of the clones in the libraries are shown. Underlined genomic positions are those verified by RACE. Dashed lines represent presumptive transcript sequences based on RACE analysis. Tentative assignment of bands corresponding to clones identified in this study is indicated with circled numbers. In panel F, the relative abundances of transcripts overlapping the RL4 and UL61 gene regions are compared to those of transcripts derived from the UL102 gene region. The white circle indicates the position of the 2.6-kb marker band. No images were altered.
Similar to that observed for the UL36 AS transcripts, we found that expression of the S UL102 transcript precedes expression of the AS UL102 transcripts (Fig. 3C). We identified a 2.3-kb S transcript in our library represented by clone pL2211. An additional larger S transcript is also observed by Northern analysis, consistent with that reported previously (73a). RACE analysis confirmed the 5' end of AS clone pL222 and identified another, larger AS transcript of 2.1 kb initiating at genomic position 151035. Additional, even larger bands were observed by Northern analysis that have yet to be characterized. We predicted that cis NATs of UL102 should be expressed with similar abundance and concordantly. While concordant expression was confirmed, the difference in exposure times required to visualize these transcripts indicates inverse accumulation of the S-AS transcripts.
Finally, we used a Northern blot approach to ascertain whether the abundance of clones in our libraries overlapping the RL4/5 and UL61/UL62 genes reasonably reflects the abundance of these transcripts in infected cells (Fig. 3D, E, and F). First we set out to verify the existence of S and AS clones from these gene regions. Clone pL537 was the longest S clone overlapping UL61 in our library, predicting a band of 3.9 kb, which corresponds well with the largest band observed on this Northern blot (Fig. 3D). Numerous clones overlapped UL61 in the AS orientation, the longest of which are listed in transcript group 59 (see Table S1 in the supplemental material), which predict a band of 4.6 kb. These clones correspond well with the largest band identified with the AS-specific probe for UL61 (Fig. 3D). The smaller bands identified by both S and AS probes specific for UL61 may represent distinct smaller transcripts or degradation products of the larger transcripts isolated in this study. The major 2.7-kb transcript overlapping the RL4/5 genes has been described previously (35) and represented the major band identified by the S-specific probe (Fig. 3E). However, most or possibly all of the clones overlapping the RL4 genes isolated in this study appear to be initiated from genomically encoded poly(A) tracts and thus do not represent the full-length 2.7-kb transcript (represented by clone pE103121, depicted in the right panel of Fig. 3E). We identified a band of approximately 3 kb using a probe that would recognize transcripts that are in AS orientation relative to the 2.7-kb transcript. 
Although the precise boundaries of the AS RL4/5 transcripts (represented by clone pE103210 in the right panel) as well as the UL61-derived transcripts have yet to be verified, the different exposure times required to visualize the S and AS transcripts clearly demonstrate inverse expression of the UL61-derived AS transcripts relative to the UL61-derived S transcripts and inverse expression of RL4-derived S transcripts relative to the RL4-derived AS transcripts. Finally, we compared the abundance of clones overlapping the UL61/UL62 and RL4/5 gene regions to that derived from the UL102 gene region (Fig. 3F). In our libraries, we isolated two S UL102 clones, five times as many AS UL61 clones, and 85 times as many S RL4 clones. The abundance of these clones in our libraries correlated with the signal intensities of transcripts that bound the S-specific RL4 probe and the AS-specific UL61 probe relative to the S-specific UL102 probe. Taken together, these experiments prove the existence of cis NATs derived from coding (UL24, UL36, and UL102) and predicted noncoding (UL61 and RL4) genes. Furthermore, these studies indicate that the representation of S and AS transcripts in our libraries reasonably reflects the abundance and temporal expression patterns of S and AS transcripts generated in HCMV-infected fibroblasts.
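The clone ratios quoted above can be restated as absolute counts. The short sketch below only redoes that arithmetic from the numbers given in the text (two S UL102 clones, with the other two groups expressed as multiples of that number); the variable names are illustrative and do not come from the study.

```python
# Clone counts as stated in the text: two S UL102 clones, with the AS UL61
# and S RL4 clone numbers given as multiples of the UL102 count.
s_ul102_clones = 2
as_ul61_clones = 5 * s_ul102_clones    # "five times as many"
s_rl4_clones = 85 * s_ul102_clones     # "85 times as many"

clone_counts = {
    "S UL102": s_ul102_clones,
    "AS UL61": as_ul61_clones,
    "S RL4": s_rl4_clones,
}

# Abundance of each group relative to the S UL102 reference.
relative_abundance = {
    name: count / s_ul102_clones for name, count in clone_counts.items()
}

for name, count in clone_counts.items():
    print(f"{name}: {count} clones ({relative_abundance[name]:.0f}x S UL102)")
```

Under these figures the libraries would contain 10 AS UL61 clones and 170 S RL4 clones for every 2 S UL102 clones, which is the ordering that the Northern signal intensities in Fig. 3F are compared against.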
The discovery of widespread AS transcription also bears upon the use of viral gene arrays for transcriptome analyses and recombinant viruses for gene function studies. Specifically, these findings suggest that tiling arrays using overlapping probes from both DNA strands rather than probes based upon ORF predictions will be required to capture an accurate representation of viral transcripts. Additionally, these findings raise the possibility that AS as well as S gene products could be affected when recombinant viruses harboring mutations or deletions are generated.
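To make the tiling-array point concrete, the following is a minimal sketch of how overlapping probes could be drawn from both DNA strands of a sequence. It is not a design used in this study: the function names, the probe length, and the step size are all assumptions chosen for illustration, and the toy sequence is not HCMV sequence.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tiling_probes(genome: str, probe_len: int = 25, step: int = 10):
    """Return (start, strand, sequence) tuples for overlapping probes.

    probe_len and step are hypothetical values; probes overlap whenever
    step < probe_len. Each window yields one probe per strand, so that
    transcripts in either orientation can hybridize to some probe.
    """
    probes = []
    for start in range(0, len(genome) - probe_len + 1, step):
        window = genome[start:start + probe_len]
        probes.append((start, "+", window))
        probes.append((start, "-", reverse_complement(window)))
    return probes


# Toy example with a made-up sequence.
toy_genome = "AAGCTTGGCCATGCATCGAT"
for start, strand, seq in tiling_probes(toy_genome, probe_len=8, step=4):
    print(start, strand, seq)
```

An ORF-based array would contain only the probes matching annotated coding strands, whereas this layout covers every window in both orientations regardless of annotation.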
There are several factors that could influence the composition of this library that are relevant to the interpretation of these data. First, because highly abundant transcripts are expected to be preferentially represented, we infer that transcripts from genes not represented in our library are of relatively low abundance. While abundance of these transcripts is likely a key factor, we cannot exclude the possibility that artifactual pressures might also influence library composition. For example, it is possible that there were selective advantages for isolating transcripts from gene regions with long or repetitive tracts of adenosines, such as the RL gene region. These transcripts may have been preferentially enriched during the purification of polyadenylated RNAs prior to cDNA library construction. Similarly, genomically encoded poly(A) tracts may have served as primer binding sites during the reverse transcription step of the cDNA library construction. While many cDNA clones appear to have genuine poly(A) tracts, we also found a number of clones, especially from predicted noncoding regions, in which the 3' end sequence corresponded to a genomically encoded poly(A) tract. Therefore, further studies will be necessary to define precise ends of a subset of transcripts, as well as their relative abundance in infected cells. Despite these caveats, Northern analyses of transcripts derived from single-copy and repeat-region genes suggest that composition of our library reasonably reflects the abundance and temporal expression patterns of transcripts in infected fibroblasts.
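The concern about genomically encoded poly(A) tracts can be illustrated with a simple check of the genomic sequence immediately downstream of a clone's 3' end. This is only a sketch of the general idea, not the procedure used in this study: the function name, the window size, and the adenosine threshold are assumptions, and the example handles plus-strand clones only.

```python
def looks_genomically_primed(genome: str, three_prime_end: int,
                             window: int = 10, min_a: int = 8) -> bool:
    """Flag a plus-strand clone whose 3' end abuts an A-rich genomic tract.

    three_prime_end is the 0-based index of the first genomic base after
    the clone's last templated base. If the genome itself encodes a run of
    adenosines at that position, the apparent poly(A) tail may reflect
    priming or enrichment on the genomic tract rather than a genuine
    polyadenylation site. window and min_a are hypothetical thresholds.
    """
    downstream = genome[three_prime_end:three_prime_end + window]
    return downstream.count("A") >= min_a


# Toy genome (made up): a short unique stretch, an A tract, then more sequence.
toy_genome = "GATTACA" + "A" * 12 + "GCGC"
print(looks_genomically_primed(toy_genome, 7))  # 3' end sits on the A tract
print(looks_genomically_primed(toy_genome, 0))  # 3' end in mixed sequence
```

Clones flagged in this way would be the ones for which an independent method such as RACE is needed to define the true 3' end.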
AS transcripts of the HCMV genome. Individual AS transcripts have been described for many herpesviruses, including the betaherpesviruses (6, 48, 82), the gammaherpesviruses (65, 66, 72), and especially the alphaherpesviruses (8, 9, 13, 16, 21, 25, 26, 42, 49, 50, 53, 68, 84, 88, 89, 91). In fact, Roizman and colleagues predicted that genes in AS orientation to known herpesvirus genes could be common (16). However, to our knowledge, our study is the first to systematically document cis NATs derived from a herpesvirus. We found that at least 55% of the clones analyzed in this study contain sequences in AS orientation overlapping 56 known or predicted genes. These figures are dramatic inasmuch as they suggest that more than half of all virally derived transcripts harbor AS sequences. Nevertheless, three factors suggest that our study may have underestimated AS transcription in HCMV. First, our libraries do not contain all of the S transcripts expressed by HCMV, and thus, it is likely that they do not contain all of the AS transcripts that accumulate during infection. Indeed, at least two previously reported AS transcripts overlapping the HCMV UL82 and UL123 genes were not isolated in our libraries (6, 48). Second, we selected for polyadenylated transcripts during construction of our libraries, and studies suggest that a large fraction of AS transcripts are poly(A) negative (46). Finally, we classified only cis NATs identified in our libraries. Considering that the HCMV genome includes numerous repeat elements, it is likely that trans-derived NATs are also generated during infection.
Relative to eukaryotic genomes, herpesvirus genomes are small and densely populated with genes. In fact, many genes are directly adjacent to one another, and a number of viral genes overlap each other, often in opposite orientations. These characteristics suggest that the occurrence of S-AS pairs may only reflect the dense organization of viral genes and the paucity of intergenic regions compared with larger genomes. Nevertheless, S-AS pairs derived as a consequence of this gene organization may be functionally relevant with respect to regulation of viral genes. This raises the interesting possibility that regulatory consequences of S-AS pairs may constitute one evolutionary pressure influencing function and orientation of adjacent viral genes.
Given these properties of the HCMV genome, it seems reasonable to predict that higher proportions of viral genes would be associated with S-AS pairs than would be the case for genes of eukaryotic genomes. In this study, we found that at least 29% of the currently annotated 191 HCMV genes are associated with S-AS pairs. While this figure is similar to estimates of 22 to 26% (24, 94) of human genes associated with S-AS pairs, it is lower than the estimate of 72% of mouse genes (45) predicted to be influenced by S-AS pairs. These preliminary findings suggest that the proportion of HCMV genes involved in the generation of S-AS pairs is similar to that observed for the human genome, despite the striking differences in sizes and structures of these genomes. Another common feature between mammalian and viral S-AS pairs is the nature of the complementary sequences. Our analyses indicate that most S-AS pairs overlap either in 5' or 3' regions, with intronic organization as the least frequent class of S-AS pairs, and this is similar to reports by other groups for the human genome (52, 92, 94). One prediction from this type of overlap is that the potential regulatory consequences of these S-AS pairs relate to function, accessibility, or processing of the UTRs of viral transcripts (80). Since posttranscriptional processing of HCMV viral transcripts has been described previously (7, 14, 30-33, 55, 76, 90), it will be of interest to determine whether such events are influenced by AS transcripts. Also, in mammalian genomes, S-AS pairs appear to be overrepresented among genes involved in genomic imprinting (45), metabolic, catalytic, and cell organization functions (94), and translational regulation (24). However, our findings indicate that there is not a clear functional bias among HCMV genes for which S-AS pairs were identified.
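The distinction between 5' and 3' overlaps can be expressed in terms of genomic coordinates. The sketch below is a simplified illustration, not the classification scheme used in this study: the type and function names are assumptions, the coordinates are made up, and because exon structure is not modeled, a transcript lying entirely within its partner is labeled "embedded" rather than "intronic".

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    start: int    # leftmost genomic coordinate
    end: int      # rightmost genomic coordinate
    strand: str   # "+" or "-"


def classify_pair(a: Transcript, b: Transcript) -> str:
    """Classify how two transcripts on opposite strands overlap.

    A plus-strand transcript has its 5' end at `start`; a minus-strand
    transcript has its 5' end at `end`.
    """
    if a.strand == b.strand:
        return "same strand"
    if a.end < b.start or b.end < a.start:
        return "no overlap"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    plus_contains_minus = plus.start <= minus.start and minus.end <= plus.end
    minus_contains_plus = minus.start <= plus.start and plus.end <= minus.end
    if plus_contains_minus or minus_contains_plus:
        return "embedded"
    if minus.start < plus.start:
        # The shared region holds the 5' ends of both transcripts.
        return "5' overlap"
    # Otherwise the shared region holds the 3' ends of both transcripts.
    return "3' overlap"


# Hypothetical coordinates for illustration only.
sense = Transcript(100, 500, "+")
print(classify_pair(sense, Transcript(50, 200, "-")))
print(classify_pair(sense, Transcript(400, 800, "-")))
print(classify_pair(sense, Transcript(200, 300, "-")))
```

In this scheme, pairs sharing their 5' or 3' ends are the ones most likely to involve UTR sequence, which is why such overlaps are discussed in terms of UTR function, accessibility, or processing.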
A key question raised by our findings is whether the virally derived S-AS pairs have regulatory consequences during lytic or latent viral infections. AS transcripts can impact viral gene expression through multiple mechanisms, such as influencing splicing, editing, stability, localization, and translation of transcripts (51, 60). In addition, double-stranded intermediates generated from direct interaction of S-AS pairs may lead to gene regulation by RNA silencing or chromatin remodeling (51, 60). S-AS pairs involved in regulatory functions are predicted to be expressed concordantly and to accumulate in an inverse manner (23). Indeed, coexpressed and inversely expressed S-AS pairs not only are more frequent in the human genome than would be expected by chance but are also evolutionarily conserved (80). Our results predict that the majority of HCMV S-AS pairs are expressed with just such a profile, and we provide experimental evidence for concordant expression and inverse accumulation for S-AS pairs derived from the UL36, UL102, UL61/UL62, and RL4/5 gene regions.
Another possibility is that AS transcripts may serve as primary transcripts for microRNAs (miRNA), as was recently suggested for the AS latency-associated transcripts of herpes simplex virus and Marek's disease virus (12, 38). Several studies have identified putative or validated miRNAs of HCMV (28, 36, 64). Interestingly, a number of miRNAs that have been predicted or validated are derived from the AS strand of viral genes, including the UL31, UL53, UL70, UL102, UL114, UL150, and US29 genes (36, 64). In support of this possibility, we identified AS transcripts from several of these genes, including UL70, UL102, and US29. While further studies are required to establish regulatory roles for these AS transcripts, we predict that these AS transcripts may dramatically alter our understanding of viral gene regulation during both lytic and latent infections.
To summarize, the remarkable accumulation of noncoding and AS transcripts during infection suggests that currently available genomic maps based on ORF and other in silico analyses may drastically underestimate the true complexity of viral gene products. These findings also raise the possibility that aspects of both the HCMV life cycle and genome organization are influenced by AS transcription. These noncoding and AS transcripts may offer new insights into HCMV pathogenesis and may serve as novel targets for developing intervention strategies and treatments for HCMV-related diseases.
This work was supported by grant AI51411-03 from the National Institutes of Health.
Published ahead of print on 8 August 2007.
Supplemental material for this article may be found at http://jvi.asm.org/.
receptor homologs. J. Virol. 76:8596-8608.