Department of Veterinary PathoBiology, University of Minnesota, St. Paul, Minnesota,¹ Hematology Branch, National Heart, Lung and Blood Institute, Bethesda, Maryland²
Received 26 February 2004/ Accepted 8 June 2004
| INTRODUCTION |
|---|
Simian parvovirus (SPV) was classified as a parvovirus on the basis of molecular similarities to other parvoviruses, i.e., adeno-associated virus 2 (prototype dependovirus), the autonomous parvoviruses (the prototype being minute virus of mice), and parvovirus B19 (prototype erythrovirus). In general, members of the subfamily Parvovirinae have a small genome of about 5 kb composed of single-stranded DNA with terminal palindromic sequences that serve as primers for the synthesis of the complementary strand. Only the positive strand of these viruses encodes proteins, suggesting that the positive-sense mRNA is transcribed from the minus-strand genomic DNA (3). To maximize the coding capacity, a number of promoters and all three reading frames are used to generate multiple RNAs by splicing (3). Also, for efficient utilization of the genome, capsid proteins are encoded by overlapping in-frame DNA sequences, implying that the coding sequences share the carboxyl terminus and that the smaller capsid proteins (VP2 and VP3) are truncated versions of the large capsid protein (VP1) (3).
The genus Erythrovirus was formed on the basis of the predilection of the prototype parvovirus B19 for erythroid cells and of significant differences between B19 and other members of the subfamily Parvovirinae (3, 11). All nine B19 transcripts observed utilize the same P6 promoter (5, 12, 23), and all transcripts initiate from the 5' end of the genome. In contrast, minute virus of mice and adeno-associated virus 2 use two and three promoters, respectively (13, 18, 25). All other parvovirus transcripts studied to date terminate at the 3' end of the genome; the only known exception is B19, which uses two polyadenylation signals. B19 forms two types of small, relatively abundant RNAs that are absent in minute virus of mice and adeno-associated virus (2, 23). Also, B19 transcripts have multiple large introns not seen in either adeno-associated virus or minute virus of mice (1, 16, 19, 27).
SPV has 50% homology to B19 at the DNA level. At the amino acid level, there is 70% homology with B19 capsid proteins and 50% homology with B19 nonstructural protein (7). SPV thus became the second virus to be considered a member of the genus Erythrovirus. The goal of this study was to investigate the transcripts produced by SPV. The working hypothesis was that SPV transcription would resemble that of B19. Our findings confirmed this hypothesis but also indicated that SPV, unlike B19, has coding strategies in common with other parvoviruses.
| MATERIALS AND METHODS |
|---|
The research performed complied with all relevant federal guidelines and institutional policies on animal use (with the protocol approval by the Institutional Animal Care and Use Committee, University of Minnesota).
Reverse transcription. Total RNA was extracted from adult bone marrow and fetal liver recovered at necropsy from experimentally infected macaques (RNeasy, Qiagen Inc., Valencia, Calif.). RNA was quantified both before and after DNase treatment (DNA-free; Ambion Inc., Austin, Tex.). The absence of SPV DNA in the DNase-treated RNA was confirmed by the failure of PCR with SPV-specific primers SP3 and SP5 to amplify any product (7). This RNA was then reverse transcribed into cDNA with random decamers as primers (Retroscript; Ambion Inc.).
PCR. We amplified 5 µl of reverse transcription products (cDNAs) by PCR with Taq polymerase (Retroscript). Cycling conditions were as follows: 95°C for 3 min and 59°C for 2 min, 35 cycles of 72°C for 3.30 min, 95°C for 45 s, and 59°C for 45 s; and 72°C for 10 min. All primers used were approximately 20 nucleotides in length and were synthesized at the Advanced Genetic Analysis Center at the University of Minnesota.
The nucleotide positions referred to in this manuscript indicate positions on the positive-sense DNA strand of the SPV genome (excluding the terminal repeats) cloned previously (GenBank accession number U26342) (7). A forward primer located at nucleotide position 230 (F230) was primarily used in conjunction with four reverse (R) primers at nucleotide positions 2346, 2568, 3369, and 4854. Two other primers used were F1545 and R1598 (see Fig. 2).
PCR products (concentrations averaging 0.6 µg/µl) were used directly for ligation, or PCR products were used for ligation after separation by gel electrophoresis (1% agarose in 0.5x Tris-boric acid-EDTA buffer). Bands were visualized after 30 min of staining with ethidium bromide, followed by 10 min of destaining with distilled water on a platform shaker. Different-sized products were mechanically separated and gel purified with silica gel membrane assembly for ligation (QIAquick; Qiagen Inc., Valencia, Calif.). The concentrations of gel-extracted products averaged 20 ng/µl.
Cloning. PCR products were ligated overnight into the pCR 2.1 vector (original TA cloning; Invitrogen Corporation, Carlsbad, Calif.) with T4 DNA ligase in a total volume of 10 µl at 14°C. Ligated products were cloned in transformed Escherichia coli cells and plasmids were isolated (QIAprep). Plasmid samples were digested with the restriction enzyme EcoRI (New England BioLabs Inc., Beverly, Mass.) to select clones for sequencing.
Sequence analysis. Plasmids (1 to 2 µg) were sequenced at the Advanced Genetic Analysis Center at the University of Minnesota in a total reaction volume of 12 µl with the T7P and M13R primers (3.2 pmol). The software programs Seqman, Megalign, and Editseq (version 4.0; Lasergene; DNAStar Inc., Madison, Wis.) were used to align trace files of sequences of cloned PCR products with SPV sequence (GenBank accession no. U26342). This identified splice junctions. Putative open reading frames (ORFs) were identified with the program Mapdraw (version 4.0; DNAStar).
BLAST search. Standard nucleotide-nucleotide BLAST and standard protein-protein BLAST searches were performed on the NCBI website.
Northern blotting and autoradiography. RNA samples were electrophoresed in sets in a denaturing 1% agarose-formaldehyde gel. Each set consisted of DNase-treated sample RNA extracted from tissues of infected macaques (referred to hereafter as SPV RNA) (7 µg), negative control (in most instances RNA from tissues of noninfected macaques, and occasionally MA104 or CHO cell RNA), and the radiolabeled marker synthesized by in vitro transcription (Ribomark labeling system; Promega Corporation). An in vitro-transcribed product was also used as positive control (see below).
In vitro-transcribed positive control. One of the commonly observed partial transcripts (transcript number 6 encoding the ORF for VP2; Fig. 2), 333 bases in length, was selected. Plasmids containing the PCR products corresponding to this partial transcript (in an orientation that would give a ribonucleotide of positive sense) were used for in vitro transcription. A stock solution of such plasmids was purified from a culture of previously transformed cells maintained in 20% glycerol at −86°C (QIAprep; Qiagen Inc., Valencia, Calif.). For a single reaction, 10 µg of plasmid was linearized with BamHI (New England BioLabs Inc.) to obtain a 5' overhang. After RNase treatment with proteinase K and ethanol precipitation, the product was suspended in a total volume of 16 µl. This was suitable for a 40-µl reaction of in vitro transcription with T7 reaction components (Ribomax large scale RNA production system). Synthesized RNA was separated from the DNA template by DNase treatment followed by phenol-chloroform extraction and ethanol precipitation. Unincorporated nucleotides were removed with chromatography columns (Micro Bio-Spin; Bio-Rad Laboratories, Hercules, Calif.). The yield of RNA was typically about 500 µg (10 µg/µl).
After denaturing gel electrophoresis, RNA was transferred (downward transfer) by blotting onto a nylon filter and cross-linked with a UV cross-linker (Stratalinker; Stratagene, Cedar Creek, Tex.) set at auto-cross-link (12,000 mJ/cm² for 90 s). Northern blots were blocked with salmon sperm DNA (Stratagene) and probed with one of the following probes. One was a splice junction-specific antisense 40-mer oligonucleotide (spanning 20 bases on either side of the 10 different splices identified) tailed with ³²P-labeled dATP (Redivue; Amersham Pharmacia Biotech Inc., Piscataway, N.J.) with terminal deoxynucleotidyl transferase (New England BioLabs Inc.). Splice junction-specific oligonucleotides were obtained from the Advanced Genetic Analysis Center at the University of Minnesota or from Integrated DNA Technologies Inc., Coralville, Iowa. The other was a cloned 333-bp partial transcript radiolabeled with random primer extension with Klenow fragment of polymerase I (Prime-It II; Stratagene).
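The design of a splice junction-specific antisense probe can be illustrated with a short sketch. It assumes a positive-sense genome string and 1-based exon boundary coordinates; the function names and the toy sequence are hypothetical and are not taken from the study, which does not describe how the oligonucleotide sequences were derived.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def junction_probe(genome: str, donor_last: int, acceptor_first: int, flank: int = 20) -> str:
    """Build an antisense probe spanning a splice junction.

    donor_last     -- 1-based position of the last exonic base before the intron
    acceptor_first -- 1-based position of the first exonic base after the intron
    flank          -- bases taken on each side of the junction (20 gives a 40-mer)
    """
    if donor_last < flank or acceptor_first + flank - 1 > len(genome):
        raise ValueError("not enough flanking sequence for the requested probe")
    if acceptor_first <= donor_last + 1:
        raise ValueError("acceptor must lie downstream of the donor, with an intron between")
    upstream = genome[donor_last - flank:donor_last]              # last `flank` exonic bases
    downstream = genome[acceptor_first - 1:acceptor_first - 1 + flank]  # first `flank` exonic bases
    # The spliced (sense) junction is upstream + downstream; the probe is its antisense.
    return reverse_complement(upstream + downstream)


# Toy example (hypothetical sequence): exon 1 = G x 30, intron = GT...AG, exon 2 = T x 30.
toy_genome = "G" * 30 + "GT" + "C" * 10 + "AG" + "T" * 30
probe = junction_probe(toy_genome, donor_last=30, acceptor_first=45)
print(len(probe), probe)
```

Because the probe is complementary only to the joined exon ends, an unspliced RNA would pair with at most half of it, which is the rationale for using such probes to discriminate individual splices.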
The radioactive probes were purified with chromatography columns (Micro Bio-Spin; Bio-Rad Laboratories, Hercules, Calif.). The specific activity of the probes was approximately 2 × 10⁸ cpm. The temperature conditions used for hybridization and washing were optimized after calculating the melting temperatures (Tm) of various oligonucleotides used as probes. Hybridization was performed at 68°C for 12 to 18 h and three 10-min washes were performed at 68°C with 6x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate)-0.1% sodium dodecyl sulfate.
Autoradiography results. The AssayZap (version 2.50; Biosoft, Ferguson, Mo. [http://www.biosoft.com/w/assayzap.htm]) and GraphPad (version 2.0C, GraphPad Software, Inc., San Diego, Calif. [http://www.graphpad.com/welcome.htm]) programs were used to determine the sizes of the bands observed after autoradiography. The results were checked on both the PhosphorImager (Molecular Dynamics, Sunnyvale, Calif.) and imaging films (Eastman Kodak Company, Rochester, N.Y.).
Nucleotide sequence accession number. The GenBank accession number for the SPV sequence used to analyze the sequences of the observed transcripts is U26342.
| RESULTS |
|---|
cDNA sequence analysis of tissue-derived RNA transcripts. Thirteen spliced, partial transcripts (Fig. 2) were found by nucleotide sequence analysis of seven cloned PCR products (Fig. 1, as described above). The exact nucleotide positions where these transcripts begin and end were not mapped, and the numbers refer to the nucleotide positions of the primers used to detect them (positions correspond to the previously cloned SPV genome, GenBank accession number U26342).
Thirteen different transcripts were observed because bands seen with ethidium bromide staining on agarose gel (Fig. 1) represented one or more transcripts, which had a narrow range of molecular sizes. For example, band 3 (Fig. 1, lanes 8 and 9), observed with primers F230 and R2568, represented transcripts numbered 2, 3, 4, and 5 in Fig. 2. Similarly, band 5 (Fig. 1, lanes 10 and 11), observed with primers F230 and R3369, represented transcript number 6 in Fig. 2, while band 6 (lanes 10 and 11) observed with the same primer set represented transcripts numbered 7 and 8 in Fig. 2. The transcripts detected with primers F230 and R4854 (numbers 9 to 14, Fig. 2) can similarly be correlated to bands 8, 9, and 10 in Fig. 1. Band 8 represented transcripts 9 and 11, band 9 represented transcripts 10 and 12, and band 10 represented transcripts 13 and 14, respectively. Transcripts identified with primers F230 and R2346 (Fig. 1, lanes 6 and 7, band number 1) were shorter versions (not shown in Fig. 2) of transcripts observed with F230 and R2568 (described above).
The large number of transcripts (13 from seven bands) was explained by the identification of 10 alternatively used splices (Table 1), indicated by letters a to j in Fig. 2. Several clones of similar sized inserts were sequenced to ensure that all possible species of mRNAs in these were included. It is surmised that the more abundant transcripts/splices were detected more often (Table 1).
Splice sites. Seven out of 10 splice sites (Fig. 3A) had typical GT-AG donor/acceptor junctions (6, 26). Nonconventional splice donor sites (AC and GA) observed in the case of SPV were at positions 3211, 3227, and 3263, and their common, nonconventional, splice acceptor site (GT) was at position 4638 (Fig. 3B). Peculiarly, these nonconventional donor sites had GT sequences preceding them by two positions (at nucleotide positions 3209, 3225, and 3261) and the nonconventional acceptor site had AG sequences preceding it by two positions (at nucleotide 4636). With evidence of nonconventional splice donors in minute virus of mice and B19 (10, 14), the three nonconventional splice donors and one nonconventional acceptor in SPV are not unique.
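The classification of a splice as conventional (GT-AG) or nonconventional can be expressed as a simple check on the first and last two bases of the excised intron. The sketch below is illustrative only: the function names, the coordinate convention (1-based, inclusive intron boundaries on the positive strand), and the toy sequence are assumptions and do not reproduce the alignment-based analysis performed in the study.

```python
def intron_dinucleotides(genome: str, intron_start: int, intron_end: int) -> tuple:
    """Return (donor, acceptor) dinucleotides of an intron.

    intron_start, intron_end -- 1-based, inclusive positions of the first and
    last intronic bases on the positive-sense strand.
    """
    if intron_start < 1 or intron_end > len(genome) or intron_start + 3 > intron_end:
        raise ValueError("intron must lie inside the sequence and span at least 4 bases")
    donor = genome[intron_start - 1:intron_start + 1].upper()   # first two intronic bases
    acceptor = genome[intron_end - 2:intron_end].upper()        # last two intronic bases
    return donor, acceptor


def is_conventional(genome: str, intron_start: int, intron_end: int) -> bool:
    """True when the intron has the typical GT donor and AG acceptor."""
    return intron_dinucleotides(genome, intron_start, intron_end) == ("GT", "AG")


# Toy sequence (hypothetical): AAA | GTCCCCAG | TTT, intron at positions 4 to 11.
toy = "AAAGTCCCCAGTTT"
print(intron_dinucleotides(toy, 4, 11), is_conventional(toy, 4, 11))
print(intron_dinucleotides(toy, 6, 11), is_conventional(toy, 6, 11))
```

When a junction is called from an alignment of cDNA against the genome, the boundary can sometimes be placed at more than one position if the flanking bases are identical, so a check like this is best applied to each candidate placement.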
Promoter site. Six consensus RNA transcript promoter sites (TATAA), at nucleotide positions 195, 1140, 1387, 3487, 4189, and 4968 (Fig. 2), are present in the SPV sequence (7). The last site (at nucleotide position 4968) should be nonfunctional unless the virus utilizes a circular replicative form. All transcripts detected in our study begin at the 5' end of the genome, because the predominant forward primer used was F230. This indicates the use of the promoter at position 195. It is supported by the observation of a highly GC-rich sequence, two SP1 sites upstream of the first TATA box at position 195 (7), promoter activity in the analogous region in B19 (4), and in vitro confirmation with reporter assays (S. W. Green and K. E. Brown, unpublished data). Whether SPV utilizes downstream promoters requires additional study.
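A search for consensus motifs such as TATAA can be sketched as a plain string scan that reports 1-based start positions. The function name `find_motif` and the toy sequences are hypothetical; this is a minimal illustration of the idea, not the software used to annotate the SPV sequence.

```python
def find_motif(sequence: str, motif: str) -> list:
    """Return the 1-based start position of every match, including overlapping ones."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)                # convert 0-based index to 1-based position
        start = sequence.find(motif, start + 1)    # advance by one base to allow overlaps
    return positions


# Toy sequences (hypothetical, not SPV sequence).
print(find_motif("GGTATAACCTATAAG", "TATAA"))
print(find_motif("AATAAATAAA", "AATAAA"))
```

A motif hit only marks a candidate site; as the text notes, functional use of a promoter has to be supported by other evidence such as reporter assays.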
Polyadenylation site. Five sites with the sequence AATAAA were identified in the SPV sequence beginning at nucleotide positions 2449, 2618, 2959, 4948, and 4958 (7). Considering these positions, all spliced partial transcripts seen with primer sets F230 and R3369 and F230 and R4854 (Fig. 2, transcripts numbered 6 to 14) would end at the 3' end of the genome, utilizing either of the two polyadenylation sites starting at nucleotide 4948 or 4958. Other spliced partial transcripts seen with F230 and R2568 (Fig. 2, transcripts numbered 2 to 5) could use any of the polyadenylation sites after nucleotide position 2568 (nucleotide position 2618, 2959, 4948, or 4958). The unspliced transcript coding for NS1 may end in the middle of the genome, with one of the two polyadenylation signals after nucleotide 2449 (nucleotide 2618 or 2959). The polyadenylation site utilized by these transcripts cannot be accurately deduced from the results obtained to date.
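The reasoning above, that a partial transcript can only use a polyadenylation signal lying downstream of the reverse primer that detected it, can be written as a short filter. The signal positions are those reported in the text; the function name is hypothetical, and treating the primer as a single position is a simplifying assumption.

```python
# AATAAA start positions reported for the SPV sequence (GenBank U26342).
POLYA_SITES = (2449, 2618, 2959, 4948, 4958)


def candidate_polya_sites(reverse_primer_pos: int, sites=POLYA_SITES) -> list:
    """Return polyadenylation signals located downstream of a reverse primer."""
    return [site for site in sorted(sites) if site > reverse_primer_pos]


for primer in (2568, 3369, 4854):
    print(primer, candidate_polya_sites(primer))
```

This reproduces the candidate sets listed in the text: four possible signals for transcripts detected with R2568, and only the two signals at the 3' end for those detected with R3369 or R4854.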
Potential classes of full-length transcripts. To predict the sizes of bands that would be observed with Northern blotting and autoradiography, the full lengths of the spliced transcripts were estimated (Table 2). It was assumed that partial transcripts would terminate utilizing the polyadenylation signals at the 3' end of the genome (discussed above) to derive the maximum coding potential of SPV. This step was taken with the understanding that potential variations may be realized when SPV transcripts were investigated with Northern blot analysis, as completed below. The transcripts observed with primer R4854 span almost the entire genome (Fig. 2, transcripts numbered 9 to 14), and hence there would be little variation in the actual size of these full-length transcripts, when compared to the determined estimates. However, as mentioned before and discussed later, variations are possible in transcripts observed with primers R2568 and R3369 (Fig. 2, transcripts numbered 2 to 8).
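The kind of estimate described here amounts to subtracting the excised introns from the distance between the assumed transcript start and end. The following is a minimal sketch under stated assumptions: the function name is hypothetical, coordinates are 1-based and inclusive, the example coordinates are invented for illustration, and the poly(A) tail is not counted.

```python
def spliced_length(start: int, end: int, introns: list) -> int:
    """Length of a transcript running from start to end after removing introns.

    start, end -- 1-based, inclusive transcript boundaries on the genome
    introns    -- list of (intron_start, intron_end) pairs, 1-based and inclusive
    """
    total = end - start + 1
    previous_end = start - 1
    for intron_start, intron_end in sorted(introns):
        if intron_start <= previous_end or intron_end > end or intron_end < intron_start:
            raise ValueError("introns must be non-overlapping and inside the transcript")
        total -= intron_end - intron_start + 1
        previous_end = intron_end
    return total


# Invented coordinates for illustration only (not SPV splice positions).
print(spliced_length(1, 100, [(11, 20), (51, 60)]))
print(spliced_length(200, 4970, [(334, 2000)]))
```

Changing the assumed end point, for example to a signal in the middle of the genome, shortens the estimate accordingly, which is why the choice of polyadenylation site matters for the predicted band sizes.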
Similarly, splice site b is shared in the partial transcripts numbered 3, 7 and 14 (Fig. 2), the predicted full-length sizes of which approximately equal 1, 2.4, and 3.4 kb. When SPV RNA was probed with a radiolabeled, antisense, 40-mer probe spanning 20 bases on either side of the splice site b, bands of sizes 1, 2.4 and 3.4 kb were detected (Fig. 4, blot B, lane 2). The predictions and observations for all probes are summarized in Table 3.
The predicted number and sizes of bands were observed with five of the 10 splice junction-specific probes (correlating to splice sites a, b, e, f, and i, Table 3). Our observations did not correlate with predictions for probes corresponding to the remaining five splice sites. These are discussed in some detail below.
| DISCUSSION |
|---|
|
|
|---|
The transcription maps for B19 and minute virus of mice were originally produced with S1 nuclease mapping (14, 23). Nuclease protection assays have a detection limit of 4,000 to 5,000 copies of mRNA. Reverse transcription-PCR is presently the most sensitive method available for mRNA detection. The procedure is somewhat tolerant of degraded RNA, i.e., as long as the RNA is intact within the region spanned by the primers, the target will be amplified. A major disadvantage of reverse transcription-PCR is the possibility of artifacts. It was used here because of its high sensitivity and the limited amount of macaque tissue available. The combination of cDNA sequencing and Northern blot analysis helped us better analyze our results.
No bands were detected with probes correlating to the splice sites d and g (Table 3 and data not shown). Splice sites d and g appear to be utilized by less abundant transcripts among those observed (numbered 5 and 10, Fig. 2, observations summarized in Table 1). It is possible that their abundance is so low that they could not be visualized with Northern blot analysis. Another possibility is that they are PCR artifacts.
The partial transcript with splice site d putatively encodes VP1. The corresponding RNA species encoding VP1 in B19 was the least abundant among those observed, and this was noted to be consistent with the relative expression of the protein (23). This may be explained by the fact that VP1 comprises only about 5% of the capsid proteins (9). Since RNA was extracted from in vivo-derived tissues from infected macaques in this study, it is possible that transcripts encoding only VP1 were of such low abundance that they could not be detected. Alternatively, the VP1 encoding transcript could be utilizing a downstream promoter.
When estimating the full lengths of spliced partial transcripts observed by Northern blot analysis, it was assumed that the transcripts observed would terminate at the 3' end of the genome. However, transcripts observed with primer R2568 (Fig. 2) might only code for small ORFs and end before coding for the capsid proteins by utilizing an earlier polyadenylation signal. These would then be similar to B19 transcripts ending in the middle of the genome (Fig. 5). This hypothesis is supported by the observation of strong bands of 0.8 or 0.9 kb with the probes spanning splice sites a and c (Fig. 4), and could explain the observation of smaller, unpredicted bands, as seen with the probe corresponding to splice site c (Table 3).
The observation of faint, unpredicted bands with probes corresponding to splice sites h and j may be a result of non-splice-specific binding of the probes. Northern blots with each probe were repeated two to three times to minimize procedural errors, but this possibility cannot be excluded.
Some features of the SPV transcripts detected are analogous to those of B19. The initial splice junctions for SPV are at nucleotides 279 and 333 (Fig. 2). These are similar to nucleotides 406 and 441 for B19, as described by Brunstein et al. (10) (variant splice sites of B19 not shown in Fig. 5). Brunstein et al. noted this variation in splicing pattern with reverse transcription-PCR. This variant has not been detected for B19 in studies with other techniques. It is therefore possible that such splice variants are low in abundance and are detected only with a technique as sensitive as reverse transcription-PCR.
As shown in Fig. 5, and in agreement with the predictions of Brown et al. (7), an unspliced SPV message putatively coding for NS (transcript number 1) is present in the 5' end of the genome. Also, depending on the use of the polyadenylation signal, partial transcripts observed with primer R2568 (numbers 2 to 5) may terminate in the middle of the genome. These would then be analogous to smaller transcripts of B19, such as one encoding the ORF for a 7.5-kDa protein (Fig. 5). Smaller putative ORFs (for 5.4- and 10.4-kDa proteins) are also encoded by transcripts numbered 13 and 14, respectively in the left half of the genome. These may represent small nonstructural proteins as seen with B19 (17). Putative ORFs for the capsid proteins (incomplete ORFs followed by dotted lines in transcripts numbered 2 to 8, Fig. 5) are in the 3' end of the genome. Both SPV and B19 have a small overlap of sequences encoding the large 5' ORF for NS (ORF ends at nucleotide position 2370 in SPV and 2448 in B19) and the large 3' ORF for capsid proteins (ORF begins at nucleotide position 2363 in SPV and at 2444 in B19) (7, 23).
Considering the above-mentioned observations, SPV and B19 appear closely related evolutionarily. However, some differences were also noted. SPV has zero to three splices per transcript, compared to zero to two reported for B19 (23) (Fig. 5). The spliced transcript putatively encoding the ORF for the truncated NS protein (transcript number 4) is more like the transcripts encoding ORFs for Rep 68/40 of adeno-associated virus 2 and NS2 of minute virus of mice (14, 16). An identical spliced transcript was observed in studies of SPV infection of human bone marrow (8), suggesting that this is not an artifact. Antibodies that bind to the carboxyl end of the SPV NS protein are currently not available, and we were unable to confirm production of this truncated NS protein by Western blotting. In the case of the capsid protein-encoding transcripts, B19 has a double-spliced message with an ORF only for VP2 (Fig. 5). The SPV transcript putatively encoding only VP2 was single-spliced (transcript number 6). Also contrary to B19, SPV did not appear to utilize a transcript encoding VP1 alone, although the reverse transcription-PCR product for such a transcript may have been too low in abundance for detection in this study.
Sequence analysis of observed transcripts with Mapdraw also revealed additional ORFs encoding unidentified proteins (possible truncated versions of capsid proteins and the nonstructural protein, and small, unknown proteins) (Fig. 2 and 5). Interestingly, SPV switches frame after splice junctions in many cases. Putative ORFs of unidentified proteins that did not end before nucleotide position 4854 for transcripts numbered 9 to 14 (Fig. 2) could utilize the stop codon at nucleotide position 4870 in the first reading frame, or the stop codon at nucleotide position 4950 in the third reading frame. Molecular sizes for proteins have been calculated accordingly (Table 2).
Predicted amino acid analysis of the deduced complete transcripts indicated that potential translational units begin at nucleotide positions 307, 1781, 1916, 2113, 2363, 3149, 3261, and 4678 (Fig. 2). Brown et al. predicted the sequences encoding major SPV proteins NS = 307 to 2370 (~2.1 kb), VP1 = 2363 to 4819 (~2.5 kb), and VP2 = 3149 to 4819 (~1.7 kb), and confirmed the expression of capsid proteins with Western blot analysis (7). Western blotting of SPV-infected fetal liver with rabbit polyclonal antibody raised against SPV VP1, SPV VP2, and a B19 NS peptide confirmed the presence of 90 kDa, 60 kDa and 77 kDa proteins, respectively (data not shown). In addition, the VP2 encoding sequence was recently also used to express the protein in a baculovirus system. Virus-like particles that agglutinated red blood cells were successfully produced (8). This again supports the previous predictions and the use of ATG as the start codon by SPV.
The Kozak consensus sequence for initiation of translation in vertebrates is (GCC) GCCRCCATGG, where R is a purine (A or G) (15). Ozawa et al. indicated a part of the sequence (purine-NNATG) when they determined the transcription map of B19 (23). Only two of the putative ORFs observed in this study on SPV have the sequence "purine-NN-ATG-G", i.e., have a second G. These two ORFs putatively encode NS1 and VP2. None of the other putative ORFs have the second G. The initial portion of the sequence as quoted by Kozak [(GCC) GCCRCC] is absent in all putative ORFs. On the other hand, 6 out of 8 putative ORFs (beginning at nucleotide positions 307, 1781, 2363, 3149, 3261, and 4678) have the sequence "purine-NN-ATG". This may be explained by a recent analysis in which it was observed that a large number of start codons used for translation initiation deviate significantly from Kozak's sequence (24). It was suggested that other means used to translate proteins, including leaky scanning, reinitiation, or internal initiation of translation may have greater roles than previously imagined.
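The two features of the initiation context compared above, a purine three bases upstream of the ATG and a G immediately after it, can be checked with a short function. This is an illustrative sketch only; the function name, the 1-based coordinate convention, and the toy sequences are assumptions and are not taken from the study.

```python
def start_context(sequence: str, atg_pos: int) -> tuple:
    """Report (purine at -3, G at +4) for a start codon.

    atg_pos -- 1-based position of the A of the ATG (the A is +1, so the base
               three positions upstream is -3 and the base after the G is +4).
    """
    seq = sequence.upper()
    i = atg_pos - 1                                   # 0-based index of the A
    if seq[i:i + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    purine_minus3 = i >= 3 and seq[i - 3] in "AG"     # "purine-NN-ATG"
    g_plus4 = i + 3 < len(seq) and seq[i + 3] == "G"  # "ATG-G"
    return purine_minus3, g_plus4


# Toy sequences (hypothetical, not SPV sequence).
print(start_context("CCACCATGGC", 6))   # purine at -3 and G at +4
print(start_context("CCTCCATGAC", 6))   # neither feature present
```

Applied to each putative start position, such a check would separate ORFs with the "purine-NN-ATG-G" context from those with only "purine-NN-ATG" or neither.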
SPV seems to utilize a diversified splicing mechanism for its transcripts. Variation in transcripts and their coding potential could be due to splicing, promoter strength and differential termination. The reverse transcription-PCR approach used in this study was successful in amplifying SPV RNA, but the larger PCR products could not be cloned/sequenced. The cloning approach favored the identification of smaller, highly spliced products. Compared to B19, which was shown to have 9 transcripts (23), 14 putative messages were observed (Fig. 5). Most of the work on transcription of the prototype parvoviruses has to date not employed this technique and hence the use of various splice sites had only been predicted. As was suggested by Morgan et al. and Jongeneel et al. based on their work on minute virus of mice, the use of alternate nearby splice donor and acceptor sites can generate large numbers of transcripts that can be detected by cDNA cloning only (14, 19). This map of splice junctions is in agreement with their prediction. Even though functionality of these junctions needs confirmation with further studies, SPV may be efficiently switching reading frames in vivo with extensive use of alternative splicing to make different proteins in infected cells and exemplifies the elegant splicing pattern that parvoviruses may use.
Although incomplete, this splice junction map provides substantial information on the splice junctions in SPV transcripts. The splice junction map suggests that the general transcription pattern for SPV is similar to that of B19, with transcripts originating from the 5' end of the genome (Fig. 5). The abundance of transcripts with ORFs for putative capsid proteins indicates permissive infection of cells from which the RNA was extracted.