Journal of Virology, June 2004, p. 6666-6675, Vol. 78, No. 12
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.12.6666-6675.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Luis Rubio,2 Ashleigh B. Smythe,3 and Bryce W. Falk1*
Department of Plant Pathology,1 Department of Nematology, University of California, Davis, Davis, California 95616,3 Instituto Valenciano de Investigaciones Agrarias (IVIA), 46113 Moncada, Valencia, Spain2
Received 5 November 2003/ Accepted 18 February 2004
Cucumber mosaic virus (CMV) is a tripartite, positive-sense plant RNA virus (Fig. 1). CMV occurs naturally worldwide and has perhaps the widest host range among all plant viruses, including some monocotyledonous and a great number of dicotyledonous plant hosts (26). RNA1 encodes the 1a protein, which together with the RNA2-encoded 2a protein forms the viral component of the replicase complex (19). RNA2 also encodes a second protein, 2b. The 2b coding region overlaps the coding region for the C-terminal portion of the 2a protein but is in a different reading frame register. The CMV 2b protein functions in host-specific long-distance movement (12, 13) and as a virulence determinant by suppressing posttranscriptional gene silencing (3). RNA3 encodes two proteins. The 3a protein is a cell-to-cell movement protein (MP), and the 3b protein is the capsid protein (CP), which is also involved in cell-to-cell movement and aphid-mediated CMV transmission from plant to plant (5, 27, 28).
FIG. 1. Genome organization of CMV. Numbers of nucleotides (nt) are given for the Fny strain. Open boxes, open reading frames. Solid boxes correspond to the genomic regions analyzed here.
In a previous study, we analyzed the biological and molecular variation of California CMV isolates by single-strand conformation polymorphism (SSCP) and by sequence and phylogenetic analyses of the CP gene (22). Here we extend the analyses to other genomic regions (1a, 2a, 2b, MP, and the 3' nontranslated region [3' NTR]) and analyze the genetic structure, diversity, and various population genetics parameters for a California CMV population.
contains 63 isolates collected from cucurbit plants in two growing seasons at the Kearney Agricultural Center, Parlier, Calif., in 1999, and group β contains 18 isolates collected from various hosts and different California locations during 1985 to 1994.
RT-PCR amplification.
Total RNA extraction, reverse transcriptase PCR (RT-PCR), and PCR were carried out as described previously (22). Primers used for amplifying different genomic regions of CMV, designed according to CMV strain Fny (GenBank accession numbers for RNA1, RNA2, and RNA3 are D00356, D00355, and D10538, respectively), are listed in Table 1.
TABLE 1. Primers designed for RT-PCR amplification of different genomic regions of CMV
Recombinant colonies were screened by PCR using the same conditions described above. Nucleotide sequences were determined in both directions by means of a model 377 ABI PRISM DNA sequencer (Perkin-Elmer, Fremont, Calif.) in the Automated DNA Sequencing Facility of the University of California, Davis. Three colonies for each isolate were used for sequence analysis, and the consensus sequences were used for phylogenetic analysis. Sequences of oligonucleotide primers were excluded for nucleotide sequence comparisons.
SSCP analysis.
RT-PCR products or bacterial-colony PCR products were used for SSCP analysis. For the MP gene, PCR products were first digested by KpnI in order to obtain smaller fragments and greater accuracy in SSCP. The DNAs were denatured and subjected to electrophoresis by the methods described previously (22). All SSCP analyses were repeated at least twice, and only samples within the same gel were compared.
Sequence analysis.
Multiple nucleotide sequence alignments were performed by using CLUSTAL W (40). Alignments were manually adjusted by using MacClade, version 4.0 (24). All the sequence alignments used in this paper are available upon request. Genetic diversity (the average number of nucleotide substitutions per site between two sequences) within and between populations and the degree of population subdivision were calculated by following the method of Lynch and Crease (23). Population genetics parameters with respect to the total number of mutations, the statistic θ from the number of segregating sites (S), the number of nonsynonymous mutations, the average number of nucleotide differences between two random sequences in a population (π), the average number of synonymous and nonsynonymous nucleotide substitutions, and synonymous codon usage bias were calculated by the DnaSP program, version 3.99 (35). The distribution of synonymous and nonsynonymous substitutions along the coding regions was analyzed by using the SNAP program (available at http://hiv-web.lanl.gov). Synonymous codon usage bias was measured by quantifying the "effective" number of codons (ENC) (41) that are used in a gene. For the nuclear universal genetic code, the value of ENC ranges from 20 (if only one codon is used for each amino acid, i.e., the codon bias is maximum) to 61 (if all synonymous codons for each amino acid are equally used, i.e., no codon bias).
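As an illustration of the two diversity statistics named in this paragraph, the following minimal sketch computes the average pairwise nucleotide difference per site and the per-site estimate derived from the number of segregating sites (S). It assumes a gap-free alignment and uses Watterson's estimator for the S-based statistic; the function names and the toy alignment are hypothetical and are not taken from DnaSP or from the data of this study.

```python
from itertools import combinations

def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def nucleotide_diversity(seqs):
    """Mean pairwise differences between sequences, divided by alignment length."""
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(count_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff / len(seqs[0])

def segregating_sites(seqs):
    """S: number of alignment columns that contain more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def watterson_theta(seqs):
    """Per-site estimate from S: S / a1 / L, where a1 = sum of 1/i for i = 1..n-1."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])

# Hypothetical toy alignment (assumption: no gaps, equal lengths).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi = nucleotide_diversity(toy)
theta = watterson_theta(toy)
```

Dedicated programs additionally handle gaps, missing data, and sampling variances, which this sketch leaves out.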
Phylogenetic relationships were inferred by using the PAUP* 4.0b10.0 program with the maximum parsimony optimality criterion (38). For all analyses, a limit of 100,000 trees was imposed. To avoid reaching the tree limit on the first replicate, a maximum of 100 trees were saved per replicate ("nchuck" option in effect). Gaps were treated as a fifth character state. Heuristic searches were performed by using 1,000 replicates of random taxon addition and tree bisection-reconnection branch swapping. Bootstrap analyses were performed by using 1,000 replicates, each with 10 replicates of random taxon addition heuristic search. All branches with bootstrap values of <70% were collapsed. Two other members of the genus Cucumovirus, Peanut stunt virus (PSV) and Tomato aspermy virus (TAV), were included as outgroups. The GenBank accession numbers of 13 CMV isolates with full-length nucleotide sequences and of PSV (strain ER), used as reference isolates, have been given by Roossinck (33). The GenBank accession numbers of TAV (strain V) are D10044 (RNA1), L79972 (RNA2), and AJ277268 (RNA3).
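The bootstrap step described above can be sketched generically: alignment columns are resampled with replacement to build each replicate, a tree is inferred from every replicate, and the support for a clade is the fraction of replicate trees that contain it. The sketch below is not PAUP* code; the tree search itself is omitted, and the function names, the toy alignment, and the representation of a tree as a set of clades are assumptions made for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def clade_support(replicate_trees, clade):
    """Fraction of replicate trees (each a set of frozensets) containing a clade."""
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return hits / len(replicate_trees)

# Hypothetical toy alignment with taxon names A to D.
toy_alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACCTACGA", "D": "TCCTACGA"}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))

# Hypothetical clade sets standing in for trees inferred from four replicates.
toy_replicates = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "C"})},
]
support = clade_support(toy_replicates, {"A", "B"})
keep_branch = support >= 0.70  # branches below 70% are collapsed in the text above
```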
(Fig. 2A). For the MP gene, we first used KpnI to cleave the PCR product into three smaller DNA fragments of 321, 502, and 18 bp (only the SSCP bands of the 321- and 502-bp fragments are shown on the gels). KpnI failed to digest PCR products for 23 isolates; therefore, AccI was used for these 23 isolates to yield two fragments of 331 and 510 bp. However, PCR products for three isolates were not digested by AccI, and therefore these were directly used for SSCP analysis. Considering the SSCP analysis and restriction endonuclease digestion results, 21 haplotypes were observed for the MP gene. Haplotype C was predominant, corresponding to 35 isolates, of which 34 were from group α and 1 was from group β. Haplotype P, corresponding to 13 group α isolates, was the second most frequent, and haplotype E, found for 9 isolates (5 from group α and 4 from group β), was the third most frequent (Fig. 2B). For the RNA3 3' NTR, 16 haplotypes were found. Haplotypes A and B were the two predominant haplotypes, corresponding to 35 (33 from group α and 2 from group β) and 26 (20 from group α and 6 from group β) isolates, respectively (Fig. 2C). Collectively, these results showed that the California CMV population comprised one to three predominant haplotypes, depending on the genomic region analyzed, and a number of minor haplotypes with low frequency.
FIG. 2. Genetic structure of the California CMV population based on SSCP analyses of different genomic regions. (A) The CMV 2b gene; (B) the CMV MP gene; (C) the RNA3 3' NTR. Each lane contains a distinct SSCP pattern, which is considered a distinct haplotype, designated by the letter above the lane. Below each lane, the number of samples with that SSCP pattern is given (upper numbers), as well as the haplotype frequencies for group α and β isolates, respectively (lower numbers) (e.g., 0 + 1 means that no isolates in group α and 1 isolate in group β have pattern A). For the MP gene, those isolates without a KpnI site were digested by AccI; however, three isolates were digested neither by KpnI nor by AccI (lanes T and U).
(collected from cucurbit plants, same location and year) than for group β (collected from various locations, host plants, and years) (P = 0.01) (Table 2). Curiously, the 2b gene had the highest genetic diversity for group β isolates but the lowest for group α isolates (Table 2). For the California CMV population, genetic diversity ranged from 0.01323 ± 0.00275 to 0.02186 ± 0.00607 according to the genomic region analyzed; this gave a mean genetic diversity of 0.01648 ± 0.00366 (Table 2).
TABLE 2. Genetic diversity within the California CMV population
(CK31, CK35, CK13, CK5, and CK27) formed a clade supported by a bootstrap value of 87%. The group β isolates were much more dispersed and fell into distinct clades (e.g., 113B, 190A, MD284, and SJ91B) (Fig. 3A). For the MP gene, four group α isolates (CK7, CK5, CK58, and CK27) formed a small clade with a supporting value of 92%, which fell within a large clade composed of most isolates. The remaining group α isolates were unresolved, forming polytomies (Fig. 3B). However, as for the 2b phylogeny, group β isolates appeared to be more diverse than group α isolates. The same trend was also observed in the phylogenetic tree based on the 3' NTR (data not shown). These results showed that group α isolates were closely related.
FIG. 3. Bootstrap majority rule (70%) consensus trees of the CMV 2b (A) and MP (B) genes obtained by using California CMV isolates corresponding to the respective haplotypes of the regions analyzed (see Fig. 1) and reconstructed by maximum parsimony heuristic searches. Bootstrap values are given above the branches. The GenBank accession numbers of the reference CMV isolates used here are as follows: Fny, D10538; IA, AB042294; Ixora, U20219; Leg, D16405; Ls, AF127976; Mf, AJ276481; NT9, D28780; Q, M21464; S, AF063610; SD, AB008777; Tfn, Y16926; Trk 7, L15336; Y, D12499. Outgroups are ER-PSV (U15730) and V-TAV (L79972). Standard subgroup II CMV isolates are Q, Ls, S, and Trk7. Standard subgroup IB isolates are NT9, Tfn, IA, Ixora, and SD. Standard subgroup IA isolates are Fny, Mf, Y, and Leg.
and 12 from group β) were chosen as representative of the California population based on the nucleotide distances of isolates in the population. In addition, 13 CMV isolates whose complete genome sequences were in GenBank were used for reference. The bootstrap maximum-parsimony trees for these genomic regions are shown (Fig. 4). Strikingly, isolate Ca was assigned to subgroup IA based on phylogenetic analysis of the MP and CP regions and the 3' NTR of RNA3 (Fig. 4A, B, and C) but to subgroup IB based on the 1a, 2a, and 2b genes (Fig. 4D, E, and F). In addition, trees for all three regions of RNA3 showed similar topologies, including the placement of isolate Ca, and trees for the two RNA2 genomic regions (2a and 2b) also showed similar topologies. Based on these congruent topologies, it seems unlikely that recombination events led to the origin of isolate Ca; rather, these topologies suggest that genome segment reassortment could have played a role. However, because only partial sequences of the 1a and 2a genes were used, phylogenetic analyses were performed in parallel by using complete 1a and 2a sequences for 13 CMV isolates (from GenBank) and also using partial sequences (corresponding to those obtained for our CMV isolates) for the same 13 isolates. These analyses were performed to assess the validity of our data; the results showed that, for the 2a gene, the two data sets gave essentially identical tree topologies, suggesting that the partial 2a sequence can be used for reconstruction of the evolutionary history of the 2a gene. In contrast, the topologies for the 1a gene in the two data sets were slightly different. Based on the complete sequence of 1a, isolate Leg was placed into a clade together with isolates Fny and Y, but Leg was unresolved with respect to isolates Fny and Y when the partial sequences were used (data not shown and Fig. 4D).
In spite of this, our phylogenetic analyses still suggested that isolate Ca was a natural reassortant resulting from genetic exchange between subgroup IA and IB isolates. However, no potential ancestors of this isolate were identified among the 27 isolates analyzed here.
FIG. 4. Bootstrap majority rule (70%) consensus trees of six genomic regions of CMV reconstructed by maximum-parsimony heuristic searches. Bootstrap values are given above branches. (A) MP; (B) CP; (C) 3' NTR of RNA3; (D) 1a; (E) 2a; (F) 2b. See the legend to Fig. 3 for isolate designations.
Close examination of the topologies for subgroup IA isolates also suggested evidence for reassortment between some subgroup IA isolates. Examination of the trees for the RNA2 coding regions showed that isolates V27, CK31, and CK33 appeared to be closely related (Fig. 4E and F). However, for RNA3, V27 was placed apart from CK31 and CK33 and closer to C94M3 (Fig. 4A, B, and C). CK31 and CK33 were placed with several isolates, including C94T1. To help clarify their relationships, pairwise nucleotide distances among these isolates and a subgroup IB isolate (NT9) were calculated by using sequence data for the 2a, 2b, MP, and CP coding regions (Table 3). Assuming that C94T1-like and V27-like isolates were the parental strains, these data suggested that RNA3 of isolates CK31 and CK33 came from a C94T1-like isolate but RNA2 (the 2a and 2b coding regions) came from a V27-like isolate. For isolate SJ91B, these analyses suggested that RNA3 (based on the CP and MP regions) also originated from a C94T1-like isolate, but the origin of RNA2 (based on the 2a and 2b sequences) was not clear. RNA3 of isolate C94M3 could have originated from a V27-like isolate, but the parental isolate for RNA2 (based on the 2a and 2b sequences) was also unknown.
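The reasoning in this paragraph, in which each genome segment of an isolate is matched to the candidate parent with the smallest pairwise nucleotide distance, can be sketched as follows. This is a minimal illustration that assumes an uncorrected p-distance; the source does not state which distance measure underlies Table 3. The sequences and the names `parent_X`, `parent_Y`, and `query` are hypothetical and do not correspond to any isolate in this study.

```python
def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in compared if x != y) / len(compared)

def closest_parent(query, parents):
    """Name of the candidate parent with the smallest p-distance to the query."""
    return min(parents, key=lambda name: p_distance(query, parents[name]))

# Hypothetical toy data: one short region per genome segment.
rna2 = {"parent_X": "ACGTACGTAC", "parent_Y": "ACGAACCTAT", "query": "ACGTACGTAT"}
rna3 = {"parent_X": "GGCTTAACGG", "parent_Y": "GGATTAACCG", "query": "GGATTAACCG"}

origin = {}
for segment, seqs in (("RNA2", rna2), ("RNA3", rna3)):
    parents = {name: seq for name, seq in seqs.items() if name != "query"}
    origin[segment] = closest_parent(seqs["query"], parents)

# Segments that trace to different parents are the pattern expected of a reassortant.
is_candidate_reassortant = len(set(origin.values())) > 1
```

In the toy data the query matches `parent_X` for RNA2 and `parent_Y` for RNA3, the same kind of segment-by-segment disagreement that the text interprets as reassortment.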
TABLE 3. Pairwise nucleotide distances among six California CMV isolates and a reference subgroup IB isolate for the 2a, 2b, MP, and CP genes
π, the average number of nucleotide differences between two random sequences in a population (also called genetic diversity), and θ (S), the statistic of the number of segregating sites, were used here as two indicators to estimate genetic variation. For both estimations, the order of genetic variation, from greatest to least, was as follows: 2b, 2a, 1a, MP, and CP (Table 4). Thus, the 2b gene showed the most segregating sites and the highest frequency of mutations among the different genomic regions analyzed here. In terms of the number of nonsynonymous mutations, less than one-third of the mutations for 1a, MP, and CP were nonsynonymous, but the percentage was much higher (50% or higher) for the 2a and 2b regions, suggesting that the 2a and 2b genes were more flexible with regard to amino acid changes. To further understand the evolutionary constraints imposed on different coding regions, the ratio of the average number of nonsynonymous substitutions per nonsynonymous site (Ka) to the average number of synonymous substitutions per synonymous site (Ks) was estimated. The Ka/Ks ratios for all coding regions analyzed here were less than 1.0, indicating that they were all subjected to negative selection (Table 4). The 1a, MP, and CP genes showed low Ka/Ks ratios, suggesting high selective pressure, whereas the Ka/Ks ratios for the 2a and 2b regions were 5 to 36 times higher (Table 4), suggesting that the proteins encoded by these two genes were considerably more tolerant of amino acid changes.
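To make the Ka/Ks ratio concrete, the sketch below estimates it for a single pair of aligned coding sequences. It is a simplified count in the style of Nei and Gojobori and is an assumption, not the procedure reported in the text: it only accepts codon pairs that differ at one nucleotide, it treats changes to stop codons as nonsynonymous, and it applies a Jukes-Cantor correction. The sequences are hypothetical.

```python
import math

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def synonymous_sites(codon):
    """Synonymous sites in a codon: synonymous single-base mutants divided by 3."""
    same = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                same += CODE[mutant] == CODE[codon]
    return same / 3.0

def proportions(seq_a, seq_b):
    """Return (pS, pN): observed differences per synonymous and nonsynonymous site."""
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        syn_sites += (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        n_diff = sum(1 for x, y in zip(ca, cb) if x != y)
        if n_diff > 1:
            raise ValueError("sketch covers single-nucleotide codon differences only")
        if n_diff == 1:
            if CODE[ca] == CODE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = len(seq_a) - syn_sites
    return syn_diffs / syn_sites, nonsyn_diffs / nonsyn_sites

def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits."""
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical pair: GCT->GCC is synonymous (Ala), GGT->GAT is not (Gly->Asp).
seq_a = "ATGGCTAAAGGT"
seq_b = "ATGGCCAAAGAT"
p_s, p_n = proportions(seq_a, seq_b)
ks, ka = jukes_cantor(p_s), jukes_cantor(p_n)
ratio = ka / ks  # below 1 here, the pattern the text reads as negative selection
```

A population-level estimate, as in Table 4, averages such values over all sequence pairs and also needs a rule for codons that differ at two or three positions, which this sketch deliberately omits.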
TABLE 4. Population genetics parameters of different coding regions of CMV
FIG. 5. Cumulative incidences of synonymous and nonsynonymous substitutions in coding regions 1a (A), 2a (B), 2b (C), MP (D), and CP (E). The x axis represents the position of the codon, and the y axis represents the average cumulative number of synonymous or nonsynonymous mutations estimated at a specific codon position. Red, green, and black curves, synonymous, nonsynonymous, and indel mutations, respectively. See Fig. 1 for diagrams of the genomic regions analyzed.
isolates showed very low diversity (mean, 0.00606 ± 0.00159), irrespective of the genomic region used for analysis (Table 2). This result is not very surprising for the group β isolates, because they were collected from different host plants and locations and in different years. Phylogenetic analyses using nucleotide sequences of CMV isolates of the different haplotypes revealed that group α isolates were closely related (Fig. 3). Together with the low diversity among group α isolates, these data suggest that group α isolates were most likely derived via a founder event from a common ancestor and that they only recently colonized and spread within the area. However, we cannot rule out the possibility that the low diversity and close evolutionary relationship among isolates in group α could also be due, at least in part, to selection by the host plant, since all the group α isolates were collected from cucurbit plants (although of different cultivars [22]). Phylogenetic analyses of different CMV genomic regions revealed natural reassortment between subgroup IA and IB isolates and potential reassortment between subgroup IA isolates but yielded no evidence for recombination (Fig. 4). Our results are seemingly in agreement with the hypothesis that the purpose of a multipartite viral genome is to favor genetic exchange through reassortment (7, 30). However, conflicting data arguing against this hypothesis have been provided in a report for a Spanish CMV population in which both recombination and reassortment were infrequent, and reassortment was not more frequent than recombination (16). Thus, the absence of detectable recombinants in our study may also be explained by two alternative scenarios: either (i) recombination events occurred between two closely related isolates and are difficult to detect by the methods we used here or (ii) recombination events did occur, but the resulting recombinants were not favored and subsequently were selected out. Indeed, recombination events in the CMV genome and CMV satellite RNA have been described recently in both experimental systems and natural populations (1, 2, 4, 10, 16, 18, 37).
Estimation of various population genetics parameters showed that coding regions on RNA2 (2a and 2b) were more variable, suggesting that the resultant proteins were more tolerant to amino acid changes than were regions on RNA1 (1a) and RNA3 (MP and CP) (Table 4). Among all the coding regions analyzed here, the 2b gene appeared to be the most flexible. Furthermore, a short region in the 2b gene, corresponding to nucleotide positions 2659 to 2697 of RNA2, might be subjected to positive selection. It is noteworthy that the extent of selection pressure imposed on different coding regions seems to be correlated with the functions of the proteins they encode and/or their interactions with the host. This correlation, inferred from phylogenetic analyses of 15 CMV isolates with full-length sequences in GenBank, has been proposed by Roossinck (33). It is of interest to consider the 2b protein and its role(s) in host interactions. The 2b protein is related to long-distance movement and virulence and is a suppressor of RNA silencing (3, 13). Unlike the conserved 1a-membrane, MP-plasmodesma, and CP-RNA and -aphid vector interactions (11, 29), the 2b-host interaction has been suggested to be host specific. The fact that the 2b protein is essential for long-distance virus movement within cucumber plants but not for systemic spread in Nicotiana spp. is evidence in support of this hypothesis (13). This host-specific function of the 2b protein was believed to provide a genetic basis for the extremely wide host range of CMV (13), thereby theoretically allowing 2b to be considerably more tolerant to nucleotide and amino acid changes. Additionally, the 2b gene is present in all members of the genus Cucumovirus but in only one other genus of the family Bromoviridae, the genus Ilarvirus, and is proposed to be a novel, naturally occurring hybrid gene (13, 42). 
One might argue that since approximately 72% of the 2b gene is embedded within the 2a gene, its high variability might be affected by its overlapping nature. However, high variability was also observed in a short, nonoverlapping region of 2b (codon positions 81 to 93 [Fig. 5C]). Considering these facts together, it is reasonable to postulate that positive selection might still be apparent in the 2b gene and that this gene is still evolving. Further analysis for positive selection in the 2b gene will require the use of more-sophisticated tools and additional data sets.
Analysis of the distribution of synonymous and nonsynonymous mutations revealed that different coding regions exhibit different cumulative behaviors of mutations (Fig. 5), indicating that different parts of these coding regions are under different evolutionary constraints. This notion is consistent with the idea that different coding regions are under different selection pressures, as revealed by the estimation of Ka/Ks ratios (Table 4). The lack of synonymous mutations in some coding regions, i.e., codon positions 1 to 20 and 30 to 50 in the 2b coding region and positions 98 to 151 in the CP coding region (Fig. 5C and E) also suggests that these regions might be subjected to negative selection, possibly due to codon usage bias and/or maintenance of RNA structures (primary, secondary, and tertiary) important for RNA-RNA or RNA-protein interactions. It is also possible that the sample size in our study was insufficient for accurate analysis of these regions. The first scenario seems less likely, since an estimation of codon usage bias based on the ENC suggested that there was only slight bias in codon usage in these coding regions (Table 4). Interestingly, CP codon positions 98 to 151, corresponding to nucleotide positions 1564 to 1721 of Fny RNA3, were also found to have only a few mutations in quasispecies populations recovered from various host plants infected by a genetically identical CMV population (36).
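The ENC measure invoked in this paragraph can be made explicit. Assuming that the measure is the one formulated by Wright (presumably reference 41), it combines the mean codon homozygosity of each degeneracy class of amino acids, and its bounds of 20 and 61 follow directly. The symbols below are introduced here for illustration and do not appear in the source.

```latex
% \bar{F}_k: mean codon homozygosity over amino acids with k synonymous codons.
% The universal code has 2 amino acids with one codon (Met, Trp), 9 with two,
% 1 with three (Ile), 5 with four, and 3 with six.
\[
  N_c \;=\; 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3}
            + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}
\]
% Maximum bias (one codon per amino acid, \bar{F}_k = 1):
\[
  N_c = 2 + 9 + 1 + 5 + 3 = 20
\]
% No bias (all synonymous codons used equally, \bar{F}_k = 1/k):
\[
  N_c = 2 + 9 \cdot 2 + 1 \cdot 3 + 5 \cdot 4 + 3 \cdot 6 = 61
\]
```

Values close to 61, such as those the text describes as showing only slight bias, therefore indicate nearly uniform use of synonymous codons.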
This work was supported in part by the Biotechnology Risk Assessment Research Grants Program (award 99-33120-8293 to B.W.F.) of the U.S. Department of Agriculture and by the University of California. H.-X.L. was partly supported by the China Scholarship Council of the Ministry of Education, People's Republic of China.
Present address: Department of Biology, York University, Toronto, Ontario M3J 1P3, Canada.