Journal of Virology, June 2006, p. 5833-5840, Vol. 80, No. 12
0022-538X/06/$08.00+0 doi:10.1128/JVI.00122-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-UPV, 46022 València, Spain,1 Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland2
Received 18 January 2006/ Accepted 26 March 2006
Ilarviruses have the same genome organization, encoding products functionally similar to those of alfalfa mosaic virus (AMV) and other members of the Bromoviridae family. RNAs 1 and 2 encode the proteins involved in replication (P1 and P2, respectively). RNA 3 is bicistronic and encodes the movement protein (MP) and coat protein (CP) (35), although CP is translated from subgenomic RNA 4. Both MP and CP play important roles at different stages of the infectious cycle of ilarviruses. For example, the binding of CP to stem-loop structures flanked by AUGC sequences located in the 3' nontranslated region of the inoculum RNAs is required for initiation of infection, a phenomenon known as genome activation (5, 6, 8, 24). PNRSV and AMV are phylogenetically close (10, 11, 42). Most studies on the involvement of CP in the viral life cycle have been conducted with AMV, whereas there are relatively few experimental data on PNRSV CP structural properties and biological functions (3). For instance, AMV CP is required for plus-strand RNA accumulation, encapsidation, cell-to-cell movement, and systemic spread of the virus (49). This latter function requires the establishment of a physical interaction between CP and MP. It has also been suggested that, in AMV, MP may form tubular structures that traverse the cell wall through modified plasmodesmata, and these tubules mediate unidirectional transport of viral RNA-MP-CP complexes (41).
Tenllado and Bol (49) showed that it was possible to map the different AMV CP functions onto different domains of the protein. The N-terminal arm appears to be sufficient for genome activation without the participation of CP dimers, as demonstrated by using synthetic peptides of this protein region in in vivo experiments (4). Mutations in the N-terminal arm also affect cell-to-cell movement. The C-terminal domain, except the last 7 to 14 amino acids, is involved in dimer formation, which is required, for example, for correct virion formation. Amino acids 17 to 20 are critical for plus-strand RNA accumulation. Finally, mutations at the N and C termini affect the movement of viral materials through the vascular system (49).
The CP N-terminal domain has been shown to be involved in cell-to-cell movement and in systemic spread in members of the different genera within the family Bromoviridae (AMV [41], brome mosaic virus [BMV] [16, 38, 39, 40], and cucumber mosaic virus [CMV] [25]). Furthermore, Sánchez-Navarro and Bol (41) also showed that deletions in the N- or C-terminal arm of AMV MP did not interfere with the ability of the protein to assemble into tubules and allow cell-to-cell and systemic movements. However, simultaneously deleting the 11 N-terminal amino acids and 45 amino acids at the C terminus produced proteins able to form tubules but not proficient in promoting cell-to-cell movement. Furthermore, recently Sánchez-Navarro et al. (43) have shown that the 44 amino acids at the AMV MP C terminus physically interact with CP to elicit movement. Apparently, the C terminus of AMV MP confers specificity on the transport process (41) as it does in the cases of BMV (47) and CMV (26). MPs of PNRSV, AMV, CMV, and BMV have been shown to have an RNA-binding domain located at the N terminus of the protein in the first two cases or at the C terminus in the last two (3, 4, 7, 21). At least in the case of PNRSV, it has been shown that basic amino acids in this domain are required for RNA-binding activity and cell-to-cell movement (21). Finally, it has been reported that CMV MP also interacts with the 2a polymerase gene, although this interaction is not relevant for the formation of the replicase 1a-2a complex (23).
Although it is widely accepted that MP and CP should interact to facilitate viral cell-to-cell movement, direct evidence has not been provided. Here we report results from a molecular evolution study seeking to identify the selective constraints acting upon CP and MP during the evolution of different isolates of PNRSV. Our aim is twofold. On the one hand, using a maximum-parsimony approach, we have identified amino acid sites under selection in each protein at different evolutionary time points. On the other hand, we have applied statistical methods drawn from information theory to identify groups of amino acids that covary throughout time as a consequence of their functional or structural relationship. The existence of covariation groups was explored both within and between proteins.
TABLE 1. PNRSV isolates included in this study
According to the likelihood ratio test implemented in MODELTEST version 3.7 (37), the molecular evolution model that best explains the observed pattern of nucleotide diversity for the MP alignment was that proposed by Tamura and Nei (48), which was modified to account for the heterogeneity in substitution rates among sites (gamma parameter 0.285). Similarly, the model that best explained the observed sequence variation in the CP alignment was that proposed by Kimura (27), with a transition-to-transversion rate ratio of 2.291 and also incorporating heterogeneity in substitution rates among sites (gamma parameter 0.265). Phylogenetic reconstructions were obtained by the minimum-evolution method as implemented in the MEGA3 program (31) with the above nucleotide substitution models. The support of the internal nodes of the trees was evaluated by the bootstrap method with 10,000 pseudoreplicates (15). Nodes with bootstrap support of <75% were collapsed to the nearest significant node.
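To make the notion of a gamma-corrected nucleotide distance concrete, the following Python sketch computes the standard Kimura two-parameter distance between two aligned sequences, optionally with gamma-distributed rates among sites. It is only an illustration of the kind of distance used as input for minimum-evolution trees, not the MEGA3 implementation; the function name and the toy sequences are hypothetical, and the shape value 0.265 is simply the one reported above for CP.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_gamma_distance(seq1, seq2, alpha=None):
    """Kimura two-parameter distance between two aligned DNA sequences.

    If alpha is given, the gamma-corrected form is used, where alpha is
    the shape parameter of the among-site rate distribution.
    Sites containing anything other than A, C, G, or T are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    x = 1.0 - 2.0 * p - q
    y = 1.0 - 2.0 * q
    if x <= 0.0 or y <= 0.0:
        raise ValueError("sequences too divergent for this correction")

    if alpha is None:
        # Equal rates among sites
        return -0.5 * math.log(x) - 0.25 * math.log(y)
    # Gamma-distributed rates among sites with shape parameter alpha
    return (alpha / 2.0) * (x ** (-1.0 / alpha) + 0.5 * y ** (-1.0 / alpha) - 1.5)


if __name__ == "__main__":
    s1 = "ACGTACGTAC"
    s2 = "GCGTACGTAC"  # toy example: one transition (A->G) in ten sites
    print(k2p_gamma_distance(s1, s2))
    print(k2p_gamma_distance(s1, s2, alpha=0.265))
```

With strong rate heterogeneity (a small shape parameter), the corrected distance is larger than the equal-rates distance for the same observed differences, which is why the choice of model matters for branch lengths.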
Analysis of selective constraints.
To uncover the signature of coevolution between CP and MP, we first explored the selective constraints operating upon each individual protein. The advantage of using parsimony methods to determine codon regions consistently under specific selective constraints lies in their robustness against methodological biases (46). In maximum-parsimony analyses, the comparison of the rates of synonymous (dS) and nonsynonymous (dN) substitutions per class of sites is used to describe the evolutionary dynamics of protein-coding genes. The ratio of these two rates (ω = dN/dS) provides information on whether the gene has been fixing amino acid replacements in a neutral fashion (ω = 1), amino acid changes have been removed by the action of purifying selection (ω < 1), or changes have been fixed by adaptive evolution (ω > 1).
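The three regimes defined by the dN/dS ratio can be written down as a minimal Python sketch. It encodes only the definitions given above; in the actual analysis, departures from neutrality are assessed statistically against simulated distributions rather than by a point comparison. The function name, the labels, and the tolerance are illustrative assumptions.

```python
def classify_omega(dn, ds, tolerance=1e-9):
    """Return (omega, label) for given nonsynonymous and synonymous rates.

    omega = dN/dS; the label only reflects the sign of (omega - 1) and
    carries no statement about statistical significance.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        label = "neutral"
    elif omega < 1.0:
        label = "purifying selection"
    else:
        label = "adaptive evolution"
    return omega, label


if __name__ == "__main__":
    # Toy values, not taken from the study
    for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10)]:
        print(dn, ds, classify_omega(dn, ds))
```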
The particular method chosen is based on the parsimony criterion, which is appropriate here, given the conservation at the codon level of our alignments, and is implemented in the program SWAPSC version 1.0 (12). SWAPSC uses a statistically optimized window size to detect selective constraints in specific codon regions of the alignment at a particular branch of the phylogenetic tree (13). Briefly, the method estimates the expected distribution of dS and dN by Li's method (32) from simulated sequence alignments and assuming a Poisson distribution of substitutions. A statistically optimum window size is then estimated that makes the detection of adaptive evolution independent of the window size. The empirical values of dS and dN obtained by using the optimal window size are contrasted with the expected distributions, and several hypotheses regarding the selective constraints acting on codon regions are tested. Simulated sequence alignments were obtained with the program EVOLVER from the PAML package version 3.14 (54) with parameters estimated from the true sequence alignments after running the most appropriate codon-based model in PAML (models M2 and M8 for MP and CP, respectively [55]).
The Bayesian approach implemented in the CODEML program from PAML was also used. The sites determined to be under selection were the same as those found by the above-mentioned maximum-parsimony approach, giving robustness to the conclusion (data not shown).
Testing for coevolution between amino acid sites. The aim of this analysis was to determine whether MP and CP have coevolved during PNRSV phenotypic radiation because of their putative functional link. We also tested whether coevolving amino acids can be used to predict protein-protein contact interfaces. We realize that this type of analysis is not definitive in resolving docking problems, but nonetheless, results could aid in identifying specific amino acid sites responsible for protein interactions and in designing new experiments. We are also interested in a more general definition of protein interaction, which can be functional, physical, or phylogenetic. Significance of coevolution was tested by the mutual information criterion (MIC) approach taken from information theory (28, 29). MIC definition involves the joint probability distribution, P(xi, yj), of the occurrence of symbol x at position i and symbol y at position j belonging to the same protein or to two different proteins. MIC values range between 0, indicating independent evolution, and a positive value whose magnitude depends on the strength of covariation between sites. Variable positions included in the analyses were only those parsimony informative (i.e., containing at least two types of amino acids each with a minimum count of two). The statistical power of this test depends both on the sample size and on the level of variation at the sites considered. The significance of MIC values was assessed by means of a randomization test in which columns in the amino acid alignment of parsimony-informative sites were shuffled in place for each protein. The MIC value was recalculated thereafter for each shuffle to generate the expected probability distribution under the null hypothesis of no association. Significance P values were computed, based on a million permutations, as the fraction of shuffles with a MIC value greater than or equal to the observed value. 
To minimize the number of false positives, the significance level was set to 1%. Only the 15 isolates for which both full-length protein sequences were available in the database were used for these analyses (Table 1).
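The logic of the covariation test can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: mutual information is computed with natural logarithms, the null distribution is generated by shuffling the residues of one column among isolates, and a small default number of permutations is used instead of one million. All function names and the toy columns are hypothetical.

```python
import math
import random
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two residue types each occur at least twice."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    joint = Counter(zip(col_i, col_j))
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((freq_i[x] / n) * (freq_j[y] / n)))
    return mi


def permutation_p_value(col_i, col_j, n_perm=10000, seed=0):
    """Observed MI and the fraction of shuffles with MI >= observed."""
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between the columns
        # Small tolerance guards against floating-point round-off
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    # Toy columns for 15 isolates, not real PNRSV data
    site_a = "A" * 8 + "B" * 7
    site_b = "C" * 8 + "D" * 7
    if is_parsimony_informative(site_a) and is_parsimony_informative(site_b):
        print(permutation_p_value(site_a, site_b, n_perm=2000))
```

With only 15 isolates, the smallest attainable P value is limited by the number of distinct arrangements of residues in a column, which is one reason the power of the test depends on both sample size and site variability.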
Threading predictions of protein tertiary structure.
Tertiary structures of MP and CP were predicted by threading with the server http://cbsuapps.tc.cornell.edu/loopp.aspx and default options (33). Structure predictions are available upon request (see Fig. 3). So far, the native tertiary structures of PNRSV MP and CP have not been experimentally determined. However, the structure of phylogenetically close AMV CP has been obtained at 4 Å resolution (30). At this resolution, CP presents a globular structure with both the C and the N termini emerging from it. This gives indirect support to our prediction of PNRSV CP folding as a globule with protruding ends. Even less is known about MP structure. The PNRSV MP RNA-binding domain (amino acids 56 to 88) has been shown by circular dichroism to fold into an α-helix (22). Our predicted globular structure for MP reflects this experimental evidence. Nonetheless, given the scarcity of information on the in vivo three-dimensional structure adopted by MP, our threading prediction cannot be compared with experimental data, and therefore any conclusion drawn from the predicted structure must be treated with caution.
FIG. 3. Selected sites and covariation analysis of MP and CP. Color code: red, positively selected sites; blue, negatively selected sites; green, sites showing hypermutagenesis. Covarying amino acids are indicated and linked with yellow arrows.
Figure 1 shows the minimum-evolution phylogenetic trees inferred for the genes for MP and CP. Regardless of the gene analyzed, internal nodes are generally well supported by the bootstrap analysis. External nodes are generally less well supported (Fig. 1). More importantly, the three main groups previously defined in the literature on the basis of sequence homology (PE-5, PV-32, and PV-96) are highly supported by bootstrap values (Fig. 1). However, the phylogenetic position of isolate ChrIt.mrs1/Italy/Cherry, previously considered a member of the PV-96 group (2), was not well resolved. As shown in Fig. 1A, its splitting node has a bootstrap support value of <75% and thus has been collapsed. Nonetheless, to keep our classification consistent with previous proposals (2), ChrIt.mrs1/Italy/Cherry has been conservatively included within the PV-96 group (Fig. 1A).
FIG. 1. Phylogenetic analysis of MP (A) and CP (B). Trees were inferred by the minimum-evolution method using the Tamura and Nei corrected nucleotide distances (48). Values at the nodes are bootstrap support values based on 10,000 pseudoreplicates. Nodes with bootstrap support of <75% were collapsed. ChrIt.mrs1/Italy/Cherry has been included in the PV-96 group for consistency with previous proposals (2).
Detecting selective constraints in MP and CP.
The parsimony-based method found evidence of adaptive evolution in both proteins (Table 2). The optimum window sizes estimated by SWAPSC were five and four codons for MP and CP, respectively. A single region in MP appears to be under adaptive evolution, i.e., positive selection (ω > 1), E252 to E257. This region was selected after the split occurred within the PE-5 clade and produced the ancestor of a set of geographically and host-diverse isolates (branch G in Fig. 2A). MP sites under purifying selection (ω < 1) are located in regions L61 to S65 and R212 to L216. The first region has undergone purifying selection in all branches of the phylogenetic tree, whereas the action of purifying selection on the second region has been limited to branches A, B, and G (Table 2 and Fig. 2A), the internal tree branches that gave rise to the different clades. Finally, the method also detected three regions of the protein that show globally accelerated rates of amino acid substitution (Table 2). The hypermutagenesis of these regions took place in the branches leading to the PE-5 cluster (branches G and H in Fig. 2A), supporting the hypothesis of an accelerated rate of evolution during the genesis of this clade. The above-mentioned sites were mapped onto the predicted tertiary structure (Fig. 3).
TABLE 2. Summary of SWAPSC maximum-parsimony analysis of adaptive evolution in PNRSV CP and MP
FIG. 2. SWAPSC analysis of selective constraints in MP (panel A) and CP (panel B). In this example, the branch detected under adaptive evolution is drawn as a dotted line. The plots show the distribution of the nonsynonymous to synonymous substitution rate ratio (ω) along the sequence. Peak heights are proportional to the intensities of positive selection in the corresponding regions.
The fact that the branch leading to the ancestor of PE-5 isolates was found to be under adaptive evolution in both proteins provides a first indication of coevolution between MP and CP. Interestingly, this putative process of coevolution lies at the root of the genesis of a new serotype.
Covariation within and between MP and CP. By the MIC approach, we have detected several covariation groups (P < 0.01) within and between proteins (Table 3). Figure 3 illustrates the distribution of covariation sites on the predicted tertiary structures. In a first step, we analyzed within-protein covariation. Regarding MP, three amino acid residues show significant covariation (L253, E257, and I261; Table 3). Hereafter, we will call this covariation group MP CG. For CP, two different significant covariation groups (hereafter called CP CG1 and CG2) have been identified (Table 3). CP M52 belongs to both groups, and the covariation between V48 and D141 was statistically significant at the usual 5% significance level (MIC value = 0.161, P = 0.040), suggesting that both covariation groups can be reduced to a single one at this less stringent 5% significance level. Nonetheless, in the following discussion we will maintain the distinction between two nondisjoint groups.
TABLE 3. Covariation groups within and between proteins
There are two domains of MP that are highly conserved, probably because of structural and/or functional constraints. These domains range, respectively, from L61 to S65 and from R212 to L216. In addition, amino acid sites 8 to 70 have been predicted to form a transmembrane domain and may be necessary for the protein to interact with the cell wall (42). Supporting the existence of such functional and/or structural constraints in this region, we have identified sites that are under the action of negative selection (Table 2, L61 to S65). In addition, it has also been recently shown that MP interacts with RNA molecules through amino acids 56 to 88 (21). Interestingly, this 33-amino-acid domain includes the above-mentioned amino acids detected as negatively selected (L61 to S65). These basic amino acids should be, in accordance with our threading structural predictions (data not shown), exposed to the solvent, both properties being typical of RNA- and DNA-binding motifs.
The variability in CP is concentrated in two regions, one between N-terminal amino acids 48 and 56 (19) and another between C-terminal amino acids 139 and 145. These variable regions also contain most of the sites found to be under positive selection (Table 2, S49 to G53 and K139 to Q142). Amino acids 25 to 50 are essential for the protein to bind PNRSV RNA in vitro (3, 36). More generally, the N-terminal domain of members of the different genera within the family Bromoviridae (Cucumovirus [20], Bromovirus [44], and Ilarvirus [3, 4, 7]) is involved in capturing RNA during encapsidation and also plays an important role in the phenomenon of genome activation in the case of ilarviruses (5, 6). Zhang et al. (56) predicted, on the basis of in silico experiments, that the main requirement for cowpea chlorotic mottle bromovirus RNA encapsidation should be an abundance of positively charged amino acids at the internal surface of the capsid. In good agreement with this prediction, changes S50N and D141N (both positively selected in branch D leading to group PE-5) and change G53R (positively selected in the branch leading to isolates ChrIt.lam1/Italy/Cherry and AprIt.caf1/Italy/Apricot of group PV-96) imply net gains of positive charges. Therefore, changes increasing the strength of the RNA-CP electrostatic interaction may have been selected during PNRSV diversification.
It has also been suggested that the amino acids around M110 should play an important role during the formation of CP dimers (9, 52). This importance is confirmed by our finding of a stretch of four amino acids (Table 2, L114 to L117) under the action of negative selection.
Our maximum-parsimony analysis of CP identified sites under positive selection at the origin of the PE-5 group (Table 2, S49 to G53). Precisely in this region of PE-5 CP, Vašková et al. (52) have predicted on bioinformatic grounds the existence of an antigenic site. Amino acids K139 to Q142 (Table 2) have also been identified as under positive selection in the branch leading to isolates PchIt.mry1/Italy/Peach and ChrIt.bla1/Italy/Cherry of the PE-5 group. The biological significance of these sites is not obvious. However, since this region is quite close to the region spanning A148 to K152, which is (i) under negative selection (and hence likely has an important function) and (ii) predicted to be on the surface of the folded protein (data not shown), we can speculate that it may be involved in the formation of the MP-CP complexes that allow PNRSV cell-to-cell movement. This suggestion is supported by our covariation analyses, which reveal that the C-terminal amino acids of MP are coevolving with the N-terminal region of CP (CP CG1). It gains further support from evidence for such contact in other members of the Bromoviridae family such as CMV (26), BMV (47), and AMV (41, 43).
Comparison of Tables 2 and 3 shows that all but one (MP I261) of the amino acids involved in covariation groups have been shown to be the target of natural selection. MP L253, E256, and E257 are under positive selection, as is the case for CP M52 and D141, whereas CP V48 is under purifying selection. This almost perfect match suggests that selection acts upon the maintenance of the right folded structures (i.e., within-molecule interactions) and interaction between proteins. From a practical standpoint, this match between the two types of analyses enhances the robustness of the results.
Within the inherent limitations of the threading technique used for predicting MP and CP tertiary structures, it is worth noting that the amino acids covarying between the proteins are not scattered along the molecules; rather, both molecules can be fitted in space in such a way that the predicted intermolecular covariations make structural sense (Fig. 3). The three amino acids belonging to MP CG plus MP E256 are all concentrated in the same surface region of the predicted globule, whereas the three amino acids belonging to CP CG1 and CG2 are located in a predicted hypothetical cavity (CP M52 occupying a central location) into which MP would fit to maximize the likelihood of physical interactions between the two predicted between-protein covariation groups.
The present in silico study opens new avenues for researchers interested in experimentally exploring the in vivo interaction between MP and CP. In particular, it suggests which sites can be changed by site-directed mutagenesis to preclude the formation of MP-CP complexes. Also, site-directed mutagenesis studies may shed light on the importance of the sites identified here as the target of selection.
This work was supported by grants from the Spanish MEC-FEDER (BMC2003-00066 and BFU2005-23720-E/BMC), the Generalitat Valenciana (GV04B280 and GRUPOS03/064), and the EMBO Young Investigator Program to S.F.E. and from the Irish Science Foundation under the President of Ireland Young Researcher Award program to M.A.F. F.M.C. was the recipient of fellowships from the CSIC I3P Bioinformatics program and from the ESF Functional Genomics program.
Vašková, D., K. Petrzik, and R. Karesová. 2000. Variability and molecular typing of the woody-tree infecting prunus necrotic ringspot ilarvirus. Arch. Virol. 145:699-709.