Journal of Virology, April 2004, p. 3252-3261, Vol. 78, No. 7
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.7.3252-3261.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
IRD, 34394 Montpellier cedex 5, France (1); ZARC, Zanzibar, Tanzania (2); INERA, Ouagadougou, Burkina Faso (3); ADRAO, Abidjan 01, Côte d'Ivoire (4); International Laboratory for Tropical Agricultural Biotechnology, St. Louis, Missouri 63132 (5)
Received 25 August 2003/ Accepted 9 December 2003
Rice yellow mottle virus (RYMV) is an RNA virus belonging to the genus Sobemovirus (46). Based on organizational differences in the central part of the genome (encoding the viral polyprotein), the sobemoviruses were subdivided into Southern cowpea mosaic virus (SCPMV)-like and Cocksfoot mottle virus (CfMV)-like types (42, 46). First reported to have an SCPMV-like organization (28), RYMV is now suspected to be of the CfMV type (12). Its genome harbors four open reading frames (ORFs) (42). ORF1, located at the 5' end of the genome, encodes a small protein involved in virus movement (7) and in the suppression of gene silencing (47). ORF2, which encodes the central polyprotein, consists of two overlapping ORFs. ORF2a encodes a serine protease and a viral genome-linked protein. ORF2b, which is translated as a fusion protein through a -1 ribosomal frameshift mechanism, encodes the RNA-dependent RNA polymerase. The coat protein gene (ORF4) is expressed from a subgenomic RNA at the 3' end of the genome. Encapsidation has been suggested to be required for long-distance movement (8). Additionally, the genome comprises three noncoding regions (NCRs): one at the 5' end (5' NCR), one at the 3' end (3' NCR), and one between ORF1 and ORF2.
RNA viruses have a high potential for genetic variability because of the intrinsically high mutation rate associated with the RNA-dependent RNA polymerase, their high rates of replication, and their large population sizes (9, 10, 22, 34). RYMV variability first became apparent from the detection of several serotypes in immunological studies with polyclonal and monoclonal antibodies (13, 20, 23, 29). Moreover, sequencing of the coat protein gene revealed genetic variation within each serotype (11). These studies suggested that the strains followed a geographic distribution, with a split between East and West African strains (3, 32, 45).
Comparative analysis of gene sequence data and geographic information can elucidate the origin and spread of viruses as well as the evolutionary processes that underlie their genetic diversity. Accordingly, from a set of 320 isolates that had been serologically typed or partially sequenced, we selected 14 isolates representative of both the genetic variability and the geographic distribution of RYMV and sequenced their full genomes. These 14 sequences, together with two previously reported ones (28), were analyzed. First, we identified the major evolutionary constraints operating on the genome. Phylogenetic analyses were then performed to determine the genetic relationships between the isolates. Last, phylogeographic studies were conducted to assess the links between geographic and genetic distances. Altogether, these analyses suggested that (i) RYMV evolved under conservative selection, in the absence of adaptation or recombination events, (ii) RYMV dispersed and differentiated gradually from the east to the west of Africa, and (iii) RYMV originated and evolved in wild graminaceous species and only recently infected cultivated rice.
TABLE 1. Origins and references of isolates of RYMV used in the study
Sequence analyses. The following analyses were conducted on the 16 sequences.
(i) Sequence alignment. The sequences were aligned using CLUSTAL W with default parameters (43). The alignment was corrected by hand wherever gaps in coding regions were not multiples of 3 nt, so as to maintain the alignment of the encoded amino acids.
(ii) Sequence diversity. The diversity index ($\pi$), the average number of nucleotide differences per site between two sequences (26), was calculated along the 16 complete genome sequences by using a 100-nt sliding window moved in 25-nt steps. The value of each window was assigned to the nucleotide at its midpoint. Additionally, for each ORF, $\pi_a$, the average number of nucleotide substitutions per nonsynonymous site, $\pi_s$, the average number of nucleotide substitutions per synonymous site, and their ratio ($\omega = \pi_a/\pi_s$) were calculated (27). All diversity indices were calculated by using DnaSP version 3.5 (36).
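The sliding-window calculation of $\pi$ described above can be sketched in Python as follows. This is a minimal illustration, not DnaSP's implementation: it assumes equally long aligned sequences and simply skips any alignment column that contains a gap ('-') in any sequence, which may differ from DnaSP's own gap handling. The function names are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Nei's pi: mean number of nucleotide differences per site over all sequence pairs.

    Assumption: alignment columns containing a gap ('-') in any sequence are skipped.
    """
    columns = [col for col in zip(*seqs) if "-" not in col]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))

def sliding_window_pi(seqs, window=100, step=25):
    """Return (midpoint, pi) tuples; midpoint is a 1-based alignment position."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        # Central column of the window (the lower of the two when window is even)
        midpoint = start + (window + 1) // 2
        results.append((midpoint, nucleotide_diversity(chunk)))
    return results

# Toy alignment of three sequences (invented, not RYMV data)
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(nucleotide_diversity(seqs))  # 4 differences / (3 pairs * 8 sites)
print(sliding_window_pi(seqs, window=4, step=2))
```

With the defaults (window=100, step=25) the function reproduces the window settings stated above; the toy call uses a smaller window only so the output is easy to check by hand.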
(iii) Search for positive selection. A search for positive selection was performed on each of the four ORFs (excluding the overlapping parts) with the 16 sequences. Larger samples of 36 and 48 ORF4 sequences, representative of the full set of 145 sequences, were also tested. The pattern of selection was inferred from $\omega$ values, with $\omega < 1$, $\omega = 1$, and $\omega > 1$ corresponding, respectively, to negative selection, neutral evolution, and positive selection (50). $\omega$ was estimated within the maximum-likelihood (ML) framework, using codon-based models of sequence evolution that account for phylogenetic structure, biases in codon usage, and the transition/transversion ratio (51). Efficient detection of sites under positive selection required only six models of codon substitution: M0, M1, M2, M3, M7, and M8 (50). The null models M0, M1, and M7 did not allow for positively selected sites because $\omega$ values were either fixed or estimated within the bounds of 0 and 1, whereas models M2, M3, and M8 allowed positive selection through parameters that can estimate $\omega$ to be >1. The significance of positive selection was then evaluated by likelihood ratio tests between the null models and those allowing positive selection. Models M0 and M1 are both nested within M2 and M3, M2 is nested within M3, and M7 is nested within M8. Once positively selected sites were found, a Bayesian approach was used to infer the most likely $\omega$ value for each site. Models were implemented with the CODEML program of the PAML package, version 3.1 (49).
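The likelihood ratio test between a null model and the nested model allowing positive selection can be sketched as below: twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters between the two models (for example, M7 versus M8 is usually tested with 2 degrees of freedom). The log-likelihood values in the example are hypothetical placeholders, not results of this study; in practice they would be read from CODEML output. The chi-square tail is computed in closed form for integer degrees of freedom to avoid external dependencies.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P-value) for a null model nested within an alternative model."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical lnL values for a null model (e.g., M7) and its alternative (e.g., M8)
stat, p = likelihood_ratio_test(lnl_null=-5000.0, lnl_alt=-4998.5, df=2)
print(f"2*dlnL = {stat:.2f}, P = {p:.3f}")  # 2*dlnL = 3.00, P = 0.223
```

A nonsignificant P-value, as in this invented example, means the model allowing $\omega > 1$ does not fit significantly better than its null model.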
(iv) Residue substitution. Residue substitutions were estimated as the most parsimonious series of substitutions that could give rise to the differences observed in the alignment, given the phylogenetic tree. This approach was applied to the whole genome to calculate the transition/transversion ratio and, for each ORF, to classify amino acid replacements according to the pairwise physicochemical distances between amino acids (15).
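The study counted transitions and transversions by parsimony along the tree. As a much simpler illustration of the transition/transversion distinction only, the sketch below classifies the differences between two aligned sequences as transitions (purine to purine or pyrimidine to pyrimidine) or transversions (purine to pyrimidine). Because it ignores the tree and multiple substitutions at the same site, it is not equivalent to the parsimony reconstruction used here; the function name is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Gaps and ambiguous bases are skipped; U is treated as T.
    """
    ts = tv = 0
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    for a, b in zip(s1, s2):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # transition: A<->G or C<->T
        else:
            tv += 1  # transversion: purine <-> pyrimidine
    return ts, tv

ts, tv = count_ts_tv("AAGCT", "GAGTA")
print(ts, tv, ts / tv)  # 2 1 2.0
```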
(v) Phylogenetic analyses. Phylogenetic relationships between the isolates were determined by three methods: ML, with the transition/transversion ratio estimated by ML; maximum parsimony (MP); and a distance method in which pairwise nucleotide distances were corrected with the Kimura two-parameter model and trees were reconstructed by the neighbor-joining (NJ) method. The full heuristic search option was used, and the significance of internal branches was evaluated with 1,000 bootstrap replicates for the MP and NJ analyses and 100 replicates for the ML analyses. All analyses were implemented in PAUP, version 4.0 (40).
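For reference, the Kimura two-parameter correction used for the NJ distances is $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$, where $P$ and $Q$ are the proportions of compared sites differing by a transition and by a transversion, respectively. The minimal Python sketch below implements that formula; it is not PAUP's implementation, and skipping gaps and ambiguous bases pairwise is an assumption of the sketch.

```python
import math

def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption)
        n += 1
        if a != b:
            if {a, b} <= purines or {a, b} <= pyrimidines:
                ts += 1
            else:
                tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    arg1, arg2 = 1 - 2 * p - q, 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)

# One transition in 10 sites: P = 0.1, Q = 0
print(round(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```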
(vi) Bremer support index. MacClade 4 (21) was used to calculate, under the most parsimonious hypothesis, the Bremer support index of each branch (the number of nucleotide substitutions necessary to break a node).
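The Bremer index of a clade is the length of the shortest tree lacking that clade minus the length of the most parsimonious tree. MacClade computed it on the real data set; the sketch below only illustrates the definition, using Fitch parsimony on a hypothetical four-taxon example in which enumerating the three possible unrooted trees is exhaustive. For realistic numbers of isolates, a heuristic or constrained tree search would be needed instead. The taxa, sequences, and function names are invented for illustration.

```python
def fitch(node, seqs, site):
    """Fitch bottom-up pass for one site; returns (state set, number of changes)."""
    if isinstance(node, str):  # leaf: its observed nucleotide
        return {seqs[node][site]}, 0
    left, right = node
    set_l, cost_l = fitch(left, seqs, site)
    set_r, cost_r = fitch(right, seqs, site)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1  # empty intersection costs one change

def tree_length(tree, seqs):
    """Parsimony length (total changes) of a binary tree written as nested tuples."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, seqs, site)[1] for site in range(n_sites))

def bremer_four_taxa(seqs, clade):
    """Bremer support of a two-taxon clade; 0 if the clade is absent from the MP tree."""
    a, b, c, d = sorted(seqs)
    trees = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    lengths = {tree: tree_length(tree, seqs) for tree in trees}
    best = min(lengths.values())
    without = min(length for tree, length in lengths.items()
                  if set(clade) not in (set(tree[0]), set(tree[1])))
    return without - best

seqs = {"A": "AAGGT", "B": "AAGGC", "C": "CCTTT", "D": "CCTTC"}
print(tree_length((("A", "B"), ("C", "D")), seqs))  # 6
print(bremer_four_taxa(seqs, ("A", "B")))  # 3
```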
(vii) Search for recombination. The aligned sequences were checked for incongruent relationships that might have resulted from recombination by using a distance method implemented in PHYLPRO (48) and a nucleotide substitution distribution method implemented in GENECOV (39).
(viii) Phylogeographic studies. The genetic distances between the isolates were expressed as matrices of pairwise nucleotide divergence percentages (26). Geographic distances (d, in kilometers) were transformed as log10(d + 100) before their relationship with the genetic distances was tested. Correlations between genetic and log-transformed geographic distances were assessed with the Mantel test (24) implemented in GENETIX 4.02 (5).
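A permutation-based Mantel test on log10(d + 100)-transformed geographic distances can be sketched as below. This is an illustration, not GENETIX's implementation: it assumes Pearson's r as the statistic, a one-sided test for positive correlation, and the common (hits + 1)/(permutations + 1) P-value. The four-isolate distance matrices are invented for the example and are not data from this study.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel_test(genetic, geo_km, permutations=999, seed=1):
    """Return (r, one-sided P) between genetic and log10(d + 100) geographic distances."""
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo_log = [[math.log10(geo_km[i][j] + 100) for j in range(n)] for i in range(n)]
    x = [genetic[i][j] for i, j in pairs]
    r_obs = pearson(x, [geo_log[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # permute isolate labels (rows and columns together)
        y_perm = [geo_log[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y_perm) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

# Invented example: four isolates at 0, 500, 1500 and 3000 km along a transect
pos = [0, 500, 1500, 3000]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[0.0, 2.0, 5.5, 9.0],
       [2.0, 0.0, 4.0, 8.0],
       [5.5, 4.0, 0.0, 5.0],
       [9.0, 8.0, 5.0, 0.0]]
r, p = mantel_test(gen, geo)
print(f"r = {r:.2f}, P = {p:.3f}")
```

With only four isolates there are just 24 distinct relabelings, so P cannot become small; the test becomes informative only with more isolates.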
(ix) Genetic distances between clades. The average number of substitutions per site between two populations (27) was calculated with DnaSP, version 3.5 (36), to assess the genetic distances between the clades.
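The between-clade distance can be sketched as the mean proportion of differing sites over all pairs of sequences drawn from the two clades. This is a simplified illustration, not DnaSP's implementation: it applies no correction for multiple substitutions, skips gapped positions pairwise (an assumption), and uses a hypothetical function name.

```python
from itertools import product

def between_clade_distance(clade1, clade2):
    """Mean substitutions per site over all cross-clade sequence pairs (uncorrected)."""
    total = 0.0
    for s1, s2 in product(clade1, clade2):
        sites = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
        total += sum(a != b for a, b in sites) / len(sites)
    return total / (len(clade1) * len(clade2))

# Toy clades (invented): pairwise proportions 0.5, 1.0, 0.25 and 0.75
print(between_clade_distance(["AAAA", "AAAT"], ["AATT", "TTTT"]))  # 0.625
```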
FIG. 1. Genomic organization of RYMV (bottom) and nucleotide diversity index along the genome (top). The diversity index ($\pi$), the average number of nucleotide differences per site between two sequences, was calculated along the genome by using a sliding window of 100 nt moved in steps of 25 nt after alignment of the 16 sequences. The value is assigned to the nucleotide at the midpoint of the window. Key conserved amino acid motifs and frameshift (FrSi) or transcription (TrSi) signals are indicated (see text for details).
TABLE 2. Diversity indices (average and maximum) calculated on the total ($\pi$), synonymous ($\pi_s$), and nonsynonymous ($\pi_a$) sites and their ratio ($\omega = \pi_a/\pi_s$) for the four ORFs after alignment of the 16 RYMV sequences
The nonsynonymous diversity ($\pi_a$) was 6 to 11 times lower than the synonymous diversity and ranged between 2 and 5%, with a maximum of 9% between any two isolates (Table 2). This indicated that conservative selection pressure operated mostly at the protein level. Patterns of amino acid change also provided information on the selection pressure acting on the proteins. In RYMV, the physicochemical properties of the amino acids as defined by Grantham (15) were conserved in most replacements, whatever the ORF (data not shown). However, selection pressure also operated at synonymous sites, as shown by the conservation of the various nucleotide signals and secondary structures reported above and by the differences in $\pi_s$ between ORFs (Table 2). The nucleotide diversity of the 3' NCR was similar to the total diversity index of ORF1, a further indication that conservative pressure also operated on the NCRs. Transitions were more frequent than transversions, with a transition/transversion ratio of 7.2. None of the models used to assess diversifying selection detected sites under positive selection within the first three ORFs. By contrast, models M3 and M8 detected a single site under positive selection in ORF4 (Table 3), and analyses of the 36 and 48 ORF4 sequences gave the same result. Bayesian analyses assigned this site to threonine 218 (nt 4095 to 4097), with posterior probabilities of 0.98 (model M3) and 0.92 (model M8). However, the likelihood ratio tests indicated that the models allowing positive selection did not fit the data significantly better than the corresponding null models (Table 3). The conservative conclusion is that no site in the RYMV genome, even within ORF4, is under positive selection.
TABLE 3. ML analysis of the evolution of the RYMV coat protein with models allowing $\omega$ to vary across amino acid sites
FIG. 2. Phylogenetic trees of 16 representative isolates of RYMV calculated from the complete genome (a) and from the ORF4 sequences (b). These trees were constructed by using the ML method. The numbers at each node indicate the percentage of bootstrap support (values of >70% are shown for the full genome; values of >50% are shown for ORF4). Phylogenetically constrained insertion-deletion events are indicated at the corresponding node at the left of panel a. The geographic origins and sizes of the genomes of the 16 isolates are shown at the right of panel a. The incongruence between full-genome and ORF4 tree topologies is indicated by an arrow in panel b.
Phylogenetic analyses were also conducted on each individual ORF and on the 3' NCR with the ML, MP, and NJ methods. Compared with the full genome, phylogenetic resolution was lost in trees based on the ORF2a and ORF2b sequences, which were the most conserved regions: because these regions contain fewer informative sites, some groups that were separated in the full-length analysis collapsed into a single group. Minor incongruences were observed in ORF1, ORF2a, and ORF2b. However, the only incongruence that broke the relationship between branching order and the east-to-west origin of the isolates came from the phylogenetic analysis of the coat protein gene (ORF4) (Fig. 2b). With the full genome (Fig. 2a), isolates Ma10, CIb, and CI4 from Mali and Côte d'Ivoire in West Africa belonged to a monophyletic group with all other West African isolates ((CIb, CI4), Ma10, (Ma77, (CI63, (CIa, SL4)))). By contrast, with ORF4 (Fig. 2b), the West African isolates CIb, CI4, and Ma10 formed a sister group to the Central African isolates Nia, Ni1, and Ni2 within a paraphyletic group (((Ni1, Ni2), Nia), (CIb, CI4), Ma10). This grouping was recovered whatever the method used and whether it was applied to nucleotide or amino acid sequences (data not shown). This ORF4-based paraphyletic grouping of isolates from Central Africa with some West African isolates from savanna ecologies, previously referred to as savanna strains (30), was also observed with the extended samples of 36 and 48 ORF4 sequences.
The partial incongruence in ORF4 was not a signal of ancient interstrain recombination between ORF4 and other parts of the genome, as no recombination events were detected with PHYLPRO or GENECOV (data not shown). MP analysis also indicated that the substitutions supporting the node of the paraphyletic group (((Ni1, Ni2), Nia), (CIb, CI4), Ma10) were spread along the coat protein gene rather than clustered within a given region, as would be expected after a recombination event. Nor was the ORF4 incongruence due to convergent evolution of the coat protein, as no adaptive selection was detected from the $\omega$ values (see above). In fact, the Bremer support index of the paraphyletic group, i.e., the number of nucleotide substitutions necessary to break its node, was only 4, whereas the Bremer support index of the node grouping the West African isolates was 6 when calculated on the full genome. A nonuniform distribution of a few substitutions among some lineages across the genome could therefore explain this incongruence. In practice, this incongruence indicates that ORF4 sequencing is a reliable typing tool for assigning RYMV isolates to the main clades but is inappropriate for assessing the phylogenetic relationships between some clades.
Phylogeographic studies. There was a close relationship between the geographic distance (after logarithmic transformation) and the genetic distance between isolates (Fig. 3): the farther apart the origins of two isolates, the greater their genetic distance. This held for the full genome as well as for each individual ORF and for the 3' NCR (correlation coefficients [r] ranged between 0.53 and 0.79; P < 0.001). In no instance did isolates separated by a long distance have low genetic divergence. This was also true of the 145 partially sequenced isolates (data not shown).
FIG. 3. Pairwise comparisons and corresponding linear regression lines of geographic distance (in kilometers, after logarithmic transformation) and genetic distance (percent divergence in nucleotides) between each of the 16 isolates calculated on the full genome, on each ORF, and on the 3' NCR.
FIG. 4. Geographic distribution of the six clades in mainland Africa. The six clades were defined by ML analysis of the complete genomic sequences of representative RYMV isolates. Each partially sequenced isolate was assigned to one of the six clades by phylogenetic analysis. The geographic area of each clade was derived from the locations where its isolates were collected.
The nucleotide diversity index ($\pi$) of each clade was calculated on the total number of sites and separately on the synonymous ($\pi_s$) and nonsynonymous ($\pi_a$) sites. Nucleotide diversity decreased gradually moving westward, being greatest in the eastern clade 1 and lowest in the westernmost clades 5 and 6 (Fig. 5). This decrease was observed whether the diversity index was calculated on total nucleotide substitutions (Fig. 5a) or on synonymous or nonsynonymous substitutions (Fig. 5b). Genetic distances among clades were calculated from the number of substitutions per site among the corresponding isolates. Overall, they were consistent with the general topology of the phylogenetic tree, with genetic distances between clades increasing from west to east. The genetic distance between the westernmost clade 6 and the other clades increased from 3.5% (clade 5) to 11% (the easternmost clade 1).
FIG. 5. Total nucleotide diversity index and standard error calculated for 10 representative isolates of each clade oriented along an east-to-west transect across Africa (a) and diversity indices calculated for synonymous and nonsynonymous sites (left and right scales, respectively) (b).
The average nucleotide diversity among the 16 RYMV isolates was 7%, and the maximum diversity between any two isolates was 10%. RYMV diversity showed a pronounced and characteristic spatial structure. The branching order of the clades correlated with the geographic origin of the isolates along an east-to-west transect across Africa and was associated with a marked westward decrease in nucleotide diversity. The indel polymorphism was congruent with the nucleotide substitution patterns. There was a close relationship between genetic and geographic distances. In no instance were two distant isolates (from sites >100 km apart) genetically close (<1%). This was apparent not only with the 16 fully sequenced isolates but also with the 145 partially sequenced isolates (data not shown). This relationship between geographic and genetic distances, and the associated marked geographic structure, would not be apparent had long-distance spread occurred. Overall, this indicated that dispersion of the virus during its evolutionary history was gradual. There was no evidence that the differences in the geographic distribution of the strains along the east-to-west transect had any selective basis, such as adaptation to particular hosts, vectors, or agro-ecologies. This was supported by the sequence analyses, which showed that no single amino acid site was under positive selection. Considering that these analyses are highly conservative (50, 51), we conclude that adaptation did not play a major role in RYMV evolution. Altogether, the evolutionary history of RYMV showed features markedly different from those of other plant viruses subjected to similar analyses of the full sequences of several representative isolates. In particular, reassortment was found to be critical for Cucumber mosaic virus (35), recombination and long-distance transport by humans were involved for Turnip mosaic virus (44), and adaptive selection was suspected for Potato leafroll virus (17).
Full-genome analyses excluded earlier hypotheses (30), based on coat protein gene sequences, of an adaptation of some isolates from Central Africa (clade 3) and West Africa (clade 4) to savanna regions. Conflicts among data partitions in fact appear to be the rule rather than the exception (16, 37). The partial incongruence between ORF4 and the rest of the genome was not due to obvious recombination or positive selection events. Given the few informative sites in ORF4 supporting the paraphyletic group, the most conservative conclusion is that the partial incongruence is due to a nonuniform distribution of a few substitutions among lineages across the genome. In practice, this incongruence indicates that relying on the coat protein gene, the gene most widely used in phylogenetic studies of plant viruses, may lead to erroneous evolutionary inferences. However, for RYMV at least, the coat protein gene is a valid typing tool for assigning isolates unambiguously to the main clades and for assessing intraclade diversity, but it is inappropriate for determining the phylogenetic relationships between some of the clades.
Information on the evolutionary history of viruses can be inferred from analyses of their spatial genetic structure (14). Our results suggested an origin of RYMV in East Africa, a gradual dispersion and differentiation westward across the continent, and genetic isolation by distance. Strain interactions may reinforce the genetic isolation of the clades. In particular, we found that isolates from clade 5 dominated when coinoculated with isolates from clade 4 (30). This was observed in cultivated rice (30) and in wild grasses (our unpublished data). These interactions may explain the wider distribution of clade 5 compared with clade 4 in West Africa and the lack of double infections. Genetic isolation of clades due to strain interaction has also been suggested for Wheat streak mosaic virus (18).
These results provide information on the conditions of emergence and on the tempo of RYMV evolution. A recent, substantial long-range dissemination of the virus from Kenya to the rest of Africa since 1966, whether by vectors or by humans, is most unlikely, as phylogeographic relationships would not have been preserved had long-distance dispersal occurred. This is consistent with the lack of efficient biotic and abiotic means of long-range dispersal for RYMV: the virus is not seed transmitted, its retention time in the vector is short, and its beetle vectors have a low flight ability. Coevolution between RYMV and cultivated rice is also unlikely. Two species of rice are cultivated in Africa, Oryza glaberrima and O. sativa, and both are susceptible to RYMV. O. glaberrima was domesticated in West Africa c. 2,500 years ago (31). O. sativa was introduced to Africa a few hundred years ago from Asia, where RYMV is absent. The higher diversity of RYMV in East Africa does not fit with the longer history of cultivated rice in West Africa. Overall, this suggests that the observed evolutionary history of RYMV unfolded in primary hosts and that the virus colonized cultivated rice only later. The primary hosts are likely to be wild grass species, as the present host range of the virus is limited to graminaceous species.