Journal of Virology, July 2003, p. 7202-7213, Vol. 77, No. 13
0022-538X/03/$08.00+0 DOI: 10.1128/JVI.77.13.7202-7213.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Rega Institute for Medical Research, KULeuven, Leuven, Belgium,1 Molecular Virology and Bioinformatics Unit, Africa Centre, Nelson Mandela School of Medicine, Durban, South Africa,2 Laboratoire Retrovirus, IRD, Montpellier, France,3 Linnaeus Center for Bioinformatics, Uppsala University, Uppsala, Sweden,4 Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand,5 Human Immunodeficiency Virus and Retrovirology Branch, Division of AIDS, Sexually Transmitted Diseases, and Tuberculosis, Centers for Disease Control and Prevention, Atlanta, Georgia6
Received 23 October 2002/ Accepted 3 April 2003
The PLV strains are currently assigned to six approximately equidistant phylogenetic lineages (9, 18): (i) the SIVcpz clade, joining SIVcpz strains from African chimpanzees (Pan troglodytes) and HIV-1 groups M, N, and O; (ii) the SIVsmm clade, including HIV-2 subtypes as well as SIVsmm and SIVstm isolated from sooty mangabeys (Cercocebus atys) and stump-tailed macaques, respectively; (iii) the SIVagm clade, which clusters together viral strains from three species of African green monkey, namely, vervet monkeys (Chlorocebus pygerythrus; SIVagmVer), grivet monkeys (Chlorocebus aethiops; SIVagmGri), and tantalus monkeys (Chlorocebus tantalus; SIVagmTan); (iv) a group known as the SIVlhoest clade, joining the strains SIVlhoest from L'Hoest monkeys (Cercopithecus lhoesti), SIVsun from sun-tailed monkeys (Cercopithecus solatus), and SIVmnd from mandrills (Mandrillus sphinx); (v) a divergent strain, SIVsyk, so far isolated from only one Sykes monkey (Cercopithecus albogularis); and (vi) the divergent SIVcol strain recently isolated from a guereza colobus monkey.
Phylogenetic analyses also provide evidence for the existence of at least three SIV mosaic genomes: (i) SIVsab, isolated from West African sabaeus monkeys (21); (ii) SIVrcm, isolated from red-capped mangabeys (4, 17); and (iii) SIVmnd2, isolated from mandrills (42).
African monkeys are the natural hosts of PLVs, whereas the SIVs isolated from Asian macaques reflect cross-species transmission from captive sooty mangabeys (5, 19). It has been demonstrated also that HIV-1 and HIV-2 were introduced into the human population through contacts with infected simians (7, 8, 13-15). The six different HIV-2 subtypes seem to have arisen from at least four distinct interspecies transmissions from West African sooty mangabeys (8, 15). At least one zoonotic transmission, and probably three, from SIV-infected Pan troglodytes troglodytes (SIVcpz) is responsible for the origin of HIV-1 groups M, O, and N infection in our species (13). Other PLVs appear to be phylogenetically related according to the species rather than the geographic origins of their hosts (18, 47). For example, the SIVagmVer strains, isolated from vervet monkeys living in East and South Africa, are monophyletic and cluster, in turn, with the SIVagmTan strain isolated from a Central African tantalus monkey and with the SIVagmGri strain from an East African grivet monkey (see Fig. 1). This fact has been explained by assuming that the common ancestor of the African green monkey species was infected with the common ancestor of the SIVagm lineage, followed by coevolution of virus and host (12, 18, 28). Another example of possible cospeciation is the clustering of SIVlhoest, from L'Hoest monkeys, with SIVsun, from sun-tailed monkeys, both members of the genus Cercopithecus and belonging to the L'Hoest superspecies (2, 18). On the other hand, the fact that SIVmnd, from mandrills, falls within the same clade may be explained as a cross-species transmission from an unknown source possibly related to L'Hoest and sun-tailed monkeys (18, 20). An SIV transmission from African green monkeys to yellow baboons has also been reported, showing evidence that cross-species transmission has been happening in the wild even in recent times (21).
FIG. 1. Consensus phylogenetic tree representing the relationships among the six major lineages of PLVs. Conventional names for the PLV clades are given in bold. Horizontal branch lengths are drawn to scale, with the bar indicating 0.1 nt replacements per site, and were inferred by maximum likelihood with the GTR+Γ+I model of nucleotide substitution (see Results) by using only first and second codon positions. Asterisks on the branches indicate statistical support (P < 0.001) for the monophyletic lineage.
Uncertainty remains about the precise phylogenetic relationships among the six major PLV lineages. Several phylogenetic trees have been reported in the literature. Depending on the gene region or the algorithm used to infer the phylogenetic relationships, the relative branching pattern of the major PLV lineages does not appear to be stable. For example, Beer et al. (3) obtained separate neighbor-joining PLV trees for gag, pol, and env: such trees show that the SIVsmm lineage is monophyletic with the SIVagm lineage in gag but with the SIVsyk lineage in pol. In another example, the maximum-likelihood tree based on full-length PLV Pol protein sequences reported by Hahn et al. (18) clusters SIVagm with SIVlhoest. There is no doubt that the strains belonging to each of the six equidistant lineages constitute a monophyletic clade, whatever genomic region is used in the analysis; i.e., the full genome sequences within each lineage trace back eventually to a unique common ancestor. The fact that the precise phylogenetic relationships among the major clades remain elusive could be due to the loss of phylogenetic signal near the root of the PLV tree because of substitution saturation or to an inability of the phylogeny inference methods currently used to model properly the evolution of these viruses. Although some regions of the PLV genome appear to have undergone substitution saturation, the first and second codon positions of gag, pol, and env sequences retain enough information to allow a reliable investigation of the PLV phylogeny (10).
In what follows, we show that the supposedly pure PLV lineages have, in fact, mosaic genomes that were probably acquired through several interspecies transmission and recombination events close to the root of the tree. To support this claim, we analyzed full-genome PLV sequences with several tree-building methods, based on distance as well as maximum-likelihood algorithms, and with split decomposition analysis, a network-building approach that can take into account, and visually represent, conflicting phylogenetic relations within the data under investigation. Our findings not only make the cospeciation hypothesis less likely but also provide new insights into the origin and evolution of this important group of primate viruses.
In a previous study using the same data set, the observed saturation index at each codon position of the PLV concatemer was compared with half of the theoretical index expected in case of full substitution saturation (10). The indexes are computed by using the notion of information entropy (49), and the algorithm is implemented in the program DAMBE (48; Xia, DAMBE software package). The result clearly shows that third codon positions of gag, pol, vif, env, and nef, as well as vif and nef first and second codon positions, are significantly saturated (10). Therefore, they were excluded from the analysis. In contrast, first and second codon positions in gag, pol, and env still retain enough information for the reliable inference of phylogenetic relationships (10).
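The restriction to first and second codon positions can be illustrated with a minimal sketch. It assumes an in-frame coding alignment; the function name and the toy sequences are hypothetical and are not taken from the PLV data set.

```python
def first_second_codon_positions(seq: str) -> str:
    """Return an in-frame coding sequence with every third codon position removed."""
    # Positions are 0-based, so index 2, 5, 8, ... is the third base of each codon.
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


# Toy alignment (hypothetical): the two sequences differ only at third positions.
alignment = {"taxonA": "ATGGCCAAA", "taxonB": "ATGGCTAAG"}
reduced = {name: first_second_codon_positions(s) for name, s in alignment.items()}
print(reduced)
```

In this toy case the two reduced sequences become identical, which shows why an alignment filtered this way consists mostly of nonsynonymous positions.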
A similar alignment was also obtained for amino acid sequences. In what follows, we analyze only the nucleotide alignment; however, identical results were obtained by using amino acids (data not shown), which is not surprising since the nucleotide alignment includes mostly nonsynonymous positions. The taxa used in the analysis are shown in the consensus tree given in Fig. 1. The complete alignment is available upon request.
PLV consensus tree.
The general time-reversible model with Γ-distributed rates across sites (GTR+Γ model) has been tested and found to be the most appropriate for modeling the nucleotide substitution process in PLVs (22, 24, 40). By applying the hierarchical likelihood ratio test (LRT) strategy implemented in the MODELTEST version 3.06 program (31), we found the GTR+Γ model with a fraction of sites assumed to be invariable (GTR+Γ+I model) to be the best fitting for the PLV concatemer data set. In order to perform the test, a star-like tree, like the one shown in Fig. 1, was used. Such a tree is certainly not the best picture of the PLV evolutionary history, but it is not too wrong considering that the monophyletic origin of the strains within each of the major clades is highly supported in each region of the PLV genome. It has been shown that the use of any reasonably good tree for the data, i.e., one that is much better than a randomly chosen tree and includes clades that are well supported under any optimality criterion, will not critically alter the testing of evolutionary models, since parameter estimates do not vary much from tree to tree (43, 46, 50).
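A single step of a likelihood ratio test between two nested models can be sketched as follows. This is a minimal illustration, assuming the usual chi-square approximation with degrees of freedom equal to the number of extra free parameters; the log-likelihood values are hypothetical, and the helper names are not part of MODELTEST.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1)
    )
    return tail


def likelihood_ratio_test(lnl_simple: float, lnl_general: float, extra_params: int):
    """Return (test statistic, P value) for a simple model nested in a general one."""
    statistic = 2.0 * (lnl_general - lnl_simple)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log likelihoods for two nested models differing by one parameter.
stat, p_value = likelihood_ratio_test(lnl_simple=-1005.0, lnl_general=-1000.0, extra_params=1)
print(stat, p_value)
```

The simpler model is rejected when the P value falls below the chosen threshold; a hierarchical strategy repeats this comparison along a fixed sequence of nested models.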
A maximum-likelihood tree was obtained by using the selected model. As expected, the tree shows six long branches statistically supported (P < 0.001) by the zero-branch-length test (J. Felsenstein, PHYLIP [phylogenetic inference package] version 3.5c software documentation, Department of Genetics, University of Washington, Seattle) and leading to the supposedly pure PLV lineages. However, none of the short internal branches connecting the major clades had results significantly different from zero (P > 0.05). Therefore, such branches were collapsed and the resulting star-like tree, shown in Fig. 1, should be thought of as a consensus tree giving a schematic representation of the PLV monophyletic clades, with the central node as a soft polytomy representing our uncertainty about the exact phylogenetic relationships among the major clades.
Bootscanning analysis. Bootscanning plots were obtained by employing the concatenated nucleotide alignment with the SIMPLOT package (S. Ray, SIMPLOT version 2.5 software documentation, 1999). After the sequences were grouped per lineage, a sequence from each major PLV clade was used in turn as a query against those of all the others, with a sliding window of 500 nucleotides (nt) moved in steps of 20 nt and maximum-likelihood-estimated distances. As a control, a bootscanning analysis was also performed on the SIVrcm isolate NG411 (AF349680), which has a known mosaic genome (4), against all the major lineages.
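The sliding-window layout used in a bootscan (a 500-nt window moved in 20-nt steps) can be sketched as below. The function is a hypothetical helper, not part of SIMPLOT, and the alignment length in the usage line is arbitrary.

```python
def sliding_windows(alignment_length: int, window: int = 500, step: int = 20):
    """Return half-open (start, end) coordinates of every full window."""
    last_start = alignment_length - window
    # range() is empty when the alignment is shorter than one window.
    return [(start, start + window) for start in range(0, last_start + 1, step)]


windows = sliding_windows(1000)
print(len(windows), windows[0], windows[-1])
```

Each window would then be analyzed on its own (tree inference plus bootstrap), and the support for the query clustering with each reference lineage plotted against the window position.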
Phylogenetic inference based on tree-building methods. As discussed below in Results, bootscanning plots suggest the existence of at least five recombinant fragments within the PLV genome (see Fig. 3 and Table 1). Separate phylogenetic analyses were carried out for each fragment with distance, as well as maximum-likelihood-based methods, as follows. The best-fitting nucleotide substitution model was evaluated with MODELTEST version 3.06 (31). Minimum evolution and weighted least-square with inverse-square weighting objective functions were used to infer neighbor-joining and Fitch-Margoliash trees, respectively (45). Maximum-likelihood trees were obtained by implementing a heuristic search with Tree Bisection Reconnection branch swapping, since exhaustive or branch and bound searches, which are guaranteed to find the optimal maximum-likelihood tree, are not feasible for more than 10 to 15 taxa due to the exponential increase of the possible topologies. Starting trees for the heuristic search were obtained by both neighbor joining and random sequence addition (10 repeats), leading in each case to the same result. One thousand bootstrapping and jackknifing resamplings were used to evaluate the reliability of the distance-based trees. Statistical support for the maximum-likelihood trees was assessed with the zero-branch-length test (Felsenstein, PHYLIP software documentation). In such a test, each branch of a tree is collapsed in turn. Upon full reoptimization of the tree parameters by maximum likelihood, a probability is calculated, by comparison with the original tree, of obtaining a likelihood ratio as large as, or larger than, the ratio observed under the null hypothesis that the given branch has zero length. Topologies of different trees were compared with the use of the Shimodaira-Hasegawa (S-H) test with resampling-estimated log likelihood (RELL) bootstrapping by using 1,000 bootstrap replicates (41). 
Phylogenetic analyses and topological tests were performed with PAUP*4.0b10 (D. L. Swofford, PAUP*, phylogeny analysis based on parsimony [*and other methods], version 4, Sinauer Associates, Sunderland, Mass.).
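The bootstrap resampling step behind the support values can be sketched as follows. This is a minimal illustration of column resampling with replacement only; tree inference on each replicate is left out, and the function name and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    # The same sampled columns are used for every taxon so that sites stay aligned.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


toy = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "ACCTTC"}
replicate = bootstrap_alignment(toy, random.Random(1))
print(replicate)
```

Repeating this 1,000 times and inferring a tree from each replicate gives the fraction of replicates in which a given clade appears. A jackknife differs in that it deletes a subset of columns instead of sampling with replacement.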
FIG. 3. Results of bootscannings with a sequence from each major PLV lineage as a query sequence against those of all the others. A concatemer with the nonoverlapping regions of gag, pol, and env was obtained. The bootscanning analysis was performed on first and second codon positions with a sliding window of 500 nt (20 nt/step) and 1,000 bootstrap replicates. Significant bootstrap values (>90%) are indicated in each bootscan plot by a horizontal broken line. Vertical broken lines represent attempts to locate the putative recombination break points common to all six PLV lineages. The resulting five fragments, which may have a monophyletic origin, indicated in the figure as Rec1 to Rec5, were chosen to be analyzed in separate phylogenetic analyses (see Tables 1, 2, and 3 and Fig. 4).
TABLE 1. Best-fitting nucleotide substitution model (first and second codon positions only) for the five putative recombinant fragments of the six major PLV lineages
[...] GTR+Γ+I model by using the parameters estimated for the whole PLV data set with the Seq-Gen program (33). Each simulated set consists of 22 taxa 3,282 nt long with four recombinant fragments of the same lengths as the fragments inferred from the original data (PLV_REC1, PLV_REC2, PLV_REC4, and PLV_REC5; see Table 1), each one following the corresponding tree (see Fig. 4). After a tree for each recombinant fragment in each simulated set was reestimated, the log likelihood of the GTR+Γ+I model was compared with those of TVM+Γ+I (which assumes a transition-transversion bias and four different rates for the four different transversions, with Γ-distributed relative nucleotide substitution rates across sites and a fraction of sites being invariable), TN+Γ+I (which assumes a transition-transversion bias and a purine transition-pyrimidine transition bias, with Γ-distributed relative nucleotide substitution rates across sites and a fraction of sites being invariable), and HKY+Γ+I (which assumes a transition-transversion bias, with Γ-distributed relative nucleotide substitution rates across sites and a fraction of sites being invariable) by using the LRT.
FIG. 4. Neighbor-joining phylogenetic trees of the putative PLV recombinant fragments inferred from the bootscanning plots in Fig. 3. Genetic distances for each fragment were estimated by using the best-fitting substitution model given in Table 1 and the first and second codon positions. Only the names of the major PLV lineages are shown. Thicker internal branches are significantly supported by bootstrap and jackknife values and/or the results of the zero-branch-length test, according to the data in Table 2. Edges are drawn to scale, with the bar indicating 0.1 nt replacements per site. (a) PLV_REC1; (b) PLV_REC2; (c) PLV_REC4; (d) PLV_REC5.
Split decomposition analysis.
Split decomposition analysis allows the canonical decomposition of any distance measure (such as the genetic distances generated from a set of aligned nucleotide or amino acid sequences) into the sum of split metrics plus a split prime residue. More precisely, given a set X of taxa (X = {A, B, C, ...}) and a distance d(A,B) in X, the isolation index α_S of any split S(U,V) of X is defined
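A minimal sketch of how an isolation index can be computed, assuming the standard Bandelt and Dress definition: α_S is half the minimum, over all u, u′ in U and v, v′ in V, of max{d(u,u′) + d(v,v′), d(u,v) + d(u′,v′), d(u,v′) + d(u′,v)} − d(u,u′) − d(v,v′). The function names, the dictionary layout, and the four-taxon distances are illustrative assumptions, not values from the PLV data.

```python
from itertools import product


def isolation_index(d, side_u, side_v):
    """Isolation index of the split (side_u | side_v) under the distance d.

    d is a nested dict with d[x][y] == d[y][x] and d[x][x] == 0.
    """
    best = float("inf")
    for u, u2 in product(side_u, repeat=2):
        for v, v2 in product(side_v, repeat=2):
            within = d[u][u2] + d[v][v2]
            across = max(d[u][v] + d[u2][v2], d[u][v2] + d[u2][v])
            best = min(best, max(within, across) - within)
    return 0.5 * best


def symmetric(pairs, taxa):
    """Build a symmetric nested-dict distance matrix with a zero diagonal."""
    d = {x: {y: 0.0 for y in taxa} for x in taxa}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = value
    return d


taxa = ["A", "B", "C", "D"]
# Hypothetical tree ((A,B),(C,D)): terminal branches 1.0, internal branch 0.5.
d = symmetric(
    {("A", "B"): 2.0, ("C", "D"): 2.0, ("A", "C"): 2.5,
     ("A", "D"): 2.5, ("B", "C"): 2.5, ("B", "D"): 2.5},
    taxa,
)
print(isolation_index(d, ["A", "B"], ["C", "D"]))
print(isolation_index(d, ["A", "C"], ["B", "D"]))
```

For a tree-like distance such as this one, the split that matches the internal branch receives an index equal to that branch length, whereas a split that contradicts the tree receives zero. Only splits with a positive index appear as edges in a split graph.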
FIG. 2. (a) Results of bootscanning of SIVrcm against the six major PLV lineages. A concatemer with the nonoverlapping regions of gag, pol, and env was obtained. The bootscanning analysis was performed on first and second codon positions with a sliding window of 500 nt (20 nt/step) and 1,000 bootstrap replicates. (b, left panel) Results of split decomposition analysis of the concatemer alignment including only SIVrcm, SIVagm, SIVsmm, and SIVcpz isolates. Since several isolates of SIVagm, SIVsmm, and SIVcpz were used in the analysis, such names represent the monophyletic group rather than a particular isolate. Distances were obtained with the GTR+Γ+I model of nucleotide substitution (see Results), and 1,000 bootstrap replicates were generated to assess the reliability of each edge in the split graph. The percentages of bootstrap replicates are reported on the edge. Edges are drawn to scale, with the bar indicating 0.1 nt replacements per site. (Right panel) The two conflicting phylogenies represented by the split graph are depicted, with internal branches color-coded according to the internal splits in the graph on the left. Each internal split is supported by 100% of bootstrap replicates (see left panel).
FIG. 5. Results of split decomposition analysis of the six major PLV lineages. A concatemer with the nonoverlapping regions of gag, pol, and env was obtained. The analysis was performed using only first and second codon positions, with distances inferred by using the GTR+Γ+I model of nucleotide substitution (see Results). One thousand bootstrap replicates were generated to assess the reliability of each edge in the split graph. Names at the tips of the split graph represent the monophyletic group of each major lineage rather than a particular isolate. Edges are drawn to scale, with the bar indicating 0.1 nt replacements per site. Edges in red have >85% bootstrap support.
huson/phylogenetics/splitstree.html). Distances were inferred with the GTR+Γ+I model, the one best fitting the concatemer data set (see above), by using the parameters described in Results. As the measure of support, we calculated the bootstrap value for each edge in the split graph, i.e., the percentage of computed graphs out of 1,000 bootstrap replicates in which the split corresponding to that edge has occurred.
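The edge support described above amounts to counting how often a split recurs among the replicate graphs. A minimal sketch, assuming each replicate is summarized as a set of splits; the representation and helper names are hypothetical and do not reflect SplitsTree internals.

```python
def make_split(side_a, all_taxa):
    """Represent a bipartition as an unordered pair of frozensets."""
    a = frozenset(side_a)
    return frozenset({a, frozenset(all_taxa) - a})


def split_support(split, replicate_split_sets):
    """Percentage of replicates whose split set contains the given split."""
    hits = sum(1 for splits in replicate_split_sets if split in splits)
    return 100.0 * hits / len(replicate_split_sets)


taxa = ["A", "B", "C", "D"]
ab_cd = make_split(["A", "B"], taxa)
ac_bd = make_split(["A", "C"], taxa)
# Four hypothetical replicates; a split graph may hold conflicting splits at once.
replicates = [{ab_cd}, {ab_cd, ac_bd}, {ac_bd}, {ab_cd}]
print(split_support(ab_cd, replicates), split_support(ac_bd, replicates))
```

Because a network can contain incompatible splits in the same replicate, the support values of conflicting splits need not sum to 100%.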
[...] GTR+Γ+I model described by the following maximum-likelihood-estimated parameters: rAC = 4.02, rAG = 4.05, rAT = 1.79, rCG = 3.21, rCT = 5.53, Pinv = 0.19, and α = 1.22, where rij is the relative rate parameter for the i↔j nucleotide substitution with respect to an rGT of 1 by default, Pinv is the proportion of invariable sites, and α is the shape parameter of the discrete Γ distribution (eight rate categories). Simpler models (like the TVM+Γ+I or the TN+Γ+I model) compared with the one above always performed significantly worse (P < 0.001).
Phylogenetic patterns in the PLV genome. The maximum-likelihood tree in Fig. 1 shows, as expected, the six major lineages of PLV to be approximately equidistant, with the SIVagm, SIVcpz, SIVsmm, and SIVlhoest clades significantly supported (P < 0.001). Results of bootscanning analyses are summarized in Fig. 2 and 3. In Fig. 2a, the bootscanning plot of SIVrcm, a previously reported recombinant virus (4), exhibits a typical mosaic pattern, with gag and part of env clustering with SIVsmm and pol clustering with SIVcpz, which is confirmed by the corresponding split graph in Fig. 2b. The bootscanning plots in Fig. 3, where each major lineage is compared with all the others, also show a number of conflicting phylogenies across the PLV genome. For example, from Fig. 3A (and B), it seems clear that SIVsmm and SIVagm are monophyletic in gag, in the central and 3'-end part of pol, and in env but not in the remaining genomic regions. On the other hand, Fig. 3A (and D) shows SIVsmm clustering with SIVsyk in the 5' part of pol. In a bootscanning plot, a clustering is considered significant when the percentage of permuted trees is at least 90% (as in the examples discussed above). Thus, it is not possible to infer clear-cut phylogenetic relationships for all the different PLV genomic regions from the data in Fig. 3. However, bootscanning analysis alone is not sufficient evidence to support a hypothesis of extensive recombination across the PLV genome. Bootscanning plots are very sensitive to rate heterogeneity over sites, and the conflicting signals in Fig. 3 could be due to the low resolution of the algorithm in detecting and supporting the correct phylogenetic relationships within each sliding window. What can be deduced from the data in Fig. 3 is an indication of the genomic regions that are problematic in assessing phylogenetic relationships among the PLV major lineages, at least with the simple tools implemented in the SIMPLOT package. When different parts of the aligned viral genomes are analyzed, there seem to exist conflicting phylogenetic histories. Five such regions, flanked by the broken lines in Fig. 3, show relatively consistent patterns among all the major viral lineages. The fragments indicated by the broken lines only approximate the mosaic-like structure depicted by the figure, since recombination break points, as inferred by the bootscanning plots, appear to be at slightly different positions from lineage to lineage (Fig. 3E and F). The goal of this analysis is not to infer accurate recombination break points but to investigate whether or not different PLV genomic regions of the so-called pure lineages actually lead to conflicting phylogenies. If any or all of these regions have different monophyletic origins, phylogenetic trees inferred with different algorithms from each region should consistently support, with a high level of confidence, different clustering patterns among the PLV lineages.
We subdivided the PLV genome into five putative recombinant fragments, indicated by the broken lines in Fig. 3, and analyzed them separately by standard phylogeny methods.
Evolutionary patterns and phylogeny of major PLV clades across the genome.
Table 1 summarizes the amino acid positioning, relative to the inferred protein sequences of SIVcpzANT, which was used as a reference, of the five PLV genome fragments discussed above and named for convenience PLV_REC1 to PLV_REC5. According to the hierarchical LRT (see Materials and Methods), different fragments follow different nucleotide substitution models (Table 1). The shape parameter α of the Γ distribution ranges from 0.55 in PLV_REC1, which implies strong rate heterogeneity, to 1.37 in PLV_REC3, which indicates a relatively weak nucleotide substitution rate heterogeneity among sites. On the other hand, transition-transversion bias parameters are very similar for each fragment. Note also that the invariant models selected as the best-fitting ones for PLV_REC2, PLV_REC4, and PLV_REC5 show from 10 to 20% of the sites to be invariable, an indication that such sites are severely constrained by strong purifying selection.
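The link between the shape parameter and rate heterogeneity can be made concrete with a short calculation. It assumes the usual convention that relative rates follow a gamma distribution with mean 1, so that the variance is 1/α and the coefficient of variation is 1/√α; the function name is hypothetical.

```python
import math


def rate_coefficient_of_variation(alpha: float) -> float:
    """Coefficient of variation of site rates under a mean-1 gamma distribution."""
    # Mean 1 and shape alpha imply variance 1/alpha, hence CV = 1/sqrt(alpha).
    return 1.0 / math.sqrt(alpha)


for fragment, alpha in [("PLV_REC1", 0.55), ("PLV_REC3", 1.37)]:
    print(fragment, round(rate_coefficient_of_variation(alpha), 2))
```

Under this convention the coefficient of variation is about 1.35 for α = 0.55 and about 0.85 for α = 1.37, so site rates in PLV_REC1 are spread considerably more widely around their mean than those in PLV_REC3.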
Table 2 gives the phylogenetic support, in terms of bootstrap and jackknife values (distance-based trees) or P values (maximum-likelihood trees), for the clustering of the six major PLV lineages in the five genomic fragments investigated. Except for PLV_REC3, for which incongruent tree topologies were obtained and which will not be discussed further, all the tree-building algorithms consistently support, for each fragment, different phylogenetic relationships among the major PLV clades. Such relationships are clearly summarized by the neighbor-joining trees shown in Fig. 4. Maximum-likelihood and Fitch-Margoliash trees have the same topologies. The trees were obtained by including only the SIV strains, but similar results were obtained when HIV-1 strains clustering within the SIVcpz clade and HIV-2 strains clustering within the SIVsmm clade were also included (data not shown).
TABLE 2. Phylogenetic support for the clustering of the six major PLV lineages in five putative recombinant fragments
Each putative PLV recombinant fragment, except for PLV_REC1 and PLV_REC4, seems to evolve according to a different evolutionary model. All the models reported in Table 1 are simpler variants of the one best fitting the PLV concatemer (see above). Since the nucleotide substitution models considered are nested, it makes sense that when different genomic regions with different evolutionary dynamics are pooled together the most general model, including the simpler ones as a special case, is the one best fitting the data. However, a simpler model may not be rejected with a smaller data set due to the loss of statistical power of the LRT. In other words, with short gene regions, it might not be possible to reject a simple model, not because the simple model is indeed correct but because the data do not provide enough information and the test lacks power. To address this problem, we investigated 100 simulated recombinant data sets, obtained with the GTR+Γ+I model, by performing a separate LRT for each recombinant fragment in each simulated set (see Materials and Methods). The simulated alignments were generated with four nucleotide partitions of the same lengths as PLV_REC1, PLV_REC2, PLV_REC4, and PLV_REC5, each partition following the different phylogenetic tree inferred for the corresponding fragment (Fig. 4). In general, the TVM+Γ+I model could not be rejected in 53% of the cases, and as expected, failure to reject such a model was greater for the shortest fragment (70%) than for the longest one (29%). On the other hand, simpler models, such as TN+Γ+I and HKY+Γ+I, were rejected for 100% of the simulated data sets. Therefore, shorter fragments do reduce the power of the LRT, and the test is very likely to fail in rejecting the TVM+Γ+I model, which constrains only one of the parameters of the GTR+Γ+I model (the true one for the simulated data), but not models imposing heavier constraints, like TN+Γ+I and HKY+Γ+I. By comparison of this result with the data in Table 1, it seems reasonable to suggest different evolutionary patterns in the four PLV recombinant fragments investigated, although the rejection of the GTR+Γ+I model for PLV_REC2 could be due to the lack of power of the test.
Each of the four trees in Fig. 4 is the optimal tree for the corresponding genome region according to both distance- and maximum-likelihood-based criteria. It may be asked whether for each putative recombinant fragment the other trees are significantly worse than the optimal tree. Table 3 shows the results of the S-H RELL test for each fragment. The likelihood of the optimal tree is always significantly better than those of all the others, except for the PLV_REC4 data set for which the likelihood of tree number 2 (the tree with the topology given in Fig. 4b) is not significantly different from the likelihood of the optimal tree as shown in Fig. 4d. Overall, the results in Table 3 confirm that in each of the putative recombinant fragments the inferred optimal tree describes the phylogenetic relationships among the major PLV lineages significantly better than the alternative trees.
TABLE 3. Results of topological tests for the possible trees of each PLV putative recombinant fragment
Split decomposition analysis. Computer simulations show that the algorithm implemented by SplitsTree becomes unreliable when more than 15 to 20 taxa are analyzed, and with a larger number of strains it tends to give a star-like graph with little or no resolution (data not shown). However, it is possible to group taxa belonging to a monophyletic clade and treat them as a single taxonomic unit. Since the monophyly of each major PLV clade is unequivocal (Fig. 1), the group option in SplitsTree allowed us to analyze six lineages rather than all the strains in the data set. In practice, the algorithm computes the average distance from each group to any other and, by canonical decomposition of such group distances, infers a split graph representing the relationships among the different groups. The resulting split graph for the PLV gag-pol-env concatemer, using only first and second codon positions, is shown in Fig. 5. Note that each branch leading to a major PLV clade is much longer than the internal splits, which are very close to the center of the graph. Even though reticulation in split graphs does not necessarily imply recombination and may be due, for example, to insufficient correction for superimposed mutations (45), Fig. 5 confirms that the data contain conflicting phylogenetic signals and are consistent with early recombination events among the major lineages as shown by the phylogenetic analyses discussed above. The fit index is 98.1, an excellent index meaning that only 1.9% of the distances in the distance matrix are not represented by the graph. Most of the internal splits have a bootstrap support between 85 and 100%. Overall, the graph seems to reliably represent the data, showing the six major PLV clades related by a complex web-like network, as is also evident from the results summarized in Tables 2 and 3 and Fig. 4. By comparison of the split graph with Fig. 4, it is possible to see that the internal splits with high bootstrap support correspond to the topology of each tree in the figure. For example, the horizontal central split partitions the taxa into the two monophyletic subsets PLVlhoest-PLVcol-PLVcpz and PLVagm-PLVsmm-PLVsyk, according to the tree in Fig. 4a. On the other hand, the vertical central split and the split on the lower right part of the graph partition the taxa into the two monophyletic subsets PLVsyk-PLVcpz-PLVcol and PLVsmm-PLVagm-PLVlhoest, according to the tree in Fig. 4d. The intuitive interpretation of Fig. 5 is that each internal split represents a conflicting phylogeny that would be resolved arbitrarily in a single (wrong) tree by using any of the other tree-building algorithms available.
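The grouping step, in which each monophyletic clade is treated as a single unit with an average distance to every other clade, can be sketched as follows. This assumes a plain arithmetic mean over all between-group pairs; how SplitsTree weights the pairs internally is not stated in the text, and the strain names and distances are hypothetical.

```python
def group_distance(d, group_a, group_b):
    """Mean pairwise distance between the members of two groups of taxa."""
    pairs = [(a, b) for a in group_a for b in group_b]
    return sum(d[a][b] for a, b in pairs) / len(pairs)


# Hypothetical strain-level distances between two clades with two strains each.
d = {
    "agm1": {"cpz1": 0.50, "cpz2": 0.54},
    "agm2": {"cpz1": 0.52, "cpz2": 0.56},
}
print(group_distance(d, ["agm1", "agm2"], ["cpz1", "cpz2"]))
```

Applying such an average to every pair of clades reduces a strain-level matrix to a six-by-six matrix, which is then the input for the canonical decomposition.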
Our results also suggest that split decomposition methods could be employed to investigate the evolution of PLVs. They have the advantage of not forcing the data onto the wrong tree but the disadvantage of being less intuitively understood when multiple conflicting signals in the data produce complex networks, like the ones discussed in this paper.
Although clear-cut examples exist of simian-to-simian and simian-to-human cross-species transmissions, it is generally accepted that host-specific virus evolution of PLVs is the rule (18). However, as noticed by Sharp and coworkers (40), the cospeciation hypothesis bears a paradox. Apes and Old World monkeys originated at least 25 million years ago (C. Mulder, Letter, Nature 333:396, 1988), and African green monkeys originated probably around 1,000,000 years ago (25), suggesting an old time scale for the PLV tree. HIV-1 and HIV-2 share the same common ancestor with all other PLVs, but the zoonosis at the origin of their subtypes appears to have occurred much more recently. For example, the cenancestor of group M has been dated to the 1920s to 1930s (22, 36), and the interspecies transmissions giving rise to HIV-1 and HIV-2 all occurred somewhere between the end of the 17th and the beginning of the 20th centuries (24a, 36). The finding should not come as a surprise considering the fast evolutionary rate of these viruses, estimated around 10⁻³ nt substitutions per site per year (26, 40, 44). As discussed in Results, the molecular clock hypothesis for the putative PLV recombinant fragments 1, 2, and 4 could not be rejected. In the clock-like trees, the length of the branch connecting the tip with the cenancestor of the SIVagm clade is 0.8 to 1.7 times longer than the one leading from the tip to the cenancestor of the SIVcpz clade (see Results). This is incompatible with the cospeciation hypothesis, which would require a branch length about 1,000 times longer to fit the dramatically different time scales between the separation of HIV-1 and SIVcpz, dated a few hundred years ago (36), and the origin of the African green monkeys, which occurred around 1,000,000 years ago (26, 40). Moreover, even though the molecular clock hypothesis has been shown in some cases to fit poorly the evolutionary patterns of PLVs (23), rate differences among PLV strains are unlikely to cover such an order of magnitude. In fact, evidence exists of the opposite. For example, the evolutionary rate of SIV in vivo has been roughly estimated at around 10⁻² nt substitutions per site per year in SIVagm (29) and around 6 × 10⁻³ nt substitutions per site per year in SIVsmm (34). As shown above, for at least four of the five putative recombinant fragments across the PLV genome, a molecular clock could not be rejected. Besides molecular clock considerations, jungle graphs have shown that the relatively high degree of cophylogenetic match observed in PLV evolution does not necessarily indicate cospeciation but could in fact have resulted from preferential host interspecies transmission events between more closely related host species (6). The recombinant origin of each PLV lineage discussed in the present paper makes even more unlikely long independent cospeciation of PLV lineages and their hosts. On the contrary, it seems that cross-species transmission events leading to highly complex mosaic genomes have occurred continuously throughout the entire PLV evolutionary history. A few recent studies try to address how this mode of evolution may affect dating strategies (32, 38, 39). Simulations show that the higher the recombination rate in the sequences, the higher the probability that the molecular clock will falsely be rejected (39). Thus, the fact that for three of the four PLV genome fragments the molecular clock hypothesis cannot be rejected seems to be indirect evidence that within these recombinant fragments further recombination is unlikely.
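The time-scale argument can be checked with simple arithmetic. The sketch below assumes a constant rate and a purely linear accumulation of substitutions, ignoring saturation; the function name is hypothetical, and the inputs are the approximate figures quoted in the text.

```python
def expected_substitutions_per_site(rate_per_year: float, years: float) -> float:
    """Linear extrapolation of substitutions per site along a single lineage."""
    return rate_per_year * years


rate = 1e-3                # approximate PLV rate, substitutions per site per year
agm_age_years = 1_000_000  # approximate age of the African green monkeys

print(expected_substitutions_per_site(rate, agm_age_years))
```

Under these assumptions, a lineage that had coevolved with its host for about 1,000,000 years would have accumulated on the order of 1,000 substitutions per site, several orders of magnitude more than the branch lengths drawn against the 0.1 scale bar in Fig. 1.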
In conclusion, the findings presented in this paper provide compelling evidence that (i) simian-to-simian transmission of SIV, combined with frequent recombination between diverse lineages of SIV, has played a major role in the evolution of the PLVs; (ii) these recombinogenic events have been ongoing since the beginning of the evolutionary history of PLVs; (iii) as a result of these early events, there are no longer any pure lineages of SIV; instead, most, if not all, contemporary strains of SIV are complex mosaic viruses; and (iv) these mosaic viruses are the ancestral strains of the HIV causing the current HIV infection and AIDS pandemic.
We thank Martine Peeters, Stuart Ray, and Walter Fitch for critical reading of the manuscript and helpful suggestions. We also thank Ziheng Yang and two of the anonymous reviewers for valuable critiques and suggestions.