Previous Article | Next Article ![]()
Journal of Virology, May 2008, p. 4429-4440, Vol. 82, No. 9
0022-538X/08/$08.00+0 doi:10.1128/JVI.02354-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
,
Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia 30333
Received 31 October 2007/ Accepted 7 February 2008
|
|
|---|
10-fold slower [(0.10 ± 0.03) x 10–2], and the more stochastic Ka clock was
30-fold slower [(0.03 ± 0.01) x 10–2]. Nonsynonymous substitutions at all P1/capsid sites, including the neutralizing antigenic sites, appeared to be constrained by purifying selection. Simulation of the evolution of third-codon positions suggested that saturation of synonymous transitions would be evident at 10 years and complete at
65 years of independent transmission. Saturation of synonymous transversions was predicted to be minimal at 20 years and incomplete at 100 years. The rapid evolution of the Kt, Ks, and As clocks can be used to estimate the dates of divergence of closely related viruses, whereas the slower Bs and Ka clocks may be used to explore deeper evolutionary relationships within and across poliovirus genotypes. |
|
|---|
10–2 substitutions per site per year (31, 39, 52-54, 89, 90). The rates appear to be similar across the three poliovirus serotypes and for both circulating polioviruses and polioviruses associated with chronic infections. This very rapid rate of genomic evolution has facilitated high-resolution molecular epidemiologic studies, permitting the unambiguous identification of the sources of imported viruses (10, 41) and the resolution of separate lineages during outbreaks (39, 43, 74, 75), during endemic transmission (23, 52, 90), during prolonged poliovirus replication in immunodeficient patients (9, 31, 53, 89), and from environmental surveillance (23). However, the rapid accumulation of nucleotide substitutions, most of which are synonymous transitions, obscures deeper evolutionary relationships (46, 70).
Underlying the rapid pace of poliovirus genomic evolution are the high rates of base misincorporation (in the range of 10–5 to 10–3 per base per replication) by the poliovirus RNA-dependent RNA polymerase (2, 15, 19, 81, 82, 84). These high mutation rates are attributable to the absence of 3'
5' exonuclease proofreading mechanisms for the viral RNA polymerases (19), although other mechanisms may also be involved (17, 18). The strong transition bias of the poliovirus polymerase (2, 46) is reflected in the pattern of fixation nucleotide substitutions into poliovirus genomes.
In this report, we have calibrated five different molecular clocks based upon the rates of fixation of total substitutions (Kt), synonymous substitutions (Ks), synonymous transitions (As), synonymous transversions (Bs), and nonsynonymous substitutions (Ka) into the P1/capsid regions of circulating wild poliovirus type 1. Our analyses were facilitated by a chance epidemiologic event when type 1 polioviruses previously indigenous to the northern Andean region (Venezuela to Peru) were displaced by a genotype imported around 1980 from the Middle East (40). The imported virus circulated widely in the northern Andean region until elimination in 1991. Poliovirus surveillance yielded isolates that were closely related to the true imported founder and isolates representing two major diverging lineages. Analysis of the Andean isolate sequences permitted robust estimates of the fixation rates for different categories of nucleotide substitution. The closely related Kt, Ks, and As clocks can be used routinely by poliovirus surveillance to establish recent epidemiologic links. Resolving the Bs clock from the noisy Kt and Ks signals (32) opens a way to explore the broader patterns of poliovirus transmission. The poliovirus Ka clock, constrained by purifying selection at all nonsynonymous P1/capsid sites, was the slowest and most variable of all.
|
|
|---|
P1/capsid region sequencing.
Conditions for reverse transcription-PCR amplification and cycle sequencing of the P1/capsid region (nucleotides [nt] 743 to 3385) were essentially the same as those described previously (52). Reverse transcription used primer 3503A (5'-AAGAGGTCTCTRTTCCACAT-3'; targets nt 3485 to 3504) (numbering of nucleotide positions followed that described for Sabin 1 [57]), followed by 35 PCR cycles using primers 3503A and S48 (5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTTTAAAACAGCTCTGGGGTT-3'; targeting nt 0001 to 0019) to generate an
3,533-bp amplicon spanning the entire P1/capsid region. Sequencing was performed in both directions, using several primers, and every nucleotide position was sequenced at least once from each strand.
Estimation of numbers of synonymous and nonsynonymous sites.
Multiple alignments among the homologous P1/capsid sequences were obtained after partitioning total P1/capsid sites (LT; 2,643 nt) into four subalignments comprising the following positions: synonymous sites for all synonymous substitutions (LS;
909 nt), synonymous sites for synonymous transitions (LS-A), synonymous sites for synonymous transversions (LS-B), and nonsynonymous sites for all nonsynonymous substitutions (LN). Consensus LS-A (
865 nt) positions were obtained by alignment of twofold degenerate sites for transitions only and of fourfold degenerate sites, consensus LS-B (
512 nt) positions were obtained by alignment of twofold degenerate sites for tranversions only and of fourfold degenerate sites, and consensus LN (
1,734 nt) positions were obtained by alignment of nondegenerate sites (L0) (12). Assignment of fourfold, twofold, and zerofold degenerate sites followed the method of Li et al. (50). Positions bearing more than one class of degenerate sites were discarded. Alignment editing was performed by using the SEAVIEW (30) and MEGA (47) software packages. All alignments were subsequently used to calculate the absolute rates of evolution by both maximum likelihood and Bayesian approaches.
P1/capsid phylogeny estimation.
Phylogenetic trees were estimated from P1/capsid sequence alignments by using maximum parsimony, distance methods, likelihood functions (77), and Bayesian inference (72). Maximum parsimony and distance-based (minimum evolution) trees were inferred in heuristic searches using the PAUP software package (76). The topology of the maximum likelihood (ML) tree was reconstructed in a majority-rule consensus tree after 1,000 bootstrap replicates were performed in the fastDNAml program (25, 59), and the corresponding branch lengths were evaluated following model selection approaches (37, 65) among nested models of evolution implemented in the program MODELTEST (66). ML estimations of branch lengths were calculated by using the PAML software package (93). The most preferred model of evolution, the REV (general time reversible process) model with variable rates among sites (91), assigned different probabilities to each of the six pairs of substitutions and assumed unequal base frequencies. Variability of rates among sites was selected using the discrete gamma distribution (
8), with a shape parameter (
) of 0.3 and eight categories of discrete sites (92). A majority-rule consensus Bayesian tree was also inferred under the REV+
8 model after two independent Markov chain Monte Carlo runs of 10,000,000 steps were performed using the program MrBayes 3.1 (72).
Phylogenetic trees, rooted to sequence 3216/VEN81, were visualized in TreeExplorer (http://evolgen.biol.metro-u.ac.jp/TE/TE_man.html).
Pattern and distribution of substitutions.
P1/capsid nucleotide substitutions along the 60 branches of the estimated REV+
8 ML tree were calculated using marginal reconstruction of ancestral states as implemented in PAML (45, 94). The list of total inferred substitutions was imported into and manually edited in a Microsoft Excel spreadsheet and used to calculate the substitution pattern. Directional substitutions were also calculated with DAMBE software (87) by incorporating PAML inferred ancestral sequences. Relative substitution frequencies were obtained from the substitution pattern after substitutions were scaled to the proportion of mutant nucleotides (78). The proportion (Pn) of inferred synonymous transitions and synonymous transversions at each of the 881 P1/capsid codons and the expected proportions assuming a Poisson distribution were compared and plotted using statistical applications included in MATLAB (MathWorks, Natick, MA).
Estimation of relative rates of fixation of synonymous and nonsynonymous substitutions into antigenic and nonantigenic sites.
The numbers of synonymous substitutions per synonymous site (Ks) and nonsynonymous substitutions per nonsynonymous site (Ka) between pairwise sequences were estimated using the correction method of Li (49), as modified by Pamilo and Bianchi (62) and implemented in the program MEGA (47). The estimated parameter
(defined as the dN/dS [equivalent to Ka/Ks] ratio) evaluates the selective constraints on or selective pressure for amino acid replacement among homologous proteins.
Linear regression analysis of absolute rates of synonymous and nonsynonymous substitution. The number of Ks and Ka P1/capsid substitutions that accumulated from the root sequence, 3216/VEN81, were computed using the program K-Estimator (11). Ks values were estimated as the sum of estimated As and Bs substitutions in corrected synonymous sites (12). Finally, Kt was calculated as the expected number of nucleotide substitutions per site, as implemented in PAML (93). The different categories of substitutions from the root sequence were plotted as a function of the time of sample collection. When the exact dates of sample collection were unknown, collection dates were assumed to be 10 days after the onset of paralysis (when the dates of onset were known) or 1 July of the year of isolation when only the year of onset was known. The rates of accumulation of each category of substitution were estimated by linear regression, using statistical applications within the SAS system, version 9 (SAS Institute, Inc., Cary, NC). Rates of fixation of Ks, As, Bs, and Ka substitutions from the regression analysis were normalized to total P1/capsid sites from the LT, LS, and LN values and expressed as K's (Ks x LS/LT), A's (As x LS/LT), B's (Bs x LS/LT), and K'a (Ka x LN/LT) substitutions per site per year, respectively.
ML and Bayesian inference of absolute rates of synonymous and nonsynonymous substitution.
The global molecular clock model (single-rate dated-tip model [ML/SRDT]) (69), which assigns a unique rate of substitution (an ML estimate) among lineages, was compared to the different-rate model, which does not assume rate constancy among lineages. The simpler, constrained ML/SRDT model (with likelihood L0) was compared to the unconstrained different-rate model (with likelihood L1) in likelihood ratio tests (LRTs), with the difference in free parameters (n) equal to 3 (n = number of sequences) (80). The LRT statistic, 2
, is twice the ratio of log(L1/L0), and if the simpler rate constancy hypothesis is true, the statistic 2
is approximately
2 distributed. The corrected significance level for rejection (
) of the Kt, K's, A's, B's, and K'a molecular clocks was set to an overall value of 0.05. Test statistics were obtained using MODELTEST. Rates of substitution and their corresponding 95% confidence intervals (CI) were calculated in PAML. Reliability of the CIs was investigated by tuning the iteration algorithm in PAML.
The Bayesian approach for the same types of molecular clocks was performed in the program BEAST v1.4 (22). Two independent chains of 5,000,000 steps were run under a model of constant population size. The models of evolution used for each molecular clock were the same as those used for the ML/SRDT clocks.
Analysis of differential rates of saturation of transition and transversion substitutions. We compared the numbers of corrected and uncorrected transitions and transversions at third-codon positions (3CPs) between each sequence pair of the P1/capsid alignment against the cognate branch length (patristic distance) inferred by the ML/SRDT tree. Patristic distances under the clock model represented total evolutionary time, in years. Corrected and uncorrected distances were calculated in PAUP (76). The application PATRISTIC (29) was used to calculate patristic distances (ML/SRDT estimates) and match them with observed distances between each sequence pair.
Modeling of long-term patterns of transitions and transversions.
The long-term (
100 year) evolution of 3CPs within the P1/capsid region (881 nt) of the Andean viruses was simulated by computer, using the MATLAB program mutate as described by Allman and Rhodes (1). Successive mutation runs obtained by computer simulation extended the evolution from the 3216/VEN81 founder sequence (S0) by several decades. Phylogenetic analysis of the simulated sequences produced an ML tree similar in topology to that obtained with the natural sequences. Serially sampled simulated sequences (St) were obtained by application of a 4 x 4 Markov matrix (M) based on the substitution pattern inferred from the ML/SRDT Andean phylogeny. The probabilities for transversions were increased 1.8-fold in the matrix to correct for the observation that only
56% of P1/capsid 3CPs were fourfold degenerate sites (as determined by back translation of sequence 3216/VEN81-1). Each mutation run applied an arbitrary number of time steps (t) and assumed a rate of fixation (
) of
0.01 3CP substitutions/3CP site/time step, starting with the 3CP base distribution vector for VEN81-1 [p0 = 0.241 for A, 0.243 for G, 0.288 for C, and 0.228 for U)]. For example:
![]() |
6. The differential saturation rate of 3CP transitions and transversions was calculated as described above for the natural sequences. Nucleotide sequence accession numbers. Sequences of the 31 wild Andean poliovirus type 1 isolates described in this study were submitted to the GenBank library under accession numbers EF374000 to EF374030.
|
|
|---|
![]() View larger version (53K): [in a new window] |
FIG. 1. ML/SRDT tree of P1/capsid region sequence relationships among 31 wild poliovirus type 1 isolates from cases in the period 1981-1991 in the northern Andean region. The tree was rooted to the sequence of isolate VEN81-1, with branch lengths scaled under the SRDT model (69) to the date of specimen collection, as described in Materials and Methods. Statistical support for the tree topology (bootstrap values or Bayesian posterior density values [in parentheses]) is given at the nodes. The sequence of the last indigenous wild poliovirus isolate from the Americas (PER91) is marked by an asterisk. Italicized letters in parentheses identify clusters of isolates (A, VEN81-1, VEN81-2, VEN82, PER83, COL86, VEN88-1, COL89, and COL91-5; B, VEN84; C, VEN88-2, COL90-2, COL90-3, COL90-4, COL91-2, and COL91-3; D, COL91-1, COL91-4, COL91-6, COL91-7, and COL91-8; E, PER88-4; F, PER88-1; G, PER88-3; H, PER88-2; I, ECU89, ECU90-1, COL90-1, PER90-1, and PER91; J, ECU90-2; and K, PER90-2) having identical sequences in the NAg sites shown in Fig. 4.
|
Pattern of nucleotide substitution.
The ML algorithm used to construct the tree (Fig. 1) estimated that a total of 1,247 nucleotide substitutions were cumulatively fixed into the P1/capsid regions of the 31 isolates during their divergence from the VEN81-1 root sequence (Table 1). In accord with previous reports (2, 15, 31, 46, 52, 70), most (93%) substitutions generated synonymous codons and most (92%) substitutions were transitions. Synonymous transitions were the most common (87%) and nonsynonymous transversions the least common (1.8%) substitutions. A higher proportion of transitions (95%) than transversions (78%) were synonymous. The large majority (89%) of nucleotide changes occurred in 3CPs, and >99% of these were synonymous (Table 1). However, the majority of codon positions (91% of 1CPs, 97% of 2CPs, and 32% of 3CPs) were calculated by ML to have been invariant during the 10-year period of observation. Multiple substitutions were estimated to have occurred at
13% of sites (and at nearly half of all variable sites); superimposed substitutions (primarily back substitutions) occurred at
8% of sites. Despite the large number of substitutions, the P1/capsid region RNA base composition (A, 28.1% ± 0.15%; G, 22.7% ± 0.19%; C, 25.4% ± 0.26%; and U, 23.8% ± 0.26% [n = 31]) and effective codon usage (Nc = 56 ± 0.5; n = 31) (86) were strongly conserved.
|
View this table: [in a new window] |
TABLE 1. Nucleotide substitutions fixed into P1/capsid region of Andean lineages, 1981 to 1991a
|
![]() View larger version (24K): [in a new window] |
FIG. 2. Distribution by category of nucleotide substitutions across the P1/capsid region along all observed Andean lineages, based upon the ML/SRDT tree in Fig. 1 and the assignments summarized in Table 1. The reference sequence (set at zero substitutions) was that of VEN81-1. Symbols: Kt, total nucleotide substitutions; Ks, synonymous substitutions; Ka, nonsynonymous substitutions; As, synonymous transitions; Aa, nonsynonymous transitions; Bs, synonymous transversions; Ba, nonsynonymous transversions. NAg sites are shown in gray, and the highly variable (79) and structurally disordered (35) 32 amino-terminal residues of VP1 are represented as a gradient of gray shades.
|
![]() View larger version (23K): [in a new window] |
FIG. 3. ML estimates of distributions of numbers of As (unfilled bars) and Bs (hatched bars) substitutions fixed per site into the P1/capsid regions of all 31 Andean isolates during 10 years of divergence from the VEN81-1 root sequence. The distributions expected for a Poisson model of nucleotide substitution are shown as thin (As) and thick (Bs) lines.
|
Only a small proportion (17%) of the variable sites mapped within or near the neutralizing antigenic (NAg) sites (Fig. 2 and 4). None of the P1/capsid region sites appeared to be under positive selection during circulation of the Andean lineages. A measure of the intensity of selection at a site is the ratio (
) of nonsynonymous to synonymous substitutions (Ka/Ks or dN/dS). When
is >1, the site is inferred to be under positive selection; when
is 1, the site is inferred to be under no selection; and when
is <1, the site is inferred to be under negative (purifying) selection. No positively selected sites were found under six different models of codon evolution (95), and the models with the highest estimated likelihoods all yielded
values of <1 (see Table S1 in the supplemental material). When codons for three different P1/capsid domains (NAg sites, non-NAg sites, and highly variable sites) were considered separately, all were found to be under purifying selection, and there was little fluctuation of
values along lineages (Table 2). Interestingly, average
values from root to descendant lineages were lowest for non-NAg sites, higher for the highly variable sites, and highest for the NAg sites. These findings suggest that selection for amino acid conservation was strong throughout the P1/capsid region, but stronger for the non-NAg sites than for the highly variable and NAg sites.
![]() View larger version (10K): [in a new window] |
FIG. 4. Sequences of amino acid residues within or bounding NAg sites 1 (VP1 amino acids 95 to 106), 2 (VP2 amino acids 164 to 173, VP2 amino acids 269 to 271, and VP1 amino acids 221 to 226), 3a (VP3 amino acids 56 to 62; VP3 amino acids 70 to 74, and VP1 amino acids 287 to 292), and 3b (VP2 amino acids 71 to 73 and VP3 amino acids 75 to 79). The reference sequence is that of VEN81-1. Residues defining the type 1 poliovirus NAg sites by mutations conferring resistance to neutralization by monoclonal antibodies (4, 56, 60, 83) are indicated in bold. Isolates having identical sequences in the NAg sites are grouped and identified by italicized capital letters, described in the legend to Fig. 1.
|
|
View this table: [in a new window] |
TABLE 2. Relative rates of fixation ( ) of Ka (dN) and Ks (ds) substitutions into NAg and non-NAg sites from root to descendant lineages
|
G
I
J), diverging pathways (I
J and I
K), and fluctuating patterns (A
C
A) (Fig. 1 and 4).
Relative frequencies of specific base changes.
The frequency of transitions between pyrimidines (C
U) in the P1/capsid region was
1.4 times higher than that of transitions between purines (G
A) (Fig. 5), consistent with the approximately equimolar base composition of the P1/capsid sequences and a slight bias toward C+U nucleotide bases at degenerate 3CPs. The unequal rates between the eight types of transversions were more pronounced. For example, U
A substitutions accounted for
30% of all transversions and the U
A frequency was about 30 times higher than that of the most infrequent transversion, C
G. Direct C
G transversions appear to be rare in poliovirus (39, 46, 52, 75, 89, 90) and other picornaviruses (24), and most C
G substitutions in poliovirus probably occur via two steps, C
A
G, as inferred several times for the Andean lineages.
![]() View larger version (15K): [in a new window] |
FIG. 5. ML estimates of relative frequencies of specific base changes (10–2) (black arrows, transitions; gray arrows, transversions) at all codon positions within the P1/capsid regions of the Andean isolates during divergence from the VEN81-1 root sequence.
|
pyrimidine (0.039) and pyrimidine
purine (0.044) transversions were nearly equivalent (Fig. 5). These frequencies reflect in part the estimated
10-fold ratio of transition/transversion misincorporations by the poliovirus RNA-dependent RNA polymerase (2). Fixation of transversions in the P1/capsid region is further constrained by purifying selection at nonsynonymous sites and by the poliovirus codon usage bias (68, 73, 79). When transition frequencies in the VEN81-1 P1/capsid sequence were normalized to all degenerate 3CP sites (n = 834) and transversion frequencies were normalized to the number of threefold (n = 39) and fourfold (n = 496) degenerate 3CP sites, the transition/transversion frequency ratio was
8. Notably, a lower transition/transversion frequency ratio of 4.4 was obtained for the highly variable (51, 79) 5'-untranslated region (5'-UTR) sequences (nt 640 to 742) immediately preceding the start codon, an interval where base substitution is not constrained by amino acid conservation, RNA secondary structure (61), or codon usage bias. Rates of fixation of different categories of nucleotide substitutions. Three basic approaches are used to determine rates of nucleotide substitution, including linear regression (44), ML (69), and Bayesian inference (20, 22). The three approaches yielded very similar results for the rates of fixation of Kt, K's, A's, and B's substitutions (the "prime" symbol indicates normalization of substitution categories to total sites) into the P1/capsid region (Table 3; see Table S2 in the supplemental material).
|
View this table: [in a new window] |
TABLE 3. Substitution rates for P1/capsid sequences (2,643 nt)
|
10-fold higher rate than B's substitutions and at an
30-fold overall higher rate than K'a substitutions. The rates of fixation for each category of substitution were similar between the northern and southern lineages (see Table S3 in the supplemental material), consistent with a molecular clock.
![]() View larger version (13K): [in a new window] |
FIG. 6. Estimation by linear regression of the rates of fixation of different categories of nucleotide substitutions into the P1/capsid regions of 31 Andean isolates. (A to C) For the abscissa, the time the sample was taken for each isolate (zero time, time of sampling for isolate VEN81-1, assumed to be 1 July 1981) is given. For the ordinate, corrected P1/capsid nucleotide substitution differences were normalized to total sites (Kt) (A), total synonymous (Ks) or nonsynonymous (Ka) sites (B), and total synonymous sites (As and Bs) (C) (zero substitutions, sequence of VEN81-1). Rates were calculated as described in Materials and Methods. (D) Relative rates of fixation of As and Bs substitutions (slope = 0.109; R2 = 0.86).
|
Testing the molecular clocks.
The hypothesis that the rates of fixation of Kt, K's, A's, B's, and K'a substitutions were consistent with molecular clocks was tested in LRTs (26, 65). The probability values for all five molecular clocks in the LRTs were well above the rejection threshold (
= 0.05) (Table 4). P values were highest for the B's (0.65), Kt (0.59), and K's (0.49) clocks and lower for the A's (0.42) and K'a (0.14) clocks and were inversely proportional to the estimated 95% CI for the Kt, K's, and A's rates. In contrast, the CI for the B's clock represents
40% of the estimated rate, despite scoring the highest P value, and the CI for the more variable K'a clock represents
70% of the estimated rate.
|
View this table: [in a new window] |
TABLE 4. SRDT rates for consensus alignments of P1/capsid sequences
|
6 months of the earliest recognized cases, the most divergent pair of isolates, COL91-8 and PER91, had been separated by a total of
19 years of independent evolution. Under the assumption that nucleotide substitution occurred via time-reversible Markov chains (26), saturation of variable sites was analyzed beyond the actual 10-year period of observation, for up to nearly 20 years, by rerooting of the tree (Fig. 7A and B). Although it is evident from the pattern of nucleotide substitution (Fig. 5) that the assumption of reversibility of specific base changes does not strictly hold, the most frequent base substitutions (A
G, C
U, and A
U; 95% of total) and the overall purine
pyrimidine substitutions were approximately time reversible, such that the assumption of time reversibility was not rejected by the MODELTEST program (66). When the ML/SRDT tree was rerooted to the sequence of COL91-8 or PER91, we were able to estimate the kinetics of saturation of transition and transversion substitutions over a period of nearly 20 years.
![]() View larger version (25K): [in a new window] |
FIG. 7. Saturation of 3CP transitions (closed circles), 3CP transversions (open circles), and total 3CP substitutions (triangles) in the P1/capsid region over the total period of independent evolution of the northern and southern lineages. (A) Uncorrected pairwise 3CP differences between all 31 isolates (ordinate) were plotted as a function of time of evolutionary separation (estimated from branch lengths of the ML/SRDT tree [Fig. 1]). (B) Pairwise 3CP differences corrected by use of the REV model of nucleotide substitution. (C) Simulation of saturation of 3CP transitions (closed circles), 3CP transversions (open circles), and total 3CP substitutions (triangles) in the P1/capsid region over a 100-year period, calculated by using the mutate program in MATLAB as described in Materials and Methods. The boxed area represents the time interval shown in panels A and B.
|
40%. In contrast, 3CP transversions saturated much more slowly, and at 20 years the predicted underestimate from pairwise comparisons was only
10% (Fig. 7A).
When 3CP transition differences were corrected by application of the REV substitution model (91), using the ML estimates of the nucleotide substitution rate parameters (Fig. 5), the period of apparently linear accumulation of 3CP transitions was extended to
10 years (Fig. 7B). However, by 20 years of evolutionary divergence, saturation of 3CP transitions was evident, and REV-corrected pairwise comparisons of sequences of the most divergent isolates were predicted to underestimate the actual number of 3CP transitions by
10% (Fig. 7B), illustrating the effects of increased saturation of variable sites on even the most robust substitution models (Fig. 3) (26, 34). In contrast, the REV model was able to fully correct for the predicted limited saturation of 3CP transversions at 20 years (Fig. 7B).
Predicted useful ranges of the Ks, As, and Bs clocks.
The above analyses suggest that comparing the number of P1/capsid 3CP differences between moderately divergent poliovirus isolates may be used to estimate their total time of evolutionary separation for up to
20 years. Pairwise transversion differences can be used without correction for time estimates of up to 20 years, but transition differences require correction for time estimates in the 10- to 20-year range. Because no continuous, well-defined poliovirus lineages with isolates having more than 20 years of total evolutionary separation have been identified, we cannot rigorously apply the above approach beyond 20 years. Therefore, we applied mathematical modeling to simulate the evolution of P1/capsid 3CP substitutions from the VEN81-1 root (see Fig. S2 in the supplemental material) in order to investigate the saturation kinetics of transitions and transversions over an
100-year period (Fig. 7C). Simulations were performed by using the program mutate in MATLAB, applying the initial 3CP base distribution vector for VEN81-1 and a substitution matrix based upon the relative frequencies of specific base changes given in Fig. 5, but with the transversion frequencies increased 1.8-fold to correct for the observation that only
56% of P1/capsid 3CPs were fourfold degenerate sites. Simulations were performed for a total of 200 time steps (equivalent to
6 months of evolution per time step), and the abscissa of the plot of the simulated saturation curves was scaled to years by assuming a rate of 0.03 substitutions/3CP site/year. Across the 0- to 20-year time interval, the saturation curves for 3CP transitions, transversions, and total substitutions in the P1/capsid region predicted by the simulation (Fig. 7C, boxed area) were very similar to those obtained by rerooting of the natural sequences (Fig. 7A). Beyond 20 years, the simulation predicted that P1/capsid 3CP transition differences would reach a maximum of
40% at
65 years of evolution (and then gradually decline), in excellent agreement with the observed 3CP transition differences (up to 42% in VP1) across divergent type 1 poliovirus genotypes (data not shown). Transversions at fourfold degenerate 3CPs were predicted to gradually saturate, and by 100 years they would be fixed into
23% of fourfold degenerate 3CPs (Fig. 7C), well below the saturation limit (
37% in VP1) observed across highly divergent type 1 poliovirus genotypes. At 100 years, the increase in total 3CP substitutions was predicted by the simulation to be driven exclusively by accumulation of transversions, which would constitute an increasing proportion of the total number of substitutions (Fig. 7C).
|
|
|---|
10-fold slower, and the more stochastic K'a clock was
30-fold slower. The Kt, K's, and A's clocks are useful in high-resolution molecular epidemiologic studies supporting the current drive to eradicate polio (85). In contrast, the B's clock has potential for estimating more distant evolutionary events (26, 32), better resolving different poliovirus genotypes, and providing a broader overview of the global patterns of poliovirus circulation and the occurrence of displacement events. In addition, use of sequence alignments showing only transversion differences can facilitate recognition of recombination breakpoints that would otherwise be obscured by the high background of transitions (J. Jorba, unpublished results). The precision of molecular clock calibrations is a function of the evolution rate, the time interval for sampling, the sequence length (21), and the availability of biological material with a well-defined evolutionary history. Each of these factors was favorable in this study. The Kt evolution rate of 1.03 x 10–2 substitutions/site/year for the poliovirus P1/capsid region is among the highest reported for any RNA virus (17), is lower than the fastest rate reported for equine infectious anemia virus during persistent infection (gp90env; 5.5 x 10–2) (48), is similar to rates reported for foot-and-mouth disease virus (complete genome; 8.2 x 10–3) (13), influenza A virus (hemagglutinin; 6.3 x 10–3) (27), porcine reproductive and respiratory syndrome virus (ORF 3; 5.8 x 10–3) (28), and enterovirus 71 (VP1; 4.2 x 10–3) (5), and is higher than the rates reported for other rapidly evolving viruses, such as human immunodeficiency virus type 1 (gp160env; 2.4 x 10–3) (44), human respiratory syncytial virus (G; 1.9 x 10–3) (98), and hepatitis C virus (NS3; 1.5 x 10–3) (67).
Our studies were also facilitated by the availability of isolates representing well-defined evolutionary pathways obtained over a 10-year period. The 1981 isolates from Venezuela were closely related to the true imported founder virus, and increasingly sensitive poliovirus surveillance in the Americas (16, 64) yielded an ample number of well-documented direct descendants of the founder virus. Although comprehensive sequence data sets now exist for poliovirus isolates from other regions of endemicity (7, 23), the evolutionary pathways in most areas of endemicity are complex, and isolates closely related to the overall common founder are not generally available (31). The evolutionary pathways of outbreaks are usually well defined, but most outbreaks are of short duration (usually <1 year), precluding extended observation (8, 63).
Sequences within the P1/capsid region comprise the widest interval that can be used routinely to follow poliovirus lineages. Recombination within the P2 and P3 noncapsid and 5'-UTR regions is frequent among circulating polioviruses (33, 39, 51, 90), introducing a potential source of distortion to molecular clocks. Indeed, multiple rounds of recombination with heterologous species C enteroviruses (6) had occurred during the 10 years of circulation of the Andean lineages, but none of the breakpoints mapped within the P1/capsid region (unpublished results). Intertypic or intergenotypic recombination within the P1/capsid region occurs only rarely, and the known crossovers map within 90 nt of the start codon (unpublished results) and 120 nt of the P1/P2 junction (3, 52, 55).
The global evolutionary record for poliovirus is fragmentary, consisting of multiple geographically separate genotypes (40). No extended evolutionary record exists for any specific genotype, and the divergence times for even the most closely related genotypes are presently unknown. Reasonable estimations of divergence times will require a better understanding of the evolutionary range of the dating methods to be used. To facilitate these analyses, we applied two different approaches to overcome the limitations in the available biological record. Rerooting of the phylogenetic trees of the natural sequences enabled us to model the saturation of As and Bs substitutions for up to 20 years, because the early divergence of the Andean lineages effectively doubled the total evolutionary time available for analysis. The initial rates of As and Bs substitutions estimated by rerooting closely corresponded to the rates obtained by linear regression and other methods. Beyond 20 years, it was necessary to apply a second approach, using computer simulation based upon the inferred frequencies of specific base changes. The saturation curves for 3CP transitions (>99% of which were As substitutions) and 3CP transversions (95% of which were Bs substitutions) generated by the two different approaches were in excellent agreement over the 0- to 20-year period open to comparison and were similar to the predictions of the K2P formula at a transition/transversion ratio of 10 (26, 42). The predictions of the simulation were also consistent with the observed VP1 sequence differences across divergent poliovirus genotypes. However, such comparisons explore the limits of 3CP saturation but provide no additional information about the saturation kinetics. Further investigation of the time course for 3CP saturation, especially for transversions, will require comparisons across closely related genotypes.
The simulation predicted that the rate of accumulation of Bs substitutions will decline over time and will be
60% of the initial rate at 100 years. Because the CIs for the estimated divergence times will widen as Bs substitutions approach saturation, determination of deeper evolutionary relationships beyond the useful range of the poliovirus Bs clock will require calibration of even slower molecular clocks. Poliovirus clocks based on the component rates for fixation of Ka substitutions, similar to earlier studies on flaviviruses (96), may hold promise. The observation that fixation of P1/capsid amino acid replacements occurs primarily by genetic drift suggests that it may be possible to develop refined poliovirus protein clocks with only limited recourse to site-specific rate adjustments.
A general problem with rapidly evolving RNA viruses is that broader evolutionary trends are difficult to discern (36), and phylogenetic trees of natural isolates are often complex. Phylogenetic trees summarizing only P1/capsid Bs substitutions across poliovirus genotypes have revealed previously unrecognized relationships among some genotypes, and the dates of divergence of those genotypes have been estimated using the Bs clock (unpublished results). These relationships reflect past patterns of transmission during the period of widespread polio endemicity, conditions quite different from the current phase of localized endemicity (85). Understanding of past transmission pathways may contribute to a better understanding of current risks of poliovirus spread from the remaining foci of endemicity. Similar analyses of other rapidly evolving RNA viruses, especially agents of vaccine-preventable diseases known to have significant transition biases (13, 71, 97), may aid in the development of improved strategies for their control.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency.
Published ahead of print on 20 February 2008. ![]()
Supplemental material for this article may be found at http://jvi.asm.org/. ![]()
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»