Journal of Virology, July 2008, p. 6427-6433, Vol. 82, No. 13
0022-538X/08/$08.00+0 doi:10.1128/JVI.00471-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Centre for Infectious Diseases, University of Edinburgh, Summerhall, Edinburgh EH9 1QH, United Kingdom,1 Institute of Virology, University of Bonn, Sigmund-Freud-Strasse 25, D-53105 Bonn, Germany,2 Department of Virology, University of Helsinki, and Helsinki University Hospital Laboratory Division, Haartmaninkatu 3, FI-00290 Helsinki, Finland3
Received 4 March 2008/ Accepted 7 April 2008
B19 virus has respiratory, vertical (where infection may lead to intrauterine anemia or fetal death [hydrops fetalis] [4]), and parenteral (often through the contamination of plasma-derived blood products [3]) routes of transmission. Primary infections are characterized by an intense viremia of short duration, a rapid resolution following seroconversion for antibody to B19 virus, and lifelong immunity from reinfection (23). In Western countries, the frequency of B19 virus exposure rises steeply with age in individuals between the ages of 1 year and the late-teenage years to approximately 60 to 70% (13, 28, 33, 44).
Little is known about the origin, past epidemiology, or evolution of B19 virus. However, the lifelong persistence of viral DNA in tissues (11, 15, 19, 22, 30, 41) may reveal what variants circulated in previous decades of life when primary infections occurred (30). There are three genotypes of B19 virus, differing in nucleotide sequence by approximately 13 to 14% (6, 16, 29, 37); genotypes 1 and 2 are found in Europe, the United States, and other Western countries, while genotype 3 is restricted to sub-Saharan Africa and South America (6, 31, 34). Sequences of both genotype 1 and 2 were found in tissues of persistently infected individuals, although genotype 2 detection was strictly limited to those born before 1973, indicating that the circulation of this genotype ceased in Finland and Germany after this date (30). A similar restriction of genotype 2 to those born before 1963 was found in study subjects from Scotland (27), indicating some commonality in the transmission networks and dynamics of B19 over large areas of Europe.
The relatively dynamic picture of B19 virus populations suggested by these findings has reawakened interest in the evolution and molecular epidemiology of B19 virus. To estimate the times of origin of genotype 1 variants that replaced genotype 2 in Europe, we created and analyzed a large data set of genotype 1 VP1/VP2 sequences of plasma-derived (n = 33) viruses. We additionally studied tissue-derived variants of B19 virus to compare the evolution of horizontally transmitted, exogenous virus populations with that of virus populations occurring during persistent infection in vivo. The rate of the sequence change of B19 virus was several times higher than previously determined (31, 38), suggesting a much shorter time frame for the emergence of genotype 1, which fits better with the "bioportfolio" evidence for its replacement of genotype 2 in Western countries.
Amplification of B19 virus sequences. DNA was extracted from tissue and plasma/blood product samples by standard methods (27, 30) and screened for B19 virus DNA sequences by B19 virus-nested PCR (26). The whole of the open reading frame 2 (ORF2) gene was amplified from positive samples by using three sets of nested primers (see Table S2 in the supplemental material). Amplicons were sequenced using the BigDye Terminator kit (Applied Biosystems), and sequences were assembled and aligned in Simmonic Sequence Editor v1.5 (40). The sequence data set was supplemented with published sequences with annotated sample dates and sources (i.e., plasma or tissue) or provided by the investigators (see Table S3 in the supplemental material).
Screening for recombination and positive selection. All sequences were screened for recombination through the detection of phylogeny violations between trees constructed from sequential 501-base fragments, increasing by 252-base increments through the ORF2 region by using a TreeOrder scan in a Simmonic package. The only violations detected at a bootstrap value of >65% involved the previously described recombinant sequence AN66 (38), which was excluded from further analysis. No evidence for positively selected amino acid residues among the remaining data set of 121 ORF2 sequences was obtained by using CodeML (45) or by using the HyPhy package (all P values were >0.01 [32]).
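The fragment layout used for the recombination scan can be sketched as follows. This is only an illustration of how successive 501-base windows advanced in 252-base steps would tile a coding region; the function name and the example region length are hypothetical, and the tree comparison performed by the TreeOrder scan itself is not reproduced here.

```python
def sliding_windows(region_length, window=501, step=252):
    """Return half-open (start, end) coordinates of successive fragments.

    Defaults follow the fragment size and increment quoted in the text;
    region_length is whatever alignment length the caller supplies.
    """
    windows = []
    start = 0
    while start + window <= region_length:
        windows.append((start, start + window))
        start += step
    return windows


if __name__ == "__main__":
    # Hypothetical alignment length, chosen only to show the tiling.
    for start, end in sliding_windows(1257):
        print(start, end)
```

Each fragment would then be used to build a separate tree, and conflicts in grouping between trees from different fragments flag candidate recombinants.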
Phylogenetic analysis and rate calculation. Phylogenetic trees of B19 virus ORF2 sequences were constructed by neighbor-joining from Jukes-Cantor (J-C)-corrected pairwise distances. Regression analysis was performed by using maximum composite likelihood (MCL) distances (calculated by MEGA4 [43] with a gamma distribution value of 0.179) between each sequence and the earliest dated sequence available, PVBPRO. The separate regression analysis of synonymous sites was performed by using J-C-corrected distances.
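As a rough illustration of the regression approach, the sketch below computes a J-C-corrected distance between two aligned sequences and fits an ordinary least-squares line of distance against sample year, whose slope is the estimated substitution rate per site per year. The function names and the example data are hypothetical. The sketch applies the J-C correction to all compared sites and does not reproduce the MCL distances calculated by MEGA4 or the restriction to synonymous sites.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """J-C-corrected distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only positions where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("J-C distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def regression_line(years, distances):
    """Least-squares slope and intercept of distance regressed on sample year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical sample years and distances from a 1973 reference sequence.
    slope, intercept = regression_line([1973, 1983, 1993], [0.0, 0.004, 0.008])
    print(f"estimated rate: {slope:.2e} substitutions per site per year")
```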
A Markov chain Monte Carlo (MCMC) method implemented in BEAST package version 1.4 (9) was used to independently estimate the rate of the sequence change of B19 virus (10). Dated sequence sets were run with chain lengths of 10 million under the SRD06 model of substitution (39) and the assumption of a constant population size. Separate runs using a strict molecular clock and a relaxed (uncorrelated lognormal) clock were carried out. All other parameters were optimized during the burn-in period. The output from BEAST was analyzed by using the program TRACER (http://beast.bio.ed.ac.uk/Tracer).
Nucleotide sequence accession numbers. B19 virus sequences were assigned the accession numbers EU478527 to EU478589 (see Table S1 in the supplemental material).
FIG. 1. Variability of ORF1 and ORF2 coding sequences measured as mean pairwise synonymous distances between sequences of the same genotype (A) and different genotypes (B) (note the different y axis scales between panels A and B). Analysis was carried out using 201-base fragments increasing by 24-base increments across each coding region. ARF, alternate reading frame.
TABLE 1. Comparison of sequence diversity levels in ORF2 and an alternate reading frame
FIG. 2. Phylogenetic analysis of plasma-derived (A) and autopsy/biopsy sample-derived (B) B19 virus sequences from this study (circles) and published sources (diamonds). Trees were constructed from pairwise J-C-corrected nucleotide distances by neighbor-joining. Symbols were colored based on the calendar year of collection (see the key). The robustness of groupings was calculated by bootstrap resampling of 1,000 replicates of the data; values of ≥70% are shown.
FIG. 3. Relationship between sample date and divergence (MCL-corrected pairwise distances) between the sequence of the reference strain of B19 virus, PVBPRO (collected in 1973), and plasma-derived (A) and autopsy/biopsy-derived (B) sequences. Previously published (open circles) and newly obtained (filled circles) sequences are shown. Each graph includes a line of best fit, calculated according to the linear region.
TABLE 2. Estimates of the rates of sequence change of B19 viruses by using regression and BEAST analyses
The same data sets were analyzed by the Bayesian MCMC method (10). For plasma-derived sequences, applying strict and relaxed molecular clocks predicted almost identical log likelihoods and nucleotide substitution rates (3.64 × 10⁻⁴ and 3.72 × 10⁻⁴ substitution per site per year) (Table 2). The relaxed clock assumption provided a mean branch substitution rate of 4.11 × 10⁻⁴ substitution per site per year (data not shown). Allowing a variable rate fitted the data significantly better than allowing a uniform rate (a log₁₀ Bayes factor of 5.428), although likelihoods and substitution rates were similar (Table 2), indicating that the uniform-rate assumption provides an adequate model for the evolution of B19 virus. The MCMC estimates of the substitution rate were, furthermore, close (within 5 to 15%) to those calculated by regression analysis (4.4 × 10⁻⁴ substitution per site per year) (Table 2), and the regression estimate lay within the highest posterior density (HPD) interval of the MCMC values. Predicted dates for the most recent common ancestor (MRCA) of the sequences analyzed in the plasma-derived data set were from 1956 to 1959 (the HPD range was 1941/1945 to 1968/1969) (Fig. 2).
The substitution rates calculated for autopsy-derived variants of B19 virus by the MCMC method were 10-fold lower than those calculated for plasma-derived viruses (Table 2). The lower bound of the HPD interval was close to zero, consistent with an extremely low rate of, or undetectable, sequence change among these sequences. These data are consistent with the lack of association between divergence and sample date by regression analysis (Fig. 2B).
Rate of sequence change of B19 virus during persistence. The lack of correlation between divergence and sample year observed for tissue-derived B19 virus variants may be accounted for by restricted or absent virus replication during persistence. The immediate problem with investigating this hypothesis is the lack of information about the time of primary infection in the study subjects from whom tissue samples were obtained. However, as a group, their most likely expected times of infection may be inferred from standard incidence and seroprevalence information on B19 virus from the United Kingdom and Finland. Both countries showed a period of peak incidence between 8 and 9 years of age, at which time, overall seroprevalence was 35% (i.e., approximately half that of the adult population) (28).
From the subset of study subjects whose ages at the time of sampling were recorded (see Table S1 in the supplemental material), we compared two models of sequence change. Model A assigns a mean time of infection of 9 years of age and assumes no sequence change during the time that the virus persisted. Model B assumes a sequence change after infection at the same rate as that of the exogenous, horizontally transmitted virus. Model A provides a convincing correlation between the estimated time of infection and sequence divergence at all and at silent sites (Table 3) (R² = 0.32 to 0.33; P < 0.01; substitution rates, 3.2 × 10⁻⁴ and 12.9 × 10⁻⁴ per site per year). Lower correlation coefficients likely arose from a greater scatter of the data associated with the introduction of a further variable (age of infection) and the lower number of subjects analyzed. The averages of the measured substitution rates, (3.7 ± 2.0) × 10⁻⁴ and (18.3 ± 8.5) × 10⁻⁴ substitution per site per year at all and at synonymous sites, respectively, were remarkably similar (although with higher standard errors) to estimates based on regression analysis for plasma virus (Table 2). No such association is evident in model B (concordant with the analysis of the larger data set) (Table 2).
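The contrast between the two models can be sketched in a few lines. Under model A, each tissue-derived variant is dated to the estimated year of primary infection (birth year plus 9 years); under model B, it is dated to the year of sampling, as for an exogenous virus that continues to change. The subject records below are entirely hypothetical and were constructed so that divergence tracks the estimated infection year; the function names are likewise illustrative and do not reproduce the analysis behind Table 3.

```python
# Assumption taken from the text: model A places infection at 9 years of age.
ASSUMED_AGE_AT_INFECTION = 9


def model_a_time(sample_year, age_at_sampling):
    """Model A: no change during persistence, so use the estimated infection year."""
    birth_year = sample_year - age_at_sampling
    return birth_year + ASSUMED_AGE_AT_INFECTION


def model_b_time(sample_year, age_at_sampling):
    """Model B: change continues at the exogenous rate, so use the sample year."""
    return sample_year


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient (0.0 if either variable is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


# Hypothetical subjects: (sample year, age at sampling, divergence from reference).
SUBJECTS = [
    (2004, 70, 0.0012),
    (2005, 50, 0.0096),
    (2003, 30, 0.0168),
    (2006, 20, 0.0220),
]


def compare_models(subjects):
    """Return (R^2 under model A, R^2 under model B) for divergence against time."""
    divergences = [d for _, _, d in subjects]
    times_a = [model_a_time(year, age) for year, age, _ in subjects]
    times_b = [model_b_time(year, age) for year, age, _ in subjects]
    return r_squared(times_a, divergences), r_squared(times_b, divergences)


if __name__ == "__main__":
    r2_a, r2_b = compare_models(SUBJECTS)
    print(f"model A R^2 = {r2_a:.3f}, model B R^2 = {r2_b:.3f}")
```

Because tissue samples are collected within a narrow span of years from subjects of very different ages, the sample year carries little information about when each variant was acquired, which is why model B would be expected to fit poorly if sequence change stops after infection.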
TABLE 3. Comparison of two models of B19 sequence change during persistence
There are two likely reasons for the observed discrepancies in rates between studies. First, previous analyses of ORF2 included regions of suppressed variability in VP1 containing overlapping gene sequences, which effectively drove SSV in the VP1 coding frame down to close to zero. The inclusion of these relatively invariant regions in previous analyses of the gene encoding VP1/2 (31, 38) substantially reduced the overall substitution rates calculated.
The second reason for lower estimates in previous studies is the inclusion of sequences derived from biopsy or autopsy tissue samples (e.g., 11 of 38 VP1/2 sequences analyzed by Shackelton and Holmes [38]). However, we have shown that sequence change in B19 virus variants during the time of virus persistence is not modeled adequately by assuming that they have the same substitution rate as horizontally transmitted viruses (Table 3, model B). To demonstrate the effect of their inclusion, we analyzed a combined data set of plasma- and tissue (autopsy)-derived B19 virus variants by the MCMC method and obtained a composite substitution rate for B19 virus of 1.66 × 10⁻⁴ (HPD, 1.17 × 10⁻⁴ to 2.19 × 10⁻⁴) substitution per site per year, three times lower than the rate based on plasma-derived viruses. This figure would be even lower if regions of suppressed variability in VP1 had been included in the analyzed sequences.
Comparison with substitution rates of other viruses. The sequence change in VP1/2 and elsewhere in the B19 virus genome is constrained by extreme conservatism in the amino acid sequences of its encoded viral proteins (14). Almost all sequence change (even between the more divergent genotypes of B19 virus) occurs at synonymous sites, with extremely low dN/dS ratios for both major genes (31, 38). A fairer comparison of the underlying frequency and fixation of mutations in the B19 virus population with those of other viruses (whose protein sequence may not be under such constraint or may indeed be subject to positive selection) may be achieved by comparing synonymous-substitution rates. In the current study, regression analysis allowed the calculation of an unconstrained substitution rate for B19 virus of at least 1.8 × 10⁻³ synonymous substitution per site per year (Table 2). This rate lies within the range (of substitutions per site per year) reported for flaviviruses (0.4 × 10⁻³ to 4.9 × 10⁻³ [restricted to nonzero rates]), paramyxoviruses (0.7 × 10⁻³ to 4.8 × 10⁻³), and orthomyxoviruses (0.4 × 10⁻³ to 4.2 × 10⁻³) but was higher than ranges reported for togaviruses and reoviruses (0.1 × 10⁻³ to 0.78 × 10⁻³ and 0.6 × 10⁻³ to 1.4 × 10⁻³, respectively) and lower only than those reported for picornaviruses (2.5 × 10⁻³ to 12 × 10⁻³).
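The comparison above can be checked mechanically. The short sketch below places the B19 virus synonymous rate against each of the ranges quoted in the paragraph; the dictionary and function names are illustrative, and the numbers are simply those given in the text.

```python
# Synonymous-substitution rate for B19 virus quoted in the text (per site per year).
B19_SYNONYMOUS_RATE = 1.8e-3

# Ranges quoted in the text for RNA virus families (per site per year).
REPORTED_RANGES = {
    "flaviviruses": (0.4e-3, 4.9e-3),
    "paramyxoviruses": (0.7e-3, 4.8e-3),
    "orthomyxoviruses": (0.4e-3, 4.2e-3),
    "togaviruses": (0.1e-3, 0.78e-3),
    "reoviruses": (0.6e-3, 1.4e-3),
    "picornaviruses": (2.5e-3, 12e-3),
}


def position_in_range(rate, low, high):
    """Report whether a rate lies below, within, or above a reported range."""
    if rate < low:
        return "below"
    if rate > high:
        return "above"
    return "within"


def compare_with_families(rate=B19_SYNONYMOUS_RATE, ranges=REPORTED_RANGES):
    """Map each virus family to the position of the given rate in its range."""
    return {family: position_in_range(rate, low, high)
            for family, (low, high) in ranges.items()}


if __name__ == "__main__":
    for family, position in compare_with_families().items():
        print(f"{family}: B19 rate is {position} the reported range")
```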
The factors that contribute to the rapid sequence drift of B19 virus and potentially to the drift of other parvoviruses maintained by long, horizontal transmission chains are unknown. B19 virus, in common with other small viruses, is replicated by host cell DNA polymerase, and the enzyme's intrinsic proofreading activity might be expected to reduce the frequency of nucleotide misincorporations to that of host DNA copying. In primates, this mutation frequency leads to a synonymous-substitution rate of approximately 2 × 10⁻⁹ substitution per site per year (5), around a million times slower than that calculated for B19 virus. The difference in substitution rates cannot be entirely attributed to differences in the number of replicative cycles between B19 virus and its human host. Human DNA in the germ line is copied approximately 30 (females) and 200 (males) times per generation, an average of around 5 times per year. B19 virus cannot possibly undergo 5 million sequential genome replications in a year. With a minimum replication cycle in cells of at least 24 h, during which time at most 10 to 20 sequentially templated copyings of the viral genome during the synthesis of concatamerized replication intermediates are likely to occur (8), there would be a maximum of 7,000 sequential genome copyings per year. This total is nearly 1,000 times fewer than that expected if B19 virus and human DNA replication occurred with the same underlying mutation frequency.
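The order-of-magnitude argument in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (the upper value of 20 copyings per 24-h cycle is taken as the maximum); the constant and function names are illustrative. With unrounded inputs the shortfall comes out at several hundredfold, the same order of magnitude as the rounded figure given in the text.

```python
# Figures quoted in the text (rates are synonymous substitutions per site per year).
HOST_SYNONYMOUS_RATE = 2e-9
B19_SYNONYMOUS_RATE = 1.8e-3
HOST_GERMLINE_COPIES_PER_YEAR = 5   # average number of germ line copyings per year
MAX_COPYINGS_PER_CYCLE = 20         # upper value of the 10 to 20 quoted
CYCLES_PER_YEAR = 365               # one replication cycle per 24 h at the fastest


def rate_ratio():
    """How many times faster B19 virus changes than host DNA."""
    return B19_SYNONYMOUS_RATE / HOST_SYNONYMOUS_RATE


def required_copyings_per_year():
    """Sequential copyings B19 virus would need at the host per-copy error rate."""
    return rate_ratio() * HOST_GERMLINE_COPIES_PER_YEAR


def max_copyings_per_year():
    """Upper limit on sequential copyings of the viral genome in one year."""
    return MAX_COPYINGS_PER_CYCLE * CYCLES_PER_YEAR


def shortfall():
    """Factor by which achievable copyings fall short of the number required."""
    return required_copyings_per_year() / max_copyings_per_year()


if __name__ == "__main__":
    print(f"rate ratio: {rate_ratio():.2e}")
    print(f"required copyings per year: {required_copyings_per_year():.2e}")
    print(f"maximum copyings per year: {max_copyings_per_year()}")
    print(f"shortfall: about {shortfall():.0f}-fold")
```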
It is possible that the unidirectional synthesis of DNA during the rolling hairpin replication of parvovirus DNA, resolution, and synthesis of single-stranded genomic DNA sequences for packaging (8) may be associated with higher error frequencies than the bidirectional synthesis of host genomic DNA. Interestingly, substitution rates in RNA viruses are correlated with genome size, with lower rates occurring in viruses with larger genomes (18). This provides evidence that viruses, in common with larger organisms, have the ability to modulate mutation rates to provide the correct balance between replicative fitness and adaptability. One might speculate that some small DNA viruses, such as B19 virus, may have a similar ability to modify the mutation rates of the host cell, perhaps because its transmission and epidemiology, and thus its evolutionary constraints and pressures, might closely resemble those of respiratory viruses with RNA genomes. How this might be achieved is not known.
Recent spread of parvovirus B19 virus. Revised estimates for the substitution rates of B19 virus predict a relatively recent date for the MRCA of genotype 1 (Europe, the United States, Japan, Korea, and Brazil). An MRCA occurring in the late 1950s is consistent with the epidemiology of genotype 1 and genotype 2. The large-scale surveillance of B19 virus genotypes in Europe showed that genotype 1 infections are almost invariably detected in contemporary samples, with extremely few if any genotype 2 (or 3) infections (6, 7, 21). This contrasts with the frequent detection of genotype 2 variants in tissue samples or persistently infected (immunosuppressed) individuals (16, 24, 27, 30, 36). The hypothesis that these variants originate from primary infections when genotype 2 was the predominant circulating virus (30) is supported by the unusual age restrictions of type 2-infected subjects (those born before 1973 and 1963 [27, 30]). These observations are fully consistent with the predicted MRCA of genotype 1 in the late 1950s, which then almost entirely replaced genotype 2 by the early 1970s over large parts of the world. The epidemiological factors and the possible role of biological differences between genotypes, such as greater infectivity or transmissibility, that may have contributed to its dramatic spread and replacement of other genotypes are entirely unknown (12).
Nature of B19 virus sequence change during persistence. The observation that the sequence divergence of tissue-associated variants showed no correlation with sample collection dates demonstrates directly that the evolution of B19 virus during its persistence differs from that of horizontally transmitted viruses. However, paired samples from acute infection and after a period of persistence were not available to directly investigate the sequence change, and this study had to rely on estimates of times of infection based on those determined by previous seroepidemiological studies (13, 28, 33, 44). Nevertheless, the crude model used in the current study (primary infections at the age of 9 and no subsequent sequence change during persistence) fitted the evolutionary data closely and restored the correlation between sequence divergence and the estimated year of primary infection (Table 3).
Models A and B compared in this study are extremes, and the observed better fit of model A to the data clearly does not rule out other scenarios, such as the persistence of the virus with a much lower rate of sequence change arising from restricted virus replication, or ongoing rapid sequence change in a small subset of individuals and its absence in the majority. Observations of maintained strong cytotoxic-T-cell reactivity in B19 virus-seropositive individuals (17) and of low-level fluctuating viremia in 0.3 to 0.8% of healthy blood donors (20, 35) do indeed imply various degrees of ongoing replication in vivo. It remains uncertain whether individuals with reactivated virus may be infectious; it is therefore possible that the infrequent cases of genotype 2 infections recorded in Europe (7, 21, 24) may have arisen by the transmission of reactivated virus from persistently infected individuals rather than by the continued low-level circulation of this genotype in the community. Understanding the nature of the persistence of B19 virus is therefore important for future investigations of its molecular epidemiology and evolution and of the factors underlying the large-scale population replacements that we have witnessed in the past few decades.
We are grateful to Leena-Maija Aaltonen (Helsinki University Central Hospital [HUS]), Harri Laitinen (Finnish Red Cross), Esa Partio (Dextra Medical Centre), Annamari Ranki (HUS), Jeanne Bell and Francis Carnie (University of Edinburgh Department of Pathology), Kate Templeton (Royal Infirmary of Edinburgh), and Sally Baylis, Jacqueline Fryer, and Anthony Hubbard (National Institute for Biological Standards and Controls, London, United Kingdom) for provision of plasma, the plasma pool, clotting factor, and biopsy tissues used in the study.
Published ahead of print on 16 April 2008.
Supplemental material for this article may be found at http://jvi.asm.org/.