Journal of Virology, September 2008, p. 8947-8950, Vol. 82, No. 17
0022-538X/08/$08.00+0 doi:10.1128/JVI.00101-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
,
Arnold J. Levine, and
Raul Rabadan*
Institute for Advanced Study, Einstein Dr., Princeton, New Jersey 08540
Received 15 January 2008/ Accepted 6 June 2008
48 for 15 years. For a null Poisson process, this gives an extremely low P value of 6.6 × 10⁻²⁰. Note also that 15 years is actually a lower bound on the true evolutionary time between these two segments, since their latest common ancestor is likely to predate both; this makes their virtual identity even more improbable. To visualize this anomaly, consider the plot in Fig. 1.
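As an illustration of the kind of calculation involved, the lower tail of a Poisson distribution can be computed directly. This is a minimal sketch, not the authors' code: the function name is ours, reading 48 as the expected number of differences is our interpretation of the text above, and the observed count used in the example is a placeholder because the observed value is not given in this part of the text.

```python
import math


def poisson_lower_tail(observed: int, expected: float) -> float:
    """Return P(X <= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    total = term
    for k in range(1, observed + 1):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        total += term
    return total


# Placeholder inputs: an expected count of 48 and a hypothetical observed count of 1.
p_value = poisson_lower_tail(observed=1, expected=48.0)
```

A small lower-tail probability means that so few differences would almost never arise by chance if substitutions accumulated at the typical rate.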
FIG. 1. Hamming distance versus distance in years for PB2 segments from the avian database. Each plus represents a pair of different PB2 sequences of influenza viruses isolated from avian hosts. The x axis gives the difference in years between the times of isolation of the two viruses, and the y axis gives the Hamming distance between their sequences (number of nucleotide [nt.] differences divided by the length of the segment). The dashed line represents a Jukes-Cantor fit for the expected Hamming distance. An apparently slowly evolving pair is shown.
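For readers who want to reproduce a curve of this kind, the standard Jukes-Cantor relation gives the expected fraction of differing sites, p = (3/4)(1 − e^(−4d/3)), where d is the expected number of substitutions per site separating the two sequences. The sketch below is an assumption-laden illustration: the function names are ours, and the substitution rate in the example is a hypothetical placeholder rather than the fitted value used for Fig. 1.

```python
import math


def jukes_cantor_expected_distance(substitutions_per_site: float) -> float:
    """Expected fraction of differing sites under the Jukes-Cantor model."""
    # expm1 keeps precision when the number of substitutions is very small.
    return -0.75 * math.expm1(-4.0 * substitutions_per_site / 3.0)


def expected_distance_after(years: float, rate_per_site_per_year: float) -> float:
    """Expected Hamming distance for two sequences separated by `years`.

    Assumes substitutions accumulate linearly at `rate_per_site_per_year`,
    which is a simplification chosen for this sketch.
    """
    return jukes_cantor_expected_distance(rate_per_site_per_year * years)


# Hypothetical rate, for illustration only.
distance_15_years = expected_distance_after(years=15.0, rate_per_site_per_year=2e-3)
```

The curve is nearly linear for short separations and saturates at 0.75 for long ones, which is why a pair with a large time separation and a distance near zero stands out.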
Another anomaly present in the influenza virus database relates to homologous recombination. This mechanism is generally believed to be extremely rare or nonexistent in the influenza virus and in negative-strand RNA viruses generally (4, 5) and has never been observed experimentally. However, we found many sequences in the database that show very strong apparent evidence of homologous recombination. As a rough test for this, we divided the nucleotide sequence of each segment into two equal halves. For each pair of segments, we compared the number of nucleotide differences between them in the first half (i.e., 5' in the positive strand) with the number of nucleotide differences in the second half. The idea is that if two segments are nearly identical in one part of their sequence but very different in another part, this is strong evidence of homologous recombination, with the divergent parts explained by a recombination event.
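The half-by-half comparison described above can be sketched as follows. The function name is ours, and the sketch assumes the two sequences are already aligned, gap-free, and of equal length; a real pipeline would need an alignment step first.

```python
from typing import Tuple


def half_differences(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Count nucleotide differences in the first and second halves of two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mid = len(seq_a) // 2
    first = sum(a != b for a, b in zip(seq_a[:mid], seq_b[:mid]))
    second = sum(a != b for a, b in zip(seq_a[mid:], seq_b[mid:]))
    return first, second


# Toy example: identical first halves, completely different second halves.
toy_counts = half_differences("AAAACCCC", "AAAAGGGG")
```

A pair whose two counts are very unequal, as in the toy example, is the pattern this rough test flags as a candidate recombinant.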
In Fig. 2, we plotted a sample comparison for pairs of PB2 segments of viruses isolated from avian hosts. Most points cluster along the diagonal, as would be expected for roughly uniform evolution along the segment with no homologous recombination. However, there are some very significant outliers. For instance, the PB2 sequence of A/shorebird/DE/236/2003(H11N9) differs from that of A/shorebird/DE/231/2003(H9N4) by 6 nucleotides out of the first 1,155 and by 80 out of the second 1,155. Using a null hypergeometric distribution, this gives an extremely low P value of 1.6 × 10⁻¹⁸. As before, these outliers are so extreme that possible corrections allowing for slightly nonuniform evolution along the segment are irrelevant: no model would account for the difference distribution of such a pair as the result of a random fluctuation. Beyond such extreme outliers, a glance at Fig. 2 shows a larger scatter of points lying well off the diagonal "cloud," suggesting that many more sequences are potential apparent recombinants with less significant P values.
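One way to set up a hypergeometric test of this kind is to ask: if the 86 observed differences were scattered at random over the 2,310 sites, how likely is it that 6 or fewer would fall in the first 1,155? The sketch below computes that one-sided tail exactly. The function name and the one-sided convention are our assumptions; the text does not specify the exact test, so the result is of a similar magnitude to the reported P value but need not match it.

```python
from fractions import Fraction
from math import comb


def hypergeom_lower_tail(k: int, first_len: int, second_len: int, total_diffs: int) -> float:
    """P(at most k of total_diffs differences fall in the first region).

    Differences are assumed to be placed uniformly at random, without
    replacement, over first_len + second_len sites.
    """
    k = min(k, total_diffs)
    denominator = comb(first_len + second_len, total_diffs)
    numerator = sum(
        comb(first_len, i) * comb(second_len, total_diffs - i)
        for i in range(k + 1)
    )
    # Exact integer arithmetic, converted to a float only at the end.
    return float(Fraction(numerator, denominator))


# Counts quoted in the text: 6 differences in the first half, 80 in the second.
p_value = hypergeom_lower_tail(k=6, first_len=1155, second_len=1155, total_diffs=6 + 80)
```

Exact integer arithmetic avoids the underflow that a naive floating-point product of binomial coefficients would hit at these magnitudes.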
FIG. 2. Number of differences in nucleotides 1156 to 2310 versus the number of differences in nucleotides 1 to 1155 in PB2 segments from the avian database. Each plus represents a pair of different PB2 sequences of influenza viruses isolated from avian hosts. An apparently recombinant pair is shown.
There have been some historically reported cases of "frozen evolution" in the influenza virus. The most famous of these involves the reemergence in 1977 of H1N1 in the human population after an absence of 20 years. The viruses isolated in the former USSR and China in 1977 were virtually identical in their nucleotide sequences to H1N1 viruses from 1949 to 1950. (We readily detected all previously known "frozen viruses" in our search but omitted them from the list in Appendix S1 in the supplemental material.) In this case, it is believed that the term "frozen" applies literally; these viruses were probably stored in a laboratory for 27 years and then reintroduced into the population in a vaccination experiment gone wrong. There are additional examples involving common laboratory strains PR/34 and WSN, which appeared to reemerge unchanged in humans and camels in Mongolia (1, 23) in the 1980s and pigs in South Korea (7; Seo et al., unpublished) in 2004, respectively. These cases are believed to be explained by escaped vaccines in the former case and stock contamination in the laboratory in the latter.
What can account for the many "frozen" sequences reported here? One possibility is that some interesting biological mechanism is at work. For example, it is possible that the "frozen" viruses are mutating at a slower rate, perhaps because of a more faithful polymerase. To examine this possibility, we searched for amino acid mutations in the polymerase genes (and other genes) common to the "frozen" viruses but were unable to find any such mutations (data not shown). Given the lack of error correction for RNA to RNA polymerase, a mutation that dramatically reduces replication errors does not appear plausible. Another possibility is that these viruses have a much lower rate of replication, perhaps because they persist without replicating within host cells or even in the outside environment, but there is no known latency mechanism for RNA viruses, and the very long times (often decades) elapsed between isolations of nearly identical viruses make this kind of mechanism seem somewhat unlikely (12, 13, 15, 17-21, 24). A recent article argues against the likelihood of influenza virus persistence in the outside environment, such as environmental ice (22). It is important to note that the notion of "evolutionary stasis," which may or may not hold for influenza virus in certain hosts, is not relevant to these results; even viruses that are "static" at the amino acid level are expected to have normal rates of drift in synonymous third-codon nucleotide positions.
We believe that the most likely explanation for both of the anomalies reported here is stock contamination in the sequencing laboratories (or wherever the viruses are stored). If the virus stock containing virus A is contaminated with virus B, an experiment supposedly sequencing virus A is actually sequencing virus B, thus resulting in apparent near sequence identity between viruses A and B; if viruses A and B are separated by many years, this will appear as an anomalously low evolutionary rate. If viruses A and B are mixed in the stock, the reverse transcriptase reaction used during sequencing could jump between an A and a B template, resulting in an apparent homologous recombinant. This possibility is consistent with the fact that there is a very significant overlap between the sequences exhibiting apparent slow evolution and those exhibiting apparent homologous recombination; this overlap would be very difficult to explain on biological grounds but is natural if stock contamination has occurred. Also, old viruses (isolated before 1990) and viruses sequenced by laboratories in Asia, and especially China, are relatively prevalent among the anomalous sequences; it is tempting to speculate that such differences could reflect differences in laboratory protocols. Along the same lines, nearly all anomalous sequences come from avian and swine hosts; it seems natural to assume that viruses from human hosts are generally handled with greater care (because of the potential public health hazards resulting from their spread) and are thus less susceptible to stock contamination.
If stock contamination is indeed to blame for these anomalies, the results reported here could represent just the tip of the iceberg. This is because we would detect the contamination of viruses A and B only when A and B are sufficiently distant from each other in time of isolation (resulting in a "frozen" virus) and/or nucleotide sequence (possibly resulting in a "recombinant"). It is natural to assume, however, that most contamination events in fact occur between viruses that are relatively close to each other in both time and sequence, resulting in a reported sequence that is wrong but not wrong enough to be detectable by the present methods; this could perhaps account for the "off-diagonal clouds" in Fig. 2. Thus, the present results suggest that an unknown and possibly quite nontrivial percentage of the data in the influenza sequence database might be compromised, and it is our hope that some steps will be taken by the influenza virus research community to address the issue of quality control in the database. One simple, though certainly insufficient, measure would be to regularly resequence the viruses; we expect that in most cases involving apparent "recombinants," a new sequencing assay would result in a different sequence, since it seems unlikely that the reverse transcriptase jumps would occur in the same positions as before. Aside from such detection steps, more should be done in laboratories to prevent contamination from occurring in the first place. If the rapid growth in the influenza virus genome database can be accompanied by addressing these apparent quality control issues, the influenza virus research community will truly be in possession of an invaluable resource.
Published ahead of print on 25 June 2008.
Supplemental material for this article may be found at http://jvi.asm.org/.
These two authors contributed equally to this work.