Journal of Virology, December 2004, p. 12817-12828, Vol. 78, No. 23
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.23.12817-12828.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Infectious Diseases, St. Jude Children's Research Hospital, Memphis, Tennessee,1 Department of Virology III, National Institute of Infectious Diseases, Tokyo, Japan2
Received 21 May 2004/ Accepted 14 July 2004
Early reports describing influenza B virus evolution focused on the HA gene. Sequencing of the HA1 region of a number of viruses suggested that the HA gene had evolved into two separate lineages sometime before 1983, with both lineages distinct from the HA genes of viruses that circulated from 1940 to 1973 (24, 25). These viruses were found to cocirculate at times in the same location (14, 25), although in some years viruses with HAs from each lineage could be found only in isolated regions of the world (20). However, the segmented genome of influenza viruses allows genetic exchange to occur by a process called reassortment, where gene segments from different viruses infecting the same host can mix. It was soon recognized that reassortment was occurring between cocirculating influenza B viruses and that the division into lineages extended to the other seven gene segments as well (16, 18, 20). Studies have shown that reassortment contributes to the diversity of influenza B viruses, and a potential functional advantage of these nascent reassortant viruses was postulated as an explanation for the recurrence of epidemics despite the relative antigenic stability of HA (16, 20).
Prior evolutionary studies of influenza B viruses have focused on a limited number of genes or gene segments and their relationships to each other (16, 18, 20). The two lineages of these various genes, as well as the viruses that bear them, have been referred to as either Yamagata 88-like or Victoria 87-like, after two viruses with representative HAs from the original descriptions, B/Yamagata/16/88 (Yam88) and B/Victoria/2/87 (Vic87) (25). However, the complexity generated by the reassortment of multiple gene segments has made identification of viruses by the HA lineage alone insufficient. For example, Yam88 has an NS segment from the same lineage as Vic87 (18) and is therefore a reassortant itself and not representative of later viruses from the same HA lineage. Differences in the gene constellation beyond the HA gene may be important in some circumstances, such as vaccine strain selection, when a change in NA by reassortment may alter the antigenicity of the vaccine if it is not changed to correspond (4, 20). In this report, we examined the evolution of entire influenza B virus genomes, comparing 31 viruses isolated from 1979 to 2003 from both Asia and the United States. We found great diversity from reassortment and described multiple genotypes based on the mixing of gene segments from both lineages. This approach has given us insight into the evolution of individual genes and of whole viruses.
TABLE 1. Virus abbreviations and accession numbers of sequences used in this studya
Phylogenetic and evolutionary analysis. Phylogenetic trees were constructed by using the neighbor-joining method (26) and bootstrap analysis (n = 500) to determine the best-fitting tree for each gene. Nucleotide distances were estimated by using the method of Tajima and Nei (29), and evolutionary trees were drawn using TREECON software (TREECON for Windows, version 1.3b) (30). Nucleotide and amino acid substitution rates for lineages of individual genes were estimated by determining the evolutionary distance (number of substitutions per site) of each virus from the putative node of divergence at the "root" and plotting it against the year of isolation. The slope of the best-fitting line was determined by regression analysis and reported along with the coefficient of correlation for the slope of the plot. As a measure of the strength of the data, a slope was reported only if it was positive and the coefficient of correlation was greater than or equal to 0.75. The percentage of nucleotide differences from the putative node of divergence that code for amino acid changes was calculated for the lineages of individual genes and compared using two-way analysis of variance corrected for multiple tests. A P value of <0.01 was considered significant for this comparison.
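The rate estimate described above amounts to an ordinary least-squares fit of distance from the root against year of isolation, with a reporting filter on the slope and the coefficient of correlation. The following is a minimal sketch of that calculation, not the software used in the study. The function names and the distance values are hypothetical and are invented purely for illustration; they are not data from this report.

```python
from math import sqrt

def fit_rate(years, distances):
    """Least-squares slope (substitutions per site per year) and correlation coefficient r."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return slope, r

def reportable_rate(years, distances, min_r=0.75):
    """Return (slope, r) only if the slope is positive and r >= min_r, else None."""
    slope, r = fit_rate(years, distances)
    if slope > 0 and r >= min_r:
        return slope, r
    return None

# Hypothetical example: invented root-to-virus distances for one lineage of one gene.
years = [1988, 1990, 1993, 1996, 1999, 2001]
distances = [0.004, 0.007, 0.012, 0.019, 0.024, 0.027]

result = reportable_rate(years, distances)
if result is None:
    print("rate not reported (slope not positive or r < 0.75)")
else:
    slope, r = result
    print(f"rate = {slope:.2e} substitutions/site/year, r = {r:.3f}")
```

Applying the filter inside `reportable_rate` keeps the reporting criterion (positive slope, coefficient of correlation of at least 0.75) separate from the fit itself, so the threshold can be varied without touching the regression.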
Nucleotide sequence accession numbers. Nucleotide sequence data accession numbers for sequences determined for this study are listed in Table 1, along with accession numbers for previously reported sequences utilized in this study (2, 3, 5, 9, 10, 13, 14, 16, 17, 22-25, 28, 31, 33) and unpublished sequence data for viruses Rus69 (D. Katinger, J. Romanova, and A. Egorov), Mie93 (R. Nerome, M. Ishida, M. Matsumoto, Y. Hiromoto, and N. Tanabe), and Mem97 (J. A. McCullers). Nucleotide sequence data accession numbers for sequences determined for this study but not included in Table 1 are as follows for the HA and NA genes, respectively: for B/Nashville/6/89, AF129895 and AF129907; for B/Nashville/45/91, AY581946 and AY581987; for B/Guangzhou/86/92, AY581947 and AY581988; for B/Houston/1/92, AF129899 and AF129912; for B/Sichuan/8/92, AF129898 and AF129911; for B/Houston/2/93, AF129900 and AF129914; for B/Nanchang/26/93, AF134911 and AF134906; for B/Nanchang/195/94, AY581948 and AY581992; for B/Memphis/18/95, AF129891 and AF129918; for B/Nanchang/3/95, AY581951 and AY581995; for B/Nanchang/15/95, AY581952 and AY581996; for B/Memphis/19/96, AF129905 and AF129920; for B/Memphis/21/96, AY581954 and AY581998; for B/Nanchang/20/96, AY581955 and AY581999; for B/Nashville/3/96, AF129906 and AF129922; for B/Nanchang/4/97, AY581957 and AY582004; for B/Nanchang/5/97, AF134915 and AF134910; for B/Nanchang/15/97, AY581958 and AY582005; for B/Nanchang/7/98, AY581960 and AY582007; for B/Nanchang/12/98, AY581961 and AY582008; for B/Memphis/8/99, AY581962 and AY582010; for B/Memphis/1/01, AY581964 and AY582013; and for B/Memphis/3/01, AY581965 and AY582014.
FIG. 1. Phylogenetic trees constructed using the neighbor-joining method and bootstrap analysis (n = 500) to determine the best-fitting tree for the PB1 and PB2 genes. Nucleotide distances were estimated by the method of Tajima and Nei (29), and evolutionary trees were drawn with TREECON software. The bar indicates a 2% difference at the nucleotide level.
FIG. 2. Phylogenetic trees constructed using the neighbor-joining method and bootstrap analysis (n = 500) to determine the best-fitting tree for the PA and NP genes.
FIG. 3. Phylogenetic tree constructed using the neighbor-joining method and bootstrap analysis (n = 500) to determine the best-fitting tree for the HA1 region of the HA gene.
FIG. 4. Phylogenetic trees constructed using the neighbor-joining method and bootstrap analysis (n = 500) to determine the best-fitting tree for the NA and NB genes.
FIG. 5. Phylogenetic trees constructed using the neighbor-joining method and bootstrap analysis (n = 500) to determine the best-fitting tree for the M1 and M2 genes.
FIG. 6. Phylogenetic trees constructed using the neighbor-joining method and bootstrap analysis (n = 500) to determine the best-fitting tree for the NS1 and NS2 genes.
Two separate PA genes were recovered from the quasispecies population of Nan56094, first egg passage (the original clinical material was not available for testing). Two viruses could be plaque purified from this stock, and seven of their gene segments were identical, but each possessed a different PA gene. The first virus isolated is considered to be B/Nanchang/560/94, and the PA genes are labeled "a" and "b" to differentiate them (Nan56094 contains the a gene). It is uncertain whether these two viruses existed separately in nature, whether they existed together as a quasispecies, or whether the second PA gene is a remnant of a different virus from which only this gene segment was recovered through laboratory reassortment during the initial passage. The two PA genes are found in separate lineages in the phylogenetic tree (Fig. 2).
Amino acid differences between lineages II and III. The predicted amino acid sequences of lineage II and III genes from this study were analyzed to identify sites where they consistently differed. This analysis was then extended to all full-length sequences in the Influenza Sequence Database (ISD) from viruses isolated after 1986. The PB1, PB2, PA, NB, and M2 genes had four, five, six, two, and four lineage-specific amino acids, respectively, all of which could be generalized to all sequences in the ISD (Fig. 7). The NP gene had three lineage-specific amino acids, and the NS1 gene had five lineage-specific amino acids, although some exceptions existed in the ISD for lineage II NP genes at position 21 and lineage II NS1 genes at positions 116 and 127. The HA1 gene had 11 lineage-specific amino acids, 6 of which could be generalized to all 722 full-length HA1 sequences in the ISD. The high number of exceptions may be due to increased antigenic variability of this protein or may be a function of the number of sequences available for examination. Interestingly, the NA gene had only one lineage-specific amino acid, although several other positions were very commonly lineage specific (for lineages II and III: position 44, E versus K; position 70, G versus E; position 71, V versus M; position 73, L versus F; position 88, P versus Q; position 106, T versus A). The exceptions are rare and appear to be reversions at those positions, most of which are in a set of viruses whose NA genes cluster together, i.e., Gua93, Nan56094, and Nan96. The M1 and NS2 genes had no lineage-specific amino acids; these genes separate into lineages at the nucleotide level but are highly conserved at the amino acid level.
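The search for sites at which the two lineages consistently differ can be illustrated as a column-by-column comparison of aligned protein sequences: a position counts as lineage specific only when every sequence in one group carries the same residue, every sequence in the other group carries a single different residue, and no exceptions occur. The sketch below assumes that the sequences are already aligned and of equal length. The function name and the toy sequences are hypothetical, and the positions it returns are simple 1-based alignment columns rather than the B/Lee/40 numbering used in Fig. 7.

```python
def lineage_specific_positions(group_a, group_b):
    """Map 1-based alignment column -> (residue in group_a, residue in group_b)
    for columns where each group is invariant and the two groups differ."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    specific = {}
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        residues_b = {seq[i] for seq in group_b}
        # Require a single residue within each group and a difference between groups.
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            specific[i + 1] = (next(iter(residues_a)), next(iter(residues_b)))
    return specific

# Hypothetical toy fragments, not real influenza B sequences.
lineage_ii = ["MKEAGV", "MKEAGV", "MREAGV"]
lineage_iii = ["MKKAEM", "MKKAEM", "MKKAEM"]

specific = lineage_specific_positions(lineage_ii, lineage_iii)
for position, (res_ii, res_iii) in sorted(specific.items()):
    print(f"position {position}: {res_ii} versus {res_iii}")
```

In this toy input, column 2 is excluded because the lineage II group is not invariant there (K or R), which mirrors the way a single exception in a larger database removes a site from the strictly lineage-specific set.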
FIG. 7. Amino acids specific for group 2 or group 3 lineages. a, numbering of amino acids is relative to B/Lee/40 beginning at the start of the coding region of each protein. b, differences were determined for the entire coding sequence of all proteins except HA, where only the region coding for HA1 was used (analysis of the limited number of full-length HA sequences available indicates that no lineage-specific differences are seen in the HA2 region). c, group 2 HA1 sequences in the ISD may also have a D, A, or K at position 56; a T at position 71; an N at position 148; a G at position 149; an N, R, or A at position 201; an S or D at position 208; or an I at position 261. Group 3 HA1 sequences may also have a K at position 56 or an S, D, or no amino acid at position 162A. d, amino acids in italics are lineage specific within the viruses studied in this report, but exceptions exist within sequences found in the ISD, as further detailed here. For this determination, 30 post-1986 sequences each were examined for PB1, PB2, and PA; 722 sequences were examined for HA; 36 sequences were examined for NP; 119 sequences were examined for NA and NB; 36 sequences were examined for M1 and M2; and 80 sequences were examined for NS1 and NS2. e, group 2 NP sequences in the ISD may also have a T at position 21. f, group 2 NB sequences in the ISD may also have a T at position 53. g, group 2 NS1 sequences in the ISD may also have a C at position 116 or a K at position 127. -, no amino acid.
TABLE 2. Evolutionary rates of influenza B virus genes and percentage nucleotide differences that cause amino acid changes
Genotypes of influenza B virus. At least 15 separate genotypes based on the lineage of the eight gene segments were identified from the 32 viruses examined in this study (Fig. 8). A 16th genotype is possible if the Nan560b PA is considered in the background in which it was isolated (data not shown). Although the selection of the viruses included in the study was not random, it is clear from these results that reassortment among influenza B viruses is a common event and that a great diversity of gene constellations exists. Of the 56 theoretical pairings of a lineage II gene segment with the other seven lineage III gene segments, a surprisingly high 47 actual pairings are seen. The lineage III NP segment is not seen paired with the lineage II PB2, NA, or M segments, and the lineage II PB1 segment is not seen paired with the lineage III PB2, PA, HA, NP, NA, and M segments. Thus, most possible pairings are present despite the small sample size, suggesting that few if any functional restrictions on reassortment exist.
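The pairing counts can be checked by simple enumeration. The sketch below assumes that a "pairing" is an ordered pair consisting of one gene segment from lineage II and a different gene segment from lineage III, a reading that reproduces the 56 theoretical pairings (8 segments times 7 partners). The variable names are illustrative only.

```python
from itertools import permutations

SEGMENTS = ["PB1", "PB2", "PA", "HA", "NP", "NA", "M", "NS"]

# Each pairing is written as (lineage II segment, lineage III segment).
theoretical = set(permutations(SEGMENTS, 2))

# Pairings stated in the text as not observed.
unobserved = (
    # Lineage III NP is not seen with lineage II PB2, NA, or M.
    {(seg, "NP") for seg in ("PB2", "NA", "M")}
    # Lineage II PB1 is not seen with lineage III PB2, PA, HA, NP, NA, or M.
    | {("PB1", seg) for seg in ("PB2", "PA", "HA", "NP", "NA", "M")}
)

observed = theoretical - unobserved
print(len(theoretical), len(unobserved), len(observed))  # prints: 56 9 47
```

The nine unobserved pairings listed in the text account exactly for the difference between the 56 theoretical and 47 observed pairings.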
FIG. 8. Genotypes of influenza B viruses. Hatched boxes represent lineage I genes, open boxes represent lineage II genes, and filled boxes represent lineage III genes. From top to bottom, the boxes within each virus diagram represent the lineage of gene segments 1 through 8, which code for PB1, PB2, PA, HA, NP, NA and NB, M1 and M2, and NS1 and NS2.
Analysis of the entire genomes of 31 influenza B viruses isolated between 1979 and 2003 revealed a high degree of genetic diversity generated by reassortment. Fourteen genotypes could be distinguished within these viruses. These genotypes appear to be the result of reassortment events between a theoretical genotype 2 virus that contributed seven gene segments to Yam88 and a theoretical genotype 3 virus similar to Vic87, which took all eight gene segments from lineage III (Fig. 8). While the Yam88 virus did not derive all eight gene segments from lineage II, a later reassortment event generated a set of genotype 2 viruses that circulated worldwide between 1996 and 2001 that did (Fig. 8). Individual genes, excepting the NS1 gene, were undergoing linear evolution during the 20-year period that was studied, although the rate of evolution differed depending on the lineage and the specific gene. The degree to which these nucleotide changes translated into amino acid changes also varied by lineage and by gene.
It is tempting to speculate that the sequence of reassortment and evolution of genotypes can be inferred from an examination of the gene constellation of individual viruses and the year of isolation. Thus, a genotype 3 virus may have acquired a lineage II NP segment prior to 1989 to generate genotype 7, which acquired a lineage II NS segment between 1989 and 1993 to generate genotype 9, which sequentially led to genotypes 11, 14, and then 15. Similarly, a theoretical genotype 2 virus likely gained a lineage III NS segment to generate genotype 6, which later took back a lineage II NS segment to form the group of genotype 2 viruses that circulated in the late 1990s. However, there is little evidence to support this set of assumptions, and the viruses could easily have been created another way. For example, it might be assumed that genotype 15 viruses were generated by reassortment between genotype 14 viruses and a source of a gene segment 6 from lineage III (Fig. 8), particularly since genotype 15 viruses cluster so closely to genotype 14 when the phylogenetic tree for HA1 is examined (Fig. 3). However, analysis of the trees for the other genes demonstrates that the PA (Fig. 2) and the M1 and M2 (Fig. 5) genes likely derive from a different set of viruses, as the genotype 15 virus genes are found in different sublineages from the genotype 14 viruses.
One manner by which reassortment could contribute to evolution is by providing a functional advantage for the new viruses. However, proof of a biological difference between the different genotypes, such as differences in growth or pathogenicity, is lacking. Therefore, the alternate explanation that reassortment is random and does not impact the fitness of the virus can also be argued. A number of observations from this study suggest this conclusion. Most possible pairings of gene segments between lineages were seen, and the absence of the others is likely explained by the sample size and the disappearance of some lineage III gene segments from circulation. Although the selection of viruses for study was not random, a large number of genotypes could be detected by studying relatively few isolates. The genes in different lineages are closely related; examination of the polymerase genes reveals that even the most divergent genes are 94% identical at the nucleotide level and 97 to 98% identical at the amino acid level. Finally, we could find no evidence for coevolution of genes, which might be expected if functional mismatches occur during reassortment. However, the dominance of lineage II gene segments over lineage III in terms of length of circulation and continued evolution supports a functional difference for at least some of these genes. The study of potential biological differences between artificial reassortants created by reverse genetics (11) will be necessary to answer this question.
One advantage of our method for selecting viruses was that representatives from both Asia and North America could be studied and compared. Both lineages of all genes were seen in both the Asian and the North American viruses, indicating frequent mixing between these pools of viruses. Examples from both regions are also seen in most of the identified sublineages. It is clear that multiple genotypes can circulate in a single location at once. Five different genotypes circulated together in China in 1993 and 1994, while at least two circulated in Memphis in 1996 (20). However, individual genotypes were more likely to be confined to a single region. For example, genotype 14 viruses were found only in Japan and China, while genotype 15 viruses were found only in the United States. This result is most likely an artifact of sampling bias, but it also may reflect regional circulation of certain genotypes following reassortment events. More viruses will have to be sequenced to understand the global distribution of genotypes.
One of the questions that this study can help to address is the sequestration of gene segments away from the general pool of viruses that are sampled annually. Lindstrom et al. (16) observed a 9-year gap in the lineage II NS gene segment between 1984 and 1993, a 5-year gap in the lineage III M gene segment between 1988 and 1993, and the disappearance of the lineage III M gene segment after 1993. This result is partially explained by the limited number of influenza B viruses which have been sequenced, as only seven sequences are available for the NS gene segment from viruses isolated between 1984 and 1993 (five at the time of the report), and thus the gap may be by chance. This appears to be the explanation for the gap in the lineage III M gene's appearances, since the sequences in this report add viruses to that lineage in 1989, 1991, 1994, and 1996. However, multiple instances of lineage gaps of 3 to 4 years can be found even with the expanded pool of sequences provided in this report. This question and the question of whether the lineage III genes that have not been seen in recent years are currently circulating at low levels in regions that are not frequently sampled or are gone from the population remain open.