Journal of Virology, September 2007, p. 9932-9941, Vol. 81, No. 18
0022-538X/07/$08.00+0 doi:10.1128/JVI.00674-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands,1 Erasmus Medical Center, Rotterdam, The Netherlands,2 Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands3
Received 29 March 2007/ Accepted 25 June 2007
The strains most commonly identified as the cause of outbreaks belong to genotype GGII.4. In The Netherlands, this was the case for 68% of all norovirus outbreaks that were characterized during 12 years of surveillance and for up to 81% of all health care-related outbreaks. Since their first detection in The Netherlands in January 1995, the GGII.4 strains have consistently been present in the Dutch population (46). These observations are in agreement with those of other surveillance studies worldwide (3, 4, 15, 17, 29, 36, 55).
During the past 15 years, four epidemic norovirus seasons have occurred, in the winters of 1995-1996, 2002-2003, 2004-2005, and 2006-2007. These worldwide epidemics were invariably caused by the predominant genotype, GGII.4, and were attributed to the emergence of new variant lineages of this genotype (4, 31, 35, 52, 53). These genetic variants, which have been identified previously by partial sequencing of either the RNA-dependent RNA polymerase (RdRp) or the capsid gene, have been given several names across the world. Here they are referred to by using the first year of their detection, supplemented where necessary with an extra suffix. The following variants have been identified: <1996, 1996, 2002, 2004, 2006a, and 2006b.
The pattern of emergence of new lineages followed by large-scale epidemics suggests that new variants obtained one or more decisive advantages over the previously circulating predominant variant. It is unknown what the nature of this advantage is, but its basis is likely to be found in VP1, since this protein is needed for essential properties and functions in the viral life cycle, such as antigenicity, host specificity, host cell binding and virus entry properties, and assembly of new particles.
Noroviruses have a positive-strand RNA genome of approximately 7.6 kb, which is subdivided into three open reading frames (ORFs). ORF1 encodes a polyprotein which is posttranslationally processed into the nonstructural proteins, including the RdRp. Conserved regions within the RdRp are commonly used as targets for diagnostic PCR assays. At the National Institute for Public Health and the Environment in The Netherlands (RIVM), region A (nucleotides 4279 to 4604; Lordsdale genome numbering [GenBank accession no. X86557]) is commonly used for genotyping outbreak strains. The second ORF (ORF2) encodes the major structural protein VP1. Ninety dimers of this capsid protein form a T=3 icosahedral shell (41). In the virion, a small number of copies of the protein encoded by ORF3 are present. The precise role of this protein is not clear, although it has been suggested that it functions both in upregulation of VP1 expression and as a histone-like protein in stabilizing the capsid-RNA complex (2, 19, 22).
The understanding of immunity against noroviruses remains limited. Between the different GGs and genotypes, antigenic differences as well as cross-reactivities have been demonstrated using virus-like particles and polyclonal antisera (20). Short-term immunity was reported, but preexisting antibodies were not protective against reinfection with the same genotype (25, 39, 56). Studies looking at neutralizing antibodies have not been possible due to the lack of cell culture or small-animal model systems (13). The high level of genetic diversity between different GGs and even between genotypes within the same GG resulting from the high mutation rate and from recombination events contributes to a large degree of antigenic diversity.
Host genetic factors determining the presence or absence of virus receptors also play an important role in susceptibility (21, 23). These receptors, the histo-blood group antigens, show virus strain-specific binding patterns, determining the ability of virus to infect potential host cells. Because noroviruses belonging to GGII.4 have the broadest range of binding to the histo-blood group antigens of all genotypes assayed to date, this may explain part of the relative success of these viruses (24). Other success factors may include a higher stability of the viral particles outside the host, a higher replication rate, or other factors that need to be investigated more thoroughly.
To obtain more insight into the genetic and structural bases of the selective advantage of new GGII.4 variants over the old GGII.4 variants, we determined the complete capsid sequences of a systematic sample of GGII.4 norovirus outbreak strains found in The Netherlands during 13 years of surveillance of viral gastroenteritis and studied their genetic diversity and predicted structure (46). Because a high-resolution three-dimensional (3D) model of GGII noroviruses was lacking at the time this study was initiated, a homology model of the capsid protein was made in silico based on the known 3D structure of the Norwalk virus (NV; GGI.1) capsid protein.
As the first step, all GGII.4 strains detected between January 1994 and December 2004 were selected from the database. A phylogenetic tree was made based on partial polymerase sequences of 145 nucleotides for the older sequences to 250 nucleotides for the strains isolated after 2001 (amplified with primer pair JV12 and JV13 or modifications thereof [region A]) (51, 53). The branching of the tree was used to guide the selection of outbreak strains for this study, with at least two strains per branch selected when sufficient material was available. Following reports of unusual outbreaks in the spring of 2006 (27, 28), six strains from this period were included in the study. A minimum spanning tree (MST) was made on the basis of 145 nucleotides of the polymerase gene, using the default settings in Bionumerics, to give an overview of the distribution of strains available in the database. An MST is a tree that connects all samples from a database in such a manner that the summed distance between all samples or branches is minimized. An MST is particularly useful for representing large (genomic) data sets with relatively high similarity levels and, as such, has been shown to enable representation of microevolution or population modeling (44, 45). Another condition is that the data set should represent the biodiversity for the organism under study and therefore should have been gathered over a time period that is short relative to the expected rate of change for the organism. During tree formation, the sample with the largest number of related samples is chosen as the root node, and subsequent branches are added in order of relatedness.
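The MST concept described above can be illustrated with a minimal Python sketch that builds a minimum spanning tree from pairwise nucleotide differences using Prim's algorithm. This is not the Bionumerics implementation: its default settings, including the rule for choosing the root node, are not reproduced here. The function names and the toy sequences are hypothetical and serve only to show how a summed tree distance arises.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs):
    """Prim's algorithm on the complete graph of pairwise Hamming distances.

    seqs: dict mapping strain name -> aligned sequence.
    Returns (edges, total), where edges is a list of (name_a, name_b, distance)
    and total is the summed distance over all edges.
    """
    names = sorted(seqs)
    if not names:
        return [], 0
    in_tree = {names[0]}  # arbitrary start; Bionumerics uses its own root rule
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the shortest edge joining a strain in the tree to one outside it.
        for a in sorted(in_tree):
            for b in names:
                if b in in_tree:
                    continue
                d = hamming(seqs[a], seqs[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        edges.append(best)
        in_tree.add(best[1])
    total = sum(d for _, _, d in edges)
    return edges, total


# Hypothetical 8-nucleotide fragments (not real data).
toy = {
    "strainA": "ACGTACGT",
    "strainB": "ACGTACGA",
    "strainC": "ACGTTCGA",
    "strainD": "TCGTACGT",
}
mst_edges, mst_total = minimum_spanning_tree(toy)
```

In this toy example the tree has three edges of one nucleotide each, so the summed distance is 3; the same quantity is what the total tree distance refers to in an MST of real sequences.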
Viral RNA isolation, cDNA preparation, and sequencing. Stool specimens taken in selected outbreaks were collected from the biobank. Specimens were stored as undiluted stools at 4°C, as 10% fecal suspensions at 4°C, and as RNA extracts frozen at –80°C. Where available, extracted RNAs were used as the template. When this failed to yield a PCR product, a new RNA extraction was done from diluted stool or fecal samples. Sequencing of these samples was done as described previously (11). Briefly, RNA was reverse transcribed in overlapping fragments, using avian myeloblastosis virus reverse transcriptase (Invitrogen), and subsequently, the obtained cDNA was amplified and sequenced using an ABI Prism BigDye Terminator v3.0 ready reaction cycle sequencing kit. The primers that were used were the following: TCTCAGATCTGAGCACGTGG (GR19A), AACAGTTAAGATTGGGACG (GR19B), GTCTCTTGTCGAGTTCTCACG (GR20), GGTGAATTGAACACTACCCAGC (GR21), CTCGACCCGTGCCCACAAAGC (GR22), CATTATAATGCACGCCTGCGCC (GR23), GGGTCAACCAGTTCTACACAC (GR24), CCAGCTGAAGAACCTAGTCTCG (GR25), ACGTGCCCAGGCAAGAGCCAAT (GR-JS1), TAACATCTACTATTATATGGG (GR-JS2), TCATATTTGCAGCAGTCCCA (GR20A), CTCTGAAGGTGCAGATGTTG (GR21A), TGTGAATCCAGACACAGGTAG (GR24A), and ACGGGCCGCATCTGCTGTGGAA (GR25A).
Data processing. DNA sequences were processed using SeqMan and EditSeq (DNAStar Inc., Konstanz, Germany) and aligned and analyzed using the BioEdit sequence alignment editor (Isis Pharmaceuticals Inc.). Alignments were done manually or using ClustalW alignment algorithms in BioEdit. Informative sites were determined by ProSeq 2.91. Sites were considered informative when at least two strains had an identical amino acid mutation in the alignment. Informative sites discriminating subsequent epidemic variants were also determined. Epidemic variants were defined as GGII.4 strains that were dominant for at least one outbreak season following initial detection. Silent mutations (nucleotide mutations which caused no amino acid mutation) or replacement mutations (nucleotide mutations which caused amino acid mutations) were determined using ProSeq 2.91, with an insertion considered a single mutation.
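The informative-site criterion can be sketched in a few lines of Python. This is an illustration only, not the ProSeq 2.91 procedure: it assumes that a "mutation" means any residue other than the most common one in an alignment column, and it treats gap characters like any other residue, so it does not reproduce the rule that an insertion counts as a single mutation. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def informative_sites(alignment, min_strains=2):
    """Return 1-based column positions where a non-majority residue is
    shared by at least `min_strains` sequences.

    alignment: list of equal-length aligned sequences.
    Assumption: the most common residue in a column is the unmutated state.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # most_common() is sorted by count, so the first entry is the majority.
        minority = counts.most_common()[1:]
        if any(n >= min_strains for _, n in minority):
            sites.append(col + 1)
    return sites


# Hypothetical toy alignment of five short peptide fragments (not real data).
toy_alignment = [
    "MKSAD",
    "MKSAD",
    "MRSAE",
    "MRSGE",
    "MKSAE",
]
toy_sites = informative_sites(toy_alignment)
```

Here positions 2 and 5 qualify because a minority residue is shared by two strains, whereas the single G at position 4 does not.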
Phylogenetic analyses were done in Bionumerics, using neighbor joining and the unweighted-pair group method using average linkages, with 1,000 bootstrap resamplings and no correction and with the gap cost set at 5%. Trees were plotted using the program Treeview (version 1.6.6) (37) or Treecon (version 1.3b) (50), with the exception of the MST, which was calculated as well as plotted in Bionumerics. For phylogenetic analysis of the partial polymerase and capsid sequences, the following sequences from GenBank were included: Grimsby strain (Hu/NoV/GGII.4/Grimsby/1995/UK; GenBank accession no. AJ004864), Farmington Hills strain (Hu/NoV/GGII.4/Farmington Hills/2002/USA; accession no. AY502023), Hunter strain (Hu/NoV/GII.4/Hunter 284E/2004/AU [accession no. DQ078794] for the capsid and Hu/NoV/GII.4/Hunter 532D/2004/AU [accession no. DQ078801] for the partial polymerase sequence), and Camberwell strain (Hu/NoV/GGII.4/1994/AU [accession no. U46500] and others for the capsid analysis).
Sequences were checked for possible recombination events by using Simplot (version 3.2), where the window size was varied from 80 to 150, with steps of 20 nucleotides, and a distance model with Jukes-Cantor correction was used. The capsid and RdRp sequences were analyzed independently, as well as after concatemerization of region A and the capsid sequences, to look for possible crossover in the joining region.
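The idea behind such a sliding-window scan can be sketched as follows: for each window along two aligned sequences, the proportion of differing sites is converted to a Jukes-Cantor distance, and a crossover would be expected to show up as an abrupt change in distance along the sequence. This is a minimal sketch, not Simplot itself; skipping gapped positions is an assumption of the sketch, and the function names and toy sequences are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def sliding_window_distances(query, reference, window=80, step=20):
    """Return a list of (window_start, jc_distance) along two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped (assumption).
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (q, r)
            for q, r in zip(query[start:start + window],
                            reference[start:start + window])
            if q != "-" and r != "-"
        ]
        if not pairs:
            continue
        p = sum(q != r for q, r in pairs) / len(pairs)
        results.append((start, jukes_cantor(p)))
    return results


# Hypothetical sequences: identical for 100 positions, then fully divergent.
toy_query = "A" * 100 + "C" * 100
toy_reference = "A" * 200
profile = sliding_window_distances(toy_query, toy_reference, window=80, step=20)
```

Varying `window` over several sizes, as was done in the analysis above, shows whether an apparent change in distance is robust to the window size or an artifact of it.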
Homology modeling. The three-dimensional structure of the NV capsid protein (PDB code 1IHM) (40) was used as a template for homology modeling of the GGII.4 capsid protein. Sequence alignments were made using the program MUSCLE (14). Compared to the NV capsid protein, the GGII.4 capsid protein has four insertions of three to seven amino acids which cannot be modeled. Generally, such insertions are located in surface-exposed loops of proteins. Based on the alignment of the two sequences and on the 3D structure of the NV capsid protein, the most likely places for insertion were predicted. The GGII.4 capsid protein also has one deletion of two amino acids compared to the NV capsid protein. The place of this deletion can be modeled and was chosen in the same way as that for the insertions. Homology modeling was performed with WhatIf/Yasara Twinset software (Yasara) (54).
Nucleotide sequence accession numbers. The complete capsid nucleotide sequences determined in this study are accessible in the DNA DataBank of Japan under accession numbers AB303922 through AB303941 and EF126961 through EF126966.
The MST for the partial polymerase sequences is shown in Fig. 1. The total distance of the tree is 230 nucleotides. This tree was not used to make the selection of the strains. However, it illustrates the grouping of the different variants and the positioning of the selected strains because it takes into account the localization of nucleotide changes. Strains OB2000043, EP2002006, OB2004003, OB2004012, and OB2004039 are considered outliers based on their positions in the neighbor-joining tree for all polymerase sequences (not shown).
FIG. 1. MST, based on alignment of 145 nucleotides of the polymerase gene sequences (region A) of all GGII.4 strains found in The Netherlands between January 1995 and August 2006 (n = 574). Colors represent different variants, as indicated in the figure. The sizes of the circles are drawn to scale with their member counts. The smallest circles represent 1 strain, and the largest circle (the center of the 2002 cluster) represents 70 strains. Genetic distances between the circles, in numbers of nucleotides, are given on connecting lines. The total distance is 230 nucleotides. Strains included in this study are indicated. The strains shown as circles with dotted lines are considered outliers.
FIG. 2. Neighbor-joining tree for complete capsid amino acid sequences. Type strains from GenBank were used in order to emphasize and confirm the groupings. Branch lengths are drawn to scale. Bootstrap values are percentages of 1,000 iterations.
Analysis of the capsid gene and changes over time. Informative sites in the capsid sequences were then determined. An alignment of all informative sites in the capsid is represented in Fig. 3. Sites were considered informative when at least two strains had an identical amino acid or nucleotide mutation in the alignment.
FIG. 3. Fixed amino acid changes (informative sites) in capsid sequences of GGII.4 outbreak strains collected between 1995 and 2006. The informative sites throughout the protein are listed from left to right. Amino acid numbering is indicated at the top, and outbreak dates (month-year of isolation, e.g., 01-95 is January 1995) and names are given on the left. From top to bottom, the same color indicates identical amino acids, and different colors are distinct amino acids. Colors were assigned by frequency; amino acids that occurred most are shown in green, followed by red, blue, and yellow (diminishing frequencies). The amino acids circled in magenta are part of the additional RGD motif present in the 2002 variant and the earliest strain. The arrow at the top indicates where an amino acid insertion occurred. The orange bars at the bottom indicate the locations of insertions in GGII.4 compared to NV and correspond to insertions 1 to 3 in Fig. 5A. Asterisks indicate hypervariable sites (with more than one mutation), and the arrows below the sequences indicate the sites where an amino acid mutation occurs at each variant change (not including 2006a). Domains are indicated in the bar below the figure.
TABLE 1. Informative sites in GGII.4 capsid sequences
As shown in Fig. S1 in the supplemental material and in Fig. 3, changes in informative sites occurred stepwise rather than gradually, with the steps coinciding in time with the emergence of each respective new epidemic variant (2002, 2004, 2006a, and 2006b). When consecutive variants were compared and EP2002006 was considered the precursor (<1996) of the 1996 variant, the numbers of stable amino acid mutations per emerging new variant were 14 (<1996 variant versus 1996 variant), 25 (1996 variant versus 2002 variant), 21 (2002 variant versus 2004 variant), 8 (2004 variant versus 2006a variant), 25 (2004 variant versus 2006b variant), and 23 (2002 variant versus 2006b variant). Both 2006a and 2006b were compared to the 2004 variant, since this was their temporal precursor. The 2006b variant was also compared to the 2002 variant, since these variants are genetically more closely related based on phylogenetic clustering (neighbor joining) (Fig. 2).
Prevalence of GGII.4 variants in The Netherlands from January 1995 to February 2007. Since the capsid changes showed clustering in time of GGII.4 variants and the capsid-based variant assignment was consistent with that based on the partial RdRp sequences used for routine surveillance, we plotted the presence of the different GGII.4 variant types in The Netherlands over time (Fig. 4). This figure shows that new variants invariably replaced their predecessors within 5 months of cocirculation.
FIG. 4. Graph showing the prevalence of GGII.4 variant types in The Netherlands between January 1995 and February 2007. Genotype and variant type assignments were done based on partial polymerase sequence data (region A) available from the Dutch norovirus surveillance database.
FIG. 5. Informative sites mapped on 3D model of GGII.4 capsid proteins. Sites with two distinct amino acid changes over the 12-year period are depicted in green, and sites with three or more amino acid changes are shown in red. The conserved RGD motif is shown in blue. (A) Dimeric subunit of two capsids, with one in gray and one in light blue. The extra RGD motif is indicated in yellow. The locations of the insertions compared to the NV capsid, which have not been modeled, are indicated by orange arrows 1 to 4. The brackets on the right indicate the different domains. The shell domain is indicated in grey, the P1 domain is shown in green, and the P2 domain is shown in blue. (B) Three capsid proteins, including a dimer with one-half of a neighboring dimer. The gray and light blue areas form one dimer, and the yellow capsid belongs to another dimer. The inserted RGD motif is indicated by the blue arrows.
[...] → E2002 → D2004 → E2006b), 255 (S1996 → G2002 → S2004 → G2006b), 340 (E1996 → G2002 → R2004 → G2006b), 407 (N1996 → S2002 → D2004 → S2006b), and 534 (A1996 → T2002 → A2004 → T2006b).
FIG. 6. (A) (i to v) Changes in informative sites (green) derived from amino acid comparisons between subsequent epidemic variants. For each comparison, two views of the capsid protein are given, with one frontal view and one from the rear. For the 2006b variant, two comparisons were made, with the phylogenetic precursor (2002) and the chronologic precursor (2004). (B) Amino acids that change between every subsequent variant group, with 2006a not included.
In the analysis of the informative sites, for both the nucleotide sequences and the amino acid sequences, mutations were fixed at a number of sites. Every successive variant had a number of distinct, lineage-defining mutations, which were found throughout the capsid sequence. The highest densities of informative sites were located on the surfaces of the protruding regions of the capsid (Fig. 3 and 5). The P2 domain had significantly more mutations than the rest of the capsid protein and, more specifically, many more replacement mutations (0.11 per nucleotide, versus 0.01 to 0.03 per nucleotide for the other domains of the capsid) (Table 1). This is clear evidence of a selective force that gives new variant viruses carrying certain mutations an advantage over previously circulating variants.
The subsequent variants of GGII.4 accumulated mutations in chronological order, and each descended from its predecessor in time, with the exception of the 2006b variant. At the amino acid level, this variant seemed more related to the 2002 variant (Fig. 2). This newly emerging variant is likely a descendant of a virus strain older than the 2004 variant that has accumulated quite a few mutations while not causing many outbreaks in the population.
The situation that is currently unfolding is highly intriguing. In the spring of 2006, two distinct new variants emerged, named 2006a and 2006b. These two new variants have been detected and reported worldwide, often in cruise ship-related outbreaks (27, 28). It has not been reported before that two norovirus variants can cause epidemic-scale outbreaks simultaneously. The 2006a variant shows 8 amino acid mutations compared to its predecessor, the 2004 variant, whereas the 2006b variant shows 25 amino acid mutations compared to the 2004 variant, its temporal and therefore immunologic predecessor. Both 2006 variants emerged almost simultaneously (42). It will be interesting to see if both variants continue to cause outbreaks simultaneously in the population or if one proves to be more successful than the other, perhaps with differing patterns across the world.
The viral strains used for this analysis all originated from our outbreak surveillance database. Strains that are intermediate between the epidemic variants are likely to have reduced viral fitness and are therefore less likely to be detected on the basis of sampling from outbreaks. Although we did look for intermediate strains bridging the different variants by choosing to sequence a number of outliers from the polymerase alignment, no capsid sequences that could be considered intermediates between the different variants were found. EP2006006 does not fit with any variant of the strains included in this study. It does, however, show resemblance to the older strains from GenBank that were included in the neighbor-joining tree. Since no real intermediate strains were found, the origin of emerging variants or the reservoir in which they accumulate their defining mutations thus remains a subject for speculation. The most logical place is the general population. While not causing (many) outbreaks, strains may circulate in the population and not come to the attention of surveillance, slowly accumulating mutations until the built-up genetic variety results in enough antigenic variety to be able to successfully cause (more) outbreaks and become a dominant variant. Alternatively, animal reservoirs, a limited number of which have been recognized (7, 8, 16), or chronically infected patients (18, 34) may be places where the virus can accumulate mutations.
Neutralizing epitopes were previously reported for the surface-exposed P2 domain, and a role in antigenicity was indicated for this domain in several studies with human as well as animal caliciviruses (6, 9, 30, 32, 33, 38, 43, 48, 49).
Tan and coworkers reported the conserved RGD motif to be involved in host cell binding (12, 47). Highly variable regions were found in close spatial proximity to the conserved RGD motif (Fig. 5). Tan and coworkers also reported three amino acids, neighboring the RGD motif, which were suggested to have a role in ligand (histo-blood group antigen) binding specificity (47). One of these surrounding amino acids, designated IV in their paper, is an informative site in our study. Before the 2002 variant, this amino acid was Q376, and it mutated into E376 from the 2002 variant onward. Studies are needed to determine if these mutations lead to changes in host binding specificities.
A second RGD motif (amino acids 339 to 341) was present in the earliest strain sequenced as well as in the 2002 variant. Since it was absent from variants after 2002, it does not seem to confer a great binding advantage. The location of this motif, in spatial proximity to the reported conserved RGD motif on the surface of the molecule and as an insertion compared to the NV genome, does suggest a possible role in ligand binding.
The five amino acids that were informative when comparing all chronologic sets of variants were spread over the capsid. One that stands out is amino acid 340 (E1996 → G2002 → R2004 → G2006b), which in the 2002 variant was also part of an additional RGD motif. The functional implications of these mutations remain to be determined. For our structural analyses of the polymorphisms in the GGII.4 variants, we used a computer-derived model of the VP1 protein. After submission of the present study, Cao et al. published a paper on the cocrystallization of the P protein of a GGII.4 strain norovirus with its receptor (5). No differences between our computer model and this high-resolution structure were found that affect the data presented here.
One could speculate that the location and positioning of the P2 domain of the capsid might explain part of the great prevalence of mutations in this area. The protruding region is connected to the shell domain with a hinge region, and an additional point of flexibility between P1 and P2 was reported (10, 40). This provides flexibility to slightly adjust the position of the protruding region on top of the shell domain if needed, allowing for more conformational changes and therefore for more mutations in this region than in the rest of the protein (10). This does not explain the epidemiological observations, however, and therefore we do not think it is the complete story.
Similarly, a possible advantage that new lineages of GGII.4 might have obtained by the accumulation of mutations is increased stability. However, even though increased stability of the viral particles outside the host would increase the number of infectious particles of the more stable variant available for infection, it does not explain the rapid and complete replacement of previous variants that circulated in the population (31). Improved binding or a broadened host range also does not provide a tight explanation for the replacement of previously circulating strains.
The most likely advantage for new variants over older ones is that of immune evasion. Noroviruses, particularly strains of GGII.4, are highly prevalent in the population. During epidemic seasons, up to 86% of norovirus outbreaks were caused by the predominant genotype, GGII.4, followed by a sharp drop in the prevalence of this genotype in the subsequent season (46). Then, only after the emergence of a genetically distinct new lineage of this genotype, the prevalence of GGII.4 strains rose again to cause a new epidemic.
A similar pattern of so-called epochal evolution has been described very elegantly for influenza A virus (H3N2) (26), where periods of phenotypic stasis are separated by the stepwise emergence of phenotypically distinct new variants, as we also see here for noroviruses. During the periods of phenotypic (and antigenic) stasis, neutral or almost neutral mutations do occur and accumulate if they are beneficial or at least not disadvantageous. For influenza virus, this pattern of evolution and emergence of genetically novel variants is attributed to host population immunity and subsequent antigenic escape by the virus. The striking parallel observed here for norovirus suggests that this pattern of epidemics is driven by (population) immunity as well.
No long-term immunity to norovirus infection has been reported so far. Short-term protective antibodies have been reported, however, and repeated exposure, which is likely to occur with the high prevalence of norovirus, will lengthen the duration of specific protection. Studies with NV in volunteers suggested that immune protection wanes after 6 months without reexposure (25, 39).
In agreement with the hypothesis that immunity to the predominant GGII.4 variant built up in the population, Nilsson and coworkers reported on the in vivo evolution of a GGII.3 strain infecting a chronically ill immunocompromised patient (34). They observed the accumulation of amino acid mutations in the capsid protein and suggested that these changes gave rise to a new phenotype through immune response-driven evolution. Similar to our findings, they found most amino acid mutations in the P2 domain of the capsid. This observation supports the idea that new variants may emerge from chronically infected patients.
The data presented in this paper underpin observations that the elevated numbers of norovirus outbreaks in the winter seasons of 1995-1996, 2002-2003, 2004-2005, and 2006-2007 (4, 31, 46, 53) were mainly, if not solely, due to the emergence of new variants of the GGII.4 genotype. The gradual increase in nucleotide mutations in the sequences of norovirus GGII.4 strains confirms that genetic drift occurs in the virus. Additionally, the stepwise fixation of numbers of amino acid mutations in the capsid of this predominant genotype, mainly in the surface-exposed P2 domain, is likely to be caused by selective pressure due to population immunity, which resulted in emerging variants which have caused worldwide epidemic rises in outbreak numbers.
Further immunological studies of this variation in the capsid protein are urgently needed to shed light on the mechanisms of immune evasion utilized by the most prevalent genotype of norovirus.
Published ahead of print on 3 July 2007.
Supplemental material for this article may be found at http://jvi.asm.org/.