Previous Article | Next Article ![]()
Journal of Virology, November 2006, p. 11274-11282, Vol. 80, No. 22
0022-538X/06/$08.00+0 doi:10.1128/JVI.01236-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
John W. Wilesmith,3
Nigel P. Ferris,1
Geoff H. Hutchings,1 and
Donald P. King1
Institute for Animal Health, Ash Road, Pirbright, Surrey GU24 0NF, United Kingdom,1 Division of Environmental and Evolutionary Biology, University of Glasgow, Glasgow G12 8QQ, United Kingdom,2 Animal Health and Welfare, Defra, 1a Page Street, London SW1P 4PQ, United Kingdom,3 Met Office, FitzRoy Road, Exeter, Devon EX1 3PB, United Kingdom4
Received 13 June 2006/ Accepted 30 August 2006
|
|
|---|
|
|
|---|
In 2001, the United Kingdom livestock industry was devastated by an epidemic of FMDV caused by the Pan Asia O strain of the virus (22, 23). The disease was rapidly and widely disseminated throughout the United Kingdom, and a total of 2,030 farms were declared "infected premises" (IPs). The extent of the outbreak and speed with which the virus spread hindered attempts to trace the exact origin of infection for many premises; numerous cases were simply attributed to "local spread." Samples of infected tissue were collected from the majority of IPs prior to the culling of the animals and stored at the Institute for Animal Health, Pirbright Laboratory. These samples provided a unique opportunity to study the microevolution of this virus strain over the 7 months that the epidemic persisted.
The microevolution of FMDV has been extensively investigated in cell culture, largely focusing upon serotype C, revealing high levels of genetic variability (2, 44, 50) and adaptability (21). Viral populations have been shown to evolve to resist antibody-induced selection (6, 14, 27, 49) and to exhibit what has been referred to as genomic "memory" (3, 43). They have also been shown to adapt to different multiplicities of infection (48), incurring the effects of Muller's ratchet (19), although the virus population can subsequently recover from detrimental effects of a low multiplicity of infection (20, 33). The virus population has also been shown to maintain defective RNAs (10) and to persist and coevolve alongside cell lines (13). All serotypes of FMDV have been shown to exhibit substantial levels of genetic diversity in the field, particularly within the nucleic sequence encoding the capsid proteins, over broad temporal and spatial scales (9). This is typified by a continuous accumulation of silent changes over time and a more restricted set of amino acid substitutions (36).
Thus, it is evident from previous studies, both in vitro (above) and in vivo (8), that FMDV mutation rates are high. It is much less clear to what extent the genetic diversity, anticipated to be generated over the course of infection within an individual animal, becomes fixed, is transmitted to other animals, and thereby accumulates over the course of an outbreak. Quantitative models of viral genetic change over short time periods suggest this rate of accumulation should be readily measurable and informative of transmission pathways at high resolution (26). Strong tests of this prediction would be provided by evidence that the consensus sequence (acquired directly from tissue samples, rather than through prior passage in cell culture), which reflects the average of the viral genetic diversity within a sample, evolves at a rate sufficient to enable reliable tracing of transmission pathways.
Over recent decades, studies considering the nucleotide sequence encoding the VP1 capsid protein of FMDV alone have provided valuable insight into the emergence of various strains and serotypes worldwide (7, 31, 45). Changes observed within VP1 over the course of single outbreaks have also been previously documented (11, 30, 38, 45, 54), but the relatively short length of the sequence encoding VP1 (633 nucleotides [nt]), together with the short infectious period, has limited the power of these analyses to reconstruct transmission pathways at high resolution. Complete genome analysis has the potential to resolve transmission histories and improve the precision of epidemiological investigations. The ability to determine the exact source of infection of IPs from the 2001 outbreak in the United Kingdom would help us understand the route of transmission of infection between farms. Such information is invaluable for the control of future epidemics.
Here we report 23 complete consensus genomes, sequenced directly from clinical epithelial samples collected from 21 IPs from the 2001 outbreak in the United Kingdom. These samples were carefully chosen with the aim of, firstly, characterizing the extent of variation within the outbreak, secondly, analyzing the relationship between time and accumulation of genetic change, and, thirdly, determining whether sequence data can map known transmission events and therefore aid the analysis of spatio-temporal clusters of IPs with uncertain histories.
|
|
|---|
![]() View larger version (22K): [in a new window] |
FIG. 1. Location of FMDV-infected premises investigated. The figure shows a map of Great Britain with total designated FMD-infected premises () in 2001; 20 of those from which virus has been sequenced are highlighted ( ).
|
|
View this table: [in a new window] |
TABLE 1. Details of the 23 FMDV consensus genomes sequenced in this study
|
RT-PCR amplification of samples. TRIzol (Invitrogen) RNA extraction was performed as described previously (42). Ten microliters RNA, 4 µl 10 mM primer UKFMD/Rev6 (5'GGC GGC CGC TTT TTT TTT TTT TTT3'), 4 µl 10 mM random hexamers (Promega), 2 µl 10 mM deoxynucleoside triphosphate mix was incubated at 68°C for 3 min and then on ice for 3 min. Eighteen microliters of freshly prepared RT mix (4 µl 10x RT buffer [Invitrogen], 8 µl 25 mM MgCl2, 4 µl 0.1 M dithiothreitol, 2 µl RNase OUT [Invitrogen]) was added to the sample, followed by 2 µl SuperScript III reverse transcriptase (Invitrogen). The sample was then incubated at 42°C for 4 h, after which the reaction was stopped by incubation at 85°C for 5 min. The cDNA was then cleaned using QIAquick PCR purification kits (QIAGEN), eluting in 30 µl of elution buffer before storage at 20°C. Five overlapping PCR fragments covering the FMDV genome were amplified from each sample by using 47.5 µl of master mix (5 µl 10x buffer, 2 µl MgSO4, 1 µl 10 mM deoxynucleoside triphosphate mix, 1 µl 10 mM forward primer, 1 µl 10 mM reverse primer, 0.2 µl Platinum Taq Hi-Fidelity [Invitrogen], 37.3 µl nuclease-free water) plus 2.5 µl cDNA. The five primer sets consisted of 5' TTG AAA GGG GGC GTT AGG GTC TCA 3' (forward) and 5' GGG TGA AAG GTG GGC TTY GGG T 3' (reverse); set 2 consisted of 5' CCC AAG TTT TTA CCG CCT TTC CCG 3' (forward) and 5' GTT GAT AAT GCT TCC AGT GTT GCC TG 3' (reverse); set 3 consisted of 5' CCA CGC TGG CAT CTT CCT GAA AG 3' (forward) and 5' CCA GTG GCC AGT TCC TCA AAT GC 3' (reverse); set 4 consisted of 5' GTG TTG GAC CTG ATG CAA ACC CC 3' (forward) and 5' GTC TCT TGC GAG TCT CGC GGA TC 3' (reverse); and set 5 consisted of 5' TTC AAG CCT CAA CCG CCC CTC 3' (forward) and 5' GGC GGC CGC TTT TTT TTT TTT TTT 3' (reverse). Primer sets 1 to 4 were run on a PCR program cycle of initial denaturation at 94°C for 2 min and then 39 cycles of 94°C for 30 s, 68°C for 30 s, and 72°C for 3 min, ending with incubation at 72°C for 7 min. Primer set 5 was run on a cycle of initial denaturation at 94°C for 2 min and then 39 cycles of 94°C for 45 s, 60°C for 45 s, and 72°C for 3 min, finishing with incubation at 72°C for 7 min. PCR products were cleaned up using QIAquick PCR purification kits (QIAGEN), eluting in 30 µl in elution buffer. In order to estimate DNA content, 2 µl of the PCR product was run on a 1.2% agarose gel at 90 V for 35 min alongside a quantitative ladder (HyperLadder I; Bioline). The DNA concentrations of these products were adjusted for sequencing (fragment 1, 20 ng/µl; fragment 2, 70 ng/µl; fragment 3, 100 ng/µl; fragment 4, 100 ng/µl; and fragment 5, 100 ng/µl) and stored at 20°C.
Sequencing reaction. Forty-two forward and 42 reverse sequencing reactions (primer sequences available at http://www.iah.bbsrc.ac.uk/primary_index/current_research/groups/king_files/publications.htm) were performed in a 96-well Beckman sequencing plate using 8 µl of master mix (CEQ dye terminator cycle sequencing with quick start kit; Beckman Coulter), 4 µl of 1 mM primer, and 8 µl DNA per reaction. A QIAGEN BioRobot 3000 was used to set up the reactions. The plate was run on a program of 30 cycles of 96°C for 20 s, 50°C for 20 s, and 60°C for 4 min. Following thermocycling, the reactions were cleaned up by ethanol precipitation before being run on the Beckman Coulter sequencing machine.
Data analysis. The raw data files were assembled into a contig and edited using SeqMan (DNAStar). All further sequence manipulation was performed using BioEdit and DnaSP freeware. The phylogenetic reconstruction (tree available at http://www.iah.bbsrc.ac.uk/primary_index/current_research/groups/king_files/publications.htm) was carried out using maximum likelihood methods as implemented in Paup* (version 4b 10), assuming an HKY model of base substitution (variable base frequencies and variable transition and transversion frequencies (24) with rate heterogeneity. Bayesian bootstrap values were obtained from 500,000 generations; every 100th tree was sampled using the software package MRBAYES (28). The genealogical relationships shown in Fig. 2 were based on statistical parsimony as implemented in the software package TCS (12); gaps were assumed to be missing nucleotides (Fig. 2). A molecular clock was fitted using Markov chain Monte Carlo techniques implemented in the software package BEAST (Bayesian evolutionary analysis sampling trees) (18), a relaxed clock with exponentially distributed rates and exponentially increasing population size and the HKY model of base substitution (24) with rate heterogeneity was assumed. The number of substitutions over the phylogeny was estimated using DnaSP, and the nonsynonymous-to-synonymous ratios (dN/dS) were estimated using the HyPhy software package (40). HyPhy was also used for partition analysis to determine whether selective pressures (dN/dS ratios) differed significantly between genes using a Muse-Gaut 94 substitution model (39). The Kolmogorov-Smirnov (KS) test was used to investigate the positioning of mutations along the genome.
![]() View larger version (38K): [in a new window] |
FIG. 2. TCS analysis of 23 United Kingdom Pan Asia O FMDV sequences. Statistical parsimony analysis of 23 sequences by TCS; each sequence is represented by the IP number from which it was isolated. Each connecting branch line represents a nucleotide substitution, with each dot representing a putative ancestor virus. Nonsynonymous mutations are shown in boldface. The arrow represents the introduction of virus into the United Kingdom via IP4, where replication occurred for 2 weeks before clinical samples were collected. IP4 was the source of virus for IP1, where the disease was first detected, and also for IP6, from where virus was disseminated throughout the United Kingdom. IP2027 is the last IP of the outbreak. Gray shading indicates changes located within VP1.
|
|
|
|---|
Maximum likelihood phylogenetic analysis of complete consensus genome sequences. Groups of viruses that reflect the known geographic spread of the FMD were identified by phylogenetic reconstruction of the outbreak using maximum likelihood methods (data not shown). The transition transversion ratio was estimated to be 7.61 (the alpha parameter governing the gamma distribution for rate heterogeneity was estimated as infinity). All internal branches within the phylogeny were supported by high Bayesian bootstrap values in excess of 96%. Viruses in different geographical areas evolved along separate lineages, resulting in a large breadth of variation within the outbreak, with 197 nucleotide substitutions at 191 sites along the genome. Thus, six sites were identified to have been mutated twice, higher than that expected on the basis of a Poisson process with a mean of 197 substitutions/8196 nucleotides, in which all sites were assumed to be equally mutable (in which an average of 2.2 sites would be expected to receive multiple hits).
Fine-scale statistical parsimony analysis (TCS). Statistical parsimony analysis (Fig. 2) revealed that virus isolated from the first infected premise (IP4) and the last (IP2027) differed by 44 silent (synonymous) and 8 nonsilent (nonsynonymous) substitutions, and there were also 2 nucleotide deletions (one of four bases and one of two bases in the 5' UTR). At a finer resolution, viruses from closely related farms differed by between 1 and 5 nucleotides. Virus sequences obtained from three pigs sampled on the same day on a single premise (IP4) where disease had been present longer than that on other farms differed in consensus by 1 to 2 nucleotides.
Genomic distribution of nucleotide substitutions. Twenty-eight substitution sites were identified within the noncoding regions at the 5' and 3' ends of the genome, and 163 substitution sites were identified within the open reading frame. Of these substitution sites in the open reading frame, 40 resulted in nonsynonymous change and 123 were synonymous, resulting in an overall nonsynonymous-to-synonymous ratio (dN/dS, estimated per synonymous and nonsynonymous site) of 0.09. The distribution of synonymous changes throughout the genome could not be distinguished from a uniform distribution (D) (Fig. 3) (by KS test, D = 0.0424; P > 0.5). Nonsynonymous changes appeared more clustered (Fig. 3), although the KS test (D = 0.1845; P > 0.2) again did not distinguish their distributions from a uniform one. Only a single amino acid position (position 42 in VP1) underwent two independent substitutions. These were on two distinct geographical genetic lineages as a result of different nucleotide substitutions in adjacent sites: in Northumberland, this was a change from valine to alanine and, in Devon, from valine to isoleucine. Contrary to previous studies (9), VP1 was not found to be more variable than other parts of the genome, incurring less substitutions than other genes, such as VP3 (Fig. 3).
![]() View larger version (34K): [in a new window] |
FIG. 3. Synonymous and nonsynonymous substitutions across the genome. Shown is a graph of cumulative substitutions against nucleotide positions, depicting the rate of substitution for total (black), synonymous (gray), and nonsynonymous (light gray) point mutations. The rate of accumulation of silent changes throughout the genome is constant and faster than the rate of amino acid substitutions which also appears to vary more. The dN/dS ratio for each genomic region is shown in the lower graph, but likelihood analyses provide no support for different dN/dS ratios for different genes.
|
2 = 7.724; df = 11; P = 0.74) and the null hypothesis that all dN/dS ratios were equal could not be rejected on the basis of these data. VP4 had the highest dN/dS ratio overall, equal to 1, and thus there was no evidence of positive selection occurring at an amino acid level anywhere in the genome (however, this does not preclude the possibility of positive selection occurring at an RNA level). Rate of nucleotide substitution over time. Total nucleotide changes from the ancestor virus were shown to accrue linearly with time; synonymous changes accumulated faster than nonsynonymous changes (Fig. 4). A relaxed molecular clock for the rate of substitution of all nucleotide changes (estimated using BEAST) was found to advance at a rate of 2.26 x 105 per site per day (95% CI, 1.75 x 105 to 2.80 x 105). This analysis also predicted that the most recent common ancestor to the viruses sequenced was present 12 days before the recovery of the earliest FMDV sequenced in this study. This would date the start of the outbreak to 7 February 2001 (95% CI, 20 January to 19 February 2001), which is identical to estimates obtained on the basis of purely clinical evidence (1, 47). Previous studies have suggested that there may have been 35 sequential IP-IP transmission events between IP4 and the final case (IP2027) (25), suggesting that substitutions were fixed across the genome at an average rate of 1.5 nucleotide changes per IP transfer.
![]() View larger version (12K): [in a new window] |
FIG. 4. Accumulation of substitutions with respect to time. Shown is the accumulation of mutations (a measure of genetic distance) increasing linearly with time, compatible with a molecular clock. , all substitutions; , silent substitutions; , nonsilent substitutions.
|
![]() View larger version (10K): [in a new window] |
FIG. 5. TCS analysis of two clusters of infected premises. Statistical parsimony analysis of known and unknown virus transmission events. Each connecting branch line represents a nucleotide substitution, with each dot representing a putative ancestor virus. Black arrows indicate the transmission route determined by traditional contact tracing, gray arrows indicate the route supported by the genetic data. The data show the close genetic relationship of IP1692 to IP1597 and IP1654, although epidemiological tracing suggested IP1692 was infected from a more distantly related source.
|
|
|
|---|
Complete consensus sequences were clearly amplified with very few ambiguities found, indicating that this method was not affected by the proposed heterogeneous nature of virus populations. Sequence was obtained directly from epithelium samples, avoiding the passaging of virus through animals or cell culture. The virus isolate from the abattoir in Essex (IP1) was compared to an isolate sequenced previously, O UKG 35/2001 (AJ539141) (37), from the same abattoir. O UKG 35/2001 had been passaged through two pigs prior to direct sequencing. Notably, the virus consensus sequence derived directly from the epithelium of an animal from IP1 is very close to O UKG 35/2001, differing at only the three ambiguous sites identified within this isolate. It also differed from the remainder of the United Kingdom viruses sequenced at these same three sites, suggesting that it was an intermediate between the United Kingdom viruses and O UKG 35/2001.
The genetic variation observed between viruses was mainly due to synonymous point substitutions, the majority of which were transitions. These nucleotide changes were found to occur evenly across the whole genome (as indicated by the KS test). These two factors suggest that the evolution of FMDV that was demonstrated to have occurred during the 2001 outbreak in the United Kingdom may have arisen largely as a result of a genetic drift subject to purifying selection. The fixation of mutations could have occurred due to bottleneck events during the transmission between animals or IPs. The study includes only 23 sequences sampled over a short time scale, which has resulted in limited statistical power to discriminate differences in substitution rates and selection pressures between genes.
The TCS statistical parsimony analysis clearly showed the genetic evolutionary history of the virus. However, it could not definitively identify the geographic location or IP on which a nucleotide substitution had occurred. This is because the exact time that the virus was transmitted from a farm in relation to the time at which animals were culled and viral samples were collected is unknown. During the 2001 outbreak in the United Kingdom, there was strong pressure to cull animals as fast as possible following the identification of infection but, nevertheless, there were lengthy delays between the time that some IPs were estimated to have become infected and the time that samples could be collected from them. When transmission from an IP occurs a long time prior to the acquisition of viral samples, it will remain uncertain on which IPs substitutions occurred. This is apparent in our data; for example, we do not have the "real" ancestor to the outbreak, as the farm on which the outbreak began was not identified as infected until up to 3 weeks after the virus is estimated to have been introduced into Britain (Fig. 2). This problem, along with the observation that virus recovered from closely housed animals can differ by 1 to 2 nucleotides and is likely to pass through a "bottleneck" on passage between farms, suggests that, if this tool is to be used in future outbreaks, a greater number of samples taken from each and every farm may improve the resolution of this method.
Virus samples were sequenced from pigs, cattle, and sheep. Although no evidence was found of the effect of host tropism within these genetic data, such an effect cannot be confidently excluded using these data alone. The route by which each individual animal was infected is unknown. For example, the virus could have been circulating in sheep before being subsequently identified and sequenced from a cow, which would confound the results.
Nucleotide change conformed to a relaxed molecular clock corresponding to 0.00825 substitutions/site/year, thus 0.9% of the genome is predicted to change per annum. Additional analyses conducted assuming various strict molecular clocks and alternate demographic models fitted the data with only slightly reduced average posterior likelihoods and produced results that were very similar. These models are suggestive of a constant rate of viral replication and transmission during the outbreak. Independent genetic determination of the timing of the first case of the outbreak, matched previous estimates based on clinical evidence. Such a tool for detecting the timing of an initial case in an outbreak could, in principle, prove very useful in identifying how virus is introduced into the country. The results reported here were consistent with other studies, for example, those of serotype "O" virus in Taiwan, which found a divergence of 0.2 to 0.9% between VP1 genes sequenced over a 2-month period (54); serotype SAT-2, which was found to change at a rate of 0.009 substitutions/nucleotide/year (4); and serotype "C,1" which changed at an estimated rate of 0.0004 to 0.045 substitutions/nucleotide/year (51). They also correspond closely to rates measured over longer time periods, with the "O" serotype of the virus seen to change at a rate of 1.43% per year in the Philippines and at 0.43% per year in Turkey, for example (26).
It is well established that detailed high-resolution transmission histories can be reconstructed for some RNA viruses. Transmission events have been reconstructed between individuals infected by viruses with long infectious periods using sequence data, for example, for human immunodeficiency virus (34, 56) and hepatitis C virus (52). In the recent outbreak of severe acute respiratory syndrome, human-to-human tracing was investigated using virus genetic sequencing (35, 55). Broad resolution of rabies (5, 32, 53) and rhinovirus (46) transmission networks have also been studied using genetic sequencing. Using full-genome sequences enables the tracing of transmission histories of the virus at high resolution (between farms) reliably and with confidence. The capacity to conduct animal disease tracing by using genetic data is of considerable value, particularly considering the difficulties of identifying transmission pathways by other means. Much of FMDV transmission between farms is poorly understood, and the results of this study demonstrate the potential of this methodology to help in unraveling the transmission routes utilized by the virus. Unknown transmission histories were resolved as illustrated by the investigation of the cluster from County Durham. However, less detailed, yet informative analyses were also possible. For example, there was uncertainty over the relationship between Northumberland cases that arose in April and those that arose mid-August during the epidemic. Ineffective decontamination and cleaning of formerly infected IPs could have led to the propagation of subsequent outbreaks. However, the later Northumberland cases (IP1970, IP1976, and IP2027) were shown to be only distantly related to the early Northumberland case at IP1070 (Fig. 2) and actually originated in Cumbria.
This study shows that complete genome sequencing can be used to resolve interfarm transmission, and as shown previously (11), VP1 sequences alone are unlikely to contain sufficient genetic information to enable the reconstruction of transmission pathways at the highest levels of resolution. Between the first and the last IP there were estimated to be 35 interfarm transmissions, during which there were 52 nucleotide changes (i.e., 1.486 substitutions/interfarm transmission), but only 13 of these substitutions fell within the capsid genes (0.371 substitutions/interfarm transmission) and only 5 in VP1 (0.143 substitutions/interfarm transmission). Hence only full-genome sequences have the capacity to detect more than one nucleotide change per farm transfer. However, with the advance of high-speed sequencing technology, full-genome sequencing has the potential to be used in real time during future epidemics to further inform control policy.
The collection and archiving of UKG 2001 samples was undertaken by the Food and Agricultural Organization World Reference Laboratory for foot-and-mouth disease.
Published ahead of print on 13 September 2006. ![]()
Present address: Institute for Animal Health, Ash Road, Pirbright, Surrey GU24 0NF, United Kingdom. ![]()
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»