Journal of Virology, January 2009, p. 1071-1082, Vol. 83, No. 2
0022-538X/09/$08.00+0 doi:10.1128/JVI.01501-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Eleanor Barnes,2,
Rachel Taggart,2
Philippe Lemey,1
Peter V. Markov,1
Bouachan Rasachak,3
Bounkong Syhavong,3
Rattanaphone Phetsouvanah,3,5
Isabelle Sheridan,2
Isla S. Humphreys,2
Ling Lu,4
Paul N. Newton,3,5 and
Paul Klenerman2
Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom,1 The Peter Medawar Building for Pathogen Research, University of Oxford, South Parks Road, Oxford OX1 3SY, United Kingdom,2 Wellcome Trust-Mahosot Hospital-Oxford University Tropical Medicine Research Collaboration, Microbiology Laboratory, Mahosot Hospital, Vientiane, Laos,3 Division of Gastroenterology-Hepatology and Nutrition, University of Utah, Salt Lake City, Utah,4 Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, Churchill Hospital, University of Oxford, Oxford OX3 7LJ, United Kingdom5
Received 17 July 2008/ Accepted 20 October 2008
HCV is a positive-sense RNA virus belonging to the genus Hepacivirus in the family Flaviviridae that, at present, has not been found to naturally infect any species other than humans. The virus exhibits a very high degree of genetic diversity that is classified by phylogenetic analysis into six genotypes, denoted 1 to 6, each of which contains numerous subtypes, denoted 1a, 2c, 3d, 6f, etc. (50, 51). The recent discovery of a putative seventh genotype (33) suggests that further HCV diversity remains to be characterized.
The various genotypes and subtypes of HCV have been associated with different epidemiological and geographical patterns (31, 51, 54). For example, a high proportion of infections are caused by just a few strains that are globally distributed but genetically conserved, notably subtypes 1a, 1b, 2a, 2b, and 3a. Evolutionary molecular clock methods suggest that these so-called epidemic subtypes originated around 100 years ago (43, 45, 54, 59). These probably spread due to their association with new and efficient routes of viral transmission that arose during the 20th century, notably blood transfusion, hemodialysis, use of blood products, injection drug use, and nonsterile medical injections (e.g., see references 13, 15, and 45). Much research attention has focused on these strains, not least because they account for most HCV infections in Europe, North America, Japan, and Australasia. However, an understanding of the evolution and genetic diversity of all genotypes is necessary for the development of successful drug and vaccine treatments. For example, genotype 1 infections respond more poorly than genotypes 2 and 3 to antiviral drugs (11, 30), and the efficacies of cellular immune responses may also vary among strains.
In contrast to the epidemic scenario described above, HCV lineages in areas of endemicity are highly divergent and are typically isolated from residents or emigrants of restricted (and sometimes remote) geographic regions, suggesting a long duration of continuous infection in these areas. Endemic strains belonging to genotypes 1 and 2 are found in West Africa (2, 20, 47, 66). Similar regional patterns of endemic diversity have been found for genotype 3 in South Asia, for genotype 4 in Central Africa and the Middle East, and for genotype 6 in East Asia (28, 32, 36). However, no region has yet been found to contain high levels of HCV genotype 5 genetic diversity (64). Molecular clock analyses suggest that these strains have been present in their respective geographical regions for at least several centuries (43, 54).
Genotype 6 provides a striking example of HCV diversity in endemic areas. Indeed, the first HCV isolates from East Asia were so divergent that they were initially classified as separate genotypes, designated 7, 8, 9, and 11 (1, 50, 61, 62, 63). These strains have since been reclassified as individual subtypes within genotype 6 (52). Genotype 6 infections are also of considerable epidemiological importance: there are an estimated 62 million HCV-infected people in the WHO-defined Western Pacific region (68), which represents approximately one-third of all infections worldwide. As illustrated in Fig. 1, genotype 6 isolates have been obtained from residents or emigrants of Thailand, India, Cambodia, Laos, Myanmar (Burma), Vietnam, China, Hong Kong, and Indonesia (25, 46). The prevalence of HCV in the general population is variable among East Asian countries, ranging from about 0.5% in Singapore and Hong Kong, to around 6% in Vietnam and Thailand (69), and exceeding 10% in Myanmar (29). The reported prevalence in China is approximately 2 to 3%, which amounts to approximately 30 million people (22, 70). Of course, not all HCV infections in East Asia are caused by genotype 6, and the genotype distribution of HCV infection is variable among and within different East Asian countries. For example, genotype 6 appears to be the most frequent genotype in Myanmar (49% of infections) (29) and Vietnam (52% of infections) (37), but not in Thailand, where the globally distributed subtype 3a, which is associated with injection drug use, is twice as common as genotype 6 (23). The most common strain in China is the global subtype 1b (5), although genotype 6 is found at higher frequencies in southern China (27) and Hong Kong (42, 71).
FIG. 1. A map of East Asia illustrating the geographic distribution of the NS5B gene data set (see also Fig. 3). Arrows indicate the countries of origin of the sequences and values in parentheses indicate the number of sequences from each country. Common subtypes present in each country (>20% of sequences) are shown in brackets. Note that the subtype distribution of NS5B sequences does not necessarily reflect the relative prevalence of infection by each subtype. Twelve isolates were sampled outside of Asia; available information indicates that at least seven of these were obtained from individuals of Asian origin.
We have investigated the transmission history of HCV in Asia by conducting a comprehensive evolutionary analysis of available HCV genotype 6 gene sequences. By combining molecular clock, coalescent, and geographical analyses, we show that genotype 6 in Asia is structured both by geography and by an explosion of local epidemics that occurred during the 20th century. The genetic diversity of our new Lao isolates indicates a pattern of past HCV transmission distinct from that found elsewhere in Southeast Asia.
Viral RNA was reverse transcribed using the SuperScript II reverse transcriptase protocol (Invitrogen). In brief, 10 µl viral RNA, 0.5 µl of deoxynucleoside triphosphates (25 mM each), 0.5 µl random primers (500 µg/ml), and 1 µl sterile water were heated to 65°C for 5 min and then quick-chilled on ice. Four µl of 5x first-strand buffer, 2 µl dithiothreitol (0.1 M), and 1 µl RNasin were added and incubated at room temperature for 2 min, followed by addition of 1 µl SuperScript II reverse transcriptase. The mixture was incubated at 42°C for 50 min and then at 70°C for 15 min. cDNA was amplified by nested PCR, using the High-Fidelity Expand PCR system (Roche). Serum samples from healthy individuals were used as negative controls.
Table 1 provides full details of the primers used during each round of amplification for each subgenomic region. Amplification of the 5' untranslated region (UTR; 236 bp) was used to define HCV RNA positivity; 5'UTR PCR conditions for both rounds were 94°C for 2 min, 30 cycles of 94°C for 25 s, 55°C for 25 s, and 72°C for 25 s, and then 72°C for 2 min. HCV core gene (464 bp) amplification was performed; PCR conditions for both rounds were 94°C for 2 min, 35 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 60 s, and then 72°C for 7 min. HCV NS5B gene (377 bp) amplification was performed; PCR conditions for both rounds were 94°C for 2 min, 28 cycles of 94°C for 15 s, 60°C for 30 s, and 72°C for 45 s, and then 72°C for 7 min. Amplicons were run on 1% agarose gels stained with ethidium bromide, and DNA was purified using the Qiagen gel extraction kit. Purified DNA was sequenced in both forward and reverse directions using the ABI Prism Big Dye Terminator cycle sequencing kit (Perkin-Elmer Applied Biosciences).
TABLE 1. Details of HCV primers for each subgenomic region
Maximum likelihood phylogenies were estimated for the core and NS5B data sets using PAUP (58). The most appropriate nucleotide substitution model for phylogenetic analysis was determined using the model selection procedure implemented in the program MODELTEST; for both data sets the best-fitting model was GTR+Γ+I (41). Under this model, phylogenies were heuristically searched using the SPR (subtree pruning and regrafting) and NNI (nearest neighbor interchange) perturbation algorithms. The statistical robustness levels of phylogenetic groupings were subsequently assessed using bootstrap analyses (500 replicates). Phylogeographic structure was then identified using FigTree (available from http://tree.bio.ed.ac.uk), and clades and lineages were colored according to their location of sampling.
Estimation of evolutionary time scale. We used an evolutionary molecular clock to estimate the time scale of HCV epidemic history in East Asia. Previous evolutionary analyses of HCV have employed the simplest strict clock model, which assumes that all phylogeny branches evolve at exactly the same rate. This potentially unrealistic assumption can be avoided by using a relaxed clock, which allows evolutionary rates to vary among lineages according to some probability distribution (8).
Because there was insufficient temporal structure in the study sequences to directly estimate rates of molecular evolution, we used the external rate calibration approach previously employed by Pybus et al. (44) and Hue et al. (19). First, we obtained an external estimate of the evolutionary rate of our 399-nt core gene fragment from an independent, previously published HCV data set (59) that does contain significant temporal structure. Posterior distributions of this rate were estimated under three clock models using the Bayesian MCMC approach implemented in the program BEAST: (i) strict clock, (ii) relaxed clock with an uncorrelated log normal rate distribution, and (iii) relaxed clock with an uncorrelated gamma rate distribution (8, 9). The exponential relaxed clock model is a special case of the gamma relaxed clock model. Second, these external rates were subsequently used to define normally distributed priors for the core gene evolutionary rate during our strict and relaxed clock analyses of the concatenated data set. Third, evolutionary rates for the NS5B gene region were estimated using a relative rate approach. Briefly, once we have specified a rate for the core region, we can use the relative diversity of the core and NS5B regions to estimate an NS5B rate, because the time scales for both regions are identical. Additionally, the core and NS5B gene regions were given independent among-site Γ rate distributions. In summary, the sequence evolution model used incorporated multiple levels of rate heterogeneity: among nucleotide sites, among genes, and among lineages.
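The relative rate step can be illustrated with a minimal sketch. It assumes the simplest reading of the approach: because both regions share one time scale, the NS5B rate is the core rate multiplied by the ratio of NS5B to core divergence. The function name and all input values below are hypothetical and are not taken from this study; in the analysis itself the relative rate is estimated jointly with the other parameters in BEAST.

```python
def relative_rate(core_rate, core_divergence, ns5b_divergence):
    """Scale a calibrated core rate by the NS5B/core divergence ratio.

    Both divergences must be measured over the same set of lineages,
    so that they correspond to the same amount of elapsed time.
    """
    if core_divergence <= 0:
        raise ValueError("core_divergence must be positive")
    return core_rate * (ns5b_divergence / core_divergence)


# Hypothetical inputs: a core rate in substitutions/site/year and mean
# pairwise divergences (substitutions/site) for the two regions.
core_rate = 1.7e-4
ns5b_rate = relative_rate(core_rate, core_divergence=0.20, ns5b_divergence=0.30)
print(f"Implied NS5B rate: {ns5b_rate:.3e} substitutions/site/year")
```

With these illustrative numbers NS5B is 1.5 times as divergent as core, so its implied rate is 1.5 times the core rate.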
Evolutionary analysis of the concatenated data set. To estimate the epidemic history of HCV genotype 6, we analyzed the concatenated core and NS5B alignment using the Bayesian Skyline plot (BSP) approach, as implemented in the program BEAST (for details, see Pybus et al. [44] and Drummond et al. [8]). To test the robustness of our results to model specification, we estimated epidemic history under a number of different model combinations and then used Bayes factors to choose the statistically most appropriate model (57). Marginal posterior distributions were estimated for each model parameter using Bayesian MCMC inference. All MCMC analyses were run for at least 50 million states. MCMC chain convergence, effective sample sizes, and Bayes factors were computed and investigated using the program Tracer v1.4 (9). In addition, an independent maximum likelihood relative rates analysis was performed using HYPHYv0.99 (40), which confirmed that relative rates were sampled appropriately in the Bayesian MCMC analyses.
A Bayesian estimate of phylogeny was obtained from the posterior distribution of trees arising from the best-fitting BEAST analysis (see above). First, the program TreeAnnotator (9) was used to calculate the Bayesian posterior probabilities of each internal node. Second, these probabilities were multiplied for each phylogeny sampled during the MCMC analyses. Third, the phylogeny with the highest total was located. This phylogeny best summarizes the set of credible trees and is called the maximum clade support phylogeny. Because a relaxed clock was used in the Bayesian MCMC analysis, the branch lengths and node heights of the maximum clade support phylogeny are in units of years. See Drummond et al. (8) for further details.
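The three steps above can be sketched in a few lines. This is a simplified illustration, not the TreeAnnotator implementation: each sampled tree is represented only by its set of nontrivial clades, the posterior probability of a clade is taken as its frequency in the sample, and logarithms are summed instead of multiplying probabilities. The function name and the toy trees are hypothetical.

```python
from collections import Counter
from math import log


def max_clade_support(trees):
    """Return (index, log_score) of the sampled tree whose clades have the
    highest product of posterior probabilities.

    Each tree is a collection of clades; each clade is a frozenset of taxa.
    """
    n = len(trees)
    # Posterior probability of a clade = fraction of sampled trees containing it.
    counts = Counter(clade for tree in trees for clade in set(tree))
    best_index, best_score = None, float("-inf")
    for i, tree in enumerate(trees):
        # Summing logs equals the log of the product and avoids underflow.
        score = sum(log(counts[clade] / n) for clade in set(tree))
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Toy posterior sample of three rooted trees on taxa A-D (hypothetical data).
ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
abc = frozenset("ABC")
sample = [
    [ab, cd],   # ((A,B),(C,D))
    [ab, abc],  # (((A,B),C),D)
    [ac, abc],  # (((A,C),B),D)
]
index, score = max_clade_support(sample)
print(index, score)
```

In this toy sample the second tree is selected, because both of its clades appear in two of the three sampled trees.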
Lastly, the program BaTS (38) was used to test for the presence of statistically significant phylogeographic structure. BaTS tests the null hypothesis of panmixis (i.e., no correlation between phylogeny and taxa location) by performing randomization tests on two tree-shaped statistics: the parsimony score (PS) (53) and the association index (AI) (65). The randomizations are performed across a credible set of trees; hence, BaTS correctly incorporates phylogenetic uncertainty when testing for phylogeographic structure. For our BaTS analyses we used the posterior distribution of trees arising from the best-fitting BEAST analysis of the concatenated data set (see above). Further methodological details have been provided by Parker et al. (38).
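The logic of the parsimony score test can be sketched as follows. This is a minimal illustration on a single fixed tree, whereas BaTS averages over a posterior set of trees; the nested-tuple tree encoding, function names, and example locations are assumptions made for the sketch. The parsimony score is computed with the Fitch algorithm, and the null distribution is generated by shuffling locations among the tips.

```python
import random


def fitch_score(tree, states):
    """Minimum number of location changes on a rooted binary tree.

    The tree is given as nested 2-tuples with tip names as leaves;
    states maps each tip name to its sampling location.
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tips(tree):
    """List the tip names of a nested-tuple tree from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def ps_permutation_test(tree, states, n_perm=1000, seed=1):
    """Return the observed parsimony score and a one-sided permutation
    p value for the null hypothesis of no phylogeny-location association."""
    rng = random.Random(seed)
    names = tips(tree)
    observed = fitch_score(tree, states)
    labels = [states[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if fitch_score(tree, dict(zip(names, labels))) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: two clades that match two sampling locations.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
clustered = {t: "LA" for t in "abcd"} | {t: "VN" for t in "efgh"}
observed, p_value = ps_permutation_test(tree, clustered)
print(observed, p_value)
```

A low parsimony score relative to the shuffled trees indicates that sequences from the same location cluster together more than expected by chance.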
Nucleotide sequence accession numbers. The GenBank accession numbers of the new HCV sequences from Laos are EU420957 to EU420986.
Phylogenetic analysis of the core and NS5B data sets. Figures 2 and 3 show the maximum likelihood phylogenies estimated from the core and NS5B alignments, respectively. As expected, the NS5B gene phylogeny is deeper and has a greater number of well-supported clades, reflecting the greater genetic variation of this region. In most cases sequences correctly group into their respective subtypes, although in both phylogenies strains belonging to subtypes 6d and 6e are mixed; we suggest this could be due to database annotation errors. In the core phylogeny, subtype 6o is incorrectly placed as a monophyletic ingroup of subtypes 6d and 6e; this is a phylogenetic estimation error most likely arising from limited sequence diversity.
FIG. 2. Estimated maximum likelihood phylogeny for the core gene alignment. Bootstrap scores of >70% are shown next to well-supported nodes, and the phylogeny is midpoint rooted. Sequence names are given in the Los Alamos HCV Database format, as follows: subtype/country code/strain name/accession number. Where available, information is given in parentheses about the country of origin of emigrants from Southeast Asia to other countries. Gray bars indicate major subtypes of HCV genotype 6. Phylogeny branches and sequence names are colored according to the country of origin of the sampled individual. Country codes and colors for Asian strains are as follows: CN, China (red); VN, Vietnam (dark blue); HK, Hong Kong (red); TH, Thailand (green); IN, India (light blue); MM, Myanmar (gray); ID, Indonesia (brown); KH, Cambodia (orange); LA, Laos (magenta). Strains sampled outside of Asia are colored black (CA, Canada; FR, France; US, United States).
FIG. 3. The estimated maximum likelihood phylogeny for the NS5B gene alignment. Bootstrap scores of >70% are shown next to well-supported nodes, and the phylogeny is midpoint rooted. See Fig. 2 for sequence naming details and the coloring scheme.
Our new isolates from Laos are genetically diverse and are interspersed among many different genotype 6 lineages. There is only one well-supported cluster of Lao sequences (strains Laos373, Laos394, Laos259, Laos23, Laos347, and Laos38), which is most closely related to subtype 6b. The locations of the remaining Lao strains are as follows: Laos250 falls between subtypes 6i and 6j; Laos382 groups most closely with subtype 6h; Laos349 clusters near subtype 6l; Laos248 belongs to subtype 6o; Laos132 groups with the divergent Vietnamese strain VN235; Laos310 clusters with strain C81 sampled in Canada from an Asian immigrant; Laos390 is similar to the Laos strain IG93335 and groups with subtype 6q sequences from Cambodia. Laos344, Laos350, and Laos176 are highly divergent and do not closely group with any other strains.
Estimation of evolutionary time scale. We estimated a time scale for the evolution of HCV genotype 6 from an independent, previously published set of HCV sequences sampled at different times (59). Under the strict clock model, the estimated rate for our 399-nt core gene region was 1.78 × 10⁻⁴ substitutions/site/year (95% credible region, 1.11 × 10⁻⁴ to 2.6 × 10⁻⁴). A very similar estimate was obtained under the relaxed clock models, 1.72 × 10⁻⁴ substitutions/site/year (95% credible region, 0.91 × 10⁻⁴ to 2.7 × 10⁻⁴). These rate estimates were then used as prior distributions in all subsequent BEAST analyses (see Materials and Methods for details).
Evolutionary analysis of the concatenated data set. Evolutionary analysis of the concatenated core plus NS5B data set was performed in BEAST under a range of molecular clock and coalescent model combinations. Simple coalescent models (i.e., constant size, exponential growth) consistently performed very poorly in comparison to the Bayesian Skyline plot (log10 Bayes factors, >25) and are therefore not reported here (results available on request). Six remaining models were well-supported: (A) a strict clock with BSP of 5 steps, (B) a strict clock with BSP of 10 steps, (C) a log normal relaxed clock with BSP of 5 steps, (D) a log normal relaxed clock with BSP of 10 steps, (E) a gamma relaxed clock with BSP of 5 steps, and (F) a gamma relaxed clock with BSP of 10 steps.
As shown in Fig. 4, all six model combinations gave similar median estimates for the age of HCV genotype 6, which was dated to ~1,100 to 1,350 years ago. However, the 95% credible intervals for these estimates are large, ranging from ~600 years ago to nearly 3,000 years ago, with the lower limit being less variable among models than the upper limit. Nevertheless, these limits more accurately portray the true extent of statistical error than those reported in previous analyses (43, 54), which failed to use realistic models of sequence evolution or to incorporate uncertainty arising from phylogeny estimation. Our estimate of the date of the most recent common ancestor of genotype 6 is 400 years older than that reported by Pybus et al. (43), which likely reflects the much greater diversity of isolates considered here. Figure 4 also gives the estimated marginal likelihood of each model, calculated using Tracer v1.4, which represents the probability of the data given each model combination. Models C and E had substantially higher marginal likelihoods than the other models (log10 Bayes factors of >3.5). Model E has a slightly greater marginal likelihood than model C, but this difference is not considered significant (log10 Bayes factor, ~0.5). For each clock model, the BSP with 5 steps was statistically favored over the BSP with 10 steps (Fig. 4).
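For readers unfamiliar with the scale used here, the following sketch shows how a log10 Bayes factor is obtained from two marginal likelihoods. It assumes the marginal likelihoods are reported as natural logarithms; the function name and the two example values are hypothetical and are not the values shown in Fig. 4.

```python
from math import log


def log10_bayes_factor(ln_marginal_1, ln_marginal_2):
    """log10 Bayes factor in favor of model 1 over model 2.

    Inputs are natural-log marginal likelihoods, so their difference is
    divided by ln(10) to convert it to a base-10 logarithm.
    """
    return (ln_marginal_1 - ln_marginal_2) / log(10)


# Hypothetical natural-log marginal likelihoods for two competing models.
ln_model_1 = -10000.00
ln_model_2 = -10001.15
print(f"log10 BF = {log10_bayes_factor(ln_model_1, ln_model_2):.2f}")
```

A difference of about 1.15 natural-log units therefore corresponds to a log10 Bayes factor of about 0.5, which is too small to prefer one model over the other.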
FIG. 4. Estimated dates of origin of genotype 6, as obtained under model combinations A to F (see text for details). The composition and marginal posterior log likelihood of each model combination are provided. The error bars are the 95% highest posterior density credible intervals for each estimate.
FIG. 5. The maximum clade support phylogeny of the concatenated (core plus NS5B) data set, obtained under the best-fitting model (combination E). Branch lengths represent time (see time scale at the bottom of the figure). Posterior probability scores (>0.9) are shown next to well-supported nodes. The shaded area corresponds to the 20th century, during which the lineages denoted by white circles showed rapid diversification. See Fig. 2 for sequence naming details.
α = 0.22) than for NS5B (α = 0.32). Rate heterogeneity among lineages is represented by the relaxed clock coefficient of variation (COV) parameter. Smaller COV values represent less rate variation among lineages and more clock-like evolution, hence the credible region of COV should abut zero if evolution follows a strict molecular clock. Our COV estimate is 0.37 (95% credible region, 0.28 to 0.45), which represents significant among-lineage rate variation and is similar to recently reported values for Dengue virus, human influenza A virus, and human immunodeficiency virus type 1 (8, 26). However, we might have expected to obtain lower COV values for HCV, because some previous studies reported that the hypothesis of a strict molecular clock is not always rejected (43). We therefore suggest that the failure of previous analyses to reject the strict clock was due to the comparatively small sample sizes used. In order to ensure accuracy, future evolutionary analyses of HCV data sets of sufficient size should incorporate rate variation among lineages. The relaxed clock covariance parameter is 0.02 (95% credible region, –0.11 to 0.14). This parameter measures the degree to which among-lineage rate variation is randomly distributed across the phylogeny, as opposed to being localized to specific clades. Our data suggest the former is true for HCV, because the estimated covariance is not significantly different from zero. Very similar COV and covariance values were obtained under the other relaxed clock models (combinations C, D, and F).
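The meaning of the COV parameter can be made concrete with a small simulation. The sketch below is illustrative only: it assumes branch rates drawn from a log normal distribution (one of the relaxed clock models considered, chosen here for convenience), uses the standard relation COV = sqrt(exp(σ²) − 1) for a log normal distribution, and sets the target COV to the 0.37 reported above. The function names and the sample size are hypothetical.

```python
import random
from math import log, sqrt
from statistics import mean, pstdev


def coefficient_of_variation(rates):
    """Standard deviation of branch rates divided by their mean."""
    return pstdev(rates) / mean(rates)


def lognormal_sigma_for_cov(cov):
    """Log-scale standard deviation giving a log normal with this COV."""
    return sqrt(log(1.0 + cov ** 2))


rng = random.Random(42)
sigma = lognormal_sigma_for_cov(0.37)
# Simulated relative branch rates; the mean rate does not affect the COV.
rates = [rng.lognormvariate(0.0, sigma) for _ in range(100_000)]
estimate = coefficient_of_variation(rates)
print(f"sigma = {sigma:.3f}, simulated COV = {estimate:.3f}")

# Under a strict clock every branch has the same rate, so the COV is zero.
print(coefficient_of_variation([1.0] * 10))
```

The strict clock case shows why the credible region of the COV should abut zero when evolution is clock-like.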
Figure 6 shows the BSP estimated from the concatenated data set. The BSP is a flexible, nonparametric estimate of past changes in effective population size (7). It is based on the coalescent process, a population genetic model that describes the relationship between the demographic history of a population and the ancestral relationships of sequences sampled from it (explained further in reference 6). The most notable feature of Fig. 6 is the change at the onset of the 20th century, from a low and relatively constant effective population size to rapid, epidemic growth. This change coincides with the onset of rapid diversification in the lineages highlighted in Fig. 5. The rate of growth appears to slow from the 1980s to the present, matching the change in HCV transmission that followed the virus' isolation in 1989, although this recent decrease is not statistically significant given the large confidence intervals. However, BSPs should be interpreted carefully when, as in this case, sequences have been sampled from a geographically structured population (3). Specifically, the 20th century growth phase shown in Fig. 6 was estimated from a heterogeneous collection of lineages, some of which show diversification during the 20th century and others which do not (Fig. 5). Consequently, the exponential growth rate during this recent phase (i.e., between 1900 and 1980) (Fig. 6) is ~0.035 year⁻¹, considerably lower than equivalent rates previously estimated for more specific populations, such as genotype 4 in Egypt (~0.25 year⁻¹ [44]), subtype 1b in China (~0.3 year⁻¹ [34]), and subtypes 1a and 3a in the United Kingdom and United States (0.1 to 0.2 year⁻¹ [45, 60]). The growth rate we have estimated is therefore likely to reflect an average rate for the whole of genotype 6 and should not be extrapolated to individual epidemic subtypes, which will have spread comparatively faster. For example, the exponential growth rate of subtype 6a in Hong Kong has been estimated at ~0.17 year⁻¹ (60), in agreement with the population-specific and subtype-specific rates listed above.
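To put these growth rates on a more intuitive scale, they can be converted to doubling times. The sketch below assumes simple exponential growth, for which the doubling time is ln(2) divided by the growth rate; the function name is hypothetical, and the rates are the approximate values quoted in the text.

```python
from math import log


def doubling_time(growth_rate_per_year):
    """Years needed for an exponentially growing population to double."""
    if growth_rate_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return log(2) / growth_rate_per_year


# Approximate exponential growth rates (per year) quoted in the text.
rates = {
    "genotype 6, all lineages": 0.035,
    "subtype 6a, Hong Kong": 0.17,
    "genotype 4, Egypt": 0.25,
}
for label, rate in rates.items():
    print(f"{label}: doubling time of about {doubling_time(rate):.1f} years")
```

Under this assumption, the genotype 6 average corresponds to a doubling time of roughly 20 years, compared with about 4 years for subtype 6a in Hong Kong and under 3 years for genotype 4 in Egypt.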
FIG. 6. Bayesian skyline plot, showing the epidemic history of genotype 6 estimated from the concatenated (core plus NS5B) data set (see text for details). The thick black line represents the estimated effective population size through time. The gray area represents the 95% highest posterior density confidence intervals for this estimate.
The genotype 6 clusters highlighted in Fig. 5 can be described as "local epidemic" strains, which transmitted rapidly during the 20th century in specific locations but did not spread internationally in the manner of the "global epidemic" subtypes 1a, 1b, and 3a (56). It is therefore probable that local epidemic lineages are characterized by transmission routes that are different and more varied than the routes associated with global epidemic subtypes. Local epidemic strains have previously been noted in Africa, particularly subtype 4a in Egypt, which has been associated with large-scale injectable antischistosomiasis treatment campaigns (10, 44). Our results indicate that local epidemic subtypes are also common in Asia. Since it is reasonable to assume that similar epidemiological factors have affected other HCV genotypes, we further argue that all common HCV subtypes are the result of selective amplification of endemic lineages, either locally or globally, during the 20th century. This hypothesis can explain why HCV subtypes contain unusually similar levels of genetic diversity and why they are highly phylogenetically distinct.
In the context of the scenario described above, the pattern of HCV genetic diversity in Laos is unusual, since there are no discernible "20th century" clusters of Lao sequences (Fig. 5). This is unlikely to be a sample size artifact, since smaller or equivalent-sized samples from other countries are sufficient to identify tight sequence clusters. Our phylogeographic analysis indicates that the Lao strains are clustered, but less strongly than strains from Vietnam, Thailand, or China. Five Lao isolates group together with the subtype 6b isolate TH580, but the branching events in this cluster substantially predate the 20th century (Fig. 5). Therefore, HCV in Laos appears to have been less involved with whatever events amplified endemic genotype 6 lineages elsewhere in Asia; the low prevalence of HCV in Laos (~1%) compared to nearby countries also supports this notion (21, 69). Although we can only speculate on the reasons for this, differences in health care infrastructure among countries may be important, particularly if iatrogenic and nosocomial transmission has contributed to Asian HCV spread. We have been unable to find formal comparisons of investment in public health campaigns (such as vaccinations and mass treatment with injectable drugs) during the first half of the 20th century. However, the information available suggests that the French colonial authorities in Laos invested less in such public health campaigns than the authorities in other mainland Southeast Asian countries (24, 55). In the second half of the 20th century Laos suffered civil war, extraordinarily destructive aerial bombing by the United States, and economic hardship in the wake of the war and the 1975 revolution, resulting in relatively low investment in public health intervention until the mid-1990s. Furthermore, before the 1990s the country had few reliable transport links (49). Hence, it is possible that the Lao population has experienced comparatively lower levels of mass exposure to blood-borne viruses such as HCV.
Our analyses show that genotype 6 infections worldwide are descended from a common ancestor that existed around 1,100 to 1,350 years ago (95% credible region, 600 to >2,500 years ago) (Fig. 4). This long time scale is based on extrapolation of HCV evolution observed over a much shorter time span of about 25 years. Although there is no obvious reason why HCV rates should significantly vary through time (and our relaxed clock results suggest that they do not), we note that such estimates should be interpreted cautiously and are more likely to underestimate clade age than to overestimate it (17).
Prior to the 20th century, HCV transmission in Asia appears to have been characterized by long-term low-level infection in areas of endemicity (Fig. 6). We currently have almost no understanding of how stable, endemic transmission of HCV can be maintained for many centuries (46) and no idea how the virus spread across Asia from a common ancestor. Our results do indicate that endemic genotype 6 lineages were historically associated with different locations (e.g., subtype 6f in Thailand, 6g in Indonesia, 6d in Vietnam, and 6q in Cambodia), with multiple genotype 6 lineages being present in modern-day Vietnam, Thailand, and Laos (Fig. 5). In addition, the presence of old phylogenetic nodes that connect different country-specific lineages (Fig. 2, 3, and 5) suggests that at least some historical gene flow occurred. However, the significant spatial structure we observed demonstrates that genotype 6 gene flow is restricted. Furthermore, it appears to be limited by distance, as sequences from pairs of nearby nonadjacent countries (Thailand-Vietnam, Myanmar-Vietnam, Myanmar-Cambodia, China-Cambodia, or China-Thailand) do not tend to cluster with each other (Fig. 2 and 3). In contrast, isolates from Laos group with strains from several neighboring countries, as expected given Laos' geographically central position in mainland Southeast Asia. Similarly, strains from Myanmar are often found intermingled with those from neighboring Thailand. Of course, historical patterns of movement may not be well represented by classifications based on current political borders.
The evolutionary models employed in our analyses included a relaxed molecular clock that estimated the degree to which the rate of molecular evolution varies across a phylogenetic tree (8). The analyses indicated a larger-than-expected amount of rate variation for HCV. Incorporating this rate heterogeneity increases the accuracy of our estimates (8) and the realism of our analysis, and therefore we recommend that future phylogenetic analyses of HCV gene sequences use this, or a similar, approach. However, such methods cannot be reliably applied to small data sets or short sequences. It is common for HCV molecular epidemiological surveys to produce short subgenomic fragments around 300 nt long, which by themselves are insufficient to reliably estimate phylogenetic groupings (34, 52). The compromise solution used here was to increase statistical power by concatenating multiple subgenomic sequences, but at the expense of a reduction in the number of available reference strains for comparison. Since viral sequences in databases can prove useful for many years after their initial investigation, we encourage the standardization of the subgenomic regions used and the production of longer gene fragments per isolate.
We are very grateful to the participating patients, and to the doctors, nurses, and technical staff of Mahosot Hospital: Vimone Soukkhaserm, Mayfong Mayxay, Nicholas J. White, Amphay Phyaluanglath, Somphone Phannouvong, Pathila Inthepphavong, and Martin Stuart-Fox.
Published ahead of print on 29 October 2008.
O.G.P. and E.B. contributed equally.