High-Resolution Analysis of Intrahost Genetic Diversity in Dengue Virus Serotype 1 Infection Identifies Mixed Infections

ABSTRACT Little is known about the rate at which genetic variation is generated within intrahost populations of dengue virus (DENV) and what implications this diversity has for dengue pathogenesis, disease severity, and host immunity. Previous studies of intrahost DENV variation have used a low frequency of sampling and/or experimental methods that do not fully account for errors generated through amplification and sequencing of viral RNAs. We investigated the extent and pattern of genetic diversity in sequence data in domain III (DIII) of the envelope (E) gene in serial plasma samples (n = 49) taken from 17 patients infected with DENV type 1 (DENV-1), totaling some 8,458 clones. Statistically rigorous approaches were employed to account for artifactual variants resulting from amplification and sequencing, which we suggest have played a major role in previous studies of intrahost genetic variation. Accordingly, nucleotide sequence diversities of viral populations were very low, with conservative estimates of the average levels of genetic diversity ranging from 0 to 0.0013. Despite such sequence conservation, we observed clear evidence for mixed infection, with multiple phylogenetically distinct lineages present within the same host, while the presence of stop codon mutations in some samples suggests the action of complementation. In contrast to some previous studies, we observed no relationship between the extent and pattern of DENV-1 genetic diversity and disease severity, immune status, or level of viremia.

Dengue virus (DENV) is a single-strand positive-sense RNA virus of the family Flaviviridae and exists as four closely related antigenically distinct serotypes denoted DENV-1 to DENV-4. These serotypes differ at the consensus level by 25 to 40% at the amino acid level (15,30).
Genetic variation within each of the four serotypes is defined as a series of "genotypes" (or "subtypes"), which can vary from one another by up to ~6 to 8% and 3% at the nucleotide and amino acid levels, respectively (15,25,32). For example, at least four major genotypes of DENV-1 exist, each with a different geographical distribution (9,38). The basis for the genetic diversity in DENV is its error-prone RNA polymerase (10), such that mutations commonly occur during viral replication, providing variation on which a combination of genetic drift and negative and/or positive natural selection is able to act. This high rate of replication error results in DENV existing as a population of closely related variants within an individually infected host (33,34), and this intrahost genetic diversity has been proposed to have implications for pathogenesis of DENV infection, variable disease outcomes, virus evolution, and host immunity (6).
Several previous studies have confirmed that the population of DENV in humans and within individual Aedes mosquitoes contains measurable genetic variation (8,19,33,34). Levels of within-host genetic diversity have been previously shown to vary among patients. Reported levels of intrahost genetic diversity ranged from 0.21 to 1.67% for the E gene of DENV-3, with genome-defective DENVs observed in 3.9% to 5.8% of clones (33,34). Another similar study showed that the intrahost diversity for the C and NS2B genes ranged from 0.12 to 1.02% and 0.16 to 1.20%, respectively (34). Lin et al. (19) showed that DENV exhibits substantial sequence diversity in humans and to a lesser extent in mosquitoes, with the major variant transmitted in both humans and mosquitoes. Intriguingly, Descloux et al. suggested that the level of intrahost genetic diversity was lower in patients suffering severe dengue disease, i.e., dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS), than in those experiencing the milder dengue fever (DF), such that there is a direct link between viral genetic diversity and clinical outcome (8).
All previous studies of intrahost DENV genetic diversity have utilized point measurements in which a limited set of clones (n = 10 to 50) containing short, amplified segments of the viral genome were sequenced. In most cases the proportion of mutations due to experimental (PCR/sequencing) error in these studies is uncertain, but it is likely an important contributor to the levels of diversity observed. Beyond this limited sampling of population diversity, it is unknown whether the extent of sequence variation changes during the course of infection, and the relationship between intrahost genetic variation and dengue severity remains unclear. To address these issues, we undertook an expansive study of intrahost DENV variation by sequencing a median of 155 high-quality clones from serial plasma samples taken from 17 patients infected with DENV-1 and applying rigorous quality control to exclude artifactual mutations. With these data we explored the relationship between intrahost genetic diversity and clinical outcome, focusing on the sequence encoding domain III (DIII) of the envelope (E) gene. Importantly, DIII is involved in cell receptor binding and is the major target of virus-neutralizing antibodies in humans (3,21), and hence mutations within this region may have important functional consequences.

MATERIALS AND METHODS
Study population. Plasma samples from dengue patients included in the placebo arm of a clinical trial of chloroquine were used for this study (29). We selected 17 patients for study based on the serotype of infection (i.e., DENV-1), serological response (i.e., primary or secondary), and disease severity (i.e., DF or DHF). Classification of disease severity was according to 1997 WHO classification criteria (36). For each patient, three sequential plasma samples, beginning with the enrolment plasma sample, were selected for analysis. Samples were selected to represent the breadth of viremia levels found in DENV-1-infected patients. Briefly, we selected three primary DF, seven secondary DF, and seven secondary DHF patients, with median ages of 19, 19, and 20 years, respectively, and a male/female ratio of 1.3. The median day of illness at admission was 2.2 (range, 0.6 to 2.8 days).
Whole-genome (consensus) sequencing of DENV-1. Viral genomes in the enrolment samples were sequenced as part of the Broad Institute's Genome Resources in Dengue project using a capillary sequencing-directed amplification viral sequencing pipeline as previously reported (24). In short, isolated viral RNAs were reverse transcribed, and then overlapping amplicons that span the complete genome were amplified using a high-fidelity polymerase; resulting products were Sanger sequenced, and the resulting sequence coverage was ~8-fold. Resulting sequence reads were assembled using the Broad Institute's AV454 algorithm (13a). Consensus assemblies were used for alignment of clone reads as part of the variant calling process (see below).
RNA extraction, real-time PCR, cloning, and sequencing. Dengue viral RNA was isolated directly from plasma using the QIAamp viral RNA minikit (Qiagen, Germany). RNA was reverse transcribed, and DENV-1 viremia levels were assessed using an internally controlled, serotype-specific, real-time reverse transcriptase PCR (RT-PCR) assay that has been described elsewhere (18); results were expressed as cDNA equivalents per ml of serum.
PCR amplimers were cloned into the T/A cloning vector pCRII-TOPO, which was transformed into TOP10 competent cells (Invitrogen). Each transformation culture was plated out on Luria-Bertani (LB)-ampicillin-isopropylthiogalactoside (IPTG)-5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal) plates and grown overnight at 37°C. A total of 382 white colonies (suggestive of amplicon insertion) were selected from each sample and sequenced using dye terminator chemistry on an ABI 3730xl sequencer (Applied Biosystems) from both ends to generate paired end reads and quality files.

Variant calling. (i) Read alignment and merging.
Reads from each sample were aligned to the consensus genome sequence present in the enrolment plasma sample using the BLAST-Like Alignment Tool (BLAT) version 33 (17). A custom script was used to merge overlapping forward and reverse reads, simultaneously assign appropriate base quality scores, and trim the resulting reads to the target amplicon sequence.
Overlapping forward and reverse reads were merged into a single contig and assigned quality scores. To control for poor alignment at the ends of reads, forward and reverse reads were required to have at least 5 bases aligning into the designed primer (i.e., DIII-E P3 and DIII-E P5) or were trimmed backwards 5 bases from the end of their alignment. The quality scores were assigned based on the agreement or disagreement of the bases between the forward and the reverse reads. The sum of quality scores was assigned for bases agreeing; bases disagreeing were assigned to the base with the highest quality score and the quality score was assigned as the difference. Gaps were given quality equal to the lower quality of the adjacent base or the lowest quality of any contiguous base of the same type (homopolymer adjustment); a base(s) was discarded when the gap had higher quality than the inserted base(s) on the opposite strand, and bases retained their quality scores if the quality of bases was higher. Indels of the same length in both reads were retained as real. Complex events (e.g., inserts relative to reference opposite deletions or insertions or deletions of different length) were replaced with a number of Ns equal to the length of the consensus between the two flanking consistent alignments and given a quality score of 0. When the overlapping region of the forward or reverse read did not extend to the designed primer, the merged read was extended to include whichever read had the largest number of aligning bases on that side of the overlap and was assigned the raw quality for those bases. In cases where both complement forward or reverse reads did not align, we trimmed the single read to the target amplicon region and retained it for variant calling.
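The quality-score rule for overlapping bases can be illustrated with a minimal sketch. It covers only the agree/disagree rule stated above (summed quality on agreement; higher-quality base and quality difference on disagreement). The function names and the tie-breaking choice are assumptions made for illustration, and gap, indel, and read-extension handling are deliberately left out; this is not the custom script used in the study.

```python
from typing import List, Tuple

BaseCall = Tuple[str, int]  # (base, PHRED-like quality)


def merge_base(fwd: BaseCall, rev: BaseCall) -> BaseCall:
    """Merge one aligned position from a forward and a reverse read."""
    fwd_base, fwd_q = fwd
    rev_base, rev_q = rev
    if fwd_base == rev_base:
        # Agreement: the two quality scores are summed.
        return fwd_base, fwd_q + rev_q
    # Disagreement: keep the higher-quality base, quality = difference.
    # Assumption: on an exact tie the forward base is kept with quality 0.
    if fwd_q >= rev_q:
        return fwd_base, fwd_q - rev_q
    return rev_base, rev_q - fwd_q


def merge_overlap(fwd: List[BaseCall], rev: List[BaseCall]) -> List[BaseCall]:
    """Merge two equal-length, already aligned overlap segments."""
    if len(fwd) != len(rev):
        raise ValueError("overlap segments must have the same length")
    return [merge_base(f, r) for f, r in zip(fwd, rev)]
```

For example, two reads that both call A with qualities 30 and 25 yield A with quality 55, whereas A (quality 30) opposite G (quality 12) yields A with quality 18.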
(ii) Base variant calling. To reduce false-positive base variant calls, we employed a neighborhood quality standard (NQS) algorithm (2) to filter bases used for variant calling. Bases not meeting the NQS condition were excluded; a base satisfies the NQS condition if it has a PHRED score of ≥20 and the five neighboring bases on each side have PHRED scores of ≥15. Two variant base data sets were generated for downstream analysis. In the first, highest-quality data set, defined as VP, base variants were called using the V-Phaser algorithm (A. Macalalad et al., submitted for publication). In short, V-Phaser applies an error probability model defined by a process read error rate, and refined by the inclusion of variant nucleotide phasing information, to define the frequency at which a nucleotide polymorphism needs to be observed to be a true variant given the observed sequence coverage. In general, for the data sets analyzed as part of this study, variants were identified as real if they were observed on two or more reads. To explore how erroneous PCR and sequencing may have contributed to the observed levels of genetic diversity, we generated a second variant data set, defined as 1HQ, that included variants that were seen only once (i.e., singletons). In both the 1HQ and VP data sets, only high-quality bases that passed NQS were used for base variant calling.
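The NQS condition described above can be sketched as a simple per-base filter. This sketch implements only the two quality thresholds given in the text (central base ≥20, five neighbors on each side ≥15). The function names are hypothetical, and treating bases that lack a full five-base flank as failing is an assumption, since the text does not say how read ends were handled.

```python
from typing import List


def nqs_pass(quals: List[int], i: int, min_q: int = 20,
             min_neighbor_q: int = 15, flank: int = 5) -> bool:
    """Return True if the base at index i satisfies the NQS thresholds."""
    if quals[i] < min_q:
        return False
    # Assumption: a base without `flank` neighbors on both sides fails.
    if i - flank < 0 or i + flank >= len(quals):
        return False
    neighbors = quals[i - flank:i] + quals[i + 1:i + flank + 1]
    return all(q >= min_neighbor_q for q in neighbors)


def nqs_mask(quals: List[int]) -> List[bool]:
    """Per-base mask of positions usable for variant calling."""
    return [nqs_pass(quals, i) for i in range(len(quals))]
```

Only positions where the mask is true would then be passed on to variant calling in either the VP or the 1HQ data set.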
(iii) Variant haplotype calling. For each aligned read (see "Read alignment and merging" above), we computed a vector of valid base variant calls (see "Base variant calling" above). The minimal set of such vectors required to explain all reads was collected using a custom haplotype calling algorithm. For each sample, the algorithm was seeded with a single read and then reads were assigned a haplotype. If a read matched unambiguously based on the variant positions to an existing haplotype group (in the first iteration the match is to the seed read), it was assigned that haplotype; otherwise, it was assigned as a new haplotype. This process was iterated until all reads were grouped into haplotypes defined by variant vectors. We assigned reads that have variant vectors with missing data (e.g., due to failure to align or presence of a call which is not considered valid) by a similar process. For reads whose partial vector maps unambiguously to a complete haplotype, the missing information is "corrected" based on the complete vector; those reads that do not map unambiguously are assigned as "incomplete" haplotypes.
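The grouping logic can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the custom algorithm itself: each read is represented as a tuple with one entry per variant position, `None` marks a missing call, complete reads are grouped by exact match (equivalent in outcome to seeding with one read and iterating), and the name `call_haplotypes` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[Optional[str], ...]  # one call per variant position; None = missing


def call_haplotypes(vectors: Sequence[Vector]) -> Tuple[Dict[Vector, int], List[Vector]]:
    """Group reads into haplotypes; return (haplotype counts, incomplete reads)."""
    counts: Dict[Vector, int] = {}
    partial: List[Vector] = []
    for v in vectors:
        if None in v:
            partial.append(v)
        else:
            # Complete reads: join a matching haplotype or start a new one.
            counts[v] = counts.get(v, 0) + 1

    haplotypes = list(counts)  # only complete reads define haplotypes
    incomplete: List[Vector] = []
    for v in partial:
        # A haplotype is compatible if it agrees at every non-missing position.
        matches = [h for h in haplotypes
                   if all(a is None or a == b for a, b in zip(v, h))]
        if len(matches) == 1:
            counts[matches[0]] += 1  # "corrected" to the unique complete haplotype
        else:
            incomplete.append(v)     # ambiguous or unmatched
    return counts, incomplete
```

A read with calls `("G", None)` would be assigned to a haplotype `("G", "T")` if that is the only complete haplotype carrying G at the first position, whereas a read with no valid calls at all is compatible with every haplotype and stays incomplete.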
Evolutionary analysis. (i) Measurements of genetic variation. Alignments of full-length pseudoreads (i.e., all valid variants) from the haplotypes were generated with the MUSCLE software (version 3.7) (11), using default settings. Because of the very low numbers of mutations observed, the mean pairwise genetic diversity within each sample was calculated from the uncorrected pairwise distance matrix (p distance) between taxa, and the population standard error (SE) was estimated with 1,000 bootstrap replicates using the MEGA5 program (27). To estimate the mean numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site (dN/dS ratio) in each sample, we utilized the Jukes-Cantor substitution model within MEGA5 (27). Mutations detected within each sample were further characterized as to their frequency and presence in other samples and were mapped to inferred amino acid sequences.
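For reference, the uncorrected mean pairwise distance can be written in a few lines. This is a minimal sketch of the p-distance calculation only, assuming equal-length aligned sequences without gaps or ambiguity codes; it does not reproduce MEGA5's gap handling or its bootstrap standard error, and the function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_diversity(seqs: Sequence[str]) -> float:
    """Mean p-distance over all unordered pairs of sequences (clones)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

With one sequence per clone, a sample in which every clone matches the consensus gives a diversity of 0, and a single variant clone among many identical ones contributes only through the pairs that include it, which is why the values reported for these samples are so small.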
(ii) Pattern of intrahost evolution. The evolutionary relationships among the DENV-1 sequences from each sample were inferred through the construction of minimum spanning networks, utilizing the program TCS 1.21 (5) and following the algorithm of Templeton et al. (28). Inferences from this method depend on the chosen probability of parsimony, and we chose a value of 99% (i.e., a 99% connection limit). The number of mutational differences associated with the probability just before the 99% cutoff is the maximum number of mutational connections allowed between pairs of sequences. Networks that were unconnected at the 99% probability of parsimony were linked by decreasing the connection probability. The power of this approach is that it allows the population frequency of each mutation to be assessed, and the parsimony-based approach is justified by the small number of total mutations observed.
(iii) Global DENV-1 phylogenetic inference. To determine the frequency of mixed infections in our data sets, the sequences of each individual patient were aligned together with 1,390 previously published DENV-1 E gene sequences (i.e., "background data set"), which combine subsets of genotype I (n = 1,111), II (n = 91), and III (n = 188). Phylogenetic trees for these data were then estimated using the maximum-likelihood (ML) method available in the RAxML package (version 7.0.4) (26). In all cases we used the GTR+Γ4 model of nucleotide substitution, as determined by ModelTest v3.7 (23). The reliability of specific groupings on the trees was estimated using bootstrapping with 1,000 pseudoreplicates.
Nucleotide sequence accession numbers. All nucleotide sequences generated here have been submitted to GenBank and assigned accession numbers 2262271431 to 2299350311 (see Table S2 in the supplemental material).

RESULTS
Extent and pattern of intrahost genetic variation.
The clinical, serological, and demographic features of the 17 DENV-1-infected patients who participated in this study are shown in Table 1. To determine the intrahost evolutionary dynamics of DENV-1 in these patients, we studied genetic diversity in 49 serial plasma samples collected during the course of their illness. Overall, we sequenced 8,458 clones of the 463-nucleotide region carrying DIII of the E gene. In the VP data set, 8,458 clones were assigned into complete haplotypes, with a median of 155 (range, 4 to 362) clones analyzed at each time point (Table 2); these data excluded singleton mutations and included only high-quality variant positions that were seen frequently enough at a given sequence coverage to be unlikely to occur as a result of error alone (i.e., typically observed at least twice). In the 1HQ data set, which contains all variants observed, including singletons that may be artifacts resulting from process errors, 8,315 and 143 clones were assigned to complete and incomplete haplotypes, respectively. A median of 155 (range, 4 to 361) clones were analyzed at each time point (Table 3).
In the VP data set, which included only highly confident variant calls but which may have excluded some bona fide mutations at low frequency, we identified a total of 281 nucleotide mutations across the 8,458 clones of the 463-nt region (Table 2), corresponding to a mutational frequency of 7.2 × 10⁻⁵ (95% confidence interval [CI], 6.4 × 10⁻⁵ to 8.1 × 10⁻⁵) mutations per nucleotide site. Across all patients and time points, these mutations were observed at 43 residues (see Table S1 in the supplemental material). In all patients, the majority of sequences (65 to 100%; mean, 97%) recovered were identical to the consensus. A measure of genetic diversity could be calculated in 18 samples, with mean values of pairwise distance ranging from 0.00005 to 0.00130 (mean, 0.00034) (Table 2). There was no significant difference in the mean pairwise distance between patients with DHF and DF (0.00030 versus 0.00041) (Table 2). To determine the selection pressure affecting DENV within each patient, we estimated the mean dN/dS value for each sample. Mean dN/dS values varied between 0.13 and 1.9, with an average value of 0.23. Of the 8,458 clones sequenced, 4 clones contained a total of 6 stop codons (0.05%) (Table 2). In sum, these stringently filtered data provide a conservative picture of the level of genetic diversity in these samples; the variants they contain are very likely to be real biological variants, and they suggest that sequence diversity in viral populations may be very low.
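The mutational frequency quoted above follows directly from the counts: 281 mutations over 8,458 clones of 463 nt each. The sketch below reproduces that arithmetic and attaches a 95% interval. The choice of a Wilson score interval is an assumption made for illustration, because the text does not state which interval method was used; the function names are likewise hypothetical.

```python
import math
from typing import Tuple


def mutation_frequency(n_mutations: int, n_clones: int, region_len: int) -> float:
    """Mutations per nucleotide site sequenced."""
    return n_mutations / (n_clones * region_len)


def wilson_interval(k: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a proportion k/n (z = 1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts taken from the VP data set as reported in the text.
sites = 8458 * 463                      # total nucleotide sites sequenced
freq = mutation_frequency(281, 8458, 463)
low, high = wilson_interval(281, sites)
```

Under this assumption the point estimate is about 7.2 × 10⁻⁵ and the interval rounds to 6.4 × 10⁻⁵ to 8.1 × 10⁻⁵, in line with the values reported for the VP data set; an exact binomial or Poisson interval would give very similar bounds at this sample size.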
Intrahost phylogenetic relationships. To infer the evolutionary history of mutations in each sample, we inferred minimum spanning networks (Fig. 1 and 2). In five patients (i.e., patients 49, 121, 154, 323, and 391), the viral population harbored only the consensus sequence. Six patients (i.e., patients 59, 82, 107, 336, 349, and 376) contained haplotypes that are multiple mutational steps (≥2) away from the consensus sequence, such that longer branches stem from the consensus sequence. In addition, two patients (82 and 162) harbored multiple phylogenetically distinct viral lineages (i.e., haplotypes) across multiple time points (Fig. 1A and B). A third patient (patient 336) also supported multiple haplotypes when the parsimony probability was reduced to 97% (Fig. 1E); in this network, hap 1 (n = 2) required seven additional mutational steps, which was suggestive of mixed infections.
Evidence of mixed infections. Notably, one sample (G2542, patient 336) contained two identical clones that differed by 7 nt (1.5%) from the consensus sequence and hence showed a difference far greater than that observed in the majority of other patients (mean = 0.1%). This prompted us to determine whether the high level of genetic diversity in patient 336 was due to mixed infections from different origins within the global diversity of DENV-1. Phylogenetic analyses of the alignment of all haplotypes of each patient with the "background data set" (a global sample of DENV-1 E DIII sequences from GenBank) provided strong evidence for multiple infections, all involving genotype 1 viruses (Fig. 3). Specifically, patient 336 harbored a mixed infection with viruses from clades 1 and 5 (clades are as described by Raghwani et al. [24]).
Analysis of the 1HQ data set. As a comparison with the high-quality but conservative VP data set and to assess the likely extent of sequencing errors, we performed an additional analysis of the 1HQ data set. Among the 8,315 clones of the 1HQ data set, 2,936 nucleotide mutations were observed, corresponding to a mutation frequency of 7.6 × 10⁻⁴ (95% CI, 7.4 × 10⁻⁴ to 7.9 × 10⁻⁴) mutations per nucleotide site (Table 3). A total of 1,922 amino acid mutations were observed. Most clones carrying amino acid changes harbored a single amino acid mutation (n = 1,434; 17.4% of clones), while 2.8% of clones carried multiple amino acid mutations. Mean estimates of pairwise genetic diversity varied from 0.00048 to 0.00360 (mean, 0.00164), and the mean dN/dS values ranged from 0 to 1.6 (mean, 0.58).
These dN/dS values are much higher than those seen between patients, which are normally <0.1, suggesting that intrahost variation is characterized by transient deleterious mutations or by mutations introduced by the experimental procedure, either of which elevates dN/dS values (14). In addition, 36 in-frame stop codons in 32 clones were identified in 17 of the 49 samples studied (Table 3). Genome-defective DENVs were observed in 0.38% of clones. All mutations in the VP data set (n = 43) were also observed in sequential samples and/or across multiple patients in the 1HQ data set (see Table S1 [left] in the supplemental material). Many mutation positions (n = 625 of 845; 74%) were observed in sequential samples and/or across multiple patients in the 1HQ data set, but these mutations lacked the statistical support to be called valid variants in the VP data set.

DISCUSSION
The intrahost population genetic structure of DENV has previously been described as a population of closely related sequences (1,7,14,19,33,34). Our study, which comprises the largest series of samples and patients as well as stringent filtering of sequence quality, confirms these observations but shows that the frequency of mutations in the virus population is much lower than previously reported. The mean pairwise genetic diversity varied between 0.00048 and 0.00360 and between 0.00005 and 0.00130 in the 1HQ and VP data sets, respectively, with no significant difference in the mean pairwise distance between patients with DHF and DF. The substantially higher sequence variation in our 1HQ data set resembled that described in previous reports (8,19,33,34). However, given that the 1HQ data set undoubtedly includes a significant number of artifactual mutations, the high sequence variation in this data set should be regarded as the upper bound of DENV genetic diversity. As a consequence, it is likely that previous estimates of intrahost genetic diversity in DENV have been inflated by the erroneous inclusion of PCR and sequencing errors in the diversity calculations and hence should be treated with caution.
It is important to note that accurate estimations of intrahost sequence variability depend largely on the accuracy of the experimental procedure, particularly the fidelity of RT-PCR and sequencing. However, distinguishing bona fide from artifactual mutations is not a trivial exercise. Our rigorous approach to error correction relies on (i) the alignment of clonal sequences to a reference sequence for haplotype calling, (ii) the identification of unambiguous mutations with a high quality score of a base(s), and (iii) whether mutations were seen once (1HQ data set) or frequently enough at a given sequence coverage to be unlikely to be from error (VP data set). Indeed, the 1HQ data set must harbor a high, but undetermined, number of artifactual mutations which were likely introduced during reverse transcription, PCR amplification, or sequencing. The process error rate (i.e., RT-PCR plus cloning plus Sanger sequencing) can be expected to be on the order of 2.8 × 10⁻⁶ to 8 × 10⁻⁶/nt/cycle when a proofreading polymerase is used, as reported by Malet et al. (20), which corresponds to an expectation that ~0.036% of the observed variants could be errors in our experimental system (0.024 to 0.269% mutations observed in the 1HQ data set). Conversely, the VP data set undoubtedly represents biological variants but may underestimate the true intrahost sequence variation, as the variant calling algorithm will call singletons as errors despite some of these mutations possibly representing true biological variants. Notably, the probability of a mutation occurring independently at random across multiple sequential samples is very low, and hence singleton variants observed in multiple samples may have a higher likelihood of being true biological variants than those observed in a single patient.
Indeed, although RT, PCR, and sequencing errors likely contribute to the majority of variants observed in the 1HQ data set, we were able to identify mutations that occurred in multiple patients and at multiple time points (see Table S1 in the supplemental material), suggesting that they are biological variants even though they are at low frequency within individual patients and hence are excluded from the VP data set.
Overall, our VP data set indicates that the DIII segment of the E gene in DENV-1 exhibits limited sequence variation during the course of infection. In addition, it is striking that in both the VP and 1HQ data sets, we found no clear evidence for adaptive evolution in the DIII region, in the form of consistently high dN/dS ratios and/or mutations that exhibited a steady increase in frequency, even though it is thought to be the principal target for neutralizing antibodies (3,21). The lack of positive selection in this case is likely to be a function of the fact that dengue is a self-limiting infection in which innate, humoral, and cellular immune mechanisms remove the virus population before evidence of positive selection can be detected (4,13,22,31).
The relationship between viral genetic variation and disease severity has been well documented for human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV) (12,35). For example, higher HIV-1 sequence diversity has been shown to be associated with slower disease progression (35). Similarly, disease progression in HCV infection was associated with measurable genetic evolution, while resolving hepatitis correlated with evolutionary stasis in the acute phase of HCV infection (12). Because our analysis considered a relatively large number of sequences per patient and these patients likely harbored differences in immunological responses, we were able to look for associations between the intrahost diversity of DENV-1 and disease outcome, immune status, or viremia. Notably, we observed no clear evolutionary patterns in relation to any of these variables. These results are in contrast to results reported by Descloux et al. (8), who showed lower intrahost sequence variation in patients with DHF/DSS than in those with DF. The basis for the differences in results between our studies is unknown but could be related to the methods used to filter sequence quality or to sample size. In addition, Descloux et al. assessed a much smaller number of clones (662 clones from 16 serum samples at a single time point), increasing the chance of stochastic effects.
Finally, one of the most striking observations from this study was the presence within some patients of phylogenetically distinct lineages or subtypes of genotype 1 DENV-1, indicative of mixed infection. That these mixed-infection events were also observed within the high-quality VP data sets indicates that they are bona fide. This is the first time that intraserotype mixed infection has been reported for DENV-1, and we likely greatly underestimate its true frequency as we are able to infer the occurrence of mixed infection only when it involves lineages that fall into topologically distinct places on phylogenetic trees (i.e., we cannot identify mixed infection among very closely related viral lineages). Intriguingly, a previous study of DENV-2 evolution also revealed the presence of mixed infection, such that individual patients harbored multiple phylogenetically distinct lineages (1). We therefore conclude that mixed infection is a potentially important contributor to intrahost virus genetic and phenotypic diversity and provides the raw material for intraserotype recombination (16,37). However, we were unable to determine whether these mixed infections represent the simultaneous infection (i.e., coinfection) or superinfection of multiple viral lineages in humans. This is clearly an area that requires additional study.

FIG 2
Minimum spanning networks of intrahost DENV sequence data (VP data set) in which one (B, E, G, J, and K) and/or two (C, I, and J) mutations were shared between haplotypes. All sequences were identical to the consensus in panels A, D, F, H, and L. Refer to Fig. 1 for more information.

FIG 3
Maximum-likelihood (ML) phylogenetic tree for all (n = 89) consensus sequences derived from clones in the VP data set in relation to 1,390 equivalent background DENV-1 sequences collected from GenBank. Red lines represent clones from sample G2542, and red arrows signify mixed infection. Clades are indicated as numbers. Horizontal branches are drawn to a scale of nucleotide substitutions per site, and the tree is midpoint rooted; nodes are ordered increasingly and presented as a polar tree.