Dengue Type 3 Virus in Plasma Is a Population of Closely Related Genomes: Quasispecies

ABSTRACT Using reverse transcription-PCR and clonal sequencing of the dengue virus envelope gene derived from the plasma samples of six patients, we report for the first time that dengue virus circulates as a population of closely related genomes. The extent of sequence diversity varied among patients, with the mean pairwise proportions of difference ranging from 0.21 to 1.67%. Genome-defective viruses were found in 5.8% of the total number of clones analyzed. Our findings on the quasispecies nature of dengue virus and the presence of defective virus in vivo have implications for the pathogenesis of dengue virus.

Dengue virus belongs to the genus Flavivirus of the family Flaviviridae. It consists of four serotypes, DEN-1 to DEN-4. While most dengue virus infections are asymptomatic or present as mild, self-limited dengue fever (DF), some patients may develop severe and potentially life-threatening dengue hemorrhagic fever (DHF)-dengue shock syndrome (5,6,10,25). Epidemics of the four dengue viruses continue to be a major public health problem in tropical and subtropical areas. It has been estimated that approximately 100 million cases of DF and 250,000 cases of DHF occur annually worldwide (5,15).
Due partly to the nonproofreading and thus error-prone nature of viral RNA polymerase, many RNA viruses exhibit a high degree of sequence variation, not only among isolates from different individuals but also among viruses within the same individual (13,26). RNA viruses therefore exist as a population of closely related sequences known as quasispecies (3,8,21). Quasispecies are believed to play an important role in the survival and evolution of RNA viruses as well as in the pathogenesis of disease. Well-studied examples are human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV) (2,4,13,24,26). Little is known about the extent of sequence variation of dengue virus in vivo and its relationship to disease severity. In this study, we investigated sequence variation in the envelope (E) gene of dengue viruses in plasma samples from six patients confirmed to have dengue virus type 3 during an outbreak in southern Taiwan in 1998 (7,11,23,25). We report for the first time that dengue virus is present as quasispecies in plasma. In addition, a deletion and stop codons were found in 4 out of 69 clones analyzed, indicating the presence of genome-defective viruses in vivo.
Dengue virus RNA was isolated from plasma samples collected from acutely ill patients within 7 days of the onset of illness with a QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) as described previously (23). The RNA eluates were subjected to reverse transcription (RT) by using a cDNA synthesis kit (Life Technologies, Rockville, Md.). Based on sequences available from GenBank, outer (d3E206A, d3E422B) and inner (d3E254A, d3E397B) primers were designed to amplify a 430-nucleotide region in the E gene (Fig. 1). This region covers all of domain III (which is presumably the receptor-binding domain), the connecting segment, and the hinge junction to domain II, according to the crystallographic model of the E protein of the tick-borne encephalitis virus (19). An aliquot of cDNA was subjected to first- and second-round PCR with the outer and inner primers, respectively. The PCR conditions were 95°C for 5 min, followed by 30 cycles of 95°C for 1 min, 62°C for 1 min, and 72°C for 1 min, and then 72°C for 5 min. Each PCR product was cloned into the T/A cloning vector pCRII-TOPO, which was taken up by TOP10 competent cells by transformation (Invitrogen, San Diego, Calif.). To avoid the bias due to preferential amplification of certain templates in a single PCR, multiple clones derived from two separate PCRs were picked and sequenced completely with a BigDye terminator cycle sequencing kit and a model ABI 373A automated sequencer (Applied Biosystems, Foster City, Calif.).
The nucleotide sequences of 10 clones derived from dengue viruses from the plasma of a DF patient, ID17, were aligned with the program DNAMAN (version 4.15; Lynnon Biosoft, Quebec, Canada), and their 393-bp regions (excluding the sequences of primers) were compared. There were a total of 33 nucleotide substitutions, of which 30 were nonsilent and 3 were silent. To assess the extent of sequence variation, we determined the mean diversity, which was the number of substitutions divided by the total number of nucleotides sequenced (26). It was 0.84% for ID17 (Table 1). Among the 10 clones analyzed, some clones had more nucleotide substitutions whereas others had fewer. We therefore employed another method, pairwise comparison of each nucleotide sequence using the program MEGA, version 1.02 (Molecular Evolutionary Genetics Analysis, Pennsylvania State University, University Park), to assess the extent of sequence variation. The pairwise p-distances (proportions of difference in distances) thus determined ranged from 0.76 to 2.80% with a mean of 1.67% (Table 1).
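The two measures described above can be illustrated with a minimal Python sketch. It is not the DNAMAN or MEGA implementation: the function names are hypothetical, the toy sequences are invented for illustration, and counting substitutions against the majority-rule consensus of the clones is an assumption, since the reference used for counting is not stated here.

```python
from collections import Counter
from itertools import combinations


def consensus(seqs):
    """Majority-rule consensus of equal-length aligned sequences (assumed reference)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def mean_diversity(seqs):
    """Substitutions relative to the consensus divided by total nucleotides sequenced."""
    ref = consensus(seqs)
    substitutions = sum(a != b for s in seqs for a, b in zip(s, ref))
    return substitutions / (len(seqs) * len(ref))


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_p_distance(seqs):
    """Average p-distance over all unordered pairs of clones."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Invented toy alignment: four 10-nt clones, two of them carrying one substitution each.
toy_clones = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTTC"]
print(mean_diversity(toy_clones), mean_pairwise_p_distance(toy_clones))

# Arithmetic check of the value reported for ID17: 33 substitutions in 10 clones of 393 bp.
print(round(100 * 33 / (10 * 393), 2))
```

In the toy alignment the pairwise measure is twice the mean diversity, because each substitution is counted once against the consensus but contributes to every pair that contains the variant clone. This is in line with the roughly twofold relation between the 1.67% and 0.84% values reported for ID17.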
To further examine the extent of sequence variation of dengue virus in vivo, RT and PCR were carried out for viral RNAs derived from the plasma of five other patients, including three DF and two DHF patients (25). Ten to seventeen clones from each sample were completely sequenced and analyzed, and the results are summarized in Table 1. There were more nonsilent substitutions than silent substitutions for most of the samples examined. The mean diversity ranged from 0.10 to 0.84%, indicating that the extent of sequence diversity varies among patients. Some patients, for example, ID3, had very homogeneous sequences (mean diversity, 0.10%) (Table 1). This was also revealed by the pairwise p-distance (mean, 0.21%) (Table 1). Overall, the extent of sequence variation determined by the mean diversity correlated with that determined by the mean pairwise p-distance (coefficient of correlation, r = 0.9998) (SPSS software, base 8.0; SPSS, Chicago, Ill.). These findings suggested that similar extents of sequence variation were seen in most of the clones analyzed and indicated that dengue virus exists as a population of closely related sequences in vivo.
To investigate the extent and the distribution of sequence variation at the protein level, the deduced amino acid sequences of all clones from each patient were aligned and analyzed. The mean diversity of the amino acids ranged from 0.13 to 2.29%, and the mean p-distances ranged from 0.27 to 4.14% (Table 2). These results are generally in agreement with those at the nucleotide level in that the mean p-distances of amino acids correlated with those of the nucleotides (r = 0.997) (SPSS, base 8.0) (Table 1). Shown in Fig. 2 are the amino acid sequence alignments of viral E proteins from two patients, ID17 and ID20. For ID17, there were a total of 30 amino acid substitutions within the 131-amino-acid region analyzed (Fig. 2A). Some amino acid changes were conservative (such as threonine to serine and isoleucine to leucine) (Fig. 2A), whereas others were drastic (such as glutamic acid to valine or glycine) (Fig. 2A). Two in-frame stop codons were found at amino acid residues 290 (clone 7A) and 301 (clone 4A), indicating that genome-defective dengue viruses were present in the plasma. For ID20, there were 10 amino acid substitutions (Fig. 2B). Of note was a single nucleotide deletion at the third base of amino acid residue 353 (of clone 1B) that resulted in a frameshift and a premature stop codon 3 residues downstream (Fig. 2B). Among the 69 clones analyzed, there were 3 clones that had in-frame stop codons (clones ID17-4A, ID17-7A, and ID19-2B) and one clone (ID20-1B) with a deletion (Fig. 2 and data not shown). This corresponds to a frequency of defective viruses of 5.8%, based on the small fraction of the genome examined in this study.
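The screen for defective clones described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names and toy sequences are hypothetical, sequences are assumed to be uppercase DNA starting in frame, the standard genetic code is used, and a clone is flagged as carrying an insertion or deletion purely because its length differs from that of the reference.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def classify_clone(seq, reference_length):
    """Label a clone 'indel', 'stop', or 'intact' (simplified criteria)."""
    if len(seq) != reference_length:
        return "indel"   # length change, e.g. a single-nucleotide deletion
    if "*" in translate(seq):
        return "stop"    # in-frame stop codon
    return "intact"


def defective_frequency(clones, reference_length):
    """Fraction of clones that are not classified as intact."""
    defective = sum(classify_clone(s, reference_length) != "intact" for s in clones)
    return defective / len(clones)


# Invented 12-nt toy clones: intact, in-frame stop (TTG -> TAG), one-base deletion.
toy = ["ATGGCATTGACC", "ATGGCATAGACC", "ATGGCTTGACC"]
print([classify_clone(s, 12) for s in toy])
print(translate(toy[2]))  # the deletion shifts the frame onto a TGA stop codon

# Arithmetic check of the reported frequency: 4 defective clones out of 69.
print(round(100 * 4 / 69, 1))
```

The 393-bp region corresponds to 131 complete codons, which matches the 131-amino-acid region analyzed. A real screen would align each clone to a reference first, since a length comparison alone cannot locate the deletion or distinguish frameshifts from in-frame indels.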
Using the clonal sequencing analysis, we demonstrated that dengue virus is present as a quasispecies in vivo. This finding suggests that future analysis and interpretation of dengue viral sequences derived from clinical samples should take into consideration the quasispecies structure of dengue viruses in vivo, i.e., the simultaneous presence of multiple variant genomes. The possibility that the sequence variation observed was due to in vitro artifacts needs to be addressed. The error rate of RT was estimated to be around 10⁻⁴ on a complex template (20). The error rate of Taq polymerase was reported to be between 3.9 × 10⁻⁶ and 9.1 × 10⁻⁶/nucleotide/cycle under most PCR conditions (12,13,14). After 60 cycles of PCR amplification (two rounds of 30 cycles), the error frequencies were between 2.3 × 10⁻⁴ and 5.5 × 10⁻⁴, which, together with the RT errors, are lower than the mean diversity (0.10 to 0.84%) observed in our samples. We carried out a control experiment in which a recombinant E clone with a known sequence was transcribed in vitro with T7 polymerase, followed by RT, PCR, subcloning, and sequencing under identical conditions. Among the eight clones analyzed, only one substitution out of 3,144 bases sequenced was found. This corresponds to an error frequency of 0.03%, which was lower than the mean diversity observed in our samples. Considering the error frequencies and mean levels of diversity together, the sequence variation reported in this study is unlikely to be due to in vitro artifacts, though a small proportion of the substitutions might have been introduced by RT or Taq polymerase.
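The error budget in the preceding paragraph can be reproduced with a few lines of Python. The constant and function names are hypothetical, and simply adding the RT and PCR error frequencies is a first-order approximation assumed here for illustration.

```python
RT_ERROR = 1e-4                      # approximate RT error rate per nucleotide (20)
TAQ_ERROR_RANGE = (3.9e-6, 9.1e-6)   # Taq errors per nucleotide per cycle
PCR_CYCLES = 60                      # two rounds of 30 cycles
MIN_MEAN_DIVERSITY = 0.0010          # lowest mean diversity observed (0.10%)


def pcr_error_frequency(rate_per_cycle, cycles=PCR_CYCLES):
    """Expected Taq-introduced substitutions per nucleotide after all cycles."""
    return rate_per_cycle * cycles


def total_artifact_frequency(rate_per_cycle):
    """RT plus PCR error frequency (simple additive approximation)."""
    return RT_ERROR + pcr_error_frequency(rate_per_cycle)


low, high = (total_artifact_frequency(r) for r in TAQ_ERROR_RANGE)
control = 1 / 3144                   # one substitution in 3,144 control bases

print(f"PCR only: {pcr_error_frequency(TAQ_ERROR_RANGE[0]):.2e} to "
      f"{pcr_error_frequency(TAQ_ERROR_RANGE[1]):.2e}")
print(f"RT + PCR: {low:.2e} to {high:.2e}")
print(f"Control experiment: {100 * control:.2f}%")
print("Upper bound below lowest mean diversity:", high < MIN_MEAN_DIVERSITY)
```

Under these assumptions even the upper estimate of the combined in vitro error frequency stays below the lowest mean diversity observed, which is the comparison the argument relies on.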
The extent of sequence diversity observed in this study is similar to what has been reported for acute infection with HIV-1 and HCV, a member of Flaviviridae (4,13,26). In the case of HCV, the mean Hamming distances, which are the averages of the fractions of the amino acid differences taken for all sequence pairs derived from a single sample, ranged from 6 to 7% in hypervariable region 1 (HVR1) of E and ranged from less than 1 to 2% in the region outside HVR1 (4). The mean p-distances of amino acid sequences in our study (equivalent to the mean Hamming distances), ranging from 0.27 to 4.14%, were close to the mean Hamming distances of the region outside HVR1 of HCV and were lower than those in HVR1. This suggests that there is no hypervariable region like HVR1 of HCV identified in the 430-bp E region examined in this study. Consistent with this finding, analysis of the distribution of amino acid substitutions in different domains (the hinge junction, the connecting segment, and domain III) within this region of all clones reveals that there is no clustering of mutations in any particular domain examined (one-way analysis of variance, P = 0.734) (Fig. 2 and data not shown).
It should be noted that the sequence variation examined in this study was in the E gene, which is known to be the major determinant of cell tropism and the target of both humoral and cellular immune responses in many viruses. The E gene is therefore an area likely to be under positive selection. The difference in number of nonsynonymous nucleotide substitutions per site (dN) and the difference in number of synonymous nucleotide substitutions per site (dS) for each case were calculated with the MEGA program based on the method of Nei and Gojobori (17). The ratio of dN to dS was thus determined. With the exception of the mean dN/dS ratio of patient ID17, 2.13, the mean dN/dS ratios of five other patients (ranging from 0.48 to 0.85) were similar to those for the E region outside HVR1 in patients with acute HCV infection (ranging from 0.12 to 0.73) (4). The observation that the dN/dS ratios were generally lower than 1 suggests that strong positive selection for this E region is unlikely to have occurred in most cases of dengue virus infection, which is characterized by a short-lived resolving viremia (5,10,16,22).
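For readers unfamiliar with the dN and dS estimates, the following is a simplified Python sketch of a Nei-Gojobori-style calculation for one pair of aligned coding sequences. It is not the MEGA implementation. The function names and toy sequences are hypothetical, and several simplifications are assumptions: changes to stop codons are counted as nonsynonymous, mutational pathways through stop codons are not excluded, the sequences contain no gaps or stop codons, and the Jukes-Cantor correction is applied to the raw proportions.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all shortest paths."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    paths = list(permutations(diff_pos))
    syn = nonsyn = 0.0
    for order in paths:
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[current] == CODE[nxt]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
    return syn / len(paths), nonsyn / len(paths)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    return -0.75 * log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame sequences of equal length."""
    n_codons = len(seq1) // 3
    s_sites = sd = nd = 0.0
    for i in range(0, 3 * n_codons, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2
        s, n = count_diffs(c1, c2)
        sd += s
        nd += n
    n_sites = 3 * n_codons - s_sites
    return jukes_cantor(nd / n_sites), jukes_cantor(sd / s_sites)


# Invented toy pair: one synonymous (TTT -> TTC) and one nonsynonymous (AAA -> AGA) change.
dn, ds = dn_ds("TTTGCTAAA", "TTCGCTAGA")
print(f"dN = {dn:.4f}, dS = {ds:.4f}, dN/dS = {dn / ds:.2f}")
```

A per-patient value could then be obtained by averaging over all pairs of clones from that patient, although whether the mean ratios reported here were computed in exactly that way is not stated and is assumed for illustration only.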
The relationship between the extent of sequence diversity and disease severity has been studied in several RNA viruses. For example, it was reported that a higher degree of sequence diversity of HIV-1 correlates with slower disease progression, suggesting an adaptive evolution in vivo (24). It has been reported recently that in the acute phase of HCV infection, progressing hepatitis was associated with genetic evolution whereas resolving hepatitis correlated with evolutionary stasis (4). By comparing the six samples in this study, we found that the levels of sequence diversity in the DHF patients were in the same range as those in the DF patients (Table 1). Given the small sample size in this study, more cases need to be studied in the future to investigate the relationship between the extent of sequence diversity and disease severity in dengue virus infection.
Genome-defective dengue viruses containing either stop codons or a deletion, which have been confirmed by the sequencing of both strands, were identified in 5.8% of all clones analyzed. The frequency is similar to what has been reported for HCV, another member of Flaviviridae (13). There was evidence indicating that defective viruses can modulate viral replication in vitro, affect the clinical course of disease, or lead to the establishment of persistent infection (1,3,9). It should be noted that the 393-bp E region sequenced corresponds to 3.7% of the dengue virus genome. Although variation in the E gene may not be representative of the entire dengue virus genome, our finding that 5.8% of the clones in a small fraction of the genome were defective suggests that the frequency of defective viruses might be higher. Further analysis of more sequences of multiple regions would be required to confirm this. Interestingly, while the defective virus was found in only one of the four DF patients in this study, it was found in both of the DHF patients. Whether the higher frequencies of defective viruses seen in DHF patients result from higher levels of viral replication (16,22) or whether the defective viruses contribute to the pathogenesis of DHF remains to be investigated.
Nucleotide sequence accession numbers. The sequences of the E genes of dengue viruses from the six patients studied here have been submitted to GenBank, and their accession numbers are AY053394 through AY053399.