Journal of Virology, March 2004, p. 2426-2433, Vol. 78, No. 5
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.5.2426-2433.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Medicine, Duke University Medical Center,^1 Department of Biostatistics and Bioinformatics and Center for Bioinformatics and Computational Biology, Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27710,^3 Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama 35294^2
Received 1 April 2003/ Accepted 8 November 2003
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Mutation rates in HIV-1 genomes have been studied in two systems. One is an in vitro system in which purified RT is used for primer extension of short RNA or DNA templates (2-4, 17, 19, 41, 42). Many factors, including the purity, source, and concentration of RT, the deoxynucleoside triphosphate pool, and other components in the reaction, influence the RT mutation rate in this system. The other system is a single-round viral-replication method (21, 28-30, 33, 38). The mutation rate is studied by determining cDNA sequences after reverse transcription in live cells and is generally not affected by the factors that affect the in vitro system. Since these two systems are very different, a wide range of mutation rates has been reported. On average, reported mutation rates are high with purified HIV-1 RT (3 × 10^-4 to 6 × 10^-4) and 10- to 20-fold lower with the single-round infection system (3.4 × 10^-5). All previous studies used small DNA or RNA templates, and the overall mutation rate for the complete viral genome could only be extrapolated (2-4, 17, 19, 21, 28-30, 33, 38, 41, 42).
When RT fidelity is studied using partial viral or nonviral gene fragments, it is difficult to fully understand how these mutations affect the biological function of viral genes. To more accurately estimate the viral mutation rate, we have used either the lambda phage library method or the long-range PCR technique in a single-round infection system to obtain and analyze complete and near-full-length HIV-1 genomes. We sequenced a total of 160,000 bp from 19 viral genomes and calculated the mutation rate to be 5.4 × 10^-5 per base per replication cycle, which is equivalent to 1.1 mutations in each genome per infection cycle. Inspection of mutations showed that all site mutations in the protein-coding regions were nonsynonymous mutations and that half of the mutations were deleterious. This study is the first to directly examine the mutation rate of the HIV-1 genome by analyzing complete and near-full-length viral genomes.
| MATERIALS AND METHODS |
|---|
Cloning and sequencing of complete HIV-1 genomes from Gpt-resistant cells. Genomic DNA was extracted from expanded Gpt-resistant cells through the use of a Qiagen blood and tissue kit (Qiagen Inc., Valencia, Calif.) and digested with restriction enzyme XbaI. The 10- to 20-kb genomic DNA fragments were collected from a sucrose gradient (10 to 40%), purified with phenol-chloroform, and cloned in preprepared lambda phage DASHII XbaI arms (Stratagene, Cedar Creek, Tex.). The positive plaques were screened by hybridization with a BH10 full-length genome probe and purified three times using a plaque-to-plaque purification method. The HIV-1 genomes were subcloned into the pBluescript 3.1 vector at the XbaI site. A primer walking method and an automatic ABI 377 sequencer (Applied Biosystems Inc., Foster City, Calif.) were used to sequence the entire HIV-1 genomes on both strands.
Construction of a green fluorescent protein (GFP) proviral clone. To study the viral mutation rate in lymphocyte cell lines, we established a new reporter gene system by cloning the enhanced green fluorescent protein (EGFP) gene in place of approximately 1,000 bp of the beginning of the coding region of gp120 within the HIV-1 molecular clone pNL4-3. Two PCR products were generated using Deep Vent polymerase (New England Biolabs, Beverly, Mass.). The 5' fragment consisted of the region of HIV-1 NL4-3 from the unique SalI site to the beginning of the env gene open reading frame, and the 3' fragment consisted of the EGFP gene amplified from pEGFP-N1 (Clontech, Palo Alto, Calif.), including the Kozak sequence upstream of the EGFP start codon. An NheI site was introduced at the 3' end of the EGFP open reading frame and used for cloning into pNL4-3. A stop codon was introduced at the 5' end of the EGFP gene in frame with the upstream partial vpu gene to prevent fusion with the unused parallel open reading frame present in the EGFP sequence. The two PCR products were joined by PCR sequence overlap extension to create a 1,165-bp fragment that was cloned directly into pNL4-3, replacing the original 1,465-bp pNL4-3 SalI-NheI fragment. The resulting construct, pNLEnvGFP2, expresses EGFP but lacks Env and Vpu expression. The vpr gene was inactivated by replacing the corresponding region with the PflMI-EcoRI vpr deletion fragment from pNLpuroΔvpr (25), yielding the final construct pNLEnvGFP2Δvpr.
Establishment of Jurkat and THP-1 cell lines that carry a pNLEnvGFP2Δvpr proviral genome. Plasmids pNLEnvGFP2Δvpr and pVSV-G (vesicular stomatitis virus glycoprotein) were transfected together into 293T cells to produce pseudotyped infectious viruses. Viruses were harvested from the cell culture supernatant at 48 h and used to infect Jurkat or THP-1 cells in the presence of 10 µg of Polybrene/ml. Virus replication was limited to a single round of infection, owing to the lack of de novo Env production within the infected target cells. Infected Jurkat and THP-1 cells were identified by EGFP fluorescence analysis performed using a FACStarPlus apparatus (BD Biosciences, San Jose, Calif.), and individual cells were cloned (using the automatic cell deposition unit single-cell-sorting feature of the FACStarPlus) into 96-well round-bottom plates. The cells were maintained in RPMI 1640 supplemented with 2 mM L-glutamine, 100 U of penicillin/ml, 100 µg of streptomycin/ml, 10% heat-inactivated fetal bovine serum, and 57 µM β-mercaptoethanol. After the colonies were grown in culture for approximately 3 weeks, colonies positive for GFP expression were identified by fluorescence microscopy or flow cytometry and production of HIV-1 Gag antigen (p24) was confirmed. Four Jurkat and three THP-1 cell clones were selected for sequence analysis.
Long-range PCR amplification. Genomic DNA from expanded cells was extracted with a Qiagen blood and tissue kit (Qiagen Inc.). Near-full-length HIV-1 genomes were amplified (using 0.2 µg of DNA from each cell line) with the primers UP1A (5'-AGTGGCGCCCGAACAGG-3') and LOW1 (5'-CACAACAGACGGGCACACACTAC-3') as previously described (9, 10). To avoid PCR error in any particular PCR amplification, ten PCRs were performed and combined for each sample. Nested primers UP2 (5'-GAGCTCTCTCGACGCAGGAC-3') and LOW2 (5'-TGAGGCTTAAGCAGTGGGTTCC-3') were used to obtain enough PCR products for direct sequencing in cases in which the PCR yield was low in the first round. Expand High Fidelity PCR system enzyme mix (Roche Molecular Biochemicals, Indianapolis, Ind.) was used to minimize misincorporation during PCR amplification. The combined PCR products were directly sequenced as described for the lambda phage clones.
Construction of an RT mutant. A 2-bp mutation (GG to CA) was introduced into the complementary primers G262A+ (5'-GTTAGTGGCAAAATTGAATTGG-3') and G262A- (5'-CCAATTCAATTTTGCCACTAAC-3'). The mutation changed the Gly (GGG) codon to the Ala (GCA) codon. Primers PA (5'-CCAGAGGAGCTCTCTCGACGCA-3') and PB (5'-CCATCCCCTAGCTTTCCCTGA-3') were located outside of the BssHII and NdeI sites in the R7/3-gpt genome. To construct the RT mutant, two PCRs were performed with primers PA/G262A- and PB/G262A+ (with R7/3-gpt as the template) in a total volume of 100 µl containing 20 mM Tris HCl (pH 8.0), 10 mM KCl, 10 mM (NH4)2SO4, 0.1% Triton X-100, bovine serum albumin (0.1 mg/ml), 0.2 mM deoxynucleoside triphosphate, 10 ng of template DNA, 20 pmol of each primer, and 2.5 U of Pfu polymerase (Stratagene, La Jolla, Calif.). The thermocycling conditions were as follows: one cycle of 94°C for 3 min and 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 1 min. For the second-round PCR, 1 µl of PCR product from each PCR and the PA and PB primers were used to connect the two PCR fragments under the same thermocycling conditions (with the exception of a 2-min extension at 72°C). We used high-fidelity enzyme (Pfu) and 20 thermocycles during the PCR amplification to minimize misincorporation. The final PCR product was isolated from an agarose gel, digested with BssHII and NdeI, and ligated into the R7/3-gpt proviral clone. The insertion was confirmed by sequence analysis.
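As a quick sanity check on the mutagenic primers, the following Python sketch confirms that G262A- is the reverse complement of G262A+ and locates the Ala codon (GCA) within G262A+. The sequences are copied from the text; the helper name `reverse_complement` and the zero-based index are illustrative choices and are not part of the published method.

```python
# Mutagenic primer sequences as printed in the text (5' to 3').
G262A_PLUS = "GTTAGTGGCAAAATTGAATTGG"
G262A_MINUS = "CCAATTCAATTTTGCCACTAAC"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Zero-based position of the first GCA triplet (the introduced Ala codon).
ala_codon_start = G262A_PLUS.find("GCA")

print("primers complementary:", reverse_complement(G262A_PLUS) == G262A_MINUS)
print("GCA starts at index:", ala_codon_start)
```

A check of this kind catches transcription errors in primer sequences before they are ordered or compared against a reference genome.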
Inverted PCR amplification. Genomic DNA (3 µg) was digested with PstI in a volume of 60 µl for 3 h at 37°C. The enzyme was inactivated at 65°C for 10 min. A total of 15 µl of the reaction mixture was subsequently self-ligated with T4 DNA ligase overnight at 16°C. The ligated DNA and two nested primer pairs (for first-round PCR, primer pair gagA [5'-ATCACCTAGAACTTTAAATGCATGGG-3'] and U3A [5'-AATCAGGGAAGTAGCCTTGTGTGTG-3']; for second-round PCR, primer pair gagB [5'-GGAGCCACCCCACAAGATTTAA-3'] and U3B [5'-GTAGATCCACAGATCAAGG-3']) were then used to amplify the junction sequence between the cellular DNA and the integrated proviral DNA. The thermocycling conditions for PCR were as follows: 1 cycle of 94°C for 3 min and 30 cycles of 94°C for 1 min, 64°C for 1 min, and 72°C for 4 min followed by 1 cycle of extension at 72°C for 10 min. The PCR products were visualized on a 0.7% agarose gel, purified with a QIAquick gel extraction kit (Qiagen Inc.), and sequenced either directly or from molecular clones.
Data analysis. Statistical data analyses were performed using S-PLUS 6.1 for Windows, release 1 (Insightful Corp.) (for the calculation of contingency tables), and software written in Fortran 90 by one of the authors (T.B.K.) (for confidence intervals). To compute the expected values of the proportions of synonymous, nonsynonymous, and lethal mutants under the null hypothesis, we wrote a program (in Fortran 90) to examine all possible single-point mutations in the HIV genome. In computing the expected values, transition mutations were weighted more heavily (by 4:1) than transversions. All genes containing the mutation were examined for each point mutation, since there are several areas of gene overlap. A lethal change in at least one gene was scored lethal, a nonsynonymous change in at least one gene was scored nonsynonymous, and all others were scored synonymous.
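The enumeration described above can be sketched in Python for a single open reading frame. This is an illustration under stated assumptions, not the authors' Fortran 90 program: it handles one gene only (no overlapping reading frames), treats a newly created stop codon as "lethal", and applies the 4:1 weighting to each individual transition relative to each individual transversion. The function name `expected_proportions` and its parameters are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Purine<->purine and pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def expected_proportions(cds: str, transition_weight: float = 4.0) -> dict:
    """Weighted share of synonymous, nonsynonymous, and lethal (new stop)
    outcomes over every possible single-base change in one reading frame."""
    if len(cds) < 3:
        raise ValueError("coding sequence must contain at least one codon")
    totals = {"synonymous": 0.0, "nonsynonymous": 0.0, "lethal": 0.0}
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for pos, new_base in product(range(3), "ACGT"):
            old_base = codon[pos]
            if new_base == old_base:
                continue
            mutant = codon[:pos] + new_base + codon[pos + 1:]
            mut_aa = CODON_TABLE[mutant]
            is_transition = (old_base, new_base) in TRANSITIONS
            weight = transition_weight if is_transition else 1.0
            if mut_aa == "*" and ref_aa != "*":
                totals["lethal"] += weight
            elif mut_aa != ref_aa:
                totals["nonsynonymous"] += weight
            else:
                totals["synonymous"] += weight
    grand_total = sum(totals.values())
    return {kind: value / grand_total for kind, value in totals.items()}


if __name__ == "__main__":
    # Toy input: a single Gly codon; a real analysis would pass a full gene.
    print(expected_proportions("GGG"))
```

Extending the sketch to the overlapping genes mentioned in the text would require classifying each point mutation in every reading frame that covers it and keeping the most severe outcome.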
| RESULTS |
|---|
The mutations of the HIV-1 genomes in HeLa cells are summarized in Table 1. One mutation was identified in each of the Gpt1 and M2.1 clones, while two mutations were identified in the Gpt2 clone. Among these mutations, three were transition mutations (G to A) and one was a transversion mutation (T to A). All site mutations resulted in either amino acid changes or stop codons (nonsynonymous mutations). Three other clones (M2.3, M2.B, and H12A) did not have any mutations. No deletions or insertions were found in any clones.
To investigate whether each clone represented one independent infection (especially for the clones that have no mutations [M2.3, M2.B, and H12A]), the cellular sequences adjacent to the viral LTR were compared. The sequences from all six clones had unique cellular flanking sequences, and they were all different from the sequence of the original plasmid DNA used for transfection. This confirmed that all the clones were from independent integration events.
Mutation rates of the HIV-1 genome in lymphocytes. The lambda phage library method is a reliable way to study the mutation rate of the HIV genome in a single-round infection system, since only high-fidelity DNA polymerase is used to obtain the entire proviral genome. However, this method is time consuming and labor intensive. Recently, we used the long-range PCR technique to obtain near-full-length HIV-1 genomes for the study of viral variation and evolution (9-12). Only 191 bp of the HIV-1 genome is missing in the amplified PCR product. We tested whether the same method could be used to quickly evaluate the nature and rate of mutations of the HIV-1 genome in lymphocyte cell lines, since most previous studies of RT mutation rates were performed on adherent cells (HeLa) that are not the HIV-1 target cells in vivo (28-30, 33).
A selection system for single-round-infected lymphocyte cells was established by replacing the beginning of the env gene in the HIV-1 genome with the EGFP gene. Plasmids pNLEnvGFP2Δvpr and pVSV-G were cotransfected into 293T cells. Two days later, pseudotyped viruses were harvested and used to infect either Jurkat (T cell) or THP-1 (monocyte) cells. Since plasmid pNLEnvGFP2Δvpr did not contain the env gene, no de novo Env protein would be synthesized after infection and only one round of infection was completed, as seen with our Gpt selection system. Individual cells expressing GFP were sorted with a FACStarPlus cell sorter and seeded into single wells of a 96-well plate. Following expansion of the clones, GFP expression and the presence of p24 protein were confirmed with fluorescence microscopy and a commercial enzyme-linked immunosorbent assay kit, respectively.
Four Jurkat and three THP-1 cell clones were expanded, and the genomic DNA was extracted. Near-full-length HIV-1 genomes were amplified by nested PCR amplifications. To limit the misincorporation generated during PCR amplification, the following steps were adopted. First, a high-fidelity PCR system was used for PCR amplification (Expand High Fidelity PCR system enzyme mix). This enzyme mix has high efficiency for long-range PCR and higher fidelity than standard Taq polymerase. Second, high copy numbers (at least 32,000 copies) of proviral DNA were used for each PCR. Therefore, random misincorporation in any one template at the initial PCR cycles was diluted out and was not detected by direct sequencing of the PCR products. Third, 10 or more independent PCR amplification reactions were carried out for each sample and were combined before sequencing was performed. Finally, all mutations were confirmed by additional independent PCR amplifications (covering only the mutation regions) and sequence analysis.
Near-full-length HIV-1 genomes from four Jurkat and three THP-1 cell clones were sequenced, and the resulting sequences were compared to the original pNLEnvGFP2Δvpr sequence. Three proviral genomes from THP-1 cell clones were sequenced, and mutations were found in all of them. THP-1.15 had two single-base mutations (Table 2). THP-1.4 had a single-base mutation and two deletions. The first deletion (118 bp) was in the vif gene, and the second deletion (23 bp) was in the tat/rev first exon-overlap region. Direct repeats (ACAG and CA) were found for two deletion mutations in THP-1.4 and THP-1.8 clones (Fig. 1), suggesting that both deletions were the result of template misalignment between direct repeats in the genome, as previously reported (37). Among four HIV-1 genomes from Jurkat cell clones, two (Jurkat.1 and Jurkat.3) were found to have two and three base mutations, respectively, while two others (Jurkat.2 and Jurkat.4) did not carry any mutations. No deletions or insertions were found in the proviral genomes from the Jurkat cell clones (Table 2).
Since PCR-amplified HIV genomes did not contain cellular flanking sequences, it was impossible to determine from the genome sequences alone whether they represented independent clones, especially for the clones that were identical to each other. To resolve this, inverted PCRs were performed to amplify the cellular sequences flanking the 5' LTR of the integrated HIV-1 proviral genomes and to investigate whether each clone was from an independent infection (5). The sequence analysis of the PCR products indicated that all cellular sequences from the studied clones were unique. Thus, each clone indeed represented an independent infection event.
The mutation rate of the HIV-1 genome with a mutation in the RT gene. The mutation (G262A) in HIV-1 RT can dramatically increase the dissociation rate constant for templates and primers and decrease its fidelity (1). To investigate whether the mutation in RT would result in an increase of mutations in the single-round infection system, we used a recombinant PCR technique to introduce a 2-bp mutation (GG to CA) at position 262 of RT in the R7/3-gpt genome. The mutation changed Gly to Ala in wild-type RT. The PCR product containing the mutated RT was subsequently cloned into the R7/3-gpt proviral clone at two unique restriction enzyme sites (BssHII and NdeI). Sequence analysis of the mutated clone (R7/3-gptG262A) confirmed the mutations, and no additional mutation was introduced. R7/3-gptG262A and pVSV-G were cotransfected into 293T cells, and pseudotyped viruses were used to infect HeLa cells.
Six clones were expanded, and genomic DNA was extracted. Since the long-range PCR technique proved to be a reliable and much faster means for the study of mutations in a single-round infection system, near-full-length proviral genomes were amplified and sequenced to evaluate the nature and rate of mutations in all six clones (Table 3). Mutations were found in four genomes (G262A.1, G262A.3, G262A.4, and G262A.5); two other genomes (G262A.2 and G262A.6) contained no mutation. G262A.4 and G262A.5 had identical 72-bp deletions. There were two 72-bp long direct repeats in the region, and the mutation deleted one complete repeat. All deletion mutations occurred at the direct repeat sites (Fig. 1).
| DISCUSSION |
|---|
We sequenced 19 complete and near-full-length HIV genomes and found at least one mutation in 12 clones. Seven other clones contained no mutation. The numbers of mutations per genome followed a Poisson distribution. The highest number of mutations identified in any HIV-1 genome was three. A total of 157,993 bases were sequenced, and 21 mutations (14 site mutations, one insertion, and six deletions) were identified (Table 4). The locations of all the mutations are summarized in Fig. 2. The overall mutation rate was estimated at 5.4 × 10^-5 per base per replication cycle (21/391,749 bp), which was similar to the rate (3.4 × 10^-5) estimated by using the lacZ gene as a template in a single-round infection system (28, 29, 32, 40). The genome mutation rate was estimated at 1.1 mutations per genome per infection cycle (21 mutations among 19 genomes), which was similar to that previously estimated by using small nonviral sequences as target templates (32).
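The arithmetic behind these estimates can be reproduced in a few lines of Python. This sketch takes the counts and the 391,749-bp denominator exactly as printed in the text; the Poisson step is an illustrative check added here (it is not the authors' analysis code) that uses the per-genome mean to predict how many of the 19 genomes should carry no mutation. All variable names are hypothetical.

```python
import math

# Counts as reported in the text.
mutations = 21
genomes = 19
denominator_bp = 391_749  # denominator printed alongside the rate
observed_mutation_free = 7

# Per-base and per-genome rates.
rate_per_base = mutations / denominator_bp
rate_per_genome = mutations / genomes

# Under a Poisson model with mean `rate_per_genome`, the probability that a
# genome carries zero mutations is exp(-mean).
p_zero = math.exp(-rate_per_genome)
expected_mutation_free = genomes * p_zero

print(f"rate per base per cycle: {rate_per_base:.2e}")
print(f"mutations per genome per cycle: {rate_per_genome:.2f}")
print(f"expected mutation-free genomes: {expected_mutation_free:.1f} "
      f"(observed {observed_mutation_free})")
```

Under these assumptions the Poisson model predicts roughly six mutation-free genomes, close to the seven observed.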
We used the long-PCR technique to amplify 13 near-full-length HIV-1 proviral genomes from single-round-infected lymphocyte or HeLa cells. Seventeen mutations (10 site mutations, six deletions, and one insertion) were identified, and all were confirmed by additional independent PCR amplification and sequence analysis. With Expand High Fidelity Taq polymerase, a combination of multiple PCRs, the input of high copy numbers of templates, and the confirmation of the mutations by independent PCRs, the long-range PCR method proved to be a reliable and quick method for the study of the mutation rates of the HIV-1 genome.
Other viral proteins (NC and Vpr) have been reported to influence viral RT fidelity and mutation rates (27, 29, 36). By comparing Vpr+ and Vpr- retroviral vectors, Mansky found that the mutation rate in a Vpr- viral clone was fourfold higher than that in a Vpr+ viral clone (12 × 10^-5 versus 3.4 × 10^-5) (29). In our system, all proviral clones contained a defective vpr gene but the mutation rate (5.4 × 10^-5) was similar to that for a Vpr+ viral vector (P = 0.081). To evaluate whether the mutation rate is higher with Vpr protein when a complete HIV-1 genome is used as the template, a functional vpr gene should be restored in our vectors to reevaluate the mutation rates. The presence of the cellular protein CEM15/APOBEC3G was found to severely diminish retrovirus infectivity through deamination of dC to dU during the minus-strand cDNA synthesis (13, 26, 47). The deamination led to highly biased G-to-A hypermutation and resulted in a high rate of defective genomes and decreased infectivity. The HIV-1 vif gene might be able to counteract the CEM15/APOBEC3G-mediated inhibition of retroviral infections. However, it was not clear whether the G-to-A hypermutation was prevented by the presence of the vif gene. Since CEM15/APOBEC3G-negative cells (293T and HeLa) were used for most of the experiments in this study and all the proviral clones contained the functional vif gene, mutations caused by CEM15/APOBEC3G should be negligible in our analysis.
Near-full-length retroviral genomes and either fingerprint or denaturing gradient gel methods (24, 35) were used to estimate the viral mutation rate for Rous sarcoma virus and murine leukemia virus. Since no sequence data were obtained in these studies, however, mutations could not be precisely located and the biological impact of mutations could not be evaluated. By sequencing complete viral genomes after single-round infection, we identified 14 single-base mutations. A total of 13 site mutations occurred in the viral protein-coding regions, and all were nonsynonymous. Among these 13 nonsynonymous mutations (eight amino acid changes and five stop codons), the mutation frequencies at each position of the codons were very similar. We found six, five, and four mutations (two extra mutations from the gene overlap regions) at the first, second, and third positions of amino acid codons, respectively. Therefore, the high nonsynonymous mutation rate cannot be explained by a preference for mutations at the first and second positions in amino acid codons. Since (in general) about two-thirds of random mutations lead to amino acid changes, about 8 mutations (among 13 site mutations) were expected to be nonsynonymous mutations. This expectation is significantly different from what was observed in this study (P = 0.0001). The reason why no synonymous mutation was observed is not clear. It is possible that some synonymous mutations will be identified when more genomes are analyzed.
More than one-third of nonsynonymous mutations (5/13) resulted in dysfunctional genes due to premature stop codons (Table 1, Table 2, and Table 3). All six deletions (23, 72, 118, 300, and 869 bp) were either large or out of frame, and they also led to defective genes. Together, these deleterious mutations accounted for 52% of all mutations (11/21). Many other nonsynonymous mutations were changes between very different amino acids or at highly conserved positions. These mutations might affect the protein conformation and/or biological functions. Some of them might result in nonfunctional genes and further increase the proportion of defective genomes.
Amino acid residues 257 to 266 are highly conserved among retroviral pol genes and form part of an α-helix in the thumb subdomain that interacts with DNA (18). To investigate whether the G262A mutation would increase the mutation rate in the complete genome analysis, we replaced Gly with Ala in the R7/3-gpt clone. We amplified and sequenced six proviral genomes from HeLa cell clones. A total of six mutations (three deletions, one insertion, and two site mutations) were identified. In contrast, no deletion or insertion was detected among the six proviral genomes derived from the wild-type R7/3-gpt clone in HeLa cells. For a deletion or insertion to occur, the primer needs to dissociate from the template and realign to it. Therefore, the presence of higher-level deletion and insertion mutations in R7/3-gptG262A clones could be due to the higher dissociation rate between the template and the primer, as reported previously (1). G262A mutations could increase the dissociation rate constant for the template and primer by over 200-fold and could increase the mutation rate by fourfold (1). In our study, however, the site mutation rate with the R7/3-gptG262A clone (1.6 × 10^-5) was about twofold lower than that observed with the wild-type R7/3-gpt clone (Table 1), and the differences in mutation rates between R7/3-gptG262A and R7/3-gpt genomes were not statistically significant (P > 0.1). More sequences are needed from both R7/3-gpt and R7/3-gptG262A clones to clarify the discrepancies between these two studies.
The G262A mutation in RT also confers resistance to zidovudine (1). The results of this and other studies indicate that drug resistance mutations in the HIV-1 genome may change the mutation pattern and possibly the mutation rate. Many drug resistance-related mutations have been found to modify RT fidelity (4, 20, 31, 45). Since complete genome analysis in a single-round infection system can directly estimate the mutation rate and the potential biological influence of mutations in the HIV-1 genome, viruses that carry mutations conferring resistance to RT inhibitors can be studied in this system to understand how such mutations affect the nature and rates of mutations and how they might influence viral phenotypes.
| ACKNOWLEDGMENTS |
|---|
F.G. was supported by an NIH/NIGMS grant, GM065057.
| FOOTNOTES |
|---|
| REFERENCES |
|---|