Journal of Virology, April 2008, p. 3952-3970, Vol. 82, No. 8
0022-538X/08/$08.00+0 doi:10.1128/JVI.02660-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Departments of Medicine,1 Microbiology, University of Alabama at Birmingham, Birmingham, Alabama 35294,10 Institute of Genetics, University of Nottingham, Nottingham NG7 2UH, United Kingdom,2 Department of Pathology and Laboratory Medicine, Emory University, Atlanta, Georgia 30329,3 Zambia-Emory HIV Research Group (ZEHRG) and Zambia Blood Transfusion Service, Lusaka, Zambia,4 Department of Internal Medicine,5 UNC Center for AIDS Research, University of North Carolina, Chapel Hill, North Carolina 27599,6 Duke Human Vaccine Institute, Duke University Medical Center, Durham, North Carolina 27710,7 Los Alamos National Laboratory, Los Alamos, New Mexico 87545,8 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom9
Received 14 December 2007/ Accepted 29 January 2008
Differing findings concerning the complexity of viruses in the acute and early phases of HIV-1 infection likely result from a combination of factors, including differences in study populations and associated risk behaviors; various clinical and laboratory definitions of acute, early, and chronic infection; and different experimental strategies used to analyze the genetic complexity of the evolving viral quasispecies postinfection. Chief among these are differences in the experimental designs and the methodologies used. A common approach has been to identify subjects within the first several months following infection and to derive viral sequences by bulk or near-limiting-dilution PCR amplification of proviral DNA or plasma RNA in samples obtained from them, followed by cloning, sequencing, and phylogenetic analysis (2, 4, 5, 9, 15, 21, 24, 31, 32, 36, 37, 57). In addition, acute infection cases have been analyzed by using the heteroduplex tracking assay (HTA) (4, 15, 21, 31, 32). While these approaches provide a first approximation of the complexity of transmitted virus(es), they have significant limitations. HTA, for example, interrogates only a fraction of the gene of interest and does not provide sequence information (4, 15, 21, 31, 32). Consequently, HTA allows for only qualitative inferences regarding the genetic complexity of virus populations. Bulk and near-endpoint PCR followed by cloning and sequencing are compromised by Taq polymerase-mediated template switching, which generates recombinants in vitro that do not exist in vivo (6, 26, 35, 43, 54); the introduction of Taq polymerase errors into cloned products (27); and nonproportional representation of target sequences due to template resampling (44, 45) or unequal template amplification and cloning (20, 27). 
Verified endpoint titration of viral nucleic acids eliminates PCR-induced recombination and ensures the proportional representation of target sequences (43), but if the amplicon is molecularly cloned prior to sequencing (as is generally the case), it will contain Taq-induced nucleotide misincorporations (27).
In this study, we sought to develop an experimental strategy that would allow us to amplify and sequence complete HIV-1 env genes from virion RNA in the plasma of infected individuals while avoiding methodological artifacts inherent in previously used approaches. To this end, we adapted methods described by Palmer and colleagues, who used single-genome sequencing of uncloned pro-pol amplicons to identify genetically linked drug resistance mutations in plasma-derived virus (27). The newly developed methods were applied to characterize the evolving quasispecies in a cohort of heterosexual Zambian subjects with acute or early HIV-1 clade C (or other) infections who were screened at 3-month intervals for evidence of incident infections. We also compared the results obtained by single-genome amplification (SGA)-direct sequencing with those obtained by more-conventional bulk PCR cloning and sequencing methods on the same clinical specimens and developed methods for evaluating the sample misidentification and cross-contamination that invariably occur in the setting of a large field trial. Our results define an effective experimental strategy for the molecular analysis of the transmission of HIV-1 and its early evolution in cohorts with differing risk behaviors and in vaccine trials where the identification of transmitted viruses and molecular pathways of virus immune escape can be instrumental in assessing vaccine efficacy or failure.
Laboratory staging of primary HIV-1 infection. The subjects were classified according to the system of Fiebig et al. (8) that is based on the detection of HIV-1-specific RNA, antigen, and antibody in plasma (Fig. 1). The Fiebig classification includes an eclipse phase that precedes the first detection of vRNA and subsequent stages defined by the orderly appearance of viral markers: stage I (vRNA positive, p24 antigen and antibody negative), stage II (vRNA and p24 antigen positive, antibody negative), stage III (enzyme-linked immunosorbent assay [ELISA] antibody positive, Western blot negative), stage IV (ELISA positive, Western blot indeterminate), stage V (ELISA and Western blot positive, p31 integrase antibody negative), and stage VI (ELISA, Western blot, and p31 integrase antibody positive). Plasma samples from the patients were tested for HIV-1 antibodies by ELISA (Enzygnost HIV integral from Dade Behring, Marburg, Germany; Biotest anti-HIV tetra ELISA from Biotest, Dreieich, Germany; or Abbott Murex HIV Ag/Ab combination assay from Murex Biotech Limited, Dartford, United Kingdom) and Western blot analysis (Genetics Systems HIV-1 Western blot from Bio-Rad Laboratories, Hercules, CA). In accordance with diagnostic guidelines, a positive Western blot was defined as reactivity with any two of the following three HIV-1 proteins: the exterior envelope glycoprotein/uncleaved envelope precursor (gp120/gp160), the transmembrane envelope glycoprotein (gp41), and the major core protein (p24). An indeterminate Western blot was defined as any visible band that did not meet the positivity criteria. A negative Western blot included no visible bands. A subset of samples was also monitored for p24 antigen by using a Beckman Coulter HIV-1 p24 antigen ELISA (Beckman Coulter Inc., Fullerton, CA).
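The staging rules above reduce to a small decision procedure. The following Python sketch is illustrative only: the function names `western_blot_result` and `fiebig_stage`, the band labels, and the boolean inputs are hypothetical, and the code assumes that markers appear in the orderly sequence the classification describes.

```python
# Hypothetical helper names; a sketch of the staging rules, not a diagnostic tool.
DIAGNOSTIC_BANDS = {"gp120/gp160", "gp41", "p24"}


def western_blot_result(visible_bands: set) -> str:
    """Apply the stated criteria: positive = any two of the three diagnostic
    proteins; indeterminate = any visible band short of that; negative = none."""
    if len(visible_bands & DIAGNOSTIC_BANDS) >= 2:
        return "positive"
    if visible_bands:
        return "indeterminate"
    return "negative"


def fiebig_stage(vrna: bool, p24_antigen: bool, elisa: bool,
                 western_blot: str, p31_antibody: bool) -> str:
    """Return 'eclipse' or a Fiebig stage ('I' to 'VI') from marker results."""
    if western_blot not in {"negative", "indeterminate", "positive"}:
        raise ValueError("unknown Western blot result")
    if not vrna:
        # Assumption: without vRNA no later marker should be present.
        if p24_antigen or elisa:
            raise ValueError("markers out of order; cannot stage")
        return "eclipse"
    if not elisa:
        return "II" if p24_antigen else "I"
    if western_blot == "negative":
        return "III"
    if western_blot == "indeterminate":
        return "IV"
    return "VI" if p31_antibody else "V"
```

For instance, a sample that is vRNA and p24 antigen positive but ELISA negative is assigned to stage II, as for the earliest samples from subjects studied before seroconversion.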
FIG. 1. Laboratory staging of acute and early HIV-1 infections. (A) Temporal appearance of HIV-1-specific laboratory markers following HIV-1 infection according to the classification system of Fiebig et al. (8). The eclipse phase is defined by the interval between transmission and first detection of vRNA in the plasma and generally lasts about 10 days, with a range of approximately 7 to 21 days (3, 10, 17-19, 40). The mean durations of Fiebig stages I (7 days), II (5 days), III (3 days), IV (6 days), and V/VI (70+ days) are indicated. (B) Time points (x axis) at which plasma samples were obtained for each of the 12 study subjects (y axis). Because subjects were studied at intervals of 3 months, the symbols are positioned to represent the maximum possible number of days from transmission.
50 µl. For SGA and bulk PCR methods, between 1,000 and 20,000 vRNA molecules were typically reverse transcribed.
cDNA synthesis. Reverse transcription of RNA to single-stranded cDNA was performed by using the SuperScript III protocol according to the manufacturer's instructions (Invitrogen Life Technologies, Carlsbad, CA). RNA, deoxynucleoside triphosphates (0.5 mM each), and 0.25 µM primer OFM19 (5'-GCACTCAAGGCAAGCTTTATTGAGGCTTA-3'; nucleotides [nt] 9604 to 9632 of the HXB2 sequence) were incubated for 5 min at 65°C to denature secondary structure of the RNA. First-strand cDNA synthesis was carried out in 20- to 100-µl reaction mixtures with 1x reverse transcriptase buffer containing 5 mM dithiothreitol, 2 U/µl of an RNase inhibitor (RNaseOUT), and 10 U/µl SuperScript III. For SGA, the reaction mixture was incubated at 50°C for 60 min followed by an additional hour at 55°C. For bulk PCR, the reaction mixture was incubated at 55°C for 60 min. Following the completion of the reverse transcription step, the reaction mixture was inactivated by being heated to 70°C for 15 min followed by RNase H digestion at 37°C for 20 min (Invitrogen Life Technologies, Carlsbad, CA). The resulting cDNA was used immediately for PCR or kept frozen at –80°C until further analysis.
Standard (bulk) env gene amplification. Full-length rev/env cassettes (including parts of the first exon of the tat gene; the entire vpu, rev, and env genes; and parts of the nef gene) were amplified by nested PCR from plasma-derived viral cDNA as previously described (5, 50, 51). Briefly, 1 µl of bulk cDNA (containing 100 to 1,000 viral templates) was subjected to first-round PCR in a volume of 20 µl. PCR was performed by using an Expand High Fidelity PCR system (Roche Diagnostic Corporation, Indianapolis, IN) in 1x Expand PCR buffer containing 1.5 mM MgCl2, 0.2 mM of each deoxynucleoside triphosphate, and 0.2 µM of Vif1 (5'-GGGTTTATTACAGGGACAGCAGAG-3'; nt 4900 to 4923) and OFM19 primers. The following cycling conditions were used: 94°C for 2 min followed by 35 cycles of 94°C for 15 s, 55°C for 30 s, and 68°C for 4 min, with a final extension of 68°C for 10 min. Second-round PCR was performed by using 1 µl of the first-round PCR product and primers EnvA (5'-GGCTTAGGCATCTCCTATGGCAGGAAGAA-3'; nt 5954 to 5982) and EnvN (5'-CTGCCAATCAGGGAAGTAGCCTTGTGT-3'; nt 9145 to 9171) under the same conditions used for the first-round PCR. To ensure that the amplified HIV-1 env sequences were representative of the plasma quasispecies, five independent nested PCRs were carried out for each specimen (20, 41). The final PCR products were analyzed by 1% agarose gel electrophoresis, and products of the predicted size (3.2 kb) were ligated into the pcDNA3.1.V5-His TOPO TA vector according to the manufacturer's instructions (Invitrogen Life Technologies, Carlsbad, CA). The ligated vector was transformed into Stbl2 cells at 42°C for 25 s (Invitrogen Life Technologies, Carlsbad, CA). Transformed reaction mixtures were plated on LB-Amp (ampicillin; 100 µg/ml) plates and cultured overnight at 30°C. Multiple colonies were picked and grown in LB broth (100 µg/ml ampicillin), and plasmid DNA was isolated by using a QIAprep spin miniprep kit (Qiagen, Valencia, CA). The resulting plasmid DNA was subjected to restriction enzyme digestion to identify full-length clones, and 10 to 37 env-containing clones from each patient were sequenced.
SGA. For SGA of the full-length env genes, cDNA was endpoint diluted in 96-well plates such that fewer than 29 PCRs yielded an amplification product. According to a Poisson distribution, the cDNA dilution that yields PCR products in no more than 30% of wells contains one amplifiable cDNA template per positive PCR more than 80% of the time. First-round PCR was carried out in 1x High Fidelity platinum PCR buffer, 2 mM MgSO4, 0.2 mM of each deoxynucleoside triphosphate, 0.2 µM of primers Vif1 and OFM19, and 0.025 U/µl platinum Taq High Fidelity polymerase (Invitrogen, Carlsbad, CA) in a 20-µl reaction mixture. The PCR mixtures were set up in MicroAmp optical 96-well reaction plates (Applied Biosystems, Foster City, CA) and sealed with ABI MicroAmp adhesive film. The following PCR conditions were used: 94°C for 2 min followed by 35 cycles of 94°C for 15 s, 55°C for 30 s, and 68°C for 4 min, with a final extension of 68°C for 10 min. Second-round PCR was carried out using 1 to 2 µl of the first-round product and 0.2 µM of primers EnvA and EnvN with the same PCR mixture as the first round. The PCR conditions included: 94°C for 2 min followed by 45 cycles of 94°C for 15 s, 55°C for 30 s, and 68°C for 4 min, with a final extension at 68°C for 10 min. The amplicons were sized on precast 1% agarose E-gel 96 (Invitrogen Life Technologies, Carlsbad, CA). All products derived from cDNA dilutions yielding less than 30% PCR positivity were sequenced. A standard operating procedure for SGA derivation of full-length env genes is available upon request.
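The Poisson argument behind the 30% threshold can be checked numerically. This is a minimal sketch, assuming templates are distributed independently across wells and every template amplifies; the function name `single_template_fraction` is hypothetical.

```python
import math


def single_template_fraction(positive_fraction: float) -> float:
    """Among PCR-positive wells, the expected fraction seeded by exactly one
    template, assuming template counts per well follow a Poisson distribution."""
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must lie strictly between 0 and 1")
    # P(no template) = exp(-lam) = 1 - positive_fraction
    lam = -math.log(1.0 - positive_fraction)
    p_exactly_one = lam * math.exp(-lam)
    return p_exactly_one / positive_fraction


# At 30% positive wells, about 83% of positives derive from a single template.
print(round(single_template_fraction(0.30), 3))
```

Lower positivity rates raise this fraction further, which is why only dilutions yielding less than 30% positive wells are carried forward to sequencing.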
Generation of vRNA transcripts from HIV-1 molecular clones with T7 polymerase. Envelope genes corresponding to two transmitted strains of HIV-1 (BORId9.4F8 and BORId9.4F12) obtained from the plasma of an acutely infected individual were directionally cloned into the viral expression vector pcDNA3.1D (Invitrogen Life Technologies, Carlsbad, CA) under the control of the T7 promoter. These plasmids were digested with EcoRV, and linear DNA was recovered by QIAquick gel extraction (Qiagen, Valencia, CA) following electrophoresis in 1% agarose. env RNA transcripts were generated using a Riboprobe in vitro transcription system (Promega, Madison, WI) by incubating 700 ng template DNA with T7 polymerase at 37°C for 80 min and utilizing a standard 100-µl-volume transcription protocol under the following conditions: 1x optimized transcription buffer; 10 mM dithiothreitol; 0.5 mM rATP, rGTP, rCTP, and rUTP; 100 U rRNasin RNase inhibitor; and 80 U T7 polymerase. Following transcription, the DNA template was degraded with 1 U per µg of template RQ1 RNase-free DNase (Promega, Madison, WI) for 15 min at 37°C. RNA molecules of more than 200 nucleotides were concentrated with RNeasy MinElute cleanup (Qiagen, Valencia, CA) and stored at –80°C. The RNA's mass was determined by spectrophotometry (NanoDrop 1000), and the copy number estimated assuming that all transcripts were full-length. RNA transcripts (BORId9.4F8 and BORId9.4F12) were mixed 1:1 at an estimated 100,000 copies total. 
RNA transcripts were reverse transcribed into cDNA (using primer Env3in; see below) which was then diluted to the single-molecule level prior to PCR amplification as described above, with the following primer modification: for first-round PCR, primer Env5in (5'-TTAGGCATCTCCTATGGCAGGAAGAAG-3'; nt 5957 to 5983) and antisense primer Env3in (5'-GTCTCGAGATACTGCTCCCACCC-3'; nt 8904 to 8882) were used, and for second-round PCR, primers 3'R3BG (5'-CCTATCTGTCCCCTCAGCTACTGC-3'; nt 8510 to 8531) and 5'F3BG (5'-CGACGAAGACCTCCTCAAGACAG-3'; nt 5993 to 6015) were used. A total of 59 envelope genes representing 145,590 bases were sequenced and compared to the input env sequences to detect recombination during cDNA synthesis and to identify all transversion, transition, and insertion-deletion mutations.
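Converting a spectrophotometric RNA mass into a transcript copy number, as done for the in vitro transcripts, is a short calculation. The sketch below is an assumption-laden illustration: it uses the common approximation of 320.5 g/mol per ribonucleotide plus 159 g/mol for the 5' triphosphate, the function name `rna_copies` is hypothetical, and the 3,000-nt length in the example is a placeholder rather than the actual transcript length.

```python
AVOGADRO = 6.02214076e23           # molecules per mole
AVG_NT_MASS = 320.5                # g/mol per ribonucleotide (approximation)
TRIPHOSPHATE_MASS = 159.0          # g/mol for the 5' triphosphate (approximation)


def rna_copies(mass_ng: float, length_nt: int) -> float:
    """Estimate the number of RNA molecules in a given mass, assuming every
    transcript is full-length (the same assumption stated in the text)."""
    if mass_ng < 0 or length_nt <= 0:
        raise ValueError("mass must be non-negative and length positive")
    molar_mass = length_nt * AVG_NT_MASS + TRIPHOSPHATE_MASS  # g/mol
    return mass_ng * 1e-9 / molar_mass * AVOGADRO


# Hypothetical example: 1 ng of a 3,000-nt transcript.
copies = rna_copies(1.0, 3000)
```

Truncated transcripts would make the true number of full-length templates lower than this estimate.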
Generation of virion RNA by transfection of 293T cells with HIV-1 molecular clones. The infectious molecular clones YU2 and SG3 were independently transfected into 293T cells (Fugene; Roche Diagnostic Corporation, Indianapolis, IN) and cultured for 48 h to generate viral stocks. The virus concentration was determined by using an Amplicor vRNA assay, version 1.5 (Roche Diagnostic Corporation, Indianapolis, IN). Equal numbers of RNA molecules from each virus stock were mixed 1:1, and 20,000 vRNA copies were extracted by using a QIAamp viral RNA mini kit (Qiagen, Valencia, CA). Purified vRNA was reverse transcribed into cDNA (using primer Env3out; see below) and diluted to the single-molecule level prior to PCR amplification as described above, with the following primer modifications: for first-round PCR, primer Env5out (5'-TAGAGCCCTGGAAGCATCCAGGAAG-3'; nt 5853 to 5877) and antisense primer Env3out (5'-TTGCTACTTGTGATTGCTCCATGT-3'; nt 8913 to 8936) were used, and for second-round PCR, primer Env5in (5'-TTAGGCATCTCCTATGGCAGGAAGAAG-3'; nt 5957 to 5983) and antisense primer Env3in (5'-GTCTCGAGATACTGCTCCCACCC-3'; nt 8904 to 8882) were used. A total of 50 envelope genes representing 132,839 bases was sequenced and compared to the input env sequences to detect recombination during cDNA synthesis and to identify all transversion, transition, and insertion-deletion mutations.
DNA sequencing. Viral env genes were sequenced by using BigDye Terminator chemistry and the protocols recommended by the manufacturer (Applied Biosystems, Foster City, CA). The sequences were determined by using an ABI 3730xl genetic analyzer (Applied Biosystems, Foster City, CA) and edited by using the Sequencher program, version 4.7 (Gene Codes, Ann Arbor, MI). Both strands of DNA were sequenced. All chromatograms were carefully inspected for sites of ambiguous sequence (double peaks), and those that contained one or more positions of mixed bases were excluded from further analysis.
Microsatellite analysis. Samples with heterogeneous infections were examined for potential misidentification by microsatellite analysis using an AmpFlSTR Identifiler PCR amplification kit following the manufacturer's instructions (Applied Biosystems, Foster City, CA). Briefly, DNA was extracted from 200 µl of plasma by using a QIAamp DNA blood minikit (Qiagen, Valencia, CA) and used for the amplification of 15 highly polymorphic microsatellite loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA) and a portion of the amelogenin gene for gender identification in a single PCR amplification. The amplicons were sized by using an ABI 3130 genetic analyzer. Data analysis and allele designations were carried out by using the GeneMarker program (SoftGenetics LLC). The selected markers are widely accepted tetranucleotide loci for genetic characterization, standardized under the Combined DNA Index System (CODIS).
Phylogenetic analyses. Env protein sequences from each subject were aligned using CLUSTAL W (46), and amino acids were then replaced by codons. Intrastrain diversities were calculated by using uncorrected sequence distances. For all other analyses, the alignments were gap stripped and sequences with large deletions were excluded. All trees were constructed by using the neighbor joining method (38) implemented in CLUSTAL W using Kimura's correction (14). The following subtype C sequences were included for reference: 98ZA502 (GenBank accession no. AY158534), DU151 (GenBank accession no. DQ41185), DU422 (GenBank accession no. DQ411854), TV012 (GenBank accession no. AF391243), TV002 (GenBank accession no. AF391232), SK144B1 (GenBank accession no. AY703911), and BWMC168 (GenBank accession no. AF443087).
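The two distance measures mentioned above can be sketched as follows. This assumes that "Kimura's correction" refers to the Kimura two-parameter model for nucleotide sequences; the function name `pairwise_distances` is hypothetical, and columns containing gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def pairwise_distances(seq1: str, seq2: str):
    """Return (uncorrected p-distance, Kimura two-parameter distance)
    for two aligned nucleotide sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions
    q = transversions / sites  # proportion of transversions
    k2p = -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
    return p + q, k2p
```

For closely related sequences such as those within a single subject, the corrected and uncorrected values are nearly identical; the correction matters more for comparisons between subjects or against reference strains.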
Recombination breakpoint analysis. Diversity plots were used to map breakpoints in putative recombinants. The percent diversity between env nucleotide sequences of a presumed recombinant and each of two parental strains believed to have been involved in the recombination event was determined by moving a window of 100 bp along an alignment in 10-bp increments. The distance values for each of these pairwise comparisons were plotted at the midpoint of the 100-bp segment. Sites where the putative mosaic was equidistant from both parents (i.e., sites where the two parental distance lines crossed) were scored as recombination crossovers (due to the window size of 100 bp, the positions of these breakpoints are only approximate). If the sequence distance between the recombinant and both putative parents was greater than 0.05 (y axis), then this sequence was scored as being of unknown origin. If the recombinant was identical to one putative parental sequence, then that sequence was taken to be the parent, even if the nucleotide sequence distance in the window to the other parent was less than 0.05. Finally, if the recombinant was identical to both parents, no recombination breakpoint was invoked. All recombinants were confirmed by phylogenetic tree analysis.
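The sliding-window comparison described above can be sketched in a few lines. This is a simplified illustration, not the plotting tool used in the study: the function names are hypothetical, distances are uncorrected, and the 0.05 "unknown origin" rule and the identical-parent rules are omitted. A crossover is reported midway between the last window that favored one parent and the first window that favors the other.

```python
def window_distances(query: str, parent: str, window: int = 100, step: int = 10):
    """Return (window midpoint, uncorrected distance) pairs along an alignment."""
    if len(query) != len(parent):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        p = parent[start:start + window]
        diffs = sum(1 for a, b in zip(q, p) if a != b)
        points.append((start + window // 2, diffs / window))
    return points


def crossovers(query: str, parent_a: str, parent_b: str,
               window: int = 100, step: int = 10):
    """Approximate positions where the two parental distance lines cross."""
    dist_a = window_distances(query, parent_a, window, step)
    dist_b = window_distances(query, parent_b, window, step)
    found = []
    prev_sign, prev_mid = 0, None
    for (mid, da), (_, db) in zip(dist_a, dist_b):
        sign = (da > db) - (da < db)   # -1: closer to A, +1: closer to B
        if sign == 0:
            continue                   # equidistant window, no call
        if prev_sign != 0 and sign != prev_sign:
            found.append((prev_mid + mid) // 2)
        prev_sign, prev_mid = sign, mid
    return found
```

As the text notes, positions obtained this way are only approximate because each value summarizes a 100-bp window.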
Statistical analyses. Power calculations were performed to estimate the likelihood of missing infrequent viral variants present in patient plasma but not sampled and represented in our env analyses. From probability theory, with n plasma vRNA sequences, there is a 95% likelihood that a given missed variant comprises a fraction f (or less) of the virus population where $f = 1 - 0.05^{1/n}$. For n = 20, f is less than 14%; for n = 30, f is less than 10%; and for n = 40, f is less than 8%.
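The bound follows from the probability of missing a variant of frequency f in n independent draws, (1 - f)^n; setting this equal to 0.05 and solving for f gives f = 1 - 0.05^(1/n). A minimal sketch, with the hypothetical function name `max_missed_fraction`, reproduces the quoted thresholds:

```python
def max_missed_fraction(n: int, confidence: float = 0.95) -> float:
    """Largest population fraction a variant can have while still being
    missed in all n sampled sequences with probability 1 - confidence."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


for n in (20, 30, 40):
    print(n, f"{100 * max_missed_fraction(n):.1f}%")
```

The computed values are approximately 13.9%, 9.5%, and 7.2%, consistent with the bounds of 14%, 10%, and 8% given in the text.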
We also estimated the probability that observed clusters of nonsynonymous mutations in the Rev and Env coding regions could occur by chance; if the estimated likelihood is small, we may infer that the clusters reflect selection of variant amino acid sequences. We considered 9-codon windows because this was the length of the observed clusters, and this is also the length of a typical T-cell epitope. From the binomial expansion, the probability of seeing at least the observed number (k) of clustered mutations within a single 9-mer is
[Equation image not preserved.]
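A probability of "at least k events" under a binomial model is an upper-tail sum. The sketch below shows only that generic form. Its parameterization is an assumption and is not taken from the text: it supposes n nonsynonymous mutations in total, each falling independently in a given 9-codon window with probability p (for example, 9 divided by the number of codons). The function name `binomial_tail` and the example numbers are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 10 mutations, a 9-codon window in a 100-codon region,
# and at least 4 of the mutations falling inside that one window.
chance = binomial_tail(4, 10, 9 / 100)
```

Because many candidate windows exist along a gene, a per-window probability of this kind would need to be adjusted for the number of windows examined before drawing conclusions about selection.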
Nucleotide sequence accession numbers. The GenBank accession numbers for all env sequences determined in this study are EU166353 to EU166402, EU166413 to EU166473, EU166483 to EU166517, and EU166544 to EU166916, except for one bulk-amplified sequence (DQ388514) which was reported previously.
TABLE 1. Determination of SGA-direct sequencing error rates using in vitro-synthesized HIV-1 RNA templates
Laboratory staging of acute and early HIV-1 infection in subjects. In order to evaluate the sequence diversity of HIV-1 in relation to the estimated duration of infection, we adopted the Fiebig classification (8), which categorizes patients based on an orderly and reproducible appearance of HIV-1-specific markers in the plasma (Fig. 1A). The eclipse phase is defined by the interval between virus transmission and the first detection of vRNA in the plasma and is believed to last on average about 10 days, with a range of approximately 7 to 21 days (3, 10, 17-19, 40). The mean durations of Fiebig stages I (7 days), II (5 days), III (3 days), IV (6 days), and V/VI (70+ days) are indicated. The 95% confidence intervals (CI) were calculated and reported elsewhere (8). The timing of patient sampling (Fig. 1B) was determined from the clinical record and by analysis of plasma vRNA, p24 antigen, and antibody, and all were in good agreement (Table 2). Two subjects (ZM246F and ZM247F) were initially studied during Fiebig stage II, when their plasma vRNA levels were extremely high and HIV-1 antibody was undetectable. Subject ZM249M was first studied during Fiebig stage IV, when plasma vRNA was still high and antibodies were detectable by ELISA but the Western blot pattern was indeterminate. Based on estimates of Fiebig stage durations, this sample was still expected to have been obtained within 31 days of infection. Subjects ZM246F, ZM247F, and ZM249M were each studied at a second time point 4 to 80 days after the first. The other nine subjects were studied on a single occasion no more than 99 to 151 days after infection, according to their clinical records (Table 2).
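The 31-day figure for a Fiebig stage IV sample follows from summing the mean eclipse and stage durations quoted above (10 + 7 + 5 + 3 + 6 days). A small sketch, using the hypothetical name `days_to_end_of_stage` and treating the mean durations as fixed values, makes the arithmetic explicit:

```python
ECLIPSE_MEAN_DAYS = 10
# Mean stage durations in days, in order of appearance.
STAGE_MEAN_DAYS = {"I": 7, "II": 5, "III": 3, "IV": 6}


def days_to_end_of_stage(stage: str) -> int:
    """Mean number of days from transmission to the end of the given stage."""
    total = ECLIPSE_MEAN_DAYS
    for name, days in STAGE_MEAN_DAYS.items():
        total += days
        if name == stage:
            return total
    raise ValueError(f"no fixed mean duration for stage {stage!r}")


print(days_to_end_of_stage("IV"))  # 31
```

Stages V and VI are left out because their combined duration is open-ended (70+ days), so no fixed upper bound can be computed for them in this way.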
TABLE 2. Epidemiological and laboratory information for primary infection subjectsa
FIG. 2. Env quasispecies complexity in 12 primary infection subjects from Zambia. A neighbor-joining tree of SGA-derived full-length env sequences is shown. Brackets encompass sequences from each study subject, as indicated. Asterisks at nodes indicate 90% or higher bootstrap values (shown only on branches whose length exceeds 0.003 substitutions per site). The scale bar represents 0.01 nucleotide substitutions per site. 98ZA502, DU151, DU422, TV012, TV002, SK144B1, and BWMC168 represent subtype C reference sequences. ZM231F falls outside all known group M subtypes and thus remains unclassified.
TABLE 3. Diversity of SGA-direct sequence-derived complete env genes in subjects with primary HIV-1 infections
Identification of transmitted or early "founder" viruses. The estimates of time to the MRCA for all subjects with homogeneous infections suggested that viral sequences were coalescing at or near the transmission event. To evaluate this further, we selected for analysis three subjects (ZM249M, ZM247F, and ZM246F) from whom we had very early samples. Figure 3 depicts the phylogenetic trees of the SGA-derived env sequences from two consecutive plasma samples obtained from each of these subjects 4 to 80 days apart. The sequences were also examined by using a novel analytical tool, Highlighter (http://www.hiv.lanl.gov/content/sequence/HIGHLIGHT/highlighter.html), which provides a visually informative representation comparing each env sequence to a selected reference sequence. For subject ZM249M, both the phylogenetic tree and Highlighter analysis identified a single set of 48 identical or nearly identical sequences that differed from the consensus by 4 or fewer nucleotide substitutions, while any 1 sequence differed from another by a maximum of 8 nucleotides (0.3% maximum diversity) (Fig. 3A). For subject ZM247F, both the phylogenetic tree and Highlighter analysis identified two distinct sets of identical or nearly identical env sequences. Although the sequences in one lineage differed from those in the other lineage by as many as 66 nucleotides (2.7% maximum diversity), the sequences from within a lineage differed by no more than 4 nucleotides from each other and by no more than 2 nucleotides from the lineage consensus (<0.2% maximum diversity) (Fig. 3B). Finally, for subject ZM246F, a set of 26 identical or nearly identical sequences that differed from the consensus by 3 or fewer nucleotides and from each other by a maximum of 5 nucleotides (<0.2% maximum diversity) was identified for the first time point (Fig. 3C). 
Fourteen env sequences obtained 80 days later were still closely related to the original consensus sequence, although none was identical to an earlier sequence. Nonetheless, their consensus was still identical to the consensus of the earlier time point. Mathematical models of virus replication and diversification that assume exponential growth in the absence of immune selection predict a high proportion of identical sequences in acute and very early infections (B. F. Keele, unpublished data). For subject ZM247F, lineages 1 and 2, and subjects ZM246F and ZM249M, the proportions of identical sequences in the earliest samples were 5/7 (71%), 11/18 (61%), 16/27 (59%), and 11/24 (45%), consistent with the model expectations. Thus, we conclude that these consensus sequences correspond to the transmitted or early "founder" viruses that initiated productive clinical infection in these subjects.
FIG. 3. Identification of transmitted or early founder env genes in subjects with acute HIV-1 infection. SGA-derived env sequences from pre- and postseroconversion plasma samples from subjects ZM249M (A), ZM247F (B), and ZM246F (C) were examined by phylogenetic tree construction (left panels) and Highlighter analysis (right panels). Sequences from the earlier time points are in bold face. Trees are midpoint rooted. The scale bars represent 1 (A and C) or 10 (B) nt substitutions per site. The corresponding Highlighter diagrams denote the locations of nucleotide sequence substitutions in each env sequence in comparison to a reference sequence listed at the top; the positions of these substitutions within the env sequence are indicated at the bottom. Nucleotide substitutions and gaps are color coded. For each subject, sets of identical and nearly identical env sequences form consensus sequences that coalesce to transmitted or early founder viral env sequences. Subjects ZM249M and ZM246F were infected by a single virus and ZM247F by two viruses, identified as variant 1 and variant 2 in the tree (panel B). The tick in parentheses in sequence 080503_A5 (panel A) indicates a single-nucleotide insertion. The boxed region in panel C indicates clustered mutations in V3. Interestingly, these changes were limited to position 8 (Ile to Thr, Arg, or Lys) and position 25 (Asp to Asn, Glu, or Gly).
FIG. 4. Evidence of immune selection in two subjects sampled at later Fiebig stages. (A) Highlighter analysis of SGA-derived env sequences (left panel) from subject ZM180M (Fiebig stage VI), and corresponding nucleotide (middle panel) and amino acid (right panel) sequence alignments from the overlapping rev gene. Boxes indicate mutations that cluster within a 27-nt region that corresponds to a 9-mer in the Rev protein sequence. (B) Highlighter analysis of SGA-derived env sequences (left panel) from subject ZM206F (Fiebig stage VI), and corresponding nucleotide (middle panel) and amino acid (right panel) sequence alignments of the V1 region. Boxes indicate mutations that cluster within a 27-nt region that corresponds to a 9-mer of the V1 loop. In Highlighter analyses, nucleotide substitutions and gaps are color coded. The consensus sequence is at the top of each alignment; lowercase letters show residues where mutations occurred. Dashes in the alignments indicate sequence identity to the consensus; dots indicate deletions.
FIG. 5. In vivo recombination in multiply-infected subjects. (A) Neighbor-joining tree of SGA-derived env sequences from subject ZM229M depicting two major transmitted variants (orange and purple), as well as five in vivo-generated recombinants (black). Asterisks at nodes indicate 90% or higher bootstrap values. The scale bars represent 0.005 nucleotide substitutions per site. (B-C) Diversity plots of two representative recombinants identified in panel A. The sequence distances of ZM229M_C15 (panel B) and ZM229M_D1 (panel C) are compared to those of representatives of the two parental lineages (orange and purple, respectively). The two recombinants contain between one and seven crossovers; schematic representations of their putative mosaic structures are shown below. (D) Neighbor-joining tree of SGA-derived env sequences from subject ZM215F depicting two major transmitted variants (orange and purple), as well as 19 different recombinants (black). Asterisks and scale bar are as described for panel A. (E-F) Diversity plots of two ZM215F recombinants containing "extraneous" sequences. The sequence distances of ZM215F_D4 (panel E) and ZM215F_F17 (panel F) are compared to those of representatives of the two parental lineages (orange and purple, respectively). Shaded areas indicate regions where ZM215F_D4 and ZM215F_F17 are equidistant from the two parental lineages, suggesting recombination with additional variants. Schematic representations of their putative mosaic structures are shown below.
FIG. 6. Bulk PCR-induced in vitro recombination in a multiply-infected individual. (A) Neighbor-joining tree of SGA-derived env sequences derived from subject ZM247F (sample obtained 1 November 2003) depicting two major transmitted variants (green and blue; sequences of the more-abundant variant are in green) and no recombinants. (B) Neighbor-joining tree of bulk PCR-derived env sequences amplified from the same specimen depicting two transmitted variants (green and blue; sequences of the more-abundant variant are in green) and eight recombinants (red). Asterisks at nodes indicate 90% or higher bootstrap values. The scale bars show 0.002 substitutions per site. (C and D) Diversity plots of two representative recombinants in panel B. The two recombinants contain one or more crossovers, the approximate positions of which are indicated by nucleotide position (x axis) and shown schematically below the panels.
Comparison of SGA and bulk amplification methods for detection of specimen cross contamination. Quality control and assurance are always a concern in large-scale clinical trials, since specimens can be misidentified, cross-contaminated, or otherwise compromised. In the course of the present study, we obtained one plasma sample from subject ZM246F in which phylogenetic analyses suggested specimen cross contamination. This questionable sample revealed three SGA-derived env lineages (Fig. 7A), in contrast to the presence of a single viral lineage in this individual's plasma 3 months earlier (Fig. 3C). We considered the possibility that subject ZM246F had become super-infected in the intervening 80 days, but the extent of sequence diversity within the blue lineages made this explanation extremely unlikely. When bulk PCR was performed on the questionable specimen (Fig. 7B), we obtained an even more complicated phylogenetic tree, this time revealing three lineages plus two additional recombinant viruses (ZM246F/M_BULK_41 and ZM246F/M_BULK_60). We resolved this quandary by reviewing clinic records and subjecting the questionable plasma sample to DNA microsatellite analysis. The comparison of 15 polymorphic loci and one gender marker revealed three rather than two alleles in the questionable sample, but not in samples from three control subjects with heterogeneous HIV-1 infections (Table 4). This finding indicated that genetic material from two different individuals had been mixed in the questionable ZM246F sample. Analysis of peripheral blood mononuclear cell DNA from the partner (ZM246M) of subject ZM246F provided an explanation: the contaminating alleles in the questionable specimen were his (Table 4). Thus, during the processing of the couple's blood on the 4 April 2003 clinic visit, the two plasma samples were inadvertently mixed. This inadvertent sample contamination was informative in that the two recombinant viruses whose diversity plots are shown in Fig. 7C and D could only have been generated in vitro as an artifact of Taq-induced template switching, since the two viral lineages from which they were derived never coexisted in the same individual. Moreover, a substantial number of the bulk-derived (17 out of 25), but not SGA-derived, sequences of ZM246M were identical or nearly identical, most likely because of target resampling and/or differences in cloning efficiency among different env sequences (20).
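The microsatellite reasoning above can be sketched in code. A sample from one diploid individual shows at most two alleles per locus, so a locus with three or more alleles points to DNA from more than one person. The following is a minimal Python illustration, not the analysis pipeline used in the study; the function names, locus names, and allele sizes are all hypothetical placeholders and are not the values in Table 4.

```python
from typing import Dict, List, Set

Profile = Dict[str, Set[int]]  # locus name -> set of observed allele sizes


def loci_with_excess_alleles(profile: Profile, max_alleles: int = 2) -> List[str]:
    """Return loci showing more alleles than one diploid person can carry."""
    return sorted(locus for locus, alleles in profile.items()
                  if len(alleles) > max_alleles)


def unexplained_alleles(sample: Profile, contributors: List[Profile]) -> Profile:
    """Return alleles in `sample` that none of the candidate contributors carry."""
    leftover: Profile = {}
    for locus, alleles in sample.items():
        known: Set[int] = set()
        for person in contributors:
            known |= person.get(locus, set())
        extra = alleles - known
        if extra:
            leftover[locus] = extra
    return leftover


# Hypothetical allele sizes, for illustration only (NOT the data in Table 4).
subject_f: Profile = {"locus_1": {12, 14}, "locus_2": {9, 11}, "locus_3": {15}}
partner_m: Profile = {"locus_1": {13, 14}, "locus_2": {9, 10}, "locus_3": {15, 17}}
mixed: Profile = {"locus_1": {12, 13, 14}, "locus_2": {9, 10, 11}, "locus_3": {15, 17}}

print(loci_with_excess_alleles(mixed))                    # loci suggesting a mixture
print(unexplained_alleles(mixed, [subject_f]))            # alleles foreign to the subject
print(unexplained_alleles(mixed, [subject_f, partner_m])) # empty if the partner explains them
```

In this toy example, the excess alleles disappear once the partner's profile is added as a second contributor, which mirrors the logic by which the contaminating alleles were attributed to ZM246M.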
FIG. 7. Bulk PCR-induced in vitro recombination in a mixture of two plasma samples. (A) Neighbor-joining tree of SGA-derived env sequences derived from a mixture of plasma from two infected subjects, ZM246F (green) and ZM246M (blue). Subject ZM246M was chronically infected with at least two major viral lineages (dark and light blue) that differed in their env sequences by approximately 6%. ZM246F was acutely infected with a virus from an unrelated individual which differed from the ZM246M env sequences by approximately 10%. (B) Neighbor-joining tree of bulk PCR-amplified env sequences from the same mixed-plasma specimen. In addition to viral lineages representing ZM246F (green) and ZM246M (blue), two additional recombinants (ZM246F/M_BULK_41 and ZM246F/M_BULK_60) are apparent (red). Asterisks at nodes indicate 90% or higher bootstrap values. The scale bars show 0.01 nucleotide substitutions per site. (C and D) Diversity plots of the two recombinants in panel B. The approximate position of recombination crossovers is indicated by nucleotide position (x axis) and schematically shown below the panels.
TABLE 4. Microsatellite analysis of plasma samples with heterogeneous viral sequences
A second objective of this study was to evaluate in a field trial setting the ability of SGA-direct sequencing strategies to decipher transmitted clade C (or other non-clade B) viruses and their early evolution in a time frame typical of vaccine trial follow-up schedules (every 3 months). In a companion study of acute and early subtype B infections (B. F. Keele, unpublished data), we studied 51 subjects in Fiebig stages I/II and 26 subjects in Fiebig stages III/IV; with such early sampling we found that we could infer transmitted or early founder env sequences in most patients, including those infected by more than one virus. A mathematical model of early HIV-1 replication and diversification described in that study provided the theoretical basis for identifying transmitted or early founder viral genomes. Here, we were less certain whether this approach would be applicable since patient sampling was less frequent, samples were obtained from the majority of subjects (9/12) weeks to months after infection (Fiebig stage V or VI), and the genetic subtypes analyzed were non-clade B. Nonetheless, we show here for three subjects (ZM249M, ZM247F, and ZM246F) studied prior to seroconversion that the phylogenetic trees and Highlighter analyses allow for an unambiguous identification of the transmitted or early founder virus(es). For six homogeneous-transmission cases studied later in the infection process (ZM178F, ZM180M, ZM184F, ZM206F, ZM231F, and ZM235), the env sequences also coalesced in a time frame consistent with transmitted or early founder viruses (Fig. 3, Table 3). However, this was not the case for individuals who were infected by more than one virus and were sampled for the first time at later time points (e.g., Fiebig stages V/VI); in these instances, identification of the transmitted viruses was precluded by more-extensive nucleotide substitutions, as well as in vivo recombination.
This limitation notwithstanding, our findings for primary clade C infections mirror data obtained for primary clade B infections (B. F. Keele, unpublished data): sequences of transmitted or early founder env genes can be readily inferred from SGA-derived sequences if subjects are sampled sufficiently early (Fiebig stages I to IV) and, in some cases, also at later time points (Fiebig stages V to VI) but only if the infection was initiated by a single virus.
A third study objective was to determine if insights into selection pressures on virus replication could be inferred from SGA-derived sequences from single time points distant from the transmission event. We show two examples of this. In Fig. 4A, the results of Highlighter analysis of 24 env sequences from a subject at Fiebig stage V (ZM180M) are shown, illustrating a heavy concentration of nucleotide substitutions in the region of env that overlaps the second exon of rev. The actual nucleotide substitutions are shown in the middle panel. Each of the 24 sequences was found to contain one or more of seven different mutations when compared to the consensus sequence. Because of the large number of different mutations, it was possible to infer the consensus sequence in this region and across the entire env gene. Moreover, all of the nucleotide substitutions concentrated within this 9-codon stretch of the Rev open reading frame were nonsynonymous (Fig. 4A, right panel). Statistical analysis ruled out the possibility that this cluster of mutations arose by chance, and the observation most likely reflects selection for sequences with amino acid differences. Although viably frozen lymphocytes were not available for cytotoxic T-lymphocyte studies, this subject's HLA profile was typed as A*2901, A*3002, B*1510, B*4201, Cw*0304, Cw*17(01-03). The Rev sequence under selection pressure is LAEPVPLPLPPIERLNIGD, with the variable region underlined. There are several HLA-B42 and HLA-C motifs that overlap this region of interest, where potential second-position and C-terminal anchor motifs are indicated as follows: XPXXXXXXXXL (B*4201); XPXXXXXXL (B*4201); and XAXXXXXXL (Cw*1701, Cw*1702, and Cw*0304). The potential B*4201 epitopes are embedded directly in the region that is the focus of the mutations, while the potential HLA-C epitope is slightly offset. 
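The anchor-motif comparison described above can be reproduced with a short scan. The sketch below slides each quoted motif along the quoted Rev sequence and reports every window in which the fixed positions agree, treating `X` as a wildcard. The function name is hypothetical, and a motif fit of this kind only indicates a candidate epitope; it is not a binding prediction.

```python
from typing import List, Tuple


def find_anchor_matches(peptide: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, fragment) for each window of `peptide` fitting `motif`.

    In `motif`, 'X' accepts any residue; any other letter must match exactly.
    """
    width = len(motif)
    matches = []
    for start in range(len(peptide) - width + 1):
        fragment = peptide[start:start + width]
        if all(m == "X" or m == residue for m, residue in zip(motif, fragment)):
            matches.append((start, fragment))
    return matches


REV_REGION = "LAEPVPLPLPPIERLNIGD"                   # Rev sequence quoted in the text
MOTIFS = ["XPXXXXXXXXL", "XPXXXXXXL", "XAXXXXXXL"]   # motifs quoted in the text

for motif in MOTIFS:
    print(motif, find_anchor_matches(REV_REGION, motif))
```

For the quoted sequence, each motif fits exactly one window: `VPLPLPPIERL` and `LPLPPIERL` for the two B*4201 motifs, and `LAEPVPLPL` for the HLA-C motif, which begins at the first residue and is therefore offset from the other two.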
People who carry Cw*03 react to the peptide that spans this region more often than people without Cw*03, suggesting that a Cw*03 epitope is present and recognized in many subtype C infections (B. T. Korber, unpublished data). A similar pattern of mutations in a Rev 9-mer was recently identified in a subtype B-infected subject, and in this individual, HLA-restricted cytotoxic T-lymphocyte reactivity was confirmed by enzyme-linked immunospot assay and gamma interferon induction (G. M. Shaw and P. Borrow, unpublished data). In the sample from subject ZM206F, obtained at Fiebig stage VI, there was equally strong evidence of selection, again within a 9-amino-acid fragment but this time within the variable loop 1 (V1) region of Env (Fig. 4B). Remarkably, 33 out of 35 sequences had 1 or more of 16 different point mutations within this region, while the other 2 sequences had deletions. This again allowed for the identification of a consensus sequence that likely corresponds to the transmitted or early founder sequence. Again, these changes meant that every sequence sampled encoded a different amino acid sequence when compared to the consensus, and again, the likelihood of such a concentration of mutations occurring by chance was estimated to be extremely low. The HLA profile of this subject was A*0202, A*2301; B*1510, B*180101; Cw*0501, Cw*1601, and the region of extreme selection is GSSKANDNNVNITSD. There are no obvious anchor motifs for the relevant HLAs in this sequence, although KANDNNVNI could fit an A*0202 binding pocket (P. Goulder, personal communication). Alternatively, the observed cluster of V1 mutations could be the result of neutralizing-antibody escape (34). Taken together, these results indicate that molecular patterns of virus adaptation can be inferred even in samples obtained several months after transmission from subjects for whom earlier specimens are not available for comparison.
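The text does not specify which statistical test was used to exclude chance clustering. One simple way to frame the question, offered here only as an assumption-laden sketch, is a binomial tail: if substitutions fell uniformly at random along the gene, how likely is it that at least a given number would land in a 9-codon (27-nucleotide) window? The mutation counts and the gene length below are hypothetical placeholders, not values from the study.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cluster_p_value(hits_in_window: int, total_mutations: int,
                    window_nt: int, gene_nt: int) -> float:
    """Chance that >= hits_in_window of total_mutations fall in a fixed window,
    assuming each mutation lands uniformly and independently along the gene."""
    return binomial_tail(hits_in_window, total_mutations, window_nt / gene_nt)


WINDOW_NT = 27   # 9 codons, as described in the text
GENE_NT = 2550   # ASSUMED approximate env length; not taken from the paper

# Hypothetical counts, for illustration only.
p_value = cluster_p_value(hits_in_window=20, total_mutations=30,
                          window_nt=WINDOW_NT, gene_nt=GENE_NT)
print(f"{p_value:.3g}")
```

Because the window is chosen after looking at the data, a careful analysis would also correct for the number of windows that could have been examined; this sketch omits that step.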
The SGA-direct sequencing approach is ideally suited to the evaluation of genetic linkages, as described by Palmer et al. for the analysis of drug resistance mutations in the pro-pol genes (27). We sought to determine if SGA-direct sequencing might reveal env gene recombination in subjects acutely infected by more than one virus and then to compare recombination frequencies between SGA and bulk amplification methods. Figure 5 illustrates multiple examples of viral recombination in the two subjects at Fiebig stage VI, ZM229M and ZM215F. Interestingly, in subject ZM215F, the recombination involved not only two principal transmitted virus lineages but also additional sequences not otherwise represented in the sequence set. Exhaustive phylogenetic analyses indicated that subjects ZM215F and ZM229M had each been infected by four or more viruses (Table 3). Thus, in these two heterogeneous infections, viral diversification was accelerated by extensive recombination.
Although viral recombination assessed by SGA methods can be complicated and nearly indecipherable in multiply-infected individuals at later time points, this problem is magnified if bulk PCR methods are used. In Fig. 6, we show results for a subject at Fiebig stage II who was infected by two variants. The analysis of a total of 44 SGA-derived env sequences reveals no evidence of recombination, but 8 of 34 bulk PCR-derived sequences are mosaic, each exhibiting different breakpoint patterns. Artifactual Taq-mediated template switching was also demonstrated in an example of a cross-contaminated plasma specimen from two subjects (ZM246F and ZM246M) who were infected by unrelated viruses (Fig. 7). In the contaminated specimen, the SGA method clearly distinguished a single ZM246F lineage from two ZM246M lineages with no recombinant viruses among them. Conversely, the bulk method generated two mosaic sequences in vitro that did not exist in vivo. From the results of our analyses and those of other reports (27, 43-45, 54), we conclude that SGA methods do not generate in vitro recombinants, whereas bulk methods commonly do. Bulk amplification-cloning-sequencing strategies are also susceptible to Taq-induced nucleotide misincorporation, template resampling, and cloning bias. These limitations may not be problematic for certain applications. However, if a goal is to obtain sequences of appreciable length that correspond to HIV-1 genomes that exist in vivo, then SGA-direct sequencing approaches have distinct advantages.
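The idea behind a diversity plot can be illustrated with a small sliding-window comparison. The sketch below measures, window by window, how far a query sequence is from each of two parental lineages and counts the points at which the closer parent switches, which approximates the number of crossovers. It is a simplified illustration and not the software used to generate Fig. 6 and 7; the function names and the toy sequences are hypothetical, and the sequences are assumed to be pre-aligned and of equal length.

```python
from typing import List, Tuple


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity_plot(query: str, parent_a: str, parent_b: str,
                   window: int = 10, step: int = 10) -> List[Tuple[int, float, float]]:
    """Return (window start, distance to parent A, distance to parent B) per window."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        points.append((start,
                       mismatch_fraction(query[start:end], parent_a[start:end]),
                       mismatch_fraction(query[start:end], parent_b[start:end])))
    return points


def closest_parent_track(points: List[Tuple[int, float, float]]) -> List[str]:
    """Label each window 'A', 'B', or '=' (tie) by the nearer parent."""
    return ["A" if da < db else "B" if db < da else "=" for _, da, db in points]


def count_crossovers(track: List[str]) -> int:
    """Count switches between parents, ignoring uninformative (tied) windows."""
    informative = [t for t in track if t != "="]
    return sum(prev != cur for prev, cur in zip(informative, informative[1:]))


# Toy 40-nt sequences (hypothetical): parent B differs from parent A at every 5th site.
PARENT_A = "ACGTACGTAC" * 4
PARENT_B = "".join("T" if i % 5 == 0 else base for i, base in enumerate(PARENT_A))
MOSAIC = PARENT_A[:20] + PARENT_B[20:]  # artificial recombinant with one breakpoint

track = closest_parent_track(diversity_plot(MOSAIC, PARENT_A, PARENT_B))
print(track, count_crossovers(track))
```

With real env sequences, the window and step sizes would need to be large enough that each window contains several sites that distinguish the parental lineages; otherwise most windows are ties and breakpoints cannot be placed.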
The results of the present study, together with those of Palmer et al. (27) and B. F. Keele (unpublished), illustrate new scientific avenues for deciphering HIV-1 transmission and patterns of early virus diversification. Based on the data reported here, it is likely that these new approaches will help to clarify the genetic and biological complexity of viruses transmitted by different routes and under various clinical circumstances, all factors that may be important in the design and assessment of candidate vaccines, antiretroviral drugs, or microbicides. In samples from clinical trial participants, it may be possible to use SGA-based methods to generate not only transmitted and evolved env genes but all HIV-1 genes of interest. Recently, we have shown for seven clade B- or C-infected subjects that complete (9 kb) HIV-1 genomes corresponding to the transmitted or early founder virus can be identified by SGA-direct sequencing methods (39). Such approaches may be useful in mapping linked mutations conferring escape from cellular and humoral immune responses in naïve or vaccinated individuals.
This work was supported by grants from the Bill and Melinda Gates Foundation Grand Challenges Program (grant 37874), the National Institutes of Health (grants R01 AI58706, R01 AI51231, P01 AI061734, P30 AI27767, and P30 AI50410), the Center for HIV/AIDS Vaccine Immunology (grant U01 AI067854), an internally directed research grant for vaccine design at the Los Alamos National Laboratory, and the International AIDS Vaccine Initiative for cohort recruitment and maintenance.
Published ahead of print on 6 February 2008.
# These authors contributed equally to the work.