Journal of Virology, November 2002, p. 10745-10755, Vol. 76, No. 21
0022-538X/02/$04.00+0 DOI: 10.1128/JVI.76.21.10745-10755.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, Massachusetts,1 Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Research Center, Seattle, Washington,2 Laboratoire de Bactériologie-Virologie, Université Cheikh Anta Diop, Dakar, Senegal3
Received 9 April 2002/ Accepted 22 July 2002
Most previous studies have sought to examine the correlation between disease progression and intrapatient diversity in subtype B-infected individuals, both adults (1, 18, 20, 22, 24, 25, 40, 41, 47) and children (7, 11, 15, 43), excepting a study examining viral evolution in nonrecombinant subtype A-infected women from Kenya (34) and a study by Sutthent et al. which revealed a correlation between poorer clinical status and increased diversity in subtype E-infected mothers and their infants (44). The majority of studies have evaluated regions of the env gene, a rapidly evolving and functionally important region. Markham et al. examined 285 base pairs (V3 and part of C2/C3) of DNA of the env gene drawn from seroconvertor intravenous drug users from the ALIVE study in Baltimore, Maryland (22). The greatest diversity and divergence was noted in rapid progressors, followed by moderate progressors and finally by nonprogressors, and diversity had a negative correlation with CD4+ cell counts. This suggested that greater diversification was proportional to the rate of progression. McNearney et al. noted similar results for the env V3 region in a smaller study (25). These data supported the Nowak model, which assumes that viral replication, with the resulting accumulation of mutations, drives diversity and therefore antigenic variation, with the immune system controlling disease progression as long as it is able to recognize and control viral replication and emerging variants (30). This model suggests that host immune selection pressures are not driving diversity but rather that the diversity generated by viral replication is driving the disease. Many rounds of viral replication (and therefore high viral load levels) would eventually overwhelm the immune system with new viral variants and result in rapid progression to disease.
Wolinsky et al. studied the V3-V5 region of the env gene in men from the MACS cohort (47). They concluded that, for those in the rapid progressor group, there was little selective pressure for change and that therefore less diversity was evident in this group than in either the moderate or the slow progressor group. Quasispecies diversification and divergence were strongly correlated with a favorable clinical outcome, and this suggested that diversity driven by an effective host immune response resulted in significant pressure on the virus to evolve. Similar results were obtained for HIV-1-infected children in a study by Ganeshan et al. (11).
Multiple intersubtype recombinant viruses have been identified in HIV-1-infected individuals, and several exist as major circulating recombinant forms (23). It is estimated that, worldwide, at least 9 million HIV-1 infections are due to CRF02_A/G-IbNG (23). CRF02_A/G-IbNG was first characterized from a sample collected in Ibadan, Nigeria (31). It is subtype A in the gag gene and the majority of the env gene, subtype G in the long terminal repeats, and a mosaic of A and G in the pol gene. It is epidemic in West and West Central Africa (3, 27) and is the predominant virus circulating in Senegal (38). The CRF02_A/G-IbNG-infected individuals included in our study were antiretroviral therapy-naïve at the time of sampling. We sought to directly examine the genetic diversity of HIV-1 CRF02_A/G-IbNG in seroincident women during the disease-free interval in order to identify patterns of diversity at the viral setpoint, as opposed to during AIDS or primary infection. In this study, we examined intrapatient diversity and divergence in 12 HIV-1 CRF02_A/G-IbNG-infected seroincident women categorized into high and low viral load groups in Dakar, Senegal.
TABLE 1. Profiles of two groups in study populationa
DNA amplification and sequencing by PCR. Proviral DNA was extracted from peripheral blood mononuclear cells with the Qiagen QIAamp Blood Midi kit (Qiagen, Inc., Chatsworth, Calif.). The DNA was resuspended in Tris-EDTA buffer, and its concentration was determined by measurement of the optical density at 260 nm. One microgram of DNA was used for each PCR. For each individual, three independent PCRs were performed under identical conditions for each sample at both time points to minimize resampling bias (2). To amplify the HIV-1 C2V3 env region, a nested PCR was performed with two sets of primers, WT1-WT2 for the first round and KK30-KK40 for the second round, as previously described (38, 48). Additionally, a single gag fragment was sequenced from each individual to ensure subtype concordance between the gag and the env genes, further verifying the classification of CRF02_A/G-IbNG. A nested PCR with two primer sets (p108-p109 for the first round and p91-p92 for the second round) was performed to amplify the HIV-1 gag 3'-p24/5'-p7 region, as previously described (35). Negative controls were included in each amplification reaction. The PCR products were purified by agarose gel electrophoresis and by using purification columns (gel extraction kit; Qiagen, Inc.). All purified products were subsequently cloned into the pCR2.1 vector (Topo-TA Cloning; Invitrogen, Carlsbad, Calif.). Several clones were processed for each PCR. The plasmid preparation for the double-stranded DNA sequencing was performed by alkaline lysis with silica columns (S.N.A.P. Mini-Prep kit; Invitrogen). Samples were sequenced by dye terminator cycle sequencing with Taq polymerase (Perkin-Elmer Applied Biosystem Division, Foster City, Calif.) and an automatic sequencer (ABI 373; Perkin-Elmer Applied Biosystem Division). Each earlier and later time point sample was processed separately, from DNA extraction to sequence analysis.
Sequence analysis and statistical methods. For all generated C2V3 viral sequences, a multiple alignment was performed with the Clustal package of multiple alignment programs (Clustal X 1.64), with minor manual adjustments when necessary and complete removal of positions which contained gaps (45). Reference sequences from the HIV sequence database of the Los Alamos National Laboratory were included in certain analyses. Phylogenetic analyses were performed by the neighbor-joining method for both env and gag regions. The alignments were corrected for multiple substitutions, and reliability was estimated by 1,000 bootstrap resamplings (37). Consensus sequences were generated for each individual's earlier and later sequences by using the SeqPublish program (HIV sequence database; Los Alamos National Laboratory). DNA and protein distance analyses were executed with the DNAdist and Prodist programs by using the Phylip 3.572 package (9). Statistical analyses were performed with S-Plus version 6.0 software (Insightful Corporation, Seattle, Wash.) and another software program (version 4.0; Stata Corporation, College Station, Tex.). Viral load values for each individual were log10 transformed for the purpose of statistical analysis.
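Two of the preprocessing steps described above, complete removal of gapped alignment positions and the log10 transformation of viral load, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `strip_gap_columns` and `log10_viral_load` are hypothetical, the toy alignment is invented, and the gap character is assumed to be `-`.

```python
import math


def strip_gap_columns(alignment):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    keep = [i for i, column in enumerate(zip(*alignment)) if "-" not in column]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def log10_viral_load(copies_per_ml):
    """Log10-transform a viral load given in RNA copies per ml."""
    return math.log10(copies_per_ml)


# Toy alignment for illustration only (not study data).
aligned = ["ACG-TTA", "ACGATTC", "A-GATTA"]
stripped = strip_gap_columns(aligned)
print(stripped)
print(log10_viral_load(10000))
```

Removing a column whenever any sequence is gapped keeps all sequences the same length, so every pairwise distance is computed over the same set of sites.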
Nucleotide sequence accession numbers. All nucleotide sequences were submitted to GenBank and provided with accession numbers (AF526650 to AF526876).
Earlier and later intrapatient diversity. We generated and sequenced multiple C2V3 env clones from each time point. Panels a and b of Fig. 1 show two neighbor-joining phylogenetic trees including all earlier and later time point clones for each sample in the low and high viral load groups, respectively. There was a distinct difference between the high and low viral load groups in the clustering patterns of clones. Two of five low viral load individuals (subjects 2723 and 3065) showed evidence of separate clustering between the times the samples were collected, while the remainder revealed mixed earlier and later clones on branches. Four of five low viral load individuals (subjects 2723, 668, 225, and 1067) displayed horizontal branch lengths for the majority of their later clones that were similar to or shorter than those displayed for earlier clones (Fig. 1a), suggesting that later clones have fewer nucleotide changes (less divergence) than earlier clones do. In the high viral load group, five of seven individuals (subjects 1196, 915, 2587, 2909, and 2997) revealed distinct clustering between all earlier and later clones, and there was evidence of a similar pattern emerging in a sixth individual (subject 2425) (Fig. 1b). Additionally, four of seven individuals (subjects 1196, 915, 2587, and 2997) displayed later clones with horizontal branch lengths that were greater than those of earlier clones, while the remaining individuals (subjects 394, 2909, and 2425) showed evidence of a similar trend in later clones. The most dramatic clustering and the longest clone branch lengths in later samples occurred in the two individuals with the greatest observation times, subjects 1196 and 915. Later clones from members of the high viral load group displayed longer branch lengths and therefore greater divergence than those from members of the low viral load group.
This phylogenetic analysis suggests that the high viral load group exhibited a higher magnitude of nucleotide divergence over time, a divergence not evident in analyses of earlier samples.
FIG. 1. Low (a) and high (b) viral setpoint group neighbor-joining phylogenetic trees of multiple clones of HIV-1 env C2V3 sequences. The designations of earlier clones appear in roman type, while those of later clones appear in italics. In the clone designations, the letters E and L respectively indicate early and late time point clones, the letters A, B, and C indicate different PCRs, and the final digit indicates the clone number sequenced. The tree was constructed using 1,000 bootstrap resamplings. The number of resamplings carried out is indicated at each major branch point. Bar, 1% nucleotide divergence.
FIG. 2. Box-and-whisker plots of intrapatient diversity at earlier and later time points (a) and diversification rates (b), stratified by viral load in each case. The line in the middle of the box represents the median and the box extends from the 25th percentile (x[25]) to the 75th percentile (x[75]); the distance between these percentiles is defined as the interquartile range (IQR). The whiskers extend to the upper and lower adjacent values. The upper adjacent value is the largest observation less than or equal to x[75] + 1.5(IQR) and the lower adjacent value is the smallest observation greater than or equal to x[25] - 1.5(IQR). Any observed points more extreme than the adjacent values are referred to as outside values or outliers and are plotted individually.
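The whisker rule used in box-and-whisker plots of this kind can be made concrete with a short sketch. It follows the standard convention in which the fences sit 1.5 IQR above the 75th percentile and 1.5 IQR below the 25th percentile. The function name `adjacent_values` and the toy data are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption that may differ slightly from the one used by the plotting software in the study.

```python
import statistics


def adjacent_values(values):
    """Return (lower adjacent, upper adjacent, outside values) for a box plot."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    # Whiskers stop at the most extreme observations that lie inside the fences.
    upper_adjacent = max(v for v in data if v <= upper_fence)
    lower_adjacent = min(v for v in data if v >= lower_fence)
    outside = [v for v in data if v < lower_fence or v > upper_fence]
    return lower_adjacent, upper_adjacent, outside


# Toy data for illustration only (not study data).
result = adjacent_values([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(result)
```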
Nucleotide divergence between all earlier and later viral sequences. Divergence can be defined as the distance from a founder sequence to a later sequence. The average of nucleotide distance between all earlier and later sequences was calculated for each individual, and this single value approximated the nucleotide divergence from earlier to later virus populations in a single individual. This divergence value was significantly different for the high (median distance, 0.054) and the low (median distance, 0.029) load subjects (P = 0.015 [Wilcoxon rank sum test]), with the high viral load group subjects displaying a greater level of divergence from earlier to later time points (data not shown). Linear regression models identified a positive linear correlation between the log10-transformed viral load level and the nucleotide distance measurement, with a time-adjusted slope of 0.30 (95% CI, 0.083 to 0.52; P = 0.012) and nonadjusted slope of 0.28 (95% CI, 0.058 to 0.50; P = 0.018). A positive correlation between viral load level and the magnitude of nucleotide divergence was identified, with high viral load subjects displaying greater divergence over time.
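The per-individual divergence value described above, the average distance over all earlier-later sequence pairs, can be sketched as follows. This is an illustration under assumptions: an uncorrected proportion of differing sites stands in for the DNAdist distances used in the study, and the names `p_distance` and `mean_between_timepoint_distance` as well as the toy sequences are hypothetical.

```python
from itertools import product


def p_distance(a, b):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_between_timepoint_distance(earlier, later):
    """Average distance over every (earlier clone, later clone) pair."""
    pairs = list(product(earlier, later))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy clones for illustration only (not study data).
earlier_clones = ["ACGT", "ACGA"]
later_clones = ["ACGT", "TCGA"]
divergence = mean_between_timepoint_distance(earlier_clones, later_clones)
print(divergence)
```

Only earlier-later pairs enter the average, so diversity within a single time point does not contribute to this value.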
Amino acid divergence between all earlier and later viral sequences. The understanding of amino acid diversity and divergence is important inasmuch as the majority of immune responses to infection occur at the protein level. The average amino acid distance between all earlier and later sequences was calculated for each individual, and this single value approximated the amino acid divergence from earlier to later virus populations in a single individual. The amino acid divergence was significantly different for the high and low viral load groups (median divergence, 0.26 versus 0.11; P = 0.048 [Wilcoxon rank sum test]) with the high viral load subjects exhibiting a significantly higher amino acid divergence from earlier to later time points (data not shown). There was a positive but nonsignificant linear correlation between the protein distance and the log10-transformed viral load measurement (P = 0.094). This suggested that the greater nucleotide divergence noted in the high viral load group resulted in amino acid changes and therefore a greater amino acid divergence associated with high viral load levels. The increase in potential nonsynonymous changes in this population may provide evidence of additional selective pressures in individuals carrying high viral loads.
Consensus sequence analysis of nucleotide and amino acid divergence. Consensus nucleotide and amino acid sequences were generated for all earlier and later time point samples from each subject. These alternate divergence measures were evaluated to confirm the previous findings. The nucleotide divergence between earlier and later consensus sequences was significantly greater in the high viral load group (median divergence, 0.027 [high viral load group] versus 0.0092 [low viral load group]; P = 0.023 [Wilcoxon rank sum test] [data not shown]). Additionally, the amino acid divergence between earlier and later consensus sequences was significantly greater in the high viral load group (median divergence, 0.13 [high viral load group] versus 0.044 [low viral load group]; P = 0.023 [Wilcoxon rank sum test] [data not shown]).
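For two groups as small as those compared here, a two-sided Wilcoxon rank sum P value can be obtained by enumerating every possible assignment of ranks to the first group. The sketch below is one way to do this and is an assumption-laden illustration: the text does not state whether exact or approximate P values were computed, the function names are hypothetical, and the example values are toy numbers rather than study data.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact_p(x, y):
    """Two-sided exact P value for the rank sum of x within the pooled sample."""
    ranks = midranks(list(x) + list(y))
    n1 = len(x)
    expected = n1 * (len(ranks) + 1) / 2
    observed = abs(sum(ranks[:n1]) - expected)
    sums = [sum(c) for c in combinations(ranks, n1)]
    # Count rank assignments at least as far from the expected sum as observed.
    extreme = sum(abs(s - expected) >= observed - 1e-12 for s in sums)
    return extreme / len(sums)


# Toy values for illustration only (not study data).
p_value = rank_sum_exact_p([1, 2, 3], [4, 5, 6, 7])
print(p_value)
```

With groups of seven and five there are only 792 assignments, so full enumeration is inexpensive.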
dn/ds ratios as a measure of positive selection pressure. Characterization of nucleotide substitutions in a gene or genome involves several main forces, including mutation generation, random genetic drift, purifying or negative selection, and occasionally positive selection (12). The ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site (dn/ds ratio) is a measure of positive selection and is frequently used in the HIV-1 system. A dn/ds ratio of greater than 1 may be considered indicative of positive selective pressure. We calculated the dn/ds ratios for our groups of high and low viral load individuals through the SNAP program (Los Alamos Database), based on the method employed by Nei and Gojobori (28). The calculated dn/ds ratios were not normally distributed; therefore, analysis by the nonparametric Wilcoxon rank sum test was conducted on both direct and loge-transformed values. There was no significant difference between the high and low viral load groups with regard to dn/ds ratios (median dn/ds ratio, 1.20 [high viral load group] versus 0.435 [low viral load group]; median loge dn/ds values, 0.19 [high viral load group] versus -0.83 [low viral load group]; P = 0.168 [Wilcoxon rank sum test]), although the trend was towards a higher median dn/ds ratio in the high viral load individuals.
Synonymous changes per synonymous site were similar for individuals carrying high and low viral loads (median ds value, 0.0484 [high viral load group] versus 0.0419 [low viral load group]; median loge ds value, -3.03 [high viral load group] versus -3.17 [low viral load group]; P = 0.465 [Wilcoxon rank sum test]), but nonsynonymous changes per nonsynonymous site were significantly greater in the high viral load group than in the low viral load group (median dn value, 0.0567 [high viral load group] versus 0.0232 [low viral load group]; median loge dn value, -2.87 [high viral load group] versus -3.76 [low viral load group]; P = 0.028 [Wilcoxon rank sum test]). This argues against strong positive selection on one of the two populations but indicates that there may have been a greater number of nonsynonymous substitutions in the high viral load group. It is possible that selective pressures are constant across all individuals regardless of viral load category, and the greater number of nonsynonymous changes in the high viral load group might have occurred secondary to the higher viral replication rate, although this would predict an equivalent increase in the number of synonymous changes. Alternatively, it is possible that the high viral load population is subject to positive selective pressures, which could include diversifying selection for amino acid changes. This could select for a more cytopathic, pathogenic, or robust viral variant, as has been suggested previously (22).
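The final step of a Nei-Gojobori style calculation can be sketched as follows: observed proportions of synonymous and nonsynonymous differences are each passed through the Jukes-Cantor correction, and the corrected values are divided. This is a simplified illustration, not the SNAP implementation. The counting of synonymous and nonsynonymous sites and differences from codon alignments is assumed to have been done already, and the function names and input counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (dn, ds, dn/ds) from already-counted differences and sites."""
    ds = jukes_cantor(syn_diffs / syn_sites)
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return dn, ds, dn / ds


# Hypothetical counts for illustration only (not study data).
dn, ds, ratio = dn_ds(syn_diffs=3, syn_sites=60, nonsyn_diffs=12, nonsyn_sites=240)
print(dn, ds, ratio)
```

In this toy input both proportions are 0.05, so the ratio is 1, the value expected when nonsynonymous and synonymous changes accumulate at the same per-site rate.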
Interaction between the effects of viral load and time and its effect on nucleotide and amino acid divergence. The rate of viral divergence over time appeared to differ by viral load level. This interaction was noted for viral load measured as a binary variable (Fig. 3a) and as a continuous quantitative variable (Fig. 3b). Fig. 3a shows a plot of the nucleotide and amino acid divergence (mean nucleotide distance and mean protein distance between all earlier and later clones) versus the observation time separately by viral load group. In each plot, the best-fit regression lines for the low and high viral load groups are substantially nonparallel, illustrating the interaction. Statistical tests for interaction in the linear regression model gave P values of 0.148 and 0.089 for mean nucleotide and mean protein distances, respectively. Statistical tests for interaction in additional linear regression models for consensus nucleotide distances (P = 0.033) and consensus protein distances (P = 0.052) were performed (data not shown). The results were sensitive to the individual with low viral load in the bottom right corner of each plot; this and the small sample size imply that formal inferences should be made cautiously. Figure 3b illustrates the interaction when viral load is considered quantitatively for nucleotide and amino acid divergence. In each plot, the black line represents the estimated increase in mean divergence (100%) per 1-log10 increase in viral load as a function of time and the gray lines are 95% pointwise confidence intervals. The steepness of the (positive) slope indicates the strength of interaction, with a zero slope indicating no interaction. Statistical tests for interaction in the linear regression model gave P values of 0.175 and 0.102 for mean nucleotide and mean protein distances, respectively.
Statistical tests for interaction in additional linear regression models for consensus nucleotide distances (P = 0.069) and consensus protein distances (P = 0.097) were performed (data not shown). In sum, the data suggest that the rate of viral divergence over time was modified by viral load, with a higher viral load leading to a considerably higher divergence rate. The relationship between divergence and time was different between high and low viral load levels, with evidence of a strong interaction between observation time and viral load level.
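When viral load is treated as a binary variable, a linear model with a time-by-group interaction term fits a separate line to each group, and the interaction coefficient equals the difference between the two group slopes. The sketch below illustrates that point estimate only; it does not reproduce the significance tests reported above. The function names and all data values are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def interaction_estimate(time_low, div_low, time_high, div_high):
    """Difference in divergence-versus-time slopes (high minus low group)."""
    return ols_slope(time_high, div_high) - ols_slope(time_low, div_low)


# Hypothetical observation times (years) and divergence values, not study data.
estimate = interaction_estimate(
    time_low=[1, 2, 3], div_low=[0.01, 0.02, 0.03],
    time_high=[1, 2, 3], div_high=[0.02, 0.05, 0.08],
)
print(estimate)
```

Parallel regression lines would give an estimate of zero, which corresponds to no interaction between observation time and viral load group.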
FIG. 3. (a) Nucleotide and amino acid divergence (y axis) plotted against observation time in years (x axis) with regression lines. Squares, high viral load subjects; circles, low load subjects. (b) Slope of nucleotide and amino acid divergence (y axis, represented by mean distances) plotted against observation time (x axis) for all subjects. Solid line, estimated (percent) increase in mean divergence per 1-log10 increase in viral load as a function of time interval; broken line, 95% pointwise confidence interval; dashed horizontal line, zero line. (c) Number (log10 transformed) of viral RNA copies/ml of plasma (y axis) for each subject plotted against nucleotide and amino acid divergence rates (x axis, mean distance divided by observation time).
Many studies have evaluated HIV-1 subtype B diversity at single and sequential time points and its correlation with disease progression (1, 7, 11, 15, 18, 20, 22, 24, 25, 40, 41, 43, 47). These studies included individuals treated with antiretroviral drugs and sampled both during the disease-free interval and once clinical AIDS was established. Some studies evaluated both HIV-1 seroincident and seroprevalent individuals. A few studies did not directly examine sequence when calculating diversity, while others did not control for resampling bias. To our knowledge, this is the first study to examine the relationship between viral load level and genetic diversity in a non-B subtype during the disease-free interval. We attempted to characterize the relationship between genetic diversity and the HIV-1 RNA viral load and also sought to determine whether the relationship was linear and positively or negatively correlated. Additionally, this design provided an opportunity to examine diversity during the disease-free interval, unaltered by the viral dynamics of clinical AIDS or acute infection. Not surprisingly, the magnitude of diversity was positively correlated to viral load level, such that individuals carrying high viral loads demonstrated a greater intrapatient diversity, diversification, diversification rate, and divergence over time in their nucleotide and protein sequences than did individuals carrying low viral loads. Further, the greater the observation time, the greater the generated divergence between the earlier and the later samples in the high viral load group. Accordingly, intrapatient diversity was similar between individuals carrying high and low viral loads earlier in infection, with a trend towards a greater intrapatient diversity in the high viral load group later in infection. Much of this may be attributable to mutations generated by increasing rounds of viral replication in the individuals carrying high viral loads as well as to rapid virus turnover. 
Reverse transcriptase has a high error rate and an inadequate proofreading and repair mechanism. Hence, the mutation rate of HIV-1 has a profound effect on genetic variability (4), and if diversity generated by mutation rate is directly proportional to rounds of replication, this diversity will be positively compounded by time. Not surprisingly, the high levels of nucleotide diversity and divergence found in the high viral load group were magnified in individuals with the greatest observation time, a finding similar to what has been observed for subtype B infections (17). It therefore seems quite likely that viral replication rate and the resultant level of circulating virus are the primary driving forces in the generation of nucleotide diversity in this population of CRF02_A/G-IbNG-infected individuals.
All individuals generated diversity and divergence proportional to their viral load level: individuals with high viral load were characterized by high diversity and divergence and individuals with low viral load were characterized by lower diversity and divergence, as has been described previously in studies of HIV-1 subtype B (22, 25). Increased nucleotide diversity and divergence which resulted in concordant amino acid divergence was noted in high viral setpoint individuals. Accordingly, individuals carrying high viral loads possessed a greater number of nonsynonymous nucleotide changes. This might be consistent with initial strong immune pressures on the virus population until rapidly replicating variants predominated to push the individual to AIDS (30) or on individual virulent variants arising from the accumulation of nonsynonymous changes during the progression to AIDS (40). There was no significant difference in the dn/ds ratio between high and low viral load individuals. However, one cannot rule out the possibility of positive selective forces acting on the high viral load population, or of the absence of positive selection on the low viral load population. Even the presence of small selective forces, whether positive or negative in nature, could influence variation in the population (4). The magnitude of sequence divergence noted in the high viral setpoint individuals in this study is most likely a direct function of a higher viral replication rate and duration of infection (time) to compound resulting error-associated mutations. Alternatively, one cannot rule out new, unique, and virulent viral variants that emerge as a result of high replication rate in the individuals with high viral setpoints and continue to reestablish new populations as infection time progresses.
Unlike many of the published studies of diversity, which evaluated subtype B-infected homosexual men, this study was an evaluation of a cohort composed solely of women infected with non-subtype B virus through a heterosexual mode of transmission. The role of gender-specific differences in the biology of HIV disease pathogenesis and transmission remains to be fully characterized, although the data suggest that such differences may exist in viral load (8, 10, 36, 42) and pathogenesis (8, 10). There is also evidence that HIV-1-infected women from Mombasa, Kenya, harbor viruses of greater heterogeneity than do HIV-1-infected men of the same population (19). Such gender-specific differences in biology and transmission may affect the interpretation of diversity in this population as well as the general applicability of any findings to other HIV-1-infected populations.
It is also possible that diversity seen in the env C2V3 region may not be fully representative of diversity across the entire genome and that selective forces towards positive selection in this region may not be applicable to forces across the entire genome. However, the V3 region is an important functional region for virus tropism and pathogenicity, cytotoxic T-lymphocyte responses, humoral antibody responses, and coreceptor usage and is also a region of frequent mutations (22). Therefore, examination of mutations and diversity in this region may represent genetic diversity in functionally important domains more susceptible to selection. Future studies to further characterize the relationship between genetic diversity and viral load are clearly needed.
It is also necessary to address and better clarify the role of immunologic correlates and their associations with viral load. We have recognized an interaction between genetic diversity, viral setpoint, and duration of infection. Therefore, elucidation of the etiology and directionality of these variables and their influences on one another may provide additional prognostic markers for disease progression in regions where non-B subtypes predominate. In particular, accounting for such diversity during vaccine design might allow for selection of more broadly neutralizing epitopes for use in negative partners in discordant couples, babies of HIV-1-infected mothers, or in other individuals at risk for transmission by viruses that establish high viral setpoints. Elucidation of the interactions between diversity and markers of disease progression will be pivotal for vaccine design, therapy development, and disease pathogenesis studies.
This work was supported in part by U.S. Army grant DAMD 17-95-C-5005 as well as by NIH grants AI 43879 and AI 467274.