ABSTRACT
HIV-1 entry into target cells influences several aspects of HIV-1 pathogenesis, including viral tropism, HIV-1 transmission and disease progression, and response to entry inhibitors. The evolution from CCR5- to CXCR4-using strains in a given human host is still unpredictable. Here we analyzed the timing and predictors of coreceptor evolution among recently HIV-1-infected individuals. Proviral DNA was longitudinally evaluated in 66 individuals using Geno2pheno[coreceptor]. Demographics, viral load, CD4+ and CD8+ T cell counts, CCR5Δ32 polymorphisms, GB virus C (GBV-C) coinfection, and HLA profiles were also evaluated. Ultradeep sequencing was performed on initial samples from 11 selected individuals. A tropism switch from CCR5- to CXCR4-using strains was identified in 9/49 (18.4%) individuals. Only a low baseline false-positive rate (FPR) was found to be a significant predictor of tropism switch. No minor CXCR4-using variants were identified in initial samples of 4 of 5 R5/non-R5 switchers. Logistic regression analysis showed that patients with an FPR of >40.6% at baseline presented a stable FPR over time, whereas lower FPRs tended to decay progressively, leading to the emergence of CXCR4-using strains, with a mean evolution time of 27.29 months (range, 8.90 to 64.62). A baseline FPR above the 40.6% threshold determined by logistic regression analysis may make further tropism determination unnecessary, both for predicting disease progression related to the emergence of X4 strains and for considering the use of CCR5 antagonists. The detection of variants with intermediate FPRs and progressive FPR decay over time not only strengthens the power of Geno2pheno in predicting HIV tropism but also indirectly confirms a continuous evolution from earlier R5 variants toward CXCR4-using strains.
IMPORTANCE The introduction of CCR5 antagonists in the antiretroviral arsenal has sparked interest in coreceptors utilized by HIV-1. Despite concentrated efforts, viral and human host features predicting tropism switch are still poorly understood. Limited longitudinal data are available to assess the influence that these factors have on predicting tropism switch and disease progression. The present study describes longitudinal tropism evolution in a group of recently HIV-infected individuals to determine the prevalence and potential correlates of tropism switch. We demonstrated here that a low baseline FPR determined by the Geno2pheno[coreceptor] algorithm can predict tropism evolution from CCR5 to CXCR4 coreceptor use.
INTRODUCTION
HIV-1 entry into target cells influences several aspects of HIV-1 pathogenesis, including viral tropism, HIV-1 transmission and disease progression, and response to entry inhibitors (1, 2). It is widely accepted that CCR5 coreceptor-using viruses predominate during the early stages of infection (3–5), with some exceptions (6). During infection, depending on the viral subtype, a proportion of patients experience emergence of CXCR4 coreceptor-using strains, which usually coincides with the earliest signs of disease progression. However, this shift in tropism is not required for AIDS progression. In fact, CXCR4-using strains are observed in about 50% of individuals with subtype B strains during chronic infection and virological failure (7–10).
The introduction of maraviroc into the antiretroviral arsenal and the development of new drugs targeting coreceptors have renewed interest in the coreceptors utilized by HIV-1. Maraviroc has been licensed for HIV-infected patients harboring exclusively CCR5-tropic viruses. Its safety and pharmacokinetic profile further supports its consideration for simplification and intensification strategies (11). Several genotypic and phenotypic assays have been developed to infer coreceptor tropism. Phenotypic assays are the gold standard for the direct determination of HIV-1 tropism; however, technical and logistic limitations restrict their use in clinical settings. Alternatively, sequencing of the third variable loop (V3 loop) of HIV-1 envelope gp120 and interpretation using bioinformatics tools such as Geno2pheno[coreceptor] and position-specific scoring matrices (PSSMX4R5) (12–15) are more accessible for routine laboratories, but opinions on their clinical utility still differ considerably (16–18).
Despite concentrated efforts, factors predicting tropism switch are still poorly understood. Different theories have been proposed to explain tropism switch (reviewed in reference 19). First, non-R5 variants may emerge through stepwise evolution from R5 strains (20, 21), which could result in the emergence of minor CXCR4-using strains very early during infection. Second, CXCR4-using viruses may have been transmitted together with R5 variants, but effective immune responses against non-R5 viruses during primary infection might have prevented their expansion, so that their late emergence is associated with a compromised immune system (22–24). Third, CCR5- and CXCR4-using strains have different target cell ranges. CXCR4 is usually expressed on a high fraction of naive CD4+ T cells, whereas a substantial fraction of memory cells express both CXCR4 and CCR5 (25, 26). The shift in target cell populations over time may therefore be responsible for coreceptor switch. During primary infection, the frequency of proliferative naive CD4+ T cells is low compared to that of memory cells, but it increases later in infection. This relative increase shifts the selection pressure in favor of non-R5 viruses (27). It is also conceivable that the fixation and expansion of a CXCR4-using strain do not occur stochastically. During early stages of HIV infection, virus replication in lymphoid tissues occurs mainly in the gut-associated lymphoid tissue (GALT) of the gastrointestinal tract, which contains an excess of the CXCR4-blocking ligand SDF-1 and thereby counterselects CXCR4-using strains (28). Upon exhaustion of the GALT, HIV replicates predominantly in the thymus, the major organ of naive T cell production, thus positively selecting for CXCR4-using strains (29).
We therefore asked what the correlates of tropism evolution over time would be. To answer this question, we studied individuals from a cohort of recently HIV-1-infected persons and analyzed HIV-1 sequences from the V3 region of gp120, viral loads, and CD4+ and CD8+ T cell counts during the antiretroviral-naive period and after antiretroviral initiation. We also evaluated the influence of CCR5 wild-type (wt)/Δ32 polymorphisms, GB virus C (GBV-C) coinfection, HLA type, HIV-1 subtype, and demographic characteristics such as gender, age, and HIV acquisition route on the pace of HIV-1 tropism evolution over time in this population. Very few studies have described longitudinal tropism evolution in individuals followed from initial infection (30). We demonstrated that a tropism switch from CCR5 to CXCR4 coreceptor use was rare and that a low baseline false-positive rate (FPR) determined by the Geno2pheno[coreceptor] algorithm can not only determine but also predict tropism evolution.
RESULTS
The longitudinal HIV-1 coreceptor tropism in proviral DNA samples of 66 individuals (59 male and 7 female) was analyzed. The mean interval between the first and last samples analyzed was 55.1 months (range, 3.0 to 114.3). The mean age calculated at the time of recruitment in the study was 32.5 years (±8.3); the mean age of male participants was 31.9 years (±8.2), and that of female participants was 37.3 years (±7.4). Fifty-four of the 59 male participants were men who have sex with men (MSM). For one female (2053), the route of HIV-1 acquisition was unavailable, whereas all others reported heterosexual exposure. Two individuals presented a CCR5 wt/Δ32 profile, and 18 (27.3%) had GBV-C coinfection.
Correlates of coreceptor tropism switch. At baseline, 11 (16.7%) participants were predicted to harbor CXCR4-using viruses with a 10% FPR cutoff, and 17 (25.8%) with a 20% cutoff. For further analysis, the 20% FPR cutoff was retained for tropism predictions in accordance with European guidelines (31). Ten of 49 individuals with R5 strains and 2 of 17 harboring CXCR4-using strains had a baseline CD4+ T cell count of <350 cells/μl (P = 0.47 by Fisher's exact test). The V3 region was of subtype B in 63 (95.5%) cases, subtype F in 2, and subtype C in 1. No significant association was observed between the two groups for gender, age, route of transmission, CCR5 Δ32 profile, presence of GBV-C coinfection, HIV-1 subtype, baseline viral load, or CD4+ and CD8+ T cell counts.
Follow-up analysis of samples revealed that 40 (60.6%) individuals remained R5 tropic (R5/R5) and 15 (22.7%) remained CXCR4 using (non-R5/non-R5). Discordant tropism results at the two time points were observed for 11 individuals; 9 individuals who were predicted initially to harbor CCR5-tropic viruses showed a switch to CXCR4-using strains (R5/non-R5), and 2 individuals who harbored CXCR4-using strains at baseline showed an R5 tropism profile at the last time point (non-R5/R5).
To investigate potential correlates of a tropism switch from CCR5 to CXCR4 coreceptor use, the analysis concentrated on the 40 R5/R5 and 9 R5/non-R5 individuals. No significant difference between the two groups was observed for age, gender, GBV-C coinfection, CCR5 Δ32 profile, baseline viral load, or CD4+ T cell count (Table 1). Only a low FPR at baseline was found to be a predictor of a tropism switch (FPR < 40%, P = 0.00004; FPR < 50%, P = 0.0006). Conversely, a high FPR was associated with the absence of a tropism switch from R5 to non-R5.
Demographic and virological characteristics of individuals grouped according to tropism switch categories
Envelope V3 loop evolution over time. The population V3 sequences at the two time points were compared to identify mutations influencing the tropism switch. The V3 loop amino acid alignment shows variation throughout the V3 loop relative to the common North American reference subtype B sequence, D85.40 (32). Minor sequence variations with no impact on the tropism profile were observed among the nonswitchers (R5/R5 and non-R5/non-R5; see Tables S1 and S2, respectively, in the supplemental material).
To gain insight into tropism evolution among individuals switching from CCR5 to CXCR4 coreceptor use, all intermediate samples for which proviral DNA was available were investigated (Table 2 and Fig. 1). V3 sequences were analyzed at a median of 5 time points, with a minimum of 2 (individual 2029) and a maximum of 8 (individual 1057). Figure 2 depicts the phylogenetic relationships among the sequences from these individuals. Since antiretroviral therapy (ART) obscures the natural progression of disease, only individuals who remained ART naive were considered when estimating the timing of the tropism switch. The mean and median evolution times in these six naive individuals were 27.29 and 17.83 months, respectively, with a minimum of 8.90 months (individual 2029) and a maximum of 64.62 months (individual 1066). Compared with the 3 individuals who underwent ART, none of the ART-naive individuals harbored strains with basic residues at position 11 and/or 25 when CXCR4-using strains emerged. However, an increase in V3 loop net charge was observed with evolution to CXCR4 coreceptor use, except for individual 1066, whose HIV strains had a net charge of +3 throughout follow-up. Five of 31 individuals from the R5/R5 group showed an increase in V3 loop net charge, compared to 7 of 9 from the R5/non-R5 group (Fisher's exact test, P = 0.001).
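The two V3 features discussed here, basic residues at positions 11 and 25 (the 11/25 rule) and loop net charge, can be computed directly from an aligned V3 sequence. The following is a minimal Python sketch, not the pipeline used in this study. It assumes an ungapped 35-residue V3 loop numbered from the first cysteine and computes net charge as (K + R) minus (D + E), ignoring histidine. The input sequence is illustrative only (it resembles the commonly cited subtype B consensus V3), and the S11R variant is hypothetical.

```python
# Minimal sketch: V3 loop net charge and the 11/25 rule.
# Assumptions (not from the study): net charge = (K + R) - (D + E),
# histidine ignored; 1-based positions along an ungapped V3 loop
# starting at the first cysteine.

POSITIVE = set("KR")
NEGATIVE = set("DE")


def v3_net_charge(v3: str) -> int:
    """Return (number of K/R residues) minus (number of D/E residues)."""
    seq = v3.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def basic_at_11_or_25(v3: str) -> bool:
    """True if position 11 or 25 (1-based) carries a basic residue (K or R)."""
    seq = v3.upper()
    if len(seq) < 25:
        raise ValueError("expected an ungapped V3 loop of at least 25 residues")
    return seq[10] in POSITIVE or seq[24] in POSITIVE


# Illustrative input only: a sequence resembling the subtype B consensus V3
# (assumption) and a hypothetical variant carrying S11R.
baseline = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
variant = baseline[:10] + "R" + baseline[11:]

for name, seq in [("baseline", baseline), ("S11R", variant)]:
    print(name, "net charge:", v3_net_charge(seq), "basic at 11/25:", basic_at_11_or_25(seq))
```

A single substitution such as S11R both satisfies the 11/25 rule and raises the net charge by one, which is why the two features often move together, although, as noted above, net charge can increase without a basic residue appearing at either position.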
Overview of the V3 loop amino acid sequence alignments from the individuals showing a tropism switch from CCR5 to CXCR4 coreceptor use at different time points
Changes in plasma viral load and CD4+ T cell count over time in each of the study subjects from the R5/non-R5 group. Individual 2029 is not shown because of limited available time points. The shaded area in each panel borders the two time points between which tropism switching occurred; the left border indicates CCR5 tropism, whereas the right border indicates non-R5 tropism. Arrows above the x axes indicate the time points at which coreceptor tropism was analyzed. Black arrows show CCR5 tropism and red arrows non-R5 tropism. Horizontal dashed red lines in each panel indicate the lower detection limit of the plasma viral load assays. The start of ART in individuals 1022, 1002, and 1135 is marked in the respective panels. Plasma viral load is shown as log10 HIV-1 RNA copies/ml and CD4+ T cell counts as cells/μl.
Neighbor-joining phylogenetic tree of env V3 sequences from the R5/non-R5 group. Solid symbols indicate R5 HIV strains, whereas open symbols represent non-R5-tropic strains. Bootstrap values above 70% are indicated at the nodes. An unrelated non-American B consensus sequence, D85.40 (32), was used as an outgroup.
Mutational pathways toward CXCR4 use were specific to each individual (Table 2). One or two mutations were sufficient to drive the tropism switch among the treatment-naive individuals, accompanied by a gradual decrease in CD4+ T cell count, compared with those who underwent ART (Fig. 1). Interestingly, FPR consistently decreased gradually over time with evolution toward CXCR4 use: the median FPR fell from 33.7% (range, 20.4 to 71.8%) at baseline to 10.0% (range, 6.9 to 17.9%) at the time of switch.
We also identified two individuals (1130 and 1049) in whom CXCR4-using strains were detected at baseline and who reverted to R5 strains over time. Differences in V3 loop composition between the two time points accounted for this “reversal” of tropism profile. The length of the V3 loop, N-linked glycosylation, and the composition of the V3 tip were conserved during follow-up, and basic amino acids were absent at position 11 at both time points. We consider it unlikely that this finding reflects an overestimation of non-R5 prevalence due to the 20% FPR cutoff used in our analysis, since both individuals had baseline FPRs below 10% that rose above 20% during follow-up (see Table S3 in the supplemental material). Individual 1049 had lysine at position 25 at baseline, which changed to aspartic acid at the later time point. Notably, this individual was also coinfected with GBV-C. Coinfection with GBV-C has been associated with decreased immune activation (33) and slower disease progression (34).
To evaluate the presence of minority variants that might affect tropism evolution, we performed ultradeep sequencing (UDS) on 11 baseline samples from selected patients (Table 3). We were able to sequence samples from 5 of 9 R5/non-R5 and 1 of 2 non-R5/R5 individuals who showed a tropism switch. Among the R5/non-R5 switchers, UDS revealed strains with a lower FPR than that detected by population sequencing in the baseline samples of 3 of 5 individuals. In individual 1057, UDS revealed a minor population (2.1% prevalence) with an FPR of 19.1%, whereas the remaining strains had an FPR of 33.7%, the same FPR previously detected by population sequencing at baseline. A similar pattern was seen in sample 1089, in which two minority strains (each with 0.6% prevalence) had FPRs of 11.5% and 30.1%, whereas 98.7% of strains had an FPR of 27.3% (the same as in population sequencing) (Table 3). UDS of the baseline sample from patient 1110 (R5/non-R5) revealed a CXCR4-using variant (FPR = 2.1%) with an abundance of 71.5%, in contrast to population sequencing, which gave an R5 prediction. The second most abundant variant (11.3%), with R5 tropism, was similar to that found by population sequencing. Individual 1130, who switched from CXCR4 to CCR5 coreceptor use, showed no CCR5-tropic minor variants at baseline.
Overview of ultradeep sequencing on baseline samples from selected individuals
Among the 3 tropism nonswitchers with an R5/R5 prediction, no minority variants with X4 tropism were identified. None of the minor R5 variants were identified with an FPR lower than that detected by population sequencing.
UDS of the baseline sample from individual 1121 (non-R5/non-R5) identified V3 variants with FPRs identical to that from population sequencing. Interestingly, UDS of the sample from individual 1133 (non-R5/non-R5) detected the two most abundant V3 variants with FPRs identical to that from population sequencing (11.7%), but this individual also harbored minority populations with high FPRs, such as 67.0% (5.7% prevalence), 38.1% (2.7%), and 32.1% (1.3%), suggesting that this patient was closer to a tropism switch (Table 3; see Table S2 in the supplemental material).
To further confirm intrapatient V3 evolution, we performed BEAST analysis for individuals sampled more than twice during follow-up (n = 15). Ancestral phylogenetic reconstruction estimating substitution rates and time to the transmitted/founder (T/F) strains, based on 85,000 generated phylogenetic trees, revealed node ages ranging from 0.98 to 5.22 years for individual patients (Fig. 3A). Posterior probabilities were close to 1, supporting the robustness of the analysis (Fig. 3B). As shown in Fig. 3C, substitution rates were consistently high for every HIV quasispecies infecting distinct human hosts, indicating intense HIV evolution in the V3 region of gp120.
Bayesian phylogenetic trees of HIV-1 env V3 sequences under an uncorrelated log-normal relaxed clock with a GTR + G substitution model. The Bayesian skyline plots depict ages of coalescent intervals for each patient (A), posterior probabilities for each analysis (B), and minimum and maximum substitution rates per site per year (C). Sequences are grouped by patient, and the sampling time (in months) and the respective clinic visit number are indicated in the sequence names. The clades of each patient are collapsed and colored from darkest to lightest for the most recent years, the best posterior probability values, and the minimum and maximum substitution rates per site per year, each with a 95% highest posterior density (HPD) credible interval. The time scale, shown at the bottom of each panel, covers 5.5 years counting back from 2007. UDS, ultradeep sequencing; m, months.
FPR evolution over time. FPRs at the two time points were compared for each individual. More than 93% of individuals harboring R5-tropic strains at baseline with FPRs of >50% maintained their tropism profiles; 27.5% of the individuals (n = 11) showed no change in FPR, 32.5% showed a decrease, and 40% showed an increase in FPR by the end of the study. FPR estimates for the non-R5/non-R5 group either remained the same or decreased over time, indicating further adaptation toward CXCR4 coreceptor use. In the group that reverted to CCR5 tropism, both individuals had an FPR of <40% at the end of follow-up. Interestingly, 6/9 individuals in the R5/non-R5 group who remained treatment naive had an FPR of less than 40% at baseline, indicating that the FPR value assessed by the Geno2pheno[coreceptor] algorithm may be used to anticipate tropism switch over time.
FPR as predictor of coreceptor tropism evolution. The association of a low baseline FPR with tropism switch suggested that the FPR could be used to build a predictive model of potential tropism switch in other patients and as a marker of tropism switch. We performed a logistic regression analysis among ART-naive individuals to determine an FPR threshold defining the probability that a viral strain will switch tropism. The box-and-whisker plot (Fig. 4A) clearly shows the difference between the FPRs of the two groups. To reduce variance and bias, 10-fold cross-validation was applied to the training data set (5 cases per fold). Training the model on the training data set yielded an intercept of 7.47324 and an FPR coefficient (beta) of −0.19962, with a P value of 0.00879, below our significance level.
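Taking the reported coefficients at face value, the fitted model can be evaluated for any baseline FPR. The sketch below assumes that the modeled outcome (coded 1) is an R5-to-non-R5 switch, which matches the negative FPR coefficient, and that FPR is entered in percent; the helper names are hypothetical. The FPR at which the predicted probability equals 0.5 works out to roughly 37.4%. This is not the 40.6% threshold reported in this section, which was obtained by averaging per-case values over the test set, a step this sketch does not reproduce.

```python
import math

# Coefficients as reported for the training data set.
INTERCEPT = 7.47324
BETA_FPR = -0.19962


def switch_probability(fpr_percent: float) -> float:
    """Predicted probability of an R5 -> non-R5 switch for a baseline FPR (%).

    Assumption: the outcome coded 1 is a tropism switch.
    """
    log_odds = INTERCEPT + BETA_FPR * fpr_percent
    return 1.0 / (1.0 + math.exp(-log_odds))


def fpr_at_probability(p: float) -> float:
    """Invert the model: the baseline FPR (%) at which P(switch) equals p."""
    logit = math.log(p / (1.0 - p))
    return (logit - INTERCEPT) / BETA_FPR


for fpr in (20.0, 40.6, 60.0):
    print(f"FPR {fpr}% -> P(switch) = {switch_probability(fpr):.3f}")
print(f"FPR at P(switch) = 0.5: {fpr_at_probability(0.5):.2f}%")
```

Because the coefficient is negative, the predicted switch probability falls monotonically as the baseline FPR rises, which is the pattern the threshold is meant to capture.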
Evolution of FPR over time among naive individuals from the R5/R5 and R5/non-R5 groups. (A) Differences in FPR between the R5/R5 and R5/non-R5 groups; (B) FPR for patients in the R5/R5 group over time; (C) FPR for patients in the R5/non-R5 group over time.
The model was 94% accurate on the training data (95% confidence interval [CI], 83.45% to 98.75%). This was expected, since the training data set was itself used to estimate the parameters. Accuracy on the test data set declined slightly to 92% (95% CI, 73.97% to 99.02%) (see Table S4 in the supplemental material), which may reflect the smaller number of cases in the test data set (25 versus 50). Further supporting model quality, the receiver operating characteristic (ROC) curve of the true-positive rate against the false-positive rate for the test set had an area under the curve of 98.33%. Using the model parameters, the logit function (35) was used to calculate the FPR associated with the log odds given by these parameters. Because this yields one FPR for each case in the test data set (25 in all), we took the mean of these values as the overall FPR threshold. Table S5 in the supplemental material shows the cutoff value for each case. The mean of these values is 40.6% (95% CI, 22.05% to 59.10%). The graphical interpretation of the threshold value suggests that physicians may be able to use this value provisionally as an estimate of the FPR above which a switch in tropism from R5 to non-R5 is unlikely. Figure 4B and C show how this threshold relates to the two groups. In both panels, each line represents the FPR of a single patient tracked across all visits.
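The accuracy intervals quoted above are consistent with an exact (Clopper-Pearson) binomial confidence interval, assuming the percentages correspond to 47 of 50 training cases and 23 of 25 test cases. The sketch below computes that interval with the standard library only, by bisecting the binomial tail probabilities; it illustrates the calculation and is not necessarily the software used in the study.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) == target, for f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Move toward the side of the interval where f crosses the target.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) CI for a binomial proportion x/n."""
    # Lower bound solves P(X >= x | p) = alpha/2; this tail grows with p.
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= x | p) = alpha/2; this tail shrinks with p.
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


# Assumed counts behind the reported accuracies.
for label, x, n in [("training", 47, 50), ("test", 23, 25)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{label}: {x}/{n} = {x / n:.0%}, 95% CI {lo:.2%} to {hi:.2%}")
```

The wider test-set interval follows directly from the smaller denominator, in line with the explanation given above for the lower test accuracy.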
All individuals in the R5/R5 group (Fig. 4B) remained above the threshold, except for two individuals who were below it at the end of the study. In the R5/non-R5 group (Fig. 4C), however, the threshold line separates the patients who switched tropism from the others; these patients ended with FPRs below 40.6%. We therefore determined that patients with a baseline FPR above 40.6% presented a stable FPR over time, with an area under the ROC curve of 98%, whereas lower FPRs tended to decay progressively, leading to the emergence of CXCR4-using strains. Studies with larger data sets will help to narrow the confidence interval around this threshold.
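The area under the ROC curve can be read as the probability that a randomly chosen switcher has a lower baseline FPR than a randomly chosen nonswitcher (the Mann-Whitney formulation). A minimal sketch follows; the FPR values are hypothetical and are not the study data, and ties are counted as one half.

```python
def roc_auc(positive_scores, negative_scores) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    ties contribute one half.
    """
    wins = 0.0
    for s_pos in positive_scores:
        for s_neg in negative_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical baseline FPRs (%), for illustration only. Switchers are the
# positive class; a LOWER FPR should indicate a switch, so scores are negated.
switcher_fpr = [10.0, 25.0, 45.0]
nonswitcher_fpr = [30.0, 60.0, 80.0]
auc = roc_auc([-f for f in switcher_fpr], [-f for f in nonswitcher_fpr])
print(f"AUC = {auc:.3f}")
```

Here 8 of the 9 switcher/nonswitcher pairs are correctly ordered, so the AUC is 8/9; an AUC near 98% means almost every such pair is ordered correctly by baseline FPR.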
DISCUSSION
The present study was carried out to evaluate coreceptor tropism evolution in a cohort of HIV-1-infected individuals after recent infection. For this purpose, a longitudinal follow-up of individuals was carried out for a median of 59.5 months (range, 3.7 to 117.4), documenting known predictors of disease progression such as viral load, CD4+ and CD8+ T cell counts, CCR5 Δ32 polymorphisms, GBV-C coinfection, HLA subtyping, and viral coreceptor tropism.
We analyzed the HIV-1 V3 loop sequences with the Geno2pheno[coreceptor] algorithm, which reports an FPR, that is, the probability of incorrectly classifying an R5 sequence as non-R5 tropic; different FPR cutoffs can then be applied to this value. The lower the FPR, the lower the likelihood of misclassifying a sequence as non-R5.
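The effect of the cutoff choice can be made concrete with a small sketch. It assumes that a sequence is called non-R5 when its FPR is at or below the chosen cutoff (the convention also used for the ≤3.5% NGS rule discussed later); the function name and FPR values are hypothetical.

```python
def classify_tropism(fpr_percent: float, cutoff_percent: float) -> str:
    """Call tropism from a Geno2pheno FPR.

    Assumption: FPR at or below the cutoff -> non-R5 (CXCR4-using);
    otherwise R5.
    """
    return "non-R5" if fpr_percent <= cutoff_percent else "R5"


# Hypothetical FPR values (%), showing how the cutoff changes the call.
for fpr in (5.0, 15.0, 45.0):
    calls = {cutoff: classify_tropism(fpr, cutoff) for cutoff in (10.0, 20.0)}
    print(f"FPR {fpr}%: {calls}")
```

A sequence with an FPR between 10% and 20% is called R5 at the 10% cutoff but non-R5 at the 20% cutoff, which is why the 20% cutoff classified more individuals as harboring CXCR4-using strains.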
Using a 10% FPR cutoff, 11 of 66 individuals (16.7%) were identified as harboring CXCR4-using strains during recent HIV infection, in accordance with other studies (36–38). A higher proportion (25.8%) was detected when a 20% FPR cutoff was used. Other studies have reported CXCR4 use in 2 to 23% of individuals during recent HIV infection (37–43). However, comparisons among studies are limited by differences in study design, population size, patient characteristics, age, viral subtypes, tropism definition, and assays used to infer coreceptor tropism.
One important aim of this study was to confirm that HIV populations evolve continuously within each human host. Ancestral phylogenetic reconstruction confirmed high substitution rates for every HIV quasispecies infecting distinct human hosts, consistent with continuous HIV evolution over time (Fig. 3). Tropism switch from R5 to non-R5 was rare in this study population; only nine individuals switched during follow-up. In three of them, CD4+ T cell counts decayed below 350 cells/μl, leading to ART initiation according to local guidelines at that time (www.aids.gov.br). In the remaining 6 individuals, the mean evolution time to CXCR4 preference was estimated to be 27.29 months (range, 8.90 to 64.62). We were also able to document strains with intermediate FPRs, accompanied by a gradual decay in FPR, during evolution toward non-R5 tropism. This observation confirms previous studies that demonstrated the appearance of intermediate variants through a stepwise mutational pathway during evolution from CCR5- to CXCR4-using strains (20, 44).
In a previous study, UDS identified CXCR4-using minority strains in 2 of 8 individuals as early as 12 months before the tropism switch; these minority variants were detected at least 3 months before phenotypic detection of CXCR4-using variants by the MT-2 assay (20). However, another study was unable to detect minority non-R5 variants before coreceptor switch (30). Extended bioinformatic analysis of those data estimated a mean evolution time of 10.1 months (range, 4.6 to 24.7) before such variants were detected by the MT-2 assay (21). In contrast to those studies, we estimated tropism evolution from recent infection and documented minor variants before a tropism switch. To exclude the presence at baseline of minor CXCR4-using strains that might have expanded later, we performed UDS on 5/6 samples. In two of these samples, from distinct individuals, UDS detected strains with the same FPR as population sequencing of the baseline sample. Two other samples harbored minority populations with lower FPRs than those detected by population sequencing at baseline. Interestingly, in one individual whose HIV strains switched from R5 to CXCR4 use, UDS detected a majority (71%) of CXCR4-using strains at baseline, whereas population sequencing at the same time point predicted an R5 virus. Assuming that UDS is more accurate, these findings suggest that bulk population sequencing may not reliably reflect the dominant virus in the quasispecies of every patient. Overall, the UDS results presented here suggest added value in detecting strains with lower FPRs in several individuals in whom such populations cannot be detected by conventional population sequencing.
Another important unresolved issue is how long a strain with an R5 prediction will persist. We observed that 97% of individuals who remained R5 tropic had FPRs of >50%, with no trend toward FPR decay over time, and 74% of these individuals were followed for more than 4 years. In contrast, individuals showing a tropism switch had low FPRs at baseline. To evaluate the predictive potential of the FPR for tropism, we built a logistic regression model including data from all antiretroviral-naive individuals. The analysis yielded a threshold of 40.6% (95% CI, 22.05 to 59.10), above which an R5-tropic virus is less likely to switch tropism. The robustness of this model is indicated by an area under the ROC curve of 98.33%. Previous studies have also identified an FPR of <50% as a predictor of tropism switch (45, 46), and another study found no minority CXCR4 variants in samples with FPRs of >60% (47). Because the confidence interval in our study is wide, studies with larger data sets are warranted to confirm this cutoff and narrow the interval.
We observed that ART initiation in two individuals (1002 and 1135) was associated with a “disturbance” in the tropism profile, in contrast to what was usually seen during the natural history of HIV infection. Interestingly, patient 1002 harbored R5 strains during the natural history of infection and developed non-R5 strains, detected in the proviral compartment, upon ART initiation (Fig. 1G). At the next time point, the patient showed a rebound in viremia with the emergence of R5 strains, which were also detected in subsequent samples when the viral load was again suppressed. At the last time point, non-R5 strains were again detected. Notably, the R5 strain (sample 1002_V31) that emerged during viremia under treatment was phylogenetically closer to non-R5 strains, whereas the R5 strain (sample 1002_V34) detected at the next time point clustered with the ancestral R5 strains (Fig. 2). Another patient (1135), originally infected with an R5 strain, showed HIV evolution to non-R5 strains right at ART initiation, after which either R5 or non-R5 strains were detected, although no significant viremia occurred under ART (Fig. 1H). In this case, R5 and non-R5 strains always clustered together in the phylogenetic analysis (Fig. 2).
Moreover, the identification of two patients with tropism reversal from CXCR4 to CCR5 is unexpected. This phenomenon has previously been described among individuals failing ART (48) or upon ART interruption among individuals experiencing virological failure, owing to activation of ancestral viral progeny (49). However, this was not the case for the two individuals analyzed in the current study, since they remained ART naive throughout follow-up (see Table S3 in the supplemental material). The tropism reversal observed here might be due to a replicative advantage of R5 viruses and effective cytotoxic T lymphocyte (CTL) responses against non-R5 strains early in infection (24, 50), as shown by studies in which the viral population reverted to a non-syncytium-inducing phenotype after documented infection with syncytium-inducing strains (51, 52). Whether a true shift in tropism from CXCR4 to CCR5 coreceptor use can occur remains questionable. In one of the two individuals (1130), no minor R5 variants were detected in the baseline sample, so the R5 strains identified later during the course of disease would have evolved from non-R5 strains. Also, GBV-C coinfection, documented in individual 1049, might be contributing to the reemergence of R5 strains. We recognize that these key findings from computational analysis (identifying a CXCR4-to-CCR5 switch) must be confirmed in cell culture assays.
GBV-C coinfection has been associated with decreased T cell activation (33), slower disease progression (34), and better response to ART (53), and it leads to reduced expression of CCR5 and CXCR4 on CD4+ T cells during chronic infection (54). In our cohort, however, GBV-C coinfection showed no significant association with tropism switch. Nevertheless, the higher prevalence of GBV-C coinfection in the R5/R5 group than in the R5/non-R5 group (30% versus 11%, P = 0.41) might be a factor in maintaining tropism profiles, which deserves further investigation.
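Both 2 × 2 comparisons in this study that rely on Fisher's exact test can be checked with a short standard-library sketch: the V3 net-charge comparison (7 of 9 R5/non-R5 versus 5 of 31 R5/R5 individuals, P = 0.001) and the GBV-C comparison above. For the latter, the percentages are assumed to correspond to 12 of 40 R5/R5 and 1 of 9 R5/non-R5 individuals. The two-sided P value sums the probabilities of all tables with the same margins that are no more likely than the observed one, the convention used by, for example, R's fisher.test. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps tables numerically tied with the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: group; columns: feature present / absent.
p_charge = fisher_exact_two_sided(7, 2, 5, 26)  # net-charge increase, R5/non-R5 vs R5/R5
p_gbvc = fisher_exact_two_sided(12, 28, 1, 8)   # GBV-C, R5/R5 vs R5/non-R5 (assumed counts)
print(f"net charge P = {p_charge:.4f}; GBV-C P = {p_gbvc:.2f}")
```

With only 9 individuals in the R5/non-R5 group, a 30% versus 11% difference is far from significant, consistent with the small-sample caveat discussed below.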
The association between a defective CCR5 allele and infection with non-R5 variants is controversial. It has been proposed that reduced cell surface availability of CCR5 due to the Δ32 mutation favors infection by non-R5 variants (55). Two patients in the present study were CCR5 heterozygous: one (1044) was infected with R5-tropic virus, whereas the other (1068) harbored CXCR4-using variants. The above-mentioned hypothesis cannot be confirmed for the latter, as UDS was not performed to detect minor R5 variants. Also, the possibility of infection with dual-tropic strains cannot be ruled out, since dendritic cells (DCs) express both CCR5 and CXCR4. However, the permissiveness of DCs for R5 variants is significantly higher than that for CXCR4-using viruses (56, 57). When HIV-1 is isolated from newly infected individuals, CCR5-tropic strains predominate, irrespective of the route of infection. Because the early massive HIV-1 replication occurs in activated T cells and such T cell activation is induced by antigen presentation, the selective expansion of CCR5-tropic virus may occur largely at the level of DC-T cell interaction. Thus, the immunological synapse serves as an infectious synapse through which the virus can be disseminated. CCR5 and CXCR4 expression levels appear to contribute little to the predominance of CCR5-tropic HIV-1 transmission from dendritic cells to CD4+ T cells (58).
An interesting trend toward a higher V3 loop net charge was observed over time in 7 of the 9 individuals whose viruses evolved from CCR5 to CXCR4 use: the positively charged amino acids lysine (encoded by AAA or AAG) and arginine (encoded by AGA or AGG) emerged over time in this set of patients. This probably reflects APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like)-mediated G-to-A hypermutation during HIV-1 replication (59), since these codons can arise from single G-to-A substitutions in glutamic acid (GAA, GAG) and glycine (GGA, GGG) codons. Compared with CCR5, the extracellular surface of CXCR4 is rich in negatively charged amino acids, which may favor more positively charged V3 loops and thus contribute to a change of tropism over time (60). In other words, a mechanism of innate antiviral immunity in primates may be contributing to the emergence of more cytopathic HIV strains.
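To make the codon argument concrete, the sketch below computes a V3 net charge and lists which amino acids a single G-to-A substitution produces from a given codon. It assumes the common definition net charge = (K + R) - (D + E), ignoring histidine; the V3 sequence is an illustrative subtype B consensus-like loop, not patient data, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (last codon position varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def v3_net_charge(protein):
    """Net charge = (K + R) - (D + E); histidine is ignored (assumption)."""
    return sum(protein.count(a) for a in "KR") - sum(protein.count(a) for a in "DE")


def g_to_a_products(codon):
    """Map each single G-to-A mutant of `codon` to the amino acid it encodes."""
    results = {}
    for i, base in enumerate(codon):
        if base == "G":
            mutant = codon[:i] + "A" + codon[i + 1:]
            results[mutant] = CODON_TABLE[mutant]
    return results


if __name__ == "__main__":
    v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # illustrative 35-aa V3 loop
    print(v3_net_charge(v3))
    for codon in ("GAA", "GAG", "GGA", "GGG"):
        print(codon, CODON_TABLE[codon], g_to_a_products(codon))
```

Each Glu-to-Lys change removes a negative charge and adds a positive one, raising the net charge by 2, whereas a Gly-to-Arg change raises it by 1.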
This is one of the few studies in which coreceptor tropism was analyzed from recent infection over a long follow-up period. However, some limitations and caveats must be acknowledged. The absence of associations reported above may reflect a population too small to detect small effects, particularly in subgroup analyses. In addition, selection of the FPR cutoff for determining coreceptor tropism with Geno2pheno[coreceptor] remains controversial; here, following European guidelines (31), a 20% FPR cutoff was used because interpretation was based on a single DNA sequencing product. European guidelines currently provide no cutoffs for tropism determination based on next-generation sequencing (NGS); only the German-Austrian treatment guidelines (www.daignet.de) suggest that a sample can be classified as non-R5 if sequences with FPRs of ≤3.5% account for more than 2% of the variants in the sample (45, 61). As seen in Table 3, only individuals 1110 and 1130 fulfilled these criteria.
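The German-Austrian NGS criterion can be expressed as a small rule, sketched below. It assumes each unique V3 variant comes with a Geno2pheno FPR (in percent) and a within-sample frequency (in percent); the function name and the variant frequencies are hypothetical and are not taken from Table 3.

```python
def classify_ngs(variants, fpr_cutoff=3.5, min_fraction=2.0):
    """Return ("non-R5" or "R5", summed frequency of low-FPR variants).

    variants: iterable of (fpr_percent, frequency_percent) pairs, one per unique
    V3 variant in a sample (hypothetical input layout).
    """
    # Variants at or below the FPR cutoff are treated as non-R5 (CXCR4 using).
    low_fpr_fraction = sum(freq for fpr, freq in variants if fpr <= fpr_cutoff)
    call = "non-R5" if low_fpr_fraction > min_fraction else "R5"
    return call, low_fpr_fraction


if __name__ == "__main__":
    # Hypothetical samples: 3% vs 1.5% of reads carry low-FPR variants.
    print(classify_ngs([(45.0, 96.0), (2.1, 3.0), (12.0, 1.0)]))
    print(classify_ngs([(45.0, 98.5), (3.0, 1.5)]))
```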
In conclusion, no correlate of tropism switch other than FPR was identified. The FPR value inferred by the Geno2pheno[coreceptor] algorithm has the potential to efficiently predict tropism switch. Our logistic regression analysis showed that patients with an FPR above 40.6% are less likely to switch tropism, so their tropism may not need to be reassessed over time, whereas individuals harboring R5 viruses with an FPR below 40.6% are more susceptible to coreceptor switch and need repeated testing if initiation of a CCR5 antagonist is being considered, or even for prediction of disease progression related to the emergence of X4 HIV strains. Inferring tropism during early infection may be worthwhile as a monitoring tool to determine the urgency of treatment initiation among individuals infected with non-R5 and/or dual-tropic viruses. In addition, the detection of variants with intermediate FPRs and the progressive decay of FPRs over time among those viruses not only strengthens the power of Geno2pheno[coreceptor] in predicting HIV tropism but also indirectly suggests that a continuous evolution from earlier CCR5-using strains toward CXCR4-using strains is occurring. Larger studies are warranted to confirm more accurately the FPR threshold above which tropism is unlikely to switch.
MATERIALS AND METHODS
Study population. Samples from 66 individuals who were >18 years old, antiretroviral naive, and diagnosed with recent HIV-1 infection according to the serologic testing algorithm for recent HIV seroconversion (STARHS) were analyzed. Enrolled individuals were followed up every 3 to 4 months after the initial clinical visit, and their viral loads and CD4+ and CD8+ T cell counts were determined. CCR5 wt/Δ32 genotyping, detection of GBV-C coinfection, and HLA typing were performed as previously described (6). Tropism was initially determined at two time points: baseline and the last available sample. For individuals with discordant tropism predictions at these two time points, intermediate samples were analyzed to establish the timing of the tropism switch. In addition, 7 individuals from the nonswitcher groups were analyzed at multiple time points. Written informed consent was obtained from all patients, and the study was approved by the Ethical Committee and the Institutional Review Board of the Federal University of São Paulo (0919/01).
Envelope V3 loop sequencing and coreceptor tropism determination. Proviral DNA was used to amplify and sequence the HIV-1 envelope C2V3C3 region as previously described (32, 62, 63). V3 nucleotide sequences were then submitted to the online coreceptor tropism prediction algorithm Geno2pheno[coreceptor] (64). In general, the FPR is a measure used to evaluate the performance of a classification model: it is the frequency with which a result is called positive when it is, in fact, negative, i.e., a statistical type I error. For each sequence, Geno2pheno[coreceptor] reports a predicted phenotype and an FPR, defined as the probability of falsely classifying an R5-tropic virus as CXCR4 using; a sequence is called non-R5 when its FPR falls below the chosen cutoff. The lower the FPR, the more confident the prediction of a CXCR4-using virus; conversely, the higher the FPR, the greater the likelihood that a non-R5 call would be a false positive, i.e., the more likely the virus is R5 tropic (64). For example, an FPR of 0% means that no R5-tropic sequence would be misclassified as CXCR4 using at that score, so the sequence is predicted with high confidence to be CXCR4 using; in contrast, an FPR of 100% means that calling the sequence CXCR4 using would misclassify all R5-tropic sequences, so it is predicted with high confidence to be R5. The predictions are based on a machine learning model known as a support vector machine (SVM), trained on 1,100 V3 sequences (http://coreceptor.bioinf.mpi-inf.mpg.de/index.php). Here we used a 20% FPR cutoff for tropism inference, following European guidelines that describe the use of this cutoff for a single PCR product amplified from proviral DNA (31). The net charge of the V3 loop (65, 66) and the 11/25 rule (66, 67) were also used to define coreceptor tropism.
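As an illustration of the two sequence-based calls named above, the sketch below applies a 20% FPR cutoff to a Geno2pheno FPR value and a common formulation of the 11/25 rule (a basic residue, R or K, at V3 position 11 and/or 25 predicts X4). It assumes a 35-amino-acid V3 aligned without insertions or deletions and treats an FPR exactly at the cutoff as non-R5; both are simplifying assumptions, and the function names are hypothetical.

```python
def fpr_call(fpr_percent, cutoff=20.0):
    """Call non-R5 when the Geno2pheno FPR is at or below the cutoff (boundary is an assumption)."""
    return "non-R5" if fpr_percent <= cutoff else "R5"


def rule_11_25(v3):
    """11/25 rule on a 35-aa V3 without indels: R or K at position 11 or 25 predicts X4."""
    if len(v3) != 35:
        raise ValueError("expected a 35-amino-acid V3 loop without indels")
    return "X4" if v3[10] in "RK" or v3[24] in "RK" else "R5"


if __name__ == "__main__":
    v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # illustrative V3 (S11, E25)
    v3_e25k = v3[:24] + "K" + v3[25:]          # hypothetical E25K variant
    print(rule_11_25(v3), rule_11_25(v3_e25k))
    print(fpr_call(8.2), fpr_call(35.0))
```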
UDS of V3. To estimate the viral input for the subsequent amplification, proviral DNA was quantified by quantitative PCR (qPCR) as described previously (68). In total, 11 baseline samples were subjected to ultradeep sequencing (UDS): samples from 3 individuals were sequenced on a Roche 454 instrument, and samples from 8 were sequenced on an Illumina MiSeq instrument.
Deep sequencing was performed on the 454 GS Junior platform (Roche). Sequences were obtained with the Pyrobayes base caller (69) and aligned with Mosaik (70), using as a reference a consensus sequence generated by the Consensus Maker tool (available at http://www.hiv.lanl.gov). Sequencing errors were corrected with an in-house script: all polymorphisms with a frequency equal to or lower than 1.6%, the error threshold determined by sequencing the plasmids pHXB2r, pNL4.3, and pBaL, were replaced by the corresponding consensus base (data not shown). After the correction step, duplicate sequences were removed, and the number of variants was adjusted according to the viral template input for the first round of amplification.
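The correction step can be sketched as follows. The in-house script itself is not described beyond its threshold, so this is only an assumed reconstruction: reads are taken as already aligned to the consensus with equal length, any non-consensus base seen in ≤1.6% of reads at a column is reverted to the consensus base, and identical reads are then collapsed into unique variants with counts. The function name and toy reads are hypothetical.

```python
from collections import Counter


def correct_low_frequency(reads, consensus, threshold=0.016):
    """Revert rare (<= threshold) non-consensus bases, then collapse identical reads.

    reads: equal-length sequences already aligned to `consensus` (assumption).
    Returns a Counter mapping each unique corrected variant to its read count.
    """
    n = len(reads)
    # Per-column base counts across all reads.
    columns = [Counter(r[i] for r in reads) for i in range(len(consensus))]
    corrected = []
    for r in reads:
        fixed = [
            consensus[i] if base != consensus[i] and columns[i][base] / n <= threshold else base
            for i, base in enumerate(r)
        ]
        corrected.append("".join(fixed))
    return Counter(corrected)


if __name__ == "__main__":
    # Toy data: a 2% polymorphism is kept, a 1% one is treated as error.
    reads = ["ACGT"] * 97 + ["ACTT"] * 2 + ["AGGT"]
    print(correct_low_frequency(reads, "ACGT"))
```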
For Illumina MiSeq sequencing, libraries were prepared as described previously (71). To account for sequencing error, only variants detected at a frequency higher than 1% and with a Phred quality score of >30 (i.e., a base call accuracy above 99.9%) were considered. To avoid generating artificial in silico chimeras during assembly and overestimating V3 region diversity, the analysis was restricted to individual paired-end reads that encompass the complete V3 region.
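The MiSeq filters can be illustrated with a short sketch. Phred scores map to accuracy as 1 - 10^(-Q/10), so Q30 corresponds to 99.9%. The read layout (start coordinate, sequence, per-base qualities), the requirement that every base in the V3 window exceed Q30, and the function names are assumptions for illustration only.

```python
from collections import Counter


def phred_to_accuracy(q):
    """Base-call accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


def filter_v3_variants(reads, v3_start, v3_end, min_q=30, min_freq=0.01):
    """Keep reads spanning [v3_start, v3_end) with all V3 bases above min_q,
    then return variant frequencies above min_freq.

    reads: iterable of (start, sequence, qualities) in reference coordinates
    (hypothetical layout; reads are assumed gap-free in this window).
    """
    kept = []
    for start, seq, quals in reads:
        end = start + len(seq)
        if start > v3_start or end < v3_end:
            continue  # read does not cover the complete V3 region
        window = slice(v3_start - start, v3_end - start)
        if min(quals[window]) <= min_q:
            continue  # at least one V3 base fails the Phred > 30 filter
        kept.append(seq[window])
    counts = Counter(kept)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items() if c / total > min_freq}


if __name__ == "__main__":
    reads = [
        (0, "AAACCCGGG", [35] * 9),                          # spans V3, passes
        (1, "AACCTGG", [35] * 7),                            # spans V3, passes
        (4, "CCGGG", [35] * 5),                              # starts inside V3
        (0, "AAACCCGGG", [35, 35, 35, 20, 35, 35, 35, 35, 35]),  # low-quality base
    ]
    print(round(phred_to_accuracy(30), 6), filter_v3_variants(reads, 3, 6))
```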
V3 sequences were aligned, trimmed, and translated using Geneious R9 software (Biomatters) and then submitted to Geno2pheno[coreceptor] to predict coreceptor usage with a 10% FPR cutoff.
Phylogenetic inference. MEGA software v6.0 (72) was used to construct neighbor-joining trees under the Kimura 2-parameter model (73). Branch support was assessed by bootstrap analysis with 1,000 replicates.
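For reference, the Kimura 2-parameter distance used for the neighbor-joining trees is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The sketch below computes it for one pair of aligned sequences; skipping gapped or ambiguous sites is an assumption here (MEGA offers several missing-data options), and the function name is hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence lacks an unambiguous base are skipped (assumption).
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    differences = sum(1 for a, b in pairs if a != b)
    transitions = sum(
        1 for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    p = transitions / n                    # proportion of transitions (A<->G, C<->T)
    q = (differences - transitions) / n    # proportion of transversions
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
```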
To document intrahost HIV-1 evolution, sequences were analyzed with BEAST, generating trees with branch lengths in chronological units (years). For individuals who were sampled more than twice during follow-up (n = 15), this analysis also estimated substitution rates and the time to the most recent common ancestor (tMRCA), taken to correspond to the transmitted/founder (T/F) strains. Nucleotide sequences were aligned with MUSCLE v3.8.31 (74). To select the best evolutionary model for phylogenetic analysis, ModelTest v3.7 (75) was used in conjunction with PAUP* v4.0b (76) to perform hierarchical likelihood ratio tests (hLRT) and Akaike information criterion (AIC) tests, which select the best-fitting substitution model among the 56 models evaluated.
BEAST v1.8.4 was used to estimate the coalescence date and the nucleotide substitution rate (substitutions per site per year). We assumed an uncorrelated log-normal relaxed molecular clock and tested three demographic models: constant population size, exponential growth, and logistic growth. Each analysis was run for 10⁸ (100 million) generations, and the initial 15% of each run was discarded as burn-in. Trees were analyzed with FigTree v1.4.3, and the convergence of parameters was evaluated with Tracer v1.6 (Department of Zoology, Oxford University).
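The burn-in step and the kind of convergence check performed in Tracer can be illustrated as below: the first 15% of a parameter trace is discarded, and an effective sample size (ESS) is estimated as n / (1 + 2 Σ ρ_k). This uses a simple estimator that truncates the autocorrelation sum at the first non-positive lag; it is an assumption for illustration and not Tracer's exact implementation. The AR(1) trace is synthetic, standing in for a slowly mixing parameter.

```python
import random


def discard_burnin(samples, fraction=0.15):
    """Drop the first `fraction` of an MCMC trace (15% burn-in, as in the text)."""
    return samples[int(len(samples) * fraction):]


def effective_sample_size(trace):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((v - mean) ** 2 for v in trace) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


if __name__ == "__main__":
    rng = random.Random(1)
    trace, x = [], 0.0
    for _ in range(1000):
        x = 0.9 * x + rng.gauss(0, 1)  # synthetic, strongly autocorrelated chain
        trace.append(x)
    kept = discard_burnin(trace)
    print(len(kept), round(effective_sample_size(kept)))
```

A low ESS relative to the number of retained samples signals that a longer run (or better mixing) is needed before posterior estimates are reported.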
Statistical analysis. The R5/R5 and R5/non-R5 groups were compared using Fisher's exact test for categorical variables and the nonparametric Mann-Whitney U test for continuous variables. The significance level (α) was set at 0.05.
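Both tests can be sketched without external libraries, as below. The two-sided Fisher p value sums the hypergeometric probabilities of all 2×2 tables with the same margins that are no more probable than the observed one (with a small relative tolerance for floating-point ties); the Mann-Whitney U statistic counts, over all cross-group pairs, how often a value in the first group exceeds one in the second, with ties counted as 1/2. P values for U are not computed here. The tables and values are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def mann_whitney_u(x, y):
    """U statistic for sample x versus y (ties count 1/2)."""
    return sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0 for xi in x for yi in y)


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))         # hypothetical counts
    print(mann_whitney_u([1, 2, 2], [2, 3]))          # hypothetical values
```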
Logistic regression model to determine FPR threshold. Based on the FPR estimates from Geno2pheno[coreceptor], we developed a predictive model to determine the FPR threshold that predicts tropism switch. This model was built using the R statistical software language (77) and its caret package (78). Data from all available visits of ART-naive individuals belonging to the R5/R5 (n = 16) and R5/non-R5 (n = 6) groups were used to build this model. In addition, all visits of individuals 1085 (R5/R5) and 1002 and 1135 (R5/non-R5) during the ART-naive period in which they were R5 tropic were added to the R5/R5 group. In total, 75 patient time points from 22 individuals were used to build the logistic regression model, which estimated the log odds of a patient's virus switching tropism from R5 to non-R5.
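The authors fitted the model in R with caret; the pure-Python sketch below only illustrates what a one-predictor logistic regression estimates, logit(p) = b0 + b1·FPR, by maximum likelihood using Newton-Raphson. The FPR values and switch labels are hypothetical (1 = switch to non-R5) and deliberately overlap, since perfectly separated data have no finite maximum-likelihood estimate.

```python
import math


def predict(b0, b1, x):
    """Predicted probability of tropism switch for an FPR value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of logit(p) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = predict(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) for the intercept
            g1 += (yi - p) * xi     # score for the FPR slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: (X'WX)^-1 X'(y - p)
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical FPR values (%) and outcomes (1 = switched to non-R5).
FPR = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90]
SWITCH = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

if __name__ == "__main__":
    b0, b1 = fit_logistic(FPR, SWITCH)
    print(round(b0, 3), round(b1, 4))
```

A negative slope means that the odds of switching fall as the baseline FPR rises, which is the direction of effect described in the text.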
We divided the samples into two groups, preserving the proportion of each tropism class. The training set (n = 50) contained 66.7% of the samples, and the testing set (n = 25) contained 33.3%. Both sets had 60% of cases belonging to the R5/R5 group and 40% to the R5/non-R5 group. To reduce the variance and bias associated with a small sample size, the K-fold cross-validation technique (79, 80) was used to make more efficient use of the limited number of samples. For this study, a 10-fold division of the training sample (5 cases per fold) was used. We trained the model with a 5-fold cross validation. The model was applied first to the training set and then to the testing set. To assess the quality of the model, a receiver operating characteristic (ROC) curve was generated using the "funModeling" package for R (81). The regression coefficient for the FPR is expressed on the log-odds scale of the outcome, in this case a switch of coreceptor use from CCR5 to CXCR4. Using the fitted intercept and FPR-related slope, the logit function (35) was used to calculate the FPR associated with prediction of tropism switch.
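Two steps of this paragraph can be sketched directly. First, a stratified split assigns each class round-robin to folds so every fold keeps the 60%/40% class mix (with 30 R5/R5 and 20 R5/non-R5 training cases and 10 folds, each fold gets 3 + 2 = 5 cases). Second, the FPR threshold follows from inverting the fitted model: setting logit(p) = b0 + b1·FPR and solving gives FPR* = (logit(p) - b0) / b1, which reduces to -b0 / b1 at p = 0.5. The coefficients below are hypothetical and do not reproduce the reported 40.6% threshold; the helper names are assumptions.

```python
import math
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, preserving the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)   # deal each class round-robin across folds
    return folds


def fpr_threshold(b0, b1, p=0.5):
    """FPR at which the predicted switch probability equals p (inverse logit)."""
    return (math.log(p / (1 - p)) - b0) / b1


if __name__ == "__main__":
    labels = [0] * 30 + [1] * 20          # 0 = R5/R5, 1 = R5/non-R5 (training set sizes)
    folds = stratified_folds(labels, 10)
    print([len(f) for f in folds])
    print(fpr_threshold(2.0, -0.08))      # hypothetical intercept and slope
```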
Model comparison. To test whether logistic regression was appropriate among the various machine learning algorithms available, we compared it with two other algorithms (K-nearest neighbors and linear discriminant analysis). When these models were compared using t tests, with the area under the ROC curve as the measure of performance, none of the three pairwise comparisons showed a significant difference (data not shown). We therefore report the logistic regression results as the final model.
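The performance measure and the comparison can be sketched as follows. The AUC equals the probability that a randomly chosen switcher receives a higher score than a randomly chosen nonswitcher (ties count 1/2); because a low FPR predicts switch, a score such as -FPR or the model's predicted probability would be used. The paired t statistic compares per-resample AUCs of two models; computing its p value (t distribution with n - 1 degrees of freedom) is omitted. All numbers are hypothetical, and this is an assumed illustration rather than the exact caret procedure.

```python
import math
import statistics


def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(auc_a, auc_b):
    """Paired t statistic for per-resample AUCs of two models."""
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
    print(paired_t_statistic([0.02, -0.01, 0.03, 0.0, 0.01], [0.0] * 5))
```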
Accession number(s). Sequences generated in a previous study (6) using Sanger sequencing and used here can be found in NCBI's GenBank database under accession numbers MF465903 to MF465961. All sequences generated in this study using Sanger sequencing have been deposited in GenBank under accession numbers MF465977 to MF466117. The UDS sequences have been deposited in GenBank under accession numbers MF466118 to MF466152.
ACKNOWLEDGMENTS
This work was supported by the Sao Paulo Research Foundation (FAPESP) (2011/17334-3). M.S.A. received a PhD fellowship from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazilian Ministry of Education.
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We declare no conflicts of interest.
M.S.A. and R.S.D. designed the experiments. M.S.A., A.R.L., S.S., M.C., J.G., M.C.S., and S.V.K. performed the experiments. M.S.A., L.M.J., E.G.K., M.C.S., S.V.K., and R.S.D. analyzed the data. J.H., M.S.A., and E.G.K. did the statistical analysis. J.P.L.Z. and M.S.A. did the phylogenetic analysis. M.S.A., J.H., S.S., and R.S.D. prepared the manuscript.
FOOTNOTES
- Received 12 May 2017.
- Accepted 8 June 2017.
- Accepted manuscript posted online 28 June 2017.
Supplemental material for this article may be found at https://doi.org/10.1128/JVI.00793-17.
- Copyright © 2017 American Society for Microbiology.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.