Journal of Virology, April 2002, p. 3852-3864, Vol. 76, No. 8
0022-538X/02/$04.00+0 DOI: 10.1128/JVI.76.8.3852-3864.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
JaeHyung Ahn,3 Jason M. Walker,2 and Ronald Swanstrom1,2*
Department of Microbiology and Immunology,1 UNC Center for AIDS Research,2 Department of Biostatistics,3 University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7295
Received 13 July 2001/ Accepted 21 December 2001
Strains of HIV-1 are also classified based on their ability to induce the formation of multinucleated giant cells (syncytia) in MT-2 cells: viruses that display this cytopathic effect are termed syncytium-inducing (SI); those that do not are called non-syncytium-inducing (NSI) (62, 71). X4 viruses are frequently SI, and MT-2 cell tropism has been shown to be an entry-related phenotype that is largely determined by coreceptor preference (5). Strong selection either during or shortly after transmission results in the predominance of R5 variants early in infection (17, 58, 82). Variants able to use CXCR4 ultimately appear in approximately one-half of HIV-infected individuals, and their detection has been associated with an accelerated loss of CD4+ T cells and an increased likelihood of progressing to AIDS (15, 39, 55).
The surface subunit of the HIV-1 Env glycoprotein, gp120, determines entry-related phenotype, though postentry factors may also play a role in determining cell tropism. The gp120 coding domain of env has been divided into alternating constant and variable regions, referred to as C1 through C5 and V1 through V5, respectively (70). The variable regions lie mostly within regions encoding disulfide-constrained, surface-exposed loops (45). It is clear, however, that variability differs considerably from position to position without regard to the boundaries of these regions (80).
The determinants of coreceptor usage (4, 33, 69, 73, 77) and MT-2 cell tropism (8, 12, 19, 35, 52, 66, 74, 76) lie largely within the 35 amino acids of V3. However, changes in this region alone are not always necessary or sufficient to confer a particular phenotype in viruses expressing engineered gp120 proteins, since changes in other regions of gp120, particularly in V1/V2 (6, 13, 37, 38, 59) and C4 (9), have been shown to influence phenotype either alone or in conjunction with V3. In addition, isolates with identical V3 sequences can have dissimilar patterns of coreceptor usage, cell tropism, or replication capacity (31, 37, 38, 53). Typically, specific changes in these other regions do not have consistent effects in a wide range of sequence contexts.
Although phenotype-associated variability in V3 has been studied extensively using statistical approaches (3, 40, 47, 48), other regions of gp120 have escaped similar scrutiny, due in part to the relative paucity of sequence information corresponding to isolates of known phenotype. Specific amino acid substitutions within V3 that are associated with different entry phenotypes can, however, be used to predict phenotype with reasonable success. In particular, previous studies have determined that the presence of at least one basic substitution at V3 position 11 or 25 (HXB gp160 306 and 322, respectively) is associated with the X4, R5X4, and SI phenotypes (18, 29). We have recently observed that this 11/25 rule is the most reliable of the available motif-based methods for predicting the X4/SI phenotype (54).
Here we have further refined this prediction method, resulting in an improved phenotypic classification scheme based on V3 sequence. Throughout this paper, inferred phenotypes are indicated in lowercase letters (x4 and r5), and experimentally determined phenotypes are indicated in uppercase letters (X4, R5, SI, and NSI).
We have searched a large set of gp120 sequences spanning the V1 to C4 regions, obtained from the Los Alamos HIV Database (http://hiv-web.lanl.gov), for positions in gp120 that are linked to entry phenotype-associated changes in V3. After controlling for epidemiological and phylogenetic relationships among the sequences, we identified 15 positions in gp120 but outside of V3 that have significantly different variability or representation of specific amino acids among x4 and r5 sequences (i.e., sequences of different inferred phenotype). Six of these fall between positions 190 and 204 at the C-terminal end of V2. In addition, we have found that in x4 sequences, a significant accumulation of basic substitutions occurs nearby within V2, in a region of marked variability and length polymorphism (HXB positions 180 to 189).
The identified positions are especially attractive candidates for influencing gp120 function due to their hypothesized localization at the coreceptor binding face and oligomer interface in the gp120 trimer, their proximity to one another, and the nature of the specific changes at each position. Furthermore, several of these positions have also been described previously as playing a role in defining gp120 function.
TABLE 1. Clade B env sequences available from the HIV database
For the second sampling method (set 2), V1V3 and V3C4 sequence sets were resampled separately 30 times to create subsets defined by both subject of origin and sequence similarity as follows. Nucleotide alignments were created by replacing each amino acid in the protein alignments described above with the corresponding codon from the nucleotide sequence. Aligned positions at which gap characters appeared in more than 97% of sequences were deleted. All pairwise distances between aligned nucleotide sequences were calculated with Dnadist (PHYLIP version 3.6) according to the Jin-Nei method with the coefficient of variation $= 1/\sqrt{a} = 1$, where $a$ is the parameter of the gamma distribution (27). The resulting distance matrix was used to select groups of sequences in which no two sequences had a pairwise similarity score greater than 94 (where similarity = 100 - distance). The similarity value of 94 was chosen by determining the highest possible cutoff value that did not result in a significant number of sequence clusters (implying epidemiological linkage) supported by at least 50% of bootstrap replicates in neighbor-joining trees of representative subsets (see "Phylogenetic trees" below).
Approximately 57 and 40% of pairwise comparisons among V1V3 and V3C4 sequences, respectively, classified as originating in the same individual had scores of ≥94. Fewer than 1% of comparisons among unrelated sequences spanning either V1V3 or V3C4 had scores of >94. To prevent intrasubject sequences from being included in the same subset, these groups of sequences were also filtered using the patient database described above. As a result, none of the 30 subsets contained two sequences that either had a pairwise similarity score of >94 or were isolated from the same subject. The software used to perform this filtering process is available by request.
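The subset-construction step can be sketched as follows. The text states only that the filtering software is available by request, so this is not that software: it is a minimal sketch that assumes a random-order greedy selection, with hypothetical function names `sample_unlinked_subset` and `draw_subsets`. It takes a matrix of pairwise similarity scores (100 - distance) and one subject label per sequence, and keeps a sequence only if it shares no subject with, and is no more than 94% similar to, any sequence already chosen.

```python
import random
from typing import List, Optional, Sequence


def sample_unlinked_subset(
    similarity: Sequence[Sequence[float]],
    subjects: Sequence[str],
    cutoff: float = 94.0,
    seed: Optional[int] = None,
) -> List[int]:
    """Greedily pick sequences in a random order so that no two chosen
    sequences have similarity > cutoff or come from the same subject."""
    rng = random.Random(seed)
    order = list(range(len(subjects)))
    rng.shuffle(order)
    chosen: List[int] = []
    used_subjects = set()
    for i in order:
        if subjects[i] in used_subjects:
            continue  # at most one sequence per subject
        if any(similarity[i][j] > cutoff for j in chosen):
            continue  # too similar to a sequence already in the subset
        chosen.append(i)
        used_subjects.add(subjects[i])
    return sorted(chosen)


def draw_subsets(similarity, subjects, n_subsets=30, cutoff=94.0, seed=0):
    """Draw n_subsets greedy subsets; they will generally overlap."""
    master = random.Random(seed)
    return [
        sample_unlinked_subset(similarity, subjects, cutoff, master.randrange(2**32))
        for _ in range(n_subsets)
    ]


# Toy example: similarity = 100 - distance (distance in percent).
distance = [[0, 4, 20], [4, 0, 15], [20, 15, 0]]
similarity = [[100 - d for d in row] for row in distance]
subsets = draw_subsets(similarity, ["p1", "p2", "p3"])
```

Because each subset is built from a different random order, different subsets retain different members of each closely related cluster, which is one way repeated resampling can cover more of the available sequences than a single filtered set.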
Phylogenetic trees. To visualize the extent of relatedness among sequences in the sets constructed above, one subset each from the 30 V1V3 and the 30 V3C4 nucleotide subsets was used to build phylogenetic trees. Sequences were codon-aligned as described above, and positions containing gap characters in >1% of entries were disregarded. Neighbor-joining trees were constructed, and bootstrap analysis was performed using PAUPSearch (GCG Wisconsin Package). Distances were calculated according to the general time-reversible model, with substitution rate variation across sites following the gamma distribution. The gamma shape parameter was set to alpha = 0.38, as estimated by Leitner et al. (44). Branches appearing in at least 50% of 100 bootstrap replicates were retained. Trees constructed using values of alpha ranging from 0.20 to 0.70 did not vary substantially in the number of supported clusters.
Sequence classification. gp120 sequences were classified as either X4-like (x4) or R5-like (r5) based on amino acid composition at positions 11 and 25 of the V3 loop (HXB positions 306 and 322). Sequences with R or K at 11 and/or K at 25 were considered x4, and sequences with no basic amino acid at either 11 or 25 were considered r5. Arginine at 25 does not discriminate between the two phenotypes, so sequences with R at 25 and no basic residue at 11 were not included in the analysis. Note that lowercase designations (x4 and r5) refer to inferred phenotypes, while uppercase letters (X4 and R5) denote experimentally determined coreceptor usage.
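A minimal sketch of the 11KR/25K rule as stated above. It assumes the V3 loop has already been aligned so that string index 0 is the first cysteine (V3 position 1), making indices 10 and 24 correspond to positions 11 and 25. The function name `classify_v3` is hypothetical.

```python
from typing import Optional

BASIC = {"R", "K"}


def classify_v3(v3: str) -> Optional[str]:
    """Apply the 11KR/25K rule to an aligned V3 sequence.

    Returns 'x4', 'r5', or None when the sequence is excluded
    (R at 25 with no basic residue at 11).
    """
    if len(v3) < 25:
        raise ValueError("V3 sequence must cover positions 1 to 25")
    p11, p25 = v3[10].upper(), v3[24].upper()  # 1-based positions 11 and 25
    if p11 in BASIC or p25 == "K":
        return "x4"
    if p25 == "R":
        return None  # 25R alone is nondiscriminating, so the sequence is dropped
    return "r5"


# Subtype B consensus-like V3 loop (35 residues, S at 11 and E at 25).
consensus = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
print(classify_v3(consensus))
```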
To determine the accuracy of this method for predicting MT-2 cell tropism and coreceptor usage, we classified sets of V3 sequences of known phenotype. In this test, the X4 reliability (positive predictive value) and R5 reliability (negative predictive value) were calculated using Bayes' theorem. Adjusted values for reliability were calculated using an estimated X4 prevalence in the database of 15% (54). The sequence sets used to perform this analysis as well as the methods used were described by Resch et al. (54).
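The reliability calculation is standard Bayes' theorem. The sketch below uses hypothetical helper names and illustrative counts, not data from this study. It derives sensitivity and specificity from a 2 x 2 table, treating X4 as the positive class, and then computes the positive (X4) and negative (R5) predictive values at an assumed X4 prevalence of 15%.

```python
from typing import Tuple


def rates_from_table(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Sensitivity and specificity from a 2 x 2 table (X4 = positive class)."""
    return tp / (tp + fn), tn / (tn + fp)


def reliabilities(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """X4 reliability (PPV) and R5 reliability (NPV) via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical table: 5 X4 called x4, 5 X4 called r5, 1 R5 called x4, 9 R5 called r5.
sens, spec = rates_from_table(tp=5, fn=5, fp=1, tn=9)
ppv, npv = reliabilities(sens, spec, prevalence=0.15)
```

Because the X4 prevalence enters both terms, the same sensitivity and specificity yield a lower X4 reliability and a higher R5 reliability as X4 sequences become rarer in the database.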
Statistical analysis.
Three different measures were used to assess the difference in amino acid composition between x4 and r5 sequences. First, diversity of amino acids (D) was used as described by Yamaguchi-Kabata and Gojobori (80) to evaluate the extent of variability at each position in a population of HIV Env sequences and is computed as:
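[Equation images defining diversity (D) and entropy (E) are not reproduced in this version.]
The difference between x4 and r5 sequences at each position $j$ is then $\Delta D_j = D_j(\mathrm{x4}) - D_j(\mathrm{r5})$ or $\Delta E_j = E_j(\mathrm{x4}) - E_j(\mathrm{r5})$.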
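Because the equations for D and E are not reproduced here, the sketch below should be read as an assumption rather than the authors' exact definitions. Entropy is taken as Shannon entropy in bits, and diversity is replaced by a Simpson-type index (the probability that two residues drawn at random differ), which may not match the formula of Yamaguchi-Kabata and Gojobori. Gap characters are treated as an extra residue state, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def entropy(residues: Sequence[str]) -> float:
    """Shannon entropy (bits) of one alignment column; '-' counts as a state."""
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def diversity(residues: Sequence[str]) -> float:
    """Stand-in diversity: probability that two residues drawn with replacement differ."""
    n = len(residues)
    return 1.0 - sum((c / n) ** 2 for c in Counter(residues).values())


def column(seqs: Sequence[str], j: int) -> List[str]:
    """Residues at 0-based alignment column j."""
    return [s[j] for s in seqs]


def delta(stat: Callable[[Sequence[str]], float], x4: Sequence[str], r5: Sequence[str], j: int) -> float:
    """Delta S_j = S_j(x4) - S_j(r5) for aligned x4 and r5 sequences."""
    return stat(column(x4, j)) - stat(column(r5, j))
```

A positive value means the column is more variable among x4 sequences, matching the sign convention used for Fig. 1B.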
We used permutation tests to calculate probabilities that the above statistics described chance differences between x4 and r5 sequences. Let the diversity ($\Delta D_j$) and entropy ($\Delta E_j$) scores at each position be represented by a generic summary statistic, $S_j$. For each position $j$, a reference distribution of $R = 1{,}000$ scores for sets $A_k$ and $B_k$ ($k = 1, \ldots, R$) was created by randomly casting sequences into groups A and B, such that $P(A_k) = P(\mathrm{x4})$ and $P(B_k) = P(\mathrm{r5})$, and calculating a new summary statistic, $S'_{jk}$. The probability that the summary statistic $S_j$ describes a chance difference between x4 and r5 sequences can be estimated as $P \approx N/R$, where $N$ is the smaller of the two values $N_L$ (number of events in which $S_j \le S'_{jk}$) and $N_G$ (number of events in which $S_j \ge S'_{jk}$), $1 \le k \le R$. This results in a one-tailed test, where the P value is always calculated from the smaller tail. Put simply, the permutation test counts the number of (simulated) chance regroupings of sequences that result in a difference equal to or more extreme than the difference resulting from the classification of sequences as x4 and r5. If $S_j$ fell outside the support of its reference distribution (i.e., $N = 0$), a P value of <0.001 ($1/R$) was assigned.
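The permutation procedure can be sketched as follows, assuming each group is passed as the list of residues observed at one aligned position and that the statistic is any function of the two groups (for example, the entropy difference). The function names are hypothetical; the smaller-tail counting and the $1/R$ floor follow the description in the text.

```python
import random
from typing import Callable, List, Sequence, Tuple

Statistic = Callable[[Sequence[str], Sequence[str]], float]


def permutation_p(
    stat: Statistic,
    x4: Sequence[str],
    r5: Sequence[str],
    R: int = 1000,
    seed: int = 0,
) -> Tuple[float, bool]:
    """Smaller-tail permutation P value for stat(x4, r5).

    Returns (p, is_bound); when N = 0, p is 1/R and is_bound is True,
    meaning the result should be reported as P < 1/R.
    """
    rng = random.Random(seed)
    observed = stat(x4, r5)
    pooled: List[str] = list(x4) + list(r5)
    n_a = len(x4)  # group A keeps the size of the x4 group
    n_le = n_ge = 0
    for _ in range(R):
        rng.shuffle(pooled)
        s_perm = stat(pooled[:n_a], pooled[n_a:])
        if observed <= s_perm:
            n_le += 1
        if observed >= s_perm:
            n_ge += 1
    n = min(n_le, n_ge)
    if n == 0:
        return 1.0 / R, True
    return n / R, False


def frac_k_difference(a: Sequence[str], b: Sequence[str]) -> float:
    """Example statistic: difference in the fraction of residues that are K."""
    return sum(r == "K" for r in a) / len(a) - sum(r == "K" for r in b) / len(b)
```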
For the third measure of amino acid composition, a similar procedure was used to describe the representation of specific amino acids among x4 and r5 sequences. A binomial test of statistical significance was used to generate a Z score, $Z_{ij}$, describing the difference in the proportion of x4 and r5 sequences containing a given amino acid, $i$, at each aligned position, $j$ (thus, there are potentially 21 such Z scores for each position: one for each amino acid, and one for the gap character). As described above, reference distributions of Z scores created by repartitioning the data sets (i.e., $Z'_{ijk}$ for $1 \le k \le R$) were used to calculate a P value describing the probability that the observed difference in frequency of each amino acid at a position among x4 and r5 sequences was due to chance (thus, 21 reference distributions were created for each position). At each position, test results were retained only for amino acids representing at least 10% of either x4 or r5 sequences.
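The text does not give the exact form of the binomial Z score. A common choice, assumed here, is the pooled two-proportion Z statistic. The sketch computes it for every residue (including '-') observed at one position and applies the 10% retention filter; the resulting values would serve as the observed statistics for the repartitioning procedure described above. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def proportion_z(count_a: int, n_a: int, count_b: int, n_b: int) -> float:
    """Pooled two-proportion Z score for p_a - p_b."""
    p_a, p_b = count_a / n_a, count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (p_a - p_b) / se


def residue_z_scores(x4_col: Sequence[str], r5_col: Sequence[str], min_freq: float = 0.10) -> Dict[str, float]:
    """Z score per residue at one position, keeping residues present in at
    least min_freq of either the x4 or the r5 sequences."""
    cx, cr = Counter(x4_col), Counter(r5_col)
    nx, nr = len(x4_col), len(r5_col)
    scores = {}
    for aa in set(cx) | set(cr):
        if cx[aa] / nx < min_freq and cr[aa] / nr < min_freq:
            continue  # too rare in both groups to be reported
        scores[aa] = proportion_z(cx[aa], nx, cr[aa], nr)
    return scores
```

Positive scores indicate residues overrepresented among x4 sequences.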
Logistic regression analysis. We assembled sets of unaligned protein sequence fragments corresponding to the hypervariable regions of V1, V2, and V4 (amino acids 131 to 157, 180 to 189, and 386 to 413, respectively). Subsets of sequences used in the analysis corresponded exactly to those used in the tests described above (e.g., V1V3-set 1 and the 30 subsets of unrelated V1V3 sequences). Values describing net charge (R + K - D - E) and length in amino acids were assigned to each sequence fragment. Logistic regression analysis was used to correlate inferred phenotype with each of these two sequence descriptors. A univariate regression was performed for each of these descriptors.
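A minimal sketch of the two sequence descriptors and a univariate logistic regression, fitted by Newton-Raphson with a two-sided Wald P value for the slope. The text does not say which software or test statistic was used, so the fitting method and P value here are assumptions; the function names are hypothetical and the example data are made up.

```python
import math
from typing import Sequence, Tuple


def net_charge(fragment: str) -> int:
    """Net charge as R + K - D - E, as defined in the text."""
    s = fragment.upper()
    return s.count("R") + s.count("K") - s.count("D") - s.count("E")


def fragment_length(fragment: str) -> int:
    """Length in amino acids, ignoring gap characters such as '-' or '.'."""
    return sum(1 for c in fragment if c.isalpha())


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and the 2 x 2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def univariate_logistic(x: Sequence[float], y: Sequence[int], iters: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x; return (b0, b1, Wald P for b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("information matrix is singular (e.g., separated data)")
        step0 = (h11 * g0 - h01 * g1) / det  # Newton step: inverse information times gradient
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    _, _, h00, h01, h11 = _score_and_information(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))


# Made-up V2hv net charges (x) and inferred phenotype (1 = x4, 0 = r5).
charges = [-2, -1, -1, 0, 0, 0, 1, 1, 1, 2, 2, 3]
is_x4 = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
b0, b1, p = univariate_logistic(charges, is_x4)
```

In practice, `charges` would come from applying `net_charge` to each V2hv fragment, and the same fit would be repeated with `fragment_length` as the single predictor.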
Alignment of certain regions was not possible due to extensive length polymorphism, including hypervariable regions of V1 (positions 134 to 152) and V2 (positions 185 to 189), loop E in C3 (positions 354 to 357), and V4 (positions 396 to 413) (HXB gp160 numbering is used throughout this paper except for V3, which is numbered 1 to 35, starting at the first cysteine). These regions were not included in the analysis of individual amino acid variability and were considered separately for length variability and charge using a different strategy (see below).
Genotypic classification of gp120 sequences. The great majority of available gp120 sequences are not associated with an experimentally determined phenotype for either growth in MT-2 cells or coreceptor usage. One classification scheme that has been used as a surrogate for phenotype designates sequences with a basic residue at V3 position 11 and/or 25 (HXB positions 306 and 322) as X4-like or SI-like (x4) and those with no basic residue at either 11 or 25 as R5-like or NSI-like (r5) (20, 29, 30, 63).
Using V3 sequences of known phenotype, we previously determined that this 11/25 rule used as a test has an estimated reliability of 0.85 and 0.48 for the SI and X4 phenotypes, respectively, and was the best published method available for predicting phenotype (54). However, the presence of 25R alone is nondiscriminating for both the SI/NSI and X4/R5 phenotypes (54). Therefore, to improve the accuracy of this method, we excluded all sequences with arginine at V3 position 25 and no basic residue at 11, resulting in the rejection of 4 and 5% of available V1V3 and V3C4 sequences, respectively. We have termed this classification method the 11KR/25K rule.
We evaluated the accuracy of these criteria as described by Resch et al. (54). Briefly, using sets of HIV-1 V3 sequences of known NSI/SI or X4/R5 phenotype and subject of origin, we selected 100 subsets in which each subject was represented by at most one sequence of each phenotype. A 2 x 2 contingency table describing the success of the sequence classification was determined for each subset; summaries of the resulting 100 contingency tables are shown in Table 2. Compared to the 11/25 rule, we observed a reduction in R5 and NSI sequences misclassified as x4, resulting in an improvement in the specificity of the test, or the fraction of correctly classified R5/NSI sequences. Because the reliability (positive and negative predictive values) of this test also depends on the prevalence of sequences in the database having the SI or X4 phenotype, we calculated these values using both the prevalence of X4 or SI sequences in the average 2 x 2 table and an estimated prevalence of X4/SI sequences in the database of 15% (54). The resulting estimates of positive predictive value for the 11KR/25K rule improved substantially, to 0.96 to 0.98 for the SI phenotype and 0.65 to 0.78 for the X4 phenotype.
TABLE 2. Success of the 11KR/25K rule for phenotypic classification
TABLE 3. Description of set 1 and 30 subsets of sequencesᵃ
To include as many sequences as possible in the analysis, we resampled all available sequences by constructing 30 subsets each of V1V3 and V3C4 sequences (Table 3). Among the 30 subsets thus constructed, 79% (948) of the available V1V3 sequences and 51% (1810) of V3C4 sequences were represented at least once. A greater proportion of V1V3 sequences were sampled because of the smaller total number of sequences and the presence of fewer large clusters of sequences isolated from single individuals. Of the sequences sampled from the V1V3 and V3C4 datasets, 24 and 43%, respectively, appeared in only one of the 30 subsets, and 68 and 79%, respectively, were present in five or fewer of the 30 subsets. As a result, a group of highly divergent sequences uniquely representing single individuals was somewhat overrepresented in this resampling method. These two sampling methods complemented each other, and positions displaying significant linkage to V3 in both groups appear to be strongly supported.
The fraction of sequences in common between all possible pairs of the 30 subsets was also calculated. Because the average overlap between any two subsets was 66 and 60% for the V1V3 and V3C4 sets, respectively, these subsets could not be considered independent samples of sequences. Accordingly, we made no attempt to aggregate results of statistical tests from different subsets; rather, we report the results of the analysis performed on each subset as a separate test.
Variability of certain positions outside of V3 is linked to V3 genotype.
Amino acid variability in gp120 is not uniformly distributed, suggesting that different regions of the protein are subject to dissimilar selective pressures and functional constraints (80). One such constraint could be imposed by the requirement for specific physical interactions with the cell-associated ligands of gp120. Since R5 and X4 strains use different coreceptors for entry, we hypothesized that some of the site-to-site variability might differ among gp120 sequences derived from viruses of different phenotype. We therefore used two measures of variability, entropy (E) and diversity (D), to compare the extent of heterogeneity at each aligned position in r5 and x4 protein sequences. Permutation tests were used to determine if the difference of entropy ($\Delta E$) or diversity ($\Delta D$) between the two phenotypes fell in the tails of the reference distributions for each position under the assumption of no difference.
As reported in previous studies, notably the study by Yamaguchi-Kabata and Gojobori (80), we found marked differences in variability from site to site in gp120. To emphasize global patterns of variability, we plotted entropy scores for sequences in V1V3-set 1 and V3C4-set 1 averaged over a 5-amino-acid sliding window (Fig. 1A). Note that this analysis excludes positions in the alignment with no corresponding amino acid in the HXB2 numbering standard. Differences in diversity and entropy among x4 and r5 sequences were calculated at each well-aligned position.
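The smoothing used for Fig. 1A can be written as a centered moving average, so that the value at position 3 is the mean of positions 1 through 5. The sketch assumes a per-position list of entropy values and simply drops positions near the ends that lack a full window; how the original plot handled the ends is not stated. The function name is hypothetical.

```python
from typing import Dict, Sequence


def sliding_mean(values: Sequence[float], window: int = 5, first_position: int = 1) -> Dict[int, float]:
    """Centered moving average keyed by position number.

    Only positions with a full window are returned.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return {
        first_position + i: sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    }
```

Setting `first_position` to the HXB2 number of the first value in the list keeps the keys aligned with the numbering used on the x axis of Fig. 1.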
FIG. 1. Entropy at each aligned amino acid corresponding to its position in HXB-2 from V1 to C4 among subtype B gp120 sequences. Entropy values were calculated from alignments of sequences in V1V3-set 1 (V1 to C3) and V3C4-set 1 (V3 to C4). (A) Entropy values averaged over a 5-amino-acid sliding window (i.e., the value plotted at position 3 is the average entropy for positions 1 through 5). Regions of poor alignment are included in the plot. (B) Differences in entropy among x4 and r5 sequences at each position. Positive values correspond to positions at which variability is higher among x4 sequences. Black bars indicate positions at which entropy differed significantly between x4 and r5 sequences. Positions within regions of poor sequence alignment (134 to 152, 185 to 189, 354 to 357, and 396 to 413) were excluded from the statistical analysis and are set at zero in this plot.
We identified six positions for which either $\Delta D$ or $\Delta E$ achieved significance according to the above criteria using both sequence sampling methods; an additional six positions achieved significance using either one sampling method or the other (Fig. 1B and Table 4). Both measures of variability gave comparable results for all positions in achieving significance except for 424, where 11 versus 30 subsets reached P < 0.05 for $\Delta D$ and $\Delta E$, respectively. Overall, variability increased at each of these positions among x4 sequences except at positions 345 and 365, both of which were more homogeneous for the consensus amino acid among x4 compared to r5 sequences (Fig. 1).
TABLE 4. Summary of all test results for V3-linked variability
FIG. 2. Amino acid composition at positions varying significantly between x4 and r5 sequences with respect to entropy, diversity, or representation of specific amino acids. The percentage of sequences containing the indicated amino acid is indicated on the y axis. Black bars, amino acid composition of x4 sequences; lightly shaded bars, amino acid composition of r5 sequences. Values shown describe V1V3-set 1 and V3C4-set 1. P values of specific amino acids achieving significantly different representation in x4 and r5 sequences are indicated with asterisks or pluses as specified in the legend. A summary of the P values obtained for each of the sampling methods is shown in Table 4. Note that amino acids with a P value between 0.05 and 0.01 in set 1 were not considered significant but were marked with a single + for comparison.
A cluster of positions at which amino acid composition was linked to inferred entry phenotype, including positions 190, 191, 195, 198, 200, and 204, appeared in the relatively well-conserved region at the C-terminal end of V2 (Table 4 and Fig. 2). Position 190 is the first well-conserved amino acid after the region of extensive length polymorphism in V2 (V2hv) and falls within a motif (defined as NXS/T) (41) that is predicted to direct the N-linked glycosylation of N188. This motif appears in 61% of sequences in V1V3-set 1 (although positions 188 and 189 were not aligned, their composition could be determined by examining the two amino acids in each sequence preceding position 190). Substitution of residues other than S or T at position 190 occurred more often among x4 sequences (39%) than among r5 sequences (13%). When both positions 188 and 190 were considered, the difference in the predicted frequency of glycosylation of 188 among x4 (38%) and r5 (68%) sequences remained highly significant (P < 2 × 10⁻⁵, Fisher's exact test). In addition, x4-associated residues at position 190 were most frequently basic, with 23% of x4 sequences in V1V3-set 1 containing R190 or K190, compared to 4% of r5 sequences. The shift from a large, negatively charged carbohydrate complex to a positively charged amino acid represents a dramatic chemical change in the x4 sequences, one that is in the same direction as in V3.
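The glycosylation comparison can be sketched in two parts: detecting the NXS/T motif at positions 188 to 190, and a two-sided Fisher's exact test on a 2 x 2 table of motif presence by inferred phenotype. The helper names are hypothetical, the optional proline exclusion is an added assumption (the text defines the motif simply as NXS/T), and no counts from this study are used; the test below uses a small toy table.

```python
import math


def has_sequon(triplet: str, exclude_proline: bool = False) -> bool:
    """True if residues 188-190 form an N-X-S/T motif (N at 188).

    exclude_proline=True applies the common N-X(not P)-S/T refinement,
    which is an assumption and off by default.
    """
    if len(triplet) != 3:
        return False
    n, x, st = triplet.upper()
    if exclude_proline and x == "P":
        return False
    return n == "N" and st in ("S", "T")


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table (exact integer weights).
    """
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    weights = {k: math.comb(r1, k) * math.comb(r2, c1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / math.comb(r1 + r2, c1)
```

In use, the rows of the table would be x4 and r5 sequences and the columns would count sequences with and without a motif returned by `has_sequon` for the residues at 188 to 190.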
Position 191 showed a significant change away from the consensus amino acid, tyrosine, among x4 sequences. At position 195, x4 sequences were more likely to contain a histidine than were r5 sequences (15% versus 3%). A change away from the consensus amino acid (threonine) also occurred at position 198, where nonconsensus substitutions were found in 3% of r5 sequences and 18% of x4 sequences. At position 200, we observed a significantly increased frequency of T among x4 sequences. Just outside of the V1/V2 stem, an additional position, 204, was less likely to contain A among x4 sequences; this result was significant in set 1 but not in the 30 subsets of V1V3 sequences.
Four other positions, 166, 177, 365, and 382, were significantly linked to V3 genotype in set 1 but not the 30 subsets of sequences. Positions 166 and 177 are N-terminal of the region of extensive length polymorphism in V2. Position 166 was more likely to contain K among r5 sequences (12%) than x4 sequences (2%). A shift from tyrosine to asparagine at position 177 was associated with the x4 genotype. Position 365 was almost always serine among x4 sequences, while a low level of substitution of A, L, or T was found in r5 sequences; this pattern of higher conservation among x4 sequences was notable, as most genotype-linked positions were better conserved among r5 sequences. Finally, 382F was present in 100% of r5 sequences, while low levels of substitution of Y, L, or G were observed among sequences with the x4 genotype.
Apparent linkage of positions to basic substitutions in V3 cannot be explained by phylogenetic relationships among sequences. A theoretical source of bias in this analysis is phylogenetic relatedness among sequences. For example, substitutions shared by clusters of related sequences of the same phenotype could be misconstrued as being linked to phenotype rather than attributed to common ancestry. To estimate the relatedness of sequences in V1V3-set 1 and V3C4-set 1, neighbor-joining trees were constructed from nucleotide sequence alignments. Few clusters of more than two sequences were supported by at least 50% of bootstrap replicates (7.6 and 6.0% of V1V3 and V3C4 sequences, respectively), and no cluster of more than five sequences was observed. Among pairs and clusters of sequences of the same V3 genotype, there was no pattern of substitution at any of the V3-linked positions that suggested that phylogenetic relationships among sequences biased the results of the statistical analyses (data not shown).
Increased net positive charge in the V2 hypervariable region among x4 sequences. Although poor alignment prevented the examination of position-by-position differences in highly variable regions of gp120, we were able to test the correlation between the V3 11KR/25K genotype and global descriptors of these regions. Previous studies have suggested that length, charge, and number of potential glycosylation sites in the V1 or V2 loops vary with coreceptor usage, MT-2 phenotype, or disease progression (32, 67). Some or all of these associations were not confirmed in other studies (34, 67, 75). We used logistic regression to test the correlation between the V3 genotype and net charge or length in sequences spanning the hypervariable portions of V1 (V1hv; positions 131 to 157), V2 (V2hv; positions 180 to 189), and V4 (V4hv; positions 386 to 413). We assembled protein sequence fragments representing V1hv and V2hv from both V1V3-set 1 and the 30 subsets of V1V3 sequences described above; likewise, V4hv sequence fragments corresponded to V3C4-set 1 and the 30 subsets of V3C4 sequences.
An inferred x4 phenotype was found to be correlated with an increased net positive charge in V2hv with a significance of P < 0.003 for sequences in V1V3-set 1 in a univariate logistic regression analysis (Table 5). The difference between the average charge in V2hv among x4 and r5 sequences was 0.56. All 30 sequence subsets attained a P value below 0.05, with a median P value of 0.008. The average difference in charge among x4 and r5 sequences in the subset corresponding to the median P value was 0.49. Frequency distributions of net charge of V2hv in the 30 subsets show that x4 sequences are consistently shifted to more positive values (Fig. 3A).
TABLE 5. Regression analysis of covariation between V3 genotype and features of the V2 hypervariable regionᵃ
FIG. 3. Frequency distribution of characteristics of the V2 hypervariable region (HXB positions 180 to 189) among 30 subsets of x4 and r5 sequences. Each trace corresponds to either x4 (black dashed line) or r5 (gray line) sequences represented in one of the 30 subsets. (A) Frequency distribution of net charge. (B) Frequency distribution of the length of each sequence corresponding to HXB positions 180 to 189 in the alignment.
We conclude that among the sequences examined in this study, there is good evidence that the x4 V3 genotype is associated with a net increase in positive charge in the V2hv region. There is a less well supported trend toward elongation of this region in x4 sequences. In addition, no correlation was found between V3 genotype and either charge or length in V1 or V4 (data not shown).
The vast majority of isolates corresponding to sequences available in public databases have not been assigned experimentally determined phenotypes; at most 15 sequences greater than 300 nucleotides in length representing isolates originating from different individuals were classified as X4 in the Los Alamos HIV database (http://hiv-web.lanl.gov) at the time of writing (data not shown). As a result, this analysis relied on the assignment of an inferred phenotype based on V3 sequence using a modification of the 11/25 rule, which assigns an x4 genotype to sequences with at least one basic substitution at V3 position 11 or 25 (HXB 306 and 322) (20, 29, 30, 48, 79). By eliminating sequences with arginine at V3 position 25, we improved the specificity of the phenotypic classification by reducing the number of R5 sequences misclassified as X4. The reliability of the test (or the probability that a sequence predicted to have a particular phenotype actually has that phenotype) gives an indication of the expected "purity" of the x4 and r5 sequence sets and depends on the test's specificity and sensitivity but also on the prevalence of X4 sequences in the data set (Table 2). The reliability of this test, called here the 11KR/25K rule, is high for the NSI and SI phenotypes and moderately good for coreceptor usage.
The average adjusted X4 and R5 reliabilities for this test are 0.65 and 0.91, respectively. As a result of these values, the x4 and r5 sets are expected to be asymmetrical in quality with respect to coreceptor usage, with the r5 set less likely to contain X4 sequences than the x4 set is to contain R5 sequences. Another factor leading to the asymmetrical quality of the r5 and x4 groups was the inclusion of dually tropic viruses in the x4 group; we would expect sequences with some R5-like characteristics to be found in the x4 set, to the extent that R5X4 and R5 have common determinants for CCR5 usage. Despite the mixed nature of the sequences classified as x4, X4/SI-associated changes are still apparent (and significant) compared to the much larger and relatively pure r5 sequence set. Indeed, for the reasons just stated, we expect that our results underestimate the true differences in amino acid composition and variability between X4 and R5 viruses. Thus, even an imperfect but asymmetric classification scheme is useful for discerning global changes over a large number of sequences. As an example, we have noted that patterns of variability among V3 sequences that are known to be either R5 or X4 closely resemble the patterns observed among V3 sequences classified as r5 or x4 using the less accurate 11/25 rule (54). It is reasonable to suppose that such global patterns should also be well reproduced outside of V3, with their detection further enhanced using the improved classification scheme.
Although the distinction between MT-2 tropism and coreceptor usage is not merely a semantic one, the two classifications are clearly closely related. We are not aware of any SI isolates that are not X4 or R5X4, and NSI isolates that can use CXCR4 are exceedingly rare (we identified a single example in the database [5]). It is therefore something of a paradox that the correlation between discrete changes in V3 sequence and MT-2 cell tropism is more robust than for coreceptor usage. This might be the result of selection bias in the sets of phenotyped V3 sequences used to test the rule or could reflect a difference in the sensitivity of the experimental assays used to test viral isolates for the two phenotypic classifications. Additionally, we have previously suggested that SI isolates may represent a later, more evolved stage of the X4 phenotype (54). The determinants for coreceptor usage might therefore be more complex or might be more likely to lie outside V3 than those for syncytium induction.
Despite these disparities, the 11KR/25K rule should exclude SI isolates from the r5 set approximately 97% of the time (based on the adjusted reliability for predicting the NSI phenotype), and the NSI sequences that remain are highly unlikely to use CXCR4. This should be taken as a strength of the approach for predicting coreceptor usage. More generally, the connection between the SI/NSI phenotype and coreceptor usage seems sufficiently strong to lend additional support to the hypothesis that the observed V3-linked changes are associated with coreceptor preference.
What is the relationship between V3, the V1/V2 stem (containing positions 190 to 200), position 440, and the determination of coreceptor usage? A given position in gp120 could influence entry phenotype in a number of ways, most obviously by participating in a direct physical interaction between gp120 and the coreceptor. But the amino acid composition of a position could also hypothetically modify the tertiary structure of the gp120 subunit, the stability or rotational conformation of the Env oligomer, or the conformational responsiveness to CD4 binding, all of which could affect coreceptor preference or affinity. Changes in charge could also affect the ability of the virus to approach the cell-associated receptors or plasma membrane. In addition, X4 viruses tend to arise late in infection, are associated with a more advanced stage of disease, and are thus likely to be exposed to a different immunological environment as a group compared to R5 viruses. Thus, some changes linked to V3 genotype might not reflect a functional contribution to coreceptor usage. To assess these possibilities, we located positions identified by our statistical approach within the three-dimensional structure of a theoretical model of the HXB-2 gp120 trimer in a CD4-bound conformation (Fig. 4) (43).
FIG. 4. Model of the gp120 trimer structure as published by Kwong et al. (43), detailing the physical location of positions linked to phenotype-associated changes in V3. Residues achieving significant linkage to V3 according to both sampling methods are filled in in red; others are colored blue (see Table 4). Positions located in the region of the V1/V2 loop for which the structure was not solved are also indicated. Critical CCR5 binding positions (57) are colored green and are surrounded by green dots; positions defining the CD4 binding site are orange (42). The V3 stem is colored purple (the structure of the rest of the V3 loop was not solved). Model coordinates were kindly provided by Peter Kwong. The illustration was created using Rasmol (60). (A) View of the trimer from the perspective of the target cell. (B) View of the trimer perpendicular to the plane of the viral membrane, with the coreceptor binding face directed toward the top of the page. Each subunit is colored a different shade for clarity.
A physical and functional relationship between V3 and C4 has been established by monoclonal antibody epitope mapping (49, 50, 78) as well as by mutational analysis in both HIV and simian immunodeficiency virus (SIV) (9, 51). The pattern of substitutions at position 440 reported here also suggests an interaction between that position and the V3 loop: the consensus arginine is much more likely to be replaced by an acidic or uncharged amino acid among isolates with basic substitutions in V3 (Fig. 2). These results thus support the previously suggested possibility that the nonbasic substitutions at position 440 may compensate for the accumulation of positive charges in V3 by minimizing the electrostatic repulsion between that position and the V3 loop (9, 48). Such compensatory changes might be necessary to maintain a stable conformation of gp120 or to aid in the formation of epitopes required for coreceptor interaction.
Position 424 also lies within C4, and unlike most other positions identified, 424 is not surface exposed. It is, however, near amino acids making contact with either CCR5 (419 to 422) or CD4 (425 to 430) (42, 57). The transition from isoleucine to the less bulky valine among x4 sequences may influence the conformation or flexibility of regions of gp120 involved in interaction with either CD4 or the coreceptor. Another buried, hydrophobic position was found to be linked to V3 genotype, this one located at the conserved, N-terminal end of V4: position 382 is always F among r5 sequences but experiences a low level of substitution in an x4 background, although the difference was significant only in set 1. The side chains of residues 424 and 382 are close to one another in the gp120 crystal structure, suggesting a possible interaction between these positions.
One provocative finding was the cluster of five positions linked to V3 genotype between positions 190 and 200. This region forms parts of the V1/V2 stem and bridging sheet and lies at the oligomer interface (42, 43). It has been suggested that positions in the V1/V2 stem may contribute to the interaction between gp120 and CCR5 (56). Although positions 195, 198, and 200 are well exposed on the surface of gp120 (fractional solvent accessibility, >0.4) (42), only the side chains of positions 198 and 200 point toward the coreceptor target; the side chain of 195 projects in the opposite direction. Structural information is not available for V2 positions N-terminal to 195, but by extrapolating from the alternating pattern of side chain orientation of positions 195 to 202, we speculate that 190 is oriented toward the target cell. V3-associated changes at each of these positions were significant in sets of sequences assembled using both selection methods.
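The extrapolation to position 190 rests on a parity argument: in an extended strand, side chains alternate sides from one residue to the next. The sketch below encodes only that assumption, using position 195 (pointing away from the target cell) as the reference; the helper name is hypothetical, and the prediction holds only if strict alternation continues N-terminal to 195, which a turn or bulge would break.

```python
def extrapolate_orientation(position, ref_position=195, ref_orientation="away"):
    """Predict side-chain orientation assuming strict residue-by-residue alternation.

    Hypothetical helper: orientation flips at every residue relative to a
    reference residue of known orientation ("toward" or "away" from the target cell).
    """
    flip = {"toward": "away", "away": "toward"}
    if (position - ref_position) % 2 == 0:
        return ref_orientation
    return flip[ref_orientation]


for pos in (190, 195, 198, 200):
    print(pos, extrapolate_orientation(pos))
# → 190 toward / 195 away / 198 toward / 200 toward
```

The predicted orientations for 198 and 200 agree with the structure, which is the consistency check that motivates the extrapolation to 190.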
Several of these positions are strong candidates for experimental verification. For example, the consensus sequence at positions 188 to 190 (NTS) forms a motif predicted to direct N-linked glycosylation of position 188. Position 190 is more frequently serine or threonine among r5 sequences (86% versus 61%); most of the substitutions among x4 sequences both disrupt the glycosylation motif and add a positive charge (Fig. 2). The S190R mutation was observed among clones of an HIV-1 isolate selected for syncytium formation in microglia by passage in vitro (65). It is also possible that the loss of the glycosylation site at 188 among x4 viruses reflects a release from antibody selection late in infection, when X4 isolates are most likely to arise. Like position 440, positions 195 and 200 are under significant positive selection, which is consistent with a role in determining Env function (80). Positions 198 and 200 lie in a beta sheet in the V1/V2 stem directly across from positions 123 and 121, respectively, both of which have been shown to be critical for CCR5 binding (56, 57) (Fig. 4).
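The effect of a substitution at position 190 on the glycosylation motif follows from the standard sequence rule for potential N-linked glycosylation sites, N-X-S/T where X is any residue except proline. The regular expression and function name below are illustrative assumptions, not the method used in this study, and the sketch assumes an ungapped fragment with contiguous numbering (HXB2 numbering with insertions would need an alignment-aware mapping). A sequon marks only a potential site; it does not guarantee that the asparagine is glycosylated.

```python
import re

# N-X-S/T with X any residue except proline; the lookahead finds overlapping sites
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_starts(seq, first_position=1):
    """Positions of the N in each N-X-S/T motif, numbered from first_position."""
    return [m.start() + first_position for m in SEQUON.finditer(seq)]


print(sequon_starts("NTS", first_position=188))  # consensus 188-190 → [188]
print(sequon_starts("NTR", first_position=188))  # S190R substitution → []
```

A basic substitution at 190 therefore removes the site at 188 in one step, which is the coupling between motif loss and charge gain described above.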
According to structural models, the V1/V2 stem is separated from V3 within a single gp120 monomer in the CD4-bound state, but it may be close to the V3 loop of a neighboring subunit in the Env oligomer (Fig. 4) (43). There is also experimental evidence of a physical and functional interaction between V3 and V1/V2: neutralizing antibodies in serum from monkeys infected with HIV-1/SIV chimeric viruses recognize discontinuous epitopes that are either composed of or influenced by changes at both residue 13 (HXB gp160 308) in V3 and positions 187 and 190 in the V1/V2 stem (26). Several studies have demonstrated a cooperative interaction between V3 and positions in the V1/V2 stem to the extent that changes in both regions are required to confer a particular coreceptor usage, cell tropism, or cytopathic effect (13, 33, 37, 59). A critical unifying feature of these studies, however, is the context dependence of the changes described; that is, specific substitutions at a particular position often influence phenotype only in a very restricted set of isolates. In this light, the fact that global patterns of V3-linked variation were apparent at all in the V1/V2 stem is significant and calls for experimental confirmation of the overall role of this region in gp120 function.
The linkage of other positions to V3 is less easily interpreted, because structural information is lacking, because the linkage is only marginally significant, or because their location in the three-dimensional structure of gp120 does not suggest an obvious involvement in coreceptor binding or oligomer assembly. Like position 190, position 166 in V2 was implicated in the acquisition of syncytium induction in microglia in vitro, through the mutation R166G (65); in the present study, R166K was overrepresented in x4 sequences. Position 177 lies in a region of V2 shown to be involved in gp41-independent dimerization of gp120 (10), and positions 204 and 211 are located at the oligomer interface (43). Any of these positions could hypothetically influence the assembly or conformation of the gp120 oligomer. Rizzuto and Sodroski (56) identified R117 as critical for CCR5 binding; its side chain lies within 5 Å of that of position 204 (which is situated between 117 and the coreceptor binding face) and also projects into the oligomer interface. The same authors noted that CCR5-associated positions tended to lie close to the trimer axis.
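Statements such as "within 5 Å" are usually evaluated as the smallest interatomic distance between the two side chains. The source does not say which atoms or which distance definition were used, so the sketch below assumes the minimum pairwise distance; the function name and coordinates are hypothetical toy values, not taken from the gp120 structure.

```python
import math


def min_side_chain_distance(atoms_a, atoms_b):
    """Smallest interatomic distance (Å) between two side chains given as (x, y, z) tuples."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


# Hypothetical toy coordinates in Å, for illustration only
side_chain_117 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
side_chain_204 = [(5.5, 0.0, 0.0), (6.0, 1.0, 0.0)]
d = min_side_chain_distance(side_chain_117, side_chain_204)
print(d, d <= 5.0)  # → 4.0 True
```

`math.dist` requires Python 3.8 or later; with real coordinates the atom lists would come from the model's side-chain atoms.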
It has not been well established whether an increase in net positive charge in V2 accompanies a switch to the X4 or SI phenotype. Groenink et al. (32) first reported a significantly higher positive charge in the hypervariable V2 locus of SI and "switch NSI" isolates (i.e., those with an NSI phenotype isolated from individuals who also harbored SI virus) than in NSI isolates. Other groups have also noted a trend toward higher positive charge in V2 in SI isolates (16, 75). Here, we report a significant correlation between X4/SI-associated changes in V3 and an increase in net positive charge of gp120 sequence fragments that include V2hv (HXB positions 180 to 189). Although the average difference in charge, approximately +0.5, is not large, this charge accumulation occurs in conjunction with an additional x4-associated basic substitution at position 190 (see Fig. 2 and 3).
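A net charge comparison of this kind can be sketched by counting residues: lysine and arginine contribute +1, aspartate and glutamate contribute -1, and the per-group means are compared. This is a simplifying assumption rather than the study's stated calculation (histidine is treated as neutral here), and the fragments below are toy strings, not sequences from the database.

```python
# Residue-count net charge: K and R count +1, D and E count -1.
# Histidine and all other residues (and alignment gaps '-') count 0 (an assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(fragment):
    """Net charge of a fragment as basic minus acidic residue counts."""
    return sum(CHARGE.get(aa, 0) for aa in fragment.upper())


def mean_net_charge(fragments):
    """Mean net charge over a group of fragments."""
    return sum(net_charge(f) for f in fragments) / len(fragments)


# Toy fragments for illustration only
x4_like = ["NKSRK", "NKKRE"]
r5_like = ["NTSRE", "NKSDE"]
print(mean_net_charge(x4_like) - mean_net_charge(r5_like))  # → 3.0
```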
Modeling of the coreceptors suggests that the extracellular surface of CXCR4 is more negatively charged than that of CCR5 (22), and others have shown that the surface components of CXCR4 used by HIV-1 are more acidic than the corresponding regions of CCR5 (7, 46). In addition, substitution of alanine for certain acidic residues in CXCR4 not only resulted in the loss of coreceptor activity but also allowed R5 viruses to infect cells using the mutant receptors (11). This charge difference between CCR5 and CXCR4 fits well with the observation that basic substitutions in V3 are associated with CXCR4 usage. Because of the location of V2hv near the coreceptor binding face in the gp120 trimer, it is tempting to speculate that the increase in positive charge in the V1/V2 stem among X4 isolates further enhances a direct interaction with the negatively charged surface of CXCR4.
We have noted an increase in V2hv length among x4 sequences, although the significance of this correlation is marginal. It is a matter of some controversy whether the accumulation of insertions in V2hv (HXB positions 185 to 189) is associated with either changes in phenotype or disease progression. Groenink et al. (32) observed extension of V2 among SI and switch NSI sequences, although the same group failed to confirm this result using a more extensive data set (61). Another recent study showed an increased length of V2 in both SI isolates and NSI isolates obtained shortly before a phenotypic switch; the same group demonstrated that X4, R5X4, and R3R5X4 viruses were indistinguishable from one another, but all had longer average V2 lengths than R5 viruses (36). Two other studies failed to find a relationship between V2 extension and entry phenotype (34, 75). An extensive longitudinal study of Env sequence evolution in 12 subjects demonstrated a positive correlation between V2 extension and slow disease progression and showed that insertions in V2 reduced virus replication in macrophages (67). Further elucidation of the connection between V2hv extension and entry phenotype will probably have to wait for the availability of a large set of isolates with experimentally determined coreceptor usage.
Our analysis has identified positions in gp120 that are linked to entry phenotype-associated changes in V3. The cluster of such positions in V2 complements experimental evidence showing an often context-dependent functional association between V2, V3, and coreceptor usage. We have also confirmed and extended observations that CXCR4 usage is associated with increased positive charge in the V2 hypervariable region. Using a statistical approach, we have provided specific predictions about the functional role of regions of gp120 that can now be tested experimentally.
This work was supported by NIH grant R01-AI44667 to R.S. N.G.H. was supported in part by NIH Training Grants T32-AI07419 and T32-AI07001.
Present address: Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Md.
, MIP-1β receptor as a fusion cofactor for macrophage-tropic HIV-1. Science 272:1955-1958.
Copyright © 2009 by the American Society for Microbiology.