Journal of Virology, April 2003, p. 4836-4847, Vol. 77, No. 8
0022-538X/03/$08.00+0 DOI: 10.1128/JVI.77.8.4836-4847.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Celia A. Schiffer,2 Matthew J. Gonzales,3 Jonathan Taylor,4 Rami Kantor,3 Sunwen Chou,5 Dennis Israelski,3 Andrew R. Zolopa,3 W. Jeffrey Fessel,6 and Robert W. Shafer3*
Department of Biochemistry,1 Division of Infectious Diseases, Department of Medicine,3 Department of Statistics, Stanford University, Stanford,4 AIDS Research, Kaiser-Permanente, Northern California, San Francisco, California,6 Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts,2 Division of Infectious Diseases, Oregon Health and Science University, Portland, Oregon5
Received 13 September 2002/ Accepted 14 January 2003
Most of the published sequence data on protease inhibitor-associated mutations are based on isolates obtained from persons treated for no more than 1 year with a single inhibitor (4, 17, 19-21). Few published data are available from persons with carefully characterized treatment histories who have received more than one inhibitor (12), and the genetic mechanisms by which HIV-1 protease develops resistance to multiple inhibitors have not been explored. Understanding the genetic basis of multidrug resistance, however, is critical to designing new non-cross-resistant protease inhibitors that are active against current drug-resistant HIV-1 isolates.
To characterize the patterns of mutations in protease isolates from heavily treated persons, we collected and analyzed a large number of protease sequences of HIV-1 isolates obtained from persons with a range of protease inhibitor experiences. Our analysis allows us to extend previous observations of the mutational flexibility of HIV-1 protease and to identify interactions among protease mutations. We used published structural data to explore possible underlying causes for these interactions.
If multiple isolates were obtained from the same person during the course of protease inhibitor treatment, we included only the most recent isolate. We included two isolates from the same person only if a pre-protease inhibitor treatment isolate was also available. Only sequences that encompassed positions 10 to 90 were included in our analysis (96% included the complete protease, positions 1 to 99). All isolates were sequenced by dideoxynucleotide sequencing rather than by hybridization assays.
Mutations. Mutations were defined as differences from the HIV-1 protease consensus B sequence (15). Of 2,244 sequences meeting the study criteria, 89% (1,990) were determined by direct PCR (population-based) sequencing and 11% (254) were determined by sequencing multiple clones of an isolate. About 1% of nucleotide positions in the sequences determined by direct PCR sequencing contained nucleotide mixtures (defined as the presence of a second electrophoretic peak of at least 20 to 30% of the primary peak). Positions with mixtures were scored as mutations in our analysis of mutation prevalence. However, because it is not possible to determine if these mutations were present in the same genome as other mutations in the sequence, mutations present as mixtures were excluded from our covariation analysis.
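As an illustration of the scoring rule above, the Python sketch below separates the two uses of a mutation call. It assumes a hypothetical representation in which each position of a direct PCR sequence is stored as the set of amino acids observed there, so that a set with more than one member marks a mixture. The function name `call_mutations` is illustrative and is not taken from the study.

```python
def call_mutations(consensus, sequence):
    """Score differences from a consensus sequence.

    consensus: string of consensus B amino acids (position 1 = index 0).
    sequence:  list of sets; each set holds the amino acid(s) seen at that
               position, so a set with more than one member is a mixture.
    Returns (prevalence, covariation): sets of 1-based mutated positions.
    """
    prevalence, covariation = set(), set()
    for pos, (ref, observed) in enumerate(zip(consensus, sequence), start=1):
        if not observed or observed == {ref}:
            continue  # missing data, or identical to consensus B
        prevalence.add(pos)  # mixtures count toward mutation prevalence...
        if len(observed) == 1:
            covariation.add(pos)  # ...but only unmixed calls enter covariation
    return prevalence, covariation
```

For a hypothetical fragment with consensus `"PQITL"` and a mixture of T and A at position 4, position 4 would be counted for prevalence but left out of the covariation set.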
For the 254 isolates for which multiple clones were sequenced, we restricted our analysis to the clone that occurred with the highest frequency. This restriction caused us to exclude 128 mutations that were present in 30% or more of the clones from an individual (but that were not present in the clone with the highest prevalence) and to include 15 mutations that, although present in the most prevalent clone, existed in <30% of the total. This restriction was necessary to prevent the inclusion of mutations from different genomes in our covariation analysis. It did not significantly change the results of our analysis of mutation prevalence.
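A minimal sketch of the clone-selection step, assuming each clone is an aligned protease amino acid string and that a tie for the most frequent clone goes to the clone seen first (the text does not say how ties were handled). The helper names are hypothetical; the 30% threshold is the one stated above. The two extra helpers list the mutations that the restriction drops and the minority mutations it retains.

```python
from collections import Counter


def most_prevalent_clone(clones):
    """Return the clone sequence seen most often (ties go to the clone seen first)."""
    return Counter(clones).most_common(1)[0][0]


def excluded_mutations(consensus, clones, chosen, threshold=0.30):
    """Mutations present in >= threshold of clones but absent from the chosen clone."""
    n = len(clones)
    excluded = []
    for pos, ref in enumerate(consensus, start=1):
        counts = Counter(clone[pos - 1] for clone in clones)
        for aa, k in sorted(counts.items()):
            if aa != ref and aa != chosen[pos - 1] and k / n >= threshold:
                excluded.append(f"{ref}{pos}{aa}")
    return excluded


def included_minority_mutations(consensus, clones, chosen, threshold=0.30):
    """Mutations in the chosen clone that occur in < threshold of all clones."""
    n = len(clones)
    included = []
    for pos, (ref, aa) in enumerate(zip(consensus, chosen), start=1):
        share = sum(clone[pos - 1] == aa for clone in clones) / n
        if aa != ref and share < threshold:
            included.append(f"{ref}{pos}{aa}")
    return included
```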
Statistical analysis. (i) Mutation prevalence. We performed chi-square tests of independence to determine if there was an association between drug treatment and a mutation at each protease position. The chi-square statistic was based on a 2-by-2 contingency table containing the numbers of isolates from treated and untreated persons and the numbers of isolates with and without mutations.
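The 2-by-2 chi-square statistic can be computed directly from the four cell counts. This sketch uses the Pearson statistic without a continuity correction and converts it to a P value using the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. Whether the study applied Yates' correction is not stated, so that choice is an assumption, and the function name is hypothetical.

```python
import math


def chi_square_2x2(treated_mut, treated_wt, untreated_mut, untreated_wt):
    """Pearson chi-square test of independence for a 2-by-2 table (1 df).

    Rows: treated / untreated isolates; columns: with / without a mutation.
    Assumes no row or column total is zero.
    """
    table = [[treated_mut, treated_wt], [untreated_mut, untreated_wt]]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For one position, a call such as `chi_square_2x2(m_t, 1240 - m_t, m_u, 1004 - m_u)` would test the association, where `m_t` and `m_u` are hypothetical counts of mutated isolates among the 1,240 treated and 1,004 untreated isolates.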
To investigate whether there was a linear relationship between the number of protease inhibitors received and the prevalence of a mutation, we performed a logistic regression analysis in which the number of drugs was the independent variable and the presence or absence of mutation was the dependent variable. Persons were categorized in one of four groups according to the extent of treatment: one, two, three, and four or more protease inhibitors. Untreated persons were not included in this analysis.
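Because the regression above has a single predictor, it can be fitted with a few lines of Newton-Raphson iteration. The sketch below is an illustrative stand-in for whatever statistical package was actually used: it codes "four or more" inhibitors as 4 (an assumption) and reports a Wald P value for the slope. All names are hypothetical.

```python
import math


def _score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and entries of the information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_logistic(x, y, iterations=25):
    """Fit logit P(mutation) = b0 + b1 * (number of inhibitors).

    Returns (b0, b1, Wald P value for b1). Assumes the data are not
    perfectly separable, otherwise the estimates diverge.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * gradient, with I the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _score_and_information(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2))
    return b0, b1, p_value
```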
For the chi-square and logistic regression analyses, we used the method of Benjamini and Hochberg to identify results that were statistically significant in the presence of multiple-hypothesis testing (1). This method was developed for multiple-hypothesis testing in settings where many significant findings are anticipated. Whereas the Bonferroni correction divides the significance cutoff by the number of hypotheses tested (n), the Benjamini-Hochberg method ranks the n hypotheses by their P values, from smallest (r = 1) to largest (r = n), and compares the P value of rank r with a cutoff equal to the target false-discovery rate (FDR) multiplied by r/n; every hypothesis up to the largest rank that meets its cutoff is declared significant. In this study, FDRs of 0.01 and 0.05 were used to determine statistical significance.
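The standard Benjamini-Hochberg step-up rule is short enough to show in full. In this sketch, P values are ranked from smallest (r = 1) to largest (r = n), each is compared with FDR × r/n, and every hypothesis up to the largest rank that passes is declared significant. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses declared significant at the given FDR."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ascending P values
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / n:
            cutoff_rank = rank  # keep the largest rank that meets its cutoff
    # Step-up: everything ranked at or below cutoff_rank is significant.
    return {order[k] for k in range(cutoff_rank)}
```

For example, for the eight P values 0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, and 0.205 at an FDR of 0.05, a Bonferroni cutoff of 0.05/8 = 0.00625 retains only the first, whereas the step-up rule retains the first two.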
(ii) Mutation covariation.
We investigated covariation between positions by calculating the binary (phi) correlation coefficient for the simultaneous presence of mutations at two positions in the same isolate. The correlation coefficients were computed separately for the subsets of protease inhibitor-treated and untreated individuals. Statistically significant correlations were those with P values of <0.05 using a Bonferroni correction for 2,080 (i.e., $\binom{65}{2}$) pairwise comparisons. We further investigated the relationships among positions by performing a principal-components analysis (PCA) of the positions found in the analysis described above to be mutated in treated persons. The matrix of binary correlation coefficients was used as a measure of similarity between positions.
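A sketch of the pairwise covariation test, assuming each position is represented as a 0/1 vector over isolates (1 = mutation present, with mixtures already excluded). The phi coefficient is computed from the 2-by-2 cross-tabulation of two positions, and its significance is assessed here with the standard result that n·phi² approximately follows a chi-square distribution with 1 degree of freedom. The text does not state which test produced its P values, so that step is an assumption; the function names are hypothetical.

```python
import math

N_COMPARISONS = math.comb(65, 2)  # 2,080 pairwise comparisons


def phi_coefficient(a, b):
    """Phi correlation between two equal-length 0/1 vectors."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = len(a) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def phi_test(a, b, alpha=0.05, comparisons=N_COMPARISONS):
    """Return (phi, P value, significant after Bonferroni correction)."""
    phi = phi_coefficient(a, b)
    chi2 = len(a) * phi ** 2  # approximately chi-square with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return phi, p_value, p_value < alpha / comparisons
```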
Mutational clusters were defined as clusters of three or more positions in which each member of the cluster was significantly correlated with the presence of each of the other members of the cluster (referred to as cliques in graph theory). Mutational clusters were identified by an exhaustive search technique that evaluated all possible clusters that could be formed from the statistically significant pairs of covarying residues.
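Finding clusters in which every pair of positions is significantly correlated is the clique-enumeration problem. The sketch below uses the Bron-Kerbosch algorithm, an exhaustive method that lists every maximal clique exactly once. This is a swapped-in technique, and treating the reported clusters as maximal cliques of three or more positions built from positive correlations is an assumption rather than a detail given in the text.

```python
def maximal_cliques(edges, min_size=3):
    """List maximal cliques (sorted position lists) with at least min_size members.

    edges: iterable of (i, j) pairs of significantly correlated positions.
    """
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend it; x: already explored.
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & neighbors[v], x & neighbors[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(neighbors), set())
    return sorted(cliques)
```

With illustrative edges joining every pair among positions 10, 48, 54, and 82, plus a single extra edge between 10 and 90, only the four-position cluster is returned, because {10, 90} has fewer than three members.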
(iii) Structural analysis. We used two published X-ray crystallographic structures (1hsg [3] and 1hhp [27]) and one molecular-dynamics simulation (23) of wild-type HIV-1 protease to examine the interresidue distances between positions with statistically significant frequencies of covariation. One X-ray crystallographic structure (1hsg) was of HIV-1 protease bound to indinavir, and one (1hhp) was of an unliganded enzyme. The molecular-dynamics simulation, based on the 1hhp structure, showed the flaps of the protease curled inward. The distance between two residues was considered to be the shortest interatomic distance between any atoms in the two residues. Interresidue distances were calculated between all positions in the protease dimer in each of the three structures. Residues within 8 Å of each other in at least one structure were considered to be neighboring pairs in the folded enzyme. This distance was chosen as a conservative maximum distance at which two residues may interact.
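The distance criterion can be sketched as follows, assuming atomic coordinates have already been extracted from each structure (for example, from the 1hsg and 1hhp coordinate files) into a hypothetical mapping from residue identifier to a list of (x, y, z) tuples in Å. A pair counts as neighboring if its shortest interatomic distance is 8 Å or less in at least one structure. Function names are illustrative.

```python
import math
from itertools import combinations


def residue_distance(atoms_a, atoms_b):
    """Shortest interatomic distance (Å) between any atom of a and any atom of b."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def neighboring_pairs(structures, cutoff=8.0):
    """Residue pairs within `cutoff` Å in at least one structure.

    structures: list of {residue_id: [(x, y, z), ...]} dicts, one per structure.
    """
    pairs = set()
    for residues in structures:
        for a, b in combinations(sorted(residues), 2):
            if residue_distance(residues[a], residues[b]) <= cutoff:
                pairs.add((a, b))
    return pairs
```

In the dimer, residue identifiers could be (chain, position) tuples; collapsing them to positions before comparison with the covariation pairs is left out of this sketch.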
When covariation could not be explained by the proximity of the two residues, we investigated the possibility that covariation resulted from the presence of one or more linking residues, a phenomenon called chained covariation (A. S. Lapedes, B. G. Giraud, L. C. Liu, and G. D. Stormo, presented at the AMS/SIAM Conference on Statistics in Molecular Biology, Seattle, Wash., 1997). To identify chained covariation, we performed a Markov chain analysis of the statistically significant pairs of covarying residues. This analysis finds the shortest chain between a residue pair, where the chain consists entirely of correlated residues within 8 Å of one another. We then counted the number of covarying pairs that could be explained by a chain of one, two, three, or more linking residues. To determine whether such chains were statistically significant, we performed a stepwise permutation analysis in which we randomly generated pairs of residues and determined whether these residues could also be linked by a chain consisting entirely of correlated, neighboring residues. Repeated permutations provided the expected number and distribution of chains of one, two, three, or more linking residues in a molecule having the size and topology of HIV-1 protease.
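The shortest-chain search can be expressed as a breadth-first search on a graph whose edges join residues that are both significantly correlated and within 8 Å of each other; the number of linking residues is then the shortest path length minus one. The sketch below pairs that search with a simple random-pair sampling step in the spirit of the permutation analysis described above. Breadth-first search is named here as a swapped-in technique, and the graph representation, function names, and sampling scheme are illustrative assumptions, not the study's code.

```python
import random
from collections import Counter, deque


def linking_residues(graph, start, end):
    """Intermediate residues on the shortest chain from start to end, or None.

    graph: {residue: set of residues that are correlated AND within 8 Å}.
    Returns 0 for directly linked neighbors.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return hops[node] - 1  # edges on the path minus one
        for nxt in graph.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return None  # no chain of correlated neighbors connects the pair


def chain_length_distribution(graph, positions, n_pairs, seed=0):
    """Chain lengths for randomly drawn residue pairs (a null distribution)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_pairs):
        a, b = rng.sample(positions, 2)  # positions must be a list
        counts[linking_residues(graph, a, b)] += 1
    return counts
```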
Nucleotide sequence accession numbers. The nucleotide sequences, mutations, drug treatment histories, and GenBank accession numbers can be downloaded as a PDF file from http://hivdb.stanford.edu/data/pr1.html. Of the 599 previously unpublished isolates sequenced at the Stanford University Hospital between 1 July 1997 and 31 December 2001, 383 had already been submitted to GenBank for a study of HIV subtypes in northern California (8); 216 new sequences were submitted to GenBank with this report (AF544406 to AF544621).
60% of whom were receiving ritonavir at a low dose as part of a dual protease inhibitor combination. One hundred fifteen persons received amprenavir, which was approved in 1999, and eight persons received lopinavir, which was approved in 2001.
TABLE 1. HIV-1 isolates and protease inhibitor exposure
FIG. 1. Histograms of mutation frequency according to the number of protease inhibitors (PIs) received. The median number of mutations (differences from the consensus B sequence) increased from 4 in untreated persons to 12 in persons receiving 4 inhibitors.
TABLE 2. Mutation frequencies at protease positions 1 to 99 according to the number of protease inhibitors received
Our logistic regression analysis revealed that mutations at 24 positions had statistically significant positive linear relationships between the number of protease inhibitors received and the presence of a mutation (Table 2). The positions with the strongest linear relationships were positions 10, 20, 46, 53, 54, 63, 71, 73, 82, 84, and 90. There was a statistically significant negative linear relationship between the number of inhibitors and the presence of a mutation at position 30.
Locations of protease mutations within the enzyme's three-dimensional structure. The invariant HIV-1 protease positions include the active-site positions (positions 25 to 27); other positions in or near the substrate cleft (positions 28 to 29, 31, and 80 to 81); most of the N- and C-terminal domains, which together with the active site make up the dimer interface; and other positions that appear to be associated with maintaining the enzyme's conformation and flexibility (e.g., 10 conserved glycines, including 3 in the flexible tips of the enzyme flap at positions 49, 51, and 52). The polymorphic positions are found almost entirely in surface loops.
The 23 known drug resistance positions include six substrate cleft residues (positions 30, 32, 48, 50, 82, and 84); four flap tip residues (positions 46, 47, 53, and 54); position 90, which, although not in the substrate cleft, decreases susceptibility to multiple protease inhibitors; three additional residues that are generally mutated only in treated persons (positions 24, 73, and 88); and nine polymorphic residues (positions 10, 20, 33, 36, 60, 63, 71, 77, and 93). The 22 new drug resistance positions include one substrate cleft residue (position 23), three flap residues (positions 43, 45, and 55), one terminal-domain residue (position 95), and 17 residues in the enzyme core. The substrate cleft residues at positions 48 and 50 are also in the protease flap tips.
Correlations between protease mutations. To identify patterns of drug resistance mutations, we calculated the pairwise binary (phi) correlation coefficients among the 45 treatment-associated and 17 polymorphic protease residues. This analysis was performed separately for the 1,004 isolates from untreated persons and for the 1,240 isolates from treated persons to detect associations that were independent of the treatment status of the individuals from whom the sequenced isolates were obtained. Among the untreated isolates, 23 of the 2,080 possible pairwise correlations were statistically significant, including 19 positive (phi = 0.14 to 0.31) and 4 negative (phi = -0.14 to -0.21) correlations. Among the treated isolates, 115 of the possible 2,080 correlations were statistically significant, including 99 positive (phi = 0.13 to 0.63) and 16 negative (phi = -0.13 to -0.34) correlations.
Table 3 shows the most strongly correlated pairs of positions among the 115 statistically significant correlations in isolates from treated persons. The three most strongly correlated pairs of positions among the treated isolates were 54 and 82 (phi = 0.63), 32 and 47 (phi = 0.51), and 73 and 90 (phi = 0.47). Mutations at two pairs of primary resistance positions had significant positive correlations: positions 84 and 90 and positions 48 and 82. Mutations at positions 82 and 90, although both common, were not significantly correlated with each other. Position 30 was negatively correlated with each of the other primary resistance positions. The positions with the greatest number of positive correlations were positions 10 (16 correlations), 46 (13 correlations), 71 (12 correlations), 90 (10 correlations), 20 (10 correlations), 73 (10 correlations), 82 (9 correlations), 63 (7 correlations), 84 (6 correlations), and 54 (6 correlations).
TABLE 3. Most strongly correlated pairs of positions among 115 statistically significant correlations in isolates from treated persons
We can use our measurements of comutation frequencies to construct a graphical model that summarizes the relationships among positions in HIV-1 protease. In this model, we attempt to place positions with high degrees of comutation close together and positions with low or negative degrees of comutation far apart. These relationships are modeled as consistently as possible within the framework of a two-dimensional plot. One computational technique that generates such graphical models is called PCA. We performed PCA on the 45 positions that were associated with protease inhibitor treatment and used the matrix of correlation coefficients as a measure of similarity between positions. The results of our PCA are shown in Fig. 2. The figure shows that positions 30 and 88 cluster together and are separate from most other positions. It also shows a clustering of positions 54 and 82 and their separation from positions 73, 84, 90, and 93.
FIG. 2. PCA of the 45 positions associated with protease inhibitor treatment. The graph is a two-dimensional projection of the distances among the 45 positions, where the similarity between any two positions is measured by their binary (phi) correlation coefficient among persons who have received at least one inhibitor. Positions with high degrees of comutation are close together, and positions with low or negative degrees of comutation are far apart. These relationships are modeled as consistently as possible within the framework of a two-dimensional plot.
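One plausible reading of the PCA step, sketched below under stated assumptions: each position is plotted by its loadings on the two leading eigenvectors of the phi correlation matrix, each scaled by the square root of its eigenvalue. To stay dependency-free, the sketch finds the eigenvectors by power iteration with deflation, which assumes the matrix is symmetric and positive semidefinite (true of a correlation matrix computed from a single complete data set). A real analysis would use a linear algebra library instead, and the function names are hypothetical.

```python
import math


def _top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary, non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # nothing left to extract
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))
    return eigenvalue, v


def pca_coordinates(corr, k=2):
    """Loadings of each position on the top-k components of a correlation matrix."""
    n = len(corr)
    m = [row[:] for row in corr]
    components = []
    for _ in range(k):
        lam, v = _top_eigenpair(m)
        components.append((lam, v))
        # Deflation: subtract the component just found so the next one dominates.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in components] for i in range(n)]
```

Strongly comutating positions end up with nearly identical coordinates, which is the property the figure relies on when it places positions 54 and 82 close together.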
Fifty-six (49%) of the 115 correlated pairs were separated by >8 Å. Our Markov chain analysis showed that of these 56 pairs of residues, 16 could be linked by one residue, 21 by two residues, 13 by three residues, and 1 by five residues. However, our permutation analysis, which was designed to determine whether such chains were statistically significant, showed that this amount of chained covariation would be expected by chance in a molecule with 56 correlated, neighboring residues having the size and topology of HIV-1 protease. Therefore, compared with randomly selected residue pairs, the covarying residues we observed were significantly more likely to be within 8 Å of one another but not significantly more likely to be linked by chained covariation.
Figure 3 shows the strongest positive correlations superimposed on the structure of the protease. Most of the strong correlations lie in a plane adjacent to the substrate cleft and involve residues 10, 24, 30, 46, 54, 82, 84, and 90.
FIG. 3. The 50 most highly correlated residues in isolates from treated persons are shown superimposed on the locations of these residues within the folded enzyme. The blue lines represent positively correlated residues (n = 44; phi > 0.2); the red lines represent negatively correlated residues (n = 7; phi < -0.2). The diameter of each line is proportional to the correlation coefficient of the residue pair. The lines connect the beta carbons of each residue, with the exception of the glycines at positions 48 and 73, which are connected to other residues by their alpha carbons. Each correlated pair is shown twice, once in each monomer.
TABLE 4. Clusters of correlated protease positions
Six representative clusters from Table 4 are shown in Fig. 4. These six clusters occurred in 17% of isolates from all treated persons and 29% of isolates from persons receiving two or more protease inhibitors. Published in vitro susceptibility results for isolates containing each of these six patterns of mutations (and no additional known resistance mutations) reveal that each pattern is associated with reduced susceptibility to each of the protease inhibitors: amprenavir, 2- to 5-fold; indinavir, 10- to 15-fold; lopinavir, 2- to 20-fold; nelfinavir, 10- to 30-fold; saquinavir, 3- to 30-fold; and ritonavir, 3- to 100-fold (25).
FIG. 4. Six representative clusters from Table 4. Each position in a cluster demonstrates statistically significant mutational covariation with each of the other positions within a cluster. (A) Positions 10, 63, 71, 90, and 93; (B) positions 10, 46, 71, 90, and 93; (C) positions 10, 71, 73, 84, and 90; (D) positions 10, 46, 71, 84, and 90; (E) positions 10, 48, 54, and 82; (F) positions 10, 24, 46, 54, and 82. The clusters are only shown on one monomer of the protease dimer. The side chains of the residues within each cluster are shown on the protease backbone. Oxygen is shown in red, nitrogen in blue, carbon in gray, and sulfur in yellow.
The large number of isolates analyzed in this study and the fact that a large proportion were from patients who received multiple protease inhibitors allowed us to identify 22 new treatment-associated positions. Mutations at eight of these new positions were observed to develop in a previous longitudinal study of protease isolates from 178 treated persons, but the associations of these mutations with treatment in that study were in most cases not statistically significant (24). The newly identified mutations occur primarily in combination with previously reported drug resistance mutations, suggesting that they act as accessory mutations to increase the level of resistance to multiple protease inhibitors or to compensate for losses in fitness. Most of the new mutations involve the replacement of one hydrophobic residue with another, possibly resulting in the repacking of hydrophobic regions in the core domain of the monomer.
Of the newly identified sites of mutation, residue 23, located at the base of the P1 pocket where it is flanked by V82 and I84, is the position most likely to have a direct impact on inhibitor binding. The mutation L23I likely tightens or reshapes the P1 pocket and may compensate for the increase in size of the pocket that occurs with either V82A or I84V. Alternatively, L23I may directly interfere with inhibitor binding, as it is near the active site (30). Site-directed mutagenesis experiments in which L23I is placed in a wild-type enzyme or in an enzyme containing other mutations (e.g., V82A or I84V) are needed to clarify the effect of this mutation on protease function and protease inhibitor resistance.
Our analysis of the association between mutation prevalence and drug therapy has several limitations. First, because in vitro drug susceptibility data were not available for the isolates used in this study, we could not demonstrate a direct association between mutations and reduced in vitro susceptibility. Second, we did not control for the duration of HIV-1 infection or protease inhibitor treatment. Despite these limitations, however, our analyses establish a conservative lower limit to the extent of HIV-1 protease mutability and generate hypotheses about specific mutations. These hypotheses can be tested by demonstrating the longitudinal development of the mutations during treatment or the effects of the mutations on in vitro drug susceptibility.
Of the 22 newly described treatment-associated mutations, the 13 at positions that are conserved in untreated persons are of more interest than the 9 at polymorphic positions. Indeed, we cannot exclude the possibility that the increased prevalence of mutations at the nine polymorphic positions reflects the increased variability of virus populations in persons infected for a longer period of time, a population that is likely to include more treated than untreated persons.
A recent computational study evaluated the variability of protease residues in HIV-1, other primate lentiviruses, and feline immunodeficiency virus, as well as the theoretical free-energy contribution of each residue to the binding of HIV-1 substrates and inhibitors (29). Our analyses complement this effort by quantifying the variability of this enzyme in isolates that have evolved in the presence of one or more protease inhibitors. However, positions reported to be invariant may develop mutations within virus populations under different selection pressures. Mutations other than those described in this paper have been reported during in vitro passage experiments. Mutation at the invariant residue 91 (T91S) has been reported during in vitro passage with lopinavir (2). The substrate cleft mutations R8QK and A28S have been reported after passage with the experimental inhibitors A-77003 and TMC-126, respectively (11, 31).
Mutation covariation. The presence of positions within a molecule that covary, or mutate in a correlated manner, suggests that mutations at one position may require a compensatory mutation at a second position for optimal function (7, 16; Lapedes et al., AMS/SIAM Conference on Statistics in Molecular Biology). Covariation analysis has been used to help predict unsolved protein structures and to better understand the functions of proteins with known structures. Previous analyses of covariation have used alignments of sequences in a protein family rather than an alignment of variants of a single protein. However, the high mutation rate and mutation tolerance of HIV-1 have made it possible for us and others (14) to identify statistically significant covariation within a single HIV-1 subtype.
One of the major challenges of covariation analysis is to differentiate covariation resulting from the functional dependency between two positions from the shared inheritance of both mutations from a founder virus. In this study, covariation almost certainly reflects functionality rather than evolutionary relatedness. The fact that mutational correlations were so much more common among treated isolates is consistent with the repeated selection of the correlated mutations in many different isolates during selective drug pressure rather than with the inheritance of the correlated mutations from a small number of ancestral isolates.
Although biochemical and biophysical experiments are required to demonstrate the mechanism for the correlation between pairs of residues, our analyses provide preliminary hypotheses that help prioritize which residues to study. For example, more than one-half of the 115 pairs of significantly correlated positions were within 8 Å of each other. This proportion exceeds that expected by chance, suggesting that in many cases covariation results from a direct interaction between the correlated mutations.
The correlations between amino acids that were not close to one another in the three-dimensional protease structure are more difficult to explain. For example, mutations at position 46 were highly correlated with mutations at many distant positions (Fig. 3 and 4). Although we found that many nonneighboring but highly correlated residues could be linked through a chain of covarying residues, our statistical analysis suggested that the compact shape of the protease and the large number of correlated residues could cause this to occur by chance. An alternative explanation for correlation between distant residues comes from the work of others who have shown that the wild-type protease enzyme may be partially down-regulated and that mutations at certain residues, such as M46I and L63P, increase catalytic activity and may be selected in enzymes with other mutations that decrease catalytic activity, regardless of the locations of these other mutations (9, 22).
The negative correlation between D30N and the other primary protease inhibitor resistance mutations may reflect the fact that D30N decreases protease fitness without contributing resistance to any protease inhibitor other than nelfinavir (6, 12, 18). Alternatively, enzymes containing D30N together with other primary mutations appear to have decreased activity (28).
The frequent occurrence of mutational clusters, as well as other common patterns of mutations, suggests that mutations can interact as part of higher-order networks. These mutation patterns are tangible evidence for the high genetic barrier to resistance to the protease inhibitors. However, these patterns are complex and frequently overlapping, suggesting that there are few, if any, absolute dependencies between drug resistance mutations. Determining the biochemical and biophysical properties of enzymes with these patterns of mutations will be important for designing new protease inhibitors that are less likely to trigger resistance or are effective against already drug-resistant isolates.
Present address: Genentech, Inc., South San Francisco, Calif.