Journal of Virology, April 2004, p. 3722-3732, Vol. 78, No. 7
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.7.3722-3732.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Molecular Biology Institute, Center for Genomics and Proteomics, Dept. of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095-1570,¹ Specialty Laboratories Inc., Santa Monica, California 90404²
Received 30 June 2003/ Accepted 4 December 2003
Fortunately, the rapid evolution of HIV itself may provide a powerful tool for gaining understanding of its function in general and drug resistance in particular. HIV's high mutation rate is essentially performing a saturating mutagenesis experiment that in principle could reveal the detailed selection pressures for every possible mutation. The question is how to best read out this detailed information and make use of it.
In evolutionary biology, one important tool for characterizing selection pressure is the ratio of observed amino acid mutations over observed synonymous mutations (nucleotide mutations that do not change the amino acid translation), often referred to as Ka/Ks (amino acid mutations over synonymous mutations) or dn/ds (nonsynonymous mutations over synonymous mutations). Since amino acid mutations, but not synonymous mutations, experience selection pressure due to their effect on protein function, their ratio gives a straightforward measure of this selection pressure. Throughout this paper we will use the term Ka/Ks, which is normalized by the ratio expected under a random mutation model (i.e., in the absence of any selection pressure) (10). A Ka/Ks value of 1 indicates neutral selection, i.e., the observed ratio of mutations that cause amino acid changes versus those that do not exactly matches the ratio expected under a random mutation model. Thus, amino acid changes are neither being selected for nor against. A Ka/Ks value of <1 indicates negative selection pressure. That is, most amino acid changes are deleterious and are selected against, producing an imbalance in the observed mutations that favors synonymous mutations. Much less common is positive selection (Ka/Ks > 1), indicating that amino acid changes are favored, i.e., they increase the organism's fitness. This unusual condition may reflect a change in the function of a gene or a change in environmental conditions that forces the organism to adapt. For example, HIV mutations which confer resistance to new antiviral drugs might be expected to undergo positive selection in a patient population treated with these drugs.
In this paper we present a large-scale study of the value of positive selection for detecting drug-resistant mutations in HIV protease and reverse transcriptase (RT). Ordinarily, Ka/Ks is measured as a single value for an entire gene (10). This can reveal very interesting positive selection events in the evolution of an organism, but unfortunately the overall Ka/Ks values for HIV protease and RT provide no extraordinary result (they indicate negative selection [18], as in most genes). A more interesting question is whether positive selection can be observed at the level of individual mutations rather than by pooling all data for the entire gene. However, this would require very large amounts of mutation data to obtain a statistically significant Ka/Ks result for each individual mutation.
To solve this problem, we have performed automated mutation analysis of approximately 40,000 HIV samples from AIDS patients sequenced by Specialty Laboratories from 1999 to mid-2002. This massive data set provides essentially complete mutagenesis in the regions sequenced (including protease codons 1 to 99 and RT codons 1 to 381). More importantly, it enables the calculation of accurate Ka/Ks values at each codon, and even for each individual amino acid within that codon, with high statistical significance. Using these data, we have found that positive selection detects most of the known drug-resistant mutations and discovered many new mutations that are strong candidates for drug resistance or other key functional changes in HIV. The positive selection map and complete mutation data for the 40,000 HIV samples can be of great value to the AIDS research community.
FIG. 1. RT-PCR and sequencing of the HIV-1 protease and RT regions. HIV-1 RNA was isolated from AIDS patient plasma samples. Reverse transcription was performed to obtain the cDNA from single-stranded viral RNA. The HIV protease and RT region around 1.4 kb was amplified by PCR using two (forward and backward) unique primers. This was followed by a nested PCR, which split the target sequence into three shorter fragments with the use of six unique primers. These fragments were then cycle sequenced in forward and reverse directions.
Single nucleotide polymorphism (SNP) scoring and identification. To identify real mutations and distinguish them reliably from possible sequencing errors, all six chromatogram reads for each sample were aligned against the subtype B reference sequence (GenBank accession no. GI9629357) and analyzed by the programs POA and snp_assess as previously described (5-7, 9). For each candidate mutation, snp_assess calculated the log odds ratio (LOD) of the probability that it is a true mutation versus the probability that it is a sequencing error. An LOD value of >3 implies that the likelihood of a sequencing error is less than 10^-3 (Fig. 2).
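To make the LOD threshold concrete, the following minimal Python sketch converts an LOD score into the sequencing-error probability it implies. It assumes base-10 logarithms and that "true mutation" and "sequencing error" are the only two hypotheses, so that their probabilities sum to 1. The function name `error_probability` is illustrative and is not part of snp_assess, whose internal calculation is not shown here.

```python
def error_probability(lod: float) -> float:
    """Implied probability of a sequencing error for a given LOD score.

    Assumption: LOD = log10(P_true / P_error) and P_true + P_error = 1,
    which rearranges to P_error = 1 / (1 + 10**LOD).
    """
    return 1.0 / (1.0 + 10.0 ** lod)


if __name__ == "__main__":
    # 11.6 is the LOD score quoted for the example mutation in Fig. 2.
    for lod in (2.0, 3.0, 11.6):
        print(f"LOD {lod:>4}: P(error) = {error_probability(lod):.3g}")
```

Under these assumptions an LOD of exactly 3 corresponds to an error probability of 1/1001, which is just below the 10^-3 bound stated above.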
FIG. 2. Chromatogram evidence for an HIV-1 protease mutation. The program snp_assess identified an A→G mutation with an LOD score of 11.6 (top). Chromatograms for the forward (Seq214224) and reverse (Seq214232) strand sequencing are shown in the lower panels. Seq214232 is shown in reverse complement for the purposes of comparison.
Calculation of Ka/Ks for specific amino acid substitutions. Our calculation is based on the definition of Ka/Ks developed by Li (10). Our approach differs in three main ways: (i) instead of calculating Ka/Ks for an individual gene or codon, we calculate an individual Ka/Ks value for each specific amino acid mutation; (ii) we follow the definition of Ka/Ks as normalized by a random mutation model (i.e., no selection pressure, described in detail below), unlike some treatments of dn/ds (25); and (iii) we take into account the high transition/transversion ratio of HIV (20), which is necessary for an accurate Ka/Ks calculation. We first measured the transition and transversion frequencies ft and fv from the entire data set, according to the following formulas: ft = Nt/(nt × S) and fv = Nv/(nv × S), where S is the total number of samples; Nt and Nv are the numbers of observed transition and transversion mutations, respectively; nt is the number of possible transitions in the region that was sequenced (simply equal to its length L in nucleotides); and nv is the number of possible transversions (equal to 2L). For this calculation, we used all of the nucleotides in the region that was sequenced. It is also possible to perform this calculation specifically on silent nucleotide positions (i.e., nucleotides where all possible mutations are synonymous); however, we have followed the more conservative approach of using all nucleotides, in keeping with previously published work (20). In this calculation (and all others below) we counted only single nucleotide substitutions; all other mutations were excluded.
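The two frequency formulas can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the representation of each observed substitution as a `(reference_base, observed_base)` pair are assumptions made for the example.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref: str, alt: str) -> bool:
    """Return True if ref -> alt is a transition, False if a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ for a substitution")
    return (ref, alt) in TRANSITIONS


def substitution_frequencies(substitutions, region_length, n_samples):
    """Return (ft, fv) from an iterable of (ref_base, alt_base) pairs.

    ft = Nt / (nt * S), with nt = L   (one possible transition per site)
    fv = Nv / (nv * S), with nv = 2L  (two possible transversions per site)
    """
    n_t = n_v = 0
    for ref, alt in substitutions:
        if is_transition(ref, alt):
            n_t += 1
        else:
            n_v += 1
    f_t = n_t / (region_length * n_samples)
    f_v = n_v / (2 * region_length * n_samples)
    return f_t, f_v


if __name__ == "__main__":
    # Toy input (hypothetical, not study data): 6 transitions and 2 transversions
    # observed across 2 samples of a 10-nucleotide region.
    toy = [("A", "G")] * 6 + [("A", "C")] * 2
    print(substitution_frequencies(toy, region_length=10, n_samples=2))
```

Because each site has one possible transition but two possible transversions, the per-site frequency ratio ft/fv is twice the raw count ratio Nt/Nv.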
The definition of Ka/Ks can be extended to a specific amino acid substitution (X→Y) at a codon by calculating the ratio of NY, the count of X→Y mutations observed at that codon, over Ns, the count of synonymous mutations observed at that codon. This NY/Ns ratio is then normalized by the ratio expected under a random mutation model (i.e., in the absence of any selection pressure), according to the following formula:
![]() |
A Ka/Ks value of >1 indicates that the substitution X→Y exhibits positive selection pressure.
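Since the formula itself is not reproduced in this copy, the following Python sketch shows one plausible reading of the normalization described in the text. It assumes that the expected ratio under the random mutation model is obtained by summing ft (for transitions) or fv (for transversions) over the single-nucleotide changes of the wild-type codon that produce amino acid Y, divided by the same sum over the synonymous changes. The standard genetic code is used; the function names, the example codon, and the counts are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TRANSITIONS = {"AG", "GA", "CT", "TC"}


def expected_weights(codon, f_t, f_v):
    """Sum f_t or f_v over the nine single-nucleotide neighbours of `codon`,
    grouped by the amino acid that each neighbour encodes."""
    weights = {}
    for pos, alt in product(range(3), BASES):
        ref = codon[pos]
        if alt == ref:
            continue
        neighbour = codon[:pos] + alt + codon[pos + 1:]
        rate = f_t if ref + alt in TRANSITIONS else f_v
        aa = CODON_TABLE[neighbour]
        weights[aa] = weights.get(aa, 0.0) + rate
    return weights


def ka_ks(codon, target_aa, n_y, n_s, f_t, f_v):
    """Ka/Ks for the substitution (amino acid encoded by `codon`) -> `target_aa`.

    n_y: observed count of the X->Y substitution at this codon
    n_s: observed count of synonymous substitutions at this codon
    Returns None when the ratio is undefined: no synonymous observations,
    no possible synonymous change, or Y unreachable by a single substitution.
    """
    weights = expected_weights(codon, f_t, f_v)
    w_syn = weights.get(CODON_TABLE[codon], 0.0)
    w_y = weights.get(target_aa, 0.0)
    if n_s == 0 or w_syn == 0.0 or w_y == 0.0:
        return None
    return (n_y / n_s) / (w_y / w_syn)


if __name__ == "__main__":
    # Hypothetical counts and frequencies, chosen only to illustrate the arithmetic.
    print(ka_ks("GGG", "V", n_y=50, n_s=100, f_t=0.08, f_v=0.01))
    print(ka_ks("ATG", "I", n_y=10, n_s=5, f_t=0.08, f_v=0.01))
```

The second call returns `None` because ATG is the only codon for methionine, so no synonymous single-nucleotide change exists and the ratio cannot be computed at such a codon.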
We calculated an LOD confidence score for a mutation X→Y to be under positive selection pressure according to the following formula:
![]() |
![]() |
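The LOD formula is likewise not reproduced in this copy. As a hedged illustration only, one common way to score such a result is a one-sided binomial test under neutral selection (Ka/Ks = 1): if the expected NY/Ns ratio under the random mutation model is r, each of the NY + Ns observations is an X→Y mutation with probability q = r/(1 + r), and the LOD is the negative log10 of the probability of seeing at least NY such mutations. The choice of test and both function names are assumptions, not the method confirmed by the text.

```python
import math


def log10_binomial_tail(k: int, n: int, q: float) -> float:
    """log10 of P(X >= k) for X ~ Binomial(n, q), summed in log space
    so that very small tail probabilities do not underflow."""
    if k <= 0:
        return 0.0
    if k > n or q <= 0.0:
        return -math.inf
    if q >= 1.0:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(q) + (n - i) * math.log1p(-q)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum / math.log(10.0)


def positive_selection_lod(n_y: int, n_s: int, expected_ratio: float) -> float:
    """-log10 P(at least n_y X->Y mutations among n_y + n_s | neutral selection).

    expected_ratio is the NY/Ns ratio expected under the random mutation model.
    """
    q = expected_ratio / (1.0 + expected_ratio)
    return -log10_binomial_tail(n_y, n_y + n_s, q)


if __name__ == "__main__":
    # Hypothetical counts: 50 X->Y and 100 synonymous observations, expected ratio 0.1.
    print(positive_selection_lod(50, 100, 0.1))
```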
Drug resistance prediction P values. Given n mutations with positive selection of a total of N mutations, we calculated the log probability of predicting at least m drug-resistant mutations by random chance (of the total number, M, of known drug-resistant mutations), according to the following hypergeometric distribution:
![]() |
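The tail probability described here can be sketched with the standard hypergeometric form, P(overlap ≥ m) = Σ C(M, i) C(N − M, n − i) / C(N, n), summed over i from m to min(n, M). The sketch below uses exact integer arithmetic from Python's `math.comb` (Python 3.8 or later); the function names are illustrative, and because the original formula image is not available, the results need not reproduce the reported exponents exactly.

```python
import math


def hypergeom_tail(m: int, n: int, M: int, N: int) -> float:
    """P(at least m of the n positively selected items are among the M known
    drug-resistant items) when the n items are drawn at random from N."""
    total = math.comb(N, n)
    hits = sum(
        math.comb(M, i) * math.comb(N - M, n - i)
        for i in range(max(m, 0), min(n, M) + 1)
    )
    return hits / total


def log10_p(m: int, n: int, M: int, N: int) -> float:
    """log10 of the tail probability, as used for the reported P values."""
    p = hypergeom_tail(m, n, M, N)
    return math.log10(p) if p > 0 else -math.inf


if __name__ == "__main__":
    # Counts taken from the Results text; treating N = 99 protease codons as the
    # universe for the codon-level test is an assumption of this example.
    print("protease codons:   ", log10_p(m=19, n=47, M=23, N=99))
    print("protease mutations:", log10_p(m=25, n=69, M=52, N=527))
```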
TABLE 1. Positive-selection pressure in protease
We identified 1,923,620 candidate mutations in these samples, of which 1,830,097 had high LOD scores (LOD > 3; throughout this paper we will focus on mutations with LOD scores of >3). Manual verification indicates that the false-positive rate in the mutation data presented in this paper is less than 1% (see Materials and Methods). This sequencing covers the whole protease gene (297 bp) and the first 1,143 bp of the RT gene. The average number of mutations (compared to the subtype B reference) was 31.96 per kb overall, 29.57/kb in HIV protease, and 32.58/kb for RT. Approximately 349,000 mutations were detected in protease, and 1.48 million were detected in RT. This represented 1,148 distinct codon mutations in protease and 3,873 in RT. On average, each distinct mutation was observed in 364 independent samples, corresponding to an allele frequency of 0.92%. The overall ratio of transition to transversion was 8.75, indicating that the HIV-1 pol enzyme has a very strong bias towards transition substitutions. This result is consistent with a previous report (20).
We identified 232,299 amino acid mutations in protease and 586,192 mutations in RT. These subdivided into 528 distinct amino acid changes at 91 codon positions for protease and 1,964 distinct amino acid changes at 361 codon positions for RT. Thus, the average population frequency of each amino acid change was 1.1% in protease and 0.75% in RT. We detected 5.33 distinct amino acid changes per codon in protease and 5.15 per codon in RT.
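The derived figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes exactly 40,000 samples (the text says "approximately 40,000") and that the per-codon averages are taken over all sequenced codons (99 for protease, 381 for RT) rather than over only the codons at which changes were seen; under those assumptions the numbers quoted above are reproduced after rounding.

```python
N_SAMPLES = 40_000  # assumption: the text gives the sample count only approximately

DATASETS = {
    "protease": {"aa_mutations": 232_299, "distinct_changes": 528, "codons": 99},
    "RT": {"aa_mutations": 586_192, "distinct_changes": 1_964, "codons": 381},
}


def summarize(stats, n_samples=N_SAMPLES):
    """Return (distinct amino acid changes per codon,
    mean population frequency of each change, in percent)."""
    per_codon = stats["distinct_changes"] / stats["codons"]
    mean_freq_pct = 100 * stats["aa_mutations"] / stats["distinct_changes"] / n_samples
    return per_codon, mean_freq_pct


if __name__ == "__main__":
    for name, stats in DATASETS.items():
        per_codon, freq = summarize(stats)
        print(f"{name}: {per_codon:.2f} changes/codon, mean frequency {freq:.2f}%")
```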
Positive selection mapping of individual amino acid mutations. To relate these polymorphism data to their potential impact on protein function, we mapped all mutations onto the HIV-1 subtype B protein sequences for the protease (amino acids 1 to 99) and RT (amino acids 1 to 381) proteins. Overall, the Ka/Ks value for this region is 0.2687, indicating that it is under negative selection. This is consistent with previously reported results (18).
To seek drug-resistant mutations, we mapped positive selection pressure throughout the sequenced region by calculating a Ka/Ks value for each amino acid mutation. These results show dramatic differences in Ka/Ks at different positions in the proteins and strongly positive selection pressure at individual amino acids (Fig. 3 and 4). In marked contrast to the overall pattern of negative selection pressure in this region, we observed Ka/Ks values of >1 (i.e., positive selection) for 69 individual mutations in protease and 142 mutations in RT and Ka/Ks values of >10 for 20 mutations in protease and 47 mutations in RT. To assess the statistical significance of these results, we also calculated a P value for each mutation, giving the probability of the observed results under the assumption of neutral selection pressure (i.e., Ka/Ks = 1; see Materials and Methods for details of the P value calculation). At a 1% statistical significance threshold, our positive-selection results are statistically significant (P values of <10^-10 in most cases [Tables 1 and 2]).
FIG. 3. Positive selection mapping of HIV-1 protease from 40,000 patient samples. The Ka/Ks value represents the greatest selection pressure among all the individual amino acid mutations at each codon. The dotted line indicates the Ka/Ks value of 1.
FIG. 4. Positive selection mapping of HIV-1 RT from 40,000 patient samples. The Ka/Ks value represents the greatest selection pressure among all the individual amino acid mutations at each codon. The dotted line indicates the Ka/Ks value of 1.
TABLE 2. Positive-selection pressure in RT
Positive selection of drug-resistant mutations. Positive selection mapping recovered the majority of the drug-resistant mutation positions reported in the published literature for HIV protease (Fig. 5a). We identified 47 positions in protease that showed positive selection of individual mutations. Notably, 19 of these 47 positions are known to be associated with drug resistance. Thus, positive selection mapping identified most (83%) of the 23 known drug-resistant mutation positions in protease (Fig. 5a). This is a statistically significant match: the P value for obtaining this result by random chance is 10^-3.3. Moreover, the known drug-resistant mutations at these positions matched the amino acid changes that we observed to be positively selected (Table 1). It should be noted that at two of the four known drug-resistant positions that we missed (Met 36 and Met 46), no synonymous mutations are possible (all mutations change the amino acid), and therefore we could not even calculate a Ka/Ks ratio there.
FIG. 5. Positive selection identifies drug resistance and positive fitness mutations. (a) Identification of codons with positive selection, either from the set of all positions in HIV protease (All codons), positions reported in the literature as sites of drug-resistant mutations (Known drug resistance associated codons), or positions reported as sites of mutations specifically associated with adaptation to drug treatment (Treatment associated codons). (b) Identification of specific amino acid mutations with positive selection, either from the set of all HIV protease mutations found in our data set (All mutations), or mutations reported in the literature as causing drug resistance (Known drug resistance associated mutations). (c) Phenotypic fitness, as measured by a protease activity assay by Loeb et al., for a random sample of HIV protease mutants (All mutations tested), or the subset of those mutations found to have positive selection in our study (Positive selected mutations). active, protease mutants with normal or greater-than-wild-type proteolytic activity; intermediate, partial cleavage was observed in the assay; inactive, no proteolytic cleavage was observed.
The statistical significance of our results becomes even stronger when evaluated at the level of individual mutations. Of the 527 amino acid changes in protease, we identified 69 that displayed positive selection (Ka/Ks > 1) with LOD scores greater than 2 (Table 1). Twenty-five of these corresponded to known drug-resistant mutations, of a total of 52 known to exist in protease. This is a statistically significant match: the P value for obtaining this result by random chance is 10^-10.4. Of the 2,255 amino acid changes that we identified in RT, 142 demonstrated positive selection (LOD score > 2) (Table 2). Of these, 23 matched known drug-resistant mutations. The P value for obtaining this result (of the 55 known drug-resistant mutations in RT, by random chance) is 10^-13.9.
Comparison with independent drug treatment studies for HIV protease. One basic weakness of our data set is the lack of drug treatment histories for the individual patients. Not only do we lack information about what specific treatment a patient received, but also many of the samples may come from patients who have not been treated with any HIV drugs. We therefore compared our results with a carefully controlled independent study that identified mutations associated with specific drug treatments (24). Rather than calculating Ka/Ks, this study kept a detailed drug treatment history for each patient and measured the change in the frequency of each mutation among patients treated with a given set of drugs from that of patients not treated with those drugs. By comparing 1,004 HIV isolates from untreated patients with 1,240 HIV isolates from patients treated with one or more protease inhibitors, Wu et al. identified 45 positions in HIV protease where mutations were specifically associated with drug treatment.
Our Ka/Ks data match the results of Wu et al. closely. Of the 47 positions in protease identified by our positive selection mapping, 34 matched those found by Wu et al. (Fig. 5a). This is a statistically significant result (P < 10^-5.2). It is striking that Ka/Ks mapping of a random sample of HIV sequences, with no drug treatment information whatsoever, finds the majority (76%) of drug-resistant mutations identified by a careful study of specific drug treatments (24).
Comparison with independent assays of phenotypic fitness for HIV protease. Positive selection mapping should yield important information not only about drug resistance but also about mutations that improve viral fitness in other ways. To test this hypothesis, we also compared our results to the exhaustive site-directed mutagenesis results of Loeb et al., who constructed and assayed the biochemical activity of approximately 50% of all point mutants of HIV-1 subtype B protease (11). These data demonstrate that our positive selection mapping detects not only drug resistance but also key determinants of fitness (Fig. 5c). While the set of all mutations tested by Loeb et al. was strongly biased towards negative activity (no detectable protease activity), with a smaller number of positive (normal) activity and intermediate activity, the mutations detected by our positive selection metric were almost entirely of normal or increased activity. This is a significant result, with a P value of 10^-16.6.
A major difficulty in anti-HIV drug development is the rapid selection of mutations in the viral genome that confer drug resistance by changing the protein target. Current clinical and academic research has focused on a set of known codon positions that are associated with drug resistance, and it is very important to detect and understand the pattern of these mutations. Due to the polymorphic nature of the HIV genome and the complexity of the drug resistance mechanisms, other codon positions may also play a role in the development of drug resistance. Our calculation of selection pressure for all the codons in the HIV-1 protease and RT regions can help to identify important codon positions that might affect drug resistance. Despite the fact that the Specialty Laboratories data set included no drug treatment information for the patients, our Ka/Ks calculations successfully identified 76% of the mutation positions found to be associated with drug treatment in an independent clinical study (24).
Our approach differs from previous work in several ways. First, a number of studies have examined the problem of calculating selection pressure for individual sites in a protein (4, 14, 21, 25). However, our approach calculates Ka/Ks for each observed amino acid mutation, instead of combining all observed mutations for a site into a single Ka/Ks value. Our data for protease indicate that pooling multiple mutations for a site can obscure a large fraction (40%) of the positively selected sites that can be detected at the level of individual amino acid mutations. Second, we have not made use of any drug treatment information; our method does not require it, and the Specialty Laboratories data set did not include it. By contrast, Wu et al. identified drug-resistant mutations without considering Ka/Ks by comparing the frequency of each mutation in two groups: patients who received a specific drug treatment regimen and a control population of untreated patients (24). Third, because our data reflect a single subtype (B), we have not considered phylogenetic relationships or ancestral genotype in our analysis. For populations that contain important phylogenetic structure, it would be better to measure Ka/Ks in a way that takes this structure into account, as has been previously described (4). Finally, the base-calling software that we used (PHRED) does not report minor peaks when two or more nucleotide bases are present as a mixture at a given position in the chromatogram. Such minor peaks can be of great interest and deserve further analysis.
Impact of mutation on protein function. There are 23 known HIV-1 protease inhibitor (PI) drug-resistant mutations that have been mapped onto the terminal domain (positions 8, 88, 90, and 93), the core domain (positions 10, 20, 24, 30, 32, 63, 71, 73, 77, 82, and 84), and the flap domain (positions 33, 36, 46, 47, 48, 50, 53, and 54), respectively (8, 18, 21, 24). Nineteen of them were detected by positive selection mapping in our data. The high proportion of known drug-resistant positions detected by this approach (76 to 83%) suggests that it could provide a relatively useful and reliable new tool for detecting important new drug-resistant mutations.
Indeed, our positive selection analysis does detect 28 novel positions in protease that may be important functional determinants but the significance of which is currently unknown. Some of these positions (e.g., 35, 37, 62, 64, 72, 74, and 85) are adjacent to codon positions known to be associated with drug resistance (e.g., 36, 63, 71, 73, and 84). However, most of the newly identified positions are not near the active site. Instead, they are primarily located in the core domain (e.g., positions 12, 13, 15, 19, 60, 62, and 64) and flap domain (e.g., positions 35, 37, 39, 41, 45, and 57). They may affect either enzyme catalysis or dimer stability or reshape the active site through long-range structural perturbations (19). It is possible that some of the positively selected mutations may act as accessory mutations that improve viral fitness rather than directly interfering with drug binding (1).
Our comparison with the protease activity data of Loeb et al. (11) demonstrated that our approach also provides a useful window on important fitness determinants in the evolving viral population. Given that Ka/Ks measures reproductive fitness fairly directly, this is not unexpected. Most of the positively selected mutations observed were conservative amino acid changes (Table 1). Thus, while natural selection evidently favors amino acid changes at these positions, such changes appear to be constrained to preserve structure and function. New experiments will be required to assess whether any of these mutations acts as a primary cause of drug resistance or contributes to drug resistance via a secondary effect.
The situation for RT is even more complicated. Most of the known drug-resistant mutations are in the 5' polymerase coding region, particularly in the "fingers" (codons 1 to 85 and 118 to 155) and "palm" (codons 86 to 117 and 156 to 237) subdomains (19). We have identified codon positions with positive selection pressure not only in the fingers and palm subdomains but also in the "thumb" subdomain (codons 238 to 318), which has seldom been mentioned in research on drug-resistant mutations before. In addition to affecting drug resistance and virus fitness, some of the newly identified positions might be epitopes for cell-mediated immunity (13). The importance of all these codon positions needs to be experimentally examined.
The amino acid-specific Ka/Ks data show distinct patterns of positive selection. For example, at protease codon 12 several amino acid changes were positively selected (T12K, T12P, and T12S), resulting in an overall Ka/Ks value of 6.92 for the codon. By contrast, at position 48 a single amino acid change (G48V) was positively selected, while the other possible amino acid changes were negatively selected, resulting in an overall Ka/Ks for the codon of 0.24 (negative selection). Such specificity may reveal important functional constraints in the evolution of the enzyme. G48V is strongly selected for (Ka/Ks value of 5.06) and has been shown to cause drug resistance (19). It is striking that other amino acid replacements at this position are not also favored, implying a significant functional constraint.
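The contrast between a codon-level value and an amino acid-specific value can be illustrated with a small Python sketch. All counts and expected ratios below are hypothetical and chosen only to show the effect; they are not the values behind the codon 48 result. The sketch assumes that the pooled expected ratio for a codon is the sum of the expected ratios of its individual amino acid changes, which holds when every expected ratio shares the same synonymous denominator.

```python
def ka_ks(n_nonsyn: float, n_syn: float, expected_ratio: float) -> float:
    """Observed nonsynonymous/synonymous ratio divided by the ratio expected
    under a random mutation model."""
    return (n_nonsyn / n_syn) / expected_ratio


# Hypothetical codon: 100 synonymous observations and three possible
# amino acid changes, only one of which is frequently observed.
N_SYN = 100
OBSERVED = {"V": 60, "A": 2, "R": 3}        # observed counts per amino acid change
EXPECTED = {"V": 0.1, "A": 0.8, "R": 1.8}   # expected NY/Ns under no selection

per_substitution = {
    aa: ka_ks(OBSERVED[aa], N_SYN, EXPECTED[aa]) for aa in OBSERVED
}
pooled = ka_ks(sum(OBSERVED.values()), N_SYN, sum(EXPECTED.values()))

for aa, value in per_substitution.items():
    print(f"-> {aa}: Ka/Ks = {value:.2f}")
print(f"pooled codon Ka/Ks = {pooled:.2f}")
```

In this toy case one substitution is strongly favored while the pooled codon value stays well below 1, which is the kind of signal that pooling all mutations at a site would hide.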
This work was supported by U.C. Life Science Informatics grant 01-10090 and by funding from Specialty Laboratories, Inc. to C. Lee and L. Chen. C. Lee was supported by NIH grant 1P20MH065166-01.