Journal of Virology, May 2009, p. 4605-4615, Vol. 83, No. 9
0022-538X/09/$08.00+0 doi:10.1128/JVI.02017-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
Department of Paediatrics, Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, United Kingdom,1 Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3SY, United Kingdom,2 Partners AIDS Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts,3 Centre for Immunology, St. Vincent's Hospital, Sydney, Australia,4 Microsoft Research, One Microsoft Way, Redmond, Washington 9805,5 HIV Pathogenesis Programme, The Doris Duke Medical Research Institute, University of KwaZulu-Natal, Durban, South Africa,6 Howard Hughes Medical Institute, Chevy Chase, Maryland7
Received 24 September 2008/ Accepted 10 February 2009
Previous work has shown that HLA alleles, in particular those alleles associated with effective immune control of HIV, such as HLA-B*57 and HLA-B*27 (20, 21, 35), select a characteristic, predictable combination of escape mutations in the virus (8, 11, 25, 30, 38)—termed a "footprint" (31). The starting point for the present analysis is our observation, from previous studies of B-clade- and C-clade-infected cohorts (25, 29), that escape mutations selected in B-clade-infected individuals expressing HLA-B*57 frequently represented the consensus in C-clade sequences and that the consensus sequence in the AE clade bears many of the signature mutations of the HLA-B*57 footprint.
The aim of the present studies was to investigate the extent to which HLA footprints impact on HIV phylogeny, first by considering their relationship to sites of difference between clades and second by quantifying their influence on the clustering of sequences within a clade. We initially tested the hypothesis that amino acid differences between clades are commonly also those that represent escape variants associated with particular HLA alleles. To investigate this, we compared sites of interclade amino acid variability with sites at which HLA selection pressure had been previously identified from analysis of a C-clade-infected cohort of >700 study subjects in Durban, South Africa (30). The role of these variable sites in determining clade phylogeny was investigated by comparing the phylogenetic clustering of sequences in the presence and absence of the variable sites and also by swapping characteristic amino acid polymorphisms between clades.
Having established a potential role for HLA in driving amino acid sequence diversity between different clades, we next evaluated the extent to which HLA alleles might have an impact on viral evolution within a clade. We tested the hypothesis that characteristic combinations of CD8+ T-cell escape mutations might cause unrelated HIV nucleotide sequences of the same clade to cluster phylogenetically.
Impact of HLA selection on amino acid polymorphisms between clades. In order to investigate whether HLA footprints might account for some of the amino acid differences occurring between HIV type 1 (HIV-1) clades, we used clade A1, A2, AE, B, and C Gag, Pol, and Nef sequences published by the Los Alamos HIV database (www.hiv.lanl.gov). We selected these clades because A, B, and C account for the majority of infections worldwide and the most sequence data are available for these groups. In addition, we included analysis of CRF strain AE, in which Gag is derived from the A clade, to highlight the observation that these sequences show evidence of an HLA-B*57 footprint. We used the most recent (2004) consensus sequences for each clade to identify sites of amino acid differences between clade consensus sequences. We identified sites of HLA-mediated polymorphism by using previous analyses of C-clade Gag, Pol, and Nef sequences from 710 chronically infected adults (30). Fisher's exact test was used to test whether sites at which there is interclade variability are also sites of HLA-associated polymorphism. In order to determine the extent to which the results of our analysis are applicable despite the longitudinal HIV sequence changes that occur, we also compared ancestral HIV sequences for clades A1, B, and C (available at www.hiv.lanl.gov) to current consensus sequences for these three clades.
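The comparison described above reduces to a 2 × 2 contingency table: interclade-variable versus conserved sites, cross-classified by whether an HLA-associated polymorphism was identified at the site. The following is a minimal sketch of Fisher's exact test on such a table, assuming a two-sided test that sums the probabilities of all tables no more likely than the observed one (the text does not state which variant was used). The function name and the counts in the usage lines are illustrative placeholders, not values from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: variable / conserved sites. Columns: HLA-associated / not.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts, for illustration only.
p_value = fisher_exact_two_sided(12, 8, 15, 90)
print(f"two-sided P = {p_value:.3g}")
```

A small P value here would indicate that HLA-associated sites are over-represented among the variable sites relative to the conserved ones.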
Impact of clade-specific amino acid polymorphisms on clade phylogeny. To investigate the effect of the sites of interclade difference on the phylogenetic distinction between clades, we used Gag p24 sequences downloaded from www.hiv.lanl.gov, excluding clonal sequences. First, we sought to determine whether these variable sites are fundamental to defining clades. We randomly selected 20 taxa from each of clades A1, AE, B, and C and used the 12 taxa available for clade A2. We constructed neighbor-joining (NJ) phylogenetic trees from nucleotides in the presence and absence of the 27 codons at which we had identified amino acid differences between clades.
Second, we investigated whether amino acid polymorphisms characteristic of one clade alter phylogenetic clustering when superimposed onto sequences from another clade. We selected 20 sequences at random from clades A1, B, and C. Sequences were then modified at sites at which we had identified variability between clades (these sites and their HLA associations are shown in Fig. 1) by substituting the consensus codon for one clade for the same site in taxa selected from another clade. NJ phylogenetic trees were constructed by using the original nucleotide sequences for each clade plus the altered sequences bearing the characteristic codons of a different clade.
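The first of these two analyses requires an alignment from which the variable codons have been removed before tree building. A minimal sketch of that preprocessing step is given below, assuming in-frame, gap-free nucleotide sequences and 1-based codon numbering; the function name, the toy sequence, and the codon numbers are hypothetical and do not correspond to the 27 sites used in the study.

```python
def remove_codons(seq, codon_numbers):
    """Return seq with the listed codons (1-based numbering) deleted."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    drop = set(codon_numbers)
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(c for n, c in enumerate(codons, start=1) if n not in drop)


# Toy 693-nucleotide sequence (231 codons) standing in for p24 Gag.
toy_p24 = "ACG" * 231
# Placeholder codon numbers; the real analysis would list the variable sites.
stripped = remove_codons(toy_p24, range(1, 28))
print(len(toy_p24), len(stripped))
```

Removing 27 codons from a 693-nucleotide sequence leaves 612 nucleotides, which matches the sequence lengths quoted for the two trees in Fig. 3.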
FIG. 1. Sites of interclade variability and HLA-associated polymorphisms in p24 Gag. Consensus sequences (as of 2004) for HIV clades A1, A2, AE, B, and C are shown. Sites of interclade variability are marked with gray bars, and x indicates the sites of HLA-B-mediated selection pressure identified by analysis of subjects with C-clade infections in Durban (30). The four HLA-B*57/5801 epitopes in p24 Gag are enclosed in open boxes, with arrows indicating sites of HLA-B*57/5801-associated escape mutation that have been described in previous studies of B- and C-clade infections (6, 29, 30). There is a correlation between sites of interclade variability and sites of HLA-B-mediated selection; P < 10^-6 (Fisher's exact test).
Relationship between sites of HLA selection within a clade. We used C-clade population sequences from Gag, Pol, and Nef to investigate the relationship between entropy and HLA footprint sites. Sequences with gaps of more than five consecutive amino acids were removed from the analysis in order to avoid false overestimation of entropy due to missing data. Total sequence numbers for this analysis were as follows: p17 Gag, 584; p24 Gag, 646; p15 Gag, 421; protease, 402; reverse transcriptase (RT), 254; integrase, 244; Nef, 424. The Shannon entropy for each amino acid residue was calculated. We adopted a conservative approach by then excluding sites at which there is no amino acid variation (entropy score = 0) from further analysis. Sites of HLA-associated amino acid polymorphism were determined from our previously published analysis (30). We compared entropy at sites of HLA-A, -B, and -Cw selection to entropy at sites where no HLA selection pressure had been identified. Significant difference between the entropy scores of these two groups of sites was sought with a Mann-Whitney test.
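The per-residue entropy calculation can be sketched as follows. This is an illustrative implementation only: it assumes log base 2 and assumes that gap characters are ignored within a column, neither of which is specified in the text, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def column_entropies(alignment):
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [shannon_entropy(col) for col in zip(*alignment)]


# Toy amino acid alignment (not study data).
toy_alignment = ["KAFS", "KAFS", "KGFN", "KGFN"]
scores = column_entropies(toy_alignment)
# Invariant sites (entropy 0) are dropped before any comparison, as in the text.
variable_sites = [i for i, h in enumerate(scores, start=1) if h > 0]
print(scores, variable_sites)
```

The resulting scores for HLA-associated and non-associated sites would then be passed to a rank-based two-sample test such as the Mann-Whitney test.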
Sequences used to assess phylogenetic clustering mediated by footprints of HLA-B*5703 and HLA-B*2705. We used population sequences from C-clade Gag and Nef to evaluate the impact of the HLA-B*5703 footprint and from B-clade Gag to evaluate the impact of the HLA-B*2705 footprint. We selected C-clade taxa from a pool of 566 Gag sequences (p17 and p24, 1,080 nucleotides) and 443 Nef sequences (621 nucleotides). HLA-B*5703 was present in 35 subjects with Gag sequences and 16 subjects with Nef sequences. This allele commonly selects five escape mutations in Gag (8, 11, 25) and two in Nef (30) (for the mutations and their frequencies, see Fig. S1 in the supplemental material).
In our B-clade analysis, 6 subjects from the acute infection cohort and 19 subjects with chronic infection had HLA-B*27. The HLA-B*27 footprint comprises three mutations selected in chronic infection, at Gag positions 173, 264, and 268 (37, 38). To contribute to a pool of HLA-B*27-negative subjects, we selected sequences from www.hiv.lanl.gov expressing the wild-type amino acid at HLA-B*27 footprint sites. Due to limited sequence availability in the chronic infection cohort (38), phylogenetic trees were restricted to taxa of 330 nucleotides length.
Quantification of phylogenetic clustering. In order to quantify phylogenetic clustering, we adopted a maximum-parsimony approach. Our method calculates the minimum number of mutations required to produce an evolutionary history consistent with the specified amino acid pattern. We constructed NJ phylogenetic trees from nucleotide sequences and mapped all of the amino acid changes at the specified HLA footprint sites on this tree with the MacClade parsimony algorithm (28). The minimum number of mutations (parsimony score) for each of the footprint sites was then summed, giving a total minimum number of evolutionary changes. As a comparator, we calculated the equivalent parsimony score from the same sequences in the absence of the footprint sites. In order to assess the impact of the footprint mutations on clustering, we calculated the difference between the parsimony scores generated in the presence and absence of the footprint mutations for each pool of 100 sequences. A greater degree of phylogenetic clustering is reflected by a smaller parsimony score. Thus, as phylogenetic clustering accumulates, there is an increase in the difference between parsimony scores in the presence and absence of footprint sites; if no phylogenetic clustering is brought about by the footprint, we would expect this difference to be zero.
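The idea of a parsimony score can be illustrated with Fitch's algorithm, which counts the minimum number of state changes needed to explain the tip states at one site on a given tree. This sketch is a stand-in for the MacClade calculation, not a reproduction of it: it assumes a rooted, strictly binary tree written as nested tuples and an unordered character, and all names and tip states are hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one site on a rooted binary tree.

    tree   -- nested 2-tuples whose leaves are taxon names (strings)
    states -- dict mapping taxon name to its amino acid at the site
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:
            return common
        changes += 1  # no shared state, so one change is needed on this split
        return a | b

    visit(tree)
    return changes


# Toy footprint site: taxa s1 and s2 carry the escape variant N.
site = {"s1": "N", "s2": "N", "s3": "T", "s4": "T"}
tree_with_sites = (("s1", "s2"), ("s3", "s4"))     # footprint taxa cluster
tree_without_sites = (("s1", "s3"), ("s2", "s4"))  # footprint taxa scattered
difference = fitch_score(tree_without_sites, site) - fitch_score(tree_with_sites, site)
print(difference)
```

In this toy case the clustered topology needs one change and the scattered topology needs two, so the difference is positive, which is the direction the text associates with footprint-driven clustering. Summing the score over all footprint sites would give the total used in the comparison.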
Phylogenetic clustering of sequences bearing an HLA-B*5703 footprint. We evaluated the phylogenetic clustering of sequences bearing an HLA-B*5703 footprint by generating a data set containing 100 C-clade Gag sequences. These were selected at random to comprise 20 HLA-B*5703-positive and 80 HLA-B*5703-negative individuals from the aforementioned pool of sequence data. We then repeated the process of selecting 100 Gag sequences at random 100 times over, constructing an NJ tree for each data set, and quantifying phylogenetic clustering, as described above, in the presence and absence of the five HLA-B*5703 footprint sites. In order to account for the statistical variance arising from phylogeny estimation, we also used one set of these taxa (100 Gag sequences) to generate 100 bootstrap trees and quantified phylogenetic clustering in the presence and absence of the footprint sites.
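The resampling scheme for Gag can be sketched as below. The sketch assumes sampling without replacement within each data set and independent draws across data sets; the sequence identifiers are placeholders, and the negative pool size of 526 is the figure quoted elsewhere in the text for HLA-B*5703-negative Gag sequences. Tree construction and scoring are deliberately left out.

```python
import random


def draw_dataset(positive_ids, negative_ids, n_pos=20, n_neg=80, rng=random):
    """Draw one data set of n_pos positive and n_neg negative sequence IDs."""
    return rng.sample(positive_ids, n_pos) + rng.sample(negative_ids, n_neg)


def draw_replicates(positive_ids, negative_ids, n_reps=100, seed=0):
    """Draw n_reps independent data sets with a reproducible seed."""
    rng = random.Random(seed)
    return [draw_dataset(positive_ids, negative_ids, rng=rng) for _ in range(n_reps)]


# Placeholder identifiers standing in for real sequence names.
positives = [f"B5703pos_{i}" for i in range(35)]
negatives = [f"B5703neg_{i}" for i in range(526)]
replicates = draw_replicates(positives, negatives)
print(len(replicates), len(replicates[0]))
```

Each replicate would then be aligned, used to build an NJ tree, and scored in the presence and absence of the footprint sites.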
To assess clustering mediated by HLA-B*5703 footprinting in Nef, we generated 50 randomized data sets, each containing 100 C-clade taxa. Each data set contained the same 16 sequences from HLA-B*5703-positive patients (due to limited data, we did not randomize these sequences) and 84 selected at random from HLA-B*5703-negative patients. We used the same methods as described for Gag to quantify phylogenetic clustering in the presence and absence of footprint sites.
Simulation of phylogenetic clustering according to a varying number of footprint mutations and an altered footprint frequency. To further explore the phylogenetic impact of the HLA-B*5703 footprint, we used the methods described above to quantify the phylogenetic impact of varying the number of mutations per sequence and the proportion of sequences bearing mutations. To quantify the impact of between one and five footprint mutations, we used 100 Gag sequences selected at random from HLA-B*5703-negative subjects. We artificially superimposed a characteristic, conserved HLA-B*5703 footprint on 20 sequences from this pool of 100, starting with the mutation site with the strongest HLA-B*5703 association (defined by Fisher's exact test; see Fig. S1 in the supplemental material). We added the full footprint of five mutations, one site at a time, to each of the 20 selected taxa, quantifying phylogenetic clustering after the addition of each mutation and repeating this process of random sequence selection and addition of a conserved footprint 20 times. To explore how the proportion of sequences bearing the mutations impacts upon phylogenetic clustering, we repeated this analysis with all available HLA-B*5703-negative sequences (n = 526). We added sequential footprint mutations to 5%, 10%, and 20% of all taxa, selected at random, and repeated this process 10 times.
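Superimposing a conserved footprint one site at a time amounts to replacing specific codons in each selected nucleotide sequence. A minimal sketch follows, assuming in-frame sequences and 1-based codon numbering; the codon positions and replacement codons shown are hypothetical placeholders rather than the footprint listed in Fig. S1.

```python
def set_codon(seq, codon_number, codon):
    """Return seq with the 1-based codon_number replaced by codon."""
    start = 3 * (codon_number - 1)
    if len(codon) != 3 or start < 0 or start + 3 > len(seq):
        raise ValueError("codon must be 3 nt and lie within the sequence")
    return seq[:start] + codon + seq[start + 3:]


def sequential_footprints(seq, footprint):
    """Return the sequence after adding 1, 2, ... footprint mutations in order.

    footprint -- list of (codon_number, codon), ordered from the strongest
                 HLA association to the weakest
    """
    versions = []
    current = seq
    for codon_number, codon in footprint:
        current = set_codon(current, codon_number, codon)
        versions.append(current)
    return versions


# Toy 3-codon sequence and a made-up two-site footprint.
steps = sequential_footprints("ATGGCTTCT", [(2, "GGT"), (3, "AAT")])
print(steps)
```

Applying the same ordered footprint to each of the selected taxa, and rebuilding the tree after every step, gives the series of trees whose parsimony scores are compared.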
Phylogenetic clustering of Gag sequences bearing an HLA-B*27 footprint. We investigated the impact of the HLA-B*27 footprint on viral phylogeny by the same methods described above. We modeled the phylogenetic impact of this footprint by using taxa from HLA-B*27-negative subjects from our C-clade cohort. We randomly selected 100 Gag sequences and superimposed the characteristic three-site HLA-B*27 footprint (37, 38) on 20 sequences selected at random, repeating this process 20 times.
We then assessed phylogenetic clustering among sequences from subjects from B-clade cohorts by using subjects truly expressing HLA-B*27. The HLA-B*27 footprint is selected late in infection (14, 38), so we compared clustering among subjects with acute and chronic infections. Due to limited sequence numbers, we did not perform randomizations but constructed a single ML tree with 100 nucleotide sequences. These taxa consisted of all sequences from subjects expressing HLA-B*27 (n = 6 in the acute infection cohort and n = 19 in the chronic infection cohort), with the remaining sequences selected at random from the pool of HLA-B*27-negative subjects.
A strong association was seen between sites of interclade amino acid variability and sites of selection pressure mediated by HLA-A, -B, and -C (P = 4.1 × 10^-7, P = 9.8 × 10^-19, and P = 1.3 × 10^-5, respectively; Table 1). Consistent with previous studies (21), the strongest association with interclade amino acid differences was for sites of selection pressure mediated by HLA-B in p24 Gag (P = 5.0 × 10^-10) (Table 1 and Fig. 1 and 2a). This remained significant even when HLA-B*57/5801 was excluded from the analysis (P = 3.6 × 10^-8), demonstrating that the association is not limited to these immunodominant alleles. Overall, 59% of the sites of interclade p24 Gag variability were also identified as sites of HLA-B-driven selection pressure (Fig. 1). For the Pol and Nef proteins, 18% of the variable residues were also sites of HLA-B-driven escape mutation (example shown in Fig. 2b).
TABLE 1. Correlation between sites of HLA-mediated immune selection pressure and sites of interclade variability in clades A1, A2, AE, B, and C
FIG. 2. Proportion of amino acid residues at which HLA-B-associated polymorphisms are detected. All amino acids in each protein are represented, divided into "variable residues" at which there are interclade differences in amino acids and "conserved residues" that are identical among consensus sequences for clades A1, A2, AE, B, and C. The proportion of each of these sites at which HLA-B selection has been previously identified (30) is shown in gray. (a) p24 Gag. (b) RT. P values were calculated with Fisher's exact test.
We also investigated whether HLA footprint sites are associated with sites of sequence variability within a clade. We used C-clade sequences to compare Shannon entropy for sites at which there is no known HLA selection to sites at which there is HLA-associated polymorphism (as identified previously [30]). As expected, we found significantly higher entropy scores at sites of HLA-mediated selection, compared to sites at which no HLA selection is detected (see Fig. S2 in the supplemental material); this relationship was most consistent for HLA-B (Table 2).
TABLE 2. Relationship between Shannon entropy at sites of HLA-associated amino acid polymorphism compared to sites at which no HLA selection was detected in C-clade sequences
FIG. 3. Phylogenetic trees illustrating the preservation of clade phylogeny in the presence and absence of sites of clade-specific difference. Sequences from clades A1, A2, AE, B, and C were selected at random from www.hiv.lanl.gov, and ML phylogenetic trees were constructed (midpoint rooted, bootstrap values based on 100 replicates). Each clade is enclosed within a box to show clustering in the presence and absence of sites that vary between clades. The 2004 consensus sequence for each clade is marked by x. (A) Tree constructed from complete p24 sequences (693 nucleotides). (B) Tree constructed from p24 sequences in the absence of 27 codons at which there is amino acid difference between clade consensus sequences (612 nucleotides).
FIG. 4. Phylogenetic trees illustrating altered distribution of taxa when codons determining clade-specific differences are swapped between clades. Twenty sequences were selected at random from the clades indicated within dashed boxes. ML trees were constructed from nucleotides (midpoint rooted, bootstrap values based on 100 replicates). (A) Codons defining amino acids characteristic of clade A1 (selected from the clade A1 consensus) were superimposed on 20 sequences from clade B. The clade B sequences are shown twice, once without alteration (marked B) and once with A1-clade amino acids superimposed (marked B+A1). (B) Codons defining amino acids characteristic of clade C (selected from the clade C consensus) were superimposed on the same 20 sequences from clade B. As before, the clade B sequences are shown twice, unchanged (marked B) and bearing the characteristic C-clade codons (marked B+C).
Phylogenetic clustering can be mediated by an HLA-B*5703 footprint in Gag. Previous analysis of C-clade sequences found that the HLA allele with the greatest number of associated HIV sequence polymorphisms was HLA-B*5703, with five strong associations in Gag alone (q < 0.05; P < 10^-6; see Fig. S1 in the supplemental material) (30). Therefore, we initially focused on HLA-B*5703 to determine the potential impact of a single allelic footprint on viral evolution within a clade.
To determine whether HLA-B*5703-mediated selection pressure has a potential impact on viral evolution, we assessed the phylogenetic clustering of C-clade Gag and Nef sequences. Phylogenetic clustering among 100 Gag sequences, of which 20 were taken from subjects expressing HLA-B*5703, was quantified in the presence and absence of the five HLA-B*5703 footprint sites. In the absence of these five codons, phylogenetic clustering of the sequences was reduced. When this analysis was repeated with 100 randomized data sets, the clustering effect mediated by the HLA-B*5703 footprint sites was found to be highly statistically significant (P < 0.0001 [paired t test]; Fig. 5). In 100 bootstrap trees generated from a single data set of 100 taxa, this clustering effect remained equally significant (P < 0.0001 [paired t test]; data not shown). Such clustering occurs even though some HLA-B*5703-negative patients may also possess HLA-B*5703 footprint mutations as a result of transmission (2) or selection by non-HLA-B*5703 alleles (19; see Fig. S1 in the supplemental material).
FIG. 5. Phylogenetic clustering in sequences from individuals with HLA-B*5703 with clustering analyzed in the presence and absence of the footprint sites. Phylogenetic clustering mediated by an HLA-B*5703 footprint was assessed in 100 NJ trees for Gag (with 20% of the sequences from subjects with HLA-B*5703) and 50 for Nef (with 16% of the sequences from subjects with HLA-B*5703). The mean difference in parsimony score is plotted, with error bars showing 95% confidence intervals. A significantly greater degree of phylogenetic clustering was observed in the presence of HLA-B*5703 footprint sites than in the absence of these sites for both proteins (P < 0.0001; paired t test).
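The paired comparison summarized in Fig. 5 can be sketched as a paired t statistic computed over the randomized data sets, where each data set contributes one parsimony score with the footprint sites and one without. The code below is illustrative only: the function name and the scores are made up, and converting the statistic to a P value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_without, scores_with):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(scores_without) != len(scores_with) or len(scores_with) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_without, scores_with)]
    n = len(diffs)
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (spread / sqrt(n))
    return mean(diffs), t, n - 1


# Made-up parsimony scores for five randomized data sets.
without_sites = [44, 47, 45, 49, 46]
with_sites = [40, 42, 43, 44, 41]
print(paired_t(without_sites, with_sites))
```

A mean difference above zero with a large t statistic corresponds to the pattern reported in the figure, in which fewer changes are needed when the footprint sites contribute to the tree.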
Progressive phylogenetic clustering of Gag sequences occurs as sequential HLA-B*5703 footprint mutations are added. In order to quantify more specifically the impact of accumulating HLA-selected mutations, we conducted a simulation of HLA-B*5703 footprinting by using 100 sequences from HLA-B*5703-negative subjects and sequentially adding the five HLA-B*5703 footprint mutations to 20 of them selected at random. Representative phylogenetic trees constructed from these sequences are shown (Fig. 6), demonstrating progressive clustering between sequences bearing the footprint as more mutations were added. Quantifying this clustering by the methods described above, we found that progressively fewer mutations were required to explain the phylogeny as more footprint polymorphisms were added, reflecting increasing clustering of footprint-bearing taxa (Fig. 7A). Thus, significant phylogenetic clustering can arise as a consequence of imposing even a partial HLA-B*5703 footprint on randomly chosen Gag sequences.
FIG. 6. Phylogenetic clustering as a consequence of artificial imposition of HLA-B*5703 mutations on HLA-B*5703-negative Gag sequences. ML phylogenetic trees were constructed from 100 Gag sequences (selected at random from a pool of HLA-B*5703-negative individuals). The same 100 sequences are represented in each tree, and the same 20 are marked with arrows. (A) Twenty taxa were selected at random (marked with arrows). In this panel, no footprint mutations have been added. (B) The same sequences after the addition of three HLA-B*5703 footprint mutations to the arrowed sequences. (C) The same sequences after the addition of five footprint mutations. Progressive phylogenetic clustering among sequences bearing the HLA-B*5703 footprint is evident.
FIG. 7. Model to show difference in phylogenetic clustering as five HLA-B*5703 footprint mutations were superimposed on Gag sequences. Each panel shows the mean difference in parsimony score between trees with no mutations and trees built from the same taxa with sequential HLA-B*5703 footprint mutations superimposed. Addition of mutations increases the difference in parsimony scores, reflecting progressive phylogenetic clustering (error bars show 95% confidence intervals; r^2 from linear regression with the line constrained to pass through the origin). (A) One hundred Gag sequences were selected at random from HLA-B*5703-negative individuals and used to construct an NJ phylogenetic tree. Five characteristic HLA-B*5703 footprint mutations were added, one at a time, to 20 sequences in each tree. Twenty repetitions are shown. (B) All 526 taxa from HLA-B*5703-negative subjects were used. The footprint mutation was added to a varying proportion of sequences (individual sequences selected at random). Ten repetitions are shown.
By quantification of phylogenetic clustering with a larger pool of sequences (n = 526) with a variable proportion bearing the footprint, we found that the degree of clustering increases as the proportion of footprint-bearing sequences is increased (Fig. 7B). Even with only 5% of the sequences bearing the footprint (comparable to the true population phenotypic frequency of HLA-B*5703 in Durban), there is still a significant increase in clustering as the footprint mutations accumulate (r^2 = 0.58, P < 0.0001 [linear regression]).
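A regression constrained to pass through the origin, as used for Fig. 7, can be sketched as follows. Note one assumption in particular: for a no-intercept fit, r^2 is defined differently by different software packages, and this sketch uses the uncentered form (1 minus the residual sum of squares over the sum of squared y values), which may not be the definition behind the reported value. The function name and data points are hypothetical.

```python
def fit_through_origin(x, y):
    """Least-squares slope for y = slope * x, plus an uncentered r^2."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain a non-zero value")
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    return slope, 1 - ss_res / syy


# Made-up data: number of footprint mutations added vs. parsimony difference.
mutations_added = [1, 2, 3, 4, 5]
score_difference = [0.8, 2.1, 2.9, 4.2, 4.9]
slope, r_squared = fit_through_origin(mutations_added, score_difference)
print(f"slope = {slope:.3f}, r^2 = {r_squared:.3f}")
```

Forcing the line through the origin encodes the expectation stated in the methods that the difference in parsimony score is zero when no footprint mutations are present.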
Phylogenetic clustering can be mediated by an HLA-B*5703 footprint in Nef. In order to examine the effect of a smaller HLA-B*5703 footprint in a more variable protein than Gag, we repeated the same analysis with Nef, which contains two polymorphisms associated with HLA-B*5703 (see Fig. S1 in the supplemental material) (30). We first investigated the true biological footprint, and subsequently imposed an artificial footprint of both mutations, by the same methods described above. Significant phylogenetic clustering of Nef sequences from B*5703-positive individuals was again seen as a consequence of shared footprint mutations (P < 0.0001 [paired t test]; Fig. 5).
Phylogenetic clustering can be mediated by an HLA-B*27 Gag footprint in chronic infection. Having established the significant phylogenetic impact of a single HLA allele that imposes a substantial footprint on the virus, we sought evidence of phylogenetic clustering mediated by a second allele, HLA-B*2705. HLA-B*2705 is also associated with immune control of HIV but selects a smaller footprint of three mutations in Gag, arising late in the course of infection (14, 38). We observed significant phylogenetic clustering when we superimposed the characteristic footprint of three mutations (37, 38) on C-clade Gag sequences (P < 0.0001 [paired t test]; see Fig. S4a in the supplemental material). Analysis of the impact of the HLA-B*2705 Gag footprint arising as a result of natural selection showed, as expected, no clear evidence of phylogenetic clustering among HLA-B*2705 subjects in acute infection, since the escape mutations characteristically arise late (14; see Fig. S4b in the supplemental material). In contrast, in our analysis of sequence data from subjects with chronic infection (38), there is clustering of taxa from HLA-B*2705-positive subjects (see Fig. S4c in the supplemental material), indicating that the selection pressure imposed by this allele can also drive convergent evolution over time.
Our observation that HIV amino acid differences between clades tend to be those that are also selected as CD8+ T-cell escape mutations has two possible explanations, (i) that these sites vary because of HLA selection or (ii) that sites of HLA escape are less constrained by a fitness cost to the virus. Both explanations may contribute to the observed findings. Analysis of the AE subtype that predominates in Thailand suggests that the former explanation may operate at least under some circumstances. The Gag mutations A163G and S165N are selected by the HLA-B*5703 CD8+ T-cell response to the epitope KAFSPEVIPMF (Gag 162 to 172) in C-clade infection (8). When the A163G mutant arises alone, it significantly reduces viral replicative capacity in vitro and reverts rapidly to the wild type in vivo in the absence of HLA-B*5703 (8). More commonly, A163G is found in association with the compensatory mutation S165N (8). The observation that T242N, the one mutation with an uncompensated fitness cost, does revert suggests that the other polymorphisms (presumably with compensatory mutations) do not revert because there is no fitness cost rather than as a consequence of ongoing selection pressure. We hypothesize that the founder strain may have been transmitted by an individual with HLA-B*5703, that is, that the AE subtype of HIV came to incorporate A163G/S165N as a consequence of HLA-B*5703 being the original driving force.
Similar observations can be made from the less extensive characterization of HCV-specific CD8+ T-cell responses: an escape mutation (Y1444F) within an immunodominant HLA-A*01 epitope in NS3 has been shown to accumulate in cohorts infected with genotypes 1 and 3 (23, 33). In the case of HLA-B*27, the immunodominant epitope in NS5B in genotype 1 infection may undergo a double mutation that is associated with immune escape and loss of protection (34). These mutations form the consensus sequence of genotypes 2, 3, and 6 (Los Alamos Hepatitis C database; http://hcv.lanl.gov).
The contribution of HLA selection to amino acid sequence variation may be substantial, with 59% of the variation between Gag clade consensus sequences associated with sites of HLA-B selection. Moreover, these results are likely to be a considerable underestimate, as the HLA escape mutations we have considered here are limited to those identified in our Durban cohort (30). Sites that are under positive selection may also be identified by calculating the dN/dS ratio (9, 10), and this approach could be used to generate additional evidence that differences between clades relate to HLA selection. However, the results of dN/dS ratio calculations vary as a function of the frequency of the selecting allele, are affected by the rate of reversion, and are difficult to apply to large populations; for these reasons, this approach is outside the scope of the present analyses.
The substantial overlap between sites of amino acid variability and sites of HLA-driven escape mutation is of relevance to T-cell-based vaccines. These sites of amino acid variability are not simply "toggle" sites (12) of little significance to T-cell recognition. On the contrary, toggling between amino acid variants may be a function of positive selection (9), and our data suggest that vaccine constructs may need to be matched to the clade of virus prevailing in the target population.
We have demonstrated that sites of amino acid difference between clades are not required to distinguish clade specificity, suggesting that clades were originally defined by nucleotide differences between founder sequences. Stripping sites of only nonsynonymous nucleotide substitutions is somewhat artificial, since nonsynonymous and synonymous changes may arise in the same codon. However, the finding that clade clustering is preserved in the absence of nonsynonymous changes is consistent with the previous observation that all HIV clades may be traced to ancestral sequences from the same region of Africa (39, 40), rather than having arisen subsequently as a consequence of HLA selection. Nevertheless, we have also shown that exchange of polymorphisms between clades does affect phylogeny, underscoring the potential for HLA selection to shape the future evolution of the epidemic.
We have shown that otherwise unrelated HIV sequences within a clade may cluster together phylogenetically as a consequence of selection pressure imposed by even a single HLA allele, both in true sequence data and in a model of serial mutations. This simulated approach is robust because the underlying sequences are altered only at footprint sites, and the mutations themselves are inferred from genuine sequence data. The model shows more phylogenetic clustering than that seen among the true sequences as a consequence of complete conservation of the mutations applied in the simulation compared to variations in the true sequences (number of mutations, sites of mutations, and nucleotide substitutions are all conserved in the model but vary in real sequences). Irrespective of this, we show that clustering mediated by the HLA-B*5703 footprint sites remains highly statistically significant in true biological sequences. The significance of this clustering effect is all the more striking when considered in the context of the enormous diversity of HIV and the potential for multiple HLA footprints to coexist.
This observation is of potential utility when considering the use of lineage-corrected approaches in the detection of HLA-mediated selection pressure on viruses. Statistical approaches to identify associations between HIV sequence polymorphisms and HLA alleles (21, 32) have been refined to account for a founder effect (4): lineage-based methods correct for similarities among taxa generated by their common ancestry, thus distinguishing bona fide HLA-escape mutations from artifactual associations mediated by founder effects (18, 36). Conversely, we show here that viral amino acid polymorphisms arising independently in individuals who share an HLA allele (29, 31, 38) may not be identified as independent mutations (homoplasies) in phylogenetic reconstruction but instead can mistakenly appear as shared ancestral mutations (synapomorphies). This may result in sequences that share common escape mutations being artificially grouped together during phylogenetic reconstruction. This phylogenetic bias has previously been addressed in the setting of HIV adaptation to neutralizing antibodies (18) and drug therapy (24) but not in the context of adaptation to CD8+ T-cell responses. This confounding effect can be minimized by excluding the nucleotides under analysis (footprint sites) when constructing the phylogenetic tree. However, this requires a priori knowledge of the sites of escape mutation. In the absence of this information, phylogenetic clustering is likely to be reduced by maximizing the length of the sequence analyzed. This is relevant to many studies that carry out phylogenetic analysis with short protein fragments (e.g., protease, 99 amino acids). Artifactual clustering is likely to be a particular problem for within-clade HIV data sets, which are characterized by many phylogenetically uninformative singleton polymorphisms.
In conclusion, these data support a role for both founder effects and HLA selection in establishing the epidemic and suggest that future HIV evolution—within and between clades—may be significantly shaped by the HLA alleles to which the virus is exposed. CD8+ T-cell vaccines may therefore need to be geared to the clade of virus affecting the target population and modified over time to keep pace with evolutionary changes in the virus driven by HLA.
This work was funded by the National Institutes of Health (contracts NO1-A1-15422, 2RO1AI46995-06, and R01A1067073), the Wellcome Trust (A.L. and P.G.), the UK Medical Research Council (P.M. and A.P.), and the Mark and Lisa Schwartz Foundation. P.G. is an Elizabeth Glaser Pediatric AIDS Foundation Scientist.
Published ahead of print on 25 February 2009.
Supplemental material for this article may be found at http://jvi.asm.org/.