Journal of Virology, February 2003, p. 2587-2599, Vol. 77, No. 4
0022-538X/03/$08.00+0 DOI: 10.1128/JVI.77.4.2587-2599.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
HIV-1 Molecular Virology and Bioinformatics Laboratories, Africa Centre for Health and Population Studies and the Nelson R. Mandela School of Medicine,1 Centre for HIV/AIDS Networking,2 Department of Dermatology,5 Department of Virology, University of Natal,6 Medical Research Council, Durban,3 Department of Medical Virology, University of Stellenbosch and Tygerberg Hospital, Tygerberg, South Africa,4 Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom7
Received 29 July 2002/ Accepted 7 November 2002
Recent studies suggest that subtype C is spreading northward into the Congo, Tanzania, Burundi, and Kenya, where it is becoming increasingly predominant relative to other subtypes (24, 28, 54). C viruses also dominate the rapidly expanding epidemic in India (59) and are increasing in frequency in China (15, 54, 76) and Brazil (4, 64). C/D recombinants have been identified in several countries, including Tanzania, Kenya, and India (18, 33, 52), and C/B recombinants have been detected in China (73).
The reasons for the increase in HIV-1 C are not known but may be related to host, viral, or socioeconomic factors. At the viral level, it has been suggested that an extra NF-κB binding site in the long terminal repeat may enhance gene expression, altering the transmissibility and pathogenesis of C viruses (66). Others have suggested that C viruses may be more stable and that their protease genes may have increased catalytic activity relative to other subtypes (72). Additional features of subtype C include a five-amino-acid insertion in the transmembrane domain of Vpu (42), a prematurely truncated second exon of rev (15, 54, 78), and an increase in amino acid variation at protease cleavage sites (T. de Oliveira et al., submitted for publication).
Recent advances in sequencing and bioinformatics (9, 48, 49, 74) make it easier to analyze full-length HIV-1 sequences and correlate the genetic information with the immunological and biological properties of the virus. These advances, combined with the development of promising vaccine candidates and simplified, more affordable drug regimens, are paving the way for enhanced prevention and treatment efforts in southern Africa. As with HIV-1 B, it is expected that safe and efficacious treatment of C infections will not only reduce the morbidity and premature death associated with HIV-1 and AIDS (16, 22, 27, 46) but will also play a role in reducing transmission (23).
Since we are on the brink of implementing intervention strategies in a region of the world where subtype C infections predominate, it is urgent that we collect information that will help define the phylogenetic relationships, transmissibility, and drug responsiveness of C viruses. In this study, we analyzed the C2V5 and pol subgenomic regions of 72 contemporary viruses from KwaZulu-Natal and compared the results with those for 18 retrospective C isolates from South Africa.
Viral load and CD4+ cell counts. RNA was extracted from plasma and dried blood spots with a guanidinium-silica method (Nuclisens isolation kit; Organon Teknika) and an automated extractor (Organon-Teknika). Virus levels were measured with the Nuclisens HIV-1 QT kit, an assay with a quantitative range of 40 to >500,000 copies of HIV-1 RNA/ml of plasma. When applied to 50 µl of dried blood, the lower limit of detection is 1,600 HIV-1 RNA copies/ml of blood. Specificity of the method has been previously assessed and shown to be greater than 98.9% (6). CD4+ cell counts in venous blood were determined according to a standard FACSCount method.
Sequencing of the envelope C2V5 region. Sequencing of env was performed directly on a 621-bp PCR product generated from the C2V5 region (nucleotides 7026 to 7646, relative to HXB2) (31). RNA was extracted from plasma with the ViroSeq method (Applied Biosystems). Plasma RNA and Nuclisens-extracted dried blood spot RNA were reverse transcribed to cDNA with Superscript II and random hexamer primers (Invitrogen Corp., San Diego, Calif.). The RNA template and random primers (100 ng) were heated to 70°C for 10 min, chilled on ice, and reverse transcribed at room temperature in a 20-µl reaction volume containing 1x reaction buffer, 10 mM dithiothreitol, 0.5 mM each deoxynucleoside triphosphate, and 200 U of Superscript reverse transcriptase (Invitrogen) at 42°C for 50 min, followed by 15 min at 70°C.
The C2V5 env region was amplified from the cDNA with MK605 (5'-AATGTCAGCACAGTACAATGTACAC-3'; positions 6945 to 6969) and CD4R2 (5'-TATAATTCACTTGTCCAATTGTCC-3'; positions 7652 to 7675) as outer primers (5) and (M13F)-ES7 (5'-tgtaaaacgacggccagtCTGTTAAATGGCAGTCTAGC-3'; positions 7002 to 7021) and (M13R)-ES8 (5'-caggaaacagctatgaccCACTTCTCCAATTGTCCCTCA-3'; positions 7648 to 7668) as inner primers. The first and second PCR steps were carried out in final volumes of 25 µl and 50 µl, respectively, containing 1x PCR buffer, 2.0 mM MgCl2, 0.2 mM each deoxynucleoside triphosphate, 2.5 pmol of each primer, and 1.25 U of Amplitaq Gold. The PCR conditions were 95°C for 13 min, followed by six cycles at 95°C for 30 s, 65°C for 45 s, and 72°C for 60 s, with a decrease of 1°C per cycle. This was followed by 29 cycles at 95°C for 30 s, 60°C for 45 s, and 72°C for 60 s, with an increase of 5 s for each extension cycle, and a final extension of 72°C for 10 min. Amplified DNA was visually quantified by agarose gel electrophoresis, purified on a Microcon (Amicon) spin column, and sequenced on an automated 3100 genetic analyzer (Applied Biosystems Inc., Foster City, Calif.) with M13 sequencing primers and a Big-Dye terminator cycle sequencing kit.
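As an illustration only, the cycling program described above can be written out cycle by cycle. The sketch below uses a hypothetical helper, `touchdown_schedule`, which is not part of any instrument software. It assumes that the 1°C decrease applies to the annealing temperature of the six initial cycles and that the 5-s increase is added to the extension step of each successive cycle in the 29-cycle stage, starting from 60 s.

```python
def touchdown_schedule():
    """Return one (denature_s, anneal_temp_c, anneal_s, extend_s) tuple per cycle."""
    cycles = []
    # Stage 1: six touchdown cycles; annealing starts at 65 C and drops 1 C per cycle
    # (assumption: the decrease applies to the annealing step).
    for i in range(6):
        cycles.append((30, 65 - i, 45, 60))
    # Stage 2: 29 cycles with 60 C annealing; the extension time grows by 5 s per cycle
    # (assumption: the first cycle of this stage uses the 60-s baseline).
    for i in range(29):
        cycles.append((30, 60, 45, 60 + 5 * i))
    return cycles


schedule = touchdown_schedule()
print(len(schedule), "amplification cycles after the initial 95 C activation step")
```

The initial 13-min activation at 95°C and the final 10-min extension at 72°C are single steps and are left out of the per-cycle list.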
Sequencing of reverse transcriptase and protease. Sequencing of pol (nucleotides 2253 to 3485, relative to HXB2) (31) was performed with the ViroSeq HIV-1 genotyping system (Applied Biosystems). Plasma and dried blood spot RNAs were reverse transcribed with Moloney murine leukemia virus reverse transcriptase. A 1.8-kb fragment containing the protease (amino acids 1 to 99) and reverse transcriptase (amino acids 1 to 312) regions was then amplified in a 40-cycle PCR with Amplitaq Gold DNA polymerase and AmpErase dUTP/uracil-N-glycosidase to minimize the risk of cross-contamination. PCR products were visually quantified by agarose gel electrophoresis. Following purification, the products were sequenced with six of the seven kit primers (primer D was not used) and Big-Dye terminator reagents and run on a 3100 genetic analyzer as described above. Sequences were assembled, translated, and analyzed for the presence of amino acid polymorphisms. A report was generated for each sequence, with mixtures of wild-type and mutant bases being classified as mutant.
Genetic subtyping and phylogenetic analysis. To rule out contamination between samples, each new sequence was compared to other sequences amplified at the same time, as well as to other sequences previously amplified in our laboratory and published sequences in the Los Alamos BLAST search database (2). The sequences were aligned with CLUSTAL W (67) and manually edited with the codon alignment of the Genetic Data Environment (GDE 2.2) program (63). New sequences were then compared to subtype reference strains in the Los Alamos subtype database (http://hiv-web.lanl.gov/content/hiv-db/SUBTYPE_REF/align.html). Following degapping with the degapped option in PAUP*, phylogenetic trees were generated on a Linux computer with the F84 model of substitution and the neighbor-joining method (version 4.0b2a) of PAUP* (65). Trees were rooted with a homologous region of HIV-1 group O (O-CM_MP5180).
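The degapping step can be illustrated with a short sketch. It assumes that degapping means discarding every alignment column in which at least one sequence carries a gap; the exact behavior of the PAUP* option may differ, and the function name and toy sequences are hypothetical.

```python
def degap_alignment(seqs, gap_chars="-."):
    """Drop every alignment column in which at least one sequence has a gap.

    seqs maps sequence names to aligned strings of equal length.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    names = list(seqs)
    # zip(*...) walks the alignment column by column.
    kept_columns = [
        column
        for column in zip(*(seqs[name] for name in names))
        if not any(char in gap_chars for char in column)
    ]
    return {
        name: "".join(column[i] for column in kept_columns)
        for i, name in enumerate(names)
    }


# Toy alignment (not real HIV-1 data).
toy = {"seq1": "AC-GT", "seq2": "ACTG-", "seq3": "A-TGT"}
print(degap_alignment(toy))
```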
To examine intrasubtype relationships, each KwaZulu-Natal sequence was analyzed against a subset of published C sequences from Zimbabwe, South Africa, Brazil, Tanzania, Zambia, Ethiopia, Israel, and eastern India. Appropriate evolutionary models were selected with the Akaike information criterion (1), implemented in MODELTEST 3.0 (48). With this method, a pairwise distance matrix was calculated and used to construct neighbor-joining maximum likelihood trees. Parameters of the reverse transcriptase/protease model, TVM + I + G, were: base frequencies, fA = 0.3986, fC = 0.1653, fG = 0.2033, and fT = 0.2328; R matrix values, R(A-C) = 2.7534, R(A-G) = 10.1383, R(A-T) = 0.9138, R(C-G) = 1.3684, R(C-T) = 13.5383, and R(G-T) = 1.0000; proportion of invariable sites = 0.4263; and heterogeneous variable site distribution (gamma) with alpha shape = 0.8233. Parameters of the env model, GTR + I + G, were: base frequencies, fA = 0.3801, fC = 0.1838, fG = 0.2890, and fT = 0.1472; R matrix values, R(A-C) = 3.3002, R(A-G) = 8.3576, R(A-T) = 3.7717, R(C-G) = 1.9646, R(C-T) = 23.3707, and R(G-T) = 1.0000; proportion of invariable sites = 0.1534; and heterogeneous variable site distribution (gamma) with alpha shape = 0.7332. Trees were viewed with Treetool and Treeview.
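To show how base frequencies and R matrix values of this kind define a substitution model, the following sketch assembles a general time-reversible rate matrix from the env parameters listed above. It is a minimal illustration under the standard GTR convention (off-diagonal rate = exchangeability × frequency of the target base), not the MODELTEST or PAUP* implementation. The function name is hypothetical, and the invariable-site proportion and gamma shape are not modeled here.

```python
BASES = "ACGT"

# env model values quoted in the text (frequencies sum to 1.0001 because of rounding).
ENV_FREQS = {"A": 0.3801, "C": 0.1838, "G": 0.2890, "T": 0.1472}
ENV_RATES = {
    frozenset("AC"): 3.3002,
    frozenset("AG"): 8.3576,
    frozenset("AT"): 3.7717,
    frozenset("CG"): 1.9646,
    frozenset("CT"): 23.3707,
    frozenset("GT"): 1.0000,
}


def gtr_rate_matrix(freqs, rates):
    """Build a GTR instantaneous rate matrix Q, scaled to one expected substitution per unit time."""
    total = sum(freqs.values())
    pi = {b: freqs[b] / total for b in BASES}  # renormalize the rounded frequencies
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                # Rate of change i -> j: symmetric exchangeability times target frequency.
                q[i][j] = rates[frozenset((i, j))] * pi[j]
        # Diagonal entries make each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    # Normalize so that the mean rate at equilibrium equals 1.
    scale = -sum(pi[i] * q[i][i] for i in BASES)
    for i in BASES:
        for j in BASES:
            q[i][j] /= scale
    return q, pi


Q, PI = gtr_rate_matrix(ENV_FREQS, ENV_RATES)
for base in BASES:
    print(base, [round(Q[base][other], 4) for other in BASES])
```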
Genetic diversity and intersubtype recombination analysis. Mean genetic distances were measured with the Kimura two-parameter model implemented in MEGA (35). To investigate whether the sequences were recombinant forms of subtype C, recombination analyses were performed with the recombination identification program (62), Bootscanning (56), the recombination detection program (53), and Simplot (39), a method that uses a sliding-window approach to calculate bootstrap plots for constructing neighbor-joining trees with the DNADIST, NEIGHBOR, or CONSENSE programs of the PHYLIP package (14).
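For reference, the Kimura two-parameter distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of compared sites that differ by a transition and by a transversion, respectively. The sketch below is a minimal illustration and not the MEGA implementation; the function name and the choice to skip sites with gaps or ambiguity codes are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent for the model.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy sequences differing by a single transition (A <-> G) at the first site.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```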
Nucleotide and amino acid sequence analysis. Nucleic acid sequences were also analyzed with SNAP (http://hiv-web.lanl.gov) (32) and Codeml, a program from the PAML software package (51). Various software programs were then used to calculate the ratio of synonymous to nonsynonymous nucleotide substitutions as a measure of natural selection pressure at the protein level. Programs included SNAP and MEGA (35), which calculate a synonymous-to-nonsynonymous (ds/dn) substitution ratio, and Codeml, which calculates an ω (dn/ds) value. High rates of synonymous mutation are indicative of conservation and a strict requirement for biological function, while high rates of nonsynonymous substitution are indicative of adaptive change, presumably in response to host selection pressure.
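The distinction between synonymous and nonsynonymous change can be made concrete with a small counting sketch. This is a deliberate simplification and not the method used by SNAP, MEGA, or Codeml: it only tallies codon pairs that differ at exactly one nucleotide and does not normalize by the number of synonymous and nonsynonymous sites, which a true ds/dn or dn/ds estimate requires. The function name and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def count_syn_nonsyn(seq1, seq2):
    """Count single-nucleotide codon differences as (synonymous, nonsynonymous)."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguity codes
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codons and multi-hit codons are ignored here
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy example: AAA->AAG keeps lysine (synonymous); GAT->GCT changes Asp to Ala.
print(count_syn_nonsyn("ATGAAACTGGAT", "ATGAAGCTGGCT"))
```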
To identify amino acid patterns that are characteristic of KwaZulu-Natal viruses, nucleotide sequences were translated and aligned and the consensus was analyzed by viral epidemiology signature pattern analysis (32). Consensus sequences were screened for the presence of biologically important sites with Prosite, a database of protein families and domains.
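Screening a protein sequence for short functional motifs can be sketched with regular expressions. The three patterns below follow the commonly cited Prosite definitions for N-linked glycosylation (N-{P}-[ST]-{P}), protein kinase C phosphorylation ([ST]-x-[RK]), and casein kinase II phosphorylation ([ST]-x(2)-[DE]). The function name and the toy peptide are hypothetical, and this is not the Prosite scanning software itself.

```python
import re

# Regular-expression equivalents of three Prosite-style patterns.
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "PKC phosphorylation": r"[ST].[RK]",
    "CK2 phosphorylation": r"[ST].{2}[DE]",
}


def scan_motifs(protein):
    """Return (motif name, 1-based start position, matched residues), sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        # The lookahead lets overlapping occurrences be reported as separate hits.
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Made-up peptide containing one glycosylation sequon and two PKC-type sites,
# including an S-X-R motif.
for hit in scan_motifs("GNNTRKSVRIG"):
    print(hit)
```

Short patterns such as these match frequently by chance, so hits are best read as potential sites rather than confirmed modifications.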
Identification of resistance mutations and correlation with phenotype. The Stanford HIV-SEQ and beta-test programs were used to identify and assess the impact of resistance-associated mutations and polymorphisms on phenotypic resistance. Each reverse transcriptase and protease sequence was compared to that of a subtype B reference strain, HXB2, in the Stanford HIV reverse transcriptase and protease sequence database (http://hivdb.Stanford.Edu/hiv/). Mutations associated with reduced sensitivity to antiretroviral drugs were assigned a drug penalty score based on genotypic-phenotypic correlative data.
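The comparison against a reference strain amounts to listing the positions at which an aligned amino acid sequence differs from the reference, written in the usual reference-position-variant form (for example, K103N). The sketch below illustrates only that step and is not the Stanford software; the function name is hypothetical, the sequence fragments are made up rather than taken from HXB2, and no penalty scores are assigned.

```python
def call_mutations(reference, query, start=1):
    """List substitutions between two aligned amino acid sequences.

    start is the residue number of the first aligned position.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for offset, (ref_aa, query_aa) in enumerate(zip(reference, query)):
        if ref_aa == query_aa:
            continue
        if "-" in (ref_aa, query_aa) or query_aa == "X":
            continue  # gaps and unresolved residues are not reported (assumption)
        mutations.append(f"{ref_aa}{start + offset}{query_aa}")
    return mutations


# Made-up fragments numbered from residue 100; the fourth residue differs.
print(call_mutations("LKKKKSVT", "LKKNKSVT", start=100))
```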
Nucleotide sequence accession numbers. GenBank accession numbers for sequences obtained in this study including information on the year of specimen collection and risk category are provided in Table 1.
TABLE 1. GenBank accession numbers and year of sampling
TABLE 2. Characteristics of and laboratory results for children and adults in the study
TABLE 3. DNA distances between subtype C sequences from different population groups
To investigate within-subtype clustering, trees were constructed with published C sequences from eight different countries (Fig. 1 and 2). Full-length reference sequences were selected because these strains contained both the env and pol genes. Unlike sequences from India, where seven out of nine (77.8%) samples formed a single monophyletic group, KwaZulu-Natal sequences were widely dispersed across multiple clusters, or sublineages. The topology of samples within these maximum-likelihood and neighbor-joining trees was similar for both env and pol and for retrospective specimens collected prior to 1992.
FIG. 1. Representative pol tree showing the relationships between retrospective and contemporary sequences from South Africa, Botswana, and other countries affected by the subtype C epidemic. The sequences are coded by the country of origin and year of isolation. The following sequences were included in the analysis: 49 previously described isolates from Botswana (accession numbers AF110960, AF110963, AF110967, AF110970, AF110972, AF110973, AF110978, and AF443074 to AF443115), 9 sequences from India (accession numbers AF286232, AF286223, AF286231, AB023804, AF067159, AF067155, AF067154, AF067157, and AF067158), 4 sequences from Tanzania (accession numbers AF286234, AF286235, AF361874, and AF361875), 2 sequences from Zambia (AF286224 and AF286225), 2 sequences from Brazil (U52853 and AF286228), 1 sequence from Ethiopia (U46016), 1 sequence from Israel (AF286233), and 69 sequences from South Africa, including 5 previously described sequences (AF286227, AY043173, AY043174, AY043175, and AY043176), 3 sequences from another study (71), and 61 sequences newly generated from this study (14 retrospective and 47 contemporary strains).
FIG. 2. Phylogenetic relationship of C2V5 envelope sequences from KwaZulu-Natal, Botswana, Zambia, and Tanzania. Non-KwaZulu-Natal strains are the same as those described in Fig. 1.
|
KwaZulu-Natal and subtype-specific signature motifs. The KwaZulu-Natal protease consensus sequence was identical to the consensus sequence of subtype C at 100% of 99 amino acids, but differed from the consensus of subtypes A, B, and D at seven, eight, and six positions, respectively (Fig. 3). Compared to the B consensus, amino acid substitutions were identified at 32 different positions. The mean number of substitutions was nine, with 65 (94.2%) isolates having eight or more substitutions relative to subtype B.
FIG. 3. Correlation of signature patterns with structure and function for protease and reverse transcriptase. conKZN, KwaZulu-Natal consensus; conA, conB, conC, and conD, consensus sequences for subtypes A, B, C, and D, respectively; APV, SQV, RTV, NFV, INV, drug binding sites for amprenavir, saquinavir, ritonavir, nelfinavir, and indinavir, respectively; functn, RT, reverse transcriptase; CTL, cytotoxic T-lymphocyte epitope; , drug-binding site; k, protein kinase C phosphorylation site; c, casein kinase phosphorylation site; m, myristoylation site; aaaa, amidation site; t, tyrosine kinase phosphorylation site; g, cyclic AMP- and cyclic GMP-dependent protein kinase site; T, thiocarboxanilide UC-781; N, nevirapine; Q, quinoxaline HBY 097; E, efavirenz; a, accessory mutation; P, primary mutation; caret, extended β-strand; S, bend; star, hydrogen-bonded turn; h, helix; p, purifying selection pressure; d, Darwinian (positive) selection pressure.
TABLE 4. Frequency of the most common amino acid substitutions in the pol gene
FIG. 4. Correlation of signature patterns with structure and function of V3 loop. KNZenv, KwaZulu-Natal consensus; Con_A, Con_B, Con_C, and Con_D, consensus sequences for subtypes A, B, C, and D, respectively; k, protein kinase C phosphorylation site; c, casein kinase phosphorylation site; n, N-linked glycosylation site; caret, extended β-strand; h, helix; 4, CD4+ binding site; d, Darwinian (positive) selection pressure.
TABLE 5. Amino acid substitutions at codons associated with drug resistance
Impact of substitution on functional motifs. Naturally occurring polymorphisms also resulted in significant variation in the number and type of phosphorylation sites. Overall, 17 potential phosphorylation sites were identified in the pol gene, 3 in the protease and 14 in the reverse transcriptase. Twelve of the pol sites were conserved among KwaZulu-Natal patients and in the consensus sequences for subtypes A, B, C, and D (Fig. 3). These included the predicted protein kinase C site at codons 12 to 14 near the N terminus of the protease and the two casein kinase II (CKII) phosphorylation motifs at the active site. Most KwaZulu-Natal sequences had an S-X-K rather than a T-X-K motif at protease codons 12 to 14. Conserved phosphorylation sites in reverse transcriptase included protein kinase C codons 68 to 70; tyrosine kinase codons 49 to 56; cyclic AMP/cyclic GMP-dependent codons 65 to 67, 102 to 105, and 125 to 128; and CKII codons 3 to 6, 107 to 110, 191 to 194, 215 to 218, and 253 to 256.
Two KwaZulu-Natal patients lacked a cyclic AMP phosphorylation site at reverse transcriptase codons 102 to 105 due to the presence of a K103N mutation. Some phosphorylation sites, such as the CKII sites at reverse transcriptase positions 39 to 41 and 200 to 203, were present in subtypes A, B, and D but absent from most of the KwaZulu-Natal and subtype C sequences. Other differences included the absence of an internal myristoylation site (41) at reverse transcriptase codons 196 to 201 in nine patients and the presence of an amidation site at protease codons 67 to 70 in subtype A, subtype C, and all but two of the KwaZulu-Natal sequences. With a single exception, all of the natural reverse transcriptase mutations were embedded within cytotoxic T-lymphocyte, T-helper, or overlapping cytotoxic T-lymphocyte/T-helper epitopes, as defined for B viruses.
Of particular interest, with respect to the env gene, was a cluster of substitutions located at or in close proximity to the bottom of the V3 loop, a region known to play a major role in viral tropism and coreceptor usage. This cluster included amino acid -1, immediately upstream from the cysteine residue at the beginning of V3, and amino acid positions 11 and 13 within the V3 loop itself. In common with other C viruses, strains from 89.0% of KwaZulu-Natal patients had amino acid substitutions that resulted in elimination of the N-linked glycosylation site at position -1 (amino acid 301 according to the numbering of Korber et al. [31]). In 91.0% of patients, loss of glycosylation was associated with a serine (S) substitution at position 11 and the presence of a positively charged arginine (R) residue at V3 position 13. The resultant S-X-R motif gave rise to a second, alternative protein kinase C site immediately adjacent to the phosphorylation site at amino acids 8 to 10. These findings suggest a potential linkage between deglycosylation and phosphorylation in the V3 loop of C viruses.
Most A variants also carried the extra protein kinase C site at positions 11 to 13 but lacked the N-linked glycan at position -1. Instead, a more distal N-X-S glycosylation site (positions -7 to -5) was frequently absent in A viruses. Another protein kinase C site, located downstream from the C terminus of V3 at positions 45 to 47 (relative to V3), was missing in most KwaZulu-Natal viruses. This site is highly conserved among subtype B viruses. In common with subtype B, KwaZulu-Natal and other C viruses contained a highly conserved CKII site at amino acids 68 to 71.
Our results indicate that C viruses in KwaZulu-Natal have a higher level of nucleotide diversity than previously reported (70, 71) and that the epidemic, in its explosive phase, is characterized by multiple circulating sublineages in both the Indian and black communities. The restricted distribution of subtype C viruses from India compared to the multilineage pattern of Indian viruses from Africa indicates that the two Indian epidemics have different origins and different evolutionary histories. The presence of retrospective samples (collected prior to 1990) at internal (basal) branches in three of the sublineages suggests that each lineage is derived from a different founder variant and that these variants have been cocirculating in South Africa for at least 10 years. Of significant note was the cosegregation and close relatedness of sequences from KwaZulu-Natal black and Indian inhabitants, not only to each other, but also to published sequences from Botswana. This close relationship with sequences from Botswana was not observed in a previous study (45), presumably because of the small number of samples included from South Africa (n = 5). Taken together, our findings confirm the existence of multiple HIV-1 C sublineages in southern Africa and demonstrate that the spread of these different lineages has been substantial.
The finding that C viruses from KwaZulu-Natal are substantially more diverse than those in India and Brazil is consistent with other studies and has been attributed to the longer duration of the AIDS epidemic in Africa (4, 59). The overall evolutionary rates of pol and env sequences, as measured by a dated-tip likelihood method (51), were 35% and 68% higher, respectively, than those of subtype B. Despite the high level of diversity, KwaZulu-Natal viruses were remarkably well conserved at the amino acid level, both within subtype C and among different individuals. This is because a large number of the nucleotide substitutions are silent (synonymous) mutations that cause no change in the amino acid sequence. As a result, the consensus sequence for the KwaZulu-Natal protease was identical to the consensus sequence for subtype C, while the reverse transcriptase consensus sequence differed from the C consensus at a single amino acid, codon 60.
High ratios of synonymous to nonsynonymous nucleotide change have also been observed among subtype C isolates from Zimbabwe (58) and Ethiopia (38). This inherent property of African subtype C viruses is a reflection of the differential pressure exerted on the three positions of the codon. For the KwaZulu-Natal reverse transcriptase gene, the mutation rate for the third position of the codon was four times higher than that observed for the second position and 30 times higher than for the first codon position (data not shown).
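A comparison of this kind rests on tallying nucleotide differences separately for the first, second, and third codon positions. The sketch below shows the tally for one pair of aligned, in-frame sequences; it is an illustration only, with a hypothetical function name and toy sequences, and it does not reproduce the rate estimates reported here.

```python
def differences_by_codon_position(seq1, seq2):
    """Count nucleotide differences at codon positions 1, 2, and 3."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = [0, 0, 0]
    for index, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        # Only unambiguous bases are compared; index % 3 gives the codon position
        # under the assumption that the alignment starts in frame.
        if a in "ACGT" and b in "ACGT" and a != b:
            counts[index % 3] += 1
    return counts


# Toy in-frame sequences: one first-position and two third-position differences.
print(differences_by_codon_position("ATGAAACTG", "ATGAAGTTA"))
```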
The conservation of subtype C at the amino acid level offers considerable promise for the development of a consensus- or ancestor-based "supervaccine" (17, 45). Recent primate studies suggest that it may be possible to overcome diversity and achieve cross-protection against different HIV-1 variants (12, 61). However, it should be stressed that the long-term impact of silent mutations on vaccine efficacy is not known.
In the context of antiretroviral therapy, one recent study found that, despite numerous naturally occurring mutations in reverse transcriptase, C viruses from Zimbabwe were as susceptible as subtype B viruses to commonly used nucleoside and nonnucleoside reverse transcriptase inhibitors (58). However, another recent study found that, although C viruses in Ethiopia were susceptible to reverse transcriptase inhibitors, the presence of silent mutations led to a more rapid emergence of resistance (38). These data emphasize the need for carefully designed prospective trials to determine whether existing polymorphisms influence the development of resistance in C-infected patients.
With the exception of two primary resistance mutations, K103N and G190A, which occurred in a single husband-wife pair, none of the reverse transcriptase or protease polymorphisms occurred at drug-binding sites or at active sites of the enzymes. Both mutations are known to cause high-level resistance to nevirapine in persons infected with subtype B (50). Although believed to be naturally occurring, the possibility that these mutations represent treatment-induced changes cannot be excluded. As many as 15% of patients in the private sector in South Africa have received or are currently receiving some form of antiretroviral therapy. Many protocols include nevirapine because of its low cost and long half-life. Nevirapine is also being increasingly used for the prevention of mother-to-child HIV-1 transmission in KwaZulu-Natal and other regions of Africa (23).
All of the remaining pol polymorphisms occurred in regions involved in the three-dimensional configuration of reverse transcriptase and protease. One such polymorphism, which occurred in a single patient, was A98G in the reverse transcriptase. This mutation was also detected in a treatment-naive patient from Ethiopia (38). In persons infected with subtype B, A98G has been associated with low-level resistance to nonnucleoside reverse transcriptase inhibitors. Other polymorphisms were localized within the hinge region of protease, a region that induces conformational changes during drug binding. A subset of these mutations, M36I/R41K/H69/L89M, has been linked to increased catalytic activity in subtypes A and C (72). Another series of polymorphisms, at codons 12, 15, 19, and 93, occurred in >80% of KwaZulu-Natal viruses and formed a KwaZulu-Natal/subtype C signature motif. The first three amino acids of this motif are located near the N terminus of protease, in an extended β-strand; the fourth, I93L, is located in a hydrogen-bonded turn, immediately upstream of the protease/reverse transcriptase cleavage site. The marked dominance of I93L among C viruses, its close proximity to the protease/reverse transcriptase cleavage site, and its linkage to the T12S/T15V/L19I signature warrant further investigation. Studies of HIV-1 B have reported that mutations in the protease and Gag-Pol cleavage sites contribute to drug resistance, are specifically selected during therapy, and can lead to improved enzyme kinetics (7, 11).
The observed natural polymorphisms did not occur at random but were clustered in specific functional domains of the reverse transcriptase, protease, and env genes. Overall, >95% of KwaZulu-Natal pol codons were found to be under strong purifying (negative) selection pressure (dn/ds < 1.0) and thus were unlikely to undergo nonsynonymous substitution. These conserved codons were concentrated within active sites and at drug-binding sites in reverse transcriptase and protease and at nucleoside triphosphate binding sites in reverse transcriptase. The remaining 5% of amino acids were under strong positive selection pressure and were concentrated in regions associated with maintaining the tertiary structure and facilitating conformational changes. Some positively selected codons, such as protease 63 and reverse transcriptase 123 and 174, showed extensive interpatient and intersubtype variation. Other codons (such as protease 12S and reverse transcriptase 39E, 245Q, 272P, and 277R) were highly conserved among KwaZulu-Natal and subtype C sequences and formed part of an HIV-1 C signature sequence. The conservation of codons in the face of strong diversifying pressure suggests that they may play an important role in the evolutionary, structural, and phenotypic properties of C viruses. A few positively selected codons were conserved across several subtypes, suggesting that they may contribute to the evolutionary history of group M viruses.
Although many factors contribute to the generation of new variants, one of the most important is related to cytotoxic T lymphocytes and the role they play in recognizing epitopes presented by major histocompatibility complex class I molecules. With a single exception, all of the naturally occurring reverse transcriptase mutations were embedded within cytotoxic T-lymphocyte, T-helper, or overlapping cytotoxic T-lymphocyte/T-helper epitopes as previously defined for B viruses (30). Several signature sequences in env also mapped to known subtype B cytotoxic T-lymphocyte epitopes, including the heavily glycosylated regions at the bottom of V3 and the associated protein kinase C phosphorylation site at V3 position 11. Information on subtype C epitopes is just beginning to emerge and, when combined with novel methods of analysis, may lead to new insights into the immune selection pressures occurring during seroconversion and in response to therapy. By examining sites under positive selection pressure, we may be able to identify targets of the host immune system and select appropriate epitopes for inclusion in a subtype C vaccine.
Although it is well known that most C viruses lack a V3 glycosylation site and a basic amino acid residue at position 11, the biological significance of these findings remains unclear. Disruption of V3 glycosylation has also been reported to occur in 52%, 34%, and 20% of subtype G, A, and D viruses, respectively. Studies of subtype B have suggested that this N-linked glycan may play a role in the interaction of gp120 with its coreceptors (37) and in perinatal transmission. Nakayama et al. (43) found that absence of this V3 glycan caused a marked reduction in CXCR4-dependent but not CCR5-dependent viral entry. Others have suggested that the V3 glycan is not necessary for CXCR4 usage (40) and that its absence leads to enhanced infectivity of CXCR4-expressing cells (47).
Li et al. (37) found that multiple factors contribute to coreceptor usage and that the effects exerted by the V3 glycan are both isolated and context dependent. Similarly, the absence of a basic amino acid at position 11 of V3 and at positions 24 and 25 has been associated with a non-syncytium-inducing phenotype and CCR5 coreceptor-using properties, while the presence of basic charge has been correlated with CXCR4 and syncytium-inducing phenotypes (19, 20, 21, 26, 43). As with deglycosylation, these correlations have been imprecise.
Our findings, showing a potential linkage between V3 deglycosylation and the presence of a serine phosphorylation site at position 11, suggest that factors other than glycosylation and charge may have to be taken into account when assessing the function of V3. Based on the knowledge that C viruses are almost exclusively non-syncytium inducing and CCR5 using, it is tempting to speculate that deglycosylation may allow better access to the CCR5 coreceptor, while phosphorylation may alter the conformation of gp120, exposing retroviral sites that are needed for efficient CCR5-mediated viral entry. Although highly speculative, this possibility warrants further study given the critical importance of V3 for host cell recognition and viral entry.
Differences were also observed in the number and position of phosphorylation sites in reverse transcriptase and protease. Phosphorylation is known to modulate the activity of many proteins that interact with nucleic acids, including DNA and RNA polymerases. It is also known that, in addition to reverse transcriptase and protease, several protein kinases are incorporated into mature HIV-1 virions (68), where they are available not only to regulate the activity of reverse transcriptase and protease, but also to participate in interactions with the host cell. Phosphorylation of the threonine residue at reverse transcriptase codon 215 has been shown to increase discrimination against azidothymidine, leading to drug resistance (36), and phosphorylation of protease substrates can lead to impaired proteolytic cleavage (68). Our data indicate that several phosphorylation sites in the pol gene of KwaZulu-Natal and subtype C viruses are highly conserved and positively selected. It will be important to determine whether these sites play a significant role in the replicative capacity and proteolytic processing of C viruses.