Journal of Virology, September 2005, p. 11343-11352, Vol. 79, No. 17
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.17.11343-11352.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Department of Ecology and Evolutionary Biology, University of California at Irvine, Irvine, California,1 Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, Florida,2 Gene Johnson Inc., St. Augustine, Florida,3 Department of Laboratory Medicine, Positive Health Program, University of California at San Francisco, San Francisco, California,4 Evolutionary Biology Group, Zoology Department, University of Oxford, Oxford, United Kingdom5
Received 17 December 2004/ Accepted 20 May 2005
Several reports on HIV-1 sequence heterogeneity in the brains of infected individuals, with or without HAD, have shown that brain-derived viral sequences are often monophyletic in the inferred phylogenetic tree, suggesting distinct virodemes (subdivided viral populations) in the CNS (13, 22, 26, 29, 42). In particular, the finding of independently evolving HIV-1 subpopulations in the frontal lobe, basal ganglia, medial temporal lobe, and nonmedial temporal lobe suggests the existence of restricted pathways for HIV-1 evolution within the CNS (6, 29). Compartmentalization and independent evolution of primary and secondary drug resistance mutations to reverse transcriptase and protease inhibitors in diverse CNS regions of HIV-1 patients, with and without HAD, have also been shown (33). However, the role, if any, that HIV-1 sequence variation might play in the neuropathogenesis of HAD is currently unknown. This is due, in part, to complexities concerning viral migration across the blood-brain barrier and to a paucity of studies of compartmentalization of HIV-1 quasispecies in different brain-derived tissues (1).
The main goal of the present study was to identify the extent of tissue-specific HIV-1 evolutionary patterns in the CNS. In particular, we studied the viral subpopulations infecting tissues usually associated with initial infection (meninges and cortex) and disease progression (white matter and spinal cord) isolated from a T-cell-depleted patient diagnosed with severe HAD at the time of death. We used a so-called "phylodynamic" framework, which combines various phylogenetic and population genetic analyses to investigate the correlation between the epidemiological and evolutionary behavior of viral pathogens and the immune system of the host (10). Our findings not only suggest that the Williams and Hickey (39) model may be correct, but also offer a plausible mechanism that may explain the interplay between macrophage activation and viral evolution in the development of HAD.
PCR, sequencing, and data management. The envelope V1-V3 region was amplified by nested PCR using the following set of primers: EnvF1 (outer forward), AACATGTGGAAAAATAACATGGT; EnvR1 (outer reverse), TGWATTACAGTAGAAAAATTCCCC; EnvF2 (inner forward), TGGTAGAACAGATGCAKGAGGA; and EnvR2 (inner reverse), CCCTCCACAATTAAAACTGTGY. Outer PCRs were carried out for 35 cycles of 15 s at 95°C, 30 s at 55°C, and 1 min at 68°C, followed by a final 8-min extension step at 68°C, in a 100-µl volume using 100 ng of DNA template, 10 µl of 10x buffer (Roche, Indianapolis, IN), 10 mM deoxynucleoside triphosphates, 20 µM of each primer, and 0.45 units of Roche Taq polymerase. Inner PCRs were carried out under the same conditions using 1 µl of the outer PCR product. Inner products were cloned into the pCRII-TOPO vector, and positive clones were detected by PCR, using the TOPO TA Cloning Kit Dual Promoter (Invitrogen, Carlsbad, CA) with conditions recommended by the manufacturer. Sequencing was carried out by ELIM Biopharmaceuticals Inc. (Hayward, CA) on an ABI 3730xl sequencer. About 20 clones (>120 total sequences) of PCR-amplified env-V1-V3 from each different brain region were obtained. V1-V2 and V3 domains were isolated using HIVbase software (http://www.hivbase.com) (14). Sequence integrity and possible contamination were tested according to the Los Alamos HIV Sequence Database guidelines (http://hiv-web.lanl.gov). Charge analysis was performed using HIVbase software. Multiple sequence alignments were generated using the Clustal algorithm (37) and manually edited in order to maximize positional homology in gap-rich regions following the Lamers et al. protocol (16). Alignments are available from the authors upon request.
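Three of the four primers listed above contain an IUPAC ambiguity code (W, K, or Y), so each of them is in practice a mixture of two oligonucleotides. As a minimal illustrative sketch, not part of the original protocol, the following Python snippet expands each primer into the unambiguous sequences it encodes. The names `IUPAC`, `PRIMERS`, and `expand` are hypothetical; the ambiguity table is the standard IUPAC nucleotide code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "EnvF1": "AACATGTGGAAAAATAACATGGT",
    "EnvR1": "TGWATTACAGTAGAAAAATTCCCC",
    "EnvF2": "TGGTAGAACAGATGCAKGAGGA",
    "EnvR2": "CCCTCCACAATTAAAACTGTGY",
}


def expand(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        variants = expand(seq)
        print(f"{name}: {len(variants)} variant(s)")
        for variant in variants:
            print("   ", variant)
```

Enumerating the variants in this way is mainly useful when checking a primer against reference sequences one oligonucleotide at a time.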
Phylogenetic analysis. To exclude the possibility of intrapatient recombination in the envelope region studied, V1-V2 and V3 alignments were analyzed separately. The samples were all identified as HIV-1 subtype B by phylogenetic analysis with known reference strains, and no significant differences were detected between V1-V2 and V3 phylogenies (data not shown). Maximum likelihood (ML) trees were estimated as follows. First, the best-fitting nucleotide substitution model was tested with a hierarchical likelihood ratio test following the strategy described by Swofford and Sullivan (35) using a neighbor-joining tree with Jukes and Cantor corrected distances. ML trees were then inferred with the selected model (HKY + Γ, eight categories, for both data sets) and ML-estimated substitution parameters. The heuristic search for the best tree was performed using a neighbor-joining tree as the starting tree and the TBR branch-swapping algorithm with the collapse option off to avoid polytomies in the final tree. Calculations were performed with PAUP* 4.0b10 (D. L. Swofford, Sinauer Associates, Sunderland, MA). Statistical support for internal branches in the tree was obtained by bootstrapping (1,000 replicates) and the ML-based zero-branch-length test (35). Trees were rooted by ML rooting by selecting the rooted tree with the best likelihood under the molecular-clock constraint.
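The starting neighbor-joining tree relies on Jukes and Cantor corrected distances. The snippet below is a hedged sketch of that correction only, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. PAUP* was the software actually used; the function names are illustrative, and the pairwise deletion of gapped positions is an assumption made for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap ('-') in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor corrected distance in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy sequences, not taken from the study.
    print(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance always exceeds the raw proportion of differences because it accounts for multiple substitutions at the same site.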
Gene flow tests and migration counts. The hypothesis of compartmentalization, i.e., the existence of distinct HIV-1 subpopulations in the brain, was tested by the Slatkin and Maddison test for gene flow (31) using the MacClade program (20). A one-character data matrix is obtained from the original data set by assigning to each taxon in the tree a one-letter code indicating its tissue of origin. Then, the putative origin of each ancestral sequence (i.e., internal node) in the tree is inferred by the Fitch algorithm (9) by finding the most parsimonious reconstruction (MPR) of the ancestral character. The final tree length, i.e., the number of observed migrations in the genealogy, can easily be computed and compared to the tree length distribution of 10,000 trees obtained by random joining-splitting. Observed genealogies significantly shorter than random trees indicate the presence of subdivided populations (virodemes). The presence of structure in the trees was also tested with the Finkelstein test (8), which gives the probability of any particular clade being the result of random association due to runs of identical events. Finally, specific migrations among different compartments (states) were traced with the state changes and stasis tool (MacClade), which counts the number of changes in a tree for each pairwise state. When multiple MPRs are present (as they are in our data sets), the algorithm calculates the average migration count over all possible MPRs for each pair. The resulting pairwise migration matrix was normalized, and a migration network was sketched representing the proportion of observed migrations in the tree from/to each tissue sampled.
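The migration count described above can be sketched in a few lines of Python. The example computes the Fitch parsimony length of a one-character "tissue" data set on a rooted binary tree and compares it with a null distribution. Two assumptions should be noted: the tree and tissue codes are toy values, and the null distribution is produced by shuffling tip labels, which is a simpler stand-in for the random joining-splitting of trees performed in MacClade.

```python
import random


def fitch_length(tree):
    """Return (state_set, changes) for a rooted binary tree.

    The tree is given as nested 2-tuples with a tissue code (str) at each tip.
    'changes' is the minimum number of state changes (migrations).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_set, left_changes = fitch_length(left)
    right_set, right_changes = fitch_length(right)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def tips(tree):
    """List the tip labels in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [label for child in tree for label in tips(child)]


def relabel(tree, labels):
    """Copy the tree topology, taking new tip labels from an iterator."""
    if isinstance(tree, str):
        return next(labels)
    return tuple(relabel(child, labels) for child in tree)


def permutation_test(tree, n_perm=10000, seed=0):
    """Observed migration count and the fraction of label shuffles
    that give a tree at least as short (with a +1 correction)."""
    rng = random.Random(seed)
    observed = fitch_length(tree)[1]
    labels = tips(tree)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if fitch_length(relabel(tree, iter(shuffled)))[1] <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy genealogy: MEN = meninges, TL = temporal lobe, SC = spinal cord.
TOY_TREE = ((("MEN", "MEN"), ("MEN", "TL")), (("TL", "TL"), ("SC", "SC")))

if __name__ == "__main__":
    print(permutation_test(TOY_TREE, n_perm=1000))
```

A small proportion of shuffles yielding trees as short as the observed one would indicate that sequences from the same tissue cluster together more than expected by chance.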
Parametric and nonparametric estimates of demographic history. A generalized skyline plot is a nonparametric estimate of demographic history based on an inferred ultrametric genealogy (34). A genealogy reconstructed from randomly sampled HIV sequences contains information about population level processes, such as change in population size and growth rate (11, 25, 36). A skyline plot can be inferred from a genealogy with clock-like branch lengths, and it represents an estimate of how the population size changes over time (25, 34). Given a viral phylogeny, P, and a vector, θ, representing the parameters of the demographic model N(t), parametric estimates can also be obtained by calculating the log of the conditional probability, ln[θ|P]. ML estimates of θ can be found by numerical optimization of ln[θ|P], and 95% confidence intervals for the estimates are obtained with the likelihood ratio statistic (25, 34). For example, in a model of exponential population growth, the estimated parameters will be N(0)µ and r/µ, where µ is the evolutionary rate in nucleotide substitutions per site per year and N(0) is the effective number of infections at present. Notice that if the evolutionary rate cannot be inferred from the data, as with sequences collected at a single time point without any known divergence time in the tree, time is expressed in terms of nucleotide substitutions per site, and it runs backward into the past, so that N(0) is the effective number of infections at present, and N(t) is the effective number of infections at time t in the past. Parametric estimates for the constant-size, exponential, and logistic-growth models were obtained with the GENIE v3.0 software package (24). The tested models are nested; therefore, we could use the likelihood ratio test to assess which one best fits the data set studied. Differential equations defining the models can be found in the GENIE manual distributed with the package. Generalized skyline plots were obtained with the APE package implemented in R (http://cran.r-project.org). Because the molecular clock for the brain data set is rejected (see Results), the tree used as the input file for the skyline plot calculation is the tree with local-clock-corrected branch lengths assuming a different evolutionary rate for the major tissue-specific clades in the tree.
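To make the nonparametric estimate more concrete, the following sketch computes a classic skyline plot from a list of coalescent intervals. It assumes the standard estimator, in which an interval of duration w during which k lineages coexist yields the estimate w·k(k − 1)/2; with branch lengths in substitutions per site this quantity corresponds to N(t)µ. The generalized version used in the study additionally pools short intervals, a step omitted here. The function name and the interval values are illustrative and are not taken from the data.

```python
def classic_skyline(intervals):
    """Classic skyline estimate from coalescent intervals.

    intervals: list of (k, w) ordered from the present into the past, where
    k is the number of lineages during the interval and w is its duration.
    Returns a list of (start_time, end_time, estimate) steps.
    """
    steps = []
    start = 0.0
    for k, w in intervals:
        if k < 2:
            raise ValueError("each interval needs at least two lineages")
        if w < 0:
            raise ValueError("interval durations must be non-negative")
        estimate = w * k * (k - 1) / 2.0
        steps.append((start, start + w, estimate))
        start += w
    return steps


if __name__ == "__main__":
    # Hypothetical genealogy of four sequences: 4, then 3, then 2 lineages.
    for start, end, estimate in classic_skyline([(4, 0.01), (3, 0.02), (2, 0.05)]):
        print(f"{start:.3f}-{end:.3f}: {estimate:.4f}")
```

Each step of the resulting plot is a piecewise-constant estimate, which is why single short intervals produce the noisy profiles that the generalized method is designed to smooth.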
Molecular-clock analysis. Ultrametric trees were obtained by enforcing a molecular clock on the inferred genealogy and reestimating the branch lengths and substitution parameters with maximum likelihood using the codon substitution model M0 described by Yang et al. (44). The local-molecular-clock model was implemented by defining a baseline evolutionary rate, equal to 1 by default, for the tree containing HIV-1 sequences isolated from different brain tissues, plus five different local clocks for the major tissue-specific clades in the tree employing the same codon substitution model used for the global clock. Calculations were performed with the CODEML program of the PAML package (43). Different clock hypotheses were tested with the likelihood ratio test. Degrees of freedom for the test between global- and local-clock models were calculated by considering that for a binary tree a local-clock model has n − 1 + r free parameters, where r is the number of unconstrained relative evolutionary rates (43). Branch lengths proportional to divergence time were also inferred with the nonparametric-smoothing algorithm implemented in the program r8s (http://ginger.ucdavis.edu/r8s/). Nonparametric smoothing relaxes the assumption of a perfect molecular clock, allowing evolutionary rates to vary smoothly along a tree (28).
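The comparison between the global- and local-clock models can be illustrated with a short likelihood ratio test. The sketch assumes that the global clock has n − 1 free parameters and the local clock n − 1 + r, so that the test has r degrees of freedom, and that twice the log likelihood difference is compared with a chi-square distribution. The log likelihood values and the number of sequences below are hypothetical placeholders, and `chi2_sf` and `clock_lrt` are illustrative names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series over odd double factorials.
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


def clock_lrt(lnl_global, lnl_local, n_tips, r):
    """Likelihood ratio test of a global clock against a local clock."""
    params_global = n_tips - 1
    params_local = n_tips - 1 + r
    df = params_local - params_global  # equals r
    stat = 2.0 * (lnl_local - lnl_global)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log likelihoods; five local clocks as in the text.
    print(clock_lrt(-1010.0, -1000.0, n_tips=50, r=5))
```

A small P value leads to rejection of the global clock in favor of clade-specific rates.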
Positive-selection analysis. Positive-selection analysis was performed by comparing the maximum likelihood codon substitution models developed by Yang et al. (44) and implemented in CODEML. Two categories of models were tested. (i) Branch site models (43) tested for different dN/dS (nonsynonymous/synonymous substitution) ratios along given branches of the tree. Three different models were compared: model 0, assuming a single dN/dS for the whole tree; model 1, assuming a different dN/dS for each branch in the tree; and model 2, assuming a baseline dN/dS equal to 1 for the entire tree and five different dN/dS along the branches of the five major tissue-specific clades. Model 0, model 1, and model 2 are nested, and a hierarchical likelihood ratio test can be used to check which one of them fits the data significantly better (43). (ii) NSsites models (43) tested for specific sites under positive selection. The models compared were M7 and M8. M7 is a neutral model assuming a β distribution with 10 classes of dN/dS across codons, where each class is constrained to have a dN/dS of <1. M8 is a positive-selection model assuming a β distribution with 11 classes, 10 with a dN/dS of <1 and one class with a dN/dS of >1. Since the two models are nested, the likelihood ratio test can be used to check whether assuming an extra class of codons under positive selection fits the data significantly better.
All analyses were carried out with the latest available version of the PAML (v3.14) software package (43).
V3 loop cross-pattern tissue analysis. The ancestral sequence of the node that linked the meninges, temporal lobe, and lymph node-semen main clusters was reconstructed by maximum likelihood, using the BASEML program implemented in the PAML package (43). Sequences were also divided into 10 groups based on the major supported monophyletic clades present in the inferred genealogy, and the most recent common ancestor (MRCA) for each group was reconstructed. Signature pattern analysis was carried out with the VESPA program by comparing each sequence group to its inferred ancestor (12).
Epitope and motif searches. All published cytotoxic-T-lymphocyte (CTL) epitopes and associated HLA serotypes can be found in table format at the HIV Molecular Immunology website (http://www.hiv.lanl.gov/content/immunology). All CTL epitopes specific for envelope were built into a search query within HIVbase software and applied to the isolated data sets in this study. We also built search queries for crest motifs previously associated with HIV-1 neurotropic variants (32). In order to screen both our sequences and sequences in the Los Alamos HIV databases (http://www.hiv.lanl.gov) for motifs, all the available sequences from the HIV databases were imported into HIVbase and the appropriate query was carried out with the HIVbase query tool. The saved search queries are available for download (http://www.HIVbase.com).
Nucleotide sequence accession numbers. Individual sequences have been deposited in GenBank under accession numbers DQ121207 to DQ121294.
TABLE 1. HIV copy number per genomic equivalent of tissue DNA
FIG. 1. Histology of HAD brain. (A) Hematoxylin and eosin (H&E) staining of the frontal lobe subcortical white matter. (B) Anti-HIV p24 staining. (C) Anti-CD68 (macrophage) staining. (D) Control antibody staining.
FIG. 2. Maximum likelihood unrooted genealogy of HIV-1 gp120 V3 isolated both from lymphoid (in yellow box) and site-specific brain tissue. The colored boxes represent different tissues: white, frontal lobe; blue, temporal lobe; red, occipital lobe; orange, spinal cord; black, meninges; green, iliac lymph node; magenta, seminal vesicles. Branch lengths were estimated with the HKY + Γ model, which was selected with a hierarchical likelihood ratio test (25). Branch lengths are drawn to scale, with the bar at the bottom indicating 0.1 nucleotide substitution per site. One asterisk along a branch represents support with a P value of < 0.005 in the zero-branch-length test (25) for the clade subtending that branch. Two asterisks indicate that the clade subtending the branch was also supported by a bootstrap value of >75%.
TABLE 2. Finkelstein test (10) for runs of identical events within biological sequences
FIG. 3. Migration analysis of HIV-1 in the brain according to the Slatkin and Maddison test (31); maximum likelihood rooted cladogram of HIV-1 strains isolated from different brain regions. Colored boxes on the tips of the tree indicate the tissue of origin of the taxa according to the legend to Fig. 2. Internal nodes represent reconstructed ancestral sequences. The color of a branch indicates the tissue of origin of the top node (sequence) of that branch. Striped branches are those for which more than one parsimonious reconstruction is possible for the ancestral character. Different colors between two branches, indicated by an asterisk, imply that a migration must have occurred. Despite the uncertainty of the state assignments for some of the internal nodes, the total number of observed migrations, 10, can still be computed precisely with the Fitch algorithm (9). The distribution of tree lengths, i.e., the number of observed migrations in 10,000 trees generated by random joining-splitting from the original tree, is also shown.
FIG. 4. Migration network of HIV-1 sequences in the CNS and tissues sampled superimposed on the surface of the brain. The meninges line on the surface of the brain is indicated by the white line. Arrow size corresponds to the percentage of total migrations observed in the tree from one tissue to another; the largest migration (21.5%) occurs from meninges to temporal lobe; the smallest migration shown (1.8%) occurs from occipital lobe to spinal cord. Migrations of less than 1% are not shown. Green arrows represent gene outflow from the meninges and occipital lobe, whereas blue arrows indicate gene flow associated with the temporal lobe, frontal lobe, and spinal cord. Double-headed arrows indicate the almost equal exchange of viruses between tissues. None of the percent changes shown are significantly different from zero. (The brain image is copyright-protected material used with permission of the authors and the University of Iowa's Virtual Hospital [http://www.vh.org].)
FIG. 5. Generalized Skyline plot of HIV-1 quasispecies in the brain. Nonparametric (solid line) and parametric (broken line) estimates of N(t)µ (15), the effective HIV-1 population size over time rescaled by the virus evolutionary rate, in the brain. The plot was inferred using the rooted cladogram in Fig. 3 with ultrametric branch lengths estimated by maximum likelihood (see Materials and Methods). Time, on the x axis, runs backward from the present (t = 0), and it is given as the number of nucleotide substitutions per site accumulated since the time of sampling.
TABLE 3. Molecular-clock analysis of HIV-1 brain sequences
Positive-selection analysis. Maximum likelihood analysis did not detect any branch in the brain tree or any site under positive selection in the V3 or V1-V2 alignment. Two nested codon-based substitution models (see Materials and Methods), one assuming neutrality (null hypothesis) and the other assuming a proportion of sites under positive selection (alternative hypothesis), gave almost identical log likelihoods, −981.28 and −981.02, respectively, indicating that the hypothesis of sites under positive selection does not fit the data significantly better. We then checked whether significant selective pressure was present in the internal branches that separate different clusters. Model 1, assuming a different dN/dS ratio for each branch in the tree, has, as expected, the highest log likelihood score (−957.72), but its likelihood is not significantly different from the one calculated for model 0 (−997.41; P > 0.05) or for model 2 (−994.59; P > 0.05). The comparison of model 0 and model 2 by the likelihood ratio test indicates that the null model (model 0) cannot be rejected at the 5% level. In conclusion, no branches in the tree seem to be under positive selection. We also compared average dN/dS ratios among maximum likelihood inferred ancestral sequences at the internal nodes of the significantly supported monophyletic clades of the tree in Fig. 3; dN/dS values were always significantly lower than 1 (data not shown), suggesting that the observed HIV-1 sequence diversity, both within and between subpopulations, is most probably the effect of random genetic drift after bottleneck events and of migration pathways in the brain.
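The site-model comparison can be checked with a short calculation. The sketch below reads the two reported values as log likelihoods of −981.28 (neutral model, M7) and −981.02 (positive-selection model, M8) and assumes the usual two degrees of freedom for the M7 versus M8 test, since M8 adds two parameters to M7. For two degrees of freedom the chi-square tail probability has the closed form exp(−x/2). The function name is illustrative.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test with 2 degrees of freedom.

    Returns (statistic, p_value); for df = 2 the chi-square upper tail
    is exactly exp(-statistic / 2).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    p_value = math.exp(-stat / 2.0) if stat > 0 else 1.0
    return stat, p_value


if __name__ == "__main__":
    stat, p_value = lrt_two_df(-981.28, -981.02)
    print(f"2*delta lnL = {stat:.2f}, P = {p_value:.3f}")
```

Under these assumptions the statistic is 0.52 and the P value is about 0.77, in line with the conclusion that the positive-selection model does not fit significantly better.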
V3 loop cross-pattern tissue analysis. Figure 6A again shows the V3 data set genealogy with sequences assigned to 11 distinct subgroups: 10 based on the 10 monophyletic clades within the brain and 1 including the sequences within the semen-lymph node cluster. Figure 6B is the alignment of the consensus sequences for each subgroup with the ancestral sequence at the midpoint root of the tree. Meninges (MENg1), spinal cord (SCg1 and SCg2), and frontal lobe (FLg1 and FLg2) subgroups share exclusive mutations at positions 16 (E16K), 32 (S32G), 61 (T61E), and 62 (K62N/E). The same five subgroups also show the loss of the third N-linked glycosylation site indicated in Fig. 6B, which is commonly conserved among different subtypes (13a). In addition, an extra phosphorylation site is present at positions 43 to 46 in the meninges (MENg3) and the semen-lymph node subgroups.
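Gains and losses of putative modification sites of the kind described above can be screened with simple pattern matching. The following sketch assumes the commonly used PROSITE-style definitions, N-{P}-[ST]-{P} for N-glycosylation and [ST]-x-[RK] for protein kinase C phosphorylation; whether the study relied on exactly these patterns is an assumption. The peptide is a toy example rather than one of the sequences analyzed here, and the names `PATTERNS` and `find_sites` are hypothetical.

```python
import re

# PROSITE-style motifs written as regular expressions. The lookahead
# wrapper lets overlapping matches be reported.
PATTERNS = {
    "N-glycosylation": re.compile(r"(?=(N[^P][ST][^P]))"),
    "PKC phosphorylation": re.compile(r"(?=([ST].[RK]))"),
}


def find_sites(peptide):
    """Return {motif name: [(1-based start position, matched residues), ...]}."""
    peptide = peptide.upper()
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(peptide)]
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    toy = "CTRPNNNTRKSIHI"       # illustrative peptide only
    mutant = "CTRPNNNARKSIHI"    # same peptide with T8 replaced by A
    print(find_sites(toy))
    print(find_sites(mutant))
```

In the toy peptide, replacing the threonine at position 8 removes both the sequon and the overlapping phosphorylation motif, which illustrates how a single substitution can change the predicted modification pattern.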
FIG. 6. Signature analysis of ancestral sequences; inferred HIV-1 V3 loop ancestral sequences from both lymphoid and nonlymphoid tissues. (A) HIV-1 V3 loop ancestral sequence of the midpoint ancestor (indicated by the green circle) of the tree from Fig. 2 (same color code) and ancestral sequences inferred for the MRCA of each tissue-specific monophyletic clade. Inferred amino acid sequences are shown next to the clade they belong to, with dashes indicating identical residues with respect to the midpoint ancestral sequence. The amino acids on the tip of the V3 loop are given in orange within the box. (B) V3 alignment of the midpoint ancestral sequence (ANCESTOR) and the consensus sequences of the strains belonging to each monophyletic clade indicated in panel A. The sequences are aligned to HXB2. Dots represent identical amino acid residues with respect to the midpoint ancestral sequence. The alignment is numbered according to the beginning of the HXB2 envelope protein by using the Sequence Locator Tool (http://www.hiv.lanl.gov). Putative N-glycosylation and protein kinase C phosphorylation sites along the sequences are highlighted in red and blue, respectively. Gray highlights indicate encoding targets for T-cell reactivity (CTL epitopes). In the brain sequences, only one CTL epitope motif that is associated with HLA type B7 is present. Within the gray highlighted region of HXB2, six CTL epitope motifs are found which are associated with a wide range of HLA types (A*0201, A11, A2, A3, D, Dd, H2, H2D, and B27).
In a study like the present one where brain biopsies from living patients are impractical, if not unethical, it is obviously impossible to obtain serial samples over time. The samples used in the analysis were collected at the time of death. The inferred trees represent a "snapshot" of HIV-1 genealogy since the time of the last viral bottleneck in those tissues (25, 34). Therefore, the MRCA at the root of the tree in Fig. 2 is unlikely to be the original common ancestor of the brain sequences. However, the inferred genealogy still contains information about the demographic history in the brain of this patient. The neutral evolution of the V3 loop, known to be usually under strong immune selection, could be explained by the T-cell depletion in the patient. On the other hand, the striking presence of quite different viral evolutionary clocks in the meninges and in the temporal lobe, instead of a uniform molecular clock as predicted by the neutral theory, may offer an explanation for the continuous activation/accumulation of macrophages in the brain during the AIDS phase. We suggest that after the failure of the immune system, newly produced viral variants, which would be rapidly cleared under normal conditions, begin to productively infect macrophages. To use Williams and Hickey's words, "in what becomes a self-amplifying cycle, infected macrophages activate other macrophages in the vicinity," and so on. In turn, more macrophages become infected and even more HIV-1 variants are produced and fixed in the viral population over time, giving rise to the increased evolutionary rate apparent in our data. "Macrophage dysregulation" (38), i.e., the accumulation, activation, and infection of macrophages in the temporal lobe that can eventually lead to neurodegeneration, is initiated by the production of new viral variants that the immune system is no longer able to control. 
At the same time, the cycle of continuous macrophage accumulation/infection is responsible for the increased evolutionary rate of the HIV-1 subpopulation in the temporal lobe. The scenario is loosely analogous to the finding of a 200-times-faster evolutionary rate in a human retrovirus evolving under neutrality that was observed as a consequence of a higher transmission rate within a network of injecting drug users (27).
As shown in Fig. 6B, the variation in the amino acid signature of the V3 loop leads to a loss or acquisition of different glycosylation sites. However, the phylodynamic patterns observed suggest that the main cause of the HIV-1 genetic variation in the brain is random genetic drift rather than positive selection. Neuronal injury is believed to result from both the direct effects of viral proteins and the indirect effects mediated by macrophage activation and resulting secretion of neurotoxic products (1). Carbohydrate binding proteins on the cell surface or macrophage endocytosis receptors interact with oligosaccharides on HIV-1 and can affect its infectivity. Therefore, the nonselective or random loss of glycosylation sites may also promote the macrophage activation cycle, thereby increasing the inflammatory response and the secretion of cytotoxic products, which are implicated in the onset of HAD (17).
The HLA haplotype of this patient is unknown; however, it is interesting that when screened for all known CTL epitope motifs, the sequences contained only a single epitope, which is associated with HLA type B7. In contrast, a known T-cell-tropic isolate, HXB2, contains multiple CTL epitope motifs that correspond to a variety of HLA haplotypes and span the tip of the V3 loop (amino acid positions 308 to 325). If the patient was indeed HLA B7, then the epitope has nonselectively disappeared during the evolution of the viruses. Alternatively, if the patient was not HLA B7, then no known CTL epitopes exist in this population of brain sequences. The sequences isolated from brain regions typically involved in HAD, spinal cord and frontal lobe white matter, do not contain any known epitopes and are less glycosylated. Additional glycosylation or reglycosylation has been shown to be related to a more efficient virus in macrophages (18), and the altering of an epitope has a quantitative as well as qualitative impact on CTL recognition (41). A macrophage-tropic variant of HIV-1 not containing dominant, cross-reactive T-cell epitopes might contribute to enhanced viral spread in vivo, especially in the context of primary infection. The recently described immune activation/viral-load "set point" study (7) demonstrated that individuals with primary infection develop an activation/viral-load set point that is essentially predictive of their long-term survival. Higher levels of activation predict worse long-term outcomes. One possibility, given the finding of V3 epitope-free HIV-1 strains in a HAD patient's brain, is that individuals infected by strains of virus missing T-cell epitopes would have a greater likelihood of developing a greater body burden of HIV-1. 
This initial spread, not regulated properly by T-cell function, would yield a significant load of long-lived HIV-infected macrophages capable of seeding HIV-1 throughout the body and relatively resistant to highly active antiretroviral therapy.
Finally, it is important to emphasize that generalizing from the present work should be done with caution because only one patient has been examined. However, our findings point out that a description of the viral phylodynamic (10) processes in the brain may be invaluable in helping to address the details of HIV-1 intrapatient evolution and crucial for understanding the pathogenesis of HAD. Most importantly, the study suggests a plausible mechanism for macrophage dysregulation as a cause of neurodegeneration, underlining the need for effective antiretroviral treatments targeting not only T cells but also macrophages to successfully treat patients suffering from dementia and ultimately to eradicate HIV-1 infection.
Marco Salemi is supported by grants AI065265 and HD32259 and the Department of Pediatrics of the University of Florida, Gainesville. Susanna Lamers is supported by grants NCI-U01-CA66529-12 and NCI U01-CA96230-04, grants for the AIDS and Cancer Specimen Resource, and NIH-R01 MH73510-01.