Journal of Virology, August 2007, p. 8507-8514, Vol. 81, No. 16
0022-538X/07/$08.00+0 doi:10.1128/JVI.02683-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

David C. Nickle,1 Jian Yan,2 Gerald H. Learn,1 Laura Heath,1 David Weiner,2 and James I. Mullins1*
1 Department of Microbiology, University of Washington, Seattle, Washington 98195-8070
2 School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Received 5 December 2006/ Accepted 20 May 2007
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
The plasticity of the HIV genome allows the generation of enormous numbers of viable mutants, resulting in circulating sequences that can differ by more than 30% in the maximally variable env gene. Since the genetic diversity of HIV-1 will continue to increase over the many years required for a vaccine to be developed, clinically tested, manufactured, and deployed, it is crucial to focus on vaccine sequence designs that can mitigate the effect of this diversity and to develop a substantially greater understanding of the structure-function relationships within the viral proteome to enhance development of antiretroviral therapeutics. Specifically, the efforts described here are directed toward designing HIV vaccine sequences that would embody as many common features of all circulating strains as possible while retaining functionality.
Several methods can be implemented to minimize the genetic distance to all known extant viruses. Consensus sequences (CON) correspond to the most frequent amino acid or nucleotide at each site within a gene sequence alignment. However, a CON is subject to sampling bias and may not retain covariable sites, since it does not consider evolutionary history (25). Our approaches seek to design antigens encompassing conserved structural features through phylogenetics-informed algorithms. HIV-1 phylogenies approximate a star-like shape, in which most circulating strains have diverged approximately equally from a central point: the most recent common ancestor (ANC). A prototypical strain that embodies the ANC will be more genetically similar to all circulating strains than any one strain chosen at random (if star phylogenies are accurate for HIV evolutionary history). The ANC is also expected to conserve the amino acid covariation required for proper protein folding (because the ancestor sequence is an estimate of a sequence that actually existed in viable viruses), thus improving the likelihood that the protein will function. However, the presence of outlier sequences in the phylogeny can bias the amino acid sequence against a large number of the intended circulating viral targets (12).
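The CON definition above (the most frequent residue at each alignment column) can be illustrated with a minimal Python sketch. The function name `consensus`, the alphabetical tie-breaking rule, and the toy alignment are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter


def consensus(alignment):
    """Return the most frequent character at each column of an alignment."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Assumption: ties are broken alphabetically so the output is deterministic.
        best = min(counts, key=lambda ch: (-counts[ch], ch))
        result.append(best)
    return "".join(result)


# Hypothetical three-sequence alignment, for illustration only.
toy_alignment = ["ACGT", "ACGA", "TCGA"]
print(consensus(toy_alignment))
```

Because each column is treated independently, a sketch like this makes the limitation noted above concrete: nothing in the calculation links covarying sites or accounts for how the sequences are related.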
To ensure the robustness of ancestral reconstructions against such a long-branch bias, we developed a novel algorithm, elaborated here, to identify the point on the phylogenetic tree that represents the minimum of a metric of evolutionary distance. This position, called the Center of Tree (COT), minimizes the evolutionary distance to all sampled circulating strains, while still residing on an evolutionary path, to better capture the biological properties of circulating viruses (24). We have applied this novel phylogenetics-based design strategy to computationally derive genes encoding HIV-1 proteins that have important biological functions and that are frequent immunologic targets: Gag, Tat, and Nef. COT antigens were synthesized de novo and then characterized experimentally to evaluate their functionality and their immunogenicity.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Given a tree with $n$ tips and a function $F$ of the distances $l_1, l_2, \ldots, l_n$ from a point $p$ on the tree to each of the tips, a COT is a point, $p_{\mathrm{COT}}$, satisfying the following relationship: $F(p_{\mathrm{COT}}: l_1, l_2, \ldots, l_n) \le F(p: l_1, l_2, \ldots, l_n)$ for all points $p$, where the notation $F(p: l_1, l_2, \ldots, l_n)$ highlights the fact that distance $l_i$ depends on point $p$. In this description, the form of $F$ is general; specific choices for $F$ can be made based on the intended application. The general algorithm below is applicable for most useful continuous $F$s. For a given choice of $F$, a different algorithm may be more efficient. We describe such an algorithm to find COT when $F$ is the mean of the squares (MS) of the $l$ values. Depending on $F$, it is possible that more than one COT will exist for a given tree, but for many reasonable choices of $F$ the COT will be unique.
General algorithm.
We first show that for a certain large class of functions, $F: \mathbb{R}^n \to \mathbb{R}$ ($\mathbb{R}$ is the set of real numbers; the function does not have an infinite number of local minima, i.e., COT points), there is a finite number of points along the tree, each corresponding to a possible COT, and that we can enumerate these points and determine which are in fact a COT.
For an unrooted tree $T$ of $n$ tips, there is a set of $u \le 2n - 2$ nodes, counting tips and internal nodes, and a set of $w \le 2n - 3$ branches, including internal and external branches ($u$ and $w$ are less than their maxima when polytomies [nodes with more than three branches] exist in the tree). For each particular node, $q_j$, $1 \le j \le u$, we can calculate $c_j = F(q_j: l_1, l_2, \ldots, l_n)$. Each $c_j$ is the value of $F$ at a candidate COT.
We then determine the candidate COT points for each branch, $b_k$, $1 \le k \le w$. Note that each branch, for instance, branch $k$, is flanked by two nodes, called $R_k$ and $L_k$. Let the branch length of $b_k$ be $l$. Now the tree is divided into right and left parts, so that if the tree had a root within branch $k$, the tips $a_1, a_2, \ldots, a_n$ would be divided into two groups: those descended from $R_k$ and those descended from $L_k$. Suppose there are $s$ right tips and $t$ left tips. Let the distances from the right tips to $R_k$ be written $\rho_1, \ldots, \rho_s$, and the distances from the left tips to $L_k$ be written $\lambda_1, \ldots, \lambda_t$. Now let a point, $p$, lying on branch $b_k$ be the distance, $x$, from the right node $R_k$. The distance from $p$ to $L_k$ is then $l - x$.
For branch $b_k$ and $p$ defined along it as described above, we then have $F(p: l_1, \ldots, l_n) = F(p: \rho_1 + x, \ldots, \rho_s + x, \lambda_1 + l - x, \ldots, \lambda_t + l - x) = F_{b_k}(x)$, $0 < x < l$.
In other words, on any branch $k$ of $T$ we can completely express the function $F$ of $n$ distances for every point $p$ along that branch as a function of a single variable, $x$. By our assumption that $F$ has a finite number of extreme points, the functions $F_{b_k}(x)$ have a finite number of minima for $x$ between 0 and $l$. Because $F$ is continuous, those minima can be found by standard numerical methods, and each minimum $\hat{x}$ is associated with a point $p$ as described above. Suppose there are $v$ such points over all $w$ branches; then we can write $d_i = F(p_i: l_1, \ldots, l_n) = F_{b(p_i)}(\hat{x}_i)$, for $1 \le i \le v$, where $b(p_i)$ is the branch associated with point $p_i$ (not necessarily the $i$th branch). Each $p_i$ is then a candidate COT, since if $p_i$ is to minimize $F$ among all points on the tree, it must at least minimize $F$ on those points comprising the branch on which $p_i$ resides. Since the nodes and branches contain all points on the tree, we have enumerated all possible COTs in the $q_j$ and the $p_i$.
Therefore, any and all points $p \in \{q_1, \ldots, q_u, p_1, \ldots, p_v\}$ that satisfy $F(p: l_1, \ldots, l_n) = \min\{c_1, \ldots, c_u, d_1, \ldots, d_v\}$ are the only COTs for tree $T$ given function $F$.
Phylogenetic trees can be expressed in computer programs as data structures that can be efficiently traversed by recursive routines that isolate each node and branch individually and systematically. The decomposition above formally describes the tasks to be performed upon consideration of each node and branch. While the algorithm is executed, the points and function values are stored, and the final determination of a COT sequence is accomplished by identifying the minima of the list of values and their associated points after the tree data structure has been completely traversed.
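The enumeration of node and branch candidates described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the scripts used in the study: the edge-dictionary tree representation, the names `tip_distances`, `mean_squares`, and `find_cot`, and the use of a simple grid scan along each branch (in place of a proper numerical minimizer) are all hypothetical choices made to keep the example short.

```python
from collections import defaultdict


def tip_distances(edges):
    """Return (tips, dist), where dist[node][tip] is the path length on the tree.

    edges: dict mapping (node_a, node_b) -> branch length (unrooted tree).
    """
    adj = defaultdict(dict)
    for (a, b), length in edges.items():
        adj[a][b] = length
        adj[b][a] = length
    tips = sorted(n for n in adj if len(adj[n]) == 1)
    dist = {n: {} for n in adj}
    for tip in tips:
        # Walk outward from each tip; paths in a tree are unique.
        stack = [(tip, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            dist[node][tip] = d
            for nxt, length in adj[node].items():
                if nxt != parent:
                    stack.append((nxt, node, d + length))
    return tips, dist


def mean_squares(distances):
    """One possible F: the mean of the squared distances to the tips."""
    return sum(d * d for d in distances) / len(distances)


def find_cot(edges, F=mean_squares, steps=1000):
    """Evaluate F at every node and on a grid inside every branch; return the minimum."""
    tips, dist = tip_distances(edges)
    candidates = []
    # Node candidates (the c_j values in the text).
    for node in dist:
        candidates.append((F([dist[node][t] for t in tips]), ("node", node)))
    # Branch candidates (the d_i values), approximated here by a grid scan.
    for (a, b), length in edges.items():
        for i in range(1, steps):
            x = length * i / steps  # distance from node a along the branch
            ds = [min(dist[a][t] + x, dist[b][t] + length - x) for t in tips]
            candidates.append((F(ds), ("branch", a, b, x)))
    return min(candidates, key=lambda c: c[0])


# Hypothetical four-tip tree with one internal branch of length 2.
toy_edges = {("A", "u"): 1.0, ("B", "u"): 1.0, ("u", "v"): 2.0,
             ("v", "C"): 1.0, ("v", "D"): 1.0}
value, location = find_cot(toy_edges)
print(value, location)
```

For this symmetric toy tree the minimum falls at the midpoint of the internal branch, where every tip is 2 units away. A grid scan only approximates the branch minima; a production implementation would replace it with a numerical minimizer or, for the mean of squares, a closed-form solution.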
Algorithm to find points minimizing the mean squared distance from the points to the tips.
For the COT used in this study, we used the equation $F(p: l_1, \ldots, l_n) = \frac{1}{n}\sum_{i=1}^{n} l_i^2$, the mean of the squared distances from the tips to point $p$. The COT obtained by minimizing this function essentially balances the average length of the branches on either side of point $p$ and thereby provides a point that yields a single reconstructed sequence with the maximum amount of sequence similarity to all the tips, given the evolutionary constraints of nucleotide change along the tree branches. As in the general algorithm, we decompose the tree into nodes and branches, enumerate all possible COT sequences, and calculate $F$ for each possibility. The point with the minimum $F$ is the COT. We can also express the function $F$ in terms of quantities that can be efficiently calculated as the tree is traversed recursively; this allows the algorithm to accumulate the quantities $c_i$ and $d_i$. Below we describe the method of identifying possible COT points and calculating $c_i$ and $d_i$ based on these quantities, and then we describe the recursion equations for these quantities that can be used in the tree-traversal algorithm.
Nodes.
Consider each node $q_i$ a temporary root of the tree, and suppose $q_i$ has $k$ descendant branches, each of which defines a subtree with $t_m$ tips, $1 \le m \le k$. $F$ then can be written in the following manner:

$$c_i = F(q_i: l_1, \ldots, l_n) = \sum_{m=1}^{k} \pi_m \mathrm{MS}_m, \qquad \mathrm{MS}_m = \frac{1}{t_m}\sum_{j=1}^{t_m}\left(l_j^{(m)}\right)^2,$$

where $\pi_m = t_m/n$, the proportion of the $n$ tips in the $m$th subtree, and $l_j^{(m)}$, $1 \le j \le t_m$, is the distance from $q_i$ to each of the $t_m$ tips.
Each $\mathrm{MS}_m$ is therefore the mean of squared distances from node $q_i$ to the tips associated with subtree $m$, considering node $q_i$ as the root, and each $\pi_m$ is the proportion of the $n$ tips of the entire tree associated with subtree $m$.
Branches.
With this function $F$, there exists at most one possible COT on any branch. Consider a branch of length $l$ with left and right nodes as described in the general algorithm above, and consider a point, $p$, within the branch. Let $M_L$ be the simple average of the distances from the left node to the left tips and $M_R$ be the average of the distances from the right node to the right tips. Suppose that there are $t$ left tips and $s$ right tips, and then use the equation $\pi = t/n$. Now, define $\hat{x}$ as follows:

$$\hat{x} = \frac{(1 - \pi)(M_R + l) - \pi M_L}{l}.$$

A possible COT exists on the branch only if $0 \le \hat{x} \le 1$ is true, and it is the distance $\hat{x}l$ from the left node along the branch. If there is such a point, then the value of $F$ at that point, $d_i$, can be written as $d_i = \pi(1 - \pi)(M_L + M_R + l)^2 - \pi(M_L^2 - \mathrm{MS}_L) - (1 - \pi)(M_R^2 - \mathrm{MS}_R)$, where MS is the mean of the squared distances from the left or right nodes to their descendant tips, as indicated. Finally, the COT is the point associated with the smallest value among the $c_i$ and $d_i$.
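As a hedged illustration of the branch step, the sketch below locates the stationary point of the mean-squared-distance function along one branch by setting its derivative with respect to position to zero, and then evaluates $F$ there. It assumes that the means and mean squares are measured from the left and right nodes to their own descendant tips; the function name `branch_candidate` and its argument names are hypothetical.

```python
def branch_candidate(t, s, m_l, ms_l, m_r, ms_r, length):
    """Return (distance from the left node, F value), or None if no minimum lies on the branch.

    t, s        : number of tips on the left and right sides of the branch
    m_l, ms_l   : mean and mean squared distance from the left node to the left tips
    m_r, ms_r   : mean and mean squared distance from the right node to the right tips
    length      : branch length
    """
    pi = t / (t + s)  # proportion of all tips that lie on the left side
    # Stationary point of F along the branch, measured from the left node.
    x = (1 - pi) * (m_r + length) - pi * m_l
    if not 0 <= x <= length:
        return None
    # Value of the mean squared distance at that point.
    d = (pi * (1 - pi) * (m_l + m_r + length) ** 2
         - pi * (m_l ** 2 - ms_l)
         - (1 - pi) * (m_r ** 2 - ms_r))
    return x, d


# Hypothetical branch: two tips at distance 1 on each side, branch length 2.
print(branch_candidate(2, 2, 1.0, 1.0, 1.0, 1.0, 2.0))
```

Because $F$ restricted to a branch is a quadratic in the position along that branch, there is at most one stationary point, which is why a closed-form check of this kind can stand in for numerical minimization.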
Recursions to calculate M and MS.
Suppose node $q$ has $k$ descendant nodes. Each of the $k$ nodes is connected to $q$ by a branch of length $l_i$, and each is the root of a subtree having $s_i$ tips, $1 \le i \le k$. Suppose that, for each subtree, the mean distance $M_i$ and the mean squared distance $\mathrm{MS}_i$ from node to tips have been calculated. The mean distance $M_q$ and mean squared distance $\mathrm{MS}_q$ from $q$ to all $s = s_1 + \cdots + s_k$ descendant tips then are given by $M_q = \pi_1(M_1 + l_1) + \cdots + \pi_k(M_k + l_k)$ and $\mathrm{MS}_q = \pi_1(\mathrm{MS}_1 + 2l_1M_1 + l_1^2) + \cdots + \pi_k(\mathrm{MS}_k + 2l_kM_k + l_k^2)$, where $\pi_i = s_i/s$ for $1 \le i \le k$.
These quantities can thus be built up as the tree is recursively traversed and can be used in the calculations described above.
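The recursions for the mean and mean squared distances can be sketched as a short post-order traversal. The nested-list tree encoding (a tip is any non-list object; an internal node is a list of `(child, branch_length)` pairs) and the name `subtree_stats` are assumptions made for illustration, not the representation used by the authors' scripts.

```python
def subtree_stats(node):
    """Return (tip count, mean distance, mean squared distance) from node to its tips."""
    if not isinstance(node, list):
        # A tip is at distance 0 from itself.
        return 1, 0.0, 0.0
    parts = []
    for child, length in node:
        s_i, m_i, ms_i = subtree_stats(child)
        parts.append((s_i, m_i, ms_i, length))
    s = sum(p[0] for p in parts)
    # Weight each child subtree by its share of the descendant tips (s_i / s).
    m_q = sum(s_i / s * (m_i + l_i) for s_i, m_i, ms_i, l_i in parts)
    ms_q = sum(s_i / s * (ms_i + 2 * l_i * m_i + l_i ** 2)
               for s_i, m_i, ms_i, l_i in parts)
    return s, m_q, ms_q


# Hypothetical rooted tree: tip A at distance 1; tips B and C at distances 3 and 5.
toy_tree = [("A", 1.0), ([("B", 1.0), ("C", 3.0)], 2.0)]
print(subtree_stats(toy_tree))
```

Each node is visited once, so the statistics for every subtree are obtained in a single pass and can then be reused when evaluating node and branch candidates.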
The PERL scripts implementing the above algorithm have been combined into a Web-based tool that is available at http://indra.mullins.microbiol.washington.edu/cgi-bin/COT/cot.cgi.
Application to data. HIV-1 subtype B nucleotide sequences and three subtype D sequences (used as an outgroup) were used to create data sets for Gag (39 sequences), the first exon of Tat (40 sequences), and Nef (37 sequences). Consensus sequences were derived from each data set without the outgroup (21). ANC were estimated from maximum-likelihood (ML) trees generated with PAUP* (32) using the outgroup. The outgroup was then removed, and the tree was imported back into PAUP* to estimate the COT sequence. The three computationally derived nucleotide sequences, CON, ANC, and COT, were added to the data sets, and new ML trees were generated. For each tree, we estimated the average genetic distance between the derived sequence and the sequences used to generate the phylogenies.
Construction and in vitro expression of COT gag, tat, and nef genes. COT gene nucleotide sequences were optimized for expression in human cells by changing the codon usage to that of highly expressed human genes and by reducing the free energy to improve the stability and translation efficiency of the transcripts (35). The optimized COT genes were synthesized by Blue Heron Corp. (Bothell, WA) and subcloned into pcDNA3.1(–) (Invitrogen) at XbaI and NotI sites.
Human embryonic kidney 293T cells were transfected with the COT gag, tat, and nef gene constructs by the calcium phosphate coprecipitation method (3). Briefly, 3 × 10⁵ 293T cells were seeded per well in a 6-well plate and were transfected the next day with 4 µg of plasmid DNA. Forty-eight hours posttransfection, cells were lysed on ice with 0.5% NP-40 lysis buffer supplemented with protease inhibitors (2 µg/ml leupeptin, 2 µg/ml aprotinin, 1 mM orthovanadate, and 1 mM phenylmethylsulfonyl fluoride). Lysates were separated by electrophoresis on denaturing sodium dodecyl sulfate (SDS)-polyacrylamide gels containing 12 to 18% acrylamide, depending on the protein size. Proteins were transferred onto a polyvinylidene difluoride (PVDF) membrane (Immobilon-P; Millipore) according to the Towbin transfer protocol (33). COT Gag and Nef proteins were detected with a 1:2,000 dilution of HIV-1 p24-specific monoclonal antibody (no. 4121; National Institute of Allergy and Infectious Diseases [NIAID]; https://www.aidsreagent.org/index.cfm) and of HIV-1 Nef-specific monoclonal antibody (no. 3689; NIAID), respectively, followed by a 1:4,000 dilution of horseradish peroxidase-labeled anti-mouse immunoglobulin G. COT Tat was detected using a rabbit antiserum (no. 1974; NIAID), and protein-bound antigens were detected with anti-rabbit horseradish peroxidase conjugate. Reactive protein bands were visualized by chemiluminescence using the ECL Plus Western blotting reagent (Amersham).
Virus-like particle production. Virus-like particles were obtained from 293T cell culture supernatants according to standard protocols (27). Briefly, 48 h after transfection of 293T cells with COT Gag expression vectors, culture supernatants were clarified at 3,000 rpm at 4°C for 15 min and filtered through a 0.2-µm-pore-size filter. The filtrate was layered on top of a 20% sucrose cushion and spun at 27,000 rpm at 4°C for 1 h in a Beckman ultracentrifuge using an SW28 rotor. The pelleted virus-like particles were suspended in TNE buffer (10 mM Tris, 100 mM NaCl, 1 mM EDTA, pH 7.9).
Trypsin digestion assay. The virus-like particle fractions were incubated for 30 min at 37°C in the presence of trypsin (2 µg/ml) with or without Triton X-100. Alternatively, a protease inhibitor cocktail (30 µM aprotinin, 435 µM leupeptin, 1 mM phenylmethylsulfonyl fluoride) was added to the virus-like particle fractions prior to the incubation with trypsin (200 µg/ml) and with or without 1% Triton X-100. Reactions were stopped by adding Laemmli sample loading buffer (42 mM Tris-HCl, pH 6.8, 1.7% [wt/vol] SDS, 8.25% [vol/vol] glycerol, 0.6 M β-mercaptoethanol) and heating at 100°C for 3 min. COT Gag was detected by following the Western blotting protocol described above.
HIV-1 long terminal repeat (LTR) activity in the absence or presence of COT Tat.
293T cells (1 × 10⁵ to 2 × 10⁵) were seeded in 6-well plates and transfected by the calcium phosphate coprecipitation technique (3) with various amounts of plasmids, but the total amount of DNA was kept constant (2 µg) by addition of the empty plasmid vector pcDNA3.1(–) DNA. Forty-eight hours posttransfection, cells were stained with 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal) to detect β-galactosidase expression, and blue cells were scored. Basal transcription of pHIVlacZ (no. 151; NIAID) was determined with 200 and 400 ng of plasmid DNA in at least three different transfections. Tat activation of transcription was determined with 200 and 400 ng of pHIVlacZ cotransfected with 200, 400, or 800 ng of COT Tat exon 1 in at least three experiments.
Major histocompatibility complex class I (MHC-I) downregulation assay. 293T cells (10⁵) were seeded in 6-well plates and transfected by the calcium phosphate coprecipitation technique with 2 µg of pcDNA3.1-COTnef, pcDNA3.1nef-NL4-3, or pcDNA3.1nef-mock (having a defective reading frame with the inactivating mutations at the 5' end of nef [29]). Forty-eight hours posttransfection, the cell surface expression levels of the HLA-1 allele were analyzed with anti-HLA-ABC antigen-phycoerythrin by flow cytometry as described previously (29).
Immunization of mice. The plasmids encoding COT proteins were digested with BamHI and NotI, and the COT inserts were cloned into pVAX (Invitrogen, Carlsbad, CA) to generate the constructs pVAX-COTGag, pVAX-COTTat, and pVAX-COTNef. Immunizations were also done using a mock pVAX plasmid and pGag02CAM, which encodes a primary HIV-1 clade B Gag.
All immunizations were carried out with groups of three 5- to 6-week-old female BALB/c mice. DNA vaccines were administered at a dose of 100 µg at days 0, 14, and 28. Immunizations were performed under anesthesia by injection into the anterior tibial muscle in the hind legs. Mice were sacrificed at day 35. The animals were cared for according to the regulations and guidelines of the University of Pennsylvania Institutional Animal Care and Use Committee.
Gamma interferon (IFN-γ) ELISPOT assays.
One week after the last immunization, splenocytes were isolated and red blood cells were lysed by suspension in 2 ml red blood cell lysis buffer-spleen for 5 min. The cells were then washed in phosphate-buffered saline and resuspended in RPMI 1640 medium with 10% fetal bovine serum. Cells were counted and prepared for analysis.
ELISPOT assays were performed using high-protein-binding immunoprecipitation 96-well multiscreen plates coated with monoclonal antibody to mouse IFN-γ. Responses were mapped using HIV overlapping peptide libraries corresponding to the CON from subtypes A and B and group M. Briefly, 15-mer overlapping peptide library pools of either Gag, Tat, or Nef were added to 2 × 10⁵ cells per well and were incubated for 24 h at 37°C in a 5% CO₂ incubator. All tests were performed in triplicate. After addition of the detection antibody, color development was monitored until spots were visible, and the plates were air dried. Wells were imaged and spots were counted by an automated ELISPOT reader (CTL Analyzers, Cleveland, OH) using the ImmunoSpot software and were analyzed as described above. The average number of spot-forming cells (SFC) was adjusted to 1 × 10⁶ splenocytes for data plotting. For each immunization experiment, pVAX was used as an internal control, and the average number of SFC obtained with the mock pVAX plasmid was subtracted from the numbers obtained with the COT plasmids.
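The normalization described above (triplicate wells averaged, scaled from 2 × 10⁵ cells per well to 10⁶ splenocytes, then corrected by the mock pVAX background) amounts to simple arithmetic, sketched below. The function names and the spot counts are hypothetical and are shown only to make the calculation concrete; they are not data from the study.

```python
def sfc_per_million(spot_counts, cells_per_well=2e5):
    """Average replicate wells and scale the result to 1e6 splenocytes."""
    mean_spots = sum(spot_counts) / len(spot_counts)
    return mean_spots * 1e6 / cells_per_well


def background_subtracted(test_counts, mock_counts, cells_per_well=2e5):
    """Subtract the mock-plasmid average from the test-plasmid average (both per 1e6 cells)."""
    return (sfc_per_million(test_counts, cells_per_well)
            - sfc_per_million(mock_counts, cells_per_well))


# Hypothetical triplicate spot counts, for illustration only.
cot_wells = [40, 44, 48]
mock_wells = [2, 4, 6]
print(background_subtracted(cot_wells, mock_wells))
```

With 2 × 10⁵ cells per well, the scaling factor is 5, so each spot counted in a well corresponds to 5 SFC per 10⁶ splenocytes before background subtraction.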
| RESULTS |
|---|
|
|
|---|
In vitro expression of COT Gag, Tat, and Nef proteins. To evaluate the expression patterns of each of the three COT constructs, 293T cells were transiently transfected with plasmid DNA, and cell lysates were analyzed by immunoblotting (Fig. 2). An antibody specific to the viral core protein p24 recognized the Gag precursor p55 protein within cells and in cell supernatants (Fig. 2A) as well as a cleavage product of approximately 41 kDa. COT Tat exon 1 migrated with the expected rate, corresponding to a molecular mass of approximately 14 kDa (Fig. 2B). COT Nef was also well expressed and was detected at 27 and 25 kDa (Fig. 2C), as expected due to the presence of alternative start codons. The codon-optimized Nef COT protein also was expressed at higher levels than Nef proteins corresponding to sequences from different HIV-1 patients or from the HIV-1 NL4-3 isolate (data not shown).
ELISPOT assays using overlapping libraries of peptides corresponding to CON from subtypes A and B and group M. Each of the three COT DNA immunogens induced specific IFN-γ ELISPOT responses when assayed with peptide pools corresponding to subtype B CON peptides and, in some cases, subtype A (Gag) and group M (Gag and Nef) CON peptides. Additionally, pCOTGag elicited stronger IFN-γ ELISPOT responses than pGag02MAC, which encodes a circulating strain of HIV-1 subtype B (Fig. 6).
| DISCUSSION |
|---|
|
|
|---|
Phylogenetics-based ancestral sequence reconstructions like the COT may produce products that are distinct from those of CON reconstructions, as recent results demonstrate that the behavior of an ancestral protein need not be an average of those of its descendants. Gaucher and colleagues (13), for example, have suggested that the ancestor of modern mesophiles lived at higher temperatures than its descendants by showing that the ancestral proteins could function at 55°C. By comparing central protein sequences derived from different datasets, we found some variable positions among the sequences. Previously, it was reported that differences in the ancestral sequences mostly derived from the method used to root the phylogenetic tree (28). This emphasizes the need to carefully consider the sequence collection used to generate the tree and to assess the reliability of the tree prior to deriving a COT sequence.
We and others have suggested that reconstructions of central HIV sequences like COT, CON, and ANC could be used to develop vaccine antigens (12, 24). Ancestral sequences are intended to be as similar as possible to all the strains of a given subtype, and therefore they should induce immune system coverage broader than that of any individual native viral protein. We therefore recreated ancestral HIV-1 antigens for Gag, Tat, and Nef, which are potentially critical immunologic targets, given their immune reactivity (9) and their roles in the virus life cycle (36). Since we had recently derived an HIV-1 B ANC sequence for the env gene and tested its immunogenicity in rabbits (8), we chose not to reconstruct any COT Env. The structural HIV-1 protein Gag is highly conserved and is among the most common targets of the virus-specific cell-mediated immune response, while Tat and Nef are more variable, yet critical, regulatory proteins important in viral gene expression and pathogenesis, and they frequently induce T-cell immune responses early in infection. Engineered COT proteins were well expressed, a fundamental requirement for the immunogenicity of vaccine candidates, especially for the usually poorly expressed HIV-1 proteins (16, 40). Moreover, these proteins are capable of eliciting CTL immune responses in mice. It should be noted that COT Gag elicited strong cross-clade CTL responses in studies of reactivity against subtype A and B and group M peptide pools. Detailed mapping studies are still needed to formally demonstrate that COT immunogens can induce CTL responses broader than those induced by native HIV-1 antigens. However, the increase in the magnitude of the CTL responses observed for the COT antigens strongly suggests that CD4 help was improved by these designs, implying a positive effect on class II responses.
Our data confirm the utility of phylogenetic tools to select and construct novel functional ancestral gene sequences in the pursuit of understanding the core features of viral proteins required for function and for a broadly protective vaccine against HIV. Our approach takes advantage of the rapid accumulation of sequence data to rationally design HIV antigens. Among the most vexing challenges of HIV therapeutics, as well as of vaccine design, is the enormous capacity of HIV-1 to mutate and subsequently to become drug resistant or evade host immune responses. The critical problem posed by the extreme diversity of HIV is exacerbated by the long development and testing cycle of a new vaccine, which means that the variability of HIV will likely have changed considerably in the meantime. In this regard, it may be too optimistic to expect that the isolate-based candidate vaccines (currently in phase II and III human clinical trials) could be cross-reactive enough to protect against circulating viruses. Thus, using a central antigen such as COT might contribute to improvements over isolate-based vaccine approaches (26). Although more studies are necessary to determine whether central sequences will elicit cross-reactive responses sufficient to be protective, recent studies with a second-generation CON group M env vaccine showed that it elicited improved levels of neutralizing antibodies in guinea pigs, in some cases stronger and broader than those of contemporary isolates (19). Further complicating the development of HIV antigens is the propensity of HIV to escape virus-specific CTLs, underscoring the importance of HIV-specific cellular immune responses in the control of the virus (2, 18, 20). Viral escape from host immunity thus represents a substantial hurdle for candidate CTL vaccines. 
COT antigens potentially could help avoid escape by allowing more CTL responses to be generated by capitalizing on the larger representation of epitopes in COT designs than the representation of epitopes in circulating isolates.
Apart from the critical importance of analyzing the genetic diversity of contemporaneous HIV from a global perspective, additional experimentation on ancestral-state sequences is valuable for the study of HIV molecular evolution. Protein reconstruction provides an unusual opportunity to study the pathways and mechanisms of functional changes during molecular evolution, since the mechanistic basis and dynamics of this process can be tracked in detail in vitro, allowing some fundamental questions to be rigorously examined (7, 37). Importantly, though analyzing ancestral proteins in the context of extant cells could lead to experimental artifacts, this risk is minimized in the case of HIV due to the enormous evolutionary rate of HIV compared to that of its host. That is, the environment of the virus has remained essentially unchanged during viral evolution since its leap from nonhuman primates to humans in the last century (38). Ancestral reconstructions can shed additional light on the common functional elements of HIV strains as well as the lineage-specific evolutionary changes that led to the multiple contemporaneous variations of these common elements. Hence, although biological properties are often studied using mutational analysis, we suggest that switching to the ancestral-state amino acid would be a pertinent means to investigate an amino acid's role. Our improving understanding and modeling of the molecular evolution of HIV, along with a better knowledge of the correlates of immune protection, will allow us to begin to predict how HIV sequences will change and to identify elements that increasingly successful therapies and an ultimately successful vaccine will need to incorporate.
| ACKNOWLEDGMENTS |
|---|
This work was supported by a gift from the Boeing Corporation and by the University of Washington Center for AIDS Research and STDs (NIH P30 AI27757).
| FOOTNOTES |
|---|
Published ahead of print on 30 May 2007.
Present address: Department of Epidemiology and Biostatistics, College of Public Health, Franklin College of Arts & Sciences, University of Georgia, Athens, GA 30602.
| REFERENCES |
|---|
|
|
|---|