**DOI:** 10.1128/JVI.00117-10

## ABSTRACT

The prominent role of antiviral cytotoxic CD8^{+} T-lymphocytes (CD8-TL) in containing the acute viremia of human and simian immunodeficiency viruses (HIV-1 and SIV) has rationalized the development of T-cell-based vaccines. However, the presence of escape mutations in the acute stage of infection has raised a concern that accelerated escape from vaccine-induced CD8-TL responses might undermine vaccine efficacy. We reanalyzed previously published data of 101,822 viral genomes of three CD8-TL epitopes, Nef_{103-111}RM9 (RM9), Tat_{28-35}SL8 (SL8), and Gag_{181-189}CM9 (CM9), sampled by ultradeep pyrosequencing from eight macaques. Multiple epitope variants appeared during the resolution of acute viremia, followed by the predominance of a single mutant epitope. By fitting a mathematical model, we estimated the first acute escape rate as 0.36 day^{−1} within escape-prone epitopes, RM9 and SL8, and the chronic escape rate as 0.014 day^{−1} within the CM9 epitope. Our estimate of SIV acute escape rates was found to be comparable to very early HIV-1 escape rates. The timing of the first escape was more highly correlated with the timing of the peak CD8-TL response than with the magnitude of the CD8-TL response. The transmitted epitope decayed more than 400 times faster during the acute viral decline stage than predicted by a neutral evolution model. However, the founder epitope persisted as a minor population even at the viral set point; in contrast, the majority of acute escape epitopes were completely cleared. Our results suggest that a reservoir of SIV infection is preferentially formed by virus with the transmitted epitope.

A critical role of CD8^{+} T-lymphocytes (CD8-TL) in controlling the peak of acute viral replication has been demonstrated both in HIV-1 (10, 31, 57) and experimental SIV infections (51). HIV-1-infected patients with strong HIV-1-specific CD8-TL responses early after the onset of the acute retroviral syndrome showed more effective control of primary viremia than patients with low or undetectable virus-specific CD8-TL activity (10). Delayed HIV-1-specific CD8-TL responses within an acutely infected individual were found to be one factor contributing to the patient's persistent viremia, symptoms, and low CD4^{+} T-cell counts (31). A close temporal association between the magnitude of immunodominant B57-restricted HIV-1-specific CD8 T cells and viral load was observed (57). In nonhuman primate models, the effect of CD8^{+} T cells on acute viral containment has been more directly probed by administering an anti-CD8 antibody to transiently deplete CD8^{+} lymphocytes from the peripheral blood (51). The resolution of peak viremia was much slower in the CD8^{+} lymphocyte-depleted rhesus macaques than in the untreated control animals (51).

CD8-TL responses provide selective pressure within human leukocyte antigen (HLA)-restricted regions of the viral genome, which can select for escape variants. Understanding the kinetics of viral escape has important implications for the development of T-cell-based vaccines. Recently, in acutely infected HIV-1 subjects, single-genome amplification (SGA) and sequencing have shown that while only random mutations were observed prior to peak viremia (50), CD8-TL escape mutations were prominent as early as 20 to 30 days after the acute peak of viremia (24), well before the establishment of the viral set point. Indeed, it was observed that the emergence of viral escape mutants occurred coincidently with the expansion of the epitope-specific CD8-TL population in the acutely infected host, and that it resulted in amino acid substitutions in the transmitted/founder virus that diminished recognition by CD8-TL specific for the original (transmitted) epitope (24).

Quantitatively, the average rate of CD8-TL escape mutation within 20 days of HIV-1 infection since the first screening has been estimated as 0.33 day^{−1} (24). This early escape rate is substantially greater than the chronic escape rate, which has been estimated as 0.04 day^{−1} (6). However, these prior estimates (6, 24) have been based on Sanger sequencing data from a limited number of virus clones. The availability of ultradeep pyrosequencing methods provides the opportunity to revisit these estimates using much richer data sets, which can detect mutations with a frequency of as little as 1% (8). The quantification of the rate of CD8-TL escape in SIV and HIV-1 is important, since it can serve as a surrogate measure of the magnitude and effectiveness of the host CD8-TL response. Mathematical models have been developed to quantify the process of viral CD8-TL escape (6, 7, 23), which framed the escape phenomenon as a synergetic outcome of the differences of wild and mutant epitopes in terms of susceptibility to cytotoxic T-lymphocyte (CTL) killing versus their intrinsic viral fitness.

The goal of the present study was to quantify escape dynamics within three well-defined CD8-TL epitopes by rigorously analyzing both previously published and newly generated ultradeep pyrosequencing data from a set of eight SIV-infected macaques (8). Bimber and colleagues (8) previously demonstrated multifarious patterns of CTL escape in these SIV-infected macaques, and a recently published analysis of the same data set by Hughes et al. revealed that the persistence of low levels of inoculum sequence and its consistent loss kinetics enable the reliable inference of the wild-type sequence when only samples from later in infection are available for study (26). Here, we used the same extensive sequence data set, in combination with newly generated data, to quantify viral escape dynamics for three well-defined CD8-TL epitopes relative to the transmitted (wild-type) epitope sequence. By fitting a mathematical model of CD8-TL escape (6) to the experimentally determined CD8-TL escape kinetics, we compared the rate of the first CD8-TL escape of the escape-prone epitopes, Nef_{103-111}RM9 and Tat_{28-35}SL8, to that of the escape-resistant epitope, Gag_{181-189}CM9. For this purpose, we define the time to first CD8-TL escape as the time when the first CD8-TL escape mutant comprises 50% of the combined population of the transmitted (wild) sequence and the first escape mutant clone. This definition is different from the timing of the first emergence of amino acid variants within an epitope. Our definition can be used when individual clones are obtained either by single-genome amplification (42, 49) or pyrosequencing (32, 48).

In this study, by employing a rich data set from ultradeep pyrosequencing, we tested the hypothesis that the transmitted epitope contributes to the formation of a reservoir of infection. Our results suggest that this is indeed the case, and they also suggest that viral reversion (13, 21, 34, 37) is complicated in some cases by the unexpected persistence of wild-type, transmitted virus strains long after initial infection.

## MATERIALS AND METHODS

Animal selection and viral infection, viruses, and epitopes. A major portion of the experimental data that were reanalyzed in this paper has been published previously, including much of the acute deep-sequencing data (8) and the tetramer staining for the four *Mamu-A1*001*-positive Indian rhesus macaques (54). The latter animals have been identified in prior work using two different identification systems (UW [8] and UC-Davis [54]): (i) 31107 (rh2122), (ii) 31824 (rh2126), (iii) 31157 (rh2127), and (iv) 31689 (rh2124). New data in this paper are the deep-sequencing results for chronic escape kinetics of the Mamu-A1*001-restricted Gag_{181-189}CM9 (CM9) epitope and the tetramer staining data for the Mamu-A1*001-restricted Tat_{28-35}SL8 (SL8) and Mamu-A1*001-restricted Gag_{181-189}CM9 epitopes in the four Indian rhesus macaques (rh2122, rh2124, rh2126, and rh2127). All of the data sets were kindly provided by Benjamin N. Bimber, Benjamin J. Burwitz, Matt Reynolds, David Watkins, and David O'Connor.

Briefly, four Mauritian cynomolgus macaques (MCM) from the Wisconsin National Primate Research Center, cy0161, cy0162, cy0163, and cy0165, were infected intrarectally (i.r.) with 50,000 50% tissue culture infectious doses (TCID_{50}) of SIVmac239 (8). All four animals expressed Mafa-A1*063. Four *Mamu-A1*001*-positive Indian rhesus macaques, rh2122 (31107), rh2124 (31689), rh2126 (31824), and rh2127 (31157), were infected i.r. with SIVmac239. We studied three different CD8-TL epitopes: Mafa-A1*063-restricted Nef_{103-111}RM9 (RM9) for cy0161, cy0162, cy0163, and cy0165, and Mamu-A1*001-restricted Tat_{28-35}SL8 and Mamu-A1*001-restricted Gag_{181-189}CM9 for rh2122, rh2124, rh2126, and rh2127. Three of the latter animals (rh2122, rh2126, and rh2127) were vaccinated with constructs encoding a combination of CTL epitopes including CM9 and full-length proteins (Tat, Rev, and Nef) by using a DNA prime and a recombinant modified vaccinia virus Ankara vector (rMVA) prior to the i.r. challenge with SIVmac239. The original vaccination study was published in reference 54. This immunization regimen elicited a very strong anamnestic CD8-TL response, with the containment of acute-phase virus replication but no containment of virus replication in the chronic phase (54).

Sample preparation, ultradeep pyrosequencing, and tetramer analysis. Sample preparation, ultradeep pyrosequencing, sequence analysis, and tetramer analysis were done as described in reference 8.

Shannon information entropy. We assessed the level of the diversity of epitope variants by measuring Shannon information entropy (52). Shannon information entropy at time *t* is defined as
$$mathtex$$\[S(t) = -\sum_{i=1}^{N(t)} f_{i}(t)\,\log_{2} f_{i}(t)\]$$mathtex$$(1) where *N*(*t*) is the number of distinct epitope variants that are present at time *t*, and *f*_{i}(*t*) is the proportion of the *i*th epitope variant. The more diverse the epitopes present in amounts comparable to one another, the greater the resulting entropy value.
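Equation 1 can be evaluated directly from per-variant read counts. The following is a minimal sketch, assuming the input is a list of read counts per distinct epitope variant at one time point; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of epitope variants, from read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one read")
    entropy = 0.0
    for c in counts:
        if c > 0:  # variants with zero reads contribute nothing
            f = c / total  # proportion f_i(t) of this variant
            entropy -= f * math.log2(f)
    return entropy

# Hypothetical read counts, for illustration only.
single_variant = shannon_entropy([1000])              # one epitope only
four_equal = shannon_entropy([250, 250, 250, 250])    # four equally common variants
```

A population consisting of a single epitope gives an entropy of 0, and four variants at equal frequency give 2 bits, which matches the qualitative statement that entropy grows as more variants are present in comparable amounts.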

Mathematical model of CD8-TL escape. As introduced by Asquith et al. (6) and other studies (23, 38), the kinetics of the CTL escape variant is described by the dynamics of two different infected cell populations,
$$mathtex$$\[\frac{dw(t)}{dt} = a\,w(t) - b\,w(t) - c\,w(t)\]$$mathtex$$(2)
$$mathtex$$\[\frac{dm(t)}{dt} = a'\,m(t) - b\,m(t)\]$$mathtex$$(3) where *w*(*t*) is the number of cells productively infected with wild-type virus and *m*(*t*) is the number of cells infected with mutant virus. The replication rate of wild-type virus is given by *a*, and mutant virus replicates at a rate of *a*′. Both wild-type- and mutant-infected cells are cleared at rate *b*, and wild-type-infected cells are eliminated by extra CD8-TL recognizing only wild-type epitope at rate *c*. The rate constants *a*, *b*, and *c* have no time dependence, which is a simplification made in the model. The acute stage of SIV/HIV-1 infection clearly involves dynamic variations in the rate constants. An alternative model, which relaxed the approximation of the constant rates, has studied the effect of time dependence in the rate constants on the estimation of the escape rate (23).

The proportion of escape mutants is denoted as *p*(*t*), which is defined as *m*(*t*)/[*w*(*t*) + *m*(*t*)]. By solving equations 2 and 3, we have
$$mathtex$$\[p(t) = \frac{1}{1 + g\exp(-kt)}\]$$mathtex$$(4) where *g* = *w*(0)/*m*(0) and *k* = *c* − (*a* − *a*′). Here, the rate of escape *k* is determined by the difference between the rate of the CD8-TL killing of wild-type virus and the fitness cost of the mutant virus, *a* − *a*′. The timing of escape mutations, *T*_{50}, is defined as the time when the mutant virus of interest comprises one-half of the aggregate population of wild-type virus plus the mutant virus, i.e., *p*(*T*_{50}) = 0.5. From equation 4, we have *T*_{50} = log(*g*)/*k*.

Equation 4 is also a logistic model, where log{*p*(*t*)/[1 − *p*(*t*)]} is assumed to be a linear function of time with slope *k*. For *k* > 0, *p*(*t*) is an increasing function taking on values between 0 and 1. We apply this model to pyrosequencing data when the proportion of a mutant clone is increasing relative to that of the wild-type (transmitted) virus.
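The linearity of log{*p*/(1 − *p*)} in time suggests a quick consistency check: regress the logit of the observed proportions on sampling day and read *k* from the slope and log(*g*) from the negated intercept. The sketch below is an illustration of that property only; it is not one of the two fitting methods used in this study, and the function names and the synthetic data (generated from equation 4 with illustrative values *k* = 0.36 and *T*_{50} = 22) are assumptions.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def fit_logit_line(times, proportions):
    """Ordinary least squares of logit(p) on t; returns (k, g, t50)."""
    ys = [logit(p) for p in proportions]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    k = sxy / sxx                     # slope = escape rate k
    intercept = y_mean - k * t_mean   # intercept = -log(g)
    g = math.exp(-intercept)
    t50 = math.log(g) / k             # time at which p(t) = 0.5
    return k, g, t50

# Synthetic, noise-free proportions generated from equation 4.
true_k, true_t50 = 0.36, 22.0
true_g = math.exp(true_k * true_t50)
days = [14, 17, 21, 24, 28]
props = [1.0 / (1.0 + true_g * math.exp(-true_k * d)) for d in days]
k_hat, g_hat, t50_hat = fit_logit_line(days, props)
```

Because the logit is undefined at proportions of exactly 0 or 1, this shortcut cannot use time points at which the mutant is absent or fixed, which is one practical reason to prefer fitting equation 4 on the proportion scale.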

The model is appropriate for analyses of the early phase of CD8-TL escape for a given epitope, when the frequency of the escape mutant is increasing, but would not be appropriate if applied to a longer time window, during which an initial escape CD8-TL mutant may be replaced by other mutants.

Data fitting: least squares and beta regression. We employed two independent methods for the best-fit analysis using data from each animal separately to estimate animal-specific curves. Both methods assume equation 4 is a reasonable model for *p*(*t*). The methods differ in how *g* and *k* are estimated. The first method is the nonlinear least-squares regression method with the Levenberg-Marquardt algorithm (45). In this method, the parameter values *g* and *k* are chosen to be the values that minimize the sum of squared differences between the logistic curve *p*(*t*) = 1/[1 + *g* exp(−*kt*)] and the data points *m*(*t*)/[*m*(*t*) + *w*(*t*)],
$$mathtex$$\[\sum_{t}\left[\frac{1}{1 + g\exp(-kt)} - \frac{m(t)}{m(t) + w(t)}\right]^{2}\]$$mathtex$$(5) where the sum is over every sampling time, *t*.
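The objective in equation 5 is straightforward to write down. The sketch below, with hypothetical function names and hypothetical read counts, evaluates the sum of squared errors for a candidate (*g*, *k*) pair; it does not implement the Levenberg-Marquardt search itself. In practice the function could be handed to an existing optimizer, for example SciPy's `scipy.optimize.least_squares` with `method="lm"`, which is an assumption about tooling rather than a statement of what was used here.

```python
import math

def logistic_p(t, g, k):
    """Model proportion of mutant virus at time t (equation 4)."""
    return 1.0 / (1.0 + g * math.exp(-k * t))

def sum_squared_errors(params, times, wild_counts, mutant_counts):
    """Equation 5: squared model-minus-observed differences, summed over t."""
    g, k = params
    total = 0.0
    for t, w, m in zip(times, wild_counts, mutant_counts):
        observed = m / (m + w)  # observed mutant proportion at this time
        total += (logistic_p(t, g, k) - observed) ** 2
    return total

# Hypothetical read counts at three sampling days, for illustration only.
times = [14, 21, 28]
wild = [900, 400, 50]
mutant = [100, 600, 950]
sse_a = sum_squared_errors((2750.0, 0.36), times, wild, mutant)
sse_b = sum_squared_errors((2750.0, 0.10), times, wild, mutant)
```

Comparing `sse_a` with `sse_b` shows how the objective separates a plausible escape rate from an implausibly slow one for the same initial ratio *g*.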

Alternatively, since the data values for the proportion of mutant virus are restricted to the range of 0 to 1, we used the method of beta regression (17). In beta regression, the deviations from the fitted model are properly represented using a beta distribution, which is defined only for values between 0 and 1. Let the mean value of the proportion of the mutant virus at time *t* be the fitted value in equation 4, *E*(*t*) = 1/[1 + *g* exp(−*kt*)]. We assume that the observed values of the proportion of the mutant virus come from a beta distribution with mean *E*(*t*) =1/[1 + *g* exp(−*kt*)] and variance *E*(*t*)[1 − *E*(*t*)]/(1 + φ), where φ is the dispersion parameter. Here, larger values of φ mean less variability around the fit.

The likelihood function for beta regression is
$$mathtex$$\[L(g,k,\varphi) = \prod_{t}\frac{\Gamma(\varphi)}{\Gamma[E(t)\varphi]\,\Gamma[(1 - E(t))\varphi]}\,p(t)^{E(t)\varphi - 1}\,[1 - p(t)]^{[1 - E(t)]\varphi - 1}\]$$mathtex$$(6) where the product is over every sampling time *t*. We used the values of the parameters that maximize equation 6 (maximum-likelihood estimates) to estimate *g* and *k*.
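For numerical work it is usually more convenient to maximize the logarithm of equation 6. The following minimal sketch evaluates that log-likelihood with the standard library's `math.lgamma`; the function name is hypothetical, and the observed proportions are assumed to lie strictly between 0 and 1, since the beta density is not defined at the endpoints.

```python
import math

def beta_log_likelihood(g, k, phi, times, observed):
    """Log of equation 6 for observed mutant proportions in (0, 1)."""
    ll = 0.0
    for t, p in zip(times, observed):
        mean = 1.0 / (1.0 + g * math.exp(-k * t))  # E(t) from equation 4
        a = mean * phi            # first beta shape parameter
        b = (1.0 - mean) * phi    # second beta shape parameter
        ll += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
               + (a - 1.0) * math.log(p)
               + (b - 1.0) * math.log(1.0 - p))
    return ll
```

As a sanity check, with *g* = 1, *t* = 0, and φ = 2 the shape parameters are both 1, so the beta density is uniform and the log-likelihood is 0 for any observed proportion.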

Estimating standard errors. Let ĝ and *k̂* be a set of parameter values that minimizes the sum of squared errors for the least-squares method or that maximizes the likelihood for the beta regression method. The standard errors of the parameter estimates ĝ and *k̂* were estimated by the asymptotic covariance matrix. This is the inverse of the Hessian matrix, defined as the matrix of second derivatives of the function being optimized (the sum of squared errors in equation 5 for the least-squares method and the likelihood equation 6 for the beta regression method) with respect to the parameters *g* and *k*. The inverse of the approximate Hessian at the solution, ĝ and *k̂*, is the approximate asymptotic covariance matrix,
$$mathtex$$\[\mathrm{Cov}(\hat{g},\hat{k}) = H(\hat{g},\hat{k})^{-1}\]$$mathtex$$(7) where *H*(ĝ, *k̂*) is the approximate Hessian at the solution.

The diagonal elements of the covariance matrix are the variances of the parameter estimates, and the off-diagonals are the covariance between them,
$$mathtex$$\[\mathrm{Cov}(\hat{g},\hat{k}) = \left[\begin{array}{ll}\mathrm{var}(\hat{g}) & \mathrm{cov}(\hat{g},\hat{k})\\ \mathrm{cov}(\hat{g},\hat{k}) & \mathrm{var}(\hat{k})\end{array}\right]\]$$mathtex$$(8) and the standard errors are given by the square roots of the diagonal elements, σ(ĝ) = √var(ĝ) and σ(*k̂*) = √var(*k̂*).

These standard errors can be used to create approximate confidence intervals, because the parameter estimates are asymptotically normally distributed if they are maximum-likelihood estimates for the parameters. For the nonlinear least-squares method, the estimates are maximum-likelihood estimates only if the deviations from the fitted model have a normal distribution. For the beta regression method, the estimates are the maximum-likelihood estimates.

A point estimate of the timing of escape mutations can be found by plugging in the point estimates for *g* and *k*, *T̂*_{50} = log(ĝ)/*k̂*. To find approximate standard errors for the estimates of *T̂*_{50}, we use the delta method (4). Assuming the asymptotic normality of the parameter estimates ĝ and *k̂* (see the last paragraph), the approximate standard error for *T̂*_{50} = log(ĝ)/*k̂* is given by
$$mathtex$$\[\sigma(\hat{T}_{50}) = \sqrt{\left(\frac{\partial \hat{T}_{50}}{\partial \hat{g}}\right)^{2}\mathrm{var}(\hat{g}) + \left(\frac{\partial \hat{T}_{50}}{\partial \hat{k}}\right)^{2}\mathrm{var}(\hat{k}) + \frac{\partial \hat{T}_{50}}{\partial \hat{g}}\,\frac{\partial \hat{T}_{50}}{\partial \hat{k}}\,\mathrm{cov}(\hat{g},\hat{k})}\]$$mathtex$$(9) where σ(ĝ), σ(*k̂*), and cov(ĝ, *k̂*) are the standard error of ĝ, standard error of *k̂*, and covariance of ĝ and *k̂*, respectively. Estimated standard errors of *T̂*_{50} (the timing of escape) and *k̂* (the rate of escape), both from nonlinear least-squares regression and beta regression, are listed in Table 1.
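The partial derivatives needed for the delta method follow from *T*_{50} = log(*g*)/*k*: ∂*T*_{50}/∂*g* = 1/(*gk*) and ∂*T*_{50}/∂*k* = −log(*g*)/*k*^{2}. The sketch below plugs these into the textbook first-order delta method, in which the covariance term carries a factor of 2; equation 9 as written shows that term without the factor, so the factor should be treated as an assumption of this sketch. The function name is hypothetical.

```python
import math

def t50_standard_error(g, k, var_g, var_k, cov_gk):
    """First-order delta-method standard error of T50 = log(g) / k."""
    dT_dg = 1.0 / (g * k)            # partial derivative with respect to g
    dT_dk = -math.log(g) / (k * k)   # partial derivative with respect to k
    variance = (dT_dg ** 2 * var_g
                + dT_dk ** 2 * var_k
                + 2.0 * dT_dg * dT_dk * cov_gk)  # textbook cross term (assumed)
    return math.sqrt(variance)
```

When the estimates of *g* and *k* are uncorrelated, the cross term vanishes and both forms of the formula agree.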

Acute sequence evolution model. We have developed a model for characterizing sequence evolution in acute HIV infection (9, 29, 33). This model incorporates key virologic parameters, including the basic reproductive ratio (53), generation time (40, 43), and the single-cycle error (base substitution) rate of viral reverse transcriptase (39). Each new infection entails a single round of reverse transcription that introduces errors in the proviral DNAs with the number of mutations given by the binomial distribution, Binom(*n*; *N*_{B}, ε), where *n* is the number of new base substitutions, *N*_{B} is the length of the viral genome, and ε is the reverse transcriptase error rate. The binomial distribution implies that base substitutions occur independently, with a probability of ε at each of the *N*_{B} sites of the viral genome in each reverse transcription cycle. In the absence of any selective pressure (including CD8-TL pressure), the fraction of transmitted viral sequences is expected to decay exponentially as
$$mathtex$$\[\exp\left[-\varepsilon N_{B}\left(\frac{t}{\tau_{a}}\,\frac{1 + \varphi}{2\varphi} + \frac{1 - \varphi}{\varphi^{2}}\right)\right]\]$$mathtex$$(10) where *t* is days postinfection, τ_{a} is viral generation time, *R*_{0} is basic reproductive ratio, and φ = 1 + 8/*R*_{0}. We used the parameter values of τ_{a} = 1.5 days, *R*_{0} = 6, and ε = 2.16 × 10^{−5} substitutions per site per replication cycle (33). This acute sequence evolution model has been applied to interpret sequence clones derived from 102 subjects with acute HIV-1 subtype B infection (29), from 69 subjects with acute HIV-1 subtype C infection (1), and from numerous SIV-infected macaques (9, 30).
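Equation 10 can be evaluated with a few lines of code. In the minimal sketch below, the function name is hypothetical, φ is passed in as a number computed beforehand from *R*_{0} as defined in the text, and the number of sites *N*_{B} is left to the caller because the region length used for each epitope is not stated here (27 bases for a nine-residue epitope would be one illustrative choice, not a value from the study). The defaults for ε and τ_{a} are the values quoted above.

```python
import math

def transmitted_fraction(t_days, n_bases, phi, eps=2.16e-5, tau=1.5):
    """Expected fraction of unmutated transmitted sequence (equation 10)."""
    growth = (t_days / tau) * (1.0 + phi) / (2.0 * phi)  # term linear in time
    offset = (1.0 - phi) / (phi ** 2)                    # time-independent term
    return math.exp(-eps * n_bases * (growth + offset))
```

For small ε*N*_{B} the predicted fraction stays close to 1 over the first weeks of infection, which is the neutral baseline against which the observed loss of the transmitted epitope is compared.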

## RESULTS

First viral immune escape. The development of escape mutations in the acute phase of HIV-1 and SIV infections is well documented (5, 24, 41). We set out to determine the concrete dynamics of the emergence of each mutant during this phase of infection. For this study, we rely on a combination of previously published (8) and newly generated pyrosequencing data (see Materials and Methods) together with samples from three animals that received a previously described DNA prime/rMVA boost vaccine regimen prior to i.r. challenge with SIV_{mac239} (54).

Figure 1 shows the percentage of each viral variant out of the total sample viral population within the Mafa-A1*063-restricted epitope Nef_{103-111}RM9 (RM9) for animals cy0161, cy0162, cy0163, and cy0165 as well as for the Mamu-A1*001-restricted epitope Tat_{28-35}SL8 (SL8) for animals rh2122, rh2124, rh2126, and rh2127. Viral load kinetics in each animal also are plotted in Fig. 1. As noted previously, peak acute viremia was reduced in the three vaccinated rhesus macaques (rh2122, rh2126, and rh2127), but virus replication in the chronic phase was not contained (54).

During the initial viral ramp-up stage, the wild-type epitope sequence, which corresponds to the transmitted (founder) virus, was the dominant population in every animal. Escape variants started to appear between days 14 and 21 postinfection, which coincides with the decline of peak viremia. While ultradeep sequencing also showed additional minor variants (8), for simplicity, Fig. 1 presents the percentage of epitope sequence variants that represented more than 10% of the viral population at a minimum of one time point. During the period of day 14 through day 56, a few variant peptides were observed (Fig. 1). As the viral set point was approached, however, a single escape variant came to dominate the virus population. The dominant variant represented more than 50% of the total viruses in all animals, although minor variants remained present.

The sequence of the dominant escape mutant at the set point was not uniform in all of the animals. In cynomolgus macaques cy0161 and cy0162, the variant RPQVPLRTM (K3Q) became dominant by day 140 postinfection, while in animals cy0163 and cy0165 the variant RSKVPLRTM (P2S) became dominant, consisting of more than 80% of the epitope population as the viral set point was being established. Amino acid substitutions (underlined in the sequences) and their positions in the sequences were used to name the variants; for example, the variant with a change of K to Q, RPKVPLRTM to RPQVPLRTM, is designated K3Q. We also observed that the dominant mutant at the viral set point in cy0163, RSKVPLRTM (P2S), appeared rather earlier in cy0161 and cy0162 during the viral decline stage and was selected out afterwards. This observation shows that the escape pattern within a single epitope restricted by the MHC class I molecule, Mafa-A1*063, can be different from animal to animal. Similar findings were obtained in the group of rhesus macaques. In animal rh2126, variant PTPESANL (S1P) replaced the founder virus in the acute stage and persisted until day 140 postchallenge. On the other hand, in animal rh2124, this mutant was subdominant and another variant SIPESANL (T2I) became the major escape variant (Fig. 1). In animal rh2127, as many as six mutant viruses emerged during the acute phase, each representing greater than 10% of the population, and the variant STPESAKP (N7K, L8P) ultimately became dominant at the set point.

Hughes et al. recently examined this same data set and concluded that the random nature of virus mutagenesis is an unexpectedly powerful driver of individual-specific differences in the emergence and dominance of different epitopes (26). When several epitope variants emerge within a single animal, their dominance pattern may reflect that (i) intrinsic fitness differences among variants depend on host factors such as a unique pattern of CD8-TL responses and (ii) the existence of additional compensatory mutations outside the epitope can drive the predominance of a particular epitope in the animal.

Shannon information entropy analysis. Although the intrinsic escape pattern was unique in each animal, a common characteristic was the emergence of diverse epitopes during the early decline from peak viremia, followed by the selection of a single dominant escape variant at the viral set point. A selective sweep occurred as a result of positive selection (46). To better evaluate this observation, we assessed the level of the diversity of the epitope variants by measuring the Shannon information entropy (52), which is defined in equation 1 in Materials and Methods. The greater the number of different variants that are present in comparable amounts, the greater the entropy value. Figure 2 shows the dynamics of Shannon entropy from eight animals. Initially, as the transmitted (founder) epitope is dominant within each animal, the value of the entropy is close to 0. Entropy reached peak levels at 3 to 4 weeks postinfection in all animals except rh2124, denoting that more variants within the epitope region appear in comparable frequency among the variants. This is consistent with recent findings from three acutely HIV-1-infected subjects (24), in which the phase of viral decline after peak viremia was marked by the appearance of diverse variants within CD8-TL epitopes and the entropy decreased as the viral set point was established. The entropy values at viral set point (day 140 postinfection) and the values at the viral decline after the peak (day 28 postinfection) exhibited a statistically significant difference (*P* = 0.00088, paired *t* test).

Estimating the rate and the timing of the first CD8-TL escape. We next estimated the rate of CD8-TL escape, *k*, and the timing of the first CD8-TL escape, *T*_{50}, in each animal by fitting our model equation, the logistic curve in equation 4, to the serial CD8-TL escape kinetics data (see Materials and Methods). The timing of the first CD8-TL escape, *T*_{50}, is defined as the time (in days) when the proportion of the first CD8-TL escape mutant is 50% relative to the aggregate population of wild-type virus plus that of the escape mutant virus that emerges first in each animal. The first CD8-TL escape mutant was designated the mutant virus comprising more than 10% of the total population for the first time postinfection. When the total virus population was comprised of more than a single mutant virus at the same time, we chose the mutant virus that was present in the greater proportion as the first CD8-TL escape mutant. As described in Materials and Methods, we employed two different methods for data fitting. The first method is the nonlinear least-squares regression with the Levenberg-Marquardt minimization algorithm (45), which chooses *g* and *k* to minimize the sum of squared errors between the logistic curve *p*(*t*) = 1/[1 + *g* exp(−*kt*)] and the data points *m*(*t*)/[*w*(*t*) + *m*(*t*)]. Since the data values for the proportion of mutant virus are restricted to the range from 0 to 1, beta regression was used as a second data-fitting method (17). In beta regression, the deviations from the fitted model are properly represented using a beta distribution. Figure 3 shows the best fits of the model to the kinetics of the first escape in each animal from the least squares and beta regression. The proportion of each first mutant epitope relative to the sum of the mutant and wild-type epitopes, defined as *m*(*t*)/[*w*(*t*) + *m*(*t*)] (see Materials and Methods), is plotted as a function of time.
For animals cy0161, cy0162, cy0163, and cy0165, the wild-type (transmitted) epitope sequence is RM9, Nef_{103-111}, RPKVPLRTM, and for animals rh2122, rh2124, rh2126, and rh2127, the wild-type (transmitted) epitope sequence is SL8, Tat_{28-35}, STPESANL. Each individual mutant epitope is denoted in Fig. 3; for instance, the first escape variant in animal cy0161 is P2S (Fig. 1). The estimated rates of escape with standard errors are listed in Table 1. The rate of the first escape during the early stage ranges from 0.19 to 1.07 day^{−1}, with a mean of 0.36 day^{−1}. The rates of the first escape in the two escape-prone epitopes, RM9 and SL8, were comparable without a statistical difference (*P* = 0.56, two-tailed Wilcoxon rank sum test). The timing of the first escape ranges from 18.5 to 27.3 days with a mean of 22.1 days. We plotted the estimated timing of the first escape and the estimated rate of the first escape with asymptotic standard errors for each animal in Fig. 4. As shown in Table 1 and Fig. 4, while the least-squares method provides smaller values of the sum of squared errors, the beta regression estimates yield smaller standard errors for the parameter estimates. Although data from each animal were fit separately with no shared information, the beta regression estimates are more consistent across the animals than the least-squares estimates.
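The rule used to designate the first CD8-TL escape mutant (the first mutant to exceed 10% of the total population, with ties at the same time point resolved in favor of the larger proportion) can be sketched as follows. The function names, the data layout (a dictionary mapping each mutant name to its fraction of the total population at each sampling time, excluding the wild type), and the example values are hypothetical.

```python
def first_escape_mutant(times, mutant_fractions, threshold=0.10):
    """Return (name, time) of the first mutant exceeding the threshold.

    If several mutants exceed it at the same sampling time, the one
    with the larger fraction is chosen. Returns None if none qualifies.
    """
    for idx, t in enumerate(times):
        above = {name: fracs[idx]
                 for name, fracs in mutant_fractions.items()
                 if fracs[idx] > threshold}
        if above:
            return max(above, key=above.get), t
    return None

def relative_mutant_proportion(wild, mutant):
    """m(t) / [w(t) + m(t)] at each time; assumes w(t) + m(t) > 0."""
    return [m / (w + m) for w, m in zip(wild, mutant)]

# Hypothetical fractions of the total population, for illustration only.
days = [7, 14, 21]
fractions = {"P2S": [0.00, 0.12, 0.30], "K3Q": [0.00, 0.20, 0.50]}
chosen = first_escape_mutant(days, fractions)
```

The proportions returned by `relative_mutant_proportion` for the chosen mutant are the quantities to which equation 4 is fitted.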

Estimating the rate and the timing of chronic CD8-TL escape. In addition to RM9 and SL8 epitopes, we also measured the kinetics of changes in the CM9, Gag_{181-189}, CTPYDINQM epitope during the infection of rhesus macaques. The selection of mutant variants of this epitope is known to take an extended time period (>100 days) (54), and therefore we used this analysis to understand the process of chronic escape following the establishment of the virus set point. Figure 5 plots the percentage of the wild-type (transmitted) epitope and the variant (escape) epitopes together with viral load kinetics. In macaques rh2124 and rh2126, the transmitted CM9 epitope was preserved during the entire period of analysis, 400 days postinfection, yielding an escape rate of 0 for this time period. In animal rh2122, the transmitted CM9 epitope was dominant until 224 days postchallenge. At the next sampling interval (day 400), the epitope was found to have been replaced with a single mutant, GTPYDINQM (C1G). The transmitted CM9 epitope was 1.9% of the total sampled virus population at 400 days postinfection. Because the change in the percentage of the escape variant happens only between two time points, we did not perform data fitting for animal rh2122. The estimated chronic CD8-TL escape rate for the CM9 epitope in animal rh2127 was 0.014 day^{−1}, which is approximately 26-fold less than the mean CD8-TL escape rate for the escape-prone epitopes RM9 and SL8 (Table 1).

Acute expansion kinetics of epitope-specific CD8^{+} T lymphocytes. We next examined the association between the emergence of viral epitope variants and the expansion kinetics of the acute, epitope-specific CD8^{+} T lymphocytes using previously reported data (8, 54) and a new data set (see Fig. S1 in the supplemental material for the details). In cy0161, cy0162, cy0163, and cy0165, RM9-specific CD8-TL were quantitated in peripheral blood using *Mafa-A1*063* tetramers loaded with Nef RM9 peptide (8); in animals rh2122, rh2124, rh2126, and rh2127, SL8-specific and CM9-specific CD8^{+} T lymphocytes were quantitated using *Mamu-A1*001* tetramers (54) (see Fig. S1 in the supplemental material). The peak CD8^{+} T-cell response occurred at 14 days postinfection in half of the animals (4/8) and slightly later in other cases (day 21 in three cases and day 24 in the other). In all cases except for animal cy0163, the estimated timing of the first CD8-TL escape in each animal was later than the measured peak of the CD8^{+} T-cell response against the wild-type epitope. These data support the notion that selective pressure exerted by CD8^{+} T lymphocytes leads to variations within the RM9 and SL8 epitopes during the acute phase of infection. In contrast, and consistent with previous reports, robust early CD8-TL responses directed against the Gag CM9 epitope failed to elicit the early emergence of CD8-TL escape variants in this epitope. The slow escape of the CM9 epitope despite the presence of strong and early CM9-specific CD8-TL responses has been attributed to the requirement for compensatory substitutions within extraepitopic regions to restore viral fitness (22).

Association between the estimated timing of first CD8-TL escape and the acute CD8-TL response. We investigated which factor, the magnitude or the timing of the peak acute CD8-TL response, was more strongly associated with the timing of the first CD8-TL escape. The magnitude of the CD8-TL response was defined as the area under the curve of the expansion kinetics for each epitope-specific CD8-TL response depicted in Fig. S1 in the supplemental material. The timing of the peak CD8-TL response was defined as the time point when the measured number of tetramer-positive CD8^{+} T cells was greatest. As noted above, samples were collected at time intervals of 1 week from the rhesus macaques (days 7, 14, 21, and 28) and at slightly more frequent intervals from the cynomolgus macaques (days 10, 14, 17, 21, 24, and 28).
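Both response summaries can be computed from a tetramer time course in a few lines. The sketch below assumes trapezoidal integration for the area under the curve, which is one common choice and not necessarily the rule used for this analysis; the function names and the tetramer values are hypothetical, while the sampling days are the weekly rhesus schedule given above.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve by the trapezoidal rule (assumed method)."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (values[i] + values[i - 1]) * width
    return area

def peak_time(times, values):
    """Sampling time with the largest measured value (earliest if tied)."""
    best = max(range(len(values)), key=lambda i: values[i])
    return times[best]

# Weekly sampling days; tetramer-positive percentages are hypothetical.
days = [7, 14, 21, 28]
tetramer = [0.1, 2.0, 1.0, 0.5]
magnitude = trapezoid_auc(days, tetramer)
timing = peak_time(days, tetramer)
```

Because `peak_time` can only return one of the sampled days, the resolution of the peak timing is limited by the sampling interval.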

Figure 6 shows the correlation between the estimated timing of the first CD8-TL escape within escape-prone epitopes RM9 and SL8 and both the magnitude and the timing of the acute peak CD8-TL response. As noted above, the observed timing of the peak CD8-TL response is constrained by the sampling interval used. Despite this limitation, the estimated timing of the first CD8-TL escape was more strongly correlated with the timing of the peak CD8-TL response than with the magnitude of the CD8-TL response. An earlier peak in the CD8-TL response was associated with the earlier estimated timing of first CD8-TL escape (*r* = 0.76) (Fig. 6B). The correlation was statistically significant (*P* = 0.027, F test).
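As a rough illustration of how a correlation coefficient relates to an F test, the sketch below computes Pearson's *r* and the corresponding F statistic for a simple linear regression. It assumes one data point per animal (n = 8), which is our reading of Fig. 6 rather than something stated explicitly, and the paired values in the usage lines are placeholders, not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def f_statistic(r, n):
    """F statistic with (1, n - 2) degrees of freedom for a simple regression."""
    return r * r * (n - 2) / (1.0 - r * r)


# Placeholder pairs: (day of peak response, estimated day of first escape).
peak_days = [14, 14, 21, 24]
escape_days = [18, 20, 25, 27]
print(round(pearson_r(peak_days, escape_days), 3))

# F statistic implied by the reported r = 0.76 if n = 8 (assumption).
print(round(f_statistic(0.76, 8), 2))  # 8.2
```

The *P* value would then be read from an F distribution with 1 and n − 2 degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.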

Kinetics of loss of the transmitted (wild-type) epitope. We next quantified the effectiveness of CD8-TL selection pressure by comparing the loss profile of the transmitted (wild-type) epitope to the predicted decay profile generated by our acute sequence evolution model (9, 29, 33). To do this, we analyzed many hundreds of sequences corresponding to the two epitopes of interest (RM9 and SL8) in the original SIVmac239 inoculum. Out of 283 reads of SL8, we detected only the wild-type sequence, STPESANL. In the region of RM9, 98.92% of 1,104 sequence reads corresponded to the wild-type sequence, RPKVPLRTM, while minor variants accounted for the remaining 1.08% of the total sequence reads.

As described in Materials and Methods, our acute sequence evolution model assumes that the sequence population diversifies without any selection, i.e., under neutral evolution. The model predicted an exponential decay of the transmitted virus as a function of time *t* (days) postinfection as described for equation 10. Figure 7A shows the experimentally determined dynamics of the percentage of each transmitted (wild-type) epitope (RM9 and SL8), plotted as a function of time. Figure 7A also depicts the predicted dynamics of sequence decay based on our mathematical model, along with 95% confidence intervals that were calculated by sampling 1,414 sequences at each time from 1,000 Monte Carlo simulation runs. Here, the sample size of 1,414 in the simulation was chosen based on the average number of sequence reads that were obtained for each sample. Figure 7A shows that the rate of decay of the transmitted (wild-type) epitope in each macaque is greater than that predicted by the model; this suggests that there is strong selection pressure on the two epitope regions analyzed.
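The sampling procedure behind such confidence intervals can be sketched as follows. This is a simplified stand-in, not the simulation used in the study: it assumes that the expected wild-type fraction follows a plain exponential decay at the neutral rate of 3.2 × 10^{−4} day^{−1} reported for equation 10, and that each of the 1,414 reads is an independent draw. The function names and the fixed seed are illustrative choices.

```python
import math
import random

NEUTRAL_RATE = 3.2e-4  # day^-1, decay rate quoted for equation 10


def neutral_fraction(t_days, rate=NEUTRAL_RATE):
    """Expected wild-type fraction under neutral exponential decay."""
    return math.exp(-rate * t_days)


def sampling_interval(t_days, n_reads=1414, n_runs=1000, seed=0):
    """95% interval of the sampled wild-type fraction across simulated runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    p = neutral_fraction(t_days)
    fractions = sorted(
        sum(rng.random() < p for _ in range(n_reads)) / n_reads
        for _ in range(n_runs)
    )
    lower = fractions[n_runs * 25 // 1000]       # 2.5th percentile
    upper = fractions[n_runs * 975 // 1000 - 1]  # 97.5th percentile
    return lower, upper


low, high = sampling_interval(28)
print(f"day 28: expected {neutral_fraction(28):.4f}, "
      f"95% interval {low:.4f} to {high:.4f}")
```

Under these assumptions the interval at day 28 stays close to 100% wild type, so an observed wild-type fraction far below it would be hard to attribute to sampling noise alone.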

Finally, we plotted the loss dynamics of the transmitted (wild-type) RM9 and SL8 epitopes by averaging data from eight macaques (Fig. 7B). Our goal was to quantify how the dynamics of the transmitted epitope are modified by CD8-TL selection. The loss dynamics consist of two distinct stages: (i) between 14 and 28 days postinfection, when peak CD8-TL responses occur, and (ii) 28 days or longer postinfection, when virus replication is moving toward steady-state levels. The rate of loss of the transmitted epitope was 1.4 × 10^{−1} day^{−1} between 2 and 4 weeks after infection. After 4 weeks, as the viral set point was being established, we observed a slower phase of loss, with a rate approximately 10-fold lower (1.0 × 10^{−2} day^{−1}).

The acute sequence evolution model predicts the decay rate of the transmitted epitope as 3.2 × 10^{−4} day^{−1} (see equation 10). This rate is reflective of the replacement of the wild-type epitope under conditions of neutral evolution (i.e., replacement due to random base substitutions that occur as a result of errors during viral reverse transcription, in the absence of any favorable selection for mutants). During the period of peak viremia and subsequent viral decline (days 14 to 28 following virus infection), the decay rate of the transmitted (wild-type) epitope was 438 times faster than that predicted by the model, implying that there is a very strong selective clearance of the transmitted epitope as a result of CD8^{+} T-cell-mediated immune control.
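The fold difference follows directly from the two rates. The short sketch below repeats the arithmetic with the rounded values given in the text and also converts each rate to a half-life, assuming simple exponential decay; the half-lives are derived here for illustration and are not reported in the study.

```python
import math

NEUTRAL_RATE = 3.2e-4    # day^-1, model prediction (equation 10)
ACUTE_RATE = 1.4e-1      # day^-1, observed between days 14 and 28
SET_POINT_RATE = 1.0e-2  # day^-1, observed after day 28


def fold_over_neutral(observed_rate, neutral_rate=NEUTRAL_RATE):
    """How many times faster the observed loss is than the neutral prediction."""
    return observed_rate / neutral_rate


def half_life_days(rate):
    """Half-life of an exponentially decaying fraction with the given rate."""
    return math.log(2) / rate


for label, rate in [("acute", ACUTE_RATE), ("set point", SET_POINT_RATE)]:
    print(f"{label}: {fold_over_neutral(rate):.1f}-fold faster than neutral, "
          f"half-life {half_life_days(rate):.1f} days")
print(f"neutral: half-life {half_life_days(NEUTRAL_RATE):.0f} days")
```

With the rounded rates the acute-phase ratio comes to about 437.5, in line with the reported 438-fold difference, which was presumably computed from unrounded estimates.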

Unexpectedly, despite the robust selective pressure, the transmitted (wild-type) RM9 and SL8 epitopes were found to persist as a minor population in the host until at least 140 days postinfection in all eight animals (8). The median proportion was 5.7% at day 140, suggesting that CD8-TL-mediated viral clearance during the acute stage is not absolute (8).

In contrast, the majority of CD8-TL escape mutant epitopes were completely cleared. Out of a total of nine mutant epitopes that accounted for more than 10% of the total population at one or more time points during the viral decline stage, five mutant epitopes were undetectable at the viral set point. We found that 60% (three) of those five mutant epitopes were present in a greater proportion than that of the transmitted virus during the viral decline phase. The median (mean) proportion of these mutant epitopes at day 140 postinfection was 0% (1.9%). We conclude that the majority of acute escape mutants, except the positively selected dominant mutant, are transient and less capable of establishing a reservoir than the transmitted epitope.

## DISCUSSION

The association between the strength of the virus-specific CD8-TL response and the resolution of acute-stage viremia has been illustrated for both natural HIV-1 and experimental SIV infection (10, 28, 31, 51). CD8-TL-mediated immune pressure also results in the emergence of viral escape mutants, and viral escape is considered a major obstacle to the development of T-cell-based prophylactic vaccines (27, 35, 55).

The current study tracked viral sequence variation within three well-defined CD8-TL epitopes during the experimental infection of cynomolgus and rhesus macaques with SIVmac239. We employed ultradeep pyrosequencing to quantify viral escapes in these CD8-TL epitopes in the SIV-infected macaques. Unlike the single-genome amplification technique, which provides sequence information from only a limited number of viral clones (typically fewer than 50 clones per sample), ultradeep pyrosequencing readily allows for 1,000 or more sequence reads of an epitope of interest. This in-depth coverage of the three epitopes Nef_{103-111}RM9, Tat_{28-35}SL8, and Gag_{181-189}CM9 permitted us to carefully dissect sequential CD8-TL-driven viral strain changes restricted by Mafa-A1*063 (RM9) and Mamu-A1*001 (SL8 and CM9).

Using ultradeep pyrosequencing data, which measure the proportion of each wild-type and mutant clone more precisely than conventional Sanger sequencing, we systematically computed the rates of escape and found substantial differences between escape-prone and escape-resistant epitopes. Mathematical models have framed the escape phenomenon as the interplay between the susceptibility to CD8-TL killing and the viral fitness cost for the wild type versus escape variants; the rate of escape is given by the difference between the rate of CD8-TL killing and the fitness cost of the variant. The relative contribution of CD8-TL killing and viral fitness cost to the escape kinetics differs from epitope to epitope. This is exemplified by the rapid escape of the SL8 epitope but the much slower escape of the CM9 epitope in the Indian rhesus macaques. Despite the presence of strong and early CD8-TL responses to CM9, the high fitness cost of CM9 epitope mutants and the requirement of compensatory substitutions within extraepitopic regions to restore viral fitness result in the delayed emergence of CD8-TL escape mutations within this epitope (22).
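The relation between killing, fitness cost, and escape rate can be made concrete with a minimal sketch. It assumes the commonly used logistic form for the growth of an escape variant, in which the escape rate *k* is the killing rate minus the fitness cost; whether this matches the fitted model in Materials and Methods term for term is an assumption. The initial variant fraction `f0` is a hypothetical value, while the two rates are the acute and chronic estimates reported in this study.

```python
import math


def escape_rate(killing_rate, fitness_cost):
    """Net escape rate: CD8-TL killing of wild type minus variant fitness cost."""
    return killing_rate - fitness_cost


def mutant_fraction(t_days, k, f0):
    """Fraction of escape variant at time t under logistic replacement (assumed)."""
    return f0 / (f0 + (1.0 - f0) * math.exp(-k * t_days))


def days_to_half(k, f0):
    """Time for the escape variant to reach 50% of the population."""
    return math.log((1.0 - f0) / f0) / k


F0 = 1e-3  # hypothetical starting fraction of the escape variant

for label, k in [("acute escape (RM9, SL8)", 0.36), ("chronic escape (CM9)", 0.014)]:
    print(f"{label}: k = {k} per day, "
          f"about {days_to_half(k, F0):.0f} days to reach 50%")
```

Under this form, an epitope with strong killing but a high fitness cost has a small net *k* and escapes slowly, which is the pattern described for CM9.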

Another element that may contribute to slower escape is the diminished quantity and quality of CD8^{+} T-cell responses at the chronic stage. The frequency of CD8^{+} T cells specific for a single epitope has been shown to peak at around 2 weeks postinfection and to decrease around 20-fold as the set point is being approached (36). The quality and effectiveness of the CD8^{+} T lymphocyte response also may diminish over time due to chronic antigenic stimulation, the impairment of CD4^{+} T cells, and damage to lymphoid tissue (56).

Our estimates for the rate of the first viral CD8-TL escape within the escape-prone epitopes RM9 and SL8 were greater than some previous estimates (7) but comparable with others (16, 23, 36). Our estimates also are consistent with those based on quantitative real-time PCR (qRT-PCR) assays (36). The basis of qRT-PCR is different from that of cloning and sequencing; it tracks viral loads of both wild-type and predefined escape mutants. The limitation of this approach is that it permits the measurement of only those mutant viruses for which the assay was designed. Table 2 summarizes these previous estimates along with our estimates. The high-resolution ultradeep pyrosequencing data in the current study provided more accurate estimates for the rate of SIV escape than these earlier studies, which generated escape rates ranging from 0.19 to 1.07 day^{−1}.

Our finding using the SIV/macaque model closely parallels the recent findings from the analysis of viral sequences obtained from three subjects with primary HIV-1 infection (24). Specifically, our data show that more diverse sequence variants, reflected by higher Shannon entropy scores, appear in the viral decline phase after the peak of viremia. Following this period of diversification, viral sequences became homogenized with the positive selection of a single dominant epitope variant at the viral set point. This feature was observed in various HIV-1 epitopes, including Rev_{9-26}, Vif_{113-130}, Gag_{236-253}, Nef_{17-34}, and Env_{822-839} (24).
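For readers unfamiliar with the diversity measure, the sketch below computes the Shannon entropy of an epitope population from a list of reads. The natural logarithm is an assumption (the base is not stated here), and apart from the wild-type SL8 sequence the read labels are placeholders rather than observed variants.

```python
import math
from collections import Counter


def shannon_entropy(reads):
    """Shannon entropy (natural log, assumed) of epitope variant frequencies."""
    counts = Counter(reads)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Homogeneous population: only the wild-type SL8 epitope is present.
homogeneous = ["STPESANL"] * 100

# Diversified population: placeholder labels stand in for escape variants.
diversified = (["STPESANL"] * 40 + ["variant_A"] * 30
               + ["variant_B"] * 20 + ["variant_C"] * 10)

print(shannon_entropy(homogeneous))
print(round(shannon_entropy(diversified), 3))
```

The entropy is zero when a single sequence makes up the whole sample and rises as reads spread across more variants, so a transient peak during viral decline followed by a drop at the set point corresponds to diversification followed by fixation of one dominant variant.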

Our estimate of the rate of viral escape during the acute phase of SIVmac239 infection of macaques was comparable to the rate of viral escape during acute HIV-1 infection of humans. In a recent study of HIV-1 escape (24), when we limited the analysis to the cases where the timing of escape was within 20 days of documented HIV-1 infection after the first screening, the average rate of CD8-TL escape was 0.33 day^{−1}. We note that all three acute subjects analyzed in reference 24 were in Fiebig stage II at the first screening according to reference 50. Here, the Fiebig staging system classifies the status of HIV-1-infected subjects based on an orderly appearance of viral RNA, antigen, and antibodies in plasma during early infection (18, 19). On average, Fiebig stage II (viral RNA^{+}, p24^{+}, before seroconversion) corresponds to around 22 days postinfection (29, 33). Hence, 20 days of documented HIV-1 infection after the first screening corresponds to around 42 days postinfection, which is very similar to the timeframe analyzed in our macaque study. While the average timing of HIV-1 escape was later than that of SIV escape, 34.8 ± 5.2 days (HIV-1) versus 22.1 ± 3.78 days (SIV), the rates of escape were comparable, 0.33 ± 0.09 day^{−1} (HIV-1) versus 0.36 ± 0.36 day^{−1} (SIV). Table 2 compares the rates of HIV-1 and SIV CD8-TL escape.

Our report of comparable rates of very early escapes within humans and macaques is contradictory to a previous meta-analysis study of SIV and HIV-1 CTL escape events (7). Either higher levels of CD8^{+} T cells targeting each peptide or a more efficient killing mechanism by CTL in macaques was suggested to explain the difference (7). Our calculation quantitatively indicates that HIV-1 and SIV escape with comparable rates if we focus on very early events within around 1 or 2 months postinfection. Our quantification is consistent with reference 36, which reported the association of the rate of escape with the timing of escape during primary SIV infections.

We observed an association between the timing of the first escape and the timing of the peak CD8-TL response to the escape-prone epitopes. The correlation coefficient was 0.76, and the correlation was statistically significant (*P* = 0.027). Because viral CD8-TL escape is a surrogate indicator for functional CD8-TL activity, these findings suggest that an early peak in the acute CD8-TL response is more important for the effective CD8-TL-mediated containment of acute virus replication than the peak magnitude of the acute CD8-TL response. However, note that none of the animals in our study were able to control SIV replication during the chronic phase, including the three vaccinated animals that exhibited a very robust anamnestic CD8-TL response. Thus, it remains unclear what characteristic(s) of the early CD8-TL response best predict the sustained immune-mediated suppression of virus replication at the steady state. Numerous studies have addressed how different facets of CD8-TL responses (the magnitude, the breadth, and the functional capacity) are related to the level of containment of HIV-1 and SIV (2, 3, 15, 20, 44, 47).

Viral CTL escape kinetics are characterized by the rapid loss of the transmitted epitope concurrently with a decline from the acute-phase peak of viremia. In the phase of falling viral load, the level of transmitted epitopes decays exponentially (Fig. 7), which is consistent with the trend predicted by the acute-sequence evolution model (9, 29, 33). However, the rate of loss during this period is 438 times faster than that predicted by the acute-sequence evolution model, which is premised on the progressive but nonselective replacement of the transmitted virus population through random base substitutions that result from error-prone reverse transcription (9, 29, 33). The rapid loss of the transmitted epitope supports the existence of a strong selective pressure mediated by CD8-TL responses.

Despite the robust selective pressure, the transmitted (wild-type) RM9 and SL8 epitopes were found to persist as a minor population in the host, even at the viral set point, in all animals (8). The percentage of the transmitted (wild-type) epitope remaining at day 140 ranged from 0.04 to 12.2% (8). Thus, despite strong CD8-TL responses against the transmitted virus and a low fitness cost associated with mutations in the SL8 and RM9 epitopes, the transmitted virus population could not be eradicated in most of the animals. Conversely, more than half of the CD8-TL escape mutant epitopes were completely cleared within the same time frame. Some of these mutant epitopes existed in a greater proportion than founder epitopes during the viral decline stage. Our observation may imply that the initial formation of a reservoir of SIV infection is preferentially driven by transmitted, founder epitopes rather than transient acute escape epitopes, even though some of the acute escape epitope clones were more abundant than the transmitted clone during the phase of acute viremia decline.

Previous results obtained from human subjects with primary HIV-1 infection have shown the complete loss of the transmitted (wild-type) epitope obtained within a similar time period, 159 days after the first preseroconversion screening (24). We believe that this discrepancy can be explained most plausibly by the difference in methodology used in our study versus that of Goonetilleke et al., who performed single-genome amplification (SGA) as a prelude to sequence analysis (24). In this earlier study, only a limited number of clones were analyzed, around 50- to 100-fold fewer than that in the present study. It is likely that the limited number of clones analyzed accounts for the failure to detect the transmitted epitope at late time points postinfection. According to our previous power analysis (29, 33), if we sample 50 clones, we can be 95% confident that the undetected transmitted epitope comprises less than 5.8% of the total virus population. However, if we increase the sample size to 1,414 (as in the present study), we can be 95% confident that the undetected transmitted (wild-type) epitope comprises less than 0.21% of the total virus population. This calculation suggests that the greater depth of coverage possible using ultradeep pyrosequencing compared to that of single-genome amplification permits a much greater sensitivity for the detection of rare virus sequence variants, including residual virus that contains wild-type epitopes. Quantitative real-time PCR (qRT-PCR) assays observed the persistence of the wild-type epitope in SIV infections (36, 37). We do not know the source of the persistence of the transmitted epitope. The transmitted epitope at the viral set point may be derived from productively infected or latently infected CD4^{+} T cells, cellular sources other than CD4^{+} T cells, and/or follicular dendritic cell-associated virions (11, 25, 58).
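The detection limits quoted in the power analysis can be reproduced with a short calculation. The sketch assumes a simple zero-detection bound: if a variant with true frequency *p* is missed in all *n* independently sampled sequences, then (1 − *p*)^{*n*} ≥ 0.05 at the 95% confidence level, so *p* ≤ 1 − 0.05^{1/*n*}. That this is the exact form used in references 29 and 33 is an assumption, although it returns the same figures.

```python
def max_undetected_fraction(n_sequences, confidence=0.95):
    """Upper bound on a variant's frequency when none of n sequences contain it."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_sequences)


for n in (50, 1414):
    bound = max_undetected_fraction(n)
    print(f"n = {n}: undetected variant is below {100 * bound:.2f}% "
          f"of the population (95% confidence)")
```

With 50 clones the bound is about 5.8%, and with 1,414 reads it falls to about 0.21%, so a wild-type population of a few percent at the set point could easily go unseen in a clone-limited data set.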

In conclusion, our study estimated the rate and the timing of CD8-TL-mediated viral escape in experimental SIV infections by employing high-throughput pyrosequencing data. The rates of escape during very early periods of HIV-1 and SIV infections were unexpectedly comparable, implying a prominent role of CD8-TL in shaping viral evolution during early SIV and HIV-1 infections.

## ACKNOWLEDGMENTS

We thank Benjamin N. Bimber, Benjamin J. Burwitz, Matt Reynolds, David Watkins, and David O'Connor for providing the data analyzed in this work, Vitaly Ganusov, Daniel Garrigan, and Alan Perelson for helpful discussions, and Robert Paul Johnson for the critical review of the manuscript.

This work was funded by NIH grant R01 AI083115 (to H.Y.L.). This publication also was supported in part by NIH grant T32 ES 007271 and the University of Rochester Developmental Center for AIDS Research (NIH P30AI078498).

## FOOTNOTES

- Received 18 January 2010.
- Accepted 15 March 2010.

- Copyright © 2010 American Society for Microbiology