**DOI:**10.1128/JVI.80.5.2380-2389.2006

## ABSTRACT

Growth competition assays have been developed to quantify the relative fitnesses of human immunodeficiency virus (HIV-1) mutants. In this article we develop mathematical models to describe viral/cellular dynamic interactions in the assay experiment, from which new competitive fitness indices or parameters are defined. These indices include the log fitness ratio (LFR), the log relative fitness (LRF), and the production rate ratio (PRR). From the population genetics perspective, we clarify the confusion and correct the inconsistency in the definition of relative fitness in the literature of HIV-1 viral fitness. The LFR and LRF are easier to estimate from the experimental data than the PRR, which was misleadingly defined as the relative fitness in recent HIV-1 research literature. Calculation and estimation methods based on two data points and multiple data points were proposed and were carefully studied. In particular, we suggest using both standard linear regression (method of least squares) and a measurement error model approach for more-accurate estimates of competitive fitness parameters from multiple data points. The developed methodologies are generally applicable to any growth competition assays. A user-friendly computational tool also has been developed and is publicly available on the World Wide Web at http://www.urmc.rochester.edu/bstools/vfitness/virusfitness.htm.

In the past few years, there has been a growing interest in determining the replicative capacity of human immunodeficiency virus type 1 (HIV-1) due to the potential impact of viral fitness on virus population size (viral load), drug resistance, and AIDS progression. Although viral fitness data originating from in vitro studies may not directly resemble in vivo clinical results, they offer a tool for studying HIV-1 replication capacity and its relationship with drug resistance mutations. HIV-1 replication fitness has been proposed to be an important predictor of clinical outcome in HIV-1 infection. Replication fitness is the ability of a virus to replicate under the selective forces present in its environment. HIV-1 replication fitness in vivo is postulated to influence which variants predominate in quasispecies, as well as to affect treatment responses and disease progression (16, 19). A critical question is whether the HIV-1 replication fitness measured by assays in vitro correlates with the fitness in vivo and whether such assays can be used to predict prognosis or response to therapy. We will refer to assays that measure replication of HIV-1 in vitro as fitness assays, recognizing that the nomenclature in this area is controversial and that these assays likely do not fully reflect HIV-1 replication fitness in an individual patient. It is essential to quantify relative degrees of fitness in order to evaluate the clinical significance of in vitro fitness assays. However, there is no clear consensus on how best to quantify HIV replication fitness, as measured in vitro.

Fitness is sometimes quantified on the basis of direct biochemical measurements of enzyme activity (4, 16) or virus growth kinetics in pure cultures that contain only one variant (13, 16). According to population genetics theory, the relative fitness (1 + *s*) of a variant represents its relative contribution to the next generation, where the parameter *s* is called the selection coefficient. Experimental protocols on viral fitness have been reviewed recently (19). However, the assays that operate on a single virus variant cannot take the interactions between different virus strains into account. Many studies have therefore investigated the relative growth kinetics of two virus variants growing in competition either in vivo (8, 9) or in vitro (4, 10, 12, 15). In these studies, a measure of fitness is typically derived by plotting the ratio of the two competing variants on a logarithmic scale against time and estimating the linear slope of this graph, which is used as the measure of fitness. Goudsmit et al. (9) introduced a new definition of relative fitness as the ratio between two production rates of two viral strains based on a viral dynamic model, which could be regarded as a measurement of relative selection effect of the viral production rate (9). However, as shown later in this paper, the definition of relative fitness used in references 1, 9, and 14 is inconsistent with the conventional definition of relative fitness in population genetics and does not take into account the death rates of the viral strains. We recognize the merit of the ratio of the production rates (referred as to the “production rate ratio” or PRR in this article) as a component of the relative fitness in order to distinguish it from the new definition of relative fitness based on the fitness concept in population genetics. The definition of relative fitness in reference 9 was later adopted by Marée et al. (14). They proposed a simple mathematical model to describe the dynamics of viral competition between wild-type and mutant virus and the linkage of the production rate of a viral strain to its fitness. However, their method is not efficient and accurate, because the data from only two time points were used and the method offers no information on the goodness of their estimate. Following the idea of Marée et al. (14), Bonhoeffer et al. (1) presented a new approach with an intention of using time series data at multiple time points. Although Bonhoeffer et al. (1) considered experimental errors in both coordinates in their linear regression model, their method for adjusting the measurement errors in both coordinates is not consistent with the standard statistical methods and is difficult to implement. In this article, we will introduce several parameters, based on HIV dynamic models, to quantify the relative fitness for the growth competition assay. The relationship between these parameters and the relative fitness used in the field of population genetics will be discussed. The confusion in the definition of relative fitness in recent HIV literature will be clarified. More-efficient statistical methods will be proposed to estimate the viral fitness parameters.

## MATERIALS AND METHODS

Growth competition assay.The design and evaluation of a growth competition assay in which viral variants are detected using flow cytometry is described in detail in another publication by Dykes et al. (5). We designed two vectors, pAT1 and pAT2, that are identical to the lab strain pNL4-3 except that they have the mouse *Thy1.1* or *Thy1.2* gene cloned in place of *nef*, respectively. The *Thy1.1* and *Thy1.2* proteins differ by only one amino acid and are expressed on the surfaces of infected cells. A site-directed mutant of reverse transcriptase that is resistant to delavirdine, a nonnucleoside reverse transcriptase inhibitor, and has been shown to be unfit was cloned into pAT2 to produce pAT2P236L (7). Wild-type and mutant DNAs were used to transiently transfect 293 cells (ATCC) using Superfect (QIAGEN). Supernatants were harvested 72 h later and clarified by centrifugation at 500 × *g*. HIV-1 virus capsid protein (p24) quantitation was performed on virus stocks using an HIV-1 p24 antigen enzyme-linked immunosorbent assay (Perkin-Elmer). PM1 cells, pretreated with 10 μg/ml of polybrene (Sigma) in RPMI complete medium (89% RPMI [Gibco], 10% fetal bovine serum [Valley], 1% penicillin-streptomycin [Gibco]), were coinfected with an equal ratio of wild-type and mutant virus and cultured at 37°C in 5% CO_{2}. On days 3, 4, 5, 6, and 7, half of the culture was removed and replaced with fresh medium. The number of viable cells per ml of culture was determined before cells were removed, by staining with trypan blue and counting in a hemocytometer. Cells removed during this process were singly and dually stained with the Thy1.1 (1:200 dilution) and Thy1.2 (1:100 dilution) antibodies (BD Biosciences). Cells were fixed (4% paraformaldehyde [Sigma] in Dulbecco's phosphate-buffered saline [Invitrogen]; pH 7.4 to 7.6) and analyzed using a FACScaliber flow cytometer (Becton Dickinson). Flow cytometry data were analyzed by Cell Quest (Becton Dickinson). Cells were analyzed using a density plot of forward scatter versus side scatter, and the cell population was gated in order to eliminate debris. Gated cells were then analyzed using a density plot of FL1 (FITC/Thy1.1) versus FL2(PE/Thy1.2). The gate defining the positive threshold for the Thy1.1-stained sample was set using the Thy1.2 antibody-stained sample as a negative control and was <0.05%. The reverse was done for the Thy1.2 gate defining positivity. In order to minimize variability in the subjectivity of setting gates, all were set by one observer. The number of viable wild-type or mutant infected cells was calculated by multiplying the total number of viable cells in the original culture by the percent of wild type or mutant as determined by flow cytometry. Note that although our modeling and estimation methodology development was motivated by and applied to this particular assay, the proposed methodologies are generally applicable to any similar competition experiments.

Mathematical models.Simple mathematical models for viral fitness have been employed to estimate the selection coefficient and relative fitness from experimental data (1, 9, 14). Here we extend these simple models and develop a complete model with five compartments for viral fitness as follows.
(1)
where *T*, *T _{m}*,

*T*,

_{w}*M*, and

*W*are numbers of uninfected target cells, cells infected by infectious mutant virus, cells infected by infectious wild-type virus, mutant virus, and wild-type virus, respectively. λ represents the cell proliferation rate;

*d*is the death rate of target

_{T}*T*cells; and are the respective infection rates at which

*T*cells become infected by

*M*and

*W*;

*N*and

_{m}*N*are the respective number of new virions produced from each of the infected cells during their lifetime; δ

_{w}*and δ*

_{m}*are the respective death rates of*

_{w}*T*and

_{m}*T*;

_{w}*c*and

_{m}*c*are the respective clearance rates of mutant and wild-type virions. We summarize all definitions and mathematical notations in Table A1 in the Appendix.

_{w}The model assumes that the growth rate of both viral populations is proportional to their corresponding infected-cell densities. The dynamics of free virus are typically fast in comparison with those of the infected cells (17, 18). This generally implies that a quasi-steady state is rapidly established. In what follows, we therefore do not explicitly distinguish between free virus and infected-cell load. Thus, we write the quasi-steady-state equations for virus as follows:
(2)
where θ* _{m}* = (

*N*δ

_{m}*)/*

_{m}*c*and θ

_{m}_{w}= (

*N*δ

_{w}*)/*

_{w}*c*, and reduce model 1 to the following form. (3) where and . We compared the solutions of

_{w}*T*and

_{m}*T*from equations 1 and 3 by solving the differential equations numerically. We found (data not shown) that when the condition (equation 2) was approximately satisfied, the solutions of infected cells

_{w}*T*and

_{m}*T*from equations 1 and 3 were very similar, although the level of target cells,

_{w}*T*, varied at all times.

If we assume the concentration of target cells (*T*) to be constant or large enough, the model (equation 3) can be further simplified to
(4)
We can solve the above system of differential equations and obtain the following closed-form solution for *T _{m}* and

*T*from time

_{w}*t*

_{1}to

*t*

_{2}. (5) The numerical solutions of both infected cells

*T*and

_{m}*T*from equations 3 and 4 are similar (data not shown) when the target

_{w}*T*cells are approximately constant at all time. Thus, under certain assumptions we can use the simplified equation 3 or 4 for further analysis.

Competitive fitness parameters.Relative fitness has its origin in population genetics. We review the relative fitness definition in population genetics and then clarify the definition of relative fitness as it appeared in recent HIV literature on viral fitness.

In the population genetics literature, fitness of a genotype is generally defined as the ability of the genotype to survive and reproduce (11). It may be quantified as the number of progeny contributed to the next generation (2). Relative fitness of a genotype is defined as the ratio of the fitness of this genotype against that of a reference genotype (usually the fittest genotype). For the competitive viral fitness experiment described in this paper, we assume that no new strains of virus have been produced. Thus, the pooled population of the wild-type and mutant virus can be regarded as a haploid population with two genotypes.

Consider a discrete model for a haploid population with two genotypes. The size of the *i*^{th} genotype (*i*=1, 2) is modeled by *N _{i}*(

*t*)

*= w*

_{i}*N*(

_{i}*t −*1), with a solution as

*N*(

_{i}*t*)

*=*(

*w*)

_{i}^{t}

*N*(0), where

_{i}*w*is the fitness of the

_{i}*i*

^{t}

^{h}genotype. The relative fitness is defined as 1 +

*s=w*

_{1}

*/w*

_{2}, and

*s*is the selection coefficient. Based on the above models, we have where

*d=*ln(1 +

*s*) is the logarithm of the relative fitness. Some HIV literature (8, 10) also adopted this definition of the relative fitness. If observations

*y*(

*t*)

*= ln*[

*N*

_{1}(

*t*)

*/N*

_{2}(

*t*)] at multiple time points are available, parameter

*d*can be estimated by regressing the observations,

*y*(

*t*), on time

*t*. The slope will be the estimate of

*d=*ln(1 +

*s*). If observations are available at only two time points

*t*

_{1}and

*t*

_{2}, then

*d*=[1/(

*t*

_{2}−

*t*

_{1})]ln{[

*N*

_{1}(

*t*

_{2})/

*N*

_{2}(

*t*

_{2})][

*N*

_{2}(

*t*

_{1})/

*N*

_{1}(

*t*

_{1})]}. For a continuous model,

*dN*(

_{i}*t*)/

*dt*=

*g*

_{i}*N*(

_{i}*t*) with the solution , the fitness of the

*i*

^{th}genotype is e

^{gi}, and the relative fitness is 1+s=e

^{g1}/e

^{g2}=e

^{g1−g2}. The estimate of the relative fitness is exactly the same as that in the discrete model case.

There seems to be some confusion and inconsistency in defining viral fitness in the HIV research community. In several recent papers (1, 9, 14), the relative fitness of the mutant HIV strain was defined under the model of equation 4 as *k*_{m}/*k _{w}*, which is inconsistent with the definition of relative fitness used in population genetics as we have shown above. In fact, with the model of equation 4, the relative fitness of the mutant virus with respect to the wild-type virus should be defined as , where

*g*=

_{m}*k*

_{m}*T*− δ

*and*

_{m}*g*=

_{w}*k*

_{w}*T*− δ

*are the net growth rates of the mutant and wild-type infected cells, respectively. In order to facilitate the development of statistical estimation methods in a subsequent section, we introduce the log-relative fitness (LRF): ), which is equivalent to the difference between the two net growth rates. Since*

_{w}*g*and

_{m}*g*are the log fitness of the mutant and wild-type virus, respectively, we further define

_{w}*r = g*/

_{m}*g*as the log-fitness ratio (LFR). We call

_{w}*p = k*

_{m}*/k*the production rate ratio (PRR), which was misleadingly defined as the “relative fitness” in several recent papers (1, 9, 14). For comparison purposes, we also provide the estimation methods for the PRR in this article. Note that from the definition of

_{w}*d*and

*p*, it is easy to find their relationship as

*d*=

*k*

_{w}*T*(

*p*− 1) − (δ

*− δ*

_{m}

_{w}*)*, and the relative fitness of the mutant virus with respect to the wild-type virus is 1 +

*s = e*

^{d}

*≠ p = k*

_{m}*/k*

_{w.}Calculations of competitive fitness parameters from two data points.Let Δ*t* = *t*_{2} − *t*_{1}, with *t _{i}*(

*i*= 1,2) being two experimental time points. Similar to the analysis in reference 14, we can show from the differential equation 3 that Thus, the PRR can be calculated by (6) Note that this result is independent of the assumption that the target cell concentration,

*T*, is a constant, i.e., it can be derived from both models 3 and 4. When

*t*

_{1}= 0,

*t*

_{2}=

*t*, and δ

*= δ*

_{m}*, equation 6 is equivalent to formula 5 in the work of Marée et al. (14). Similar to the analysis of Marée et al. (14), if the culture is diluted by twofold or if half the culture is removed and replaced with fresh media at time*

_{w}*t*

_{2}relative to time

*t*

_{1}during the experiment, we have to adjust equation 6 in order to keep the same unit in the culture for all measurements. In this case, equation 6 is modified as (7) According to the definitions of the log-fitness ratio (

*r*) and the log relative fitness (

*d*), it follows from equation 5 that LFR(

*r*) can be written as (8) Similar to equation 7, when the culture is diluted by twofold at time

*t*

_{2}relative to time

*t*

_{1}, equation 8 should be corrected to the following form: (9) Similarly, the LRF between mutant- and wild-type-infected cells can be expressed as (10) We can also calculate the relative fitness as Note that the calculation of the LRF from equation 10 does not need to be adjusted for dilution factors at different time points because the expression involves only the ratio between

*T*and

_{m}*T*at the same time point. However, calculations of both PRR and LFR require a correction factor if any volume of the culture is removed at any measurement time during the experiment. In addition, the LFR and LRF calculations do not require one to know the death rates of infected cells.

_{w}We can see from equations 6, 8, and 10 that the LFR and LRF do not depend on the death rates of infected cells as the PRR does. Thus, the LFR and LRF are easier to estimate from the experimental data than the PRR. Also, the LRF has a direct relationship with the relative fitness as defined in the field of population genetics.

Estimation methods of fitness parameters from multiple data points.As stated previously, Marée et al. (14) proposed a method to compute the production rate ratio (instead of the relative fitness) based on a simple mathematical model which describes the dynamics of viral competition between wild-type and mutant virus. However, this method may result in inaccurate estimates of the production rate ratio because it is based on only two time points and the method offers no statistical information on the estimated parameter. Following the idea of Marée et al. (14), Bonhoeffer et al. (1) presented a new approach, called the growth-corrected method (GM), to estimate the production rate ratio between two virus variants using time series data (multiple data points). Via simulation studies, they compared their method with the average method (AM), which calculated the average production rate ratio between all pairs of successive time points according to equation 2.4 in the work of Marée et al. (14) and claimed some advantages of their method. However, the GM has the following weaknesses. First, the GM does not follow a standard statistical approach to consider measurement errors in the model, although it intends to adjust for the measurement error in covariates in the linear regression model. Second, because data for both coordinates contain the common element based on their model, the corresponding two measurement errors should be correlated rather than independent as they assumed, and thus, the GM could result in inefficient estimates. Third, the GM is difficult to implement using available standard statistical software. Finally, some extra variance parameters were assumed to be known, an assumption that may not be reliable; see equation A4 in reference 1. To overcome these problems, we suggest standard statistical methods to estimate viral fitness parameters (see Appendix A for details). In what follows, we apply these methods to estimate the viral competitive fitness parameters PRR (*p*), LFR (*r*), and LRF (*d*), as well as the relative fitness (1 + *s*).

(i) PRR estimate.Based on the above discussions, we have constructed a linear regression model with measurement error in covariate based on equation 6 and estimate the PRR. Let *t _{i}* be the time of the

*i*

^{th}observation (

*i*= 1,…,

*n*), Δ

*t*=

_{i}*t*−

_{i}*t*

_{i}_{−}

_{1}, and τ

_{i}= Σ

_{k=1}

^{i}Δ

*t*=

_{k}*t*

_{i}−

*t*

_{0}. Following the analysis in reference 1, equation 6 can be expressed as a regression model between variables at the

*i*

^{th}time point,

*t*, and the initial time,

_{i}*t*

_{0}: (11) where

*m*(

*t*) = ln[

*T*(

_{m}*t*)] and

*w*(

*t*) = ln[

*T*(

_{w}*t*)]. Based on equation 11, let

*y*=

_{i}*m*(

*t*) −

_{i}*m*(

*t*

_{0}) + δ

*τ*

_{m}*,*

_{i}*x*=

_{i}*w*(

*t*) −

_{i}*w*(

*t*

_{0}) + δ

*τ*

_{w}*, and β =*

_{i}*p*;

*Y*and

_{i}*X*are observed measurements corresponding to the underlying true values

_{i}*y*and

_{i}*x*at time point

_{i}*t*. Thus, we can express equation 11 in the form of equation A3 in Appendix A and apply the three methods suggested in Appendix A to estimate

_{i}*p.*We summarize these methods below.

(a) Method 1 (ordinary least square [LS]).As discussed above, the covariate *X _{i}* value is measured with error. However, if information on the error variances associated with the two variables in the model (equation A3 in Appendix A) is not available, we may have to ignore the measurement errors and substitute

*X*for

_{i}*x*in the model (equation A1 in Appendix A) and use the least-square estimate β̂ in equation A2 of Appendix A to estimate

_{i}*p*.

(b) Method 2 (measurement error model with known ratio of error variances [MEr]).If the ratio is known, the formula β̂ in equation A5 of Appendix A can be used to estimate *p*.

(c) Method 3 (measurement error model with known measurement error variance [MEv]).If the variance of the measurement error in covariate, , is known, the formula β̂ in equation A6 of Appendix A can be used to estimate *p*.

In the above three methods, we can calculate the standard deviation for the parameter estimates. See Appendix A for details of these methods.

(ii) LFR estimate.Similar to the analysis in equation 11, it follows from equation 9 that
(12)
Thus, based on equation 12, we can write down a measurement error model as in A3 with *y _{i}* =

*m*(

*t*) −

_{i}*m*(

*t*

_{0}),

*x*=

_{i}*w*(

*t*) −

_{i}*w*(

*t*

_{0}), and β =

*r*. Similar to the three methods for estimating the PRR (

*p*), we can derive the corresponding three methods to estimate the LFR (

*r*).

(iii) LRF estimate.Let *t _{i}* be the time of the

*i*

^{th}observation (

*i*= 1,…,

*n*),

*h*(

*t*) = ln[

_{i}*T*(

_{m}*t*)/

_{i}*T*(

_{w}*t*)], Δ

_{i}*t*=

_{i}*t*

_{i+1}−

*t*, and Δ

_{i}*z*=

_{i}*h*(

*t*

_{i+1}) −

*h*(

*t*). We can express equation 10 for multiple data points as the following standard linear regression model. (13) Since the covariate in model 13 is time which can be accurately recorded, it is not necessary to use a linear regression model with the measurement error in covariates to estimate

_{i}*d.*The standard linear regression estimate (Appendix A) can be used to obtain the estimate of log relative fitness as follows: (14) The standard deviation of the estimate,

*d̂*, can also be found in Appendix A. Therefore, the relative fitness can be estimated as The standard deviation of the relative fitness estimate can be obtained by the delta method.

We have developed Web-based, user-friendly software to implement all these estimation methods (http://www.urmc.rochester.edu/bstools/vfitness/virusfitness.htm). The estimates of all these viral fitness parameters from different estimation methods can be easily obtained from this software.

## RESULTS

Simulation comparison of estimation methods.To evaluate the accuracy of the estimation of the PRR by the three methods suggested above, we compare these three methods (LS, MEr, and MEv) with the AM investigated in reference 1. Note that we did not compare our methods to the GM proposed by Bonhoeffer et al. (1), since the GM is not comparable to the other methods. The GM requires specification of extra variance parameters, and the estimation results are sensitive to these specified parameters. In addition, the GM is not easy to implement, since it is not a standard statistical method to deal with the measurement error in covariate.

Numerical simulations for replication dynamics of mutant and wild-type viruses were performed according to equations 4 with the parameters δ* _{m}* = 0.5, δ

*= 0.5 and several different values of the parameters*

_{w}*k*and

_{m}*k*

_{w}. At selected time points

*t*, the observations (

_{i}*Y*,

_{i}*X*)

_{i}^{T}were generated based on the numerical solutions (

*T*,

_{m}*T*)

_{w}^{T}of differential equations 4 with measurement errors. We assume that the vector of measurement errors ((ε

*,*

_{i}*e*)

_{i}^{T}) is normally distributed with )}, where diag(σ

_{ε}

^{2},σ

_{e}

^{2}) is the 2 by 2 diagonal matrix, with diagonal elements being σ

_{ε}

^{2}and σ

_{e}

^{2}. Several different values of ,were chosen in the simulation studies.

Table 1 presents the true values (β_{true}) of PRR used to generate data, the mean estimates (β̄_{est}), and the corresponding mean squared error (MSE) by the four methods based on 200 simulated data sets for different choices of parameter values and measurement error variances. The results in Table 1 are summarized as follows.

(i) For all the cases, the MSE of the estimate from the MEv method is not larger than those from the other three methods.

(ii) For all the cases, the AM method always gives the largest MSE. The simulation results coincide with the findings from Bonhoeffer et al. (1) that occasionally the AM method yields estimates far from β_{true} and may generate outliers because the calculation of β involves division by a value close to zero.

(iii) Table 1 shows that the mean estimates based on the MEr and MEv methods are generally closer to the true values than those based on the LS and AM methods.

(iv) For (i.e., ρ > 1), the mean estimates from the MEv method are closer to the true values than those from the MEr method. However, the MEr method performs better than the MEv method in terms of achieving a mean estimate close to the true value when (i.e., ρ < 1); in this case, the MSE of the estimate based on the MEr method is also less than that based on the LS method.

(v) When (i.e., ρ = 1), the MEr performs similarly to the MEv and LS methods but better than the AM method in terms of MSE.

Note that if we choose the true values (β_{true}) to be 0.1, 0.2, and 0.4, the results are similar to those above (data not shown). In addition, for estimates of the LFR (*r*), the performance of the three methods has a pattern similar to that of the corresponding methods for estimating the PRR, and detailed results are omitted here.

Effects of measurement errors and death rates on fitness parameter estimates.In order to investigate the effect of measurement errors and death rates of infected cells on the estimate of PRR, we conducted sensitivity analyses using data from an experiment with 7 × 10^{6} target cells infected by a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. *T _{m}* and

*T*were measured at days 3, 4, 5, 6, and 7. We present the results in Fig. 1 and 2.

_{w}For the three cases (δ* _{m}* − δ

*= 0.8 − 0.5 = 0.3, δ*

_{w}*= δ*

_{m}*= 0.5, and δ*

_{w}*− δ*

_{w}*= 0.8 − 0.5 = 0.3,) displayed in Fig. 1a, the estimates (β̂ < 1) of PRR (based on the MEr method) gradually decrease as ρ increases. When β̂ > 1, the estimates of PRR decrease monotonically as ρ increases (data not shown). However, in both cases, the estimates do not change too much when ρ varies in the range of (0,15).*

_{m}For these three cases (δ* _{m}* − δ

*= 0.8 − 0.5 = 0.3, δ*

_{w}*= δ*

_{m}*= 0.5, and δ*

_{w}*− δ*

_{w}*= 0.8 − 0.5 = 0.3), the estimate of the PRR from the ME*

_{m}_{V}method against the measurement error variance in the covariate () are plotted in Fig. 1b, from which we can see that the estimates of PRR gradually decrease as increases.

Figure 2 presents the results of the estimates of PRR (*p* < 1) from the four methods against the death rate parameter δ* _{m}* or δ

*for the three cases with a value of 1 and a ρ value of 0.2. We can see that the estimates of PRR increase as δ*

_{w}*or δ*

_{m}*increases. However, when the estimates of PRR are greater than 1, the estimates decrease monotonically as δ*

_{w}*or δ*

_{m}*increases (data not shown).*

_{w}In addition, Fig. 1 and 2 show that for all the cases and methods, the estimates of PRR increase as Δδ* _{mw}* = δ

*− δ*

_{m}*(>0) increases, whereas the estimates of PRR decrease as Δδ*

_{w}*= δ*

_{wm}*− δ*

_{w}_{m}(>0) increases.

Since the LFR is the same as the PRR (i.e., *r = p*) when δ* _{m}* = δ

*= 0, the patterns for the estimates of LFR (*

_{w}*r*) are the same as those obtained for the estimates of PRR (

*p*), as shown in Fig. 1 for δ

*= δ*

_{m}*, and so these results are omitted here.*

_{w}Experimental data analysis.For illustrative purposes, we applied the proposed methods to the data from an experiment of our growth competition assay with 7 × 10^{6} target cells infected by a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. The mutant and wild-type virus-infected cells, *T _{m}* and

*T*, were measured at days 3, 4, 5, 6, and 7. Note that if half the culture is removed and replaced with fresh medium at each measurement point, measurement data [

_{w}*T*(

_{m}*t*) and

_{i}*T*(

_{w}*t*)] need to be multiplied by a factor of 2

_{i}^{i}

^{−}

^{1}(

*i*= 1,…,5) in order to keep the same concentration unit at all time points. We chose δ

*= 0.5, δ*

_{m}*= 0.5, , and ρ = 1.0 to estimate the fitness parameters. Figure 3 displays the raw data of ln(*

_{w}*T*) and ln(

_{m}*T*) and the corresponding fitted lines with observations based on the LS, MEr, and MEv methods. Figure 3b shows that model fittings from the three methods are very similar, because the corresponding estimates (regression coefficient) are very close. Table 2 presents the results of estimated fitness parameters, including

_{w}*p*,

*r*, and

*d*, as well as the relative fitness (1 +

*s*) from different methods. It can be seen that for this data set, the AM method has the smallest estimate for the parameter

*p*, and the estimate of

*p*from the MEr method is greater than those from the LS and MEv methods. A similar pattern was also found for the estimates of

*r*based on the four methods.

## DISCUSSION

We have developed mathematical models to describe viral and cellular dynamics for in vitro experiments using a growth competition assay. Based on the models, we defined three fitness indices. Among these parameters, the LRF was shown to be the easiest to measure and most convenient to use in practice, since it involves only the ratio between mutant- and wild-type-infected cells at the same time point, which can be directly and accurately measured from the same sample in our experiment. It also has a direct relationship with the generic relative fitness defined in population genetics. Both PRR and LFR require calculations of the ratio for mutant- and wild-type-infected cells at two different time points. In addition, to calculate the PRR one also needs to know the death rates of infected cells.

In HIV research literature, the slope of the logarithmic time plot of the ratio of two viral variant frequencies has been used to quantify the relative fitness (8, 12). Actually this slope is exactly the LRF as we defined it in formulas 10 and 13. The LRF is also the difference of the pure growth rate between two viral variants. Goudsmit et al. (8) correctly used the definition of the relative fitness for the discrete model case but defined the slope of the logarithmic time plot as the selection coefficient (*s*) for the continuous time model, which is inconsistent with that used in population genetics literature. Goudsmit et al. (9) and Marée et al. (14) suggested using viral dynamic models to more rigorously define and study the relative fitness in competition experiments. However, their definition of relative fitness considered only the case that the selection acts upon the replication (production or infection) rate. Bonhoeffer et al. (1) adopted the same definition used in the work of Goudsmit et al. (9) and Marée et al. (14). Marée et al. (14) also suggested that the death rate should be assumed to be zero if it is unknown whether the selection acts on the production rate or the death rate. However, in their derivation of the relative fitness or the selection coefficient, the death rates of infected cells are not assumed to be zero (see formula 5 in the work of Marée et al. [14]). In the field of population genetics, the selection is usually assumed to act upon both the production rate and the death rate in the definition of the fitness or relative fitness (2, 11). Also note that even if the death rate is set to be zero, the PRR used in the work of Goudsmit et al. (9) and Marée et al. (14) is not equal to the relative fitness or the selection coefficient; it actually is equal to the LFR in this case. We expect that these confused definitions and concepts have been clarified.

We have proposed methods to calculate the relative fitness indices from two data points and statistical methods to estimate these indices from multiple data points. The measurement error modeling approach was also suggested. We compared these methods, via Monte Carlo simulations, to the existing method, the AM method, suggested in reference 1. From the simulation studies, we conclude that (i) if the information about measurement error variances is unavailable, the LS method may be used to estimate relative fitness parameters; (ii) if the variance of the measurement error of the covariate () is known, the MEv method appears preferable to other methods in terms of the MSE; (iii) if measurement error variances and are equal (i.e., ρ = 1), the MEr method is preferred to other methods because it does not depend on measurement error variances. (iv) If both and ρ are known but ρ > 1, MEv is preferable to other methods; (v) if both and ρ are known but ρ < 1, the MEr method may be better than the other methods, because it is very robust to the choice of ρ (Fig. 1b). In summary, we may choose one of the LS, MEr, and MEv methods to estimate the relative fitness parameters based on the information availability of measurement error variances. Table 3 summarizes a general guideline for method selection.

On the basis of the results summarized above, we found that the MEv method yielded the most reliable estimates of the PRR in terms of achieving the smallest MSE. Since the estimate from the MEv method depends on the measurement error variance of the covariate () and is a decreasing function of , the MEv method should be chosen to estimate the PRR only when a reliable value of is available. Otherwise, this method may overestimate the parameters for small and underestimate for large . In addition, if both ρ and are known but ρ > 1, the MEv method is the best for estimating the PRR and LFR.

The MEr method requires the ratio of measurement error variances (i.e., ) to be known. As indicated in the analysis of effects of measurement errors, this method is very robust to the ratio of measurement error variances. Thus, if ρ can be obtained from other sources or measured from experiments, we may select the MEr method to estimate the PRR and LFR. In particular, when ρ ≤ 1, the MEr method is preferable to other methods in terms of achieving a mean estimate close to the true value.

Unlike the above two methods, the LS method estimates the fitness parameters without considering the effect of measurement error in covariate. But this method may lead to inaccurate estimates if the covariate measurement error is large. However, in contrast to the MEr and MEv methods, the advantage of the LS method is its simplicity and ease of implementation. One can choose the LS method to estimate the fitness parameters when the measurement error is small or not available.

The AM method is based on the approach of Marée et al. (14). Unlike the other three methods suggested in this article, it does not involve either a regression-type procedure or any statistical methods. The AM method is very simple and only needs data at two time points to estimate the parameters. However, as stated in reference 1, the AM method may result in inaccurate estimates of the fitness parameters, because statistical fluctuations in the total virus population can result in numerical problems due to division by a value close to zero. Also, it does not provide any information on the uncertainty of the parameter estimates.

The effect of death rates (δ* _{m}* and δ

*) of infected cells on the estimate of PRR is significant (Fig. 2). If δ*

_{w}*and δ*

_{m}*are unknown, the estimate of PRR may strongly depend on the specification of the δ*

_{w}*and δ*

_{m}*values. Thus, the two newly defined fitness indices, the LFR and the LRF, are more attractive in practice, since they do not depend on the death rates of infected cells and are easier to estimate from the experimental data.*

_{w}Note that the proposed methodology is developed based on the in vitro experiments on T-cell lines. A number of assumptions have been made for model simplifications. These include that the free virus concentrations are proportional to infected cell densities and the concentration of target cells is constant or large enough. These assumptions may need to be further tested, in particular for in vivo data applications.

In summary, we have proposed relative fitness indices based on the dynamic models of virus and cells for our growth competition assay and investigated statistical methods to estimate these indices. The newly defined fitness indices have some advantages for practical applications and are consistent with the fitness definitions used in population genetics. Computational tools were also developed and are freely accessible at http://www.urmc.rochester.edu/bstools/vfitness/virusfitness.htm.

Linear regression (LS).The standard linear regression model without intercept is defined by
(A1)
where *x _{i}* is an independent variable (covariate),

*Y*is a dependent variable (response variable), ε

_{i}*is random error with ), and ν*

_{i}_{ε}= σ

_{ε}

^{2}is the variance of ε

*. According to the LS approach, the LS estimator can be expressed as follows. (A2) where*

_{i}*V̂*{β̂} is the estimated variance of β̂, , , and . However, in practice,

*x*in equation A1 may be measured with error, which needs to be taken into account in the linear regression analysis. This is the so-called “linear regression with measurement error in covariate” in statistics literature, which is briefly reviewed as follows.

_{i}Linear regression with measurement error in covariate.Covariate measurement error is a common source of bias in regression analysis. There exists vast statistical literature on regression models with covariate measurement errors (3, 6). We review the linear regression models with measurement errors in both response variable and covariate here. Assuming that the measurement errors follow normal distributions and are independent of each other, we can write the linear regression model with measurement error in covariate as follows (6).
(A3)
where *y _{i}* is the underlying true value of the dependent variable and

*x*is the underlying true value of an independent variable (covariate), and we also assume that

_{i}*x*follows a normal distribution, . Note that (

_{i}*Y*,

_{i}*X*)

_{i}^{T}is a vector of observations corresponding to the underlying true values (

*y*,

_{i}*x*)

_{i}^{T}, (ε

*,*

_{i}*e*)

_{i}^{T}is the vector of measurement errors with a bivariate normal distribution, and are measurement error variances in response and in covariate, respectively. It follows from equation A3 that the vector (

*Y*,

_{i}*X*)

_{i}^{T}is distributed in a bivariate normal distribution with a mean vector

*E*(

*Y*,

*X*) = (μ

*,μ*

_{Y}*) = (βμ*

_{X}*,μ*

_{x}*) and covariance matrix (A4) In the following, we denote , , and . Fuller (6) proposed several methods to estimate the regression coefficient β in equation A3. Two of these methods are outlined as follows.*

_{x}(i) Ratio of measurement variances known (MEr).The first method assumes that the ratio is known. Based on the method of moments estimators, we have + ρ and ). For samples for which *M _{XY}* ≠ 0, it can be shown that
(A5)
where

*V̂*{β̂} is the estimated variance of β̂, and Γ̂ = (

*n*− 2)

^{−1}(

*n*− 1)(

*M*− 2β̂

_{YY}*M*+ β̂

_{XY}^{2}

*M*

_{XX}*)*.

(ii) Measurement variance known (MEv).The second method assumes that the variance of the measurement error in the covariate, , is known, and it can be seen from equation A4 that the population moments of (*Y _{i}*,

*X*) satisfy (μ

_{i}*,μ*

_{Y}*) = (βμ*

_{X}*,μ*

_{x}*), and (). By replacing the unknown population moments with their sample estimators in the above system of equations, we can obtain the estimator of the regression coefficient, . Note that this estimator is a ratio of two random variables. Such ratios are typically biased estimators of the ratio of the expectations. In fact, the expectation of this estimator is not defined (6). Fuller (6) provides a modified estimator that is nearly unbiased for β. The estimators of the unknown parameters in the above system of equations can be written as (A6) where*

_{x}*V̂*{β̂} is the estimated variance of β̂, Γ̂ = (

*n*− 2)

^{−1}(

*n*− 1)(

*M*− 2β̂

_{YY}*M*+ β̂

_{XY}^{2}

*M*), ), and

_{XX}*Ĥ*satisfies (A7) with ]. The bias expression of theorem 2.5.1 in reference 6 furnishes the motivation for this estimator. The quantity is a biased estimator of , but it is preferred to other estimators of the ratio because of its smaller variance. See reference 6 for detailed derivation of formulae A5 and A6.

## ACKNOWLEDGMENTS

We thank Kyriakos Deriziotis, Amanda Tallo, Amanda Moore, and Kora Fox for their technical expertise.

This work was supported in part by National Institutes of Health (NIH) research grants RO1 AI052765, RO1 AI055290, RO1 AI41387, NO1 AI38858 (subcontract 200VC007), and U01 AI27658.

## FOOTNOTES

- Received 14 June 2005.
- Accepted 2 December 2005.
- ↵*Corresponding author. Mailing address: Department of Biostatistics and Computational Biology, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, New York 14642. Phone: (585) 275-6767. Fax: (585) 273-1031. E-mail: hwu{at}bst.rochester.edu.

## REFERENCES

- American Society for Microbiology