Journal of Virology, March 2006, p. 2380-2389, Vol. 80, No. 5
0022-538X/06/$08.00+0 doi:10.1128/JVI.80.5.2380-2389.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Department of Biostatistics and Computational Biology, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, New York 14642,1 Department of Epidemiology & Biostatistics, College of Public Health, MDC 56, University of South Florida, 13201 Bruce B. Downs Blvd., Tampa, Florida 33612,2 Department of Medicine, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue, Box 689, Rochester, New York 14642,3 Theoretical Biology & Biophysics Group, MS-K710, T-10, Los Alamos National Laboratory, Los Alamos, New Mexico 875454
Received 14 June 2005/ Accepted 2 December 2005
Fitness is sometimes quantified on the basis of direct biochemical measurements of enzyme activity (4, 16) or virus growth kinetics in pure cultures that contain only one variant (13, 16). According to population genetics theory, the relative fitness (1 + s) of a variant represents its relative contribution to the next generation, where the parameter s is called the selection coefficient. Experimental protocols on viral fitness have been reviewed recently (19). However, assays that operate on a single virus variant cannot take the interactions between different virus strains into account. Many studies have therefore investigated the relative growth kinetics of two virus variants growing in competition either in vivo (8, 9) or in vitro (4, 10, 12, 15). In these studies, a measure of fitness is typically derived by plotting the ratio of the two competing variants on a logarithmic scale against time and estimating the linear slope of this graph. Goudsmit et al. (9) introduced a new definition of relative fitness as the ratio between the production rates of two viral strains based on a viral dynamic model, which could be regarded as a measurement of the relative selection effect on the viral production rate. However, as shown later in this paper, the definition of relative fitness used in references 1, 9, and 14 is inconsistent with the conventional definition of relative fitness in population genetics and does not take into account the death rates of the viral strains. We recognize the merit of the ratio of the production rates as a component of the relative fitness and refer to it as the "production rate ratio" (PRR) in this article in order to distinguish it from the definition of relative fitness based on the fitness concept in population genetics. The definition of relative fitness in reference 9 was later adopted by Marée et al. (14).
They proposed a simple mathematical model to describe the dynamics of viral competition between wild-type and mutant virus and the linkage of the production rate of a viral strain to its fitness. However, their method is neither efficient nor accurate, because data from only two time points were used and the method offers no information on the goodness of the estimate. Following the idea of Marée et al. (14), Bonhoeffer et al. (1) presented a new approach intended to use time series data at multiple time points. Although Bonhoeffer et al. (1) considered experimental errors in both coordinates in their linear regression model, their method for adjusting for the measurement errors in both coordinates is not consistent with standard statistical methods and is difficult to implement. In this article, we introduce several parameters, based on HIV dynamic models, to quantify the relative fitness for the growth competition assay. We discuss the relationship between these parameters and the relative fitness used in the field of population genetics and clarify the confusion in the definition of relative fitness in recent HIV literature. We also propose more-efficient statistical methods to estimate the viral fitness parameters.
Mathematical models.
Simple mathematical models for viral fitness have been employed to estimate the selection coefficient and relative fitness from experimental data (1, 9, 14). Here we extend these simple models and develop a complete model with five compartments for viral fitness as follows.
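$$
\begin{aligned}
\frac{dT}{dt} &= (\lambda - d_T)\,T - k'_m M T - k'_w W T,\\
\frac{dT_m}{dt} &= k'_m M T - \delta_m T_m,\\
\frac{dT_w}{dt} &= k'_w W T - \delta_w T_w,\\
\frac{dM}{dt} &= N_m \delta_m T_m - c_m M,\\
\frac{dW}{dt} &= N_w \delta_w T_w - c_w W,
\end{aligned}
\tag{1}
$$
where T denotes the target cells, Tm and Tw denote the cells infected by mutant and wild-type virus, and M and W denote the mutant and wild-type virions; λ represents the cell proliferation rate; dT is the death rate of target T cells; k′m and k′w are the respective infection rates at which T cells become infected by M and W; Nm and Nw are the respective number of new virions produced from each of the infected cells during their lifetime; δm and δw are the respective death rates of Tm and Tw; cm and cw are the respective clearance rates of mutant and wild-type virions. We summarize all definitions and mathematical notations in Table A1 in the Appendix.
TABLE A1. Definitions and notations
If the free virus concentrations are proportional to the infected cell densities, that is,
$$M = \rho_m T_m, \qquad W = \rho_w T_w, \tag{2}$$
with ρm = (Nmδm)/cm and ρw = (Nwδw)/cw, we can reduce model 1 to the following form.
$$
\begin{aligned}
\frac{dT}{dt} &= (\lambda - d_T)\,T - k_m T_m T - k_w T_w T,\\
\frac{dT_m}{dt} &= (k_m T - \delta_m)\,T_m,\\
\frac{dT_w}{dt} &= (k_w T - \delta_w)\,T_w,
\end{aligned}
\tag{3}
$$
where km = k′mρm and kw = k′wρw. We compared the solutions of Tm and Tw from equations 1 and 3 by solving the differential equations numerically. We found (data not shown) that when the condition (equation 2) was approximately satisfied, the solutions of infected cells Tm and Tw from equations 1 and 3 were very similar, although the level of target cells, T, varied at all times.
If we assume the concentration of target cells (T) to be constant or large enough, the model (equation 3) can be further simplified to
$$\frac{dT_m}{dt} = (k_m T - \delta_m)\,T_m, \qquad \frac{dT_w}{dt} = (k_w T - \delta_w)\,T_w. \tag{4}$$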
![]() | (5) |
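The simplified competition model can be explored numerically. The following is a minimal sketch, assuming that equation 4 has the linear form implied by the net growth rates kmT − δm and kwT − δw used later in the text. The function names, the Euler step size, and the parameter values are illustrative assumptions and are not taken from the article.

```python
import math

def simulate_model4(tm0, tw0, km, kw, delta_m, delta_w, target, t_end, dt=1e-3):
    """Euler integration of dTm/dt = (km*T - delta_m)*Tm and the analogous
    equation for Tw, with the target cell level T held constant (assumed form)."""
    tm, tw = tm0, tw0
    for _ in range(int(round(t_end / dt))):
        tm += dt * (km * target - delta_m) * tm
        tw += dt * (kw * target - delta_w) * tw
    return tm, tw

def closed_form(x0, k, delta, target, t):
    """Exponential solution of the same linear equation, used as a check."""
    return x0 * math.exp((k * target - delta) * t)

# Hypothetical parameter values chosen only for illustration.
tm_num, tw_num = simulate_model4(1.0, 1.0, 0.8, 1.0, 0.5, 0.5, 1.0, 4.0)
tm_exact = closed_form(1.0, 0.8, 0.5, 1.0, 4.0)
tw_exact = closed_form(1.0, 1.0, 0.5, 1.0, 4.0)
print(tm_num / tm_exact, tw_num / tw_exact)
```

With a constant target cell level, each infected cell population grows or decays exponentially, so the numerical and closed-form results should agree closely for a small step size.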
Competitive fitness parameters. Relative fitness has its origin in population genetics. We review the relative fitness definition in population genetics and then clarify the definition of relative fitness as it appeared in recent HIV literature on viral fitness.
In the population genetics literature, fitness of a genotype is generally defined as the ability of the genotype to survive and reproduce (11). It may be quantified as the number of progeny contributed to the next generation (2). Relative fitness of a genotype is defined as the ratio of the fitness of this genotype against that of a reference genotype (usually the fittest genotype). For the competitive viral fitness experiment described in this paper, we assume that no new strains of virus have been produced. Thus, the pooled population of the wild-type and mutant virus can be regarded as a haploid population with two genotypes.
Consider a discrete model for a haploid population with two genotypes. The size of the ith genotype (i = 1, 2) is modeled by $N_i(t) = w_i N_i(t-1)$, with solution $N_i(t) = w_i^{t} N_i(0)$, where wi is the fitness of the ith genotype. The relative fitness is defined as 1 + s = w1/w2, and s is the selection coefficient. Based on the above model, we have
$$\ln\frac{N_1(t)}{N_2(t)} = \ln\frac{N_1(0)}{N_2(0)} + t\,\ln(1+s).$$
For the corresponding continuous model, $dN_i/dt = g_i N_i$ with solution $N_i(t) = N_i(0)e^{g_i t}$, the fitness of the ith genotype is $e^{g_i}$, and the relative fitness is $1 + s = e^{g_1}/e^{g_2} = e^{g_1 - g_2}$. The estimate of the relative fitness is exactly the same as that in the discrete model case.
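The population genetics definition can be illustrated with a short calculation. This sketch assumes the discrete haploid model described above, in which each genotype is multiplied by its fitness once per generation; the function name and the numbers are hypothetical.

```python
import math

def relative_fitness_from_counts(n1_0, n2_0, n1_t, n2_t, generations):
    """Return (1 + s, s) for genotype 1 relative to genotype 2.

    Uses the change in ln(N1/N2) over the given number of generations,
    which equals generations * ln(1 + s) under the discrete model.
    """
    log_ratio_change = math.log(n1_t / n2_t) - math.log(n1_0 / n2_0)
    rel = math.exp(log_ratio_change / generations)
    return rel, rel - 1.0

# Hypothetical fitness values w1 = 1.2 and w2 = 1.0 over 5 generations.
w1, w2, t = 1.2, 1.0, 5
n1_t = 100 * w1 ** t
n2_t = 100 * w2 ** t
rel, s = relative_fitness_from_counts(100, 100, n1_t, n2_t, t)
print(rel, s)
```

The recovered relative fitness matches the ratio w1/w2 used to generate the counts, which is the sense in which the slope of the log ratio against time measures ln(1 + s).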
There seems to be some confusion and inconsistency in defining viral fitness in the HIV research community. In several recent papers (1, 9, 14), the relative fitness of the mutant HIV strain was defined under the model of equation 4 as km/kw, which is inconsistent with the definition of relative fitness used in population genetics as we have shown above. In fact, with the model of equation 4, the relative fitness of the mutant virus with respect to the wild-type virus should be defined as
$$1 + s = \frac{e^{g_m}}{e^{g_w}} = e^{g_m - g_w},$$
where gm = kmT − δm and gw = kwT − δw are the net growth rates of the mutant and wild-type infected cells, respectively. In order to facilitate the development of statistical estimation methods in a subsequent section, we introduce the log-relative fitness (LRF):
$$d = \ln(1+s) = g_m - g_w = (k_m - k_w)\,T - (\delta_m - \delta_w),$$
and the relative fitness of the mutant virus with respect to the wild-type virus is $1 + s = e^{d}$. The PRR is p = km/kw.
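The difference between the production rate ratio and the population genetics relative fitness can be seen in a small numerical example. This is a sketch under the constant-target-cell model, with net growth rates kmT − δm and kwT − δw; the function name and parameter values are hypothetical.

```python
import math

def fitness_parameters(km, kw, delta_m, delta_w, target):
    """Compute net growth rates, LRF (d), relative fitness (1 + s), and PRR (p)."""
    g_m = km * target - delta_m
    g_w = kw * target - delta_w
    d = g_m - g_w
    return {
        "g_m": g_m,
        "g_w": g_w,
        "LRF": d,
        "relative_fitness": math.exp(d),
        "PRR": km / kw,
    }

# Hypothetical values: equal death rates, mutant infects less efficiently.
params = fitness_parameters(km=0.8, kw=1.0, delta_m=0.5, delta_w=0.5, target=1.0)
print(params)
```

For these illustrative values the PRR and the relative fitness differ, which reflects the point made in the text that km/kw is a component of fitness rather than the relative fitness itself.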
Calculations of competitive fitness parameters from two data points.
Let Δt = t2 − t1, with ti (i = 1, 2) being two experimental time points. Similar to the analysis in reference 14, we can show from the differential equation 3 that
$$p = \frac{k_m}{k_w} = \frac{\ln[T_m(t_2)/T_m(t_1)] + \delta_m\,\Delta t}{\ln[T_w(t_2)/T_w(t_1)] + \delta_w\,\Delta t}. \tag{6}$$
When δm = δw, equation 6 is equivalent to formula 5 in the work of Marée et al. (14). Similar to the analysis of Marée et al. (14), if the culture is diluted by twofold or if half the culture is removed and replaced with fresh media at time t2 relative to time t1 during the experiment, we have to adjust equation 6 in order to keep the same unit in the culture for all measurements. In this case, equation 6 is modified as
![]() | (7) |
![]() | (8) |
![]() | (9) |
![]() | (10) |
![]() |
We can see from equations 6, 8, and 10 that the LFR and LRF do not depend on the death rates of infected cells as the PRR does. Thus, the LFR and LRF are easier to estimate from the experimental data than the PRR. Also, the LRF has a direct relationship with the relative fitness as defined in the field of population genetics.
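The two-point calculations can be sketched as follows. The PRR function assumes that equation 6 is the ratio of the log changes in Tm and Tw, each corrected by its death rate, and the LRF function assumes that equation 10 is the change in ln(Tm/Tw) divided by Δt. The `dilution` argument is a hypothetical way of rescaling the later measurement when the culture was diluted between the two time points. All names and numbers are illustrative.

```python
import math

def prr_two_points(tm1, tm2, tw1, tw2, dt, delta_m, delta_w, dilution=1.0):
    """PRR from two time points; requires the death rates of infected cells."""
    numerator = math.log(dilution * tm2 / tm1) + delta_m * dt
    denominator = math.log(dilution * tw2 / tw1) + delta_w * dt
    return numerator / denominator

def lrf_two_points(tm1, tm2, tw1, tw2, dt):
    """LRF from two time points; death rates and dilution cancel in the ratio."""
    return (math.log(tm2 / tw2) - math.log(tm1 / tw1)) / dt

# Noise-free hypothetical data with net growth rates 0.3 (mutant) and 0.5 (wild type).
t1, t2 = 3.0, 5.0
tm1, tm2 = math.exp(0.3 * t1), math.exp(0.3 * t2)
tw1, tw2 = math.exp(0.5 * t1), math.exp(0.5 * t2)

p = prr_two_points(tm1, tm2, tw1, tw2, t2 - t1, delta_m=0.5, delta_w=0.5)
d = lrf_two_points(tm1, tm2, tw1, tw2, t2 - t1)
print(p, d, math.exp(d))
```

The PRR calculation needs δm and δw as inputs, whereas the LRF calculation uses only the measured ratio of infected cells.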
Estimation methods of fitness parameters from multiple data points. As stated previously, Marée et al. (14) proposed a method to compute the production rate ratio (instead of the relative fitness) based on a simple mathematical model which describes the dynamics of viral competition between wild-type and mutant virus. However, this method may result in inaccurate estimates of the production rate ratio because it is based on only two time points and the method offers no statistical information on the estimated parameter. Following the idea of Marée et al. (14), Bonhoeffer et al. (1) presented a new approach, called the growth-corrected method (GM), to estimate the production rate ratio between two virus variants using time series data (multiple data points). Via simulation studies, they compared their method with the average method (AM), which calculated the average production rate ratio between all pairs of successive time points according to equation 2.4 in the work of Marée et al. (14) and claimed some advantages of their method. However, the GM has the following weaknesses. First, the GM does not follow a standard statistical approach to consider measurement errors in the model, although it intends to adjust for the measurement error in covariates in the linear regression model. Second, because data for both coordinates contain the common element based on their model, the corresponding two measurement errors should be correlated rather than independent as they assumed, and thus, the GM could result in inefficient estimates. Third, the GM is difficult to implement using available standard statistical software. Finally, some extra variance parameters were assumed to be known, an assumption that may not be reliable; see equation A4 in reference 1. To overcome these problems, we suggest standard statistical methods to estimate viral fitness parameters (see Appendix A for details). 
In what follows, we apply these methods to estimate the viral competitive fitness parameters PRR (p), LFR (r), and LRF (d), as well as the relative fitness (1 + s).
(i) PRR estimate.
Based on the above discussions, we have constructed a linear regression model with measurement error in the covariate based on equation 6 to estimate the PRR. Let ti be the time of the ith observation (i = 1,...,n), $\Delta t_i = t_i - t_{i-1}$, and $\tau_i = \sum_{k=1}^{i} \Delta t_k = t_i - t_0$. Following the analysis in reference 1, equation 6 can be expressed as a regression model between variables at the ith time point, ti, and the initial time, t0:
$$y_i = \beta x_i, \tag{11}$$
where yi = m(ti) − m(t0) + δmτi, xi = w(ti) − w(t0) + δwτi, and β = p, with m(t) = ln Tm(t) and w(t) = ln Tw(t); Yi and Xi are observed measurements corresponding to the underlying true values yi and xi at time point ti. Thus, we can express equation 11 in the form of equation A3 in Appendix A and apply the three methods suggested in Appendix A to estimate p. We summarize these methods below.
(a) Method 1 (ordinary least squares [LS]).
As discussed above, the covariate Xi is measured with error. However, if information on the error variances associated with the two variables in the model (equation A3 in Appendix A) is not available, we may have to ignore the measurement errors, substitute Xi for xi in the model (equation A1 in Appendix A), and use the least-squares estimate in equation A2 of Appendix A to estimate p.
(b) Method 2 (measurement error model with known ratio of error variances [MEr]).
If the ratio of the measurement error variances is known, the estimator in equation A5 of Appendix A can be used to estimate p.
(c) Method 3 (measurement error model with known measurement error variance [MEv]).
If the variance of the measurement error in the covariate, σe², is known, the estimator in equation A6 of Appendix A can be used to estimate p.
In the above three methods, we can calculate the standard deviation for the parameter estimates. See Appendix A for details of these methods.
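The construction of the regression variables and the LS estimate of the PRR can be sketched as follows. This is a minimal sketch that assumes m(t) = ln Tm(t) and w(t) = ln Tw(t) and fits a line through the origin; whether the LS estimator in Appendix A includes an intercept is not recoverable from the text, so the no-intercept form is an assumption. Function names and data are hypothetical.

```python
import math

def prr_regression_variables(times, tm, tw, delta_m, delta_w):
    """Build observed (X_i, Y_i) relative to the first time point t0."""
    t0 = times[0]
    xs, ys = [], []
    for t, m_val, w_val in zip(times[1:], tm[1:], tw[1:]):
        tau = t - t0
        ys.append(math.log(m_val / tm[0]) + delta_m * tau)
        xs.append(math.log(w_val / tw[0]) + delta_w * tau)
    return xs, ys

def ls_slope_through_origin(xs, ys):
    """Least-squares slope for y = beta * x, ignoring error in x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

# Noise-free hypothetical measurements at days 3 to 7.
times = [3, 4, 5, 6, 7]
tm = [math.exp(0.3 * t) for t in times]
tw = [math.exp(0.5 * t) for t in times]
xs, ys = prr_regression_variables(times, tm, tw, delta_m=0.5, delta_w=0.5)
p_hat = ls_slope_through_origin(xs, ys)
print(p_hat)
```

With noisy measurements, the LS slope is generally attenuated by the error in the covariate, which is the motivation given in the text for the MEr and MEv methods.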
(ii) LFR estimate.
Similar to the analysis in equation 11, it follows from equation 9 that
![]() | (12) |
(iii) LRF estimate.
Let ti be the time of the ith observation (i = 1,...,n), h(ti) = ln[Tm(ti)/Tw(ti)], $\Delta t_i = t_{i+1} - t_i$, and $z_i = h(t_{i+1}) - h(t_i)$. We can express equation 10 for multiple data points as the following standard linear regression model.
$$z_i = d\,\Delta t_i + \epsilon_i \tag{13}$$
![]() | (14) |
The estimated variance of the estimate of d can also be found in Appendix A. Therefore, the relative fitness can be estimated as
$$1 + \hat{s} = e^{\hat{d}}.$$
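The LRF estimate from multiple data points can be sketched as a regression of the successive changes in ln(Tm/Tw) on the time increments. This sketch assumes a least-squares fit through the origin, which is one natural reading of the regression model for zi and Δti; the function name and data are hypothetical.

```python
import math

def lrf_estimate(times, tm, tw):
    """Return (d_hat, relative_fitness) from a time series of Tm and Tw."""
    h = [math.log(m_val / w_val) for m_val, w_val in zip(tm, tw)]
    dts = [t_next - t_prev for t_prev, t_next in zip(times, times[1:])]
    zs = [h_next - h_prev for h_prev, h_next in zip(h, h[1:])]
    d_hat = sum(z * dt for z, dt in zip(zs, dts)) / sum(dt * dt for dt in dts)
    return d_hat, math.exp(d_hat)

# Noise-free hypothetical measurements at days 3 to 7.
times = [3, 4, 5, 6, 7]
tm = [math.exp(0.3 * t) for t in times]
tw = [math.exp(0.5 * t) for t in times]
d_hat, rel_fitness = lrf_estimate(times, tm, tw)
print(d_hat, rel_fitness)
```

No death rates enter this calculation, in line with the statement that the LRF is easier to estimate from experimental data than the PRR.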
We have developed Web-based, user-friendly software to implement all these estimation methods (http://www.urmc.rochester.edu/bstools/vfitness/virusfitness.htm). The estimates of all these viral fitness parameters from different estimation methods can be easily obtained from this software.
Numerical simulations for replication dynamics of mutant and wild-type viruses were performed according to equations 4 with the parameters δm = 0.5, δw = 0.5 and several different values of the parameters km and kw. At selected time points ti, the observations (Yi,Xi)T were generated based on the numerical solutions (Tm,Tw)T of differential equations 4 with measurement errors. We assume that the vector of measurement errors, (εi,ei)T, is normally distributed with mean zero and covariance matrix diag(σε², σe²), where diag(σε², σe²) is the 2 by 2 diagonal matrix with diagonal elements σε² and σe². Several different values of σε² and σe² were chosen in the simulation studies.
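A simulation of this kind can be sketched in a few lines. The sketch below assumes the constant-target-cell model, for which the true regression variables are yi = kmTτi and xi = kwTτi, adds independent normal errors to both, and summarizes the LS slope over 200 replicates. The seed, parameter values, and function names are hypothetical, and only the LS method is shown.

```python
import random

def simulate_dataset(times, km, kw, target, sd_eps, sd_e, rng):
    """Return noisy (Y_i, X_i) pairs for one simulated experiment."""
    t0 = times[0]
    data = []
    for t in times[1:]:
        tau = t - t0
        y_obs = km * target * tau + rng.gauss(0.0, sd_eps)  # response error
        x_obs = kw * target * tau + rng.gauss(0.0, sd_e)    # covariate error
        data.append((y_obs, x_obs))
    return data

def ls_slope(data):
    """Least-squares slope through the origin for (Y, X) pairs."""
    return sum(x * y for y, x in data) / sum(x * x for _, x in data)

def monte_carlo(n_rep, beta_true, **kwargs):
    """Mean estimate and mean squared error over n_rep simulated data sets."""
    rng = random.Random(2006)
    estimates = [ls_slope(simulate_dataset(rng=rng, **kwargs)) for _ in range(n_rep)]
    mean_est = sum(estimates) / n_rep
    mse = sum((b - beta_true) ** 2 for b in estimates) / n_rep
    return mean_est, mse

settings = dict(times=[3, 4, 5, 6, 7], km=0.8, kw=1.0, target=1.0)
mean_noisy, mse_noisy = monte_carlo(200, 0.8, sd_eps=0.2, sd_e=0.2, **settings)
mean_clean, mse_clean = monte_carlo(200, 0.8, sd_eps=0.0, sd_e=0.0, **settings)
print(mean_noisy, mse_noisy, mean_clean, mse_clean)
```

Repeating the same loop with other estimators in place of `ls_slope` gives a comparison of mean estimates and MSE in the style of Table 1.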
Table 1 presents the true values (βtrue) of PRR used to generate data, the mean estimates (βest), and the corresponding mean squared error (MSE) by the four methods based on 200 simulated data sets for different choices of parameter values and measurement error variances. The results in Table 1 are summarized as follows.
TABLE 1. Comparison of the LS, MEr, MEv, and AM methods with simulated datasetsa
(ii) For all the cases, the AM method always gives the largest MSE. The simulation results coincide with the findings from Bonhoeffer et al. (1) that occasionally the AM method yields estimates far from βtrue and may generate outliers because the calculation of β involves division by a value close to zero.
(iii) Table 1 shows that the mean estimates based on the MEr and MEv methods are generally closer to the true values than those based on the LS and AM methods.
(iv) When the ratio of measurement error variances is greater than 1, the mean estimates from the MEv method are closer to the true values than those from the MEr method. However, the MEr method performs better than the MEv method in terms of achieving a mean estimate close to the true value when the ratio is less than 1; in this case, the MSE of the estimate based on the MEr method is also less than that based on the LS method.
(v) When the two measurement error variances are equal (i.e., the ratio equals 1), the MEr performs similarly to the MEv and LS methods but better than the AM method in terms of MSE.
Note that if we choose the true values (βtrue) to be 0.1, 0.2, and 0.4, the results are similar to those above (data not shown). In addition, for estimates of the LFR (r), the performance of the three methods has a pattern similar to that of the corresponding methods for estimating the PRR, and detailed results are omitted here.
Effects of measurement errors and death rates on fitness parameter estimates. In order to investigate the effect of measurement errors and death rates of infected cells on the estimate of PRR, we conducted sensitivity analyses using data from an experiment with 7 × 10⁶ target cells infected by a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. Tm and Tw were measured at days 3, 4, 5, 6, and 7. We present the results in Fig. 1 and 2.
FIG. 1. Effect of measurement errors on the estimate of PRR. (a) Estimate of PRR (p) against the ratio of measurement error variances based on the MEr method. (b) Estimate of PRR against the measurement error variance in the covariate (σe²) based on the MEv method. Data were produced from an assay experiment with 7 × 10⁶ target cells infected by a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. Tm and Tw were measured at days 3, 4, 5, 6, and 7.
FIG. 2. Effect of death rates of infected cells on the estimate of PRR (p) based on the LS, MEr, MEv, and AM methods with application to an assay experiment of 7 × 10⁶ target cells infected with a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. Tm and Tw were measured at days 3, 4, 5, 6, and 7. We chose , =0.2, but different δm and δw values were used to obtain estimates of the PRR.
For the three cases (δm − δw = 0.8 − 0.5 = 0.3, δm = δw = 0.5, and δw − δm = 0.8 − 0.5 = 0.3) displayed in Fig. 1a, the estimates of PRR that are less than 1 (based on the MEr method) gradually decrease as the ratio of measurement error variances increases. When the estimates of PRR are greater than 1, they decrease monotonically as the ratio increases (data not shown). However, in both cases, the estimates change little when the ratio varies in the range of (0,15).
For these three cases (δm − δw = 0.8 − 0.5 = 0.3, δm = δw = 0.5, and δw − δm = 0.8 − 0.5 = 0.3), the estimates of the PRR from the MEv method against the measurement error variance in the covariate (σe²) are plotted in Fig. 1b, from which we can see that the estimates of PRR gradually decrease as σe² increases.
Figure 2 presents the results of the estimates of PRR (p < 1) from the four methods against the death rate parameter δm or δw for the three cases with a
value of 1 and a
value of 0.2. We can see that the estimates of PRR increase as
δm or δw increases. However, when the estimates of PRR are greater than 1, the estimates decrease monotonically as δm or δw increases (data not shown).
In addition, Fig. 1 and 2 show that for all the cases and methods, the estimates of PRR increase as the difference δm − δw (> 0) increases, whereas the estimates of PRR decrease as the difference δw − δm (> 0) increases.
Since the LFR is the same as the PRR (i.e., r = p) when δm = δw = 0, the patterns for the estimates of LFR (r) are the same as those obtained for the estimates of PRR (p), as shown in Fig. 1 for δm = δw, and so these results are omitted here.
Experimental data analysis.
For illustrative purposes, we applied the proposed methods to the data from an experiment of our growth competition assay with 7 × 10⁶ target cells infected by a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. The mutant and wild-type virus-infected cells, Tm and Tw, were measured at days 3, 4, 5, 6, and 7. Note that if half the culture is removed and replaced with fresh medium at each measurement point, measurement data [Tm(ti) and Tw(ti)] need to be multiplied by a factor of $2^{i-1}$ (i = 1,...,5) in order to keep the same concentration unit at all time points. We chose δm = 0.5, δw = 0.5,
, and
= 1.0 to estimate the fitness parameters. Figure 3 displays the raw data of ln(Tm) and ln(Tw) and the corresponding fitted lines with observations based on the LS, MEr, and MEv methods. Figure 3b shows that model fittings from the three methods are very similar, because the corresponding estimates (regression coefficient) are very close. Table 2 presents the results of estimated fitness parameters, including p, r, and d, as well as the relative fitness (1 + s) from different methods. It can be seen that for this data set, the AM method has the smallest estimate for the parameter p, and the estimate of p from the MEr method is greater than those from the LS and MEv methods. A similar pattern was also found for the estimates of r based on the four methods.
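The dilution adjustment described above can be sketched as a simple rescaling of the raw measurements. This is a minimal sketch assuming a twofold dilution at every measurement point, so that the ith measurement is multiplied by 2 raised to the power i − 1; the function name and the example values are hypothetical.

```python
def correct_for_dilution(measurements, factor=2.0):
    """Rescale successive measurements to a common concentration unit.

    The first measurement is left unchanged; the i-th (1-based) is multiplied
    by factor ** (i - 1).
    """
    return [value * factor ** index for index, value in enumerate(measurements)]

# Hypothetical raw readings taken after each half-culture replacement.
raw_tm = [10.0, 10.0, 10.0]
adjusted_tm = correct_for_dilution(raw_tm)
print(adjusted_tm)
```

Because the same factor applies to Tm and Tw at each time point, the adjustment changes the individual growth curves but leaves the ratio Tm/Tw at each time point unchanged.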
FIG. 3. (a) The mutant- and wild-type-infected cells in log scale, ln(Tm) and ln(Tw), were measured at days 3, 4, 5, 6, and 7 from an experiment with 7 × 10⁶ target cells infected by a total of 300 ng p24 of the AT1/AT2P236L combination of viruses at a 50/50 ratio. (b) Fitted lines and the corresponding observations for the LS, MEr, and MEv methods.
TABLE 2. Comparison of methods of fitness parameter estimation with a data set from our growth competitive assaya
In HIV research literature, the slope of the logarithmic time plot of the ratio of two viral variant frequencies has been used to quantify the relative fitness (8, 12). Actually this slope is exactly the LRF as we defined it in formulas 10 and 13. The LRF is also the difference of the pure growth rate between two viral variants. Goudsmit et al. (8) correctly used the definition of the relative fitness for the discrete model case but defined the slope of the logarithmic time plot as the selection coefficient (s) for the continuous time model, which is inconsistent with that used in population genetics literature. Goudsmit et al. (9) and Marée et al. (14) suggested using viral dynamic models to more rigorously define and study the relative fitness in competition experiments. However, their definition of relative fitness considered only the case that the selection acts upon the replication (production or infection) rate. Bonhoeffer et al. (1) adopted the same definition used in the work of Goudsmit et al. (9) and Marée et al. (14). Marée et al. (14) also suggested that the death rate should be assumed to be zero if it is unknown whether the selection acts on the production rate or the death rate. However, in their derivation of the relative fitness or the selection coefficient, the death rates of infected cells are not assumed to be zero (see formula 5 in the work of Marée et al. [14]). In the field of population genetics, the selection is usually assumed to act upon both the production rate and the death rate in the definition of the fitness or relative fitness (2, 11). Also note that even if the death rate is set to be zero, the PRR used in the work of Goudsmit et al. (9) and Marée et al. (14) is not equal to the relative fitness or the selection coefficient; it actually is equal to the LFR in this case. We expect that these confused definitions and concepts have been clarified.
We have proposed methods to calculate the relative fitness indices from two data points and statistical methods to estimate these indices from multiple data points. The measurement error modeling approach was also suggested. We compared these methods, via Monte Carlo simulations, to the existing method, the AM method, suggested in reference 1. From the simulation studies, we conclude that (i) if the information about measurement error variances is unavailable, the LS method may be used to estimate relative fitness parameters; (ii) if the variance of the measurement error of the covariate (σe²) is known, the MEv method appears preferable to other methods in terms of the MSE; (iii) if the measurement error variances σε² and σe² are equal (i.e., their ratio equals 1), the MEr method is preferred to other methods because it does not depend on measurement error variances; (iv) if both σε² and σe² are known but their ratio is greater than 1, the MEv method is preferable to other methods; (v) if both σε² and σe² are known but their ratio is
< 1, the MEr method may be better than the other methods, because it is very robust to the choice of
(Fig. 1b). In summary, we may choose one of the LS, MEr, and MEv methods to estimate the relative fitness parameters based on the information availability of measurement error variances. Table 3 summarizes a general guideline for method selection.
TABLE 3. General guideline for selecting methods to estimate the PRR based on information availability of measurement error variances
Because the estimate from the MEv method depends on the measurement error variance in the covariate (σe²) and is a decreasing function of σe², the MEv method should be chosen to estimate the PRR only when a reliable value of σe² is available. Otherwise, this method may overestimate the parameters for small σe² and underestimate for large σe². In addition, if both σε² and σe² are known but their ratio is greater than 1, the MEv method is the best for estimating the PRR and LFR.
The MEr method requires the ratio of measurement error variances to be known. As indicated in the analysis of effects of measurement errors, this method is very robust to the ratio of measurement error variances. Thus, if the ratio can be obtained from other sources or measured from experiments, we may select the MEr method to estimate the PRR and LFR. In particular, when the ratio is at most 1, the MEr method is preferable to other methods in terms of achieving a mean estimate close to the true value.
Unlike the above two methods, the LS method estimates the fitness parameters without considering the effect of measurement error in the covariate. This method may lead to inaccurate estimates if the covariate measurement error is large. However, in contrast to the MEr and MEv methods, the advantage of the LS method is its simplicity and ease of implementation. One can choose the LS method to estimate the fitness parameters when the measurement error is small or when information on it is not available.
The AM method is based on the approach of Marée et al. (14). Unlike the other three methods suggested in this article, it does not involve either a regression-type procedure or any statistical methods. The AM method is very simple and only needs data at two time points to estimate the parameters. However, as stated in reference 1, the AM method may result in inaccurate estimates of the fitness parameters, because statistical fluctuations in the total virus population can result in numerical problems due to division by a value close to zero. Also, it does not provide any information on the uncertainty of the parameter estimates.
The effect of death rates (δm and δw) of infected cells on the estimate of PRR is significant (Fig. 2). If δm and δw are unknown, the estimate of PRR may strongly depend on the specification of the δm and δw values. Thus, the two newly defined fitness indices, the LFR and the LRF, are more attractive in practice, since they do not depend on the death rates of infected cells and are easier to estimate from the experimental data.
Note that the proposed methodology is developed based on the in vitro experiments on T-cell lines. A number of assumptions have been made for model simplifications. These include that the free virus concentrations are proportional to infected cell densities and the concentration of target cells is constant or large enough. These assumptions may need to be further tested, in particular for in vivo data applications.
In summary, we have proposed relative fitness indices based on the dynamic models of virus and cells for our growth competition assay and investigated statistical methods to estimate these indices. The newly defined fitness indices have some advantages for practical applications and are consistent with the fitness definitions used in population genetics. Computational tools were also developed and are freely accessible at http://www.urmc.rochester.edu/bstools/vfitness/virusfitness.htm.
![]() | (A1) |
where εi is a random error that is normally distributed with mean zero, and σε² is the variance of εi. According to the LS approach, the LS estimator can be expressed as follows.
![]() |
![]() |
![]() | (A2) |
{
} is the estimated variance of
,
,
, and
. However, in practice, xi in equation A1 may be measured with error, which needs to be taken into account in the linear regression analysis. This is the so-called "linear regression with measurement error in covariate" in statistics literature, which is briefly reviewed as follows.
Linear regression with measurement error in covariate.
Covariate measurement error is a common source of bias in regression analysis. There exists vast statistical literature on regression models with covariate measurement errors (3, 6). We review the linear regression models with measurement errors in both response variable and covariate here. Assuming that the measurement errors follow normal distributions and are independent of each other, we can write the linear regression model with measurement error in covariate as follows (6).
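$$Y_i = \beta x_i + \varepsilon_i, \qquad X_i = x_i + e_i. \tag{A3}$$
Note that (Yi,Xi)T is a vector of observations corresponding to the underlying true values (yi,xi)T, with yi = βxi; (εi,ei)T is the vector of measurement errors with a bivariate normal distribution; and σε² and σe² are the measurement error variances in response and in covariate, respectively. It follows from equation A3 that the vector (Yi,Xi)T is distributed in a bivariate normal distribution with a mean vector E(Y,X) = (µY,µX) = (βµx,µx) and covariance matrix
$$
\begin{pmatrix} \sigma_{YY} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{XX} \end{pmatrix}
=
\begin{pmatrix} \beta^2\sigma_x^2 + \sigma_\varepsilon^2 & \beta\sigma_x^2 \\ \beta\sigma_x^2 & \sigma_x^2 + \sigma_e^2 \end{pmatrix},
\tag{A4}
$$
where µx and σx² are the mean and variance of the true covariate values xi. Fuller (6) proposed several methods to estimate the regression coefficient β in equation A3. Two of these methods are outlined as follows.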
(i) Ratio of measurement variances known (MEr).
The first method assumes that the ratio
is known. Based on the method of moments estimators, we have
+ 
and
). For samples for which MXY ≠ 0, it can be shown that
![]() |
![]() |
![]() |
![]() | (A5) |
{
} is the estimated variance of
, and
= (n − 2)⁻¹(n − 1)(MYY − 2β̂MXY + β̂²MXX).
(ii) Measurement variance known (MEv).
The second method assumes that the variance of the measurement error in the covariate,
, is known, and it can be seen from equation A4 that the population moments of (Yi,Xi) satisfy (µY,µX) = (βµx,µx), and (
). By replacing the unknown population moments with their sample estimators in the above system of equations, we can obtain the estimator of the regression coefficient,
. Note that this estimator is a ratio of two random variables. Such ratios are typically biased estimators of the ratio of the expectations. In fact, the expectation of this estimator is not defined (6). Fuller (6) provides a modified estimator that is nearly unbiased for ß. The estimators of the unknown parameters in the above system of equations can be written as
![]() |
![]() | (A6) |
![]() |
![]() |
{
} is the estimated variance of
,
= (n − 2)⁻¹(n − 1)(MYY − 2β̂MXY + β̂²MXX),
), and
satisfies
![]() | (A7) |
]. The bias expression of theorem 2.5.1 in reference 6 furnishes the motivation for this estimator. The quantity
is a biased estimator of
, but it is preferred to other estimators of the ratio because of its smaller variance. See reference 6 for detailed derivation of formulae A5 and A6.
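The idea behind the two measurement error corrections can be illustrated with simplified estimators. The sketch below is not a reproduction of formulas A5 and A6: it uses a Deming-type slope through the origin when the ratio of error variances is known and a plain method-of-moments correction when the covariate error variance is known, without the small-sample modification attributed to Fuller (6). The direction of the ratio (response error variance divided by covariate error variance), the use of raw moments, the function names, and the data are all assumptions.

```python
import math

def raw_moments(xs, ys):
    """Uncentered second moments of the observed data."""
    n = len(xs)
    mxx = sum(x * x for x in xs) / n
    myy = sum(y * y for y in ys) / n
    mxy = sum(x * y for x, y in zip(xs, ys)) / n
    return mxx, myy, mxy

def slope_ls(xs, ys):
    """Least-squares slope through the origin, ignoring error in x."""
    mxx, _, mxy = raw_moments(xs, ys)
    return mxy / mxx

def slope_known_ratio(xs, ys, ratio):
    """Deming-type slope; ratio = var(error in Y) / var(error in X) (assumed)."""
    mxx, myy, mxy = raw_moments(xs, ys)
    if mxy == 0:
        raise ValueError("slope is undefined when the cross moment is zero")
    a = myy - ratio * mxx
    return (a + math.sqrt(a * a + 4.0 * ratio * mxy * mxy)) / (2.0 * mxy)

def slope_known_x_variance(xs, ys, var_e):
    """Moment-corrected slope when the covariate error variance is known."""
    mxx, _, mxy = raw_moments(xs, ys)
    denom = mxx - var_e
    if denom <= 0:
        raise ValueError("error variance is too large for the observed moments")
    return mxy / denom

# Hypothetical observed regression variables.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.9, 1.5, 2.6, 3.1]
print(slope_ls(xs, ys), slope_known_ratio(xs, ys, 1.0), slope_known_x_variance(xs, ys, 0.5))
```

Subtracting the known error variance from the covariate moment removes the attenuation of the LS slope, which is why the corrected slope exceeds the LS slope when the cross moment is positive. The ratio-based estimator needs only the relative size of the two error variances.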
This work was supported in part by National Institutes of Health (NIH) research grants RO1 AI052765, RO1 AI055290, RO1 AI41387, NO1 AI38858 (subcontract 200VC007), and U01 AI27658.