**DOI:** 10.1128/JVI.78.16.8942-8945.2004

## ABSTRACT

Human immunodeficiency virus type 1 (HIV-1)-infected splenocytes in humans were recently shown to harbor three to four proviruses per cell on average (A. Jung et al., Nature 418:144, 2002). However, the mechanisms that lead to such extensive multiple infections are not understood. Here, we find by using mathematical analysis that two mechanisms quantitatively capture the distribution of proviral genomes in HIV-1-infected splenocytes, one where multiple genomes are acquired one at a time in a series of sequential infectious contacts of a target cell with free virions and infected cells, and the other where cell-to-cell transmission of multiple virions or genomes results from a single infectious contact of a target cell with an infected cell. The two mechanisms imply different genetic diversities of proviruses within an infected cell and therefore different rates of emergence of drug resistance via recombination.

Human immunodeficiency virus type 1 (HIV-1)-infected CD4^{+} T cells isolated from the spleens of two individuals were recently shown to harbor anywhere between one and eight proviruses, with an average of three to four proviruses per cell (5). This remarkable observation suggests that multiple infections of cells with HIV-1 occur more frequently than single infections. Two recent in vitro studies have also observed a high rate of multiple infections of cells by HIV-1 (2, 6). Here, we present two mechanisms that quantitatively capture the observed distribution of proviral copy number in infected cells (5) and describe how multiple infections are orchestrated in lymphoid tissue (LT) in HIV-1-infected individuals.

We first examine the possibility that multiple infections are the result of sequential infectious contacts of CD4^{+} target cells with cell-free virus or HIV-1-infected cells, each contact resulting in the transmission of one viral genome to the target cell. Sato et al. (11) showed that cell-cell transmission may not involve the participation of virus particles and, thus, we speak of the transmission of viral genomes rather than virus particles. Infectious contacts are those that lead to the successful transmission of viral genomes followed by provirus formation. Let *V* be the density of cell-free virions in LT and *T** be that of infected cells. Following extant models (8), the average rate at which a cell acquires viral genomes may be written as *k*_{V}*V* + *k*_{T}*T**, where *k*_{V} and *k*_{T} are second-order infection rate constants. Thus, a target cell that is first infected at time zero with a single genome acquires (*k*_{V}*V* + *k*_{T}*T**)*dt* additional genomes in a small subsequent interval of time, *dt*. Following infection, however, the cell down regulates its surface CD4 receptors, rendering further infections difficult (1, 6, 11). The rate constants *k*_{V} and *k*_{T} are therefore functions of time and may be written as *k*_{V} = *k*_{V,0}exp(−*t*/*t*_{d}) and *k*_{T} = *k*_{T,0}exp(−*t*/*t*_{d}), where *k*_{V,0} and *k*_{T,0} are the corresponding infection rate constants for uninfected cells and exp(−*t*/*t*_{d}) is the fraction of CD4 molecules remaining on the surface at time *t*. Here, an exponential decline of surface CD4 molecules, as observed by Piguet et al. (10), is assumed, with *t*_{d} the characteristic down regulation time scale.

In the chronically infected steady state, where viral loads and infected T-cell counts remain approximately constant over time scales (months or years) that are large compared to *t*_{d} (∼minutes [8]), the total number of additional genomes acquired on average by infected cells harboring single proviruses is given by the following equation:
$$mathtex$$\[\nu \approx (k_{V,0}V + k_{T,0}T^{\ast})\int_{0}^{\infty}\exp(-t/t_{d})\,dt = (k_{V,0}V + k_{T,0}T^{\ast})t_{d}\]$$mathtex$$(1)
Assuming infections to be Poisson processes, the probability that an infected cell harboring a single provirus acquires *m* additional genomes is *p*(*m*) = exp(−ν)ν^{m}/*m*!, where *m*! = *m*(*m* − 1)…(2)(1) and ν is the mean number of additional genomes. Thus, of all infected cells in LT, a fraction *P*(*n*) = *p*(*n* − 1) harbors *n* proviruses, with *n* ≥ 1. This distribution has been measured as the frequency of occurrence of different numbers of proviruses in infected splenocytes in two HIV-1-infected individuals (5). By fitting *P*(*n*) to the measurements, we estimate the parameter ν. The best fit yields ν = 2.3 (2.28 for patient R and 2.25 for patient B) (Fig. 1a and b), suggesting that in these two individuals, on average, cells in LT were infected ν + 1 = 3.3 times. It can be shown that this value of ν satisfies equation 1 for reasonable values of *V*, *T**, *k*_{V,0}, *k*_{T,0}, and *t*_{d} (see note A1 in the Appendix), giving us further confidence in this parameter estimate. This model therefore presents a simple and reasonable picture of how multiple infections are orchestrated in LT by sequential infectious contacts of cells with cell-free virus and infected cells.

One apparent limitation of this description is that the Poisson distribution, although providing good fits (Fig. 1a and b), does not capture the bimodal distribution of the data. The distribution of proviral copy number observed in the two patients (5) peaked at multiplicities of 1 and 3, whereas the fit predicts a single peak at 3.3. Whether this bimodal distribution will be characteristic of larger patient populations is unknown, but given the data at hand we asked whether we can capture this bimodal distribution. To do this, we suggest an alternative plausible description in which multiple genomes are transmitted simultaneously in individual infectious events. Multiple genomes may be transmitted either through direct cell-cell contact (11), by transmission through dendritic cell-T-cell contact (2), or via the intriguing new mechanism suggested by the Trojan exosome hypothesis, where HIV-1 exploits a cell-encoded intercellular vesicle traffic pathway for Env-independent infection of nearby cells (3).
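The sequential-infection distribution, *P*(*n*) = *p*(*n* − 1) with Poisson *p*, can be evaluated directly. The following is a minimal sketch for illustration only, not the fitting procedure behind Fig. 1; the function name `sequential_pmf` and the truncation at 40 proviruses are assumptions introduced here.

```python
import math

def sequential_pmf(n, nu):
    """P(n): probability of n >= 1 proviruses when the n - 1 additional
    genomes are Poisson distributed with mean nu (a shifted Poisson)."""
    if n < 1:
        return 0.0
    m = n - 1
    return math.exp(-nu) * nu**m / math.factorial(m)

nu = 2.3  # best-fit value quoted in the text
# Truncating at n = 40 is an assumption; the neglected tail is tiny for nu = 2.3.
pmf = {n: sequential_pmf(n, nu) for n in range(1, 41)}

mean_proviruses = sum(n * p for n, p in pmf.items())
mode = max(pmf, key=pmf.get)
print(f"mean = {mean_proviruses:.2f}, most probable n = {mode}")
```

With ν = 2.3 the mean is ν + 1 = 3.3 and the probabilities rise to a single maximum at *n* = 3 before falling, so this form cannot produce a separate peak at *n* = 1.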

We assume that infectious contact of a target cell with a cell-free virion results in the transmission of a single viral genome to the cell, whereas an infectious contact with an infected cell results in the transmission of *m* ≥ 1 genomes. *m* is assumed to follow the modified Poisson distribution, *q*(*m*) = {exp(−γ)/[1 − exp(−γ)]}γ^{m}*/m*!, where γ is the average number of genomes transmitted per infectious cell-cell contact. (The Poisson distribution is modified to rule out the transmission of zero genomes in any infectious cell-cell contact. Some contacts could lead to no transmission of genomes, but we only considered contacts in which transmission occurs.) Assuming that each cell participates in a maximum of one infectious event, the probability that this infectious event is contact with a cell-free virion is given by the following equation:
$$mathtex$$\[f = k_{V,0}V/(k_{V,0}V + k_{T,0}T^{\ast})\]$$mathtex$$(2)
and 1 − *f* is the probability that the infectious event involves cell-cell contact. Under these circumstances, an infected cell can possess a single provirus if it is infected by a cell-free virion or contacts an infected cell that happens to transmit a single genome. On the other hand, it can possess *n* > 1 proviruses if it contacts an infected cell resulting in the successful transmission of *n* genomes. Thus, if *P*(*n*) is the probability that an infected cell possesses *n* proviruses, it follows that *P*(1) = *f* + (1 − *f*)*q*(1), and *P*(*n* > 1) = (1 − *f*)*q*(*n*), where *q*(*n*) is the modified Poisson probability defined above. *P*(*n*) thus gives the distribution of the multiplicity of infection measured (5). We fit *P*(*n*) to the measurements using *f* and γ as adjustable parameters as shown in Fig. 1c and d. The best fits yielded the following values: *f* = 0.1 and γ = 3.4 for one patient (R), and *f* = 0.06 and γ = 3.3 for the other (B). (A more complicated model assuming the possibility of additional sequential infections yielded identical fits [see note A2 in the Appendix].)

The simultaneous infection model thus predicts that ∼10% of infections in LT are transmitted by cell-free virions and ∼90% are transmitted by infected cells and that every infectious cell-cell contact results in the average transmission of ∼3.4 viral genomes. The model also explains the bimodal distribution of the multiplicity of infection observed in the data (5). The maximum at *n* = 1 reflects transmission by cell-free virions, whereas the maximum at *n* = 3 is the result of cell-to-cell transmission. For patient R, in whom ∼10% of the transmissions were by cell-free virions, the best fit of the model captured the bimodal distribution. For patient B, in whom ∼6% of the infections were transmitted by cell-free virions, the maximum at *n* = 1 was suppressed in the best fit. It can be shown that the best-fit values of *f* satisfy equation 2 for reasonable values of *V*, *T**, *k*_{V,0}, and *k*_{T,0}, indicating that the model provides a self-consistent picture of infection spread in LT (see note A3 in the Appendix).
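The simultaneous-infection distribution can be evaluated for the two sets of best-fit parameters to see where its maxima fall. This is a minimal sketch using only the expressions for *q*(*m*) and *P*(*n*) given above; the function names are hypothetical, and the code does not reproduce the fitting itself.

```python
import math

def modified_poisson(m, gamma):
    """q(m): zero-truncated Poisson for m >= 1 genomes per infectious
    cell-cell contact, with Poisson parameter gamma."""
    if m < 1:
        return 0.0
    norm = math.exp(-gamma) / (1.0 - math.exp(-gamma))
    return norm * gamma**m / math.factorial(m)

def simultaneous_pmf(n, f, gamma):
    """P(n) for a cell that takes part in exactly one infectious event."""
    cell_term = (1.0 - f) * modified_poisson(n, gamma)
    # A cell-free virion (probability f) always delivers exactly one genome.
    return f + cell_term if n == 1 else cell_term

fits = {"R": (0.10, 3.4), "B": (0.06, 3.3)}  # best-fit (f, gamma) quoted in the text
for patient, (f, gamma) in fits.items():
    probs = [simultaneous_pmf(n, f, gamma) for n in range(1, 9)]
    print(patient, [round(p, 3) for p in probs])
```

For patient R, *P*(1) exceeds *P*(2) and *P*(3) is a second local maximum, giving the two peaks at *n* = 1 and *n* = 3. For patient B, *P*(1) falls below *P*(2), so the peak at *n* = 1 is absent.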

Two plausible mechanisms thus explain the observations of the multiplicity of infections in LT of HIV-1-infected individuals (5). Cells may possess multiple proviruses either due to their participation in a series of sequential infectious events, each transmitting single viral genomes, or in single infectious events involving contact with other infected cells, in which multiple genomes are simultaneously transmitted. Of interest is that with either mechanism, the Poisson process predominantly underlies the observed distribution of proviral copy number per cell. Poisson statistics describe processes where the probability of an event occurring in a sufficiently small time interval Δ*t* is λΔ*t*, where λ is the average rate at which events occur, the probability of more than one event occurring in a sufficiently small interval is zero, and the numbers of events occurring in nonoverlapping intervals are independent. Thus, in the sequential infection case, the participation of a cell in an infectious event does not make its participation in another infectious event less or more likely.
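The claim that a decaying infection rate still yields a Poisson-distributed number of additional genomes can be checked by simulation. The sketch below is our own illustration and assumes nothing beyond the rate law *k* ∝ exp(−*t*/*t*_{d}): in rescaled time, such a process has unit rate on an interval of length ν, so events are drawn with exponential waiting times until ν is exceeded. The function name, seed, and sample size are arbitrary choices.

```python
import random

def additional_genomes(nu, rng):
    """Count events of a Poisson process whose rate decays as exp(-t/t_d).

    After rescaling time by the integrated rate, the process has unit rate
    on [0, nu), where nu is the total integrated rate from equation 1.
    """
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < nu:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
samples = [additional_genomes(2.3, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

Both the sample mean and the sample variance should lie close to ν = 2.3, as expected for a Poisson count.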

Which of these mechanisms occurs in vivo can be tested in several ways. A direct test would be to determine whether infectious cell-cell contacts result in the transmission of multiple genomes, as has been suggested to occur in vitro between dendritic cells and T cells (2). Alternatively, the two mechanisms may be distinguished by their qualitatively distinct predictions of the distribution of proviral copy number per cell, viz., unimodal versus bimodal. While the better fits of the simultaneous infection model (see residual sum of squares [RSS] values in Fig. 1) to the present data (5) suggest a bimodal distribution, investigation of a larger set of patients is necessary to establish this trend unequivocally. On the other hand, the proviruses described in reference 5 were genetically diverse, which argues that sequential infection may play an important role (see below). The implications for our understanding of HIV-1 pathogenesis and the impact of therapy, in particular the emergence of drug-resistant strains via recombination, can be significant. Proviruses within an infected cell resulting from genomes acquired by an infectious cell-cell contact are expected to exhibit limited genetic variability in comparison to those acquired by sequential infections, as the latter are likely to have arisen from different cells. Consequently, sequential infections will favor the emergence of divergent strains via recombination. (An exception could be when transmission occurs via a T-cell-dendritic cell contact and the latter cell has captured diverse virions.)

To gain an understanding of how rapidly recombinant strains emerge according to the two mechanisms, consider a target cell in a pool of cell-free virions of two genotypes in equal abundance. Let there also be cells infected with either genotype (but not both) so that the populations of infected cells and cell-free virions of either genotype are at steady state. Recombinant genotypes can emerge if the target cell is infected with multiple genomes, not all of which are of the same genotype. Let *P*_{R} be the probability that the target cell acquires multiple genomes, not all of which are of the same genotype. Then, if infection were spread by the sequential transmission mechanism, it follows that *P*_{R} = ∑_{n = 2}^{∞} *P*(*n*)[1 − 2(1/2)^{n}], where *P*(*n*) is the probability that the cell acquires *n* genomes and 2(1/2)^{n} is the probability that all *n* genomes belong to one genotype, where infection by either genotype is assumed to be equally likely. The prefactor of 2 arises as there are two genotypes, while the factor (1/2)^{n} is analogous to the probability of getting all heads or all tails in *n* tosses of an ideal coin. Assuming *P*(*n*) from the above fits (with ν = 2.3), we find *P*_{R} to be ∼0.7.
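The summation for *P*_{R} is straightforward to evaluate numerically. The sketch below assumes the sequential-infection form *P*(*n*) = exp(−ν)ν^{n−1}/(*n* − 1)! and truncates the infinite sum at a hypothetical cutoff `n_max`; the function name is ours.

```python
import math

def recombination_probability(nu, n_max=60):
    """P_R = sum over n >= 2 of P(n) * [1 - 2*(1/2)**n], with
    P(n) = exp(-nu) * nu**(n - 1) / (n - 1)!.

    n_max truncates the infinite sum; this cutoff is an assumption and the
    neglected tail is negligible for nu of order 1.
    """
    total = 0.0
    for n in range(2, n_max + 1):
        p_n = math.exp(-nu) * nu ** (n - 1) / math.factorial(n - 1)
        all_same_genotype = 2.0 * 0.5**n
        total += p_n * (1.0 - all_same_genotype)
    return total

p_r = recombination_probability(2.3)
print(f"P_R = {p_r:.2f}")  # prints "P_R = 0.68"
```

The result, about 0.68, rounds to the value of ∼0.7 quoted above.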

If infection proceeds by the simultaneous infection mechanism, the emergence of recombinant genotypes still requires the target cell to participate in more than one infectious event, as multiple genomes acquired from a single infectious contact are all expected to be of the same genotype. In this case, *ν* from equation 1 may be interpreted as the average number of additional infectious events an infected cell participates in. From the range of parameter estimates in note A3 of the Appendix, we found that ν assumes a maximum value of 0.02. This yields a *P*_{R} value of ∼0.01 from the above summation. It thus follows that recombinant genotypes will emerge ∼70 times or nearly 2 orders of magnitude faster with the sequential infection mechanism than with the simultaneous infection mechanism, in which a homogeneous population of genomes is transmitted per cell-cell contact.
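The comparison between the two mechanisms can be reproduced in a few lines. As a simplification worked out here (it is not stated in the article), inserting *P*(*n*) = exp(−ν)ν^{n−1}/(*n* − 1)! into the summation gives ∑_{n ≥ 2} *P*(*n*) = 1 − exp(−ν) and ∑_{n ≥ 2} *P*(*n*)2(1/2)^{n} = exp(−ν/2) − exp(−ν), so that *P*_{R} = 1 − exp(−ν/2). The function name below is hypothetical.

```python
import math

def p_recombinant(nu):
    """Closed form of the P_R series for P(n) = exp(-nu)*nu**(n-1)/(n-1)!.
    This simplification is derived here, not taken from the article."""
    return 1.0 - math.exp(-nu / 2.0)

sequential = p_recombinant(2.3)     # nu from the sequential-infection fit
simultaneous = p_recombinant(0.02)  # maximum nu quoted for the simultaneous model
ratio = sequential / simultaneous

print(f"sequential P_R   = {sequential:.2f}")
print(f"simultaneous P_R = {simultaneous:.3f}")
print(f"ratio            = {ratio:.0f}")
```

The ratio comes out slightly below 70, in line with the ∼70-fold difference stated in the text.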

While the calculation above argues that sequential infection can generate diversity more rapidly than simultaneous infection via cell-to-cell transmission, it does not imply that observing diverse proviruses in chronically infected patients means that simultaneous infection does not occur. If both sequential and simultaneous transmission of genomes occurs, sequential infections over time will build up a diverse set of genomes that can then be transmitted by cell-to-cell infection. Thus, once a diverse set of proviruses establishes itself, possibly in primary infection, both sequential and simultaneous transmission events would be consistent with the observed proviral diversity.

One limitation of our work is that we have assumed that cell populations are homogeneous. If multiple cell populations with different susceptibilities to infection and different probabilities of transmitting infection are considered, more complex models with additional parameters could be generated, which also could explain the data. Here we have striven to present the simplest models that are compatible with the data and which can be tested.

## APPENDIX

**Note A1** Since single viral genomes are transmitted per infectious event in the sequential infection model, transmitting cells need not be productively infected (11). The total infected cell population in LT is estimated to be ∼10^{10} cells (4), which yields *T** = 10^{10}/700 ml^{−1} = 1.4 × 10^{7} ml^{−1}. (An average 70-kg human being has about 700 g of LT at a density of ∼1 g cm^{−3}.) The second-order infection rate, *k*_{V,0}, has been set to 2% of the diffusion-limited collision rate, 4π(*r*_{T} + *r*_{V})(*D*_{T} + *D*_{V}), determined from the Smoluchowski equation (9), which, assuming that the radius of T cells, *r*_{T}, is 4 × 10^{−4} cm and that of virions, *r*_{V}, is 5 × 10^{−6} cm, and that the diffusivity of T cells, *D*_{T}, is 1.3 × 10^{−8} cm^{2} s^{−1} (7) and that of virions, *D*_{V}, is 2 × 10^{−8} cm^{2} s^{−1}, yields *k*_{V,0} = 2.3 × 10^{−12} cm^{3} s^{−1} = 2 × 10^{−7} ml day^{−1}. Similarly, the collision rate between two T cells is 16π*r*_{T}*D*_{T}, of which a fraction, α, leads to successful infections, so that *k*_{T,0} = α16π*r*_{T}*D*_{T}. In vitro studies find cell-to-cell transmission of infection to be more efficient than transmission by cell-free virus (11). Assuming the maximum efficiency of the former to be 10 times greater than the latter, we write 0.02 < α < 0.2, which yields 4.5 × 10^{−7} ml day^{−1} < *k*_{T,0} < 4.5 × 10^{−6} ml day^{−1}. Three HIV genes, viz., *nef*, *env*, and *vpu*, are together responsible for the elimination of almost all CD4 molecules from the surface of an infected cell, with *nef* exhibiting the predominant influence (1). Piguet et al. have recently determined the fraction of surface CD4 receptors internalized by the Nef protein as a function of time (10). An exponential fit to their data yields a *t*_{d} value of ∼40 min or 0.028 day (data not shown). With ν = 2.3, equation 1 then yields *V* = 9.5 × 10^{7} to 3.8 × 10^{8} ml^{−1}, which is in the range expected for the cell-associated viral load in LT (4).
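The arithmetic of note A1 can be retraced as follows. This is a sketch under the note's own assumptions: it recomputes the bounds on *k*_{T,0} from 16π*r*_{T}*D*_{T} and then solves equation 1 for *V* using the rounded values quoted in the note (*k*_{V,0} = 2 × 10^{−7} ml day^{−1}, *T** = 1.4 × 10^{7} ml^{−1}, *t*_{d} = 0.028 day). Variable names are ours.

```python
import math

SECONDS_PER_DAY = 86400.0

r_T = 4e-4       # T-cell radius, cm
D_T = 1.3e-8     # T-cell diffusivity, cm^2 per s
k_V0 = 2e-7      # ml per day, value quoted in the note
T_star = 1.4e7   # infected cells per ml, value quoted in the note
t_d = 0.028      # day
nu = 2.3         # best-fit mean number of additional genomes

# Cell-cell collision rate, converted from cm^3 per s to ml per day (1 cm^3 = 1 ml).
collision = 16.0 * math.pi * r_T * D_T * SECONDS_PER_DAY
k_T0_bounds = [alpha * collision for alpha in (0.02, 0.2)]
print("k_T0 bounds (ml/day):", [f"{k:.2e}" for k in k_T0_bounds])

# Equation 1 rearranged: V = (nu / t_d - k_T0 * T*) / k_V0,
# evaluated with the rounded k_T0 bounds quoted in the note.
V_range = [(nu / t_d - k_T0 * T_star) / k_V0 for k_T0 in (4.5e-7, 4.5e-6)]
print("V (per ml):", [f"{V:.2e}" for V in V_range])
```

The larger *k*_{T,0} gives the lower bound on *V*, because more of the required infection rate is then supplied by cell-cell contacts.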

**Note A2** We consider a model where a maximum of two sequential events can occur, each transmitting 1 or *n* viral genomes, depending on whether the event involves contact with a cell-free virion or an infected cell, respectively. Let *h* be the probability that a second infection does not occur. Then, the probability that the second infection is due to a cell-free virion is (1 − *h*)*f*, and the probability that it is due to another cell is (1 − *h*)(1 − *f*). The probability that an infected cell harbors a single provirus is given by *P*(1) = *fh* + (1 − *f*)*q*(1)*h*, where the first term is the probability that the first of two sequential infections is due to a cell-free virion and that a second infection does not occur. The second term is the probability that the first infection is due to an infected cell which happens to transmit a single genome, the probability of which is the modified Poisson distribution, *q*(1), and that a second infection does not occur. Similarly, *P*(2) = *f*[(1 − *h*)*f* + (1 − *h*)(1 − *f*)*q*(1)] + (1 − *f*)*q*(1)[(1 − *h*)*f* + (1 − *h*)(1 − *f*)*q*(1)] + (1 − *f*)*q*(2)*h*, and *P*(*n* > 2) = (1 − *f*)*q*(*n*)*h* + *f*(1 − *h*)(1 − *f*)*q*(*n* − 1) + (1 − *f*)*q*(*n* − 1)(1 − *h*)*f* + (1 − *f*)^{2}(1 − *h*)Σ_{m} *q*(*n* − *m*)*q*(*m*), where the summation in the last term extends from *m* = 1 to *m* = *n* − 1. Fitting this model to the data in Fig. 1, with *f*, γ, and *h* as parameters to be estimated, yielded *h* = 0 and best-fit values of *f* and γ that are identical to those of the model allowing target cells to suffer a maximum of one infectious event.
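The expressions of note A2 can be collected into one function and checked for normalization. This is a sketch only: the function names are hypothetical, the grouping of terms is ours, and *h* = 0.5 is an arbitrary illustrative value rather than a fitted one.

```python
import math

def q(m, gamma):
    """Modified (zero-truncated) Poisson probability of m >= 1 genomes."""
    if m < 1:
        return 0.0
    return math.exp(-gamma) / (1.0 - math.exp(-gamma)) * gamma**m / math.factorial(m)

def two_event_pmf(n, f, gamma, h):
    """P(n) when at most two sequential infectious events occur;
    h is the probability that no second event takes place."""
    # Exactly one event: a virion (one genome) or a cell delivering n genomes.
    one_event = h * ((f if n == 1 else 0.0) + (1.0 - f) * q(n, gamma))
    if n < 2:
        return one_event
    # Two events (probability 1 - h), grouped by the sources involved.
    virion_virion = f * f if n == 2 else 0.0
    mixed = 2.0 * f * (1.0 - f) * q(n - 1, gamma)  # virion then cell, or cell then virion
    cell_cell = (1.0 - f) ** 2 * sum(q(n - m, gamma) * q(m, gamma) for m in range(1, n))
    return one_event + (1.0 - h) * (virion_virion + mixed + cell_cell)

f, gamma, h = 0.1, 3.4, 0.5  # h = 0.5 is illustrative only
probs = [two_event_pmf(n, f, gamma, h) for n in range(1, 81)]
print(f"sum of P(n) for n = 1..80: {sum(probs):.6f}")
```

For *n* = 2 the grouped terms expand to the expression for *P*(2) given in the note, and the probabilities sum to 1 for any *h* between 0 and 1.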

**Note A3** Since each cell-to-cell transmission event results in the passage of several viral genomes, transmitting cells are most likely to be productively infected. In the chronically infected steady state in HIV-infected individuals, ∼10^{7} to ∼10^{8} CD4^{+} T cells in LT are productively infected (4). The density of productively infected cells in LT, *T**, is therefore ∼10^{7} to 10^{8}/700 ml^{−1}, or ∼1.4 × 10^{4} ml^{−1} to 1.4 × 10^{5} ml^{−1}. From equation 2, *f* = 0.1 implies that 9*k*_{V,0}*V* = *k*_{T,0}*T**. For the range of *k*_{T,0} (see note A1, above) and *T** estimated, the cell-free viral load in LT, *V* = *k*_{T,0}*T**/(9*k*_{V,0}) = 3.5 × 10^{3} ml^{−1} to 3.5 × 10^{5} ml^{−1}, which is comparable to the viral load in plasma but lower than the cell-associated viral load in LT (4).
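The range of *V* in note A3, and the corresponding value of ν from equation 1, follow from the quoted parameters. The sketch below pairs the lower bounds of *k*_{T,0} and *T** and the upper bounds of each, which is our reading of how the quoted range was obtained; *k*_{V,0} = 2 × 10^{−7} ml day^{−1} and *t*_{d} = 0.028 day are taken from note A1, and the variable names are ours.

```python
k_V0 = 2e-7   # ml per day (note A1)
t_d = 0.028   # day (note A1)
f = 0.1       # best-fit fraction of infections due to cell-free virions

# (k_T0 in ml per day, T* per ml): lower and upper ends of the quoted ranges.
cases = {"low": (4.5e-7, 1.4e4), "high": (4.5e-6, 1.4e5)}

results = {}
for label, (k_T0, T_star) in cases.items():
    cell_rate = k_T0 * T_star
    # Equation 2 rearranged: k_V0 * V = f / (1 - f) * k_T0 * T*
    V = f / (1.0 - f) * cell_rate / k_V0
    # Equation 1: mean number of additional infectious events per infected cell
    nu = (k_V0 * V + cell_rate) * t_d
    results[label] = (V, nu)
    print(f"{label}: V = {V:.2e} per ml, nu = {nu:.4f}")
```

The upper end gives ν of about 0.02, so additional infectious events are rare under the simultaneous infection model with these parameter values.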

## ACKNOWLEDGMENTS

This research is supported by the Department of Energy under contract W-7405-ENG-36 and by National Institutes of Health grants AI28433 and RR06555.

We thank Simon Wain-Hobson for providing the experimental data in Fig. 1.

## FOOTNOTES

- Received 21 January 2004.
- Accepted 19 May 2004.

- Copyright © 2004 American Society for Microbiology