**DOI:**10.1128/JVI.01010-10

## ABSTRACT

Nevirapine forms the mainstay of our efforts to curtail the pediatric AIDS epidemic through prevention of mother-to-child transmission of HIV-1. A key limitation, however, is the rapid selection of HIV-1 strains resistant to nevirapine following the administration of a single dose. This rapid selection of resistance suggests that nevirapine-resistant strains preexist in HIV-1 patients and may adversely affect outcomes of treatment. The frequencies of nevirapine-resistant strains *in vivo*, however, remain poorly estimated, possibly because they exist as a minority below current assay detection limits. Here, we employ stochastic simulations and a mathematical model to estimate the frequencies of strains carrying different combinations of the common nevirapine resistance mutations K103N, V106A, Y181C, Y188C, and G190A in chronically infected HIV-1 patients naïve to nevirapine. We estimate the relative fitness of mutant strains from an independent analysis of previous competitive growth assays. We predict that single mutants are likely to preexist in patients at frequencies (∼0.01% to 0.001%) near or below current assay detection limits (>0.01%), emphasizing the need for more-sensitive assays. The existence of double mutants is subject to large stochastic variations. Triple and higher mutants are predicted not to exist. Our estimates are robust to variations in the recombination rate, cellular superinfection frequency, and the effective population size. Thus, with 10^{7} to 10^{8} infected cells in HIV-1 patients, even when undetected, nevirapine-resistant genomes may exist in substantial numbers and compromise efforts to prevent mother-to-child transmission of HIV-1, accelerate the failure of subsequent antiretroviral treatments, and facilitate the transmission of drug resistance.

Mother-to-child transmission of HIV infection has triggered a pediatric AIDS epidemic in many parts of the world. In 2008, 2.1 million children under the age of 15 years were living with HIV infection worldwide, 430,000 children were newly infected, and 280,000 children died due to HIV/AIDS (62). Most HIV infections in children occur due to the transmission of the virus from the mothers during pregnancy or delivery or subsequently through breast-feeding (9). Nevirapine (NVP), a nonnucleoside reverse transcriptase inhibitor (NNRTI), forms the mainstay of our efforts to prevent mother-to-child transmission of HIV-1 infection in the developing world. A single dose of NVP (SDNVP) to the mother at the onset of labor and to the infant within 72 h of birth lowered the risk of mother-to-child transmission by 41% (22). A key limitation of SDNVP, however, is the rapid selection of NVP resistance mutations (15). For instance, after 6 to 8 weeks of SDNVP, 87% of mothers and infants were found to harbor NVP resistance mutations by an allele-specific assay (34). NVP resistance mutations decay with time following their selection after SDNVP but may persist in some patients for several years (15, 17, 34, 44). NVP resistance mutations may also be archived in latently infected cells (64), which have half-lives of ∼44 months (55). The rapid selection and persistence of NVP resistance mutations following SDNVP may compromise the efficacy of SDNVP during subsequent pregnancies, may adversely impact outcomes of subsequent antiretroviral therapies containing NNRTIs, and may result in the transmission of drug-resistant strains of HIV-1 (18, 27, 33, 36, 38, 60).

HIV exists in infected individuals as a quasispecies, a collection of related yet distinct viral genomes (7, 57). Specific mutations in the viral genome are responsible for resistance to current antiretroviral drugs (6, 25). Thus, genomes carrying drug resistance mutations may exist as part of the viral quasispecies in infected individuals prior to the onset of therapy (31, 41). Mutations typically carry a fitness penalty (3). Consequently, drug-resistant genomes may exist as a minority, with the fitter, susceptible genomes dominant in the quasispecies (19, 24, 39, 56). Resistant genomes have a fitness advantage in the presence of drugs and, consequently, are rapidly selected following the onset of therapy, increasing the risk of the failure of therapy (19, 24, 39, 56). The rapid selection of NVP resistance mutations following SDNVP implies that these mutations may exist as a minority quasispecies in treatment-naïve individuals. Indeed, the risk of failure of NNRTI-based antiretroviral therapy increased substantially when initiated after administration of SDNVP (27, 33, 60). In particular, among 60 women who started NVP-based antiretroviral therapy within 6 months of receiving either SDNVP or placebo, 41.7% of the women who received SDNVP and none who received placebo had virologic failure (*P* < 0.001) (33).

Knowledge of the frequencies of minority NVP resistance mutations is essential for accurate assessment of the risk of mother-to-child transmission, the outcomes of NNRTI-based antiretroviral therapies, and the epidemiology of the transmission of drug resistance (38). Indeed, substantial efforts are ongoing to assess the frequencies and the prevalence of minority NVP resistance mutations in HIV-infected individuals using novel, highly sensitive detection techniques (14, 16, 23, 44). For instance, in 50 South African women without any detectable NVP resistance mutations before SDNVP, 10 were detected with the common K103N NVP resistance mutation following SDNVP, using a standard population-based sequencing analysis which detects mutations present in frequencies of >20%, whereas when a sensitive real-time reverse transcription-PCR (RT-PCR) technique with an assay detection limit of ∼0.1% was used, an additional 16 women were found to harbor the K103N mutation (23). Recent advances in detection techniques have thus provided improved estimates of the frequencies and the prevalence of drug resistance mutations. Yet, current techniques may not be able to assess fully the impact of drug resistance mutations. The population of infected cells within an HIV-1 patient is ∼10^{7} to 10^{8} (20). Thus, even with the best assay sensitivities of ∼0.01% (43), genomes that infect fewer than 10^{3} to 10^{4} cells would remain undetected and yet may have a substantial impact on treatment outcome. Alternative ways of estimating the frequencies of minority NVP resistance mutations are therefore necessary. Here, we employ detailed simulations of the within-host evolution of HIV-1 and a mathematical model of HIV-1 dynamics to estimate the frequencies of NVP resistance mutations in a chronically infected individual naïve to antiretroviral therapy.

## MATERIALS AND METHODS

Estimation of the relative fitness of genomes by analysis of competitive growth assays.We consider the experiments of Collins et al. (8), who performed competitive growth assays in which cells were exposed at low multiplicity of infection (MOI) simultaneously to the wild-type virus and a mutant, *i*, and the time evolution of the ratio of the mutant and wild-type viruses was measured. The experiments proceeded over a short duration (∼6 days), when viral populations grow predominantly by replication; mutation and recombination are negligible. The time evolutions of the populations of the wild-type and mutant virions are then governed by the following equations:
$$mathtex$$\[\frac{dT}{dt}{=}({\lambda}{-}d_{T})T{-}k_{0}T(V_{00}{+}V_{ii})\]$$mathtex$$(1a)$$mathtex$$\[\frac{dT_{0}}{dt}{=}k_{0}TV_{00}{-}{\delta}T_{0}\]$$mathtex$$(1b)$$mathtex$$\[\frac{dT_{i}}{dt}{=}k_{0}TV_{ii}{-}{\delta}T_{i}\]$$mathtex$$(1c)$$mathtex$$\[\frac{dV_{00}}{dt}{=}N{\delta}T_{0}{-}cV_{00}\]$$mathtex$$(1d)$$mathtex$$\[\frac{dV_{ii}}{dt}{=}N{\zeta}_{i}{\delta}T_{i}{-}cV_{ii}\]$$mathtex$$(1e) Here, uninfected cells, *T*, proliferate in proportion to their abundance with a rate constant, λ, and die with a rate constant, *d _{T}*. Uninfected cells are also infected by wild-type virions,

*V*

_{00}, and mutant virions,

*V*, with the second-order infection rate constant

_{ii}*k*

_{0}. We denote by

*T*

_{0}and

*T*cells infected with the wild-type and the mutant genomes, respectively. The latter cells die with a first-order rate constant, δ. Cells infected with the wild type produce

_{i}*N*progeny virions per cell, the viral burst size. Cells infected with the mutant produce a smaller number of virions,

*N*ζ

_{i}, where ζ

_{i}(0 ≤ ζ

_{i}≤ 1) is the relative fitness of genome

*i*. Free virions are lost with the first-order rate constant

*c*.

We solve the above equations using parameter values representative of HIV-1 infection *in vitro* (2, 58, 61): λ = 0.624 day^{−1}, *d _{T}* = 0.018 day

^{−1}, δ = 1.44 day

^{−1},

*c*= 3.5 day

^{−1},

*N*= 10

^{3}, and

*k*

_{0}= 10

^{−8}day

^{−1}. At time

*t*= 0, 5 × 10

^{5}uninfected cells were exposed to virions (MOI = 0.001), of which an experimentally determined fraction contained mutant genomes (8). The ratio

*V*/

_{ii}*V*

_{00}yields the mutant/wild-type ratio measured experimentally. We fit model predictions of the time evolution of

*V*/

_{ii}*V*

_{00}to the experimental data using ζ

_{i}as an adjustable parameter for each of the viable NVP-resistant mutants. The fits were performed using the Levenberg-Marquardt algorithm in MATLAB.

Simulations of viral evolution. (i) Representation and initialization.We consider a fixed population, *C*, of uninfected cells exposed to a pool of *V* virions. Each virion consists of two RNA genomes represented by bit-strings of length *L* = 5 each. The five positions represent the NVP resistance loci 103, 106, 181, 188, and 190. Infection begins with a pool of identical homozygous virions, where each position has the wild-type allele.

(ii) Infection.We let infections occur in discrete generations. In any generation, each cell is infected by *M* virions drawn from the viral pool, where *M* follows a predetermined distribution of the frequency of multiple infections of cells (see below). Virions are chosen for infection randomly from the viral pool.

(iii) Reverse transcription.Following infection, reverse transcription converts the viral RNA in each infecting virion to proviral DNA of length *L*. Here, mutation and recombination introduce genomic variations. We implement recombination first as follows. At the first position, we choose the nucleotide from one of the two strands with a probability of 0.5. At the next position, we choose the nucleotide from the same strand if an even number of crossovers, or template switches, occurs between the two positions. If *l* is the number of nucleotides between positions 1 and 2, then the probability that even crossovers occurs between the two positions is *P*_{even}(*l*) = exp(−ρ*l*)cosh(ρ*l*), where ρ is the recombination rate expressed as the number of crossovers per site per replication (61). For instance, positions 1 and 2 here correspond to amino acid positions 103 and 106, respectively. The number of nucleotides, *l*, between these positions is assumed to be 9, three times the corresponding amino acid separation. We generate a random number from a uniform distribution between 0 and 1. If the number is smaller than *P*_{even}(*l*), then we choose the nucleotide at the second position from the same strand as the first. If not, we choose the nucleotide from the other strand. We repeat this process for all the five positions and generate a recombinant genome. Next, we mutate the recombinant genome at each position with a probability, μ, per site per replication and obtain the resulting proviral DNA. We repeat the process for all virions infecting cells.

(iv) Viral production.The proviral DNAs are then transcribed into viral RNAs, which are assorted randomly into pairs and released as new virions. We assume that the production of virions from a cell depends on the relative fitness of genomes infecting the cell. We consider a cell infected with *M* proviral DNA, numbered *j* = 1, 2,…, *M*, with fitness ζ_{j}. The cell then produces *P*ζ progeny virions, where $$mathtex$$\({\zeta}{=}(1/M){\sum}_{j\ {=}1}^{M}\ {\zeta}_{j}\)$$mathtex$$ is the average fitness of the proviral DNA. Viral RNAs for inclusion in the progeny virions are picked with a probability equal to the relative fitness of genomes within the cell. Thus, provirus *j* is chosen with a probability $$mathtex$$\({\zeta}\ _{j}/{\sum}_{j{=}1}^{M}\ {\zeta}_{j}.\)$$mathtex$$ This process of viral production is repeated for all cells. In effect, we ensure that within a cell, fitter genomes are chosen more frequently than less-fit genomes during viral production and that across cells, fitter virions are produced in larger numbers. The progeny virions form the viral pool for infection of the next generation of uninfected cells.

(v) Calculations of viral diversity and genome frequencies.In each generation, we compute the average diversity, $$mathtex$$\(d_{G}{=}2\ \left(\ {{\sum}_{i{=}1}^{Q{-}1}}\ {{\sum}_{j\ {=}i\ {+}1}^{Q}}d_{ij}\right)/Q(Q{-}1),\)$$mathtex$$ of the *Q* proviruses present in that generation and their average divergence from the founder sequence, $$mathtex$$\(d_{s}{=}{{\sum}_{i{=}1}^{Q}}d_{i0}/Q,\)$$mathtex$$ where *d _{ij}* is the Hamming distance per position between genomes

*i*and

*j*and 0 represents the founder sequence. The Hamming distance between two genomes is the number of positions at which the two genomes differ. We also compute the frequency of each of the

*S*mutant genomes among the proviral DNA, $$mathtex$$\({\phi}_{i}^{C}{=}Q_{i}/Q\)$$mathtex$$, where

*Q*is the number of proviral DNAs of type

_{i}*i*∈(0, 1, 2,…,

*S*), and among the viral RNAs, $$mathtex$$\({\phi}_{i}^{V}{=}G_{i}/2V\)$$mathtex$$, where

*G*is the number of viral RNA genomes of type

_{i}*i*and

*V*is the total viral population. We average the frequencies over 1,000 generations after viral diversity reaches equilibrium (see text). Furthermore, 600 realizations of the infection process are averaged to obtain the expected values of φ

_{i}

^{C}and φ

_{i}

^{V}.

The simulations are implemented using a computer program written in C++.

(vi) Simulation parameters.We employ the following parameter values. Each infected cell may produce ∼10^{2} to 10^{4} virions (5, 20, 21), only a few of which may be infectious (10, 11, 45). Recent studies estimate the median basic reproductive ratio of HIV-1 *in vivo* to be 6 to 8, indicating that each infected cell would produce at least 6 to 8 virions that successfully infect target cells (49, 59). We therefore let each cell produce a maximum of *P* = 10 infectious progeny virions and let the number decrease as the fitness of genomes infecting the cell decreases (see above). We fix the substitution rate at μ = 3 × 10^{−5} substitutions per site per replication, which is approximately the rate found by Mansky and Temin (29, 35), and we fix the recombination rate at ρ = 8.3 × 10^{−4} crossovers per site per replication (32, 61). We employ the relative fitness, ζ_{i}, of each of the *S* resistant genomes obtained from our analysis of growth competition assays (see Fig. 2).

Estimates of the within-host effective population size of HIV-1 vary over a wide range: 10^{3} to 10^{6} cells (1, 30, 40, 50, 52-54). Previous simulations similar to our present simulations capture observations of HIV-1 diversification *in vivo* with effective population sizes, *C*, that span this wide range. In one study, assuming that each cell is infected with *M* = 3 virions, following observations of infected splenocytes in two patients (28), a result in which *C* is ∼10^{3} cells captured observations of the time evolution of HIV-1 diversity in patients over extended durations (∼10 years) following seroconversion (63). In another study, assuming multiple infections of cells are rare, as observed in peripheral blood mononuclear cells in four chronically infected patients (26), a much larger value of *C* (>10^{5}) captured experimental observations of the frequency of the least prevalent haplotype in a two-locus/two-allele model (52). We therefore perform simulations with both combinations of values of *C* and *M*: low *C* = 10^{3} to 10^{4} cells and high *M* = 3 per cell, and high *C* = 10^{5} and low *M*, the latter *M* following a distribution in which 78% of the cells are singly infected, 18% doubly infected, 3% triply infected, and the rest quadruply infected. This latter distribution is consistent with the recently suggested lower limit of 8 to 10% for the frequency of coinfection (51) and is similar to the experimental distribution (26).

Mathematical model of HIV-1 dynamics and evolution.We constructed a deterministic model of HIV dynamics to predict the frequencies of NVP-resistant genomes when the effective population size is large and the frequency of multiple infections of cells is small. We recognize that genomes of a particular kind are produced by replication from cells they infect and by mutation and recombination of related genomes. Genomes are lost by clearance and by mutation to and recombination with other genomes. The equations below then describe the consequent time evolution of the populations of different genomes:
$$mathtex$$\[\frac{dT}{dt}{=}{\lambda}{-}d_{T}T{-}k_{0}T\ {{\sum}_{h{=}j}^{S}}{{\sum}_{j{=}0}^{S}}V_{jh}\]$$mathtex$$(2)$$mathtex$$\[\frac{dT_{i}}{dt}{=}k_{0}T\ {{\sum}_{h{=}j}^{S}}{{\sum}_{j{=}0}^{S}}Q_{i}(jh)V_{jh}{-}k_{1}T_{i}{{\sum}_{h{=}j}^{S}}{{\sum}_{j{=}0}^{S}}V_{jh}{-}{\delta}T_{i}\]$$mathtex$$(3)$$mathtex$$\[\frac{dT_{ii}}{dt}{=}k_{1}T_{i}\ {{\sum}_{m{=}k}^{S}}{{\sum}_{k{=}0}^{S}}Q_{i}(km)V_{km}{-}{\delta}T_{ii}\]$$mathtex$$(4a)$$mathtex$$\[\frac{dT_{ij}}{dt}{=}k_{1}T_{j}\ {{\sum}_{m{=}k}^{S}}{{\sum}_{k{=}0}^{S}}Q_{i}(km)V_{km}{+}k_{1}T_{i}{{\sum}_{m{=}k}^{S}}{{\sum}_{k{=}0}^{S}}Q_{j}(km)V_{km}{-}{\delta}T_{ij}\]$$mathtex$$(4b)$$mathtex$$\[\frac{dV_{ii}}{dt}{=}N{\zeta}_{i}{\delta}{\{}T_{i}{+}T_{ii}{\}}{+}\frac{N}{2}{\delta}\ \left\{\ {{\sum}_{h{=}0}^{i{-}1}}\frac{{\zeta}_{i}^{2}}{{\zeta}_{h}{+}{\zeta}_{i}}T_{hi}{+}{{\sum}_{h{=}i{+}1}^{S}}\frac{{\zeta}_{i}^{2}}{{\zeta}_{i}{+}{\zeta}_{h}}T_{ih}\right\}{-}cV_{ii}\]$$mathtex$$(5a)$$mathtex$$\[\frac{dV_{ij}}{dt}{=}N{\delta}\frac{{\zeta}_{i}{\zeta}_{j}}{{\zeta}_{i}{+}{\zeta}_{j}}T_{ij}{-}cV_{ij}\]$$mathtex$$(5b) Here, we number the different kinds of genomes from 0 to *S* = 31, with 0 representing the wild type, carrying no NVP resistance mutations, and *S* representing the genome carrying all five NVP resistance mutations. *V _{jh}* is the population of virions containing genomes

*j*and

*h*, where

*j*,

*h*∈{0, 1,…,

*S*} and

*j*≤

*h*. Uninfected CD4

^{+}T cells,

*T*, are produced at a constant rate, λ, die at rate

*d*, and are lost due to infection by free virions, which occur with the rate constant

_{T}*k*

_{0}(equation 2). Following infection by a virion

*V*, reverse transcription yields a proviral genome

_{jh}*i*∈{0, 1,…,

*S*} with a probability

*Q*(

_{i}*jh*).

*Q*(

_{i}*jh*) depends on the mutation rate, μ, the recombination rate, ρ, and the separations between loci, and has been described elsewhere (2). Cells

*T*, productively infected with a single provirus

_{i}*i*, are produced by infection of

*T*by free virions followed by mutation and recombination events that yield genome

*i*and are lost by further infections, which occur with the rate constant

*k*

_{1}, or by death at rate δ (equation 3).

*k*

_{1}<

*k*

_{0}because of virus-induced CD4 downmodulation, which lowers the susceptibility of infected cells to further infections (4, 46). Infection of

*T*yields doubly infected cells of two kinds:

_{i}*T*, carrying two copies of genome

_{ii}*i*; and

*T*, carrying a copy each of distinct genomes

_{ij}*i*and

*j*(equation 4).

*T*cells are produced additionally by infection of

_{ij}*T*. For simplicity, we ignore more than two infections of cells.

_{j}Cells *T _{i}* and

*T*produce homozygous virions

_{ii}*V*(equation 5). We assume that a cell

_{ii}*T*produces

_{i}*N*=

_{i}*N*ζ

_{i}progeny virions, where 0 ≤ ζ

_{i}≤ 1 is the relative fitness of genome

*i*. ζ

_{i}= 1 if

*i*= 0, the wild type. Furthermore, we assume that a cell

*T*produces

_{ij}*N*= (

_{ij}*N*+

_{i}*N*)/2 progeny virions, comprising homozygous virions

_{j}*V*and

_{ii}*V*and heterozygous virions

_{jj}*V*in fractions $$mathtex$$\({\zeta}_{i}^{2}/({\zeta}_{i}{+}{\zeta}_{j})^{2},\ {\zeta}_{j}^{2}/({\zeta}_{i}{+}{\zeta}_{j})^{2}\)$$mathtex$$, and $$mathtex$$\(2{\zeta}_{i}{\zeta}_{j}/({\zeta}_{i}{+}{\zeta}_{j})^{2}\)$$mathtex$$, respectively. The fractions follow from the assumption that in the cell

_{ij}*T*excess genomes of types

_{ij}*i*and

*j*are produced in proportion to

*N*and

_{i}*N*and are randomly assorted into progeny virions. Note that this description of viral production mimics our simulations above. Virions are cleared at the rate

_{j}*c*.

In due course, the different viral populations reach steady states. We obtain the steady-state populations by solving the above equations with the left-hand sides set to zero. The frequency of genome *i* is then the fraction of genomes of type *i* at steady state:
$$mathtex$$\[{\phi}_{i}^{V}{=}\frac{2V_{ii}{+}\ {{\sum}_{j{=}0}^{j{-}1}}V_{ji}{+}{{\sum}_{j{=}i{+}1}^{S}}V_{ij}}{2{{\sum}_{j{=}i}^{S}}{{\sum}_{i{=}0}^{S}}V_{ij}}\mathrm{and}{\phi}_{i}^{C}{=}\frac{T_{i}{+}2T_{ii}{+}{{\sum}_{j{=}0}^{i{-}1}}T_{ji}{+}{{\sum}_{j{=}i{+}1}^{S}}T_{ij}}{{{\sum}_{i{=}0}^{S}}T_{i}{+}2{{\sum}_{j{=}i}^{S}}{{\sum}_{i{=}0}^{S}}T_{ij}}\]$$mathtex$$(6) We employ parameter values representative of HIV-1 infection *in vivo* (2, 13, 35, 61): λ = 10^{5} cells ml^{−1} day^{−1}, *d _{T}* = 0.1 day

^{−1},

*k*

_{0}= 2.4 × 10

^{−8}ml day

^{−1},

*k*

_{1}= 0.7

*k*

_{0}, δ = 1 day

^{−1},

*c*= 23 day

^{−1}, and

*N*= 4,792, which yields a basic reproductive ratio of 5, μ = 3 × 10

^{−5}substitutions per site per replication, and ρ = 8.3 × 10

^{−4}crossovers per site per replication. For the NVP resistance mutations we consider, single-nucleotide substitutions yield amino acid changes. We assume that the nucleotide distance between loci is three times the corresponding amino acid distance. The values representing relative fitness of genomes, ζ

_{i}, are obtained from analysis of growth competition assays (Fig. 2). We solve model equations using a computer program written in C.

## RESULTS

In this study, we considered genomes carrying the common NVP resistance mutations K103N, V106A, Y181C, Y188C, and G190A. Depending on whether each of these loci has a mutation or not, a total of *S* = 2^{5} − 1 = 31 distinct resistant genomes exist, of which 5 are single mutants, $$mathtex$$\(\left(\ \begin{array}{l}5\\2\end{array}\ \right){=}10\)$$mathtex$$ are double mutants, and so on. Our aim was to predict the frequencies of each of these *S* genomes in a chronically infected individual naïve to antiretroviral therapy. The frequencies depend on the relative fitnesses of the genomes, which remain poorly estimated. Here, we estimated the relative fitness of each of the *S* genomes by analyzing previously published data from competitive growth assays.

Relative fitness of NVP-resistant genomes.We considered the experiments of Collins et al. (8), who constructed the 31 NVP-resistant mutants of interest here using site-directed mutagenesis and performed competitive growth assays in which cells were exposed at low MOI simultaneously to the wild type and a mutant and the time evolution of the ratio of the mutant and wild-type viruses was measured using a quantitative real-time RT-PCR-based assay. We employed a model of HIV dynamics *in vitro* to predict the time evolution of the mutant and wild-type viruses under the conditions of the experiments (see Materials and Methods). We fit model predictions to the experimental data for each of the viable NVP-resistant mutants. Model predictions provide good fits to the data (Fig. 1; see Fig. S1 in the supplemental material) and yield estimates of the relative fitness of the mutant genomes (Fig. 2).

We find that the relative fitness decreases with the number of mutations: the relative fitnesses (mean ± standard deviation [SD]) of the wild type and single, double, triple, quadruple, and quintuple mutants are 1, 0.74 ± 0.14, 0.56 ± 0.19, 0.32 ± 0.19, 0.27 ± 0.26, and 0, respectively. Within a mutant class, the relative fitness of different mutants varies substantially. The relative fitness of single mutants follows K103N > Y181C > G190A > Y188C > V106A, in agreement with Collins et al. (8). Furthermore, we find that some double mutants are fitter than the corresponding single mutants, that some triple mutants are fitter than the corresponding double mutants, and so on. For instance, the V106A/Y181C mutant is fitter than the V106A mutant. Similarly, the K103N/V106A/Y181C mutant is fitter than the K103N/V106A mutant. All single mutants, however, are less fit than the wild type. Thus, whereas individual mutations lower fitness, in combination some mutations increase fitness. The fitness of genomes is thus the result of complex interactions between mutations. We find that our estimates of the relative fitness correlate well with those of Collins et al. (8) (see the Discussion and see Fig. S2 in the supplemental material). We employ our estimates of the relative fitness in our simulations and model to predict the frequencies of NVP-resistant genomes.

Simulations of HIV-1 evolution.We have previously performed simulations of the genomic diversification of HIV-1 that incorporate the key forces underlying HIV-1 evolution *in vivo* and that quantitatively capture the time evolution of HIV-1 genomic diversity in patients over extended durations post-seroconversion (63). Here, we apply the simulations to predict the frequencies of NVP-resistant strains. Briefly, we considered synchronous infections of a fixed population of cells. Infection begins with a pool of identical homozygous virions that carry no NVP resistance mutations. Each cell is infected with virions randomly chosen from the viral pool. Following infection, viral RNAs are reverse transcribed into proviral DNA, during which process recombination and mutation introduce genomic diversity. The proviral DNAs produce viral RNAs in proportion to their relative fitness. Viral RNAs are randomly assorted into pairs and released as new virions. Each cell produces progeny virions in proportion to the fitness of the genomes infecting it. Virions are randomly chosen from the resulting viral pool for infection of the next generation of uninfected cells, and the cycle is repeated (see Materials and Methods).

In Fig. 3, we present the evolution of viral diversity and divergence at the five NVP resistance loci we consider. Diversity is a measure of the variation in the viral genomes within an individual at any given time, whereas divergence is a measure of the variation of the viral genomes at any time from the founder genome. Initially, because we consider an individual infected with strains not carrying any NVP resistance mutations, both the diversity and the divergence are zero. With time, mutation and recombination give rise to NVP resistance mutations. Accordingly, the diversity and divergence of genomes increase. NVP resistance mutations, however, carry fitness penalties (Fig. 2). Consequently, selection, along with random genetic drift, curtails viral diversification. Eventually, a balance between these competing forces is established, which here occurs rapidly, and diversity and divergence attain equilibrium.

In Fig. 4, we show the corresponding evolution of the frequencies of two representative NVP-resistant genomes in two realizations of our simulations. Evidently, the frequencies are subject to large stochastic variations both within and across realizations. We averaged the frequencies for each of the *S* NVP-resistant genomes over several generations after viral diversity reaches equilibrium and across several realizations of the simulations to obtain the expected frequencies both in the proviral DNA and in the cell-free viral genomic RNA (see Materials and Methods).

Frequencies of NVP-resistant genomes.In Fig. 5, we present the average frequencies of each of the *S* mutant genomes among the proviral DNA (φ_{i}^{C}) and cell-free viral RNA (φ_{i}^{V}) predicted by our simulations. We find that mutant frequencies decrease with the number of resistance mutations the genomes carry. Single mutants are present in mean frequencies of ∼10^{−4} to 10^{−5}, and double mutants are present in mean frequencies of ∼10^{−8} to 10^{−9}. Triple and higher mutants are absent (not shown). Different mutants within a mutant class occur at different frequencies because of different relative fitness values and frequencies of related mutants. Single mutants, which arise largely from the mutation of the wild type, occur at frequencies that increase with their relative fitness: K103N > Y181C > G190A > Y188C > V106A. Double-mutant frequencies depend additionally on the frequencies of the corresponding single mutants. Thus, for instance, the Y188C/G190A mutant has a higher fitness but lower frequency than the K103N/Y188C mutant because the K103N mutant is fitter and has a higher frequency than the G190A mutant, which results in the higher rate of formation of the K103N/Y188C mutant rather than the Y188C/G190A mutant.

The standard deviations of the frequencies are large (Fig. 5), indicative of the strong influence of random genetic drift and the resulting stochastic variations. These stochastic variations are observed both across generations within a particular realization as well as across different realizations of the simulations (Fig. 4). Standard deviations of single-mutant frequencies are ∼10^{−4}, often higher than the mean values, and those of double-mutant frequencies are ∼10^{−6}, several orders of magnitude larger than the corresponding mean values, suggesting that large temporal and interpatient variations in the frequencies of resistant mutants are expected due to stochastic effects underlying viral evolution.

Influence of simulation parameters.We examined here the sensitivity of the above predictions to variations in simulation parameters. Recombination can induce the accumulation (or dissociation) of mutations and alter the frequency of mutant genomes (12). A prerequisite for recombination to induce genomic variation is the infection of individual cells by multiple, distinct virions. We therefore examined the influence of recombination and multiple infections of cells on the frequencies of NVP-resistant genomes. We find that both recombination and multiple infections of cells have no significant effects on the frequencies (Fig. 6). The influence of recombination increases as the separation between resistance loci increases (47, 61). Here, the resistance loci are clustered together (maximum separation of ∼100 amino acids) so that the influence of recombination is expected to be small. An increase in the frequency of multiple infections while keeping all other parameters constant corresponds to an increase in the proviral populations, which do not alter the frequencies of NVP-resistant genomes substantially (see below).

The effective population size of HIV-1 may vary across individuals. With the above parameters, our simulations capture the evolution of viral diversity and divergence in patients with effective population sizes in the range ∼10^{3} to 10^{4} cells (63). We therefore examined whether increasing the population size to 10^{4} alters the frequencies of NVP-resistant mutants. We find that with the larger population size, the mean frequencies of both single and double mutants remain largely unchanged, whereas the standard deviations decrease because of the smaller influence of drift (Fig. 7). Triple and higher mutants still do not occur.

More generally, current estimates of the effective population size of HIV-1 *in vivo* display wide variations, ranging from 10^{3} to 10^{6} cells (1, 30, 40, 50, 52-54). In particular, simulations similar to ours but with the assumption that multiple infections of cells are rare, as observed recently in peripheral blood mononuclear cells in four chronically infected patients (26), obtained a much larger value of the effective population size (>10^{5}) by comparison with experimental observations of the frequency of the least-prevalent haplotype in a two-locus/two-allele model (52). Here, we therefore performed simulations with a population size of 10^{5} and a low frequency of multiple infections that mimics experimental observations (26) (see Materials and Methods). We find that single and double mutants exist at mean frequencies similar to those predicted above (Fig. 8). Triple and higher mutants are again absent. The larger population size, however, implies a smaller influence of drift. Accordingly, the deviations about the mean values of the mutant frequencies are now smaller than those predicted above. Thus, the standard deviations of the single mutant frequencies are ∼10^{−4} and those of the double mutants are ∼10^{−7} (in contrast to ∼10^{−6} in Fig. 5). Yet, the finding that standard deviations are much larger than the mean values implies that there are significant stochastic variations in the mutant frequencies.

Frequencies of NVP-resistant genomes from a deterministic model.Finally, we employed a deterministic model of HIV-1 dynamics and evolution (see Materials and Methods) to estimate frequencies of the resistant genomes in the limit of an infinitely large effective population size. (Simulations with population sizes larger than ∼10^{5} are computationally prohibitive.) We find that the deterministic model predicts that single- and double-mutant frequencies are similar to the mean values predicted by our simulations presented above (Fig. 9). Thus, the expected higher effective population size of ∼10^{6} cells (52) may also not lead to substantial differences in the frequencies of single and double mutants. The deterministic model predicts, in addition, that triple and higher mutants occur at low frequencies when the population size is sufficiently large. Thus, triple, quadruple, and quintuple mutants occur at frequencies of ∼10^{−12}, ∼10^{−16}, and ∼10^{−21}, respectively. In the cell-free viral population, four unviable mutants (relative fitness = 0) occur at vanishing frequencies (see Fig. S6 in the supplemental material).

## DISCUSSION

Minority NVP-resistant strains may exist in HIV-1-infected individuals naïve to NVP at frequencies below current assay detection limits and may compromise SDNVP-based efforts to prevent mother-to-child transmission, accelerate the failure of subsequent NNRTI-based antiretroviral therapy, and result in the transmission of drug-resistant genomes. Here, we employ bit-string simulations of viral genomic diversification and a mathematical model of HIV-1 dynamics and evolution to predict the frequencies of minority NVP-resistant strains in chronically infected NVP-naïve individuals. Our simulations incorporate the influence of mutation, multiple infections of cells, recombination, selection, epistatic interactions between multiple loci, and random genetic drift and predict the frequencies of genomes carrying all possible combinations of common NVP resistance mutations. Our model of HIV dynamics predicts frequencies in the limit of an infinitely large effective population size.

We find that the mean frequencies of genomes depend sensitively on the number of mutations the genomes carry. Single mutants occur at mean frequencies of ∼10^{−4} and double mutants at ∼10^{−8}. These mean frequencies are predicted both by our simulations and the deterministic model and are robust to changes in the recombination rate, the frequency of multiple infections of cells, and the effective population size. We note that the estimates of the mutant frequencies obtained from our simulations are not significantly different from those predicted by our deterministic model (*P* = 0.41 for double-mutant frequencies based on the one-tailed, paired *t* test) but are significantly smaller than estimates from earlier approaches (7, 48) (*P* = 0.04 for double-mutant frequencies based on the one-tailed, paired *t* test) (see Fig. S7 in the supplemental material). Furthermore, our simulations predict that when the effective population size is small, no triple or higher mutants exist. The deterministic model applies in the limit of an infinitely large effective population size and predicts then that triple, quadruple, and quintuple mutants also exist at frequencies of ∼10^{−12}, ∼10^{−16}, and ∼10^{−21}, respectively.

Chronically infected individuals typically contain ∼10^{7} to 10^{8} infected cells (20). Our predictions thus imply that all single and in some cases double mutants may exist in chronically infected individuals. Triple and higher mutants are unlikely to exist. Given the sensitivity (>0.01%) of current assays, however, several single and all double mutants are unlikely to be detected. The mutation K103N has the highest frequency, ∼0.018%, and is accordingly the most commonly detected (15). Y181C and G190A occur at lower frequencies (∼0.015% and ∼0.012%, respectively) and are thus less readily detected. The frequencies of K103N, Y181C, and G190A are near the detection limits of current assays. Consequently, subtle variations in viral fitness, arising, for instance, from different genetic backgrounds or subtypes (14), may result in frequencies above or below the detection limits in different patients. The present estimates of mutant frequencies thus emphasize the need for the development of more sensitive assays to detect minority resistance mutations *in vivo*. In addition, our simulations indicate that the standard deviations of the frequencies can be quite large, due to the influence of random genetic drift. Indeed, the standard deviations of the double-mutant frequencies are over 100-fold larger than the corresponding mean frequencies. Thus, large temporal and interpatient variations in the frequencies of NVP-resistant strains may arise due to stochastic effects.

Even when undetected, NVP resistance mutations may exist in patients in substantial numbers (∼10^{3} copies [frequency of ∼10^{−4}]) and may underlie therapy failure and transmission of drug resistance. For instance, although K103N is the predominant mutation in mothers, infants exposed to SDNVP showed predominantly Y181C mutations (37). One explanation is that NVP resistance mutations arise largely *de novo* in infants, and only a small proportion are transmitted by the mothers (38). Our estimates suggest, alternatively, that Y181C may exist in mothers in undetectable yet comparable frequencies to K103N and may be transmitted preferentially to infants because of its greater fitness in the presence of NVP (42) and/or greater transmissibility.

Long after the administration of SDNVP, NVP resistance mutations fade. The resulting frequencies are expected to be at least as large as those predicted by our simulations for patients not exposed to NVP. We conclude therefore that single-mutant frequencies may not necessarily vanish long after SDNVP. This, together with the relatively small fitness loss of some mutations (relative fitness of ≈0.9 for K103N) may explain the persistence of NVP resistance mutations several years after SDNVP (17, 44). Latently infected cells, which may archive NVP resistance (64), may contribute further to the persistence of NVP resistance mutations. The outcomes of NNRTI-based therapies are thus expected to be similar in NVP-naïve patients and patients in whom therapy is initiated long after SDNVP. Indeed, two studies found no substantial increase in the risk of mother-to-child transmission with repeated usage of SDNVP (18, 36), possibly because the median duration between pregnancies in the latter studies was >20 months, during which period NVP resistance mutations selected following the first SDNVP may have faded to the preexposure levels. Similarly, the risk of failure of NNRTI-based therapy was enhanced when therapy was initiated within 6 months of SDNVP, but no significant enhancement in the risk was observed when therapy was initiated more than 12 months after SDNVP (33, 60).

We recognize several limitations of our approach. First, we assume that the fitness of genomes is determined entirely by their replicative ability, whereas fitness is a combination of infectivity, replicative ability, and survivability. Available data do not allow estimation of differences between genomes in more than one of the latter characteristics with sufficient confidence. For instance, Collins et al. (8) fit their data, which are the same data that we employ—namely, of the ratio of the mutant and wild-type genomes measured at different times, *t*, in a competitive growth assay—to an exponential, *b*exp(*mt*), and assume that the best-fit value of *m* provides a measure of the relative fitness (fitness coefficient) of the mutant. (*b* is the initial ratio of the mutant and wild-type genomes.) *m* thus combines differences in fitness arising from different infectivity, replicative ability, and/or survivability of the virions. Our estimates of the relative fitness correlate well with those of Collins et al. (8) (see Fig. S2 in the supplemental material). An advantage of our analysis is that our estimates of the relative fitness, recognized also as the production rate ratio (65), enable facile determination of viral production as a function of viral fitness and hence are conveniently employed in our simulations and model.

Second, we note that our calculations estimate frequencies of NVP resistance mutations during the chronic infection steady state, where viral diversity at the resistance loci has reached a plateau. If infection is acquired through the transmission of susceptible genomes, then during acute infection, where viral diversity is on the rise, the frequencies of resistant mutants are expected to be lower than current estimates. In contrast, if drug-resistant genomes are acquired through transmission, then the frequencies of resistant mutants may be higher than current estimates. Third, we assume a constant mutation rate at all of the resistance loci. While this is a reasonable approximation given that at all five loci we consider single-nucleotide substitutions yield amino acid changes, the approximation may overestimate the frequencies of the K103N and G190A mutations, which are transversions and may occur at smaller rates compared to the other mutations. The large stochastic variations we predict, however, may subsume changes in the mean frequencies of the K103N and G190A mutations expected from the lower substitution rates. Nonetheless, our estimates of the frequencies of NVP resistance mutations facilitate more accurate assessment of the impact of minority NVP-resistant strains on the outcomes of NVP-based therapies and the transmission of NVP resistance.

## ACKNOWLEDGMENTS

This work was supported by National Institutes of Health grant AI065334. S.G. was supported partly by the Department of Biotechnology, Centre of Excellence for Research on Hepatitis C Virus, India.

## FOOTNOTES

- Received 10 May 2010.
- Accepted 21 July 2010.

- Copyright © 2010 American Society for Microbiology