Why Are CD8 T Cell Epitopes of Human Influenza A Virus Conserved?

Universal influenza vaccines against the conserved epitopes of influenza A virus have been proposed to minimize the burden of seasonal outbreaks and prepare for the pandemics. However, it is not clear how rapidly T cell-inducing vaccines will select for viruses that escape these T cell responses. Our mathematical models explore the factors that contribute to the conservation of CD8 T cell epitopes and how rapidly the virus will evolve in response to T cell-inducing vaccines. We identify the key biological parameters to be measured and questions that need to be addressed in future studies.

caused by antigenic drift, larger antigenic shifts may occur when new antigenic subtypes (e.g., H1, H3, H5, and H7) emerge from zoonotic reservoirs into the human population. Both antigenic drift and antigenic shift necessitate frequent updating of the influenza vaccine. New approaches that focus on antibodies against the conserved regions of HA or CD8 T cells specific to the conserved epitopes on interior proteins have been proposed for the development of vaccines that have broad efficacy and ideally confer universal protection against all IAV subtypes (3)(4)(5). An important question, relevant to the development of a universal T cell-inducing vaccine, is whether IAV is likely to evolve and escape from the vaccine-induced CD8 T cell immunity. In this paper, we address this question by identifying the factors that are responsible for the conservation of the CD8 T cell epitopes of IAV. This allows us to quantify how rapidly the virus might evolve to escape from CD8 T cell responses generated by a universal T cell-inducing influenza vaccine.
CD8 T cells detect and kill virus-infected cells by recognizing short viral proteinderived peptides (epitopes) bound to major histocompatibility complex class I proteins (MHC-I) on the cell surface. While CD8 T cells do not generate sterile immunity that prevents infection, they can reduce the severity of disease and potentially viral transmission (6,7). In addition, they may provide broad protection, because CD8 T cell epitopes of influenza A virus are largely conserved among drifted strains within one subtype and across different subtypes (7)(8)(9). The conservation of CD8 T cell epitopes is consistent with the observation that internal viral proteins (i.e., the nucleoprotein [NP] and matrix-1 protein [M1]), which harbor the majority of CD8 T cell epitopes, have a much lower substitution rate than HA and NA, which are the targets of antibody responses (10). Unlike the highly variable antibody epitopes, to date only 6 out of 64 CD8 T cell epitopes in human IAV have been found to have mutations that allow escape from CD8 T cell responses (9,11).
Why are CD8 T cell epitopes of human influenza A virus conserved? The functional constraint hypothesis proposes that CD8 T cell epitopes are on the protein regions that do not tolerate changes in the amino acid sequence (12,13). In this hypothesis, nonsynonymous mutations in the epitope region totally or partially diminish protein function and result in viruses that incur a substantial cost at the level of replication. In support of this, several studies have revealed that a single amino acid change at some residues within the NP or M1 can reduce the replication of the virus (12). A more recent study has shown that epistatic interactions may stabilize mutations and permit previously inaccessible destabilizing mutations (14). While the epistatic interactions greatly ameliorate the fitness cost, it would slow the rate at which a competitive mutant is generated. In addition, a number of other factors may contribute to the conservation of CD8 T cell epitopes. First, while CD8 T cells protect the hosts from severe pathology, they might not prevent infection and may only provide a modest reduction in virus transmission from an infected individual to new hosts. Second, there are multiple CD8 T cell epitopes, so an escape variant at one of these epitopes will still be targeted by T cells against the other epitopes. Third, a mutant that has a mutated CD8 T cell epitope will have a selective advantage only in the hosts whose MHC-I present the wild-type form of the CD8 T cell epitope; thus, polymorphism of MHC-I genes limits the selective advantage of a mutant to a fraction of the host population. In this paper, we use simple mathematical models to explore how the interplay between the above-mentioned factors affects the rate of invasion of a mutant, which has a mutated epitope that is not detected by CD8 T cells specific for the wild-type epitope. For brevity, the two virus populations are denoted wild-type, WT, and escape variant or mutant, MT.
(This article was submitted to an online preprint archive [15].)

RESULTS
As mentioned in the Introduction, there are (at least) three factors that can affect the fitness of an escape variant: (i) the fitness cost of the mutation, (ii) the extent of selection for the escape variant in hosts carrying the relevant MHC-I allele(s) (i.e., one that presents the wild-type form of focal epitope), and (iii) the frequency of hosts with the relevant MHC-I allele(s) in the population. We begin with a relatively simple population genetics model that allows us to examine how the interplay between these factors affects the rate of invasion of an escape variant. This model assumes the selective advantage of the escape variant in a host is fixed. We then consider why this assumption may not hold and examine the consequences of relaxing this assumption, using an epidemiological model for the spread of the wild type and escape variant.
Population genetics model. Let us consider the wild type (WT) and one escape variant (MT) that carries a mutated focal epitope (Table 1). Let h represent the set of MHC-I allele(s) that presents the focal epitope of WT but not MT, and let f equal the frequency of h. The MHC-I alleles that present epitopes other than the focal one, H, are at frequency (1 -f ). With the usual assumptions for Hardy-Weinberg equilibrium, we have the values shown. m, s, and r denote the fitness cost of MT in all hosts, the selective advantage of MT in hosts of hh genotype, and the dominance coefficient of the h allele, respectively. Here, we have assumed that the WT is equally fit in all genotypes, i.e., while h alleles present the focal epitope, the H alleles present other epitopes carried by the virus, so that all alleles confer about the same level of resistance. This is not true for the MT: while the H alleles still successfully present their other epitopes, h does not present its focal epitope. The frequency of the MT in generation t, q t , is given by is the mean fitness of MT in the host population. Equation 1 allows us to examine how the rate of invasion of the MT depends on m, the fitness cost; f, the frequency of h; s, the selective advantage MT accrued in the hosts of genotype hh; and r, the dominance coefficient as the ratio of selective advantage of MT in Hh hosts to that in hh hosts. The dominance coefficient depends on a number of factors, such as (i) the immunodominance of the focal epitope and (ii) the relationship between the magnitude of CD8 T cell responses and the rate of clearance of infected cells. In the absence of a quantitative understanding of these factors, we set r at an intermediate value (r ϭ 0.5), and in Appendix 1 we discuss the consequences of changing r to lower or higher values.
In Fig. 1, we plot how the rate of invasion depends on the selective advantage, s, of the MT (x axis) and the frequency, f, of the MHC-I alleles in which MT has a selective advantage (y axis). The rate of invasion is plotted on a log scale showing the log 10 number of generations required for the MT to go from a prevalence of 0.01% to 50% in the host population. Given that the serial interval for influenza is about 3 to 4 days (16), 100 generations corresponds to about 1 year. In Fig. 1A, we set the fitness cost to 1% (i.e., m ϭ 0.01). There is a parameter regime (white region with low s and f ) where the fitness cost is sufficient to prevent MT from invasion. When MT can invade, we see faster invasion when s or f increases.
Parameter m. We show the effect of changing the fitness cost of MT (m) from 0% to 10% in  as well as by reducing pathology (IE P ) and transmission (IE I ) in infected individuals. While the role of CD8 T cells in providing protection against influenza remains to be fully understood, a number of studies suggest that they play a significant role in reducing pathology (high IE P ). In humans, a greater number of IAV-specific CD8 T cells prior to heterosubtypic virus infection is associated with faster viral clearance (6) and fewer symptoms (7). In support of human studies, mouse experiments have shown the cellular immunity induced by H1N1 and/or H3N2 is able to protect the hosts from lethal infection with avian H5N1 or H7N9 (18)(19)(20). CD8 T cells are likely to be less effective in preventing infection (very low IE S ), although they may reduce the viral load during infection (modest IE I ). Consequently, the selection pressure on a virus imposed by all CD8 T cell responses, 1 Ϫ (1 Ϫ IE S ) (1 Ϫ IE I ), will be relatively low. Furthermore, since a virus has multiple CD8 T cell epitopes, the selective advantage of a variant that escapes CD8 T cell responses to a single epitope would be considerably smaller (see Appendix 2 for details).
In conclusion, although CD8 T cells may provide some protection against severe pathology, escape variants having a mutated CD8 T cell epitope are unlikely to have much selective advantage, even in the hosts with the relevant MHC-I that presents the wild-type epitope. In other words, we expect s to be small.
Parameter f. Escape variants will accrue advantages in hh hosts and, to a lesser extent (proportional to the dominance coefficient, r), in Hh hosts. For example, the R384G escape mutation on the NP 383-391 epitope is advantageous in the hosts who carry B*08:01 or B*27:05 but not in those who do not carry these alleles. In Fig. 2, we show the distribution of experimentally verified CD8 T cell epitopes derived from the nucleoprotein of human IAV retrieved from the Immune Epitope Database. It is clear that not a single epitope is presented by all human leukocyte antigen (HLA) alleles, and there is no single HLA allele presenting all epitopes. With this information and the frequencies of HLA alleles based on the National Marrow Donor Program (NMDP) data set, we estimated the fraction of host population where the mutation at each amino acid residue would confer a selective advantage (Fig. 2, top). We see that mutations typically confer selective advantage to the virus in only a small fraction of the human population.
Integrating parameters s, f, and m. As mentioned in the Introduction, epistatic interactions allow the virus to potentially generate escape variants to a given CD8 T cell epitope without engendering a substantial fitness cost. Our results show that since s and f are small, these escape variants will spread very slowly in the host population. For example, when f ϭ 0.1 (i.e., the escape variant has advantages in about 20% of the infections, who are of hh genotype [f 2 ϭ 0.01] and of Hh genotype [2f(1 -f ) ϭ 0.18]), an escape variant would spend 10 years reaching 50% prevalence, even if its fitness is

Conservation of Influenza CD8 T Cell Epitopes
Journal of Virology 10% more than that of the wild type (i.e., s ϭ 0.1) in the hosts with relevant HLA, and no fitness cost is included (i.e., m ϭ 0).
In conclusion, even if mutations that allow the virus to escape CD8 T cells specific for a given epitope have little or no fitness cost, escape variants will increase in frequency very slowly.
Epidemiological models. (i) Compensatory immunity reduces the selective advantage of mutants over time. The population genetics framework described above assumes that the fitness of an escape variant virus depends on host genotype and does not change over time. In particular, we assume that the fitness of escape variant (MT) relative to the WT equals (1 Ϫ m ϩ s), (1 -m ϩ rs), and (1 -m) in the hosts of hh, Hh, and HH genotypes, respectively. However, in a host of hh or Hh genotype who has been infected by MT, recovered, and moved to the susceptible category (due to antigenic drift), we might expect the selective advantage (s) of MT to decrease. This decline in s could arise for at least two biological reasons. First, the mutated epitope is still presented by the MHC-I; the mutation simply changes the configuration of the epitope recognized by the CD8 T cell receptor (12). In this case, the mutant epitope may induce a new set of CD8 T cells. Second, the absence of CD8 T cell response to one epitope could result in compensatory increases in responses to other epitopes.
(ii) Epidemiology of infections with wild-type and escape variant viruses. We used a simple epidemiological model to describe the changes in frequencies of susceptible (S), infected (W and M for WT and MT infections), and immune (R) hosts. The subscript to S, W, M, and R populations indicates the viruses these hosts have been exposed to in the past. Individuals can be infected multiple times during their lifetime due to antigenic drift at antibody epitopes (21). We incorporate this by letting individuals move from the immune (R) to susceptible (S) compartments at rate (22).
We considered the epidemiology of WT infections in individuals of different genotypes, all of which have the same structure as that shown in Fig. 3B. Prior to the introduction of MT, we assumed that the WT is circulating and hosts have CD8 T cell immunity to the wild-type epitope. Due to antigenic drift, individuals typically get reinfected with a drifted strain every 5 to 10 years (21), and we chose the rate of loss of immunity corresponding to this duration ( ϭ 5 ϫ 10 Ϫ4 /day, approximately once per 5.5 years). We began the simulations with WT infections at equilibrium.
On the introduction of MT, the MT has fitness (1 -m), (1 -m ϩ rs), or (1 Ϫ m ϩ s) in the MT-infected hosts of the HH, Hh, or hh genotype. MT-infected individuals move to the immune category with a subscript of WM (i.e., R WM ). Individuals in R WM become susceptible due to antigenic drift in the virus and move to S WM . When individuals in S WM are infected with the WT, they move to W WM and the WT has fitness of 1; however, when individuals in S WM are infected with the MT, they move to M WM and the MT has fitness (1 -m), [1 -m ϩ rs(1 -c)], and [1 -m ϩ rs(1 -c)] according to the host's genotype. For simplicity, we incorporate the fitness in the transmissibility parameter, ␤.
In Fig. 4, we explore how the escape variant (MT) spreads through the host population following its introduction. In particular, we focus on how compensatory immunity changes the outcomes predicted by the population genetics models. In Fig. 4A, we chose a simple scenario where the MT has a very small fitness cost (m ϭ 0.001), 5% and 2.5% selective advantage (s ϭ 0.05, rs ϭ 0.025) in the hosts of hh and Hh genotypes, which accounts for 1% and 18% of the population (f ϭ 0.18), respectively, and compensatory immunity reduces the selective advantage by 90% (c ϭ 0.9). In this scenario, we see that the MT now only transiently invades, and compensation in host immunity causes the frequency of MT to decline as the population-level immunity against the MT increases. In Fig. 4B, we explore the consequences of changing the extent of compensation (c). We see that the initial rate of invasion is very similar to what is predicted by the population genetics model. However, once the MT has spread through the population, the outcome depends strongly on the extent of compensatory immunity described by the parameter c. If c is small, then the MT goes to fixation in a manner similar to that of the population genetics model. If c exceeds a threshold value, c*, given by then the MT only transiently invades but declines to extinction in the long run. The competitive exclusion between WT and MT infections is shown in Fig. 4C, where we plot how the outcome depends on s, f, and c, given r ϭ 0.5. Incorporating fitness parameters into the duration of infection gives similar results (results not shown). Furthermore, we show these results are robust to the addition of seasonality (Appendix 3). In summary, the results of the epidemiological models show that the initial rate of invasion of the escape variant is similar to that in the population genetics model described by equation 2. However, at later time points, compensatory immunity reduces the rate of invasion. If the extent of CD8 T cell immunity to the escape variant is sufficiently high, the outcome may even be reversed if the overall selective advantage does not surmount the fitness cost.

DISCUSSION
We considered the question of why CD8 T cell epitopes of human IAV are conserved. The answer to this question is relevant to the development of universal T cell-inducing vaccines against influenza and whether the virus is likely to evolve to escape the vaccine-induced immunity. Despite the wealth of empirical data showing conservation of CD8 T cell epitopes, the evolutionary mechanisms responsible for this conservation are not well understood. One possibility that has been widely considered is that the nucleoprotein and matrix-1 protein, which harbor the bulk of CD8 T cell epitopes, are under strong constraints (12,13). In this view, mutations in CD8 T cell epitopes would have a high fitness cost or be relatively inaccessible due to epistatic changes needed for escape variants to have high fitness. In this study, we proposed two other mechanisms that could impact the fitness of viruses that escape CD8 T cell recognition. The first one is that escape from CD8 T cell responses against a single epitope provides only a relatively small selective advantage to the escape variant. The second one is that polymorphism in the MHC-I genes restricts this small advantage to only a small fraction of individuals in the population; individuals with other alleles would not present this epitope but present other epitopes. We show that even if there is a minimal fitness cost to having a mutation in a CD8 T cell epitope, the latter two factors will result in a very low rate of invasion of the CD8 T cell escape variant. We note that in the preceding models and discussion, we consider a scenario where invasion of a CD8 T cell escape variant is entirely driven by CD8 T cell selection. In reality, selection due to CD8 T cell immunity occurs in the context of a much stronger selection imposed by antibodies. The latter results in periodic replacement of virus strains over the course of a few seasons with drifted virus strains with novel antigenicity. We consider the consequences of this in more detail below.
The conclusion of this study seems to contradict the rapid invasion of the mutation at amino acid residue 384 of the nucleoprotein. This mutation alters the NP 383-391 epitope presented by HLA-B*27:05 and the NP 380 -388 epitope presented by HLA-B*08: 01. The wild-type sequence has an arginine (R), which forms an anchor residue at this site, and all 16 viruses isolated and sequenced during the 1992-1993 epidemic season had the wild-type sequence (23). An arginine-to-glycine mutation at this site (R384G) abrogates the MHC-I binding and prevents antigen presentation, and this mutation rapidly swept through the population in the 1993-1994 epidemic season, where all 56 virus isolates had G at this residue (23). Gog et al. (24) suggested that the rapid fixation of the R384G mutation was due to a combination of a longer duration of infection that slows its decline compared to that of the wild type over the summer and stochastic events. We propose an alternative hypothesis: the selective advantage of the R384G mutation is not required, and hitchhiking of a randomly generated mutant would be sufficient to explain the data. The rapid invasion of R384G temporally matches the transition from BE92 to WU95 antigenic clusters (25), suggesting that it hitchhiked with the antigenically drifted WU95 strain. Additionally, during 2002 to 2005, the invasion of the valine-to-isoleucine (V425I) mutant of the NP 418 -426 epitope was temporally associated with the transition from FU02 to CA04 (26). The rapid invasion pattern has also been observed on NP 251-259 (invasion of S259L) and NP 103-111 (alternating invasions of 103 K and 103R), all being consistent with the hitchhiking hypothesis.
Two recent analyses suggested that CD8 T cell epitopes of IAV circulating in the human population are under selection (27,28). In particular, the number of CD8 T cell epitopes in circulating strains of influenza has gradually declined over a timescale of decades (28). For example, the H3N2 subtype had 84 experimentally confirmed epitopes per virus in 1968, and this number declined to 64 in 2015. A decline in the number of epitopes could be partially due to the bias in identification of epitopes, as new epitopes may not be identified and included. The observed decline rather than drift in the number of confirmed epitopes may arise because of a slight selective advantage of the escape variants over the wild type. This study suggests that the gradual escape of the virus from a CD8 T cell vaccine is possible.
We have intentionally used simple models, because the empirical data do not include accurate measurements of many of the key parameters that govern the generation and spread of virus escape variants. Under these circumstances, the results of simpler models are typically more robust than those of complex models (29). Our models can be thought of as simplified representations of the dynamics in the tropical regions, where the virus pool is maintained by continuous transmission. Antigenic changes arising in the tropical regions have been suggested to drive the seasonal epidemics in the Northern and Southern Hemispheres (30).
This study identifies the importance of measuring parameters such as the fitness of the wild-type and escape variants in hosts who have been previously infected with the wild type and both wild-type and escape variants. Aspects that might be included in more refined models are the waning of CD8 T cell immunity over time, particularly due to the loss of resident memory cells from the respiratory tract (31,32), and the effect of stochasticity.
There are several differences in the ability of the influenza virus to evolve in response to antibody and CD8 T cell immunity. First, antibody immunity can generate sterilizing immunity (prevent infection with a matched virus strain) and, thus, generates substantial selection for antibody escape variants. In contrast, CD8 T cell immunity to influenza does not prevent infection and, thus, generates less selective advantage to a CD8 T cell escape variant. Second, an antibody escape variant will gain a selective advantage in the majority of individuals with antibodies to the wild-type strain, while a CD8 T cell escape variant will have a selective advantage only in individuals of a particular MHC-I genotype. Both of these factors contribute to antibody rather than CD8 T cell immunity driving antigenic drift in influenza.
Vaccination strategies that boost the CD8 T cell response may contribute to the development of broadly protective influenza vaccines. In this paper, we focused on whether these vaccine strategies will rapidly select for virus escape variants at CD8 T cell epitopes, compromising the effectiveness of the vaccine. We show that this is unlikely to be the case. Although it is generally viewed as a potential limitation that these vaccines may not completely prevent infection, this fact, together with MHC polymorphism, greatly reduces the selection pressure on the virus. Consequently, it may take a much longer duration for the virus to evolve and escape vaccine-induced CD8 T cell immunity.

MATERIALS AND METHODS
Population genetics model. The analytical solution is found by rearranging equation 1 into We then express t, the number of generations required for the MT to reach q t , as with the same amino acid sequence, records with different sequences but at the same location of NP and presented by the same HLA allele, or records nested under a longer record were combined into one unique epitope. In total, 64 unique epitopes were identified. Escape mutations were identified from the literature (9,11). An HLA allele data set reported by the NMDP was retrieved from The Allele Frequency Net Database (www.allelefrequencies.net). We included all alleles that have been reported to present at least one epitope in the epitope data set and calculated the average frequency weighted by sample sizes. In addition, since the alleles in one HLA supertype prefer amino acids with similar chemical properties at certain residues of the epitopes, we grouped the HLA alleles based on the classification proposed by Sidney et al. (33).
Epidemiological model. The ordinary differential equations corresponding to Fig. 3B are presented below.
where j of 1, 2, and 3 denotes the genotype of HH, Hh, and hh, respectively. We started simulations from the equilibrium of WT infection, i.e.,  Table 2; transmission parameters are in Table 3. ; j ϭ 1, 2, and 3. See Table 3 for the setup of transmission parameters in MT-infected subjects.  Figure A1 shows the number of generations for a CD8 T cell escape variant (MT) to reach 50% from 0.01% prevalence with different fitness costs (m ϭ 0, 0.001, 0.01, and 0.1) and dominance coefficients (r ϭ 0, 0.25, 0.75, and 1). As m increases, the MT requires higher s and/or f to overcome the deleterious effect. The increase in r changes the shape of the contours, since it raises the selective advantage of MT in the heterozygotes. As a result, the threshold for MT invasion when s is large is lower than that when r is small.
Quantifying selection pressure on virus. We define three types of immunity effectiveness (IE): IE S is the probability that an individual who has influenza-specific CD8 T cells does not get infected upon contact with an infectious individual, IE P is the probability that an infected individual who has influenza-specific CD8 T cells does not develop symptoms, and IE I is the probability that an infected individual who has influenza-specific CD8 T cells does not spread the virus.
The selection pressure on virus S is formulated by 1 Ϫ S ϭ (1 Ϫ IE S ) (1 Ϫ IE I ). The viral fitness after selection, 1 Ϫ S, is the probability of transmission within the host population who have influenza-specific CD8 T cells, and it can be expressed as the probability that an immune individual gets infected (1 Ϫ IE S ) and spreads the virus (1 Ϫ IE I ).
Effect of seasonality. Seasonality is incorporated into transmissibility by where i and j indicate the host's immune status and genotype, respectively, 0 Յ A Յ 1 is the amplitude of oscillation, and T is the period (34). Simulation of model with A ϭ 0.02 and T ϭ 365 (annual outbreaks with ␤ ranging from 0.392 to 0.408) was compared to the one with A ϭ 0 (no seasonality). Our results show that the mean oscillation is about the same as that seen in the model with no seasonality (Fig. A2).