Single-cell virus sequencing of influenza infections that trigger innate immunity

The outcome of viral infection is extremely heterogeneous, with infected cells only sometimes activating innate immunity. Here we develop a new approach to assess how the genetic variation inherent in viral populations contributes to this heterogeneity. We do this by determining both the transcriptome and full-length sequences of all viral genes in single influenza-infected cells. Most cells are infected by virions with defects such as amino-acid mutations, internal deletions, or failure to express a gene. We identify instances of each type of defect that increase the likelihood that a cell activates an innate-immune response. However, immune activation remains stochastic in cells infected by virions with these defects, and sometimes occurs even when a cell is infected by a virion that expresses unmutated copies of all genes. Our work shows that viral genetic variation substantially contributes to but does not fully explain the heterogeneity in single influenza-infected cells.


22
Infection with an acute virus such as influenza initiates a race between the virus and immune 23 system. As the virus spreads, some infected cells trigger innate immunity and begin producing 24 interferon (IFN). This IFN directs expression of anti-viral interferon-stimulated genes (ISGs) in the 25 infected cell and its neighbors via autocrine and paracrine signaling, as well as helping launch a 26 broader immune response (Stetson and Medzhitov, 2006;Honda et al., 2006). If innate immunity 27 is activated sufficiently rapidly, it can reduce viral replication and disease severity (Solov'ev, 1969; 40 It is unclear why only some infected cells trigger innate-immune responses. Two possible 41 contributors are pure stochasticity and pre-existing variation in cellular state. For instance, only 42 some cells induce IFN even upon treatment with synthetic innate-immune ligands (Shalek et al., 43 2013(Shalek et al., 43 , 2014Wimmers et al., 2018), and the frequency of IFN induction may depend on a cell's pre- 44 existing chromatin state (Bhushal et al., 2017). But for influenza, a third possible contributor also 45 looms large: viral genetic diversity. Because influenza has a high mutation rate, even if the consensus 46 sequence of a viral population is "wild-type," individual virions will often have defects (Parvin et al., 47 1986;Suárez et al., 1992;Suárez-López and Ortín, 1994;Bloom, 2014;Pauly et al., 2017). Indeed,48 many studies have identified mutations that increase IFN induction when engineered into a viral 49 population (te Velthuis et al., 2018;Du et al., 2018;Killip et al., 2017;Pérez-Cidoncha et al., 2014), 50 and viral stocks that are rich in internal deletions in the polymerase genes induce more IFN (Baum 51 et al., 2010;Tapia et al., 2013;Boergeling et al., 2015;Dimmock and Easton, 2015). 52 However, existing techniques are inadequate to determine how viral genetic diversity con-  82 To efficiently identify and enrich rare IFN+ cells, we created A549 cells that carried IFN re-       Figure 1B are at https://github.com/jbloomlab/ IFNsorted_flu_single_cell/tree/master/paper/figures/IFN_stochastic/IFN_reporter/plasmids. Supplement 3; further validated by the single-cell transcriptomics below). Therefore, for the rest of 90 this paper, we use "IFN expression" to refer to combined expression of type I and III IFNs. 91 We generated a stock of "wild-type" A/WSN/1933 (H1N1) influenza (hereafter referred to as 92 "WSN"), and found that it activated the IFN reporter in ∼0.5% of infected cells-a frequency roughly 93 comparable to that in our prior single-cell transcriptomics of influenza-infected cells ( Figure 1A). 94 95 To determine if mutations or other features of the infecting virions contribute to the heterogeneous 96 outcome of infection, we developed the approach in Figure 2 to determine the entire transcriptome 97 and the full sequences of all viral genes in single cells. First, we generated a stock of virus that 98 consisted of a mix of wild-type WSN and a "synonymously barcoded" variant that contained two 99 engineered synonymous mutations near each termini of each gene (Figure 2-source data 1). These    118 We used a portion of the cell-barcoded cDNA for standard single-cell transcriptomics by Illumina 119 3'-end sequencing ( Figure 2C). But we also took a portion and enriched for full-length viral molecules 120 by PCR ( Figure 2D). We performed PacBio sequencing on these full-length viral cDNAs to generate  128 We obtained transcriptomes for 1,614 human (A549) cells, and 50 of the uninfected canine cells that 129 were spiked into the experiment as a control ( Figure 3A). We also obtained 12 transcriptomes with a 130 mix of human and canine transcripts; from the number of such mixed cell-type transcriptomes, we 131 estimate (Bloom, 2018) that ∼11% of the transcriptomes are derived from multiple cells. To remove 132 some of these multiplets, we filtered transcriptomes with unusually high or low numbers of cellular 133 transcripts ( Figure 3B), retaining the 1,490 human cells that passed this filtering. 134 To identify infected cells, we examined the fraction of each transcriptome derived from virus ( Fig-135 ure 3B). As expected, only a small fraction (∼0.7%) of transcripts in the uninfected canine cells were 136 from influenza; this low-level background is likely from lysed cells that release ambient viral mRNA 137 that is captured in droplets during reverse-transcription. We tested whether each cell contained

2018)
, the distribution is extremely heterogeneous: many infected cells have only a few percent of 144 mRNA derived from virus, but viral mRNA comprises over half the transcriptome of a few cells. 145 We then called the presence or absence of each viral gene in each infected cell, again using 146 a Poisson model parameterized by background fractions estimated from uninfected canine cells. 147 We called presence / absence of genes rather than transcripts, since the two influenza genes that 148 encode multiple transcripts (M1 / M2 from the M gene, and NS1 / NS2 from the NS gene) do so via 149 alternative splicing that leaves both isoforms with the same termini, making them indistinguishable 150 by 3'-end sequencing. Figure 3D (top panel) shows that most (162 of 290) infected cells express all 151 eight genes, although a substantial minority fail to express one or more genes (see The amount of viral mRNA tended to be lower in cells that failed to express viral genes ( Figure 3D, By examining the synonymous viral barcodes near the 3' termini of transcripts, we determined 164 that 38% of cells were co-infected with wild-type and synonymously barcoded virions ( Figure 3F; 165 cells called as co-infected if a binomial test rejected null hypothesis that ≥95% of viral mRNA is from 166 one viral barcode variant). From Figure 3F, we estimate (Bloom, 2018) that 63% of infected cells are 167 co-infected, implying that 25% are co-infected with two virions with the same viral barcode (such 168 co-infections cannot be identified from transcriptomic data). Interestingly, this co-infection rate 169 is higher than expected from the relative numbers of infected and uninfected cells ( Figure 3C)  cell as infected given the thresholds in Figure 3C. This moderately high rate of co-infection may 173 also explain why more cells in our experiment express all eight viral genes compared to some prior 174 studies, as a co-infecting virion can complement a missing viral gene (Russell et al., 2018). 175 We next examined expression of IFN and ISGs ( Figure 3G and Figure 3-Figure Supplement 3). 176 Over 20% of infected cells were IFN+ given the heuristic thresholds in Figure 3C, indicating that 177 the MACS successfully enriched IFN+ cells far beyond their initial frequency of ∼0.5% ( Figure 1C            There is no association between IFN induction and the amount of viral mRNA in cells that express NS, but viral burden is associated with IFN induction among cells that lack NS. Throughout this figure, we only consider substitutions that are non-synonymous. Insertions are not shown as a mutation type as they are extremely rare (Figure 4).        Figure 5C, the only trend that remained 240 significant at a false discovery rate of 10% was absence of NS. 241 One other interesting trend emerges from the single-cell data. There is no difference in the 242 amount of viral mRNA between IFN+ and IFN-cells that express NS-but among cells that lack NS, 243 the IFN+ ones have significantly more viral mRNA ( Figure 5D).

245
To test if the viral defects identified in single IFN+ cells play a causal role in triggering IFN expression, 246 we used reverse genetics to generate bulk stocks of viruses with some of these defects. 247 The viral defect most strongly associated with IFN induction was failure to express the NS gene 248 (Figure 4, Figure 5C). The single-cell data also showed that amino-acid substitutions in the PB1 and NS genes were 255 enriched in IFN+ cells (Figure 4, Figure 5C), so we created mutant viruses with many of the amino-256 acid substitutions found in these genes among IFN+ cells. Overall, we created six such point-mutant  Figure 4). We therefore created a virus carrying this deletion, and 263 propagated it in cells constitutively expressing PB1 protein. 264 We tested the rate of IFN induction by each viral stock using the reporter cells.  interacts with cell-to-cell heterogeneity to shape the race between virus and immune system. 332

333
IFN reporter cell lines 334 We created IFN reporter variants of the A549 human lung epithelial cell line ( Figure 1B) (Kim et al., 2011). The sequence 346 of the last of these constructs is provided in Figure 1-source data 1. 347 To create the type III interferon reporters, a 1.2kb region upstream of the human IL29 (IFNL1) 348 gene was cloned into the pHAGE2 vector, with the native Kozak sequence retained at the 3' end.

349
Downstream of this promoter we cloned LNGFRΔC linked to ZsGreen via a P2A linker. The sequence 350 of this construct is provided in Figure 1-source data 1. 351 We used these constructs to generate lentiviral vectors and transduce of A549 cells in the Both viral strains were generated by reverse genetics using the pHW18* series of bi-directional 378 plasmids (Hoffmann et al., 2000). We controlled the durations and MOI during viral passaging 379 since these factors can greatly affect the accumulation of defective viral particles (Xue et al., 2016). 380 The viruses were generated by reverse genetics in co-cultures of 293T and MDCK-SIAT1 cells in The single-cell transcriptomics library was sequenced on an Illumina HiSeq 2500, and the data 429 analyzed as described below.
430 Enrichment and preparation of viral cDNA for PacBio sequencing 431 We amplified virus-derived molecules from the small amount of cDNA retained from the 10X 432 Genomics protocol for PacBio sequencing of the full-length cDNA. All molecules in this cDNA 433 have at their 3' end the cell barcode and UMI plus the constant adaptor sequence that is added 434 during the 10X protocol (see Figure 2 for simple schematic, or the detailed analysis notebook 435 at https://github.com/jbloomlab/IFNsorted_flu_single_cell/blob/master/pacbio_analysis.ipynb or 436 Supplementary file 1 for at detailed schematic). However, we only wanted to PacBio sequence cDNA 437 molecules derived from virus, since it would be prohibitively expensive to sequence all molecules. 438 We therefore needed to enrich for the viral molecules, a process made challenging by the need 439 to also retain the 10X adaptor / UMI / cell barcode at the 3' end (the primer at the 5' end can be 440 specific, but the one at the 3' end must bind the common 10X adaptor shared among all molecules). 441 We first performed a multiplex PCR reaction on 1 ng of the full-length 10X cDNA using a 3' 442 primer complementary to the common 10X adaptor, and a multiplex mix of eight 5' primers, one 443 specific for the mRNAs from each of the eight viral gene segments (although some viral segments 444 produce multiple splice forms, all mRNAs from a given segment share the same 5' end). The 445 sequences of these primers are in Figure 4-source data 1. A major concern during these PCRs RT-PCR, reverse-transcribed using an oligo-dT primer, and qPCR was performed as described above.

553
In Figure 6, we tested the IFN inducing capacity of a variety of viral mutants identified in the single-554 cell experiments. For point-mutant viruses, we created variants for all amino-acid substitutions 555 found in PB1 and NS among IFN+ cells that did not also lack NS. One of these mutants (amino-acid 556 substitution S704P in PB1) did not reach sufficient titers in a single attempt to generate it by reverse 557 genetics, and so was dropped from the experiment (note that we did not attempt replicates of the were cloned into the pHW2000 bi-directional reverse-genetics plasmid (Hoffmann et al., 2000) in 567 order to enable generation of viruses encoding the mutant genes. Figure 6-source data 1 provides 568 the full sequences for all of these plasmids. 569 We generated the wild-type and point-mutant viruses for the validation experiments in Figure 6A    , and reassuringly found that strand exchange was rare (Figure 4-Figure Supplement 1). 645 The small number of CCSs with identifiable strand exchange were filtered from further analysis. 646 We then further filtered for CCSs that contained valid cell barcodes as identified by the cellranger 647 pipeline, and kept just one CCS per UMI (preferentially retaining high-quality CCSs that aligned to 648 full-length cDNAs). We then removed from the CCSs the barcoding synonymous mutations that we in Figure 3B. We used the uninfected canine cells to estimate the percentage of total mRNA in a 676 cell that would come from influenza purely due to background (e.g., from cell lysis) in the absence 677 of infection, and called as infected the human cells for which significantly more than this amount 678 of mRNA was derived from influenza under a Poisson model (Figure 3C). We next used a Poisson 679 model parameterized by the amount of expected background mRNA for each influenza gene to call 680 the presence or absence of each influenza gene in each infected cell (Figure 3D and Figure 3- Figure   681 Supplement 1). To identify cells that were co-infected with both viral barcodes (Figure 3F), we used 682 a binomial test to identify cells for which we could reject the null hypothesis that at least 95% of 683 viral mRNA was derived from the more common viral barcode. We called IFN+ and ISG+ cells using 684 the heuristic thresholds shown in Figure 3G and Figure 3-Figure Supplement 3, respectively. We 685 counted IFN mRNAs as any IFN-, IFN-, or IFN-transcripts. We counted ISG mRNAs as any of 686 CCL5, IFIT1, ISG15, or Mx1. The plot in Figure 4 summarizes all of the genotypic information, and 687 was created in substantial part using gggenes (https://github.com/wilkox/gggenes). The authors declare that no competing interests exist.

701
Boni MF, Zhou Y, Taubenberger JK, Holmes EC. Homologous recombination is very rare or absent in human Russell AB, Bloom JD. Computer code for "Single-cell virus sequencing of influenza infections that trigger innate immunity". GitHub. 2018; Commit hash will be added when paper is final. https://github.com/jbloomlab/ IFNsorted_flu_single_cell. Supplementary file 1. An HTML rendering of the Jupyter notebook that analyzes the PacBio data to call the viral sequences in infected cells is available at https://github.com/jbloomlab/IFNsorted_flu_single_cell/raw/ master/paper/figures/pacbio_single_cell_figures/pacbio_analysis.html. This notebook contains detailed descriptions of each step and plots illustrating the results, and is the best way to understand this part of the analysis in detail. The actual Jupyter notebook rendered here is available at https://github.com/jbloomlab/IFNsorted_flu_single_cell/blob/master/pacbio_analysis.ipynb.

Supplementary file 2.
An HTML rendering of the Jupyter notebook that analyzes the annotated cell-gene matrix to generate the figures in this paper is available at https://github.com/jbloomlab/IFNsorted_flu_single_ cell/raw/master/paper/figures/single_cell_figures/monocle_analysis.html. This notebook contains detailed descriptions of each step and plots illustrating the results, and is the best way to understand this part of the analysis in detail. The actual Jupyter notebook rendered here is available at https://github.com/jbloomlab/IFNsorted_flu_single_cell/blob/master/monocle_analysis.ipynb.  , 2006; Honda et al., 2006). The full code that performs the re-analysis shown in this figure is at https://github.com/jbloomlab/IFNsorted_flu_single_cell/tree/ master/paper/figures/IFN_stochastic/SteuermanReanalysis/. 3. An A549 cell line was generated by transduction with both the IFN-and IFN-reporters driving expression of mCherry and ZsGreen, respectively. The cells were then infected with two different stocks of "wild-type" WSN influenza, or stocks with a deletion in PB1 or stop codons in NS1 (described later in the paper). After 13 hours, cells were analyzed by flow cytometry. Cells positive for either fluorescent reporter were further analyzed. As shown in the FACS plots, expression of the IFNB1 and IFNL1 reporters is highly correlated in all cases.    , 2018). The exception is NP, which is detected in virtually all infected cells. The much higher frequency of detecting NP could reflect a biological phenomenon, but we suspect it is more likely that cells lacking NP tend to have much lower viral gene expression overall and so are not reliably called as being infected in our experiments because the number of viral mRNAs is below the detection limit.  Figure 3G shows that substantially more cells are ISG+ than IFN+, both among infected and uninfected cells. This is probably because paracrine signaling can induce ISG expression in cells that are not themselves expressing IFN (Stetson and Medzhitov, 2006;Honda et al., 2006). (B) Correlation between the fraction of cellular mRNA derived from IFN and ISGs. Each point represents one cell, and the Pearson correlation coefficient is shown. IFN and ISG expression are more correlated for infected than uninfected cells, probably because in the latter the ISG expression is more often due to paracrine signaling that does not induce expression of IFN itself. Among both the infected and uninfected populations, there are many cells with high expression of ISGs and little expression of IFN, but no cells that express high levels of IFN without also substantially expressing ISGs.  Figure 3, the numbers of CCSs for different genes should not be taken as an indicator of their abundance in the infected cells. Especially for the polymerase genes (PB2, PB1, and PA), many CCSs corresponded to genes with internal deletions, since these shorter forms of the genes were preferentially amplified during PCR. Therefore, the plot is faceted by the number of CCSs for any length of the gene, and for full-length genes. Note that the disproportionate sequencing of the shorter internally deleted genes should not greatly affect the genotype calling in Figure 4 since UMIs were used to collapse duplicate sequences derived from the same cDNA, and cell barcodes were used to collapse duplicate sequences from the same cell. The bars in the plot are colored by whether the sequence is derived from the wild-type viral variant, the synonymously barcoded viral variant, or represents a mixed-barcode molecule (see Figure 4- Figure Supplement 2 for more explanation). From the frequencies of these different forms, we estimate (Bloom, 2018) that 5.7% of molecules are chimeric due to PCR strand exchange. Note that about half of these PCR chimeras could be identified by the presence of mixed viral barcodes and removed from subsequent analyses, leaving ∼3% un-identified chimeras. Note also that for some molecules (mostly polymerase genes with internal deletions) one of the barcode sites was deleted from the molecule and so the barcode identity could only be partially called. A negligible number of molecules have low-accuracy sequence or unexpected nucleotide identities at the sites of the viral barcodes.

TTG-AGC-AAT
Leu-Ser-Asn CAA-GAG-AAA Gln-Glu-Lys  The library preparation for PacBio sequencing of the cDNA for the full-length viral genes required many cycles of PCR. A major concern is that strand exchange during this PCR could scramble mutations and 10X cell barcodes / UMIs from different molecules. We can detect PCR strand exchange by leveraging the fact that our cells were infected with a mix of wild-type virus and virus carrying synonymous barcodes near both termini of each gene. If there is no strand exchange, all molecules should either be wild-type or have the synonymous barcoding mutations at both termini. But strand exchange will create some molecules that have wild-type nucleotides at one termini and synonymous barcoding mutations at the other termini. Figure 4-Figure Supplement 1 shows the frequencies with which these different types of molecules were observed during the PacBio sequencing. Note that since the rate of homologous recombination in influenza virus in negligible (Boni et al., 2008), such mixed-barcode molecules are not expected to be generated naturally during co-infection. The cells for which we could call complete viral genotypes tended to have higher expression of viral mRNA than cells for which we could not call complete genotypes. Both facts makes sense. Cells with more viral mRNA are more likely to have their viral cDNA captured in the PacBio sequencing, which is only captures a small fraction of the total transcripts identified by the 3'-end sequencing transcriptomic sequencing. The lower calling rate for dual-barcode co-infections is probably because these co-infections have more viral genes that must be sequenced (potentially a copy of each viral gene from each viral variant), increasing the chances that one of these genes is missed by the PacBio sequencing. An important implication of this plot is that the cells for which we call complete viral genotypes are not a random subsampling of all infected cells in the experiment, but are rather enriched for cells that have high levels of viral mRNA and do not have dual-barcode viral infections. Note also that this plot is limited to the cells that were called as infected ( Figure 3C) and could clearly be classified as IFN-or IFN+ ( Figure 3G).  Figure 5C in that it also includes data from cells for which some viral genes were not fully sequenced. For incompletely sequenced cells, deletions and substitutions are included in the counts when that particular viral gene is sequenced. The major trends in Figure 5C are also true for the larger dataset in this figure. In addition, there is now a modest trend for infected cells that fail to express PB2 and PA not to express IFN. This plot shows the same information as panels Figure 5B-D except that it shows the association of viral gene absence / mutations with ISG expression rather than IFN expression. Cells are classified as ISG+ or ISG-as in Figure 3-Figure Supplement 3. The major viral features that associate with IFN induction also associate with ISG expression. Specifically, cells that express wild-type copies of all eight viral genes tend to express ISGs less frequently and at lower levels The absence of NS is strongly associated with higher ISG expression. Viral burden is not associated with ISG expression levels in NS-sufficient infections, but is in NS-deficient infections. After 13 hours, cells were stained for expression of HA protein and analyzed by FACS for HA and the ZsGreen driven by the IFNL1 reporter. The contour plots show the density of all cells, and IFN+ cells are also indicated by orange dots. Cells were classified as HA+ or IFN+ based on gates set to put 0.05% of the uninfected cells in these populations. For all influenza-infected cells, the percentage IFN+ was calculated only among the HA+ cells (since these are the ones that are infected). For the uninfected cells, the percentage IFN+ was calculated among all cells, since uninfected cells do not express HA. For each viral mutant, two independently generated stocks were each assayed in duplicate (i.e., #1a and #1b are replicates of one viral stock, and #2a and #2b are replicates of the other viral stock). The infections with replicate #1 of the wild-type virus were performed at an MOI of 0.1 as determined by TCID50, and all other viruses were infected at an equivalent particle number as determined by HI assay. The HA staining indicates that the number of transcriptionally active viral units per cell was similar across most mutants, and calculating percentage IFN+ only among HA+ cells largely corrects for modest variation in titer across mutants.   Figure 6B. Each infection was performed in triplicate. A complicating factor for these data is that the virus with the deletion in PB1 (PB1del385to2163) cannot perform secondary transcription because it does not express PB1 protein, and so cannot be effectively normalized by HA expression since it expresses lower levels of HA due to the lack of secondary transcription. Therefore, for the data in this figure all cells were infected at an equivalent MOI of 0.3 as determined by TCID50 on MDCK-SIAT1 cells for the wild-type and NS1stop viruses, and on MDCK-SIAT1 cells expressing PB1 (Bloom et al., 2010) for the PB1del385to2163 virus. Finally, the approach in Figure 6-Figure Supplement 3 was used to show that equivalent TCID50s, all viral variants had similar amounts of transcriptionally active virus in the absence of secondary transcription. All this normalization suggests that all infections were performed at a similar dose of particles active for primary transcription. Therefore, the percent IFN+ was calculated from the flow data in this figure for all cells (HA+ and HA-) since that is a more fair comparison for the PB1del385to2163 virus.  In this experiment, A549 cells were infected at MOI of 0.4 (based on TCID50 as described in Figure 6- Figure Supplement 2), and then after 8 hours mRNA was harvested for qPCR on oligo-dT primed reverse transcription products. The y-axis shows the ratio of viral HA mRNA to the housekeeping gene L32. These infections were performed in the presence of absence of 50 g/ml cycloheximide, which blocks protein synthesis and hence secondary transcription by newly synthesized viral proteins (Killip et al., 2014). In the absence of cycloheximide, the viruses with deletions in PB1 produced less viral mRNA presumably because they could not produce PB1 protein for secondary transcription. But in the presence of cycloheximide, all viruses produced similar amounts of viral mRNA, indicating that the dose of particles active for primary transcription is roughly equivalent across variants. Each measurement was performed in quadruplicate.