CpG Dinucleotides Inhibit HIV-1 Replication through Zinc Finger Antiviral Protein (ZAP)-Dependent and -Independent Mechanisms

Some RNA virus genomes are suppressed in the nucleotide combination of a cytosine followed by a guanosine (CpG), indicating that they are detrimental to the virus. The antiviral protein ZAP binds viral RNA containing CpGs and prevents the virus from multiplying. However, it remains unknown how the number and position of CpGs in viral genomes affect restriction by ZAP and whether CpGs have other antiviral mechanisms. Importantly, manipulating the CpG content in viral genomes could help create new vaccines. HIV-1 shows marked CpG suppression, and by introducing CpGs into its genome, we show that ZAP efficiently targets a specific region of the viral genome, that the number of CpGs does not predict the magnitude of antiviral activity, and that CpGs can inhibit HIV-1 gene expression through a ZAP-independent mechanism. Overall, the position of CpGs in the HIV-1 genome determines the magnitude and mechanism through which they inhibit the virus.

However, the mechanisms by which CpG dinucleotides attenuate viral replication remain unclear. Importantly, introduction of CpG dinucleotides into viral genomes using synthetic biology techniques may be a new way to develop live attenuated virus vaccines, and a full understanding of how CpG dinucleotides inhibit viral replication is necessary to develop this approach (9,10).
HIV-1 encodes the three polyproteins found in all retroviruses (Gag, Pol, and Env), two regulatory proteins (Tat and Rev), and four accessory proteins (Vif, Vpr, Vpu, and Nef) (11). CpG dinucleotides are suppressed throughout the HIV-1 genomic RNA (gRNA), and introducing CpGs into gag or env inhibits viral replication (12-16). Furthermore, analysis of clinical HIV-1 samples found that mutations that create new CpG dinucleotides in HIV-1 are twice as costly as those that do not and that increased CpG dinucleotide abundance in env may predict disease progression (17,18). There are at least four mechanisms by which CpG dinucleotides could inhibit HIV-1. First, CpG DNA methylation-induced transcriptional silencing could repress viral gene expression (12,13,19). Second, introduction of CpGs into cis-acting elements or structures required for viral replication may render them nonfunctional. Third, CpGs could create deleterious cis-acting elements or structures. Fourth, they could act as a pathogen-associated molecular pattern (PAMP) that is recognized by the innate immune system. Supporting the hypothesis that CpGs in viral RNA could be a PAMP, the antiviral protein ZAP (zinc finger antiviral protein) has recently been shown to bind regions of HIV-1 RNA containing CpG dinucleotides and to inhibit HIV-1 with increased CpG content in env (16). ZAP (encoded by the gene ZC3HAV1) was initially discovered as a cellular factor inhibiting murine leukemia virus (MLV) gene expression (20). Subsequent studies have shown that ZAP also inhibits alphaviruses, filoviruses, and hepadnaviruses, as well as some flaviviruses and picornaviruses (21-25). ZAP inhibits viral replication by binding viral RNA and targeting it for degradation and/or inhibiting its translation (20,21,26-28). It may also have other mechanisms of antiviral activity.
However, ZAP does not restrict all viruses, and yellow fever virus, Zika virus, dengue virus, herpes simplex virus 1, vesicular stomatitis virus, and poliovirus are resistant to its antiviral activity (21,24). ZAP does not have enzymatic activity and interacts with other cellular proteins, such as TRIM25 and KHNYN, to restrict viral replication (29-31). Why some viruses are sensitive to ZAP and others are resistant, and whether the CpG abundance and context in viruses determine this, remain unknown. It is also unclear whether the deleterious effect of CpG dinucleotides on viral replication is mediated exclusively by ZAP or through additional mechanisms.
Overall, the specific mechanisms by which CpG dinucleotides inhibit viral replication are not well understood for any virus. Because CpGs are highly suppressed in HIV-1, it is an excellent model virus to study the antiviral effects of the dinucleotide. In this study, we introduced CpGs into different contexts and regions of the viral genome and analyzed how they restricted viral gene expression and replication. First, we determined whether there was a position-dependent effect of CpGs on viral replication. Second, we analyzed whether there was a correlation between endogenous ZAP activity and the abundance of CpG dinucleotides. Third, we tested whether increasing ZAP abundance increased its antiviral effect. In sum, we found that CpGs in different contexts and locations inhibited viral gene expression through ZAP-dependent inhibition of gene expression and ZAP-independent changes in pre-mRNA splicing. Importantly, the number of introduced CpG dinucleotides did not predict the magnitude of their antiviral activity or inhibition by endogenous ZAP. ZAP appears to target a specific region in env containing introduced CpGs more efficiently than other regions in the viral genome, though high levels of ZAP can target most regions of the genome containing CpGs. Our results indicate that the context and position of CpG dinucleotides in the HIV-1 genome determine how they inhibit viral replication through ZAP-dependent and -independent mechanisms.
previously shown, togaviruses show little CpG suppression, with >40 CpGs/kb and an observed/expected ratio of >0.75 (1-5). Many viruses show moderate CpG suppression, with an observed/expected CpG ratio of ~0.5. However, there are some viruses in which CpG abundance is highly suppressed, including hepatitis A virus, respiratory syncytial virus, and HIV-1 (Fig. 1A; see Data Set S1).
Within the retrovirus family, lentiviruses have high levels of CpG suppression (6 to 23 CpGs/kb; observed/expected ratio, 0.2 to 0.4) and alpharetroviruses have low levels of suppression (~50 CpGs/kb; observed/expected ratio, ~0.7) (Fig. 1B; see Data Set S1). Viruses closely related to HIV-1 have ~10 CpGs/kb and an observed/expected ratio of ~0.2.
In addition to CpG dinucleotides, UpA dinucleotides are suppressed in many RNA viruses (1,3,4). Recently, it was reported that ZAP interacts with viral RNA containing UpAs and restricts echovirus 7 containing introduced UpAs (32). Therefore, we analyzed the UpA abundances in our panel of viruses (Fig. 1C; see Data Set S1). While some vertebrate RNA viruses, such as flaviviruses, show marked UpA suppression (<30 UpAs/kb; observed/expected ratio, <0.5) (Fig. 1C), UpA frequency in retroviruses is not substantially suppressed (Fig. 1D). Specifically, viruses closely related to HIV-1 have ~70 UpAs/kb and observed/expected ratios of ~0.9. In sum, CpGs, but not UpAs, appear to be potently suppressed in HIV-1.
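The two measures quoted above (dinucleotides per kilobase and the observed/expected ratio) can be illustrated with a minimal Python sketch. The exact formula used for the expected frequency is not stated in this section, so the sketch assumes the common convention in which the expected count is the product of the two mononucleotide counts divided by the sequence length. The function name `dinucleotide_stats` and the toy sequence are hypothetical and are not taken from the study.

```python
def dinucleotide_stats(seq: str, dinuc: str = "CG") -> dict:
    """Count one dinucleotide and report its frequency per kb and its
    observed/expected (O/E) ratio.

    Assumption (not stated in the text): expected count =
    count(first base) * count(second base) / sequence length.
    """
    # Work in the DNA alphabet so that RNA input (U) is handled too.
    seq = seq.upper().replace("U", "T")
    dinuc = dinuc.upper().replace("U", "T")
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of >= 2 nt and a 2-nt motif")

    # Sliding-window count so that any dinucleotide is counted correctly.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    expected = seq.count(dinuc[0]) * seq.count(dinuc[1]) / n

    return {
        "count": observed,
        "per_kb": 1000 * observed / n,
        "obs_exp": observed / expected if expected else float("nan"),
    }


if __name__ == "__main__":
    toy = "ACGTACGTTA"  # toy sequence, for illustration only
    print("CpG:", dinucleotide_stats(toy, "CG"))
    print("UpA:", dinucleotide_stats(toy, "UA"))
```

Under this convention, a ratio well below 1 (such as the ~0.2 reported for viruses closely related to HIV-1) indicates that the dinucleotide occurs far less often than the base composition alone would predict.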
To better understand the potential evolutionary pressures that have led to CpG suppression in HIV-1, we explored the effects of introducing CpGs into different regions of the viral genome using synonymous mutations. However, HIV-1 contains several overlapping reading frames that constrain where CpGs can be introduced (Fig. 2A). Furthermore, it is important that the synonymous mutations introducing CpGs do not disrupt RNA elements that regulate viral replication. The HIV-1 open reading frames (ORFs) contain multiple cis-acting regulatory elements, including the programmed ribosomal frameshift sequence in gag (33); the Rev response element (RRE) in env (34); polypurine tracts in pol and nef (35); and splicing signals in pol, vif, vpr, tat, rev, and env (36). In addition, there could be uncharacterized elements. Therefore, we identified sequences in the HIV-1 open reading frames that contain reduced variability at synonymous sites, which could indicate the presence of functional RNA elements (37). This analysis identified many of the known HIV-1 linear and structural RNA regulatory elements, including the region at the 5′ end of gag required for dimerization and encapsidation, the ribosomal frameshift sequence required for Pol translation, the polypurine tracts required for reverse transcription, several splicing regulatory sequences, and the RRE (Fig. 2B to E; see Data Set S2 in the supplemental material). We synonymously introduced CpG dinucleotides into gag, pol, and env sequences that the analysis revealed were unlikely to contain cis-acting elements (see Data Set S2).
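The idea of adding CpGs through synonymous changes while leaving selected codons untouched can be sketched in Python as follows. This is an illustrative greedy procedure, not the design method used in the study: it considers a single reading frame only (so it does not handle the overlapping frames mentioned above), it uses the standard genetic code, and the `protected` argument is a hypothetical stand-in for codons that overlap known or suspected cis-acting elements.

```python
from itertools import product

# Standard genetic code, DNA alphabet, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def add_synonymous_cpgs(seq: str, protected=frozenset()) -> str:
    """Greedily swap codons for synonymous codons that raise the CpG count.

    `protected` holds 0-based codon indices that must not be changed
    (hypothetical stand-in for regions containing cis-acting RNA elements).
    CpGs formed across codon boundaries are counted as well as those
    within a codon, because the whole sequence is rescored for each trial.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for idx, codon in enumerate(codons):
        if idx in protected or len(codon) < 3:
            continue
        best, best_count = codon, "".join(codons).count("CG")
        for alt, aa in CODON_TABLE.items():
            if aa != CODON_TABLE[codon]:
                continue  # only synonymous alternatives are allowed
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            n_cpg = "".join(trial).count("CG")
            if n_cpg > best_count:  # ties keep the original codon
                best, best_count = alt, n_cpg
        codons[idx] = best
    return "".join(codons)


if __name__ == "__main__":
    original = "ATGAGAGCTTAA"  # toy ORF (Met-Arg-Ala-stop), illustration only
    recoded = add_synonymous_cpgs(original)
    print(original, "->", recoded)
    print("protein unchanged:", translate(original) == translate(recoded))
```

A real design would also need to check each candidate change against every overlapping reading frame and against the synonymous-site conservation analysis described above before accepting it.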
The number of CpGs introduced into the HIV-1 genome is not correlated with the antiviral effect or ZAP sensitivity. An important experimental consideration when studying how CpGs regulate HIV-1 is that CpG DNA methylation-induced transcriptional silencing could potentially inhibit HIV-1 gene expression. However, CpGs in plasmids amplified in bacteria are not methylated when the plasmid is transiently transfected into mammalian cells (38-40), and we have therefore used an experimental approach in which HIV-1 proviral DNA plasmids are transfected into HeLa or 293T cells. A region in env immediately after vpu does not contain any detectable cis-acting RNA elements, which makes it a good region to analyze the effect of introducing CpGs (Fig. 2; see Data Set S2). It has previously been shown that introducing 36 CpGs into env nucleotides (nt) 86 to 561 (HIV-1 env86-561 CpG) inhibits HIV-1 genomic-RNA abundance, Gag expression, Env expression, and infectious-virus production (Fig. 3A, C, and E and Table 1) (16,31). Importantly, this restriction is eliminated in ZAP knockout cells (Fig. 3B, C, and E), indicating that it is due to the recognition of the CpG dinucleotides by ZAP. To further characterize how CpGs affect HIV-1 replication, we inserted 48 CpGs into env nt 611 to 1014 (HIV-1 env611-1014 CpG) (Fig. 3A and Table 1) and analyzed their effect on HIV-1 genomic-RNA abundance, Env expression, Gag expression, and infectious-virus production. These CpGs inhibited HIV-1 infectious-virus production more potently than the 36 CpGs in HIV-1 env86-561 CpG in the control CRISPR cells (Fig. 3C). However, in the ZAP knockout cells, Gag expression, Env expression, and virion production were only partially increased for HIV-1 env611-1014 CpG, and infectious-virus production was rescued only ~2-fold (Fig. 3C). This indicates that the 48 CpGs introduced into this region of env caused both ZAP-dependent and ZAP-independent suppression of infectious-virus production.
Nef is expressed from fully spliced mRNAs that do not contain the env region with the introduced CpGs (36). As expected, the CpGs introduced into HIV-1 env86-561 CpG did not decrease Nef expression (Fig. 3C). However, there was decreased Nef expression for HIV-1 env611-1014 CpG, and this was not rescued in the ZAP CRISPR cells. This suggests that another mechanism, such as altered splicing, contributes to the CpG-mediated decrease in infectious-virus production for HIV-1 env611-1014 CpG. Furthermore, genomic-RNA abundance for HIV-1 env611-1014 CpG was not fully restored to wild-type levels in ZAP knockout cells, indicating that the CpGs in HIV-1 env611-1014 CpG inhibit genomic-RNA abundance through ZAP-dependent and ZAP-independent mechanisms (Fig. 3E).
We also combined the two regions in env containing 36 and 48 CpGs for a total of 84 CpGs (HIV-1 env86-1014 CpG) (Fig. 3A and Table 1). This inhibited HIV-1 infectious-virus production in an approximately additive manner compared to the two regions' individual effects through both ZAP-dependent and ZAP-independent effects on genomic-RNA abundance and viral protein expression (Fig. 3C and E). To determine the contribution of decreased Env expression to CpG-mediated inhibition, we pseudotyped the viruses with the vesicular stomatitis virus glycoprotein (VSV-G) and found that the ZAP-dependent and -independent defects in infectious-virus production were still present (Fig. 3D). We then introduced 53 CpGs into a region of pol that did not contain any known or detectable cis-acting elements (HIV-1 pol795-1386 CpG) (Fig. 2 and 4A and Table 1; see Data Set S2). Surprisingly, this large number of CpGs caused only a small (~2-fold) reduction in Gag expression and infectious-virus production in control CRISPR cells, and this effect was eliminated in the ZAP knockout cells (Fig. 4B). This suggests that the magnitude of ZAP-dependent restriction is not simply proportional to the absolute number of CpGs added to the viral genome.
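The fold effects discussed in this section (inhibition in control cells, rescue in ZAP knockout cells, and any defect that remains after knockout) are simple ratios of readouts normalized to wild-type HIV-1 in control cells. The Python sketch below shows one way to compute them. The function name, the dictionary layout, and the numbers in the usage example are all hypothetical placeholders chosen for easy arithmetic; they are not measurements from this study.

```python
def fold_effects(readout: dict, virus: str, wt: str = "WT",
                 control: str = "control", knockout: str = "ZAP_KO") -> dict:
    """Summarize ZAP-dependent and ZAP-independent effects for one virus.

    `readout` maps (virus, cell_line) to a raw infectivity value.
    All values are first normalized to wild-type virus in control cells.
    """
    ref = readout[(wt, control)]
    rel = {key: value / ref for key, value in readout.items()}
    return {
        # How strongly the mutant is inhibited when ZAP is present.
        "fold_inhibition_control": rel[(wt, control)] / rel[(virus, control)],
        # How much the mutant recovers when ZAP is removed.
        "fold_rescue_in_knockout": rel[(virus, knockout)] / rel[(virus, control)],
        # Defect that persists without ZAP (ZAP-independent component).
        "residual_defect_in_knockout": rel[(wt, knockout)] / rel[(virus, knockout)],
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only (arbitrary units).
    example = {
        ("WT", "control"): 1000.0,
        ("WT", "ZAP_KO"): 1200.0,
        ("mutant", "control"): 50.0,
        ("mutant", "ZAP_KO"): 600.0,
    }
    for name, value in fold_effects(example, "mutant").items():
        print(f"{name}: {value:.1f}")
```

In this framing, a virus whose residual defect in knockout cells is close to 1 is restricted almost entirely through ZAP, whereas a residual defect well above 1 points to an additional ZAP-independent mechanism.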
We have recently identified KHNYN as an essential ZAP cofactor for CpGs to inhibit HIV-1 env86-561 CpG gene expression and infectious-virus production (31). Our previous work showed that KHNYN overexpression inhibited HIV-1 env86-561 CpG much more potently than wild-type HIV-1, indicating that the introduced CpGs were required for KHNYN to inhibit HIV-1 (31). To determine if inhibition by KHNYN correlated with CpG abundance or ZAP sensitivity, we overexpressed KHNYN on wild-type HIV-1, HIV-1 env86-561 CpG, or HIV-1 pol795-1386 CpG (Fig. 4C and D). KHNYN antiviral activity was correlated with the sensitivity of the virus to endogenous ZAP, with HIV-1 env86-561 CpG inhibited much more potently than HIV-1 pol795-1386 CpG or wild-type HIV-1. We also tested whether endogenous KHNYN was required to restrict HIV-1 env611-1014 CpG, HIV-1 env86-1014 CpG, and HIV-1 pol795-1386 CpG using the KHNYN CRISPR cells that we previously characterized (31). Wild-type HIV-1 infectious-virus production was not affected by depleting KHNYN, while HIV-1 env86-561 CpG infectious-virus production and gene expression were substantially increased (31) (Fig. 4E). The CpG-mediated inhibition of HIV-1 env611-1014 CpG and HIV-1 env86-1014 CpG infectious-virus production was partially rescued in KHNYN CRISPR cells, which was correlated with their restriction by ZAP (Fig. 3C and 4E). The small decrease in infectious-virus production for HIV-1 pol795-1386 CpG was rescued in the KHNYN CRISPR cells, indicating that the 53 introduced CpGs in pol moderately inhibited HIV-1 through ZAP and KHNYN (Fig. 4E).
[Fig. 3 legend, partial] were used to infect TZM-bl reporter cells to measure infectious-virus production. Gag expression in the media, as well as Gag, Hsp90, Env, actin, and Nef expression in the cell lysates, was detected using immunoblotting. The bar charts show the averages of the results of four independent experiments normalized to wild-type HIV-1 in HeLa control CRISPR cells. (E) Genomic-RNA abundance was quantified by qRT-PCR in cell lysates. The bar charts show the averages of the results of three independent experiments normalized to wild-type HIV-1 in HeLa control CRISPR cells. The error bars represent standard deviations. *, P < 0.05 as determined by a two-tailed unpaired t test; ns, not significant. The black asterisks/ns compare the virus containing introduced CpGs in the control CRISPR cells to wild-type HIV-1 in the control CRISPR cells. The red asterisks/ns compare the virus containing introduced CpGs between the ZAP CRISPR cells (red bars) and the control CRISPR cells (black bars).

[Table 1 column headings: CpGs in wild-type virus; CpGs introduced; Mutations introduced]
HIV-1 strains containing reporter genes, such as those encoding green fluorescent protein (GFP) or luciferase, are important experimental tools and have large numbers of CpGs in the reporter gene. Therefore, it was important to determine whether ZAP inhibits these viruses, because this could confound the interpretation of results obtained under some experimental conditions. Specifically, we analyzed whether ZAP could restrict HIV-1 containing the encephalomyocarditis virus (EMCV) internal ribosome entry site (IRES) followed by enhanced GFP (eGFP) (IRES-GFP) or Renilla luciferase (IRES-Renilla) (Fig. 5A and Table 1) (41,42). Both IRES-GFP and IRES-Renilla introduced 96 CpGs into the viral genome. While both reporter viruses produced less Gag and infectious virus than wild-type HIV-1, ZAP depletion did not increase this production (Fig. 5B). We also analyzed the effect of a Venus fluorescent protein-plus-linker sequence that introduced 64 CpGs as a fusion protein with Gag (Fig. 5A and Table 1) (43). Interestingly, this sequence did not sensitize Gag abundance or virus-like particle (VLP) production to ZAP (Fig. 5C). Because a ribosome could displace ZAP bound to CpGs in an open reading frame as it moves along the mRNA, it is possible that CpGs in a 3′ untranslated region (UTR) could inhibit HIV-1 gene expression in a ZAP-dependent manner more effectively than CpGs in a coding region. Therefore, we inserted stop codons between Gag and Venus in both the Gag and Pol reading frames (Fig. 5A and Table 1). However, the CpGs in the context of the 3′ UTR also did not promote ZAP-mediated inhibition of Gag expression or VLP production (Fig. 5C). In sum, the total number of CpGs introduced into the HIV-1 genome does not correlate with their antiviral activity in the context of endogenous ZAP levels in HeLa cells.
Increasing ZAP abundance inhibits CpG-containing HIV-1. We then analyzed whether increasing ZAP abundance further inhibited HIV-1 containing introduced CpGs. First, we treated control and ZAP knockout HEK293T cells with type I interferon (IFN-I) (Fig. 6). Similar to previous reports, this consistently increased expression of the short isoform of ZAP (ZAP-S) by ~2-fold and had no substantial effect on expression of the long isoform of ZAP (ZAP-L) (Fig. 6C) (44,45). IFN-I inhibited all of the viruses. For wild-type HIV-1, the magnitudes of inhibition were similar in the control and ZAP knockout cells (Fig. 6A), which is consistent with the observation that endogenous ZAP does not target wild-type HIV-1 (16,31). The magnitude of IFN-I inhibition of HIV-1 env86-561 CpG was reduced when ZAP was depleted, indicating that ZAP contributes to the antiviral effect of IFN-I on the virus. Importantly, IFN-I treatment augmented the ZAP-dependent inhibition of HIV-1 pol795-1386 CpG. This suggests that CpG-rich sequences in an HIV open reading frame that are only weakly restricted by endogenous ZAP levels can be further sensitized by IFN-I. However, even in the presence of IFN-I, the magnitude of inhibition by the 53 introduced CpGs in pol was lower than that of the inhibition mediated by the 36 introduced CpGs in env. For HIV-1-IRES-GFP and HIV-1-IRES-Renilla, the magnitude of IFN-I inhibition was not decreased upon ZAP depletion (Fig. 6B). This indicates that endogenous ZAP does not target these viruses, despite the large numbers of CpGs that have been introduced into the viral genome.
[Fig. 4 legend, partial] asterisk/ns compares the virus containing introduced CpGs between the ZAP CRISPR cells (red bars) and the control CRISPR cells (black bars). (C and D) HeLa cells were transfected with 500 ng pHIV-1, pHIV-1 EnvCpG86-561, or pHIV-1 pol795-1386 CpG and 500 ng of pGFP-FLAG or 31.25 ng, 62.5 ng, 125 ng, 250 ng, or 500 ng pKHNYN-1-FLAG plus the amount of pGFP-FLAG required to make 500 ng total. (C) Viral infectivity was measured using TZM-bl reporter cells infected with cell culture supernatants. Each point shows the average value of the results of three independent experiments normalized to the value obtained for wild-type HIV-1 in HeLa cells. *, P < 0.05; ns, not significant, as determined by a two-tailed unpaired t test. The black asterisks/ns compare the virus containing introduced CpGs to wild-type HIV-1 with 0 ng of KHNYN. The red asterisks/ns compare the virus containing introduced CpGs between the points of KHNYN overexpression to 0 ng of KHNYN. (D) Gag expression in the media, as well as Gag, Hsp90, Env, actin, and KHNYN-FLAG expression in the cell lysates, was detected using immunoblotting. (E) HeLa control and KHNYN-ex3 CRISPR cells were transfected with pHIV-1 or pHIV-1 env86-561 CpG, pHIV-1 env611-1014 CpG, pHIV-1 env86-1014 CpG, or pHIV-1 pol795-1386 CpG plus pGFP. Viral infectivity was measured using TZM-bl reporter cells infected with cell culture supernatants. Gag expression in the media, as well as Gag, Hsp90, Env, and actin expression in the cell lysates, was detected using immunoblotting. The bar charts show the average values of the results of three independent experiments normalized to the values obtained for wild-type HIV-1 in the HeLa control CRISPR cells. The error bars represent standard deviations. *, P < 0.05 as determined by a two-tailed unpaired t test. The black asterisks compare the virus containing introduced CpGs in the control CRISPR cells to wild-type HIV-1 in the control CRISPR cells. The red asterisks compare the virus containing introduced CpGs between the KHNYN CRISPR cells (red bars) and the control CRISPR cells (black bars).
Because IFN-I only moderately upregulated ZAP-S, we also overexpressed ZAP-S or ZAP-L in HeLa cells. This increased ZAP-S expression ~5-fold and ZAP-L expression ~20-fold (Fig. 7C). Both ZAP isoforms inhibited all of the viruses tested at least 2-fold (Fig. 7). Interestingly, at high levels of ZAP-S or ZAP-L, the 36 introduced CpGs in env and the 53 CpGs in pol inhibited HIV-1 infectious-virus production to similar levels (Fig. 7A). Both HIV-1-IRES-GFP and HIV-1-IRES-Renilla were potently inhibited by overexpressed ZAP, indicating that when ZAP abundance is high enough, the CpGs introduced into these viruses can be targeted (Fig. 7B). Therefore, the inhibition observed for HIV-1-IRES-GFP and HIV-1-IRES-Renilla at high ZAP levels indicates that CpGs in contexts that are not targeted by the endogenous ZAP levels in HeLa cells (Fig. 4) can be targeted if ZAP abundance is substantially increased (Fig. 7B).
[Fig. 5 legend, partial] Infectious-virus production was measured using TZM-bl reporter cells infected with cell culture supernatants. Gag expression in the media, as well as Gag and Hsp90 expression in the cell lysates, was detected using immunoblotting. The bar chart shows the average values of the results of three independent experiments normalized to the values obtained for wild-type HIV-1 in HeLa control CRISPR cells. (C) Control and ZAP-ex6 CRISPR cells were transfected with pHIV-1Gag-Venus or pHIV-1Gag-STOP-Venus. Gag expression in the media, as well as Gag and Hsp90 expression in the cell lysates, was detected using immunoblotting. The bar chart shows the average values of the results of three independent experiments normalized to the virus in control CRISPR cells. The error bars represent standard deviations. *, P < 0.05; ns, not significant, as determined by a two-tailed unpaired t test. The black asterisks/ns compare the virus containing introduced CpGs in the control CRISPR cells to wild-type HIV-1 in the control CRISPR cells. The red asterisks/ns compare the virus containing introduced CpGs between the ZAP CRISPR cells (red bars) and the control CRISPR cells (black bars).

Ficarelli et al.
Journal of Virology
CpG dinucleotides introduced into the 5′ end of gag inhibit HIV-1 replication in a ZAP-independent manner. We previously introduced CpG dinucleotides into nt 22 to 378 of gag in two different contexts (Fig. 8A and Table 1) and found that they inhibited viral replication with different phenotypes on Gag expression (15). For HIV-1 gag22-378 CM, the codon modified (CM) sequence was derived from a codon-optimized Gag-Pol construct and introduced 109 synonymous nucleotide changes and 26 CpGs (Table 1) (15,46). In the context of a single-cycle infectivity assay, HIV-1 gag22-378 CM Gag expression and infectious-virus production were decreased to the limit of detection (Fig. 8B). This correlated with a large decrease in genomic-RNA abundance in the cell lysate and medium (Fig. 8C and D). The CpG dinucleotides were necessary to inhibit the virus, because when they were removed (while leaving the 79 mutations that did not introduce a CpG dinucleotide) to create HIV-1 gag22-378 CM-no-CpG (Fig. 8A and Table 1), gRNA abundance, Gag expression, and infectious-virion production were substantially increased (Fig. 8B). However, when the 26 CpG dinucleotides were introduced without the additional mutations in the codon-optimized Gag sequence to create HIV-1 gag22-378 CpG (Fig. 8A and Table 1), infectious-virus production was decreased by >95%, even though there was no substantial decrease in Gag expression (Fig. 8B).
Deletion of nt 22 to 378 in gag does not substantially decrease infectious-virus production (15,47), indicating that there are no essential cis-acting elements in the region. However, altering the RNA sequence could modulate the local RNA structure in ways that a large deletion did not, and the 5′ region of gag has been shown to indirectly regulate gRNA packaging by regulating the structure of the 5′ UTR (48,49). To determine if the CpGs in HIV-1 gag22-378 CpG specifically decreased infectious-virus production or if a cis-acting regulatory element in the region had been mutated, we changed the codons previously mutated to introduce CpGs into a different codon (DC) that was not the wild-type HIV-1 codon and did not introduce a CpG where possible to produce pHIV-1 gag22-378 DC (Fig. 8A and Table 1). HIV-1 gag22-378 DC produced levels of infectious virus similar to those of wild-type HIV-1 (Fig. 8B). We also measured gRNA abundance in the cell lysate and medium. HIV-1 gag22-378 CpG had ~60% and ~80% decreases in gRNA in the lysate and medium, respectively, while HIV-1 gag22-378 DC had levels of gRNA similar to those of wild-type HIV-1 (Fig. 8C and D). This indicates that the introduced CpGs are necessary for the reduction in infectious-virus production and not a result of mutating essential cis-acting elements in the region that modulate gRNA packaging or other steps of HIV-1 replication. In sum, while HIV-1 gag22-378 CM and HIV-1 gag22-378 CpG have the same 26 introduced CpGs, there is a much larger decrease in Gag expression and intracellular gRNA abundance for HIV-1 gag22-378 CM, and this is due to the introduced CpGs. This suggests that the sequence surrounding the CpG dinucleotides modulates their inhibitory effect.
To test the effect of the sequence proximal to the CpG, we changed all of the mutations in HIV-1 gag22-378 CM that were not within 5 nucleotides of an introduced CpG back to the wild-type HIV-1 sequence to produce HIV-1 gag22-378 CM-5nt-CpG (Fig. 8A and Table 1). Interestingly, HIV-1 gag22-378 CM-5nt-CpG Gag expression and infectious-virus production were very similar to those of HIV-1 gag22-378 CM (Fig. 8E). This indicates that the sequence immediately surrounding the CpG dinucleotides influences the degree to which they inhibit HIV-1 gene expression. We also analyzed intracellular Env abundance and found that Env expression was reduced to undetectable levels for HIV-1 gag22-378 CM and HIV-1 gag22-378 CM-5nt-CpG (Fig. 8E). Env expression was also decreased for HIV-1 gag22-378 CpG, which likely accounts for the decreased infectious-virus production, in addition to the reduced gRNA levels present in virions (Fig. 8D). The decrease in Env expression was unexpected, because the region in the HIV-1 genome containing the introduced CpGs in these viruses is present only in the unspliced transcript encoding Gag and Gag-Pol and not in the singly spliced env mRNAs. Overall, the local sequence context of the CpGs in gag nt 22 to 378 determines the magnitude of inhibition for both Gag and Env expression.
We then analyzed whether ZAP was necessary for the CpGs in HIV-1 gag22-378 CM and HIV-1 gag22-378 CpG to inhibit Gag expression and infectious-virus production. In contrast to HIV-1 env86-561 CpG, ZAP depletion did not increase infectious-virus production for HIV-1 gag22-378 CM and HIV-1 gag22-378 CpG (Fig. 9A). This shows that the CpGs introduced into the 5′ region of gag inhibit HIV-1 gene expression and infectious-virus production through a ZAP-independent mechanism. To determine if introducing a larger number of CpGs into gag could make it ZAP sensitive, we cloned pHIV-1 gag22-1188 CpG, which contains 62 CpG dinucleotides distributed across ~1,100 nt (Fig. 9B and Table 1). In addition, we produced pHIV-1 gag22-654 CpG and HIV-1 gag658-1188 CpG, which contain 30 and 32 CpGs, respectively. Introducing CpGs into nt 22 to 654 of gag led to large decreases in Gag expression, Env expression, and infectious-virus production (Fig. 9C). Interestingly, Gag expression for HIV-1 gag22-1188 CpG and HIV-1 gag22-654 CpG was increased in the ZAP knockout cells, though this did not affect Env expression or infectious-virus production. In contrast, introduction of 32 CpGs into nt 658 to 1188 of gag had no effect on Gag expression, Env expression, or infectious-virus production (Fig. 9C). Therefore, we introduced 60 CpGs into this 3′ region of gag (HIV-1 gag694-1206 CpG) (Fig. 9B and Table 1) and analyzed the effect on infectious-virus production in control and ZAP CRISPR cells (Fig. 9D). The 60 CpGs in the 3′ region of gag inhibited HIV-1 infectious-virus production about 2-fold, which is similar to the effect that 53 CpGs had in pol and less than the 36 CpGs in env. Overall, in some contexts, CpGs introduced into gag allow ZAP to inhibit Gag expression, but the number of CpGs is not correlated with the magnitude of the inhibitory effect. Furthermore, the CpGs introduced into the 5′ region of gag inhibit infectious-virus production in a ZAP-independent manner.
CpG dinucleotides introduced into gag can inhibit HIV-1 gene expression by modulating pre-mRNA splicing. The HIV-1 genomic RNA undergoes extensive alternative splicing to mediate expression of all of the viral genes, and synonymous mutations in gag have previously been shown to disrupt HIV-1 splicing (36,50). Since the CpGs in the 5′ region of gag reduced Env expression in all sequence contexts tested (Fig. 8E and 9C), we speculated that they could affect splicing. Therefore, we analyzed RNA abundance when progressively longer regions of codon-optimized gag sequence were added to the virus, which introduced 11, 18, or 26 CpGs (HIV-1 gag22-165 CM, HIV-1 gag22-261 CM, and HIV-1 gag22-378 CM) (Fig. 10A and Table 1). We have previously analyzed these viruses and have shown that they are deficient for genomic-RNA abundance and Gag expression (15). A comparison of the total RNA and the genomic RNA indicated that the abundance of both was decreased by the synonymous mutations (Fig. 10B). To determine whether the decrease in viral RNA abundance is due to altered splicing, we used RNA-seq to sequence the transcriptome in each sample and determine which splice sites were used to produce the mRNAs for each virus (Fig. 10C and Table 2; see Data Set S3 in the supplemental material). This analysis showed that a preexisting cryptic splice donor (CD1) was activated. Importantly, this donor is outside the region into which the CpGs were introduced (Fig. 10D). The frequency of use of the cryptic splice donor increased with the length of the codon-optimized gag sequence and coincided with a decrease in the utilization of canonical splice donor 1 (SD1). Activation of the cryptic splice donor increased the length of the first exon incorporated into all of the spliced viral RNAs to include the gag sequence prior to CD1. This led to the incorporation of the Gag initiation codon in every transcript upstream of the canonical start codon for the encoded protein (Fig. 10C and D), which is predicted to result in inefficient translation of all HIV-1 proteins encoded by a singly or fully spliced mRNA, including Tat and Rev. This could account for the decrease in total RNA levels, genomic-RNA levels, and Gag and Env expression that we observed for HIV-1 gag22-378 CM (Fig. 8 and 10) (15).
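The shift from SD1 to the cryptic donor described above amounts to a change in the fraction of spliced reads that start at each donor. The Python sketch below shows one way such a usage fraction could be tallied from splice-junction read counts. It is an illustration only and not the analysis pipeline used in the study: the function name, the acceptor labels, and the read counts are hypothetical placeholders.

```python
def donor_usage(junction_counts: dict) -> dict:
    """Return the fraction of spliced reads that use each splice donor.

    `junction_counts` maps (donor, acceptor) pairs to the number of
    RNA-seq reads spanning that junction.
    """
    per_donor = {}
    for (donor, _acceptor), reads in junction_counts.items():
        per_donor[donor] = per_donor.get(donor, 0) + reads
    total = sum(per_donor.values())
    if total == 0:
        return {}
    return {donor: reads / total for donor, reads in per_donor.items()}


if __name__ == "__main__":
    # Placeholder counts for illustration only; acceptor names are hypothetical.
    counts = {
        ("SD1", "acceptor_a"): 60,
        ("SD1", "acceptor_b"): 20,
        ("CD1", "acceptor_a"): 20,
    }
    for donor, fraction in donor_usage(counts).items():
        print(f"{donor}: {fraction:.0%} of spliced reads")
```

Comparing these fractions across the viruses with progressively longer codon-modified regions would show whether cryptic donor use rises as SD1 use falls.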

DISCUSSION
There is selection against CpG dinucleotides in many vertebrate RNA viruses, and introducing them into viral genomes may allow novel vaccines to be developed (1-5,9,10). However, to attenuate viral replication, it is unclear how many CpGs are required, whether there is an optimal location to insert CpGs, and whether all CpGs inhibit via ZAP. Due to the profound suppression of CpG abundance in HIV-1, we used it as a model system to analyze how CpG dinucleotides inhibit viral replication.
Our results show that CpGs can inhibit HIV-1 replication through at least two independent, but not mutually exclusive, mechanisms. First, they can recruit ZAP and target the viral RNA for degradation. Second, they can inhibit replication by altering pre-mRNA splicing. In addition, CpGs could silence HIV-1 transcription through DNA methylation. The multiple mechanisms by which CpGs inhibit HIV-1 infectious-virus production may explain why they are strongly suppressed in the virus, even compared to other RNA viruses, and may also explain why small changes in the number of CpGs in env are linked to disease progression (18). Introducing CpGs into env nt 86 to 561 potently inhibits genomic-RNA abundance, Env expression, Gag expression, and infectious-virus production in a ZAP-dependent manner. However, ZAP depletion does not fully rescue infectious-virus production when CpGs are introduced into several other regions of the HIV-1 genome, highlighting the ZAP-independent effects of CpGs, as well as the sensitivity of the 5′ region of env for CpGs that mediate ZAP antiviral activity.
We have characterized the ZAP-independent effect for CpGs introduced into the 5′ end of gag. These CpGs have a dramatic effect on genomic-RNA levels, Gag expression, and Env expression by promoting the use of a cryptic 5′ splice donor at the expense of SD1. Interestingly, the magnitude of this effect is modulated by the sequence identity immediately surrounding the CpGs. It should be noted that these experiments were done in the context of transiently transfected proviral constructs in HeLa cells, but previous studies have shown that the splicing patterns for HIV-1 are similar in transfected cells and infected T cells (51,52). The introduced CpGs do not directly enhance the strength of this splice site because they are upstream of the splice donor and do not affect the sequence itself. A previous report showed that introducing synonymous mutations into the 5′ end of gag promotes splicing at this cryptic donor, but the role of CpGs in this was not characterized (50). We found that CpGs introduced into this region have multiple effects on viral replication, including decreases in genomic-RNA stability, Gag expression, virion production, and infectivity per genome (15). All of these phenotypes are likely due to the decreased use of SD1 and the corresponding increase in splicing from the cryptic splice donor in gag. The CpGs introduced into the 5′ end of gag could inhibit a preexisting exonic splicing silencer (ESS) or introduce an exonic splicing enhancer (ESE). We favor the hypothesis that the CpGs introduced an ESE, because the synonymous mutations promote splicing at the cryptic splice donor in a length-dependent manner and the sequence within 5 nucleotides surrounding the CpG modulates the magnitude of the decrease in Env and Gag expression. However, further experiments will be required in order to characterize how CpGs modulate splicing.
There are approximately 1,500 RNA binding proteins in the human genome, most of which do not have a well-characterized recognition sequence, though several have been reported to bind sequences that contain CpGs (53,54). Therefore, an unknown number of RNA binding proteins bind CpGs, and we do not yet know which protein regulates HIV-1 splicing in a CpG-dependent manner. In addition, introducing CpGs into the HIV-1 genome may affect its local or long-range RNA structures; posttranscriptional modifications, such as cytosine methylation; or other aspects of RNA biology (55)(56)(57).
Surprisingly, the magnitude of ZAP-mediated inhibition was not correlated with the number of CpGs introduced into the viral genome, and some regions of the genome can tolerate substantial numbers of CpGs. Thirty-six CpGs inserted into the 5′ region of env had the greatest ZAP-dependent inhibitory phenotype. This corresponds to the observations by Takata et al., who first identified this region in a panel of viruses with large numbers of synonymous mutations in different regions of the HIV-1 genome (16,50). The introduction of 53 CpGs into pol or 60 CpGs into the 3′ end of gag produced only a small amount of ZAP-dependent inhibition of infectious-virus production. KHNYN overexpression also produced less inhibition of infectious-virus production when 53 CpGs were introduced into pol than when 36 CpGs were added to env. This supports our hypothesis that KHNYN antiviral activity is controlled by ZAP's ability to target the viral RNA (31). Only when ZAP levels were very high due to overexpression from a cDNA plasmid did the 53 CpGs in pol mediate a level of repression similar to that mediated by the 36 CpGs introduced into env nt 86 to 561. This highlights the facts that this region in env is very sensitive to the endogenous levels of ZAP in HeLa cells and that the position or local context of the CpG is important for ZAP to inhibit the virus. Interestingly, the weak inhibition of infectious-virus production by the 53 CpGs in pol mediated by endogenous ZAP levels could be substantially enhanced by IFN-I treatment. Therefore, part of the anti-HIV activity mediated by IFN-I (58) may be to promote ZAP targeting CpGs in contexts where it normally does so inefficiently. The fact that there is only a small increase in ZAP-S abundance upon IFN-I treatment raises the question of whether this induction of ZAP is sufficient to explain the phenotype or whether increased abundance or activity of ZAP cofactors, such as TRIM25 or KHNYN, may contribute (29-31,59).
Because HIV-1 strains containing reporter genes are commonly used research tools, we investigated whether the large numbers of CpGs in the EMCV IRES, GFP, or Renilla luciferase could sensitize these viruses to ZAP. Adding 96 CpGs to the 3′ end of the genome in the context of IRES-GFP or IRES-Renilla luciferase did not sensitize the virus to the endogenous levels of ZAP in HeLa or HEK293T cells. Similarly, HIV-1 with GFP in place of nef has previously been shown not to be targeted by endogenous levels of ZAP in HeLa or MT4 cells (16). While HIV-1-IRES-GFP and HIV-1-IRES-Renilla were not inhibited by ZAP after IFN-I treatment, they were inhibited when high levels of ZAP were present due to overexpression from a cDNA plasmid. Therefore, ZAP abundance can determine whether CpG-containing viral genomes are targeted, though it is unclear whether the ZAP levels produced from plasmid-based overexpression can be achieved in a relevant in vivo context. This suggests that these reporter viruses are useful tools that may not be affected by ZAP under many experimental conditions.
An important area of future research is to determine why CpGs in some contexts or regions are efficiently targeted by ZAP and others are not. To date, the primary evidence that ZAP directly binds CpGs comes from PAR-CLIP experiments (16). The advantage of this technique is that it captures ZAP binding to CpGs in a living cell. However, other cellular factors present could modulate ZAP's binding specificity. Several groups have shown that the ZAP cofactor TRIM25 can bind cellular and viral RNA, and it has been reported to regulate ZAP binding to Sindbis virus RNA (29,30,60-65). Therefore, TRIM25 or other ZAP cofactors could bind specific motifs in viral RNA to determine the sensitivity of the RNA to ZAP-mediated antiviral activity. In addition, it is not known how many ZAP molecules are required to bind RNA to mediate antiviral activity or if they have to be clustered in a specific way. While structural and mutagenesis studies of the RNA binding domain in ZAP have shown that it is a dimer that may have two RNA binding cavities within a large RNA binding cleft, how it binds CpG dinucleotides remains unknown (66,67). To fully understand how specific CpGs mediate ZAP-dependent antiviral activity, it will be essential to understand how ZAP binds CpGs in specific RNA contexts and structures and the role its cofactors play in modulating its RNA binding activity.
It will be interesting to compare how CpGs inhibit HIV-1 to how they inhibit other RNA viruses and if they do so by targeting viral RNA for degradation, by inhibiting its translation, or through other mechanisms. CpGs directly or indirectly introduced into coding and noncoding regions of picornaviruses have shown that they can potently attenuate viral replication and create strains that protect animals from challenge with the wild-type virus (6-8, 10, 68). ZAP has recently been shown to be necessary for introduced CpGs to inhibit the picornavirus echovirus 7 (32). CpGs have also been shown to attenuate influenza A virus and to protect animals from lethal challenge by the wild-type virus (9). However, the molecular mechanism of attenuation remains unclear, and the CpGs could inhibit via ZAP, altered RNA splicing similar to what we have observed in HIV-1, or other mechanisms. This highlights both a challenge and an opportunity for introducing CpG dinucleotides to create live attenuated vaccines. The multiple mechanisms of action, as well as the position and context dependence of CpG-mediated viral inhibition, pose a challenge to determining the engineering principles for attenuating viruses with a predicted magnitude and mechanism. The opportunity is that CpGs can be used to attenuate viruses through multiple and potentially additive or synergistic mechanisms, which may enhance the utility of this approach.

MATERIALS AND METHODS
Sequence analysis of viral genomes. The "analyze base composition" tool in MacVector (MacVector Inc.) was used to calculate the CpG and UpA observed/expected ratios for the viral sequences. The
Ficarelli et al. Journal of Virology
pHA-Renilla was used, for a total of 1 μg DNA. The transfection medium was replaced with fresh medium 6 h (HEK293T) or 24 h (HeLa) posttransfection. The cells were lysed 48 h posttransfection, and the medium was recovered. In experiments performed with type I interferon, 1,000 U/ml of IFN-I (Universal Type I Interferon; PBL Assay Science) was added to the cells upon medium change 6 h posttransfection. The medium was filtered through a 0.45-μm filter, and the virions were pelleted for 2 h at 20,000 × g through a 20% sucrose cushion in phosphate-buffered saline (PBS) solution.
TZM-bl cell infectivity assay. Supernatant was recovered 48 h posttransfection and filtered as previously described. TZM-bl cells were seeded at 70% confluence in 24-well plates and infected by overnight incubation with filtered virus stocks. Forty-eight hours postinfection, the cells were lysed, and the amount of infectious-virus production was measured by induction of β-galactosidase using the Galacto-Star system (Applied Biosystems) following the manufacturer's instructions. β-Galactosidase activity was quantified as relative light units per second using a PerkinElmer luminometer.
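For readers without access to MacVector, the CpG and UpA observed/expected ratios named under "Sequence analysis of viral genomes" can be approximated with a short script. The sketch below assumes the commonly used definition, observed dinucleotide count divided by (count of first base × count of second base / sequence length); the exact formula used by the MacVector tool is not stated here, and the function name `dinucleotide_oe` is hypothetical.

```python
def dinucleotide_oe(seq: str, dinuc: str) -> float:
    """Observed/expected ratio for one dinucleotide (e.g. "CG" or "UA").

    Assumed definition (not confirmed as MacVector's):
        O/E = n(XpY) / (n(X) * n(Y) / N)
    where N is the sequence length. RNA input is accepted; U is treated as T.
    """
    seq = seq.upper().replace("U", "T")
    dinuc = dinuc.upper().replace("U", "T")
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two bases")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    first, second = dinuc[0], dinuc[1]
    # Count every position where the dinucleotide starts.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    # Expected count if the two bases were positioned independently.
    expected = seq.count(first) * seq.count(second) / n
    if expected == 0:
        return 0.0
    return observed / expected


if __name__ == "__main__":
    toy = "ACGT"  # toy sequence, not a viral genome
    print(dinucleotide_oe(toy, "CG"))
```

A ratio well below 1 indicates suppression of the dinucleotide relative to the base composition, which is the sense in which CpGs are described as suppressed in HIV-1.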
Quantitative reverse transcription (qRT)-PCR. HeLa cells were transfected at a confluence of 70% in a 6-well plate and after 48 h were washed with 1× PBS and lysed. The RNA was extracted using an RNeasy kit (Qiagen) following the manufacturer's instructions. The supernatant was also collected and treated for 3 h at 37°C with RQ1 DNase (Invitrogen) to decrease plasmid DNA contamination. cDNA was synthesized using a high-capacity cDNA archive kit (Applied Biosystems), and 1 μg of RNA from virions was isolated using a QIAamp viral RNA minikit following the manufacturer's instructions. Because carrier RNA was added to the lysis buffer, the total RNA isolated was quantified using a Qubit 3.0 fluorometer (ThermoFisher) and normalized so that 20 ng of RNA from each sample was reverse transcribed using the high-capacity cDNA archive kit (Applied Biosystems). Quantitative PCRs (qPCRs) were performed in triplicate with TaqMan universal PCR mix using an Applied Biosystems 7500 real-time PCR system. The HIV-1 NL4-3 gRNA primers were GGCCAGGGAATTTTCTTCAGA/TTGTCTCTTCCCCAAACCTGA (forward/reverse), and the probe was 6-carboxyfluorescein (FAM)-ACCAGAGCCAACAGCCCCACCAGA-6-carboxytetramethylrhodamine (TAMRA). The HIV-1 NL4-3 total RNA primers were TAACTAGGGAACCCACTGC/GCTAGAGATTTTCCACACTG (forward/reverse), and the probe was FAM-ACACAACAGACGGGCACACACTA-TAMRA. The absolute number of copies was determined using the slope of the standard curve at a qPCR efficiency between 95% and 105%.
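The standard-curve step can be illustrated with a minimal sketch. It assumes the conventional linear model Ct = slope × log10(copies) + intercept and the usual efficiency relation E = 10^(−1/slope) − 1; the function names and the dilution-series values are hypothetical and are not taken from the experiments described here.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency(slope):
    """Amplification efficiency as a fraction (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get absolute copy number from a Ct."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series of a plasmid standard.
    std_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
    std_cts = [33.36, 30.04, 26.72, 23.40, 20.08]
    slope, intercept = fit_standard_curve(std_copies, std_cts)
    eff = efficiency(slope)
    # Accept the run only if efficiency falls in the 95% to 105% window.
    if 0.95 <= eff <= 1.05:
        print(copies_from_ct(25.0, slope, intercept))
```

A slope near −3.32 corresponds to roughly 100% efficiency, so the 95% to 105% acceptance window translates to slopes of about −3.45 to −3.21.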
Analysis of HIV-1 splicing. HeLa cells were transfected at a confluence of 70% in a 6-well plate and after ~40 h were washed with 1× PBS and lysed. The RNA was extracted using an RNeasy kit (Qiagen) following the manufacturer's instructions. The RNA samples were quantified using a Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA), and RNA integrity was checked with RNA screen tape on an Agilent 2200 TapeStation (Agilent Technologies, Palo Alto, CA, USA). The multiplexed RNA-sequencing library was prepared using an Illumina TruSeq stranded mRNA library preparation kit following the manufacturer's protocol. Sequencing libraries were validated using DNA analysis screen tape on an Agilent 2200 TapeStation and quantified by using a Qubit 2.0 fluorometer, as well as by quantitative PCR (Applied Biosystems, Carlsbad, CA, USA). The sequencing libraries were multiplexed and clustered on two flow cell lanes. After clustering, the flow cell was loaded on the Illumina HiSeq instrument according to the manufacturer's instructions. The samples were sequenced, using a 2-by-150 paired-end (PE) high-output configuration, by Genewiz. Image analysis and base calling were conducted with HiSeq control software (HCS) on the HiSeq instrument. The raw sequence data (.bcl files) generated from the Illumina HiSeq were converted into fastq files and demultiplexed using the Illumina bcl2fastq program version 2.17. Adapter trimming was performed using BBDuk (https://jgi.doe.gov/data-and-tools/bbtools/), and read pairs were merged with BBmerge (https://jgi.doe.gov/data-and-tools/bbtools/) in order to increase base call quality and generate long, whole-fragment reads. The reads were then aligned with the human genome (hg38) and the HIV NL4-3 genomic-RNA sequence simultaneously using Hisat2 (82). HIV-mapping junction-spanning reads were isolated using regtools (https://github.com/griffithlab/regtools) to allow per-junction read counting.
To visualize junctions of interest, data from replicates were first merged using the Picard (http://broadinstitute.github.io/picard) MergeSamFiles function, followed by generation of sashimi plots using Gviz (83). The percentage of HIV-1 junction-spanning reads was calculated by dividing the number of reads for each junction by the total number of junction-spanning reads in the library.
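The per-junction percentage described above is a simple normalization, sketched below under the assumption that per-junction read counts have already been extracted into a mapping. The function name, the junction labels, and the counts are hypothetical placeholders and do not reproduce the data in Table 2.

```python
def junction_percentages(junction_counts):
    """Percentage of junction-spanning reads supporting each junction.

    junction_counts maps a junction label to its read count. Each count is
    divided by the total number of junction-spanning reads in the library.
    """
    total = sum(junction_counts.values())
    if total == 0:
        # No junction-spanning reads: report zero for every junction.
        return {junction: 0.0 for junction in junction_counts}
    return {
        junction: 100.0 * count / total
        for junction, count in junction_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for a single library (labels are illustrative).
    counts = {"donor_A-acceptor_1": 30, "donor_B-acceptor_1": 70}
    for junction, pct in junction_percentages(counts).items():
        print(f"{junction}\t{pct:.1f}%")
```

Because the denominator is the library's own junction-spanning read total, the percentages are comparable between viruses even when sequencing depth differs between samples.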