ABSTRACT
Ascoviruses are double-stranded DNA (dsDNA) viruses that attack caterpillars and differ from all other viruses by inducing nuclear lysis followed by cleavage of host cells into numerous anucleate vesicles in which virus replication continues as these grow in the blood. Ascoviruses are also unusual in that most encode a caspase or caspase-like proteins. A robust cell line to study the novel molecular biology of ascovirus replication in vitro is lacking. Therefore, we used strand-specific transcriptome sequencing (RNA-Seq) to study transcription in vivo in third instars of Spodoptera frugiperda infected with the type species, Spodoptera frugiperda ascovirus1a (SfAV-1a), sampling transcripts at different time points after infection. We targeted transcription of two types of SfAV-1a genes; first, 44 core genes that occur in several ascovirus species, and second, 26 genes predicted in silico to have metabolic functions likely involved in synthesizing viral vesicle membranes. Gene cluster analysis showed differences in temporal expression of SfAV-1a genes, enabling their assignment to three temporal classes: early, late, and very late. Inhibitors of apoptosis (IAP-like proteins; ORF016, ORF025, and ORF074) were expressed early, whereas its caspase (ORF073) was expressed very late, which correlated with apoptotic events leading to viral vesicle formation. Expression analysis revealed that a Diedel gene homolog (ORF121), the only known “virokine,” was highly expressed, implying that this ascovirus protein helps evade innate host immunity. Lastly, single-nucleotide resolution of RNA-Seq data revealed 15 bicistronic and tricistronic messages along the genome, an unusual occurrence for large dsDNA viruses.
IMPORTANCE Unlike all other DNA viruses, ascoviruses code for an executioner caspase, apparently involved in a novel cytopathology in which viral replication induces nuclear lysis followed by cell cleavage, yielding numerous large anucleate viral vesicles that continue to produce virions. Our transcriptome analysis of genome expression in vivo by the Spodoptera frugiperda ascovirus shows that inhibitors of apoptosis are expressed first, enabling viral replication to proceed, after which the SfAV-1a caspase is synthesized, leading to viral vesicle synthesis and subsequent extensive production of progeny virions. Moreover, we detected numerous bicistronic and tricistronic mRNA messages in the ascovirus transcriptome, implying that ascoviruses use other noncanonical translational mechanisms, such as internal ribosome entry sites (IRESs). These results provide the first insights into the molecular biology of a unique coordinated gene expression pattern in which cell architecture is markedly modified, more than in any other known eukaryotic virus, to promote viral reproduction and transmission.
INTRODUCTION
Ascoviruses are insect-pathogenic double-stranded DNA (dsDNA) viruses (family Ascoviridae) that attack the larval and pupal stages of lepidopterans, causing a long-term but ultimately fatal disease (1–5). They are characterized by a combination of unique features that easily distinguish them from other insect viruses, including large virion size (400 by 150 nm), reticulate virion envelope structure, and especially the formation of virion-containing vesicles derived by cleavage of infected host cells. This process resembles apoptosis but appears to differ in that what appear to be developing apoptotic bodies continue to grow and produce virions, eventually accumulating in host blood and turning it a milky white color (6). These viruses are also unique in that most have a caspase or caspase-like genes, and in at least one case, an executioner caspase was confirmed to play a role in the changes in cell architecture that lead to the development of virion-containing vesicles (7).
The unique ability of ascoviruses to apparently manipulate the apoptotic pathway, leading to the formation of virion-containing vesicles, i.e., viral vesicles, is a complex process with a high degree of cellular reorganization that possibly involves both cellular and viral genes (8, 9). Although the role of the executioner caspase may be significant in the initiation of apoptosis, other viral and host proteins that participate in the viral vesicle formation remain to be identified. Identification of these genes and how they are temporally coordinated will contribute to our knowledge of the underlying molecular biology of this novel cytopathology in which infected cells are reprogrammed to produce the viral vesicles that enable virogenesis to continue as these circulate in the blood.
Unfortunately, understanding the molecular basis of ascovirus cytopathology is hindered by the lack of a suitable cell line in which to study viral gene expression and the interaction with host cell genes. Although previous studies show the permissiveness of noctuid insect cell lines for ascovirus propagation, viral replication varies significantly among different cell lines (10). More importantly, vesicle formation, the primary distinguishing feature of ascoviruses, is very rare and delayed in all cell lines infected with HvAV-3e (10). Moreover, expression of Spodoptera frugiperda ascovirus (SfAV) core genes in Sf9 cells was very limited and aberrant (7), yielding few virions compared to the large quantities produced in vivo (8).
To begin to address the knowledge gaps regarding the unique molecular biology of how ascoviruses manipulate cell architecture to generate viral vesicles, we used third-instar larvae of Spodoptera frugiperda infected with the ascovirus type species, Spodoptera frugiperdaascovirus 1a (SfAV-1a), coupled with transcriptome sequencing (RNA-Seq) technology to gain basic information about the expression of core and other viral genes. SfAV-1a is known to target mainly the insects' fat body for infection, but other ascovirus species (Heliothis virescens ascovirus [HvAV] and Trichoplusia ni ascovirus [TnAV]) have a broad tissue tropism: they infect the fat body, epidermis, and tracheal matrix (8).
Regardless of the ascovirus species and their differences in tissue tropisms, all ascoviruses are known to induce the same cytopathology, and this has been correlated with the core gene set that is shared by different ascoviruses (8, 9). Ascovirus genomes are circular and range in size from 100 to 200 kbp. For the present study, we selected SfAV-1a, which has a genome of ∼156,922 bp, with G+C ratio of 49.2%, and codes for 123 putative proteins (11). The functions of many of these proteins fall within five main classes, namely, proteins associated with nucleotide metabolism, apoptosis, and lipid metabolism, virion structural proteins, and host interaction proteins. However, unlike baculoviruses (family Baculoviridae), the most widely studied dsDNA viruses that infect caterpillars, and for which the function of many proteins are known, the functions of most ascovirus proteins remain unknown. Some ascovirus proteins are related to those of baculoviruses and the related nudiviruses, but phylogenetic analysis of ascovirus DNA polymerases and major capsid proteins (MCPs) indicates that ascoviruses are more closely related to iridescent viruses and phycodnaviruses than to baculoviruses (8, 11). Thus, our SfAV-1a transcriptome study was aimed at defining the expression patterns of this virus as a prelude to defining the functions of the proteins involved in its unique reprogramming of infected cells to produce progeny virions and impede innate host immunity.
Here we report the results of our initial study, in which we determined the temporal core gene expression of SfAV-1a in vivo. Three temporal classes were defined: early, late, and very late. Among other results, we found that apoptosis inhibitors (ORF016, ORF025, and ORF074) were expressed early, whereas the caspase (ORF073) was expressed very late, demonstrating coordination in this ascovirus in which virus replication was enabled first, followed much later by apoptosis and viral vesicle formation. In addition, exploration of the RNA interference pathway through expression of RNase III (ORF022) and high levels of a Diedel homolog (ORF121), a virokine, in the SfAV-1a transcriptome suggest that the function of these proteins is to overcome the host's innate immune response. Finally, the detection of 15 bi-/tricistronic mRNA messages in the SfAV-1a transcriptome is very unusual for a large DNA virus, showing that this ascovirus and likely others have noncanonical translational mechanisms.
RESULTS
Editing and reannotation of SfAV-1a genes based on RNA-Seq data.The SfAV-1a replication progress and host permissiveness were illustrated by gradual increase in the number of viral reads mapping to the viral genome (in comparison to total reads, including those of the host), with only 0.007% at 6 h postinfection (hpi), then 0.06% at 12 hpi, 1.71% at 24 hpi, 1.87% at 48 hpi, 3.64% at the 4th day postinfection, and 3.76% at the 7th day postinfection. By 24 hpi, the viral reads were covering most of the SfAV-1a genome except the inverted-repeat areas, which had no transcription (11). Mapping of SfAV-1a reads to the genome revealed numerous mismatches with the reported genome sequence, including in reported open reading frames (ORFs). Therefore, editing these sequence differences in all positions that showed 100% mismatch by manual curation was an essential prerequisite to determine whether the ORFs were altered and to perform accurate expression pattern analysis. In general, RNA sequence technology is known to significantly improve genome reannotation (12, 13), and our edits of the published sequence corrected a number of the reported ORFs (Table 1).
Comparison of the SfAV-1a original and new positions for core and TMD-containing ORFs identified in the SfAV-1a transcriptomea
In this analysis, we focused our attention on two SfAV-1a gene sets. The first consisted of 44 core genes, present in more than one ascovirus species, identified in our analysis by BLASTP (NCBI) (Table 1). Many of these genes have been reported in previous studies on the evolutionary relationships of ascoviruses (2, 8, 9). The functions of genes in this set are distributed among five classes, namely, genes associated with nucleotide metabolism, lipid metabolism, apoptosis, structural proteins, or host interaction proteins. Two SfAV genes, which appear to be conserved only in SfAV and HvAV, were added to this set because they were highly expressed or due to putative functions associated with SfAV-1a cytopathology.
The second set consisted of 26 proteins, representing the entire transmembrane domain (TMD)-containing proteins identified after the genome reannotation using the TMHMM server v.2.0 (14, 15). The number of TMDs ranged from 1 to 7. Many of the TMD ORFs had no known function, and as a group they are unique to ascoviruses. Thus, these were selected because of their possible association with the outer cell membrane modifications, making them good candidates for participation in viral vesicle formation. Eight of the 26 genes were also represented in the conserved SfAV-1a gene set, including genes for enzymes capable of modifying membrane lipids, more specifically, a patatin-like phospholipase, a PlsC phosphate acyltransferase, a fatty acid elongase, and an esterase/lipase. These enzymes are known to modify such things as fatty acid chain length and the degree of lipid saturation (9, 11). A comparison of the newly annotated SfAV-1a two gene sets with the original genome map positions is provided in Table 1.
Cluster analysis of SfAV-1a expressed core and TMD genes.Cluster analysis of the SfAV-1a core and TMD genes was performed based on calculations of the reads per kilobase per million mapped reads (RPKMs) to determine the transcription dynamics for each gene. Clusters were categorized into three expression classes, namely, early, late, and very late (Fig. 1 and 2). The early gene cluster consisted of 13 core and 4 TMD genes expressed beginning at 6 h postinfection. All early genes continued to be expressed during later time points and were not downregulated. The functions of early core genes mainly dealt with nucleotide metabolism (ORF040, ORF059, ORF066, ORF095, ORF099, and ORF104), inhibition of apoptosis (ORF016, ORF025, and ORF074), and host interaction (ORF055, ORF072, and ORF107). Only one structural protein was detected in this class (yabby-like transcription factor [corresponding to ORF091]). The four early TMD genes were ORF002, ORF007, ORF039, and ORF085. ORF002 corresponds to a virion structural protein, but its specific function is unknown. None of the four lipid-modifying enzymes were detected in this early stage (Fig. 1 and 2).
Cluster analysis of the SfAV-1a core gene temporal transcription pattern identified in vivo during infection of the 3rd-instar larvae of Spodoptera frugiperda. (A) Heat map representation of SfAV-1a expression of each core gene. Core genes are common genes identified in many ascovirus species. The temporal clusters are indicated on the map as early, late, and very late and are separated by white horizontal lines. Heat map colors represent the difference in transcript expression (measured in log2 RPKMs) from average replicate expression levels at 0 h. (B) Distribution of putative function of genes in each gene cluster.
Cluster analysis of the SfAV-1a transmembrane domain (TMD)-containing genes temporal transcription pattern identified in vivo during infection of 3rd-instar larvae of Spodoptera frugiperda. (A) Heat map representation of SfAV-1a expression of each TMD gene. TMD genes were identified using TMHMM server v.2.0 (14, 15). The temporal clusters are indicated on the map as early, late, and very late and are separated by white horizontal lines. Heat map colors represent the difference in transcript expression (measured in log2 RPKMs) from average replicate expression levels at 0 h. (B) Distribution of putative function of genes in each gene cluster.
The majority of the core (i.e., 28) and TMD (i.e., 16) genes were found to start transcription during the late stage of infection, at 12 hpi. The putative functions of the two gene sets centered mainly on DNA and RNA metabolism (Fig. 1 and 2). In addition, six structural proteins were transcribed in this stage, including the two major SfAV-1a structural proteins, namely, the major capsid protein (ORF041) and the DNA-condensing P64 protein (ORF048 [16]). The late gene cluster was characterized by expression of three core genes coding for lipid-modifying enzymes that have TMDs, namely, esterase (ORF013), fatty acid elongase (ORF087), and phosphate acyltransferase (ORF112). The fourth lipid patatin-like enzyme (ORF093) was detected very late in the infection. Thus, expression of lipid-modifying enzymes late and very late was consistent with the synthesis of viral vesicles.
In addition, five core and six TMD genes were expressed very late in the SfAV-1a infection cycle (expression starting at 24 hpi). Two of the core genes were identified previously as SfAV-1a virion structural proteins, namely, the Ervl/Alr family protein (ORF061) and the serine/threonine protein kinase (ORF064) (16). Both the caspase (ORF073) and putative thioredoxin-like proteins (ORF116) were detected in this cluster, where they shared similar transcription patterns for the different time points postinfection. The remainder of the very late SfAV-1a genes identified were TMD genes with unknown functions (ORF012, ORF032, ORF045, ORF062, and ORF122), with the exception of ORF93, which was identified as corresponding to a fourth lipid-modifying enzyme (patatin-like phospholipase).
Four SfAV-1a genes (ORF121, ORF107, ORF032, and ORF055) were highly expressed (>1,000 RPKMs) at certain stages of the infection, especially compared to many of the genes (<100 RPKMs) in our analysis. Interestingly, ORF121 is a late gene identified as a homolog of the Drosophila Diedel protein gene (die gene) (17). In addition, ORF107 is an early gene sharing homology with the trifunctional transcriptional regulator/proline dehydrogenase/l-glutamate gamma-semialdehyde dehydrogenase in Wigglesworthia glossinidia (11).
The other highly expressed ORFs (ORF032 and ORF055) represent good candidates for future studies because they are highly expressed according to our analysis and have no known function. The ORF055 product shares homology with the 254L protein in IIV-6 and is conserved in all ascovirus species, while ORF032 corresponds to a TMD protein conserved in HvAV and TnAV.
Other highly expressed SfAV-1a genes (>100 RPKMs) that have interesting putative functions included genes for a BRO-like protein (ORF079), a transposase (ORF077), a putative VLTF2-like late transcription factor/Zn finger DNA binding protein (ORF113), an inhibitor of apoptosis (IAP)-like protein (ORF016), the virion structural protein yabby-like transcription factor (ORF091), and a thymidine kinase (ORF040). Also, five highly expressed (>100 RPKMs) TMD genes include those for 2 hypothetical proteins (ORF007 and ORF060), a virion structural myristylated membrane protein (ORF054), and a protein (ORF097) with a conserved RING domain (11).
Identification of SfAV-1a bicistronic and tricistronic messages.A total of nine putative bicistronic and six tricistronic mRNA transcripts were detected in the SfAV-1a transcriptome, with two or three ORFs per transcript, respectively. A description of these bicistronic and tricistronic transcripts, with the position, putative function corresponding to the ORFs, intergenic regions, and poly(A) signal positions near the 3′ end of the message, are listed in Table 2 and described schematically in Fig. 3. Interestingly, many of the messages were missing a poly(A) signal near the 3′ end, with RNA secondary structures, in these cases, observed in some messages. Involvement of stem-loop structure in transcription termination has been reported before for the major capsid protein of Spodoptera exigua ascovirus 5a (SeAV-5a) (18).
Description of the bicistronic and tricistronic mRNA messages identified in the SfAV-1a transcriptome infecting 3rd-instar Spodoptera frugiperda larvaea
Description and validation of SfAV-1a bicistronic (A) and tricistronic (B) messages identified. IR, intergenic region. “>” refers to forward orientation and “<” refers to reverse orientation. (C) RT-PCR validation of mRNA message size for two bicistronic and three tricistronic messages (marked with asterisk) amplified from 24-hpi DNase-treated total RNA isolated from SfAV-1a-infected 3rd-instar larvae using specific primers set for each message (described in Table 3). Lanes 1, 3, 5, 7, and 9 represent the PCR product. Lanes 2, 4, 6, 8, and 10 contain negative RT-PCR controls, in which reverse transcriptase was omitted to confirm the absence of DNA contamination in the RNA samples. Lane M shows 1-kb DNA markers (New England BioLabs).
Validation of the SfAV-1a bicistronic and tricistronic messages.As noted above, we edited the SfAV-1a genome based on mapped sequence reads in different positions where only 100% mismatches were considered real. Thus, we were interested in further validating the bicistronic and tricistronic message sequences to confirm that these were not artifacts of genome editing. Therefore, five messages identified encoding apoptosis inhibitors or immune evasion proteins were selected for further validation by reverse transcriptase PCR (RT-PCR) followed by using specific primer sets for each mRNA message (Tables 2 and 3). Sequences of the amplified PCR products were then confirmed by Sanger sequencing. A 100% sequence homology was detected in all, and message size was confirmed by gel electrophoresis. No DNA contamination was detected in the negative RT-PCR controls (Fig. 3).
Primers used in this study
Validation of the RNA-Seq data by RT-qPCR analysis.We used RT-quantitative PCR (RT-qPCR) to validate the RNA-Seq data. Three SfAV-1a genes (ORF016, ORF032, and ORF037) were selected randomly for comparison of their transcription patterns at three time points (24, 48, and 96 hpi). The RT-qPCR data for the three genes were in agreement and consistent with the RNA-Seq data as illustrated in Fig. 4.
Comparison of the RNA-Seq profile with the data obtained from RT-qPCR for three viral genes (ORF032, ORF016, and ORF037). The host L8 ribosomal gene was used for normalization of the qRT-PCR data (37). Two technical and three biological replicates were tested for each sample. The 24-hpi sample was used as the reference for fold change calculations at indicated infection time points. The error bars represent the standard deviations between the three biological replicates at each time point.
SfAV-1a promoter consensus sequences.MEME analysis revealed conservation of different consensus motifs in the promoter sequences of several SfAV-1a core and TMD genes representing different temporal gene clusters (Fig. 5). Moreover, screening of the promoter sequences of the bi-/tricistronic messages identified a consensus motif (Fig. 5). We were able to detect the previously identified late consensus sequences TATA box-like motif (TAATTAAA) and ATTTGATCTT in two late genes, namely, ORF22 (RNase III) and ORF41 (MCP), respectively (18, 19). The TATA box like motif is located 16 bp before the ATG start codon, and the ATTTGATCTT consensus sequence is 28 bp before the ORF start codon. The two motifs are shared between ascovirus MCPs and IIV-6 (late genes). Previous molecular phylogenetic studies provide evidence that the ascoviruses evolved from iridoviruses (20). However, we did not find these motifs in other genes, i.e., in the SfAV-1a core and TMD genes we studied.
Conserved promoter motifs in regions 100 bp upstream of the transcription start site for certain SfAV-1a early, late, and very late core genes (A to C), including late transmembrane domain-containing genes, and bicistronic/tricistronic messages (D) using the MEME motif discovery suite (63).
Prediction of putative IRES sequences using IRESPred.Putative internal ribosome entry site (IRES) sequences were identified using the IRESPred server (http://bioinfo.net.in/IRESPred/ ) in the intergenic regions of some bi-/tricistronic messages (Table 2), implying the possible use of noncanonical protein translational mechanisms such as IRESs by SfAV-1a during translation of some of these messages.
DISCUSSION
A potential problem frequently encountered with RNA-Seq in vivo model systems is a low percentage of pathogen transcripts, which makes a full transcriptome study difficult because of the comparatively high levels of host RNAs (21). However, in our in vivo insect larval system we were able to detect strand-specific RNA sequences from most of the viral genes, even those expressed at a low level, making this system appropriate for transcriptome analysis. This is a good indication of the sensitivity of RNA-Seq technology for detection of SfAV-1a RNAs in this system and the possibility of using it in future studies for this virus family, especially given existing limitations in the in vitro systems (10). Moreover, in vivo RNA-Seq provides actual data on which viral genes are essential during host infection, thus leading to a better understanding of how these viruses establish infections in their caterpillar hosts. Several recent studies have confirmed significant differences between the in vitro and in vivo systems in which expression patterns between the same genes may vary as much as a 100-fold (22, 23). We focused our transcription analysis on the core genes, i.e., those conserved in many ascovirus species, and transmembrane domain (TMD)-containing genes, since the latter represent good candidates to be either directly or indirectly associated with ascovirus vesicle formation, the distinguishing feature of ascovirus cytopathology. However, the single-base resolution of the virus transcripts in our analysis revealed the need for SfAV-1a genome reannotation to match the transcription pattern of the virus. Therefore, the sequence and position for genes included in this study were revised first to accurately estimate transcription levels (Table 1).
The temporal cluster analysis revealed a possible division of SfAV-1a gene transcription into three groups, early, late, and very late, a transcription pattern similar to that in other large DNA viruses, such as baculoviruses (24). In the early gene cluster expressed at 6 hpi, three inhibitor of apoptosis (IAP)-like proteins (ORF016, ORF025, and ORF074) with a RING domain containing an E3 ubiquitin ligase near the C terminus was detected in each. IAPs are known to block the apoptosis pathway through caspase degradation (25). Early expression of apoptosis inhibitors was detected in a baculovirus temporal expression analysis, in which the antiapoptosis protein p35 was detected after 6 hpi (24). The early transcription of apoptosis inhibitors can serve as an essential step in success of viral replication since both baculoviruses and ascoviruses are known to induce cell death. Therefore, early expression of proteins such as p35 or IAPs inhibiting or halting apoptosis induction may serve as a general mechanism for infection in DNA viruses. The delay in synthesis of the SfAV-1a executioner caspase (ORF073) and the early IAP-like protein expression indicate coordination between apoptosis inhibition and the induction at specific stages of infection by this virus that ultimately enhances virus replication (Fig. 6). The caspase activity of this virus was studied previously in vitro, where its association with apoptosis induction was confirmed, the synthesis of this enzyme being initiated 9 hpi (7). Differences between in vitro and in vivo viral gene expression have been reported for other viruses, where the host immune system barriers and cell-specific factors are associated with these differences (26). In addition, detection of both a caspase (ORF073) and putative thioredoxin-like protein (ORF116) in the very late stage, where they shared similar transcription patterns at different time points postinfection, is reasonable. Thioredoxin is a small protein that plays a role in regulation of cell redox state. Maintenance of the cell redox state is essential for cell death induction through apoptosis rather than necrosis (27–29). Alternatively, thioredoxin may have an antiapoptotic activity through its reactive oxygen species (ROS) scavenging ability (29, 30). Moreover, detection of a protein with proline dehydrogenase activity (ORF107) highly expressed (>1,000 RPKMs) may indicate association of other apoptosis inducers in ascovirus cytopathology coordinated with caspase activity. Proline dehydrogenase can induce intrinsic apoptosis by catalyzing formation of reactive oxygen species (31).
(A) SfAV-1a caspase (ORF073) very late expression (at 24 hpi) versus early expression (at 6 hpi) of putative inhibitor of apoptosis (IAP-like proteins) ORFs (ORF074, ORF016, and ORF025). (B) SfAV-1a Diedel homolog (ORF121) expression pattern in vivo.
It is important to note here that although the data for the SfAV-1a caspase suggest that it plays an important role in the changes in cell architecture that lead to viral vesicle formation, there is little support that the caspase-like genes that occur in other ascoviruses, such as HvAV-3a and TnAV-2a, yield functional caspases. Moreover, Diadromus pulchellus toursvirus (previously, Diadromus pulchellus ascovirus 4a; DpAV-4a), which recently has been assigned to a new ascovirus genus, Toursvirus (https://talk.ictvonline.org/ ), bears no genes resembling a caspase gene, yet this virus assembles viral vesicles similar to those formed in other ascoviruses (32). In addition, the caspase-like protein synthesized by HvAV-3e does not act as a caspase; instead, RNA interference (RNAi) gene silencing illustrates that this protein is essential for replication (33). Thus, these caspase-like genes are not orthologs of the SfAV-1a caspase, but their maintenance in HvAV-3a and TnAV-2a suggests that they play an important yet largely unknown role in ascovirus replication and changes in cell architecture.
The SfAV-1a transcription of early genes continued at later time points, which is similar to early gene expression in both frog iridovirus 3 (FV3) and Chilo iridescent virus (CIV) (34, 35), implying that complete degradation of early messages is not essential for late expression, as reported for poxviruses (36). Moreover, transcription of viral structural genes usually occurs in late stages of infection, when virion assembly takes place. However, ORF091 shares homology with yabby-like transcription factor, which corresponds to an early gene, based on our analysis. Therefore, it is possibly associated with activation of a set of the virus genes during late stages of infection. The early synthesis of transcription factors that function as components of virion particles has been reported for herpes simplex virus, where the vp16 transcription factor detaches from the virion and is directed to the nucleus to initiate infection. Adenoviruses are also known to use the same mechanism to initiate a cascade of regulated viral gene transcription, a common character of DNA virus transcription (25).
Focusing on the TMD gene expression highlights two points. First, the late and very late expression of the four lipid-modifying enzymes (ORF013, ORF087, ORF093, and ORF112) when vesicles start to accumulate implies that these proteins are associated with vesicle membrane lipid modification or stabilization after apoptosis induction by the caspase, rather than being prerequisite building blocks for construction of the vesicle membrane. Second, the few TMD genes with unknown functions (ORF012, ORF032, ORF045, ORF062, and ORF122) detected in the very late stage represent good candidates for in vivo protein localization during viral infection to check their role in vesicle membrane formation and integrity.
Comparison of the RNA-Seq profiles with data obtained from RT-qPCR for three viral genes reveals good correlation between the two methods and thus supports RPKM calculations in our study. Moreover, this reveals that the Spodoptera frugiperda L8 housekeeping gene (37) can be used as a reference gene in ascovirus transcription studies, since it shows great stability after the infection. Overall, the RT-qPCR is both a sensitive and reliable method for technical validation of RNA-Seq data (38).
Detection of nine putative bicistronic and six tricistronic mRNA transcripts in the SfAV-1a transcriptome ranging in size from 800 to 5,567 bp was surprising given that the majority of large DNA viruses are not known to form polycistronic messages; monocistronic messages are the dominant species. Therefore, we selected five messages for further validation using RT-PCR to rule out any possible sequence artifact after genome editing based on the RNA-Seq reads. Both the message sizes and the Sanger sequences of the RT-PCR products are in complete agreement with the RNA-Seq data. Searching for this phenomenon in other DNA viruses, the presence of longer RNA transcripts that equal two or three times the predicted size of the ORFs was found in Chilo iridescent virus, the virus most closely related to ascoviruses (39). Moreover, the same phenomenon was reported for other large DNA viruses, such as herpes simplex virus 1 (HSV-1) and vaccinia virus (40, 41). Later, HSV was found to use the noncanonical internal ribosome entry site (IRES) strategy in translation (42). Although the recruitment of the ribosome by IRES secondary structures is known for many RNA viruses, it has only rarely been reported for DNA viruses, including herpes simplex virus 1, Kaposi's sarcoma-associated herpesvirus (KSHV), Epstein-Barr virus (EBV), and white spot syndrome virus (WSSV) (42–45). Therefore, given that ascoviruses destroy the nucleus prior to viral vesicle formation, and that eukaryotic cells favor the switch to an IRES mechanism during apoptosis (46, 47), the ability of ascoviruses to use the IRES mechanism may be advantageous to ensure that replication and translation proceed efficiently. Alternatively, because SfAV-1a expresses a FLAP-like endonuclease protein (ORF066) which shares sequence homology with the virion host shutoff proteins found in herpesviruses (48, 49), it is possible that the use of noncanonical translation mechanisms may be advantageous to the virus through suppressing the host cell canonical cap-dependent translation while keeping a high efficiency for viral translation, especially for important genes associated with apoptosis inhibition and immune evasion (Table 2). However, due to the limitation of a tightly synchronized infection in our in vivo system, determination of host RNA decline is difficult to rule out. Therefore, evaluation of this hypothesis may be possible in the future by focusing on a specific infected host tissue, such as the fat body in case of SfAV-1a, to detect this decline.
Overall, although we identified in silico several putative IRES sequences in the intergenic regions (Table 2) for the bicistronic and tricistronic messages using the IRESPred server (50), experimental validation is needed to determine whether these are functional IRES sequences and to rule out the absence of any cryptic promoter activity or ribosome reinitiation. A future experiment that can be used for validation is to clone these sequences in a bicistronic reporter system in which the proteins are easily detected, for example, firefly and Renilla luciferases (51).
Interestingly, two of the tricistronic messages encoded proteins recently identified to function in host immune system suppression, namely, RNase III (ORF022) and a Diedel homolog (ORF121). RNase III represents a conserved protein among 4 different ascoviruses species, SfAV, TnAV, HvAV, and DpAV. RNase III enzymes are known to function in processing different RNAs and gene silencing. In a recent study of HvAV-3e RNase III, differential gene expression in two different tissue cultures demonstrated the important role of this gene in both virus replication and RNAi silencing specifically by targeting small interfering RNA (siRNA) molecule degradation. The RNase III transcript was detected at 16 h postinfection and peaked at 48 h, followed by a significant decrease in the gene expression level (52). The late expression of the RNase III gene is in agreement with what we found, as the first detection of ORF022 was at 12 hpi. However, the expression increased gradually until day 7 postinfection. Suppressors of host RNAi have been reported for other large DNA viruses, for example, iridoviruses and baculoviruses (53, 54). The evolutionary conservation of an RNAi suppressor in all ascovirus species implies exposure of ascoviruses to the host RNAi machinery, and probably the dsRNA, the pathway inducer signal, originates from the bidirectional transcription, the main source of dsRNA in the case of DNA viruses (55–58). This is in agreement with what we found with SfAV-1a transcripts, which overlap in many positions along the genome, potentially generating dsRNA substrates.
In addition, high expression (>1,000 RPKMs) of the Diedel protein in the SfAV-1a transcriptome may add a third host innate immune system suppression strategy explored by ascoviruses besides apoptosis and RNAi suppression. Diedel is a 12-kDa protein recently identified to function as a cytokine that downregulates the immune deficiency (IMD) pathway in flies and prolongs the life span of the fly under viral infection (17, 59). Homologs of die occur in two other families of dsDNA insect viruses, namely, baculoviruses and entomopoxviruses, and in the venom of two wasp species (Leptopilina heterotoma and Leptopilina boulardi). According to Lamiable et al., die is the first insect “virokine” gene, which may well function in host immune system suppression through downregulation of the IMD pathway. The protein is found to be induced by viral and not bacterial infection in Drosophila, in which it attains significantly high levels (460-fold) at 24 hpi, especially during infection by enveloped viruses (Sindbis virus [SINV] and vesicular stomatitis virus [VSV]), with gradual decline at later time points postinfection, which is in agreement with our data (59) (Fig. 6). Moreover, the die mutation was found to reduce the life span of the fly, especially under SINV infection, in comparison to controls. This reduction correlated with high expression of the immune deficiency pathway-regulated genes when the die gene was missing. The SfAV-1a Diedel homolog protein (ORF121) expression rescued the reduced life span phenotype of die mutant flies, at least partially. Therefore, it is expected that ascovirus Die homologs conserved in SfAV-1a and HvAV were acquired during evolution, from either their lepidopteran or wasp host, and used during lepidopteran host infection to either modulate the IMD pathway by functioning as a virokine or prolong the host life span by functioning as a negative regulator of the IMD pathway. Prolonging the host's life span increases the virus replication period and the chance of virions or vesicles being picked up by endoparasitic wasp ovipositors in nature rather than leading to a dead end in the lepidopteran host, since the infection is not known to vertically transfer.
In conclusion, further transcriptional studies of ascoviruses will improve our knowledge about the molecular basis of their unique cell biology. RNA-Seq in vivo is a useful experimental approach to identify new candidate genes associated with pathogenesis that probably would be either undetected or overlooked in an in vitro system.
MATERIALS AND METHODS
Virus strain and host infection.As in previous studies, we used the wild-type strain Sf82–126 isolate of Spodoptera frugiperda ascovirus 1a (SfAV-1a) (32), the type species for the family Ascoviridae. This strain was used to infect 3rd-instar caterpillars of Spodoptera frugiperda, and all transcriptome data were derived from this developmental stage. Larvae were reared on artificial medium (Benzon Research, Carlisle, PA) and kept at room temperature (22°C). An unusual feature of ascoviruses is that feeding larvae virions or viral vesicles results in low infection levels. To mimic natural infection by parasitic wasps, a minutin pin was contaminated with SfAV-1a virions by dipping it in a suspension of viral vesicles (1 × 108/ml), after which the pin was inserted into the larva's abdomen. Progression of the infection was monitored daily by examining hemolymph color changes and using phase-contrast microscopy to follow viral vesicle accumulation in this tissue.
Total RNA isolation.One milliliter of TRIzol reagent (Invitrogen, Life Technologies) per sample was used for isolation of total RNA from 7 time points, starting with uninfected (healthy) larvae and then sampling at 6, 12, 24, 48, 96, and 168 h postinfection (hpi). Total RNA was isolated from 3 larvae at each time point. The entire body of each larva was homogenized separately by mechanical trituration in TRIzol, followed by the addition of chloroform and separation of the RNA in the aqueous phase. The RNA was precipitated with 100% isopropanol and washed twice in 75% ethanol. RNA quantity and quality were determined using a Thermo Scientific NanoDrop ND-2000c spectrophotometer.
Purification and DNase treatment of RNA.An RNA Clean and Concentrator TM-5 kit (ZYMO RESEARCH) accompanied by an in-column DNase digestion using an RNase-Free DNase set (Qiagen) was used for concentrating the RNA and to eliminate contaminating DNA, respectively. After treatment, the RNA quantity was determined as described above.
Enrichment of mRNA, RNA-Seq library preparation, and sequencing.The NEBNext poly(A) mRNA magnetic isolation module kit (New England BioLabs) was used to enrich the mRNA from the DNase-treated total RNA samples. The poly(A) RNA was then eluted from the oligo d(T)25 attached to paramagnetic beads in 15 μl of the first cDNA strand synthesis reaction buffer combined with random primer mix (2×), followed by 10 min of heating at 94°C. The RNA-Seq libraries were then prepared from the isolated mRNA following the manufacturer's instructions of the NEBNext Ultra Directional RNA library preparation kit (New England BioLabs) for Illumina for two biological replicates at each time point. After library preparation, the quality of the pooled libraries was determined using an Agilent 2100 bioanalyzer and the pooled libraries were sequenced using the HiSeq2500 sequencer (Illumina) in the UCR Core Facility in the Institute for Integrative Genome Biology.
SfAV-1a ORFs reannotation based on the RNA-Seq data.Initial mapping of our RNA-Seq reads with the published SfAV-1a genome (11) indicated the presence of a number of mismatches between our data and the reported sequence of SfAV-1a. Thus, we used the RNA-Seq data to reannotate the original genome sequence of SfAV-1a prior to proceeding with the transcriptome analysis. We initially edited the SfAV-1a genome available through the NCBI database (accession number NC_008361.1 ) only at those positions where 100% of our RNA-Seq reads agreed on a change to the published genome, followed by separation of plus- and minus-strand reads to facilitate reannotation of genes in overlapping transcript regions and for accurate determination of transcript start and stop positions. The transcript start and stop positions were manually curated. The ORF-Finder (NCBI) was used for ORF predictions.
Two types of SfAV-1a genes were specifically targeted for further analysis. The first consisted of 44 core genes conserved among two or more ascovirus species, many of them containing conserved domains and motifs identified by BLASTP (Table 1). Two proteins conserved in both SfAV and HvAV and not in TnAV or DpAV were added to this set for either their high expression or their putative function that could be associated with SfAV-1a cytopathology. The second set consists of 26 SfAV-1a proteins identified through in silico analyses as having at least one or as many as seven transmembrane-spanning motifs using the TMHMM server v.2.0 (14, 15) (Table 1).
Bioinformatic analysis, RPKMs, and read mapping.RNA-Seq libraries were first filtered for low-quality reads using the FASTX toolkit (Hannon Lab [http://hannonlab.cshl.edu/fastx_toolkit/index.html ]); adapter sequences were removed using Trimmomatic (60), and ribosomal contaminants were identified and removed (61). These filtered libraries were mapped to the SfAV-1a genome sequence deposited in the NCBI database (accession number NC_008361.1 ) using bowtie2 (62). Following reannotation of the SfAV-1a genome (see above), libraries were remapped to the reannotated genome using bowtie2. Transcript expression was estimated using reads per kilobase per million mapped reads (RPKMs). Because RNA-Seq reads were sequenced using a directional protocol, only reads representing reads in the 5′-to-3′ direction of each annotated transcript were used for RPKM calculations.
Promoter motif signal detection.Genomic sequences 100 bp upstream of the transcription start site of all the ORFs with a defined 5′ end in the 2 gene sets (core and TMD genes) were examined to identify consensus promoter motifs for different stages of infection (early, late, and very late) using the MEME suite (63). In addition, the sequences 100 bp upstream of the transcription start site for each bi-/tricistronic message were screened using the MEME suite for identification of conserved consensus motifs in their promoters. We excluded the promoter sequence for any gene (core or TMD) that is part of a bi-/tricistronic message in analysis of monocistronic early, late, and very late genes.
RT-qPCR validation of the RNA-Seq data.To validate the RNA-Seq data, three viral genes were selected randomly for validation of their RPKM expression levels using RT-qPCR at 3 time points (24, 48, and 96 hpi). One microgram of the same total RNA samples as used in the RNA-Seq library preparation was reverse transcribed from each sample using a Maxima first-strand cDNA synthesis kit for RT-qPCR (Thermo Scientific). Real-time reactions were carried out using the iQ SYBR green Supermix (Bio-Rad) including 5 μl of 1:25 cDNA in each 25-μl reaction volume. The PCR was performed in a CFX Connect real-time PCR detection system (Bio-Rad) using the following program: 95°C for 3 min, 40 cycles of 90°C for 10 s and 60°C for 30 s, followed by 72°C for 30 s. Melting curves were generated for each reaction at the end of the PCR run to confirm the specificity of the PCR. The final concentration used from each primer was 0.5 μM. Primers were designed using the IDT real-time PCR design tool (Table 3). Standard curves were generated for each gene to calculate primer efficiency. The host L8 ribosomal gene was used for normalization since it showed stability in all the selected time points (37). Two technical and three biological replicates were tested for each sample. The third biological replicate total RNA samples were included in this test. The 24-hpi RNA was used as the reference. Each set of reactions included a no-template control and negative RT-PCR control to confirm absence of DNA contamination. The relative quantification was calculated based on the Pfaffl equation (64).
Validation of the bicistronic and tricistronic messages.Three tricistronic and two bicistronic messages out of the 15 identified SfAV-1a bi-/tricistronic mRNAs (Table 2) were further confirmed by using specific primer sets (Table 3) designed to amplify these messages from the 24-hpi DNase-treated total RNA sample after reverse transcriptase PCR using the following program: 94°C for 4 min followed by 30 cycles of 94°C for 20 s and annealing at 58 to 60°C for 20 s and extension at 72°C for 3 min, with a final extension at 72°C for 5 min. The Maxima first-strand cDNA synthesis kit for RT-qPCR (Thermo Scientific) was used for the reverse transcription reaction, and the PCR PreMix tubes (Bioneer) were used for the specific primer amplification reaction. The PCR product size of each reaction and its negative RT control was confirmed on a 1% agarose gel using Tris-borate-EDTA (TBE) running buffer (Fisher Scientific). All the PCR products were sequenced using Sanger sequencing.
IRESPred for IRES prediction in intergenic regions.We used the IRESPred web server, available at http://bioinfo.net.in/IRESPred/ , for screening of putative internal ribosome entry site (IRES) sequences in the intergenic regions identified in the bicistronic and tricistronic messages (Table 2). IRESPred is designed to predict both cellular and viral IRES structures. However, we were not able to test intergenic regions less than 15 bp in size because of program restrictions (50).
Accession number(s).All the RNA-Seq data obtained in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) and can be accessed through the GEO series accession number GSE98382 .
ACKNOWLEDGMENTS
Heba Zaghloul was funded by the Fulbright Program, sponsored by the U.S. Department of State's Bureau of Educational and Cultural Affairs and administered by AMIDEAST in Egypt, her home country.
We thank Yves Bigot for constructive critical comments on our study.
FOOTNOTES
- Received 30 May 2017.
- Accepted 31 August 2017.
- Accepted manuscript posted online 27 September 2017.
- Copyright © 2017 American Society for Microbiology.