Horizontal Transfer of a Retrotransposon from the Rice Planthopper to the Genome of an Insect DNA Virus

This study provides an example of the horizontal transfer event from a rice planthopper genome to an IIV-6 genome. A small region of the IIV-6 genome (∼300 nt) is highly homologous to the sequences presented in high copy numbers of three rice planthopper genomes that are related to the SINEs, a class of retroposons. The expression of these planthopper SINE-like sequences was confirmed, and corresponding Piwi-interacting RNA-like small RNAs were identified and comprehensively characterized. Phylogenetic analysis suggests that the giant invertebrate iridovirus IIV-6 obtains this SINE-related sequence from Sogatella furcifera through a horizontal transfer event in the past. To the best of our knowledge, this is the first report of a horizontal transfer event between a planthopper and a giant DNA virus and also is the first evidence for the eukaryotic origin of genetic material in iridoviruses.

H orizontal transfer (HT) of genetic material has been increasingly discovered between different viruses and their eukaryotic hosts, and it shapes the evolution of the viruses and their hosts (1). During long evolution of the virus-host relationship, HT events can occur in two opposite ways: from host to virus or from virus to host. For host-to-virus HT, the viral genome can acquire various host genes, such as ubiquitin (2), chloroplast protein (3), and heat shock protein (4), during evolution. Giant viruses or nucleocytoplasmic large DNA viruses have a very large linear or circular genomic double-stranded DNA (dsDNA) molecule between 100 kb (such as some phycodnaviruses and iridoviruses) and 2.5 Mb (such as pandoraviruses). It has been reported that giant viruses contain high proportions (at least 10%) of host-derived genes, and some of these genes are key factors for viral pathogenesis (5)(6)(7). For the virus-to-host direction, viruses hijack many of the host cellular functions to facilitate their own replication, and the sequences of many viruses have occasionally been integrated into host chromosomes during these interactions, a process called endogenization (8). These integrated viral sequences, which may be whole or partial, are referred to as endogenous viral elements (EVEs) (9). With the sequencing of many eukaryotic genomes and advances in bioinformatics, many EVEs derived from retroviral or nonretroviral viruses have been discovered in a variety of eukaryotes (10). Since EVEs are integrated into the germ line and are vertically inherited in their hosts, they serve as viral imprints (fossils) and provide unprecedented opportunities to explore the evolution of viruses and their interactions with various hosts (8). Recent studies have also shown that EVEs derived from nonretroviral viruses can act as templates for the production of PIWI-interacting RNAs (piRNAs; 24 to 32 nucleotides [nt] in length), a small RNA class that was associated with Piwi-subfamily proteins, which might play essential roles in antiviral immunity of the mosquito Aedes aegypti, thereby providing a memory reservoir of past immunity events (9,11,12).
Besides HT events between different viruses and their eukaryotic hosts, eukaryoteto-eukaryote HT are also prevalent in nature (13). Recent studies indicated that most of the eukaryote-to-eukaryote HTs are related to transposable elements (TE), and viruses are major vectors of HT between eukaryotes (14). Piskurek et al. (15) reported that poxviruses (family Poxviridae) are possible vectors for HT of retroposons (a class of non-long terminal repeat [LTR] retrotransposon, subfamilies of short interspersed elements, or SINEs) from reptiles to mammals. Another example is that baculovirus (Autographa californica multiple nucleopolyhedrovirus, family Baculoviridae) infection facilitates HT of two transposable elements from cabbage looper (Trichoplusia ni) between several sympatric moth species (16). With the large amounts of new genomes and short read archives deposited in public databases, more virus-mediated eukaryoteto-eukaryote HT will no doubt be revealed and contribute to our understanding of mechanisms underlying HT between eukaryotes.
The small brown planthopper (SBPH; Laodelphax striatellus), brown planthopper (BPH; Nilaparvata lugens), and white-backed planthopper (WBPH; Sogatella furcifera), generally called rice planthoppers, belong to family Delphacidae (order Hemiptera) and are three of the most destructive insect pests of rice in tropical and temperate regions of Asia (17). In addition to direct feeding damage, they act as efficient vectors of plant viruses and phytoplasmas, including at least 18 important phytopathogenic rice viruses, some of which replicate in their vector as well as in the host plant, such as Rice black-streaked dwarf virus (RBSDV, a reovirus) and Rice stripe tenuivirus (RSV, a tenuivirus) for L. striatellus (18,19), Rice ragged stunt virus (a reovirus) and Rice grassy stunt virus (a tenuivirus) for N. lugens (20), and Southern rice black-streaked dwarf virus (SRBSDV, a reovirus) for S. furcifera (21). Insect-specific viruses are also commonly reported in rice planthoppers, including Himetobi P virus (HiPV), a picorna-like virus that infects the three rice planthoppers asymptomatically with high frequency (22,23). There has been little reported work on HT in rice planthoppers, except for the identification of nudivirus (family Nudiviridae, closely related to polydnavirus)-like sequences in the N. lugens genome. Nudivirus sequences were widely found in the scaffolds or contigs of the N. lugens genome, and these viral sequences were reported to be expressed in different tissues of the insect. However, although the rod-shaped nudivirus virions were not detected in various insect tissues by electron microscopy, the current evidence does not rule out the possibility that these integrated viral sequences are free virus in N. lugens rather than ancient viral relics (24).
Chilo iridescent virus is classified as Invertebrate iridescent virus 6 (IIV-6), the type species of the genus Iridovirus, family Iridoviridae (25). It was originally isolated from diseased larvae of the rice stem borer (Chilo suppressalis) and has been used as the standard model for studies on invertebrate iridoviruses (26,27). Although IIV-6 can infect more than 100 insect species belonging to at least six orders, including Hemiptera (leafhoppers) (27,28), it has never been reported to infect planthoppers. Because the virus causes limited mortality to insects and has a large genome, it has received little research attention (29). Its dsDNA genome has 212,482 bp and contains 468 open reading frames (ORFs) (30,31). Although the viruses in the family Iridoviridae have relatively large genome sizes, iridoviruses seem to be less prone to lateral gene exchange with their host than other giant viruses, such as poxviruses (family Poxviridae) and a marseillevirus (family Marseilleviridae) (6). In addition, eukaryotic class II DNA transposons (miniature inverted-repeat transposable elements, or MITEs) were recently identified in the genomes of iridoviruses (Invertebrate iridescent virus 9, IIV-9, and Invertebrate iridescent virus 22,, indicating that these viruses act as vectors for HT of transposable elements between host species (32). Nevertheless, the origins of these transposons in the genome of iridoviruses are still unclear.
In this study, potential HT events of genetic material between three rice planthoppers and virus genomes were investigated. Interestingly, a small region of the IIV-6 genome (ϳ300 nt) is highly homologous to the sequences present in high copy numbers in rice planthopper genomes that have a sequence relatedness to SINE retroposons. Phylogenic analysis indicated that this SINE-like element is transferred from the planthopper to the IIV-6 genome in the past after the evolutionary divergence of the three rice planthoppers.

RESULTS AND DISCUSSION
Identification of VLSs in the genomes of three rice planthoppers. The availability of recently published genomes of L. striatellus, N. lugens, and S. furcifera provides resources to identify virus-like sequences (VLSs) in rice planthoppers (33)(34)(35). By homology search using planthopper genomes to NCBI virus RefSeqs, 1,699, 5,422, and 4,038 VLSs were discovered in the genomes of L. striatellus, N. lugens, and S. furcifera, respectively (see File S1 in the supplemental material). Interestingly, all identified VLSs were homologous to viruses that have never been reported to infect planthoppers, and none of these viruses were from known planthopper-transmitted rice viruses (such as RSV and RBSDV) or insect-specific viruses (such as HiPV). This contrasts with recent results showing that the genome of mosquitos (major vectors of flaviviruses such as yellow fever virus and dengue virus) contains endogenous flaviviral elements (36)(37)(38). Although the VLSs that we identified are similar to those of viruses that are not known to infect rice planthoppers, they might have infected planthoppers in the past and provide persistent viral fossil evidence in the host genome.
Iridovirus-like sequence that is homologous to the sequences with high copy numbers in rice planthopper genomes. Intriguingly, we found that the vast majority of VLSs in planthoppers were homologous to a region in the IIV-6 (an iridovirus) genome. The percentages of VLSs that are homologous to the IIV-6 sequence were 97.76%, 92.23%, and 98.41% in L. striatellus, N. lugens, and S. furcifera, respectively. The genomes of the three planthoppers next were searched against the IIV-6 genome (NC_003038.1) to confirm the presence of VLSs that are homologous to IIV-6 (File S2). Our results indicated that 1.54% of L. striatellus contigs (587/38,193), 5.34% of N. lugens scaffolds (2,485/46,559), and 4.85% of S. furcifera scaffolds (991/20,450) contain at least one sequence homologous to IIV-6 with significant matches ( Table 1). The top 20 contigs/scaffolds that contain the highest numbers of homologous sequences in the three rice planthoppers are shown in Fig. 1A. IIV-6 has a large genome, and the first (so far the only) complete genome was sequenced in 2001 (30). It is 212,482 bp long and has 468 predicted ORFs (30). Surprisingly, mapping results indicated that all of the discovered homologous sequences (1,686 for L. striatellus, 5,031 for N. lugens, and 3,986 for S. furcifera), except one of N. lugens in scaffold 137, mapped to a short region (ϳ300 nt) from nt 157,843 to 158,142 nt of the viral genome that covered most regions of ORF 353L, the intergenic region, and parts of ORF 354L (here this region is referred to as IIV6_300) (Fig. 1B). ORFs 353L and 354L are both on the complimentary strand of the IIV-6 genome; ORF 354L encodes a protein with a predicted L-lactate dehydrogenase active site domain, while the function of 353L is currently unknown (30). The majority of the homologous sequences were only 100 to ϳ200 bp long, and their integrations are almost equal in both directions (Table 1 and Fig. 1B). To experimentally validate the presence of the homologous sequences, five sequences from different contigs/scaffolds (approximately 700 bp) of each of the three planthoppers were  randomly selected and amplified by PCR. Amplification products with the expected sizes were obtained from all of the selected contigs/scaffolds, and Sanger sequencing of the purified DNA products confirmed their identity (Fig. 1C). Although IIV-6 has a broad host range and can infect more than 100 insect species (27), to the best of our knowledge, this is the first report of an HT event of the genetic material between IIV-6 and a eukaryotic host. IIV6_300 sequence is a predicted transposable element of rice planthoppers. Transposable elements are the major components of eukaryotic genomes and account for approximately 25.7%, 38.9%, and 32.6% of sequences in the genomes of L. striatellus, N. lugens, and S. furcifera, respectively (33)(34)(35). Transposable elements are pieces of DNA that are able to jump from one locus to another in the genome of their host, and the majority of HT events reported until now are the transfers of transposable elements (14). Due to the high copy numbers of the sequences that are homologous to the IIV6_300 sequence in the planthopper genomes, these sequences, including a 500-nt extension in both 5= and 3= termini, were analyzed for the presence of transposable element motifs using CENSOR (39). The analysis indicated that they contain the conserved SINE3-1_TC motif, which is also present in the IIV6_300 sequence (Fig. 1B). Thus, they may be short interspersed nuclear elements (SINEs), which is a class of non-LTR retrotransposon (retroposon) present in various eukaryotic genomes. Note that we did not find the rice planthopper SINE-like sequences (RPSlSs) in the genome of the rice stem borer, the known host of IIV-6. Taken together, these observations suggest that IIV-6 probably obtained a transposable element from a planthopper through an HT event. In the case of other iridoviruses, IIV-9 and IIV-22 were predicted to contain eukaryotic DNA transposon MITEs, which might result from HT (32), but the origins of the predicted eukaryotic MITEs are still unclear.
Transcription and integration profile of RPSlSs in rice planthoppers. The complete genome of IIV-6 was used as a database and searched with the newly reassembled transcriptomes of the three planthoppers. A total of 19, 24, and 178 planthopper transcripts containing RPSlSs were found in L. striatellus, N. lugens, and S. furcifera, respectively, indicating that some of the RPSlSs are transcribed in planthoppers (Tables  2 and 3 and Table S1). As shown in Fig. 2, some RPSlSs were distributed in the transcribed regions of planthopper genes with various predicted functions, such as glycine hydroxymethyltransferase and ubiquitin-conjugating enzyme in L. striatellus, methyltransferase and electron transfer flavoprotein in N. lugens, and tyrosine-protein kinase and glucose dehydrogenase in S. furcifera. Planthopper transcripts contain RPSlSs derived from both strands (Fig. 2). In addition, five RPSlSs from each planthopper were randomly selected and analyzed by reverse transcription-PCR (RT-PCR) (Fig. 3A), followed by Sanger sequencing. The positions of the primer sets are indicated by red arrows below the transcripts (Fig. 2). The result confirmed that RPSlSs are indeed expressed in planthoppers rather than contaminant sequences from incidental exogenous sources.
Notably, none of the RPSlSs were integrated into the coding regions of predicted planthopper genes (Fig. 2). This may be because the disruption of the coding genes leads to detrimental effects on the insects. A previous study showed that transposable elements in the genome can be expressed at low levels and can play important roles in the regulation of gene expression (40,41). Whether the RPSlSs inserted into planthopper genomes have similar transposon-like functions as the regulators of gene expression in rice planthoppers needs further investigation.
To investigate the expression profile of RPSlS loci at different planthopper developmental stages, seven RPSlSs of N. lugens were selected for RT-quantitative PCR (qPCR) analysis. There were relatively low expression levels in eggs or first-instar nymphs (except transcript TCONS_00024158) and markedly high expression in late-instar nymphs and adults (Fig. 3B). This result shows that RPSlSs containing transcriptions are differently regulated during the different developmental stages of N. lugens.    Characteristics of RPSlS-derived small RNAs. The canonical function of the piRNA pathway is in defense against transposable elements and to protect the integrity of the genome in both germ line and gonadal somatic cells of animal species (42). Recent results in mosquitos suggest that piRNAs can also be produced by endogenous flaviviral elements and play a role in insect antiviral immunity (12,38). Thus, it is interesting to investigate whether RPSlS loci produce small RNAs. Nine publicly available small RNA libraries of three planthoppers were mapped to the complete genome of IIV-6 (NC_003038.1). Of the small RNA reads that mapped to the IIV-6 genome, 70.5% to 93.2% of the unique small RNA reads and 64.8% to 96.8% of redundant reads mapped to the IIV6_300 sequence, which indicates the accumulation of small RNAs derived from RPSlS loci (Table 4).
More RPSlS-derived small RNAs were identified in S. furcifera than in the other two planthoppers (Table 4), perhaps because of the closer relationship of RPSlSs in S. furcifera with the reference exogenous IIV-6 (see Fig. 5). Since there are some sequence variations among RPSlSs from the three rice planthoppers and exogenous IIV-6, and for a better understanding of the production of RPSlS-derived small RNAs, small RNA libraries of LS_VF (L. striatellus), NL_CX (N. lugens), and SF_VF (S. furcifera) were further mapped to three randomly selected RPSlS-containing transcripts from corresponding planthoppers. As expected, more small RNAs derived from RPSlS loci were identified by this method (Table 4). Evidently, small RNAs were specifically mapped to RPSlS regions except for the TCONS_00020430 transcript (Fig. 4A). Obvious small RNA hotspots were observed, and these were usually identified in both strands (Fig. 4A). Interestingly, RPSlS-derived small RNAs are predominantly 26 to 28 nt, followed by a 21-to 23-nt peak, although TCONS_00020430 has a clear 22-nt peak (Fig. 4B). However, a 26-to 28-nt small RNA peak was observed in TCONS_00020430 if only the RPSlS region of the transcript was mapped (data not shown), suggesting that the abundant small RNAs with a length of 21 to 23 nt are mainly derived from different regions of the transcript.
The production of piRNAs (a class of the small RNAs) from endogenous viral elements was recently reported from mosquitos; these were antisense strand and could target cognate viral RNA (11,12,38). Previous studies indicated that the piRNA pathway plays an essential role in antiviral defense of mosquitos but not of other insects, such as a fly (Drosophila spp.) (43). Another study demonstrated that exogenous IIV-6, as a dsDNA virus, triggers an RNA interference-based antiviral defense mechanism in Drosophila with the generation of virus-derived small interfering RNA in a DICER2 (RNase III enzyme)-dependent manner (44). From our results, small RNAs derived from RPSlS loci were predominantly 26 to 28 nt long, which is a typical characteristic of piRNAs (24 to 32 nt) (45). We therefore extracted RPSlS-derived small RNAs with lengths of 26 to 28 nt for further sequence logo analysis (https://weblogo.berkeley .edu/logo.cgi). However, this analysis did not identify another typical characteristic of piRNAs, namely, a strong U bias at the 5= terminus or enrichment of A at nt 10 (42 and data not shown). It is remains unclear whether RPSlS-derived small RNAs function in the piRNA pathway against transposons. It will also be interesting to further investigate whether RPSlS-derived small RNAs could mediate antiviral defense against IIV-6 infection.
Phylogenetic relationship of IIV6_300 and RPSlSs. A phylogenetic tree was constructed based on the RPSlSs using the maximum likelihood method. Evidently, RPSlSs were grouped according to the insect species with strong bootstrap support (Fig. 5). IIV6_300 sequence is clustered with RPSlSs of S. furcifera, indicating that IIV-6 obtained a SINE-like transposable element from S. furcifera in the past after the evolutionary divergence of the three rice planthoppers. Note that we could not find any homologous sequence to RPSlSs in other viruses deposited in the public database. Considering that IIV-6 is a giant DNA virus that commonly obtains genetic material from the host, it is very likely that the transposable element is transferred from a planthopper host to the IIV-6 genome. It will be interesting to investigate the possible HT of RPSlSs between eukaryotic organisms involving virus vectors, as recently reported for other viruses and hosts (14)(15)(16).
In conclusion, our investigation on possible occurrences of HT between rice planthoppers and viruses leads to the finding of newly identified retroposon-like elements that transfer to an iridovirus. To the best of our knowledge, this is the first report of a potential HT event between a planthopper and a giant DNA virus and also the first evidence for the eukaryotic origin of genetic material in iridoviruses. The results of this study will further contribute to our understanding of HT events between viruses and their eukaryotic hosts.

MATERIALS AND METHODS
Insect cultures. Populations of three planthoppers (L. striatellus, N. lugens, and S. furcifera) that were not carrying the known rice viruses were reared on susceptible rice seedlings (cv. Wuyujing no. 3) in climate-controlled rooms at 26°C Ϯ 1°C, with a photoperiod of 16 h of light and 8 h of darkness and 70% Ϯ 10% relative humidity.
VLSs in three rice planthopper genomes. The assembled genomes of L. striatellus, N. lugens, and S. furcifera were retrieved from Gigadb and the NCBI reference genome database (33)(34)(35). These genomes were searched against NCBI virus RefSeqs (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral) using a BLASTN algorithm with a cutoff E value of Յ10 Ϫ5 . The detected virus-like sequences (VLSs) are listed in File S1 in the supplemental material. Since most of the planthopper VLSs (Ͼ90%) were mapped to a restricted region (ϳ300 nt) of IIV-6 (IIV6_300), the three planthopper genomes were then searched directly against the IIV-6 genome (NC_003038.1) to identify the IIV-6-like nucleotide sequences (sequences homologous to IIV6_300) in planthoppers. BLAST results are listed in File S2. In addition, contig/scaffold regions of planthoppers that mapped to IIV-6 were further extracted and extended 500 bases at both 5= and 3= termini (to the end of the termini) and used for the identification of potential transposable elements with CENSOR (https://www.girinst.org/censor/index.php).
IIV6_300-like sequences containing transcripts identified from reassembled rice planthopper transcriptomes. Transcriptome raw data were downloaded from the NCBI Sequence Read Archive (SRA)  (Continued on next page) database for L. striatellus (SRX2013762), N. lugens (SRX023419), and S. furcifera (SRX104935). The filtered transcriptome raw reads were then aligned against their corresponding genomes using Tophat2 (http:// ccb.jhu.edu/software/tophat/index.shtml) and reassembled using Cufflinks (http://cole-trapnell-lab .github.io/cufflinks/). The newly reassembled transcripts of the three planthoppers are available upon request. The assembled transcriptomes were also searched again against the IIV-6 genome (NC_003038.1) using BLASTN (E value of Յ10 Ϫ5 ) to identify the transcripts containing the IIV6_300-like sequence (rice planthopper SINE-like sequences, or RPSlSs). The identified planthopper transcripts were then searched against NCBI NR (NCBI nonredundant protein sequences) and NT (nucleotide sequences) databases for annotation. The results are listed in Table 2 (L. striatellus), Table 3 (N. lugens), and Table S1 (S. furcifera). Furthermore, to determine the accurate location of the RPSlS within the planthopper transcripts and genome, the planthopper transcripts containing RPSlSs were used as a query to search against the genome of the three planthoppers using BLASTN (E value of Յ10 Ϫ10 ), and the results are available upon request. Detection of planthopper scaffolds/contigs containing RPSlSs. Genomic DNAs were extracted from the three planthoppers using an insect DNA extraction kit (Omega, USA) following the manufacturer's instructions. Five scaffold/contig sequences (partial, ϳ500 to ϳ700 bp, containing RPSISs) from each planthopper were randomly selected to verify the presence of RPSISs. The PCR products of each sample were purified, ligated into the pMD18-T vector (TaKaRa, China), and sequenced (Tsingke, China). The primer sets used for genome amplification are listed in Table 5.
Detection of planthopper transcripts containing RPSIS. Total RNAs were extracted from the three planthoppers using TRIzol reagent (Invitrogen, USA). The purified RNAs were mixed with genomic DNA remover (Toyobo, Japan) and used for RT-PCR. cDNA was synthesized using HiScript II reverse transcription (Vazyme, China) according to the manufacturer's instructions. Five partial transcripts (approximately 500 bases) containing RPSISs from each planthopper were randomly selected to confirm the expression of RPSISs. The PCR products of each sample were also sequenced as described above. The positions of the primer sets used to amplify the transcripts are shown by red arrows in Fig. 2, and the primer sequences are listed in Table 5.
Expression analysis of RPSISs containing RNAs in N. lugens. To determine the expression of RPSISs containing transcripts in N. lugens at different developmental stages, samples from eggs, first-instar nymphs, second-and third-instar nymphs, fourth-and fifth-instar nymphs, and female and male adults were collected for RNA extraction. Equal quantities of total RNA from each sample were used for cDNA synthesis, as described earlier. Primer sets specific for the seven transcripts containing RPSIS were used for RT-qPCR using the 18S rRNA of N. lugens as an internal reference gene. The primer sequences are listed in Table 5. Three independent biological replicates were used in this experiment.
For small RNA bioinformatics analysis, preliminary treatment of the raw data was performed as described previously (47). In brief, small RNAs with lengths of 18 to 30 nt were extracted and collapsed for downstream analysis after 3= adaptor removal and treatment of low-quality and junk sequences. The treated small RNAs of each library were mapped to the IIV-6 genome (NC_003038.1) using Bowtie software (http://bowtie-bio.sourceforge.net/index.shtml), allowing for one mismatch to identify RPSISderived small RNAs. In addition, to confirm the presence of RPSIS small RNA within planthopper transcripts, three planthopper small RNA libraries (LS_VF, NL_CX, and SF_VF) were mapped to three randomly selected transcripts (containing RPSISs) from each planthopper. The subsequent analyses were performed using custom Perl scripts and Linux bash scripts.
Phylogenetic analysis of RPSISs. Relatively long RPSISs (the L strand) from each planthopper were selected and aligned in ClustalW implemented in MEGA (version 6) (48), followed by manual editing. Planthopper sequences mapped to a region from nt 158120 to 157874 of the IIV-6 genome (the region of ORFs 353L and 354L of IIV6_300) were used for phylogenetic analysis considering the length concordant to the aligned RPSISs. The only one exogenous IIV-6 (IIV6_300) with the corresponding range and orientation available at present was included in this analysis. Phylogenetic analysis was carried out  using MEGA 6, and the tree was generated using the maximum likelihood algorithm (1,000 bootstrap replications) (48).