Journal of Virology, March 2004, p. 2967-2978, Vol. 78, No. 6
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.6.2967-2978.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Laboratorio de Bioinformatica,2 Departamento de Bioquimica, Instituto de Quimica,1 Faculdade de Medicina Veterinária e Zootecnia,7 Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, 05508-900 São Paulo,9 Departamento de Parasitologia, Instituto Adolfo Lutz, 01246-902 São Paulo,4 Laboratório de Parasitologia,5 Centro de Biotecnologia, Instituto Butantan, 05503-900 São Paulo,12 Laboratory of Neurosciences (LIM27), Instituto de Psiquiatria, HCFM, Universidade de São Paulo, 05403-010 São Paulo,11 Departamento de Bioquímica e Imunologia, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, 14049-900 Ribeirão Preto,6 Instituto de Computacao, Universidade Estadual de Campinas, 13084-971 Campinas, São Paulo, Brazil,10 Department of Pediatrics and Departments of Biochemistry, Orthopedics, Physiology and Biophysics, University of Iowa, Iowa City, Iowa 52242,3 Department of Biology, University of York, York YO10 5YW, United Kingdom8
Received 25 September 2003/ Accepted 2 December 2003
The LTR class of retrotransposons integrates into the genome by means of an integrase with a high degree of sequence specificity (52). It has been generally accepted that LTR retrotransposons can be divided into two major groups, Ty1/copia and Gypsy/Ty3, but the existence of a third group, the BEL (or Pao-like) group, has been proposed (1, 9, 41). Usually LTR retrotransposons have one or two open reading frames (ORFs) with products that show similarities to retroviral Gag and Pol polypeptides. However, some invertebrate retrotransposons, such as Gypsy, have been shown to possess an additional ORF with properties analogous to those of retroviral env, conferring on them the ability to infect other cells (30).
The non-LTR class of retrotransposons, in contrast, integrates into the genome by a mechanism in which an endonuclease nicks the chromosome and DNA synthesis is initiated using the 3' hydroxyl of the broken strand of target DNA as the primer for reverse transcription (37). Non-LTR retrotransposons comprise one or two ORFs, and only the RT domain is common to all elements. Phylogenetic analysis of this domain has allowed the non-LTR transposons to be classified into 11 clades (40). Other characteristic domains present in some elements are an apurinic or apyrimidinic endonuclease, an RNase H, and a putative nucleic acid binding motif.
Schistosoma mansoni, a digenetic blood fluke, is the primary causative agent of schistosomiasis in humans and an important source of morbidity on a global scale. The disease is endemic in 74 developing countries, infecting about 200 million individuals, and it is estimated that an additional 500 to 600 million are at risk (59). The Schistosoma genome has approximately 270 Mbp (54), and a considerable portion (more than 20%) is believed to be composed of retrotransposons (31). Four retroelements belonging to LTR and non-LTR classes have been previously characterized for S. mansoni (10, 14, 15). The presence of RT activity in Schistosoma extracts suggests that some of these elements are active (23).
In this work we describe the sequence and structure of full coding regions of three novel LTR retrotransposons, including a member of the BEL family, not previously described in schistosomes and one novel non-LTR retrotransposon. All have high transcriptional activity and have been reconstructed from expressed sequence tag (EST) data generated by the "Schistosoma mansoni EST Genome Project" (http://bioinfo.iq.usp.br/schisto). Acquisition and maintenance of the parasitic way of life requires the ability to evolve rapidly, which could be conferred by high retrotransposon activity. The data presented here double the number of retrotransposons described for S. mansoni, highlight the existence of two populations with distinct features regarding gene number and transcriptional activity, and show examples of retrotransposon fragment inserts in four different S. mansoni target gene transcripts, suggesting an influence of such elements in S. mansoni genome evolution.
Reconstruction of retrotransposon sequences.
EST sequence chromatograms were stored, processed, and trimmed through a Web-based service (48); sequences with at least 100 bp of phred quality 15 or higher (http://www.phrap.org/) were accepted and further evaluated. S. mansoni retrotransposons were filtered by using BLASTN (http://www.ncbi.nlm.nih.gov/BLAST/) analysis with a local copy of the GenBank nucleotide database and the BlastMachine (Paracel, Inc.), and were processed with a fast parser tool (49) to select those that matched known S. mansoni retrotransposon sequences with an E value of ≤10^-15 and had at least 85% identity along at least 75 nucleotides. By comparing with BLASTX against the set of transposon protein sequences from the GenBank nonredundant (NR) protein database and selecting those matching with an E value of ≤10^-4 and at least 30% identity along at least 75 amino acids, we identified further potential novel S. mansoni transposon coding sequences. A total of 10,348 putative transposon EST reads were selected.
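The two selection filters described above can be sketched in a few lines of Python. This is only an illustration of the thresholds, not the parser tool cited in the text (49): the `Hit` record and the function names are hypothetical, and the E-value cutoffs are assumed to be upper bounds.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST alignment (hypothetical record, not a real BLAST API)."""
    read_id: str
    evalue: float
    pct_identity: float
    align_len: int  # nucleotides for BLASTN, amino acids for BLASTX


def passes(hit, max_evalue, min_identity, min_length):
    """True if the hit satisfies all three thresholds."""
    return (hit.evalue <= max_evalue
            and hit.pct_identity >= min_identity
            and hit.align_len >= min_length)


def matches_known_transposon(hit):
    # BLASTN filter: E <= 1e-15, >= 85% identity, >= 75 nucleotides
    return passes(hit, 1e-15, 85.0, 75)


def is_candidate_novel_transposon(hit):
    # BLASTX filter: E <= 1e-4, >= 30% identity, >= 75 amino acids
    return passes(hit, 1e-4, 30.0, 75)
```

A read would be kept if at least one of its hits passes either filter.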
Selected reads were assembled using the Cap3 program (22) to generate the core sequences. Selected core sequences for the novel S. mansoni LTR transposons Saci-1 to -3, the novel non-LTR transposon Perere, and the previously described Boudicca retrotransposon (10) were picked by manual inspection of the longest assembled sequences and were extended by manual curation, using a local copy of BLASTN and the BioEdit program (version 5.0.6) (19) to compare each transposon consensus with the 179,072 EST reads from the project. These full-length assemblies were used as a blueprint for construction of full-length sequences by using a minimum set of EST reads selected from the pool to reconstruct an ORF for each transposon, leaving out truncated copies that generated stops. With the exception of Saci-2, all reconstructed sequences were anchored at the 3' end by a sequence from the directional poly(dT)-primed normalized library (56), in order to avoid problems of artifacts due to incorrect, ambiguous assembly of expressed segments of LTRs. Saci-3 was anchored by a 3'-end sequence from GenBank dbEST (accession number AI018990.1).
Saci-perere is a hero-trickster (50) in the native Tupi Indian mythology of South America, a very short young black boy with a red bonnet that confers his magic powers. He is hyperactive and jumps all over the place on his single leg, haunting people and playing tricks. He lives in whirlwinds and moves very fast, making loud whistling noises and scaring people when they travel alone at night in the dark forests.
Construction of phylogenetic trees. The RT domains of novel and known S. mansoni retrotransposons were aligned with Clustal X (version 1.83) (55). The alignment of the characteristic (Y/F)XDD box was checked to ensure the quality of the alignment. Further analysis with Clustal X by using the neighbor-joining method, excluding positions with gaps, resulted in the phylogenetic trees shown in the figures. The confidence of the branches was evaluated by bootstrap analysis using 1,000 samplings. Phylogenetic trees were drawn using Treeview (version 1.6.6) (47). The GenBank sequences and accession numbers utilized for construction of alignments and phylogenetic trees are as follows: BEL, AAB03640.1; blastopia, CAA81643.1; cer-1, AAA50456.1; Copia, OFFFCP; CsRn, AAK07486.1; Dea1, T07863; Grasshopper, AAA21442.1; Gypsy, GNFFG1; HIV2, AAA76841.2; Kabuki, BAA92689.1; Kamikase, 9757434; Mag, S08405; Maggy, AAA33420.1; Micropia, CAA32198.1; MMTV, GNMVMM; Ninja, T31674; Pao, S33901; Pao (P1), BAA95569; SIV, AAA47606.1; Sushi, AAC33526.2; Ted, AAA92249; Tom, CAA80824.1; Ty1, P47100; Ty3, S53577; Ulysses, CAA39967.1; woot, AAC47271.1; Yoyo, T43046; Zam, CAA04050.1; BRG2, X60372.1; CgT1, AAA85636.1; CR1 Gallus, AAC60281.1; CR1 spixii, BAA88337.1; CRE1, M33009.2; Czar, AAA30239.1; Doc, CAA35587.1; I, AAA70222.1; Ingi, CAA29181.1; Jockey, AAA28675.1; Juan, AAA29354.1; L1Homo, AAC51279.1; L1Mouse, AAC72810.1; L1Rat, AAB41224.1; Lian, AAB65093.1; pido, AY034003.1; Q, AAA53489.1; R1Dros, CAA36227.1; R1Mori, AAC13649.1; R2Bombyx, AAB59214.1; R2dros, CAA36225.1; R2earwig, AAC34906.1; R4, AAA97394.1; RTE1, AAC72298.1; SR1, AAC06263.1; SR2, AAC24982.2; Swimmer, AAD02928.1; T1, AAA29367.1; Tad1, AAA21781.1; Tart, AAC46494.1; Tx1, AAA49976.1.
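The check of the (Y/F)XDD box mentioned at the start of this section can be automated along the following lines. This is a sketch under stated assumptions: the function names and the toy aligned sequences are hypothetical, the box is assumed to contain no gap characters, and the criterion used here (all rows share at least one column where the box starts) is one simple reading of "checked", not necessarily the procedure followed in this work.

```python
import re

# (Y/F)XDD: Tyr or Phe, any residue, then two Asp. The lookahead lets
# overlapping candidates be reported as well.
BOX = re.compile(r"(?=[YF][A-Z]DD)")


def box_columns(aligned):
    """Map each sequence name to the alignment columns where the box starts."""
    return {name: {m.start() for m in BOX.finditer(row)}
            for name, row in aligned.items()}


def box_is_aligned(aligned):
    """True if every aligned row has the box starting in a common column."""
    shared = None
    for columns in box_columns(aligned).values():
        shared = columns if shared is None else shared & columns
    return bool(shared)


# Toy gapped alignment rows (hypothetical, for illustration only).
good = {"a": "--KYVDDIL", "b": "MAQFADDVL", "c": "---YMDDLL"}
bad = {"a": "KYVDDIL--", "b": "MAQFADDVL"}
```

With these toy rows, `box_is_aligned(good)` holds because the box starts at column 3 in every row, whereas `bad` has the box shifted in one row.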
Southern blotting. Twenty-five micrograms of genomic DNA from S. mansoni adult worms was subjected to overnight incubation at 37°C with the EcoRI or BamHI restriction enzyme (New England Biolabs) in the appropriate buffer. Samples were divided into five aliquots of 5 µg each, which were electrophoresed in 0.8% agarose gels in Tris-acetate-EDTA (TAE) at 1 V/cm and immobilized on a Hybond-N+ nylon membrane (Amersham Biosciences) by using the Posiblot apparatus (Stratagene).
Radiolabeled probes for each retrotransposon were generated from 25 ng of fragments of approximately 700 bp labeled with [32P]dCTP by using Rediprime kits (Amersham Biosciences). After labeling, 1 µl of each probe had its radioactivity counted, and the same number of counts was used in all experiments. Overnight hybridization was performed in 6x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate)-0.5% sodium dodecyl sulfate (SDS)-5x Denhardt's solution-25 µg of salmon testis DNA (Sigma)/ml at 68°C. Membranes were washed once with 1x SSC-0.2% SDS and twice with 0.5x SSC-0.2% SDS for 30 min each time at 68°C. Radioactive signals were detected by using a phosphor screen and a Storm apparatus (Molecular Dynamics), and images were processed by using the ImageQuant program (Molecular Dynamics).
PCR of genomic DNA for detection of full-length copies. The following primers flanking the entire coding region of each retrotransposon were designed: Pererefw, GTTTGCGTTACGATCACACG; Perererv, ATTTCCAGTGCCAGAGCAAG; Saci-1fw, TGCCTAACAATCGTGCAAAG; Saci-1rv, GGTTCACCTAATCGCTTTCC; Saci-2fw, GAGGCTTGTGATTCCCACTG; Saci-2rv, ACTGTCCTCAGTGCCTGGTC; Saci-3fw, TTTGGAACACGCAATACAGC; Saci-3rv, CAAGCTCGAACCAAACAAGG.
These primers were used for PCR of S. mansoni genomic DNA by using the Advantage 2 polymerase mix (BD Biosciences) and a cycling program of 95°C for 3 min followed by 40 cycles of 95°C for 30 s, 60°C for 30 s, and 68°C for 5 min in a GeneAmp PCR System 9700 (Applied Biosystems). The ramp of temperature transition between the annealing and extension steps was reduced to 5% of the default speed in order to increase the amount of amplified product.
Aliquots (2 µl) of each reaction product were electrophoresed in 0.8% agarose gels and further immobilized on a Hybond-N+ nylon membrane (Amersham Biosciences) by using the Posiblot apparatus (Stratagene). Hybridization with radioactive probes was performed with the same probes used for Southern blotting and with the same protocol for hybridization and washing.
Estimation of transposon copy numbers in the S. mansoni genome by use of BLASTN. Comparative estimates of copy numbers of retrotransposons in the S. mansoni genome were obtained essentially as described by Copeland et al. (10), by using the local BLASTN program to compare each of the reconstructed full-length transposons with the database of 27,064 bacterial artificial chromosome (BAC) end sequences, obtained by filtering out the duplicated deposits of 42,017 S. mansoni genome survey sequences from GenBank. The count of hits with scores higher than 100 divided by the total length of the query transposon allowed us to calculate a gene index, an estimate of the relative abundance of the gene in the S. mansoni genome. The absolute copy number range was estimated from the gene index by using the copy number range of the Boudicca retrotransposon as a benchmark (10).
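The gene index calculation can be written out as follows. The sketch assumes that the absolute copy number range is obtained by scaling the benchmark range in proportion to the gene index; the text only says that the Boudicca range was used as a benchmark, so the proportional scaling, the function names, and all numbers in the test values are illustrative assumptions.

```python
def gene_index(hit_scores, query_length_bp, min_score=100):
    """Hits scoring above min_score, divided by the query length in bp."""
    n_hits = sum(1 for score in hit_scores if score > min_score)
    return n_hits / query_length_bp


def copy_number_range(index, benchmark_index, benchmark_range):
    """Scale a benchmark copy number range by the ratio of gene indices.

    Proportional scaling is an assumption made for this sketch.
    """
    low, high = benchmark_range
    scale = index / benchmark_index
    return (low * scale, high * scale)
```

For example, an element with half the gene index of the benchmark would be assigned half of the benchmark's lower and upper copy number bounds.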
Estimation of relative transcription rate and activity. Transcription rates for different retrotransposons were calculated by using the local BLASTN program to compare each of the full-length transposons with the database of 179,092 ESTs generated by the Schistosoma mansoni EST Genome Project; hits with matching scores higher than 100 were counted. Counts were divided by the sequence length to correct for differences in transposon size. The resulting number was considered to reflect the overall expression level among the different stages of the S. mansoni life cycle. Finally, each result was divided by the value obtained for SR2, so that rates are expressed relative to SR2.
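The two-step normalization of EST hit counts can be expressed compactly. The function name, the dictionary layout, and the example element name `elementX` are hypothetical; only the arithmetic (hits per base pair, then division by the SR2 value) follows the description above.

```python
def relative_transcription_rates(hit_counts, lengths_bp, reference="SR2"):
    """Hits per bp for each element, expressed relative to the reference."""
    per_bp = {name: hit_counts[name] / lengths_bp[name]
              for name in hit_counts}
    ref = per_bp[reference]
    return {name: value / ref for name, value in per_bp.items()}


# Made-up counts and lengths, for illustration only.
rates = relative_transcription_rates(
    {"SR2": 100, "elementX": 400},
    {"SR2": 1000, "elementX": 2000},
)
```

By construction the reference element has a relative rate of 1, and in this toy input `elementX` has twice as many hits per base pair as the reference.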
A second estimate was obtained by using a SAGE library previously generated by our group from adult worms (56). In silico restriction maps for NlaIII were produced for each retrotransposon, and the 10 bp adjacent to the 3'-most NlaIII site were recorded as the expected SAGE tag for that transposon transcript. This tag sequence was used to search the database of 68,238 SAGE tags, which had been sequenced from S. mansoni adult worm mRNAs, and the number of identical tags (exact matches) was counted.
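The in silico tag prediction can be sketched as below. NlaIII recognizes CATG; the sketch assumes the transcript is supplied as the 5'-to-3' sense strand and that the tag is the 10 bp immediately downstream of the last CATG. The function names and the toy transcript are hypothetical.

```python
def expected_sage_tag(transcript, site="CATG", tag_len=10):
    """Return the tag_len bases following the 3'-most site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the recognition site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_len else None


def count_exact_matches(tag, library_tags):
    """Count library tags identical to the expected tag."""
    return sum(1 for t in library_tags if t == tag)


# Toy transcript with two CATG sites (illustration only).
toy = "AACATGTTTTTTTTTTCATGACGTACGTACGGG"
```

Only the tag after the second (3'-most) site in `toy` is reported, which mirrors how a single expected tag was recorded for each transposon transcript.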
Identification of fragments of novel transposons inserted into known genes of S. mansoni. Sequences of retrotransposons Saci-1, -2, and -3, Perere, Boudicca, and SR2 were used as queries for BLASTN searches of our EST database. Reads with matching E values lower than 0.01 were retrieved and assembled with Cap3. A BLASTX search of the resulting contigs and singlets against the GenBank NR protein database followed by manual inspection allowed the identification of sequences containing an ORF for a known protein besides the segment of retroviral sequence. In order to exclude possible artifact chimeras generated at the vector ligation step of EST production, we considered only those transposon inserts that were confirmed by at least two different EST clones spanning the junction between the transposon and the target gene.
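The rule used to exclude chimeric artifacts (at least two different EST clones spanning the same junction) amounts to a simple grouping step. The data layout, function name, and identifiers below are hypothetical and serve only to illustrate the criterion.

```python
from collections import defaultdict


def confirmed_junctions(observations, min_clones=2):
    """Keep junctions supported by at least min_clones distinct clones.

    observations: iterable of (junction_id, clone_id) pairs, one per EST
    read that spans a transposon/target-gene junction.
    """
    clones_by_junction = defaultdict(set)
    for junction_id, clone_id in observations:
        clones_by_junction[junction_id].add(clone_id)
    return {junction for junction, clones in clones_by_junction.items()
            if len(clones) >= min_clones}


# Two reads from the same clone do not count as independent confirmation.
obs = [
    ("geneA|LTR", "clone1"),
    ("geneA|LTR", "clone1"),
    ("geneB|SR2", "clone2"),
    ("geneB|SR2", "clone3"),
]
```

Here only the `geneB|SR2` junction is retained, because the two reads for `geneA|LTR` come from a single clone.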
Genomic DNA sequences of the new transposons. Preliminary sequence data for the S. mansoni genome were obtained from "The Schistosoma mansoni Genome Project" at The Institute for Genomic Research (TIGR) (http://www.tigr.org) and at The Sanger Institute (ftp://ftp.sanger.ac.uk/pub/databases/Trematode/S.mansoni/). We used these data for a BLASTN search, with the novel reconstructed retrotransposon sequences as queries. Results described refer to the genomic best match for each retrotransposon.
Nucleotide sequence accession numbers. Full-length reconstructed sequences for the four new transposons were deposited in the Third Party Annotation section of the DDBJ/EMBL/GenBank databases under the following TPA accession numbers: Saci-1, BK004068; Saci-2, BK004069; Saci-3, BK004070; Perere, BK004067. The reconstructed sequence for Boudicca (10), for which no full-length sequence was available, was deposited under TPA accession number BK004066. All 7,086 ESTs matching the novel transposons and the known Boudicca and SR2 transposons were deposited in GenBank under accession numbers CF490117 to CF497202.
To determine the ancestral origin of these novel transposons, phylogenetic trees were constructed. Sequences from the RT domains of several members of the LTR group of retrotransposons and some retroviruses were aligned by using the Clustal X program (55), and a phylogenetic tree was constructed by using the neighbor-joining method (Fig. 1). The result clearly distinguished the three different LTR transposon families previously described and permitted classification of the three novel S. mansoni LTR retrotransposons, Saci-1, -2, and -3 (see below). The same approach was used for members of the non-LTR group of retrotransposons, and the resulting phylogenetic tree (Fig. 2) allowed the distinction of all 11 clades previously described, permitting classification of the novel S. mansoni non-LTR transposon Perere as a member of the CR1 family (see below). The latter family includes the previously reported transposon pido of the closely related trematode Schistosoma japonicum. A set of 44,000 ESTs has recently been acquired from adult worms and eggs of S. japonicum (21). A search of that database, by using TBLASTX and sequences from the three novel S. mansoni LTR transposons as queries, showed that very few messages similar to these S. mansoni LTR transposons are expressed in S. japonicum (52 ESTs in all, with a matching cutoff E value of ≤10^-5), and the four transcripts that covered the RT domains were not phylogenetically related to those of the S. mansoni transposons (data not shown).
FIG. 1. Phylogenetic tree for the RT domains of LTR retrotransposons. The tree was constructed by the neighbor-joining method, excluding positions with gaps. Previously described S. mansoni retrotransposons are boldfaced, and the three novel S. mansoni LTR retrotransposons identified in this work are boxed. Numbers represent the confidence of the branches assigned by bootstrap analysis (in 1,000 samplings); bootstrap values lower than 500 are omitted from the figure.
FIG. 2. Phylogenetic tree for the RT domains of non-LTR retrotransposons. The tree was constructed by the neighbor-joining method, excluding positions with gaps. Previously described S. mansoni retrotransposons are boldfaced, and the novel non-LTR S. mansoni retrotransposon identified in this work is boxed. Numbers represent the confidence of the branches assigned by bootstrap analysis (in 1,000 samplings); bootstrap values lower than 500 are omitted from the figure.
FIG. 3. Multiple-sequence alignment of novel S. mansoni LTR retrotransposons. Clustal X alignments for conserved regions of Gag, protease (Pro), RT, RNase H (RH), and integrase (Int) are presented. Arrows above the Cys domain point to cysteines and histidines comprising the motifs; shaded arrows indicate that the amino acid is not present in all sequences in the alignment. In the protease domain, the D(T/S)G motif common to aspartic proteases is boxed. Arrows above RNase H and integrase alignments point to conserved motifs previously described. Regions RT1 to RT7, each containing one of the seven motifs described for RT, are indicated.
Saci-2 has an unusual Cys motif. The Cys motif of the Gag domain of Saci-2 resembles that present in other retroposons (CX2CX4HX4C) such as micropia of Drosophila melanogaster and COS41.3 of Ciona intestinalis, but a lysine replaces the conserved histidine (Fig. 3). This amino acid is confirmed by 22 out of 24 ESTs that cover that region of the retrotransposon, and neither of the other 2 EST sequences codes for a histidine. There is an additional histidine in Saci-2, 1 amino acid distant from the final cysteine of the motif. This creates a new CX2CX9CXH motif, which resembles the motif of the CCCH zinc finger (Zf) protein family (CX8CX5CX3H), shown to bind specific RNAs (32). Further experiments are warranted to determine whether this is a functional domain or simply a nonfunctional degeneration of the CX2CX4HX4C motif.
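The motif arithmetic in this paragraph can be checked with regular expressions. In the shorthand used here, `X` followed by a number means that many arbitrary residues, so replacing the histidine of CX2CX4HX4C with another residue leaves 4 + 1 + 4 = 9 residues between the second and third cysteines. The helper below is a hypothetical convenience function, and the two peptides are toy strings, not Saci-2 sequence.

```python
import re


def motif_to_regex(motif):
    """Translate shorthand such as 'CX2CX4HX4C' into a regex string.

    'X' followed by digits means that many arbitrary residues; a bare 'X'
    means exactly one.
    """
    return re.sub(r"X(\d*)",
                  lambda m: ".{%s}" % (m.group(1) or "1"),
                  motif)


CANONICAL = re.compile(motif_to_regex("CX2CX4HX4C"))
SACI2_LIKE = re.compile(motif_to_regex("CX2CX9CXH"))

# Toy peptides: the second has K in place of the motif histidine and a
# histidine one residue after the final cysteine.
canonical_peptide = "CAACAAAAHAAAAC"
variant_peptide = "CAACAAAAKAAAACAH"
```

The variant peptide no longer matches the canonical pattern but does match the CX2CX9CXH pattern, which is the situation described for Saci-2.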
Saci-3 is closely related to Boudicca and CsRn1. The first ORF of Saci-3 encodes a putative Gag protein presenting a Cys domain with a motif (CX2HX9CX3C) identical to that of S. mansoni Boudicca (10), Clonorchis sinensis CsRn1 (3), and Bombyx mori Kabuki (Fig. 3) and different from the usual CX2CX4HX4C motif seen in most retrotransposon and retrovirus Gags (10, 11). The second ORF has a structure typical of an ORF encoding a Pol polyprotein, with the presence of domains for protease, RT, RNase H, and integrase. The phylogenetic tree constructed from RT domains places Saci-3 in the same branch as Boudicca, CsRn1, and Kabuki, with Boudicca more closely related to CsRn1 than to Saci-3 (Fig. 1). Saci-3, like Boudicca, has a third ORF that does not exhibit identity to other known proteins and may code for an envelope protein, due to its position on the retroviral message. However, it is particularly difficult to characterize envelope proteins, since they present a low degree of similarity to each other (33).
Perere is a member of the CR1 family of non-LTR retrotransposons. Perere has a single ORF coding for a product of 1,227 amino acids, a polyprotein with domains for endonuclease and RT (Fig. 4). The presence of the endonuclease domain suggests that the mechanism of integration of Perere involves the nicking of target DNA. The phylogenetic tree suggests that Perere belongs to the CR1 family of non-LTR retrotransposons (Fig. 2), and apparently it forms a discrete branch with S. japonicum pido and a non-LTR retrotransposon of Caenorhabditis elegans. Such a discrete branch has been described previously (31) but was weakly supported; now, with the addition of Perere, the existence of this branch gains strong bootstrap support.
FIG. 4. Multiple-sequence alignment of the novel S. mansoni non-LTR retrotransposon Perere. Clustal X alignments for conserved regions of the endonuclease (End) and RT domains are presented. Regions RT1 to RT7, each containing one of the seven motifs described for RT, are indicated.
In contrast, for SR2, a reconstruction using genomic clones (accession number AF025672), the ESTs in our data set showed a lower level of identity (91.7% ± 4.6%), suggesting that the genomic clone deposited in GenBank is highly divergent from most of its expressed copies.
Estimates of copy numbers and transcriptional activities of S. mansoni retrotransposons. Computational estimates of copy number showed that Saci-1 and -2 display relatively low copy numbers in the genome of S. mansoni (70 to 850 copies), while Saci-3 and Perere display intermediate copy numbers (150 to 2,500 copies) compared to those of previously described S. mansoni retrotransposons (Table 1). Southern blot experiments with all four new retrotransposons, and with Boudicca as a benchmark, showed that Boudicca exhibits a significantly higher signal, indicating that it is present at a much higher copy number in the genome than the other retrotransposons (Fig. 5A).
TABLE 1. Transcriptional activities of S. mansoni retrotransposons
FIG. 5. Detection of the novel S. mansoni retrotransposons in the parasite's genome. (A) Southern blot of novel retrotransposons. S. mansoni genomic DNA was digested with one of two different restriction enzymes and analyzed by Southern blotting with radiolabeled probes specific for each of the transposons. Fragments of similar sizes and the same number of radioactive counts were used for each of the five transposon probes. The Boudicca retrotransposon (10) was included for comparison, along with the four new retrotransposons Saci-1, -2, and -3 and Perere. (B and C) Detection of full-length copies. (B) Products of PCR from genomic S. mansoni DNA with primers specific for each retrotransposon were subjected to electrophoresis in 0.8% agarose gels, transferred to nylon membranes, and hybridized with the same probes as those used for panel A. (C) A second 0.8% agarose gel with the same products was stained with ethidium bromide to visualize all products, including those not detected by the probes. The expected sizes of amplified products based on the reconstructed full-length sequences were as follows: Perere, 4,593 bp; Saci-1, 5,130 bp; Saci-2, 4,246 bp; Saci-3, 5,003 bp.
The ratio of the relative transcriptional rate (or SAGE tag count) to the copy number allowed us to estimate the relative transcriptional activity, which would reflect the average level of transcription per genomic copy of each retrotransposon. It is noticeable that Saci-1, -2, and -3 and Perere have considerably higher transcriptional activities (1 to 2 orders of magnitude higher) than the other S. mansoni retrotransposons that have been described (Table 1).
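The per-copy activity and the "orders of magnitude" comparison reduce to a ratio and a base-10 logarithm. The sketch below uses a single point estimate for the copy number, which is an assumption (the text reports copy number ranges), and the numbers in the example are invented for illustration.

```python
import math


def transcriptional_activity(relative_rate, copy_number):
    """Average transcription per genomic copy (rate divided by copies)."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive")
    return relative_rate / copy_number


def orders_of_magnitude_apart(activity_a, activity_b):
    """log10 of the ratio between two activities."""
    return math.log10(activity_a / activity_b)


# Invented values: a low-copy element versus a high-copy element.
low_copy = transcriptional_activity(relative_rate=2.0, copy_number=100)
high_copy = transcriptional_activity(relative_rate=1.0, copy_number=5000)
```

With these invented inputs the low-copy element is about two orders of magnitude more active per copy than the high-copy element.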
The frequency of sequenced transposon transcripts for each of the life cycle stages was determined (Table 2). In cercariae, the frequency of novel transposon ESTs was 11.2% of all transcripts sequenced, and that of the known transposons Boudicca and SR2 was 3.2% (Table 2). For all types of transposons taken together, the frequency of transposon ESTs in cercariae was twofold higher than that in schistosomula and three- to fourfold higher than those in adults, eggs, miracidia, and germ balls.
TABLE 2. Frequency of sequenced transposon transcripts in life cycle stages
FIG. 6. Genomic sequences of the novel transposons. Shown is a schematic representation of the alignment of genomic clones. Assembled BAC sequences were retrieved from the TIGR and Sanger S. mansoni genome sequencing projects, and BAC end sequences were obtained from GenBank. Numbers above the diagrams represent the position (in base pairs) in each clone. Shaded areas, LTR sequences. The hatched area in the Saci-1 alignment represents a portion of the genomic clone LTR that had no similarity to the reconstructed sequence or to any EST from our data set, and the solid area represents a gap in the sequence to allow for alignment. The GenBank accession numbers of BAC end sequences are given in open rectangles above the regions of Saci-2 to which they are aligned.
Further confirmation of the existence of genomic full-length copies for each retrotransposon was obtained by PCR of S. mansoni genomic DNA using primers designed to flank the coding region of each retrotransposon (Fig. 5B and C). We were able to amplify a fragment of approximately the size expected for a full-length copy for each of the reconstructed retrotransposons (Fig. 5B), as detected by hybridization with the same radioactive probes used in Southern blot experiments. In addition, Perere and Saci-1 presented a few other copies with lower molecular weights, as recognized by the radioactive probes, which must correspond to truncated copies. Other amplicons of lower molecular weights, which were not recognized by the radioactive probes, were visualized by ethidium bromide staining (Fig. 5C) and possibly represent truncated copies of retrotransposons lacking the probe region.
Identification of DNA fragments of novel transposons inserted into known genes of S. mansoni. We were able to map fragments of retrotransposon sequences to the untranslated regions (UTR) of four different S. mansoni gene transcripts (Table 3). It is noteworthy that in all cases of LTR transposon insertion, the inserted fragments were from the LTR region and were always located in the 5' UTR of the target gene. In contrast, the insertion of SR2 was only a few bases away from the 3' end of the sequence in both target genes. Additionally, all the retrotransposon inserts were found in the same strand as the ORFs of the target genes. Taken together, the above data may indicate that either the LTR region or the polyadenylation site of the retrotransposons may actually be used as an alternative promoter or alternative polyadenylation site for the target genes, as described for other organisms (4, 38, 43).
TABLE 3. Mapping of transposon inserts into S. mansoni target genes
Curation of the S. mansoni EST database. With the sequences used in this work, we were able to reannotate as retrotransposons 2,391 reads that have not been previously identified as such (56). These reads clustered into 452 S. mansoni assembled EST sequences (SmAEs) (56) and are accessible at the project website (http://bioinfo.iq.usp.br/schisto); they represent 1.5% of the 30,988 unique SmAE sequences previously reported (56).
Saci-1, -2, and -3 and Perere display low to medium numbers of copies in the genome but exhibit expression levels equal to or higher than those of the other transcribed retrotransposons (Table 1). This means that in a comparison of transcript production per genome copy, the novel retrotransposons have an activity up to 2 orders of magnitude higher. These characteristics probably reflect different niches occupied by each of the retrotransposons during genome evolution. It has been shown that each retrotransposon tends to have a different ratio of distribution between euchromatin and heterochromatin (12) and that different locations in the chromosome influence the conservation and the copy number of transposable elements (26, 29). The elements tend to be more abundant in heterochromatin because of the lower density of functional genes in this region (13, 26, 29), but they are more degenerate and expressed at lower levels than elements present in the euchromatin (26). It is tempting to hypothesize that the high-copy-number retrotransposons present in the S. mansoni genome are preferentially located in heterochromatin, generating several truncated copies that would be inactive, thus accounting for their low transcriptional activity. The repetitive elements W1 and W2 are located in the heterochromatic region of the S. mansoni W sex chromosome. Different parasite isolates are known to exhibit sex-specific polymorphisms, resulting from different numbers of copies of repetitive elements, which indicates genomic instability and suggests that replication of repetitive elements is a method of generating variability within schistosomes (18). In contrast, we propose that the low-copy-number retrotransposons would have several copies located in euchromatin and would be subject to a stricter process of selection, which has been shown previously to induce the conservation of retrotransposons with active characteristics (46).
It is noteworthy that retrotransposons are thought to proliferate in sexual species, where they would propagate during sexual reproduction, as suggested by the work of Arkhipova and Meselson (2). Schistosomes are among the earliest animals in the evolutionary scale to develop sexual dimorphism and heteromorphic sex chromosomes, and a substantial fraction of the genome of this metazoan parasite is predicted to comprise repetitive sequences made up of retrotransposons (5). The fact that the non-LTR transposon Perere is phylogenetically related to the S. japonicum non-LTR transposon pido implies an ancestral acquisition.
Gene silencing caused by cosuppression, which is a mechanism that preferentially diminishes mRNA levels of high-copy-number retrotransposons (25), may provide an explanation for the differential levels of transcription of high- and low-copy-number transposons in S. mansoni. Cosuppression may trigger different pathways (25) such as methylation of DNA (36, 57), chromatin remodeling (17), and RNA silencing (20). Our group has previously identified (56) S. mansoni sequences coding for proteins with a high degree of similarity to those involved in gene silencing, such as DDM1 (SmAE 609008.1) (44), DNMAP1 (SmAE 700041.1) (51), Dicer (SmAE 604739.1), and Argonaute/Piwi (SmAEs 603705.1 and 606231.1) (20, 61). It is possible that the different levels of retrotransposon expression result from selective transposon silencing related to the copy number and triggered by cosuppression. Interestingly, retrotransposon activity as a factor in the silencing of nearby genes in the genome has been recently described (53). Thus, mapping of retrotransposons may provide clues for the silencing of some additional genes in the S. mansoni genome.
Knowledge of these four transposons should help in the assembly of the parasite's genome sequence, a task that is particularly difficult when a highly repetitive, complex genome is sequenced by the whole-genome shotgun approach (34, 45). Moreover, data from these four new elements allowed us to discern in S. mansoni two populations of retrotransposons with different copy numbers and transcriptional activities. New experiments, such as fluorescence in situ hybridization for detection of these retrotransposons in the S. mansoni chromosomes, in silico analysis of their differential distribution throughout the S. mansoni genome, and measurement of retrotransposon transcriptional activities in RNA interference experiments, should provide clues to understanding the differences between these two populations and extend our understanding of the dynamics of the S. mansoni genome and the biology of this complex human parasite.