Journal of Virology, June 2003, p. 6153-6166, Vol. 77, No. 11
0022-538X/03/$08.00+0 DOI: 10.1128/JVI.77.11.6153-6166.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Department of Tropical Medicine, School of Public Health and Tropical Medicine, Tulane University Health Sciences Center, New Orleans, Louisiana,1 Department of Molecular Parasitology, Institute for Biology, Humboldt University, Berlin, Germany,2 Wolfson Wellcome Biomedical Laboratory, Department of Zoology, The Natural History Museum, London, and,3 Pathogen Sequencing Unit, The Sanger Institute, Hinxton, England; and,5 Department of Biological Sciences, Illinois State University, Normal, Illinois4
Received 5 September 2002/ Accepted 26 February 2003
8-fold coverage of the S. mansoni genome revealed that numerous copies of Boudicca were interspersed throughout the schistosome genome. By reverse transcription-PCR, mRNA transcripts were detected in the sporocyst, cercaria, and adult developmental stages of S. mansoni, indicating that Boudicca is actively transcribed in this trematode.
The life cycle of S. mansoni involves parasitism of both humans and Biomphalaria glabrata snails. Infectious larvae known as cercariae emerge from the snails into a body of water, where they initiate infection by direct penetration of human skin. In the human host, the worms develop into male and female adults which live together within the mesenteric venules of the intestines and release eggs into the bloodstream. To perpetuate the life cycle, the eggs traverse the intestinal wall, facilitated by secreted proteolytic enzymes and their spines, and pass out in the feces to fresh water. Although chemotherapy is available, its effectiveness is limited by continuous reinfection upon subsequent exposure to water containing cercariae. Furthermore, symptoms do not necessarily resolve upon chemotherapeutic cure of the infection, and chronic symptoms of the disease can remain with the patient for life. No vaccine is currently available.
Health education and drug therapy are the cornerstones of the World Health Organization's strategy to combat schistosomiasis. Although the endemic distribution of schistosomiasis has changed in the past 50 years, overall, the estimated number of infected persons and those at risk of infection has not been reduced (6, 11, 69). Moreover, interactions with other infectious diseases can induce increased pathology, as with coinfection with hepatitis C, in which liver damage can be more severe than in patients with either disease alone (26).
Mobile genetic elements appear to be a principal force driving the evolution of eukaryotic genomes (10, 41, 58), and these elements play an important role in the establishment of genome size (51). One of the major categories of mobile genetic elements is the long terminal repeat (LTR) retrotransposable element, i.e., the LTR retrotransposons and the retroviruses (23). These elements are of interest for their potential for horizontal transmission, among other attributes. Among the invertebrate retroviruses, such as gypsy (32) and Tom (62), acquisition of envelope protein-encoding genes from diverse viruses by unrelated LTR retrotransposons confers the ability to be infectious and thereby facilitates horizontal transmission. Malik et al. (42) theorized that this has occurred independently on several occasions during the evolution of the invertebrate retroviruses.
It is hoped that an enhanced understanding of the schistosome genome will lead to long-term strategies for the control of schistosomiasis. The genome of schistosomes, blood flukes of the phylum Platyhelminthes, is estimated at 270 Mbp per haploid genome (56), arrayed on seven pairs of autosomes and one pair of sex chromosomes (27, 28). Both the evolution and size of this genome may be highly influenced by mobile genetic elements. Indeed, more than half of the schistosome genome appears to be composed of or derived from repetitive sequences, to a large extent from retrotransposable elements (34-36).
Previously characterized schistosome mobile genetic elements include SINE-like retrotransposons (60, 18), LTR retrotransposons (36), and at least two families of non-LTR retrotransposons (35). Although active replication of these elements has not been definitively proven, mRNA transcripts encoding reverse transcriptase and endonuclease have been detected (34, 36), as has reverse transcriptase activity in schistosome extracts (29), suggesting that at least some of these elements are actively mobile within the genome. Indeed, actively replicating mobile genetic elements in other platyhelminths have been described as RNA intermediates (4) and DNA transposons (3, 53). Furthermore, the schistosome mobile genetic elements so far characterized are highly represented within the genome, with copy numbers ranging up to 10,000 per haploid genome (34).
Whereas evidence suggests the presence of a large number of families of retrotransposable elements within the chromosomes of the human blood flukes (20, 25), this is the first description of a full-length LTR-retrotransposon from the genome of the African and neotropical human blood fluke Schistosoma mansoni. We have termed this new S. mansoni retrotransposon Boudicca, after the queen of the Celtic Iceni tribe. In 61 A.D., Boudicca led her Celtic tribesmen in a revolt against the Romans that swept across ancient Britain, culminating in the destruction of Roman London (http://www.athenapub.com/boudicca.htm). The image of Boudicca driving her chariot in battle against the Roman legions, reminiscent of a mobile genetic element moving throughout the genome of the pathogenic S. mansoni parasite, inspired the designation of the new retrotransposon.
21,000 clones, with an average insert size ranging from 120 to 170 kb, providing 8-fold coverage of the schistosome genome. Numerous BAC end sequences determined from randomly selected clones from this library are now in the public databases. The sequence of the insert of one of the BAC clones of Le Paslier et al. (39), clone number 53-J-5, was determined recently in its entirety (133.5 kb) at the Sanger Institute, and this sequence is available from the Sanger Institute Schistosoma Genome Project site (ftp://ftp.sanger.ac.uk/pub/databases/Trematode/S.mansoni/BACs/53J5/). Plasmid DNA from BAC clone 53-J-5 was prepared from liquid cultures of Escherichia coli with the PhasePrep BAC DNA kit (Sigma). Examination of the nucleotide sequence of BAC 53-J-5 by BlastX searches suggested that it included a retrotransposable element bearing a reverse transcriptase-encoding domain (not shown). A sequence of 5,858 nucleotides encoding a degenerate copy of the Boudicca element was located between residues 109662 and 115518 in the reverse orientation of clone 53-J-5, as detailed below. Subsequently, the sequences of the LTRs and of the predicted open reading frames (ORFs) of the novel 53-J-5-associated retrotransposon were employed as queries in Blast searches to interrogate the TIGR database of S. mansoni genomic DNA BAC end sequences at http://tigrblast.tigr.org/euk-blast/index.cgi?project=sma1.
Sequences identified with strong matches were obtained subsequently from GenBank, as follows: LTR-specific BACs BAC 39-I-11 (GenBank accession no. BH201890), BAC 41-N-21 (BH203925), BAC 42-I-15 (BH200403), BAC 47-B-16 (BH206071), BAC 50-J-17 (BH210181), BAC 55-G-14 (BH204242), BAC 57-M-13 (BH210591), BAC 58-C-4 (BH202081), BAC 60-C-2 (BH203189), BAC 60-J-19 (BH203616), and BAC 62-M-3 (BH205111); ORF1-specific BAC 45-H-5 (BH206669), BAC 46-G-15 (BH209250), BAC 62-G-23 (BH211091), and BAC 62-N-16 (BH204125); ORF2 protease domain-specific BAC 43-P-17 (BH20912); ORF2 integrase, zinc finger, and DDE domain-specific BAC 53-C-10 (BH200708), BAC 49-G-18 (BH202479), BAC 45-E-19 (BH211202), BAC 53-L-22 (BH201551), and BAC 61-H-12 (BH210420); and ORF2 integrase COOH-terminal domain BAC 45-L-19 (BH199683), BAC 47-D-15 (BH210140), BAC 48-A-14 (BH208816), BAC 51-L-17 (BH207502), and BAC 56-N-2 (BH200029). (We did not target BAC clones with high identity to the reverse transcriptase domain of ORF2 because the strong conservation of reverse transcriptase in evolutionary terms [68, 41, 45] makes it problematic to ensure that the Blast-identified BAC clones represented Boudicca rather than some other unidentified LTR retrotransposon from the genome of S. mansoni.)
Both the contiguous 53-J-5 copy of Boudicca and a composite of genomic sequence fragments assembled from these BAC ends were used to generate the consensus sequences. Alignments were established with ClustalW and MacVector software. Analysis of potential ORFs was accomplished with MacVector software, with parameters set to use stop codons as ends, an ATG codon as the beginning of ORF1, and codons after stops as the beginnings of subsequent ORFs. Since ORFs downstream of ORF1 in retrotransposons generally arise from a ribosome slip event rather than from the beginning of a new mRNA transcript, downstream ORFs do not necessarily begin with an ATG (66). Structural analysis of predicted polypeptides for the presence of signal peptides and transmembrane domains was carried out with the on-line tools at http://www.cbs.dtu.dk/services/SignalP/ (46) and http://www.cbs.dtu.dk/services/TMHMM-2.0/ (33), respectively.
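The ORF-calling rules described above (stop codons as ORF ends, an ATG as the start of ORF1, and the codon after a stop as the start of later ORFs) can be illustrated with a minimal Python sketch. It is not the MacVector implementation; the function names and the toy sequence in the usage note are hypothetical and serve only to show the logic on the forward strand.

```python
# Illustrative sketch only: function names are hypothetical, not MacVector's API.
STOPS = {"TAA", "TAG", "TGA"}


def stop_delimited_segments(seq, frame):
    """Yield (start, end) 0-based, half-open segments in one forward frame.

    Each segment runs from the codon after a stop (or from the frame start)
    up to, but not including, the next stop codon. This mirrors the rule that
    downstream ORFs need not begin with an ATG.
    """
    seq = seq.upper()
    seg_start = frame
    last = frame
    for i in range(frame, len(seq) - 2, 3):
        last = i + 3
        if seq[i:i + 3] in STOPS:
            if i > seg_start:
                yield (seg_start, i)
            seg_start = i + 3
    if last > seg_start:  # trailing segment with no closing stop
        yield (seg_start, last)


def first_orf(seq, frame=0):
    """ORF1 rule: start at the first in-frame ATG, end at the next stop."""
    seq = seq.upper()
    for start, end in stop_delimited_segments(seq, frame):
        for i in range(start, end, 3):
            if seq[i:i + 3] == "ATG":
                return (i, end)
    return None
```

For a made-up 24-nucleotide sequence such as `TTTATGAAACCCTAAGGGTTTTGA`, `first_orf(seq, 0)` would report the ATG-initiated stretch, while `stop_delimited_segments(seq, 0)` would also report the ATG-less stretch after the first stop.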
Phylogenetic analysis. In order to characterize the phylogenetic relationship of Boudicca to other mobile genetic elements, phylograms based on the reverse transcriptase domain of retroviral Pol were generated and rooted with reverse transcriptase from members of the copia family as the outgroup (61). Alignments of the reverse transcriptase sequences and generation of the bootstrapped phylogenetic tree (54, 47) were accomplished with ClustalX (63) and Njplot (50) software, with manual adjustment of gap size in Ty1/copia and gypsy so that the YVDD active sites aligned. Branch length ratios were preserved upon transfer into CorelDraw diagrams for display.
Sequences of reverse transcriptases used in the phylogenetic analysis were nomad (AF039416), gypsy (GNFFG1), yoyo (U60529), Tom (CAA80824), Ted (M32662), ZAM (CAA04050), kabuki (BAA92689), CsRn1 (AAK07486), Ty3 (S53577), grasshopper (M77661), Maggy (L35053), Hsr1 (X92487), sushi (AAC33526), Dea1 (T07863), Osvaldo (CAB39733), Ulysses (X56645), Woot (U09586), micropia (X14037), Blastopia (Z27119), Cyclops (AJ000640), Gulliver (F243513), Mag (S08405), Cer1 (U15406), feline leukemia virus (NP047255), human immunodeficiency virus type 1 (PO4585), human immunodeficiency virus type 2 (J04542), simian immunodeficiency virus (AAA47606), mouse mammary tumor virus (GNMVMM), Ty1 (P47100), and copia (OFFFCP) and were obtained from the GenBank, EMBL, and PIR databases. Where possible, protein sequences were used directly from the database; otherwise, reverse transcriptase sequences were predicted by translation of ORF2, followed by removal of protease, RNase H, and integrase to leave an amino acid sequence for reverse transcriptase.
Developmental stages of schistosomes; isolation of schistosome nucleic acids. Schistosoma mansoni (Puerto Rican NMRI strain) was propagated by infecting BALB/c mice by intradermal injection of 70 cercariae collected from Biomphalaria glabrata snails that were maintained in the laboratory. Adult worms were recovered from infected mice by portal perfusion at 6 to 7 weeks after infection. Genomic DNA was extracted from adult worms and from cercariae. About 30 mixed-sex, adult S. mansoni worms were lysed in 0.1% sodium dodecyl sulfate-100 mM NaCl-50 mM Tris-20 mM EDTA (pH 8)-proteinase K at 500 µg/ml. Genomic DNA was extracted from the lysate by sequential partition against phenol-chloroform and chloroform-isoamyl alcohol, digestion with RNase A, a second partition against phenol-chloroform, and precipitation in ethanol in the presence of sodium acetate. The S. mansoni genomic DNA was dissolved in 10 mM Tris-1 mM EDTA, pH 8.0.
For Southern blot analysis, genomic DNA was isolated from S. mansoni cercariae (NMRI strain), with a wet pellet of cercariae of 1 ml in volume, with the AquaPure genomic DNA kit from Bio-Rad. Schistosome eggs were isolated from the livers of infected mice at 8 weeks postinfection. For the initiation of in vitro cultures, miracidia were transformed into primary (mother) sporocysts by overnight culture in MEMSE-J with 10% fetal calf serum (31) or medium F with 10% bovine serum albumin (29) at 26°C and 5% CO2. Shed ciliated plates were washed away.
Southern blots and BAC library screening.
S. mansoni genomic DNA (33 µg per lane) and BAC clone 53-J-5 (insert size, 133.5 kb) were digested with restriction enzymes, separated through a 0.8% agarose gel by electrophoresis, transferred to nylon (Zeta-Probe GT; Bio-Rad) by capillary action (59), and cross-linked to the nylon by UV light. A Boudicca-specific probe was produced by PCR amplification with BAC 53-J-5 as the template and 5'-AACTGCAGATGCACGGAATCACGGACT (forward) and 5'-GCTCTAGACTAAGATTCAGTCGGCAGATGC (reverse) primers, with restriction sites for PstI and XbaI added to facilitate cloning into plasmid vectors. The probe, targeting part of the gag gene of Boudicca, was 385 bp in length, spanning residues 622 to 1006 of the 5,858 nucleotides of the 53-J-5 copy of Boudicca.
For Southern hybridization, the North2South Direct horseradish peroxidase labeling and detection kit (Pierce, Rockford, Ill.) was employed, with the washing and stringency conditions recommended by the manufacturer. This system uses direct labeling of the probe with horseradish peroxidase and a chemiluminescent signal detected with X-ray film (Fuji). To screen the high-density nylon filters representing the S. mansoni genomic DNA BAC library (39), the 385-bp probe (above) was labeled with digoxigenin with the digoxigenin labeling system from Roche (Indianapolis, Ind.). Hybridizations of the BAC high-density filters were carried out at 42°C overnight in the hybridization solution from Roche's digoxigenin labeling system, after which the nylon membranes were washed at 68°C for 30 min in 0.5x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate)-0.1% sodium dodecyl sulfate. Development of the digoxigenin-labeled signal was accomplished by immunoblotting with antidigoxigenin-alkaline phosphatase-conjugated immunoglobulins (Roche), after which CSPD [disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1(3,7)]decan}-4-yl)phenyl phosphate; Roche] was used as the substrate for development of the digoxigenin chemiluminescence, which in turn was detected on X-ray film.
Retrotransposon gene copy number analysis.
Comparative estimates of the copy number of Boudicca were obtained by a bioinformatics approach, wherein Blast analysis of the BAC end database of S. mansoni genomic sequences targeted better-characterized retrotransposable elements from S. mansoni for which copy numbers have been reported, including the non-LTR retrotransposons SR1 and SR2 (19, 20) and the SINE-like element Sm (60). S. mansoni BAC end sequences from the Institute for Genomic Research and the Centre National de Sequençage were obtained from the TIGR ftp site (ftp.tigr.org/) and from Raymond Pierce (Institut Pasteur, Lille, France), respectively. Standalone Blast queries of the known repeat sequences against the BAC end sequences were performed.
In addition, the copy number estimate obtained from this bioinformatics approach was supported by Southern hybridization analysis of restriction enzyme-digested genomic DNAs alongside titrations of increasing quantities of HindIII-digested BAC 53-J-5. Densitometric analysis of Southern hybridization signals was accomplished with the Versa-Doc gel documentation system (Bio-Rad) and accompanying Quantity-One software (Bio-Rad). Densitometric data for the genomic DNA- and BAC-containing lanes in the Southern hybridization were used to estimate the copy number for Boudicca according to the formula [(A/B) x C]/E = F. This formula was derived from two equations: (A/B) x C = D and D/E = F, where A is the number of copies of Boudicca in the BAC 53-J-5 lane (Fig. 8, lane 14), B is the density volume of the BAC 53-J-5 lane in units of optical density per mm2, C is the density volume of the S. mansoni genomic DNA lanes in units of optical density per mm2, D is the total number of copies of Boudicca per lane, E is the number of haploid genomes in each genomic DNA lane, and F is the total number of copies of Boudicca per haploid genome.
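As a worked illustration of the formula [(A/B) x C]/E = F, the following minimal Python sketch computes F from the four measured quantities. The function names are hypothetical. The helper that converts a DNA mass into a number of haploid genomes is an added assumption, using the common approximation of about 650 g/mol per base pair; the text does not state how E was obtained.

```python
# Illustrative sketch only; names and the mass-to-genomes helper are assumptions.
AVOGADRO = 6.02214076e23
GRAMS_PER_MOL_PER_BP = 650.0  # common approximation for double-stranded DNA


def haploid_genomes_in_lane(dna_mass_g, genome_size_bp):
    """Estimate E: haploid genome equivalents in a given mass of genomic DNA."""
    return dna_mass_g * AVOGADRO / (genome_size_bp * GRAMS_PER_MOL_PER_BP)


def copies_per_haploid_genome(a, b, c, e):
    """Return F = [(A/B) x C] / E.

    a: copies of Boudicca loaded in the BAC 53-J-5 lane
    b: density volume of the BAC lane
    c: density volume of the genomic DNA lane
    e: haploid genomes in the genomic DNA lane
    """
    if b <= 0 or e <= 0:
        raise ValueError("b and e must be positive")
    d = (a / b) * c  # D: total copies of Boudicca in the genomic lane
    return d / e
```

With made-up inputs, `copies_per_haploid_genome(10, 2.0, 4.0, 5)` evaluates (10/2) x 4 = 20 copies per lane, divided by 5 genomes, giving 4.0 copies per haploid genome.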
FIG. 8. RT-PCR-based detection of Boudicca transcripts in developmental stages of Schistosoma mansoni. (A) Schematic representation of RT-PCR-based detection of mRNA transcripts. The solid arrows represent primers (binding in ORF2) used in the first run of the nested PCR, and the open arrows represent primers used in the second run of the nested PCR. Primer P-622-F (white arrow), binding in ORF1, and P-2193-R span an area overlapping the first two ORFs of Boudicca. (B) After RNA isolation from adult worms, DNase digestion, and reverse transcription, a 447-bp fragment was amplified in the first PCR (lane 1); DNase digestion without reverse transcription revealed no contamination with genomic DNA (negative control lane 2); a corresponding fragment was amplified from genomic DNA (positive control lane 3). A 183-bp fragment was reamplified with nested primers from the 447-bp RT-PCR product (lane 4) and from the 447-bp genomic PCR product (positive control) (lane 6). Reamplification of the negative control was not possible (lane 5). A 1,571-bp fragment was amplified with primers spanning an area overlapping the first two ORFs from mRNA transcripts (lane 7) and from genomic DNA (lane 8). Lane M, size markers. (C) After RNA isolation from larval stages, DNase digestion, and reverse transcription, a 447-bp fragment was amplified in the first PCR (lane 5, sporocysts; lane 6, cercariae); a 183-bp fragment was reamplified with nested primers from the 447-bp RT-PCR product (lane 8, sporocysts; lane 9, cercariae). Corresponding fragments were amplified from genomic DNA: in the first PCR, a 447-bp fragment (lane 7), and in the nested PCR, a 183-bp fragment (lane 10). The quality of the cDNA transcripts was tested by amplification of cytochrome c oxidase subunit 1 (AF101196). A 342-bp fragment was amplified (lane 3, sporocysts; lane 4, cercariae). DNase digestion without transcription revealed no contamination with DNA (negative control; lane 1, sporocysts; lane 2, cercariae). 
Lane M, molecular size standards.
8-fold coverage of the S. mansoni genome (39). RT-PCR. Twenty adult worms, 1,000 sporocysts, or 2,000 cercariae of S. mansoni were homogenized by mechanical disruption in lysis buffer (RNeasy RNA extraction kit; Qiagen). RNA was extracted according to the manufacturer's instructions under RNase-free conditions. After the RNA isolation, any residual DNA contamination was removed by digestion with RNase-free DNase (Promega). The RNA was precipitated to remove the DNase, dissolved in RNase-free water, and used as a template for oligo(dT)-primed reverse transcriptions (RT) with Moloney murine leukemia virus H- point mutant reverse transcriptase (Promega). The resulting cDNA was used as the template in PCR experiments with primers specific for the reverse transcriptase of Boudicca (forward, 5'-CCCTAAAAAGGACAGCAACGATTG; reverse, 5'-GGTTCCGATTTGGCATTTCTG, 447-bp product) and with forward (5'-ATGCACGGAATCACGGAC) and reverse (5'-GAGTGATGATGGCGGTTTTAGG) primers spanning the region from gag to the reverse transcriptase (1,571-bp product overlapping ORF1 and ORF2 of Boudicca) (see Fig. 8A below).
PCR amplification was performed under the following cycling conditions: 94°C for 5 min; 35 cycles of 94°C for 1 min, annealing for 1 min at primer-dependent temperature (see below), and extension at 72°C for 1 min; and a final extension at 72°C for 10 min. A nested PCR was subsequently performed on the 447-bp reverse transcriptase product (primers: forward, 5'-CCAACTGATGTTTATCGGCGTC; reverse, 5'-GAGTGATGATGGCGGTTTTAGG, 183-bp product) to validate the specificity of the first round of PCR. Annealing temperatures were 62°C for the 447- and 183-bp reverse transcriptase fragments and 58°C for the 1,571-bp fragment spanning both ORF1 and ORF2. The reverse transcription reaction was validated with forward (5'-GATTTGCGCTATGGCTTC) and reverse (5'-GGCCATCACCATACTAGC, 342-bp product) primers specific for the S. mansoni cytochrome c oxidase subunit 1 gene (GenBank accession no. AF101196) (49). Negative control reactions were carried out with DNase-treated RNA as the template. Genomic DNA from adult S. mansoni was used as the positive control (64).
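The cycling conditions above can be written down as a small data structure, which makes the primer-dependent annealing step explicit. The following Python sketch is illustrative only: the class and function names are hypothetical, and the total it computes counts programmed hold times, not instrument ramp times.

```python
# Illustrative sketch only; names are hypothetical, values are taken from the text.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    minutes: float


# Annealing temperatures stated in the text for each amplicon.
ANNEAL_C = {
    "RT 447 bp": 62,
    "RT nested 183 bp": 62,
    "gag-RT 1,571 bp": 58,
}


def build_program(anneal_c, cycles=35):
    """Expand the cycling conditions into an ordered list of hold steps."""
    program = [Step("initial denaturation", 94, 5)]
    for _ in range(cycles):
        program.append(Step("denaturation", 94, 1))
        program.append(Step("annealing", anneal_c, 1))
        program.append(Step("extension", 72, 1))
    program.append(Step("final extension", 72, 10))
    return program


def hold_minutes(program):
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    return sum(step.minutes for step in program)
```

For example, `hold_minutes(build_program(ANNEAL_C["gag-RT 1,571 bp"]))` adds 5 min, 35 cycles of 3 min, and 10 min, for 120 min of hold time.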
Nucleotide sequence accession numbers. Nucleotide sequences reported here have been assigned GenBank accession numbers as follows: consensus sequences of the Boudicca LTR, accession no. BK000439; gag region, BK000440; integrase-zinc finger-DDE regions, BK000441; and integrase carboxy terminus, BK000444.
5.8 kb in length. Whereas a full, contiguous copy of this element is contained within the BAC clone 53-J-5, this copy appears to have been degraded by a number of mutations (Fig. 1B and 2). In order to reconstruct the genomic structure of a putatively active Boudicca element, we examined sequences of BAC ends from S. mansoni genomic DNA (39) that are available at the TIGR website (above). Fragments from a number of discrete copies of Boudicca were aligned to assemble regions of consensus that resolved mutations evident in the 53-J-5 copy of the element (see Fig. 3).
FIG. 1. (A) Schematic representation of the predicted structure of the Schistosoma mansoni retrotransposon Boudicca. (B) Schematic representation of a contiguous but degraded copy of Boudicca, found in the insert of BAC 53-J-5 of the S. mansoni BAC library of Le Paslier et al. (39). MHR, major homology region; PR, protease, RNH, RNase H; IN-zn, integrase zinc finger domain; IN-DDE, integrase DDE domain; IN-Ct, integrase C-terminal domain, env?, possible envelope protein.
FIG. 2. Annotated sequence of Boudicca from BAC clone 53-J-5. Functional domains and conserved regions of protein domains are shaded and labeled to the left of the sequence. Amino acid sequences are shown in frame below their corresponding nucleic acid sequence. The amino acid sequence of the +3 reading frame (ORF2, pol) is shown in bold; amino acids in the +1 reading frame (all other proteins) are shown in lightface. LTRs are also shown in bold. MHR, major homology region; CHB, Cys-His box; PR, protease; RT, reverse transcriptase; RNH, RNase H; INZ, integrase zinc finger domain; IND, integrase DDE domain.
FIG. 3. Resolution of specific mutated regions of the 53-J-5 copy of Boudicca. (A) Alignment of seven BAC ends containing the Boudicca LTR sequence indicated that the short 3' LTR in 53-J-5 was due to a deletion specific to that LTR. (B) Alignment of four BAC ends containing the middle portion of the Boudicca gag gene indicated that the stop codon at position 999 of 53-J-5 was a mutation unique to that copy. (C) Alignment of five BAC ends containing the region at the end of the long, +3 pol ORF indicated that the frameshift at the end of this ORF in 53-J-5 was due to an insertion mutation at position 3577 in the 53-J-5 copy. (D) ORF analysis of five BAC ends spanning the region between the integrase DDE domain and the integrase C-terminal domain indicated that a large gap of noncoding sequence in the 53-J-5 copy (positions 3925 to 4149) corresponds to an intact pol open reading frame in the putatively active copy of Boudicca. Nomenclature for the BAC ends is that used in the TIGR database to classify the Schistosoma mansoni BAC library clones (http://tigrblast.tigr.org/euk-blast/index.cgi?project=sma1).
A search of the nonredundant database at GenBank with the consensus sequence of the Boudicca LTR did not reveal significant matches to LTRs of other retrotransposons or indeed any significant matches at all. However, BlastN searches of the Genome Survey Sequences (GSS) database at GenBank returned 185 matches with significant scores of 170 and greater, over 150 bp and longer, all of which were from S. mansoni (mostly to BAC end sequences) (not shown). Furthermore, a search of the nonhuman, nonmouse expressed sequence tag database revealed four significant matches to S. mansoni expressed sequence tags and no other matches (not shown). The length of the LTRs of most retrotransposons ranges in size from about 200 to about 600 bp (45): for comparison of sizes, the LTRs of related retrotransposons are Gulliver, 259 bp; kabuki, 272 bp; gypsy, 526 bp; ZAM, 594 bp; Ted, 273 bp; and Tom, 474 bp.
gag gene of Boudicca encodes a distinctive Cys-His box motif, CHCC. Downstream of its 5' LTR, Boudicca exhibits two, and possibly three, protein-encoding reading frames, ORF1, ORF2, and ORF3, and terminates in the 3' LTR (Fig. 1 and 2). ORF1 of the BAC 53-J-5 copy included a stop codon (TGA) at position 999 that would result in a truncated Gag precursor. However, the consensus sequence from four other BACs spanning that particular region revealed that the putatively active Boudicca element contained an intact ORF1 encoding Gag (Fig. 3B). This gag gene, encoding 276 amino acids, starts with an ATG (methionine) codon at nucleotide positions 505 to 507 of Boudicca, ends at nucleotide position 1332, and encodes at least two conserved domains of the retroviral structural proteins encoded by gag (matrix, capsid, and nucleocapsid) (Fig. 1 and 2).
Near its NH2 terminus, the Gag protein included a major homology region, orthologous in sequence to the major homology region from numerous retroviruses (Fig. 4A). The major homology region is a region within the capsid protein that is highly conserved among several retroviruses as well as the yeast retrotransposon Ty3 (13, 14). The major homology region is required for infectivity and is involved in capsid assembly (8, 52). Carboxy terminal to the major homology region, the Gag protein included a conserved domain of the nucleocapsid protein, involved in RNA binding, known as a Cys-His box (21, 44). As shown in the alignment in Fig. 4B, orthologous Cys-His domains are present in the CsRn1 and Kabuki retrotransposons. This Boudicca/CsRn1/Kabuki CHCC motif, specifically CX2HX9CX3C, is dissimilar to the more usual CX2CX4HX4C (i.e., CCHC) zinc finger motif seen with Cys-His nucleic acid binding motifs of nucleocapsid proteins of most other retrotransposons and retroviruses (4, 21, 44).
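The difference between the two Cys-His box spacings can be made concrete with regular expressions. The following Python sketch is illustrative only: the function name is hypothetical, the patterns are direct transcriptions of CX2HX9CX3C and CX2CX4HX4C, and the peptides in the usage note are made-up strings, not Boudicca or retroviral sequences.

```python
# Illustrative sketch only; patterns transcribe the spacings quoted in the text.
import re

CHCC = re.compile(r"C.{2}H.{9}C.{3}C")  # CX2HX9CX3C (Boudicca/CsRn1/kabuki type)
CCHC = re.compile(r"C.{2}C.{4}H.{4}C")  # CX2CX4HX4C (the more usual type)


def classify_cys_his_box(protein):
    """Return the list of Cys-His box types found in a protein sequence."""
    protein = protein.upper()
    found = []
    if CHCC.search(protein):
        found.append("CHCC")
    if CCHC.search(protein):
        found.append("CCHC")
    return found
```

A toy peptide built as C, two residues, H, nine residues, C, three residues, C would be reported as `["CHCC"]`, whereas one built as C, two residues, C, four residues, H, four residues, C would be reported as `["CCHC"]`.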
FIG. 4. Alignments of Boudicca Gag functional domains with those of LTR-type retrotransposons and retroviruses. Two conserved domains of the Gag protein, the Cys-His box and the major homology region, were located within the first open reading frame of Boudicca. Alignments with related elements (Cys-His box) and retroviruses (major homology domain) are shown. For the Cys-His box comparisons, translations of DNA sequences rather than direct protein sequences were used. Accession numbers, followed by location within the nucleotide sequence in parentheses, are CsRn1, AY013561 (1261 to 1329), and kabuki, AB032718 (6047 to 6100). Retroviral major homology regions were obtained from reference 12. aa, amino acids; RSV, Rous sarcoma virus; HIV, human immunodeficiency virus; BLV, bovine leukemia virus; HTLV, human t-cell lymphotropic virus; EIAV, equine infectious anemia virus; VISNA, visna virus; SIV, simian immunodeficiency virus; FLV, feline leukemia virus; MPMV, Mason-Pfizer monkey virus; MoMLV, Moloney murine leukemia virus; MMTV, mouse mammary tumor virus.
1,060 amino acid residues (Fig. 1, 2, and 5). The Boudicca polymerase included four enzymatic domains, protease, reverse transcriptase, RNase H, and integrase, in that order (Fig. 1, 2, and 5), characteristic of the domain order and structure of the polyprotein of retroviruses and Ty3/gypsy LTR retrotransposons. Part of the sequence of the protease domain of Boudicca, in particular the region of the active-site triad of residues including the catalytic Asp residue, is shown as a ClustalW-generated alignment in Fig. 5. Strong conservation with the active-site regions of proteases from other gypsy-like retrotransposons was apparent, although the more usual DT/SG motif was mutated to DTD in the 53-J-5 copy of Boudicca. However, the protease of active Boudicca elements may have DTG, since the Boudicca protease in BAC end BAC 43-P-17 (BH20912) had DTG (not shown). The processing sites for protease have yet to be determined, and hence we do not yet know the lengths of the mature forms of each of the protease, reverse transcriptase, RNase H, and integrase domains of the Boudicca polyprotein. Protease enzymes from retroviruses are 100 amino acids in length (30), and it can be anticipated based on identity that the protease of Boudicca will be of similar dimensions.
FIG. 5. Alignments of Boudicca polymerase functional domains with those of five other LTR-type retrotransposons. Five of six gypsy-like LTR retrotransposons (CsRn1, kabuki, ZAM, Ted, Tom, and gypsy) were used to find conserved regions of pol in Boudicca. ClustalW alignments are presented for conserved regions of reverse transcriptase, RNase H, aspartic protease, integrase zinc finger domain, and integrase DDE (catalytic) domain. The conserved integrase motifs HHCC (for the zinc finger domain) and DDE (for the central catalytic domain) are marked with arrowheads. Identical amino acids appear in boldface and are shaded dark gray; similar amino acids are in lightface and are shaded light gray. Accession numbers are as follows: CsRn1, AAK07486; kabuki, BAA92690; ZAM, CAA04050; Ted, B36329; Tom, CAA80824; and gypsy, GNFFG1. aa, amino acids.
180 amino acid residues (Fig. 1, 2, and 5). These conserved domains are presented in amino acid alignment with the orthologous reverse transcriptase regions of CsRn1, kabuki, and related LTR retrotransposons, where strong identity with these other reverse transcriptases was apparent (Fig. 5). Furthermore, we focused on the sequence of the reverse transcriptase residues of Boudicca to examine its phylogenetic relationships to other retrotransposons (below).
In conjunction with reverse transcriptase, RNase H is involved in transcription of the RNA genome to the proviral DNA genome; RNase H cleaves the original RNA template of the RNA-DNA hybrid. Also, it generates a polypurine tract, the primer for plus-strand DNA synthesis, and it removes the RNA primers from newly synthesized minus and plus strands of the proviral DNA (41). RNase H enzymes exhibit a characteristic active site that includes four conserved carboxylate residues, three Asps and a Glu; all four appear to be conserved in the Boudicca RNase H (Fig. 5 and not shown) (41). The locations of these active sites indicated that the RNase H enzyme domain of Boudicca extended for 140 residues COOH-terminal to reverse transcriptase (not shown).
The integrase of Boudicca resided at the COOH terminus of the polymerase precursor, as in other Ty3/gypsy-like retrotransposons. Integrase mediates integration of the DNA provirus into the host chromosome. Integrase is composed of three domains: the NH2-terminal zinc-binding domain, the central catalytic domain (DDE domain), and a COOH-terminal, nonspecific DNA binding domain. Alignments of the Boudicca integrase zinc-binding domain, with its characteristic HX3HX29CX2C motif, and of the catalytic DDE domain with its D-39aa-D-35aa-E spacing, where Xaa represents the indicated number of amino acids, are presented in Fig. 5. It was clear that the integrase domain of Boudicca was orthologous to that of kabuki, CsRn1, Ted, and other LTR retrotransposons. Within the 53-J-5 copy of Boudicca, the insertion of a G nucleotide at position 3577 led to a frameshift mutation that disrupted ORF2 in the region encoding the integrase (Fig. 1, 2, and 3). However, this frameshift is apparently absent from active copies of Boudicca because it was not seen in five other copies that we examined (Fig. 3, panel C).
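Spacing notations such as HX3HX29CX2C can be translated mechanically into search patterns. The following Python sketch shows one way to do so; the function name is hypothetical, and rewriting the D-39aa-D-35aa-E spacing as `DX39DX35E` is a notational assumption made here so that both motifs share one syntax.

```python
# Illustrative sketch only; the function name and the 'DX39DX35E' rewriting are assumptions.
import re

_TOKEN = re.compile(r"([A-WYZ])|X(\d+)")  # a fixed residue, or X followed by a gap length


def motif_to_regex(motif):
    """Translate a spacing motif such as 'HX3HX29CX2C' into 'H.{3}H.{29}C.{2}C'."""
    if not re.fullmatch(r"(?:[A-WYZ]|X\d+)+", motif):
        raise ValueError("unsupported motif syntax: %r" % motif)
    parts = []
    for residue, gap in _TOKEN.findall(motif):
        parts.append(residue if residue else ".{%s}" % gap)
    return "".join(parts)
```

The resulting string can be passed to `re.search` to scan a translated ORF for the zinc-binding or DDE spacing; a hit indicates only that the spacing is present, not that the domain is functional.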
Furthermore, within the 53-J-5 copy of Boudicca, an additional open reading frame starts at position 4150 and extends to 4923, with the stop codon TAG at 4462 to 4464 (Fig. 1B and 2). The 5' half of this reading frame shows identity to the integrase of CsRn1 (AAK07485, AAK07486, and AAK07487) and several other elements (e.g., BAA92695.1, NP_178653, and BAA92696) encoding the carboxy-terminal portion of integrase. This leaves 225 bp of noncoding sequence interrupting the coding region of the integrase (Fig. 1B). To determine whether this mutation was present in other copies of Boudicca, BAC ends homologous to this region were examined for open reading frames (Fig. 3D). In contrast to the 53-J-5 copy, four of five other Boudicca sequences spanning this site contained uninterrupted coding sequences, indicating that the pol gene of the putatively active Boudicca is composed of a single ORF spanning from 1299 to 4464 (Fig. 1A, 1B, 2, and 3D). Furthermore, the reading frame of the fifth Boudicca-positive BAC end, 47-D-15, was interrupted by only a single stop codon.
Envelope glycoprotein encoded by ORF3 of Boudicca?
The region from positions 4465 to 4923 (Fig. 2) forms another, discrete open reading frame (Fig. 1, 2, and 3). Although in the same reading frame (+1) as the carboxy terminus of the integrase, this coding region appears to encode a polypeptide that, so far, does not show identity to other known proteins. By comparison with the genome organization of related retroelements, including the errantiviruses (41, 42), this 3'-situated ORF may encode a retrovirus-like envelope protein (Fig. 1, 2, and 3D). Whereas BlastX searches with this sequence did not reveal significant matches, the envelope proteins encoded by other LTR retrotransposons and retroviruses exhibit little sequence identity to one another, making identification by sequence alignment difficult (38). However, many envelope proteins appear to have similar structural characteristics, including signal peptides, transmembrane domains, glycosylation, and Cys bridges (40). Indeed, the deduced polypeptide of Boudicca was predicted to include a signal peptide of approximately 50 amino acids, with cleavage at TLG*CS (Fig. 2), according to the algorithm of Nielsen et al. (46). Also, it has a predicted transmembrane domain in the vicinity of amino acid residues 78 and 79, according to the algorithm of Krogh et al. (33), and two potential disulfide bridges. These structural features usually occur in viral envelopes (48).
Interestingly, all three BAC end sequences which span the region 3' of nucleotide 4465 on the 53-J-5 copy exhibit a stop codon at positions 4465 to 4467 (Fig. 3). In addition, clone 58-D-2 also includes positions 4465 to 4467 and also has this stop codon. Therefore, unlike the premature stop mutation at position 999 in the gag domain of the 53-J-5 copy (Fig. 1, 2, and 3), which is clearly a mutation in a nonfunctional copy of Boudicca, the stop codon downstream of nucleotide 4465 appears to be a characteristic of active Boudicca elements. This also suggests that the region after this stop codon encodes a third polypeptide, discrete from Gag and Pol. Based on the structures of related retrotransposons, including the errantiviruses, the logical candidate for this region would be an envelope protein.
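The coordinate ranges quoted for the pol ORF (1299 to 4464) and for the downstream ORF (4465 to 4923) can be checked for length and reading frame with a few lines of arithmetic. This is only a sketch: it assumes the coordinates are 1-based and inclusive on the 53-J-5 copy, and `orf_span` is a hypothetical helper name.

```python
def orf_span(start, end):
    """Return (nucleotides, whole codons, leftover nucleotides) for an
    inclusive 1-based coordinate range."""
    length = end - start + 1
    return length, length // 3, length % 3


pol = orf_span(1299, 4464)   # pol span quoted in the text
orf3 = orf_span(4465, 4923)  # putative third ORF

print("pol :", pol)
print("ORF3:", orf3)
```

Under these assumptions the pol span is 3,166 nucleotides, one more than a multiple of three, which would be consistent with the single inserted G reported at position 3577 in the 53-J-5 copy, although the text does not state this explicitly. The downstream region is 459 nucleotides, or 153 codons, in line with the envelope candidate of about 150 residues.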
Boudicca is related to kabuki and to gypsy. The predicted reverse transcriptase domain of Boudicca was aligned with the orthologous domains of numerous other LTR retrotransposons from the Ty3/gypsy and Ty1/copia families and the vertebrate retroviruses. Phylogenetic comparison of the reverse transcriptase domains of these diverse elements revealed that Boudicca's closest relatives were kabuki from Bombyx mori (1) and CsRn1 from the related trematode parasite Clonorchis sinensis (4) (Fig. 6), placing Boudicca in the newly characterized Kabuki/CsRn1 clade of gypsy-like retrotransposons (4). Although more distant from them than from kabuki and CsRn1, Boudicca was also closely related to gypsy and related errantiviruses from insects, including nomad, ZAM, and Tom from species of Drosophila, Ted from Trichoplusia ni, and yoyo from Ceratitis capitata (41). These closely related retroelements, along with the vertebrate retroviruses and several other gypsy-like LTR retrotransposons, form a group of LTR elements distinct (bootstrap = 99.5%) from the Ty1/copia assemblage.
FIG. 6. Phylogeny of Boudicca and other retrotransposons and retroviruses based on their reverse transcriptase domains (68). The bootstrapped tree was rooted with the phylogenetically distinct LTR retrotransposons Ty1 and copia. The tree was constructed with ClustalW and NJplot and subsequently imported into CorelDraw for annotation, with branch lengths preserved. Bootstrap values from 1,000 replicates are shown where values greater than 500 were obtained.
Numerous copies of Boudicca in the schistosome genome. A Southern hybridization analysis of S. mansoni genomic DNA digested with BamHI, HindIII, PstI, and KpnI probed with a Boudicca ORF1-specific probe was carried out. BamHI cuts once, HindIII cuts twice, and neither KpnI nor PstI cuts within the 5,858 bp of the Boudicca copy in BAC 53-J-5. The probe does not contain restriction sites for these enzymes. In adjacent lanes, HindIII-digested BAC 53-J-5, of which the insert is 133.5 kb and contains a single copy of Boudicca, was serially diluted to contain 3.8, 38, 380, 3,800, 3.8 x 10^4, 3.8 x 10^5, 3.8 x 10^6, 3.8 x 10^7, 3.8 x 10^8, and 3.8 x 10^9 copies of Boudicca, respectively. Densitometric analysis of the resulting Southern hybridization signals provided copy number estimates for the Boudicca retrotransposon of 2,630, 2,582, 2,099, and 2,026 per haploid genome for the BamHI-, HindIII-, PstI-, and KpnI-digested preparations, respectively (Fig. 7). Together, these Southern hybridization signals indicated that there were between 2,000 and 3,000 copies of Boudicca in the schistosome genome.
FIG. 7. Southern hybridization analysis of genomic DNA of Schistosoma mansoni and BAC clone 53-J-5 probed with a Boudicca gag-specific probe, revealing that Boudicca is a multicopy element. S. mansoni genomic DNA was digested with BamHI (lane 1), HindIII (lane 2), PstI (lane 3), and KpnI (lane 4), and titrations of BAC 53-J-5 DNA were digested with HindIII (lanes 5 to 14, containing 3.8, 38, 380, 3,800, 3.8 x 10^4, 3.8 x 10^5, 3.8 x 10^6, 3.8 x 10^7, 3.8 x 10^8, and 3.8 x 10^9 copies of Boudicca, respectively). Molecular size standards (in kilobase pairs) are shown at the left. Estimates of Boudicca copy number from densitometry scans of the X-ray film used to produce this image were calculated as follows. The number of haploid genomes contained in each digested S. mansoni genomic DNA lane (lanes 1, 2, 3, and 4) was calculated to be 1.1 x 10^8, based on the mass of genomic DNA loaded (33,000 ng/lane) and the mass of the S. mansoni haploid genome (2.94 x 10^-4 ng) (270 Mb). Total density volume, in units of optical density per mm2, represents the total positive signal under consideration. With this as a measure of total Boudicca copy number for a given area of the blot, total density volumes were obtained for each band or smear. Lanes 1, 2, 3, and 4, containing genomic DNA which represented a known number of copies of the haploid genome of S. mansoni, were compared with the positive band in lane 14 containing HindIII-digested BAC 53-J-5, which represented 3.8 x 10^9 copies of Boudicca, according to the formula given in the text. The hybridization signal shown here was obtained after a 5-h exposure. Copy number was determined from the same experiment after a 30-min exposure.
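The arithmetic behind the legend can be sketched as follows. The number of haploid genomes per lane follows directly from the two masses quoted above. The copy number function, however, is only an assumed proportionality between hybridization signal and copy number; the formula the legend refers to is not reproduced in this section, and the function names are hypothetical.

```python
GENOMIC_DNA_NG = 33_000     # genomic DNA loaded per lane (from the legend)
HAPLOID_GENOME_NG = 2.94e-4  # mass of one haploid genome (from the legend)
STANDARD_COPIES = 3.8e9      # Boudicca copies in lane 14 (from the legend)


def haploid_genomes_loaded(dna_ng=GENOMIC_DNA_NG, genome_ng=HAPLOID_GENOME_NG):
    """Number of haploid genome equivalents in one genomic DNA lane."""
    return dna_ng / genome_ng


def copies_per_haploid_genome(genomic_density, standard_density,
                              standard_copies=STANDARD_COPIES, genomes=None):
    """Assumed linear scaling: signal is proportional to probe-target copies.

    genomic_density and standard_density are total density volumes measured
    on the same exposure; the values passed in are the caller's own data.
    """
    if genomes is None:
        genomes = haploid_genomes_loaded()
    copies_in_lane = genomic_density / standard_density * standard_copies
    return copies_in_lane / genomes


print(f"{haploid_genomes_loaded():.2e} haploid genomes per lane")
```

No densitometry readings are given in this section, so the second function is shown without example values; it would hold only where the film response is linear over the range being compared.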
retrotransposon (M27676, 331 bp), the 18S rRNA gene of S. mansoni (M62652, 1,739 bp), and the cDNA encoding S. mansoni cathepsin D protease (U60995, 1,285 bp) returned 578, 1, and 0 hits, respectively (Table 1). Since gene copy numbers have been estimated for all these query sequences (SR1, 200 to 2,000 copies; SR2, 1,000 to 10,000 copies; Smα, 7,000 to 10,000 copies; 18S rRNA gene, 100 copies; and cathepsin D, one or a few copies [19, 20, 57, 60, 67]), and since the hit value for Boudicca was intermediate between those for SR2 and Smα, we suggest that there may be as many as 1,000 to 10,000 copies of this LTR retrotransposon interspersed throughout the genome of S. mansoni.
TABLE 1. Estimate of gene copy number of the Boudicca LTR retrotransposon in the genome of S. mansoni
1,200 copies per haploid genome (not shown). Together, these three approaches, all of which provided concordant estimates, confirmed that Boudicca was present at high copy number, between 1,000 and 10,000 copies per haploid genome.
Boudicca is transcribed in developmental stages of the schistosome. RT-PCR targeting mRNA from the developmental stages of S. mansoni revealed that ORF2 sequences of Boudicca were expressed in all stages examined, mixed-sex adults, cercariae, and sporocysts (Fig. 8, panels A and B). This result was confirmed with a nested PCR strategy to reamplify amplicons of 447 bp from the first-round PCR with Boudicca-specific primers. With nested primers targeting a shorter region within the region of ORF2 targeted in the first-round PCR, the expected nested PCR product of 183 bp was obtained for cercariae, sporocysts, and mixed-sex adult stages of the schistosome, indicating that mRNAs for the entire retrotransposon were transcribed in all three of the developmental stages examined. In addition, RT-PCR with a forward primer targeting a site in ORF1 and reverse primer targeting a site in ORF2 produced a band of the expected size, 1,571 bp, indicating that mRNAs for the entire retrotransposon were transcribed in schistosomes. The control RT-PCR showed that cytochrome oxidase was also expressed in all developmental stages tested, with the expected product of 342 bp, verifying the integrity of the schistosome mRNA preparations examined in this study (Fig. 8C) (49). Omission of either the template cDNA or exogenous reverse transcriptase resulted in the absence of detectable amplicons, ruling out contamination of cDNAs with genomic DNAs (Fig. 8).
5.8 kb, and high copy number, estimated at over 1,000, Boudicca can be expected to have contributed substantially to the genome size of S. mansoni, perhaps constituting as much as 4% or more of it (5.8 kb x 1,000 = 5.8 Mb). Moreover, it will likely have influenced the evolution of the genome of this schistosome through the mutagenic action of its movement, its influence on expression of adjacent genes, and its effects on chromosomal recombination (10, 51). Although the Boudicca copy located in BAC clone 53-J-5 was clearly degenerate and inactive, it was possible to assemble a consensus sequence of a form of Boudicca likely to be active based on sequences of several of the large number of fragments of genomic copies of Boudicca that have been partially sequenced and are available in the public domain. Verification that the consensus Boudicca represents an ostensibly active retrotransposon may have to await the generation of the entire genome sequence of S. mansoni, a task that is now claiming the attention of a number of genome sequencing labs (24). Retrotransposons with degraded sequences may also be capable of functioning to a limited extent, since one copy can be transcribed and reinserted with functional proteins produced by another copy by a process of transcomplementation (2, 9, 65).
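The contribution of Boudicca to genome size can be tabulated for the range of copy numbers reported in this study. This is a rough sketch that assumes every copy is full length (5.8 kb), which overstates the contribution of truncated copies, and that takes the 270-Mb haploid genome size from the legend to Fig. 7; `genome_fraction` is a hypothetical helper name.

```python
ELEMENT_LENGTH_BP = 5_800        # approximate full-length Boudicca
HAPLOID_GENOME_BP = 270_000_000  # S. mansoni haploid genome (Fig. 7 legend)


def genome_fraction(copy_number, element_bp=ELEMENT_LENGTH_BP,
                    genome_bp=HAPLOID_GENOME_BP):
    """Fraction of the haploid genome occupied by full-length copies."""
    return copy_number * element_bp / genome_bp


for copies in (1_000, 2_000, 3_000, 10_000):
    print(f"{copies:>6} copies: {genome_fraction(copies):.1%}")
```

On these assumptions, 1,000 copies (5.8 Mb) amount to about 2% of the genome, and roughly 2,000 copies, the lower end of the Southern hybridization estimate, correspond to about 4%.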
Although Ty3/gypsy-like, Boudicca is closely related to kabuki and CsRn1. Phylogenetic analysis focused on the reverse transcriptase domain confirmed that Boudicca was a Ty3/gypsy-like LTR retrotransposon. Within the Ty3/gypsy assemblage, its closest relatives were kabuki from the silk moth B. mori and CsRn1 from the Oriental liver fluke C. sinensis. In addition to the close identity to kabuki and CsRn1 revealed in the phylogenetic analysis, Boudicca, CsRn1, and kabuki shared structural similarities in the Gag protein that differentiated this group from other gypsy-like elements. Boudicca, CsRn1, and kabuki have a CHCC zinc finger Cys-His box motif at the COOH terminus of Gag that is dissimilar to the more usual CCHC motif reported in the nucleocapsid proteins encoded by gag in other LTR retrotransposons and in retroviruses (4, 30, 55).
In general, the short branch lengths of the phylogenetic tree reflect the close identity among members of the Ty3/gypsy family and the clear placement of Boudicca within this group. The fact that elements in this group bear a close relationship to each other while colonizing hosts that are phylogenetically distant suggests that these retroelements may have spread by horizontal transmission. Interestingly, this group of elements also appears to be more closely related to vertebrate retroviruses than to the other family of LTR retrotransposons, the Ty1/copia group. Also of interest is that Boudicca is more closely related to insect and liver fluke retrotransposons than to the other full-length, characterized schistosome LTR retrotransposon, Gulliver from Schistosoma japonicum (36). Therefore, it is apparent that the schistosome LTR retrotransposons Boudicca and Gulliver are discrete elements that are unlikely to have evolved vertically from a common ancestor. Furthermore, examination of other reverse transcriptase-encoding sequences in the schistosome genome verified the presence of additional retrotransposons (25).
Envelope protein? Despite its phylogenetic identity as revealed by the reverse transcriptase alignments, Boudicca may have an important structural difference that separates it from kabuki and CsRn1, a difference that would liken it to the errantivirus group of insect LTR retrotransposons such as gypsy, Tom, and ZAM and to Tas of Ascaris lumbricoides (22). Specifically, Boudicca appears to have a third ORF that may encode an envelope protein of about 150 amino acid residues that includes several key structural motifs, a signal peptide, a transmembrane domain, and disulfide bridges, features that are characteristic of envelope proteins from other retroelements (48). The envelope protein is the key structural component that allows retroviruses and errantiviruses to function as transmissible extracellular particles because it allows them to enter new host cells via binding to membrane receptors that mediate uptake of the virus by the next host cell (32). Furthermore, through this interaction with cell surface receptors at the point of host cell entry and infection, the envelope protein confers host cell and species specificity on the retrovirus. In addition to other aspects of the molecular biology of Boudicca, we plan to more fully characterize this putative envelope protein in future studies.
Boudicca is actively transcribed. Whereas the contiguous copy of Boudicca present in the 53-J-5 BAC clone has clearly been degraded by a number of mutations, probably to an inactive element, an RT-PCR experiment demonstrated active transcription from Boudicca copies. Moreover, the transcripts were detected in all three developmental stages examined: adults, cercariae, and sporocysts. This suggests that Boudicca may be transposing within the genome of S. mansoni at this point in evolutionary time. Other recent reports have also suggested the activity of reverse transcriptase enzymes and/or mobilization of retrotransposable elements in schistosomes (29, 36). Given the phylogenetic proximity of Boudicca to the errantiviruses and the possible presence of an envelope protein, Boudicca may be capable of both vertical and horizontal transmission. Finally, as Boudicca is an endogenous retrotransposon, it is feasible that the element or its structural components could be harnessed for use in transgenesis of schistosomes or other parasitic helminths.
This work was supported by the Deutscher Akademischer Austauschdienst in the form of a Short-Term Research Fellowship awarded to C.S.C., by research grant number KA 866/2-1 from the Deutsche Forschungsgemeinschaft to B.H.K., by the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Disease (TDR), No. 990525, to D.L.W., and by a Wellcome Trust Beowulf Genomics Project Grant to D.A.J. and A.C.I. (for the sequencing of clone 53-J-5). P.J.B. is a recipient of a Burroughs Wellcome Fund scholar award in Molecular Parasitology. The Interdisciplinary Program in Molecular and Cellular Biology and the Center for Infectious Diseases, Tulane University, both supported this study, as did the Department of Molecular Parasitology, Institute for Biology, Humboldt University Berlin.