Journal of Virology, May 2005, p. 6325-6337, Vol. 79, No. 10
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.10.6325-6337.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Section of Virology, Department of Medical Sciences, Uppsala University, SE-751 85 Uppsala, Sweden,1 Unit of Physiology, Department of Neuroscience, Uppsala University, SE-751 23 Uppsala, Sweden2
Received 19 October 2004/ Accepted 16 January 2005
The proviral structure consists mainly of 5' LTR-gag-pro-pol-env-3' LTR, where the long terminal repeats (LTRs) are identical at the integration event. They are built from unique 3' and 5' (U3 and U5) sequences separated by a repeat segment (R). The group-specific antigen (Gag) includes the matrix (MA), capsid (CA), and nucleocapsid (NC) proteins. The protease gene (pro) is located between the gag and the polymerase gene (pol), which contains reverse transcriptase (RT), RNaseH, and integrase (IN) domains. The envelope gene (env) consists of the surface unit (SU), with a signal peptide (SP) located at the 5' end and the downstream transmembrane unit (TM). The PBS is situated between the 5' LTR and gag, while the polypurine tract (PPT) is located between env and the 3' LTR. Most HERVs hold defective genomes, but some HERVs have the potential to produce proteins and thus have possible physiological functions. However, HERV proteins have not yet been identified using definitive methods such as N-terminal sequencing or mass spectrometry. The HERV-W envelope protein (Env), identical to the human protein Syncytin, has the capacity to fuse cytotrophoblast to syncytiotrophoblast in vitro and may thus be important in human placental morphogenesis during pregnancy (40). The TM proteins of many, but not all, gammaretroviruses reportedly have an immunosuppressive unit (ISU) consisting of 17 conserved amino acids (aa) (CKS-17) (for a review, see reference 14). Although the exact mechanism is still not known, gammaretroviral TMs have been shown to promote evasion from anti-tumor cytotoxicity (38, 39). Proviral sequences that do not encode proteins could also have cellular effects if the LTRs serve as promoters for gene expression. Amylase production in human parotid glands became possible after an integration of an HERV-E next to the amylase gene (48, 58).
HERVs have also been implicated in several diseases, such as multiple sclerosis (12, 44) and schizophrenia (26), and in different cancers (7, 50, 51). However, this is still speculative. It is of importance to map these proviral structures in order to understand their functions. To study the sequence variability among HERV-H proviruses, we collected a number of representative full-length HERV-H sequences, aligned them, and constructed a consensus HERV-H, likely to be more similar to an "ancestral HERV-H sequence." Although there may have been several integration events and ancestral HERV-H variants, phylogenetic analysis of the HERV-H group (24) and the computer constructed near open reading frame (ORF) consensus presented here is compatible with the hypothesis of a common ancestral HERV-H. We analyzed the genes and the LTRs in detail with respect to sequence length, position in the viral genome, detectable conserved consensus motifs, and splicing patterns using bioinformatic tools and database searches. Features in the HERV-H consensus were then used in phylogenetic analyses.
TABLE 1. Consensus motifs recognized by RetroTector
Phylogenetic analyses of the HERV-H-like sequences used in alignment were conducted in ClustalX (version 1.83) (56) with default settings. An unrooted neighbor-joining (NJ) tree for the HERV-H sequences was produced using the manually adjusted alignment from BioEdit (see above). Bootstrappings of the NJ trees were conducted in 1,000 replications using ClustalX (1.83) and visualized in TreeView (version 1.6.6) (42).
Splice predictions in the putative HERV-H proviral sequence were conducted using NetGene2 at http://www.cbs.dtu.dk/services/NetGene2/ (9) and the NNSPLICE 0.9 at http://www.fruitfly.org/seq_tools/splice.html. The predicted splice donor (SD) and splice acceptor (SA) sites were analyzed in a Gene2EST search (http://woody.embl-heidelberg.de/gene2est/) (20) (with the RepeatMasker disabled) using the putative HERV-H consensus sequence that ranged over the R region of the 5'LTR and over the U3 region in the 3'LTR.
HERV-H env (chromosome 2q24.3, positions 642 through 2396 in AF108843) was used in screening the expressed sequence tag (EST) database at GenBank (dbEST release 061402 containing 4,458,530 sequences) for expression of the entire env gene. A new search in the same database using HERV-H env TM protein (AF108843, positions 1857 through 2396) was performed to determine whether the SU, TM, or both were expressed. The criterion for inclusion was at least 80% sequence identity over at least 100 nucleotides.
Laboratory procedures. Polyclonal antisera against HERV-H SU (peptide 1437, PPEELIYFLDRSSKTSPDIS, preimmunization serum rabbit 430 and serum VU800) were raised in rabbits. The tetrameric multiantigenic peptide (based on a backbone of β-alanine and two lysines) was purified by reverse-phase high performance liquid chromatography (HPLC) on a C18 column to >95% purity and characterized using mass spectrometry.
Human placenta from an anonymous donor was acquired fresh from the Uppsala Academic Hospital (with ethical committee approval). Samples were homogenized in a denaturing buffer (8 M urea, 50 mM phosphate, 50 mM β-mercaptoethanol, 1% [vol/vol] Tween 20, pH 6.0) using an Ultra-Turrax 18 (IKA Works Inc., Wilmington, NC). The extract was centrifuged at 4°C and 16,000 × g for 4 min, and the supernatant was collected. The placenta homogenate supernatant was subjected to cation exchange chromatography (loading buffer, pH 6.0, 8 M urea-50 mM phosphate-50 mM β-mercaptoethanol-0.1% [vol/vol] Tween 20; eluent, pH 6.0, 8 M urea-50 mM phosphate-50 mM β-mercaptoethanol-0.1% [vol/vol] Tween 20-1 M NaCl; column, HiPrep 16/10 SP FF [Amersham Pharmacia Biotech, Uppsala, Sweden]; run with 0 M to 1 M NaCl over 40 min) using a Shimadzu LC8 HPLC.
Sodium dodecyl sulfate-polyacrylamide gel electrophoresis and Western blots on the protein fractions were run on a Phast System (Pharmacia, Uppsala, Sweden) using dedicated precast gradient gels (10 to 15% acrylamide) and blotted onto polyvinylidene difluoride membranes. Blocking was conducted using 0.20 (vol/vol) Tween 20. Alkaline phosphatase-conjugated goat anti-rabbit (Sigma, A3812) antibody (Ab) was used as secondary Ab to detect the primary rabbit anti-HERV-H SU and TM Abs.
TABLE 2. Eighteen full-length HERV-H provirus sequences
FIG. 1. Left panel: NJ tree (1,000 bootstraps) of HERV-H-like and reference gammaretrovirus pol sequences. Right panel: outlined NJ tree (1,000 bootstraps) of full-length HERV-H-like elements used in HERV-H consensus construct and the HERV-H consensus which positioned close to the center. Encircled sequences have a 344-nt segment in gag covering the 5' end of the conserved CA1 consensus motif, which is not present in the other sequences. Sequence comparison for the dendrogram was conducted by neighbor-joining with the Kimura two-parameter model and 500 bootstraps. Sequence comparisons were independent of the different gag lengths through pairwise deletions in the analysis.
FIG. 2. Three reading frames and alternative splicing in the HERV-H consensus sequence as interpreted by RetroTector (Sperber and Blomberg, unpublished) and by EST searches. Proviral genes with outlined names of conserved consensus motifs are presented as black bars below each reading frame. Alternative splice patterns are outlined with their corresponding SD and SA sites.
FIG. 3. The HERV-H consensus sequence likely to represent an "original" provirus. The proviral genes are noted to the left of each panel and with arrows over the nucleotide sequence. Amino acid sequences are presented under the respective gene, and their motifs are presented within boxes. SD and SA are presented over the nucleotide sequence (compare Fig. 2).
TABLE 3. Nucleotide frequencies of other viruses
The LTRs of the 18 HERV-H-like sequences were scanned for potential TF binding sites with the help of the ConSite program (http://mordor.cgb.ki.se/cgi-bin/CONSITE/consite). The search showed numerous potential TF binding sites at an 85% identity cutoff, but no clear conservation, probably due to the poor analysis specificity and the several loci studied. A more detailed analysis of these binding sites was out of scope for this study.
PBS and vicinity. The PBS (456 through 473) (Fig. 3) had a typical sequence (TGGTGCCGTGACTCGGAT) complementary to His-tRNA. Variation in PBS amounted to 21 substitutions at eight sites (underlined in the HERV-H consensus PBS above) among the 18 collected HERV-H sequences when compared to the HERV-H consensus (supplemental data). According to the manually adjusted alignment and the RetroTector prediction, an insertion (compared to a normal gammaretroviral sequence [15]) was observed just 5' of gag in the consensus sequence and was present in all of the 18 collected HERV-H-like elements. As shown below, the predicted HERV-H Gag contained hallmarks which identified its extent, from a typical Gag start to the zinc finger motifs in NC, with reasonable certainty. We are therefore confident that this region, an unusually long 5' leader sequence with an ORF here referred to as "pre-gag" which precedes the traditional gag, is separate from it and is discussed below. The 5' leader also had an ORF that started within the 5' LTR region in frame 3 (nucleotides 294 through 1019). There was no known or predicted protein with this sequence in GenBank. However, in a nucleotide BLAST of the nonredundant database of GenBank, the region was found to share similarity with the HERV-H part of an intergenic splice transcript of PLA2L (PLA2 like, accession no. Z14310) that initiates in HERV-H 5' LTR and splices to genes downstream of the provirus (at chromosome 8q24.1-3) to form a fusion transcript together with the two normally independently expressed HHLA1 (HERV-H LTR associated gene) and Otoconin-90 (18, 30, 60). The similarity was found in frame 1 of the HERV-H consensus ranging over the nucleotides 448 through 504 and continued in frame 2 for nucleotides 506 through 598 (Fig. 2). The intergenic splice transcript most probably uses the major splice donor at corresponding nucleotide position 600 in the HERV-H consensus. 
This position had a high likelihood score in the NetGene2 and the NNSPLICE 0.9 splice prediction programs (Fig. 2). It has also been found in major HERV-H transcripts (33).
As mentioned above, the 5' leader region also contained an ORF, "pre-gag" in frame 1, that started at nucleotide 904 and continued into the gag ORF (frame 1 with start at nucleotide 1255) (Fig. 3). This 117-amino acid N-terminal proline- and serine-rich "elongation" of the predicted Gag had three repeated leucine zipper motifs (consensus, L-x(6)-L-x(6)-L-x(6)-L, ranging over the nucleotides 1060 through 1125, 1081 through 1146, and 1102 through 1167) (data not shown). The proline content was 21/117 amino acids in the HERV-H consensus and 18/84 in MLV pp12, whereas the serine content was 19/117 in HERV-H and 8/84 in MLV pp12. Both prolines and serines were repeated in heptads (supplemental data). A BLAST search in GenBank with this sequence did not yield any known or predicted protein, nor were there logically positioned splice donor/acceptor sites or ESTs to indicate a separate "pre-gag" transcript, which would encode such a protein. The HERV-H 5' leader region was analyzed with respect to secondary structure using the Mfold web server (65) and compared for similarities with the MLV (NC_001501). MLV has earlier been shown to utilize a mechanism involving an internal ribosomal entry site (IRES) situated in the 5' leader region (4). Although multiple hairpins were predicted in HERV-H "pre-gag," we could not demonstrate apparent similarities in folding structure and possible IRES formation.
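The leucine zipper consensus quoted above, L-x(6)-L-x(6)-L-x(6)-L, spans 22 residues (66 nucleotides), and three matches offset by 21 nucleotides correspond to six leucines spaced seven residues apart. A minimal Python sketch of such a scan follows; the function name and the toy peptide are hypothetical illustrations, not the HERV-H "pre-gag" sequence or the tool used in the study.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L as a regular expression. The lookahead wrapper
# lets overlapping occurrences be reported individually.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return the 0-based start offset of every (possibly overlapping) match."""
    return [m.start() for m in LEUCINE_ZIPPER.finditer(protein)]


# Hypothetical toy peptide: six leucines, each seven residues apart.
toy = "LAAAAAA" * 5 + "L"
starts = find_leucine_zippers(toy)

# Residue offsets of 0, 7 and 14 correspond to nucleotide offsets of
# 0, 21 and 42, the same spacing as the three motifs reported above.
nt_offsets = [3 * s for s in starts]
```

Multiplying a residue offset by three and adding the nucleotide position of the first codon converts a hit back to consensus coordinates.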
gag. The gag (nucleotides 1255 through 2850, frame 1) (Fig. 3) was recognized by six motifs with corresponding nucleotide positions in the HERV-H consensus: MA (MA1, 1255 through 1272), CA (CA0, 1852 through 1941; CA1, 2254 through 2319; and CA2, 2389 through 2421), and NC (NC1, 2644 through 2685, and NC2, 2713 through 2751). As mentioned, the N-terminal sequence of Gag MA was predicted to start with "MGNL..." where the G should be a myristylation site, needed for virion assembly (22, 47). The assigned start methionine conforms to the Kozak consensus motif for translational start (32). In the MA-CA region, a PPPY motif, which in several retroviruses encodes a so-called "late" function (63), occurred at nucleotide positions 1729 through 1740, 159 amino acids from the N-terminal of Gag. This "late" motif has been shown to interact with the Nedd4 family of ubiquitin ligases and is required for the budding process in Rous sarcoma virus (RSV p2b) (28). MLV pp12 (NP_955584), which contains the MLV "late" motif, showed a weak similarity to the 5' extended HERV-H "pre-Gag" in an alignment (supplemental data). The 18 aligned HERV-H sequences could also be divided into two groups on the basis of Gag structure, where one group missed a segment corresponding to the nucleotide positions 1933 through 2277 in the consensus sequence (Fig. 3; see alignment in supplemental data) and thus partly in the 5' end of the major homology region (MHR) (see below), in the CA1 consensus motif. Interestingly, the sequences with "complete" gag, which lacked the 344-nucleotide deletion, grouped together in a full-length provirus unrooted dendrogram (Fig. 1). Remaining after the deletion was an almost intact C-terminal domain (CTD) of CA.
The N-terminal domain (NTD) of the related gammaretroviral MLV CA (also largely similar to human immunodeficiency virus [HIV], Rous sarcoma virus [RSV], and human T-cell leukemia virus [HTLV]) was recently shown by X-ray structure to be involved in Gag assembly (41). Among the 926 HERV-H pol-containing elements in the genome, the HERV-H consensus pol grouped within the RGH2-like subgroup (see supplemental data and Jern et al. [24]). The full HERV-H-like sequence dendrogram was independent of the 344-nucleotide deletion, since pairwise deletions were used in the analysis. Twenty amino acids in the conserved MHR (...TTQGKDKNPAQFMARLAATL...), compared to the consensus of Benit et al. (2) in Fig. 4, were located in the CA 336 amino acids from the N-terminal of Gag and started approximately at nucleotide position 2260 in the HERV-H consensus. The start of the MHR was missing in 12 of the selected HERV-H sequences, since it was located in the 344-nucleotide deletion described above (Fig. 4). The HERV-H elements in hg16 at chromosome 19 position 20068590 and chromosome 10 position 104629785 had the most complete gag genes, each with just 1 shift and 1 stop (supplemental data). In principle, truncated Gag proteins could be expressed from chromosome 19 at position 20068590, chromosome 10 at position 104629785, chromosome 7 at position 106030611, and chromosome 16 at position 9738198.
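The residue positions quoted for the PPPY motif and the MHR follow from the nucleotide coordinates and the gag start at nucleotide 1255. The sketch below shows that arithmetic under the assumption that "N amino acids from the N-terminal" denotes a 1-based residue number; the helper name is hypothetical.

```python
GAG_START = 1255  # first nucleotide of the gag ORF in the HERV-H consensus


def residue_number(nt_pos, orf_start=GAG_START):
    """1-based residue number of the codon that contains nt_pos.

    Both coordinates are 1-based nucleotide positions in the same frame.
    """
    if nt_pos < orf_start:
        raise ValueError("position lies upstream of the ORF start")
    return (nt_pos - orf_start) // 3 + 1


pppy_residue = residue_number(1729)   # PPPY motif, nucleotides 1729 through 1740
mhr_residue = residue_number(2260)    # approximate start of the MHR
pppy_length = (1740 - 1729 + 1) // 3  # number of codons spanned by the motif
```

With these inputs the helper gives 159 for PPPY and 336 for the MHR start, in agreement with the figures in the text, and the PPPY span covers four codons.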
FIG. 4. Gag alignment ranging from the MHR to the zinc finger motifs for HERV-H consensus and other gamma-HERV-W, -E, -T, -GALV, -MLV, and -FeLV together with two beta-MMTV and -MPMV, one epsilonretrovirus (Xen1), and the more distant retrotransposons (Gypsy and Copia). MHR consensus ( , aromatic; O, hydrophobic amino acids; invariants are Q, E, and R) was adapted from Benit et al. (2). The HERV-H sequences share the properties of double zinc finger motifs missing in the other gammaretroviruses that in contrast display an increased density of charged residues in the charged segment.
pro. The pro (start at nt 2911 extending to approximately nucleotide 3267, frame 1) (Fig. 3) contained two detectable conserved consensus motifs (PR2: 2944 through 2973 and PR3: 3133 through 3165), located in the same reading frame but separated from gag by several stop codons (the first was more conserved than the following two) (see alignment in supplemental data). Thus, an ancestral HERV-H probably had one stop, which is normal for gammaretroviruses (15), despite this data set. The pro structure was similar to that of other gammaretroviral proteases (data not shown). The HERV-H consensus had a noncanonical NTE motif instead of the normal DTG or DSG in the active site (Fig. 3, PR2; see alignment in supplemental data). Consequently, it was present in a majority of the HERV-H Pro puteins (data not shown). However, a small subset of HERV-H had DTG. Thus, HERV-H may have gone through an evolution in this motif. An analysis of betaretroviral Pro puteins resulted in DTD, DTG, and DSG variants (data not shown).
As expected, pro was not separated by stops and frameshifts from the downstream pol. This arrangement of gag and pro-pol ORFs has been described as typical for gammaretroviruses with MLV as a model (15). HERV-H-like proviruses in general had few stops and frameshifts in pro. The pro was found to be the only gene that exists in many completely open forms (see selected alignment in supplemental data). In 48 of 926 HERV-H-like proviruses of the human genome (version hg15), pro had an ORF and was, on average, 300 nucleotides long. However, the pro genes had approximately the same ratio between number of stops and shifts and sequence length as the gag, pol, and env genes (data not shown). This does not support a selection for open pro, nor does it rule out the possibility for pro ORF maintenance selection in a specific locus or loci.
pol. The pol (nucleotides approximately 3268 through 6369, frame 1) (Fig. 3) was located just 3' of pro without stops separating the genes in accordance with the typical gammaretroviral genome organization (see above). There were 11 conserved consensus motifs recognized by RetroTector in both RT (RT1, 3457 through 3486; RT2, 3613 through 3654; RT3, 3718 through 3762; RT4, 3826 through 3855; and RT5, 3919 through 3948) and IN (IN2, 5329 through 5355; IN3, 5425 through 5457; IN4, 5527 through 5550; IN5, 5707 through 5742; IN6, 5791 through 5838; and IN7, 6169 through 6216) (Table 1; Fig. 3; supplemental data). An RNaseH motif could also be detected, with the help of alignments and comparison of the RNaseH consensus derived from 1,605 RNaseH sequences in Pfam (DG-(38 aa)-E-(20 aa)-DS-(64 aa)-N-(3 aa)-D; PF00075, http://www.sanger.ac.uk/Software/Pfam/), just 3' of RT and ranging approximately over the nucleotide positions 4721 through 4898. The predicted HERV-H consensus Pol protein was similar to other gammaretroviral Pol in its internal RT-RNaseH-IN arrangement and intermotif distances as detected with RetroTector. The characteristic HHCC zinc finger motif and the DD35E catalytic domains (15) were present, as implemented in the RetroTector Pol analysis.
The majority of HERV-H elements had a clear GPY/F domain (explained in discussion), detected as an "IN7" motif by RetroTector (alignment in supplemental data). The deduced GPY/F domain of HERV-H aligned well with the known GPY/F domains of other retroviral elements (29, 36). Despite numerous pol sequences detected by RetroTector, we did not find complete ORFs (see supplemental data).
env. env (6668 through 8419, frame 2) (Fig. 3) was located 3' of pol and was preceded by a splice acceptor signal in pol at position 6097 (CCTACTCCAG\ATCCCCAGCC). Further, it contained three detectable motifs: SU3, 7823 through 7867; TM3, 8027 through 8092; and TM5, 8267 through 8302. A von Heijne SP could be recognized in the 5' of SU (PSNTSTLMKFYSLLLYSLLFSFPFL, using tools at http://www.cbs.dtu.dk/services/SignalP/) and ranged over nucleotide positions 6690 through 6767. Separating the SU from TM, a furin cleavage site, RQKR, was detected at nucleotide positions 7817 through 7828. In exogenous retroviruses, the cleavage (/) occurs after R in the consensus sequence RX(R/K)R/. The ISU (also noted as the CKS17 motif) with underlined consensus, LQNRRGLDLLTAEKGGLCIF, could be detected in the TM and ranged from nucleotide positions 8027 through 8086. N-Glycosylation sites were found starting at nucleotide positions 6698, 6809, 7334, 7463, 7517, 7724, 7778, 8117, and 8348. The HERV-H consensus aligned well to the three env ORFs (17), extracted with RetroTector from the human genome (supplemental data).
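The furin consensus RX(R/K)R and the N-glycosylation sites lend themselves to simple pattern scans. The following Python sketch is an illustration only: the N-X-S/T sequon (X not proline) is the standard textbook rule rather than something stated in this article, the function names are hypothetical, and the toy peptide is invented apart from the RQKR site itself.

```python
import re

# Furin consensus from the text: R-X-(R/K)-R, with cleavage after the last R.
FURIN = re.compile(r"R.[RK]R")

# Assumed N-glycosylation sequon: N-X-S/T with X not proline. The lookahead
# reports overlapping candidates.
SEQUON = re.compile(r"(?=N[^P][ST])")


def furin_cleavage_points(protein):
    """0-based index of the residue that follows each predicted cleavage."""
    return [m.end() for m in FURIN.finditer(protein)]


def glycosylation_sites(protein):
    """0-based index of each asparagine that starts a sequon."""
    return [m.start() for m in SEQUON.finditer(protein)]


# Hypothetical toy peptide containing the RQKR site reported for HERV-H Env.
toy = "MANTSAARQKRSV"
cuts = furin_cleavage_points(toy)
sites = glycosylation_sites(toy)
```

A pattern hit only flags a candidate site; whether it is cleaved or glycosylated depends on context that a regular expression cannot capture.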
Further, the PPT (AAGAAGGCAGGA) was recognized just upstream of the 3' LTR and ranged from nucleotide positions 8572 through 8584 (Fig. 3).
Alternative splicing and ESTs. In the putative HERV-H consensus, numerous SD sites and even more SA sites were predicted. To analyze the predicted SD and SA sites in the putative HERV-H consensus proviral sequence, we performed a search for ESTs using the Gene2EST (20) and the dbEST at National Center for Biotechnology Information. The major SD site was found at the predicted nucleotide position 600, not far downstream of the PBS (Fig. 2; Fig. 3, sequence). Another SD (33, 61) was confirmed at nucleotide position 2927 at the border of gag and pro. An SA site (33, 61) was confirmed at nucleotide position 2824. Further, the major SA for env transcripts (33) was recognized within pol at nucleotide 6097. Splicing in this region was observed in ESTs, but the exact position of this SA could not be confirmed among cDNA in the dbEST. Another previously shown SA was found 9 nucleotides upstream of the 3' LTR at position 8574 and another SA located 5 nucleotides into the 3' LTR at position 8590 (61). Neither additional SDs nor additional SAs were confirmed by ESTs. EST confirmations of predicted SD and SA did not always result in identical positions. The observations of ESTs were that splicing occurred in a region plus/minus a few nucleotides from the predicted position.
To analyze the expression of env, we searched the dbEST at GenBank (with HERV-H env AF108843, positions 642 through 2396, which share 98% identity with the HERV-H consensus env). Removal of sequences that were <80% identical to the query sequence over a segment of at least 100 nt resulted in a selection of 129 hits, and in a BLAT search, we noted respective loci for expression (Table 4). As expected, there were generally more hits in a search with the less variable TM than with SU. Colon tumor cDNA libraries generated the most hits (17 hits), exclusively for the TM. These differences may be due to the different cDNA synthesis and cloning strategies in different cDNA libraries. The env-containing ESTs were short, and none of them encompassed the env splice acceptor site, which thus could not be confirmed in this way.
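The inclusion rule for EST hits (at least 80% identity over at least 100 nucleotides) can be expressed as a small filter. This is a sketch under stated assumptions: the `Hit` record, its field names, and the example hits are hypothetical and do not reproduce the actual dbEST output or the 129 hits reported here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one EST alignment."""
    est_id: str
    identity_pct: float  # percent identity over the aligned segment
    aligned_len: int     # aligned segment length in nucleotides


def passes_criterion(hit, min_identity=80.0, min_len=100):
    """Keep hits with >= 80% identity over >= 100 nucleotides."""
    return hit.identity_pct >= min_identity and hit.aligned_len >= min_len


# Invented example hits, for illustration only.
hits = [
    Hit("est_a", 97.5, 320),  # kept
    Hit("est_b", 79.9, 450),  # rejected: identity too low
    Hit("est_c", 92.0, 85),   # rejected: segment too short
    Hit("est_d", 80.0, 100),  # kept: exactly at both thresholds
]
kept = [h.est_id for h in hits if passes_criterion(h)]
```

Treating both thresholds as inclusive matches the "at least" wording of the criterion.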
TABLE 4. Extract of dbEST search results; tissues with deviating expression patterns between SU and TM
Consensus sequences have been used successfully for other retroelements, e.g., long interspersed nuclear elements (55). However, that exhaustive analysis was conducted on sequences less than 1 kb. Such a large-scale alignment-based analysis of full-length retroviruses (approximately 9 kb) is not feasible. Therefore, we chose full-length sequences scattered over the HERV-H tree so that the data set used here is representative (see the cladogram in the supplemental data and Jern et al. [24]). The HERV-H group is separate from other gammaretroviruses (Fig. 1, left panel). A codon-guided alignment is important to judge the effect of selected (preintegrational) and neutral (postintegrational, in absence of physiological or pathogenic function) substitutions. This was to a large degree achieved by the putein-guided minor adjustments of the computer-generated consensus (Fig. 3 and alignment in supplemental data).
The markedly skewed nucleotide frequency in HERV-H disfavoring guanine and favoring cytidine (Table 3) ought to have a functional explanation. In HIV, the G-to-A hypermutation is caused by encapsidation of the host enzyme APOBEC3G, a cytidine deaminase (37), which converts C to U in the antisense strand. This is likely to be a host defense measure against retroviruses (59), and gives HIV an excess of A. In fact, all lentiviruses have an excess of A (Table 3). Skewed nucleotide distributions also occur in the deltaretroviruses (HTLV and bovine leukemia virus), which are rich in C and in betaretroviruses [like HERV-K(HML-2) and MMTV], which, like HIV, are rich in A (3). Gammaretroviruses (like MLV and FeLV) do not have a marked overrepresentation of any nucleotide (Table 3). The mechanisms involved in the skewed distributions of nonlentiviral retroviruses are not known. Our observation indicates that exogenous HERV-H had a hypermutation mechanism which depleted guanine and enriched for cytidine. It is more pronounced than in HTLV. Since guanine is a purine, while cytidine is a pyrimidine, it is difficult to envisage a simple chemical conversion, which leads from one to the other, like the deamination catalyzed by APOBEC3G. However, regardless of mechanism, the skewed nucleotide frequency of HERV-H-related proviruses is an additional and independent tool for the classification of ERVs. The confinement of an extraordinarily low G/C ratio to the 926 HERV-H and most of the 106 adjacent HERV-H proviruses (Table 3) segregates these clades from the rest of the gammaretroviruses.
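The kind of nucleotide frequency comparison summarized in Table 3 can be sketched in a few lines of Python. The function names and the toy sequence below are hypothetical; the values are not taken from Table 3.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def g_to_c_ratio(seq):
    """Ratio of G to C frequency; a low value indicates a C-rich, G-poor strand."""
    freqs = nucleotide_frequencies(seq)
    if freqs["C"] == 0:
        raise ValueError("sequence contains no C")
    return freqs["G"] / freqs["C"]


# Invented toy sequence: five C, two A, two T, one G.
toy = "CCCCATTAGC"
freqs = nucleotide_frequencies(toy)
ratio = g_to_c_ratio(toy)
```

Ambiguity codes and gap characters are ignored so that the frequencies sum to one over A, C, G and T only.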
The HERV-H consensus LTRs differed by 1.1% and thus indicated proximity to the proviral integration event where the LTRs would be identical. Random, postintegrational mutations were most probably evened out in the consensus LTRs. However, this difference may also represent the ambiguity of the sequence and roughly indicates the accuracy of the consensus. The LTR structures were bona fide according to identified TF binding sites. Multiple hits in the dbEST also confirmed the start and end of the R region of the 5' LTR, which also verified the RetroTector prediction (data not shown).
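A divergence figure such as the 1.1% between the consensus LTRs is a count of mismatched columns over compared columns in a pairwise alignment. The sketch below assumes two pre-aligned sequences of equal length and skips gapped columns; how gaps were treated in the original analysis is not stated, so that choice is an assumption, as are the function name and the toy data.

```python
def percent_divergence(aligned_a, aligned_b):
    """Percent mismatches between two aligned sequences, ignoring gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap are not scored
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * mismatches / compared


# Invented toy LTR pair: 100 nucleotides with a single substitution.
ltr5 = "ACGT" * 25
ltr3 = "T" + ltr5[1:]
divergence = percent_divergence(ltr5, ltr3)
```

For two LTRs that were identical at integration, this number reflects postintegrational change in both copies combined.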
The 5' leader region between the PBS and the gag (Fig. 2 and 3) may serve several functions and usually contains the packaging signal downstream of the major splice donor site (for a review, see Coffin et al. [15]). However, the 5' leader in the HERV-H consensus was rather long and contained additional features. The 117-amino acid 5' "elongation" of the Gag protein ("pre-Gag") had heptad repeats of leucine, proline, and serine. Heptad serine repeats in RNA polymerase II CTD have several regulatory functions, e.g., initiation of transcription, which is decreased when CTD is phosphorylated (46). This phosphorylation of the serines also prevents transcription by accidental readthrough of upstream stop codons. However, the function of the "pre-Gag" sequence of HERV-H is uncertain.
MLV has earlier been shown to have an alternative gag start (CUG) upstream of the normal gag start (AUG), thus producing a larger "glyco-Gag" (45). In normal initiation of translation, the ribosome, carrying Met-tRNA and initiation factors, binds to the 5' CAP and migrates (scans) along the 5' leader until it encounters an AUG codon. Besides the 5' CAP initiation of translation, the MLV 5' leader sequence has an IRES that is involved in the initiation of translation of both Gag and glyco-Gag precursors (4). In a comparison between the HERV-H and the MLV 5' leaders using the Mfold web server (65), we could not find evidence for a folding similarity. This may be due to the difference in length between the two sequences. However, multiple hairpin structures (ΔG < 70 kcal/mol) were predicted in both sequences. Kozak showed that a ΔG of 50 kcal/mol would, in theory, be sufficient to interfere with the ribosome migration in a CAP-initiated translation (31). This is circumstantial evidence for HERV-H sharing the features of MLV with both CAP- and alternative IRES-initiated translation.
Another ORF was found in the HERV-H consensus ranging from nucleotide positions 294 through 1019. The function of this ORF remains to be shown, but additional ORFs upstream of gag have recently been reported for other proviruses (25). The region upstream of HERV-H gag may also have a physiological function. The HERV-H LTR has been demonstrated to initiate a phospholipase A2-related gene, generating the single-copy gene PLA2L (PLA2-like, accession no. Z14310), which is expressed in human teratocarcinoma cells (18, 30, 59). An interpretation is that PLA2L was derived from a HERV-H that suffered a deletion in that particular region of the 5' leader, resulting in a single ORF instead of separate ORFs in frames 1 and 2 (Fig. 2). The PLA2L intergenic transcript could be a general example of how old retroviruses gain coding potential and how serendipitous generation of physiological functions can take place. In some retroviruses, serine- and proline-rich sequences occur as separate proteins between the MA and CA proteins (MLV, MPMV, MMTV) within the Gag polyprotein and can also contain late domains (19). They are phosphorylated and probably serve regulatory functions (63). HERV-H lacks a pp12, but hypothetically, the proline-rich predicted pre-Gag protein could serve similar functions.
The HTLV Gag CA has two distinct domains, the NTD and CTD (27, 41), where mutations in the CTD, including the conserved 20 amino acids of MHR, are detrimental for virus assembly and mutations in the NTD produce noninfectious viruses. Interestingly, 12 out of the selected 18 HERV-H-like sequences had deletions in the MHR of CTD (Fig. 4). The uniform pattern of the sequences with the 344-nucleotide gag deletion (located together in Fig. 1), including the CTD and thus the MHR, indicated a single event. In fact, most of the 926 HERV-H elements have the deletion in CA. The deletion probably arose earlier than the divergence of RTVLH2 and RGH2 subgroups of HERV-H (cladogram in supplemental data). We postulate that these proviruses needed help from RGH2-like "midwife" elements (24) with a complete gag that could provide a functional CA. The lentiviral conserved "AGPI" and HTLV-1 "AGPL" motifs (27), which bind the cellular cyclophilin A (CypA) protein needed in viral replication, could not be found in the HERV-H CA. Thus, CypA was probably not needed for the infectivity of HERV-H (for a review, see reference 27).
Like betaretroviruses (MMTV and MPMV), epsilonretrovirus (Xen1), gypsy, copia, deltaretrovirus (not shown), and lentiretrovirus (not shown), the HERV-H consensus Gag had two zinc finger motifs (Fig. 4). Normally, gammaretroviruses (e.g., MLV, GaLV, HERV-E,-W, etc.) have only the first of the two motifs, but remnants of a second zinc finger, or a complete one, were found in all of the 18 representative HERV-H-like sequences. We propose that the original Gammaretrovirus ancestor had two finger motifs in its NC protein and lost one during evolution. The lost finger would have been replaced by a section of charged amino acids in the CA-NC region (...KREETPEER... in MLV [Fig. 4]) (11), sometimes referred to as an "electric wire." This evolutionary scenario is supported by the presence of two zinc finger motifs in the Epsilonretrovirus Xen1 (AJ506107), from Xenopus laevis, which branches off at the root of Gammaretrovirus (25). Gammaretroviral HERVs other than HERV-H retain a miscellany of a second zinc finger, concomitantly with a more or less dense accumulation of charged amino acids (Fig. 4) upstream of the first zinc finger. The charged amino acid segment (most developed in MLV) upstream of the zinc finger locates in a region overlapped by the Gag interaction (I) domain which is required for virion formation (8) and may have arisen as compensation for the loss of the second zinc finger. Thus, the human genomic record also provides evidence for this sequence of retroviral evolution.
Particulate fractions from supernatants of multiple sclerosis B cells have been reported to contain HERV-H sequences with fragments of the second zinc finger motif similar to that of RGH2 (13). A database search indicated that they were similar, but not identical, to several HERV-H loci (data not shown). Taken at face value, this could indicate a possibility for RGH2-like "midwife" elements (24) to "break out" (5) under special conditions. A related question is whether the few partial gag ORFs recorded here have any potential for expression of Gag functions. From our data, we do not deduce a packaging-competent HERV-H capsid. However, this possibility should be investigated further.
The stop codons observed in frame between gag and pro (Fig. 3; see alignment in supplemental data) may be a consequence of imperfections during consensus construction, exemplified by the 1.1% error rate estimated through LTR divergence. A consensus sequence will be identical to the ancestral one only if postintegrational mutations are random. If a common inactivating mutation is selected for, it will occur in the consensus. Thus, an ancestral HERV-H probably had one stop, which is normal for gammaretroviruses (15). The NTE protease motif of the consensus may not be representative of the original HERV-H. Its functionality is uncertain. It may be a consequence of the gradual taming of an exogenous HERV-H. (We thank Ronald Swanstrom for pointing this out to us.) The numerous HERV-H-like pro ORFs or near-ORFs in the human genome may simply be due to their short sequences (about 300 nucleotides), which present a smaller target for random mutations. On the other hand, a separately expressed Pro derived from other proviruses such as the betaretroviral RERV-H was reported (64). We here show the presence of a splice acceptor site in pro (see Lindeskog et al. [33]), raising the possibility of similar mechanisms in HERV-H. However, ESTs supporting a separate HERV-H pro expression were not found. The degree of expression and possible physiological functions of the HERV-H protease should be investigated.
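The argument about consensus construction can be made concrete with a small sketch. It assumes a simple column-wise majority rule over pre-aligned sequences, which may differ from the procedure used to build the HERV-H consensus; the ancestral fragment, the copies, and the function name are all hypothetical. Private mutations scattered over different copies are voted out, whereas a mutation shared by most copies is carried into the consensus.

```python
from collections import Counter

def majority_consensus(aligned: list[str]) -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Ties, if any, are resolved in favor of the base seen first in the column.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

# Hypothetical ancestral fragment: ATG GCA CAG TGG TTA
ancestor = "ATGGCACAGTGGTTA"

# Five copies, each with one private mutation at a different position.
random_copies = [
    "ACGGCACAGTGGTTA",
    "ATGGTACAGTGGTTA",
    "ATGGCACGGTGGTTA",
    "ATGGCACAGTAGTTA",
    "ATGGCACAGTGGTCA",
]

# Four of the five copies additionally share a C-to-T change at index 6,
# which turns the CAG codon into a TAG stop codon.
shared_copies = [c[:6] + "T" + c[7:] for c in random_copies[:4]] + [random_copies[4]]

print(majority_consensus(random_copies) == ancestor)  # random changes are voted out
print(majority_consensus(shared_copies))              # shared stop enters the consensus
```

In the second case the consensus contains the stop codon even though the hypothetical ancestor did not, which is the situation described above for a common inactivating mutation.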
Deletions in pol have been observed in the conserved consensus motifs RT4 and RT5 (Table 1) (see Jern et al. [24]). This was also true for 11 of the 18 representative HERV-H-like sequences used in the consensus construct (supplemental data). The deletion in RT4, which encodes the highly conserved YXDD motif used as a target in many PCRs, may thus cause false-negative results in broadly amplifying HERV-H pol PCRs (for a review, see reference 24). The addition of functional modules in the carboxy-terminal part of IN without disturbing the basic integrase can result in extra features like the conserved GPY/F domain (36). To this domain, another "chromo" (chromatin-binding) domain is sometimes appended (36). The integrase carboxy terminus may interact with chromatin via DNA-binding proteins (49, 53). In the distant relatives gypsy and chromovirus, these two domains together direct the integration to more or less specific genomic positions (53). The functional implications of the broad conservation of the GPY/F motif within retroelements remain to be studied. The large number of HERV-H integrations may give hints to the possible integration specificity endowed by the HERV-H Pol carboxy terminus.
Several HERV-H proviruses contain env ORFs (17, 23, 34, 38), and a number of HERV-H env polymorphism studies have been conducted (17, 23). However, unlike HERV-W Env (40), the HERV-H Env has hitherto not been detected in human tissues. In this study, we describe the expression of many HERV-H env-containing ESTs. HERV-H RNA expression was found in several malignant tissues. env was detected mostly in colon tumor libraries, where TM transcripts dominated over those detected with the HERV-H consensus SU. A polyclonal antiserum raised against a HERV-H SU peptide could detect a protein in normal human placenta. The HERV-H env RNA is expressed from several loci (Table 4). Transcripts were not necessarily complete, since in several tissues we found only TM expression. We found no proof for expression of an entire spliced env product in the dbEST search. This may be explained by the relative incompleteness of dbEST. Our EST results were also consistent with earlier described data on a HERV-H/F virus (43). The relation of the HERV-H loci identified here, in particular their env sequences, to various cancers should be studied. Functional and immunological, as well as diagnostically useful, information may emerge from such a study.
Concluding remarks.
The HERV-H consensus derived from the largest endogenous retrovirus family (constituting nearly one-third of all pol-containing ERVs detected by RetroTector) shared a typical gene arrangement with other gammaretroviruses and may thus be useful as a model in studies on retroviral evolution and in expression and polymorphism studies. Unique features of the HERV-H consensus compared to other gammaretroviruses are the pre-gag ORF, whose function is obscure, and the two zinc finger motifs in gag, a gene which is currently in the spotlight because of its interaction with cellular factors like cyclophilin, APOBEC3G, and TRIM5. The two zinc fingers probably existed early in Gammaretrovirus evolution. After HERV-H branched off, the second motif was lost, and an accumulation of a more or less densely charged section followed at the CA-NC boundary, most markedly in MLV, to cover the loss. The rich genetic retroviral sequence record materially helps in the understanding of retroviral evolution.
We thank Dr. Rüdiger Pipkorn, Deutsches Krebsforschungszentrum, Heidelberg, Germany, for peptide synthesis, purification, and characterization.