Journal of Virology, July 2005, p. 9270-9284, Vol. 79, No. 14
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.14.9270-9284.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Department of Pathology,1 Section of Virology, Department of Medical Sciences,2 Unit of Physiology, Department of Neuroscience, Uppsala University, Uppsala, Sweden,3 Department of Laboratory Medicine, Women and Children's Health, NTNU, Trondheim, Norway4
Received 27 September 2004/ Accepted 26 March 2005
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
ERV3 is one of the most studied HERVs (3-5, 11, 15, 18, 19, 26, 29-31, 41, 51, 59, 60, 78, 86). It is sometimes referred to as HERV-R. We dislike this term for reasons discussed below. ERV3 is a class I HERV which clusters with viruses belonging to the genus Gammaretrovirus (10, 77). It integrated 30 to 40 million years ago and is present in humans and other higher primates, with the exception of gorillas (26). It was originally thought to be a solitary provirus (47). A few later observations indicated the existence of a group of ERV3-like sequences (8, 53). Despite this, a detailed description of this group, its expression, and the extent of ERV3-like sequences is lacking. This study describes the structure of the original ERV3 locus at chromosome band 7q11, identifies several novel ERV3-like sequences in the human genome, and defines an ERV3-like HERV group. Transcription factor binding sites unique to the 5' LTR which possibly are involved in ERV3 7q11 expression were identified. Expressed sequence tag (EST) sequences probably transcribed from the ERV3 7q11 locus were searched for in different cell types and used to corroborate previously reported cap, splice, and polyadenylation sites (30). We developed real-time quantitative PCRs (QPCRs) that quantitatively detect transcripts from the SU of env from HERV-E(4-1) and ERV3 7q11 in comparison with nonretroviral reference genes. The QPCR was used to study the tissue-specific pattern of RNA expression of the env genes of ERV3 7q11 and a related group, HERV-E. We also checked ERV3 env mRNA expression at the tissue and cellular level by an in situ hybridization (ISH) technique.
| MATERIALS AND METHODS |
|---|
EST sequences were searched in dbEST (National Center for Biotechnology Information) during the spring of 2004 by using BLAST (1). EST sequences were selected according to percent identity to the ERV3 7q11 search sequence, as indicated in Results.
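The identity-based selection of EST hits can be sketched in a few lines. This is only an illustration, not the procedure used in 2004: it assumes the hits are available in the 12-column tabular BLAST output (query, subject, percent identity, and so on), uses the >95% cutoff stated in Results, and the query and EST identifiers are hypothetical.

```python
def select_est_hits(tabular_text, min_identity=95.0):
    """Return (subject_id, percent_identity) pairs for hits above the cutoff.

    Assumes tab-separated BLAST tabular output in which column 2 is the
    subject (EST) identifier and column 3 is the percent identity.
    """
    selected = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id = fields[1]
        percent_identity = float(fields[2])
        if percent_identity > min_identity:
            selected.append((subject_id, percent_identity))
    return selected


# Hypothetical hits; identifiers and values are made up for illustration.
example_hits = "\n".join([
    "ERV3_query\tEST_0001\t98.37\t430\t7\t0\t1\t430\t12\t441\t0.0\t755",
    "ERV3_query\tEST_0002\t91.10\t382\t34\t0\t20\t401\t5\t386\t1e-150\t520",
    "ERV3_query\tEST_0003\t95.00\t300\t15\t0\t1\t300\t1\t300\t1e-140\t480",
])
selected_hits = select_est_hits(example_hits)
```

With a strict "greater than" comparison, a hit at exactly 95.00% identity is excluded, which matches the ">95%" wording in Results.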
Characterization of ERV3-like sequences. The programs RetroTector and RetroTector Shell (G. O. Sperber and J. Blomberg, unpublished data) were used with the April 2003 version (hg15) of the human genome to identify likely retroviral sequences. Most of the Alu sequences were removed prior to the analysis. The programs identified 3,661 elements with a pol gene, which they then characterized and classified. Briefly, the program function RetrovID searches for LTR-like sequences and conserved retroviral motifs from gag, pro, pol, and env and assembles them into chains by the use of distance tables and other heuristics. The function ORFID constructs likely protein sequences (puteins) from the machine-identified gag, pro, pol, and env gene candidates. It uses codon statistics, frequency of stop codons, and alignment to proteins of known retroviruses belonging to the same genus to approximate the original ORF. The XonID function searches for possible ORFs in stretches not used by ORFID. The RetroTector interpretations were automatically matched with sequences in an annotated list of retroviral sequences from GenBank, using similarity to Pol proteins, and in a list of repetitive elements from Repbase (27), using similarity to the entire nucleotide sequence of the elements. Both similarity searches were performed with word-based BLAST-like algorithms (1) written by J. Blomberg. This initial screening was followed by alignments and clustering using pol and env nucleotides and Pol and Env proteins with the Clustal W 1.83 program. Similarity matrices were generated from all alignments. The Kimura and PAM250 score matrices were used for nucleotides and proteins, respectively. Both alignments and similarity matrices were based on the pairwise deletion technique, which minimizes the influence of insertions and deletions in the retroviral elements, so that similarities were not much affected by gaps. The cladogram shown below (see Fig. 2B) was based on the guide neighbor-joining tree produced by Clustal. It was processed with TreeView (courtesy of R. Page, Taxonomy and Systematics, Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow; available at http://taxonomy.zoology.gla.ac.uk/rod/rod.html) and tree-modifying programs written by J. Blomberg. The final classification of retroviral elements was consequently based on several independent methods. The XonID function of RetroTector localizes ORFs whose products are longer than 50 amino acids (aa) and which do not overlap the four major ones (gag, pro, pol, and env). XonID prioritizes ORFs which begin just after a predicted splice acceptor and end at a predicted splice donor site. It also uses the same codon statistical functions as ORFID. It was applied to the ERV3 locus at 7q11.
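The pairwise deletion idea can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it scores plain identity rather than the Kimura or PAM250 matrices used in the study, it treats "-" as the gap character, and the function names and example sequences are hypothetical.

```python
import math


def pairwise_deletion_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical positions, ignoring columns gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else math.nan


def similarity_matrix(aligned_seqs):
    """Square matrix of pairwise-deletion identities for an alignment."""
    n = len(aligned_seqs)
    return [
        [pairwise_deletion_identity(aligned_seqs[i], aligned_seqs[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical three-sequence alignment.
example_alignment = ["ACGTAACGT", "ACGT-ACTT", "AC--AACGT"]
example_matrix = similarity_matrix(example_alignment)
```

Because gapped columns are dropped per pair rather than across the whole alignment, a defective element with a large deletion still contributes all of its remaining positions to every comparison.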
RNA samples from diverse tissues and cDNA synthesis. A commercial RNA panel, Human Total RNA Master Panel II (Clontech Laboratories, Palo Alto, CA), was used for cDNA synthesis with subsequent QPCR. The panel contains RNA from 20 different tissues, most of which consist of pooled RNA from two or more persons. RNA from cerebellum, whole brain, heart, liver, and lung each originated from a single person. These single-donor samples were of Asian origin, except for the whole brain, which was of Caucasian origin. The pooled RNA samples came from 2 to 84 persons, all of Caucasian origin. During the work, it was discovered that three of the RNA samples, from heart, lung, and liver, probably were degraded. They yielded almost no signal in either the HERV or the housekeeping gene QPCRs and were therefore excluded from further analysis. The RNA described above was used as a template for cDNA synthesis with 2 µg RNA in each reaction mixture. cDNA was synthesized in a 50-µl reaction mixture containing Stratascript reverse transcriptase (RT; 1 U/µl) (Stratagene, Amsterdam, The Netherlands), 1x Stratascript buffer (Stratagene, Amsterdam, The Netherlands), 0.01 M dithiothreitol (Promega, Madison, WI), 0.8 mM each deoxynucleoside triphosphate (Applied Biosystems, PE Europe, The Netherlands), random hexamers (10.6 ng/µl; Amersham Pharmacia Biotech, Uppsala, Sweden), and RNasin (1.6 U/µl; Promega, Madison, WI). The cDNA reaction mixture was incubated at 25°C for 10 min, 37°C for 90 min, and 70°C for 15 min and then stored at -20°C. According to the manufacturer, the RNA contained virtually no genomic DNA. However, a control for DNA contamination, a reaction mixture without RT, was made for every RNA sample. The signal was usually negative. In cases where the RT-negative control became positive, its signal was at least 10 times weaker than the RT-positive signal and was subtracted from it.
Two microliters of each cDNA was used in each PCR. We prepared five samples in addition to the 16 tissues included in the RNA panel: three from different placentas (three individuals) and two from skeletal muscle, one from a single person and one pooled from five different persons. Total RNA was isolated from 30 mg of tissue with the QIAGEN RNA isolation kit according to the manufacturer's recommendations (Merck Eurolab, Stockholm, Sweden). Total RNA was DNase treated using a DNA-free kit from Ambion (Austin, TX), following the manufacturer's protocol. After DNase treatment, RNA was used for cDNA synthesis, which was performed as described above in a 50-µl reaction mixture. Four microliters of this cDNA was used for each PCR.
Primer design and QPCR. Primers and reporter-quencher (R-Q) probes were designed with the help of the Primer3 program (http://www.genome.wi.mit.edu/cgi-bin/primer/primer3). The midpoint temperature (Tm) was calculated by using the Cybergene program (http://www.cybergene.se/). The primers were synthesized by Interactiva (Thermo Hybaid GmbH, Ulm, Germany), and the three R-Q probes were synthesized by Scandinavian Gene Synthesis (SGS, Köping, Sweden). Primer pairs and probes were all high-performance liquid chromatography purified by the manufacturer. All R-Q probes were labeled with 6-carboxyfluorescein as a reporter at their 5' ends and an internal Dark Quencher attached to a thymidine in a middle position of the probe sequence (12 to 16 bp from the 5' end). We primarily chose histone 3.3 (M11353) for use as a reference gene since it is evenly expressed in many cell types, regardless of cell cycle stage (80, 82, 83). Histone 3.3 mRNA has previously been used as a reference in other RNA expression studies (6, 47, 48). Histone 3.3 mRNA is polyadenylated, unlike other histone mRNAs (80). There is room for further studies of its expression in different tissues. The following histone 3.3 primers were based on those originally designed by Mats Lindeskog (Lund University) (48) but were modified by us to fit new PCR conditions: forward primer 5' CCTCTACTGGAGGGGTGAAGAA 3' (Tm = 59.4°C) and reverse primer 5' TGCCTCCTGCAAAGCACCGATA 3' (Tm = 59.4°C). The probe, 5' CTCTGGAAGCGCAGATCTGTTTTAAAGTCCT 3', is situated 6 nucleotides (nt) from the reverse primer, with a Tm that is 10°C higher than that of the primer pair (Tm = 69.7°C). By the use of dilutions of a clone of an amplimer from a human DNA sample (L. Hu and D. Uzhameckis, unpublished data), the sensitivity for the histone PCR was found to be 1 to 10 target gene equivalents per PCR. The integrity of the plasmid was ascertained by sequencing. 
Optimization was first made with an iCycler (Bio-Rad Laboratories AB, Sundbyberg, Sweden). Normalization of mRNA expression against cell number, DNA content, total protein, and mRNAs from housekeeping genes, like those for actin, GAPDH (glyceraldehyde-3-phosphate dehydrogenase), and histone 3.3, has been used by us or others in numerous studies (50, 62, 76). These genes were selected because of their previously observed wide and approximately even expression levels in many tissues. Besides histone 3.3 RNA, RNAs from three widely used housekeeping genes, i.e., the GAPDH, hypoxanthine phosphoribosyltransferase 1, and ubiquitin C (UBC) genes (GenBank accession number M26880) (76), were also determined (data not shown). The mean of these three last-mentioned reference RNAs with the Clontech tissue RNA panel was 260 eq per ng of total RNA, with a standard error of the mean of 40.9%, while a histone 3.3 average was 19.5 eq per ng of total RNA, with a standard error of the mean of 15.2%. Thus, histone 3.3 RNA was more evenly expressed. HERV QPCR data were therefore normalized to histone 3.3 RNA expression only. In the following, the word "equivalents" is used instead of "copies" for description of the amount of RNA or DNA detected in QPCR because of probable slight variations in amplification efficiency in samples.
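The arithmetic behind the normalization can be sketched as follows. This is an illustrative reading of the text, not the authors' analysis code: it assumes that the signal of the RT-negative control (described under cDNA synthesis) is subtracted before normalization, that HERV expression is reported as a ratio of HERV equivalents to histone 3.3 equivalents, and that the percentage given for the standard error of the mean is the SEM divided by the mean. All function names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def subtract_no_rt(rt_plus_eq, rt_minus_eq):
    """Remove the DNA-derived signal measured in the RT-negative control."""
    return max(rt_plus_eq - rt_minus_eq, 0.0)


def normalize_to_reference(target_eq, reference_eq):
    """Express target equivalents per equivalent of the reference RNA."""
    if reference_eq <= 0:
        raise ValueError("reference signal must be positive")
    return target_eq / reference_eq


def relative_sem_percent(values):
    """Standard error of the mean as a percentage of the mean (assumed definition)."""
    sem = stdev(values) / math.sqrt(len(values))
    return 100.0 * sem / mean(values)


# Hypothetical equivalents per PCR for one tissue.
erv3_eq = subtract_no_rt(120.0, 5.0)
histone_eq = subtract_no_rt(50.0, 0.0)
erv3_per_histone = normalize_to_reference(erv3_eq, histone_eq)

# Hypothetical reference-gene levels across three tissues.
reference_spread = relative_sem_percent([10.0, 20.0, 30.0])
```

A lower relative SEM across tissues, as reported for histone 3.3, means the denominator contributes less tissue-to-tissue noise to the normalized HERV values.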
Primer pairs and probes for HERV-E(4-1) and ERV3 7q11 were designed in the SU region where the related HERVs are most divergent and thus should generate specific probes. The chosen primers and probes for HERV-E (HUMER41, accession no. M10976) and ERV3 (HUMERVA34A, accession no. M12140.1) are marked in the original sequence as shown below (see Fig. 3A and B, respectively). The Tm for the ERV3 probe (68.5°C), 5' AACTCGCAACTGTTGGGTTGAGCGGGTCC 3', was 15°C higher than the Tms of its primer pair, namely, forward 5' CGCTAGGGGCACGAGTCA 3' (Tm = 53.4°C) and reverse 5' AGGCAATTTCTGGTTAACATGCT 3' (Tm = 59.2°C).
Tissues used for ISH with ERV3 env probe. Normal human tissues were obtained from the Department of Clinical Pathology, Uppsala University Hospital. They were treated according to routine clinical procedures, following surgery, as briefly described below. All tissues were investigated by the same pathologist and judged normal and without remarks. With permission, pieces of the following organs were taken from a 50-year-old woman who died from a ruptured aortic aneurysm: aorta, breast, brain, liver, lung, esophagus, spleen, skeletal muscle, kidney, thyroid, bladder, and stomach. The other tissues (mentioned below) were taken from anonymous routine cases, which were judged as histologically normal. Tissues were fixed in 4% phosphate-buffered formaldehyde and processed for routine histopathology and archived as paraffin blocks. The following tissues were derived from one (not necessarily the same) individual: aorta, bladder, bone marrow, breast, brown fat (from a child), brain (cerebellum), colon, kidney, liver (from a newborn), esophagus, parathyroid gland, pituitary gland (adeno part), skeletal muscle (striated), and seminal vesicle. The following tissues were derived from two or more individuals: adrenal gland, brain, gastric mucosa, heart, lung and bronchial epithelia, lymph node, ovary (corpus luteum), pancreas, parotid gland, placenta (from >10 individuals), prostate, skin (from >10 individuals), testis (slightly atrophic), thymus, thyroid gland, and uterus (endometrium).
ISH. ISH for ERV3 7q11 was performed as described previously (3, 5). Briefly, paraffin-embedded tissues were sectioned (4 µm thick) and mounted on 3-aminopropyltriethoxysilane-coated slides (Sigma, St. Louis, MO). Sections were pretreated with 0.2 M HCl for 10 min and permeabilized with 2 µg/ml proteinase K (Merck, Darmstadt, Germany) at 37°C for 15 min prior to hybridization. Hybridization conditions were stringent. Tissue sections were hybridized with 35S-labeled riboprobes produced as described before (3) according to a Promega standard protocol for in vitro transcription. Hybridization was continued overnight at 56°C, and samples were washed in 2x standard saline citrate and 50% formamide prior to treatment with RNase A (Boehringer Mannheim, Germany), at 100 µg/ml, at 37°C for 30 min. Application of NTB2 photographic emulsion (KODAK AB, Järfälla, Sweden) diluted 1:1 in distilled water was followed by exposure at 4°C for 2 to 4 weeks. Slides were developed, counterstained with Mayer's hematoxylin, and mounted with Pertex (Histolab Products AB, Göteborg, Sweden). Photos were taken in a bright field with a 40x oil immersion SPLAN objective through a Vanox Zeiss microscope.
Grading of ISH. The grading system used has been previously described (4, 5). The selected tissues were screened by use of a microscope for elevated levels of silver grain density, representing ERV3 env mRNA. A semiquantitative grading was done using scores ranging from - to +++, and sample slides were compared to their respective positive control slides with sections of placenta and skin. Most tissues found positive were tested at least twice and from more than one individual (see Table 3).
| RESULTS |
|---|
Structure of the ERV3-locus on chromosome 7, band q11. We found that the clone AC073210.8 (Table 1; see Fig. 3A) in GenBank contains not only the annotated ERV3 pol env 3' LTR sequence (accession no. M12140.1), submitted by Cohen et al. (15) but also a 5' LTR. Its absolute position on chromosome 7 is antisense to the assigned chromosomal direction, with the 5' LTR starting at position 63858202 (human genome version hg15, April 2003) (Fig. 1). The LTRs are 91% identical. The provirus has a primer binding site (PBS) complementary to arginine tRNA, and gag, pro, pol, and env sequences similar to those of other members of genus Gammaretrovirus. The only longer ORFs are in pro and env. A detailed interpretation done with RetroTector is available as supplemental material (see Fig. S1). The distance between the PBS and the probable start of the gag sequences is unusually long (900 bp) but has no likely ORF, as studied with XonID (Fig. 1). The predicted Gag polyprotein contains a matrix protein (MA) with an MGQ myristylation signal and a PPPY motif necessary for "late" functions, i.e., budding (54, 55, 81). A capsid protein (CA) with a major homology region (MHR) related to that of MLV, HERV-H, and HERV-E and a nucleocapsid protein (NC) with one zinc finger are also present. At the predicted end of Pol is a GPY/F motif, called IN7 in RetroTector. This motif occurs in several gammaretroviruses as well as in gypsy (46) and chromovirus (23) elements, both widespread retrotransposons. In the latter two, the GPY/F motif may border a chromodomain. Both of these C-terminal integrase portions may influence integration specificity by interaction with DNA-binding proteins (69).
According to the RetroTector interpretation, Env starts with MLGMNMLLITLFLLLPLSMLK, 40 amino acids upstream of MTKTLLYHTYYECAGTCLGTC, the Env start suggested by Cohen et al. (15). We favor the first sequence. It includes a hydrophobic von Heijne signal sequence motif typical of the amino terminus of retroviral membrane proteins. Other groups (19, 26) made interpretations similar to ours.
ERV3 7q11 ESTs, splicing, and the LTR structure of the ERV3 7q11. A BLAST search with three nucleotide sequences from three portions of the ERV3 7q11 provirus in the human EST database yielded 25 to 86 hits which were >95% identical to the ERV3 7q11 sequence (Table 2). The hit frequency for a tissue depends on the number of hits per tissue cDNA library. Some tissues have many libraries and thus give many hits. Additionally, the cloning strategy, using 5', 3', or random priming, determines the likelihood that a certain query sequence will find a hit. Thus, this statistic is at best semiquantitative. However, it demonstrates in which tissues ERV3 is expressed. Hits for placenta, testis, skin, and brain were especially frequent. Notable were also six ESTs from carcinoid, a multiple endocrine tumor with a strong genetic dependence. Illustrative ESTs are depicted together with the ERV3 7q11 provirus (Fig. 1). Four of the carcinoid ESTs were almost identical (98%) to the 5' LTR of ERV3 7q11 and clearly different from the 3' LTR. The simplest explanation for this is that these ESTs are polyadenylated at the 5' LTR of ERV3 at 7q11, which is unusual. It is unlikely that these transcripts were polyadenylated at single ERV3 LTRs. We did not find any single ERV3 LTRs which were as similar to the transcripts as the ERV3 7q11 5' LTR is. The start site of these transcripts is not known. According to this interpretation it should be upstream of ERV3 7q11.
Using EST data, and RetroTector and TFBS predictions, a map of the ERV3 7q11 5' and 3' LTRs was made (see Fig. S5 and S6 in the supplemental material). The 5'-3' LTR nonidentity of 9% indicates that many of the preintegrational TFBS should have been damaged by random mutation. If indeed there was selection for expression of any ERV3 transcripts, it is likely that TFBS essential for their expression were spared or created postintegration. Despite the 9% divergence between the LTRs, the ERV3 7q11 env ORF, covering 1,812 nt, has been maintained. Observations of an ERV3 protein by Western blotting and immunohistochemistry with anti-ERV3 sera (78) also support the existence of an ERV3 Env protein. A possible selection for production of the ERV3 Env protein in certain tissues should be mirrored by selection at the 5' LTR, and less so at the 3' LTR, if the 5' LTR has the dominating promoter. Comparing the 5' and 3' LTRs, a region upstream of the TATAA box was almost spared of mutation and may be especially important for ERV3 promoter activity. Its downstream adjacent region was more extensively mutated (see Fig. S5 and S6b in the supplemental material). To better understand any selective forces on the 5' LTR, we mapped TFBS in the 5' and 3' LTRs. Using ConSite at moderate stringency (10 bits), we determined that TFBS with more than one predicted occurrence in the 5' LTR included 12 SOX-5, 5 HFH-2, 4 RORα1, 3 SOX17, and 3 Thing1/E47 sites. Of these, three SOX-5, two RORα1, and three SOX17 sites were unique to the 5' LTR relative to the 3' LTR. For comparison, five random control sequences of equal lengths (492 nt) and with approximately the same ATGC content as that of the 5' LTR of ERV3 (27% A, 29% T, 20% G, and 24% C) were also analyzed. SOX-5 sites occurred 2, 3, 3, 4, and 5 times (average, 3.4), while SOX17 occurred 1, 3, 3, 3, and 4 times (average, 2.8), and RORα1 occurred 0, 0, 1, 2, and 2 times (average, 1.0) in each of these five control sequences, respectively. The SOX-5 frequency thus deviated most from random prediction (P = 0.01, Mann-Whitney test) and was the most frequent among the TFBS unique to the 5' LTR. The predicted SOX-5 sites (12 in the 5' LTR and 10 in the 3' LTR) clustered in the middle of U3. Although several ERV3-like ESTs mapping to sequences downstream of the canonical splice donor and upstream of the canonical splice acceptor were found, none of them probably came from ERV3 7q11, judging from the degree of nucleotide identity. A scarcity of full-length mRNAs from the ERV3 7q11 provirus was also noted earlier (30).
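The construction of composition-matched random controls can be sketched as below. This is a hedged illustration only: the text does not say how the control sequences were generated, so the shuffling approach, the fixed seed, and all names are assumptions. The site counts are the ones reported above for two of the factors; the sketch does not run ConSite and does not reproduce the Mann-Whitney P value.

```python
import random
from statistics import mean

LTR_LENGTH = 492  # length of the ERV3 7q11 5' LTR in nt, as stated in the text
COMPOSITION = {"A": 0.27, "T": 0.29, "G": 0.20, "C": 0.24}


def random_control(length, composition, rng):
    """Shuffle a pool of bases whose counts match the target composition."""
    counts = {base: round(frac * length) for base, frac in composition.items()}
    # Absorb any rounding remainder in the most frequent base.
    top = max(counts, key=counts.get)
    counts[top] += length - sum(counts.values())
    bases = [base for base, n in counts.items() for _ in range(n)]
    rng.shuffle(bases)
    return "".join(bases)


rng = random.Random(2005)  # arbitrary seed, for repeatability only
controls = [random_control(LTR_LENGTH, COMPOSITION, rng) for _ in range(5)]

# Predicted site counts in the five control sequences, as reported in the text.
control_hits = {
    "SOX-5": [2, 3, 3, 4, 5],
    "SOX17": [1, 3, 3, 3, 4],
}
control_means = {site: mean(hits) for site, hits in control_hits.items()}
```

Each control would then be scanned with the same TFBS tool and threshold as the LTR, so that the observed count in the 5' LTR (12 SOX-5 sites) can be compared with the range expected by chance for that base composition.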
Among the more completely recorded mRNAs, only one encompassing most of ERV3 7q11 env, the placental H-plk.a, was found. Most other env-containing ESTs encompassed only the TM domain. A chimeric cDNA clone starting close to the 3' LTR of ERV3 7q11 (the placental H-plk.b) runs through the ERV3 7q11 3' LTR into an adjacent long interspersed element and is then spliced from an ensuing short interspersed element into a Krüppel zinc finger, as described earlier (30).
ERV3-like sequences based on SU protein sequence. ERV3 has been described as a single-copy HERV (47, 78). However, even if the SU part is one of the most variable in a retrovirus, many ERV3-like sequences were found when searches for matches were done at the protein level, using the SU part of Env. These sequences were subsequently aligned at the nucleotide level. A neighbor-joining unrooted tree was then built (Fig. 2A). This SU-based tree shows the relationship with four other class 1 HERVs, HUMRIRT (M64936), HERV15yq1 (AF290422), HERV15yq2 (AF290423), and HUMERGPE (M74509). HERV-E(4-1) formed a cluster together with other known HERV-E sequences (e.g., HUMERGPE). Close to ERV3 were sequences from clone AC013429, which have a proline tRNA as the PBS, and from clones AB019437 and AC008175, which did not have an identifiable PBS. More distant from ERV3, among the cluster of ERV3 SU-like proviral sequences, was HUMRIRT, also known as RRHERVI (28). The branching pattern was similar to that of the Pol amino-acid based tree (Fig. 2B; discussed below).
Overall, ERV3 SU-like proviruses with an identifiable PBS had a PBS complementary to proline, arginine, or isoleucine tRNA (Table 1). ERV3 itself uses arginine tRNA. The use of an arginine PBS is not confined to ERV3-like elements (see below). We therefore could not call the group "HERV-R."
Delineation of ERV3-like sequences based on similarity in Pol. When the RetroTector version 010 and the RetroTector Shell version 01 programs became available, we selected all ERV3-like sequences in the human genome version 15 using an ERV3 Pol protein-based search. Out of 76 retrieved ERV3-related sequences, 41 ERV3-like sequences based on Pol similarity formed a separate cluster at 80% similarity or higher, using the PAM250 score matrix (Fig. 2B; see Fig. S3b and Table S4a in the supplemental material). Although the 11 ERV3 SU-like elements of Fig. 2A and the 12 elements in Table 1 had a pol gene, several were defective in pol and therefore hard to classify by Pol. However, an alignment showed that all could be accommodated in the larger ERV3-like group, which was Pol based (data not shown). Thus, env and pol genes evolved together in this gammaretroviral group. In the following, the 41 Pol-based ERV3-like elements are simply called "ERV3-like." However, the 80% limit did not completely separate ERV3-like elements from HERV-E-like ones, either in Pol or in Env (see Tables S4a and S4b, respectively, in the supplemental material). The neighbor-joining tree derived from the Clustal alignment of the 76 elements (Fig. 2B) also demonstrated the ERV3- and HERV-E-like clusters, with bootstrap support. The similarity contingency table (see Table S4b in the supplemental material) showed a somewhat incomplete separation of ERV3-like and HERV-E-like elements, possibly indicating recombination and/or a common evolutionary origin of the groups. As shown in Fig. S3b (see the supplemental material), the ERV3-like elements consisted of 23 elements highly similar to ERV3 7q11 in Pol, hereafter called "ERV3 elements," and 18 elements highly similar to RRHERVI, hereafter called "RRHERVI elements." In the first group, 13 elements had a recognized env gene, while in the second group, 11 elements had such a gene. 
Out of the 41 ERV3-like sequences, only one element, the ERV3 locus on 7q11, had an env ORF. Among the 76 elements, HERV-E elements had the most complete pol gene, based on the presence of conserved motifs. The LTR divergence of the 41 ERV3-like elements was on average 11.4%, with a range of 5.1 to 24.2%. The corresponding figures for the HERV-E elements shown in Fig. 3B were 15.1% with a range of 7.0 to 30.5%. Some of the more complete ERV3 elements were analyzed in greater depth (Table 1 and Fig. 3B).
PBS tRNA usage: HERV-R in Repbase (27). Of the 41 ERV3-like elements, 2 used arginine, 2 used proline, and 1 used isoleucine tRNA as a PBS. A weak lysine tRNA-like PBS was also predicted. The PBS of the remaining elements was not identified by RetroTector, for unknown reasons. A RetroTector Shell search indicated that the human genome contains three ERV3-related elements with an arginine PBS. Two of them (ERV3 7q11 itself and chrY.3018841) belonged to the 41 ERV3-like elements, while the third (chr19.21399629) was classified as "HUERSP3-like," a gammaretroviral sequence which is more distant from ERV3 than HERV-E (data not shown).
Altogether, RetroTector detected 72 retroviral elements with an arginine PBS in the human genome. The great majority of them, 69, thus fell outside of the ERV3-like group. Of the 69, 3 were HERVRB-like, 5 were HUERSP3-like, 2 were MER41-like, 2 were HERV19-like, and 2 were HERVFc-like, based on RepBase nomenclature. The rest have not yet been classified, but some of them contained stretches similar to the HERV9 and the little-studied MER41, MER51, and MER66 sequences. This is an example of "PBS promiscuity" (see Discussion). RepBase (27) contains a sequence called HERV-R. This HERV-R was 100% identical to the baboon endogenous retrovirus (74) and therefore is not a human ERV. This is a further reason to avoid using the term HERV-R.
Structure and localization of ERV3-like elements. ERV3-like elements were common on chromosome Y. Six elements were located there, while two occurred on each of chromosomes 2, 6, and 19. Chromosomes 14 and X contained one ERV3-like element each. A complete list of the ERV3-like elements is available in the supplemental material (Table S2). Table 1 contains a partial list of the ERV3-like elements.
The ERV3-like sequences harbored not only env genes but also other proviral structures (Table 1). The LTRs of HERV-E contain a promoter known to affect transcription of several cellular genes. Moreover, HERV-E transcripts have been seen in both normal and diseased tissues (52, 65). To retrieve HERV-E(4-1) Env-encoding sequences, the same type of TBLASTN search that was initially made with ERV3 Env was also made with HERV-E Env. The results are shown with the ERV3-like SU sequences in Fig. 3. The nucleotide alignments presented in Fig. 3 illustrate how the subsequently designed real-time primers and probe fit with the ERV3-like sequences. As anticipated, most similarities were observed in the conserved transmembrane region (TM). Therefore, we chose the SU region of env for designing specific primers. As shown in the alignment, point mutations, insertions, and deletions favor a PCR specific for the ERV3 7q11 locus. A similar narrow specificity was also expected of the HERV-E(4-1) SU-derived primers and probe. The searches for ERV3-like sequences revealed interesting details regarding the genetic environment at their integration sites, mentioned in footnotes to Table 1. It is beyond the scope of this paper to discuss the possible effects of ERV3-like elements on the surrounding genes.
QPCR results. In order to study the HERV expression quantitatively, specific real-time PCRs were developed for ERV3 and HERV-E. The ERV3 QPCR method had a sensitivity of around 10 haploid human genomes and around 10 equivalents of ERV3 plasmid DNA per PCR. The HERV-E QPCR method could detect 5 to 10 haploid genomes and around 10 equivalents of HERV-E(4-1) plasmid DNA. The specificity for both methods measured as the absence of cross-amplification from the HERV-E(4-1) and ERV3 plasmids was 100%.
The ERV3 and HERV-E SU QPCRs confirmed previous data showing high expression in adrenal gland and placenta (30, 33).
QPCR with ERV3 and HERV-E. The two graphs showing the results of QPCR with ERV3 and HERV-E (Fig. 4) reveal great differences in mRNA expression between organs, but also between the SU of ERV3 (Fig. 4A) and HERV-E (Fig. 4B). The high expression of ERV3 in the adrenal gland confirmed previous ISH results. Other organs with high expression levels were whole brain (ERV3), placenta (ERV3 and HERV-E), kidney (ERV3 and HERV-E), thymus (ERV3), thyroid gland (HERV-E), prostate (HERV-E), and trachea (HERV-E). However, all organs had some expression of ERV3 and HERV-E SU.
| DISCUSSION |
|---|
ERV3-like sequences and the ERV3 locus. ERV3 is a rather typical gammaretrovirus. Most of the structures of MLV are discernible in it. However, unlike MLV, there is no indication of a pp12 Gag protein. In addition, a space separates the predicted pol from the predicted env. It contains a putative ORF, here referred to as ERV3 XORF. Theoretically, it could encode a protein of its own, an extension of the Pol protein, or an unusual leader sequence for the Env protein. The position between predicted pol and env is typical of sequences for regulatory proteins of complex retroviruses. This putative protein or protein subdomain should be searched for in tissues with high ERV3 expression using antisera and proteomic techniques. If it is a C-terminal extension of the integrase new to gammaretroviruses, a participation in chromatin binding (69) should be investigated.
Mutilated and recombinant HERVs create taxonomic problems. The somewhat incomplete separation in the similarity matrices (see Fig. S3a and S3b in the supplemental material) may be symptoms of recombination. Gene conversion is an often invoked but rarely proven recombinatory mechanism for highly related HERVs (71). Nevertheless, the Pol-based classification used in this paper was to a large extent consistent with the most similar RepBase (27) sequence, found by a BLAST algorithm with the nucleotide sequence of the whole element. The older PBS-based classification (37) was sometimes congruent. For example, the HERV-E group was well delineated from other HERV groups. A glutamine tRNA PBS is almost exclusively associated with the HERV-E group (work with RetroTector; data not shown). Meanwhile, neutral group names referring to established loci are preferable to minimize confusion. We therefore chose the name "ERV3-like" for the group of elements comprising ERV3 and RRHERVI elements. They share a Pol protein similarity of 80% or higher, as estimated with the PAM250 matrix. Based on a comparison of RepeatMasker (A. Smit, unpublished data) output and its corollary, the HERVd database (53), both of which are based on RepBase, and RetroTector output, from the human genome version 15 (April 2003), the 41 ERV3-like elements defined here were classified either as HERV3 or HERV15 by RepeatMasker (J. Blomberg and G. O. Sperber, unpublished data). The two independent classifications thus were approximately concordant.
ERV3 7q11 has been described as a single-copy gene, related to the higher-copy group HERV-E (51). We here show that it is part of a group of 41 elements. A few of the ERV3-like sequences were previously described, some of which occur on the Y chromosome (34). The env gene of orthologs of the human ERV3 7q11 locus had an ORF that was maintained in several Old World monkeys. However, the locus was absent in gorillas; hence, it may not perform an exclusive essential function in primates (26). Single-nucleotide polymorphisms have been reported in the original ERV3 provirus, where not only env but also LTR polymorphic sequences were discovered by Rasmussen et al. (59, 60). At the same time, de Parseval et al. discovered a single-nucleotide polymorphism in the carboxy terminus of ERV3 SU that introduces a stop codon in 1% of Caucasians and would create a truncated Env protein (18, 19). It is, however, not known whether the premature stop in the carboxy-terminal surface region would preclude a function. Regardless, women with the mutation had normal pregnancies (19). In previous studies of ERV3, Lin et al. showed that when ERV3 Env was expressed in the trophoblastic cell line BeWo, the trophoblasts differentiated, fused, and produced beta-human chorionic gonadotropin (41). A similar effect was attributed to the HERV-W Env protein syncytin (49), which can fuse cells in placenta, possibly forming the syncytiotrophoblastic layer.
Interestingly, some of the ERV3-like sequences were found to be integrated in the vicinity of genes encoding regulatory proteins, including an estrogen and retinoic acid receptor regulatory protein TIF1 (Table 2), with unknown effects. Although TFBS predictions must be judged with caution, the frequent occurrence (12 hits) of predicted SOX-5 sites in the 5' LTR of ERV3 requires a comment. Sox (Sry-type high-mobility-group box) proteins are a subfamily of DNA-binding proteins with a high-mobility-group domain. Sry is the testis-determining factor (68). Sox proteins have functions in sex determination and neurogenesis (38). SOX-5 is highly expressed in testis, especially in spermatids (9, 12, 73, 84). Expression in the germ line is relevant for an endogenous retrovirus. The other frequently predicted sites were HFH2 and Thing1. HFH2 (FoxD3; 5 hits in the 5' LTR) drives the development of the neural crest and production of parathyroid hormone (58), and Thing1 (Hand1; 3 hits) is important for trophoblast transition to syncytiotrophoblast and neural crest development (16). Carcinoid tumors, here reported to express ERV3 RNA, have been considered to be a neural crest derivative (36). RORα1 (3 hits) is expressed in sebaceous glands (61) and belongs to the retinoic acid receptor family. Retinoids influence lipid synthesis in sebaceous glands (72) and activate RRHERV-I, which belongs to the ERV3-like HERVs (Table 1) (28). As shown here, ERV3 RNA is abundant in placenta, testis, and sebaceous glands. It is tempting to see a pattern of ERV3 expression corresponding to the known activities of these transcription factors. However, random sequences of similar nucleotide compositions were also predicted to bind some of the same factors. The possible involvement of these factors in promoting transcription from ERV3 at 7q11 should therefore be verified experimentally in transient-transfection experiments.
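The caution about random sequences of similar nucleotide composition can be illustrated with a minimal sketch. It is not the prediction procedure used in this study: real TFBS prediction scores weight matrices, whereas the sketch below counts exact matches of a short motif, and the function names, the motif, and any input sequence are hypothetical placeholders chosen for illustration. The idea is simply to compare the number of hits in a sequence with the numbers obtained in shuffled copies that have the same base composition.

```python
import random


def shuffled_control(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq (same nucleotide composition)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def count_hits(seq: str, motif: str) -> int:
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def empirical_p(seq: str, motif: str, n_controls: int = 1000, seed: int = 0):
    """Return (observed hits, fraction of shuffled controls with >= as many hits).

    The +1 terms keep the estimate away from zero for a finite number of
    controls. Exact string matching is a simplifying assumption.
    """
    rng = random.Random(seed)
    observed = count_hits(seq, motif)
    as_many = sum(
        count_hits(shuffled_control(seq, rng), motif) >= observed
        for _ in range(n_controls)
    )
    return observed, (as_many + 1) / (n_controls + 1)
```

A call such as `empirical_p(ltr_sequence, "AACAAT")`, with `ltr_sequence` standing for any nucleotide string of interest, returns the observed hit count together with an empirical estimate of how often composition-matched random sequences reach the same count. A large fraction would suggest that the hits reflect base composition rather than a specific site.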
Tissue-specific expression of ERV3 and related sequences. ESTs which likely were encoded at ERV3 7q11 were found in cDNA libraries from normal tissues like placenta, testis, and adrenal gland. A reason for the relative scarcity of unspliced transcripts described previously (30), and in this work, could be nonsense-mediated decay of the full-length mRNAs, which are relatively more defective than the env mRNAs (39). Some malignancies, primarily carcinoid and colon tumors, also contained ERV3 cDNA. As mentioned above, carcinoid is a neuroendocrine cancer with an inheritable predisposition. The observation of multiple ERV3 ESTs in two different cDNA libraries from the neuroendocrine tumor carcinoid should be followed up.
QPCR showed high levels of ERV3 7q11 env RNA expression in adrenal gland, skeletal muscle, brain, placenta, thymus, and testis. HERV-E(4-1) env RNA was present primarily in placenta, skeletal muscle, kidney, thyroid gland, prostate, and trachea. Thus, tissues with a function in reproduction as well as endocrine functions tended to have a higher expression of ERV3 7q11 env. The expression was higher than that for env of the related HERV-E(4-1) and had a different tissue profile.
Our ISH studies of ERV3 confirmed previous Northern blot analyses showing that ERV3 is active in some specific cell types, such as syncytiotrophoblasts in placenta and cells in sebaceous glands (3, 30). In addition, high activities were found in brown fat, corpus luteum, testis, and Hassall's bodies in thymus. Several of these tissues are under strong hormonal influence. The EST and QPCR data also showed high expression levels in adrenal gland. The possible functions of a retroviral protein in these tissues are unknown. In animals, a high expression of ERVs is common in reproductive tissues like seminiferous tubules, vesiculae seminales, testis, and placenta (20, 24, 25, 32, 40). The results of EST searches, QPCR, and ISH from this investigation and previous Northern blot analyses were similar. They cover different aspects of RNA expression. dbEST covers a broad range of normal and pathological tissues, but quantification is imperfect. QPCR gives an approximate number of HERV RNA molecules per nanogram of RNA or per housekeeping gene transcript but is sensitive to mutations in primer and probe sequences. ISH identifies which cell types express the RNA but is not as quantitative as QPCR. Northern blot analysis describes the molecular weights of the RNAs but is less quantitative than QPCR. Since ISH was performed with fixed tissues and PCR was done with fresh ones, the procedures are not completely comparable. Besides, it is known that external factors such as ischemia and anoxia can increase the activities of selected retroelements such as VL-30 (2). Further, polymorphism and epigenetic events such as imprinting could lead to both inter- and intraindividual variation in HERV expression. An interindividual variation in RNA expression has been shown for several HERVs (7, 42, 48, 85).
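The two ways of expressing QPCR results mentioned above (copies per nanogram of RNA, or copies per reference gene transcript) amount to simple ratios. The following is a minimal sketch under that assumption; the function names are hypothetical, the inputs are taken to be copy numbers already estimated from a standard curve, and no values from this study are used.

```python
def copies_per_ng(target_copies: float, rna_ng: float) -> float:
    """Target RNA copies normalized to the amount of input RNA (in ng)."""
    if rna_ng <= 0:
        raise ValueError("rna_ng must be positive")
    return target_copies / rna_ng


def copies_per_reference(target_copies: float, reference_copies: float) -> float:
    """Target RNA copies normalized to copies of a nonretroviral reference gene."""
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")
    return target_copies / reference_copies
```

Normalizing to a reference transcript compensates for differences in RNA quality and reverse transcription efficiency between samples, whereas normalizing to nanograms of RNA does not depend on the reference gene being equally expressed in all tissues. Reporting both, as done here, makes the two assumptions visible.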
The results presented here demonstrate that both ERV3 7q11 and HERV-E(4-1) are expressed in most organs, as reported by Sibata et al. (66). Although an exact comparison cannot be made, ERV3 expression was higher than HERV-E expression in several tissues. Similar to the results presented here, expression of ERV3 env has been previously reported to be high in adrenal and sebaceous glands (3, 4) and in placenta (11, 41), whereas HERV-E env expression was found to be high in placenta (66).
In conclusion, this investigation provides a description of ERV3 structure, ERV3-like elements in the human genome, and the degree of ERV3 7q11 and HERV-E(4-1) env RNA expression in normal tissues. Measurements of organ-, cell type-, and transcript-specific expression by QPCR, ISH, and previous Northern blot analysis gave concordant results. The techniques described here will allow further studies of the pattern of HERV expression in selected model systems and in diseased persons.
| ACKNOWLEDGMENTS |
|---|
We thank Anna Forsman, Lijuan Hu, Patric Jern, and Dmitrijs Uzhameckis for valuable assistance.
| FOOTNOTES |
|---|
Supplementary material for this article may be found at http://jvi.asm.org/.