Journal of Virology, June 2004, p. 5784-5798, Vol. 78, No. 11
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.11.5784-5798.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Terry Fox Laboratory, British Columbia Cancer Agency,1 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada2
Received 26 September 2003/ Accepted 29 January 2004
The Betaretrovirus genus includes the viruses formerly known as type B and type D retroviruses (33). Betaretroviruses have been discovered in mammalian hosts of wide geographical and evolutionary diversity (Table 1). Mouse mammary tumor virus (MMTV), the prototype type B retrovirus, exists in closely related endogenous and exogenous forms, with variable distribution in both laboratory strains and wild species of mice (6, 7, 11, 14). Jaagsiekte sheep retrovirus (JSRV), enzootic nasal tumor virus (ENTV), and endogenous sheep retrovirus are closely related endogenous and exogenous retroviruses of sheep and goats (9, 12, 35). Type D retroviruses were first discovered in Old World monkeys and include exogenous (simian retrovirus type 1 [SRV-1], SRV-2, and Mason-Pfizer monkey virus [MPMV]) (23, 25, 28) and endogenous (simian endogenous retrovirus [SERV]) (32) forms. Endogenous type D retroviruses have also been discovered in a New World monkey (squirrel monkey retrovirus [SMRV]) (8), mice (Mus musculus type D retrovirus [MusD]) (18), and a metatherian (marsupial) mammal, the Australian common brushtail possum (Trichosurus vulpecula endogenous retrovirus type D [TvERV-D]) (1). PCR approaches, using degenerate primers based on conserved regions of the retroviral pro and/or pol genes, have also been used to detect betaretrovirus-related elements in the genomes of pigs (10, 22), the bower bird, and the stripe-faced dunnart (13), although these elements have not been completely characterized. Many of the endogenous betaretroviruses appear to have entered the genomes of their hosts relatively recently (within the last 10 million years) (Table 1). However, no satisfactory explanation as to how betaretroviruses could have become so widely distributed has been presented.
TABLE 1. Distribution of known betaretroviruses
17) subfamilies, which are now almost globally distributed (21). Several species of murid rodents are invaluable subjects for laboratory experiments, and the genomes of two murine species, Mus musculus and Rattus norvegicus, have been almost entirely sequenced (34; http://hgsc.bcm.tmc.edu/projects/rat/ [Rat Genome Sequencing Consortium]). A previous study has investigated the origins of MusD, the type D retrovirus present in the genomes of Mus musculus and closely related members of the Murinae subfamily (18). Here we describe the discovery of multiple groups of betaretroviruses present in the genomes of Mus musculus and Rattus norvegicus. We discuss the possible evolutionary origins of these groups of retroviruses and present the hypothesis that murid rodents are responsible for the current global distribution of betaretroviruses.
BLAST searches. All searches of genomic DNA databases were performed using either the NCBI BLAST Web server (http://www.ncbi.nlm.nih.gov) or the Network BLAST client server, also available from NCBI. BLAST searches were also performed locally using the Standalone BLAST application.
Pol searches. We searched the translated mouse genome assemblies, using the tBLASTn program, with the amino acid sequences of a highly conserved region of the Pol proteins of all known betaretroviruses and several class II endogenous retroviruses (the mouse, Chinese hamster, and Syrian hamster intracisternal A-type particles [MIAP, CHIAP, and SHIAP], human endogenous retrovirus K10 [HERV-K10], HERV-HML5, HERV-HML6, and rabbit endogenous retrovirus). The region of the Pol protein used in the searches corresponds to the 246-amino-acid sequence spanning the QWPLTNDKLAAAQQL and FQKLLGDINWLRPYLK motifs of the reverse transcriptase (RT) domain (15) of the MPMV Pol protein (amino acids 940 to 1185 of the sequence corresponding to GenBank accession number NP_056891). Results were retrieved in the hit table format and were parsed using a series of Perl scripts to eliminate partial and redundant matches. The remaining nonredundant nucleotide sequences were used to conduct an all-against-all BLASTn comparison. A single element was selected from any group containing members with >95% identity over their entire lengths.
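The filtering and representative-selection steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' Perl scripts. It assumes the standard 12-column tab-delimited hit table. The function names, the 90% length cutoff used to call a match "partial", and the greedy way of choosing one representative per >95% identity group are all assumptions introduced here.

```python
from collections import namedtuple

# One parsed row of a tab-delimited BLAST hit table (only the fields used here).
Hit = namedtuple("Hit", "subject start end aln_len")

def parse_hit_table(lines):
    """Parse 12-column tabular BLAST output; subject coordinates are normalised."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        s_start, s_end = int(f[8]), int(f[9])
        hits.append(Hit(f[1], min(s_start, s_end), max(s_start, s_end), int(f[3])))
    return hits

def filter_hits(hits, query_len, min_fraction=0.9):
    """Drop partial matches (assumed cutoff) and matches overlapping a kept hit."""
    kept = []
    for h in sorted(hits, key=lambda h: -h.aln_len):
        if h.aln_len < min_fraction * query_len:
            continue  # partial match
        redundant = any(k.subject == h.subject and h.start <= k.end and k.start <= h.end
                        for k in kept)
        if not redundant:
            kept.append(h)
    return kept

def pick_representatives(names, identity, threshold=95.0):
    """Greedy choice (an assumption): keep a name unless it is >threshold % identical
    to an already chosen representative. `identity` maps frozenset pairs to percent."""
    reps = []
    for name in names:
        if all(identity.get(frozenset((name, r)), 0.0) <= threshold for r in reps):
            reps.append(name)
    return reps
```

With a 246-amino-acid query, `filter_hits(parse_hit_table(lines), 246)` would keep one near-full-length hit per genomic location, and `pick_representatives` would then reduce each group of near-identical elements to a single member.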
Sequence retrieval. Sequences were retrieved either manually using the Entrez server at NCBI or electronically using the EFetch Perl script provided by NCBI (http://www.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html).
pol nucleotide and Pol amino acid sequence alignment and tree construction. pol nucleotide and Pol amino acid sequence alignments were performed using ClustalX V1.83 (30) and default parameters. The amino acid sequences of elements containing frameshift mutations were manually reconstructed by comparison with the most closely related intact Pol protein. Phylogenetic trees were constructed from alignments by using the neighbor-joining method within ClustalX and were viewed using Tree Explorer (Koichiro Tamura; http://evolgen.biol.metro-u.ac.jp/TE/TE_man.html).
TM tree. Where present, transmembrane (TM) sequences were derived from DNA sequences by conceptual translation. The TM region corresponded to the 150- to 160-amino-acid region spanning from the cleavage site (RAKR) to the TM domain (LLGPLLCLLLVLSFGPIIF) of the MPMV Env protein (amino acids 391 to 547 of the sequence corresponding to GenBank accession number AAC82575), as described by Bénit et al. (4). Alignment and tree construction were performed as described above.
Primer binding site identification. For those elements that possess long terminal repeats (LTRs), we attempted to identify the tRNA species used to prime reverse transcription. The 25 nucleotides (nt) immediately adjacent to the 5' LTR were compared against a database of tRNA sequences (26) by using the BLASTn program of Standalone BLAST, a word size of 7 nt, and a reduced penalty for mismatches (1). In most cases, the highest-scoring match was assumed to be the priming tRNA.
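The idea behind this step can be illustrated with a small sketch. It is a simplified stand-in for the BLASTn comparison, not the procedure actually used: it takes the 25 nt downstream of the 5' LTR and scores each tRNA by ungapped matches between that window and the reverse complement of the tRNA's 3' end. The 18-nt probe length, the function names, and the input layout (a dictionary of tRNA sequences) are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def pbs_window(provirus, ltr5_end, size=25):
    """Return the `size` nt immediately downstream of the 5' LTR.
    `ltr5_end` is the 0-based index of the first base after the LTR."""
    return provirus[ltr5_end:ltr5_end + size].upper()

def best_trna(window, trnas, pbs_len=18):
    """Return (name, matches) for the tRNA whose 3' end best complements the window.
    Ungapped scoring only; a simplification of a BLASTn search."""
    best = (None, -1)
    for name, seq in trnas.items():
        tail = seq.upper().replace("U", "T")[-pbs_len:]
        probe = reverse_complement(tail)
        for offset in range(len(window) - len(probe) + 1):
            matches = sum(a == b for a, b in zip(probe, window[offset:]))
            if matches > best[1]:
                best = (name, matches)
    return best
```

As in the text, the highest-scoring tRNA would be taken as the likely primer, with the caveat that a low match count should be treated as no assignment.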
pol percent identity range and average. Each subgroup of pol sequences was aligned using ClustalX (see above) and output as a percent identity matrix. The percent identity range gives the lowest and highest percent identities from this matrix, whereas the percent identity average is the average of all the percent identities.
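As a small worked illustration of these two statistics, the sketch below summarises a symmetric percent identity matrix. It assumes that each pair of sequences is counted once and that the diagonal (self-comparisons) is excluded; the text does not state either choice, and the function name is hypothetical.

```python
def identity_summary(matrix):
    """Return (lowest, highest, average) percent identity over all distinct pairs.
    `matrix` is a square, symmetric list of lists; the diagonal is ignored."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return min(values), max(values), sum(values) / len(values)

# Toy three-sequence matrix (illustrative values only).
toy = [[100, 90, 80],
       [90, 100, 94],
       [80, 94, 100]]
low, high, mean = identity_summary(toy)
```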
pol and LTR copy numbers. pol copy numbers were taken from the initial pol nucleotide tree. LTR copy numbers were estimated by conducting BLASTn searches of the mouse and rat genomes with each LTR sequence. Segmented matches were joined if the gap between matching segments was less than 100 nt, and only those matches of greater than 90% of the length of the original LTR were included in the subsequent analysis. Results of all matches to all LTRs were parsed to eliminate redundant matches, and the copy number of each LTR was tallied.
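The joining and length-filtering rules above can be sketched as follows. This is a minimal illustration under stated assumptions: segments are (start, end) pairs with 1-based inclusive coordinates on a single genomic sequence and strand, the gap is taken as the number of unmatched nucleotides between two segments, and the length of a joined match is measured on the genome rather than on the LTR query. The function names are hypothetical.

```python
def join_segments(segments, max_gap=100):
    """Merge matching segments separated by fewer than `max_gap` unmatched nt."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] - 1 < max_gap:
            # Close enough to the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def count_ltr_copies(segments, ltr_len, min_fraction=0.9):
    """Count joined matches longer than `min_fraction` of the query LTR length."""
    return sum(1 for start, end in join_segments(segments)
               if (end - start + 1) > min_fraction * ltr_len)
```

For a 400-nt LTR, two segments 49 nt apart would be joined into one 381-nt match and counted, whereas an isolated 101-nt match would be discarded.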
Repeat annotation. Each pol or LTR sequence was compared with the February 2003 assembly of the mouse genome or the June 2003 assembly of the rat genome by using the BLAT search tool at http://genome.ucsc.edu. The coordinates of the best (and in most cases identical) match were used to parse the repeat annotation (chromOut) files, which were generated using the RepeatMasker program (http://www.repeatmasker.org), for the repeat annotation at that location in the relevant genome and chromosome.
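Looking up the annotation at a set of coordinates amounts to an interval-overlap query on the RepeatMasker output. The sketch below assumes the usual whitespace-delimited .out layout (score, percent divergence, deletion, insertion, query sequence, begin, end, and so on, after header lines); the function names and the returned dictionary keys are assumptions for illustration and are not taken from the original analysis.

```python
def parse_out(lines):
    """Parse RepeatMasker .out lines into dictionaries; header lines are skipped."""
    records = []
    for line in lines:
        f = line.split()
        if len(f) < 11 or not f[0].isdigit():
            continue  # header, blank, or malformed line
        records.append({
            "div": float(f[1]),    # percent divergence from consensus
            "chrom": f[4],
            "start": int(f[5]),
            "end": int(f[6]),
            "repeat": f[9],        # matching repeat name
            "family": f[10],       # repeat class/family
        })
    return records

def annotations_at(records, chrom, start, end):
    """Return all repeat annotations overlapping chrom:start-end (inclusive)."""
    return [r for r in records
            if r["chrom"] == chrom and r["start"] <= end and start <= r["end"]]
```

An element whose coordinates return several records with high divergence values, or records separated by unannotated sequence, would correspond to the partially annotated cases discussed in the text.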
PipMaker alignments and dot plots. Long alignments were performed using PipMaker (24). Dot plots were generated from the blastz output file returned by PipMaker by using the Perl GD module.
Additional Web-based tools. Open reading frame (ORF) structures were identified using NCBI's ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/), and translations of nucleotide sequences were performed using the translate tool on the ExPASy molecular biology server (http://ca.expasy.org/tools/dna.html). In cases in which ORFs were interrupted by frameshift mutations or insertions or deletions, relevant ORFs were identified using the tBLASTn and BLASTx functions of the BLAST 2 sequences server (27).
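For readers who want to reproduce the conceptual translation step without the Web tools, a minimal sketch is given here. It is not ORF Finder or the ExPASy tool; it simply translates a chosen forward reading frame with the standard genetic code and reports the longest stretch without a stop codon. The function names are hypothetical, and codons containing ambiguous bases are rendered as X by assumption.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate one forward frame (0, 1, or 2); '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def longest_open_stretch(protein):
    """Length of the longest run of residues uninterrupted by a stop codon."""
    return max(len(part) for part in protein.split("*"))

def forward_frames(seq):
    """Translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]
```

Comparing the three forward translations of an element side by side gives a rough view of where stop codons and frameshifts interrupt a gene, similar in spirit to an ORF map.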
Sequences. FASTA files of sequences are available upon request.
Accession numbers of retroviral sequences used in BLAST searches and alignments. Accession numbers of retroviral sequences used in BLAST searches and alignments are as follows: MPMV, AF033815; SRV-1, M11841; SRV-2, M16605; SERV231, U85505; SERV252, U85506; SMRV, M23385; TvERV-D, AF224725 and AF284693 (Env); JSRV, M80216; ENTV, Y16627; MMTV, M15122; MIAP, M17551; MIAP-related element with an envelope gene (MIAPE), M73818; HERV-K10 HML2, M14123; Rous sarcoma virus, AF033808; reticuloendotheliosis virus (REV), X01455; spleen necrosis virus (SNV) Env, M87666; feline retrovirus RD114 (Env), X87829; baboon endogenous virus (BaEV), D10032; gibbon ape leukemia virus, M26927; koala retrovirus, AF151794; Mus musculus endogenous retrovirus (MmERV), AC005743 (nt 112341 to 121005); Mus dunni endogenous virus, AF053745; porcine endogenous retrovirus type A 463H12, AF435966; Moloney murine leukemia virus, AF033811; Mus cervicolor popaeus endogenous virus (McpEV), AF327437; feline leukemia virus, M18247; python endogenous retrovirus, AF500296; murine endogenous retrovirus U1 (Env), AC079043 (nt 96983 to 97459); HERV-H (Env), CAB94192; and HERV-W (Env), AAD14546.2.
Nucleotide sequence accession numbers. Sequences of new elements from mouse, rat, and other species are located in GenBank under the accession numbers given in Table 2.
TABLE 2. Features of representative betaretroviruses
We searched for betaretroviruses in the mouse and rat genomes by conducting tBLASTn searches (i.e., comparing a protein query sequence with the genome translated in all six reading frames) using a 246-amino-acid sequence from the RT domains of the Pol proteins of all known betaretroviruses and several class II primate and rodent endogenous retroviruses (see Materials and Methods). Matching sequences were aligned and used to construct a neighbor-joining tree. This initial tree, based on nucleotide sequences, indicated that multiple groups of endogenous betaretroviruses and class II elements are present in the mouse and rat genomes. In this paper, we will focus on those elements which clustered with the betaretroviruses.
All elements grouping with the betaretroviruses were analyzed in more detail. For each element, the sequence of a 15.7-kb region, spanning 7.5 kb on either side of the Pol-related sequence, was extracted from GenBank. The resulting sequences were analyzed in terms of their gene contents (the presence or absence of genes for retroviral proteins and the integrity of those genes) and the presence of identifiable LTRs and primer binding sites. Selected elements containing intact or reconstructible ORFs were used in phylogenetic analyses which were performed using pol nucleotide and deduced Pol amino acid sequences. Those elements which were chosen to represent groups of several elements were usually the most intact (in terms of the presence and integrity of ORFs and the presence of LTRs) of their group, although we strived to ensure complete coverage of the betaretrovirus section of the original pol tree. In many cases, this required extensive manual reconstruction of Pol ORFs which contained numerous frameshift mutations. As a consequence, we consider the Pol amino acid tree to be less reliable than the pol nucleotide tree. Trees were also constructed using deduced amino acid sequences corresponding to gag (data not shown) and env (see below) where present.
As shown in Fig. 1, multiple groups of betaretrovirus-related pol sequences are present in the mouse and rat genomes. We designated these groups ß1 to ß7 according to their pol-based phylogenetic relationships to one another and to known betaretroviruses from other species. The branching orders of several of the groups differ between the pol nucleotide and Pol amino acid trees, possibly due to errors introduced during manual reconstruction of mutated Pol ORFs (see above). However, all of the groups contain the same members in both trees and (in most cases) are supported by high bootstrap values (Fig. 1). Two sets of sister groups (namely, ß4-ß5 and ß6-ß7) are apparent in both the pol and Pol trees and could arguably be combined. However, we made the arbitrary decision to designate these as separate groups based on the presence of previously known betaretroviruses.
FIG. 1. Neighbor-joining trees based on alignments of betaretrovirus pol nucleotide (a) and Pol amino acid (b) sequences. The aligned sequences are from the same 738-bp or 246-amino-acid region spanning the eight conserved regions of retroviral RT proteins described by Jacobo-Molina and Arnold (15). The trees are rooted on the ß1 group of retroviruses. Names of previously known betaretroviruses are surrounded by dashed boxes, and those of newly discovered nonmurine retroviruses are surrounded by solid boxes. Bootstrap values (out of 1,000 replicates) are shown at the nodes. Shapes correspond to those in Fig. 3 and indicate groups determined based on the Env tree.
Descriptions of groups ß1 to ß7. (i) ß1. The ß1 group falls outside the large group containing all of the other betaretroviruses in the pol nucleotide tree (Fig. 1a) but lies within a larger group including groups ß3 to ß7 and excluding group ß2, with high bootstrap support, in the Pol amino acid tree (Fig. 1b). We favor the grouping based on nucleotide sequences because of the subjectivity involved in manually determining amino acid sequences from nucleotide sequences which contain frameshift mutations (see above).
The majority of the elements in group ß1 fall into four clusters of rat-specific elements, represented by the Rattus norvegicus endogenous retrovirus group ß1 element corresponding to accession number NW_043030 (designated RnERV-ß1_NW_043030) (28 elements), RnERV-ß1_NW_043429 (9 elements), RnERV-ß1_NW_044437 (2 elements), and RnERV-ß1_NW_044440 (2 elements). The majority of the elements in these clusters possess mutated gag, pro, and pol genes. Although the majority of the members of this group possess remnants of an env gene, few were sufficiently intact to enable inclusion in the TM tree.
A single ß1 element, MmERV-ß1_NT_039714, is present in the draft mouse genome, and only a mutated pol gene of that element could be detected.
(ii) ß2. The ß2 group comprises mouse and rat elements and includes the previously known betaretrovirus MMTV. Several clusters within group ß2 are apparent; some of these are species specific, while others contain both mouse and rat elements.
RnERV-ß2_NW_043524 represents a cluster of 19 rat-specific elements which are all highly similar and initially appeared to have arisen through a recent replicative burst. However, all of these elements lack LTRs and intact ORFs for gag, pro, or pol, and closer inspection revealed that this group arose through duplication of genomic DNA rather than retrotransposition (data not shown).
MmERV-ß2_AC113463a and MmERV-ß2_AC131667 belong to a group of seven mouse-specific elements. MmERV-ß2_AC131667 possesses intact pol and env ORFs, but its gag and pro genes are interrupted by a small number of premature stop codons and frameshift mutations. Both MmERV-ß2_AC113463a and MmERV-ß2_AC131667 possess relatively long LTRs (Table 2).
Four endogenous MMTV elements were identified in the C57BL/6J mouse genome. Three of these were full-length, possessing two identical or near-identical LTRs and (largely) intact ORFs. The fourth was an incomplete MMTV from a short contiguous DNA sequence (contig).
The elements most closely related to MMTV are a group of rat elements represented by RnERV-ß2_AC127663. This group contains five closely related members, the most intact of which is RnERV-ß2_AC127663, with intact gag and pro ORFs, a single terminating mutation in the pol gene, and a frameshift in the env gene (Fig. 2). Not only do these elements group with MMTV based on pol sequences, but they also possess long LTRs (1,200 bp) and sag genes, features they share with MMTV. Seventeen solitary LTRs derived from this group of elements are present in the rat genome, and they are all highly (94 to 100%) related to those of RnERV-ß2_AC127663 (data not shown). Thus, these elements also appear to have entered the rat genome recently (see Discussion).
FIG. 2. ORF structures of selected murine betaretroviruses. The three forward reading frames are shown for each element, with vertical black lines indicating stop codons. The gag, pro, pol, and/or env ORFs from each element are indicated, as is the presence of premature termination and/or frameshift mutations.
(iii) ß3. The ß3 group comprises two murine clusters, both of which contain mouse and rat elements, as well as the previously known betaretroviruses JSRV and ENTV of sheep and goats. It is a sister group to the ß2 elements in the pol nucleotide tree (Fig. 1a) but groups with the ß6 and ß7 elements (albeit with very low bootstrap support) in the Pol amino acid tree (Fig. 1b). Again, we consider the position in the pol nucleotide tree to be the most likely.
One ß3 cluster contains 11 elements and is represented by MmERV-ß3_AC111097, MmERV-ß3_AC122238, MmERV-ß3_NT_039307, and RnERV-ß3_AC120757. All of the members of this group have numerous premature stop codons and frameshift mutations. MmERV-ß3_AC111097 and RnERV-ß3_AC120757 possess identifiable LTRs, and numerous solitary LTRs related to each are present in the mouse and rat genomes (Table 2).
JSRV and ENTV fall inside the ß3 group with high bootstrap support in the pol nucleotide tree and moderate bootstrap support in the Pol amino acid tree, although in both cases JSRV and ENTV lie outside the group of mouse and rat elements.
(iv) ß4. Group ß4 contains five mouse-specific clusters, two rat-specific clusters, the type D retrovirus TvERV-D from the Australian brushtail possum, and a gray mouse lemur (Microcebus murinus) endogenous retrovirus which we describe for the first time. Several of the members of this group possess intact gag, pro, pol, and/or env ORFs.
RnERV-ß4_AC106444 and RnERV-ß4_AC119089 belong to a cluster of 18 closely related rat elements; their pol genes are 94% identical, on average (Table 2). All possess LTRs and identifiable retroviral ORFs. Several of the elements have one or more intact ORFs, and the 5' and 3' LTRs of many of the elements are highly similar (>97%), a testament to their recent expansion. These elements are also accompanied by a vast excess of solitary LTRs (Table 2).
MmERV-ß4_AC110500 belongs to a cluster of eight mouse elements which also appear to have expanded relatively recently. MmERV-ß4_AC110500 possesses near-identical LTRs, intact gag, pro, and pol ORFs, and an env gene with three premature stop codons (Fig. 2).
A cluster of 10 mouse-specific elements includes MmERV-ß4_AC102561, MmERV-ß4_AL683829b, and MmERV-ß4_AL805955. The latter possesses intact gag, pro, pol, and env ORFs (Fig. 2) and identical LTRs, suggesting recent retrotransposition. In addition, almost 500 MmERV-ß4_AL805955-related LTRs are present in the mouse genome (Table 2), with identity to those of MmERV-ß4_AL805955 ranging from 100% down to 80%. Most of these are solitary LTRs, although some are associated with a family of LTR retrotransposons present in 15 copies and possessing only remnants of the original MmERV-ß4_AL805955 ORFs. Despite the number of MmERV-ß4_AL805955-related LTRs in the mouse genome, they are only partially recognized as repeats by the RepeatMasker program (see below and Table 3).
TABLE 3. Repeat annotation of pol and LTRs
TvERV-D is placed within group ß4 with moderate to high bootstrap support in both the pol nucleotide and Pol amino acid trees. However, its deep branching position within the ß4 group and its long branch reflect its distant relationship to the mouse and rat ß4 elements.
(v) ß5. Group ß5 comprises several rat- and mouse-specific clusters, as well as SMRV.
RnERV-ß5_AC127785, RnERV-ß5_NW_043324, and RnERV-ß5_NW_044400 represent smaller clusters within a larger cluster of 26 rat elements. The RnERV-ß5_AC127785 cluster of 18 elements has an average pol identity of 95%. Several members of the RnERV-ß5_AC127785 group, including RnERV-ß5_AC127785 itself, have intact gag, pro, pol, and env ORFs (Fig. 2) and identical or near-identical LTRs, which suggests autonomous and recent replication. In contrast, the members of the RnERV-ß5_NW_044400 (four elements) and RnERV-ß5_NW_043324 (four elements) clusters possess highly mutated gag, pro, pol, and env ORFs and many lack identifiable LTRs, suggesting ancient retrotransposition events.
The clusters of mouse and rat elements represented by MmERV-ß5_NT_039553 and RnERV-ß5_NW_043819 appear to be sister groups. These groups fall within the larger ß5 group with moderate bootstrap support (65.0%) in the pol nucleotide tree (Fig. 1a) but with only weak support (24.2%) in the Pol amino acid tree (Fig. 1b).
SMRV lies within ß5 with moderate to strong bootstrap support in the pol nucleotide and Pol amino acid trees.
(vi) ß6. Group ß6 is a relatively small group comprising one cluster of 12 rat elements (represented by RnERV-ß6_NW_043087) and two clusters of two mouse elements each (one cluster includes MmERV-ß6_NT_039167 and MmERV-ß6_NT_039210; the other is represented by MmERV-ß6_NT_039424), as well as the exogenous and endogenous type D retroviruses of Old World monkeys. The gag, pro, pol, and env genes of all ß6 elements are mutated. The Old World monkey type D retroviruses fall within group ß6 with very strong bootstrap support based on both pol nucleotide and Pol amino acid sequences.
(vii) ß7. The ß7 elements form the largest murine betaretrovirus group, comprising 229 elements. Only two of these elements are rat elements, and the rest are from the mouse genome. No known betaretroviruses from other species belong to the ß7 group.
The only two rat elements belonging to group ß7, RnERV-ß7_NW_043514 and RnERV-ß7_NW_043214, appear to be old insertions. They have mutated gag, pro, and pol genes but no identifiable LTRs.
In contrast, the mouse elements in group ß7 have been retrotranspositionally active for some time, and some are still active. Some of the older members of this group (MmERV-ß7_NT_039472 and MmERV-ß7_NT_039618), which do not have identifiable LTRs and possess highly mutated (and often barely distinguishable) gag, pro, and/or pol ORFs, also possess mutated env genes. However, the majority of ß7 elements possess only vestiges of an env gene or lack it completely. Although some clusters are apparent within the ß7 group, the clusters are generally poorly defined.
MmERV-ß7_BK001485 and MmERV-ß7_AC124426 represent a cluster of 78 elements that have retrotransposed recently, as previously reported (3); this group includes the previously identified type D retrovirus MusD (18). The pol genes of this group have an average sequence identity of 95% (Table 2). Both MmERV-ß7_BK001485 and MmERV-ß7_AC124426 have identical LTRs and intact gag, pro, and pol ORFs. An additional six elements have completely intact gag, pro, and pol ORFs and identical (or nearly identical) LTRs. More than 500 MmERV-ß7_BK001485- and MmERV-ß7_AC124426-related LTRs reside in the mouse genome (Table 2). Approximately 40% of these are associated with full-length proviruses, another 15% are associated with the ETnII family of MusD-derived retroelements (3), and the remainder are solitary LTRs.
The remaining elements in the ß7 group all have mutated gag, pro, and pol genes. Generally, those elements which diverge closer to the base of the ß7 group (Fig. 1) appear to be older: they contain more mutations in their gag, pro, and pol genes, and their 5' and 3' LTRs are either distantly related or unidentifiable (Table 2).
Relationships of env genes. A neighbor-joining tree was constructed using sequences from a conserved region of the TM domain of the env ORF, where present and/or reconstructible (see Materials and Methods). The TM tree is shown in Fig. 3. In contrast to the trees based on pol nucleotide and Pol amino acid sequences, which appear to represent primarily gradual evolution, the Env tree shows three distinct groups of murine betaretroviruses. One of these groups includes the ß1, ß2, and ß3 groups, as well as the class II endogenous retroviruses MIAPE and HERV-K. A second group comprises the majority of the ß4, ß5, and ß7 elements, as well as the Env proteins of the mammalian type C (gamma) retroviruses. The third group includes individual ß4 and ß6 elements, as well as all of the type D (and related) retroviruses and an endogenous retrovirus that we discovered in the genomic sequence of Seba's short-tailed bat (Carollia perspicillata; see below). The Env sequences of the rat ß6 elements, along with those of HERV-W, HERV-H, and murine endogenous retrovirus U1, are distantly related to this group.
FIG. 3. Neighbor-joining tree based on alignment of TM amino acid sequences. The aligned sequences are from the region spanning the cleavage site and the TM domain of the Env protein, as described by Bénit et al. (4). Names of previously known retroviruses are surrounded by dashed boxes, and those of newly discovered nonmurine retroviruses are surrounded by solid boxes. Shapes correspond to those in Fig. 1 and indicate groups determined based on the Env tree.
In the case of groups ß4 to ß6, it appears that recombination has occurred between the pol and env genes, giving rise to new pol-env combinations. The ß4 and ß5 groups of murine endogenous retroviruses are sister clades in the Env tree as they are in the pol and Pol trees. This suggests that a single recombination event gave rise to the pol-env combination of the common ancestor of the ß4 and ß5 groups. One ß4 element, MmERV-ß4_NT_039539, has undergone an additional recombination event, during which it has acquired a ß6-related env gene (Fig. 3).
The env genes of the ß6 elements appear to have been acquired through two independent recombination events. The mouse elements MmERV-ß6_NT_039167 and MmERV-ß6_NT_039210 have env genes which cluster with those of the type D retroviruses, as they do in the pol and Pol trees. The rat ß6 elements are the murine betaretroviruses that are most closely related to the type D retroviruses of Old World monkeys on the basis of their pol genes, whereas the mouse ß6 elements are more closely related to these retroviruses on the basis of their env genes (Fig. 3).
The only group ß7 elements with sufficient Env sequences to include in alignments, MmERV-ß7_NT_039472 and MmERV-ß7_NT_039618, cluster with the gammaretroviruses (gibbon ape leukemia virus, koala retrovirus, MmERV, Mus dunni endogenous virus, porcine endogenous retrovirus type A, Moloney murine leukemia virus, McpEV, and feline leukemia virus) and an unclassified python retrovirus.
One of the most interesting features of the Env tree is the relationship of the type D group members (MPMV, SRV-1 and SRV-2, SERV251, SMRV, TvERV-D, BaEV, RD114, SNV, and REV) to one another and to the murine betaretroviruses. Whereas the type D retroviruses are placed within or interspersed with groups ß4, ß5, and ß6 based on their pol sequences (Fig. 1), they form a tight cluster with one another, to the exclusion of all murine retroviruses, based on their Env sequences. This suggests that the type D envelope may have been acquired from a nonmurine, and possibly nonmurid, host (see Discussion).
In an attempt to identify the origin of the type D group env gene, we conducted tBLASTn searches using the amino acid sequences of several members of this group against the nonmouse, nonrat, nonhuman genome survey sequence and HTGS databases at NCBI. The highest-scoring match was with a BAC sequence from Seba's short-tailed bat (Carollia perspicillata), which is being sequenced as part of the NISC Comparative Vertebrate Sequencing Initiative. This env gene, which belongs to an endogenous retrovirus which we have named CpERV-ß5_AC138156, is most closely related to that of SMRV (Fig. 2). CpERV-ß5_AC138156 is an incomplete provirus which possesses almost (98%) identical 363-bp LTRs but has a large deletion which removes approximately one-third of the gag gene (at the 3' end), the entire pro and pol genes, and approximately 1/10 of the env gene (at the 5' end). What remains of the gag ORF corresponds to 485 amino acids and has the highest identity to the Gag protein of SMRV (49% identity). The 514 amino acids of the Env protein of CpERV-ß5 are 68% identical to the corresponding sequence of the SMRV Env protein. Thus, it appears that SMRV and CpERV-ß5_AC138156 share a recent common ancestor (see Discussion).
Common insertions in the mouse and rat genomes. We attempted to identify insertions of betaretrovirus elements at the same positions in the mouse and rat genomes. In general, the betaretroviruses we discovered formed species-specific clusters, suggesting expansion after the mouse-rat split. Two exceptions to this general rule were two ß2 elements and a group of ß3 elements.
The two ß2 elements (MmERV-ß2_NT_039339 and RnERV-ß2_NW_0433361) are not represented in the pol and Pol trees in Fig. 1, but they group with MmERV-ß2_NT_039761. It was apparent from the initial pol tree derived from all mouse and rat pol sequences that these two elements were relatively closely related and that they grouped together to the exclusion of all other mouse and rat elements. Both elements include only a pol-related region and a short region with similarity to the gag gene, but we were unable to detect any homology to other retroviral genes or identify LTRs. However, alignment of the pol regions and flanking sequences showed that these loci display similarity over a large range in the mouse and rat genomes (Fig. 4a), suggesting that these elements represent remnants of a ß2 retroviral insertion which occurred prior to the mouse-rat split.
FIG. 4. Dot plots of aligned common insertions in mouse and rat betaretroviral elements. Alignments and dot plots were generated as described in Materials and Methods. In all cases, the mouse and rat elements are on the horizontal and vertical axes, respectively. The grey boxes represent the retroviral regions. (a) Flanking sequences of 7.5 kb on either side of the pol region of MmERV-ß2_NT_039339 and RnERV-ß2_NW_0433361. (b) Flanking sequences of 5 kb on either side of RMER16 LTRs in mouse (accession number AC115914; horizontal axis) and rat (accession number AC095664; vertical axis) genomes. (c) Data in this panel are the same as described for panel b with mouse (accession number AC131039; horizontal axis) and rat (accession number AC125654; vertical axis) RMER16 LTRs. Notations along axes are accession numbers followed by the coordinates of the first and last nucleotides of the aligned sequence in the contig or clone and the orientation of the sequence relative to the contig or clone. +, element and contig or clone are in the same orientation; -, element and contig or clone are in opposite orientations.
Repeat annotation. We determined the RepeatMasker annotation of the pol and LTR regions of the mouse and rat betaretroviruses as described in Materials and Methods (Table 3). In general, those groups with large numbers of closely related members are well annotated, presumably because they are more readily detected by repeat-seeking programs. Such elements match, over their entire lengths and with low levels of divergence from consensus, repeats in the Repbase Update database. Examples include the LTRs of RnERV-ß1_NW_043429, MmERV-ß2_AC113463, MmERV-ß2_AC131667, MmERV-ß3_AC111097, RnERV-ß3_AC120757, RnERV-ß4_AC106444, RnERV-ß4_AC119089, RnERV-ß6_NW_043087, and many members of the ß7 group and the pol regions of MmERV-ß3_AC111097, MmERV-ß3_AC122238, RnERV-ß3_AC120757, MmERV-ß6_NT_039424, RnERV-ß6_NW_043087, and many members of the ß7 group. Those groups that contain few and/or distantly related members are less well annotated.
Although the majority of the pol elements in the genomes of mice and rats are annotated as repeats, most of them show high levels of divergence from consensus. This suggests that although the murine betaretroviruses are recognized as being of retroviral origin, the annotation of mouse and rat betaretroviruses is currently incomplete. Consequently, some elements are assigned to groups to which they are only distantly related.
Many of the LTRs are only partially annotated. The most striking example of this is the LTRs of MmERV-ß4_AL805955. The mouse genome assembly contains almost 500 MmERV-ß4_AL805955 LTRs (Table 2) with an average sequence identity of 87%, and yet these LTRs are not assigned their own name in Repbase Update and instead are annotated as having a section with 26.8% divergence from the ETnERV2 consensus, a section with 25.7% divergence from the RNLTR3c consensus, and an intervening section which is a nonrepeat (Table 3). Other examples of numerous yet incompletely annotated LTRs are those of MmERV-ß4_AC110500, MmERV-ß4_AC124523, MmERV-ß5_AC125328, MmERV-ß5_NT_039649, and RnERV-ß5_AC127785 (Tables 2 and 3). Clearly, the completeness of the repeat databases has implications for both the annotation of genomic sequences and evolutionary deductions (see Discussion).
Four of the murine betaretrovirus groups (ß2, ß4, ß5, and ß7) possess coding-competent members, with fully intact ORFs for Gag, Pro, Pol, and/or Env proteins (Fig. 2). Most of the elements with intact ORFs also possess identical or near-identical 5' and 3' LTRs. These two features combined suggest recent and autonomous retrotransposition or infection.
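The two criteria in the preceding paragraph (an uninterrupted reading frame and near-identical 5' and 3' LTRs) can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The function names, the 0.99 identity threshold, and the simplification that a region is "intact" when its given frame contains no internal stop codon are all assumptions made for illustration. The LTRs are assumed to be supplied already aligned, with "-" marking gaps.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_reading_frame(region: str) -> bool:
    """Return True if the region, read in frame 0, has no internal stop codon.

    A single terminal stop codon is allowed. Trailing bases that do not
    complete a codon are ignored. This is a deliberate simplification.
    """
    region = region.upper()
    usable = len(region) - len(region) % 3
    codons = [region[i:i + 3] for i in range(0, usable, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return len(codons) > 0 and not any(c in STOP_CODONS for c in codons)


def ltr_identity(ltr5: str, ltr3: str) -> float:
    """Fraction of identical sites between two aligned LTRs, skipping gap columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


def looks_recently_active(region: str, ltr5: str, ltr3: str,
                          min_identity: float = 0.99) -> bool:
    """Combine both criteria; min_identity is a hypothetical threshold."""
    return is_open_reading_frame(region) and ltr_identity(ltr5, ltr3) >= min_identity
```

Because the two LTRs of a provirus are identical at the time of integration, their identity falls only as mutations accumulate, which is why the two checks are combined here rather than used separately.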
Previous reports suggest that the ß2 (MMTV) elements of mice have variable distribution in wild mice and inbred strains and appear to have entered the genomes of their hosts recently (7, 14). Our results support these observations and suggest that the closely related ß2 viruses in the rat genome (represented by RnERV-ß2_AC127663) were also recently acquired. Both groups of elements contain few members, all of which are fully (or almost fully) intact and have highly similar 5' and 3' LTRs. In addition, no MMTV solitary LTRs and only 17 solitary LTRs from the RnERV-ß2_AC127663 group are observed in the mouse and rat assemblies, respectively. More distantly related ß2 elements reside in the mouse (MmERV-ß2_AC113463 and MmERV-ß2_AC131667) and rat (RnERV-ß2_NW_043520) genomes. These may correspond to the MMTV-related elements previously detected by Southern hybridization (6). It is interesting that MMTV and RnERV-ß2_AC127663 both possess sag genes, whereas the more distantly related ß2 elements do not. Acquisition of the sag gene by these viruses may have been crucial in enabling the cross-species transmission back to mice and rats.
The ß7 (MusD) elements display insertional polymorphisms in mice (2), suggesting that they are still active retrotransposons. These elements also display elevated embryonal expression in some laboratory strains of mice, which may contribute to (or enable) their retrotranspositional activity (4). The activity of ß4 and ß5 coding-competent elements is unknown. However, that these elements have retained intact ORFs despite their presence in the genomes of their hosts for such presumed long periods of time suggests that betaretroviruses from these and/or the other three groups may have retained coding competency and, therefore, the ability to undergo cross-species transmission to other murid species.
We have identified some ß2 and ß3 elements that likely integrated prior to the mouse-rat split (as evidenced by proviruses and solitary LTRs, respectively, at the same positions within the genomes of both species), but we have been unable to do so for the other beta groups. Although this may be because such common integrants do not exist, it is more likely that we have simply missed those integrants because of the nature of our search criteria, because the elements have been mutated or deleted over time, or because the genome sequences are incomplete. For many older elements of some of the groups, only incomplete proviruses (i.e., those lacking LTRs) could be found, and these groups may contain common integrants which we have not detected. It is also likely that many solitary LTRs reside in the mouse and rat genomes that are not represented by their original internal sequences, and these would not be detected by our approach.
Although the majority of the pol elements we have discovered here have already been annotated as repeats, this is the first time the phylogenetic relationships of these elements have been described. The most recently expanded and numerous elements have been identified by repeat-seeking programs and have been well annotated, but older and less numerous elements are poorly annotated. Generally, the pol regions of these elements are recognized as being retroviral, but they are highly diverged from the consensus sequences of the groups to which they have been assigned. In addition, LTRs of these older elements are usually only partially recognized as being repeats. The completeness of annotation obviously has important implications for determining ages of elements and dates of expansion of groups. Divergence from consensus is commonly used to estimate the age of a given element or group of elements (17, 29, 34). However, incomplete identification of repeat groups can lead to the assignment of some repeats to groups to which they are only distantly related, giving high divergences from consensus and skewing measurements of repeat age. Thorough identification and annotation of repeats is therefore of crucial importance for studies of the evolution of repeats and their hosts.
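The effect of misassignment on age estimates can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a method from this study: the Jukes-Cantor correction is one common way to convert observed divergence into substitutions per site, the substitution rate below is a placeholder value, and the 5% figure is hypothetical. The 26.8% figure is the divergence from the ETnERV2 consensus reported above for part of the MmERV-ß4_AL805955 LTR.

```python
import math

# Placeholder neutral substitution rate, in substitutions per site per
# million years. This value is an assumption for illustration only and
# does not come from the article.
ASSUMED_RATE_PER_MYR = 0.0045


def jukes_cantor_distance(p: float) -> float:
    """Convert an observed proportion of differing sites into a
    Jukes-Cantor corrected distance (substitutions per site)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("observed divergence must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 * p) / 3.0)


def estimated_age_myr(p: float, rate: float = ASSUMED_RATE_PER_MYR) -> float:
    """Estimate age in million years from divergence to a consensus,
    assuming the consensus approximates the sequence at integration."""
    return jukes_cantor_distance(p) / rate


if __name__ == "__main__":
    own_group = 0.05       # hypothetical divergence from the element's own consensus
    distant_group = 0.268  # divergence from a distantly related consensus (ETnERV2)
    print("own consensus:", estimated_age_myr(own_group))
    print("distant consensus:", estimated_age_myr(distant_group))
```

Whatever rate is assumed, the same element appears several times older when it is measured against a distantly related consensus, because the divergence then reflects the distance between groups as well as mutations accumulated since integration.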
Increase in the known host range of betaretroviruses. In addition to the newly described mouse and rat betaretroviruses, we have discovered three previously unknown betaretroviruses from other species. CpERV-ß5_AC138156 is present in the genome sequence of Carollia perspicillata, a short-tailed leaf-nosed bat of Central and South America, and M_murinus_ERV-ß4_AC145758 resides in the genome of the gray mouse lemur (Microcebus murinus) of Madagascar. CpERV-ß5_AC138156 is, as far as we are aware, the first known bat retrovirus and is most closely related to the endogenous type D retrovirus of the squirrel monkey (Saimiri sciureus), which also inhabits South America. It is possible that transmission occurred directly between Carollia perspicillata and Saimiri sciureus or between these hosts via an intermediate host or that the retroviruses were transmitted to both hosts from an unknown (perhaps murid) host. M_murinus_ERV-ß4_AC145758 possesses sufficient sequence to be included in both pol and Pol and env trees, and in both cases it groups with the murine ß4 elements (Fig. 1 and 3). The third novel betaretrovirus sequence was that of BtERV-ß2_CC563924, which was discovered in a clone from the cow (Bos taurus) genome. The significance of these newly discovered proviruses to the evolution of betaretroviruses is unknown, but they extend the biological and geographical ranges of known betaretrovirus hosts and suggest that further investigation of betaretroviruses in these and other species is warranted.
Several groups have recently reported the detection of betaretroviruses in the genomes of pigs (10, 22), the bower bird, and the stripe-faced dunnart (13). These elements were detected by PCR using degenerate primers, and the sequences were too short to include in our pol and Pol alignments. We constructed trees using shorter pol and Pol sequences, including two pig elements (PMSN-1 and PMSN-4) (10) and the bower bird and stripe-faced dunnart elements (13), and all of these elements fell outside of the seven groups of betaretroviruses described here (data not shown), suggesting that an even greater diversity of betaretroviruses exists and awaits thorough characterization.
Murid rodents as hosts for evolution and distribution of betaretroviruses. It is clear from our results that a diverse range of betaretroviruses is present in the genomes of murine rodents. We have also obtained evidence of the presence of several betaretroviruses in the genomes of two North American sigmodontine rodents (our unpublished results), suggesting that betaretroviruses are broadly distributed in the Muridae family. Thus, murid rodents, with their global distribution, appear to have played a major role in the evolution and spread of betaretroviruses. It is also clear that betaretroviruses are present in the genomes of a wide variety of nonmurid hosts (some known, some currently unknown) and that numerous interspecies transmission events must have occurred. At this stage, however, it is unclear whether the majority of betaretrovirus evolution occurred in a murid rodent context, with occasional transmission to other species, or whether other hosts have played an equal or greater role in betaretrovirus evolution.
Transmission between murid and nonmurid hosts, regardless of the direction of transmission, has sometimes involved recombination within the retroviral genome to create new pol-env combinations, as exemplified by the type D retroviruses. These viruses are found within different groups of murine betaretroviruses (ß4, ß5, and ß6) based on their pol genes (Fig. 1). However, they do not group with their murine counterparts in the TM tree and are instead grouped together, to the exclusion of all murine sequences (Fig. 3). This suggests that several different betaretroviruses have acquired the same env gene during transmission between hosts. Several viruses from other (non-beta) retroviral genera (namely, SNV and REV of anseriform and gallinaceous birds, BaEV of baboons, and the feline retrovirus RD114) also possess type D env genes, and these viruses all appear to have arisen relatively recently through recombination and cross-species transmission (19, 20, 31). The type D env gene has thus proven itself to recombine readily and enable infection of a wide range of hosts and may confer a selective advantage on viruses which possess it.
Our results show that the diversity of endogenous betaretroviruses within the genomes of mice and rats (and other mammals) is much greater than was previously appreciated. Studies of other murid rodents, other mammals, and perhaps nonmammalian vertebrates will surely reveal an even greater diversity.
We thank the NISC Comparative Sequencing Program for the use of their sequence data.