Journal of Virology, May 2005, p. 6478-6486, Vol. 79, No. 10
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.10.6478-6486.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
and
Michael Tristem
Department of Biological Sciences, Imperial College, Silwood Park, Buckhurst Rd., Ascot, Berkshire SL5 7PY, United Kingdom
Received 22 July 2004/ Accepted 28 November 2004
Most ERVs show clear homology to one another and to modern exogenous retroviruses, especially across the RT gene, which is relatively refractory to nonsynonymous substitution. Diverse retrovirus sequences can therefore be aligned in order to investigate phylogenetic relationships, and this has been instrumental in the classification of exogenous retroviruses into seven genera (alpha-, beta-, gamma-, delta-, and epsilonretroviruses; lentivirus; and spumavirus) (12, 26, 34, 37). Although many ERVs have not been assigned to particular genera, there is a growing tendency to group them into classes according to their similarity to exogenous retroviruses (19, 20, 36). Using this system of classification, ERVs clustering with gamma- and epsilonretroviruses are termed class I, those that cluster with lentiviruses, alpha-, beta-, and deltaretroviruses are termed class II, and those that cluster with spumaviruses are termed class III (6, 36). It should be noted that, despite this classification system, most ERVs are only distantly related to known exogenous retroviruses. In particular, the lentiviruses and deltaretroviruses have no closely related endogenous counterparts (16). The distribution and diversity of class I and class III ERVs have both been investigated previously in some detail (6, 16, 23), but this is not the case with the class II ERVs. Only a small number of alpha- and betaretrovirus-related (but no lentivirus or deltaretrovirus-related) elements have been characterized (9, 10, 16). Despite this, sequence analysis has revealed that class II ERVs cluster into a robustly supported clade in retrovirus phylogenies (16). Furthermore, it appears that, in contrast to class I and class III retroviruses, the class II ERVs have a relatively restricted host range, being confined largely to mammals and birds (16). The only exceptions to this are two ERVs, termed python endogenous retroviruses, that were recently identified in boid snakes (18).
Class II ERVs differ from other retroviruses in several features of their genomic organization. All described class II retroviruses produce a Gag-Pol polyprotein via one or two ribosomal frameshifting sites rather than the termination codon suppression mechanism found commonly in other retroviruses (28). One frameshift site is (with the exception of the lentiviruses) located at the protease (PR)-Pol(RT) boundary, whereas the other (encoded by the lentiviruses and the beta- and deltaretroviruses) is situated between Gag and PR (28). Another unusual feature of class II retroviruses is the presence of a short, glycine-rich region related to the G-patch domain found in many RNA binding proteins (2). Within the Mason-Pfizer monkey virus, this region is synthesized with and then cleaved from the PR protein, although its precise function remains to be determined (2, 17).
Here we report the results of widespread sampling within vertebrates for class II ERVs related to the lentivirus, alpharetrovirus, betaretrovirus, and deltaretrovirus genera. We show that many features of their genomic organization and host association are remarkably stable, having defined monophyletic origins on phylogenetic trees. We also show that the exogenous lentiviruses are probably most closely related to several endogenous retroviruses derived from rodents and insectivores.
Sequence analysis and alignment. Novel class II ERV sequences were translated and aligned to previously characterized viruses. Cross-amplified class I and class III viruses were discarded from the data set by constructing neighbor-joining phylogenies and by excluding sequences that clustered with gamma-, epsilon-, or spumaretroviruses. An amino acid alignment was constructed by using (i) the known amino acid sequences of previously described retroviruses and (ii) virtual translations of novel retroviruses that lacked premature frameshifts in the amplified region. This amino acid alignment was used as a template to identify the likeliest locations of indels responsible for frameshifts when the remaining sequences were aligned. Manual adjustments were made on this basis if they were supported by the results of alignment algorithms using the raw nucleotides. Regions lacking clear homology between sequences or where homology could not be unambiguously identified were excluded from the alignment. The final DNA alignment contained 121 taxa spanning 792 bp. The equivalent amino acid alignment spanned 264 residues.
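The virtual translation step and the relationship between the 792-bp and 264-residue alignments can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and the input is assumed to be an ungapped nucleotide string already in the correct reading frame.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Virtually translate an ungapped DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def count_in_frame_stops(dna, frame=0):
    """Count stop codons ('*') in the chosen reading frame."""
    return translate(dna, frame).count("*")


# Hypothetical toy fragment: one in-frame stop codon (TAA) in frame 0.
fragment = "ATGTAAGGG"
print(translate(fragment), count_in_frame_stops(fragment))

# A 792-bp alignment corresponds to 792 / 3 = 264 codons.
print(len(translate("A" * 792)))
```

A fragment with no stop codons and no frame disruption in this region would be a candidate for the template amino acid alignment described above; detecting frameshifting indels themselves requires comparison against that template and is not attempted here.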
Phylogenetic analysis. Phylogenetic analyses were performed by using both Bayesian MCMC (Markov Chain Monte Carlo) inference, as implemented in MRBAYES 3.0B4, and maximum-parsimony (MP) and neighbor-joining (NJ) approaches in PAUP 4 (29, 31). For MRBAYES analyses, the nucleic acid alignment and a general time-reversible model with codon position-specific rates were used. Four chains were run well past their asymptotes before 10,000 trees were collected, at one tree per 100 generations. These trees were used to calculate a majority rule phylogram. Nucleic acid-based MP reconstruction was performed by using 30,000 random addition replicates of an unweighted data matrix, with third-codon positions excluded, holding a single tree in memory during each replicate. The resultant minimum trees were then used as the starting point for a heuristic search, during which a total of seven optimal trees were recovered. MP and NJ analyses were also performed by using an amino acid alignment (to further investigate the topology and sister relationships of the lentiviruses, as discussed below). Amino acid-based MP analysis again used 30,000 random addition replicates, holding one tree in memory during each replicate. Amino acid-based NJ analysis used PAUP defaults, and the tree was bootstrapped by using 1,000 replicates.
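As a rough illustration of how a majority rule summary is derived from sampled trees, the following Python sketch counts how often each clade appears in a sample and keeps those present in more than half of the trees. It is a simplified stand-in, not MRBAYES itself: trees are represented as sets of clades (each clade a frozenset of taxon labels), the three toy trees and taxon names are hypothetical, and branch lengths are ignored.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades found in more than `threshold` of trees."""
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Hypothetical sample of three trees over taxa A-D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
support = majority_rule_clades(trees)
for clade, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), f"{freq:.0%}")

# Sampling arithmetic from the text: 10,000 trees at one tree per 100 generations.
sampled_generations = 10_000 * 100
print(sampled_generations)
```

In a Bayesian analysis, the frequency of a clade among the sampled trees is the estimate of its posterior probability, which is how the percentage values on a consensus tree are usually read.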
Nucleotide sequences and accession numbers. The novel retrovirus sequences described here have been submitted to the EMBL/GenBank databases and will appear under accession numbers AY820046 to AY820125.
TABLE 1. Screening of vertebrate taxa for class II-related retroviruses
TABLE 2. Novel class II ERVs identified in this study
FIG. 1. ML phylogeny of the class II retroviruses with numbers on each branch indicating percentage posterior probabilities and asterisks showing probabilities of >95. The phylogeny is rooted on several deltaretroviruses (bovine leukemia virus, human T-cell leukemia virus type 1, and human T-cell leukemia virus type 2). Internal branches are thickened if the same topology was recovered in a strict consensus of seven equal-length MP trees. Taxon colors represent the host class of origin with avian hosts in blue, mammalian hosts in green, and reptilian hosts in black (in boldface type if the viruses are described here for the first time).
Betaretroviruses are known to cluster into two subgroups, with one subgroup comprising viruses present within many primate species, as well as ungulates (such as the Jaagsiekte virus within sheep), rodents (MusD), and marsupials (TvERV within the Brushtail possum) (4, 15, 22, 38). The second subgroup contains the sole representative mouse mammary tumor virus (MMTV) (27). Our results demonstrate that, although betaretroviruses are likely to be restricted to mammals, they are probably widespread throughout this vertebrate class. We found novel examples in several additional mammalian orders, including carnivores and a marine mammal. Furthermore, we identified MMTV-like viruses in several African and North American ungulates.
The sister clade to the betaretroviruses comprises the IAP-related elements, which have been described in a number of rodent species (21), and appear to be abundant within the mouse genome (20). Several novel sequences clustering strongly with the IAP elements were identified during our screening, all derived from rodents or lagomorphs, perhaps suggesting that the IAP elements have a more restricted host range than other class II retroviral groups. Consistent with this, two recent studies have shown that class II viruses related to betaretroviruses and IAP elements are extremely widespread in murid rodents (3, 25). In particular, it appears that there are multiple groups of endogenous class II-related retroviruses, some of which cluster separately with each of SMRV-H (Squirrel monkey retrovirus), Mason-Pfizer monkey virus, Jaagsiekte, and TvERV (3, 25).
Relationship of the lentiviruses to class II ERVs. An unusual and unexpected feature of the phylogeny shown in Fig. 1 was the placement of the exogenous lentiviruses as sister taxa to several endogenous mammalian viruses from rodents (RV-Grass rat II and MuERVU1) and insectivores (RV-European hedgehog). Nucleic acid-based MP phylogenies also supported this relationship. This is in contrast to previous reports, which have generally placed the lentiviruses toward the base of the class II virus phylogeny, as paraphyletic sister taxa to the exogenous deltaretroviruses (5, 11, 16, 18, 33, 37). Characteristically skewed nucleotide compositions have been described in several retrovirus genera (7). Lentiviruses in particular are notable in being adenine-rich and cytosine-poor across the entire genome. Analysis of the nucleotide composition of RV-European hedgehog, RV-Grass rat II, and MuERVU1 viruses indicated that they did not share this bias (unpublished data). Nevertheless, to exclude the possibility that the observed relationship between these viruses and the lentiviruses was a function of nucleotide composition, we constructed MP and NJ trees by using a protein alignment (which spanned the same residues used in the DNA-based alignment [see Fig. S1 in the supplemental material]). The relationship was retained, with weak bootstrap support, in the case of the NJ analysis (unpublished data). However, the MP analysis placed RV-European hedgehog, RV-Grass rat II, and MuERVU1 as paraphyletic sister taxa to the lentiviruses (unpublished data).
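The kind of nucleotide composition check referred to above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the toy sequence are hypothetical, ambiguous bases and gaps are simply ignored, and no threshold for "adenine-rich" or "cytosine-poor" is implied by the source.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G, and T among unambiguous bases in `seq`."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical toy sequence, deliberately adenine-rich and cytosine-poor.
composition = base_composition("AAAACGTT")
for base, fraction in composition.items():
    print(base, f"{fraction:.3f}")
```

Comparing such fractions between the candidate sister taxa and the lentiviruses shows whether they share the same compositional skew, which is the property that could otherwise pull sequences together artificially in a nucleotide-based tree.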
To further investigate the relationship between the lentiviruses and the endogenous rodent- and insectivore-derived viruses, we studied the PR-RT region in more detail. Almost all known class II retroviruses encode PR and Pol (RT, RNase H, and IN) in different reading frames and use ribosomal frameshifting to produce a PR-Pol polyprotein (28). However, this is not the case with the lentiviruses, which encode PR and Pol in the same reading frame (28). We therefore examined our sequences for the presence of ribosomal frameshifting sites. The majority of the mammal-derived viruses and all viruses from avian hosts contained a characteristic thymine-rich region immediately upstream of a -1 frameshift at the boundary of the PR and Pol(RT) proteins (see Fig. 2). However, these features were absent from the RV-European hedgehog, RV-Grass rat II, and MuERVU1 sequences, supporting the hypothesis that these viruses may represent endogenous sister taxa to the lentiviruses.
FIG. 2. Origin and distribution of the class II PR-RT frameshift mechanism and the G-patch motif. Taxa in boldface do not encode the frameshift, whereas the other taxa do (gray type indicates taxa with deletions encompassing the frameshift site). The boxes by each taxon label indicate whether the virus encodes a G-patch domain and whether this domain is degenerate. All avian taxa encode a frameshift site but lack a G-patch domain (not shown).
The results of our screening also suggest that class II retroviruses are probably present within most mammalian and avian orders (we were able to recover sequences from 24 of the 27 orders investigated). The failure of three orders (Charadriiformes [shorebirds], Ciconiiformes [storks], and Scandentia [tree shrews]) to yield class II sequences was likely due to the low number of samples screened and the risk of obtaining false-negative results when a PCR-based approach is used. Furthermore, we note that results for individual taxa may not reflect the actual diversity of class II lineages contained in their genomes, because only five clones were sequenced from each taxon and factors such as the primer sequence and ERV copy number may influence the results of a PCR-based approach to screening. Despite this, it remains possible that class II ERVs may have a patchy distribution across vertebrate orders and families, especially within mammals. We were only able to recover class II sequences from 25 of the 49 mammalian taxa investigated, whereas viral fragments were recovered from 38 of the 46 avian taxa. This finding is consistent with previous liquid hybridization studies of the betaretroviruses and IAP elements (15, 21).
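The recovery counts quoted above can be put on a common scale with a short calculation. The sketch below only restates the figures given in the text as proportions; the dictionary layout and function name are illustrative assumptions, and no statistical test is implied.

```python
# Counts quoted in the text: (taxa or orders yielding sequences, number screened).
screening = {
    "orders": (24, 27),
    "mammalian taxa": (25, 49),
    "avian taxa": (38, 46),
}


def recovery_rate(recovered, screened):
    """Fraction of screened groups from which class II sequences were recovered."""
    if screened <= 0:
        raise ValueError("number screened must be positive")
    return recovered / screened


rates = {group: recovery_rate(*counts) for group, counts in screening.items()}
for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

Expressed this way, recovery was roughly 51% for mammalian taxa against roughly 83% for avian taxa, which is the contrast behind the suggestion of a patchier distribution within mammals.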
A high proportion of the novel avian ERV fragments encoded open reading frames that were intact, or nearly intact, across the aligned region (excluding the PR/RT frameshift site). Of the 55 sequences identified, 35 contained at most one in-frame stop codon or frameshifting mutation (Table 2), suggesting that many of these viruses have been active in the recent past. Thus, exogenous counterparts to these sequences may currently be circulating in avian populations, and we think it likely that exogenous viruses belonging to many of the avian subgroups present within the phylogeny will eventually be isolated. It is possible that retroviruses described previously, but for which pol sequence data are currently unavailable, are in fact members of these groups (9, 10, 13, 14).
In contrast, among the mammalian-derived viruses, novel ERV sequences clustering outside of the betaretrovirus or IAP groups (shown in Fig. 1) were relatively degenerate, displaying multiple in-frame stop codons or frameshifting mutations (unpublished data). Indeed, with the exception of HERV.K (HML2), not a single sequence outside of these two groups was intact across the amplified region. This suggests that many of these mammalian viruses are likely to represent older viral lineages and may therefore be less likely to have extant exogenous counterparts. The relatively intact nature of the betaretroviruses and IAP elements suggests that they may represent some of the more recently active groups of endogenous class II mammalian viruses, as has been suggested previously (3).
One of the most striking results of our phylogenetic analyses was the clustering of viruses derived from mammals and birds into distinct monophyletic groups. These groupings were not strongly supported (and MP analysis placed the mammalian viruses as a paraphyletic sister clade to several groups of avian viruses), but it is unlikely that such a pattern would be obtained by chance. Previous studies have shown intermingling of avian and mammalian class II ERVs in phylogenies (33). However, these studies were based on shorter sequence alignments of a smaller and less diverse range of class II taxa and are therefore likely to be less accurate than the phylogeny presented here. Monophyly of the mammalian- and avian-derived sequences strongly implies that vertebrate interclass transmission events have been rare in the evolution of the class II retroviruses (although we note that only a very small proportion of endogenous viruses have been examined to date). The most likely exceptions to this are the boid python ERVs, which are sister taxa to two novel viruses derived from felids. This relationship was supported in both maximum-likelihood (ML) and MP analyses, making it possible that an interclass transmission event from mammals to reptiles has occurred during the evolution of these viruses. The rarity of interclass transmission apparent from our analyses reflects that observed previously with the gammaretroviruses. The gammaretroviruses comprise a genus of exogenous and endogenous viruses that are widespread in tetrapod vertebrates (23, 34) but only very rarely undergo interclass transmission (23, 24). Taken together, these results imply that interclass transmission within the family Retroviridae occurs much less frequently than intraclass transmission.
An intriguing result from our phylogenetic analyses was the positioning of the lentiviruses. Previous reports have suggested they form one of the most basal clades within the class II viruses, as paraphyletic sister taxa to the deltaretroviruses (5, 11, 16, 18, 33, 37). Our analyses (which were based on a larger number of class II sequences than available previously) placed the lentiviruses in a more derived position within the mammalian viruses, as sister taxa to three viral sequences from rodents and insectivores, including the murine element MuERVU1 (previously identified by Bénit et al. [5]). This topology was also observed in MP phylogenies but lacked robust bootstrap support. Further support for this relationship was apparent from analyses investigating the acquisition and loss of viral characters present within the amplified PR-RT region.
The first of these characters, the frameshift event known to occur at the PR-RT boundary in several class II genera, was remarkably stable, being found in the vast majority of class II sequences. Indeed, it appears that ribosomal frameshifting has only been lost on three occasions across the phylogeny. Two of these loss events involve only a single virus (RV Small mongoose I and RV Echidna II), whereas the third is shared by both lentiviruses and RV-Grass rat II, MuERVU1, and RV-European hedgehog. The presence or absence of the G-patch domain (2) also appears to be a relatively stable characteristic. Our phylogenies suggest that the G-patch was acquired early during the evolution of the mammalian class II viruses (it is absent from the avian-derived sequences). Loss (or degeneracy) of the G-patch appears to have occurred more often than loss of ribosomal frameshifting, but a single loss event is, again, shared between the lentiviruses and their three sister taxa.
The lentiviruses are known to encode several accessory genes that are not present within other retroviruses (28). Investigation of MuERVU1, for which the full-length sequence is available, failed to reveal any obvious similarity to these lentivirus-specific genes (unpublished data). However, this might be expected, since many of the accessory genes are absent from the more basal lentiviruses (such as EIAV) (28), and those that are conserved in all members of the genus (such as tat and rev) are both relatively short and highly divergent. Although our phylogenetic, G-patch, and frameshift site analyses all support the conclusion that lentiviruses have distantly related endogenous counterparts, more definitive proof will have to await the detailed comparative analysis of several full-length viral genomes.
R.G. was supported by an NERC studentship.
Supplemental material for this article may be found at http://jvi.asm.org/.
Present address: Division of Virology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom.