Journal of Virology, October 2007, p. 10712-10717, Vol. 81, No. 19
0022-538X/07/$08.00+0 doi:10.1128/JVI.00410-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

and
Michael Tristem
Department of Biology, Imperial College, Silwood Park, Ascot, Berkshire SL5 7PY, United Kingdom
Received 26 February 2007/ Accepted 10 July 2007
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
The distribution of HERVs within the genome will reflect the likelihood of integrations becoming fixed in different genomic locations. Several studies, including those investigating transposable elements (TEs) in Drosophila and long interspersed nuclear elements in humans, have implicated recombination in determining TE fixation (2, 8). For example, lower levels of TE-induced ectopic exchange (where homologous TEs in different genomic locations recombine, which can lead to major genomic rearrangements) are expected in regions of reduced recombination, thus leading to the accumulation of TEs (30, 52). Local recombination rates could also influence fixation indirectly, based on models that assume that insertions are slightly deleterious (14, 36), because the efficacy of selection in the host genome increases with recombination rate. Gene density and GC content have also been implicated previously in exerting selection pressure against the fixation of HERVs and other TEs (34, 53). TE insertions, either directly into genes or in close proximity to them, would usually be highly deleterious since they would disrupt normal gene expression (53). Occasionally, such insertions may be beneficial, for example, by providing promoters and enhancers that are co-opted by the host for the regulation of gene expression (13, 50, 51).
In the present study, we use retroviral sequence data, including solo-LTRs, from the human genome sequence to understand factors influencing two major stages of the HERV life cycle: (i) from initial integration to fixation and (ii) the persistence of fixed, full-length HERVs over time. An overabundance of HERVs, detected within chromosome Y and a 5-Mb section of chromosome 19, is also discussed.
| MATERIALS AND METHODS |
|---|
|
|
|---|
LTR_MINER is able to defragment LTRs and IRs separately and then link sets of two defragmented LTRs and one defragmented IR into the same full-length HERV (i.e., the inference that the defragmented sequences originated in the same proviral insertion). This defragmentation procedure is also crucial for the identification of solo-LTRs, since we use this term exclusively to denote the result of a recombinational deletion event (46), as opposed to LTRs that have become separated from the IR due to an intervening insertion of an unrelated sequence.
Defragmentation of multiple similarity hits. LTR_MINER parameters were set so that sequence similarity hits to the same element family were defragmented only when they had the same orientation on the chromosome, and their genomic order and sequences mapped onto consecutive (though not necessarily contiguous) regions of the respective family consensus sequence. Occasionally, inferred consecutive fragments overlapped slightly; LTR_MINER pieces together such fragments, provided that their overlap does not exceed 40 bp.
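The joining rule described above can be sketched as a small predicate. This is only an illustration and not the LTR_MINER source: the `Hit` record, its field names, and the choice to measure the 40-bp overlap in consensus coordinates are assumptions made for the example.

```python
from dataclasses import dataclass

MAX_OVERLAP_BP = 40  # overlap limit stated in the text


@dataclass
class Hit:
    """Hypothetical record for one similarity hit (field names are assumptions)."""
    family: str
    strand: str   # "+" or "-"
    g_start: int  # genomic start
    g_end: int    # genomic end
    c_start: int  # start on the family consensus
    c_end: int    # end on the family consensus


def can_join(a: Hit, b: Hit) -> bool:
    """Return True if hit b (downstream of a on the chromosome) may be
    defragmented together with a. Assumes a.g_start <= b.g_start."""
    # Same family and same orientation are required.
    if a.family != b.family or a.strand != b.strand:
        return False
    # On the minus strand the consensus runs backwards along the genome,
    # so the downstream hit carries the earlier part of the consensus.
    first, second = (a, b) if a.strand == "+" else (b, a)
    # Consecutive (not necessarily contiguous) consensus regions,
    # tolerating a small overlap (assumed to be in consensus coordinates).
    starts_after = second.c_start >= first.c_end - MAX_OVERLAP_BP
    extends_further = second.c_end > first.c_end
    return starts_after and extends_further
```

With this predicate, two same-strand hits whose consensus ranges are 1-200 and 190-340 would be joined, whereas a second hit covering 100-340 would be rejected because the overlap exceeds 40 bp.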
Identification of full-length elements. After the defragmentation of all hits to LTRs, and separately of hits to IRs, in the original RepeatMasker annotation, LTR_MINER identifies full-length HERVs by searching for their structural pattern: LTR-IR-LTR (a defragmented LTR followed by a defragmented IR followed by a defragmented LTR of the same family and orientation). Note that here the defragmented LTR or IR may consist of multiple RepeatMasker hits; furthermore, the LTR-IR boundaries need not be contiguous (i.e., the boundaries may be separated by an unrelated insertion). However, only patterns for which the distance between the inner ends of the two LTRs is less than 30 kbp are considered candidate full-length HERVs. Finally, in order to check whether the pattern could be straddling a nested insertion of the same family, the search is then recursively extended from each end of the pattern for further hits to an IR and an LTR (of same family and orientation), subject to the constraint on the maximum distance between intra-element LTRs mentioned above. The two LTRs of the innermost pattern are classified as a pair of intra-element LTRs (and in this way any nested insertions of the same family could be identified). The automatic annotations produced by LTR_MINER have been verified by comparison to manually curated datasets with multiple levels of nested insertions (V. Pereira, unpublished data).
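A simplified sketch of the LTR-IR-LTR pattern search follows. It is not the LTR_MINER implementation: the `Element` record is hypothetical, the input is assumed to be already defragmented, and the recursive extension used to resolve nested insertions of the same family is deliberately left out.

```python
from typing import NamedTuple

MAX_INNER_DISTANCE_BP = 30_000  # limit on the distance between inner LTR ends


class Element(NamedTuple):
    """Hypothetical defragmented element (field names are assumptions)."""
    kind: str    # "LTR" or "IR"
    family: str
    strand: str  # "+" or "-"
    start: int
    end: int


def find_full_length(elements):
    """Return (LTR, IR, LTR) triples of the same family and orientation.

    Unrelated sequence between the pieces is tolerated because only
    elements of the same family and strand are compared with each other.
    """
    groups = {}
    for e in sorted(elements, key=lambda e: e.start):
        groups.setdefault((e.family, e.strand), []).append(e)

    found = []
    for group in groups.values():
        # Slide a window of three neighbouring same-family elements.
        for left, mid, right in zip(group, group[1:], group[2:]):
            is_pattern = (left.kind, mid.kind, right.kind) == ("LTR", "IR", "LTR")
            inner_distance = right.start - left.end
            if is_pattern and inner_distance < MAX_INNER_DISTANCE_BP:
                found.append((left, mid, right))
    return found
```

A complete version would extend the search outward from each candidate to detect a second IR and LTR of the same family, as described in the text, and would then report the innermost pattern as the intra-element pair.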
Identification of solo-LTRs. LTR_MINER was set to classify a defragmented LTR as a "solo-LTR" if no other LTR or IR sequence (of same family and in the same orientation) is present within a 5-kbp radius from the fragment's ends. The aim was the identification of elements resulting from deletion (of the IR and one LTR) events via homologous recombination between intra-element LTRs and not to classify as solo-LTRs sequences that are separated from IRs because of insertions.
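The solo-LTR criterion can be illustrated in the same way. The sketch below assumes a hypothetical `Element` record and treats "within a 5-kbp radius" as a gap of at most 5,000 bp between the nearest ends; whether the boundary is inclusive is an assumption.

```python
from typing import NamedTuple

SOLO_RADIUS_BP = 5_000  # radius stated in the text


class Element(NamedTuple):
    """Hypothetical defragmented element (field names are assumptions)."""
    kind: str    # "LTR" or "IR"
    family: str
    strand: str  # "+" or "-"
    start: int
    end: int


def is_solo_ltr(ltr, elements):
    """Return True if no other LTR or IR of the same family and
    orientation lies within SOLO_RADIUS_BP of either end of `ltr`."""
    for other in elements:
        if other is ltr:
            continue
        if other.family != ltr.family or other.strand != ltr.strand:
            continue
        # Gap between the two intervals; 0 if they touch or overlap.
        gap = max(other.start - ltr.end, ltr.start - other.end, 0)
        if gap <= SOLO_RADIUS_BP:
            return False
    return True
```

An LTR separated from its IR by a short unrelated insertion would therefore not be counted as a solo-LTR, which matches the stated aim of restricting the term to products of recombinational deletion.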
Gene density data. Data on the number of genes, total amount of sequence data, and chromosome length for each chromosome were taken from build 31 of the human genome held at the National Center for Biotechnology Information (freely downloadable from http://www.ncbi.nlm.nih.gov).
Determination of local recombination rate. We used the high-resolution recombination map of Kong et al. (28) to compute the recombination rates at HERV-containing loci. Kong et al. (28) genotyped 5,136 microsatellite markers for 146 Icelandic families, with a total of 1,257 meiotic events. This map has greater accuracy and about fivefold-higher resolution than previous genetic maps (9, 12). The markers were mapped on to an updated version of the August 2001 freeze of the Human Genome Project Working Draft (29). The published recombination rates represent the average for a window of 3 Mb, centered on the marker.
Every intact HERV and solitary LTR identified by using LTR_MINER was assigned a recombination rate based on the recombination rate of the nearest marker. Locations of recombination markers and HERVs were taken from Kong et al. (28) and the LTR_MINER output, respectively. Only elements within 1.5 Mb of a marker were included in this analysis. The process of assigning HERVs to the nearest marker was automated by using Python scripts (available on request). Since HERV integration is random with respect to genomic location, the exclusion of some elements beyond the range of these markers would not be expected to result in any bias. A similar approach has been used to investigate the effect of local recombination rates on divergence between human and chimpanzee (21). Recombination rates were divided into evenly sized bins of 0.1 centimorgans (cM)/Mb for rates between 0 and 3 cM/Mb, resulting in 30 bins, plus a high recombination rate bin for HERVs at loci with recombination rates above 3 cM/Mb. HERVs at loci with higher recombination rates were pooled into a single bin due to lower copy numbers. The high recombination rate bin included recombination rates ranging from >3 to 7.27 cM/Mb, with a mean of 3.56 cM/Mb. The precise x coordinate of every recombination bin was calculated as the average of the recombination values of both full-length HERVs and solo-LTRs within each recombination bin. The significance of the relationship between the paired/solitary LTR ratio and recombination rate was assessed by using linear regression. In addition, we also performed a binomial regression where every HERV was described as being either a solo-LTR or a full-length HERV. The predictor variable was the local recombination rate. Local recombination rate was treated as a continuous variable in this analysis, using the recombination rate of the nearest marker rather than the recombination rate bins of the linear regression.
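The marker assignment and binning steps might look roughly like the sketch below. It is not the authors' script (which is available from them on request); the function names and the data layout (sorted marker positions for a single chromosome, with a parallel list of rates) are assumptions, as is the treatment of a rate of exactly 3 cM/Mb as belonging to the last regular bin.

```python
import bisect
import math

MAX_MARKER_DISTANCE_BP = 1_500_000  # elements farther than this are excluded
BIN_WIDTH = 0.1                     # cM/Mb
HIGH_RATE_CUTOFF = 3.0              # rates above this share one pooled bin
N_REGULAR_BINS = 30


def nearest_marker_rate(position, marker_positions, marker_rates):
    """Return the rate of the nearest marker on the same chromosome,
    or None if no marker lies within MAX_MARKER_DISTANCE_BP.

    `marker_positions` must be sorted in ascending order."""
    i = bisect.bisect_left(marker_positions, position)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(marker_positions)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(marker_positions[j] - position))
    if abs(marker_positions[best] - position) > MAX_MARKER_DISTANCE_BP:
        return None
    return marker_rates[best]


def rate_bin(rate):
    """Map a rate to bin 0..29 (0.1 cM/Mb steps) or 30 (pooled high bin)."""
    if rate > HIGH_RATE_CUTOFF:
        return N_REGULAR_BINS
    # The small epsilon guards against floating-point error at bin edges.
    return min(math.floor(rate / BIN_WIDTH + 1e-9), N_REGULAR_BINS - 1)
```

Counting full-length HERVs and solo-LTRs per bin, and averaging the rates of the elements in each bin to obtain its x coordinate, then provides the input for the linear regression described above.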
The relationship between the total number of HERV integration events (solitary LTRs plus complete elements) and local recombination rate was examined by adding the number of full-length HERVs and solo-LTRs at each recombination bin and dividing by the total amount of sequenced nucleotides that each bin covers. This was computed by adding the full length of the genome covered by markers corresponding to each bin and then subtracting regions of the genome for each bin that have not yet been sequenced.
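As a minimal illustration of the density calculation, assuming the per-bin counts and lengths have already been tallied (the function and argument names are hypothetical):

```python
def integration_density(n_full_length, n_solo_ltr, covered_bp, unsequenced_bp):
    """Total integration events per sequenced nucleotide for one bin.

    covered_bp     -- genome length covered by the markers of this bin
    unsequenced_bp -- portion of that length not yet sequenced
    """
    sequenced_bp = covered_bp - unsequenced_bp
    if sequenced_bp <= 0:
        raise ValueError("bin contains no sequenced nucleotides")
    # Each solo-LTR and each full-length HERV counts as one integration event.
    return (n_full_length + n_solo_ltr) / sequenced_bp
```

For example, a bin with 10 full-length HERVs and 30 solo-LTRs over 5 Mb of marker coverage, 1 Mb of which is unsequenced, gives 40 events over 4 Mb of sequence.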
Test for differences in HERV density in chromosome Y and 19p12. Differences in HERV density in the Y chromosome and in a 5-Mb region of chromosome 19 compared to the rest of the genome were assessed by using a 2-by-2 χ² contingency table. Chromosome 19 was split into two, because although it has the highest gene density of all of the human chromosomes (29), it contains a 5-Mb region on the p arm near the centromere that is relatively gene poor. This region is at locus 19p12. The observed and expected number of elements on chromosome Y and in the remainder of the genome were compared. The expected frequencies were calculated assuming that the total number of elements was distributed between Y and the rest of the genome relative to the total amount of DNA in each partition of the genome. This test was performed both for the number of full-length HERVs and for the total number of integrations. Comparisons were made both between chromosome Y and the rest of the genome and between the 5-Mb region of locus 19p12 and the rest of the genome.
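The observed-versus-expected comparison described above can be sketched as follows. This assumes a Pearson chi-square statistic with one degree of freedom, with expected counts proportional to the amount of DNA in each partition; the function name and arguments are hypothetical, and the exact procedure used in the study may differ.

```python
import math


def chi_square_partition(obs_region, obs_rest, bp_region, bp_rest):
    """Compare element counts in a region with the rest of the genome.

    Expected counts distribute the total number of elements in
    proportion to the amount of DNA in each partition.
    Returns (chi-square statistic, P value for 1 degree of freedom).
    """
    total = obs_region + obs_rest
    exp_region = total * bp_region / (bp_region + bp_rest)
    exp_rest = total - exp_region
    chi2 = ((obs_region - exp_region) ** 2 / exp_region
            + (obs_rest - exp_rest) ** 2 / exp_rest)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

The same function would be called once with full-length HERV counts and once with total integration counts, for chromosome Y and for the 19p12 region in turn.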
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
HERV fixation is independent of local recombination rate. We used the sum of solo-LTRs and full-length HERVs to measure fixation in different genomic regions. This is in contrast to previous reports that used overall retroviral content (without assigning retroviral DNA to particular insertion events) rather than a direct measure of the number of HERV insertions that have become fixed (34). The inclusion of solo-LTRs removes confounding effects caused by potential differences in the rate of full-length HERV loss (via recombinational deletion) in different genomic locations. We used multiple regression analysis to explicitly compare the association of both gene density and recombination rate (which are both correlated with GC content [28]) with HERV fixation, between entire chromosomes. We found that gene density and not recombination rate is the significant predictor of HERV fixation (P < 0.0001 and P = 0.2 in the multiple regression, respectively). Thus, consistent with previous reports (34), gene density correlates with HERV fixation; Fig. 1A shows a negative correlation between the density of total fixation events and gene density among the chromosomes (P < 0.0001). Chromosomes Y and 19 are outliers, containing more HERVs than would be expected from their gene densities (discussed below).
We find no evidence that host recombination rate correlates with fixation (Fig. 1B, P = 0.49), despite a 30-fold variation in recombination rate, from 0.1 to 3 cM/Mb. This is contrary to predictions suggested by both the ectopic exchange model and models that implicate recombination indirectly with the probability of fixation (14, 16). Furthermore, our data are consistent with results obtained from TEs in Arabidopsis thaliana (53) and retrotransposons in Caenorhabditis elegans (14), which show no relationship between element density and local recombination rate (Table 1).
We plotted the ratio of paired to solo-LTRs against local recombination rate (Fig. 1C). We found, in contrast to total fixation events (Fig. 1B), a highly significant relationship between the paired/solo-LTR ratio and the local recombination rate (P < 0.0001), with a threefold increase in recombination rate halving the ratio. The local recombination rate has therefore played a major role in limiting the persistence of HERVs in their full-length state.
It is likely that HERVs integrate at random with respect to the local rate of recombination. Although integration targeting has been demonstrated for some exogenous retroviruses, the integration preferences of such viruses do not correlate with the genomic distribution of related ERVs (1, 41). This suggests that integration preference does not determine the distribution of ERVs in the genome. The probability of ERV fixation is also independent of the rate of recombination because (as shown above) gene density, not recombination rate, influences the probability of fixation. Once fixed, the persistence of the element in its full-length state then depends on the local recombination rate. Full-length elements situated in regions of low recombination undergo less deletion and hence persist for a greater period of time. We propose that this is because the rate of intrastrand homologous recombination within a genome (the mechanism generating solo-LTRs) (46) is linked by as-yet-unknown mechanistic processes to the background rate of meiotic recombination. Thus, the likelihood of solo-LTR formation correlates with the probability of a meiotic recombination event at any given location.
How does this model differ from other previously proposed scenarios such as the ectopic exchange (30, 52), gene disruption (17, 33), or deleterious gene product (36) models? Although the ectopic exchange model is consistent with the increased solo-LTR ratio in areas of high recombination (Fig. 1C) (because ectopic exchange occurs more frequently between longer, full-length elements [8]), the solo-LTRs would still be deleterious to some extent. However, this is not the case: fixation frequency does not depend on the local recombination rate, which contrasts with the expectation of the ectopic exchange model (Fig. 1B).
Under the gene disruption model (17, 33), solo-LTRs would also be expected to be deleterious, since they contain the retroviral regulatory sequences affecting host gene expression. Our data support this model in terms of fixation, since we note an underrepresentation of fixed elements in gene-rich regions. The gene disruption model is unlikely to apply to persistence (Fig. 1C) since both solo-LTRs and full-length HERVs are deleterious (retroviral promoter and enhancer sequences are in the LTR). Although the formation of a solo-LTR can, in certain cases, lead to a reversion of a somatic mutation (43, 47), there are also cases of solo-LTRs disrupting the expression or splicing of nearby genes (24, 42). Thus, it seems likely that many solo-LTRs present within or near genes would be deleterious and would therefore be rapidly lost by selection.
There are, however, scenarios under which solo-LTRs may confer a selective advantage to the host compared to full-length HERVs. Under the deleterious gene product model (36), a solo-LTR does not express retroviral genes or form new viral particles that could be deleterious. Figure 1C is theoretically consistent with this model, since full-length elements are preferentially removed in regions of high recombination. However, it is our opinion that (at least within humans) negative selection would act quickly against intact and active viruses, largely preventing them from reaching fixation within the host population (3). This is consistent with the presence of replication-competent viruses in other mammals, most of which are unfixed within the host species (10, 11, 19, 35, 38). In particular, the majority of active and infectious endogenous murine retroviruses are present in only a few inbred strains of mice (6). During this period (when the LTRs are largely identical), we have shown that recombinational deletion occurs extremely rapidly (5), and this masks the effect of elevated levels of recombinational deletion according to local recombination rate. However, as LTR divergence increases over time due to host-induced mutation, the rate of recombinational deletion greatly slows and the effects of local recombination rate then become more pronounced and important in determining long-term persistence. Thus, few, if any, active HERVs contribute to the data shown in Fig. 1C, which, instead, represent the mechanistic generation of solo-LTRs via intrastrand homologous recombination rather than by any selection-driven process.
HERVs are overrepresented on chromosomes 19 and Y. Previous reports have noted an overrepresentation of HERVs on chromosomes 19 and Y (34), a pattern that is repeated when solo-LTRs are included (Fig. 1A and Table 2). The overrepresentation of HERVs on chromosome 19 is largely due to a 5-Mb span at locus 19p12, a region containing features of both euchromatin and heterochromatin (22). Table 2 shows that full-length elements are about eightfold overrepresented in this region, whereas the total number of elements (solo-LTRs plus full-length elements) is approximately threefold overrepresented. Thus, there are both more integration events and a higher proportion of full-length elements than expected. 19p12 is rich in zinc-finger (ZNF) genes that appear to have arisen via segmental duplication over the past 50 million years, probably via unequal crossover between β-satellite repeats (15). There are numerous solo-LTRs in the close vicinity of these ZNF genes, which are derived from only a small number of HERV families. BLAST analysis using one copy from each solo-LTR type as a probe revealed that, in each case, almost all of the top matches were from 19p12, suggesting that many solo-LTRs in this region arose as a by-product of the ZNF gene cluster expansion (unpublished data). To investigate this further, we reconstructed the phylogeny of one of the solo-LTR types present at high copy number in 19p12. The maximum-likelihood phylogeny was estimated under the HKY model of substitution (with the parameters of the model also estimated from the data) in PAUP* (49). This showed that the chromosome 19 LTRs clustered together to the exclusion of most LTRs present on other chromosomes (not shown), which is consistent with their proliferation via segmental duplication. Although the total number of integrations on 19p12 can therefore probably be explained by segmental duplication, the proportion of full-length elements to solo-LTRs cannot. Indeed, by showing that many of the solo-LTRs in this region probably arose by segmental duplication, the proportion of full-length elements is skewed further. It remains unclear why this should be the case.
We hypothesize that the rate of HERV fixation is higher in males, as a result of a higher underlying integration frequency in the male germ line. Most exogenous retroviruses require dividing cells in order to cross the nuclear membrane and thereby integrate into the host genome (39). The lentiviruses (e.g., human immunodeficiency virus) represent a notable exception to this general rule by infecting nondividing cells (31). Other viruses, such as avian leukosis virus, are intermediate between these two states, being able to infect nondividing cells, albeit less efficiently than human immunodeficiency virus (48). Many HERVs replicate via the same mechanism as exogenous retroviruses (4) and so would also be expected to require dividing cells in order to integrate. Since males require a large number of germ line cell divisions to produce sperm from spermatogonial stem cells (44), the male germ line may be a more suitable environment for HERV replication and integration than the female germ line, where germ line cell division ceases during fetal development. Because the Y chromosome resides only in males, it should therefore acquire more integrations than the autosomes, which spend half of their time in females. In line with this hypothesis, we note that the male mutation rate is approximately twice that of females (7, 29), approximately equal to the excess of HERVs on chromosome Y. It is thought that the higher mutation rate in males is due to the higher number of germ line cell divisions per generation in males, a process similar to the one we propose (7, 20, 44). In contrast to this, in vivo experiments with mice have noted more effective transmission in the female than in the male germ line (23, 32, 40). However, these experiments were performed on highly inbred strains of mice, and it remains to be determined whether the same situation occurs with normal outbreeding over longer timescales. Comparison of ERV fixation between autosomes and sex chromosomes in a range of other species should allow us to test the validity and generality of the male-biased proliferation hypothesis.
Concluding remarks. We have shown here that, within humans, the genomic distribution and persistence of ERVs are controlled by gene density and local recombination rate, respectively. This is consistent with certain TE classes in some species (14, 34, 53) but not others (2, 14) (Table 1). Thus, factors controlling the evolution, distribution, and persistence of TEs are likely to depend on a complex interplay between specific TE classes and particular host species. The analysis of sets of TEs from a wide variety of host genome sequences should eventually allow general trends in TE evolution to be separated from factors specific to the TE or host life history.
| ACKNOWLEDGMENTS |
|---|
The MRC, NERC, and Wellcome Trust provided financial support.
| FOOTNOTES |
|---|
Published ahead of print on 18 July 2007.
Present address: Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, United Kingdom.
| REFERENCES |
|---|
|
|
|---|