Journal of Virology, August 2004, p. 8788-8798, Vol. 78, No. 16
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.16.8788-8798.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Human Endogenous Retrovirus Family HERV-K(HML-5): Status, Evolution, and Reconstruction of an Ancient Betaretrovirus in the Human Genome
Laurence Lavie,1 Patrik Medstrand,2 Werner Schempp,3 Eckart Meese,1 and Jens Mayer1*
Department of Human Genetics, University of Saarland, 66421 Homburg,1
Institute of Human Genetics and Anthropology, University of Freiburg, 79106 Freiburg, Germany,3
Department of Cell and Molecular Biology, Lund University, 22184 Lund, Sweden2
Received 27 December 2003/Accepted 12 April 2004

ABSTRACT
The human genome harbors numerous distinct families of so-called
human endogenous retroviruses (HERV) which are remnants of exogenous
retroviruses that entered the germ line millions of years ago.
We describe here the hitherto little-characterized betaretrovirus
HERV-K(HML-5) family (named HERVK22 in Repbase) in greater detail.
Out of 139 proviruses, only a few loci represent full-length
proviruses, and many lack gag-prt and/or env gene regions.
We generated a consensus sequence from a multiple alignment of
62 HML-5 loci that displays open reading frames for the four
major retroviral proteins. Four HML-5 long terminal repeat (LTR)
subfamilies were identified that are associated with monophyletic
proviral bodies, implying that HML-5 LTRs and genes evolved
differently. Sequence analysis indicated that the proviruses formed
approximately 55 million years ago. Accordingly, HML-5 proviral
sequences were detected in Old World and New World primates
but not in prosimians. No recent activity is associated with
this HERV family. We also conclude that the HML-5 consensus
sequence primer binding site is identical to the 3' end of methionine
tRNA. Therefore, the family should be designated HERV-M. Our study
provides important insights into the structure and evolution
of the oldest betaretrovirus in the primate genome known to
date.

INTRODUCTION
Approximately 8% of the human genome is derived from retrovirus-like elements termed endogenous retroviruses (ERV) (14). Most of them are likely remnants of exogenous retrovirus infections of the germ line that became fixed in the population millions of years ago. Intracellular retrotransposition events increased proviral copy number during evolution, resulting in the presence of a few to several thousand proviruses belonging to various ERV families. A variety of distinct human ERV (HERV) families have been identified, suggesting that the germ line was invaded by various exogenous retroviruses (23, 25, 41). The human genome was recently estimated to contain 30 to 50 distinct HERV families (11). About 100 different HERV sequences are defined in Repbase, a widely used reference database for repetitive elements (15). Unfortunately, there is no established nomenclature for HERV. We follow here a nomenclature that uses the single-letter amino acid code of the tRNA that is complementary to the primer binding site and that formerly served to prime reverse transcription. Accordingly, HERV-K denotes a number of HERV families having a lysine tRNA-like primer binding site. Phylogenetic analysis of HERV reverse transcriptase sequences has identified 10 HERV-K families in the human genome, which were termed human MMTV-like (HML-1 to HML-10) because of their homology to the betaretrovirus mouse mammary tumor virus (MMTV) (1, 32). Repbase Update also lists 10 HERV-K families.
Structurally intact ERV contain gag, protease (prt), polymerase (pol), and envelope (env) genes flanked by long terminal repeats (LTRs). Two HERV families, HERV-K(HML-2) and HERV-W, contain intact open reading frames (ORFs) and encode functional proteins (2, 3, 18, 27, 29, 31, 36). Additional proviral sequences with Env coding capacity were identified recently (9). Most HERV sequences are coding deficient because they have accumulated a variety of mutations and deletions since provirus formation. The vast majority of HERV are present in the genome only as solitary LTRs, the products of homologous recombination between the 5' and 3' LTRs of a provirus.
Many HERV families were fixed in Old World primates after their evolutionary split from New World primates about 35 million years ago (41). For example, the HERV-K families HML-2, HML-3, and HML-6 are all present in Old World monkeys but absent from New World primates. The HERV-K(HML-2) family seems to be relatively long-lived in the genome, displaying transpositional activity from its appearance in the primate lineage until after the human-chimpanzee split (7, 28, 33, 38). In contrast, other ERV families are short-lived. In the case of the HERV-K(HML-3) family, no activity in terms of provirus formation was indicated in the human evolutionary lineage over the last 30 million years (26). To date, only a few HERV families have been characterized in greater detail. Here we characterize the HERV-K(HML-5) family by analyzing proviruses in the human genome. We find that the HML-5 proviruses have an older origin than other betaretroviruses in the human genome, and we reconstruct a putative full-length, coding-competent ancient betaretrovirus that targeted the primate germ line more than 50 million years ago.

MATERIALS AND METHODS
HERV-K(HML-5) sequence retrieval.
HERV-K(HML-5) proviruses (LTR22A/HERVK22/LTR22A; Repbase version 8.2.0 consensus sequences) were identified by downloading the human sequences annotated as HERVK22 in the RepeatMasker annotation of the June 2002 (hg12) UCSC genome browser (http://genome.ucsc.edu/). We performed dot matrix comparisons between each proviral sequence and the Repbase consensus sequences using MacVector (Accelrys Inc.) with default settings. We identified the boundaries between HERV-K(HML-5) proviral loci and flanking cellular sequences, as well as the location of nonretroviral repetitive elements within the proviral portions, by using RepeatMasker (provided by A. F. A. Smit and P. Green; http://www.repeatmasker.org). We subsequently removed LTRs and apparent non-HERV-K(HML-5) sequences.
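The excision step can be sketched as follows. This is a minimal illustration, assuming RepeatMasker hits have already been parsed into 0-based, end-exclusive (start, end, name) intervals relative to each provirus; the function `excise_repeats` and its `keep_prefixes` parameter are hypothetical, and parsing of the actual RepeatMasker output format is not shown.

```python
from typing import List, Sequence, Tuple

# (start, end, repeat name); 0-based, end exclusive (an assumed convention)
Interval = Tuple[int, int, str]


def excise_repeats(seq: str, hits: List[Interval],
                   keep_prefixes: Sequence[str] = ("HERVK22",)) -> str:
    """Remove every annotated interval whose name does not start with a kept prefix.

    LTR hits (e.g. LTR22A) and foreign repeats (Alu, L1, other LTRs) are all removed,
    leaving only internal HML-5 sequence. Overlapping intervals are merged first so
    that no base is removed twice.
    """
    drop = sorted((s, e) for s, e, name in hits
                  if not name.startswith(tuple(keep_prefixes)))
    merged: List[List[int]] = []
    for s, e in drop:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)  # extend the overlapping interval
        else:
            merged.append([s, e])
    # Rebuild the sequence from the pieces lying between the removed intervals.
    pieces, pos = [], 0
    for s, e in merged:
        pieces.append(seq[pos:s])
        pos = e
    pieces.append(seq[pos:])
    return "".join(pieces)
```

For example, `excise_repeats("AAAACCCCGGGGTTTT", [(0, 16, "HERVK22"), (4, 8, "AluSx"), (6, 10, "L1PA2")])` merges the two overlapping foreign hits into one interval and returns `"AAAAGGTTTT"`.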
Multiple alignment of HERV-K(HML-5) sequences and consensus sequence generation.
We produced multiple alignments of HERV-K(HML-5) proviral sequences with a minimal size of 3 kb (after deletion of other repetitive elements). Multiple alignments were generated using DIALIGN2 (35), provided by the Institut Pasteur web server (http://bioweb.pasteur.fr), or using ClustalW (42) with default settings. We further optimized the alignments manually in the Se-Al program (provided by Andrew Rambaut; http://evolve.zoo.ox.ac.uk/). Consensus sequences, generated by the Boxshade program at the Institut Pasteur, were further evaluated and corrected manually. The resulting proviral consensus sequence was analyzed for encoded retroviral proteins and conserved domains using BlastP and CD-search at the National Center for Biotechnology Information.
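Boxshade's exact consensus rule is not described here, so the following is only a sketch of a simple plurality-rule consensus, a swapped-in technique rather than the program's algorithm. It assumes equal-length aligned sequences with `-` as the gap character; the `min_fraction` threshold and the choice to drop gap-dominated columns are illustrative assumptions, and the manual correction step is not modeled.

```python
from collections import Counter
from typing import List


def majority_consensus(alignment: List[str], min_fraction: float = 0.5) -> str:
    """Plurality-rule consensus of equal-length aligned sequences.

    Columns in which a gap ('-') is the most common character are omitted (treated
    as insertions present in only a minority of loci). Columns whose most common
    base occurs in fewer than min_fraction of the non-gap sequences become 'N'.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column)
        top, n = counts.most_common(1)[0]
        if top == "-":
            continue  # gap-dominated column: not part of the consensus
        bases = sum(v for k, v in counts.items() if k != "-")
        consensus.append(top if n / bases >= min_fraction else "N")
    return "".join(consensus)
```

For example, `majority_consensus(["ACGT-A", "ACGTTA", "AGGT-A", "ACCT-A"])` returns `"ACGTA"`: the fifth column is dropped because three of the four loci carry a gap there.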
Phylogenetic analysis.
We employed PAUP* (Sinauer Associates) and the PHYLIP package (10), the latter provided by the Institut Pasteur web server, for phylogenetic analysis. Multiple alignments of LTR sequences generated with the programs described above, as well as several regions from the multiple alignment of proviral bodies, were subjected to neighbor-joining analysis. Nucleotide distances were corrected according to the Kimura-2-parameter model (17). Gaps and ambiguous positions were excluded from the analysis. Sequences obtained from various primate species were added to the corresponding portions of the multiple alignment of human sequences and analyzed in the same way.
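For reference, the Kimura-2-parameter correction can be written out directly. The sketch below is a hypothetical helper, not the PAUP* or PHYLIP implementation: it computes d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q) from the proportions P of transitions and Q of transversions, skipping sites that carry a gap or an ambiguity code in either sequence.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Positions with a gap or a non-ACGT character in either sequence are skipped.
    Raises ValueError (from math.log) when the sequences are too divergent for the
    correction to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # gap or ambiguous base
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For two aligned 10-nt sequences differing by a single A-to-G transition (P = 0.1, Q = 0), the function returns -1/2 ln 0.8, about 0.112, slightly more than the uncorrected 0.1.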
Evolutionary age of proviruses.
Approximate integration times of HERV-K(HML-5) proviruses displaying almost full-length LTRs on both sides were estimated by determining Kimura-2-parameter corrected nucleotide distances between the 5' and 3' LTRs of individual proviruses using PAUP*. Proviral age (T, in million years) was estimated according to the formula T = D/(2 · 0.13), where D is the percent nucleotide divergence between the two LTR sequences and 0.13 is the estimated average substitution rate in percent per nucleotide per million years (8, 19). The factor 2 accounts for the two LTRs having diverged independently of each other.
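The age formula translates directly into code. This sketch assumes that D is expressed in percent and that the rate of 0.13 is correspondingly a percentage per nucleotide per million years; `provirus_age_myr` is a hypothetical helper name.

```python
def provirus_age_myr(ltr_divergence_percent: float,
                     rate_percent_per_myr: float = 0.13) -> float:
    """Approximate provirus age T = D / (2 * rate), in million years.

    D is the (Kimura-2-parameter corrected) divergence between the 5' and 3' LTRs,
    in percent. The factor 2 reflects that both LTRs accumulate substitutions
    independently after integration (assumed units: percent per million years).
    """
    if rate_percent_per_myr <= 0:
        raise ValueError("substitution rate must be positive")
    return ltr_divergence_percent / (2 * rate_percent_per_myr)
```

With the mean 5'-3' LTR divergence of 12.9% reported in the Results, `provirus_age_myr(12.9)` gives about 49.6 million years.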
Test for HERV-K(HML-5) homologous sequences in primate species.
PCR primers corresponding to the 5' portion of the gag gene (P1, 5'-AACAGTATATAAAAGTATTGAAACA-3'; and P2, 5'-TGCTTTTTCTTATCTCTTTATAAG-3') and to the env gene (P3, 5'-CCATTCACTCCAGATAATTTGT-3'; and P4, 5'-TTCTTTTCCAATCAAGGAGTGA-3') within the HERV-K(HML-5) consensus sequence were used to amplify HERV-K(HML-5) sequences from genomic DNA of humans and of 10 other primate species representing four primate clades: Pan troglodytes, Pongo pygmaeus, Hylobates lar, Colobus guereza, Mandrillus sphinx, Macaca mulatta, Saimiri sciureus, Callithrix jacchus, Alouatta seniculus, and Nycticebus coucang. Genomic DNA (100 ng each) was subjected to PCR with 1 µM primers and 2.5 U of Taq polymerase (Invitrogen Inc.) in a final volume of 25 µl. PCR cycling conditions for gag and env amplification were as follows: 5 min at 94°C; 35 cycles of 45 s at 94°C, 45 s at 54°C, and 2 min at 72°C; and a final 5 min at 72°C. PCR products for the gag region obtained from P. troglodytes, H. lar, M. mulatta, and A. seniculus were cloned into the pGEM-T vector (Promega). Sequences of both DNA strands were obtained using a SequiTherm Excel II DNA sequencing kit-LC (Biozym) and an automated DNA sequencer (Licor 4000-L; MWG). Raw sequence data were analyzed using the Sequencher software (Gene Codes Corp.).
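Checking that a primer pair yields a product of the expected size can be approximated in silico. The sketch below uses hypothetical helper names and assumes exact primer matches on a single template sequence; real PCR tolerates mismatches, which matters for primers applied to divergent proviral loci and to other primate species, so this is only a first-pass check. The template in the usage example is synthetic, not an HML-5 sequence.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Expected product length for an exact-match forward/reverse primer pair.

    The forward primer must occur on the given strand. The reverse primer binds the
    opposite strand, so its reverse complement is searched downstream of the forward
    site. Returns None if either binding site is missing.
    """
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    rc = reverse_complement(reverse.upper())
    end = t.find(rc, start + len(forward))
    if end == -1:
        return None
    return end + len(rc) - start
```

For example, `amplicon_length("GGAAACCCTTTTCCCAAAGG", "AAACCC", "TTTGGG")` returns 16, spanning both primer sites and the 4 nt between them.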
Nucleotide sequence accession numbers.
We deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase.

RESULTS
Identification and structure of HERV-K(HML-5) proviruses.
By using the annotation within the UCSC genome browser (http://genome.ucsc.edu/), we identified 139 HML-5 proviruses in the human genome. Twenty-three of the 139 loci (16.5%) were located on the Y chromosome, whereas the other loci appeared randomly distributed along the autosomes. We next performed dot matrix comparisons between proviral loci and the Repbase consensus sequences (LTR22A/HERVK22/LTR22A) to examine proviral structures in more detail. We generated schematic structures of 100 of the 139 HML-5 sequences (Fig. 1). We did not include the remaining 39 proviruses because their sequences were heavily altered by integration of other repetitive elements, deletions, duplications, and/or inversions. The majority (123 proviruses) displayed large deletions of retroviral genes: only 11 proviruses displayed a complete gag gene, 12 a complete prt gene, 45 a complete pol gene, and 15 a complete env gene. Among those, many sequences shared large deletions of about 1,410 bp at the gag-prt junction, about 1,320 bp within the env region, or both. The first deletion removed a large portion of the 5' part of the gag gene and the complete prt gene, and the second deletion removed a large central part of the env gene (Fig. 1). Sixteen proviruses contained all retroviral genes, and nine of these resembled full-length proviral structures having LTRs on both sides flanking the putative coding regions.
Reconstruction of a coding-competent HML-5 provirus.
Before multiple sequence alignment of HML-5 sequences, we employed RepeatMasker to identify and subsequently delete non-HML-5 sequences from proviral sequence entries. Many proviruses contained one or several other repetitive elements, which likely integrated into the proviruses at different stages during primate evolution. L1 elements from the M1, P, and PA2-6 subfamilies and Alu elements from the Y, Yc, Ya5, Yb9, Sp, Sc, and Sg subfamilies were recognized (4, 16, 40). Furthermore, numerous LTR elements annotated in Repbase as LTR5B, -6A, -7, -7B, -10F, -12, -12B, -12C, -15, -17, -19, -30, and MER11C (15) were found (Table 1). We chose the 62 proviral sequence entries with the fewest deletions for multiple alignment. By using Boxshade and subsequent visual correction, we generated an HML-5 consensus sequence from the alignment. The consensus sequence had a total length of 6,824 bp and differed by 4% from the HERVK22 consensus sequence present in Repbase. Several of these nucleotide differences restored reading frames relative to the Repbase sequence. Two 1-bp insertions and one 11-bp insertion were introduced into the 5' intergenic region. In the gag region, 3 bp were inserted, and in the prt and pol regions, 2 bp and 1 bp were removed, respectively. In addition, one stop codon within the prt gene and three stop codons within pol were corrected.
Translation of our consensus sequence gave rise to ORFs for the four major retroviral proteins, which were not found as such in the Repbase sequence. ORFs for putative retroviral gag, prt, pol, and env genes could be defined (Fig. 1; see also Fig. 1A in the supplemental material, which is also available from us upon request). The putative 517-, 265-, 911-, and 694-amino-acid Gag, Prt, Pol, and Env proteins, respectively, displayed extended similarities to the corresponding retroviral proteins, as revealed by CD-search and BlastP analysis. As described recently, HML-5 proviruses once harbored a potentially active dUTPase domain within the prt ORF as well as a retroviral aspartyl protease domain (30). As expected, the pol reading frame encoded retroviral reverse transcriptase, RNase H, and integrase domains. CD-search furthermore identified a HERV-K/MMTV-type gp36 domain within the C-terminal portion of the Env protein (Fig. 1). BlastP analysis of the encoded retroviral proteins against the GenBank protein division revealed high similarities to retroviral proteins from the HERV-K(HML-2) family. Moreover, significant similarities to the betaretroviruses ovine pulmonary adenocarcinoma virus, MMTV, Mason-Pfizer monkey virus, and simian retroviruses 1 and 2 were detected, with identities to the respective retroviral proteins ranging from 26 to 49% and similarities ranging from 41 to 64% (as determined by BLAST). Thus, our approach reconstructed all putative coding sequences of an ancient betaretrovirus.
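Defining ORFs in a reconstructed consensus amounts to scanning each reading frame for stretches free of stop codons. The sketch below uses a stop-to-stop definition (no start codon required), which we assume here because betaretroviral prt and pol are typically translated via ribosomal frameshifting rather than from their own ATG. The function name, the forward-strand-only scan, and the `min_codons` cutoff of 100 are illustrative assumptions, not the settings used in this study.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

# (frame, start, end, codons): start is the first base after the preceding stop
# (or the frame offset); end is the index of the terminating stop codon.
Orf = Tuple[int, int, int, int]


def stop_to_stop_orfs(seq: str, min_codons: int = 100) -> List[Orf]:
    """Stop-free stretches on the forward strand, longest first.

    A stretch that runs off the end of the sequence without a terminating stop
    codon is ignored. The codon count equals the length of the encoded peptide.
    """
    seq = seq.upper()
    orfs: List[Orf] = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                codons = (i - start) // 3
                if codons >= min_codons:
                    orfs.append((frame, start, i, codons))
                start = i + 3  # next stretch begins after this stop codon
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)
```

For the toy sequence `"ATG" + "GCT" * 5 + "TAA"` with `min_codons=3`, the function returns `[(0, 0, 18, 6)]`: a six-codon stretch in frame 0 ending at the TAA stop, while the other two frames lack a terminating stop and are ignored.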
The HML-5 sequence was initially assigned to the HERV-K superfamily because of the similarity of its pol region to that of other HERV-K families. Based on the analysis of an earlier draft version of the human genome sequence, Tristem suggested that the HERV-K(HML-5) primer binding site region is more similar to isoleucine tRNA than to lysine tRNA (43). We compared the HERV-K(HML-5) consensus primer binding site region with reported human tRNA sequences (22) (http://rna.wustl.edu/GtRDB/) and found that the HML-5 primer binding site is identical in sequence to the methionine tRNA 3' end over at least 18 nucleotides (nt). Isoleucine tRNA differed at three nucleotides, and lysine tRNAs differed at six or more nucleotides (Fig. 2). Therefore, following the current HERV nomenclature, the HERV-K(HML-5) family should be designated HERV-M.
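Comparing a primer binding site with tRNA 3' ends reduces to a mismatch count after reverse complementation, because the PBS on the proviral plus strand is complementary to the tRNA 3' end. In this sketch the function names are hypothetical, and the sequences in the usage example are made-up toy sequences, not the HML-5 PBS or real human tRNAs.

```python
def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pbs_mismatches(pbs: str, trna: str) -> int:
    """Mismatches between a PBS and the 3' end of a tRNA at least as long.

    The PBS (DNA, proviral plus strand) is complementary to the tRNA 3' end, so its
    reverse complement is compared base by base with the last len(pbs) tRNA
    nucleotides (uracil is converted to thymine first).
    """
    pbs_sense = reverse_complement(pbs.upper())
    trna_dna = trna.upper().replace("U", "T")
    if len(trna_dna) < len(pbs_sense):
        raise ValueError("tRNA sequence is shorter than the PBS")
    trna_3prime = trna_dna[-len(pbs_sense):]
    return sum(a != b for a, b in zip(pbs_sense, trna_3prime))
```

With a toy 20-nt tRNA `GGGGACGUACGUACGUACCA` and the 18-nt PBS `TGGTACGTACGTACGTCC`, the count is 0; changing the first PBS base to `A` gives 1. Ranking candidate tRNAs by this count is one simple way to produce a comparison like the one summarized in Fig. 2.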
Phylogeny of the HERV-K(HML-5) family.
Dot matrix comparisons between the different proviruses and the HERVK22 sequence flanked by LTR22A, as included in Repbase, showed that only a few entries shared extended similarity with the LTR22A sequence. This finding suggested the existence of one or more LTR variants associated with HML-5 sequences and corroborated the definition of different LTR22 subfamilies in Repbase (LTR22, LTR22A, and LTR22B). For each proviral entry, we used RepeatMasker annotations to determine the presence of complete or deleted 5' and 3' LTRs (Table 2). We found that 70 proviral sequences were associated with full-length LTRs on both sides, 59 entries harbored at least one complete or deleted LTR on one side, and 10 entries lacked both LTRs. We generated a ClustalW multiple alignment, with default parameters, of a total of 134 fairly complete LTR sequences. Neighbor-joining analysis of this alignment suggested the existence of several LTR subfamilies (Fig. 3). A major branch with 100% bootstrap support (1,000 replicates) consisted of sequences similar to the LTR22A subfamily. Within this branch, two subgroups with 63 and 91% bootstrap support could be distinguished; we named these two subgroups LTR22A and LTR22A2. Another group (99% support) consisted of sequences with similarity to the LTR22 subfamily. The majority of the remaining sequences were more similar to the LTR22B subfamily. Therefore, analysis of the entire HML-5 data set revealed three major LTR subfamilies (LTR22, LTR22A, and LTR22B) associated with the proviral sequences, in accord with previous findings. In addition, LTR22A comprises a clearly distinguishable subgroup, named LTR22A2.
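Bootstrap support values such as these come from resampling alignment columns with replacement, rebuilding the tree for each pseudo-replicate, and counting how often a group recurs. The sketch below shows only that resampling logic. The tree-building step is left to a caller-supplied function `build_clades`, a hypothetical interface standing in for neighbor joining in PHYLIP or PAUP*, which must return the clades of the inferred tree as sets of sequence indices.

```python
import random
from typing import Callable, FrozenSet, List, Set

Clade = FrozenSet[int]  # indices of the sequences grouped together in a tree


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """One pseudo-replicate: resample alignment columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def bootstrap_support(
    alignment: List[str],
    build_clades: Callable[[List[str]], Set[Clade]],
    clade: Clade,
    replicates: int = 1000,
    seed: int = 0,
) -> float:
    """Percentage of pseudo-replicate trees that contain `clade`."""
    rng = random.Random(seed)  # fixed seed for reproducible support values
    hits = sum(
        clade in build_clades(bootstrap_replicate(alignment, rng))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates
```

With sequences 0 and 1 identical and distinct from sequences 2 and 3, every pseudo-replicate preserves that grouping, so a toy `build_clades` that groups the sequences identical to sequence 0 yields 100% support for the clade {0, 1}.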
Based on the larger data set available in our study (40), we generated new consensus sequences for the three LTR22 subfamilies listed in Repbase and for LTR22A2. The classification of LTR sequences in Table 2 is based on their phylogenetic relationship to these new LTR22 consensus sequences. The new consensus sequences for LTR22B and LTR22A were 497 and 457 bp long, respectively, and differed modestly from the Repbase sequences. The 471-nt LTR22A2 consensus sequence showed 74% identity and 4% gaps compared to LTR22A. As revealed in this study, the previously established Repbase LTR22 consensus sequence was probably derived from three loci located on the Y chromosome (proviruses 117, 121, and 131; Fig. 3) that are less representative of LTR22. The LTR22 consensus sequence generated in this study is 492 bp long, compared to the 580-nt Repbase sequence, which furthermore includes 47 ambiguous positions (International Union of Pure and Applied Chemistry codes).
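Figures such as 74% identity and 4% gaps depend on how they are normalized. This sketch, with a hypothetical function name, assumes both percentages are taken over the full length of a pairwise alignment, with identity counting only positions where both sequences carry the same base; other conventions (for example, normalizing by ungapped length) give somewhat different numbers.

```python
from typing import Tuple


def identity_and_gaps(a: str, b: str) -> Tuple[float, float]:
    """Percent identity and percent gapped positions of a pairwise alignment.

    Both values are relative to the alignment length. A position counts as gapped
    if either sequence has '-' there; identical gap pairs are not counted as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    n = len(a)
    pairs = list(zip(a.upper(), b.upper()))
    identical = sum(x == y and x != "-" for x, y in pairs)
    gapped = sum(x == "-" or y == "-" for x, y in pairs)
    return 100.0 * identical / n, 100.0 * gapped / n
```

For example, `identity_and_gaps("ACGT-ACGTA", "ACGTTACGAA")` returns `(80.0, 10.0)`: eight identical positions, one mismatch, and one gap over ten alignment columns.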
We furthermore analyzed the phylogenetic relationships between HERV-K(HML-5) proviruses for five different proviral regions (representing all major retroviral genes) by neighbor-joining and bootstrap analysis. The phylogenetic trees displayed similar branch lengths with respect to the consensus sequence, suggesting similar nucleotide divergence and hence similar ages of the proviral sequences (data not shown). Neighbor-joining analysis implied two HML-5 subfamilies, but these were not supported by bootstrap analysis. Only a few chromosome Y sequences were distinguished from the remaining sequences with higher bootstrap support (Fig. 4). The region spanning the end of reverse transcriptase to the start of RNase H was an exception. In this case, a subfamily of 28 proviral sequences, mostly associated with LTR22A and LTR22A2, was distinguished with 86% bootstrap support. Eight additional LTR22B-associated sequences were separated with 90% support. The remaining 14 sequences (with low bootstrap support) were associated with either LTR22 or LTR22B (data not shown). Taken together, analysis of four out of five internal regions supports the conclusion that HML-5 proviral bodies do not form distinct subgroups but rather represent a monophyletic group, in contrast to the phylogenetically clearly distinct LTR sequences.
Age of the HERV-K(HML-5) family.
A number of repetitive elements were found in HERV-K(HML-5) proviruses. In particular, relatively old Alu subfamilies, such as AluSg, AluSc, and AluSp (Table 1), with approximate evolutionary ages of 31, 35, and 37 million years, respectively (16), suggested that HML-5 proviruses are older than other HERV-K families, because a provirus must predate the elements that later integrated into it. The age of a provirus can be estimated from a sequence comparison of its flanking 5' and 3' LTRs. Owing to the retroviral reverse transcription strategy, both LTRs are identical in sequence at the time of provirus formation. In the absence of selective pressure, the two LTRs then accumulate mutations independently over time. Thus, the sequence difference between a provirus' LTRs is an approximate measure of provirus age (8). We determined the sequence divergence between the 5' and 3' LTRs for 53 HML-5 proviruses whose LTRs overlapped over larger portions. We obtained sequence divergences ranging from 6 to 24% (mean, 12.9%; standard deviation, 3.87). These values correspond to an approximate age of 50 (±15) million years for the HML-5 proviruses (Table 2). We furthermore calculated the average ages of proviral sequences from Kimura-2-parameter corrected distances to the HML-5 consensus sequence for five different proviral regions, excluding gaps and CpG dinucleotides (16). This analysis indicated an average evolutionary age of about 60 (±27) million years. The age of roughly 55 million years obtained from both analyses thus places HML-5 integration into the genome clearly before the evolutionary split of Old World from New World monkeys, which took place about 40 million years ago. This contrasts with the other HERV-K families described to date, which are not present in New World monkeys, and suggests that HML-5 represents an ancient betaretrovirus family in primates.
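As a worked check of the LTR-based figures, the age formula T = D/(2 · 0.13) from Materials and Methods can be applied to the reported divergence values, reading D in percent and 0.13 as a rate in percent per million years (our interpretation of the units). Because the formula is linear, the standard deviation scales by the same factor:

```latex
\begin{align*}
T &= \frac{D}{2 \times 0.13\,\%\ \mathrm{Myr}^{-1}} = \frac{D}{0.26\,\%\ \mathrm{Myr}^{-1}} \\
T_{\text{mean}} &= \frac{12.9}{0.26} \approx 49.6\ \mathrm{Myr}, \qquad
\sigma_T = \frac{3.87}{0.26} \approx 14.9\ \mathrm{Myr} \\
T_{\min} &= \frac{6}{0.26} \approx 23\ \mathrm{Myr}, \qquad
T_{\max} = \frac{24}{0.26} \approx 92\ \mathrm{Myr}
\end{align*}
```

This reproduces the reported 50 (±15) million years and shows that individual proviruses span roughly 23 to 92 million years under this rate.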
To investigate whether HML-5 elements are present in New World primate genomes, we examined the species distribution of the HML-5 gag and env regions by PCR with genomic DNA from hominoids, Old World monkeys, New World monkeys, and prosimians. The amplified portion of the gag gene was located outside the commonly deleted region described above, whereas the env amplicon included the frequently deleted region. We obtained gag and env PCR products of the expected sizes from all species tested except prosimians (Fig. 5). This result was in good agreement with the age estimates from LTR and consensus sequence analysis. Full-length env genes, present in a minority of HML-5 proviruses in the human genome, were amplified from all of these species. PCR products corresponding to the env deletion variant were also amplified, indicating the presence of both longer and shorter env variants in all HML-5-positive species.
We cloned PCR products from the gag region obtained from P. troglodytes, H. lar, M. mulatta, and A. seniculus and sequenced a total of 16 clones. The sequences were included in the neighbor-joining analysis presented in Fig. 2. The nonhuman sequences grouped among the human sequences; that is, they did not form separate branches or groups. Also, their percent identity with the human consensus sequence was very similar: 82 to 91%, compared to 78 to 92% for the human sequences. Therefore, our sampled sequences do not indicate a different evolutionary behavior of HERV-K(HML-5) homologous sequences in the examined nonhuman primates. However, more elaborate studies will be required to characterize the precise evolutionary behavior of HERV-K(HML-5) homologues in primates after their evolutionary separation from the human lineage.

DISCUSSION
HERV represent former exogenous retroviruses that are fossilized in the human genome. Closer examination of HERV sequences provides information on ancient primate-targeting retroviruses and on retrovirus evolution in general. The completed sequence of the human genome provides an excellent source of information for such studies. In this paper, we set out to analyze a hitherto little-characterized HERV family, HERV-K(HML-5), in more detail. We found that only 9 of the 139 HML-5 loci (6%) displayed full-length retroviral genes flanked by LTRs on both sides. Most HML-5 loci are structurally defective: gag-prt and/or env regions are missing in about 50% of proviruses, and the start and end points of these deletions clearly cluster within defined regions. Analogous observations were recently made for HERV-K(HML-3), which displays deletions within gag and pol (26). Amplification of deleted proviruses has also been observed for other HERV families; for example, deleted HERV-H elements have reached high copy numbers during primate evolution compared to full-length HERV-H proviruses (24). Recombination at the DNA level, mutations during retroviral reverse transcription, or splicing of retroviral transcripts before reverse transcription and provirus formation could account for such deleted proviruses. Spliced and reintegrated transcripts have been observed for HERV (12, 21). Spliced human immunodeficiency virus type 1 transcripts were also found as cDNA along with full-length retroviral RNA during infection (20), further supporting provirus formation from spliced proviral transcripts in the evolutionary past. Besides active amplification of proviral sequences, about 5% of proviral loci probably arose passively from chromosomal duplications, given that about 5% of the human genome consists of duplicated sequences (14). In contrast, we found no evidence that HML-5 sequences were amplified by L1-mediated pseudogene formation, as recently described for HERV-W (6, 37).
HML-5 loci with both gag-prt and env deletions probably emerged during reverse transcription by recombination between RNA transcripts from proviruses carrying one or the other deletion. Together with the HERV-K(HML-3) deletion variants (26), we note that despite the lack of one or more proviral regions, the 5' intergenic and gag portions are usually present in apparently retrotransposed (retrovirus-like) proviruses. These retroviral regions are known to encode the packaging signal that interacts at the RNA level with the Gag-encoded nucleocapsid (NC) protein. One may therefore hypothesize that interaction between retroviral RNA and NC is still essential during intracellular (retrovirus-like) retrotransposition of HERV. Alternatively, helper viruses could have been involved in the formation of new proviruses, with the packaging signal interacting with the helper virus' NC protein. Together with previous results (9, 26), our data suggest that deletions within the env gene occur recurrently, rendering the Env protein nonfunctional. Mutations that disrupt env could have reduced the production of infectious retrovirus. Such a decrease could have contributed significantly to the disappearance of exogenous stages, and env deletions may therefore represent another cause of the extinction of exogenous stages.
Our study shows that HML-5 sequences were fixed in an ancestral genome after the simian lineage had evolutionarily separated from the prosimian lineage but before the evolutionary separation of Old World and New World primates, as indicated by provirus ages of approximately 55 million years and corroborated by PCR examination of various primate species. Other HERV-K superfamily members were fixed in an ancestral genome approximately 30 to 40 million years ago, after the evolutionary split of Old World from New World primates (26, 28, 34, 39). To the best of our knowledge, HML-5 represents the oldest betaretrovirus in the primate genome known to date. Despite the relatively high proportion of incomplete HML-5 proviruses in the human genome, our study generated a consensus sequence displaying the four major retroviral ORFs gag, prt, pol, and env. The corresponding proteins displayed significant similarities to those of other betaretroviruses, and the reconstructed HML-5 consensus is therefore expected to be very close in sequence to the former exogenous precursor of the endogenous HML-5 variants. Thus, this study reconstructed an ancient betaretrovirus that targeted primates about 55 million years ago.
Phylogenetic analysis of several proviral regions indicated similar sequence divergence among HML-5 sequences, resulting in almost monophyletic tree structures. This finding indicates that probably all HML-5 sequences were generated in a relatively brief evolutionary period around 55 million years ago, a date derived from LTR-LTR divergences and from divergence from the consensus sequence. Formation of new proviruses then apparently ceased. Proviruses with deletions in gag-prt and env were probably also generated within a relatively short period of time. This observation is further supported by the PCR analysis, which revealed gag and env deletions in all tested HML-5-positive primate species. Thus, different HML-5 "master proviruses" with different genome structures were apparently active and generated provirus progeny. In this respect, HML-5 behaves similarly to HERV-K(HML-3) (26). However, both families behave differently from HERV-K(HML-2), which formed proviruses during the hominoid period as well as in recent human evolution (5, 7, 28, 33).
Phylogenetic analysis of LTR sequences associated with proviral bodies revealed several distinct LTR subfamilies. In contrast, phylogenetic analysis of proviral body sequences yielded monophyletic tree topologies for most examined regions. Only the region between reverse transcriptase and RNase H displayed branches with higher bootstrap support. Clearly, proviral bodies are much more homogeneous in sequence than the associated LTR sequences. Thus, almost homogeneous proviral bodies were associated with clearly different LTR variants at the time of provirus formation. It is currently not known whether the different LTR variants were already present in the exogenous precursor(s) or represent derivatives of a germ line-fixed LTR founder family. In either case, the LTR sequences evolved independently of, and apparently more rapidly than, the proviral bodies. The reasons for these different evolutionary rates of LTRs and proviral bodies are currently unclear.
Whether HML-5 is ancestral to other HERV-K families currently remains unclear. Both the HML-5 and HML-6 families appear less related to the remaining HERV-K families, as evidenced, for instance, by DNA sequence comparisons (32) and by phylogenetic comparison of dUTPase domains (30). In addition, this study revealed that the HML-5 consensus primer binding site is identical to the methionine tRNA 3' end but clearly less similar to lysine tRNA 3' ends; HML-5 is therefore actually a HERV-M family, which is consistent with its more distant phylogenetic relationship to HERV-K. However, HML-5 is still more closely related to HERV-K than to other (beta)retroviruses. Also in the course of this study, when employing an updated tRNA sequence data set (22), we noted that the primer binding site region of the previously characterized HERV-K(HML-3) family (26) is more related to arginine and asparagine than to lysine tRNA 3' ends, so that this family actually requires the designation HERV-R or HERV-N. At present, the precise phylogenetic relationship of HML-5 to other HERV-K families, as well as to other endogenous and exogenous retroviruses, must await further specific investigation. We are currently studying the remaining HERV-K families in a similar fashion. Results for other hitherto little-described HERV families will certainly contribute significantly to such studies.

ACKNOWLEDGMENTS
This work was supported by grants from the Deutsche Forschungsgemeinschaft
to J.M. (Ma2298/2-1) and E.M. (Me917/16-1). P.M. is supported
by a fellowship from the Knut and Alice Wallenberg Foundation
and grants from the Swedish Research Council, Åke Wiberg
Foundation, and Magn Bergvall Foundation. W.S. is supported
by DFG grant SCH214/7-3.

FOOTNOTES
* Corresponding author. Mailing address: Department of Human Genetics, Building 60, University of Saarland, Medical Faculty, 66421 Homburg, Germany. Phone: 49 6841 1626627. Fax: 49 6841 1626186. E-mail: jens.mayer@uniklinik-saarland.de.

Supplemental material for this article may be found at http://jvi.asm.org/. 

REFERENCES
1 - Andersson, M. L., M. Lindeskog, P. Medstrand, B. Westley, F. May, and J. Blomberg. 1999. Diversity of human endogenous retrovirus class II-like sequences. J. Gen. Virol. 80:255-260.
2 - Berkhout, B., M. Jebbink, and J. Zsiros. 1999. Identification of an active reverse transcriptase enzyme encoded by a human endogenous HERV-K retrovirus. J. Virol. 73:2365-2375.
3 - Blond, J. L., D. Lavillette, V. Cheynet, O. Bouton, G. Oriol, S. Chapel-Fernandes, B. Mandrand, F. Mallet, and F. L. Cosset. 2000. An envelope glycoprotein of the human endogenous retrovirus HERV-W is expressed in the human placenta and fuses cells expressing the type D mammalian retrovirus receptor. J. Virol. 74:3321-3329.
4 - Boissinot, S., and A. V. Furano. 2001. Adaptive evolution in LINE-1 retrotransposons. Mol. Biol. Evol. 18:2186-2194.
5 - Buzdin, A., S. Ustyugova, K. Khodosevich, I. Mamedov, Y. Lebedev, G. Hunsmann, and E. Sverdlov. 2003. Human-specific subfamilies of HERV-K (HML-2) long terminal repeats: three master genes were active simultaneously during branching of hominoid lineages. Genomics 81:149-156.
6 - Costas, J. 2002. Characterization of the intragenomic spread of the human endogenous retrovirus family HERV-W. Mol. Biol. Evol. 19:526-533.
7 - Costas, J. 2001. Evolutionary dynamics of the human endogenous retrovirus family HERV-K inferred from full-length proviral genomes. J. Mol. Evol. 53:237-243.
8 - Dangel, A. W., B. J. Baker, A. R. Mendoza, and C. Y. Yu. 1995. Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K(C4) are a molecular clock of evolution. Immunogenetics 42:41-52.
9 - de Parseval, N., V. Lazar, J. F. Casella, L. Benit, and T. Heidmann. 2003. Survey of human genes of retroviral origin: identification and transcriptome of the genes with coding capacity for complete envelope proteins. J. Virol. 77:10414-10422.
10 - Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Department of Genetics SK-50, University of Washington, Seattle.
11 - Gifford, R., and M. Tristem. 2003. The evolution, distribution and diversity of endogenous retroviruses. Virus Genes 26:291-315.
12 - Goodchild, N. L., J. D. Freeman, and D. L. Mager. 1995. Spliced HERV-H endogenous retroviral sequences in human genomic DNA: evidence for amplification via retrotransposition. Virology 206:164-173.
13 - Hughes, J. F., and J. M. Coffin. 2001. Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution. Nat. Genet. 29:487-489.
14 - International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
15 - Jurka, J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16:418-420.
16 - Kapitonov, V., and J. Jurka. 1996. The age of Alu subfamilies. J. Mol. Evol. 42:59-65.
17 - Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
18 - Kitamura, Y., T. Ayukawa, T. Ishikawa, T. Kanda, and K. Yoshiike. 1996. Human endogenous retrovirus K10 encodes a functional integrase. J. Virol. 70:3302-3306.
19 - Lebedev, Y. B., O. S. Belonovitch, N. V. Zybrova, P. P. Khil, S. G. Kurdyukov, T. V. Vinogradova, G. Hunsmann, and E. D. Sverdlov. 2000. Differences in HERV-K LTR insertions in orthologous loci of humans and great apes. Gene 247:265-277.
20 - Liang, C., J. Hu, R. S. Russell, M. Kameoka, and M. A. Wainberg. 2004. Spliced human immunodeficiency virus type 1 RNA is reverse transcribed into cDNA within infected cells. AIDS Res. Hum. Retrovir. 20:203-211.
21 - Lindeskog, M., and J. Blomberg. 1997. Spliced human endogenous retroviral HERV-H env transcripts in T-cell leukaemia cell lines and normal leukocytes: alternative splicing pattern of HERV-H transcripts. J. Gen. Virol. 78:2575-2585.
22 - Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955-964.
23 - Löwer, R., J. Löwer, and R. Kurth. 1996. The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc. Natl. Acad. Sci. USA 93:5177-5184.
24 - Mager, D. L., and J. D. Freeman. 1995. HERV-H endogenous retroviruses: presence in the New World branch but amplification in the Old World primate lineage. Virology 213:395-404.
25 - Mager, D. L., and P. Medstrand. 2003. Retroviral repeat sequences. In D. Cooper (ed.), Nature encyclopedia of the human genome. Macmillan Publishers Ltd., Hampshire, England.
26 - Mayer, J., and E. Meese. 2002. The human endogenous retrovirus family HERV-K(HML-3). Genomics 80:331-343.
27 - Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1997. Chromosomal assignment of human endogenous retrovirus K (HERV-K) env open reading frames. Cytogenet. Cell Genet. 79:157-161.
28 - Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1998. Human endogenous retrovirus K homologous sequences and their coding capacity in Old World primates. J. Virol. 72:1870-1875.
29 - Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1997. Multiple human endogenous retrovirus (HERV-K) loci with gag open reading frames in the human genome. Cytogenet. Cell Genet. 78:1-5.
30 - Mayer, J., and E. U. Meese. 2003. Presence of dUTPase in the various human endogenous retrovirus K (HERV-K) families. J. Mol. Evol. 57:642-649.
31 - Mayer, J., M. Sauter, A. Racz, D. Scherer, N. Mueller-Lantzsch, and E. Meese. 1999. An almost-intact human endogenous retrovirus K on human chromosome 7. Nat. Genet. 21:257-258.
32 - Medstrand, P., and J. Blomberg. 1993. Characterization of novel reverse transcriptase encoding human endogenous retroviral sequences similar to type A and type B retroviruses: differential transcription in normal human tissues. J. Virol. 67:6778-6787.
33 - Medstrand, P., and D. L. Mager. 1998. Human-specific integrations of the HERV-K endogenous retrovirus family. J. Virol. 72:9782-9787.
34 - Medstrand, P., D. L. Mager, H. Yin, U. Dietrich, and J. Blomberg. 1997. Structure and genomic organization of a novel human endogenous retrovirus family: HERV-K (HML-6). J. Gen. Virol. 78:1731-1744.
35 - Morgenstern, B. 1999. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15:211-218.
36 - Mueller-Lantzsch, N., M. Sauter, A. Weiskircher, K. Kramer, B. Best, M. Buck, and F. Grässer. 1993. Human endogenous retroviral element K10 (HERV-K10) encodes a full-length gag homologous 73-kDa protein and a functional protease. AIDS Res. Hum. Retrovir. 9:343-350.
37 - Pavlicek, A., J. Paces, D. Elleder, and J. Hejnar. 2002. Processed pseudogenes of human endogenous retroviruses generated by LINEs: their integration, stability, and distribution. Genome Res. 12:391-399.
38 - Reus, K., J. Mayer, M. Sauter, H. Zischler, N. Mueller-Lantzsch, and E. Meese. 2001. HERV-K(OLD): ancestor sequences of the human endogenous retrovirus family HERV-K(HML-2). J. Virol. 75:8917-8926.
39 - Seifarth, W., C. Baust, A. Murr, H. Skladny, F. Krieg-Schneider, J. Blusch, T. Werner, R. Hehlmann, and C. Leib-Mösch. 1998. Proviral structure, chromosomal location, and expression of HERV-K-T47D, a novel human endogenous retrovirus derived from T47D particles. J. Virol. 72:8384-8391.
40 - Smit, A. F. A. 1996. Structure and evolution of mammalian interspersed repeats. Ph.D. dissertation. University of Southern California, Los Angeles.
41 - Sverdlov, E. D. 2000. Retroviruses and primate evolution. Bioessays 22:161-171.
42 - Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
43 - Tristem, M. 2000. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J. Virol. 74:3715-3730.