Journal of Virology, April 2005, p. 5211-5214, Vol. 79, No. 8
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.8.5211-5214.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Weak Palindromic Consensus Sequences Are a Common Feature Found at the Integration Target Sites of Many Retroviruses
Xiaolin Wu,1* Yuan Li,2 Bruce Crise,2 Shawn M. Burgess,3 and David J. Munroe1
Laboratory of Molecular Technology,1
AIDS Vaccine Program, Scientific Application International Corporation, National Cancer Institute at Frederick, Frederick,2
Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland3
Received 2 September 2004/Accepted 14 December 2004

ABSTRACT
Integration into the host genome is one of the hallmarks of
the retroviral life cycle and is catalyzed by virus-encoded
integrases. While integrase has strict sequence requirements
for the viral DNA ends, target site sequences have been shown
to be very diverse. We carefully examined a large number of
integration target site sequences from several retroviruses,
including human immunodeficiency virus type 1, simian immunodeficiency
virus, murine leukemia virus, and avian sarcoma-leukosis virus,
and found that a statistical palindromic consensus, centered
on the virus-specific duplicated target site sequence, was a
common feature at integration target sites for these retroviruses.

TEXT
Much is known about the sequence requirements at the end of the viral DNA for efficient integration of retroviruses (6). A dinucleotide CA is invariably positioned exactly 2 bp from both ends of the viral termini. The sequences internal to the CA dinucleotide extending for up to 15 bp also have significant roles. However, despite decades of effort, the mechanism of target site selection remains largely unknown. Evidence accumulated to date shows that most of the regions of the host genome are potential retroviral integration target sites, but integration is usually not random and the preferences appear to be specific to the individual viruses (12, 22, 26). Target site selection can be influenced by many factors, including DNA binding proteins (1, 13, 17, 20), the chromatin structure of DNA (18, 19, 24), and, perhaps most importantly, cellular targeting proteins (4, 25). It is still unclear how primary sequence at the target sites influences the target site selection, although weak consensus sequences have been reported for the target sites of several retroviruses (2, 5, 7, 18, 19, 23).
Recently, there have been several large-scale surveys of retroviral integration sites in the human genome (12, 15, 22, 26). We downloaded these sequences from GenBank and mapped these integration sites to the human genome, using the BLAT program on the University of California, Santa Cruz genome server (November 2003 freeze; UCSC Human Genome Project, http://genome.ucsc.edu). Included in our analysis were integration sites for human immunodeficiency virus type 1 (HIV-1; GenBank accession no. BH609398 to BH610086) (22), simian immunodeficiency virus (SIV) (GenBank accession no. AY679815 to AY680027), murine leukemia virus (MLV) (GenBank accession no. AY515855 to AY516880) (26), and avian sarcoma-leukosis virus (ASLV) (GenBank accession no. CL528318 to CL528772) (12). A total of 334 in vivo HIV-1 sites in SupT1 cells, 81 in vitro HIV-1 sites in naked DNA catalyzed by HIV preintegration complexes (PICs), 148 SIV sites in CEMx174 cells, 695 MLV sites in HeLa cells, and 357 ASLV sites in 293T-TVA cells were mapped.
The sequences upstream and downstream of proviral integration sites (same orientation as virus) were extracted for further analysis. All sequences were aligned at the integration sites (between base -1 and base 1), and the frequencies of A, C, G, and T at each position around the integration site were calculated. These values were compared to the expected value based on the total base frequency of the human genome or values from 500 computer-generated random integration site sequences in the human genome (Fig. 1). The human genome is relatively AT rich (60% AT and 40% GC). At any random site, the expected frequencies for A, C, G, and T are 30, 20, 20, and 30%, respectively. The base composition at each position around the 500 computer-generated random integration sites varies little from the expected value (Fig. 1A). In Fig. 1, we emphasized significant frequency changes at any base position adjacent to the precise integration site by highlighting frequencies that differ from the expected value by 10% or more (green for >10% increase and red for >10% decrease). Clear statistical preferences are observed for each virus, and they are different for each of the genera analyzed.
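The per-position tabulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `position_frequencies` and `flag_deviations` are hypothetical, columns are indexed from 0 rather than in the article's base numbering, and the expected frequencies are the genome-wide values quoted in the text.

```python
from collections import Counter

# Genome-wide expected base frequencies quoted in the text (60% AT, 40% GC).
EXPECTED = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}


def position_frequencies(windows):
    """Return one {base: frequency} dict per column of equal-length windows."""
    if not windows:
        raise ValueError("need at least one window")
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    table = []
    for col in range(width):
        counts = Counter(w[col].upper() for w in windows)
        table.append({b: counts.get(b, 0) / len(windows) for b in "ACGT"})
    return table


def flag_deviations(table, expected=EXPECTED, threshold=0.10):
    """List (column, base, observed, direction) where |obs - exp| >= threshold."""
    flags = []
    for col, freqs in enumerate(table):
        for base in "ACGT":
            diff = freqs[base] - expected[base]
            # Small tolerance so that an exact 10-point change is not lost
            # to floating-point rounding.
            if abs(diff) >= threshold - 1e-9:
                direction = "up" if diff > 0 else "down"
                flags.append((col, base, freqs[base], direction))
    return flags
```

Given the aligned flanking sequences as a list of strings, `flag_deviations(position_frequencies(windows))` lists the cells that would be highlighted green ("up") or red ("down") under the 10% rule.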
We compared the base compositions of in vivo HIV-1 integration
sites in SupT1 cells to randomly generated sites. The frequencies
of some bases at specific positions around HIV-1 target sites
are significantly higher or lower than the expected value (Fig.
1B). For example, base position 1 shows a preference for G (40%)
and avoidance of T (9%). Bases 2 and 4 show preferences for
T (54%) and A (46%), respectively. Base 5 shows a preference for
C (41%) and avoidance of A (10%). These values are at least 10
percentage points higher or lower than the frequencies expected
at random sites. To evaluate the statistical significance of these
differences, we performed bootstrapping by randomly choosing 334
sites from the human genome (to match our in vivo HIV-1 sample
size) and computed the base composition for each of the 20
positions surrounding the random sites. This process was repeated
10,000 times, and in only 13 of these repetitions did the frequency
at any position within the 20-bp DNA change by >10%; this
corresponds to a P value of 0.0013.
When the changes are highlighted, it is easy to see that these
preferences are symmetrically centered on base 3, forming a
statistical palindrome. Interestingly, this palindrome is centered
on the duplicated target site sequence, which comprises bases
1 to 5. The same statistical palindrome was also observed for
HIV-1 integration sites in HeLa cells and mouse bone marrow
cells (data not shown), suggesting that the preference is not
cell line specific or species specific and may represent an
intrinsic property of target site recognition by integrase.
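The resampling test described in this paragraph can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the human genome is replaced by a synthetic random sequence with the 30/20/20/30 composition quoted in the text, and the names `toy_genome`, `max_deviation`, and `bootstrap_p` are hypothetical. The article does not give implementation details, so the exact counting rule here (any base at any of the 20 positions deviating by at least the threshold) is an assumption.

```python
import random

# Genome-wide expected base frequencies quoted in the text.
EXPECTED = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}


def toy_genome(length, seed=0):
    """Synthetic stand-in for a genome (assumption: independent bases)."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", weights=[30, 20, 20, 30], k=length))


def max_deviation(windows, expected=EXPECTED):
    """Largest |observed - expected| base frequency over all columns."""
    n = len(windows)
    worst = 0.0
    for col in range(len(windows[0])):
        column = [w[col] for w in windows]
        for base, exp in expected.items():
            worst = max(worst, abs(column.count(base) / n - exp))
    return worst


def bootstrap_p(genome, n_sites, width=20, rounds=1000, threshold=0.10, seed=0):
    """Fraction of random site sets whose largest deviation reaches threshold."""
    rng = random.Random(seed)
    last_start = len(genome) - width
    hits = 0
    for _ in range(rounds):
        starts = [rng.randint(0, last_start) for _ in range(n_sites)]
        windows = [genome[s:s + width] for s in starts]
        if max_deviation(windows) >= threshold:
            hits += 1
    return hits / rounds
```

With a real genome sequence in place of `toy_genome(...)`, `n_sites=334`, and `rounds=10000`, the returned fraction plays the role of the empirical P value; 13 qualifying repetitions out of 10,000 give 13/10,000 = 0.0013, as reported.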
We then calculated the base composition of HIV-1 integration sites in naked SupT1 genomic DNA catalyzed by PICs (22). The preference pattern is similar to that of in vivo HIV-1 integration sites (Fig. 1C). However, the preference outside the duplicated target site sequence differs slightly. Similar consensus sequences for HIV-1 target sites were reported previously, and synthetic oligonucleotides with the consensus sequence were shown indeed to be the favored target sites by PICs (5). These results with naked DNA targets suggest that preferences observed in vivo are due to recognition determinants of the integration machinery itself and not the influence of DNA binding proteins or chromatin structure.
SIV is closely related to HIV-1. The genome structure and proteins encoded by these two viruses share a great deal of homology. It was therefore interesting to see if SIV and HIV-1 shared the same target site preference. Our analysis showed that SIV integration sites had a similar statistical palindromic composition (Fig. 1D). However, there are differences between SIV and HIV-1, and they mainly lie outside of the 5-bp duplicated target site. For example, SIV has a higher frequency of G at base -1 and T at base -2. Perhaps the most significant difference is at base -3, the third base outside the duplicated target site, where HIV-1 prefers T while SIV prefers G. The difference shows palindromic symmetry on the other side of the integration site, where SIV has a higher frequency of C, A, and C at bases 6, 7, and 8. The overall similarity of the target site sequence indicates that SIV and HIV-1 integration machinery may differ only slightly from each other.
MLV belongs to the genus Gammaretrovirus, which differs from lentiviruses in many aspects. Integration of MLV requires passage through mitosis, whereas integration of lentiviruses does not (9, 21). MLV and HIV-1 also have distinct global target site preferences (22, 26). MLV strongly prefers transcription start site regions, while HIV-1 prefers sites anywhere inside actively transcribed regions. Alignment of 695 MLV integration sites in HeLa cells revealed a different, but clearly significant, target site consensus sequence (Fig. 1E). Like the HIV-1 sites, the consensus is also centered on the duplications that occur at the target site, which is 4 bp long for MLV instead of 5 bp long for HIV.
We also analyzed the target site sequence of another retrovirus, ASLV, which belongs to the genus Alpharetrovirus. From 357 integration sites, we deduced a weak palindromic target site consensus sequence for ASLV (Fig. 1F). This palindromic structure is centered on a 6-bp sequence fragment, which also coincides with the inferred duplicated target site for ASLV. As shown in Fig. 1, the palindromic target site consensus sequence for all four retroviruses extends beyond the target site duplications, suggesting that bases outside the very short target site duplications also contribute to target site selection.
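"Palindromic" here means that the consensus reads the same as its reverse complement. The sketch below is one way to check that property, both for a consensus string and for a per-position frequency table. The helper names are hypothetical, the frequency table is assumed to be a list of `{base: frequency}` dicts spanning a window centered on the target site duplication, and `W` is the standard IUPAC code for A or T.

```python
# Complement map including the self-complementary IUPAC codes W (A/T),
# S (C/G), and N (any base).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "W": "W", "S": "S", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string written with the codes above."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def symmetry_error(table):
    """Mean |f(i, b) - f(mirror(i), complement(b))| over a frequency table.

    A value of 0 means the table is a perfect statistical palindrome
    about the center of the window.
    """
    n = len(table)
    total = 0.0
    for i in range(n):
        for base in "ACGT":
            mirrored = table[n - 1 - i][COMPLEMENT[base]]
            total += abs(table[i][base] - mirrored)
    return total / (4 * n)
```

For a duplication of odd length, such as the 5 bp of HIV-1, no single sequence of unambiguous bases is its own reverse complement, because the central base would have to pair with itself; the symmetry can only hold statistically, which is why `is_palindromic("GTWAC")` is true while `is_palindromic("GTAAC")` is false.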
It is interesting to note that a statistical palindromic consensus sequence has also been reported for the P transposable element in Drosophila (10), suggesting that a palindromic feature might be shared widely among many integrases or transposases. As with P element target sequences, we found very few individual retroviral target site sequences that correspond to the consensus, based on the favorite base at each position. For example, only 10 of 334 individual HIV in vivo sites have G1T2(A/T)3A4C5 at the duplicated target sites. There are several possible explanations for this discrepancy. First, the consensus is very weak and thus can only be found with large data sets like those used in this study. Second, there might be secondary DNA structures that can only be reflected partially by the primary sequences. To evaluate this possibility, we analyzed several known physical properties of the integration site DNA (Fig. 2).
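Counting how many individual sites match the full consensus, as in the 10-of-334 figure above, is a simple pattern match. The following sketch is illustrative only: the function names are hypothetical, the consensus is written with the IUPAC code `W` for A/T, and the independence calculation at the end is an assumption that is not tested in the text.

```python
import re

# Regular-expression fragments for the IUPAC codes used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def count_matches(duplications, consensus):
    """Number of target site duplications that match the whole consensus."""
    pattern = consensus_regex(consensus)
    return sum(1 for d in duplications if pattern.fullmatch(d.upper()))


def independent_expectation(position_freqs, n_sites):
    """Expected matches if positions were independent (an assumption)."""
    expected = float(n_sites)
    for f in position_freqs:
        expected *= f
    return expected
```

Using the HIV-1 frequencies quoted earlier for positions 1, 2, 4, and 5 (40%, 54%, 46%, and 41%) and treating position 3 as always A or T, `independent_expectation([0.40, 0.54, 0.46, 0.41], 334)` gives an upper bound of about 13.6 sites, so a count of 10 full matches is roughly what a weak, position-by-position preference would produce.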
We examined four different DNA structural properties, including A-philicity (8, 11), DNA bendability (3, 14), protein-induced deformability (16), and hydrogen bond (H-bond) potential patterns (10) for HIV, SIV, MLV, and ASLV integration site DNA (Fig. 2). A-philicity measures the propensity of DNA to form an A DNA-like double helix, which has a wide and shallow minor groove believed to give proteins easier access to form hydrogen bonds with bases within the DNA helix (8, 11). DNA bendability also changes the width and depths of the major groove and minor groove, affecting protein access (3, 14). Protein-induced deformability represents the impact of protein binding on DNA topology (16). H-bond potential patterns describe the potential hydrogen bond donors and acceptors of a base pair in the major groove of DNA that interacts with proteins (10). All of these properties are based on DNA primary sequence. However, H-bond potential is calculated based on single-nucleotide frequencies; A-philicity and protein-induced deformability are calculated based on dinucleotide frequencies; and DNA bendability is calculated based on trinucleotide frequencies. All four retroviruses showed significant signal change for the A-philicity score at integration sites when compared to computer-generated random integration sites (Fig. 2A). For the DNA bendability score, HIV and SIV showed more significant changes than MLV and ASLV at the integration sites (Fig. 2B). Significant changes were also observed for the measurement of protein-induced deformability at the integration sites of HIV, SIV, and MLV, while the change was less dramatic for ASLV. Also, H-bond potential exhibited palindromic patterns centered on the duplicated target sites for all four retroviruses (Fig. 2D). From these analyses, it is obvious that many structural properties are favored at the retroviral integration sites.
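Each of these properties is computed by looking up a published value for every mono-, di-, or trinucleotide step and averaging across the aligned sites. The sketch below shows that general scheme only. The function name `step_profile` is hypothetical, and `GC_SCALE` is a deliberately trivial stand-in (the number of G or C bases in each dinucleotide), not an A-philicity, bendability, or deformability table; the published scales from the cited references would have to be supplied in its place.

```python
def step_profile(sequences, scale, k=2):
    """Mean scale value at each k-mer step across aligned sequences.

    sequences: equal-length strings aligned at the integration site.
    scale:     dict mapping every k-mer to a numeric property value.
    k:         1 for single-nucleotide scales, 2 for dinucleotide
               scales, 3 for trinucleotide scales.
    """
    width = len(sequences[0])
    if any(len(s) != width for s in sequences):
        raise ValueError("all sequences must have the same length")
    profile = []
    for start in range(width - k + 1):
        values = [scale[s[start:start + k].upper()] for s in sequences]
        profile.append(sum(values) / len(values))
    return profile


# Stand-in scale for illustration only: G/C count of each dinucleotide.
GC_SCALE = {a + b: (a in "GC") + (b in "GC") for a in "ACGT" for b in "ACGT"}
```

Running the same function on the integration site windows and on computer-generated random windows, then comparing the two profiles position by position, corresponds to the kind of comparison shown in Fig. 2.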
Our results suggest that the observed statistical palindromic primary sequence might reflect the influences of integrase on site selection at target sites. The symmetry of the target site sequence might reflect that the integrase complex works in symmetrical dimers, tetramers, or oligomers at the integration sites, such that each half-complex would have a similar preference for target DNA structure. Our results also imply that it may not be appropriate to think of the consensus sequence as the most favored base at each position. It might be better to think of certain bases being excluded at certain positions to meet the spatial or energy requirements of the integration complexes. For example, all four retroviruses and even P element transposons disfavor T at the first base of the duplicated target site sequence (or A at the last base). In fact, we observed only two individual target site sequences with T1N2N3N4A5 out of 334 (0.6%) HIV-1 integration sites. This is significantly lower than the frequency at random genomic sites, where T1N2N3N4A5 is expected at 9% (30% T x 30% A), or 30 out of 334 (P < 0.001 using a chi-square test). Similarly, T1N2N3N4A5, T1N2N3A4, and T1N2N3N4N5A6 are observed at significantly lower frequencies for SIV (P < 0.05), MLV (P < 0.001), and ASLV (P < 0.001), respectively. This common avoidance may reflect the physical or chemical constraints on position 1 during the DNA cleavage and strand transfer reaction catalyzed by the integrase. All retroviruses showed very low A-philicity scores at base 1, and dinucleotides TA, TC, TG, and TT had high A-philicity scores. Thus, if low A-philicity is truly a requirement for base 1, T will be unlikely to appear at this position. Likewise, many other factors also may contribute to the selection of spatially "best-fit" target sites. The exact structural property of the integration sites will be better understood as our knowledge of the DNA physical structure advances.
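The arithmetic behind the T1N2N3N4A5 comparison can be checked with a short calculation. The text does not specify which form of the chi-square test was used, so the sketch below assumes a one-degree-of-freedom goodness-of-fit test without continuity correction; the function name `chi_square_1df` is hypothetical.

```python
import math


def chi_square_1df(observed_hits, n, expected_rate):
    """Goodness-of-fit chi-square for a hit/miss count with 1 degree of freedom.

    Returns (chi2, p_value). For 1 degree of freedom the upper-tail
    probability of the chi-square distribution is erfc(sqrt(chi2 / 2)).
    """
    expected_hits = n * expected_rate
    expected_miss = n - expected_hits
    observed_miss = n - observed_hits
    chi2 = ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_miss - expected_miss) ** 2 / expected_miss)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Expected rate of T at position 1 and A at position 5 at random sites:
# 30% T x 30% A = 9%, i.e. about 30 of 334 sites.
chi2, p = chi_square_1df(observed_hits=2, n=334, expected_rate=0.30 * 0.30)
```

With 2 observed sites against about 30 expected, the statistic comes out near 28.8, which under these assumptions is consistent with the reported P < 0.001.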

ACKNOWLEDGMENTS
This work has been funded in part with Federal Funds from the
National Cancer Institute, National Institutes of Health, DHHS,
under contract N01-CO-12400.
The HbondView software was a kind gift from G. C. Liao.
The contents of this publication do not necessarily reflect the views or policies of the DHHS, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

FOOTNOTES
* Corresponding author. Mailing address: Laboratory of Molecular Technology, Scientific Application International Corporation, National Cancer Institute at Frederick, 915 Toll House Ave., Frederick, MD 21701. Phone: (301) 846-7677. Fax: (301) 846-6100. E-mail: forestwu{at}mail.nih.gov.


REFERENCES
1 - Bor, Y. C., F. D. Bushman, and L. E. Orgel. 1995. In vitro integration of human immunodeficiency virus type 1 cDNA into targets containing protein-induced bends. Proc. Natl. Acad. Sci. USA 92:10334-10338.
2 - Bor, Y. C., M. D. Miller, F. D. Bushman, and L. E. Orgel. 1996. Target-sequence preferences of HIV-1 integration complexes in vitro. Virology 222:283-288.
3 - Brukner, I., R. Sanchez, D. Suck, and S. Pongor. 1995. Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J. 14:1812-1818.
4 - Bushman, F. D. 2003. Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115:135-138.
5 - Carteau, S., C. Hoffmann, and F. Bushman. 1998. Chromosome structure and human immunodeficiency virus type 1 cDNA integration: centromeric alphoid repeats are a disfavored target. J. Virol. 72:4005-4014.
6 - Coffin, J. M., S. H. Hughes, and H. E. Varmus. 1997. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
7 - Fitzgerald, M. L., and D. P. Grandgenett. 1994. Retroviral integration: in vitro host site selection by avian integrase. J. Virol. 68:4314-4321.
8 - Ivanov, V. I., and L. E. Minchenkova. 1994. The A-form of DNA: in search of the biological role. Mol. Biol. (Moscow) 28:1258-1271. (In Russian.)
9 - Lewis, P. F., and M. Emerman. 1994. Passage through mitosis is required for oncoretroviruses but not for the human immunodeficiency virus. J. Virol. 68:510-516.
10 - Liao, G. C., E. J. Rehm, and G. M. Rubin. 2000. Insertion site preferences of the P transposable element in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 97:3347-3351.
11 - Lu, X. J., Z. Shakked, and W. K. Olson. 2000. A-form conformational motifs in ligand-bound DNA structures. J. Mol. Biol. 300:819-840.
12 - Mitchell, R. S., B. F. Beitzel, A. R. Schroder, P. Shinn, H. Chen, C. C. Berry, J. R. Ecker, and F. D. Bushman. 17 August 2004. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2:E234. [Online.] doi:10.1371/journal.pbio.0020234.
13 - Muller, H. P., and H. E. Varmus. 1994. DNA bending creates favored sites for retroviral integration: an explanation for preferred insertion sites in nucleosomes. EMBO J. 13:4704-4714.
14 - Munteanu, M. G., K. Vlahovicek, S. Parthasarathy, I. Simon, and S. Pongor. 1998. Rod models of DNA: sequence-dependent anisotropic elastic modelling of local bending phenomena. Trends Biochem. Sci. 23:341-347.
15 - Narezkina, A., K. D. Taganov, S. Litwin, R. Stoyanova, J. Hayashi, C. Seeger, A. M. Skalka, and R. A. Katz. 2004. Genome-wide analyses of avian sarcoma virus integration sites. J. Virol. 78:11656-11663.
16 - Olson, W. K., A. A. Gorin, X. J. Lu, L. M. Hock, and V. B. Zhurkin. 1998. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl. Acad. Sci. USA 95:11163-11168.
17 - Pruss, D., F. D. Bushman, and A. P. Wolffe. 1994. Human immunodeficiency virus integrase directs integration to sites of severe DNA distortion within the nucleosome core. Proc. Natl. Acad. Sci. USA 91:5913-5917.
18 - Pruss, D., R. Reeves, F. D. Bushman, and A. P. Wolffe. 1994. The influence of DNA and nucleosome structure on integration events directed by HIV integrase. J. Biol. Chem. 269:25031-25041.
19 - Pryciak, P. M., A. Sil, and H. E. Varmus. 1992. Retroviral integration into minichromosomes in vitro. EMBO J. 11:291-303.
20 - Pryciak, P. M., and H. E. Varmus. 1992. Nucleosomes, DNA-binding proteins, and DNA sequence modulate retroviral integration target site selection. Cell 69:769-780.
21 - Roe, T., T. C. Reynolds, G. Yu, and P. O. Brown. 1993. Integration of murine leukemia virus DNA depends on mitosis. EMBO J. 12:2099-2108.
22 - Schroder, A. R., P. Shinn, H. Chen, C. Berry, J. R. Ecker, and F. Bushman. 2002. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110:521-529.
23 - Stevens, S. W., and J. D. Griffith. 1996. Sequence analysis of the human DNA flanking sites of human immunodeficiency virus type 1 integration. J. Virol. 70:6459-6462.
24 - Taganov, K. D., I. Cuesta, R. Daniel, L. A. Cirillo, R. A. Katz, K. S. Zaret, and A. M. Skalka. 2004. Integrase-specific enhancement and suppression of retroviral DNA integration by compacted chromatin structure in vitro. J. Virol. 78:5848-5855.
25 - Wu, X., and S. M. Burgess. 2004. Integration target site selection for retroviruses and transposable elements. Cell Mol. Life Sci. 61:2588-2596.
26 - Wu, X., Y. Li, B. Crise, and S. M. Burgess. 2003. Transcription start regions in the human genome are favored targets for MLV integration. Science 300:1749-1751.