Journal of Virology, October 2007, p. 11290-11303, Vol. 81, No. 20
0022-538X/07/$08.00+0 doi:10.1128/JVI.00963-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
DNA Palindromes with a Modest Arm Length of ≳20 Base Pairs Are a Significant Target for Recombinant Adeno-Associated Virus Vector Integration in the Liver, Muscles, and Heart in Mice
Department of Molecular Genetics & Biochemistry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15261¹; Program in Genetics and Genome Biology, Hospital for Sick Children Research Institute, Toronto, Ontario M5G 1L8, Canada²; Laboratory of Molecular Technology, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702³; Departments of Pediatrics and Genetics, Stanford University School of Medicine, Stanford, California 94305⁴
Received 4 May 2007/ Accepted 30 July 2007
[...] DNA palindromes with an arm length of ≥20 bp (total length, ≥40 bp) are a significant target for rAAV integration. Up to ~30% of total integration events occurred in the vicinity of DNA palindromes with an arm length of ≥20 bp. Considering that DNA palindromes may constitute fragile genomic sites, our results support the notion that rAAV integrates at chromosomal sites susceptible to breakage or preexisting breakage sites. The use of rAAV to label fragile genomic sites may provide an important new tool for probing the intrinsic source of ongoing genomic instability in various tissues in animals, studying DNA palindrome metabolism in vivo, and understanding their possible contributions to carcinogenesis and aging.
Although rAAV vector integration frequency is considered to be low, it is important to further understand the interactions between rAAV vector and host chromosomal DNA in various tissues in experimental animals. This is because a study has demonstrated increased incidence of liver cancer in rAAV2 vector-treated animals (5), the mechanisms for which remain elusive. In addition, the use of new serotype vectors with robust transduction efficiency, such as rAAV8, can increase vector genome loads in cells, which may pose an increased risk of undesirable genomic alterations in rAAV-transduced cells. Such robust serotype vectors now are widely used for many preclinical studies for gene therapy of various human diseases. Moreover, we have hypothesized that elucidation of the mechanisms of rAAV integration may help in understanding ongoing genomic instability in living animals, which is difficult to investigate with currently available strategies but is important for studies on carcinogenesis and aging. Furthermore, it has been shown that rAAV vectors serve as a powerful tool to study the mechanisms of fundamental biological processes in cells, such as DNA damage responses and DNA repair (11).
To begin to further understand rAAV integration in various tissues of experimental animals, it is essential to establish a system by which rAAV integration sites can be identified on a large scale in nondividing cells with high efficiency and reliability without any cell manipulations. This minimizes possible technical biases and enables identification of rAAV integrations in nonhepatic tissues, for which selection is not easy to perform. However, currently available methods for high-throughput rAAV integration site analysis all rely on cell division either under a selective pressure (40) or without selection (32). Cell division is required for diluting out extrachromosomal rAAV vector genomes with high complexity, which are abundantly present in cells in a quiescent state and inhibit efficient isolation of rAAV integrants.
In the present study, we have established a novel high-throughput method to identify rAAV integration sites in nondividing cells with high efficiency and high reliability independently of cell division, transgene expression, or selection. We have identified a thousand rAAV integration sites in quiescent somatic cells in mouse liver, skeletal muscle, and heart, and we discovered that DNA palindromes are a common target for rAAV integration. DNA palindromes are found prevalently in many organisms, including mammals (24, 58; S. M. Lewis, T. Zheng, S. Chen, T. Alleyne, J. Cheung, T. Chiang, and R. Richard, unpublished data). They have gained attention recently due to accumulating evidence that they have roles in promoting genomic instability in eukaryotes. This DNA motif has been shown to be involved in gene amplification (2, 7, 43, 46, 52-54, 60), nonrandom chromosomal translocations causing human diseases (6, 9, 12, 16-20, 55), genomic instability in animal models (1, 3, 4, 22, 23), retrovirus integration (14), and RAG protein-mediated transposition (45). Thus, the discovery that rAAV integrates preferentially at DNA palindromes not only provides further insights into the mechanisms of rAAV integration but also provides an unprecedented opportunity to study the biological impact and properties of naturally occurring DNA palindromes in tissues of living animals.
FIG. 1. AAV-ISce I.AO3 vector map. The vector genome consists (left to right) of an AAV2-ITR sequence, a stuffer sequence, an ISceI-BamHI combination site, a shortened Tn3 prokaryotic promoter (Pr), the β-lactamase gene, the pUC plasmid origin of replication, a portion of MLV long terminal repeat (MLV-LTR), and an ITR. Diagrammed below is a schema of the preparative vector-cellular DNA junction fragments obtained by BamHI and BglII double digestion. Below that are fragments predicted for a diagnostic BstYI digestion used to analyze plasmid clones.
Southern blotting. Quantitative Southern blot analysis was performed to determine vector genome copy numbers per cell (i.e., double-stranded vector genome copy numbers per diploid genomic equivalent) in each rAAV-transduced tissue as previously described (34). A β-lactamase-specific probe was used for the analysis.
Generation of rAAV vector integration site plasmid libraries. Plasmid rescue formed the basis of the strategy to isolate rAAV integration sites (31, 32, 35, 36, 40). Total DNA was extracted from mouse liver, heart, and lower limb muscle tissues that had been transduced with rAAV vector. Ten to 30 µg of DNA was incubated at 37°C with ISceI (New England BioLabs [NEB]) at 4 U per µg DNA for 4 h. Additional ISceI enzyme was added at 2 U per µg into each reaction mixture, which was then incubated for another 4 h. After addition of an equal amount of yeast genomic DNA (Saccharomyces cerevisiae S288C from the ATCC), the DNA samples were treated with 1 U per µg calf intestinal alkaline phosphatase at 50°C for 1 h. The calf intestinal alkaline phosphatase-treated samples were mixed with a 0.2 volume of 2% low-melting-temperature (LMT) agarose gel in 1x Tris-EDTA (TE) buffer, poured into wells of a 1% LMT agarose gel, and electrophoresed at a low voltage overnight at 4°C. Regions of each lane containing only high-molecular-weight (HMW) DNA were excised and equilibrated first with 1x TE buffer and then with 1x NEB buffer 2 with bovine serum albumin at 4°C. In-gel digestion of HMW DNA was performed with BamHI and BglII (10 U each per µg) (NEB) at 37°C for 2 h. DNA was recovered by dissolving the gel in 1x β-agarase buffer (NEB) at 70°C for 20 min, followed by incubation with β-agarase at 42°C for 1 h, phenol-chloroform extraction, and ethanol precipitation with ammonium acetate and glycogen. The resulting DNA pellets were dissolved in water and quantified by spectrophotometry. The DNA was then self-ligated at a concentration of 3 µg DNA in 700 µl of a reaction mixture containing 1,400 U of T4 DNA ligase (NEB) at 16°C overnight. The DNA preparations were purified with phenol-chloroform, followed by isopropanol precipitation with potassium acetate.
Linear DNA was removed by dissolving the DNA pellets in water and incubating them with ATP-dependent exonuclease (10 U per µg DNA; Plasmid Safe; Epicenter) for at least 4 h. The DNA was purified again with phenol-chloroform, ethanol precipitated with sodium acetate, and dissolved in water. ElectroMax DH10B Escherichia coli (Invitrogen) cells were transformed with 1 to 3 µg of the final DNA product. The resulting plasmid libraries were plated on Luria-Bertani (LB) agar plates containing ampicillin (50 µg/ml).
High-throughput analysis of rAAV provirus plasmid libraries. Plasmid DNA was prepared from each E. coli colony with a Perfectprep Plasmid 96 Vac direct bind system (Eppendorf) or with manual minipreps. Each plasmid DNA was digested with BstYI and was separated on 1.2% agarose gels with a control plasmid, pAAV-ISce I.AO3, treated in the same manner. BstYI cuts at the sites generated by BamHI-BamHI, BamHI-BglII, and BglII-BglII cohesive end ligation. Therefore, BstYI digestion yields diagnostic 199- and 768-bp bands (Fig. 1). When a plasmid contains the rAAV-cellular DNA junction sequence, a third band of 1 kb should emerge. Therefore, we selected all the plasmids containing at least these three bands for the downstream analysis. Plasmid DNA sequence was determined as previously described with a 3730x DNA analyzer (Applied Biosystems) and sequencing primer OriP2 (40). In some cases, a sequencing primer, 36-39BamHI-1 (5'-CGACACGGAAATGTTGAATACTCAT-3'), also was used.
PCR amplification and subsequent cloning of rAAV-labeled DNA palindromes into a plasmid. Four representative DNA palindromes labeled by rAAV integrations (palindrome coordinates chr 10: 98632057, chr 11: 44726916, chr 15: 99702254, and chr 19: 11606158; see Table 3) were amplified by PCR in a 50-µl reaction mixture containing 0.5 µg naive mouse liver genomic DNA, 2x Pfx amplification buffer, 1 mM MgCl2, 0.2 mM of each deoxynucleoside triphosphate, 0.4 µM each of forward and reverse primers, and 1 U of Platinum Pfx DNA polymerase (Invitrogen). PCR cycles were 2 min at 95°C and 34 cycles of 15 s at 95°C, 30 s at 60°C, and 30 s at 68°C, and subsequently 5 min at 68°C. The primer combinations were 49-102Pal10D1-1F1 and 49-102Pal10D1-1R1 for DNA palindrome chr 10: 98632057, 49-102Pal11B1.1-1F1 and 49-102Pal11B1.1-1R2 for DNA palindrome chr 11: 44726916, 49-102Pal15F1-1F1 and 49-102Pal15F1-1R1 for DNA palindrome chr 15: 99702254, and 49-102Pal19A-1F1 and 49-102Pal19A-1R1 for DNA palindrome chr 19: 11606158.
TABLE 3. Summary of palindromes recurrently labeled by rAAV integration
The PCR products were treated with T4 polynucleotide kinase (NEB), inserted into the unique EcoRV site of pBluescript KS II(–) (Stratagene) by DNA ligation with T4 DNA ligase, and then introduced into ElectroMax DH10B E. coli for cloning. The transformed bacteria were plated on LB agar plates containing ampicillin (50 µg/ml).
The stability of DNA palindromes upon cloning in bacteria was assessed in the following manner. Plasmid DNA was recovered from each colony, digested with a combination of EcoRI and HindIII, which excised the cloned PCR products containing each DNA palindrome, and analyzed by 2% agarose gel electrophoresis. In addition, plasmid DNA was sequenced with an M13 reverse primer (5'-GGAAACAGCTATGACCATG-3').
Bioinformatics. We performed in silico digestion of the mouse genomic DNA with BamHI and BglII in the following manner. The entire sequence of each mouse chromosome (University of California Santa Cruz [UCSC] mm8 and NCBI Build 36, February 2006 freeze) was searched for BamHI (GGATCC) and BglII (AGATCT) sites using a Perl script. The fragment length was calculated from one site to the next site found.
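The in silico digestion can be illustrated with a minimal Python sketch. The original analysis used a Perl script that is not reproduced here; the function names and the toy sequence below are hypothetical, and only the two recognition sequences and the site-to-site length rule come from the text.

```python
import re

# Recognition sequences given in the text.
SITES = {"BamHI": "GGATCC", "BglII": "AGATCT"}

def site_positions(seq):
    """Return sorted 0-based start positions of every BamHI or BglII site."""
    pattern = re.compile("|".join(SITES.values()))
    return [m.start() for m in pattern.finditer(seq.upper())]

def fragment_lengths(seq):
    """Lengths measured from one site to the next site found.

    Assumption: the terminal pieces before the first site and after the
    last site are ignored, because they are not bounded by two sites.
    """
    pos = site_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]

# Toy sequence (hypothetical) with sites at positions 2, 12, and 23.
toy = "AAGGATCCTTTTAGATCTCCCCCGGATCCAA"
print(fragment_lengths(toy))
```

For a whole chromosome the same functions could be applied to the sequence read from a FASTA file, one chromosome at a time.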
For mapping of the rAAV integration sites, we first determined the breakage site of the rAAV vector genome using the BLAST 2 sequences (bl2seq) program of NCBI by comparing the vector genome sequence and isolated plasmid sequence. The rAAV-flanking DNA then was used as the query against the public mouse genome database (UCSC mm8 and NCBI Build 36) as previously described (40, 59). If rAAV-flanking sequences were not identified in the mouse genome, we searched them against human genome sequences and the sequences of the plasmids used for vector production, i.e., pHLP19 (AAV2 helper plasmid), p5E18-VD2-8 (AAV8 helper plasmid), pladeno5 (adenovirus helper plasmid), and pAAV-ISce I.AO3 (AAV-ISce I.AO3 vector plasmid). This was necessary because human genomic DNA and these plasmid DNAs, which are irrelevant to the rAAV vector genome, often were found incorporated in rAAV vector genomes by illegitimate recombination at the time of rAAV vector production in human embryonic kidney 293 cells, as previously reported (31, 32, 35, 37, 40). As a measure of illegitimate events occurring during the plasmid rescue procedure, we also searched isolated rAAV provirus plasmid DNA against the yeast genome. When precise breakpoints could not be determined due to microhomology at integration sites, we defined the break sites of the rAAV vector and the flanking mouse genome sequence such that the microhomology was included in both sides. Computer-simulated random integration sites (1,000, 10,000, and 30,000) were generated as previously described (40, 59). Random numbers were generated by a rand() function in the Perl program, with the srand() function for seed as previously described (59). 
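A seeded random-site generator of the kind described can be sketched as follows. The exact procedure is the one cited (40, 59) and is not detailed in this section, so this Python version rests on assumptions: positions are drawn uniformly over the concatenated chromosome lengths, a strand is drawn at random, and the function name and toy chromosome lengths are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def simulate_sites(chrom_lengths, n, seed):
    """Draw n (chromosome, 1-based position, strand) tuples uniformly over the genome.

    Assumption: every base of every chromosome is equally likely; the seed
    makes the data set reproducible, analogous to srand() in Perl.
    """
    rng = random.Random(seed)
    names = list(chrom_lengths)
    cumulative = list(accumulate(chrom_lengths[c] for c in names))
    total = cumulative[-1]
    sites = []
    for _ in range(n):
        offset = rng.randrange(total)            # 0-based genome-wide offset
        idx = bisect_right(cumulative, offset)   # chromosome containing the offset
        start = cumulative[idx - 1] if idx else 0
        sites.append((names[idx], offset - start + 1, rng.choice("+-")))
    return sites

# Hypothetical toy genome; real chromosome lengths would come from the mm8 assembly.
toy_genome = {"chrA": 100, "chrB": 50}
sites = simulate_sites(toy_genome, 1000, seed=1)
```

Calling the function again with the same seed returns the same data set, and a different seed gives an independent one.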
To investigate the spectrum of rAAV and random integration sites in the mouse genome, we downloaded coordinates of RefSeq genes, CpG islands, and genomic locations of micro-RNA (miRNA) for the February 2006 mouse genome freeze from the UCSC genome project website and analyzed integration site data as previously described (40, 59).
For palindrome analyses, the flanking 500-bp genomic sequences centromeric and telomeric to each integration site were extracted from the mouse genome database and were searched for the presence of DNA palindromes. The size of the examined window (i.e., 1 kb) took into consideration that rAAV integration can be accompanied by chromosomal sequence deletion and that, in mice, over 80% of these are less than 1 kb (40). Each 1-kb sequence was aligned against itself using the NCBI bl2seq program (version BLAST-2-2-8) with a setting of (-p blastn, -G 5, -E 2, -q -2, -r 1, -e 10.0, -w 11). All the self-complementary sequences identified by the program, the minimum being 12 bp in total length (6 bp in arm length), were collected and again searched with a newer version of the NCBI bl2seq program (version BLAST-2-2-14) using the same settings for confirmation.
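To make the window-and-search idea concrete, the following Python sketch extracts a ±500-bp window and scans it for inverted repeats. It is not the bl2seq self-alignment used in the study: it is a simple exact-match scan that finds only perfect inverted repeats (no mismatches), and the function names, the `max_spacer` parameter, and the toy sequence are assumptions made for illustration.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def window(chrom_seq, site, flank=500):
    """Sequence spanning `flank` bp on each side of a 0-based site, clipped at the ends."""
    return chrom_seq[max(0, site - flank): site + flank]

def perfect_palindromes(seq, min_arm=6, max_spacer=5):
    """Return (left_arm_start, arm_length, spacer_length) for perfect inverted repeats."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for spacer in range(max_spacer + 1):
        for left_end in range(n):            # index of the innermost base of the left arm
            i, j = left_end, left_end + spacer + 1
            arm = 0
            # Extend outward while the two bases are complementary.
            while i >= 0 and j < n and PAIRS.get(seq[i]) == seq[j]:
                arm += 1
                i -= 1
                j += 1
            if arm < min_arm:
                continue
            # Skip hits whose spacer could be shortened (already reported with a smaller spacer).
            if spacer >= 2 and PAIRS.get(seq[left_end + 1]) == seq[left_end + spacer]:
                continue
            hits.append((i + 1, arm, spacer))
    return hits

# Toy sequence (hypothetical): an 8-bp arm, a 2-bp spacer, and the reverse-complement arm.
toy = "CCC" + "GATTACAG" + "TT" + "CTGTAATC" + "CCC"
print(perfect_palindromes(toy))
```

A scan of this kind would miss the imperfect palindromes that the alignment-based approach can report, so it should be read only as an illustration of the search space.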
In our analysis, palindromes were defined according to the following criteria: (i) inverted repeats are present and spaced ≤5 bp apart; (ii) the arm length is ≥6 bp; (iii) mismatches are minimized such that overall self complementarity with any spacer included is ≥80%; and (iv) no sequence gaps occur in the self-aligned region. If two or more palindromes occurred within the 1-kb window of sequence around the rAAV or random integration site, the palindrome that was longest (first priority) and closest to the integration site (second priority) was designated the integration-labeled palindrome. When the longest palindromic regions had multiple alternative self alignments, which was particularly the case with long (AT)n-containing palindromes, the palindrome showing the longest alignment with self complementarity of ≥90% was taken, if present, as the integration-labeled palindrome. rAAV orientation was defined according to whether the vector was incorporated in a plus or minus orientation relative to the numbering of the mouse reference sequence. An orientation was randomly assigned to each simulated integration event using a rand() function. A total of 20 independent random integration data sets that included an orientation parameter were generated for the simulated insertion sites.
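The two-level priority rule for choosing the integration-labeled palindrome can be expressed as a single sort key. This Python sketch is illustrative only: the `Palindrome` class and function name are hypothetical, "longest" is assumed to mean longest arm, and the additional rule for alternative self alignments is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Palindrome:
    center: int      # coordinate of the symmetry center
    arm_length: int  # length of one arm in bp

def integration_labeled(palindromes, site):
    """Pick the longest palindrome; break ties by proximity to the integration site."""
    if not palindromes:
        return None
    return min(palindromes, key=lambda p: (-p.arm_length, abs(p.center - site)))

# Hypothetical candidates within one 1-kb window around a site at coordinate 1000.
candidates = [Palindrome(1000, 8), Palindrome(1200, 20), Palindrome(900, 20)]
chosen = integration_labeled(candidates, site=1000)
```

Here the 8-bp palindrome at the site itself loses to the two 20-bp palindromes, and the one 100 bp away is chosen over the one 200 bp away.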
The coordinates of the center of each palindrome were defined as follows. For palindromes covering an odd number of base pairs (spacer included), the coordinate of the center-most position was used. For palindromes covering an even number of base pairs, where both of two positions are central, the base pair closest to the integration site was taken as the palindrome's central coordinate.
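The center-coordinate rule can be written out as a short Python function. The function name and the inclusive `start`/`end` convention are assumptions made for illustration; the odd/even logic follows the definition above.

```python
def palindrome_center(start, end, site):
    """Central coordinate of a palindrome spanning start..end (inclusive, spacer included).

    Odd span: the single center-most position.
    Even span: of the two central positions, the one closest to the integration site.
    """
    length = end - start + 1
    if length % 2 == 1:
        return start + length // 2
    lower = start + length // 2 - 1
    upper = lower + 1
    return lower if abs(lower - site) <= abs(upper - site) else upper

print(palindrome_center(10, 14, site=0))   # odd span of 5 bp
print(palindrome_center(10, 13, site=20))  # even span, site on the upper side
```

Because coordinates are integers, an integration site can never be equidistant from the two central positions of an even-length palindrome, so no further tie-break is needed.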
The coordinates of the rAAV integration sites from our previous study (accession numbers EI173586 to EI174306) (40) were updated, and the rAAV provirus-flanking DNA sequences were reanalyzed according to the February 2006 freeze (UCSC mm8 and NCBI Build 36). The palindrome analysis was performed as described above.
For comparison, rAAV integration sites identified in nonselected human cultured cells (32) were retrieved from GenBank. Of 1,172 submitted sequences (accession numbers DU709854 to DU711025), 815 were chosen according to the following criteria: (i) we excluded sequences with plasmid-derived or undefined rAAV flanks; (ii) we excluded sequences with insertions of over 20 bp in length at junctions; (iii) we excluded sequences in which the breakpoint in the rAAV genome was too close to the sequence primer and therefore we could not determine the breakpoints; (iv) we excluded rAAV integrations into the human rRNA gene repeats; and (v) we included only integration sites that mapped to a unique site in the human genome (hg 18) with identity of 95% or more.
For dinucleotide repeat analyses, we determined the lengths and positions of integration-labeled dinucleotide repeat tracts present in the same 1-kb sequence windows (±500 bp from rAAV and random integration sites) as described for the palindrome analyses.
Statistics. The experimentally determined palindrome-labeling and integration frequencies were compared to those expected from 1,000, 10,000, or 30,000 randomly occurring integrations. The biological significance of various parameters of interest was assessed by determining the statistical significance of detected biases using the two-tailed χ² test. For cases in which values in a contingency table were five or less, the two-tailed Fisher's exact probability test was used. Twenty-nine integrations landing in the rRNA gene repeats in the present study were excluded from the statistical analyses.
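The test-selection rule can be sketched in plain Python for a 2 x 2 contingency table. Several details are assumptions, since the section does not state them: the χ² statistic is computed without a continuity correction, "five or less" is applied to the observed cell counts, and the two-tailed Fisher P value sums the probabilities of all tables no more likely than the observed one. All function names are hypothetical.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))  # upper-tail probability for 1 df

def fisher_exact_2x2(a, b, c, d):
    """Two-tailed Fisher's exact P value for [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def compare(a, b, c, d):
    """Use Fisher's exact test when any observed cell is <= 5, otherwise chi-square."""
    if min(a, b, c, d) <= 5:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi2", chi_square_2x2(a, b, c, d)[1]

# Illustrative input from counts reported in the text: 369 of 941 rAAV integrations
# versus 318 of 1,000 simulated integrations that labeled a palindrome.
test_used, p_value = compare(369, 941 - 369, 318, 1000 - 318)
```

The counts are used only as example input; the published analysis compared labeling frequencies within arm-length categories rather than this pooled table.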
Nucleotide sequence accession numbers. Sequences of the AAV vector plasmid pAAV-ISce I.AO3 and of the rAAV vector genome-host cellular DNA junction have been deposited in GenBank under the accession numbers EU022316 (for pAAV-ISce I.AO3) and ER934559 to ER935499 and ER935831 (for rAAV integration junction sequences).
In rAAV-transduced cells, rAAV genomes exist in various double-stranded DNA forms. These include extrachromosomal double-stranded circular monomers, double-stranded circular and linear concatemers, and integrated monomeric and concatemeric genomes, all of which exhibit various rearrangements (35, 36, 38, 40, 41). This multiplicity of structure creates significant complexity in DNA samples from rAAV-transduced cells, where only a small portion of vector genomes actually integrates into host chromosomal DNA (41). In proliferating cells, the extrachromosomal forms can be diluted out by cell division, but substantial dilution does not occur in quiescent cells in animal tissues, adding greatly to the challenge of identifying rAAV integration events in the latter context. To isolate many integration sites in nondividing cells, we developed a new methodology, for which the important features include the following. (i) Abundant extrachromosomal circular monomer genomes and concatemeric genomes are removed by ISceI digestion and size fractionation. (ii) Circular ligation products are prepared free of recombinogenic linear DNA before transformation of bacteria. (iii) Artifactual recombination events between rAAV vector genomes and mouse chromosomal DNA generated either in vitro or in bacteria are monitored by addition of yeast genomic DNA. The yeast genomic DNA mixed with sample DNA serves as a tag sequence of unwanted recombination, which we can measure as the proportion of rescued plasmids bearing rAAV genomes flanked with yeast genomic DNA.
Representativeness of the rAAV integration site data set in our study. To minimize technical biases, the choice of the appropriate restriction enzyme(s) employed in plasmid rescue was critical. To construct libraries best representing vector integration sites, digestion of sample DNA with a restriction enzyme(s) must generate DNA fragments in appropriate sizes so that a vast majority of digested DNA fragments are small enough to transform bacteria efficiently.
In the present study, we opted to use BamHI and BglII double digestion, in which BamHI cuts the vector genome once. To verify that the choice of BamHI and BglII double digestion was appropriate, in silico restriction enzyme digestion of the mouse genome was performed. A histogram of all 1,395,386 BamHI-BglII restricted fragments revealed that 90 and 99% of the fragments are ≤4.5 and ≤9.5 kb in length, respectively (Fig. 2). To perform a similar analysis of BamHI-BglII-restricted fragments flanking random integration sites, we multiplied the size of each of 1,395,386 BamHI-BglII-restricted fragments with a random number between 0 and 1. This analysis revealed that 90 and 99% of random integration-flanking BamHI-BglII-restricted fragments should be ≤2.5 and ≤6.0 kb in length, respectively (Fig. 2). Because the maximum predicted size of rAAV vector genomes contained in rAAV integration junction plasmids is, in theory, 2.2 kb, 90 and 99% of rAAV provirus plasmid sizes should be ≤4.7 and ≤8.2 kb, respectively.
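The flank-size estimate and the final arithmetic can be reproduced with a short Python sketch. The fragment list would come from the in silico digestion; the function names, the nearest-rank percentile definition, and the seed are assumptions, while the 2.2-kb maximum vector size and the 2.5- and 6.0-kb flank bounds are taken from the text.

```python
import math
import random

MAX_VECTOR_KB = 2.2  # maximum predicted vector genome size in a junction plasmid

def flank_lengths(fragment_lengths, seed=0):
    """Scale each fragment length by a uniform random number between 0 and 1."""
    rng = random.Random(seed)
    return [length * rng.random() for length in fragment_lengths]

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def plasmid_size_bounds(flank_bounds_kb):
    """Add the maximum vector size to each flank-size bound (kb)."""
    return [round(kb + MAX_VECTOR_KB, 1) for kb in flank_bounds_kb]

# Flank bounds reported in the text for 90% and 99% of fragments.
print(plasmid_size_bounds([2.5, 6.0]))  # [4.7, 8.2]
```

With the full list of 1,395,386 fragment lengths, `percentile(flank_lengths(lengths), 90)` and `percentile(flank_lengths(lengths), 99)` would give the simulated flank bounds, subject to random variation between seeds.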
FIG. 2. Histogram of in silico BamHI-BglII-digested mouse genomic DNA fragments. In silico-digested DNA fragments were grouped based upon size in 500-bp increments. Two series are shown: BamHI-BglII-digested mouse genomic DNA and BamHI-BglII-digested mouse genomic DNA flanked with computer-simulated random integrants (see the text).
The reliability of our plasmid rescue strategy was assessed by monitoring illegitimate intermolecular recombination between rAAV vector genomes and yeast genomic DNA. Among 1,269 plasmid clones obtained from our libraries, no such recombination events were observed, demonstrating the high fidelity of the method.
A survey of rAAV integration sites in quiescent somatic cells from mouse liver, skeletal muscle, and heart. rAAV2 and rAAV8 vectors with AAV-ISce I.AO3 genomes were introduced into C57BL/6J and DNA-PKcs-deficient C57BL/6J SCID male mice by injection via the tail vein or the portal vein. The purpose of studying both wild-type and DNA-PKcs-deficient mice was to investigate whether DNA-PKcs activity influences rAAV integration patterns in vivo. A recent study by Song et al. suggested that DNA-PKcs inhibits rAAV integration in cultured cells and in mouse liver (51). Serotypes 2 and 8 were chosen because they primarily transduce hepatocytes, striated myofibers, and cardiomyocytes in the liver, skeletal muscle, and heart, respectively (10, 34). These cell types are presumed to be mostly quiescent in adult animals. A dose of 7.2 × 10¹² or 5.0 × 10¹⁰ vector genomes (vg) of AAV8-ISce I.AO3 per mouse was infused via the tail vein, and 3.0 × 10¹¹ vg of AAV2-ISce I.AO3 per mouse was infused via the portal vein (Table 1). According to our previous studies, a tail vein injection of 5.0 × 10¹⁰ vg/mouse of rAAV8 vector or a portal vein injection of 3.0 × 10¹¹ vg/mouse of rAAV2 vector transduces 5 to 10% of hepatocytes in the liver (10, 34, 39). A tail vein injection of 7.2 × 10¹² vg/mouse of rAAV8 vector can transduce virtually all the hepatocytes, cardiomyocytes, and striated myofibers (10, 34, 39). Liver, skeletal muscle, and heart tissues were harvested 6 weeks postinjection to identify rAAV integration sites in each of these tissues.
TABLE 1. Summary of results from high-throughput rAAV integration site analysis
rAAV integrates preferentially in genes and near gene regulatory sequences in hepatic and nonhepatic mouse tissues. We investigated rAAV integration site preference in mouse liver, skeletal muscle, and heart. Consistent with previous reports (32, 40), the rRNA gene repeat is a preferred site for rAAV integration (29/997 = 2.9%, compared to a predicted 0.3% frequency from a random integration model) (40) (Table 1). Excluding these rRNA gene integrations, a total of 941 rAAV integration sites mapped to unique sites in the mouse genome. We compared rAAV integration sites to 10,000 computer-simulated random integrations (Table 2). The results were consistent with our previous observation, i.e., preferential integration into RefSeq genes, near transcription start sites, and into or near CpG islands (32, 40). However, the bias for transcription start sites and CpG islands in the present study (in which integration sites were identified in quiescent cells) was less pronounced than in our previous study (in which integration sites were isolated after in vivo selection of integrants) (40) (Table 2). Apart from this difference, the trend for integration into RefSeq genes was quite comparable between the two studies. We also investigated rAAV integration in or near miRNA-coding regions. None and 2 of 941 rAAV integrations occurred within the region ±1 and ±5 kb from genomic locations of miRNA, respectively.
TABLE 2. Locations of unselected rAAV integration sites, computer-simulated random integration sites, in-vivo-selected rAAV integration sites, and break-prone palindromes in various mouse tissues
[...] ≥6 bp with a Perl script based on NCBI's bl2seq program as described in Materials and Methods. For the purposes of the study, a palindrome was defined as an inverted repeat sequence with an arm length of ≥6 bp, a spacer of ≤5 bp, and overall self complementarity of ≥80%.
We compared the frequency of integration in the vicinity of DNA palindromes for the working data set of 941 rAAV integrations to those for 1,000 and 30,000 computer-simulated random integrations. A DNA palindrome found within 500 bp of either side of an integration site that fulfilled our criteria was defined as having been integration labeled. We set a 1-kb window for the analysis, because rAAV integrations are frequently accompanied by chromosomal sequence deletions (40). A total of 369 rAAV integrations labeled a DNA palindrome with an arm length of ≥6 bp. Of the 1,000 computer-simulated integrations, 318 random integrations labeled a palindrome with an arm length of ≥6 bp. Of the 30,000 computer-simulated integrations, 1,327 labeled a palindrome with an arm length of ≥11 bp. We categorized DNA palindromes based on their arm length and compared the palindrome-labeling frequency in each category between the experimental and simulated groups. This analysis unambiguously revealed that 1-kb regions containing DNA palindromes are indeed hot spots for rAAV vector integration in the liver, skeletal muscle, and heart (Fig. 3). For example, in SCID mouse heart, 30% of all rAAV integrations were in the vicinity of DNA palindromes with an arm length of ≥20 bp, whereas a 1.3% frequency was observed in the random simulation (Fig. 3J). The observation holds for both wild-type mice and DNA-PKcs-deficient SCID mice (Fig. 3C, D, E, F, H, I, and J). The effect was diminished in a sample for which a high-vector-genome load in cells had been measured at the time tissue was harvested (Fig. 3B and Table 2). Notably, preferential labeling of palindromes was not observed in our previous study (36, 40) (Fig. 3K and L).
FIG. 3. Frequency of palindrome labeling by rAAV integration compared to that of labeling by random integration. Palindromes at or near rAAV and random integration sites are categorized based on their arm length. (A to L) Frequency of palindrome labeling as a percentage of total integrations is plotted for each size category. Experimental variables are given above each graph: rAAV vector serotype (AAV2 or AAV8)/mouse strain (B6, C57BL/6J; Sc, SCID; HTI, hereditary tyrosinemia type I mouse)/tissue type (Lv, liver; M, skeletal muscle; H, heart)/vector dose (hi, high dose; lo, low dose; see Tables 1 and 2). The number in parentheses in each panel indicates the total number of rAAV integration sites analyzed in each group. For panels A to J, 3' rAAV integration sites were isolated without selection, while integration sites were isolated after in vivo selection for panels K and L, in which 5' and 3' rAAV integration sites are separately displayed. (M) Palindrome labeling frequency data from panels A to J are combined and plotted as a function of palindrome arm length. For this analysis, labeling frequency of palindromes with the same arm length is compared between rAAV and random integrations. For palindromes with an arm length of ≥11 bp, points represent palindrome arm lengths in increments of 2 bp (i.e., 11 to 12 bp, 13 to 14 bp, and so on). Asterisks and solid triangles indicate statistical significance compared to results for random integrations (P < 0.001 [two asterisks] and 0.001 ≤ P < 0.01 [one asterisk] by χ² test; P < 0.001 [closed triangle] by Fisher's exact test).
DNA palindromes with an arm length of ≥20 bp (total length, ≥40 bp) have a biological impact on rAAV integration. The 369 rAAV-labeled DNA palindromes with an arm length range between 6 and 72 bp and 1,645 random integration-labeled palindromes with an arm length range between 6 and 86 bp were compared to determine the minimum length of palindromes with biological significance. In the analysis, rAAV- and simulation-labeled palindromes were sorted according to arm length, and each set was displayed in plots showing the labeling frequency as a function of arm length. The difference became significant at a palindrome arm length of 19 to 20 bp or more (Fig. 3M). Thus, we conclude that naturally occurring palindromes as short as ~40 bp (arm length of ~20 bp) have an impact in vivo, attracting rAAV genomes for their integration. This is a significant finding, in that it demonstrates that DNA palindromes much shorter than well-characterized unstable AT-rich palindromes with an arm length of 150 to 300 bp (18) can have some impact in vivo.
DNA palindromes with an arm length of ≥20 bp are in fact the primary target for integration. The aforementioned observation that rAAV integration occurred preferentially near DNA palindromes per se does not necessarily indicate that rAAV integrated into DNA palindromes themselves. It might be possible that DNA palindromes merely attracted rAAV vector genomes by some mechanism and allowed them to integrate in their vicinity but not necessarily into DNA palindromes. To investigate which is likely the case, we analyzed the positional relationship between rAAV integration sites and the nearby labeled palindromes. For this, the location of each of the 369 rAAV-labeled palindromes was scored within the defined 1-kb region extending ±500 bp of each integration site. The histogram collating the palindrome locations for the rAAV integration site collection showed a unimodal distribution pattern with a peak corresponding to the rAAV integration site (Fig. 4A). Importantly, this distribution pattern is primarily attributed to that of 134 DNA palindromes with arm lengths of ≥20 bp (Fig. 4B) and not to 235 of those with arm lengths of ≤19 bp (Fig. 4C). Twenty-six of 134 rAAV-labeled palindromes with arm lengths of ≥20 bp had rAAV integration within the palindromes. No specific pattern was observed for 399 similarly analyzed randomly labeled palindromes with arm lengths of ≥20 bp (data not shown). This observation strongly indicates that DNA palindromes themselves were in fact the primary target for rAAV integration. rAAV-cellular DNA junctions found near but outside the DNA palindromes most likely resulted from deletions of host chromosomal DNA associated with integrations (40) that would be within the DNA palindrome if no cellular DNA sequence deletions had occurred. In addition, the contrasting positional distribution patterns between palindromes with arm lengths of ≥20 and ≤19 bp give an independent confirmation, along with the statistical evidence shown in Fig. 3M, that a biological impact becomes apparent only once palindromes have a length of roughly 40 bp.
FIG. 4. Positional relationship between rAAV integration sites and DNA palindromes (pal.). The positions of rAAV-labeled DNA palindromes are displayed relative to their associated rAAV integration sites. A 1-kb sequence window represents ±500 bp centromeric (minus) and telomeric (plus) to each rAAV integration site (centered at the 0-bp position). The histogram gives the number of palindromes located within each 50-bp increment relative to the rAAV integration sites (i.e., positions –500 to –451, –450 to –401, and so on). An exception is the 51-bp center window from positions –50 to 0, which includes an rAAV integration site. (A) rAAV-labeled DNA palindromes with an arm length of ≥6 bp (369 palindromes in all). (B) rAAV-labeled DNA palindromes with an arm length of ≥20 bp (134 in all). (C) rAAV-labeled DNA palindromes with an arm length between 6 and 19 bp (235 in all).
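The binning scheme described in the legend can be illustrated with a short sketch. This is not the authors' analysis code; it only assumes that each palindrome is reduced to a single signed offset (in base pairs) from its integration site, and the function names `bin_index`, `bin_label`, and `histogram` are hypothetical.

```python
from collections import Counter

def bin_index(offset: int) -> int:
    """Map a signed offset (bp) from the integration site to one of 20 bins.

    Bins 0-8 cover -500..-51 in 50-bp steps, bin 9 is the 51-bp center
    window (-50..0), and bins 10-19 cover +1..+500 in 50-bp steps.
    """
    if not -500 <= offset <= 500:
        raise ValueError("offset must lie within the 1-kb window")
    if offset <= 0:
        # Offset 0 would fall into bin 10; clamp it into the center window.
        return min((offset + 500) // 50, 9)
    return 10 + (offset - 1) // 50

def bin_label(index: int) -> str:
    """Return the position range covered by a bin, e.g. '-500 to -451'."""
    if index < 9:
        lo = -500 + 50 * index
        return f"{lo} to {lo + 49}"
    if index == 9:
        return "-50 to 0"
    lo = 1 + 50 * (index - 10)
    return f"{lo} to {lo + 49}"

def histogram(offsets):
    """Count offsets per bin; returns a list of 20 counts."""
    counts = Counter(bin_index(o) for o in offsets)
    return [counts.get(i, 0) for i in range(20)]

# Made-up offsets, for illustration only (not data from the study).
example_offsets = [-500, -30, 0, 12, 500]
for i, count in enumerate(histogram(example_offsets)):
    if count:
        print(bin_label(i), count)
```

With real data, a peak in the center bin would correspond to the unimodal pattern described for Fig. 4A and B.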
≥20 bp have special attributes. By mapping the locations of the 134 rAAV integration events that labeled DNA palindromes with an arm length of ≥20 bp within the 1-kb window, it became clear that rAAV integrations were almost exclusively center-deleted events (Fig. 5C). Even more notable was the strong enhancement of integrations at the positions closest to (and before) the palindrome symmetry center (Fig. 5C). Neither trend was observed when the 235 rAAV-labeled palindromes with an arm length of ≤19 bp were displayed (Fig. 5D). One may argue that palindromic sequences were deleted in bacteria, resulting in an artifactual bias toward center-deleted integration. Although we cannot totally exclude this possibility for all the integration events presented here, a majority of palindrome-labeling integration events should correctly represent center-deleted or center-retained events, based on the observations presented in the last section of Results.
FIG. 5. Patterns of palindrome (Pal.) center retention/deletion associated with rAAV integration. (A) Definition of center-deleted and center-retained rAAV integration. (B) Percentage of center-deleted rAAV integrations of total rAAV integrations is plotted as a function of palindrome arm length and compared to that for random integration. To increase the power of analysis, palindromes with different arm lengths are combined in the following manner: 9 to 10 bp (shown as 10 in the figure), 11 to 14 bp (14), 15 to 18 bp (18), 19 to 22 bp (22), 23 to 26 bp (26), 27 to 28 bp (28), 29 to 30 bp (30), and 31 bp or more (≥31). We compared the observed frequency to that of 20 independently generated random integration data sets (see Materials and Methods) by the χ² test or Fisher's exact test. Solid triangles indicate statistical significance (P < 0.05; Fisher's exact test) for all 20 random data set comparisons. Three representative random integration data sets are included. (C and D) Positional relationship between center-deleted or center-retained rAAV integrations and the palindrome symmetry center. rAAV integration sites are shown as histograms, giving their locations in 50-bp increments across a 1-kb region centered on the palindrome symmetry center. An exception is the 51-bp center window from positions –50 to 0, which includes the palindrome symmetry center. Center-deleted or center-retained rAAV integrations are displayed in the left (clear background) or right (gray background) portion of the 1-kb window as labeled. (C) rAAV-labeled DNA palindromes with an arm length of ≥20 bp (134 palindromes in all). (D) rAAV-labeled DNA palindromes with an arm length between 6 and 19 bp (235 in all).
≥20 bp. The analysis revealed that rAAV-labeled palindromes have a significantly higher proportion of (AT)n palindromes than random integration-labeled palindromes (Fig. 6A). This demonstrates that (AT)n palindromes are a significant type of DNA palindrome that constitutes a target for rAAV vector integration.
FIG. 6. Significance of (AT)n and (AC)n (GT)n palindromes in rAAV-labeled palindromes susceptible to breakage. (A) Percentage of rAAV-labeled (AT)n palindromes among the 134 rAAV-labeled palindromes with an arm length of ≥20 bp is plotted as a function of n. Random integration-labeled palindromes are likewise plotted. Asterisks and solid triangles indicate statistical significance between rAAV and random integrations (*, P < 0.001 by χ² test; closed triangle, P < 0.001 by Fisher's exact test). (B) DNA sequences of six (AC)n (GT)n palindromes (numbered 1 to 6) labeled by rAAV integration. Black arrowheads indicate the positions at which joining to the rAAV vector genome occurred. Uppercase letters denote the portion of the palindrome sequence that was retained in each junction, while lowercase letters denote the nonretained portion. Underlined nucleotides are central spacer sequences. All six (AC)n (GT)n palindromes show a center-deleted pattern. Two representative (AT)n palindromes are shown as 7 and 8 for comparison. Coordinates of the palindromes numbered 1 to 8 (UCSC mm8) are chr 10: 38176562, chr 5: 118212140, chr 6: 7083048, chr 15: 29735040, chr 8: 120556025, chr 16: 69258562, chr X: 150903099, and chr 15: 3434443, respectively.
≥20 bp were composed of (AC)n on one arm and (GT)n on the other [i.e., (AC)n (GT)n palindromes] (Fig. 6B). Statistical analysis revealed that the rAAV labeling frequency of this type of non-AT-rich palindrome with an arm length of ≥20 bp (6 of 941) was significantly higher than the random labeling frequency (27 of 30,000) (P < 0.0001; χ² test). In addition, all of the (AC)n (GT)n palindromes were labeled in a center-deleted pattern (Fig. 6B). Thus, not only AT-rich palindromes but also other types of palindromes constitute targets for rAAV vector integration.
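As a rough check of this comparison, the two labeling frequencies (6 of 941 versus 27 of 30,000) can be placed in a 2 × 2 contingency table and tested with a Pearson χ² statistic. The sketch below is an illustration only: the text does not state whether a continuity correction was applied, so the uncorrected statistic is an assumption, and the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are the two groups; columns are labeled / not labeled.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    observed = [
        [hits_a, total_a - hits_a],
        [hits_b, total_b - hits_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [hits_a + hits_b, (total_a - hits_a) + (total_b - hits_b)]
    grand_total = total_a + total_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts quoted in the text: rAAV labeling vs. random labeling.
chi2, p = chi_square_2x2(6, 941, 27, 30_000)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

Under these assumptions the P value falls below the 0.0001 threshold reported in the text.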
Nonpalindromic dinucleotides are not rAAV integration targets.
An important question to address in the present case, given that rAAV-labeled palindromes were predominantly (AT)n palindromes (Fig. 6A), is whether it is the palindromic nature or the simple dinucleotide repeat nature of these sequences that creates a platform for rAAV integration. To address this, we took all possible dinucleotide repeats [i.e., (AT)n or (TA)n, (CG)n or (GC)n, (AC)n or (GT)n, and (AG)n or (CT)n] and measured their labeling frequency in the experimental or randomly assigned data sets. Here, labeling frequency is the frequency of rAAV integration events at or near dinucleotide repeat tracts of interest (i.e., whether or not at least one dinucleotide repeat tract is found within 500 bp of each integration site). The rAAV labeling frequency was significantly higher than that of random labeling only for (AT)n (Fig. 7A). This was not the case for the nonpalindromic dinucleotide repeats (AC or GT)n or (AG or CT)n (Fig. 7B and C). (CG)n was rarely found in experimental and simulation data sets. It should be noted that when we counted all (AC or GT)n (n ≥ 6) repeats in the integration-labeled 1-kb sequence windows, we observed a significant difference in the prevalence of (AC or GT)n (n ≥ 6) between the 134 rAAV integrations labeling palindromes with an arm length of ≥20 bp and 1,000 random integrations. We found 71 (AC or GT)n repeats (n ≥ 6) in the 134 1-kb sequence windows containing rAAV-labeled palindromes. This frequency (71/134) was approximately threefold higher than that expected from random integration (181/1,000), although the (AC or GT)n labeling frequency was not significantly different between the experimental and simulation groups (Fig. 7B). This observation was in accord with our finding that 18 of the 71 (AC or GT)n repeats (n ≥ 6) were derived from (AC)n (GT)n palindromes or (AC or GT)n dinucleotide repeats occasionally commingled as inverted repeats in the left and right arms of (AT)n palindromes. Thus, only palindromic (AT)n and paired (AC)n and (GT)n are the significant types of dinucleotide repeats that create a platform for preferential rAAV integration.
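The distinction drawn here can be made concrete with a small sketch. An (AT)n tract and a paired (AC)n (GT)n tract are each identical to their own reverse complement (and hence palindromic), whereas (AC)n alone is not. The second half of the sketch shows one way a labeling frequency could be computed. It is not the authors' pipeline; the helper names, the exact-repeat regular expression, and the toy windows are all assumptions for illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def is_perfect_palindrome(seq: str) -> bool:
    """A DNA palindrome (no spacer) equals its own reverse complement."""
    return seq == reverse_complement(seq)

def has_repeat_tract(window: str, units, min_n: int = 6) -> bool:
    """True if the window contains >= min_n tandem copies of any unit."""
    pattern = "|".join(f"(?:{u}){{{min_n},}}" for u in units)
    return re.search(pattern, window) is not None

def labeling_frequency(windows, units, min_n: int = 6) -> float:
    """Fraction of sequence windows containing at least one repeat tract."""
    labeled = sum(has_repeat_tract(w, units, min_n) for w in windows)
    return labeled / len(windows)

print(is_perfect_palindrome("AT" * 10))            # (AT)n
print(is_perfect_palindrome("AC" * 10))            # (AC)n alone
print(is_perfect_palindrome("AC" * 5 + "GT" * 5))  # (AC)n (GT)n

# Toy windows, for illustration only (not data from the study).
toy_windows = ["GG" + "AC" * 6 + "TT", "ACGTTGCA", "TT" + "GT" * 8]
print(labeling_frequency(toy_windows, ["AC", "GT"]))
```

Grouping (AC)n with (GT)n mirrors the text: the two are the same repeat read from opposite strands.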
FIG. 7. Analyses of dinucleotide labeling by rAAV integration. Relative frequency of rAAV-labeling of the four possible types of dinucleotide repeats among total integration events is plotted as a function of n (n is the number of dinucleotide repeats). For comparison, the frequency of random integration labeling is also plotted. The four types of dinucleotide repeats are (AT)n (A), (AC or GT)n (B), (AG or CT)n (C), and (CG)n (D).
≥20 bp in the mouse genome form a platform for rAAV integration.
The 134 rAAV-labeled DNA palindromes with arm lengths of ≥20 bp were found throughout the genome (Fig. 8), with approximately half of them residing in genic regions (Table 2). It was of interest to address whether all or only a portion of DNA palindromes with an arm length of ≥20 bp were susceptible to breakage, serving as a platform for rAAV integration. To investigate this, we took advantage of the observation that certain DNA palindromes were recurrently labeled by rAAV, and such palindromes were predominantly (AT)n (n ≥ 20) palindromes (Table 3). In the mouse genome database, we identified 10,659 (AT)n (n ≥ 20) palindromes. There were 119 (AT)n (n ≥ 20) palindromes labeled by rAAV, and of these, 9 palindromes were labeled twice by independent integration events (Table 3). According to a random model, the probability that a palindrome is labeled n times in the sample size of 119 follows the equation $P(n) = {}_{119}C_{n}\,(1/10{,}659)^{n}\,(10{,}658/10{,}659)^{119-n}$ [where P is probability and ${}_{n}C_{r} = n!/[(n-r)!\,r!]$, where ${}_{n}C_{r}$ is the combination symbol and ! is factorial]; i.e., $P(0) = 9.89 \times 10^{-1}$, $P(1) = 1.10 \times 10^{-2}$, and $P(2) = 6.11 \times 10^{-5}$. Recurrent rAAV labeling of the same palindrome in this sample size is not accommodated in a random model (P < 0.0001; χ² test). Thus, DNA palindromes susceptible to breakage constitute a special subset of all (AT)n (n ≥ 20) palindromes.
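The binomial probabilities quoted above can be reproduced with a few lines of code. This is a minimal sketch of the random model as stated in the text (119 labeling events distributed at random over 10,659 candidate palindromes); the function name `p_labeled` is hypothetical, and the final expected-count line is an added illustration rather than a figure from the study.

```python
from math import comb

N_PALINDROMES = 10_659  # (AT)n (n >= 20) palindromes in the mouse genome
N_EVENTS = 119          # sample size used in the text

def p_labeled(n: int, events: int = N_EVENTS, targets: int = N_PALINDROMES) -> float:
    """Binomial probability that one given palindrome is labeled n times."""
    p = 1 / targets
    return comb(events, n) * p**n * (1 - p) ** (events - n)

for n in range(3):
    print(f"P({n}) = {p_labeled(n):.2e}")
# P(0) = 9.89e-01
# P(1) = 1.10e-02
# P(2) = 6.11e-05

# Illustration: expected number of palindromes labeled exactly twice
# under the random model, to set against the 9 observed.
expected_double = N_PALINDROMES * p_labeled(2)
print(f"expected twice-labeled palindromes: {expected_double:.2f}")
```

Under the random model, fewer than one palindrome would be expected to be labeled twice, which is the intuition behind the reported departure from randomness.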
FIG. 8. Distribution of break-prone palindromes in the mouse genome. A total of 134 rAAV-labeled palindromes with an arm length of ≥20 bp are mapped on a normal mouse karyotype. Symbols indicate center-deleted integration in (white circles) or near (black circles) palindromes and center-retained integration in (white squares) or near (gray squares) palindromes, respectively. There was no rAAV-labeled break-prone palindrome on the Y chromosome in the present study.
≥20 bp were labeled in wild-type mice to that in SCID mice. We compared tissues for which over 60 integration sites were identified in both the wild-type mouse and SCID mouse groups, and no significant difference in the frequency of palindrome labeling was found between the two mouse strains in the sample size of our analysis (Table 4). Although further analyses may be required, DNA-PKcs does not appear to be required for preferential rAAV integration at DNA palindromes. Since we often found microhomology of up to 8 bp at rAAV integration sites in both DNA-PKcs-proficient and -deficient mice, a DNA-PK-independent, microhomology-dependent alternative end-joining pathway(s) (13, 26) might be operative in rAAV integration.
TABLE 4. Frequency of rAAV labeling of DNA palindromes with an arm length of ≥20 bp in the presence or absence of DNA-PKcs activity
EcoRI and HindIII double digestion of plasmid DNA containing the DNA palindrome with an arm length of 47 bp (chr 15: 99702254) recovered from bacteria consistently showed a deletion of approximately 60 bp by 2% agarose gel electrophoresis. Sequencing analysis of a plasmid clone revealed a 64-bp deletion within the 92-bp central AT dinucleotide repeat tract [(AT)n; n = 46] in the DNA palindrome. In the other three DNA palindromes with an arm length range of 28 to 33 bp (coordinates chr 10: 98632057, chr 11: 44726916, and chr 19: 11606158), no apparent deletion was detected by EcoRI and HindIII double digestion followed by 2% agarose gel electrophoresis. Sequencing analysis of these three DNA palindromes cloned in bacteria indicated that the palindromes had been relatively stably maintained in bacteria, although complete sequencing of entire PCR products cloned in plasmids often was not successful due to the presence of a long stretch of AT dinucleotide repeats, which significantly reduced sequence signals in electropherograms beyond the AT dinucleotide repeats. Fluctuations in the lengths of PCR-amplified AT dinucleotide repeats within the DNA palindromes (–6 bp to +12 bp for deletions and additions, respectively) were observed depending on the plasmid clones sequenced.
To further assess the stability of DNA palindromes at coordinates chr 10: 98632057, chr 11: 44726916, and chr 19: 11606158, transformed bacteria from a single colony were replated on LB agar plates, and plasmid DNA recovered from 18 colonies representing the progeny of a single colony was analyzed by EcoRI and HindIII double digestion followed by 2% agarose gel electrophoresis. Three of 18 colonies for palindrome chr 10: 98632057 carried plasmids with an approximately 20-bp deletion; 2 of 18 colonies for palindrome chr 11: 44726916 carried plasmids with an approximately 50-bp deletion; and none of 18 colonies for palindrome chr 19: 11606158 had plasmids with a deletion. Although we observed deletions in some clones, the results indicate that plasmid DNA molecules with a deletion should constitute only a minor fraction of total plasmid DNA molecules harboring each of the three DNA palindromes.
≥40 bp) could have an impact in vivo and serve as a platform for preferential rAAV integration provides significant insights into the mechanism of rAAV integration and the stability of this DNA motif in the genome of quiescent somatic cells in animals. Miller et al. have recently demonstrated that rAAV vectors integrate at DNA double-strand breaks created by I-SceI digestion in cultured cells. Based on their experimental observations, they proposed that rAAV vectors do not cause chromosomal breaks and integrate at preexisting chromosomal breakage sites (30). Although further studies may be required to determine whether rAAV vectors integrate only at preexisting DNA breaks or whether they create DNA breaks to make a platform for integration, our observations are at least consistent with the notion that rAAV integrates at fragile sites in the host chromosomal DNA.
It is important to understand how rAAV vectors integrate at DNA palindromes. A body of evidence from studies of rare cases of DNA palindrome-associated de novo human chromosome translocations (18), central rearrangement of exogenously introduced genomic DNA palindromes in mice (1, 4, 22, 23), and in vitro studies of AT-rich palindromes isolated from translocation sites (19), as well as studies in yeast (25, 33, 42), has indicated that palindromes form a hairpin or cruciform structure, and then palindrome resolution and recombination follow, resulting in genomic alteration (reviewed in references 18 and 23). In the present study, the 134 rAAV integration events in the vicinity of DNA palindromes (arm length, ≥20 bp), including 26 rAAV integrations that occurred within DNA palindromes, exhibited exclusively the center-deleted integration pattern (Fig. 5C). Although it may be argued that such a significant bias resulted from deletions of DNA palindromes in bacteria, most DNA palindromes in the study were in a modest size range that is generally not grossly unstable upon cloning in E. coli (21). In fact, as far as we could determine by analyzing the stability of four DNA palindromes in DH10B E. coli (palindrome coordinates chr 10: 98632057, chr 11: 44726916, chr 15: 99702254, and chr 19: 11606158 in Table 3), palindromes with an arm length range of at least 28 to 33 bp could be relatively stably maintained in this strain of bacteria. Only one palindrome with a longer arm length of 47 bp (total length of 94 bp; coordinate chr 15: 99702254) consistently exhibited an approximately 60-bp deletion within the central AT dinucleotide repeats. It should be noted that even when we focus on only the 93 rAAV-labeled palindromes with arm lengths of ≤33 bp, only 5 of them showed the center-retained integration pattern. All of the above results indicate that DNA palindromes were the primary targets for rAAV integration, and various degrees of chromosomal DNA deletions occurred at integration, resulting in exclusively center-deleted integrations. Based on these observations, we propose a model for how rAAV might integrate at DNA palindromes (Fig. 9). In this model, DNA palindromes form double-stranded cruciforms by torsional stress or single-stranded hairpins by strand slippage or misalignment. Double-stranded or single-stranded breaks then are created by not-yet-defined mechanisms, forming a platform for rAAV vector integration.
FIG. 9. Proposed model for rAAV integration at palindromes in either a cruciform or hairpin structure. In this model, hairpin loops of palindromic AAV-ITR and DNA palindromes on the genome are nicked, trimmed, and joined by cellular DNA repair mechanisms. Various degrees of host chromosomal DNA deletions occur at rAAV integration, resulting in exclusively center-deleted integrations.
It was a surprise that preferential rAAV integration at DNA palindromes was not revealed in our previous study, in which we investigated rAAV integrations in mouse liver using a hereditary tyrosinemia type I mouse model and in vivo selection (Fig. 3K and L) (40). This trend also was not observed in another large-scale rAAV integration site study by Miller et al., in which rAAV integration events were collected from dividing human cells infected with rAAV in vitro under no selective pressure (32). Since Miller et al. did not specifically focus on DNA palindromes in their analysis, we reanalyzed 815 rAAV integration sites reported by them (405 and 410 of the left and right sides of rAAV-host cellular DNA junctions in human cells, respectively, from data made available online; see Materials and Methods) in the same way as we did for the present study and found no evidence of preferential integration at DNA palindromes. At this point, it remains unclear why there were significant differences in the experimental observations among these three studies. There are a number of parameters that we need to consider to interpret the differences (i.e., mouse cells versus human cells, quiescent cells versus proliferating cells, in vivo assay versus in vitro assay, selection versus no selection, differences in the vector construct and plasmid rescue procedures, and so on). Although these possibilities should be investigated further in future experiments, a tempting and straightforward inference is that entering the cell cycle will erase palindrome labeling. One key difference among the three studies is the proliferation status of the sampled cells. It has been reported that DNA breaks can persist as unrepaired DNA lesions in nondividing cells both in vitro (47) and in animal tissues in vivo (50). 
The palindrome-labeling events may thus create a DNA lesion that is eradicated through repair or apoptosis if the cell starts to proliferate but may remain unrepaired if the cell remains quiescent. Once a cell with unrepaired DNA lesions with rAAV adducts enters the cell cycle, the unbroken sister chromatid will become available in late S and G2 phases, in which the cell may use homologous recombination to eliminate any rAAV adducts. If the cell enters the cell cycle and fails to repair the DNA lesions with rAAV adducts, it would die by apoptosis, thus eliminating the cell itself. Alternatively, the quiescent state of cells studied might create an environment in which DNA palindromes become more prone to breakage.
In summary, our demonstration that DNA palindromes are a significant target for rAAV vector integration provides significant new insights into the mechanisms of rAAV integration in vivo and the biological impact of DNA palindromes on the mouse genome. Further investigation might productively focus on how and whether rAAV evokes DNA repair machinery directed toward little-understood processes involved in the metabolism of DNA hairpin structures in both rAAV genomes (i.e., inverted terminal repeats) and cellular genomes. The demonstration that rAAV preferentially integrates into DNA palindromes and presumably can label a break-prone subset of DNA palindromes is a beginning. Further studies on the interactions between rAAV vector genome, host cellular DNA, and cellular DNA repair machinery will not only provide clues on how to improve the current rAAV systems for human gene therapy but will also suggest new opportunities to explore the intrinsic sources of genome instability, palindrome metabolism, carcinogenesis, and aging.
This work was supported by Public Health Service grants DK68636 and DK78388 (to H.N.) and HL64274 (to M.A.K.) from the National Institutes of Health, a Career Development Award from the National Hemophilia Foundation (to H.N.), and at least in part by the National Cancer Institute, DHHS, under contract N01-CO-12400 with SAIC-Frederick, Inc.
The contents of this publication do not necessarily reflect the views or policies of the DHHS, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government.
Published ahead of print on 8 August 2007.