Journal of Virology, September 2005, p. 11434-11442, Vol. 79, No. 17
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.17.11434-11442.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Lisa M. Petek,2
Michael A. Jacobs,3
Rajinder Kaul,3 and
David W. Russell2,4*
Department of Pediatrics, Division of Genetics and Developmental Medicine,1 Department of Medicine, Divisions of Hematology,2 Medical Genetics,3 Department of Biochemistry, University of Washington, Seattle, Washington4
Received 17 February 2005/ Accepted 26 May 2005
Vectors based on adeno-associated virus (AAV) have a linear, single-stranded DNA genome containing a transgene cassette flanked by viral inverted terminal repeats (ITRs). Transduction occurs by multiple pathways, including integration into the host genome (24), expression from linear and circular episomal forms (6, 30), and homologous recombination with chromosomal sequences (37). Definitive evidence for integration has come from sequencing vector-chromosome junctions recovered from human cells (25, 26, 38, 47) and mouse tissues (28, 29), which demonstrated that AAV vectors integrate at nonhomologous chromosomal locations.
Chromosomal sequences surrounding vector proviruses can be deleted or rearranged (25, 26, 29), although it is not known if the AAV vector causes these changes. While integration may occur in only a subset of transduced cells, the large vector doses infused during in vivo gene delivery can lead to substantial numbers of integration events. In mouse models of liver transduction, integrated AAV vector genomes were present at roughly 0.05 copies/cell (32), a value that would correspond to 7.5 × 10⁹ integration events in humans undergoing liver-directed gene therapy with AAV vectors (assuming 1.5 × 10¹¹ cells/liver) (44). The potential consequences of billions of integration events are largely unknown.
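The estimate above is a single multiplication, which can be checked with a short Python sketch. The variable names are illustrative only; the two input values are the ones quoted in the text.

```python
# Figures quoted in the text above.
copies_per_cell = 0.05    # integrated vector genomes per cell (mouse liver)
cells_per_liver = 1.5e11  # assumed number of cells in a human liver

# Projected number of integration events in one treated human liver.
integration_events = copies_per_cell * cells_per_liver
print(f"{integration_events:.1e}")  # 7.5e+09
```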
Integration requires that the linear AAV vector genome ligate to two chromosomal ends. Unlike retroviral vectors, AAV vectors do not contain an endonuclease to generate chromosomal ends, so they must rely on existing double-strand breaks or nicks. The deletions, insertions, and microhomologies found at AAV vector-chromosome junctions suggest that integration occurs by the nonhomologous end-joining pathway of double-strand break repair (26), and AAV vectors will integrate at a specific double-strand break when it is created in human cells (25). This dependency on host cell factors and chromosomal features allows us to interpret AAV vector integration sites as chromosomal repair events tagged by a provirus.
Here we have performed a large-scale analysis of AAV vector integration sites in normal human cells in the absence of selective pressure, including their relationship to genes, repetitive DNAs, and other chromosomal features. We have characterized the deletions, insertions, and translocations associated with AAV vector integration and the structures of vector proviruses and identified several integration hotspots. Our results establish the profile of insertional mutagenesis associated with AAV vectors, and they suggest that similar integration studies may be a valuable tool for understanding chromosome biology.
Cell culture. Cells were grown at 37°C in 5% CO2 in Dulbecco's modified Eagle's medium containing 4 g of glucose/liter (Gibco/Invitrogen, Carlsbad, CA), 10% heat-inactivated fetal bovine serum, penicillin, and streptomycin. Primary, normal male human fibroblasts (MHF2) were obtained from the Coriell Institute for Medical Research (Camden, NJ; catalog no. GM05387). 293T cells have been described (7). MHF2 cells were transduced with the AAV2-TOA vector by seeding 6-cm tissue culture dishes with 5 × 10⁵ cells on day 0, replacing the medium and infecting with 2.5 × 10¹⁰ genome-containing particles of AAV2-TOA on day 1 (5 × 10⁴ genome-containing particles per cell), and replacing the medium again on day 3. On day 6 cells were detached with trypsin and seeded to one 15-cm dish. On day 10 the cells in a single 15-cm dish were split to three 15-cm dishes. On day 14 genomic DNA was prepared from the 15-cm dishes, except for 3 × 10⁶ cells that were used to seed another set of three 15-cm dishes. This process was repeated every 6 days, and the majority of proviruses were isolated from DNA prepared on the fourth round of this procedure.
Vector preparation. The serotype 2 AAV vector AAV2-TOA was made by cotransfection of 293T cells with helper plasmid pDG and vector plasmid pA2TOA and purified by benzonase treatment of cell lysates, iodixanol step gradient, heparin affinity column chromatography (HiTrap, Amersham Biosciences, Uppsala Sweden), and HiTrap desalting column as described (48). AAV vector quantification was based on the amount of full-length single-stranded vector genomes detected by alkaline Southern blot analysis (18).
Shuttle vector rescue in bacteria. Rescue of AAV2-TOA proviruses was done as described (25) with the following modifications: 20 µg of genomic DNA containing integrated proviruses was digested with 80 units of MfeI, AvrII, or NcoI, extracted with phenol and chloroform, and precipitated with ethanol. DNA fragments were resuspended in 355 µl of H2O and brought to 400 µl with 40 µl of 10x ligation buffer and 5 µl of T4 DNA ligase (400 U/µl, New England Biolabs, Beverly, MA). Ligations were incubated at 15°C overnight to circularize fragments, heat inactivated by incubation at 65°C for 20 min, brought to 50 mM NaCl, digested further with 80 units of DpnI for an additional 2 h to remove bacterial DNA, extracted with phenol and chloroform, and precipitated with ethanol.
The DNA pellets were resuspended in 5 µl of H2O, and Escherichia coli strain DH10B (15) was transformed by electroporation with 4 µg (1 µl) of DNA at a time. Transformed bacteria were grown on agar containing 50 µg/ml ampicillin, and colonies were replated to agar containing 12.5 µg/ml tetracycline. Bacteria resistant to both ampicillin and tetracycline were grown in 96-well culture dishes in freezing medium and then frozen at -80°C for future sequencing. Freezing medium contains 10 g tryptone, 5 g yeast extract, 10 g NaCl, 6.3 g K2HPO4, 1.8 g KH2PO4, 0.5 g sodium citrate, 0.9 g (NH4)2SO4, and 44 ml glycerol per liter of H2O, brought to 10 µM MgSO4 and supplemented with ampicillin after autoclaving.
Microarray analysis of gene expression levels. We seeded 5 × 10⁵ MHF2 cells in six 6-cm dishes on day 0. On day 1 fresh medium was added to the dishes, and three dishes received 2.5 × 10¹⁰ genome-containing particles of AAV2-TOA. On day 3 RNA was harvested from confluent 6-cm dishes and all six samples were processed independently. Labeling of 5 µg of total RNA was performed as described by Affymetrix (Santa Clara, CA); 15 µg of cRNA was used per Affymetrix HG-U133 Plus 2.0 array, which analyzes 47,400 transcripts and variants. Only the subset of probes that identify specific RefSeq gene transcripts (13,069) was used in our analysis. Probe sets that hybridized with more than one gene were excluded. Gene expression levels from all three uninfected cell samples were averaged and compared to those from all three infected cell samples. Where multiple probe sets reflect the transcription level of a single RefSeq gene, the average transcription level was used in rankings.
Database searches and comparisons with genomic features. DNA sequences were processed with programs written in PERL. Sequences were truncated at bp 500, and expected vector-derived sequences were trimmed. The resulting junction sequences were aligned to build 35 of the human genome and to three additional files containing AAV2-TOA vector sequence, nonvector sequence from plasmid pA2TOA, and the 43-kb human ribosomal DNA (rDNA) repeat (GenBank accession no. U13369) (10), using a stand-alone version of BLAT (21) that generates a BLAST alignment score.
The BLAT command was as follows: blat chromosome_file query_file -out=blast8 -ooc=11.ooc output_file. An additional 95% homology requirement and BLAST score of >100 were used to establish genomic positions. Alignments were sorted by BLAST score, and those with the five highest scores were saved for further processing. The average match length for all sequences was 383 bp. Nucleotide insertions were defined as sequence preceding the alignment with the highest BLAST score when the alignment did not start at position number 1 of the sequence query. Additional PERL programs were used to remove duplicate junction sequences, compare localized integration sites to various chromosomal features using tables available from the University of California-Santa Cruz database (20), and determine the positions of restriction enzyme sites in the human genome.
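The filtering rules described above can be sketched in Python as follows. This is not the PERL code used in the study; it is a minimal illustration that assumes the standard 12-column BLAST tabular (blast8) layout, and the function names and example rows are hypothetical.

```python
from collections import defaultdict

def parse_blast8(lines):
    """Group blast8 rows by query name (assumes the standard 12 tab-separated columns)."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits[f[0]].append({
            "subject": f[1],
            "identity": float(f[2]),  # percent identity
            "q_start": int(f[6]),     # 1-based start of the alignment in the query
            "s_start": int(f[8]),     # start of the alignment in the genome
            "score": float(f[11]),    # BLAST score
        })
    return hits

def best_hits(hits, min_identity=95.0, min_score=100.0, keep=5):
    """Apply the 95% and score >100 thresholds, then keep the five top-scoring alignments."""
    kept = [h for h in hits if h["identity"] >= min_identity and h["score"] > min_score]
    return sorted(kept, key=lambda h: h["score"], reverse=True)[:keep]

def insertion_length(top_hit):
    """Bases preceding the best alignment, i.e. the inserted sequence at the junction."""
    return top_hit["q_start"] - 1

# Hypothetical rows for illustration only (not data from the study).
example = [
    "clone1\tchr7\t99.2\t383\t3\t0\t13\t395\t72000100\t72000482\t1e-200\t710.0",
    "clone1\tchr3\t96.0\t200\t8\t0\t13\t212\t60000000\t60000199\t1e-90\t340.0",
    "clone1\tchr1\t91.0\t150\t13\t0\t20\t169\t5000\t5149\t1e-40\t160.0",
]
ranked = best_hits(parse_blast8(example)["clone1"])
print([h["subject"] for h in ranked], insertion_length(ranked[0]))
```

In this toy input the chr1 alignment is discarded by the identity threshold, and the best alignment starts at query position 13, so 12 bases are scored as an insertion.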
We produced a randomly localized set of genomic positions by generating random numbers between 1 and 5,941,037,819 (the size of the build 35 diploid male genome with chromosomes laid end to end) with the PERL "rand" function. The buffer size had to be increased from 15 to 31 bits to avoid generating duplicate numbers. These random numbers were converted to chromosomal positions by splitting the numeric range of the diploid genome into separate chromosomes with each starting at base pair 1 of the p arm and extending the entire length of the chromosome. These chromosomal positions were used to extract 383 bp of sequence from build 35 of the human genome at each randomly determined position, and the resulting files were aligned with the genome using BLAT as described above. About 7% of these extracted sequences corresponded to gapped or repetitive sequence in the human genome, could not be reliably localized, and were discarded. A set of 10,000 localized positions was used as a control data set (calculated random integration events) for comparison with AAV vector integration site positions. To analyze clustering and hotspots, we used similar sets of 499 and 670 random genomic positions as size-matched controls.
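The conversion from a genome-wide random number to a chromosomal position can be illustrated with a short Python sketch. It stands in for the PERL "rand" procedure rather than reproducing it, and the chromosome names and lengths below are toy values chosen for readability, not build 35 lengths.

```python
import bisect
import random
from itertools import accumulate

# Toy chromosome lengths (hypothetical, NOT build 35 values). For the diploid
# male genome the list would hold each autosome twice plus one X and one Y.
CHROM_LENGTHS = [("chrA", 1000), ("chrB", 600), ("chrC", 400)]

names = [name for name, _ in CHROM_LENGTHS]
# Cumulative end coordinate of each chromosome when laid end to end.
ends = list(accumulate(length for _, length in CHROM_LENGTHS))

def to_chrom_position(n):
    """Map a 1-based genome-wide coordinate to (chromosome, 1-based position)."""
    if not 1 <= n <= ends[-1]:
        raise ValueError("coordinate outside genome")
    i = bisect.bisect_left(ends, n)  # first chromosome whose end is >= n
    start_offset = ends[i - 1] if i > 0 else 0
    return names[i], n - start_offset

def random_positions(count, seed=0):
    """Draw `count` uniformly distributed positions across the whole genome."""
    rng = random.Random(seed)
    return [to_chrom_position(rng.randint(1, ends[-1])) for _ in range(count)]

print(to_chrom_position(1000), to_chrom_position(1001))  # ('chrA', 1000) ('chrB', 1)
```

Because positions are drawn uniformly over the concatenated genome, each chromosome receives random sites in proportion to its length, which is the property the control set relies on.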
To identify oncogenes, we searched several databases, including Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene), OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM), the Tumor Gene Database (http://www.tumor-gene.org/TGDB/tgdb.html), and the Retrovirus Tagged Cancer Gene Database (http://rtcgd.ncifcrf.gov/).
Statistical analysis. In all cases statistical significance was determined using the χ² test to compare AAV vector integration site frequencies with those of randomly generated genomic positions. P values were determined using tables, and those less than 0.01 were considered significant.
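As an illustration of this kind of comparison, the following Python sketch computes a Pearson chi-square statistic for a 2 x 2 table and compares it with the tabulated critical value for P = 0.01 at one degree of freedom (6.635). The exact form of the test used in the study is not specified, so the formula without continuity correction is an assumption, and the example counts are back-calculated from percentages reported elsewhere in this article (4.03% of 670 AAV sites and 0.84% of 10,000 random sites in CpG islands).

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator

CRITICAL_P01_DF1 = 6.635  # tabulated chi-square value for P = 0.01, 1 degree of freedom

# Counts back-calculated from reported percentages (an assumption, see lead-in).
aav_in, aav_total = 27, 670          # AAV sites inside CpG islands / all AAV sites
random_in, random_total = 84, 10000  # random sites inside CpG islands / all random sites

chi2 = chi_square_2x2(aav_in, aav_total - aav_in,
                      random_in, random_total - random_in)
print(round(chi2, 1), chi2 > CRITICAL_P01_DF1)
```

A statistic above the critical value corresponds to P < 0.01, the significance threshold stated above.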
TABLE 1. Summary of sequencing and localization
FIG. 1. Location of junction sites in the AAV vector proviruses. In the top panel, the nucleotide sequence of an AAV vector ITR in the flop orientation is shown numbered from 1 to 145 beginning at the 3' end. The locations of the most common vector-chromosome junctions are indicated by underlined, bold type. In the bottom panel, the percent of vector-chromosome junctions found at specific base pairs in the AAV ITR and adjoining vector sequence is shown. The percent found in three observed peaks is also indicated.
TABLE 2. Genomic features of integration sites
3 independent proviruses within 500 kb (Fig. 2B), may explain why some chromosomes had statistically significant increases in integration frequencies. Chromosomes 7 and 19 had more vector integrations than expected, with three and two hotspots, respectively. A list of all the hotspots meeting these criteria is shown in Table 3, and in many cases, the hotspot size was significantly less than 500 kb.
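A hotspot criterion of this kind (at least three independent proviruses within 500 kb) could be applied with a sliding window over sorted positions, as in the Python sketch below. This is an assumed procedure for illustration, not the authors' PERL implementation; the function name and example sites are hypothetical. Overlapping qualifying windows are merged, so a reported hotspot may span more than one window.

```python
from collections import defaultdict

def find_hotspots(sites, window=500_000, min_sites=3):
    """Return (chromosome, start, end) for regions where at least `min_sites`
    integration sites fall within `window` bp; overlapping windows are merged."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    hotspots = []
    for chrom, positions in sorted(by_chrom.items()):
        positions.sort()
        current = None  # [first_index, last_index] of the hotspot being built
        for i in range(len(positions) - min_sites + 1):
            j = i + min_sites - 1
            if positions[j] - positions[i] <= window:
                if current and i <= current[1]:
                    current[1] = j  # shares sites with the previous window: extend it
                else:
                    if current:
                        hotspots.append((chrom, positions[current[0]], positions[current[1]]))
                    current = [i, j]
        if current:
            hotspots.append((chrom, positions[current[0]], positions[current[1]]))
    return hotspots

# Hypothetical sites for illustration only.
example_sites = [("chr7", 100), ("chr7", 200_000), ("chr7", 450_000),
                 ("chr7", 5_000_000), ("chr1", 10), ("chr1", 900_000)]
print(find_hotspots(example_sites))  # [('chr7', 100, 450000)]
```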
FIG. 2. Chromosomal distribution of integration sites. (A) Localized AAV vector integration sites (n = 670) and a calculated set of random sites (n = 10,000) are graphed as a percentage of all integrants in each chromosome. Only one integration junction was included for AAV vector proviruses where both ends were localized to the same chromosome. Asterisks mark comparisons with P values of <0.01. (B) A human chromosome ideogram is shown with AAV vector integration sites (dots to the left of each chromosome) and hotspots where at least three integrants were found within 500 kb (boxed dots; see Table 3). Each dot represents a unique AAV vector integrant (n = 670) and is 33% opaque to display multiple overlapping integrants. Ribosomal DNA repeats present on the p arm of chromosomes 13, 14, 15, 21, and 22 contained a significant number of AAV vector integrants that are described separately in Fig. 3.
TABLE 3. AAV vector integration hotspots
FIG. 3. Clustering of AAV vector integration sites and localization within the human ribosomal DNA. (A) The distance between each unique AAV vector integration site identified with one sequencing primer (n = 499) and its nearest neighbor was determined and binned by size, and the percentage of proviruses within each bin was plotted. We used only left junctions in this analysis, to ensure that two different ends of the same provirus were not scored as neighbors. As a control, we performed a similar analysis on three size-matched sets of randomly distributed sites (n = 499), and plotted means with standard deviations. Significant differences (P < 0.01) are marked with asterisks. Each bar represents the number of clones with a nearest neighbor within the distance bounded by the values on the x axis. (B) Each AAV vector integration junction localized to the 43-kb ribosomal DNA repeat unit is shown as a circle above a scale diagram of the repeat. Solid circles connected by lines represent the junctions of proviruses where both ends were localized (n = 21). Open circles represent proviruses where only one end was localized (n = 37). The 13.3-kb transcribed region is drawn with open boxes, where the thick regions represent the 18S, 5.8S, and 28S rRNAs. The 29.7-kb intergenic spacer is depicted as a single bold line.
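The nearest-neighbor analysis in panel A can be sketched in Python as follows. The sketch assumes that neighbors are sought only on the same chromosome and that bins are half-open intervals; the bin edges and example sites are hypothetical, since the caption does not list them.

```python
from collections import defaultdict

def nearest_neighbor_distances(sites):
    """Distance from each site to the closest other site on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    distances = []
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            gaps = []
            if i > 0:
                gaps.append(pos - positions[i - 1])
            if i + 1 < len(positions):
                gaps.append(positions[i + 1] - pos)
            if gaps:  # a lone site on a chromosome has no neighbor
                distances.append(min(gaps))
    return distances

def bin_percentages(distances, edges):
    """Percentage of distances falling in each [edges[k], edges[k+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for d in distances:
        for k in range(len(edges) - 1):
            if edges[k] <= d < edges[k + 1]:
                counts[k] += 1
                break
    total = len(distances)
    return [100.0 * c / total if total else 0.0 for c in counts]

# Hypothetical sites and bin edges for illustration only.
example_sites = [("chr1", 100), ("chr1", 150), ("chr1", 1000), ("chr2", 5)]
example_distances = nearest_neighbor_distances(example_sites)
print(sorted(example_distances), bin_percentages(example_distances, [0, 100, 1000]))
```

Running the same two functions on size-matched random site sets gives the control distribution against which clustering is judged.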
106 bp (Fig. 4A) and 35% of junctions had insertions of DNA that also varied in size (Fig. 4B). Sixteen proviruses had left and right junctions where the best alignment scores were on different chromosomes (Table 4). As these represent possible chromosomal translocations, we used additional criteria to establish their validity. In 11 of 16 proviruses with mismatched ends (MM6 to MM16), the second-best BLAST score was >90% of the best score for at least one end, raising the possibility that there may have been a localization error due to sequence repeats. Many of these junctions mapped to pericentromeric chromosomal regions rich in alpha satellite DNA. The remaining five proviruses had BLAST scores for both ends that were significantly above those of other possible alignments, and three of these had scores over 500 for both ends (MM1 to MM3). Even if one conservatively assumes that only the three translocations meeting the most rigorous criteria are real, this represents nearly 1% of all vector integrations (3 of 323).
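The validity criteria applied to proviruses with mismatched ends can be expressed as a small decision rule. The Python sketch below is an illustration under stated assumptions: each junction is represented by the list of BLAST scores of its candidate alignments, and the category labels are hypothetical names, not terms from the study.

```python
def is_ambiguous(scores, ratio=0.90):
    """True if the second-best score is more than `ratio` of the best score."""
    ranked = sorted(scores, reverse=True)
    return len(ranked) > 1 and ranked[1] > ratio * ranked[0]

def classify_provirus(left_scores, right_scores, high_score=500):
    """Classify a provirus whose two junctions map to different chromosomes."""
    if is_ambiguous(left_scores) or is_ambiguous(right_scores):
        return "possible localization error"  # at least one end may be misplaced
    if max(left_scores) > high_score and max(right_scores) > high_score:
        return "most rigorous"                # both ends unambiguous with scores over 500
    return "unambiguous"

# Hypothetical score lists for illustration only.
print(classify_provirus([700, 300], [650]))       # most rigorous
print(classify_provirus([700, 680], [650]))       # possible localization error
print(classify_provirus([400, 100], [650, 200]))  # unambiguous
```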
FIG. 4. Deletions and insertions found at integration sites. (A) The sizes of chromosomal deletions associated with AAV vector proviruses (n = 212) were sorted into bins and the percentage of proviruses in each deletion size range bounded by values shown on the x axis was plotted. (B) The sizes of insertions found at AAV vector junctions (n = 416) were sorted into bins and the percentage of proviruses with insertions in each size range bounded by values shown on the x axis was plotted. Junctions linked to vector or plasmid sequence were not included.
TABLE 4. Vector proviruses where left and right junctions mapped to different chromosomes
FIG. 5. Integration in CpG islands and at transcription start sites. (A) Localized AAV vector integration sites (n = 670) and a calculated set of random sites (n = 10,000) were mapped relative to the positions of CpG islands (identified as GC content ≥50%, length >200 bp, and ratio of observed to expected number of CpG dinucleotides >0.6). Integration sites within CpG islands or in twelve 0.75-kb windows flanking each island (the average size of a CpG island is 764 bp) were binned and plotted as the percentage of all sites. Significant differences (P < 0.01) are marked with an asterisk. (B) AAV vector integration sites (n = 670) and a calculated set of random sites (n = 10,000) were mapped relative to the transcription start sites of RefSeq genes, binned into windows of increasing sequence size, and plotted as a percentage of all integrants per kb (to account for the different window sizes). Significant differences (P < 0.01) are marked with an asterisk.
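The CpG island criteria in panel A can be checked for a single sequence with the Python sketch below. It assumes a GC content threshold of at least 50% and the commonly used observed/expected formula (number of CpG dinucleotides multiplied by sequence length, divided by the product of the C and G counts); the island annotations used in the study came from precomputed tables, so this is an illustration of the criteria rather than the method used.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count over the count expected from base composition.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def meets_cpg_island_criteria(seq):
    """GC content >= 50%, length > 200 bp, observed/expected CpG ratio > 0.6."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction >= 0.5 and len(seq) > 200 and obs_exp > 0.6

# Synthetic sequences for illustration only.
print(meets_cpg_island_criteria("CG" * 150))  # True
print(meets_cpg_island_criteria("AT" * 150))  # False
```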
FIG. 6. RefSeq gene expression levels and AAV vector integration. RNA samples from uninfected normal human fibroblasts (y axis) and those infected with AAV2-TOA (x axis) were hybridized to the human HG-U133 Plus 2.0 gene chip array (Affymetrix) and the expression levels of all RefSeq genes assayed in the array were ranked and plotted (light gray dots). RefSeq genes containing AAV vector integrants are circled in red. The average expression rank of RefSeq genes containing a calculated random integrant (n = 2,930; yellow circle), those containing AAV vector integrants (n = 195; purple circle), and those containing AAV vector integrants within 1 kb of transcription start sites (n = 14; green square) are shown.
An important issue regarding insertional mutagenesis is its relationship to chromosomal gene transcription. We found that AAV vectors had only a modest preference for integration within transcription units (38.81% versus 34.76% for calculated random integrants), which was less than that reported for AAV vector integration in mouse livers, where 72% of integrants were in genes (29). This could reflect differences in the etiology of chromosomal breaks used for integration, where relatively low hepatocyte proliferation rates may decrease the impact of DNA replication on integration sites and increase the impact of transcription. AAV vectors had a more dramatic preference for integration at transcription start sites, with a >3-fold enrichment within the first kb transcribed, but these accounted for just 2.1% of all integrants due to the small window size. We found only a modest effect of transcription, with the average expression rank of genes containing proviruses at any transcribed position or within 1 kb of the start site only 1.09 to 1.16-fold greater than that of genes with calculated random integration events.
There was also a significant preference for integration in CpG islands (4.03% versus 0.84% for calculated random integrants), which typically act as transcriptional control elements for nearby genes (3, 23). Surprisingly, some of our results were similar to those of murine leukemia virus vectors, where integrations also occurred preferentially near transcription start sites and CpG islands, with only a slight overall preference for genes (46). Thus, despite the distinct life cycles of these viruses, they may share aspects of integration site selection.
AAV vectors do not contain an endonuclease, so integration must occur at chromosomal sites where free DNA ends form. This can take the form of a double-strand break (25) or perhaps a nick that is converted to a double-strand break during DNA replication. The preference for transcriptional start sites, CpG islands, and segmental duplications suggests that these regions may be prone to DNA damage that leads to breaks. In the case of transcription start sites, this damage could be due to local unwinding that initiates transcription and exposes bases. CpG islands can have altered chromatin structure and hypersensitivity to nucleases (42, 45), may act as replication origins (5), and, when methylated, can be mutagenic due to deamination of 5-methylcytosine (34), all of which could increase the likelihood of strand breakage. Segmental duplications are recombinogenic areas of the human genome (39) that may also recombine with AAV vector DNA by similar mechanisms. By the same reasoning, long terminal repeats may be relatively protected from DNA damage, as they were underrepresented sites of AAV vector integration.
The integration hotspots we observed may also be damage-prone areas of the genome. The major hotspot in rDNA could reflect unique aspects of these repeats, which are frequently involved in recombination events with distinct mechanisms in transcribed and nontranscribed regions (11) that may account for the distribution of vector integrations we observed. In crustaceans and insects, rDNA can be a preferred site of transposon insertions (19, 35). Other hotspots correlated with known areas of genomic instability. Hotspots chr7HS1 and chr7HS2 both flank the region where deletions occur in Williams-Beuren syndrome (2), and hotspot chr3HS1 lies near the common fragile site FRA3B (17), an area where DNA gaps and breaks may form and a known integration site for plasmids and papillomavirus (36, 43). The study of AAV vector integration hotspots may lead to new insights into chromosome biology, as the integrated proviruses serve as tags for chromosomal damage at the nucleotide level. Large-scale integration surveys done in specific cell types or under different conditions may help us understand how chromosome structures change during differentiation or genotoxic stress.
Our findings help to define the spectrum of insertional mutagenesis associated with AAV vectors, with implications for their use in gene therapy. Overall, we observed a broad distribution of integrations throughout the human genome, with significant clustering in several hotspots. The effects of integrating in the rDNA repeat hotspot are not known, but given that these genes are already highly expressed and present in multiple copies, there should be minimal phenotypic effects. A major concern at other hotspots is the potential for activating oncogenes by introducing promoters and/or enhancers, and although wild-type AAV has never been shown to cause cancer, the transcriptional control elements are different in vectors. The hotspots we identified did not include known oncogenes, but given the broad distribution of integration sites, one must assume that a provirus could integrate near any gene.
The chromosomal changes associated with integration can be significant, as large deletions and even translocations were observed. Since DNA damage present at these sites presumably exposed free DNA ends prior to integration, a key remaining question is whether the same chromosomal effects would have occurred in the absence of AAV. For example, would repair of damaged chromosomes produce the same deletion sizes or translocations without vector integration? Our study also underscores the variability that occurs in proviral structures. All proviruses were deleted to some degree at their terminal repeats, and many sequence reads identified vector genomes joined to other vector or plasmid sequences that could affect transgene expression. Our results complement a recent report of AAV vector integration in murine hepatocytes with similar findings, albeit with a greater preference for transcribed genes (31). Further research will be required to determine what types of integration events might occur in clinical trials, including studies with cells from different tissues and preclinical animal models.
This work was supported by grants from the U.S. National Institutes of Health and the Child Health Research Center at Children's Hospital, Seattle, Wash.
Supplemental material for this article may be found at http://jvi.asm.org.
Present address: Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Wash.