Journal of Virology, February 2004, p. 1301-1313, Vol. 78, No. 3
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.3.1301-1313.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Carlos F. Barbas III,2 and Samson A. Chow1*
Department of Molecular and Medical Pharmacology, Molecular Biology Institute, and UCLA AIDS Institute, UCLA School of Medicine, Los Angeles, California 90095,1 The Skaggs Institute for Chemical Biology and the Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 920372
Received 19 August 2003/ Accepted 14 October 2003
A salient feature of retroviral integration is that the site of insertion can occur throughout the chromosomes of the target cell. Analyses of integration sites in cells infected with human immunodeficiency virus type 1 (HIV-1) (9, 48), avian sarcoma virus (25), avian leukosis virus (56), and Rous sarcoma virus (52) revealed that most regions of the cellular DNA are accessible. However, the same region or exact nucleotide sequence in the host cell genome can be utilized at a frequency several hundred-fold greater than chance, lending credence to the idea that there are hot and cold spots for integration (14, 48, 56). Therefore, integration of retroviral DNA into target DNA is nonspecific, but it is not a random process.
The ability of retroviruses to permanently insert their genome into the chromosome of an infected cell is a property that can be exploited for gene therapy (for reviews, see references 30 and 42). However, because the integration reaction is an inherently mutagenic process, the nonspecific nature of integration can be a potential pitfall for introducing a transgene with retroviral vectors (for reviews, see references 53 and 54). Depending on the site of integration, insertional mutagenesis may disrupt normal cell functions by inactivating an essential host gene or inappropriately causing overexpression of an undesirable gene, such as a proto-oncogene. Development of leukemia associated with the use of retroviral vectors in gene therapy trials has been reported (22, 36), but the level of risk of cancer and other side effects caused by insertional mutagenesis has not been assessed carefully.
One strategy to control the site specificity of retroviral integration is through the use of a fusion protein consisting of a retroviral integrase and a sequence-specific DNA-binding protein, such as phage λ repressor (7), Escherichia coli LexA repressor (20, 29), and murine transcription factor Zif268 (8). In vitro, the sequence-specific DNA-binding proteins direct integration by recognizing and binding to their target sites on the DNA, causing integration to be mediated into the adjacent regions. A major limitation of the strategy is that the DNA-binding sequences of the previously tested fusion proteins are defined and fixed and may not necessarily be localized to a desired chromosomal site. In addition, these DNA-binding proteins can recognize multiple DNA variants of their consensus binding sequence, or the number of nucleotides required for specific protein-DNA interaction is insufficient for specifying a unique site within a mammalian genome (26, 35, 41). Using the HIV-1 integrase-LexA fusion protein as an example, although LexA protein binds to a 16-bp sequence with approximate twofold rotational symmetry, only three nucleotides at each end of the palindrome are highly conserved among the binding sites (35). In the human genome of 3 × 10^9 bp, we estimated that there are thousands of potential LexA-binding sites. The relatively low binding specificity, coupled with the difficulty of incorporating the fusion protein into infectious virions (8, 29), has made it difficult to assess whether these fusion proteins are able to direct integration in vivo.
One class of DNA-binding proteins that offers several advantages in conferring site specificity to retroviral integrases comprises the synthetic proteins derived from the Cys2-His2 zinc finger proteins (for reviews, see references 3 and 49). Structural studies of the Cys2-His2 zinc finger domain showed that it has a simple ββα fold of approximately 30 amino acids in length and is stabilized by hydrophobic interactions and zinc chelation (34, 39). Analysis of the three-zinc-finger protein Zif268-DNA complex revealed that the α-helix of each zinc finger fits directly into the major groove and the amino acid side chains make specific contacts with a 3-bp DNA subsite. Most of the base contacts involve the G-rich strand of the binding site (18, 41). Studies directed at modifying the sequence specificity of the zinc finger DNA-binding domains have shown that they can be selected to specifically bind a wide array of DNA sequences. In addition, many selected zinc finger domains exhibit sufficient modularity in their recognition of DNA triplets that they can be combined with other such domains to create polydactyl proteins that recognize extended sequences of DNA (5, 11, 17, 31, 37, 38, 46, 47, 50). One example of such a polydactyl protein is E2C, which contains six zinc finger domains. E2C was constructed by grafting the amino acid residues of each zinc finger involved in specific DNA recognition into the framework of the designed consensus protein Sp1C, a derivative of Sp1 (15). E2C binds with high affinity and recognizes a contiguous 18-bp sequence, which is unique in the human genome and located within the 5' untranslated region of the erbB-2 gene on chromosome 17 (4, 5). Artificial transcription factors based on modified zinc finger domains have been used to target specific DNA sequences and selectively activate or repress expression of reporter genes (4, 5, 10, 12, 16, 28, 32, 33, 37, 47).
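The modular recognition described above (one finger per 3-bp subsite, six fingers for an 18-bp site) can be illustrated with a minimal sketch. The sequence used here is an arbitrary placeholder chosen only for its length; it is not the E2C-binding site, and the function name is hypothetical.

```python
from typing import List


def split_into_subsites(site: str, subsite_len: int = 3) -> List[str]:
    """Split a binding site into consecutive subsites of fixed length.

    Each subsite stands for the DNA triplet contacted by one zinc finger.
    """
    site = site.upper()
    if len(site) % subsite_len != 0:
        raise ValueError("site length must be a multiple of the subsite length")
    return [site[i:i + subsite_len] for i in range(0, len(site), subsite_len)]


# Arbitrary 18-bp placeholder (NOT the real E2C site), written 5' to 3'.
HYPOTHETICAL_SITE = "GAAGCTGGAGTCGCAGTA"

subsites = split_into_subsites(HYPOTHETICAL_SITE)
print(len(subsites), subsites)
```

A six-finger protein corresponds to six such triplets, which is why assembling modules in a different order yields a protein with a different 18-bp target.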
Herein, we constructed and purified various fusion proteins consisting of HIV-1 integrase and the polydactyl zinc finger protein E2C. The fusion proteins retained their integration activity and ability to bind specifically to the E2C-binding site. Analysis of the distribution and frequency of integration events revealed that integration of viral DNA mediated by the integrase-E2C fusion proteins was biased near the E2C-binding site.
TABLE 1. DNA sequences of PCR primers and oligonucleotides used in construction of fusion proteins and DNA substrates for integration assays
10 and INR as the forward and reverse primers, respectively. The two PCR products containing a common overlapping region were annealed together, extended, and amplified in the presence of 0.2 mM deoxyribonucleoside triphosphates (United States Biochemicals), 2.5 U of Pfu polymerase (Pfu Turbo; Stratagene), and E2CF1 and INR as the forward and reverse primers, respectively. The extension and amplification were carried out in a thermocycler (MJ Research, Inc.) programmed for three cycles, with each cycle consisting of 5 min of denaturation at 95°C, 1 min of annealing at 50°C, and 2 min of extension at 72°C. This was followed by an additional 30 cycles, with each cycle consisting of 40 s of denaturation at 95°C, 40 s of annealing at 58°C, and 2 min of extension at 72°C. The final PCR products from the two separate reactions were digested with NdeI and BamHI, gel purified, and then ligated to pT7-7/H-IN previously cut with NdeI and BamHI to form pE2C/IN1-288 and pE2C/IN11-288, respectively. As a control, the E2C protein was also prepared as a fusion with glutathione S-transferase (GST), which was inserted at the C terminus of E2C. The GST gene was amplified by PCR using pGEX-2T (Amersham Biosciences, Piscataway, N.J.) as the template and GST(+) and GST(-) as the forward and reverse primers, respectively. The amplified product was digested with KpnI and PstI, followed by ligation to pE2C/IN1-288 previously digested with KpnI and PstI to remove the entire integrase gene.
To prepare a plasmid that contained a single binding site for the E2C protein, a double-stranded oligonucleotide containing the E2C-binding sequence, formed by annealing oligonucleotides Te2c(+) with Te2c(-), was inserted into the HindIII and BamHI sites of a plasmid derived from pBluescript II KS(+) (Stratagene), resulting in pBS-e2c. To prepare a plasmid that contained a mutant binding site for the E2C protein, two oligonucleotides, mTe2c(+) and mTe2c(-), which contain five nucleotide mutations in the E2C-binding sequence, were annealed and inserted into the identical location as described earlier for pBS-e2c to form pBS-e2cm.
The sequences of all the PCR-amplified DNA fragments were verified by restriction enzyme analysis and the dideoxynucleotide chain termination method. Sequencing reactions were carried out by the UCLA Sequencing Core Facility using an ABI 3700 DNA analyzer (PE Applied Biosystems, Foster City, Calif.).
Expression and purification of the fusion proteins. Wild-type and integrase-E2C fusion proteins were expressed and purified using a protocol similar to that described previously (1). The DNA constructs were transformed into E. coli BL21(DE3) cells. The bacterial colony was grown for 16 h at 37°C in 50 ml of Luria broth containing 80 µg of ampicillin (LB-amp)/ml. Forty milliliters of the bacterial culture was used to inoculate 2 liters of prewarmed LB-amp at 32 or 35°C. Expression of the recombinant protein was induced by the addition of 0.4 mM isopropyl-1-thio-ß-D-galactopyranoside when the optical density at 600 nm reached 0.8 to 1.0, and the culture was grown for an additional 3 h. The cell pellet from every liter of culture was resuspended in 40 ml of cold lysis buffer (20 mM HEPES [pH 7.5], 10% glycerol, 5 mM 2-mercaptoethanol, 2 µg of leupeptin/ml, 1 mM phenylmethylsulfonyl fluoride, 1 M NaCl, 0.2 mM EDTA, 0.5% Igepal, and 0.2 µg of hen egg lysozyme/ml). The cell suspensions were sonicated three times in a 1-min ice-water bath using a 0.5-inch horn tip (Branson Sonifier 450). The lysate was then centrifuged at 100,000 x g for 1 h at 4°C. The supernatant fraction, after dialysis against buffer A (20 mM HEPES [pH 7.5], 10% glycerol, 5 mM 2-mercaptoethanol, 1 M NaCl, 0.1% Igepal), was mixed with Ni2+-nitrilotriacetic acid agarose resin (Qiagen) on ice for 2 h and then washed four times with 10 ml of buffer A containing 50 mM imidazole. The resin was packed into a 15-cm by 0.7-cm (inner diameter) Econo-Column (Bio-Rad), and protein was eluted by applying 15 ml of buffer A with a linear gradient of 50 to 500 mM imidazole at 1 ml/min. The fractions containing the protein were pooled and dialyzed against buffer C {20 mM HEPES (pH 7.5), 20% glycerol, 1 mM dithiothreitol (DTT), 0.5 M NaCl, 0.1 mM EDTA, and 10 mM 3-[(3-cholamidopropyl)-dimethylammonio]-1-propanesulfonic acid (CHAPS)}. 
The His tag preceding the various fusion proteins was removed by incubating the protein with 80 to 100 NIH U of human thrombin (Sigma) per mg of protein and passing the digested protein through a cation-exchange chromatography column (5 cm by 1 cm [inner diameter]) packed with high-performance SP-Sepharose resin (Amersham Pharmacia). The protein solution was diluted in buffer D (20 mM HEPES [pH 7.0], 10% glycerol, 10 mM DTT, 0.1 mM EDTA, 10 mM CHAPS) to 0.1 mg/ml before loading onto the SP-Sepharose column. A gradient from 0 to 1 M NaCl in buffer D was used to elute the protein from the column. Peak fractions containing the non-His-tagged protein were pooled, concentrated by Centricon-10 columns (Amicon), and dialyzed against buffer GF (20 mM HEPES [pH 7.5], 10% glycerol, 1 mM DTT, 0.5 M NaCl, 0.1 mM EDTA, 10 mM CHAPS). The dialysate was then applied to a HiPrep 26/60 Sephacryl S-200 high-resolution gel filtration column (Amersham Pharmacia) previously equilibrated with buffer GF. The protein was eluted with buffer GF at a flow rate of 0.1 ml/min at 4°C on a BioLogic workstation system (Bio-Rad). Peak fractions containing the full-length protein were pooled and concentrated by a Centricon-10 column or using a stirred cell (model 8050; Amicon) with a YM10 ultrafiltration membrane (Millipore) at a N2 pressure of 50 lb/in2. The protein was then dialyzed against storage buffer (20 mM HEPES [pH 7.5], 20% glycerol, 50 µM ZnCl2, 0.3 M NaCl, 10 mM DTT, 10 mM CHAPS) overnight and stored at -80°C. Protein concentrations were determined by the Bradford assay (Bio-Rad) according to the manufacturer's instructions, using bovine serum albumin (BSA) as a standard.
For the E2C-GST fusion protein, the initial steps of protein expression and purification were identical to those described earlier for the integrase-E2C fusion protein. After incubation with human thrombin to remove the His tag at the N terminus, the digested protein was passed through a glutathione-Sepharose affinity chromatography column (Amersham Pharmacia). The column was washed with 10 ml of a buffer containing a final concentration of 10 mM sodium phosphate (pH 7.4), 1 M NaCl, and 0.1% Triton X-100, and the protein was eluted with glutathione elution buffer (100 mM Tris-HCl [pH 7.5], 0.5 M NaCl, 10 mM glutathione, 0.1% Triton X-100). After elution, the protein was dialyzed against storage buffer as described earlier.
Footprinting analysis of DNA binding. To examine the ability of the various E2C-integrase fusion proteins to specifically recognize the E2C-binding site, pBS-e2p, which contains a single E2C-binding sequence, was digested with XhoI. The linearized DNA was labeled at the 3' ends using [α-32P]dCTP and Klenow fragment of E. coli DNA polymerase I (New England Biolabs). The labeled DNA was then digested with PstI, and the 382-bp singly end-labeled fragment containing the E2C-binding sequence was isolated from a 1% agarose gel with a QIAEX gel extraction kit (Qiagen). The labeled strand contained the G-rich sequence of the E2C-binding site. To analyze the DNase I digestion pattern of the C-rich sequence of the E2C-binding site, pGL3basic-e2p DNA containing an E2C-binding site (5) was digested with NcoI. The linearized DNA was labeled at the 5' ends using [γ-32P]ATP and T4 polynucleotide kinase. The labeled DNA was digested with PstI, and the 395-bp singly end-labeled fragment containing the E2C-binding sequence was isolated from a 1% agarose gel as described previously. The 3' or 5' singly end-labeled DNA fragment (0.3 nM) was incubated with or without protein at room temperature for 30 min in a buffer containing a final concentration of 20 mM HEPES (pH 7.5), 0.05% Igepal, 1.5 mM CaCl2, 2.5 mM MgCl2, 50 mM NaCl, 10 mM DTT, 100 µg of BSA/ml, and 2 µg of poly(dI-dC)/ml. The sample was digested with 2 ng of DNase I/ml for 3 min at room temperature. The digestion was stopped by the addition of 18 mM EDTA, and the sample was deproteinized by phenol-chloroform extraction, ethanol precipitated in the presence of 10 µg of tRNA as a carrier, and resuspended in 5 µl of formamide-10 mM EDTA. After denaturation at 90°C for 3 min, the sample was analyzed by electrophoresis through a 6% denaturing polyacrylamide gel containing 7 M urea in a Tris-borate-EDTA buffer.
In vitro assays for integrase activity. In vitro activities of HIV-1 integrase and the various integrase-E2C fusion proteins were determined using established oligonucleotide-based assays (13). The 3'-end-processing and 3'-end-joining (strand transfer) reactions were carried out at 37°C for 1 h in a 20-µl reaction volume containing 5 nM 32P-labeled substrate, 75 nM purified enzyme, 20 mM HEPES (pH 7.5), 30 mM NaCl, 10 mM MnCl2, 10 mM DTT, and 0.05% Igepal. The oligonucleotides used as DNA substrates were purified by electrophoresis through a 15% denaturing polyacrylamide gel. Oligonucleotides B2-1 and C220 were labeled at the 5' end with [γ-32P]ATP and T4 polynucleotide kinase (New England Biolabs). The substrate used to assay the 3'-end-processing and 3'-end-joining activities was a double-stranded oligonucleotide containing sequences derived from the U5 end of the HIV-1 long terminal repeat. The substrate was prepared by annealing the labeled C220 strand with its complementary oligonucleotide V2. To assay only the 3'-end-joining activity, a substrate that resembles the viral U5 end after 3'-end processing was used. The preprocessed substrate was prepared by annealing the labeled B2-1 strand with the V2 strand. The reaction was stopped by adding 18 mM EDTA and heating at 90°C for 3 min before analysis by electrophoresis on a denaturing 15% polyacrylamide gel with 7 M urea in Tris-borate-EDTA buffer. The gel was then dried and placed into a PhosphorImager cassette (Molecular Dynamics), and the reaction products were analyzed with the ImageQuant software (Molecular Dynamics).
PCR-based assay for distribution and frequency of integration events. The PCR-based integration assay is used for analyzing target DNA sites chosen for integration (44, 51). Individual integration events along a target DNA are amplified through PCR to show the distribution and frequency of integration. One microgram of target DNA, pBS-e2c or pBS-e2cm, was preincubated with wild-type HIV-1 integrase or the fusion protein at room temperature for 15 min in a 20-µl final volume of the standard reaction buffer. The donor substrate was the 21-bp oligonucleotide mimicking the preprocessed HIV-1 U5 end and was prepared by annealing B2-1 to V2. The integration reaction was started by adding 15 nM preprocessed U5 DNA and incubating at 37°C for 30 or 60 min. The reaction was terminated with the addition of 80 µl of stop solution (10 mM Tris-HCl [pH 7.5], 5 mM EDTA [pH 8.0], 375 mM sodium acetate, 0.25 mg of tRNA/ml). The DNA was extracted with phenol-chloroform, ethanol precipitated, and resuspended in 50 µl of 10 mM Tris-HCl (pH 7.5), 1 mM EDTA (pH 8.0). A 3-µl aliquot of the reaction mixture was then added to a buffer containing 10 mM Tris-HCl (pH 8.8), 50 mM KCl, 0.001% gelatin (wt/vol), 1.5 mM MgCl2, 200 µM deoxyribonucleoside triphosphates, 5 pmol each of the forward and reverse primers, and 1 U of Taq polymerase (Taq 2000; Stratagene) in a final volume of 20 µl. To monitor integration events occurring into the DNA strand containing the G-rich sequence of the E2C-binding site, 0.25 µM oligonucleotide PR-G was used as the reverse primer. The forward primer for the PCR, B2-1, is complementary to the U5 donor substrate and was prepared by mixing 0.05 µM 5'-end-labeled B2-1 and 0.20 µM unlabeled B2-1. To monitor integration events occurring into the DNA strand containing the C-rich sequence of the E2C-binding site, 0.25 µM oligonucleotide PR-C and B2-1 were used as the reverse and forward primers, respectively. 
Integration events were amplified by 25 or 35 cycles of PCR: 1 min at 94°C, 1 min at 55°C, and 2 min at 72°C. The radiolabeled PCR products were resolved on a 6% denaturing polyacrylamide gel containing 7 M urea in a Tris-borate-EDTA buffer and analyzed with a PhosphorImager (Molecular Dynamics).
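The amounts and concentrations quoted for this assay can be cross-checked with simple unit conversions. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the 20-µl volumes stated in the text for both the integration reaction and the PCR.

```python
def conc_nM(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in nM for a given amount (pmol) in a given volume (µl).

    1 pmol/µl equals 1 µM, i.e. 1000 nM.
    """
    return amount_pmol / volume_ul * 1000.0


def amount_pmol(concentration_nM: float, volume_ul: float) -> float:
    """Amount in pmol contained in a given volume (µl) at a given nM."""
    return concentration_nM * volume_ul / 1000.0


REACTION_VOLUME_UL = 20.0  # integration reaction and PCR volume from the text

# 15 nM preprocessed U5 donor DNA in a 20-µl reaction (approx. 0.3 pmol).
donor_pmol = amount_pmol(15.0, REACTION_VOLUME_UL)

# 5 pmol of each primer in a 20-µl PCR (250 nM, i.e. 0.25 µM).
primer_nM = conc_nM(5.0, REACTION_VOLUME_UL)

# Forward primer mix: 0.05 µM labeled plus 0.20 µM unlabeled B2-1.
labeled_fraction = 0.05 / (0.05 + 0.20)

print(donor_pmol, primer_nM, labeled_fraction)
```

Under these assumptions, 15 nM donor DNA corresponds to the 0.3 pmol given in the legend to Fig. 5, 5 pmol of primer corresponds to the stated 0.25 µM, and one fifth of the forward primer carries the 5' label.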
FIG. 1. (A) Primary structures of HIV-1 integrase and E2C fusion proteins. Open and shaded boxes represent the peptides derived from HIV-1 integrase (IN) and polydactyl zinc finger protein E2C, respectively. The stippled box represents peptides from GST. The numbers in parentheses correspond to the amino acid residues included in each fusion protein. Full-length HIV-1 integrase and the E2C protein have 288 and 183 amino acids, respectively. The predicted molecular mass (in kilodaltons) of the various recombinant proteins is indicated on the right. The peptide containing seven consecutive His residues (His tag) used for affinity chromatography was removed from the N terminus by thrombin cleavage during purification. (B) Coomassie blue-stained SDS-polyacrylamide gel of various purified proteins. One microgram of each purified protein as labeled on the top was run on an SDS-12.5% polyacrylamide gel (lanes 1 to 7). Lane 8 contains the molecular weight standards (Gibco BRL) with masses in kilodaltons indicated on the left.
The activities of the fusion proteins were first tested for their abilities to catalyze 3'-end processing and 3'-end joining using oligonucleotide-based assays (13). A representation of the in vitro activity is shown in Fig. 2, and the results are summarized in Table 2. In this assay, 3'-end-processing and -joining activities are assayed by the appearance of a product that is shortened by two nucleotides and products that are longer in length than the input DNA, respectively (Fig. 2). The lengths of the 3'-end-joining products are heterogeneous because the site of joining is nonspecific. Fusion of full-length HIV-1 integrase to the N or C terminus of E2C did not change appreciably the 3'-end-processing and -joining activities from those of the wild-type integrase (Fig. 2, lanes 4 and 5 versus lane 3). A fusion protein consisting of a 10-amino-acid deletion at the N terminus of HIV-1 integrase tethered to the C terminus of E2C retained a wild-type level of 3'-end-processing activity but an approximately 50% decrease in the 3'-end-joining activity (Fig. 2, lane 6). Fusion proteins containing a larger deletion in either the N or C terminus of integrase, IN50-288/E2C or IN1-234/E2C, had a weak 3'-end-processing activity and an undetectable level of 3'-end-joining activity with the oligonucleotide substrates (Fig. 2, lanes 7 and 8).
FIG. 2. Catalytic activities of HIV-1 integrase-E2C fusion proteins. The reaction was carried out with 5 nM U5 end oligonucleotide (C220/V2) and a 100 nM concentration of the indicated proteins (lanes 3 to 8). Lane 1 contains the size marker, and the lengths of DNA in nucleotides are indicated on the left. Lane 2 represents a reaction done in the absence of protein. The filled arrowhead denotes the position of the substrate (21-mer), and the open arrowhead indicates the position of the 3'-end-processing product (19-mer). The bands that migrated above the substrate are the products of the 3'-end-joining reaction (strand transfer products [s.t.p.]).
TABLE 2. Summary of in vitro activities of HIV-1 integrase and E2C fusion proteins
FIG. 3. Footprinting analysis of protein binding to an E2C recognition sequence. (A) Digestion pattern of the C-rich strand of the E2C-binding site. The 395-bp PstI-NcoI DNA fragment of pGL3Basic-e2p (5) was singly labeled at the 5' end of the DNA strand containing the C-rich sequence of the E2C-binding site. (B) Digestion pattern of the G-rich strand of the E2C-binding site. The 382-bp XhoI-PstI DNA fragment of pBS-e2p was singly labeled at the 3' end of the strand containing the G-rich sequence of the E2C-binding site. In both panels, the labeled fragment (0.3 nM) was incubated with a 50 nM concentration of the indicated protein (lanes 4 to 10) for 30 min at room temperature. The samples were then digested with DNase I (2 ng/ml) for 3 min at room temperature, and the digested products were separated on a denaturing polyacrylamide gel. Lane 1 contains size markers, with the DNA lengths in nucleotides indicated on the left. Lane 2 represents the undigested, singly end-labeled DNA fragment. Lane 3 represents a digestion carried out in the absence of protein. The stippled boxes on the right indicate the locations of the E2C-binding site. In panel B, the open box indicates the extended region of protection observed with fusion proteins containing E2C at the C terminus of the integrase (lanes 8 to 10).
Site-directed integration mediated by HIV-1 integrase-E2C fusion protein. A PCR-based assay (Fig. 4A) was used to examine the usage of target sites by HIV-1 integrase and the various fusion proteins (1, 20). Integration reactions were conducted as described in Materials and Methods. The plasmid pBS-e2c (Fig. 4B), which contains a single E2C-binding site, was used as the target DNA for integration. The integration products into the target DNA strand containing the C-rich sequence of the E2C-binding site were amplified by PCR and analyzed on a denaturing polyacrylamide gel (Fig. 5A). Each band on the gel corresponds to an integration event at a given phosphodiester bond. The frequency of integration at a particular site and its exact position can be determined by the intensity of the band and by use of a sequencing ladder, respectively. In reactions using wild-type HIV-1 integrase, the distribution and intensity of PCR-amplified products showed that most positions on the plasmid DNA could be used as target sites for integration, and there was a wide variation in integration frequency among the target sites (Fig. 5A, lanes 3 to 5). In reactions wherein the integration was mediated by the fusion protein IN/E2C (Fig. 5A, lanes 6 to 8) or E2C/IN (Fig. 5A, lanes 9 to 11), the E2C-binding site was not used as a target by the fusion proteins, and a significant fraction of the integration events instead occurred near the E2C-binding sequence. For both IN/E2C and E2C/IN, the integration hot spots were distributed asymmetrically and clustered within a 10-nucleotide region about 10 nucleotides upstream (5') of the C-rich strand of the E2C-binding site. In comparison to the wild-type protein, there was a notable decrease in the frequency of integration in the outlying regions of the E2C-binding sequence. For IN/E2C, the decrease in nonspecific integration was seen primarily downstream of the E2C-binding site, whereas the nonspecific integration using E2C/IN was uniformly decreased throughout the target DNA molecule (Fig. 5A, lanes 6 to 8 versus 9 to 11).
FIG. 4. PCR-based assay for determining distribution and preference of integration sites. (A) Schematic representation of the PCR-based assay. The reaction included the preprocessed U5 double-stranded oligonucleotide (thick lines) and supercoiled plasmid DNA (thin circular lines) as the donor and target substrates, respectively. Thick arrows denote the primers for the PCR. The 5'-end-labeled forward primer, indicated by the asterisk, annealed to the viral U5 DNA, while the reverse primer annealed to the target DNA. Integration of U5 viral DNA at different positions (denoted by x, y, and z) generated a population of recombinant products. The distribution of integration sites was analyzed by the lengths of the PCR products after separation on a denaturing polyacrylamide gel. Integration events occurring into the two target DNA strands were monitored separately by using a reverse primer that annealed to the top or bottom strand. (B) Target DNA substrate. The wild-type or mutant E2C-binding sequence (uppercase letters) was cloned between the BamHI and HindIII sites of a plasmid derived from pBluescript II KS(+), resulting in pBS-e2c and pBS-e2cm, respectively. The point mutations in the E2C-binding sequence are marked in bold. The arrows represent the primers PR-G and PR-C used in the PCR amplification of the integration products occurring in the plasmid DNA containing the G-rich and C-rich strands of the E2C-binding site, respectively. The numbers in parentheses denote the map positions of the sites for primer annealing and restriction enzyme cleavage.
FIG. 5. Selection of target sites by wild-type HIV-1 integrase or fusion proteins containing HIV-1 integrase and E2C. (A) Distribution of integration sites on the C-rich strand of the E2C-binding site. One microgram of target DNA, pBS-e2c, was preincubated with the indicated concentrations of proteins (lanes 3 to 11) for 15 min at room temperature. The integration reaction was started by adding 0.3 pmol of preprocessed U5 DNA and incubating the mixture at 37°C for 45 min. The reaction products were amplified by PCR using labeled B2-1 and PR-C as forward and reverse primers, respectively. In lanes 12 to 14, pBS-e2c was preincubated with the E2C/GST protein (5, 10, or 40 pmol) for 10 min at room temperature. This was followed by the addition of 5 pmol of HIV-1 integrase and an additional preincubation period of 5 min at room temperature before the start of the integration reaction. Lane 1 contains the size marker, with the DNA lengths in nucleotides indicated on the left. Lane 2 is a negative control and represents the PCR products of an integration reaction carried out in the absence of enzymes. The stippled box on the right indicates the position of the E2C-binding site. Arrowheads denote the integration hot spots specific for E2C-containing fusion proteins. (B) Distribution of integration sites on the G-rich strand of the E2C-binding site. Experiments were performed identically to those described for panel A, except that PR-G was used as the reverse primer during PCR amplification. Other symbols have the same significance as in panel A.
To ensure that the integration hot spots observed with the fusion proteins were not experimental artifacts, the integration reaction was carried out in the presence of a fixed amount of wild-type HIV-1 integrase and various amounts of E2C/GST protein (Fig. 5A, lanes 12 to 14). Similar to reactions observed earlier with either IN/E2C or E2C/IN fusion proteins, very few integration events took place within the E2C-binding site in the presence of E2C/GST and wild-type integrase. However, in contrast to IN/E2C or E2C/IN, integration hot spots were not detected near the E2C-binding site. The levels of nonspecific integration in the outlying regions were also not noticeably altered. The data provide support that the integration pattern, as defined by both the distribution and frequency of integration events, of IN/E2C or E2C/IN results from two components working in cis as a fusion protein and not from a combined effect of two separate functions provided in trans by individual components. The result also ruled out the possibility that the directed integration by the fusion proteins could be an indirect consequence of DNA distortion induced by protein binding of the E2C recognition site (24).
The distribution and frequency of integration events occurring into the target DNA strand containing the G-rich sequence of the E2C-binding site were also examined (Fig. 5B). Overall, integration hot spots were also observed on the G-rich strand, but they were less pronounced and more scattered than those on the C-rich strand. For IN/E2C, a major hot spot was found immediately upstream of the E2C-binding site, and several other hot spots were located within a 20-bp region downstream of the E2C-binding site (Fig. 5B, lanes 4 and 5). For E2C/IN, several hot spots were also seen within the 20-nucleotide region downstream of the E2C-binding site, but the major hot spot was located within the E2C-binding site (Fig. 5B, lanes 6 and 7). Using the same method described earlier for quantitating integration specificity, we estimated that IN/E2C- and E2C/IN-mediated integration at the hot spots constituted 14 and 32%, respectively, of the total integration events, whereas 5% of integration mediated by wild-type integrase occurred into the same areas. Similar to the C-rich strand, integration hot spots were not detected when integrase and E2C/GST were added to the reaction as separate, individual proteins (Fig. 5B, lanes 8 and 9).
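One plausible way to express integration specificity as a percentage is the share of total band intensity that falls inside the hot-spot region. The sketch below assumes that reading; the quantitation procedure itself is not reproduced in this section, and the function names, positions, and band intensities are made-up placeholders, not data from this study. Only the 14, 32, and 5% values come from the text.

```python
from typing import Mapping


def hot_spot_fraction(band_intensity: Mapping[int, float],
                      start: int, end: int) -> float:
    """Fraction of total lane intensity from bands at positions start..end."""
    total = sum(band_intensity.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    in_region = sum(v for pos, v in band_intensity.items()
                    if start <= pos <= end)
    return in_region / total


def fold_enrichment(fusion_percent: float, wild_type_percent: float) -> float:
    """Ratio of hot-spot usage by a fusion protein to that of wild type."""
    return fusion_percent / wild_type_percent


# Hypothetical lane: target position (nt) -> arbitrary intensity units.
example_lane = {100: 5.0, 110: 20.0, 115: 15.0, 140: 30.0, 180: 30.0}
print(hot_spot_fraction(example_lane, 105, 120))

# Percentages reported in the text for the G-rich strand.
print(fold_enrichment(14.0, 5.0))  # IN/E2C relative to wild type
print(fold_enrichment(32.0, 5.0))  # E2C/IN relative to wild type
```

On this reading, the reported values correspond to roughly 3-fold and 6-fold more frequent use of the hot-spot areas by IN/E2C and E2C/IN, respectively, than by wild-type integrase.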
The activity of many integrase variants, although too weak to be detected by the oligonucleotide-based assays, can be studied using the more sensitive PCR-based assay (1, 20, 51). A previous study using integrase-LexA fusion protein showed that the ability to direct integration into a specific site can be achieved with a fusion protein containing the core domain (residues 50 to 234) of HIV-1 integrase (20). Three truncation variants of HIV-1 integrase fused to E2C were examined for their ability to mediate site-directed integration (Fig. 6). As expected from their low activities in oligonucleotide-based assays, the integration efficiencies of these truncated fusion proteins were poorer than that of their full-length counterpart. Other than the poor efficiency and minor differences in the specific choice of integration sites, the truncated fusion proteins IN50-288/E2C (Fig. 6A and B, lanes 4 and 5), IN1-234/E2C (Fig. 6A and B, lanes 6 and 7), and E2C/IN11-288 (Fig. 6A and B, lanes 8 and 9) showed integration patterns similar to those of the E2C fusion proteins containing a full-length integrase (Fig. 5A and B, lanes 6 to 11). For the strand containing the C-rich sequence of the E2C-binding site (Fig. 6A), the integration hot spots were localized within a 20-nucleotide region upstream of the E2C-binding site and integration within the E2C-binding site was absent. For the strand containing the G-rich sequence of the E2C-binding site (Fig. 6B), integration hot spots were present upstream and downstream, as well as within the E2C-binding site.
FIG. 6. Integration site usage of E2C fusion proteins containing N- or C-terminal-truncated HIV-1 integrase. (A) C-rich strand; (B) G-rich strand. Integration reactions were performed using wild-type HIV-1 integrase (lanes 2 and 3) or various fusion proteins containing N-terminal-truncated (lanes 4 and 5, IN50-288/E2C; lanes 8 and 9, E2C/IN11-288) or C-terminal-truncated integrase (lanes 6 and 7). Lane 1 contains the size marker, and the DNA lengths in nucleotides are indicated on the left. Symbols have the same significance as in Fig. 5.
FIG. 7. Integration site selection of wild-type HIV-1 integrase and various fusion proteins in the presence of a mutant E2C-binding site. The integration reaction was performed in the presence of wild-type integrase (lanes 3 to 5) or the indicated fusion proteins (lanes 6 to 17) and 1 µg of pBS-e2cm, which contains a mutant E2C-binding site, as the target DNA. The integration products into the target DNA strand containing the C-rich sequence of the E2C-binding site were amplified using labeled B2-1 and PR-C as the forward and reverse primers, respectively. Lane 1 contains the DNA size marker, and lane 2 is an integration reaction carried out in the absence of enzymes. The filled box on the right indicates the position of the mutant E2C-binding site.
Although site-directed integration has been reported previously with fusion proteins consisting of full-length or truncated retroviral integrase and various sequence-specific binding proteins (7, 8, 20, 29), the use of a polydactyl zinc finger protein as the target-specifying component offers important advantages over the previously published ones with regard to specificity and versatility. E2C belongs to a class of synthetic DNA-binding proteins constructed by linking two zinc finger proteins, each containing three finger domains (4, 5, 38). Like the human and murine transcription factors Sp1 and Zif268, in which each zinc finger domain recognizes three nucleotides, the synthetic polydactyl zinc finger proteins specifically bind to an 18-bp contiguous DNA sequence (4, 5). Assuming random base distribution, an 18-bp address would be specific within 69 billion bp of sequence, more than sufficient for specifying a unique site in human and other mammalian genomes. For instance, a BLAST search of the GenBank database (human November 2002 freeze) verified that the 18-bp E2C-binding site is unique in the human genome and occurs only once on chromosome 17. Also, each zinc finger domain can potentially be developed as a modular building block for specific recognition of each of the 64 possible 5'-NNN-3' sequences (5, 11, 16, 17, 31, 50). Depending on the sequence of the desired site, the modules can be assembled in any order necessary to form new six-zinc-finger proteins with specific recognition of that particular site. In addition to specificity and versatility, because the synthetic polydactyl zinc finger proteins are put through multiple rounds of selection for their target sequence, their binding affinities are typically in the subnanomolar to picomolar range, which is 10- to 100-fold higher than those of their three-zinc-finger counterparts and most other sequence-specific DNA-binding proteins (4, 5, 16, 31).
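The addressing arithmetic behind the 18-bp site and the 64 triplet modules can be checked with a short calculation. The Python sketch below is illustrative only: the genome size of roughly 3 billion bp and the helper name `expected_chance_matches` are assumptions introduced here, not values or methods taken from the study, and the calculation assumes random, equally frequent bases, as the text does.

```python
# Illustrative arithmetic only. HUMAN_GENOME_BP and expected_chance_matches
# are assumptions for this sketch, not values or methods from the article.

BASES = 4            # A, C, G, T
FINGERS = 6          # two three-finger proteins linked together
BP_PER_FINGER = 3    # each zinc finger domain recognizes three nucleotides

site_length = FINGERS * BP_PER_FINGER      # length of the recognition site in bp
distinct_sites = BASES ** site_length      # number of possible sequences of that length
triplet_modules = BASES ** BP_PER_FINGER   # number of possible 5'-NNN-3' triplets

HUMAN_GENOME_BP = 3_000_000_000  # assumption: roughly 3 billion bp per haploid genome


def expected_chance_matches(genome_bp: int, site_len: int) -> float:
    """Expected chance occurrences of one fixed site in a random sequence.

    Both strands are counted, which is a simplification that holds for a
    nonpalindromic site; bases are assumed independent and equally frequent.
    """
    positions = genome_bp - site_len + 1
    return 2 * positions / (BASES ** site_len)


if __name__ == "__main__":
    print("site length (bp):", site_length)
    print("distinct 18-bp sequences:", distinct_sites)
    print("possible triplet modules:", triplet_modules)
    print("expected chance matches:",
          expected_chance_matches(HUMAN_GENOME_BP, site_length))
```

Under these assumptions the number of distinct 18-bp sequences is 4^18 = 68,719,476,736 (about 69 billion), and the expected number of chance matches in a genome of the assumed size is well below one, in line with the statement that an 18-bp address is sufficient to specify a unique site.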
The application of this class of designed DNA-binding proteins is also illustrated by studies in which artificial transcription factors based on modified zinc finger domains are used to activate or repress expression of reporter genes, as well as endogenous genes in the native chromosomal environment of animal and plant cells (4, 5, 10, 12, 16, 21, 28, 32, 33, 37, 47).
By examining the distribution and frequency of integration events on the target DNA, we found that the site-directed integration of viral DNA mediated by the integrase-E2C protein has similar characteristics to those reported previously (7, 8, 20, 29). The recognition sequence of the fusion protein is largely devoid of integration events, while integration hot spots specific for the fusion protein are located within 20 bp flanking the recognition sequence. Concomitantly, the frequency of integration events in the outlying regions (>20 bp) is decreased. These characteristics are consistent with our working model, in which the fusion protein binds to its cognate recognition site and mediates integration of viral DNA into the nearby regions. The absence of integration events in the cognate binding site is presumably a result of steric hindrance produced by the sequence-specific binding of the fusion protein. Retention of the fusion protein at the binding site in turn decreases the availability of the fusion protein to mediate integration events elsewhere on the target DNA molecule.
Although a majority of integration events mediated by the full-length integrase-E2C fusion proteins occur within a 20-bp region flanking the E2C recognition sequence, a considerable number of integration events are observed in the outlying regions (20 bp or more from the E2C-binding site). This is likely the result of the nonspecific DNA-binding activity of HIV-1 integrase (45, 55, 57). The preparation of several fusion proteins consisting of various truncated integrases was designed to test whether a higher specificity could be achieved by using an integrase without the domains known to interact with target DNA (23, 45, 55, 57). In comparison to the full-length fusion proteins, the integration specificity of the truncated fusion proteins was not improved, while the integration efficiency was significantly decreased. The result is consistent with the previous finding using integrase-LexA fusion proteins (20) and suggests that a better understanding of integrase-target DNA interaction is needed for suppressing the nonspecific integration activity of HIV-1 integrase.
One distinguishing feature of the site-directed integration mediated by the integrase-E2C fusion proteins is the asymmetric distribution of the integration hot spots. Although hot spots are found on both strands of the DNA helix, the major preferred sites are clustered upstream of the E2C recognition sequence on the C-rich strand (Fig. 8). We do not think that the absence of integration hot spots downstream of the E2C-binding site on the C-rich strand is attributable to local DNA sequences, since the same region is used efficiently as integration sites by wild-type integrase. Structural analysis of the Zif268-DNA complex showed that the zinc finger protein binds in the major groove, and most of the amino acid-DNA contacts are made with the G-rich strand of the target sequence (18, 41). Since the E2C-binding site is nonpalindromic, the binding of the fusion protein to the target DNA is directional and may result in an asymmetric distribution of integration hot spots. We are perplexed, however, by the observation that the same hot spots are found upstream of the binding site on the C-rich strand regardless of whether the integrase is tethered to the N or C terminus of E2C. In the absence of structural information on the fusion protein, we do not know how the conformation and the position of the fusion protein in relation to the target DNA may affect the distribution of integration events.
FIG. 8. Positions of preferred integration sites of various HIV-1 integrase-E2C fusion proteins. The DNA sequence flanking the E2C-binding site of pBS-e2c is shown. The E2C recognition sequence is in uppercase letters. Arrowheads above and below the DNA sequence indicate the positions of preferred integration sites on the G-rich and C-rich strands of the E2C-binding site, respectively. The relative preferences of the integration sites for each fusion protein were determined by PhosphorImager analysis and are approximated using arrowheads of different sizes.
10-bp periodicity (Fig. 8), suggesting that the fusion protein is bound at the E2C-binding site and interacts with the same face of the double helix. Certain fusion proteins, such as E2C/IN and E2C/IN11-288, have integration hot spots within the E2C-binding site, which is well protected against DNase I digestion by the footprinting analysis. Integration mediated by preintegration complexes containing the HIV-1 integrase-Zif268 fusion protein also shows hot spots within the Zif recognition site (8). It is possible that certain positions within the fusion protein-target DNA complex are more exposed and allow accessibility for the tethered integrase-donor DNA complex, but not for DNase I.
The ability of retroviruses to precisely and permanently introduce foreign genes into cellular chromosomes has resulted in their common use as vectors for both genetic engineering in higher eukaryotes and gene therapy. Because of their ability to infect nondividing cells, there is a strong interest in developing lentivirus-based vectors for gene delivery (40, 43). However, integration of viral DNA nonspecifically into host chromosomes is a major concern in the use of lentiviral and other retroviral vectors (22, 36, 53, 54). Studies on site-directed integration using fusion proteins may lead to a new approach for inserting exogenous genes at specific sites and improve the therapeutic application of current retroviral vectors.
We thank Michelle Holmes-Son and other members of the Chow laboratory for helpful discussions, Luke Deltredici for graphic support, and Michael Carey for comments on and critical reading of the manuscript.
Present address: Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, AZ 85721.