Journal of Virology, October 2006, p. 9497-9510, Vol. 80, No. 19
0022-538X/06/$08.00+0 doi:10.1128/JVI.00856-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Tan Wang,2,
Christine Smith-Snyder,1
Marie Cote,1
Michael Scher,1
Joelle N. Pelletier,4
Sinu John,3
Colleen B. Jonsson,2 and
Monica J. Roth1*
1 Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, New Jersey 08854
2 Department of Biochemistry and Molecular Biology, Southern Research Institute, 2000 9th Ave. S., Birmingham, Alabama 35205
3 Graduate Program in Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294
4 Département de Chimie, Faculté des Arts et Sciences, et Département de Biochimie, Faculté de Médecine, Université de Montréal, C.P. 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada
Received 26 April 2006/ Accepted 7 July 2006
| INTRODUCTION |
|---|
The replication and integration of retroviral particles are two distinct yet interrelated processes. Replicative complexes and preintegrative complexes have been purified and characterized from infected cells (6, 9, 17, 28-30, 39, 52, 53, 55, 56, 66, 67). Both within and between viral species, the composition of replicative complexes differs from that of preintegrative complexes. Interactions between reverse transcriptase (RT) and integrase (IN) have also been reported (40, 69, 89, 90, 96), and multiple mutations of IN are known to alter viral replication (27, 58-60). Despite extensive efforts, the assembly of these complexes is not well understood. Structural studies have aided this work: a structure of the Moloney murine leukemia virus (M-MuLV) RT has recently been reported (21), as have structures of related retroviral IN subdomains (8, 14, 18, 19, 26, 35, 43, 87, 94). However, to date, neither a structure of a complete retroviral IN protein nor one of an IN subdomain in complex with DNA has been obtained.
The ability of retroviral particles to stably integrate into the host genome is a great benefit for gene delivery, but the potential for insertional mutagenesis cannot be overlooked (15, 22, 38, 63). Schemes to target integration into alternative positions within the host chromosome to avoid this issue frequently involve generation of fusion proteins with novel targeting domains (10, 48, 84). The linker insertion genetic footprint provides a means to identify nonessential regions within proteins capable of withstanding insertions. Extending these studies to include the RNase H domain provides a parallel analysis of a protein containing a related catalytic core consisting of an acidic catalytic triad.
In this report, the 3' terminus of the M-MuLV pol gene and the HIV-1 IN gene were subjected to random insertion mutagenesis. Individual constructs selected from the libraries were assayed for their effects on virus viability in vivo (M-MuLV) or on IN function in vitro (HIV-1). Using this complementary approach, four regions functionally tolerant of insertions were identified within RNase H-IN. These regions correlate with domain and protein junctions. No viable linker insertions were identified within any nonstructured regions of connection-RNase H.
| MATERIALS AND METHODS |
|---|
The 3' terminal two-thirds of the pol gene was subcloned into a minimal plasmid backbone for mutagenesis. The amp gene and the origin of replication of pGEM-3Zf(+) (Promega) were PCR amplified using primer Hpatag/45314 (5'-GCCGTTAACACATGTGAGCAAAAGGCC-3') and primer Bamtag/45315 (5'-CGGGATCCTTGAAAAAGGAAGAGTATG-3') using a mixture of 5 U Taq DNA pol (Invitrogen) and 2.5 U cloned Pfu (Stratagene). The 1.7-kb PCR product was digested with BamHI and HpaI and ligated to the 2.3-kb BamHI/HpaI fragment from pNCA-C-XN-SU8. The resulting plasmid, pGEM-BH-XN-BH, was used for mutagenesis.
To facilitate the reconstruction of the insertional library into pNCA-C-XN-SU8, a deletion within MuLV IN was generated. pNCA-C-XN-SU8 was partially digested with XmnI, and the linear product was isolated and digested with PmlI. The 10-kb DNA fragment was isolated and ligated to yield pNCA-C-XN-SU8-ΔIN. This deletion within IN maintains the SalI and NotI sites required for reconstruction of the library. pNCA-C-XN-SU8-ΔIN and pGEM-BH-XN-BH plasmids were generated in HB101 Escherichia coli cells.
Mutagenesis. M-MuLV mutagenesis was performed on pGEM-BH-XN-BH (80 ng) using the GPS-LS linker scanning system kit (NEB). The method is based on random Tn7 transposition (5), which introduces the chloramphenicol resistance gene (Cmr). DNA was introduced into ElectroMAX DH10B cells (Gibco BRL) by electroporation. Chloramphenicol-resistant colonies (10⁵) were selected on one 245- by 245-mm plate. Colonies were scraped off the plate and pooled; the mutagenized pGEM-BH-BH-chlor plasmids were isolated (Midi protocol; QIAGEN) and maintained as a library. This initial library was digested with PmeI to remove the chloramphenicol resistance gene, ligated, and electroporated into ElectroMAX DH10B cells (Gibco BRL). Ampicillin-resistant colonies were pooled and lysed to isolate the pGEM-BH-XN-BH-15 constructs, which contained the 15-bp linker insertion encoding a PmeI site.
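A quick Poisson estimate suggests how densely roughly 10⁵ transposition events could sample the target region. This sketch assumes one insertion per colony, placed uniformly at random over the ~4.0-kb plasmid (1.7-kb backbone plus 2.3-kb insert) in either orientation. Insertions in fact clustered, and insertions that inactivate the backbone are ignored here, so the result is an optimistic upper bound; the function name is hypothetical.

```python
import math

def expected_site_coverage(n_colonies, plasmid_bp, target_bp, orientations=2):
    """Poisson estimate of how densely random insertions sample a target region.

    ASSUMPTION: each colony carries exactly one insertion, placed uniformly at
    random over the whole plasmid. Returns (mean hits per site and orientation,
    fraction of sites hit at least once).
    """
    hits_in_target = n_colonies * target_bp / plasmid_bp
    # Each nucleotide position can receive the transposon in either orientation.
    sites = target_bp * orientations
    lam = hits_in_target / sites
    fraction_hit = 1.0 - math.exp(-lam)  # P(site hit at least once)
    return lam, fraction_hit

# ~10^5 colonies, ~4,000-bp plasmid, 2,281-bp BamHI-HpaI target region
lam, frac = expected_site_coverage(100_000, 4_000, 2_281)
print(f"mean hits per site/orientation: {lam:.1f}; fraction sampled: {frac:.6f}")
```

Because the target length cancels, the mean depends only on colonies per plasmid base pair and orientation; clustering would leave some positions well below this mean.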
The EZ::TN in-frame linker insertion kit (Epicentre Biotechnologies) was used to generate a library of HIV-1 IN mutants with 19-amino-acid insertions within the target plasmid pINSD.His (NIH AIDS Research and Reference Reagent Program), following the manufacturer's protocols. Mutants were screened using a PCR-based strategy. The pinsdBscreen primer (5'-CGG GCT TTG TTA GCA GCC GG-3') and pinsdFscreen primer (5'-GGT GCC GCG CGG CAG CC-3'), which anneal to nucleotide positions 301 to 320 and 335 to 351 of the pET15b plasmid, respectively, were used to amplify the HIV-1 IN sequence. The PCR mixtures contained 1x PCR buffer from the Expand Long Template PCR system (Boehringer Mannheim), 2.25 mM MgSO4, 0.2 mM deoxynucleoside triphosphates, 2 µM primers, 2 U Taq polymerase, and a toothpick trace of the glycerol stocks stored at −80°C. PCR conditions were as follows: 94°C for 4 min, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 69°C for 30 s, and extension at 72°C for 2 min 15 s; then a final extension at 72°C for 4 min and a hold at 4°C. After PCR, the samples were loaded onto a 1.5% agarose gel and examined to determine which clones were positive for a linker insertion within the IN gene. Clones with insertions were individually digested with NotI according to the manufacturer's recommendations.
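For a screen of this size it can help to encode the cycling program as data and compute its minimum run time. A minimal sketch using the conditions listed above; the `Step` type and function name are hypothetical, and real thermocyclers add ramp time between temperatures.

```python
from dataclasses import dataclass

@dataclass
class Step:
    temp_c: float
    seconds: int

# Screening PCR program from the text; ramp times are not modeled.
INITIAL = [Step(94, 4 * 60)]
CYCLE = [Step(94, 30), Step(69, 30), Step(72, 2 * 60 + 15)]
N_CYCLES = 35
FINAL = [Step(72, 4 * 60)]  # followed by an indefinite 4°C hold (not timed)

def program_hold_time(initial, cycle, n_cycles, final):
    """Total time spent at temperature, in seconds, excluding ramps and the final hold."""
    total = sum(s.seconds for s in initial)
    total += n_cycles * sum(s.seconds for s in cycle)
    total += sum(s.seconds for s in final)
    return total

total_s = program_hold_time(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total_s} s (~{total_s / 60:.0f} min) plus ramping")
```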
Reconstruction of library into MuLV provirus.
The 15-bp insertion library was reconstructed back into the pNCA-C-XN-SU8 provirus backbone. The 2,060-bp SalI-NotI fragment from the pGEM-BH-XN-BH-15 library was exchanged into pNCA-C-XN-SU8-ΔIN, which had been digested with the same enzymes. Library DNA was introduced into chemically competent UltraMAX DH5α-FT (Tetr) cells (Gibco BRL) and maintained on one 245- by 245-mm plate. Because the Tn7 transposition was performed on the 2,281-bp BamHI-HpaI region of MuLV within pGEM-BH-XN-BH, some insertions could have occurred outside the SalI-NotI fragment used in the provirus reconstruction. To eliminate constructs into which the wild-type coding sequence was transferred, the library DNA was digested with PmeI, and the linear DNA was isolated and ligated to generate the final mutant library. This step selected for pNCA-C-XN-SU8 plasmids bearing a PmeI linker insertion.
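The PmeI step acts as a filter: plasmids carrying the linker's PmeI site (GTTTAAAC) are linearized, whereas uncut circular plasmids, such as those that received a wild-type SalI-NotI fragment, are left behind when the linear band is gel purified. The sketch below models that filter on toy sequences treated as circular, so a site spanning the numbering origin is still found. The function names and sequences are hypothetical, and it assumes the parental backbone carries no PmeI site of its own.

```python
PMEI = "GTTTAAAC"  # PmeI recognition site (blunt cutter, GTTT^AAAC)

def has_site_circular(plasmid, site=PMEI):
    """True if `site` occurs anywhere in a circular plasmid sequence."""
    seq = plasmid.upper()
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    return site in wrapped

def pmei_selection(library):
    """Keep only clones that PmeI would linearize, i.e. clones carrying the linker."""
    return {name: seq for name, seq in library.items() if has_site_circular(seq)}

# Toy library: one linker-bearing clone, one wild-type clone,
# and one clone whose site is split across the sequence origin.
library = {
    "mutant": "ATGGCTGCTTGTTTAAACAGCTGCT",
    "wild_type": "ATGGCTGCTGCTGCTGCT",
    "origin_spanning": "AAACGGGGGGGTTT",
}
print(sorted(pmei_selection(library)))
```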
Single isolate mapping with MuLV. The position of the PmeI sites within MuLV (pNCA-C-XN-SU8) was determined by size analysis of the SalI/PmeI fragment released from the individual mutated plasmid library isolates. Automated sequencing was performed in the DNA core facility of Robert Wood Johnson Medical School (UMDNJ) using an appropriate primer determined after restriction mapping. Alternatively, individual colonies were directly sequenced with primers spanning the MuLV RT/IN coding region to identify PmeI-containing sequences. Approximately 750 individual colonies were isolated and screened for insertions. DNA sequencing of HIV-1 clones was performed with the ABI PRISM BigDye Primer v3.0 cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Applied Biosystems, Foster City, CA) to determine the site of the 19-codon insertion. Sequence data were analyzed with VectorNTI from InforMax Inc. (Frederick, MD).
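Mapping by fragment size reduces to coordinate arithmetic: once the SalI cut position is known, the length of the SalI-PmeI fragment places the PmeI site. The sketch below finds cut positions for both enzymes (SalI cuts G^TCGAC, PmeI cuts GTTT^AAAC), computes fragment sizes for a complete digest of a circular plasmid, and converts a measured fragment size back into a coordinate. The toy plasmid and helper names are illustrative, not the study's constructs.

```python
import re

# Recognition sequence and top-strand cut offset from the start of the site.
ENZYMES = {
    "SalI": ("GTCGAC", 1),    # G^TCGAC
    "PmeI": ("GTTTAAAC", 4),  # GTTT^AAAC (blunt)
}

def cut_positions(seq, enzyme):
    """0-based top-strand cut coordinates (sites spanning the origin are not searched)."""
    site, offset = ENZYMES[enzyme]
    # Zero-width lookahead so overlapping sites are all reported.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]

def circular_fragments(seq, enzymes):
    """Fragment sizes from a complete digest of a circular plasmid."""
    cuts = sorted(p for e in enzymes for p in cut_positions(seq, e))
    if not cuts:
        return [len(seq)]
    n = len(cuts)
    # Gap from each cut to the next, wrapping past the origin for the last one.
    return [((cuts[(i + 1) % n] - cuts[i]) % len(seq)) or len(seq) for i in range(n)]

def pmei_coordinate(sal_cut, sal_pmei_fragment_bp, plasmid_bp):
    """Place the PmeI cut given a known SalI cut and a measured SalI-PmeI fragment."""
    return (sal_cut + sal_pmei_fragment_bp) % plasmid_bp

# Toy circular plasmid: one SalI site and one PmeI-containing linker.
plasmid = "A" * 10 + "GTCGAC" + "A" * 20 + "TGTTTAAACA" + "A" * 30
print(cut_positions(plasmid, "SalI"), cut_positions(plasmid, "PmeI"))
print(circular_fragments(plasmid, ["SalI", "PmeI"]))
```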
Cell culture. The generation and maintenance of canine D17 cells expressing MCAT, the receptor for ecotropic M-MuLV (pJET) (1), have been described previously (68). Individual PmeI-encoding MuLV proviral constructs (100 ng each) from the final library were transiently introduced into 2 × 10⁴ D17/pJET cells (15-mm wells) in the presence of 150 µg/ml DEAE-dextran (64). Upon confluence, supernatants were collected and cells were passaged to six-well (60-mm) plates for maintenance. Supernatants were collected on all subsequent days of confluence and assayed for RT activity (33). Viral DNA was isolated from RT-positive cultures using the method of Hirt (41).
PCR of MuLV viral DNA. Unintegrated MuLV viral DNA (41) was isolated from D17/pJET cells and used as a template for PCR in the presence of 100 pmol of primers JR6325L (5'-CAGTACTGACCCCTCTGAGCATC-3') and JR4085R (5'-ATCAAGCAAGCTCTTCTAACTGCC-3') using a mixture of Taq DNA polymerase (5 U; Invitrogen) and cloned Pfu (2.5 U; Stratagene). The amplified 2.2-kb product (bp 4085 to bp 6325 in the pNCA-C-XN-SU8 parental vector) was isolated from a 1% agarose gel using the QIAquick gel extraction kit protocol (QIAGEN) and subjected to automated DNA sequencing.
Expression of the M-MuLV IN C terminus.
A directional deletion analysis was performed to select a stable MuLV IN C-terminal expression construct. The His6-thrombin-WTIN construct within a pET vector was digested with SphI and subjected to Bal31 digestion for five time points between 5 and 30 min. The DNA was digested with PstI, and deletion fragments between 1 and 1.8 kbp were gel isolated. Plasmid pIN1-105 contains the His6-thrombin leader followed by IN residues 1 to 105, expressed downstream of an NdeI site (CΔ303) (91). pIN1-105 was digested with NdeI and blunt ended by fill-in with Klenow polymerase. After PstI digestion, the 4,568-bp fragment was isolated and ligated to the Bal31 deletion fragments. Individual colonies (93 in total) were analyzed for the size of the deletion, and 20 were selected for protein expression in E. coli BL21(DE3). Isolate 77 was subjected to DNA sequence analysis.
Expression and purification of HIV IN and mutants. Wild-type HIV-1 IN and insertion mutants were expressed in E. coli BL21(DE3) cells in 50 ml of medium and purified as hexahistidine-tagged fusion proteins as described previously (88). Purification from 50-ml cultures yielded approximately 2 mg of 90 to 95% homogeneous protein. The protein fraction refolded at a concentration of 5 mg/ml exhibited the greatest enzymatic activity. HIV-1 IN precipitated upon addition of buffer C (0.2 M NaCl). The precipitated protein was resuspended in buffer D (0.5 M NaCl) to a final concentration of 1 mg/ml.
In vitro integration and disintegration assays of HIV-1 IN. Strand transfer and disintegration reactions were performed as described previously (88). Reaction products were separated on a 20% polyacrylamide denaturing gel and subjected to autoradiography or PhosphorImager screens (Molecular Dynamics). Products were quantified with ImageQuant software (Molecular Dynamics). Oligonucleotides were purified on 20% denaturing polyacrylamide gels, ³²P labeled at the 5' end with T4 polynucleotide kinase, and hybridized to complementary strands as previously described (47). Unincorporated radioactivity was removed from labeled integration and disintegration substrates with G-25 or G-50 Quick Spin columns (Boehringer, Mannheim, IN).
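Band quantification of this kind is commonly turned into activity by computing the fraction of labeled substrate converted to product in each lane and normalizing to wild-type IN run in parallel. The sketch below assumes background-subtracted intensities are already in hand. The category cutoffs are hypothetical: the Results describe activity as full, moderate, low, or barely detectable, but no numeric thresholds are given.

```python
def percent_product(product_signal, substrate_signal):
    """Percent of labeled substrate converted to product in one gel lane.

    product_signal: summed, background-subtracted intensity of all product bands
    (e.g. a strand transfer ladder); substrate_signal: unreacted substrate band.
    """
    total = product_signal + substrate_signal
    if total <= 0:
        raise ValueError("lane has no signal")
    return 100.0 * product_signal / total

def relative_activity(mutant_pct, wild_type_pct):
    """Mutant conversion expressed as a percentage of wild-type IN on the same gel."""
    return 100.0 * mutant_pct / wild_type_pct

# HYPOTHETICAL cutoffs (percent of wild type); not taken from the study.
CUTOFFS = [(80.0, "full"), (30.0, "moderate"), (5.0, "low"), (1.0, "barely detectable")]

def activity_class(relative_pct, cutoffs=CUTOFFS):
    """Bin a relative activity using the first cutoff it meets or exceeds."""
    for threshold, label in cutoffs:
        if relative_pct >= threshold:
            return label
    return "none"

wt = percent_product(4_000, 6_000)      # 40% of substrate converted
mutant = percent_product(1_000, 9_000)  # 10% converted
rel = relative_activity(mutant, wt)
print(f"{rel:.1f}% of wild type -> {activity_class(rel)}")
```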
Molecular modeling of MuLV RT.
A three-dimensional model of the M-MuLV RT was reconstructed using the 1RW3 crystal structure comprising the fingers, palm, thumb, and connection domains (21) and the preliminary RNase HΔC crystal structure (54; Wayne Hendrickson, personal communication). A crude full model was generated using the O program (45) by positioning the RNase HΔC domain into the diffuse electron density observed in the 1RW3 structure. Nonstructured regions were modeled using the O program and subjected to energy minimization using the AMBER suite of programs (70). Regions reconstructed include residues 327 to 334 (at the tip of the thumb), residues 475 to 504 (between the connection and RNase H domains), and residues 592 to 603 and 633 to 642 (RNase H). Additional modifications included insertion of the C-helix from E. coli RNase H (1G15) (32) between the B- and D-helices of the M-MuLV RNase H domain, mutation of its residues to the corresponding M-MuLV residues, and energy minimization of the final model. The corresponding figures were generated using MOLSCRIPT (49) and Raster3D (65).
Structural model of HIV-1 IN monomer.
The structural model of the HIV-1 IN monomer was constructed from a combination of two X-ray crystal structures, PDB entries 1k6y (the two-domain finger/core) and 1ex4 (the two-domain core/C terminus). The "A" molecule core region of 1k6y was superimposed onto the "A" molecule core region of 1ex4 using the program O. The 1k6y structure comprises residues 1 to 46, 56 to 139, and 149 to 210, and 1ex4 comprises residues 56 to 141 and 145 to 270. The superposition therefore consisted of overlaying the Cα atoms of all common core residues (root mean square deviation, 0.83 Å). Where the model contained disordered regions (residues 47 to 55 and 142 to 144, inclusive), polyalanine segments containing the correct number of amino acids were created and moved into the appropriate linking positions in the model. The Ala residues were then changed to the proper residues, and the regions were subjected to least-squares minimization. Similarly, residues 271 to 288 (absent from 1ex4) were created as a polyalanine chain, mutated to the appropriate amino acid residues, and energy minimized.
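The superposition step is a least-squares fit of matched Cα coordinates. The sketch below uses the Kabsch algorithm with NumPy to find the optimal rotation and report the resulting RMSD; it is a generic illustration of the technique, not the O program's implementation, and the coordinates are synthetic.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition.

    Row i of P and row i of Q must be the same residue's C-alpha atom.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the least-squares rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a mirror image.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

rng = np.random.default_rng(0)
core_ca = rng.normal(scale=10.0, size=(80, 3))  # synthetic C-alpha coordinates
theta = np.deg2rad(35.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
moved = core_ca @ rot_z.T + np.array([5.0, -3.0, 12.0])
noisy = moved + rng.normal(scale=0.5, size=moved.shape)
print(kabsch_rmsd(core_ca, moved))  # ~0: same structure in a different frame
print(kabsch_rmsd(core_ca, noisy))  # nonzero: reflects coordinate differences
```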
| RESULTS |
|---|
The Tn5 mutagenesis system was used to create a library of mutants within HIV-1 IN. The generation of over 2,000 Kanr colonies indicated a large mutational library. To make screening high throughput, each colony was picked and transferred into 96-well culture blocks, and insertions were screened by PCR. In total, 1,056 colonies were analyzed for the presence of the Tn5 transposon insertion. One hundred eleven clones were positive for a single insertion within the HIV-1 IN gene; of these, 56 were unique.
Insertion sites of M-MuLV and HIV-1 individual isolates. The final pNCA-C-XN-SU8 PmeI insertion library for M-MuLV was characterized by analyzing individual isolates. Isolates of the final library were subjected to restriction mapping and sequencing analysis (summarized in Fig. 2 and Tables 1 and 2). The 15-bp insertion generated by the linker scanning system resulted in a 5-amino-acid insertion in 4/6 reading frames and a TAA stop codon in 2/6 reading frames. Sequencing and restriction mapping of isolates from the library demonstrated that insertions were distributed throughout the fragment. The insertion sites, however, were not randomly distributed, with clustering of insertions within the center of the fragment. This could indicate a preference for a specific structure within the plasmid DNA by the transposase enzyme or reflect an inadequate sampling of the large population of constructs within the library. However, within the population examined, a large number of duplicate isolates were identified, indicating that the sample size was representative. A total of 148 in-frame insertions were identified. The ratio of in-frame inserts to those with stop codons was as predicted. In the initial screen of 80 individual colonies analyzed, 67 had unique insertion sites that could be readily sequenced. Approximately 2/5 (29/67) individual constructs had mutations that resulted in stop codons; 37 constructs resulted in the insertion of 5 amino acids. One isolate bore a deletion of 20 essential amino acids within the core region of MuLV IN.
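Why 2 of 6 reading frames give a stop codon can be checked by translating the linker in each frame. The sketch below assumes that the 10-nt sequence the GPS-LS transposon leaves behind is TGTTTAAACA (it contains the PmeI site and is consistent with the CLN and FKQ residues in the in4628 insertions) and that the remaining 5 bp are a duplication of the target site; both are assumptions, not taken from the kit documentation. A poly-Ala test ORF (GCT repeats) is used so that only the linker can introduce a stop.

```python
# Standard genetic code (NCBI table 1): codon index = 16*i + 4*j + k over "TCAG".
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

LINKER = "TGTTTAAACA"  # ASSUMED 10-nt GPS-LS remnant; contains PmeI (GTTTAAAC)
TARGET_DUP = 5         # ASSUMED length of the duplicated target site

def translate_dna(seq):
    """Translate codon by codon from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def insert_linker(orf, pos):
    """Model a 15-bp insertion whose 5-bp target orf[pos:pos+5] is duplicated."""
    return orf[:pos + TARGET_DUP] + LINKER + orf[pos:]

orf = "GCT" * 10  # stop-free poly-Ala test ORF
print("linker is its own reverse complement:", revcomp(LINKER) == LINKER)
for pos in range(6):
    protein = translate_dna(insert_linker(orf, pos))
    phase = (pos + TARGET_DUP) % 3  # 0-based position of the linker's first base in its codon
    print(pos, phase, protein, "stop" if "*" in protein else "5-aa insertion")
```

When the linker starts at the third base of a codon (phase 2), the next codons read GTT TAA and translation stops; the other phases read TGT TTA AAC (Cys-Leu-Asn) or TTT AAA (Phe-Lys). Because the assumed linker is its own reverse complement, both orientations behave the same, so one frame in three per orientation, 2 of 6 overall, yields a stop.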
Of the 148 in-frame insertions, 40 were within MuLV RT: 10 within the connection region and 30 within RNase H. The remaining 108 in-frame insertions mapped within the MuLV IN protein: 45 to the N-terminal zinc-binding domain (amino acids [aa] 1 to 105), 56 to the catalytic core (aa 106 to 286), and 7 to the C-terminal domain (aa 287 to 408).
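Tallying insertions by domain is a lookup against the domain boundaries. The sketch below bins IN residue positions using the MuLV IN boundaries stated above (aa 1 to 105, 106 to 286, and 287 to 408); the example positions are hypothetical placeholders, not the study's isolates.

```python
from bisect import bisect_right
from collections import Counter

# MuLV IN domain boundaries as given in the text: (first residue, domain name).
MULV_IN_DOMAINS = [
    (1, "N-terminal zinc-binding"),
    (106, "catalytic core"),
    (287, "C-terminal"),
]
MULV_IN_LENGTH = 408
_STARTS = [start for start, _ in MULV_IN_DOMAINS]

def domain_of(residue):
    """Return the MuLV IN domain containing a 1-based residue number."""
    if not 1 <= residue <= MULV_IN_LENGTH:
        raise ValueError(f"residue {residue} outside 1..{MULV_IN_LENGTH}")
    return MULV_IN_DOMAINS[bisect_right(_STARTS, residue) - 1][1]

def tally(residues):
    """Count insertions per domain."""
    return Counter(domain_of(r) for r in residues)

# Hypothetical insertion positions (IN residue preceding each linker).
example_positions = [3, 14, 105, 106, 200, 286, 287, 400]
print(tally(example_positions))
```

The counts reported above are internally consistent: 45 + 56 + 7 = 108 insertions in IN and 10 + 30 = 40 in RT, for 148 in total.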
Sequencing of the HIV-1 IN library showed that the insertions were distributed throughout the HIV-1 IN gene (Fig. 3 and Table 3). The insertion sites, however, were not randomly distributed, with clustering of insertions within the C-terminal domain. Of the 111 insertions, 2 were within the N-terminal domain, 35 were within the catalytic core, and 74 were within the C-terminal domain. Of these, 56 clones had unique insertion positions and the correct sequence.
Plasmid DNA of the individual constructs from the final pNCA-C-XN-SU8 insertion library was transiently introduced into D17/pJET cells in the presence of DEAE-dextran. On days of confluence, cells were screened for the release of reverse transcriptase into the supernatant. Figure 4 is an autoradiograph of one RT assay performed on day 16. The insertions were arranged within the 96-well plate in linear order from the 5' end to the 3' end of the pol gene. Rows A to G contained 84 in-frame insertions; a single termination insertion within the C terminus of IN (in5743. IH) was included. The positive controls, pNCA-C-XN-SU8 (H11) and pNCA-C (H12), are clearly positive for RT activity at this time point. Quite remarkably, two regions of viable insertions are readily detected in this series. The first 10 isolates are all viable. These correspond to the linker insertions beginning at the extreme C terminus of RNase H and extending into the N terminus of MuLV IN: in4583, in4603, in4607, in4628, in4629, in4638, in4640, in4641, in4647, and in4650. This indicates that insertions within the terminal 9 aa of MuLV RNase H and within the first 14 amino acids of MuLV IN are tolerated (Fig. 2). Within this region, one insertion from a separate assay series was found not to be viable (Table 1, in4614). This insertion lies within the protease recognition sequence and changes the P2' and P3' positions, converting STLL/IEN to STLL/IVF.
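The plate arrangement described above (ordered insertions filling rows A to G, with controls in H11 and H12) is a simple index-to-well mapping. A sketch, assuming row-major filling (A1 to A12, then B1, and so on), which the text implies but does not state; the isolate names are placeholders.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]  # A-H
COLS = 12

def well_name(index):
    """Row-major 0-based index -> well label (0 -> 'A1', 12 -> 'B1')."""
    if not 0 <= index < len(ROWS) * COLS:
        raise ValueError("index outside a 96-well plate")
    row, col = divmod(index, COLS)
    return f"{ROWS[row]}{col + 1}"

def layout(ordered_isolates, controls):
    """Place isolates 5'->3' in row-major order, then pin controls to fixed wells."""
    plate = {well_name(i): name for i, name in enumerate(ordered_isolates)}
    for well, name in controls.items():
        if well in plate:
            raise ValueError(f"control well {well} already used")
        plate[well] = name
    return plate

isolates = [f"isolate_{i:02d}" for i in range(84)]  # 84 = rows A-G x 12 columns
plate = layout(isolates, {"H11": "pNCA-C-XN-SU8", "H12": "pNCA-C"})
print(plate["A1"], plate["G12"], plate["H11"])
```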
The 16 viable viruses identified in this analysis appeared with a time course identical to that of the parental pNCA-C-XN virus. Unintegrated viral DNA was isolated from cultures infected with each passaged RT-positive virus by the method of Hirt (41). The terminal two-thirds of the pol gene was PCR amplified from the viral DNA, and this PCR product was sequenced in its entirety. All of the viral constructs maintained the linker insertion sequence encoding the PmeI site. No additional second-site mutations within MuLV IN were identified.
Mutations within MuLV RT-connection-RNase H. Of the 40 linker insertions within the C-terminal half of MuLV RT encoding connection and RNase H, only the three extreme C-terminal in-frame insertions were viable (in4583, in4603, and in4607). These define the sequences at the MuLV RT-IN junction. These results are surprising, as the preliminary X-ray structure of the MuLV RT contains unstructured or flexible loops in several regions within connection-RNase H. Figure 6 shows a molecular model of MuLV RT based upon the structures 1RW3 and MuLV RNase H domain (54). Gaps in the structure were reconstructed and are indicated, including the region between amino acids 327 to 334 (thumb), 475 to 504 (joining connection with the RNase H domain), 592 to 603 (RNase H), and 633 to 642 (RNase H). The positions of the linker insertions are mapped onto this MuLV RT model (Fig. 6). Although several of these insertions map within structurally undefined regions, none of the inserts were viable. These nonstructured regions display stringent requirements for correct replication of the virus in vivo.
HIV-1 IN N-terminal domain mutants.
The HIV-1 N-terminal domain consists of a three-helix bundle (Fig. 5B). Insertion sites are denoted here by the flanking residues (e.g., N27^L for an insertion between N27 and L28). Two insertions were identified at N27^L, located at one end of the helix bundle in the loop connecting the second (α2) and third (α3) helices. The two mutants were at the same position but had different inserted amino acid sequences. Both mutants retained full disintegration activity; however, integration activity was barely detectable. Two additional insertions, D55^C and P58^G, fall into the hinge region between the HIV-1 HHCC and core domains. In the two-domain crystal structure, this connecting region (residues 47 to 55) is disordered in all four monomers (87). These two insertions retained full disintegration activity and had moderate to full integration activity.
HIV-1 IN core mutants.
In the HIV-1 core domain (aa 50 to 186) (12), all insertions disrupted strand transfer activity. The requirements for strand transfer are more stringent than those for disintegration, and three regions that displayed low levels of disintegration were identified. These include three insertions (I135^K, N144^P, and S147^Q) located between α3 and α4 and a group of five insertions (D167^Q through M178^A) extending from the end of α4 into α5. Interestingly, six insertions distributed within α6' (A196 to E212) showed a gradient of increasing disintegration activity toward the C-terminal end of the helix. Of considerable interest, insertion E212^L maintained nearly full disintegration and integration activity. E212^L lies within the region connecting the core and C-terminal domains, which consists of an extended alpha-helix with a bend at its center.
HIV-1 IN C-terminal domain.
In the HIV-1 C-terminal domain, four regions with different levels of activity were identified, and the overall activity of each region increased toward the C terminus. In the first region, between β1" and β2", two insertions were identified (R228^D and S230^R) with barely detectable disintegration and no integration activity. The second region, which comprises β2" through β4", contained 10 insertion sites that retained a higher level of disintegration than the first region but exhibited no integration activity. Mutant G247^A, located just before β3", was the exception, as it retained full integration and disintegration activity. Interestingly, the two insertions immediately before and after G247 had no integration activity and decreased disintegration activity. The third region, which follows β5" (from I268 to V281), had similar levels of disintegration activity and retained moderate integration activity compared to wild-type HIV-1 IN. The fourth region, comprising insertions from R284 to the C terminus, retained full disintegration and integration activity.
Overall summary of HIV and MuLV IN analysis.
In summary, four regions retained full integration activity in this complementary in vivo and in vitro study of M-MuLV and HIV-1 IN, respectively. These correspond to the first 14 amino acids of IN (MuLV), the hinge region connecting the N-terminal and core domains (HIV), the region within the α6' helix connecting the core and C-terminal domains (MuLV and HIV), and the extreme C terminus of IN (MuLV and HIV).
| DISCUSSION |
|---|
Several systems have been developed for "genetic footprinting" of a gene based on generating a library of random insertions and screening those pools for selectable phenotypes. These systems are based on bacterial transposons, including Tn5, Tn7, and Mu, or on viruses (5, 42, 73, 78, 83). They have the potential to screen the entire population of insertions before and after a selection process through PCR-based positional mapping of the inserts. In this study, however, both systems were used to characterize individual isolates rather than the population as a whole. For the in vitro studies, selection for IN function is complex, and a high-throughput approach was developed. For the Tn7 system, the unique sequence of the insertion is limited to a 10-nucleotide region, which is too short for a PCR primer to hybridize specifically. Mapping insertions using a series of nested PCR products followed by PmeI digestion proved difficult, as the PCR products were not efficiently cleaved by PmeI. Because of this technical difficulty, this study focused on individual isolates whose insertion sites could be determined before introduction into tissue culture for selection. This approach also avoided several complications: it limited the number of termination insertions analyzed and reduced false positives resulting from complementation and/or recombination in mixed infections. Approximately 750 isolates were sequenced to identify the 178 unique isolates used in these studies. Duplicates were identified within this population of 750, indicating that the population analyzed was representative of the library.
The domain boundaries defined in these studies are in general agreement with previous biochemical studies. For MuLV RT, deletion studies in E. coli that identified a stable and active MuLV RT (pB6B15.23) (77) resulted in truncation of the seven terminal amino acids of RT/RNase H. This truncation lies within the 9-aa region at the C terminus of RT that tolerated linker insertions. Similarly, MuLV IN deletion constructs lacking the N-terminal 8 amino acids (p135-1) (76) bound DNA similarly to a full-length IN construct. Our results indicate that the N-terminal 13 amino acids of MuLV IN tolerated insertions in vivo, the one exception being the insertion that altered the protease recognition site (in4614). It should be noted that the N terminus of MuLV IN encodes 45 amino acids not conserved in either HIV or avian sarcoma virus-related INs (92). The region tolerant of 5-aa insertions at the N terminus of MuLV IN maps within this nonconserved region. Previous studies indicated that the MuLV IN C terminus could be truncated by 28 amino acids while maintaining virus viability (74). The present studies refine this boundary, demonstrating that truncation of 31 aa resulted in nonviable virus. Interestingly, the in-frame linker insertion in this coding region was viable, whereas insertions 3 amino acids upstream were not. These boundaries for IN function may assist in designing minimized IN constructs for crystallization studies.
In the HIV-1 IN N terminus, only two insertions, both at N27, were obtained. These insertions retained disintegration but had barely detectable levels of integration. Relevant to these mutants, a monoclonal antibody that interacts with amino acids 27 to 29 has been shown to destabilize the N-terminal helical bundle and decrease the 3' processing and strand transfer activities of HIV-1 IN in vitro (95). In addition, deletion of the N-terminal 39 aa is known to abolish integration activity (25). In the core domain of HIV-1 IN, using an extensive panel of mutants, we show that integration was abolished and disintegration was diminished by insertions between D64^C and E212^L, inclusive. HIV-1 IN disintegration requires only the core domain (residues 50 to 186) (12). Importantly, this set of mutants demonstrates the compactness of IN and underscores the complexity of the intramolecular and intermolecular interactions that IN must maintain during the integration process. We anticipated that some loop regions within the core might be more amenable to mutation, given their solvent accessibility in the monomer and dimer structures, such as the loops between the N-terminal α3 and the core β1', the core β5' and α4', and the core α5' and α6'. While we did not expect integration activity per se, we expected disintegration, since this activity may not require a higher-order complex. However, insertions located in the core loops between β5' and α4' and between α4' and α5' all lost integration activity and had no or barely detectable disintegration activity. These two regions retaining minimal disintegration activity correspond to an extended loop (residues 137 to 156) and a flanking region (residues 161 to 173), which are protected from proteolysis upon metal binding (2, 3). Substitution of Gly140 and Gly149 with more constrained Ala residues impaired catalysis by HIV-1 IN, indicating a requirement for some degree of conformational flexibility for catalytic activity (37). These two loops are believed to undergo significant movement to aid in the coordination of a metal ion by the catalytic triad (2, 3). Interestingly, residues 168 to 171 are also reported to contact the host factor LEDGF (20).
Previously, we and others showed that the C terminus of HIV-1 and M-MuLV IN can tolerate large C-terminal deletions and, as with the core, still retain considerable disintegration activity (12, 25, 46, 61). Here, we identify four regions in the HIV-1 IN C terminus with a gradient of increasing activity toward the carboxyl terminus. Insertional mutants after amino acid 239 in β2" and in the loop between β3" and β4" lost strand transfer activity while exhibiting full or moderate levels of disintegration activity. G247^A was an exception, as it retained full integration and disintegration activity. Interestingly, the insertions in β2" and β3", positioned immediately before and after G247, had no integration activity and low disintegration activity. The context of G247 differed between two molecular models of an HIV IN tetramer (72, 87). In contrast to the Wang et al. tetramer model (87), in the Podtelezhnikov et al. model (72) a 19-aa insertion could interfere with the binding of a putative LTR and sterically clash with the loop region (between β1' and β2') of another core molecule. Our results are consistent with this tetramer model. Insertions after I268 and before Q284 had similar levels of disintegration activity and retained moderate integration activity compared to wild-type IN. The terminal region, which comprised insertions after R284, retained full integration and disintegration activity.
It is of interest that, although functional complementation of MuLV IN was achieved in vitro using constructs that stably expressed the N-terminal zinc-binding domain (MuLV IN 1 to 105) together with the core-C-terminal fragment (MuLV IN 106 to 404) (91), no viable linker insertion was identified in vivo at the junction of the HHCC and core domains. In contrast, in the in vitro HIV-1 IN mutational study, three 19-amino-acid insertions at two positions (D55^C and P58^G) were identified at the transition between the HHCC and core domains; these retained full disintegration and strand transfer activity. On the basis of a structural tetramer model, the D55/C56/S57 sequence is proposed to lie in close proximity to HIV LTR positions 1 to 4 (16).
Although it is possible that the linkers substitute for the natural amino acids at a given position, in neither MuLV nor HIV-1 IN did we observe two in-frame insertions at the same position that produced different effects. Such differences might have been predicted, as the insertions frequently encode Cys, which could alter protein folding. Nevertheless, 6 of the 11 viable insertions within MuLV IN encoded Cys. In both the HIV-1 and MuLV IN studies, insertions with different sequences at the same position behaved identically, indicating that there was no positive selection for a Cys residue, for example, to stabilize the region. For MuLV IN, this is exemplified by in4628 PCLNTPY and in4628 PLFKQPY; for HIV-1 IN, the two insertions at D55↓C encode LSLVHILRPQDVYKRQQVD and CLLYTSCGRKMCTRDRQVD, and those at V250↓I encode CLLYTSCGRKMCTRDRAVV and LSLVHILRPQDVYKRQAVV.
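The argument above rests on comparing the cysteine content of insertions recovered at the same site. As a minimal illustrative sketch (not part of the original analysis), the Python snippet below tallies Cys residues in the three insertion pairs quoted in the text and flags pairs in which only one member carries a Cys; the dictionary keys and function names are hypothetical labels chosen here.

```python
# Insertion sequences quoted in the text, grouped by insertion site.
# Within each pair, both insertions were reported to behave identically.
PAIRS = {
    "MuLV in4628": ("PCLNTPY", "PLFKQPY"),
    "HIV-1 D55↓C": ("LSLVHILRPQDVYKRQQVD", "CLLYTSCGRKMCTRDRQVD"),
    "HIV-1 V250↓I": ("CLLYTSCGRKMCTRDRAVV", "LSLVHILRPQDVYKRQAVV"),
}


def cys_count(peptide: str) -> int:
    """Number of cysteine residues (one-letter code 'C') in a peptide."""
    return peptide.upper().count("C")


def cys_discordant(pair: tuple[str, str]) -> bool:
    """True if exactly one member of the pair contains at least one Cys."""
    a, b = pair
    return (cys_count(a) > 0) != (cys_count(b) > 0)


if __name__ == "__main__":
    for site, pair in PAIRS.items():
        counts = [cys_count(p) for p in pair]
        label = "Cys-discordant" if cys_discordant(pair) else "Cys-concordant"
        print(site, counts, label)
```

Every pair is Cys-discordant (for example, PCLNTPY contains one Cys and PLFKQPY none), which is the pattern the argument relies on: identical behavior despite differing Cys content.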
Insights into the boundaries of the insertion-tolerant region between the core and C terminus were obtained from these comparative studies. In M-MuLV, this region, encoding DPDMTRVTNSPSLQ, corresponds to the HIV-1 IN sequence IATDIQFKELQKQI (Fig. 5A). At the 5' end of this region, the closest nonviable 5-amino-acid insertion in MuLV IN lies 5 aa upstream. However, the closest insertion downstream of 5487 is at 5535, 16 aa C-terminal, so a more saturated library within this region would be required to define the downstream boundary precisely. The deletion study that identified a stable C-terminal construct mapped its boundary directly within this region, supporting this region as a domain boundary. The 19-amino-acid insertions within HIV-1 IN provide additional insights into these boundaries. A panel of insertional mutants within HIV-1 IN helix α6′ showed a gradient of increasing disintegration activity, with E212↓L active for both disintegration and integration. Insertion E212↓L maps within the 14-aa region homologous to MuLV IN (IATDIQFKELQKQI; insertion between E and L) (18). Regions C-terminal to the observed bend tolerated both 5-amino-acid insertions (MuLV IN, in vivo) and 19-amino-acid insertions (HIV-1 IN, in vitro). In contrast, the 19-aa insertion D207↓I, which also maps within the region homologous to MuLV IN (IATDIQFKELQKQI; insertion between D and I), is active for neither disintegration nor strand transfer. Thus, differences in the boundaries between HIV-1 and MuLV IN were identified. These may reflect the difference in insertion size, with 5 amino acids tolerated and 19 amino acids not, or structural differences in the assembly of IN multimers.
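The boundary arithmetic in this paragraph can be checked with a short sketch. It assumes, as a simplification, that the MuLV and HIV-1 segments quoted above align one-to-one without gaps and that 5487 and 5535 are in-frame nucleotide positions; the starting residue number 204 for the HIV-1 segment is inferred here from the D207 and E212 designations rather than stated in the text. All names are hypothetical.

```python
# Homologous segments quoted in the text (Fig. 5A).
MULV_SEGMENT = "DPDMTRVTNSPSLQ"
HIV_SEGMENT = "IATDIQFKELQKQI"
# Assumption: residue number of the first HIV-1 residue, inferred so that the
# D of D207 and the E of E212 land on the matching residues of the segment.
HIV_SEGMENT_START = 204


def codons_between(nt_a: int, nt_b: int) -> int:
    """Distance in codons between two nucleotide positions in the same frame."""
    delta = abs(nt_b - nt_a)
    if delta % 3:
        raise ValueError("positions are not in the same reading frame")
    return delta // 3


def aligned_site(flanks: str, query: str, partner: str) -> tuple[int, str]:
    """Find an insertion site, written as its two flanking residues (e.g. 'EL'),
    in `query`; return its 0-based index and the residue pair in the same
    columns of `partner`, assuming a gapless one-to-one alignment."""
    if len(query) != len(partner):
        raise ValueError("a gapless alignment needs equal-length segments")
    i = query.find(flanks)
    if i < 0:
        raise ValueError(f"{flanks!r} not found in query")
    return i, partner[i:i + 2]


if __name__ == "__main__":
    print("codons from 5487 to 5535:", codons_between(5487, 5535))
    for label, flanks in [("D207", "DI"), ("E212", "EL")]:
        i, mulv_pair = aligned_site(flanks, HIV_SEGMENT, MULV_SEGMENT)
        print(f"{label}: HIV-1 residue {HIV_SEGMENT_START + i}, MuLV flanks {mulv_pair}")
```

Under these assumptions, D207↓I falls opposite the MuLV M-T junction and E212↓L opposite the N-S junction, and the 48-nt gap between positions 5487 and 5535 corresponds to 16 codons.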
In both the MuLV and HIV IN studies, the results indicate considerable flexibility in the linkage between the catalytic core and C-terminal domain, whether through lengthening the distance between the two domains and/or increasing the discontinuity of the extended α-helix. It is not known whether insertions into the long α6′ helix that connects the core and C terminus confer an advantage on the virus. In a related insertional study of Cre recombinase, insertions into the M-N linker increased the cooperativity of DNA binding (71). In that system, it was proposed that lengthening the linker would lead to a smaller bend angle and thus stabilize partner Cre subunits bound to loxP. Similarly, increasing the distance between the core and C terminus of IN may assist in assembly of the synaptic complex, which consists of the two viral termini plus the target DNA. The arrangement of the C-terminal domain relative to the catalytic core differs among the HIV-1, simian immunodeficiency virus type 1, and Rous sarcoma virus IN X-ray structures (18, 19, 94).
The results of the linker insertions into the MuLV RT connection and RNase H domains were unexpected, as no viable insertions were identified outside the extreme C terminus. Figure 6 shows a molecular model of MuLV RT based on the structure of MuLV RT (1RW3; 443 residues, extending through residue 474), combined with the model of the MuLV RNase HΔC domain (54). To assist in mapping the linker insertions, the structurally undefined and deleted regions were reconstructed into this model as tubes. These include the region within the thumb (residues 327 to 334); the region extending from the connection domain downstream of residue 474 into the structurally unresolved region of RNase H (residues 475 to 504); the αC helix of RNase H; the region homologous to the His loop (23) of HIV-1 RNase H (residues H634 to H642); and residues 592 to 603 of RNase H. The function of the large structurally undefined region spanning residues 474 to 504 is of interest. Domain mapping based on in vitro RT activities (85, 86) placed the N terminus of RNase H at position 4542 of the DNA provirus (4093 of the viral RNA) (82). Therefore, in4100 localizes within the structurally undefined N terminus of RNase H, and in4113 at the beginning of the structured region of RNase H. The in vivo data presented in this paper correlate with the in vitro data, indicating that the N terminus of RNase H, despite being structurally undefined, is essential for RNase H activity. In addition, in4023 maps within the structurally undefined region of the RT connection domain. Molecular modeling placed residues 475 to 504 on the face of the RT molecule opposite the nucleic acid binding site, and this region was therefore thought to be nonessential. However, in4023 was nonviable in vivo. Interestingly, insertions within this region (M38, H7, and H2) (85) were found to be temperature sensitive for RT activity in vitro. Conformational changes within this region may be required for switching between the polymerase and RNase H activities or for allowing steric access to the active sites. Similarly, in Cre, flexible loops that did not tolerate insertions were identified, indicating a role in Cre function, possibly in protein assembly or DNA binding. The functions of these structurally uncharacterized loops in both RT and IN remain to be defined. The intrinsic flexibility of both enzymes may reflect the multiple activities and staged assembly steps required to specifically bind and recognize their cognate substrates (24, 51).
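The coordinate bookkeeping used here (proviral DNA versus viral RNA numbering) can be sketched as follows. The 449-nt offset is simply the difference between the two positions given for the RNase H N terminus (4542 and 4093); treating it as constant across this region, and reading insertion names such as in4100 as viral RNA positions, are assumptions of this sketch. Constant and function names are hypothetical.

```python
# Positions of the RNase H N terminus given in the text (ref. 82).
RNASEH_START_PROVIRUS = 4542  # proviral DNA numbering
RNASEH_START_RNA = 4093       # viral RNA numbering

# Assumption: a constant offset between the two numbering systems in this region.
OFFSET = RNASEH_START_PROVIRUS - RNASEH_START_RNA  # 449 nt


def rna_to_provirus(pos_rna: int) -> int:
    """Convert a viral RNA position to proviral DNA numbering (constant-offset assumption)."""
    return pos_rna + OFFSET


def codons_from_rnaseh_start(pos_rna: int) -> float:
    """Signed distance, in codons, from the mapped RNase H N terminus.

    Positive values lie downstream (into RNase H); negative values lie upstream
    (connection domain). The value is fractional when a site is not on a codon boundary.
    """
    return (pos_rna - RNASEH_START_RNA) / 3


if __name__ == "__main__":
    # Assumption: insertion names encode viral RNA positions.
    for name, pos in [("in4023", 4023), ("in4100", 4100), ("in4113", 4113)]:
        print(f"{name}: provirus {rna_to_provirus(pos)}, "
              f"{codons_from_rnaseh_start(pos):+.1f} codons from RNase H start")
```

With these assumptions, in4100 and in4113 lie roughly 2 and 7 codons downstream of the mapped RNase H N terminus, whereas in4023 lies about 23 codons upstream, within the connection domain.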
One aim of this mutational analysis was to identify sites within the IN protein that may tolerate small insertional tags whose function might alter target site selection by the viral integrases. Protein domains and tags have been inserted at both the N terminus (11, 48, 84) and the C terminus (13, 36, 48, 80, 81, 84) of retroviral IN constructs. The finding that the regions linking the N-terminal, core, and C-terminal domains of IN remain functional in the presence of a variety of linker insertions strongly suggests that these regions could serve as a third potential insertion site for short tags within the IN protein. Whether a tag at this site could mediate alternative protein-protein or protein-DNA interactions depends on its accessibility within the synaptic complex. Further biochemical and structural studies are required to address this question.
| ACKNOWLEDGMENTS |
|---|
We thank Jennifer Jones and Naadira McClean for their assistance.
| FOOTNOTES |
|---|
These authors contributed equally to the manuscript.
| REFERENCES |
|---|