
Journal of Virology, October 2006, p. 9497-9510, Vol. 80, No. 19
0022-538X/06/$08.00+0     doi:10.1128/JVI.00856-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Revealing Domain Structure through Linker-Scanning Analysis of the Murine Leukemia Virus (MuLV) RNase H and MuLV and Human Immunodeficiency Virus Type 1 Integrase Proteins

Jennifer Puglia,1,† Tan Wang,2,† Christine Smith-Snyder,1 Marie Cote,1 Michael Scher,1 Joelle N. Pelletier,4 Sinu John,3 Colleen B. Jonsson,2 and Monica J. Roth1*

Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, New Jersey 08854,1 Department of Biochemistry and Molecular Biology, Southern Research Institute, 2000 9th Ave. S., Birmingham, Alabama 35205,2 Graduate Program in Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294,3 Département de Chimie, Faculté des Arts et Sciences, et Département de Biochimie, Faculté de Médecine, Université de Montréal, C.P. 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada4

Received 26 April 2006/ Accepted 7 July 2006


    ABSTRACT
 
Linker-scanning libraries were generated within the 3' terminus of the Moloney murine leukemia virus (M-MuLV) pol gene encoding the connection-RNase H domains of reverse transcriptase (RT) as well as the structurally related M-MuLV and human immunodeficiency virus type 1 (HIV-1) integrase (IN) proteins. Mutations within the M-MuLV proviral vectors were Tn7 based and resulted in 15-bp insertions. Mutations within an HIV-1 IN bacterial expression vector were based on Tn5 and resulted in 57-bp insertions. The effects of the insertions were examined in vivo (M-MuLV) and in vitro (HIV-1). A total of 178 individual M-MuLV constructs were analyzed: 40 in-frame insertions within RT connection-RNase H, 108 in-frame insertions within IN, 13 insertions encoding stop codons within RNase H, and 17 insertions encoding stop codons within IN. For HIV-1 IN, 56 mutants were analyzed. In both M-MuLV and HIV-1 IN, regions were identified that functionally tolerate multiple-linker insertions. For MuLV, these correspond to the RT-IN proteolytic junction, the junction between the IN core and C terminus, and the C terminus of IN. For HIV-1 IN, in addition to the junction between the IN core and C terminus and the C terminus of IN, insertions between the N terminus and core domains maintained integration and disintegration activity. Of the 40 in-frame insertions within the M-MuLV RT connection-RNase H domains, only the three C-terminal insertions mapping to the RT-IN proteolytic junction were viable. These results correlate with deletion studies mapping the domain and subdomain boundaries of RT and IN. Importantly, these genetic footprints provide a means to identify nonessential regions within RT and IN for targeted gene therapy applications.


    INTRODUCTION
 
Methods have been developed for the comprehensive analysis of a gene by construction of a saturating or near-saturating library of mutants (5, 78, 83). This approach has defined domain boundaries, provided functional maps, and given insights into previously predicted unstructured loops (4, 5, 50, 71, 73, 78, 83). In this report, this method of insertional functional mapping is applied to three catalytically related domains: the Moloney murine leukemia virus (M-MuLV) RNase H domain of the reverse transcriptase (RT), and the M-MuLV and human immunodeficiency virus type 1 (HIV-1) integrase (IN) proteins. Inclusion of the HIV-1 IN protein assisted comparison and model building, since structural information is available (7, 18, 26, 34, 35, 37, 87). In the retroviral life cycle, the RNase H activity is required for viral replication during the conversion of the viral RNA (vRNA) into double-stranded DNA through the RNA-DNA intermediate. The IN protein is required for the insertion of the double-stranded DNA into the host chromosome, establishing the integrated provirus.

The replication and integration of retroviral particles are two distinct yet interrelated processes. Replicative complexes and preintegrative complexes have been purified and characterized from infected cells (6, 9, 17, 28-30, 39, 52, 53, 55, 56, 66, 67). Within viral species as well as between viral species, the composition of replicative complexes differs from that of preintegrative complexes. Interactions between RT and IN are also reported (40, 69, 89, 90, 96), and multiple mutations of IN are known to alter viral replication (27, 58-60). Despite extensive efforts, the assembly of these complexes is not well understood. These efforts have been aided by structural studies. A structure of the M-MuLV RT has recently been reported (21), as have structures of related retroviral IN subdomains (8, 14, 18, 19, 26, 35, 43, 87, 94). However, to date, neither a structure of a complete retroviral IN protein nor one of a subdomain in complex with DNA has been obtained.

The ability of retroviral particles to stably integrate into the host genome is a great benefit for gene delivery, but the potential for insertional mutagenesis cannot be overlooked (15, 22, 38, 63). Schemes to target integration into alternative positions within the host chromosome to avoid this issue frequently involve generation of fusion proteins with novel targeting domains (10, 48, 84). The linker insertion genetic footprint provides a means to identify nonessential regions within proteins capable of withstanding insertions. Extending these studies to include the RNase H domain provides a parallel analysis of a protein containing a related catalytic core consisting of an acidic catalytic triad.

In this report, the 3' terminus of the M-MuLV pol gene and the HIV-1 IN gene were subjected to random insertion mutagenesis. Individual constructs, selected from the library, were assayed for the effects on virus viability in vivo or IN functions in vitro. Using this complementary approach, four regions functionally tolerant of insertions were identified within RNase H-IN. These regions correlate with domain and protein junctions. No viable linker insertions were identified within any nonstructured regions of connection-RNase H.


    MATERIALS AND METHODS
 
Generation of plasmids. Construction and analysis of pNCA-C, a viable, replication-competent M-MuLV proviral construct, have been previously described (31). The pNCA-C-XN-SU8 M-MuLV proviral construct was derived from pNCA-C (74, 80). This construct contains a NotI linker within the XbaI site at the 3' terminus of the M-MuLV pol gene, yielding a 23-amino-acid C-terminal truncation of the MuLV IN protein plus a suppressor tRNA in the 3' long terminal repeat (LTR) (SU8) (57).

The 3' terminal two-thirds of the pol gene was subcloned into a minimal plasmid backbone for mutagenesis. The amp gene and the origin of replication of pGEM-3Zf(+) (Promega) were PCR amplified using primer Hpatag/45314 (5'-GCCGTTAACACATGTGAGCAAAAGGCC-3') and primer Bamtag/45315 (5'-CGGGATCCTTGAAAAAGGAAGAGTATG-3') using a mixture of 5 U Taq DNA pol (Invitrogen) and 2.5 U cloned Pfu (Stratagene). The 1.7-kb PCR product was digested with BamHI and HpaI and ligated to the 2.3-kb BamHI/HpaI fragment from pNCA-C-XN-SU8. The resulting plasmid, pGEM-BH-XN-BH, was used for mutagenesis.

To facilitate the reconstruction of the insertional library into pNCA-C-XN-SU8, a deletion within MuLV IN was generated. pNCA-C-XN-SU8 was partially digested with XmnI, and the linear product was isolated and digested with PmlI. The 10-kb DNA fragment was isolated and ligated to yield pNCA-C-XN-SU8-ΔIN. This deletion within IN maintains the SalI and NotI sites required for reconstruction of the library. pNCA-C-XN-SU8-ΔIN and pGEM-BH-XN-BH plasmids were generated in HB101 Escherichia coli cells.

Mutagenesis. M-MuLV mutagenesis was performed on pGEM-BH-XN-BH (80 ng) using the GPS-LS linker scanning system kit (NEB). The method is based on random Tn7 transposition (5) introducing the chloramphenicol resistance gene (Cmr). DNA was introduced into ElectroMAX DH10B cells (Gibco BRL) by electroporation. Chloramphenicol-resistant colonies (10⁵) were selected on one 245- by 245-mm plate. Colonies were scraped off the plate and pooled; the mutagenized pGEM-BH-BH-chlor plasmids were isolated (Midi protocol; QIAGEN) and maintained as a library. This initial library was digested with PmeI to remove the chloramphenicol resistance gene, ligated, and electroporated into ElectroMAX DH10B cells (Gibco BRL). Ampicillin-resistant colonies were pooled and lysed to isolate the pGEM-BH-XN-BH-15 constructs, which contained the 15-bp linker insertion encoding a PmeI site.

The EZ::TN in-frame linker insertion kit (Epicentre Biotechnologies) was used to generate a library of mutants of HIV-1 IN with 19-amino-acid insertions within the target plasmid pINSD.His (NIH AIDS Research and Reference Reagent Program) by following the manufacturer's protocols. Mutants were screened using a PCR-based strategy. The pinsdBscreen primer (5'-CGG GCT TTG TTA GCA GCC GG-3') and pinsdFscreen primer (5'-GGT GCC GCG CGG CAG CC-3'), which anneal to nucleotide positions 301 to 320 and 335 to 351 of the pET15b plasmid, respectively, were used to amplify the HIV-1 IN sequence. The PCR mixtures contained 1x PCR buffer from the Expand Long template PCR system (Boehringer Mannheim), 2.25 mM MgSO4, 0.2 mM deoxynucleoside triphosphate, 2 µM primers, 2 U Taq polymerase, and a toothpick trace of the glycerol stocks stored at –80°C. PCR conditions were as follows: 94°C for 4 min, followed by 35 cycles of denaturing at 94°C for 30 s, annealing at 69°C for 30 s, and an extension step at 72°C for 2 min and 15 s. This cycle was followed by a final extension period at 72°C for 4 min, which was followed by a hold at 4°C. After PCR, the samples were loaded onto a 1.5% agarose gel and examined to determine which of the clones were positive for linker insertion within the IN gene. Clones with insertions were individually digested with NotI according to the manufacturer's recommendations.
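As a rough illustration of the screening program described above, the cycling conditions can be written down as data and the programmed hold time totalled. This is only a sketch, not part of the published protocol: the names `PCR_PROGRAM` and `programmed_seconds` are hypothetical, and ramp times between temperatures are ignored.

```python
# Screening PCR program as reported in the text; ramp times are not included.
PCR_PROGRAM = {
    "initial_denaturation": (94, 4 * 60),  # (temperature in deg C, seconds)
    "cycles": 35,
    "cycle_steps": [
        ("denature", 94, 30),
        ("anneal", 69, 30),
        ("extend", 72, 2 * 60 + 15),
    ],
    "final_extension": (72, 4 * 60),
}


def programmed_seconds(program):
    """Sum the programmed hold times, excluding the final 4 deg C hold."""
    per_cycle = sum(seconds for _, _, seconds in program["cycle_steps"])
    return (
        program["initial_denaturation"][1]
        + program["cycles"] * per_cycle
        + program["final_extension"][1]
    )


total = programmed_seconds(PCR_PROGRAM)
print(f"{total} s programmed ({total / 60:.2f} min), excluding ramps")
```

Under these assumptions the extension step dominates the run time, which is worth keeping in mind when the screen is scaled across many 96-well blocks.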

Reconstruction of library into MuLV provirus. The 15-bp insertion library was reconstructed back into the pNCA-C-XN-SU8 provirus backbone. The SalI-NotI 2,060-bp fragment from the pGEM-BH-XN-BH-15 library was exchanged into the pNCA-C-XN-SU8-ΔIN, which was digested with the same enzymes. Library DNA was introduced into chemically competent UltraMAX DH5α-FT (Tetr) (Gibco BRL) and maintained on one 245- by 245-mm plate. Since the Tn7 transposition was performed on the BamHI-HpaI 2,281-bp region of MuLV within pGEM-BH-XN-BH, the possibility remained that the insertions occurred outside the SalI-NotI fragment utilized in the provirus reconstruction. To eliminate constructs in which the wild-type coding sequence was transferred, the library DNA was digested with PmeI, and the linear DNA was isolated and ligated to generate the final mutant library. This selected for pNCA-C-XN-SU8 plasmids bearing a PmeI linker insertion.

Single isolate mapping with MuLV. The position of the PmeI sites within MuLV (pNCA-C-XN-SU8) was determined by size analysis of the SalI/PmeI fragment released from the individual mutated plasmid library isolates. Automated sequencing was performed in the DNA core facility of Robert Wood Johnson Medical School (UMDNJ) using an appropriate primer determined after restriction mapping. Alternatively, individual colonies were directly sequenced with primers spanning the MuLV RT/IN coding region to identify PmeI-containing sequences. Approximately 750 individual colonies were isolated and screened for insertions. DNA sequencing of HIV-1 clones was performed with the ABI PRISM BigDye Primer v3.0 cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Applied Biosystems, Foster City, CA) to determine the site of the 19-codon insertion. Sequence data were analyzed with VectorNTI from InforMax Inc. (Frederick, MD).

Cell culture. The generation and maintenance of canine D17 cells expressing MCAT, the receptor for ecotropic M-MuLV (pJET) (1), have been previously described (68). Individual PmeI-encoding MuLV proviral constructs (100 ng each) from the final library were transiently introduced to 2 × 10⁴ D17/pJET cells (15 mm wells) in the presence of 150 µg/ml DEAE-dextran (64). Upon confluence, supernatants were collected and cells were passed to six-well (60 mm) plates for maintenance. Supernatants were collected on all subsequent days of confluence and assayed for RT activity (33). Viral DNA was isolated from RT-positive cultures using the method of Hirt (41).

PCR of MuLV viral DNA. Unintegrated MuLV viral DNA (41) was isolated from D17/pJET cells and used as a template for PCR in the presence of 100 pmol of primers JR6325L (5'-CAGTACTGACCCCTCTGAGCATC-3') and JR4085R (5'-ATCAAGCAAGCTCTTCTAACTGCC-3') using a mixture of Taq DNA pol (5 U; Invitrogen) and cloned Pfu (2.5 U; Stratagene). The amplified 2.2-kb product (bp 4085 to bp 6325 in the pNCA-C-XN-SU8 parental vector) was isolated from a 1% agarose gel using the QIAquick gel extraction kit protocol (QIAGEN) and subjected to automated DNA sequencing.

Expression of the M-MuLV IN C terminus. A directional deletion analysis was performed to select for a stable MuLV IN C terminus expression construct. The His6-thrombin-WTIN construct within a pET vector was digested with SphI and subjected to Bal31 digestion for five time points between 5 and 30 min. The DNA was digested with PstI, and the deletion fragments between 1 and 1.8 kbp were gel isolated. The plasmid pIN1-105 contains the His6-thrombin-leader followed by the IN 1-105 expressed downstream of an NdeI site (CΔ303) (91). The pIN1-105 was digested with NdeI and blunt ended by filling in with Klenow polymerase. After PstI digestion, the 4,568-bp fragment was isolated and ligated with the Bal31 deletion fragments. Individual colonies (total of 93) were analyzed for the size of the deletion, and 20 were further selected for protein expression in E. coli BL21(DE3). Isolate 77 was subjected to DNA sequence analysis.

Expression and purification of HIV IN and mutants. Wild-type HIV-1 IN and insertion mutants were expressed in E. coli BL21(DE3) cells in 50 ml of medium and purified as hexahistidine-tagged fusion proteins as described previously (88). Purification from 50-ml cultures yielded approximately 2 mg of 90 to 95% homogeneous protein. The protein fraction refolded at a concentration of 5 mg/ml exhibited the greatest enzymatic activity. HIV-1 IN precipitated upon addition of buffer C (0.2 M NaCl). The precipitated protein was resuspended in buffer D (0.5 M NaCl) to a final concentration of 1 mg/ml.

In vitro integration and disintegration assays of HIV-1 IN. Strand transfer and disintegration reactions were performed as described previously (88). Reaction products were separated on a 20% polyacrylamide denaturing gel and subjected to autoradiography or PhosphorImager screens (Molecular Dynamics). Products were quantified with ImageQuant software (Molecular Dynamics). Oligonucleotides were purified on 20% denaturing polyacrylamide gels, ³²P labeled at the 5' end with T4 polynucleotide kinase, and hybridized to complementary strands as previously described (47). Unincorporated radioactivity was removed from labeled integration and disintegration substrates with G-25 or G-50 Quick Spin columns (Boehringer, Mannheim, IN).

Molecular modeling of MuLV RT. A three-dimensional model of the M-MuLV RT was reconstructed using the 1RW3 crystal structure comprising the fingers, palm, thumb, and connection domains (21) and the preliminary RNase H ΔC crystal structure (54; Wayne Hendrickson, personal communication). A crude full model was generated using the O program (45) by positioning the RNase H ΔC domain into the diffuse electron density observed in the 1RW3 structure. Nonstructured regions were molecularly modeled using the O program and subjected to energy minimization using the AMBER suite of programs (70). Regions reconstructed include residues 327 to 334 (at the tip of the thumb), residues 475 to 504 (between connection and RNase H), and residues 592 to 603 and 633 to 642 (RNase H). Additional modification included insertions of the C-helix from E. coli RNase H (1G15) (32) between the B- and D-helices of the M-MuLV RNase H domain, mutating the residues to the correct M-MuLV residues and subjecting the final model to energy minimization. The corresponding figures were generated using MOLSCRIPT (49) and Raster3D (65).

Structural model of HIV-1 IN monomer. The structural model of the HIV-1 IN monomer was constructed from a combination of two X-ray crystal structures, represented by PDB codes 1k6y (the two-domain finger/core) and 1ex4 (the two-domain core/C terminus). The "A" molecule core region of 1k6y was superimposed onto the "A" molecule core region of 1ex4 using the program O. The 1k6y structure comprises residues 1 to 46, 56 to 139, and 149 to 210, and 1ex4 comprises residues 56 to 141 and 145 to 270. Thus, the superpositioning consisted of overlaying the Cα atoms of all common core residues (root mean square deviation, 0.83 angstroms). Where the model contained disordered regions (residues 47 to 55 and 142 to 144, inclusive), polyalanine segments containing the correct number of amino acids were created and moved into the appropriate linking positions in the model. The Ala residues were then changed to the proper residues, and the regions were subjected to least-squares minimization. Similarly, residues 271 to 288 (absent from 1ex4) were created using a polyalanine chain, mutating to the appropriate amino acid residues, followed by energy minimization.
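The residue bookkeeping in this paragraph can be checked with simple set arithmetic. The sketch below is illustrative only (the helper names `residues` and `to_ranges` are hypothetical): it takes the residue ranges quoted above for 1k6y and 1ex4, derives the residues common to both structures (those available for the superposition), and lists the residues of a 288-residue monomer present in neither structure, which are the segments that had to be built.

```python
# Residue ranges observed in each crystal structure, as listed in the text.
STRUCTURES = {
    "1k6y": [(1, 46), (56, 139), (149, 210)],
    "1ex4": [(56, 141), (145, 270)],
}
FULL_LENGTH = 288  # the monomer model spans residues 1 to 288


def residues(ranges):
    """Expand inclusive (start, end) ranges into a set of residue numbers."""
    return {r for start, end in ranges for r in range(start, end + 1)}


def to_ranges(numbers):
    """Collapse a set of residue numbers back into inclusive ranges."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)
        else:
            ranges.append((n, n))
    return ranges


a = residues(STRUCTURES["1k6y"])
b = residues(STRUCTURES["1ex4"])
common_core = to_ranges(a & b)  # residues present in both structures
missing = to_ranges(set(range(1, FULL_LENGTH + 1)) - (a | b))  # must be modeled

print("common:", common_core)
print("modeled:", missing)
```

The gaps recovered this way (47 to 55, 142 to 144, and 271 to 288) are the same segments the text describes as rebuilt from polyalanine.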


    RESULTS
 
Figure 1 outlines the series of steps used to generate the linker insertion library within the M-MuLV proviral construct pNCA-C-XN-SU8 (panel A) and the HIV-1 IN expression construct (panel B). For M-MuLV, the target fragment encoding the 3' terminal two-thirds of the pol gene (2.3 kb BamHI/HpaI fragment) was first subcloned into a minimal plasmid encoding ori/amp, generating pGEM-BH-XN-BH. The Tn7 mutagenesis system results in the random insertion of the transposon encoding the chloramphenicol resistance gene throughout the plasmid. The use of a minimal plasmid biases the nonessential regions to be within the target viral insert. With the target sequence 2,281 bp in size, the generation of 10⁵ Cmr colonies was indicative of an extensive mutational library. The colonies were pooled, and the plasmid DNA was isolated as a population and digested with PmeI to remove the chloramphenicol resistance gene. After ligation, the remainder of the Tn7 element reconstitutes a 15-bp linker insertion encoding a PmeI site. This population of 5-amino-acid insertions was selected for Ampr, colonies were pooled, and the plasmid DNA was isolated as a population.
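To give a sense of scale for the library size, a back-of-the-envelope coverage estimate can be sketched as follows. It rests on assumptions that are not in the text: the colony count is taken as 10^5, every colony is treated as a single independent insertion within the 2,281-bp target, and insertions are treated as uniform over positions and both orientations. The result is therefore an optimistic upper bound, and the function name `mean_coverage` is hypothetical.

```python
TARGET_BP = 2281   # BamHI-HpaI target fragment, from the text
COLONIES = 10**5   # chloramphenicol-resistant colonies, from the text
ORIENTATIONS = 2   # assumption: the transposon can insert in either orientation


def mean_coverage(colonies, target_bp, orientations=ORIENTATIONS):
    """Average number of colonies per (position, orientation) combination."""
    return colonies / (target_bp * orientations)


print(f"about {mean_coverage(COLONIES, TARGET_BP):.0f} colonies per possible insertion")
```

Insertions landing in the plasmid backbone, and any site preference of the transposase, would lower the effective coverage below this figure.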


FIG. 1. (A) Generation of the M-MuLV pol insertional library. The seven steps required to generate the insertional library within the retroviral proviral construct are outlined. The top figure schematically outlines the pNCA-C XN construct, containing the viral LTR and gag, pol, and env genes. The region of the pol gene encoding the RT, connection (C), and RNase H (R) domains and the IN protein subjected to Tn7 insertional mutagenesis (GPS-LS linker scanning system; NEB) are expanded. Restriction sites utilized and their positions within the M-MuLV viral RNA (82), where appropriate, are BamHI (B, 3535), SalI (S, 3705), XbaI (X, 5766), NotI (N), HpaI (H, 5816), and PmeI (P). (B) Generation of the HIV-1 insertional library. The five steps required to generate the insertional library are outlined. The region of the IN gene encoding the protein was subjected to Tn5 insertional mutagenesis, which contains the kanamycin resistance gene between its short 19-bp mosaic end (ME) Tn5 transposase recognition sequences. NotI restriction sites flanking the ME also are shown.

 
Reconstruction of the library back into a retroviral backbone utilized a proviral construct bearing a deletion of the IN gene, decreasing the possibility of wild-type (WT) sequences within the library. Reconstruction was facilitated by the presence of a unique NotI site introduced at an XbaI site at the C terminus of IN (75). This mutation truncates the C-terminal 23 amino acids of MuLV IN and maintains virus viability (75). The PmeI-bearing pGEM-BH-XN-BH plasmid library was digested with the unique restriction enzymes SalI and NotI, and the library was regenerated by fragment exchange into the pNCA-C-XN-SU8 proviral backbone. With this approach, it is possible that a small number of WT sequences could be transferred, if the initial transposition occurred either within the 170-bp region between the BamHI and SalI sites at the 5' end or if it occurred in the 50-bp region between the NotI and HpaI sites at the 3' end. To eliminate these particular constructs from the library, the reconstructed pNCA-C-XN-SU8-PmeI library was digested with PmeI, and the linear DNA was isolated, religated, and transformed back into E. coli to generate the final library. WT MuLV does not encode a PmeI site and would be eliminated during this process.
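The fragment sizes quoted in this paragraph follow from the restriction-site coordinates given in the Fig. 1 legend. The short sketch below recomputes them and estimates what share of insertions would fall in the two flanking regions. Two assumptions are made here that are not in the text: NotI is placed at the XbaI coordinate (the NotI linker was introduced at that site), and insertions are taken to be uniform along the fragment.

```python
# Restriction-site positions in the M-MuLV viral RNA, from the Fig. 1 legend.
# Assumption: NotI is placed at the XbaI coordinate, where the linker was introduced.
SITES = {"BamHI": 3535, "SalI": 3705, "NotI": 5766, "HpaI": 5816}

target = SITES["HpaI"] - SITES["BamHI"]      # mutagenized BamHI-HpaI fragment
five_prime = SITES["SalI"] - SITES["BamHI"]  # mutagenized but not transferred
three_prime = SITES["HpaI"] - SITES["NotI"]  # mutagenized but not transferred

# Assumption: insertions are uniformly distributed along the fragment.
outside_fraction = (five_prime + three_prime) / target

print(target, five_prime, three_prime, f"{outside_fraction:.1%}")
```

Under the uniform assumption, roughly one insertion in ten would lie outside the exchanged SalI-NotI fragment, which is why the second PmeI linearization step is useful for removing wild-type transfers.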

The Tn5 mutagenesis system was used to create a library of mutants within HIV-1 IN. The generation of over 2,000 Kanr colonies was indicative of a large-scale mutational library. To make the screening process high throughput, each individual colony was picked and transferred into 96-well culture blocks, and PCR-based screening for insertions was conducted. In total, 1,056 colonies were analyzed for the presence of the Tn5 transposon insertion. One hundred eleven clones were positive for a single insertion within the HIV-1 IN gene; of these, 56 were unique.

Insertion sites of M-MuLV and HIV-1 individual isolates. The final pNCA-C-XN-SU8 PmeI insertion library for M-MuLV was characterized by analyzing individual isolates. Isolates of the final library were subjected to restriction mapping and sequencing analysis (summarized in Fig. 2 and Tables 1 and 2). The 15-bp insertion generated by the linker scanning system resulted in a 5-amino-acid insertion in 4/6 reading frames and a TAA stop codon in 2/6 reading frames. Sequencing and restriction mapping of isolates from the library demonstrated that insertions were distributed throughout the fragment. The insertion sites, however, were not randomly distributed, with clustering of insertions within the center of the fragment. This could indicate a preference for a specific structure within the plasmid DNA by the transposase enzyme or reflect an inadequate sampling of the large population of constructs within the library. However, within the population examined, a large number of duplicate isolates were identified, indicating that the sample size was representative. A total of 148 in-frame insertions were identified. The ratio of in-frame inserts to those with stop codons was as predicted. In the initial screen of 80 individual colonies analyzed, 67 had unique insertion sites that could be readily sequenced. Approximately 2/5 (29/67) individual constructs had mutations that resulted in stop codons; 37 constructs resulted in the insertion of 5 amino acids. One isolate bore a deletion of 20 essential amino acids within the core region of MuLV IN.


FIG. 2. Functional mapping of the linker insertions on the MuLV pol gene products. The figure summarizes the linker insertions and their effects on retroviral viability. ○, nonviable termination inserts; •, viable termination inserts; ▽, nonviable in-frame insertions; ▼, viable in-frame insertions. Asterisks (*) indicate linker insertions previously characterized (74, 79). The insertion marked with a plus sign was temperature sensitive for replication and integration. Amino acid numbering within RT and IN is indicated at the left and right edges. The protease cleavage site marking the junction between MuLV RT and IN is indicated above the sequence. MuLV RT aa 515 is marked, indicating the N terminus of the domain homologous to E. coli RNase H (93). The HHCC N-terminal domain of MuLV IN corresponds to IN1-105 (91). The position of T287 of MuLV IN is indicated, marking the N terminus of the MuLV IN C-terminal domain (Fig. 7). The coding region subjected to mutagenesis in this study includes the sequence N'-DEKQ... .GGPS-C'.

 

TABLE 1. Summary of in-frame insertions

 

TABLE 2. Summary of terminations

 
The composition of 5-amino-acid inserts is determined by the target site selected and duplicated during the transposition process as well as the sequences encoding the PmeI restriction site. Depending upon the reading frame, the in-frame insertions will encode a core of either CLN or FKQ/H (Table 1). The insertions are therefore not simple aliphatic side chains but contain bulky and often reactive or charged species. Similarly, the TAA stop codon cannot be avoided, as it encodes the core of the PmeI site (GTTTAAAC).
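The reading-frame argument above can be illustrated by translating the PmeI recognition sequence in each of its three forward frames. This is a minimal sketch using only the 8-bp site quoted in the text and the standard genetic code; the full 15-bp linker and the flanking target duplication are not modeled, and the helper names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PMEI_SITE = "GTTTAAAC"  # from the text


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frames(seq):
    """Translate the complete codons of seq in each of the three forward frames."""
    frames = {}
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames[offset] = "".join(CODON_TABLE[c] for c in codons)
    return frames


frames = translate_frames(PMEI_SITE)
print(frames)
print("palindromic:", reverse_complement(PMEI_SITE) == PMEI_SITE)
```

One frame reads through TAA, while the other two read F-K and L-N, matching the FKQ/H and CLN cores. Because the site is its own reverse complement, the opposite orientation gives the same three readings, which is consistent with a stop codon in 2 of 6 frame and orientation combinations.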

Of the 148 in-frame insertions, 40 were within MuLV RT: 10 within the connection region and 30 within RNase H. The remaining 108 in-frame insertions mapped within the MuLV IN protein; 45 mapped to the N-terminal zinc-binding domain (amino acids [aa] 1 to 105), 56 to the catalytic core (aa 106 to 286), and 7 to the C-terminal domain (aa 287 to 408).

Sequencing of the HIV-1 IN library showed that the insertions were distributed throughout the HIV-1 IN gene (Fig. 3 and Table 3). The insertion sites, however, were not randomly distributed, with clustering of insertions within the C-terminal domain. Of the 111 insertions, 2 were within the N-terminal domain, 35 were within the catalytic core, and 74 were within the C-terminal domain. Of these, 56 clones had unique insertion positions and the correct sequence.
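The insertion counts reported for the two libraries can be tallied to confirm that the per-domain numbers add up to the stated totals. The sketch below simply restates the counts given in the text and the abstract; the dictionary names are hypothetical.

```python
# Counts as reported in the text.
MULV_IN_FRAME = {
    "RT connection": 10,
    "RT RNase H": 30,
    "IN N-terminal domain": 45,
    "IN catalytic core": 56,
    "IN C-terminal domain": 7,
}
MULV_STOP = {"RNase H": 13, "IN": 17}
HIV_IN = {"N-terminal domain": 2, "catalytic core": 35, "C-terminal domain": 74}

mulv_in_frame_total = sum(MULV_IN_FRAME.values())
mulv_total = mulv_in_frame_total + sum(MULV_STOP.values())
hiv_total = sum(HIV_IN.values())

print(mulv_in_frame_total, mulv_total, hiv_total)
for domain, n in HIV_IN.items():
    print(f"HIV-1 IN {domain}: {n / hiv_total:.0%}")
```

The totals agree with the 148 in-frame and 178 overall M-MuLV constructs and with the 111 HIV-1 IN insertions, and the per-domain shares make the C-terminal clustering in HIV-1 IN explicit.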


FIG. 3. Mutation functional map of insertions of HIV IN. The position of each insertion (indicated by an arrow) and its activity (color coded) relative to disintegration (circle) and strand transfer activity (square) are shown in the alignment of the HIV-1 and MuLV IN proteins. The amino acid sequence alignment of MuLV and HIV-1 IN was based on the method of Johnson et al. (44). Dots indicate alignment gap/insertion. Numbering from the N terminus of MuLV IN includes alignment gaps. The GenBank accession number for MuLV IN sequences is NC_001501. Known structural elements of HIV-1 IN, determined by crystallography of recombinant HIV-1 IN (18, 87), are also shown (bold horizontal lines) above the respective homologous segments. Their PDB accession numbers are 1K6Y and 1EX4, respectively. Core structural elements are labeled with a prime ('); C-terminal elements are labeled with a double prime ("). HHCC and DDE motifs are highlighted in red. Activity is based on the WT activity set to 100%: –, 0%; ±, 0 to 5%; +, 6 to 35%; ++, 36 to 75%; +++, 76 to 100%.
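The activity scale in the legend amounts to a simple binning rule, sketched below. The function name `activity_symbol` is hypothetical, and the treatment of bin edges is an assumption: each bin is taken to include its upper limit, so a value falling between two listed integer limits (for example 5.5%) goes to the next bin up. An ASCII hyphen stands in for the legend's dash.

```python
def activity_symbol(percent_of_wt):
    """Map activity (percent of wild type) to the symbols of the Fig. 3 legend.

    Assumption: each bin includes its upper bound, so values between the
    listed integer limits (for example 5.5) fall into the next bin up.
    """
    if percent_of_wt <= 0:
        return "-"
    if percent_of_wt <= 5:
        return "±"
    if percent_of_wt <= 35:
        return "+"
    if percent_of_wt <= 75:
        return "++"
    return "+++"


for value in (0, 3, 20, 50, 90):
    print(value, activity_symbol(value))
```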

 

TABLE 3. Summary of HIV-1 IN insertions

 
In vivo analysis of individual M-MuLV isolates. The viability of individual viral constructs was tested for the passage of transiently expressed virus in tissue culture. Three series of viral constructs were analyzed. The first consisted of a random mixture of both in-frame and terminating codon insertions spanning the complete target sequence. Since the pol gene is expressed as a precursor protein containing protease, reverse transcriptase, and integrase, it was predicted that termination codons within RT would be lethal, resulting in loss of MuLV IN protein. Twenty-nine termination codon insertions were included for analysis. The second series contained linker insertion mutations that mapped to the MuLV IN protein plus one termination codon at the C terminus of IN, and the third series included mutations within the MuLV RT connection and RNase H domains.

Plasmid DNA of the individual constructs from the final pNCA-C-XN-SU8 insertion library was transiently introduced into D17/pJET cells in the presence of DEAE-dextran. On days of confluence, cells were screened for the release of reverse transcriptase into the supernatant. Figure 4 is an autoradiograph of one RT assay performed on day 16. The insertions were arranged within the 96-well plate in a linear order from the 5' end to the 3' end of the pol gene. Rows A to G contained 84 in-frame insertions; a single termination insertion within the C terminus of IN (in5743. IH) was included. The positive controls, pNCA-C-XN-SU8 (H11) and pNCA-C (H12), are clearly positive for RT activity at this time point. Quite remarkably, two regions of viable insertions are readily detected in this series. The first 10 isolates are all viable. These correspond with the linker insertions initiating at the extreme C terminus of RNase H and spanning into the N terminus of MuLV IN. These include in4583, in4603, in4607, in4628, in4629, in4638, in4640, in4641, in4647, and in4650. This indicates that insertions within the first 14 amino acids of MuLV IN are tolerated as well as the terminal 9 aa of MuLV RNase H (Fig. 2). Within this region, one insertion from a separate assay series was found not to be viable (Table 1, in4614). This insertion is within the protease recognition sequence and results in the substitution of the P2' and P3' positions from STLL/IEN to STLL/IVF.


FIG. 4. RT assay. RT assay of 85 individual isolates 16 days after transfection into D17/pJET cells (see Materials and Methods). RT-positive constructs are as follows: A1, in4583-15; A2, in4603-15; A3, in4607-15; A4, in4628-15; A5, in4629-15; A6, in4638-15; A7, in4640-15; A8, in4641-15; A9, in4647-15; A10, in4650-15; G7, in5450-15; G8, in5465-15; G9, in5487-15; H11, XN, parental vector pNCA-C-XN-SU8 (positive control); H12, WT, full-length pNCA-C M-MuLV proviral vector.

 
Of considerable interest are the three consecutive insertions (G7 to 9) (Fig. 4) consisting of in5450, in5465, and in5487. These three insertions span a 12-aa region between the core and C-terminal domain. This region has not been previously explored in mutational analyses of IN. Using the homology of MuLV and HIV-1 IN defined by McClure et al. (62), the equivalent region of HIV-1 IN was identified. Figure 5A shows the mapping of this region onto the two-domain structure (1EX4) of the HIV-1 IN core-C terminus (18). The region of HIV-1 IN homologous to MuLV IN spanning in5450 to in5487 is shown, as are the HIV-1 IN core domain and C terminus. The region connecting the C terminus and core consists of an extended alpha-helix, containing a central bend. The homologous insertion-tolerant region maps within this alpha-helical domain, centered at the bend. The net result of the 5-amino-acid insertion would be to lengthen the distance between the two domains and/or increase the discontinuity of the extended alpha-helix.


FIG. 5. (A) MuLV viable domain mapped onto the HIV-1 core-C terminus structure (1EX4). The 14-amino-acid region in MuLV IN spanning insertions in5450-15 through in5487-15 (DPDMTRVTNSPSLQ) was tolerant of 5 amino acid insertions in vivo. This region corresponds to the HIV-1 IN sequence IATDIQFKELQKQI (44), which is highlighted in red (A204 to I217 of the A molecule in 1EX4 is taken from the two-domain structure of the HIV-1 core-C terminus [18]). The HIV-1 core domain is colored blue; the C terminus is yellow. The C terminus ends at amino acid 271. The figure was generated in MOLSCRIPT V 2.0 (49). (B) A three-dimensional structural model of the HIV-1 monomer (aa 1 to 288). The locations of the insertion mutations and their subsequent effects on disintegration and strand transfer activity are shown using the color scheme corresponding to Fig. 3. Amino acid numbering within HIV-1 IN is shown in white. The large spheres denote disintegration activity and the widened colored linear portions denote strand transfer activity.

 
A third region of the MuLV IN protein was found nonessential. This mapped to the extreme C terminus of MuLV IN. Interestingly, insertion in4742 resulting in the in-frame insertion of AVFKAAA (insertion shown in boldface type) was viable (Fig. 2), whereas the terminator insertion in5743 encoding AA*TAA was nonviable (Fig. 2). These studies more finely define the nonessential region of the C terminus of MuLV IN. Linker insertions and truncational studies that mapped three amino acids upstream were previously reported to be nonviable (Fig. 2), whereas in-frame insertions and truncations mapping 2 amino acids downstream were viable (Fig. 2) (74). Of the terminator insertions, only one, in5764, was viable. This mapped within the region previously identified to be nonessential (74).

The 16 viable viruses identified in this analysis appeared with a time course identical to that of the parental pNCA-C-XN virus. The RT+ virus passaged in this study was isolated and utilized to isolate the unintegrated viral DNA by the method of Hirt (41). The terminal two-thirds of the pol gene was PCR amplified from the viral DNA. This PCR product was sequenced in its entirety. All of the viral constructs maintained the linker insertion sequence encoding the PmeI site. No additional second-site mutations within MuLV IN were identified.

Mutations within MuLV RT-connection-RNase H. Of the 40 linker insertions within the C-terminal half of MuLV RT encoding connection and RNase H, only the three extreme C-terminal in-frame insertions were viable (in4583, in4603, and in4607). These define the sequences at the MuLV RT-IN junction. These results are surprising, as the preliminary X-ray structure of the MuLV RT contains unstructured or flexible loops in several regions within connection-RNase H. Figure 6 shows a molecular model of MuLV RT based upon the structures 1RW3 and MuLV RNase H domain (54). Gaps in the structure were reconstructed and are indicated, including the region between amino acids 327 to 334 (thumb), 475 to 504 (joining connection with the RNase H domain), 592 to 603 (RNase H), and 633 to 642 (RNase H). The positions of the linker insertions are mapped onto this MuLV RT model (Fig. 6). Although several of these insertions map within structurally undefined regions, none of the inserts were viable. These nonstructured regions display stringent requirements for correct replication of the virus in vivo.


FIG. 6. Position of the linker insertions within M-MuLV RT-RNase H. The figure shows two views, differing by 180°, of the molecular model of the M-MuLV RT. The individual subdomains are colored as follows: finger-palm, salmon; thumb, pink; connection, blue; and RNase H, green. The catalytic triad (D524, E562, and D583) is shown as space-filled orange spheres. The loop structures introduced into structurally undefined regions are yellow. The position of each individual linker insertion is shown in red. Amino acid positions within MuLV RT are shown in black. The figure was generated in MOLSCRIPT V 2.0.

 
Expression of MuLV IN C-terminal domain. The positioning of the viable insertions between nucleotides 5450 and 5487 is indicative of a domain boundary between the core and C terminus. To confirm this through biochemical means, a deletion analysis of the MuLV IN protein was performed to identify a stably expressed C-terminal domain. Of the 93 deletion constructs generated, 20 with deletions beyond position 5137 of MuLV (82) were further analyzed for protein expression. It was predicted that one in three N-terminal directional deletions would result in an in-frame deletion. However, only one construct reproducibly yielded an abundant, stable MuLV C-terminal IN protein. Figure 7 shows the screening of five individual constructs, where isolate 77 expressed a single 17-kDa protein. DNA sequence analysis identified the N terminus of this protein to be TNSP, corresponding with Thr287 of MuLV IN (marked in Fig. 2). This maps to nucleotide 5471 (82) in the center of the region defined by the linker insertion analysis. Additional studies indicated that the IN 287 to 408 construct (no. 77) could be purified from soluble E. coli extracts by nickel-affinity chromatography (data not shown). These results confirm, using biochemical data, that the boundary between the core and C-terminal domain lies within this region.
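The reading-frame arithmetic behind this paragraph can be checked with a minimal Python sketch. It assumes that deletion endpoints are distributed evenly over the three reading frames and that in5450 sits at the first codon of the tolerant peptide DPDMTRVTNSPSLQ given in the Fig. 5 legend. Both are illustrative assumptions, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

CODON = 3  # nucleotides per codon


def expected_in_frame(n_constructs: int) -> Fraction:
    """Expected number of in-frame deletions if endpoints are uniform over frames."""
    return n_constructs * Fraction(1, CODON)


def codon_offset(nt_from: int, nt_to: int) -> Tuple[int, int]:
    """Whole codons and leftover nucleotides between two nucleotide positions."""
    return divmod(nt_to - nt_from, CODON)


# Of 20 constructs screened, roughly a third would be expected in frame.
print(round(float(expected_in_frame(20)), 1))

# Distance from in5450 to the mapped N terminus (nt 5471) and to in5487.
print(codon_offset(5450, 5471))
print(codon_offset(5450, 5487))

# Position of TNSP within the tolerant peptide (0-based residue offset).
peptide = "DPDMTRVTNSPSLQ"
print(peptide.index("TNSP"))
```

Under these assumptions the offset of TNSP within the peptide (7 residues) matches the 7 whole codons separating nucleotides 5450 and 5471, and the 37 nucleotides between in5450 and in5487 correspond to about 12 codons.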


FIG. 7. Expression of a stable C-terminal IN domain. Whole-cell E. coli extracts of individual deletion constructs of the MuLV IN N terminus. Extracts were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis, followed by Coomassie blue staining. Five individual colonies are shown. Lane 1, isolate 1; lane 2, isolate 3; lane 3, isolate 74; lane 5, isolate 77; lane 6, isolate 35. The positions of the protein standards are indicated at the left. The arrow marks the stable C-terminal IN287-408 MuLV protein product of approximately 17 kDa (isolate 77).

 
In vitro analysis of individual HIV-1 IN mutants. Fifty-six insertional mutant proteins were expressed, purified, and assessed for strand transfer and disintegration activity (Table 3 and Fig. 3). These activities varied and are summarized below based on the position of the mutation within the N-terminal, core, and C-terminal domains of the HIV-1 IN protein.

HIV-1 IN N-terminal domain mutants. The HIV-1 N-terminal domain is made of a three-helix bundle (Fig. 5B). Two insertions were identified at N27↓L, located at one end of the helix bundle in the loop connecting the second (α2) and third (α3) helices. The two mutants were at the same position but had different amino acid sequence insertions. Both of the mutants retained full disintegration activity; however, integration activity was barely detectable. Two additional insertions, D55↓C and P58↓G, fall into the hinge region between the HIV-1 HHCC and core domains. In the two-domain crystal structure, this connecting region (residues 47 to 55) is disordered in all four monomers (87). These two insertions retained full disintegration activity and had moderate to full integration activity.

HIV-1 IN core mutants. In the HIV-1 core domain (aa 50 to 186) (12), all insertions resulted in disruption of strand transfer activity. The requirements for strand transfer are more stringent than disintegration, and three regions that displayed low levels of disintegration were identified. These include three insertions (I135↓K, N144↓P, S147↓Q) located between α3 and α4 and a group of five mutations (D167↓Q through M178↓A) from the end of α4 into α5. Interestingly, six insertions that were distributed within α6' (A196 to E212) showed a gradient of increasing disintegration activity as one moves toward the C-terminal end of the helix. Of considerable interest, insertion E212↓L maintained nearly full disintegration and integration activity. E212↓L is within the region connecting the C terminus and core, which consists of an extended alpha-helix with a bend at the center.

HIV-1 IN C-terminal domain. In the HIV-1 C-terminal domain, four different regions of activity were identified, and the overall activity of each region increased toward the C terminus. In the first region, between β1" and β2", two insertions were identified (R228↓D and S230↓R) with barely detectable disintegration and no integration activity. The second region, which comprises β2" through β4", revealed 10 insertion sites that retained a higher level of disintegration than the first region but exhibited no integration activity. Mutant G247↓A, which is just before β3", was the exception, as it retained full integration and disintegration activity. Interestingly, two insertions, which are right before and after G247, had no integration activity and were decreased in disintegration activity. The third region, which is after β5" (from I268 to V281), had similar levels of activity in disintegration and retained moderate integration activity compared to wild-type HIV-1 IN. The fourth region, which comprised insertions at R284 to the C terminus, retained full disintegration and integration activity.

Overall summary of HIV and MuLV IN analysis. In summary, four regions retained full integration activity in this complementary in vivo and in vitro study of M-MuLV and HIV-1 IN, respectively. These correspond to the first 14 amino acids of IN (MuLV), the hinge region connecting the N-terminal and core domains (HIV), the region within the α6' helix connecting the core and C-terminal domains (MuLV and HIV), and the extreme C terminus of the IN (MuLV and HIV).


    DISCUSSION
The retroviral genome has evolved to encode multifunctional proteins expressed within polyproteins. These compact viral particles must assemble, infect, replicate, and integrate the viral genome using limited enzymatic functions. In this study, we have used two parallel transposon-based mutational systems (Tn5 and Tn7), differing in the size of the insertion, to create functional maps of the M-MuLV and HIV-1 IN proteins. Studies in M-MuLV extend the region within the 3' terminus of the pol gene to include the connection and RNase H domains of RT. Analysis of 178 mutations in MuLV and 57 mutations in HIV-1 IN indicates that the nonessential regions tolerant of amino acid insertions are limited. These regions localize to protein and domain boundaries: between the RT and IN, between the N terminus and the core of IN, at the C terminus of IN, and between the core and C terminus of IN. Although these results are nonsaturating, the data indicate functional conservation even within regions shown to be disordered within crystallographic structures.

Several systems have been developed for "genetic footprinting" of a gene based upon the generation of a library of random inserts and screening those pools for selectable phenotypes. The systems are based on bacterial transposons, including Tn5, Tn7, and Mu, or viruses (5, 42, 73, 78, 83). These systems have the potential to screen the entire population of insertions before and after a selection process through positional mapping of the inserts by PCR. The two systems utilized in this study characterized individual isolates rather than the population as a whole. For the in vitro studies, selection for IN function is complex, and a high-throughput approach was developed. For the Tn7 system, the unique sequence of the insertion is limited to a 10-nucleotide region, which is insufficient to direct a PCR primer to specifically hybridize. Mapping insertions using a series of nested PCR products followed by PmeI digestion proved difficult, as the PCR products were not efficiently cleaved by PmeI. Due to this technical difficulty, this study focused on analysis of individual isolates whose insertion sites could be predetermined prior to introduction into tissue culture for selection. This approach eliminated several additional complications, including limiting the number of termination insertions analyzed as well as decreasing the number of false positives resulting from complementation and/or recombination of mixed infections. Approximately 750 isolates were sequenced to identify the 178 unique isolates utilized in these studies. Within this population of 750, duplicates were identified, indicating that the population analyzed was representative of the library.

The domain boundaries defined in these studies are in general agreement with previous biochemical studies. For the MuLV RT, deletion studies in E. coli which identified a stable and active MuLV RT (pB6B15.23) (77) resulted in the truncation of the seven terminal amino acids of RT/RNase H. This truncation is within the 9-aa region at the C terminus of RT, which was tolerant of linker insertions. Similarly, MuLV IN deletion constructs (p135-1) (76), which lacked the N-terminal 8 amino acids, bound DNA similar to a full-length IN construct. The results of these studies indicate that the N-terminal 13 amino acids of MuLV IN tolerated insertions in vivo. The one exception was the insertion that altered the protease recognition site (in4614). It should be noted that the N terminus of MuLV IN encodes 45 amino acids not conserved in either HIV or avian sarcoma virus-related INs (92). The region tolerant of 5-aa insertions at the N terminus of MuLV IN maps within this nonconserved region. Previous studies indicated that the MuLV IN C terminus could be truncated by 28 amino acids and maintain virus viability (74). These studies refine this region, demonstrating that truncation of 31 aa resulted in nonviable virus. Interestingly, the in-frame linker insertion at this coding region was viable, whereas insertions 3 amino acids upstream were not. These boundaries for IN function may assist in expressing minimized IN constructs for crystallization studies.

In the HIV-1 IN N terminus, only two insertions, at N27, were obtained. These insertions retained disintegration but had barely detectable levels of integration. Relevant to our mutants, it has been shown that a monoclonal antibody which interacts with amino acids 27 to 29 destabilizes the N-terminal helical bundle and decreases 3' processing and transfer activities of HIV-1 IN in vitro (95). In addition, it is known that deletion of the N-terminal 39 aa abolishes integration activity (25). In the core domain of HIV-1 IN, using an extensive panel of mutants, we show that integration was abolished and disintegration was diminished with insertions between D64↓C and E212↓L inclusive. HIV-1 IN disintegration requires only the core domain (residues 50 to 186) (12). Importantly, this set of mutants demonstrates the compactness of IN and underscores the complexity of intramolecular and intermolecular interactions that IN must maintain during the integration process. In our studies, it was anticipated that some of the loop regions within the core might be more amenable to mutation given the solvent accessibility shown in the monomer and dimer structures, such as the loops between the N-terminal α3 and the core β1', the core β5' and α4', and the core α5' and α6'. While we did not expect integration activity per se, we expected disintegration, since this activity may not require a higher-order complex. However, in our studies, insertions located in the core loops between β5' and α4' and between α4' and α5' all lost integration activity and had no or barely detectable disintegration activity. These two regions retaining minimal disintegration activity correspond to an extended loop (residues 137 to 156) and a flanking region (residues 161 to 173), which are protected from proteolysis upon metal binding (2, 3). Substitution of Gly140 and Gly149 with more constrained Ala residues impaired catalysis of HIV-1 IN, indicating a requirement for some degree of conformational flexibility for catalytic activity (37). These two loops are believed to undergo significant movement to aid in the coordination of a metal ion by the catalytic triad (2, 3). Interestingly, residues 168 to 171 are also reported to contact the host factor LEDGF (20).

Previously, we and others had shown that the C terminus of HIV-1 and M-MuLV IN can tolerate large C-terminal deletions and, similar to the core, can still retain considerable disintegration activity (12, 25, 46, 61). Herein, we show four different regions in the HIV-1 IN C terminus with a gradient of increasing activity as one moves toward the carboxyl terminus. Insertional mutants after amino acid 239 in β2" and in the loop between β3" and β4" lost strand transfer activity while exhibiting full or moderate levels of disintegration activity. G247↓A was an exception, as it retained full integration and disintegration activity. Interestingly, the insertions in β2" and β3", positioned before and after G247, had no integration activity and low disintegration activity. The context of G247 differed within two molecular models of an HIV IN tetramer (72, 87). In contrast to the Wang tetramer model (72, 87), a 19-aa insertion in the Podtelezhnikov et al. model (72, 87) could interfere with the binding of a putative LTR and sterically clash with the loop region (between β1' and β2') of another core molecule. Our results are consistent with this tetramer model. Insertions after I268 and before Q284 had similar levels of activity in disintegration and retained moderate integration activity compared to wild-type IN. The terminal region, which comprised insertions after R284, retained full integration and disintegration activity.

It is of interest that, although functional complementation of MuLV IN was achieved in vitro using constructs that stably expressed the N-terminal zinc binding domain (MuLV IN1-105) with the core-C terminus fragment (MuLV IN 106 to 404) (91), no viable linker insertion was identified in vivo at the junction of the HHCC domain and the core domain. However, in the case of the in vitro HIV-1 IN mutational study, three 19-bp insertions at two positions (D55↓C and P58↓G) were identified at the transition between the HHCC and core domain, which retain full activity in both disintegration and strand transfer activity of HIV-1 IN. The D55/C56/S57 sequence is proposed to be involved in close proximity with the HIV LTR positions 1 to 4, based on a structural tetramer model (16).

Although it is possible that the linkers are substituting for natural amino acids at that position, we did not observe instances where two in-frame insertions at the same position resulted in differential effects in either MuLV or HIV-1. Differential effects might have been predicted, as the insertions frequently encode Cys, which could alter the protein folding. However, within the MuLV IN, 6 of 11 viable insertions encoded Cys. In both the HIV-1 and MuLV IN studies, insertions at the same coding sequence were identified that behaved identically, indicating there was not a positive selection for a Cys residue to, for example, stabilize the region. For MuLV IN, this is exemplified within in4628 PCLNTPY and in4628 PLFKQPY; for HIV, the two insertions at D55↓C encode LSLVHILRPQDVYKRQQVD and CLLYTSCGRKMCTRDRQVD and those at V250↓I encode CLLYTSCGRKMCTRDRAVV and LSLVHILRPQDVYKRQAVV.
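The Cys content of the paired insertions quoted above can be tallied with a short Python sketch. The peptide strings are copied from the text; the dictionary labels and the function name `count_cys` are hypothetical and serve only as an illustration.

```python
# Insertion peptides quoted in the text, grouped by insertion site.
# Labels are illustrative; sequences are taken verbatim from the paragraph above.
PAIRED_INSERTIONS = {
    "MuLV in4628": ["PCLNTPY", "PLFKQPY"],
    "HIV-1 D55/C": ["LSLVHILRPQDVYKRQQVD", "CLLYTSCGRKMCTRDRQVD"],
    "HIV-1 V250/I": ["CLLYTSCGRKMCTRDRAVV", "LSLVHILRPQDVYKRQAVV"],
}


def count_cys(peptide: str) -> int:
    """Count cysteine residues (one-letter code C) in a peptide."""
    return peptide.upper().count("C")


for site, peptides in PAIRED_INSERTIONS.items():
    counts = [count_cys(p) for p in peptides]
    print(site, dict(zip(peptides, counts)))
```

At each of the three sites, one member of the pair contains Cys and the other does not, which is the comparison the paragraph relies on.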

Insights into the boundaries defining the insertion-tolerant region between the core and C terminus were obtained in these comparative studies. In M-MuLV, this region, encoding DPDMTRVTNSPSLQ, corresponds with HIV-1 IN sequence IATDIQFKELQKQI (Fig. 5A). At the 5' terminus, the closest nonviable 5-amino-acid insertion in MuLV IN is 5 aa upstream. However, the closest insertion downstream of 5487 is at 5535, 16 aa C-terminal. A more-saturated library within this region would be required to define the downstream boundary. The deletion study that identified a stable C-terminal construct mapped directly within this region, supporting this as a domain boundary. The 19-amino-acid insertions within HIV-1 IN provide additional insights into these boundaries. A panel of insertional mutants within the HIV-1 IN α6' showed a gradient of increasing disintegration activity, with E212↓L active for both disintegration and integration. Insertion E212↓L maps within the 12-aa region homologous to MuLV (IATDIQFKELQKQI, where EL is underlined) (18). Sites C-terminal to the observed bend tolerated insertions of both 5 and 19 amino acids, in vivo in MuLV IN and in vitro in HIV-1 IN, respectively. The 19-aa insertion D207↓I maps within the region homologous to MuLV IN (IATDIQFKELQKQI, with the DI insertion site underlined) yet is not active for disintegration or strand transfer activity. Thus, differences in the boundaries between HIV-1 and MuLV IN were identified. This may reflect the differences in the size of the insertions, where 5 amino acids are tolerated and 19 amino acids are not, or structural differences in the assembly of IN multimers.
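The residue numbering and nucleotide distances used in this comparison can be cross-checked with a brief Python sketch. It assumes the HIV-1 peptide IATDIQFKELQKQI starts at residue 204, as implied by the Fig. 5 legend (residues 204 to 217); the function names are hypothetical.

```python
HIV_REGION = "IATDIQFKELQKQI"  # HIV-1 IN peptide homologous to the MuLV region
HIV_START = 204                # assumed first residue number (Fig. 5 legend)


def residue_number(motif: str) -> int:
    """Residue number of the first residue of `motif` within HIV_REGION."""
    return HIV_START + HIV_REGION.index(motif)


def nt_to_aa(nt_from: int, nt_to: int) -> int:
    """Whole codons between two nucleotide positions."""
    return (nt_to - nt_from) // 3


# Insertion sites named in the text: D207 (before I) and E212 (before L).
print(residue_number("DI"), residue_number("EL"))

# Last residue of the peptide under the assumed start.
print(HIV_START + len(HIV_REGION) - 1)

# Distance from in5487 to the next downstream insertion at 5535.
print(nt_to_aa(5487, 5535))
```

With the assumed start at residue 204, the DI and EL motifs fall at residues 207 and 212, matching the D207 and E212 insertion sites, and the 48 nucleotides between 5487 and 5535 correspond to the 16 aa quoted in the text.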

In both MuLV and HIV IN studies, the results indicate considerable flexibility in the linkage between the catalytic core and C-terminal domain, either through lengthening the distance between the two domains and/or increasing the discontinuity of the extended alpha-helix. It is not known whether the insertions into the long α6' helix that connects the core and C terminus present a favorable condition for the virus. In the related insertional study of the Cre recombinase, insertions into the M-N linker increased DNA binding cooperativity (71). In this system, it was proposed that extending the length of the linker would lead to a smaller bend angle and thus stabilize partner Cre subunits binding to the loxP. In a similar manner, extending the distance between the core and C terminus in IN may assist in the assembly of the synaptic complex consisting of the two viral termini plus the target DNA. The arrangement of the C-terminal domain relative to catalytic core differs among HIV-1, simian immunodeficiency virus type 1, and Rous sarcoma virus IN X-ray structures (18, 19, 94).

The results of the linker insertions into the MuLV RT-connection and RNase H domains were unexpected, as no viable mutations outside the extreme C terminus were identified. Figure 6 contains a molecular model of the MuLV RT, based on the structure of the MuLV RT (1RW3, 443 residues, encoding through residue 474), plus the model of the MuLV RNase H ΔC domain (54). To assist in mapping the linker insertions, the structurally undefined and deleted regions were reconstructed into this model as tubes. These include the region within the thumb (residues 327 to 334), the region in the connection domain downstream of residue 474 through to the structurally unelucidated region within RNase H (residues 475 to 504), the α-C helix of RNase H, the region homologous to the His loop (23) in HIV-1 RNase H (residues H634 to H642), and residues 592 to 603 of RNase H. The function of the large structurally undefined region between residues 474 and 504 is of interest. Domain mapping using in vitro RT activities (85, 86) mapped the N terminus of RNase H to position 4542 of the DNA provirus (4093 of the viral RNA) (82). Therefore, in4100 localizes within the structurally undefined N terminus of RNase H and in4113 at the beginning of the RNase H structured region. The in vivo data presented in this paper correlate with the in vitro data, indicating that the N terminus of RNase H, despite being structurally undefined, is essential for RNase H activity. In addition, in4023 maps within the structurally undefined region of the RT connection domain. By molecular modeling, residues 475 to 504 were placed on the opposite face of the RT molecule from where the nucleic acid binding site lies, and it was therefore believed that it may reflect a nonessential region of RT. However, in4023 was found to be nonviable in vivo. Interestingly, insertions within this region (M38, H7, and H2) (85) were found to be temperature sensitive for RT activity in vitro. Conformational changes within this region may be required for switching between the polymerase and RNase H activities or to allow steric access to the active sites. Similarly, in Cre, flexible loops were identified which were not tolerant to insertions, indicating their role in Cre function, possibly protein assembly or DNA binding. The function of these structurally uncharacterized loops in both RT and IN needs to be defined. The intrinsic flexibility of both these enzymes may reflect the multifunctional activities and staged assembly steps required to specifically bind and recognize their cognate substrates (24, 51).

One aim of this mutational analysis was to identify sites within the IN protein that may tolerate small insertional tags whose function may alter the target site selection of the viral integrases. Protein domains and tags have been inserted both into the N terminus (11, 48, 84) and C terminus (13, 36, 48, 80, 81, 84) of retroviral IN constructs. The identification of the region between the N terminus, the core, and C terminus of IN as functional in the presence of a variety of linker insertions strongly suggests that this region could serve as a third potential insertion site for short tags within the IN protein. The ability of this site to function in alternative protein-protein or protein-DNA interactions depends on its accessibility within the synaptic complex. Further biochemical and structural studies are required to address this question.


    ACKNOWLEDGMENTS
 
This work was supported by NIH grants RO1 GM070837 issued to M.J.R. and GM07666-24 to C.B.J.

We thank Jennifer Jones and Naadira McClean for their assistance.


    FOOTNOTES
 
* Corresponding author. Mailing address: Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, NJ 08854. Phone: (732) 235-5048. Fax: (732) 235-4783. E-mail: roth@rwja.umdnj.edu.

† These authors contributed equally to the manuscript.


    REFERENCES

  1. Albritton, L. M., L. Tweng, D. Scadden, and J. M. Cunningham. 1989. A putative murine retrovirus receptor gene encodes a multiple membrane-spanning protein and confers susceptibility to virus infection. Cell 57:659-666.[CrossRef][Medline]
  2. Asante-Appiah, E., S. H. Seeholzer, and A. M. Skalka. 1998. Structural determinants of metal-induced conformational changes in HIV-1 integrase. J. Biol. Chem. 273:35078-35087.[Abstract/Free Full Text]
  3. Asante-Appiah, E., and A. Skalka. 1997. A metal-induced conformational change and activation of HIV-1 integrase. J. Biol. Chem. 272:16196-16205.[Abstract/Free Full Text]
  4. Auerbach, M., C. Shu, A. Kaplan, and I. Singh. 2003. Functional characterization of a portion of the Moloney murine leukemia virus gag gene by genetic footprinting. Proc. Natl. Acad. Sci. USA 100:11929-11930.[Free Full Text]
  5. Biery, M. C., F. J. Stewart, A. E. Stellwagen, E. A. Raleigh, and N. L. Craig. 2000. A simple in vitro Tn7-based transposition system with low target site selectivity for genome and gene analysis. Nucleic Acids Res. 28:1067-1077.[Abstract/Free Full Text]
  6. Bowerman, B., P. O. Brown, J. M. Bishop, and H. E. Varmus. 1989. A nucleoprotein complex mediates the integration of retroviral DNA. Genes Dev. 3:469-478.[Abstract/Free Full Text]
  7. Bujacz, G., J. Alexandratos, Z. Qing, C. Clement-Mella, and A. Wlodawer. 1996. The catalytic domain of human immunodeficiency virus integrase: ordered active site in the F185H mutant. FEBS Lett. 398:175-178.[CrossRef][Medline]
  8. Bujacz, G., M. Jaskolski, J. Alexandratos, A. Wlodawer, G. Merkel, R. A. Katz, and A. M. Skalka. 1995. High-resolution structure of the catalytic domain of avian sarcoma virus integrase. J. Mol. Biol. 253:333-346.[CrossRef][Medline]
  9. Bukrinsky, M. I., N. Sharova, T. L. McDonald, T. Pushkarskaya, W. G. Tarpley, and M. Stevenson. 1993. Association of integrase, matrix, and reverse transcriptase antigens of human immunodeficiency virus type 1 with viral nucleic acids following acute infection. Proc. Natl. Acad. Sci. USA 90:6125-6129.[Abstract/Free Full Text]
  10. Bushman, F. 1995. Targeting retroviral integration. Science 267:1443-1444.[Free Full Text]
  11. Bushman, F. D. 1994. Tethering human immunodeficiency virus 1 integrase to a DNA site directs integration to nearby sequences. Proc. Natl. Acad. Sci. USA 91:9233-9237.[Abstract/Free Full Text]
  12. Bushman, F. D., A. Engelman, I. Palmer, P. Wingfield, and R. Craigie. 1993. Domains of the integrase protein of human immunodeficiency virus type 1 responsible for polynucleotidyl transfer and zinc binding. Proc. Natl. Acad. Sci. USA 90:3428-3432.[Abstract/Free Full Text]
  13. Bushman, F. D., and M. D. Miller. 1997. Tethering Human immunodeficiency virus type 1 preintegration complexes to target DNA promotes integration at nearby sites. J. Virol. 71:458-464.[Abstract]
  14. Cai, M., R. Zheng, M. Caffrey, R. Craigie, G. M. Clore, and A. M. Gronenborn. 1997. Solution structure of the N-terminal zinc binding domain of HIV-1 integrase. Nat. Struct. Biol. 4:567-577.[CrossRef][Medline]
  15. Calmels, B., C. Ferguson, M. O. Laukkanen, R. Adler, M. Faulhaber, H.-J. Kim, S. Sellers, P. Hematti, M. Schmidt, C. von Kalle, K. Akagi, R. E. Donahue, and C. E. Dunbar. 2005. Recurrent retroviral vector integration at the MDS1-EVI1 locus in non-human primate hematopoietic cells. Blood 106:2530-2533.[Abstract/Free Full Text]
  16. Chen, A., I. T. Weber, R. W. Harrison, and J. Leis. 2006. Identification of amino acids in HIV-1 and avian sarcoma virus integrase subsites required for specific recognition of the long terminal repeat ends. J. Biol. Chem. 281:4173-4182.[Abstract/Free Full Text]
  17. Chen, H., and A. Engelman. 1998. The barrier-to-autointegration protein is a host factor for HIV type 1 integration. Proc. Natl. Acad. Sci. USA 95:15270-15274.[Abstract/Free Full Text]
  18. Chen, J. C.-H., J. Krucinski, L. J. W. Miercke, J. S. Finer-Moore, A. H. Tang, A. D. Leavitt, and R. M. Stroud. 2000. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc. Natl. Acad. Sci. USA 97:8233-8238.[Abstract/Free Full Text]
  19. Chen, Z., Y. Yan, S. Munshi, Y. Li, J. Zugay-Murphy, B. Xu, M. Witmer, P. Felock, A. Wolfe, V. Sardana, E. A. Emini, D. Hazuda, and L. C. Kuo. 2000. X-ray structure of simian immunodeficiency virus integrase containing the core and C-terminal domain (residues 50-293)-an initial glance of the viral DNA-binding platform. J. Mol. Biol. 296:521-533.[CrossRef][Medline]
  20. Cherepanov, P., A. Ambrosio, S. Rahman, T. Ellenberger, and A. Engelman. 2005. Structural basis for the recognition between HIV-1 integrase and transcriptional coactivator p75. Proc. Natl. Acad. Sci. USA 102:17308-17313.[Abstract/Free Full Text]
  21. Das, D., and M. Georgiadis. 2004. The crystal structure of the monomeric reverse transcriptase from Moloney murine leukemia virus. Structure (Cambridge) 12:819-829.
  22. Dave, U. P., N. A. Jenkins, and N. G. Copeland. 2004. Gene therapy insertional mutagenesis insights. Science 303:333.
  23. Davies, J. F., III, Z. Hostomska, Z. Hostomsky, S. R. Jordan, and D. A. Matthews. 1991. Crystal structure of the ribonuclease H domain of HIV-1 reverse transcriptase. Science 252:88-95.
  24. Dayam, R., and N. Neamati. 2004. Active site binding modes of the beta-diketoacids: a multi-active site approach in HIV-1 integrase inhibitor design. Bioorg. Med. Chem. 12:6371-6381.
  25. Drelich, M., R. Wilhelm, and J. Mous. 1992. Identification of amino acid residues critical for endonuclease and integration activities of HIV-1 IN protein in vitro. Virology 188:459-468.
  26. Dyda, F., A. B. Hickman, T. M. Jenkins, A. Engelman, R. Craigie, and D. R. Davies. 1994. Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science 266:1981-1986.
  27. Engelman, A. 1999. In vivo analysis of retroviral integrase structure and function. Adv. Virus Res. 52:411-426.
  28. Farnet, C. M., and W. A. Haseltine. 1991. Determination of viral proteins present in human immunodeficiency virus type 1 preintegration complex. J. Virol. 65:1910-1915.
  29. Fassati, A., and S. P. Goff. 2001. Characterization of intracellular reverse transcription complexes of human immunodeficiency virus type 1. J. Virol. 75:3626-3635.
  30. Fassati, A., and S. P. Goff. 1999. Characterization of intracellular reverse transcription complexes of Moloney murine leukemia virus. J. Virol. 73:8919-8925.
  31. Felkner, R. H., and M. J. Roth. 1992. Mutational analysis of N-linked glycosylation sites of the SU protein of Moloney murine leukemia virus. J. Virol. 66:4258-4264.
  32. Goedken, E., and S. Marqusee. 2001. Co-crystal of Escherichia coli RNase HI with Mn2+ ions reveals two divalent metals bound in the active site. J. Biol. Chem. 276:7266-7271.
  33. Goff, S. P., P. Traktman, and D. Baltimore. 1981. Isolation and properties of Moloney murine leukemia virus mutants; use of a rapid assay for release of virion reverse transcriptase. J. Virol. 38:239-248.
  34. Goldgur, Y., R. Craigie, G. H. Cohen, T. Fujiwara, T. Yoshinaga, T. Fujishita, H. Sugimoto, T. Endo, H. Murai, and D. R. Davies. 1999. Structure of the HIV-1 integrase catalytic domain complexed with an inhibitor: a platform for antiviral drug design. Proc. Natl. Acad. Sci. USA 96:13040-13043.
  35. Goldgur, Y., F. Dyda, A. B. Hickman, T. M. Jenkins, R. Craigie, and D. R. Davies. 1998. Three new structures of the core domain of HIV-1 integrase: an active site that binds magnesium. Proc. Natl. Acad. Sci. USA 95:9150-9154.
  36. Goulaouic, H., and S. A. Chow. 1996. Directed integration of viral DNA mediated by fusion proteins consisting of human immunodeficiency virus type 1 integrase and Escherichia coli LexA protein. J. Virol. 70:37-46.
  37. Greenwald, J., V. Le, S. Butler, F. Bushman, and S. Choe. 1999. The mobility of an HIV-1 integrase active site loop is correlated with catalytic activity. Biochemistry 38:8892-8898.
  38. Hacein-Bey-Abina, S., C. Von Kalle, M. Schmidt, M. P. McCormack, N. Wulffraat, P. Leboulch, A. Lim, C. S. Osborne, R. Pawliuk, E. Morillon, R. Sorensen, A. Forster, P. Fraser, J. I. Cohen, G. de Saint Basile, I. Alexander, U. Wintergerst, T. Frebourg, A. Aurias, D. Stoppa-Lyonnet, S. Romana, I. Radford-Weiss, F. Gross, F. Valensi, E. Delabesse, E. Macintyre, F. Sigaux, J. Soulier, L. E. Leiva, M. Wissler, C. Prinz, T. H. Rabbitts, F. Le Deist, A. Fischer, and M. Cavazzana-Calvo. 2003. LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science 302:415-419.
  39. Hansen, M. S., and F. D. Bushman. 1997. Human immunodeficiency virus type 2 preintegration complexes: activities in vitro and response to inhibitors. J. Virol. 71:3351-3356.
  40. Hehl, E. A., P. Joshi, G. V. Kalpana, and V. R. Prasad. 2004. Interaction between human immunodeficiency virus type 1 reverse transcriptase and integrase proteins. J. Virol. 78:5056-5067.
  41. Hirt, B. 1967. Selective extraction of polyoma DNA from infected mouse cell cultures. J. Mol. Biol. 26:365-371.
  42. Hoffman, L., J. Jendrisak, R. Meis, I. Goryshin, and W. S. Reznikoff. 2000. Transposome insertional mutagenesis and direct sequencing of microbial genomes. Genetica 108:19-24.
  43. Hyde, C. C., F. D. Bushman, T. C. Mueser, and Z.-N. Yang. 1999. Crystal structure of an active two-domain derivative of Rous sarcoma virus integrase. J. Mol. Biol. 296:535-538.
  44. Johnson, M. S., M. A. McClure, D. F. Feng, J. Gray, and R. F. Doolittle. 1986. Computer analysis of retroviral pol genes: assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes. Proc. Natl. Acad. Sci. USA 83:7648