Journal of Virology, June 2008, p. 5279-5294, Vol. 82, No. 11
0022-538X/08/$08.00+0 doi:10.1128/JVI.02631-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
,
Jeremiah S. Joseph,2,
Kumar S. Saikatendu,2
Pedro Serrano,3
Amarnath Chatterjee,3
Margaret A. Johnson,3
Lujian Liao,2
Joseph P. Klaus,1
John R. Yates III,2
Kurt Wüthrich,3,4
Raymond C. Stevens,3
Michael J. Buchmeier,1* and
Peter Kuhn2,3*
Molecular and Integrative Neurosciences Department,1 Department of Cell Biology,2 Department of Molecular Biology,3 Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, California 92037,4 School of Biological Sciences, University of Reading, Whiteknights, RG6 6AJ Reading, United Kingdom5
Received 11 December 2007/ Accepted 16 March 2008
AVT183, 815LKGG API821, and 2737LKGG KIV2743 to release nsp1, nsp2, and nsp3, respectively. In current coronavirus terminology, the term "nonstructural protein" usually refers to peptides processed from pp1a and pp1ab, while "structural protein" refers to the N, M, S, and E proteins, which interact to coordinate the structure of the virion lipidic envelope (39). The term "accessory protein" refers to group- or subgroup-specific proteins, some of which may be incorporated in virions. A typical virion may contain the viral RNA genome, plus tens to hundreds of copies of N, M, and S proteins; a few E proteins (16); and an unknown but presumably small quantity of accessory proteins such as the SARS-CoV ORF3a (22), ORF6 (21), ORF7a (20), and ORF7b (51) proteins. Incorporation of the accessory ORF9b protein can be inferred from incorporation of the homologous I protein of murine hepatitis virus (MHV) (13). Furthermore, our recent electron cryomicroscopy (cryo-EM) analysis of coronavirus ultrastructure (39) revealed that the viral ribonucleoprotein is sufficiently loosely packed in the virion core to leave ample space for possible additional incorporation of host proteins (56).
Here we used mass spectrometry proteomics and protein kinase profiling techniques to probe the contents of purified SARS-CoV virions. We investigated cellular pathways involved in coronavirus assembly, and we expected our experimental approach to identify novel virus component-host protein interactions important to virogenesis. We attempted to bias the analysis toward identification of biologically significant host proteins by subtracting proteins purified from uninfected cells, proteins identified with only one sample preparation method, and proteins occurring only on the proteolytically sensitive surface of the virion during the analysis. One hundred seventy-two host proteins and eight viral proteins meeting these criteria are described here, including three nsp's. Network analysis (2) based on previously reported biochemical interaction mapping (65) revealed several hubs of connectivity (we use the term "hub of connectivity" or "hub" to refer to molecular species showing an outstandingly large number of intermolecular interactions) among incorporated components of viral origin. Among the hubs with the most connections are the viral M protein, the RNA genome, and nsp3. The M protein links the other major virion components at the site of budding (35), and an integral role for the RNA genome in assembly had been anticipated (15). nsp3, however, which is the protein capable of making the most connections to other virus-encoded components of the virion, had not previously been implicated in coronavirus assembly. We therefore selected nsp3 for further functional and structural characterization.
SARS-CoV nsp3 is a large multidomain protein that includes confirmed proteinase and poly(ADP-ribose) binding domains. We present here an updated nsp3 phylogeny and domain map, including novel validated metal ion-binding and nucleic acid-binding domains. We also describe the use of relative conservation data to infer functional information for the remaining uncharacterized nsp3 domains. We interpret these data in light of recent functional and structural characterizations of nsp3 domains (45, 47, 53), which leads us to suggest an important role for nsp3 in coronavirus RNA synthesis and virogenesis.
1 to 3 PFU/cell), medium was exchanged after 24 h, and high-titer infectious supernatant was collected 48 h after inoculation. Viral supernatants were clarified by centrifugation at 12,000 x g for 30 min, collected by precipitation with 8% polyethylene glycol 8000 and 2% NaCl, and banded at 140,000 x g for 1.5 h on discontinuous five-step 10% to 50% sucrose gradients. Purified native virus was collected by side puncture and pelleted through HEPES-buffered 0.9% saline (pH 7.0). At this point, aliquots representing virus purified from about 1 liter of infectious supernatant were treated with 5,000 U DNase I (New England Biolabs) for 1 h at 37°C in the supplied DNase I buffer to remove any adherent host chromatin and associated proteins, followed by 60 mg proteinase K (New England Biolabs) for 1 h at 37°C. Proteinase K treatments were not performed in the presence of a detergent in order to preserve the integrity of the viral membrane. Proteinase K was then removed by pelleting virus through a 30% sucrose cushion. Native and enzymatically treated virus preparations were lysed and inactivated with 1% Triton X-100 (for kinase assays), followed by boiling for 5 min (for mass spectrometry). The concentration of detergent was reduced by pelleting denatured protein aggregates through HEPES-buffered saline. Infectious SARS-CoV in this study was purified by density gradient banding. Banded viruses are expected to be more pure than viruses purified by pelleting through a discontinuous 10 to 30% sucrose cushion, as was done in our previous cryo-EM study of SARS-CoV supramolecular architecture (39). Analysis of a representative portion of that set of cryo-EM images containing 1,018 enveloped particles from pelleted SARS-CoV revealed 42 particles not visibly recognizable as SARS-CoV (4% of the total) and eight apparently empty vesicles (1%), which are not expected to contribute a significant amount of protein to the mass spectrometry analysis. 
The purity of the SARS-CoV used for mass spectrometry and kinase analysis was therefore estimated to be greater than or equal to 95%.
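The purity estimate above follows from the particle counts reported for the earlier cryo-EM image set. The short sketch below simply redoes that arithmetic; the function name and dictionary labels are illustrative and not part of the original analysis.

```python
def purity_estimate(total, contaminants):
    """Return the fraction of particles that are not counted as contaminants."""
    return 1 - sum(contaminants.values()) / total

# Counts reported for the set of 1,018 enveloped particles from pelleted SARS-CoV
total_particles = 1018
counts = {
    "not recognizable as SARS-CoV": 42,
    "apparently empty vesicles": 8,
}

for label, n in counts.items():
    print(f"{label}: {100 * n / total_particles:.1f}%")

purity = purity_estimate(total_particles, counts)
print(f"estimated purity: {100 * purity:.1f}%")
```

The two contaminant classes come to about 4.1% and 0.8% of particles, which gives roughly 95.1% for the pelleted preparation. Because banded virus is expected to be purer than pelleted virus, this figure serves as a lower bound.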
Protein construct design, cloning, expression, and purification. SARS-CoV nsp3 (GenBank accession number NP_828862) extends from nucleotides 2719 to 8484, corresponding to residues Ala907 to Gly2828 of pp1a. A summary of selected nsp3 expression constructs and conditions is shown in Table 1. Expression of several nsp3 domains has been described previously (3, 47, 53). The UB2-PL2pro expression construct was a kind gift from Andrew Mesecar (University of Chicago—Illinois). All other constructs were amplified by PCR from genomic cDNA of the SARS-CoV Tor-2 strain. Amplification primers were designed to produce the constructs listed in Table 1. Amplicons were cloned into the expression vectors pMH1F (N-terminal His6Thio6 tag; derivative of pBAD from Invitrogen), pET25b (tagless construct), pET28b (thrombin-cleavable N-terminal His6 tag), or pET28aTEV (tobacco etch virus protease-cleavable N-terminal His6 tag).
TABLE 1. Recombinant expression of nsp3 domains
Metal ion-binding assay. Purified proteins were not actively stripped of metal ions before analysis; rather, proteins were selected that did not measurably strip CoCl2 from the Talon affinity matrix at the time of purification. Ten-micromolar SUD-C, SUD451-651, or full-length SUD solutions were mixed with CoCl2 to final concentrations ranging from 0 to 50 µM Co(II) in a buffer containing 25 mM Tris at pH 7.8 and 300 mM NaCl. Samples were incubated on ice for 30 min, and absorption spectra from 250 to 800 nm were then recorded on a Cary UV-Vis spectrophotometer. Matched baseline spectra from samples containing only buffer and CoCl2 were subtracted from the absorption spectra of the protein-containing samples. Zn(II) titration was performed by recording optical spectra after addition of ZnCl2 following incubation with 50 µM CoCl2.
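The baseline correction described above can be sketched as a point-by-point subtraction on a shared wavelength grid. This is a minimal illustration only: the function name, the dictionary representation of a spectrum, and the absorbance values are hypothetical, and the instrument software may have performed the subtraction differently.

```python
def subtract_baseline(sample, baseline):
    """Subtract a matched buffer + CoCl2 spectrum from a protein spectrum.

    Both spectra are dicts mapping wavelength (nm) to absorbance and must
    have been recorded on the same wavelength grid.
    """
    if sample.keys() != baseline.keys():
        raise ValueError("spectra must share the same wavelength grid")
    return {wl: sample[wl] - baseline[wl] for wl in sorted(sample)}

# Hypothetical absorbance readings at three wavelengths (not measured data)
protein_plus_cobalt = {250: 1.20, 350: 0.30, 650: 0.08}
buffer_plus_cobalt = {250: 0.10, 350: 0.02, 650: 0.03}

corrected = subtract_baseline(protein_plus_cobalt, buffer_plus_cobalt)
```

Using a baseline with the same CoCl2 concentration as the sample removes the contribution of free cobalt, so any remaining signal can be attributed to the protein or to protein-bound metal.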
Extraction of viral proteins and digestion. Native SARS-CoV, enzymatically treated SARS-CoV, and host background protein samples were divided into two identical parts, one used for trichloroacetic acid (TCA) precipitation and the other for methanol delipidation. For TCA precipitation, TCA was added to the sample to a final content of 25% (vol/vol). The sample was then placed on ice for 30 min and centrifuged at 13,000 x g for 5 min. The pellet was twice washed with cold acetone to ready it for the next step. For methanol delipidation, 2.5 volumes of methanol, 0.25 volume of chloroform, and 0.5 volume of water were added. The sample was then centrifuged at 16,000 x g for 2 min, and the organic layer was removed. After back extraction with 3 volumes of methanol, the sample was centrifuged at 16,000 x g for 2 min to obtain a pellet. The resulting pellets from the two extraction conditions were separately solubilized in Invitrosol (Invitrogen, Carlsbad, CA), sonicated for 30 min, and reduced with tris(2-carboxyethyl)phosphine, and the cysteines were alkylated with iodoacetamide. Acetonitrile was then added to a final content of 80% (vol/vol). Finally, the sample was digested with trypsin (enzyme/substrate ratio of 1:50 [wt/wt]) at 37°C overnight.
Mass spectrometry analysis of viral proteins. The protein digest from each sample was analyzed by Multidimensional Protein Identification Technology (MudPIT) (69). Briefly, digested proteins were pressure loaded onto a fused silica capillary column packed with a 3-cm, 5-µm Partisphere strong cation exchanger (SCX; Whatman, Clifton, NJ) and 3-cm, 5-µm Aqua C18 material (RP; Phenomenex, Ventura, CA), with a 2-µm filter union (UpChurch Scientific, Oak Harbor, WA) attached to the SCX end. The column was washed with buffer containing 94.9% water, 5% acetonitrile, and 0.1% formic acid. After desalting, a 100-µm-inside-diameter capillary with a 5-µm pulled tip packed with 10-cm, 3-µm Aqua C18 material was attached to the filter union, and the entire split column was placed in line with an Agilent 1100 quaternary high-pressure liquid chromatograph (Agilent, Palo Alto, CA) and analyzed using a modified 11-step separation (66). Three buffer solutions were used: 5% acetonitrile-0.1% formic acid (buffer A), 80% acetonitrile-0.1% formic acid (buffer B), and 500 mM ammonium acetate-5% acetonitrile-0.1% formic acid (buffer C). The first step consisted of a 100-min gradient from 0 to 100% buffer B. Steps 2 to 10 had the following profile: 3 min of 100% buffer A, 5 min of X% buffer C, a 10-min gradient from 0 to 15% buffer B, and a 97-min gradient from 15 to 45% buffer B. The 5-min buffer C percentages (X) were 5, 10, 15, 20, 25, 30, 40, 55, and 75%, respectively. In the final step, the gradient contained 3 min of 100% buffer A, 20 min of 100% buffer C, a 10-min gradient from 0 to 15% buffer B, and a 107-min gradient from 15 to 100% buffer B. As peptides were eluted from the microcapillary column, they were electrosprayed directly into an LTQ linear ion trap mass spectrometer (ThermoFinnigan, San Jose, CA) with the application of a distal 2.4-kV spray voltage. 
A cycle of one full-scan mass spectrum (400 to 1,400 m/z) followed by five data-dependent tandem mass spectrometry (MS/MS) spectra at a 35% normalized collision energy was repeated continuously throughout each step of the multidimensional separation.
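The 11-step separation described above can be written out as a simple schedule to check the step count and total gradient time. The data layout and names below are illustrative; the segment durations and buffer percentages are taken from the text, and any time between steps (for example, re-equilibration) is not stated in the source and is assumed to be zero.

```python
# Buffer C percentages for the 5-min salt pulses in steps 2 to 10
SALT_PULSES = [5, 10, 15, 20, 25, 30, 40, 55, 75]

def build_steps():
    """Return a list of (step name, [(segment description, minutes), ...])."""
    steps = [("step 1", [("0-100% B gradient", 100)])]
    for number, pct_c in enumerate(SALT_PULSES, start=2):
        steps.append((f"step {number}", [
            ("100% A", 3),
            (f"{pct_c}% C", 5),
            ("0-15% B gradient", 10),
            ("15-45% B gradient", 97),
        ]))
    steps.append(("step 11", [
        ("100% A", 3),
        ("100% C", 20),
        ("0-15% B gradient", 10),
        ("15-100% B gradient", 107),
    ]))
    return steps

def step_minutes(segments):
    """Total duration of one step in minutes."""
    return sum(minutes for _, minutes in segments)

steps = build_steps()
total_minutes = sum(step_minutes(segments) for _, segments in steps)
print(f"{len(steps)} steps, {total_minutes} min ({total_minutes / 60:.2f} h)")
```

Under these assumptions the schedule has 11 steps and 1,275 min of gradient time per sample, a little over 21 h.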
Processing of mass spectra. MS/MS spectra were analyzed using the following software analysis protocol. Poor-quality spectra were removed from the data set using an automated spectrum quality assessment algorithm (4). MS/MS spectra remaining after filtering were searched with the SEQUEST algorithm (12) against a combined human, SARS-CoV, and vervet monkey database from NCBI that was concatenated to a decoy database in which the sequence for each entry in the original database was reversed. SEQUEST results were assembled and filtered using the DTASelect program (60) with a peptide false-positive rate of 5%. To increase the probability of identifying viral proteins while simultaneously maintaining reasonably high filtering criteria, proteins with one peptide hit were accepted, but we required all peptides identified to be fully tryptic.
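The decoy strategy and the fully tryptic requirement described above can be sketched as follows. This is not the SEQUEST or DTASelect implementation. The helper names and the `REV_` prefix are hypothetical, the false-positive estimate uses one common convention for concatenated target-decoy searches (twice the decoy matches divided by all matches), which the source does not specify, and the tryptic check ignores the proline exception.

```python
def add_reversed_decoys(database, prefix="REV_"):
    """Concatenate a protein database with reversed-sequence decoy entries."""
    combined = dict(database)
    for name, sequence in database.items():
        combined[prefix + name] = sequence[::-1]
    return combined

def estimated_false_positive_rate(matched_names, prefix="REV_"):
    """Estimate the false-positive rate from the share of decoy matches.

    Assumes each decoy hit implies one equally likely false target hit.
    """
    if not matched_names:
        return 0.0
    decoys = sum(1 for name in matched_names if name.startswith(prefix))
    return min(1.0, 2 * decoys / len(matched_names))

def is_fully_tryptic(protein, start, end):
    """Check that peptide protein[start:end] has tryptic ends on both sides.

    An end is accepted if it follows K or R or coincides with a protein terminus.
    """
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or protein[end - 1] in "KR"
    return n_term_ok and c_term_ok

# Hypothetical toy database and match list
toy_database = add_reversed_decoys({"P1": "MKAAARGG"})
rate = estimated_false_positive_rate(["P1", "P2", "P3", "REV_P4"])
```

Requiring fully tryptic peptides is a way to keep single-peptide identifications while still excluding many random matches, since a random match is less likely to have valid cleavage sites at both ends.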
Bioinformatics analysis. An initial multiple sequence alignment was produced using NCBI BLAST (1) to identify homologous regions and then Clustal to align the homologous regions (8). The initial alignment was manually fine tuned to reflect (in hierarchical order) solved coronavirus protein structures, conserved cysteine and histidine residues, TMHMM2 transmembrane region prediction (30), and structure/loop context from PredictProtein analysis (46). Annotations and region boundaries displayed here were derived from published analysis by Gorbalenya et al. (18), de novo SARS-CoV-specific domain structure-prediction (24), and a combination of domain expression and nuclear magnetic resonance screening for foldedness.
The following sequences were used for nsp3 alignments in Fig. 2A and S2 in the supplemental material: group Ia, HCoV-NL63 (YP_003766), HCoV-229E (NP_073549), PEDV (AAK38661), BtCoV 512/2005 (ABG47077); group Ib, transmissible gastroenteritis virus (TGEV) (NP_058422), PRCoV (ABG89316), FCoV (YP_239353); group IIa, HCoV-HKU1-A (YP_173236), HCoV-HKU1-N6 (ABD75567), MHV-JHM (AAA46457), MHV-A59 (NP_068668), BCoV (NP_150073), HCoV-OC43 (NP_937947), HEV (YP_459949); group IIb, SARS-CoV (AAP41036), BtCoV-HKU3 (AAY88865), BtCoV-Rf1 (ABD75321); group IIc, BtCoV-HKU5 (ABN10892), BtCoV 133/2005 (YP_729202); group IId, BtCoV-HKU9-1 (YP_001039970), BtCoV-HKU9-2 (ABN10918), BtCoV-HKU9-3 (ABN10926), BtCoV-HKU9-4 (ABN10934); group III, IBV-Beaudette (NP_066134), IBV-Peafowl/GD/KQ6/2003 (AAT70073), IBV-LX4 (AAQ21583), IBV-BJ (AAP92673); torovirus group (aligned from ADP-ribose-1''-phosphatase [ADRP] onward), EToV (ABC26008), BToV (YP_337905). The alignment presented in Fig. 2C and analysis in Fig. 8 include HCoV-229E, HCoV-NL63, BtCoV 512/2005, FCoV, HCoV-HKU1, MHV-A59, HCoV-OC43, SARS-CoV Tor2, BtCoV 133/2005, BtCoV-HKU5, BtCoV-HKU9-1, BtCoV-HKU9-4, IBV-Beaudette, and IBV-Peafowl/GD/KQ6/2003 sequences listed under or linked from the accession numbers above.
FIG. 2. Overview of nsp3 organization. (A) Multiple sequence alignment of coronavirus and torovirus nsp3 homologs. The 16-component functional annotation presented here (Func) is an extension of our previous SARS-CoV-specific domain boundary prediction (SARS) and the ongoing analysis by Gorbalenya and collaborators (Gorb). It incorporates domain boundaries defined in a hierarchy of functional (f), structural (s), and phylogenetic (p) criteria. The functional annotation was compiled from published data and results presented here. Region designations include the following: ubiquitin-related domains (UB1 and UB2), an acidic hypervariable region (AC), complete (PL1pro and PL2pro) or partial (pro) papain-like cysteine proteinases, ADRP, a SARS-CoV subgroup-specific MBD, the carboxyl-terminal moiety of the "SARS-unique domain" (SUD-C), group II-specific NAB domain and marker domain (G2M), two predicted double-pass transmembrane domains (TM1-2 and TM3-4), a putative metal-binding region (ZF), and three subdomains forming part of the Y region (Y1 to Y3) originally described by Gorbalenya et al. (18). Dotted lines denote additional subgroup-specific domains not included in the annotation above. Amino acid residues are color coded gray (AFGILMPVWY), light blue (KNQRST), blue (CH), or red (DE) to highlight patterns that may mark conserved protein structures. We divide group II into four subgroups following published suggestions (71) and divide group I into two subgroups. Sequences from equine and bovine toroviruses are shown from the domain homologous to ADRP onward. (B) Selected SARS-CoV expression constructs. Solid lines denote expression (also Table 1); dashed lines indicate that no expression has so far been obtained. (C) Enlargement of the ZF and flanking regions, with transmembrane domain predictions. The overlay shows the average transmembrane probability score for 400-amino-acid regions centered on the first conserved cysteine of ZF. 
A red overlay displays average transmembrane probability scores calculated by TMHMM2 for this region from a set of 15 representative coronaviruses, approximately equally weighted with respect to each subgroup (see Materials and Methods). For display purposes, in this panel the sequences are aligned only with conserved clusters of four cysteine/histidine residues in ZF and Y1 (α and β). (D) Structural annotation of SARS-CoV nsp3. Experimentally characterized flexibly disordered regions are indicated with dashed green lines, and predicted flexible regions separating conserved domains are indicated with solid green lines.
FIG. 8. Use of amino acid conservation to infer function for experimentally uncharacterized nsp3 domains. Average percent identity (API) was measured by pairwise alignment of conserved proteins and domains from different subgroups (Ia versus Ib, IIa versus IIb, etc.) or groups (I versus III, etc.). Conserved coronavirus proteins are grouped by functional class, including enzymes (P-E; nsp5, nsp12, nsp13, nsp14, nsp15, and nsp16), nonenzymatic proteins (P-NE; M and E), enzymatic domains (D-E; ADRP, PL1pro, and PL2pro), and putative nonenzymatic domains (D-NE; UB1, AC, SUD-C, UB2, NAB, and two nucleoprotein domains). Dotted lines mark intersubgroup API values associated with domains not found in all groups (PLP1, SUD-C, NAB, and G2M). Subgroup-specific markers such as SARS-CoV MBD were not included. Uncharacterized nsp3 domains clustering with enzymatic (UD-E; Y1, Y2, and Y3) and nonenzymatic (UD-NE; TM1-2, ZF, TM3-4, and G2M) classes are indicated.
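The average percent identity (API) measure used in Fig. 8 can be illustrated with a small sketch. The source does not state how gaps were treated or which denominator was used, so the choices here (sequences already aligned to equal length, columns with a gap in either sequence ignored) are assumptions, and the function names and sequences are hypothetical.

```python
from itertools import product

def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)

def average_percent_identity(group_a, group_b):
    """Mean percent identity over all pairs drawn from two (sub)groups."""
    pairs = list(product(group_a, group_b))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical aligned fragments from two subgroups
subgroup_1 = ["ACDE"]
subgroup_2 = ["ACDF", "ACDE"]
api = average_percent_identity(subgroup_1, subgroup_2)
```

Comparing across subgroups or groups, and not within them, keeps closely related isolates from inflating the average.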
-33P]ATP. The labeled substrate array was visualized by autoradiography, digitally scanned, and quantified using ImageJ densitometry software (NIH). Duplicate PepChip arrays incorporated a total of 48 nonsubstrate peptides, which were used as negative controls to determine the background levels in the densitometry analysis. Density values for these spots were used to assess and filter results. Peptides for which both replicate spots exceeded the mean density value plus 1 standard deviation of the controls on the scanned autoradiograph were taken as positive results.
Protein stoichiometry analysis. A detailed description and validation of perfluoro-octanoic acid (PFO)-polyacrylamide gel electrophoresis (PAGE) as a tool for protein stoichiometry assessment can be found elsewhere (44). Briefly, purified protein samples were incubated at 37°C for 1 h; mixed 1:1 with PFO loading buffer containing 8% (wt/vol) PFO, 100 mM Tris base, 20% (vol/vol) glycerol, and 0.05% (wt/vol) orange G; and loaded onto precast 4 to 20% Tris-glycine gradient gels. Gel electrophoresis was performed with a standard Tris-glycine running buffer to which 0.5% (wt/vol) PFO was added. Protein was detected by SYPRO-ruby poststain (Invitrogen).
Electrophoretic mobility shift and unwinding assays. For electrophoretic mobility shift assay (EMSA), protein samples were mixed with 0.8 µg of RNA or DNA substrate and assay buffer containing 150 mM NaCl-50 mM Tris at pH 8.0 to a total reaction volume of 20 µl. Sequence-matched RNA and DNA oligomers were designed (substituting T for U as appropriate) with randomized sequences designed to adopt single-stranded conformations: ssRNA1/ssDNA1, 5'-AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU-3', and ssRNA2/ssDNA2, 5'-AGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUC-3'. Double-stranded RNA (dsRNA) and DNA (dsDNA) were produced by boiling and slowly cooling equimolar mixtures of single-stranded RNA (ssRNA) or DNA (ssDNA) (substituting T for U) oligomers, 5'-GAAAGGAAAAAGGGAGAAGA-3' and 5'-UCUUCUCCCUUUUUCCUUUC-3'. Protein-nucleic acid mixtures were incubated at 37°C for 1 h and analyzed by native electrophoresis on precast 6% acrylamide DNA retardation gels (Invitrogen). Nucleic acid was detected by SYBR-gold poststain (Invitrogen) and photographed using a UV light source equipped with a digital camera. SYBR-gold was rinsed out and protein was subsequently detected by SYPRO-ruby poststain (Invitrogen). Densitometry analysis was performed using a flatbed scanner with ImageJ software (NIH). The mobility shift of RNA at each protein concentration was calculated relative to the maximum shift observed in each experiment. Kd (dissociation constant) values were measured from the midpoints of the fitted titration data.
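As a quick consistency check on the duplex substrates described above, the sketch below confirms that the two oligomers used for dsRNA are reverse complements of each other and derives the matching DNA sequence by substituting T for U. The helper names are illustrative; the sequences are copied from the text.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(sequence):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(sequence))

def rna_to_dna(sequence):
    """Produce the sequence-matched DNA oligomer by substituting T for U."""
    return sequence.replace("U", "T")

# Oligomers annealed to form dsRNA (and, as DNA versions, dsDNA)
SENSE = "GAAAGGAAAAAGGGAGAAGA"
ANTISENSE = "UCUUCUCCCUUUUUCCUUUC"

print(reverse_complement_rna(SENSE) == ANTISENSE)
print(rna_to_dna(ANTISENSE))
```

The two 20-nucleotide strands are fully complementary, so annealing an equimolar mixture should give a blunt-ended duplex with no overhangs.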
For unwinding assays, nucleic acid and protein mixtures were prepared and incubated as described above for the EMSAs. Instead of applying the samples immediately to polyacrylamide gels, samples were incubated at 4°C overnight to allow protein-nucleic acid complexes to dissociate before native PAGE analysis. Results were visualized and recorded as for EMSA.
We also attempted a proteomics analysis of Junin-Candid1 arenavirus at the same time as SARS-CoV. However, due to the slow growth of Candid1 in our hands, the resulting samples were essentially virus free but contained numerous high-molecular-weight proteins associated with the cytoskeleton (21% of the proteins identified) and extracellular matrix (60% of the proteins identified [Table 2]). Trace sequences totaling 3.4% of the full-length Candid1 nucleoprotein were identified, but these samples were otherwise free of viral proteins. Nucleoprotein is the most plentiful component of purified arenavirus (58), but characteristic virion components such as the Candid1 SSP, GP-C, Z, and L proteins and host ribosomes (41) were conspicuously absent from these preparations. These samples were used to approximate the spectrum of proteins purified from uninfected Vero-E6 cells and are referred to here as "background" samples.
TABLE 2. Background host proteins excluded from this analysis
Proteomics of SARS-CoV. To determine the protein composition of the purified native, PK, and background samples, we performed two-dimensional liquid chromatography MS/MS analysis of the peptide mixtures generated by in-solution digestion of the proteins. Two primary extraction techniques were employed: TCA precipitation and methanol delipidation. Peptides extracted by TCA and methanol delipidation were analyzed separately, and the results were combined. Some proteins were identified using only one extraction technique, while others were identified with both. Except where explicitly stated, proteins reported here met three criteria: (i) presence in at least one PK sample, (ii) presence in one native sample, and (iii) absence from both background samples. SARS-CoV grows relatively poorly in most human cell types, and so the virus was grown in Vero-E6 cells derived from the African green monkey Cercopithecus aethiops. Because of the limited number of Cercopithecus aethiops protein sequences available, peptides were screened against a database including Cercopithecus aethiops and Homo sapiens sequences in addition to all SARS-CoV protein sequences of at least 9 amino acids. Using this procedure, eight viral proteins and 172 host proteins were identified from SARS-CoV, including the three explicit Cercopithecus aethiops sequences cyclophilin A (PPIA), calreticulin (CALR), and STAT-1 (overview in Table 3; see also detailed descriptions in Table S1 in the supplemental material). Because of the large number of proteins identified, most of the host proteins listed in Table 3 and Table S1 in the supplemental material are presented without regard to their potential relevance to the viral replication cycle.
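The three inclusion criteria above amount to a set operation over the protein lists from each sample. The sketch below is an illustration under stated assumptions: criterion (ii) is read as presence in at least one native sample, the function name is hypothetical, and the sample contents are invented for the example and are not the study's data.

```python
def select_reported(pk_samples, native_samples, background_samples):
    """Apply the three inclusion criteria to per-sample sets of protein names.

    (i) present in at least one PK sample,
    (ii) present in a native sample (assumed here: at least one),
    (iii) absent from every background sample.
    """
    in_pk = set().union(*pk_samples)
    in_native = set().union(*native_samples)
    in_background = set().union(*background_samples)
    return (in_pk & in_native) - in_background

# Invented sample contents for illustration only
pk = [{"N", "M", "PPIA"}, {"N", "S", "ACTB"}]
native = [{"N", "M", "S", "ACTB", "TUBB"}, {"N", "PPIA"}]
background = [{"ACTB"}, {"COL1A1"}]

reported = select_reported(pk, native, background)
```

In this toy example the cytoskeletal protein is dropped because it appears in a background sample, and a protein seen only in a native sample is dropped because it did not survive proteinase K treatment.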
TABLE 3. Host proteins identified in purified SARS-CoV grouped by function
Substrates that were phosphorylated by at least 1 standard deviation above background levels in each of two replicate arrays are reported here. Of 77 phosphorylated substrates, 29 could be linked with a specific protein kinase. Three kinase activity signatures were detected multiple times in the virion lysate, i.e., CSNK2 (four substrates), protein kinase A (PRKA; 12 substrates), and protein kinase C (PRKC; five substrates). Other kinase signatures represented by a single phosphorylated substrate included CAMK2, CKS1, CSK, epidermal growth factor receptor, GRK1, MAPK1, PHK, and RPS6K. Of these, CSNK2 was detected in both PK and native virion lysates and thus represents a probable virion component. Ribosomal protein S6 kinase (RPS6K) was found in both PK samples and is probably incorporated, as we conclude from the generally heavy ribosomal protein representation in SARS-CoV as well as the specific presence of the RPS6 substrate in the sample. PRKA, PRKC, and MAPK1 were absent in PK samples, and each was detected in only one native sample (data not shown), and therefore we concluded that they were present through adventitious copurification or entanglement at the virion surface. One protein kinase detected by mass spectrometry, PRKD, was not detected by substrate phosphorylation, possibly due to the presence of only three validated PRKD substrates on the chip.
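The reporting rule for phosphorylated substrates (signal above the control mean plus 1 standard deviation on both replicate arrays) can be sketched as below. The function name and density values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether a sample or population estimate was used.

```python
from statistics import mean, stdev

def positive_substrates(replicate_a, replicate_b, control_densities):
    """Return substrates whose spots exceed the cutoff on both replicate arrays.

    The cutoff is the mean of the negative-control spot densities plus one
    (sample) standard deviation.
    """
    cutoff = mean(control_densities) + stdev(control_densities)
    return sorted(
        name
        for name in replicate_a
        if name in replicate_b
        and replicate_a[name] > cutoff
        and replicate_b[name] > cutoff
    )

# Hypothetical densitometry values (arbitrary units)
controls = [10, 12, 14]
array_1 = {"substrate_1": 20, "substrate_2": 15, "substrate_3": 13}
array_2 = {"substrate_1": 18, "substrate_2": 14, "substrate_3": 30}

hits = positive_substrates(array_1, array_2, controls)
```

Requiring both replicates to pass means that a single strong spot, such as the third substrate on the second array in this example, is not enough to report a substrate.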
Relative abundance of viral proteins. Protein detection by mass spectrometry proteomics depends on factors including abundance, sensitivity of detection, enzymatic pretreatment, extraction method, proteinase accessibility, and availability of potential proteolytic cleavage products of appropriate molecular weight. Mass spectrometry is therefore not an optimal tool for precise measurement of the absolute stoichiometry of incorporated components but can provide a general idea of ranked abundance within a sample. We used a hierarchy of native detection frequency > PK detection frequency > peptide coverage relative to protein length for a tentative ranking of the relative abundance of viral and host proteins in SARS-CoV (Table 4). SARS-CoV N, M, and S were consistently among the 10 most abundant proteins detected in PK samples. The accessory SARS-CoV ORF3a and ORF9b proteins and nsp2, nsp3, and nsp5 were present in lower relative abundance and were of equal or lesser abundance in PK samples than were some ribosomal proteins, histones, heat shock protein 90, and phosphatase I (Table 4; also see Table S1 in the supplemental material).
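The ranking hierarchy described above (native detection frequency, then PK detection frequency, then peptide coverage relative to protein length) maps onto a lexicographic sort. The field names and values in this sketch are hypothetical and do not reproduce Table 4.

```python
def abundance_key(protein):
    """Sort key implementing native hits > PK hits > fractional coverage."""
    return (protein["native_hits"], protein["pk_hits"], protein["coverage"])

# Hypothetical detection summaries for illustration only
proteins = [
    {"name": "protein_A", "native_hits": 2, "pk_hits": 1, "coverage": 0.40},
    {"name": "protein_B", "native_hits": 2, "pk_hits": 2, "coverage": 0.10},
    {"name": "protein_C", "native_hits": 1, "pk_hits": 2, "coverage": 0.90},
    {"name": "protein_D", "native_hits": 2, "pk_hits": 2, "coverage": 0.25},
]

ranked = sorted(proteins, key=abundance_key, reverse=True)
ranking = [protein["name"] for protein in ranked]
```

Because tuples compare element by element, coverage only breaks ties between proteins with the same native and PK detection frequencies, so a high-coverage protein detected in fewer native samples still ranks lower.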
TABLE 4. Viral proteins identified in purified SARS-CoV
FIG. 1. Interaction map for SARS-CoV-derived components. Double outlines indicate major components, including known high-copy-number virion proteins and the large viral RNA genome, and minor components, including low-copy-number and weakly conserved proteins. Black outlines identify components detected by mass spectrometry proteomics. Gray outlines indicate components identified in other published studies. Solid single outlines denote novel components identified in both native and PK SARS-CoV.
Phylogenetic analysis of nsp3. The most frequently encountered protein globular domains are formed from contiguous polypeptide chain segments of about 100 amino acid residues (68). Previous bioinformatics analyses of nsp3 had identified only a few domains fitting this criterion, but they predicted several large regions likely to include multiple structural domains. We therefore compiled a higher-resolution analysis of nsp3 domain architecture as a tool for novel structural and functional characterization. We performed a phylogenetic analysis of nsp3 (Fig. 2; see also Fig. S2 in the supplemental material) to identify small, conserved regions that might yield expressible protein domains. Protein sequence analysis of coronavirus and torovirus nsp3 homologs revealed a pattern of alternating conserved and nonconserved regions, consistent with a multiple-domain and linker structure (Fig. 2A; see also Fig. S2 in the supplemental material). Results from previously published studies (17, 61, 73) and fold recognition software (24) were incorporated in this process of construct design. Previous studies showing that the UB1, PL2pro, and ADRP domains of nsp3 were both well folded and functional when expressed separately were taken as support of the domain and linker structure of nsp3 (45, 47, 53).
As shown in Fig. 2B, predicted domains located toward the amino terminus of nsp3 were tested and found to be generally amenable to expression as domains, while all but one of the regions downstream of the PL2pro domain were not efficiently expressed. One possible reason for the expression difficulties may lie in the presence of a long hydrophobic domain predicted to contain four transmembrane spans in this region (Fig. 2C). Based on the expression pattern and the available structural data, a general model of nsp3 structure was proposed (Fig. 2D). In modeling nsp3, we were guided by the assumption that nsp3 topology would be constant among coronaviruses. The proposed structure contains four transmembrane spans and places nearly all of nsp3, including the PL2pro domain, on one face of the membrane. The domain topology of the model of membrane-embedded nsp3 is inferred from the presence of PL2pro cleavage sites at both termini of nsp3 and bioinformatic predictions. While the exact number of transmembrane spans is not certain, any multiple of two could be conducive to posttranslational processing of nsp3 by PL2pro and would present the bulk of nsp3 on the same membrane face occupied by nsp5 3CLpro and the pp1b replicase proteins. Our model of TM distribution (Fig. 2C) follows the 3TM + 1TM distribution of transmembrane regions recently proposed for MHV nsp3 (26), which was based in part on observed glycosylation patterns from truncated nsp3 constructs (19, 26) and is consistent with an independent model of nsp4 structure (40). The interpretation presented in Fig. 2C includes all three major phylogenetic groups and the newly sequenced group II bat coronaviruses. Although we note that phylogenetic evidence more consistently suggests a 2TM + 2TM distribution across the coronavirus family (Fig. 2C), the weight of biochemical evidence currently favors the 3TM + 1TM distribution.
Several types of domain designation may be possible for a given set of input sequences, depending on the criteria used for selection. Here we present a working functional annotation based on a hierarchy of functional > structural > phylogeny-based domain identification. Where protein function and structure are known, "functional" domains such as ADRP and PL2pro have been noted. Where only the structure is known, as for ubiquitin-related UB2, "structural" domains are noted. Where only the primary sequence data were available, islands of sequence conservation, termed "phylogenetic" domains such as Y1 to Y3, were designated. Our analysis revealed 16 conserved nsp3 domains—identified here as UB1, AC, PL1pro, ADRP, MBD (metal-binding domain), SUD-C, UB2, PL2pro, NAB, G2M, TM1-2, ZF, TM3-4, Y1, Y2, and Y3—of which between 12 and 15 domain homologs could be identified in any one coronavirus (Fig. 2A). Tryptic peptide fragments of nsp3 identified by mass spectrometry were derived from the ADRP (four peptides), MBD (one peptide), SUD-C (three peptides), PL2pro (two peptides), Y1 (two peptides), and Y2 (three peptides) domains. The multidomain construct SUD, with residues 389 to 726, encompasses the newly annotated MBD and SUD-C domains.
Stoichiometry of nsp3. PFO is a nondissociative detergent that can be used with native PAGE to determine the mass of protein complexes (44). We investigated the oligomeric structure of purified nsp3 domains using PFO-PAGE. The expressed domains and multidomain constructs of nsp3 tested here (Fig. 3) and previously (45, 53) appeared to migrate mainly as monomeric species, with trace amounts of dimers visible, while lysozyme and protein molecular weight markers migrated as monomers, as previously reported (44). In contrast, full-length nsp2 was primarily monomeric, with a small concentration of trimeric species and traces of dimeric, tetrameric, and higher-molecular-weight species (compare Fig. 3A and 3B), confirming that monomer > dimer oligomerization is characteristic of nsp3 domains.
FIG. 3. Oligomerization of SARS-CoV nsp3 domains. (A) PFO-PAGE analysis reveals the oligomeric state of selected nsp3 domains in solution. A Benchmark protein ladder (M) was used to estimate protein and protein complex molecular masses, indicated in kDa at left. Lanes in panel A contain, from left to right, 25 µM, 50 µM, and 100 µM nsp2, ADRP, and SUD; 25 µM and 50 µM UB2-PL2pro; and 50 µM and 100 µM NAB, respectively. (B) Reducing sodium dodecyl sulfate-PAGE analysis of selected nsp3 domains. Lanes in panel B contain, from left to right, 50 µM and 100 µM nsp2, NAB, SUD, ADRP, and UB1, respectively.
FIG. 4. PFO-PAGE analysis of interdomain oligomerization. Approximately equimolar concentrations of bacterially expressed nsp3 domains were incubated separately (left) or in combination (right) at 37°C for 1 h and analyzed by PFO-PAGE. The panel at left demonstrates the electrophoretic mobility of each protein species and homooligomer; lanes at left contain 2 and 1 nanomole of UB1, ADRP, or SUD or 10 and 5 nanomoles NAB, respectively. Each lane at right depicts mixtures of 2 nanomoles of UB1, ADRP, or SUD and 5 nanomoles NAB as shown. Proteins were visualized with SYPRO-ruby staining. Marked bands correspond to 50-kDa and 110-kDa UB1+SUD complexes (filled triangles) and 60-kDa ADRP+SUD complexes (open triangles). In the presence of additional nsp3 domains, UB1+SUD complexes are not formed, but the amount of ADRP+SUD complex is increased. Duplicate samples are shown for the four-domain mixture. Lanes containing the Benchmark protein ladder are indicated (M), with masses in kilodaltons indicated at left.
, and Y1β). During some but not all purifications of bacterially expressed SUD, addition of protein caused a visible "bleaching" effect on the Talon affinity matrix which was interpreted to arise from cobalt stripping activity.
To test for metal-binding activity by SUD, we added CoCl2 and ZnCl2 to purified SUD, SUD451-651, and SUD-C domains and examined the UV-visible spectra (Fig. 5). Zinc binding does not produce a detectable spectral change, but charge transfer between cobalt(II) and sulfur atoms (here, probably cysteine residues) produces a characteristic absorption signal with peaks at 310 and 340 nm (5). UV-visible spectrum analysis indicated that full-length SUD (389 to 726; Fig. 5A to C) bound cobalt, whereas neither the truncated SUD (451 to 651; Fig. 5F) nor the carboxyl-terminal portion of this region (SUD-C 513 to 651; Fig. 5D and E) showed evidence of cobalt binding. Addition of zinc(II) to cobalt-complexed SUD did not dampen the S-Co(II) spectral signal but appeared to induce additional protein precipitation, visualized as a general increase in the absorbance in the far-UV range, which was confirmed by visual inspection. The precipitation-corrected 310-nm absorbance curves (Fig. 5B, inset) are most consistent with binding of a single cobalt atom per SUD molecule. Addition of zinc following cobalt saturation did not diminish the spectral signal at 310 nm, indicating that equimolar zinc is unable to displace cobalt bound to SUD. These data were interpreted to indicate that a cysteine-coordinated metal ion-binding site with a high affinity for cobalt is localized partly or wholly in the amino-terminal domain of SUD, which we therefore describe as the MBD. SUD contains six conserved cysteines (SARS-CoV nsp3 positions 393, 456, 492, 507, 550, and 623) and two conserved histidine residues (positions 539 and 613), which could participate in a tetrahedral metal ion coordination site. The lack of metal ion binding by truncated SUD451-651 suggests that Cys393 may have a key role in metal ion coordination.
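The expectation behind a one-metal-per-protein interpretation can be illustrated with a minimal sketch of a generic 1:1 binding model. This is not the analysis used for Fig. 5: the dissociation constant below is a hypothetical value chosen only to represent tight binding, and the function names are illustrative. Under these assumptions, the occupied fraction rises roughly linearly up to 1 molar equivalent of Co(II) and then plateaus.

```python
import math

def bound_complex(protein_um, metal_um, kd_um):
    """Concentration (uM) of a 1:1 protein-metal complex.

    Exact solution of the quadratic that follows from
    Kd = [P_free][M_free] / [PM] with mass conservation.
    """
    total = protein_um + metal_um + kd_um
    return (total - math.sqrt(total * total - 4.0 * protein_um * metal_um)) / 2.0

PROTEIN_UM = 10.0  # protein concentration stated for the titration (10 uM)
KD_UM = 0.01       # HYPOTHETICAL tight-binding Kd, not a measured value

def titration(equivalents):
    """Return (molar equivalents of metal, fraction of protein occupied) pairs."""
    return [
        (eq, bound_complex(PROTEIN_UM, eq * PROTEIN_UM, KD_UM) / PROTEIN_UM)
        for eq in equivalents
    ]

if __name__ == "__main__":
    for eq, frac in titration([0, 0.5, 1, 2, 5]):
        print(f"{eq:>4} eq Co(II): fraction occupied = {frac:.2f}")
```

A weaker assumed Kd would round off the break point near 1 equivalent, so the sharpness of the plateau, and not only its position, carries information about affinity.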
FIG. 5. Titration of cobalt binding by 10 µM SUD and SUD-C. UV-visible spectra of 10 µM full-length SUD (A to C), SUD-C (D and E), and truncated SUD451-651 (F) solutions were measured after addition of 0 to 5 molar equivalents of Co(II) in the form of CoCl2. Relative Co(II) concentration is indicated with colored lines running from red (0 equivalents) to violet (5 equivalents). Because of the observed metal ion concentration-dependent protein precipitation during these experiments, both the raw absorbance at 310 nm (A310; panels B, C, E, and F; black circles) and normalized absorbance (A310/A250; open circles) are plotted. (C) Displacement of Co(II) by Zn(II) was investigated by addition of ZnCl2 to 10 µM SUD solutions that had been previously saturated with 5 equivalents of Co(II).
FIG. 6. Generic nucleic acid binding properties of SUD-C, SUD, and NAB domains of nsp3. (A) EMSAs were performed with sequence-matched 20-nucleotide dsDNA or dsRNA or one of two functionally equivalent sets of sequence-matched 40-nucleotide ssDNA or ssRNA oligomers. Gels were stained for protein or nucleic acid as indicated. Lanes containing protein only at the highest listed concentration (P), 800 ng of nucleic acid only (N), dsDNA ladder marker (M), and mixtures of protein with 800 ng nucleic acid are indicated. Protein concentration decreases in twofold increments from left to right within the indicated range. Maximum protein concentrations used here were determined empirically by expression and stability in solution at 4°C. Electrophoretic mobility ranges for nucleic acids (black brackets), protein (small triangles), and protein-nucleic acid complexes (white brackets) are indicated on the right. SUD-C has a small net positive charge at neutral pH and migrated through the gel only in complex with nucleic acid (NA). Results from two single-stranded nucleic acid sequences that behaved equivalently in non-sequence-specific EMSA are shown. (B) Binding curves were constructed from densitometry data calibrated to the maximum and minimum binding in each gel. The range in which increasing nucleic acid binding was observed is indicated with a bold line above each graph to facilitate comparison. SUD binding curves may overestimate affinity since maximal binding overlapped with the limit of protein solubility.
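The calibration described for panel B, in which densitometry data are scaled to the maximum and minimum binding in each gel, can be sketched as a min-max normalization. The band intensities below are hypothetical placeholders, and the function name is illustrative rather than taken from the original analysis.

```python
def fraction_bound(intensities):
    """Scale the complex-band intensities of one gel to the range 0 to 1.

    0 corresponds to the minimum binding seen in that gel and 1 to the
    maximum, so curves from different gels can be compared.
    """
    lowest, highest = min(intensities), max(intensities)
    if highest == lowest:
        raise ValueError("no change in binding across lanes; cannot calibrate")
    return [(value - lowest) / (highest - lowest) for value in intensities]

if __name__ == "__main__":
    # HYPOTHETICAL densitometry readings, lowest to highest protein concentration
    lanes = [120.0, 300.0, 660.0, 840.0]
    print(fraction_bound(lanes))
```

Because each gel is scaled to its own extremes, a curve that never reaches true saturation (for example, when protein solubility is limiting) will still top out at 1, which is one way affinity can be overestimated.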
FIG. 7. Duplex unwinding activity of NAB and comparison with SARS-CoV nucleoprotein amino-terminal structured domain (N-NTD). Samples of NAB (A) or N-NTD (B) protein and single-stranded or duplex nucleic acid were mixed and incubated as for EMSA and then chilled overnight to allow the protein-nucleic acid complexes to dissociate before analysis by native PAGE. Lanes containing the highest concentration of protein only (P), nucleic acid only (N), and dsDNA marker (M) are indicated. Double-stranded (filled triangles) and single-stranded (open triangles) nucleic acids were detected with SYBR-gold dye, which stains double-stranded nucleic acid more prominently than single-stranded nucleic acid. Protein concentration decreases in twofold increments within the range shown. Enlargements showing the dose-dependent nucleic acid unwinding activity are included at the bottom of each panel.
We examined protein conservation for the aforementioned seven uncharacterized nsp3 domains (Fig. 8). Conservation analysis predicted nonenzymatic (or nonconserved enzymatic) function for the four domains G2M, TM1-2, ZF, and TM3-4. All three domains from the Y region (Y1 to Y3) were approximately equally conserved and ranked between enzymes and enzymatic domains. From the consistently high conservation of Y1, Y2, and Y3, we hypothesize that Y1 to Y3 may form a single functional unit with a conserved enzymatic function.
DNase and proteinase K treatments were performed to differentiate between proteins entwined or embedded at the virion surface and internal proteins. Data presented in the "PK" column of Table 2 demonstrate that the enzymatic treatment followed by an additional density gradient purification step reduced most "background" extracellular matrix proteins below the threshold of detection. However, proteinase K treatment did not completely eliminate all viral surface proteins. The spike protein ectodomain was detected after enzymatic treatment of the virions, possibly because of the persistence of a proteinase-resistant core. Thus, we were unable to rule out either possible topology for nsp3 in the viral membrane based on proteolytic cleavage by proteinase K.
Origin and quantification of proteins detected in our analysis. The diversity of proteinase K-resistant host proteins that we found to be associated with purified virions may best be explained as a manifestation of the internal state of the infected host cells at the time of peak viral release. Between 24 and 48 h postinoculation the infected Vero-E6 cells became rounded and detached, with a granular appearance characteristic of late-stage infection. The extensive collection of histones observed in SARS-CoV virions following sequential treatment with DNase I and proteinase K argues against entwined host chromatin on the virion surface as the source of these proteins, just as the absence of the most common, high-copy-number nuclear proteins argues against copurification of intact nuclei. Packaged shreds of degraded chromatin from apoptotic cells would seem to be a more likely source, and it has been estimated that 95% of the Vero-E6 cells are apoptotic within 48 h of infection (37). This could also explain the presence of the mitochondrial and ribosomal proteins observed. Analysis of background proteins pelleted from the supernatant of Vero-E6 cells over the same time period did not reveal any histone, ribosome, nuclear, or mitochondrial proteins. When discussing these observations, we need to keep in mind that the presently used collection procedure was designed to maximize virus yield and therefore protein detection. Analysis of virus collected before the onset of apoptosis might reveal a somewhat different protein profile and will be the focus of a future study.
Proteins previously reported to interact with incorporated coronavirus proteins and genomic RNA were well represented in the proteomics results, including two proteins reported to bind N protein, i.e., cyclophilin A and 14-3-3 (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein) (34, 59). We did not detect UBE2I, which is an E2 ubiquitin-conjugating enzyme reported to interact with the SARS-CoV nucleoprotein (32), but did identify three other points of contact with the ubiquitin and ubiquitin-like conjugation pathways, i.e., the ubiquitin-specific proteinase 14 (USP14), the ubiquitin-carboxyl-terminal esterase L1 (UCH-L1), and the SUMO-1 activating enzyme 1 (SAE1). Further research may determine whether the presence of these host proteins could be related to the presence of two ubiquitin-like domains (45, 53) or to the ubiquitin-cleaving activity of nsp3 (33). Proteins previously reported to interact with the MHV genome including polypyrimidine tract binding protein (PPTB1), cytoplasmic polyadenosine binding proteins (PABP4 and PABP1/3), and heterogeneous nuclear ribonucleoproteins (hnRNPA1, hnRNPA2/B1, and hnRNPA3) were among the numerous RNA-binding proteins detected (55). However, in this study we were unable to distinguish whether RNA-binding proteins were incorporated bound to viral RNA or host mRNAs or as soluble proteins.
The relative abundance of proteins detected by mass spectrometry can be approximated from the reproducibility of detection and from the variety of peptides found, with more-complete coverage expected for overrepresented proteins than for rare proteins. Our data are consistent with published reports that the SARS-CoV M and N proteins are highly abundant in the virion, closely followed by the S protein (16). Proteins that were detected in fewer experiments and with lower coverage, such as ORF3a, ORF9b, nsp2, nsp3, and nsp5 proteins, are therefore likely to be present in lower relative copy numbers. Each SARS-CoV preparation that was analyzed contained 10^10 virions, making it possible that proteins present in single copies on only a small percentage of virions could still be identified.
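The coverage measure referred to above can be sketched as the percentage of residues in a protein that fall within at least one identified peptide. The sequence and peptides below are hypothetical placeholders, not data from this study, and the function name is illustrative.

```python
def percent_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide.

    Every occurrence of each peptide is marked; overlapping peptides are
    counted once per residue.
    """
    if not protein:
        raise ValueError("protein sequence is empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for index in range(start, start + len(peptide)):
                covered[index] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)

if __name__ == "__main__":
    toy_protein = "MAGKLLTRSEEK"   # HYPOTHETICAL 12-residue sequence
    toy_peptides = ["LLTR"]        # HYPOTHETICAL identified peptide
    print(f"{percent_coverage(toy_protein, toy_peptides):.1f}% coverage")
```

Coverage computed this way is only a rough proxy for abundance, since hydrophobic or poorly cleaved regions can go undetected regardless of copy number.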
Connections to vesicular trafficking pathways. The host factors involved in coronavirus budding remain largely unknown. Most enveloped viruses bud by coopting host proteins, often from the intracellular ESCRT transport pathways (reviewed in reference 67). Elements of the clathrin and COPI protein complexes were identified in PK SARS-CoV preparations. Clathrin coats assemble at the plasma membrane and the trans-Golgi network, which are quite distant from the endoplasmic reticulum (ER)-Golgi intermediate compartment (ERGIC) where SARS-CoV budding occurs. Other components of assembled clathrin lattices, including clathrin light chain and adaptor proteins, were not detected, suggesting that free clathrin may have been captured from the cytoplasm at the time of budding.
Three COPI components (-COP, β-COP, and -COP) and a protein involved in coatomer assembly (ARF4) were detected. The COPI coatomer plays a role in transport between the ER and the Golgi apparatus and in transport between Golgi stacks (27). COPI proteins are abundant at ERGIC membranes and have been shown to colocalize with budding MHV (29). A dibasic motif in the cytoplasmic tail of the SARS-CoV and MHV spike proteins of the type K(X)KXX was recently shown to bind COPI through an undetermined mechanism and is required for efficient interaction with the M protein (36). Coronavirus M proteins also possess a conserved dibasic motif in the cytoplasmic tail region, which might function similarly (see Fig. S3 in the supplemental material).
The ADP-ribosylation factor 4 (ARF4) is a small guanine nucleotide-binding protein involved in COPI trafficking. It has been shown that depletion of ARF4, but not of ARF1, ARF3, ARF5, or ARF6 (ARF2 having been lost in mammalian cells), induced tubulation at Golgi membranes (64). A similar phenomenon has been observed in MHV-infected cells and is linked to E protein expression (see reference 43 and references therein). ARF4-binding motifs are found in G-protein-coupled receptors such as rhodopsin, and these generally take the form of conserved NP(Xn)Y motifs, where Xn typically denotes the presence of one to three intervening nonconserved residues (10). A similar XP(X1)Y, XP(X2)Y, or XP(X4)Y motif can be found in most coronavirus E proteins (see Fig. S3 in the supplemental material). While we have not mapped the precise amino acid requirements for E-mediated budding in this study, two other mutagenesis and reversion studies have identified a region critical to the function of MHV (14) and TGEV (31) E proteins that maps to residues 47 to 65 in SARS-CoV E (see Fig. S3 in the supplemental material). The SARS-CoV E protein XP(X2)Y or XP(X4)Y motif, KPTVYVY, is found between residues 53 and 59. We note that the proline and the C-terminal tyrosine of this motif appear to be highly conserved among coronaviruses.
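The XP(Xn)Y pattern described above can be located with a short scan, sketched here under stated assumptions. Only the seven-residue SARS-CoV E segment KPTVYVY and its starting position (residue 53) come from the text; the function name, the return format, and the choice to report overlapping hits are illustrative.

```python
def find_xpxy(seq, spacers=(1, 2, 4), first_residue=1):
    """Find XP(Xn)Y motifs: any residue, proline, n residues, tyrosine.

    Returns (start, end, n, motif) tuples with positions numbered from
    `first_residue`. Overlapping hits are all reported.
    """
    hits = []
    for i in range(len(seq) - 1):
        if seq[i + 1] != "P":
            continue
        for n in spacers:
            tyr = i + 2 + n  # index where the closing tyrosine must sit
            if tyr < len(seq) and seq[tyr] == "Y":
                hits.append((i + first_residue, tyr + first_residue, n, seq[i:tyr + 1]))
    return hits

if __name__ == "__main__":
    # Segment of SARS-CoV E quoted in the text, starting at residue 53
    for start, end, n, motif in find_xpxy("KPTVYVY", first_residue=53):
        print(f"XP(X{n})Y at {start}-{end}: {motif}")
```

Run on the quoted segment, the scan reports both an XP(X2)Y hit (KPTVY, residues 53 to 57) and an XP(X4)Y hit (KPTVYVY, residues 53 to 59), matching the two readings of the motif given in the text.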
Anticipated proteins that were not detected in this study. Some peptides were likely missed in our study because of low solubility, poor proteinase accessibility, or unreported sequence differences from the Homo sapiens homologs. As expected for the presently used technique, hydrophobic transmembrane regions are underrepresented, even among the proteins that were otherwise unambiguously detected. Thus, no peptides were recovered from the transmembrane regions of the S, M, ORF3a, nsp3, and nsp4 proteins. This may also explain the failure to detect hydrophobic low-copy-number virion components such as the E, ORF6, and ORF7b proteins. Proteinase K treatment may have eliminated the detectable regions of some type I integral membrane proteins with very small cytoplasmic tail regions such as the ORF7a protein, which was detected in native but not PK samples (data not shown). We therefore suspect that proteins from ORF2 to ORF9b may have been present in at least limited quantity in purified SARS-CoV preparations and that ORF3b, E, ORF6, ORF7a, ORF7b, ORF8a, and ORF8b proteins were not detected here due to technical limitations and the biochemical properties of these proteins.
Although we detected the ARF4 and COPI proteins, which are abundant at membranes of the ERGIC (29, 64), which is the site of SARS-CoV assembly (57), we did not detect any of the other ERGIC components identified in a recent proteomics survey (6). Overall, integral membrane proteins and membrane-associated proteins are underrepresented in the results of our analysis. While we were unable to exclude the possibility that low membrane protein detection was due to purely technical reasons, such as low solubility, limited protease accessibility, or paucity of trypsin-cleavable fragments of appropriate molecular weight, our results would appear to corroborate a previous observation that M protein networks can exclude host proteins that are present at the site of assembly from the viral membrane (9).
Novel viral proteins identified. We identified three new incorporated SARS-CoV proteins (nsp2, nsp3, and nsp5) in PK virus samples. We were also able to confirm that the ORF9b protein is incorporated, as was suspected from previously published results for the MHV I protein (13). We are unable to determine from the present results whether nsp2 and nsp3 were incorporated as a polyprotein. We were also unable to exclude the possibility that the nsp's were associated with other membrane-bound structures that copurified with virus. However, the best available evidence suggests that the viral replicase proteins, with the exception of nsp1, colocalize at the site of replication (42, 63), and no viral structure has yet been described which is specifically enriched in nsp2, nsp3, and nsp5. Therefore, we interpret these results to indicate that the nsp's identified by mass spectrometry proteomics analysis were incorporated in virions.
Two viral proteinases, nsp3 and nsp5, were detected in the virion. Finding the 1,922-amino-acid, multiple-membrane-spanning nsp3 in the virion was especially unexpected. nsp3 is best known for the presence of the highly conserved second papain-like cysteine proteinase (PL2pro) and ADRP, which together comprise about one-third of the mass of nsp3. But how was it incorporated in the virion? The lack of evidence for incorporated SARS-CoV polymerase, helicase, and nuclease proteins would appear to rule out efficient copurification of replicase complexes or double-membraned replicase vesicles as a source of nsp3. At the time at which this work was started, no protein-protein interactions involving nsp3 had been reported, except for PL2pro-mediated cleavage of polyubiquitin substrates (33). The relative abundance of nsp3 (as estimated by the frequency of detection and the percent coverage) appeared to be greater than that of nsp2, ORF3a, ORF6, ORF7a, and ORF9b proteins, all of which are reported to interact with nsp3 (65). This observation suggests that viral protein-protein interaction is probably not the primary mechanism of nsp3 incorporation.
We recently reported that the UB1 domain of nsp3 binds a discrete ssRNA species and that the adjacent AC domain binds bacterial dsDNA (53). Here we now report that two additional domains of SARS-CoV nsp3, i.e., SUD and NAB, exhibit distinct nucleic acid-binding characteristics. MBD incorporates a metal ion-binding site which may mediate RNA binding, and NAB exhibits energy-independent double-stranded nucleic acid unwinding properties, which would be consistent with nucleic acid chaperone activity. The presence of conserved cysteine/histidine clusters between the putative transmembrane domains (ZF) and at the amino terminus of the Y domain (Y1) may signal the presence of additional MBDs, which could increase the total number of nucleic acid-binding domains in nsp3 to six, i.e., UB1, AC, SUD, NAB, ZF, and Y1. RNA-binding proteins were also abundant among the detected host proteins and may have been packaged with genomic RNA or with incorporated host mRNAs. However, we also note that several other putative or confirmed SARS-CoV RNA-binding proteins were not detected in this study, perhaps suggesting that nsp3 has a specialized role in virogenesis, as was previously suggested for the PLpro-containing nsp1 of equine arteritis virus (62).
The mechanism of putative nsp3 incorporation in purified SARS-CoV preparations remains unclear. Despite the presence of a host-derived envelope on each virion, host integral membrane and membrane-associated proteins comprised only 1% and 7%, respectively, of the detected proteins. A much higher percentage of viral integral and membrane-associated proteins was detected, including M, S, nsp3, and the ORF3a and ORF9b proteins. Therefore, while nucleic acid binding properties may have contributed to the disproportionate detection of nsp3 in purified virions relative to the adjacent proteolytic products of pp1a, the transmembrane region of nsp3 may also have contributed to incorporation.
The recent structural characterization of twin ubiquitin-related domains near the amino terminus of nsp3 (53) and the present demonstration of the presence of additional RNA-binding domains in nsp3 have implications for reconstructing the path of nidovirus replicase evolution. The Nidovirales encompass the coronaviruses, toroviruses, arteriviruses, and roniviruses. Replicase polyproteins from these viruses share conserved domains and common transcriptional mechanisms. As shown in Fig. 2A, coronaviruses from groups I, II, and III contain an initial UB1 homolog, followed by either a functional (group I and IIa) or a vestigial (group III) PL1pro domain lacking the catalytic histidine found in functional PL1pro and PL2pro. We interpret the existence of paired UB and PLpro domains as favoring an evolutionary model in which a prototypic PLpro gene was duplicated in the last common ancestor of coronaviruses and subsequent loss of PL1pro occurred in some coronavirus lineages (73). Two possible mechanisms for the duplication of the UB and PLpro domains are duplication of a gene cassette by direct repeat, as observed on a smaller scale in the AC domain of HCoV-HKU1 nsp3 from various isolates (Fig. 2) (70), and a recombination event between viruses with distinct UB and PLpro domains prior to the divergence of the known coronavirus lineages.
We identified the novel nucleic acid binding domains SUD and NAB, which are located downstream of UB and PLpro homologs. SUD and NAB share no detectable sequence homology. This does not necessarily preclude a structural relationship, since sequence-based criteria did not predict the structural homology between the two ubiquitin-related domains of nsp3, i.e., UB1 and UB2. Further investigation will be required to determine whether SUD and NAB domains are the result of duplication of a putative UB-PLpro-nucleic acid binding protein gene cassette or became embedded in nsp3 independently.
Functional implications. There is growing evidence that the function of nsp3 is closely tied to association with nucleic acids. We would hypothesize that the functions of the UB1, AC, SUD, NAB, ADRP, and PL2pro domains could be coordinated on a complex of protein and single-stranded and double-stranded RNA, such as the viral replicative form RNA (recently reviewed in reference 50). The character of coronavirus RNA replicase activity changes from an early, unstable form associated with discontinuous negative-strand synthesis to a later form associated with positive-strand synthesis (49). These observations suggest that PLpro-mediated cleavage of the coronavirus polyprotein, or of other substrates such as polyubiquitin or poly(ADP-ribose), may drive the shift from coronavirus negative-sense to positive-sense RNA synthesis. A possible location for this activity would be the template-switching hot spots mapped to complementary sequences near the 5' genomic terminus and 3' antigenomic terminus (72). PLpro involvement in viral RNA synthesis has a precedent in arterivirus, which requires the multifunctional, multidomain papain-related proteinase nsp1 for subgenomic RNA transcription but not for replication (62).
Although the work presented here represents primarily a starting point for detailed exploration of the overall function of coronavirus nsp3, we already note similarities to eukaryotic poly(ADP-ribose) polymerase (PARP) enzymes. Activated PARP consumes NAD+ to synthesize a polymer of ADP-ribose that is covalently linked to a target protein (reviewed in reference 52). The best-characterized member of the family is PARP-1, which initiates the repair of nicked DNA through two N-terminal nucleic acid binding zinc fingers and is auto-poly(ADP-ribosyl)ated on a glutamic acid-rich domain. PARPs can contain multiple adaptor domains preceding a conserved C-terminal catalytic domain, which in some cases includes one or more H2A macrodomains homologous to the ADRP of nsp3, and also nucleic acid binding domains. The ADRP domain of SARS-CoV nsp3 has been shown to strongly bind poly(ADP-ribose), and analysis of the nsp3 Y region yielded several Fold and Function Assignment System (FFAS) (24) hits on viral RNA-dependent RNA polymerase and both prokaryotic and eukaryotic DNA-dependent RNA polymerase domains (based on FFAS confidence scores of –9.5 and lower; data not shown). If nsp3 did indeed contain a functional PARP domain, it could conceivably function in proofreading, genome repair, or nidovirus-specific discontinuous subgenomic RNA transcription.
This work was supported by NIH grant AI-59799 (M.J.B.), NIH/NIAID contract HHSN266200400058C (P.K.), NIH P41 RR11823 (J.R.Y.) and U01 DE016267 (J.R.Y.), and by the Joint Center for Structural Genomics through the NIH/NIGMS grant U54-GM074898. Additional support was obtained for M.A.J. and P.S. through fellowships from the Canadian Institutes of Health Research and the Spanish Ministry of Science and Education and by the Skaggs Institute for Chemical Biology. Kurt Wüthrich is the Cecil H. and Ida M. Green Professor of Structural Biology at The Scripps Research Institute.
This is TSRI manuscript 19291.
Published ahead of print on 26 March 2008.
Supplemental material for this article may be found at http://jvi.asm.org/.
These authors contributed equally to this paper.