Journal of Virology, March 2009, p. 2255-2264, Vol. 83, No. 5
0022-538X/09/$08.00+0 doi:10.1128/JVI.02001-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Institute of Biomedical Sciences,1 Institute of Physics, Academia Sinica, Taipei 11529,3 Research Center for Adaptive Data Analysis,2 Department of Physics, National Central University, Chungli 32001,4 National Synchrotron Radiation Research Center, Hsinchu 30076,5 Center for Nonlinear and Complex Systems and Department of Physics, Chung Yuan Christian University, Chungli 32023,6 Department of Physics, National Taiwan Normal University, Taipei 10610, Taiwan, Republic of China7
Received 23 September 2008/ Accepted 24 November 2008
Coronaviruses are positive-sense single-stranded RNA (ssRNA) viruses. The coronavirus genomic RNA is encapsidated into a helical capsid by the nucleocapsid (N) protein, which is one of the most abundant coronavirus proteins (19). The N protein has nonspecific binding activity toward nucleic acids, including ssRNA, single-stranded DNA, and double-stranded DNA (33). It can also act as an RNA chaperone (39). However, the mechanism of binding of the N protein to nucleic acids is poorly understood.
The SARS-CoV N protein is a homodimer composed of 422 amino acids (aa) in each chain. The N protein can be divided into two structural domains interspersed with disordered (unstructured) regions (Fig. 1A) (2). The N-terminal domain (NTD; also called RBD) serves as a putative RNA-binding domain, while the C-terminal domain (CTD; also called DD) is a dimerization domain (13, 36). Both the NTD and the CTD bind to nucleic acids through electropositive regions on their surfaces (3, 13, 32). All coronaviruses share similar domain architectures at both the sequence and structure levels. No structure of N protein or any of its domains in complex with nucleic acids is available.
FIG. 1. (A) Schematic of the domain architecture of the SARS-CoV N protein. Structured domains are shown as balls, and unstructured regions are shown as lines. (B) Protein constructs used in the current study. Numbers represent the amino acid residue range relative to the full-length N protein (NP). Sumo-1-FL contains a Sumo-1 tag (shown as an oval), followed by the flexible linker of the N protein between residues 181 and 246.
Here we tested all three disordered regions of the SARS-CoV N protein and found that they are all involved in RNA binding. The central region, in particular, had a large impact on binding behavior as monitored by electrophoretic mobility shift assays (EMSA). Small-angle X-ray scattering (SAXS) and nuclear magnetic resonance (NMR) results show that this central region is a flexible linker (FL) that connects the two structural domains in an extended conformation. Our results provide new insights into the functional coupling of intrinsic disorder, RNA binding, and oligomerization.
FIG. 2. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis gel strips of the various SARS-CoV NP protein constructs after purification. Almost all constructs appear as a single band in the gel strips, and for the few exceptions, the purity of the main band exceeds 90%. Lanes are labeled in the following order: M, light molecular mass marker; 1, NP1-181; 2, NP45-181; 3, NP45-247; 4, NP181-365; 5, NP248-365; 6, NP248-422; 7, NP45-365; 8, Sumo-1-FL.
NMR spectroscopy. Samples contained 0.5 to 1 mM protein in NMR buffer (10 mM sodium phosphate [pH 6.0], 50 mM NaCl, 1 mM EDTA, 1 mM 2,2-dimethyl-2-silapentane-5-sulfonate [DSS], 0.01% NaN3, 10% D2O, and Complete Mini protease inhibitor mix [Roche]). Experiments were performed at 30°C unless stated otherwise. Bruker 600-MHz spectrometers equipped with cryoprobes were employed in the experiments. The data acquired were processed with the TopSpin suite (Bruker Biospin, Germany) or iNMR (Nucleomatica, Italy).
Size exclusion chromatography. Experiments were conducted using an Akta fast-performance liquid chromatography system (GE Healthcare, CA) equipped with a Tricorn 10/300 Superdex 75 column at an elution rate of 0.2 ml/min. Apparent molecular weights of the proteins were estimated from the elution profile calibrated with the LMW gel filtration calibration kit (GE Healthcare, CA). Elution volume and molecular weight have the relationship log(MW) = 6.5404 – 0.1802 EV, where MW is the molecular weight in thousands and EV is the elution volume in milliliters.
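The calibration relationship above can be evaluated directly. The following is a minimal sketch, assuming the logarithm is base 10 (the text does not say) and using hypothetical elution volumes; the function names are illustrative and not part of the original work. The result is in whatever unit the calibration was expressed in.

```python
import math

# Calibration constants quoted in the text: log(MW) = 6.5404 - 0.1802 * EV.
INTERCEPT = 6.5404
SLOPE = 0.1802


def apparent_mw(elution_volume_ml: float) -> float:
    """Apparent molecular weight for an elution volume in mL.

    Assumes a base-10 logarithm, which is an assumption of this sketch.
    """
    return 10 ** (INTERCEPT - SLOPE * elution_volume_ml)


def elution_volume(mw: float) -> float:
    """Inverse of apparent_mw: elution volume (mL) expected for a given MW."""
    return (INTERCEPT - math.log10(mw)) / SLOPE


if __name__ == "__main__":
    # Hypothetical elution volumes, chosen only to show the trend:
    # a larger elution volume corresponds to a smaller apparent MW.
    for ev in (9.0, 10.0, 11.0, 12.0):
        print(f"EV = {ev:5.2f} mL -> apparent MW = {apparent_mw(ev):12.0f}")
```

Because the relationship is log-linear, each additional milliliter of elution volume lowers the apparent molecular weight by the same factor, 10^0.1802, or roughly 1.5-fold.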
SAXS. The didomain construct NP45-365 was concentrated to 10 mg/ml with an Amicon Ultra concentrator (Millipore, MA). Data were collected on the BL13A beam line of the National Synchrotron Radiation Research Center at 25°C (Hsinchu, Taiwan). The first 10 points of the data were excluded from analysis due to possible aggregation effects. The GNOM program was used to analyze the scattering profile and to obtain the radius of gyration (Rg), the pairwise distribution function [P(r)], and the maximal distance (dmax) (31). The BUNCH program was used to add flexible linkers assuming P1 symmetry (27). Atomic coordinates of the NTD monomer and the CTD dimer served as input to a modified version of CRYSOL (M. Petoukhov, personal communication). A total of 252 modeling runs were obtained, and the interdomain distances were measured by calculating the coordinates of the center of gravity of the two domains using in-house software.
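The interdomain distance measurement described above reduces to a center-of-gravity calculation for each domain followed by a Euclidean distance. The sketch below illustrates that calculation only; it is not the in-house software, the function names are hypothetical, and the coordinates are toy values rather than model output. Equal masses are assumed unless weights are supplied, which is a reasonable simplification when only alpha carbons are used.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def center_of_gravity(
    coords: Sequence[Point], masses: Optional[Sequence[float]] = None
) -> Point:
    """Mass-weighted mean position; equal masses are assumed if none are given."""
    if not coords:
        raise ValueError("no coordinates given")
    if masses is None:
        masses = [1.0] * len(coords)
    if len(masses) != len(coords):
        raise ValueError("masses and coords must have the same length")
    total = sum(masses)
    x = sum(m * c[0] for m, c in zip(masses, coords)) / total
    y = sum(m * c[1] for m, c in zip(masses, coords)) / total
    z = sum(m * c[2] for m, c in zip(masses, coords)) / total
    return (x, y, z)


def interdomain_distance(domain_a: Sequence[Point], domain_b: Sequence[Point]) -> float:
    """Distance between the centers of gravity of two coordinate sets."""
    return math.dist(center_of_gravity(domain_a), center_of_gravity(domain_b))


if __name__ == "__main__":
    # Toy coordinates (in angstroms) standing in for two domains.
    domain_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    domain_b = [(1.0, 2.0, 4.0), (1.0, 4.0, 4.0)]
    print(f"interdomain distance = {interdomain_distance(domain_a, domain_b):.2f} A")
```

Repeating this calculation over every model from a set of modeling runs would give a distribution of interdomain distances rather than a single value.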
Secondary-structure prediction and sequence alignment. Representative N protein sequences from all groups of Coronaviridae were obtained from the SwissProt server. The JPred metaserver was used to obtain consensus secondary-structure predictions for the central flexible linkers of the various sequences (5). These sequences were then manually aligned based on the predicted structural and physicochemical properties. The sequence length was arbitrarily fixed to that of the SARS-CoV N protein flexible linker for easier visualization.
FIG. 3. Effects of the ID regions (residues 1 to 44 and 182 to 247) on the RNA binding affinity of the NTD. (A through C) Fitting of the binding isotherms of NP45-181 (NTD) (A), NP1-181 (B), and NP45-247 (C), based on the EMSA results. Each binding isotherm represents the overall fitting against three independent experiments, taking into account the standard deviation of each data point. (D through F) Representative EMSA results for NP45-181 (D), NP1-181 (E), and NP45-247 (F).
FIG. 4. Effects of the ID regions (residues 182 to 247 and 366 to 422) on the RNA binding activity of the CTD. (A through C) Fitting of the binding isotherms of NP248-365 (CTD) (A), NP248-422 (B), and NP182-365 (C), based on the EMSA results. Each binding isotherm represents the overall fitting against three independent experiments, taking into account the standard deviation of each data point. (D through F) Representative EMSA results for NP248-365 (D), NP248-422 (E), and NP182-365 (F).
TABLE 1. Binding coefficients for U20 ssRNA to various regions of the SARS-CoV N protein
The flexible linker is ID. A combination of techniques was used to ascertain the intrinsic disorder of the flexible linker. 15N-edited heteronuclear single-quantum coherence (HSQC) spectra have been widely used as a tool to monitor the order and disorder of proteins (8). Well-dispersed spectra are indicative of a structured protein, while congested spectra with resonances clustered around a small region of 8.3 ± 0.5 ppm in the proton dimension often indicate a disordered one. Comparing the HSQC spectrum of NP45-247 with that of the NTD (NP45-181) in Fig. 5A, we observed additional resonances in the spectrum of NP45-247 clustered in the 7.5- to 8.5-ppm range of the proton chemical shift. This strongly suggests that the additional residues from aa 182 to 247 of NP45-247 are disordered. The dispersed resonances are almost exact matches between the two constructs, indicating that residues 182 to 247 do not affect the structure of residues 45 to 181. Furthermore, size exclusion chromatography of NP45-247 shows that the protein elutes from the column with a Stokes radius corresponding to a globular protein of 41 kDa (Fig. 5B). Because the theoretical molecular mass of the construct is only 22.9 kDa, this suggests that the NP45-247 construct has an elongated shape. This is in contrast to the NTD, which is mainly globular (13). We attribute this to residues 182 to 247 forming an extraneous "tail" that affects the hydrodynamic properties of the molecule. The alternative interpretation, dimer formation, is excluded because no additional well-dispersed resonance was observed. Our data presented in the next paragraph for CTD constructs also preclude dimer formation for residues 182 to 247.
FIG. 5. Residues 182 to 247 are ID when attached to the NTD. (A) 15N-edited HSQC spectra of NP45-181 (NTD) (left) and NP45-247 (right) show additional resonances clustered in the middle of the spectrum of NP45-247. Axis units are ppm. (B) Size exclusion chromatogram of NP45-247. The corresponding apparent molecular weight was calculated from the equation log(MW) = 6.5404 – 0.1802 EV, where MW is the molecular weight in thousands and EV is the elution volume in milliliters.
FIG. 6. Residues 182 to 247 are ID when attached to the CTD. (A) 15N-edited HSQC spectra of NP248-365 (CTD) (left) and NP182-365 (right) show additional resonances clustered in the middle of the spectrum of NP182-365. Axis units are ppm. (B) Size exclusion chromatogram of NP182-365. The corresponding apparent molecular weight was calculated from the equation log(MW) = 6.5404 – 0.1802 EV, where MW is the molecular weight in thousands and EV is the elution volume in milliliters.
FIG. 7. SAXS results for the didomain construct NP45-365. (A) Scattering profile of NP45-365 (crosses) and normalization fitting with GNOM (dashed lines). J, scattering intensity; s, scattering angle vector. (B) Normalized results from GNOM showing the pairwise distance distribution [P(r)] and the maximum distance. The radius of gyration is fitted to 61 Å. "r" represents the pairwise distances. (C) Representative model of NP45-365 structure based on CRYSOL simulations of SAXS data. Only the alpha carbons are shown. Notice the difference in distance between the two NTDs and the CTD core.
FIG. 8. Alignment of the flexible linker regions from different coronavirus N proteins. Residues that are predicted by JPred to form a helix are boxed. The arginines of the SR-rich regions are underlined. The names of the coronaviruses (with SwissProt accession numbers and phylogenetic groups in parentheses) are as follows: SARS-CoV (P59595; group 2b); NL63, human coronavirus NL63 (Q6Q1R8; group 1b); 229E, human coronavirus 229E (P15139; group 1b); TGEV, porcine transmissible gastroenteritis virus strain Purdue (P04134; group 1a); OC43, human coronavirus OC43 (P33469; group 2a); MHV-1, murine hepatitis virus 1 (P18446; group 2a); IBV, avian infectious bronchitis virus strain Beaudette (P69596; group 3).
The major conclusions from the present studies are as follows. (i) The SARS-CoV N protein is a modular protein consisting of two structured domains flanked by three long stretches of ID segments. (ii) The ID regions account for almost half of the molecule, and the central ID region exists in an extended conformation. (iii) There are multiple RNA-binding sites in the N protein with comparable binding affinities in the micromolar range. The binding sites are distributed in several regions of the molecule. Apparently this property is shared by all coronaviruses and perhaps by many nucleic acid-binding proteins. A large number of nucleic acid-binding proteins, including those of viral origin, contain long stretches of ID regions (34). The N proteins of paramyxoviruses and the core proteins of flaviviruses, for example, contain considerable numbers of disordered residues (14, 17). The advantages of these properties are discussed below, together with their relevance to RNA packaging and their functions.
Enhanced RNA-binding affinity. The presence of intrinsic disorder and multiple binding sites together can confer high RNA-binding affinity. First, the extended conformation of the N protein due to the presence of ID segments increases the collision radius with RNA, much like in the "fly-casting" model proposed by Shoemaker et al. (29). Second, transcription factors and other allosteric cell signaling proteins contain a disproportionate number of domains or segments that are ID under native conditions. Hilser and Thompson have proposed a quantitative mechanistic model to assess the importance of intrinsic disorder for intramolecular site-to-site communication in a multidomain regulatory protein, the so-called "coupled-allostery" effect (12). They showed that site-to-site allosteric coupling is maximized when intrinsic disorder is present in the domains or segments containing one or both of the coupled binding sites. Although regulatory proteins generally have much higher affinity for their respective RNA or DNA targets than that presented here for the N protein, the same principles can be applied to this system. The N protein contains multiple RNA-binding sites and showed a large Hill coefficient, as revealed by our EMSA results. Thus, like that of a multidomain regulatory protein, the RNA binding of the N protein is allosteric, i.e., binding of a segment to RNA facilitates the binding of other segments to RNA. The flexibility of the ID region in the N protein allows the optimal alignment of RNA-binding site-containing segments of the N protein and facilitates their binding to the RNA molecule already bound to other sites of the same N protein molecule, resulting in enhanced binding affinity. It should be realized that the "coupled-allostery" effect is more robust and effective in enhancing binding affinity than the multivalence effect in a rigid molecule, since the binding sites do not have to align perfectly for initial binding. 
Thus, even though the RNA-binding affinity of the individual sites of the N protein is not particularly strong, the RNA-binding affinity of the full-length protein can be very high due to the combined "fly-casting" (29) and "coupled-allostery" (12) effects conferred by the modular N protein with ID linkers.
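The role of the Hill coefficient in this argument can be illustrated numerically. The sketch below uses the standard Hill equation, theta = P^n / (Kd^n + P^n); the exact model used to fit the EMSA isotherms is not shown in this text, so this form, the function name, and all concentrations are assumptions made for illustration only.

```python
def hill_fraction_bound(protein_conc: float, kd: float, n: float) -> float:
    """Fraction of RNA bound according to the standard Hill equation.

    protein_conc and kd must share the same concentration unit;
    n is the Hill coefficient (n > 1 indicates positive cooperativity).
    """
    if protein_conc < 0 or kd <= 0 or n <= 0:
        raise ValueError("protein_conc must be >= 0; kd and n must be > 0")
    p_n = protein_conc ** n
    return p_n / (kd ** n + p_n)


if __name__ == "__main__":
    kd = 2.0  # hypothetical apparent dissociation constant, in micromolar
    # Compare a non-cooperative binder (n = 1) with a cooperative one (n = 4).
    for conc in (0.5, 1.0, 2.0, 4.0, 8.0):
        non_coop = hill_fraction_bound(conc, kd, 1.0)
        coop = hill_fraction_bound(conc, kd, 4.0)
        print(f"[P] = {conc:4.1f} uM  n=1: {non_coop:.3f}  n=4: {coop:.3f}")
```

Both curves pass through half-saturation at the same concentration, but the cooperative curve rises much more steeply around it, which is the behavior a large Hill coefficient describes.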
ID regions as interaction hubs. One of the surprises in this study is the involvement of the flexible linker in RNA binding (Table 1), which has never been reported for the SARS-CoV N protein. The SR-rich region of the flexible linker has been implicated in a number of protein-protein interactions, including those with host proteins such as human heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1) and the phosphoprotein B23 (22, 37). It also plays a role in self-association (10, 23) and could have implications for the formation of the nucleocapsid. The SR-rich region also contains the highest density of positive charges in the flexible linker but is also a site for multiple phosphorylation and thus is a prime target for regulating RNA-binding activity (30). In fact, electrostatic charges have been shown to play an important role in the nonspecific RNA binding of the structured regions, and all the putative disordered regions of coronavirus N proteins are positively charged (3, 13, 32). The multifarious activities revolving around the flexible linker of the SARS-CoV N protein strongly suggest that this region acts as a "flexible-net" interaction hub (24), where intrinsic disorder plays a key role.
The flexible linker might not be the only region that could act as an interaction hub. The C-terminal disordered region, for example, has been found to participate in the oligomerization of the N protein (21). However, a polylysine stretch within the oligomerization region has also been shown to bind to nucleic acids. Moreover, earlier reports have shown that this C-terminal region interacts with the membrane (M) protein of SARS-CoV (11). Although the function of the N-terminal disordered domain has not yet been identified, it has been speculated that this region is involved in protein-protein interactions (25). Taken together, we speculate that the three disordered regions of the SARS-CoV N protein represent three interaction hubs that bind to different partners of the N protein interactome. This is consistent with the observation for the regulatory proteins that ID regions are able to recognize multiple partners.
Coupled nucleic acid binding and self-association. Similar mechanisms may link RNA binding with N protein self-association in the disordered regions. Both the flexible linker and the C-terminal disordered region have been implicated in oligomerization of the N protein (10, 21), and our current findings showed that they also bind to nucleic acids. The effect of RNA binding on oligomerization could be even more dramatic for the ID regions. The extensively charged nature of the flexible linker and the C-terminal disordered region represents a large barrier to N-N interaction. In fact, repulsive forces between the domains may cause the large Rg observed for the didomain construct NP45-365 in our SAXS studies. While charge repulsion between the domains confers the advantage of avoiding interdomain interactions and results in a larger electrostatic binding surface, it also impedes oligomerization (4, 9) and formation of the nucleocapsid. Binding to nucleic acids may neutralize the charges on the N protein and allow two protein molecules to approach and oligomerize. This simple concept would couple capsid formation, which is essentially a self-association process, with RNA binding and guarantee the formation of nucleocapsids containing genetic material. Multiple phosphorylation of the SR-rich region, on the other hand, could provide an additional level of regulation to the RNA-binding process or the self-association process (26, 30). However, the functions and levels of phosphorylation of SARS-CoV NP are still uncertain, and whether phosphorylation really plays a role in RNA binding and/or capsid formation remains to be determined.
Insights into the linkage between RNA binding and RNP packaging. The modular structure and the presence of ID segments in the N protein offer considerable advantages for the packaging of the genomic RNP and the expression of genomic information. We envision that a single RNA molecule will bind to multiple N proteins at a given moment. Since the bindings are electrostatic and nonspecific, the RNA-bound N proteins presumably can "slide" along the RNA molecule and interact with other RNA-bound N proteins (16). The flexible linker allows more freedom for the different parts of the N protein molecule to interact with each other, resulting in specific packaging of the helical RNP molecule. We have previously shown that in crystal the CTD packs to form two parallel, basic helical grooves, which may be oligonucleotide attachment sites (3). Thus, the RNA molecule would wrap around the CTD core in forming the helical RNP molecule. In the model, both the N and the C terminus of the CTD protrude out of the helical core, potentially allowing the linker, NTD, and N-terminal residues to interact with other parts of the RNA molecule. The ID regions will play a pivotal role in optimizing the interaction of the RNA molecule with all the other segments of the N protein. The SARS-CoV NTD and the NTD and CTD of avian infectious bronchitis virus have also been found to form helical packing in crystal (7, 15, 28). In the absence of the structure of RNA-bound N protein, we cannot exclude the possibility of other forms of helical packaging. Nonetheless, the two characteristics of the N protein, i.e., intrinsic disorder and multiple RNA-binding sites, will be of fundamental importance in understanding the packaging of the RNP.
The modular structure and multiple sites of moderate RNA-binding affinity of the N protein not only allow the packaging of a stable RNP but also offer an energetically favorable condition for the expression of the viral genomic information. One can envision an unzipping mechanism for unwinding of the viral RNA molecule and dissociation of the RNA molecule from the N protein in a stepwise manner, one module at a time, without the need to overcome a high-energy barrier, since each module of the N protein interacts with the RNA molecule with only moderate affinity. Whether such a mechanism exists will not be known until the detailed atomic resolution structure of the SARS-CoV RNP complex is available.
In conclusion, we showed that the SARS-CoV N protein is a modular protein containing multiple RNA-binding sites. A hallmark of this protein is the presence of long segments of ID regions, accounting for almost half of the sequence. We have also determined the RNA-binding affinity of each module semiquantitatively. The RNA-binding sites reside throughout the entire sequence, including the ID regions of the protein. The flexible linkers of different coronavirus N proteins share low homology, yet they exhibit similar physicochemical properties, implying a universal code of RNA binding in this protein family. The presence of multiple RNA-binding sites of moderate affinity, coupled with the presence of the long stretches of ID regions in the N protein structure, is likely to have fundamental consequences not only for the RNA-packaging mechanism and viral genome expression but also for interaction with other viral and host proteins.
We also thank Maxim Petoukhov (EMBL, Heidelberg, Germany) for providing the modified CRYSOL program and for helpful discussions about SAXS techniques and data processing.
Published ahead of print on 3 December 2008.