Characterization of Bafinivirus Main Protease Autoprocessing Activities

ABSTRACT The production of functional nidovirus replication-transcription complexes involves extensive proteolytic processing by virus-encoded proteases. In this study, we characterized the viral main protease (Mpro) of the type species, White bream virus (WBV), of the newly established genus Bafinivirus (order Nidovirales, family Coronaviridae, subfamily Torovirinae). Comparative sequence analysis and mutagenesis data confirmed that the WBV Mpro is a picornavirus 3C-like serine protease that uses a Ser-His-Asp catalytic triad embedded in a predicted two-β-barrel fold, which is extended by a third domain at its C terminus. Bacterially expressed WBV Mpro autocatalytically released itself from flanking sequences and was able to mediate proteolytic processing in trans. Using N-terminal sequencing of autoproteolytic processing products we tentatively identified Gln↓(Ala, Thr) as a substrate consensus sequence. Mutagenesis data provided evidence to suggest that two conserved His and Thr residues are part of the S1 subsite of the enzyme's substrate-binding pocket. Interestingly, we observed two N-proximal and two C-proximal autoprocessing sites in the bacterial expression system. The detection of two major forms of Mpro, resulting from processing at two different N-proximal and one C-proximal site, in WBV-infected epithelioma papulosum cyprini cells confirmed the biological relevance of the biochemical data obtained in heterologous expression systems. To our knowledge, the use of alternative Mpro autoprocessing sites has not been described previously for other nidovirus Mpro domains. The data presented in this study lend further support to our previous conclusion that bafiniviruses represent a distinct group of viruses that significantly diverged from other phylogenetic clusters of the order Nidovirales.

The order Nidovirales comprises four profoundly separated clusters of plus-strand RNA viruses that have been assigned to three families called Coronaviridae, Arteriviridae, and Roniviridae (13,20). The taxonomy of the family Coronaviridae has recently been revised and now includes two subfamilies called Coronavirinae (genera Alpha-, Beta-, and Gammacoronavirus) and Torovirinae (genera Torovirus and Bafinivirus) (http://ictvonline.org/virusTaxonomy.asp). White bream virus (WBV) strain DF24/00 is the type species and, to date, sole member of the newly established genus Bafinivirus (38). With about 26.6 kb, the genome of WBV rivals the large genome sizes reported for many other nidoviruses. Further similarities between bafiniviruses and other nidoviruses include conserved (i) polycistronic genome organizations; (ii) genome expression strategies, including the synthesis of a nested set of subgenomic (sg) mRNAs; and (iii) colinear arrays of replicative enzymes, including RNA-processing activities that are not conserved in other RNA viruses. WBV (and yet-to-be-identified members of the genus Bafinivirus) and viruses from the genus Torovirus were classified as two separate genera in a common subfamily on the basis of phylogenetic analyses of the most conserved viral enzymes and conserved features of their structural proteins (18,38). The available information suggests that members of the two genera originated from an immediate common ancestor but split early in evolution (38).
Also, the two groups of viruses infect very different hosts (mammals and fish, respectively) and use slightly different strategies to synthesize their subgenomic RNAs (38,48). The available (although limited) data suggest that the bafinivirus cluster diverged profoundly from other nidoviruses (including toroviruses) and occupies a distant position in the evolutionary tree of nidoviruses. Molecular studies of the biology of bafiniviruses may therefore be expected to provide new insight into conserved and distinct biological features of the major clusters of nidoviruses and, possibly, help identify major driving forces involved in the evolution of this highly divergent group of RNA viruses. We therefore decided to embark on a systematic characterization of the molecular biology of bafiniviruses, and we report here the results of a first set of experiments aimed at investigating the expression and autoprocessing of the WBV main protease (Mpro), an enzyme predicted to play a key role in the proteolytic processing of the WBV ORF1a/1b-encoded polyproteins, pp1a and pp1ab, and thus in the formation and functional maturation of the viral replication-transcription complex, as shown previously for other nidoviruses (reviewed in references 47 and 58).
Proteolytic processing of WBV pp1a and pp1ab is expected to yield more than a dozen nonstructural proteins (nsps). The majority of processing events are predicted to be mediated by the Mpro, while other yet-to-be-identified proteases might cleave a limited number of sites in the N-terminal polyprotein regions (38). Because of similarities in both sequence and structure between nidovirus Mpros and picornavirus 3C proteases (3Cpro), the nidovirus Mpro is occasionally referred to as 3C-like protease (3CLpro) (19,63), while yet another name, chymotrypsin-like proteases (CHLpro), is sometimes used for the same group of proteases to refer to the chymotrypsin (two-β-barrel) fold that is shared by all 3C and 3C-like proteases characterized to date (2-4, 7, 31, 32, 42, 46, 52, 54, 64).
We previously identified a putative Mpro homolog in the WBV replicase polyproteins (38). Similar to its homologs in other nidoviruses (21,59,63), the WBV Mpro was confirmed to be located within the C-terminal third of pp1a, where it is flanked on both sides by hydrophobic domains (38). The present study provides initial insight into the autoprocessing characteristics and substrate specificity of the WBV Mpro using information derived from heterologous expression systems and WBV-infected cells and reveals conserved residues predicted to be involved in catalysis and substrate specificity.
0.5. The culture was halved, and protein expression was induced through the addition of IPTG (isopropyl-β-D-thiogalactopyranoside) to a final concentration of 1 mM in one of the two cultures, while the other was mock induced. After induction of protein expression, the cultures were incubated for 3 h at 25°C. Aliquots of each culture were spun at 15,000 × g for 1 min, and the pelleted cells were resuspended in 2× Laemmli sample buffer and incubated at 94°C for 2 min. Protein expression was analyzed by SDS-PAGE and Coomassie staining.
GST-affinity purification and N-terminal sequencing of proteolytic processing products. The proteins were expressed in E. coli TB1 essentially as described above. Three hours after induction of expression, the cells were pelleted by centrifugation at 6,000 × g for 10 min at 4°C, resuspended in GST purification buffer (20 mM Tris-Cl [pH 7.5], 200 mM NaCl, 1 mM dithiothreitol [DTT]), and stored at −20°C overnight. The cells were thawed, 1× complete protease inhibitor cocktail without EDTA (Roche) was added, and the cells were lysed by sonication. Insoluble material was removed by centrifugation at 18,000 × g for 30 min at 4°C. GST-containing cleavage products were purified by using GST-Bind Resin (Novagen) and eluted with 10 mM reduced glutathione in GST purification buffer. For determination of the N terminus, the eluted protein was electrophoresed in a discontinuous SDS-10% polyacrylamide gel, transferred onto Sequi-Blot polyvinylidene difluoride membrane (Bio-Rad), and stained with Coomassie brilliant blue R-250. Determination of the N terminus by Edman degradation was carried out by the protein sequencing facility of the Functional Genomics Center, Zurich, Switzerland.
IMAC of Mpro-CHis wild-type and mutant proteins. E. coli TB1 cells were transformed with the appropriate (wild-type or mutant) pMal-WBV-Mpro-CHis plasmid DNA. The protein was expressed as described for GST-affinity purification with the exception that the immobilized metal-affinity chromatography (IMAC) purification buffer contained 20 mM Tris-Cl (pH 8.0), 200 mM NaCl, and 5 mM 2-mercaptoethanol. In addition, the lysis buffer contained 10 mM imidazole. The cleared lysate was incubated with Ni-NTA agarose (Invitrogen) for 1 h at 4°C, and nonspecifically bound proteins were subsequently removed by extensive washing with IMAC purification buffer containing 50 mM imidazole. Bound proteins were eluted using 250 mM imidazole in IMAC purification buffer.
Amylose affinity purification. Maltose-binding protein (MBP) fusion proteins were expressed in E. coli TB1 cells essentially as described above for GST fusion proteins with the exception that bacteria were suspended in MBP purification buffer (50 mM Tris-Cl [pH 7.5], 300 mM NaCl, 1 mM DTT). Cleared lysates were applied onto a column containing amylose resin (NEB). Nonspecifically bound proteins were removed by extensive washing with MBP purification buffer, and bound protein was eluted using MBP purification buffer supplemented with 10 mM maltose. The MBP-Mpro marker fusion proteins were subsequently dialyzed against factor Xa cleavage buffer (20 mM Tris-Cl [pH 7.5], 200 mM NaCl, 5 mM CaCl2) and cleaved with factor Xa (Novagen) according to the manufacturer's instructions.
Detection of proteolytic processing products by immunoblotting. To generate a WBV Mpro-specific antiserum, Mpro-CHis was expressed in E. coli TB1 and purified by IMAC as described above. The protein was then purified further by size exclusion chromatography using a HiLoad 16/60 Superdex 75 prep-grade column (GE Healthcare) and buffer containing 20 mM Tris-Cl (pH 8.0), 200 mM NaCl, 1 mM DTT. This purified protein was used to raise a WBV Mpro-specific polyclonal rabbit antiserum (Eurogentec, Inc.). Anti-MBP monoclonal antibody (catalog no. E8032S) was purchased from NEB, and rabbit anti-GST antiserum (catalog no. ab9085) was purchased from Abcam. For immunodetection of proteolytic processing products of MBP-pp1a-3424-3725-GST wild-type or mutant proteins, E. coli TB1 cells were transformed with pMal-pp1a-3424-3725-GST plasmid DNA or the appropriate mutant derivative. As a control for MBP expression, pMal-c2X was used. The proteins were expressed essentially as described above with the exception that cultures were chilled on ice for 10 min prior to addition of IPTG. As a further control, untransformed E. coli TB1 cells were treated in the same way. Aliquots of the cultures corresponding to equal amounts of OD600 were centrifuged at 15,000 × g for 1 min. Bacterial pellets were resuspended in 1× Laemmli sample buffer supplemented with 100 mM MgCl2 and agitated on a vortex for 5 min. After DNA was pelleted by centrifugation for 5 min at 15,000 × g, the supernatant was heated for 2 min at 94°C. Proteins were electrophoresed in 18-by-16-cm discontinuous SDS-15% polyacrylamide gels and subsequently transferred onto nitrocellulose membranes by semidry blotting. Unspecific binding sites were blocked by incubation of the membrane in 5% (wt/vol) skimmed milk powder in phosphate-buffered saline (PBS). The membrane was incubated with rabbit anti-GST antiserum (1:10,000) in 5% (wt/vol) skimmed milk powder in PBS-T0.05 [...] and analyzed by using the Odyssey system.
The immunoblot was then repeated as described above, but this time anti-MBP monoclonal antibody (1:10,000) and anti-WBV-Mpro (1:2,000) were used as primary antibodies and IRDye 800CW goat anti-rabbit IgG and IRDye 680 goat anti-mouse IgG (both at 1:20,000) were used as secondary antibodies.

Detection of Mpro in WBV-infected EPC cells.
For the detection of Mpro in WBV-infected cells, EPC cells were infected at a multiplicity of infection (MOI) of 10 50% tissue culture infective doses (TCID50) per cell. At 24 h postinfection, the cells were lysed in 1× Laemmli sample buffer, and the proteins were denatured by heating for 3 min at 94°C. The samples were purified by using the SDS-PAGE sample prep kit (Pierce) according to the manufacturer's instructions. The protein was quantified with Coomassie Plus Protein Assay Reagent (Pierce) using bovine serum albumin (BSA) as a standard. A total of 100 µg of protein per sample was separated in an 18-by-16-cm discontinuous SDS-15% polyacrylamide gel and transferred onto nitrocellulose. To ensure even running across the lanes, the Mpro markers had been mixed with lysates from mock-infected cells (100 µg of cellular protein per lane). WBV Mpro expressed in virus-infected cells and bacterially expressed Mpro markers were detected by immunoblotting as described above with anti-WBV-Mpro (1:1,000) as the primary antibody, followed by labeling with IRDye 800CW goat anti-rabbit IgG.
HPLC-based peptide cleavage assay. The peptide conforming to the N1 cleavage site (N1 peptide, H2N-GNTITRQ↓ALQR-COOH, high-pressure liquid chromatography [HPLC] purified to ≥98%) was purchased from Eurogentec, Inc. The reactions contained 0.5 mM peptide in 50 mM Tris-Cl, 100 mM NaCl, 1 mM DTT (pH 8.0), and 0.3 µM protease or the equivalent volume of purification buffer (negative control) in a total reaction volume of 25 µl. After incubation at 30°C for the indicated length of time, the reaction was terminated by addition of one volume of 2% trifluoroacetic acid (TFA). Reaction products were resolved in a 2 to 50% acetonitrile gradient in 0.1% TFA over 5 column volumes (CV) (3 h of incubation) or 9 CV (20 h of incubation) at a flow rate of 0.5 ml per min on a RPC C2/C18 ST 4.6/100 column connected to an ÄKTA purifier 10 system (both GE Healthcare) and detected by measuring the absorption at 215 nm (A215).
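The reaction conditions above imply a large molar excess of peptide over enzyme. The following is a minimal sketch of that arithmetic, assuming the protease concentration is 0.3 µM and the reaction volume is 25 µl (the micro prefixes are inferred from context); the helper name `amount_nmol` is hypothetical and not part of the original methods.

```python
# Reaction conditions as read from the assay description.
# Assumption: protease is at 0.3 µM and the volume is 25 µl.
PEPTIDE_MM = 0.5     # peptide concentration, mM
PROTEASE_UM = 0.3    # protease concentration, µM (assumed unit)
VOLUME_UL = 25.0     # total reaction volume, µl (assumed unit)


def amount_nmol(conc_um: float, volume_ul: float) -> float:
    """Return the amount in nmol for a concentration in µM and a volume in µl."""
    # µM x µl gives pmol; divide by 1000 to convert to nmol.
    return conc_um * volume_ul / 1000.0


peptide_nmol = amount_nmol(PEPTIDE_MM * 1000.0, VOLUME_UL)
protease_nmol = amount_nmol(PROTEASE_UM, VOLUME_UL)
substrate_excess = peptide_nmol / protease_nmol

print(f"peptide:  {peptide_nmol:.1f} nmol")
print(f"protease: {protease_nmol * 1000.0:.1f} pmol")
print(f"molar excess of peptide over protease: {substrate_excess:.0f}-fold")
```

Under these assumptions the peptide is present in well over a thousandfold molar excess, so detectable turnover requires multiple catalytic cycles per enzyme molecule.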

Amino- and carboxyl-terminal autoprocessing activities of the WBV Mpro domain.
For all members of the family Coronaviridae analyzed to date, the Mpro domain has been shown (or is predicted) to autocatalytically release itself from the downstream transmembrane domain (39,40,50,62,63). In contrast, processing immediately C-terminal of the Mpro domain could not be confirmed in the ronivirus Gill-associated virus (GAV) (60). We previously showed that expression of a fusion protein comprised of MBP and WBV pp1a/pp1ab residues Gln-3424 to Gln-3726 (that is, the putative Mpro domain together with short flanking sequences) resulted in autoprocessing N-terminal of the protease domain (38). In these experiments, autoprocessing at the Mpro carboxyl terminus could not be characterized because the expected small size of the potential C-terminal processing product likely precluded its detection. In the present study, we therefore extended the expression construct at the carboxyl terminus. To prevent the possible cytotoxicity that is often caused by the expression of hydrophobic sequences in E. coli, we did not extend the fusion protein with the transmembrane domain located downstream in WBV pp1a/pp1ab but inserted a heterologous sequence. Plasmid pMal-WBV-3CL_559-560 (38) was modified to contain the GST coding sequence downstream of the MBP-pp1a-3424-3725 sequence (Fig. 1A), resulting in plasmid pMal-pp1a-3424-3725-GST. When protein expression of IPTG- versus mock-induced E. coli cells transformed with pMal-pp1a-3424-3725-GST was analyzed by SDS-PAGE of whole-cell lysates, overexpression of the ~46-kDa N-terminal processing product could again be observed. Although an additional product with an apparent molecular mass of ~27 kDa was also detectable, we could not reliably assign its identity at this point. No third processing product could be detected. Using GST-Bind Resin (Novagen), an ~27-kDa protein was purified from E. coli expressing the wild-type (wt) fusion protein (Fig. 1E, lane 3).
A protein of similar apparent molecular mass was also detected by immunoblot analysis with GST-specific antiserum (data not shown and see Fig. 4D). At 27 kDa, the molecular mass of the additional observed product was higher than the calculated molecular mass of the GST domain alone (26 kDa), indicating that processing had occurred within the WBV pp1a/pp1ab part of the fusion protein (~10 amino acids from the C-terminal end of the WBV sequence). The data led us to conclude that the fusion protein containing the WBV pp1a/pp1ab wt sequence 3424 to 3725 was proteolytically cleaved at both the N and the C termini of the WBV Mpro domain. The failure to detect all of the expected processing products in the initial expression analysis may be due to comigration of Mpro- and GST-containing products (due to their similar sizes), instability of one of the products, or masking of one of the products through comigration with E. coli protein(s).
Determination of the Mpro domain boundaries and substrate specificity of the protease. To determine the exact location of the processing sites and obtain information on the substrate specificity of the WBV Mpro, we carried out a series of affinity purifications of different processing products and determined their N termini by Edman degradation. In the first instance, we sequenced the N terminus of the purified C-terminal (GST-containing) processing product of MBP-pp1a-3424-3725-GST (see above) by five cycles of Edman degradation (data not shown). The analysis revealed that cleavage had occurred at the WBV pp1a/pp1ab sequence 3713 TVGQ↓TLTS 3720. 3CLpros commonly show a preference for either Gln or Glu at the position immediately preceding the scissile bond (commonly referred to as the P1 residue according to the nomenclature introduced by Schechter and Berger [37], while the subsequent residue is referred to as P1′). The Gln/Glu residue is often followed by a small aliphatic residue at the P1′ position (reviewed in references 36 and 63). The tentatively identified substrate specificity of WBV Mpro for Gln followed by Thr thus conformed well to these previous observations, lending additional support to our identification of the C-terminal processing site.
As mentioned above, the P1 Gln/Glu residue is generally the major specificity determinant of 3C-like proteases (4,7,17,26,33,45,46,52,53). Substitution of the P1 residue of the identified cleavage site, Gln-3716, with Ala in the MBP-pp1a-3424-3725-GST fusion protein could therefore be expected to abolish processing C-terminal of the Mpro domain. Expression of a protein containing this specific substitution (MBP-pp1a-3424-3725_Q3716A-GST) should then allow us to purify the Mpro-GST fusion protein through glutathione affinity purification, and determination of the protein's N terminus would reveal the N-terminal Mpro autoprocessing site. Surprisingly, however, the apparent molecular mass of the purified product, as judged by SDS-PAGE, was only ~1 kDa larger than the C-terminal processing product of the MBP-pp1a-3424-3725-GST wt protein (Fig. 1F, compare lanes 3 and 4). N-terminal sequencing (data not shown) of this processing product revealed that processing had occurred at a second C-terminal processing site, 3706 VTQQ↓TSVT 3713, immediately N terminal of the previously identified 3713 TVGQ↓TLTS 3720 processing site. Both sites possess identical residues at the P1 and P1′ positions, Gln and Thr, respectively, but no further conserved features could be inferred for the two cleavage sites. In both cases, cleavage seemed to occur with high efficiency, as only the fully processed product was detectable in the eluate fraction obtained from the glutathione affinity purification. For brevity, the more N terminal of the two C-terminal Mpro autoprocessing sites, 3706 VTQQ↓TSVT 3713, is referred to as the C1 site, and the more C terminal, 3713 TVGQ↓TLTS 3720, is referred to as the C2 processing site.
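The observed size difference of about 1 kDa between the two GST-containing products can be checked against the residue numbering. The sketch below is a rough estimate, not the authors' calculation: it assumes an average residue mass of about 110 Da (a common rule of thumb), takes residue 3725 as the last WBV residue in the construct (from the plasmid name), and ignores any linker residues, which are the same in both products. The function name `retained_residues` is hypothetical.

```python
AVG_RESIDUE_DA = 110.0   # rule-of-thumb average residue mass (assumption)
CONSTRUCT_END = 3725     # last WBV residue in MBP-pp1a-3424-3725-GST

# P1 residues (the Gln immediately preceding the scissile bond) from the text.
P1 = {"C1": 3709, "C2": 3716}


def retained_residues(p1: int, end: int = CONSTRUCT_END) -> int:
    """Number of WBV residues left on the GST-containing product after cleavage."""
    return end - p1


retained = {site: retained_residues(pos) for site, pos in P1.items()}
extra = retained["C1"] - retained["C2"]
extra_kda = extra * AVG_RESIDUE_DA / 1000.0

for site, count in retained.items():
    print(f"{site}: {count} WBV residues remain fused to GST")
print(f"C1 product is longer by {extra} residues (about {extra_kda:.2f} kDa)")
```

A difference of seven residues corresponds to somewhat less than 1 kDa, which is in line with the small mobility shift seen by SDS-PAGE, and the nine residues retained after C2 cleavage agree with the earlier estimate of roughly 10 amino acids.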
To be able to purify Mpro and determine its N terminus, amino acids Gln-3424 to Thr-3707 of WBV pp1a/1ab were expressed as a fusion protein with MBP. A hexahistidine tag was added at the C terminus (just upstream of the C1 processing site) to allow affinity purification of the Mpro domain (Fig. 1B). The WBV pp1a/pp1ab sequence in this protein should contain the Mpro core domain and the N-terminal autoprocessing site, allowing autocatalytic release and subsequent affinity purification of the hexahistidine-tagged Mpro domain (Mpro-CHis). The purified protein had an apparent molecular mass of ~28 kDa (Fig. 1G, lane 5). The theoretical molecular mass of the entire WBV pp1a/1ab-derived part (Gln-3424 to Thr-3707) of the fusion protein including the C-terminal hexahistidine tag is 31.3 kDa. The data thus led us to conclude that (I) WBV pp1a/1ab residues Gln-3424 to Thr-3707 are sufficient for autocatalytic processing at the N terminus and (II) processing occurred about 20 amino acids into the N-terminal region of the expressed WBV pp1a/1ab fragment.
Sequencing of the N terminus of this protein (data not shown) revealed that the band consisted in fact of two processing products. While cleavage had predominantly occurred at 3439 ITRQ↓ALQR 3446, a minor processing product, where cleavage had occurred seven amino acids further downstream, at 3446 RIRQ↓AVTV 3453, could also be detected. In keeping with the nomenclature introduced for the C-terminal WBV Mpro processing sites (see above), the more N terminal of the two detected processing sites (3439 ITRQ↓ALQR 3446) is thus referred to as the N1 site and the more C-terminal processing site (3446 RIRQ↓AVTV 3453) as the N2 site.
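The four sequenced sites can be tabulated to check the tentative Gln↓(Ala, Thr) consensus and to derive the P1 and P1′ residue numbers from the window coordinates given in the text. This is an illustrative sketch only; the dictionary layout and the function name `describe_site` are hypothetical.

```python
# Each site: (number of the first residue in the window,
#             residues upstream of the scissile bond,
#             residues downstream of the scissile bond), as given in the text.
SITES = {
    "N1": (3439, "ITRQ", "ALQR"),
    "N2": (3446, "RIRQ", "AVTV"),
    "C1": (3706, "VTQQ", "TSVT"),
    "C2": (3713, "TVGQ", "TLTS"),
}


def describe_site(start: int, upstream: str, downstream: str) -> dict:
    """Return the P1 and P1' residues and their polyprotein positions."""
    p1_pos = start + len(upstream) - 1
    return {
        "P1": upstream[-1],
        "P1_pos": p1_pos,
        "P1prime": downstream[0],
        "P1prime_pos": p1_pos + 1,
    }


summary = {name: describe_site(*site) for name, site in SITES.items()}

# Tentative consensus from the text: Gln at P1, Ala or Thr at P1'.
fits_consensus = all(
    s["P1"] == "Q" and s["P1prime"] in {"A", "T"} for s in summary.values()
)

for name, s in summary.items():
    print(f"{name}: {s['P1']}{s['P1_pos']} | {s['P1prime']}{s['P1prime_pos']}")
print("all sites fit Gln/(Ala, Thr):", fits_consensus)
```

The N-terminal sites carry Ala at P1′ and the C-terminal sites carry Thr, which is why the consensus is stated with two alternatives at that position.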
The release of Mpro from the polyprotein can occur either intramolecularly (cis) or intermolecularly (trans). To determine the substrate preferences of the WBV Mpro in trans-cleavage reactions, we devised an assay that was suitable to determine whether Mpro displays a preference for one of the two detected N- and C-terminal cleavage sites. The proteolytically inactive MBP-pp1a-3424-3725_S3589A-GST fusion protein was used as a substrate in this experiment. Heterologously expressed and affinity-purified substrate protein was incubated with purified Mpro-CHis, and reaction products were analyzed by SDS-PAGE and Coomassie blue staining. Five major processing products could be identified (Fig. 2, lane 3). Based on the correlation of the apparent molecular masses of these products with the calculated molecular masses of theoretically possible processing products and the previously observed electrophoretic mobilities of some of these products, the four smaller products were postulated to be (I) the GST domain (and a few C-terminal residues of the WBV pp1a/1ab fragment) (~27 kDa), (II) the Mpro domain (~29 kDa), (III) the MBP domain (and a few N-terminal residues of the WBV pp1a/1ab fragment) (~46 kDa), and (IV) the Mpro-GST processing intermediate (~55 kDa). The largest processing product of ~80 kDa was originally thought to be the MBP-Mpro processing intermediate which, as a consequence of cleavage at one of the C-terminal processing sites, lacks the C-terminal GST domain. However, the results of a subsequent immunoblot analysis led us to conclude that this product was the result of a processing event within the MBP domain (see below for details).
To determine the C-terminal Mpro processing site that had been used under trans-cleavage reaction conditions, we subjected the 27-kDa processing product to five cycles of Edman degradation. Similarly, to determine the N-terminal Mpro processing site cleaved under these conditions, we analyzed the N terminus of the 55-kDa Mpro-GST processing intermediate. The Mpro-GST processing intermediate was chosen instead of the Mpro domain since the latter could not be sufficiently separated from the Mpro-CHis included in the reaction (Fig. 2, compare lanes 3 and 4), which might have caused contamination problems in subsequent sequence analyses. The data obtained in this analysis allowed us to conclude that processing had occurred at the C1 (3706 VTQQ↓TSVT 3713) and N1 (3439 ITRQ↓ALQR 3446) sites, respectively. However, upon extended incubation of the trans-cleavage reaction, processing at the C2 site also became evident (data not shown). In summary, the data show that WBV Mpro (pp1a/pp1ab residues 3443 to 3707) is active in trans and preferentially cleaves the N1 and C1 sites under these conditions.
Identification of Mpro in WBV-infected cells. The temporally and spatially regulated release of replicase subunits is thought to be critical for plus-strand RNA virus replication, and defects in virus replication observed in coronaviruses carrying nonsynonymous mutations that affect polyprotein processing support this hypothesis (12, 14; N. Karl, T. Hertzig, and J. Ziebuhr, unpublished data). In the light of these observations, the multitude of processing sites identified in the bacterial expression system was all the more astonishing and raised the question of whether all of these processing sites are indeed processed during virus replication. To investigate Mpro autoprocessing in virus-infected cells and to corroborate the autoprocessing data obtained in our bacterial expression systems, we performed another set of experiments.
As described above, Mpro autoprocessing could occur at two N-terminal (N1 and N2) and two C-terminal (C1 and C2) sites, potentially resulting in four different processing products, N1-C1, N1-C2, N2-C1, and N2-C2 (Fig. 3A). To identify and distinguish these proteins in SDS-polyacrylamide gels, we devised size markers for the four theoretically possible forms of WBV Mpro, N1-C1, N1-C2, N2-C1, and N2-C2. Fragments of WBV pp1a/pp1ab corresponding to the four putative forms of Mpro were expressed as fusion proteins with MBP, affinity purified, and subsequently cleaved with factor Xa to release the desired Mpro fragments without any additional, vector-derived sequences. The following WBV pp1a/1ab fragments were expressed: pp1a/1ab Ala-3443 to Gln-3709 for the N1-C1 marker, Ala-3443 to Gln-3716 for the N1-C2 marker, Ala-3450 to Gln-3709 for the N2-C1 marker, and Ala-3450 to Gln-3716 for the N2-C2 marker. To prevent autoprocessing at processing sites present in some of these proteins, all marker proteins contained the active-site Ser-3589-to-Ala substitution.
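The four marker fragments follow directly from combining each N-terminal start with each C-terminal end. The short sketch below enumerates them and computes their lengths from the residue numbers given above; it is only an illustration of the combinatorics, and the variable names are hypothetical.

```python
from itertools import product

# First residue (Ala) after each N-terminal site and last residue (Gln)
# before each C-terminal site, as listed for the marker proteins.
N_TERMINI = {"N1": 3443, "N2": 3450}
C_TERMINI = {"C1": 3709, "C2": 3716}

# Length in residues of each possible Mpro form (both end residues included).
forms = {
    f"{n}-{c}": C_TERMINI[c] - N_TERMINI[n] + 1
    for n, c in product(N_TERMINI, C_TERMINI)
}

for name, length in forms.items():
    print(f"{name}: residues {N_TERMINI[name[:2]]}-{C_TERMINI[name[3:]]}, {length} aa")
```

By residue count alone, the forms differ by only seven residues per alternative site, and the N1-C1 and N2-C2 forms have the same length, so a simple length estimate cannot predict how well these two would be resolved; their actual mobility depends on amino acid composition and has to be determined experimentally with the markers.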
The electrophoretic mobility of Mpro detected in WBV-infected EPC cells was compared to that of the size markers described above in an immunoblot with an Mpro-specific antiserum. Two proteins with slightly different molecular masses were detected in infected but not mock-infected cells (Fig. 3B). The two proteins comigrated with the N1-C1 (open arrowhead) and N2-C1 (filled arrowhead) size markers, respectively. The data lead us to conclude that the Mpro domain accumulates in two forms in virus-infected cells. Direct sequencing of the N termini and/or mass spectrometric analysis would be required to unequivocally confirm the identities of these two proteins. These types of analyses would require the isolation of the two forms from WBV-infected cells in comparatively high amounts and purity. However, only a limited number of flanking sites conform to the detected substrate specificity of Mpro, and even fewer would yield products of matching molecular mass. Based on the comigration pattern with marker proteins, it seems reasonable to suggest that the two forms of Mpro are produced by C-terminal processing at the C1 site and N-terminal processing at the N1 or N2 site. No product resulting from processing at the C2 site was detectable.
Fig. 4. ... (22). Abbreviations of virus names and accession numbers are as follows: HAstV-1, human astrovirus 1 (AAW51880); EToV, equine torovirus (X52374); BToV, bovine torovirus (AY427798); PRRSV, porcine reproductive and respiratory syndrome virus (Q04561); LDV, lactate dehydrogenase-elevating virus (NC_02534); EAV, equine arteritis virus (NP_127506). Secondary structure elements of the astrovirus Mpro are shown above the sequence (PDB code: 2W5E). These were extracted from the PDB file and added to the alignment using ESPript 2.2 and manually adjusted to conform to the numbering used by Speroni et al. (42). Conserved residues that were identified as possible catalytic or S1 subsite residues based on data available for the corresponding residues in astrovirus and nidovirus homologs (see the text for details) are marked with closed or open arrowheads, respectively. The gray arrowhead marks the (nonconserved) Asp residue that was used as a negative control in this mutagenesis study. Residues were numbered according to their position in the respective viral polyprotein sequence. (B to D) Autoprocessing activities of mutant forms of the WBV Mpro with substitutions of putative active-site residues. MBP-pp1a-3424-3725-GST and mutant proteins were expressed in E. coli TB1. After induction of expression for 3 h at 25°C, total cell lysates were prepared and separated in a 15% discontinuous SDS-polyacrylamide gel and transferred onto nitrocellulose. GST-containing unprocessed precursor proteins, processing intermediates, and processing end products were detected by Western blot analysis with rabbit anti-GST antiserum (D) as the primary antibody and IRDye 800CW anti-rabbit IgG as the secondary antibody. After imaging, bound antibodies were removed. MBP- and Mpro-containing proteins were subsequently detected by using mouse anti-MBP monoclonal antibody (B) and rabbit anti-WBV Mpro antiserum (C), followed by labeling with IRDye 680 anti-mouse IgG and IRDye 800CW anti-rabbit IgG, respectively. Images were acquired by using an LI-COR Odyssey system and software. Wild-type and mutant proteins analyzed in this experiment are indicated above the lanes. MBP-LacZα, E. coli TB1 transformed with pMal-c2X and induced with 1 mM IPTG; control, untransformed and mock-induced E. coli TB1; Mpro marker 1, Mpro markers N1-C1 and N1-C2; Mpro marker 2, Mpro markers N2-C1 and N2-C2. The molecular masses of the marker proteins (in kilodaltons) and the identities of the Mpro marker proteins (in panel C) are indicated to the left. Black arrowheads, processing end products; gray arrowheads, processing intermediates; open arrowhead, unprocessed precursor.
Mutagenesis analysis of residues predicted to be involved in catalysis and substrate specificity. Cellular and viral chymotrypsin-like serine proteases rely on a Ser-His-Asp catalytic triad (reviewed in reference 24). To analyze whether this characteristic is conserved in WBV Mpro and to identify other conserved residues important for protease activity, the sequence of WBV Mpro was compared to that of homologs from astroviruses and toroviruses, which we identified in database searches as the closest known relatives of the WBV Mpro. We also included arterivirus main proteases in our analysis, because the active-site residues of these related nidovirus serine proteases have been characterized in detail in previous mutagenesis and structural studies. WBV pp1a/pp1ab residues Ser-3589, His-3492, and Asp-3518 (Fig. 4A) could be aligned with the active-site Ser, His, and Asp/Glu residues of other 3C-like proteases (19,29,41,42,46,63), although sequence conservation was largely confined to the areas surrounding the putative active-site triad described above.
Two conserved residues, a His and a Thr/Ser, the latter located five residues upstream of the active site Ser/Cys, have been implicated in binding the substrate P1-Gln/Glu, thereby contributing to the partially conserved P1 specificity of picornavirus 3C and related viral proteases (7, 31-33, 41, 60, 64). These two residues, His-3603 and Thr-3584, were found to be conserved in the WBV Mpro sequence.
The functional significance of putative catalytic and P1-binding residues for WBV Mpro activity was assessed by characterizing the proteolytic activities of mutant MBP-pp1a-3424-3725-GST fusion proteins carrying specific Ala replacements of predicted active-site residues. Unprocessed precursor proteins and cleavage products were identified in total bacterial lysates by immunoblotting with antibodies specific for MBP (Fig. 4B), WBV Mpro (Fig. 4C), and GST (Fig. 4D), respectively. Consistent with previously reported data and the predicted role of Ser-3589 as the Mpro active-site nucleophile (38), unprocessed full-length fusion protein with an apparent molecular mass of 101 kDa could be detected in lysates obtained from E. coli TB1 cells expressing the S3589A mutant protein in all three immunoblots (Fig. 4B, C, and D, open arrowhead). In contrast, when the MBP-pp1a-3424-3725-GST fusion protein containing the wild-type WBV Mpro sequence was expressed, proteolytic processing was observed. Thus, an MBP-containing N-terminal processing product of ~46 kDa (Fig. 4B, filled arrowhead) could be detected using anti-MBP antibody and two forms of the Mpro domain equivalent to those identified in WBV-infected cells (that is, the N1-C1 and N2-C1 forms) could be detected using the Mpro-specific antiserum (Fig. 4C, filled arrowheads). Furthermore, a C-terminal processing product of ~27 kDa could be detected using the GST-specific antiserum (Fig. 4D). In the latter case, an additional minor product of ~80 kDa could be specifically detected. A protein of the same size had also been detected in the Mpro-CHis-mediated trans-cleavage reaction of the MBP-pp1a-3424-3725_S3589A-GST mutant protein reported above (Fig. 2, lane 3).
This minor processing product was not detected by the anti-MBP antibody specific for an epitope within the N-terminal 75 amino acids of MBP (NEB technical support, unpublished data), confirming that the protein is a C-proximal rather than N-terminal processing product. Although the identity of this minor processing product was not determined unambiguously, its apparent molecular mass and immunological reactivity led us to suggest that the protein was produced by cleavage within the MBP sequence, which is irrelevant for pp1a/pp1ab processing in infected cells and, therefore, was not studied further.
When the putative active-site His-3492 was substituted with Ala, only unprocessed precursor was detected (Fig. 4B, C, and D), supporting the proposed critical role of His-3492 in Mpro activity, where the residue is predicted to act as a general base during catalysis. Similarly, unprocessed fusion protein was detected when the predicted acidic residue of the catalytic triad, Asp-3518, was substituted with Ala (Fig. 4B, D3518A). In this case, however, a small amount of the N-terminal processing product could also be detected. In addition, a protein with an apparent molecular mass of ~56 kDa was detected in these lysates using the anti-Mpro or the anti-GST antiserum (Fig. 4C and D, gray arrowhead). The apparent molecular mass, as well as the antigenic properties, of the protein led us to conclude that this protein is the Mpro- and GST-containing processing intermediate resulting from processing of the full-length precursor at one (or both) of the N-terminal Mpro autoprocessing sites. In contrast, substitution of Asp-3509 with Ala, which we used as a control in this experiment, did not impair proteolytic processing. We therefore concluded that Asp-3518 has a role in proteolytic activity and likely represents the acidic residue of the catalytic triad. Interestingly, substitution of this residue did not completely abolish activity and differentially affected the proteolytic processing at specific sites. Thus, processing at (at least one) N-terminal processing site did still occur (albeit relatively inefficiently), whereas virtually no C-terminal autoprocessing activity was detected. Together, the data lead us to suggest that the conserved Asp-3518 residue has an important but nonessential role in proteolytic activity, at least under these in vitro conditions. To further corroborate these conclusions, the potential effects of the Asp-3518-to-Ala substitution on protease activity were studied in a peptide-based trans-cleavage assay.
The desired substitution was introduced into the pMal-WBV-Mpro-CHis plasmid by site-directed mutagenesis, and the mutant protein was expressed in E. coli TB1 and purified by IMAC. When analyzed by SDS-PAGE (Fig. 5A), the preparation of wt protein was found to contain two proteins with apparent molecular masses of ~29 and ~28 kDa, resulting from differential processing at either of the two N-terminal processing sites N1 and N2. In contrast, the preparation of the D3518A mutant protein contained only the larger (~29 kDa) form. In addition, small amounts of a protein with a molecular mass corresponding to that of the unprocessed fusion protein were present in the eluate fraction. Taken together, these observations confirmed that autoprocessing is impaired in the D3518A mutant. The expressed proteins, Mpro-CHis wt and D3518A, were then used in a peptide-based cleavage assay (for details, see Materials and Methods) using a peptide whose sequence corresponded to the N1 processing site (N1 peptide, H2N-GNTITRQ↓ALQR-COOH). No cleavage of the substrate peptide could be detected (even after overnight incubation) with the D3518A mutant protein (Fig. 5C), whereas wt protein digested the peptide to completion under equivalent conditions. The data confirmed the critical role for Asp-3518 in proteolytic activity. Substitution of this residue with Ala resulted in an overall reduced activity but differentially affected the processing at individual sites. In a bacterial expression system, processing at either of the C-terminal processing sites and the N2 site was more severely affected than processing at the N1 site. Peptide cleavage data confirmed and extended these conclusions by showing that, in contrast to the wt protein, the D3518A mutant was virtually inactive at the N1 processing site when tested in an in vitro trans-cleavage assay.
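For readers who wish to keep track of the expected products of the N1 peptide assay, the following minimal Python sketch splits the N1 peptide at the scissile bond and labels the flanking residues in the Schechter-Berger nomenclature (P1 is the residue immediately N-terminal to the cleaved bond, P1' the residue immediately C-terminal to it). The function names are hypothetical and serve only as an illustration; they are not part of the analysis pipeline used in this study.

```python
def cleave_after(peptide: str, p1_position: int) -> tuple[str, str]:
    """Split a peptide after the residue at 1-based position p1_position."""
    if not 1 <= p1_position < len(peptide):
        raise ValueError("P1 position must lie inside the peptide")
    return peptide[:p1_position], peptide[p1_position:]


def label_subsites(peptide: str, p1_position: int) -> dict[str, str]:
    """Map Schechter-Berger labels (P4..P1, P1'..P4') to residues."""
    n_frag, c_frag = cleave_after(peptide, p1_position)
    labels = {}
    # Non-primed side: count outward from the scissile bond (P1, P2, ...).
    for i, residue in enumerate(reversed(n_frag), start=1):
        labels[f"P{i}"] = residue
    # Primed side: P1', P2', ... toward the C terminus.
    for i, residue in enumerate(c_frag, start=1):
        labels[f"P{i}'"] = residue
    return labels


# N1 peptide from the text: H2N-GNTITRQ|ALQR-COOH (cleavage after Gln-7).
N1_PEPTIDE = "GNTITRQALQR"
n_product, c_product = cleave_after(N1_PEPTIDE, 7)
subsites = label_subsites(N1_PEPTIDE, 7)
print(n_product, c_product, subsites["P1"], subsites["P1'"])
```

In this representation, complete digestion by the wt enzyme corresponds to full conversion of the 11-residue substrate into the 7-residue and 4-residue products, whereas the D3518A result corresponds to the substrate remaining intact.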
The effect of substitutions of the putative P1-Gln/Glu-binding residues, His-3603 and Thr-3584, with Ala was also analyzed in this experiment. No autoprocessing activity could be detected in the His-3603-to-Ala substitution mutant (Fig. 4B, C, and D), leading us to conclude that this residue, in accordance with our hypothesis, is crucial for proteolytic activity, likely by contributing to substrate binding. In contrast, no unprocessed precursor was detected in lysates of E. coli TB1 cells expressing the T3584A mutant protein, demonstrating that the role of this conserved residue in proteolysis is less critical. However, we only detected the larger N1-C1 form of Mpro in lysates derived from E. coli cells expressing the T3584A mutant protein (Fig. 4C), while the smaller N2-C1 form, which was readily apparent in cells expressing the wt construct, was not detectable in cells expressing the mutant protein. Furthermore, the predominant C-terminal GST-containing processing product detected in lysates of E. coli expressing the T3584A mutant protein was of marginally lower electrophoretic mobility than that detected in the wt preparation (Fig. 4D). This could possibly be due to (predominant) processing at the C1 but not the C2 site. However, due to the minor differences in the molecular masses of the two proteins and the limited resolution of proteins of this size in SDS-PAGE, we cannot exclude that some processing at the C2 site did occur. The effect of the Thr-3584-to-Ala substitution on trans-processing activity was also tested. The Mpro-CHis T3584A mutant protein was expressed and purified as described above for the Mpro-CHis D3518A mutant protein. In contrast to the purified wt protein, only the larger form of Mpro resulting from processing at the N1 site was present in the eluate fraction of the T3584A mutant protein (Fig. 5A).
This is in good agreement with the changes in N-terminal processing previously observed after expression of the MBP-pp1a-3424-3725_T3584A-GST mutant protein (Fig. 4C). When tested in the peptide-based trans-cleavage assay using N1 peptide as substrate, the activity of the T3584A mutant protein was markedly reduced compared to wt protein (Fig. 5B). However, some processing of the peptide could still be observed and, after overnight incubation, T3584A mutant protein digested the peptide to near completion (Fig. 5C). Thus, under the conditions used, the activity of the T3584A mutant protein was reduced compared to wt but less affected than that of the D3518A mutant discussed above.

DISCUSSION
The study revealed important new information on the autoprocessing activities of an Mpro from bafiniviruses. We were able to show that WBV Mpro cleaves the polyprotein at N- and C-proximal sites to release (at least) two Mpro-containing processing products. Based on comparative sequence analysis data (Fig. 4A and data not shown) and crystal structure information available for 3C and 3C-like proteases (2,3,7,31,32,42), it is reasonable to suggest that the N-proximal two-thirds of the WBV Mpro have a chymotrypsin-like two-β-barrel fold that is extended at the C terminus by an extra domain. Secondary structure predictions generated by using the Jpred 3, PredictProtein, and I-TASSER servers (9,34,35,56) consistently suggested that the C-terminal extra domain has a helical structure comprised of at least three α-helices (data not shown). With about 80 residues, the size of the C-terminal domain is larger than that of the arterivirus and smaller than that of the coronavirus homologs.
The data presented here suggest that pp1a/pp1ab residues Ser-3589, His-3492, and Asp-3518 constitute a prototypical Ser-His-Asp catalytic triad that is conserved in many cellular and viral CHLpros. Although the conserved Ser and His residues proved to be essential for activity, replacement of Asp-3518 with Ala resulted in reduced but clearly detectable activity, with cleavage at different sites being affected to a varying extent (Fig. 4B to D). The data suggest a supportive (rather than essential) role for this residue in catalysis, most likely by positioning the active-site His in the required orientation and neutralizing the developing positive charge of the His in the transition state (5,6,10,15,16,43). Similar data have previously been reported for related chymotrypsin-like proteases, including EAV nsp4 (Asp-1129) and EToV Mpro (Glu-3347) (39,41). For example, partial processing defects at selected sites had also been seen in EAV nsp4 mutant proteins with equivalent Asp replacements. Although further studies, including the elucidation of the three-dimensional structure of the WBV Mpro, are needed to unequivocally confirm the role of the Asp residue in catalysis, the data provide strong evidence to propose that Asp-3518 is the third member of a Ser-His-Asp catalytic triad. With respect to the catalytic triad, the WBV Mpro is most similar to the arterivirus nsp4, which also uses a Ser-His-Asp triad, whereas the more closely related toroviruses appear to use an Mpro with a catalytic triad in which a Glu residue assumes the role of the third catalytic residue (7,39,41,46). In marked contrast to the Ser-His-Asp/Glu catalytic triad used by arteri-, toro-, and bafinivirus Mpros, the Mpros of coronaviruses and roniviruses rely on a Cys-His catalytic dyad (3,4,52,54,60,61).
The data suggest a critical involvement of His-3603 and Thr-3584 in mediating specificity for Gln at the P1 position of WBV Mpro cleavage sites as revealed in the present study (Fig. 4B, C, and D). Equivalent residues in other Gln/Glu-specific CHLpros were previously shown to be located in the S1 subsite, where their side chains form hydrogen bonds to the carboxamide/carboxylate moiety of the P1-Gln/Glu residue (31,32,41,42). Our data strongly support the proposed critical role of His-3603 in substrate recognition and are in agreement with mutational analyses reported earlier for other 3C and 3C-like proteases (25,27,41). In the case of the T3584A mutant, proteolytic processing was not completely abolished. Although processing defects were readily detectable for the N2 and C2 sites (Fig. 4C and D and Fig. 5), cleavage at other sites was not evidently affected. Similar differential defects caused by substitutions of the two S1 subsite residues were previously seen in other Gln/Glu-specific CHLpros. Thus, whereas substitution of the equivalent His in EAV nsp4, pp1a/pp1ab His-1198, abolished autoproteolytic processing (see above), substitution of the corresponding Thr, Thr-1179, caused less profound and varying functional defects, depending on the nature of the substituting residue and the processing site (41). Similarly, poliovirus 3Cpro Thr-142 mutants displayed partial processing defects at specific sites, whereas His-161 mutants proved to be completely inactive (8,27).
In the course of the present study, we identified a total of four (two N-terminal and two C-terminal) Mpro autoprocessing sites through N-terminal sequencing of relevant cleavage products. The use of three of these sites [two N-terminal and one C-terminal site(s)] during virus replication could be confirmed by studying Mpro expression in WBV-infected cells (Fig. 3), providing support for the biological relevance of the processing data obtained in bacterial expression systems. All cleavage sites identified in the present study conformed to the consensus Q↓(A,T), thus tentatively identifying the substrate specificity of the WBV Mpro. The vast majority of viral 3C and 3C-like proteases share a specificity for Gln or Glu at the P1 position and, apparently, the WBV Mpro is no exception to this general rule. Only a few exceptions have been described, most notably the Mpro of Human coronavirus HKU-1 and the NIa protease of Sweet potato mild mottle virus (family Potyviridae), which are able to cleave after His (1,57). To identify and determine the role of other residues possibly contributing to WBV Mpro substrate specificity, further studies are required, including, for example, the experimental identification of other WBV Mpro cleavage sites in the viral polyproteins, site-directed mutagenesis studies, and protease assays using libraries of peptides with single substitutions of residues flanking the scissile bond on either side.
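As an illustration of how the tentative Q↓(A,T) consensus might be used to nominate candidate cleavage sites for experimental follow-up, the minimal Python sketch below scans a protein sequence for Gln residues followed by Ala or Thr. The function name, the numbering argument, and the toy sequence are hypothetical and chosen only for this example; the toy sequence is not WBV pp1a/pp1ab. Because the consensus rests on only four sites and ignores positions other than P1 and P1', a match is at most a candidate site and is not evidence of cleavage.

```python
import re

# Gln at P1 followed by Ala or Thr at P1'. The lookahead leaves the P1'
# residue unconsumed, so adjacent or overlapping candidates are all reported.
CONSENSUS = re.compile(r"Q(?=[AT])")


def find_candidate_sites(sequence: str, first_residue_number: int = 1):
    """Return (P1 residue number, P1, P1') for each Q|(A,T) match.

    first_residue_number lets a fragment be numbered according to its
    position in a longer polyprotein (an assumption of this sketch).
    """
    sites = []
    for match in CONSENSUS.finditer(sequence.upper()):
        p1_index = match.start()
        sites.append((
            first_residue_number + p1_index,
            sequence[p1_index].upper(),
            sequence[p1_index + 1].upper(),
        ))
    return sites


# Toy sequence for demonstration only.
toy = "MKQAGGQTLLQG"
for number, p1, p1_prime in find_candidate_sites(toy):
    print(f"candidate site: {p1}{number}|{p1_prime}")
```

The final Gln of the toy sequence is followed by Gly and is therefore not reported, which mirrors the restriction of the consensus to Ala or Thr at the P1' position.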
To our knowledge, the use of alternative N- and C-terminal Mpro autoprocessing sites has no precedent in other RNA virus systems, where proteolytic processing is generally thought to occur in a tightly controlled and spatially and temporally regulated manner. We therefore sought to further corroborate our biochemical data and studied Mpro autoprocessing during viral replication. The data we obtained suggest that proteolytic cleavage had occurred at (at least) three of the previously identified sites during viral replication in EPC cells, giving rise to two major forms of Mpro that differed in their N terminus (Fig. 3B). The data are consistent with information derived from our heterologous expression experiments and support the biological significance of the two WBV Mpro isoforms identified here.
Processing at both N-terminal Mpro processing sites was observed upon heterologous expression of the MBP-pp1a-3424-3725-GST fusion protein and in WBV-infected cells. In both experiments, processing could be the result of cleavage in cis or trans or a combination of both reaction modes. Heterologously expressed Mpro-CHis is able to efficiently process the N1 site in trans-cleavage assays (Fig. 2 and Fig. 5B and C). No processing of the N2 site could be detected in trans-cleavage assays (Fig. 2 and data not shown), although we cannot fully exclude that processing occurred at a low level below the detection limit of our assay. Thus, it seems reasonable to suggest that processing at the N2 site primarily occurs in cis, although further experiments will be required to rigorously establish the processing mode at either site. At present, based on the limited Mpro processing data obtained in E. coli and WBV-infected cells, it is not possible to answer the question of whether cleavage at the two N-terminal sites occurs alternatively or consecutively. To date, we have been unable to separate N2-C2 from N1-C2 and determine their respective activities independently in vitro.
Both C-terminal Mpro processing sites were processed very efficiently in the MBP-pp1a-3424-3725-GST fusion protein expressed in E. coli (Fig. 1E and F and Fig. 4C and D). Notably, efficient processing at the C1 site in addition to efficient processing at the C2 site could be confirmed for the MBP-pp1a-3424-3725-GST wt protein in the immunoblot experiments (Fig. 4C), excluding the possibility that cleavage at the C1 site originally observed in the MBP-pp1a-3424-3725_Q3716A-GST mutant protein was due to (or even dependent on) the Gln-to-Ala substitution at the C2 site. However, in reactions that allow cleavage only in trans, the C2 site appears to be cleaved slightly less efficiently than the C1 site (Fig. 2). On the basis of the available data, different scenarios for C-terminal processing of the WBV Mpro domain are conceivable. Thus, cleavage might occur (I) first at the C1 site and subsequently at the C2 site by trans cleavage of the released C-terminal fragment, (II) first at the C2 site and subsequently within the Mpro-containing processing intermediate at the C1 site (in cis or trans), or (III) with no pronounced preference for either of the two sites. Although none of these possibilities can be formally excluded on the basis of the available data, we currently favor a model in which, with regard to C-terminal processing, the C1 site is cleaved first. This is supported by the following observations. First, no Mpro-containing processing intermediates carrying a C2 terminus could be detected for either the bacterially expressed MBP-pp1a-3424-3725-GST fusion protein or in WBV-infected cells. Second, Mpro-CHis-mediated processing of the MBP-pp1a-3424-3725_S3589A-GST fusion protein in trans primarily led to processing at the C1 site, whereas processing at the C2 site was only observed after extended incubation. Third, in E. coli TB1 cells expressing the T3584A substitution mutant, processing occurred almost exclusively at the C1 site (Fig. 4), further supporting the idea that the C1 site is cleaved more efficiently than the C2 site. However, since the GST-containing, C1-cleaved product was produced by a mutant protease (T3584A) carrying a substitution of a presumed substrate-binding residue, we cannot formally exclude the possibility that inefficient cleavage at the C2 site was caused by subtle changes in substrate specificity rather than a generally reduced activity.
One of the questions that remain to be answered is whether or not processing at the C2 site occurs during virus replication. While the detection of two stable forms of Mpro, N1-C1 and N2-C1, in WBV-infected cells is in perfect agreement with the data obtained for the bacterially expressed MBP-pp1a-3424-3725-GST fusion protein (where efficient processing had also occurred at the C2 site), the available data do not allow us to confirm or exclude that the C2 processing site in pp1a/pp1ab is indeed processed in vivo.
As outlined above, the data suggest that two forms of Mpro are produced in WBV-infected cells. To our knowledge, this is the first report of alternative autoprocessing of a nidovirus protease domain. Previously, the use of two alternative Mpro cleavage sites has been reported only for the nsp7/8 junction in Mouse hepatitis virus A59 (MHV-A59). Both of these sites appear to be processed during MHV-A59 replication, but their individual significance remains elusive (11). Our in vitro assays were necessarily based on a mixture of both Mpro forms, and this mixture showed clear trans-processing activity toward both protein and peptide substrates. For Mpros of coronaviruses, it is known that the addition or deletion of amino acid residues at the N terminus severely decreases activity by disrupting critical interactions between the two protomers in the coronavirus Mpro dimer, with the N terminus of one protomer contributing to the formation of a functional S1 site in the other protomer (49,51,52,54). It is therefore conceivable that processing of the WBV Mpro N2 site affects the enzyme's activity and/or specificity, a possibility that remains to be explored in future studies. In contrast to the coronavirus homologs, the Mpros of the arteriviruses EAV and Porcine reproductive and respiratory syndrome virus (PRRSV) were reported to be monomeric, both in solution and in crystals of these proteins. Also, in the crystal structures reported for these enzymes, the N termini were found to be located well away from the active site (7,46). The multimerization state of the Mpro of WBV and the more closely related toroviruses and their structures remain to be determined but, clearly, the unique observation of two forms of Mpro being produced during virus replication warrants further studies, including the use of reverse-genetic approaches, to establish possible roles for the two forms of Mpro in the viral life cycle.
In conclusion, the study provides the first detailed characterization of a replicase gene-encoded protein from WBV, the prototype virus of the newly established genus Bafinivirus. Although the data obtained here are generally consistent with WBV being a virus most closely related to toroviruses, they confirm our previous notion that WBV and yet-to-be-identified bafiniviruses diverged profoundly from other nidoviruses, including toroviruses and extending to the most conserved viral enzymes. Our study reveals a remarkable functional and structural variability of nidovirus main proteases, whose functional implications remain to be investigated in future studies.