Journal of Virology, February 2005, p. 1595-1604, Vol. 79, No. 3
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.3.1595-1604.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Laboratory of Clinical and Epidemiological Virology, Department of Microbiology and Immunology, Rega Institute for Medical Research, University of Leuven, Leuven, Belgium
Received 14 June 2004/ Accepted 16 September 2004
Before the 2002-to-2003 severe acute respiratory syndrome (SARS) epidemic, coronaviruses were somewhat neglected in human medicine, but they have always been of considerable importance in animal health. Coronaviruses infect a variety of livestock, poultry, and companion animals, in which they can cause serious and often fatal respiratory, enteric, cardiovascular, and neurologic diseases (25). Most of our understanding of the molecular pathogenic properties of coronaviruses has come from the veterinary virology community.
The coronaviruses are classified into three groups based on genetic and serological relationships (19). Group 1 contains the porcine epidemic diarrhea virus (PEDV), porcine transmissible gastroenteritis virus (TGEV), canine coronavirus (CCoV), feline infectious peritonitis virus (FIPV), human coronavirus 229E (HCoV-229E), and the recently identified human coronavirus NL63 (HCoV-NL63). Group 2 contains the murine hepatitis virus (MHV), bovine coronavirus (BCoV), human coronavirus OC43 (HCoV-OC43), rat sialodacryoadenitis virus (SDAV), porcine hemagglutinating encephalomyelitis virus (PHEV), canine respiratory coronavirus (CRCoV), and equine coronavirus (ECoV). Group 3 contains the avian infectious bronchitis virus (IBV) and turkey coronavirus (TCoV). The SARS coronavirus (SARS-CoV) is not assigned to any of these groups but is most closely related to group 2 coronaviruses (21, 54).
HCoV-OC43 (ICTVdb code 19.0.1.0.006) and HCoV-229E (ICTVdb code 19.0.1.0.005) were isolated in 1967 from volunteers at the Common Cold Unit in Salisbury, United Kingdom. HCoV-OC43 was initially propagated on ciliated human embryonic tracheal and nasal organ cultures (42). HCoV-OC43 and HCoV-229E are responsible for 10 to 30% of all common colds, and infections occur mainly during the winter and early spring (38). The incubation period is 2 to 4 days. During the 2002-to-2003 winter season, a new human coronavirus, HCoV-NL63, was isolated from a 7-month-old child suffering from bronchiolitis and conjunctivitis in The Netherlands (61). Seven additional HCoV-NL63-infected individuals, both infants and adults, were identified, indicating that HCoV-NL63 can be considered an important new etiologic agent in respiratory tract infections. Coronaviruses infect all age groups, and reinfections are common. The infection can be subclinical and is usually mild, but there have been reports of more-severe lower respiratory tract involvement in infants and elderly people (17, 60). Human coronaviruses can induce a demyelinating disease in rodents and can infect primary cultures of human astrocytes and microglia. A possible etiological role for HCoV-OC43 and HCoV-229E in multiple sclerosis is being debated (4, 13, 15).
The coronavirus genomes are the largest of the known RNA viruses (27 to 31.5 kb) and are polycistronic, generating a nested set of subgenomic RNAs with common 5' and 3' sequences (35). The 5' two-thirds of the genome consists of two large replicase open reading frames (ORFs), ORF1a and ORF1b. The ORF1a polyprotein (pp1a) can be extended with ORF1b-encoded sequences via a -1 ribosomal frameshift at a conserved slippery site (6), generating the >7,000-amino-acid polyprotein pp1ab, which includes the putative RNA-dependent RNA polymerase (RdRp) and RNA helicase (HEL) activity (20, 39). The polyproteins pp1a and pp1ab are autocatalytically processed by two or three different viral proteases encoded by ORF1a: one or two papain-like proteases (PLP1 and PLP2) and a 3C-like protease (3CLpro) (39, 67, 68). Other putative domains presumably associated with a 3'-to-5' exonuclease (ExoN) activity, a poly(U)-specific endo-RNase (XendoU) activity, and a 2'-O-methyltransferase (2'-O-MT) activity are predicted in pp1ab (27, 54). The 3' end of a coronavirus genome includes several structural and accessory protein genes: an envelope-associated hemagglutinin esterase (HE) glycoprotein gene, present only in group 2 coronaviruses; a spike (S) glycoprotein gene; an envelope (E) protein gene; a matrix (M) glycoprotein gene; a nucleocapsid (N) phosphoprotein gene; and several ORFs that encode putative nonstructural (ns) proteins (35).
Coronaviruses are well equipped to adapt rapidly to changing ecological niches by the high mutation rate of their RNA genome (about 10⁻⁴ nucleotide substitutions/site/year) and high recombination frequencies (51). Many animal coronaviruses cause long-term or persistent enzootic infections. Long periods of coronavirus infection combined with a high mutation and recombination rate increase the probability that a virus mutant with an extended host range might arise.
The current emergence of the SARS-CoV is an example of a crossing of the animal-human species barrier. It is likely that the SARS-CoV was enzootic in an unknown animal or bird species before suddenly emerging as a virulent virus for humans. Chinese scientists found that six masked palm civets (Paguma larvata) and a raccoon dog (Nyctereutes procyonoides) for sale in an exotic food market in Shenzhen, in the Guangdong province in Southern China, were harboring a virus very similar to the SARS-CoV (1). Thirteen percent of the civet merchants tested at markets in Guangdong also had SARS antibodies. Sequence analysis showed that the animal version of the SARS-CoV contained an extra stretch of 29 bases (22). It is still not clear whether the civets were a reservoir for the virus or were infected by another species.
HCoV-OC43 and BCoV (ICTVdb code 03.019.0.01.002) show remarkable antigenic and genetic similarities (23, 29, 36, 44, 52, 63, 65). They both have hemagglutinating activity by attaching to the N-acetyl-9-O-acetylneuraminic acid moiety on red blood cells (33). BCoV causes severe diarrhea in newborn calves. The complete nucleotide sequences of different BCoV strains are known, but only fragments of the HCoV-OC43 genome had been determined previously. In this paper, we report the complete HCoV-OC43 sequence (30,738 bases) and the comparative characterization and evolutionary relationship of the BCoV-HCoV-OC43 pair. This is the first animal-human zoonotic pair of coronaviruses that can be analyzed in order to gain insights into the processes of adaptation of a nonhuman coronavirus to a human host.
Sequencing of the HCoV-OC43 genome. To determine the HCoV-OC43 genomic sequence, a set of overlapping RT-PCR products (average size, 1.5 kb) encompassing the entire genome was generated. For both RT-PCR and sequencing, oligonucleotide primers were designed in regions that were conserved between the BCoV and MHV genomes. The forward PCR primer in the 5'-terminal sequence (OC43F1 [5'-GATTGTGAGCGATTTGC-3']) was based on the HCoV-OC43 5' untranslated region partial sequence (H. Y. Wu, J. S. Guy, D. Yoo, R. Vlasak, and D. A. Brian, unpublished data; GenBank accession number AF523847). To generate RT-PCR products containing the exact 3'-terminal sequence, we used oligonucleotide OC43R74 (5'-TTTTTTTTTTGTGATTCTTCCA-3') based on the conserved 3'-end sequence of all known group 2 coronaviruses. By using 150 sequencing primers, sequencing in both directions was performed on an ABI Prism 3100 genetic analyzer (Perkin-Elmer Applied Biosystems) using the BigDye terminator cycle sequencing kit (version 3.1). Chromatogram sequencing files were inspected with Chromas 2.2 (Technelysium, Helensvale, Australia), and contigs were prepared by using SeqMan II (DNASTAR, Madison, Wis.).
DNA and protein sequence analyses. ORF analysis was performed by using the NCBI ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Potential 3C-like protease cleavage sites were identified by using the NetCorona 1.0 server (30). DNA and protein similarity searches were performed using the NCBI WWW-BLAST (basic local alignment search tool) server on the GenBank DNA database, release 118.0 (2). Pairwise nucleotide and protein sequence alignments were performed by using FASTA algorithms in the ALIGN program on the GENESTREAM network server (http://www2.igh.cnrs.fr) at the Institut de Génétique Humaine in Montpellier, France (47). Maizel-Lenk dot matrix plots were calculated using the pairwise FLAG 1.0 (fast local alignment for gigabases) algorithm at the server of the Biomedical Engineering Center of the Industrial Technology Research Institute in Hsinchu City, Taiwan (http://bioinformatics.itri.org.tw/prflag/prflag.php). Multiple sequence alignments were prepared by using CLUSTALW (58) and CLUSTALX, version 1.82 (59) and were manually edited in GENEDOC (46). Phylogenetic analyses were conducted by using MEGA, version 2.1 (34).
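The idea behind a filtered dot matrix plot can be sketched in a few lines of Python. This is only an illustration of the general window-and-stringency approach, not the FLAG 1.0 algorithm used for the plots in this study; the function names `dot_matrix` and `render`, the parameters `window` and `min_matches`, and the toy sequences are all assumptions introduced for the example.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) pairs where the windows starting at seq_a[i] and
    seq_b[j] share at least min_matches identical positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues at the same offset within both windows.
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


def render(dots, len_a, len_b, window):
    """Draw the matrix as text: '*' for a dot, '.' for no match."""
    dot_set = set(dots)
    rows = []
    for i in range(len_a - window + 1):
        rows.append(
            "".join(
                "*" if (i, j) in dot_set else "."
                for j in range(len_b - window + 1)
            )
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any coronavirus genome.
    a = "ACGTAC"
    b = "TTACGT"
    found = dot_matrix(a, b, window=3, min_matches=3)
    print(render(found, len(a), len(b), window=3))
```

Two closely related genomes produce a nearly unbroken diagonal in such a plot, whereas insertions and deletions shift the diagonal and divergent regions leave gaps. The quadratic loop is adequate for short sequences only; genome-scale comparisons need an indexed method.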
Evolutionary rate analyses and timing of the most recent common ancestor. The relationship between isolation date and genetic divergence was investigated using a linear regression, based on a maximum-likelihood tree, as implemented in the Path-O-Gen software, kindly provided by Andrew Rambaut (University of Oxford, Oxford, United Kingdom). Evolutionary rates and divergence times were estimated by using maximum likelihood in the TipDate software package, version 1.2 (49), and Bayesian inference in BEAST, version 1.03 (kindly made available by A. J. Drummond and A. Rambaut, University of Oxford; http://evolve.zoo.ox.ac.uk/beast/). The molecular clock hypothesis was tested by using the likelihood ratio test.
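The root-to-tip regression described above can be illustrated with an ordinary least-squares fit. The sketch below is not the Path-O-Gen implementation; the function name `root_to_tip_regression` and the input values are hypothetical, and the synthetic data are constructed so that the expected answer is known in advance. The slope of the fitted line estimates the evolutionary rate, and the point where the line crosses the time axis estimates the date of the most recent common ancestor.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, tmrca): the slope in substitutions/site/year and the
    x-intercept, i.e. the date at which the fitted divergence is zero.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    tmrca = -intercept / rate
    return rate, tmrca


if __name__ == "__main__":
    # Synthetic example: tips sampled in these years, with distances generated
    # from an assumed rate and root date (illustrative values, not the data
    # set analyzed in this study).
    assumed_rate = 4e-4
    assumed_root = 1890.0
    years = [1967.0, 1972.0, 1985.0, 1998.0]
    dists = [assumed_rate * (y - assumed_root) for y in years]
    rate, tmrca = root_to_tip_regression(years, dists)
    print(f"rate = {rate:.2e} per site per year, TMRCA = {tmrca:.1f}")
```

With real sequences the points scatter around the line, and because tips share branches they are not independent observations, which is one reason the regression is used as an exploratory check alongside the maximum-likelihood and Bayesian estimates.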
Nucleotide sequence accession number. The nucleotide sequence data reported in this paper were deposited in GenBank under accession number AY391777 by using the National Center for Biotechnology Information (NCBI; Bethesda, Md.) BankIt v3.0 submission tool (http://www3.ncbi.nlm.nih.gov/BankIt/).
ORF organization of HCoV-OC43. The HCoV-OC43 genome contains 11 major ORFs flanked by 5' and 3' untranslated regions of 211 and 288 nucleotides, respectively. A linear representation of the major ORFs of HCoV-OC43, other group 2 coronaviruses, and SARS-CoV is given in Fig. 1. Table 1 shows a comparison of the positions of the major ORFs of HCoV-OC43 and BCoV strain Mebus.
FIG. 1. Linear representation of the ORFs of the group 2 coronaviruses and SARS-CoV. Nucleotide insertions (open arrowheads) and deletions (solid arrowheads) in the HCoV-OC43 genome compared to BCoV are shown.
TABLE 1. Positions of the major ORFs of HCoV-OC43 (ATCC VR759) and BCoV (Mebus strain)
FIG. 2. Overview of the putative domain organization and potential proteolytic cleavage sites of the HCoV-OC43 replicase polyprotein pp1ab. Cleavage sites that are predicted to be processed by 3C-like protease are indicated by black arrowheads, while potential papain-like protease cleavage sites are indicated by white arrowheads. The following predicted domains are shown: papain-like proteases 1 and 2 (PLP1 and PLP2), X domain (X), putative transmembrane domains 1, 2, and 3 (TM1, TM2, and TM3), 3C-like protease (3CL), growth factor-like domain (GFL), RdRp, metal ion-binding domain (MB), HEL, ATPase, putative 3'-to-5' exonuclease (ExoN), putative poly(U)-specific endo-RNase (XendoU), and a putative S-adenosylmethionine-dependent ribose 2'-O-methyltransferase (MT).
HCoV-OC43 sequence similarity to other group 2 coronaviruses. The sequence similarity among HCoV-OC43, BCoV, CRCoV, PHEV, ECoV, MHV, and SDAV was investigated by pairwise alignments of the corresponding ORFs and their proteins (Table 2). HCoV-OC43 showed the highest percentage of similarity to BCoV in all ORFs except for the HCoV-OC43 E gene, which showed 99.6% identity on the nucleotide level and 98.8% identity on the protein level to the PHEV E gene. Maizel-Lenk dot matrix plots illustrate the similarity between HCoV-OC43 and BCoV (Fig. 3).
TABLE 2. Nucleotide and amino acid similarities of the major HCoV-OC43 (ATCC VR759) ORFs with the ORFs of BCoV, CRCoV, PHEV, ECoV, MHV, and SDAV
FIG. 3. Maizel-Lenk dot matrix plots: the complete genome sequence of HCoV-OC43 is compared to the complete genome sequences of BCoV, MHV, SARS-CoV, HCoV-229E, IBV, and TGEV, respectively. Sequence identities are indicated by a dot.
FIG. 4. Phylogenetic analysis of the coronavirus ORF1b replicase amino acid sequences. The HCoV-OC43 ORF1b protein (GenBank accession number AY391777) was compared to other coronaviruses and to an equine torovirus as an outgroup. Group 1, HCoV-229E (accession number AF304460), HCoV-NL63 (AY567487), PEDV strain CV777 (AF353511), and TGEV strain Purdue (AJ271965). Group 2, BCoV strain Mebus (U00735), MHV type 2 (MHV-2; AF201929), MHV strain Penn 97-1 (AF208066), and MHV-A59 (X51939). Group 3, IBV strain Beaudette (M95169), IBV strain LX4 (AY338732), IBV strain BJ (AY319651). SARS-CoV strain Frankfurt-1 (AY291315) is not classified in any of these groups but is most closely related to group 2 coronaviruses. Outgroup, equine Berne torovirus (EToV; X52374). Regions that were poorly conserved in the manually edited multiple protein sequence alignment were deleted from the alignment. All columns containing gaps were removed. The resulting alignment included 2,083 characters (1,122 being parsimony informative) and contained the meld of the following HCoV-OC43 fragments: 13686-13721, 13737-13793, 13797-13820, 13857-13889, 13869-13994, 14013-14090, 14127-14174, 14247-14390, 14397-14594, 14598-14756, 14766-14855, 14859-15230, 15243-15443, 15480-15674, 15684-15719, 15729-15764, 15786-15854, 15864-15989, 16023-16358, 16374-16715, 16719-16898, 16902-17093, 17115-17258, 17268-17336, 17340-17363, 17379-17501, 17535-17561, 17568-17825, 17925-18008, 18018-18032, 18069-18101, 18207-18440, 18450-18527, 18531-18563, 18576-18602, 18612-18929, 18942-19010, 19026-19139, 19143-19259, 19284-19466, 19479-19625, 19686-19793, 20289-20318, 20370-20603, 20625-20708, 20718-20762, 20769-20885, 20907-21008, 21045-21125, 21135-21233, 21252-21296, 21309-21431, and 21438-21476. The frequencies of occurrence of particular bifurcations (percentage of 10,000 bootstrap replicate calculations) are indicated at the nodes.
TABLE 3. Date and area of isolation of bovine and human coronaviruses used to calculate TMRCA
FIG. 5. Maximum-likelihood phylogenetic tree of spike gene nucleotide sequences of HCoV-OC43 and several BCoV strains for which the date of isolation was known.
FIG. 6. Results of the evolutionary rate analysis. Line a, linear regression of root-to-tip divergence (y axis) versus sampling time (x axis). The point at which the regression line crosses the time axis indicates the TMRCA (1891). Line b, maximum-likelihood estimate (1873) with 95% confidence intervals (1815 to 1899) for the TMRCA. Curve c, marginal posterior probability (right y axis) for the TMRCA obtained by using the Bayesian coalescent approach. The vertical bars in the distribution represent the 95% highest posterior density interval. Dates of isolation of HCoV-OC43 and BCoV strains are indicated by grey dots.
TABLE 4. Evolutionary rate estimations of the BCoV-HCoV-OC43 pair
The prototype HCoV-OC43 strain (ATCC VR759) is a laboratory strain that, since its isolation in 1967, has been passaged 7 times in human embryonic tracheal organ culture, followed by 15 passages in suckling mouse brain cells and an unknown number of passages in human rectal tumor HRT-18 cells and/or Vero cells. During the passage history, it is likely that a number of mutations have accumulated. It would be interesting to analyze the complete nucleotide sequence of contemporary HCoV-OC43 strains that are free from in vitro expansion mutations.
Nucleotide and amino acid similarity percentages were determined for the major HCoV-OC43 ORFs and those of other group 2 coronaviruses (BCoV, CRCoV, PHEV, ECoV, MHV, and SDAV). For all HCoV-OC43 ORFs, the highest similarity demonstrated was that to the corresponding BCoV ORFs, except for the HCoV-OC43 E gene, which showed 99.6% identity on the nucleotide level and 98.8% identity on the amino acid level with the PHEV E gene. Based on the high similarity between HCoV-OC43 and PHEV in E, and between HCoV-OC43 and BCoV in all the other major ORFs, some hypotheses concerning the origin of HCoV-OC43 can be put forward. Adaptation of BCoV to a human host and a recombination event between BCoV and PHEV leading to a new type of coronavirus with a different species specificity could both have been responsible for the emergence of a new human coronavirus.
Phylogenetic analysis of coronavirus ORF1b replicase protein sequences confirms the presence of three coronavirus group clusters and a separate branch for SARS-CoV, which seems to be most closely related to group 2 coronaviruses (21, 54). HCoV-OC43 and BCoV cluster together, demonstrating the close relationship between the two viruses. There is in fact more divergence between the different MHV strains or between the different IBV strains than between HCoV-OC43 and BCoV. The close relationship between HCoV-OC43 and BCoV on the genetic level has also been shown to correspond to a close antigenic relationship: by using monoclonal antibodies directed against the BCoV S protein, common antigenic determinants for BCoV, HCoV-OC43, and PHEV have been demonstrated (62, 63). A phylogenetic tree was also constructed for the spike gene of HCoV-OC43 and several BCoV isolates for which the date of isolation could be traced. Different molecular clock calculations situate the most recent common ancestor of HCoV-OC43 and the different BCoV isolates around 1890. We suggest that around 1890, BCoV might have jumped the species barrier and become able to infect humans, resulting in the emergence of a new type of human coronavirus (HCoV-OC43), a scenario similar to the origin of the SARS outbreak. Indisputable evidence for the bovine-to-human direction of the interspecies transmission event, instead of a human-to-bovine direction, is not available. However, we consider the occurrence of a 290-nucleotide deletion (corresponding to the absence of BCoV ns4.9 and ns4.8) in HCoV-OC43 relative to the BCoV genome to be a potential supporting argument, as this additional sequence fragment in BCoV is also present in MHV and SDAV. Consequently, we assume that a deletion from BCoV to HCoV-OC43 rather than an insertion in the opposite direction took place during evolution, and thus, we hypothesize that the interspecies transmission event occurred from bovines to humans.
Nevertheless, it is possible that two other group 2 coronaviruses, CRCoV and PHEV, might have played a role in the emergence of HCoV-OC43. CRCoV appears to be very closely related to BCoV and HCoV-OC43 (16), and for the HCoV-OC43 E gene, the highest percentage of similarity was found with the PHEV E gene, suggesting a possible recombination event. To elucidate the evolutionary relationship of HCoV-OC43 and BCoV with CRCoV and PHEV, complete genome sequence data of CRCoV and PHEV would be required. Molecular dating has frequently been used to investigate the origin of viral epidemics (31, 40, 48). The reliability of such an analysis is dependent on the validity of the molecular clock hypothesis, which assumes that the evolutionary rate is roughly constant in the lineages of a phylogenetic tree. Although this assumption is frequently violated for viral sequence data (28), a molecular clock test indicated that this hypothesis could not be rejected for the coronavirus data set investigated here.
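The logic of a likelihood ratio test of the molecular clock can be sketched as follows. This is a generic illustration, not the TipDate procedure: the function names `chi2_sf` and `clock_lrt` and the log-likelihood values are hypothetical, and the number of degrees of freedom is left as a parameter because it depends on the two models being compared (for the classical test with n contemporaneous taxa it is commonly taken as n - 2, which is stated here as an assumption). The test statistic is twice the difference in log-likelihood between the unconstrained model and the clock model, and the clock is not rejected when the resulting P value exceeds the chosen significance level.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)  # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def clock_lrt(lnl_clock, lnl_free, df):
    """Return (statistic, p_value) for clock versus unconstrained model."""
    stat = 2.0 * (lnl_free - lnl_clock)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods and degrees of freedom for illustration.
    stat, p = clock_lrt(lnl_clock=-1005.0, lnl_free=-1000.0, df=8)
    verdict = "not rejected" if p > 0.05 else "rejected"
    print(f"2*delta lnL = {stat:.2f}, P = {p:.3f}, clock {verdict}")
```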
In the second half of the nineteenth century, a highly infectious respiratory disease with a high mortality rate affected cattle herds around the world (11, 41). The same disease, or a similar disease, is now known as contagious bovine pleuropneumonia (CBPP) and is caused by Mycoplasma mycoides mycoides. In the nineteenth century, the clinical symptoms of CBPP would have been difficult to distinguish from those of BCoV pneumonia, and it can be hypothesized that the bovine respiratory disease in the second half of the nineteenth century might have been similar to the coronavirus-associated shipping fever disease (56). Most industrialized countries mounted massive culling operations in the period between 1870 and 1890 (11) and were able to eradicate the disease by the beginning of the twentieth century. During the slaughtering of CBPP-affected herds, there was ample opportunity for the culling personnel to come into contact with bovine respiratory secretions. These respiratory secretions could have contained BCoV, either as the causal agent or as a coinfecting agent.
Interestingly, around the period in which the BCoV interspecies transmission would probably have taken place, a human epidemic ascribed to influenza was spreading around the world. The 1889-1890 pandemic probably originated in Central Asia (3) and was characterized by malaise, fever, and pronounced central nervous system symptoms (53). A significant increase in case fatality with increasing age was observed. Absolute evidence that an influenza virus was the causative agent of this epidemic was never obtained, due to the lack of tissue samples from that period. However, postepidemic analysis in 1957 of the influenza antibody pattern in sera of people who were 50 to 100 years old indicated that H2N2 influenza antibodies might have originated from the 1889-1890 pandemic (45). It is nevertheless tempting to speculate about an alternative hypothesis: that the 1889-1890 pandemic may have been the result of interspecies transmission of bovine coronaviruses to humans, resulting in the subsequent emergence of HCoV-OC43. The dating of the most recent common ancestor of BCoV and HCoV-OC43 to around 1890 is one argument. Another argument is the fact that central nervous system symptoms were more pronounced during the 1889-1890 epidemic than in other influenza outbreaks. It has been shown that HCoV-OC43 has neurotropism and can be neuroinvasive (4).
Maximum-likelihood phylogenetic analysis of the spike gene of HCoV-OC43 and several BCoV strains for which the date of isolation is known indicates that these strains evolved according to a molecular clock. An evolutionary rate on the order of 4 × 10⁻⁴ nucleotide changes per site per year was estimated, and this rate was highly consistent across the different methods used. This rate falls within the range reported for other RNA viruses, including SARS-CoV (14, 50, 51).
This study provides evidence for viral promiscuity, a phenomenon that has already been reported for several animal coronaviruses, including BCoV, for which the potential to infect other species, including humans, has already been described (26, 66). The isolation of the SARS-CoV from masked palm civets and raccoon dogs indicates that this new type of coronavirus was also enzootic in an animal species before suddenly emerging as a virulent virus for humans. The characterization of the BCoV-HCoV-OC43 pair presented in this study provides insights into the process of adaptation of a nonhuman coronavirus to a human host, which is important for understanding the interspecies transmission events that led to the origin of the SARS outbreak.
This work was supported by a fellowship from the Flemish Fonds voor Wetenschappelijk Onderzoek (FWO) to L.V. and by FWO grant G.0288.01.