Journal of Virology, August 2006, p. 7481-7490, Vol. 80, No. 15
0022-538X/06/$08.00+0 doi:10.1128/JVI.00697-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
J. X. Zhang,2,
S. Y. Zhang,1
P. Wang,2
X. H. Fan,3
L. F. Li,2
G. Li,1
B. Q. Dong,4
W. Liu,4
C. L. Cheung,2
K. M. Xu,2
W. J. Song,2
D. Vijaykrishna,2
L. L. M. Poon,2
J. S. M. Peiris,2
G. J. D. Smith,2
H. Chen,2* and
Y. Guan2*
Institute of Zoology and Graduate School, Chinese Academy of Sciences, Beijing 100080,1 State Key Laboratory of Emerging Infectious Diseases, Department of Microbiology, The University of Hong Kong, Faculty of Medicine Building, 21 Sassoon Road, Pokfulam, Hong Kong SAR,2 Department of Microbiology and Immunology, Guangxi Medical University,3 Guangxi Center for Disease Control and Prevention, Nanning 530021, People's Republic of China4
Received 6 April 2006/ Accepted 9 May 2006
In March 2003, after the outbreak of SARS, a novel coronavirus (SARS-CoV) was identified as the etiologic agent responsible for human infection (20, 31). The identification of SARS-like coronavirus in Himalayan palm civets and raccoon dogs in live-animal markets in southern China suggested that SARS was a possible zoonosis (10). Further virological surveillance confirmed that the infectious source of SARS was from those live-animal markets and confirmed its zoonotic origin (11). However, subsequent studies suggested that those market animals were intermediate hosts rather than the natural reservoirs of SARS-CoV, as extensive surveillance studies did not detect the virus in either farmed or wild animals of the same species (17).
Recent studies have suggested horseshoe bats (Rhinolophus spp.) as possible natural reservoirs of SARS-like coronavirus (23, 25). However, genome sequence comparison of the spike (S) genes from bat SARS-like coronavirus and civet SARS-like coronavirus revealed only 64% genetic homology, suggesting that the evolutionary pathway of SARS-CoV remains to be fully described. Given the high biodiversity of bats, their significant population size, broad geographical distribution, and ability to migrate, together with the detection of many emerging viruses (1, 7), it is reasonable to consider that bats may contain the direct progenitor of SARS-CoV. Moreover, a growing number of novel coronaviruses have recently been identified, such as HCoV-NL63 (42) and HCoV-HKU1 (47) from humans and some avian infectious bronchitis virus (IBV)-like coronaviruses from different avian species (16, 26). These accumulated findings suggest that coronaviruses may have a much wider distribution in the animal kingdom than previously thought.
To explore the natural distribution of the virus in bat populations and also to understand the possible role of bats in coronavirus ecology, we conducted a virological surveillance study in China. Genetic analysis revealed that bat coronaviruses mainly clustered into three different groups: group 1, a group including all SARS and SARS-like coronaviruses from different hosts (putative group 4), and an independent bat coronavirus group (putative group 5). Further characterization of bat coronaviruses revealed high genetic diversity across a large geographic distribution and revealed that different species of bats maintain coronaviruses from different groups and that the same species of bat from different geographic locations can also contain the same type of coronavirus. Thus, the findings of this study suggest that bats may play an integral role in the ecology and evolution of coronaviruses.
FIG. 1. Map of China showing 15 provinces where coronavirus surveillance in bats was conducted. Numbers indicate number of sites positive over the total number of sites sampled in each province.
TABLE 1. Coronavirus distribution in different bat species and locations
As many coronaviruses have been recently identified in different animals from different regions, to avoid confusion the nomenclature of bat coronaviruses from this study is given in the following format: host, geographic location of sampling, sample number, and year, e.g., BtCoV/Rhinolophus ferrumequinum/Hubei/273/2004 (abbreviation, BtCoV/273/04).
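The naming format described above can be illustrated with a minimal Python sketch. The function and class names here are hypothetical helpers, not part of the study; the sketch only assumes the slash-separated layout and the two-digit year abbreviation shown in the example.

```python
from typing import NamedTuple


class BtCoVName(NamedTuple):
    """Fields of a full bat coronavirus name (hypothetical helper type)."""
    host: str
    location: str
    sample: str
    year: str


def parse_btcov_name(full_name: str) -> BtCoVName:
    """Split 'BtCoV/host/location/sample/year' into its four fields."""
    parts = full_name.split("/")
    if len(parts) != 5 or parts[0] != "BtCoV":
        raise ValueError(f"not a full BtCoV name: {full_name!r}")
    _, host, location, sample, year = parts
    return BtCoVName(host, location, sample, year)


def abbreviate(full_name: str) -> str:
    """Return the short form: prefix, sample number, two-digit year."""
    name = parse_btcov_name(full_name)
    return f"BtCoV/{name.sample}/{name.year[-2:]}"


print(abbreviate("BtCoV/Rhinolophus ferrumequinum/Hubei/273/2004"))
# prints: BtCoV/273/04
```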
Genome analysis. The nucleotide data obtained from diagnostic sequencing of the RdRp fragment were analyzed with available coronavirus sequences in GenBank and used to determine the diversity of the detected coronaviruses and to select representative strains for full genome sequencing. Four viruses in two new coronavirus lineages were selected for complete genome sequencing.
RNA extraction was done using the viral RNA kit from QIAGEN, and cDNA synthesis was conducted with random hexamer, gene-specific, and oligo(dT) primers. Degenerate primers for cDNA amplification and sequencing were designed from multiple alignments of GenBank sequence data using the program CODEHOP (35). Conventional PCR using Platinum Taq DNA high-fidelity polymerase (Invitrogen) and gene-specific primers was then used for filling gaps between the CODEHOP-amplified regions. Shotgun sequencing (38) with the Zero Blunt PCR cloning kit (Invitrogen) was conducted for large PCR fragments generated from specific primers between the CODEHOP-amplified regions (35). For regions that could not be amplified using CODEHOP, we used rapid amplification of cDNA ends (RACE) with second-generation 5'/3' RACE kits (Roche). Sequencing was performed by using the BigDye Terminator version 3.1 cycle sequencing kit on an ABI PRISM 3700 DNA analyzer (Applied Biosystems) following the manufacturer's instructions. All primer sequences are available upon request.
The open reading frames (ORFs) of each of the four complete genomes were identified and mapped using the program SeqBuilder (Lasergene version 6.1; DNAStar, Madison, WI) and confirmed using Z-Curve (8). Homology searches of identified ORFs against other known coronaviruses were conducted in the GenBank and Pfam databases (2). Protein precursors produced by ORF1ab were predicted using the program Z-Curve (8). Prediction of transmembrane (TM) domains was performed using TMpred (12).
Sequence similarity. Full-length amino acid alignments of each of the major gene products were used to calculate the similarity (p distances) within and between the different coronavirus groups, including putative groups 4 and 5, using MEGA3 (21). The virus sequences used in this analysis are the same as those in the phylogenetic trees.
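As a rough illustration of the p-distance measure mentioned above, the following Python sketch computes the proportion of differing sites between two aligned sequences and the corresponding percent similarity. It is not the MEGA3 implementation: the pairwise skipping of gapped columns is an assumption, and the toy sequences are invented for the example.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumption
    about gap handling, not taken from the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def similarity_percent(seq_a: str, seq_b: str) -> float:
    """Percent similarity, defined here as 100 * (1 - p distance)."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


# Toy aligned peptides (invented): 2 of 8 sites differ.
print(similarity_percent("MKTAYIAK", "MKSAYLAK"))  # prints: 75.0
```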
Phylogenetic studies. For the structural proteins, spike (S), membrane (M), envelope (E), and nucleocapsid (N), only full-length sequences were included in the analyses. For the replicase domains, the conserved sequence regions of the RdRp and helicase (HEL) were used. Multiple alignments of bat coronaviruses with other known coronaviruses were conducted with the programs TransAlign (3) and ClustalW (41) and manually optimized with Se-Al (33). Phylogenetic trees were constructed using the neighbor-joining criterion with the Jukes-Cantor model (JC69) in the programs MEGA3 (21) and PAUP* version 4.0b (40). Gaps were treated as missing data in all analyses.
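For readers unfamiliar with the Jukes-Cantor (JC69) model named above, its standard nucleotide distance correction is d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. A minimal Python sketch follows; the function name is hypothetical, and it shows only the distance correction, not the neighbor-joining step performed in MEGA3 or PAUP*.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) corrected distance for nucleotide sequences.

    p is the observed proportion of differing sites. The correction is
    undefined once p reaches 0.75 (the saturation point for four states).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# The corrected distance exceeds the raw proportion as divergence grows.
for p in (0.05, 0.30, 0.60):
    print(f"p = {p:.2f}  ->  d = {jc69_distance(p):.4f}")
```

The correction accounts for multiple substitutions at the same site, which is why the estimated distance rises faster than the observed proportion of differences.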
Recombination analysis. Sliding window analysis was used to detect recombination within the RdRp, S, and N genes. The same multiple alignments used for phylogenetic tree reconstruction, with the outgroup excluded, were analyzed using the difference of the sum of squares method (window size, 300, with steps of 100 amino acids) in the program Topali (29). The RDP method, as implemented in program RDP version 2 (28), was also used for recombination detection with the percentage of identity for recombinant sequences set from 0 to 100.
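The window layout used in a sliding window scan can be sketched as follows. This Python example only enumerates window coordinates for the stated window size (300) and step (100); it does not implement the difference of the sum of squares statistic, and how Topali treats the alignment ends is not described in the text, so dropping incomplete trailing windows is an assumption.

```python
from typing import Iterator, Tuple


def sliding_windows(
    alignment_length: int, window: int = 300, step: int = 100
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges along an alignment.

    Only full-length windows are produced; a trailing partial window is
    dropped (an assumption, not taken from the paper).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start + window <= alignment_length:
        yield (start, start + window)
        start += step


print(list(sliding_windows(600)))
# prints: [(0, 300), (100, 400), (200, 500), (300, 600)]
```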
Nucleotide sequence accession numbers. The sequences reported in this paper have been deposited in GenBank under accession numbers DQ648786 to DQ648797 and DQ648799 to DQ648858.
Two colonies of bats, from different sampling sites, had much higher positive rates than average. One Miniopterus schreibersi colony had a 55% (11/20) positive rate, while a Pipistrellus abramus colony had a 35% (11/31) positive rate. All positive samples were from anal swabs, and none from throat swabs, suggesting that the gastrointestinal tract is the principal replication site of coronavirus infection in those bats.
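The two colony rates quoted above follow directly from the counts. A short Python check of the arithmetic is given below; the helper name is hypothetical, and the reported figures are simply the rounded percentages.

```python
def positive_rate(positive: int, sampled: int) -> float:
    """Percentage of sampled animals that tested positive."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return 100.0 * positive / sampled


# Counts as reported for the two colonies.
print(f"Miniopterus schreibersi: {positive_rate(11, 20):.1f}%")  # prints: 55.0%
print(f"Pipistrellus abramus: {positive_rate(11, 31):.1f}%")     # prints: 35.5%
```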
There were also some species of bats that had high sample numbers, but in which all individuals were negative for coronavirus: 84 individuals of the genus Hipposideros (58 from Hipposideros armiger), 101 specimens of Rhinolophus pusillus, and 37 samples from two genera of the Pteropodidae.
To determine the overall diversity of coronaviruses that were isolated from bats, we performed a preliminary phylogenetic analysis of the RdRp fragment obtained from RT-PCR detection. This analysis revealed that all viruses characterized fell within the previously recognized coronavirus groups, including the SARS-CoV group. Of the 65 viruses, only three bat coronaviruses were closely related to SARS-CoV (putative group 4) and 40 clustered with group 1 viruses, while the remaining 22 viruses formed a separate group that is most closely related to group 2 viruses (putative group 5); however, there was no statistical support for this relationship (Fig. 2). None of the coronaviruses characterized in this study were phylogenetically related to group 3.
FIG. 2. Phylogenetic relationships of 64 coronaviruses isolated from bats in China. The tree was generated based on 440 nucleotides of the RNA-dependent RNA polymerase region by the neighbor-joining method in the MEGA program. Numbers above branches indicate neighbor-joining bootstrap values (percent) calculated from 1,000 bootstrap replicates. Terminal nodes containing bat coronaviruses isolated in this study are collapsed and represented by a blue triangle with the number of viruses indicated within. The tree was rooted to Breda virus (AY427798). Scale bar, 0.05 substitution per site. Red text indicates provinces from where viruses were isolated. Abbreviations: AH, Anhui; FJ, Fujian; GD, Guangdong; GX, Guangxi; HA, Hainan; HB, Hubei; HE, Henan; JX, Jiangxi; SC, Sichuan; SD, Shandong; YN, Yunnan.
These findings suggest that genetically divergent coronaviruses are commonly present in, and specific to, different species of bats in China.
Genome organization. Based on preliminary phylogenetic analysis of the RdRp gene (Fig. 2), four strains, representing the diversity of bat coronaviruses isolated in this study, were selected for full genome sequencing: BtCoV/Tylonycteris pachypus/Guangdong/133/2005 (BtCoV/133/05), BtCoV/Rhinolophus ferrumequinum/Hubei/273/2004 (BtCoV/273/04), BtCoV/R. macrotis/Hubei/279/2004 (BtCoV/279/04), and BtCoV/Scotophilus kuhlii/Hainan/512/2005 (BtCoV/512/05). An additional five viruses were selected for partial sequencing of the RdRp, HEL, and S genes: BtCoV/S. kuhlii/Hainan/515/2005 (BtCoV/515/05), BtCoV/S. kuhlii/Hainan/527/2005 (BtCoV/527/05), BtCoV/Pipistrellus pipistrellus/Hainan/434/05 (BtCoV/434/05), BtCoV/P. abramus/Sichuan/355/2005 (BtCoV/355/05), and BtCoV/Myotis ricketti/Yunnan/701/2005 (BtCoV/701/05). Sequences generated in this study were analyzed with all available coronavirus sequence data in public databases. Comparison of the genome organization of bat coronaviruses with that of representative strains of other coronavirus is presented in Fig. 3 and Table 2.
FIG. 3. Linear representation of the ORFs of the bat coronaviruses and representative known coronaviruses from each group. Conserved functional domains in ORF1a and ORF1b are indicated by yellow boxes. The following predicted domains are shown: papain-like proteases 1 and 2 (PL1 and PL2), 3C-like protease (3CL), RdRp, metal ion-binding domain (MB), and helicase (Hel). Putative ORFs are indicated by blue boxes and numbered according to their order in the genome: BtCoV/R. ferrumequinum/Hubei/273/04 (BtCoV/273/04), BtCoV/R. macrotis/Hubei/279/04 (BtCoV/279/04), BtCoV/T. pachypus/Guangdong/133/05 (BtCoV/133/05), BtCoV/S. kuhlii/Hainan/512/05 (BtCoV/512/05), SARS-CoV, PEDV, avian IBV, and human coronavirus OC43 (HCoV-OC43).
TABLE 2. Comparison of coronavirus genome structuresa
Putative ORFs coding for nonstructural proteins or accessory proteins were deduced and analyzed if transcription-regulating sequences (TRSs) were present close to, and upstream of, potential initiating methionine residues. The ORFs of nonstructural proteins vary significantly among different bat coronaviruses. The genome organization of BtCoV/273/04 and that of BtCoV/279/04 were essentially the same and were similar to that of SARS-CoV. The genome organization of BtCoV/512/05 is most similar to that of porcine epidemic diarrhea virus (PEDV), while the genome of BtCoV/133/05 is unlike that of all known coronaviruses (Fig. 3).
In the genome of all coronaviruses, approximately the first two-thirds of the genome is composed of the two large replicase ORFs ORF1a and ORF1b, which encode virus replicase polyproteins pp1a and pp1ab (14). Proteolytic processing end products and putative functional domains of the replicase polyproteins were identified. The nonstructural proteins nsp1 and nsp2 were the most variable among these bat coronaviruses, while papain-like protease (PL), 3C-like protease (3CL), RdRp, metal binding (MB), and HEL functional domains were conserved in all genomes, except that of BtCoV/133/05 (Fig. 3). Coronaviruses generally employ two papain-like proteases, PL1 and PL2, to process the N-proximal regions of the replicative polyproteins. PL1 and PL2 were identified in BtCoV/512/05; however, only one PL domain was identified in BtCoV/273/04 and BtCoV/279/04. It is noteworthy that in BtCoV/133/05 both nsp1 and nsp2 were highly divergent from other coronaviruses and that the PL domain could not be identified in any of the nonstructural proteins (Fig. 3; Table 2).
ORFs located between the S and E genes and between the M and N genes were predicted and are numbered according to their order in the genome (Fig. 3; Table 2). In viruses BtCoV/273/04, BtCoV/279/04, and BtCoV/512/05, there is a single ORF between the S and E genes (ORF3). In BtCoV/273/04 and BtCoV/279/04 ORF3 is predicted to encode a similar protein of 274 amino acids (aa) with two predicted TM helices in the N-terminal sequence. BLAST and Pfam searches failed to identify any sequences similar to this protein. In BtCoV/512/05 ORF3 encodes a predicted 224-aa protein also with two predicted TM domains in the N-terminal sequence.
The region between the S and E genes in BtCoV/133/05 is the longest among all known coronaviruses, at 2,013 bp (Fig. 3). Furthermore, in BtCoV/133/05 this region contains three predicted ORFs (ORF3a, ORF3b, and ORF3c), with predicted proteins of 91, 285, and 227 aa, respectively. Each of these ORFs has a conserved TRS upstream of the ORFs: UUAACGAACUU (9 nucleotides) AUG for ORF3a and UUAACGAACUU AUG for ORF3b and ORF3c. The ORF3c-encoded protein contains three TM domains, but no matching proteins could be identified.
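The TRS arrangement described above (the motif UUAACGAACUU followed either directly or after a short spacer by an AUG start codon) can be located with a simple pattern search. The Python sketch below is only illustrative: the function name is hypothetical, the 9-nucleotide maximum spacer is taken from the ORF3a example, the test sequences are invented, and the authors' actual search procedure is not described in the text.

```python
import re
from typing import List, Tuple

TRS = "UUAACGAACUU"  # conserved motif reported for ORF3a, ORF3b, and ORF3c


def find_trs_start_codons(rna: str, max_spacer: int = 9) -> List[Tuple[int, int, int]]:
    """Find TRS motifs followed by an AUG within max_spacer nucleotides.

    Returns (trs_start, spacer_length, aug_start) tuples using 0-based
    positions. DNA input is accepted and converted to RNA.
    """
    sequence = rna.upper().replace("T", "U")
    # Non-greedy spacer so the nearest downstream AUG is reported.
    pattern = re.compile(f"{TRS}([ACGU]{{0,{max_spacer}}}?)AUG")
    hits = []
    for match in pattern.finditer(sequence):
        hits.append((match.start(), len(match.group(1)), match.end() - 3))
    return hits


# Invented toy sequences: TRS directly before AUG, then with a 9-nt spacer.
print(find_trs_start_codons("GG" + TRS + "AUGCCC"))
# prints: [(2, 0, 13)]
print(find_trs_start_codons("A" + TRS + "CCCCCCCCC" + "AUGAAA"))
# prints: [(1, 9, 21)]
```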
In BtCoV/273/04 and BtCoV/279/04 the region between the M and N genes is a 1,085- and a 1,095-bp sequence, respectively, that contains three ORFs (ORF6, ORF7, and ORF8) of 63, 122, and 122 aa, respectively (Fig. 3). ORF7 is predicted to have two TM domains, in both the N- and C-terminal sequences, while for ORF8 one TM helix is predicted. BLAST and Pfam searches failed to identify sequences similar to any of the three predicted proteins. This region between the M and N genes is absent in BtCoV/133/05 and BtCoV/512/05 (Fig. 3). The sequence region between the M and N genes of BtCoV/273/04 and BtCoV/279/04 and other SARS-like CoVs showed a gene organization similar to that of IBV (22, 46). Analysis of this region in a representative IBV (NC_001451) revealed a much shorter region (692 bp) also with two ORFs (ORF6 and ORF7) predicted to encode proteins of 65 and 82 aa, respectively. However, unlike BtCoV/273/04 and BtCoV/279/04, in IBV no conserved TRSs were identified upstream of the three ORFs.
Downstream of the N gene in BtCoV/512/05, there is a 387-bp sequence (ORF10) that is predicted to encode a 129-aa protein with a putative signal peptide at the N-terminal region and three TM domains. This sequence region is absent in all known coronaviruses including BtCoV/133/05, BtCoV/273/04, and BtCoV/279/04 (Fig. 3). No matching protein was identified in GenBank or Pfam.
The hemagglutinin esterase protein, which is present in group 2 coronaviruses (6) and presumably obtained by horizontal gene transfer from influenza C virus (48), was not present in any of the bat coronaviruses analyzed in this study. In the 3' untranslated region a stem-loop II-like (s2m) motif (15) was recognized in BtCoV/273/04 and BtCoV/279/04 but not in BtCoV/133/05 and BtCoV/512/05 (Fig. 3). This motif is also present in group 3 coronaviruses and SARS-CoV but not in other coronaviruses (34, 37).
Sequence similarity. To understand the interrelationship between the BtCoVs and the other known coronaviruses, similarity analysis within and between groups was conducted (9). Analysis of the RdRp amino acid sequence showed that, within groups, the similarity ranged from 82 to 99%, while between different groups, including the putative groups 4 and 5 in the present study, the similarity range was 60 to 74% (Fig. 4A). In contrast, within-group similarities of the S protein were from 59 to 91% and between-group similarities were from 22 to 36% (Fig. 4B). Similar patterns were observed for the remaining major gene products: more-conserved genes usually had higher similarity between different groups, and less-conserved genes had lower similarity between groups (data not shown).
FIG. 4. Similarity histogram of RdRp (A) and spike (B) genes based on alignments from the program TransAlign.
FIG. 5. Phylogenetic relationships of the helicase (A) and spike (B) genes of representative coronaviruses isolated from bats in China. Trees were generated by the neighbor-joining method in the PAUP program. Numbers above branches indicate neighbor-joining bootstrap values (percent) calculated from 1,000 bootstrap replicates. Analyses were based on 1,833 nucleotides for the helicase gene and 3,510 nucleotides for the spike gene. The trees were rooted to Breda virus (AY427798). Scale bar, 0.1 substitution per site.
In all genes analyzed, except the S gene, group 1 bat coronaviruses are most closely related to PEDV (bootstrap support, 99%), and these viruses cluster with HCoV-NL63 and HCoV-229E (Fig. 5A). In the S gene tree, while group 1 bat coronaviruses still clustered together with PEDV, they were now most closely related to those coronaviruses from domestic animals (Fig. 5B). The relationship of group 1 bat coronaviruses to PEDV, transmissible gastroenteritis virus, and feline coronaviruses demonstrates that virus transmission may occur between bats, livestock, and companion animals, presenting a possible pathway for human infection.
None of the viruses sequenced in this study was the direct progenitor of SARS. It is noteworthy that within putative group 4 the SARS-like viruses from bats clustered together, away from SARS viruses from other mammalian hosts (Fig. 5), suggesting that other intermediate hosts or viruses were involved in the emergence of SARS.
Taken together, the above phylogenetic findings demonstrate that bats have a relatively high diversity of coronaviruses and harbor a distinct lineage (putative group 5) that may represent a novel coronavirus group. These relationships are consistent with the results of the genomic and sequence similarity analyses.
Recombination analysis. To evaluate if the different gene phylogenies for group 1 bat CoVs and putative group 5 viruses were due to recombination, a sliding window analysis was conducted. Results of this analysis indicated that while some areas of the RdRp, S, and N genes may be recombinant, there was no statistical support for this conclusion. Furthermore, those potentially recombinant areas were highly divergent and ambiguously aligned, and the different phylogenies were therefore likely due to variation in the rates of substitution and not recombination between coronaviruses (13, 30).
Phylogenetic analyses of the present study revealed high genetic diversity of coronaviruses in bats from this region. Apart from the SARS-like viruses, many bat coronaviruses clustered with existing group 1 viruses, while others formed a separate lineage that included only viruses from bats (putative group 5). Within group 1, the bat CoVs did not form a single group but were highly divergent and related to coronaviruses previously identified from different domestic animals.
Our findings also revealed that within the SARS and SARS-like CoV group (putative group 4) the S gene and other genes clustered into two subgroups, one of bat CoVs and another of SARS viruses from humans and other mammalian hosts. As the similarity of the S genes between those two subgroups is only approximately 80% and since coronaviruses usually have low mutation rates (24), it seems unlikely that these viruses have diverged due to host adaptation within such a short time period. Therefore, the direct progenitor of the SARS-CoV from civets in the animal markets of southern China and the ecological and evolutionary pathway that led to the emergence of SARS have still not been fully determined.
The association between almost all of the coronaviruses that we sequenced and a single bat species demonstrates a high degree of host restriction for coronavirus in bat populations. For example, similar viruses were detected in Myotis ricketti from Anhui, Guangdong, and Yunnan, approximately 1,600 km distant, while two different bat species sampled in the same cave had different coronaviruses. This wide distribution may be associated with bat migration. It also appears that SARS-like CoVs from bats are restricted to different species of Rhinolophus. Furthermore, Hipposideros, which belongs to the same family as Rhinolophus, and all members of the Pteropodidae tested negative for coronavirus, even though many individuals were sampled (Table 1). As such, these viruses may be restricted to just a few families and genera, and further information regarding which taxonomic groups of bats may host coronaviruses will provide an insight into the evolution and ecology of coronaviruses.
While there have been previous reports of recombination in coronaviruses as a major evolution pattern (18, 19), it is likely that at least some of this is due to those sequence areas being highly divergent. This study did not find any convincing evidence for recombination events in the bat coronaviruses tested. This information further supports the high degree of host specificity seen for bat coronaviruses, as two divergent viruses are unlikely to coinfect the same bat species, let alone the same individual. However, it must be noted that there was one instance of a single bat species being infected with coronaviruses from two different groups (Fig. 2; Table 1).
It is also possible that coronavirus may cause a persistent or long-term infection in bat species as observed for other coronaviruses in vivo and in vitro (5, 36). Each of the previous studies that have identified coronaviruses in bats has sampled at various times and in different areas of China, and all have successfully identified coronaviruses from the samples (23, 25, 32). In addition, the present study was conducted over 17 months in provinces throughout China, and positive samples were identified almost year-round.
In the present study, all bat coronaviruses tested had classical coronavirus genome organization (4). However, BtCoV/133/05 from putative group 5 had the longest genome characterized from bats, a large noncoding region at the start of the genome in which we were unable to identify the PL domain, and also three ORFs between the S and E genes.
The continued identification of novel coronaviruses from different hosts, especially bats, suggests that coronaviruses are more diverse than previously thought (14). Therefore, the classification of the group may need to be modified to match this increasing diversity. The results of this study suggest that many novel coronaviruses cannot be easily accommodated in the current classification, as antigenic data are not available in many cases due to difficulty in virus isolation (14, 23, 25, 32). Genetic data also indicate that some of these novel coronaviruses are intermediate strains that fall between the established groups. Therefore, based on phylogenetic relationships, low genetic similarity, and unique genome organization we propose a new putative coronavirus group (group 5) and also support the suggestion that SARS-like coronaviruses belong to group 4 (27). The proliferation of coronaviruses identified from different hosts has also led to confusion in naming the viruses. We have therefore used a standardized naming system based on the influenza A virus convention. While any changes in nomenclature and taxonomy must be arrived at through consensus in the scientific community, we believe that it is reasonable to consider these issues.
Considering the diversity of bat species and the habitats that they occupy, their large population sizes and densities, and their ability to migrate, bats appear to be ideal candidates for the natural reservoirs of all coronaviruses (7). The current study revealed that coronaviruses in bats exhibit high genetic diversity and high prevalence across a wide geographical distribution, possibly with asymptomatic or persistent infection. However, bats are a large order that accounts for approximately 20% of extant mammalian species (1), and so far only a small proportion of the total number of species has been investigated, and those only from China (23, 25, 32). There is also a general lack of knowledge regarding the prevalence of coronaviruses in other animal groups, and it will be difficult to reach solid conclusions until more is known regarding the frequency and diversity of coronaviruses in other animals, especially those that share ecological space with bats.
X. C. Tang and J. X. Zhang contributed equally to the manuscript.