ABSTRACT
We previously showed that close relatives of human coronavirus 229E (HCoV-229E) exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by reverse transcription-PCR (RT-PCR). Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were 5- and 6-fold more diversified than HCoV-229E in the RNA-dependent RNA polymerase (RdRp) and spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationships. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca, and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3′ end. ORF8 also existed in the 229E-related alpaca virus. Reanalysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E.
IMPORTANCE The ancestral origins of major human coronaviruses (HCoVs) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in hipposiderid bats by analyzing a large sample of African bats and characterizing several bat viruses on a full-genome level. Our evolutionary analyses show that animal and human viruses are genetically closely related, can exchange genetic material, and form a single viral species. We show that the putative host switches leading to the formation of HCoV-229E were accompanied by major genomic changes, including deletions in the viral spike glycoprotein gene and loss of an open reading frame. We reanalyze a previously described genetically related alpaca virus and discuss the role of camelids as potential intermediate hosts between bat and human viruses. The evolutionary history of HCoV-229E likely shares important characteristics with that of the recently emerged highly pathogenic Middle East respiratory syndrome (MERS) coronavirus.
INTRODUCTION
Coronaviruses (CoVs) are enveloped viruses with a single-stranded, positive-sense contiguous RNA genome of up to 32 kb. The subfamily Coronavirinae contains four genera termed Alpha-, Beta-, Gamma-, and Deltacoronavirus. Mammals are predominantly infected by alpha- and betacoronaviruses, while gamma- and deltacoronaviruses mainly infect avian hosts (1, 2).
Four human coronaviruses (HCoVs), termed HCoV-229E, -NL63, -OC43, and -HKU1, circulate in the human population and mostly cause mild respiratory disease (3). HCoV-229E is frequently detected in up to 15% of specimens taken from individuals with respiratory disease (4–6). Although HCoV-229E can be detected in fecal specimens, HCoVs generally do not seem to play a role in acute gastroenteritis (7–9). Severe respiratory disease with high case-fatality rates is caused by severe acute respiratory syndrome (SARS)-CoV and Middle East respiratory syndrome (MERS)-CoV, which emerged recently. HCoV-229E and HCoV-NL63 belong to the genus Alphacoronavirus, while HCoV-OC43, HCoV-HKU1, and SARS- and MERS-CoV belong to the genus Betacoronavirus (1, 10).
In analogy to major human pathogens, including Ebola virus, rabies virus, mumps virus, and hepatitis B and C viruses (11–16), the evolutionary origins of SARS- and MERS-CoV were traced back to bats (17–22). The genetic diversity of bat CoVs described over the last decade exceeds the diversity in other mammalian hosts (2). This has led to speculations on an evolutionary origin of all mammalian CoVs in bat hosts (23). Bats share important ecological features potentially facilitating virus maintenance and transmission, such as close contact within large social groups, longevity, and the ability of flight (13, 24).
How humans become exposed to remote wildlife viruses is not always clear (25). Human infection with SARS-CoV and MERS-CoV was likely mediated by peridomestic animals. For SARS-CoV, the suspected source of infection was carnivores (26). Preliminary evidence suggested that these carnivore hosts may also have adapted SARS-CoV for human infection (27). For MERS-CoV, camelids are likely intermediate hosts, supported by circulation of MERS-CoV in camel herds globally and for prolonged periods of time (28–30). Whether MERS-CoV only recently acquired the capacity to infect humans is unclear.
The evolutionary origins of HCoV-229E are uncertain. In 2007, a syndrome of severe respiratory disease and sudden death was recognized in captive alpacas from the United States (31), and an alphacoronavirus genetically closely related to HCoV-229E was identified as the causative agent (32).
In 2009, we detected viruses in fecal specimens from 5 of 75 hipposiderid bats from Ghana and showed that these bat viruses were genetically related to HCoV-229E by characterizing their partial RNA-dependent RNA polymerase (RdRp) and nucleocapsid genes (33). A lack of specimens containing high CoV RNA concentrations has so far prevented a more comprehensive characterization of those bat viruses to further address their relatedness to HCoV-229E. Here, we tested more than 2,000 bats from Ghana for CoVs related to HCoV-229E. We describe highly diversified bat viruses on a full-genome level and analyze the evolutionary history of HCoV-229E and the genetically related alpaca CoV.
MATERIALS AND METHODS
For all capturing, sampling, and exportation of bat specimens, we obtained permission from the respective countries' authorities.
Bat and human sampling. Bats were caught in the Ashanti region, central Ghana, during 2009 to 2011 as described previously (21). Archived anonymized respiratory specimens derived from patients sampled between 2002 and 2011 were obtained from Hong Kong/China, Germany, The Netherlands, Brazil, and Ghana.
RNA purification, coronavirus detection, and characterization. RNA was purified from approximately 20 mg of fecal material suspended in 500 μl RNAlater stabilizing solution using the MagNA Pure 96 system (Roche, Penzberg, Germany). Elution volumes were 100 μl. Testing for CoV RNA was done using a real-time reverse transcription-PCR (RT-PCR) assay designed to allow detection of HCoV-229E and all genetically related bat CoVs known from our pilot study (33). Oligonucleotide sequences were as follows: CoV229Elike-F13948m, TCYAGAGAGGTKGTTGTTACWAAYCT; CoV229Elike-P13990m, 6-carboxyfluorescein (FAM)-TGGCMACTTAATAAGTTTGGIAARGCYGG-Black Hole Quencher 1 (BHQ1); and CoV229Elike-R14138m, CGYTCYTTRCCAGAWATGGCRTA. Testing used the SSIII RT-PCR kit (Life Technologies, Karlsruhe, Germany) with the following cycling protocol in a LightCycler 480 (Roche, Penzberg, Germany): 20 min at 50°C for reverse transcription, followed by 3 min at 95°C and 45 cycles of 15 s at 95°C, 10 s at 58°C, and 20 s at 72°C. CoV quantification relied on cRNA in vitro transcripts generated from TA-cloned periamplicons using the T7-driven MEGAscript (Life Technologies, Heidelberg, Germany) kit as described previously (34). Partial RdRp gene sequences from real-time RT-PCR-positive specimens were obtained as described previously (18). Full CoV genomes and spike gene sequences were generated for those specimens containing the highest CoV RNA concentrations using sets of nested RT-PCR assays (primers are available upon request) located along the HCoV-229E genome and designed to amplify small sequence islets. Sequence islets were connected by bridging long-range nested PCR using strain-specific primers (available upon request) and the Expand High Fidelity kit (Roche) on cDNA templates generated with the Superscript III reverse transcriptase (Life Technologies).
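The screening oligonucleotides listed above contain degenerate positions (for example Y, K, W, R, and M) so that one assay can cover HCoV-229E and the related bat viruses. The following Python sketch illustrates how the degeneracy of such oligonucleotides can be counted and expanded. It is an illustration only and not part of the published protocol: the function names `degeneracy` and `expand` are hypothetical, and treating inosine (I) as pairing with any base is a simplifying assumption.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes. Inosine ("I") is not an IUPAC code; this
# sketch ASSUMES it behaves like N (pairs with any base) for counting purposes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "ACGT",
}

def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count

def expand(oligo: str) -> list:
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in oligo.upper()))]

# Sequences as given in the text (fluorophore and quencher labels omitted).
FORWARD = "TCYAGAGAGGTKGTTGTTACWAAYCT"      # CoV229Elike-F13948m
PROBE = "TGGCMACTTAATAAGTTTGGIAARGCYGG"     # CoV229Elike-P13990m
REVERSE = "CGYTCYTTRCCAGAWATGGCRTA"         # CoV229Elike-R14138m

if __name__ == "__main__":
    for name, seq in [("forward", FORWARD), ("probe", PROBE), ("reverse", REVERSE)]:
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Under these assumptions, the forward primer has four 2-fold degenerate positions and the reverse primer has five, so they stand for 16 and 32 concrete sequences, respectively.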
Phylogenetic analyses. Bayesian phylogenetic reconstructions were made using MrBayes V3.1 (35) under assumption of a GTR+G+I nucleotide substitution model for partial RdRp gene sequences and the WAG amino acid substitution model for translated open reading frames (ORFs). Two million generations were sampled every 100 steps, corresponding to 20,000 trees, of which 25% were discarded as burn-in before annotation using TreeAnnotator V1.5 and visualization using FigTree V1.4 from the BEAST package (36). Neighbor-joining phylogenetic reconstructions were made using MEGA5.2 (37) and a percent nucleotide distance model, the complete deletion option, and 1,000 bootstrap replicates. Genome comparisons were made using MEGA5.2 (37) and SSE V1.1 (38), and recombination analyses were made using SimPlot V3.5 (39).
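The percent nucleotide distance model with the complete deletion option can be illustrated with a short Python sketch. It is a simplified stand-in and not the MEGA implementation: the function names, the set of characters treated as missing data, and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations

# Characters treated as missing data in a NUCLEOTIDE alignment (assumption).
MISSING = set("-?Nn")

def complete_deletion(alignment: dict) -> dict:
    """Drop every column in which at least one sequence has missing data."""
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = [i for i in range(len(seqs[0]))
            if not any(s[i] in MISSING for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper())) / len(a)

def pairwise_p_distances(alignment: dict) -> dict:
    """p-distance for every sequence pair after complete deletion."""
    cleaned = complete_deletion(alignment)
    return {(m, n): p_distance(cleaned[m], cleaned[n])
            for m, n in combinations(cleaned, 2)}

# Invented toy alignment; the gap in virus_C removes column 4 for all pairs.
TOY = {
    "virus_A": "ATGCATGCAT",
    "virus_B": "ATGCTTGCAT",
    "virus_C": "ATG-ATGCAA",
}

if __name__ == "__main__":
    for pair, d in pairwise_p_distances(TOY).items():
        print(pair, f"{100 * d:.1f}%")
```

With complete deletion, a column is excluded from every pairwise comparison as soon as a single sequence has a gap there, so all distances are computed over the same set of sites.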
RESULTS
Specimens from 2,087 bats belonging to 11 species were available for PCR testing. Table 1 provides details on the overall sample composition and detection rates in individual bat species. Only bats belonging to the family Hipposideridae tested positive (81 of 1,853 specimens; 4.4%). All positive-testing bats had been morphologically identified in the field as either Hipposideros cf. ruber or Hipposideros abae. Those were the most abundant species within the sample. No HCoV-229E-related RNA was detected in the 17 available specimens from Hipposideros jonesi and Hipposideros cf. gigas.
Overview of bats tested for 229E-related coronaviruses in Ghana
An 816-nucleotide (nt) fragment from the RdRp gene was obtained from 41 of the 81 positive specimens. This fragment was used for further analysis, as the 816-nt sequence yields improved resolution in inference of phylogeny compared to shorter sequences derived from RT-PCR screening of field-derived samples (2). To expand the available genomic data for HCoV-229E, the 816-nt RdRp gene fragment was also sequenced from 23 HCoV-229E strains from patients sampled between 2002 and 2011 in China, Germany, The Netherlands, Brazil, and Ghana. In addition, the 816-nt RdRp gene fragment was sequenced from two historical HCoV-229E strains isolated in 1965 and the 1980s (40). In analogy to the official taxonomic designation SARS-related CoV, including human SARS-CoV and related CoVs from other animals (1), we here restrict usage of the term HCoV-229E to the human virus and refer to the animal viruses as 229E-related CoV. Figure 1A shows a Bayesian phylogeny of the partial RdRp gene. The bat virus diversity we observed in our pilot study (represented by viruses Buoyem344 and Kwamang19) was expanded greatly. A phylogenetically basal virus termed Kwamang8 obtained in our pilot study was not detected again, although the present study contained specimens from the same cave and bat species. All human strains occupied an apical phylogenetic position and were not intermixed with any of the animal viruses. The recently described alpaca 229E-related CoV (32) clustered with two viruses obtained from hipposiderid bats in a parallel study from our groups in the Central African country Gabon (41). The two Gabonese bat-associated viruses differed from the alpaca 229E-related CoV by only 3.2% nucleotide content within the RdRp gene fragment. Hipposiderid bat CoVs were sorted neither by sampling sites nor by their host species in their RdRp genes. Overall, bat 229E-related CoVs sampled over 3 years differed up to 13.5% in their nucleotide sequences and 3.3% in their amino acid sequences. 
Although the HCoV-229E data set used for comparison was sampled over 50 years, the human-associated viruses showed 5- to 10-fold less genetic diversity than bat viruses, with only 1.4% nucleotide and 0.7% amino acid variation. Because of the small sequence variation in HCoV-229E, Fig. 1A contains only nine representative HCoV-229E strains. The neighbor-joining phylogeny shown in Fig. 1B represents the high sequence identity between all HCoV-229E strains determined in this study.
Phylogenetic relationships of the genus Alphacoronavirus, HCoV-229E strains, and the novel bat viruses. (A) Bayesian phylogeny of an 816-nucleotide RdRp gene sequence fragment corresponding to positions 13891 to 14705 in HCoV-229E prototype strain inf-1 (GenBank accession no. NC_002645) using a GTR+G+I substitution model. SARS-CoV was used as an outgroup. Viruses with additional sequence information generated in this study are marked with circles (full genome) or triangles (spike gene). Bat viruses detected in our previous studies from Ghana (33) and Gabon (41) are given in cyan. (B) Neighbor-joining phylogeny of the same RdRp gene fragment with a nucleotide percentage distance substitution model and the complete deletion option. The tree was rooted against HCoV-NL63. Viruses are colored according to their origin. (C) Bayesian phylogeny of the full spike gene of bat 229E-related CoVs, the alpaca 229E-related CoV, and HCoV-229E strains identified with GenBank accession numbers and year of isolation, using a WAG amino acid substitution model and HCoV-NL63 as an outgroup. The novel bat 229E-related CoVs are shown in boldface and red. Branches leading to the outgroup were truncated for graphical reasons, as indicated by slashed lines. Values at nodes show support of grouping from posterior probabilities or 1,000 bootstrap replicates (only values above 0.7 are shown).
To analyze to what extent bat 229E-related CoVs show genetic variation, the spike gene encoding the viral glycoprotein was characterized from 15 representative bat viruses (labeled with a triangle in Fig. 1A). Figure 1C shows a Bayesian phylogenetic tree of the bat 229E-related CoV spike gene sequences and HCoV-229E full spike sequences sampled over 50 years. The bat viruses formed three genetically diverse clades, of which two phylogenetically basal clades contained bat viruses only. These clades were sorted according to their sampling sites, Kwamang (abbreviated KW) and Akpafu Todzi (abbreviated AT). A third clade contained closely related bat viruses obtained from three different sample sites separated by several hundred kilometers (Buoyem, Kwamang, and Forikrom) (21). These data suggested cocirculation of different spike gene lineages within sampling sites as well as the existence of separate lineages between sites. However, the small number of viruses characterized from the phylogenetically basal bat clades 2 and 3 implies that caution should be taken in assertions on geographically separated spike gene lineages. The alpaca 229E-related CoV and all HCoV-229E strains clustered in an apical phylogenetic position compared to the bat viruses. The most closely related bat viruses from clade 1 differed from HCoV-229E by 8.4 to 13.7%. The two other bat virus lineages were less related to HCoV-229E, with 30.6 to 33.0% amino acid sequence distance. The patristic distance within HCoV-229E was 5.5% on the amino acid level; that within bat viruses was 6-fold higher, with 33.6%.
Topologies of the Bayesian phylogenetic reconstructions of RdRp and spike genes from bats and the alpaca were not congruent, which may hint at past recombination events across animal 229E-related CoVs. To further investigate the genomic relationships of bat 229E-related CoVs and HCoV-229E, the full genomes were determined directly from fecal specimens from four bat viruses representing bat 229E-related CoV clades 1 to 3 (labeled with circles in Fig. 1A and C). We refer to these viruses as 229E-related CoV lineages 1 to 3 hereafter. Figure 2A shows that bat 229E-related CoV genomes comprise 28,014 to 28,748 nt, which exceeds the length of known HCoV-229E strains by 844 to 1,479 nt. As shown in Fig. 2B, HCoV-229E and all bat viruses were closely related within the putative ORF1ab. This allowed the delineation of nonstructural proteins (NSP) 1 to 16 for all bat viruses, in analogy to HCoV-229E. Table 2 provides details on length and cleavage sites of the predicted NSP domains. Sequence identity in seven concatenated NSP is used by the International Committee for the Taxonomy of Viruses (ICTV) for CoV species designation (1). As shown in Table 3, the four fully sequenced bat viruses showed translated amino acid sequence identities of 93.3 to 97.1% with HCoV-229E. This was well above the 90% threshold established by the ICTV, indicating that all bat 229E-related CoVs and HCoV-229E form a single species. Bat virus Kwamang8, which formed a phylogenetically basal sister clade to the other bat viruses and HCoV-229E, could not be sequenced on a full-genome level. The amino acid sequence of the partial RdRp gene of Kwamang8 differed by only 3.3% from those of other bat viruses and HCoV-229E. Based upon previous comparisons of CoV RdRp gene sequences for tentative species delineation (2, 18), Kwamang8 forms part of the same species as the other bat viruses and HCoV-229E. 
This CoV species would also include the recently described alpaca 229E-related CoV (32), which showed 96.9 to 97.2% amino acid sequence identity with HCoV-229E and 94.2 to 97.8% with the bat viruses in the seven concatenated NSP domains.
Genome organization of 229E-related coronaviruses and relationships between viruses from bats and humans. (A) 229E-related CoV genomes are represented by black lines; ORFs are indicated by gray arrows. Locations of transcription regulatory core sequences (TRS) are marked by black dots. HCoV-NL63 is shown for comparison. (B) Similarity plots generated using SSE V1.1 (38) with a sliding window of 400 and a step size of 40 nucleotides (nt). The HCoV-229E prototype strain inf-1 was used with the indicated animal viruses.
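The similarity plots in panel B rest on a sliding-window comparison (window of 400 nt, step of 40 nt). A minimal Python sketch of that idea is given here. It is not the SSE implementation, which may score similarity differently; the function name `window_identity`, the handling of gaps, and the toy sequences are assumptions made for illustration.

```python
def window_identity(query: str, subject: str, window: int = 400, step: int = 40):
    """Yield (window midpoint, percent identity) along a pairwise alignment.

    Columns in which either sequence has a gap ("-") are ignored within
    each window (an assumption of this sketch).
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        s = subject[start:start + window]
        pairs = [(a, b) for a, b in zip(q, s) if a != "-" and b != "-"]
        if not pairs:
            continue  # window consists of gaps only
        identity = 100.0 * sum(a == b for a, b in pairs) / len(pairs)
        yield start + window // 2, identity

if __name__ == "__main__":
    # Invented toy data: identical first half, completely different second half.
    query = "A" * 1000
    subject = "A" * 500 + "C" * 500
    for midpoint, identity in window_identity(query, subject):
        print(midpoint, f"{identity:.1f}")
```

A smaller step relative to the window gives a smoother curve, at the cost of neighboring windows sharing most of their columns.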
Coding capacity for the putative nonstructural proteins of the novel bat 229E-related coronaviruses
Comparison of amino acid identities of seven conserved replicase domains of the bat 229E-related coronaviruses, HCoV-229E, and the alpaca 229E-related coronavirus for species delineation
As shown in Fig. 2A, all seven open reading frames (ORFs) known from HCoV-229E were found in bat 229E-related CoVs in the sequence ORF1a/1b-spike-ORF4-envelope-membrane-nucleocapsid. Amino acid identities between predicted ORFs of the bat viruses and HCoV-229E ranged from the 67.2 to 91.6% described above for the translated spike genes to 88.3 to 94.6% (ORF1ab), with bat virus lineage 1 consistently showing highest amino acid sequence identities. Table 4 provides details for all sequence comparisons.
Amino acid identity between open reading frames of human, bat, and camelid 229E-related coronaviruses
We looked for additional support for the existence of these predicted ORFs by analyzing the sequence context at their 5′ termini. This is because in CoVs, ORFs are typically preceded by highly conserved transcription regulatory sequence (TRS) elements (42). All putative ORFs from bat 229E-related CoVs showed high conservation of the typical HCoV-229E TRS core sequence UCU C/A AACU and adjacent bases. Table 5 provides details on all putative TRS elements within bat 229E-related CoV genomes.
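As an illustration of how candidate TRS elements can be located, the following Python sketch scans a sequence for the HCoV-229E TRS core UCU C/A AACU stated above. It is a simplified example and not the procedure used in this study: the function name `find_trs_cores` and the toy sequence are hypothetical, only the plus strand is searched, and adjacent bases are not evaluated.

```python
import re

# TRS core sequence UCU C/A AACU, written as a regular expression.
# The lookahead reports every start position, including overlapping ones.
TRS_CORE = re.compile(r"(?=(UCU[CA]AACU))")

def find_trs_cores(genome: str) -> list:
    """Return (1-based position, matched core) for each hit on the plus strand."""
    rna = genome.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start() + 1, m.group(1)) for m in TRS_CORE.finditer(rna)]

# Invented toy sequence carrying both variants of the core motif.
TOY = "GGA" + "UCUCAACU" + "AAACGAAAUGGCC" + "U" + "UCUAAACU" + "AAACAUGAA"

if __name__ == "__main__":
    for position, core in find_trs_cores(TOY):
        print(position, core)
```

A hit found this way is only a candidate; whether it drives a subgenomic mRNA depends on its genomic context and would need experimental confirmation.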
Putative transcription regulatory sequences of the novel bat 229E-related coronaviruses and HCoV-229E
Figure 3A shows Bayesian phylogenetic trees reconstructed for all individual ORFs. The alpaca 229E-related CoV clustered in intermediate position between HCoV-229E and the bat viruses in the ORF1ab and spike genes but with bat viruses only in membrane, envelope, nucleocapsid, and ORF4. The divergent topologies again suggested recombination events in 229E-related CoVs. To find further evidence for recombination events and to identify genomic breakpoints, 229E-related CoVs were analyzed by bootscanning. As shown in Fig. 3B, bootscanning supported multiple recombination events involving HCoV-229E, bat 229E-related CoVs, and the alpaca 229E-related CoV. Major recombination breakpoints occurred within the ORF1ab and the beginning of the spike gene, compatible with previous analyses of CoV recombination patterns (2) and the divergent topologies between the RdRp and spike genes noted above. Bootscanning also suggested a potential genomic breakpoint within the spike gene, mapping to the borders of the S1 (associated with receptor binding) and S2 (associated with membrane fusion) domains. This would be consistent with previous evidence supporting intraspike recombination events in bat-associated CoVs (43). To obtain further support for potential intraspike recombination events, separate phylogenetic reconstructions for the S1 and the S2 domains were made. As shown in Fig. 3C, these phylogenetic reconstructions supported recombination events involving the alpaca 229E-related CoV and HCoV-229E but not the bat 229E-related CoVs. In the S1 domain, the alpaca 229E-related CoV clustered with clinical HCoV-229E strains, while the HCoV-229E reference strain inf-1, isolated in 1962, clustered in a phylogenetically basal sister relationship. Only in the S2 domain was the intermediate position of the alpaca compared to bat and human 229E-related CoVs noted before in comparisons of the full spike gene maintained.
These data may hint at recombination events between HCoV-229E and the alpaca virus and further supported genetic compatibility between these two viruses belonging to one CoV species.
Bayesian phylogenies of major open reading frames and recombination analysis of HCoV-229E and related animal viruses. (A) Phylogenies were calculated with a WAG amino acid substitution model. The novel bat viruses are shown in red. The alpaca CoV is shown in cyan. Filled circles, posterior probability support exceeding 0.95; the scale bar corresponds to genetic distance. Details on the origin of HCoV-229E strain VFC408, which was generated for this study, can be retrieved from reference 69. Branches leading to the outgroup HCoV-NL63 were truncated for graphical reasons. (B) Bootscan analysis using the Jukes-Cantor algorithm with a sliding window of 1,500 and a step size of 300 nt. The HCoV-229E inf-1 strain was used with animal 229E-related viruses as indicated. (C) Phylogenies of the S1 and S2 subunits were calculated according to panel A. One representative HCoV-229E strain was selected per decade according to reference 70: GenBank accession no. DQ243974, DQ243964, DQ243984, and DQ243967.
Three major differences existed between HCoV-229E, the alpaca 229E-related CoV, and the bat 229E-related CoVs. The first of these differences occurred in the putative ORF4. Similar to the case for HCoV-229E strains characterized from clinical specimens, a contiguous ORF4 existed in all bat viruses and was 156 to 164 amino acid residues longer than the alpaca 229E-related CoV ORF4. Reanalysis of the putative ORF4 sequence of the alpaca 229E-related CoV showed that this apparently shorter ORF4 was due to an insertion of a single cytosine residue at position 181. Without this putative insertion, the alpaca 229E-related CoV ORF4 showed the same length as homologous ORFs in bat 229E-related CoVs and HCoV-229E. Since the HCoV-229E ORF4 is known to accumulate mutations in cell culture (40), the apparently truncated ORF in the alpaca 229E-related CoV isolate may thus not occur in vivo. The extended ORF4 of the alpaca 229E-related CoV would be most closely related to bat viruses from lineage 1, with 5.5% amino acid sequence distance, compared to at least 8.8% distance from HCoV-229E strains.
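The effect of a single-nucleotide insertion on ORF length, as discussed for the alpaca 229E-related CoV ORF4, can be illustrated with a toy example. The Python sketch below is not an analysis of the actual ORF4 sequence: the 18-nt sequence and the function names are invented for illustration, and only the standard genetic code is assumed.

```python
# Standard genetic code, codons ordered by first, second, third base in T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate_orf(dna: str) -> str:
    """Translate from the first base up to the first stop codon or the sequence end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def insert_base(dna: str, position: int, base: str) -> str:
    """Insert `base` so that it becomes the nucleotide at 1-based `position`."""
    return dna[:position - 1] + base + dna[position - 1:]

# Invented toy ORF: ATG GCT GGT AAC CCA TAA
INTACT = "ATGGCTGGTAACCCATAA"

if __name__ == "__main__":
    shifted = insert_base(INTACT, 7, "C")   # ATG GCT CGG TAA ...
    print(translate_orf(INTACT))            # five residues before the stop codon
    print(translate_orf(shifted))           # frameshift exposes an early TAA
```

In this toy case, the inserted cytosine shifts the reading frame so that a TAA stop codon is read after three residues instead of five, which mirrors how one extra base can make an ORF appear truncated.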
The second difference was a considerably longer S1 portion of the bat 229E-related CoV spike genes compared to HCoV-229E. Figure 4 shows that the three bat lineages contained 185 to 404 additional amino acid residues upstream of the putative receptor binding domain (RBD) (44, 45) compared to HCoV-229E. Bat lineage 1, which was phylogenetically most closely related to HCoV-229E, carried the smallest number of additional amino acid residues. Of note, the alpaca 229E-related CoV was identical to HCoV-229E in the number of amino acid residues within this region of the spike gene.
Amino acid sequence alignment of the 5′ ends of the spike genes of HCoV-229E and related animal viruses. Amino acid alignment of the first part of the spike genes of 229E-related CoVs, including four bat 229E-related CoVs, the alpaca 229E-related CoV and the HCoV-229E inf-1 strain, is shown. Conserved amino acid residues are marked in black, and sequence gaps are represented by hyphens.
The third major difference was the existence of an additional putative ORF downstream of the nucleocapsid gene in all bat viruses. Nonhomologous ORFs of unknown function downstream of the nucleocapsid gene occur in several alpha- and betacoronaviruses, including Feline infectious peritonitis virus (FIPV), Transmissible gastroenteritis virus of swine (TGEV), Rhinolophus bat CoV HKU2, Scotophilus bat CoV 512, Miniopterus bat CoV HKU8 (23), the Chaerephon bat CoVs BtKY22/BtKY41, the Cardioderma bat CoV BtKY43 (46), and bat CoV HKU10 from Chinese Hipposideros and Rousettus species (47). In the genus Betacoronavirus, only bat CoV HKU9 from Rousettus and the genetically related Eidolon bat CoV BtKY24 (46) carry additional ORFs at this genomic position. No ORF in the 3′-terminal genome region is known from HCoV-229E. The alpaca 229E-related CoV contains an ORF at this position termed ORFX by Crossley et al. (32). In analogy to consecutive numbers used to identify HCoV-229E ORFs, we refer to this ORF as ORF8 here. The putative TRS context preceding ORF8 was conserved in all bat 229E-related CoVs and in the alpaca 229E-related CoV, suggesting that a corresponding subgenomic mRNA8 may exist. The 3′ untranslated region (UTR) of bat 229E-related CoVs immediately followed the putative ORF8. This was supported by the existence of a conserved octanucleotide sequence and highly conserved stem elements forming part of the pseudoknot typically located at the 5′ ends of alphacoronavirus 3′ UTRs (48). As shown in Fig. 5, HCoV-229E shows a high degree of sequence conservation compared to bat 229E-related CoVs and the alpaca 229E-related CoV in this genomic region, including a highly conserved putative TRS. Bioinformatic analyses (49–51) provided evidence for the presence of two transmembrane domains in the predicted proteins 8 of the alpaca and the genetically related bat 229E-related viruses. 
This may imply a role of the predicted protein 8 in coronaviral interactions with cellular or viral membranes.
Nucleotide sequence alignment of the genomic 3′ ends of HCoV-229E and related animal viruses. A nucleotide alignment of the genome region downstream of the nucleocapsid gene, including four bat 229E-related CoVs, the alpaca 229E-related CoV, and representative HCoV-229E strains, is shown, with full genomes identified by GenBank accession number or strain name. Dots represent identical nucleotides, and hyphens represent sequence gaps. Gray bars above alignments indicate open reading frames and the beginning of the poly(A) tail. The putative start and stop codons of ORF8 are in lime green, and the corresponding putative TRS element is in blue. The conserved genomic sequence elements and the highly conserved stem elements forming part of the pseudoknot (PK) are marked with gray and purple background.
As shown in Fig. 5, one of the bat 229E-related CoV lineages represented by virus KW2E-F56 contained a highly divergent ORF8. In protein BLAST comparisons, the KW2E-F56 ORF8 showed limited similarity to the putative ORF7b of HKU10 and to the putative ORF8 located upstream of the nucleocapsid of a Nigerian Hipposideros betacoronavirus termed Zaria bat CoV (47, 52). This may hint at cross-genus recombination events between different hipposiderid bat CoVs in the past. However, overall amino acid sequence identity between these bat CoV ORFs was very low, at 28.2% or less. As shown in Fig. 6, only the central parts of these ORFs contained a stretch of 46 more conserved amino acid residues showing up to 39.1% sequence identity and 47.8% similarity (Blosum62 matrix). The origin and function of the divergent ORF8 thus remain to be determined.
Amino acid sequence alignment of the putative ORF8 from a bat 229E-related coronavirus and closest hits from two other hipposiderid bat coronaviruses. Conserved amino acid residues between sequence pairs are highlighted in color according to amino acid properties, and sequence gaps are represented by hyphens. The central domain showing higher sequence similarity between compared viruses is boxed for clarity. The 229E-related alphacoronavirus KW2E-F56 from Hipposideros cf. ruber detected in this study is given in red, the alphacoronavirus HKU10 originated from a Chinese H. pomona animal, and the betacoronavirus Zaria originated from a Nigerian H. gigas animal.
DISCUSSION
We characterized highly diverse bat CoVs on a full-genome level and showed that these viruses form one species together with HCoV-229E and a recently described virus from alpacas (32). We analyzed the genomic differences between human, bat, and alpaca 229E-related CoVs to elucidate potential host transitions during the formation of HCoV-229E.
A major difference between bat 229E-related CoVs and HCoV-229E was the spike gene deletion in HCoV-229E compared to the bat viruses. Interestingly, the bat 229E-related CoV lineage 1, which was phylogenetically most related to HCoV-229E, also carried the smallest number of additional amino acid residues. Most chiropteran CoVs are restricted to the gastrointestinal tract, whereas HCoVs replicate mainly in the respiratory tract (2). The spike deletion in HCoV-229E compared to ancestral bat viruses is thus noteworthy, since deletions in this protein have been associated with changes in coronaviral tissue tropism. This is best illustrated by TGEV, whose full-length spike variants are associated with a dual tropism for respiratory and enteric tracts, whereas the deleted variant termed Porcine respiratory CoV (PRCV) replicates mainly in the respiratory tract (53). One could hypothesize that adaptation of bat 229E-related CoV lineage 1 both to nonchiropteran hosts and to respiratory transmission may have been easier than for the other bat 229E-related CoV lineages.
Because the exact amino acid residues of the HCoV-229E RBD conveying cell entry are not known, it is difficult to predict whether the bat viruses may interact with the HCoV-229E cellular receptor aminopeptidase N (45) or its Hipposideros homologue. Characterization of this bat molecule and identification of permissive cell culture systems may allow initial susceptibility experiments for chimeric viruses. Of note, although the alpaca 229E-related CoV was successfully isolated (32), no data on receptor usage and cellular tropism are available so far (2, 53).
Another major difference was the existence of an ORF8 downstream of the nucleocapsid gene in bat 229E-related viruses and the detection of putative sequence remnants of this ORF in HCoV-229E. Hypothetically, deterioration of ORF8 in HCoV-229E could have occurred due to loss of gene function in human hosts after zoonotic transmission from bats or intermediate hosts. This may parallel gradual deletions in the SARS-CoV accessory ORF8 during the human epidemic compared to bat SARS-related CoVs (54) and is consistent with characterizations of HCoV-229E clinical strains showing high variability of this genomic region (55).
The virus-host association between 229E-related CoVs and the bat genus Hipposideros is strengthened by our virus detections in Hipposideros species in Ghana and in Gabon (41), which is separated from Ghana by about 1,800 km. The observed link between 229E-related alphacoronaviruses and hipposiderid bats is paralleled by the detections of genetically closely related betacoronaviruses in different Hipposideros species from Ghana, Nigeria, Thailand, and Gabon (33, 41, 52, 56), suggesting restriction of these CoVs to hipposiderid bat genera. Due to their proofreading capacity, CoVs show evolutionary rates of 10^−5 to 10^−6 substitutions per site per replication cycle, which is much slower than rates observed for other RNA viruses (57, 58). Our data thus suggest a long evolutionary history of 229E-related CoVs in Old World hipposiderid bats that greatly exceeds that of HCoV-229E in humans, confirming previous hypotheses from our group (33).
The putative role of the alpaca 229E-related CoV in the formation of HCoV-229E is unclear. Our data enable new insights into the evolutionary history of HCoV-229E. First, the alpaca 229E-related CoV contained an intact ORF8 which was genetically related to the homologous gene in bat 229E-related CoVs. Second, genes of the alpaca CoV clustered either with bat viruses only or in an intermediate position between bat viruses and HCoV-229E. Because the alpaca 229E-related CoV showed the same deletion in its spike gene as HCoV-229E compared to bat 229E-related CoVs, it may be possible that alpacas represent a first host switch from bats followed by a second interhost transfer from alpacas to humans. The relatedness of the alpaca 229E-related CoV to older HCoV-229E strains rather than to contemporary ones reported by Crossley et al. would be compatible with this scenario (32). However, the alpaca 229E-related CoV was reported only from captive animals in the United States, and whether this virus is indeed endemic in New World alpacas is unclear. Additionally, the apparent intraspike recombination event may speak against a role of the alpaca virus as the direct ancestor of HCoV-229E. On the other hand, it cannot be excluded that the basal clustering of the HCoV-229E prototype strain inf-1 in relation to the alpaca virus and other HCoV-229E strains is due to mutations associated with extensive passaging in cell culture. Further analyses will be required to confirm this apparent recombination event, ideally including additional sequence information from old HCoV-229E strains. Furthermore, a hypothetical direct transfer of Old World bat viruses to New World alpacas appears to be geographically unfeasible. It would be highly relevant to investigate Old World camelids for 229E-related CoVs that may have been passed on to captive alpacas and that may represent direct ancestors of HCoV-229E.
Additional constraints to consider in the hypothetical role of camelids for the evolutionary history of 229E-related CoVs are the time and place of putative host switches from bats. Camels were likely introduced into Africa not earlier than 5,000 years ago from the Arabian Peninsula (59, 60) and could not possibly have come into direct contact with West African H. cf. ruber or H. abae of the Guinean savanna. The majority of CoV species seem to be confined to host genera (2). Therefore, 229E-related CoV transmission may have been mediated through closely related species such as H. tephrus, which occurs in the Sahel zone and comes into contact with populations of H. cf. ruber distantly related to those from the Guinean savanna (61). This bat species should be analyzed for 229E-related CoVs together with other genera of the family Hipposideridae, such as Asellia or Triaenops, which are desert-adapted bats sharing their habitat with camelids in both Arabia and Africa and may harbor genetically related CoVs. An important parallel to this evolutionary scenario is the role of camelids for the emerging MERS-CoV (30, 62), whose likely ancestors also occur in bats (20, 21). However, we cannot rule out that the alpaca 229E-related CoV and HCoV-229E represent two independent zoonotic acquisitions from 229E-related CoVs existing in hipposiderid bats and potentially yet-unknown intermediate hosts.
The existence of different serotypes in the expanded 229E-related CoV species is unclear. CoV neutralization is determined mainly by antibodies against the S protein, and particularly the S1 domain (63). The phylogenetic relatedness of the S1 domains from the alpaca 229E-related CoV and HCoV-229E suggests that these viruses form one serotype. The most closely related bat 229E-related CoV lineage showed 8.4% amino acid sequence distance in the translated spike gene from HCoV-229E. This was comparable to the 7.8 to 18.6% amino acid distance between FIPV, TGEV, and canine CoV, which belong to one CoV species (Alphacoronavirus 1) and for which cross-neutralization was observed (64). The ca. 30% spike amino acid sequence distance between the other bat 229E-related lineages and HCoV-229E was comparable to the distance between HCoV-NL63 and HCoV-229E, which form two different serotypes (65). HCoV-229E thus likely forms one serotype that includes the alpaca 229E-related CoV and potentially the most closely related bat 229E-related lineage, while the other bat 229E-related lineages may form different serotypes. In our study, the lack of bat sera and absence of bat 229E-related CoV isolates prevented serological investigations. The generation of pseudotyped viruses carrying bat 229E-related spike gene motifs may allow future serological studies. Of note, our joint analyses of Ghanaian patients with respiratory disease in this study and previous work from our group investigating Ghanaian villagers (66) showed that Ghanaians were infected with the globally circulating HCoV-229E, whereas no evidence of bat 229E-related CoV infecting humans was found. If distinct serotypes exist among 229E-related CoVs, serologic studies may aid in elucidating putative exposure of humans and potential camelid intermediate hosts to these bat viruses.
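The percentage distances compared above can be illustrated with a minimal sketch of an uncorrected pairwise amino acid distance (p-distance) between two aligned sequences. This section does not state how the distances were computed, so the use of a p-distance with gap-containing positions excluded is an assumption. The function name and the short toy sequences are hypothetical and are not real spike sequences.

```python
# Minimal sketch: uncorrected pairwise distance (p-distance) between two
# aligned amino acid sequences.
# ASSUMPTION: alignment positions where either sequence has a gap are
# skipped; the distance method used in the study is not specified here.


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of compared (gap-free) alignment positions that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # skip positions with a gap in either sequence
        compared += 1
        if res_a != res_b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return differing / compared


if __name__ == "__main__":
    # Hypothetical toy peptides for illustration only.
    toy_a = "MFVLLVAYAL"
    toy_b = "MFVLLIAYSL"
    print(f"p-distance: {100 * p_distance(toy_a, toy_b):.1f}%")
```

In this toy alignment, two of ten compared positions differ, which corresponds to a distance of 20%. Real comparisons would use full-length aligned spike sequences.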
It should be noted that throughout Africa, bats are consumed as wild game (67) and humans frequently live in close proximity to bat caves (68), including usage of bat guano as fertilizer and drinking water from these caves (21). These settings potentially facilitate the exposure of humans and their peridomestic animals, including camelids, to these previously remote bat viruses.
In summary, HCoV-229E may be a paradigmatic example of the successful introduction of a bat CoV into the human population, possibly with camelids as intermediate hosts.
ACKNOWLEDGMENTS
We thank Monika Eschbach-Bludau, Sebastian Brünink, Tobias Bleicker, Fabian Ebach, and Thierno Diawo Dallo at the Institute of Virology, Bonn, for technical assistance and Ebenezer Kofi Badu, Priscilla Anti, Olivia Agbenyega, Florian Gloza-Rausch, Stefan Klose, and Thomas Kruppa for their help in organizing and conducting field work.
This study was supported by the European Union FP7 projects EMPERIE (contract number 223498) and ANTIGONE (contract number 278976) and by the German Research Foundation (DFG grants DR 772/3-1, KA1241/18-1, and DR 772/7-1, TH 1420/1-1).
FOOTNOTES
- Received 9 July 2015.
- Accepted 7 September 2015.
- Accepted manuscript posted online 16 September 2015.
- Address correspondence to Christian Drosten, drosten@virology-bonn.de, or Jan Felix Drexler, drexler@virology-bonn.de.
V.M.C., H.J.B., and A.F.T. contributed equally to this article.
Citation Corman VM, Baldwin HJ, Tateno AF, Zerbinati RM, Annan A, Owusu M, Nkrumah EE, Maganga GD, Oppong S, Adu-Sarkodie Y, Vallo P, da Silva Filho LVRF, Leroy EM, Thiel V, van der Hoek L, Poon LLM, Tschapka M, Drosten C, Drexler JF. 2015. Evidence for an ancestral association of human coronavirus 229E with bats. J Virol 89:11858–11870. doi:10.1128/JVI.01755-15.
REFERENCES
- Copyright © 2015, American Society for Microbiology. All Rights Reserved.