Introduction and Dispersal of Sindbis Virus from Central Africa to Europe

This study shows that only a single introduction of SINV into a new geographical area is required for spread and establishment, provided that the requisite vector(s) and reservoir(s) of epizootological and epidemiological importance are present. Furthermore, we present the first report of recombination between two strains of SINV in nature. Our study increases the knowledge on new introductions and dispersal of arboviruses in general and of SINV in particular.

Sindbis virus (SINV) and Western equine encephalitis virus (WEEV), and flaviviruses (Flaviviridae), e.g., West Nile virus (WNV), Japanese encephalitis virus (JEV), Murray Valley encephalitis virus (MVEV), St. Louis encephalitis virus (SLEV), Rocio virus (ROCV), and Usutu virus (USUV) (1–5). In addition to their dependence on avian hosts, all these viruses also utilize vectors from the same genus of mosquitoes, Culex. This genus has species representatives throughout the world; thus, it has the potential to provide suitable vector species in both tropical and temperate regions (2, 6–8). Human cases emerge from an infectious mosquito bite and are considered an effect of spillover from a viremic bird population. The viremia produced in humans is not enough to infect a mosquito, and consequently, humans are dead-end hosts of the virus (2).
SINV is a positive single-stranded RNA virus that has a wide distribution throughout the Old World. It has a genome size of about 11.7 kb and has two open reading frames (ORFs) that encode four nonstructural (nsP1 to -4) and five structural (C, E3, E2, 6K/TF, and E1) proteins, respectively (9). Based on phylogenetic analyses of the partial E2 gene, a total of six genotypes (genotypes I to VI) have been identified (10). SINV genotype I (SINV-I) has been isolated from Europe, Africa, and the Middle East; SINV-II and SINV-VI have been isolated from Australia; SINV-III has been isolated from Southeast Asia; SINV-IV has been isolated from Asia and the Middle East; and SINV-V (also referred to as Whataroa virus) has been isolated from New Zealand (10).
SINV-I is the only genotype that has been associated with outbreaks of human disease, and it was first isolated from mosquitoes in Cairo, Egypt, in 1952 (11). Outbreaks have been reported from South Africa and northern Europe, and the disease goes under several names: Pogosta, Ockelbo disease, Karelian fever, and Sindbis fever. Signs and symptoms include fever, exanthema, arthralgia, myalgia, and cases of arthritis that can remain for years (12,13).
The first cases of Sindbis fever in Fennoscandia were reported from Sweden in the 1960s, and later, this disease was shown to be associated with a strain of SINV-I isolated from mosquitoes in Edsbyn, Sweden (14). Cases of Sindbis fever are sporadically diagnosed in central Sweden (15), and in recent years, there has also been an outbreak of Sindbis fever in northern Sweden (16). Several larger outbreaks have been observed in Finland and in South Africa (17,18). However, as mentioned above, SINV-I has also been isolated from other parts of Africa, Europe, and the Middle East, without any reports of disease outbreaks. Previous studies indicated that SINV-I has been transported with northward-migrating birds, connecting South Africa and Fennoscandia (10). The driving forces behind SINV-I outbreaks and the details of SINV-I movements between continents and regions are still unknown.
To further elucidate the evolutionary history of SINV-I and its dispersal patterns, we have sequenced and analyzed the complete genomes of 36 new strains, in addition to 30 previously sequenced publicly available strains, from various vector and host species, spanning 10 countries and 58 years.
The most parsimonious interpretation is that SINV-I strains circulating in central and northern Europe were introduced from central Africa rather than from South Africa, as previously hypothesized (10). Analysis of clade A, which contains most European strains, shows that the basal position was held by a strain from Nedre Dalälven, Sweden (09_M_358; GenBank accession number MK045245), suggesting that SINV-I was first introduced from central Africa to Sweden, with subsequent circulation there, and then further dispersed from Sweden to other parts of northern, eastern, and central Europe (Fig. 1). This was further confirmed by the phylogenetic tree based on amino acid sequences of SINV-I, which had the same topology (Fig. 2).
Evolutionary rates. A TempEst analysis showed that the whole data set (64 sequences) did not have any temporal structure (correlation coefficient, −0.0904). However, the data set including sequences from clades A and B showed a temporal signal (43 sequences; correlation coefficient, 0.5128) and was subsequently used to estimate the evolutionary rate of clades A and B. Following model testing, the evolutionary history of clades A and B of SINV-I was reconstructed with a strict molecular clock model and a coalescent exponential population demographic model. The evolutionary rate for this data set is 5.45 × 10^−5 substitutions per site per year (95% highest posterior density [HPD], 4.40 × 10^−5 to 6.56 × 10^−5) (Table 1). The estimated time to the most recent common ancestor (tMRCA) of clade A was 93 years ago (95% HPD, 76 to 112 years), indicating that SINV-I was introduced into Sweden in the 1920s. This was then followed by two separate introductions of SINV-I from Sweden into Finland and Germany around the 1960s and the 1970s, respectively, which corresponds to epidemiological data on SINV infection in northern Europe (Fig. 3). Notably, we also estimate that clade A and clade B diverged from their African ancestors more than 300 years ago (95% HPD, 1605 to 1743) and perhaps longer if the data have been impacted by strong purifying selection (19). Hence, prior to the emergence of SINV-I in northern Europe, there were more than 300 years of viral diversity and spread that are still unaccounted for.
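The temporal-signal check described above rests on a regression of root-to-tip genetic distance against sampling date. The following is a minimal sketch of that idea in plain Python, not the TempEst implementation itself; the function name `root_to_tip_regression` and the toy dates and distances are hypothetical and are not values from this study.

```python
from math import sqrt


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, x_intercept, r). Under a clocklike signal the slope
    approximates the rate (substitutions/site/year), the x-intercept
    approximates the date of the root, and r is the correlation coefficient.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    x_intercept = -intercept / slope
    return slope, x_intercept, r


# Hypothetical toy data (sampling year, root-to-tip distance); illustration only.
toy_dates = [1965.0, 1982.0, 1995.0, 2002.0, 2009.0, 2013.0]
toy_distances = [0.0021, 0.0032, 0.0037, 0.0043, 0.0045, 0.0049]

slope, root_date, r = root_to_tip_regression(toy_dates, toy_distances)
print(f"slope={slope:.2e}  r={r:.3f}  estimated root date={root_date:.0f}")
```

A correlation coefficient near zero or negative, as reported for the full 64-sequence data set, would mean that sampling date explains little of the divergence from the root, which is why only the subset with a positive correlation was carried forward to rate estimation.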
Phylogeny of the partial E2 gene. We also performed the same analysis on 163 sequences of the partial E2 alignment (Fig. 4), and the results showed that it could not discriminate any additional within-clade geographical structure compared to the analysis of the complete ORFs. Northern/eastern SINV strains also clustered together in the partial E2 phylogeny. However, it failed to detect the origin of this clade, as both the central African and South African clades were mixed together, and the overall posterior probability support was low.
Genetic diversity of SINV-I. The SINV-I genome was highly conserved, with an average nucleotide similarity of 96.5% (range, 87.9% to 100%). A comparison of nucleotide and amino acid differences showed that the highest degree of observed divergence was found in clade D (Table 1). Within clades A, B, and C, where most strains from regions with reported outbreaks were located, no specific differences could be observed that clearly separated strains isolated from human patients from other strains (Fig. 5). Thus, direct disease association could not be accounted for by any unique amino acid substitution.
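As an illustration of how an average pairwise nucleotide similarity and its range can be computed from an alignment, here is a minimal sketch. It assumes that gapped columns are skipped pairwise, which may differ from the procedure used for the reported values; the strain names and sequences are hypothetical toy data.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def summarize_identity(alignment):
    """Return (mean, minimum, maximum) percent identity over all strain pairs."""
    values = [
        pairwise_identity(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    ]
    return sum(values) / len(values), min(values), max(values)


# Hypothetical toy alignment; not sequences from this study.
toy_alignment = {
    "strain_1": "ATGGCGTACC",
    "strain_2": "ATGGCGTACC",
    "strain_3": "ATGACGTATC",
    "strain_4": "ATG-CGTACT",
}

mean_id, min_id, max_id = summarize_identity(toy_alignment)
print(f"mean={mean_id:.1f}%  range={min_id:.1f}% to {max_id:.1f}%")
```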
Recombination within SINV-I. Two recombination events in the alignment of the complete ORFs were found, showing that the strains NVS_305_Culex_vansomereni_Kenya_2007 (GenBank accession number KY616988) and H7_Culex_pipiens_Germany_2013 (GenBank accession number MG779534) have undergone recombination (Fig. 6). Interestingly, the phylogenies of nsP4 and E1 for NVS_305_Culex_vansomereni_Kenya_2007 were incongruous with those of other genes. Based on nsP4 and E1, this strain clustered in clade A together with the strains from northern Europe, whereas based on the other genes, NVS_305_Culex_vansomereni_Kenya_2007 is a member of clade E. For H7_Culex_pipiens_Germany_2013, incongruences of phylogenies were largely based on structural versus nonstructural protein-coding genes. This strain clustered in clade D on the basis of nonstructural genes and the (structural) E1 gene, whereas the other structural genes (C, E3, E2, and 6K/TF) of this strain were found to be more related to the strains from clade A. This recombinant strain indicates a second introduction of SINV-I to Germany, arriving from the south, which recombined with the strain introduced from the north. The potential breakpoints for these two strains were also confirmed by several methods in RDP4 (Table 2 and Fig. 7).
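The incongruence described above, where different genome regions of one strain resemble different clades, can be illustrated with a simple sliding-window similarity scan. This is a simplified sketch in the spirit of a similarity plot and is not any of the RDP4 methods; the function names, the window size and step, and the toy sequences are hypothetical.

```python
def window_identity(query, reference, start, size):
    """Fraction of identical, ungapped columns in one alignment window."""
    pairs = [
        (q, r)
        for q, r in zip(query[start:start + size], reference[start:start + size])
        if q != "-" and r != "-"
    ]
    if not pairs:
        return 0.0
    return sum(q == r for q, r in pairs) / len(pairs)


def closest_parent_by_window(query, parents, size, step):
    """For each window, report which candidate parent is most similar to the query."""
    calls = []
    for start in range(0, len(query) - size + 1, step):
        scores = {
            name: window_identity(query, seq, start, size)
            for name, seq in parents.items()
        }
        calls.append((start, max(scores, key=scores.get)))
    return calls


def candidate_breakpoints(calls):
    """Window start positions where the closest parent changes."""
    return [
        cur_start
        for (_, prev_name), (cur_start, cur_name) in zip(calls, calls[1:])
        if cur_name != prev_name
    ]


# Hypothetical toy alignment: the query matches one parent in its first half
# and the other parent in its second half.
toy_parents = {
    "clade_A_like": "AAAAAAAAAACCCCCCCCCC",
    "clade_D_like": "GGGGGGGGGGTTTTTTTTTT",
}
toy_query = "AAAAAAAAAATTTTTTTTTT"

calls = closest_parent_by_window(toy_query, toy_parents, size=5, step=5)
print(calls)
print(candidate_breakpoints(calls))
```

A scan of this kind only flags regions worth testing. Statistical support for a breakpoint still has to come from dedicated recombination tests such as those listed in Materials and Methods.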

DISCUSSION
In this study, we have investigated the evolutionary history and dispersal pattern of SINV-I, by phylogenetic analyses of 66 full-genome sequences isolated from regions all throughout its geographical range. The current long-standing hypothesis has been that SINV-I was introduced to northern Europe from South Africa by migratory birds (20–22). This has also been supported by reports of a similar disease occurring in these two regions (12,18). However, our analyses suggest that the most likely origin of all strains isolated in northern Europe is a single introduction of SINV-I into Sweden from central Africa rather than from South Africa. The specific region from which the virus was exported is, however, uncertain, due to the few and geographically limited isolates of SINV available from Africa. In addition, our results indicate that SINV-I strains were further exported from Sweden to Finland, Russia, and Germany, presenting an eastward and southward dispersal, in contrast to the more commonly proposed northward dispersal of pathogens (23).
The intercontinental dispersals of several bird-hosted viruses, such as WNV and USUV, have been linked to the northward movement of migratory birds (1,24). In the present study, there is no indication of multiple introductions of SINV-I to northern Europe; thus, northward transport with migratory birds is likely to be a very rare event. This supports conclusions from previous studies on SINV-I phylogeny and the low SINV-I antibody prevalence in northward-migrating birds (20,25). However, the composition of clade D, containing Middle Eastern and southern/central European strains, implied that southern/central Europe might have had three introductions of SINV-I, two from northern Europe and one from central Africa.
The southward dispersal of SINV-I could be explained by the autumn migration of the main amplifying host species. The enzootic circulation of SINV-I occurs mainly in August, with thrushes of the genus Turdus as the main amplifying host of SINV-I in Sweden (15,21). These Turdus species breed in Sweden and migrate southward in August to October to spend the winter in central and southwestern Europe (26). The SINV-neutralizing antibody prevalence in redwing (Turdus iliacus), song thrush (Turdus philomelos), and fieldfare (Turdus pilaris) sometimes exceeds 70% in Sweden (27), which indicates that a substantial number of migrating thrushes are likely to include viremic individuals dispersing SINV-I to local mosquito populations on their way south.
The first report of Sindbis fever in humans in Fennoscandia was from Sweden in 1967 (28). However, prior to this report, there were already signs indicating the existence of SINV in Europe: specific anti-SINV antibodies were found in human sera in northern Italy and Finland as early as 1965 (29,30). Antibody screenings of 5,000 human serum samples in Finland in 1958 to 1964 and in Austria before 1963 (31), however, reported that no antibodies were present. Our dating analysis showed that SINV-I was introduced into Sweden in the 1920s and spread east and southward on two separate occasions to Finland and Germany in the 1960s and 1970s. This is well in agreement with epidemiological data from Finland but also indicates that SINV-I was circulating undetected in Sweden for many decades before it was reported to cause disease. After the confirmation of the first SINV human case in Sweden, sporadic cases as well as recurrent outbreaks have been reported annually in Sweden (32) and Finland (33,34), as have occasional outbreaks of hundreds or even thousands of human cases.
In Finland, SINV-I infection has been a notifiable disease since 1995, and numbers of reported cases are 10-fold higher than in Sweden, where the awareness of Sindbis fever is considerably lower. Further investigations are needed to clarify if this difference is due to underreporting in Sweden or if it reflects real differences in Sindbis fever occurrence. In both countries, however, the introduction of SINV-I was a rare one-time event, and the virus successfully managed to become established in an enzootic cycle locally, with occasional cases of spillover to humans.
Previous studies have used the E2 gene as a proxy for genotyping SINV-I strains (10). This study confirms that it is a good marker for genotyping; however, it has a limited ability to resolve detailed dispersal patterns. Reconstructing the phylogenies based on separate genes also did not improve the resolution of the phylogeny, despite many more sequences being used in the partial E2 phylogeny. Thus, the best-resolved phylogeny is obtained by using concatenated ORF sequences.
We detected two strains (NVS_305_Culex_vansomereni_Kenya_2007 and H7_Culex_pipiens_Germany_2013) that most likely underwent a recombination event. This is, to our knowledge, the first time that a recombination event has been detected for SINV-I in nature. A previous recombination event between a Sindbis-like virus and Eastern equine encephalomyelitis virus (EEEV) in South America has had a large impact on the alphavirus genus, giving rise to WEEV, Highlands J virus, and Fort Morgan virus (35,36). The ancestral recombinant obtained E1 and E2 proteins from the Sindbis-like virus and the remaining genes from EEEV (35). The three recombinants, and their variants (e.g., Buggy Creek virus, a variant of Fort Morgan virus), are all vector-borne viruses that infect birds in the New World, but only WEEV has a clear association with human disease.
SINV-I, like all alphaviruses, evolves relatively slowly (5), which is indicated by the comparatively low genetic diversity observed within the clades and by the finding that model selection favored the strict clock model in this study. Our results showed few genetic differences within SINV-I, especially within the clade of northern European strains (99.1 to 100% overall similarity). The E2 gene encodes a glycoprotein that is associated with the pathogenicity of SINV infections (37,38). Single substitutions can cause considerable differences in arbovirus vector specificity and pathogenicity, as shown for chikungunya virus (CHIKV) and Zika virus (39,40). However, we could not detect any clear nucleotide, or any amino acid, differences between SINV-I strains from regions with disease and those without disease (Fig. 5). The reasons for the lack of [...] become infected with SINV-I as often as humans do in areas of endemicity of Fennoscandia. Ecological studies have shown that the abundance of the most competent vector of SINV-I, Culex torrentium, is decreasing southward in Europe (43,44). One could speculate that there is a critical abundance of Cx. torrentium that needs to be reached for an efficient-enough transmission of SINV-I in the bird population to allow spillover infections of humans. Likewise, the abundance of the bridge vector Aedes cinereus is likely of importance for the transmission of virus from birds to humans (3). Other [...]
The minor parent is the parent contributing the smaller fraction of sequence. The major parent is the parent contributing the larger fraction of sequence. Breakpoint positions (in base pairs) are relative to E594_Aedes_rossicus_Sweden_2002. NS indicates that no significant P value was recorded for this recombination event using this method.
Phylogenetic Analysis of Sindbis Genotype I Virus. Journal of Virology.
In conclusion, our results suggest that SINV-I was successfully introduced only once into northern Europe from central Africa, probably via migratory birds. This introduction led to the establishment of endemic SINV circulation in Sweden, and from there, SINV-I spread to other parts of northern, eastern, and central Europe.
The emergence of reported human cases is potentially due to synergistic effects, including awareness of the disease in the community, ecological circumstances, and undiscovered host and viral genetic factors.

MATERIALS AND METHODS
Virus strains. SINV strains for the present study are listed in Table 3 and were originally isolated from a number of sources, including mosquitoes, birds, and humans (e.g., see references 10, 20, and 45; J. O. Lundström, J. Hesson, M. Schafer, O. Ostman, T. Semmler, M. Weidmann, and M. Pfeffer, unpublished data). All strains sequenced in this study were propagated in Vero cells as described previously by Hesson et al. (45) or in suckling mice as described previously (46) and stored at −80°C.
RNA extraction and sequencing. Total RNA of the SINV-infected cell culture supernatant was extracted using the QIAamp viral RNA minikit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. cDNA was synthesized using the RevertAid H minus first-strand cDNA synthesis kit (Thermo Scientific, Vilnius, Lithuania) in a 20-µl mixture containing 2 µl of random hexamers (10 pmol/µl), 2 µl of 10 µM deoxynucleoside triphosphate (dNTP), and 5 µl of RNA. The presence of SINV RNA was confirmed by quantitative PCR (qPCR) as described previously (41).
PCRs, using modified oligonucleotide primers that cover the complete genome of SINV (17) (Table 4), were performed using Phusion Flash (Thermo Scientific, Lithuania). PCR amplicons from the same strain were pooled and purified using the QIAquick PCR purification kit (Qiagen, Hilden, Germany). Illumina sequencing libraries were constructed using multiplex PCR products (1 ng of input DNA) and the Nextera XT DNA library prep kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Sequencing was performed on an Illumina MiSeq instrument using MiSeq reagent kit version 2 (Illumina, San Diego, CA, USA). Assembly of the sequence data was done using the CLC genome workbench, using the SINV Babanki strain (GenBank accession number HM147984) as the reference sequence (47). Low-coverage regions were closed by conventional PCR, with primers designed according to known sequences in the flanking regions.
Phylogenetic analysis. Publicly available full-genome sequences of SINV were retrieved from the National Center for Biotechnology Information (NCBI) website (https://www.ncbi.nlm.nih.gov/). All sequences, 66 in total, were then aligned using MAFFT with default settings, followed by manual refinement using AliView (48,49).
Several alignments were constructed based on (i) the complete open reading frames (ORFs), (ii) the single genes (nsP1 to -4, C, E1, E2, E3, and 6K/TF), (iii) complete and partial E2 sequences, and (iv) the complete ORFs excluding recombinant sequences. The best-fit evolutionary nucleotide substitution model following jModelTest analysis, GTR+F+I+G4 (general time-reversible model with empirical base frequencies, allowing for a proportion of invariable sites, and a discrete gamma model with default 4 rate categories), was used in all phylogenetic analyses (50). Potential recombination events were investigated using the Phi test in SPLITS TREE 4.0 (51), Simplot version 3.5.1 (52), and RDP3 (53). Recombination events were determined using RDP, GENECONV, Bootscan, Maxchi, Chimera, SiSscan, Phylpro, LARD, and 3Seq. In addition, alignment 4 was translated into amino acids, and amino acid positions were compared using AliView.
To reconstruct the evolutionary history of the SINV-I complex, Bayesian phylogenetic trees of the complete ORFs and separated genes (nsP1 to -4, C, E3, E2, 6K/TF, and E1) of SINV-I were constructed by employing MrBayes v.3.2.6 (54). Bayesian analysis consisted of at least 5 million Markov chain Monte Carlo (MCMC) generations, sampling every 1,000 generations. The run was continued until convergence was obtained (average standard deviation of split frequencies, <0.01) and with a 25% burn-in. To further infer the evolutionary rates and divergence time of SINV-I, we first performed a regression of root-to-tip genetic distances against date of sampling by using TempEst (55). The whole data set (64 sequences) showed no clocklike structure (correlation coefficient, −0.0904). Thus, only the sequences from clades A and B (43 sequences; correlation coefficient, 0.5128) were used for the evolutionary rate estimation. We employed six different combinations of demographic and molecular clock models (S2) and ran 50 million Bayesian MCMC generations, sampling every 1,000 generations, implemented in BEAST version 2.3.1 (56) (Table 5). Model comparison using two marginal-likelihood estimators, path sampling (PS) and stepping-stone sampling (SS), selected the strict clock with exponential population growth as the best-fitting model, with a log Bayes factor (BF) of at least 25. In all analyses, strain ArB489 from central Africa, isolated in 1985 (GenBank accession number MF409177), was used to root the tree. All computations were run using the CIPRES computational cluster (http://www.phylo.org/index.php/). Finally, trees were viewed and edited using FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
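Two pieces of arithmetic underlie the settings above: how many posterior samples remain after burn-in, and how a log Bayes factor is obtained from two log marginal likelihoods. The sketch below illustrates both. The function names and the log marginal likelihood values are hypothetical, and the sample count ignores the initial state that some programs also record.

```python
def retained_samples(generations, sample_every, burn_in_fraction):
    """Number of posterior samples left after discarding the burn-in."""
    total = generations // sample_every
    discarded = int(total * burn_in_fraction)
    return total - discarded


def log_bayes_factor(log_ml_model_1, log_ml_model_2):
    """Log Bayes factor of model 1 over model 2 from log marginal likelihoods."""
    return log_ml_model_1 - log_ml_model_2


def rank_models(log_marginal_likelihoods):
    """Sort model names from best to worst supported (highest log ML first)."""
    return sorted(
        log_marginal_likelihoods,
        key=log_marginal_likelihoods.get,
        reverse=True,
    )


# Settings stated in the text: 5 million generations, sampled every 1,000,
# with a 25% burn-in.
print(retained_samples(5_000_000, 1_000, 0.25))

# Hypothetical log marginal likelihoods (illustration only; not from Table 5).
toy_log_ml = {
    "strict_clock_exponential": -25010.0,
    "relaxed_clock_constant": -25040.0,
}
best, runner_up = rank_models(toy_log_ml)[:2]
print(best, log_bayes_factor(toy_log_ml[best], toy_log_ml[runner_up]))
```

Working in log space keeps the comparison numerically stable, because marginal likelihoods of genome-scale alignments are far too small to represent directly as floating-point numbers.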
A number of studies have previously used partial E2 sequences for creating SINV phylogenies (10,22,57). To investigate whether phylogenies based on partial E2 sequences give the same results as the phylogenies based on the complete ORF, and to be able to compare more strains, we also constructed phylogenies using all strains sequenced in this study and all available partial E2 sequences from GenBank (170 in total [minimum, 313 nucleotides {nt}; maximum, 2,200 nt]) (10).
Data availability. All newly sequenced strains have been deposited in GenBank under accession numbers MK045224 to MK045259.