Journal of Virology, February 2004, p. 1602-1603, Vol. 78, No. 3
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.3.1602-1603.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
LETTER TO THE EDITOR
Phylogenetic analysis has proven successful for the investigation and prediction of the evolution of viruses such as influenza virus (1). Initial inspection of SARS-CoV sequences revealed a high degree of homogeneity, which might indicate an RNA virus that evolves unusually slowly. To investigate further, we carried out a full-genome alignment of the available SARS-CoV strains recently analyzed by Ruan et al. (4) by use of the CLUSTAL algorithm (6). The alignment was carefully edited by hand to maximize the number of identities, and the site positions containing gaps were removed. The resulting alignment (available from the authors upon request) is 21,333 nucleotides long; 63 sites have at least one sequence with a different nucleotide, and only 10 sites are phylogenetically informative, i.e., they are useful to discriminate among different tree topologies, according to the unweighted parsimony criterion. Subalignments were generated for all of the known coding regions, most of which were identical among the different isolates. We analyzed open reading frame (ORF) 1ab (4), which appears to be the most variable. Maximum likelihood (ML) methods were employed for the analyses because they allow for the testing of different phylogenetic hypotheses by calculating the probability of a given model of evolution generating the observed data and by comparing the probabilities of nested models by the likelihood ratio test (5). In addition, because only 10 sequences were retained after excluding the identical ones, it was possible to search for the optimal ML tree through an exhaustive or branch-and-bound search (5).
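The site counts described above (gap removal, variable sites, parsimony-informative sites) can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names and the four-sequence toy alignment are hypothetical, and the informative-site rule used is the standard unweighted parsimony one (at least two character states, each present in at least two sequences).

```python
from collections import Counter

GAP_CHARS = set("-?")  # assumption: gaps/missing data are coded as '-' or '?'


def strip_gapped_columns(seqs):
    """Drop every alignment column in which at least one sequence has a gap."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    kept = [col for col in zip(*seqs) if not any(c in GAP_CHARS for c in col)]
    if not kept:
        return ["" for _ in seqs]
    # Transpose the kept columns back into row-wise sequences.
    return ["".join(chars) for chars in zip(*kept)]


def classify_sites(seqs):
    """Return (variable, parsimony_informative) column counts."""
    variable = 0
    informative = 0
    for col in zip(*seqs):
        counts = Counter(c.upper() for c in col)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy alignment, for illustration only.
toy = [
    "ACGTA-",
    "ACGTAC",
    "ATGCAC",
    "ATGCTC",
]

if __name__ == "__main__":
    cleaned = strip_gapped_columns(toy)
    n_variable, n_informative = classify_sites(cleaned)
    print(len(cleaned[0]), n_variable, n_informative)
```

In the toy alignment the last column is discarded because of the gap; of the remaining five columns, three vary and two of those are informative, since the fifth column differs in a single sequence only.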
Table 1 shows the average base composition and the ML estimates of parameters describing the mode of evolution of SARS-CoV in ORF 1ab. The α parameter of the Γ distribution is extremely low (0.008), implying an extensive heterogeneity in the rate at which different nucleotide sites mutate along the genome. Moreover, the ML estimator implies that about 90% of the constant sites in the sequences are indeed invariable, i.e., they never change, possibly because of strong purifying selection. The variable sites, on the other hand, accumulate mutations very quickly. However, a note of caution is necessary because such a result may also be due to the small number of sequences available for analysis and the very short observation period. Table 1 also shows that the hypothesis of a molecular clock cannot be rejected, although the P value is very close to 0.05; i.e., SARS-CoV isolates appear to be evolving at a constant evolutionary rate, which can be estimated from the ML tree with clock-like branch lengths shown in Fig. 1. The branch lengths in the tree are proportional to the number of mutations accumulated by each viral lineage during evolution from the cenancestor, the most recent common ancestor. Assuming that the SARS-CoV cenancestor entered the human population 4 to 8 months ago (7), the evolutionary rate of the virus is of the order of 4 × 10⁻⁴ nucleotide changes per site per year (95% confidence interval [CI], 2.0 × 10⁻⁴ to 6 × 10⁻⁴) along the entire ORF 1ab. When only the variable sites are considered, the estimated rate is noticeably higher: 3.5 × 10⁻³ changes per site per year (95% CI, 2.6 × 10⁻³ to 4.4 × 10⁻³). This is the usual range for an RNA virus. Therefore, on average, eight point mutations are expected for the entire ORF 1ab region at each replication round. However, we cannot exclude the possibility that the sequence variability in the data sets is also affected by the passage of the virus in Vero cell culture before sequencing (4).
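Under a molecular clock, a rate of this kind is the root-to-tip distance of the tree divided by the time elapsed since the cenancestor. The Python sketch below shows only that arithmetic. The root-to-tip distance of 2 × 10⁻⁴ substitutions per site is a hypothetical placeholder, chosen so that a 6-month midpoint gives a rate of the reported order of magnitude; it is not a value read from Fig. 1 or Table 1, and the function names are illustrative.

```python
def rate_per_site_per_year(root_to_tip, months_since_ancestor):
    """Clock rate = distance from root to tips / elapsed time in years."""
    if months_since_ancestor <= 0:
        raise ValueError("elapsed time must be positive")
    return root_to_tip / (months_since_ancestor / 12.0)


def rate_range(root_to_tip, min_months=4.0, max_months=8.0):
    """Return (slowest, fastest) rate over the assumed 4-to-8-month window."""
    slowest = rate_per_site_per_year(root_to_tip, max_months)
    fastest = rate_per_site_per_year(root_to_tip, min_months)
    return slowest, fastest


# HYPOTHETICAL root-to-tip distance (substitutions per site), illustration only.
ROOT_TO_TIP = 2.0e-4

if __name__ == "__main__":
    midpoint = rate_per_site_per_year(ROOT_TO_TIP, 6.0)
    low, high = rate_range(ROOT_TO_TIP)
    print(f"midpoint: {midpoint:.1e}, range: {low:.1e} to {high:.1e}")
```

The sketch makes the dependence on the dating assumption explicit: halving the assumed time since the cenancestor doubles the estimated rate, so the uncertainty in the 4-to-8-month window carries over directly to the rate.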
Figure 1 also shows that the root of the tree, inferred by ML, is between the strains isolated from Hong Kong and Beijing, which are known to be epidemiologically linked to the strains isolated from patients in Guangdong Province and all the others (4). Epidemiological data also indicate that the index patient traveled from Guangdong to Hotel M in Hong Kong, where he transmitted the virus to several individuals who successively traveled to Singapore, Canada, and Vietnam (4). The tree shows, indeed, that the Singapore isolate and the isolates from Beijing belong to different, statistically supported clusters. However, because of the low phylogenetic signal, further classification of SARS-CoV isolates is not possible by phylogenetic analysis. All analyses confirm that SARS-CoV is not closely related to any known coronavirus (4), although it is assumed that the source must be one or more unidentified animal reservoirs in Asia.
TABLE 1. Maximum likelihood estimators of nucleotide substitution model parameters for the SARS virus in ORF 1ab polyprotein
FIG. 1. Optimal ML tree of SARS-CoV ORF 1ab nucleotide sequences. Branch lengths are drawn proportional to the number of nucleotide changes per site and were estimated via ML enforcing a molecular clock and employing the HKY85+Γ+I nucleotide substitution model (Table 1). The numbers on the branches represent the percentages of bootstrap-jackknife support (1,000 replicates) for the subtending clade. The P value for the zero-branch-length test (7) is also given.
Marco Salemi
Walter M. Fitch
Department of Ecology and Evolutionary Biology
University of California, Irvine
Irvine, California
Massimo Ciccozzi*
Martha J. Lewis
* Phone: 0039 0649902337, Fax: 0039 0649387210, E-mail: ciccozzi{at}iss.it