Journal of Virology, March 2009, p. 2697-2707, Vol. 83, No. 6
0022-538X/09/$08.00+0 doi:10.1128/JVI.02152-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
CIRAD, UMR 53 PVBMT CIRAD-Université de la Réunion, Pôle de Protection des Plantes, Ligne Paradis, 97410 Saint Pierre, La Réunion, France,1 Electron Microscope Unit, University of Cape Town, Private Bag, Rondebosch 7701, South Africa,2 School of Biological Science, University of Canterbury, Private Bag 4800, Christchurch, New Zealand,3 Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Observatory 7925, South Africa4
Received 13 October 2008/ Accepted 23 December 2008
Such interspecies recombination is fairly common in many virus families (8, 17, 27, 44, 82). It is becoming clear, however, that as with mutation events, most recombination events between distantly related genomes are maladaptive (5, 13, 38, 50, 63, 80). As genetic distances between parental genomes increase, so too does the probability of fitness defects in their recombinant offspring (16, 51). The viability of recombinants is apparently largely dependent on how severely recombination disrupts coevolved intragenome interaction networks (16, 32, 51). These networks include interacting nucleotide sequences that form secondary structures, sequence-specific protein-DNA interactions, interprotein interactions, and amino acid-amino acid interactions within protein three-dimensional folds.
One virus family in which such interaction networks appear to have a large impact on patterns of natural interspecies recombination is the Geminiviridae, a family of single-stranded DNA (ssDNA) viruses. As with other ssDNA viruses, recombination is very common among the species of this family (62, 84). Partially conserved recombination hot and cold spots have been detected in different genera (39, 81) and are apparently caused by both differential mechanistic predispositions of genome regions to recombination and natural selection disfavoring the survival of recombinants with disrupted intragenome interaction networks (38, 51).
Genome organization and rolling circle replication (RCR), the mechanism by which geminiviruses and many other ssDNA viruses replicate (9, 67, 79; see reference 24 for a review), seem to have a large influence on basal recombination rates in different parts of geminivirus genomes (20, 33, 39, 61, 81). To initiate RCR, virion-strand ssDNA molecules are converted by host-mediated pathways into double-stranded "replicative-form" (RF) DNAs (34, 67). Initiated by a virus-encoded replication-associated protein (Rep) at a well-defined virion-strand replication origin (v-ori), new virion strands are synthesized on the complementary strand of RF DNAs (28, 73, 74) by host DNA polymerases. Virion-strand replication is concomitant with the displacement of old virion strands; once displacement is complete, it yields covalently closed ssDNA molecules that are either encapsidated or converted into additional RF DNAs. Genome-wide basal recombination rates in ssDNA viruses are probably strongly influenced by the specific characteristics of host DNA polymerases that enable RCR. Interruption of RCR has been implicated directly in geminivirus recombination (40) and is most likely responsible for increased basal recombination rates both within genes transcribed in the opposite direction from that of virion-strand replication (40, 71) and at the v-ori (1, 9, 20, 69, 74).
Whereas most ssDNA virus families replicate via either a rolling circle mechanism (the Nanoviridae, Microviridae, and Geminiviridae) (3, 23, 24, 31, 59, 67, 74) or a related rolling hairpin mechanism (the Parvoviridae) (25, 76), among the Circoviridae only the Circovirus genus is known to use RCR (45). Although the Gyrovirus genus (the other member of the Circoviridae) and the anelloviruses (a currently unclassified ssDNA virus group) might also use RCR, it is currently unknown whether they do or not (78). Additionally, some members of the Begomovirus genus of the Geminiviridae either have a second genome component, called DNA-B, or are associated with satellite ssDNA molecules called DNA-1 and DNA-Beta, all of which also replicate by RCR (1, 47, 68).
Recombination is known to occur in the parvoviruses (19, 43, 70), microviruses (66), anelloviruses (40, 46), circoviruses (11, 26, 60), nanoviruses (30), geminivirus DNA-B components, and geminivirus satellite molecules (2, 62). Given that most, if not all, of these ssDNA replicons are evolutionarily related to and share many biological features with the geminiviruses (22, 31, 36), it is of interest to determine whether conserved recombination patterns observed in the geminiviruses (61, 81) are evident in these other groups. To date, no comparative analyses have ever been performed with different ssDNA virus families to identify, for example, possible influences of genome organization on recombination breakpoint distributions found in these viruses.
Here we compare recombination frequencies and recombination breakpoint distributions in most currently described ssDNA viruses and satellite molecules and identify a number of sequence exchange patterns that are broadly conserved across this entire group.
3') was excluded from analyses (because it was largely unalignable) and sequences were linearized at the first codon of the nonstructural protein gene. Microvirus sequences were linearized at position 1469 relative to the sequence of isolate M14428. Sequence alignments were constructed using poa (37) and edited both by eye and using the ClustalW-based (77) alignment tool implemented in Mega4 (75). Highly divergent sequences (i.e., those sharing <60% genome-wide sequence identity to any other sequences in a data set) were discarded. Finally, to ensure that sequences could be aligned properly, data sets were split into groups of sequences all sharing >60% genome-wide sequence identity. The Begomovirus DNA-A/DNA-A-like and Mastrevirus genome sequence alignments analyzed here were described previously (38, 81). Four "population-level" data sets that were used to detect evidence of recombination rate differences within the complementary- and virion-strand geminivirus and circovirus genes were assembled precisely as outlined previously (61).
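The identity-based filtering step described above can be illustrated with a minimal sketch. It is not the procedure actually used to build the data sets: the function names are hypothetical, identity is computed only over alignment columns in which neither sequence has a gap (an assumption), and a sequence is treated as "highly divergent" when it shares less than 60% identity with every other sequence in the data set.

```python
def pairwise_identity(a, b):
    """Fraction of identical sites over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def drop_divergent(alignment, threshold=0.60):
    """Keep sequences sharing >= threshold identity with at least one other sequence."""
    kept = {}
    for name, seq in alignment.items():
        best = max(
            (pairwise_identity(seq, other)
             for other_name, other in alignment.items()
             if other_name != name),
            default=0.0,
        )
        if best >= threshold:
            kept[name] = seq
    return kept
```

For example, `drop_divergent({"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTTTTTTT"})` keeps sequences `a` and `b` and discards `c`.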
Details of all analyzed data sets are given in Table S1 in the supplemental material, and sequence alignments are available upon request from the authors and/or within RDP3 project files provided as supplemental material.
Characterization of individual recombination events. Detection of potential recombinant sequences, identification of likely parental sequences, and localization of recombination breakpoints were carried out with the RDP (48), GENECONV (62), BOOTSCAN (49), MAXCHI (54), CHIMAERA (64), SISCAN (21), LARD (29), and 3SEQ (4) methods implemented in RDP3 (52) (see the RDP project files submitted as supplemental material for full details of program settings). Default settings were used throughout, and only potential recombination events detected by three or more of the above methods coupled with phylogenetic evidence of recombination were considered significant. Our choice of a consensus of three or more methods was determined empirically based on false-positive rates encountered during analyses of the simulated data sets of Posada and Crandall (64). Simultaneously analyzing these data sets with seven methods (all of those mentioned above except for LARD), using a consensus of three or more methods with a Bonferroni-corrected P value cutoff of 0.05, resulted in false-positive rates below one falsely inferred recombination event per 100 data sets analyzed while at the same time ensuring a good degree of analysis power. To achieve maximum analysis power, we minimized the severity of Bonferroni correction during exploratory recombination analyses by either removing from analyzed alignments or masking within them (a setting in RDP3) all but one sequence within groups of sequences sharing >98% genome-wide sequence identity. The exact program settings can be accessed within the RDP project files provided as supplemental material.
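The consensus rule can be sketched as follows. This is an illustration only and does not reproduce RDP3's internal logic: the function names and the dictionary layout are hypothetical, the Bonferroni correction is assumed to divide the 0.05 cutoff by the number of tests performed, and the additional requirement of phylogenetic evidence is not modeled.

```python
def supporting_methods(p_values, n_tests, alpha=0.05):
    """Names of methods whose P value passes a Bonferroni-corrected cutoff.

    p_values: dict mapping a method name to its uncorrected P value
              (None when the method did not detect the event).
    n_tests:  number of tests performed (assumed correction factor).
    """
    cutoff = alpha / n_tests
    return sorted(m for m, p in p_values.items() if p is not None and p < cutoff)


def is_consensus_event(p_values, n_tests, min_methods=3, alpha=0.05):
    """True when at least min_methods methods support the event."""
    return len(supporting_methods(p_values, n_tests, alpha)) >= min_methods
```

With `{"RDP": 1e-6, "GENECONV": 1e-5, "MAXCHI": 1e-4, "SISCAN": 0.2}`, the event is retained when 100 tests are assumed (cutoff 5e-4, three supporting methods) but rejected when 1,000 tests are assumed (cutoff 5e-5, two supporting methods). This shows why reducing the number of near-identical sequences, and hence the severity of the correction, increases power.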
Analysis of genome-wide recombination patterns. Recombination breakpoint density plots and recombinant region count matrices were constructed using RDP3 as described previously (27, 38). The matrices represent the numbers of times that recombinational movements of sequence tracts between genomes separate pairs of nucleotide sites. This representation of detectable recombination events highlights the differential "exchangeability" of sequence tracts between genomes. Whereas highly exchangeable genome regions (i.e., those represented by warm colors in the matrices due to their frequent movement into foreign genetic backgrounds) are expected to be most modular, the less exchangeable regions (i.e., those represented by cool colors due to their infrequent movement into foreign genetic backgrounds) are expected to be the least modular.
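A recombinant region count matrix of this kind can be sketched in a few lines. The code below is a simplified illustration rather than the RDP3 implementation: the function names are hypothetical, each event is assumed to be summarized by a pair of 0-based breakpoint positions delimiting the transferred tract on a circular genome, and a pair of sites counts as "separated" when exactly one of the two sites lies inside the tract.

```python
def tract_mask(start, end, genome_length):
    """Mark sites inside the transferred tract [start, end) on a circular genome."""
    if start <= end:
        return [start <= i < end for i in range(genome_length)]
    # The tract wraps around the arbitrary linearization point.
    return [i >= start or i < end for i in range(genome_length)]


def region_count_matrix(events, genome_length):
    """Count, for every pair of sites, the events that separate the two sites."""
    counts = [[0] * genome_length for _ in range(genome_length)]
    for start, end in events:
        inside = tract_mask(start, end, genome_length)
        for i in range(genome_length):
            for j in range(genome_length):
                if inside[i] != inside[j]:
                    counts[i][j] += 1
    return counts
```

The double loop is quadratic in genome length per event, which is acceptable for a sketch; pairs of sites with low counts correspond to the "cool" regions that tend to be inherited together.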
Recombination hot and cold spot tests. Recombination breakpoint hot and cold spots were identified from breakpoint distribution plots by use of previously described permutation-based linear "local" and "global" tests (27). The statistical significance of potential recombination region hot and cold spots in recombinant region count matrices was tested using a two-dimensional version of the linear local recombination hot and cold spot permutation test of Heath et al. (27). Briefly, this involved the same procedure as the linear test except that rather than plotting breakpoints in permuted and real data sets on a linear genome map, the genomic regions bounded by breakpoints were plotted on recombination region count matrices as described previously (45). The score of each cell in a particular real recombinant region count matrix was ranked relative to corresponding cells recorded in 1,000 permuted matrices to identify cells in the real matrix that had either higher or lower values than 95% or 99% of corresponding cells in the permuted matrices. It should be stressed that the permutation P values were not corrected for multiple testing and that one would, for example, expect a false-positive rate of 5% of the cells in the matrix at a P value cutoff of 0.05. Nevertheless, the test does provide a reasonable quantitative assessment of the least and most transmissible portions of genomes that takes into account the influence that sequence diversity has on the detectability of recombination events (64).
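The ranking step at the heart of this test can be sketched as below. The function name is hypothetical, and the sketch covers only the comparison of one observed cell with the corresponding cells of the permuted matrices; how the permuted matrices are generated (including the adjustment for local sequence diversity) is not shown.

```python
def classify_cell(observed, permuted, level=0.95):
    """Label a cell "hot", "cold", or "neither" relative to permuted values.

    observed: value of the cell in the real matrix.
    permuted: values of the corresponding cell in the permuted matrices.
    level:    fraction of permuted values that must be exceeded (or undercut).
    """
    n = len(permuted)
    higher = sum(1 for p in permuted if observed > p) / n
    lower = sum(1 for p in permuted if observed < p) / n
    if higher >= level:
        return "hot"
    if lower >= level:
        return "cold"
    return "neither"
```

Because every cell is classified independently and no correction for multiple testing is applied, roughly 5% of cells would be expected to be labeled at the 0.95 level by chance alone, as noted in the text.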
Comparison of recombination breakpoint densities between different genome regions. We used another modification of the local permutation test of Heath et al. (27) to specifically test for clustering of recombination breakpoints in different genome regions. In this test, rather than partitioning the alignment with a moving window of set length, the alignment was partitioned in various other ways. For example, to test for significant clustering of breakpoints in the intergenic regions, alignments were partitioned into coding and noncoding regions and tested to determine whether more/fewer breakpoints were detectable in the intergenic regions than could be accounted for by chance. Other similar tests included (i) discounting breakpoints falling outside coding regions and determining whether individual genes contained significantly more/fewer detectable breakpoints than the remainder of the coding regions and (ii) again discounting breakpoints falling outside coding regions and determining whether the middle 50% of all genes collectively contained significantly more/fewer detectable breakpoints than those collectively observed in the beginning 25% and ending 25% of the genes.
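The logic of a partition-based test can be illustrated with a deliberately simplified sketch. It is not the test of Heath et al. (27): breakpoints are placed uniformly at random under the null hypothesis, so the influence of local sequence diversity on breakpoint detectability is ignored, and the function name, the `in_region` predicate, and the default of 1,000 permutations are assumptions made for illustration.

```python
import random


def partition_test(breakpoints, in_region, genome_length, n_perm=1000, seed=0):
    """Compare the observed number of breakpoints in a region with a uniform null.

    breakpoints: observed breakpoint positions (0-based).
    in_region:   function returning True when a position lies in the region
                 of interest (e.g., an intergenic region).
    Returns (observed count, P for "as many or more", P for "as many or fewer").
    """
    rng = random.Random(seed)
    observed = sum(1 for b in breakpoints if in_region(b))
    at_least = 0
    at_most = 0
    for _ in range(n_perm):
        simulated = sum(
            1 for _ in breakpoints if in_region(rng.randrange(genome_length))
        )
        if simulated >= observed:
            at_least += 1
        if simulated <= observed:
            at_most += 1
    return observed, at_least / n_perm, at_most / n_perm
```

The same function can be reused for the other comparisons by changing `in_region` and by first discarding breakpoints that fall outside coding regions, for instance to contrast the middle 50% of genes with their outer quarters.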
Inference of population-scaled mutation and recombination frequencies. Variations in site-to-site composite likelihood estimates of population-scaled recombination frequencies were assessed with the INTERVAL (56) component of ldhat (55). The program settings for these analyses were a precomputed likelihood lookup table for a population-scaled mutation frequency of 0.001, a minimum minor allele frequency cutoff of 0.05 (for data sets containing 20 or more sequences) or 0.01 (for data sets with between 11 and 19 sequences), a block penalty of 10, a starting recombination frequency of 5, and 10^7 Markov chain Monte Carlo updates, with sampling every 2,000 updates and the first 500 samples discarded (56). The average genome-wide recombination frequency estimate obtained after a first run with these parameter settings was then used for a second run, using the same parameters but with the starting recombination frequency replaced by that estimated from the first run. We avoided analysis inaccuracies at the edges of alignments by simulating circular genome sequences as described previously (61). Briefly, this involved constructing tandemly repeated alignments of full genome sequences and then excluding the point recombination frequency estimates for the repeated ends of the alignments.
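The tandem-repeat device for approximating a circular genome can be sketched as follows. The number of copies is not stated here, so three copies with retention of the central one is an assumption, as are the function names and the simplification that estimates are available as one value per alignment column.

```python
def tandem_repeat(alignment, copies=3):
    """Concatenate each aligned sequence with itself `copies` times."""
    return {name: seq * copies for name, seq in alignment.items()}


def central_copy(estimates, genome_length, copies=3):
    """Keep only the estimates that fall within the central genome copy."""
    if len(estimates) != genome_length * copies:
        raise ValueError("expected one estimate per column of the repeated alignment")
    start = (copies // 2) * genome_length
    return estimates[start:start + genome_length]
```

Every site in the central copy is flanked on both sides by the sequence that neighbors it on the circular molecule, so estimates near the arbitrary linearization point are not distorted by an artificial alignment edge.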
FIG. 1. Variable recombination rates across the genomes of maize streak virus (MSV) (61) (a), DNA-A genome components of East African cassava mosaic virus (EACMV) (61) (b), porcine circovirus 2 (PCV-2) (c), porcine circovirus 1 (PCV-1) (d), and DNA-B genome components of East African cassava mosaic virus (EACMV-DNA B) (e and f). The black plots represent average estimates of point recombination rates determined by the reversible-jump Markov chain Monte Carlo (RJMCMC) approach implemented in the INTERVAL component of ldhat (64). Gray regions represent the 95% credibility intervals of point recombination rate estimates from the RJMCMC chain. Genome cartoons above the plots indicate the starting and ending alignment positions of various genes (cp, CP gene; rep, replication-associated protein gene). Vertical lines beneath the genome cartoons indicate the locations of polymorphic sites used for the analysis.
Interspecies and interstrain recombination is common in ssDNA viruses. We used a battery of eight recombination detection methods and a series of manual recombination signal evaluation tools implemented in the program RDP3 (52) to identify and characterize 663 unique recombination events detectable within 27 different ssDNA virus full genome/genome component data sets (see Table 1 for a summary of these events and Table S1 in the supplemental material for details of each individual event, as well as the interactive RDP3 project files provided as supplemental material).
TABLE 1. Summary of recombination signals detectable in ssDNA virus full-genome data sets
A v-ori recombination hot spot is partially conserved across diverse rolling circle replicons. For data sets in which more than 10 recombination events were detectable, we tested for the presence of both recombination breakpoint and recombinant region hot and cold spots. Whereas breakpoint distribution maps were generated and tested as described previously (27), the tracts of sequence exchanged during recombination events were also mapped onto recombinant region count matrices. These matrices describe the relative frequencies with which different parts of the analyzed ssDNA replicons are separated during recombination. Recombinant region hot and cold spots were identified as genome regions that were more or less frequently exchanged, respectively, during recombination than can be accounted for by chance (with the null hypothesis that tracts of sequence are randomly exchanged). Importantly, the permutation tests used for both the recombination breakpoint distribution plots and the recombinant region count matrices account for recombination being inherently easier to detect in more diverse genome regions than it is in less diverse regions.
The most conserved feature of detectable recombination breakpoint distributions was a statistically significant breakpoint cluster at the v-ori's (black arrows in Fig. 2) of circovirus (beak-and-feather disease virus), microvirus, and geminivirus (mastrevirus and begomovirus DNA A, DNA B, and DNA-Beta) genomes/genome components that are known to use RCR. While it has been apparent for some time now that the v-ori is a mechanistically predisposed recombination hot spot in geminiviruses (73, 74) and circoviruses (9), our results indicate that the same is probably true for most other rolling circle replicons. There are two probable reasons for v-ori sequences being recombination hot spots. Firstly, they are the natural points at which recombinational repairs of double-stranded genome breakages are resolved by the joining and nicking activities of Rep proteins (74), and secondly, they are the sites at which unit-length genomes are replicationally released from high-molecular-weight genomic concatemers that are produced by recombination-dependent replication (33, 65).
FIG. 2. Distributions of recombination breakpoints detected within different ssDNA virus data sets. All detectable breakpoint positions are indicated by small vertical lines at the top of the graphs. A 200-nt window was moved along each of the represented alignments 1 nt at a time, and the number of breakpoints detected within the window region was counted and plotted (solid lines). The upper and lower broken lines indicate 99% and 95% confidence thresholds, respectively, for globally significant breakpoint clusters. Light and dark gray areas indicate local 99% and 95% breakpoint clustering thresholds, respectively, taking into account local regional differences in sequence diversity that influence the ability of different recombination detection methods to identify recombination breakpoints. Red areas indicate recombination hot spots, while blue areas represent recombination cold spots. Genes (horizontal arrows) are represented on the top of the graph (cp, CP gene; rep, replication-associated protein gene). The v-ori's of various rolling circle replicons are indicated with vertical black arrows.
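The moving-window counts plotted as solid lines in such a figure can be computed with a short sketch. The function name is hypothetical, positions are 0-based, and the window is assumed to wrap around the end of the alignment because the replicons are circular; the original plots may have handled alignment ends differently.

```python
def window_counts(breakpoints, genome_length, window=200):
    """Number of breakpoints in a window starting at each site (circular genome)."""
    if window > genome_length:
        raise ValueError("window must not exceed the genome length")
    per_site = [0] * genome_length
    for b in breakpoints:
        per_site[b % genome_length] += 1
    # Count for the first window, covering sites [0, window).
    current = sum(per_site[i] for i in range(window))
    counts = []
    for start in range(genome_length):
        counts.append(current)
        # Slide by 1 nt: drop the leftmost site and add the next site on the right.
        current -= per_site[start]
        current += per_site[(start + window) % genome_length]
    return counts
```

The hot and cold spot thresholds in the figure would then come from applying the same function to permuted breakpoint sets and comparing the observed and permuted counts window by window.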
Besides the v-ori hot spot, we found no other obviously conserved recombination breakpoint patterns between ssDNA replicons from different families. We realized, however, that given the small numbers of recombination breakpoints detected in many of the data sets, our analysis lacked sufficient power to find any but the most obvious recombination hot and cold spots.
Despite this, we were encouraged by the fact that careful visual inspection of recombinant region count matrices (Fig. 3, upper triangles) revealed what may have been subtle conserved recombination patterns that were missed by our breakpoint hot/cold spot test. In these matrices, whereas dark/light blue triangles and rectangles correspond with genomic regions that tend to be inherited from the same parental source during recombination events, orange/red triangles and rectangles denote recombinant regions that tend to become separated during recombination events. The lower triangles in Fig. 3 indicate whether individual pairs of sites represented in the upper triangles are separated more (red) or less (blue) often during recombination events than can be accounted for by chance. Note, however, that the statistical test used in these "probability matrices" is not multiple comparison corrected, which means, for example, that for a P value threshold of <0.01 one would expect a 1% false-positive rate per pair of sites analyzed.
FIG. 3. Recombination region count matrices (upper hemimatrices) and recombination region hot/cold spot matrices (lower hemimatrices) for 12 different ssDNA replicon data sets. Unique recombination events were mapped onto the matrices based on their estimated breakpoint positions. In the upper matrices, the shades displayed are a function of the number of times that pairs of nucleotides (plotted on the x and y axes) were separated by observable recombination events. Pairs of sites that appear to be the least and most separable by recombination (taking into account the genome-wide variation in sequence diversity and the impact that this has on recombination detection) are represented on the bottom matrices by blue and red, respectively. The positions of genes (horizontal arrows) are represented between the matrices (cp, CP gene; rep, replication-associated protein gene).
Selection apparently disfavors recombinants with breakpoints in coding regions. To directly test for conserved features of recombination breakpoint distributions that might underlie the patterns we observed in the recombination region count matrices, we modified our recombination breakpoint hot and cold spot test. The original test effectively determined whether the numbers of breakpoints observed in particular small stretches of sequence (in the case of Fig. 2, a moving 200-nucleotide [nt] window) were greater or less than could be accounted for by chance. Given the small numbers of breakpoints detectable in many of the data sets, this test lacked power primarily because the average number of breakpoints per window was low (and often zero), regardless of whether windows were over recombination cold spots or not. To remedy this problem in our new test, rather than partitioning sequence alignments using a moving window, we simply partitioned them into two or three large regions and used the same permutation test to determine whether individual partitions contained more or fewer recombination breakpoints than could be accounted for by chance. Specifically, we compared breakpoint numbers for (i) coding versus noncoding regions, (ii) the middle 50% of genes versus the beginning and end 25% of genes, and (iii) CP genes versus other genes. To further increase the power of our modified test, we merged some of our original 27 data sets and discarded five others in which fewer than eight recombination breakpoints were detectable (Table 2).
TABLE 2. Imbalances in recombination breakpoint locations between different genome regions
While this tendency might be due to recombination breakpoints within genes being less tolerable than those that fall between genes, it is difficult to discount the fact that the tendency might instead be caused by the occurrence within intergenic regions of mechanistically predisposed recombination hot spots such as v-ori's. However, even when intergenic regions were discounted, we detected a tendency for recombination breakpoints to occur within the beginning and ending 25% of genes rather than in the middle 50%. This tendency was significant (P < 0.05) for all but the geminivirus DNA-1, anellovirus (TTV), circovirus (PCV), parvovirus, and dependovirus data sets. Among these exceptions, the anellovirus (TTV), parvovirus, and dependovirus data sets displayed higher densities of recombination breakpoints at the edges of genes than in the middle 50% of genes. Also, when we merged the dependovirus and parvovirus data sets (both are members of the Parvoviridae), the increased clustering of breakpoints at the edges of genes became significant (Table 2).
Collectively, the relatively low abundance of breakpoints within coding regions and the tendency for breakpoints to occur toward the edges of genes are consistent with the hypothesis that breakpoints are less tolerable when they occur within genes (5, 38, 83) because there is a relatively high probability that recombinant proteins will not fold properly (14; see reference 7 for a review).
CP genes contain fewer detectable breakpoints than other genes. While recombination region count matrices (Fig. 3) indicated that the entire CP genes of various ssDNA virus groups tended to be inherited from the same parental source, recombination breakpoint distribution analyses indicated that for data sets containing a CP gene, 8 of 13 (61%) had statistically significant recombination cold spots within these genes. We therefore specifically tested whether CP genes tended to have significantly fewer recombination breakpoints than other genes. Importantly, the test we used discounted breakpoints that fell within intergenic regions and was therefore an unbiased comparison between the genes themselves.
As expected, relative to the other genes, we found significantly fewer (P < 0.05) recombination breakpoints within CP genes for 6 of 10 analyzed data sets (3 of the original 14 data sets did not contain a CP gene, and 1 contained only a CP gene). Among the four data sets where CP genes did not have significantly fewer breakpoints than other genes, the circovirus (PCV) and parvovirus data sets had lower densities of breakpoints within their CP genes than in their Rep genes (the only other gene on these replicons), and in the case of the parvovirus data set, this lower density was marginally significant (P = 0.0932). The other two exceptional data sets, the anellovirus (TTMV) and dependovirus data sets, were both unusual in that related data sets (TTV and parvovirus data sets, respectively) displayed either a significant or marginally significant tendency toward having lower breakpoint numbers within their CP genes.
Decreased recombination breakpoint densities have been noted within the structural protein genes of picornaviruses (27, 42, 72), adenoviruses (41), human immunodeficiency viruses (both capsid and envelope proteins) (17), and hepatitis B viruses (85). This implies either that viral CP genes generally experience low basal recombination rates or that they are generally less tolerant of recombination than most other genes. For geminiviruses, there is good evidence from both experimental and computational analyses that CP genes both experience lower basal recombination rates (33, 61) and have a low degree of recombination tolerance (38, 51). This evidence suggests that whereas recombination breakpoints within the CP gene frequently disrupt protein folding (38), even breakpoints bounding the gene can disrupt proper CP function by interfering with coevolved interactions between the CP and the remainder of the genome (51). In addition to this, recombination might compromise coevolved interactions between different CP molecules and disrupt virus particle assembly. Our results suggest that natural selection acting against disruption of these various CP interactions is operational across all of the ssDNA viruses and is possibly a feature of virus evolution in general.
Shared mechanistic and selective processes probably underlie shared recombination patterns. Although it has been known for some time now that large numbers of recombination events are detectable within the full genome sequences of most ssDNA virus groups (2, 11, 26, 30, 40, 46, 60, 62, 66), we have shown here that these events have occurred in patterns which are broadly conserved across the ssDNA viruses. As has been suggested for the geminiviruses (38, 81), these conserved patterns imply that an interplay between selective and mechanistic processes determines the general distributions of recombination breakpoints detectable in ssDNA viruses.
We have found evidence of what appear to be similar mechanistic predispositions to recombination between the geminiviruses and circoviruses. Most obviously, we and others have found evidence consistent with the hypothesis that the v-ori's of diverse rolling circle replicons are mechanistically predisposed recombination hot spots (9, 73, 74). It also appears likely that complementary-sense genes in both geminiviruses and circoviruses experience increased recombination rates relative to virion-sense genes, possibly due to mechanistic interferences between the transcription and replication complexes during RCR.
Higher mechanistic predispositions to recombination in particular parts of genomes do not, however, necessarily translate into increased numbers of breakpoints detectable in those regions in naturally sampled genomes. If they are to survive, newly produced recombinants must be able to compete productively with their parents. Our results suggest that among the ssDNA replicons we analyzed, natural selection in general tends to (i) penalize breakpoints within coding regions more harshly than it does breakpoints in intergenic regions, (ii) favor recombinants with breakpoints on the edges of genes more than recombinants with breakpoints within the centers of genes, and (iii) strongly disfavor recombinants with breakpoints within CP genes.
The notion that all ssDNA replicons might be experiencing approximately equivalent evolutionary processes is further supported by the recent finding that microviruses, circoviruses, parvoviruses, and geminiviruses (and probably other ssDNA replicons too) are unusual among DNA viruses in that they are subject to nucleotide substitution rates that are as high as those of some RNA viruses (see reference 15 for a review). Although high mechanistic mutation and recombination frequencies do not necessarily translate into high evolution rates, our recombination results emphasize the huge evolutionary potential of these viruses. Whereas in the past this potential no doubt facilitated the dispersal and adaptation of ancestral ssDNA replicons to bacterial, plant, and animal hosts (12, 22), in recent times it has no doubt contributed directly to the emergence of many of these viruses as serious plant and animal pathogens (15, 58, 71, 84).
Published ahead of print on 30 December 2008.
Supplemental material for this article may be found at http://jvi.asm.org/.