Journal of Virology, December 2005, p. 15503-15510, Vol. 79, No. 24
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.24.15503-15510.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Departments of Microbiology and Immunology,1 Pediatrics,2 Obstetrics, Gynecology and Women's Health,3 Epidemiology and Population Health, Albert Einstein Cancer Center, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, New York 10461,4 Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024,5
Received 7 June 2005/ Accepted 12 September 2005
The typical HPV genome contains seven or eight open reading frames (ORFs). The late genes, L1 and L2, encode structural proteins that comprise the viral capsid, while two of the early genes, E1 and E2, mediate viral genome replication (7). The ability of high-risk HPV infection to progress to malignancy is due in large part to the activities of two other early ORFs, E6 and E7. High-risk E6 and E7 are the primary oncoproteins, influencing the activities of the cell cycle proteins p53 and pRB, respectively (6). Many animal PVs and nearly all cutaneous and low-risk HPVs also contain E6 and E7. Though other proteins, such as E4 and E5, are thought to complement their activity by modulating viral late functions (9), high-risk E6 and E7 are important cell cycle regulators, and their expression is often sufficient to immortalize primary human keratinocytes in tissue culture systems (13).
A striking feature of PV genome organization is that the ORFs have remained largely intact through millions of years of evolution extending through avian and mammalian PV lineages (22). As double-stranded DNA viruses, PVs maintain high fidelity in viral genome replication by usurping the host cell proofreading polymerases. Evolution proceeds clonally through mutation and selection or genetic drift, and remarkably, no examples of recombination have been documented.
The overall stability of PV genomes suggests a simple monophyletic model of evolution. Indeed, several traits, including oncogenicity and tissue tropism, appear to have evolved only once (15, 27). Bravo and Alonso (2) have shown that this monophyly breaks down when the phylogenies of early and late genes are compared qualitatively. However, no studies have yet shown the extent of this breakdown in statistical terms. To date, most classification studies have conflated the phylogenetic histories of the papillomavirus genes to serve a taxonomic end. For example, because the PV L1 gene is known to be highly conserved, it has been regarded as the standard for analysis of PV evolution: genera, species, types, subtypes, and variants have been classified by degrees of divergence in L1, and the L1 phylogeny is often taken as representative of the virus (5). In this report, we provide quantitative statistical evidence that phylogenies created from several alpha HPV ORFs are in fact incongruent with the recently proposed taxonomy.
Sequence alignment and phylogenetic analysis. Multiple-sequence alignments were created using CLUSTAL W with gap cost 10.0 and either the Gonnet cost matrix for protein or the IUB matrix for DNA (25). Phylogenies were created using only equally weighted characters from the coding regions E6, E7, E1, E2, L2, and L1. E5 was not used because homologous ORFs were not found throughout this data set. E4, embedded within E2, was also excluded from the analysis; it is treated in depth elsewhere (16). The total-evidence tree shown in Fig. 1 was generated with both protein and nucleotide sequences of all six coding regions (a rationale for using concatenated nucleotide and amino acid sequences is provided in reference 1), while the early- and late-gene trees shown in Fig. 2 contain E6, E7, E1, and E2 nucleotide sequences and L2 and L1 nucleotide sequences, respectively. Parsimony and neighbor-joining trees were generated with PAUP* version 4.10 (24). Before these trees were constructed, alignment gaps were coded as missing. In both the parsimony and distance analyses, 100 bootstrap replicates were performed to assess robustness at each node. Bayesian trees were constructed using the Markov-chain Monte Carlo technique in MRBAYES, sampling every 100th generation in a series of 100,000 cycles (8).
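As an illustration of two of the preprocessing steps described here (gaps coded as missing, and retention of informative sites as noted in the legend to Fig. 1), the following minimal Python sketch identifies parsimony-informative columns in an alignment. It is not the authors' pipeline, which relied on PAUP*; the function names, the set of missing-data symbols, and the toy alignment are assumptions made for illustration. It uses the standard definition that a site is parsimony informative when at least two character states each occur in at least two taxa.

```python
from collections import Counter

# Assumption: gap ("-") and unknown ("?") symbols are both treated as missing data.
MISSING = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two taxa."""
    counts = Counter(ch.upper() for ch in column if ch not in MISSING)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def informative_columns(alignment):
    """Return the 0-based indices of parsimony-informative alignment columns.

    `alignment` maps a taxon name to its aligned sequence (hypothetical input format).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i in range(length)
        if is_parsimony_informative([s[i] for s in seqs])
    ]


# Toy alignment (invented sequences, not HPV data).
toy = {
    "taxonA": "ACGTA-",
    "taxonB": "ACGCA-",
    "taxonC": "ATGCAT",
    "taxonD": "ATGTCT",
}
print(informative_columns(toy))  # [1, 3]
```

In the toy alignment, column 4 varies but is uninformative because one state occurs in a single taxon, and column 5 is uninformative because the gaps are ignored and only one state remains.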
FIG. 1. Total-evidence phylogeny. A phylogenetic tree was inferred from maximum-parsimony, neighbor-joining, and Bayesian methods. The tree shown is from the Bayesian analysis inferred from alignment of protein and nucleotide sequences of six concatenated ORFs (E6, E7, E1, E2, L2, and L1). Numbers on or near branches indicate first the node number and after the colon support indices from methods in the following order: Bayesian credibility value, maximum-parsimony bootstrap percentage, and neighbor-joining bootstrap percentage. Methods that show 100% support are represented with an asterisk. Any conflict between the Bayesian tree and either of the other two methods is indicated by an "N" at the appropriate node. Only informative sites were kept for the analyses. Bovine PV type 1 (BPV1) was used as the outgroup taxon.
FIG. 2. Early- and late-gene phylogenies. Phylogenetic trees were inferred using Bayesian methods. The early tree (A) was calculated from E6, E7, E1, and E2 concatenated nucleotide alignments, while the late tree (B) was derived from combined L2 and L1 nucleotide sequence data. Bayesian credibility values are provided near the appropriate nodes, and alpha papillomavirus group designations are shown on their respective leaf branches. A representative virus was chosen from each of 13 alpha HPV species groups (5).
Localized incongruence. To locate incongruence and to assess the relative degrees of incongruence across different node/partition pairs, we developed a computational algorithm modeled after the localized incongruence length difference (LILD). LILD measures the tree length difference between a parsimony search wherein a node optimal in the total-evidence phylogeny constrains tree topology and a parsimony search where that constraint is lifted (26). For analysis of the genital HPVs, we treated E6, E7, E1, E2, L2, and L1 as separate partitions and calculated the LILD for each of these partitions at every node in the total-evidence phylogeny shown in Fig. 1. For the 59 internal nodes in a phylogeny of 60 PVs, this translates into 354 unique LILD experiments. To determine the nonparametric statistical significance of these LILD calculations, we evaluated the difference relative to a distribution of 100 random partitions created from the original total-evidence data matrix. Constrained and unconstrained tree lengths were therefore calculated for 202 trees at every node for every partition. For the test partition, we employed a search of 100 bootstrap replicates for both the constrained and unconstrained conditions. Random partitions were searched using 20 bootstrap replicates. Coupled to the number of unique node/partition pairs, this analysis was computationally intensive, requiring nearly 1.5 million tree reconstructions. To facilitate the analyses, the LILD pipeline algorithm was adapted to run in parallel. The PERL programs designed for this task employed both Bioperl (23) and PAUP* (24) and are available from the authors.
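The bookkeeping behind these figures can be sketched in a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, the tree lengths passed to them are invented, and the empirical P value shown is one common nonparametric estimate rather than the statistic the authors necessarily used. The workload calculation is one reading of the counts given above (two searches per partition, 100 bootstrap replicates for the test partition and 20 for each of 100 random partitions) that reproduces the reported totals.

```python
def lild(constrained_length, unconstrained_length):
    """Extra parsimony steps a partition needs when the total-evidence node is enforced."""
    return constrained_length - unconstrained_length


def empirical_p(test_value, random_values):
    """Fraction of random-partition LILDs at least as large as the test LILD.

    Assumption: a simple one-sided proportion; the source does not give the formula.
    """
    hits = sum(1 for v in random_values if v >= test_value)
    return hits / len(random_values)


# Counts taken from the text.
internal_nodes = 59          # internal nodes in the 60-taxon total-evidence tree
partitions = 6               # E6, E7, E1, E2, L2, L1
random_partitions = 100
test_bootstraps = 100
random_bootstraps = 20

experiments = internal_nodes * partitions                 # node/partition pairs
trees_per_experiment = 2 * (1 + random_partitions)        # constrained + unconstrained
searches_per_experiment = 2 * (test_bootstraps + random_partitions * random_bootstraps)
total_reconstructions = experiments * searches_per_experiment

print(experiments)            # 354
print(trees_per_experiment)   # 202
print(total_reconstructions)  # 1486800

# Invented tree lengths and null values, for illustration only.
print(lild(1060, 1000))                  # 60
print(empirical_p(5, [0, 1, 2, 5, 9]))   # 0.4
```

Under this reading, each node/partition pair requires 4,200 bootstrap searches, which gives 1,486,800 searches in total, consistent with the "nearly 1.5 million" reported.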
Nucleotide sequence accession numbers. The sequences of two newly sequenced, novel genital HPV genomes were deposited in GenBank under the following accession numbers: HPV106, DQ080082; and candHPV102, DQ080083.
Separate Bayesian trees for the early genes and late genes were computed, and the results are displayed in Fig. 2. Since members (i.e., types) within a given species group (60 to 70% sequence identity) almost always cluster together (data not shown) independent of constituent types chosen to construct the tree, representative viruses were chosen as "type species" as suggested by de Villiers et al. (5). The deep branching patterns of the early and late trees constructed with representative type species are markedly different. Though the tree constructed using early-gene sequences mirrors the total-evidence phylogeny, the late-gene tree is incongruent at several points, including the common ancestor of the high-risk clade (species 5, 6, 7, 9, and 11). The late-gene tree subdivides this clade, splitting groups 9 and 11 from groups 5, 6, and 7. Individual trees were also generated for E6, E7, E1, E2, L2, and L1. In all cases, using the early genes resulted in trees that maintain an oncogenic node, while L1 and L2 trees split this node consistent with two lineages of high-risk types (data not shown).
Visualization of early- and late-gene sequence divergence. The topological incongruence seen between trees generated from the early and late genes was further characterized by performing pairwise open reading frame analyses as shown in representative scans for E6 and L2 in Fig. 3. These graphs display global similarity for each ORF of viruses from each species group compared with each HPV type used to construct the tree shown in Fig. 1. This graphic approach facilitated identification of ORF regions that showed exaggerated divergence from others. We used the total-evidence phylogeny generated by concatenated HPV ORFs to guide the profiles scanned. The difference between early- and late-gene trees suggested that each ORF should show a distinct open reading frame scan; however, only the E6 and L2 scans showed patterns consistent with the incongruence noted between trees generated from the early and late genes. The plateau effect evident in the E6 trace (>70% identity relative to a 60% baseline) is indicative of a deep, shared homology in all high-risk types, highlighting the relationship between E6 function and cancer association. The L2 trace, however, splits this plateau and instead displays a shouldering phenomenon (75% identity relative to a 70% baseline in groups 10, 8, 1, 13, 9, and 11) that unites groups 7, 5, and 6 with groups 4, 15, 3, and 2. This is consistent with the differences in branching patterns observed in the early- and late-gene trees shown in Fig. 2 and suggests the position of incongruence may be around the presumed oncogenic node. While all early genes seem to retain the oncogenic node and both late genes seem to break it, the open reading frame scans suggested that the incongruence is most pronounced for E6 and L2.
FIG. 3. Open reading frame scans. The frame of the phylogenetic tree and its species groups displayed in Fig. 1 are presented alongside plots of percent identity calculated from global pairwise alignments. The behavior of species group 7 viruses differs with respect to E6 (A) and L2 (B). A common shared homology in E6 appears to be broken in L2.
FIG. 4. Sliding-window analysis. The pairwise distances of HPV18 (A) and HPV16 (B) relative to HPVs selected for Fig. 2 are plotted for comparison. The analysis employed a 500-nucleotide window sliding 20 positions at a time. Arrows emphasize positions of interest, indicating two E6/E7 populations of viruses corresponding to high- and low-risk viruses and the L2 N terminus that displays two populations in the HPV18 scan, but not in the HPV16 scan.
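A sliding-window scan of the kind summarized in Fig. 4 can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function names are hypothetical, and the distance used here is the uncorrected proportion of differing sites (p-distance), which is an assumption because the distance measure is not specified in the legend. The defaults mirror the stated 500-nucleotide window and 20-position step.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x.upper() != y.upper():
            differences += 1
    # No comparable sites in this window: report "not a number" rather than guess.
    return differences / compared if compared else float("nan")


def sliding_window(reference, other, window=500, step=20):
    """Return (window start, distance) pairs along two aligned sequences."""
    if len(reference) != len(other):
        raise ValueError("sequences must come from the same alignment")
    results = []
    for start in range(0, len(reference) - window + 1, step):
        end = start + window
        results.append((start, p_distance(reference[start:end], other[start:end])))
    return results


# Toy aligned sequences (invented), scanned with a small window for readability.
ref = "ACGTACGTACGT"
alt = "ACGTACGAACGA"
print(sliding_window(ref, alt, window=4, step=4))
# [(0, 0.0), (4, 0.25), (8, 0.25)]
```

Plotting the distances for one reference type (for example HPV16 or HPV18) against each comparison type, window by window, yields traces of the kind described above, in which regions of divergent history appear as separated populations of curves.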
Figure 5 shows a distribution of those node/partition combinations deemed significant with a P of 0.01 using a nonparametric test (26). The plot displays relative incongruence across ORF/node pairs. Of the 11 statistically significant node/partition pair LILDs, 9 involve either L1 or L2. And remarkably, significant LILD is concentrated at the oncogenic node, corresponding to node 4 in Fig. 1 and 5, the common ancestor of species 9, 11, 7, 5, and 6. Both L2 and L1 show incongruence at this node: the tree length difference for L2 is 60 steps, while that for L1 is 33 steps. The L2 LILD far outstrips any other node/ORF pairs in terms of relative significance, as indicated by arrow 1 in Fig. 5. Node 3 (the common ancestor to the oncogenic types and groups 10, 8, 1, and 13) also shows significant relative incongruence with respect to L2. However, in this case, poor bootstrap support coupled to a diffuse distribution of random partitions suggests a generally unstable node (Fig. 5). Only two of the significant node/ORF pairs involve the early genes, E7 (5/E7) and E6 (6/E6), which is not totally unexpected, given that the total-evidence phylogeny was relatively consistent with the early ORF tree (compare Fig. 1 and Fig. 2A).
FIG. 5. Localized incongruence length difference analysis. The LILD distributions for all node/partition pairs considered significant at a P of 0.01 are shown. Node numbers correspond to those given in Fig. 1. The full distribution (both the test partition and random partitions greater than zero) for each significant pair is given in order to visualize the relative extent of incongruence from each calculation. The node/partition pairs of particular interest are highlighted with arrows: 4/L2 (arrow 1) and 4/L1 (arrow 2).
With the growth of HPV sequence data, a number of groups have attempted phylogenetic classification of novel HPV types (4, 15). These studies all employed L1, primarily because the field's most commonly used primers (e.g., MY09/MY11 and GP5+/GP6+) amplify a fragment in this ORF that was readily available for sequencing, but also because the L1 ORF is highly conserved in PV genomes. The L1 phylogeny has since become the PV standard. de Villiers et al. used the L1 ORF in the most recent classification of PVs, delimiting, for the first time, sequence similarity cutoffs for genus and species designations (5). However, previous phylogenies built from either the short 300-base-pair L1 segments or complete L1 sequences were incongruent with E6 phylogenies (5, 15). Whereas E6 phylogenies cluster high-risk genital HPVs, suggesting a common ancestor (Fig. 1), the L1 trees split this oncogenic clade, grouping species 9 and 11 with 1, 8, and 10, and species 7, 5, and 6 with 4, 15, 3, and 2 (Fig. 2). Bravo and Alonso also highlight the differing topologies of early- and late-gene trees (2).
In light of the connection between clinical manifestations and HPV evolution, the discrepancy in tree topology is an important one and may have significant implications for future studies on the biology of HPV genomes. Though trees from both E6 and L1 have been generated in numerous classification studies thus far, this fundamental incongruence has not been rigorously examined and subjected to statistical analysis. Previous studies aimed to categorize the complete body of PV sequences, concentrating more on the relationships between genus supergroups than on the shifting branching pattern between species.
Whole-genome HPV scanning allows visualization of divergent evolutionary history. The first indication of a conflict in the evolutionary histories of the genital HPV ORFs was in the E6 and L2 open reading frame scans (Fig. 3). The E6 plateau, uniting the high-risk types, overlapped with the L2 shoulder, hinting that specific regions were driving the deep incongruence characteristic of the discordant early- and late-gene phylogenies. Extending the simple similarity measures used in the open reading frame scan to a similarity analysis employing a sliding window reinforced and extended these observations, implicating the amino-terminal half of L2 (Fig. 4). The alpha papillomavirus species groups 7, 5, and 6 are at the intersection of the E6 plateau and the L2 shoulder, appearing to share two distinct phylogenetic histories.
Determining incongruent branches and ORFs used to reconstruct the HPV phylogeny. To pinpoint the branches in the phylogenetic tree that demonstrate incongruence with statistical confidence, we developed a high-volume, high-throughput system for localizing tree length differences between constrained and unconstrained nodes in the total-evidence tree. In effect, the process is an extension of the localized incongruence length difference test (26), designed to dissect the competing influences of component genes across every node of a total-evidence phylogeny. For the alpha HPVs, the results of this analysis are striking (Fig. 5). Incongruence is firmly focused on the oncogenic node, the putative common ancestor for all high-risk (i.e., cancer-associated) genital mucosal types. L2 was particularly significant, implying that the evolutionary histories of the high-risk viruses follow two distinct paths, one conferred by the early genes and another by the late genes. This conclusion is also supported by the oncogenic node incongruence observed in L1. Though its incongruence was not evident in the pairwise similarity measures employed in the open reading frame scans and sliding-window analyses, L1 clearly contributed to the late-gene incongruence.
Localization of incongruence has been used in a study of domain shuffling of steroid receptors. In that study, detection of incongruence length differences localized to a particular node allowed the inference of domain shuffling (26). It has also been used in explaining a genomic island contributing to the diversity of Actinobacillus actinomycetemcomitans (18). In the data presented here for whole HPV genomes, a localized length difference could indicate an early recombination event or strong divergence due to very different selective constraints on particular genes or gene regions. What is clear from these analyses is that no single ORF provides a definitive evolutionary history of the alpha PVs. When genes are pooled, it is possible to resolve the incongruence and favor a single point of origin for high-risk types. However, this convenient resolution blurs the ORFs' competing effects and masks the important evolutionary differences in the paths taken by the early and late genes.
Perhaps the most important observation from these experiments is that the high-risk types are not necessarily monophyletic. From an evolutionary point of view, there appear to be two distinct populations of high-risk PVs: the HPV16-related groups 9 and 11 and the HPV18-related groups 7, 5, and 6. The two groups share homology in their early genes, particularly the oncogenic genes E6 and E7, but diverge with respect to L1 and L2. A number of histological studies comparing infections with HPV16 and HPV18 show that the two viruses differ in terms of their biological niche distribution (e.g., association with squamous or glandular cancer) (3). A progenitor of the oncogenic types related to HPV18 (alpha PV groups 7, 5, and 6) may have adapted characteristics of the L2 and L1 genes of alpha PV groups 4, 15, 3, and 2. Alternatively, the E6 and E7 genes of that same progenitor may have evolved features that mimic the oncogenicity of alpha PV groups 9 and 11, suggesting two distinct evolutionary instances of the high-risk phenotype. Although an ancient recombination event is possible, the lack of evidence for recombination in present-day PVs suggests that natural selection may instead have converged on either a single solution for the late genes or a single solution for oncogenicity.
This work is supported in part by a grant from the National Cancer Institute, National Institutes of Health to R.D.B. (CA78527).