Previous Article | Next Article ![]()
Journal of Virology, June 2005, p. 7014-7023, Vol. 79, No. 11
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.11.7014-7023.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Leiping Fu,1
Rolando Herrero,3
Rob DeSalle,4 and
Robert D. Burk1,2*
Department of Microbiology & Immunology,1 Departments of Pediatrics, Epidemiology & Population Health, and Obstetrics, Gynecology and Women's Health, Albert Einstein Cancer Center, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461,2 Proyecto Epidemiológico Guanacaste, Costa Rican Foundation for Health Sciences, San José, Costa Rica,3 Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 100244
Received 18 October 2004/ Accepted 27 January 2005
|
|
|---|
) ratio (M3 = 0.7965). The E2 gene
had the next-highest
ratio (M3 = 0.5611); however, no
specific codons were under positive selection. These data
indicate that the E6 and E5 ORFs are evolving under positive Darwinian
selection and have done so in a relatively short time period. Whether
response to selective pressure upon the E5 and E6 ORFs
contributes to the biological success of HPV16, its specific biological
niche, and/or its oncogenic potential remains to be
established. |
|
|---|
HPVs are causally involved in the etiology of cervical cancer and its precursor lesions (16, 17, 43, 49). Of the high-risk HPV types associated with cervical cancer, HPV16 is the most prevalent and is found in approximately half of all cancers (7, 49). Numerous variants of HPV16 have been identified in different geographic locations and ethnic groups (35, 42, 53, 68). Although all HPV16 isolates are closely related, previous studies inferred five distinct phylogenetic branches among HPV16 variants: European (E), Asian (As), Asian-American (As-Am), African-1 (Af-1), and African-2 (Af-2), corresponding to the geographic locations from which the samples were obtained (12, 34). Subsequent studies by sequence analyses of theHPV16 variants in other genomic regions (e.g., E6, L2, and L1) expanded and complemented this phylogenetic hypothesis (69).
Although HPV16 variants are an important focus of phylogenetic studies and the molecular variants of E2, L2, L1, the URR, and especially the E6 region have been described in detail previously (28), covariation among different ORFs belonging to the same lineage or isolate have not been studied in great detail. HPV16 variants have demonstrable differences in biological properties in vitro which may be responsible, in part, for differences in pathogenicity, carcinogenic risk, and perhaps immunogenicity (28). Furthermore, HPV16 variants are associated with different cervical cancer risks (6, 32, 65, 67). Although the diversifying selection in the HPV16 E6 and E7 oncogenes has been described recently (20), the evolutionary basis of the entire genome, coevolutionary mechanisms among different HPV16 genes, and their underlying biological significance remain unknown. Obtaining whole-genome sequences representative of the major HPV16 variants allowed us to determine with certainty nucleotide and amino acid sequence changes that are of potential evolutionary importance.
Comparison of synonymous (silent;
dS) and nonsynonymous (amino acid-changing;
dN) substitution rates in protein-coding genes
provides an important means for investigating the forces of molecular
evolution (72). The
nonsynonymous/synonymous rate ratio (
=
dN /dS) measures selective
pressure at the protein level. Nonsynonymous rates
that are significantly higher than synonymous rates are
accepted as evidence for molecular adaptation
(71,
72). This criterion has
been used to identify a number of genes under adaptive molecular
evolution (45).
In this report, the complete nucleotide sequences of and codon variations within HPV16 genomes representing the five major lineages were determined. By examining the ratio of dN to dS substitutions per site, diversifying selection acting on all of the eight protein-coding regions was evaluated. In addition, complete genome sequences for these variants allow us to reconstruct their genealogical relationships with unprecedented precision.
|
|
|---|
Sequence analysis of the URR revealed that HPV16 genomes were distributed into five previously defined lineages of HPV16 (69): As-Am, Af-1, Af-2, E, and As. Clinical information for each sample, together with the GenBank accession numbers of the HPV16 DNA sequences, are listed in Table 1. HPV16 whole genomes were amplified by overlapping PCR (58, 59). Two sets of primers for nested PCR were designed to amplify the entire genome in two overlapping fragments of 4,108 bp and 3,863 bp. Oligonucleotide primer sequences used in these studies are available from the authors. For overlapping PCR, an equal mixture of AmpliTaq Gold DNA polymerase (Perkin-Elmer Applied Biosystems, Foster City, CA) and Platinum Taq High Fidelity DNA polymerase (Invitrogen, Rockville, MD) were utilized as previously described (58, 59). The estimated rate of error using high-fidelity Taq polymerase is approximately 1 x 106 errors per bp (18, 47). PCR products were separated by electrophoresis in agarose gels, stained with ethidium bromide, and visualized under UV illumination. After confirmation of the appropriate product size, each PCR product was purified (QIAGEN gel extraction kit; QIAGEN, Valencia, CA), ligated into the pGEM-T Easy vector (Promega, Madison, WI), and sequenced on an ABI 3700 sequencer in the Einstein Sequencing Facility. Several additional primers were used to obtain supplemental sequences to clarify sequence ambiguities. Isolates within the same lineages were included to evaluate intralineage variation. HPV16 nucleotide and amino acid positions were numbered according to the HPV16R European reference sequence (RS) described and annotated in the HPV Sequence Database (50).
|
View this table: [in a new window] |
TABLE 1. Clinical
information on specimens
|
The computer program MRBAYES v3.0b4 (38) was used for Bayesian tree construction, with 100,000 cycles for the Markov chain Monte Carlo algorithm. The gamma model was set for among-site rate variation and allowed all substitution rates of aligned sequences to be different. Maximum parsimony (MP) and maximum likelihood (ML) trees were calculated by a heuristic search with PAUP* v4.0b10 (57). For maximum parsimony analysis, amino acid and nucleotide sequence data were reduced from 2,475 to 52 and from 7,425 to 124 phylogenetically informative sites, respectively. For maximum likelihood analysis, the computer program MODELTEST v3.06 (54) was used to identify the best evolutionary model; the phylogenetic model HKY+G was selected to provide the likelihood parameters, with a transition-transversion ratio of 1.7220 and gamma distribution shape 0.2316. Both parsimony and likelihood data were bootstrap resampled 1,000 times. HPV31 was set as the out-group taxon (29).
Positive selection estimation.
The
nonsynonymous/synonymous rate ratio (
=
dN /dS) is an indicator of
natural selection, with
values of 1 representing neutral
variation,
values of <1 representing purifying
selection, and
values of >1 representing diversifying
positive selection. Different amino acid sites in a protein are
expected to be under different selective pressures and have different
underlying
ratios
(72,
73). Six codon
substitution models were used to investigate whether positive selection
could be identified within the eight ORFs of HPV16 as follows: M0
(one-ratio), M1 (neutral), M2 (selection), M3 (discrete), M7 (beta),
and M8 (beta and
)
(72). These models view
the codon as the fundamental unit of evolutionary change and
take into account genealogic history when calculating scores. Log
likelihood scores evaluate the quality of the fit of the input data to
the conditions of the model. In these models,
was
estimated for separate classes of codons that are
assumed to have evolved independently of one another.
The six
models used for the
distribution were implemented in the
CODEML program in the PAML package
(70,
72). Maximum parsimony
within PAUP* v4.0b10 (57)
was used for tree reconstruction for each HPV16 ORF. M0 assumes the
same
ratio for all sites. M1 assumes two classes of sites in
the protein characterized by the formulas
0
= 0 (conserved sites) and
1 = 1
(neutral sites). M2 adds a third class of sites with
as a
free parameter, thus allowing for sites with
values of
>1. M3 uses a general discrete distribution with three site
classes, with the proportions (p0,
p1, ..., pK-1)
and the
ratios (
0,
1, ...,
K-1) estimated from the data. M7 assumes a
beta distribution B(p, q), which, depending
on parameters p and q, can take various shapes in the
interval (0, 1). M8 adds an extra class of sites to the beta (M7) model
with the proportion and the
ratio estimated from the data,
thus allowing for sites with
values of >1
(72).
In order to
assess the influence of positive selection on a particular coding
region, a likelihood ratio test (LRT) was used to compare nested models
(51,
72). Twice the log
likelihood difference between two models follows a
2 distribution with 2 degrees of freedom, which is
equal to the difference in the number of parameters estimated between
the models. From these models, three LRTs were constructed, which
compared M1 with M2, M1 with M3, and M7 with M8. When alternative
models (M2, M3, and M8) all detect the same sites with
values
of >1, all three tests taken together can be considered strong
evidence of positive
selection.
|
|
|---|
To reconstruct the origin of present-day HPV16 variants, we constructed phylogenetic trees using multiple algorithms that resulted in three different topologies. The MP trees inferred from the nucleotide sequences of the eight ORFs and URR with MRBAYES and PAUP* both demonstrated two major clades for HPV16 variants (Fig. 1A), consistent with previous analysis (69). The grouping of the European taxa (E, As, and G131) was consistent in the trees generated by all algorithms. However, a different topology was noted in both ML trees based on the nucleotide sequences of the eight ORFs and URR (Fig. 1B) and the MP tree based on the amino acid sequences of the eight ORFs (Fig. 1C) using MRBAYES and PAUP* algorithms, respectively. These data indicate that the non-European taxa (Af-1, Af-2, and As-Am) are a paraphyletic group (i.e., they lack features common to all members of the European taxa). The MP trees based on the concatenated amino acid and nucleotide sequences showed the same topology as that obtained with the amino acid sequences alone (data not shown). The position of the Af-1 genome was ambiguous and the most variable; however, upon close examination of the shared and disparate sequences, it has the characteristics of a true intermediate bridging the evolution of the European and Af-2-As-Am groups.
![]() View larger version (12K): [in a new window] |
FIG. 1. Phylogenetic
trees inferred from the complete genomes of representative HPV16
variants. Trees were generated using multiple algorithms. (A)
Tree created using Bayesian construction and MP based on the nucleotide
sequences of the eight ORFs and the URR. (B) Tree created
using Bayesian construction and ML based on the nucleotide sequences of
the eight ORFs and the URR. (C) Tree created using Bayesian
construction and MP based on the amino acid sequences of the eight
ORFs. The parsimony tree based on the concatenated amino acid and
nucleotide sequences was similar to the tree shown in panel C (data not
shown). The numbers on or near branches are support indices for the
algorithms described for each
tree.
|
|
View this table: [in a new window] |
TABLE 2. Nucleotide sequence variations between the HPV16R European variant RS and each variant genomea
|
Sequence variation within HPV16 coding regions. Nucleotide sequence variations in the eight putative ORFs and their corresponding amino acid sequences were compared with the HPV16R reference sequence (Tables 2 and 3). Measures of variability for each region/ORF of the HPV16 genome are shown in Table 4. The frequencies of nucleotide changes in HPV16 genes varied from 2.0% to 6.3%. However, the proportion of nonsynonymous amino acid variations was highest in E5 and E2, which were followed in that regard by L2, E4, E6, E1, L1, and E7. The absolute ratios of the number of nonsynonymous to synonymous changes were over 1.00 in the E2 and E5 ORFs. This ratio is different from the dN /dS ratio, which is based on the relative rate ratio values.
|
View this table: [in a new window] |
TABLE 3. Amino
acid sequence variations between the HPV16R European variant RS and
each variant genomea
|
|
View this table: [in a new window] |
TABLE 4. Comparison
of nucleotide and amino acid sequence variability within HPV16 genomes
|
Lineage fixation among different HPV16 regions. No evidence of genomic recombination was present in the representative HPV16 variant genomes. Since the genomes are evolving through nucleotide changes and not gross rearrangement or recombination, nucleotide changes in one region (e.g., E6) are highly correlated with and inseparable from changes in other regions (e.g., E1) within genomes from the same lineage. For instance, amino acid changes at E6 (78 [Y]), E1 (78 [E], 168 [S], and 452 [D]), E2 (35 [Q], 203 [D], 208 [A], 254 [N], 271 [V], and 341 [C]), and an additional 11 positions (shown in boldface in Table 3) all segregate together, representing ancestral changes in the non-European taxa. Similarly, amino acid changes at E6 (14 [Q] and 78 [H]), E1 (78 [Q], 168 [C], and 452 [E]), E2 (35 [H], 143 [A], 203 [N], 208 [P], 254 [T], 271 [F], and 341 [W]), and 12 other positions (Table 3, underlined in the RS) indicate ancestral changes in the European lineage. These changes have occurred as the result of lineage fixation akin to linkage disequilibrium in organisms with recombining genomes. Isolates classified by nucleotide variations in E6, the URR, and L1 would have the corresponding base changes in E1, E2, and L2. However, E4, E5, and E7 failed to supply sufficient information for variant classification, especially for the non-European isolates.
Mechanism of HPV16 variation: identification of purifying and diversifying selection.
The nonsynonymous/synonymous rate ratio
(
= dN /dS)
was used to calculate whether positive selection has been a force in
the evolution of HPV16 variants within each ORF. The likelihood
analysis, including parameter estimates for different models, is shown
in Table
5.
|
View this table: [in a new window] |
TABLE 5. Phylogenetic
analysis by ML estimation for HPV16 genes
|
) were calculated, and the model with the highest
log likelihood value was used as the "best" model. In
essentially all ORFs, the M3 (discrete) model was optimal. The
dN /dS ratio is an average of
all sites in an ORF. For instance, using M3 for ORF E6, the average
dN /dS ratio is 0.41. The
majority (94%) of sites are under purifying selection, with
values of less than 1, but 5.6% of sites are under
strong diversifying or Darwinian selection, with
values of
7.3. These sites are HPV16 E6 aa 10, 14, and 83 (see Table
6). The E5 ORF had the highest average
value(0.80), with about
5% of sites (E5 aa 48 and 65; Table
6) under strong
diversifying selection with
values of 10.7 by M3. To further
test whether specific sites were evolving under positive selection, we
used the LRT. Only the E6 and E5 ORFs consistently showed sites under
positive selection (Table
5). Thus, although the E2
ORF had one of the higher average
values (
=
0.56), with about 24% of sites with a dN
/dS value over 1 (
value of 2.32 by M3),
no specific sites could be confirmed to be under strong positive
selection by the LRT. However, the E4 ORF overlaps with the E2 hinge
region that bridges the E2 DNA binding and activation domains. This
results in an increased number of nonsynonymous changes in the hinge
region, elevating the overall dN
/dS value of the E2 (data not shown). The remaining
HPV16 ORFs seem to be highly conserved under purifying selection
pressure, despite the presence of nucleotide and amino acid variation.
Table 6 is a summary of
the likelihood ratio tests, which are more sensitive for detecting
selection at a single amino acid within an ORF than the other
tests(72). |
View this table: [in a new window] |
TABLE 6. Likelihood
ratio tests for positive selection of amino acid sites
|
|
|
|---|
Lineage fixation of genetic polymorphisms within HPV16 genomes. A number of studies have demonstrated an association between HPV16 variants and cervical high-grade disease/cancer (6, 32). Some reports have suggested that for HPV16 variants, both numerical and specific nucleotide alterations relative to the HPV16R reference sequence may affect biological behavior (65, 67, 76). Some epidemiological studies have also shown an association between high-grade lesions and cancer and an HPV16 nucleotide change within E6 at nt 350 (T to G) (1, 75), whereas another study suggested that the latter phenomenon might be population dependent (77). If causal, these associations may be explained by differences (i) in the transcriptional regulation of the virus by different variants, (ii) in the biological activities of the protein encoded by variants (e.g., enhanced transformation abilities of E6/E7), or (iii) in the abilities of the hosts to respond immunologically to specific viral epitopes encoded by variants (15, 31, 64). Alternatively, they may be markers of other linked changes in the variant genome. However, to unequivocally evaluate causal genetic effects, whole-genome analyses are needed. The present study analyzed the full genetic variability of HPV16 isolates representing the five major classes of variants. Most previously reported variants were also detected in our sequenced isolates. This is the first report to analyze taxa within an HPV type in the context of the complete genome.
There was no evidence for recombination between HPV16 variants. A deep branch was determined between non-European and European variants, which can be inferred from linked amino acid variations throughout the HPV16 genome; 21 such positions distributing to six HPV16 ORFs represent a primordial change in the non-European taxa ancestor (in boldface in Table 3). Previous studies suggested that non-European isolates might confer a higher risk of developing high-grade cervical intraepithelial neoplasia (CIN)/cancer compared to so-called prototype-like (European) variants based on a limited set of nucleotide alterations (28). However, the array of common variants likely contains a complex set of biological changes related to a higher risk of cervical cancer. Further immunological and biochemical analyses focusing on these variable sites are needed.
Diversifying selection within HPV16 oncogenes.
In this
study, the nonsynonymous/synonymous rate ratio (
=
dN /dS) was used to determine
whether positive selection impacts the evolution of HPV16 variants
within different ORFs. The average dN
/dS ratio of each HPV16 ORF was less than 1,
indicating that all HPV16 ORFs are under purifying selection pressure.
However, in agreement with a previous report
(20), the E6 oncogene
contains three codon sites that are evolving under the
influence of diversifying selection (HPV16 E6 aa 10, 14, 83). HPV16 E6
aa 27 was also suggested to be under selective pressure
(20), but this site was
not identified in the present study. In addition, we demonstrate that
E5 contains two sites under diversifying selection (HPV16 E5 aa 48 and
65), which suggests that HPV16 E6 and E5 have recently evolved through
positive selection pressure.
Immune selection is likely to be a force driving the evolution of HPV16 genomes. The main immune response to HPV infection is cell mediated. The presentation of viral peptides to T cells in the context of human leukocyte antigen (HLA) class I and II molecules is influenced by genetic polymorphisms of both HPV and HLA. These variations in the immune response to different viral isolates are influenced by the heterogeneity of the host immune response and influence the outcome of HPV infection. Recent suggested that the common E6 variant, L83V, is predominately associated with three distinct HLA class I alleles, namely, B*44, B*51, and B*57, for the E6 epitope stretching from aa 74 to 83(SEYRHYCYSL/V) (1, 11, 75). The immunologic relevance of the HPV16 E6 N-terminal region and variant positions E6 aa 10 and 14 is supported by the demonstration of an endogenously processed HLA A*0201-restricted E6 peptide (KLPQLCTEL; E6 aa 11 to 19) as well as of an overlapping HLA B-7-restricted E6 peptide (RPRKLPQL; E6 aa 8 to 15) in this region (3, 23). In another study, the HPV16 E6 variant R10G was demonstrated to alter a B*07 binding epitope such that it may influence immune recognition by cytotoxic T lymphocytes (23). Certain HLA class I alleles, in concert with specific HPV variants, could be associated with a predisposition for cervical cancer development, whereas others may be protective. The variable positions predicted to be under Darwinian selection (HPV16 E6 aa 10, 14, and 83) might thus be under immune selection.
Additional functional characteristics of the sites under selection may be related to modification of E6 protein-protein interaction domains. A substantial proportion of the changes found in HPV16 E6 were located in the amino half of the protein, which represents the binding region of the cellular protein E6-AP and has a role in both cell-mediated and humoral host immune responses (26, 40, 41). Moreover, a recent report indicates that variations in codons 14 and 27 substantially altered the E6 antigenic index and may affect interactions with E6-AP (66). This provides biochemical support for selection at HPV16 E6 aa 14. In addition, the cellular tumor suppressor protein p53, usually targeted and degraded by HPV E6, is polymorphic in human populations and varies according to latitude and ethnicity (4). A p53 polymorphism at codon 72, exon 4, resulting in either a proline (Pro) or an arginine (Arg), has been identified as a possible risk factor for the development of cervical carcinoma (55). A few studies have suggested that the combination of HPV16 E6 aa L83 and the p53 Arg/Arg genotype might represent an interactive risk factor for cervical cancer (8, 63). Therefore, selection pressure driven by protein-protein polymorphic interactions may facilitate the life cycle and pathogenicity of HPV-associated disease.
HPV16 E5 is a highly hydrophobic membrane-bound protein of 83 amino acids associated with the Golgi apparatus, endoplasmic reticulum, and nuclear membrane in infected cells. Recently, the E5 protein has been suspected of acting to alter levels of cell-cycle regulators and to play a role during the productive/early stage of the HPV16 life cycle (25, 27). Previous studies have indicated that amino acid changes within HPV16 E5 might alter the transforming activity of the protein by affecting the interactions with the epidermal growth factor receptor, the 16-kDa subunit of H+-ATPase and, potentially, other cellular proteins (19, 56). An amino acid change, L48V, was noted between HPV16 E5 amino acids 46 and 50, a stretch of hydrophobic residues potentially representing a transmembrane domain. HPV16 E5 protein also stimulates the nuclear oncogenes c-jun, junB, and c-fos. E5 induction of c-jun is through an activator protein-1 binding site, and E5-activation of c-fos is via nuclear factor-1 (NF-1) binding sites. Therefore, E5 may influence HPV gene expression via the activation of activator protein-1 and NF-1 (13, 14). Since several binding sites of transcription factors (e.g., NF-1) located in the HPV URR are variable, selective pressure driven by the ability of HPV16 E5 protein to active viral gene expression via these transcription factors is possible. Although mutation analyses indicated that the transforming activity of mutant E5 proteins was similar to that of wild-type E5 (37), the oncogenicity of specific HPV variants may vary geographically, possibly due to genetic differences between populations. Since the biochemical and biological functions of HPV16 E5 protein are not well characterized, efforts are still needed to decipher the complicated network existing between E6, E7, E5, and other cellular proteins.
Diversifying selection within the HPV16 E2 gene.
It is
worth noting that the E2 gene might be under selection pressure, since
it has a relatively high
ratio (Table
5). The HPV16 E2 protein
is involved in gene-expression regulation and replication of the HPV
genome and appears to be the target of both humoral and cellular immune
responses (22,
44). Several humoral
epitopes are known in the E2 protein, one in the transactivation domain
(aa 121 to 140), three in the hinge region (aa 181 to 200, 241 to 260,
and 271 to 290), and one in the DNA binding domain (aa 328 to
346) (22). In
this work, all but one (i.e., aa 181 to 200) of these epitopes had
changes in the E2 proteins of non-European variants, suggesting the
potential for different humoral responses to HPVs of the European and
non-European taxa (Table
3). Since the disruption
of the E2 gene during viral integration contributes to tumor
progression, the variant HPV16 E2 proteins of non-European variants
(e.g., the As-Am isolate) may be better adapted for deregulating the
expression of viral oncogenes
(10). Hence, the high
level of diversity among HPV16 E6 and E5 proteins may be a potential
force driving the evolutionary selection of the E2 gene. For instance,
HPV16 E2 variants frequently cosegregated with the E6 T350G (L83V)
variant that has been associated with viral persistence
(10). This cosegregation
was proposed to act as an additional risk factor for the development of
cervical cancer (46,
60). Another reason that
the HPV16 E2 gene contained a relatively high level of nonsynonymous
substitutions might be explained by the evolutionary constraint imposed
upon it from the overlapping gene, HPV16 E4. Accordingly, synonymous
substitutions of the E2 gene might be constrained by the relatively
conserved E4 gene (39,
48). However, no specific
sites were detected by the LRTs. Yang and Swanson
(74) suggest that these
tests are highly prone to sampling bias and that more variants of HPV16
E2 gene should therefore be included in future molecular evolutionary
analyses.
This study provides new data demonstrating lineage fixation of variants in complete HPV16 genomes. Analysis of a representative set of HPV16 genomes identified diversifying sites under selective pressure within the HPV16 E6 and E5 ORFs. These HPV16 proteins are known to have transforming activities. Nevertheless, more effort is needed to examine the evolutionary dynamics of HPV genomes, especially those associated with high-risk HPV types, in order to understand the genetic basis of biological success and niche adaptation that probably indirectly lead to cancer causation.
This work was supported in part by Public Health Service awards CA78527 and CA85178 from the National Cancer Institute.
Present address: National Research Institute for Child Health & Development, 3-35-31 Taishidoh, Setagaya, Tokyo 154-8567, Japan. ![]()
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»