Journal of Virology, July 2004, p. 7131-7137, Vol. 78, No. 13
0022-538X/04/$08.00+0 DOI: 10.1128/JVI.78.13.7131-7137.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Lindell Bromham,2 Megan Woolfit,2 Gwenaël Piganeau,2 Judy Tellam,1 Geoff Connolly,1 Natasha Webb,1 Leith Poulsen,1 Leanne Cooper,1 Scott R. Burrows,1 Denis J. Moss,1 Sofia M. Haryana,3 Mun Ng,4 John M. Nicholls,4 and Rajiv Khanna1*
Tumour Immunology Laboratory, Division of Infectious Diseases and Immunology, Queensland Institute of Medical Research, The Bancroft Centre, and Joint Oncology Program, Department of Molecular and Cellular Pathology, University of Queensland, Brisbane, Australia,1 Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, United Kingdom,2 Department of Pathology, University of Hong Kong, Hong Kong SAR, People's Republic of China,3 Faculty of Medicine, Gadjah Mada University, Yogyakarta, Indonesia4
Received 2 December 2003/ Accepted 12 March 2004
Each of the EBV-associated malignancies is characterized by a unique viral and cellular phenotype. In most EBV-associated malignancies, viral gene expression is restricted to a limited number of proteins. One such protein that has received considerable attention is latent membrane antigen 1 (LMP1), which is expressed in all EBV-associated pathological diseases except Burkitt's lymphoma. This protein is of particular interest because it has been recognized as one of the most crucial latent proteins for EBV-mediated transformation of normal B cells and is uniquely able to induce malignant outgrowth and hyperplasia in transgenic mice (17). Previous studies have shown that LMP1 acts as a constitutively active receptor-like molecule independent of the binding of a ligand. The transmembrane domains mediate oligomerization of LMP1 molecules in the plasma membrane, a prerequisite for LMP1 function. The C terminus of LMP1 initiates signaling through C-terminal activator regions (referred to as CTAR) which are involved in the induction of NF-κB, STAT, and Janus kinase 3 (JAK3) (11, 16, 22).
Over the last few years, there has been increasing evidence that specific amino acid changes within the various functional domains of LMP1 may alter its oncogenic potential (3, 4, 26). Particular attention has been focused on a 10-amino-acid region (amino acids 343 to 352) of LMP1 adjacent to the CTAR2 domain which is frequently deleted from EBV isolates from regions of high NPC endemicity (18, 21). However, the clinical significance of this variant form of LMP1 has been challenged because the aggressively oncogenic phenotype of the variant form of LMP1 does not map to these 10 amino acids (13, 21). An alternative approach is to employ viral phylogenetics at the population level to understand the focal distribution of NPC in southeast Asia. This analysis revealed that LMP1 sequences from EBV isolates from regions where NPC is endemic have evolved under significant selection pressure and also as a distinct lineage with almost negligible exchange of isolates with other regions of the world.
PCR and DNA sequencing of EBV gene fragments. Specific oligonucleotide primers flanking three different regions of the LMP1 sequence (N terminus, 219 bp; transmembrane, 270 bp; and C terminus, 478 bp) were selected for PCR amplification. The resulting PCR products were purified with QIAquick spin columns (Qiagen Inc., Chatsworth, Calif.) and sequenced in both directions with a Prism ready reaction dideoxy terminator cycle sequencing kit (Applied Biosystems Inc., Foster City, Calif.) following the manufacturer's protocol.
Molecular phylogenies.
Alignments of sequences from three regions of LMP1 (N terminus, transmembrane domains, and C terminus) were made with the sequence editor Se-Al (24). In addition to these three alignments, a concatenated alignment of all three regions was created. We used maximum likelihood to reconstruct the phylogeny of the nucleotide sequences of the EBV isolates, for each gene region and for the concatenated alignment. Although it requires more computational time than simpler methods such as parsimony or neighbor joining, maximum likelihood is more generally accurate and robust than other phylogenetic inference methods (10). We used the HKY+Γ model, which allows for variation in substitution rates across sites, base frequency bias, and transition-transversion bias (7, 31). We estimated the parameters of the model from the data with maximum likelihood. All maximum-likelihood phylogenies were estimated with the phylogenetic inference package PAUP* (28).
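To make the substitution model concrete, the following minimal sketch builds the instantaneous rate matrix of the HKY model from a set of base frequencies and a transition-transversion ratio (called kappa here). The function names, the base frequencies, and the value of kappa are assumptions chosen for illustration and are not estimates from this study. The gamma-distributed rate variation across sites is not shown, and the matrix is left unnormalized.

```python
# Minimal sketch of an unnormalized HKY rate matrix; all values are illustrative.
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (x in PURINES) == (y in PURINES)


def hky_rate_matrix(freqs, kappa):
    """Return Q as nested dicts: Q[x][y] = kappa * pi_y for transitions, pi_y otherwise."""
    q = {}
    for x in BASES:
        row = {}
        for y in BASES:
            if x != y:
                row[y] = (kappa if is_transition(x, y) else 1.0) * freqs[y]
        row[x] = -sum(row.values())  # diagonal entry makes each row sum to zero
        q[x] = row
    return q


freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical base frequencies
Q = hky_rate_matrix(freqs, kappa=2.0)             # hypothetical kappa
```

With kappa greater than one, transitions such as A to G receive a higher rate than transversions such as A to C, and unequal base frequencies bias the target base of each change.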
Analysis of selection across codons.
The rate of accumulation of synonymous substitutions, which change the DNA sequence of a gene but do not change the translated amino acid sequence, is expected to be governed primarily by the mutation rate. The rate of nonsynonymous substitutions, which change the amino acid sequence, may be influenced by selection. Therefore, the ratio of the rates of nonsynonymous and synonymous substitutions (signified by ω) can be used as a test of the action of selection on the evolution of a protein-coding gene. Codons under no selection would be expected to have no restriction on changing the amino acid and so should accumulate nonsynonymous and synonymous substitutions at the same rate, so the ratio will be one (ω = 1). Codons under negative selection, so that some amino acid changes have lower fitness than the existing amino acid, will accumulate nonsynonymous changes at a slower rate, so the ratio will be less than one (ω < 1). Codons under such strong negative selection that no amino acid changes are permitted would have a ratio of zero (ω = 0). Codons under positive selection, so that favorable changes in amino acids are promoted by selection, will have a greater rate of nonsynonymous substitutions than synonymous, so the ratio will be greater than one (ω > 1).
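The cases described above can be summarized compactly. The symbols $d_N$ and $d_S$ for the nonsynonymous and synonymous substitution rates are introduced here only for illustration; the text itself refers to their ratio.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega = 0     & \text{complete constraint (no amino acid change permitted)} \\
0 < \omega < 1 & \text{negative selection} \\
\omega = 1     & \text{no selection} \\
\omega > 1     & \text{positive selection}
\end{cases}
```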
In order to test for codons with ω > 1, we compared the likelihoods of a nested series of models estimated by the program PAML 3.13, using a likelihood ratio test (5, 10). For this test, the estimated likelihood of the data given the first (simpler) model is compared to the likelihood given the second (more parameter-rich) model. If the second model has a significantly higher likelihood, the simpler model is rejected in favor of the more parameter-rich model. The test statistic for this comparison, twice the difference of the log-likelihoods of the two models, was compared to a chi-squared distribution (19a).
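The likelihood ratio test described above can be sketched in a few lines. This is a generic illustration using only the Python standard library, not the procedure as run in PAML. The log-likelihood values and the degrees of freedom in the example are hypothetical, and the degrees of freedom are assumed to equal the number of additional free parameters in the richer model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum with half-integer gamma functions
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(lnl_simple, lnl_complex, df):
    """Return (statistic, p_value) for two nested models given their log-likelihoods."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a simpler and a more parameter-rich model
stat, p = likelihood_ratio_test(lnl_simple=-1250.0, lnl_complex=-1245.0, df=2)
```

A small p value indicates that the simpler model is rejected in favor of the more parameter-rich model at the chosen significance level.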
First, it is necessary to test whether ω varies significantly across codons. If codons do not vary significantly in the ratio of nonsynonymous and synonymous rates, then there is no evidence that the rate at some sites is being determined by selection. However, if we can show that a multiple-ratio model (M1; codons differ in value of ω) is a significantly better fit to the data than the single-ratio model (M0; all codons have the same value of ω), then we can assume that different codons are being differently affected by selection. To determine whether the difference in the ratio of rates is due to positive or negative selection, two more nested pairs of models must be tested.
Under the two-ratio "neutral" model (M1), codons are either completely free to accept nonsynonymous changes (ω = 1) or completely constrained so that no change of amino acid is permitted (ω = 0). This two-ratio model only allows codons that are neutral or constrained, so in order to demonstrate that some codons are under selection, it must be shown that this model is inadequate to describe the data and that an additional rate category must be added (the three-ratio model, M3) (35).
The existence of a third category of sites that are neither completely neutral (ω = 1) nor completely constrained (ω = 0) suggests that some codons are under selection, either negative (some but not all amino acid changes allowed; ω < 1) or positive (some amino acid changes are promoted by selection; ω > 1). Since we are specifically interested in sites under positive selection, a further test is needed. We need to show that a model (M7) that allows codons to be constrained (ω = 0), neutral (ω = 1), or under negative selection (ω < 1) is not adequate to describe the data. Only after model M7 is rejected in favor of the M8 model, which allows for codons under positive selection (ω > 1), can we conclude that some codons in the sequence are under positive selection (34). Codons under positive selection can then be identified with an empirical Bayesian approach (35) and mapped onto the alignment.
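The empirical Bayesian step can be illustrated with a generic application of Bayes' rule to site classes. This is only a sketch of the idea and not the PAML implementation. The function name, the three classes, the class proportions, and the per-class likelihoods are all hypothetical values chosen for illustration.

```python
def site_class_posteriors(priors, site_likelihoods):
    """Posterior probability that one codon belongs to each omega class (Bayes' rule)."""
    weighted = [p * l for p, l in zip(priors, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical classes: constrained, neutral, positively selected
priors = [0.7, 0.2, 0.1]          # illustrative estimated class proportions
likelihoods = [1e-5, 2e-5, 1e-4]  # illustrative likelihood of one codon under each class
post = site_class_posteriors(priors, likelihoods)
```

In this toy case the posterior probability for the positively selected class is 10/21, or about 48%, so the codon would not pass a 50% cutoff for being reported as positively selected.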
Analysis of selection across lineages.
In order to test whether LMP1 sequences in the southeast Asian region show different patterns of molecular evolution than isolates from other regions, we took the maximum-likelihood phylogeny for each of the three LMP1 sequences and identified each edge of the tree as southeast Asian or non-southeast Asian. An edge of the tree is the lineage connecting two nodes (branching points) or between a node and a tip. "Southeast Asian edges" were those tips leading to southeast Asian isolates plus internal edges if the majority of edges attached to them were southeast Asian. The approach taken is similar to that described above for comparing nested sets of models to determine how many parameters are needed to describe the data. We used the codeml program in PAML 3.1 (33) to test a single-ratio model, where all lineages have the same ratio of nonsynonymous and synonymous rates (ω), against a model which allowed two different ratios, one for southeast Asian edges and one for non-southeast Asian edges (32). We then tested whether the NPC lineages had a different ω than non-NPC lineages within the southeast Asian clade by comparing a one-rate model (all lineages have the same ω) to a two-ratio model (NPC lineages have a different value of ω from non-NPC lineages).
Analysis of recombination.
To determine whether there was any evidence for recombination within the LMP1 sequences, we conducted four different tests for recombination on each of the alignments (1). Geneconv (S. A. Sawyer, 1999, http://www.math.wustl.edu/~sawyer) and maximum chi-square (27) tests were used to determine if there was any significant clustering of substitutions between each pair of sequences. The decrease in linkage disequilibrium between segregating alleles, measured as r², with distance was calculated as described previously (2). Finally, we also conducted a maximum-likelihood permutation test (20), which is designed to estimate the rate of recombination and assesses the significance of the estimate by randomly shuffling the sites.
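As an illustration of the linkage disequilibrium measure mentioned above, the following sketch computes r² between two biallelic sites of an alignment using the standard definition based on allele and haplotype frequencies. The function name and the toy sequences are hypothetical; this is not the software used in the study.

```python
def r_squared(sequences, i, j):
    """r^2 between two biallelic sites i and j across aligned sequences."""
    a_ref, b_ref = sequences[0][i], sequences[0][j]  # arbitrary reference alleles
    n = len(sequences)
    p_a = sum(s[i] == a_ref for s in sequences) / n
    p_b = sum(s[j] == b_ref for s in sequences) / n
    p_ab = sum(s[i] == a_ref and s[j] == b_ref for s in sequences) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


linked = ["AC", "AC", "GT", "GT"]    # toy haplotypes, sites fully associated
unlinked = ["AC", "AT", "GC", "GT"]  # toy haplotypes, sites independent
```

Calling `r_squared(linked, 0, 1)` gives 1 and `r_squared(unlinked, 0, 1)` gives 0. Under recombination, r² between pairs of sites is expected to decline as the distance between them increases, which is the pattern the test looks for.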
FIG. 1. Maximum-likelihood phylogenies of EBV isolates for three regions of the LMP1 gene: (a) N terminus, (b) transmembrane, and (c) C terminus. (d) Concatenated alignment of isolates for which all three regions were sequenced. See the text for details of phylogeny estimation. Each tree shows distinct clustering of isolates from different geographical regions: southeast Asia, Africa, Australia, and Papua New Guinea (PNG). Note that these trees are unrooted and so cannot be used to determine which regional group is basal. Isolates from NPC patients are shown in grey shading.
The tree based on LMP1 sequences shows very low resolution between the isolates within each region (particularly southeast Asia), as shown by the short branches within these groups. This suggests either that the LMP1 sequences sampled have a relatively recent common origin or that the rate of evolution of these sequences has been relatively low. Furthermore, because of the shallow divergences between the southeast Asian isolates, it is difficult to determine whether the LMP1 sequences from the NPC patients form a distinct clade within the region (which would indicate a common origin of NPC strains). Thus, the current analyses show no evidence that NPC lineages display a distinct origin within the southeast Asian clade (Fig. 2), and on the basis of the LMP1 sequences reported here, we cannot identify a unique strain of EBV that is responsible for NPC. However, these data do not rule out the possibility that unique sequence motifs within the southeast Asian strains predispose this population to a higher risk of NPC.
FIG. 2. Maximum-likelihood phylogeny of LMP1 sequences from the southeast Asian isolates. All three gene regions (C terminus, transmembrane, and N terminus) were included in the alignment on which this tree is based, although not all three sequences were available for all isolates. Isolates from NPC patients are shown with grey shading.
We compared the ratio of nonsynonymous to synonymous substitution rates (ω) between isolates from distinct geographic regions for the three different regions of LMP1 and for the concatenated alignment of all three gene regions. In order to demonstrate that some codons in a sequence are under positive selection pressure, it must be shown that there is an excess of nonsynonymous substitutions (ω > 1); that is, that amino acid substitutions have occurred more frequently than would be expected by chance. The procedure established by Yang and colleagues (35) allows testing of a series of nested models of substitution against the data, at each step asking whether the data reject a simpler model in favor of a more complex model.
We found that ω was higher in the southeast Asian lineages than in the non-southeast Asian lineages for all three regions of the LMP1 sequence (Table 1). However, these differences were not statistically significant, potentially as a result of low statistical power due to relatively short sequences. To increase the power of the test by increasing the sequence length, we repeated the test for a concatenated alignment of all three regions (987 nucleotides) for a subset of 12 isolates (six southeast Asian and six non-southeast Asian). For the concatenated alignment, the southeast Asian isolates showed a significantly higher ω, suggesting that they were under different selection pressure than non-southeast Asian lineages (Table 1).
TABLE 1. Comparisons of the ratios of nonsynonymous to synonymous substitution rates (ω) across lineages, for three regions of the LMP1 gene and for the concatenated alignment of all three gene regions
than non-NPC southeast Asian lineages for the C-terminal and N-terminal regions but not for the transmembrane region; however, none of these differences were statistically significant (Table 1). We also tested the concatenated alignment of all three regions of the LMP1 gene for six isolates, comparing ω in the NPC and non-NPC southeast Asian isolates. For both of these groups of isolates, ω = ∞, indicating that in this alignment there were nonsynonymous substitutions but no synonymous substitutions, and so the ω values for the NPC and non-NPC lineages cannot be statistically compared. It should be noted that none of the models in the analysis allows for variation in ω both between lineages and across sites within lineages, so when analyzing ω across lineages, ω is averaged across all the sites in the alignment (and vice versa). This makes it a conservative test, as a higher ω in one lineage than the other will only be detected if it affects many sites. Similarly, a change in ω at only a few sites is unlikely to be detected, even if the change is large.
Evidence of selection across codons.
We looked for evidence of selection on specific codons within the LMP1 gene with an additional series of nested models. For the three different regions of LMP1 sequences analyzed in this study, only for the C terminus can the null (M0) and neutral (M1) models be rejected (Table 2). This may be because the analysis concerns the average ω for each site, taken across all lineages in the phylogeny, and so will only detect positive selection in sites that are under selection in many lineages or for which selection is strong enough in a subset of lineages to make a significant difference to the average rate. For the C-terminal region, M7 is rejected in favor of M8, indicating that a model that allows some positively selected sites (ω > 1) is a significantly better fit to the data than a model that allows only constrained sites, neutral sites, or sites under negative selection (Table 2). Sites with a probability of greater than 50% of being under positive selection are shown in Table 3 and Fig. 3. Of particular interest were residues 321, 350, and 356 within the C terminus of LMP1, which showed a probability of >99% of being positively selected. Two of these residues (N350 and A356) lie within box 2 of the JAK3 motif, while the other residue (D321) is located within the repeat region of LMP1 (Fig. 3). Other residues that showed a probability of >50 to 90% of being positively selected included a number of sites within the repeat region and CTAR2 and CTAR3 domains (Table 3 and Fig. 3).
TABLE 2. Likelihood ratio tests of variation in ratio of nonsynonymous to synonymous substitution rates (ω) across codons for three regions of the LMP1 gene
TABLE 3. Amino acid residues within the C terminus of LMP1 sequences that have a >50% probability of being positively selected
FIG. 3. C-terminal alignment, with the putative positively selected sites shown. Sites for which P is >99% are shown in bold yellow letters on a blue background, while sites for which P is >50% but less than 99% are shown as red letters on a yellow background. The repeat region is shown as a red box. The CTAR domains and JAK3 motif are also indicated. For more details on the analysis, see the text and Table 3.
Recombination can produce a false signal of positive selection in likelihood-based tests of this kind (1). To confirm that our finding of significant positive selection on codons in the C terminus is not due to recombination, we conducted four different tests of recombination on the alignments for each gene region. Since each of these methods makes different assumptions about the data, it is very likely that any signal of recombination would be detected by at least one of them. First, we used the Geneconv (Sawyer, http://www.math.wustl.edu/~sawyer) and maximum chi square (27) tests to determine if there was any significant clustering of substitutions between each pair of sequences. Second, we looked at another fingerprint of recombination, the decrease in linkage disequilibrium between segregating alleles, measured as r², with distance (2). We also conducted a maximum-likelihood permutation test (20), which is designed to estimate the rate of recombination and assesses the significance of the estimate by randomly shuffling the sites and testing whether the fit of the model to the data gets worse. A summary of these analyses is presented in Table 4. These analyses revealed conclusive evidence of recombination for the N-terminal region (two methods detected recombination), while only weak evidence for recombination was observed for the transmembrane region of LMP1 (one method detected recombination). However, none of the four tests detected any evidence for recombination in the C-terminal region (Table 4). These analyses further support our conclusion that the C terminus of the LMP1 gene is under significant positive selection pressure, particularly at some sites within the C-terminal activator regions.
TABLE 4. Output of the recombination detection methods used
NF-κB and/or AP-1 activation. Further effects, such as promotion of cell survival through the induction of antiapoptotic proteins (8) and/or by reducing the immunogenicity of malignant cells (29), may also contribute to the endemic distribution of NPC in this region. This hypothesis is strongly supported by recent studies by Hu and colleagues, who showed that LMP1 sequences derived from LMP1-expressing NPC tumors were highly mutated and poorly immunogenic (9). This study suggests that the focal distribution of NPC may be influenced not only by host genetics, diet, and environment but also by an interplay with viral genetics. Furthermore, the present study also provides an important platform from which to explore the similar interplay of viral genetics in other human malignancies, especially those associated with hepatitis B virus, human papillomavirus, and human T-cell leukemia virus type 1.
These authors contributed equally to this study, and their order should be considered arbitrary.
NF-κB. Proc. Natl. Acad. Sci. USA 94:12592-12597.
NF-κB activation. J. Virol. 71:586-594.