Journal of Virology, August 2002, p. 7968-7975, Vol. 76, No. 16
0022-538X/02/$04.00+0 DOI: 10.1128/JVI.76.16.7968-7975.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Departments of Microbiology and Immunology,1 Pharmacology and Toxicology,4 Oncology, London Regional Cancer Centre, The University of Western Ontario, London, Ontario, Canada N6A 4L6,5 London Laboratory Services Group, St. Joseph's Hospital, London, Ontario, Canada N6A 4V2,2 INSERM UR524, 59045 Lille, France3
Received 19 March 2002/ Accepted 7 May 2002
The leftmost adenoviral gene, termed early region 1A (E1A), is the first gene expressed after infection and has been most extensively characterized in hAd5. In hAd5, the E1A gene encodes two major proteins of 289 and 243 residues that are expressed early after infection. These proteins arise from differential splicing of the same transcript and differ only by the presence of an internal sequence of 46 amino acids in the larger protein. The E1A proteins are localized in roughly equal amounts in both the cytoplasm and nucleus (58, 72). Three additional mRNA species are produced at later times that encode, or are predicted to encode, proteins of 217, 171, and 55 amino acids (67, 74). The E1A proteins are essential for a productive viral infection (35), as they activate expression of other viral early genes and reprogram cell growth to provide an optimal environment for viral replication (4).
hAd5 E1A interacts with a variety of cellular proteins, including transcriptional coactivators, such as the CREB binding protein (CBP) and p300 (2, 15, 42), the p300/CBP-associated factor (pCAF) (56), and the transcriptional repressor CtBP (61). E1A also interacts with various components of the general and specific transcriptional machinery, including the TATA-binding protein (TBP) (6, 22, 26, 64), several of the TBP-associated factors (21, 46), and a lengthy list of sequence-specific transcription factors (20). In addition, E1A targets proteins that directly regulate the cell division cycle, such as the retinoblastoma tumor suppressor gene product (Rb) and the related family members p130 and p107 (16, 18, 25, 81) and the cyclin-dependent kinase inhibitors p21 and p27 (37, 45). Because of these many interactions with cellular regulatory proteins, the multifunctional E1A proteins influence a variety of transcriptional and cell cycle events (4, 14, 20, 49, 54, 63). Importantly, hAd5 E1A can function as an oncogene in rodent cells. Expression of hAd5 E1A alone is sufficient to immortalize primary rodent cells (30) and can fully transform them in cooperation with a second oncogene, such as adenovirus E1B (23) or activated ras (59). In human cells, however, E1A can function as a tumor suppressor gene by inhibiting tumorigenesis and metastasis (50) and may have some utility in cancer therapy (73).
Despite the large numbers of studies using hAd5 E1A, relatively little is known about the function of the E1A proteins of other adenoviruses, raising the question of how representative hAd5 E1A is of the other E1A proteins. In 1985, comparison of E1A sequences from three hAds and one simian adenovirus (sAd) identified three regions with higher overall levels of sequence conservation designated conserved regions (CR) 1, 2, and 3 (38). Since that time, the sequences of a number of other human and sAd E1A genes have been determined. Using these, and two additional sequences that we determined, we report here a detailed comparison of 15 E1A proteins representing each of the six hAd subgroups.
cDNA synthesis, cloning, and sequence determination. Total RNA was extracted with Trizol (Sigma Aldrich, Oakville, Ontario, Canada) from human KB cells 6 h postinfection with hAd21 at an approximate multiplicity of infection of 20 PFU per cell. mRNA was subsequently isolated using Oligotex resin (Qiagen Inc., Mississauga, Ontario, Canada) and used as a template for Moloney murine leukemia virus reverse transcriptase (Invitrogen, Burlington, Ontario, Canada) to generate total cDNA. The E1A gene was amplified with primers JMO194 and JMO200 (GCGAATTCTTGAGTGCCAGCGAGTAGAGTTTTCTC and TAGTCGACCACAGCTGCAGGGCAC, respectively) designed to anneal in the highly conserved noncoding regions upstream and downstream of the gene in the subgroup B adenoviruses. The E1A gene was then subcloned as an EcoRI/SalI fragment into pAS1 (12) and sequenced using the flanking primers JMO26 (CATCATCGGAAGAGAGTAG) and JMO61 (CATAAATCATAAGAAATTCGC). Plasmid pVM303 was sequenced using primers JMO200 (described above), JMO188 (TACGAATTCATGAGACACCTGCGCTTC), and JMO201 (CTGCCACTTTATTTACAGTCCTGTGTCTGATGATG). JMO188 and JMO201 anneal to the N-terminal coding region and splice junction of hAd3, respectively. Sequencing was performed by the Robarts Research Institute DNA Sequencing Facility (London, Ontario, Canada).
Sequence manipulation. Stretches of overlapping and complementary strand sequences from each viral DNA fragment were manually assembled into a coherent sequence. Splice junctions for the largest E1A products of hAd3 were predicted based on BLAST alignment (1) of the nucleotide sequence with the closely related hAd7 nucleotide sequence. The sequences of hAd3 and hAd21 E1As have been deposited in GenBank; additional E1A sequences were obtained from GenBank, and all of the accession numbers are listed in Table 1. The sequence of hAd17 was predicted from the published sequence (10) following BLAST alignment with the related hAd9 nucleotide sequence. Isoelectric points (pI) and amino acid compositions were determined using the ProtParam Tool at the Expert Protein Analysis System website (http://ca.expasy.org/). Alignments of the largest predicted E1A products were performed with CLUSTAL W (69) at the European Molecular Biology Laboratory European Bioinformatics Institute (http://www.ebi.ac.uk/clustalw/) using default parameters except that the gap open cost was set to 2. The evolutionary tree (Fig. 1) was displayed using the program TreeView (53). The aligned sequence file produced by CLUSTAL W was imported into GeneDoc (52), edited manually, and shaded to four levels of conservation (Fig. 2). Overall sequence identities and similarities were also calculated using GeneDoc. Sequence identities at each position in the alignment were calculated manually and plotted using Microsoft Excel (Fig. 3). Sequence identities and similarities for various subregions of E1A were calculated with respect to hAd5 using GeneDoc, averaged, and plotted using Microsoft Excel (Fig. 4). Nuclear import sequences (Table 2) were predicted using the program PSORT (51). Secondary-structure predictions for each E1A protein (see Fig. 6) were performed using the program PSIPRED (47).
TABLE 1. GenBank accession numbers and properties of the largest adenovirus E1A protein products
FIG. 1. Phylogenetic tree for the adenovirus E1A proteins. An unrooted tree was generated for the E1A proteins (Table 1) with the program TreeView. Each species of E1A is labeled at the tip of its representative branch. The adenoviral subgroups are labeled A to F according to convention.
FIG. 2. Sequence alignment of the adenovirus E1A proteins. The sequences of the indicated adenovirus E1A proteins were aligned and shaded for conservation. Darker shading corresponds to higher levels of conservation. Gaps are indicated as dots. The positions of the CR are indicated as solid bars. The binding sites for Rb and CtBP in CR2 and CR4, respectively, are indicated with asterisks.
FIG. 3. Variations in the level of sequence identity along E1A. Protein sequences were aligned (Fig. 2), and the plurality of the consensus sequence was calculated as described in Materials and Methods and plotted as a function of the amino acid position. Note that gaps introduced during the alignment are reflected in the numbering of the consensus sequence, which therefore does not correspond to any of the individual sequences. The axis on the left indicates the number of sequences out of 15 that were identical. A nonintegral value indicates that two or more different amino acids were equally prevalent at a given position.
FIG. 4. Graph of average sequence identities along E1A. Four CR within the E1A sequences were identified as described in Materials and Methods. The percent identity of each adenoviral E1A sequence to that of hAd5 was calculated for each of the indicated regions and averaged. Note that hAd2 E1A was excluded from these calculations as it is virtually identical to that of hAd5.
TABLE 2. Sequences and locations of predicted NLSs within E1A proteins
FIG. 6. Secondary-structure predictions for the E1A proteins. Prediction of secondary structure for each of the indicated E1A proteins was performed using the PSIPRED program as described in Materials and Methods. Predicted α-helices and β-strands four or more residues in length are shown as blocks or arrows, respectively. The scale at the top indicates the amino acid positions within each E1A protein.
The 15 E1A sequences were aligned using CLUSTAL W. As expected, the highest levels of identity and similarity occurred between viruses within the same subgroup. The two sAd7 sequences were most like hAd12, while the sAd25 sequence most resembled hAd4, suggesting that they should be placed in human subgroups A and E, respectively. The organization of the E1A sequences into six subgroups is also displayed as an unrooted phylogenetic tree (Fig. 1). This tree also supports the placement of sAd25 in subgroup E, while the degree of divergence of the sAd7 proteins from hAd12 suggests that they could be considered a new subgroup rather than members of subgroup A. Subgroup C members are most closely related to subgroup D, whereas subgroup B is closest to E, and A is closest to F. The sequences within subgroups B and C are the most highly related, whereas those within subgroup A have the greatest degree of divergence. The high degree of relatedness between hAd4 and sAd25 suggests that the distinction between at least some sAds and hAds may be relatively artificial.
Redefinition of the conserved regions of E1A. Previous work showed that three separate regions of E1A had a higher overall degree of homology than the rest of E1A (38, 75). With the expanded number of sequences now available, we decided to revisit the definition of these CR. An inspection of the overall sequence alignment (Fig. 2) shows that there are indeed a number of regions conserved among all types. We determined the number of times the most common residue occurred at each position in the alignment and plotted this measure of sequence identity graphically (Fig. 3). This type of analysis suggests that there may be four regions of higher homology, separated by regions with distinctly lower overall identity. Three of the highly conserved regions correspond approximately to those identified previously, but a fourth conserved region has become apparent near the C terminus of E1A. To define the boundaries of these regions precisely, we decided to set a stringent cutoff for each edge of the CR based on 100% identity or similarity, with no more than nine consecutive less conserved residues occurring between absolutely conserved residues. This arbitrary cutoff does not preclude the possibility that residues adjacent to the absolutely conserved core regions may be critical. However, the fact that those adjacent residues are not absolutely conserved in all E1A proteins suggests that they may be of secondary importance. Using this definition, CR1 spans residues 50 to 82, CR2 spans residues 127 to 158, CR3 spans residues 182 to 231, and CR4 spans residues 292 to 335 of the alignment. These values correspond to residues 42 to 72, 115 to 140, 144 to 191, and 251 to 288, respectively, in hAd5. The generally accepted prior boundaries of CR1, -2, and -3 for hAd2 and -5 are residues 41 to 80, 121 to 139, and 140 to 189 (62), and these compare favorably with the regions our analysis delimits. 
The stringent cutoffs we chose resulted in a somewhat smaller CR1 and -3 but an expanded CR2.
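The per-position identity measure described above (the number of times the most common residue occurs in each alignment column) can be illustrated with a minimal Python sketch. This is not the procedure used in the study, where the values were calculated manually; the function name `column_plurality`, the toy alignment, and the choice to ignore gap characters when counting are all assumptions made for illustration.

```python
from collections import Counter


def column_plurality(aligned, gap="."):
    """For each alignment column, count how many sequences share the most common residue.

    Gap characters are ignored when counting (an assumption, not taken from the paper).
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    counts = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        counts.append(Counter(residues).most_common(1)[0][1] if residues else 0)
    return counts


# Invented toy alignment (not real E1A sequences); gaps are written as dots.
toy_alignment = [
    "MRHL.CDE",
    "MRHLACDE",
    "MKHL.CDQ",
    "MRTL.CEE",
]
print(column_plurality(toy_alignment))  # [4, 3, 3, 4, 1, 4, 3, 3]
```

Columns in which the count equals the number of sequences are the absolutely conserved positions that a boundary rule of the kind described above would use as anchors.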
The relationship between the region we have defined as CR3 and its ability to function as a transcriptional activator can be evaluated because of the tremendous amount of study that this region has received. Interestingly, a comprehensive mutational analysis of a region containing CR3 in hAd5 E1A demonstrated that mutation of residues 137 to 144 had no effect, whereas mutation of residue 145 impaired transcriptional activation (77). Another study showed that deletion of residues 188 to 204 also blocked transcriptional activation (33). Taken together, these results support our definition of a smaller CR3 that is shifted leftward with respect to the original to encompass several extra amino acid residues on its C-terminal edge. Within CR3, the four cysteine residues that form the zinc finger (11) are absolutely conserved (alignment positions 192, 195, 209, and 212).
We determined the relative sequence identity for each CR and the less conserved regions with respect to the prototype hAd5. The average sequence identities for all sequences, except the closely related hAd2, are plotted in Fig. 4. The average identity ranged from 48 to 59% among the CR and from 5 to 25% among the other regions. CR1 had the highest level of sequence conservation, whereas CR4 had the lowest level, perhaps explaining its oversight in prior analyses. The extreme N-terminal portion had the highest average identity of the less conserved regions, suggesting that it could also be involved in activities common to different E1A proteins. This is supported by the observation that mutations in this region abolish the ability of both hAd5 and hAd12 E1A proteins to transform primary rodent cells and repress transcription (5, 34, 60, 66, 76, 82). The region linking CR2 and CR3 had the lowest average identity, as it is almost completely absent in the subgroup C E1A proteins. This region is exceptionally rich in alanine residues in the subgroup A E1A proteins. Studies using subgroup A and C chimeras have shown that this region influences tumorigenicity (32, 68), perhaps by repressing major histocompatibility complex class I antigens or by conferring resistance to lysis by cytotoxic T lymphocytes or natural killer cells (31, 55). The targets of this region have not yet been identified.
Conservation of protein interaction sites among E1A types. We next examined whether the well-defined binding sites for several of the known E1A-interacting proteins were conserved in all species of E1A. The Rb protein is known to interact with a consensus site composed of the core LXCXE, where X can be any amino acid (13). This sequence resides within CR2 and is absolutely conserved in all E1A sequences examined here (Fig. 2, positions 135, 137, and 139), indicating the importance of this interaction for viral activation of the cell cycle and transcription (4). All the E1A sequences possess an invariant aspartic acid residue amino terminal to this consensus site. Interestingly, previous studies with the human papillomavirus E7 proteins, which also bind Rb via an LXCXE motif, have shown that high-affinity interaction with Rb requires the presence of an aspartic acid residue at the same position (27). This suggests that the different E1A types presented here all associate with Rb with high affinity.
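A scan for the LXCXE core can be sketched in a few lines of Python. The example below is illustrative only: the function name and the toy sequence are invented, and the check for an aspartic acid assumes that the residue sits immediately before the leucine, which the text above does not state explicitly.

```python
import re

# Lookahead so that overlapping occurrences of the core motif are also reported.
LXCXE = re.compile(r"(?=(L.C.E))")


def find_lxcxe(seq):
    """Return (1-based start, matched text, preceded_by_D) for each LXCXE occurrence."""
    hits = []
    for m in LXCXE.finditer(seq):
        start = m.start()
        preceded_by_d = start > 0 and seq[start - 1] == "D"
        hits.append((start + 1, m.group(1), preceded_by_d))
    return hits


# Invented toy sequence (not a real E1A protein).
toy = "MAAPEDLTCHEAGFPLACSEKK"
print(find_lxcxe(toy))  # [(7, 'LTCHE', True), (16, 'LACSE', False)]
```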
The interaction of the transcriptional corepressor CtBP with E1A requires the sequence PLDLS near the C termini of hAd5 and hAd12 E1As (48, 61). This motif, or a homologous variant, is present in all E1A types with the exception of the proteins of subgroup D viruses, which contain the variant PLDLC. It remains to be determined if these proteins retain interaction with CtBP, but it appears that most if not all of the E1A proteins target this transcriptional regulator. Interestingly, the lysine residue at position 332 of the alignment is absolutely conserved in all E1A proteins (Fig. 2). In hAd5 E1A, this lysine is acetylated by p300 and pCAF, and this modification blocks the interaction of E1A with CtBP (83), suggesting that this method of regulating CtBP binding may be common to all E1A proteins.
Conservation of phosphorylation sites. hAd5 E1A is phosphorylated at serine residues 89, 96, 132, 185, 188, and 219 (see reference 79 and references therein). Mutations at these sites affect a variety of E1A activities, suggesting that phosphorylation may regulate E1A function (44, 78, 79). These sites correspond to positions 99, 107, 145, 223, 226, and 260 in the aligned sequence. Only serines 145, 223, and 226 are invariant, and they are flanked by residues that are highly conserved in virtually every sequence, suggesting that they quite likely are phosphorylated similarly to hAd5 E1A. In each of the adenovirus E1A sequences, serines 145 and 226 fit the consensus sequence for substrates of casein kinase II (S/TXXD/E), although in hAd5 E1A, only the first of these sites is phosphorylated in vitro by this kinase (78).
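Two small operations underlie the comparison above: converting a residue number in one protein to its column in the gapped alignment, and testing whether a serine or threonine fits the S/TXXD/E consensus. The Python below is a minimal sketch of both, using invented sequences; the function names and the dot used as the gap character are assumptions for illustration.

```python
import re

# Casein kinase II consensus as given in the text: S/T, any two residues, then D/E.
CK2_SITE = re.compile(r"[ST]..[DE]")


def alignment_column(gapped, residue_number, gap="."):
    """Map a 1-based residue number of the ungapped protein to its 1-based alignment column."""
    seen = 0
    for column, char in enumerate(gapped, start=1):
        if char != gap:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds sequence length")


def fits_ck2_consensus(protein, residue_number):
    """True if the residue at this 1-based position begins an S/T-X-X-D/E match."""
    if residue_number < 1:
        raise ValueError("residue numbers are 1-based")
    return CK2_SITE.match(protein, residue_number - 1) is not None


# Invented toy data: one row of an alignment and the same sequence without gaps.
gapped_row = "MS..AEDKS.PLAPSDDE"
protein = gapped_row.replace(".", "")

for residue in (2, 7, 12):
    print(residue, alignment_column(gapped_row, residue), fits_ck2_consensus(protein, residue))
# 2 2 True
# 7 9 False
# 12 15 True
```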
NLSs. hAd5 E1A has been shown to contain a functional nuclear localization signal (NLS), consisting of the sequence KRPRP, at its extreme C terminus (43). Several of the residues within this sequence are not well conserved, suggesting that a functional NLS may not be present in some of the other E1A proteins. We used the program PSORT (51) to predict NLS sequences in each of the E1A proteins (Table 2). While most E1A proteins contain putative NLSs near their carboxyl termini, the subgroup B and sAd7 proteins do not. Additionally, no other NLS sequences are predicted to exist in these proteins. In contrast, E1A proteins from subgroups C, D, and F are predicted to contain a second NLS in addition to the one located at the carboxyl terminus. Interestingly, hAd5 E1A contains additional functional NLS sequences within residues 23 to 120 and CR3 that do not match standard consensus NLS motifs (57, 65). It seems likely that these signals could function in the absence of a carboxyl-terminal signal to mediate import into the nucleus. It remains unclear why only about half of either hAd5 or hAd12 E1A is localized to the nucleus (19, 58, 72) despite such a variety of nuclear import signals and the absence of any predicted nuclear export signal sequences.
Potential interactions with proteins containing SH3 domains. Proteins containing src homology-3 (SH3) domains recognize and bind proline-rich ligands, generally those possessing a motif containing the core sequence PXXP (36). We noticed that hAd5 E1A contains 11 PXXP sequences, suggesting the possibility that E1A may target cellular proteins with SH3 domains. Inspection of the alignment indicates that none of these putative motifs are highly conserved, with the possible exception of the PIKP sequence starting at alignment position 292, which is present only in the E1A proteins of subgroups B, C, and D. Whether this sequence actually interacts with any cellular SH3 domain-containing protein remains to be determined.
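Counting PXXP cores is sensitive to whether overlapping occurrences are included, because proline-rich stretches often contain several staggered matches. The sketch below shows both conventions on an invented sequence; which convention was used for the count reported above is not stated, so neither result should be read as reproducing it.

```python
import re


def pxxp_starts(seq):
    """1-based start positions of every PXXP core, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer(r"(?=P..P)", seq)]


def pxxp_nonoverlapping_count(seq):
    """Number of PXXP cores when each residue may belong to only one match."""
    return len(re.findall(r"P..P", seq))


# Invented proline-rich toy sequence (not a real E1A protein).
toy = "APPLPPRPSAP"
print(pxxp_starts(toy))                # [2, 3, 5, 8]
print(pxxp_nonoverlapping_count(toy))  # 2
```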
Sequence unique to subgroup C. Inspection of the alignment indicates that subgroup C E1A proteins contain a lengthy insertion shortly after CR3 that is not present in any of the other proteins. We performed a BLAST analysis of this protein sequence and determined that it possesses significant homology to a short portion of interleukin-16 (IL-16) from a variety of species (Fig. 5). However, no significant homology could be observed at the nucleotide level (data not shown), suggesting that this sequence is not likely to represent an integration of a cellular sequence into E1A. This is further supported by the observation that overall homology with bovine or mouse IL-16 is higher than with the human sequence. Interestingly, this portion of IL-16 represents the site at which secreted IL-16 is cleaved from the much larger precursor protein by caspase-3 (84). A comparison of the cleavage site indicates that the aspartic acid residue and adjacent sequences are identical in subgroup C E1A proteins, suggesting that under some circumstances they too might be substrates for caspase-3. In this way, E1A could compete for cleavage by caspase-3, possibly reducing its effectiveness at cleaving cellular substrates, such as the precursor of IL-16. Alternatively, cleavage of E1A could potentially release the amino-terminal and carboxyl-terminal portions of E1A to perform separate functions. Given the role of caspase-3 in apoptosis (8) and the known ability of E1A to induce apoptosis (80), it is intriguing to speculate that hAd2 and -5 E1As may have evolved to be specifically cleaved during the apoptotic process.
FIG. 5. Sequence alignment of the region unique to hAd2 and hAd5 E1As with precursor IL-16. The sequences of hAd2 and -5 E1As (residues 202 to 234) and precursor IL-16 proteins of bovine (b), mouse (m), and human (h) were aligned and shaded for conservation. Darker shading corresponds to higher levels of sequence conservation. The percentages of identical and similar amino acid residues are shown to the right. Cleavage of precursor IL-16 by caspase-3 occurs following the aspartic acid and is indicated with an arrow.
Protein secondary-structure analysis. Secondary-structure predictions of the E1A proteins (Fig. 6) were generated using the PSIPRED program (47). Predicted α-helices and β-strands of four or more residues in length are shown. Interestingly, despite the limited degree of sequence homology, many of the predicted secondary structures are common to virtually all of the E1A proteins. Specifically, all E1A proteins are predicted to contain α-helices near their N termini, one or two in CR1, two within CR3, and an additional helix within CR4. The presence of a putative α-helix near the N-terminal portion of each of the E1A proteins is interesting, as the sequence in this region is not highly conserved. In E1A from hAd2 and -5, this helix extends from residues 13 to 38, a region that is required for interaction with a variety of proteins, including p300/CBP (17), the S4 and Sug1 components of the proteasome (24, 72), and the cyclin-dependent kinase inhibitor p21 (9). In hAd12 E1A, this predicted helix is considerably shorter, extending from amino acids 12 to 27, and is preceded by a short β-strand. Interestingly, the amino-terminal portion of hAd12 E1A differs from those of hAd2 and -5 in that it functions as a transcriptional activation domain (41) and is sufficient to interact with p300 (40). The α-helix in CR3 corresponds closely to the zinc finger and extends typically 5 amino acids past the final zinc-coordinating cysteines. In hAd5, the zinc finger is required for interaction with the TBP (22) and the Sur-2 component of the transcription mediator complex (7). The α-helix in CR4 contains the sequence LXXLL in the E1A proteins of subgroups B, E, and F or homologous variants in all other E1A sequences (positions 311, 314, and 315). This motif is present in a variety of transcriptional coactivators, including p300 and CBP, which interact with liganded nuclear receptors (28, 70). Whether this region of E1A can indeed bind to nuclear receptors remains to be determined.
Predicted β-strands are generally shorter and not as conserved as the α-helices. However, most E1A proteins are predicted to contain a short β-strand at the start of CR1, with the exception of three of the subgroup B proteins. In hAd5 E1A, this region spans residues 42 to 50 and is necessary for interaction with p300/CBP (17). In addition, point mutations at residue 47 impair binding to Rb and p130 (76), suggesting that this putative β-strand may form an interaction surface with these and perhaps other cellular regulatory proteins. Subgroup B and C proteins are also predicted to contain a short β-strand that overlaps the CtBP binding site.
Conclusions. The analyses presented here refine the positioning of CR1, -2, and -3 and define a fourth CR near the carboxyl termini of the E1A proteins. Despite the differences among the E1A sequences, numerous protein interaction motifs and predicted regions of secondary structure remain recognizably present in all E1A species. These observations suggest a strong selective pressure to maintain specific protein-protein interactions, such as those between E1A and Rb or CtBP. The alignment presented here should aid in defining the surfaces of E1A required for interaction with other cellular targets. In addition, this sequence analysis suggests the possibility that at least some of the E1A proteins may interact with SH3 domain-containing proteins or liganded nuclear hormone receptors.
to the adenovirus E1A(12S) oncoprotein correlates with its nuclear translocation and an increase in PKA-dependent promoter activity. Virology 285:30-41.