Journal of Virology, February 2005, p. 2559-2572, Vol. 79, No. 4
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.4.2559-2572.2005
Jing Su,1,2,3
John McGraw,2,4
Ted L. Hadfield,2,4,
Clark Tibbetts,2,3 and
Donald Seto1,2,3*
Bioinformatics and Computational Biology, School of Computational Sciences, George Mason University, Manassas,1 HQ USAF Surgeon General Office, Directorate of Modernization,3 Epidemic Outbreak Surveillance Consortium, Falls Church, Virginia,2 Division of Microbiology, Department of Infectious and Parasitic Diseases Pathology, Armed Forces Institute of Pathology, Washington, D.C.4
Received 16 June 2004/ Accepted 13 October 2004
In this report, the complete and annotated genome sequence of HAdV-4 (GenBank accession no. AY594253) is presented. This genome of HAdV-4 is 35,990 bp in length, and a comprehensive annotation identifies 49 coding sequences, along with numerous other biological features. Organization of the HAdV-4 genome is similar to that of other members of Mastadenoviruses. The bioinformatics and phylogenetic studies on this and other AdV genomes provide insight into the biology of HAdV-4, as well as the evolution of the HAdV-E species, and raise interesting questions about the use of putatively related SAdV genomes as vectors in human gene therapy and vaccine delivery.
PCR methodology. Standard PCR methodologies were used to amplify regions to be sequenced. Pfu Turbo DNA polymerase (Stratagene, Inc.) was optimal for PCR amplifications.
Leveraged primer-walking DNA-sequencing strategy. Genes and DNA sequences from HAdV-4 are archived in GenBank. These were used as scaffolds for developing minimally tiled and overlapping primers for PCR amplification and DNA sequencing. Additionally, SAdV-25 (NC_003266) sequences were used to design primers for tiling this minimally overlapping three-fold coverage; this genome was listed in GenBank as a member of HAdV species E. Gaps in sequence coverage were closed by PCR amplifying across the gap and sequencing the amplicon. Target amplicons were purified with the Montage DNA gel extraction kit (Millipore Corp., Billerica, Mass.).
PCR fragments were sequenced with either the PCR or sequencing primers, using the ABI Prism BigDye Terminator v3.1 Cycle Sequencing Ready Reaction kit on an ABI 3100 DNA sequencer (Applied Biosystems, Inc., Foster City, Calif.). Postreaction products were purified with the Montage SEQ96 sequencing reaction cleanup kit and a Millipore Multiscreen384 vacuum manifold (Millipore Corp.).
Direct sequencing of ITR ends. The inverted terminal repeat (ITR) ends of this linear double-stranded DNA genome were determined by sequencing directly from the purified genomic DNA. Primers were designed from newly determined internal sequences. Template DNA (0.2 to 1.0 µg per reaction) was purified further by passing through a MicroSpin G-50 column (Amersham Biosciences, Piscataway, N.J.) and sequenced.
Genome assembly, annotation, and sequence analysis. DNA sequences were assembled with Sequencher 4.1.1 (Gene Codes Corporation, Inc., Ann Arbor, Mich.). Features of the DNA sequence were revealed by the Wisconsin GCG package (SeqWeb v.2).
The genome sequence was annotated by parsing into 1-kb nonoverlapping segments and querying each segment. This was identical to the annotation algorithm optimized for HAdV-1 analysis (44). These were queried systematically against the nonredundant National Center for Biotechnology Information database, using the BLASTX program of the BLAST suite sequence-alignment software (1). The searches used the default parameters of a word size of 3 and expectation of 10, with the BLOSUM62 substitution matrix and with gap penalties of 11 (existence) and 1 (extension). Low-complexity sequences were filtered out of the queries.
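The segmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `split_into_segments` and the toy sequence are assumptions, and only the segment size (1 kb, nonoverlapping) and the genome length (35,990 bp) come from the text. Each segment would then be submitted as a separate BLASTX query.

```python
def split_into_segments(sequence, segment_length=1000):
    """Yield (start, end, segment) tuples with 1-based inclusive coordinates.

    Segments do not overlap; the last one may be shorter than segment_length.
    """
    for offset in range(0, len(sequence), segment_length):
        segment = sequence[offset:offset + segment_length]
        yield offset + 1, offset + len(segment), segment


# Toy stand-in for the genome: only its length (35,990 bp) is taken from the
# text; the bases themselves are placeholders, not HAdV-4 sequence.
toy_genome = "ACGT" * 8997 + "AC"
segments = list(split_into_segments(toy_genome))

for start, end, segment in segments[-2:]:
    print(f"query segment bp {start} to {end} ({len(segment)} bp)")
```

Reporting 1-based inclusive coordinates keeps the segment boundaries directly comparable with the bp positions used throughout the annotation.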
GenomeScan was used for theoretical gene predictions. This was useful for identifying exons from the coding sequences where exon-intron borders were difficult to determine. To enable this, the algorithm uses exon-intron identification combined with similarity searches to a sequence database in order to predict coding sequences in a given DNA fragment (91). Novel sequences or "hypothetical proteins" were also identified by using another gene prediction program, GeneMark (6). During the course of this annotation, while GeneMark had a slightly higher accuracy than GenomeScan, neither was completely accurate nor comprehensive in generating a list of putative genes. The web-accessible annotation tool Artemis was used to visualize progress and to expedite genome annotation (5).
Multiple sequence alignment was performed with CLUSTALX software (81). All sequence alignments were performed with default parameters (for pairwise alignment, gap-opening and extension penalties of 10 and 0.1, respectively, and the Gonnet 250 protein weight matrix; for multiple alignment, gap-opening and extension penalties of 10 and 0.2, respectively, and the Gonnet series of protein weight matrices). Phylogenetic trees were constructed by the neighbor-joining method (67). Bootstrapping was performed with 1,000 resampling iterations to assess the robustness of the trees.
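The bootstrap resampling step can be illustrated with a short sketch. It shows only how alignment columns are resampled with replacement to build each replicate; rebuilding a neighbor-joining tree per replicate and counting node support are not shown. The function name `bootstrap_alignment`, the toy alignment, and the fixed random seed are assumptions for illustration, and the CLUSTALX implementation may differ in detail.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment maps sequence names to aligned strings of equal length.
    Columns are sampled with replacement, and the same sampled columns are
    used for every sequence so that each column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[i] for i in sampled)
            for name, seq in alignment.items()}


# Hypothetical toy alignment; not taken from the paper.
toy_alignment = {"seq_a": "MKT-LV", "seq_b": "MRTALV", "seq_c": "MKSALI"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```

The fraction of replicate trees that recover a given node is the bootstrap value reported for that node.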
It should be noted that initial genomic assembly and analysis of the ATCC-archived HAdV-4 showed the presence of HAdV-3. The reported sequence is derived from a plaque-purified sample. Recent work suggests that the original isolate could have contained a coinfection of at least two HAdV serotypes (EOS; unpublished data).
Genome annotation. HAdV-4 genome sequence is 35,990 bp in length and has an overall base composition of 21.95% A, 28.96% C, 28.7% G, and 20.36% T. The GC content of 57.67% is within the 57-to-59% range noted in the literature for HAdV-E (73). Like the other Mastadenoviruses, the HAdV-4 genome is organized into early, intermediate, and late transcription regions. Forty-nine coding sequences were identified in the genome sequence, including those of six hypothetical proteins.
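As an illustration of how base composition and GC content figures of this kind are computed, here is a minimal sketch. The function names and the toy sequence are assumptions; applying the functions to the deposited sequence (AY594253) would be needed to reproduce the reported percentages.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G, and T among unambiguous bases."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(sequence):
    """Return the combined percentage of G and C."""
    composition = base_composition(sequence)
    return composition["G"] + composition["C"]


# Hypothetical toy sequence, not HAdV-4 data.
toy_sequence = "GGCCATGCAT"
print(base_composition(toy_sequence))
print(gc_content(toy_sequence))
```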
Noncoding features. (i) Sequence motifs. Noncoding DNA sequence motifs on the HAdV-4 genome are listed in Table 1. Genome location, putative function, and functional orientation are indicated.
TABLE 1. HAdV-4 genome noncoding motif annotation
Contained within the ITR are DNA sequence motifs that are required for viral replication as well as gene activation and transcription. The core origin of DNA replication, ATAATATACC, which binds the preterminal protein (pTP)-DNA polymerase complex, was present at bp 9 to 18 (80). In addition to the DNA polymerase and pTP, HAdVs also require a set of host cellular factors for efficient replication. These are reflected in the cellular transcription factor DNA-binding motifs. The ITR region of HAdV-4 has an NFIII/Oct-1 recognition site (TATGCAAATAA) at bp 41 to 51 and an Sp1 binding site (GGGGATGGGGC) at bp 65 to 75. The NFI/CTFI recognition site was not present. This is consistent with earlier work reporting the DNA sequences required for HAdV-4 replication (30). In vivo and in vitro studies showed that while HAdV-2 requires both NFI and NFIII for efficient DNA replication, HAdV-4 apparently does not need these two host cellular factors. The HAdV-4 ITR contains an NFIII recognition site but lacks an NFI recognition site.
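Locating motifs of this kind and reporting them in the bp notation used here can be sketched as follows. The three motif strings and their positions are taken from the text; the function name `find_motif` and the toy sequence are assumptions. The toy sequence uses N as filler so that each motif lands at its reported position, and it is not the real HAdV-4 ITR.

```python
def find_motif(sequence, motif):
    """Return 1-based inclusive (start, end) pairs for every exact match."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)  # allow overlapping matches
    return hits


ITR_MOTIFS = {
    "core origin of replication": "ATAATATACC",
    "NFIII/Oct-1 recognition site": "TATGCAAATAA",
    "Sp1 binding site": "GGGGATGGGGC",
}

# Placeholder sequence: N filler is an assumption, only the motifs are real.
toy_itr = ("N" * 8 + "ATAATATACC" + "N" * 22 + "TATGCAAATAA"
           + "N" * 13 + "GGGGATGGGGC")

for name, motif in ITR_MOTIFS.items():
    for start, end in find_motif(toy_itr, motif):
        print(f"{name}: bp {start} to {end}")
```

An exact-match search is the simplest case; degenerate consensus sites would need a pattern-based search instead.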
(ii) VA RNA. Non-protein-coding RNA sequences, known as the virus-associated (VA) RNA species, repress the antiviral activity of host interferons and thus play a role in host response to infection (51). HAdV VA RNA genes have been studied and compared with one another in the literature (41). As reported, HAdV-A, -B2, and -F species members contain one such gene; HAdV-B1, -C, -D, and -E species members contain two such genes. It was also reported that the VA RNA I gene of HAdV-16 is much more closely related (98.7% identity) to its HAdV-4 counterpart than to the counterparts from other members of the HAdV-B1 subspecies (HAdV-3 and -7). Additionally, the VA RNA genes of prototype HAdV-4 differ in length and composition from a "wild" variant, HAdV-4a, and from several other "wild" variants, which may or may not be HAdV-4a. A 65-base deletion in HAdV-4a VA RNA II eliminates part of one promoter element (element A) and all of another (element B); its significance is unknown (85).
These earlier literature observations are borne out in the current in-depth genome analyses. The HAdV-4 VA RNA I and II coding sequences are located at bp 10356 to 10514 and 10575 to 10743, respectively. BLAST analysis shows that VA RNA I is greater than 95% identical to its counterparts in HAdV-7, -16, and -21, as well as approximately 95% identical to SAdV-21, -23, -24, and -25. VA RNA II returned scores of 94, 82, and 90% against HAdV-7 (three regions of identity), 83% each against HAdV-16 and -21, as well as greater than 90% identity against SAdV-22 through -25. This closeness to the chimpanzee AdVs has also been reported in the literature, as both VA RNA I and II genes from SAdV-21 through -25 are all highly related to their HAdV-4 counterparts.
Gene coding features. Table 2 displays the annotation of coding genes found in HAdV-4, along with their locations along the genome. The coding orientation is also detailed.
TABLE 2. HAdV-4 genome gene coding annotation
(b) E1B. Five coding sequences were identified in the E1B region. The early 20-kDa protein has high identity to the small T antigen that is conserved in other HAdVs. The 54.2-kDa protein has identity to the large T antigen protein, which inhibits the cellular p53-mediated host defense mechanisms (92). The large T antigen protein also plays a role in regulating viral late gene expression. The 8.1- and the 16.7-kDa proteins show significant identity to their counterparts in the SAdV genome sequences. These have identities to the 1.26- and 1.31-kb mRNA products identified in the HAdV-C species.
(c) E2. The E2 transcriptional unit encodes proteins required for viral DNA replication. HAdV-4 replication requires three virus-encoded factors (terminal protein precursor, DNA polymerase, and DNA binding protein) as well as additional human cellular proteins (15). The E2 transcription unit is divided into two regions, E2A and E2B. A 57.4-kDa DNA binding protein was identified within E2A. In the E2B region, a 135.2-kDa DNA polymerase and a 73.7-kDa terminal protein precursor were located.
(d) E3. The E3 region of HAdVs encodes proteins antagonistic to the host immune mechanism (89). These proteins are not required for efficient viral growth in vitro. The HAdV-4 E3 gene region encodes nine proteins of the following sizes: 12.0, 23.3, 19.3, 24.7, 6.31, 29.7, 10.4, 16.7, and 14.9 kDa. The 12.0-kDa protein has significant homology to an immunomodulating E3 protein in HAdV-2. BLAST alignments suggest that HAdV-4 23.3-kDa protein is homologous to the SAdV genome-encoded CR1-alpha 1 protein; its homologs are found in other HAdVs. The 19.3-kDa protein appears to be a homolog of the E3-gp 19-kDa major histocompatibility class I antigen-binding glycoprotein found in HAdV-7. The 24.7-kDa protein has identity to the CR1 (conserved region 1)-containing proteins in the E3 region of other HAdVs (16). A function for this 80-amino-acid (aa) domain has not been identified. Interestingly, the 29.7-kDa protein also contains a CR1 domain and has a high identity to its counterpart in the SAdV-25 genome (CR1-delta 1 protein). This is not found in other HAdVs. The 10.9-kDa protein has significant identity to an E3 protein that plays a role in down-regulating the epidermal growth factor receptor. The 16.7-kDa protein has identity to an HAdV E3 protein known to protect virus-infected cells against tumor necrosis factor-induced cytolysis (37).
(e) E4. Members of the E4 transcription unit perform a range of functions (46). For example, the E4 proteins are involved in viral RNA export and stabilization. The E4 Orf6 protein combines with the E1B 55-kDa protein to inhibit cellular p53. E4 Orf6/7 regulates the cellular transcription factor E2F, while E4 Orf4 controls protein phosphorylation in infected cells. Seven putative coding sequences were identified in the HAdV-4 E4 region. These include a 13.5-kDa protein with identity to the E4 protein Orf1, a 14.6-kDa Orf2 protein, a 13.6-kDa nuclear binding Orf3 protein, a 14-kDa Orf4 protein, a 15.8-kDa Orf6/7 protein, a 34.6-kDa Orf6 protein, and a 7.3-kDa Orf7-like protein. In HAdV-9 (species D), the E4 Orf1 coding sequence has a dUTPase domain and is reported to be an oncogenic determinant (87). The pathway of oncogenic transformation was partly elucidated when it was shown that Orf1 activates phosphatidylinositol 3-kinase at the host cell membrane, thus initiating a cascade of downstream events that eventually lead to cell transformation (24).
(ii) Intermediate genes. (a) IX. The intermediate transcript-derived protein IX (pIX) is a minor component of the AdV capsid. In addition to functioning as a transcriptional regulator, it is partially responsible for virion stability; virions lacking pIX are heat labile and lose their infectivity if the packaged DNA exceeds 35 kb in size (69). In HAdV-5, IX also acts as a transcriptional activator, enhancing expression from viral promoters, including the E1A, E4, and major late promoter (MLP), as well as from cellular promoters. The physiological role of IX as a transcriptional regulator is not clearly understood (68). Recent literature indicates that, although pIX can affect transcription from a variety of viral promoters, it does not appear to play a significant role in the activation of AdV promoters during normal AdV replication (69). An open reading frame (ORF) encoding a 14.4-kDa pIX was identified at bp 3441 to 3869.
(b) IVa2. The second intermediate transcript-derived protein, IVa2, plays a serotype-specific role in packaging viral DNA during AdV assembly (95). The IVa2 protein binds the "A repeats" sequence, at the left end of the genome during the packaging process (55, 94). It is speculated that the virions are assembled around the DNA rather than the DNA being packaged into a preassembled viral capsid (94). The IVa2 protein also functions as a transcription factor for the major late genes (7). An HAdV-4 IVa2 protein coding sequence was identified at bp (3930 to 5554)c, where the attached letter "c" represents coding sequence transcribed from the complementary strand.
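The "c" notation means that the feature must be read from the reverse complement of the listed coordinates. A minimal sketch of that extraction follows; the function names and the toy genome are assumptions, and the sketch handles only a single contiguous interval, so it does not account for splicing.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def extract_feature(genome, start, end, complement=False):
    """Extract a feature given 1-based inclusive top-strand coordinates.

    When complement is True (the "c" suffix), the region is returned as it
    would be read 5' to 3' on the complementary strand.
    """
    region = genome[start - 1:end]
    return reverse_complement(region) if complement else region


# Hypothetical toy genome: a short ORF (ATG CGT TAA) encoded on the
# complementary strand at top-strand positions 4 to 12.
toy_genome = "CCC" + "TTAACGCAT" + "GGG"
print(extract_feature(toy_genome, 4, 12, complement=True))
```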
(iii) Late genes. The HAdV late genes are transcribed from a single promoter, the MLP. Multiple poly(A) signals are utilized to produce the various distinct mRNA species (72). The core elements of MLP were identified from extensive studies of the HAdV-2 genome sequence (93). Based on sequence comparison, all of the regulatory elements in the HAdV-4 MLP were identified: inverted CAAT box (bp 5803 to 5812), upstream element (bp 5823 to 5832), TATA box (bp 5854 to 5860), and MAZ/Sp1 binding sites flanking the TATA box at bp 5844 to 5853 and 5861 to 5871. The initiator element, which includes the transcription start site for the late transcription unit, is located at bp 5883 to 5889. Two downstream elements recognizing the IVa2 protein were identified at bp 5970 to 5980 (DE1) and 5985 to 6000 (DE2a and DE2b). The late transcription unit encodes the major AdV structural proteins and is subdivided further into regions L1 to L5, each region being expressed as a distinct mRNA species.
(a) L1. In the L1 region, the 52-kDa protein (bp 10765 to 11937) and protein IIIa (bp 11961 to 13736) were identified. The 52-kDa protein serves as a scaffold for capsid assembly during virus assembly (31). The IIIa protein is found on the outer surface of the virus and reportedly has a function in holding the virus facets together (68).
(b) L2. Four coding sequences were catalogued in the L2 region of the HAdV-4 genome. The penton base protein III, which is found at the 12 virion vertices, is located at bp 13815 to 15422. The penton protein binds to the host integrins via a conserved Arg-Gly-Asp (RGD) motif to trigger virus internalization (88). The RGD motif in the HAdV-4 penton is located at bp 14772 to 14780. Coding sequences for proteins VII and V, found at the viral core, are located at bp 15426 to 16007 and 16055 to 17080, respectively. A coding sequence for an 8.4-kDa pX protein was identified at bp 17103 to 17336. This pX protein, also known as the mu protein, has no known function.
(c) L3. Three coding sequences were identified in the L3 region of HAdV-4: the minor capsid protein precursor, pVI; hexon; and 23-kDa protease. The minor capsid protein is probably found on the inner capsid surface and may play a role as a structural intermediate between the capsid and the viral core. In HAdV-4, the coding sequence for the pVI precursor is located at bp 17368 to 18141. The coding sequence for the 105.2-kDa HAdV-4 hexon is located at bp 18248 to 21058. The hexon protein is the major structural component of the AdV capsid, constituting nearly 63% of the virion mass. Its length is 936 aa. In comparison with the other HAdV hexons, the HAdV-4 hexon is 75% identical to the HAdV-2 hexon, 92% identical to the HAdV-16 hexon, and between 82 and 83% identical to the hexons of HAdV-3, -7, and -21. The hexon monomer is made of two eight-stranded β-barrels and three extended loops. A CLUSTAL-based multiple sequence alignment revealed four major regions of variation (variable regions [VRs] A to D) between the hexons of HAdV-2, -4, -5, and -7 (Fig. 1). When mapped onto the three-dimensional structure of the HAdV-2 hexon, all four regions mapped onto a series of outer loops. These four variable loops probably represent the serotype-specific epitopes. The hexon amino acid sequence is highly conserved outside the VRs. The last coding sequence in the HAdV-4 L3 region encodes a 23-kDa protease and was located at bp 21082 to 21702. This protease is required for the cleavage of viral proteins during virus maturation and assembly.
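The relation between genome coordinates and protein length can be checked with simple arithmetic. The sketch below assumes that a coding sequence is a single unspliced interval whose coordinates include the stop codon; neither assumption is stated in the text, and the function names are illustrative. Under those assumptions the hexon coordinates reproduce the 936-aa length given above.

```python
def cds_length_aa(start, end, includes_stop=True):
    """Return the protein length in amino acids for an unspliced CDS.

    start and end are 1-based inclusive genome coordinates.
    """
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("interval length is not a multiple of 3")
    codons = n_bases // 3
    return codons - 1 if includes_stop else codons


def codon_index(cds_start, position):
    """Return the 1-based codon number that begins at a genome position."""
    offset = position - cds_start
    if offset < 0 or offset % 3 != 0:
        raise ValueError("position is not at a codon start in this frame")
    return offset // 3 + 1


hexon_aa = cds_length_aa(18248, 21058)  # hexon coordinates from the text
print(hexon_aa)  # 936, matching the stated hexon length

# The same arithmetic places the RGD motif (bp 14772 to 14780) within the
# penton base CDS (bp 13815 to 15422), assuming an unspliced reading frame.
rgd_first_codon = codon_index(13815, 14772)
```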
FIG. 1. Multiple sequence alignment of the hexon proteins of HAdV serotypes 2, 4, 5, and 7. CLUSTALX alignment of the amino acid sequences of the hexons of HAdV-2, -4, -5, and -7 reveals four major regions of variation (noted as VR A through D). All VRs map onto a series of loops in the three-dimensional structure of the HAdV-2 hexon. CLUSTAL notes amino acid alignments as follows: an asterisk indicates a conserved amino acid, a single dot indicates either size or hydropathy is conserved, and stacked dots (:) indicate both size and hydropathy are conserved.
11.4%) and probably has a highly disordered structure.
(e) L5. The L5 region of HAdV-4 encodes the 45.1-kDa fiber protein at bp 31645 to 32922. A trimeric fiber assembly protrudes from the vertices of the icosahedral AdV capsid. The N-terminal domain attaches noncovalently to the penton base protein, while the globular C-terminal "knob" domain binds host cells. A study of the crystal structure of the HAdV-12 knob domain bound to the coxsackievirus and AdV receptor (CAR) revealed the key fiber amino acid residues required for CAR binding (39). These residues include Asp415, Pro417, and Pro418. Another important residue, Lys429, is conserved throughout all HAdV species, except HAdV-F. Multiple sequence alignment of the fiber sequences of the HAdV-B and -C species, as well as HAdV-12 and HAdV-4, shows the aspartate residue is present in HAdV-2, -4, and -12 but is replaced with an alanine in HAdV-5 and either asparagine or lysine in HAdV-B, as shown in Fig. 2. Pro417 is replaced by serine in HAdV-4 and all of the HAdV-C species members; it is replaced with either glutamic acid or threonine in HAdV-B. Lys429 is conserved across all the fiber sequences.
FIG. 2. Multiple sequence alignment of fiber proteins of HAdV species A, B, C, and E. Amino acid sequences of the fiber protein from species B (HAdV-3 and -7), C (HAdV-2 and -5), and E (HAdV-4) are aligned with the HAdV-12 (species A) homologous sequence. The three-dimensional structure of the HAdV-12 fiber was solved, and the key residues involved in CAR binding were mapped. Some of these key residues are marked by numbers at the top of the alignment showing the conservation among the CAR binding species A, C, and E AdVs relative to HAdV-12: D415 (no. 1), P417 (no. 2), P418 (no. 3), and K429 (no. 4). CLUSTAL notes amino acid alignments as follows: an asterisk indicates a conserved amino acid, a single dot indicates either size or hydropathy is conserved, and stacked dots (:) indicate both size and hydropathy are conserved.
5-kb stretch between the MLP initiator element and the L1 52-kDa protein coding sequence. E1B encodes two additional hypothetical proteins, which are presumably expressed from 1.26- and 1.31-kb mRNAs. Both putative proteins have partial identity to the E1B 55-kDa protein. The hypothetical protein predicted by GeneMark is encoded at bp 35331 to 35426 and has no identity to any protein in the GenBank database. The presence of these putative coding sequences strongly suggests that the complete set of proteins encoded by the AdV genome sequence has not been identified.
The evolutionary origins of HAdV-4. HAdV-4 and HAdV-7 are the etiological agents of ARD. However, unlike HAdV-7, which is a member of a species with several distinct serotypes, HAdV-4 is the sole member of species E. This and the lack of a large number of HAdV-4 genome types imply that this AdV species is the product of a relatively recent evolutionary event. The nature of this event, however, has been a matter of debate. One hypothesis suggests that HAdV-4 originated from a recombination event between two HAdV species (29). A second hypothesis indicates that HAdV-4 originated from a chimpanzee-human interspecies transmission event (48).
Serotype and species origins through genome recombination. Genome recombination occurs in AdVs. It was observed within a single serotype: HAdV-12 (79). Interserotypic recombinants, both laboratory-generated and naturally occurring strains, have also been documented (40, 52, 78). Illegitimate recombination was shown to be a factor in serotype evolution (12). Improved technology, such as rapid high-throughput DNA sequencing and analysis, permits the exact identification of recombination sites and clarification of such molecular events (52). The hypothesis of evolution of new serotypes through recombination is further supported by the recent determination of the complete genome of HAdV-11. Multiple sequence alignment and analyses imply that HAdV-11 is possibly the product of a recombination event between HAdV-7-like and HAdV-35-like genomes (77).
In a study of the sequences from the HAdV-4 fiber gene and its immediate neighbors, it was concluded that HAdV-E arose from a recombination event between a species B genome and a species C genome (29). The authors left open the possibility of finding the sites of recombination to prove this hypothesis with more extensive DNA sequencing, especially of the intergenic flanking sequences. To investigate this, the immediate 5'-upstream fiber-flanking 100-bp sequence of HAdV-4 was aligned with sequences from both the HAdV-B and HAdV-C species. A similar alignment was performed with the 100-bp sequence immediately downstream of the fiber stop codon (data not shown). There were conserved segments in the upstream region, but there were no regions of identity downstream of the fibers.
The full-length genome data of HAdV-4 along with several other HAdV genomes allow a comprehensive evaluation of all genes across these species along the entire genome. As shown in Table 3, the percent identities do not reflect a recombination event between species B and species C. In fact, a higher percent identity is seen with genes corresponding to an SAdV, SAdV-25.
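For readers unfamiliar with how a percent identity is derived from a pairwise alignment, a minimal sketch follows. The text does not state which definition was used for Table 3; this version counts identical residues over aligned, ungapped columns, which is one common convention and is an assumption here. The function name and the toy sequences are also illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not taken from any AdV protein.
print(percent_identity("MKT-LV", "MRTALV"))
```

Other tools divide by the full alignment length or by the shorter sequence length, so values from different programs are not always directly comparable.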
TABLE 3. Percent identities of select HAdV-4 proteins to their homologs in other HAdV species
The amino acid sequences of each of the proteins from all the aforementioned HAdVs and SAdVs were aligned by using CLUSTALX (81). Figure 3 displays the phylogenetic relationships among these viruses. The gross topologies of the trees constructed with the genes coding for E1A 32-kDa protein, E1B 55-kDa protein, L1 52-kDa protein, L2 penton, L3 hexon, and E4 34-kDa protein were similar. There were three main groupings among the HAdVs and SAdVs. The first cluster comprised HAdV-4 plus SAdV-22, -23, -24, and -25. This group was closest to the second cluster, which comprised species B members (HAdV-3, -7, and -11). The third cluster comprised species C (HAdV-1, -2, and -5). In contrast, the phylogenetic tree constructed from the L5 fiber sequences had a distinctly different topology. Based on the fiber amino acid sequence, the HAdV-4, SAdV-22, SAdV-23, SAdV-24, and SAdV-25 cluster is closer to species C than to species B. This may be explained by an earlier recombination event prior to a split between the SAdVs and HAdV-4. HAdV-4, however, still maintains its close similarity to the SAdVs. In four of the six trees (genes coding for E1A 32-kDa protein, E1B 55-kDa protein, L1 52-kDa protein, and L5 fiber), SAdV-25 is most closely related to HAdV-4; this was the SAdV used for the earlier restriction enzyme digestion analysis suggesting these two AdVs were "distantly related" (48). Additional phylogenetic analyses (data not shown and unpublished data) show a close relationship between SAdV-21 and the HAdV-B. This agrees with the recent assignment of SAdV-21 to the Mastadenovirus HAdV-B species by Harrach (www.vmri.hu/~harrach/ADENOSEQ.HTM).
FIG. 3. Phylogenetic analyses of selected HAdV-4 proteins. The amino acid sequences of six AdV proteins from eight HAdVs and four SAdVs were aligned by CLUSTALX using default parameters. The sequences of HAdV serotypes 3, 7, and 11 were used to represent the B subgroup, while the sequences of serotypes 1, 2, and 5 were chosen to represent the C subgroup. The sequences of HAdV-40 were used as outgroups in each of the trees. The unrooted trees were constructed by the neighbor-joining method (67). The robustness of the trees was measured by bootstrapping (1,000 replications). Numbers indicate bootstrap values in support of the adjacent node.
TABLE 4. Percent identities of select HAdV-4 structural proteins to their homologs in SAdV species
TABLE 5. Comparison of the fiber lengths and amino acid sequence identities between HAdV-4 and selected HAdVs and SAdVs
FIG. 4. Arrangement of coding sequences in the E3 and L5 regions. Genome maps of HAdV serotypes 4, 7, and 5 and SAdV-25, showing the arrangement of coding sequences in the E3 and L5 regions, are presented. Displayed are schematics of each double-stranded linear genome sequence in the E3/L5 region, along with the three forward frames of translation. The color scheme for the coding sequences is as follows: brown = E3 19-kDa protein; dark green = E3 24.8-kDa protein; purple = E3 6.3-kDa protein; red (hatched) = E3 29.7-kDa protein (also called the CR1-delta-1 in SAdVs); blue = E3 10.4-kDa protein; light green = E3 14.5-kDa protein; red (solid) = E3 14.7-kDa protein; light blue = fiber; blue (hatched) = E3 CR1-gamma1 protein (in SAdV-25); and green (hatched) = 7.7-kDa protein (in HAdV-7). A comparison of gene order and synteny in the E3 region, among HAdVs and chimpanzee AdVs, highlights unique conserved features across species.
Noncoding genome landmarks. The noncoding genome landmark identities are complemented by identities from comparisons of noncoding motifs and regions. For example, the critically important ITR of HAdV-4 has the highest identity (BLAST score of 50) to the ITR of SAdV-22 rather than to other HAdV ITRs (highest BLAST score among the HAdVs = 42). Among the HAdV species, the HAdV-4 ITR is most related to those of the B species (HAdV-3, -7, and -21; BLAST score for each = 42). But, as noted in the annotation section, the HAdV-4 ITR diverges considerably from the HAdV "canonical" sequence. For reference, the SAdV-25 ITR has a lower match with HAdV-4 but also contains regions of identity to HAdV-4 (BLAST score of 36).
Human and simian VA RNA genes have been studied in detail (41). One observation is that HAdV-E, -B1, -C, and -D species contain two VA RNA genes, whereas the HAdV-B2, -A, and -F species contain only one. In addition, it was noted that the chimpanzee AdVs (SAdV-21 through -25) have two VA RNA genes, whereas AdVs isolated from monkeys apparently have either one or zero VA RNA genes. As noted, both VA RNA genes of SAdV-22 to -25 are all highly related to their HAdV-4 counterparts (41).
The current study supports and extends this earlier observation. VA RNA I coding sequences of HAdV-4 and SAdV-25 have two mismatches between them (BLAST score of 147), while the VA RNA II coding sequences have five mismatches and 10 gaps (BLAST score of 114). Among the HAdVs, the only VA RNA coding sequence in HAdV-11 had some identity to the HAdV-4 VA RNA I (BLAST score of 44). The VA RNA genes of HAdV-7 had equally low BLAST scores against HAdV-4, scoring 33 (VA RNA I) and 33 (VA RNA II) against their homologs.
Taken together, these data support a model in which HAdV-4 (species E) arose from a zoonotic event involving an SAdV-25-like virus. They also complement earlier observations from restriction enzyme digestion analyses (48, 86).
To this end, the complete genome of HAdV-4 has been sequenced, annotated, and analyzed for the first time. Due to the modest-sized genome of the HAdVs (ca. 36 kb), scaffolds of primers derived from existing genome data and coupled with the rapid methods for DNA sequencing can produce complete genome sequences. Up-to-date field strains can be sequenced rapidly, and unique DNA sequence signature-based arrays may be generated to survey unambiguously and simultaneously all of the important serotypes and strains of HAdV (and other causative agents). This algorithm of genome-based diagnostics is a basis for the detection of other pathogens: e.g., severe acute respiratory syndrome-causing coronavirus (50, 62).
This underscores the importance and effectiveness of the "leveraged primer-walking" genome-sequencing strategy using archived genomes and partial genomes to generate pathogen DNA sequence data for rapid turnaround in microarray assay design and deployment. The unique ARD-related pathogen signatures have been incorporated into a respiratory pathogen microarray (RPM) chip developed by EOS. These RPM chips (versions 1 and 2) are undergoing validation in a real-world test bed.
AdV evolution and phylogeny. The study of viral origins and evolution is in its infancy (53). Continued improvements in genome-sequencing and analysis methodology and technology are resulting in more and complete viral genomes being deposited into databases, especially in the context of emerging human pathogens. With the availability of genome sequence data, viral and host evolution can be viewed through horizontal transfer of genes among hosts and through genome recombination of related viruses within a host. These events are likely to play major roles in the evolution of viruses as human pathogens.
Based on the presented genome data and phylogenetic analyses, HAdV-4 is evolutionarily closest to the SAdVs. It is plausible that the HAdV-E species resulted from an interspecies transmission event. The notion that new viral species in humans may arise from zoonotic infections has precedent. There is evidence in the literature suggesting that the two types of HIV, HIV-1 and HIV-2, represent cross-species infections from the chimpanzee (HIV-1) and the sooty mangabey monkey (HIV-2) (25, 36). Similarly, phylogenetic data on HTLV and simian T-cell leukemia/lymphotropic virus (STLV) indicate that HTLV I and II originated from separate interspecies transfers between simian species and humans (74).
The phylogenetic data presented, however, also suggest a recombination at the fiber region between B- and C-like AdVs. This recombination event may have preceded the zoonotic transmission of HAdV-4 from chimpanzees to humans. This interpretation is borne out by the fact that HAdV-4 and SAdV-23 through -25 are more similar to the HAdV-Cs at the fiber but more similar to the HAdV-Bs in other regions of the genome, both upstream and downstream of the fiber coding sequence.
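The pattern described above (one reference closest at the fiber, another closest elsewhere) is the kind of signal that a sliding-window identity scan over an alignment can expose. The following Python sketch is only an illustration under stated assumptions and is not the phylogenetic method used in this study: the function names `window_identity` and `closest_reference`, the sequence labels, and the toy sequences are hypothetical, and the input is assumed to be rows of an existing multiple sequence alignment (equal lengths, `-` for gaps).

```python
def window_identity(query, reference, window, step):
    """Percent identity of two aligned sequences in sliding windows.

    Both strings must be rows of the same alignment (equal length).
    Columns with a gap ('-') in either row are skipped.
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(query[start:start + window],
                            reference[start:start + window])
            if a != "-" and b != "-"
        ]
        matches = sum(a == b for a, b in pairs)
        identity = 100.0 * matches / len(pairs) if pairs else 0.0
        profile.append((start, identity))
    return profile


def closest_reference(query, references, window, step):
    """For each window, name the reference with the highest identity."""
    profiles = {
        name: window_identity(query, ref, window, step)
        for name, ref in references.items()
    }
    starts = [start for start, _ in next(iter(profiles.values()))]
    return [
        (start, max(profiles, key=lambda name: profiles[name][i][1]))
        for i, start in enumerate(starts)
    ]


# Toy alignment (not real data): the query matches "B_like" at both
# ends and "C_like" in the middle, mimicking a recombinant segment.
query = "AAAACCCCGGGG"
references = {
    "B_like": "AAAATTTTGGGG",
    "C_like": "TTTTCCCCTTTT",
}
result = closest_reference(query, references, window=4, step=4)
print(result)
```

A switch in the closest reference along the genome, as in the middle window of this toy case, is suggestive of recombination but would need confirmation by region-specific phylogenetic analysis.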
Origins of the chimpanzee AdVs. Given the apparent phylogenetic closeness of HAdV-4 to the chimpanzee AdVs revealed by the bioinformatics analysis, it is reasonable to ask whether the chimpanzee AdVs truly originated in chimpanzees or instead represent a cross-species jump from humans (e.g., animal handlers) to chimpanzees. The literature reports at least three independent isolations of chimpanzee AdVs, with the latest being the source of the four sequenced genomes (59).
First reports described isolation of AdVs from chimpanzee throat washings and fecal specimens (35, 75). These were followed, in the late 1960s and early 1970s, by studies of experimental kuru syndrome and Creutzfeldt-Jakob (CJ) disease in a laboratory colony of chimpanzees, work that resulted in the isolation of more than 100 strains of latent viruses (3, 59). Tissues were aseptically removed from nine sacrificed chimpanzees (Pan satyrus) that had been experimentally inoculated with kuru prions 1 to 3 years previously. Each isolate was given a chimpanzee virus number in the order of its appearance (59). Distinct virus types defined from these strains were tentatively called "Pan" viruses (in laboratory nomenclature) to distinguish them as latent viruses found in chimpanzees (3). Pan 5, 6, 7, and 9 were proven to be new AdVs (3): Pan 5 was isolated from mesenteric lymph nodes of a chimpanzee experimentally infected with kuru; Pan 6 was from the mesenteric nodes of another chimpanzee; and Pan 7 was isolated from the inguinal nodes of yet another similarly infected chimpanzee. Pan 9 was isolated from the mesenteric nodes of a chimpanzee infected with CJ disease. All four isolates had the biochemical and physical characteristics of AdVs. Complement fixation, hemagglutination, and neutralization tests were performed. These four viruses were defined as new AdV types, as they were not neutralized by antisera to known AdVs of human or simian origin (3). Antisera included "human [sero]types 1-33." The sera of nine normal chimpanzees that were bled when they entered the colony were also tested against the Pan AdVs. Three of the nine had antibodies to one or more of them. Sera taken from animal handlers when they came on duty at the chimpanzee colony and again after 1 to 3 years did not have antibodies to these Pan viruses (3). It is therefore likely that these cataloged and archived AdVs are indeed of chimpanzee origin.
Chimpanzee AdV-derived vectors as vaccine and gene therapy vectors in the context of HAdV-4 bioinformatics. The importance of whole-genome determinations and bioinformatics analyses is far-reaching. There is interest in understanding the genomics and biology of SAdVs, especially because the chimpanzee AdVs are being considered as alternative vectors for gene therapy and for vaccine delivery. Currently, vectors used for gene therapy are based on HAdV-5. However, HAdV-5-derived vectors have caused problems in human gene therapy protocols, including fatality (56). In general, preexisting host immunity to HAdV is enough of a concern to stimulate the development of alternative AdV vectors to which neutralizing antibodies would be rare in the human population. The lack of neutralizing antibodies against chimpanzee AdV in human serum samples suggests that vectors derived from them will be useful as vaccine and gene therapy vectors (61).
However, contrasting neutralization data have also been presented in the literature. According to Li and Wadell (48), SAdVs are "distantly related" to HAdV-4 based on restriction enzyme digestion profiles. The same report noted that SAdVs were neutralized by rabbit antisera against the HAdV-4 prototype virion (48). The bioinformatics analyses of the HAdV-4 genome and the SAdV genomes presented here suggest that a stronger link may exist, and with it a more worrisome, if theoretical, potential for a host immune response, such as immune cross-reaction between the human host and SAdV-derived vectors.
These observations may impact the development of vectors for applications in gene therapy and vaccine delivery as these vectors may be administered to the patient multiple times.
Research support was provided through a grant (DAMD17-03-2-0089) from the U.S. Army Medical Research and Materiel Command (USAMRMC). Partial support was also provided through the Epidemic Outbreak Surveillance Project (EOS), funded through HQ USAF Surgeon General Office, Directorate of Modernization (SGR), and the Defense Threat Reduction Agency.
The opinions and assertions contained herein are the private ones of the authors and are not to be construed as official or reflecting the views of the Department of Defense.
During the course of this work, the EOS Consortium included the following members: Peter F. Demitry and Theresa Lynn Difato, Department of USAF/SGR; Jerry Diao, Kenya Grant, Rosana R. Holliday, Cheryl J. James, Chris Olsen, and Kathy Ward, USAF/SGR (Ctr); John Gomez, Margaret Jesse, Kindra Nix, Jose J. Santiago, Curtis White, and Sue A. Worthy, Lackland AFB, Tex.; Eric H. Hanson and Robb K. Rowley, The George Washington University (IPA); Elizabeth A. Walter, Texas A&M University-San Antonio (IPA); Russell P. Kruzelock, Virginia Tech (IPA); Jennifer Weller, George Mason University (IPA); Robert Crawford, Armed Forces Institute of Pathology; Baochuan Lin, David A. Stenger, Dzung Thach, Gary J. Vora, and Zheng Wang, Naval Research Laboratory; Brian K. Agan and Michael Jenkins, Wilford Hall Medical Center; Linda Canas, Air Force Institute for Operational Health; and David Metzgar, Kevin Russell, and Jianguo Wu, Navy Health Research Center.
Present address: Midwest Research Institute, Palm Bay, FL 32909.
Copyright © 2009 by the American Society for Microbiology.