Journal of Virology, July 2003, p. 7590-7600, Vol. 77, No. 13
0022-538X/03/$08.00+0 DOI: 10.1128/JVI.77.13.7590-7600.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
Received 17 December 2002/ Accepted 26 March 2003
The existence of multiple poxviruses that can infect humans raises the possibility of the evolution of a new smallpox-like virus through host gene acquisitions or intervirus recombination events. If the new virus retained the ability to infect animals, however, then its eradication would be unlikely due to the natural animal reservoir of infection. In addition to the importance of poxvirus pathogens, multiple attenuated poxviruses are being used as vectors for clinical purposes, including cancer treatment, vaccines for human immunodeficiency virus, cytomegalovirus, and measles virus, and a successful rabies virus vaccine for feral animals (1, 9, 10, 29, 36, 42, 55).
The present study includes an analysis of 21 poxvirus genomes that have been completely sequenced (Table 1): 19 members of the subfamily Chordopoxvirinae and 2 members of the subfamily Entomopoxvirinae. With this wealth of sequence information, it is possible to move from the laborious and slow techniques of single-gene functional analysis to a global comprehension of poxvirus genes. The development of new bioinformatic tools is required for these large-scale analyses. As part of the Poxvirus Bioinformatics Resource (PBR; www.poxvirus.org) funded by the Canadian Protein Engineering Network Centre of Excellence and the U.S. National Institutes of Health, our group and collaborators have developed the Viral Genome Organizer (62), the Virus Genome Database (27), and Poxvirus Orthologous Clusters (POCs) (21). POCs, which is the successor of the Virus Genome Database, is a MySQL database containing sequenced poxvirus genomes and a software suite, with graphics, designed for users to search for and analyze poxvirus genes, promoters, and gene or protein homologs (orthologs) in related viruses. In addition, POCs enables users to analyze the likelihood that an open reading frame (ORF) encodes an expressed protein and allows searches for unique viral genes, genes missing from particular viral genomes, or genes present in a user-defined subset of viruses. These tools are available to all researchers at no cost. We have developed and used these computer applications to group genes into families and have identified genes that are most highly conserved in the family Poxviridae. Thus, this analysis represents the first step toward identifying the minimum poxvirus genome and also identifies less-well-conserved genes which may be involved in host-specific virulence. Finally, we have used data analysis by POCs to predict necessary sequencing updates in the VV strain Tian Tan (VV-Tan) GenBank file and have confirmed the hypothesis by DNA sequence analysis.
TABLE 1. Poxvirus genomes
Sequencing of VV-Tan genes. VV-Tan DNA was kindly provided by Joe Esposito, Centers for Disease Control and Prevention, Atlanta, Ga. (CDC), and was isolated from a passaged VV-Tan strain generously provided to CDC by a smallpox vaccine producer in the People's Republic of China. Each VV-Tan gene to be sequenced was first PCR amplified with the following reaction mixture to make a final volume of 100 µl: 34 ng of VV-Tan DNA, 0.1 µM forward and reverse primers, 0.1 mM deoxynucleoside triphosphates (GibcoBRL), PCR buffer, Taq polymerase, and double-distilled H2O. The reaction mixture was heated to 94°C for 2 min and subjected to 30 cycles of 94°C for 1 min, 45°C for 1 min, and 72°C for 2 min. PCR products were purified by using a Qiagen QIAquick PCR kit and sequenced by using a LI-COR, Inc., 4200 global edition DNA sequencer with nested primers (see below). Sequences were assembled and analyzed (minimum redundancy of two) by using Dear-Staden software (18).
Nucleotide sequence accession numbers. Gene and primer sequences are available under GenBank accession numbers AY188507, AY188508, AY188509, AY188510, AY188511, AY188512, AY188513, AY188514, and AY188515.
We have built a suite of tools whose database interface is designed specifically for molecular virologists with little computing experience and no SQL experience. POCs enables users to quickly perform a great variety of complex queries for genes or proteins from a single virus, subsets of viruses, or all viruses. The user can perform searches for sets of genes based on the nucleotide content or sequence, pI, gene size, and/or codon usage (alone or in any combination) by using the "Sequence Query" window. Data on individual genes and proteins (e.g., predicted molecular weight, pI, and amino acid and nucleotide contents and sequences) from any virus can also be viewed by using this window. Protein hydrophobicity plotting is available by three methods under the "Analysis" pull-down menu (28, 34, 43). The nucleotide composition and predicted amino acid sequence can be used to aid in predicting whether an ORF is authentic (61). In addition, the 100-bp upstream sequence is included in the Sequence Query interface to allow for promoter analysis.
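Two of the per-gene attributes available through the Sequence Query window, nucleotide composition and the 100-bp upstream region, can be sketched in a few lines of Python. This is an illustration only and not the POCs implementation; the function names and the coordinate convention (1-based start on the forward strand) are assumptions made for the example.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> dict:
    """Return the fraction of A, C, G, and T among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {base: counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """Return the combined A+T fraction of a DNA sequence."""
    comp = nucleotide_composition(seq)
    return comp["A"] + comp["T"]


def upstream_region(genome: str, orf_start: int, length: int = 100) -> str:
    """Return up to `length` bases immediately 5' of a forward-strand ORF.

    `orf_start` is the 1-based position of the first base of the ORF
    (an assumed convention). Fewer bases are returned when the ORF lies
    closer than `length` to the start of the genome.
    """
    if not 1 <= orf_start <= len(genome):
        raise ValueError("orf_start is outside the genome")
    begin = max(0, orf_start - 1 - length)
    return genome[begin:orf_start - 1]
```

An ORF on the reverse strand would need the reverse complement of the region following its start coordinate; that case is omitted here for brevity.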
Comparisons of poxvirus DNA sequences and/or protein sequences from one or more viruses can be performed with the National Center for Biotechnology Information BLAST programs TBLASTN, BLASTX, BLASTP, and PSIBLAST (6), which have been integrated into the POCs software. TBLASTN compares a protein sequence to six-frame translations of DNA sequences in the database; BLASTX compares a translated DNA sequence to protein sequences in the database; BLASTP compares the protein sequence to protein sequences in the database; and PSIBLAST does iterative protein-protein sequence searches and often uncovers distant homologies not found by other methods. These BLAST programs allow the user to search for orthologous genes or proteins and to identify ORFs that were not originally annotated by the sequencing group. The programs are also very useful for the detection of possible errors in reported gene sequences by allowing the entire region of an ORF or fragmented ORFs to be viewed and determining whether there is a nucleotide change with conservation of the downstream sequence. If the downstream sequence is conserved intact, it is possible that the reported nucleotide change is an error, whereas truly fragmented genes accumulate further errors over time because there is no longer any selective pressure for sequence conservation (see the discussion of VV-Tan sequencing below).
Comparisons of DNA or protein sequences can also be made in POCs with Jalview (http://www.compbio.dundee.ac.uk/), JDotter (54), Laj (46), and NAP (30). Jalview is a multiple-sequence-alignment program that allows the user to manually edit multiple alignments (which is often necessary) after either local or remote preliminary alignment by CLUSTAL W (57) and T-Coffee (41). JDotter is an alignment program that compares two gene or two protein sequences and displays similar regions in a dot plot via an interactive graphic interface. Laj also generates a dot plot-like picture of two aligned sequences but actually displays a series of local alignments generated by BLAST. These gapped local alignments provide useful information regarding the conservation of DNA sequences. Laj is best at comparing two genomic sequences, whereas JDotter is more useful for comparing smaller regions, i.e., specific genes or proteins. NAP greatly simplifies searching for frameshifts or for insertions or deletions in a DNA sequence because it generates nucleotide-amino acid alignments that allow the user to compare a DNA sequence from one virus to a predicted protein sequence from a different virus and shows point mutations and insertions or deletions.
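The idea behind a dot plot can be illustrated with a minimal exact-word-match sketch. This is not the algorithm used by JDotter or Laj; it simply marks every pair of positions at which two sequences share an identical word of length `k`. The function names and the default word length are assumptions made for the example.

```python
def dot_plot(seq_a: str, seq_b: str, k: int = 3) -> list:
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k]."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Index every word of length k in seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    # Look up each word of seq_a in that index.
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            hits.append((i, j))
    return hits


def render(seq_a: str, seq_b: str, k: int = 3) -> str:
    """Draw the dot plot as text: rows follow seq_a, columns follow seq_b."""
    hits = set(dot_plot(seq_a, seq_b, k))
    rows = []
    for i in range(len(seq_a) - k + 1):
        rows.append("".join("*" if (i, j) in hits else "."
                            for j in range(len(seq_b) - k + 1)))
    return "\n".join(rows)
```

In such a plot, conserved regions appear as diagonal runs, and an insertion or deletion in one sequence shifts the run onto a neighboring diagonal.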
These tools enable the user to compare poxvirus genomes and determine whether genes are fragmented. In addition, POCs aids in predicting whether fragmented genes are likely to produce a truncated but functional protein (depending on the size, conservation, and potential promoter sequences).
Analysis with these tools in POCs is fast and straightforward because the POCs software is devoted to poxvirus genomes. In addition, POCs is able to manage spliced genes; therefore, the software may be used for other large viral genomes, such as those of herpesviruses and baculoviruses.
Gene family organization. We created a POCs database of gene families by grouping related genes based on similarities in BLASTP results. The families were named based on the predicted or known functions of the constituent proteins, as described in poxvirus sequencing studies (5), Fields Virology (40), and original research (12, 14, 38, 53, 56, 58). Gene families were assigned initially by performing a BLASTP analysis of each protein against every protein in every poxvirus. We proceeded based on the expectation that there should be a large set of "essential" genes that are present in each and every poxvirus genome and relatively few families that contain multiple genes from a single genome. Therefore, in automatically generating these families, it was desirable that in most cases, each "essential gene" family (e.g., DNA polymerase) contain one gene from each virus, resulting in families with 21 genes, one from each of the 21 genomes. We performed a large number of trials with different expect (E) values to create the largest number of families meeting this criterion (data not shown). These trials indicated that the largest number of such families was generated by using an E-value of 10⁻¹⁷, and this value was used as the basis for family designation. Thus, the gene families that we have created primarily represent groups of orthologs (hence, the name POCs, for poxvirus orthologous clusters); however, some families contain paralogs (related genes in the same genome). In future versions of the database, we intend to implement a system to annotate the ortholog-paralog relationships.
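The family-building step can be pictured as clustering over all-against-all BLASTP hits, followed by a sweep over candidate E-value cutoffs to find the one that yields the most families with exactly one gene from every genome. The sketch below assumes single-linkage clustering, which the text does not state was the rule used by POCs. The data layout (gene identifiers written as `virus:gene`, hits as `(gene, gene, E-value)` tuples) and all function names are hypothetical.

```python
from collections import defaultdict


def cluster_families(genes, hits, cutoff):
    """Single-linkage clustering: join two genes if a hit has E <= cutoff."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the cluster root, shortening the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b, e_value in hits:
        if e_value <= cutoff:
            parent[find(gene_a)] = find(gene_b)

    families = defaultdict(set)
    for gene in genes:
        families[find(gene)].add(gene)
    return list(families.values())


def is_one_per_genome(family, genomes):
    """True if the family holds exactly one gene from each genome."""
    viruses = [gene.split(":")[0] for gene in family]
    return len(viruses) == len(genomes) and set(viruses) == set(genomes)


def best_cutoff(genes, hits, genomes, candidates):
    """Return (cutoff, count) maximizing one-gene-per-genome families."""
    best = None
    for cutoff in candidates:
        families = cluster_families(genes, hits, cutoff)
        count = sum(is_one_per_genome(f, genomes) for f in families)
        if best is None or count > best[1]:
            best = (cutoff, count)
    return best
```

A cutoff that is too permissive merges unrelated families through weak hits, whereas one that is too strict splits true orthologs apart; both reduce the count, which is why an intermediate value is selected.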
Each POCs gene family was then manually inspected (using the various tools built into POCs) to verify correct assignment of the poxvirus orthologs. This was accomplished by analyzing every family with the TBLASTN, BLASTX, BLASTP, PSIBLAST, Jalview, and NAP tools. The creation of the families made it simple to identify genes that were conserved in all but a few virus genomes, and further investigation of this group of genes identified errors in GenBank files. During this analysis, an annotation error was found in the fowlpox virus (FPV) GenBank file for the rifampin resistance gene, FPV-050. The GenBank file listed the gene ORF as a fragment between bp 52647 and 52856 (210 bp). However, POCs TBLASTN analysis of related poxvirus rifampin resistance genes indicated that the FPV gene started at bp 52914 and stopped at bp 54569 (1,656 bp). Thus, in the original GenBank annotation, it appeared that FPV did not contain a functional counterpart of this gene, suggesting that it is not essential for poxvirus replication. However, with POCs we have shown that this gene is predicted to be full length and functional in FPV and is therefore present in every poxvirus genome sequenced to date. A similar error was found for the myxoma virus (MYX) gene 077L, where TBLASTN analysis showed the ORF as being located between bp 75602 and 75171 (432 bp) and the GenBank file reported the ORF as being located between bp 75735 and 75556 (180 bp). These errors have been corrected in the POCs database and were reported to the authors for correction of the GenBank files. It is extremely important that these errors are detected and corrected; otherwise, they are propagated throughout other public databases and may influence experiments that depend on database sequence information. Thus, the above examples highlight the utility of the POCs software package for scientific analysis because this database is updated and corrected by poxvirus researchers.
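The ORF lengths quoted above follow from treating both coordinates as inclusive, and each is a whole number of codons. The short check below uses only the coordinates given in the text; the helper name is hypothetical.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of an ORF given inclusive end coordinates (either strand)."""
    return abs(stop - start) + 1


# Coordinates as reported in the text for the two corrected annotations.
coordinates = {
    "FPV-050, GenBank annotation": (52647, 52856),
    "FPV-050, TBLASTN analysis": (52914, 54569),
    "MYX 077L, TBLASTN analysis": (75602, 75171),
    "MYX 077L, GenBank annotation": (75735, 75556),
}

for name, (start, stop) in coordinates.items():
    length = orf_length(start, stop)
    print(f"{name}: {length} bp, multiple of 3: {length % 3 == 0}")
```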
Since MCV and the entomopoxviruses have diverged significantly from the chordopoxviruses, we searched for more distantly related orthologs in these viruses. Conserved gene families that did not contain MCV or entomopoxvirus gene members were used to specifically search these genome sequences for orthologs. If an MCV or entomopoxvirus BLASTP hit was found, regardless of the E-value, the MCV or entomopoxvirus gene was then used to search against the POCs database. If gene homology to additional chordopoxvirus genes in the same family was found and the entomopoxvirus or MCV gene had higher homology to this family than to other chordopoxvirus gene families, the gene was included in the poxvirus gene family. Manual alignment analysis, considering conservation of the most highly conserved amino acid residues in the family and hydrophobic regions, was used as the final criterion for family designation. The criterion for inclusion of a gene in a family, called the "family assignment," is available from the "Family View" window in POCs (BLASTP or manual), so that users will know how the gene was assigned.
We have identified a number of entomopoxvirus genes (including homologs of the VV-Cop A9L, E6R, H3L, and L5R genes) that we have manually placed in the larger orthopoxvirus families (Table 2), despite the fact that they have BLASTP E-values above 10⁻¹⁷. Figure 1 shows an example of one such alignment. Several of the more diverse members of the POCs family containing VV-Cop L5R (putative membrane protein) were aligned by PEPTOOL, available at PBR (www.poxvirus.org). We have manually included the entomopoxviruses in this family based on a significant number of absolutely conserved amino acids, including two cysteines; a large well-conserved hydrophobic domain; and similarity in gene length. The two entomopoxvirus proteins shown in Fig. 1 are 43.6% identical, but they are no more than 24% identical to other proteins in this alignment. Some highly conserved gene families (Table 3) do not currently contain entomopoxvirus members. For example, the family containing the VV-Cop A20R gene does not include entomopoxvirus genes, although this family is expected to encode a very important DNA polymerase processivity factor. A BLASTP analysis of entomopoxviruses uncovered several different entomopoxvirus genes with E-values below 1 (one hit was 10⁻⁷); however, further BLASTP analysis of these hits showed that the identified entomopoxvirus genes had higher homology to individual members of several other gene families than to the A20R gene family. Furthermore, PSIBLAST did not reveal any orthologs, and analysis of multiple alignments did not demonstrate the conservation of amino acid residues that were conserved in all other members of the family. These results do not mean that there is no VV-Cop A20R homolog in entomopoxviruses but rather that it cannot be recognized by these similarity search methods.
While these family designation criteria are subjective in nature and may or may not prove to be functionally relevant, we have nonetheless attempted to classify MCV and entomopoxvirus genes into suitable families when possible. These designations will provide a scaffold for testing of hypotheses, and the families will be updated as biochemical and functional data become available.
TABLE 2. Completely conserved gene families
FIG. 1. Alignment of entomopoxvirus protein sequences with six widely diverged chordopoxvirus protein sequences in the VV-Cop L5R family. Shading shows the most highly conserved regions of the proteins (darker shading indicates more conservation). The grey bar shows a well-conserved hydrophobic domain, and black squares indicate cysteines that are conserved in every protein in this family. YLDV, Yaba-like disease virus; AmEPV, Amsacta moorei entomopoxvirus.
TABLE 3. Gene families conserved in chordopoxviruses
Gene family analysis. In addition to querying the database for a specific gene, it is also possible to query for family information. Within POCs, the user may access the family of any gene and immediately view all poxvirus orthologs from either the Sequence Query or the "Gene Family Analyzer" window. The POCs Gene Family Analyzer interface also allows the user to perform several complex queries. The user can search for the family containing a specific gene (by family name, family number, or ORF designation in any virus), families conserved in a certain number of virus genomes, families containing a certain number of genes, or families that contain or do not contain genes from a particular viral genome. Any of these queries will result in a table with links to all of the data for any requested family or gene. From these queries, the user can compare genes that are present in one virus but not in another, compare genes within a family, and search for fragmented genes. For example, a query to retrieve families that contain genes from MYX and Shope rabbit fibroma virus (SFV) but not swinepox virus (SPV) and lumpy skin disease virus (LSDV) takes only a few seconds to perform. This search identifies 13 gene families and lists all of the viruses and all of the genes that are members in the 13 families. The advantage of displaying the data in this format and detail is that the user is informed as to whether these families have been identified because (i) the genes in the family are present only in SFV and MYX or (ii) the genes are present in many viruses but absent from SPV and LSDV. Similarly, it is possible to search for poxvirus genes required for infecting mammalian hosts as opposed to avian hosts. 
A query to identify genes present in viruses that infect mammalian hosts (Yaba-like disease virus, GenBank LSDV, monkey poxvirus, SPV, camel poxvirus, VV-Cop, variola virus [Ind, Bang, and Gar], SFV, MYX, and MCV) but not present in FPV results in the software retrieving three families (families 1309 and 1537, of unknown functions and containing VV-Cop F16L and VV-Cop A37R, respectively; and family 1541, containing the VV-Cop A33R envelope protein) (45). This result suggests that these three genes may be specifically important in poxvirus infection of mammalian hosts but not avian hosts.
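The presence and absence queries described above amount to set operations on the viruses represented in each family. The following sketch shows one way such a query could be expressed. It is not the POCs query code, the function name is hypothetical, and the toy families are invented for illustration and do not correspond to real POCs families.

```python
def query_families(families, present=(), absent=()):
    """Return ids of families with genes from every virus in `present`
    and from no virus in `absent`.

    `families` maps a family id to a list of (virus, gene) pairs.
    """
    present, absent = set(present), set(absent)
    selected = []
    for family_id, members in families.items():
        viruses = {virus for virus, _gene in members}
        if present <= viruses and not (absent & viruses):
            selected.append(family_id)
    return sorted(selected)


# Invented toy data; gene names are placeholders.
toy_families = {
    1: [("MYX", "m1"), ("SFV", "s1")],
    2: [("MYX", "m2"), ("SFV", "s2"), ("SPV", "p2")],
    3: [("MYX", "m3"), ("SFV", "s3"), ("VV-Cop", "c3")],
    4: [("FPV", "f4")],
}

print(query_families(toy_families,
                     present=("MYX", "SFV"),
                     absent=("SPV", "LSDV")))
```

Returning the full membership of each selected family, as POCs does, lets the user distinguish a family restricted to MYX and SFV (family 1 in the toy data) from one that is widespread but missing from SPV and LSDV (family 3).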
Further analysis of genes and families may be easily done with POCs. By selecting the POCs page displaying information on these retrieved families, it is possible to rapidly evaluate all of the members of the family. For instance, by selecting the VV-Cop A37R family, it is immediately apparent that this family contains genes whose products are truncated to 68 amino acids in the three published variola virus sequences, compared to about 270 amino acids in most other chordopoxviruses. This truncation suggests that A37R is not required for infection in mammals; however, experimental data are required to determine the functionality of fragmented proteins and proteins of different sizes. POCs provides the most comprehensive suite of tools in one location for the analysis of poxvirus genomes, genes, and proteins and can provide invaluable assistance in the formation and investigation of experimental hypotheses.
Conserved gene families. Of the 300 families generated by the POCs database, there are 49 families that are conserved in all 21 poxvirus genomes (Table 2), suggesting that these genes are essential in the poxvirus life cycle. As might be expected, many of the completely conserved genes are known to be involved in DNA replication and transcription (25 of 49), and 15 of the gene family products are associated with virions, virion assembly, or maturation. Notably, 12 of the putative essential conserved genes are of unknown function, thus highlighting genes that require functional characterization.
A total of 41 additional gene families are conserved only among the chordopoxviruses (Table 3), without clear orthologs in the entomopoxviruses. Eleven of these chordopoxvirus conserved gene families are responsible for replication and transcription; 17 families are associated with virions, virion morphogenesis, or egress; and 13 families have unknown functions. The tyrosine-serine phosphatase (VV-Cop H1L) is included in this list, although it is present in 20 of 21 viruses and missing only from Melanoplus sanguinipes entomopoxvirus (MsEPV). A total of 90 genes are completely conserved in the chordopoxviruses, and we hypothesize that these genes comprise the minimum essential chordopoxvirus genome. Since VV-MVA is a highly passaged and attenuated virus, we hypothesized that it might be missing some genes that are conserved in all natural pathogens; however, no genes were found to be conserved in all chordopoxviruses but absent from VV-MVA.
Interestingly, two gene families, those for deoxyuridine triphosphatase (VV-Cop F2L) and thymidine kinase (VV-Cop J2R), were found to be conserved in all poxviruses except for MCV and MsEPV. Both of these genes are involved in nucleotide metabolism, and it has been hypothesized that MCV does not require these two enzymes because it replicates slowly in rapidly dividing skin cells, which are expected to contain a high concentration of DNA nucleotide precursors (47). Similarly, MsEPV infection begins in the midgut, which also contains a preponderance of rapidly dividing cells (2). The VV-Cop B1R serine-threonine kinase family contains genes from 20 of the 21 viruses and is missing only from MCV. As such, this family is not listed in either the completely conserved or the chordopoxvirus conserved families (Tables 2 and 3). The reasons why such genes would be so highly conserved in insect poxviruses and chordopoxviruses but missing from MCV are unknown; however, the unique life cycle and slow replication of MCV may again hold the answer. The serine protease inhibitor (SPI) family genes are also represented in all chordopoxviruses except for MCV. The skewed AT genome contents of MCV and the entomopoxviruses (36 and 82%, respectively, compared to 66.6% for the orthopoxvirus VV-Cop) confound homology searching. However, we have manually searched for homologs and evaluated the best matches as described above, regardless of E-values.
It has long been noted that essential conserved genes tend to be located in the central regions of genomes. Figure 2 shows the first systematic analysis of conserved gene locations and confirms that all 49 conserved putative essential poxvirus genes and the 41 conserved chordopoxvirus genes from Tables 2 and 3, respectively, are located in the central region of the VV-Cop genome. The terminal regions of the genome contain the majority of the virulence and host range genes. These genes are not as highly conserved among the orthopoxviruses as the genes in the central region of the genome. Genome maps and information on all of the gene families (including those not listed in the tables presented here) are available at www.poxvirus.org.
FIG. 2. HindIII restriction map of VV-Cop with fragments A to P and genome positions numbered from left to right. Colored bars indicate ORFs and the number of viruses in which an ortholog is conserved. Dark blue bars indicate the 49 genes conserved in all poxviruses (Table 2), and lighter blue bars indicate genes conserved in at least 19 of the 21 genomes, including the 41 genes listed in Table 3. ORFs transcribed rightward are shown above the line, and ORFs transcribed leftward are shown below the line. A few of the larger ORFs are labeled for easy reference.
We have worked from the principle that each genome contributes a single gene to each essential gene family. Poxvirus genomes do, however, contain some multigene families, resulting in families that contain more than one gene from an individual viral genome. These are true nonfragmented (full-length) genes, but it is not clear whether they have evolved by gene duplication to produce paralogs or represent multiple gene acquisition events. One such family is the SPI family, which contains 47 genes from 18 genomes. It is known that there are at least three types of SPIs; however, when BLASTP analysis is carried out for each SPI protein, all other known SPI proteins are found by the search with an E-value of less than 10⁻¹⁷. This situation results in the database grouping all SPI genes into one family. Since all three types of SPIs tend to be distantly but equally related, it is difficult to reliably separate the SPI proteins into three families; therefore, we have chosen to place all of the SPI proteins in a single family for the time being. Additional information, such as gene location (synteny), will be used in future work to aid in the classification of orthologs and paralogs.
VV-Tan sequencing and analysis. Analysis of the families indicated that 19 genes in the VV-Tan 1998 GenBank file had significant changes relative to conserved genes in variola virus strains and VV-Cop. Thirteen of these gene families were missing genes from VV-Tan. The TBLASTN function built into POCs permitted a rapid analysis of corresponding VV and variola virus genes to find the homologous DNA region in VV-Tan. Of the 13 genes missing from the GenBank file, we found 10 (TD4R, TL5R, TA37L, TF15L, TG9R, TA62R, TA65R, TB20R, TA63R, and TK5L) in VV-Tan that were listed under "miscellaneous features" and were not fully annotated in this submitted but unpublished sequence. These genes have been added to the POCs database but are still not annotated in the 1998 GenBank file and are therefore unlikely to be represented in any other public database (Table 4). The remaining 3 of the 13 missing VV-Tan genes (corresponding to VV-Cop A2.5L [thioredoxin], VV-Cop F12L [actin or microtubule-associated protein], and VV-Cop G6R [unknown function]) were reported in the GenBank file to have highly conserved DNA sequences over the full length compared to the VV-Cop genes. However, the VV-Tan genes in the GenBank file were reported to have nucleotide insertions or deletions causing premature stop codons (Table 4). Since these three genes are conserved in all chordopoxviruses (Tables 2 and 3), we decided to resequence these genes with modern techniques and equipment. We sequenced the regions of the reported frameshifts in the VV-Tan genes. Analysis of our sequence data (Table 4) showed that these genes were in fact not truncated in the genome of the separately passaged VV-Tan strain from CDC.
TABLE 4. Annotation and sequence updates to VV-Tan genes
We have used gene conservation in POCs to identify areas for annotation and sequencing updates in available genomes. Notably, the 1998 GenBank VV-Tan genome file has been updated. POCs is helpful in this process because it identifies genes that are absolutely or highly conserved in a certain group of viruses and therefore flags genomes of closely related viruses for which a truncation or fragmentation has been reported. These areas may be sequenced to verify accuracy.
Table 3 lists 41 genes conserved in all chordopoxviruses but not entomopoxviruses. A number of the chordopoxvirus conserved genes have what might be predicted to be essential functions in the virus life cycle, including transcription, replication, virion formation, and egress. It is possible that there remain unrecognized, highly diverged orthologs in AT-rich entomopoxvirus genomes. Alternatively, insect cells may possess various complementing factors that are absent from vertebrate hosts. In the completely conserved orthologs, genes encoding transcriptional and replicative functions are more common than genes involved in morphogenesis (at a 5:3 ratio). However, in the chordopoxvirus conserved group, this pattern is reversed, with relatively more genes being involved in morphogenesis than in RNA or DNA synthesis (at a ratio of 11:17). This pattern suggests that replication and transcription are more broadly conserved across genera than morphogenesis. This finding may be expected because morphogenesis and egress are likely more host specific, since the processes rely substantially on host membranes and host proteins.
The large number of highly conserved genes (24) without functional characterizations highlights areas for future experimental research. It should be noted that gene families were first constructed by using BLAST E-values and that these scores are dependent on the length of the query sequence (gene size). Therefore, it is possible that smaller genes were less likely to be identified as family members due to this computational bias. The concept of bias is supported by the fact that the average molecular masses of proteins in the completely conserved and chordopoxvirus conserved gene families are 49 and 25 kDa, respectively. Larger entomopoxvirus genes may have been more easily recognized as family members than smaller genes. However, the molecular mass range in completely conserved genes is 11 to 147 kDa, and that in chordopoxvirus conserved genes is 6 to 86 kDa, so both Tables 2 and 3 contain a range of gene sizes. Furthermore, each family was manually assessed by comparing conserved amino acid patterns, and even very small and distantly related orthologs (e.g., the 8.9-kDa entomopoxvirus protein member of the A9L family) were assigned to families. An additional reason that entomopoxvirus orthologs may be difficult to identify is that entomopoxviruses may have acquired genes independently from other sources. A precedent for this situation appears to exist in the uracil DNA glycosylase (UDG) family. The entomopoxvirus UDG is more similar to bacterial and herpesvirus UDGs than to the chordopoxvirus UDG, suggesting that it was acquired independently. Hence, there are several reasons why entomopoxvirus orthologs may not be recognized.
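The length dependence of E-values can be made concrete with the standard BLAST relation between the expect value and the bit score, E = m × n × 2^(−S′), where m is the query length and n is the database length. The sketch below ignores the effective-length corrections that BLAST applies, and the database size of 10⁶ residues is a hypothetical figure chosen only for illustration; the function names are also assumptions.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: float) -> float:
    """E = m * n * 2**(-S') for bit score S' (no length corrections)."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(cutoff: float, query_len: int, db_len: float) -> float:
    """Bit score at which the expect value equals `cutoff`."""
    return math.log2(query_len * db_len / cutoff)


# Hypothetical database size in residues, for illustration only.
DB_LEN = 1e6

for query_len in (50, 500):
    bits = min_bit_score(1e-17, query_len, DB_LEN)
    print(f"{query_len} aa query: {bits:.1f} bits needed, "
          f"{bits / query_len:.2f} bits per residue")
```

Because the required bit score changes only logarithmically with query length, a short protein must score far more bits per residue than a long one to reach the same cutoff, which is consistent with the bias discussed above.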
Our ability to reliably predict whether an ORF will encode a functional protein remains poor. Many gene fragments have been reported for poxvirus genomes, and these may or may not be functional. Gene fusion events resulting in multidomain, composite proteins comprise a major pathway of evolution; therefore, the presence of a fragmented or smaller gene in one genome and a much larger or fused gene in another genome does not mean that the smaller or fragmented gene is nonfunctional (22, 64, 67). One study comparing 23 genomes of eukaryotes, eubacteria, and archaebacteria identified 7,224 domains present both as independent genes and as gene fusions (22). Increased regulatory efficiency is believed to provide a selective advantage for gene fusion events. It is also known that genes may be lost through mutations in the absence of selective pressure, and gene fission events are also possible (8, 22). Various POCs tools may be used to evaluate the likelihood of an ORF actually encoding a protein; these include analyses of the upstream promoter region and nucleotide and amino acid compositions.
The POCs database will be updated regularly, and we will also respond to user requests for changes and new annotations. The database and family identifications should be very useful for antipoxvirus drug development research. The most broad-spectrum antipoxvirus therapies should be directed against the 90 completely or chordopoxvirus conserved gene products. The absolute conservation of the genes for these products identifies them as both essential and broadly conserved; thus, they are the ideal therapeutic targets. Their conservation indicates that they are under strict selective pressure. Since these genes are more mutationally constrained by their functional requirements than are nonessential genes, the use of the conserved proteins encoded by these genes as drug targets would provide the best chances of reducing the development of drug-resistant strains. In addition, vaccines and antibody therapeutic agents might also be directed against these conserved target proteins, which would be the most likely targets to provide cross-protection between virus species. Finally, these conserved genes would be ideal for the detection of poxviruses in environmental or clinical samples, as all known poxviruses infecting vertebrates share these genes. If a poxvirus were identified, then further testing would indicate what species was present. Thus, POCs software and POCs gene families should provide a useful tool for poxvirus researchers.
There are two other public domain resources that have made attempts to cluster poxvirus proteins into families. Like POCs, Clusters of Related Viral Proteins at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/crp_start.html) and the Virus Database at University College London (http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html) both use BLASTP alignments for clustering. These programs, however, are not dedicated to poxviruses, and although they attempt to provide a resource for all viruses, we have found that the families that they create are not complete. In addition, these databases do not provide the rich variety of database queries or analysis tools that are available in POCs.
We thank Graeme Roch, Melissa Da Silva, Monika Fazekas, Ryan Brody, David Meeuwis, and Ross Gibbs for expert technical assistance and discussions and Geoff Smith and Bart Hazes for helpful discussions regarding family assignments.