ABSTRACT
Until fairly recently, genome-wide evolutionary dynamics and within-host diversity were more commonly examined in the context of small viruses than in the context of large double-stranded DNA viruses such as herpesviruses. The high mutation rates and more compact genomes of RNA viruses have inspired the investigation of population dynamics for these species, and recent data now suggest that herpesviruses might also be considered candidates for population modeling. High-throughput sequencing (HTS) and bioinformatics have expanded our understanding of herpesviruses through genome-wide comparisons of sequence diversity, recombination, allele frequency, and selective pressures. Here we discuss recent data on the mechanisms that generate herpesvirus genomic diversity and underlie the evolution of these virus families. We focus on human herpesviruses, with key insights drawn from veterinary herpesviruses and other large DNA virus families. We consider the impacts of cell culture on herpesvirus genomes and how to accurately describe the viral populations under study. The need for a strong foundation of high-quality genomes is also discussed, since it underlies all secondary genomic analyses such as RNA sequencing (RNA-Seq), chromatin immunoprecipitation, and ribosome profiling. Areas where we foresee future progress, such as the linking of viral genetic differences to phenotypic or clinical outcomes, are highlighted as well.
INTRODUCTION
Herpesviruses infect and affect every human on the planet, with a universally penetrant global public health impact (1–4). Most adult humans carry one or more of the nine herpesvirus species that infect humans. These include the alpha-subfamily members herpes simplex virus 1 and 2 (HSV-1/2 or human herpesvirus 1 and 2 [HHV-1/2]) and varicella-zoster virus (VZV or HHV-3); the beta-subfamily of human cytomegalovirus (HCMV or HHV-5) and human herpesviruses 6A, 6B, and 7 (HHV-6A/6B/7); and the gamma-subfamily of Epstein-Barr virus (EBV or HHV-4) and Kaposi's sarcoma-associated herpesvirus (KSHV or HHV-8) (5). Our understanding of the molecular biology and evolution of these herpesviruses has been informed by work on other large DNA viruses such as poxviruses of humans and animals; multiple families of bacteriophage, baculoviruses, and other insect viruses; and amoebal giant viruses such as mimivirus (6).
Here we review recent studies that have used high-throughput sequencing (HTS) and genome-wide analyses to explore the diversity and evolution of herpesviruses. In the first half of this review, we examine data suggesting that the diversity and evolution of herpesviruses are impacted by mechanisms extending beyond the usual consideration of polymerase fidelity. These include the influence of minority alleles and standing variation in the virus population, recombination between viral genomes, horizontal gene transfer, and nontemplated mechanisms such as ribosome frameshifting and RNA editing (Fig. 1 and 2). We also consider how these mechanisms of variation impact our ability to manipulate herpesviruses in culture. In the second half of this review, we summarize the tremendous gains in identifying the genetic diversity found among the members of each species of human herpesvirus. We explore how to accurately describe the viral populations that researchers handle experimentally. The necessity for a strong foundation of high-quality genomes—and the bioinformatics tools that enable their production—are also discussed. Many new insights have been enabled by the application of HTS and bioinformatics to herpesvirus genomes, and we end with the challenges that lie ahead.
FIG 1 Opportunities for change in a given viral population arise both in vivo and in vitro. (A) The viral population in an infected individual (represented by red or red-shaded virion) may change over time due to immune selection or the accumulation of genetic drift. Bottlenecks at transmission to a new host or during introduction to tissue culture may allow a new genotype to become prevalent, akin to a genetic shift. (B) A viral stock grown in culture is also a viral population, which may undergo changes during introduction to an animal model or through plaque purification. See Fig. 2 for an expanded view of the genomes contained in the viral population.
FIG 2 Viral genomes with subtle variations contribute to the overall viral population and enable change over time. A viral population may contain minor variants (A) that remain unnoticed until selective pressures or bottlenecks reveal them (B). Deep sequencing approaches can reveal minor variants in the overall viral population, but most HTS approaches report only the consensus genome of the population. The consensus genome is a summary of the most common variants (e.g., those indicated by orange and blue stars) found in a majority of the sequenced genomes, but that exact genotype does not necessarily predominate in nature. As shown in the exaggerated example in panel A, the consensus genome (thick gray line) contains variants that are found in the majority of genomes (thin gray lines) but that are found only rarely in the same genome. Minor variants or alleles (e.g., those indicated by green or orange stars) are not included in the consensus genome at all, but a transmission bottleneck or subsequent selective pressure may lead to a minor variant becoming the majority genotype in the future (B). Recombination can also create entirely new genotypes, which can become dominant through bottlenecks or external selective pressures. Gene accordions, as demonstrated in vaccinia virus, result from expansion and subsequent variation of a gene under strong selective pressure.
(IN)STABILITY OF LARGE DNA VIRUSES
The perception of most virologists is that RNA viruses are inherently variable and that DNA viruses are inherently stable (7, 8). At the molecular level, this view stems from the lack of error correction by most RNA-dependent RNA polymerases. In contrast to RNA viruses, most DNA viruses display high polymerase fidelity and error correction. For herpesviruses such as HSV-1, early studies of mutation rates focused on single genes and detected mutation rates in the range of ∼1 × 10^-7 to ∼1 × 10^-8 mutations per base per infectious cycle (9, 10). These rates are often quoted in comparisons of RNA viruses to DNA viruses (7, 8, 11) and are matched by restriction-fragment length polymorphism (RFLP) analyses comparing herpesviruses from disparate geographic locations (12). However, these data fail to explain the surprising ease with which herpesvirus variants can be selected or revealed (Fig. 1). For instance, drug-resistant mutants can be selected from drug-sensitive HSV-1 and HSV-2 populations at a rate of 1 in 10^4 or 1 in 10^3 PFU (13). This suggests that the level of standing variation in herpesvirus populations may be higher than previously appreciated (14) and/or that the different time scales of experimental settings versus evolutionary comparisons are at odds (15).
There are data to support both the hypothesis of standing variation and that of differing time scales. Using HTS approaches, several studies have described the existence and expansion of standing variation in HCMV populations in nonimmunocompetent hosts, such as congenitally infected infants and immunosuppressed or transplant patients (16–20). In a short-time-scale investigation of Muller's ratchet (the hypothesis that small asexual populations accumulate deleterious mutations), Jaramillo et al. took 10 individual subclones of HSV-1 and subjected each to repeated population bottlenecks through sequential plaque-to-plaque transfers (21). Two clonal lineages were completely lost during this process, and single-gene analysis of the remaining clones revealed a mutation frequency of 3.6 × 10^-4 substitutions per base per plaque transfer. The authors also found reduced mortality after intracerebral inoculation into mice for these serially passaged clones (21). Even in the absence of intentional selective pressure, genome-wide HTS comparisons of HSV-1 and HCMV subclones have revealed nucleotide variations in up to 3% to 4% of the genome (22–25). A higher-than-expected frequency of observed mutations in herpesvirus populations was also found in a recent application of phylodynamic inference to DNA viruses, which estimated the substitution rate of HSV-1 to be ∼1 × 10^-5 or ∼1 × 10^-4 mutations per base per year (15). Many of the studies cited above have focused on HSV-1 as a model herpesvirus, but there may well be subfamily- or species-specific differences in the amount of standing variation in the population, the number of replicative cycles per year, and/or the selection pressures faced during viral transmission in real-world settings. Innovative combinations of genome-wide HTS applications, with models that account for positive selection and standing variation, will be needed to bring these diverse data into synchrony.
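To make the gap between the per-cycle and per-year rates concrete, the following is a back-of-envelope sketch in Python. It is not an analysis from any of the cited studies. The rates are those quoted in the text; the genome length of roughly 152 kb for HSV-1 and the function names are assumptions introduced here for illustration, and the calculation deliberately ignores selection, standing variation, and bottlenecks.

```python
# Back-of-envelope arithmetic only. Rates are those quoted in the text; the
# genome length is an approximate value assumed for illustration.

HSV1_GENOME_BP = 152_000  # assumed approximate HSV-1 genome length (bp)


def expected_mutations_per_genome(rate_per_base: float, genome_length_bp: int) -> float:
    """Expected new mutations in one genome copy, per unit of the given rate."""
    return rate_per_base * genome_length_bp


def implied_cycles_per_year(rate_per_base_per_year: float,
                            rate_per_base_per_cycle: float) -> float:
    """Infectious cycles per year needed for the per-cycle rate to add up to
    the per-year rate, if every mutation were neutral and retained."""
    return rate_per_base_per_year / rate_per_base_per_cycle


per_cycle_low = expected_mutations_per_genome(1e-8, HSV1_GENOME_BP)
per_cycle_high = expected_mutations_per_genome(1e-7, HSV1_GENOME_BP)
cycles_needed = implied_cycles_per_year(1e-5, 1e-8)

print(f"Expected mutations per genome per cycle: "
      f"{per_cycle_low:.5f} to {per_cycle_high:.4f}")
print(f"Cycles per year implied by 1e-5/year and 1e-8/cycle: about {cycles_needed:.0f}")
```

Under these simplifying assumptions, well under one new mutation is expected per genome copy per infectious cycle, and on the order of a thousand cycles per year would be needed to reconcile the lowest per-cycle rate with the lowest per-year estimate. The sketch only illustrates why the two kinds of measurement are hard to compare directly; it does not indicate which explanation is correct.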
RECOMBINATION AS A DRIVING FORCE IN DNA VIRUS EVOLUTION
Mutation and evolution in herpesviruses result not only from base substitutions but also from recombination between strains and, to a lesser extent, between species. Recombination in herpesviruses can provide a driving force for evolutionary shifts, akin to that associated with reassortment in segmented RNA viruses (26). HTS studies of laboratory-generated recombinants of HSV-1 have revealed a bias toward breakpoints being detected in repetitive tracts, intergenic regions, and areas of higher G+C content (27). However, most studies of recombination in human herpesviruses have focused on naturally circulating variants and have inferred historical sites of recombination and phylogenetic relationships from the comparison of disparate strains. For example, increased genomic surveillance of VZV has expanded the number of known phylogenetic clades for this species in recent years; this has provided evidence for ancient, interclade recombination as well as for modern recombination between individual strains (28–32). For the beta-herpesvirus HCMV, multiple groups have confirmed the finding of rampant genome-wide recombination among different HCMV strains (19, 33, 34). Lassalle et al. focused most deeply on this aspect of HCMV evolution and found that particular sections or islands of the HCMV genome appeared to cosegregate, whereas widespread recombination between strains was detected everywhere else in the genome (34). The authors postulated that genes in these islands are codependent, thus enabling higher fitness and a selective advantage for genomes that lack recombination events inside these regions.
In the case of the alpha-herpesvirus HSV-1, there is evidence of rampant recombination between different isolates or strains, but not yet sufficient data to discern whether the recombination events are ancient or extant (35, 36). Evidence for ancient recombination between HSV-1 and the distantly related species HSV-2 has been found by two separate groups (37, 38), who recently described several loci in the HSV-2 genome that contained HSV-1-like DNA. This inference is based on the high similarity of these regions to extant HSV-1 genomes and on their divergence from a unique and apparently historical HSV-2 genotype that has thus far been found only in Africa (39, 40). Additional evidence of modern recombination in natural settings stems from the veterinary herpesvirus literature (41). In 2012, Lee and colleagues demonstrated that a virulent avian herpesvirus that had created an outbreak in Australian poultry was in fact a spontaneous recombinant resulting from two live-attenuated vaccines that were both in use (42). This example is bolstered by multiple others among the veterinary herpesviruses which have been reviewed recently elsewhere (41). The next challenge for understanding herpesvirus recombination events is to address where and how often they occur in vivo, since recombination requires the co-occurrence of two distinct viral genomes in a single cell of the same host—an event which may be rare and difficult to detect in clinical or field settings.
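The idea of locating recombination breakpoints by comparing a recombinant with its parental strains can be sketched in a few lines of Python. This is a toy illustration, not the method used in the studies cited above: it assumes three pre-aligned sequences of equal length with exactly two known parents, and the function name `infer_breakpoints` is hypothetical. Breakpoints can only be narrowed to the interval between two parent-informative sites, which is one reason that inferred breakpoints depend on where the parents differ.

```python
def infer_breakpoints(parent_a: str, parent_b: str, recombinant: str):
    """Assign each parent-informative site to parent 'A' or 'B' and report the
    intervals (left_site, right_site) between which the origin switches."""
    if not (len(parent_a) == len(parent_b) == len(recombinant)):
        raise ValueError("sequences must be aligned to the same length")

    informative = []  # list of (position, parent label)
    for pos, (a, b, r) in enumerate(zip(parent_a, parent_b, recombinant)):
        if a == b:
            continue  # site cannot distinguish the two parents
        if r == a:
            informative.append((pos, "A"))
        elif r == b:
            informative.append((pos, "B"))
        # otherwise the recombinant carries a third allele; skip the site

    breakpoints = []
    for (p1, origin1), (p2, origin2) in zip(informative, informative[1:]):
        if origin1 != origin2:
            breakpoints.append((p1, p2))
    return informative, breakpoints


# Toy sequences: the parents differ at positions 1, 4, 7, and 10 (0-based).
parent_a = "ACGTACGTACGT"
parent_b = "AGGTTCGAACTT"
recombinant = "ACGTACGAACTT"  # left half from parent A, right half from parent B

sites, breaks = infer_breakpoints(parent_a, parent_b, recombinant)
print(sites)   # [(1, 'A'), (4, 'A'), (7, 'B'), (10, 'B')]
print(breaks)  # [(4, 7)]
```

In natural isolates the parental genomes are usually unknown, so studies of circulating strains rely on phylogenetic inference rather than a direct comparison of this kind.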
LOSING OR GAINING GENE FUNCTIONS IN CULTURE
Our ability to conduct experimental studies of herpesviruses, and to develop therapeutics and vaccines, depends vitally on cell culture techniques. However, cell culture can induce unintentional selective pressure on viral populations, as has been recognized most notably in the case of HCMV. Prior studies revealed that laboratory strains of HCMV such as AD169 and Towne had not only accumulated minor changes associated with genetic drift but also lost multiple genes during their adaptation to cell culture (22, 24, 43). The regions lost in vitro had functions associated with cell tropism and immune evasion in vivo. Although no similar link between frequent deletions and loss of in vivo-specific functions has yet been discovered for other herpesviruses, a tendency for loss of specific genomic regions has been observed. Frequent deletion of the UL55-UL56 region has been observed in cultured HSV-1 strains, although the phenotypic impact of this loss is unknown (36). The loss of genomic regions has also been demonstrated in other large DNA viruses such as mimivirus, which undergoes gene loss from its termini during repeated passage in amoebal culture (44). There is a pressing need to document the nature of any changes that occur during herpesvirus introduction to culture and subsequent passages thereafter, so that the accumulation of genetic drift and/or the selective pressures of cell culture can be better understood.
Although viral propagation in cell culture can induce the loss of gene functions required for in vivo growth, it can also facilitate experimental insights by revealing transient genome intermediates in the process of viral adaptation. This was demonstrated in a recent study using HTS and comparative genomics in the poxvirus vaccinia virus (VACV), which relies on two viral antagonists to combat the host antiviral protein kinase R (PKR) (45). After experimental deletion of one viral PKR inhibitor, the viral genome population developed an accordion-like expansion of the other inhibitor (Fig. 2B). Variations then arose and were positively selected in the extra copies of this PKR antagonist. These variants tended to remain in the progeny viral population even when the accordion-like gene array collapsed. The examination of genome content after each round of viral replication in culture revealed the existence of these intermediates in viral evolution. These data raise the intriguing question of whether similar mechanisms could occur in herpesviruses. Further investigation of herpesvirus adaptation to selective pressure, with analyses performed at frequent intervals throughout positive selection, will be required to test whether intermediates of viral evolution can be detected for herpesviruses as well.
HOST-VIRUS HORIZONTAL GENE TRANSFER
Horizontal gene transfer (HGT), or the movement of genetic material between unrelated organisms, provides another avenue for evolutionary adaptation. In the case of herpesviruses, at least 20% of the core genes shared by all herpesvirus subfamilies are surmised to have cellular origins, while others appear to have originated in another viral species (46–50). The specific source, mechanism, and timing of these ancient HGT events in herpesvirus evolutionary history are not known. Most herpesviruses do not integrate into the host genome during replication. The gamma-herpesvirus EBV can be found occasionally in an integrated state, although integration is not a required aspect of its life cycle (51, 52). However, human HHV-6A and HHV-6B and Marek's disease virus, an alpha-herpesvirus of poultry, do integrate into host telomeres as a regular part of their life cycle (53–56). The germ line or chromosomal integration of human herpesviruses (ciHHV), usually HHV-6A, is detected in about 1% of the human population (51, 53–56). Recent data from baculovirus-moth model systems indicate that HGT between host and virus does not require viral integration and excision from the host genome (57, 58). Instead, Gilbert et al. found that HGT can be mediated by transposable elements (TEs) that move between host and viral genomes (57) and that recombination of host DNA into viral progeny can occur at sites of microhomology between the host and viral genomes (58). Baculovirus genomes with integrated host DNA constituted only about 5% of the viral progeny and did not remain in the population beyond a few cycles of replication, suggesting that these are transient intermediates with deleterious fitness effects (58). Nonetheless, these data illustrate a potential avenue for evolutionary HGT in nonintegrating viruses and suggest that it may be of interest to screen for signs of host DNA integration into progeny herpesvirus genomes.
OTHER CONTRIBUTIONS TO FUNCTIONAL DIVERSITY
Mechanisms of genetic variation such as single-nucleotide changes, recombination, and horizontal gene transfer are well accepted for their roles in the evolution of herpesviruses. Limited but exciting data suggest that other mechanisms, including several that are more often associated with RNA viruses, may also contribute to the diversity of herpesvirus coding potential. These include ribosome slippage, RNA editing, and novel transcripts revealed by RNA sequencing (RNA-Seq) and ribosome profiling or footprinting. These mechanisms may not be revealed by examining populations of viral genomes but may nonetheless influence phenotypes observed in vivo.
Ribosome frameshifting and RNA editing are two mechanisms by which herpesviruses can achieve a phenotypic outcome different from the outcome that would be predicted by analysis of the nucleotides encoded in their genome. These outcomes can be detected by examining viral transcripts or proteins but may otherwise go undetected in the comparison of genome sequences. Ribosome frameshifting is a regular feature of translation for retroviruses such as HIV, where it enables the production of nucleocapsid and polymerase from the same RNA transcript. Although it is less frequent, ribosome frameshifting has been demonstrated to occur on transcripts of thymidine kinase (TK) in HSV-1 (59, 60). Microdeletions at homopolymers in the TK or polymerase genes of HSV-1 are a common route of viral escape from the activity of the antiviral drug acyclovir (61, 62), and ribosome frameshifting of defective transcripts in these drug-resistant genomes allows production of a low level of functional protein (59, 60). RNA editing or transcriptional stuttering is another mechanism better associated with RNA viruses which is used to generate more than one transcript from a single open reading frame. RNA editing has recently been demonstrated to occur in the gamma-herpesviruses EBV and KSHV, where it affects microRNAs (EBV) or viral protein-coding genes (KSHV) (63, 64). The phenotypic impacts of these RNA editing events remain to be determined.
Finally, HTS approaches have highlighted the presence of previously unrecognized coding potential in herpesvirus genomes, through the use of RNA-Seq and ribosome profiling approaches that demonstrate the shift from host to viral transcriptional and translational control during infection (64–68). These approaches have illuminated new transcripts and coding potential in HCMV, EBV, and KSHV (64–67) and demonstrated the disruption of transcript termination in HSV-1 (68). The novel transcripts found in these studies are too new to have been considered in prior comparative genomics analyses, but future studies may reveal their influence on the biology and evolution of these herpesviruses.
CAPTURING AND CATALOGING THE DIVERSITY OF HERPESVIRUSES
Early applications of HTS to herpesvirus genomes focused on just one or two examples of a given species (69), using viral strains that had been previously characterized in cell culture. The norm for HTS studies has now shifted to include either comparisons of a large number of viral strains at a time or a deeper investigation of a single viral setting or outbreak. This expansion of known diversity has driven the definition of new phylogenetic clades and facilitated the reconstruction of the evolutionary history of VZV (28, 30–32), HSV-1 (23, 36, 70–72), HSV-2 (73–76), HCMV (20, 33, 34, 77), HHV-6A/6B (119), EBV (78, 79), and KSHV (80), as well as animal herpesviruses (41). While it is clear that increasing the number of fully sequenced genomes for each species widens our knowledge of viral diversity, the next challenge lies in dissecting the phenotypic impacts of the observed genetic differences in these viral populations. Achieving that goal will require the integration of phenotypic measures of viral fitness, with fully sequenced viruses and comparative genomics, to infer how specific genetic differences influence the outcome of infection.
As the genomic comparison of large DNA viruses from cultured stocks became more tractable, the goal of achieving similar resolution from uncultured viruses became a priority. The development of oligonucleotide enrichment methods has facilitated this goal for herpesviruses (29, 37, 77, 81). Oligonucleotide enrichment uses the known genomes of cultured viruses to design small RNA- or DNA-based probes or baits that can hybridize with sparse amounts of the targeted viral genomes in any mixed sample. These hybridized fragments are then isolated using a tag such as biotin on the synthetic oligonucleotide baits. Once enriched from a mixed source sample, the viral genome fragments can be amplified and sequenced using standard HTS approaches. Oligonucleotide enrichment has enabled the capture of herpesvirus genomes from saliva, blood, skin swabs, vesicle fluid, and more (29, 37, 81). Improvements in the isolation and handling of ancient DNA, combined with oligonucleotide enrichment, have even demonstrated the feasibility of recovering historical samples, such as the recent genome sequencing of 17th century smallpox (variola) DNA from mummified human remains (82). This type of ancient viral genome recovery has not yet been attempted for a herpesvirus, but if the challenge is surmounted it may similarly illuminate the rate of evolution and local adaptation seen in these viruses.
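The bait-design step of oligonucleotide enrichment can be pictured as tiling a known reference genome with overlapping probe sequences. The Python sketch below shows only that tiling step under stated assumptions: the function name `tile_baits`, the default bait length of 120 bases, and the step of 60 bases are illustrative choices made here, not parameters reported in the cited studies. A practical design would also need to consider issues such as repeats, base composition, and cross-hybridization to host DNA.

```python
def tile_baits(reference: str, bait_length: int = 120, step: int = 60):
    """Return (start, sequence) pairs for overlapping baits covering the reference.

    bait_length and step are illustrative defaults, not published parameters.
    A final bait is anchored at the 3' end so that the last bases are covered.
    """
    if bait_length <= 0 or step <= 0:
        raise ValueError("bait_length and step must be positive")
    if len(reference) <= bait_length:
        return [(0, reference)]

    last_start = len(reference) - bait_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the genome is covered
    return [(s, reference[s:s + bait_length]) for s in starts]


# Toy 100-base "genome" tiled with 40-base baits every 25 bases.
toy_reference = "ACGT" * 25
baits = tile_baits(toy_reference, bait_length=40, step=25)
for start, seq in baits:
    print(start, seq)
```

With a step smaller than the bait length, every position is covered by at least one bait and most positions by more than one, which gives some tolerance for divergence between the reference used for design and the genomes present in a sample.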
THE CONSENSUS VERSUS THE MINORITY
Many of the studies described above that catalogued herpesvirus diversity have focused on defining the consensus genome of each new sample. The consensus genome represents the most common allele or nucleotide at each position (Fig. 2). In the simplest case, the consensus genome is derived from the most common member of the viral population. However, the consensus genome may not exist in nature as the most common genome format; in other words, it may be an amalgamation of several genotypes that exist separately but not together on a single genome (Fig. 2A). For small viruses, the ability to clone and determine the genotypes of individual genomes has enabled the modeling of viral populations and the development of software that can infer likely haplotypes from HTS data (see reference 83 for a review). Barcoded HTS methods offer the potential to improve haplotype linking for large DNA viruses, but these have not yet been widely applied to human herpesviruses. Thus far, no HTS method has proven capable of sequencing individual strands of DNA viruses that are >100 kb in length with sufficient accuracy and reproducibility to enable the comparison of individual genomes in a viral population. Limited applications of nanopore-based sequencing (MinION; Oxford Nanopore) or single-molecule real-time sequencing (SMRT; Pacific Biosciences) to herpesviruses have demonstrated the potential of these methods (71, 84, 85). However, both technologies currently suffer from error-prone sequence reads, leading to limited applications during this phase of technical development. For these reasons, most researchers using Illumina or other short-read HTS platforms have focused on detecting the location and prevalence of minority variants, without attempting to link their co-occurrence on individual genomes (i.e., determining haplotypes).
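The point that a consensus genome may not correspond to any genome actually present in the population can be shown with a small Python example. This is a minimal sketch with made-up three-site haplotypes and counts; the function name `consensus` is hypothetical, and the per-site majority rule is a simplification of how consensus sequences are derived from read data.

```python
from collections import Counter


def consensus(haplotypes: dict) -> str:
    """Per-site majority call from a mapping of haplotype sequence -> genome count.

    All haplotypes are assumed to be aligned and of equal length.
    """
    length = len(next(iter(haplotypes)))
    called = []
    for i in range(length):
        counts = Counter()
        for seq, n in haplotypes.items():
            counts[seq[i]] += n
        called.append(counts.most_common(1)[0][0])
    return "".join(called)


# Made-up population of 100 genomes. Each variant allele (G) is present in a
# majority of genomes, but no single genome carries all three variants.
population = {
    "GGA": 34,
    "GAG": 33,
    "AGG": 33,
}

cons = consensus(population)
print(cons)                # GGG
print(cons in population)  # False
```

Here each G allele is found in 66 or 67 of 100 genomes, so the consensus is GGG, yet no genome in the population has that genotype. Short reads that each cover only one of the variable sites cannot distinguish this population from one in which GGG genomes predominate, which is why haplotype determination requires linked or long-read data.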
As for RNA virus genomes, deep coverage of large DNA virus genomes has enabled the detection of minority variants, or heterogeneous alleles, and their potential expansion in different environments (Fig. 2). It was recently demonstrated that bottlenecks and selective sweeps occur during human congenital infection by the beta-herpesvirus HCMV. In a series of papers, Renzette et al. showed that HCMV genomes sequenced from the urine of congenitally infected infants harbored multiple loci with minority variants (17–19). While the authors initially posited that HCMV diversity approached the level of a quasispecies, their subsequent modeling suggested that viral diversity in these congenitally infected infants resulted from a combination of sources such as reinfection, recombination, positive selection, and bottlenecks during intrahost transmission between body sites (19, 86, 87) (Fig. 2B). Other groups have found a lower level of intrahost diversity in the context of noncongenital HCMV infections (16, 20, 33, 34, 77), suggesting that congenital infections may represent a special case for viral diversification. Recently, HTS was applied to detect low-frequency drug resistance mutations in the HCMV genome, which has confirmed the potential impacts of intrahost viral diversity on clinically important outcomes such as drug resistance (20). Although the study was conducted retrospectively, after patient treatment, the potential application for real-time HTS screening of viral populations during patient treatment is clear (88, 89).
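Detection of minority variants from deep sequencing data amounts to asking, at each genome position, whether a non-majority base is present above some frequency in sufficiently deep coverage. The following Python sketch illustrates that logic only. The input format (per-position base counts), the function name `minor_variants`, and the thresholds of 2% frequency and 100-fold depth are assumptions chosen for illustration and are not taken from the cited studies; published pipelines generally apply additional filters, such as base quality and strand bias, that are omitted here.

```python
def minor_variants(pileup: dict, min_freq: float = 0.02, min_depth: int = 100):
    """Report non-majority bases at or above min_freq at well-covered positions.

    pileup maps position -> {base: read count}. Thresholds are illustrative.
    Returns a list of (position, major_base, minor_base, frequency).
    """
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency reliably
        major = max(counts, key=counts.get)
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != major and freq >= min_freq:
                calls.append((pos, major, base, round(freq, 4)))
    return calls


# Made-up counts at three positions.
toy_pileup = {
    101: {"A": 950, "G": 50},  # 5% minor allele at 1,000x depth: reported
    202: {"C": 998, "T": 2},   # 0.2% minor allele: below the frequency threshold
    303: {"G": 30, "T": 20},   # only 50x depth: skipped
}

variant_calls = minor_variants(toy_pileup)
print(variant_calls)  # [(101, 'A', 'G', 0.05)]
```

The choice of thresholds determines what is reported: a resistance mutation present at 1% would be missed with the settings above, so any clinical application would need thresholds matched to the error rate of the sequencing method.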
VIRAL ISOLATES, STRAINS, VARIANTS, AND SUBCLONES
HTS methods have brought the issue of viral population diversity to the fore. Almost all experiments conducted with herpesviruses utilize a population of genome-containing virions (Fig. 2). The same is true of most samples collected from a host source. Because viral populations can shift over time or through handling (Fig. 1), it is crucial to clearly define each viral population under study and to know its history (24). A virus collected from a point source at a specific time is often called an isolate, and it may be referred to as a strain after its growth and expansion in culture. Even after being established as a strain in cell culture, a viral population may still undergo further change (Fig. 1). This can occur through random genetic drift, during intentional bottlenecks such as plaque purification, or through the generation of transgenic or mutant subclones. A viral strain can thus consist of a mixed population of viruses, or it may have undergone a bottleneck that led to the creation of a homogeneous population. For instance, the HSV-1 strain KOS has developed variants by genetic drift over passages in culture, as well as through intentional plaque purification of subclones (25, 90–94). These variants and subclones differ in observable phenotypes such as their ability to elicit Toll-like-receptor (TLR)-dependent immune responses (93), pathogenesis in animal models of HSV encephalitis (91), and expression of antigenic proteins (90, 92). This example emphasizes the importance of knowing the identity and history of the viral populations used in all laboratory experiments.
A more consistent standard of description for viral populations would benefit the herpesvirus community. Descriptive terms such as “clinical isolate” and “laboratory strain” are often used to refer to the low (clinical) versus high (laboratory strain) number of passages that a viral population has undergone in cell culture—though in practice these terms are interpreted differently by every research group (5, 24). There is no historical standard for whether or not a herpesvirus isolate should be plaque purified before it can be called a strain. There is no common term used to describe viral genomes that are collected and sequenced directly from a host, without amplification in culture—these are often referred to simply as genomes, sequences, or genotypes (29, 75, 77, 95). The VZV research community and others have moved toward a viral naming system, akin to those utilized for RNA viruses and bacteria, which includes both virus species and host species and preserves data on sample origin (e.g., geographic location), year of isolation, and the name(s) of the strain, variant, or subclone (96, 97). Following on the HSV-1 KOS example above, one variant of the strain is named HSV-1 Homo sapiens/Texas, USA/1963/KOS-KOS63, indicating both the year and location of its origin (25, 94). This approach may help to alleviate current issues in comparing data across laboratories, where viruses of the same common name (e.g., HSV-1 KOS or HCMV AD169) may differ in both genotype and phenotype (24, 25, 94).
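A structured name of this kind is straightforward to generate and check programmatically. The Python sketch below composes a name from its parts; the field layout is inferred solely from the single HSV-1 KOS example given above, and the class and method names are hypothetical. The cited naming proposals (96, 97) may define additional fields or rules that are not captured here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirusSample:
    """Minimal record of the provenance fields used in the example name."""
    virus: str      # virus species, e.g., "HSV-1"
    host: str       # host species
    location: str   # geographic origin
    year: int       # year of isolation
    strain: str     # strain name
    variant: str = ""  # optional variant or subclone name

    def full_name(self) -> str:
        # Layout inferred from the example in the text:
        # virus host/location/year/strain-variant
        label = f"{self.strain}-{self.variant}" if self.variant else self.strain
        return f"{self.virus} {self.host}/{self.location}/{self.year}/{label}"


kos63 = VirusSample(
    virus="HSV-1",
    host="Homo sapiens",
    location="Texas, USA",
    year=1963,
    strain="KOS",
    variant="KOS63",
)
print(kos63.full_name())  # HSV-1 Homo sapiens/Texas, USA/1963/KOS-KOS63
```

Keeping the fields separate in laboratory records, rather than storing only the assembled name, makes it easier to tell apart stocks that share a common name but differ in passage history.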
BIOINFORMATICS AND THE FOUNDATION OF HIGH-QUALITY GENOMES
The ability to rapidly sequence and assemble large DNA virus genomes has been facilitated by advances in software and computational workflows, although the diversity of options and different standards of publication have led to a wide variety of finished-genome qualities. One major choice underlying all HTS genome analysis is whether to build new viral genomes by alignment of reads to a prior reference genome or by de novo assembly. Alignment-based approaches utilize prior genome knowledge to achieve a faster outcome, but these are prone to miscalling of minority and structural variants (83). De novo assembly is unbiased by prior data and can more easily detect new variants and structural differences, but it is more computationally intensive and can entail the need for more input to curate the genomes thereafter (83). Open-source options for viral genome analysis and annotation include Web-based platforms and those using command-line (Unix-like) interfaces. The vast majority of viral de novo assembly algorithms have been developed and tested only for RNA viruses (see reference 83 for a review of options). We developed the Web-based viral de novo assembly workflow VirAmp specifically for herpesviruses (98), using the Galaxy framework of Web-accessible bioinformatics tools (99, 100). Most other options for viral genome assembly, alignment, annotation, and comparison rely on a Unix command-line interface, which requires more skill to operate. Unix-based software options are freely available through repositories such as BitBucket and GitHub and include programs such as the de novo viral genome assembly workflow VirGA (23), the aligners Bowtie and BWA (83), and the structural variant detector Wham (101). Researchers can also choose from commercial packages that offer one-button solutions for alignment or de novo assembly, such as Geneious (Biomatters) and CLC Bio (Qiagen) (99, 100). 
These options have made complex bioinformatics tasks accessible to a wider audience, to the extent that whole-genome sequencing and comparisons of diverse bacteriophage are now part of the undergraduate science education curriculum at many universities (102, 103).
The rationale for a strong foundation of high-quality genomes has been well established by the human genome project and multiple microbial genome projects (104). A wide range of secondary HTS applications, such as RNA-Seq, ribosome profiling, chromatin-immunoprecipitation (ChIP) sequencing, and chromatin conformation capture (CCC or 3C) assays, rely on the accuracy of initial genome sequences (105). Use of a misassembled or poorly annotated viral genome leads to errors in these secondary analyses. Similarly, mapping data from downstream analyses of one viral strain onto the reference genome of another strain can produce misleading outcomes. Gaps or unfinished regions in genome assemblies also create an issue, since these create missing data in all subsequent comparative genomics approaches. Publications occasionally omit the deposition of intact genome sequences, limiting future comparisons of these data (35, 71, 106). The failure to complete the sequence of genes with complex tandem repeats or G/C-rich sequences means that these genes are often excluded from comparative genomics studies or are represented by a far smaller number of examples (see, for example, references 36, 73, 75, and 107). Incomplete intergenic regions can skew the assessment of overall genomic diversity, since genetic drift tends to accumulate in intergenic regions. Unresolved gaps also prohibit any insight from secondary analyses such as RNA-Seq or ChIP in these regions, since data cannot be mapped to these areas. The tremendous insights to be gained from HTS technologies and all of their secondary applications thus rely on a strong foundation in the initial deciphering of viral genome populations.
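One simple quality check on a deposited or newly assembled genome is to locate unresolved regions before using it as a reference for secondary analyses. The Python sketch below assumes that gaps are encoded as runs of the ambiguity character N, which is a common convention but not the only one; the function names are hypothetical, and the toy sequence is made up.

```python
import re


def find_gaps(sequence: str):
    """Return half-open (start, end) intervals for each run of N characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def gap_fraction(sequence: str) -> float:
    """Fraction of the sequence that falls inside runs of N."""
    if not sequence:
        return 0.0
    gapped = sum(end - start for start, end in find_gaps(sequence))
    return gapped / len(sequence)


toy_genome = "ACGTNNNNACGTNNAC"
gaps = find_gaps(toy_genome)
print(gaps)                      # [(4, 8), (12, 14)]
print(gap_fraction(toy_genome))  # 0.375
```

Reads from RNA-Seq or ChIP experiments cannot be placed within the reported intervals, so listing them alongside an annotation indicates which genes or intergenic regions will be missing from downstream comparisons.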
FUTURE DIRECTIONS
Here we have focused on several areas of recent progress in understanding the genomic diversity and evolution of human herpesviruses. These advances have been driven by the rapid expansion and application of HTS, bioinformatics, and comparative genomics in virology. Together, these data have reshaped our sense of the stability of herpesvirus genomes. While these viruses possess high-fidelity polymerases, their ability to accrue standing variation and to undergo recombination with neighboring genomes creates many opportunities for selective pressures to induce rapid genetic shifts. Examples include the expansion of minority variants in niche locations in congenitally infected infants and the selection of drug-resistant variants during antiviral therapy (16, 19, 20, 86). In addition to these recent advances and insights, we foresee several areas of future promise.
First, we foresee the improvement and extension of third-generation sequencing and genome editing technologies to herpesviruses. Early applications of MinION and SMRT long-read sequencing to herpesvirology have shown promise in revealing novel transcriptional networks (67, 85) and in confirming a new synthetic genome approach to introduce multiple simultaneous changes to a herpesvirus genome (84). These third-generation sequencing methods may also enable the detection of methylated bases, secondary structures, and even substrates besides DNA (105). As the accuracy of these methods improves, we foresee their use to advance the detection of recombinant genomes and structural variants, as well as to define haplotypes in mixed populations (Fig. 2). The advances in clustered regularly interspaced short palindromic repeat (CRISPR)-Cas systems for genetic engineering of herpesviruses also represent an exciting area for future expansion (108–110). CRISPR-Cas approaches promise to speed the construction of viral mutants for reverse genetic studies (108, 109) and may have therapeutic potential for herpesvirus genome clearance (110). We also anticipate that HTS and genomic analyses of CRISPR-engineered viruses will be a fruitful way to confirm the desired genomic edits and rule out any off-target or bystander changes.
Finally, we consider the linking of viral genetic variation to observable phenotypes to be one of the greatest challenges for virology. Advances in HTS and genomics have begun to enable the application of genome-wide association studies (GWAS) and quantitative trait locus (QTL) approaches to herpesviruses (111, 112). Brandt and colleagues recently demonstrated the application of viral QTL mapping to HSV-1 by examining how specific viral genotypes contributed to phenotypes of ocular infection in mice (113, 114). That QTL study used the recombinant viral progeny of two attenuated strains of HSV-1, with the genetic composition of each recombinant mapped at nucleotide resolution using HTS and comparative genomics (27). These forward genetic approaches complement prior decades of reverse genetic approaches, which established the function of herpesvirus genes and began to dissect the impacts of individual genetic variants (115, 116). However, the gene deletions and genetic variations that occur in living humans can be quite distinct from those seen in laboratory-constructed mutants (33, 75, 78), and there is significant interest in determining if and how these viral genetic variants may impact human clinical outcomes (79, 117, 118). This motivates the future extension of GWAS analyses to naturally circulating viral variants and clinical isolates, which should shed light on how viral genetic diversity intersects with human genetic differences to produce the spectrum of observed disease.
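The core idea of mapping phenotype onto recombinant genotypes can be sketched as a single-marker scan. The version below is a deliberately simplified illustration and not the method used in the studies cited above: it assumes each recombinant has been scored as carrying the parent "A" or parent "B" allele at each marker, and it reports only the difference in mean phenotype between the two groups. Published QTL analyses add statistical testing, such as LOD scores and permutation thresholds. All names and values here are hypothetical.

```python
from statistics import mean


def marker_effects(genotypes, phenotypes):
    """Difference in mean phenotype (parent A minus parent B) per marker.

    genotypes:  dict mapping strain name -> string of 'A'/'B', one per marker
    phenotypes: dict mapping strain name -> numeric phenotype score
    Returns one value per marker, or None where only one parental
    allele is present among the recombinants.
    """
    strains = sorted(genotypes)
    if not strains:
        return []
    n_markers = len(genotypes[strains[0]])
    effects = []
    for i in range(n_markers):
        a = [phenotypes[s] for s in strains if genotypes[s][i] == "A"]
        b = [phenotypes[s] for s in strains if genotypes[s][i] == "B"]
        # A marker is informative only if both parental alleles are present.
        effects.append(mean(a) - mean(b) if a and b else None)
    return effects


# Hypothetical recombinants scored at three markers, with made-up scores.
toy_genotypes = {"r1": "AAB", "r2": "ABB", "r3": "BAA", "r4": "BBA"}
toy_phenotypes = {"r1": 4.0, "r2": 4.0, "r3": 1.0, "r4": 1.0}
print(marker_effects(toy_genotypes, toy_phenotypes))  # [3.0, 0.0, -3.0]
```

In this toy data set the phenotype tracks the allele at the first marker and is independent of the second. The third marker shows an equal and opposite effect only because its alleles happen to be the exact inverse of those at the first, which is a reminder that markers inherited together cannot be separated without additional recombinants.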
ACKNOWLEDGMENTS
We acknowledge the many contributions of scientists whose work could not be cited here for space reasons. Their intellectual contributions shaped our work and this summary. We thank our reviewers, members of the Szpara laboratory, and our many colleagues for insightful conversations and feedback.
We acknowledge support from the Eberly College of Science and the Huck Institutes for the Life Sciences at the Pennsylvania State University (M.L.S.). This project was funded, in part, under a grant with the Pennsylvania Department of Health using Tobacco CURE funds (M.L.S.). The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations, or conclusions presented in this report.
Copyright © 2017 Renner and Szpara.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Author Bios

Daniel W. Renner earned Bachelor of Science and Masters of Bioinformatics degrees at the Virginia Commonwealth University in Richmond, VA. As an undergraduate, Renner participated in the Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) program and later served as a teaching assistant for both the wet laboratory and viral assembly/genomics components of the program. SEA-PHAGES, along with an internship at the Bioinformatics Core Computational Laboratory, inspired his current interest in viral comparative genomics and assembly, which continues with his ongoing research. Mr. Renner joined the Szpara laboratory at the Pennsylvania State University as a Computational Scientist in October 2014. He now leads the laboratory's computational and bioinformatics analyses with a focus on viral assembly, diversity, and genetic links to virulence. Mr. Renner also helped to produce and continues to support the Szpara laboratory's public open-source workflows for genome assembly and comparison.

Moriah L. Szpara earned a Bachelor of Science degree at the Pennsylvania State University and a PhD in Molecular and Cell Biology at the University of California, Berkeley. Dr. Szpara trained as a postdoctoral fellow with Dr. Lynn Enquist at Princeton University. Dr. Szpara now leads a laboratory as an Assistant Professor at the Pennsylvania State University in the Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes for the Life Sciences. Dr. Szpara initiated her research in viral comparative genomics in the collaborative environment of the Lewis-Sigler Institute for Integrative Genomics at Princeton University. Dr. Szpara's laboratory at Penn State is focused on dissecting viral genetic contributions to virulence, the nature of viral diversity in clinical settings, and the molecular interactions of viruses with neurons. The Szpara laboratory has also produced open-source software packages to facilitate herpesvirus genome assembly and comparison, including VirGA and VirAmp (http://szparalab.psu.edu/).