Inherited Chromosomally Integrated Human Herpesvirus 6 Genomes Are Ancient, Intact, and Potentially Able To Reactivate from Telomeres

ABSTRACT The genomes of human herpesvirus 6A (HHV-6A) and HHV-6B have the capacity to integrate into telomeres, the essential capping structures of chromosomes that play roles in cancer and ageing. About 1% of people worldwide are carriers of chromosomally integrated HHV-6 (ciHHV-6), which is inherited as a genetic trait. Understanding the consequences of integration for the evolution of the viral genome, for the telomere, and for the risk of disease associated with carrier status is hampered by a lack of knowledge about ciHHV-6 genomes. Here, we report an analysis of 28 ciHHV-6 genomes and show that they are significantly divergent from the few modern nonintegrated HHV-6 strains for which complete sequences are currently available. In addition, ciHHV-6B genomes in Europeans are more closely related to each other than to ciHHV-6B genomes from China and Pakistan, suggesting regional variation of the trait. Remarkably, at least one group of European ciHHV-6B carriers has inherited the same ciHHV-6B genome, integrated in the same telomere allele, from a common ancestor estimated to have existed 24,500 ± 10,600 years ago. Despite the antiquity of some, and possibly most, germ line HHV-6 integrations, the majority of ciHHV-6B (95%) and ciHHV-6A (72%) genomes contain a full set of intact viral genes and therefore appear to have the capacity for viral gene expression and full reactivation.
IMPORTANCE Inheritance of HHV-6A or HHV-6B integrated into a telomere occurs at a low frequency in most populations studied to date, but its characteristics are poorly understood. However, stratification of ciHHV-6 carriers in modern populations due to common ancestry is an important consideration for genome-wide association studies that aim to identify disease risks for these people. Here, we present full sequence analysis of 28 ciHHV-6 genomes and show that ciHHV-6B in many carriers with European ancestry most likely originated from ancient integration events in a small number of ancestors. We propose that ancient ancestral origins for ciHHV-6A and ciHHV-6B are also likely in other populations. Moreover, despite their antiquity, all of the ciHHV-6 genomes appear to retain the capacity to express viral genes, and most are predicted to be capable of full viral reactivation. These discoveries represent potentially important considerations in immunocompromised patients, in particular in organ transplantation and in stem cell therapy.

Given the complex roles that human telomeres play in cancer initiation and progression and in ageing (1,2), it is remarkable that the genomes of human herpesviruses 6A (HHV-6A) and HHV-6B (species Human betaherpesvirus 6A and Human betaherpesvirus 6B) can integrate and persist within them (3). Human telomeres comprise double-stranded DNA primarily composed of variable lengths of (TTAGGG)n repeats and terminated by a 50- to 300-nucleotide (nt) 3′ single-strand extension of the G-rich strand. Telomeres, bound to a six-protein complex called shelterin, cap the ends of chromosomes and prevent inappropriate double-strand break repair. They also provide a solution to the "end replication problem" via the enzyme telomerase (4)(5)(6).
The double-stranded DNA genomes of HHV-6A and HHV-6B consist of a long unique (U) region (143 to 145 kb) including many functional open reading frames (ORFs) (U2 to U100), flanked by identical left and right direct repeats (DR_L and DR_R; 8 to 10 kb) including two ORFs (DR1 and DR6). Each DR also contains near its ends two variable regions of telomere-like repeat arrays (T1 and T2) (7,8), terminated by the viral genome-packaging sequences (PAC1 and PAC2, respectively) (9,10). Telomeric integration by HHV-6A or HHV-6B (yielding chromosomally integrated HHV-6 [ciHHV-6]) results in loss of the terminal PAC2 sequence at the fusion point between the telomere and DR_R-T2 (11) and loss of the DR_L-PAC1 sequence at the other end of the integrated viral genome when the DR_L-T1 degenerate telomere-like repeat region becomes part of a newly formed telomere (Fig. 1A) (12).
Once the HHV-6 genome has integrated in the germ line, it can be passed from parent to child, behaving essentially as a Mendelian trait (inherited ciHHV-6) (13)(14)(15)(16). The telomere carrying the ciHHV-6 genome shows instability in somatic cells, which can result in the partial or complete release of the viral genome as circular DNA (12,17,18). This could represent the first step toward viral reactivation, and in this respect, telomeric integration may be a form of HHV-6 latency. To date, reactivation of ciHHV-6 has been demonstrated in vivo in two settings: first, in a child with X-linked severe combined immunodeficiency who was also a carrier of inherited ciHHV-6A (19), and second, upon transplacental transmission from two ciHHV-6 carrier mothers to their noncarrier babies (20). Recently, it has been shown that ciHHV-6 carriers bear an increased risk of angina pectoris (21), although it is not known whether this arises from viral reactivation, a deleterious effect on the telomere carrying the viral genome, or some other mechanism.
A small proportion of people worldwide are carriers of inherited ciHHV-6A or -6B, but very little is known about the HHV-6 genomes that they harbor, although they may influence any associated disease risk. To investigate ciHHV-6 genomic diversity and evolution, the frequency of independent germ line integrations, and the potential functionality of the integrated viral genomes, we analyzed 28 ciHHV-6 genomes. We discovered that ciHHV-6 genomes are more similar to one another than to the few sequenced reference HHV-6 genomes from nonintegrated viruses. This is particularly marked among the ciHHV-6B genomes from Europeans. We also found that a subset of ciHHV-6B carriers from England, Orkney, and Sardinia are most likely descendants of a single ancient ancestor. Despite the apparent antiquity of some, possibly most, ciHHV-6 genomes, we concluded that the majority contain a full set of intact HHV-6 genes and therefore in principle retain the capacity to generate viable viruses.

RESULTS
Selection of ciHHV-6 carriers and sequence analysis of viral genomes. To investigate sequence variation among ciHHV-6 genomes, 28 samples were selected for analysis: 7 with ciHHV-6A (including LEI-1501) (18) and 21 with ciHHV-6B (Table 1). The selected samples were identified in the various populations screened (Table 2) and included additional individuals from the London area (16), Scotland and the north of England (22), the Leicester area of England (18), and the Generation Scotland: Scottish Family Health Study (GS:SFHS) (R. F. Jarrett, unpublished data). The chromosomal locations of ciHHV-6 genomes, determined by fluorescent in situ hybridization (FISH), were available for some samples (16,18). For other samples, the junction between the viral DR8 sequence (a noncoding region near one end of the DR) and the chromosome subtelomeric region was isolated by PCR and sequenced (discussed below). Integration of each ciHHV-6 genome was confirmed by detection of a telomere at DR_L-T1 using single-telomere length analysis (STELA) (12) or by detection of at least one copy per cell using droplet digital PCR (22,23).
Each viral genome from ciHHV-6 carriers was sequenced from pooled PCR amplicons (12,18). Full sets of HHV-6 amplicons were readily generated (Fig. 1; see Table S1 in the supplemental material), demonstrating the robustness of this approach for enriching HHV-6 sequences from ciHHV-6 carriers. The HHV-6 amplicons generated from each carrier had the expected sizes, with variation only in amplicons encompassing repetitive regions (e.g., the DR_R-T1 region of degenerate telomere-like repeats). This observation indicated that all of the ciHHV-6 genomes are essentially intact, with the exception of the terminal DR_R-PAC2 and DR_L-PAC1 sequences lost during integration (Fig. 1A) (11,12).
The ciHHV-6 genome sequences were determined by short-read next-generation sequencing (NGS), with some verification by the Sanger method. De novo assemblies of each genome were generated with few gaps (Fig. 1). The ciHHV-6A genome reported previously by us was included in these analyses (LEI-1501; GenBank accession no. KT355575) (18).
Sequence similarity is greater among ciHHV-6 genomes than to nonintegrated HHV-6 genomes. Nucleotide substitution frequencies were analyzed across the DR and U regions of the HHV-6B genome (excluding the tandem-repeat regions R-DR, R0, R1, R2, R3, and R4 [Fig. 1]) (9,24) for each sequenced ciHHV-6B genome in comparison with the two available HHV-6B reference genomes from nonintegrated strains (HST from Japan, GenBank accession no. AB021506 [24], and Z29 from the Democratic Republic of the Congo, GenBank accession no. AF157706 [9]). The ciHHV-6B genomes show different patterns of variation from the reference genomes, with greater divergence from strain Z29 in the distal portion of the U region (kb 120 to 150) and across the DR (kb 1 to 8), reaching a maximum of 35 substitutions per kilobase in these regions (Fig. 2A).
Overall, there is less divergence from strain HST, although the frequency of substitutions is higher in part of the U region (kb 45 to 64) than in strain Z29. To assess sequence variation among the ciHHV-6B genomes, comparisons were made using the genome in the European individual HAPMAP NA10863 (CEPH1375.02) as a reference. The substitution frequency is considerably less across the viral genomes for 18/20 ciHHV-6B genomes from individuals with European ancestry, indicating greater similarity among them. Notably, the other two ciHHV-6B genomes, which showed higher substitution frequencies in this comparison, were in individuals from Pakistan and China, HGDP00092 and HGDP00813, respectively (Fig. 2A). Nucleotide substitution frequencies were also analyzed across each of the seven ciHHV-6A genomes in comparison with three nonintegrated HHV-6A reference genomes (strain U1102 from Uganda [25,26], a strain from the United States [27,28], and strain AJ from the Gambia [accession no. KP257584.1] [29]). This analysis showed that the ciHHV-6A genomes have similar levels of divergence from each reference genome from nonintegrated HHV-6A and that divergence is highest across the DR and the distal part of the U region (kb 120 to 149) (Fig. 2B). Comparisons with the ciHHV-6A LEI-1501 genome (18) as a reference also showed greater similarity among the ciHHV-6A genomes, although the substitution frequencies are higher than among European ciHHV-6B genomes, indicating greater diversity among the ciHHV-6A genomes sequenced here (30). Notably, ciHHV-6A in the Japanese individual (HAPMAP NA18999) showed greater divergence from the other ciHHV-6A samples of European origin.
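As an illustration of the kind of calculation behind a substitutions-per-kilobase profile, the following minimal Python sketch counts mismatches between two pre-aligned sequences in consecutive windows. It is not the pipeline used in this study: the function name, the 1-kb window, and the decision to skip gaps and ambiguous bases are all assumptions made for the example.

```python
def substitutions_per_kb(ref, query, window=1000):
    """Return (window_start, substitutions per kb) for two aligned sequences.

    Assumes `ref` and `query` are already aligned and of equal length.
    Positions where either sequence has a gap or ambiguous base are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    profile = []
    for start in range(0, len(ref), window):
        compared = 0
        diffs = 0
        for a, b in zip(ref[start:start + window], query[start:start + window]):
            a, b = a.upper(), b.upper()
            if a in valid and b in valid:
                compared += 1
                diffs += a != b
        # Scale to a per-kilobase rate; None marks windows with no comparable sites.
        rate = 1000 * diffs / compared if compared else None
        profile.append((start, rate))
    return profile
```

Plotting the second element of each tuple against the window start would give a profile of divergence along the genome for one sample against one reference.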
In summary, comparisons of nucleotide substitution frequencies showed that the viral genomes in ciHHV-6B carriers are more similar to each other than they are to reference genomes derived from clinical isolates of nonintegrated HHV-6B from Japan (HST) and the Democratic Republic of the Congo (Z29). The ciHHV-6A genomes are also more similar to each other than they are to the three HHV-6A reference genomes, although this is less pronounced than among the ciHHV-6B genomes.
Phylogenetic analysis of ciHHV-6 and nonintegrated HHV-6 genomes. Consistent with the results shown in Fig. 2, phylogenetic analysis of the U regions from 21 ciHHV-6B genomes and the HST and Z29 reference genomes (excluding the DR, the large repeat regions, and missing data shown in Fig. 1) showed that the ciHHV-6B genomes in HGDP00813 from China and HGDP00092 from Pakistan are outliers to the 19 ciHHV-6B genomes from individuals of European descent (Fig. 3A). A phylogenetic network of the ciHHV-6B genomes from individuals with European ancestry showed three clusters of 8, 3, and 5 closely related ciHHV-6B genomes (groups 1, 2, and 3, respectively) (Fig. 3B) and three singletons (ORCA1340, COR264, and 1-ciHHV-6B). Phylogenetic analysis of the DR alone showed that, with the exception of COR264, the European ciHHV-6B samples showed greater similarity to the HST (Japan) reference genome than to the Z29 (Democratic Republic of the Congo) reference genome. However, the DRs in the two non-European ciHHV-6B samples HGDP00813 (China) and HGDP00092 (Pakistan) did not cluster closely with those in the European ciHHV-6B samples, again indicating that these ciHHV-6B strains are distinct (Fig. 3A; see Fig. S1 in the supplemental material).
To explore variation only within HHV-6B genes, the frequencies of substitutions in ORFs of each of the 21 ciHHV-6B genomes were compared with those in the HST and Z29 reference genomes and the ciHHV-6B genome in HAPMAP NA10863 (Fig. 4A). The patterns of variation were similar to those observed across the whole genome (Fig. 2A) and consistent with the phylogenetic analysis, showing greater similarity among ciHHV-6B in Europeans and with the subgroups. Phylogenetic analysis of specific genes, which were selected because they show greater sequence variation from the reference genomes or among the ciHHV-6B genomes, generated a variety of trees that were generally consistent with the phylogenetic analysis based on the U region but exhibited less discrimination between samples or groups. Variation within HHV-6A genes was also explored by plotting the base substitution frequency per ORF for each of the seven ciHHV-6A samples in comparison to the three reference genomes and the ciHHV-6A genome in LEI-1501 (Fig. 4B). The patterns of variation are similar to those observed across the whole genome (Fig. 2B). Phylogenetic analyses of U83, U90, and DR6, selected because they show greater sequence variation, generally support the phylogenetic trees and networks generated from analysis of the U and DR regions (see Fig. S3 in the supplemental material).
Overall, the sequence variation and phylogenetic analyses indicate a divergence between the integrated and nonintegrated HHV-6 genomes but with some differences between HHV-6A and HHV-6B. The ciHHV-6B samples from individuals with European ancestry showed divergence from both the HST (Japan) and Z29 (Democratic Republic of the Congo) reference genomes, although the pattern of divergence varies across the genome. The 19 ciHHV-6B genomes from individuals with European ancestry are more similar to one another than to the ciHHV-6B genomes from China and Pakistan and can be subdivided into distinct groups. There is greater divergence among the seven ciHHV-6A genomes than among the ciHHV-6B genomes, but despite this, two pairs of closely related ciHHV-6A genomes were identified. From these analyses, we concluded that the three groups of closely related ciHHV-6B genomes and the pairs of ciHHV-6A genomes identified in the phylogenetic networks (Fig. 3B and D, respectively) could represent independent integrations by closely related strains of HHV-6B or HHV-6A. Alternatively, each group might have arisen from a single integration event, with members sharing a common ancestor. Further analyses were undertaken to explore these possibilities.
Comparison of tandem-repeat regions in ciHHV-6 genomes. Tandem-repeat arrays within the human genome often show length variation as a consequence of changes to the number of repeat units present (copy number variation). The greater allelic diversity in these regions reflects the underlying replication-dependent mutation processes in tandem-repeat arrays, which occur at a higher rate than base substitutions (31). To explore diversity among the ciHHV-6B genomes further, tandem-repeat regions distributed across the viral genome were investigated. The R-DR, R2A, R2B, and R4 repeat regions analyzed (locations are shown in Fig. 1C) showed little or no copy number variation among the ciHHV-6B and nonintegrated reference genomes (Fig. 5A and Table 3). Copy number variation at R1 (location shown in Fig. 1C) was greater but did not show a clear relationship with strains of ciHHV-6B or nonintegrated HHV-6B. Greater copy number variation was detected at the pure array of TTAGGG repeats at DR_L-T2 (location shown in Fig. 5B), with the largest number of repeats in the HHV-6B Z29 reference genome and ciHHV-6B in HGDP00813 from China (Fig. 5A and Table 3). Notably, the copy number variation observed at R0 (location shown in Fig. 1C) correlates reasonably well with the groups of ciHHV-6B genomes identified in the phylogenetic network (Fig. 3 and 5A and Table 3).
Similar analysis of repeat regions in the ciHHV-6A genomes was conducted (Table 3). The data suggest that ciHHV-6A genomes have fewer TTAGGG repeats at DR_L-T2 than the HHV-6A reference genomes. This variation could have been present in HHV-6A strains prior to integration, or deletion mutations that reduced the length of the DR_L-T2 array may have been favored after integration (12).
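Copy number at a pure tandem-repeat array such as T2 can be read directly from an assembled sequence. The short Python sketch below is offered only as an illustration: it reports the longest uninterrupted run of a repeat unit, and the function name and the choice to report the longest run (rather than, say, the total count) are assumptions of this example, not the method used to build Table 3.

```python
import re

def longest_repeat_run(seq, unit="TTAGGG"):
    """Return the largest number of consecutive copies of `unit` found in `seq`."""
    # Match one or more back-to-back copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)
```

A single degenerate repeat (for example, TTAGGA) interrupts the run, so arrays of degenerate telomere-like repeats such as T1 would need a different treatment.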
To explore variation within the T1 array of degenerate telomere-like repeats in ciHHV-6B genomes, we amplified the DR_R-T1 region using the U100Fw2 and DR1R primers and investigated the interspersion patterns of TTAGGG and degenerate repeats at the distal end of DR_R-T1 (near U100) (Fig. 5B) by using modified telomere variant repeat mapping by PCR (TVR-PCR) (32)(33)(34). Comparison of the TTAGGG interspersion patterns between the samples showed that the ciHHV-6B genomes clustered into groups that share similar TVR maps in DR_R-T1 (Fig. 5C). Furthermore, these interspersion patterns differed between the groups and the singleton ciHHV-6B genomes identified in the phylogenetic analyses. Variation around the (CA)n simple tandem repeat, located immediately adjacent to DR_L-T1 (location shown in Fig. 5B), also showed clustering into groups that correlate with the ciHHV-6B phylogenetic analyses (Fig. 3 and 5D and Table 3). Overall, the analyses of tandem-repeat regions in the ciHHV-6B genomes are consistent with the phylogenetic analyses.
[Figure legend fragment] ... Fig. 3D. The x axes of all the graphs show a single copy of DR1 and DR6, followed by genes found in the U region.
[Fig. 5 legend, beginning missing] Comparisons can be made among the reference nonintegrated HHV-6B strains HST (Japan) and Z29 (Democratic Republic of the Congo) and ciHHV-6B genomes. The sample order along the x axis is as follows: HST, Z29 (mauve); European group 1 ciHHV-6B genomes (blue); European group 2 ciHHV-6B genomes (orange); European group 3 ciHHV-6B genomes (green); European singleton ciHHV-6B genomes (no highlight); ciHHV-6B in HGDP00813 from China (red); and ciHHV-6B in HGDP00092 from Pakistan (no highlight). (B) Locations of the PCR amplicons used to analyze the repeat sequences shown in panels C and D. The black dashed line indicates the amplicon generated by the U100Fw2 and DR1R primers that were used for TVR-PCR shown in panel C. The red dashed line indicates STELA products, generated from DR1R, that were used to analyze the (CA)n repeat shown in panel D. (C) Distribution of TTAGGG repeats at the distal end of DR_R-T1 (near U100) in ciHHV-6B genomes. If the repeat array comprises consecutive TTAGGG repeats, a ladder of bands with 6-base periodicity should be present, and the migration distance between the rungs on the ladder should steadily decrease as the separation between the bands is reduced (near the top of the gel, toward DR1). The observed distance between the bands in each track varies between the samples. This shows that the repeat array is not pure (TTAGGG)n but includes intervening sequence, most likely degenerate telomere-like repeats. The patterns of repeats can be compared between the tracks to identify samples that share the same repeat distribution at that end of DR_R-T1. The ciHHV-6B samples are color coded in accordance with groupings identified in Fig. 3: European group 1, blue; group 2, orange; group 3, green; European singletons, gray; ciHHV-6B in HGDP00813 from China, red; ciHHV-6B in HGDP00092 from Pakistan, black. (D) Variation in copy numbers of CA repeats and adjacent 5′ sequence near the start of the ciHHV-6B DR_L-T1 region. The samples are color coded as in panel C.
Ancestry of ciHHV-6B carriers in group 3. The repeat copy number variation observed within and among groups may have arisen before or after telomeric integration of the viral genome. To investigate further how many different integration events may have occurred among the ciHHV-6B carriers, we isolated and sequenced fragments containing the junction between the human chromosome and the ciHHV-6B genome, in addition to using the cytogenetic locations published previously for some samples (Table 1) (16). The junction fragments were isolated by a trial-and-error approach, using PCR between a primer mapping in DR8 in DR_R and a variety of primers known to anneal to different subtelomeric sequences (Fig. 6A), including primers that anneal to the subterminal regions of some, but not all, copies of chromosome 17p (17p311 [35] and sub-T17-539 [12]). There was insufficient DNA for analysis from the sequenced ORCA1340 (singleton) or ORCA1622 and ORCA3835 (group 3) samples (Fig. 3B). However, analysis of DR_R-T1 and the other repeats showed that the 42 ciHHV-6B carriers from Orkney fell into two groups that share the same length at DR_R-T1 either with ORCA1340 or with ORCA1622 and ORCA3835 (Table 3). For junction fragment analysis, we selected ORCA1006 as a substitute for ORCA1340, since it shares the same DR_R-T1 length. Similarly, ORCA1043, ORCA2119, and ORCA1263 were used as substitutes for ORCA1622 and ORCA3835, since they share a different DR_R-T1 length. Using the chromosome 17p primers, junction fragments were generated from all of the group 3 ciHHV-6B samples and from 1-ciHHV-6B (a singleton in the phylogenetic network) (Fig. 3). Using these primers, PCR products were not amplified from other ciHHV-6B samples in this study.
The sequences of seven junction fragments from group 3 ciHHV-6B genomes (including NWA008 [36], which is another ciHHV-6B carrier with a viral genome that belongs to group 3 [data not shown]) were similar to each other but different from the fragment in sample 1-ciHHV-6B (Fig. 6B). These data indicate the existence of at least two independent integration events into different alleles of the chromosome 17p telomere, or possibly into telomeres of different chromosomes that have similar subterminal sequences (37). Comparison of the junction fragments from group 3 ciHHV-6B samples showed remarkably similar TTAGGG and degenerate-repeat interspersion patterns (Fig. 6B). The differences among the interspersion patterns are consistent with small gains or losses that may have arisen from replication errors in the germ line after integration of the viral genome (32). Therefore, it is most likely that the ciHHV-6B status of group 3 individuals arose from a single ancestral integration event. Using the levels of nucleotide substitution between the group 3 ciHHV-6B genomes, the time to the most recent common ancestor (TMRCA) was estimated as 24,538 ± 10,625 years ago (Table 4). This estimate is based on the assumption that, once integrated, the ciHHV-6B genome mutates at the same average rate as the human genome as a whole. However, deviation from this rate would result in an under- or overestimation of the TMRCA.
[Figure legend fragment] ... Fig. 3B]). These interspersion patterns are distinct from that of the chromosome junction fragment isolate from 1-ciHHV-6B (singleton in Fig. 3B). The sequence on the left of the repeats is from the chromosome subtelomeric region, and the sequence on the right is from the ciHHV-6B genome.
Genetic intactness of ciHHV-6 genomes. The evidence for an ancient origin of some, probably most, of the ciHHV-6B genomes analyzed, and for postintegration mutations in repeat regions, raised the question of whether these genomes contain an intact set of viral genes or whether they have been rendered nonfunctional by mutation. To explore the consequences of sequence variation among the ciHHV-6B genomes, the amino acid sequences predicted from all the genes in the ciHHV-6B genomes were aligned, and the cumulative frequencies of independent synonymous and nonsynonymous substitutions were determined (Fig. 7A). The ratio of synonymous to nonsynonymous substitutions varies among genes. The great majority of nonsynonymous changes (amounting to 34% of the total) result in single amino acid substitutions, but one substitution in the U20 stop codon of HGDP00092 is predicted to extend the coding region by eight codons. Only one substitution, which creates an in-frame stop codon in U14 of 1-ciHHV-6B, is predicted to terminate a coding region prematurely. Two of the seven ciHHV-6A genomes also have in-frame stop codons, one in U79 of GLA_15137 and the other in U83 of GLA_4298 (data not shown).
The 21 inherited ciHHV-6B genomes are likely to include mutations that arose before integration and represent variation among the parental nonintegrated HHV-6B strains, as well as mutations that arose after integration. To explore the latter, five group 3 ciHHV-6B genomes were compared (Fig. 7B). Among the 10 substitutions identified, 3 were in noncoding regions, 1 was a synonymous mutation in U77, and 6 were nonsynonymous mutations. From these limited data, it seems likely that the accumulation of mutations after integration has been random in these ciHHV-6B genomes.
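The categories used above (synonymous, nonsynonymous, premature stop, and loss of a stop codon) follow directly from the standard genetic code. The Python sketch below illustrates that classification for a single codon change; the function name and the category labels are choices made for this example and are not taken from the analysis software used in the study.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(ref_codon, alt_codon):
    """Classify the effect of changing `ref_codon` into `alt_codon`."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "premature stop"  # in-frame stop codon created
    if ref_aa == "*":
        return "stop lost"       # coding region extended
    return "nonsynonymous"
```

For example, `classify_substitution("TTT", "TTC")` returns `"synonymous"` because both codons encode phenylalanine, whereas `classify_substitution("TGG", "TGA")` returns `"premature stop"`.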

DISCUSSION
In this study, we used comparative analyses to explore diversity among ciHHV-6 genomes in order to understand the factors that influence the population frequencies of ciHHV-6 and to determine whether the integrated genomes appear to retain the capacity for full functionality as a virus. We found that the ciHHV-6B genomes are more similar to one another than to the two available HHV-6B reference genomes from Japan and the Democratic Republic of the Congo (Fig. 2, 3, and 4; see Fig. S1 in the supplemental material). This is particularly noticeable among the 19 ciHHV-6B genomes from individuals with European ancestry, which are more similar to each other than they are to the ciHHV-6B genomes in HGDP00092 from Pakistan and HGDP00813 from China. This pointer toward a relationship between the integrated HHV-6B strain and geographical distribution warrants further investigation if the association between carrier status and potential disease risk is to be understood fully (21,38). The smaller group of seven ciHHV-6A genomes show higher levels of divergence from the three available HHV-6A reference genomes from the United States, Uganda, and the Gambia, as reported previously (30). However, in making these observations, the possibility of sample bias should be considered, both in the geographic distribution of the ciHHV-6 genomes analyzed and, in particular, in the small number of nonintegrated HHV-6A and HHV-6B genomes that are available for comparative analysis.
[Figure legend fragment] Diagram showing the approximate locations and consequences of nucleotide substitutions that are predicted to have arisen after integration in group 3 ciHHV-6B genomes. The horizontal line represents the HHV-6B genome; black dots, locations of noncoding base substitutions; red dots, base substitutions within HHV-6B genes that are predicted to result in amino acid substitutions (nonsynonymous), as indicated; pink dot, synonymous (T-to-C) substitution in DER512 that is not predicted to change the phenylalanine. The numbers of repeats in three regions (T2, R1, and R4) that vary among the group 3 genomes are also shown.
The isolation of chromosome junction fragments from eight ciHHV-6B samples (seven group 3 samples and 1-ciHHV-6B) by using primers from chromosome 17p subterminal sequences (35) suggests integration in alleles of the 17p telomere. Given the variable nature of human subterminal regions (37), the chromosome locations should be confirmed using a different approach. Nevertheless, comparison of the TTAGGG and degenerate-repeat interspersion patterns at the chromosome-ciHHV-6B junction can be used to deduce relationships (34,39) and, combined with the phylogenetic analyses, shows that the individuals carrying a group 3 ciHHV-6B genome share an ancient ancestor. Group 3 includes individuals from Sardinia, England, Wales, and Orkney, with greater divergence between the ciHHV-6B genomes in the two individuals from Sardinia (HGDP1065 and HGDP1077) than between the individual from Derby, England (DER512), and the Sardinian (HGDP1065) (Fig. 3, 5, and 6 and Tables 1 and 3). Moreover, there is no evidence of a close family relationship between the two individuals from Sardinia. Overall, the data are consistent with the group 3 ciHHV-6B carriers being descendants of a common ancestor who existed approximately 24,500 years ago, similar to the date of the last glacial maximum and probably predating the colonization of Sardinia and Orkney.
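To make the dependence of such a date on the assumed mutation rate explicit, the following Python sketch shows one simple pairwise-divergence calculation. It is not the method used to obtain the estimate in Table 4: the formula (mean pairwise differences divided by twice the rate times the sequence length), the function names, and any rate supplied by the user are assumptions of this illustration.

```python
from itertools import combinations
from statistics import mean

def pairwise_differences(genomes):
    """Count mismatches for every pair of equal-length, aligned sequences."""
    return [
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(genomes, 2)
    ]

def divergence_time_years(diffs, length_bp, rate_per_bp_per_year):
    """Rough time since a common ancestor from mean pairwise differences.

    Two lineages separated for T years are expected to differ at about
    2 * rate * length * T sites, so T is estimated as d / (2 * rate * length).
    The rate is an input assumption; halving it doubles the estimate.
    """
    return mean(diffs) / (2 * rate_per_bp_per_year * length_bp)
```

Because the rate appears in the denominator, any deviation of the integrated viral genome from the assumed human genome-wide rate shifts the date proportionally.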
The population screen of Orkney identified 42 ciHHV-6B carriers (frequency, 1.9%) (Table 2) and no ciHHV-6A carriers, which also suggests a founder effect. However, the Orkney ciHHV-6B samples can be divided into two groups based on the length of DR_R-T1, the ciHHV-6B phylogenetic analyses, and the different integration sites. Therefore, it is likely that the ciHHV-6B carriers in Orkney are the descendants of two different ciHHV-6B ancestors who may have migrated to Orkney independently. This is consistent with the fine-resolution genetic structure of the Orkney population and the history of Orkney, which includes recent admixture from Norway (Norse Vikings) (36).
Given the evidence that extant ciHHV-6B carriers in group 3 are descendants of a single ancient founder with a germ line integration, it is plausible that other clusters in the phylogenetic tree have similar histories. For example, the three individuals in group 2 may all carry a ciHHV-6B genome integrated in a chromosome 11p telomere. Further verification is required to support this speculation, and this will be valuable when assessing disease risk associated with ciHHV-6 integrations in different telomeres.
There is good evidence that ciHHV-6 genomes can reactivate in some settings, for example, when the immune system is compromised (19,20). However, it is not known what proportion of ciHHV-6 genomes may retain the capacity to reactivate. We investigated this question from various angles. We have presented evidence that some ciHHV-6 genomes are ancient and therefore could have accumulated inactivating mutations while in the human genome. Most of the tandem repeats analyzed in ciHHV-6B genomes showed minor variations in repeat copy numbers (Fig. 5 and Table 3). However, the functions of these regions are unclear, and as copy number variation exists among the reference genomes, it seems unlikely that the level of variation detected unduly influences the potential functionality of the integrated viral genomes. In the protein-coding regions of ciHHV-6B genomes, 34% of substitutions are nonsynonymous and are predicted to cause amino acid substitutions (Fig. 7). A single potentially inactivating mutation was detected as an in-frame stop codon in gene U14 in 1-ciHHV-6B. Since this gene encodes a tegument protein that is essential for the production of viral particles and can induce cell cycle arrest at the G2/M phase (40), it seems unlikely that this integrated copy of ciHHV-6B would be able to reactivate. However, the other viral genes may be expressed in this ciHHV-6B genome, and the presence of the viral genome may also affect telomere function. The stop codon in gene U20 in the individual from Pakistan (HGDP00092) is mutated, and this is predicted to extend the U20 protein by 8 amino acid residues. U20 is part of a cluster of genes (U20 to U24) that are specific to HHV-6A, HHV-6B, and their relative human betaherpesvirus 7 and likely plays a role in suppressing an apoptotic response by the infected host cell (41,42). Further experimental analysis will be required to determine whether the modest extension affects the function of the U20 protein.
Among the seven ciHHV-6A genomes, two contain novel in-frame stop codons. One of these is located in U83 in GLA_4298. The other is present in U79 in GLA_15137, but this inactivating mutation is absent from the closely related ciHHV-6A genome in 7A-17p13.3 (Fig. 3C and D).
In summary, we have shown that most ciHHV-6A and ciHHV-6B genomes contain an intact set of genes and therefore may have the potential to be fully functional. This observation needs to be taken into consideration when assessing whether ciHHV-6 carrier status is associated with disease risk and in understanding the underlying mechanisms of such associations (e.g., whether viral reactivation is involved). Among the individuals of European descent, we found strong evidence for the ancient common ancestry of some of the integrated viral genomes. The close similarity between ciHHV-6B genomes in the Europeans and the evidence of multiple different integration events by similar strains also indicate that we have effectively sequenced the ancient, nonintegrated strains of HHV-6B that existed in European populations in prehistoric times. Based on these observations, it is possible that other populations, for example, in China, South Asia, and Africa, may show similar founder effects among ciHHV-6 carriers but from different ancient strains (43). Our limited knowledge of nonintegrated HHV-6A and HHV-6B strains is based mostly on strains derived from Africa and Japan. There is now a real need to sequence nonintegrated strains from other populations, including those in Europe, so that the relationship between nonintegrated HHV-6 and ciHHV-6 can be fully understood. A major challenge will be to determine whether germ line integration continues to occur de novo today and, if so, at what rate and by which viral strains.

MATERIALS AND METHODS
Population screening to identify ciHHV-6 carriers. ciHHV-6 carriers were identified by screening a variety of DNA sample collections from individuals from around the world, using PCR assays to detect either U11, U18, DR5 (HHV-6A), or DR7 (HHV-6B) (12), or U7, DR1, DR6A, or DR6B (22 and unpublished data). DR5, DR6A, DR6B, and DR7 correspond to ORFs in the original annotation of the HHV-6A genome (GenBank accession no. X83413) (25), but DR5 is in a noncoding region of the genome, and DR6A, DR6B, and DR7 are in exons of DR6 in the reannotation used (RefSeq accession no. NC_001664). From the populations screened, 58 samples with ciHHV-6 among 3,875 individuals were identified (Table 2). The number of individuals screened in most populations was small and therefore cannot be used to give an accurate estimate of ciHHV-6A or -B frequencies, although a larger number of ciHHV-6B-positive samples were identified overall. The frequency of ciHHV-6B carriers in Orkney (1.9%), a collection of islands off the north coast of Scotland, is higher than that reported from England (44). Screening of the GS:SFHS will be described elsewhere (R. F. Jarrett, unpublished data). Ethical approval for the GS:SFHS cohort was obtained from the Tayside Committee on Medical Research Ethics (on behalf of the National Health Service).
Generation of overlapping amplicons and sequencing. The 32 primer pairs used to generate overlapping amplicons from ciHHV-6A genomes and the PCR conditions employed were reported previously (18). The primer pairs used to amplify ciHHV-6B genomes were based on conserved sequences from the HHV-6B nonintegrated HST and Z29 strains (GenBank accession no. AB021506.1 and AF157706, respectively) (9, 24). The primer sequences are shown in Table S1 in the supplemental material. The amplicons from each sample were pooled in equimolar proportions and then sequenced using the Illumina MiSeq or IonTorrent (Life Technologies) next-generation sequencing platforms, as described previously (18). Some sequences were verified by using Sanger dideoxy chain termination sequencing on PCR-amplified products.
Assembly and analysis of DNA sequence data. DNA sequence data were processed essentially as described previously (18), except that SPAdes v. 3.5.0 (45) was used for de novo assembly into contigs, ABACAS v. 1.3.1 (46) was used to order contigs, and Gapfiller v. 1-11 (47) was used to fill gaps between contigs. The integrity of the sequences was verified by aligning them against the read data using BWA v. 0.6.2-r126 (53) and visualizing the alignments as BAM files using Tablet v. 1.13.08.05 (52). Nucleotide substitutions, indels, and repeat regions were also verified by manual analysis using IGV v. 2.3 (http://software.broadinstitute.org/software/igv/home).
Alignments of the seven ciHHV-6A genomes with the three published HHV-6A genomes from the nonintegrated strains U1102, GS, and AJ (25, 27, 28, 29) and alignment of the 21 ciHHV-6B genomes with the two previously published HHV-6B genomes from the nonintegrated viruses HST and Z29 (9, 24) were carried out using Gap4 (48). Variation across the ciHHV-6 genomes was studied by a combination of manual inspection and automated analysis using an in-house Perl script. The script performed a sliding-window count of substitutions using the aligned Gap4 files, reporting the count according to the midpoint of the window. For analysis across the genome, the window size was 1 kb and the step size was 1 nucleotide. For analysis of individual ORFs, a file with a list of annotated positions was generated.
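The sliding-window count described above can be sketched as follows. This is a minimal Python illustration, not the in-house Perl script itself. The function name, the decision to skip alignment columns containing gaps or ambiguous bases, and the convention of reporting a 1-based lower-middle midpoint are all assumptions made for the example.

```python
def sliding_window_substitutions(ref, query, window=1000, step=1):
    """Count substitutions between two aligned sequences in sliding windows.

    Returns a list of (midpoint, count) tuples, where midpoint is the
    1-based alignment column at the (lower) middle of each window.
    Columns in which either sequence is not A, C, G, or T (for example
    gaps or N) are not counted; this is an assumption of the sketch.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")

    # 1 where both bases are unambiguous and differ, otherwise 0
    diffs = [
        1 if a != b and a in "ACGT" and b in "ACGT" else 0
        for a, b in zip(ref.upper(), query.upper())
    ]

    # Prefix sums make each window count a constant-time lookup,
    # which matters when the step size is a single nucleotide.
    prefix = [0]
    for d in diffs:
        prefix.append(prefix[-1] + d)

    results = []
    for start in range(0, len(diffs) - window + 1, step):
        end = start + window
        midpoint = start + (window + 1) // 2
        results.append((midpoint, prefix[end] - prefix[start]))
    return results


if __name__ == "__main__":
    # Toy alignment with a 4-column window; the genome-wide analysis in the
    # text corresponds to window=1000, step=1.
    for midpoint, count in sliding_window_substitutions(
        "AAAAAAAA", "ATTAAAAA", window=4, step=1
    ):
        print(midpoint, count)
```

With the defaults (`window=1000`, `step=1`) the function mirrors the genome-wide settings stated in the text. For a per-ORF analysis, the same difference vector could instead be summed over each annotated coordinate range.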
Phylogenetic analyses were carried out by two different methods. Maximum-likelihood trees were built by using the maximum composite likelihood model (MEGA6.0), and bootstrap values were obtained with 2,000 replications. Model selection was carried out for HHV-6A and HHV-6B separately, and the substitution model with the lowest Bayesian information criterion was selected (the Tamura 3-parameter model [49] for HHV-6B and the Hasegawa-Kishino-Yano model for HHV-6A). Median-joining networks were built using Network 5.0 (Fluxus Engineering) with default parameters. Sites with missing data were excluded from all phylogenetic analyses for both HHV-6A and HHV-6B. The number of positions analyzed for HHV-6B was 130,412, and that for HHV-6A was 117,900. The TMRCA was calculated by using rho as implemented in Network 5.0. Rho values were transformed into time values using the accepted mutation rate for the human genome, 0.5E-9 substitutions per bp per year (50), scaled to the number of sites analyzed.
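The conversion of rho into time can be illustrated with a short Python sketch. It is not a reimplementation of Network 5.0: the function names are hypothetical, and the standard error formula used here (the square root of rho divided by the sample size) holds only under the simplifying assumption of a star-like genealogy in which every mutation is carried by a single sampled lineage. The mutation rate and the number of HHV-6B sites are the values given in the text; the mutation counts in the usage example are invented for illustration.

```python
from math import sqrt

MUTATION_RATE = 0.5e-9      # substitutions per bp per year (value from the text)
HHV6B_SITES = 130_412       # positions analyzed for HHV-6B (value from the text)


def rho_statistics(mutations_from_founder):
    """Return (rho, sigma) for a list of per-lineage mutation counts.

    rho is the mean number of mutations separating each sampled lineage
    from the inferred founder sequence. sigma assumes a star-like
    genealogy (an assumption of this sketch), giving sqrt(rho / n).
    """
    n = len(mutations_from_founder)
    if n == 0:
        raise ValueError("at least one lineage is required")
    rho = sum(mutations_from_founder) / n
    sigma = sqrt(rho / n)
    return rho, sigma


def rho_to_years(rho, sites, rate=MUTATION_RATE):
    """Scale rho to years: one mutation accrues every 1 / (rate * sites) years."""
    return rho / (rate * sites)


if __name__ == "__main__":
    # Hypothetical mutation counts for four carriers, not data from the study.
    rho, sigma = rho_statistics([1, 2, 2, 1])
    print("years per mutation:", rho_to_years(1.0, HHV6B_SITES))
    print("TMRCA (years):", rho_to_years(rho, HHV6B_SITES))
    print("standard error (years):", rho_to_years(sigma, HHV6B_SITES))
```

Under these values, a single substitution across the 130,412 analyzed HHV-6B positions corresponds to roughly 15,300 years, which shows why age estimates based on a handful of substitutions carry wide uncertainty.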
Comparison of tandem-repeat regions. The copy numbers of repeat units in the DR-R, R0, R1, R2, R3, and R4 tandem-repeat regions (9, 24) were determined by manual inspection of the individual BAM files generated for each sequenced ciHHV-6 genome, with verification by checking the sequence alignments generated using Gap4. The number of copies of TTAGGG in each DRL-T2 region was determined from PCR amplicons generated using the DR8F and UDL6R primers (see Table S1 in the supplemental material). Each amplicon was purified using a Zymoclean gel DNA recovery kit (Cambridge Bioscience) and then sequenced by the Sanger dideoxy chain termination method. The sequence data were analyzed by using MacVector software (MacVector Inc.). Variation at the (CA)n repeat array located immediately adjacent to T1 in HHV-6B was investigated in DRL specifically by reamplification of STELA (51) products, using the primers DR1R and TJ1F. The short amplicons were purified and sequenced as described above and compared with the same sequences in the reference HST and Z29 genomes.
Analysis of the DRR-T1 region by TVR-PCR. The DRR-T1 regions from ciHHV-6B-positive samples were amplified using the primers U100Fw2 and DR1R. TVR-PCR was conducted on each of these amplicons essentially as described previously (32, 33) but using an end-labeled primer, HHV-6B-UDR5F, and the unlabeled TAG-TELWRev. The TELWRev primer anneals to TTAGGG repeats, allowing amplification of products that differ in length depending on the location of the TTAGGG repeat with respect to the flanking primer (HHV-6B-UDR5F). The labeled amplicons from the T1 region were separated by size in a 6% denaturing polyacrylamide gel.
Analysis of HHV-6 ORFs. The frequency of nucleotide substitutions in each ORF was determined by a combination of manual inspection and automated analysis using a Perl script, as described above. The DNA sequences of the 86 HHV-6B ORFs from the 21 ciHHV-6B genomes were aligned to identify and compare the numbers of synonymous and nonsynonymous codon changes within and among genes. In addition, the predicted amino acid sequences for each gene in the 21 ciHHV-6B genomes were aligned to confirm the number of nonsynonymous changes.
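The codon-level comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name is hypothetical, the ORFs are assumed to be aligned in frame without indels, the standard genetic code is used, and the sketch counts changed codons rather than individual nucleotide substitutions. Codons containing gaps or ambiguous bases are skipped.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ref_orf, query_orf):
    """Compare two in-frame, equal-length ORFs codon by codon.

    Returns (synonymous, nonsynonymous, new_stop_codons), where
    new_stop_codons lists the 1-based codon positions at which the query
    has a stop codon that the reference does not.
    """
    if len(ref_orf) != len(query_orf) or len(ref_orf) % 3 != 0:
        raise ValueError("ORFs must be the same length and a multiple of 3")

    synonymous = nonsynonymous = 0
    new_stop_codons = []
    for i in range(0, len(ref_orf), 3):
        ref_codon = ref_orf[i:i + 3].upper()
        query_codon = query_orf[i:i + 3].upper()
        if ref_codon == query_codon:
            continue
        # Skip codons with gaps or ambiguity codes (assumption of the sketch)
        if ref_codon not in CODON_TABLE or query_codon not in CODON_TABLE:
            continue
        ref_aa = CODON_TABLE[ref_codon]
        query_aa = CODON_TABLE[query_codon]
        if ref_aa == query_aa:
            synonymous += 1
        else:
            nonsynonymous += 1
            if query_aa == "*":
                new_stop_codons.append(i // 3 + 1)
    return synonymous, nonsynonymous, new_stop_codons


if __name__ == "__main__":
    # Toy ORF (M L K W stop), not a real HHV-6 gene.
    reference = "ATGCTTAAATGGTAA"
    query = "ATGCTCAGATGATAA"
    print(classify_codon_changes(reference, query))
```

Reporting the positions of new stop codons separately makes it straightforward to flag potentially inactivating mutations, such as the in-frame stop codons discussed for U14, U79, and U83. Aligning the predicted amino acid sequences remains a useful independent check on the nonsynonymous count.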
Characterization of chromosome-ciHHV-6 junctions. The junctions between the chromosome and the ciHHV-6 genome were isolated by PCR amplification using various primers that anneal to subterminal regions of a variety of human chromosomes in combination with the DR8F primer. The amplicons were purified as described above and sequenced by the Sanger method with a variety of primers (see Table S1 in the supplemental material). The number of repeats present in each junction fragment and the interspersion of TTAGGG repeats with degenerate repeats were determined by manual inspection using MacVector software.
Accession number(s). The finished sequences have been deposited in GenBank under accession numbers KY316030 to KY316056 (Table 1). The LEI_1501 ciHHV-6A genome reported previously has accession number KT355575 (18).