ABSTRACT
Herpes simplex virus type 1 (HSV-1) is a ubiquitous human pathogen which establishes lifelong infections. In the present study, we determined the sequence diversity of the complete genes coding for glycoproteins G (gG), I (gI), and E (gE), comprising 2.3% of the HSV-1 genome and located within the unique short (US) region, for 28 clinical HSV-1 isolates inducing oral lesions, genital lesions, or encephalitis. Laboratory strains F and KOS321 were sequenced in parallel. Phylogenetic analysis, including analysis of laboratory strain 17 (GenBank), revealed that the sequences were separated into three genetic groups. The identification of different genogroups facilitated the detection of recombinant viruses by using specific nucleotide substitutions as recombination markers. Seven of the isolates and strain 17 displayed sequences consistent with intergenic recombination, and at least four isolates were intragenic recombinants. The observed frequency of recombination based on an analysis of a short stretch of the US region suggests that most full-length HSV-1 genomes consist of a mosaic of segments from different genetic groups. Polymorphic tandem repeat regions, consisting of two to eight blocks of 21 nucleotides in the gI gene and seven to eight repeats of 3 nucleotides in the gG gene, were also detected. Laboratory strain KOS321 displayed a frameshift mutation in the gI gene with a subsequent alteration of the deduced intracellular portion of the protein. The presence of polymorphic tandem repeat regions and the different genogroup identities can be used for molecular epidemiology studies and for further detection of recombination in the HSV-1 genome.
The family Herpesviridae is a large family comprising at least 100 herpesviruses that are widely distributed among animals. Eight human herpesviruses have been described, and molecular phylogenetic analysis has established three subfamilies (24): the Alphaherpesvirinae, Betaherpesvirinae, and Gammaherpesvirinae. These subfamilies correspond to the current taxonomic classification based on biological properties. Herpes simplex virus (HSV) belongs to the Alphaherpesvirinae. It is classified in this subfamily on the basis of its wide host cell range, its efficient and rapid reproductive cycle, and its capacity to establish latency in the sensory ganglia (38).
A major mechanism that maintains the accuracy of replication is the 3′→5′-exonuclease activity associated with DNA polymerases. Proofreading activity has been demonstrated for the HSV type 1 (HSV-1) DNA polymerase (1). In addition, cellular repair mechanisms probably contribute to the stability of the virus genome. The overall mutation rate for HSV-1 has been estimated to be 3.5 × 10⁻⁸ mutations/site/year (41). The HSV-1 genome is therefore considerably more stable than the genomes of RNA viruses (13, 18). Genetic variation and classification into different genogroups have been described for other herpesviruses, such as varicella-zoster virus (31), Epstein-Barr virus (42), cytomegalovirus (6, 7), and human herpesviruses 6 (8), 7 (15), and 8 (28). In contrast, data on the genetic variability of clinical HSV-1 isolates based on DNA sequencing are limited. Recently, Rekabdar et al. sequenced the gene coding for glycoprotein G (gG) for 105 clinical HSV-1 isolates and identified 28 unique sequences (36). Thus, most of the isolates displayed identical gG sequences, a finding that indicates genome stability, at least among HSV-1 isolates in the western hemisphere. Furthermore, four of the investigated HSV-1 isolates appeared to be intragenic recombinants.
In the present study, we sequenced the genes coding for the envelope glycoproteins gG, glycoprotein I (gI), and glycoprotein E (gE), located within the unique short (US) region, for another group of clinical HSV-1 isolates. The gI and gE gene sequences were clearly separated into three genetic groups. As described earlier (36), phylogenetic analysis of the complete gG gene showed two main genetic groups. However, analysis of separate parts of the gene showed that the gG gene sequences could also be divided into three genetic groups. The phylogenetic classification of the isolates into different genogroups revealed that recombination is an important feature in the evolution of the US region of the HSV-1 genome.
MATERIALS AND METHODS
Isolates and cells. In total, 28 clinical HSV-1 isolates were selected from 11 patients with oral lesions, 10 patients with genital lesions, and 7 patients with encephalitis (Table 1). The isolates derived from patients with encephalitis were cultured from brain biopsy or autopsy specimens and have been described elsewhere (3). Laboratory strains F and KOS321 were sequenced in parallel; these laboratory strains have been available since 1981 and 1983, respectively. All viruses used in this study were investigated at a low passage number (<5). African green monkey kidney (GMK-AH1) cells were cultured in Eagle's minimal essential medium supplemented with 2% calf serum and antibiotics. The isolates were subtyped by using a type-specific anti-glycoprotein C (gC) monoclonal antibody as described previously (20).
Table 1. Characteristics of the 28 patient isolates included in the phylogenetic analysis
PCR and DNA sequencing. The gG gene (US4) was amplified as described previously (35). Amplification and sequencing of the complete gI (US7) and gE (US8) genes were carried out by using the primers listed in Table 2. Viral DNA was prepared by using a QIAamp blood kit (Qiagen). The PCR was started with denaturation for 5 min at 96°C, followed by 40 cycles of denaturation for 45 s at 95°C, annealing for 45 s at 58°C (gI gene) or 62°C (gE gene), and elongation for 45 s at 72°C, with the elongation step extended by 3 s in each cycle; a Perkin-Elmer DNA thermal cycler was used. The PCR products were extracted by using a QIAquick gel extraction kit (Qiagen), and cycle sequencing was performed with a dRhodamine Terminator cycle sequencing ready reaction kit (Applied Biosystems). An ABI Prism 310 genetic analyzer (Applied Biosystems) was used for analysis of the samples. HSV-1 DNA was sequenced in both directions by using sense and antisense primers.
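The cycling conditions above can be laid out step by step. The sketch below uses hypothetical names (`Step`, `pcr_program`). It assumes that the "3-s cycle extension" means the 72°C elongation step grows by 3 s in every successive cycle, so the 40th cycle elongates for 45 + 3 × 39 = 162 s. Ramping times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


# Gene-specific annealing temperatures taken from the text.
ANNEALING_C = {"gI": 58.0, "gE": 62.0}


def pcr_program(gene, cycles=40, base_elong_s=45, ext_s=3):
    """Initial denaturation followed by `cycles` three-step cycles.
    Assumption: elongation lengthens by `ext_s` seconds in each successive cycle."""
    program = [Step("initial denaturation", 96.0, 5 * 60)]
    for i in range(cycles):
        program.append(Step("denaturation", 95.0, 45))
        program.append(Step("annealing", ANNEALING_C[gene], 45))
        program.append(Step("elongation", 72.0, base_elong_s + ext_s * i))
    return program


if __name__ == "__main__":
    prog = pcr_program("gE")
    elong = [s.seconds for s in prog if s.name == "elongation"]
    print(elong[0], elong[-1])  # 45 162
    total_s = sum(s.seconds for s in prog)
    print(f"{total_s / 60:.1f} min of hold time, ramping excluded")  # 134.0 min
```

Under this reading, the hold times alone add up to about 2 h 14 min per run.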
Table 2. Primers used for amplification and sequencing of the gI and gE genes
Sequence analysis. Alignments of the sequences were constructed by using ClustalW (version 1.82) with the International Union of Biochemistry DNA weight matrix. Different gap opening and gap extension penalties, as well as manual rearrangements of the alignment, were applied to achieve optimal results. Phylogenetic analyses were based on sequences excluding gap-containing regions and were performed by using the Phylip package. Trees were constructed by the maximum-likelihood method with 100 bootstrap replicates. Neighbor-joining and maximum-parsimony methods were applied in parallel for comparison. Trees based on sequences from the gI and gE genes were rooted by using the corresponding genes from HSV-2 laboratory strain HG52 (12), retrieved from the EMBL database, as outgroup sequences. The phylogenetic tree constructed for the gG gene sequences could not be rooted because of the large genetic distance between the gG genes of HSV-1 and HSV-2 (26, 27) and is therefore presented as an unrooted tree. Results were compared with sequences reported for HSV-1 laboratory strain 17 (GenBank accession number X14112) (26).
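In Phylip, bootstrap support is obtained by resampling alignment columns with replacement (the seqboot step) and building a tree for each pseudo-alignment. The minimal sketch below shows only the resampling, performed after gap-containing columns have been removed as described above. The function names are hypothetical. Tree building (for example, with dnaml, neighbor, or dnapars) and consensus counting are not shown.

```python
from __future__ import annotations

import random


def drop_gap_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every alignment column in which any sequence has a gap ('-')."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def bootstrap_replicates(alignment: dict[str, str], n_reps: int = 100, seed: int = 1):
    """Yield pseudo-alignments built by sampling columns with replacement.
    Every sequence in a replicate uses the same sampled column indices."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(s[c] for c in cols) for name, s in alignment.items()}


if __name__ == "__main__":
    toy = {"iso1": "AC-GT", "iso2": "ACTGT", "iso3": "AGTGA"}
    clean = drop_gap_columns(toy)
    print(clean)  # {'iso1': 'ACGT', 'iso2': 'ACGT', 'iso3': 'AGGA'}
    for rep in bootstrap_replicates(clean, n_reps=3):
        print(rep)  # each replicate would then be passed to tree building
```

The support value reported for a branch is the percentage of replicate trees that contain that branch.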
In an attempt to detect putative intragenic recombinant isolates, the phylogeny of gene segments was determined by the bootscan analysis included in the SimPlot program (22). The genes were scanned by using a sliding window of 250 nucleotides (nt) with a 20-nt step size. The neighbor-joining method was applied to 100 bootstrap replicates of the window in each step. The consensus sequence for each genetic group was used as the reference sequence for the recombinant candidates.
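The bootscan procedure can be approximated in a few lines. This sketch simplifies SimPlot's method and is not its implementation. Each group reference is a majority-rule consensus. In each bootstrap replicate of a window, the query is assigned to the reference with the lowest uncorrected p-distance, rather than to its sister group in a neighbor-joining tree. The window size (250 nt), step size (20 nt), and 100 replicates follow the text. All function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter


def consensus(seqs: list[str]) -> str:
    """Majority-rule consensus of aligned, equal-length sequences.
    Ties go to the base encountered first in the column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def windows(length: int, size: int = 250, step: int = 20) -> list[tuple[int, int]]:
    """0-based (start, end) coordinates of full-length sliding windows."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]


def p_distance(a: str, b: str) -> float:
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootscan(query: str, refs: dict[str, str], size: int = 250, step: int = 20,
             reps: int = 100, seed: int = 1) -> list[tuple[int, dict[str, float]]]:
    """Per window: % of column-bootstrap replicates in which each reference is the
    single closest sequence to the query (a simplified stand-in for an NJ bootscan)."""
    rng = random.Random(seed)
    results = []
    for start, end in windows(len(query), size, step):
        wins = Counter()
        for _ in range(reps):
            cols = [rng.randrange(start, end) for _ in range(end - start)]
            q = "".join(query[c] for c in cols)
            dist = {g: p_distance(q, "".join(r[c] for c in cols)) for g, r in refs.items()}
            best = min(dist.values())
            closest = [g for g, d in dist.items() if d == best]
            if len(closest) == 1:  # tied replicates count for no group
                wins[closest[0]] += 1
        results.append((start, {g: 100 * wins[g] / reps for g in refs}))
    return results
```

In use, the references would be built as, for example, `{"A": consensus(group_a), "B": consensus(group_b), "C": consensus(group_c)}` from aligned, gap-free sequences of each genogroup.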
RESULTS
Sequence variations of the gG, gI, and gE genes. The HSV-1 genome is composed of the unique long (UL) and the US regions, which are flanked by inverted repeats. The gG gene contains 717 nt (strain 17) and is located 2,325 nt from the gI gene in the US segment of the HSV-1 genome. The gI gene (1,173 nt; strain 17) and the gE gene (1,650 nt; strain 17) are separated by 285 nt (Fig. 1). The complete genes were sequenced for 28 clinical HSV-1 isolates and for laboratory strains F and KOS321, and the sequences were compared with that for strain 17. For the gG gene, the preceding 26 nt were also analyzed. The sequences were schematically aligned, showing the distributions of synonymous and nonsynonymous substitutions compared to the consensus sequence for the respective gene (Fig. 1). The locations and numbers of tandem repeats are also shown. A detailed analysis of the sequences is presented below.
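To classify a substitution as synonymous or nonsynonymous, translate the affected codon in both the consensus and the isolate and compare the two amino acids. The minimal sketch below assumes gap-free, in-frame coding sequences of equal length that contain only A, C, G, and T, and it uses the standard genetic code. When two substitutions fall in the same codon, this simple version classifies each of them against the fully mutated codon. The function name is hypothetical.

```python
from __future__ import annotations

# Standard genetic code; codons enumerated with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_substitutions(seq: str, consensus: str) -> list[tuple[int, str, str, str]]:
    """Compare an in-frame coding sequence with the consensus.
    Returns (1-based position, consensus base, isolate base, 's' or 'n')."""
    if len(seq) != len(consensus) or len(seq) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    calls = []
    for idx, (c, s) in enumerate(zip(consensus, seq)):
        if c != s:
            start = idx - idx % 3  # first base of the affected codon
            same_aa = CODON_TABLE[consensus[start:start + 3]] == CODON_TABLE[seq[start:start + 3]]
            calls.append((idx + 1, c, s, "s" if same_aa else "n"))
    return calls


if __name__ == "__main__":
    print(classify_substitutions("ATGTTTGTG", "ATGTTCGAG"))
    # [(6, 'C', 'T', 's'), (8, 'A', 'T', 'n')]
```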
Fig. 1. Schematic representation of sequence data for the US4, US7, and US8 genes of 28 clinical isolates as well as laboratory strains F, KOS321, and 17 (GenBank). The sequences were compared to the consensus sequence of each gene, and synonymous (s) and nonsynonymous (n) substitutions are indicated. The sequences are presented in the same order for all three genes. The genetic groups indicated by the phylogenetic analysis (see Fig. 2) are arbitrarily designated A, B, and C, and the isolates are color coded based on genetic group identity. Isolates not clustering distinctly with any genetic group are shown in black. The asterisk in the US7 gene of strain KOS321 represents an insertion of an extra nucleotide. Nucleotide substitutions in the noncoding region upstream of the US4 gene are marked (x). The tandem repeat regions (TR) are shown in light blue, and the numbers of repeats are denoted.
Phylogenetic analysis of the gI gene sequences. After alignment of the sequences and exclusion of the tandem repeat region, sequence analysis including laboratory strain 17 (GenBank) was performed by the maximum-likelihood method. The sequences were clearly separated into three main genetic groups supported by high bootstrap values (Fig. 2). Similar topologies were seen when the neighbor-joining and maximum-parsimony methods were used. The genogroups, arbitrarily designated A, B, and C, differed by 0.6% to 2%. Twenty-nine unique sequences were detected. Laboratory strain KOS321 displayed an insertion of an extra cytosine in a homopolymer run of eight cytosines (nt 1022 to 1029) compared to strain 17. The frameshift mutation introduced a novel stop codon (TAA) located 33 nt downstream of the ordinary stop codon. Thus, approximately half of the deduced intracellular portion of the gI protein contained altered amino acids. Strain KOS321 is a virus that was plaque purified from strain KOS (16). Both stock virus and virus from passages were sequenced several times to exclude PCR or cell culture artifacts. All of the various preparations of strain KOS321 displayed the described frameshift mutation. However, as parental strain KOS was not available for sequencing, we cannot exclude the possibility that the described frameshift mutation was introduced during the plaque purification process.
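A toy open reading frame illustrates the effect of a single extra C in a homopolymer run. The sequence below is hypothetical and is not the KOS321 gI sequence. It shows three things: the amino acids upstream of the insertion are unchanged, every downstream codon is read in a new frame, and translation ends at a different stop codon. It also shows that inserting the C at any position within the run yields the same sequence, so the exact insertion site within the run cannot be determined. The function names are hypothetical.

```python
# Standard genetic code; codons enumerated with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq):
    """Translate from the first base up to and including the first stop ('*')."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def insert_base(seq, index, base):
    """Insert `base` before the 0-based position `index`."""
    return seq[:index] + base + seq[index:]


# Hypothetical ORF containing a run of eight C's (0-based positions 6 to 13).
wt = "ATGAAACCCCCCCCGAAATAACCTGA"
mut = insert_base(wt, 6, "C")

if __name__ == "__main__":
    print(translate(wt))   # MKPPPK*
    print(translate(mut))  # MKPPPEIT*
    print(insert_base(wt, 13, "C") == mut)  # True
```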
Fig. 2. Phylogenetic analysis with the maximum-likelihood method of the gI and gE gene sequences, excluding the tandem repeat regions, for 28 clinical isolates and laboratory strains F, KOS321, and 17 (GenBank). The trees were constructed from 100 bootstrap replicates by using the Phylip package. Bootstrap values of >70 are shown. The trees clearly separate the isolates into three genetic groups (A to C). Isolates that cluster with different groups in the two trees are considered intergenic recombinants. Isolates with a recombination point located between the gI and gE genes are shown in bold type and are connected by broken lines, while isolates with a recombination point located between the gG and gI genes are shown in bold type and are underlined (see Fig. 4). Intragenic recombinants (see Fig. 5) are marked with an asterisk.
Alignment of the sequences revealed the presence of an intragenic tandem repeat region within the gI gene. All isolates harbored a region consisting of two to eight repeated blocks with the sequence CCTCCACCCCCTCGACCACCA (Fig. 3). In addition, a second type of block with the sequence TCCCCGCTCCCTCGACCACCA, repeated one to three times, was detected in six of the clinical isolates as well as in strain 17; the first 9 nt of this block (TCCCCGCTC) are identical to the first 9 nt of the flanking region located downstream. Laboratory strain 17, as well as six of eight clinical isolates belonging to genetic group C in the gI gene tree (Fig. 2), contained the second type of block, but none of the isolates belonging to genetic group A or B did. These findings further support the existence of a third, evolutionarily separate group. The tandem repeat region was located from nt 620 to nt 682 in strain 17, which contains three repeats. Single point mutations were detected in the first block in isolate 25 and in the fifth block in isolate 993626 (Fig. 3). Isolate E4 displayed four complete blocks and a partial block composed of the initial 11 nt of the complete block. In addition, isolate E4 displayed a deletion of 17 nt in the flanking region (Fig. 3). No association was found between the number of repeated blocks and classification into the phylogenetically separated groups, suggesting that the tandem repeat region evolved separately from, and faster than, the remainder of the gene.
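Counting repeat blocks can be sketched as a greedy scan for exact copies of the two 21-nt block types given above. This is a simplification for illustration, and the function name is hypothetical. Because the scan stops at the first mismatch, blocks carrying point mutations (as in isolates 25 and 993626) or partial blocks (as in isolate E4) would require an additional approximate-matching step. The coordinates for strain 17 are consistent with three blocks, since 682 − 620 + 1 = 63 = 3 × 21 nt.

```python
BLOCK_1 = "CCTCCACCCCCTCGACCACCA"  # first block type (21 nt), from the text
BLOCK_2 = "TCCCCGCTCCCTCGACCACCA"  # second block type (21 nt), from the text


def parse_repeat_region(seq, start=0, blocks=(BLOCK_1, BLOCK_2)):
    """Read consecutive exact copies of the given blocks from 0-based `start`.
    Returns the block types found (1-based indices) and the end position."""
    types, pos = [], start
    while True:
        for kind, block in enumerate(blocks, start=1):
            if seq.startswith(block, pos):
                types.append(kind)
                pos += len(block)
                break
        else:  # no block type matches at pos: end of the repeat region
            return types, pos


if __name__ == "__main__":
    # Synthetic region: two type 1 blocks, one type 2 block, then a flank that
    # begins with TCCCCGCTC (the bases after the first 9 nt are made up).
    region = "GG" + BLOCK_1 * 2 + BLOCK_2 + "TCCCCGCTCAAA"
    print(parse_repeat_region(region, 2))  # ([1, 1, 2], 65)
```

Because the flank shares only its first 9 nt with the second block type, exact matching does not mistake it for an additional block.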
Fig. 3. Organization of the tandem repeat regions in the gI genes of 28 clinical isolates and laboratory strains F, KOS321, and 17 (GenBank), which are shown in bold type. The second type of block, which is present only in isolates belonging to genetic group C, is underlined. Single nucleotide substitutions are shown in bold type and are underlined.
Genetic stability in cell cultures. In a separate experiment, we investigated the genetic stability of the tandem repeat region of the gI gene in cell cultures. Nine clinical samples derived directly from HSV-1-induced lesions were analyzed. The samples contained two to six repeated blocks in the gI gene (data not shown). Identical numbers of repeats were present after three passages on GMK-AH1 cells for each isolate. These findings indicate that the described polymorphism of the tandem repeat region observed in the 28 clinical isolates is not a cell culture artifact.
Phylogenetic analysis of the gE gene sequences. As shown for the gI gene sequences, phylogenetic analysis including strain 17 revealed three main genetic groups supported by high bootstrap values (Fig. 2). Twenty-three unique sequences were identified. The genogroups differed by 0.7% to 1.1%. All of the isolates classified into genogroup B displayed an insertion of 6 nt (CGAGGG) after position 551, resulting in a tandem repeat region with two repeated blocks of 6 nt. In contrast, none of the isolates classified into genogroup A or C displayed this insertion.
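A distance between genogroups can be expressed as the mean uncorrected p-distance over all between-group pairs of aligned, gap-free sequences. The text does not state whether the percentages above were computed this way, so the sketch below (with hypothetical function names) shows only one plausible definition. For scale, a distance of 1.1% over the 1,650-nt gE gene corresponds to roughly 18 differing positions.

```python
from itertools import product


def p_distance(a, b):
    """Uncorrected distance: proportion of aligned positions that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_between_groups(group1, group2):
    """Mean p-distance over all pairs with one sequence from each group."""
    pairs = list(product(group1, group2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    group_x = ["AAAAAAAAAA"]                  # synthetic example sequences
    group_y = ["AAAAAAAAAC", "AAAAAAAACC"]
    print(f"{100 * mean_between_groups(group_x, group_y):.1f}%")  # 15.0%
```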
Phylogenetic analysis of the gG gene sequences. The complete gG gene as well as the preceding 26 nt were sequenced for the clinical isolates and for laboratory strains F and KOS321 (Fig. 1). Fourteen unique sequences were detected. As described earlier, two main genetic groups were identified by phylogenetic analysis based on the complete gene (36). We observed that the results of the phylogenetic analysis depended on which part of the gG gene was investigated. When the sequences were analyzed segment by segment, the gene was found to consist of two distinct parts separated by the tandem repeat region (nt 235 to 255; strain 17; see below). The first part of the gene, encompassing nt 1 to 234, was highly conserved, with only two informative sites (nt 8 and nt 228). When the sequences upstream of the gG gene were analyzed, an additional specific nucleotide substitution was detected at position −26. These three specific sites cluster the isolates into two groups: isolates from groups A and B (as defined by the tree topologies for the gI and gE genes) in one group and isolates from group C in the other. The second part, encompassing nt 256 to 717, contains 18 specific sites in total that assign the sequences either to group A or to groups B and C. Consequently, the main difference between the two parts of the gG gene is that isolates from group B cluster with group A in the first part and with group C in the second part (Fig. 4). This indicates that the gG gene sequences are also divided into three genetic groups. Based on the complete gene, the genetic distance was 2.8% between groups A and B and 3.1% between groups A and C. When the third genogroup for the gG gene sequences is included, laboratory strain KOS321 clusters with group A, strain F clusters with group B, and strain 17 clusters with group C.
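The conventional definition of a parsimony-informative site is a column with at least two different nucleotides, each present in at least two sequences, and it can be checked column by column. The "informative sites" described above are group-specific sites, which may be a narrower notion. This sketch implements only the standard definition and ignores gaps. The function name is hypothetical.

```python
from collections import Counter


def informative_sites(alignment):
    """1-based positions at which at least two different nucleotides each
    occur in at least two of the aligned sequences (gaps ignored)."""
    sites = []
    for pos, column in enumerate(zip(*alignment), start=1):
        counts = Counter(base for base in column if base != "-")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(pos)
    return sites


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA", "TCGT"]  # synthetic alignment
    print(informative_sites(toy))  # [1, 4]
```

A site that differs in only one sequence, such as a substitution unique to one isolate, is not informative under this definition, because it cannot support any grouping of two or more sequences.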
Fig. 4. Phylogenetic analysis of two gG gene segments, from nt −26 to nt 234 and from nt 256 to nt 717, separated by the tandem repeat region. The isolates are color coded, and intergenic recombinants with a recombination point located between the gG and gI genes are underlined (see Fig. 2). The informative sites are displayed, and the genetic groups (A to C) are represented by the laboratory strains.
The alignment also revealed that the gG gene contained a tandem repeat region with six or seven repeats of GAG flanking GAA. This region codes for a polyglutamate stretch. When 110 additional HSV-1 sequences presented earlier (36) were included in the analysis, isolates within genetic group A coded for seven, eight, or nine glutamate residues, whereas isolates within genetic groups B and C coded for seven glutamate residues. The distribution of the tandem repeats among the isolates suggests that this region evolved more slowly than the tandem repeat region located in the gI gene.
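Because both GAA and GAG encode glutamate, the number of glutamate residues encoded by the repeat region can be read directly from the codons. The sketch below (with a hypothetical function name) counts the longest in-frame run of glutamate codons. It assumes that the input begins at the first base of a codon, and the example sequences are synthetic.

```python
GLU_CODONS = {"GAA", "GAG"}  # the two glutamate codons of the standard code


def longest_glu_run(cds):
    """Length of the longest in-frame run of glutamate codons (GAA or GAG)."""
    best = current = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in GLU_CODONS:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


if __name__ == "__main__":
    print(longest_glu_run("GAA" + "GAG" * 6))  # 7
    print(longest_glu_run("CCC" + "GAG" * 3 + "GAA" + "GAG" * 4 + "CCC"))  # 8
```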
Tunbäck et al. previously mapped the reactivity of an anti-gG-1 monoclonal antibody to a short linear stretch of gG-1 amino acids, AFPL (46). Isolates belonging to genetic groups B and C, represented by laboratory strains F and 17, are reactive with the anti-gG-1 monoclonal antibody because their codon TTC (T at nt 331; Fig. 4) codes for the phenylalanine located within the epitope (data not shown). In contrast, isolates belonging to genetic group A, represented by laboratory strain KOS321, are not reactive with the antibody because they carry G at nt 331. The resulting codon, GTC, codes for valine instead of phenylalanine.
Recombinant isolates. When the topologies of the phylogenetic trees were compared, some isolates appeared on different branches, depending on which gene the tree was based on. Strain 17 belonged to group A and isolate 3355 belonged to group B when the gE gene was analyzed, while both of these isolates belonged to group C when the gI gene was analyzed (Fig. 2). Isolates 90132, 90147, and 993626 belonged to group A when the gE and gI genes were analyzed and to group B in the gG gene tree (Fig. 2 and 4). Isolates 7682, 993606, and 993608 clustered with group A in the gE gene tree but with group B in the gI gene tree. In addition, isolate 7682 clustered with group C in the gG gene tree (Fig. 4). These findings suggest that these isolates are intergenic recombinants with recombination points located between the respective genes.
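Comparing the tree topologies amounts to checking, for each isolate, whether genes that are adjacent along the US region (US4, US7, US8) are assigned to the same genogroup. The sketch below encodes only the assignments stated in the text, and genes not mentioned for an isolate are omitted. For each isolate, it reports every pair of adjacent typed genes between which the group changes. The name `switch_points` is hypothetical.

```python
# Group assignments stated in the text; genes not mentioned are left out.
ASSIGNMENTS = {
    "17":     {"gG": "C", "gI": "C", "gE": "A"},
    "3355":   {"gI": "C", "gE": "B"},
    "90132":  {"gG": "B", "gI": "A", "gE": "A"},
    "90147":  {"gG": "B", "gI": "A", "gE": "A"},
    "993626": {"gG": "B", "gI": "A", "gE": "A"},
    "7682":   {"gG": "C", "gI": "B", "gE": "A"},
    "993606": {"gI": "B", "gE": "A"},
    "993608": {"gI": "B", "gE": "A"},
}
GENE_ORDER = ["gG", "gI", "gE"]  # US4, US7, US8 along the US region


def switch_points(calls):
    """(gene_before, gene_after) for adjacent typed genes whose groups differ."""
    typed = [gene for gene in GENE_ORDER if gene in calls]
    return [(a, b) for a, b in zip(typed, typed[1:]) if calls[a] != calls[b]]


if __name__ == "__main__":
    for isolate, calls in ASSIGNMENTS.items():
        print(isolate, switch_points(calls))
```

Isolate 7682 is flagged on both sides of the gI gene, which is consistent with the two recombination points implied by its three different group assignments.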
From the tree topologies of the gI and gE gene sequences, it was apparent that some of the isolates did not cluster distinctly with the genetic groups (Fig. 2). For detection of intragenic recombinants, the gI genes for five isolates (25, 7682, 993606, E4, and 993615) and the gE genes for six clinical isolates (97869, 982466, 993615, 993608, 7682, and E4) were analyzed segment by segment by the bootscan method. For most of these isolates, a complex mixture of genetic segments from the different genetic groups was detected. As many segments within the genes are highly conserved and display few informative sites, conclusive information was difficult to obtain. However, four of the isolates displayed a distinct mosaic pattern (bootstrap values of >70) consistent with intragenic recombination (Fig. 5). Isolates 25 and 7682 harbored two segments each from genetic groups B and C in their gI genes. The gE gene in isolate 993615 consisted of segments from all of the genetic groups—A, B, and C. The gE gene in isolate E4 consisted of a segment derived from group B and a segment derived from group C.
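Bootscan output can be turned into a mosaic of segments by merging consecutive windows assigned to the same group. In this sketch, a window is assigned to a group only if that group's bootstrap support exceeds 70, the threshold used above. The input format (window start plus per-group support), the function name, and the example values are assumptions for illustration.

```python
def call_segments(scan, threshold=70.0):
    """Merge consecutive windows into (first_start, last_start, group) runs.
    A window gets a group only if that group's support exceeds `threshold`;
    otherwise it is labelled None (no conclusive assignment)."""
    segments = []
    for start, support in scan:
        group, value = max(support.items(), key=lambda kv: kv[1])
        label = group if value > threshold else None
        if segments and segments[-1][2] == label:
            first = segments[-1][0]
            segments[-1] = (first, start, label)
        else:
            segments.append((start, start, label))
    return segments


if __name__ == "__main__":
    example = [  # invented per-window support values (%)
        (0, {"B": 95, "C": 5}),
        (20, {"B": 80, "C": 20}),
        (40, {"B": 50, "C": 50}),
        (60, {"B": 10, "C": 90}),
    ]
    print(call_segments(example))  # [(0, 20, 'B'), (40, 40, None), (60, 60, 'C')]
```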
Fig. 5. Bootscan analysis of the gE and gI genes with the SimPlot program. A sliding window of 250 nt with a 20-nt step size was used. The neighbor-joining algorithm was applied to 100 bootstrap replicates. The consensus sequences for genetic groups A, B, and C were matched to four putative intragenic recombinant isolates: 993615 (A) and E4 (B) for the gE gene and 25 (C) and 7682 (D) for the gI gene. The corresponding phylogenetic trees with bootstrap values of >70 are shown above the plots.
Four putative gG gene intragenic recombinant isolates were recently described from among a group of 105 clinical HSV-1 isolates (36). Here we detected one additional recombinant (isolate 97869) with a sequence identical to that of one of the isolates described earlier. This isolate contained eight repeats in the tandem repeat region, a feature found exclusively in isolates belonging to genetic group A. It also carried the two specific informative sites T267 and A280 (positions for strain 17), which likewise indicated clustering of the sequence with genetic group A. In contrast, C324 and 15 additional specific informative sites downstream indicated clustering of the sequence with genetic group B or C (Fig. 4). These data support the notion that recombination has occurred between viruses from genetic group A and genetic group B or C, with a recombination point located between A280 and C324 in the gG gene.
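A crossover can lie anywhere between two informative sites that support different groups, so the recombination point can only be bracketed, not pinpointed. The minimal sketch below uses the three sites named above for isolate 97869 and a hypothetical function name. The 15 further downstream sites are not listed in the text and are omitted.

```python
def breakpoint_intervals(sites):
    """sites: (position, supported group) pairs for one isolate.
    Returns (last position before, first position after) each change of group."""
    ordered = sorted(sites)
    return [(p1, p2) for (p1, g1), (p2, g2) in zip(ordered, ordered[1:]) if g1 != g2]


# Informative sites for isolate 97869 named in the text (strain 17 numbering);
# "BC" stands for "group B or C".
ISOLATE_97869 = [(267, "A"), (280, "A"), (324, "BC")]

if __name__ == "__main__":
    print(breakpoint_intervals(ISOLATE_97869))  # [(280, 324)]
```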
Association of genetic variants with clinical entities. We analyzed whether gender or the anatomical site of the lesions was associated with any specific genetic group, with intergenic or intragenic recombination events, or with a specific number of tandem repeats. We observed only one association: none of the 10 genital HSV-1 isolates clustered with genetic group B of the gE gene (Table 1 and Fig. 2). We therefore investigated 10 additional clinical isolates belonging to genetic group B of the gE gene. Five of these were isolated from genital lesions, suggesting that the association observed in the present study arose by chance.
DISCUSSION
Here we show that the gI and gE gene sequences from well-defined clinical HSV-1 isolates were separated into three main genogroups. The classification was supported by high bootstrap values, suggesting that these groups have evolved separately. In addition, we observed that the gI gene sequences that clustered with genetic group C harbored a unique block in the tandem repeat region, in contrast to sequences that clustered with genetic group A or B. Furthermore, a short tandem repeat region in the gE gene was detected only in sequences clustering with group B. The distribution of repeats in the tandem repeat regions agrees with the phylogenetic analysis and further supports the existence of three genogroups. One possible interpretation of the phylogenetic analysis of the different parts of the gG gene is that an ancient recombination event occurred between genetic groups B and C: viruses belonging to genetic group B apparently acquired the latter part of the gene from group C, or vice versa (Fig. 4). The lack of specific substitutions separating group A from group B in the first part of the gG gene may be explained by the high degree of conservation of this segment or by an ancient recombination event in this segment as well. All three investigated genes, US4 (gG), US7 (gI), and US8 (gE), are located in the US region of the HSV-1 genome.
In a recently published report, the complete gene (UL44) coding for gC was sequenced for most of the isolates investigated here (23 isolates); even though the bootstrap values were low (<70), phylogenetic analysis suggested that these sequences also were divided into three genetic groups (45). As the genetic groups based on the gC gene sequences consisted of a mixture of isolates from the three genetic groups defined for the US region, the designation of the groups was ambiguous. This observation may be explained by a high frequency of recombination events between the gC gene and the gG gene. However, the lack of genetic sequence information in the intervening region caused difficulties in defining individual recombinant isolates. Hence, more sequence data from the UL region are required to draw more definite conclusions.
Homologous recombination not only allows repair of double-strand DNA breaks but also promotes exchange of genetic information between homologous nucleotide segments. The replication of HSV-1 DNA is closely associated with homologous recombination (48). Through the use of strains with different markers, HSV-1 recombinants have frequently been detected in cell culture systems (5, 17, 47). After coinfection with different HSV-1 strains in heterologous animal models, recombination has readily been identified (21, 50). Recently, recombinants between bovine herpesvirus 1 mutants were also demonstrated after coinoculation of calves by the natural route of infection (43). The identification of different genogroups in the present study implies that nucleotide substitutions comprising specific informative sites can be used as recombination markers to detect recombinant isolates. In total, 7 of 28 clinical isolates as well as laboratory strain 17 were found to be intergenic recombinants, with a putative recombination point between the gI and gE genes or between the gI or gE gene and the gG gene. Such recombinants have not previously been described for clinical HSV-1 isolates. However, recombination of the type-specific EBNA2 and EBNA3A-C loci has been reported for clinical Epstein-Barr virus isolates (29, 49), and recombination between different genotypes has been described for clinical varicella-zoster virus isolates (31) as well as for human herpesvirus 8 isolates (34). Several of the isolates described here harbored putative recombination points located within a gene. The frequency of homologous recombination is known to depend on the length of the homologous region (39, 51).
The complex pattern of a mixture of short segments (<300 nt) belonging to different genetic groups within the investigated genes may be explained by overlapping multiple recombination events; alternatively, HSV-1 replication may facilitate recombination of shorter segments, as described for repeat sequences (14).
Combining genetic elements from different genetic groups can produce a large number of genetic mosaic patterns. Only relatively short regions of the HSV-1 genome were analyzed here, and the frequency of recombination events was probably underestimated because highly conserved regions contain few or no informative substitutions. We therefore consider it likely that most individual clinical HSV-1 isolates are the result of several recombination events. Thus, complete HSV-1 genomes that belong exclusively to any one genetic group are probably rare. Some of the specific nonsynonymous substitutions detected for the respective genetic groups may confer different phenotypic features under specific biological conditions during evolution, and the genetic heterogeneity generated by recombination may be important for the adaptability of HSV-1. A prerequisite for recombination is the simultaneous replication of different genomes in the same cell. This implies either that an individual is reinfected with different isolates or that a single HSV-1 isolate contains a heterogeneous pool of genomes that are transmitted simultaneously.
An evolutionary history of the gG, gI, and gE genes of clinical HSV-1 isolates can now be proposed, although the evolutionary timescale cannot yet be estimated (Fig. 6). After the separation of HSV-1 from HSV-2, tandem repeat regions were introduced in the gG and gI genes; these regions are present in all three genetic groups. Next, HSV-1 divided into groups AB and C. This was followed by separation into groups A and B and the subsequent introduction of tandem repeats in the gE gene of viruses belonging to group B. After the separation of groups AB and C, a second type of block was either introduced into the gI gene of viruses belonging to group C or deleted from the gI gene of viruses belonging to group AB. Finally, point mutations specific for single isolates were introduced in all three genetic groups.
Fig. 6. Schematic phylogenetic illustration of a suggested evolutionary history of the gG, gI, and gE genes of HSV-1. Nucleotide positions are for laboratory strain 17. The straight broken line between genogroups B and C displays the possible earlier recombination event between the genogroups in the gG gene. The curved broken lines illustrate recombination of genomic segments between the different genogroups described for present-day clinical HSV-1 isolates.
A limitation of the present study is that clinical HSV-1 isolates from a geographically restricted area (western part of Sweden) and from only Caucasian individuals were investigated. However, the phylogenetic analyses described showed that laboratory strains F and KOS321, viruses which originate in North America, and strain 17, which originates in Scotland, were closely related to the investigated isolates. The same conclusions were reported earlier for HSV-1 (41) and HSV-2 (19) isolates. It is therefore likely that our results are valid for at least a Caucasian western-hemisphere population. The major human migrations from Africa were estimated to have occurred 110,000 years ago, and the separation of the European and Asian populations was estimated to have occurred 50,000 years ago (32). HSV-1 DNA sequence information from these populations is required to determine the distribution and number of genogroups as well as the recombination frequency. As HSV-1 is assumed to cospeciate with its host (24, 41), sequence data from HSV-1 isolates derived from Africa and Asia may also be used to estimate a timescale for the divergence of the genogroups and to date the possible recombination event between genogroups B and C in the gG gene.
Tandem repeat regions are common features of the HSV-1 genome. They are located within the direct repeat regions at the genomic termini as well as within the internal repeat region separating the L and S segments (10, 30, 33). Tandem repeats have also been detected within the coding sequences of the UL36 gene (25), the US10 gene (9), and the ICP34.5 gene (4, 23). McGeoch et al. (26) previously described the reiteration of DNA sequences within the gG and gI genes of HSV-1 strain 17. Here we show that these regions are polymorphic in clinical isolates. The tandem repeats of the gI gene code for three types of blocks rich in serine, threonine, and proline (STPSTTT, STPSTTI, and PAPSTTI). These residues are typical sites of O-linked glycosylation, suggesting that this region functions as a mucin tract. The gI gene sequences of the alphaherpesviruses HSV-2, varicella-zoster virus, simian varicella virus, pseudorabies virus, equine herpesviruses 1 and 4, bovine herpesvirus 1, and monkey B virus were analyzed with the Tandem Repeats Finder program (2). A tandem repeat region was detected only in monkey B virus. It comprised 2.8 copies with a period size of 33 nt at nt 686 to 778, but it does not code for residues involved in a potential mucin tract (AAPPTPGAEGT). Thus, among the alphaherpesviruses analyzed, the polymorphic tandem repeat region of the gI gene appears to be unique to HSV-1.
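The peptides listed above can be reproduced by translating the 21-nt gI repeat blocks in the reading frame that starts at the third nucleotide of each block. This frame was inferred here so that the translations match the text; it was not taken from a sequence annotation. The sketch (with a hypothetical function name) also relies on the statement in the Results that the downstream flank begins with TCCCCGCTC. Each block is 21 nt long, a multiple of 3, so every block is read in the same frame. The seventh codon spans the boundary with whatever follows, which is why the final residue is T (ACC) before a first-type block and I (ATC) before a second-type block or the flank.

```python
# Standard genetic code; codons enumerated with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

BLOCK_1 = "CCTCCACCCCCTCGACCACCA"  # first block type (from the text)
BLOCK_2 = "TCCCCGCTCCCTCGACCACCA"  # second block type, group C (from the text)
FLANK_START = "TCCCCGCTC"          # first 9 nt of the downstream flank (from the text)


def translate_block(block, following, offset=2):
    """Translate one block from 0-based `offset` (assumed frame), borrowing the
    first bases of the following block or flank to complete the last codon."""
    seq = block + following[:offset]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


if __name__ == "__main__":
    print(translate_block(BLOCK_1, BLOCK_1))      # STPSTTT
    print(translate_block(BLOCK_1, BLOCK_2))      # STPSTTI
    print(translate_block(BLOCK_2, FLANK_START))  # PAPSTTI
```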
Encephalitis is a rare but devastating complication of HSV-1 infection. In the present study, we were unable to find associations between specific sequences in the gG, gI, and gE genes and anatomical sites of lesions, including the brain. Previously published studies on the sequence variability of the glycoprotein B and D genes did not reveal mutations specific for HSV-1 DNA sequences amplified from cerebrospinal fluid samples from encephalitis patients (40, 44). However, several reports have shown that clinical HSV-1 isolates differ in virulence when tested in animal models (3, 11, 37). Recently, Mao and Rosenthal (23) reported that the neuroinvasive properties of different HSV-1 strains derived from different organs of a neonate were associated with genetic alterations, including tandem repeats within the ICP34.5 gene. DNA sequencing of larger regions from the clinical isolates described here may help to reveal genetic alterations associated with HSV-1 pathogenesis.
In conclusion, the frequent homologous recombination and the division of the gG, gI, and gE gene sequences into three genogroups detected in the present study may be used for molecular epidemiology studies of different populations as well as for studies of single patient isolates and to investigate the role of homologous recombination in the evolution of the HSV-1 genome.
ACKNOWLEDGMENTS
This work was supported by grants from the Medical Society of Göteborg, the Swedish Medical Research Council (grant no. 11225), the LUA foundation at Sahlgren's Hospital, and the Swedish Society for Medical Research.
We thank Carolina Gustafsson, Anette Roth, Zohreh Sadegzadeh, Lisbeth Gustafsson, and Ann-Sofie Tylö for skillful technical assistance. We also thank Per Elias for valuable comments on the manuscript.
FOOTNOTES
- Received 3 March 2004.
- Accepted 20 May 2004.
- Copyright © 2004 American Society for Microbiology