Evaluation of Accumulation of Hepatitis C Virus Mutations in a Chronically Infected Chimpanzee: Comparison of the Core, E1, HVR1, and NS5b Regions

ABSTRACT Four hepatitis C virus genome regions (the core, E1, HVR1, and NS5b) were amplified and sequenced from yearly samples obtained from a chronically infected chimpanzee over a 12-year span. Nucleotide substitutions were found to accumulate in the core, E1, and HVR1 regions during the course of chronic infection; substitutions within the NS5b region were not detected for the first 8 years and were found to be minimal during the last 4 years. The rate of accumulation of mutations in the core and E1 regions, based on a direct comparison between the first (1979) sequence and the last (1990) sequence, was 1.120 × 10−3 bases per site per year, while phylogenetic ancestral comparison using the 12 yearly sequences showed a rate of 0.816 × 10−3 bases per site per year. Temporal evaluation of the sequences revealed that there appeared to be periods in which substitutions accumulated and became fixed, followed by periods with relative stasis or random substitutions that did not persist. Synonymous and nonsynonymous substitutions within the core, E1, and HVR1 regions were also analyzed. In the core and E1 regions, synonymous substitutions predominated and gradually increased over time. However, within the HVR1 region, nonsynonymous substitutions predominated but gradually decreased over time.

Hepatitis C virus (HCV), a member of the flavivirus family, is the major causative agent of non-A, non-B hepatitis (3,4). Genetic diversity of HCV can be categorized into two or three levels. Phylogenetic analysis of HCV sequences has identified six genotypes, or clades, some of which have discrete subtypes (23,25). This level of diversity is the result of mutations during evolution of the virus and generally reflects a variety of geographic regions. In addition, as has been found for most RNA viruses, HCV within an individual is composed of a group of closely related yet heterogeneous sequences (quasispecies) (16). The sequences of the quasispecies cluster around a dominant sequence (master sequence) that is the most abundant sequence in the population. This master sequence may or may not reflect the consensus sequence, which has been defined as the most frequently appearing nucleotide at a given position when a number of genomes in a quasispecies are sequenced (12) and is the sequence identified by direct sequencing of PCR products.
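The distinction between the master sequence and the consensus sequence can be made concrete with a minimal sketch. The function names and the five three-base clones below are hypothetical, chosen only so that the two definitions give different answers; they are not data from this study.

```python
from collections import Counter


def consensus_sequence(clones):
    """Return the most frequent nucleotide at each aligned position."""
    if len({len(clone) for clone in clones}) != 1:
        raise ValueError("clones must be non-empty and aligned to one length")
    # zip(*clones) yields one column of nucleotides per alignment position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*clones))


def master_sequence(clones):
    """Return the most abundant complete sequence in the population."""
    if not clones:
        raise ValueError("at least one clone is required")
    return Counter(clones).most_common(1)[0][0]


# Hypothetical toy quasispecies: five aligned clones of three bases each.
toy_clones = ["ACG", "ACG", "TCA", "TGG", "TCG"]
toy_consensus = consensus_sequence(toy_clones)
toy_master = master_sequence(toy_clones)
print(toy_consensus, toy_master)
```

In this toy population the most abundant clone differs from the position-by-position majority, which is the situation the paragraph above describes. Ties are broken by first occurrence in this sketch, which is an arbitrary choice.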
A number of studies have addressed the mutation rate of HCV and have estimated its range to be 0.4 × 10−3 to 1.92 × 10−3 bases per site per year (1,18,20). These studies were based on paired HCV sequences that provided only two points for estimating this rate. The values were obtained by calculating the number of nucleotide differences (direct comparison) between cloned fragments that might not reflect the consensus sequence. In addition, using direct comparison, the earlier sequence is assumed to be the direct ancestor of the later sequence. Since HCV is composed of quasispecies, the two sequences may result from two coevolving variants derived from a common ancestor. If this is the case, the value obtained by direct comparison, using only two time points, would overestimate this rate (14).
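Why a shared earlier ancestor inflates the direct-comparison estimate can be illustrated with a deliberately simple model. The sketch below assumes a constant substitution rate on both lineages and no repeated substitutions at the same site; these assumptions, the function name, and the example values are ours and are not taken from reference 14.

```python
def apparent_rate(true_rate, years_between_samples, years_since_split):
    """Rate inferred by direct comparison of two sampled sequences.

    years_since_split is how long before the first sample the two lineages
    diverged from their common ancestor (0 means the earlier sequence is the
    direct ancestor of the later one).
    """
    if years_between_samples <= 0:
        raise ValueError("years_between_samples must be positive")
    if years_since_split < 0:
        raise ValueError("years_since_split cannot be negative")
    # The earlier lineage evolved for years_since_split; the later lineage
    # evolved for years_since_split + years_between_samples.
    total_branch_years = 2 * years_since_split + years_between_samples
    return true_rate * total_branch_years / years_between_samples


# Hypothetical values: true rate 1.0e-3, samples 10 years apart.
direct_ancestor = apparent_rate(1.0e-3, 10, 0)
split_two_years_earlier = apparent_rate(1.0e-3, 10, 2)
print(direct_ancestor, split_two_years_earlier)
```

Under these assumptions the apparent rate equals the true rate only when the split coincides with the first sample, and it grows with the time elapsed between the split and the first sample.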
We used 12 yearly samples from a chimpanzee chronically infected with HCV to evaluate the rate of accumulation of mutations in four genome regions (the core, E1, HVR1, and NS5b) based on the consensus sequence determined by direct sequencing of PCR products. Two different methods were used for analysis of this information: (i) enumeration of nucleotide differences (direct comparison) and (ii) phylogenetic analysis (ancestral comparison).
A batch of human antihemophiliac factor, implicated in the transmission of non-A, non-B hepatitis (7), was intravenously inoculated into a chimpanzee and resulted in chronic hepatitis for 13 years (8). A plasma pool from this chimpanzee was used for characterization of the prototype HCV strain, HCV-1 (11). We selected archived serum samples from this chimpanzee from approximately the same time (February) each year from 1979 to 1990 (n = 12). In addition, five serum samples from 1982 were chosen to further investigate the accumulation of mutations.
RNA was extracted from 100 μl of serum using Tripure (Boehringer Mannheim, Indianapolis, Ind.), and reverse transcription was performed using an antisense primer located at nucleotides (nt) 5379 to 5399 (M62321) and Superscript II reverse transcriptase (GIBCO/BRL, Rockville, Md.). The resulting cDNA was used to amplify a 4,657-bp fragment with primers HCV1 (nt 18 to 41, M62321) and HCV9 (nt 4653 to 4675, M62321). Nested PCR amplification was used to generate a 1,333-bp fragment using primers C1 (nt 276 to 298, M62321) and E2 (nt 1587 to 1608, M62321). The 1,333-bp fragment covers the complete core, E1, and HVR1 regions. Amplification of the NS5b fragment required a single round of PCR amplification and a pair of degenerate primers, ENO2 and ENO4, adapted from Enomoto et al. (13); the sequences were 5′-TGGGSTTYKCSTATGAYACCCGMTGYTTTGA-3′ (nt 8245 to 8275, M62321) for ENO2 and 5′-GGCKGARTACCTRGTCATAGCCTCCGTGAA-3′ (nt 8616 to 8645, M62321) for ENO4. All amplifications were performed with the Advantage cDNA PCR kit (Clontech, Palo Alto, Calif.). The amplicons were gel purified and directly sequenced in both directions using dRhodamine terminators and electrophoresed on an ABI 377 sequencer (PE Applied Biosystems, Foster City, Calif.). All sequence information was analyzed using algorithms supplied by the Genetics Computer Group package.
[Table 1: table body not recovered; year columns 1980 through 1990.] Footnotes: a, Nucleotide positions in our sequences (accession numbers AF268569 to AF268580), counting from the initial polyprotein ATG. b, Nucleotides present in the 1979 sequence at the designated locations. c, Due to direct sequencing of PCR products, six degenerate nucleotides were generated and labeled with italic letters. Boxed nucleotides indicate substitutions that accumulated and became fixed.
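The primer coordinates and degenerate bases given in the methods can be checked with a short sketch. It uses the standard IUPAC nucleotide ambiguity codes; the helper names are hypothetical, and the primer strings are the ENO2 and ENO4 sequences as printed in the methods, with whitespace from the extracted text removed.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct unambiguous sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def span_length(first_nt, last_nt):
    """Length of a region given inclusive 1-based coordinates."""
    return last_nt - first_nt + 1


ENO2 = "TGGGSTTYKCSTATGAYACCCGMTGYTTTGA"
ENO4 = "GGCKGARTACCTRGTCATAGCCTCCGTGAA"

print(degeneracy(ENO2), degeneracy(ENO4))
print(len(ENO2) == span_length(8245, 8275), len(ENO4) == span_length(8616, 8645))
# Nested fragment from the first base of primer C1 to the last base of E2.
print(span_length(276, 1608))
```

The same inclusive-coordinate arithmetic applies to any primer pair given as nucleotide positions on a reference sequence.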
A comparison of each yearly sequence with the 1979 sequence revealed increasing numbers of substitutions within the core, E1, HVR1, and NS5b regions. Within the core and E1 regions, there was a gradual increase in substitutions (shown in Table 1), while the NS5b region had only one nucleotide change, which occurred late in the infection, and the overall number of substitutions within the HVR1 region was 10 times higher. HVR changes may result from the accumulation of random substitutions, changes in the predominant quasispecies population, and immune pressure, while the NS5b region may have functional constraints that limit base changes (26). Therefore, the HVR1 and NS5b regions were excluded from further analysis when the rate of accumulation of mutations was calculated. The enumeration of nucleotide substitutions within the complete core and E1 regions using the 1979 and 1990 sequences gave a mutation rate of 1.12 × 10−3 bases per site per year. This direct comparison method is the same as was used in three previous publications to estimate the rate of accumulation of HCV mutations (1,18,20).
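The direct comparison calculation amounts to counting mismatches between two aligned sequences and dividing by the number of sites and the elapsed time. A minimal sketch follows; the function names and the 20-base toy sequences are hypothetical, and the elapsed time is left as a parameter because the exact interval used for the 1979 and 1990 sequences is not restated here.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def direct_comparison_rate(seq_early, seq_late, years):
    """Substitutions per site per year between an early and a late sequence."""
    if years <= 0 or not seq_early:
        raise ValueError("need a positive interval and a non-empty sequence")
    return count_differences(seq_early, seq_late) / (len(seq_early) * years)


# Hypothetical toy data: 20 sites, 2 differences, 10 years apart.
toy_early = "ATGAGCACGAATCCTAAACC"
toy_late = "ATGAGCACAAATCCTAAGCC"
toy_rate = direct_comparison_rate(toy_early, toy_late, 10)
print(toy_rate)
```

This treats the earlier sequence as the direct ancestor of the later one, which is the assumption the direct comparison method makes.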
Since the 1979 and 1990 sequences could be derived from two variants descended from a common ancestor generated prior to 1979, we performed a phylogenetic analysis (ancestral comparison) with all 12 sequences, other selected genotype 1 sequences, and prototype sequences representing the other genotypes to determine the rate of accumulation of mutations. A phylogenetic tree (Fig. 1A) was constructed using the 1,149-bp sequence covering the complete core and E1 regions. The 12 yearly sequences clustered together and the branch lengths generally increased with the length of infection. The branch length of each yearly isolate relative to the common ancestor of two different branches of subtype HCV-1a was determined (Fig. 1A). These values were plotted against the year the sample was collected, and the resulting relationship was evaluated by regression analysis. Figure 1B shows the results of this analysis for the 12 yearly HCV-1 and paired HCV-H sequences (18). The slope (0.816) reflects the unit increase in substitutions over time and represents the rate of accumulation of mutations, 0.816 × 10−3 bases per site per year. Ten of the twelve solid squares were well within the 95% confidence limits (Fig. 1B). The paired HCV-H sequences were within or close to the confidence limits defined by our study. The graph in Fig. 1B could be interpreted to indicate that the increase in nucleotide substitutions in this chimpanzee followed a linear pattern; however, inspection of the data points suggests that the relationship was not strictly linear. In addition, examination of the sequences from 1979 to 1982 reveals a pattern that is distinct from that of the sequences determined starting in 1983 (Table 1), and additional sequence shifts appear to have occurred between 1985 and 1986 and between 1987 and 1988 (Table 1). These nucleotide changes represent eight synonymous changes and three nonsynonymous changes.
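The regression step can be sketched as an ordinary least-squares fit of branch length against sampling year, with the slope read as the rate. The code below is a minimal illustration only: the function name and the four toy data points are hypothetical, and it does not reproduce the tree construction or the confidence limits shown in Fig. 1B.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: branch lengths in substitutions per site.
toy_years = [1979, 1980, 1981, 1982]
toy_branch_lengths = [0.010, 0.011, 0.012, 0.013]
toy_slope, toy_intercept = least_squares_line(toy_years, toy_branch_lengths)
print(toy_slope)
```

With branch lengths in substitutions per site and time in years, the slope has units of substitutions per site per year, so it can be compared directly with the direct-comparison estimate.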
The data shown in Table 1 are consistent with a punctuated substitution pattern that reflects the selective fixation of random substitutions or that may reflect the shift from one quasispecies variant population to another.
We investigated this further by selecting five additional samples from the 1982 to 1983 time frame for amplification and sequencing and found that multiple sequence shifts occurred during this year. C1053 appeared in March 1982 and was followed by the appearance of T622 in April 1982 (Table 2). This pattern, pattern A, was maintained until December 1982, when three nucleotide changes were detected (C492, A561, and A908), resulting in sequence pattern B. Between December 1982 and February 1983, two additional nucleotide changes (T702 and C929) accumulated, resulting in a sequence pattern (C) that persisted until 1985. A third sequence shift occurred during the year between the 1985 and 1986 samples; four additional nucleotide changes accumulated, resulting in sequence pattern D, which persisted for the remainder of the infection.
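One way to separate substitutions that became fixed from transient ones in a series of dated consensus sequences is sketched below. The function name, the sample labels, and the five-base toy series are hypothetical; "fixed" is taken here to mean that the base differs from the first sample and persists through the last sample, which is our working definition for illustration.

```python
def fixed_substitutions(samples):
    """Find substitutions that persist through the final sample.

    samples is a chronologically ordered list of (label, sequence) pairs.
    Returns {position (1-based): (new_base, label of first sample in the
    final unbroken run carrying that base)}.
    """
    labels = [label for label, _ in samples]
    seqs = [seq for _, seq in samples]
    if len(seqs) < 2 or len({len(seq) for seq in seqs}) != 1:
        raise ValueError("need two or more aligned sequences of equal length")
    first, last = seqs[0], seqs[-1]
    fixed = {}
    for pos in range(len(first)):
        if last[pos] == first[pos]:
            continue  # unchanged overall, or a transient change that reverted
        # Walk backwards to the start of the unbroken run of the final base.
        idx = len(seqs) - 1
        while idx > 0 and seqs[idx - 1][pos] == last[pos]:
            idx -= 1
        fixed[pos + 1] = (last[pos], labels[idx])
    return fixed


# Hypothetical toy series: position 2 changes transiently, 1 and 3 become fixed.
toy_series = [("t0", "AAAAA"), ("t1", "ATCAA"), ("t2", "AACAA"), ("t3", "GACAA")]
toy_fixed = fixed_substitutions(toy_series)
print(toy_fixed)
```

Grouping the returned labels would show whether fixed changes cluster at particular sampling times, which is the kind of stepwise pattern described above.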
We also evaluated the core, E1, and HVR1 regions for synonymous and nonsynonymous substitutions. Phylogenetic trees were generated for the core to E1, core only, E1 only, and HVR1 regions, based on only synonymous or nonsynonymous substitutions, and regression analysis was performed using the resulting branch lengths. As shown in Fig. 2, synonymous substitutions predominated and increased over time in the core and E1 regions (Fig. 2A through C). The rates of accumulation of synonymous mutations for the core to E1, core only, and E1 only region were 2.0 × 10−3, 2.0 × 10−3, and 1.6 × 10−3, respectively. In contrast to the substitutions found within the core and E1 regions, nonsynonymous substitutions predominated in the HVR1 region (Fig. 2D). Furthermore, it appeared by this analysis that nonsynonymous substitutions gradually decreased. An inspection of the HVR1 amino acid substitutions during the course of this infection revealed limited amino acid differences compared to those seen in H77 and H90 (data not shown). All of the differences were conservative and would not drastically alter the overall shape or properties of this region. Previous studies (17) have noted that changes within different HVR1 regions are not random, with homologous amino acid replacements being found within each genetic clade or genotype in human infections. Published data from other HCV-infected chimpanzees showed no or limited amino acid changes within the HVR1 region during infections lasting 1.9 and 8.3 years (6). As only 15 to 20% of chimpanzees generate detectable antibodies to E1 and E2 (6), the HVR1 region may be under less immune pressure in chimpanzees than in humans.
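The distinction between synonymous and nonsynonymous substitutions can be illustrated by comparing two in-frame coding sequences codon by codon under the standard genetic code. This sketch is a simple codon-level count, not the tree-based estimate used for Fig. 2; the function name and the four-codon toy sequences are hypothetical, and only upper-case unambiguous bases are handled.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_changes(reference, query):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(reference) != len(query) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    synonymous = nonsynonymous = 0
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        query_codon = query[start:start + 3]
        if ref_codon == query_codon:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[query_codon]:
            synonymous += 1  # same amino acid encoded
        else:
            nonsynonymous += 1  # amino acid (or stop) changed
    return synonymous, nonsynonymous


# Hypothetical toy sequences: Met-Leu-Asp-Lys versus Met-Leu-Glu-Lys.
toy_reference = "ATGCTTGATAAA"
toy_query = "ATGCTCGAAAAA"
toy_counts = classify_codon_changes(toy_reference, toy_query)
print(toy_counts)
```

A codon with more than one changed base is counted once here; estimators that normalize by the number of synonymous and nonsynonymous sites handle such cases differently.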
We have evaluated the nucleotide sequences of the core, E1, HVR1, and NS5b regions in a chronically infected chimpanzee over a period of 12 years. This is the first study of serial samples during chronic infection of a chimpanzee, and the sequence data have been used to estimate the rate of accumulation of mutations using the core and E1 genome regions and to evaluate the molecular patterns of changes in this HCV infection. Direct comparison of two paired sequences shows a higher rate than that found with the phylogenetic approach, a characteristic that has been noted by other investigators (14). We found this to be true not only of the data that we have generated but also of previously published paired sequences (data not shown) (1,18,20). Infection of chimpanzees was used to define non-A, non-B hepatitis prior to the identification of the HCV agent and has served as an experimental model for HCV infection. However, recent data indicate that this model may have some limitations when applied to human infections. Chimpanzees appear to resolve chronic infections with a higher frequency than do humans (5), and the genetic diversity within an infected chimpanzee is much lower than that observed within human populations (5,22). This indicates that the virus-host interaction in an HCV infection has an impact on the overall disease process and resolution. Our data are based on sequence information from a chronically infected chimpanzee, and comparable data from human chronic infections are needed to clearly define the process of mutations in the context of human disease.
Nucleotide sequence numbers. The sequences determined in this study have been given GenBank accession numbers AF268569 through AF268592.