Journal of Virology, April 2008, p. 3604-3611, Vol. 82, No. 7
0022-538X/08/$08.00+0 doi:10.1128/JVI.01197-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Institute of Digestive Disease and Department of Medicine and Therapeutics,1 Department of Biochemistry,2 Department of Computer Science and Engineering,3 Department of Clinical Oncology, The Chinese University of Hong Kong, Hong Kong,4 Victorian Infectious Diseases Research Laboratory, Melbourne, Australia5
Received 1 June 2007/ Accepted 12 December 2007
Recent studies have reported that the presence of basal core promoter mutants (A1762T and G1764A) is associated with more aggressive progression of liver disease and development of HCC (19, 30, 33). Several HBV genes, including truncated pre-S2/S and X genes, have been found in hepatoma tissue (15, 16, 23). Other hot spot mutations in the core promoter region are G1896A and G1899A (12, 30). HBV DNA integration into the host genome may allow persistence of the viral genome in the host and alteration of cell kinetics and cellular metabolism (3, 4, 29). Whether certain mutations of the HBV genes facilitate the integration of the viral genome and virus-host interaction is not known.
Two major reasons for discrepant results from various studies are (i) the small numbers of patients involved in these studies and (ii) the fact that most studies focus on a particular portion of the HBV genome (22). The aim of the present study was to identify markers in the HBV genome for HCC development by studying the complete genomic sequence of HBV among patients with HCC compared to age-matched individuals presenting with the infection but no HCC development.
(Part of this work has been presented at Digestive Diseases Week, 14 to 19 May 2005, Chicago, IL.)
DNA extraction, amplification, sequencing, and determination of genotype. HBV DNA was extracted from 100 µl of serum using the QIAamp DNA blood mini kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions. To obtain the full-length HBV DNA sequence, we performed seminested PCR to amplify three overlapping fragments of the HBV genome. For each fragment, 5 µl of the extracted DNA was used with Taq DNA polymerase (Amersham Biosciences, Uppsala, Sweden) and Pfu DNA polymerase (Promega, Madison, WI) in the first-round PCR and with Taq DNA polymerase alone in the second-round PCR. The final PCR product was examined on a 1.0% agarose-ethidium bromide gel run in 1x Tris-borate-EDTA buffer.
For fragment A, PCR was carried out with P1 and P2 primers with a 5-min initial denaturation at 95°C, followed by 10 cycles of amplification (94°C for 36 s, 60°C for 36 s, and 72°C for 2.5 min), then 30 cycles of amplification (94°C for 36 s, 50°C for 36 s, and 72°C for 2.5 min), and a 7-min final extension at 72°C. The sequences of all primers used for PCR and sequencing in this study are shown in Table 1. The PCR product was further amplified in a seminested PCR with P1 and P3. PCR was carried out with a 5-min initial denaturation at 95°C, followed by 10 cycles of amplification (94°C for 36 s, 60°C for 36 s, and 72°C for 2 min), then 30 cycles of amplification (94°C for 36 s, 52°C for 36 s, and 72°C for 2 min), and a 7-min final extension at 72°C. For fragment B, PCR was carried out with the P4 and P5 primers, and the PCR product was further amplified in a seminested PCR with the P5 and P6 primers. For fragment C, PCR was carried out with the P7 and P9 primers. The PCR product was further amplified in a seminested PCR with the P8 and P9 primers. Both strands of PCR products were directly sequenced with the DYEnamic ET Dye Terminator cycling sequencing kit for MegaBACE (Amersham Biosciences, Piscataway, NJ).
TABLE 1. Primers used for PCR amplification and sequencing of the HBV genome
Data mining framework. The data mining framework is shown in Fig. 1. The process involved seven modules. After the molecular evolutionary analyses, the data were passed to the clustering module to check whether clusters existed, based on the phylogenetic tree analysis. These clusters are possible genotypes or subgroups possessing differences in some nucleotides which do not have any effects on the classification of HCC. If clusters were found, each cluster was analyzed separately for potential genetic marker sites. While genotype B HBV appeared to be a homogenous group, the phylogenetic tree results showed that there exist two subgroups (clusters) in genotype C among the HBV strains collected (Fig. 2) (9). All three (sub)groups (B, Cs, and Ce) were analyzed separately in the learning and classification parts.
FIG. 1. Flow diagram for the data mining and classification process.
FIG. 2. Phylogenetic tree of the full-genome sequencing of HBV in the case-control study. All patients were infected with either genotype B or C HBV. Two subgenotypes (Ce and Cs) could be identified in genotype C HBV due to a more than 4% difference in the entire HBV sequence.
The selected features were extracted and passed to the classifier learning module, wherein a rule-based classifier was learned. Rule learning tries to learn rules from a set of training data (samples). It can be modeled as a search problem of finding the best rules that classify the training examples with minimum classification error. Generic genetic programming (31), which is a type of evolutionary algorithm (1), was adopted as our search and optimization algorithm to learn the rules. The testing data were then transferred to the preprocessing module with the marker sites selected by the feature selection module. The testing data were preprocessed, and only the part relevant to the selected sites was kept. This part of the testing data was then used for prediction evaluation in the classification module.
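The search formulation above can be illustrated with a minimal Python sketch. It is not the generic genetic programming used in the study: an exhaustive search over small OR-rules stands in for the evolutionary search, and the names `rule_predicts_hcc`, `fitness`, and `best_rule`, as well as the choice of the mean of sensitivity and specificity as a single score, are assumptions made for illustration only.

```python
from itertools import combinations

def rule_predicts_hcc(rule, sample):
    """A rule is a tuple of (site, nucleotide) terms joined by OR.
    A sample maps nucleotide position -> base at that position."""
    return any(sample.get(site) == base for site, base in rule)

def fitness(rule, samples, labels):
    """Return (sensitivity, specificity) of a rule on labelled samples.
    labels[i] is True for an HCC sample and False for a control."""
    tp = fn = tn = fp = 0
    for sample, is_hcc in zip(samples, labels):
        predicted = rule_predicts_hcc(rule, sample)
        if is_hcc and predicted:
            tp += 1
        elif is_hcc:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

def best_rule(candidate_terms, samples, labels, max_terms=2):
    """Exhaustive stand-in for the evolutionary search (assumption):
    try every OR-rule with up to max_terms terms and keep the one with
    the highest mean of sensitivity and specificity."""
    best, best_score = None, -1.0
    for k in range(1, max_terms + 1):
        for rule in combinations(candidate_terms, k):
            sens, spec = fitness(rule, samples, labels)
            score = (sens + spec) / 2
            if score > best_score:
                best, best_score = rule, score
    return best, best_score
```

An evolutionary algorithm explores the same rule space by mutating and recombining candidate rules instead of enumerating them, which matters once the number of candidate sites makes enumeration impractical.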
Prediction results were output from the classification module. They were then verified by the actual classes given in the testing samples. If the verification results were unsatisfactory, the process was repeated, starting from feature selection.
In the final validation module, when a reasonable classifier was obtained, the classifier could be further validated by testing with previously unanalyzed validation samples.
Statistical analysis. In the case-control study with 100 HCC cases and 100 non-HCC age-matched controls, 90% of the samples were selected randomly as the training set and the remaining 10% formed the testing set in each experiment. For each data set, the experiment was repeated 10 times by picking different training sets. For each learning and evaluation experiment, sensitivity and specificity as defined below were estimated as the fitness or performance indicators of the classification rules. The average sensitivity and specificity of the testing set in the case-control study and of the validation cohort were determined. The 95% confidence intervals (CIs) of the sensitivity and specificity as well as the likelihood ratios were determined based on the performance of the algorithms on the entire data set. The odds ratios (ORs) and 95% CIs for HCC among patients with different numbers of HCC-related mutations were also calculated. When any zero cell occurred in the two-by-two contingency table, we added 0.5, based on the Haldane correction (14), to all of the cells in the calculation of ORs and 95% CIs. The statistical significance was examined at the conventional level of 0.05 by analysis of variance, the chi-square test, or Fisher's exact test as appropriate.
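The quantities named above can be sketched in Python as follows. The sensitivity, specificity, and likelihood ratio formulas are the standard definitions. The 95% CI for the OR uses the log-based (Woolf) method, which is an assumption because the text does not state which interval was used; the function names are likewise illustrative.

```python
import math

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def likelihood_ratios(sens, spec):
    """Positive LR = sens / (1 - spec); negative LR = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a two-by-two table:
    a = cases with the mutation,    b = controls with the mutation,
    c = cases without the mutation, d = controls without the mutation.
    If any cell is zero, 0.5 is added to every cell (Haldane correction).
    The CI is computed on the log scale (Woolf method, an assumption)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

For instance, `likelihood_ratios(0.75, 0.66)` gives values that round to 2.21 and 0.38, which agree with the likelihood ratios reported for a sensitivity of 0.75 and a specificity of 0.66.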
TABLE 2. Demographic characteristics, clinical diagnoses, HBeAg/anti-HBe status, and genotypes of HBV for the HCC and control groups
Subgenotype prevalence in subjects. Genotype B HBV appeared to be a homogenous group, and all belonged to subgenotype Ba (26). However, the phylogenetic tree results showed that there existed two subgroups, namely, Ce (found predominantly in East Asia) and Cs (found predominantly in Southeast Asia), in genotype C among the HBV strains collected (Fig. 2). This is in concordance with our previous phylogeny with published full-length sequences in GenBank (9).
The clinical characteristics of patients with genotype B and subgenotypes of genotype C are shown in Table 2. No significant difference in age (P = 0.46), gender (P = 0.06), or presence of HCC (P = 0.11) was observed between patients with genotype B and those with the subgenotypes of genotype C. The proportions of cirrhosis in HCC patients with HBV genotype B, subgenotype Ce, and subgenotype Cs were 65%, 88%, and 62%, respectively. The risk of cirrhosis and HCC for subgenotype Ce was higher than for the others, but the difference was not statistically significant (P = 0.16). These percentages of cirrhosis were much higher than the proportions of cirrhosis in control patients with HBV genotype B (P < 0.001), subgenotype Ce (P < 0.001), or subgenotype Cs (P < 0.001).
HCC-related mutations. Among HCC patients with genotype B HBV, mutations at the following sites were commonly found: A1762T (81.1%), G1764A (81.1%), C1165T (18.9%), T2712C/A/G (70.3%), and A/T2525C (21.6%). The mutations at these nucleotide positions in the HCC and control groups are shown in Table 3. In the group with HBV subgenotype Ce, the mutations T31C (37.5%), T53C (37.5%), and A1499G (62.5%) were associated with HCC development (Table 3). In the group with HBV subgenotype Cs, the mutations G1613A (38.3%), G1899A (27.7%), T2170C/G (34.0%), and T2441C (21.3%) were associated with HCC development (Table 3). Combining the patients from the case-control study and the independent validation cohort, the presence of an increasing number of HCC-related mutations in each HBV genotype/subtype was associated with an increased risk of HCC (Table 4). All mutations associated with HCC development had amino acid changes in at least one of the four open reading frames of HBV (Table 5). Amino acid changes in the X region were found only in genotype B HBV. Envelope region amino acid changes were found in HBV subgenotype Ce, whereas precore/core region amino acid changes were found in HBV subgenotype Cs.
TABLE 3. Mutations in different genotypes associated with HCC development in the case-control study (100 patients with HCC and 100 control patients)
TABLE 4. ORs for HCC with different numbers of mutations in different HBV genotypes/subtypes
TABLE 5. Amino acid mutations in different genotypes associated with HCC development
The classification rules for genotype B were as follows:
IF A1762G1764 and T1165, then HCC
IF T1762A1764 and ACG2712, then HCC
IF T1762A1764 and T2712 and C2525, then HCC
ELSE, non-HCC.
Using this algorithm, the sensitivity (95% CI) and specificity (95% CI) of diagnosing HCC in the testing cohort were 0.75 (0.61 to 0.89) and 0.66 (0.53 to 0.79), respectively, and those in the validation cohort were 0.72 (0.51 to 0.93) and 0.73 (0.59 to 0.87), respectively. The positive and negative likelihood ratios (95% CIs) for the performance of the algorithm in the testing cohort were 2.21 (1.27 to 3.14) and 0.38 (0.15 to 0.60), respectively, and those in the validation cohort were 2.67 (1.12 to 4.21) and 0.38 (0.09 to 0.68), respectively.
The classification rules for the Ce cluster of genotype C were as follows:
IF C31 OR C53 OR G1499, then HCC
ELSE, non-HCC.
Using this algorithm, the sensitivity (95% CI) and specificity (95% CI) of diagnosing HCC in the testing cohort were 0.75 (0.54 to 0.96) and 0.70 (0.42 to 0.98), respectively, and those in the validation cohort were 1.00 (not available) and 0.75 (0.47 to 1.00), respectively. The positive and negative likelihood ratios (95% CIs) for the performance of the algorithm in the testing cohort were 2.50 (0.03 to 4.97) and 0.36 (0.02 to 0.69), respectively, and those in the validation cohort were 4.00 (0.00 to 8.53) and 0.00 (not available), respectively.
The classification rules for the Cs cluster of genotype C were as follows:
IF A1613 OR A1899 OR CG2170 OR C2441, then HCC
ELSE, control.
Using this algorithm, the sensitivity (95% CI) and specificity (95% CI) of diagnosing HCC in the testing cohort were 0.72 (0.59 to 0.85) and 0.72 (0.58 to 0.86), respectively, and those in the validation cohort were 0.88 (0.73 to 1.00) and 0.63 (0.48 to 0.78), respectively. The positive and negative likelihood ratios (95% CIs) for the performance of the algorithm in the testing cohort were 2.57 (1.20 to 3.94) and 0.39 (0.20 to 0.58), respectively, and those in the validation cohort were 2.38 (1.32 to 3.44) and 0.19 (0.00 to 0.44), respectively.
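The three rule sets can be written out as a short Python sketch. It assumes each sample is a dictionary mapping nucleotide position to the base observed there; that representation and the function names are illustrative and not part of the study. The rules are transcribed as printed, including the first genotype B rule (A1762/G1764 together with T1165). "ACG2712" is read as A, C, or G at position 2712, and "CG2170" as C or G at position 2170.

```python
def classify_genotype_b(nt):
    """Genotype B rules; nt maps nucleotide position -> base."""
    wild_bcp = nt.get(1762) == "A" and nt.get(1764) == "G"
    mutant_bcp = nt.get(1762) == "T" and nt.get(1764) == "A"
    rule1 = wild_bcp and nt.get(1165) == "T"
    rule2 = mutant_bcp and nt.get(2712) in {"A", "C", "G"}
    rule3 = mutant_bcp and nt.get(2712) == "T" and nt.get(2525) == "C"
    return "HCC" if (rule1 or rule2 or rule3) else "non-HCC"

def classify_ce(nt):
    """Subgenotype Ce rule: C31 OR C53 OR G1499."""
    hit = nt.get(31) == "C" or nt.get(53) == "C" or nt.get(1499) == "G"
    return "HCC" if hit else "non-HCC"

def classify_cs(nt):
    """Subgenotype Cs rule: A1613 OR A1899 OR C/G at 2170 OR C2441.
    The source labels the ELSE branch 'control'; 'non-HCC' is returned
    here so that all three functions share the same labels."""
    hit = (
        nt.get(1613) == "A"
        or nt.get(1899) == "A"
        or nt.get(2170) in {"C", "G"}
        or nt.get(2441) == "C"
    )
    return "HCC" if hit else "non-HCC"
```

A sample is routed to one of the three functions according to its genotype or subgenotype, which mirrors the separate analysis of the B, Ce, and Cs groups.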
The issue of HBV genotypes has been debated due to discrepant results in previous studies from different countries (18, 27). These differences may be explained by a distinct distribution of HBV subgenotypes in different geographical regions. In most Asian countries, only subgroup Ba of HBV is found, while the majority of Japanese patients with HBV have subgroup Bj (9). Genotype C HBV has a higher risk of HCC than genotype B HBV, which is probably related to a delayed HBeAg seroconversion, more active hepatitis, and a higher prevalence of basal core promoter mutations (5, 19, 32). Among genotype C HBV, there were also differences in the disease activity associated with different subgenotypes (6). Recently, we have shown that subgenotype Ce HBV was associated with the highest risk of HCC independent of other risk factors, including high HBV DNA levels and liver cirrhosis, among a longitudinal cohort of 1,006 chronic hepatitis B patients followed up for 7.7 years (10). The proportion of HCC in patients with subtype adw was found to be higher than that in patients with subtype adr (25). Going beyond attributing HCC to a specific genotype, this study suggests that different genotypes of HBV are associated with different mutations of the viral genome and thus may have separate mechanisms of hepatic carcinogenesis.
The basal core promoter mutant (T1762/A1764) has been found to parallel the progression of liver disease and to increase the risk of HCC for both genotype B and C HBV (19, 33). Consistent with previous studies, we found mutations at nucleotide positions 1762/1764 to be associated with HCC in genotype B HBV infection. The reason why 1762/1764 mutations were not identified as a marker for HCC in genotype C HBV was related to the high prevalence of mutations at these sites even among the non-HCC patients (8). However, this phenomenon may also mean that a selection pressure on the basal core promoter/X region of the HBV genome in genotype B HBV is associated with the development of HCC. The HCC-associated mutations selected by HBV subgenotype Ce are located in the envelope region, while those selected by HBV subgenotype Cs are located in the precore/core region. These findings offer additional support for the presence of various virologic mechanisms of hepatocarcinogenesis by different HBV genotypes/subgenotypes. The functions of these mutations and their gene products need further investigation.
HBV DNA appears to integrate into host DNA at different sites, exerting direct and indirect effects on the host genes (7). It has also been postulated that the integrated HBV genes can activate cellular genes remote from the site of HBV DNA integration, thereby influencing cellular proliferation and differentiation. This transactivation effect could be mediated through different signal transduction pathways. Identification of HCC-related mutations is only the first step in understanding the viral mechanism of hepatic carcinogenesis. Functional genomic studies of these mutations would have to be carried out in the future to elucidate the effects of these mutations on cell growth and death of hepatocytes.
There are several limitations in this study. First, although patients in the control group were age matched with those in the HCC group, the possibility of developing malignancy in the future cannot be denied. As there is no matching in the disease severity and liver cirrhosis, the HCC-related mutations may have an indirect effect on HCC development through increasing hepatic inflammation and liver cirrhosis. When the algorithms were tested with the independent validation cohort, a very high sensitivity and a satisfactory specificity were reported for both genotype B and C subgenotypes. Second, although this is by far the largest cohort of HCC and non-HCC cases to have full-length viral genomic analysis of HBV compared to previous studies (17, 22), the sample size is still relatively small. The 95% CIs for the sensitivity and specificity of the genomic algorithms are still wide. In the future, laboratory methods to detect these mutants in a more robust manner than does full-genome sequencing are needed to facilitate a larger-scale validation study. A larger cohort, preferably from a different geographic location, would also be needed to validate the generalization of our results. Third, we can only study patients with genotype B and subgenotypes Ce and Cs of HBV. We cannot study genotype A HBV and genotype D HBV, which are prevalent in Europe and Africa, because of our geographic limitations. Moreover, as most Hong Kong residents are immigrants from China, we did not have the information on the place where the ancestors of the patients acquired the infection. We believe that most of our patients originated from southern China, where HBV subgenotype Cs is more prevalent than subgenotype Ce. However, the methodology adopted in this study could be used in countries with other HBV genotypes for mining of HBV-related mutations. Finally, we have not worked out the functionality of these mutated codons and why they might lead to development of HCC. 
More work is required to elucidate the virologic and host responses to mutations. We cannot draw a conclusion on the causal relationship between these HBV mutations and HCC.
In conclusion, this study suggests that HBV genotypes B and C demonstrate different point mutations which might be associated with high risk of hepatic carcinogenesis. The difference in the locations of these mutations in the HBV genome may reflect the underlying mechanisms of hepatocarcinogenesis of the different HBV genotype/subgenotypes. The detection of these mutations has shown promising results in the association with a higher cancer risk. By combining this information with other clinical risk factors for HCC, including HBV DNA levels and liver cirrhosis status (10, 11), future clinical algorithms can be refined. It is possible that these diagnostic algorithms may shed light on which patients with chronic HBV infection require more frequent screening and surveillance for HCC development.
APPENDIX
The information gain of a feature (attribute) is the reduction in uncertainty (entropy) that results if the attribute is used for classification. Hence, the higher the information gain, the better. The following equation gives the entropy, E, of an attribute X with n values, X1 ... Xn, where P(Xj) is the frequency of the value Xj:

$$E(X) = -\sum_{j=1}^{n} P(X_j)\,\log_2 P(X_j)$$
For a typical DNA classification problem, we assumed that the data had M classes, C1 ... CM. Each aligned site position has N possible nucleotides, V1 ... VN. We defined Cm as the number of sequences in class Cm, and Cmi as the number of sequences in class Cm whose character at the aligned site is Vi, which could be A, T, G, or C in our case. The remainder of X, R(X), was defined as follows:
(The equation appeared as an image in the original and is not reproduced in this copy.)
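Assuming the standard remainder used in decision-tree information gain (an assumption; the original formula cannot be confirmed from this text), R(X) would take the following form in the symbols defined above, where each nucleotide value Vi is weighted by the fraction of sequences carrying it and contributes the class entropy within that subset:

```latex
R(X) = \sum_{i=1}^{N} \frac{\sum_{m=1}^{M} C_{mi}}{\sum_{m=1}^{M} C_{m}}
       \left( -\sum_{m=1}^{M}
       \frac{C_{mi}}{\sum_{k=1}^{M} C_{ki}}
       \log_2 \frac{C_{mi}}{\sum_{k=1}^{M} C_{ki}} \right)
```

Terms with C_{mi} = 0 are taken to contribute zero, following the usual convention that 0 log 0 = 0.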
The information gain, IGj, of the aligned site j is the difference between the original information content E(C) of the data set and the amount of information needed to classify all the unclassified data left in the data set after applying site j for classification: IGj = E(C) – R(j).
The features were ranked by the information gains, and then the top-ranked features were chosen for classification. A site with higher information gain would contribute more discriminatory power to the classification such that more samples could be distinguished by this site.
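The ranking step can be sketched in Python as follows. The entropy and the relation IGj = E(C) – R(j) follow the text; the remainder is computed with the standard weighted class entropy, which is an assumption because the original formula is not available here. The function names and the representation of the alignment as a list of equal-length strings are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(column, labels):
    """IG of one aligned site: E(C) minus the remainder R(j).
    column[i] is the nucleotide of sequence i at this site."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        # Weight the class entropy of each nucleotide subset by its size.
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def rank_sites(alignment, labels):
    """Return (gain, site_index) pairs sorted by decreasing gain.
    Site indices are 0-based offsets into the alignment strings."""
    n_sites = len(alignment[0])
    gains = [
        (information_gain([seq[j] for seq in alignment], labels), j)
        for j in range(n_sites)
    ]
    return sorted(gains, key=lambda pair: (-pair[0], pair[1]))
```

A site at which every sequence carries the same nucleotide has a gain of zero, whereas a site whose nucleotide separates the two classes perfectly has a gain equal to the entropy of the class labels.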
We declare that we have no conflict of interest.
Published ahead of print on 23 January 2008.