CME JOURNAL OF MAGNETIC RESONANCE IMAGING 00:00–00 (2013)

Original Research

Texture-Based Classification of Liver Fibrosis Using MRI

Michael J. House, PhD,1* Sander J. Bangma, BSc,2 Mervyn Thomas, PhD,3 Eng K. Gan, MD,4,5 Oyekoya T. Ayonrinde, MD,4,5,6 Leon A. Adams, PhD,4,7 John K. Olynyk, PhD,4,5,6,8 and Tim G. St. Pierre, PhD1

Key Words: MRI; liver fibrosis; texture analysis; classification
J. Magn. Reson. Imaging 2013;00:000–000.
© 2013 Wiley Periodicals, Inc.

Purpose: To investigate the ability of texture analysis of MRI images to stage liver fibrosis. Current noninvasive approaches for detecting liver fibrosis have limitations and cannot yet routinely replace biopsy for diagnosing significant fibrosis.

Materials and Methods: Forty-nine patients with a range of liver diseases and biopsy-confirmed fibrosis were enrolled in the study. For texture analysis, all patients were scanned with a T2-weighted, high-resolution, spin-echo sequence, and Haralick texture features were applied. The area under the receiver operating characteristics curve (AUROC) was used to assess the diagnostic performance of the texture analysis.

LIVER FIBROSIS OCCURS in response to chronic liver injury and is characterized by accumulation of extracellular matrix and collagen (1). Advanced stage liver fibrosis is the single most important predictor of morbidity and mortality in people with chronic liver disease, especially that resulting from chronic viral hepatitis (2–4). Staging of liver fibrosis in such patients can be used to guide decision-making regarding the requirement for specific antiviral therapy, especially in older individuals (5,6). Liver biopsy is currently the reference standard in assessing liver pathology and the stage of fibrosis, but it is an invasive procedure which carries with it risks of complications (7,8). The diagnostic accuracy of liver histopathology also depends on factors such as specimen size (9) and heterogeneity of fibrosis (10–13). Current noninvasive approaches for detecting liver fibrosis include clinical symptoms and signs, routine laboratory tests, serum markers of fibrosis and inflammation, quantitative assays of liver function, elastography, and radiologic imaging studies. However, all of these approaches have limitations for staging fibrosis and a recent multicenter study of over 1000 patients (14) concluded that noninvasive tests (FibroScan, biomarkers) cannot yet routinely replace biopsy for diagnosing significant fibrosis (METAVIR ≥ 2). Hence, there is a need to develop accurate and reliable noninvasive techniques to assess the severity of liver fibrosis.

Texture analysis characterizes the spatial variation of gray levels throughout an image, not discernible to the human eye, using a series of mathematical equations to generate a range of parameters associated with the image texture. Although originally developed as a way to quantify texture in geophysical satellite imagery (15), it has since been applied to many fields of imaging research. Several studies have used texture-based approaches with MRI to assess liver fibrosis (16–18).
However, these studies were limited by the absence of

Results: The best mean AUROC achieved for separating mild from severe fibrosis was 0.81. The inclusion of age, liver fat and liver R2 variables into the generalized linear model improved AUROC values for all comparisons, with the F0 versus F1–4 comparison the highest (0.91).

Conclusion: Our results suggest that a combination of MRI measures, that include selected texture features from T2-weighted images, may be a useful tool for excluding fibrosis in patients with liver disease. However, texture analysis of MRI performs only modestly when applied to the classification of patients in the mild and intermediate fibrosis stages.

1 School of Physics, The University of Western Australia, Crawley, Western Australia, Australia.
2 Resonance Health Ltd, Claremont, Western Australia, Australia.
3 Emphron Informatics, Toowong, Queensland, Australia.
4 School of Medicine and Pharmacology, The University of Western Australia, Crawley, Western Australia, Australia.
5 Department of Gastroenterology, Fremantle Hospital, Fremantle, Western Australia, Australia.
6 Curtin Health Innovation Research Institute, Curtin University of Technology, Bentley, Australia.
7 Liver Transplant Unit, Sir Charles Gairdner Hospital, Nedlands, Western Australia, Australia.
8 Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia.
Contract grant sponsor: National Health & Medical Research Council of Australia Practitioner Fellowship; Contract grant number: 1042370.
*Address reprint requests to M.J.H., School of Physics, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia. E-mail: [email protected]
Received August 19, 2013; Accepted November 15, 2013.
DOI 10.1002/jmri.24536
View this article online at wileyonlinelibrary.com.
© 2013 Wiley Periodicals, Inc.



House et al.

Table 1 Distribution and Characteristics of Patients

METAVIR score   No. of subjects   Mean age (years) ± SD   No. of men (%)   Mean biopsy-MRI interval (days) ± SD
0               14                52.9 ± 13.5             7 (50)           135.2 ± 98.4
1               13                54.6 ± 12.7             10 (77)          81.9 ± 76.9
2               5                 50.2 ± 7.0              0 (0)            51.4 ± 26.0
3               9                 43.9 ± 19.9             3 (33)           68.3 ± 76.6
4               8                 51.8 ± 14.0             6 (75)           506.7 ± 626.9

histological diagnostic techniques to verify liver fibrosis staging (16,17) or they did not make full use of all the Haralick texture measures (17,18). Based on the previous studies, we hypothesized that a fast T2-weighted MRI sequence with variable T2 contrast and high spatial resolution combined with an expanded feature set of Haralick texture measures could be capable of staging fibrosis noninvasively. Hence, the aim of this study was to apply texture analysis to our MRI data acquisition and image processing strategy and investigate the ability of texture analysis approaches to stage liver fibrosis in a prospective cohort of patients with biopsy-confirmed liver disease and fibrosis assessment.

lactation, malignancy (excluding basal cell or squamous cell skin cancers). A 63-year-old male with a METAVIR score of F3 was excluded because the correct MRI sequence was not acquired, leaving 49 patients in the final cohort (Table 1). The main causes of liver disease were nonalcoholic steatohepatitis (n = 12), hepatitis C (n = 11), nonalcoholic fatty liver disease (NAFLD) (n = 7), hepatitis B (n = 4), primary sclerosing cholangitis (n = 4), and autoimmune hepatitis (n = 3). Three patients were classified as normal based on their biopsy.

Liver Histology and Quantification of Liver Fibrosis

The patients underwent percutaneous liver biopsy with ultrasound guidance as part of their routine clinical management. Each liver biopsy specimen was prepared for histological examination before being retrospectively reviewed by a hepatologist with 22 years' experience in assessing liver fibrosis and who was blinded to the patient’s identity and MRI data. The fibrosis stage was evaluated according to the METAVIR score (19) for standardization purposes as follows: F0, no fibrosis; F1, portal fibrosis without septa; F2, a few septa; F3, numerous septa without cirrhosis; F4, cirrhosis.

MRI Acquisition

MATERIALS AND METHODS

Subjects

Fifty consecutive patients were prospectively enrolled for this study between March 2009 and March 2010 following institutional ethics approval and informed written consent. The patients were recruited from the hepatology outpatient clinics at two hospitals. The inclusion criteria were: between 18 and 65 years of age, a liver biopsy obtained within 12 months before MRI for patients without cirrhosis (some patients with cirrhosis were enrolled with an MRI-biopsy interval greater than 12 months on the grounds that their fibrosis stage was very unlikely to regress), and current alcohol consumption less than 20 grams per day for men and less than 10 grams per day for women. The exclusion criteria were: contraindications for MRI, ischemic heart disease as determined by history or abnormal electrocardiogram, pregnancy or

All MRI measurements were made on Siemens 1.5 Tesla (T) Avanto scanners (Siemens Medical Systems, Erlangen, Germany). Phased-array body coils were centered over the liver of the subjects. We investigated a range of imaging parameters including slice thickness, matrix size, and signal averaging. The final acquisition parameters were chosen to maximize the spatial resolution and minimize the scan time to enable all participants to perform a breathhold. Although the signal to noise ratio was not optimal, our hypothesis was that good spatial resolution would be more important for detecting the earlier stages of fibrosis using texture analysis. For texture analysis, three axial slices, positioned through the widest part of the liver, were acquired in a single-breathhold turbo spin-echo sequence (Fig. 1) with the following parameters (TR 800 ms, TEs 44, 65, 80, 94, 109, 123, 138 ms, slice thickness 4 mm, matrix 512 × 512, field of

Figure 1. Summary of texture analysis process. Axial images of a control subject (TR 800 ms, TE 44 ms, and TE 109 ms) showing a typical ROI (solid pink line) covering the whole liver. Gray-tone spatial dependence matrices are calculated for each image and then processed using a Haralick texture feature formula (e.g., angular second moment ASM).

Texture Analysis of Liver Fibrosis


Table 2 Haralick Texture Features Used in This Study

No.   Haralick texture feature                 Abbreviation
1     Angular Second Moment                    ASM
2     Contrast                                 CON
3     Correlation                              COR
4     Sum of Squares: Variance                 SSQV
5     Inverse Difference Moment                IDM
6     Sum Average                              SAVE
7     Sum Variance                             SVAR
8     Sum Entropy                              SENT
9     Entropy                                  ENT
10    Difference Variance                      DVAR
10b   Difference Average                       DAVE
11    Difference Entropy                       DENT
12    Information Measures of Correlation 1    IMC1
13    Information Measures of Correlation 2    IMC2

view 300 × 300 mm, 1 excitation, echo train length 29, flip angle 150°, bandwidth 255). The breathhold sequence was repeated for each echo time, generating 21 images (7 echo times by 3 slices) for each subject. Transverse relaxation rate (R2) measurements and a quantitative parameter alpha (α), related to the concentration of liver fat, were also made for each participant. To measure alpha, a standard in-phase, out-of-phase gradient echo sequence (repetition time [TR] 88 ms, echo times [TEs] 2.38, 4.76, 7.14 ms, 1 excitation, flip angle 70°, bandwidth 500) was acquired (20) during a single breathhold with the same field of view and slice positions as for the texture images. R2 of the liver was measured using the FerriScan data acquisition protocol (21).

MR Image Processing A single analyst, blinded to the patients’ identities and medical histories, reviewed and processed all images using ImageJ 1.42 (NIH, Bethesda, MD). A single slice for each echo time was selected for processing. Each slice contained the largest cross section through the liver, the least motion artifacts, and minimal large vessels through the centre of the liver. On this slice, a region of interest (ROI) was delineated within the

bounds of the entire liver, avoiding the portal vein, very large intrahepatic vessels, and any obvious motion-affected regions (Fig. 1). We used the methodology of St Pierre et al (22) for measuring the transverse relaxation rate of the liver and the methodology of House et al (20) for measuring the liver fat parameter alpha (α).

Texture Analysis

Texture analysis of the images was performed using ImageJ combined with an in-house software plugin module. The software calculates 14 texture features for a specified ROI: the first 13 Haralick texture features (15) and an additional feature based on the sixth Haralick measure (Sum Average) (Table 2). Several preprocessing options can be specified, including the quantization level (4 to 10 bit) and outlier treatment (3 options). The texture features are generated from gray-tone spatial dependence (GTSD) matrices, which are obtained by comparing the intensities of pixels separated by a given distance (pixel separation) and direction (orthogonal, diagonal). To capture information on the texture within the liver over different length scales, 20 pixel separations were used (1–10, 12, 14, 16, 18, 20, 25, 35, 45, 60, 75 pixels). Hence, for each echo time the texture analysis generated 11,760 measures for the whole liver (14 Haralick × 7 quantization levels × 3 outlier treatments × 2 directions × 20 pixel separations).

Statistical Analysis

The five-point METAVIR score was dichotomized to generate four comparison groups: F0 versus F1/F2/F3/F4, F0/F1 versus F2/F3/F4, F0/F1/F2 versus F3/F4, and F0/F1/F2/F3 versus F4. Descriptive clinical and demographic characteristics were compared using Chi-squared analysis or Fisher's exact test (for categorical data), Mann-Whitney U-test (nonparametric data) or Student’s t-test (parametric data) (Table 3). The large number of features generated for each subject (82,320) required the use of feature reduction and selection strategies before classification modeling.
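The gray-tone spatial dependence matrices described under Texture Analysis can be sketched in a few lines. The following is a minimal illustration, not the in-house ImageJ plugin: the function names `gtsd_matrix` and `haralick_subset` are hypothetical, the matrix is built over a rectangular image rather than an irregular liver ROI, and only four of the Haralick features (ASM, CON, IDM, ENT) are computed, using their standard definitions from reference 15 with a natural logarithm for entropy.

```python
import numpy as np

def gtsd_matrix(img, dx, dy, levels):
    """Symmetric, normalized gray-tone spatial dependence matrix for one offset.

    img must already be quantized to integers in [0, levels). The offset
    (dx, dy) must be smaller than the image so that at least one pair exists.
    """
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    # Pair each pixel (r, c) with its neighbour (r + dy, c + dx), staying in bounds.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    a = img[r0:r1, c0:c1].ravel()
    b = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    p = np.zeros((levels, levels), dtype=float)
    np.add.at(p, (a, b), 1.0)   # accumulate co-occurrence counts
    p += p.T                    # count every pair in both directions
    return p / p.sum()

def haralick_subset(p):
    """Four Haralick features computed from a normalized GTSD matrix."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]               # skip empty cells so log() is defined
    return {
        "ASM": float(np.sum(p ** 2)),
        "CON": float(np.sum((i - j) ** 2 * p)),
        "IDM": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "ENT": float(-np.sum(nz * np.log(nz))),
    }
```

Under these assumptions, an orthogonal measure at pixel separation d could be taken as the mean over offsets (d, 0) and (0, d), and a diagonal measure as the mean over (d, d) and (d, -d); how the plugin combines directions is not stated in the text.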
The approach to feature preselection was as follows. Empirical Bayes moderated t-tests (23,24) were used

Table 3 Distribution of Patients Across Comparison Groups*

METAVIR group   No. of patients   Mean age (years) ± SD   No. of men (%)   Mean biopsy-MRI interval (days) ± SD   Liver fat (α) ± SD   Liver R2 (s⁻¹) ± SD
F0              14                52.9 ± 13.5             7 (50)           135 ± 98.4                             0.12 ± 0.10          34.0 ± 7.9
F1–4            35                50.6 ± 14.6             19 (54)          161 ± 327                              0.11 ± 0.09          34.1 ± 12.0
F0–1            27                53.7 ± 12.9             17 (63)          110 ± 91.1                             0.12 ± 0.08          36.3 ± 12.2
F2–4            22                48.2 ± 15.4             9 (41)           210 ± 408                              0.10 ± 0.10          31.4 ± 8.5
F0–2            32                53.2 ± 12.1             17 (53)          100 ± 86.7                             0.12 ± 0.09          35.5 ± 11.9
F3–4            17                47.6 ± 17.3             9 (53)           260 ± 460                              0.09 ± 0.09          31.3 ± 8.3
F0–3            41                51.1 ± 14.4             20 (49)          93.4 ± 84.7                            0.11 ± 0.09          34.7 ± 11.0
F4              8                 51.8 ± 14.0             6 (75)           507 ± 627                              0.11 ± 0.08          30.7 ± 10.7

*There were no statistically significant differences at the P = 0.05 level for any of the comparison groups.
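The group sizes in Table 3 follow directly from dichotomizing the per-stage counts in Table 1 at each METAVIR cutoff. A short sketch of that bookkeeping is given here; the function name `dichotomize` is an illustrative choice and not part of the study software.

```python
# Subjects per METAVIR stage, taken from Table 1.
stage_counts = {0: 14, 1: 13, 2: 5, 3: 9, 4: 8}

def dichotomize(counts, cutoff):
    """Return (n below cutoff, n at or above cutoff) for one binary comparison."""
    low = sum(n for stage, n in counts.items() if stage < cutoff)
    high = sum(n for stage, n in counts.items() if stage >= cutoff)
    return low, high

# Cutoff 1 is F0 vs F1-4, cutoff 2 is F0-1 vs F2-4, and so on.
group_sizes = {cutoff: dichotomize(stage_counts, cutoff) for cutoff in (1, 2, 3, 4)}
```

The F0–3 versus F4 comparison leaves only eight patients in the smaller group, which is relevant when reading the classification results for that cutoff.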


Table 4 Summary of Best Haralick Texture Feature for Group Discrimination

Comparison            Haralick texture feature                                          Normalization bit level
F0 vs F1/F2/F3/F4     IDM, SENT, ENT, DENT, IMC1, IMC2                                  8–10
F0/F1 vs F2/F3/F4     IDM, SENT, ENT, DENT, IMC2                                        8–10
F0/F1/F2 vs F3/F4     CON, SSQV, SAVE, DVAR, DAVE                                       4–7
F0/F1/F2/F3 vs F4     No significant differences after false discovery rate adjustment

CON = contrast; DAVE = difference average; DENT = difference entropy; DVAR = difference variance; ENT = entropy; IDM = inverse difference moment; IMC1 = Information Measures of Correlation 1; IMC2 = Information Measures of Correlation 2; SAVE = sum average; SVAR = sum variance.

to identify texture features that best separated the two groups in each of the four binary comparisons. The moderated t-test approach has been used extensively in the analysis of microarray data; it provides protection against artificially inflated t statistics, which sometimes arise with many dependent variables, where some very small standard errors arise by chance. Separate analyses were conducted for each combination of TE value (7), bit depth (7), outlier treatment (3), and directional average (2). This generated 294 datasets. Each dataset included 280 measures (14 Haralick measures at 20 pixel distances). Following the empirical Bayes t-tests, P values were adjusted to maintain a 5% false discovery rate across each of the 294 datasets using the methods of Benjamini and Hochberg (25). After false discovery rate adjustment, the frequency of the statistically significant results, for each of the four comparisons, was assessed to identify patterns and trends in the texture measures and MRI acquisition parameters that best separated different stages of fibrosis.

In the next stage of analysis, using the reduced feature set identified previously, two procedures were used to classify patients and develop a predictive algorithm. The LogitBoost algorithm (26) is an ensemble metalearner that has been shown to perform well on a wide variety of different problems (27,28) and is relatively resistant to overfitting (29). The second classification procedure used a generalized linear model (GLM) with a binomial error structure and logit link function (30). Cross-validated (31) posterior probabilities of disease were obtained from both of these models, and used to compute a receiver operating characteristic (ROC) curve (32). The area of the empirical (i.e., unsmoothed) ROC curve was calculated using Somers' D. A diagnostic algorithm with no utility has a ROC area of 0.5, and a perfect diagnostic algorithm has a ROC area of 1.
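For a binary outcome, the empirical ROC area and Somers' D are linked by AUROC = (D + 1)/2, where D is the proportion of concordant minus discordant pairs among all diseased/non-diseased pairs. The sketch below illustrates that relationship; the function names are hypothetical, and the inputs stand in for the cross-validated posterior probabilities mentioned above.

```python
import numpy as np

def somers_d(scores, labels):
    """Somers' D of a score with respect to a binary label (True = diseased)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pos, neg = s[y], s[~y]
    # Compare every diseased subject with every non-diseased subject.
    diff = pos[:, None] - neg[None, :]
    concordant = np.sum(diff > 0)
    discordant = np.sum(diff < 0)
    return (concordant - discordant) / diff.size

def auroc_from_somers_d(scores, labels):
    """Empirical (unsmoothed) ROC area; tied pairs contribute one half."""
    return (somers_d(scores, labels) + 1.0) / 2.0
```

With this definition, D = 0 corresponds to an area of 0.5 (no utility) and D = 1 to an area of 1 (perfect separation), matching the two reference points given in the text.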
A 95% confidence interval for the area under the ROC curve was calculated using the Bootstrap resampling approach (33), using bias corrected and adjusted estimates (34). Additional variables (age, liver R2, liver fat [α]) were also entered into the classification models in a separate analysis to determine if clinical or demographic factors improve classification accuracy.
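As a rough illustration of bootstrap resampling for the ROC area, the sketch below computes a plain percentile interval. This is a deliberately simpler stand-in and not the bias-corrected procedure described above, so its limits would generally differ from those reported in the study. The function names, the default of 2000 resamples, and the fixed seed are all assumptions made for the example.

```python
import numpy as np

def auroc(scores, labels):
    """Empirical ROC area: fraction of diseased/non-diseased pairs ranked correctly."""
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def percentile_bootstrap_ci(scores, labels, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the ROC area (not bias corrected)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    if y.all() or not y.any():
        raise ValueError("both classes must be present")
    rng = np.random.default_rng(seed)
    areas = []
    while len(areas) < n_boot:
        # Resample subjects with replacement, keeping score and label together.
        idx = rng.integers(0, s.size, size=s.size)
        if y[idx].all() or not y[idx].any():
            continue  # the area is undefined when a resample lacks one class
        areas.append(auroc(s[idx], y[idx]))
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(areas, [tail, 100.0 - tail])
    return float(lo), float(hi)
```

With small groups, such as the eight F4 patients in this cohort, a noticeable share of resamples can be unbalanced or single-class, which is one reason corrected intervals are often preferred over the plain percentile form.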

RESULTS

Demographics

There were no significant differences in age, gender, biopsy-MRI interval, liver fat (α), or liver R2 between any of the comparison groups, although the F4 group had a larger mean biopsy-MRI interval compared with the other groups (Table 3). The large biopsy-MRI interval resulted from not applying a time cutoff for entry of patients with liver fibrosis stage F4 into the study because, as cirrhotic patients, their fibrosis stage was very unlikely to regress.

Texture Analysis: Feature Selection

The frequency analysis suggested that there are differences between the comparison groups in terms of their response to the Haralick measures and that the first comparison (F0 versus F1–4) produced the greatest frequency of significant differences. Summaries of the best Haralick texture features and analysis parameters for separating the groups are shown in Table 4. The best processing combination for separating no or low fibrosis (F0/F1) from intermediate fibrosis-cirrhosis (F2–F4) was for Haralick measures IDM, SENT, ENT, DENT, IMC1, IMC2 calculated for quantization levels of eight bits (i.e., 256 gray levels) or higher. The best processing combination for separating low-intermediate fibrosis levels (F0–F2) from severe fibrosis or cirrhosis (F3/F4) was for Haralick measures CON, SSQV, SVAR, DVAR, DAVE calculated for quantization levels between four and eight bits (i.e., 16 to 256 gray levels). Different Haralick measures showed better separation between groups at different echo times. On the basis of frequency of statistically significant results, the echo times showing the strongest separation for each measure are shown in Table 5. These results indicate that images acquired at echo times greater than 123 ms are better for discriminating no or low fibrosis from intermediate fibrosis-cirrhosis. Between-group separation was fairly consistent across all interpixel distances, but there was a tendency for better group separation at lower interpixel distances.
Outliers trimmed three standard deviations above and below the mean image intensity showed the greatest frequency of significant differences, but direction (diagonal, orthogonal) did not show any obvious trends. In summary, the frequency analysis of individual texture features suggested that the best criteria for finding diagnostic signatures were to use high bit depth images (10 bits), with outliers trimmed using the three standard deviation cutoffs, and Haralick measures evaluated at specific echo times and at relatively short interpixel distances.
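The preprocessing favoured by the frequency analysis (three standard deviation outlier cutoffs followed by requantization to a chosen bit depth) might look like the sketch below. The text does not specify how the plugin treats trimmed pixels or maps intensities to gray levels, so two assumptions are made here: outliers are clipped to the cutoff rather than discarded, and the clipped range is mapped linearly onto 2^bits levels. The function name `trim_and_quantize` is hypothetical.

```python
import numpy as np

def trim_and_quantize(pixels, bits, n_sd=3.0):
    """Clip ROI intensities at mean +/- n_sd standard deviations, then requantize.

    Returns integer gray levels in [0, 2**bits - 1].
    """
    x = np.asarray(pixels, dtype=float)
    lo = x.mean() - n_sd * x.std()
    hi = x.mean() + n_sd * x.std()
    clipped = np.clip(x, lo, hi)          # assumption: clip, do not discard
    levels = 2 ** bits
    span = clipped.max() - clipped.min()
    if span == 0:
        return np.zeros(x.shape, dtype=int)   # uniform ROI maps to one level
    scaled = (clipped - clipped.min()) / span  # now in [0, 1]
    # The top of the range would land on index `levels`, so cap it.
    return np.minimum((scaled * levels).astype(int), levels - 1)
```

For example, `trim_and_quantize(roi_pixels, bits=10)` would give the 1024 gray levels corresponding to the 10-bit setting highlighted in the summary, where `roi_pixels` is a placeholder for the intensities inside the liver ROI.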


Table 5 Summary of the Echo Time Producing Best Group Discrimination by Haralick Texture Feature*

Haralick no.   Haralick texture feature                 TE (ms)
1              Angular second moment                    123
2              Contrast                                 44
3              Correlation                              44
4              Sum of squares: variance                 123
5              Inverse difference moment                123
6              Sum average                              44
7              Sum variance                             109
8              Sum entropy                              123
9              Entropy                                  123
10             Difference variance                      123
10b            Difference average                       44
11             Difference entropy                       109
12             Information measures of correlation 1    138
13             Information measures of correlation 2    138

*Haralick measures in bold indicate those measures that occur more frequently in t-tests separating patients with no or minimal fibrosis (F0 or F0-F1) from patients with more severe fibrosis (F1-F4 or F2-F4) (see Table 4).
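The frequency counts behind Tables 4 and 5 rest on the false discovery rate adjustment described in the Statistical Analysis section. A compact sketch of the Benjamini and Hochberg step-up adjustment (25) is shown here; it takes any vector of P values, and the empirical Bayes moderated t-tests that produced them in the study are not reproduced. The function name is an illustrative choice.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest P value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted
```

In the setting described in the text, `pvals` would hold the 280 P values of one dataset, and a measure would be counted as significant when its adjusted value is at most 0.05.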

Texture Analysis: Classification Results

The LogitBoost algorithm was used to develop a predictive rule using all 14 Haralick measures at the selected echo times summarized in Table 5, 9- and 10-bit images, one outlier treatment, both directions and interpixel spacings between one and eight. This combination of parameters produced 32 models for which the area under the ROC curve (AUROC) was calculated. All parameters were the same for the simple generalized linear model except the number of Haralick measures was reduced to six (IDM, SENT, ENT, DENT, IMC1, IMC2) because of the small number of observations in some of the groups. The mean AUROC values for the 32 predictive models produced by the LogitBoost and generalized linear model algorithms are summarized in Table 6. The performance of the LogitBoost algorithm for predicting fibrosis group membership was modest, except for the comparison between F0 and F1–4, which produced the highest mean AUROC (0.78) with areas ranging from 0.74 to 0.85. The inclusion of age, liver fat (α) and liver R2 variables into the model improved AUROC values for some comparisons, but lowered them for others (Table 6). The simple generalized linear model produced improved classification results compared with the LogitBoost algorithm with mean AUROC values higher for all four comparisons (Table 6). All ROC areas were statistically significantly greater than 0.5 for all comparisons and the highest mean AUROC was obtained for the F0 versus F1–4 comparison (0.87, range 0.84 to 0.91). The inclusion of age, liver fat (α), and liver R2 variables into the generalized linear model improved AUROC values for all comparisons. Exclusion of large blood vessels by the use of an ROI restricted to the right lateral section of the liver did not change the Haralick features identified or the diagnostic performance (data not shown).

DISCUSSION

Our study indicates that texture measures derived from T2-weighted, high-resolution MRI have diagnostic sensitivity for discriminating patients with and without fibrosis, but texture analysis appears to be less sensitive for staging low and intermediate levels of fibrosis. Comparison of our results to other studies for the F ≥ 1 patients is difficult as most studies do not report on this grouping. However, we note that, in studies reporting on patient groups across multiple fibrosis stages (35–37), AUROC values tend to decrease toward lower fibrosis stage cutoffs (i.e., from F = 4 to F ≥ 3 to F ≥ 2), in contrast to our results, which are better for the lowest cutoff available (F ≥ 1). Our texture analysis approach appears to be less sensitive for separating patients with mild fibrosis from those with severe fibrosis, but is comparable with other noninvasive techniques (transient elastography, diffusion weighted imaging), including those using texture analysis of MRI (14,18,35–37). However, a recent meta-analysis of MR elastography reported average area under ROC curves of 0.98 and 0.98 for staging F0–1 versus F2–4 and F0–2 versus F3–4, respectively (38). Our results are broadly similar to those of Jirak et al (16), who used texture analysis of MRI to distinguish controls from cirrhotic subjects, but were unable to classify cirrhotic patients separated into different Child-Pugh stages. Although the patient groups are not directly comparable, the successful classification of a small validation group with the highest Child-Pugh scores (16) reflects, to some extent, the higher AUROC we observed for separating F = 4 from other patients. In summary, our results suggest that a combination of MRI measures that include selected texture features, liver fat (α), and liver R2 may be a useful tool for excluding fibrosis in patients with liver disease.
However, texture analysis of MRI performs only modestly when applied to the classification of patients in the mild and intermediate fibrosis stages. Our feature selection process suggests that discrimination of patients with liver disease into fibrosis stages is affected by the choice of Haralick measures and texture processing parameters. We identified a subset of six Haralick measures (IDM, SENT, ENT, DENT, IMC1, IMC2) that were better for discriminating between nonfibrotic and fibrotic patients. Three of these texture measures (SENT, ENT, DENT) are directly related to entropy, which is an indication of the randomness of the image texture, and the two IMC features (information measures of correlation) also contain references to entropy within their formulas (15). The IDM texture feature measures image homogeneity. Jirak et al (16) did not include the IMC1 or IMC2 features, but of the top five Haralick features they identified for discriminating between controls and cirrhotic patients, our study identified three of them (IDM, SENT, DENT). Of interest, a different set of texture features and at lower image bit depths (gray levels) were identified in our significant results comparing patients with low-intermediate fibrosis with


Table 6 Mean Area Under the ROC Curve for Classifying Subjects With METAVIR Scores F ≥ 1, F ≥ 2, F ≥ 3, F = 4

Area under ROC curve

                      LogitBoost                         Generalized linear model
Comparison            Haralick   Haralick+age+fat+R2     Haralick   Haralick+age+fat+R2
F0 vs F1/F2/F3/F4     0.78       0.75                    0.87       0.91
F0/F1 vs F2/F3/F4     0.60       0.66                    0.80       0.81
F0/F1/F2 vs F3/F4     0.51       0.61                    0.74       0.81
F0/F1/F2/F3 vs F4     0.49       0.41                    0.80       0.87

patients having severe fibrosis or cirrhosis. Similarly, Jirak et al (16) also observed some variation in the significant texture feature sets depending on the severity of the patient group that was assessed. Our extensive feature set has improved our empirical understanding of which texture features and processing parameters are important in characterizing fibrosis in patients with liver disease. This knowledge should help to greatly reduce the number of variables used in future classification models that use texture analysis for assessing liver fibrosis. Our feature selection approach suggested that there were optimal echo times for each texture feature. For the texture features that best separated nonfibrotic from fibrotic patients (IDM, SENT, DENT, IMC1, IMC2), features generated from images with echo times greater than 109 ms occurred with greater frequency in the statistically significant results. We would expect that contrast between fibrotic tissue and liver parenchyma would increase with higher echo times up to a point, and our analysis of the optimal TE generally supports that concept. Previous studies have indicated that fibrotic tissue can be hyperintense relative to nonfibrotic tissue on T2-weighted MRI (39) and quantitative studies suggest that T1 or T2 relaxation times can be longer in cirrhotic livers (40,41). Liver iron levels may also vary and severe siderosis will reduce the T2 of the liver parenchyma relative to the fibrotic tissue and hence should potentially enhance the contrast between fibrotic and nonfibrotic tissue on T2-weighted images. Conversely, increased liver fat is likely to increase T2 of the liver parenchyma, which may reduce the contrast between fibrotic and nonfibrotic tissue on T2-weighted images. The addition of liver R2 and liver fat (alpha) in the generalized linear model did add some improvement to the area under the ROC curves, suggesting these natural contrast effects can play a role in altering diagnostic sensitivity. 
In the interests of simplicity and safety, we chose not to use a contrast agent and instead varied the echo time to modify image contrast between fibrotic and nonfibrotic tissue. Contrast agents can enhance the visual appearance of the reticulation patterns in cirrhosis (18,42) and clearly show the network of fibrous tissue in cirrhotic cases when longer echo times are used (42). Kato et al (18) also observed an improvement in their area under the ROC curve, from 0.525 to 0.801, when texture analysis was applied to equilibrium phase contrast-enhanced images

compared with unenhanced T1- and T2-weighted images. While in our study we achieved similar AUROC values for a similar comparison without exogenous contrast, the improvement in diagnostic ability with the addition of a contrast agent requires consideration. The results from our study and others indicate that image contrast has some effect on the ability of texture measures to discriminate fibrotic from nonfibrotic livers. Further study is required to investigate whether the addition of a contrast agent combined with our texture analysis approach will improve our diagnostic sensitivity.

We used a large ROI that covered the whole liver as it allowed us to measure variations in image intensity over larger pixel separations. The ability to characterize textural patterns at larger length scales may be important when fibrosis is widespread. In our analysis, we also included a smaller ROI (data not shown) that covered the right lateral segment of the liver and this ROI largely avoided any large vessels. However, there was no difference between the ROIs in the texture features identified as being better for discrimination of fibrosis stages and the area under the ROC curve results were very similar. The main disadvantage of using the large ROI was the longer image processing time.

There were some limitations to our study. Our F1 to F4 group was of mixed etiology and included patients with NAFLD and viral hepatitis and our F0 group contained three normal, six NAFLD, and four NASH patients. A separate statistical analysis that excludes the NASH/NAFLD patients was not made given the reduction in group numbers that would arise from such an exclusion, but liver fat (α) was included as a variable in a subset of the classification models. For the GLM, we used a subset of six Haralick measures chosen a posteriori and this could have resulted in an upward bias in estimates of the areas under the ROC curves.
An independent validation cohort would be required to test the reproducibility of the texture features selected for the GLM analysis. Finally, the small voxel size we used to achieve high spatial resolution in the MR images may have resulted in lower signal to noise levels compared with other studies. In conclusion, our results suggest that a combination of MRI measures, that include selected texture features from T2-weighted images, may be a useful tool for excluding fibrosis in patients with liver disease. However, texture analysis of MRI performs only modestly when applied to the classification of patients


in the mild and intermediate fibrosis stages. From our investigation of texture measures, processing parameters, and the effects of image contrast, we have considerably reduced the variable space required for future classification studies.


ACKNOWLEDGMENT

The authors thank the radiology departments and clinics involved in this research for their support. J.K.O. is the recipient of a National Health and Medical Research Council of Australia Practitioner Fellowship.


REFERENCES

1. Friedman SL. The cellular basis of hepatic fibrosis – mechanisms and treatment strategies. N Engl J Med 1993;328:1828–1835.
2. Fattovich G, Brollo L, Giustina G, et al. Natural history and prognostic factors for chronic hepatitis type B. Gut 1991;32:294–298.
3. Fattovich G, Giustina G, Degos F, et al. Morbidity and mortality in compensated cirrhosis type C: a retrospective follow-up study of 384 patients. Gastroenterology 1997;112:463–472.
4. Poynard T, Bedossa P, Opolon P. Natural history of liver fibrosis progression in patients with chronic hepatitis C. The OBSVIRC, METAVIR, CLINIVIR, and DOSVIRC groups. Lancet 1997;349:825–832.
5. Chitturi S, George J. Predictors of liver-related complications in patients with chronic hepatitis C. Ann Med 2000;32:588–591.
6. Kumar D, Wallington-Beddoe C, George J, et al. Effectiveness of interferon alfa-2b/ribavirin combination therapy for chronic hepatitis C in a clinic setting. Med J Aust 2003;178:267–271.
7. Cadranel J-F, Rufat P, Degos F. Practices of liver biopsy in France: results of a prospective nationwide survey. Hepatology 2000;32:477–481.
8. Poynard T, Ratziu V, Bedossa P. Appropriateness of liver biopsy. Can J Gastroenterol 2000;14:543–548.
9. Colloredo G, Guido M, Sonzogni A, Leandro G. Impact of liver biopsy size on histological evaluation of chronic viral hepatitis: the smaller the sample, the milder the disease. J Hepatol 2003;39:239–244.
10. Regev A, Berho M, Jeffers LJ, et al. Sampling error and intraobserver variation in liver biopsy in patients with chronic HCV infection. Am J Gastroenterol 2002;97:2614–2618.
11. Bedossa P. Intraobserver and interobserver variations in liver biopsy interpretation in patients with chronic hepatitis C. Hepatology 1994;20:15.
12. Westin J. Interobserver study of liver histopathology using the Ishak score in patients with chronic hepatitis C virus infection. Liver 1999;19:183.
13. Maharaj B, Leary WP, Naran AD, et al. Sampling variability and its influence on the diagnostic yield of percutaneous needle biopsy of the liver. Lancet 1986;327:523–525.
14. Degos F, Perez P, Roche B, et al. Diagnostic accuracy of FibroScan and comparison to liver fibrosis biomarkers in chronic viral hepatitis: a multicenter prospective study (the FIBROSTIC study). J Hepatol 2010;53:1013–1021.
15. Haralick R, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern SMC 1973;3:610–621.
16. Jirak D, Dezortova M, Taimr P, Hajek M. Texture analysis of human liver. J Magn Reson Imaging 2002;15:68–74.
17. Zhang X, Fujita H, Kanematsu M, et al. Improving the classification of cirrhotic liver by using texture features. Conf Proc IEEE Eng Med Biol Soc 2005;1:867–870.
18. Kato H, Kanematsu M, Zhang X, et al. Computer-Aided Diagnosis of hepatic fibrosis: preliminary evaluation of MRI texture analysis using the finite difference method and an artificial neural network. AJR Am J Roentgenol 2007;189:117–122.
19. Bedossa P, Poynard T. An algorithm for the grading of activity in chronic hepatitis C. The METAVIR Cooperative Study Group. Hepatology 1996;24:289–293.
20. House MJ, Gan EK, Adams LA, et al. Diagnostic performance of a rapid magnetic resonance imaging method of measuring hepatic steatosis. PLoS One 2013;8:e59287.
21. St Pierre TG, Clark PR, Chua-anusorn W, et al. Noninvasive measurement and imaging of liver iron concentrations using proton magnetic resonance. Blood 2005;105:855–861.
22. St Pierre TG, Clark PR, Chua-Anusorn W. Single spin-echo proton transverse relaxometry of iron-loaded liver. NMR Biomed 2004;17:446–458.
23. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004;3:Article3.
24. Lonnstedt I, Speed T. Replicated microarray data. Stat Sin 2002;12:31.
25. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 1995;57:289–300.
26. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting. Ann Stat 1998;28:337.
27. Zhang G, Fang B. LogitBoost classifier for discriminating thermophilic and mesophilic proteins. J Biotechnol 2007;127:417–424.
28. Cai Y-D, Feng K-Y, Lu W-C, Chou K-C. Using LogitBoost classifier to predict protein structural classes. J Theor Biol 2006;238:172–176.
29. Gilles B, Gábor L, Nicolas V. On the rate of convergence of regularized boosting classifiers. J Mach Learn Res 2003;4:861–894.
30. Nelder JA, Wedderburn RWM. Generalized linear models. J R Stat Soc Series A 1972;135:370–384.
31. Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Series B Stat Methodol 1974;36:111–147.
32. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
Efron B. The Jackknife, the Bootstrap, and other resampling plans (CBMS-NSF Regional Conference Series in Applied Mathematics). Philadelphia: Society for Industrial and Applied Mathematics; 1982. Efron B, Tibshirani RJ. An introduction to the bootstrap. London: Chapman and Hall/CRC; 1994. Lewin M, Poujol-Robert A, Boelle P-Y, et al. Diffusion-weighted magnetic resonance imaging for the assessment of fibrosis in chronic hepatitis C. Hepatology 2007;46:658–665. Friedrich-Rust M, Ong M-F, Martens S, et al. Performance of transient elastography for the staging of liver fibrosis: a metaanalysis. Gastroenterology 2008;134:960–974.e968. Ziol M, Handra-Luca A, Kettaneh A, et al. Noninvasive assessment of liver fibrosis by measurement of stiffness in patients with chronic hepatitis C. Hepatology 2005;41:48–54. Wang Q-B, Zhu H, Liu H-L, Zhang B. Performance of magnetic resonance elastography and diffusion-weighted imaging for the staging of hepatic fibrosis: a meta-analysis. Hepatology 2012;56: 239–247. Krinsky GA, Lee VS, Theise ND. Focal lesions in the cirrhotic liver: high resolution ex vivo MRI with pathologic correlation. J Comput Assist Tomogr 2000;24:189–196. Thomsen C, Christoffersen P, Henriksen O, Juhl E. Prolonged T1 in patients with liver cirrhosis: an in vivo MRI study. Magn Reson Imaging 1990;8:599–604. Kreft B, Dombrowski F, Block W, Bachmann R, Pfeifer U, Schild H. Evaluation of Different Models of Experimentally Induced Liver Cirrhosis for MRI Research with Correlation to Histopathologic Findings. Invest Radiol 1999;34:360. Aguirre DA, Behling CA, Alpert E, Hassanein TI, Sirlin CB. Liver fibrosis: noninvasive diagnosis with double contrast materialenhanced MR imaging. Radiology 2006;239:425–437.
