Accepted Manuscript

Use of Artificial Intelligence as an Innovative Donor-Recipient Matching Model for Liver Transplantation: Results from a Multicenter Spanish Study

Javier Briceño, Manuel Cruz-Ramírez, Martín Prieto, Miguel Navasa, Jorge Ortiz de Urbina, Rafael Orti, Miguel-Ángel Gómez-Bravo, Alejandra Otero, Evaristo Varo, Santiago Tomé, Gerardo Clemente, Rafael Bañares, Rafael Bárcena, Valentín Cuervas-Mons, Guillermo Solórzano, Carmen Vinaixa, Ángel Rubín, Jordi Colmenero, Andrés Valdivieso, Rubén Ciria, César Hervás-Martínez, Manuel de la Mata

PII: S0168-8278(14)00387-0
DOI: http://dx.doi.org/10.1016/j.jhep.2014.05.039
Reference: JHEPAT 5192

To appear in: Journal of Hepatology

Received Date: 23 September 2013
Revised Date: 23 May 2014
Accepted Date: 26 May 2014

Please cite this article as: Briceño, J., Cruz-Ramírez, M., Prieto, M., Navasa, M., de Urbina, J.O., Orti, R., Gómez-Bravo, M., Otero, A., Varo, E., Tomé, S., Clemente, G., Bañares, R., Bárcena, R., Mons, V.C., Solórzano, G., Vinaixa, C., Rubín, Á., Colmenero, J., Valdivieso, A., Ciria, R., Hervás-Martínez, C., de la Mata, M., Use of Artificial Intelligence as an Innovative Donor-Recipient Matching Model for Liver Transplantation: Results from a Multicenter Spanish Study, Journal of Hepatology (2014), doi: http://dx.doi.org/10.1016/j.jhep.2014.05.039

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

TITLE PAGE

USE OF ARTIFICIAL INTELLIGENCE AS AN INNOVATIVE DONOR-RECIPIENT MATCHING MODEL FOR LIVER TRANSPLANTATION: RESULTS FROM A MULTICENTER SPANISH STUDY

AUTHORS: Javier Briceño1, Manuel Cruz-Ramírez2, Martín Prieto3, Miguel Navasa4, Jorge Ortiz de Urbina5, Rafael Orti1, Miguel-Ángel Gómez-Bravo6, Alejandra Otero7, Evaristo Varo8, Santiago Tomé8, Gerardo Clemente9, Rafael Bañares9, Rafael Bárcena10, Valentín Cuervas-Mons11, Guillermo Solórzano12, Carmen Vinaixa3, Ángel Rubín3, Jordi Colmenero4, Andrés Valdivieso5, Rubén Ciria1, César Hervás-Martínez2, Manuel de la Mata1.

DEPARTMENTS AND INSTITUTIONS:
1. Liver Transplantation Unit. Hospital Reina Sofía. Córdoba. Spain. CIBEREHD. IMIBIC.
2. Department of Computer Science and Numerical Analysis, University of Córdoba, Spain.
3. Liver Transplantation Unit. Hospital La Fe. Valencia. Spain.
4. Liver Transplantation Unit. Hospital Clínic. Barcelona. Spain.
5. Liver Transplantation Unit. Hospital de Cruces. Bilbao. Spain.
6. Liver Transplantation Unit. Hospital Virgen del Rocío. Sevilla. Spain.
7. Liver Transplantation Unit. Hospital Juan Canalejo. A Coruña. Spain.
8. Liver Transplantation Unit. Hospital Clínico Universitario. Santiago de Compostela. Spain.
9. Liver Transplantation Unit. Hospital Gregorio Marañón. Madrid. Spain.
10. Liver Transplantation Unit. Hospital Ramón y Cajal. Madrid. Spain.
11. Liver Transplantation Unit. Hospital Puerta de Hierro. Madrid. Spain.
12. Liver Transplantation Unit. Hospital Infanta Cristina. Badajoz. Spain.

ADDRESS FOR CORRESPONDENCE:
• Javier Briceño.
• Liver Transplantation Unit. University Hospital Reina Sofía. Córdoba, SPAIN.
• Phone: 0034957010439 / Fax: 0034957010949
• Email: [email protected]

WORD COUNT: Title: 128 characters / Abstract: 249 / Document: 3075

NUMBER OF TABLES / FIGURES: Tables: 3 (2 B&W / 1 Color) / Figures: 3 (2 B&W / 1 Color)

KEY WORDS: artificial intelligence / allocation / survival / prediction / optimization

ABBREVIATIONS
AB0i, AB0 incompatible transplant; AHT, arterial hypertension; ALT, alanine aminotransferase plasma level; ANN, artificial neural network; AntiHBc, Hepatitis B (core Ab positive); AST, aspartate transaminase level; AU-ROC, Area Under the Receiver Operating Characteristic Curve; BMI, Body Mass Index; C, correct classification rate; CE, cause of exitus; CIT, cold ischemia time; CREAT, creatinine plasma level; DM, diabetes mellitus; DRI, donor risk index; D-R, donor-recipient; Dy, dialysis at transplant; ECD, extended criteria donors; ET, etiology; HCC, hepatocellular carcinoma; HCV, Hepatitis C (positive serology); HRS, hepatorenal syndrome; ICU, Intensive Care Unit; I/R, grade of ischemia-reperfusion injury; LT, liver transplantation; MANN, multi-objective artificial neural network; MELD, model of end-stage liver disease; MELD-i, MELD score at listing; Mlogistic, Multilogistic regression analysis; MS, Minimum Sensitivity; MULT, multi-organ recovery; NA, sodium plasma level; NN-CCR, Neural Network for Correct Classification Rate; NN-MS, Neural Network for Minimum Sensitivity; PT, portal thrombosis; SD, standard deviation; Slogistic, Simple Logistic regression analysis; SOFT, survival outcomes following liver transplantation score; TBIL, total bilirubin; TIPS, transjugular intrahepatic portosystemic shunt; UAS, upper abdominal surgery; WT, waiting list time.

FINANCIAL SUPPORT AND CONFLICT OF INTEREST: I hereby certify that our research has received no funds, that it is not under consideration for publication in any other journal and that none of the authors has any conflict of interest regarding the publication of our manuscript. The authors are indebted to Astellas Pharma S.A., Spain, for its logistical support.


USE OF ARTIFICIAL INTELLIGENCE AS AN INNOVATIVE DONOR-RECIPIENT MATCHING MODEL FOR LIVER TRANSPLANTATION: RESULTS FROM A MULTICENTER SPANISH STUDY

ABSTRACT

BACKGROUND & AIMS: There is an increasing discrepancy between the number of potential liver graft recipients and the number of organs available. Organ allocation should follow the concept of benefit of survival, avoiding human-innate subjectivity. The aim of this study is to use artificial neural networks (ANN) for donor-recipient (D-R) matching in liver transplantation (LT) and to compare their accuracy with validated scores of graft survival (MELD, D-MELD, DRI, P-SOFT, SOFT and BAR).

METHODS: Sixty-four donor and recipient variables from a set of 1003 LT from a multicenter study including 11 Spanish centers were included. For each D-R pair, common statistics (simple and multiple regression models) and ANN formulae for two non-complementary probability models of 3-month graft survival and graft loss were calculated: a positive-survival (NN-CCR) and a negative-loss (NN-MS) model. The NN models were obtained by using the Neural Net Evolutionary Programming (NNEP) algorithm. Additionally, receiver operating characteristic (ROC) curves were constructed to validate ANN against other scores.

RESULTS: Optimal results for the NN-CCR and NN-MS models were obtained, with the best performance in predicting the probability of graft survival (90.79%) and graft loss (71.42%) for each D-R pair, significantly improving results from multiple regressions. ROC curves for 3-month graft survival and graft loss predictions were significantly more accurate for ANN than for other scores in both NN-CCR (AUROC: ANN=0.80 vs MELD=0.50, D-MELD=0.54, P-SOFT=0.54, SOFT=0.55, BAR=0.67 and DRI=0.42) and NN-MS (AUROC: ANN=0.82 vs MELD=0.41, D-MELD=0.47, P-SOFT=0.43, SOFT=0.57, BAR=0.61 and DRI=0.48).

CONCLUSIONS: ANN may be considered a powerful decision-making technology for this dataset, optimizing the principles of justice, efficiency and equity. This may be a useful tool for predicting 3-month outcome and a potential research area for future D-R matching models.

INTRODUCTION

Liver transplantation (LT) is strongly limited by the availability of optimal liver donors. The disparity between demand and supply is unfortunately linked to deaths on the waiting list. In the last two decades, donor criteria have been dramatically expanded despite the increased risk to recipient and/or graft [1]. However, the combination of multiple marginal factors increases graft injury [2], which led to the development of a donor risk index (DRI) [3]. On the recipient side, the MELD score is the cornerstone of allocation policies based on the sickest-first principle [4]. However, the MELD score poorly predicts mortality following transplantation. In this allocation balance, transplantation of high-DRI organs is effective for high-MELD but not low-MELD candidates, despite increasing their time on the waiting list [5]. Recent scores (SOFT and BAR) have combined D-R factors to predict recipient survival following LT [6,7]. Donor and graft acceptance, prioritization of candidates and allocation policies depict a complex scenario. More than 100 variables can be considered in a single donor-recipient "best matching" decision, with a risk of subjectivity and mismatch because of human limitations that should not be underestimated.

Nowadays, artificial neural networks (ANN) are widely used. Their application areas include agricultural pest prediction [8], forecasting [9], quantum chemistry [10], game-playing [11], pattern recognition (radar systems and face identification) [12], financial applications [13], data mining [14] and e-mail spam filtering [15]. Unlike common statistics and regression analyses, ANN can combine multiple variables to analyze different end-points, including not only positive but also negative results (such as graft loss, which is not correctly estimated by the common statistics reported in the literature). Computational tools for the decision-making process in medicine, and particularly in LT, can be useful despite their complexity.

Through a multicenter analysis, the aims of our study are: a) to test the accuracy of ANN in predicting post-transplant outcomes, defined by two ANN models of graft survival and graft loss; b) to observe the way that ANN change D-R pairs in order to obtain improved outcomes; and c) to compare them with current validated scores (MELD, D-MELD [16], DRI, P-SOFT, SOFT and BAR).

PATIENTS AND METHODS

Patient selection

A multicenter study from 11 Spanish LT units was conducted, including all consecutive liver transplants performed between January 2007 and December 2008. All transplant recipients aged 18 years or older were included. Partial, living-donor LT and combined transplants were excluded. A total of 57 variables (26 from the recipient, 19 from the donor, 6 from the retrieval procedure and 6 from the transplant procedure) were reported for each donor-recipient pair (Table 1). All the variables included are known before the transplant, which avoids post-transplant variables that could interfere with the intention of these ANN models to serve as a pre-transplant score. The end-point variable was 3-month graft mortality.

A total of 1031 liver transplants were initially included. The follow-up period was fulfilled in 1003 liver transplants. Twenty-eight cases were excluded because of the absence of graft survival data. These losses were evenly distributed among the participating institutions (n=28; 2.6%, range 0-7.5%).

Models of donor-recipient matching

In order to obtain the best knowledge of donor-recipient (D-R) prognosis, a new system was developed for graft assignment. For each D-R pair, two probabilities were calculated using 2 different and non-complementary models: the positive-survival model and the negative-loss model.

The positive-survival model consists of a neural network that predicts the probability of 3-month graft survival after LT. This model uses the mathematical concept of Correct Classification Rate (CCR), or Accuracy, defined as the percentage of correctly classified training patterns. This model tries to maximize the probability that a D-R pair belongs to the "graft survival" class.

The negative-loss model consists of a neural network giving the probability of non-survival of the graft 3 months following the transplant. This model uses the mathematical concept of Minimum Sensitivity (MS), defined as the minimum value of the sensitivities of each of the classes. This model tries to maximize the probability that a D-R pair belongs to the "non-graft-survival" class.

Common statistics are routinely used in the current literature. However, although they can predict a positive event (such as graft survival) well, their capability for negative events (such as graft loss) is poor. This is why a negative model using MS is necessary. To obtain the best prediction of survival according to D-R pairs, these two models are optimized together by a multi-objective evolutionary algorithm [17,18] in order to predict two probabilistic values for each D-R pair: the highest-lowest probability of survival and the highest-lowest probability of not surviving.

Building the positive-survival (CCR) and negative-loss (MS) models

The positive-survival and negative-loss models are ANN models. Neural networks (an artificial intelligence technique) simulate a biological neural system, where each neuron is modeled by a processing unit. Our ANN (Figure 1a) is composed of a set of input variables, a group of internal nodes where a non-linear sigmoid function combines all input variables (propagation function), an activation function and, in our case, an output dependent variable (a binary classification variable).
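As a minimal sketch of the two measures defined above (not the authors' NNEP implementation), CCR and MS could be computed from observed and predicted classes as follows. The function name `ccr_and_ms` and the toy labels are illustrative assumptions, not taken from the study.

```python
def ccr_and_ms(y_true, y_pred):
    """Return (CCR, MS) for observed and predicted class labels.

    CCR: fraction of patterns classified correctly (accuracy).
    MS:  minimum, over the observed classes, of the per-class sensitivity.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    ccr = correct / len(y_true)

    sensitivities = []
    for cls in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in members)
        sensitivities.append(hits / len(members))
    return ccr, min(sensitivities)


# Toy data (hypothetical): 1 = graft survival, 0 = graft loss.
# A classifier that always predicts survival looks accurate but never
# detects a loss, which is exactly what MS exposes.
y_true = [1] * 9 + [0]
y_pred = [1] * 10
ccr, ms = ccr_and_ms(y_true, y_pred)
```

On this imbalanced toy set the always-survival classifier has a high CCR but an MS of zero, which illustrates why a model optimized for accuracy alone can be of little use for predicting graft loss.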

The values of the connections and the structure of the models are determined by an evolutionary algorithm. To verify that the individuals obtained by the evolutionary algorithm are efficient, the coefficients of individual neural network models are trained with a subset of the database (training set) and tested with the rest of the database (generalization set) [19]. For this purpose, 10-fold cross-validation was used. Briefly, the whole dataset is randomly divided, and 90% of the patients are used for the training step, leaving 10% for the final testing. This process is performed 10 times, so that all patterns participate in the testing phase. After these 10 randomizations, the best CCR and MS models are chosen. The "best model" is the one that correctly classifies the highest number of pairs in both categories of graft survival and graft loss. Cross-validation is a standard method for reducing bias in performance estimates. The schedule of this methodology is depicted in Figure 1b.

Rule-based system

With the 2 models obtained, a very simple rule-based decision system was designed: first, the MELD score is the cornerstone, so in case of a draw, when the ANN cannot determine differences, the organ is allocated by MELD; second, a D-R pair is chosen only in cases of real biological, and not merely mathematical, differences, defined as at least 3% and 5% in the NN-CCR and NN-MS models, respectively. These thresholds were chosen from the standard deviations of the probabilities of belonging to the class of graft survival (SD=2.86%) or non-survival (SD=5.56%).

Statistical analysis

The capability of these 2 models to predict graft survival and graft loss was compared with commonly used statistical tools: two logistic regression models (multiple regression analysis, MLogistic; and simple logistic regression analysis, SLogistic). Comparisons were also performed with a standard machine learning classifier, the decision-tree learner C4.5; with the Logistic Model Tree (LMT), which combines a tree structure with logistic regression models; and with a kernel-based classifier, the Support Vector Machine (SVM) (Supplemental Digital Content). The CCR and MS statistics were used to analyze the power of each model to predict the final result 3 months after transplantation. The Neural Net Evolutionary Programming (NNEP) algorithm was used to obtain these models. The building of the evolutionary NN and the comparisons with logistic models were performed by two experts in Neural Network Engineering (H-M C. and C-R M.).

Validation with other scores

To test the accuracy of ANN in predicting both graft survival and graft loss, comparisons with other current validated scores were performed. Receiver operating characteristic (ROC) curves were obtained for every score to predict both end-points and compared against the CCR and MS models. According to the current literature, MELD, D-MELD, DRI, P-SOFT, SOFT and BAR scores were calculated. There were no missing data in any of the variables included in any of the scores.

Ethical and humane considerations

Every procedure, including obtaining informed consent, was conducted in accordance with the ethical standards of the Committee on Human Experimentation and with the Helsinki Declaration of 1975.
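The rule-based system described in the Methods can be illustrated with a short sketch. The text gives the two thresholds (3% for NN-CCR, 5% for NN-MS) and the MELD tie-break, but not the exact order in which the two models are consulted; the ordering below (survival first, then loss, then MELD) is therefore an assumption, as are the `Candidate` type, the `allocate` function and all field names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    meld: int
    p_survival: float  # NN-CCR output for this donor, in percent (assumed scale)
    p_loss: float      # NN-MS output for this donor, in percent (assumed scale)


def allocate(candidates, ccr_margin=3.0, ms_margin=5.0):
    """Pick a recipient for one donor offer.

    Assumed reading of the rule-based system:
    1. If the best survival probability beats the runner-up by at least
       ccr_margin points, choose that candidate.
    2. Otherwise, if the lowest loss probability beats the runner-up by at
       least ms_margin points, choose that candidate.
    3. Otherwise the models cannot separate the candidates (a draw), and
       the candidate with the highest MELD score is chosen.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if len(candidates) == 1:
        return candidates[0]

    by_survival = sorted(candidates, key=lambda c: c.p_survival, reverse=True)
    if by_survival[0].p_survival - by_survival[1].p_survival >= ccr_margin:
        return by_survival[0]

    by_loss = sorted(candidates, key=lambda c: c.p_loss)
    if by_loss[1].p_loss - by_loss[0].p_loss >= ms_margin:
        return by_loss[0]

    return max(candidates, key=lambda c: c.meld)


# Hypothetical recipients with similar MELD scores for one donor offer.
clear_survival = allocate([Candidate("A", 24, 92.0, 20.0), Candidate("B", 25, 88.0, 22.0)])
clear_loss = allocate([Candidate("A", 24, 90.0, 30.0), Candidate("B", 25, 89.0, 22.0)])
draw = allocate([Candidate("A", 24, 90.0, 25.0), Candidate("B", 26, 89.0, 23.0)])
```

In the three toy offers, the first is decided by the survival model, the second by the loss model, and the third falls back to MELD because neither difference reaches its threshold.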

RESULTS

Neural Network models (NN-CCR and NN-MS)

The NN-CCR methodology performs well in predicting the probability of graft survival for each D-R pair (90.79% [SD=2.86%]). In comparison with logistic regression models, the best NN-CCR model has a slight advantage in predicting graft survival, but a clear advantage in predicting graft loss. However, from a clinical point of view, a 22% ability of NN-CCR to predict (classify) graft loss is low (Figure 2 and Supplemental Digital Content).

NN-MS models have a lower performance than NN-CCR models in predicting the probability of graft survival for each D-R pair (72.41% [SD=3.05%] for the best model). However, the NN-MS methodology shows the best capacity for graft loss prediction, with 71.42% (SD=5.56%) for the best NN-MS model (Figure 2).

Prior to the development of the models, homogeneity tests across the 11 hospitals were performed to avoid biases. After performing the Friedman test (for the CCR model) and the Nemenyi test (for the MS model), significant differences could be detected only between two hospitals (the ones with the highest and lowest graft survival and the highest and lowest graft loss). Accordingly, as homogeneity was obtained in more than 90% of comparisons, the test was considered adequate.

The complex interaction between variables in the intelligent model

Different variables, not necessarily the same, were present in the CCR and MS models, whilst others were not significant in our ANN (Table 2). From a formal point of view, the formulae of both the CCR and MS models are depicted as well. As observed, the ANN models include almost all the variables, giving each a specific strength, whilst few variables are excluded from the calculations. It is important to remark that the probability of graft survival and the probability of graft loss are not complementary, and thus different variables with different values may impact the two models. Besides, a positive model (with a higher number of cases) is easier to predict than a negative one (with a lower number of cases). This is the reason why the number of variables interacting in the MS model is higher.

The rule-based ANN model in action: 2 examples

To better understand the results obtained from our ANN, five randomly selected recipients are chosen from the database. The rule-based system performs the allocation by combining the best results obtained by the NN-CCR and NN-MS models (prediction of graft survival and loss, respectively). Recipients with similar MELD score values (between 24 and 26) were chosen to avoid a selection bias that would make the system choose the lowest-MELD-score recipient. Two examples of how the ANN chooses the best D-R matching with both standard-criteria (Table 3A) and extended-criteria (Table 3B) donors are depicted in Table 3.

Validation with other scores of prediction of graft survival

Comparisons of predictive capability for MELD, DRI, D-MELD, SOFT, P-SOFT and BAR scores were performed. In the positive-survival model, NN-CCR had an AUROC of 0.8060, which was significantly higher (P=0.001) than that obtained by the other scores (Figure 3A). In the negative-loss model, NN-MS had an even higher AUROC value (AUROC=0.8215), thus significantly improving on the results (Figure 3B) from any of the other scores (P=0.001). Interestingly, in both prediction models of graft survival and graft loss, scores that combine donor and recipient factors (BAR, SOFT and D-MELD scores) had better AUROCs than isolated donor (DRI) or recipient (MELD) scores.
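For readers less familiar with the AUROC values compared above, the following minimal sketch shows one standard way to compute the area under the ROC curve for any score: as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation). It is not the software used in the study; the function name `auroc` and the toy data are illustrative assumptions.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: numeric predictions, higher meaning "more likely positive".
    labels: 1 for the event of interest, 0 otherwise.
    Ties between a positive and a negative case count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores (hypothetical) for four D-R pairs; label 1 marks the event.
perfect = auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
partial = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
uninformative = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

Under this definition, 0.5 corresponds to a score that cannot separate the two outcomes, and a value below 0.5 indicates that the score tends to rank the outcomes in the opposite direction on the data at hand.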


DISCUSSION

This study is a clear contribution to considering the potential role of ANN as a valuable tool for organ allocation in order to obtain the best benefit of survival. In the current scenario of graft scarcity and waiting list deaths, the absence of a definitive and objective system for liver-donor assignment is unacceptable.

Artificial neural networks are inherently complex computational tools that have been successfully used in biomedical models [20-24]. ANN are gaining acceptance in current medical evidence (240 manuscripts published in 2011). However, only 9 manuscripts have been reported regarding LT. Of them, one was published in 1994, when the MELD score and D-R matching were unknown concepts [25]. Another three were just methodological papers [24,26,27]. Four recent manuscripts used ANN to predict pre-transplant end-stage liver disease mortality [28-30] or post-transplant probability of acute rejection [31]. Only one manuscript has used ANN for D-R matching, but Haydon et al. used self-organizing maps (a simplified model of ANN) and collected data from 1993-2002 (pre-MELD era) for further validation [32].

The huge advantage that ANN may provide over other metrics is that they are not merely static formulas; they are methods of calculation that can be applied to every population, as they are trained and validated inside that population. Another advantage is that the more variables they have, the more effective they will be, thus letting us consider parameters that, in the context of a donor offer, may not be adequately calibrated. Although the use of ANN in biomedicine as an alternative to other classification methods is increasing, the greater ease of use of common statistics for medical staff may be a limitation. However, this should not lead simple multivariate models to prevail over ANN, which are more rigorous and validated tools [33,34].

The likelihood that D-R allocation may be guided or assisted by ANN is unpredictable but also inspiring and provocative, in order to avoid the four common limitations shared by manuscripts in the current literature that analyze models of liver post-transplant outcome: first, none of them gives a proper global view of donor and recipient status, as just a few variables are considered; second, variables from some countries may not be useful in others, decreasing the potential of these models to be exported; third, common statistics are based only on positive models that achieve acceptable prediction capabilities for graft survival, but not for graft loss, a concept that is extremely important in artificial intelligence; and fourth, some of these scores have not been validated in different populations.

A key point for the proposal of ANN as a reliable tool for D-R matching processes is the fact that ANN do not behave as static scores; they are dynamic. This means that for a specific population the ANN will learn from itself, improving its results case by case. In this sense, we offer the potential of a methodology that, in the scenario of a future multicenter validation, would be useful for all countries, with their own peculiarities. Probably the future will rest on complex software available to surgeons and clinicians that may guide the best decision to be taken in the context of a donor offer.

Currently available predictive models have different limitations: DRI [3,35] does not consider any contribution from recipient factors. Considering recipient status, the MELD score [36,37] is the basis of allocation policies [38] but is a poor predictor of 3-month mortality following LT [39]. In an interesting attempt at donor-recipient combination, Halldorson et al. [16] and Rana et al. [6] reported the D-MELD and SOFT scores.
Unfortunately, neither score was designed for donor-recipient matching, and neither has been validated, probably owing to the appearance of several confounding factors. Survival benefit as a function of candidate disease severity and donor quality is a modern and interesting concept as an alternative to current urgency-based allocation [5] in terms of efficiency and equity [40]. However, as suggested by Schaubel et al. [5], unmeasured recipient and donor characteristics could potentially confound the results.

Based on ANN, for each D-R pair two probabilities have been analyzed: the probability of having a surviving organ (accuracy), and the probability of having a non-surviving organ (minimum sensitivity). Systems based on logistic regression can adequately classify the majority class (favorable event), but their ability to predict the minority class (adverse event) is poor [41]. DRI, MELD, D-MELD, SOFT and BAR scores are all based on logistic regression analysis and accept the "linearity" between donor, graft and/or recipient factors and the survival function. Unfortunately, as in many complex decision-making processes, comprehensive biological phenomena follow a nonlinear pattern, and linear regression analysis is too simple an approach.

In this report, the best model for predicting the probability of graft survival for each D-R pair reaches a 90.79% Correct Classification Rate in the generalization set. To complement this result, NN-MS reaches a 71.42% level for graft loss prediction. The power obtained from the combination of these two models is much higher than that obtained by other tools, and it handles a net of connections too complex for the human mind to process. Obtaining two models committed to maximizing the CCR and MS measures allows us to consider the information they provide as a whole to make a D-R match. In all cases, the probabilities of belonging to the graft-survival and graft-loss groups are greatly improved by ANN, compared with common statistics. It must be remarked that the ANN is a linear combination of sigmoid functions, whilst the rest are linear models in the independent (or input) variables. In order to give strength to our analysis, MELD, DRI, D-MELD, SOFT and BAR scores were obtained from our database and AUROCs were calculated. Some of them achieved acceptable predictive areas for the positive event, but they clearly dropped for the negative event.
Scores that only consider isolated donor or recipient variables (MELD and DRI) had lower areas than those combining both kinds of variables (SOFT, P-SOFT and BAR), but these were always lower than those of ANN. In this sense, the overlapping of variables and the fact that some of them are all-or-nothing scores are pitfalls that ANN overcome due to their mathematical strength [42].

The main limitation inherent to our research is that data have been obtained from observational databases. Although the number of patients collected is high, and the period selected favors homogeneity (just two years from a clear post-MELD era, 2007-08), a prospective or external validation would be needed to clearly test the accuracy of ANN in other populations. It could also be speculated that with more than 55 variables and only 1000 patients, these models are at high risk of overfitting; however, this may be true for conventional statistics but not for neural networks, which are advanced computational tools with a self-learning process that addresses this problem. Another limitation is that our study has a limited 3-month graft survival end-point, and other end-points and variables could perhaps have been included, such as patient survival, 6- and 12-month analyses, immunosuppression, infections, or intent-to-treat survival according to risk on the waiting list. These may be of interest, but the impact of pure D-R matching on outcome may be biased in mid- and long-term analyses. Similarly, post-transplant variables may also be of interest, but they would be contrary to our aim of obtaining a pure pre-transplant allocation model for D-R matching. We are also aware that most of the scores compared here were not designed to predict 3-month graft outcomes; however, they have been validated in several populations and, although interesting, their application is limited, as they do not offer a donor-recipient combination that fulfills the two main rules we have considered in our study: improved post-transplant survival, with a MELD-driven decision in case of a draw.
It is our aim to lay the groundwork for a potential external validation with non-Spanish centers and for a further prospective, parallel computer-guided model that could simulate the results obtained from our ANN and compare them with real-time allocation.

Our manuscript provides robust, and probably the highest, evidence on the most controversial issue in transplantation: how can we achieve the best post-transplant outcomes by analyzing the global scenario of donor and recipient variables? This methodology combines the best of the MELD score with the best of the survival-benefit score by using ANN, resulting in a more objective approach in terms of equity, utility and efficiency. Furthermore, our model avoids the individualistic vision of survival-benefit theories, providing the best survival for the whole group of potential recipients with a similar MELD score. To make it stronger and more consistent with current knowledge, in case of similar results, the rule-based system allocates the organ to the patient with the highest MELD score.

Complex mathematical calculations and computational intelligence nowadays guide several procedures of everyday life, improving on simple statistical calculations. So many complex interactions should not be calculated by humans, as the potential to fail is high. At a time of donor scarcity, we should look not only for the lowest rate of death on the waiting list, but also for the optimal post-transplant outcome. Our analysis does not give a simple formula; it offers a methodology that can be exported to every liver transplant program worldwide, leading to improved outcomes and optimized D-R matching. The ANN method may be useful for predicting 3-month outcomes; considering the limitations of our study and with the perspective of future external validation, ANN could become a potentially useful area of research and perhaps the best method to combine donor, recipient and transplant variables in order to obtain optimal survival.

REFERENCES

[1] Busuttil RW, Tanaka K. The utility of marginal donors in liver transplantation. Liver Transplant 2003;9:651-663.
[2] Briceño J, Solorzano G, Pera C. A proposal for scoring marginal liver grafts. Transpl Int 2000;13(suppl 1):S249-S252.
[3] Feng S, Goodrich NP, Bragg-Gresham JL, Dykstra DM, Punch JD, DebRoy MA, et al. Characteristics associated with liver graft failure: the concept of a donor risk index. Am J Transplant 2006;6:783-790.
[4] Kamath PS, Kim WR. The Model for End-Stage Liver Disease. Hepatology 2007;45:797-805.
[5] Schaubel DE, Sima CS, Goodrich NP, Feng S, Merion RM. The survival benefit of deceased donor liver transplantation as a function of candidate disease severity and donor quality. Am J Transplant 2008;8:419-425.
[6] Rana A, Hardy MA, Halazun KJ, Woodland DC, Ratner LE, Samstein B, et al. Survival outcomes following liver transplantation (SOFT) score: a novel method to predict patient survival following liver transplantation. Am J Transplant 2008;8:2537-2546.
[7] Dutkowski P, Oberkofler CE, Slankamenac K, Puhan MA, Schadde E, Müllhaupt B, et al. Are there better guidelines for allocation in liver transplantation? A novel score targeting justice and utility in the model for end-stage liver disease era. Ann Surg. 2011 Nov;254(5):745-53.
[8] Drake. Use of remote sensing and ANN in prediction of pets in Queensland. Remote Sensing of Environment, 2001, 12(4):32-35.
[9] Mehdi Khashei, Mehdi Bijari. An artificial neural network (p, d, q) model for time series forecasting. Expert Systems with Applications, Volume 37, Issue 1, January 2010, Pages 479-489.
[10] Balabin RM, Lomakina EI. Neural network approach to quantum-chemistry data: accurate prediction of density functional theory energies. J Chem Phys. 2009 Aug 21;131(7):074104.
[11] Grzeszczuk R, Terzopoulos D, Hinton G. NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models. Computer Graphics (SIGGRAPH '98 Proceedings), 1998, pp. 9-20.
[12] Condon A. Molecular programming: DNA and the brain. Nature. 2011 Jul 20;475(7356):304-5.
[13] Arka Ghosh. Comparative study of Financial Time Series Prediction by Artificial Neural Network with Gradient Descent Learning. International Journal of Scientific & Engineering Research, ISSN 2229-5518, Volume 3, Issue 1, January 2012.
[14] Craven M, Shavlik J. Using neural networks for data mining. Future Gener. Comput. Syst. 1997;13:211-229.
[15] Androutsopoulos I, Koutsias J, Chandrinos K, Paliouras G, Spyropoulos C. An Evaluation of naïve Bayesian Anti-Spam Filtering. In Proceedings of the workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning, pp. 9-17, Barcelona, Spain, 2000.
[16] Halldorson JB, Bakthavatsalam R, Fix O, Reyes JD, Perkins JD. D-MELD, a simple predictor of post liver transplant mortality for optimization of donor/recipient matching. Am J Transplant 2009;9:318-326.
[17] Fernández JC, Martínez FJ, Hervás-Martínez C, Gutierrez PA. Sensitivity versus Accuracy in Multi-class Problems using Memetic Pareto Evolutionary Neural Networks. IEEE Transaction on Neural Networks 2010;21:750-770.
[18] Fernández JC, Hervás-Martínez C, Martínez-Estudillo FJ, Gutierrez PA. Memetic Pareto Evolutionary Artificial Neural Networks to determine growth/no-growth in predictive biology. Applied Soft Computing 2011;11:534-50.
[19] Cruz-Ramírez M, Sánchez-Monedero J, Fernández-Navarro F, Fernández JC, Hervás-Martínez C. Hybrid Pareto differential evolutionary artificial neural networks to determined growth multi-classes in predictive microbiology. In 23rd International Conference on Industrial and Engineering and Other Applications of Applied Intelligent Systems (IEAAIE2010), 646-655, 2010.
[20] Oztekin A, Delen D, Kong ZJ. Predicting the graft survival for heart-lung transplantation patients: An integrated data mining methodology. International Journal of Medical Informatics 2009;78:e84-e96.
[21] Matis S, Doyle H, Marino I, Mural R, Uberbacheret E. Use of Neural Networks for Prediction of Graft Failure following Liver Transplantation. Eighth IEEE Symposium on Computer-Based Medical Systems, pp. 133-140.
[22] Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Intern Med 1991;115:843-848.
[23] Sharpe PK, Solberg HE, Rootwelt K, Yearworth M. Artificial neural networks in diagnosis of thyroid function from in vitro laboratory tests. Clin Chem 1993;39:2248-2253.
[24] Dvorchik I, Subotin M, Marsh W, McMichael J, Fung JJ. Performance of multi-layer feedforward neural networks to predict liver transplantation outcome. Methods Inf Med 1996;35:12-18.
[25] Doyle HR, Dvorchik I, Mitchell S, Marino IR, Ebert FH, McMichael J, Fung JJ. Predicting outcomes after liver transplantation. A connectionist approach. Ann Surg. 1994;219:408-15.
[26] Hoot N, Aronsky D. Using Bayesian networks to predict survival of liver transplant patients. AMIA Annu Symp Proc. 2005:345-9.
[27] Parmanto B, Doyle HR. Recurrent neural networks for predicting outcomes after liver transplantation: representing temporal sequence of clinical observations. Methods Inf Med. 2001;40(5):386-91.
[28] Cucchetti A, Vivarelli M, Heaton ND, Phillips S, Piscaglia F, Bolondi L, et al. Artificial neural network is superior to MELD in predicting mortality of patients with end-stage liver disease. Gut. 2007;56(2):253-8.
[29] Ghoshal UC, Das A. Models for prediction of mortality from cirrhosis with special reference to artificial neural network: a critical review. Hepatol Int. 2008 Mar;2(1):31-8.
[30] Banerjee R, Das A, Ghoshal UC, Sinha M. Predicting mortality in patients with cirrhosis of liver with application of neural network technology. J Gastroenterol Hepatol. 2003 Sep;18(9):1054-60.
[31] Hughes VF, Melvin DG, Niranjan M, Alexander GA, Trull AK. Clinical validation of an artificial neural network trained to identify acute allograft rejection in liver transplant recipients. Liver Transpl. 2001;7(6):496-503.

22 [32]Haydon GH, Hiltunen Y, Lucey MR, Collett D, Gunson B, Murphy N, et al. Self-organizing maps can determine outcome and match recipients and donors at orthotopic liver transplantation. Transplantation. 2005 27;79(2):213-8. [33]

Baxt

WG.

Application

of

artificial

neural

network

to

clinical

medicine.

Lancet1995;346:1135–8. [34] Forsstrom JJ, Dalton KJ. Artificial neural networks for decision support in clinical medicine. Ann Med 1995;27:509–17. [35] Durand F, Renz JF, Alkofer B, Burra P, Clavien PA, Porte RJ, et al. Report of the Paris Consensus Meeting on expanded criteria donor in liver transplantation. Liver Transplantation 2008; 14:1694-1707. [36]Malinchoc M, Kamath PS, Gordon FD, Peine CJ, Rank J, ter Borg PC. A model to predict poor survival in patients undergoing transjugularintrahepaticportosystemic shunts. Hepatology 2000;31:864-871. [37]Kamath PS, Wiesner RH, Malinchoc M, Kremers W, Therneau TM, Kosberg CL, et al. A model to predict survival in patients with end-stage liver disease.Hepatology 2001; 33: 46470. [38]Wiesner R, Edwards E, Freeman R, Harper A, Kim R, Kamath P, et al. Model for end-stage liver disease (MELD) and allocation of donor livers. Gastroenterology 2003; 124: 91–96. [39] Desai NM, Mange KC, Crawford MD, Abt PL, Frank AM, Markmann JW, et al. Predicting outcome after liver transplantation: Utility of the model for end-stage liver disease and a newly derived discrimination function. Transplantation 2004; 77: 99–106.

23 [40]Schaubel DE, Guidinger MK, Biggins SW, Kalbfleisch JD, Pomfret EA, Sharma P, Merion RM. Survival Benefit-Based Deceased-Donor Liver Allocation. Am J Transplant 2009; 9 (part 2): 970-981. [41]Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002; 35: 352–359. [42]Briceño J, Ciria R, de la Mata M. Donor-recipient matching: Myths and realities. J Hepatol. 2013;58(4):811-20

TABLE 1. Baseline donor and recipient variables included in the study. Quantitative values are expressed as mean ± standard deviation and range. Frequencies are expressed as percentages. Total number of valid cases: 1003.

Variable                                              Mean ± SD       Range       N (%)

RECIPIENT VARIABLES
Age (x1)                                              52.98 ± 9.86    18-73
Diabetes mellitus (x2)                                                            199 (19.8%)
Gender (male) (x3)                                                                724 (72.2%)
BMI (x4)                                              26.759 ± 4.44   10.0-60.3
Indication for transplant
  Elective (x5)                                                                   870 (86.7%)
  Preferential (x6)                                                               42 (4.1%)
  Urgent (x7)                                                                     91 (9.0%)
Hypertension (x8)                                                                 134 (13.4%)
Dialysis prior to transplantation (x9)                                            25 (2.5%)
Main diagnosis
  HCV (x10)                                                                       329 (32.8%)
  Alcohol (x11)                                                                   298 (29.7%)
  HBV (x12)                                                                       98 (9.8%)
  Fulminant hepatic failure (x13)                                                 65 (6.5%)
  Primary biliary cirrhosis (x14)                                                 15 (1.5%)
  Primary sclerosing cholangitis (x15)                                            45 (4.5%)
  Others (x16)                                                                    153 (15.3%)
Hepatocellular carcinoma (x17)                                                    364 (36.3%)
Portal vein thrombosis
  No (x18)                                                                        872 (86.9%)
  Partial (x19)                                                                   102 (10.2%)
  Complete (x20)                                                                  29 (2.9%)
Waiting list timing (x21)                             183.42 ± 198.8  0-1978
MELD (inclusion) (x22)                                16.51 ± 6.567   1-48
MELD (at transplant) (x23)                            17.35 ± 7.01    2-57
TIPS (x24)                                                                        47 (4.7%)
Hepatorenal syndrome (x25)                                                        78 (7.8%)
Upper abdominal surgery prior to transplantation (x26)                            186 (18.5%)

DONOR VARIABLES
Age (x27)                                             54.09 ± 17.28   14-86
Sex (male) (x28)                                                                  585 (58.3%)
BMI (x29)                                             26.425 ± 4.02   17.2-49.2
Cause of death
  Brain trauma (x30)                                                              228 (22.7%)
  Cerebral vascular accident (x31)                                                677 (67.5%)
  Deceased donor after cardiac death (x32)                                        6 (0.6%)
  Others (x33)                                                                    92 (9.2%)
Diabetes mellitus (x34)                                                           113 (11.3%)
Hypertension (x35)                                                                372 (37.1%)
ITU stay (days) (x36)                                 3.34 ± 4.18     0-57
Hypotension (< 60 mmHg, >1 hour) (x37)                                            223 (22.2%)
Inotropic drug use (x38)                                                          844 (84.1%)
Creatinine (x39)                                      0.946 ± 0.52    0.1-5.9
Sodium plasma level (x40)                             148.18 ± 9.85   98-188
AST (x41)                                             54.00 ± 89.55   1-1228
ALT (x42)                                             50.20 ± 94.73   4-1400
Total bilirubin (x43)                                 0.754 ± 0.47    0.1-4.2
Hepatitis B (core Ab positive) / anti-HBc (x44)                                   124 (12.4%)
Hepatitis C (positive serology) (x45)                                             16 (1.6%)

RETRIEVAL
PRESERVATION SOLUTION
  Celsior (x46)                                                                   290 (28.9%)
  Wisconsin (x47)                                                                 711 (70.8%)
  Others (x48)                                                                    2 (0.1%)
DONATION AFTER CARDIAC DEATH (x49)                                                20 (1.9%)
MULTIORGAN (x50)                                                                  735 (73.2%)
SPLIT (x51)                                                                       4 (0.3%)

TRANSPLANT
COMBINED (x52)                                                                    24 (2.3%)
WHOLE/PARTIAL GRAFT (x53)                                                         10 (0.9%)
COLD ISCHAEMIA TIME
  < 6h (x54)                                                                      609 (60.7%)
  6-12h (x55)                                                                     381 (37.9%)
  > 12h (x56)                                                                     13 (1.2%)
ABO COMPATIBILITY (x57)                                                           988 (98.5%)

TABLE 2. Variables included in the ANN for CCR and MS models. The dataset showed two complex sets of variables for both graft-survival-CCR and graft-loss-MS models.

VARIABLES INCLUDED IN THE CCR AND MS MODELS: FORMULAE

CCR:

-0.04 + 2.57 * (1/(1+e^-(-10 + 3.36 * (x1) - 9.99 * (x2) + 0.88 * (x3) - 10.0 * (x4) + 9.99 * (x5) - 10.0 * (x6) + 10.0 * (x7) + 10.0 * (x9) - 2.85 * (x10) + 9.07 * (x12) - 8.28 * (x13) + 8.29 * (x14) - 10.0 * (x15) - 9.50 * (x16) + 9.52 * (x17) - 9.95 * (x18) - 9.09 * (x19) + 10.0 * (x20) - 9.59 * (x21) - 4.43 * (x22) - 8.28 * (x23) + 7.89 * (x24) - 10.0 * (x25) + 6.92 * (x26) - 2.09 * (x27) + 7.68 * (x29) + 10.0 * (x31) + 0.31 * (x32) + 2.53 * (x33) + 8.43 * (x34) + 4.99 * (x35) + 10.0 * (x36) + 5.16 * (x37) + 10.0 * (x38) - 10.0 * (x39) - 10.0 * (x40) + 10.0 * (x41) + 10.0 * (x42) + 5.63 * (x43) + 10.0 * (x44) - 3.08 * (x45) - 7.98 * (x46) + 10.0 * (x47) - 10.0 * (x48) - 10.0 * (x49) + 10.0 * (x50) + 3.11 * (x51) - 8.79 * (x52) - 9.92 * (x53) + 9.97 * (x54) - 10.0 * (x55) + 9.63 * (x56) + 9.80 * (x57))))

MS:

2.41 - 4.06 * (1/(1+e^-(1.53 - 9.71 * (x1) + 10.0 * (x2) + 9.86 * (x3) - 9.47 * (x4) - 7.44 * (x5) - 6.00 * (x6) + 10.0 * (x7) - 10.0 * (x8) - 2.38 * (x9) - 7.89 * (x10) - 1.70 * (x11) - 10.0 * (x12) + 1.02 * (x13) - 1.86 * (x14) + 8.19 * (x15) + 5.66 * (x16) - 9.34 * (x17) + 10.0 * (x18) - 4.52 * (x19) - 10.0 * (x20) + 10.0 * (x21) + 10.0 * (x22) + 10.0 * (x23) + 3.40 * (x24) + 5.07 * (x25) - 9.58 * (x26) - 8.67 * (x27) + 1.82 * (x28) + 2.76 * (x29) + 9.92 * (x30) - 4.85 * (x31) + 1.97 * (x32) + 10.0 * (x33) - 10.0 * (x34) + 10.0 * (x35) + 2.49 * (x36) + 10.0 * (x37) - 9.27 * (x38) - 10.0 * (x39) + 10.0 * (x40) + 0.56 * (x41) - 1.25 * (x42) - 2.08 * (x43) - 0.63 * (x44) + 5.13 * (x45) + 10.0 * (x46) - 4.25 * (x47) + 8.32 * (x48) - 5.12 * (x49) - 10.0 * (x50) - 1.65 * (x51) + 7.42 * (x52) - 9.57 * (x53) - 0.62 * (x54) + 3.25 * (x55) - 2.44 * (x56) + 10.0 * (x57))))
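Both formulae in Table 2 share one functional form: an output offset plus an output weight multiplied by a logistic (sigmoid) function of a weighted sum of the variables x1 to x57. The following is a minimal sketch of how such an expression could be evaluated. The function names are hypothetical, only the first three CCR coefficients are copied in (so the result is not the full model output), and the scaling of the input variables is an assumption because it is not stated in this section.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), as used in the Table 2 formulae."""
    return 1.0 / (1.0 + math.exp(-z))


def model_output(out_bias, out_weight, hidden_bias, weights, x):
    """Evaluate out_bias + out_weight * sigmoid(hidden_bias + sum_i w_i * x_i).

    weights and x are dicts keyed by variable number (1 for x1, 2 for x2, ...).
    Variables missing from x are treated as 0 (an assumption of this sketch).
    """
    z = hidden_bias + sum(w * x.get(i, 0.0) for i, w in weights.items())
    return out_bias + out_weight * sigmoid(z)


# First three CCR coefficients from Table 2 only (x1, x2, x3).
# Truncated for illustration; the full formula has many more terms.
CCR_WEIGHTS_TRUNCATED = {1: 3.36, 2: -9.99, 3: 0.88}

# Hypothetical, already-scaled inputs (the scaling is an assumption).
x_example = {1: 0.5, 2: 0.0, 3: 1.0}

value = model_output(-0.04, 2.57, -10.0, CCR_WEIGHTS_TRUNCATED, x_example)
print(round(value, 6))
```

Keeping the coefficients in a dictionary keyed by variable number makes it straightforward to transcribe the remaining terms of either formula without changing the evaluation code.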

TABLE 3. The ANN in action. Two examples (A. standard criteria donors; B. extended criteria donors) of how the computer chooses the best D-R matching to obtain the best probability of graft survival (NN-CCR model, above) and the lowest probability of graft loss (NN-MS model, below). After the application of the rule-based system, a recipient is selected for a specific donor.

A. System Application Example I. STANDARD CRITERIA DONORS*

GRAFT SURVIVAL PROBABILITY IN % BY THE BEST NN-CCR MODEL

CCR        D1     D2     D3     D4     D5     D6     D7     D8     D9    D10
R1      92.62  49.00  92.62  92.60  49.00  49.00  92.62  92.62  92.62  92.60
R2      92.62  50.03  92.62  92.62  49.00  49.00  92.62  92.62  92.62  92.62
R3      92.62  50.17  92.62  92.62  49.00  49.00  92.62  92.62  92.62  92.62
R4      92.62  49.00  49.01  49.00  49.00  49.00  49.00  92.62  92.62  49.00
R5      92.62  92.62  92.62  92.62  49.00  92.62  92.62  92.62  92.62  92.62

GRAFT LOSS PROBABILITY IN % BY THE BEST NN-MS MODEL

MS         D1     D2     D3     D4     D5     D6     D7     D8     D9    D10
R1      83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90
R2      81.90  83.90  83.90  83.90  83.90  83.90  83.90   8.21   8.32  83.90
R3      83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90
R4       8.21  83.90  20.39   8.21  83.90  83.90   8.28   8.21   8.21   8.21
R5       8.21  83.89   8.59   8.21  83.90  83.90   8.21   8.21   8.21   8.21

RULE-BASED SYSTEM OUTPUTS: SELECTION OF THE RECIPIENT

           D1     D2     D3     D4     D5     D6     D7     D8     D9    D10
           R4     R5     R5     R5     R1     R5     R5     R2     R2     R5

Best probability in blue font and the probabilities with no significant differences in red font.

Depending on the probabilities obtained with the two models, the rule-based system software is used to make the allocation decision. For example, with 5 recipients with MELD 24 to 26, Donor 1 shows the same probability of functioning graft for all of them, but only recipients 4 and 5 show the lowest probability of non-functioning graft. The rule-based system software selects recipient 4, because it has a higher MELD. Donor 9 with the same recipients shows the same probability of functioning graft, but only recipients 2, 4 and 5 show the lowest probability of non-functioning graft. The rule-based system software selects recipient 2, because it has a higher MELD.

B. System Application Example II. EXTENDED CRITERIA DONORS**

GRAFT SURVIVAL PROBABILITY IN % BY THE BEST NN-CCR MODEL

CCR       D11    D12    D13    D14    D15    D16    D17    D18    D19    D20
R1      92.52  92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  49.34
R2      92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  91.21
R3      92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  91.39
R4      49.00  92.62  87.86  92.62  92.62  92.62  92.62  49.00  92.62  49.00
R5      92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62  92.62

GRAFT LOSS PROBABILITY IN % BY THE BEST NN-MS MODEL

MS        D11    D12    D13    D14    D15    D16    D17    D18    D19    D20
R1      49.98   8.21  83.90  70.12  83.90  83.87  83.90  83.90  83.90  83.90
R2       8.21   8.21   8.31   8.21  83.90   8.21  83.90  83.90   9.37  83.90
R3      83.90   8.29  83.90  83.90  83.90  83.90  83.90  83.90  83.90  83.90
R4       8.21   8.21   8.21   8.21  83.88   8.21   8.41   8.21   8.21  83.90
R5       8.21   8.21   8.21   8.21  83.33   8.21   8.22   8.21   8.21  83.90

RULE-BASED SYSTEM OUTPUTS: SELECTION OF THE RECIPIENT

          D11    D12    D13    D14    D15    D16    D17    D18    D19    D20
           R2     R1     R2     R2     R1     R2     R4     R5     R2     R2

Best probability in blue font and the probabilities with no significant differences in red font.

Donor 11 shows a similar probability of functioning graft with recipients 1, 2, 3 and 5, but only recipients 2, 4 and 5 show the lowest probability of non-functioning graft. The rule-based system software selects recipient 2, because it has a higher MELD. Donor 18 shows the same probability of functioning graft with recipients 1, 2, 3 and 5, but only recipients 4 and 5 show the lowest probability of non-functioning graft. The rule-based system software selects recipient 5, even though it has the lowest MELD.
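The selections discussed for Table 3 can be read as a three-step rule: keep the recipients whose graft survival probability is close to the best, keep those whose graft loss probability is close to the lowest, and among the recipients in both groups choose the one with the highest MELD. The sketch below illustrates that reading only; it is not the authors' rule-based software. The 1.5 percentage-point tolerance, the fallback when the two groups do not overlap, the assumption that recipients R1 to R5 are listed in decreasing MELD order, and the function name are all hypothetical.

```python
def select_recipient(ccr, ms, tolerance=1.5):
    """Return the 1-based index of the recipient chosen for one donor.

    ccr: graft survival probabilities (%) per recipient (NN-CCR model).
    ms:  graft loss probabilities (%) per recipient (NN-MS model).
    Both lists are assumed to be ordered by decreasing MELD, so a lower
    index means a higher MELD (assumption of this sketch).
    tolerance: hypothetical margin, in percentage points, within which two
    probabilities are treated as not significantly different.
    """
    best_survival = max(ccr)
    lowest_loss = min(ms)
    good_survival = {i for i, p in enumerate(ccr) if best_survival - p <= tolerance}
    low_loss = {i for i, p in enumerate(ms) if p - lowest_loss <= tolerance}
    # Hypothetical fallback: if no recipient satisfies both criteria,
    # decide on graft survival alone.
    candidates = (good_survival & low_loss) or good_survival
    return min(candidates) + 1  # highest MELD among the candidates


# Donor 1 and Donor 18 columns from Table 3.
donor_1 = select_recipient([92.62] * 5, [83.90, 81.90, 83.90, 8.21, 8.21])
donor_18 = select_recipient([92.62, 92.62, 92.62, 49.00, 92.62],
                            [83.90, 83.90, 83.90, 8.21, 8.21])
print(donor_1, donor_18)
```

Under these assumptions the sketch returns recipient 4 for Donor 1 and recipient 5 for Donor 18, in line with the selections described for those donors.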

FIGURE LEGENDS

FIGURE 1. Methods in the Neural Network Process. General structure and phases of the artificial neural network (ANN)-based research. A: Architecture with donor (D) and recipient (R) variables interacting in nodes (O) with different strengths (line widths) and being modified by hidden variables (Hidden) to obtain the main clinical endpoints (survival). B: Schedule of the research with its two main arms: first, development of a neural network and training-testing for the CCR and MS models, and comparison versus common statistics in graft survival prediction (left); second, comparison of the prediction capability of the neural network models with current validated scores after another process of training-testing (right). The training-testing process randomly divides the study population into 10 groups, of which 9 are used for training the neural network and the remaining one is used for testing. After that, the testing group is returned to the whole dataset and another randomization is performed, until 10 randomizations have been completed. This process minimizes bias, in contrast to common statistical tools.

FIGURE 2. Comparison of the prediction capabilities of the 3-month graft survival (NN-CCR) and graft loss (NN-MS) neural network models versus common statistics (simple and multiple regression analyses). Common statistical tools show a clear lack of prediction capability, which is improved upon by artificial intelligence computational analyses.

FIGURE 3. Comparison of AUROCs for ANN versus MELD, D-MELD, DRI, P-SOFT, SOFT and BAR scores. As observed, the AUROC is significantly higher for the ANN than for the other scores, both with NN-CCR for the positive (graft survival) model (A) and with NN-MS for the negative (graft loss) model (B).
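The training-testing procedure described in the Figure 1 legend (10 random groups, 9 for training and 1 for testing, repeated over 10 randomizations) can be sketched as follows. This is one possible reading of the legend, not the authors' code; the function name, the fixed seed and the choice of the last group as the testing group are assumptions made for illustration.

```python
import random


def repeated_random_splits(n_cases, n_groups=10, n_repeats=10, seed=0):
    """Build n_repeats train/test splits of the case indices 0..n_cases-1.

    Each repeat reshuffles the whole dataset, divides it into n_groups
    groups of near-equal size, uses one group for testing and the other
    n_groups - 1 groups for training.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_cases))
        rng.shuffle(indices)
        groups = [indices[g::n_groups] for g in range(n_groups)]
        test = groups[-1]  # the last group is held out (arbitrary choice)
        train = [i for group in groups[:-1] for i in group]
        splits.append((train, test))
    return splits


splits = repeated_random_splits(1003)  # 1003 valid cases, as in Table 1
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Because the held-out group is returned to the pool before each new randomization, a given case may be tested more than once or not at all across the 10 repeats, which distinguishes this reading from a standard 10-fold cross-validation.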

SUPPLEMENTAL DIGITAL CONTENT: Comparison of NN-CCR and NN-MS with logistic regression and other deterministic machine learning methods by Correct Classification Rate and Minimum Sensitivity in the generalization set, C(%) and MS(%), respectively. The NN-CCR methodology performs well in predicting the percentage of probability of graft survival for each D-R pair. However, NN-CCR, logistic regression models (MLogistic and SLogistic) and other computational intelligence models (C4.5, LMT and SVM) have a poor ability to predict (classify) graft loss. NN-MS models have the best capacity for graft loss prediction, with 71.42% for the best NN-MS model.
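The two measures used in this comparison can be illustrated with a short sketch: the Correct Classification Rate is the percentage of cases classified correctly, and the Minimum Sensitivity is the lowest per-class sensitivity. The function name and the toy labels below are hypothetical and are not study data.

```python
def ccr_and_ms(y_true, y_pred):
    """Return (Correct Classification Rate, Minimum Sensitivity), both in %.

    CCR: percentage of all cases whose predicted class equals the true class.
    MS:  minimum over classes of the percentage of that class's cases
         that were predicted correctly (per-class sensitivity).
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    ccr = 100.0 * correct / len(y_true)
    sensitivities = []
    for cls in sorted(set(y_true)):
        n_cls = sum(t == cls for t in y_true)
        hits = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        sensitivities.append(100.0 * hits / n_cls)
    return ccr, min(sensitivities)


# Toy example: 8 surviving grafts (1) and 2 graft losses (0).
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 8 + [1, 0]  # one of the two graft losses is missed
ccr, ms = ccr_and_ms(y_true, y_pred)
print(ccr, ms)
```

In this toy example the overall rate is 90% while the minimum sensitivity is only 50%, which shows why a classifier can look accurate overall and still classify the minority class (here, graft loss) poorly.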
