
Computers in Biology and Medicine journal homepage: www.elsevier.com/locate/cbm

Searching therapeutic agents for treatment of Alzheimer disease using the Monte Carlo method

Mariya A. Toropova a, Andrey A. Toropov b,*, Ivan Raška Jr. c, Mária Rašková c

a Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via L. Mangiagalli, 25, 20133 Milan, Italy
b IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, Milano 20156, Italy
c 3rd Department of Medicine - Department of Endocrinology and Metabolism, First Faculty of Medicine, Charles University in Prague and General University Hospital in Prague, U Nemocnice 1, 12808 Prague 2, Czech Republic

Article info

Article history: Received 9 May 2015; Accepted 20 June 2015

Keywords: QSAR; Gamma-secretase inhibitor; Monte Carlo method; CORAL software; OECD principles

Abstract

Quantitative structure–activity relationships (QSARs) for the pIC50 (binding affinity) of gamma-secretase inhibitors can be constructed with the Monte Carlo method using CORAL software (http://www.insilico.eu/coral). The considerable influence of the presence of rings of various types with respect to the above endpoint has been detected. The mechanistic interpretation and the domain of applicability of the QSARs are discussed. Methods to select new potential gamma-secretase inhibitors are suggested. © 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Alzheimer's disease is a neurodegenerative disorder of the central nervous system accompanied by degradation of cognitive abilities and memory deterioration, together with a variety of neuropsychiatric symptoms, behavioral disturbances, and progressive impairment of daily life activities. Current pharmacotherapies are restricted to symptomatic interventions and do not prevent progressive neuronal degeneration. Therefore, new therapeutic strategies are needed to intervene in these progressive pathological processes [1].

Two major pathological hallmarks characterize Alzheimer's disease: intracellular neurofibrillary tangles and extracellular amyloid plaques. The amyloid plaque is mainly composed of an aggregated form of the 40–42 residue amyloid β-peptide (Aβ). The accumulation and deposition of Aβ eventually lead to neuronal damage and cell death. Reduction of Aβ by inhibition of γ-secretase may prevent these neurotoxic events, representing an attractive strategy to reduce the probability of Alzheimer's disease [2]. Gamma-secretase inhibitors are possible therapeutic agents for treatment of Alzheimer's disease and cancer [2]. The measure of the therapeutic potential of different gamma-secretase inhibitors is their binding affinity [3].

There have been a number of attempts to build a model for this endpoint (binding affinity) by means of various approaches. For instance, in work [2], partial least squares (PLS) regression and neural networks (NN) were utilized to build a model for the endpoint. The aim of the present work is to develop quantitative structure–activity relationships (QSARs) for the above-mentioned endpoint using the CORAL software [4]. The CORAL software is a tool for building a model of an arbitrary endpoint using the Monte Carlo technique. A comparison of the predictive ability of the above-mentioned approaches (PLS and NN [2]) with that of the CORAL models can be interesting and useful from both theoretical and practical points of view.

* Correspondence to: Laboratory of Environmental Chemistry and Toxicology, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy. Tel.: +39 02 39014595; fax: +39 02 39014735. E-mail address: [email protected] (A.A. Toropov).

http://dx.doi.org/10.1016/j.compbiomed.2015.06.019
0010-4825/© 2015 Elsevier Ltd. All rights reserved.

2. Method

2.1. Data

The binding affinity data (IC50, nM, converted into the negative decimal logarithm, $\mathrm{pIC}_{50} = -\log_{10}\mathrm{IC}_{50}$) of 233 gamma-secretase inhibitors and their simplified molecular input-line entry system (SMILES) [4] representations were taken from the literature [2,5]. Three random splits of these 233 inhibitors into training (≈60%), calibration (≈20%), and validation (≈20%) sets were generated and examined in this work. The training set plays the role of the builder of a model; the calibration set plays the role of a preliminary critic of the model; and the validation set is the final estimator of the model.
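The data preparation described above can be sketched as follows. This is a minimal illustration, not the procedure used by the CORAL software: the function names `pic50_from_nm` and `random_split` are hypothetical, the conversion assumes that pIC50 refers to IC50 expressed in mol/L (so an IC50 given in nM is shifted by 9 log units), and the exact 60/20/20 fractions and the seeds are assumptions (the splits reported in this work only approximate these fractions).

```python
import math
import random


def pic50_from_nm(ic50_nm):
    """Convert IC50 in nM to pIC50, assuming pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def random_split(items, seed, fractions=(0.6, 0.2, 0.2)):
    """Shuffle items reproducibly and cut them into three consecutive parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_calib = round(fractions[1] * len(shuffled))
    training = shuffled[:n_train]
    calibration = shuffled[n_train:n_train + n_calib]
    validation = shuffled[n_train + n_calib:]
    return training, calibration, validation


# Hypothetical compound identifiers standing in for the 233 inhibitors.
compound_ids = list(range(233))

# Three independent random splits, one per (arbitrary) seed.
splits = [random_split(compound_ids, seed) for seed in (1, 2, 3)]
```

Fixing the seed makes each split reproducible, which matters here because the statistical quality of the models is compared across splits.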

M.A. Toropova et al. / Computers in Biology and Medicine 64 (2015) 148–154

2.2. Optimal descriptors

The optimal descriptors used for QSPR/QSAR analyses [6–8] are mathematical functions of so-called correlation weights. In other words, they are "Descriptors of Correlation Weights" (DCW). The correlation weights are calculated using the Monte Carlo method: the numerical values of the correlation weights must give the maximum correlation coefficient between the experimental endpoint values and the DCW for the visible training set [6–8]. Two versions of the optimal descriptor calculated with hydrogen-filled graphs (HFGs) were examined:

$$\mathrm{DCW}_1(T, N) = \sum_{k=1}^{NG} W(EC1_k) + \sum_{k=1}^{NG} W(PT2_k) + \sum_{k=1}^{NG} W(VS2_k) \tag{1}$$

$$\mathrm{DCW}_2(T, N) = \sum_{k=1}^{NG} W(EC1_k) + W(C3) + W(C4) + W(C5) + W(C6) + W(C7) \tag{2}$$

where $W(EC1_k)$ is the correlation weight for the presence of a given value of the extended connectivity of the first order in the HFG [6]; $W(PT2_k)$ is the correlation weight for the presence of a given number of paths of length 2 starting from the k-th vertex in the HFG [6]; and $W(VS2_k)$ is the correlation weight for the presence of a given value of the valence shell of the second order [6]. The listed graph invariants are calculated with the adjacency matrix of the HFG. The adjacency matrix of a graph G with n vertices (which represent atoms of chemical elements) is the n × n matrix in which the non-diagonal element $a_{ij}$ is 1 if the i-th and j-th vertices are connected (i.e. the i-th and j-th atoms are connected by a covalent bond); otherwise, $a_{ij}$ is zero. Fig. 1 contains an example of the adjacency matrix together with numerical values of the above graph invariants.

In the second version of the optimal descriptor, calculated with Eq. (2), another group of graph invariants is used. These are calculated in accordance with the presence (absence) of three-membered cycles (C3), four-membered cycles (C4), five-membered cycles (C5), six-membered cycles (C6), and seven-membered cycles (C7). W(C3), W(C4), W(C5), W(C6), and W(C7) are the correlation weights for the graph invariants related to cycles. Table 1 contains examples of the listed graph invariants.

Each graph invariant is a molecular feature. EC1, PT2, and VS2 are examples of local molecular features (fragments), whereas C3–C7 are global molecular features, since they characterize the molecule as a whole.

Finally, NG is the number of vertices in the HFG; T and N are parameters of the Monte Carlo optimization. The optimization aims to give the maximal correlation coefficient between DCW1(T, N) or DCW2(T, N) and the endpoint for the training set. T is the threshold, i.e. a coefficient used to separate the molecular features extracted from the HFG into two categories: (i) rare in the training set (these are noise); and (ii) not rare in the training set (the model should be based on molecular features that are not rare). N is the number of epochs of the Monte Carlo optimization [7].

Fig. 1. An example of calculation of graph invariants with the adjacency matrix.

Table 1. Examples of the codes used to represent the graph invariants involved in the optimal descriptors calculated with Eq. (1) or with Eq. (2).

| Twelve-symbol code | Comment |
|---|---|
| EC1-C...5... | The extended connectivity of first order equal to 5 for a carbon atom |
| EC1-F...3... | The extended connectivity of first order equal to 3 for a fluorine atom |
| EC1-S...9... | The extended connectivity of first order equal to 9 for a sulfur atom |
| PT2-Cl..2... | The path of length 2 equal to 2 for a chlorine atom |
| PT2-N...5... | The path of length 2 equal to 5 for a nitrogen atom |
| VS2-C...7... | The valence shell of second order equal to 7 for a carbon atom |
| VS2-H...9... | The valence shell of second order equal to 9 for a hydrogen atom |
| C3......1... | The presence of one ring with three members |
| C4....H.1... | The presence of one ring with four members containing at least one heteroatom (non-carbon) |
| C5....H.2... | The presence of two rings with five members, at least one of which contains heteroatoms |
| C6...A..3... | The presence of three rings with six members, at least one of which is aromatic |
| C6...AH.4... | The presence of four rings with six members, at least one of which is aromatic and at least one of which contains heteroatoms |

2.3. The Monte Carlo optimization

In order to build up the model, one should calculate the correlation weights for the molecular features. The list of molecular features (lMF) is extracted during step-by-step reading of all data (compounds) in the "visible" training and calibration sets. This process can be represented by the scheme:

SMILES1 into HFG1 => lMF1
SMILES2 into HFG2 => lMF1 + lMF2
SMILES3 into HFG3 => lMF1 + lMF2 + lMF3
SMILESn into HFGn => lMF1 + lMF2 + lMF3 + ... + lMFn

In accordance with the threshold (T), the full list of molecular features (i.e. lMF1 + lMF2 + lMF3 + ... + lMFn) should be separated into two classes: (i) rare (the total number of occurrences of the molecular feature in substances of the training set is equal to or less than T); and (ii) active (the total number of occurrences of the molecular feature in substances of the training set is larger than T). The correlation weights for rare features are fixed at zero, i.e. rare graph invariants are not utilized for building up the model. The correlation weights of the active invariants of the HFG are changed by the Monte Carlo method in order to maximize the target function. The target function is the correlation coefficient between the optimal descriptor and pIC50 for the training set.

If it is unlimited (e.g. the number of epochs is too large), the optimization can lead to overtraining: good statistics for the training set together with poor statistics for the calibration set and, very probably, poor statistics for the external validation set. Taking this into account, one should select T = T* and N = N* which give the best statistical quality for the calibration set. Fig. 2 clarifies how T* and N* are obtained from the analysis of a range of thresholds from t_min to t_max and a range of the number of epochs from 1 to N_max. Using (T*, N*), one can obtain a more or less trustworthy model, which can be represented as:

$$\mathrm{pIC}_{50} = C_0 + C_1 \times \mathrm{DCW}_X(T^*, N^*) \tag{3}$$

where X = 1 or 2; T* and N* are the parameters which give the best statistics for the calibration set. In this work, thresholds from 1 to 10 and numbers of epochs from 1 to 20 were analyzed. Since the compounds distributed into the training and calibration sets are "visible" while the model is built, Eq. (3) should be checked with the external ("invisible") validation set. It should be noted that no physicochemical parameters and no 3D data on the molecular structure are used to build up the CORAL model: SMILES are the only input data for the described approach.

Fig. 2. The scheme of definition of T* and N* which give the best statistics for the calibration set.

2.4. Domain of applicability

The measure of the statistical (probabilistic) quality of a feature extracted from molecular graphs can be calculated as follows:

$$\mathrm{Defect}_{MF_k} = \begin{cases} \dfrac{\left| P_{TRN}(MF_k) - P_{CLB}(MF_k) \right|}{N_{TRN}(MF_k) + N_{CLB}(MF_k)}, & \text{if } N_{CLB}(MF_k) > 0 \\ 1, & \text{if } N_{CLB}(MF_k) = 0 \end{cases} \tag{4}$$

where $\mathrm{Defect}_{MF_k}$ is the defect of the k-th molecular feature; $P_{TRN}(MF_k)$ is the probability of the presence of $MF_k$ in compounds of the training set, i.e. $P_{TRN}(MF_k) = N_{TRN}(MF_k)/N_{TRN}$; $P_{CLB}(MF_k)$ is the probability of the presence of $MF_k$ in compounds of the calibration set, i.e. $P_{CLB}(MF_k) = N_{CLB}(MF_k)/N_{CLB}$; $N_{TRN}(MF_k)$ is the number (frequency) of compounds which contain $MF_k$ in the training set; $N_{TRN}$ is the total number of compounds in the training set; $N_{CLB}(MF_k)$ is the number (frequency) of compounds which contain $MF_k$ in the calibration set; and $N_{CLB}$ is the total number of compounds in the calibration set.

The logic is as follows. If the probability of $MF_k$ in the training set is equal to the probability of $MF_k$ in the calibration set, the situation is ideal and the defect is zero. However, this situation is not typical, i.e. the difference between the probability of $MF_k$ in the training set and the probability of $MF_k$ in the calibration set is usually not zero. Under such circumstances, the frequencies of $MF_k$ in the training set and in the calibration set should also be taken into account: if these are small, then the defect of $MF_k$ must be larger. Finally, if $MF_k$ is absent in the calibration set, the defect of $MF_k$ is maximal ($\mathrm{Defect}_{MF_k} = 1$). Thus, the measure calculated with Eq. (4) can be used for the classification of the active (not blocked) attributes.
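The defect of a molecular feature defined by Eq. (4) can be illustrated with a short sketch. It assumes that the difference of the two probabilities is taken in absolute value, and it represents each compound simply as a set of feature codes. The names `feature_counts`, `defect`, and `all_defects`, as well as the toy feature codes "A", "B", and "C", are hypothetical and are not part of the CORAL software.

```python
def feature_counts(compounds):
    """Count, for each feature, the number of compounds that contain it."""
    counts = {}
    for features in compounds:
        for feature in set(features):
            counts[feature] = counts.get(feature, 0) + 1
    return counts


def defect(n_trn_k, n_clb_k, n_trn, n_clb):
    """Defect of one molecular feature (Eq. (4), absolute difference assumed)."""
    if n_clb_k == 0:
        # Feature absent from the calibration set: maximal defect.
        return 1.0
    p_trn = n_trn_k / n_trn  # probability of the feature in the training set
    p_clb = n_clb_k / n_clb  # probability of the feature in the calibration set
    return abs(p_trn - p_clb) / (n_trn_k + n_clb_k)


def all_defects(training, calibration):
    """Defects of all features that occur in the training set."""
    trn = feature_counts(training)
    clb = feature_counts(calibration)
    return {
        feature: defect(trn[feature], clb.get(feature, 0),
                        len(training), len(calibration))
        for feature in trn
    }


# Toy data: each compound is a set of hypothetical feature codes.
training = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
calibration = [{"A"}, {"B"}]
defects = all_defects(training, calibration)
```

In this toy example, feature "B" has the same probability in both sets and therefore a zero defect, whereas feature "C" is absent from the calibration set and receives the maximal defect of 1.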

2.5. The selection of compounds into the domain of applicability

Having the numerical data on $\mathrm{Defect}_{MF_k}$, one can compare the reliability of the prediction for a compound using the following criterion:

$$\mathrm{Defect}(HFG) = \sum \mathrm{Defect}_{MF_k} \tag{5}$$

The domain of applicability can be defined as follows: a compound falls into the domain of applicability if Defect(HFG) for this compound obeys the condition:

$$\mathrm{Defect}(HFG) < 2 \times \overline{\mathrm{Defect}(HFG)} \tag{6}$$

where $\overline{\mathrm{Defect}(HFG)}$ is the average over the visible set (training and calibration sets). Thus, Defect(HFG) calculated with Eq. (5) makes it possible to define the domain of applicability: compounds which do not obey inequality (6) should be regarded as potential outliers [3,10]. Table 2 contains the percentage of outliers according to inequality (6).

3. Results and discussion

In the case of the split taken from work [2], the CORAL software gives the following models.

Without taking into account the influence of various rings:

$$\mathrm{pIC}_{50} = 4.1271(\pm 0.0087) + 0.1034(\pm 0.0002) \times \mathrm{DCW}_1(2, 14) \tag{7}$$

$n = 178$, $r^2 = 0.802$, $q^2 = 0.798$, $s = 0.567$, $F = 711$, ${}^{c}R_p^2 = 0.8001$ (training set)
$n = 28$, $r^2 = 0.624$, $s = 0.820$, ${}^{c}R_p^2 = 0.610$ (calibration set)
$n = 27$, $r^2 = 0.693$, $s = 0.886$, $r_m^2 = 0.602$, $\overline{r_m^2} = 0.577$, $\Delta r_m^2 = 0.049$ (validation set)

Taking into account the influence of various rings:

$$\mathrm{pIC}_{50} = 4.8955(\pm 0.0095) + 0.1287(\pm 0.0004) \times \mathrm{DCW}_2(4, 9) \tag{8}$$

$n = 178$, $r^2 = 0.710$, $q^2 = 0.704$, $s = 0.685$, $F = 431$, ${}^{c}R_p^2 = 0.7085$ (training set)
$n = 28$, $r^2 = 0.527$, $s = 0.867$, ${}^{c}R_p^2 = 0.510$ (calibration set)
$n = 27$, $r^2 = 0.737$, $s = 0.802$, $r_m^2 = 0.732$, $\overline{r_m^2} = 0.626$, $\Delta r_m^2 = 0.212$ (validation set)

According to the principle "QSAR is a random event" [8], three additional random splits of the available data into the "visible" training and calibration sets and the "invisible" (while the model is built) validation set have been examined.

Without taking into account the influence of various rings:

Split 1

$$\mathrm{pIC}_{50} = 4.4727(\pm 0.0172) + 0.04156(\pm 0.0002) \times \mathrm{DCW}_1(5, 3) \tag{9}$$

$n = 128$, $r^2 = 0.609$, $q^2 = 0.598$, $s = 0.847$, $F = 197$, ${}^{c}R_p^2 = 0.607$ (training set)
$n = 53$, $r^2 = 0.782$, $s = 0.685$, ${}^{c}R_p^2 = 0.779$ (calibration set)
$n = 52$, $r^2 = 0.690$, $s = 0.702$, $r_m^2 = 0.670$, $\overline{r_m^2} = 0.567$, $\Delta r_m^2 = 0.210$ (validation set)

Split 2

$$\mathrm{pIC}_{50} = 3.7050(\pm 0.0210) + 0.04565(\pm 0.0002) \times \mathrm{DCW}_1(1, 3) \tag{10}$$

$n = 142$, $r^2 = 0.613$, $q^2 = 0.603$, $s = 0.830$, $F = 222$, ${}^{c}R_p^2 = 0.606$ (training set)
$n = 45$, $r^2 = 0.839$, $s = 0.619$, ${}^{c}R_p^2 = 0.8256$ (calibration set)
$n = 46$, $r^2 = 0.842$, $s = 0.564$, $r_m^2 = 0.815$, $\overline{r_m^2} = 0.740$, $\Delta r_m^2 = 0.149$ (validation set)

Split 3

$$\mathrm{pIC}_{50} = 4.4433(\pm 0.0170) + 0.0583(\pm 0.0003) \times \mathrm{DCW}_1(5, 5) \tag{11}$$

$n = 123$, $r^2 = 0.659$, $q^2 = 0.649$, $s = 0.781$, $F = 234$, ${}^{c}R_p^2 = 0.656$ (training set)
$n = 56$, $r^2 = 0.726$, $s = 0.634$, ${}^{c}R_p^2 = 0.721$ (calibration set)
$n = 54$, $r^2 = 0.689$, $s = 0.816$, $r_m^2 = 0.625$, $\overline{r_m^2} = 0.572$, $\Delta r_m^2 = 0.107$ (validation set)

Taking into account the influence of various rings:

Split 1

$$\mathrm{pIC}_{50} = 3.9366(\pm 0.0224) + 0.1092(\pm 0.0007) \times \mathrm{DCW}_2(4, 4) \tag{12}$$

$n = 128$, $r^2 = 0.601$, $q^2 = 0.589$, $s = 0.856$, $F = 190$, ${}^{c}R_p^2 = 0.577$ (training set)
$n = 53$, $r^2 = 0.791$, $s = 0.679$, ${}^{c}R_p^2 = 0.756$ (calibration set)
$n = 52$, $r^2 = 0.700$, $s = 0.685$, $r_m^2 = 0.654$, $\overline{r_m^2} = 0.583$, $\Delta r_m^2 = 0.142$ (validation set)

Split 2

$$\mathrm{pIC}_{50} = 4.4497(\pm 0.0166) + 0.1426(\pm 0.0007) \times \mathrm{DCW}_2(1, 9) \tag{13}$$

$n = 142$, $r^2 = 0.635$, $q^2 = 0.625$, $s = 0.806$, $F = 243$, ${}^{c}R_p^2 = 0.588$ (training set)
$n = 45$, $r^2 = 0.873$, $s = 0.522$, ${}^{c}R_p^2 = 0.772$ (calibration set)
$n = 46$, $r^2 = 0.855$, $s = 0.527$, $r_m^2 = 0.823$, $\overline{r_m^2} = 0.791$, $\Delta r_m^2 = 0.065$ (validation set)

Table 2. The statistical characteristics of QSAR for pIC50 of gamma-secretase inhibitors for different splits into the training, calibration, and validation sets.

| Split | Kind of model | r² (train) | n (train) | r² (calib) | n (calib) | r² (valid) | n (valid) | Nact a | % b |
|---|---|---|---|---|---|---|---|---|---|
| Ref. [2] | Partial least squares | 0.65 | 178 | 0.69 | 28 | 0.67 (0.45) c | 26 (27) | – | – |
| Ref. [2] | Neural networks | 0.70 | 178 | 0.78 | 28 | 0.68 (0.49) c | 26 (27) | – | – |
| Ref. [2] | CORAL (No cycles) | 0.80 | 178 | 0.62 | 28 | 0.69 | 27 | 140 | 7 |
| Ref. [2] | CORAL (Cycles) | 0.71 | 178 | 0.53 | 28 | 0.74 d | 27 | 50 | 0 |
| 1 | CORAL (No cycles) | 0.61 | 128 | 0.78 | 53 | 0.69 | 52 | 121 | 2 |
| 2 | CORAL (No cycles) | 0.61 | 142 | 0.84 | 45 | 0.84 | 46 | 158 | 17 |
| 3 | CORAL (No cycles) | 0.66 | 123 | 0.73 | 56 | 0.69 | 54 | 117 | 7 |
| 1 | CORAL (Cycles) | 0.60 | 128 | 0.79 | 53 | 0.70 | 52 | 49 | 6 |
| 2 | CORAL (Cycles) | 0.63 | 142 | 0.87 | 45 | 0.85 | 46 | 61 | 15 |
| 3 | CORAL (Cycles) | 0.60 | 123 | 0.78 | 56 | 0.73 | 54 | 48 | 7 |

a The number of active (not blocked) attributes.
b Percentage of outliers in the validation set according to inequality (6).
c After removing one outlier [2]; data obtained before removing the outlier are shown in brackets.
d Best predictions for the "invisible" validation set are indicated by bold.

Split 3

$$\mathrm{pIC}_{50} = 4.0605(\pm 0.0229) + 0.1216(\pm 0.0007) \times \mathrm{DCW}_2(5, 4) \tag{14}$$

$n = 123$, $r^2 = 0.600$, $q^2 = 0.588$, $s = 0.846$, $F = 182$, ${}^{c}R_p^2 = 0.584$ (training set)
$n = 56$, $r^2 = 0.776$, $s = 0.565$, ${}^{c}R_p^2 = 0.763$ (calibration set)
$n = 54$, $r^2 = 0.727$, $s = 0.748$, $r_m^2 = 0.704$, $\overline{r_m^2} = 0.616$, $\Delta r_m^2 = 0.175$ (validation set)

The Y-randomization test for all models (Eqs. (7)–(14)) has shown that these are not chance correlations, since ${}^{c}R_p^2$ is larger than 0.5 [9].

The comparison of the models calculated without taking into account the influence of rings (Eqs. (9)–(11)) with the models calculated by taking into account the influence of rings (Eqs. (12)–(14)) has shown that the presence of rings has a considerable impact upon pIC50, because the statistical quality (for the external validation set) of the models calculated by taking into account the presence of various cycles is better (Table 2).

The distribution of the available data into the compounds that are visible while the model is built (i.e. the training and calibration sets) and the invisible validation set has an apparent influence on the predictive ability of these models. The additional statistical characteristics of predictive potential, $r_m^2$, $\overline{r_m^2}$, and $\Delta r_m^2$, are satisfactory for all models (Eqs. (7)–(14)) except Eqs. (8) and (9), where $\Delta r_m^2 > 0.2$. However, it was noted in work [9] that if $r_m^2 > 0.5$ (obligatory), then $\Delta r_m^2$ should preferably be lower than 0.2, but this is not a rigid limitation [10]. Consequently, even these two models can be regarded as satisfactory (Fig. 3).

Fig. 3. Graphical representation of models calculated with Eqs. (9)–(14).

In the case of the three splits into the training, calibration, and validation sets, the number of compounds in the external validation set is larger than 27 (the number of compounds utilized for validation in work [2]). In spite of this circumstance, the predictive potential of the models (Eqs. (12)–(14)) is better than the predictive potential of the models suggested in the literature [2] (Table 2). In other words, the comparison of the approach described in work [2] with the models calculated with Eqs. (12)–(14) shows that (i) the models calculated with Eqs. (12)–(14) do not involve physicochemical descriptors, i.e. these models are based solely on data on molecular features (this is an advantage); (ii) both approaches take into account the presence of various cycles in molecules; and (iii) probably, one can improve the predictive potential of the models described in work [2] by taking into account additional details on the rings (e.g. presence of heteroatoms) which occur in the examined molecules.

The comparison of the number of active molecular features involved in building up the model (Table 2) shows that the number of active features in the case of taking into account different rings, DCW2(T*, N*), is considerably smaller than the number of correlation weights for DCW1(T*, N*). This circumstance indicates that the molecular features expressed via C3–C7 are more informative for the given endpoint than PT2 and VS2.

Having data for a series of runs of the Monte Carlo optimization with different splits, one can extract promoters of pIC50 increase (positive correlation weights in all runs) and promoters of pIC50 decrease (negative correlation weights in all runs) [11]. Examples of promoters of increase of pIC50 are (i) EC1 = 9, 10, 13 for carbon atoms; (ii) the presence of three aromatic six-membered rings without heteroatoms; and (iii) the presence of one seven-membered ring with heteroatoms. Examples of promoters of decrease of pIC50 are (i) EC1 = 4 for hydrogen; (ii) EC1 = 8 for carbon; and (iii) the presence of two six-membered aromatic cycles without heteroatoms. Thus, the approach has a mechanistic interpretation. Using these results, one can define molecular modifications which should give an increase (or decrease) of pIC50. Table 3 contains examples of such molecular modifications. It is apparent, however, that the results of the computational experiments can only be a preliminary hint for the selection of preferable candidates for the role of therapeutic agents in general, and agents for treatment of Alzheimer disease in particular.

Table 3. Examples of effective modifications of the molecular structure (ID CHEMBL191246) taken from the literature [2]. The calculation has been done with Eq. (12) (Split 1); Eq. (13) (Split 2); and Eq. (14) (Split 3).

| Molecular structure (SMILES) | Eq. (12) | Eq. (13) | Eq. (14) | Comment |
|---|---|---|---|---|
| O=S(=O)(c1ccc(Cl)cc1)C2(CCCCC2)c3ccccc3F | 7.0140 | 6.5772 | 7.1416 | – |
| O=S(=O)(c1ccc(Cl)cc1)C2(CCCC(C)C2)c3ccccc3F | 7.1357 | 6.8964 | 7.2551 | Carbon vertex with EC1 = 13 is added |
| Fc1ccccc1S(=O)(=O)c1ccc(Cl)cc1 | 5.4052 | 5.3778 | 5.4969 | Presence of only two six-membered aromatic cycles without heteroatoms |

The Supplementary materials section contains technical details of the models calculated with Eqs. (12)–(14): (i) formulae for all statistical characteristics used (Table S1); (ii) SMILES utilized to represent the molecular structure of the gamma-secretase inhibitors together with their distribution into the training, calibration, and validation sets (Tables S2–S4); (iii) correlation weights for the calculation of DCW2(T*, N*) (Table S5); and (iv) the Y-randomization test (Table S6).
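The extraction of stable promoters from several runs of the Monte Carlo optimization, discussed above, can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm implemented in the CORAL software: the optimizer below is a plain stochastic hill climb that accepts a random change of one correlation weight only if the correlation coefficient for the training set increases. All names (`pearson`, `dcw`, `optimize_weights`, `stable_promoters`), the step size, the seeds, and the toy features "A" to "D" with their endpoint values are assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dcw(features, weights):
    """Descriptor of correlation weights: sum of weights over the features."""
    return sum(weights.get(f, 0.0) for f in features)


def optimize_weights(train, threshold, n_epochs, seed, step=0.5):
    """Toy Monte Carlo search; train is a list of (feature list, endpoint)."""
    rng = random.Random(seed)
    counts = {}
    for feats, _ in train:
        for f in set(feats):
            counts[f] = counts.get(f, 0) + 1
    # Rare features (present in <= threshold compounds) are fixed at zero.
    weights = {f: (1.0 if c > threshold else 0.0) for f, c in counts.items()}
    active = sorted(f for f, c in counts.items() if c > threshold)
    ys = [y for _, y in train]

    def score(w):
        return pearson([dcw(feats, w) for feats, _ in train], ys)

    best = score(weights)
    for _ in range(n_epochs):
        rng.shuffle(active)
        for f in active:
            old = weights[f]
            weights[f] = old + rng.uniform(-step, step)
            new = score(weights)
            if new > best:
                best = new          # keep the improved weight
            else:
                weights[f] = old    # reject the change
    return weights, best


def stable_promoters(runs):
    """Features whose correlation weight keeps the same sign in every run."""
    common = set.intersection(*(set(w) for w in runs))
    increase = sorted(f for f in common if all(w[f] > 0 for w in runs))
    decrease = sorted(f for f in common if all(w[f] < 0 for w in runs))
    return increase, decrease


# Toy training set: hypothetical feature codes and endpoint values.
train = [(["A", "A", "B"], 6.0), (["A", "B", "B"], 5.0), (["A", "C"], 6.5),
         (["B", "C"], 5.2), (["A", "A", "A"], 7.1), (["B", "B", "D"], 4.4)]

runs = [optimize_weights(train, threshold=1, n_epochs=20, seed=s)[0]
        for s in (1, 2, 3)]
increase, decrease = stable_promoters(runs)
```

In the sketch, the runs differ only in the random seed; in the approach described above, the runs also differ in the split of the data, which is a stricter test of the stability of the sign of a correlation weight.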

4. Conclusions

Taking into account the presence of different rings (i.e. using Eq. (2)) makes it possible to improve the predictive potential of the models generated by the CORAL software for the pIC50 (binding affinity) of gamma-secretase inhibitors (Table 2). Thus, the best models are those calculated with Eqs. (12)–(14) for splits 1, 2, and 3, respectively. The distribution into the "visible" training and calibration sets and the "invisible" validation set has an apparent influence upon the statistical quality (predictive potential) of these models. These models have a mechanistic interpretation in terms of stable promoters of increase or decrease of the binding affinity of gamma-secretase inhibitors (pIC50), obtained by analysis of the correlation weights of different molecular features in several runs of the Monte Carlo optimization. The predictive potential of the models built up by the CORAL software is better than the predictive potential of the models for the same compounds described in the literature [2]. The presented data indicate that the suggested approach can be utilized for regulatory purposes [12], in accordance with the OECD principles [13].

Conflict of interest

None declared.

Acknowledgments

A.A.T. thanks the EC project PROSIL funded under the LIFE program (project LIFE12ENV/IT/000154).


Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.compbiomed.2015.06.019.

References

[1] A. Rahman, The role of adenosine in Alzheimer's disease, Curr. Neuropharmacol. 7 (3) (2009) 207–216.
[2] S. Ajmani, S. Janardhan, V.N. Viswanadhan, Toward a general predictive QSAR model for gamma-secretase inhibitors, Mol. Divers. 17 (3) (2013) 421–434.
[3] J. Wu, P.M. LoRusso, L.H. Matherly, J. Li, Implications of plasma protein binding for pharmacokinetics and pharmacodynamics of the γ-secretase inhibitor RO4929097, Clin. Cancer Res. 18 (7) (2012) 2066–2079.
[4] CORAL software, 2015; available at 〈http://www.insilico.eu/coral〉 (accessed 03.15).
[5] A. Gaulton, L.J. Bellis, A.P. Bento, J. Chambers, M. Davies, A. Hersey, et al., ChEMBL: a large-scale bioactivity database for drug discovery, Nucleic Acids Res. 40 (2012) D1100–D1107.
[6] A.A. Toropov, E. Benfenati, Correlation weighting of valence shells in QSAR analysis of toxicity, Bioorg. Med. Chem. 14 (11) (2006) 3923–3928.
[7] A.P. Toropova, A.A. Toropov, CORAL software: Prediction of carcinogenicity of drugs by means of the Monte Carlo method, Eur. J. Pharm. Sci. 52 (1) (2014) 21–25.
[8] A.A. Toropov, A.P. Toropova, T. Puzyn, E. Benfenati, G. Gini, D. Leszczynska, J. Leszczynski, QSAR as a random event: models for nanoparticles uptake in PaCa2 cancer cells, Chemosphere 92 (2013) 31–37.
[9] P.K. Ojha, K. Roy, Comparative QSARs for antimalarial endochins: importance of descriptor-thinning and noise reduction prior to feature selection, Chemom. Intell. Lab. Syst. 109 (2011) 146–161.
[10] P.K. Ojha, I. Mitra, R.N. Das, K. Roy, Further exploring r2m metrics for validation of QSPR models, Chemom. Intell. Lab. Syst. 107 (2011) 194–205.
[11] A.P. Toropova, A.A. Toropov, R. Rallo, D. Leszczynska, J. Leszczynski, Optimal descriptor as a translator of eclectic data into prediction of cytotoxicity for metal oxide nanoparticles under different conditions, Ecotoxicol. Environ. Saf. 112 (2015) 39–45.
[12] REACH, 2007. 〈http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm〉 (accessed 30.04.15).
[13] OECD principles for the validation, for regulatory purposes, of (quantitative) structure–activity relationship models, 2004, 〈http://www.oecd.org/dataoecd/33/37/37849783.pdf〉 (accessed 30.04.15).