Interdiscip Sci Comput Life Sci (2014) 6: 71–83 DOI: 10.1007/s12539-014-0166-4
Prediction of Human Volume of Distribution Values for Drugs Using Linear and Nonlinear Quantitative Structure Pharmacokinetic Relationship Models
Bruno LOUIS1, Vijay K. AGRAWAL2†,∗
1 (Department of Pharmacy, Sultan Qaboos University Hospital, Al Khod, Muscat 123, Oman)
2 (QSAR and Computer Chemical Laboratories, A.P.S. University, Rewa 486003, India)
Received 22 May 2012 / Revised 29 October 2012 / Accepted 21 November 2012
Abstract: In the present study, the volume of distribution values in humans of 121 drugs were estimated using quantitative structure pharmacokinetic relationship analysis. Multiple linear regression (MLR) and two nonlinear methods, artificial neural network (ANN) and support vector machine (SVM), were employed for modeling. Theoretically calculated molecular descriptors were used for modeling, and the best set of descriptors was selected by the correlation-based feature selection (CFS) method. The performance and predictive capability of the linear method were investigated and compared with those of the nonlinear methods. The ANN gave the better model, with an average fold error of 1.66. The test set prediction accuracy shows that human volume of distribution values could be predicted, on average, within 2-fold of the actual value.
Key words: QSPkR, QSPR, structure pharmacokinetic relationship, volume of distribution, ANN, SVM, modeling, CFS, machine learning.
† Present address: National Institute of Technical Teachers' Training and Research (NITTTR), Shamla Hills, Bhopal 462002, India.
∗ Corresponding author. E-mail: [email protected]

1 Introduction

In the drug discovery process, many drug candidates fail at the later stages of development because of unfavorable pharmacokinetic (PK) properties, i.e. absorption, distribution, metabolism, and excretion (ADME) (Kennedy, 1997). Screening and optimizing ADME properties at an early stage of drug development can increase the success rate and decrease the cost of development (Hou and Wang, 2008). In many cases experimental evaluation of ADME properties is impractical because of the time and cost involved. Using computational tools to predict the ADME properties of chemical compounds in the early design stages of drug discovery has therefore become an alternative (Davis and Riley, 2004; Waterbeemd and Gifford, 2003). Computational methods such as quantitative structure property/pharmacokinetic relationship (QSPR/QSPkR) modeling can be used as predictive tools. A few QSPkR studies reported in the literature use theoretically calculated molecular descriptors to predict clearance, volume of distribution, half-life, and protein binding (Yap et al., 2006; Berellini et al., 2009; Durairaj et al., 2009; Gleeson, 2007).

The volume of distribution (Vd) is an important PK parameter that relates drug serum concentrations to the amount of drug in the body. Drug distribution in the body depends mainly on plasma protein binding and tissue binding. A higher Vd reflects greater tissue partitioning, which means that the drug can penetrate into tissues and bind reversibly to tissue components, whereas drugs that are highly bound to plasma proteins have lower Vd values (Bauer, 2008). Vd has a significant impact on other PK properties, such as clearance and half-life. The half-life of a drug is determined by the volume of distribution and the clearance, so a change in Vd can affect the duration of action. Drugs with low Vd values require more frequent dosing, whereas drugs with high Vd values have a prolonged duration of action and can be given at longer dosing intervals.

The Vd can be estimated if the overall tissue binding and the plasma protein binding can be measured. Plasma protein binding can be measured using human plasma, whereas tissue protein binding is normally difficult to measure. The volume of distribution in humans is therefore traditionally estimated from animal data with proper scaling to man (Karalis et al., 2002; Fagerholm et al., 2007; Obach et al., 1997). Biomimetic-binding measurement values calculated from artificial membrane chromatography retention times have also been used to estimate the Vd values of drugs. In this method, phospholipid binding, plasma protein binding (Hollosy et al., 2006), logarithmic retention factor, and unbound drug in plasma were used as variables in modeling (Sui et al., 2009).

Since animal experiments are costly and labor intensive, many studies have attempted to estimate Vd using QSPkR computational methods. Lombardo and coworkers have over the years used computational methods to predict the Vd of drugs (Lombardo et al., 2002, 2004, 2006). Ghafourian et al. (2006) employed QSPkR techniques to predict the volume of distribution values of 129 drugs and studied acidic and basic drugs separately.

The aim of this work was to establish QSPkR models to predict the volume of distribution values of drugs using only theoretically calculated molecular descriptors. To that end, a large set of descriptors was calculated and a correlation-based feature selection (CFS) method was employed to select the best set of descriptors for modeling. To find nonlinear relationships between descriptors and Vd values, we also used the artificial neural network (ANN) and support vector machine (SVM) methods, and compared them with the linear models derived by the traditional multiple linear regression (MLR) method.
2 Materials and methods

2.1 Data set

The volume of distribution at steady state (VDss) values of the drugs used in this study were taken from the literature and include acidic, basic, neutral and ampholytic compounds (Sui et al., 2009). We used the same training set of 97 compounds for model building and the same 24 test compounds for validation. The data set covers a broad range of drugs belonging to different therapeutic and pharmacological categories. The VDss values, expressed in L/kg, were converted to a logarithmic scale as logVd; the values ranged from −1 to 1.32 (0.1 to 21 L/kg). The logVd values along with the names of the drugs are given in Table 1.

2.2 Molecular descriptors

The molecular structures of the studied drugs were obtained from the PubChem database (http://pubchem.ncbi.nlm.nih.gov/) and constructed using the Chemaxon software package (http://www.chemaxon.com/). The molecules were then subjected to 3D optimization with CORINA, and the molecular descriptors were calculated using the E-Dragon software (Tetko et al., 2005; Tetko, 2005). An additional set of descriptors was calculated using the Chemaxon Marvin plug-in software.

2.3 Descriptor selection and linear model generation

The performance of QSPkR models depends mostly on the parameters/descriptors used to describe the molecular structures. The selection of relevant descriptors is therefore an important step in reducing over-fitting and improving model predictability. Machine learning methods such as filter and wrapper methods can be used for descriptor selection (Witten and Frank, 2005; Demel et al., 2008). In the present study we used a filter-based method called Correlation-based Feature Selection (CFS), introduced by Hall and Holmes (2003). CFS is a subset evaluation heuristic that takes into account the usefulness of individual features for predicting the class along with the level of inter-correlation among them. The heuristic assigns high scores to subsets containing attributes that are highly correlated with the class and have low inter-correlation with each other. CFS first discretizes numeric features using the technique of Fayyad and Irani (1993) and then uses symmetrical uncertainty to estimate the degree of association between discrete features. After computing a correlation matrix, CFS applies a heuristic search strategy to find a good subset of features. We used the forward selection search, which produces a list of selected descriptors for linear modeling. Descriptor selection and model building were performed using the WEKA software package (Witten and Frank, 2005).
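The subset heuristic described above can be illustrated with the CFS merit score, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a k-feature subset. The following is a minimal pure-Python sketch, not the WEKA implementation used in this study: it assumes the correlations are already available (WEKA derives them from symmetrical uncertainty on discretized features), and the feature names and numbers are hypothetical.

```python
import math

def cfs_merit(subset, class_corr, pair_corr):
    """CFS merit of a subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(class_corr[f]) for f in subset) / k
    # Mean absolute feature-feature correlation (0 for a single feature).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pair_corr[frozenset(p)]) for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def forward_select(features, class_corr, pair_corr):
    """Greedy forward search: add the best feature while the merit improves."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, feat = max(
            (cfs_merit(selected + [f], class_corr, pair_corr), f) for f in remaining
        )
        if merit <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = merit
    return selected, best

# Hypothetical toy correlations: "a" and "b" are informative but redundant.
class_corr = {"a": 0.8, "b": 0.6, "c": 0.4}
pair_corr = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.1,
}
selected, merit = forward_select(["a", "b", "c"], class_corr, pair_corr)
print(selected, round(merit, 3))
```

In this toy case the redundant feature "b" is left out even though it correlates more strongly with the class than "c", which is the behaviour the heuristic is designed to produce.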
Table 1 Descriptors and logVd values of training and test set compounds

Compd. | Drug name | Ia | In | GATS1e | GATS5e | HATS8m | Psy80 | HArRc | logVd
1 | Acetaminophen | 0 | 1 | 0.672 | 1.124 | 0.023 | 0 | 0 | −0.022
2 | Acyclovir | 0 | 1 | 1.203 | 1.098 | 0.109 | 0 | 2 | −0.161
3 | Adefovir | 1 | 0 | 1.312 | 1.159 | 0.16 | 0 | 2 | −0.377
4 | Alprazolam | 0 | 1 | 0.773 | 0.805 | 0.18 | 0 | 1 | −0.143
5 | Amitriptyline | 0 | 0 | 1.37 | 0.724 | 0.083 | 1 | 0 | 0.919
6 | Amlodipine | 0 | 0 | 0.972 | 1.175 | 0.136 | 0 | 0 | 1.204
7 | Amoxicillin | 0 | 0 | 0.731 | 1.112 | 0.108 | 0 | 0 | −0.071
8 | Antipyrine | 0 | 1 | 0.707 | 1.012 | 0.006 | 0 | 1 | −0.222
9 | Apomorphine | 0 | 0 | 0.582 | 1.181 | 0.043 | 0 | 0 | 0.301
10 | Aspirin | 1 | 0 | 0.833 | 1.379 | 0.002 | 0 | 0 | −0.824
11 | Atenolol | 0 | 0 | 0.845 | 1.561 | 0.032 | 0 | 0 | −0.032
12 | Aztreonam | 1 | 0 | 0.839 | 0.885 | 0.293 | 0 | 1 | −0.745
13 | Caffeine | 0 | 1 | 1.091 | 1.105 | 0 | 0 | 2 | −0.215
14 | Cefazolin | 1 | 0 | 0.835 | 1.11 | 0.264 | 0 | 2 | −1.000
15 | Cefixime | 1 | 0 | 0.868 | 1.173 | 0.304 | 0 | 1 | −0.620
16 | Cefoperazone | 1 | 0 | 0.83 | 0.968 | 0.165 | 0 | 1 | −0.658
17 | Cefotaxime | 1 | 0 | 0.958 | 1.118 | 0.231 | 0 | 1 | −0.553
18 | Ceftriaxone | 1 | 0 | 0.92 | 1.177 | 0.272 | 0 | 2 | −0.796
19 | Chlorambucil | 1 | 0 | 0.717 | 1.045 | 0.077 | 0 | 0 | −0.538
20 | Chloramphenicol | 0 | 1 | 0.676 | 1.2 | 0.263 | 0 | 0 | −0.027
21 | Ciprofloxacin | 0 | 0 | 0.691 | 1.148 | 0.086 | 0 | 1 | 0.260
22 | Clonidine | 0 | 0 | 0.882 | 0.507 | 0.01 | 0 | 0 | 0.322
23 | Clozapine | 0 | 0 | 1.006 | 0.565 | 0.217 | 1 | 0 | 0.732
24 | Codeine | 0 | 0 | 0.86 | 1.097 | 0.029 | 0 | 0 | 0.544
25 | Dexamethasone | 0 | 1 | 0.552 | 1.269 | 0.076 | 0 | 0 | 0.057
26 | Diazepam | 0 | 1 | 0.705 | 0.904 | 0.134 | 0 | 0 | 0.114
27 | Diclofenac | 1 | 0 | 0.641 | 1.007 | 0.35 | 0 | 0 | −0.770
28 | Diltiazem | 0 | 0 | 0.947 | 1.022 | 0.143 | 1 | 0 | 0.491
29 | Diprophylline | 0 | 1 | 0.993 | 1.27 | 0.099 | 0 | 2 | −0.097
30 | Doxepin | 0 | 0 | 1.042 | 0.754 | 0.102 | 1 | 0 | 1.068
31 | Enalapril | 1 | 0 | 0.804 | 1.035 | 0.086 | 1 | 0 | 0.230
32 | Enoxacin | 0 | 0 | 0.779 | 1.248 | 0.075 | 0 | 2 | 0.292
33 | Estradiol | 0 | 1 | 0.459 | 0.96 | 0.044 | 0 | 0 | 0.079
34 | Famotidine | 0 | 0 | 0.832 | 1.148 | 0.118 | 0 | 1 | 0.079
35 | Furosemide | 1 | 0 | 0.761 | 1.357 | 0.233 | 0 | 1 | −0.886
36 | Gliclazide | 1 | 0 | 0.514 | 1.442 | 0.095 | 0 | 0 | −0.444
37 | Gliquidone | 1 | 0 | 0.673 | 1.252 | 0.076 | 0 | 0 | −0.796
38 | Haloperidol | 0 | 0 | 0.595 | 0.793 | 0.046 | 1 | 0 | 1.255
39 | Hydrochlorothiazide | 1 | 0 | 0.628 | 1.427 | 0.023 | 0 | 0 | −0.081
40 | Hydrocortisone | 0 | 1 | 0.534 | 1.253 | 0.065 | 0 | 0 | −0.357
41 | Ibuprofen | 1 | 0 | 0.538 | 1.346 | 0.046 | 0 | 0 | −0.824
42 | Imipramine | 0 | 0 | 1.442 | 0.762 | 0.088 | 1 | 0 | 1.322
43 | Indomethacin | 1 | 0 | 0.751 | 1.061 | 0.204 | 1 | 1 | −1.000
44 | Isoniazide | 0 | 1 | 0.697 | 0.75 | 0 | 0 | 1 | −0.174
45 | Ketoprofen | 1 | 0 | 0.534 | 1.134 | 0.094 | 0 | 0 | −0.824
46 | Lamivudine | 0 | 1 | 1.055 | 0.875 | 0.05 | 0 | 1 | 0.114
47 | Lansoprazole | 0 | 1 | 0.725 | 1.148 | 0.258 | 0 | 2 | −0.456
48 | Levofloxacin | 0 | 0 | 0.809 | 1.199 | 0.07 | 0 | 1 | 0.134
49 | Lidocaine | 0 | 0 | 0.828 | 2.623 | 0.084 | 0 | 0 | −0.143
50 | Lincomycin | 0 | 0 | 0.837 | 0.975 | 0.072 | 0 | 0 | 0.114
51 | Lomefloxacin | 0 | 0 | 0.712 | 1.267 | 0.079 | 0 | 1 | 0.362
52 | Medroxyprogesterone | 0 | 1 | 0.487 | 1.01 | 0.069 | 1 | 0 | 0.668
53 | Meloxicam | 1 | 0 | 0.646 | 1.007 | 0.146 | 0 | 1 | −0.699
54 | Meperidine | 0 | 0 | 0.878 | 0.935 | 0.047 | 0 | 0 | 0.431
55 | Metformin | 0 | 0 | 1.8 | 1.8 | 0 | 0 | 0 | 0.389
56 | Metoclopramide | 0 | 0 | 0.983 | 0.94 | 0.052 | 0 | 0 | 0.531
57 | Metolazone | 1 | 0 | 0.58 | 1.158 | 0.18 | 1 | 0 | 0.207
58 | Metoprolol | 0 | 0 | 0.999 | 1.502 | 0.034 | 0 | 0 | 0.572
59 | Midazolam | 0 | 1 | 0.663 | 0.888 | 0.193 | 0 | 1 | 0.146
60 | Minoxidil | 0 | 0 | 0.752 | 0.742 | 0.028 | 0 | 1 | 0.477
61 | Mirtazapine | 0 | 0 | 1.296 | 0.815 | 0.029 | 0 | 1 | 0.681
62 | Morphine | 0 | 0 | 0.709 | 1.164 | 0.019 | 0 | 0 | 0.415
63 | Nefopam | 0 | 0 | 1.038 | 0.736 | 0.04 | 0 | 0 | 0.859
64 | Nicardipine | 0 | 0 | 0.713 | 1.24 | 0.156 | 0 | 0 | −0.009
65 | Nifedipine | 0 | 1 | 0.738 | 1.218 | 0.12 | 0 | 0 | −0.108
66 | Nimodipine | 0 | 1 | 0.807 | 1.065 | 0.197 | 1 | 0 | 0.176
67 | Norfloxacin | 0 | 0 | 0.724 | 1.242 | 0.081 | 0 | 1 | 0.447
68 | Omeprazole | 0 | 1 | 0.951 | 1.19 | 0.081 | 1 | 2 | −0.469
69 | Oxazepam | 0 | 1 | 0.645 | 1.118 | 0.158 | 0 | 0 | −0.229
70 | Penicillin | 1 | 0 | 0.696 | 1.179 | 0.111 | 0 | 0 | −0.347
71 | Pentobarbital | 1 | 0 | 0.745 | 1.038 | 0.031 | 0 | 0 | 0.000
72 | Phenytoin | 1 | 0 | 0.63 | 1.407 | 0.03 | 0 | 0 | −0.194
73 | Pravastatin | 1 | 0 | 0.697 | 0.921 | 0.078 | 0 | 0 | −0.337
74 | Prednisolone | 0 | 1 | 0.534 | 1.253 | 0.073 | 0 | 0 | −0.284
75 | Prilocaine | 0 | 0 | 0.743 | 1.793 | 0.036 | 0 | 0 | 0.568
76 | Promethazine | 0 | 0 | 1.451 | 0.721 | 0.069 | 1 | 0 | 1.146
77 | Propafenone | 0 | 0 | 0.751 | 1.389 | 0.067 | 1 | 0 | 0.556
78 | Propranolol | 0 | 0 | 0.818 | 1.545 | 0.045 | 0 | 0 | 0.604
79 | Pseudoephedrine | 0 | 0 | 0.643 | 0.964 | 0.007 | 0 | 0 | 0.453
80 | Quinidine | 0 | 0 | 0.847 | 1.236 | 0.075 | 1 | 1 | 0.544
81 | Ranitidine | 0 | 0 | 0.793 | 1.08 | 0.038 | 0 | 1 | 0.114
82 | Risperidone | 0 | 0 | 0.688 | 0.663 | 0.044 | 1 | 2 | 0.041
83 | Salbutamol | 0 | 0 | 0.649 | 1.581 | 0.047 | 0 | 0 | 0.279
84 | Salicylic acid | 1 | 0 | 0.643 | 2.143 | 0 | 0 | 0 | −0.770
85 | Simvastatin | 0 | 1 | 0.761 | 0.791 | 0.089 | 0 | 0 | −0.009
86 | Sulfadiazine | 1 | 0 | 0.613 | 1.74 | 0.055 | 0 | 1 | −0.036
87 | Sulpiride | 0 | 0 | 0.749 | 1.326 | 0.095 | 0 | 0 | 0.398
88 | Theophylline | 0 | 1 | 1.074 | 1.712 | 0 | 0 | 2 | −0.244
89 | Tinidazole | 0 | 1 | 0.874 | 0.996 | 0.046 | 0 | 0 | −0.155
90 | Tolbutamide | 1 | 0 | 0.558 | 1.583 | 0.076 | 0 | 0 | −0.854
91 | Tramadol | 0 | 0 | 0.874 | 0.996 | 0.046 | 0 | 0 | 0.447
92 | Triamcinolone | 0 | 1 | 0.578 | 1.252 | 0.096 | 0 | 0 | 0.246
93 | Triazolam | 0 | 0 | 0.737 | 0.899 | 0.227 | 0 | 1 | −0.174
94 | Trimethoprim | 0 | 0 | 1.213 | 0.958 | 0.088 | 0 | 1 | 0.204
95 | Venlafaxine | 0 | 0 | 0.872 | 0.978 | 0.062 | 0 | 0 | 0.643
96 | Verapamil | 0 | 0 | 1.131 | 0.772 | 0.047 | 0 | 0 | 0.670
97 | Warfarin | 1 | 0 | 0.666 | 1.366 | 0.11 | 0 | 1 | −0.854
Test data
98 | Amobarbital | 1 | 0 | 0.745 | 0.9 | 0.05 | 0 | 0 | 0.021
99 | Atropine | 0 | 0 | 0.771 | 1.674 | 0.077 | 0 | 0 | 0.301
100 | Bupivacaine | 0 | 0 | 0.772 | 2.071 | 0.059 | 1 | 0 | −0.155
101 | Carbamazepine | 0 | 1 | 0.659 | 1.856 | 0.011 | 0 | 0 | 0.146
102 | Chlorpheniramine | 0 | 0 | 0.862 | 0.689 | 0.2 | 0 | 1 | 0.505
103 | Cimetidine | 0 | 0 | 1.434 | 0.908 | 0.051 | 0 | 1 | 0.000
104 | Cromolyn | 1 | 0 | 0.899 | 1.016 | 0.059 | 0 | 2 | −0.495
105 | Digoxin | 0 | 1 | 0.876 | 0.871 | 0.039 | 0 | 0 | 0.494
106 | Esmolol | 0 | 0 | 0.941 | 1.419 | 0.041 | 0 | 0 | 0.279
107 | Flurbiprofen | 1 | 0 | 0.534 | 1.109 | 0.1 | 0 | 0 | −0.921
108 | Gatifloxacin | 0 | 0 | 0.781 | 1.146 | 0.09 | 0 | 1 | 0.243
109 | Glipizide | 1 | 0 | 0.65 | 1.453 | 0.077 | 0 | 1 | −0.796
110 | Glyburide | 1 | 0 | 0.692 | 1.228 | 0.079 | 0 | 0 | −0.854
111 | Griseofulvin | 0 | 1 | 0.998 | 1.058 | 0.156 | 0 | 0 | 0.176
112 | Naproxen | 1 | 0 | 0.72 | 1.214 | 0.051 | 0 | 0 | −0.796
113 | Phenobarbital | 1 | 0 | 0.69 | 1.214 | 0.028 | 0 | 0 | −0.201
114 | Piroxicam | 1 | 0 | 0.619 | 1 | 0.174 | 0 | 1 | −0.854
115 | Prazosin | 0 | 0 | 1.13 | 0.666 | 0.041 | 0 | 2 | −0.174
116 | Prednisone | 0 | 1 | 0.534 | 1.253 | 0.074 | 0 | 0 | −0.013
117 | Propiverine | 0 | 0 | 0.919 | 1.546 | 0.103 | 1 | 0 | 0.279
118 | Pyrazinamide | 0 | 1 | 0.937 | 2.38 | 0 | 0 | 1 | −0.155
119 | Quinine | 0 | 0 | 0.847 | 1.236 | 0.072 | 1 | 1 | 0.204
120 | Ropivacaine | 0 | 0 | 0.774 | 2.147 | 0.06 | 0 | 0 | −0.260
121 | Timolol | 0 | 0 | 1.133 | 1.484 | 0.07 | 0 | 1 | 0.544
2.4 Artificial neural network

An artificial neural network (ANN) is a machine learning method suitable for modeling non-linear relationships. The theory and application of ANNs in QSPR modeling are discussed extensively in many reviews (Haykin, 2006; Zupan and Gasteiger, 1999; Zupan, 1994). A conventional three-layered back-propagation network was employed in this study (Wythoff, 1993). The back-propagation ANN uses a supervised learning technique in which the network is trained by minimizing the squared error between the desired values and the network's output. This error is propagated backwards through the network, and the connection weights are adjusted by a gradient descent algorithm to reduce it. A learning rate parameter η influences the rate of weight adjustment, and a momentum term μ prevents sudden changes in the direction in which corrections are made. Over-fitting was minimized by monitoring the performance of the network during training on a validation data set. A 10-fold cross-validation was carried out during the selection of η, μ and the number of neurons in the hidden layer.
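The weight update described above, a gradient descent step scaled by η plus a momentum term μ times the previous weight change, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the software used in this study: the class name `TinyMLP`, the toy data and the weight initialization are hypothetical, and the targets are assumed to be rescaled to the interval (0, 1) because the output neuron is a sigmoid. The defaults η = 0.2 and μ = 0.3 are the values reported in Section 3.2.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Three-layer network: inputs -> sigmoid hidden layer -> one sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row carries a trailing bias weight.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, needed for the momentum term.
        self.dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_o = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]  # inputs plus bias input
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, y

    def train_step(self, x, target, eta=0.2, mu=0.3):
        """One online update; returns the squared-error loss before the update."""
        xb, hb, y = self.forward(x)
        # Output delta for E = 0.5 * (y - target)^2 with a sigmoid output.
        delta_o = (y - target) * y * (1.0 - y)
        # Hidden deltas: the error propagated backwards through the output weights.
        delta_h = [delta_o * self.w_o[j] * hb[j] * (1.0 - hb[j])
                   for j in range(len(self.w_h))]
        for j in range(len(self.w_o)):
            self.dw_o[j] = -eta * delta_o * hb[j] + mu * self.dw_o[j]
            self.w_o[j] += self.dw_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                self.dw_h[j][i] = -eta * delta_h[j] * xb[i] + mu * self.dw_h[j][i]
                row[i] += self.dw_h[j][i]
        return 0.5 * (y - target) ** 2

# Hypothetical toy data: the target depends only on the first input.
toy = [([0.0, 0.0], 0.2), ([0.0, 1.0], 0.2), ([1.0, 0.0], 0.8), ([1.0, 1.0], 0.8)]
net = TinyMLP(n_in=2, n_hidden=4)
history = []
for epoch in range(2000):
    history.append(sum(net.train_step(x, t) for x, t in toy) / len(toy))
print(history[0], history[-1])
```

In practice the loss on a held-out validation set would be tracked alongside `history`, and training stopped when it starts to rise, which is the over-fitting control mentioned in the text.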
2.5 Support vector machines

Support vector machine (SVM) is a machine learning technique based on the statistical learning theory developed by Vapnik for pattern classification problems (Vapnik, 1995; Cortes and Vapnik, 1995). The technique is built on the structural risk minimization (SRM) principle, which is superior to the traditional empirical risk minimization (ERM) principle: ERM only minimizes the error on the training data, whereas SRM minimizes an upper bound on the expected risk. SVM is used to solve non-linear regression problems in QSAR/QSPR studies, and the details are given in the literature (Burges, 1998; Doucet et al., 2007; Li et al., 2009; Ivanciuc, 2007). The main concepts of SVM are briefly described below.

The correlation between structure and property can be defined by $y_i = f(x_i)$. The term $f(x_i)$ can be represented by a linear function of the form

$$f(x_i) = \langle w, x_i \rangle + b,$$

where $w$ is the weight vector of the linear function and $b$ is a constant coefficient. In SVM, the input data are first mapped into a high-dimensional feature space by a kernel function, and linear regression is then performed in that feature space. The non-linear feature mapping allows non-linear problems to be treated in a linear space. In the higher-dimensional feature space, SVM approximates the data with a linear function,

$$y = \sum_{i=1}^{m} w_i \Phi(x_i) + b,$$

where $\Phi(x_i)$ are the features of the input variables after kernel transformation, while $w_i$ and $b$ are coefficients. The radial basis function (RBF) or Gaussian kernel, which is the kernel most commonly used in QSPR problems, was applied in this study. The RBF kernel performs the non-linear mapping described by the following equation:

$$k(x, y) = \exp\left(-\gamma \lVert x - y \rVert^2\right).$$

The new feature space obtained after kernel transformation allows the data to be separated linearly by hyperplanes or fitted by linear regression. The coefficients $w$ and $b$ are estimated by minimizing the regularized risk function, which is defined as

$$R(C) = C \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon\left(y_i, f(x_i, w)\right) + \frac{1}{2} \lVert w \rVert^2.$$

The first term, $C \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon(y_i, f(x_i, w))$, is the empirical error (risk), measured by the ε-insensitive loss function. ε is a prescribed parameter referred to as the tube size, and it is defined as the approximation accuracy placed on the training data points. The loss function ignores errors as long as they are smaller than ε; in other words, errors below ε are not penalized. The second term, $\frac{1}{2} \lVert w \rVert^2$, is the regularization term, and it is a measure of function flatness. The value of the cost parameter C determines the trade-off between the empirical error and the regularization term. The minimization of the regularized risk function is a constrained optimization problem, which can be reformulated as a dual problem by using Lagrange multipliers. The calculation was performed using John Platt's sequential minimal optimization (SMO) algorithm, as modified by the method proposed by Smola and Scholkopf (Smola and Scholkopf, 2004; Shevade et al., 1999).

The performance of SVM depends on the choice of kernel type and on the values of the RBF kernel parameter γ (gamma), the complexity or regularization parameter C, and the ε of the ε-insensitive loss function. The RBF kernel parameter γ controls the width of the Gaussian function and also affects the generalization ability of the SVM. The value of ε also determines the number of support vectors: the higher the value, the fewer support vectors are selected. A ten-fold cross-validation procedure was used to select the optimum values of these parameters.

2.6 Validation techniques and model performance evaluation

To validate the ANN and SVM models and assess their performance, we used a 10-fold cross-validation procedure in which the data set was split into 10 folds; one fold was used for testing and the rest for training. The procedure was repeated 10 times, so that every data point was used as test data once, and the outputs were averaged. The error estimate, the root mean square error (RMSE), was calculated for each fold and then averaged. The RMSE was calculated using the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\log Vd_{\mathrm{exp},i} - \log Vd_{\mathrm{pred},i}\right)^2},$$

where $n$ is the number of data points, $\log Vd_{\mathrm{pred}}$ is the output predicted by the model for a given input, and $\log Vd_{\mathrm{exp}}$ is the desired output for the same input. The overall accuracy of the predicted parameters was expressed in terms of the average fold error (AFE), which was calculated as the mean of the individual fold-error values (Wajima et al., 2002). The fold error (FE) was calculated according to the following equation:

$$\mathrm{FE} = \operatorname{antilog}\left(\left|\log Vd_{\mathrm{exp}} - \log Vd_{\mathrm{pred}}\right|\right) = 10^{\left|\log Vd_{\mathrm{exp}} - \log Vd_{\mathrm{pred}}\right|}.$$
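The validation measures defined in this section can be written compactly as below. This is a minimal sketch assuming base-10 logarithms (consistent with logVd = −1 corresponding to 0.1 L/kg); the function names and the fold-splitting scheme are illustrative choices rather than the procedure of any particular software package. The three example values are the experimental and ANN-predicted logVd of the first three test compounds in Table 3.

```python
import math
import random

def rmse(log_exp, log_pred):
    """Root mean square error between experimental and predicted logVd."""
    n = len(log_exp)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(log_exp, log_pred)) / n)

def fold_error(log_exp, log_pred):
    """Fold error: antilog (base 10) of the absolute log-scale difference."""
    return 10 ** abs(log_exp - log_pred)

def average_fold_error(log_exp, log_pred):
    """Mean of the individual fold errors (AFE)."""
    return sum(fold_error(e, p) for e, p in zip(log_exp, log_pred)) / len(log_exp)

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; every index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Amobarbital, atropine and bupivacaine: experimental vs. ANN-predicted logVd.
log_exp = [0.021, 0.301, -0.155]
log_pred = [-0.26, 0.227, 0.448]
print(rmse(log_exp, log_pred), average_fold_error(log_exp, log_pred))
```

A fold error of 2 means the prediction is off by a factor of two in either direction, which is why an AFE below 2 is read as "within 2-fold of the actual value".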
3 Results and discussion

3.1 Descriptor selection and linear model

We computed a total of 1707 descriptors using E-DRAGON and Chemaxon, including 1D, 2D and 3D descriptors. The training data set was used for descriptor selection with the CFS method, which resulted in a subset of forty descriptors. The reduced descriptor subset, along with two indicator parameters, Ia accounting for the acidic nature of a compound and In indicating neutral compounds, was then used for linear model building by forward stepwise multiple linear regression analysis. Seven statistically significant descriptors were selected, resulting in the following regression equation with an R² value of 0.782. The selected descriptors and their values are shown in Table 1.

$$\log Vd = 0.456 - 0.796(\pm 0.147)\,I_a - 0.371(\pm 0.143)\,I_n + 0.352(\pm 0.263)\,\mathrm{GATS1e} - 0.233(\pm 0.182)\,\mathrm{GATS5e} - 0.846(\pm 0.783)\,\mathrm{HATS8m} + 0.318(\pm 0.154)\,\mathrm{Psy80} - 0.184(\pm 0.085)\,\mathrm{HArRc}$$

N = 97, R = 0.884, R² = 0.782, SE = 0.265, F = 45.66; PRESS = 6.245, SSY = 22.426, PRESS/SSY = 0.278, R²cv = 0.722; RMSE = 0.254.

Here R is the correlation coefficient and SE is the standard error of estimate. The figures in parentheses are the standard errors of the regression coefficients, and F is the F-statistic. A reliable MLR model is one that has a high R² and a low SE. The correlation matrix for the selected parameters is given in Table 2 and shows no strong inter-correlation among the selected descriptors. To check the inter-correlation of the descriptors further, the variance inflation factor (VIF) and tolerance were calculated (Draper and Smith, 1981): VIF = 1/(1 − R²), where R² is obtained by regressing one descriptor on the remaining descriptors, and tolerance = 1/VIF. In practice, a VIF greater than 5 or a tolerance below 0.20 indicates multicollinearity among the descriptors. The calculated VIF and tolerance values for each descriptor are shown in Table 2. The VIF values are all less than 2 and the tolerance values are all higher than 0.2, indicating that there is no multicollinearity and that the equation is stable.
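As a worked check of the regression equation and the collinearity statistics, the sketch below evaluates the linear model for acetaminophen using its descriptor values from Table 1 and recomputes VIF and the cross-validated R² from the reported figures. It is an illustration only: the function names are hypothetical, the HArRc coefficient is taken as negative (as stated in the discussion of that descriptor), and R²cv is assumed to follow the usual definition 1 − PRESS/SSY.

```python
# Coefficients of the seven-descriptor MLR model reported in the text.
INTERCEPT = 0.456
COEFFS = {
    "Ia": -0.796, "In": -0.371, "GATS1e": 0.352, "GATS5e": -0.233,
    "HATS8m": -0.846, "Psy80": 0.318, "HArRc": -0.184,
}

def predict_log_vd(descriptors):
    """Evaluate the linear model for one compound."""
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)

def vif_from_tolerance(tolerance):
    """VIF = 1 / (1 - R_j^2), with tolerance = 1 - R_j^2."""
    return 1.0 / tolerance

def r2_cv(press, ssy):
    """Cross-validated R^2, assuming the usual definition 1 - PRESS/SSY."""
    return 1.0 - press / ssy

# Descriptor values of acetaminophen (compound 1 in Table 1).
acetaminophen = {
    "Ia": 0, "In": 1, "GATS1e": 0.672, "GATS5e": 1.124,
    "HATS8m": 0.023, "Psy80": 0, "HArRc": 0,
}
print(round(predict_log_vd(acetaminophen), 3))  # Table 3 lists 0.041 for MLR
print(round(vif_from_tolerance(0.782), 3))      # Table 2 lists 1.278
print(round(r2_cv(6.245, 22.426), 3))           # reported R2cv is 0.722
```

The small difference between the recomputed prediction and the tabulated MLR value comes from the rounding of the published coefficients.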
Table 2 Inter-correlation of descriptors and collinearity statistics

 | logVd | GATS1e | GATS5e | HATS8m | Psy80 | HArRc | Ia | In | Tolerance | VIF
logVd | 1 | | | | | | | | |
GATS1e | 0.369 | 1 | | | | | | | 0.782 | 1.278
GATS5e | −0.285 | −0.160 | 1 | | | | | | 0.851 | 1.175
HATS8m | −0.368 | −0.060 | −0.153 | 1 | | | | | 0.781 | 1.281
Psy80 | 0.375 | 0.180 | −0.273 | 0.077 | 1 | | | | 0.880 | 1.137
HArRc | −0.323 | 0.202 | −0.055 | 0.251 | −0.071 | 1 | | | 0.825 | 1.212
Ia | −0.684 | −0.235 | 0.189 | 0.324 | −0.099 | 0.0472 | 1 | | 0.640 | 1.563
In | −0.113 | −0.144 | −0.095 | 0.008 | −0.071 | 0.1668 | −0.375 | 1 | 0.731 | 1.369

We used an internal cross-validation method and calculated the cross-validation parameters PRESS (predicted residual sum of squares), SSY (sum of squares of the response values) and R²cv (cross-validated correlation coefficient). PRESS is a good estimate of the real predictive power of a QSPR/QSPkR model. The value of PRESS is smaller than SSY for the regression equation, so the model predicts better than chance. The ratio PRESS/SSY can be used to calculate approximate confidence intervals for predictions of new observations. For a reasonable QSPR/QSPkR model, PRESS/SSY should be lower than 0.4, and the value of 0.278 indicates a good model. The linear model was cross-validated by the leave-one-out method. The high value observed (R²cv = 0.722) indicates reliability in the prediction of logVd. The R² value of 0.782 shows that the model accounts for 78% of the variance in logVd.

The selected descriptors GATS1e, GATS5e and HATS8m are autocorrelation descriptors of chemical compounds, calculated from molecular properties that can be represented at the atomic level (Todeschini and Consonni, 2009). GATS1e and GATS5e are Geary autocorrelations of lag 1 and lag 5, weighted by atomic Sanderson electronegativities, and belong to the 2D autocorrelation indices, while HATS8m is a leverage-weighted autocorrelation of lag 8, weighted by atomic masses, and belongs to the GETAWAY descriptors. The HATS indices are defined by weighting each atom of the molecule by its physico-chemical properties combined with the diagonal elements of the molecular influence matrix H, thus accounting for 3D features of the molecules. The calculation of these indices is based on the atomic mass weighting scheme scaled on the carbon atom.

Psychotic-80 (Psy80) is an antipsychotic-like index proposed by Ghose, Viswanadhan and Wendoloski (Ghose et al., 1999). It is derived from an analysis of the distribution of logP, molar refractivity, molecular weight, number of atoms and chemical constitution of known antipsychotic drug molecules available in the Comprehensive Medicinal Chemistry database. The property ranges cover approximately 80% of the drugs studied. The positive sign of Psy80 in the equation indicates that increasing antipsychotic drug-like properties or structural features will increase Vd. It is interesting to note that most antipsychotic drugs have an increased volume of distribution. They tend to exhibit moderate plasma protein binding, coupled with extensive tissue localization and strong tissue binding, resulting in a large Vd (Gorgia, 1993).
78
Interdiscip Sci Comput Life Sci (2014) 6: 71–83
Table 3 Compd.
The predicted logVd values using MLR, ANN and SVM models
Drug name
logVd(exp)
MLR
Residual
ANN
Residual
logVdpred
logVdpred
SVM
Residual
logVdpred
1
Acetaminophen
−0.022
0.041
0.063
0.141
0.163
0.034
0.056
2
Acyclovir
−0.161
−0.207
−0.046
−0.057
0.104
−0.176
−0.015
3
Adefovir
−0.377
−0.651
−0.274
−0.244
0.133
−0.588
−0.211
−0.143
−0.166
−0.024
−0.035
0.108
−0.135
0.008
0.919
1.018
0.099
1.086
0.167
1.035
0.116
4
Alprazolam
5
Amitriptyline
6
Amlodipine
1.204
0.41
−0.794
0.233
−0.971
0.396
−0.808
7
Amoxicillin
−0.071
0.363
0.434
0.264
0.335
0.348
0.419
8
Antipyrine
−0.222
−0.09
0.131
−0.014
0.208
−0.089
0.133
9
Apomorphine
0.301
0.35
0.049
0.367
0.066
0.331
0.03
10
Aspirin
−0.824
−0.369
0.455
−0.195
0.629
−0.401
0.423
11
Atenolol
−0.032
0.363
0.395
0.37
0.401
0.33
0.362
12
Aztreonam
−0.745
−0.682
0.063
−0.827
−0.083
−0.683
0.062
13
Caffeine
−0.215
−0.156
0.059
−0.075
0.139
−0.146
0.069
14
Cefazolin
-1.000
−0.895
0.105
−0.708
0.292
−0.874
0.126
15
Cefixime
−0.620
−0.748
−0.128
−0.87
−0.25
−0.761
−0.142
16
Cefoperazone
−0.658
−0.596
0.061
−0.558
0.1
−0.589
0.069
17
Cefotaxime
−0.553
−0.642
−0.089
−0.735
−0.182
−0.639
−0.086
18
Ceftriaxone
−0.796
−0.888
−0.092
−0.724
0.072
−0.863
−0.067
19
Chlorambucil
−0.538
−0.395
0.142
−0.409
0.129
−0.419
0.118
20
Chloramphenicol
−0.027
−0.178
−0.152
−0.282
−0.255
−0.167
−0.141
21
Ciprofloxacin
0.260
0.176
−0.085
0.295
0.035
0.178
−0.082
22
Clonidine
0.322
0.641
0.318
0.591
0.269
0.669
0.346
23
Clozapine
0.732
0.814
0.081
0.534
−0.198
0.802
0.069
24
Codeine
0.544
0.479
−0.065
0.462
−0.082
0.475
−0.069
25
Dexamethasone
0.057
−0.08
−0.137
0.062
0.005
−0.089
−0.146
26
Diazepam
0.114
0.01
−0.104
0.104
−0.01
0.025
−0.089
27
Diclofenac
−0.770
−0.644
0.125
−0.834
−0.065
−0.702
0.068
28
Diltiazem
0.491
0.749
0.258
0.618
0.126
0.728
0.237
29
Diprophylline
−0.097
−0.313
−0.216
−0.084
0.013
−0.294
−0.197
30
Doxepin
1.068
0.88
−0.189
0.902
−0.167
0.884
−0.184
31
Enalapril
0.230
−0.052
−0.282
0.021
−0.209
−0.018
−0.249
32
Enoxacin
0.292
0.008
−0.284
0.318
0.026
0.025
−0.267
33
Estradiol
0.079
−0.014
−0.093
0.121
0.042
−0.014
−0.094
34
Famotidine
0.079
0.198
0.119
0.267
0.188
0.204
0.125
35
Furosemide
−0.886
−0.769
0.117
−0.81
0.076
−0.791
0.095
36
Gliclazide
−0.444
−0.575
−0.131
−0.62
−0.176
−0.635
−0.191
37
Gliquidone
−0.796
−0.458
0.338
−0.476
0.32
−0.497
0.299
38
Haloperidol
1.255
0.76
−0.495
0.912
−0.343
0.76
−0.495
39
Hydrochlorothiazide
−0.081
−0.47
−0.389
−0.358
−0.277
−0.518
−0.437
40
Hydrocortisone
−0.357
−0.073
0.283
0.071
0.427
−0.083
0.273
41
Ibuprofen
−0.824
−0.502
0.322
−0.444
0.38
−0.552
0.272
42
Imipramine
1.322
1.03
−0.292
1.083
−0.239
1.045
−0.277
43
Indomethacin
-1.000
−0.361
0.639
−0.695
0.305
−0.388
0.612
44
Isoniazide
−0.174
−0.028
0.146
0.006
0.18
−0.018
0.156
45
Ketoprofen
−0.824
−0.495
0.329
−0.539
0.285
−0.536
0.288
Interdiscip Sci Comput Life Sci (2014) 6: 71–83
79
Continued Compd.
Drug name
logVd(exp)
MLR
Residual
ANN
Residual
logVdpred
logVdpred
46
Lamivudine
0.114
0.027
−0.087
0.029
47
lansoprazole
−0.456
−0.513
−0.057
48
Levofloxacin
0.134
0.219
0.085
49
Lidocaine
−0.143
0.066
SVM
Residual
logVdpred −0.085
0.048
−0.066
−0.163
0.293
−0.48
−0.025
0.322
0.189
0.225
0.091
0.209
−0.012
0.131
−0.044
0.098
50
Lincomycin
0.114
0.463
0.349
0.408
0.294
0.462
0.348
51
Lomefloxacin
0.362
0.161
−0.201
0.293
−0.069
0.159
−0.203
52
Medroxyprogesterone
0.668
0.281
−0.387
0.532
−0.137
0.253
−0.415
53
Meloxicam
−0.699
−0.654
0.045
−0.552
0.147
−0.66
0.039
54
Meperidine
0.431
0.508
0.077
0.469
0.038
0.512
0.081
55
Metformin
0.389
0.671
0.282
0.546
0.157
0.641
0.251
56
Metoclopramide
0.531
0.54
0.008
0.482
−0.05
0.546
0.015
57
Metolazone
0.207
−0.239
−0.446
−0.163
−0.369
−0.243
−0.45
58
Metoprolol
0.572
0.43
−0.142
0.408
−0.164
0.404
−0.167
59
Midazolam
0.146
−0.235
−0.381
−0.074
−0.22
−0.208
−0.355
60
Minoxidil
0.477
0.341
−0.136
0.388
−0.089
0.367
−0.11
61
Mirtazapine
0.681
0.514
−0.167
0.424
−0.258
0.559
−0.122
62
Morphine
0.415
0.419
0.004
0.436
0.021
0.406
−0.009
63
Nefopam
0.859
0.617
−0.242
0.548
−0.311
0.637
−0.222
64
Nicardipine
−0.009
0.287
0.296
0.069
0.078
0.258
0.267
65
Nifedipine
−0.108
−0.04
0.068
0.064
0.172
−0.039
0.069
66
Nimodipine
0.176
0.273
0.097
0.244
0.068
0.244
0.068
67
Norfloxacin
0.447
0.169
−0.278
0.295
−0.152
0.169
−0.278
68
Omeprazole
−0.469
0.024
0.493
−0.265
0.204
−0.115
0.354
69
Oxazepam
−0.229
−0.081
0.148
0.007
0.236
−0.075
0.155
70
Penicillin
−0.347
−0.463
−0.116
−0.553
−0.206
−0.498
−0.152
71
Pentobarbital
0.000
−0.345
−0.345
−0.231
−0.231
−0.363
−0.363
72
Phenytoin
−0.194
−0.471
−0.277
−0.377
−0.183
−0.517
−0.323
73
Pravastatin
−0.337
−0.374
−0.037
−0.383
−0.046
−0.392
−0.055
74
Prednisolone
−0.284
−0.08
0.204
0.064
0.348
−0.089
0.195
75
Prilocaine
0.568
0.27
−0.298
0.295
−0.273
0.219
−0.349
76
Promethazine
1.146
1.059
−0.087
1.151
0.005
1.08
−0.066
77
Propafenone
0.556
0.659
0.102
0.684
0.128
0.626
0.07
78
Propranolol
0.604
0.347
−0.258
0.342
−0.262
0.313
−0.291
79
Pseudoephedrine
0.453
0.452
−0.001
0.475
0.021
0.449
−0.004
80
Quinidine
0.544
0.537
−0.007
0.354
−0.19
0.474
−0.07
81
Ranitidine
0.114
0.268
0.154
0.359
0.245
0.281
0.167
82
Risperidone
0.041
0.457
0.415
0.169
0.128
0.364
0.323
83
Salbutamol
0.279
0.277
−0.002
0.293
0.014
0.236
−0.043
84
Salicylic acid
−0.770
−0.612
0.158
−0.473
0.297
−0.704
0.066
85
Simvastatin
−0.009
0.094
0.103
0.18
0.189
0.11
0.119
86
Sulfadiazine
−0.036
−0.759
−0.723
−0.417
−0.381
−0.798
−0.762
87
Sulpiride
0.398
0.331
−0.067
0.253
−0.145
0.305
−0.093
88
Theophylline
−0.244
−0.303
−0.059
−0.111
0.133
−0.313
−0.069
89
Tinidazole
−0.155
0.122
0.277
0.189
0.344
0.127
0.281
80
Interdiscip Sci Comput Life Sci (2014) 6: 71–83
Continued Compd.
Drug name
logVd(exp)
MLR
Residual
ANN
Residual
logVdpred
logVdpred
SVM
Residual
logVdpred
90  Tolbutamide  −0.854  −0.576  0.278  −0.592  0.262  −0.641  0.213
91  Tramadol  0.447  0.493  0.046  0.458  0.011  0.494  0.047
92  Triamcinolone  0.246  −0.084  −0.329  0.051  −0.195  −0.09  −0.335
93  Triazolam  −0.174  0.13  0.304  0.023  0.197  0.132  0.306
94  Trimethoprim  0.204  0.402  0.198  0.37  0.166  0.434  0.23
95  Venlafaxine  0.643  0.483  −0.16  0.434  −0.209  0.484  −0.16
96  Verapamil  0.670  0.635  −0.035  0.551  −0.12  0.656  −0.015
97  Warfarin  −0.854  −0.7  0.154  −0.509  0.345  −0.719  0.135
Test data
98  Amobarbital  0.021  −0.329  −0.35  −0.26  −0.281  −0.341  −0.362
99  Atropine  0.301  0.273  −0.028  0.227  −0.074  0.228  −0.073
100  Bupivacaine  −0.155  0.514  0.669  0.448  0.603  0.438  0.593
101  Carbamazepine  0.146  −0.124  −0.27  0.031  −0.116  −0.17  −0.316
102  Chlorpheniramine  0.505  0.246  −0.259  0.172  −0.333  0.266  −0.239
103  Cimetidine  0.000  0.523  0.523  0.418  0.418  0.567  0.567
104  Cromolyn  −0.495  −0.678  −0.183  0.01  0.505  −0.635  −0.14
105  Digoxin  0.494  0.158  −0.336  0.213  −0.281  0.167  −0.327
106  Esmolol  0.279  0.423  0.144  0.4  0.121  0.401  0.122
107  Flurbiprofen  −0.921  −0.494  0.427  −0.55  0.371  −0.534  0.387
108  Gatifloxacin  0.243  0.204  −0.039  0.301  0.058  0.21  −0.033
109  Glipizide  −0.796  −0.698  0.098  −0.416  0.38  −0.72  0.075
110  Glyburide  −0.854  −0.449  0.405  −0.474  0.38  −0.485  0.369
111  Griseofulvin  0.176  0.059  −0.118  0.11  −0.066  0.073  −0.103
112  Naproxen  −0.796  −0.412  0.384  −0.367  0.429  −0.443  0.353
113  Phenobarbital  −0.201  −0.403  −0.202  −0.292  −0.092  −0.434  −0.234
114  Piroxicam  −0.854  −0.686  0.168  −0.642  0.212  −0.695  0.159
115  Prazosin  −0.174  0.296  0.47  0.374  0.548  0.352  0.526
116  Prednisone  −0.013  −0.081  −0.068  0.063  0.076  −0.09  −0.077
117  Propiverine  0.279  0.651  0.372  0.558  0.28  0.606  0.327
118  Pyrazinamide  −0.155  −0.323  −0.168  −0.104  0.051  −0.379  −0.224
119  Quinine  0.204  0.54  0.336  0.362  0.158  0.477  0.273
120  Ropivacaine  −0.260  0.178  0.438  0.167  0.426  0.102  0.362
121  Timolol  0.544  0.267  −0.278  0.33  −0.214  0.27  −0.274
HArRc (heteroaromatic ring count) is a constitutional descriptor. Increasing the heteroaromatic ring count in a molecule results in increased human serum albumin (HSA) binding (Ritchie et al., 2011), and increased HSA binding of a drug results in a lower volume of distribution. It is worth noting that the sign of the HArRc descriptor in the equation is negative, so increasing this parameter causes Vd to decrease. Similarly, the parameter indicating acidity (Ia) also has a negative influence on Vd: increasing the acidity of a compound also increases HSA binding and consequently lowers Vd (Kratochwil et al., 2004).
3.2 ANN models
The architecture of the ANN consists of seven neurons in the input layer, four neurons in the hidden layer (selected by the auto build function), and one output neuron, which gives the logVd value. The input neurons correspond to the seven selected descriptors Ia, In, GATS1e, GATS5e, HATS8m, Psy80 and HArRc. A sigmoid transfer function was used in all the neurons. The optimum values of the learning rate η and momentum μ were determined by varying them from 0.01 to 1.0; the combination η = 0.2 and μ = 0.3, which gives the lowest RMSE, was selected. The optimization was done with 10-fold cross validation, and 20% of the data were used for validation. The learning time was set to 500. With the above selected parameters, the number of neurons in the hidden layer was optimized by varying it from 1 to 10; the ANN model with four hidden neurons gives the lowest RMSE. When all the training data were trained in the network using the optimized parameters with the 7-4-1 architecture, the model gives R2 = 0.819 and RMSE = 0.235. A plot of the experimental and predicted values of logVd for the training data using the ANN model is shown in Fig. 2. The predicted values of logVd for the training data are shown in Table 3.
Fig. 1 Plot of logVd predicted versus experimental using MLR model (R2 = 0.782).
Fig. 2 Plot of logVd predicted versus experimental using ANN model (R2 = 0.818).
3.3 SVM models
The seven molecular descriptors selected by the linear method were used as inputs for the SVM. To make the learning process stable, a large value of C (C = 100) was used initially (Wang et al., 2003). If the value of C is too small, insufficient stress is placed on fitting the training data. The SVM was trained with γ values varying from 0.001 to 0.5, and the optimum value of 0.01, which gives the lowest RMSE, was selected. The optimum value of ε was found by varying it from 0.01 to 0.2; the value 0.029 gives the lowest RMSE. After the ε and γ values were found, the C value was further optimized to 60. The selected parameters (γ = 0.01, ε = 0.029, C = 60) were used for the final training run on the training set and for predicting the logVd values. The plot of predicted versus experimental logVd based on this model is shown in Fig. 3, and the values are shown in Table 3. The statistical parameters of this model are RMSE = 0.245 and R2 = 0.794. This SVM model was used to predict the logVd values of the test data set, and the predicted values are given in Table 3.
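The tuning procedure described for the ANN and SVM models follows one pattern: hold the other parameters fixed, sweep one parameter over a range, keep the value with the lowest RMSE, then move to the next parameter. The sketch below illustrates that pattern only and is not the authors' implementation. The function `tune_one_at_a_time`, the candidate grids, and the toy objective `toy_rmse` are hypothetical stand-ins; in practice the objective would train the model and return its validation RMSE.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def tune_one_at_a_time(
    rmse_for: Callable[[Params], float],
    start: Params,
    grids: Dict[str, List[float]],
) -> Params:
    """Sweep each hyperparameter in turn, keeping the value with the lowest RMSE.

    `grids` is processed in insertion order: each parameter is fixed at its
    best value before the next one is swept.
    """
    best = dict(start)
    for name, candidates in grids.items():
        scores = {}
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            scores[value] = rmse_for(trial)
        best[name] = min(scores, key=scores.get)
    return best


def toy_rmse(p: Params) -> float:
    """Hypothetical stand-in for 'train the model and return its RMSE'.

    It is an artificial bowl whose minimum is placed at the values reported
    in the text (gamma = 0.01, epsilon = 0.029, C = 60). It is NOT real data.
    """
    return (
        abs(p["gamma"] - 0.01)
        + abs(p["epsilon"] - 0.029)
        + abs(p["C"] - 60.0) / 1000.0
    )


# Start from a large C, as in the text, then sweep gamma, epsilon and C in order.
# The grid points are illustrative; the text gives only the swept ranges.
start = {"gamma": 0.1, "epsilon": 0.1, "C": 100.0}
grids = {
    "gamma": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
    "epsilon": [0.01, 0.029, 0.05, 0.1, 0.2],
    "C": [20.0, 40.0, 60.0, 80.0, 100.0],
}
best = tune_one_at_a_time(toy_rmse, start, grids)
print(best)
```

A one-at-a-time sweep is cheaper than a full grid over all combinations, but it can miss the joint optimum when the parameters interact, which is worth keeping in mind when interpreting the selected values.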
Fig. 3 Plot of logVd predicted versus experimental using SVM model (R2 = 0.794).
3.4 Comparison of MLR, ANN and SVM models
Initially we modeled the volume of distribution values using the linear MLR method. This approach was easily interpretable and was able to predict the logVd values with R2 = 0.782 and RMSE = 0.254. The descriptors selected provided some insights into the structural factors influencing the volume of distribution. To account for possible nonlinearity in the relationship between the selected descriptors and logVd, we developed the ANN and SVM models using the same set of descriptors. The statistical analysis of the three methodologies is presented in Table 4. The ANN model gave the best performance for the training data in the present study, with the highest R2 (0.819) and the lowest RMSE (0.235) of the three models. Its training set prediction accuracy corresponds to an average fold error (AFE) of 1.66. For the ANN model, the percentage of compounds within 2-fold error is 84, compared with 79 and 78 for the SVM and MLR models, respectively. To evaluate the predictive ability of the proposed methods, an external validation was also performed with a test set not used in model building. Judging from the R2 values given in Table 4, the predictive ability of the ANN model is the highest, although the SVM model has a slightly lower test-set RMSE (0.312 versus 0.317). The prediction accuracy was measured using AFE. The AFE of the test set is 2.00 for the ANN model, compared with 1.99 and 2.05 for the SVM and MLR models, which indicates that the prediction accuracy of all three models is almost identical. In comparison with the error normally associated with prediction using interspecies scaling, which is reported to be in the range of 1.56-2.78 (Obach et al., 1997; Mahmood and Balian, 1996), our prediction accuracy is good. It is worth mentioning that the predictions were made solely from molecular structure using theoretically calculated parameters, without using any biological data as descriptors.
Table 4
Statistical results of different QSPkR models
Model  R  R2  RMSE  AFE
MLR(train)  0.884  0.782  0.254  1.709
ANN(train)  0.905  0.819  0.235  1.658
SVM(train)  0.891  0.794  0.245  1.681
MLR(test)  0.749  0.561  0.323  2.05
ANN(test)  0.788  0.621  0.317  2.00
SVM(test)  0.762  0.581  0.312  1.99
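As a worked check of the accuracy measures compared in this section, the sketch below recomputes RMSE, AFE, and the share of compounds within 2-fold error from the 24 ANN test-set residuals listed in Table 3. The formulas are assumptions, since this section does not restate them: RMSE is taken as the root mean square of the residuals, and AFE as the arithmetic mean of the individual fold errors 10^|residual|. The function names are illustrative.

```python
import math
from typing import List

# ANN residuals (logVd predicted minus experimental) for test compounds 98-121,
# transcribed from Table 3.
ANN_TEST_RESIDUALS: List[float] = [
    -0.281, -0.074, 0.603, -0.116, -0.333, 0.418, 0.505, -0.281,
    0.121, 0.371, 0.058, 0.380, 0.380, -0.066, 0.429, -0.092,
    0.212, 0.548, 0.076, 0.280, 0.051, 0.158, 0.426, -0.214,
]


def rmse(residuals: List[float]) -> float:
    """Root mean square of the residuals (in log10 units)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def average_fold_error(residuals: List[float]) -> float:
    """Assumed AFE definition: mean of the per-compound fold errors 10**|r|."""
    return sum(10 ** abs(r) for r in residuals) / len(residuals)


def fraction_within_fold(residuals: List[float], fold: float = 2.0) -> float:
    """Share of compounds whose prediction is within `fold` of the observed value."""
    limit = math.log10(fold)
    return sum(1 for r in residuals if abs(r) <= limit) / len(residuals)


ann_rmse = rmse(ANN_TEST_RESIDUALS)
ann_afe = average_fold_error(ANN_TEST_RESIDUALS)
ann_within_2fold = fraction_within_fold(ANN_TEST_RESIDUALS)
print(f"RMSE = {ann_rmse:.3f}, AFE = {ann_afe:.2f}, "
      f"within 2-fold = {ann_within_2fold:.0%}")
```

Under these assumptions the recomputed RMSE and AFE agree closely with the ANN(test) row of Table 4 (0.317 and 2.00), which suggests that the table's AFE averages the fold errors themselves rather than their logarithms.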
4 Conclusions
To prevent late-stage failure in drug discovery, lead candidates must possess appropriate pharmacokinetic properties, so ADME screening has become an important part of the early stages of the drug discovery process. We have derived a relationship between chemical structure and volume of distribution values in humans for a data set of 121 structurally unrelated drugs by means of linear and nonlinear QSPkR models. The results demonstrate that QSPkR-based prediction using theoretically calculated descriptors can lead to reasonable predictions of human Vd values. The statistical analyses of the training data indicate the superiority of the ANN model over the SVM and MLR models in predictive ability and accuracy of prediction. The results also suggest that Sanderson electronegativities, atomic mass, the number of heteroaromatic rings in the molecule, antipsychotic drug-like properties, and the acidity of the molecule play a key role in the Vd values. Thus, the proposed models provide some insights into the structural features relevant for screening compounds for pharmacokinetic properties in the early drug development stage and may help reduce animal experiments.
References
[1] Balant-Gorgia, A.E., Balant, L.P., Andreoli, A. 1993. Pharmacokinetic optimization of the treatment of psychosis. Clin Pharmacokinetics 25, 217-236.
[2] Bauer, L.A. 2008. Applied Clinical Pharmacokinetics. 2nd Edition, McGraw-Hill Medical, New York.
[3] Berellini, G., Springer, C., Waters, N.J., Lombardo, F. 2009. In silico prediction of volume of distribution in human using linear and nonlinear models on a 669 compound data set. J Med Chem 52, 4488-4495.
[4] Burges, C.A. 1998. Tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2, 1-43.
[5] Cortes, C., Vapnik, V. 1995. Support vector networks. Mach Learn 20, 273-293.
[6] Davis, A.M., Riley, R.J. 2004. Predictive ADMET studies, the challenges and the opportunities. Curr Opin Chem Biol 8, 378-386.
[7] Demel, M.A., Andreas, G.K., Janecek, K.M.T., Ecker, G.F., Gansterer, W.N. 2008. Predictive QSAR models for polyspecific drug targets: The importance of feature selection. Curr Comput Aided Drug Des 4, 91-110.
[8] Doucet, J.P., Barbault, F., Xia, H., Panaye, A., Fan, B. 2007. Nonlinear SVM approaches to QSPR/QSAR studies and drug design. Curr Comput Aided Drug Des 3, 263-389.
[9] Draper, N.R., Smith, H. 1981. Applied Regression Analysis. Wiley, New York.
[10] Durairaj, C., Shah, J.C., Senapati, S., Kompella, U.B. 2009. Prediction of vitreal half-life based on drug physicochemical properties: Quantitative structure-pharmacokinetic relationships (QSPKR). Pharm Res 26, 1236-1260.
[11] Fagerholm, U. 2007. Prediction of human pharmacokinetics; evaluation of methods for prediction of volume of distribution. J Pharm Pharmacol 59, 1181-1190.
[12] Fayyad, U.M., Irani, K.B. 1993. Multi-interval discretisation of continuous valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, Morgan Kaufmann, 1022-1027.
[13] Ghafourian, T., Barzegar-Jalali, M., Dastmalchi, S., Khavari, T., Hakimiha, N., Nokhodchi, A. 2006. QSPR models for the prediction of apparent volume of distribution. Int J Pharm 319, 82-97.
[14] Ghose, A.K., Viswanadhan, V.N., Wendoloski, J.J. 1999. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem 1, 55-68.
[15] Gleeson, M.P. 2007. Plasma protein binding affinity and its relationship to molecular structure: An in-silico analysis. J Med Chem 50, 101-112.
[16] Hall, M.A., Holmes, G. 2003. Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15, 1437-1447.
[17] Haykin, S. 2006. Neural Networks. A Comprehensive Foundation. 2nd Edition, Pearson Prentice Hall, New Delhi.
[18] Hollosy, F., Valko, K., Hersey, A., Nunhuck, S., Gyorgy, K., Bevan, C. 2006. Estimation of volume of distribution in humans from high throughput HPLC-based measurements of human serum albumin binding and immobilized artificial membrane partitioning. J Med Chem 49, 6958-6971.
[19] Hou, T., Wang, J. 2008. Structure-ADME relationship: Still a long way to go? Expert Opin Drug Metab Toxicol 4, 759-770.
[20] Ivanciuc, O. 2007. Applications of support vector machines in chemistry. Rev Comput Chem 23, 291-400.
[21] Karalis, V., Tsantili-Kakoulidou, A., Macheras, P. 2002. Multivariate statistics of disposition pharmacokinetic parameters for structurally unrelated drugs. Pharm Res 19, 1827-1834.
[22] Kennedy, T. 1997. Managing the drug discovery and development interface. Drug Discov Today 2, 436-444.
[23] Kratochwil, N.A., Huber, W., Muller, F., Kansy, M., Gerber, P.R. 2004. Predicting plasma protein binding of drugs: Revisited. Curr Opin Drug Discov Dev 7, 507-510.
[24] Li, H., Liang, Y., Xu, Q. 2009. Support vector machines and its applications in chemistry. Chemom Intell Lab Syst 95, 188-198.
[25] Lombardo, F., Obach, R.S., Shalaeva, M.Y., Gao, F. 2002. Prediction of volume of distribution in humans for neutral and basic drugs using physicochemical measurements and plasma protein binding data. J Med Chem 45, 2867-2876.
[26] Lombardo, F., Obach, R.S., Shalaeva, M.Y., Gao, F. 2004. Prediction of human volume of distribution values for neutral and basic drugs. J Med Chem 47, 1242-1250.
[27] Lombardo, F., Obach, R.S., DiCapua, F.M. 2006. Hybrid mixture discriminant analysis-random forest model for the prediction of volume of distribution. J Med Chem 49, 2262-2267.
[28] Mahmood, I., Balian, J.D. 1996. Interspecies scaling: Predicting pharmacokinetic parameters of antiepileptic drugs in humans from animals with special emphasis on Cl. J Pharm Sci 85, 411-414.
[29] Obach, R.S., Baxter, J.G., Liston, T.E., Silber, B.M., Jones, B.C., MacIntyre, F., Rance, D.J., Wastall, P.J. 1997. The prediction of human pharmacokinetic parameters from preclinical and in vitro metabolism data. J Pharmacol Exp Ther 283, 46-58.
[30] Ritchie, T.J., Macdonald, S.J.F., Young, R.J., Pickett, S.D. 2011. The impact of aromatic ring count on compound developability: Further insights by examining carbo- and hetero-aromatic and aliphatic ring types. Drug Discov Today 16, 164-171.
[31] Shevade, S.K., Keerthi, S.S., Bhattacharyya, C., Murthy, K.R.K. 1999. Improvements to SMO algorithm for SVM regression. Technical report CD-99-16, Control Division Dept of Mechanical and Production Engineering, National University of Singapore, Singapore.
[32] Smola, A.J., Scholkopf, B. 2004. A tutorial on support vector regression. Stat Comput 14, 199-222.
[33] Sui, X., Sun, J., Li, H., Wang, Y., Liu, J., Liu, X., Zhang, W., Chen, L., He, Z. 2009. Prediction of volume of distribution values in human using immobilized artificial membrane partitioning coefficients, the fraction of compound ionized and plasma protein binding data. Eur J Med Chem 44, 4455-4460.
[34] Tetko, I.V. 2005. Computing chemistry on the web. Drug Discov Today 10, 1497-1500.
[35] Tetko, I.V., Gasteiger, J., Todeschini, R., Mauri, A., Livingstone, D., Ertl, P., Palyulin, V.A., Radchenko, E.V., Zefirov, N.S., Makarenko, A.S., Tanchuk, V.Y., Prokopenko, V.V. 2005. Virtual computational chemistry laboratory - design and description. J Comput Aid Mol Des 19, 453-463.
[36] Todeschini, R., Consonni, V. 2009. Molecular Descriptors for Chemoinformatics (Methods and Principles in Medicinal Chemistry). Wiley-VCH, Weinheim.
[37] Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer, New York.
[38] Wajima, T., Fukumura, K., Yano, Y., Oguma, T. 2002. Prediction of human clearance from animal data and molecular structural parameters using multivariate regression analysis. J Pharm Sci 91, 2489-2499.
[39] Wang, W.J., Xu, Z.B., Lu, W.Z., Zhang, X.Y. 2003. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing 55, 643-663.
[40] Waterbeemd, H.V.D., Gifford, E. 2003. ADMET in silico modeling: Towards prediction paradise? Nat Rev Drug Discov 2, 192-204.
[41] Witten, I.H., Frank, E. 2005. Data Mining: Practical machine learning tools and techniques. 2nd Ed., Morgan Kaufmann, San Francisco.
[42] Wythoff, B.J. 1993. Back-propagation neural networks: A tutorial. Chemom Intell Lab Syst 18, 115-155.
[43] Yap, C.W., Li, Z.R., Chen, Y.Z. 2006. Quantitative structure-pharmacokinetic relationships for drug clearance by using statistical learning methods. J Mol Graphics Modell 24, 383-395.
[44] Zupan, J. 1994. Introduction to artificial neural network (ANN) methods: What they are and how to use them? Acta Chim Slov 41, 327-352.
[45] Zupan, J., Gasteiger, J. 1999. Neural Networks in Chemistry and Drug Design. Wiley-VCH, Weinheim.