Interdiscip Sci Comput Life Sci (2014) 6: 71–83 DOI: 10.1007/s12539-014-0166-4

Prediction of Human Volume of Distribution Values for Drugs Using Linear and Nonlinear Quantitative Structure Pharmacokinetic Relationship Models

Bruno LOUIS1, Vijay K. AGRAWAL2†,∗

1 (Department of Pharmacy, Sultan Qaboos University Hospital, Al Khod, Muscat 123, Oman)
2 (QSAR and Computer Chemical Laboratories, A.P.S. University, Rewa 486003, India)

Received 22 May 2012 / Revised 29 October 2012 / Accepted 21 November 2012

Abstract: In the present study, the volume of distribution values in humans of 121 drugs were estimated using quantitative structure pharmacokinetic relationship (QSPkR) analysis. Multiple linear regression (MLR) and two nonlinear methods, artificial neural networks (ANN) and support vector machines (SVM), were employed for modeling. Theoretically calculated molecular descriptors were used, and the best set of descriptors was selected by the correlation based feature selection (CFS) method. The performance and predictive capability of the linear method were investigated and compared with those of the nonlinear methods. The ANN gave the better model, with an average fold error of 1.66. The test set prediction accuracy shows that human volume of distribution values could be predicted, on average, within 2-fold of the actual value.

Key words: QSPkR, QSPR, structure pharmacokinetic relationship, volume of distribution, ANN, SVM, modeling, CFS, machine learning.

† Present address: National Institute of Technical Teachers’ Training and Research (NITTTR), Shamla Hills, Bhopal 462002, India.
∗ Corresponding author. E-mail: [email protected]

1 Introduction

In the drug discovery process, many drug candidates fail at the later stages of development because of unfavorable pharmacokinetic (PK) properties, i.e. absorption, distribution, metabolism, and excretion (ADME) (Kennedy, 1997). Screening and optimizing ADME properties at an early stage of drug development can increase the success rate and decrease the cost of development (Hou and Wang, 2008). In many cases, experimental evaluation of ADME properties is not practical because of the time and cost involved. Using computational tools to predict the ADME properties of chemical compounds in the early design stages of drug discovery has therefore become an alternative (Davis and Riley, 2004; Waterbeemd and Gifford, 2003). Computational methods such as quantitative structure property/pharmacokinetic relationship (QSPR/QSPkR) modeling can be used as predictive tools. A few QSPkR studies reported in the literature use theoretically calculated molecular descriptors to predict clearance, volume of distribution, half-life, and protein binding (Yap et al., 2006; Berellini et al., 2009; Durairaj et al., 2009; Gleeson, 2007).

The volume of distribution (Vd) is an important PK parameter that relates drug serum concentrations to the amount of drug in the body. Drug distribution in the body depends mainly on plasma protein binding and tissue binding. A higher Vd reflects greater tissue partitioning, meaning that the drug can penetrate into tissues and bind reversibly to tissue components, whereas drugs that are highly bound to plasma proteins have lower Vd values (Bauer, 2008). Vd has a significant impact on other PK properties, such as clearance and half-life. The half-life of a drug is determined by the volume of distribution and the clearance (t1/2 = 0.693 × Vd / CL), so a change in Vd can affect the duration of action. For a given clearance, drugs with low Vd values require more frequent dosing, whereas drugs with high Vd values have a prolonged duration of action and can be given at longer dosing intervals.

Vd can be estimated if the overall tissue binding and the plasma protein binding can be measured. Plasma protein binding can be measured using human plasma, whereas tissue protein binding is normally difficult to measure. The volume of distribution in humans is therefore traditionally estimated from animal data with appropriate scaling to man (Karalis et al., 2002; Fagerholm et al., 2007; Obach et al., 1997). Biomimetic binding measurements calculated from artificial membrane chromatography retention times have also been used to estimate the Vd values of drugs. In this method, phospholipid binding, plasma protein binding (Hollosy et al., 2006), logarithmic retention factor, and unbound drug in plasma were used as variables in modeling (Sui et al., 2009).

Since animal experiments are costly and labor intensive, many studies have attempted to estimate Vd using QSPkR computational methods. Lombardo and his coworkers have over the years attempted to use computational methods to predict the Vd of drugs (Lombardo et al., 2002 and 2004; Lombardo et al., 2006). Ghafourian et al. (2006) employed QSPkR techniques to predict the volume of distribution values of 129 drugs, studying acidic and basic drugs separately.

The aim of this work was to establish QSPkR models that predict the volume of distribution values of drugs using only theoretically calculated molecular descriptors. For that purpose, a large set of descriptors was calculated and a correlation based feature selection (CFS) method was employed to select the best set of descriptors for modeling. In order to find nonlinear relationships between the descriptors and Vd values, we also used artificial neural network (ANN) and support vector machine (SVM) methods, and compared them with the linear models derived by the traditional multiple linear regression (MLR) method.

2 Materials and methods

2.1 Data set

The volume of distribution at steady state (VDss) values of the drugs used in this study were taken from the literature (Sui et al., 2009); the set includes acidic, basic, neutral and ampholytic compounds. We used the same training data set of 97 compounds for model building and 24 test compounds for validation. The data set covers a broad range of drugs belonging to different therapeutic and pharmacological categories. The VDss values, expressed in L/kg, were converted to a logarithmic scale as logVd; the values ranged from −1 to 1.32 (0.1 to 21 L/kg). The logVd values along with the names of the drugs are given in Table 1.

2.2 Molecular descriptors

The molecular structures of the studied drugs were obtained from the PubChem database (http://pubchem.ncbi.nlm.nih.gov/) and constructed using the Chemaxon software package (http://www.chemaxon.com/). The molecules were then subjected to 3D optimization with CORINA, and the molecular descriptors were calculated using the E-Dragon software (Tetko et al., 2005; Tetko, 2005). An additional set of descriptors was calculated using the Chemaxon Marvin plug-in software.

2.3 Descriptor selection and linear model generation

The performance of QSPkR models depends mostly on the parameters/descriptors used to describe the molecular structures. Selecting relevant descriptors is therefore an important step in reducing over-fitting and improving model predictability. Machine learning methods such as filter and wrapper methods can be used for descriptor selection (Witten and Frank, 2005; Demel et al., 2008). In the present study we used a filter based method called Correlation-based Feature Selection (CFS), introduced by Hall and Holmes (2003). CFS is a subset evaluation heuristic that takes into account the usefulness of individual features for predicting the class along with the level of inter-correlation among them. The heuristic assigns high scores to subsets containing attributes that are highly correlated with the class and have low inter-correlation with each other. CFS first discretizes numeric features using the technique of Fayyad and Irani (1993) and then uses symmetrical uncertainty to estimate the degree of association between discrete features. After computing a correlation matrix, CFS applies a heuristic search strategy to find a good subset of features. We used the forward selection search, which produces a list of selected descriptors that were then used for linear modeling. Descriptor selection and model building were performed using the WEKA software package (Witten and Frank, 2005).
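The subset-scoring idea behind CFS can be sketched in a few lines. The example below is a minimal illustration, not the WEKA implementation used in the study: it assumes Hall's merit formula, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), and substitutes absolute Pearson correlation for the symmetrical uncertainty on discretized features described above. The toy descriptor names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, features, target):
    """Merit of a feature subset: high feature-target correlation, low redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-target correlation (assumption: Pearson, not symmetrical uncertainty).
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    # Mean absolute feature-feature correlation over all pairs in the subset.
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def forward_select(features, target):
    """Greedy forward search: add the feature that most improves the merit, stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, target), f) for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best

# Hypothetical toy data: one informative descriptor, a redundant copy, and an unrelated one.
toy_features = {
    "d1": [1, 2, 3, 4, 5],
    "d1_scaled": [2, 4, 6, 8, 10],
    "noise": [1, 0, 1, 0, 1],
}
toy_target = [1, 2, 3, 4, 5]
selected, merit = forward_select(toy_features, toy_target)
print(selected, round(merit, 3))
```

In this toy case the search keeps a single descriptor: adding the redundant copy does not raise the merit, and adding the unrelated descriptor lowers it, which is the behaviour the heuristic is designed to reward.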

Table 1 Descriptors and logVd values of training and test set compounds

Compd. Drug name            Ia  In  GATS1e  GATS5e  HATS8m  Psy80  HArRc  logVd
1      Acetaminophen        0   1   0.672   1.124   0.023   0      0      −0.022
2      Acyclovir            0   1   1.203   1.098   0.109   0      2      −0.161
3      Adefovir             1   0   1.312   1.159   0.16    0      2      −0.377
4      Alprazolam           0   1   0.773   0.805   0.18    0      1      −0.143
5      Amitriptyline        0   0   1.37    0.724   0.083   1      0      0.919
6      Amlodipine           0   0   0.972   1.175   0.136   0      0      1.204
7      Amoxicillin          0   0   0.731   1.112   0.108   0      0      −0.071
8      Antipyrine           0   1   0.707   1.012   0.006   0      1      −0.222
9      Apomorphine          0   0   0.582   1.181   0.043   0      0      0.301
10     Aspirin              1   0   0.833   1.379   0.002   0      0      −0.824
11     Atenolol             0   0   0.845   1.561   0.032   0      0      −0.032
12     Aztreonam            1   0   0.839   0.885   0.293   0      1      −0.745
13     Caffeine             0   1   1.091   1.105   0       0      2      −0.215
14     Cefazolin            1   0   0.835   1.11    0.264   0      2      −1.000
15     Cefixime             1   0   0.868   1.173   0.304   0      1      −0.620
16     Cefoperazone         1   0   0.83    0.968   0.165   0      1      −0.658
17     Cefotaxime           1   0   0.958   1.118   0.231   0      1      −0.553
18     Ceftriaxone          1   0   0.92    1.177   0.272   0      2      −0.796
19     Chlorambucil         1   0   0.717   1.045   0.077   0      0      −0.538
20     Chloramphenicol      0   1   0.676   1.2     0.263   0      0      −0.027
21     Ciprofloxacin        0   0   0.691   1.148   0.086   0      1      0.260
22     Clonidine            0   0   0.882   0.507   0.01    0      0      0.322
23     Clozapine            0   0   1.006   0.565   0.217   1      0      0.732
24     Codeine              0   0   0.86    1.097   0.029   0      0      0.544
25     Dexamethasone        0   1   0.552   1.269   0.076   0      0      0.057
26     Diazepam             0   1   0.705   0.904   0.134   0      0      0.114
27     Diclofenac           1   0   0.641   1.007   0.35    0      0      −0.770
28     Diltiazem            0   0   0.947   1.022   0.143   1      0      0.491
29     Diprophylline        0   1   0.993   1.27    0.099   0      2      −0.097
30     Doxepin              0   0   1.042   0.754   0.102   1      0      1.068
31     Enalapril            1   0   0.804   1.035   0.086   1      0      0.230
32     Enoxacin             0   0   0.779   1.248   0.075   0      2      0.292
33     Estradiol            0   1   0.459   0.96    0.044   0      0      0.079
34     Famotidine           0   0   0.832   1.148   0.118   0      1      0.079
35     Furosemide           1   0   0.761   1.357   0.233   0      1      −0.886
36     Gliclazide           1   0   0.514   1.442   0.095   0      0      −0.444
37     Gliquidone           1   0   0.673   1.252   0.076   0      0      −0.796
38     Haloperidol          0   0   0.595   0.793   0.046   1      0      1.255
39     Hydrochlorothiazide  1   0   0.628   1.427   0.023   0      0      −0.081
40     Hydrocortisone       0   1   0.534   1.253   0.065   0      0      −0.357
41     Ibuprofen            1   0   0.538   1.346   0.046   0      0      −0.824
42     Imipramine           0   0   1.442   0.762   0.088   1      0      1.322
43     Indomethacin         1   0   0.751   1.061   0.204   1      1      −1.000
44     Isoniazide           0   1   0.697   0.75    0       0      1      −0.174
45     Ketoprofen           1   0   0.534   1.134   0.094   0      0      −0.824
46     Lamivudine           0   1   1.055   0.875   0.05    0      1      0.114
47     Lansoprazole         0   1   0.725   1.148   0.258   0      2      −0.456
48     Levofloxacin         0   0   0.809   1.199   0.07    0      1      0.134
49     Lidocaine            0   0   0.828   2.623   0.084   0      0      −0.143
50     Lincomycin           0   0   0.837   0.975   0.072   0      0      0.114
51     Lomefloxacin         0   0   0.712   1.267   0.079   0      1      0.362
52     Medroxyprogesterone  0   1   0.487   1.01    0.069   1      0      0.668
53     Meloxicam            1   0   0.646   1.007   0.146   0      1      −0.699
54     Meperidine           0   0   0.878   0.935   0.047   0      0      0.431
55     Metformin            0   0   1.8     1.8     0       0      0      0.389
56     Metoclopramide       0   0   0.983   0.94    0.052   0      0      0.531
57     Metolazone           1   0   0.58    1.158   0.18    1      0      0.207
58     Metoprolol           0   0   0.999   1.502   0.034   0      0      0.572
59     Midazolam            0   1   0.663   0.888   0.193   0      1      0.146
60     Minoxidil            0   0   0.752   0.742   0.028   0      1      0.477
61     Mirtazapine          0   0   1.296   0.815   0.029   0      1      0.681
62     Morphine             0   0   0.709   1.164   0.019   0      0      0.415
63     Nefopam              0   0   1.038   0.736   0.04    0      0      0.859
64     Nicardipine          0   0   0.713   1.24    0.156   0      0      −0.009
65     Nifedipine           0   1   0.738   1.218   0.12    0      0      −0.108
66     Nimodipine           0   1   0.807   1.065   0.197   1      0      0.176
67     Norfloxacin          0   0   0.724   1.242   0.081   0      1      0.447
68     Omeprazole           0   1   0.951   1.19    0.081   1      2      −0.469
69     Oxazepam             0   1   0.645   1.118   0.158   0      0      −0.229
70     Penicillin           1   0   0.696   1.179   0.111   0      0      −0.347
71     Pentobarbital        1   0   0.745   1.038   0.031   0      0      0.000
72     Phenytoin            1   0   0.63    1.407   0.03    0      0      −0.194
73     Pravastatin          1   0   0.697   0.921   0.078   0      0      −0.337
74     Prednisolone         0   1   0.534   1.253   0.073   0      0      −0.284
75     Prilocaine           0   0   0.743   1.793   0.036   0      0      0.568
76     Promethazine         0   0   1.451   0.721   0.069   1      0      1.146
77     Propafenone          0   0   0.751   1.389   0.067   1      0      0.556
78     Propranolol          0   0   0.818   1.545   0.045   0      0      0.604
79     Pseudoephedrine      0   0   0.643   0.964   0.007   0      0      0.453
80     Quinidine            0   0   0.847   1.236   0.075   1      1      0.544
81     Ranitidine           0   0   0.793   1.08    0.038   0      1      0.114
82     Risperidone          0   0   0.688   0.663   0.044   1      2      0.041
83     Salbutamol           0   0   0.649   1.581   0.047   0      0      0.279
84     Salicylic acid       1   0   0.643   2.143   0       0      0      −0.770
85     Simvastatin          0   1   0.761   0.791   0.089   0      0      −0.009
86     Sulfadiazine         1   0   0.613   1.74    0.055   0      1      −0.036
87     Sulpiride            0   0   0.749   1.326   0.095   0      0      0.398
88     Theophylline         0   1   1.074   1.712   0       0      2      −0.244
89     Tinidazole           0   1   0.874   0.996   0.046   0      0      −0.155
90     Tolbutamide          1   0   0.558   1.583   0.076   0      0      −0.854
91     Tramadol             0   0   0.874   0.996   0.046   0      0      0.447
92     Triamcinolone        0   1   0.578   1.252   0.096   0      0      0.246
93     Triazolam            0   0   0.737   0.899   0.227   0      1      −0.174
94     Trimethoprim         0   0   1.213   0.958   0.088   0      1      0.204
95     Venlafaxine          0   0   0.872   0.978   0.062   0      0      0.643
96     Verapamil            0   0   1.131   0.772   0.047   0      0      0.670
97     Warfarin             1   0   0.666   1.366   0.11    0      1      −0.854
Test data
98     Amobarbital          1   0   0.745   0.9     0.05    0      0      0.021
99     Atropine             0   0   0.771   1.674   0.077   0      0      0.301
100    Bupivacaine          0   0   0.772   2.071   0.059   1      0      −0.155
101    Carbamazepine        0   1   0.659   1.856   0.011   0      0      0.146
102    Chlorpheniramine     0   0   0.862   0.689   0.2     0      1      0.505
103    Cimetidine           0   0   1.434   0.908   0.051   0      1      0.000
104    Cromolyn             1   0   0.899   1.016   0.059   0      2      −0.495
105    Digoxin              0   1   0.876   0.871   0.039   0      0      0.494
106    Esmolol              0   0   0.941   1.419   0.041   0      0      0.279
107    Flurbiprofen         1   0   0.534   1.109   0.1     0      0      −0.921
108    Gatifloxacin         0   0   0.781   1.146   0.09    0      1      0.243
109    Glipizide            1   0   0.65    1.453   0.077   0      1      −0.796
110    Glyburide            1   0   0.692   1.228   0.079   0      0      −0.854
111    Griseofulvin         0   1   0.998   1.058   0.156   0      0      0.176
112    Naproxen             1   0   0.72    1.214   0.051   0      0      −0.796
113    Phenobarbital        1   0   0.69    1.214   0.028   0      0      −0.201
114    Piroxicam            1   0   0.619   1       0.174   0      1      −0.854
115    Prazosin             0   0   1.13    0.666   0.041   0      2      −0.174
116    Prednisone           0   1   0.534   1.253   0.074   0      0      −0.013
117    Propiverine          0   0   0.919   1.546   0.103   1      0      0.279
118    Pyrazinamide         0   1   0.937   2.38    0       0      1      −0.155
119    Quinine              0   0   0.847   1.236   0.072   1      1      0.204
120    Ropivacaine          0   0   0.774   2.147   0.06    0      0      −0.260
121    Timolol              0   0   1.133   1.484   0.07    0      1      0.544

2.4 Artificial neural network

An artificial neural network (ANN) is a machine learning method suitable for modeling non-linear relationships. The theory and application of ANNs in QSPR modeling are extensively discussed in many reviews (Haykin, 2006; Zupan and Gasteiger, 1999; Zupan, 1994). A conventional three-layered back-propagation network was employed in this study (Wythoff, 1993). The back-propagation ANN uses a supervised learning technique in which the network is trained by minimizing the squared error between the desired values and the network's output. This error is propagated backwards through the network, and the connection weights are adjusted during learning by a gradient descent algorithm so as to minimize it. A learning rate parameter η influences the rate of weight adjustment, and a momentum term μ prevents sudden changes in the direction in which corrections are made. Over-fitting was minimized by monitoring the performance of the network during training using a validation data set. A 10-fold cross-validation was performed during the selection of η, μ and the number of neurons in the hidden layer.
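The roles of the learning rate η and the momentum term μ can be illustrated with a deliberately reduced sketch. The code below is not the three-layer network used in the study: it assumes a single linear neuron trained on hypothetical toy data, and only shows the update rule Δw(t) = −η·∂E/∂w + μ·Δw(t−1) applied to a mean squared error.

```python
def train_momentum(xs, ys, eta=0.05, mu=0.9, epochs=500):
    """Fit y = w*x + b by gradient descent with momentum on the mean squared error."""
    w, b = 0.0, 0.0
    dw_prev, db_prev = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step = learning-rate term plus a fraction (mu) of the previous step.
        dw = -eta * grad_w + mu * dw_prev
        db = -eta * grad_b + mu * db_prev
        w += dw
        b += db
        dw_prev, db_prev = dw, db
    return w, b

# Hypothetical toy data generated from y = 2x + 1.
w, b = train_momentum([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 3), round(b, 3))
```

In a full back-propagation network the same update is applied to every connection weight, with the gradients obtained by propagating the output error backwards through the hidden layer.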

2.5 Support vector machines

Support vector machine (SVM) is a machine learning technique based on statistical learning theory, developed by Vapnik for pattern classification problems (Vapnik, 1995; Cortes and Vapnik, 1995). The technique is built on the structural risk minimization (SRM) principle, which is considered superior to the traditional empirical risk minimization (ERM) principle: ERM only minimizes the error on the training data, whereas SRM minimizes an upper bound on the expected risk. SVM is used to solve non-linear regression problems in QSAR/QSPR studies, and the details are given in the literature (Burges, 1998; Doucet et al., 2007; Li et al., 2009; Ivanciuc, 2007). The main concepts of SVM are briefly described below.

The correlation between structure and property can be defined by $y_i = f(x_i)$. The term $f(x_i)$ can be represented by a linear function of the form

$$f(x_i) = \langle w, x_i \rangle + b,$$

where $w$ is the weight vector of the linear function and $b$ is the offset coefficient. In SVM, the input data are first mapped into a high-dimensional feature space by means of a kernel function, and linear regression is then performed in that feature space. The non-linear feature mapping allows non-linear problems to be treated in a linear space. In the higher dimensional feature space, SVM approximates the set of data with a linear function,

$$y = \sum_{i=1}^{m} w_i \Phi(x_i) + b,$$

where $\Phi(x_i)$ denotes the features of the input variables after kernel transformation, while $w_i$ and $b$ are coefficients. The radial basis function (RBF) kernel, or Gaussian kernel, which is the most commonly used kernel in QSPR problems, was applied in this study. The RBF kernel performs the non-linear mapping described by the following equation:

$$k(x, y) = \exp(-\gamma \lVert x - y \rVert^2).$$

The new feature space after kernel transformation allows the data to be separated linearly by hyperplanes, or a linear regression to be carried out. The coefficients $w$ and $b$ are estimated by minimizing the regularized risk function, which is defined as:

$$R(C) = C \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon(y_i, f(x_i, w)) + \frac{1}{2} \lVert w \rVert^2.$$

The first term, $C \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon(y_i, f(x_i, w))$, is the empirical error (risk), measured by the ε-insensitive loss function. ε is a prescribed parameter referred to as the tube size; it defines the approximation accuracy placed on the training data points. The loss function ignores errors as long as they are smaller than ε; in other words, errors below ε are not penalized. The second term, $\frac{1}{2} \lVert w \rVert^2$, is the regularization term, and it is a measure of function flatness. The value of the constant $C$ determines the trade-off between the empirical error and the regularization term. The minimization of the regularized risk function is a constrained optimization problem, which can be reformulated as a dual problem by using Lagrange multipliers. The calculation was performed using John Platt's sequential minimal optimization (SMO) algorithm, as modified by the method proposed by Smola and Scholkopf (Smola and Scholkopf, 2004; Shevade et al., 1999).

The performance of SVM depends on the choice of kernel type and on the values of the RBF kernel parameter γ (gamma), the complexity or regularization parameter C, and the parameter ε of the ε-insensitive loss function. The RBF kernel parameter γ controls the width of the Gaussian function and also affects the generalization ability of the SVM. The value of ε also determines the number of support vectors: the higher the value, the fewer support vectors are selected. A ten-fold cross-validation procedure was used to select the optimum values of these parameters.

2.6 Validation techniques and model performance evaluation

To validate the ANN and SVM models and their performance, we used a 10-fold cross-validation procedure, in which the data set was split into 10 folds; one fold was used for testing and the rest for training. This procedure was repeated 10 times, so that all data were used as test data once, and the outputs were finally averaged. The error estimate, the root mean square error (RMSE), was calculated for each fold and then averaged. The RMSE was calculated using the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\log Vd_{\mathrm{exp}} - \log Vd_{\mathrm{pred}}\right)^2},$$

where n is the number of data points, logVdpred is the output predicted by the model for a given input, and logVdexp is the desired (experimental) output for the same input. The overall accuracy of the predictions was expressed in terms of the average fold error, calculated as the mean of the individual fold-error values (Wajima et al., 2002). The fold error (FE) was calculated according to the following equation, and the average value was reported as the average fold error (AFE):

$$\mathrm{FE} = \operatorname{antilog}\left(\left|\log Vd_{\mathrm{exp}} - \log Vd_{\mathrm{pred}}\right|\right) = 10^{\left|\log Vd_{\mathrm{exp}} - \log Vd_{\mathrm{pred}}\right|}.$$
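The two error measures and the fold assignment can be written down directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the fold assignment is a simple deterministic interleaving rather than whatever split WEKA produced, and the example values are the experimental and MLR-predicted logVd of the first three training compounds as listed in Table 3.

```python
import math

def rmse(exp, pred):
    """Root mean square error between experimental and predicted logVd."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp))

def fold_errors(exp, pred):
    """Fold error per compound: antilog (base 10) of the absolute log-scale error."""
    return [10 ** abs(e - p) for e, p in zip(exp, pred)]

def average_fold_error(exp, pred):
    """Mean of the individual fold errors (AFE)."""
    fe = fold_errors(exp, pred)
    return sum(fe) / len(fe)

def k_fold_indices(n, k=10):
    """Partition indices 0..n-1 into k folds by interleaving (assumed scheme)."""
    return [list(range(i, n, k)) for i in range(k)]

# Acetaminophen, acyclovir, adefovir: experimental vs MLR-predicted logVd (Table 3).
log_vd_exp = [-0.022, -0.161, -0.377]
log_vd_mlr = [0.041, -0.207, -0.651]
print(rmse(log_vd_exp, log_vd_mlr), average_fold_error(log_vd_exp, log_vd_mlr))

# Each fold serves once as the test set; the remaining nine folds form the training set.
folds = k_fold_indices(97, 10)
```

A fold error of 2 corresponds to a prediction that is off by a factor of two in Vd, in either direction, which is why the AFE is a convenient summary on the original L/kg scale.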

3 Results and discussion

3.1 Descriptor selection and linear model

We computed a total of 1707 descriptors using E-DRAGON and Chemaxon, including 1D, 2D and 3D descriptors. The training data set was used for descriptor selection with the CFS method, which resulted in a subset of forty descriptors. The reduced descriptor subset, along with two indicator parameters, Ia accounting for the acidic nature of a compound and In indicating neutral compounds, was then used for linear model building by forward step-wise multiple linear regression analysis. Seven statistically significant descriptors were selected, resulting in the following regression equation with an R2 value of 0.782. The selected descriptors and their values are shown in Table 1.

logVd = 0.456 − 0.796(±0.147) Ia − 0.371(±0.143) In + 0.352(±0.263) GATS1e − 0.233(±0.182) GATS5e − 0.846(±0.783) HATS8m + 0.318(±0.154) Psy80 − 0.184(±0.085) HArRc;

N = 97, R = 0.884, R2 = 0.782, SE = 0.265, F = 45.66; PRESS = 6.245, SSY = 22.426, PRESS/SSY = 0.278, R2cv = 0.722; RMSE = 0.254.

Here R is the correlation coefficient, SE is the standard error of estimate, and F is the F-statistic. The figures in parentheses after the regression coefficients are the standard errors of the coefficients. A reliable MLR model is one that has a high R2 and a low SE. The correlation matrix for the selected parameters is given in Table 2; it shows no strong inter-correlation among the selected descriptors. To check the inter-correlation of the descriptors further, the variance inflation factor (VIF) and tolerance were calculated (Draper and Smith, 1981): VIF = 1/(1 − R2), where R2 here is the squared multiple correlation coefficient of one descriptor regressed on all the others, and tolerance = 1/VIF. In practice, VIF > 5 or a tolerance below 0.20 indicates multicollinearity among the descriptors. The calculated VIF and tolerance values for each descriptor are shown in Table 2. The VIF values are all less than 2 and the tolerance values are all higher than 0.2, indicating that there is no multicollinearity and that the equation is stable.
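As a quick check of how the regression equation is applied, the sketch below evaluates it for two training compounds using the descriptor values from Table 1, and converts a tolerance into a VIF. The function names are hypothetical, and the negative sign on the HArRc coefficient is an assumption made here; it is the sign that reproduces the MLR predictions listed in Table 3 for these compounds.

```python
# Coefficients of the MLR equation; the HArRc sign is assumed negative (see lead-in).
INTERCEPT = 0.456
COEFFS = {
    "Ia": -0.796,
    "In": -0.371,
    "GATS1e": 0.352,
    "GATS5e": -0.233,
    "HATS8m": -0.846,
    "Psy80": 0.318,
    "HArRc": -0.184,
}

def predict_log_vd(descriptors):
    """Evaluate the linear model for one compound given its seven descriptor values."""
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)

def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of the tolerance (tolerance = 1 - R2 of descriptor vs the rest)."""
    return 1.0 / tolerance

# Descriptor values taken from Table 1.
acetaminophen = {"Ia": 0, "In": 1, "GATS1e": 0.672, "GATS5e": 1.124,
                 "HATS8m": 0.023, "Psy80": 0, "HArRc": 0}
acyclovir = {"Ia": 0, "In": 1, "GATS1e": 1.203, "GATS5e": 1.098,
             "HATS8m": 0.109, "Psy80": 0, "HArRc": 2}

for name, desc in (("Acetaminophen", acetaminophen), ("Acyclovir", acyclovir)):
    log_vd = predict_log_vd(desc)
    # Back-transform to the original scale: Vd in L/kg is 10 ** logVd.
    print(name, round(log_vd, 3), round(10 ** log_vd, 2))
```

Because the published coefficients are rounded to three decimals, values computed this way can differ from the tabulated predictions in the last digit.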

Table 2 Inter-correlation of descriptors and collinearity statistics

         logVd    GATS1e   GATS5e   HATS8m   Psy80    HArRc    Ia       In       Tolerance  VIF
logVd    1
GATS1e   0.369    1                                                              0.782      1.278
GATS5e   −0.285   −0.160   1                                                     0.851      1.175
HATS8m   −0.368   −0.060   −0.153   1                                            0.781      1.281
Psy80    0.375    0.180    −0.273   0.077    1                                   0.880      1.137
HArRc    −0.323   0.202    −0.055   0.251    −0.071   1                          0.825      1.212
Ia       −0.684   −0.235   0.189    0.324    −0.099   0.0472   1                 0.640      1.563
In       −0.113   −0.144   −0.095   0.008    −0.071   0.1668   −0.375   1        0.731      1.369

We used an internal cross-validation method and calculated the cross-validation parameters PRESS (predicted residual sum of squares), SSY (sum of squares of the response values) and R2cv (cross-validated correlation coefficient, R2cv = 1 − PRESS/SSY). PRESS is a good estimate of the real predictive power of a QSPR/QSPkR model. The value of PRESS is smaller than SSY in the regression equation, so the model predicts better than chance. The ratio PRESS/SSY can be used to calculate approximate confidence intervals for predictions of new observations. For a reasonable QSPR/QSPkR model, PRESS/SSY should be lower than 0.4, and the value of 0.278 indicates a good model. The developed linear model was cross-validated by the leave-one-out method. The high value observed (R2cv = 0.722) is indicative of the reliability of the prediction of logVd. The R2 value of 0.782 shows that the model accounts for 78% of the variance of logVd.

The selected descriptors GATS1e, GATS5e and HATS8m are autocorrelation descriptors of chemical compounds, calculated from molecular properties that can be represented at the atomic level (Todeschini and Consonni, 2009). GATS1e and GATS5e are Geary autocorrelations of lag 1 and lag 5, weighted by atomic Sanderson electronegativities, and belong to the 2D autocorrelation indices, while HATS8m is a leverage-weighted autocorrelation of lag 8, weighted by atomic masses, and belongs to the GETAWAY descriptors. The HATS indices are defined by weighting each atom of the molecule by its physico-chemical properties combined with the diagonal elements of the molecular influence matrix H, thus accounting for 3D features of the molecules. The calculation of these indices is based on the atomic mass weighting scheme scaled on the carbon atom.

Psychotic-80 (Psy80) is an antipsychotic-like index proposed by Ghose, Viswanadhan and Wendoloski (Ghose et al., 1999). It is derived from an analysis of the distribution of logP, molar refractivity, molecular weight, number of atoms and chemical constitution of the known antipsychotic drug molecules available in the Comprehensive Medicinal Chemistry database. The property ranges cover approximately 80% of the drugs studied. The positive sign of Psy80 in the equation indicates that increasing antipsychotic drug-like properties or structural features will increase Vd. It is interesting to note that most antipsychotic drugs have a large volume of distribution. They tend to exhibit moderate plasma protein binding, coupled with extensive tissue localization and strong tissue binding, resulting in a large Vd (Gorgia, 1993).
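The cross-validation statistics quoted for the linear model in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual relation R2cv = 1 − PRESS/SSY and, for illustration only, computes a leave-one-out PRESS for a one-descriptor linear fit on hypothetical toy data; it is not the seven-descriptor model of the study.

```python
def r2_cv(press, ssy):
    """Cross-validated R2 from PRESS and SSY (assumed relation: 1 - PRESS/SSY)."""
    return 1.0 - press / ssy

def loo_press(xs, ys):
    """Leave-one-out PRESS for y = a + b*x: refit without each point, then predict it."""
    press = 0.0
    for i in range(len(xs)):
        tx = xs[:i] + xs[i + 1:]
        ty = ys[:i] + ys[i + 1:]
        n = len(tx)
        mx, my = sum(tx) / n, sum(ty) / n
        sxx = sum((x - mx) ** 2 for x in tx)
        slope = sum((x - mx) * (y - my) for x, y in zip(tx, ty)) / sxx
        intercept = my - slope * mx
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return press

# Values reported with the regression equation.
print(round(6.245 / 22.426, 3), round(r2_cv(6.245, 22.426), 3))

# Hypothetical toy data: a noisy straight line.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [0.1, 0.9, 2.2, 2.9, 4.1]
toy_ssy = sum((y - sum(toy_y) / len(toy_y)) ** 2 for y in toy_y)
print(r2_cv(loo_press(toy_x, toy_y), toy_ssy))
```

The reported PRESS and SSY give a ratio close to 0.278 and an R2cv close to 0.722, consistent with the figures stated alongside the equation.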

78

Interdiscip Sci Comput Life Sci (2014) 6: 71–83

Table 3 Compd.

The predicted logVd values using MLR, ANN and SVM models

Drug name

logVd(exp)

MLR

Residual

ANN

Residual

logVdpred

logVdpred

SVM

Residual

logVdpred

1

Acetaminophen

−0.022

0.041

0.063

0.141

0.163

0.034

0.056

2

Acyclovir

−0.161

−0.207

−0.046

−0.057

0.104

−0.176

−0.015

3

Adefovir

−0.377

−0.651

−0.274

−0.244

0.133

−0.588

−0.211

−0.143

−0.166

−0.024

−0.035

0.108

−0.135

0.008

0.919

1.018

0.099

1.086

0.167

1.035

0.116

4

Alprazolam

5

Amitriptyline

6

Amlodipine

1.204

0.41

−0.794

0.233

−0.971

0.396

−0.808

7

Amoxicillin

−0.071

0.363

0.434

0.264

0.335

0.348

0.419

8

Antipyrine

−0.222

−0.09

0.131

−0.014

0.208

−0.089

0.133

9

Apomorphine

0.301

0.35

0.049

0.367

0.066

0.331

0.03

10

Aspirin

−0.824

−0.369

0.455

−0.195

0.629

−0.401

0.423

11

Atenolol

−0.032

0.363

0.395

0.37

0.401

0.33

0.362

12

Aztreonam

−0.745

−0.682

0.063

−0.827

−0.083

−0.683

0.062

13

Caffeine

−0.215

−0.156

0.059

−0.075

0.139

−0.146

0.069

14

Cefazolin

-1.000

−0.895

0.105

−0.708

0.292

−0.874

0.126

15

Cefixime

−0.620

−0.748

−0.128

−0.87

−0.25

−0.761

−0.142

16

Cefoperazone

−0.658

−0.596

0.061

−0.558

0.1

−0.589

0.069

17

Cefotaxime

−0.553

−0.642

−0.089

−0.735

−0.182

−0.639

−0.086

18

Ceftriaxone

−0.796

−0.888

−0.092

−0.724

0.072

−0.863

−0.067

19

Chlorambucil

−0.538

−0.395

0.142

−0.409

0.129

−0.419

0.118

20

Chloramphenicol

−0.027

−0.178

−0.152

−0.282

−0.255

−0.167

−0.141

21

Ciprofloxacin

0.260

0.176

−0.085

0.295

0.035

0.178

−0.082

22

Clonidine

0.322

0.641

0.318

0.591

0.269

0.669

0.346

23

Clozapine

0.732

0.814

0.081

0.534

−0.198

0.802

0.069

24

Codeine

0.544

0.479

−0.065

0.462

−0.082

0.475

−0.069

25

Dexamethasone

0.057

−0.08

−0.137

0.062

0.005

−0.089

−0.146

26

Diazepam

0.114

0.01

−0.104

0.104

−0.01

0.025

−0.089

27

Diclofenac

−0.770

−0.644

0.125

−0.834

−0.065

−0.702

0.068

28

Diltiazem

0.491

0.749

0.258

0.618

0.126

0.728

0.237

29

Diprophylline

−0.097

−0.313

−0.216

−0.084

0.013

−0.294

−0.197

30

Doxepin

1.068

0.88

−0.189

0.902

−0.167

0.884

−0.184

31

Enalapril

0.230

−0.052

−0.282

0.021

−0.209

−0.018

−0.249

32

Enoxacin

0.292

0.008

−0.284

0.318

0.026

0.025

−0.267

33

Estradiol

0.079

−0.014

−0.093

0.121

0.042

−0.014

−0.094

34

Famotidine

0.079

0.198

0.119

0.267

0.188

0.204

0.125

35

Furosemide

−0.886

−0.769

0.117

−0.81

0.076

−0.791

0.095

36

Gliclazide

−0.444

−0.575

−0.131

−0.62

−0.176

−0.635

−0.191

37

Gliquidone

−0.796

−0.458

0.338

−0.476

0.32

−0.497

0.299

38

Haloperidol

1.255

0.76

−0.495

0.912

−0.343

0.76

−0.495

39

Hydrochlorothiazide

−0.081

−0.47

−0.389

−0.358

−0.277

−0.518

−0.437

40

Hydrocortisone

−0.357

−0.073

0.283

0.071

0.427

−0.083

0.273

41

Ibuprofen

−0.824

−0.502

0.322

−0.444

0.38

−0.552

0.272

42

Imipramine

1.322

1.03

−0.292

1.083

−0.239

1.045

−0.277

43

Indomethacin

-1.000

−0.361

0.639

−0.695

0.305

−0.388

0.612

44

Isoniazide

−0.174

−0.028

0.146

0.006

0.18

−0.018

0.156

45

Ketoprofen

−0.824

−0.495

0.329

−0.539

0.285

−0.536

0.288

Continued

| Compd. | Drug name | logVd(exp) | MLR logVdpred | MLR residual | ANN logVdpred | ANN residual | SVM logVdpred | SVM residual |
|---|---|---|---|---|---|---|---|---|
| 46 | Lamivudine | 0.114 | 0.027 | −0.087 | 0.029 | −0.085 | 0.048 | −0.066 |
| 47 | Lansoprazole | −0.456 | −0.513 | −0.057 | −0.163 | 0.293 | −0.48 | −0.025 |
| 48 | Levofloxacin | 0.134 | 0.219 | 0.085 | 0.322 | 0.189 | 0.225 | 0.091 |
| 49 | Lidocaine | −0.143 | 0.066 | 0.209 | −0.012 | 0.131 | −0.044 | 0.098 |
| 50 | Lincomycin | 0.114 | 0.463 | 0.349 | 0.408 | 0.294 | 0.462 | 0.348 |
| 51 | Lomefloxacin | 0.362 | 0.161 | −0.201 | 0.293 | −0.069 | 0.159 | −0.203 |
| 52 | Medroxyprogesterone | 0.668 | 0.281 | −0.387 | 0.532 | −0.137 | 0.253 | −0.415 |
| 53 | Meloxicam | −0.699 | −0.654 | 0.045 | −0.552 | 0.147 | −0.66 | 0.039 |
| 54 | Meperidine | 0.431 | 0.508 | 0.077 | 0.469 | 0.038 | 0.512 | 0.081 |
| 55 | Metformin | 0.389 | 0.671 | 0.282 | 0.546 | 0.157 | 0.641 | 0.251 |
| 56 | Metoclopramide | 0.531 | 0.54 | 0.008 | 0.482 | −0.05 | 0.546 | 0.015 |
| 57 | Metolazone | 0.207 | −0.239 | −0.446 | −0.163 | −0.369 | −0.243 | −0.45 |
| 58 | Metoprolol | 0.572 | 0.43 | −0.142 | 0.408 | −0.164 | 0.404 | −0.167 |
| 59 | Midazolam | 0.146 | −0.235 | −0.381 | −0.074 | −0.22 | −0.208 | −0.355 |
| 60 | Minoxidil | 0.477 | 0.341 | −0.136 | 0.388 | −0.089 | 0.367 | −0.11 |
| 61 | Mirtazapine | 0.681 | 0.514 | −0.167 | 0.424 | −0.258 | 0.559 | −0.122 |
| 62 | Morphine | 0.415 | 0.419 | 0.004 | 0.436 | 0.021 | 0.406 | −0.009 |
| 63 | Nefopam | 0.859 | 0.617 | −0.242 | 0.548 | −0.311 | 0.637 | −0.222 |
| 64 | Nicardipine | −0.009 | 0.287 | 0.296 | 0.069 | 0.078 | 0.258 | 0.267 |
| 65 | Nifedipine | −0.108 | −0.04 | 0.068 | 0.064 | 0.172 | −0.039 | 0.069 |
| 66 | Nimodipine | 0.176 | 0.273 | 0.097 | 0.244 | 0.068 | 0.244 | 0.068 |
| 67 | Norfloxacin | 0.447 | 0.169 | −0.278 | 0.295 | −0.152 | 0.169 | −0.278 |
| 68 | Omeprazole | −0.469 | 0.024 | 0.493 | −0.265 | 0.204 | −0.115 | 0.354 |
| 69 | Oxazepam | −0.229 | −0.081 | 0.148 | 0.007 | 0.236 | −0.075 | 0.155 |
| 70 | Penicillin | −0.347 | −0.463 | −0.116 | −0.553 | −0.206 | −0.498 | −0.152 |
| 71 | Pentobarbital | 0.000 | −0.345 | −0.345 | −0.231 | −0.231 | −0.363 | −0.363 |
| 72 | Phenytoin | −0.194 | −0.471 | −0.277 | −0.377 | −0.183 | −0.517 | −0.323 |
| 73 | Pravastatin | −0.337 | −0.374 | −0.037 | −0.383 | −0.046 | −0.392 | −0.055 |
| 74 | Prednisolone | −0.284 | −0.08 | 0.204 | 0.064 | 0.348 | −0.089 | 0.195 |
| 75 | Prilocaine | 0.568 | 0.27 | −0.298 | 0.295 | −0.273 | 0.219 | −0.349 |
| 76 | Promethazine | 1.146 | 1.059 | −0.087 | 1.151 | 0.005 | 1.08 | −0.066 |
| 77 | Propafenone | 0.556 | 0.659 | 0.102 | 0.684 | 0.128 | 0.626 | 0.07 |
| 78 | Propranolol | 0.604 | 0.347 | −0.258 | 0.342 | −0.262 | 0.313 | −0.291 |
| 79 | Pseudoephedrine | 0.453 | 0.452 | −0.001 | 0.475 | 0.021 | 0.449 | −0.004 |
| 80 | Quinidine | 0.544 | 0.537 | −0.007 | 0.354 | −0.19 | 0.474 | −0.07 |
| 81 | Ranitidine | 0.114 | 0.268 | 0.154 | 0.359 | 0.245 | 0.281 | 0.167 |
| 82 | Risperidone | 0.041 | 0.457 | 0.415 | 0.169 | 0.128 | 0.364 | 0.323 |
| 83 | Salbutamol | 0.279 | 0.277 | −0.002 | 0.293 | 0.014 | 0.236 | −0.043 |
| 84 | Salicylic acid | −0.770 | −0.612 | 0.158 | −0.473 | 0.297 | −0.704 | 0.066 |
| 85 | Simvastatin | −0.009 | 0.094 | 0.103 | 0.18 | 0.189 | 0.11 | 0.119 |
| 86 | Sulfadiazine | −0.036 | −0.759 | −0.723 | −0.417 | −0.381 | −0.798 | −0.762 |
| 87 | Sulpiride | 0.398 | 0.331 | −0.067 | 0.253 | −0.145 | 0.305 | −0.093 |
| 88 | Theophylline | −0.244 | −0.303 | −0.059 | −0.111 | 0.133 | −0.313 | −0.069 |
| 89 | Tinidazole | −0.155 | 0.122 | 0.277 | 0.189 | 0.344 | 0.127 | 0.281 |

Continued

| Compd. | Drug name | logVd(exp) | MLR logVdpred | MLR residual | ANN logVdpred | ANN residual | SVM logVdpred | SVM residual |
|---|---|---|---|---|---|---|---|---|
| 90 | Tolbutamide | −0.854 | −0.576 | 0.278 | −0.592 | 0.262 | −0.641 | 0.213 |
| 91 | Tramadol | 0.447 | 0.493 | 0.046 | 0.458 | 0.011 | 0.494 | 0.047 |
| 92 | Triamcinolone | 0.246 | −0.084 | −0.329 | 0.051 | −0.195 | −0.09 | −0.335 |
| 93 | Triazolam | −0.174 | 0.13 | 0.304 | 0.023 | 0.197 | 0.132 | 0.306 |
| 94 | Trimethoprim | 0.204 | 0.402 | 0.198 | 0.37 | 0.166 | 0.434 | 0.23 |
| 95 | Venlafaxine | 0.643 | 0.483 | −0.16 | 0.434 | −0.209 | 0.484 | −0.16 |
| 96 | Verapamil | 0.670 | 0.635 | −0.035 | 0.551 | −0.12 | 0.656 | −0.015 |
| 97 | Warfarin | −0.854 | −0.7 | 0.154 | −0.509 | 0.345 | −0.719 | 0.135 |

Test data

| 98 | Amobarbital | 0.021 | −0.329 | −0.35 | −0.26 | −0.281 | −0.341 | −0.362 |
| 99 | Atropine | 0.301 | 0.273 | −0.028 | 0.227 | −0.074 | 0.228 | −0.073 |
| 100 | Bupivacaine | −0.155 | 0.514 | 0.669 | 0.448 | 0.603 | 0.438 | 0.593 |
| 101 | Carbamazepine | 0.146 | −0.124 | −0.27 | 0.031 | −0.116 | −0.17 | −0.316 |
| 102 | Chlorpheniramine | 0.505 | 0.246 | −0.259 | 0.172 | −0.333 | 0.266 | −0.239 |

| 103 | Cimetidine | 0.000 | 0.523 | 0.523 | 0.418 | 0.418 | 0.567 | 0.567 |
| 104 | Cromolyn | −0.495 | −0.678 | −0.183 | 0.01 | 0.505 | −0.635 | −0.14 |
| 105 | Digoxin | 0.494 | 0.158 | −0.336 | 0.213 | −0.281 | 0.167 | −0.327 |
| 106 | Esmolol | 0.279 | 0.423 | 0.144 | 0.4 | 0.121 | 0.401 | 0.122 |
| 107 | Flurbiprofen | −0.921 | −0.494 | 0.427 | −0.55 | 0.371 | −0.534 | 0.387 |
| 108 | Gatifloxacin | 0.243 | 0.204 | −0.039 | 0.301 | 0.058 | 0.21 | −0.033 |
| 109 | Glipizide | −0.796 | −0.698 | 0.098 | −0.416 | 0.38 | −0.72 | 0.075 |
| 110 | Glyburide | −0.854 | −0.449 | 0.405 | −0.474 | 0.38 | −0.485 | 0.369 |
| 111 | Griseofulvin | 0.176 | 0.059 | −0.118 | 0.11 | −0.066 | 0.073 | −0.103 |
| 112 | Naproxen | −0.796 | −0.412 | 0.384 | −0.367 | 0.429 | −0.443 | 0.353 |
| 113 | Phenobarbital | −0.201 | −0.403 | −0.202 | −0.292 | −0.092 | −0.434 | −0.234 |
| 114 | Piroxicam | −0.854 | −0.686 | 0.168 | −0.642 | 0.212 | −0.695 | 0.159 |
| 115 | Prazosin | −0.174 | 0.296 | 0.47 | 0.374 | 0.548 | 0.352 | 0.526 |
| 116 | Prednisone | −0.013 | −0.081 | −0.068 | 0.063 | 0.076 | −0.09 | −0.077 |
| 117 | Propiverine | 0.279 | 0.651 | 0.372 | 0.558 | 0.28 | 0.606 | 0.327 |

| 118 | Pyrazinamide | −0.155 | −0.323 | −0.168 | −0.104 | 0.051 | −0.379 | −0.224 |
| 119 | Quinine | 0.204 | 0.54 | 0.336 | 0.362 | 0.158 | 0.477 | 0.273 |
| 120 | Ropivacaine | −0.260 | 0.178 | 0.438 | 0.167 | 0.426 | 0.102 | 0.362 |
| 121 | Timolol | 0.544 | 0.267 | −0.278 | 0.33 | −0.214 | 0.27 | −0.274 |

HArRc (heteroaromatic ring count) is a constitutional descriptor. Increasing the number of heteroaromatic rings in a molecule increases binding to human serum albumin (HSA) (Ritchie et al., 2011), and increased HSA binding results in a lower volume of distribution. Consistent with this, the sign of the HArRc descriptor in the equation is negative, so Vd decreases as this parameter increases. Similarly, the parameter indicating acidity (Ia) has a negative influence on Vd: increasing the acidity of a compound also increases HSA binding and consequently lowers Vd (Kratochwil et al., 2004).

Fig. 1 Plot of logVd predicted versus experimental using MLR model (R2 = 0.782).

3.2 ANN models

The ANN architecture consists of seven neurons in the input layer, four neurons in the hidden layer (selected by the auto-build function), and one output neuron, which gives the logVd value. The input neurons correspond to the seven selected descriptors Ia, In, GATS1e, GATS5e, HATS8m, Psy80 and HArRc. A sigmoid transfer function was used in all neurons. The optimum values of the learning rate η and momentum μ were determined by varying them from 0.01 to 1.0; the combination η = 0.2 and μ = 0.3, which gave the lowest RMSE, was selected. The optimization was done with 10-fold cross validation, and 20% of the data were used for validation. The learning time was set to 500. With these parameters, the number of neurons in the hidden layer was optimized by varying it from 1 to 10; the ANN model with four hidden neurons gave the lowest RMSE. When all the training data were used to train the network with the optimized parameters and the 7-4-1 architecture, the model gave R2 = 0.819 and RMSE = 0.235. A plot of predicted versus experimental logVd for the training data using the ANN model is shown in Fig. 2. The predicted logVd values of the training data are shown in Table 3.

Fig. 2 Plot of logVd predicted versus experimental using ANN model (R2 = 0.818).

3.3 SVM models

The seven molecular descriptors selected by the linear method were used as inputs for the SVM. To make the learning process stable, a large value of C (C = 100) was used initially (Wang et al., 2003). If the value of C is too small, insufficient stress is placed on fitting the training data. The SVM was trained with γ values varying from 0.001 to 0.5, and the value 0.01, which gave the lowest RMSE, was selected. The optimum value of ε was found by varying it from 0.01 to 0.2; the value 0.029 gave the lowest RMSE. After the ε and γ values were found, the C value was further optimized to 60. The selected parameters (γ = 0.01, ε = 0.029, C = 60) were used for the final training run on the training set and for predicting the logVd values. The plot of predicted versus experimental logVd based on this model is shown in Fig. 3, and the values are shown in Table 3. The statistical parameters of this model are RMSE = 0.245 and R2 = 0.794. This SVM model was used to predict the logVd values of the test data set, and the predicted values are given in Table 3.
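The one-parameter-at-a-time tuning described for the SVM (γ first at a large initial C, then ε, then C) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_rmse` is a made-up stand-in surface whose minimum was deliberately placed at the reported optimum, the candidate grids are illustrative, and `eps_init` (the ε held fixed while γ is swept) is a hypothetical choice that the text does not specify. In practice the scoring function would train and cross-validate an SVM regression model.

```python
import math
from typing import Callable, Sequence, Tuple

# Scoring function signature: (gamma, epsilon, C) -> RMSE
Rmse = Callable[[float, float, float], float]


def sequential_search(rmse: Rmse,
                      gammas: Sequence[float],
                      epsilons: Sequence[float],
                      cs: Sequence[float],
                      c_init: float = 100.0,
                      eps_init: float = 0.1) -> Tuple[float, float, float]:
    """Tune gamma, then epsilon, then C, keeping earlier choices fixed."""
    # Step 1: sweep gamma with a large initial C (C = 100 in the text).
    gamma = min(gammas, key=lambda g: rmse(g, eps_init, c_init))
    # Step 2: sweep epsilon at the chosen gamma.
    eps = min(epsilons, key=lambda e: rmse(gamma, e, c_init))
    # Step 3: refine C at the chosen gamma and epsilon.
    c = min(cs, key=lambda c_: rmse(gamma, eps, c_))
    return gamma, eps, c


def toy_rmse(gamma: float, eps: float, c: float) -> float:
    """Hypothetical smooth surface, NOT a trained SVM."""
    return (0.245
            + (math.log10(gamma) - math.log10(0.01)) ** 2
            + (eps - 0.029) ** 2
            + ((c - 60.0) / 100.0) ** 2)


best = sequential_search(
    toy_rmse,
    gammas=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5],   # illustrative grid
    epsilons=[0.01, 0.029, 0.05, 0.1, 0.2],        # illustrative grid
    cs=[20, 40, 60, 80, 100],                      # illustrative grid
)
print(best)
```

A sequential sweep of this kind is cheaper than a full grid over all three parameters, but it can miss the joint optimum when the parameters interact; the toy surface here is separable, so the two approaches coincide.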

Fig. 3 Plot of logVd predicted versus experimental using SVM model (R2 = 0.794).

3.4 Comparison of MLR, ANN and SVM models

We first modeled the volume of distribution values using the linear MLR method. This approach is easily interpretable and predicted the logVd values with R2 = 0.782 and RMSE = 0.254. The selected descriptors provide some insight into the structural factors influencing the volume of distribution. To capture possible non-linearity in the relationship between the selected descriptors and logVd, we developed ANN and SVM models using the same set of descriptors. The statistical results for the three methodologies are presented in Table 4. The ANN model gave the best performance on the training data, with the highest R2 (0.819) and the lowest RMSE (0.235) of the three models. Its training set prediction accuracy corresponds to an average fold error (AFE) of 1.66. For the ANN model, 84% of compounds are predicted within 2-fold error, compared with 79% and 78% for the SVM and MLR models, respectively. To evaluate the predictive ability of the proposed methods, an external validation was also performed with a test set not used in model building. On the test set, the ANN model has the highest R2 (0.621), although the SVM model has a marginally lower RMSE (0.312 versus 0.317); see Table 4. The prediction accuracy was measured using AFE. The test set AFE is 2.00 for the ANN model, compared with 1.99 and 2.05 for the SVM and MLR models, which indicates that the prediction accuracy of the three models is almost identical. The error normally associated with prediction by interspecies scaling is reported to be in the range of 1.56-2.78 (Obach et al., 1997; Mahmood and Balian, 1996), so our prediction accuracy is comparable. It is worth mentioning that the predictions were made solely from molecular structure, using theoretically calculated parameters without any biological data as descriptors.

Table 4

Statistical results of different QSPkR models

| Model | R | R2 | RMSE | AFE |
|---|---|---|---|---|
| MLR (train) | 0.884 | 0.782 | 0.254 | 1.709 |
| ANN (train) | 0.905 | 0.819 | 0.235 | 1.658 |
| SVM (train) | 0.891 | 0.794 | 0.245 | 1.681 |
| MLR (test) | 0.749 | 0.561 | 0.323 | 2.05 |
| ANN (test) | 0.788 | 0.621 | 0.317 | 2.00 |
| SVM (test) | 0.762 | 0.581 | 0.312 | 1.99 |
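The error statistics compared above can be computed directly from the logVd columns of Table 3. The sketch below is illustrative only. It assumes the commonly used definition AFE = 10^(mean |logVd(pred) − logVd(exp)|), which is not stated in this section, and treats a compound as within 2-fold when its absolute log residual is at most log10(2). The five ANN test-set rows are taken from Table 3 (Cimetidine, Digoxin, Esmolol, Gatifloxacin, Prazosin); because this is a small subset, the results will not match the full-set values in Table 4. The function name `summarize` is hypothetical.

```python
import math
from typing import Dict, Sequence


def summarize(log_exp: Sequence[float], log_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, average fold error and % within 2-fold from log10 Vd values."""
    # Residual convention of Table 3: predicted minus experimental.
    residuals = [p - e for e, p in zip(log_exp, log_pred)]
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    # Assumed AFE definition: antilog of the mean absolute log error.
    afe = 10 ** (sum(abs(r) for r in residuals) / n)
    # A fold error of at most 2 means |residual| <= log10(2), about 0.301.
    hits = sum(1 for r in residuals if abs(r) <= math.log10(2.0))
    return {"rmse": rmse, "afe": afe, "pct_within_2fold": 100.0 * hits / n}


# Subset of ANN test-set rows from Table 3 (experimental, predicted).
log_exp = [0.000, 0.494, 0.279, 0.243, -0.174]
log_pred = [0.418, 0.213, 0.4, 0.301, 0.374]

stats = summarize(log_exp, log_pred)
print(stats)
```

Working in log units keeps over- and under-predictions symmetric: a residual of +0.301 and one of −0.301 both correspond to a 2-fold error.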

4 Conclusions

To prevent late-stage failure in drug discovery, lead candidates must possess appropriate pharmacokinetic properties, so ADME screening has become an important part of the early stages of the drug discovery process. We have derived a relationship between chemical structure and volume of distribution values in humans for a data set of 121 structurally unrelated drugs by means of linear and non-linear QSPkR models. The results demonstrate that QSPkR-based prediction using theoretically calculated descriptors can give reasonable predictions of human Vd values. The statistical analyses of the training data indicate the superiority of the ANN model over the SVM and MLR models in predictive ability and accuracy of prediction. The results also suggest that Sanderson electronegativities, atomic mass, the number of heteroaromatic rings in the molecule, antipsychotic drug-like properties, and the acidity of the molecule play a key role in determining Vd. Thus, the proposed models provide some insight into the structural features relevant to screening compounds for pharmacokinetic properties in the early drug development stage and may help to reduce animal experiments.

References

[1] Balant-Gorgia, A.E., Balant, L.P., Andreoli, A. 1993. Pharmacokinetic optimization of the treatment of psychosis. Clin Pharmacokinetics 25, 217-236.
[2] Bauer, L.A. 2008. Applied Clinical Pharmacokinetics. 2nd Edition, McGraw-Hill Medical, New York.
[3] Berellini, G., Springer, C., Waters, N.J., Lombardo, F. 2009. In silico prediction of volume of distribution in human using linear and nonlinear models on a 669 compound data set. J Med Chem 52, 4488-4495.
[4] Burges, C.A. 1998. Tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2, 1-43.
[5] Cortes, C., Vapnik, V. 1995. Support vector networks. Mach Learn 20, 273-293.
[6] Davis, A.M., Riley, R.J. 2004. Predictive ADMET studies, the challenges and the opportunities. Curr Opin Chem Biol 8, 378-386.
[7] Demel, M.A., Andreas, G.K., Janecek, K.M.T., Ecker, G.F., Gansterer, W.N. 2008. Predictive QSAR models for polyspecific drug targets: The importance of feature selection. Curr Comput Aided Drug Des 4, 91-110.
[8] Doucet, J.P., Barbault, F., Xia, H., Panaye, A., Fan, B. 2007. Nonlinear SVM approaches to QSPR/QSAR studies and drug design. Curr Comput Aided Drug Des 3, 263-389.
[9] Draper, N.R., Smith, H. 1981. Applied Regression Analysis. Wiley, New York.
[10] Durairaj, C., Shah, J.C., Senapati, S., Kompella, U.B. 2009. Prediction of vitreal half-life based on drug physicochemical properties: Quantitative structure-pharmacokinetic relationships (QSPKR). Pharm Res 26, 1236-1260.
[11] Fagerholm, U. 2007. Prediction of human pharmacokinetics; evaluation of methods for prediction of volume of distribution. J Pharm Pharmacol 59, 1181-1190.
[12] Fayyad, U.M., Irani, K.B. 1993. Multi-interval discretisation of continuous valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, Morgan Kaufmann, 1022-1027.
[13] Ghafourian, T., Barzegar-Jalali, M., Dastmalchi, S., Khavari, T., Hakimiha, N., Nokhodchi, A. 2006. QSPR models for the prediction of apparent volume of distribution. Int J Pharm 319, 82-97.
[14] Ghose, A.K., Viswanadhan, V.N., Wendoloski, J.J. 1999. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem 1, 55-68.
[15] Gleeson, M.P. 2007. Plasma protein binding affinity and its relationship to molecular structure: An in silico analysis. J Med Chem 50, 101-112.
[16] Hall, M.A., Holmes, G. 2003. Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15, 1437-1447.
[17] Haykin, S. 2006. Neural Networks. A Comprehensive Foundation. 2nd Edition, Pearson Prentice Hall, New Delhi.
[18] Hollosy, F., Valko, K., Hersey, A., Nunhuck, S., Gyorgy, K., Bevan, C. 2006. Estimation of volume of distribution in humans from high throughput HPLC-based measurements of human serum albumin binding and immobilized artificial membrane partitioning. J Med Chem 49, 6958-6971.
[19] Hou, T., Wang, J. 2008. Structure-ADME relationship: Still a long way to go? Expert Opin Drug Metab Toxicol 4, 759-770.
[20] Ivanciuc, O. 2007. Applications of support vector machines in chemistry. Rev Comput Chem 23, 291-400.
[21] Karalis, V., Tsantili-Kakoulidou, A., Macheras, P. 2002. Multivariate statistics of disposition pharmacokinetic parameters for structurally unrelated drugs. Pharm Res 19, 1827-1834.
[22] Kennedy, T. 1997. Managing the drug discovery and development interface. Drug Discov Today 2, 436-444.
[23] Kratochwil, N.A., Huber, W., Muller, F., Kansy, M., Gerber, P.R. 2004. Predicting plasma protein binding of drugs: Revisited. Curr Opin Drug Discov Dev 7, 507-510.
[24] Li, H., Liang, Y., Xu, Q. 2009. Support vector machines and its applications in chemistry. Chemom Intell Lab Syst 95, 188-198.
[25] Lombardo, F., Obach, R.S., Shalaeva, M.Y., Gao, F. 2002. Prediction of volume of distribution in humans for neutral and basic drugs using physicochemical measurements and plasma protein binding data. J Med Chem 45, 2867-2876.
[26] Lombardo, F., Obach, R.S., Shalaeva, M.Y., Gao, F. 2004. Prediction of human volume of distribution values for neutral and basic drugs. J Med Chem 47, 1242-1250.
[27] Lombardo, F., Obach, R.S., DiCapua, F.M. 2006. Hybrid mixture discriminant analysis-random forest model for the prediction of volume of distribution. J Med Chem 49, 2262-2267.
[28] Mahmood, I., Balian, J.D. 1996. Interspecies scaling: Predicting pharmacokinetic parameters of antiepileptic drugs in humans from animals with special emphasis on Cl. J Pharm Sci 85, 411-414.
[29] Obach, R.S., Baxter, J.G., Liston, T.E., Silber, B.M., Jones, B.C., MacIntyre, F., Rance, D.J., Wastall, P.J. 1997. The prediction of human pharmacokinetic parameters from preclinical and in vitro metabolism data. J Pharmacol Exp Ther 283, 46-58.
[30] Ritchie, T.J., Macdonald, S.J.F., Young, R.J., Pickett, S.D. 2011. The impact of aromatic ring count on compound developability: Further insights by examining carbo- and hetero-aromatic and aliphatic ring types. Drug Discov Today 16, 164-171.
[31] Shevade, S.K., Keerthi, S.S., Bhattacharyya, C., Murthy, K.R.K. 1999. Improvements to SMO algorithm for SVM regression. Technical report CD-99-16, Control Division Dept of Mechanical and Production Engineering, National University of Singapore, Singapore.
[32] Smola, A.J., Scholkopf, B. 2004. A tutorial on support vector regression. Stat Comput 14, 199-222.
[33] Sui, X., Sun, J., Li, H., Wang, Y., Liu, J., Liu, X., Zhang, W., Chen, L., He, Z. 2009. Prediction of volume of distribution values in human using immobilized artificial membrane partitioning coefficients, the fraction of compound ionized and plasma protein binding data. Eur J Med Chem 44, 4455-4460.
[34] Tetko, I.V. 2005. Computing chemistry on the web. Drug Discov Today 10, 1497-1500.
[35] Tetko, I.V., Gasteiger, J., Todeschini, R., Mauri, A., Livingstone, D., Ertl, P., Palyulin, V.A., Radchenko, E.V., Zefirov, N.S., Makarenko, A.S., Tanchuk, V.Y., Prokopenko, V.V. 2005. Virtual computational chemistry laboratory - design and description. J Comput Aid Mol Des 19, 453-463.
[36] Todeschini, R., Consonni, V. 2009. Molecular Descriptors for Chemoinformatics (Methods and Principles in Medicinal Chemistry). Wiley-VCH, Weinheim.
[37] Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer, New York.
[38] Wajima, T., Fukumura, K., Yano, Y., Oguma, T. 2002. Prediction of human clearance from animal data and molecular structural parameters using multivariate regression analysis. J Pharm Sci 91, 2489-2499.
[39] Wang, W.J., Xu, Z.B., Lu, W.Z., Zhang, X.Y. 2003. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing 55, 643-663.
[40] Waterbeemd, H.V.D., Gifford, E. 2003. ADMET in silico modeling: Towards prediction paradise? Nat Rev Drug Discov 2, 192-204.
[41] Witten, I.H., Frank, E. 2005. Data Mining: Practical machine learning tools and techniques. 2nd Ed., Morgan Kaufmann, San Francisco.
[42] Wythoff, B.J. 1993. Back-propagation neural networks: A tutorial. Chemom Intell Lab Syst 18, 115-155.
[43] Yap, C.W., Li, Z.R., Chen, Y.Z. 2006. Quantitative structure-pharmacokinetic relationships for drug clearance by using statistical learning methods. J Mol Graphics Modell 24, 383-395.
[44] Zupan, J. 1994. Introduction to artificial neural network (ANN) methods: What they are and how to use them? Acta Chim Slov 41, 327-352.
[45] Zupan, J., Gasteiger, J. 1999. Neural Networks in Chemistry and Drug Design. Wiley-VCH, Weinheim.
