Accepted Manuscript Bond-based bilinear indices for computational discovery of novel trypanosomicidal drug-like compounds through virtual screening Juan Alberto Castillo-Garit, Oremia del Toro-Cortés, Maria C. Vega, Miriam Rolón, Antonieta Rojas de Arias, Gerardo M. Casañola-Martin, José A. Escario, Alicia Gómez-Barrio, Yovani Marrero-Ponce, Francisco Torrens, Concepción Abad PII:

S0223-5234(15)00230-5

DOI:

10.1016/j.ejmech.2015.03.063

Reference:

EJMECH 7808

To appear in:

European Journal of Medicinal Chemistry

Received Date: 28 October 2014 Revised Date:

27 February 2015

Accepted Date: 27 March 2015

Please cite this article as: J.A. Castillo-Garit, O.d. Toro-Cortés, M.C. Vega, M. Rolón, A. Rojas de Arias, G.M. Casañola-Martin, J.A. Escario, A. Gómez-Barrio, Y. Marrero-Ponce, F. Torrens, C. Abad, Bond-based bilinear indices for computational discovery of novel trypanosomicidal drug-like compounds through virtual screening, European Journal of Medicinal Chemistry (2015), doi: 10.1016/ j.ejmech.2015.03.063. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIPT

Highlights  The two developed models had accuracies higher than 86% for training and test sets

RI PT

 Stochastic model identify nine/eleven compounds of a set of organic chemicals  Antitrypanosomal activity vs. epimastigote forms of T. cruzi is assayed in vitro

 Three compounds shown IC50 values for epimastigote elimination lower than 50 μM.

AC C

EP

TE D

M AN U

SC

 The IC50 cytotoxicity value was five times greater than of IC50 activity value.

ACCEPTED MANUSCRIPT 1

Bond-based bilinear indices for computational

2

discovery of novel trypanosomicidal drug-like

3

compounds through virtual screening

RI PT

4

Juan Alberto Castillo-Garit,a,b,c,d,* Oremia del Toro-Cortés,a Maria C.

6

Vega,e Miriam Rolón,e Antonieta Rojas de Arias,e Gerardo M. Casañola-

7

Martin,b,c,f José A. Escario,g Alicia Gómez-Barrio,g Yovani Marrero-

8

Ponce,h Francisco Torrensd and Concepción Abad,c a

Centro de Estudio de Química Aplicada, Facultad de Química-Farmacia, Universidad Central “Marta

M AN U

Abreu” de Las Villas, Santa Clara, 54830, Villa Clara, Cuba. b

Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy. Universidad Central “Marta Abreu” de Las Villas, Santa Clara, 54830, Villa Clara, Cuba.

c

Departament de Bioquímica i Biologia Molecular, Universitat de València, E-46100 Burjassot, Spain.

d

Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P.O.

TE D

Box 22085, E-46071, València, Spain. e

Centro para el Desarrollo de la Investigacion Cientifica (CEDIC) and Fundación Moisés Bertoni/Laboratorios Díaz Gill, Pai Perez 265 casi Mariscal Estigarribia. Asuncion-Paraguay.

f

Centro de Información y Gestión Tecnológica, Ministerio de Ciencia Tecnología y Medio Ambiente (CITMA), 65100, Ciego de Ávila, Cuba

g

EP

Departamento de Parasitología, Facultad de Farmacia, UCM, Pza. Ramón y Cajal s/n, 28040 Madrid.

h

Enviromental and Computational Chemistry Group, Facultad de Química Farmacéutica, Universidad de

Cartagena,Cartagena de Indias, Bolivar, Colombia

25 26

AC C

9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

SC

5

27

*

28

Postal address: Universidad Central “Marta Abreu” de Las Villas, Facultad de

29

Química-Farmacia, Carretera a Camajuaní Km 51/2, Santa Clara, Villa Clara, Cuba. CP:

30

54830

31

Telephone: 53-42-281510, 211863

32

e-mail: [email protected] or [email protected]

33

To whom correspondence should be addressed:

ACCEPTED MANUSCRIPT Abstract

35

Two-dimensional bond-based bilinear indices and linear discriminant analysis are used

36

in this report to perform a quantitative structure-activity relationship study to identify

37

new trypanosomicidal compounds. A data set of 440 organic chemicals, 143 with

38

antitrypanosomal activity and 297 having other clinical uses, is used to develop the

39

theoretical models. Two discriminant models, computed using bond-based bilinear

40

indices, are developed and both show accuracies higher than 86% for training and test

41

sets. The stochastic model correctly indentifies nine out of eleven compounds of a set of

42

organic chemicals obtained from our synthetic collaborators. The in vitro

43

antitrypanosomal activity of this set against epimastigote forms of Trypanosoma cruzi is

44

assayed. Both models show a good agreement between theoretical predictions and

45

experimental results. Three compounds showed IC50 values for epimastigote elimination

46

(AE) lower than 50 μM, while for the benznidazole the IC50=54.7 μM which was used

47

as reference compound. The value of IC50 for cytotoxicity of these compounds is at least

48

5 times greater than their value of IC50 for AE. Finally, we can say that, the present

49

algorithm constitutes a step forward in the search for efficient ways of discovering new

50

antitrypanosomal compounds.

52 53 54

SC

M AN U

TE D

EP

AC C

51

RI PT

34

55 56

Keywords: Bond-based Bilinear Indices, LDA-assisted QSAR Model, virtual

57

Screening, Trypanosomicidal, In Vitro Cytotoxicity.

58

ACCEPTED MANUSCRIPT Introduction

60

American trypanosomiasis or Chagas disease is a chronic parasitosis, caused by the

61

kinetoplastid parasite Trypanosoma cruzi, which has afflicted humanity since its earliest

62

presence in the New World [1]. The disease is technically a zoonosis, endemic to Latin

63

America that currently affects about 15 million people. Nearly 30 million people live in

64

areas with vector-borne transmission risk and the disease causes an estimated 20 000

65

deaths per year [2]. It is also estimated that up to 5.4 million people will develop

66

chronic Chagas heart disease [3, 4], while 900 000 will develop megaesophagus and

67

megacolon [4]. Historically it has been categorized (typified) as a disease of poor, rural

68

populations but recent trends in migration have brought T. cruzi infection to Latin

69

American cities, and far beyond the borders of Latin America [5, 6].

70

There is neither a vaccine nor any preventive treatment for this parasitosis. Current

71

chemotherapy remains unsatisfactory and the available drugs are benznidazole and

72

nitrofurans such as nifurtimox [7]. These compounds (Fig. 1) originally registered to

73

treat acute T. cruzi infections, remain to this date as the only available drugs for the

74

specific treatment of Chagas disease [6, 8]. The efficacy of drug therapy declines with

75

the duration of infection and is still a matter of debate in the late chronic phase of the

76

disease [5, 9], which is prevalent in Latin America and is considered incurable [10].

77

Their efficacy also varies according to geographical areas, mainly because of

78

differences in drug susceptibility of different T. cruzi strains [7, 11]. Moreover, both

79

medications have important side effects such as anorexia, vomiting, peripheral

80

neuropathy and allergic dermopathy, which can result on treatment discontinuation [10,

81

12]. All these inconvenients explain the need for the search and design of new drugs

82

against T. cruzi [13, 14].

AC C

EP

TE D

M AN U

SC

RI PT

59

ACCEPTED MANUSCRIPT At present, there is an increasing interest on the development of rational approaches for

84

antiparasitic drugs discovery. In this sense, a very important role may be played by

85

computer-aided drug design studies [15]. In this context, our research group has recently

86

developed a novel scheme to generate molecular fingerprints based on the application of

87

discrete mathematics and linear algebra theory. The approach [known as TOMOCOMD

88

acronym of TOpological MOlecular COMputer Design] [16] allows us to perform

89

rational in silico molecular design (selection/identification) and Quantitative Structure-

90

Activity/Property Relationship (QSAR/QSPR) studies. In fact, this scheme has been

91

successfully applied to the prediction of several physical, physicochemical, chemical,

92

pharmacokinetical, toxicological as well as biological properties [17-20]. Furthermore,

93

these molecular descriptors (MDs) have been extended to consider three-dimensional

94

(3D) features of small/medium-sized molecules based on the trigonometric-3D-

95

chirality-correction factor approach [21-23].

M AN U

SC

RI PT

83

Figure 1 comes about here

97

In the present report, bond-based non-stochastic and stochastic bilinear indices are used

98

to find classification models that allow the discrimination of antitrypanosomal

99

compounds. This kind of approach permits the rational identification of those candidates

EP

TE D

96

to be evaluated, which have the highest probabilities of being active ones. Therefore,

101

eleven compounds were in silico evaluated and, after that, in vitro assayed against

102

epimastigote forms of Trypanosoma cruzi. Cytotoxic studies against macrophages were

103

also conducted.

AC C

100

104 105

Materials and Methods

106

Data set for QSAR Study

ACCEPTED MANUSCRIPT The general data set used in this study consists of 440 compounds of great structural

108

variation, 143 of which are active and 297 are inactive against trypanosome. The

109

antitrypanosomals considered in this study are representative of families with diverse

110

structural patterns and were collected from previous publications [24-41]. The names of

111

compounds in the database together with their experimental data taken from the

112

literature are reported in the Supporting Information (activity data and structures of the

113

143 anti-trypanosome agents in Tables S1 and S3 and Figures S1-S2, respectively). It is

114

remarkable that the wide variability of drugs and mechanisms of action of active

115

compounds in the training and prediction sets assures adequate extrapolation power and

116

increases the possibilities of the discovery of new lead compounds with novel

117

mechanisms of action, which results one of the most critical aspects in the construction

118

of non-congeneric data. Therefore, this dataset provides us with a suitable data matrix to

119

explore the potential of computational tools in ligand-based drug design of trypanocidal

120

agents.

121

On the other hand, 297 compounds having different clinical uses such as antiviral,

122

sedative/hypnotic, diuretic, anticonvulsant, hemostatic, oral hypoglycemic, anti-

123

hypertensive, antihelmintic and anticancer compounds as well as some other kinds of

124

drugs were selected for the set of inactive compounds through random selection, in

125

order to guarantee great structural variability as well. All these compounds were taken

126

from the Negwer Handbook [42] and Merck Index [43] in which their names,

127

synonyms, and structural formulas can be found. The classification of these organic

128

compounds as ‟inactive‟ (non-antitrypanosomal) does not guarantee that all are truly so;

129

some of them may have inhibitory activity toward tyrosinase, which is undetected. This

130

problem can be reflected in the results of classification for the series of inactive

131

compounds [44].

AC C

EP

TE D

M AN U

SC

RI PT

107

ACCEPTED MANUSCRIPT Computational approach

133

The theory of the bond-based bilinear indices used in this study was discussed in detail

134

in an earlier publication [45]. Specifically, the CARDD (Computed-Aided Rational Drug

135

Design) module implemented in the TOMOCOMD Software [16] was used in the

136

calculation of bond-based non-stochastic and stochastic bilinear indices. In this study,

137

the properties used to differentiate the molecular atoms are those previously proposed

138

for the calculation of the DRAGON [46] descriptors [47-49], i.e., atomic mass (M),

139

atomic polarizability (P), atomic Mulliken electronegativity (K), van der Waals atomic

140

volume (V), plus the atomic electronegativity in Pauling scale (G)[50].

141

The bond-based bilinear indices descriptors computed in this study were the following:

M AN U

SC

RI PT

132

142 1) kth (k = 0,15 ) total non-stochastic bond-based bilinear indices, not considering and 143

considering H-atoms in the molecular graph (G) [Y-Zbk( w , u ) and Y-ZbkH( w , u )],

144

respectively.

TE D

145 2) kth (k = 0,15 ) total stochastic bond-based bilinear indices, not considering and considering H-atoms in the molecular graph (G) [Y-Zsbk( w , u ) and Y-ZsbkH( w , u )],

147

respectively.

EP

146

148 3) kth (k = 0,15 ) bond-type local (group = heteroatoms: S, N, O) non-stochastic bilinear indices, not considering and considering H-atoms in the molecular graph (G) [Y-

150

Z

151

putative molecular charge, dipole moment, and H-bonding acceptors character.

AC C

149

bkLE( w , u ) and Y-ZbkLEH( w , u ), correspondingly]. These local descriptors have

152 4) kth (k = 0,15 ) bond-type local (group = heteroatoms: S, N, O) stochastic bilinear 153

indices, not considering and considering H-atoms in the molecular graph (G) [Y-

154

Zs

155

putative molecular charge, dipole moment, and H-bonding acceptors character.

156

Chemometric method

bkLE( w , u ) and Y-ZsbkLEH( w , u )], correspondingly. These local descriptors have also

ACCEPTED MANUSCRIPT Linear discriminant analysis (LDA) was performed with software package

158

STATISTICA [51]. Forward stepwise was fixed as the strategy for variable selection.

159

The quality of the models was determined by examining Wilks´ λ parameter (U-

160

statistic), square Mahalanobis distance (D2), Fisher ratio (F) and the corresponding p-

161

level (p(F))[52] as well as the percentage, in training and test sets, of global good

162

classification (accuracy), Matthews‟ correlation coefficient (MCC), sensitivity,

163

specificity, negative predictive value (sensitivity of the negative category) and false

164

positive rate (false alarm rate) [53]. Models with a ratio (number of cases)/(number of

165

variables) lower than 5 were rejected.

166

One of the crucial steps, involved in a QSAR study, consists in the statistical validation

167

of the results to determine its reliability and significance, while providing an indication

168

of how well the model can predict activity for new molecules. Several procedures are

169

available for this task, and in the present work we carry out both internal and external

170

ones.

171

The internal validation technique uses the dataset from which the model is performed

172

and checks for internal consistency. The procedure derives a new model using a reduced

173

set of structural data. The new model is used to predict the activities of the molecules

174

that were not included in the new-model set. This is repeated until all compounds have

175

been deleted and predicted once. As internal validation we performed two experiments:

176

a leave-group-out (LGO) and a randomization test (Y-scrambling) [54]:

177

(I)

AC C

EP

TE D

M AN U

SC

RI PT

157

LGO Cross-validation (CV) is one of the most commonly used statistical

178

technique for internal validation in which different proportions of the training set

179

are iteratively held-out; then a new model is developed and used to “predict” the

180

held-out compound as new in order to verify internal “predictability”. This

181

procedure is repeated for each set of modified data. Therefore, in this report the

ACCEPTED MANUSCRIPT LGO test is used in such a way that a fraction (5, 10, 15, 20, 25, and 30%) of the

183

initial data is left out (not participating in the model‟s construction) and model

184

predictions are made based on the reduced data. This process is repeated until

185

each observation has been left out once. The accuracies for the reduced training

186

and test (held-out compounds) series are calculated and plotted.

187

(II)

RI PT

182

Y-scrambling, or response permutation testing, is another technique widely use to check the robustness of a QSAR model and to identify models based on

189

chance correlation. The set of activity values is randomly re-assigned to different

190

molecules, and a new QSAR model is performed. This procedure is similar to

191

cross-validation CV but, instead of leaving groups outside, it categorizes the

192

assignments (1 for −1 and vice versa), for each (active and inactive) group of the

193

database 5%, 10%, 15%, 20%, 25% and 30% of the total. This process is

194

repeated until each case has been inverted once. The accuracies for new models

195

are calculated and plotted. If the quality of the random response-models is

196

comparable to the original one, the set of observations is not sufficient to support

197

the model and the chance correlation is detected.

TE D

M AN U

SC

188

Validation external process is necessary to ensure the quality and predictive power of

199

the QSAR models to predict the activity of compounds that were not used for model

200

development. In this study, the original data are divided into two series, the training and

201

test sets. The training set is used to build the QSAR models, and these discriminant

202

functions (DFs) are used to predict the activities of compounds in the test set. The

203

predictability of a model is estimated by comparing the predicted and observed classes

204

of a sufficiently large and representative test of compounds.

205

Biological Assays

206

AC C

EP

198

Determination of „in vitro‟ Tripanosomicidals Activity [55].

ACCEPTED MANUSCRIPT Parasites: For in vitro studies, the clone CL-B5 of T. cruzi was used. The

208

parasites, stably transfected with the Escherichia coli β-galactosidase gene (lacZ),

209

were kindly provided by Dr. F. Buckner through Universidad Complutense

210

Madrid (Spain). The epimastigotes are grown at 28 ºC in liver infusion tryptose

211

broth (LIT) with 10% fetal bovine serum (FBS), penicillin and streptomycin.

212

Reagents: Chlorophenol red-β-D-galactopyranoside (CPRG; Roche, Indianapolis,

213

Ind.) is dissolved in Triton X-100 (pH7.4).

214

Epimastigote susceptibility assay: The screening assay was performed in 96-well

215

microplates with cultures that have not reached the stationary phase.

216

Epimastigotes were seeded at 2 x 105 per milliliter in 200 µL. The plates are then

217

incubate with the drugs at 28ºC for 72 hours, at which time 50 µL of CPRG

218

solution was added to give a final concentration of 200 µM. The plates were

219

incubated at 37ºC for an additional 6 h and are then read at 595 nm. Each

220

concentration was tested in triplicate. Each experiment was performed twice

221

separately. The efficacy of each compound is estimated by calculating the IC50

222

and AE% (antiepimastigotes percentage).

SC

M AN U

TE D

Cytotoxicity Assay [56].

EP

223

RI PT

207

Cell culture: The cell lines that are used are NCTC clone 929 and murine J774

225

macrophages. NCTC clone 929 cells are grown in Minimal Essential Medium

226 227

AC C

224

(Sigma) and J774 macrophages are grown in RPMI 1640 medium (Sigma). Both media are supplement with 10% heat-inactivated FBS (30 minutes at 56ºC),

228

penicillin G (100 U/mL) and streptomycin (100 µg/mL). For the experiments,

229

cells in the pre-confluence phase are harvested with trypsin. Cell cultures are

230

maintained at 37ºC in a humidified 5% CO2 atmosphere.

ACCEPTED MANUSCRIPT Reagents: Resazurin Sodium Salt was obtained from Sigma-Aldrich, St. Louis,

232

USA and stored at 4 º C protected from light. A solution of Resazurin was

233

prepared in 1% phosphate buffer solution (PBS) at pH7 and filter-sterilized before

234

use.

235

Macrophages cell cytotoxicity assay: J774 macrophages were seeded

236

(5x104cells/well) in 96-well flat-bottom microplates with 100 µL of RPMI 1640

237

medium. The cells were allowed to attach for 24 h at 37ºC, 5% CO2 and the

238

medium is replaced by different concentrations of the drugs in 200 µL of medium,

239

and exposed for another 48 h. Growth controls are also included. Afterwards, a

240

volume 20 µL of 2 mM resazurin solution was added and plates were return to

241

incubator for another 3 h to evaluate cell viability. The reduction of resazurin was

242

determined by dual wavelength absorbance measurement at 490 nm and 595 nm.

243

Background was subtracted. Each concentration was assayed in triplicate.

244

Medium and drug controls were used in every test as blanks. The cytotoxicity

245

percentages (%C) and IC50 were then calculated.

SC

M AN U

TE D

246

RI PT

231

Results and Discussion

248

LDA-QSAR Models: Developing and validation.

249

The LDA has become an important tool for the prediction of chemical properties.

250

Because of the simplicity of this method many useful discriminant models have been

251

developed and presented by different authors in the literature [18, 44, 57-59]. It was the

252

technique used for the generation of a discriminant function in the present work. The

253

principle of maximal parsimony (Occam‟s razor) was taken into account as the strategy

254

for model selection [60]. The general data set was randomly divided into two subsets,

255

training and test set (which have 346 and 94 compounds, respectively), both of them

AC C

EP

247

ACCEPTED MANUSCRIPT 256

containing active and inactive compounds. The best models obtained using bond-based

257

non-stochastic and stochastic quadratic indices as molecular descriptors, together with

258

their statistical parameters are given below, respectively:

259

Class= -5.05 – 9.40x10-5 MVb2H( w , u ) +2.17x10-12 MVb15LH( w E, u E) +8.76x10-1 PKb0H( w , u ) –7.95x10-3PKb4H( w , u ) – 2.24x10-3PKb4LH( w E, u E)

261

– 6.16x10-2VKb0H( w , u ) +2.86x10-3VKb3H( w , u ) N = 346

 = 0.459

263

D2 = 5.17

F = 56.88

QTotal = 87.86 % p < 0.0001

264

MCC = 0.73

Class= -3.27 –3.08x10-2 MKsb6L( w E, u E) +3.11x10-3 MVsb6L( w E, u E)

M AN U

265

(1)

SC

262

RI PT

260

266

+2.57x10-1 PKsb7H( w , u ) +4.52x10-1 PKsb14LH( w E, u E) +5.39x10-3 VKs b4H( w , u )

267

–9.30x10-2 VPsb15LH( w E, u E) +1.79x10-1 VPsb6L( w E, u E)

268

–1.77x10-1 VPsb9L( w E, u E)  = 0.509

269

N = 346

270

D2 = 4.224

271

where, Class refers to antitrypanosomal activity, N is the number of compounds,  is

272

the Wilks‟ statistic, QTotal is the accuracy of the model for the training set, MCC is the

273

Matthews‟ correlation coefficient, D2 is the square Mahalanobis distance, F is the Fisher

274

ratio and p-value is its significance level.

275

Both equations appeared statistically significant at p

Bond-based bilinear indices for computational discovery of novel trypanosomicidal drug-like compounds through virtual screening.

Two-dimensional bond-based bilinear indices and linear discriminant analysis are used in this report to perform a quantitative structure-activity rela...
836KB Sizes 6 Downloads 11 Views