Accepted Manuscript Bond-based bilinear indices for computational discovery of novel trypanosomicidal drug-like compounds through virtual screening Juan Alberto Castillo-Garit, Oremia del Toro-Cortés, Maria C. Vega, Miriam Rolón, Antonieta Rojas de Arias, Gerardo M. Casañola-Martin, José A. Escario, Alicia Gómez-Barrio, Yovani Marrero-Ponce, Francisco Torrens, Concepción Abad PII:
S0223-5234(15)00230-5
DOI:
10.1016/j.ejmech.2015.03.063
Reference:
EJMECH 7808
To appear in:
European Journal of Medicinal Chemistry
Received Date: 28 October 2014 Revised Date:
27 February 2015
Accepted Date: 27 March 2015
Please cite this article as: J.A. Castillo-Garit, O.d. Toro-Cortés, M.C. Vega, M. Rolón, A. Rojas de Arias, G.M. Casañola-Martin, J.A. Escario, A. Gómez-Barrio, Y. Marrero-Ponce, F. Torrens, C. Abad, Bond-based bilinear indices for computational discovery of novel trypanosomicidal drug-like compounds through virtual screening, European Journal of Medicinal Chemistry (2015), doi: 10.1016/ j.ejmech.2015.03.063. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
AC C
EP
TE D
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
ACCEPTED MANUSCRIPT
Highlights The two developed models had accuracies higher than 86% for training and test sets
RI PT
Stochastic model identify nine/eleven compounds of a set of organic chemicals Antitrypanosomal activity vs. epimastigote forms of T. cruzi is assayed in vitro
Three compounds shown IC50 values for epimastigote elimination lower than 50 μM.
AC C
EP
TE D
M AN U
SC
The IC50 cytotoxicity value was five times greater than of IC50 activity value.
ACCEPTED MANUSCRIPT 1
Bond-based bilinear indices for computational
2
discovery of novel trypanosomicidal drug-like
3
compounds through virtual screening
RI PT
4
Juan Alberto Castillo-Garit,a,b,c,d,* Oremia del Toro-Cortés,a Maria C.
6
Vega,e Miriam Rolón,e Antonieta Rojas de Arias,e Gerardo M. Casañola-
7
Martin,b,c,f José A. Escario,g Alicia Gómez-Barrio,g Yovani Marrero-
8
Ponce,h Francisco Torrensd and Concepción Abad,c a
Centro de Estudio de Química Aplicada, Facultad de Química-Farmacia, Universidad Central “Marta
M AN U
Abreu” de Las Villas, Santa Clara, 54830, Villa Clara, Cuba. b
Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy. Universidad Central “Marta Abreu” de Las Villas, Santa Clara, 54830, Villa Clara, Cuba.
c
Departament de Bioquímica i Biologia Molecular, Universitat de València, E-46100 Burjassot, Spain.
d
Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P.O.
TE D
Box 22085, E-46071, València, Spain. e
Centro para el Desarrollo de la Investigacion Cientifica (CEDIC) and Fundación Moisés Bertoni/Laboratorios Díaz Gill, Pai Perez 265 casi Mariscal Estigarribia. Asuncion-Paraguay.
f
Centro de Información y Gestión Tecnológica, Ministerio de Ciencia Tecnología y Medio Ambiente (CITMA), 65100, Ciego de Ávila, Cuba
g
EP
Departamento de Parasitología, Facultad de Farmacia, UCM, Pza. Ramón y Cajal s/n, 28040 Madrid.
h
Enviromental and Computational Chemistry Group, Facultad de Química Farmacéutica, Universidad de
Cartagena,Cartagena de Indias, Bolivar, Colombia
25 26
AC C
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
SC
5
27
*
28
Postal address: Universidad Central “Marta Abreu” de Las Villas, Facultad de
29
Química-Farmacia, Carretera a Camajuaní Km 51/2, Santa Clara, Villa Clara, Cuba. CP:
30
54830
31
Telephone: 53-42-281510, 211863
32
e-mail:
[email protected] or
[email protected] 33
To whom correspondence should be addressed:
ACCEPTED MANUSCRIPT Abstract
35
Two-dimensional bond-based bilinear indices and linear discriminant analysis are used
36
in this report to perform a quantitative structure-activity relationship study to identify
37
new trypanosomicidal compounds. A data set of 440 organic chemicals, 143 with
38
antitrypanosomal activity and 297 having other clinical uses, is used to develop the
39
theoretical models. Two discriminant models, computed using bond-based bilinear
40
indices, are developed and both show accuracies higher than 86% for training and test
41
sets. The stochastic model correctly indentifies nine out of eleven compounds of a set of
42
organic chemicals obtained from our synthetic collaborators. The in vitro
43
antitrypanosomal activity of this set against epimastigote forms of Trypanosoma cruzi is
44
assayed. Both models show a good agreement between theoretical predictions and
45
experimental results. Three compounds showed IC50 values for epimastigote elimination
46
(AE) lower than 50 μM, while for the benznidazole the IC50=54.7 μM which was used
47
as reference compound. The value of IC50 for cytotoxicity of these compounds is at least
48
5 times greater than their value of IC50 for AE. Finally, we can say that, the present
49
algorithm constitutes a step forward in the search for efficient ways of discovering new
50
antitrypanosomal compounds.
52 53 54
SC
M AN U
TE D
EP
AC C
51
RI PT
34
55 56
Keywords: Bond-based Bilinear Indices, LDA-assisted QSAR Model, virtual
57
Screening, Trypanosomicidal, In Vitro Cytotoxicity.
58
ACCEPTED MANUSCRIPT Introduction
60
American trypanosomiasis or Chagas disease is a chronic parasitosis, caused by the
61
kinetoplastid parasite Trypanosoma cruzi, which has afflicted humanity since its earliest
62
presence in the New World [1]. The disease is technically a zoonosis, endemic to Latin
63
America that currently affects about 15 million people. Nearly 30 million people live in
64
areas with vector-borne transmission risk and the disease causes an estimated 20 000
65
deaths per year [2]. It is also estimated that up to 5.4 million people will develop
66
chronic Chagas heart disease [3, 4], while 900 000 will develop megaesophagus and
67
megacolon [4]. Historically it has been categorized (typified) as a disease of poor, rural
68
populations but recent trends in migration have brought T. cruzi infection to Latin
69
American cities, and far beyond the borders of Latin America [5, 6].
70
There is neither a vaccine nor any preventive treatment for this parasitosis. Current
71
chemotherapy remains unsatisfactory and the available drugs are benznidazole and
72
nitrofurans such as nifurtimox [7]. These compounds (Fig. 1) originally registered to
73
treat acute T. cruzi infections, remain to this date as the only available drugs for the
74
specific treatment of Chagas disease [6, 8]. The efficacy of drug therapy declines with
75
the duration of infection and is still a matter of debate in the late chronic phase of the
76
disease [5, 9], which is prevalent in Latin America and is considered incurable [10].
77
Their efficacy also varies according to geographical areas, mainly because of
78
differences in drug susceptibility of different T. cruzi strains [7, 11]. Moreover, both
79
medications have important side effects such as anorexia, vomiting, peripheral
80
neuropathy and allergic dermopathy, which can result on treatment discontinuation [10,
81
12]. All these inconvenients explain the need for the search and design of new drugs
82
against T. cruzi [13, 14].
AC C
EP
TE D
M AN U
SC
RI PT
59
ACCEPTED MANUSCRIPT At present, there is an increasing interest on the development of rational approaches for
84
antiparasitic drugs discovery. In this sense, a very important role may be played by
85
computer-aided drug design studies [15]. In this context, our research group has recently
86
developed a novel scheme to generate molecular fingerprints based on the application of
87
discrete mathematics and linear algebra theory. The approach [known as TOMOCOMD
88
acronym of TOpological MOlecular COMputer Design] [16] allows us to perform
89
rational in silico molecular design (selection/identification) and Quantitative Structure-
90
Activity/Property Relationship (QSAR/QSPR) studies. In fact, this scheme has been
91
successfully applied to the prediction of several physical, physicochemical, chemical,
92
pharmacokinetical, toxicological as well as biological properties [17-20]. Furthermore,
93
these molecular descriptors (MDs) have been extended to consider three-dimensional
94
(3D) features of small/medium-sized molecules based on the trigonometric-3D-
95
chirality-correction factor approach [21-23].
M AN U
SC
RI PT
83
Figure 1 comes about here
97
In the present report, bond-based non-stochastic and stochastic bilinear indices are used
98
to find classification models that allow the discrimination of antitrypanosomal
99
compounds. This kind of approach permits the rational identification of those candidates
EP
TE D
96
to be evaluated, which have the highest probabilities of being active ones. Therefore,
101
eleven compounds were in silico evaluated and, after that, in vitro assayed against
102
epimastigote forms of Trypanosoma cruzi. Cytotoxic studies against macrophages were
103
also conducted.
AC C
100
104 105
Materials and Methods
106
Data set for QSAR Study
ACCEPTED MANUSCRIPT The general data set used in this study consists of 440 compounds of great structural
108
variation, 143 of which are active and 297 are inactive against trypanosome. The
109
antitrypanosomals considered in this study are representative of families with diverse
110
structural patterns and were collected from previous publications [24-41]. The names of
111
compounds in the database together with their experimental data taken from the
112
literature are reported in the Supporting Information (activity data and structures of the
113
143 anti-trypanosome agents in Tables S1 and S3 and Figures S1-S2, respectively). It is
114
remarkable that the wide variability of drugs and mechanisms of action of active
115
compounds in the training and prediction sets assures adequate extrapolation power and
116
increases the possibilities of the discovery of new lead compounds with novel
117
mechanisms of action, which results one of the most critical aspects in the construction
118
of non-congeneric data. Therefore, this dataset provides us with a suitable data matrix to
119
explore the potential of computational tools in ligand-based drug design of trypanocidal
120
agents.
121
On the other hand, 297 compounds having different clinical uses such as antiviral,
122
sedative/hypnotic, diuretic, anticonvulsant, hemostatic, oral hypoglycemic, anti-
123
hypertensive, antihelmintic and anticancer compounds as well as some other kinds of
124
drugs were selected for the set of inactive compounds through random selection, in
125
order to guarantee great structural variability as well. All these compounds were taken
126
from the Negwer Handbook [42] and Merck Index [43] in which their names,
127
synonyms, and structural formulas can be found. The classification of these organic
128
compounds as ‟inactive‟ (non-antitrypanosomal) does not guarantee that all are truly so;
129
some of them may have inhibitory activity toward tyrosinase, which is undetected. This
130
problem can be reflected in the results of classification for the series of inactive
131
compounds [44].
AC C
EP
TE D
M AN U
SC
RI PT
107
ACCEPTED MANUSCRIPT Computational approach
133
The theory of the bond-based bilinear indices used in this study was discussed in detail
134
in an earlier publication [45]. Specifically, the CARDD (Computed-Aided Rational Drug
135
Design) module implemented in the TOMOCOMD Software [16] was used in the
136
calculation of bond-based non-stochastic and stochastic bilinear indices. In this study,
137
the properties used to differentiate the molecular atoms are those previously proposed
138
for the calculation of the DRAGON [46] descriptors [47-49], i.e., atomic mass (M),
139
atomic polarizability (P), atomic Mulliken electronegativity (K), van der Waals atomic
140
volume (V), plus the atomic electronegativity in Pauling scale (G)[50].
141
The bond-based bilinear indices descriptors computed in this study were the following:
M AN U
SC
RI PT
132
142 1) kth (k = 0,15 ) total non-stochastic bond-based bilinear indices, not considering and 143
considering H-atoms in the molecular graph (G) [Y-Zbk( w , u ) and Y-ZbkH( w , u )],
144
respectively.
TE D
145 2) kth (k = 0,15 ) total stochastic bond-based bilinear indices, not considering and considering H-atoms in the molecular graph (G) [Y-Zsbk( w , u ) and Y-ZsbkH( w , u )],
147
respectively.
EP
146
148 3) kth (k = 0,15 ) bond-type local (group = heteroatoms: S, N, O) non-stochastic bilinear indices, not considering and considering H-atoms in the molecular graph (G) [Y-
150
Z
151
putative molecular charge, dipole moment, and H-bonding acceptors character.
AC C
149
bkLE( w , u ) and Y-ZbkLEH( w , u ), correspondingly]. These local descriptors have
152 4) kth (k = 0,15 ) bond-type local (group = heteroatoms: S, N, O) stochastic bilinear 153
indices, not considering and considering H-atoms in the molecular graph (G) [Y-
154
Zs
155
putative molecular charge, dipole moment, and H-bonding acceptors character.
156
Chemometric method
bkLE( w , u ) and Y-ZsbkLEH( w , u )], correspondingly. These local descriptors have also
ACCEPTED MANUSCRIPT Linear discriminant analysis (LDA) was performed with software package
158
STATISTICA [51]. Forward stepwise was fixed as the strategy for variable selection.
159
The quality of the models was determined by examining Wilks´ λ parameter (U-
160
statistic), square Mahalanobis distance (D2), Fisher ratio (F) and the corresponding p-
161
level (p(F))[52] as well as the percentage, in training and test sets, of global good
162
classification (accuracy), Matthews‟ correlation coefficient (MCC), sensitivity,
163
specificity, negative predictive value (sensitivity of the negative category) and false
164
positive rate (false alarm rate) [53]. Models with a ratio (number of cases)/(number of
165
variables) lower than 5 were rejected.
166
One of the crucial steps, involved in a QSAR study, consists in the statistical validation
167
of the results to determine its reliability and significance, while providing an indication
168
of how well the model can predict activity for new molecules. Several procedures are
169
available for this task, and in the present work we carry out both internal and external
170
ones.
171
The internal validation technique uses the dataset from which the model is performed
172
and checks for internal consistency. The procedure derives a new model using a reduced
173
set of structural data. The new model is used to predict the activities of the molecules
174
that were not included in the new-model set. This is repeated until all compounds have
175
been deleted and predicted once. As internal validation we performed two experiments:
176
a leave-group-out (LGO) and a randomization test (Y-scrambling) [54]:
177
(I)
AC C
EP
TE D
M AN U
SC
RI PT
157
LGO Cross-validation (CV) is one of the most commonly used statistical
178
technique for internal validation in which different proportions of the training set
179
are iteratively held-out; then a new model is developed and used to “predict” the
180
held-out compound as new in order to verify internal “predictability”. This
181
procedure is repeated for each set of modified data. Therefore, in this report the
ACCEPTED MANUSCRIPT LGO test is used in such a way that a fraction (5, 10, 15, 20, 25, and 30%) of the
183
initial data is left out (not participating in the model‟s construction) and model
184
predictions are made based on the reduced data. This process is repeated until
185
each observation has been left out once. The accuracies for the reduced training
186
and test (held-out compounds) series are calculated and plotted.
187
(II)
RI PT
182
Y-scrambling, or response permutation testing, is another technique widely use to check the robustness of a QSAR model and to identify models based on
189
chance correlation. The set of activity values is randomly re-assigned to different
190
molecules, and a new QSAR model is performed. This procedure is similar to
191
cross-validation CV but, instead of leaving groups outside, it categorizes the
192
assignments (1 for −1 and vice versa), for each (active and inactive) group of the
193
database 5%, 10%, 15%, 20%, 25% and 30% of the total. This process is
194
repeated until each case has been inverted once. The accuracies for new models
195
are calculated and plotted. If the quality of the random response-models is
196
comparable to the original one, the set of observations is not sufficient to support
197
the model and the chance correlation is detected.
TE D
M AN U
SC
188
Validation external process is necessary to ensure the quality and predictive power of
199
the QSAR models to predict the activity of compounds that were not used for model
200
development. In this study, the original data are divided into two series, the training and
201
test sets. The training set is used to build the QSAR models, and these discriminant
202
functions (DFs) are used to predict the activities of compounds in the test set. The
203
predictability of a model is estimated by comparing the predicted and observed classes
204
of a sufficiently large and representative test of compounds.
205
Biological Assays
206
AC C
EP
198
Determination of „in vitro‟ Tripanosomicidals Activity [55].
ACCEPTED MANUSCRIPT Parasites: For in vitro studies, the clone CL-B5 of T. cruzi was used. The
208
parasites, stably transfected with the Escherichia coli β-galactosidase gene (lacZ),
209
were kindly provided by Dr. F. Buckner through Universidad Complutense
210
Madrid (Spain). The epimastigotes are grown at 28 ºC in liver infusion tryptose
211
broth (LIT) with 10% fetal bovine serum (FBS), penicillin and streptomycin.
212
Reagents: Chlorophenol red-β-D-galactopyranoside (CPRG; Roche, Indianapolis,
213
Ind.) is dissolved in Triton X-100 (pH7.4).
214
Epimastigote susceptibility assay: The screening assay was performed in 96-well
215
microplates with cultures that have not reached the stationary phase.
216
Epimastigotes were seeded at 2 x 105 per milliliter in 200 µL. The plates are then
217
incubate with the drugs at 28ºC for 72 hours, at which time 50 µL of CPRG
218
solution was added to give a final concentration of 200 µM. The plates were
219
incubated at 37ºC for an additional 6 h and are then read at 595 nm. Each
220
concentration was tested in triplicate. Each experiment was performed twice
221
separately. The efficacy of each compound is estimated by calculating the IC50
222
and AE% (antiepimastigotes percentage).
SC
M AN U
TE D
Cytotoxicity Assay [56].
EP
223
RI PT
207
Cell culture: The cell lines that are used are NCTC clone 929 and murine J774
225
macrophages. NCTC clone 929 cells are grown in Minimal Essential Medium
226 227
AC C
224
(Sigma) and J774 macrophages are grown in RPMI 1640 medium (Sigma). Both media are supplement with 10% heat-inactivated FBS (30 minutes at 56ºC),
228
penicillin G (100 U/mL) and streptomycin (100 µg/mL). For the experiments,
229
cells in the pre-confluence phase are harvested with trypsin. Cell cultures are
230
maintained at 37ºC in a humidified 5% CO2 atmosphere.
ACCEPTED MANUSCRIPT Reagents: Resazurin Sodium Salt was obtained from Sigma-Aldrich, St. Louis,
232
USA and stored at 4 º C protected from light. A solution of Resazurin was
233
prepared in 1% phosphate buffer solution (PBS) at pH7 and filter-sterilized before
234
use.
235
Macrophages cell cytotoxicity assay: J774 macrophages were seeded
236
(5x104cells/well) in 96-well flat-bottom microplates with 100 µL of RPMI 1640
237
medium. The cells were allowed to attach for 24 h at 37ºC, 5% CO2 and the
238
medium is replaced by different concentrations of the drugs in 200 µL of medium,
239
and exposed for another 48 h. Growth controls are also included. Afterwards, a
240
volume 20 µL of 2 mM resazurin solution was added and plates were return to
241
incubator for another 3 h to evaluate cell viability. The reduction of resazurin was
242
determined by dual wavelength absorbance measurement at 490 nm and 595 nm.
243
Background was subtracted. Each concentration was assayed in triplicate.
244
Medium and drug controls were used in every test as blanks. The cytotoxicity
245
percentages (%C) and IC50 were then calculated.
SC
M AN U
TE D
246
RI PT
231
Results and Discussion
248
LDA-QSAR Models: Developing and validation.
249
The LDA has become an important tool for the prediction of chemical properties.
250
Because of the simplicity of this method many useful discriminant models have been
251
developed and presented by different authors in the literature [18, 44, 57-59]. It was the
252
technique used for the generation of a discriminant function in the present work. The
253
principle of maximal parsimony (Occam‟s razor) was taken into account as the strategy
254
for model selection [60]. The general data set was randomly divided into two subsets,
255
training and test set (which have 346 and 94 compounds, respectively), both of them
AC C
EP
247
ACCEPTED MANUSCRIPT 256
containing active and inactive compounds. The best models obtained using bond-based
257
non-stochastic and stochastic quadratic indices as molecular descriptors, together with
258
their statistical parameters are given below, respectively:
259
Class= -5.05 – 9.40x10-5 MVb2H( w , u ) +2.17x10-12 MVb15LH( w E, u E) +8.76x10-1 PKb0H( w , u ) –7.95x10-3PKb4H( w , u ) – 2.24x10-3PKb4LH( w E, u E)
261
– 6.16x10-2VKb0H( w , u ) +2.86x10-3VKb3H( w , u ) N = 346
= 0.459
263
D2 = 5.17
F = 56.88
QTotal = 87.86 % p < 0.0001
264
MCC = 0.73
Class= -3.27 –3.08x10-2 MKsb6L( w E, u E) +3.11x10-3 MVsb6L( w E, u E)
M AN U
265
(1)
SC
262
RI PT
260
266
+2.57x10-1 PKsb7H( w , u ) +4.52x10-1 PKsb14LH( w E, u E) +5.39x10-3 VKs b4H( w , u )
267
–9.30x10-2 VPsb15LH( w E, u E) +1.79x10-1 VPsb6L( w E, u E)
268
–1.77x10-1 VPsb9L( w E, u E) = 0.509
269
N = 346
270
D2 = 4.224
271
where, Class refers to antitrypanosomal activity, N is the number of compounds, is
272
the Wilks‟ statistic, QTotal is the accuracy of the model for the training set, MCC is the
273
Matthews‟ correlation coefficient, D2 is the square Mahalanobis distance, F is the Fisher
274
ratio and p-value is its significance level.
275
Both equations appeared statistically significant at p