Chemosphere 112 (2014) 120–127

Contents lists available at ScienceDirect

Chemosphere journal homepage: www.elsevier.com/locate/chemosphere

Quantitative structure–activity relationship for toxicity of ionic liquids to Daphnia magna: Aromaticity vs. lipophilicity Kunal Roy a,b,⇑, Rudra Narayan Das b, Paul L.A. Popelier a,⇑ a b

Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, Great Britain, United Kingdom Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India

h i g h l i g h t s

g r a p h i c a l a b s t r a c t

 Water solubility of ionic liquids (ILs)

raises concerns on their pollutant potential.  Daphnia magna is a widely accepted model organism for toxicity testing.  Predictive QSAR models have been developed here for toxicity of ILs to Daphnia magna.  The models clearly indicate that aromatic ILs are more toxic than aliphatic ones.  More lipophilic ILs with less toxicity may be designed by avoiding aromaticity.

a r t i c l e

i n f o

Article history: Received 4 December 2013 Received in revised form 12 March 2014 Accepted 4 April 2014

Handling Editor: J. de Boer Keywords: QSAR Ionic liquid ETA QTMS Lipophilicity Aromaticity

a b s t r a c t Water solubility of ionic liquids (ILs) allows their dispersion into aquatic systems and raises concerns on their pollutant potential. Again, lipophilicity can contribute to the toxicity of ILs due to increased ability of the compounds to cross lipoidal bio-membranes. In the present work, we have performed statistical model development for toxicity of a set of ionic liquids to Daphnia magna, a widely accepted model organism for toxicity testing, using computed lipophilicity, atom-type fragment, quantum topological molecular similarity (QTMS) and extended topochemical atom (ETA) descriptors. The models have been developed and validated in accordance with the Organization for Economic Co-operation and Development (OECD) guidelines for quantitative structure–activity relationships (QSARs). The best partial least squares (PLS) model outperforms the previously reported multiple linear regression (MLR) model in statistical quality and predictive ability (R2 = 0.955, Q2 = 0.917, R2pred ¼ 0:848). In this work, the ETA descriptors show importance of branching and aromaticity while the QTMS descriptor ellipticity efficiently shows which compounds are influential in the data set, with reference to the model. While obvious importance of lipophilicity is evident from the models, the best model clearly shows the importance of aromaticity suggesting that more lipophilic ILs with less toxicity may be designed by avoiding aromaticity, nitrogen atoms and increasing branching in the cationic structure. The developed quantitative models are in consonance with the recent hypothesis of importance of aromaticity for toxicity of ILs. Ó 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding authors. Address: Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India. Tel.: +91 9831594140. E-mail addresses: [email protected] (K. Roy), [email protected] (P.L.A. Popelier). http://dx.doi.org/10.1016/j.chemosphere.2014.04.002 0045-6535/Ó 2014 Elsevier Ltd. All rights reserved.

K. Roy et al. / Chemosphere 112 (2014) 120–127

1. Introduction Ionic liquids (ILs) are a relatively new class of chemicals composed of ions and having melting points below 100 °C (Das and Roy, 2013). Their beneficial and tuneable physicochemical properties such as their excellent thermal and electrochemical stability, low vapour pressure, wide range of fluidity and favourable solvation properties make them interesting for application in different fields including synthetic chemistry, electrochemistry, analytical chemistry, separation and extraction, and other engineering and biological applications (Ventura et al., 2013). Some of their specific applications are in gas compression, sensors, lithium-ion batteries, dye-sensitised solar cells or as potential pharmaceutical ingredients (Stolte et al., 2012). Their negligible vapour pressure, less hazardous synthesis, non-flammability and finally, potentially lower (eco)toxicity have made them popular as ‘‘green solvents’’ (Das and Roy, 2013). They are also considered as ‘‘designer solvents’’ due to the possibility of freely manipulating their characteristics by combining different cations and anions to meet the requirements for a specific application (Ventura et al., 2013). In principle, there are well over one million ILs that can be synthesized, although only about one thousand have been reported to date (Hossain et al., 2011). Because ILs are water-soluble, aquatic ecosystems will be greatly affected on their release to the environment (Luo et al., 2008). To date, there have been relatively few reports on the biological toxicity of ILs. The REACH (Registration, Evaluation, Authorization and Restriction of Chemicals) regulation in the European Union requires the registration of new commercial chemicals and holds the suppliers responsible for their products (Ventura et al., 2013). Thus, predictive toxicology tools should be applied for risk assessment of ILs on various ecologically important endpoints. An alternative to experimental assessment is to predict the toxicity of ILs using structure–activity correlations. A quantitative structure–activity relationship (QSAR) is a mathematical relationship between the response being modelled with descriptors containing chemical information in numerical form (Roy and Mitra, 2011). Such models are generated with application of different chemometric tools and rigorous validation obeying the five principles of the Organization for Economic Co-operation and Development (OECD) (Benfenati, 2007). The QSAR tools have been extensively used for modelling various toxicity endpoints and risk assessment of chemicals and this approach has been employed and encouraged by different regulatory authorities including the United States Environmental Protection Agency (Kar and Roy, 2010). Recently, QSAR has also been used to model the toxicity of ILs in a limited number of published reports (for example, Alvarez-Guerra and Irabien, 2011; Das and Roy, 2012; Viboud et al., 2012). Daphnia magna is a widely accepted model organism for toxicity testing of chemicals because of its rapid reproduction rate and sensitivity to changing conditions (Ventura et al., 2013). It is also specified in the OECD Guidelines for testing of chemicals. D. magna is much more sensitive to ILs than bacteria (Photobacterium phosphoreum) (Luo et al., 2008). However, there are only limited studies reported on QSAR of toxicity of ionic liquids on D. magna. Hossain et al. (2011) reported a QSAR model based on group contribution method using a large number of descriptors; however, they have not used any validation tool for judging reliability of their developed model. Recently, we have reported regression and classification-based models (Roy and Das, 2013) for prediction of toxicity of ionic liquids using extended topochemical atom (ETA) indices (Roy and Ghosh, 2004; Roy and Das, 2012) along with computed lipophilicity and various topological descriptors. In our present attempt, we have additionally used bond critical point (BCP) descriptors with quantum-chemical origin in order to improve further the quality of the models. Additionally, we have

121

tried to develop quantitative models showing contributions of lipophilicity and aromaticity to the toxicity of ionic liquids to give a clearer insight into the structure-toxicity relationships in order to design ‘‘greener’’ solvents (Ventura et al., 2013). 2. Materials and methods 2.1. The data set In the present study, we have used toxicity data (pLC50, concentration in mM, 48 h data) of ionic liquids to D. magna as compiled in our previous work (Roy and Das, 2013). As we have here used regression-based modelling, we could use 49 data points with quantitative toxicity data. Table S1 in Supplementary Materials shows the list of compounds along with the quantitative toxicity data. We have retained the same serial number of the compounds as reported in the earlier report (Roy and Das, 2013) for easy comparison. 2.2. Descriptors Although we computed some descriptors for anions as well, these were not selected in the final models; hence, we have limited our discussion to cation descriptor only given their importance in the response (Roy and Das, 2013). AlogP has been used as the hydrophobicity descriptor considering the significance of lipophilicity in explaining toxicity. From the knowledge of a parabolic dependence of the response on AlogP from our previous report (Roy and Das, 2013), we have also used (AlogP)2 as a descriptor. Additionally, we have used ETA descriptors, selected constitutional and topological descriptors as shown in Tables S2 in Supplementary Materials section. The values of these descriptors were taken from the previous work (Roy and Das, 2013). We have further used quantum topological molecular similarity (QTMS) indices computed at the HF/6-31G(d) level of theory. The details of the QTMS descriptors can be found in previous publications (Popelier, 1999; O’Brien and Popelier, 2001). This method introduces quantities defined by Quantum Chemical Topology (QCT) as descriptors (O’Brien and Popelier, 2002) and has been applied before in a toxicological context (Popelier et al., 2002; Loader et al., 2006). In summary, the QTMS descriptors focus on bond critical points, which occur when the gradient of the electron density (Dq) vanishes at some point in space, located between two bonded nuclei. The electron density at a BCP can be related to bond order via an exponential relationship. At a BCP, the Hessian of q has two negative eigenvalues (k1 < k2 < 0) and one positive one (k3 > 0). Eigenvalues express local curvature of electron density in a point: negative eigenvalues are curvatures perpendicular to the bond, while the positive eigenvalue measures the curvature along the bond. A measure of p character for a homopolar bond is the so-called ellipticity (Ell), which is defined as (k1/k2)1. Additionally, the equilibrium bond length has also been used as one of the descriptors along with other QTMS descriptors. First, an estimated geometry was obtained using the program GaussView (GaussView 4.1, 2003), which was then passed on to the ab initio program GAUSSIAN03 (Frisch et al., 2004). The BCP descriptors were calculated at the HF/6-31G(d) level using a local version of the program MORPHY98 (Popelier, 1996), which locates the BCPs using an automatic and robust algorithm. The BCP descriptors of a particular bond type (quaternary atom singly bonded to a sp3 carbon, i.e., X+-C(sp3) where, X+ is N+ or P+) were considered as variables for the statistical model development. In case of more than one such bond in a cation, we have used as the descriptor’s value the average over all such bonds. Note that for the first time, QTMS descriptors have been computed here for a particular bond type, that is, allowing for a variation of one element

122

K. Roy et al. / Chemosphere 112 (2014) 120–127

involved in the bond (N or P). In previous QTMS studies the elements A and B forming the bonds A–B were always kept the same, throughout the congeneric series of data compounds.

count (topological) and atom-type fragment descriptors using the F criteria to enter 4 and to remove 3.9 (Snedecor and Cochran, 1967), the following equation was obtained,

2.3. Model development and validation

pLC50 ¼ 3:362 þ 0:990  AlogP  0:051  ðAlogPÞ  0:367

2

 C001  25:300  PW4  7:100  Ell þ 0:470  nN Though initially multiple linear regression (MLR) equations were built, finally partial least squares (PLS) regression equations were reported as PLS is a more robust method than MLR (Wold et al., 2001). Moreover, PLS does not suffer from the problem of inter-correlation among descriptors. The PLS algorithm uses scaling of the original descriptors followed by the computation of latent variables (LVs), which are linear combinations of the original descriptors (Wold et al., 2001). Though LVs are actually used during regression against the response variable, PLS can back-transform the PLS components into the original variables making the interpretation of the model easy. For MLR, stepwise regression (with the Fcriterion of entering being 4 or higher, and that of removing 3.9 or lower, F being the significance of the incoming descriptor) or all possible subset regression (with maximum cut-off inter-correlation of 0.7 and minimum R2 of 0.9) method was adopted. The descriptors selected in MLR were subjected to PLS regression with optimum number of latent variables selected through leave-one-out crossvalidation. For model development and subsequent validation, we have used the same splitting of the data set into training and test sets as reported in our earlier work (Roy and Das, 2013). The models are internally validated based on leave-one-out (LOO) cross-validation and externally using the test set prediction. We have reported the equation quality using metrics such as determination coefficient R2, explained variance R2a , standard error of estimate s and variance ratio F. The reported validation metrics include leave-one-out Q2 for the training set and R2pred for test set prediction. We have additionally reported r2m metrics as these have been found to be more stringent than the classical validation metrics (Roy et al., 2012). Note that we have used only the scaled version of r2m metrics (Roy et al., 2013). Additionally, model randomization was performed using an Y-scrambling test, and c R2p values were reported (Mitra et al., 2010). The details of the statistical and validation metrics can be found elsewhere (Roy and Mitra, 2011). The applicability domain of the final model was checked using a leverage approach (Williams plot) and a diversity validation plot using the Euclidean distance approach (Nandy et al., 2013).

R2 ¼ 0:945;

nTraining ¼ 35; s ¼ 0:364;

2

Q ¼ 0:903;

R2a ¼ 0:934; PRESS ¼ 6:6;

F ¼ 80:5ðdf 6; 28Þ; nTest ¼ 14;

R2pred ¼ 0:815

ð1Þ

Note that we have used here QTMS descriptors instead of ETA descriptors as reported in our previous work (Roy and Das, 2013). Here we have also reduced the size of the descriptor pool based on previously obtained knowledge on the importance of descriptors for the response. The definitions of important descriptors are shown in Table 1. As can be observed from Eq. (1), the response shows a parabolic relationship with respect to AlogP, which is similar to the observation made in our earlier work (Roy and Das, 2013). The equation has selected, apart from the AlogP terms, atom-type fragment term C001 (i.e. the number of alkyl substituents), path length descriptor PW4, constitutional descriptor denoted nN (i.e. the number of nitrogen atoms) and the QTMS descriptor ellipticity (Ell). The LOO prediction quality of Eq. (1) is enhanced (Q2 = 0.903) in comparison to that (Q2 = 0.875) observed in the previously reported equation (Roy and Das, 2013). The external prediction quality of the present equation is almost the same as that of the earlier reported one. The extremum of the parabolic response curve with respect to AlogP (keeping other descriptors constant) was determined from Eq. (1), and it was found to be 9.706. We have used this value as a knot of the spline term . A spline term will be assumed zero, if the value of a–f(x) is negative (Rogers, 1996). A spline term partitions the data sample into two classes depending on the value of some feature, which is here AlogP. Such a term is useful for nonlinear modelling as the location of discontinuity is presented to the user in an understandable manner. We tried and its square term as possible descriptors and got the following equation,

pLC50 ¼ 7:657 þ 0:252 < 9:706  AlogP > 0:069 2

2.4. Software tools GAUSSVIEW (GaussView 4.1, 2003) was used for drawing of the molecules. GAUSSIAN 03 (Frisch et al., 2004) was used for optimization of structures while MORPHY (Popelier, 1996) was used for computation of QTMS descriptors. PaDEL-Descriptor (Yap, 2011) and Dragon (Dragon version 6, 2010) were used for computation of ETA and various non-ETA (AlogP, atom-type fragments, selected topological and constitutional) descriptors respectively. STATISTICA (STATISTICA version 7.1.515.0, 2006) and MINITAB (MINITAB version 14.13, 2004) were used for the construction of MLR and PLS models, respectively. The randomization test of the final PLS models was carried out using Cerius2 (Cerius2 version 4.10, 2005) software. The all possible subset regression was performed using a program developed in one of our laboratories and available at http://dtclab.webs.com/software-tools. 3. Results and discussion 3.1. Logical development of models On an initial attempt of developing stepwise MLR model using AlogP, (AlogP)2, QTMS descriptors, constitutional, path length

 ð< 9:706  AlogP >Þ  0:391  C001  26:953  PW4  7:628  Ell þ 0:481  nN R2 ¼ 0:940;

nTraining ¼ 35; s ¼ 0:380;

2

Q ¼ 0:892;

R2a ¼ 0:927; PRESS ¼ 7:3;

F ¼ 73:5ðdf 6; 28Þ; nTest ¼ 14;

R2pred ¼ 0:771

ð2Þ

In comparison to Eq. (1), prediction quality (both internal and external) deteriorates somewhat in Eq. (2). However, when the first order term is removed, the value of external validation metric R2pred rises to 0.844 in Eq. (3): 2

pLC50 ¼ 7:802  0:049  ð< 9:706  AlogP >Þ  0:396  C001  23:126  PW4  7:460  Ell þ 0:460  nN R2 ¼ 0:928;

nTraining ¼ 35; s ¼ 0:410;

2

Q ¼ 0:883;

R2pred ¼ 0:844

R2a ¼ 0:915; PRESS ¼ 7:9;

F ¼ 74:5ðdf 5; 29Þ; nTest ¼ 14; ð3Þ

Using the same descriptor combination as in Eq. (3), we performed PLS regression to obtain Eq. (4) with 4 LVs showing comparable statistics to Eq. (3).

123

K. Roy et al. / Chemosphere 112 (2014) 120–127 Table 1 Definitions of some important descriptors for QSAR of toxicity of ionic liquids to Daphnia magna. Sl. No.

Descriptor

Type

Definition

1

C001

Atom-centred fragment Topological index

‘C’ atom in CH4 and CH3R where ‘R’ is any group PW4 ¼ ðp=wÞ4 ¼ A1  i¼1 ðp=wÞ4i ¼ A1  i¼1 P i =awci 4Pi is the number of paths of length four starting from ith vertex to any other vertex in the molecular graph. awc4i is the total number of equipoise walk of length four and can be defined as: P awc4i ¼ Aj¼1 a4ij

Measure of shape

Constitutional index ETA index

Number of ‘N’ atom

Importance of ‘N’ atoms

2

PW4

3

nN

4

5

DeB

[Ra]p/ [Ra]

ETA index

Meaning

PA 4

PA

DeB ¼ e1  e4 ¼

P

e

N



½

P

eSS

NSS

Chiefly indicates the presence of terminal methyl groups  4

Gives a measure of contribution of unsaturation

, where e is an electronegativity

parameter and N indicates the total number of atoms (including hydrogen). Here, SS denotes saturated carbon skeleton i.e., saturated form of carbon–carbon multiple bond in a chemical structure and eSS and NSS correspondingly refers to the e and N value of such structure In a connected molecular graph, (Ra)p refers to the summation of a values of the vertices that are joined to one other nonhydrogen vertex or atom via covalent bond. Hence, it corresponds to the occurrence of terminal atom or group (nonhydrogen). Here, a is a core count which may be defined as:

Gives measure of the relative branching (with respect to molecular size) and molecular shape

v

1 a ¼ ZZ  PN1 , where Z and Zv are the respective atomic number Zv

6

P [ a] / P X

ETA index

a

and valence electron number of an atom, and PN stands for the period number (Ra)X denotes the summed a values of vertices in a molecular graph that are joined to four other non-hydrogen vertices

Presence of quarternary type of branching is mainly denoted by R1

this descriptor i.e., of type

R4

R

R2.

Provides information

R3

7

g0F

ETA index

½g0F i ¼

½gF i Nv

regarding the relative molecular shape and branching Functionality parameter, denotes contribution of heteroatoms (atoms other than carbon or hydrogen) and multiple bonds present in a molecule

½gR i ½gi Nv

gF, value for the ith atom of a molecule is given by the difference between g value of the original compound and its reference skeleton moiety. The index g0F is defined as the ratio of gF and the number of vertices (Nv) present ¼

in order to impart a relative nature to the index with respect to each atom. The composite topochemical parameter g may be  0:5 P cc , where rij is the topological distance defined as: g ¼ iÞ  0:335  C001  24:000  PW4  9:304  Ell þ 0:576  nN R2 ¼ 0:925;

nTraining ¼ 35; s ¼ 0:169;

2

Q ¼ 0:882;

R2a ¼ 0:915; PRESS ¼ 8:0;

F ¼ 92:4ðdf 4; 30Þ; nTest ¼ 14;

R2pred ¼ 0:833

ð4Þ

Eq. (4) is preferred over Eq. (3) as the previous one is obtained from more robust (PLS) regression (Wold et al., 2001) using a small number of regressing variables (LVs). Interestingly, internal and external prediction qualities of Eq. (4) increase considerably on using an additional ETA parameter P P [ a]X/[ a] with the same number (four) LVs as shown in Eq. (5), 2

A measure of p character for homopolar bonds

We have also tried all possible subset regression using up to 6 predictor variables from the pool of descriptors including QTMS, ETA, atom-type fragments and AlogP. While doing this, only those descriptors showing inter-correlation lower than 0.7 were considered. From the best models, we have further tried PLS regression and the best equation showing very good internal and external prediction qualities is the following: 2

pLC50 ¼ 6:134  0:056  ð< 9:706  AlogP >Þ  12:349  Ell  0:798  C001  37:955  DeB þ 4:989  ð½aP =½aÞ2 þ 1:508  g0F R2 ¼ 0:941;

nTraining ¼ 35; s ¼ 0:133;

pLC50 ¼ 7:855  0:049  ð< 9:706  AlogP >Þ  0:374  C001

Importance of aromatic nucleus

2

Q ¼ 0:908;

R2a ¼ 0:933; PRESS ¼ 6:2;

F ¼ 119:93ðdf 4; 30Þ; nTest ¼ 14;

 25:613  PW4  12:952  Ell þ 0:655  nN þ 2:821

R2pred

 ½aX =½a

The descriptors g and [a]P/[a] belong to the class of ETA descriptors and signify functionality and branching respectively in the chemical structure. Eq. (6) shows significant improvement in quality and prediction ability in comparison to the previously reported equation in Roy and Das (2013). Only four latent variables could explain 93.3% of the variance of the response, and predict

R2 ¼ 0:933;

nTraining ¼ 35; s ¼ 0:150;

2

Q ¼ 0:902;

R2pred ¼ 0:857

¼ 0:868

ð6Þ 0 F

R2a ¼ 0:924; PRESS ¼ 6:7;

F ¼ 105:05ðdf 4; 30Þ; nTest ¼ 14; ð5Þ

124

K. Roy et al. / Chemosphere 112 (2014) 120–127

Table 2 Comparison of quality of the final models. The best model is shown in bold face. Eq. No.

Type Predictors of model

4

PLS

5

PLS

6 10

PLS PLS

7 9 8 Previous paper

LVs R2

AlogP + Atype fragment + QTMS + Dragon ()2, C001, Ell, nN, PW4 AlogP + Atype fragment + QTMS + Dragon + ETA P P ()2, C001, Ell, nN, PW4, [ a]X/ a AlogP + Atype fragment + QTMS + ETA

2 2 2 2 2 r 2mðLOOÞ Drm( LOO) Rpred r 2mðtestÞ Drm(test) r 2mðoverallÞ Drm(overall) c Rp

Q2

R2a

4

0.925 0.915 0.882 0.842 0.001

0.831 0.806 0.104

0.828

0.028

0.922

4

0.933 0.924 0.902 0.865 0.051

0.857 0.826 0.044

0.851

0.027

0.930

()2, C001, Ell, DeB, ([Ra]p/[Ra])2, g0F ()2, C001, Ell, ([Ra]p/[Ra]), ([Ra]p/[Ra])2

4 3

0.941 0.933 0.908 0.874 0.063 0.924 0.917 0.906 0.888 0.058

0.868 0.829 0.000 0.878 0.843 0.015

0.858 0.878

0.047 0.058

0.939 0.922

PLS PLS

AlogP + Atype fragment + QTMS + ETA + Indicator ()2, C001, Ell, DeB, ([Ra]p/[Ra])2, Iaro ()2, C001, Ell, ([Ra]p/[Ra])2, Iaro

4 3

0.955 0.933 0.917 0.938 0.031 0.925 0.917 0.884 0.892 0.058

0.848 0.809 0.084 0.887 0.847 0.059

0.900 0.882

0.000 0.064

0.953 0.922

PLS

AlogP + Atype fragment + QTMS + Dragon + ETA + Indicator ()2, C001, Ell, PW4, Iaro 3

0.912 0.903 0.874 0.866 0.067

0.861 0.823 0.030

0.860

0.060

0.910

MLR

AlogP + Atype fragment + ETA AlogP, (AlogP)2, C001, H-052, [Ra]p/[Ra, [Ra]/Nv

0.948 0.936 0.875 0.832 0.014

0.817 0.802 0.103

0.816

0.016





90.8% of it, while the external predicted variance was found to be 86.8%. Eq. (6) involves the squared spline term of AlogP, one QTMS term (ellipticity or Ell), one atom-type fragment term (C001) and three ETA terms. Note that the regression coefficients presented in Eq. (6) are in the original (unstandardized) form, which means that the relative importance of the descriptors for the response cannot be concluded just from the coefficient values. Based on the standardized regression coefficient values, the descriptors can be ranked as follows: ()2> C001 > DeB > Ell > ([a]P/ [a])2 > g0F . The spline AlogP term indicates that the larger extent to which the value of AlogP is lower than 9.706, the lower is the toxicity. The toxicity also decreases with an increase in the value of the ellipticity, the significance of which is not straightforward. However, it contains electronic information about the single bond connecting the quaternary nitrogen or phosphorus atom (N or P) with adjacent sp3 carbon(s). The negative coefficient of C001 indicates that the toxicity decreases with alkyl substituents as in case of ammonium salts in comparison to imidazolium and pyridinium ILs. The ETA index g0F contains information about heteroatom variation and unsaturation in structure and actually accounts for aromaticity here (the value of g0F will be higher for aromatic compounds than the aliphatic ones). Note that by the term aromaticity we mean here the ratio between the aliphatic portion of the molecules and aromatically invariant ring. This is justified from the fact that when we use an indicator parameter IAro instead of g0F , we got the following equation with a comparable Q2 value, but a slightly reduced R2pred value,

2

pLC50 ¼ 8:709  0:049  ð< 9:706  AlogP >Þ  8:044  Ell  0:475  C001  26:316  PW4 þ 0:486  IAro R2 ¼ 0:912;

nTraining ¼ 35; s ¼ 0:192; R2pred

2

Q ¼ 0:874;

R2a ¼ 0:903; PRESS ¼ 8:5;

F ¼ 107:1ðdf 3; 31Þ; nTest ¼ 14;

¼ 0:861

ð8Þ

When the term PW4 is omitted, the following equation with excellent values of Q2 and R2pred is obtained. However, in this equation, the coefficient of Iaro changes to a negative value due to large positive regression coefficient value of ([a]P/[a])2: 2

pLC50 ¼ 6:344  0:060  ð< 9:706  AlogP >Þ  9:087  Ell  0:854  C001 þ 6:837  ð½aP =½aÞ2  0:187  IAro R2 ¼ 0:924;

nTraining ¼ 35; s ¼ 0:164;

2

Q ¼ 0:884;

R2a ¼ 0:917; PRESS ¼ 7:8;

F ¼ 126:8ðdf 3; 31Þ; nTest ¼ 14;

R2pred ¼ 0:887

ð9Þ

When Iaro is replaced with [a]P/[a], we obtain the following equation showing a parabolic relationship of the response with respect to [a]P/[a]: 2

pLC50 ¼ 6:123  0:060  ð< 9:706  AlogP >Þ  5:404  Ell  0:808  C001  0:674  ½aP =½a þ 8:344  ð½aP =½aÞ2

2

pLC50 ¼ 5:980  0:052  ð< 9:706  AlogP >Þ  12:514  Ell

s ¼ 0:165;

þ 0:969  IAro nTraining ¼ 35; s ¼ 0:100;

R2 ¼ 0:955;

Q 2 ¼ 0:917;

R2pred ¼ 0:848

R2pred R2a ¼ 0:933; PRESS ¼ 5:6;

R2 ¼ 0:924;

nTraining ¼ 35;

 0:816  C001  51:225  DeB þ 5:484  ð½aP =½aÞ2

F ¼ 160:21ðdf 4; 30Þ; nTest ¼ 14; ð7Þ

Eq. (7) indicates that the toxicity of aromatic ILs is higher than that of aliphatic ones. This confirms a previously published hypothesis (Ventura et al., 2013). The ETA parameter DeB indicates the contribution of unsaturation. When the term DeB is replaced by PW4 term and ([a]P/[a])2 is excluded, Q2 value reduces but R2pred value increases.

2

Q ¼ 0:906;

R2a ¼ 0:917; PRESS ¼ 6:3;

¼ 0:878

F ¼ 126:0ðdf 3; 31Þ; nTest ¼ 14; ð10Þ

Eq. (10), which is quadratic in the variable [a]P/[a], yields a minimum toxicity when the value of [a]P/[a] is 0.040 while other descriptors are kept constant. The ETA descriptor [a]P/[a] signifies branching in a molecule. Its value will be higher for more substituted or branched compounds. 3.2. Further validation of the final models We have applied additional validation strategies and computed r2m metrics (scaled version) for the training and test sets and the

K. Roy et al. / Chemosphere 112 (2014) 120–127

125

imidazolium ring substitutions) and 39 (1,1,1,1-tetra-n-butylammonium bromide; note that this is an ammonium salt with all four same substituents) are found to be influential observations as their leverage value is above the critical leverage value. Interestingly, these two compounds have the highest two ellipticity values and hence this QTMS descriptor has been useful in identifying the dissimilar electronic nature of the single bond between quaternary atom and adjacent sp3 carbon of these two compounds with respect to other compounds. A diversity validation plot using the Euclidean distance approach (Fig. 3) also shows that all test set chemicals are within the applicability domain of the model and predictions for these compounds are reliable. 3.3. Discussion on the final models

Fig. 1. Scatter plot of observed vs. calculated (Eq. (7)) toxicity of ionic liquids to D. magna.

overall set. Table 2 shows a comparison of statistical quality of the final (PLS) models. Based on both internal and external validation 2 criteria (r m ðoverallÞ ), Eq. (7) was selected as the best model (shown in bold face in Table 2). Obviously, there is a significant improvement in predictive quality over the model published earlier (Roy and Das, 2013). The present model also substantiates that aromatic ILs are more toxic, as recently suggested (Ventura et al., 2013). The scatter plot of observed vs. calculated response values for Eq. (7) is shown in Fig. 1 suggesting good quality of fit. We have also performed a randomization test of the models and the c R2p values of the PLS models are reported in Table 2. The applicability domain of Model 7 has been checked with the leverage approach and Fig. 2 (Williams plot) shows that all the test set chemicals are within the critical leverage value region and hence within the applicability domain of the model. However, compounds 21 (1-methyl3-[(trimethylsilyl)methyl]-imidazolium bromide; note that this compound contains three low electronegative silicon atoms in the

From the final PLS models [Eqs. (4)–(10)], it is apparent that the toxicity of ILs will decrease when the value of AlogP is low, because the lower is the AlogP value, the higher the value of term will be, which has a negative coefficient in all relevant equations. The negative coefficients of atom-type fragment descriptor C001 indicate that with an increase in the number of alkyl substituents (as in case of ammonium or nonaromatic ILs), the toxicity decreases. This is also evident from the negative coefficient of path length descriptor PW4, because in case of long alkyl-substituted ILs (thus reducing aromatic character), the PW4 value will be high and corresponding toxicity will be low (however, the toxicity increases with an increase in chain length and corresponding rise in lipophilicity for a particular class of ILs). The positive coefficient of nN indicates that the toxicity of nitrogen containing ILs is higher than non-nitrogen containing ILs. The value of [a]P/[a] increases with number of substitutions/increase in branching and decreases with long alkyl chain substituents (thus decreasing the aromatic character of a cation). The value of [a]X/[a] increases when branching increases in comparison to molecular size. The toxicity increases with branching as evidenced from positive coefficients of [a]X/[a] and ([a]P/[a])2 and the toxicity should be minimum when [a]P/ [a] is 0.040. The descriptors g0F and DeB actually account for aromaticity in this work. This is more evident from the use of Iaro descriptor showing a positive coefficient indicating that aromatic ILs are more toxic, an observation in consonance with a recent hypothesis (Ventura et al., 2013).

Fig. 2. Test of applicability domain (Eq. (7)) by the Williams plot.

126

K. Roy et al. / Chemosphere 112 (2014) 120–127

Fig. 3. Test of applicability domain (Eq. (7)) by the Euclidean distance approach.

4. Conclusion

References

Validated PLS models have been generated for toxicity of ionic liquids to D. magna using lipophilicity, atom fragment type, QTMS and ETA descriptors. All five OECD guidelines (a defined endpoint, an unambiguous algorithm, metrics denoting robustness and predictability, applicability domain and mechanistic interpretation) have been followed during development of the models. These models are better in predictive ability than the previously reported model (Roy and Das, 2013). The lipophilicity descriptor AlogP was found to be important as there is a direct relationship between toxicity and lipophilicity. The QTMS descriptor ellipticity is useful in identifying the compounds influential in the data set, with reference to the model. The ETA descriptors signify the importance of branching and aromaticity. The best model also supports that aromatic ILs are more toxic than aliphatic ones as recently suggested (Ventura et al., 2013). This is also evident from the use of an indicator parameter Iaro. Another factor positively influencing the toxicity is presence of nitrogen atoms. Solubility of ILs in water allows their dispersion into aquatic systems and raises concerns about their pollutant potential. Again, lipophilicity can contribute to the toxicity of ILs. Based on the present work, it may be hypothesized that more lipophilic, and at the same time less toxic, ILs may be designed by avoiding aromaticity, nitrogen atoms and increasing branching in cationic structure.

Alvarez-Guerra, M., Irabien, A., 2011. Design of ionic liquids: an ecotoxicity (Vibrio fischeri) discrimination approach. Green Chem. 13, 1507–1516. Benfenati, E., 2007. The quality criteria of the DEMETRA models for regulatory purposes. In: Benfenati, E. (Ed.), Quantitative Structure–Activity Relationships (QSAR) for Pesticide Regulatory Purposes. Elsevier, Amsterdam, pp. 201–282. Cerius2 version 4.10, 2005. Accelrys Inc. San Diego, CA, USA. (accessed 21.11.13). Das, R.N., Roy, K., 2012. Development of classification and regression models for Vibrio fischeri toxicityof ionic liquids: green solvents for the future. Toxicol. Res. 1, 186–195. Das, R.N., Roy, K., 2013. Advances in QSPR/QSTR models of ionic liquids for the design of greener solvents of the future. Mol. Divers. 17, 151–196. Dragon version 6, 2010. Talete srl, Italy. (accessed 21.11.13). Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., Montgomery, J.A. Jr., Vreven, T., Kudin, K.N., Burant, J.C., Millam, J.M., Iyengar, S.S., Tomasi, J., Barone, V., Mennucci, B., Cossi, M., Scalmani, G., Rega, N., Petersson, G.A., Nakatsuji, H., Hada, M., Ehara, M., Toyota, K., Fukuda, R., Hasegawa, J., Ishida, M., Nakajima, T., Honda, Y., Kitao, O., Nakai, H., Klene, M., Li, X., Knox, J.E., Hratchian, H.P., Cross, J.B., Bakken, V., Adamo, C., Jaramillo, J., Gomperts, R., Stratmann, R.E., Yazyev, O., Austin, A.J., Cammi, R., Pomelli, C., Ochterski, J.W., Ayala, P.Y., Morokuma, K., Voth, G.A., Salvador, P., Dannenberg, J.J., Zakrzewski, V.G., Dapprich, S., Daniels, A.D., Strain, M.C., Farkas, O., Malick, D.K., Rabuck, A.D., Raghavachari, K., Foresman, J.B., Ortiz, J.V., Cui, Q., Baboul, A.G., Clifford, S., Cioslowski, J., Stefanov, B.B., Liu, G., Liashenko, A., Piskorz, P., Komaromi, I., Martin, R.L., Fox, D.J., Keith, T., Al-Laham, M.A., Peng, C.Y., Nanayakkara, A., Challacombe, M., Gill, P.M.W., Johnson, B., Chen, W., Wong, M.W., Gonzalez, C., Pople, J.A., 2004. Gaussian 03, revision C.02. Gaussian Inc., Wallingford, CT. GaussView 4.1, 2003. Semichem Inc., Gaussian Inc., Pittsburgh, PA, USA. Hossain, M.I., Samir, B.B., El-Harbawi, M., Masri, A.N., Abdul Mutalib, M.I., Hefter, G., Yin, C.-Y., 2011. Development of a novel mathematical model using a group contribution method for prediction of ionic liquid toxicities. Chemosphere 85, 990–994. Kar, S., Roy, K., 2010. Predictive toxicology using QSAR: a perspective. J. Indian Chem. Soc. 87, 1455–1515. Loader, R.J., Singh, N., O’Malley, P.J., Popelier, P.L.A., 2006. The cytotoxicity of ortho alkyl substituted 4-X-phenols: a QSAR based on theoretical bond lengths and electron densities. Bioorg. Med. Chem. Lett. 16, 1249–1254. Luo, Y.-R., Li, X.-Y., Chen, X.-X., Zhang, B.-J., Sun, Z.-J., Wang, J.-J., 2008. The developmental toxicity of 1-methyl-3-octylimidazolium bromide on Daphnia magna. Environ. Toxicol. 23, 736–744. MINITAB version 14.13, 2004. Minitab Inc., USA, (accessed 21.11.13). Mitra, I., Saha, A., Roy, K., 2010. Exploring quantitative structure–activity relationship (QSAR) studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants. Mol. Simul. 36, 1067–1079.

Acknowledgement K.R. thanks the European Commission for a Marie Curie International Incoming Fellowship (ECZ IONTOX). RND thanks CSIR, New Delhi for a Senior Research Fellowship. Appendix A. Supplementary material Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.chemosphere. 2014.04.002.

K. Roy et al. / Chemosphere 112 (2014) 120–127 Nandy, A., Kar, S., Roy, K., 2013. Development of classification and regression based QSAR models and in silico screening of skin sensitization potential of diverse organic chemicals. Mol. Simul. 40, 261–274. O’Brien, S.E., Popelier, P.L.A., 2001. Quantum molecular similarity. 3. QTMS descriptors. J. Chem. Inf. Comput. Sci. 41, 764–775. O’Brien, S.E., Popelier, P.L.A., 2002. Quantum topological molecular similarity. Part 4. A QSAR study of cell growth inhibitory properties of substituted (E)-1phenylbut-1-en-3-ones. J. Chem. Soc. Perkin Trans. 2, 478–483. Popelier, P.L.A., 1996. MORPHY, a program for an automated ‘‘atoms in molecules’’ analysis. Comput. Phys. Commun. 93, 212–240. Popelier, P.L.A., 1999. Quantum molecular similarity. 1. BCP space. J. Phys. Chem. A 103, 2883–2890. Popelier, P.L.A., Chaudry, U.A., Smith, P.J., 2002. Quantum topological molecular similarity. Part 5. Further development with an application to the toxicity of polychlorinated dibenzo-p-dioxins (PCDDs). J. Chem. Soc. Perkin Trans. 2 (2002), 1231–1237. Rogers, D., 1996. Some theory and examples of genetic function approximation with comparison to evolutionary techniques. In: Devillers, J. (Ed.), Genetic Algorithm in Molecular Modeling. Academic Press, London, pp. 87–107. Roy, K., Chakraborty, P., Mitra, I., Ojha, P.K., Kar, S., Das, R.N., 2013. Some case studies on application of ‘‘rm2’’ metrics for judging quality of QSAR predictions: emphasis on scaling of response data. J. Comput. Chem. 34, 1071–1082. Roy, K., Das, R.N., 2012. QSTR with extended topochemical atom (ETA) indices. 15. Development of predictive models for toxicity of organic chemicals against fathead minnow using second generation ETA indices. SAR QSAR Environ. Res. 23, 125–140.

127

Roy, K., Das, R.N., 2013. QSTR with extended topochemical atom (ETA) indices. 16. Development of predictive classification and regression models for toxicity of ionic liquids towards Daphnia magna. J. Hazard. Mater. 166–178, 254–255. Roy, K., Ghosh, G., 2004. QSTR with extended topochemical atom indices. 2. Fish toxicity of substituted benzenes. J. Chem. Inf. Comput. Sci. 44, 559–567. Roy, K., Mitra, I., 2011. On various metrics used for validation of predictive QSAR models with applications in virtual screening and focused library design. Comb. Chem. High Throughput Screen. 14, 450–474. Roy, K., Mitra, I., Kar, S., Ojha, P.K., Das, R.N., Kabir, H., 2012. Comparative studies on some metrics for external validation of QSPR models. J. Chem. Inf. Model. 52, 396–408. Snedecor, G.W., Cochran, W.G., 1967. Statistical Methods. Oxford & IBH, New Delhi. STATISTICA version 7.1.515.0, 2006. STATSOFT Inc., USA, (accessed 21.11.13). Stolte, S., Steudte, S., Areitioaurtena, O., Pagano, F., Thöming, J., Stepnowski, P., Igartua, A., 2012. Ionic liquids as lubricants or lubrication additives: an ecotoxicity and biodegradability assessment. Chemosphere 89, 1135–1141. Ventura, S.P.M., Gonçalves, A.M.M., Sintra, T., Pereira, J.L., Gonçalves, F., Coutinho, J.A.P., 2013. Designing ionic liquids: the chemical structure role in the toxicity. Ecotoxicology 22, 1–12. Viboud, S., Papaiconomou, N., Cortesi, A., Chatel, G., Draye, M., Fontvieille, D., 2012. Correlating the structure and composition of ionic liquids with their toxicity on Vibrio fischeri: a systematic study. J. Hazard. Mater. 215–216, 40–48. Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130. Yap, C.W., 2011. PaDEL-Descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474.

Quantitative structure-activity relationship for toxicity of ionic liquids to Daphnia magna: aromaticity vs. lipophilicity.

Water solubility of ionic liquids (ILs) allows their dispersion into aquatic systems and raises concerns on their pollutant potential. Again, lipophil...
702KB Sizes 0 Downloads 3 Views