Reverse engineering of gene regulatory networks based on S-systems and Bat algorithm

J. Bioinform. Comput. Biol. Downloaded from www.worldscientific.com by MCGILL UNIVERSITY on 04/04/16. For personal use only.

Sudip Mandal*,§, Abhinandan Khan†,¶, Goutam Saha‡,|| and Rajat Kumar Pal†,** *Electronics and Communication Engineering Department, Global Institute of Management and Technology Krishnanagar, West Bengal 741102, India † Computer Science and Engineering Department, University of Calcutta, Acharya Prafulla Chandra Siksha Prangan, JD-2, Sector – III, Salt Lake Kolkata 700098, India ‡ Information Technology Department, North Eastern Hill University, Umshing, Mawkynroh, Shillong, Meghalaya 793022, India § [email protected] ¶ [email protected] ||[email protected] **[email protected]

Received 13 August 2015
Revised 9 December 2015
Accepted 10 December 2015
Published 1 March 2016

The correct inference of gene regulatory networks for the understanding of the intricacies of complex biological regulations remains an intriguing task for researchers. With the availability of high-dimensional microarray data, relationships among thousands of genes can be extracted simultaneously. Among the prevalent models for reverse engineering genetic networks, the S-system is considered to be an efficient mathematical tool. In this paper, the Bat algorithm, based on the echolocation of bats, has been used to optimize the S-system model parameters. A decoupled S-system has been implemented to reduce the complexity of the algorithm. Initially, the proposed method has been successfully tested on an artificial network with and without the presence of noise. Based on the fact that a real-life genetic network is sparsely connected, a novel Accumulative Cardinality based decoupled S-system has been proposed. The cardinality has been varied from zero up to a maximum value, and this model has been implemented for the reconstruction of the DNA SOS repair network of Escherichia coli. The obtained results have shown significant improvements in the detection of a greater number of true regulations, and in the minimization of false detections, compared to other existing methods.

Keywords: Bat algorithm; cardinality; DREAM4; GNW; gene regulatory network; microarray data; regularization; S-system.

1. Introduction

Genes are the functional keys of all cells. Every activity that occurs inside a cell, such as the growth of organs or the onset of disease, is a complex regulatory effect of different genes. A gene regulatory network (GRN) can be represented by a mathematical model that captures the regulatory control of genes using a directed graph. The genes are indicated by nodes, and the regulatory interactions between the genes (e.g. activation or inhibition) are denoted by edges in such a graph.1 The study of GRNs is a crucial but complex task in molecular biology, as it may hold the key to discovering the causes of a disease and its subsequent treatment. However, before the development of microarray technology, which measures the expression values of thousands of genes, the analysis of these genes was a complex and challenging problem for researchers. The current progress in DNA microarray technologies2 has made it possible to examine the dynamic behavior of, and the regulations among, different genes by analyzing their expression at various instants of time.

Several types of linear and nonlinear mathematical models3 have already been proposed to infer GRNs, which is essentially a reverse engineering problem. Boolean Networks4,5 examine binary state transition matrices to search for patterns in gene expression depending on a binary state function. A Dynamic Bayesian Network6,7 makes conditional probabilistic transitions8 between network states that merge the features of Hidden Markov Models to include feedback. The Recurrent Neural Network9,10 is also a very popular technique. It is a closed-loop neural network with a delay variable, suitable for modelling system dynamics from temporal data. The S-system11 is another such nonlinear mathematical model that has recently been implemented for modelling different nonlinear dynamics and complex chemical reactions involving genes.
However, an S-system may have different optimal solutions based on different sets of parameter values. Therefore, optimization of the S-system parameters is a crucial step during gene network reconstruction, such that the fitness function, i.e. the error, is minimized. Different optimization methods such as Differential Evolution,12 Multiobjective Optimization,13 Memetic Algorithm,14 Cuckoo Search,15 Clonal Algorithm,16 Immune Algorithm,17 Firefly Algorithm,18 etc. have already been employed for the reconstruction of GRNs. On the other hand, Wang et al.19 proposed a unified approach for inferring GRNs. Although it was successful in speeding up the optimization of the S-system parameters to a good extent, it performed poorly in terms of sensitivity and specificity. Murata et al.20 introduced the Product Unit Neural Network with less convergence time, but the authors did not test the algorithm on noisy and real-life data, and it also had less precision. Conventional methods for the inference of real-life genetic networks suffer from the overfitting problem. Hence, a delicate balance between the accuracy and the actual network structure needs to be achieved. Real-life GRNs are sparsely connected,21 i.e. few regulations exist among genes in a real-life GRN. Kimura et al.1 first gave the concept of a penalty term or regularizer, based on maximum in-degree or cardinality,
I and kinetic orders to reduce the overfitting problem. Hasan et al.22 further modified the penalty term by incorporating prior knowledge about both types of kinetic orders. Liu et al.23 introduced a separable estimation method and a genetic algorithm with regularization and a pruning threshold to infer small-scale GRNs. Recently, Chowdhury et al.24 proposed an adaptive regulatory genes cardinality, where the cardinality is varied within the algorithm to achieve the best network structure with the highest accuracy possible. However, its specificity is slightly poorer than that of the other existing techniques. Chowdhury et al.25 incorporated a time delay in the differential equations to represent system dynamics along with a delay term. The corresponding results for a real-life network showed improvements when compared to other state-of-the-art approaches. Palafox et al.26 implemented Dissipative Particle Swarm Optimization with another regularization term to find the optimal structure of small and medium-scale real-life networks quite successfully. Most of these proposed methods, however, are yet to accomplish an accurate inference of small-scale real-life GRNs. A few of them were able to find all the true regulations, but they also detected some false regulations. Moreover, the `No Free Lunch' (NFL) theorem27 states that there is no single metaheuristic that is best suited for solving all kinds of optimization problems. Therefore, finding the most suitable and efficient optimization techniques for the accurate inference of small GRNs is still an open problem for researchers. The Bat algorithm (BA), first proposed by Yang,28,29 is based on the echolocation property of bats. BA has been implemented successfully in other fields of engineering.30,31 However, to the best of the authors' knowledge, it is yet to be applied to the problem of parameter optimization of S-systems.
As BA can be successfully implemented for continuous parameter optimization and multi-objective optimization,32,33 it may be suitable for the parameter optimization of an S-system based model of a GRN. In this paper, BA has been implemented for the reconstruction of GRNs using a decoupled and regularized S-system.

The rest of the paper is organized as follows. The basics of the S-system and BA have been discussed in the next section. The details of the S-system fitness function, the decoupled S-system with a penalty or pruning term, and the learning process for finding the accurate structure of a GRN have been discussed in Sec. 3. Subsequently, the effectiveness of the proposed BA-based S-system optimization model has been tested against an artificial GRN (with and without the presence of noise). Next, a novel accumulative cardinality based decoupled S-system with a known regularizer technique has been proposed for the reconstruction of a real-life benchmark SOS network of E. coli. Results have also been compared with other existing techniques. The paper concludes with Sec. 5.

2. Theoretical Background

Before elaborating on the methodology of optimization of the S-system parameters for the reconstruction of a GRN using BA, let us review their basic concepts.

2.1. Preliminaries of S-system model

Biochemical systems like GRNs can be modelled by a set of ordinary differential equations (ODEs). However, nonlinear differential equation models, such as S-systems, can model more complicated genetic behaviors and dynamics successfully.1,34 If there are N genes of interest, and $X_i$ is the expression level of the ith gene, then the dynamics of a GRN19,20,35 may be modelled as:

$$\frac{dX_i}{dt} = f_i(X_1, X_2, \ldots, X_N). \tag{1}$$

The function $f_i$ is nonlinear and needs to be calculated from the time-series dataset.35 Reconstruction of a GRN based on the S-system model from time-series microarray data can be expressed by power-law functions19 that can be written as24–26:

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{N} X_j^{g_{i,j}} - \beta_i \prod_{j=1}^{N} X_j^{h_{i,j}} \quad (i = 1, 2, \ldots, N). \tag{2}$$

In the above equation, $X_i$ is the state variable, $\alpha_i$ and $\beta_i$ are the positive rate constants for the increasing and diminishing terms, respectively,19 and $g_{i,j}$ and $h_{i,j}$ are the kinetic orders of the system, also called exponential parameters. If $g_{i,j} > 0$, gene j promotes the expression of gene i, which is known as activation. On the other hand, if $g_{i,j} < 0$, gene j suppresses the expression of gene i, which is known as inhibition.19 The exponential parameter $h_{i,j}$ has the inverse effect on gene expression compared to $g_{i,j}$.

2.2. Preliminaries of BA

BA, initially proposed by Yang,28 is inspired by the echolocation behavior of bats.34 Bats emit very loud, high-frequency sound pulses continuously and listen for the echoes that reflect back from surrounding objects. Thus, a bat can compute the direction and distance of an object from the transmitted and received waves. Yang idealized some rules8,29,34 to model these behaviors in BA:

• All bats use echolocation to measure the distance or direction of objects, and they can also discriminate between food/prey and background obstacles.
• Bats fly randomly with a velocity $v_i$ at any position $x_i$, with a minimum frequency $f_{\min}$, a varying wavelength $\lambda$, and loudness $A_0$ of their sound, to search for prey. They can automatically adjust the wavelength (or frequency) of their emitted sound pulses and adjust the rate of pulse emission $r \in [0, 1]$, depending upon the proximity of their target.
• Loudness varies from a large (positive) $A_0$ to a constant minimum value $A_{\min}$.

2.2.1. Initialization of parameters or solutions

The initial population29,34 of bat positions is produced randomly as real-valued vectors of dimension d for a population of n bats, taking into account the lower and upper boundaries:

$$x_{ib,jb} = x_{\min,jb} + \mathrm{rand}(0, 1)\,(x_{\max,jb} - x_{\min,jb}). \tag{3}$$

In the above equation, $ib = 1, 2, \ldots, n$ and $jb = 1, 2, \ldots, d$; $x_{\min,jb}$ and $x_{\max,jb}$ are the lower and upper boundaries for dimension jb, respectively, and d is the total number of parameters to be evaluated or optimized.
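The initialization of Eq. (3) can be sketched in a few lines of Python. This is only an illustration, not the authors' MATLAB code: the function name `init_bats`, the ordering of the parameter vector, and the use of the bounds quoted later in Sec. 4.1.1 are assumptions made for the example.

```python
import random

def init_bats(n, lower, upper, rng=None):
    """Draw n positions uniformly inside per-dimension bounds, as in Eq. (3)."""
    rng = rng or random.Random()
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]

# Assumed layout for one gene of a 5-gene network (d = 12):
# [alpha_i, beta_i, g_i1..g_i5, h_i1..h_i5], rate constants in [0, 12]
# and kinetic orders in [-3, 3].
lower = [0.0, 0.0] + [-3.0] * 10
upper = [12.0, 12.0] + [3.0] * 10
bats = init_bats(n=4, lower=lower, upper=upper, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; any other source of uniform numbers in [0, 1) would serve equally well.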

2.2.2. Update process of frequency, velocity, and position

The frequency factor that controls the step size of a solution in BA29,34 is assigned a random value in $[Q_{\min}, Q_{\max}]$ for each bat (solution). The velocity of a solution is proportional to the frequency, and a new solution depends on its new velocity.

$$Q_{ib} = Q_{\min} + (Q_{\max} - Q_{\min})\,\beta, \tag{4}$$

$$v_{ib}^{nt} = v_{ib}^{nt-1} + \left(x_{ib}^{nt} - x^{\mathrm{best}}\right) Q_{ib}, \tag{5}$$

$$x_{ib}^{nt} = x_{ib}^{nt-1} + v_{ib}^{nt}. \tag{6}$$

In the above equations, nt denotes the iteration number, $x^{\mathrm{best}}$ denotes the current global best solution obtained so far, and $\beta \in [0, 1]$ is a random number that modifies the frequency. In the local search part of the algorithm (exploitation), a new solution is generated from a selected solution by

$$x_{\mathrm{new}} = x_{\mathrm{old}} + \varepsilon A^{nt}, \tag{7}$$

where $A^{nt}$ is the average loudness of all bats and $\varepsilon \in [0, 1]$ is a random number which controls the direction and intensity of the random walk.8,29,34

2.2.3. Loudness and pulse emission rate update process

As a bat gets closer to its prey, its loudness A usually decreases while its pulse emission rate r increases; both are updated according to the following29,34:

$$A_{ib}^{nt+1} = \alpha A_{ib}^{nt}, \tag{8}$$

$$r_{ib}^{nt+1} = r_{ib}^{0}\left[1 - e^{-\gamma\, nt}\right], \tag{9}$$

where $\alpha$ and $\gamma$ are the loudness reduction and pulse rate increment constants,29,34 and $r_{ib}^{0}$ and $A_{ib}^{0}$ are the initial pulse rate and loudness, which are random values in $[0, 1]$.

3. Methodology

The S-system model is characterized by the parameter set $\alpha_i$, $\beta_i$, $g_{i,j}$, and $h_{i,j}$. The inference of genetic networks using the S-system model is done by finding the optimum values of the S-system parameters for which the learning error is minimized.20

3.1. Estimation criteria and decoupled S-system

Normally, an objective function or fitness function is used to measure the quality of a solution of the S-system. Here, the objective function f for the S-system model evaluates the total squared relative error between the experimental and calculated gene expression values22:

$$f = \sum_{k=1}^{M} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( \frac{X_{\mathrm{cal},k,i,t} - X_{\mathrm{exp},k,i,t}}{X_{\mathrm{exp},k,i,t}} \right)^{2}. \tag{10}$$

In the above equation, N is the number of genes in the problem, T is the number of sampling instants of the observed gene expression data, and M is the number of training datasets20; $X_{\mathrm{cal},k,i,t}$ is the numerically calculated gene expression value of the ith gene at time t in the kth dataset, obtained using the set of trained parameters of the S-system model. Lastly, $X_{\mathrm{exp},k,i,t}$ is the actual gene expression level of the ith gene at time t in the kth dataset.

S-system modelling is a nonlinear function optimization problem. To discover the optimal S-system parameters, the fitness function, i.e. the squared error, has to be minimized so that the calculated gene expression dynamics fit the dynamics of the training gene expression data.20 For N genes, a total of 2N(N + 1) S-system parameters are to be determined to solve the set of differential equations given by (2). Therefore, the dimension of the search space, 2N(N + 1), becomes huge for large genetic networks. To overcome this problem, the genetic network inference problem can be divided or decoupled into several sub-problems, one for each gene. The change in the expression level of a particular gene at a given time instant depends only on the expression levels of all genes at the previous time instant. Moreover, the changes in expression level for different genes at that time instant are independent of each other. Therefore, a decoupling procedure can be introduced here without losing any vital information. Now, the objective function $f_i$ for this decoupled S-system is the training error for the ith gene only, i.e.20:

$$f_i = \sum_{k=1}^{M} \sum_{t=1}^{T} \left( \frac{X_{\mathrm{cal},k,i,t} - X_{\mathrm{exp},k,i,t}}{X_{\mathrm{exp},k,i,t}} \right)^{2}, \tag{11}$$

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{N} X_j^{g_{i,j}} - \beta_i \prod_{j=1}^{N} X_j^{h_{i,j}}. \tag{12}$$

Hence, only 2(N + 1) S-system parameters need to be optimized for the ith gene in the decoupled S-system. Thus, the decoupling method reduces a 2N(N + 1)-dimensional problem to a 2(N + 1)-dimensional problem for each gene.20 Instead of finding 2N(N + 1) parameters in a single program run, the decoupled method optimizes 2(N + 1) parameters in a single run and executes the program N times for the N different genes. Accumulating the
2(N + 1) parameters of all N genes, the overall structure of the S-system, and hence the final GRN, can be obtained. To train the S-system model, we have to predict the present expression value of a gene i from all available previous data. To do that we need to solve Eq. (12), which is nonlinear in nature. It becomes very complex and time-consuming if we directly attempt to solve the family of ODEs, even for a single gene. There are only a fixed number of time points available from the expression profiles which show saturation. Thus, we propose to solve Eq. (12) by linearizing it in time, i.e. replacing the derivative with a finite difference, as follows:

$$\frac{\Delta X_i}{\Delta t} = \alpha_i \prod_{j=1}^{N} X_j^{g_{i,j}} - \beta_i \prod_{j=1}^{N} X_j^{h_{i,j}}$$

or,

$$\frac{X_i^{t+1} - X_i^{t}}{\Delta t} = \alpha_i \prod_{j=1}^{N} X_j^{g_{i,j}} - \beta_i \prod_{j=1}^{N} X_j^{h_{i,j}}$$

or,

$$X_i^{t+1} = X_i^{t} + \left( \alpha_i \prod_{j=1}^{N} X_j^{g_{i,j}} - \beta_i \prod_{j=1}^{N} X_j^{h_{i,j}} \right) \Delta t. \tag{13}$$
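The one-step prediction of Eq. (13) amounts to a forward finite-difference update, and a minimal Python sketch is given here for illustration. The function names `s_system_rate` and `euler_step` are hypothetical, and the example values are the parameters of gene 2 of the artificial network in Table 1 together with an arbitrary, made-up state vector.

```python
import math

def s_system_rate(x, alpha, beta, g, h):
    """Right-hand side of Eq. (12) for one gene: production minus degradation."""
    production = alpha * math.prod(xj ** gj for xj, gj in zip(x, g))
    degradation = beta * math.prod(xj ** hj for xj, hj in zip(x, h))
    return production - degradation

def euler_step(x, i, alpha, beta, g, h, dt):
    """Predict X_i at the next time point from all genes at the current one (Eq. 13)."""
    return x[i] + s_system_rate(x, alpha, beta, g, h) * dt

# Gene 2 of Table 1 (index 1 here): alpha = beta = 10, g_{2,1} = 2, h_{2,2} = 2.
g2 = [2.0, 0.0, 0.0, 0.0, 0.0]
h2 = [0.0, 2.0, 0.0, 0.0, 0.0]
x_now = [0.5, 1.0, 1.0, 1.0, 1.0]   # illustrative expression levels, all positive
x2_next = euler_step(x_now, 1, 10.0, 10.0, g2, h2, dt=0.05)
# rate = 10 * 0.5**2 - 10 * 1.0**2 = -7.5, so x2_next = 1.0 - 7.5 * 0.05 = 0.625
```

The expression levels must stay positive, since negative kinetic orders and fractional powers are otherwise undefined.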

In this work, we have used Eq. (13) to predict the expression value of a gene i for the current time point from the information of all available genes at the previous time point. Three different types of datasets have been used, for which the sampling interval $\Delta t$ differs from case to case. For the artificial dataset, we have used $\Delta t = 0.05$ and a final time point $t_f = 1$. Similarly, for the experimental dataset of the E. coli SOS DNA repair network, we have used $\Delta t = 6$ and a final time point $t_f = 294$, as there are 50 time points in the microarray dataset. Finally, for the 20-gene network, we have used $\Delta t = 20$ and a final time point $t_f = 1000$.

3.2. Regularization as penalty term for real-life network

Real-life genetic networks are sparsely connected, i.e. very few connections exist between the genes. As microarray data is also very noisy, S-system based GRN modelling may lead to different optimal solutions. Each such solution may possess a satisfactory fitness value but with different network connectivity. To overcome this overfitting problem for real-life genetic networks, we need to achieve a delicate balance between the prediction error and the actual structure of a GRN. A regularization function20 can be introduced as a penalty term along with the error function to avoid this particular problem. To generate sparse solutions, the concept of in-degree or cardinality1,20 has been used. The cardinality of a gene is defined as the number of regulations allowed for that particular gene. It was assumed that out of the N kinetic parameters for each of g and h, only I nonzero values are allowed, thereby forcing the other 2(N − I) values to be zero. If any of these 2(N − I) elements achieved a nonzero value during
the optimization process, the solution has been penalized in the following way1,20:

$$f_i = \sum_{k=1}^{M} \sum_{t=1}^{T} \left( \frac{X_{\mathrm{cal},k,i,t} - X_{\mathrm{exp},k,i,t}}{X_{\mathrm{exp},k,i,t}} \right)^{2} + c \sum_{j=1}^{N-I} \left( |G_{i,j}| + |H_{i,j}| \right), \tag{14}$$

where $G_{i,j}$ and $H_{i,j}$ are the vectors that contain the absolute values of $g_{i,j}$ and $h_{i,j}$, respectively, sorted in ascending order, and c is the weight constant that sets the magnitude of the penalty introduced to balance between overfitting and the actual network structure.
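To make the penalty of Eq. (14) concrete, the following Python sketch computes it for a single gene. It is an illustration under stated assumptions: the function names, the flattening of the double sum over datasets and time points into one list, and all numeric values in the example are hypothetical.

```python
def squared_relative_error(x_cal, x_exp):
    """First term of Eq. (14), with the (k, t) double sum flattened into one list."""
    return sum(((c - e) / e) ** 2 for c, e in zip(x_cal, x_exp))

def penalty(g, h, max_in_degree, c):
    """Second term of Eq. (14): c times the N - I smallest |g| and |h| values."""
    n = len(g)
    G = sorted(abs(v) for v in g)   # ascending, so the smallest entries come first
    H = sorted(abs(v) for v in h)
    n_penalized = max(n - max_in_degree, 0)
    return c * sum(G[j] + H[j] for j in range(n_penalized))

def penalized_fitness(x_cal, x_exp, g, h, max_in_degree, c):
    return squared_relative_error(x_cal, x_exp) + penalty(g, h, max_in_degree, c)

# Made-up kinetic orders with one spurious small entry (0.05) and I = 2.
g = [0.0, 0.0, 1.0, 0.05, -1.0]
h = [2.0, 0.0, 0.0, 0.0, 0.0]
pen = penalty(g, h, max_in_degree=2, c=1.0)          # 0.05
fit = penalized_fitness([1.1, 2.0], [1.0, 2.0], g, h, max_in_degree=2, c=1.0)
```

Because the sorted vectors put the smallest magnitudes first, the I largest kinetic orders of each type escape the penalty, so a solution with at most I regulators per term is not penalized at all.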

3.3. Learning process

Training the S-system model parameters so as to best fit the predicted expression dynamics to the training data is, in essence, an optimization problem. To optimize the parameters of an S-system model, BA has been introduced to minimize the prediction error. In the BA-based GRN, $x_{ib,jb}$ indicates the position of a bat and corresponds to a solution (i.e. a set of $\alpha$, $\beta$, the g vector, and the h vector) in the search space, as it moves gradually towards the minimum error area by updating its velocity $v_{ib}^{nt}$ with each iteration following Eqs. (4)–(6). The objective function of BA is the squared error in the output of the S-system model. Now, the pseudo code27,28,33 of the proposed method can be given as:

Objective function f(X) = f_i (MSE for the decoupled S-system with the penalty term and the value of I)
For i = 1 : N
    Generate initial population of bats with positions x_ib, i.e. the set [α_ib, β_ib, g_ib,jb and h_ib,jb]
        (for ib = 1, 2, ..., n and jb = 1, 2, ..., d) and velocity v_ib (generally zero).
    Initialize Q_ib, r_ib and A_ib at x_ib;
    Find out the current best solution;
    While (nt < Maximum number of iterations)
        Generate new solutions by adjusting frequency, and updating velocities and locations/solutions;
        If (rand > r_ib)
            Select a solution among the best solutions;
            Generate a local solution around the selected best;
        End if;
        If (rand < A_ib and f(x_ib) < f(x_best))
            Accept new solutions;
            Increase r_ib, reduce A_ib;
        End if;
        Rank the bats and find current best x_best;
    End while;
End for;
Display results;
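The inner loop of the pseudo code can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name and arguments are hypothetical, candidates are clipped to the bounds, and the random-walk factor is drawn symmetrically from [-1, 1] and scaled by a fixed step (both assumptions, since the text gives ε in [0, 1] and does not say how the step size enters). A toy quadratic stands in for the objective $f_i$ of Eq. (14).

```python
import math
import random

def bat_algorithm(objective, lower, upper, n_bats=20, n_iter=200,
                  q_min=0.0, q_max=1.0, alpha=0.95, gamma=0.95,
                  step=0.001, seed=0):
    """Minimize `objective` over box bounds with a basic Bat algorithm."""
    rng = random.Random(seed)
    d = len(lower)

    def clip(x):
        # Keep every coordinate inside its [lower, upper] interval.
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    # Eq. (3): random initial positions; velocities start at zero.
    pos = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    loud = [rng.random() for _ in range(n_bats)]    # initial loudness in [0, 1]
    rate0 = [rng.random() for _ in range(n_bats)]   # initial pulse rate in [0, 1]
    rate = list(rate0)
    fit = [objective(p) for p in pos]
    b = min(range(n_bats), key=lambda k: fit[k])
    best, best_fit = list(pos[b]), fit[b]

    for nt in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            q = q_min + (q_max - q_min) * rng.random()             # Eq. (4)
            vel[i] = [v + (x - xb) * q
                      for v, x, xb in zip(vel[i], pos[i], best)]   # Eq. (5)
            cand = clip([x + v for x, v in zip(pos[i], vel[i])])   # Eq. (6)
            if rng.random() > rate[i]:
                # Eq. (7): local random walk around the current best (assumed form).
                cand = clip([xb + step * rng.uniform(-1.0, 1.0) * mean_loud
                             for xb in best])
            cand_fit = objective(cand)
            # Acceptance test as written in the pseudo code above.
            if rng.random() < loud[i] and cand_fit < best_fit:
                pos[i], fit[i] = cand, cand_fit
                best, best_fit = list(cand), cand_fit
                loud[i] *= alpha                                    # Eq. (8)
                rate[i] = rate0[i] * (1.0 - math.exp(-gamma * nt))  # Eq. (9)
    return best, best_fit

def sphere(x):
    """Toy objective standing in for the decoupled S-system error."""
    return sum(v * v for v in x)

best, best_fit = bat_algorithm(sphere, [-3.0] * 4, [3.0] * 4, n_bats=10, n_iter=50)
```

In the setting of this paper the function would be called once per gene, with the 2(N + 1) parameters of that gene as the search vector and the penalized error as the objective.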

4. Experimental Results

In this research work, the performance of the inference algorithm has been evaluated on both artificial and real-world genetic networks, and the experimental results have been compared with other existing methods found in the contemporary literature. The performance of an S-system model has been measured in terms of its sensitivity ($S_n$) and specificity ($S_p$), which are defined as follows32:

$$S_n = \frac{TP}{TP + FN}, \tag{15}$$

$$S_p = \frac{TN}{TN + FP}, \tag{16}$$

where True Positive (TP) denotes the number of correctly predicted regulations, and True Negative (TN) represents the number of properly predicted nonregulations. False Positive (FP) denotes the number of incorrectly predicted regulations, and False Negative (FN) represents the number of falsely predicted nonregulations by the inference algorithm. All the experiments have been performed using MATLAB R2009b on a laptop running Windows 7 with an Intel Dual Core processor and 2 GB of RAM.

4.1. Inference for artificial system with noise-free data

To test the effectiveness of BA-based S-system modelling of GRNs, a small benchmark artificial network has been chosen, which contains five genes with simple regulatory dynamics. Earlier approaches1,22,24,26,36–38 already used this network to verify the efficiency of their proposed algorithms. Therefore, in this paper, the same network has been used to validate our methodology as well as to compare its performance with the earlier works.

4.1.1. Experimental setup

First, the proposed methodology has been applied to noiseless data using the parameters shown in Table 1. If an insufficient amount of time-series data is given as training data, several optimal solutions may exist due to the high degree of freedom of an S-system. Therefore, a large number of different time-series datasets are needed for training.

Table 1. Actual S-system parameters for the artificial network.

i   g_i,1  g_i,2  g_i,3  g_i,4  g_i,5  h_i,1  h_i,2  h_i,3  h_i,4  h_i,5  α_i  β_i
1   0      0      1      0      -1     2      0      0      0      0      5    10
2   2      0      0      0      0      0      2      0      0      0      10   10
3   0      -1     0      0      0      0      -1     2      0      0      10   10
4   0      0      2      0      -1     0      0      0      2      0      8    10
5   0      0      0      2      0      0      0      0      0      2      10   10

Specifically, 15 sets of noise-free time-series data have been used, each covering all five genes. The time-series data has been generated by solving the set of differential equations using Eq. (13). The initial values of these sets have been selected from the earlier work.1 The number of sampling points has been set to 21, and it has been observed that during data generation a few gene expression values become negative after some sampling points. An S-system can fail for negative values due to its power-law function. Therefore, only the datasets that contain positive values have been used for training. For our work, there was a total of 70 data points for each of the genes. For each of the genes, we have 12 identifiable parameters. The search space has been selected as $[-3, 3]$ for the kinetic orders $g_{i,j}$ and $h_{i,j}$, and $[0, 12]$ for the rate constants $\alpha_i$ and $\beta_i$. Here, the value of I has been set to 5, i.e. the maximum number of all types of possible regulations for a particular gene is 5. For BA, the number of iterations and the number of initial solutions have been set to 5000 and 300, respectively. The maximum number of iterations and the initial population are kept high to deal with the nonlinearity of the S-system. Moreover, these values have been chosen after testing several candidate values and retaining those for which BA performs better. The values of both $\alpha$ and $\gamma$ have been set to 0.95. The choice of parameters requires some experimenting.28 The frequency range has been initialized to $[0, 1]$, and the step size of the random walk has been fixed to 0.001.

4.1.2. Results

Table 2 shows the inferred parameters for the first experiment. Kinetic orders smaller than 0.1 in magnitude have been ignored. The BA-based S-system model has given satisfactory results for noiseless data, as almost correct values of all parameters have been predicted. Moreover, BA has also inferred the correct sign and position of the regulations and nonregulations.
The predicted parameter values for gene 3 were slightly less accurate, but still quite satisfactory in terms of the prediction of TPs and FPs and their nature of regulation. Table 3 shows that the proposed method needs fewer data points than most other state-of-the-art methods. Thus, it can be concluded that with fewer data points for training, BA can detect all the correct regulations and their nature efficiently, with only minor variations in the numerical values of the parameters. However, an increase in the number of training samples does not change the scenario much. Table 3 also shows the comparison of the results, on the basis of specificity and sensitivity, of the proposed method with that of

Table 2. Inferred S-system parameters for the artificial network using noiseless data.

i   g_i,1  g_i,2  g_i,3  g_i,4  g_i,5  h_i,1  h_i,2  h_i,3  h_i,4  h_i,5  α_i  β_i
1   0.0    0.0    1.0    0.0    -1.0   2.0    0.0    0.0    0.0    0.0    4.9  9.8
2   2.1    0.0    0.0    0.0    0.0    0.0    2.1    0.0    0.0    0.0    9.1  9.1
3   0.0    -1.1   0.1    0.1    0.1    0.0    -1.0   2.1    0.0    0.0    7.2  8.0
4   0.0    0.0    2.0    0.0    -1.0   0.0    0.0    0.0    2.0    0.0    7.5  9.5
5   0.0    0.0    0.0    2.0    0.0    0.0    0.0    0.0    0.0    2.1    9.4  9.3

Table 3. A comparative study of existing methods for the noise-free artificial network.

Process   No. of data points   Sn   Sp
BA        70                   1    1
DPSO26    60                   1    1
LTV37     150                  1    1
ARGC24    150                  1    1
DE36      150/375              1    1
CC1       225                  1    1

the earlier works like the cooperative coevolutionary algorithm,1 evolutionary computation,36 Dissipative Particle Swarm Optimization,26 adaptive regulatory gene cardinality,24 and the linear time-varying model.37 In all the above cases, the inference has been accurate for noise-free data, so the main advantages of the proposed BA-based model are the reduction in data points relative to most of these methods and faster convergence. It has been observed that BA converged within 2000 iterations, while other evolutionary methods needed a greater number of iterations; for example, DPSO needs 5000 iterations.26

4.2. Inference for artificial system with noisy data

In the next phase of the experiment, the proposed method has been tested against noisy artificial data to see how the given system performs under different noise levels, as real-life data contains a lot of errors or noise. Here, the added noise is random in nature, and different percentages of randomness have been added to observe the performance of the proposed method as the noise present in the data increases. The noise has been added in the following way:

$$\mathrm{Noisy}(d) = d(t) \left[ 1 + 2\,\mathrm{rand}\,\frac{ns}{100} - \frac{ns}{100} \right]. \tag{17}$$

In the above equation, $d(t)$ is the initial noiseless data, ns is the percentage of random noise added, and rand is a function that generates a random number in $[0, 1]$. In this paper, the performance of the proposed algorithm has been observed for ns = 5%, 15%, and 25% random noise. The model parameters and the regularization parameter settings are the same as in the previous experiment.

4.2.1. Results

Table 4 shows the inference results for a noisy system of 5 genes. It has been observed that adding 5% noise does not affect the accuracy of the structure significantly. The predicted structure of the GRN is almost the same as the original, with only one FP. The parameter values are also within the satisfactory and acceptable range.
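The noise model of Eq. (17) can be sketched as below, assuming it perturbs each sample by a uniform relative error of at most ±ns%. The function name `add_noise` and the sample values are hypothetical and serve only to illustrate the formula.

```python
import random

def add_noise(series, ns, rng=None):
    """Scale each sample by a random factor in [1 - ns/100, 1 + ns/100] (Eq. 17)."""
    rng = rng or random.Random()
    frac = ns / 100.0
    return [d * (1.0 + 2.0 * rng.random() * frac - frac) for d in series]

clean = [1.0, 2.0, 4.0]                       # made-up noiseless expression values
noisy = add_noise(clean, ns=15, rng=random.Random(0))
```

Since `rng.random()` lies in [0, 1), the term `2 * rand * frac - frac` spans [-frac, frac), so the noise is centred on the clean value.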
For the datasets with 15% noise added, the proposed algorithm has been successful in detecting all the actual regulations, and the parameter values are still within the acceptable range. However, 6 FPs have been added to the network due to the

Table 4. Inferred S-system parameters for the artificial network using noisy data.

Noise (%)   TP   FP   FN
5           13   1    0
15          13   6    0
25          13   9    0
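As a quick check of Eqs. (15) and (16) against counts such as those in Table 4, the two measures can be computed as in the sketch below. The function names are hypothetical, and the TN and FP values in the specificity example are made up, since Table 4 does not list TN.

```python
def sensitivity(tp, fn):
    """Eq. (15): fraction of true regulations that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Eq. (16): fraction of true nonregulations that were left undetected."""
    return tn / (tn + fp)

# Table 4 reports TP = 13 and FN = 0 at every noise level.
sn = sensitivity(tp=13, fn=0)        # 1.0
# Hypothetical counts, for illustration only.
sp_example = specificity(tn=9, fp=1)  # 0.9
```

With FN = 0 the sensitivity is 1 regardless of how many false positives appear, which is why the effect of noise in this experiment shows up only in the specificity.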

Table 5. Comparative study of the BA-based S-system with other techniques for noisy artificial data.

            BA           DPSO26       LTV37        DE36         MEMETIC22    ARGC24
Noise (%)   Sn    Sp     Sn    Sp     Sn    Sp     Sn    Sp     Sn    Sp     Sn    Sp
5           1.00  0.94   1.00  0.82   1.00  0.77   1.00  0.73   1.00  0.67   1.00  1.00
15          1.00  0.68   0.99  0.79   0.92  0.67   0.92  0.76   0.92  0.67   1.00  0.87
25          1.00  0.59   0.89  0.74   -     -      0.88  0.79   -     -      1.00  0.81

noise. For the final group of datasets with 25% added noise, the accuracy of the network and the integrity of the data have been affected drastically, and the number of FPs has increased to 9. These results indicate that an increase in the added noise level leads to an increase in the number of FPs predicted by the proposed algorithm. However, the prediction of all the TPs in each case shows that the proposed BA-based S-system model for reverse engineering of GRNs is robust to a respectable extent against added noise. Table 5 shows a comparative study of the performance of various reverse engineering algorithms. It can be observed that the BA-based S-system model is better than the DPSO,26 LTV,37 DE,36 and MEMETIC22 based inference algorithms in the presence of different levels of noise. Only ARGC24 has a performance similar to BA in the presence of noise, with respect to sensitivity. However, the number of FPs increased significantly as the noise increased, which is worse than other methods like DPSO,26 LTV,37 and ARGC.24 Nevertheless, the performance of the proposed model is better than MEMETIC22 and LTV37 with respect to specificity.

4.3. E. coli DNA repair SOS network (experimentally derived)

Microarray experiments on the SOS DNA repair network of E. coli3 were first done by the Uri Alon group.39 The experimental datasets are considered a benchmark for the evaluation of algorithms for reverse engineering of GRNs from real-world datasets. In the SOS network, eight genes were considered (uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA, and polB, as shown in Fig. 1) due to their significant involvement in the process of DNA repair. During the experiments, the E. coli cell was irradiated with UV light. Four experiments were performed with different UV light intensities. Each experiment consists of 50 time steps spaced by 6 min for each of the eight genes. Since two of them (uvrY and ruvA) have little activity in comparison with the rest of


Fig. 1. The graphical representation of the actual SOS network for E. coli.

the genes in the network, many researchers chose only the remaining six genes for their investigations. However, in this work, all eight genes have been considered.

4.3.1. Accumulative cardinality and experimental setup

For practical cases, the actual number of regulations in a GRN is unknown. Therefore, the value of the cardinality, or maximum in-degree, is also unknown for any real-life genetic network. Moreover, due to the varying amounts of noise present in real-life datasets and the degrees of freedom of the S-system parameters, the predicted connectivity of a network (TPs and FPs) may differ for the same dataset with different cardinality values. All the earlier works [6,26,36,37,40] related to the SOS network of E. coli contained one or more FPs in the reconstructed network, which is undesirable. For a real-life GRN, the network structure is of utmost importance, whereas the values of the model parameters may differ from model to model. Thus, our objective is to identify all the regulations correctly with the minimum possible number of predicted FPs.

A novel accumulative cardinality based GRN reconstruction method has been proposed in this paper to minimize the number of FPs. Here, instead of setting the value of $I$ to a fixed integer, the program has been executed for $I = 1, 2, \ldots, I_{MAX}$ sequentially and, for each case, the resultant regulations of the network have been stored in an initial connectivity matrix (ICM). An element of this matrix, $ICM_{i,j}$, denotes the regulatory relationship between the $i$th gene and the $j$th gene. $ICM_{i,j}$ can take the values $+1$, $0$, and $-1$ depending on the value and sign of the kinetic orders of the S-system model: $+1$ denotes activation, $0$ denotes no regulation, and $-1$ denotes suppression. Table 6 shows an ICM for $I = 5$. Thus, $I_{MAX}$ ICMs have been generated during this process. All connectivity matrices have been accumulated on the

Table 6. An example of ICM for $I = 5$.

Genes    uvrD   lexA   umuDC   recA   uvrA   uvrY   ruvA   polB
uvrD     0      -1     1       0      0      0      0      0
lexA     0      -1     1       0      0      0      0      0
umuDC    0      -1     1       0      0      0      0      0
recA     0      -1     1       0      0      0      0      0
uvrA     0      -1     1       0      0      0      0      0
uvrY     0      -1     1       0      0      0      0      0
ruvA     0      -1     1       0      0      0      0      0
polB     0      -1     1       0      0      0      0      0

Table 7. OM for E. coli.

Genes    uvrD   lexA   umuDC   recA   uvrA   uvrY   ruvA   polB
uvrD     0.4    -1     0.6     0.4    0      0.2    0.2    0.4
lexA     0.4    -0.8   0.6     0.4    0      0.2    0.2    0.4
umuDC    0.4    -1     0.6     0.4    0      0.2    0.2    0.4
recA     0.4    -1     0.6     0.4    0      0.2    0.2    0.4
uvrA     0.4    -0.8   0.6     0.4    0      0.2    0.2    0.4
uvrY     0.4    -1     0.6     0.4    0      0.2    0.2    0.4
ruvA     0.4    -1     0.6     0.4    0      0.2    0.2    0.4
polB     0.4    -1     0.6     0.4    0      0.2    0.2    0.4

basis of occurrence of a particular regulation and divided by $I_{MAX}$ to obtain the occurrence matrix (OM), shown in Table 7. The elements of the OM denote the average number of times a particular regulation has been predicted, i.e. the fraction of the different cardinality based ICMs in which it occurs. An element $OM_{i,j}$ can be expressed in the following way:

$$OM_{i,j} = \frac{1}{I_{MAX}} \sum_{I=1}^{I_{MAX}} ICM_{i,j}. \tag{18}$$
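The averaging in Eq. (18) can be illustrated with a minimal sketch. The function name `occurrence_matrix`, the 2-gene toy matrices, and the convention that entry `[i][j]` is the effect of gene j on gene i are assumptions made for illustration only; they are not taken from the authors' implementation or data.

```python
def occurrence_matrix(icms):
    """Element-wise mean of a list of ICMs (entries in {-1, 0, +1}), as in Eq. (18)."""
    i_max = len(icms)
    n_rows, n_cols = len(icms[0]), len(icms[0][0])
    return [
        [sum(icm[i][j] for icm in icms) / i_max for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical 2-gene example with I_MAX = 5 (illustrative values, not the paper's data).
# Assumed convention: entry [i][j] is the predicted effect of gene j on gene i.
toy_icms = [
    [[0, -1], [0, 0]],   # I = 1
    [[0, -1], [0, -1]],  # I = 2
    [[1, -1], [0, -1]],  # I = 3
    [[0, -1], [0, -1]],  # I = 4
    [[1, -1], [0, -1]],  # I = 5
]

om = occurrence_matrix(toy_icms)
print(om)
```

In this toy case a regulation predicted as a suppression in four of the five runs receives an occurrence of -0.8, and one predicted in all five runs receives -1.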

We are only interested in finding the most significant regulations. Hence, the OM has been filtered with a threshold occurrence value, $\theta$, and the remaining regulations have been forced to zero to eliminate the insignificant regulations, i.e. the FPs. The resultant matrix is called the score matrix (SM). The value of $\theta$ can be determined by trial and error.

$$SM_{i,j} = \begin{cases} OM_{i,j} & \text{if } |OM_{i,j}| \ge \theta, \\ 0 & \text{otherwise.} \end{cases} \tag{19}$$

Finally, we have defined the inferred matrix (IM), whose elements are the signs of the corresponding elements of the SM; this IM denotes the ultimate inferred GRN.

$$IM_{i,j} = \operatorname{sign}(SM_{i,j}). \tag{20}$$
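The filtering and sign steps of Eqs. (19) and (20) could be sketched as follows. The function names, the parameter name `theta`, and the toy occurrence values are hypothetical and serve only to show the two steps; the threshold of 0.8 matches the value used for the SOS network.

```python
def score_matrix(om, theta):
    """Keep entries with |OM[i][j]| >= theta and zero the rest, as in Eq. (19)."""
    return [[v if abs(v) >= theta else 0.0 for v in row] for row in om]


def inferred_matrix(sm):
    """Element-wise sign of SM, as in Eq. (20): +1, 0, or -1."""
    return [[(v > 0) - (v < 0) for v in row] for row in sm]


# Hypothetical occurrence matrix (illustrative values, not the paper's data).
toy_om = [
    [0.4, -1.0, 0.6],
    [0.0, -0.8, 0.2],
]

sm = score_matrix(toy_om, theta=0.8)
im = inferred_matrix(sm)
print(sm)
print(im)
```

Only the entries whose absolute occurrence reaches the threshold survive, and the sign step then reduces them to the regulation type. If the occurrence values come from floating-point arithmetic other than simple fractions, a small tolerance in the comparison may be advisable.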

The SM and the final inferred regulations have been given in Tables 8 and 9, respectively. Before the simulation, the dataset has been normalized.

Table 8. The SM with threshold 0.8 for E. coli.

Genes    uvrD   lexA   umuDC   recA   uvrA   uvrY   ruvA   polB
uvrD     0      -1     0       0      0      0      0      0
lexA     0      -0.8   0       0      0      0      0      0
umuDC    0      -1     0       0      0      0      0      0
recA     0      -1     0       0      0      0      0      0
uvrA     0      -0.8   0       0      0      0      0      0
uvrY     0      -1     0       0      0      0      0      0
ruvA     0      -1     0       0      0      0      0      0
polB     0      -1     0       0      0      0      0      0

Table 9. IM after filtering.

Genes    uvrD   lexA   umuDC   recA   uvrA   uvrY   ruvA   polB
uvrD     0      -1     0       0      0      0      0      0
lexA     0      -1     0       0      0      0      0      0
umuDC    0      -1     0       0      0      0      0      0
recA     0      -1     0       0      0      0      0      0
uvrA     0      -1     0       0      0      0      0      0
uvrY     0      -1     0       0      0      0      0      0
ruvA     0      -1     0       0      0      0      0      0
polB     0      -1     0       0      0      0      0      0

$I_{MAX}$ has been set to five, i.e. the experiment has been performed five times, varying the cardinality from 1 to 5. Moreover, dual regulations have been ignored, as a gene cannot be activated and suppressed simultaneously. For the reconstruction of the GRN, the initial population of bats has been set to 200 and the maximum number of iterations to 2000. The search space has been bounded by $[-5, 5]$ for the kinetic orders $g_{i,j}$ and $h_{i,j}$, and by $[0, 5]$ for the rate constants. Both $\alpha$ and $\gamma$ were set to 0.95.

4.3.2. Results and comparisons for SOS network

After filtering with a threshold level of 0.8, the number of remaining regulations in the SM is eight, and all of them are TPs. There is no FP. Only one regulation has not been identified. The type of regulation (i.e. activation or repression) can be found from the sign of the elements of the ICM. The BA-based S-system model has identified that lexA suppresses all the other genes including itself, which agrees with Fig. 1. However, the proposed algorithm has been unable to infer the activation of lexA by recA, which proved to be elusive in earlier works as well [24].

Now, Precision (PR), Recall (RE), and F-score are defined in the following way:

$$PR = \frac{TP}{TP + FP}, \tag{21}$$

$$RE = \frac{TP}{TP + FN}, \tag{22}$$

$$F\text{-score} = \frac{2 \cdot PR \cdot RE}{PR + RE}. \tag{23}$$

Table 10. Comparative study of specificity, sensitivity, and F-score for different models of the E. coli SOS network.

Method                          Regulations   TP   FP   TN   FN   Sensitivity   Specificity   F-score
BA (accumulative cardinality)   8             8    0    55   1    0.89          1             0.94
Bayesian network [6]            6             4    2    18   3    0.57          0.90          0.61
LTV [37]                        13            7    6    14   0    1             0.70          0.70
S-tree [40]                     7             6    1    19   1    0.86          0.95          0.86
DE [36]                         8             5    3    17   0    0.71          0.85          0.67
DPSO [26]                       10            7    3    17   0    1             0.85          0.823
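As a quick check of these definitions, the following sketch recomputes the measures for the BA row of Table 10. Recall is taken as TP/(TP + FN), which reproduces the reported sensitivity of 0.89. The specificity formula TN/(TN + FP) is the usual definition and is assumed here, since this section does not restate it. The function names are illustrative, and the functions assume non-zero denominators.

```python
def precision(tp, fp):
    """Fraction of predicted regulations that are true."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true regulations that were predicted (sensitivity)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of absent regulations correctly left out (assumed usual definition)."""
    return tn / (tn + fp)


def f_score(pr, re_):
    """Harmonic mean of precision and recall."""
    return 2 * pr * re_ / (pr + re_)


# Counts reported for BA (accumulative cardinality) in Table 10.
tp, fp, tn, fn = 8, 0, 55, 1

pr_val = precision(tp, fp)
re_val = recall(tp, fn)
sp_val = specificity(tn, fp)
f_val = f_score(pr_val, re_val)
print(round(pr_val, 2), round(re_val, 2), round(sp_val, 2), round(f_val, 2))
```

The rounded values agree with the sensitivity (0.89), specificity (1), and F-score (0.94) listed for BA.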

Table 10 compares the performance measures of the proposed method with those of the other inference algorithms. Here, algorithms such as ARGC and MEMETIC have been omitted from the comparison, as these two algorithms were not applied to the E. coli dataset. To the best of the knowledge of the authors, the results presented here are the best in terms of specificity and F-score, as the inferred network does not contain any FPs. BA attains a sensitivity of 0.89; only LTV and DPSO report a higher value (1), but with 6 and 3 FPs, respectively. The sensitivity falls short of the best value of 1 because the activation of lexA by recA could not be inferred. The specificity here is 1, as there is no FP, and this is the best among all the existing methods. The F-score is calculated to evaluate an algorithm without looking at the trade-off between sensitivity and specificity. The proposed accumulative cardinality-based BA has achieved the highest F-score, indicating that it is probably the best-suited algorithm for identifying the true regulations of the SOS network.

However, due to the accumulative cardinality process, the runtime of the proposed algorithm is slightly on the higher side compared to other algorithms such as LTV, DE, and Bayesian network. However, S-tree and DPSO are more expensive with respect to runtime. The runtimes of the various inference algorithms are compared in Table 11. Considering all the parameters, it can be concluded that the performance of the accumulative cardinality-based BA is the best among the existing S-system based algorithms for the reconstruction of the SOS network. The proposed algorithm has good accuracy as well as a reasonably small runtime.

4.4. Real-world 20 gene network extracted from Gene Net Weaver [41]

4.4.1. Experimental setup

In this section, we have tried to gauge the performance of the proposed methodology for larger networks. For this purpose, we have extracted a GRN, consisting of 20

Table 11. Runtime comparisons between different algorithms for the E. coli SOS network.

Method                          Runtime [hrs]
BA (accumulative cardinality)   2.5
Bayesian network [6]            0.01
LTV [37]                        35
S-tree [40]                     0.1
DE [36]                         0.3
DPSO [26]                       4

genes and 24 interactions, as shown in Fig. 2. A total of 51 time points have been generated by Gene Net Weaver (GNW) (version 3.1.3 beta) with DREAM4 [42] settings. This data has been used for the reconstruction of the GRN, as done previously in this work.

4.4.2. Results

The proposed methodology does not scale up efficiently. For the 20 gene network considered here, it was able to predict only four of the 24 interactions correctly. Also, the number of FPs is 41. The correctly predicted relations have been shown in Table 12.

Fig. 2. The graphical representation of the GRN consisting of 20 genes.

Table 12. TPs obtained for the GRN consisting of 20 genes.

metJ > metF
metJ > metQ
metR > metA
Phop > Phop
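As a rough worked example, the counts quoted for the 20 gene network (4 TPs, 41 FPs, and hence 24 - 4 = 20 missed interactions, treated here as FNs) can be put into the precision, recall, and F-score definitions, with recall taken as TP/(TP + FN). These values are not reported in the paper; they follow only from the counts stated in the text.

```latex
\begin{align*}
PR &= \frac{TP}{TP + FP} = \frac{4}{4 + 41} \approx 0.089, \\
RE &= \frac{TP}{TP + FN} = \frac{4}{4 + 20} \approx 0.167, \\
F\text{-score} &= \frac{2 \cdot PR \cdot RE}{PR + RE} = \frac{8}{69} \approx 0.116.
\end{align*}
```

Under these assumptions, the F-score of about 0.12 is far below the 0.94 obtained for the SOS network, which quantifies the loss of accuracy on the larger network.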


5. Conclusion

Several techniques have been proposed by various researchers to solve the problem of reverse engineering of GRNs from temporal genetic expression data in the domain of computational biology and bioinformatics. It is imperative to enhance the accuracy of the inference algorithms as well as to reduce the number of incorrect predictions (i.e. FPs) within a plausible runtime. BA (a recently developed metaheuristic), with the incorporation of a regularizer, has increased the performance of the proposed S-system based inference algorithm. The regularizer helps to avoid the prediction of FPs while considering a sparse set of network parameters.

To demonstrate the efficiency of the proposed inference algorithm, it has been applied to a benchmark artificial network with five genes, with and without noise. With the use of fewer data points, the BA-based S-system model has been able to infer the network with a significantly high accuracy. However, in the presence of noise, the number of FPs increased with the amount of noise. It has also been found that the robustness of the proposed algorithm is better than that of some of the other existing algorithms.

Next, the proposed inference algorithm has been applied to the real-life SOS DNA repair network of E. coli. In this paper, a novel accumulative cardinality-based S-system model using BA has been used for the reconstruction of GRNs from temporal expression data. The model has been executed for different values of cardinality from 1 to $I_{MAX}$. Based on the score of occurrence for various regulations, actual interactions have been inferred while the number of FPs has been reduced to zero. It has also been observed that the proposed algorithm is comparatively better than other existing techniques in terms of detecting TPs while avoiding FPs. However, the runtime is slightly higher due to the accumulative nature of the cardinality process. In future, it can further be validated against larger networks. Moreover, different regularization techniques can be employed to improve the accuracy and runtime further.

However, for GRNs of larger size, the proposed methodology fails to produce results of acceptable quality and accuracy. This indicates that the proposed technique, in its current form, is not suitable for larger networks, like many of the other methodologies present in the contemporary literature. This also provides scope for future research into suitable modifications to the technique such that it can extract GRNs consisting of a considerably large number of genes from time-series microarray data.

References

1. Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A, Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm, Bioinformatics 21(7):1154–1163, 2005.
2. Quackenbush J, Microarray data normalization and transformation, Nat Genet 32:496–501, 2002.
3. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS, Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles, PLoS Biol 5(1):e8, 2007.
4. Akutsu T, Miyano S, Kuhara S, Identification of genetic networks from a small number of gene expression patterns under the Boolean network model, Pacific Symp Biocomputing, Vol. 4, pp. 17–28, 1999.
5. Weaver DC, Workman CT, Stormo GD, Modeling regulatory networks with weight matrices, Pacific Symp Biocomputing, Vol. 4, pp. 112–123, 1999.
6. Perrin B-E, Ralaivola L, Mazurie A, Bottani S, Mallet J, d'Alche-Buc F, Gene networks inference using dynamic Bayesian networks, Bioinformatics 19(Suppl 2):ii138–ii148, 2003.
7. Werhli AV, Grzegorczyk M, Husmeier D, Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models, and Bayesian networks, Bioinformatics 22(20):2523–2531, 2006.
8. Yilmaz S, Kucuksille EU, Cengiz Y, Modified bat algorithm, Elektron Elektrotech 20(2):71–78, 2014.
9. Xu R, Wunsch II D, Frank R, Inference of genetic regulatory networks with recurrent neural network models using particle swarm optimization, IEEE/ACM Trans Comput Biol Bioinf (TCBB) 4(4):681–692, 2007.
10. Noman N, Palafox L, Iba H, Reconstruction of gene regulatory networks from gene expression data using decoupled recurrent neural network model, Natural Computing and Beyond, Springer, Japan, 2013, pp. 93–103.
11. Savageau MA, Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology, Addison-Wesley, 1976.
12. Noman N, Iba H, Accelerating differential evolution using an adaptive local search, IEEE Trans Evol Comput 12(1):107–125, 2008.
13. Liu P-K, Wang F-S, Inference of biochemical network models in S-system using multiobjective optimization approach, Bioinformatics 24(8):1085–1092, 2008.
14. Spieth C, Streichert F, Speer N, Zell A, A memetic inference method for gene regulatory networks based on S-systems, Cong Evolutionary Computation, 2004. CEC2004, Vol. 1, pp. 152–157, IEEE, 2004.
15. Jereesh AS, Govindan VK, Gene regulatory network modelling using cuckoo search and S-system, Int J Adv Res Comput Sci Softw Eng 3(9):1231–1237, 2013.
16. Jereesh AS, Govindan VK, A clonal based algorithm for the reconstruction of genetic network using S-system, Int J Res Eng Technol (IJRET) 2:44–50, 2013.
17. Nakayama T, Seno S, Takenaka Y, Matsuda H, Inference of S-system models of gene regulatory networks using immune algorithm, J Bioinf Comput Biol 9(supp01):75–86, 2011.
18. Mandal S, Saha G, Pal RK, S-system based gene regulatory network reconstruction using Firefly algorithm, 3rd Int Conf Computer, Communication, Control and Information Technology (C3IT), 2015, pp. 1–5, IEEE, 2015.
19. Wang H, Qian L, Dougherty E, Inference of gene regulatory networks using S-system: A unified approach, IET Syst Biol 4(2):145–156, 2010.
20. Murata H, Koshino M, Mitamura M, Kimura H, Inference of S-system models of genetic networks using product unit neural networks, IEEE Int Conf Systems, Man and Cybernetics, 2008. SMC 2008, pp. 1390–1395, IEEE, 2008.
21. Thieffry D, Huerta AM, Perez-Rueda E, Collado-Vides J, From specific gene regulation to genomic networks: A global analysis of transcriptional regulation in Escherichia coli, BioEssays 20(5):433–440, 1998.
22. Hasan Md M, Noman N, Iba H, A priori knowledge based approach to infer gene regulatory networks, Proc Int Symp Biocomputing, p. 39, ACM, 2010.
23. Liu L-Z, Wu F-X, Zhang WJ, Inference of biological S-system using the separable estimation method and the genetic algorithm, IEEE/ACM Trans Comput Biol Bioinf (TCBB) 9(4):955–965, 2012.
24. Chowdhury AR, Chetty M, Vinh NX, Adaptive regulatory genes cardinality for reconstructing genetic networks, 2012 IEEE Congr Evolutionary Computation (CEC), pp. 1–8, IEEE, 2012.
25. Chowdhury AR, Chetty M, Vinh NX, Incorporating time-delays in S-System model for reverse engineering genetic networks, BMC Bioinformatics 14(1):196, 2013.
26. Palafox L, Noman N, Iba H, Reverse engineering of gene regulatory networks using dissipative particle swarm optimization, IEEE Trans Evol Comput 17(4):577–587, 2013.
27. Wolpert DH, Macready WG, No free lunch theorems for optimization, IEEE Trans Evol Comput 1(1):67–82, 1997.
28. Yang X-S, A new metaheuristic bat-inspired algorithm, Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), Springer, Berlin, Heidelberg, 2010, pp. 65–74.
29. Yang X-S, Engineering Optimization: An Introduction with Metaheuristic Applications, John Wiley & Sons, 2010.
30. Tamiru AL, Hashim FM, Application of bat algorithm and fuzzy systems to model exergy changes in a gas turbine, Artificial Intelligence, Evolutionary Computing, and Metaheuristics, Springer, Berlin, Heidelberg, 2013, pp. 685–719.
31. Yang X-S, Karamanoglu M, Fong S, Bat algorithm for topology optimization in microelectronic applications, 2012 Int Conf Future Generation Communication Technology (FGCT), IEEE, 2012, pp. 150–155.
32. Yang X-S, Bat algorithm for multi-objective optimization, Int J Bio-Inspir Comp 3(5):267–274, 2011.
33. Yang X-S, He X, Bat algorithm: Literature review and applications, Int J Bio-Inspir Comp 5(3):141–149, 2013.
34. Yilmaz S, Kucuksille EU, Improved bat algorithm (IBA) on continuous optimization problems, Lect Notes Softw Eng 1(3):279–283, 2013.
35. Qian L, Wang H, Inference of genetic regulatory networks by evolutionary algorithm and H∞ filtering, IEEE/SP 14th Workshop on Statistical Signal Processing, 2007. SSP'07, pp. 21–25, IEEE, 2007.
36. Noman N, Iba H, Reverse engineering genetic networks using evolutionary computation, Genome Inf 16(2):205–214, 2005.
37. Kabir M, Noman N, Iba H, Reverse engineering gene regulatory network from microarray data using linear time-variant model, BMC Bioinformatics 11(Suppl 1):S56, 2010.
38. Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M, Dynamic modelling of genetic networks using genetic algorithm and S-system, Bioinformatics 19(5):643–650, 2003.
39. Ronen M, Rosenberg R, Shraiman BI, Alon U, Assigning numbers to the arrows: Parameterizing a gene regulation network by using accurate expression kinetics, Proc Natl Acad Sci 99(16):10555–10560, 2002.
40. Cho D-Y, Cho K-H, Zhang B-T, Identification of biochemical networks by S-tree based genetic programming, Bioinformatics 22(13):1631–1640, 2006.
41. Schaffter T, Marbach D, Floreano D, GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods, Bioinformatics 27(16):2263–2270, 2011.
42. Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G, Revealing strengths and weaknesses of methods for gene network inference, Proc Natl Acad Sci 107(14):6286–6291, 2010.

Sudip Mandal received the B. Tech and M. Tech degrees in Electronics and Communication Engineering from Kalyani Government Engineering College in 2009 and 2011, respectively. Currently, he is the Head of the Department of Electronics and Communication Engineering at Global Institute of Management Technology, Krishna Nagar, India. He is also pursuing his Ph.D. degree in the Department of Computer Science and Engineering, University of Calcutta, India. His current research interests include bioinformatics, soft computing, and artificial intelligence. Mr. Mandal is a member of the IEEE Computational Intelligence Society and the IEEE Systems, Man, & Cybernetics Society. He has 16 publications in peer-reviewed Journals, and in National and International Conferences.

Abhinandan Khan received the B. Tech degree in Electronics and Communication Engineering from the West Bengal University of Technology, Kolkata, India, and the M.E. degree in Electronics and Telecommunication Engineering from Jadavpur University, Kolkata, India, in 2011 and 2013, respectively. Since 2014, he has been with the Department of Computer Science and Engineering, University of Calcutta, India, where he is currently working towards his Ph.D. His research interests are artificial intelligence, optimization techniques, bioinformatics, and computational biology. Mr. Khan is the recipient of the University Gold Medal for securing the highest marks among all post-graduate engineering courses in Jadavpur University. He has 12 publications in International Conferences and peer-reviewed Journals.


Goutam Saha received the B.E. degree in Electrical Engineering and the M.E. degree in Electronics and Telecommunication Engineering from the Bengal Engineering College, Shibpur under the University of Calcutta, Kolkata, India in 1984 and 1989, respectively. He received the Ph.D. degree from the Indian Institute of Technology, Kharagpur, India in 1999. He also has Post-Doctoral Research experience at the Indian Institute of Technology, Kharagpur, India and the Ben Gurion University, Israel. From 1989 to 2000 he was with the Department of Electrical Engineering, Regional Engineering College, Durgapur, India as a Lecturer. From 2002 to 2005, he was with the Department of Electronics and Communication Engineering, Kalyani Government Engineering College, Kalyani, India, and from 2005 to 2006, with the Curriculum Development Center, N.I.T.T.T.R., Calcutta, India as an Assistant Professor. From 2006 to 2013 he worked as the Associate Professor and the Head of the Department of Information Technology and Computer Science and Engineering, Government College of Leather Technology, Kolkata, India. Presently, he is working as the Professor at the Department of Information Technology, North Eastern Hill University, Shillong, India. Dr. Saha's current research interests include computational biology, bioinformatics, systems biology, IC design and nanotechnology, bio remediation, etc. Dr. Saha has 20 research papers, and has authored and co-authored two books. Dr. Saha also holds an international patent.

Rajat Kumar Pal received the B.E. degree in Electrical Engineering from the Bengal Engineering College, Shibpur under the University of Calcutta, India, and the M. Tech. degree in Computer Science and Engineering from the University of Calcutta, India, in 1985 and 1988, respectively. He received the Ph.D. degree from the Indian Institute of Technology, Kharagpur, India in 1996. Since 1994, he has been a faculty member with the Department of Computer Science and Engineering, University of Calcutta. He also held the position of the Head of the Department from 2005 to 2007. He went on lien to become Professor at the Department of Information Technology, Assam University, India from 2010 to 2012. Presently, he is working as a Professor with the Department of Computer Science and Engineering, University of Calcutta, India. Dr. Pal's major research interests include VLSI design, graph theory and its applications, perfect graphs, logic synthesis, design and analysis of algorithms, computational geometry, parallel computation and algorithms. Dr. Pal has published more than 150 technical research papers, and has authored and co-authored two books. Dr. Pal also holds five international patents.
