ARTICLE IN PRESS

BIO 3544 1–18

BioSystems xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

BioSystems journal homepage: www.elsevier.com/locate/biosystems

Review Article

1

Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer

2

3

4

Q1

Nesma M. ElKalaawy ∗ , Amr Wassal Department of Computer Engineering, Faculty of Engineering, Cairo University, Giza 12613, Egypt

5 6

a r t i c l e

7 22

i n f o

a b s t r a c t

8

Article history: Received 26 June 2014 Received in revised form 23 January 2015 Accepted 23 January 2015 Available online xxx

9 10 11 12 13 14

21

Keywords: Biochemical networks Signal transduction pathways Modeling and simulation Stochastic methods Ordinary differential equations Partial differential equations

23

Contents

15 16 17 18 19 20

1. 2. 3.

24 25 26 27 28 29 30 31 32 33 34 35

4. 5. 6.

36 37 38 39

Biochemical networks depict the chemical interactions that take place among elements of living cells. They aim to elucidate how cellular behavior and functional properties of the cell emerge from the relationships between its components, i.e. molecules. Biochemical networks are largely characterized by dynamic behavior, and exhibit high degrees of complexity. Hence, the interest in such networks is growing and they have been the target of several recent modeling efforts. Signal transduction pathways (STPs) constitute a class of biochemical networks that receive, process, and respond to stimuli from the environment, as well as stimuli that are internal to the organism. An STP consists of a chain of intracellular signaling processes that ultimately result in generating different cellular responses. This primer presents the methodologies used for the modeling and simulation of biochemical networks, illustrated for STPs. These methodologies range from qualitative to quantitative, and include structural as well as dynamic analysis techniques. We describe the different methodologies, outline their underlying assumptions, and provide an assessment of their advantages and disadvantages. Moreover, publicly and/or commercially available implementations of these methodologies are listed as appropriate. In particular, this primer aims to provide a clear introduction and comprehensive coverage of biochemical modeling and simulation methodologies for the non-expert, with speciﬁc focus on relevant literature of STPs. © 2015 Elsevier Ireland Ltd. All rights reserved.

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Signaling networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling and simulation methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1. Ordinary differential equations (ODEs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2. Partial differential equations (PDEs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3. Stochastic methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4. Petri nets (PNs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5. Process calculi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6. Boolean networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7. Bayesian networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8. Cellular automata (CA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9. Agent-based modeling (ABM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Software tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

40

1. Introduction Q2

∗ Corresponding author. Tel.: +20 1283217801. E-mail addresses: [email protected] (N.M. ElKalaawy), [email protected] (A. Wassal).

Studying the individual components of a biological system is an Q3 important initial step; however, this alone is not sufﬁcient to arrive at a derivation of the system behavior. Many properties arise at the

http://dx.doi.org/10.1016/j.biosystems.2015.01.008 0303-2647/© 2015 Elsevier Ireland Ltd. All rights reserved.

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

41

42 43 44

G Model BIO 3544 1–18 2

45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

system level only. Therefore, biology has shifted from the molecular characterization of the biological system, towards the integration of its components, their behavior, and their relationships. The latter gave rise to the notion of systems biology. Systems biology attempts to determine a system-level understanding of biological systems through the examination of the structure and dynamics of cellular and organismal functions, rather than the characteristics of the isolated parts of a cell or an organism (Ideker et al., 2001; Kitano, 2002). Static diagrams, which merely describe a system by a collection of components and their interconnections, are of little help when it comes to understanding how the different components interact to achieve a certain function (Kitano, 2002). The inherent complexity of biological systems and the accumulation of huge amounts of biological data mandate a systematic approach. Using computational and mathematical models for biological processes allows discovering emergent properties, examining system behavior, and generating new hypotheses. Such models allow performing in silico experiments that would be very expensive or impossible to perform in the laboratory (Decraene and Hinze, 2010). In light of these beneﬁts, new user-friendly modeling languages and tools that allow biologists to represent biological systems more intuitively are emerging. Moreover, formal approaches are being incorporated in biological research, standardized representations of biological data are being established, and biological systems modeling is gaining better ground (Fisher and Henzinger, 2007). Biochemical networks constitute examples of biological systems that have been modeled extensively. Such networks are classiﬁed into three broad categories: metabolic networks, gene regulatory networks, and signaling networks. Metabolic networks comprise the set of reactions that occur in living organisms for the production and degradation of organic compounds needed for an organism’s vital functions (Liiving et al., 2011).Gene regulatory networks are concerned with the control of transcription, i.e., how genes are up- and down-regulated in response to signals. Signaling networks describe how cells receive, process, and respond to stimuli from the environment, as well as stimuli that are internal to the organism. In this primer, we present an introduction to the modeling and simulation of biochemical networks, illustrated for the class of signaling networks. The rest of this primer is organized as follows. Section 2 discusses signaling networks in detail. Section 3 presents the different methodologies used for modeling and simulation of biochemical networks, with speciﬁc reference to models of signaling networks. Relevant surveys describe similar methodologies (Andrews et al., 2009; Decraene and Hinze, 2010; Eungdamrong and Iyengar, 2004a,b; Fisher and Henzinger, 2007; Gilbert et al., 2006; Hughey et al., 2010; Materi and Wishart, 2007; Meng et al., 2004; Pahle, 2009; Pinney et al., 2003; Sreenath et al., 2008; Turner et al., 2004). Some of these focus on only a subset of methodologies and/or overlook the fundamental differences between the methodologies. On the other hand, this primer covers a large number of methodologies and particularly aims to provide a clear introduction and comprehensive coverage of the methodologies for the non-expert. In particular, by using signaling networks as an example, we try to establish a clear understanding of the fundamentals of biochemical modeling and simulation. By discussing relevant signal transduction models from the literature, we aim to illustrate how the different formalisms capture different behaviors of biochemical systems, and how modelers are guided at their choice of method and at their modeling process. In Section 4, we discuss modeling and analysis tools that are widely used in the biochemical community. Finally, Section 5 provides a summary and comparison of the methodologies, and the conclusions are given in Section 6.

2. Signaling networks In order for an organism to respond to internal and external stimuli, its cells have to communicate together. Cells communicate using intercellular signaling, i.e., either by sending out signaling molecules in the extracellular space, or by direct contact between neighboring cells (Krauss, 2004). When a signal arrives at a recipient cell, certain proteins, called receptors, are activated. Activated receptors pass on the signal to other proteins inside the cell and this initiates a chain of intracellular signaling processes. Such chains ultimately result in generating different cellular responses, such as proliferation, differentiation, or apoptosis. The set of successive events that take place as part of an intracellular signaling chain are often termed a signal transduction pathway, (STP). Generally, a ‘pathway’ is an abstraction that biologists use to describe the core of a biochemical network, comprising a sequence of events (Gilbert et al., 2006). It is used in the context of signaling, metabolic, and gene regulatory networks. A common example of an STP is the mitogen activated protein kinase (MAPK) pathway. MAPKs are a family of protein kinases that mediate signals from cell-surface receptors to different cellular compartments and regulate various cellular activities. They are conserved in many organisms, and they play an important role in many pathological conditions, including cancer and other diseases (Zhang and Liu, 2002). Orton et al. (2005) compared three different models of the MAPK pathway and illustrated how the models focus on different parts of the same pathway. Different models include different subsets of molecules and reactions and stop at different points in the pathway, depending on the questions that they seek to answer. STPs have been traditionally viewed as separate linear entities. However, this description can no longer account for the complex patterns observed in these pathways. STPs are very dense, and have a large number of molecular species that dynamically move between cellular organelles. These species do not only make linear connections with other upstream and downstream components, but they also exhibit a lot of branching, horizontal interconnections, and even feedback loops (Mellman and Misteli, 2003). Moreover, STPs are rarely isolated, i.e., there is usually crosstalk between the different pathways. Like their biological counterparts, models of STPs display sophisticated structures. This is further complicated by the fact that information about a particular pathway comes from various databases and research work, possibly with different representations (Heiner et al., 2004). Several methods have been used to model STPs, of which the most common are ordinary differential equations (ODEs). However, new methods are being continuously proposed and investigated. In general, models of STPs are classiﬁed into two main categories: structural network models and dynamic analysis models. Structural network models are based solely on connectivity information. They can generate hypotheses regarding the structure of the global network, as well as the function of individual proteins. Such models reconstruct the topology of a signaling network. On the other hand, dynamic analysis models require, in addition to the network connectivity, numerical values of the kinetic rate constants and the initial concentrations of signaling proteins and complexes. These models measure the time-variant properties of a network. In other words, once the associated quantitative data are known, dynamic analysis of a reconstructed signaling network can be carried out (Papin et al., 2005). It is believed that signaling molecules interact dynamically and non-linearly to achieve speciﬁcity of responses, i.e., the ability to selectively activate a speciﬁc cellular response (Pawson, 2004). For this reason, dynamic modeling has been the conventional method for modeling STPs. However, since dynamic analysis models build upon structural network models, it is necessary to ﬁrst verify the consistency and correctness of the latter (Heiner et al., 2004).

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

111

112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

183

In this primer, we particularly choose to focus on STPs since they are highly characterized by time- and/or space-variant dynamics. Moreover, methodologies used for modeling and simulating biochemical networks are mostly applicable to metabolic/genetic networks, but the converse is generally not true. For example, studying metabolic networks is typically centered around steady state analysis and does not usually incorporate spatial detail; two things that are clearly not suitable for signaling networks.

184

3. Modeling and simulation methodologies

176 177 178 179 180 181 182

185

186 187 188 189 190 191 192 193 194 195 196 197 198 199

3.1. Ordinary differential equations (ODEs) ODEs are the most commonly used methodology for modeling signaling networks. They are mathematically well-established, and they have a broad range of numerical solution techniques and solvers. ODE models are relatively simple, and they are mostly based on mass-action kinetics. They can be used to model large biological systems, with many species, and simulate complex dynamics, such as those obeying Michaelis-Menten kinetics (Orton et al., 2005). Given the initial concentrations, kinetic rates, and the topology of the STP, the whole pathway is modeled using a set of nonlinear ODEs. These ODEs measure how the concentration of each species evolves with time, and are often referred to as reaction rate equations (RREs). For example, consider the following binding reaction, between two proteins A and B to form the complex AB, kf

200

A + BAB,

(1)

kb 201 202 203 204

205

206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233

where kf and kb are the kinetic rate constants of the forward and backward reactions, respectively. The ODE that describes the change in the real-valued concentration of the complex AB, [AB](t), is given by: d[AB](t) = kf [A](t) [B](t) − kb [AB](t). dt

(2)

A detailed explanation of the process of constructing the complete ODE model of a pathway can be found in Conrad and Tyson (2006). Also of particular note is the work of Cho and Wolkenhauer (2003), where they brieﬂy illustrated the process of building the ODE model of TNF˛-mediated NF-∗B STP (TNF˛ refers to tumor necrosis factor ˛ and NF-∗B refers to nuclear factor ∗B). They showed how the biological chart of the pathway is transformed into a graphical representation, which in turn can be easily translated into a system of ODEs. Several tools exist for simulating biochemical systems using ODEs, and a comparison of some of them is presented in Pettinen et al. (2005). The solution of the ODE models describes the deterministic time evolution of the concentrations of the signaling components. This solution often represents average population levels. However, the validity of continuous deterministic representations of biochemical networks is often disputed, because they do not account for stochasticity (which arises due to low numbers of molecules, as will be explained in Section 3.3). This limits the applicability of ODE models to cases with large numbers of molecules. Another assumption in ODE models is that reactions occur in a homogeneous well-stirred volume, where all reactants have equal access to each other. Consequently, reaction rates are not affected by the locations of the molecules and spatial information can be omitted. Eungdamrong and Iyengar (2004b) elaborated on compartmental modeling; a technique based on ODEs that treats the same molecule as distinct species, depending on its cellular compartment, e.g., cell membrane, cytosol, Golgi complex, etc. The exchange of molecules between compartments is modeled as a ﬂux,

3

which can either be determined from a priori knowledge, or ﬁtted to empirical observations. In other words, compartmental modeling does not describe molecular motion in space, but only movements between a ﬁnite number of compartments. A further limitation of the ODE approach is the constraint that accurate kinetic rates must be available. These rates are often lacking, or difﬁcult to obtain experimentally. In some cases, kinetic rates may be computationally estimated. However, such estimates are not sufﬁciently reliable, especially for larger systems. Moreover, simulations of ODE models become inefﬁcient when the rates of the different reactions vary signiﬁcantly, i.e., the stiffness problem. Finally, ODE models track only changes in global species concentrations. Therefore, they provide no visualization of signal ﬂows within the network (Sackmann et al., 2006). Nevertheless, ODE models of STPs are very common in the literature, and have provided valuable insights into the workings of several STPs.

3.2. Partial differential equations (PDEs) While ODE-based models allow relatively easy modeling of biochemical networks, the well-stirred assumption intrinsic to these models is not always valid. Biological systems display concentration gradients (Kholodenko, 2006) and other spatial features that render the volume non-homogeneous. In such cases, the system becomes spatially restricted, and cellular interactions happen to be partly controlled by the spatial organization of the signaling components. Data on signaling networks and experimental results have demonstrated that cells use spatial separation of signaling reactions as one mechanism for achieving signaling speciﬁcity. For example, Calcium (Ca2+ ), a fundamental intracellular signal, regulates a variety of cellular functions in the same cell. Typically, localized Ca2+ bursts regulate fast responses, such as exocytosis, while global sustained Ca2+ signals regulate slower responses, such as cell proliferation. In other words, the Ca2+ signaling system achieves speciﬁcity by operating over wide temporal and spatial scales. Ca2+ signals differentially activate downstream molecules based on their duration, frequency, amplitude and location (Ullah et al., 2007). Several studies have emphasized the importance of spatial details. Eungdamrong and Iyengar (2004a) demonstrated how the choice between spatial and non-spatial models is guided by the cellular process under investigation. Meyers et al. (2006) illustrated that a general substrate may be activated or deactivated in response to alterations in cell size and shape, i.e., cellular geometry. Neves and Iyengar (2009) also provided many examples of spatially-restricted biochemical systems. Neves et al. (2008) modeled a cAMP-PKA-MAPK network. Their results indicated that spatial information is transmitted through a signaling network different from that used for information regarding activity states. In order to deal with signiﬁcant concentration gradients, changes in cell geometry, and other spatial phenomena, elaborate spatial models are needed. Compartmental models (Section 3.1) adequately describe the transport of molecules across the boundaries of different compartments. However, they are incapable of representing spatial details beyond the occupancy of these compartments. If details of spatial distribution and diffusion of molecules in highly localized spaces are to be explicitly considered, reaction-diffusion (RD) modeling constitutes a more plausible alternative. RD equations combine the simultaneous effects of reactions and diffusion. The effect of diffusion becomes signiﬁcant, when reactions are relatively faster than diffusion (Takahashi et al., 2005). In this case, it takes a long time for reactants to locate each other, but once they do, they react immediately. Consequently, diffusion is a limiting factor for the occurrence of reactions. There exist formal empirical measures for determining whether a reaction is

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249

250

251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296

G Model BIO 3544 1–18

N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

4 297 298 299 300 301 302

303

304 305 306 307 308 309 310 311 312

ARTICLE IN PRESS

diffusion-limited or not (Fogler, 2005). However, a detailed description of these measures is beyond the scope of this work. RD modeling allows species concentrations to vary continuously over both space and time. Since concentrations depend on more than one independent variable, RD models are formulated as PDEs as follows (Keener, 2002):

∂[C](r, t) = DC ∇ 2 [C](r, t) + R ([C](r, t)) , ∂t

(3)

where [C]( r, t) is the concentration of component C at position r and time t, DC is the diffusion coefﬁcient of C, and R([C]( r, t)) describes the rate of formation and dissociation of C. Eq. (3) indicates that the rate of change of the concentration depends on a diffusion term (based on Fick’s second law of diffusion), and a reaction term (mostly based on mass-action kinetics). Consider again the complex AB in (1). In order to account for space as well as time, the ODE in (2) is extended into a PDE, as follows:

313

314 315

∂[AB](r, t) = DAB ∇ 2 [AB](r, t) + kf [A](r, t) [B](r, t) − kb [AB](r, t). ∂t (4)

316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358

The solution of (4) gives the temporal and spatial evolution of the concentration of the complex AB. Solving PDEs ﬁrst requires initial distributions of species over the modeled space, as opposed to the global initial concentration values required for ODEs (Bittig and Uhrmacher, 2010). Second, boundary conditions (conditions at speciﬁc points in space) need to be speciﬁed. Boundary conditions describe either concentration values at the boundaries of the system (Dirichlet condition), or diffusive ﬂuxes across these boundaries (Neumann condition). Finally, the diffusion coefﬁcients of all molecular species are also required. These coefﬁcients are often difﬁcult to measure experimentally. Therefore, they are often estimated based on molecular weights, or relative to known diffusion constants of some molecules, such as Ca2+ . Nevertheless, estimated values are not sufﬁciently accurate due to effects of molecular crowding and viscosity (Rangamani and Iyengar, 2007). These three requirements and the lack of data are the reasons why RD models are less common than ODE models for biochemical systems. Moreover, PDEs are mathematically more challenging and require greater computational resources. PDEs resulting from RD models can only be solved analytically for very simple cases. For larger systems with greater space dimensionality, solutions are mostly numerical (Bormann et al., 2001). The numerical integration of PDEs relies on the discretization of time and space. That is, time is divided into time steps (similar to ODEs) and space is divided into subvolumes. Both small subvolumes and short time steps are critical for obtaining accurate results. This, however, comes at the expense of higher computational cost. General-purpose PDE solvers may be used to solve RD equations. However, these require complete mathematical speciﬁcation of the set of equations and conditions, and are, therefore, inaccessible for non-mathematicians (Eungdamrong and Iyengar, 2004b). Virtual Cell (VCell) is a well-known user-friendly open source simulation environment that was speciﬁcally designed for the modeling and simulation of cell biology (http://www.vcell.org/). Modeling using VCell involves specifying the compartments of the system through a graphical user interface. The compartments are then mapped onto realistic cellular geometries (which may be obtained from microscopic images) and associated with the relevant molecular species, reactions, and initial and boundary conditions. This biological speciﬁcation is then automatically converted into a corresponding system of PDEs that is solved using the ﬁnite volume method on a regular grid (Schaff et al., 2001).

Despite the lack of spatial data, and the mathematical complexity of solving PDEs, they have been used to study spatiotemporal effects in different pathways. Brown and Kholodenko (1999) modeled a system of a protein that is activated by a kinase located exclusively at the membrane, and dephosphorylated by a phosphatase located homogeneously in the cytosol. Using measured values of kinase and phosphatase activities and protein diffusion rates, they estimated that the resulting spatial gradients of phosphoproteins were potentially very large. A similar model was analyzed by Kazmierczak and Lipniacki (2009). In their model, a positive feedback loop was included, in which the kinase at the cell membrane activates its stimulating receptor back. They showed that in this case of mutual receptor-kinase activation, cell activation is maintained only if kinase diffusion is low. Hernjak et al. (2005) modeled Ca2+ dynamics in a cerebellar Purkinje cell. They used the Virtual Cell environment to solve a one-dimensional RD model, a compartmental model, and a two-dimensional RD model. Other PDE models of Ca2+ signaling can be found in Fink et al. (2000), Smith (2001), and Blackwell (2005). Other signaling systems have been modeled using PDEs, such as the extracellularsignal-regulated kinase (ERK) and signal transducer and activator of transcription (STAT) pathways (Georgiev et al., 2006), plateletderived growth factor (PDGF)-stimulated 3 phosphoinositides (PI) lipids (Haugh and Schneider, 2004) and (Schneider and Haugh, 2004), chemotaxis (Levine et al., 2006) and (Levchenko and Iglesias, 2002), basic Fibroblast Growth Factor (FGF-2) pathway (Filion and Popel, 2004), autocrine signaling (Shvartsman et al., 2001), and membrane signaling (Haugh and Lauffenburger, 1997; Haugh, 2002). Similar to ODEs, PDEs are deterministic, and therefore they cannot account for the stochastic noise resulting from the low numbers of molecules. Furthermore, discretization methods amplify the effect of noise, because the number of molecules in each subvolume becomes even smaller (Takahashi et al., 2005). PDEs also overlook another equally important source of stochasticity; the Brownian motion processes that underlie diffusion (Andrews et al., 2009). Since PDEs provide a macroscopic description in terms of concentration gradients across space, no individual particles are represented. Particle-based methods (explained in Section 3.3) are needed to account for the random diffusion of individual molecules. However, these methods involve much greater computational costs and are generally less scalable compared to PDEs. On the positive side, PDEs have been extensively used in many disciplines, and they provide a range of numerical solution methods and simulation engines. 3.3. Stochastic methods The basic assumptions of ODEs and PDEs are that the reaction processes are deterministic and that the molecular quantities are high. However, this is not always true. First, the diffusive trajectories of the individual molecules and the probabilities of the different chemical reactions are essentially random. As a result, the formation of protein products usually occurs in short, random bursts (Andrews and Arkin, 2006). The continuously varying concentrations used in deterministic methods fail to mimic these discontinuities, or jumps, in the evolution of the system. Second, molecular quantities may exist in several biochemical systems in low copy numbers. Stochastic ﬂuctuations, or noise, around the average value of the concentration of a given species are typically estimated to be proportional to the square root of the available number of the molecules of that species. For a large number of molecules, it is acceptable to assume that these ﬂuctuations average out. However, as molecule counts decrease, stochastic ﬂuctuations represent a more signiﬁcant percentage of the population, and may substantially alter the system behavior (Rao et al., 2002). A

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403

404

405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477

stochastic method that captures more than just average population levels becomes then indispensable. Different thresholds have been proposed to decide whether deterministic or stochastic methods should be used. Aldridge et al. (2006) suggested that deterministic, or continuum, methods become suitable when the number of molecules is higher than 100–1000 (i.e., a stochastic variation of about 3–10%). Klipp et al. (2005) proposed that a molecule count on the order of dozens or hundreds necessitate a discrete view of the system. In some instances, stochastic variations were considered signiﬁcant even in the presence of a large number of molecules (Raychaudhuri et al., 2008). In addition to the intrinsic stochasticity arising due to discrete numbers of molecules, biochemical systems also exhibit variability in behavior induced by external factors. Such extrinsic noise may result from upstream signaling components or from external environmental factors, and may switch the system from one stable state to another. For a given set of initial (and boundary) conditions, deterministic models evolve along a ﬁxed path. Therefore, they are unable to explore multiple stable states of the system. On the contrary, the time (and space) evolution of the system in a stochastic model is not fully characterized by initial (and boundary) conditions (Eungdamrong and Iyengar, 2004a). For a given set of conditions, multiple runs of a stochastic model generate different dynamics/observations. Samoilov et al. (2005) demonstrated how extrinsic noise may induce bistable oscillatory behavior in enzymatic futile cycles, a common structure in metabolic and signaling networks. Their analysis demonstrated how the stochastic model captured the bistable behavior, while the deterministic model only predicted a monostable behavior for any set of parameters. Spatial systems must be aware of stochastic variations. This is mainly because the numbers of molecules become smaller, as smaller localized subvolumes are considered. Bhalla (2004) investigated the interplay between stochasticity and diffusion in small subcellular volumes. His results showed how the dynamics produced by deterministic and stochastic models differ as subvolumes ´ become smaller. Dobrzynski et al. (2007) discussed the importance of discrete-spatial-stochastic methods in diffusion-limited biochemical systems and compared the performance of four spatial stochastic methods. The basic concept of stochastic methods is to take a discrete view of the system, where discrete molecule counts are considered, at rather than continuous concentrations. The state of the system time t is represented by a vector, X(t) = X1 (t), X2 (t), . . ., XN (t) , of integer-valued molecule counts of the N species existing in the system. Such a system has a ﬁnite set of states corresponding to all possible combinations of the molecule counts, i.e., X1 , X2 , . . ., XN . Transitions between the system states are caused by chemical reactions that change the species populations. At an arbitrary time t, the system may exist in any possible state x with a certain probability P( x, t), subject to the initial condition X(to ) = xo .The chemical master equation (CME) completely determines the time evolution of the conditional probability distribution P( x, t| xo , to ), for the case of a well-stirred homogeneous volume at thermal equilibrium as follows (Gillespie, 1992):

478

479

480

dP(x, t|x0 , t0 ) dt =

M

[aj (x − v j )P(x − v j , t|x0 , t0 ) − aj (x)P(x, t|x0 , t0 )].

(5)

j=1 481 482 483

where M is the number of reaction channels, aj ( x) is the propensity function of reaction Rj in state x (or the probability per unit time, of one Rj reaction occurring, given that the system is in state x),

5

and vj is the stoichiometric vector for reaction j, which describes the change in x as a result of the occurrence of one Rj reaction. Strictly speaking, the CME is a set of coupled ODEs, with one equation for every possible state of the system. Therefore, the CME suffers from the well-known state space explosion problem (Gillespie, 2007). A more tractable approach is to simulate trajectories of X(t) versus t. This is typically carried out by means of the widely used stochastic simulation algorithm (SSA), which is more commonly known as the Gillespie algorithm (Gillespie, 1977). The SSA is exact with respect to the CME in the sense that the probability of generating a given system state using the SSA is exactly the same as that resulting from solving the CME. However, the key quantity in the SSA is a new probability function P(, j| x, t), rather than the function P( x, t| xo , to ) used in the CME. This new function is the joint probability density function of the two random variables, the time to the next reaction () and the index of the next reaction (j), given that the system is currently in state x (Gillespie, 2007). The formula derived by Gillespie to compute P(, j| x, t) is (Gillespie, 1977): P(, j|x, t) = aj (x) exp(−aj (x)),

(6)

where a0 (x)

484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503

M

aj (x).

(7)

504

j=1

In each iteration of the SSA, a random pair (, j) is generated according to the joint probability distribution in (6) using standard Monte Carlo techniques. Using the generated and j, the system state X(t) and the propensities of all reactions are updated to reﬂect the occurrence of one Rj reaction, and the simulation time is advanced by . The simulation then progresses to the next iteration, unless the end of the simulation time is reached. The time required to complete the simulation using the SSA is proportional to the number of reactions. Therefore, the simulation becomes very slow for complex systems. Gibson and Bruck (2000) introduced the Next Reaction Method (NRM), which is also an exact algorithm, but is more efﬁcient than the original SSA. The improved efﬁciency is obtained by avoiding some calculations that are unnecessarily repeated in every iteration of the SSA, and by reusing the generated random numbers when appropriate. The run time of the NRM is proportional to the logarithm of the number of reactions, rather than the number of reactions. However, this improvement comes at the expense of maintaining additional data structures. The SSA, however, is not adequately efﬁcient at simulating complex biochemical systems, even with the NRM improvement, essentially because it simulates every individual reaction event. To deal with this issue, Gillespie (2001) introduced the -leap method. It is an approximate method that produces signiﬁcant gains in simulation speed with acceptable losses in accuracy. The basic concept of the tau-leap method is to advance the simulation time by a preselected period , such that multiple reaction events may take place during a single simulation step. The only assumption used in this method is that during the period , no propensity function changes signiﬁcantly (leap condition). In other words, during the chosen period , propensity functions remain essentially constant, and, hence, aj ( x) will give the probability that channel Rj will ﬁre at any point in time during the interval [t, t + ), independent of the other reaction channels. Therefore, the value of should be made small enough to satisfy the leap condition, but large enough to achieve considerable gains in performance. Similar to ODEs, the original SSA or the -leap method are inefﬁcient at simulating stiff systems (Cao et al., 2004) and (Cao et al., 2005). Tools that implement the different Gillespie variants include Stochkit (Li et al., 2008), COPASI (Hoops et al., 2006), and Virtual Cell (http://www.vcell.org/). Simulations using exact or approximate Gillespie methods become computationally prohibitive as the number of molecules

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545

G Model BIO 3544 1–18 6 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

considered discretely increases. In this case, it is imperative to use more approximate methods, in order to speed up the simulation. One such method is the chemical Langevin equation (CLE) (Gillespie, 2000). The CLE adds further approximations to the -leap method by allowing molecule counts, which represent the system state X(t), to assume real values. In the limit of inﬁnitely large molecular populations, the CLE reduces to the deterministic RRE (Section 3.1). However, note that the CLE is derived through approximations to stochastic kinetics, and not from the RRE (Gillespie, 2007). The CLE is an example of a stochastic differential equation (SDE) that adds a term representing Gaussian white noise to the RRE. In general, SDEs are often used to simulate hybrid systems, where the abundant species may be treated deterministically, while other low-quantity species require discrete treatment (Higham, 2001). Several tools exist for solving SDEs, such as Cellware (Dhar et al., 2004), E-CELL (Tomita et al., 1999), and Dizzy (Cacchiani, 2007). Another computational complexity associated with Gillespiebased methods arises in systems where large complexes of proteins with multiple binding sites exist. A multisite protein complex with 20 modiﬁcation sites, each of which may be modiﬁed independently, may exist in 220 different states. Each of these states reacts as a separate complex. Since Gillespie-based methods focus on individual reaction channels, a detailed model may contain a number of reactions that is a multiple of 220 (note that this inconvenience is also associated with ODEs). This situation was addressed in the StochSim algorithm, where each molecule is represented as an individual software object (Morton-Firth, 1998). The StochSim algorithm uses a ﬁxed time step that is set to accommodate the most rapid reaction in the system. In every time step, two molecules are selected at random to undergo a chemical reaction. Chemical reactions are chosen according to probabilities derived from rate constants and other physical parameters. If a molecule may exist in more than one state, a multistate molecule is used to represent it with a series of binary ﬂags representing the different binding sites. At a given time step, if a multistate molecule is selected for a reaction, the reaction probabilities become also dependent on the state of the molecule. The simulation then proceeds normally, except that the ﬂags of the multistate molecules are updated according to the executed reaction. In this manner, the StochSim algorithm shifts its focus from individual reactions to individual reactive species. Accordingly, the StochSim algorithm is considered more efﬁcient than the Gillespie algorithm in systems where proteins exist in multiple states. On the other hand, if the number of reactions is small and the number of molecules is large, StochSim may be less efﬁcient than the Gillespie algorithm (Firth et al., 2003; Slepchenko et al., 2002). All the stochastic methods discussed so far operate in the well-mixed regime. As emphasized earlier, stochasticity is even more signiﬁcant in spatial systems than in non-spatial ones. Most of the previously described stochastic methods have their corresponding spatial methods. The spatial counterpart of the CME is the reaction-diffusion master equation (RDME). Simulations with the RDME mainly rely on dividing the system volume into small regular subvolumes, where the subvolume size is chosen small enough to guarantee that reacting species are uniformly distributed within that subvolume. The earlier spatially-homogeneous stochastic methods, such as the SSA and its derivatives, are then applied to each homogeneous subvolume locally. In RDME-based methods, the location of each molecule is not marked beyond the occupancy of a speciﬁc subvolume. The ﬁrst adaptation of the Gillespie algorithm to spatially inhomogeneous systems was done by Stundzia and Lumsden (1996) in 1996. In their algorithm, they treated diffusion events for single particles as monomolecular reaction events. They derived rates for diffusion between ﬁnite subvolumes, which are formally

analogous to reaction rates. By treating diffusion and reaction processes similarly, the SSA is applied to each subvolume without much modiﬁcation. Ander et al. (2004) extended Stundzia and Lumsden’s approach by incorporating Gibson and Bruck’s NRM. Elf and Ehrenberg (2004) presented the Next Subvolume Method (NSM), an adaptation of Gibson and Bruck’s NRM to spatial systems. The NSM is implemented in the stochastic simulation tool MesoRD (Hattne et al., 2005). A slightly different algorithm was implemented in the Gillespie-Multi-Particle (GMP) method by Rodríguez et al. (2006). In the GMP algorithm, a ﬁxed diffusion time step is used for every diffusion species. This time step is equal to the average time between diffusion events in the RDME. In the intervals between the diffusion steps, reaction events are executed in each subvolume by using the original SSA. Thus, the GMP method splits the execution of reaction and diffusion processes. Rod´rı guez et al. claim that the GMP method reduces the computational costs significantly relative to the former non-split methods, mainly by avoiding the extra calculations of individual diffusion events and by diffusing molecules in bulk (in systems with higher densities). More complex implementations use unstructured meshes, as opposed to structured Cartesian grids. These include STEPS (Hepburn et al., 2012) and URDME (Drawert et al., 2012). All the spatial Gillespie implementations mentioned so far are based on the RDME, and are population-based. More detailed models use an individual-based approach, where each molecule is represented as a unique object. The spatial extension to the StochSim algorithm by Shimizu (2002) represents the spatial organization of individual molecules by using a simple two-dimensional lattice. Each lattice site may be occupied by exactly one molecule (microscopic lattice). His implementation allows particles to interact spatially within the two-dimensional lattice, but does not support diffusion of particles between the lattice sites. The purpose of his implementation was to study the changes in signaling activity propagated by nearest neighbor interactions. More accurate stochastic particle-based methods allow diffusive movements as well as reaction events of individual molecules to be tracked over space and time (Tolle and Le Novère, 2006). In such models, space is allowed to take on continuous values. That is, each molecule may occupy any arbitrary point in space, rather than occupying subvolumes or discrete lattice sites. This class of particle-based methods is derived from approximations to a more fundamental approach called molecular dynamics (MD). Simulations of MD use Newton’s laws to deterministically solve a set of classical motion equations for every atom in the system. Even though MD simulations are extremely accurate, the required computational time scales linearly with the number of atoms. This restricts the applicability of MD to models consisting of only a few molecules and simulations of only tens of nanoseconds (Frenkel and Smit, 2002). These limitations led to the development of the approximate stochastic Brownian dynamics (BD) approach (Edelstein and Agmon, 1993). This approach is based on the Smoluchowski model (Smoluchowski, 1906), where the solvent particles are ignored, while only the reactive solute molecules are explicitly modeled. The solute molecules, represented as spheres, diffuse with ideal Brownian motion (Gillespie, 1996) due to collisions with the much smaller and implicitly-represented solvent atoms. A reaction occurs between two solute molecules, if they come within an effective binding radius (smaller than the sum of their molecular radii). If the reaction is reversible and the molecules dissociate, they are separated after the dissociation reaction with an initial unbinding radius that is larger than the binding radius. Afterwards, the molecules may move away from each other or they may move toward each other and rebind. The BD approach uses a numerical solution to approximate the analytical solution of the Smoluchowski diffusion equation (Mogilner et al., 2002). In

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743

BD simulations, at each time step, each molecule’s position is updated using a random displacement obtained from a Gaussian distribution. This is justiﬁed, because if a small time step is used, particle-particle correlations are insigniﬁcant and each molecule moves independently with free diffusion. However, if longer time steps are used and molecules are allowed to travel larger distances per time step, their behavior is affected by the possibility of reversible traps (rebindings) (Edelstein and Agmon, 1993). Therefore, BD simulations pose very high computational costs, due to the very short time steps required to resolve reactive collisions. Nonetheless, the BD approach has been used to analyze different aspects of diffusion-inﬂuenced reactions (Northrup and Erickson, 1992; Kim et al., 1999). More recently, van Zon and ten Wolde van Zon and ten Wolde (2005a,b) developed a new method called Green’s function reaction dynamics (GFRD) which allows for time steps larger than those imposed by classical BD. At each iteration, the GFRD algorithm ﬁnds the maximum time step tmax during which only single particles or pairs of particles may react (corresponding to mono- or bimolecular reactions, respectively). Then, the algorithm uses the exact solution of the Smoluchowski equation to ﬁnd the type and time of the next reaction, provided that it occurs within tmax . The simulation is then advanced to the time of the chosen reaction, the reaction is executed, new positions are assigned to all molecules and a new iteration begins, and so on. The basic concept of GFRD is to decompose the many-body problem that constitutes the spatial system into a series of smaller one- or two-body problems. Each smaller problem is then handled by analytically solving the Smoluchowski equation for a single molecule or a pair of molecules using Green’s function. Therefore, the GFRD method allows large leaps in simulation time, if molecules are widely separated in space. On the other hand, if the biochemical system is highly populated, the frequency of collisions is high and the time steps selected by the event-driven GFRD algorithm become fairly close to those of classical BD simulations. The latest release of the GFRD code is available online (http://www.gfrd.org/). The Smoldyn (for Smoluchowski dynamics) program contains an implementation which is suitable for crowded environments (Andrews and Bray, 2004). It provides an approximate numerical realization of the Smoluchowski diffusion model. In Smoldyn, molecules are represented as point (i.e., dimensionless) particles with associated binding and unbinding radii, and the simulation uses ﬁxed time steps. Inputs to the simulation are experimental reaction rates, diffusion coefﬁcients and the simulation time step. These inputs are used to calculate the simulation parameters, including the binding and unbinding radii. After this initialization step, the algorithm loops through a series of alternating diffusion and reaction processes till the end of the simulation time. In other words, the molecules are diffused randomly, then all possible reactions (unimolecular or bimolecular) are executed, then the molecules are diffused again, and so on. The key concept of Smoldyn is to calculate the values of binding and unbinding radii that allow the simulation to reproduce a known macroscopic reaction rate. Although it is not as accurate as classical BD or GFRD, Smoldyn is more efﬁcient at simulating systems with large numbers of molecules. As with classical BD simulations, small time steps are required in order to make sure no reactive collisions are missed. As the time step tends to zero, simulation with the Smoldyn algorithm approaches the Smoluchowski model. Examples of models created using the Smoldyn program include the chemotaxis pathway (Lipkow et al., 2005) and (Lipkow, 2006) and a generic kinase–phosphatase system (Lipkow and Odde, 2008). Detailed information on Smoldyn can be found on its website (http://www.smoldyn.org). Moreover, the Virtual Cell tool uses the Smoldyn algorithm to simulate spatial stochastic systems. Finally, MCell is another biochemical simulation

7

Fig. 1. A Petri net consisting of one transition with three input places and two output places, shown before (top), and after (bottom) the ﬁring of the transition.

tool based on the BD approach (Stiles and Bartol, 2001) (http:// www.mcell.org/). To summarize, stochastic models are substantially complicated and simulations are run several times to gather enough statistics (Kwiatkowska et al., 2006). Hence, stochastic methods are more computationally demanding compared to deterministic differential equation methods. On the other hand, stochastic models closely resemble real systems, and thus they provide greater levels of detail that are sometimes indispensable. 3.4. Petri nets (PNs) Petri nets (PNs) are graphical mathematical representations of concurrent, asynchronous, distributed systems. PNs have been applied to many areas such as communication protocols, industrial systems, and distributed software systems (Petri, 1967). PNs have a solid mathematical foundation and a range of automated analysis tools. The basic PN is a directed, weighted, bipartite graph. A sample PN is illustrated in Fig. 1. A PN consists of two types of nodes, places and transitions, where the arcs of the graph connect nodes of different types only. Places are the passive elements of the net and they are represented as circles. Transitions are the active elements and they are represented as boxes. Arcs are labeled with positive integer weights. A place holds a nonnegative integer number of tokens. The state, or marking, of the system is deﬁned by the number of tokens at each place. A transition has a number of input places on which its execution, or ﬁring, is conditioned. A transition is also connected to a number of output places whose states change as a result of its ﬁring. A transition is enabled, if each of its input places has the required number of tokens (as deﬁned by the weight on the arc connecting this input place to the transition). When a transition is ﬁred, the required tokens are consumed from its input places and new tokens are generated at each output place. The weight on the arc connecting the transition to each output place determines the number of tokens added to it (Murata, 1989). An incidence matrix

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

744 745 746 747 748 749 750 751 752

753

754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777

G Model BIO 3544 1–18 8 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823

824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

can be deﬁned for a PN. For every transition, the matrix lists the change of the marking of each place, as a result of the ﬁring of this transition. Consequently, transitions change the state of the system by moving tokens from input to output places. This original PN deals only with integer numbers of tokens, and therefore, it is often termed a discrete PN. Reddy et al. (1993) were the ﬁrst to use PNs for modeling biochemical networks. They used PNs to qualitatively analyze the structure of metabolic pathways. Since then, several attempts have been made to model biochemical networks with PNs. In most of these models, species are associated with places, reactions are associated with transitions, and the number of tokens indicates the quantity of substance (Hardy and Robillard, 2004). The fact that biological systems exhibit evident concurrent behavior makes the idea of using PNs quite appealing. PNs also offer very intuitive graphical representations of biochemical networks with animation of token movement, representing substance, or information, ﬂow (Koch and Schreiber, 2010). The original PNs do not contain an explicit deﬁnition of time. Instead, time is introduced as a strictly local relation between states (Petri, 1966). As a result, the use of such discrete PNs for the dynamic analysis of pathways is not possible and they may only be used for qualitative analysis of biochemical networks. Akin to all qualitative methods, discrete PN models have the advantage that they do not require any knowledge of kinetic data. Heiner et al. (2004) modeled the Fas-induced apoptosis pathway, and illustrated how certain PN structures (transition- or T-invariants) can be mapped onto meaningful biological modules. They used the Integrated Net Analyzer (INA) tool for their analysis (http://www2.informatik.hu-berlin.de/ starke/ ina.html). Sackmann et al. (2006) used qualitative PNs to model the mating pheromone response pathway. They formulated concepts of new PN structures (feasible T-invariants and maximal common transition sets). They used the graphical PN editor and animator tool Snoopy for developing the PN (http://www-dssz. informatik.tu-cottbus.de/DSSZ/Software/Snoopy), and INA for analyzing it. Other STPs that have been modeled using qualitative PNs include the thrombopoietin STP (Li et al., 2006), Ca2+ /calmodulindependent protein kinase II (CaMKII) network (Hardy and Robillard, 2008), and cephalostatin-induced apoptosis in leukemic cells (Rodriguez et al., 2011). Several attempts have been made to incorporate the concept of time in PNs in different ways. The PN theory has also been extended to increase the expressiveness of the PN formalism, to provide a more compact form, or to include stochastic or continuous features (Hardy and Robillard, 2004). Extensions to the original PN include: 1. Timed Petri nets (TPNs), incorporate time by assigning deterministic time delays to transitions and/or places. 2. Stochastic Petri nets (SPNs) are TPNs where, time delays follow a probabilistic distribution (usually the exponential distribution), instead of being deterministic (Kartson et al., 1994). The stochastic process arising from the SPN representation is equivalent to the CME (Goss and Peccoud, 1998). 3. Functional Petri nets (FPNs) specify arc weights as functions of the marking variables, rather than constant values. FPNs are also referred to as ‘self-modifying’ nets since they are able to modify their own ﬁring rules (via dynamic changes of the arc weights) (Valk, 1978). In the biological context, this allows the reaction rates to be affected by variations in the concentrations (Hardy and Robillard, 2004). 4. Colored Petri nets (CPNs) associate each token with an attached data value, called the token color. The token color may be of an arbitrarily complex type (e.g., a record, or data structure, composed of multiple ﬁelds, such as real numbers, text strings, arrays, etc.). Transitions have more elaborate ﬁring rules that

5.

6.

7.

8.

specify the constraints on the token colors in the input places and the token colors produced at the output places (Jensen, 1996). CPNs make it possible to represent, in the same model, different dynamic behaviors modeled by different token colors. This results in more concise representations and simpler models. Continuous Petri nets allow place markings to take on real values instead of integers. If the continuous PN is also timed, then the ﬁring of a transition is a continuous process governed by a continuous speed variable, instead of a delay. This speed speciﬁes the ﬂow rate of tokens from input places to output places (David and Alla, 2010). The incidence matrix of a continuous PN corresponds to a system of ODEs, and therefore is subject to standard ODE analysis. Hybrid Petri nets (HPNs) comprise the traditional discrete places and transitions, as well as other continuous places and transitions (as deﬁned in the continuous PN) (Hassane and David, 1998). In an HPN, the discrete part may inﬂuence the behavior of the continuous part and vice versa. This PN extension is suitable for modeling hybrid systems. Hybrid functional Petri nets (HFPNs) have been speciﬁcally developed, by Matsuno et al. (2003) to model biological pathways. This PN extension is formulated by extending the notions of HPNs, FPNs, and hybrid object net. They also developed the biological pathway simulation software, Genomic Object Net (GON), based on the HFPN architecture (Nagasaki et al., 2003). GON was later released under the name Cell Illustrator, or CI (http://www.cellillustrator.com/). Hybrid functional Petri nets with extension (HFPNe) were developed, by Nagasaki et al. (2004), to account for some deﬁciencies that they reported in the HFPN architecture. Their contribution involved places that can handle other primitive data types (e.g., Boolean and string), as well as complex data types (e.g., objects).

All of the PN extensions preserve the underlying graph structure of the original PN, and thus, allow the convenient switching between the different modeling paradigms. The different PN extensions have been used to model different STPs. Chen et al. (2007) simulated the apoptosis pathway using TPNs, and they proposed rules for systematically calculating the deterministic delays of the transitions. Napione et al. (2009) used SPNs to model a selected portion of signal transduction events involved in the angiogenesis process. Lee et al. (2006) developed a CPN model of the epidermal growth factor (EGF) signaling system. Although CPNs are usually used for qualitative analysis (as they maintain the applicability of discrete PN analysis), they developed a timed CPN model to quantitatively analyze the EGF pathway. They used the Design/CPN and CPN Tools for the implementation of their model (http://www.cpntools.org). Gilbert and Heiner (2006) qualitatively modeled the inﬂuence of Raf Kinase Inhibitor Protein (RKIP) on the ERK STP. They showed how the time-less discrete PN model can be transformed into a timed continuous PN model and then into a system of ODEs. Tasaki et al. (2006) developed an HFPNe model of the epidermal growth factor receptor (EGFR) pathway. Other STPs have been modeled using the HFPN/HFPNe architecture, such as p53-related pathways (Doi et al., 2006) and (Nagasaki et al., 2005), Akt and MAPK pathways (Koh et al., 2006), the NF∗B signaling pathway (Peng et al., 2010), and the interleukin-6 signaling pathway (Troncale et al., 2006). PNs also allow for hierarchical structuring (e.g., hierarchical CPNs (Huber et al., 1991) and hierarchical TPNs (Xu, 2005)). Furthermore, PNs are amenable to model checking techniques. Tools, such as INA and the PEP (Programming Environment based on Petri Nets) tool (http://parsys.informatik.uni-oldenburg.de/ pep/), incorporate model checking features. A database of existing PN tools is available online (http://www.informatik.uni-hamburg.de/ TGI/PetriNets/tools/db.html).

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874

875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

913

Finally, PNs do not encompass a standard representation of spatial properties. Nonetheless, it is possible to assign more than one place to the same species residing in different compartments, with transitions representing the displacement between compartments. More conveniently, CPNs may be used, where colors are used to express such spatial information (Chaouiya, 2007).

914

3.5. Process calculi

908 909 910 911 912

915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971

Process algebras/calculi are a family of formal textual languages for representing systems of concurrent communicating processes. Therefore, like PNs, they are well-suited to representing biochemical systems. The most common process calculus in the domain of biological systems is the stochastic pi-calculus (Priami, 1995); a stochastic extension of the original pi-calculus (Milner et al., 1992a) and (Milner et al., 1992b). Regev and Shapiro (2004) provided a translation scheme for representing biochemical reactions in stochastic pi-calculus. They introduced the moleculeas-computation abstraction, where molecules are treated as computational processes and molecular interactions are achieved via communication (exchange of messages) between the relevant molecular processes. They also developed BioSpi, a tool for simulating pi-calculus models of biochemical pathways (http://www. wisdom.weizmann.ac.il/ biospi/index main.html). They used BioSpi to simulate a model of glycogen metabolism. In earlier work, Regev et al. (2000) explained how the original pi-calculus is suited to model to gene expression regulation, metabolic pathways, and STPs, all within a single theoretical framework, and used it to model the RTK-MAPK pathway (RTK stands for receptor tyrosine kinase). Phillips and Cardelli (2004) developed another simulation tool based on the stochastic pi-calculus, namely, the Stochastic Pi-Machine (SPiM). SPiM features a graphical interface for modeling a range of biological systems. Performance Evaluation Process Algebra (PEPA) is another common stochastic process calculus Gilmore and Hillston (1994). It was used by Calder et al. (2006) to model the inﬂuence of RKIP on the ERK STP. Furthermore, they proposed a systematic procedure for generating a system of ODEs from a PEPA model of an STP (Calder et al., 2005). In general, simulations with process calculi rely on numerical analysis of ODEs or execution of stochastic simulation algorithms, such as Gillespie’s. However, process calculi have the advantage of detaching the modeler from these implementation details, which are quite cumbersome in many cases. Process calculi are also compositional, in contrast to traditional ODE-based and stochastic models. In other words, using process calculi, the behavior of a complex system is expressed in terms of the behaviors of its components. This makes it easy to focus on certain aspects of the system and offers more compact and manageable representations (Priami and Quaglia, 2004). More importantly, process calculi allow formal veriﬁcation via model checking techniques. This involves verifying the model against temporal logic speciﬁcations. Queries, such as “what is the probability that a protein degrades?” or “what is the expected number of bindings that occur before degradation?” may be answered, and best/worst case scenarios may be deduced. This is achieved through the construction of the full model and the systematic exploration of all states and paths, without executing them one by one. It is important to note that this is different from generating certain trajectories of the system via simulation. More speciﬁcally, it produces exact results, as opposed to the estimates obtained by collecting statistics from multiple runs of a simulation. However, model checking is only applicable to small-sized models due to state space limitations (Kwiatkowska et al., 2006). Tools that support model checking features include the PEPA Eclipse plug-in (www.dcs.ed.ac.uk/pepa/tools/plugin/index.html) and the PRISM model checker (http://www.prismmodelchecker.org/).

9

Fig. 2. A Boolean network consisting of three species A, B and C: (a) Schematic network diagram shows that C is activated by A and inhibited by B. (b) A network state where both A and C are inactive, while B is active. (c) Boolean function for variable C. (d) All possible states and attractors of the network.

All process calculi mentioned here have no way of incorporating spatial information. However, subsequent variants were devised to add spatial details, among other additional features. These variants represent space either in a compartmental fashion, or more elaborately, as a system of explicit coordinates in a two- or threedimensional space. A review of these variants is given in (Bittig and Uhrmacher, 2010). 3.6. Boolean networks A Boolean network consists of N Boolean variables, each of which assumes a binary value (“0” or “1”). Each variable has an associated Boolean logic rule/function that determines its value, based on the values of other variables in the network. The collective state of the system is represented by an N-tuple of 0’s and 1’s, describing which components are inactive or active, respectively. Thereﬁnclude this paper in boolean section in survey paper @article{chowdhury2013structuralore, a Boolean network is a discrete state system with a ﬁnite set of 2N possible states, corresponding to the possible combinations of the system variables. A transition from one state to another occurs via the execution of the Boolean rules that change the values of the individual variables. Some states, once reached, may never be left again. These states correspond to steady states of the system and are termed attractors (Kauffman, 1993). In biochemical modeling, a Boolean variable is used to represent a molecular species, and the associated Boolean rule formulates the activation state of this species as a function of the states of other signaling components. A sample Boolean network is depicted in Fig. 2. Boolean networks are mainly used for structural analysis, i.e., to help reconstruct a biochemical network. This is typically done by inferring rules that describe the interactions among the variables from experimental data. Additionally, Boolean networks allow for dynamic analysis i.e. to simulate the dynamics of a biochemical pathway over time. In order to account for time, a discrete time variable is introduced in the Boolean model (Klamt et al., 2006). However, unlike other dynamic analysis formalisms, such as ODEs, Boolean networks provide only abstract representations of signaling dynamics. More speciﬁcally, since a Boolean network considers only two discrete (binary) activity levels for every species, they merely provide qualitative descriptions with intermediate values being left out. Furthermore, molecular interactions are abstracted in terms of simple functions of activation and inhibition, and are assigned explicit time delays. In this sense, Boolean networks only provide an approximation to the deterministic behavior of ODE systems and their dynamic analysis mostly

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

972 973 974 975 976 977 978

979

980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014

G Model BIO 3544 1–18 10 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058

1059

1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

resembles the steady state analysis of ODE models (Fisher and Piterman, 2010). However, they are generally easier to construct, particularly because they require very few parameters. Moreover, Boolean models proved to reveal interesting behaviors of STPs, and to facilitate the design of new experiments. Saez-Rodriguez et al. (2007) built a large-scale Boolean model of T cell receptor (TCR) signaling, on which they conducted both steady-state and structural analysis. Their model produced unexpected ﬁndings that were veriﬁed by performing guided in vitro experiments. Kaufman et al. (1999) built a much simpler model of the TCR pathway, where they formulated Boolean rules that intuitively describe the pathway. They tested their model against different hypotheses, and showed that it accounts for a large body of experimental data. Chowdhury et al. (2013) collated information from different databases, published scientiﬁc literature, and experimental studies in order to reconstruct a comprehensive map of Hedgehog pathway. They used Boolean variables to represent proteins and cellular responses (cell proliferation, cell cycle progression,...etc), and they used CellNetAnalyzer for their simulation (http://www2. mpi-magdeburg.mpg.de/projects/cna/cna.html). Other signaling systems that have been modeled using Boolean networks include EGFR/ErbB signaling (Samaga et al., 2009), and apoptosis signaling (Schlatter et al., 2009). Boolean networks are inherently deterministic, and hence they do not account for the inherent stochasticity in biochemical systems, or the uncertainty (random errors) associated with experimental measurements. Therefore, a single deterministic function for every variable may result in poor accuracy, especially in the case of small data sets Price and Shmulevich (2007). In an attempt to address this issue, probabilistic Boolean networks (PBNs) were introduced (Shmulevich et al., 2002). In PBNs, each variable is associated with a set of Boolean rules, rather than a single rule. Each rule is then chosen with a certain probability, which is estimated from experimental data. In an alternative approach, Helikar et al. (2008) introduced stochasticity into a large-scale integrated Boolean model of RTK, G-protein coupled receptor, and Integrin STPs. They investigated the process of decision-making in signaling networks. They exposed the model to tens of thousands of random combinations of inputs and analyzed the combined dynamics of multiple outputs. Their results indicated that signaling networks are capable of maintaining robust clustering of varying input signals, and mapping them into relevant classes of cellular responses. Comprehensive reviews of Boolean modeling of signaling networks can be found in Klamt et al. (2006) and Helikar et al. (2011). 3.7. Bayesian networks A Bayesian network is a probabilistic directed acyclic graph, where nodes represent variables and edges represent causal dependencies among the variables. A sample Bayesian network is shown in Fig. 3. The dependencies are quantiﬁed by the conditional probabilities of each node, given its parents in the network (Pearl, 1988). Analysis algorithms of Bayesian networks aim to infer the conditional dependence relations between the network variables. Thus, Bayesian networks belong to the class of structural network models. Bayesian networks are used extensively for inferring structures of regulatory networks from gene expression data. However, they are not as common in the signal transduction domain. In either context, nodes are used to represent species and links are used to represent direct/indirect inﬂuences among the species. The ﬁrst application of the Bayesian approach to a signaling network was presented by Sachs et al. (2002), where they used Bayesian networks in the process of model selection. For illustrative purposes, they used a very simple MAPK cascade. They proposed four candidate connectivity structures and showed how they scored against experimental data extracted from the literature.

Fig. 3. A Bayesian network consisting of ﬁve random variables (A, B, C, D, E), their conditional probability distributions, and the joint probability distribution.

In later work, they generated multivariate ﬂow cytometry data of 11 phosphorylated proteins and phospholipids, and analyzed these data sets using a Bayesian network structure inference algorithm (Sachs et al., 2005). They extracted directed connectivity links among the signaling components and graded them according to the degree of conﬁdence. Pe’er (2005) presented a complete procedure for inferring the structure of a signaling network from a large data set of phosphorylated protein activity levels. In general, variables of a Bayesian network may be discrete, in which case the conditional probability distributions are described using tables of discrete values. Needham et al. (2006) illustrated the process of inference in a discrete Bayesian network using a generic example of a simple G protein-coupled receptor pathway. However, the variables may also be continuous, in which case they are modeled by probability density functions. 3.8. Cellular automata (CA) A cellular automaton (CA) is a discrete model that consists of an n-dimensional lattice of uniform sites or cells, as illustrated in Fig. 4. A neighboring relation gives, for each site, a ﬁnite list of sites which represents its neighborhood. Each site may be occupied by a number of automata or objects that interact with objects in neighboring sites at discrete time steps according to a set of rules. Each site, then, has a ﬁnite number of states through which it may evolve. More speciﬁcally, at each time step, the interaction rules determine the new state of each site as a function of its state, and the states of its neighboring sites at the previous time step. The simplest case of a CA allows each site to take on only two possible states (occupied or unoccupied) and uses a set of simple Boolean logic rules (Burks, 1970).

Fig. 4. The discrete lattice used in cellular automata. The new state of a site is determined by its state, and the states of its neighboring sites at the previous time step. The neighborhood is typically chosen as (a) the sites to the left, right, top and bottom of the site (von Neumann neighborhood) or (b) the eight sites surrounding the site (Moore neighborhood).

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093

1094

1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

1153

Cellular automata have been applied to several biological systems (Ermentrout and Edelstein-Keshet, 1993; Alber et al., 2002). Models of cellular systems mostly focused on intercellular interactions, where objects represented the individual cells. One of the ﬁrst CA-based models of cellular biology is the well-known Conway’s game of life (Gardner, 1970). It is a simple model that describes the birth and death of cells randomly distributed on a two-dimensional lattice. Each lattice site was allowed to contain only one cell and had only two possible states that indicated whether this cell was alive or dead. The local interactions of cells at adjacent sites were governed by very simple rules that unexpectedly produced complex global changes in the cellular population. Even though most CA-based models addressed multi-cellular systems, the CA approach is equally applicable to signaling interactions within the single cell. In this case, however, objects represent individual molecules. Examples of CA-based intracellular signal transduction models include general Michaelis–Menten enzyme reactions (Berry, 2002; Schnell and Turner, 2004). Grima and Schnell (2006) elaborated on this type of reaction. They systematically investigated how the rate laws describing intracellular reactions vary as a function of several factors, such as the geometry and size of the intracellular space. Other examples include the MAPK pathway (Kier et al., 2005) and the Fas ligand (FasL)-induced apoptosis pathway (Apte et al., 2010). In general, CA models may be individual- or population-based (Takahashi et al., 2005). In the individual-based case, the lattice is said to be microscopic, in the sense that each site is occupied by at most one molecule, and so tracking of the individual molecules is possible. In the population-based case, the lattice is said to be mesoscopic, i.e., multiple molecules may reside in the same lattice site simultaneously. Like other subvolume-based methods, mesoscopic CA models cannot track molecules within each site. Even though CA models naturally incorporate these notions of discrete space, they have no explicit treatment of motion. Objects (e.g. cells, molecules, small molecules, etc.) change properties, appear or disappear in different sites, but they do not move (Materi and Wishart, 2007). To address this shortcoming, Wishart et al. (2005) introduced an extension of the original CA, namely, dynamic cellular automata (DCA), along with its simulator implementation, SimCell (http://wishart.biology.ualberta.ca/SimCell/). They included random object moves at each time step, in an attempt to mimic realistic Brownian motion of molecules. They stated that in order to be consistent with experimental diffusion rates, the lattice size and time steps should be sufﬁciently small. A thorough discussion of the advantages and disadvantages of CA and DCA models is given in (Materi and Wishart, 2007).

1154

3.9. Agent-based modeling (ABM)

1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152

1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171

Agent-based models (ABMs) or multi-agent systems (MASs) are based on the notion of agents (Wooldridge, 2002). An agent is an encapsulated computer system that is situated in some environment. It continuously collects information from this environment, and reacts to changes that occur in it. An agent is capable of autonomous action on behalf of its owner, and should be able to ﬁgure out on its own what it needs to do to meet its design objectives. An ABM consists of a number of agents interacting with one another over time and space, typically by exchanging messages. A sample ABM is illustrated in Fig. 5. Generally, agents have very different goals, since the goals of their owners may be different. In order for each agent to successfully achieve its assigned objectives, it should be able to cooperate, coordinate, and negotiate with other agents, much like people deal with others in their daily life. Even though agents have full control over their own behavior and internal state, they do not control their environment. In other words, an agent may only inﬂuence its environment by executing

11

Fig. 5. An agent-based model consisting of a number of agents (gray circles). Based on the implementation, an agent may move in any of a ﬁnite (or inﬁnite) number of directions (solid arrows) and may interact with any other agent (dotted arrows).

actions, but it cannot control it. This means that if the agent performs the same action twice under identical circumstances (as viewed by the agent), the outcomes may be different, i.e., the agent may fail to achieve the desired result. In other words, the environment in which the agents interact is, in general, nondeterministic and agents should be prepared for the possibility of failure. Therefore, it is often argued that agents should be able to learn from past incidents, and dynamically adapt their behavior to achieve their goals. This is the main difference between agents and ordinary objects used in other individual-based formalisms. An object is continuously being told which action to take. Additionally, an object may instruct another object to take an action. On the other hand, an agent has its own thread of control, is capable of initiating actions on its own independent of other agents in the environment, and may even exhibit some degree of intelligence through learning (Wooldridge, 2002). ABMs are analogous to CA models in that local interactions among agents result in global system behaviors. However, there are several aspects that distinguish the CA approach from the ABM approach. First, while CA models may be individual-based, the emphasis on entity identities is stronger in ABMs. In other words, in CA models, the same rules apply to objects of the same type under any conditions. Agents, on the other hand, with their ability to keep account of their past experiences, allow rules to be dependent on their speciﬁc histories as well as their types (Chen et al., 2006). Second, CA models are characterized by having a spatial lattice (or grid), while ABMs do not essentially require a ﬁxed spatial structure. Finally, agents are generally mobile and are capable of travelling over space in a realistic manner, in contrast to CA objects that are generally unable to move, with the exception of the DCA approach, which allows object movement (Wishart et al., 2005). The ABM has been used to model a variety of systems, including biological systems. Similar to CA, ABMs are frequently used to study multi-cellular systems. However, ABMs have been used for a relatively longer time for modeling intracellular STPs. Schwab and Pienta (1997) investigated the use of a special class of ABMs to model STPs in both normal and cancer cells. (Fisher et al., 1999) described the use of ABMs to simulate the dynamics of signaling proteins, based on the behavior of individuals within a community, and to collectively integrate these behaviors into cellular processes.

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211

G Model BIO 3544 1–18 12 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236

1237

1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

Holland (2001) incorporated condition-action rules (of learning classiﬁer systems) in an ABM of a signaling network, to specify behaviors and message exchange between proteins. González et al. (2003) introduced an ABM of intracellular signaling that takes into account the cognitive (learning) capacities of agents, called Cellulat. They used a blackboard architecture to represent spatial organization. Different blackboard levels corresponded to the different cellular structures, and agents communicated through modiﬁcations of signals on the shared blackboard. They applied their model to the EGFR STP. Emonet et al. (2005) developed AgentCell, an ABM used to study bacterial chemotaxis. They used a wrapper around the StochSim software package (Morton-Firth, 1998), as part of their ABM implementation. Pogson et al. (2006) showed how an ABM of the NF-∗B STP provided predictions that were otherwise unattainable using traditional methods, such as ODEs. Finally, Amigoni and Schiaffonati (2007) discussed the feasibility of agent-based simulation in general biological systems, with speciﬁc reference to the simulation of the MAPK STP. The ABM approach is highly intuitive and may be able to reproduce various known behaviors. However, owing to the complex nature of interactions between agents, analysis of ABMs may not provide an understanding of the underlying mechanisms. Nonetheless, an ABM may be perturbed (through manipulations of the agent’s local repertoire of rules) to see how results change accordingly (Mogilner et al., 2006).

4. Software tools Numerous software tools are available for the analysis and simulation of biochemical pathways. These tools typically provide implementations of one or more of the different methods covered in Section 3. In this section, we discuss some of the widely-used biochemical simulation tools. In addition to providing simulation and analysis capabilities, many of these tools support model exchange (import and export) in standard formats, such as the Systems Biology Markup Language (SBML, http://www.sbml.org/) or the Cell Markup Language (CellML, http://www.cellml.org/); a feature that signiﬁcantly enhances model interchange and reuse among researchers. COPASI (http://www.copasi.org/) is an open source tool that supports both deterministic and stochastic simulations of spatiallyhomogeneous biochemical networks. It also provides a hybrid simulation method that dynamically partitions the system into deterministic and stochastic ‘subnets’ according to copy numbers of molecules. COPASI features a simple GUI that enables the user to enter tables of species, reactions, and compartments, and to initiate various analysis tasks. In most cases, COPASI is selected by modelers due to the wide range of analysis tasks (e.g. parameter estimation and optimization, sensitivity analysis, steady state analysis...etc.) that it supports. E-Cell (http://www.e-cell.org) is another open source platform that supports ODE- and CME-based simulation. In addition, E-Cell provides an RDME-based implementation on a microscopic lattice. By using dimensional particles and allowing each lattice site to be occupied by at most one molecule, E-Cell is able to account for volume exclusion and molecular crowding. The E-Cell platform allows the user to switch between the different simulation algorithms, or ‘steppers’, easily, and to use different algorithms for different parts of the system. It also supports hierarchical modeling and provides an elaborate GUI that allows the user to enter the biochemical network structure in graphical format. As mentioned in Section 3.2, the VCell framework is commonly-used for PDE-based simulation. In addition, VCell also supports non-spatial deterministic simulation (via ODEs), non-spatial stochastic simulation (by using the NRM), and spatial stochastic simulation (by using Smoldyn). VCell also makes use of

COPASI parameter optimization capabilities in non-spatial deterministic models. VCell features an advanced GUI that allows the user to enter the model in graphical format or by writing a math speciﬁcation in the VCell Markup Language (VCML) format. The user may also import SBML or CellML model ﬁles. Users specify spatial structures, or ‘Geometry’, for their models either analytically or based on digital images. Digital images can also be used to specify initial conditions and distributions for spatial simulations (e.g. by using ﬂuorescence imaging data). The VCell tool also allows users to import biochemical models directly from the BioModels database (http://www.ebi.ac.uk/biomodels-main/), the Pathway Commons database (http://www.pathwaycommons.org/), or the VCell database. The last keeps an up-to-date record of the biochemical models that were primarily created using the VCell tool. By using this database, users can share their models with other speciﬁc VCell users or they can make them public to the whole VCell community. This enables collaboration and model reuse, and offers a good repertoire of illustrative examples for new users. Tools supporting a speciﬁc method or class of methods are also available. For example, many tools have emerged to provide implementations of stochastic algorithms. StochKit2 (http://www. engineering.ucsb.edu/ cse/StochKit/index.html), an updated version of StochKit (http://www.engineering.ucsb.edu/ cse/StochKit/ StochKit whatis.html), is a stochastic non-spatial (homogeneous) simulation framework that provides efﬁcient implementations of several variants of the SSA and tau-leap methods. StochKit2 is open source and is speciﬁcally designed to be easily extended or incorporated in other software. This tool also supports SBML and allows for automatic parallelism on multicore architectures (Sanft et al., 2011). MesoRD (http://www.mesord.sourceforge.net), which provides an implementation of the NSM, is an open source tool that comes with detailed documentation and tutorials that help new users construct their own models easily. MesoRD has extensive support for SBML and allows the user to embed additional information required by the simulation in the form of SBML annotations. In order to deﬁne the spatial structure of the model, MesoRD uses constructive solid geometry (CSG), which speciﬁes complex geometries as compositions of simpler ones. The MesoRD implementation accounts for multi-subvolume reactions and trafﬁcking of molecules from one compartment to another. This tool also features a simple GUI that can be used to load SBML ﬁles, specify simulation parameters, and run a real-time visualizer (which renders an approximation of the system during simulation). The source code of the GMP algorithm ´ (Rodríguez et al., 2006) is also available online ((Dobrzynski et al., 2007), http://www.cwi.nl/projects/sic/bioinformatics2007/). This code can be built to create a command line application that loads plain text model ﬁles. Few simple geometries are hardcoded in the GMP implementation. Alternatively, the user can provide a plain text geometry ﬁle that contains, for each subvolume, the coordinates of this subvolume and the list of its neighbors. In order to run different examples, the user may have to modify certain constants in the source code and rebuild the application. Unlike MesoRD, the current GMP implementation provides no visualizations and does not support multi-subvolume reactions or molecular trafﬁcking. Finally, a parallel implementation of GMP that also supports SBML was recently proposed (Vigelius et al., 2011). STEPS (http://www.steps.sourceforge.net/) is an open source tool that implements a derivation of the SSA on tetrahedral meshes (which allows for more realistic geometric structures than the regular subvolumes used by former tools). This tool also supports (spatial/non-spatial) deterministic simulation. By using Python scripts, the user can enter model speciﬁcation, simulation parameters, and geometry, and can output, analyze, and plot simulation results. Alternatively, separate geometry preparation and visualization toolkits can be downloaded

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406

and used (Chen and De Schutter, 2014). STEPS also provides thorough and well-validated support for SBML. A similar tool is URDME (http://www.sourceforge.net/projects/urdme/) which provides an implementation of the NSM on tetrahedral meshes. This tool supports model speciﬁcation in SBML and can be easily extended by adding new solvers (other than the NSM solver). Although the URDME software is free and open source, it relies on commercial packages for providing model and geometry speciﬁcations. This pitfall was later avoided by introducing PyURDME (http://www.pyurdme.org), an updated Python-based version of URDME that relies exclusively on open source software. Likewise, implementations of particle-based stochastic algorithms, such as Smoldyn and GFRD are also available as open source software. The Smoldyn website (http://www.smoldyn.org) provides the source code of the Smoldyn program along with thorough documentation and numerous tutorial ﬁles. An up-to-date list of the publications that used Smoldyn simulations is also made available on this website. The Smoldyn implementation supports a variety of molecular interactions with surfaces and provides graphical visualizations of the system during simulation. However, Smoldyn does not support any model exchange formats. Instead it takes as input a plain text conﬁguration ﬁle that contains all model, simulation and visualization parameters. Similarly, the GFRD tool (http://www.gfrd.org/) provides no support for model exchange standards. A user has to customize available Python scripts (or write new ones) in order to simulate a model or run visualizations; a step which becomes cumbersome for large models. The latest version of GFRD, enhanced GFRD (eGFRD), is now provided as part of the E-Cell framework and can be accessed as one of the steppers. Most of the PN tools that have been used for biochemical simulation and analysis are general purpose PN simulators. INA (http://www2.informatik.hu-berlin.de/ starke/ina.html) is a free tool that performs simulation, analysis, and model checking for PNs. It supports discrete PNs, as well as CPNs and TPNs, and it covers a wide range of analysis techniques (including reachability, invariant, and structural analysis). The INA tool uses special commands in order to enter, edit, and analyze the nets. The PEP tool (http://parsys.informatik.uni-oldenburg.de/ pep/) is an open source and extensible platform that integrates a variety of tools (including INA) to offer analysis and model checking for PNs. This platform provides a GUI that enables the user to perform graphical editing of the nets and to access the underlying analysis tools. CPN tools (http://www.cpntools.org) is an open source tool for editing, simulating, and analyzing CPNs. It features a user-friendly GUI and supports untimed, timed, and hierarchical CPNs. CPN tools is still being maintained and augmented with new functionality to date. Cell Illustrator (CI), which is centered around the HFPNe framework, is available either as a limited trial version or a commercial one (http://www.cellillustrator.com/). This tool is equipped with a user-friendly and intuitive GUI (with different biological graphics used as PN places and transitions) and supports SBML (subject to some restrictions). It also supports model import and export in the Cell System Markup Language (CSML) format. The CSML website (http://www.csml.org/) provides access to converter tools that transform SBML and CellML ﬁles into the less-common CSML format in order to be easily imported into the CI tool. This website also provides access to a library of CSML models that are curated by the CI team and that can be easily imported into CI for simulation and visualization. Two Boolean-based tools, that were mainly designed–and that have been extensively used–for the analysis of biochemical networks, are CellNetAnalyzer and CellNetOptimizer. CellNetAnalyzer (CNA) is available as a MATLAB toolbox and has a free academic license (http://www2.mpi-magdeburg.mpg.de/ projects/cna/cna.html). The user may use a GUI to enter and analyze networks, or may alternatively use MATLAB commands.

13

The tool website keeps an up-to-date record of the models that were analyzed using CNA. The website also gives access to plugins that facilitate the import/export of models to/from CNA in different formats. CNA can also import biochemical network diagrams from online databases, or from graph drawing tools. These network diagrams can then be linked to Boolean network models (with the help of user-supplied information) in order to build ‘interactive network maps’. In addition to Boolean logic, CNA also supports multi-valued logic in which a species may have many arbitrary discrete states (Klamt et al., 2007). Moreover, CNA provides several Boolean analysis techniques and can simulate ODEs derived from the Boolean model (via the Odefy plugin, http://www.helmholtz-muenchen.de/cmb/odefy). Finally, CNA also supports model export in SBML qual format (which is used to describe qualitative biochemical models). CellNetOptimizer (CNO) is an open source tool that supports analysis and visualization of Boolean logic models of signaling networks (http://www.cellnopt.org/). This tool uses prior knowledge about the signaling pathway and trains it against experimental data (by minimizing some objective function) in order to create optimized network models. The CNO software is implemented in the R language (http://www.r-project.org/) and consists of a number of packages, some of which are also available as a MATLAB toolbox. In order to enter and analyze networks, users are required to write R scripts. Alternatively, they may use CytoCopteR, which provides an intuitive point and click interface (http://www.cellnopt.org/cytocopter/index.html). Similar to CNA, CNO can also derive and simulate ODE models. Simulation tools based on CA or ABMs are also available. Cell++ is an open source tool that combines the CA approach with the BD approach (http://sourceforge.net/projects/cellpp/). By doing this, Cell++ is able to simulate bulk properties of large quantities of small molecules, while simultaneously allowing larger molecules to be treated individually. Cell ++ has a simple GUI that can be used to run speciﬁc examples. However, in order to create and simulate more complex models, users have to modify the source code of the tool, as well as the corresponding input ﬁles. SimCell, the simulation framework of the DCA approach, is available as a free tool (http://wishart.biology.ualberta.ca/SimCell/). It features an easy-to-use GUI and supports SBML. It is also accompanied with several examples of biochemical models that provide useful illustrations for new users. SimCell allows users to simulate and visualize molecular behavior in different ways. They can generate screenshots or ‘movies’, or output their results in the form of graphs. AgentCell, mentioned in Section 3.9, is available as an open source software (http://www.agentcell.org). In addition to making use of StochSim, AgentCell relies on other tools and libraries to provide useful functionalities, such as agent-based capabilities or different output formats. AgentCell does not provide a user interface. In order to prepare a simulation, the user has to specify model and geometry parameters using Java code.

5. Summary Table 1 summarizes the methodologies/algorithms presented in Section 3. For each method, the table columns list its type (deterministic or stochastic), time granularity (continuous, discrete, or discrete-event), molecular scale (molar concentrations, discrete amounts, or individual molecules), whether/how spatial information is incorporated (no spatial representation, compartmental space, discrete subvolumes, or continuous coordinates), and the underlying assumptions (if any). The table indicates that deterministic methods, such as ODEs and PDEs, treat molecules as continuous concentrations, and therefore, are mostly used when there are large numbers of molecules.

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457

1458

1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469

Molecular scale

Space a

Assumptions

ODEs

Deterministic

Continuous

Continuous concentrations

Non-spatial

Compartmental modeling

Deterministic

Continuous

Continuous concentrations

Compartmental space

PDEs

Deterministic

Continuous

Continuous concentrations

Mesh space

CME

Stochastic

Continuous

Discrete numbers of molecules

Non-spatial

Gillespie (or NRM or tau-Ieap)

Stochastic

Discrete-event

Non-spatial

CLE

Stochastic

Discrete-event

RDME (Spatial Gillespie Methods)

Stochastic

Discrete-eventb

Discrete numbers of molecules Real-valued numbers of molecules Discrete numbers of molecules

Well-mixed space, large numbers of molecules, and deterministic reactions. Spatial information abstracted as surface area/volume, well-mixed compartments, slow transport rates between compartments, large numbers of molecules, and deterministic reactions. Explicit cell geometries, inhomogeneous compartments, fast transport rates between compartments, large numbers of molecules, and deterministic reactions. Well-mixed space, low numbers of molecules, probabilistic reactions, and binding probability that depends on probability of reaction. Like CME. Tau-leap uses a larger time step and assumes that no propensity function changes signiﬁcantly during . Like CME, but larger numbers of molecules.

Discrete space (Mesoscopic)

Classical BD

Stochastic

Continuous

Individual molecules

Particle space

GFRD Smoldyn StochSim

Stochastic Stochastic Stochastic

Discrete-event Discrete Discrete

Individual molecules Individual molecules Individual, multi-state molecules

Particle space Particle space Non-spatial or discrete space (Microscopic)

Original PNsc

Deterministic

N/A

Non-spatial

Process Calculi

Stochastic

Discrete

Boolean networks

Deterministic

Discrete

CA

Stochastic

Discrete

ABM

Stochastic

Discrete

Discrete numbers of molecules Discrete numbers of molecules or Individual molecules Represents only whether a molecular species is active or inactive Discrete numbers of molecules or Individual molecules Individual molecules

a b c

According to the deﬁnition of Takahashi et al. (2005). With exception of GMP which uses an event-driven scheme for reactions and a ﬁxed timing scheme for diffusion. The different PN extensions have different properties.

Non-spatial

Non-spatial

Discretized space (ﬁner subvolumes provide more accurate results), low numbers of molecules, probabilistic reactions, and binding probability that depends on probability of reaction and molecules’ positions. Continuous space (molecules move with ideal Brownian motion), very low numbers of molecules, probabilistic reactions, and binding probability that depends on probability of reaction and molecules’ positions. Like Classical BD Like Classical BD + dimensionless particles. Well-mixed or discretized space (but no explicit treatment of diffusion), low numbers of molecules and large numbers of reaction channels, reactions occur in a probabilistic manner, and binding probability that depends on probability of reaction (and molecules’ positions). Well-mixed space, low numbers of molecules, and deterministic reactions. Well-mixed space, low numbers of molecules, and probabilistic reactions.

Non-spatial

Well-mixed space, large numbers of molecules, and deterministic reactions.

Discrete space (Mcso-or Microscopic)

Discretized space (but no explicit treatment of diffusion), low numbers of molecules, and binding probability that depends on probability of reaction (and molecules’ positions). Low numbers of molecules, and binding probability that depends on probability of reaction (and molecules’ positions).

Implementation-based

ARTICLE IN PRESS

Time

G Model

Stochastic/Deterministic

N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

Methodology/Algorithm

BIO 3544 1–18

14

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

Table 1 Methodologies/Algoritms for the modeling and simulation of STPs.

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

1502

Stochastic methods, on the other hand, are used at low numbers of molecules, and they either consider integer molecule counts or distinguish individual molecules. The former are easier to solve, but only offer an approximation to system behavior (averages). Conversely, the latter are more computationally demanding, but are more accurate at capturing noise and stochastic variations. Non-spatial methods, such as ODEs, CME and Gillespie-based algorithms, assume a well-mixed volume and keep no account of positional information whatsoever. Although these methods do not account for spatial gradients and diffusion-limited reactions, they are simpler and require less parameters than their spatial counterparts. PDEs, on the other hand, provide spatial gradients of continuous species concentrations. Spatial lattice-based methods, such as RDME algorithms and CA, divide the reaction volume into discrete subvolumes/sites. Finer subvolumes provide more accurate results at the expense of higher computational cost. The most ﬁne-grained lattice-based method is the microscopic CA, where each lattice site is occupied by at most one molecule. Particle-based methods provide the greatest level of spatial detail. These methods allow each molecule to occupy any arbitrary point in space, rather than a speciﬁc subvolume or lattice site. In other words, spatial location is described by continuous coordinates in a two- or threedimensional space. However, this mandates higher computational resources. Therefore, models range from the simple deterministic, non-spatial models (e.g. Boolean networks and ODEs) to more elaborate models by incorporating space (e.g. PDEs), stochasticity (e.g. CME, Gillespie-based algorithms, and process calculi), or both (e.g. RDME-based algorithms, BD, GFRD, and Smoldyn). As more detail is incorporated, the modeling process becomes more challenging and requires more parameters. Therefore, modelers usually begin with simple models and proceed into building more sophisticated ones as dictated by the available resources and the system under investigation.

1503

6. Conclusions

1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501

1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533

This primer surveyed the most prominent methodologies used for the modeling and simulation of biochemical networks, with speciﬁc focus on literature of STPs. Since this class of biochemical networks is largely characterized by time-variant behaviors, it is mostly simulated using dynamic analysis methods. However, structural analysis methods may also be used to check the model for topological correctness. Our primer provided a comprehensive account of the most common methodologies, their representation details, and their advantages and disadvantages. However, it is not the purpose of this work to provide a complete description of the theory behind the different methods. Whenever appropriate, signiﬁcant literature/references on a speciﬁc method are pointed out. Moreover, this primer discusses the ease-of-use and modeling ﬂexibility of the most widely used biochemical tools. The different modeling and simulation methodologies have different assumptions and address the system at varying abstraction levels. For example, while most methods perform dynamic analysis of a biochemical system via simulation, discrete PNs and Boolean and Bayesian networks help reconstruct the topology of the biochemical network. Methods also differ in their type (deterministic or stochastic), and in their representation of time and space. Furthermore, some methods may be population-based (i.e. track changes in continuous concentrations or in discrete numbers of molecules), while others are individual-based (i.e. track individual molecules as they travel over time and space). None of the existing methods is suitable for all instances of biochemical pathways. However, the choice of method depends on the phenomenon being investigated, the available computational resources, and the possibility of obtaining the necessary data

15

(parameters). Moreover, the knowledge and scientiﬁc background of the modeler is an important factor that primarily affects this choice. While mathematical methods (such as ODEs, PDEs, and master equations) have strong theoretical foundation and analysis techniques, they are generally more complex and difﬁcult to work with. On the other hand, methods, such as PNs, process calculi, and Boolean networks, generally dissociate the modeler from tedious implementation details, provide visualizations that are quite intuitive and valuable, and propose answers via model checking.

References Alber, M., Kiskowski, M., Glazier, J., Jiang, Y., 2002. On cellular automaton approaches to modeling biological cells. In: Rosenthal, J., Gilliam, D.S. (Eds.), Mathematical Systems Theory in Biology, Communication, and Finance. Springer, New York, pp. 1–39. Aldridge, B., Burke, J., Lauffenburger, D., Sorger, P., 2006. Physicochemical modelling of cell signalling pathways. Nat. Cell Biol. 8 (11), 1195–1203. Amigoni, F., Schiaffonati, V.,2007. Multiagent-based simulation in biology – a critical analysis. In: Model-Based Reasoning in Science, Technology, and Medicine, vol. 64. Springer, pp. 179–191. Ander, M., Beltrao, P., Di Ventura, B., Ferkinghoff-Borg, J., Foglierini, M., Kaplan, A., Lemerle, C., Tomas-Oliveira, I., Serrano, L., 2004. SmartCell, a framework to simulate cellular processes that combines stochastic approximation with diffusion and localisation: analysis of simple networks. Syst. Biol. 1 (1), 129–138. Andrews, S., Arkin, A., 2006. Simulating cell biology. Curr. Biol. 16 (14), R523–R527. Andrews, S., Bray, D., 2004. Stochastic simulation of chemical reactions with spatial resolution and single molecule detail. Phys. Biol. 1, 137–151. Andrews, S., Dinh, T., Arkin, A., 2009. Stochastic models of biological processes. Encycl. Complex. Syst. Sci., 8730–8749. Apte, A., Bonchev, D., Fong, S., 2010. Cellular automata modeling of FASL-initiated apoptosis. Chem. Biodivers. 7 (5), 1163–1172. Berry, H., 2002. Monte Carlo simulations of enzyme reactions in two dimensions: fractal kinetics and spatial segregation. Biophys. J. 83 (4), 1891. Bhalla, U., 2004. Signaling in small subcellular volumes. I. Stochastic and diffusion effects on individual pathways. Biophys. J. 87 (2), 733–744. Bittig, A., Uhrmacher, A., 2010. Spatial modeling in cell biology at multiple levels. In: IEEE Proceedings of the 2010 Winter Simulation Conference (WSC 10), pp. 608–619. Blackwell, K., 2005. Modeling calcium concentration and biochemical reactions. Brains Minds Media 1 (2). Bormann, G., Brosens, F., De Schutter, E., 2001. Diffusion. MIT Press, pp. 189–224 [Chapter 7]. Brown, G., Kholodenko, B., 1999. Spatial gradients of cellular phospho-proteins. FEBS Lett. 457 (3), 452–454. Burks, A., 1970. Von Neumann’s self-reproducing automata. In: Burks, A. (Ed.), Essays on Cellular Automata. University of Illinois Press, Urbana, pp. 3–64 [Chapter 1]. Cacchiani, L., 2007. Simulation and analysis of chemical reactions using stochastic differential equations. University of Torino/University of Edinburgh, Ph.D. thesis. Calder, M., Gilmore, S., Hillston, J., 2005. Automatically deriving odes from process algebra models of signalling pathways. Comput. Methods Syst. Biol., 204–215. Calder, M., Gilmore, S., Hillston, J., 2006. Modelling the inﬂuence of rkip on the erk signalling pathway using the stochastic process algebra pepa. Lect. Notes Comput. Sci. 4230, 1–23. Cao, Y., Gillespie, D., Petzold, L., 2005. The slow-scale stochastic simulation algorithm. J. Chem. Phys. 122 (1), 014116. Cao, Y., Petzold, L., Rathinam, M., Gillespie, D., 2004. The numerical stability of leaping methods for stochastic simulation of chemically reacting systems. J. Chem. Phys. 121 (24), 12169. Chaouiya, C., 2007. Petri net modelling of biological networks. Brief. Bioinform. 8 (4), 210–219. Chen, C., Nagl, S., Clack, C., 2006. Computational techniques for modeling and simulating biological systems. ACM Comput. Surv. 34 (3), 5. Chen, L., Qi-Wei, G., Nakata, M., Matsuno, H., Miyano, S., 2007. Modelling and simulation of signal transductions in an apoptosis pathway by using timed petri nets. J. Biosci. 32 (1), 113–127. Chen, W., De Schutter, E., 2014. Python-based geometry preparation and simulation visualization toolkits for STEPS. Front. Neuroinform. 8. Cho, K., Wolkenhauer, O., 2003. Analysis and modelling of signal transduction pathways in systems biology. Biochem. Soc. Trans. 31 (6), 1503–1509. Chowdhury, S., Pradhan, R.N., Sarkar, R.R., 2013. Structural and logical analysis of a comprehensive hedgehog signaling pathway to identify alternative drug targets for glioma, colon and pancreatic cancer. PloS ONE 8 (7), e69132. Conrad, E., Tyson, J., 2006. Modeling Molecular Interaction Networks with Nonlinear Ordinary Differential Equations. The MIT Press, Cambridge, MA, pp. 97–123, Ch. 6. David, R., Alla, H., 2010. Timed Continuous Petri Nets. Springer, pp. 159–229 [Chapter 5]. Decraene, J., Hinze, T., 2010. A multidisciplinary survey of computational techniques for the modelling, simulation and analysis of biochemical networks. J. Univers. Comput. Sci. 16 (9), 1152–1175.

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1534 1535 1536 1537 1538 1539 1540 1541 1542

1543

1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613

G Model BIO 3544 1–18 16 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

Dhar, P., Meng, T., Somani, S., Ye, L., Sairam, A., Chitre, M., Hao, Z., Sakharkar, K., 2004. Cellware—a multi-algorithmic software for computational systems biology. Bioinformatics 20 (8), 1319–1321. ´ Dobrzynski, M., Rodríguez, J.V., Kaandorp, J.A., Blom, J.G., 2007. Computational methods for diffusion-inﬂuenced biochemical reactions. Bioinformatics 23 (15), 1969–1977. Doi, A., Nagasaki, M., Ueno, K., Matsuno, H., Miyano, S., 2006. A combined pathway to simulate cdk-dependent phosphorylation and arf-dependent stabilization for p53 transcriptional activity. Genome Inform. 17 (1), 112. Drawert, B., Engblom, S., Hellander, A., 2012. URDME: a modular framework for stochastic simulation of reaction-transport processes in complex geometries. BMC Syst. Biol. 6 (1), 76. Edelstein, A., Agmon, N., 1993. Brownian dynamics simulations of reversible reactions in one dimension. J. Chem. Phys. 99 (7), 5396–5404. Elf, J., Ehrenberg, M., 2004. Spontaneous separation of bi-stable biochemical systems into spatial domains of opposite phases. In: IEE Proceedings on Systems Biology, vol. 1. IET, pp. 230–236. Emonet, T., Macal, C., North, M., Wickersham, C., Cluzel, P., 2005. AgentCell: a digital single-cell assay for bacterial chemotaxis. Bioinformatics 21 (11), 2714–2721. Ermentrout, G., Edelstein-Keshet, L., 1993. Cellular automata approaches to biological modeling. J. Theor. Biol. 160 (1), 97–133. Eungdamrong, N., Iyengar, R., 2004a. Computational approaches for modeling regulatory cellular networks. Trends Cell Biol. 14 (12), 661–669. Eungdamrong, N., Iyengar, R., 2004b. Modeling cell signaling networks. Biol. Cell 96 (5), 355–362. Filion, R., Popel, A., 2004. A reaction-diffusion model of basic ﬁbroblast growth factor interactions with cell surface receptors. Ann. Biomed. Eng. 32 (5), 645–663. Fink, C., Slepchenko, B., Moraru, I., Watras, J., Schaff, J., Loew, L., 2000. An imagebased model of calcium waves in differentiated neuroblastoma cells. Biophys. J. 79 (1), 163–183. Firth, C., Le Novère, N., Shimizu, T., 2003. Stochsim, the stochastic simulator (manual distributed with version 1.4 of the software). Fisher, J., Henzinger, T., 2007. Executable cell biology. Nat. Biotechnol. 25 (11), 1239–1249. Fisher, J., Piterman, N., 2010. The executable pathway to biological networks. Brief. Funct. Genomics 9 (1), 79–92. Fisher, M., Paton, R., Matsuno, K., 1999. Intracellular signalling proteins as ‘smart’ agents in parallel distributed processes. BioSystems 50 (3), 159–171. Fogler, H., 2005. Elements of Chemical reaction engineering, 4th ed. Pearson Education. Frenkel, D., Smit, B., 2002. Understanding molecular simulation: from algorithms to applications. Academic, San Diego, CA/London. Gardner, M., 1970. Mathematical games: the fantastic combinations of john conway’s new solitaire game “life”. Sci. Am. 223 (4), 120–123. Georgiev, N., Petrov, V., Georgiev, G., 2006. Reaction-diffusion modeling ERK-and STAT-interaction dynamics. EURASIP J. Bioinform. Syst. Biol. 2006, 1–12. Gibson, M., Bruck, J., 2000. Efﬁcient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A 104 (9), 1876–1889. Gilbert, D., Fuß, H., Gu, X., Orton, R., Robinson, S., Vyshemirsky, V., Kurth, M., Downes, C., Dubitzky, W., 2006. Computational methodologies for modelling, analysis and simulation of signalling networks. Brief. Bioinform. 7 (4), 339–353. Gilbert, D., Heiner, M.,2006. From petri nets to differential equations-an integrative approach for biochemical network analysis. In: Proceedings of the 27th International Conference on Applications and Theory of Petri Nets (ICATPN 2006). Springer-Verlag, pp. 181–200. Gillespie, D., 1977. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81 (25), 2340–2361. Gillespie, D., 1992. A rigorous derivation of the chemical master equation. Phys. A 188 (1), 404–425. Gillespie, D., 1996. The mathematics of Brownian motion and Johnson noise. Am. J. Phys. 64 (3), 225–239. Gillespie, D., 2000. The chemical Langevin equation. J. Chem. Phys. 113 (1), 297. Gillespie, D., 2001. Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 115 (4), 1716. Gillespie, D., 2007. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55. Gilmore, S., Hillston, J.,1994. The pepa workbench: a tool to support a process algebra-based approach to performance modelling. In: Proceedings of the Seventh International Conference on Modelling Techniques and Tools for Computer Performance Evaluation. Springer, pp. 353–368. González, P., Cárdenas, M., Camacho, D., Franyuti, A., Rosas, O., Lagúnez-Otero, J., 2003. Cellulat: an agent-based intracellular signalling model. BioSystems 68 (2), 171–185. Goss, P., Peccoud, J., 1998. Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proc. Natl. Acad. Sci. (PNAS) 95 (12), 6750–6755. Grima, R., Schnell, S., 2006. A systematic investigation of the rate laws valid in intracellular environments. Biophys. Chem. 124 (1), 1–10. Hardy, S., Robillard, P., 2004. Modeling and simulation of molecular biology systems using Petri nets: modeling goals of various approaches. J. Bioinform. Comput. Biol. 2 (4), 619–637. Hardy, S., Robillard, P., 2008. Petri net-based method for the analysis of the dynamics of signal propagation in signaling pathways. Bioinformatics 24 (2), 209–217. Hassane, A., David, R., 1998. Continuous and hybrid petri nets. J. Circuits Syst. Comput. 8 (01), 159–188.

Hattne, J., Fange, D., Elf, J., 2005. Stochastic reaction-diffusion simulation with MesoRD. Bioinformatics 21 (12), 2923–2924. Haugh, J., 2002. A uniﬁed model for signal transduction reactions in cellular membranes. Biophys. J. 82 (2), 591. Haugh, J., Lauffenburger, D., 1997. Physical modulation of intracellular signaling processes by locational regulation. Biophys. J. 72 (5), 2014–2031. Haugh, J., Schneider, I., 2004. Spatial analysis of 3’ phosphoinositide signaling in living ﬁbroblasts: I. Uniform stimulation model and bounds on dimensionless groups. Biophys. J. 86 (1), 589–598. Heiner, M., Koch, I., Jürgen, W., 2004. Model validation of biological pathways using Petri nets—demonstrated for apoptosis. BioSystems 75 (1), 15–28. Helikar, M., Kochi, N., Konvalina, J., Rogers, J., 2011. Boolean modeling of biochemical network. Open Bioinform. J. 5, 16–25. Helikar, T., Konvalina, J., Heidel, J., Rogers, J., 2008. Emergent decision-making in biological signal transduction networks. Proc. Natl. Acad. Sci. (PNAS) 105 (6), 1913–1918. Hepburn, I., Chen, W., Wils, S., De Schutter, E., 2012. STEPS: efﬁcient simulation of stochastic reaction-diffusion models in realistic morphologies. BMC Syst. Biol. 6 (1), 36. Hernjak, N., Slepchenko, B., Fernald, K., Fink, C., Fortin, D., Moraru, I., Watras, J., Loew, L., 2005. Modeling and analysis of calcium signaling events leading to long-term depression in cerebellar purkinje cells. Biophys. J. 89 (6), 3790–3806. Higham, D., 2001. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Rev. 43 (3), 525–546. Holland, J., 2001. Exploring the evolution of complexity in signaling networks. Complexity 7 (2), 34–45. Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P., Kummer, U., 2006. COPASI—a complex pathway simulator. Bioinformatics 22 (24), 3067–3074. Huber, P., Jensen, K., Shapiro, R.,1991. Hierarchies in coloured Petri nets. In: Advances in Petri Nets, vol. 483. Springer, pp. 313–341. Hughey, J., Lee, T., Covert, M., 2010. Computational modeling of mammalian signaling networks. Wiley Interdiscip. Rev. Syst. Biol. Med. 2 (2), 194–209. Ideker, T., Galitski, T., Hood, L., 2001. A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2 (1), 343–372. Jensen, K.,1996. Informal Introduction to Coloured Petri Nets. Springer, pp. 1–63 [Chapter 1]. Kartson, D., Balbo, G., Donatelli, S., Franceschinis, G., Conte, G., 1994. Modelling with Generalized Stochastic Petri Nets. John Wiley & Sons. Kauffman, S., 1993. The Origins of Order: Self Organization and Selection in Evolution. Oxford University Press, USA. Kaufman, M., Andris, F., Leo, O., 1999. A logical analysis of T cell activation and anergy. Proc. Natl. Acad. Sci. (PNAS) 96 (7), 3894–3899. Kazmierczak, B., Lipniacki, T., 2009. Regulation of kinase activity by diffusion and feedback. J. Theor. Biol. 259 (2), 291–296. Keener, J., 2002. Spatial Modeling. Springer, pp. 171–197 [Chapter 7]. Kholodenko, B., 2006. Cell-signalling dynamics in time and space. Nat. Rev. Mol. Cell Biol. 7 (3), 165–176. Kier, L., Bonchev, D., Buck, G., 2005. Modeling biochemical networks: a cellularautomata approach. Chem. Biodivers. 2 (2), 233–243. Kim, H., Yang, M., Shin, K., 1999. Dynamic correlation effect in reversible diffusioninﬂuenced reactions: Brownian dynamics simulation in three dimensions. J. Chem. Phys. 111 (3), 1068. Kitano, H., 2002. Systems biology: a brief overview. Science 295 (5560), 1662–1664. Klamt, S., Saez-Rodriguez, J., Gilles, E.D., 2007. Structural and functional analysis of cellular networks with CellNetAnalyzer. BMC Syst. Biol. 1 (1), 2. Klamt, S., Saez-Rodriguez, J., Lindquist, J., Simeoni, L., Gilles, E., 2006. A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinform. 7 (1), 56. Klipp, E., Herwig, R., Kowald, A., Wierling, C., Lehrach, H., 2005. Systems Biology in Practice: Concepts, Implementation and Application. Wiley-VCH. Koch, I., Schreiber, F.,2010. Introduction. Springer, pp. 3–18, Ch. 1. Koh, G., Teong, H., Clément, M., Hsu, D., Thiagarajan, P., 2006. A decompositional approach to parameter estimation in pathway modeling: a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 22 (14), e271–e280. Krauss, G., 2004. Biochemistry of Signal Transduction and Regulation. Wiley-VCH Verlag GmbH & Co. KGaA. Kwiatkowska, M., Norman, G., Parker, D., Tymchyshyn, O., Heath, J., Gaffney, E., 2006. Simulation and veriﬁcation for computational modelling of signalling pathways. In: IEEE Proceedings of the 2006 Winter Simulation Conference (WSC 06), pp. 1666–1674. Lee, D., Zimmer, R., Lee, S., Park, S., 2006. Colored Petri net modeling and simulation of signal transduction pathways. Metab. Eng. 8 (2), 112–122. Levchenko, A., Iglesias, P., 2002. Models of eukaryotic gradient sensing: application to chemotaxis of amoebae and neutrophils. Biophys. J. 82 (1), 50–63. Levine, H., Kessler, D., Rappel, W., 2006. Directional sensing in eukaryotic chemotaxis: a balanced inactivation model. Proc. Natl. Acad. Sci. (PNAS) 103 (26), 9761–9766. Li, C., Suzuki, S., Ge, Q., Nakata, M., Matsuno, H., Miyano, S., 2006. Structural modeling and analysis of signaling pathways based on Petri nets. J. Bioinform. Comput. Biol. 4 (5), 1119. Li, H., Cao, Y., Petzold, L.R., Gillespie, D.T., 2008. Algorithms and software for stochastic simulation of biochemical reacting systems. Biotechnol. Prog. 24 (1), 56–61. Liiving, T., Baker, S., Junker, B., 2011. Biochemical Fundamentals. Springer, pp. 19–36, Ch. 2.

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785

G Model BIO 3544 1–18

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871

Lipkow, K., 2006. Changing cellular location of chez predicted by molecular simulations. PLoS Comput. Biol. 2 (4), e39. Lipkow, K., Andrews, S., Bray, D., 2005. Simulated diffusion of phosphorylated CheY through the cytoplasm of escherichia coli. J. Bacteriol. 187 (1), 45–53. Lipkow, K., Odde, D., 2008. Model for protein concentration gradients in the cytoplasm. Cell. Mol. Bioeng. 1 (1), 84–92. Materi, W., Wishart, D., April 2007. Computational systems biology in drug discovery and development: methods and applications. Drug Discov. Today 12 (7/8), 295–303. Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M., Miyano, S., et al., 2003. Biopathways representation and simulation on hybrid functional petri net. Silico Biol. 3 (3), 389–404. Mellman, I., Misteli, T., 2003. Computational cell biology. J. Cell Biol. 161 (3), 463–464. Meng, T., Somani, S., Dhar, P., et al., 2004. Modeling and simulation of biological systems with stochasticity. Silico Biol. 4 (3), 293–309. Meyers, J., Craig, J., Odde, D., 2006. Potential for control of signaling pathways via cell size and shape. Curr. Biol. 16 (17), 1685–1693. Milner, R., Parrow, J., Walker, D., 1992a. A calculus of mobile processes, I. Inf. Comput. 100 (1), 1–40. Milner, R., Parrow, J., Walker, D., 1992b. A calculus of mobile processes, II. Inf. Comput. 100 (1), 41–77. Mogilner, A., Elston, T., Wang, H., Oster, G., 2002. Molecular Motors: Theory. Springer, pp. 320–353 [Chapter 12]. Mogilner, A., Wollman, R., Marshall, W., 2006. Quantitative modeling in cell biology: what is it good for? Dev. Cell 11 (3), 279–287. Morton-Firth, C., 1998. Stochastic simulation of cell signaling pathways. University of Cambridge, Cambridge, UK, Ph.D. thesis. Murata, T., 1989. Petri nets: properties, analysis and applications. Proc. IEEE 77 (4), 541–580. Nagasaki, M., Doi, A., Matsuno, H., Miyano, S., 2003. Genomic Object Net: I. A platform for modelling and simulating biopathways. Appl. Bioinform. 2 (3), 181. Nagasaki, M., Doi, A., Matsuno, H., Miyano, S., 2004. A versatile petri net based architecture for modeling and simulation of complex biological processes. Genome Inform. 15 (1), 180–197. Nagasaki, M., Doi, A., Matsuno, H., Miyano, S., 2005. Petri net based description and modeling of biological pathways. Algebr. Biol., 19–31. Napione, L., Manini, D., Cordero, F., Horváth, A., Picco, A., De Pierro, M., Pavan, S., Sereno, M., Veglio, A., Bussolino, F., Balbo, G.,2009. On the use of stochastic Petri nets in the analysis of signal transduction pathways for angiogenesis process. In: Comput. Methods Syst. Biol. Springer, pp. 281-295. Needham, C., Bradford, J., Bulpitt, A., Westhead, D., 2006. Inference in Bayesian networks. Nat. Biotechnol. 24 (1), 51–54. Neves, S., Iyengar, R., 2009. Models of spatially restricted biochemical reaction systems. J. Biol. Chem. 284 (9), 5445–5449. Neves, S., Tsokas, P., Sarkar, A., Grace, E., Rangamani, P., Taubenfeld, S., Alberini, C., Schaff, J., Blitzer, R., Moraru, I., Iyengar, R., 2008. Cell shape and negative links in regulatory motifs together control spatial information ﬂow in signaling networks. Cell 133 (4), 666–680. Northrup, S., Erickson, H., 1992. Kinetics of protein-protein association explained by Brownian dynamics computer simulation. Proc. Natl. Acad. Sci. (PNAS) 89 (8), 3338–3342. Orton, R., Sturm, O., Vyshemirsky, V., Calder, M., Gilbert, D., Kolch, W., 2005. Computational modelling of the receptor-tyrosine-kinase-activated MAPK pathway. Biochem. J. 392, 249. Pahle, J., 2009. Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Brief. Bioinform. 10 (1), 53–64. Papin, J., Hunter, T., Palsson, B., Subramaniam, S., 2005. Reconstruction of cellular signalling networks and analysis of their properties. Nat. Rev. Mol. Cell Biol. 6 (2), 99–111. Pawson, T., 2004. Speciﬁcity in signal transduction: from phosphotyrosine-sh2 domain interactions to complex cellular systems. Cell 116 (2), 191–203. Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausble Inference. Morgan Kaufmann. Pe’er, D., 2005. Bayesian network analysis of signaling networks: a primer. Sci. STKE 2005 (281), pl4. Peng, S., Wong, D., Tung, K., Chen, Y., Chao, C., Peng, C., Chuang, Y., Tang, C., 2010. Computational modeling with forward and reverse engineering links signaling network and genomic regulatory responses: Nf-b signaling-induced gene expression responses in inﬂammation. BMC Bioinform. 11 (1), 308. PPetri, C., 1966. Communication with automata: Volume 1, supplement 1, Technical report radc-tr-65-377. Petri, C.A.,1967. Grundsätzliches zur beschreibung diskreter prozesse (Fundamentals for description of discrete processes). In: 3. Colloquium über Automatentheorie. Springer, Birkhäuser Basel, pp. 121–140. Pettinen, A., Aho, T., Smolander, O., Manninen, T., Saarinen, A., Taattola, K., Yli-Harja, O., Linne, M., 2005. Simulation tools for biochemical networks: evaluation of performance and usability. Bioinformatics 21 (3), 357–363. Phillips, A., Cardelli, L.,2004. A correct abstract machine for the stochastic pi-calculus. In: Bioconcur’04. Elsevier. Pinney, J., Westhead, D., McConkey, G., 2003. Petri net representations in systems biology. Biochem. Soc. Trans. 31 (6), 1513–1515. Pogson, M., Smallwood, R., Qwarnstrom, E., Holcombe, M., 2006. Formal agent-based modelling of intracellular chemical interactions. BioSystems 85 (1), 37–45. Priami, C., 1995. Stochastic -calculus. Comput. J. 38 (7), 578–589. Priami, C., Quaglia, P., 2004. Modelling the dynamics of biosystems. Brief. Bioinform. 5 (3), 259–269.

17

Price, N., Shmulevich, I., 2007. Biochemical and statistical network models for systems biology. Curr. Opin. Biotechnol. 18 (4), 365. Rangamani, P., Iyengar, R., 2007. Modelling spatio-temporal interactions within the cell. J. Biosci. 32 (1), 157–167. Rao, C., Wolf, D., Arkin, A., 2002. Control, exploitation and tolerance of intracellular noise. Nature 420 (6912), 231–237. Raychaudhuri, S., Willgohs, E., Nguyen, T., Khan, E., Goldkorn, T., 2008. Monte Carlo simulation of cell death signaling predicts large cell-to-cell stochastic ﬂuctuations through the type 2 pathway of apoptosis. Biophys. J. 95 (8), 3559–3562. Reddy, V., Mavrovouniotis, M., Liebman, M.,1993. Petri net representations in metabolic pathways. In: Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB), Vol. 1. American Association for Artiﬁcial Intelligence (AAAI), pp. 328–336. Regev, A., Shapiro, E., 2004. The -calculus as an abstraction for biomolecular systems. Model. Mol. Biol., 219–266. Regev, A., Silverman, W., Shapiro, E.,2000. Representing biomolecular processes with computer process algebra: -calculus programs of signal transduction pathways. In: Proceedings of the Paciﬁc Symposium of Biocomputing. American Association for Artiﬁcial Intelligence (AAAI), p. 179. Rodriguez, E., Rudy, A., del Rosario, R., Vollmar, A., Mendoza, E., 2011. A discrete Petri net model for cephalostatin-induced apoptosis in leukemic cells. Nat. Comput. 10 (3), 993–1015. ´ Rodríguez, J.V., Kaandorp, J.A., Dobrzynski, M., Blom, J.G., 2006. Spatial stochastic modelling of the phosphoenolpyruvate-dependent phosphotransferase (PTS) pathway in escherichia coli. Bioinformatics 22 (15), 1895–1901. Sachs, K., Gifford, D., Jaakkola, T., Sorger, P., Lauffenburger, D., 2002. Bayesian network approach to cell signaling pathway modeling. Sci. STKE 2002 (148), pe38. Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D., Nolan, G., 2005. Causal proteinsignaling networks derived from multiparameter single-cell data. Science 308 (5721), 523. Sackmann, A., Heiner, M., Koch, I., 2006. Application of Petri net based analysis techniques to signal transduction pathways. BMC Bioinform. 7 (1), 482. Saez-Rodriguez, J., Simeoni, L., Lindquist, J., Hemenway, R., Bommhardt, U., Arndt, B., Haus, U., Weismantel, R., Gilles, E., Klamt, S., Schraven, B., 2007. A logical model provides insights into T cell receptor signaling. PLoS Comput. Biol. 3 (8), e163. Samaga, R., Saez-Rodriguez, J., Alexopoulos, L., Sorger, P., Klamt, S., 2009. The logic of EGFR/ErbB signaling: theoretical properties and analysis of high-throughput data. PLoS Comput. Biol. 5 (8), e1000438. Samoilov, M., Plyasunov, S., Arkin, A., 2005. Stochastic ampliﬁcation and signaling in enzymatic futile cycles through noise-induced bistability with oscillations. Proc. Natl. Acad. Sci. (PNAS) 102 (7), 2310–2315. Sanft, K.R., Wu, S., Roh, M., Fu, J., Lim, R.K., Petzold, L.R., 2011. StochKit2: software for discrete stochastic simulation of biochemical systems with events. Bioinformatics 27 (17), 2457–2458. Schaff, J., Slepchenko, B., Choi, Y., Wagner, J., Resasco, D., Loew, L., 2001. Analysis of nonlinear dynamics on arbitrary geometries with the virtual cell. Chaos 11 (1), 115–131. Schlatter, R., Schmich, K., Vizcarra, I., Scheurich, P., Sauter, T., Borner, C., Ederer, M., Merfort, I., Sawodny, O., 2009. ON/OFF and beyond – a Boolean model of apoptosis. PLoS Comput. Biol. 5 (12), e1000595. Schneider, I., Haugh, J., 2004. Spatial analysis of 3’ phosphoinositide signaling in living ﬁbroblasts: II. Parameter estimates for individual cells from experiments. Biophys. J. 86 (1), 599–608. Schnell, S., Turner, T., 2004. Reaction kinetics in intracellular environments with macromolecular crowding: simulations and rate laws. Prog. Biophys. Mol. Biol. 85 (2), 235–260. Schwab, E., Pienta, K., 1997. Modeling signal transduction in normal and cancer cells using complex adaptive systems. Med. Hypotheses 48 (2), 111–123. Shimizu, T.S., 2002. The spatial organisation of cell signalling pathways – a computer-based study. University of Cambridge, Ph.D. thesis. Shmulevich, I., Dougherty, E., Kim, S., Zhang, W., 2002. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 18 (2), 261–274. Shvartsman, S., Wiley, H., Deen, W., Lauffenburger, D., 2001. Spatial range of autocrine signaling: modeling and computational analysis. Biophys. J. 81 (4), 1854–1867. Slepchenko, B., Schaff, J., Carson, J., Loew, L., 2002. Computational cell biology: spatiotemporal simulation of cellular events. Annu. Rev. Biophys. Biomol. Struct. 31 (1), 423–441. Smith, G.D., 2001. Modeling Local and Global Calcium Signals Using ReactionDiffusion Equations. CRC Press, pp. 49–85, Ch. 3. Smoluchowski, M., 1906. Zur kinetischen theorie der brownschen molekularbewegung und der suspensionen (the kinetic theory of brownian motion and suspensions). Ann. Phys. 326 (14), 756–780. Sreenath, S., Cho, K., Wellstead, P., 2008. Modelling the dynamics of signalling pathways. Essays Biochem. 45, 1–28. Stiles, J., Bartol, T., 2001. Monte Carlo Methods for Simulating Realistic Synaptic Microphysiology using MCell. CRC Press, pp. 87–127, Ch. 4. Stundzia, A., Lumsden, C., 1996. Stochastic simulation of coupled reaction-diffusion processes. J. Comput. Phys. 127 (1), 196–207. Takahashi, K., Arjunan, S., Tomita, M., 2005. Space in systems biology of signaling pathways - towards intracellular molecular crowding in silico. FEBS Lett. 579 (8), 1783–1788. Tasaki, S., Nagasaki, M., Oyama, M., Hata, H., Ueno, K., Yoshida, R., Higuchi, T., Sugano, S., Miyano, S., 2006. Modeling and estimation of dynamic EGFR pathway by data

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957

G Model BIO 3544 1–18 18 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972

ARTICLE IN PRESS N.M. ElKalaawy, A. Wassal / BioSystems xxx (2015) xxx–xxx

assimilation approach using time series proteomic data. Genome Inform. 17 (2), 226. Tolle, D., Le Novère, N., 2006. Particle-based stochastic simulation in systems biology. Curr. Bioinform. 1 (3), 1–6. Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J., Hutchison III, C., 1999. E-CELL: software environment for whole-cell simulation. Bioinformatics 15 (1), 72–84. Troncale, S., Tahi, F., Campard, D., Vannier, J., Guespin, J., 2006. Modeling and simulation with hybrid functional Petri nets of the role of interleukin-6 in human early haematopoiesis. In: Paciﬁc Symposium on Biocomputing, vol. 11, pp. 427–438. Turner, T., Schnell, S., Burrage, K., et al., 2004. Stochastic approaches for modelling in vivo reactions. Comput. Biol. Chem. 28 (3), 165–178. Ullah, G., Jung, P., Machaca, K., 2007. Modeling Ca2+ signaling differentiation during oocyte maturation. Cell Calcium 42 (6), 556–564. Valk, R., 1978. Self-modifying nets, a natural extension of petri nets. Autom. Lang. Program. 62, 464–476.

van Zon, J., ten Wolde, P., 2005a. Green’s-function reaction dynamics: a particlebased approach for simulating biochemical networks in time and space. J. Chem. Phys. 123 (23), 234910. van Zon, J., ten Wolde, P., 2005b. Simulating biochemical networks at the particle level and in time and space: green’s function reaction dynamics. Phys. Rev. Lett. 94 (12), 128103. Vigelius, M., Lane, A., Meyer, B., 2011. Accelerating reaction-diffusion simulations with general-purpose graphics processing units. Bioinformatics 27 (2), 288–290. Wishart, D., Yang, R., Arndt, D., Tang, P., Cruz, J., 2005. Dynamic cellular automata: an alternative approach to cellular simulation. Silico Biol. 5 (2), 139–161. Wooldridge, M., 2002. An Introduction to Multiagent Systems. John Wiley & Sons. Xu, H., 2005. Timed Hierarchical Object-Oriented Petri Net. I-TECH Education and Publishing, pp. 253–280 [Chapter 12]. Zhang, W., Liu, H.T., 2002. MAPK signal pathways in the regulation of cell proliferation in mammalian cells. Cell Res. 12 (1), 9–18.

Please cite this article in press as: ElKalaawy, N.M., Wassal, A., Methodologies for the modeling and simulation of biochemical networks, illustrated for signal transduction pathways: A primer. BioSystems (2015), http://dx.doi.org/10.1016/j.biosystems.2015.01.008

1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988