Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect

Computers in Biology and Medicine journal homepage: www.elsevier.com/locate/cbm

Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics András Hartmann a,b, João M. Lemos a, Susana Vinga b,n a b

INESC-ID/Instituto Superior Técnico, Universidade de Lisboa, Portugal – R Alves Redol 9, 1000-029 Lisboa, Portugal LAETA, IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Portugal – Av. Rovisco Pais, 1049-001 Lisboa, Portugal

art ic l e i nf o

a b s t r a c t

Article history: Received 14 April 2014 Accepted 28 August 2014

The aim of inverse modeling is to capture the systems' dynamics through a set of parameterized Ordinary Differential Equations (ODEs). Parameters are often required to fit multiple repeated measurements or different experimental conditions. This typically leads to a multi-objective optimization problem that can be formulated as a non-convex optimization problem. Modeling of glucose utilization of Lactococcus lactis bacteria is considered using in vivo Nuclear Magnetic Resonance (NMR) measurements in perturbation experiments. We propose an ODE model based on a modified timevarying exponential decay that is flexible enough to model several different experimental conditions. The starting point is an over-parameterized non-linear model that will be further simplified through an optimization procedure with regularization penalties. For the parameter estimation, a stochastic global optimization method, particle swarm optimization (PSO) is used. A regularization is introduced to the identification, imposing that parameters should be the same across several experiments in order to identify a general model. On the remaining parameter that varies across the experiments a function is fit in order to be able to predict new experiments for any initial condition. The method is cross-validated by fitting the model to two experiments and validating the third one. Finally, the proposed model is integrated with existing models of glycolysis in order to reconstruct the remaining metabolites. The method was found useful as a general procedure to reduce the number of parameters of unidentifiable and over-parameterized models, thus supporting feature selection methods for parametric models. & 2014 Elsevier Ltd. All rights reserved.

Keywords: Identification Optimization Regularization Modeling Particle swarm optimization Biochemical systems Bacterial metabolism

1. Introduction In the area of biochemical engineering, emerging analytical techniques such as Nuclear Magnetic Resonance (NMR) deliver increasing amount of experimental biological data containing important information about the system of interest. A great technological advancement is in vivo NMR [1], which makes possible to monitor metabolism in living cells by tracking the concentration of specially marked metabolites,1 allowing to better understand their complex physiology. Indeed, dynamic modeling of the metabolism became one of the main research areas of systems biology due to the expected impact in areas such as metabolic and genetic engineering. A typical but yet unsolved problem is the modeling of glucose utilization in bacteria, a highly regulated process, in which the

n

Corresponding author. Tel.: þ 351 218 419 504; fax: þ351 218 498 097. E-mail addresses: [email protected] (A. Hartmann), [email protected] (J.M. Lemos), [email protected] (S. Vinga). 1 Usually, the 6th carbon in glucose is switched to (13C) isotope, which is tracked through the metabolic network.

external sugar is transported through the membrane into the cell [2]. Accurately modeling this first step is of high relevance since it is usually the first reaction of the bacterial metabolism pathway and to which all the other metabolites are highly dependent [3,4]. Most modelers are focusing on the inverse problem, namely to identify the parameters of a set of differential equations that best fit available experimental datasets [5]. However, the majority of these models lack generalization capabilities, i.e., even if a perfect fit to a single experiment is achieved, they cannot explain the systems' behavior in different experimental conditions. In this context, multi-objective PSO (MOPSO) was extended with term-wise decomposition, using a generalized mass action (GMA) model [6]. This ODE model approximates the reaction rates with power-laws, which in this case are decoupled into an equation system by replacing derivatives with the values of the observed slopes. However, it is known that this type of decomposition is highly sensitive to noise [7], which might lead to poor estimates in real noisy settings. Another approach is to apply MOPSO directly for global modeling [8], and combining the multiple objectives with dynamic weighting. The present work greatly expands this previous proposal but a more throughout

http://dx.doi.org/10.1016/j.compbiomed.2014.08.027 0010-4825/& 2014 Elsevier Ltd. All rights reserved.

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

2

analysis of the parameter solution space is performed, along with an improvement on the utilization dynamic model. Moreover, regularization is introduced in the optimization procedure, thus leading to a contrasting perspective regarding the overall fitting methodology. In this paper the multiple objectives are extended with regularization, which penalize the deviances between the parameters on different experiments. The aim is to develop and identify a model that can accurately describe different experimental conditions of bacterial glucose utilization and simulate/predict novel experiments. This paper is organized as follows: in Section 2 we introduce the dataset, the model and our method of identification, in Section 3 the results of bacterial glucose utilization is described in detail. The methods and the results are further discussed in Section 4 and finally in Section 5, conclusions are drawn.

optima respectively. r1 and r2 are independent, uniformly distributed random variables on the interval ½0…1 ensuring the stochastic behavior of the method, pbesti is the best solution discovered by the ith particle and gbest is the best global solution found. The particle velocities are lower and upper bounded as vmin o vi o vmax . The method can be summarized in the following steps.

2. Methods Here we first describe the dataset, then the model is introduced, and finally we briefly review the particle swarm optimization (PSO) algorithm used for the identification together with the objective functions including the regularization approach.

2.2.1. Objective function The objective function in Algorithm 1 can be expressed in terms of Mean Squared Error (MSE). This method was already successfully put into practice for inferring metabolic networks [16]. Single experiments are fitted using an objective function based on MSE, that corresponds to the normalized ℓ2 norm:

2.1. The dataset

Ld ¼

In vivo NMR measurements opened new horizons for systems biology, allowing measurement of metabolite concentrations in the living cell [1]. In the case of Lactococcus lactis the extracellular glucose concentration is measured and the glucose transport has to be modeled. Three perturbation datasets were used, where a bolus of 20, 40 and 80 mM (13C) labeled glucose was given to starving bacteria in anaerobe conditions. It was observed that the multiple bolus experiments do not differ much from the single bolus regarding the shape of the glucose decay. The data was made publicly accessible in BGFit, a biological data management and curve fitting system [9] (http://kdbio.inesc-id.pt/bgfit). Some models using similar datasets were proposed previously [4,10–14], integrated in wider dynamic models for the glycolytic pathway.

where yd denotes the measured time-series with length of nd, and y^ d ðtÞ is the estimated (reconstructed) measurements using the model f ðÞ and an estimated parameter set θ^ :

2.2. Identification method Numerous methods were proposed in the literature for fitting ODE models to biochemical and genomic systems [5]. Because of global optimality and its stochastic nature, here the particle swarm optimization (PSO) [15] was chosen for parameter identification. PSO is a population based stochastic optimization method inspired by the collective intelligence of simple interacting individuals. The traditional example for such systems is a bird flock seeking for food. The birds do not know the explicit location of the food, but their distance from it, this corresponds to the objective function. Communication is a key issue of the method, sharing knowledge with the other members of the flock allows them to follow the bird closest to the food. In practice, PSO is initialized with a set of possible solutions, called particles Si ð0Þ and associated random velocities vi ð0Þ. In every iteration k the speed vi(k) and location Si(k) of each particle in the parameter space is updated as vi ðkÞ ¼ wvi ðk  1Þ þ c1 r 1 ðpbest i  Si ðk  1ÞÞ þ c2 r 2 ðgbest Si ðk  1ÞÞ

ð1Þ

Si ðkÞ ¼ Si ðk  1Þ þ vi ðkÞ;

ð2Þ

where w is the inertia describing the impact of the previous velocity to the current one. The positive constants c1 and c2 correspond to the acceleration rate towards the local and global

Algorithm 1. Particle swarm optimization (PSO) algorithm. 1. 2. 3. 4. 5. 6.

Initialize a set of particles of cardinality N Evaluate the objective function for all the particles Update pbesti for each particle i ¼ ½1…N and gbest Compute the new velocities using Eq. (1) Update the particles' position using Eq. (2) Repeat from step 2 until the desired precision or the limit of iterations is reached

2 1 nd  ∑ y ðtÞ  y^ d ðtÞ ; nd t ¼ 1 d

^ ¼ f ðθ^ ; tÞ yðtÞ

ð3Þ

ð4Þ

The index d identifies the different experiments, here d ¼1,2 and 3 represents the 20, 40 and 80 mM initial glucose concentrations, respectively. Here we do not aim at identifying all the metabolites dynamics, focusing only on glucose utilization using data from three different experiments simultaneously. In this context, it is known that the least absolute shrinkage and selection operator (Lasso) regularization [17] promote sparsity [18]. The idea behind introducing regularization is to impose sparsity in the sense that most of the parameters should not vary across the experiments and, therefore, can be considered global parameters. The remaining parameters should be dependent on the initial condition and will be described as a function of the initial sugar concentration. The general form of the objective function including both the MSE terms and the regularization is L ¼ L1 þ L2 þ L3 þ λðjθ^ 1  θ^ 2 j1 þ jθ^ 2  θ^ 3 j1 þ jθ^ 3  θ^ 1 j1 Þ

ð5Þ T

where j  j1 denotes for the ℓ1 norm, i.e. for a vector x ¼ ½x1 …xn  , it is the sum of the absolute values of the elements jxj1 ¼ ∑ni¼ 1 jxi j. The vector θ^ j with j ¼ 1; 2; 3 represents the parameter vector estimated from experiment j. The regularization constant, λ, is arbitrary positive scalar, which is a parameter of the method. In this application λ was chosen to be 10 since this value was found to balance optimally between the MSE and the regularization. 2.3. The model Glucose utilization can be modeled using ODEs where the observations y(t) correspond to the extracellular sugar concentration. One of the simplest model for this decay process involves the use of a pure exponential function, whose ODE formulation states _ is proportional to y(t) through a constant k: that the derivative yðtÞ _ ¼  kyðtÞ yðtÞ

ð6Þ

Other possibilities involve the use of logistic functions under a statistical non-linear mixed effects models framework [19]. In

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

order to generalize the available experimental observations, we propose a model based on adding a time-varying parameter to the power-law structure [8]: _ ¼  kðaðtÞÞβ yðtÞ yðtÞ

ð7Þ

where k and β are (constant) parameters, and a is time-varying parameter, representing the acceleration. The rational of this model is to adapt a simple exponential decay to accommodate the accelerating process observed experimentally in the cells, for which there is still no definite biochemical explanation. In fact, for all the initial conditions a sigmoid-type behavior is observed, which does not comply with a pure exponential decay model of Eq. (7) but instead corresponds to an increase in the decay rate in time. The model obtained in Eq. (7) can thus be seen as a timevarying exponential decay system, in which the rate can be interpreted as a polynomial acceleration term. This allows for a flexible, meaningful and yet concise description of the dynamics of the glucose utilization system considered. If a(t) is an affine function of time, the system has a closed-form solution given by 0  β þ 1 1 t þ ck α B C c B C yðtÞ ¼ γ  expB  ð8Þ C @ A βþ1 The model has four constant parameters, θ ¼ ½k α β cT , where k and β come from the original exponential model (Eq. (7)), α and c Table 1 Parameters of the perturbation experiment. The default values of the parameters are based on parameters used in [8] for the 80 mM experiment. Parameter

Default value

Scale

Range

k α β c

0.0107 8.1732 1 4

Log Linear Linear Log

½0:001…0:1 ½0…10 ½0…2 ½0:01…1

describe the time-varying parameter (see Appendix A for derivation). Since the initial value yð0Þ ¼ y0 is known a priori, corresponding to the initial concentration of the glucose bolus, γ can be expressed as ! ckαβ þ 1 ð9Þ γ ¼ y0  exp βþ1 To visualize the effects of the parameters, a realistic perturbation simulation was conducted with the parameter ranges shown in Table 1. From the simulation results in Fig. 1, it is visible that the model is over-parameterized, for instance the effects of β and k coincide. 2.3.1. Integration with existing models of glycolysis One relevant issue that should be addressed is if the utilization model obtained with this procedure can be further integrated into existing models for the glycolytic pathway. In fact, there is a priori no guarantee that a model identified for a given sub-system (in this case glucose utilization) can be incorporated into models obtained for larger systems (e.g. glycolysis) while keeping a reasonable accuracy. To address this question, we have selected two proposed models of the glycolytic pathway in L. lactis, obtained using two distinct strategies, namely Biochemical Systems Theory (BST) [12] and Dynamic Energy Budget (DEB) [13], evaluating the overall fitting of the resulting simulations when the glucose utilisation was substituted with the model identified here.

3. Results First we tested if our model is able to capture the different sigmoid-shaped dynamics derived from the different glucose utilization time-series. Each experiment was identified separately using the objective function of Eq. (3). The model was fitted in 100 independent runs of PSO algorithm to each experiment. The identified parameters and the simulated time-series are shown

100

100

80

80

60

60

40

40

20

20

0

0

10

20

30

40

0

100

100

80

80

60

60

40

40

20

20

0

0

10

20

30

40

3

0

0

10

20

30

40

0

10

20

30

40

Fig. 1. Perturbation experiments on the model, where the arrow shows the direction of changes when the parameters are increasing, also darker lines represent increased parameter values.

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

4

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Fig. 2. Fitting the model to the distinct experiments, parameters are shown in panel A, and reconstructions in panel B. Blue curves are the fittings with the different parameter estimates. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

in Fig. 2. It can be seen that the identification resulted in good reconstruction for each individual experiment, however as expected, the estimated parameters were found to be different across the experiments. Then the regularized objective function that combines all three experiments of Eq. (5) was plugged in into PSO, and again 100 different parameter sets were identified. It is seen in Fig. 3 that the parameters of the single runs have relatively large standard deviations. Table 2 summarizes all the parameters and corresponding MSE values. As expected, and due to higher degree of

freedom, the error values of the individual models are always smaller than those of the global model, however the fit of the regularized model to the dataset is still considerably good. It was also observed that, as a result of the regularization, within one run of PSO there are only small deviances in the parameters of α, β and c on the different experiments. This fact supports the hypothesis that these three parameters are indeed global, and the values of the global parameters can be taken as the mean or median of the estimates. On the other hand k was found significantly different on all three experiments, but can be approximated well with an

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

5

Fig. 3. Fitting the model to the experiments using the regularized objective function, parameters are shown in panel A, and reconstructions in panel B. Blue curves are the fittings with the different parameter estimates, the green curve is the reconstruction with the median of the estimates. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

exponential function of the initial concentration of the glucose bolus, see Fig. 4. It is noteworthy that it is possible to identify an alternative model using Eq. (6) and imposing this structure for k i.e., k ¼ exp k1 þ k2 y0 . The resulting average MSE values are higher than those obtained with the acceleration term for all the initial conditions (5.9406, 9.6283 and 15.0271, simulation figures are not shown). Moreover, the sigmoid-type behaviour is lost, which does not correspond to the observed acceleration process. In order to see whether the model can be further reduced, the fittings were repeated but constraining α ¼0, β ¼0, and c ¼1.

These are convenient values for the parameters, because they simplify Eq. (8). The model was cross-validated by fitting to the 20 and 80 mM experiment only and verifying if the 40 mM experiment could be reconstructed. The result of the cross-validation in Fig. 5 shows good fit on the 40 mM experiment, with MSE ¼3.9152. Crossvalidation results of the remaining two experiments can be found in Appendix B. It can be seen that the fittings in these cases are not as accurate, which confirms that when testing a model beyond the range used for its identification might lead to less accurate results.

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

6

90

Table 2 Parameter values and corresponding MSEs. Constrained parameters are denoted with italic and best MSE values using the regularized objective function with bold type fonts.

80

20 mM

40 mM

80 mM

Distinct fit k 0.1440 7 0.0497 α 0.9071 70.0815 β 2.7760 7 0.2820 c 1.7229 7 0.4436

0.0729 7 0.0103 1.2607 7 0.1438 1.3495 7 0.1075 1.5518 7 0.2696

0.0632 70.0053 1.1974 7 0.2461 0.4671 7 0.0439 1.3249 70.3128

MSE

0.1709

2.8774

Fit with regularization k 0.3163 7 0.0400 α 1.1520 7 0.3032 β 0.6200 7 0.1519 c 1.3539 70.3914

0.0440

0.1517 7 0.0215 1.1494 70.3040 0.6220 7 0.1504 1.3573 7 0.3829

0.0546 7 0.0066 1.1552 7 0.2987 0.5755 70.1056 1.3555 7 0.3929

MSE

3.5772

2.2355

3.8613

k α β c

0.5323 7 0.0414 0 0.4106 7 0.0384 1.5057 7 0.2807

0.2546 7 0.0229 0 0.4131 70.0372 1.5068 70.2812

0.0889 7 0.0077 0 0.3799 70.0261 1.5077 7 0.2810

MSE

2.8297

1.3554

4.3414

k α β c

0.4633 7 0.0172  0.0065 7 0.2802 0  0.0246 7 0.2946

0.2801 7 0.0068  0.0071 70.2808 0  0.0246 70.2915

0.1298 7 0.0014  0.0068 7 0.2820 0 -0.0239 7 0.2927

MSE

6.3941

9.5966

15.0002

k α β c

0.3298 7 0.0311 1.0041 70.2899 0.51217 0.0645 1

0.1583 7 0.0153 1.0025 7 0.2896 0.5071 7 0.0587 1

0.0573 7 0.0054 1.0076 7 0.2876 0.4849 7 0.0462 1

MSE

3.5080

2.3237

3.5250

Fig. 4. The distribution of k on the experiments (blue), fitting an exponential (red), where the result is kðy0 Þ ¼ expð 0:644  0:029y0 Þ. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Finally, in order to simulate the remaining metabolites of the 40 mM experiment, the identified glucose curves were plugged in into both the S-system model of [12] and the DEB model of [13]. As it can be seen in Fig. 6, when integrating the results into the S-system model, most of the metabolites were predicted correctly, however the steady-state lactate concentration was overestimated in some of the runs. Fig. 7 shows the reconstructions using the DEB model. Here, all the predicted metabolite concentrations fit well to the measurements.

Glucose concentration (mM)

70

Name

60 50 40 30 20 10 0

0

5

10

15

20

25

Time (min)

Fig. 5. Cross-validation: predicting the 40 mM experiment by fitting to the 20 and the 80 mM experiment. Blue curves are the fittings, black curves the reconstruction with the different parameter estimates, the green curve is the reconstruction with the median of the estimates. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

4. Discussion In this work a model for glucose utilization was proposed together with an approach for the identification of different experimental conditions. According to the categorization of [20], aggregation is an approach for combining all the objectives into a single one. Also the objective function proposed here is of this type, moreover it is extended with a regularization, including the MSE terms to the single experiments as well as a regularization term. On the experimental data, the identification resulted in three global parameters and one depending on the initial value. The advantage of applying PSO to non-convex problems is beneficial, as it is one of the most successful global optimization methods [20]. Another noteworthy advantage of the method comes from its stochastic population-based nature. Particularly, nonlinear models often suffer from identifiability problems, meaning that one parameter can compensate the effect of another, resulting in similar fittings to the data for distinct parameters sets. If this phenomenon is a property of the model itself, it is called global identifiability problem [21], when the identifiability issue is caused by the noise in the data, then it is characterized as a local identifiability problem [22]. When using PSO, instead of discovering one single parameter, it is possible to generate a population of parameters, either by running the method several times, or by taking all the particles from one run. The population of parameters (or its statistics) also characterizes the sensitivity regions of the problem and can reflect the parameters interdependencies. When regularization is introduced, not only the error of the reconstruction is minimized but also the deviations between the parameters of the different time-series decrease. Alternatively, regularization can be interpreted as feature selection consisting of a sensitivity analysis, and choosing the most sensitive parameters. In connection to many models in systems biology it can be argued that the parameters solution may not be unique even within the same experiments. From a biological systems point of view, this phenomena can be interpreted as sloppiness. As pointed out previously [23], different conditions of the living systems may attract different sets of parameters of the same biological model, with similar responses. The parameters of interest should be chosen from the intersections of these sets, since using those usually ensures a good description of the system within different

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

0

Concentration(mM)

−20

30

150

0

50 100 Time(min)

150

3PGA

20 10 0

0

50 100 Time(min)

150

Lactate

100 50 0

0

50

100

150

Concentration(mM)

20

60

Concentration(mM)

glu

15

Concentration(mM)

40

Concentration(mM)

Concentration(mM)

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

40

7

FBP

40 20 0

0

50 100 Time(min)

150

PEP

10 5 0

0

50 100 Time(min)

20 0

0

Time(min)

50

150

G6P Pyr NAD+ ATP Pi 100 150

Time(min)

0 −20

0

50

100

150

Time(min) 10 5 3PGA 0

0

50

100

150

Time(min) 100 50 Lactate 0

0

50

100

150

Time(min)

Concentration(mM)

glu

20

Concentration(mM)

40

Concentration(mM)

Concentration(mM)

Concentration(mM)

Concentration(mM)

Fig. 6. Predicting all the metabolites of the 40 mM experiment by plugging the reconstructed glucose curves into an S-system model of glycolysis. Measurements are denoted by dots and reconstructions with curves.

50

0

FBP

0

50

100

150

Time(min) 10 PEP 5 0

0

50

100

150

Time(min) 40 20 0

0

G6P Pyr NAD+ ATP 50 150 100 Pi Time(min)

Fig. 7. Predicting all the metabolites of the 40 mM experiment by plugging in the reconstructed glucose curves into a DEB model of glycolysis. Measurements are denoted by dots and reconstructions with curves.

conditions. In the case of the glucose bolus data, the parameters obtained by fitting single experiments do not overlap except parameter c. In our interpretation this indicates the inadequacy of this approach for global modeling, even if the model fits remarkably well to individual experiments illustrating its flexibility in capturing the different sigmoid shapes.

It was also demonstrated that it is possible to further reduce the model by constraining one of the parameters to a convenient value. It was found that the constraints α ¼0 and c¼ 1 are preferable, because besides simplifying the model, they reduce the variance of the estimate on the other parameters without significantly increasing the MSE of the fit.

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

8

Introducing the global model, a good overall fit was achieved, but the MSE values to individual experiments became higher than in the single experiment case. This is considered to be a natural trade-off for generalization. Cross-validation results on the 40 mM experiment confirm the models ability in predicting unknown experimental data, however the 20 and 80 mM cross-validation results are inferior. In our interpretation this shows that the method is adequate to reconstruct the glucose utilization from two experiments only if the initial value of the experiment to reconstruct is in between the initial values of the two measured experiments. Further numerical experiments were conducted in order to demonstrate how the proposed model could be integrated into existing models of glycolysis. When the predicted glucose utilization curves are plugged into the S-system model of [12], most of the metabolites could be predicted accurately. The steady state lactate concentration was however overestimated. This is hypothesized to be a consequence of the high sensitivity to the glucose levels, and that the coupled structure of the model allows the accumulation of errors. In fact, the underlying model is known to be sensitive to the changes of certain metabolite concentrations, such as Pi and NADH. The Dynamic Energy Budget (DEB) model [13] imposes mass conservation, i.e. the structure is such that all the metabolites are coupled directly with the glucose derivative, which makes the model more robust against accumulation of errors. The integration into the DEB model resulted in good fits to all the measured metabolites. These findings suggest that if the glucose time course includes deviations, then the DEB type model is preferred versus the S-system model, because it is less sensitive to these changes. We do not claim that the proposed approach would work best for all problems, but it definitely has proven successful in solving multi-objective optimization problems of this type, for which the present case-study represents a proof-of-concept. It is important to note that the scope of the model is potentially wider than glucose utilization. Theoretically, any sigmoid-shaped function could be described with this type of formulation. The sigmoid shape of the glucose utilization was found slightly varying on different experiments. This might be a consequence of the different activity of the glucose uptake systems revealed by [4], the differences between the transport of the glucose monomers, or other factors such as biomass and also the overall energy level of the cell [2]. The model could be further extended to include these aspects as well and also experiments for mutant bacterial strains.

5. Conclusion Here a novel model to L. lactis glucose utilization was introduced. The model contains four parameters, from which three are global parameters and one is dependent on the initial glucose concentration. It was shown that further reduction of the model would also be possible by setting certain global parameters to convenient values. Parameters were identified using particle swarm optimization (PSO) algorithm with an objective function, that combines MSE with a regularization term in order to discover both global and local parameters. The algorithm was able to fit a model describing three glucose perturbation experiments with different glucose input concentrations. We hypothesize that this could be a future framework towards a global modeling of data under different experimental conditions.

Equation (ODE). Parameters are often required to fit multiple repeated measurements or different experimental conditions. This typically leads to a multi-objective optimization problem that can be formulated as a non-convex optimization problem. Modeling of glucose utilization of L. lactis bacteria is considered upon in vivo Nuclear Magnetic Resonance (NMR) measurements in perturbation experiments. An ODE model based on a modified timevarying exponential decay is proposed, that is flexible enough to model several different experimental conditions. The starting point is an over-parameterized non-linear model, that will be further simplified through an optimization procedure with regularization penalties. The regularization imposes most parameters to be the same across different experiments and generalizes the model. Parameters were identified using particle swarm optimization (PSO). The algorithm with the regularized objective function was able to fit a model describing three glucose perturbation experiments with different glucose input concentrations. The final model contains four parameters, from which three are considered global and one is dependent on the initial glucose concentration. It was shown that further reduction of the model would also be possible by setting certain global parameters to convenient values. The method was found useful in predicting novel experimental data by fitting the model to two experiments and validating the third one. This study is a proof of concept for a procedure to reduce the number of parameters of unidentifiable and overparameterized models, thus supporting feature selection methods for parametric models. We hypothesize that this could be a future framework towards a global modeling of data under different experimental conditions.

Conflict of interest statement None declared.

Acknowledgments The authors thank Dr. Ana Rute Neves and Professor Helena Santos for permission to access the experimental dataset and for their useful comments. We are grateful to the anonymous reviewers whose constructive suggestions helped us to greatly improve and clarify this paper. This work was supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under contracts PestOE/EEI/LA0021/2013; through IDMEC, under LAETA (PestOE/EME/LA0022); and projects InteleGen (PTDC/DTPFTO/1747/2012) and European Union Framework Program 7 “BacHBerry” (www.bachberry.eu), Project No. FP7 – 613793). A.H. received a PhD fellowship from FCT Portugal under the reference SFRH/BD/69336/2010. S.V. acknowledges support by Program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH).

Appendix A. Derivation of the linear time-varying model The time-varying parameter can be modeled with a simple linear relation: _ ¼ aðtÞ

1 c

ðA:1Þ

6. Summary

It is easy to see that solving Eq. (A.1) results in the following simple affine time dependence:

The aim of inverse modeling is to capture the systems' dynamics through a set of parameterized Ordinary Differential

aðtÞ ¼ α þ

t c

ðA:2Þ

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

A. Hartmann et al. / Computers in Biology and Medicine ∎ (∎∎∎∎) ∎∎∎–∎∎∎

introducing a new parameter α. Substituting this into the original ODE (Eq. (7)) the final version of the model is   t β _ ¼ k αþ yðtÞ yðtÞ: ðA:3Þ c

Appendix B. Cross-validation results for 20 and 80 mM experiments The results below are to complete the cross-validation tests on the 20 and 80 mM experiments (Figs. B1 and B2). 90 80

Glucose concentration (mM)

70 60 50 40 30 20 10 0

0

5

10

15

20

25

Time (min) Fig. B1. Cross-validation: predicting the 20 mM experiment by fitting to the 40 and the 80 mM experiment. Blue curves are the fittings, black curves are the reconstruction with the different parameter estimates, the green curve is the reconstruction with the median of the estimates. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.) 90 80

Glucose concentration (mM)

70 60 50 40 30 20 10

00

5

10

15

20

9

References [1] A.R. Neves, W.A. Pool, J. Kok, O.P. Kuipers, H. Santos, Overview on sugar metabolism and its control in Lactococcus lactis, FEMS Microbiol. Rev. 29 (3) (2005) 531–554. [2] M. Papagianni, N. Avramidis, G. Filiousis, Glycolysis and the regulation of glucose transport in Lactococcus lactis spp. lactis in batch and fed-batch culture, Microb. Cell Fact. 6 (2007) 16. [3] E. Voit, J. Almeida, S. Marino, R. Lall, G. Goel, A. Neves, H. Santos, Regulation of glycolysis in Lactococcus lactis: an unfinished systems biological case study, IEE Proc. Syst. Biol. 153 (4) (2006) 286–298. [4] R. Castro, A.R. Neves, L.L. Fonseca, W.a. Pool, J. Kok, O.P. Kuipers, H. Santos, Characterization of the individual glucose uptake systems of Lactococcus lactis: mannose-PTS, cellobiose-PTS and the novel GlcU permease, Mol. Microbiol. 71 (3) (2009) 795–806. [5] I.-C. Chou, E.O. Voit, Recent developments in parameter estimation and structure identification of biochemical and genomic systems, Math. Biosci. 219 (2) (2009) 57–83. [6] P.C. Naval, L.G. Sison, E.R. Mendoza, Parameter estimation with term-wise decomposition in biochemical network GMA models by hybrid regularized least squares-particle swarm optimization, in: IEEE Congress on Evolutionary Computation, Barcelona, Spain, 2010, pp. 1–8. [7] E.O. Voit, J. Almeida, Decoupling dynamical systems for pathway identification from metabolic profiles, Bioinformatics (Oxford, England) 20 (11) (2004) 1670–1681. [8] A. Hartmann, S. Vinga, J. Lemos, Unified modeling of several perturbation experiments in systems biology-a case study on the glucose uptake of Lactococcus Lactis, in: Bioinformatics, Rome, Italy, 2011, pp. 309–312. [9] A. Veríssimo, L. Paixão, A.R. Neves, S. Vinga, BGFit: management and automated fitting of biological growth curves, BMC Bioinf. 14 (1) (2013) 283. [10] G. Goel, I.-c. Chou, E. Voit, System estimation from metabolic time-series data, Bioinformatics 24 (21) (2008) 2505–2511. [11] M. Vilela, S. Vinga, M.a.G.M. Maia, E.O. Voit, J.S. Almeida, Identification of neutral biochemical network models from time series data, BMC Syst. Biol. 3 (2009) 47. [12] S. Vinga, K. Thomaseth, J. Lemos, A. Neves, H. Santos, A. Freitas, Structural analysis of metabolic networks: a case study on Lactococcus lactis, in: 8th Portuguese Conference on Automatic Control (CONTROLO'2008), 2008, pp. 566–571. [13] S. Vinga, A.R. Neves, H. Santos, B.W. Brandt, S.A.L.M. Kooijman, Subcellular metabolic organization in the context of dynamic energy budget and biochemical systems theories, Philos. Trans. R. Soc. Lond. Ser. B, Biol. Sci. 365 (1557) (2010) 3429–3442. [14] R.S. Costa, A. Hartmann, P. Gaspar, A.R. Neves, S. Vinga, An extended dynamic model of Lactococcus lactis metabolism for mannitol and 2,3-butanediol production, Mol. BioSys. 10 (3) (2014) 628–639. [15] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, Perth, Australia, 1995, pp. 1942–1948. [16] P.C. Naval, L.G. Sison, E.R. Mendoza, Metabolic network parameter inference using particle swarm optimization, in: International Conference on Molecular Systems Biology, 2006, pp. 3–4. [17] R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B (Methodol.) 58 (1) (1996) 267–288. [18] H. Ohlsson, Regularization for Sparseness and Smoothness and Signal Processing (Ph.D. thesis), Linköping University, SE-581 83 Linköping, Sweden, 2010. [19] J. Ndukum, L.L. Fonseca, H. Santos, E.O. Voit, S. Datta, Statistical inference methods for sparse biological time series data, BMC Syst. Biol. 5 (1) (2011) 57. [20] C.A. Coello Coello, M. Reyes-Sierra, Multi-objective particle swarm optimizers: a survey of the state-of-the-art, Int. J. Comput. Intell. Res. 2 (3) (2006) 287–308. [21] S. Audoly, G. Bellu, L. D'angiò, M.P. Saccomani, C. Cobelli, L. D'Angio, Global identifiability of nonlinear models of biological systems, IEEE Trans. Biomed. Eng. 48 (1) (2001) 55–65. [22] J. Jacquez, T. Perry, Parameter estimation: local identifiability of parameters, Am. J. Physiol. 258 (4 Pt. 1) (1990) E727–E736. [23] B.C. Daniels, Y.-J. Chen, J.P. Sethna, R.N. Gutenkunst, C.R. Myers, Sloppiness, robustness, and evolvability in systems biology, Curr. Opin. Biotechnol. 19 (4) (2008) 389–395.

25

Time (min) Fig. B2. Cross-validation: predicting the 80 mM experiment by fitting to the 20 and the 40 mM experiment. Blue curves are the fittings, black curves are the reconstruction with the different parameter estimates, the green curve is the reconstruction with the median of the estimates. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Please cite this article as: A. Hartmann, et al., Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics, Comput. Biol. Med. (2014), http://dx.doi.org/10.1016/j.compbiomed.2014.08.027i

Modeling multiple experiments using regularized optimization: A case study on bacterial glucose utilization dynamics.

The aim of inverse modeling is to capture the systems׳ dynamics through a set of parameterized Ordinary Differential Equations (ODEs). Parameters are ...
891KB Sizes 3 Downloads 3 Views