
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 25, NO. 5, MAY 2014

Modeling of Batch Processes Using Explicitly Time-Dependent Artificial Neural Networks

Botla Ganesh, Vadlagattu Varun Kumar, and Kalipatnapu Yamuna Rani

Abstract— A neural network architecture incorporating time dependency explicitly, proposed recently for modeling nonlinear nonstationary dynamic systems, is further developed in this paper, and three alternate configurations are proposed to represent the dynamics of batch chemical processes. The first configuration consists of L subnets, each having M inputs representing the past samples of the process inputs and output; each subnet has a hidden layer with a polynomial activation function, and the outputs of the hidden layer are combined and acted upon by an explicitly time-dependent modulation function. The outputs of all the subnets are summed to obtain the output prediction. In the second configuration, additional weights are incorporated to obtain a more generalized model. In the third configuration, the subnets are eliminated by incorporating an additional hidden layer consisting of L nodes. A backpropagation learning algorithm is formulated for each of the proposed neural network configurations to determine the weights, the polynomial coefficients, and the modulation function parameters. The modeling capability of the proposed configurations is evaluated by employing them to represent the dynamics of a batch reactor in which a consecutive reaction takes place. The results show that all three time-varying neural network configurations are able to represent the batch reactor dynamics accurately, and the third configuration exhibits comparable or better performance than the other two while requiring a much smaller number of parameters. The modeling ability of the third configuration is further validated by applying it to a semibatch polymerization reactor challenge problem. This paper illustrates that the proposed approach can be applied to represent the dynamics of any batch/semibatch process.

Index Terms— Batch reactor, explicitly time-dependent neural networks, modulation function, nonstationary dynamic modeling, semibatch polymerization reactor.

I. INTRODUCTION

There has been a shift in the focus of the chemical industry from the mass production of low-value products to the small-scale production of high-value products during the past two decades, and there has been tremendous growth in the chemical industry involving the use of batch and semibatch processes because of the demand for a large number of

specialty chemicals. Operation in semibatch mode is typically used for strongly exothermic reactions because one can balance the reaction heat against the available cooling of the reactor through adjustment of the feed rate. Because batch processing is predominantly used in the manufacture of low-volume and high-value products, even a marginal increase in the product yield can lead to a considerable improvement in profitability.

Manuscript received August 17, 2012; revised July 22, 2013; accepted September 30, 2013. Date of publication October 24, 2013; date of current version April 10, 2014. This work was supported in part by the Department of Science and Technology, Government of India, New Delhi, in part by the CSIR XII Plan Project INDUS MAGIC, and in part by CSIR through a research fellowship. B. Ganesh and K. Y. Rani are with the Process Dynamics and Control Group, Chemical Engineering Division, Indian Institute of Chemical Technology, Hyderabad 500607, India (e-mail: [email protected]; [email protected]). V. V. Kumar was with the Process Dynamics and Control Group, Chemical Engineering Division, Indian Institute of Chemical Technology, Hyderabad 500607, India. He is now with Rashtriya Chemicals and Fertilizers Limited (e-mail: [email protected]). Digital Object Identifier 10.1109/TNNLS.2013.2285242

Artificial neural networks (ANNs) are typical input–output mapping structures, and can be categorized as one of the most important classes of black box models describing input–output functional relationships. In chemical engineering, some important applications of ANNs are in the fields of fault diagnosis in chemical plants, dynamic modeling of chemical processes, system identification and control, sensor data analysis, chemical composition analysis, and inferential control [1]. Himmelblau [2] presented his experiences in the application of ANNs in the area of chemical engineering with the help of four examples on fault detection, prediction of polymer quality, data rectification, and modeling and control. The inherent capability of ANNs to handle general nonlinear relationships has led to their extensive use in different applications. Recent applications include disease detection [3], estimation of physical properties such as viscosities of ionic liquids [4] and vapor pressures and densities [5], estimation of suspended sediment concentration in water [6], and so on. The ability of neural networks to learn complex nonlinear relationships, even when the input information is noisy and imprecise, designates them as potential problem solvers in the chemical process industries. ANNs are viewed as effective tools for solving problems such as pattern classification, speech recognition, image analysis, forecasting, and nonlinear or nonstationary system modeling.

There are several issues on which research is currently being carried out in the area of ANN-based modeling, namely, a unified method for the selection of the network architecture, the initial values of the weights, the learning rate parameter, the momentum factor, and so on. Some guidelines proposed for certain cases are reported in [7]. Rani and Rao [8] reported a design and training method for the selection and development of ANNs and evaluated its performance for two fermentation case studies. Himmelblau [2] discussed several issues relevant to ANNs, including overfitting and underfitting, smoothing, and so on, along with methods of assessing the performance of ANNs. More recent studies in the area of training methods for ANNs include an adaptive algorithm for radial basis function networks [9] and a parameter-free simplified swarm optimization method for time-series predictions [10]. An alternate ANN structure of Cohen–Grossberg neural networks has also been


recently reported, along with training methods and stability analysis studies [11]–[14].

The most commonly employed ANN structure for dynamic modeling of chemical processes is a multilayer perceptron in the form of a difference equation, where the past sampled values of the input and output variables of the process become the inputs to the ANN model, and the present output variable sample becomes its output. This is a straightforward extension of the autoregressive exogenous (ARX) model structure to the nonlinear ARX form using ANN models. This structure is static in nature and is very well suited for continuous processes.

Batch and semibatch processes are inherently dynamic or time varying. Nonstationarity implies that the same set of state variables at two different times leads to different sets of state variables at the next sampling time. Static ANNs rely on the relation between the states and input variables at two consecutive sampling instants. Therefore, with the same set of state variables, the predicted ANN outputs for the state variables at the next sampling instant remain the same irrespective of the time to which such state variables correspond. Thus, nonstationarity, common in batch/semibatch processes, cannot be captured using static ANNs.

One of the ANN architectures used frequently in process dynamics and control is the recurrent neural network structure, where there are time-delayed feedback connections to previous layers; these networks are further classified as internally or externally recurrent depending on whether the feedback connections are from the hidden layer(s) to the input layer, or from the output layer to the hidden layer(s), respectively [2]. This structure can be viewed as an implicit way of handling time dependency in dynamic processes.

A data-driven modeling approach especially suitable for batch/semibatch processes has been proposed [15] based on abundantly available historic process data, where the time-dependent input and output trajectories are parameterized, and the parameterized variables are related using static neural networks. Chang and Hung [16] have proposed the use of neural network rate-function models for the optimization of batch polymerization reactors. The output derivatives with respect to time (generated by approximating the outputs as nonlinear functions of time) from a few typical batches are related to the output measurements and the temperature at every sampling instant with the help of a feedforward neural network. These approaches are applicable in situations where the entire batch is considered as a single unit for the purpose of modeling.

Iatrou et al. [17] proposed a scheme for modeling nonlinear nonstationary dynamic systems using ANNs with polynomial activation functions modulated by an appropriate time function, incorporating time dependency explicitly. Further, Iatrou et al. [18] applied their approach to a biomedical application: the prediction of potentiation properties in the rabbit hippocampus. Their approach is limited to single-input single-output (SISO) systems, and is not directly applicable to multiinput systems. In this paper, this explicitly time-dependent neural network model is further developed, and alternate configurations are proposed, which are evaluated with the help of simulation case studies of chemical engineering batch and semibatch processes replicating real processes. The approach is presented in the following section, followed by


the description of the two case studies considered, namely, a batch reactor and a semibatch polymerization reactor. The results on the model predictions and their comparison with the simulated results are discussed next, and finally, conclusions are presented.

II. TIME-VARYING NEURAL NETWORK MODELING

Iatrou et al. [17], [18] proposed an approach that employs an ANN architecture based on the theory of functional expansions of nonlinear, nonstationary systems (time-varying Volterra series). Nonlinear Volterra systems have been represented using tapped-delay feedforward ANNs with a single hidden layer and polynomial activation functions [19]. This method was extended to the modeling and identification of nonstationary, nonlinear Volterra systems by introducing a specific structure of explicit nonstationarity in the equivalent ANN model [17]. This model of explicit nonstationarity contains certain parameters, which are estimated using the backpropagation algorithm and the delta-bar-delta learning rule. Provided that one can select the proper form of nonstationarity in the ANN model for a given application, Iatrou et al. [17] have claimed that this approach can provide a practical solution to a very important and difficult problem, often viewed as intractable.

The time-varying neural network (TVNN) architecture proposed in [17] is limited to SISO systems, and is not directly applicable to multiinput systems. In this paper, three alterations are proposed. First, their approach is extended to multiinput single-output (MISO) systems. Second, a set of additional weights is incorporated into an intermediate layer to make the representation slightly more general. Finally, a compact TVNN structure is formulated, which has common weights connecting the input to the hidden layer, whereas the subnet structure in the next layer is retained as in the second formulation by defining an additional hidden layer. The generalized schematic representations of the three proposed neural network configurations are shown in Figs. 1–3.

Batch chemical processes represent a class of systems that are inherently highly nonlinear and time varying. Furthermore, they are represented by a set of input and output variables that exhibit interacting behavior; for example, a change in temperature is likely to cause a drastic change in the concentration in a batch reactor, although both these variables are categorized as output variables. Therefore, it is necessary to have a model structure incorporating nonlinearity as well as interactions simultaneously with minimum computational effort, and the models presented in this paper are directed toward this objective.

The first configuration (Model 1) is an extended version of the structure proposed in [17], and is shown in Fig. 1. In this model, the network consists of L subnets, and the output of each subnet is multiplied by a different time-varying function and summed to obtain the overall output prediction. Each subnet consists of an input layer consisting of Q nodes, fully connected to a hidden layer consisting of K neurons, where Q is defined as

$$Q = \sum_{i=1}^{m} M_i + m. \qquad (1)$$
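As a concrete illustration of (1), and of the flattened input index p that appears in (2) below, consider the following sketch; the helper names are ours, not the paper's.

```python
# Illustration of eq. (1) and the flattened input index p of eq. (2);
# helper names are illustrative. Each physical input i contributes its present
# sample plus M_i past samples, so the input layer has Q = sum(M_i) + m nodes.

def input_size(lags):                    # lags = (M_1, ..., M_m)
    return sum(lags) + len(lags)         # eq. (1): Q = sum(M_i) + m

def flat_index(i, k, lags):
    """Position p of sample x_i(t - k*ts): p = sum_{d=1..i} M_{d-1} + k + i,
    with M_0 = 0 (both i and p are 1-based, as in the paper)."""
    return sum(lags[:i - 1]) + k + i     # M_1 + ... + M_{i-1} is sum(lags[:i-1])

lags = (1, 2)                            # m = 2 inputs with M_1 = 1, M_2 = 2
print(input_size(lags))                  # 5
print([flat_index(i, k, lags) for i in (1, 2) for k in range(lags[i - 1] + 1)])
# [1, 2, 3, 4, 5]: the index rule fills each of the Q input slots exactly once
```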


Fig. 1. Schematic representation of time-dependent neural network (Model 1). S: summation. P: product. p(S): polynomial function of summation value.

The input layer has m physical input variables, each consisting of the present measurement and M_i previous measurements for the ith input variable. The activation function of the neurons in the hidden layer is a polynomial function of order R. The outputs of the hidden layer neurons are summed to obtain the output of each subnet; these subnet outputs are acted upon by (L−1) explicitly time-dependent modulation functions, whereas one of the subnets represents the time-independent part. The outputs of all the subnets are summed to give the overall output of the network. In the model proposed in [17], the inputs to each subnet consist of the present and past measurements of a single input variable, whereas in Model 1 in this paper, the past measurements of several input variables are considered as inputs, thus making it an MISO model. Further, the inputs to each subnet can also include present and past measurements of the process output variables.

The network in Model 1 has weights connecting only the input layer of each subnet with the polynomial activation function-based hidden layer of the subnet, whereas the connections between the hidden layer and each subnet output, and between the subnet outputs and the overall network output, do not carry any weights. Therefore, a second configuration is proposed as a fully connected network by incorporating weights at all the connections. In this new configuration (referred to as Model 2), there are two additional sets of weights, one acting at each output neuron of the hidden layer of each subnet and the other acting upon each subnet output. This type of network is considered to make the model more generalized. This model structure is shown in Fig. 2.

Fig. 2. Schematic representation of time-dependent neural network (Model 2). S: summation. P: product. p(S): polynomial function of summation value.

Fig. 3. Schematic representation of time-dependent neural network with single subnet (Model 3). S: summation. P: product. p(S): polynomial function of summation value.

From Models 1 and 2, it can be observed that each subnet has a repeated version of the input-to-hidden-layer part. The hidden layer in each subnet is connected to a single subnet output, which is acted upon by a modulation function. It is possible to arrive at a compact structure by considering the input-to-hidden-layer part common to all the subnets and connecting each output of the hidden layer to each subnet output. This configuration results in another model structure, denoted as Model 3, consisting of an input layer, two hidden layers, and an output layer, and is shown in Fig. 3. In this model, the number of weights is almost halved when compared with Model 2 (see the parameter-count sketch below), thus making the network simpler and more compact. Mathematically, the three models, together with the corresponding expressions for the estimation of the weights using the backpropagation algorithm with a momentum factor, can be represented as follows.
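A rough parameter count supports this compactness claim. The tallies below simply follow the structures as described above; the dimensions chosen are illustrative assumptions, not the case-study values.

```python
# Parameter tallies implied by the three structures described above; the
# dimensions are illustrative assumptions, not values from the paper's Table I.

def n_params_model1(Q, K, R, L):
    # per-subnet input weights and polynomial coefficients, plus (L-1) p_l
    return L * K * Q + L * K * (R + 1) + (L - 1)

def n_params_model2(Q, K, R, L):
    # Model 1 plus the hidden-to-subnet weights v_lj and subnet weights z_l
    return n_params_model1(Q, K, R, L) + L * K + L

def n_params_model3(Q, K, R, L):
    # shared input weights/coefficients, plus v_lj, z_l, and (L-1) p_l
    return K * Q + K * (R + 1) + L * K + L + (L - 1)

Q, K, R, L = 4, 4, 3, 2
print(n_params_model2(Q, K, R, L))   # 75 with these assumed dimensions
print(n_params_model3(Q, K, R, L))   # 43, roughly half, as noted above
```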


A. Model 1

1) Input to Hidden Layer of Each Subnet, $H_{lj}^{\mathrm{in}}$:

$$H_{lj}^{\mathrm{in}} = \sum_{i=1}^{m} \sum_{k=0}^{M_i} w_{ljp}\, x_i(t - k t_s), \quad p = \sum_{d=1}^{i} M_{d-1} + k + i, \ M_0 = 0; \quad j = 1, 2, \ldots, K; \ l = 1, \ldots, L \qquad (2)$$

where $H_{lj}^{\mathrm{in}}$ is the input to the $j$th hidden unit of the $l$th subnet at time $t$, and $w_{ljp}$ is the weight connecting the $p$th input to the $j$th hidden node of the $l$th subnet.

2) Output from Hidden Layer of Each Subnet, $H_{lj}^{\mathrm{out}}$:

$$H_{lj}^{\mathrm{out}} = \sum_{r=0}^{R} c_{ljr} \left(H_{lj}^{\mathrm{in}}\right)^{r}, \quad j = 1, 2, \ldots, K; \ l = 1, \ldots, L \qquad (3)$$

where $H_{lj}^{\mathrm{out}}$ is the output of the $j$th hidden unit of the $l$th subnet at time $t$, and $c_{ljr}$ are the coefficients of the polynomial activation function in the $j$th hidden unit of the $l$th subnet.

3) Output from Each Subnet, $S_l$:

$$S_l = \sum_{j=1}^{K} H_{lj}^{\mathrm{out}}, \quad l = 1, \ldots, L \qquad (4)$$

where $S_l$ is the output of the $l$th subnet (with $K$ hidden units).

4) Output from Modulation Layer, $q_l$:

$$q_l = S_l f_l(t), \quad l = 1, \ldots, L, \qquad \text{where } f_1(t) = 1, \quad f_l(t) = \exp[-p_l(k) \cdot \mathrm{time}], \ l = 2, \ldots, L \qquad (5)$$

where $q_l$ is the output from the modulation layer of the $l$th subnet. Note that for the stationary path $f_1(t) = 1$, and alternate modulation functions can be chosen for the other subnets for time dependency.

5) Time-Dependent Network Output, $\tilde{y}$:

$$\tilde{y}(t + t_s) = \sum_{l=1}^{L} q_l. \qquad (6)$$

Using the chain rule of differentiation, the following updating expressions are derived for the weights. For the $l$th subnet, to update the weight $p_l$ of the modulation function, the following expression is used:

$$\Delta p_l^{(kk)} = \mu_p\, \Delta p_l^{(kk-1)} + \rho_p\, [y(t+t_s) - \tilde{y}(t+t_s)]\, S_l \cdot \mathrm{time} \cdot \exp[-p_l(k) \cdot \mathrm{time}], \quad l = 1, \ldots, L. \qquad (7)$$

The updating expressions for the polynomial coefficients of the activation functions, $c_{ljr}$, can be written as

$$\Delta c_{ljr}^{(kk)} = \mu_c\, \Delta c_{ljr}^{(kk-1)} + \rho_c\, [y(t+t_s) - \tilde{y}(t+t_s)]\, f_l(t) \left(H_{lj}^{\mathrm{in}}\right)^{r}, \quad l = 1, \ldots, L; \ j = 1, \ldots, K; \ r = 0, \ldots, R. \qquad (8)$$

The expressions to update the weights connecting the input nodes to the hidden neurons, $w_{ljp}$, can be expressed as

$$\Delta w_{ljp}^{(kk)} = \mu_w\, \Delta w_{ljp}^{(kk-1)} + \rho_w\, [y(t+t_s) - \tilde{y}(t+t_s)]\, f_l(t)\, x_i(t - k t_s) \sum_{r=0}^{R} r\, c_{ljr} \left(H_{lj}^{\mathrm{in}}\right)^{r-1}$$
$$l = 1, \ldots, L; \ j = 1, \ldots, K; \quad p = \sum_{d=1}^{i} M_{d-1} + k + i, \ M_0 = 0; \ i = 1, \ldots, m \qquad (9)$$

where $\tilde{y}(t+t_s)$ is the predicted output from the neural network, $y(t+t_s)$ is the measured output of the system, $kk$ denotes the iteration number, and $\rho_p, \mu_p$; $\rho_c, \mu_c$; and $\rho_w, \mu_w$ are the learning rates and momentum factors for the modulation function parameters, the polynomial coefficients, and the weights of each subnet of the TVNN, respectively; $\Delta p$, $\Delta c$, and $\Delta w$ are the changes to be added to the respective parameters to obtain the updated values.
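To make the flow of (2)–(6) concrete, the following is a minimal NumPy sketch of the Model 1 forward pass; the dimensions, seeding, and parameter values are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of the Model 1 forward pass, eqs. (2)-(6); the dimensions and
# random parameter values below are illustrative, not taken from the paper.

def model1_forward(x_lagged, W, C, p_mod, t):
    """x_lagged : (Q,) present/past input samples, laid out per eqs. (1)-(2).
    W : (L, K, Q) input-to-hidden weights, one block per subnet.
    C : (L, K, R+1) polynomial coefficients c_ljr.
    p_mod : (L-1,) modulation parameters p_l for subnets 2..L.
    t : elapsed batch time ('time' in eq. (5))."""
    L, K, R1 = C.shape
    y_hat = 0.0
    for l in range(L):
        h_in = W[l] @ x_lagged                   # eq. (2): H_lj_in, shape (K,)
        powers = h_in[:, None] ** np.arange(R1)  # (K, R+1) powers of H_lj_in
        h_out = (C[l] * powers).sum(axis=1)      # eq. (3): polynomial activation
        s_l = h_out.sum()                        # eq. (4): subnet output
        f_l = 1.0 if l == 0 else np.exp(-p_mod[l - 1] * t)   # eq. (5)
        y_hat += s_l * f_l                       # eqs. (5)-(6): modulate and sum
    return y_hat

rng = np.random.default_rng(0)
L, K, R, Q = 2, 3, 3, 4                          # toy sizes; Q = sum(M_i) + m
y = model1_forward(rng.normal(size=Q),
                   0.1 * rng.normal(size=(L, K, Q)),
                   0.1 * rng.normal(size=(L, K, R + 1)),
                   np.array([0.05]), t=10.0)
print(y)
```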

B. Model 2

As discussed earlier, in this model the network configuration is fully connected by weights. The equations representing the input to the hidden layer of each subnet and the output from the hidden layer of each subnet are given by (2) and (3), respectively, as in Model 1.

1) Output from Each Subnet, $S_l$:

$$S_l = \sum_{j=1}^{K} v_{lj} H_{lj}^{\mathrm{out}}, \quad l = 1, \ldots, L \qquad (10)$$

where $v_{lj}$ are the weights connecting the $j$th hidden unit of the $l$th subnet to the subnet output.

2) Output from Modulation Layer, $q_l$:

$$q_l = S_l f_l(t), \quad l = 1, \ldots, L, \qquad \text{where } f_1(t) = 1, \quad f_l(t) = \exp[-p_l(k) \cdot \mathrm{time}], \ l = 2, \ldots, L. \qquad (11)$$

Note that for the stationary path $f_1(t) = 1$, and similar to the case of Model 1, alternate modulation functions can be chosen for the other subnets for time dependency.

3) Time-Dependent Network Output:

$$\tilde{y}(t + t_s) = \sum_{l=1}^{L} z_l q_l \qquad (12)$$

where $z_l$ is the weight connecting the $l$th subnet to the overall output of the network. The expressions for updating the different parameters and weights are given as follows.

4) Weights Connecting the Output of Each Subnet to the Overall Network Output, $z_l$:

$$\Delta z_l^{(kk)} = \mu_z\, \Delta z_l^{(kk-1)} + \rho_z\, [y(t+t_s) - \tilde{y}(t+t_s)]\, q_l, \quad l = 1, \ldots, L. \qquad (13)$$


5) Modulation Function Parameter of Each Subnet, $p_l$:

$$\Delta p_l^{(kk)} = \mu_p\, \Delta p_l^{(kk-1)} + \rho_p\, [y(t+t_s) - \tilde{y}(t+t_s)]\, z_l\, S_l \cdot \mathrm{time} \cdot \exp[-p_l(k) \cdot \mathrm{time}], \quad l = 1, \ldots, L. \qquad (14)$$

6) Weights Connecting the Hidden Layer and the Output of Each Subnet, $v_{lj}$:

$$\Delta v_{lj}^{(kk)} = \mu_v\, \Delta v_{lj}^{(kk-1)} + \rho_v\, [y(t+t_s) - \tilde{y}(t+t_s)]\, z_l\, f_l(t)\, H_{lj}^{\mathrm{out}}, \quad l = 1, \ldots, L; \ j = 1, \ldots, K. \qquad (15)$$

7) Polynomial Coefficients of the Activation Functions of Each Subnet, $c_{ljr}$:

$$\Delta c_{ljr}^{(kk)} = \mu_c\, \Delta c_{ljr}^{(kk-1)} + \rho_c\, [y(t+t_s) - \tilde{y}(t+t_s)]\, z_l\, f_l(t)\, v_{lj} \left(H_{lj}^{\mathrm{in}}\right)^{r}, \quad l = 1, \ldots, L; \ j = 1, \ldots, K; \ r = 0, \ldots, R. \qquad (16)$$

8) Weights Connecting the Input and Hidden Layers of Each Subnet, $w_{ljp}$:

$$\Delta w_{ljp}^{(kk)} = \mu_w\, \Delta w_{ljp}^{(kk-1)} + \rho_w\, [y(t+t_s) - \tilde{y}(t+t_s)]\, z_l\, f_l(t)\, v_{lj}\, x_i(t - k t_s) \sum_{r=0}^{R} r\, c_{ljr} \left(H_{lj}^{\mathrm{in}}\right)^{r-1}$$
$$l = 1, \ldots, L; \ j = 1, \ldots, K; \quad p = \sum_{d=1}^{i} M_{d-1} + k + i, \ M_0 = 0; \ i = 1, \ldots, M_i \qquad (17)$$

where $\rho_p$ and $\mu_p$, $\rho_c$ and $\mu_c$, $\rho_z$ and $\mu_z$, $\rho_v$ and $\mu_v$, and $\rho_w$ and $\mu_w$ are the learning rates and momentum factors for the modulation function parameters, the polynomial coefficients, the weights connecting each subnet to the overall output of the TVNN, the weights between the hidden layer and the output of each subnet, and the input-to-hidden weights, respectively, and $\Delta p$, $\Delta c$, $\Delta z$, $\Delta v$, and $\Delta w$ are the changes in the respective weights.
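Note that all of the update rules (7)–(17), and (23)–(27) below, share a single delta-rule-with-momentum pattern: the new change equals the momentum factor times the previous change, plus the learning rate times the prediction error times the sensitivity of the prediction to the parameter. A hedged sketch of that pattern, with illustrative names:

```python
import numpy as np

# The update pattern common to eqs. (7)-(17) and (23)-(27): for any parameter
# block theta,  d_theta_new = mu * d_theta_prev + rho * e * dyhat_dtheta  and
# theta_new = theta + d_theta_new, where e = y - yhat. Names are illustrative.

def momentum_delta_update(theta, d_theta_prev, e, dyhat_dtheta, rho, mu):
    """One backpropagation step with momentum for a TVNN parameter block."""
    d_theta = mu * d_theta_prev + rho * e * dyhat_dtheta
    return theta + d_theta, d_theta

# Example: the subnet-to-output weights z_l of Model 2, eq. (13), for which the
# sensitivity dyhat/dz_l is q_l, the modulation-layer output vector.
z = np.array([0.5, -0.2])
q = np.array([1.3, 0.7])
z, dz = momentum_delta_update(z, np.zeros_like(z), e=0.1,
                              dyhat_dtheta=q, rho=0.01, mu=0.5)
print(z, dz)
```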

C. Model 3

This model structure is a compact version of Model 2, and is represented by the following equations.

1) Input to First Hidden Layer, $H_j^{\mathrm{in}}$:

$$H_j^{\mathrm{in}} = \sum_{i=1}^{m} \sum_{k=0}^{M_i} w_{jp}\, x_i(t - k t_s), \quad p = \sum_{d=1}^{i} M_{d-1} + k + i, \ M_0 = 0; \quad j = 1, 2, \ldots, K \qquad (18)$$

where $H_j^{\mathrm{in}}$ is the input to the $j$th hidden unit at time $t$, and $w_{jp}$ is the weight connecting the $p$th input to the $j$th hidden node of the network.

2) Output from First Hidden Layer, $H_j^{\mathrm{out}}$:

$$H_j^{\mathrm{out}} = \sum_{r=0}^{R} c_{jr} \left(H_j^{\mathrm{in}}\right)^{r}, \quad j = 1, 2, \ldots, K. \qquad (19)$$

3) Output from Second Hidden Layer, $S_l$:

$$S_l = \sum_{j=1}^{K} v_{lj} H_j^{\mathrm{out}}, \quad l = 1, \ldots, L \qquad (20)$$

where $S_l$ is the $l$th output of the second hidden layer (with $L$ units) and $v_{lj}$ are the weights connecting the $j$th unit in the first hidden layer to the $l$th node in the second hidden layer.

4) Output from Modulation Layer, $q_l$:

$$q_l = S_l f_l(t), \quad l = 1, \ldots, L, \qquad \text{where } f_1(t) = 1, \quad f_l(t) = \exp[-p_l(k) \cdot \mathrm{time}], \ l = 2, \ldots, L \qquad (21)$$

where $q_l$ is the output from the $l$th node in the modulation layer. Note that for the stationary path $f_1(t) = 1$, and similar to the cases of Models 1 and 2, alternate modulation functions can be chosen for the other nodes in the modulation layer for time dependency.

5) Time-Dependent Network Output:

$$\tilde{y}(t + t_s) = \sum_{l=1}^{L} z_l q_l \qquad (22)$$

where $z_l$ is the weight connecting the $l$th node in the modulation layer to the overall output of the network. The expressions for updating the different parameters and weights are given as follows.

6) Weights Connecting the Modulation Layer to the Overall Network Output, $z_l$:

$$\Delta z_l^{(kk)} = \mu_z\, \Delta z_l^{(kk-1)} + \rho_z\, [y(t+t_s) - \tilde{y}(t+t_s)]\, q_l, \quad l = 1, \ldots, L. \qquad (23)$$

7) Modulation Function Parameter of Each Unit of the Modulation Layer, $p_l$:

$$\Delta p_l^{(kk)} = \mu_p\, \Delta p_l^{(kk-1)} + \rho_p\, [y(t+t_s) - \tilde{y}(t+t_s)]\, z_l\, S_l \cdot \mathrm{time} \cdot \exp[-p_l(k) \cdot \mathrm{time}], \quad l = 1, \ldots, L. \qquad (24)$$

8) Polynomial Coefficients of the Activation Functions in the First Hidden Layer, $c_{jr}$:

$$\Delta c_{jr}^{(kk)} = \mu_c\, \Delta c_{jr}^{(kk-1)} + \rho_c\, [y(t+t_s) - \tilde{y}(t+t_s)] \left(H_j^{\mathrm{in}}\right)^{r} \sum_{k=1}^{L} z_k f_k(t)\, v_{kj}, \quad j = 1, \ldots, K; \ r = 0, \ldots, R. \qquad (25)$$

9) Weights Connecting the First and the Second Hidden Layers, $v_{lj}$:

$$\Delta v_{lj}^{(kk)} = \mu_v\, \Delta v_{lj}^{(kk-1)} + \rho_v\, [y(t+t_s) - \tilde{y}(t+t_s)]\, z_l\, f_l(t)\, H_j^{\mathrm{out}}, \quad l = 1, \ldots, L; \ j = 1, \ldots, K. \qquad (26)$$


10) Weights Connecting the Input and the First Hidden Layer, $w_{jp}$:

$$\Delta w_{jp}^{(kk)} = \mu_w\, \Delta w_{jp}^{(kk-1)} + \rho_w\, [y(t+t_s) - \tilde{y}(t+t_s)]\, x_i(t - k t_s) \left[\sum_{l=1}^{L} z_l f_l(t)\, v_{lj}\right] \left[\sum_{r=1}^{R} r\, c_{jr} \left(H_j^{\mathrm{in}}\right)^{r-1}\right]$$
$$j = 1, \ldots, K; \quad p = \sum_{d=1}^{i} M_{d-1} + k + i, \ M_0 = 0; \ i = 1, \ldots, M_i \qquad (27)$$

where $\tilde{y}(t+t_s)$ is the predicted output from the neural network, $y(t+t_s)$ is the measured output of the system, $kk$ denotes the update index, and $\rho_p, \mu_p$; $\rho_c, \mu_c$; $\rho_z, \mu_z$; $\rho_v, \mu_v$; and $\rho_w, \mu_w$ are the learning rates and momentum factors for the modulation function parameters, the polynomial coefficients, and the weights of the TVNN, respectively; $\Delta p$, $\Delta c$, $\Delta z$, $\Delta v$, and $\Delta w$ are the changes in the respective weights.

In all three models, the initial weights are a set of random numbers, which are iteratively updated as per the equations reported above, based on the convergence of the square of the error between the measured and predicted values

$$\mathrm{error} = [y(t+t_s) - \tilde{y}(t+t_s)]^2. \qquad (28)$$

Note that the weights are updated at each time point $t_s$, and after a complete presentation of the entire training set, successive iterations continue until there is no further improvement in the prediction performance of the network. The procedure for training all three models is represented in the form of the flowchart shown in Fig. 4, and a code skeleton of this cycle is sketched below. After the development of the networks, they are validated using one independent data set, which is not used for training or testing.

Fig. 4. Flow chart for training of TVNNs.
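A skeleton of this training cycle follows, assuming hypothetical `forward` and `updates` callbacks for whichever of Models 1–3 is being trained; the function names and the simplified stopping logic are ours, condensed from the flowchart of Fig. 4.

```python
import numpy as np

# Skeleton of the Fig. 4 training cycle. `forward` and `updates` stand in for
# the model-specific eqs. (2)-(6)/(10)-(12)/(18)-(22) and (7)-(9)/(13)-(17)/
# (23)-(27); all names and the stopping logic here are illustrative.

def train_tvnn(params, deltas, samples, forward, updates,
               threshold=0.002, max_passes=5000):
    """samples : iterable of (x_lagged, t, y) tuples over the training batches."""
    best_sse = np.inf
    sse = np.inf
    for _ in range(max_passes):
        sse = 0.0
        for x_lagged, t, y in samples:      # one presentation of the training set
            y_hat = forward(params, x_lagged, t)
            e = y - y_hat                   # prediction error
            sse += e * e                    # accumulate eq. (28) over the samples
            params, deltas = updates(params, deltas, e, x_lagged, t)
        if sse < threshold:                 # converged to the desired accuracy
            break
        if sse >= best_sse:                 # no further improvement: stop
            break
        best_sse = sse
    return params, sse
```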

III. SIMULATION CASE STUDIES

A. Batch Reactor Case Study

The modeling ability of the proposed time-dependent neural network structure is evaluated by using it to represent the dynamics of a batch reactor reported in [20]. The reactor consists of a jacketed arrangement into which either steam or cooling water can be circulated, depending on the requirement, with the help of a split-ranged valve. Reactant is charged into the vessel. Steam is fed into the jacket to bring the reaction mass to a desired temperature. Then, cooling water is added to the jacket to remove the exothermic heat of reaction and to track the reactor temperature along the prescribed temperature–time curve, fed as a variable temperature set point profile to the controller. First-order consecutive reactions of the following form are considered to occur in the batch reactor [20]:

$$A \xrightarrow{k_1} B \xrightarrow{k_2} C.$$
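For orientation only, a minimal isothermal sketch of the component balances implied by this scheme follows; the rate constants, step size, and initial charge are assumptions, and the full model in [20] couples these balances with reactor, metal-wall, and jacket energy balances and temperature-dependent rate constants.

```python
import numpy as np

# Isothermal sketch of the component balances for A -> B -> C with first-order
# steps k1, k2; all numerical values are illustrative assumptions only.

def consecutive_reaction_rhs(c, k1=0.05, k2=0.02):
    """c = [CA, CB]; returns d[CA, CB]/dt (per minute)."""
    ca, cb = c
    return np.array([-k1 * ca,              # A consumed by the first reaction
                     k1 * ca - k2 * cb])    # B formed from A, consumed to C

c = np.array([1.0, 0.0])                    # initial charge: pure A
for _ in range(100):                        # explicit Euler, 1-min steps
    c = c + 1.0 * consecutive_reaction_rhs(c)
print(c)                                    # CA, CB after a 100-min batch
```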

The desired product is the intermediate B. The system can be described with the help of differential equations comprising component balances for A and B in the reactor, energy balances for the reactor, the metal wall, and the jacket (steam or cold water), and some algebraic equations. The equations for the jacket are different for the three phases of the batch cycle: with steam in the jacket, with the jacket partially filled with water, and with the jacket filled with water. A proportional controller is employed to control the reactor temperature along a predefined set point trajectory consisting of a constant high initial set value and a falling ramp later. The temperature transmitter has a range of 50 °F–250 °F, so its output pneumatic pressure signal ranges between 20.7 kPa at 50 °F and 103.4 kPa at 250 °F. The controller operates as a split-range controller where either the steam valve or the water valve is opened depending on the controller output pneumatic signal. This pneumatic signal is predefined as a series of step changes for open-loop operation, or defined with the help of a proportional integral derivative (PID) control law for closed-loop operation.

B. Multiproduct Semibatch Polymerization Reactor Challenge Problem

The second system considered in this paper is an industrial semibatch polymerization reactor problem. An industrial case study for the temperature control of a multiproduct semibatch polymerization reactor has been presented in [21] as a challenge problem. The system consists of a stirred tank reactor used to prepare specialty emulsion polymers. During typical reactor operation, four or five batches of a particular product are made in succession. Afterward, the reactor is cleaned before changing to a new product. Data have been provided for two different products (called products A and B), which are representative of the large variety of products made. This paper is restricted to product A only. The operating recipe for product A is as follows:

1) place the initial charge of solids and water into the reactor at ambient temperature;
2) raise the temperature of the initial charge to the reaction temperature set point;
3) feed pure monomer into the reactor at 1.0 lb/min for 70 min;
4) after the feed addition period is complete, hold at the reaction temperature for 60 min.


The differential-algebraic equations describing this system and the details of the reactor and recirculation data have been defined in the original problem [21], and are not included here for the sake of brevity.

IV. RESULTS AND DISCUSSION

A. Batch Reactor

For the purpose of modeling the batch reactor system described in the previous section using the time-dependent neural network structures (Models 1–3), the temperature of the reactor is considered as the output variable, and the previous measured values of the reactor temperature and the controller output pressure are considered as input variables. Six data sets are generated for the given batch process, of which two are closed-loop data sets (i.e., with a PID controller with a proportional gain of 0.6, an integral time of 400 s, and a derivative time of 2.7 s) and four are open-loop data sets obtained by giving a series of step changes in the pressure input.

The data sets, generated through simulation of the batch reactor model [20], for training consist of 380 data points sampled every minute covering four batches, whereas the data sets for testing and validation consist of a total of 190 data points covering two batches. For training, three open-loop and one closed-loop data sets are employed, whereas the remaining data sets are used for testing and validation. The backpropagation algorithm is employed to train the network based on the training data set, using a convergence criterion requiring the sum of squared deviations in the predictions of the training and testing data to fall below a threshold value (set to 0.002). Then, these trained weights are used for testing and validation on the remaining two data sets.

The three models described in the previous section are trained to achieve a final sum of squared deviations in predictions less than or equal to the threshold value. The prediction results for all three models for the training and testing data sets are shown in Figs. 5–10. The figures clearly show that the prediction performance of all three time-varying network models is good for the training data as well as the testing data.

Fig. 5. Actual and predicted output temperature trajectories using Model 1 for batch reactor (training data).

Fig. 6. Actual and predicted output temperature trajectories using Model 1 for batch reactor (testing and validation data).

Fig. 7. Actual and predicted output temperature trajectories using Model 2 for batch reactor (training data).

Fig. 8. Actual and predicted output temperature trajectories using Model 2 for batch reactor (testing and validation data).

Fig. 9. Actual and predicted output temperature trajectories using Model 3 for batch reactor (training data).

Fig. 10. Actual and predicted output temperature trajectories using Model 3 for batch reactor (testing and validation data).

To assess the relative performance of the three models, the Akaike information criterion (AIC) is employed, which is defined as

$$\mathrm{AIC} = N \log\!\left(\frac{\mathrm{SSE}}{N}\right) + 2 N_p \qquad (29)$$

where $N$ is the number of (training) data points, SSE is the sum of squared prediction errors, and $N_p$ is the total number of weights. The AIC values for all the models are reported in Table I. In Model 3, it was found that the SSE did not converge to the desired value when the number of neurons in the first hidden layer was taken as three; with four neurons in the hidden layer, the error converged to the threshold value. Table I shows that the AIC value for Model 3 is the minimum, since the SSE values for the training and testing data for this model are comparable with those of Model 2, but the number of parameters in the model is much smaller. Therefore, for the next system considered, the study is restricted to Model 3.

TABLE I. Comparison of SSE and AIC values for different TVNN models for batch reactor.

The choice of the TVNN design parameters is based on the speed of convergence as well as the minimum achievable value of SSE. Momentum factors are not found to influence either of these factors to a large extent; therefore, a fixed value of 0.5 is employed for all three models. On the other hand, the learning rate parameters are found to influence both these factors to a considerable extent. To explore this effect, different learning rate parameters are employed to achieve the desired convergence, and the SSE is plotted as a function of the iteration number in Fig. 11 as a log–log plot for different learning rate values. It can be observed that a higher learning rate results in faster initial progress, while leading to divergence or oscillatory convergence that obscures the actual end point. Smaller learning rates are found to converge smoothly at the expense of longer computation time due to the large number of iterations. Therefore, it is more appropriate to use an optimum learning rate, which gives the fastest convergence with the minimum number of iterations. From Fig. 11, it can be observed that a learning rate of 0.001 gives the best convergence with the minimum number of iterations.

Fig. 11. Progress of the training for Model 3 with different fixed learning rates for batch reactor.
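For reference, (29) reduces to a one-line computation; the numbers below are hypothetical, not the values of Table I.

```python
import numpy as np

# Helper illustrating eq. (29); the sample numbers are hypothetical and are
# not the values reported in Table I.

def aic(sse, n_points, n_params):
    """Akaike information criterion: N * log(SSE / N) + 2 * Np."""
    return n_points * np.log(sse / n_points) + 2 * n_params

# A compact model (fewer weights) can win on AIC even when its SSE is merely
# comparable to that of a larger model.
print(aic(sse=0.002, n_points=380, n_params=43))   # compact, Model-3-like
print(aic(sse=0.002, n_points=380, n_params=75))   # larger, Model-2-like
```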

B. Semibatch Reactor

To develop the TVNN model describing the semibatch system, data are generated through simulation of the semibatch polymerization reactor model [21] using PID controllers, with a few typical disturbances introduced in the system in the extent of fouling and in summer and winter conditions. For this system, it is not possible to generate open-loop data sets since the process exhibits open-loop instability. The data used for training cover summer conditions with no fouling ($1/h_f = 0.0$), intermediate fouling ($1/h_f = 0.002$), and winter conditions at maximum fouling ($1/h_f = 0.004$), and the data for testing cover summer conditions with low fouling ($1/h_f = 0.001$) and winter conditions with high fouling ($1/h_f = 0.003$). The PID controller parameters used to generate the closed-loop data are a proportional gain of 10, an integral time of 145 min, and a derivative time of 0.125 min. The data are sampled every 0.125 min; the number of data points for training is 3430, covering three batches, and the number of data points for testing and validation is 2290, covering two batches. The threshold value for the sum of squared deviations in predictions is set to 0.002 for this system also.

Among the data on the five batches of product A that are generated, as mentioned above, the data of three batches are employed for training, and the remaining two data sets are used for testing. The temperature of the reactor is considered as the output variable, and the previous samples of the reactor temperature and the control valve opening (expressed as a percentage) are considered as inputs. The network structure corresponds to Model 3 as per the earlier notation, with the current process output as the single output of the model, and with three inputs in the input layer representing the two past process inputs and one past process output. The order of the polynomial function is three, the number of neurons in the first hidden layer is four, and the number of neurons in the second layer is two: one representing the time-independent part and the second representing a time-dependent modulation function (an illustrative encoding of this configuration is sketched after the figures below). The prediction capability of the network for training as well as for testing and validation is shown in Figs. 12 and 13. These figures clearly show that the TVNN is able to predict the process output with reasonable accuracy.

Fig. 12. Actual and predicted output temperature trajectories of Model 3 for the semibatch challenge problem (training data).

Fig. 13. Actual and predicted output temperature trajectories of Model 3 for the semibatch challenge problem (testing and validation data).
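One plausible way to encode the Model 3 configuration just described is sketched below; the class and field names are ours, not the paper's.

```python
from dataclasses import dataclass

# Encoding of the Model 3 configuration described above for the semibatch
# problem; class and field names are illustrative assumptions.

@dataclass
class TVNNConfig:
    m: int          # number of physical input variables fed to the network
    lags: tuple     # M_i past samples per input variable
    K: int          # neurons in the first (polynomial) hidden layer
    R: int          # order of the polynomial activation function
    L: int          # nodes in the second hidden (modulation) layer

    @property
    def Q(self):    # input-layer size, eq. (1)
        return sum(self.lags) + self.m

# Three input-layer nodes (two past process inputs + one past process output),
# K = 4, R = 3, and L = 2 (one stationary path + one modulated path).
semibatch_model3 = TVNNConfig(m=3, lags=(0, 0, 0), K=4, R=3, L=2)
print(semibatch_model3.Q)   # 3
```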

V. CONCLUSION

Three new formulations of an explicitly time-dependent neural network architecture are proposed in this paper for batch/semibatch process modeling. The first formulation is a network that takes multiple process inputs/outputs at previous sampling instants as the inputs to the network, and the single process output at the current sampling instant as the network output. Each subnet consists of an input layer, a hidden layer consisting of neurons with polynomial activation functions, and an output layer with linear summation as the activation function. The outputs of all the subnets are acted upon by different time-dependent modulation functions and are summed to obtain the output prediction. In the second formulation, additional weights are introduced between the hidden layer and the subnet outputs, and also between the subnet outputs and the network output. The third formulation provides a compact single network (without subnets) with two hidden layers. The backpropagation algorithm is used to derive expressions for training all three network formulations.

Batch reactor and semibatch polymerization process case studies are used as test beds to evaluate the modeling capability of the proposed network structures. Data sets representing open- and closed-loop process behaviors of the batch reactor are used for training and testing purposes. All three TVNN models are found to represent the batch reactor dynamics accurately. Model 3 exhibits the minimum SSE and AIC values and has the minimum number of weights, and is therefore considered the best choice among the three models proposed. Therefore, for the semibatch polymerization reactor challenge problem, only Model 3 is considered and developed, and its prediction performance is validated based on data generated through closed-loop tests. The effect of the learning rate on network training is evaluated, and it is found that using an appropriate learning rate results in faster and smoother convergence to the desired accuracy. The data sets employed for the batch reactor include open- and closed-loop data, whereas for the semibatch reactor system, closed-loop data are employed for training, testing, and validation, covering a wide range of operating conditions.


The results obtained in both case studies illustrate that the proposed methodology is suitable for batch/semibatch reactor modeling, and may be attempted for any batch process. These models can be used for real-time control as well as to determine optimal operating policies for batch processes.

REFERENCES

[1] S. K. Arumugasamy and Z. Ahmad, "Elevating model predictive control using feed forward artificial neural networks: A review," Chem. Product Process Model., vol. 4, no. 1, article no. 45, Oct. 2009.
[2] D. M. Himmelblau, "Accounts of experiences in the application of artificial neural networks in chemical engineering," Ind. Eng. Chem. Res., vol. 47, no. 16, pp. 5782-5796, Jul. 2008.
[3] R. Chhabra, S. Kaur, and S. Ghosh, "Diabetes detection using artificial neural networks & back-propagation algorithm," Int. J. Sci. Technol. Res., vol. 2, no. 1, pp. 9-11, Jan. 2013.
[4] N. V. K. Dutt, Y. V. L. R. Kumar, and K. Y. Rani, "Representation of the ionic liquid viscosity-temperature data by generalized correlations and an artificial neural network (ANN) model," Chem. Eng. Commun., vol. 200, no. 12, pp. 1600-1622, Jul. 2013.
[5] A. Moghadassi, M. Nikkholgh, S. Hosseini, and F. Parvizian, "Estimation of vapor pressures, compressed liquid, and supercritical densities for sulfur dioxide using artificial neural networks," Int. J. Ind. Chem., vol. 4, no. 1, pp. 1-8, Feb. 2013.
[6] A. Bayram, M. Kankal, G. Tayfur, and H. Onsoy, "Estimation of suspended sediment concentration from turbidity measurements using artificial neural networks," Environ. Monitor. Assessment, vol. 184, no. 7, pp. 4355-4365, Jul. 2012.
[7] D. R. Baughman and Y. A. Liu, Neural Networks in Bio-Processing and Chemical Engineering. San Diego, CA, USA: Academic, 1995.
[8] K. Y. Rani and V. S. R. Rao, "Neural modeling of biochemical systems using CDTA with adaptive learning rate," Indian J. Chem. Technol., vol. 13, no. 6, pp. 623-633, Nov. 2006.
[9] H. G. Han, "Adaptive computation algorithm for RBF neural network," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 2, pp. 342-347, Feb. 2012.
[10] W. C. Yeh, "New parameter-free simplified swarm optimization for ANN training and its application to prediction of time series," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 4, pp. 661-665, Apr. 2013.
[11] C. L. Castro and A. P. Braga, "Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 888-899, Jun. 2013.
[12] Z. Wang, H. Zhang, and P. Li, "An LMI approach to stability analysis of reaction-diffusion Cohen-Grossberg neural networks concerning Dirichlet boundary conditions and distributed delays," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 6, pp. 1596-1606, Dec. 2010.
[13] J. Zhou, S. Xu, B. Zhang, Y. Zou, and H. Shen, "Robust exponential stability of uncertain stochastic neural networks with distributed delays and reaction-diffusions," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 9, pp. 1407-1416, Sep. 2012.
[14] H. Zhang and Y. Wang, "Stability analysis of Markovian jumping stochastic Cohen-Grossberg neural networks with mixed time delays," IEEE Trans. Neural Netw., vol. 19, no. 2, pp. 366-370, Feb. 2008.
[15] K. Y. Rani and S. C. Patwardhan, "Data-driven modeling and optimization of semi-batch reactors using artificial neural networks," Ind. Eng. Chem. Res., vol. 43, no. 23, pp. 7539-7551, 2004.
[16] J. S. Chang and B. C. Hung, "Optimization of batch polymerization reactors using neural-network rate-function models," Ind. Eng. Chem. Res., vol. 41, no. 11, pp. 2716-2727, May 2002.
[17] M. Iatrou, T. W. Berger, and V. Z. Marmarelis, "Modeling of nonlinear nonstationary dynamic systems with a novel class of artificial neural networks," IEEE Trans. Neural Netw., vol. 10, no. 2, pp. 327-339, Feb. 1999.
[18] M. Iatrou, T. W. Berger, and V. Z. Marmarelis, "Application of a novel modeling method to the nonstationary properties of potentiation in the rabbit hippocampus," Ann. Biomed. Eng., vol. 27, no. 5, pp. 581-591, Sep. 1999.
[19] V. Z. Marmarelis and X. Zhao, "Volterra models and three-layer perceptrons," IEEE Trans. Neural Netw., vol. 8, no. 6, pp. 1421-1433, Jun. 1997.
[20] W. L. Luyben, Process Modeling, Simulation and Control for Chemical Engineers, 2nd ed. New York, NY, USA: McGraw-Hill, 1996, pp. 58-61.
[21] R. W. Chylla and D. R. Haase, "Temperature control of semibatch polymerization reactors," Comput. Chem. Eng., vol. 17, no. 3, pp. 257-264, Mar. 1993.

Botla Ganesh was born in Andhra Pradesh, India, in August 1978. He received the B.Tech. and M.Tech. degrees in chemical engineering from Osmania University, Hyderabad, India, in 2002 and 2005, respectively. He is currently a Senior Research Fellow in the Chemical Engineering Division, Indian Institute of Chemical Technology, Hyderabad. He has published one paper in an international journal, and over ten papers in conference proceedings. His current research interests include dynamic modeling/simulation of batch chemical processes, advanced process control, and artificial neural networks.

Vadlagattu Varun Kumar was born in Karimnagar, India, on July 21, 1987. He received the bachelor’s degree in chemical engineering from the University College of Technology, Osmania University, Hyderabad, India, in 2008, and the master’s degree in chemical engineering from the Indian Institute of Technology, Delhi, India, in 2010. He is currently an Engineer with the Technical Service Department, Rashtriya Chemicals and Fertilizers Ltd., Mumbai, India. He has presented a few papers in conferences. His current research interests include time varying neural networks, nonlinear stationary modelling, and computational fluid dynamics.

Kalipatnapu Yamuna Rani received the B.Tech. degree in chemical engineering from Osmania University, Hyderabad, India, in 1986, and the M.Tech. and Ph.D. degrees in chemical engineering from the Indian Institute of Technology, Madras, India, in 1988 and 2002, respectively. She has been with the Chemical Engineering Division, Indian Institute of Chemical Technology, Hyderabad, India, since 1990, in various scientific positions and is currently a Senior Principal Scientist with the same institute. She is also a Professor with the Academy of Scientific and Innovative Research in the discipline of engineering sciences. She has authored two books, two book chapters, and over 80 papers in peer-reviewed international and national journals and proceedings of international and national conferences. Her current research interests include dynamic modeling, optimization, product modeling and design, data-driven modeling, and advanced control of chemical and biochemical engineering systems. Dr. Rani has been the recipient of Dr. Subba Raju Memorial Prize and Medal from the Indian Institute of Technology, Madras, the Kuloor Memorial Award for the Best Technical Paper published in Indian Chemical Engineer Transactions, the DAAD German Academic Exchange Service Fellowship for 16 months in Germany, and the CSIR Young Scientist Award for Basic Research in Engineering Sciences.
