IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 6, JUNE 2015


A Two-Layer Recurrent Neural Network for Nonsmooth Convex Optimization Problems

Sitian Qin and Xiaoping Xue

Abstract—In this paper, a two-layer recurrent neural network is proposed to solve the nonsmooth convex optimization problem subject to convex inequality and linear equality constraints. Compared with existing neural network models, the proposed neural network has low model complexity and avoids penalty parameters. It is proved that, from any initial point, the state of the proposed neural network reaches the equality feasible region in finite time and stays there thereafter. Moreover, the state is unique if the initial point lies in the equality feasible region. The equilibrium point set of the proposed neural network is proved to be equivalent to the Karush–Kuhn–Tucker optimality set of the original optimization problem. It is further proved that each equilibrium point of the proposed neural network is stable in the sense of Lyapunov. Moreover, from any initial point, the state is proved to be convergent to an equilibrium point of the proposed neural network. Finally, as applications, the proposed neural network is used to solve nonlinear convex programming with linear constraints and L1-norm minimization problems.

Index Terms—Global convergence, Lyapunov function, nonsmooth convex optimization, two-layer recurrent neural network.

I. INTRODUCTION

Consider the following nonsmooth convex optimization problem:

minimize f(x)
subject to g(x) ≤ 0, Ax = b          (1)

where x = (x1, x2, …, xn)^T ∈ R^n is the vector of decision variables, f : R^n → R is the objective function, which is convex but may be nonsmooth, g(x) = (g1(x), g2(x), …, gp(x))^T : R^n → R^p is a p-dimensional vector-valued function whose components gi (i = 1, 2, …, p) are also convex but may be nonsmooth, A ∈ R^{m×n} is a full row-rank matrix (i.e., rank(A) = m ≤ n), and b = (b1, b2, …, bm)^T ∈ R^m. Without loss of generality, we assume in this paper that the nonlinear convex programming (1) has at least one optimal solution.

Manuscript received April 8, 2013; revised January 11, 2014 and May 24, 2014; accepted June 26, 2014. Date of publication July 15, 2014; date of current version May 15, 2015. This work was supported in part by the National Science Foundation of China under Grant 11271099, Grant 11126218, and Grant 11101107, and in part by the China Post-Doctoral Science Foundation under Project 2013M530915. S. Qin is with the Department of Mathematics, Harbin Institute of Technology, Weihai 264209, China (e-mail: [email protected]). X. Xue is with the Department of Mathematics, Harbin Institute of Technology, Harbin 150001, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2014.2334364

Nonlinear programming (1) arises in a broad variety of scientific and engineering applications, such as optimal control, structure design, signal processing, electrical networks,

robot control, and power system planning [1]. Many numerical algorithms have been developed to solve nonlinear programming (1). However, since the computing time depends greatly on the dimension and structure of (1), conventional numerical algorithms are often ineffective for real-time applications. One possible and promising approach to solving nonlinear programming (1) with high dimension and complex structure in real time is to employ recurrent neural networks based on circuit implementation [2]. Neurodynamic approaches to (1) have numerous potential applications, such as the coordination of multimanipulator systems [3], [4], model predictive control [5], [6], and pattern classification [7]. Since Tank and Hopfield [8] first proposed a neural network for linear programming, recurrent neural networks for the optimization problem (1) and their engineering applications have been widely investigated [9]–[27]. For example, Kennedy and Chua [9] improved the neural network in [8] by developing a neural network with a finite penalty parameter to solve nonlinear programming. To avoid using penalty parameters, Xia and Wang [11] presented a novel recurrent neural network for nonlinear convex programming subject to nonlinear inequality constraints; under the condition that the objective function is convex and all constraint functions are strictly convex, they proved that the state of their network is globally convergent to an exact optimal solution. To relax the strict convexity condition on the constraint functions in [11], Yang and Cao [12] extended the neural network of [11] and proved the corresponding results assuming only that the objective function and all constraint functions are convex. Gao [13] presented a projection-type neural network for solving the nonlinear convex programming problem (1) with bounded constraints in real time.
Meanwhile, it is well known that nonsmooth optimization plays an important role in many engineering applications, such as manipulator control, signal processing, and sliding-mode control. Forti et al. [19] proposed a generalized neural network to solve a much wider class of nonsmooth nonlinear programming problems in real time; the network in [19] is a gradient system of differential inclusions. For more general nonsmooth programming, Xue and Bian [23], [24] proposed recurrent neural networks based on subgradient and penalty function methods. However, the penalty function method is effective only with exact penalty parameters, which are difficult to estimate in real applications. Cheng et al. [28] proposed a nonsmooth recurrent neural network to solve the nonsmooth convex optimization problem (1). The proposed neural

2162-237X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

network in [28] can handle nonsmooth convex optimization problems with a larger class of constraints and is not based on any penalty method. However, it has a complex structure. To reduce the model complexity, several one-layer recurrent neural networks have been proposed [20]–[22], [25], [29]–[31]. Guo et al. [29] presented a one-layer recurrent neural network for solving pseudoconvex optimization problems subject to linear equality constraints and proved that its state converges in finite time to the feasible region defined by the equality constraints. Based on the penalty function method, Liu et al. [30] presented another one-layer recurrent neural network to solve pseudoconvex optimization problems subject to linear equality and box constraints. To solve linear programming, Liu et al. [25] proposed a novel neural network based on the gradient method and proved that its state is globally convergent to exact optimal solutions in finite time. Convergence in finite time, or convergence in the presence of nonisolated equilibria, is a remarkable phenomenon for some recurrent neural networks and has received considerable attention [15], [17], [26], [32]–[35]. For example, in [26], by means of a quantitative evaluation of the Łojasiewicz exponent at the equilibrium points, it was shown for nonconvex quadratic programming that each trajectory of the neural network in [19] is either exponentially convergent or convergent in finite time toward a singleton belonging to the set of constrained critical points. Recently, by introducing a regularization control term, Bian and Xue [36] proposed a new neural network to solve a class of nonsmooth convex optimization problems. Under some mild assumptions, they proved that the state of the proposed neural network converges to the feasible region in finite time and to the element of the optimal solution set with the smallest norm.
Inspired by these studies, in this paper we propose a simplified two-layer neural network to solve the nonlinear programming (1). The contributions of this paper are as follows. First, unlike the neural networks in [19] and [30], the proposed neural network is not based on a penalty method, so no exact penalty parameter needs to be selected in advance. Second, the proposed neural network has low model complexity. For solving the nonsmooth convex optimization problem (1), the neural networks in [13] and [28] have a three-layer structure; the structure of the proposed neural network is clearly simpler. Third, we prove the stability of the proposed neural network under more general conditions: we only assume that f and g in (1) are convex, which is a basic hypothesis for the convex optimization problem. Meanwhile, by some simple transformations, A in (1) can always be brought to full row rank with the equality constraint Ax = b unchanged; hence, without loss of generality, we also assume that A is a full row-rank matrix. The remainder of this paper is organized as follows. In Section II, we present some preliminaries and the Karush–Kuhn–Tucker (KKT) conditions for the nonsmooth convex optimization problem (1). In Section III, we introduce the simplified two-layer neural network and prove

that the equilibrium point set of the proposed neural network is equivalent to the KKT optimality set of the nonsmooth convex optimization problem (1). The convergence analysis of the proposed neural network is presented in Section IV. In Section V, we give two examples to illustrate our results. Finally, Section VI concludes this paper.

II. PRELIMINARIES

For the convenience of later discussions, we present some definitions and properties needed in the remainder of this paper; we refer readers to [37] and [38] for more thorough discussions. Let R^n be the n-dimensional real Euclidean space with inner product ⟨·, ·⟩ and induced norm ‖·‖. A set-valued map F : K ⊆ R^n → R^n is a map that associates with each point x ∈ K a nonempty set F(x) ⊆ R^n. A set-valued map F with nonempty values is said to be upper semicontinuous (u.s.c.) at x̂ ∈ K if, for any open set U_y containing F(x̂), there exists a neighborhood U_x of x̂ such that F(U_x) ⊆ U_y. If K is closed and F has nonempty closed values and is bounded in a neighborhood of each point x ∈ K, then F is upper semicontinuous on K if and only if its graph Gr(F) = {(x, y) ∈ K × R^n : y ∈ F(x)} is closed.

Definition 2.1: Let f be Lipschitz near a given point x0 ∈ R^n, and let v be any other vector in R^n. The generalized directional derivative of f at x0 in the direction v, denoted by f°(x0; v), is defined as

f°(x0; v) = lim sup_{y → x0, t ↓ 0} [f(y + tv) − f(y)] / t.
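Definition 2.1 can be made concrete numerically. The sketch below (our own illustration, not from the paper) approximates the lim sup for f(x) = |x| at x0 = 0 by a maximum of difference quotients over a small grid; for the absolute value, f°(0; v) = |v|:

```python
import numpy as np

# Finite-difference illustration of Definition 2.1 for f(x) = |x| at x0 = 0:
# the generalized directional derivative is the lim sup of (f(y+t*v)-f(y))/t
# over y -> x0 and t -> 0+, which for |.| equals |v|. The lim sup is
# approximated by a maximum over a small grid (a numerical sketch, not a proof).
f = abs
x0, v = 0.0, -1.5
quotients = [
    (f(y + t * v) - f(y)) / t
    for y in np.linspace(x0 - 0.01, x0 + 0.01, 201)
    for t in np.linspace(1e-4, 0.01, 50)
]
approx = max(quotients)
assert abs(approx - abs(v)) < 1e-6
print(f"approximate generalized directional derivative: {approx}")
```

The quotient never exceeds |v| by the reverse triangle inequality, and it attains |v| for y on the side aligned with the direction v, so the grid maximum recovers the lim sup here.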

The Clarke subdifferential of f at x0 is given by

∂f(x0) = {ξ ∈ R^n : f°(x0; v) ≥ ⟨ξ, v⟩ for all v ∈ R^n}

which is a subset of R^n. Note that if f is Lipschitz of rank K near x0, then ∂f(x0) is a nonempty, convex, compact subset of R^n, and ‖ξ‖ ≤ K for any ξ ∈ ∂f(x0) [38, Proposition 2.1.2]. Convex functions satisfy the following properties.

Proposition 2.1 [38, Proposition 2.2.7]: If f : R^n → R is convex, then:
1) the Clarke subdifferential of f at x0 coincides with the subdifferential of f at x0 in the sense of convex analysis, i.e., ∂f(x0) = {ξ ∈ R^n : f(x0) − f(x) ≤ ⟨ξ, x0 − x⟩, ∀x ∈ R^n};
2) ∂f(·) is maximal monotone, i.e., ⟨x − x0, v − v0⟩ ≥ 0 for any v ∈ ∂f(x) and v0 ∈ ∂f(x0);
3) ∂f(·) is upper semicontinuous.

Definition 2.2: f is said to be regular at x provided:
1) for all v ∈ R^n, the usual one-sided directional derivative f′(x; v) exists;
2) for all v ∈ R^n, f′(x; v) = f°(x; v).
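The two properties in Proposition 2.1 can be checked numerically on a simple convex function. The sketch below (our own illustration, assuming f(x) = |x| with the selection sign(x) from its subdifferential) verifies the subdifferential inequality of 1) and the monotonicity of 2) on a grid:

```python
import numpy as np

# Numerical check of Proposition 2.1 for the convex function f(x) = |x|:
# 1) every subgradient xi at x satisfies f(y) - f(x) >= xi * (y - x);
# 2) the subdifferential is monotone: (x - y) * (xi_x - xi_y) >= 0.
def f(x):
    return abs(x)

def subgradient(x):
    # sign(x) is a valid selection from the subdifferential of |x|
    # (at x = 0 it returns 0, which lies in [-1, 1] = ∂|0|).
    return float(np.sign(x))

grid = np.linspace(-2.0, 2.0, 81)
for x in grid:
    for y in grid:
        xi_x, xi_y = subgradient(x), subgradient(y)
        assert f(y) - f(x) >= xi_x * (y - x) - 1e-12   # property 1)
        assert (x - y) * (xi_x - xi_y) >= -1e-12       # property 2)
print("subdifferential inequality and monotonicity hold on the grid")
```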


Obviously, any convex function is regular. Regular functions have a very important property (a chain rule), which has been used in many papers [29], [30].

Lemma 2.1 (Chain Rule [29]): If V(x) : R^n → R is regular and x(t) : [0, +∞) → R^n is absolutely continuous on any compact interval of [0, +∞), then x(t) and V(x(t)) : [0, +∞) → R are differentiable for a.e. t ∈ [0, +∞), and

V̇(x(t)) = ⟨ξ, ẋ(t)⟩, ∀ξ ∈ ∂V(x(t)).

We next introduce the nonsmooth KKT conditions for the nonlinear programming (1). According to [28, Th. 1], we have the following.

Lemma 2.2: Let f and g be convex. Then x* is an optimal solution of the nonlinear convex programming (1) if and only if there exist μ* ∈ R^p and ν* ∈ R^m such that

Fig. 1. Block diagram of F by circuits.

Fig. 2. Schematic block diagram of (3) for optimization problem (5).

0 ∈ ∂f(x*) + ∂g(x*)^T μ* + A^T ν*
0 = μ* − (μ* + g(x*))+
0 = Ax* − b          (2)

where ∂g(x*)^T = (∂g1(x*), ∂g2(x*), …, ∂gp(x*))^T, (μ* + g(x*))+ = ((μ1* + g1(x*))+, (μ2* + g2(x*))+, …, (μp* + gp(x*))+)^T, and (t)+ = max{t, 0}.

III. NEURAL NETWORK MODEL

TABLE I. Comparison of Related Neural Networks for Solving the Convex Programming (1)

Based on Lemma 2.2, we propose a two-layer recurrent neural network to solve the nonlinear convex programming (1), with the following dynamical equations:

ẋ(t) ∈ −(I − P)[∂f(x(t)) + ∂g(x(t))^T μ̄(t)] − A^T h(Ax(t) − b)
2μ̇(t) = −μ(t) + (μ(t) + g(x(t)))+          (3)

where μ̄ = (μ + g(x))+, I is the identity matrix, P = A^T(AA^T)^{−1}A, and h(x) = (h̃(x1), h̃(x2), …, h̃(xm))^T with components defined by

h̃(xi) = 1, if xi > 0; [−1, 1], if xi = 0; −1, if xi < 0          (4)

and ∂g(x(t))^T = (∂g1(x(t)), ∂g2(x(t)), …, ∂gp(x(t)))^T. The nonsmooth neural network (3) can be realized by a generalized circuit; for more details on such circuits, readers can refer to [19], [21], and [36]. Here, we propose a generalized circuit implementation of neural network (3) for the following simple optimization problem:

minimize f(x1, x2) = x1 + |x2|
subject to g(x1, x2) = x1 + x2 ≤ 0, a1 x1 + a2 x2 = 1.          (5)
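The projection matrix P in (3) and the KKT conditions (2) can be checked numerically. The sketch below fixes a concrete instance of (5) with a1 = 2, a2 = 1; this choice, and the hand-computed optimum x* = (1, −1) with multipliers μ* = 3, ν* = −2 for it, are our own illustrations and not from the paper. It verifies the identities A(I − P) = 0 and (I − P)² = I − P used throughout the analysis, and the conditions (2):

```python
import numpy as np

# Instance of (5) with a1 = 2, a2 = 1 (our choice); the optimum x* = (1, -1)
# and multipliers mu* = 3, nu* = -2 were computed by hand for this instance.
A = np.array([[2.0, 1.0]])
b = np.array([1.0])
P = A.T @ np.linalg.inv(A @ A.T) @ A
I = np.eye(2)

# Projection identities used repeatedly in the paper's proofs.
assert np.allclose(A @ (I - P), 0.0)            # A(I - P) = 0
assert np.allclose((I - P) @ (I - P), I - P)    # (I - P)^2 = I - P

# KKT conditions (2) at x* = (1, -1) with mu* = 3, nu* = -2.
x_star = np.array([1.0, -1.0])
mu_star, nu_star = 3.0, np.array([-2.0])
gamma = np.array([1.0, np.sign(x_star[1])])     # selection from ∂f, f = x1 + |x2|
eta = np.array([1.0, 1.0])                      # gradient of g(x) = x1 + x2
assert np.allclose(gamma + mu_star * eta + A.T @ nu_star, 0.0)   # stationarity
assert np.isclose(mu_star, max(mu_star + x_star.sum(), 0.0))     # complementarity
assert np.allclose(A @ x_star, b)                                # feasibility
print("projection identities and KKT conditions (2) verified")
```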

The implementation of F = ∂f(x) + ∂g(x)^T μ̄ can be simulated by the block diagram in Fig. 1, and the implementation of h is simulated as in [19]. The architecture of neural network (3) for the optimization problem (5) is then shown in Fig. 2, where (h1, h2)^T = h(Ax − b), [eij]2×2 = I − P, and A = (a1, a2). In Table I, we compare the proposed neural network (3) with several other existing neural networks for solving the nonsmooth convex optimization problem (1). Here, X1 = {x ∈ R^n : g(x) ≤ 0} and X2 = {x ∈ R^n : Ax = b}. It is easy to see that the proposed neural network (3) has fewer neurons than the neural networks in [13] and [28]. Meanwhile, although the neural networks in [21], [36], and [39] have low model complexity, additional assumptions or penalty parameters are needed to ensure their stability. More comparisons and related examples will be introduced in Section IV.

For the convenience of later discussions, we introduce the definition of a solution to a Cauchy problem associated with neural network (3) in the sense of Filippov [40] as follows.

Definition 3.1: A vector function (x(·), μ(·))^T is said to be a state of neural network (3) on [0, T) if (x(·), μ(·))^T is


absolutely continuous on [0, T) with initial condition x(0) = x0, μ(0) = μ0, and for almost all (a.e.) t ∈ [0, T)

ẋ(t) ∈ −(I − P)[∂f(x(t)) + ∂g(x(t))^T μ̄(t)] − A^T h(Ax(t) − b)
2μ̇(t) = −μ(t) + (μ(t) + g(x(t)))+.

Equivalently, there exist measurable functions γ(t) ∈ ∂f(x(t)), η(t) = (η1(t), η2(t), …, ηp(t)) ∈ ∂g(x(t)), and ξ(t) ∈ h(Ax(t) − b) such that

ẋ(t) = −(I − P)[γ(t) + η(t)^T μ̄(t)] − A^T ξ(t)
2μ̇(t) = −μ(t) + (μ(t) + g(x(t)))+          (6)

for almost all t ∈ [0, T).

Definition 3.2: (x*, μ*)^T is called an equilibrium point of neural network (3) if

0 ∈ (I − P)[∂f(x*) + ∂g(x*)^T μ*] + A^T h(Ax* − b)
0 = −μ* + (μ* + g(x*))+

i.e., there exist γ* ∈ ∂f(x*), η* ∈ ∂g(x*), and ξ* ∈ h(Ax* − b) such that

0 = (I − P)(γ* + η*^T μ*) + A^T ξ*
0 = −μ* + (μ* + g(x*))+.

Note that the second equality gives μ̄* = (μ* + g(x*))+ = μ*.

Theorem 3.1: For any initial point (x(0), μ(0))^T = (x0, μ0)^T ∈ R^{n+p}, there is at least one local solution (x(t), μ(t))^T of neural network (3) defined on a maximal interval [0, T) for some T ∈ (0, +∞]. Moreover, the state x(t) reaches the equality feasible region X2 = {x ∈ R^n | Ax = b} in finite time and stays there thereafter, and μ(t) ≥ 0 if μ(0) ≥ 0.

Proof: Since

F(x, μ) = (−(I − P)[∂f(x) + ∂g(x)^T μ̄] − A^T h(Ax − b), −μ + (μ + g(x))+)^T

is u.s.c. with nonempty convex compact values, for any initial point (x0, μ0)^T ∈ R^{n+p} there exists at least one state (x(t), μ(t))^T of neural network (3) with x(0) = x0 and μ(0) = μ0, t ∈ [0, T) [37, Th. 3, p. 98]. That is, there exist measurable functions γ(t) ∈ ∂f(x(t)), η(t) ∈ ∂g(x(t)), and ξ(t) ∈ h(Ax(t) − b) such that (6) holds for almost all t ∈ [0, T).

Let B(x) = ‖Ax − b‖1, where ‖·‖1 is the 1-norm on R^m. Obviously, B(x) is convex and regular. According to the chain rule (Lemma 2.1), for a.e. t ∈ [0, T) we have

dB(x(t))/dt = ζ^T A ẋ(t), ∀ζ ∈ h(Ax(t) − b).

It is clear that A(I − P) = A(I − A^T(AA^T)^{−1}A) = A − A = 0. Hence, taking ζ = ξ(t) ∈ h(Ax(t) − b) from (6), we have

dB(x(t))/dt = ξ(t)^T A ẋ(t)
= −ξ(t)^T A(I − P)[γ(t) + η(t)^T μ̄(t)] − ξ(t)^T AA^T ξ(t)
= −ξ(t)^T AA^T ξ(t)
≤ −λ_m(AA^T) ‖ξ(t)‖²          (7)

where λ_m(AA^T) > 0 denotes the minimum eigenvalue of AA^T, which is positive since A has full row rank. Obviously, if Ax(t) ≠ b, then by the definition of h in (4), ‖ξ(t)‖² ≥ 1, which implies that

dB(x(t))/dt ≤ −λ_m(AA^T).          (8)

Suppose that x(0) = x0 ∉ X2 and x(t) ∉ X2 for t ∈ [0, t0]. Then, integrating both sides of (8) from 0 to t0, we have B(x(t0)) − B(x(0)) ≤ −λ_m(AA^T) t0, that is,

0 ≤ ‖Ax(t0) − b‖1 ≤ ‖Ax(0) − b‖1 − λ_m(AA^T) t0.

Therefore, t0 ≤ ‖Ax(0) − b‖1 / λ_m(AA^T). That is, when t > ‖Ax(0) − b‖1 / λ_m(AA^T), x(t) ∈ X2; i.e., the state x(t) reaches the equality feasible region X2 in finite time, and an upper bound on the hitting time is ts = ‖Ax(0) − b‖1 / λ_m(AA^T). Moreover, if x(t) left X2 at some t1 > ts, there would exist an interval (t1, t2) such that x(t) ∉ X2 for all t ∈ (t1, t2) and ‖Ax(t1) − b‖1 = 0. Hence, by (8), we would have

‖Ax(t2) − b‖1 ≤ ‖Ax(t1) − b‖1 − λ_m(AA^T)(t2 − t1) = −λ_m(AA^T)(t2 − t1) < 0          (9)

a contradiction. Hence, the state x(t) reaches X2 in finite time and stays there thereafter.

On the other hand, according to (6), we have

d/dt (e^{t/2} μ(t)) = (1/2) e^{t/2} (μ(t) + g(x(t)))+ ≥ 0

which implies that e^{t/2} μ(t) ≥ e^0 μ(0) = μ(0). Hence, μ(t) ≥ 0 if μ(0) ≥ 0.

Now, we study the uniqueness of the state of neural network (3) starting from a given initial point.

Theorem 3.2: For any initial point (x0, μ0)^T ∈ X2 × R^p, there is a unique solution (x(t), μ(t))^T of neural network (3) with (x(0), μ(0))^T = (x0, μ0)^T.

Proof: Let (x(t), μ(t))^T be a state of neural network (3) with initial point (x(0), μ(0))^T = (x0, μ0)^T ∈ X2 × R^p, i.e., there exist measurable functions γ(t) ∈ ∂f(x(t)), η(t) ∈ ∂g(x(t)), and ξ(t) ∈ h(Ax(t) − b) such that

ẋ(t) = −(I − P)[γ(t) + η(t)^T μ̄(t)] − A^T ξ(t)
2μ̇(t) = −μ(t) + (μ(t) + g(x(t)))+          (10)

for a.e. t. By Theorem 3.1 and the initial condition (x0, μ0)^T ∈ X2 × R^p, the solution (x(t), μ(t))^T of neural network (3)


will stay in X2 × R^p, i.e., Ax(t) = b for all t ≥ 0. Then, by (10),

0 = A ẋ(t) = −A(I − P)[γ(t) + η(t)^T μ̄(t)] − AA^T ξ(t) = −AA^T ξ(t)

which means that ξ(t) = 0 for all t ≥ 0, since AA^T is invertible by the assumption in (1). Hence, the state x(t) also satisfies

ẋ(t) = −(I − P)[γ(t) + η(t)^T μ̄(t)].          (11)

Suppose there exists another state (x′(t), μ′(t))^T of neural network (3) with the same initial condition (x′(0), μ′(0))^T = (x0, μ0)^T ∈ X2 × R^p, i.e., there exist measurable functions γ′(t) ∈ ∂f(x′(t)), η′(t) ∈ ∂g(x′(t)), and ξ′(t) ∈ h(Ax′(t) − b) such that

ẋ′(t) = −(I − P)[γ′(t) + η′(t)^T μ̄′(t)] − A^T ξ′(t)
2μ̇′(t) = −μ′(t) + (μ′(t) + g(x′(t)))+          (12)

for a.e. t, where μ̄′ = (μ′ + g(x′))+. Similar to the above discussion, (x′(t), μ′(t))^T also satisfies

ẋ′(t) = −(I − P)[γ′(t) + η′(t)^T μ̄′(t)], Ax′(t) = b          (13)

for a.e. t.

Noting that ‖μ̄‖² = ‖(μ + g(x))+‖² = Σ_{i=1}^p [(μi + gi(x))+]² and

[(μi + gi(x))+]² = (μi + gi(x))² if μi + gi(x) > 0, and 0 otherwise

we see that ‖μ̄‖² is convex with respect to (x, μ), and

∂‖μ̄‖² = (2∂g(x)^T μ̄, 2μ̄).          (14)

Hence, by the maximal monotonicity of the convex subdifferential and (10), (12), and (14), we have

(x(t) − x′(t))^T (η(t)^T μ̄(t) − η′(t)^T μ̄′(t)) + (μ(t) − μ′(t))^T (μ̄(t) − μ̄′(t)) ≥ 0          (15)

for a.e. t ≥ 0. On the other hand, since Ax(t) = Ax′(t) = b for t ≥ 0,

(x(t) − x′(t))^T P = (x(t) − x′(t))^T A^T (AA^T)^{−1} A = (Ax(t) − Ax′(t))^T (AA^T)^{−1} A = 0.

Then, according to the chain rule (Lemma 2.1), (11), and (13),

d/dt [(1/2)‖x(t) − x′(t)‖² + ‖μ(t) − μ′(t)‖²]
= (x(t) − x′(t))^T (ẋ(t) − ẋ′(t)) + 2(μ(t) − μ′(t))^T (μ̇(t) − μ̇′(t))
= (μ(t) − μ′(t))^T (−μ(t) + μ̄(t) + μ′(t) − μ̄′(t)) + (x(t) − x′(t))^T (I − P)(−[γ(t) + η(t)^T μ̄(t)] + [γ′(t) + η′(t)^T μ̄′(t)])          (16)
= (x(t) − x′(t))^T (−[γ(t) + η(t)^T μ̄(t)] + [γ′(t) + η′(t)^T μ̄′(t)]) + (μ(t) − μ′(t))^T (−μ(t) + μ̄(t) + μ′(t) − μ̄′(t))
= −(x(t) − x′(t))^T (γ(t) − γ′(t)) − ‖μ(t) − μ′(t)‖² − [(x(t) − x′(t))^T (η(t)^T μ̄(t) − η′(t)^T μ̄′(t)) + (μ(t) − μ′(t))^T (μ̄(t) − μ̄′(t))] + 2(μ(t) − μ′(t))^T (μ̄(t) − μ̄′(t))
≤ −‖μ(t) − μ′(t)‖² + 2(μ(t) − μ′(t))^T (μ̄(t) − μ̄′(t))
≤ ‖μ̄(t) − μ̄′(t)‖²          (17)

where the first inequality uses (15) together with the monotonicity of ∂f, and the second uses 2a^T b ≤ ‖a‖² + ‖b‖².

Obviously, (·)+ is globally Lipschitz, i.e., |(s)+ − (t)+| ≤ |s − t| for all s, t ∈ R. Then

‖μ̄(t) − μ̄′(t)‖² = Σ_{i=1}^p [(μi(t) + gi(x(t)))+ − (μi′(t) + gi(x′(t)))+]²
≤ Σ_{i=1}^p [(μi(t) + gi(x(t))) − (μi′(t) + gi(x′(t)))]²
≤ 2‖μ(t) − μ′(t)‖² + 2‖g(x(t)) − g(x′(t))‖².          (18)

According to the properties of convex functions [38, Proposition 2.2.6], g is locally Lipschitz on R^n. In particular, g is locally Lipschitz near the initial point x0 ∈ X2 ⊆ R^n, i.e., there exist r > 0 and L > 0 such that

‖g(x) − g(y)‖ ≤ L‖x − y‖          (19)

for all x, y ∈ B(x0, r). Meanwhile, by the above assumption, (x(0), μ(0))^T = (x′(0), μ′(0))^T = (x0, μ0)^T ∈ X2 × R^p. Hence, there exists t0 > 0 such that both x(t) and x′(t) lie in B(x0, r) for all t ∈ [0, t0]. Then, by (19), for all t ∈ [0, t0],

‖g(x(t)) − g(x′(t))‖ ≤ L‖x(t) − x′(t)‖.          (20)

Therefore, according to the inequalities (17), (18), and (20), for a.e. t ∈ [0, t0] we have

d/dt [(1/2)‖x(t) − x′(t)‖² + ‖μ(t) − μ′(t)‖²]
≤ 2‖μ(t) − μ′(t)‖² + 2L²‖x(t) − x′(t)‖²
≤ M [(1/2)‖x(t) − x′(t)‖² + ‖μ(t) − μ′(t)‖²]          (21)

where M = max{4L², 2} > 0. Hence, by Gronwall's inequality, for all t ∈ [0, t0] we have

(1/2)‖x(t) − x′(t)‖² + ‖μ(t) − μ′(t)‖² ≤ [(1/2)‖x(0) − x′(0)‖² + ‖μ(0) − μ′(0)‖²] e^{Mt} = 0          (22)

which means that

x(t) = x′(t), μ(t) = μ′(t), for all t ∈ [0, t0].          (23)


Next, similar to the above proof and using the fact that g is locally Lipschitz near x(t0), there exists t1 > t0 such that x(t) = x′(t), μ(t) = μ′(t) for all t ∈ [t0, t1]. Proceeding step by step, we obtain the uniqueness of the state of neural network (3).

Since an equilibrium point is a stationary solution of the differential equation, by Theorem 3.1, if (x*, μ*)^T is an equilibrium point of neural network (3), then Ax* = b. We next prove that x* is in fact an optimal solution of the nonlinear convex programming (1).

Theorem 3.3: Let (x*, μ*)^T be an equilibrium point of neural network (3); then x* is an optimal solution of the nonlinear convex programming (1). Conversely, if x* is an optimal solution of the nonlinear convex programming (1), then there exists μ* ≥ 0 such that (x*, μ*)^T is an equilibrium point of neural network (3).

Proof: Let (x*, μ*)^T be an equilibrium point of neural network (3), i.e., there exist γ* ∈ ∂f(x*), η* ∈ ∂g(x*), and ξ* ∈ h(Ax* − b) such that

0 = (I − P)(γ* + η*^T μ*) + A^T ξ*
0 = −μ* + (μ* + g(x*))+.

Then, by Theorem 3.1, Ax* = b and μ* ≥ 0. Meanwhile, taking ν* = ξ* − (AA^T)^{−1} A (γ* + η*^T μ*), the first equality gives

0 = γ* + η*^T μ* + A^T ν*

and hence 0 ∈ ∂f(x*) + ∂g(x*)^T μ* + A^T ν*. Therefore, by Lemma 2.2, x* is an optimal solution of the nonlinear convex programming (1).

Conversely, let x* be an optimal solution of the nonlinear convex programming (1). Then, by Lemma 2.2, there exists (μ*, ν*)^T such that the equalities in (2) hold. Hence,

0 ∈ (I − P)[∂f(x*) + ∂g(x*)^T μ* + A^T ν*] = (I − P)[∂f(x*) + ∂g(x*)^T μ*]

since (I − P)A^T = 0. On the other hand, since Ax* = b, we have 0 ∈ h(Ax* − b), and therefore

0 ∈ (I − P)[∂f(x*) + ∂g(x*)^T μ*] + A^T h(Ax* − b).

Combining this with the equalities in (2), it is obvious that (x*, μ*)^T is an equilibrium point of neural network (3).

IV. CONVERGENCE ANALYSIS

In this section, we analyze the global convergence of neural network (3).
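The equivalence of Theorem 3.3 can be illustrated numerically before the analysis. On the toy instance of (5) with a1 = 2, a2 = 1 (our own illustrative choice, with the hand-computed KKT point x* = (1, −1), μ* = 3), the equilibrium conditions of Definition 3.2 hold with the selection ξ* = 0 ∈ h(Ax* − b) = [−1, 1]:

```python
import numpy as np

# Check Definition 3.2 / Theorem 3.3 numerically on the toy instance of (5)
# with a1 = 2, a2 = 1 (our own illustrative choice): the hand-computed KKT
# point x* = (1, -1), mu* = 3 is also an equilibrium point of network (3).
A = np.array([[2.0, 1.0]])
P = A.T @ np.linalg.inv(A @ A.T) @ A
I = np.eye(2)

x_star = np.array([1.0, -1.0])
mu_star = 3.0
gamma_star = np.array([1.0, -1.0])   # selection from ∂f(x*), f = x1 + |x2|
eta_star = np.array([1.0, 1.0])      # gradient of g(x) = x1 + x2
w = gamma_star + eta_star * mu_star  # gamma* + eta*^T mu*

# First equilibrium condition with the selection xi* = 0: 0 = (I - P) w + A^T xi*
assert np.allclose((I - P) @ w, 0.0)
# Second condition: 0 = -mu* + (mu* + g(x*))_+
assert np.isclose(mu_star, max(mu_star + x_star.sum(), 0.0))
# Ax* = b, so 0 ∈ h(Ax* - b) = [-1, 1] is an admissible selection
assert np.isclose(float(A @ x_star), 1.0)
print("(x*, mu*) is an equilibrium point of network (3)")
```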
First, we prove the following lemma, which plays an important role in proving the stability of neural network (3).

Lemma 4.1: If f(x, y) : R^n × R^n → R is convex and f is continuously differentiable with respect to x, then

∂f(x0, y0) = ∂_x f(x0, y0) × ∂_y f(x0, y0)

where ∂f(x0, y0) is the subdifferential of f at (x0, y0), ∂_x f(x0, y0) is the partial subdifferential of f(·, y0) at x0, and ∂_y f(x0, y0) is defined similarly.

Proof: From [38, Proposition 2.3.15], we have ∂f(x0, y0) ⊆ ∂_x f(x0, y0) × ∂_y f(x0, y0). Hence, we only need to prove ∂_x f(x0, y0) × ∂_y f(x0, y0) ⊆ ∂f(x0, y0). Since f is continuously differentiable with respect to x, ∂_x f(x0, y0) reduces to the single point ∇_x f(x0, y0). Then, for any p_y ∈ ∂_y f(x0, y0), we prove that (∇_x f(x0, y0), p_y) ∈ ∂f(x0, y0).

First, by the mean value theorem, for any (v1, v2) ∈ R^n × R^n we have

lim_{t↓0} [f(x0 + tv1, y0 + tv2) − f(x0, y0 + tv2)] / t
= lim_{t↓0} ∇_x f(x0 + s v1, y0 + tv2)^T v1   (for some s = s(t) ∈ [0, t])
= ∇_x f(x0, y0)^T v1
= lim_{t↓0} [f(x0 + tv1, y0) − f(x0, y0)] / t.

Therefore, for any (v1, v2) ∈ R^n × R^n,

(∇_x f(x0, y0), p_y)^T (v1, v2) = ∇_x f(x0, y0)^T v1 + p_y^T v2
≤ f′_x(x0, y0; v1) + f′_y(x0, y0; v2)
= lim_{t↓0} [f(x0 + tv1, y0) − f(x0, y0)] / t + lim_{t↓0} [f(x0, y0 + tv2) − f(x0, y0)] / t
= lim_{t↓0} [f(x0 + tv1, y0 + tv2) − f(x0, y0 + tv2)] / t + lim_{t↓0} [f(x0, y0 + tv2) − f(x0, y0)] / t
= lim_{t↓0} [f(x0 + tv1, y0 + tv2) − f(x0, y0)] / t
= f′(x0, y0; v1, v2)

which, since f is convex (hence regular, so f′ = f°), implies that (∇_x f(x0, y0), p_y) ∈ ∂f(x0, y0) by the definition of the Clarke subdifferential.

Theorem 4.1: Each equilibrium point of neural network (3) is stable in the sense of Lyapunov. Moreover, for any initial point (x0, μ0)^T ∈ R^n × R^p, the state of neural network (3) converges to an equilibrium point of neural network (3).

Proof: The proof is divided into two steps. We first prove that, for any initial point, the state of neural network (3) exists for t ∈ [0, +∞) and that each equilibrium point of neural network (3) is stable in the sense of Lyapunov.

According to the assumptions on the nonlinear convex programming (1) and Theorem 3.3, there exists at least one equilibrium point of neural network (3). Let (x*, μ*)^T be an equilibrium point of neural network (3). That is, there exist γ* ∈ ∂f(x*), η* ∈ ∂g(x*), and ξ* ∈ h(Ax* − b) such that

0 = (I − P)(γ* + η*^T μ*) + A^T ξ*
0 = −μ* + (μ* + g(x*))+.          (24)

Multiplying both sides of the first equality in (24) by (I − P), and using (I − P)A^T = 0 and (I − P)² = I − P, we have

0 = (I − P)²(γ* + η*^T μ*) + (I − P)A^T ξ* = (I − P)(γ* + η*^T μ*).          (25)


Construct the following Lyapunov function:

V(x, μ) = f(x) − f(x*) + (1/2)‖μ̄‖² − (1/2)‖μ*‖² − (x − x*)^T (γ* + η*^T μ*) − (μ − μ*)^T μ* + (1/2)‖x − x*‖² + (1/2)‖μ − μ*‖²          (26)

where μ̄ = (μ + g(x))+ (recall that, by (24), (μ* + g(x*))+ = μ*). It is well known that ‖μ̄‖² is convex on R^n × R^p [13]. Let

φ(x, μ) = f(x) + (1/2)‖μ̄‖².

Obviously, φ is convex with respect to (x, μ). Then, by the subdifferential inequality for convex functions (Proposition 2.1),

φ(x, μ) − φ(x*, μ*) ≥ (γ* + η*^T μ*)^T (x − x*) + μ*^T (μ − μ*)

which implies that

V(x, μ) ≥ (1/2)‖x − x*‖² + (1/2)‖μ − μ*‖² ≥ 0.          (27)

That is, V(x, μ) is positive definite and radially unbounded. Meanwhile, the partial subdifferential of V(x, μ) with respect to x is

∂_x V(x, μ) = ∂f(x) + ∂g(x)^T μ̄ − (γ* + η*^T μ*) + x − x*          (28)

and the partial derivative of V(x, μ) with respect to μ is

∂_μ V(x, μ) = μ̄ + μ − 2μ*.          (29)

Let (x(t), μ(t))^T be a state of neural network (3) with initial point (x(0), μ(0))^T = (x0, μ0)^T, i.e., there exist measurable functions γ(t) ∈ ∂f(x(t)), η(t) = (η1(t), η2(t), …, ηp(t)) ∈ ∂g(x(t)), and ξ(t) ∈ h(Ax(t) − b) such that

ẋ(t) = −(I − P)[γ(t) + η(t)^T μ̄(t)] − A^T ξ(t)
2μ̇(t) = −μ(t) + (μ(t) + g(x(t)))+          (30)

for a.e. t. By Theorem 3.1, the state x(t) reaches the equality feasible region X2 = {x ∈ R^n | Ax = b} in finite time and stays there thereafter. Without loss of generality, we only consider the stability of neural network (3) with x(t) ∈ X2, i.e., Ax(t) = b for all t ≥ 0. Then, similar to (11), we have

ẋ(t) = −(I − P)[γ(t) + η(t)^T μ̄(t)].          (31)

Because (I − P) = (I − P)², (x(t) − x*)^T P = (Ax(t) − Ax*)^T (AA^T)^{−1} A = 0, and (I − P)(γ* + η*^T μ*) = 0 by (25) (which also gives (x(t) − x*)^T (γ* + η*^T μ*) = 0, since γ* + η*^T μ* = P(γ* + η*^T μ*) lies in the range of A^T), we have

[γ(t) + η(t)^T μ̄(t) − (γ* + η*^T μ*) + x(t) − x*]^T ẋ(t)
= −[γ(t) + η(t)^T μ̄(t) − (γ* + η*^T μ*) + x(t) − x*]^T (I − P)[γ(t) + η(t)^T μ̄(t)]
= −‖(I − P)[γ(t) + η(t)^T μ̄(t)]‖² − (x(t) − x*)^T [γ(t) + η(t)^T μ̄(t) − γ* − η*^T μ*]
= −‖ẋ(t)‖² − (x(t) − x*)^T (γ(t) − γ*) − (x(t) − x*)^T (η(t)^T μ̄(t) − η*^T μ*).

Meanwhile,

(1/2)(μ̄ + μ − 2μ*)^T (−μ + μ̄) = (1/2)(−(μ̄ − μ) + 2(μ̄ − μ*))^T (μ̄ − μ)
= −(1/2)‖μ̄ − μ‖² + (μ̄ − μ)^T (μ̄ − μ*) = −2‖μ̇(t)‖² + (−μ + μ̄)^T (μ̄ − μ*).

Hence, by the chain rule (i.e., Lemma 2.1) and Lemma 4.1, for a.e. t ≥ 0 we have

dV(x(t), μ(t))/dt = [γ(t) + η(t)^T μ̄(t) − (γ* + η*^T μ*) + x(t) − x*]^T ẋ(t) + (1/2)(μ̄(t) + μ(t) − 2μ*)^T (−μ(t) + μ̄(t))
= −‖ẋ(t)‖² − (x(t) − x*)^T (γ(t) − γ*) − (x(t) − x*)^T (η(t)^T μ̄(t) − η*^T μ*) − 2‖μ̇(t)‖² + (−μ(t) + μ̄(t))^T (μ̄(t) − μ*).          (32)

From the convexity of f and g, we have

(x(t) − x*)^T (γ(t) − γ*) ≥ 0
g(x*) − g(x(t)) − η(t)(x* − x(t)) ≥ 0
g(x(t)) − g(x*) − η*(x(t) − x*) ≥ 0.          (33)

Since μ̄(t) ≥ 0 and μ* ≥ 0, multiplying the last two inequalities of (33) by μ̄(t)^T and μ*^T, respectively, and substituting into (32), one has

dV(x(t), μ(t))/dt ≤ −‖ẋ(t)‖² − 2‖μ̇(t)‖² + μ̄(t)^T g(x*) − μ*^T g(x*) − μ̄(t)^T g(x(t)) + μ*^T g(x(t)) + (−μ(t) + μ̄(t))^T (μ̄(t) − μ*).          (34)

From (24) and the feasibility of x* (Theorem 3.3), it is clear that

μ̄(t)^T g(x*) = g(x*)^T (μ(t) + g(x(t)))+ ≤ 0,   μ*^T g(x*) = 0.

On the other hand, noting that

μ + g(x) − (μ + g(x))+ = (μ + g(x))− ≤ 0,   μ̄^T (μ + g(x))− = ((μ + g(x))+)^T (μ + g(x))− = 0

and that

−μ̄^T g(x) + μ*^T g(x) + (μ̄ − μ)^T (μ̄ − μ*) = (μ* − μ̄)^T [μ + g(x) − (μ + g(x))+]

we have, by (34),

dV(x(t), μ(t))/dt ≤ −‖ẋ(t)‖² − 2‖μ̇(t)‖² + μ̄(t)^T g(x*) − μ*^T g(x*) + (μ* − μ̄(t))^T (μ(t) + g(x(t)))−
= −‖ẋ(t)‖² − 2‖μ̇(t)‖² + μ̄(t)^T g(x*) − μ*^T g(x*) + μ*^T (μ(t) + g(x(t)))−
≤ −‖ẋ(t)‖² − 2‖μ̇(t)‖².          (35)

Hence, 0 ≤ V(x(t), μ(t)) ≤ V(x(0), μ(0)) < +∞ for t > 0. On the other hand, since V is radially unbounded, for any initial point (x0, μ0)^T ∈ R^n × R^p the state (x(t), μ(t))^T of neural network (3) is bounded, which means that (x(t), μ(t))^T exists for all t ∈ [0, +∞). In addition, according to (35), the equilibrium point (x*, μ*)^T of neural network (3) is stable in the sense of Lyapunov; by the arbitrariness of (x*, μ*)^T, each equilibrium point of neural network (3) is stable in the sense of Lyapunov.
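The finite-time reach of Theorem 3.1 and the Lyapunov decrease (35) can be illustrated by a forward-Euler sketch of (3). The instance of (5) with a1 = 2, a2 = 1 and its hand-computed optimum x* = (1, −1), μ* = 3 are our own illustrative choices; sign(·) serves as a single-valued selection of h and of ∂|·|, so this is a discretized approximation of the differential inclusion, not the paper's circuit realization:

```python
import numpy as np

# Forward-Euler sketch of network (3) on the instance of (5) with a1 = 2,
# a2 = 1 (our choice): minimize x1 + |x2| s.t. x1 + x2 <= 0, 2*x1 + x2 = 1,
# whose optimum is x* = (1, -1) with mu* = 3 (computed by hand). V below is
# the Lyapunov function (26) built from that equilibrium.
A = np.array([[2.0, 1.0]])
b = np.array([1.0])
P = A.T @ np.linalg.inv(A @ A.T) @ A
I = np.eye(2)
x_star, mu_star = np.array([1.0, -1.0]), 3.0
w_star = np.array([4.0, 2.0])        # gamma* + eta*^T mu*

def V(x, mu):                        # Lyapunov function (26)
    mu_bar = max(mu + x.sum(), 0.0)  # (mu + g(x))_+
    return (x[0] + abs(x[1]) - 2.0 + 0.5 * mu_bar**2 - 0.5 * mu_star**2
            - (x - x_star) @ w_star - (mu - mu_star) * mu_star
            + 0.5 * (x - x_star) @ (x - x_star) + 0.5 * (mu - mu_star)**2)

x, mu, dt = np.zeros(2), 0.0, 1e-3
samples = []
for k in range(80_000):              # integrate up to t = 80
    gamma = np.array([1.0, np.sign(x[1])])   # selection from ∂f(x)
    eta = np.array([1.0, 1.0])               # gradient of g(x) = x1 + x2
    mu_bar = max(mu + x.sum(), 0.0)
    x = x + dt * (-(I - P) @ (gamma + eta * mu_bar) - A.T @ np.sign(A @ x - b))
    mu = mu + dt * 0.5 * (-mu + mu_bar)
    if k >= 400 and k % 200 == 0:    # sample V after x(t) has reached X_2
        samples.append(V(x, mu))

samples = np.array(samples)
assert np.all(np.diff(samples) < 2e-3)   # V nonincreasing up to Euler noise
assert abs(float(A @ x - b)) < 0.02      # x(t) stays on Ax = b (Theorem 3.1)
assert abs(x[0] - 1.0) < 0.1 and abs(x[1] + 1.0) < 0.2
assert abs(mu - mu_star) < 0.25          # (x, mu) near the equilibrium
print("V decreased from", round(float(samples[0]), 3), "to", round(float(samples[-1]), 4))
```

The sampled values of V decrease monotonically (within discretization noise) and the state settles near the equilibrium, consistent with Theorem 4.1.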


We next prove that for any initial point, the state of neural network (3) is convergent to an equilibrium point of neural network (3). Let

$$H(x,\mu)=\inf\Big\{\|(I-P)[\gamma+\eta^T\bar\mu]\|^2+\frac{1}{2}\|\mu-\bar\mu\|^2:\gamma\in\partial f(x),\ \eta\in\partial g(x)\Big\}\qquad (36)$$

where $\bar\mu=(\mu+g(x))^+$. It is clear that $H(x,\mu)=0$ with $Ax=b$ if and only if $(x,\mu)$ is an equilibrium point of neural network (3). Due to the boundedness of $(x(t),\mu(t))^T$, there exists a convergent subsequence $\{(x(t_k),\mu(t_k))^T\}$ with $0\le t_1\le t_2\le\cdots$ and $t_k\to+\infty$ such that $(x(t_k),\mu(t_k))^T\to(\hat x,\hat\mu)^T$. By Theorem 3.1, it is obvious that $A\hat x=b$. We next prove that $H(\hat x,\hat\mu)=0$, which means that $(\hat x,\hat\mu)^T$ is an equilibrium point of neural network (3).

First, due to the boundedness of $(x(t),\mu(t))^T$ and (30), we can choose a sufficiently large constant $M>0$ such that $\|\dot x(t)\|+\|\dot\mu(t)\|\le M$ for all $t\ge 0$. Suppose $H(\hat x,\hat\mu)\ne 0$; then $H(\hat x,\hat\mu)>0$. According to Theorem 5 on page 52 in [37], $H(x,\mu)$ is lower semicontinuous with respect to $(x,\mu)$. Hence, there exist $\varepsilon>0$ and $\delta>0$ such that

$$H(x,\mu)>\varepsilon\qquad (37)$$

for all $(x,\mu)\in B(\hat x,\hat\mu;\delta)=\{(x,\mu)\in R^n\times R^p:\|x-\hat x\|+\|\mu-\hat\mu\|<\delta\}$. Since $(x(t_k),\mu(t_k))^T\to(\hat x,\hat\mu)^T$, there exists a positive integer $N$ such that

$$\|x(t_k)-\hat x\|+\|\mu(t_k)-\hat\mu\|<\frac{\delta}{2}$$

for all $k>N$. When $t\in[t_k-\frac{\delta}{4M},t_k+\frac{\delta}{4M}]$ and $k>N$, we have

$$\|x(t)-\hat x\|+\|\mu(t)-\hat\mu\|\le\|x(t)-x(t_k)\|+\|x(t_k)-\hat x\|+\|\mu(t)-\mu(t_k)\|+\|\mu(t_k)-\hat\mu\|\le M|t-t_k|+\frac{\delta}{2}\le\delta.$$

By (37), we have $H(x(t),\mu(t))>\varepsilon$ for all $t\in[t_k-\delta/4M,t_k+\delta/4M]$ and $k>N$. On the other hand, from (27) and (35), there exists a constant $V_0$ such that $\lim_{t\to+\infty}V(x(t),\mu(t))=V_0$. Hence, by (35),

$$\int_0^{+\infty}H(x(t),\mu(t))\,dt=\lim_{s\to+\infty}\int_0^{s}H(x(t),\mu(t))\,dt\le-\lim_{s\to+\infty}\int_0^{s}\dot V(x(t),\mu(t))\,dt=V(x(0),\mu(0))-V_0<+\infty.\qquad (38)$$

However, since $t_k\to+\infty$, by passing to a subsequence if necessary we may assume the intervals $[t_k-\delta/4M,t_k+\delta/4M]$ $(k>N)$ are pairwise disjoint, and then

$$\int_0^{+\infty}H(x(t),\mu(t))\,dt\ge\int_{\bigcup_{k>N}[t_k-\frac{\delta}{4M},t_k+\frac{\delta}{4M}]}H(x(t),\mu(t))\,dt\ge\sum_{k>N}\int_{t_k-\frac{\delta}{4M}}^{t_k+\frac{\delta}{4M}}\varepsilon\,dt=\sum_{k>N}\frac{\delta\varepsilon}{2M}=+\infty.$$

Obviously, this contradicts (38). Hence, $H(\hat x,\hat\mu)=0$, which means that $(\hat x,\hat\mu)^T$ is an equilibrium point of neural network (3), and $\hat x$ is an optimal solution to the nonlinear convex programming (1). That is, there exist $\hat\gamma\in\partial f(\hat x)$ and $\hat\eta\in\partial g(\hat x)$ such that

$$\begin{cases}0=(I-P)(\hat\gamma+\hat\eta^T\hat\mu)\\0=-\hat\mu+(\hat\mu+g(\hat x))^+\\A\hat x=b.\end{cases}$$

Finally, we prove that $\lim_{t\to+\infty}(x(t),\mu(t))^T=(\hat x,\hat\mu)^T$, which completes the proof. We construct another Lyapunov function as follows:

$$W(x,\mu)=f(x)-f(\hat x)+\frac{1}{2}\|\mu\|^2-\frac{1}{2}\|\hat\mu\|^2-(x-\hat x)^T(\hat\gamma+\hat\eta^T\hat\mu)-(\mu-\hat\mu)^T\hat\mu+\frac{1}{2}\|x-\hat x\|^2+\frac{1}{2}\|\mu-\hat\mu\|^2.$$

By a similar analysis to the above, we have

$$W(x,\mu)\ge\frac{1}{2}\|x-\hat x\|^2+\frac{1}{2}\|\mu-\hat\mu\|^2,\qquad\frac{d}{dt}W(x(t),\mu(t))\le 0.\qquad (39)$$

Clearly, $W(\hat x,\hat\mu)=0$. By the continuity of $W$, for any $\varepsilon>0$, there exists $\delta>0$ such that

$$W(x,\mu)=|W(x,\mu)-W(\hat x,\hat\mu)|<\varepsilon\qquad (40)$$

when $\|x-\hat x\|+\|\mu-\hat\mu\|<\delta$. Meanwhile, since $(x(t_k),\mu(t_k))^T\to(\hat x,\hat\mu)^T$, there exists $t_N$ such that

$$\|x(t_N)-\hat x\|+\|\mu(t_N)-\hat\mu\|<\delta.\qquad (41)$$

Hence, combining (39), (40), and (41), we have

$$\frac{1}{2}\|x(t)-\hat x\|^2+\frac{1}{2}\|\mu(t)-\hat\mu\|^2\le W(x(t),\mu(t))\le W(x(t_N),\mu(t_N))\le\varepsilon$$

for all $t>t_N$, which implies that $\lim_{t\to+\infty}(x(t),\mu(t))^T=(\hat x,\hat\mu)^T$. Therefore, for any initial point $(x_0,\mu_0)^T\in R^n\times R^p$, the state $(x(t),\mu(t))^T$ of neural network (3) is convergent to an equilibrium point of neural network (3). Meanwhile, from different initial points, the state of neural network (3) may converge to different equilibrium points of neural network (3).

Remark 1: In Theorem 4.1, it is proved that for any initial point, the state of neural network (3) exists for $t\in[0,+\infty)$.


From Theorem 3.1, there exists $t_0>0$ such that the state of neural network (3) reaches the equality feasible region in finite time $t_0$ and stays there thereafter. Hence, from Theorem 3.2, the state of neural network (3) is unique for $t\in[t_0,+\infty)$.

Remark 2: The nonlinear convex programming (1) has been studied extensively by many researchers [10], [12], [13], [18], [21], [28], [32], [41]. Compared with existing neural networks, neural network (3) is not based on the penalty method and has low model complexity. For example, in [28], the authors proposed a three-layer recurrent neural network to solve the optimization problem (1). As noted in Remark 4 of [28], after a simple transformation, the neural network proposed in [28] can be simplified into a two-layer recurrent neural network. However, in [28], it is only proved that the state of the simplified neural network is exponentially convergent to the equality feasible region $X_2=\{x:Ax=b\}$, not convergent in finite time. It is possible that the state of the simplified neural network converges to $X_2$ exponentially but never attains $X_2$. Hence, to solve optimization problem (1), the initial point of the simplified neural network in [28] should be chosen in $X_2$. Obviously, this is not easy, especially in a high-dimensional Euclidean space. In this paper, the state of neural network (3) reaches the equality feasible region $X_2$ in finite time, which means that no special initial point needs to be chosen in advance.

Remark 3: Recently, to solve the nonlinear programming (1) efficiently, some simplified neural networks were proposed [21], [22], [27], [29], [30], [36], [39]. However, most of these simplified neural networks were based on stronger hypotheses. For example, in [39], a one-layer neural network was proposed to solve (1) based on the following assumption:

$$\operatorname{int}(X_1)\cap X_2\ \text{is nonempty and bounded.}\qquad (42)$$

Moreover, the neural network in [39] was also based on the penalty function method. Meanwhile, to avoid penalty parameters, Bian and Xue [36] proposed another neural network based on assumption (42). Obviously, the hypotheses in this paper are weaker than those in [36] and [39], and neural network (3) is not based on the penalty function method. Here, we introduce an example to show the advantage of neural network (3).

Example 1: Consider the following nonsmooth optimization problem:

$$\begin{aligned}
\min\ & f(x)=10(x_1+x_2)^2+(x_1-2)^2+20|x_3-3|+e^{x_3}\\
\text{s.t.}\ & g(x)=(x_1+3)^2+x_2-36\le 0\\
& 2x_1+5x_3=7.
\end{aligned}\qquad (43)$$

Obviously, $f(x)$ and $g(x)$ are convex. Moreover, after a simple calculation, we have

$$\{(1,x_2,1)^T\in R^3:x_2<20\}\subseteq \operatorname{int}\{x\in R^3:(x_1+3)^2+x_2-36\le 0\}\cap\{x\in R^3:2x_1+5x_3=7\}.$$

It is easy to see that the set $\{(1,x_2,1)^T\in R^3:x_2<20\}$ is unbounded, which means that the neural networks in [36] and [39] may not be capable of solving this problem.
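For concreteness, problem (43) can also be simulated directly. Below is a minimal forward-Euler sketch of network (3) in Python (NumPy). The step size, horizon, the choice $h=\operatorname{sign}$, and the use of the positive-part multiplier $(\mu+g(x))^+$ in the $x$-equation reflect our reading of (3) and are illustrative only, not the paper's MATLAB code:

```python
import numpy as np

# Forward-Euler sketch of network (3) on problem (43).  Step size, horizon,
# h = sign, and the positive-part multiplier are our choices (assumptions).
A = np.array([[2.0, 0.0, 5.0]])          # equality constraint 2*x1 + 5*x3 = 7
b = np.array([7.0])
P = A.T @ np.linalg.inv(A @ A.T) @ A
IP = np.eye(3) - P                       # I - P

def grad_f(x):                           # a subgradient of f at x
    return np.array([20*(x[0] + x[1]) + 2*(x[0] - 2),
                     20*(x[0] + x[1]),
                     20*np.sign(x[2] - 3) + np.exp(x[2])])

def grad_g(x):                           # gradient of g(x) = (x1+3)^2 + x2 - 36
    return np.array([2*(x[0] + 3), 1.0, 0.0])

x, mu = np.array([-1.0, 0.0, 1.0]), 6.0  # initial point A1 of Fig. 3
dt = 1e-3
for _ in range(20000):
    g = (x[0] + 3)**2 + x[1] - 36
    mubar = max(mu + g, 0.0)
    dx = -IP @ (grad_f(x) + mubar*grad_g(x)) - A.T @ np.sign(A @ x - b)
    x, mu = x + dt*dx, mu + dt*0.5*(-mu + mubar)

print(x, mu)    # x settles near (-0.86, 0.86, 1.74) with 2*x1 + 5*x3 = 7; mu decays to ~0
```

Consistent with Theorem 3.1, the discretized state hits the affine manifold $2x_1+5x_3=7$ early in the run and then stays within one Euler step of it.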


Fig. 3. Trajectories of the states (x1 (t), x2 (t), x3 (t))T of neural network (3) with three different initial points in Example 1.

On the other hand, the proposed neural network (3) can be used to solve problem (43). Fig. 3 displays the transient behavior of $(x_1(t),x_2(t),x_3(t))^T$ of neural network (3) with three random initial points $A_1=(-1,0,1,6)^T$, $A_2=(-0.5,0.5,1.5,6)^T$, and $A_3=(0,1.5,2,6)^T$. It is easy to see that the states of neural network (3) starting at these different initial points all converge to the unique equilibrium point.

V. APPLICATIONS AND NUMERICAL EXAMPLES

In this section, two examples are presented to show the effectiveness of neural network (3). The simulations are conducted in MATLAB.

A. Nonlinear Convex Programming With Linear Constraints

As a special case of the nonlinear convex programming (1), we study the following nonlinear convex programming with linear constraints:

$$\text{minimize } f(x)\quad\text{subject to } Ax=b\qquad (44)$$

where $x=(x_1,x_2,\ldots,x_n)^T\in R^n$, $f:R^n\to R$ is the objective function, which is convex but may be nonsmooth, $A\in R^{m\times n}$ is a full row-rank matrix (i.e., $\operatorname{rank}(A)=m\le n$), and $b=(b_1,b_2,\ldots,b_m)^T\in R^m$. According to the discussion in Section IV, the related neural network (3) reduces to the following one-layer neural network:

$$\dot x(t)\in-(I-P)\partial f(x(t))-A^Th(Ax(t)-b)\qquad (45)$$

where $I$, $P$, and $h$ are defined as in (3).

Fig. 4. Transient behaviors of x(t) of neural network (45) with eight random initial points in Example 2.

Fig. 5. Transient behaviors of the state variables of neural network (45) in Example 3.

Theorem 5.1:
1) For any initial point $x(0)\in R^n$, there is at least one solution $x(t)$ of neural network (45), which reaches $X_2=\{x\in R^n:Ax=b\}$ in finite time and stays there thereafter.
2) $x^*$ is an equilibrium point of neural network (45) if and only if $x^*$ is an optimal solution to problem (44).
3) Each equilibrium point of neural network (45) is stable in the sense of Lyapunov, and the state of neural
network (45) is convergent to an equilibrium point of neural network (45).

Remark 4: Recently, to reduce the model complexity, some one-layer neural networks were proposed [20]–[22], [27], [29], [30], [39]. In particular, Guo et al. [29] used neural network (45) to solve problem (44) with a pseudoconvex function $f$, and proved that the state of neural network (45) is globally convergent to the equilibrium point set. According to Theorem 5.1, when $f$ is convex, neural network (45) is convergent to an equilibrium point, which improves the conclusion of [29]. The next example illustrates this.

Example 2: Consider the following optimization problem:

$$\text{minimize } f(x)=\frac{1}{2}(x_1+x_2)^2\quad\text{subject to } x_1+x_2+x_3=7.\qquad (46)$$

In this problem, the objective function $f$ is convex. Here, we use neural network (45) to solve this problem. By Theorem 5.1, for any initial point $x(0)\in R^n$, the state of neural network (45) is convergent to an equilibrium point of neural network (45). Obviously, the optimal solution set of problem (46) is $\{x=(x_1,x_2,x_3)^T:x_1+x_2=0,\ x_3=7\}$. Fig. 4 shows the transient behavior of $x(t)$ based on neural network (45) with eight random initial points $A_i$ $(i=1,2,\ldots,8)$. From Fig. 4, one can see that from different initial points $A_i$, the state of neural network (45) converges to different equilibrium points of neural network (45), all of which are optimal solutions of problem (46).

B. L1-Norm Minimization Problem

The L1-norm minimization problem, also known as the constrained least absolute deviation problem, refers to finding the minimum L1-norm solution subject to some constraints, which can be written as

$$\text{minimize } \|Cx-d\|_1\quad\text{subject to } g(x)\le 0,\ Ax=b,\ x\in\Omega\qquad (47)$$

where $C\in R^{n\times n}$, $d\in R^n$, $g(x)=(g_1(x),g_2(x),\ldots,g_p(x))^T:R^n\to R^p$ is a $p$-dimensional vector-valued function, $A\in R^{m\times n}$ is a full row-rank matrix (i.e., $\operatorname{rank}(A)=m\le n$), $b=(b_1,b_2,\ldots,b_m)^T\in R^m$, and $\Omega\subseteq R^n$ is a convex set. The L1-norm minimization problem plays an important role in engineering applications, such as manipulator control, signal processing, support vector machines, nonlinear curve fitting, and so on [42]–[44]. Obviously, the L1-norm minimization problem (47) is a nonsmooth convex optimization problem if $g(x)$ is convex.

Example 3: Consider the following L1-norm optimization problem:

$$\text{minimize } \|Cx-d\|_1\quad\text{subject to } g(x)\le 0,\ Ax=b\qquad (48)$$

where $x\in R^3$,

$$C=\begin{pmatrix}1&0&3.5\\0&2&1.6\\1&1&1\end{pmatrix},\quad d=\begin{pmatrix}1.5\\3.8\\-6.2\end{pmatrix},\quad g(x)=x_1^2-x_2+x_3,\quad A=\begin{pmatrix}1&1&1\\0&2&1\end{pmatrix},\quad b=\begin{pmatrix}0\\2\end{pmatrix}.$$

It is clear that problem (48) is a nonsmooth convex optimization problem. Neural network (3) associated with (48) can be described as

$$\begin{cases}\dot x\in-(I-P)[\partial f(x)+\partial g(x)^T\mu]-A^Th(Ax-b)\\ 2\dot\mu=-\mu+(\mu+g(x))^+\end{cases}\qquad (49)$$

where

$$I-P=\begin{pmatrix}0.1667&0.1667&-0.3333\\0.1667&0.1667&-0.3333\\-0.3333&-0.3333&0.6667\end{pmatrix}$$

and

$$f(x)=|x_1+3.5x_3-1.5|+|2x_2+1.6x_3-3.8|+|x_1+x_2+x_3+6.2|.$$
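As a quick check on the data of Example 3 (a NumPy sketch of our own, not the paper's code), one can verify that the reported point $x^*=(-1,1,0)^T$ is feasible for (48) and attains $\|Cx^*-d\|_1=10.5$, and that $I-P$ computed from $A$ reproduces the matrix displayed above:

```python
import numpy as np

# Our own verification of the data in Example 3.
C = np.array([[1.0, 0.0, 3.5],
              [0.0, 2.0, 1.6],
              [1.0, 1.0, 1.0]])
d = np.array([1.5, 3.8, -6.2])
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0]])
b = np.array([0.0, 2.0])

xs = np.array([-1.0, 1.0, 0.0])           # reported optimum of (48)
assert np.allclose(A @ xs, b)             # A x* = b
assert xs[0]**2 - xs[1] + xs[2] <= 0      # g(x*) <= 0
print(np.sum(np.abs(C @ xs - d)))         # ||C x* - d||_1 = 10.5 (up to rounding)

P = A.T @ np.linalg.inv(A @ A.T) @ A
print(np.round(np.eye(3) - P, 4))         # reproduces the I - P displayed above
```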

We simulate neural network (49) in MATLAB. Fig. 5 displays the trajectories of the state variables $x_1$, $x_2$, $x_3$, $\mu$ of neural network (49) with three different initial points $(-0.6,1,-0.2,1.2)^T$, $(-1.5,0.5,0,1.6)^T$, and
$(-1,1.5,0.4,1.8)^T$. From Fig. 5, one can easily see that the states of neural network (49) with different initial points are all convergent to $(-1,1,0,1.44)^T$, which is an equilibrium point of neural network (49). Hence, the optimal solution of the L1-norm optimization problem (48) is $x^*=(-1,1,0)^T$ and $\|Cx^*-d\|_1=10.5$.

VI. CONCLUSION

This paper presents a two-layer recurrent neural network modeled by a differential inclusion for solving nonsmooth convex optimization problems subject to convex inequality and linear equality constraints. The equilibrium point set of the proposed neural network is equivalent to the KKT optimality set of the nonsmooth convex optimization problem. Each equilibrium point of the proposed neural network is stable in the sense of Lyapunov, and the state of the proposed neural network starting from any initial point is convergent to an equilibrium point. Compared with existing neural network models, this neural network has a two-layer structure and does not depend on any penalty parameters, which makes it more convenient for design and implementation. As applications, three examples are presented to show the effectiveness of the proposed neural network. Meanwhile, it is well known that many nonlinear programming problems can be formulated as nonconvex optimization problems. Hence, proposing a simplified recurrent neural network for nonconvex optimization problems (especially pseudoconvex optimization problems) is our future work.

ACKNOWLEDGMENT

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the anonymous reviewers for their insightful and constructive comments, which helped to enrich the content and improve the presentation of this paper.

REFERENCES

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms. New York, NY, USA: Wiley, 1993.
[2] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. New York, NY, USA: Wiley, 1993.
[3] Y. Zhang, J. Wang, and Y. Xia, "A dual neural network for redundancy resolution of kinematically redundant manipulators subject to joint limits and joint velocity limits," IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 658–667, May 2003.
[4] Z.-G. Hou, L. Cheng, M. Tan, and X. Wang, "Distributed adaptive coordinated control of multi-manipulator systems using neural networks," in Proc. Robot Intell. Adv. Inform. Knowl. Process., 2010, pp. 49–69.
[5] Z.-G. Hou, M. M. Gupta, P. N. Nikiforuk, M. Tan, and L. Cheng, "A recurrent neural network for hierarchical control of interconnected dynamic systems," IEEE Trans. Neural Netw., vol. 18, no. 2, pp. 466–481, Mar. 2007.
[6] L. Cheng, Z.-G. Hou, and M. Tan, "Constrained multi-variable generalized predictive control using a dual neural network," Neural Comput. Appl., vol. 16, no. 6, pp. 505–512, Oct. 2007.
[7] Y. Liao, S.-C. Fang, and H. L. W. Nuttle, "A neural network model with bounded-weights for pattern classification," Comput. Oper. Res., vol. 31, no. 9, pp. 1411–1426, Aug. 2004.
[8] D. Tank and J. J. Hopfield, "Simple 'neural' optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circuits Syst., vol. 33, no. 5, pp. 533–541, May 1986.
[9] M. P. Kennedy and L. O. Chua, "Neural networks for nonlinear programming," IEEE Trans. Circuits Syst., vol. 35, no. 5, pp. 554–562, May 1988.


[10] Y. Xia, G. Feng, and J. Wang, "A novel recurrent neural network for solving nonlinear optimization problems with inequality constraints," IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1340–1353, Aug. 2008.
[11] Y. Xia and J. Wang, "A recurrent neural network for nonlinear convex optimization subject to nonlinear inequality constraints," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 7, pp. 1385–1394, Jul. 2004.
[12] Y. Yang and J. Cao, "The optimization technique for solving a class of non-differentiable programming based on neural network method," Nonlinear Anal., Real World Appl., vol. 11, no. 2, pp. 1108–1114, Apr. 2010.
[13] X.-B. Gao, "A novel neural network for nonlinear convex programming," IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 613–621, May 2004.
[14] Y. Xia, G. Feng, and J. Wang, "A recurrent neural network with exponential convergence for solving convex quadratic program and related linear piecewise equations," Neural Netw., vol. 17, no. 7, pp. 1003–1015, 2004.
[15] S. Qin and X. Xue, "Dynamical analysis of neural networks of subgradient system," IEEE Trans. Autom. Control, vol. 55, no. 10, pp. 2347–2352, Oct. 2010.
[16] S. Qin and X. Xue, "Dynamical behavior of a class of nonsmooth gradient-like systems," Neurocomputing, vol. 73, nos. 13–15, pp. 2632–2641, 2010.
[17] S. Qin and X. Xue, "Global exponential stability and global convergence in finite time of neural networks with discontinuous activations," Neural Process. Lett., vol. 29, no. 3, pp. 189–204, 2009.
[18] Y. Xia and J. Wang, "A recurrent neural network for solving nonlinear convex programs subject to linear constraints," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 379–386, Mar. 2005.
[19] M. Forti, P. Nistri, and M. Quincampoix, "Generalized neural network for nonsmooth nonlinear programming problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741–1754, Sep. 2004.
[20] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous activation function for linear programming," Neural Comput., vol. 20, no. 5, pp. 1366–1383, 2008.
[21] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming," IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 558–570, Apr. 2008.
[22] Y. Xia and J. Wang, "A one-layer recurrent neural network for support vector machine learning," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1261–1269, Apr. 2004.
[23] X. Xue and W. Bian, "Subgradient-based neural networks for nonsmooth convex optimization problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2378–2391, Sep. 2008.
[24] W. Bian and X. Xue, "Subgradient-based neural networks for nonsmooth nonconvex optimization problems," IEEE Trans. Neural Netw., vol. 20, no. 6, pp. 1024–1038, Jun. 2009.
[25] Q. Liu, J. Cao, and G. Chen, "A novel recurrent neural network with finite-time convergence for linear programming," Neural Comput., vol. 22, no. 11, pp. 2962–2978, Nov. 2010.
[26] M. Forti, P. Nistri, and M. Quincampoix, "Convergence of neural networks for programming problems via a nonsmooth Łojasiewicz inequality," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1471–1486, Nov. 2006.
[27] X. Gao and L.-Z. Liao, "A new one-layer neural network for linear and quadratic programming," IEEE Trans. Neural Netw., vol. 21, no. 6, pp. 918–929, Jun. 2010.
[28] L. Cheng, Z.-G. Hou, Y. Lin, M. Tan, W. C. Zhang, and F.-X. Wu, "Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks," IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 714–726, May 2011.
[29] Z. Guo, Q. Liu, and J. Wang, "A one-layer recurrent neural network for pseudoconvex optimization subject to linear equality constraints," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 1892–1900, Dec. 2011.
[30] Q. Liu, Z. Guo, and J. Wang, "A one-layer recurrent neural network for constrained pseudoconvex optimization and its application for dynamic portfolio optimization," Neural Netw., vol. 26, no. 1, pp. 99–109, Feb. 2012.
[31] W. Bian and X. Chen, "Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization," SIAM J. Optim., vol. 23, no. 3, pp. 1718–1741, 2013.
[32] Q. Liu and J. Wang, "Finite-time convergent recurrent neural network with a hard-limiting activation function for constrained optimization with piecewise-linear objective functions," IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 601–613, Apr. 2011.


[33] X. Li, C. Ma, and L. Huang, "Invariance principle and complete stability for cellular neural networks," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 3, pp. 202–206, Mar. 2006.
[34] M. Forti and A. Tesi, "Absolute stability of analytic neural networks: An approach based on finite trajectory length," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 12, pp. 2460–2469, Dec. 2004.
[35] M. Forti, P. Nistri, and D. Papini, "Global exponential stability and global convergence in finite time of delayed neural networks with infinite gain," IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1449–1463, Nov. 2005.
[36] W. Bian and X. Xue, "Neural network for solving constrained convex optimization problems with global attractivity," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 3, pp. 710–723, Mar. 2013.
[37] J. P. Aubin and A. Cellina, Differential Inclusions. Berlin, Germany: Springer-Verlag, 1984.
[38] F. Clarke, Optimization and Nonsmooth Analysis. New York, NY, USA: Wiley, 1983.
[39] Q. Liu and J. Wang, "A one-layer recurrent neural network for constrained nonsmooth optimization," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1323–1333, Oct. 2011.
[40] A. F. Filippov, "Differential equations with discontinuous right-hand side," Transl. Amer. Math. Soc., vol. 42, no. 2, pp. 199–231, 1964.
[41] G. Li, S. Song, C. Wu, and Z. Du, "A neural network model for non-smooth optimization over a compact convex subset," in Advances in Neural Networks (ISNN), vol. 3971, Jun. 2006, pp. 344–349.
[42] R. Xu, J. Jiao, B. Zhang, and Q. Ye, "Pedestrian detection in images via cascaded L1-norm minimization learning method," Pattern Recognit., vol. 45, no. 7, pp. 2573–2583, 2012.
[43] Ş. Bektaş and Y. Şişman, "The comparison of L1 and L2-norm minimization methods," Int. J. Phys. Sci., vol. 5, no. 11, pp. 1721–1727, 2010.
[44] A. Bhusnurmath and C. J. Taylor, "Graph cuts via ℓ1 norm minimization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1866–1871, Oct. 2008.

Sitian Qin was born in Shandong, China, in 1981. He received the Ph.D. degree in mathematics from the Harbin Institute of Technology, Harbin, China, in 2010. He was an Assistant Professor with the Harbin Institute of Technology, Weihai, China, from 2010 to 2013, where he is currently an Associate Professor. His current research interests include neural network theory and optimization.

Xiaoping Xue was born in Inner Mongolia, China, in 1963. He received the Ph.D. degree in mathematics from the Harbin Institute of Technology, Harbin, China, in 1991. He has been a Full Professor of Mathematics with the Harbin Institute of Technology since 1997, where he was an Assistant Professor from 1991 to 1992 and an Associate Professor from 1992 to 1997. He has authored more than 80 scientific papers and two monographs. His current research interests include functional analysis, differential inclusion, and neural network theory.
