

Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control

Hassan A. Kingravi, Girish Chowdhary, Patricio A. Vela, Member, IEEE, and Eric N. Johnson

Abstract—Classical work in model reference adaptive control for uncertain nonlinear dynamical systems with a radial basis function (RBF) neural network adaptive element does not guarantee that the network weights stay bounded in a compact neighborhood of the ideal weights when the system signals are not persistently exciting (PE). Recent work has shown, however, that an adaptive controller using specifically recorded data concurrently with instantaneous data guarantees boundedness without PE signals. That work, however, assumes fixed RBF network centers, which requires domain knowledge of the uncertainty. Motivated by reproducing kernel Hilbert space theory, we propose an online algorithm for updating the RBF centers to remove this assumption. In addition to proving boundedness of the resulting neuro-adaptive controller, a connection is made between PE signals and kernel methods. Simulation results show improved performance.

Index Terms—Adaptive control, kernel, nonlinear control systems, radial basis function (RBF) networks.

I. INTRODUCTION

IN PRACTICE, it is impractical to assume that a perfect model of a system is available for controller synthesis. Assumptions introduced in modeling, and changes to the system dynamics during operation, often result in modeling uncertainties that must be mitigated online. Model reference adaptive control (MRAC) has been widely studied for broad classes of uncertain nonlinear dynamical systems with significant modeling uncertainties [1], [14], [29], [41]. In MRAC, the system uncertainty is approximated using a weighted combination of basis functions, with the numerical values of the weights adapted online to minimize the tracking error. When the structure of the uncertainty is unknown, a neuro-adaptive approach is often employed, in which a neural network (NN) with its weights adapted online is used to capture the uncertainty.

Manuscript received May 5, 2011; revised April 13, 2012; accepted April 20, 2012. Date of publication May 30, 2012; date of current version June 8, 2012. This work was supported in part by NSF ECS-0238993, NSF ECS-0846750, and the NASA Cooperative Agreement NNX08AD06A. H. A. Kingravi and P. A. Vela are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0320 USA (e-mail: [email protected]; [email protected]). G. Chowdhary is with the Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139-4307 USA (e-mail: [email protected]). E. N. Johnson is with the School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0320 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2012.2198889

A popular example of such a NN is the Gaussian radial basis function (RBF) NN, for which the universal approximation property is known to hold [31]. Examples of MRAC systems employing Gaussian RBFs can be found in [26], [33], [39], [43], and [44]. In practice, then, nonlinear MRAC systems typically contain an additional online-learning component.

What differentiates MRAC from traditional online-learning strategies is the existence of the dynamical system and its connection to a control task. The strong coupling between the learning and the evolving dynamical system through the controller imposes constraints on the online-learning strategy employed. These constraints ensure viability of the system's closed-loop stability and performance. For example, classical gradient-based MRAC methods require a condition of persistency of excitation (PE) in the system states [3]. This condition is often impractical to implement and/or infeasible to monitor online. Hence, authors have introduced various modifications to the adaptive law to ensure that the weights stay bounded around an a priori determined value (usually set to 0). Examples include Ioannou's σ-mod, Narendra's e-mod, and the use of a projection operator to bound the weights [14], [41]. Recent work in concurrent learning has shown that if carefully selected and online-recorded data are used concurrently with current data for adaptation, then the weights remain bounded within a compact neighborhood of the ideal weights [8], [9]. Neither concurrent learning nor the aforementioned modifications require PE.

When concurrent learning is used, a set of RBF centers must be chosen over the domain of the uncertainty in order to successfully learn the uncertainty. In prior neuro-adaptive control research, it is either assumed that the centers for the RBF network are fixed [8], [9], [19], [32], [42], that the operating domain is restricted to a compact set [44], or that the centers are moved to minimize the tracking error e [27], [38]. In the first case, the system designer is assumed to have some domain knowledge about the uncertainty to determine how the centers should be selected. In the last case, the updates favor instantaneous tracking-error reduction, and the resulting center movement does not guarantee optimal uncertainty capture across the entire operating domain. In this paper, utilizing the fact that the Gaussian RBF satisfies the properties of a Mercer kernel, we use methods from the theory of reproducing kernel Hilbert spaces (RKHSs) to remove the need for domain knowledge and to propose a novel kernel adaptive control algorithm.



Partly due to the success of support vector machines, there has been great interest in recent years in kernel methods, a class of algorithms exploiting RKHS properties that can be used for classification [6], regression [2], [37], and unsupervised learning [10]. We first use RKHS theory to make a connection between PE signals in the state space (i.e., the input space) and PE signals in the higher dimensional feature space (with respect to the RBF bases). In particular, we show that the amount of persistent excitation inserted in the states of the system may vanish or greatly diminish if the RBF centers are improperly chosen. We then utilize this connection, along with the kernel linear independence test [30], to propose an online algorithm called budgeted kernel restructuring (BKR). BKR picks centers so as to ensure that any excitation in the system does not vanish or diminish. The algorithm is then augmented with concurrent learning (CL). The resulting kernel adaptive algorithm, BKR-CL, uses instantaneous as well as specifically selected and online-recorded data points concurrently for adaptation. Lyapunov-like analysis proves convergence of the RBF weights to a compact neighborhood of their ideal values (in the sense of the universal approximation property) if the system states are exciting over a finite interval. BKR-CL does not require PE. Further, BKR-CL can control an uncertain multivariable dynamical system effectively even if all the RBF centers are initialized to the same value (e.g., to 0). In addition to removing the assumption of fixed RBF centers, we show that, given a fixed budget (maximum number of allowable centers), the presented method outperforms existing methods that uniformly distribute the centers over an expected domain. Effectiveness of the algorithm is shown through simulation and comparison with other neuro-adaptive controllers.

To the authors' knowledge, little work exists explicitly connecting ideas from RKHS theory to neuro-adaptive control. Recent work on bridging kernel methods with filtering and estimation exists under the name of kernel adaptive filtering [22]. Kernel recursive least-squares methods [11] and kernel least mean square algorithms [21] have been previously explored for kernel filtering, which is mainly used for system-identification problems, where the goal is to minimize the error between the system model and an estimation model. However, the main contribution of this paper is the presentation of an adaptive control technique that uses RKHS theory; the key requirement is that the tracking error between the reference model and the system, and the adaptation error between the adaptive element and the actual uncertainty, be simultaneously driven to a bounded set. Kernel adaptive filtering only addresses the former requirement. Although there has been other work applying kernel methods such as support vector machines to nonlinear control problems [20], [30], it does not make any explicit connections to PE conditions, and does not show boundedness of the weights in a region around their true values. This paper takes the first crucial steps in establishing that connection, and therefore serves to further bring together ideas from machine learning and adaptive control.

Organization: Section II outlines some preliminaries, including the definition of PE and the relevant ideas from RKHS theory. Section III poses the MRAC problem for uncertain multivariable nonlinear dynamical systems, and outlines the CL method. Section IV establishes a connection between PE and RKHS theory, then utilizes the connection to outline the BKR algorithm for online center selection. Section V outlines the BKR-CL algorithm, and establishes the boundedness of the weights for the centers using Lyapunov-like analysis. Section VI presents the results of simulated experiments. Section VII concludes this paper.

II. PRELIMINARIES

A. RKHS

Let H be a Hilbert space of functions defined on the domain D ⊂ Rⁿ. Given any element f ∈ H and x ∈ D, by the Riesz representation theorem [12], there exists some unique element K_x (called a point evaluation functional) such that

f(x) = ⟨f, K_x⟩_H.    (1)

H is called an RKHS if every such linear functional K_x is continuous (something which is not true in an arbitrary Hilbert space). It is possible to construct these spaces using tools from operator theory. The following is standard material; see [13] and [10] for more details. A Mercer kernel on D ⊂ Rⁿ is any continuous, symmetric, positive-semidefinite function of the form k : D × D → R. Associated to k is a linear operator T_k : L²(D) → L²(D), which can be defined as

T_k(f)(x) := ∫_D k(x, s) f(s) ds

where f is a function in L²(D), the space of square integrable functions on the domain D. Then the following theorem holds.

Theorem 1 (Mercer [23]): Suppose k is a continuous, symmetric, positive-semidefinite kernel. Then there is an orthonormal basis {e_i}_{i∈I} of L²(D) consisting of eigenfunctions of T_k such that the corresponding sequence of eigenvalues (λ_i)_{i∈I} is nonnegative. Furthermore, the eigenfunctions corresponding to nonzero eigenvalues are continuous on D, and k has the representation

k(s, t) = Σ_{j=1}^∞ λ_j e_j(s) e_j(t)

where the convergence is absolute and uniform.

If Σ_{i=1}^∞ λ_i² < ∞, this orthonormal set of eigenfunctions {e_i}_{i∈I} can be used to construct a space H := {f | f = Σ_{i=1}^∞ α_i e_i} for some sequence of reals (α_i)_{i∈I}, with the constraint that ‖f‖²_H := Σ_{i=1}^∞ α_i²/λ_i < ∞, where ‖f‖_H is the norm induced by k. It can be seen that the point evaluation functional K_x is given by K_x = Σ_i λ_i e_i(x) e_i. This implies that k(x, y) = ⟨K_x, K_y⟩_H, and that for a function f ∈ H, (1) holds, which is known as the reproducing property. The importance of these spaces from a machine learning perspective is based on the fact that a kernel function meeting the above conditions implies the existence of some Hilbert space H (of functions) and a mapping ψ : D → H such that

k(x, y) = ⟨ψ(x), ψ(y)⟩_H.    (2)
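As a quick numerical check of Mercer's condition (our illustration, not part of the original text): a kernel matrix built from the Gaussian kernel on any finite set of points is symmetric positive semidefinite, so its eigenvalues are nonnegative up to floating-point error.

```python
import numpy as np

def gaussian_kernel(x, y, mu=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 mu^2)), a Mercer kernel on R^n.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * mu ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))        # sample points in D ⊂ R^2
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

eigvals = np.linalg.eigvalsh(K)                 # K is symmetric PSD
print("smallest eigenvalue:", eigvals.min())    # ≈ 0 or positive
assert eigvals.min() > -1e-10
```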

For a given kernel function, the mapping ψ(x) → H (i.e., the set of point evaluation functionals) does not have to be unique, and is often unknown. However, since ψ is implicit, it does not need to be known for most machine learning algorithms, which exploit the nonlinearity of the mapping to create nonlinear algorithms from linear ones [10]. The Gaussian function used in RBF networks, given by

k(x, y) = e^(−‖x−y‖²/2μ²)

with bandwidth μ², is an example of a bounded kernel function, which generates an infinite dimensional RKHS H.¹ Given a point x ∈ D, the mapping ψ(x) → H can be thought of as a vector in the Hilbert space H. In D, we have the relation

ψ(x) = K_x(·) = e^(−‖x−·‖²/2μ²).

The remainder of this paper uses the notation K_x(·) = k(x, ·) = e^(−‖x−·‖²/2μ²). Fixing a dataset C = {c₁, ..., c_l}, where c_i ∈ Rⁿ, let

F_C := { Σ_{i=1}^l α_i k(c_i, ·) : α_i ∈ R }    (3)

be the linear subspace generated by C in H. Note that F_C is a class of functions: any RBF network with the same centers and the same fixed bandwidth (but without bias) is an element of this subspace. Let σ(x) = [k(x, c₁), ..., k(x, c_l)]ᵀ, and let W = [w₁, ..., w_l], where w_i ∈ R. Then Wᵀσ(x) is the output of a standard RBF network. In the machine learning literature, σ(x) is sometimes known as the empirical kernel map. Two different datasets C₁ and C₂ generate two different families of functions F_{C₁} and F_{C₂}, and two different empirical kernel maps σ_{C₁}(x) and σ_{C₂}(x). Finally, given the above dataset C, one can form an l × l kernel matrix given by K_ij := k(c_i, c_j). The following theorem is one of the cornerstones of RBF network theory, and will be used in Section IV.

Theorem 2 (Micchelli [24]): If the function k(x, y) is a positive-definite Mercer kernel, and if all the points in C are distinct, the kernel matrix K is nonsingular.

B. PE Signals

In this paper, we use Tao's definition of a PE signal [41].

Definition 1: A bounded vector signal Φ(t) is exciting over an interval [t, t + T], T > 0 and t ≥ t₀, if there exists γ > 0 such that

∫_t^{t+T} Φ(τ)Φᵀ(τ) dτ ≥ γI.    (4)

Definition 2: A bounded vector signal Φ(t) is persistently exciting if for all t > t₀ there exists T > 0 and γ > 0 such that

∫_t^{t+T} Φ(τ)Φᵀ(τ) dτ ≥ γI.    (5)

The strength of the signal depends on the value of γ.

¹In this paper, we assume that the basis functions for the RBF network are Gaussian, but these ideas can be generalized to other kernel functions.
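To make the empirical kernel map concrete, the following self-contained sketch (ours; the centers, bandwidth, and weights are arbitrary illustrative values) evaluates an RBF network output Wᵀσ(x) and verifies the nonsingularity of K guaranteed by Theorem 2 for distinct centers.

```python
import numpy as np

def k(x, y, mu=0.5):
    # Gaussian kernel with bandwidth mu^2.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * mu ** 2))

def empirical_kernel_map(x, centers, mu=0.5):
    # sigma(x) = [k(x, c_1), ..., k(x, c_l)]^T
    return np.array([k(x, c, mu) for c in centers])

centers = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
W = np.array([0.3, -1.2, 0.7])                 # one weight per center

x = np.array([0.5, 0.5])
sigma_x = empirical_kernel_map(x, centers)
print("RBF network output:", W @ sigma_x)      # W^T sigma(x)

# Kernel matrix K_ij = k(c_i, c_j); distinct centers imply det(K) != 0.
K_mat = np.array([[k(ci, cj) for cj in centers] for ci in centers])
print("det(K):", np.linalg.det(K_mat))
```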

III. MRAC AND CL

This section outlines the formulation of MRAC using approximate model inversion [8], [15], [17], [18], [27]. Let D_x ⊂ Rⁿ be compact, x(t) ∈ D_x be the known state vector, and δ ∈ Rᵏ denote the control input. Consider the uncertain multivariable nonlinear dynamical system

ẋ = f(x(t), δ(t))    (6)

where the function f is assumed to be Lipschitz continuous in x ∈ D_x, and the control input δ is assumed to be bounded and piecewise continuous. These conditions ensure the existence and uniqueness of the solution to (6) over a sufficiently large domain D_x.

Since the exact model (6) is usually not available or not invertible, an approximate inversion model f̂(x, δ) is introduced, which is used to determine the control input

δ = f̂⁻¹(x, ν)    (7)

where ν is the pseudo control input, which represents the desired model output ẋ and is expected to be approximately achieved by δ. Hence, the pseudo control input is the output of the approximate inversion model

ν = f̂(x, δ).    (8)

This approximation results in a model error of the form

ẋ = f̂(x, δ) + Δ(x, δ)    (9)

where the model error Δ : Rⁿ⁺ᵏ → Rⁿ is given by

Δ(x, δ) = f(x, δ) − f̂(x, δ).    (10)

A reference model is chosen to characterize the desired response of the system

ẋ_rm(t) = f_rm(x_rm(t), r(t))    (11)

where f_rm(x_rm(t), r(t)) denotes the reference model dynamics, which are assumed to be continuously differentiable in x for all x ∈ D_x ⊂ Rⁿ, and it is assumed that all requirements for guaranteeing the existence of a unique solution to (11) are satisfied. The command r(t) is assumed to be bounded and piecewise continuous. Furthermore, it is also assumed that the reference model states remain bounded for a bounded reference input.

Consider a tracking control law consisting of a linear feedback part ν_pd = Ke, a linear feedforward part ν_rm = ẋ_rm, and an adaptive part ν_ad(x), in the following form [7], [16], [27]:

ν = ν_rm + ν_pd − ν_ad.    (12)

Define the tracking error e as e(t) = x_rm(t) − x(t); then, letting A = −K, the tracking error dynamics are found to be

ė = Ae + B[ν_ad(x, δ) − Δ(x, δ)].    (13)

The baseline full-state feedback controller ν_pd = Ke is designed to ensure that A is a Hurwitz matrix. Hence, for any positive definite matrix Q ∈ Rⁿˣⁿ, a positive definite solution P ∈ Rⁿˣⁿ exists to the Lyapunov equation

AᵀP + PA + Q = 0.    (14)
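Numerically, P in (14) can be obtained with a standard Lyapunov solver. A minimal sketch (ours; the gain matrix is an assumed example, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical Hurwitz A = -K for an assumed feedback gain K.
K = np.array([[2.0, 0.0],
              [0.0, 3.0]])
A = -K
Q = np.eye(2)                              # any positive definite Q

# solve_continuous_lyapunov solves a x + x a^H = q; passing A^T and -Q
# yields A^T P + P A = -Q, which is exactly (14).
P = solve_continuous_lyapunov(A.T, -Q)
assert np.all(np.linalg.eigvalsh(P) > 0)   # P is positive definite
```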


Let x̄ = [x, δ] ∈ Rⁿ⁺ᵏ. Generally, two cases for characterizing the uncertainty Δ(x̄) are considered. In structured uncertainty, the mapping Δ(x̄) is known, whereas in unstructured uncertainty it is unknown. We focus on the latter in this paper. Assume that it is only known that the uncertainty Δ(x̄) is continuous and defined over a compact domain D ⊂ Rⁿ⁺ᵏ. Let σ(x̄) = [1, σ₂(x̄), σ₃(x̄), ..., σ_l(x̄)]ᵀ be a vector of known RBFs. For i = 2, 3, ..., l, let c_i denote the RBF centroid and let μ denote the RBF bandwidth. Then for each i, the RBFs are given as

σ_i(x̄) = e^(−‖x̄−c_i‖²/2μ²)    (15)

which can also be written as k(c_i, ·), as per the notation introduced earlier (§II). Appealing to the universal approximation property of RBF NNs [31], [40], we have that, given a fixed number of RBFs l, there exist ideal weights W* ∈ Rⁿˣˡ and a function ε̃(x̄) such that the following approximation holds for all x̄ ∈ D, where D is compact:

Δ(x̄) = W*ᵀσ(x̄) + ε̃(x̄)    (16)

where ε̄ = sup_{x̄∈D} ‖ε̃(x̄)‖ can be made arbitrarily small when a sufficient number of RBFs are used. To make use of the universal approximation property, an RBF NN can be used to approximate the unstructured uncertainty in (10). In this case, the adaptive element takes the form

ν_ad(x̄) = Wᵀσ(x̄)    (17)

where W ∈ Rⁿˣˡ. The goal is to design an adaptive law such that W(t) → W* as t → ∞ and the tracking error e remains bounded. A commonly used update law, which will be referred to here as the baseline adaptive law, is given as [1], [29], [41]

Ẇ = −Γ_W σ(x̄)eᵀPB.    (18)

The adaptive law (18) guarantees that the weights approach their ideal values (W*) if and only if the signal σ(x̄) is PE. In the absence of PE, without additional modifications such as σ-mod [14], e-mod [28], or projection-based adaptation [41], the law does not guarantee that the weights W remain bounded. The work in [8] shows that if specifically selected recorded data is used concurrently with instantaneous measurements, then the weights approach and stay bounded in a compact neighborhood of the ideal weights, subject to a sufficient condition on the linear independence of the recorded data; PE is not needed. This is captured in the following theorem.

Theorem 3 ([8]): Consider the system in (6), the control law of (12), x̄(0) ∈ D, where D is compact, and the case of unstructured uncertainty. For the jth recorded data point, let ε_j(t) = Wᵀ(t)σ(x̄_j) − Δ(x̄_j), with Δ(x̄_j) for a stored data point j calculated using (10) as Δ(x̄_j) = ẋ_j − ν_j. Also, let p be the number of recorded data points σ(x̄_j) in the matrix Z = [σ(x̄₁), ..., σ(x̄_p)]. If rank(Z) = l, then the following weight update law:

Ẇ = −Γ_W σ(x̄)eᵀPB − Γ_W Σ_{j=1}^p σ(x̄_j) ε_jᵀ    (19)


renders the tracking error e and the RBF NN weight errors W̃ uniformly ultimately bounded. Furthermore, the adaptive weights W(t) will approach and remain bounded in a compact neighborhood of the ideal weights W*. The matrix Z will be referred to as the history stack.

IV. KERNEL LINEAR INDEPENDENCE AND THE BUDGETED KERNEL RESTRUCTURING ALGORITHM

A. PE Signals and the RKHS

In this section, we make a connection between adaptive control and kernel methods by leveraging RKHS theory to relate PE of x̄(t) to PE of σ(x̄(t)). Let C = {c₁, ..., c_l}, and recall that F_C represents the linear subspace generated by C in H [see (3)]. Let G = σ(x̄(t))σ(x̄(t))ᵀ; then

G = [k(x̄, c_i) k(x̄, c_j)]_{i,j=1}^l = [⟨ψ_x̄, ψ_{c_i}⟩⟨ψ_x̄, ψ_{c_j}⟩]_{i,j=1}^l

where ⟨ψ_x̄, ψ_{c_i}⟩ is shorthand for ⟨ψ(x̄), ψ(c_i)⟩. A matrix G is positive definite if and only if vᵀGv > 0 for all v ∈ Rˡ, v ≠ 0. In the above, this translates to

vᵀGv = Σ_{i,j=1}^l v_i v_j G_{ij}
     = Σ_{i,j=1}^l v_i v_j ⟨ψ(x̄), ψ(c_i)⟩⟨ψ(x̄), ψ(c_j)⟩
     = ⟨ψ(x̄), Σ_{i=1}^l v_i ψ(c_i)⟩ ⟨ψ(x̄), Σ_{j=1}^l v_j ψ(c_j)⟩
     = ⟨ψ(x̄), Σ_{i=1}^l v_i ψ(c_i)⟩².

From Micchelli's theorem (Theorem 2), it is known that if c_i ≠ c_j, then ψ(c_i) and ψ(c_j) are linearly independent in H. Further, if the trajectory x̄(t) ∈ Rⁿ⁺ᵏ is bounded for all time, then k(x̄(t), c) ≠ 0. From this, it follows trivially that the signal ∫_t^{t+T} G(τ)dτ is bounded. Furthermore, the next theorem follows immediately.

Theorem 4: Suppose x̄(t) evolves in the state space according to (6). If there exists some time t_f ∈ R⁺ such that the mapping ψ(x̄(t)) → H for t > t_f is orthogonal to the linear subspace F_C ⊂ H for all time, then the signal σ(x̄(t)) is not PE.

If the state x̄(t) reaches a stationary point, we can state a similar theorem.

Theorem 5: Suppose x̄(t) evolves in the state space according to (6). If there exists some state x̄_f ∈ Rⁿ⁺ᵏ and some t_f ∈ R⁺ such that x̄(t) = x̄_f for all t > t_f, then σ(x̄(t)) is not PE.
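The following sketch (our numerical illustration of Theorems 4 and 5, not code from the paper) makes the point concrete: a sinusoidal trajectory that is exciting in the state space produces an integrated Gram matrix ∫ G(τ) dτ whose minimum eigenvalue collapses toward zero once the trajectory stays far from every center.

```python
import numpy as np

def sigma(x, centers, mu=0.5):
    return np.array([np.exp(-np.linalg.norm(x - c) ** 2 / (2 * mu ** 2))
                     for c in centers])

centers = [np.array([0.0]), np.array([1.0])]
dt, T = 0.01, 5.0
ts = np.arange(0.0, T, dt)

for offset in (0.0, 5.0):                     # near vs. far from the centers
    # x(t) = offset + sin(t): the same excitation in the state space.
    G_int = sum(np.outer(s, s) for s in
                (sigma(np.array([offset + np.sin(t)]), centers) for t in ts)) * dt
    print(offset, np.linalg.eigvalsh(G_int).min())
# Far from the centers the minimum eigenvalue is ~0: excitation in x(t)
# does not carry over to sigma(x(t)).
```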



Fig. 1. Example Hilbert space mapping. The trajectory x̄(t) evolves in the state space Rⁿ via an ODE; the mapping ψ induces an equivalent ODE in H. If c₁ and c₂ are the centers for the RBFs, then they generate the linear subspace F_C ⊂ H, which is a family of linearly parameterized functions, via the mappings ψ(c₁) and ψ(c₂).

Therefore, PE of σ(x̄(t)) follows only if neither of the above conditions is met. Fig. 1 shows an example of the mapping ψ in H given the trajectory x̄(t), while Fig. 2 depicts a geometric description of a non-PE signal in the Hilbert space. The signal ψ(x̄(t)) in H becomes orthogonal to the centers C = {c₁, ..., c_l} mapped to H if x̄(t) moves far away from them in Rⁿ. Though orthogonality is desired in the state space Rⁿ for guaranteeing PE, orthogonality in H is detrimental to PE of σ(x̄(t)). Recall that without PE of σ(x̄(t)), the baseline adaptive law of (18) does not guarantee that the weights converge to a neighborhood of the ideal weights. This shows that in order to guarantee PE of σ(x̄(t)), not only does x̄(t) have to be PE, but the centers should also be such that the mapping ψ(x̄(t)) → H is not orthogonal to the linear subspace F_C generated by the centers.

It is intuitively clear that locality is a concern when learning the uncertainty; if the system evolves far away from the original set of centers, the old centers will give less accurate information than new ones that are selected to better represent the current operating domain. Theorem 4 enables us to state this more rigorously; keeping the centers close to x̄(t) ensures PE of σ(x̄(t)) (as long as x̄(t) is also PE). At the same time, we do not want to throw out all of the earlier centers either, in order to preserve global function approximation (over the entire domain), especially if the system is expected to revisit these regions of the state space.

The work in [8] shows that if the history stack is linearly independent, the law in (19) suffices to ensure convergence to a compact neighborhood of the ideal weight vector W* without requiring PE of σ(x̄(t)). We can reinterpret the CL gradient descent in our RKHS framework. Note that the CL adaptive law in (19) will ensure ε_j(t) → ε̃ by driving W(t) → W*. By the above analysis, σ(x̄_j) is the projection of ψ(x̄_j) onto F_C. If ψ(x̄(t)) is orthogonal to F_C in H, the first term in (19) [which is also the baseline adaptive law of (18)] vanishes, causing the evolution of the weights to stop with respect to the tracking error e. The condition that Σ_{j=1}^p σ(x̄_j)σ(x̄_j)ᵀ have the same rank as the number of centers in the RBF network is equivalent to the statement that one has a collection of vectors {x̄_j}_{j=1}^p whose projections {ψ(x̄_j)}_{j=1}^p onto F_C allow the weights to continue evolving to minimize the difference in uncertainty.

Fig. 2. Example of a non-PE signal: any trajectory x̄(t) ∈ Rⁿ inducing a trajectory ψ(x̄(t)) ∈ H that is orthogonal to F_C after some time t_f renders σ(x̄(t)) non-PE.

B. Linear Independence

This section outlines a strategy to select RBF centers online. The algorithm presented here ensures that the RBF centers cover the current domain of operation, while keeping at least some centers reflecting previous domains of operation. We use the scheme introduced in [30] to ensure that the RBF centers reflect the current domain of operation. Similar ideas are used in the kernel adaptive filtering literature to curb filter complexity [22].

At any point in time, our algorithm maintains a "dictionary" of centers C_l = {c_i}_{i=1}^l, where l is the current size of the dictionary and N_D is the upper limit on the number of points (the budget). To test whether a new center c_l should be inserted into the dictionary, we check whether it can or cannot be approximated in H by the current set of centers. This test is performed using

γ = ‖ Σ_{i=1}^l a_i ψ(c_i) − ψ(c_l) ‖²_H    (20)

where the a_i are the coefficients of the linear combination. Unraveling the above equation in terms of (2) shows that the coefficients a_i can be determined by minimizing γ, which yields the optimal coefficient vector â_l = K_l⁻¹ k̂_l, where K_l = k(C_l, C_l) is the kernel matrix for the dictionary dataset C_l and k̂_l = k(C_l, c_l) is the kernel vector. Substituting the optimal value â_l into (20) gives

γ = k(c_l, c_l) − k̂_lᵀ â_l.    (21)

The algorithm is summarized in Algorithm 1. For more details see [30]; in particular, an efficient way to implement the updates is given there, which we do not reproduce here. Note that due to the nature of the linear independence test, every time a state x̄(t) is encountered that cannot be approximated within tolerance η by the current dictionary C_l, it is added to the dictionary. Further, since the dictionary is designed to keep the most varied basis possible, this is the best one can do on a budget using only instantaneous online data. Therefore, in the final algorithm, all the centers can be initialized to 0, and the linear independence test can be periodically run to check whether or not to add a new center to the RBF network.


Algorithm 1 Kernel linear independence test
Input: new point c_l, tolerance η, budget N_D.
  Compute k̂_l = k(C_l, c_l)
  Compute â_l = K_l⁻¹ k̂_l
  Compute γ as in (21)
  if γ > η then
    if l < N_D then
      Update the dictionary by storing the new point c_l, and recalculate the γ's for each of the points
    else
      Update the dictionary by deleting the point with the minimal γ, and then recalculate the γ's for each of the points
    end if
  end if
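A minimal Python sketch of Algorithm 1 (our simplification: the γ values are recomputed by brute force instead of with the efficient recursive updates of [30], and all names are ours):

```python
import numpy as np

def kern(x, y, mu=0.5):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * mu ** 2))

def gamma(candidate, dictionary, mu=0.5):
    # Residual of projecting psi(candidate) onto span{psi(c)}, eqs. (20)-(21).
    K = np.array([[kern(ci, cj, mu) for cj in dictionary] for ci in dictionary])
    k_hat = np.array([kern(ci, candidate, mu) for ci in dictionary])
    a_hat = np.linalg.solve(K, k_hat)          # a_hat = K^{-1} k_hat
    return kern(candidate, candidate, mu) - k_hat @ a_hat

def bkr_update(dictionary, x_new, eta=0.1, budget=12, mu=0.5):
    # One budgeted dictionary update in the spirit of Algorithm 1.
    if not dictionary:
        return [x_new]
    if gamma(x_new, dictionary, mu) <= eta:
        return dictionary                      # well approximated; no change
    if len(dictionary) < budget:
        return dictionary + [x_new]            # still under budget: append
    # At budget: replace the most redundant point (minimal gamma w.r.t. the rest).
    gammas = [gamma(c, dictionary[:i] + dictionary[i + 1:], mu)
              for i, c in enumerate(dictionary)]
    out = list(dictionary)
    out[int(np.argmin(gammas))] = x_new
    return out
```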

This is fundamentally different from update laws of the kind given in [27], which attempt to move the centers to minimize the tracking error e. Since such update laws are essentially rank-1, if all the centers are initialized to 0, they will all move in the same direction together. Furthermore, from the above analysis and from (15), it is clear that the amount of excitation is maximized when the centers are previous states themselves.

Summarizing, the following are the advantages of picking centers online with the linear independence test (21).
1) In light of Theorem 4, Algorithm 1 ensures that excitation inserted in the system at the current time does not disappear, by selecting centers such that F_C is not orthogonal to the current states. In less formal terms, this means that at least some centers are "sufficiently" close to the current state.
2) This also implies that not all of the old centers are discarded, which means that all of the regions where the system has previously operated are represented to some extent in the dictionary. This implies global function approximation.
3) Algorithm 1 enables the design of adaptive controllers without any prior knowledge of the domain. For example, this method would allow one to initialize all centers to zero, with appropriate centers then selected by Algorithm 1 online.
4) On a budget, selecting centers with Algorithm 1 is better than evenly spacing centers, since the centers are selected along the path of the system in the state space. This results in a more judicious distribution of centers without any prior knowledge of the uncertainty.

If the centers for the system are picked using the kernel linear independence test, and the weight update law is given by the standard baseline law (18), we call the resulting algorithm BKR.

V. BKR-CL ALGORITHM

A. Motivation

The BKR algorithm selects centers in order to ensure that any inserted excitation does not vanish. However, to fully leverage the universal approximation property [see (16)], we must also prove that the weights are driven toward their ideal values. In this section, we present a CL adaptive law that guarantees that the weights approach and remain bounded in a compact domain around their ideal weights by concurrently utilizing online selected and recorded data with instantaneous data for adaptation. The CL adaptive law is wedded with BKR to create BKR-CL.

We begin by describing the algorithm used to select, record, and remove data points in the history stack. Note that Algorithm 1 picks the centers discretely; therefore, there always exists an interval [t_k, t_{k+1}], k ∈ N, over which the centers are fixed. This discrete update of the centers introduces switching in the closed loop system. Let σᵏ(x̄) denote the value of σ given by this particular set of centers, denote by Wᵏ* the ideal set of weights for these centers, and denote by σᵏ the radial basis function for these centers. Let p ∈ N denote the subscript of the last point stored. For a stored data point x̄_j, let σ_jᵏ ∈ Rˡ denote σᵏ(x̄_j). We let S_t = [x̄₁, ..., x̄_p] denote the matrix containing the recorded information in the history stack at time t; then Z_t = [σ₁ᵏ, ..., σ_pᵏ] is the matrix containing the output of the RBF function for the current set of RBF centers. The pth column of Z_t will be denoted by Z_t(:, p). It is assumed that the maximum allowable number of recorded data points is limited due to memory or processing power considerations. Therefore, we require that Z_t have a maximum of p̄ ∈ N columns; clearly, in order to be able to satisfy rank(Z_t) = l, p̄ ≥ l. For the jth data point, the associated model error Δ(x̄_j) is assumed to be stored in the array Δ̄(:, j) = Δ(x̄_j). The uncertainty of the system is estimated by estimating ẋ_j for the jth recorded data point using optimal fixed-point smoothing and solving (10) to get Δ(x̄_j) = ẋ_j − ν_j.

The history stack is populated using an algorithm that aims to maximize the minimum singular value of the symmetric matrix Ω = Σ_{j=1}^p σᵏ(x̄_j)σᵏ(x̄_j)ᵀ. Thus, any data point linearly independent of the data stored in the history stack is included in the history stack. At the initial time t = 0 and k = 0, the algorithm begins by setting Z_t(:, 1) = σ⁰(x̄(t₀)). The algorithm then selects sufficiently different points for storage; a point is considered sufficiently different if it is linearly independent of the points in the history stack, or if it is sufficiently different, in the sense of the Euclidean norm, from the last point stored. Furthermore, if Algorithm 1 updates the dictionary by adding a center, then this point is also added to the history stack (if the maximum allowable size of the history stack has not been reached), so the rank of Z_t is maintained. If the number of stored points exceeds the maximum allowable number, the algorithm seeks to incorporate new data points in a manner that increases the minimum singular value of Z_t; the current data point is added to the history stack only if swapping it with an old point increases the minimum singular value.

One can see that Algorithms 1 and 2 work similarly to one another. The structure of Algorithm 2 is a condition that relates to the PE of the signal x̄(t), by ensuring that the history stack is as linearly independent in Rⁿ as possible, while Algorithm 1 does the same for the PE of the signal σ(x̄(t)), trying to ensure that the center set is as linearly independent in H as possible. In order to approximate the uncertainty well, both algorithms are needed.

Algorithm 2 Singular value maximizing algorithm for recording data points
Require: p ≥ 1
  if ‖σᵏ(x(t)) − σ_pᵏ‖²/‖σᵏ(x(t))‖ ≥ ε, or a new center is added by Algorithm 1 without replacing an old center, then
    p = p + 1
    S_t(:, p) = x̄(t); {store Δ̄(:, p) = ẋ − ν(t)}
  else if a new center is added by Algorithm 1 by replacing an old center then
    if the old center was found in S_t then
      overwrite the old center in the history stack with the new center
      set p equal to the location of the data point replaced in the history stack
    end if
  end if
  Recalculate Z_t = [σ₁ᵏ, ..., σ_pᵏ]
  if p ≥ p̄ then
    T = Z_t
    S_old = min SVD(Z_tᵀ)
    for j = 1 to p do
      Z_t(:, j) = σᵏ(x(t))
      S(j) = min SVD(Z_tᵀ)
      Z_t = T
    end for
    find max S and let k denote the corresponding column index
    if max S > S_old then
      S_t(:, k) = x̄(t); {store Δ̄(:, k) = ẋ − ν(t)}
      Z_t(:, k) = σᵏ(x(t))
      p = p − 1
    else
      p = p − 1
      Z_t = T
    end if
  end if
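A compact sketch of the over-budget branch of Algorithm 2 (ours; it isolates the minimum-singular-value swap test and assumes the stored model errors Δ(x̄_j) are maintained alongside the columns):

```python
import numpy as np

def min_sv(Z):
    # Minimum singular value of the history stack (columns are sigma^k(x_j)).
    return np.linalg.svd(Z, compute_uv=False).min()

def record_point(Z, sigma_new, p_max):
    """Add column sigma_new to Z if under budget; otherwise swap it in only
    when doing so increases the minimum singular value of Z."""
    if Z.shape[1] < p_max:
        return np.column_stack([Z, sigma_new])
    s_old = min_sv(Z)
    scores = []
    for j in range(Z.shape[1]):                # test each possible swap
        Zj = Z.copy()
        Zj[:, j] = sigma_new
        scores.append(min_sv(Zj))
    j_best = int(np.argmax(scores))
    if scores[j_best] > s_old:                 # keep the swap only if it helps
        Z = Z.copy()
        Z[:, j_best] = sigma_new
    return Z
```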

B. Stability Analysis

In this section, we prove the boundedness of the closed loop signals when using BKR-CL. Over each interval between switches in the NN approximation, the tracking error dynamics are given by the following differential equation:

ė = Ae + B[Wᵀσᵏ(x̄) − Δ(x̄)].    (22)

The NN approximation error of (16) for the kth system can be rewritten as

Δ(x̄) = Wᵏ*ᵀσᵏ(x̄) + ε̃ᵏ(x̄).    (23)

We prove the following equivalent of Theorem 3 for the following switching weight update law:

Ẇ = −Γ_W σᵏ(x̄)eᵀPB − Γ_W Σ_{j=1}^p σᵏ(x̄_j) ε_jᵏᵀ.    (24)
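In an implementation, (24) is integrated in discrete time between center switches. One Euler step might look as follows (our sketch; Γ_W, the step size, and the data layout are assumptions rather than code from the paper):

```python
import numpy as np

def cl_weight_update(W, sigma_x, e, P, B, stack_sigmas, stack_deltas,
                     Gamma_W=1.0, dt=0.01):
    """One Euler step of (24).

    W            : (l, n) weight matrix
    sigma_x      : (l,) current RBF vector sigma^k(x_bar)
    e            : (n,) tracking error
    P, B         : (n, n) Lyapunov solution of (14) and input matrix of (13)
    stack_sigmas : list of (l,) vectors sigma^k(x_bar_j) from the history stack
    stack_deltas : list of (n,) stored model errors Delta(x_bar_j)
    """
    # Instantaneous (baseline) term: -Gamma_W sigma(x) e^T P B.
    dW = -Gamma_W * np.outer(sigma_x, e @ P @ B)
    # Concurrent-learning term over the recorded data points.
    for s_j, d_j in zip(stack_sigmas, stack_deltas):
        eps_j = W.T @ s_j - d_j            # eps_j^k = W^T sigma_j^k - Delta_j
        dW -= Gamma_W * np.outer(s_j, eps_j)
    return W + dt * dW
```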

Theorem 6: Consider the system in (6), the control law of (12), x̄(0) ∈ D, where D is compact, and the case of unstructured uncertainty. For the jth recorded data point, let ε_jᵏ(t) = Wᵀ(t)σᵏ(x̄_j) − Δ(x̄_j), and let p be the number of recorded data points σ_jᵏ := σᵏ(x̄_j) in the matrix Z = [σ₁ᵏ, ..., σ_pᵏ], such that rank(Z₀) = l at t = 0. Assume that the RBF centers are updated using Algorithm 1 and the history stack is updated using Algorithm 2. Then, the weight update law in (24) ensures that the tracking error e of (22) and the RBF NN weight errors W̃ᵏ of (23) are bounded.

Proof: Consider the tracking error dynamics given by (22) and the update law of (24). Since ν_ad = Wᵀσᵏ(x̄), we have εᵏ = Wᵀσᵏ(x̄) − Δ(x̄). The NN approximation error is now given by (23). With W̃ᵏ = W − Wᵏ*,

ε_jᵏ(x̄) = Wᵀσᵏ(x̄) − Wᵏ*ᵀσᵏ(x̄) − ε̃ᵏ(x̄) = W̃ᵏᵀσᵏ(x̄) − ε̃ᵏ(x̄).

Therefore, over [t_k, t_{k+1}], the weight dynamics are given by the following switching system:

W̃̇ᵏ(t) = −Γ_W [ ( Σ_{j=1}^p σᵏ(x̄_j)σᵏ(x̄_j)ᵀ ) W̃ᵏ(t) + Σ_{j=1}^p σᵏ(x̄_j) ε̃ᵏᵀ(z_j) − σᵏ(x̄(t)) eᵀ(t)PB ].    (25)

Consider the family of positive definite functions Vᵏ = ½ eᵀPe + ½ tr(W̃ᵏᵀ Γ_W⁻¹ W̃ᵏ), where tr(·) denotes the trace operator. Note that Vᵏ ∈ C¹, Vᵏ(0) = 0, and Vᵏ(e, W̃ᵏ) > 0 for all (e, W̃ᵏ) ≠ 0, and define Ωᵏ := Σ_j σ_jᵏ σ_jᵏᵀ. Then

V̇ᵏ(e, W̃ᵏ) = −½ eᵀQe + eᵀPB ε̃ᵏ − tr( W̃ᵏᵀ [ Σ_j σ_jᵏ σ_jᵏᵀ W̃ᵏ − Σ_j σ_jᵏ ε̃_jᵏ ] )
           ≤ −½ λ_min(Q)‖e‖² + ‖e‖ ‖PB‖ ε̄ᵏ − λ_min(Ωᵏ)‖W̃ᵏ‖² + ‖W̃ᵏ‖ ‖Σ_i σ_iᵏ ε̃_iᵏ‖
           ≤ ‖e‖ (C₁ᵏ − ½ λ_min(Q)‖e‖) + ‖W̃ᵏ‖ (C₂ᵏ − λ_min(Ωᵏ)‖W̃ᵏ‖)

where C₁ᵏ = ‖PB‖ ε̄ᵏ, C₂ᵏ = √p ε̄ᵏ √l (l being the number of RBFs in the network), and Ωᵏ = Σ_{j=1}^p σᵏ(x̄_j)σᵏ(x̄_j)ᵀ. Note that since Algorithm 2 guarantees that only sufficiently different points are added to the history stack, then due to Micchelli's theorem [24], Ωᵏ is guaranteed to be positive definite. Hence, if ‖e‖ > 2C₁ᵏ/λ_min(Q) and ‖W̃ᵏ‖ > C₂ᵏ/λ_min(Ωᵏ), we have V̇ᵏ(e, W̃ᵏ) < 0. Hence, the set Θᵏ = {(e, W̃ᵏ) : ‖e‖ + ‖W̃ᵏ‖ ≤ 2C₁ᵏ/λ_min(Q) + C₂ᵏ/λ_min(Ωᵏ)} is positively invariant for the kth system.

Let S = {(t₁, 1), (t₂, 2), ...} be an arbitrary switching sequence with finite switches in finite time (note that this is always guaranteed due to the discrete nature of Algorithm 1). The sequence denotes that a system S_k was active between t_k and t_{k+1}. Suppose at time t_{k+1} the system switches from S_k to S_{k+1}. Then e(t_k) = e(t_{k+1}) and W̃ᵏ⁺¹ = W̃ᵏ + ΔWᵏ*, where ΔWᵏ* = Wᵏ* − W⁽ᵏ⁺¹⁾*. Since V̇ᵏ(e, W̃ᵏ) is guaranteed to be negative definite outside of a compact set, it follows that e(t_k) and W̃ᵏ(t_k) are guaranteed to be bounded. Therefore, e(t_{k+1}) and W̃ᵏ⁺¹(t_{k+1}) are also bounded, and since V̇ᵏ⁺¹(e, W̃ᵏ⁺¹) is guaranteed to be negative definite outside of a compact set, e(t) and W̃ᵏ⁺¹(t) are also bounded. Furthermore, over every interval [t_k, t_{k+1}], they will approach the positively invariant set Θᵏ⁺¹ or stay bounded within Θᵏ⁺¹ if inside.

Remark 1: Note that BKR-CL, along with the assumption that rank(Z) = l, guarantees that the weights stay bounded in a compact neighborhood of their ideal values without requiring any additional damping terms such as σ-modification [14] or e-modification [28] (even in the presence of noise). Due to Micchelli's theorem [24], as long as the history stack matrix Z contains l different points, rank(Z) = l. Therefore, it is expected that this rank condition will be met within the first few time steps, even when one begins with no a priori recorded data points in the history stack. In addition, a projection operator [41] can be used to bound the weights until rank(Z_t) = l. This is a straightforward extension of the presented approach.

Fig. 3. Tracking with e-mod, σ-mod, projection operator, and with BKR-CL.

Fig. 4. Weight evolution with BKR-CL, e-mod, σ-mod, and projection operator.

Fig. 5. Center placement for the classical laws and BKR-CL.

VI. APPLICATION TO CONTROL OF WING ROCK DYNAMICS

Modern highly swept-back or delta wing fighter aircraft are susceptible to lightly damped oscillations in roll known as "wing rock." Wing rock often occurs at conditions commonly encountered at landing [34], making precision control in the presence of wing rock critical for safe landing. In this section, we use BKR-CL control to track a sequence of roll commands in the presence of simulated wing rock dynamics. Let θ denote the roll attitude of an aircraft, p denote the roll rate, and δ_a denote the aileron control input. Then a model for wing rock dynamics is [25]

θ̇ = p    (26)
ṗ = L_δa δ_a + Δ(x)    (27)


where

Δ(x) = W₀* + W₁*θ + W₂*p + W₃*|θ|p + W₄*|p|p + W₅*θ³    (28)

and L_δa = 3. The parameters for wing rock motion are adapted from [36]; they are W₁* = 0.2314, W₂* = 0.6918, W₃* = −0.6245, W₄* = 0.0095, and W₅* = 0.0214. In addition to these parameters, a trim error is introduced by setting W₀* = 0.8. The ideal parameter vector W* is assumed to be unknown. The chosen inversion model has the form ν = (1/L_δa)δ_a. This choice results in the modeling uncertainty of (10) being given by Δ(x). The adaptive controller uses the control law of (12). The linear gain K of the control law is given by [0.5, 0.4], a second order reference model with natural frequency of 1 rad/s and damping ratio of 0.5 is chosen, and the learning rate is set to Γ_W = 1. The simulation uses a time-step of 0.01 s.

We choose to make two sets of comparisons. In the first set, we compare BKR-CL with e-mod, σ-mod, and projection operator adaptive laws (henceforth collectively denoted as "classical adaptive laws") for neuro-adaptive control. In all cases, the number of centers was chosen to be 12. In BKR-CL, the centers were all initialized to 0 ∈ R² at t = 0, while in the classical adaptive laws they were placed evenly across the domain of the uncertainty (where the prior knowledge of the domain was determined through experiments conducted beforehand). The BKR-CL algorithm uses the switching weight update law (24). Fig. 3(a) shows the tracking of the reference model by the classical laws and BKR-CL. Fig. 3(b) shows the tracking error for the same. As can be seen, the overall error, and especially the transient error, for BKR-CL is lower than that for the classical laws. Fig. 4 shows an example of how CL affects the evolution of the weights; it has a tendency to spread the weights across a broader spectrum of values than the classical laws, leading the algorithm to choose from a richer class of functions in F_C. Fig. 5 shows the final set of centers that are used; as can be seen, the centers follow the path of the system in the state space. This is another reason why the uncertainty is approximated better by BKR-CL than by the classical algorithms (this is discussed further in Section VI-A).

To illustrate this further, we also ran another set of comparisons between BKR-CL and the classical laws with the centers picked according to the BKR scheme instead of being uniformly distributed beforehand. In all subsequent figures, the BKR versions of e-mod, σ-mod, and the projection operator are denoted by BKR e-mod, BKR σ-mod, and BKR proj, respectively. The tracking performance is compared in Fig. 6, and it can be seen here that BKR-CL is again the better scheme. Finally, in both sets of comparisons, note that the control input is tracked better by BKR-CL than by the other algorithms. Fig. 7 shows that BKR-CL again drives the weights over a wider domain.

Fig. 6. Tracking with BKR e-mod, BKR σ-mod, BKR proj, and with BKR-CL.

Fig. 7. Weight evolution with BKR-CL and with BKR e-mod, BKR σ-mod, and BKR proj.

A. Illustration of Long-Term Learning Effect Introduced by BKR-CL

One of the main contributions of the presented BKR-CL scheme is the introduction of long-term learning in the adaptive controller. Long-term learning here is characterized by better approximation of the uncertainty over the entire domain. Long-term learning results from the convergence of the NN weights to a neighborhood of their ideal values [in the sense of the universal approximation property of (16)].
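For reference, a stripped-down closed-loop simulation of this setup (our reconstruction: the plant parameters, gains, inversion model, and reference model follow the text, while the fixed 3 × 3 center grid, Q = I, and the use of the baseline law (18) in place of BKR-CL are our simplifying assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Wing rock parameters from the text; W0* = 0.8 is the injected trim error.
W_TRUE = np.array([0.8, 0.2314, 0.6918, -0.6245, 0.0095, 0.0214])
L_DA = 3.0

def delta(theta, p):
    # Model error Delta(x) from (28).
    return W_TRUE @ np.array([1.0, theta, p, abs(theta)*p, abs(p)*p, theta**3])

def sigma(x, centers, mu=1.0):
    # RBF vector with a bias term, sigma(x) = [1, k(x, c_2), ...]^T as in (15).
    return np.concatenate(([1.0],
        [np.exp(-np.linalg.norm(x - c)**2 / (2*mu**2)) for c in centers]))

# Fixed grid of centers (the paper's point is to *avoid* this; used for brevity).
centers = [np.array([a, b]) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]

# Error dynamics e_dot = A e + B (nu_ad - Delta), with K = [0.5, 0.4] from the text.
K = np.array([0.5, 0.4])
A = np.array([[0.0, 1.0], [-K[0], -K[1]]])
B = np.array([0.0, 1.0])
P = solve_continuous_lyapunov(A.T, -np.eye(2))   # A^T P + P A = -Q, eq. (14)

dt, t_end, gamma_w = 0.01, 60.0, 1.0
theta = p = theta_rm = p_rm = 0.0
W = np.zeros(1 + len(centers))
wn, zeta, r = 1.0, 0.5, 1.0                      # reference model, step command

for _ in range(int(t_end / dt)):
    p_rm_dot = wn**2*(r - theta_rm) - 2*zeta*wn*p_rm
    e = np.array([theta_rm - theta, p_rm - p])
    s = sigma(np.array([theta, p]), centers)
    nu = p_rm_dot + K @ e - W @ s                # nu_rm + nu_pd - nu_ad, eq. (12)
    da = nu / L_DA                               # approximate inversion of nu = L_da * da
    p_dot = L_DA*da + delta(theta, p)
    W = W - dt * gamma_w * s * float(e @ P @ B)  # baseline law (18)
    theta, p = theta + dt*p, p + dt*p_dot
    theta_rm, p_rm = theta_rm + dt*p_rm, p_rm + dt*p_rm_dot

print("final tracking error:", theta_rm - theta)
```

With BKR-CL, the zero-initialized centers would instead be populated online by Algorithm 1 and the weight update replaced by the switching law (24).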

Fig. 8. Plots of the online NN output against the system uncertainty, and the NN output using the final weights frozen at the end of the simulation run. Note that the NN weights and centers with BKR-CL better approximate the uncertainty in both cases.

Fig. 9. Plots of the online NN output against the system uncertainty, and the NN output using the final weights frozen at the end of the simulation run, with a high learning rate.

Fig. 8 presents an interesting comparison between the online NN outputs (Wᵀ(t)σ(x̄(t))) for all the controllers, and the NN output with the weights and centers frozen at their values at the final time T of the simulation (Wᵀ(T)σ(x̄(t))), then re-simulated with these "learned weights." Fig. 8(a) compares the online adaptation performance during the simulation, that is, it compares Δ(x̄(t)) with Wᵀ(t)σ(x̄(t)). We can see that BKR-CL does the best job of approximating the uncertainty online, particularly in the transient, whereas the classical laws tend to perform similarly to each other. In Fig. 8(b), we see that the difference is more dramatic when the weights are frozen at their learned values from the simulation; only BKR-CL completely captures the transient, and approximates the steady-state uncertainty fairly closely, while the classical laws do quite poorly. This implies that the weights learned with BKR-CL have converged to values that provide a good approximation of the uncertainty over the entire operating domain. On the other hand, in Fig. 8(b) we see that if BKR-CL is not used, the NN does not approximate the uncertainty as well, indicating that the weights have not approached their ideal values during the simulation. From Fig. 4, we see that the weights for e-mod and the projection operator are closer to those approached by BKR-CL than those for σ-mod, and they do slightly better in uncertainty approximation.

The effect of long-term learning can be further characterized by studying the behavior of the BKR algorithm without CL and with a higher learning rate (Γ_W = 10). Higher learning rates are often used in traditional adaptive control (without CL) to guarantee good tracking. Fig. 9(a) compares the online adaptation performance. Except for the initial transient, the NN outputs of the classical laws approximate the uncertainty well locally in time with a higher learning rate. However, Fig. 9(b) shows that they do a poor job of estimating the ideal weights with the learning rate increased, indicating that good local approximation of the uncertainty through high adaptation gains does not necessarily guarantee long-term learning. This is explained by the fact that the classical laws, relying on the gradient-based weight update law of (18), drive the weights only in the direction of maximum reduction of the instantaneous tracking error cost, and hence do not guarantee that the weights are driven close to the ideal values that best represent the uncertainty globally. On the other hand, BKR-CL has better long-term adaptation performance, as not only are the centers updated to best reflect the domain of the uncertainty, but the weights are also driven toward the ideal values that best represent the uncertainty over the domain visited.

B. Computational Complexity

The accuracy of the algorithm comes at a computational cost. The two main components of BKR-CL are the BKR step (Algorithm 1) and the CL step (Algorithm 2). Once the budget N_D of the dictionary is reached, the BKR step takes roughly O(N_D²) steps to add a new center, and is thus reasonably efficient. The main computational bottleneck in BKR-CL is the computation of the SVD of the history stack matrix Z_t in the CL step. If there are p points in the history stack, this takes O(p³) steps. Given the fact that p is usually larger than N_D, this can be quite expensive. Table I shows a computation time comparison between the non-BKR controllers and BKR-CL for the wing rock simulation run for 60 s in MATLAB on a dual core Intel 2.93 GHz processor with 8 GB RAM, while Table II shows the same comparison between the BKR controllers and BKR-CL.

TABLE I
WINGROCK COMPUTATION TIME COMPARISON WITH NON-BKR CONTROLLERS

Centers | e-mod | σ-mod | proj | BKR-CL
8       | 3.6   | 3.1   | 3.2  | 13.9
10      | 3.7   | 3.2   | 3.3  | 16.1
12      | 3.8   | 3.4   | 3.5  | 18.9
14      | 3.8   | 3.4   | 3.6  | 23.8

TABLE II
WINGROCK COMPUTATION TIME COMPARISON WITH BKR CONTROLLERS

Centers | BKR e-mod | BKR proj | BKR σ-mod | BKR-CL
8       | 4.2       | 4.2      | 4.4       | 13.9
10      | 4.4       | 4.4      | 4.5       | 16.1
12      | 4.6       | 4.6      | 4.7       | 18.9
14      | 4.8       | 4.8      | 5.0       | 23.8

As can be seen, the computation time increases dramatically for BKR-CL as the size of the history stack increases. However, there are two things to note here. First, the number of points added to the history stack is determined by the tolerance ε in Algorithm 2; therefore, if too many points are being added, one can increase the tolerance, or set a separate budget to keep p fixed. Second, we used the native implementation of the SVD in MATLAB for these experiments, which is not optimized for an online setting. Recent work in incremental SVD computation [4], [5], [35] speeds up the decomposition considerably, resulting in O(p²) complexity once the first SVD with p points has been computed. Therefore, the computation time for the SVD component of BKR-CL can be made roughly comparable to the BKR component, which implies that the algorithm will scale better as the number of points in the history stack increases.

VII. CONCLUSION

In this paper, we established a connection between kernel methods and PE signals using RKHS theory. In particular, we showed that in order to ensure PE, not only do the system states have to be PE, but the centers also need to be selected in such a way that the mapping from the state space to the underlying RKHS is never orthogonal to the linear subspace generated by the RBF centers. This ensures that the output of the adaptive element does not vanish. We used this connection to motivate an algorithm called BKR that updates the RBF network in a way that ensures any inserted excitation is retained. This enabled us to design adaptive controllers without assuming any prior knowledge about the domain of the uncertainty. Furthermore, it was shown through simulation that on a budget (a limitation on the maximum number of RBFs used), the method is better at capturing the uncertainty than evenly spaced centers, since the centers are selected along the path of the system through the state space. We augmented BKR with CL, a method that concurrently uses specifically recorded and instantaneous data for adaptation, to create the BKR-CL adaptive control algorithm. It was shown through Lyapunov-like analysis that the weights for the centers picked by BKR-CL are bounded without needing PE. Simulations showed improved tracking performance.

ACKNOWLEDGMENT

The authors would like to thank M. Mühlegg for his valuable help in revising this paper and in the simulations.

REFERENCES

[1] K. J. Aström and B. Wittenmark, Adaptive Control, 2nd ed. Reading, MA: Addison-Wesley, 1995.
[2] P. Bouboulis, K. Slavakis, and S. Theodoridis, "Adaptive learning in complex reproducing kernel Hilbert spaces employing Wirtinger's subgradients," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 3, pp. 425–438, Mar. 2012.
[3] S. Boyd and S. Sastry, "Necessary and sufficient conditions for parameter convergence in adaptive control," Automatica, vol. 22, no. 6, pp. 629–639, 1986.
[4] M. Brand, "Fast online SVD revisions for lightweight recommender systems," in Proc. SIAM Int. Conf. Data Mining, 2003, pp. 1–10.
[5] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra Appl., vol. 415, no. 1, pp. 20–30, May 2006.
[6] C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[7] A. J. Calise, M. Sharma, and S. Lee, "Adaptive autopilot design for guided munitions," AIAA J. Guid. Control Dynamics, vol. 23, no. 5, pp. 837–843, 2000.
[8] G. Chowdhary, "Concurrent learning for convergence in adaptive control without persistency of excitation," Ph.D. thesis, School Aerosp. Eng., Georgia Inst. Technol., Atlanta, 2010.
[9] G. Chowdhary and E. N. Johnson, "Concurrent learning for convergence in adaptive control without persistency of excitation," in Proc. 49th IEEE Conf. Decision Control, Dec. 2010, pp. 3674–3679.
[10] N. Cristianini and J. Shawe-Taylor, Kernel Methods for Pattern Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[11] Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2275–2285, Aug. 2004.
[12] G. Folland, Real Analysis: Modern Techniques and Their Applications. New York: Wiley, 1999.
[13] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York: Springer-Verlag, 2001.
[14] P. A. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ: Prentice-Hall, 1996.
[15] E. N. Johnson, "Limited authority adaptive flight control," Ph.D. thesis, School Aerosp. Eng., Georgia Inst. Technol., Atlanta, 2000.
[16] E. N. Johnson and A. J. Calise, "Limited authority adaptive flight control for reusable launch vehicles," J. Guid. Control Dynamics, vol. 26, no. 3, pp. 906–913, 2003.
[17] S. Kannan, "Adaptive control of systems in cascade with saturation," Ph.D. thesis, School Aerosp. Eng., Georgia Inst. Technol., Atlanta, 2005.
[18] N. Kim, "Improved methods in neural network based adaptive output feedback control, with applications to flight control," Ph.D. thesis, Aerosp. Eng., Georgia Inst. Technol., Atlanta, 2003.
[19] Y. H. Kim and F. L. Lewis, "High-level feedback control with neural networks," in Robotics and Intelligent Systems, vol. 21. Singapore: World Scientific, 1998.
[20] J. Li and J.-H. Liu, "Identification and control of dynamic systems based on least squares wavelet vector machines," in Proc. 3rd Int. Conf. Adv. Neural Netw., 2006, pp. 934–942.
[21] W. Liu, P. P. Pokharel, and J. C. Principe, "The kernel least-mean-square algorithm," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 543–554, Feb. 2008.
[22] W. Liu, J. C. Principe, and S. Haykin, Kernel Adaptive Filtering: A Comprehensive Introduction. New York: Wiley, 2010.
[23] J. Mercer, "Functions of positive and negative type and their connection with the theory of integral equations," Phil. Trans. Royal Soc., vol. 209, nos. 441–458, pp. 415–446, Jan. 1909.
[24] C. A. Micchelli, "Interpolation of scattered data: Distance matrices and conditionally positive definite functions," Construct. Approx., vol. 2, no. 1, pp. 11–22, Dec. 1986.


[25] M. M. Monahemi and M. Krstic, "Control of wingrock motion using adaptive feedback linearization," J. Guid. Control Dynamics, vol. 19, no. 4, pp. 905–912, Aug. 1996.
[26] S. Narasimha, S. Suresh, and N. Sundararajan, "Adaptive control of nonlinear smart base-isolated buildings using Gaussian kernel functions," Struct. Control Health Monitor., vol. 15, no. 4, pp. 585–603, Jun. 2008.
[27] N. Nardi, "Neural network based adaptive algorithms for nonlinear control," Ph.D. thesis, School Aerosp. Eng., Georgia Inst. Technol., Atlanta, Nov. 2000.
[28] K. Narendra and A. Annaswamy, "A new adaptive law for robust adaptation without persistent excitation," IEEE Trans. Autom. Control, vol. 32, no. 2, pp. 134–145, Feb. 1987.
[29] K. S. Narendra and A. M. Annaswamy, Stable Adaptive Systems. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[30] D. Nguyen-Tuong, B. Schölkopf, and J. Peters, "Sparse online model learning for robot control with support vector regression," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2009, pp. 3121–3126.
[31] J. Park and I. W. Sandberg, "Universal approximation using radial-basis-function networks," Neural Comput., vol. 3, no. 2, pp. 246–257, Mar. 1991.
[32] H. D. Patiño, R. Carelli, and B. R. Kuchen, "Neural networks for advanced control of robot manipulators," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 343–354, Mar. 2002.
[33] D. Patino and D. Liu, "Neural network based model reference adaptive control system," IEEE Trans. Syst. Man Cybern. B, vol. 30, no. 1, pp. 198–204, Feb. 2000.
[34] A. A. Saad, "Simulation and analysis of wing rock physics for a generic fighter model with three degrees of freedom," Ph.D. thesis, Air Force Inst. Technol., Air Univ., Wright-Patterson Air Force Base, OH, 2000.
[35] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Reidl, "Incremental singular value decomposition algorithms for highly scalable recommender systems," in Proc. Int. Conf. Comput. Inf. Technol., 2002, pp. 1–6.
[36] S. N. Singh, W. Yim, and W. R. Wells, "Direct adaptive control of wing rock motion of slender delta wings," J. Guid. Control Dynamics, vol. 18, no. 1, pp. 25–30, Feb. 1995.
[37] K. Slavakis, P. Bouboulis, and S. Theodoridis, "Adaptive multiregression in reproducing kernel Hilbert spaces: The multiaccess MIMO channel case," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 2, pp. 260–276, Feb. 2012.
[38] N. Sundararajan, P. Saratchandran, and L. Yan, Fully Tuned Radial Basis Function Neural Networks for Flight Control. New York: Springer-Verlag, 2002.
[39] S. Suresh, S. Narasimhan, and S. Nagarajaiah, "Direct adaptive neural controller for the active control of earthquake-excited nonlinear base-isolated buildings," Struct. Control Health Monitor., vol. 19, no. 3, pp. 370–384, Apr. 2012.
[40] J. A. K. Suykens, J. P. L. Vandewalle, and B. L. R. De Moor, Artificial Neural Networks for Modeling and Control of Non-Linear Systems. Norwell, MA: Kluwer, 1996.
[41] G. Tao, Adaptive Control Design and Analysis. New York: Wiley, 2003.
[42] K. Y. Volyanskyy, W. M. Haddad, and A. J. Calise, "A new neuroadaptive control architecture for nonlinear uncertain dynamical systems: Beyond σ- and e-modifications," IEEE Trans. Neural Netw., vol. 20, no. 11, pp. 1707–1723, Nov. 2009.
[43] M.-G. Zhang and W.-H. Liu, "Single neuron PID model reference adaptive control based on RBF neural networks," in Proc. Int. Conf. Mach. Learn. Cybern., Aug. 2006, pp. 3021–3025.
[44] R. M. Sanner and J.-J. E. Slotine, "Gaussian networks for direct adaptive control," IEEE Trans. Neural Netw., vol. 3, no. 6, pp. 837–863, Nov. 1992.

Hassan A. Kingravi received the B.S. degree from the University of Texas, Arlington, and the M.S. degree from the College of Computing, Georgia Institute of Technology (Georgia Tech), Atlanta, in 2006 and 2010, respectively. He is currently a Graduate Student with the Intelligent Vision and Automation Laboratory, School of Electrical and Computer Engineering, Georgia Tech. His current research interests include theory and application of kernel methods and neural networks to problems in computer vision, online learning, and adaptive control.


Girish Chowdhary received the B.E. degree (Hons.) from RMIT University, Melbourne, Australia, in 2003, the M.S. degree in aerospace engineering, and the Ph.D. degree from the Georgia Institute of Technology (Georgia Tech), Atlanta, in 2008 and 2010, respectively. He is currently a Post-Doctoral Associate with the Laboratory for Information and Decision Systems, School of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge. Prior to joining Georgia Tech, he was a Research Engineer with the German Aerospace Centers (DLRs), Institute for Flight Systems Technology, Braunschweig, Germany. His current research interests include adaptive and fault tolerant control, machine learning and Bayesian inference, vision aided navigation, cooperative distributed control of networked systems, distributed learning in communication networks and distributed decision making, applications in unmanned aerial systems, autonomous systems, flight control, and process control. Dr. Chowdhary was a member of the UAV Research Facility, Georgia Tech.

Patricio A. Vela (M'01) received the B.S. and Ph.D. degrees from the California Institute of Technology (Caltech), Pasadena, in 1998 and 2003, respectively. He was with Caltech, where he worked on geometric nonlinear control. After working as a Post-Doctoral Researcher in computer vision with the Georgia Institute of Technology, Atlanta, he joined the School of Electrical and Computer Engineering in 2005, where he is currently an Associate Professor and the Director of the Intelligent Vision and Automation Laboratory. His current research interests include geometric perspectives to control theory and computer vision.

Eric N. Johnson received the B.S. degree from the University of Washington, Seattle, the M.S. degree from the Massachusetts Institute of Technology, Cambridge, and George Washington University, Washington, and the Ph.D. degree from the Georgia Institute of Technology (Georgia Tech), Atlanta, all in aerospace engineering. He is a Professor with the School of Aerospace Engineering, Georgia Tech. He has a diverse background in guidance, navigation, and control including applications, such as airplanes, helicopters, submarines, and launch vehicles. He has five years of industry experience working with Lockheed Martin, Atlanta, GA and Draper Laboratory, Cambridge, MA. He joined Georgia Tech as a Faculty Member in 2000 and has performed research in adaptive flight control, navigation, embedded software, and autonomous systems. He is the Director of the Georgia Tech UAV Research Facility. He was the Lead System Integrator for rotorcraft experiments and demonstrations for the DARPA Software Enabled Control Program, which included the first air-launch of a hovering aircraft and automatic flight of a helicopter with a simulated stuck swash-plate actuator. He was the Principal Investigator of the Active Vision Control Systems AFOSR Multi-University Research Initiative, developing methods that utilize 2-D and 3-D imagery to enable aerial vehicles to operate in uncertain complex 3-D environments. Dr. Johnson has received the National Science Foundation CAREER Award and a member of the American Institute of Aeronautics and Astronautics and American Hemerocallis Society.
