2422

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

A Kernel Adaptive Algorithm for Quaternion-Valued Inputs Thomas K. Paul, Member, IEEE, and Tokunbo Ogunfunmi, Senior Member, IEEE

Abstract— The use of quaternion data can provide benefit in applications like robotics and image recognition, and particularly for performing transforms in 3-D space. Here, we describe a kernel adaptive algorithm for quaternions. A least mean square (LMS)-based method was used, resulting in the derivation of the quaternion kernel LMS (Quat-KLMS) algorithm. Deriving this algorithm required describing the idea of a quaternion reproducing kernel Hilbert space (RKHS), as well as kernel functions suitable with quaternions. A modified HR calculus for Hilbert spaces was used to find the gradient of cost functions defined on a quaternion RKHS. In addition, the use of widely linear (or augmented) filtering is proposed to improve performance. The benefit of the Quat-KLMS and widely linear forms in learning nonlinear transformations of quaternion data are illustrated with simulations. Index Terms— Gaussian kernel, kernel least mean square (KLMS), kernel methods, mean-square error (MSE), quaternions, widely linear estimation.

I. I NTRODUCTION

K

ERNEL adaptive algorithms enable learning of nonlinear functions in an efficient fashion, and has been enjoying considerable growth. Current advancements in this area are described in [1]–[4]. The kernel least mean square (KLMS) algorithm in [1] illustrates the well posedness of using LMS in reproducing kernel Hilbert space (RKHS). An extension to complex-valued data, the complex KLMS (CKLMS), was developed in [5], where the complex differentiability of functions on an RKHS was resolved. Further enhancements to the complex-valued approach were made in [6] and [7]. These machine learning methods provide stable, as well as efficient, learning with real and complex-valued inputs. Meanwhile, quaternions have seen recent interest in various statistical signal processing areas, including adaptive filtering, pattern recognition, and tracking of motion [8]–[20]. They are also known for resolving the issue of Gimbal lock, which can occur when modeling 3-D motion using Euler angles [21]–[23]. Some examples of quaternion linear and widely linear algorithms are shown in [8]–[10], and gradient operations to obtain them, referred to as HR calculus, are Manuscript received January 14, 2014; revised October 7, 2014; accepted November 30, 2014. Date of publication January 12, 2015; date of current version September 16, 2015. The authors are with the Department of Electrical Engineering, Santa Clara University, Santa Clara, CA 95053 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2014.2383912

described in [11]. Other application examples are described in [12]–[18]. The benefits of using quaternions include performing transformations in 3-D or 4-D space conveniently compared with vector algebra, since this is performed simply with quaternion multiplication. This differs from real/complex multiplication, however, since it is noncommutative, which is considered here. In this paper, we describe a kernel adaptive algorithm for quaternions, we call the quaternion KLMS (Quat-KLMS) algorithm. The approach in obtaining this algorithm followed the CKLMS derivation of [5]. A key area to deriving Quat-KLMS involved designing kernels suitable with quaternion data, as well as their associated quaternion RKHS. A modified HR calculus for Hilbert spaces was also considered to find cost function gradients defined on a quaternion RKHS. We note previous references exist describing quaternions with kernel methods. For example, Bayro-Corrochano and Daniel [12] provide an approach to design support vector machines using Clifford algebra. The Mercer theorem for quaternionic kernels is shown in [14], and used in designing a quaternion support vector regressor in [15]. Previous work was also done in [24]–[30], describing quaternion and hypercomplex neural networks with various applications. The work here describes an LMS method, providing an online approach combining the reduced complexity, robustness, and universal function approximation power provided by KLMS with quaternion data. The simplicity of the derivation of the algorithm is owed to the gradient calculus in [11], which can be extended to quaternion Hilbert spaces. In addition, widely linear estimation is considered to enhance performance. This paper is organized as follows. In Section II, background on quaternion numbers and linear, widely linear estimation with quaternions is provided. Section III provides quaternion kernel, RKHS definitions. Section IV describes kernel methods with quaternion data, and shows the derivation of Quat-KLMS. The benefit of quaternion widely linear (QWL) estimation is also considered, yielding different algorithms [widely linear Quat-KLMS (WL-Quat-KLMS) and involution basis Quat-KLMS (IB-Quat-KLMS)], depending on if the perpendicular basis vectors used to form QWL estimation are considered before or after the kernel map. Section V provides simulations showing the performance

2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

PAUL AND OGUNFUNMI: KERNEL ADAPTIVE ALGORITHM

2423

of Quat-KLMS for quaternion channels with nonlinearities. Finally, Section VI concludes this paper. II. Q UATERNION L INEAR AND W IDELY L INEAR E STIMATION Here, we provide a brief description of quaternion linear and widely linear estimation. We can express an input vector of quaternions with x = xa + i xb + j xc + kxd where xa , xb , xc , and xd are real-valued vectors, and i , j , and k are orthogonal unit vectors representing the independent directions of the quaternion input (where i 2 = j 2 = k 2 = −1). The multiplication of these unit vectors are noncommutative, i.e., i j = − ji = k, j k = −k j = i , and ki = −i k = j . It was shown previously that three binary operations on quaternions, referred to as involutions [10], may be formed xi = −i xi = xa + i xb − j xc − kxd x j = − j x j = xa − i xb + j xc − kxd (1)

These perpendicular involutions may be used to determine components xa , xb , xc , and xd (shown later), as well as to form a basis, when combined with x, to form an augmented quaternion vector for widely linear estimation. They can be also used in expressing the conjugate of quaternion data, or 1 i (x + x j + xk − x). (2) 2 The perpendicular involutions and conjugation form shown will be used during the analysis, as well as the properties x∗ =

1 −j [x − xi + x j − xk ] xa = [x + xi + x j + xk ], xc = 4 4 −i −k [x + xi − x j − xk ], xd = [x − xi −x j +xk ] (3) xb = 4 4 and their conjugate version 1 ∗ ∗ ∗ ∗ [x + xi + x j + xk ] 4 −i ∗ ∗ ∗ [−x∗ − xi + x j + xk ] xb = 4 −j ∗ ∗ ∗ [−x∗ +xi − x j + xk ] xc = 4 −k ∗ ∗ ∗ xd = (4) [−x∗ + xi + x j − xk ]. 4 For QWL estimation, we know that, from [10], the estimation form may be expressed xa =

yQWL (n) = h (n)x(n) + g (n)x (n) + u (n)x (n) H

i

+ v H (n)xk (n)

H

j

is known as Q-proper. Otherwise, the signal is Q-improper (see [8]–[10] for details). QWL estimation may be seen to allow optimal minimum mean-square error (MSE) modeling for both distributions.

This section describes quaternion kernel definitions and their associated RKHS used to form kernel methods with quaternions. Definitions for the complex-valued case are in [5] and serve as a guide, along with [14], for the quaternion definitions. Background on kernels, RKHSs, and their use in nonlinear adaptive algorithms can be found in [4] and [5]. Here, our focus is on providing quaternion-based kernel and RKHS definitions. Initially, define a function κ : X × X → H, where x 1 , . . . , x N ∈ X defines the set of possible inputs, and H denotes a quaternion division algebra. A division algebra can be defined as an algebra where division can be performed. Specifically, each nonzero element q of the algebra has a unique left, right inverse element. If, for every nonzero element, the left and right inverse element are the same, the algebra is also considered associative. The quaternion division algebra is an associative algebra, even though multiplication in the algebra is noncommutative in general [31]. It should be noted that the noncommutative property for multiplication can be observed to complicate the formation of kernel methods using quaternions. Nevertheless, we shall proceed to consider a quaternion kernel definition here. Based on function κ from before, define an N × N matrix K with elements Ki j = κ(x i , x j ) for i, j = 1, . . . , N. For κ to be a quaternion kernel function, matrix K should satisfy q Kq = H

N,N 

qi∗ Ki, j q j ≥ 0

(7)

i=1, j =1

(5)

where x(n) and coefficients h(n), g(n), u(n), and v(n) are all vectors with quaternion-valued components. Finally, (. ) H denotes Hermitian transpose based on quaternion conjugation. Equation (5) can also be expressed as H (n)xa (n) yQWL (n) = wQWL

{1, i }, {1, j }, {1, k}, {i, j }, { j, k}, {k, i }

III. Q UATERNION K ERNELS AND RKHS

xk = −kxk = xa − i xb − j xc + kxd .

H

where wQWL (n) is the coefficient vector, and xa (n) = [xT (n)xiT (n)x j T (n)xkT (n)]T is the augmented input vector. The use of widely linear estimation is beneficial when handling quaternion inputs of unknown signal distribution. With quaternions, a key characteristic is its properness, which may be referred to as the dependence of the probability distribution to angle rotations. A signal whose probability density function is invariant to arbitrary phase rotations with respect to any of the six axis pairs [10]

(6)

for all qi H, i = 1 . . . N, and should also be Hermitian (i.e., K H = K). This defines a positive semidefinite matrix in the quaternion case, similar to the complex case. A function κ satisfying this may be referred to as a nonnegative definite kernel or, for notation convenience, simply as a kernel. It is well established that nonnegative definite kernels have an associated RKHS [4], [5], [14], [15]. In particular, a nonnegative definite kernel κ can be seen based on the

2424

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

Mercer theorem to satisfy the property κ(u, v) =

∞ 

∗ λ−1 i φi (u)φi (v)

(8)

i=0

where λi are the eigenvalues and φi the eigenfunctions. For quaternions, in [14], it was seen when κ is Hermitian and nonnegative definite, the associated eigenvalues can be shown to be real and nonnegative. This allows the Mercer theorem to apply, in the quaternion case, with the form of (8), and enables the following factoring. −1/2 If the mapping ϕ(x) : X → Q, ϕ(x) = [λ1 φ1∗ (x), −1/2 ∗ −1/2 ∗ λ2 φ2 (x), λ3 φ3 (x) . . .] is constructed, we can write κ(u, v) = ϕ H (u)ϕ(v) where the dimensionality of Q is found from the number of nonzero values of λ−1 i . Q can be seen to correspond to the RKHS of kernel κ [4], in this case, a quaternion RKHS. Note κ defines an inner product in the RKHS, or ˆ ˆ κ(u, v) = (v), (u) Q ˆ Q (·) is referred to as a kernel mapping of the where notation  input into quaternion RKHS Q, and ·, · Q is its inner product. The characteristic that the associated eigenvalues are real and nonnegative is based on assuming a Hermitian-based symmetry for quaternion kernels [14]. There may exist concerns that positive semidefinite quaternion matrices may be formed with a less restricted eigenvalue decomposition, thus defining alternative kernel functions. However, here, we only consider quaternion kernels possessing this symmetry. Based on these definitions, we now state some quaternion kernels based on the Gaussian kernel form. Derivations for both are described in Appendixes A and B. A kernel form working directly on quaternion data, referred to here as the quaternion Gaussian kernel, we describe as  κσ,Q d (x, y) = exp

−σ

−2

d  

xj −

2 y ∗j

(9)

where x and y are the input vectors, both of length d, consisting of quaternion components. Its associated RKHS is denoted as Q d , and its dimensionality varies based on d. Meanwhile, the kernel function resulting from extending the real-valued Gaussian kernel to quaternion data (similar to the complexification approach of [5]) was seen to be κσ,Qd (x, y) = 4 exp

−σ

−2

d 

 |y j − w j | . 2

ˆ Q (x(n)), w Q  Q dˆQ (n) = 

(10)

j =1

We refer to this function as the quaternion-extended real Gaussian kernel (Quat-Ext Gaussian) kernel for short. Its RKHS is denoted Qd , where dimensionality varies with d. Using these functions, we consider various kernel estimation methods for quaternions, both linear as well as widely linear approaches. Afterward, we derive LMS-based updates to these algorithms, yielding the Quat-KLMS and extensions with widely linear estimation.

(11)

where x(n) is the input vector and w Q is the weight vector, both with quaternion-valued components. The notations ˆ Q (·) and ·, · Q are the same as described previously.  For combining widely linear estimation and Quat-KLMS, we consider here two methods. QWL estimation may be applied after the kernel map, extending linear estimation to a widely linear form (5). We refer to this approach as the WL-Quat-KLMS algorithm. Alternatively, we consider the approach of extending the input vector x(n) based on the perpendicular involutions shown in (6), i.e., forming xa (n), before the kernel map. An interpretation of this approach is provided later. This approach we refer to as the IB-Quat-KLMS. The two methods are as shown. Method A (WL-Quat-KLMS): Using this approach, the estimation form is ˆ Q WL (x(n)), w Q WL  Q WL dˆQ WL (n) = 

(12)

ˆ Q WL (·) may be defined in terms of  ˆ Q (·) where mapping  T iT jT kT ˆ Q WL (a) = [ ˆ Q (a),  ˆ Q (a),  ˆ Q (a),  ˆ Q (a)]T , and as  w Q WL is a weight vector with quaternion-valued components. Q WL denotes the associated quaternion RKHS with the ˆ Q WL (·). mapping  Method B (IB-Quat-KLMS): Based on this approach, we may write ˆ Q IB (x(n)), w Q IB  Q IB dˆQ IB (n) = 



j =1



IV. Q UATERNIONS AND K ERNEL A DAPTIVE M ETHODS To combine quaternions with kernel adaptive methods, an approach with LMS, referred to here as the Quat-KLMS may be described as follows. The goal is estimating the channel output where the reference data, i.e., desired response, is: d(n) = f (x(n)) + v(n). Here, f (·) is an unknown nonlinear quaternion function. The estimator form is

(13)

ˆ Q (·) ˆ Q IB (·) may be defined based on  where mapping  ˆ Q IB (a) =  ˆ Q ([aT , aiT , a j T , akT ]T ). Note that the as  ˆ Q (·) may differ depending dimensionality of mapping  on the input vector length, d (9) and (10). The weight vector w Q IB is of suitable dimension size to perform the inner product. Q IB denotes the associated quaternion RKHS with ˆ Q IB (·).  In the following Sections IV-A to IV-F, we consider the weight updates for these algorithms, as well as their final algorithm forms suitable for implementation and benefits. Finding these weights initially begins with expressing the estimator cost function form in Section IV-A. Afterward, the problem of finding the cost function gradient (i.e., forming a calculus suitable for quaternion Hilbert spaces) is described in Section IV-B. The gradient operations obtained are applied to each of the Quat-KLMS (Section IV-C), WL-Quat-KLMS (Section IV-D), and IB-Quat-KLMS (Section IV-E) cost functions to obtain each algorithm form. Finally, Section IV-F describes the benefit of the WL-based algorithms.

PAUL AND OGUNFUNMI: KERNEL ADAPTIVE ALGORITHM

2425

A. Estimator Cost Function

exists such that

For each estimator, beginning with Quat-KLMS here, the goal of the algorithm is finding weights that minimize a cost function based on the estimator form, i.e., (11) for Quat-KLMS. The cost function here uses the MSE E[|e(n)|2 ], where e(n) = d(n) − dˆQ (n). Following the well-known LMS approach, the instantaneous square error |e(n)|2 is used directly, instead of expectation E[|e(n)|2 ]. A normalized form for the Quat-KLMS weight update is w Q (n) = w Q (n − 1) −

η∇w∗Q {|e(n)|2 } ˆ Q (x(n)),  ˆ Q (x(n)) Q 

(14)

where ∇w∗Q is the gradient with respect to w∗Q , and η is step size. Note this quaternion-based gradient should be found with respect to the conjugate of the weights following Branchwood’s gradient [11], similar to the complex case. To obtain the update, we evaluate ∇w∗Q {|e(n)|2 }, yielding   ˆ w Q  Q |2 ∇w∗Q {|e(n)|2 } = ∇w∗Q |d(n) − (x(n)),  ˆ Q (x(n)), w Q  Q ) = ∇w∗Q (d(n) −  ˆ Q (x(n)), w Q  Q )∗ × (d(n) −   ˆ Q (x(n)), w Q  Q ) = ∇w∗Q (d(n) − 



 ˆ Q (x(n)) Q ) . × (d ∗ (n) − w Q ,  (15)

Consider that the gradient in the case here is performed on a real-valued cost function defined on a quaternion Hilbert space (specifically an RKHS). The scenario closely resembles [5], where a complex-valued RKHS, instead of a quaternion-valued RKHS, was considered. Obtaining the gradient definition there required developing a modified Wirtinger’s (or CR) calculus specifically for Hilbert spaces. Briefly, CR calculus describes a set of rules for finding the derivative of any complex function, analytic or nonanalytic, with respect to a complex variable. Similarly, here, we require the quaternion equivalent (i.e., HR calculus of [11], extended for Hilbert spaces). A description of the modified HR calculus form for Hilbert spaces, and its properties are described next. B. Modified HR Calculus for Hilbert Spaces The approach used here follows [5] and is formed based on the Fréchet derivative that is applicable to general Hilbert spaces. Initially, consider a Hilbert space H for learning functions whose outputs are defined on quaternion division algebra H . It is known that to generalize differentiability to Banach or Hilbert spaces, the Fréchet derivative may be used. Note that a Hilbert space is just a Banach space whose norm is evaluated based on an inner product, and Hilbert spaces are of interest to us here. A definition of the Fréchet derivative follows. Define on Hilbert space H an operator T : H → Hν . Operator T may be referred to as Fréchet differentiable at f 0 if a linear continuous operator W = (W1 , . . . , Wν ) : H → Hν

lim

h H →0

=

T( f 0 + h) − T( f 0 ) − W(h) νH =0

h H

(16)

where · H = (·, · H )1/2 is the induced norm, and ·, · H the dot product, on Hilbert space H . W is called the Fréchet derivative, and can be denoted dT( f 0 ). This Fréchet derivative definition only differs from [5], for complex data, in that the numerator magnitude is measured on H , and since H is a Hilbert space suitable with quaternions. The various Fréchet derivative cases of (16) (for ν = 1, ν > 1, and partial derivatives) are similar to those of [5]. For clarity, these cases are summarized here. When ν = 1, T and W are scalar, i.e., T : H → H and W : H → H. In this case, based on Reisz’s representation theorem, W has inner product form: W (h) = h, w H , where w∗ is called the gradient of T , denoted w∗ = ∇T ( f0 ), in (16). A proof of the applicability of Reisz representation theorem to quaternions is provided in [19], which specifically shows the uniqueness and existence of a gradient of the form shown, i.e., W (h) = h, w H . When ν > 1, we have the vector: T = (T1 , T2 , . . . , Tν ), and the Fréchet derivative can be seen to be ⎞ ⎛ h, ∇T1 ( f 0 )∗  H ⎜ h, ∇T2 ( f 0 )∗  H ⎟ ⎟ ⎜ (17) ∂T( f 0 )(h) = ⎜ ⎟. .. ⎠ ⎝ . h, ∇Tν ( f 0 )∗  H To consider partial derivatives, define T : H μ → H on Hilbert space H μ equipped with inner product f, g H μ =

μ 

 f ι , gι  H

(18)

ι=1

where f = ( f 1 , f 2 , . . . , fμ ), g = (g1 , g2 , . . . , gμ ). T (f ) is considered Fréchet differentiable at f 0 with respect to fι iff there exists w H so that lim

h H →0

=

T (f0 + [h]ι ) − T (f0 ) − [h]ι , w H =0

h H

(19)

where [h]ι = (0, 0, . . . , 0, h, 0, . . . , 0) is the element of H μ with zero entries everywhere except at place ι, and w∗ is the gradient of T at f 0 with respect to fι , denoted w∗ = ∇ι T (f0 ). The Fréchet partial derivative at f 0 with respect to f ι can be denoted ∂T (f )(h) = [h]ι , w H . ∂ fι 0

(20)

These definitions provide the foundations of how the Fréchet derivative is considered here. For describing a modified HR calculus, we consider the Hilbert space Q described from Appendix B, which was equipped with a suitable quaternion inner product. It may be seen that space Q is isomorphic with H 4 (or a real RKHS). This isomorphism can be used to describe Fréchet-based differentiation rules with quaternion-valued functions defined on Q, in a similar way to the rules for complex data in [5].

2426

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

Consider a function T defined on an open subset A of Q, i.e., T : A ⊆ Q → H, where T(f) = T(fa + i fb + j fc + kfd ) = Ta (fa , fb , fc , fd ) + i Tb (fa , fb , fc , fd ) + j Tc (fa , fb , fc , fd ) + kTd (fa , fb , fc , fd )

(21)

with f a , f b , f c , f d H and Ta , Tb , Tc , and Td being real-valued functions defined on H4 . The equivalence shown in (21) is based on the isomorphism between Q and H 4 . In addition, considering the isomorphism between H and R4 , T(f ) can be equivalently expressed as T(f) = [Ta (fa , fb , fc , fd ), Tb (fa , fb , fc , fd ), Tc (fa , fb , fc , fd ), Td (fa , fb , fc , fd )]. If Ta , Tb , Tc , and Td are Fréchet differentiable in the real sense, then we can describe [regardless of whether a Fréchet-based quaternion derivative of T based on (16) exists] differentiation rules using the Fréchet real derivative applicable to quaternion functions on Q. The rules here should naturally extend the HR calculus of [11] to Hilbert spaces. From the first-order Taylor series expansions of Ta , Tb , Tc , and Td Ta (c + h) = Ta (c) + h a , ∇a Ta (c)H + h b , ∇b Ta (c)H + h c , ∇c Ta (c)H + h d , ∇d Ta (c)H + o( h H 4 ) Tb (c + h) = Tb (c) + h a , ∇a Tb (c)H + h b , ∇b Tb (c)H + h c , ∇c Tb (c)H + h d , ∇d Tb (c)H + o( h H 4 ) Tc (c + h) = Tc (c) + h a , ∇a Tc (c)H + h b , ∇b Tc (c)H + h c , ∇c Tc (c)H + h d , ∇d Tc (c)H + o( h H 4 ) Td (c + h) = Td (c) + h a , ∇a Td (c)H + h b , ∇b Td (c)H + h c , ∇c Td (c)H + h d , ∇d Td (c)H + o( h H 4 ) where ∇ι is the gradient at c with respect to fι (Fréchet partial derivative). Following the methodology shown in [5] in the complex case, and considering the quaternion properties outlined in [11], it can be shown that T(c + h)

  1 ∗ = T(c)+ h, (∇a T(c)−i ∇b T(c)− j ∇c T(c)−k∇d T(c)) 4 Q   1 + hi, (∇a T(c)−i ∇b T(c) + j ∇c T(c) + k∇d T(c))∗ 4   Q 1 + h j, (∇a T(c)+i ∇b T(c)− j ∇c T(c)+k∇d T(c))∗ 4  Q 1 + hk, (∇a T(c) + i ∇b T(c)+ j ∇c T(c)−k∇d T(c))∗ 4 Q + o( h Q ). (22)

From this equation, we can see the Taylor series expansion of T expressed in terms of h, hi, h j, and hk . The same process may also be extended to higher order Taylor series.

Based on (22), the Fréchet-based HR gradients is ⎤ ∇f T ( f, f i , f j , f k )(c) ⎢ ∇f i T( f, f i f j f k )(c) ⎥ ⎥ ⎢ ⎣ ∇f j T( f, f i , f j , f k )(c) ⎦ ∇f k T( f, f i , f j , f k )(c)   ⎤⎡ ⎤ ⎡ ∇fa T fa , fb , fc , fd  (c) 1 −i − j −k ⎢ ⎥ 1 ⎢ 1 −i j k ⎥ ⎥ ⎢ ∇fb T fa , fb , fc , fd  (c) ⎥ = ⎢ ⎣ ⎦ ⎣ −j k ∇fc T fa , fb , fc , fd  (c) ⎦ 4 1 i 1 i j −k ∇fd T fa , fb , fc , fd (c) (23) ⎡

and similarly, using the conjugate quaternion properties (4), the Fréchet HR∗ gradients may be shown to be ⎡ ⎤ ∗ ∗ ∗ ∇f∗ T( f∗ , f i , f j , f k )(c) ∗ ∗ ∗ ⎢ ∇ i ∗ T( f∗ , f i , f j , f k )(c) ⎥ ⎢ f ⎥ ⎢ ⎥ ∗ i∗ j ∗ k∗ ∗ ∇ T( f , f , f , f )(c) ⎣ fj ⎦ ∗ ∗ ∗ ∇f k∗ T( f∗ , f i , f j , f k )(c) ⎡ ⎤ ⎤⎡ 1 i j k ∇fa T( fa , fb fc , fd )(c) ⎥ ⎢ 1 ⎢1 i − j −k ⎥ ⎥ ⎢ ∇fb T( fa , fb fc , fd )(c) ⎥. = ⎢ ⎣ ⎦ ⎦ ⎣ T( f , f f , f )(c) ∇ 1 −i j −k 4 fc a b c d ∇fd T( fa , fb fc , fd )(c) 1 −i − j k (24) The Fréchet HR and HR∗ gradients yielded the following properties. 1) If T(f) = h, f∗  Q or T(f) = f, h∗  Q then ∇f T(c) = h η and ∇f T(c) = 0 for η{i, j, k} at every c. Also: ∇f∗ T(c) = −h/2 and ∇fη∗ T(c) = h/2 for η{i, j, k}. 2) If T(f) = h, fQ or T(f) = f∗ , h∗  Q then ∇f∗ T(c) = h and ∇fη∗ T(c) = 0 for η{i, j, k} at every c. Also: ∇f T(c) = −h/2 and ∇fη T(c) = h/2 for η{i, j, k}. ∗ 3) If T(f) = h, f i  Q or T(f) = f i , h∗  Q then ∇f i ∗ T(c) = h and ∇f∗ T(c) = ∇f j ∗ T(c) = ∇f k∗ T(c) = 0 at every c. Also: ∇f i T(c) = −h/2 and ∇fη T(c) = h/2 for η{1, j, k}. ∗ 4) If T(f) = h, f i  Q or T(f) = f i , h∗  Q then ∇f i T(c) = h and ∇f T(c) = ∇f j T(c) = ∇f k T(c) = 0 at every c. Also: ∇f i ∗ T(c) = −h/2 and ∇fη∗ T(c) = h/2 for η{1, j, k}. 5) Product rule: when the components of operators T, S : Q → H are all Fréchet differentiable in the real sense at c ∈ Q, then ∇f∗ (T · S)(c) = [∇f∗ T(c)]S(c) + T(c)[∇f∗ S(c)] (note quaternion multiplication is noncommutative). Property 1 can be obtained by expanding the quaternion inner product being considered, h, f∗  Q or f, h∗  Q in terms of its quaternion components, and applying the gradient definition used ∇f T( f, f i , f j , f k )(c) 1 = [∇fa T( fa , fb , fc , fd )(c) − i ∇fb T( fa , fb , fc , fd )(c) 4 − j ∇fc T( fa , fb , fc , fd )(c) − k∇fd T( fa , fb , fc , fd )(c)]. The other properties may be obtained similarly.

PAUL AND OGUNFUNMI: KERNEL ADAPTIVE ALGORITHM

The properties described here apply to functions defined on Hilbert spaces, which differs from the HR calculus in [11]. Use of the Fréchet derivative allowed for providing the gradient definition on the desired quaternion Hilbert space, which is considered isomorphic with a multivariable real Hilbert space. For the multivariable real space, the partial Fréchet gradient with respect to each real component of the input is used to provide the Taylor series expansion of each of the real components of the nonlinear function output [i.e., equation before (22)]. These partial gradients were used to express the Fréchet-based gradient with respect to a quaternion (i.e., HR calculus for Hilbert spaces) based on the isomorphism between Q and H 4. The resulting Fréchet HR and HR∗ gradients naturally extend the HR, HR∗ gradients in [11] and their properties, although with inner product and transform space definitions on Hilbert spaces. The Fréchet-based gradient and properties described here are used to find the Quat-KLMS cost function gradient [i.e., gradient of (15)] in Section IV-C. Afterward, they are used to find the gradients of the WL-Quat-KLMS and IB-Quat-KLMS cost functions, which are shown in Sections IV-D and IV-E, respectively. C. Quat-KLMS Algorithm Applying the Fréchet HR and HR∗ gradient properties of Section IV-B to (15), we get ∗ ˆ Q (x(n)) (25) ˆ Q (x(n))e∗ (n) + 1 e(n) ∇w∗Q {|e(n)|2 } = − 2 and the weight update for Quat-KLMS may be expressed as  ˆ Q (x(n))e∗ (n) w Q (n) = w Q (n − 1) + η   ∗ 1 ˆ − e(n) Q (x(n)) K−1 Q (n) 2

2427

We observe that, due to the noncommutative property of quaternion multiplication, the algorithm may not easily be cast in inner products using the well-known kernel trick [4], [5]. To form the gradient so that we may use inner products, we rewrite the gradient as follows. We equivalently write (25) as 3ˆ ∗ ∇w∗Q {|e(n)|2 } = −  Q (x(n))e (n) 2 ∗  1 ˆ ∗ ˆ Q (x(n)) . +  Q (x(n))e (n) + e(n) 2 (27) Since a + a of (27) as

The estimates at iteration n would then be ˆ Q (x(n)), w Q (n − 1) Q dˆQ (n) =   n−1  H  ∗  1 ∗ −1 ˆ Q (x(m)) K (m) ˆ Q (x(m))e (m) − e(m) = η  Q 2 m=1 ˆ Q (x(n))   n−1    1 ˆT H ∗ −1 ˆQ KQ (m) e(m) (x(m)) −  (x(m))e (m) = η 2 Q m=1  ˆ Q (x(n)) × where in the last step, H Q −1 (m) is real valued.

= 2Re{a}, we can write the second term

∗  1 ˆ ∗ ˆ Q (x(n))  Q (x(n))e (n) + e(n) 2   ˆ Q (x(n))e∗ (n) = Re 

ˆ a (x(n))ea (n) +  ˆ b (x(n))eb (n) +  ˆ c (x(n))ec (n) = ˆ d (x(n))ed (n) + ˆ a (·) to  ˆ d (·) and ea (n) to ed (n) are the components where  ˆ of  Q (·) and e(n), respectively. From the quaternion-based properties (3), we get   ˆ Q (x(n))e∗ (n) Re  =

 1 ˆ ˆ iQ (x(n)) +  ˆ Qj (x(n)) +  ˆ kQ (x(n)) ea (n)  Q (x(n)) +  4 1 ˆ ˆ iQ (x(n)) −  ˆ Qj (x(n)) −  Q (x(n)) +  4  ˆ kQ (x(n)) i eb (n) − −

ˆ S (x(a)),  ˆ S (x(a)) S is the normalization where K S (a) =  term. Assuming w(0) = 0, the weights at iteration n can be written w Q (n)   n  ∗  ˆ Q (x(m)) K−1 (m) . ˆ Q (x(m))e∗ (m) − 1 e(m) =η  Q 2 m=1 (26)





1 ˆ ˆ iQ (x(n)) +  ˆ Qj (x(n))  Q (x(n)) −  4  ˆ kQ (x(n)) j ec (n) − 1 ˆ ˆ iQ (x(n)) −  ˆ Qj (x(n))  Q (x(n)) −  4  ˆ kQ (x(n)) ked (n). +

(28)

Finally, using the conjugate quaternion-based properties (4) to factor yields   ˆ Q (x(n))e∗ (n) Re  =

1 ˆ ∗ ˆ iQ (x(n))ei ∗ (n) +  ˆ Qj (x(n))e j ∗ (n)  Q (x(n))e (n) +  4  ˆ kQ (x(n))ek ∗ (n) (29) +

and the gradient becomes 5ˆ 1 ˆi ∗ ∗ (x(n))ei (n) ∇w∗Q {|e(n)|2 } = −  Q (x(n))e (n) +  4 4 Q 1 ˆk 1ˆj j∗ k∗ +  Q (x(n))e (n) +  Q (x(n))e (n). 4 4 (30)

2428

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

Thus, (26) can be equivalently expressed as w Q (n) = η

n   5 m=1

4

yields

ˆ Q (x(m))e∗ (m) − 1  ˆ Q (xi (m))ei ∗ (m)  4

 1ˆ 1ˆ j j∗ H k∗ −  Q (x (m))e (m)−  Q (x (m))e (m) 4 4

−1 × KQ (m) η

ˆ Q (x(n)) =  ˆ Q (xη (n)) for where we use assumption:  η{i, k, j }, as described in Appendix A. Based on this, the estimates in this case are ˆ Q (x(n)), w Q (n − 1) Q dˆQ (n) =   n−1  5 −1 =η KQ (m) e(m)κ Q (x(m), x(n)) 4

dˆQ (n) = η

m=1

1 ∗ − (s (xcomp (m)))T e (m) 2  n−1   ∗ =η e(m) T (xcomp (m))s

where κ Q (·, ·) is the quaternion Gaussian kernel. For the Quat-Ext Gaussian kernel we can see, considering Appendix B, that

 1 T ∗ − (xcomp (m))s e (m) s (xcomp (n)) 2  n−1    1 ∗ ∗ e(m)s − s e (m) =η 2 m=1 

and the Quat-KLMS may be expressed ˆ Q (x(n)), w Q (n − 1) Q dˆQ (n) =   n−1  5 −1 =η KQ (m) e(m)κ Q (x(m), x(n)) 4 m=1

1 − ei (m) j κ Q (x(m), x(n)) 4 1 j − e (m)kκ Q (x(m), x(n)) 4  1 k − e (m)i κ Q (x(m), x(n)) 4

m=1



× T (xcomp (m)) (xcomp(n)) =η

 n−1   1 ∗ e(m) |s |2 − s e (m)s 2

m=1



× κσ,Rd (xcomp (n), xcomp (m))

(33)

D. WL-Quat-KLMS Algorithm For WL-Quat-KLMS, we minimize the MSE E[|e(n)|2 ], where e(n) = d(n) − dˆQ WL (n), and dˆQ WL (n) is from (12). Using a normalized LMS-based approach

−1 KQ (m)[(5e(m) − ei (m) j − e j (m)k

w Q WL (n) = w Q WL (n − 1) −

m=1

− ek (m)i )κ Q (x(m), x(n))]

 n−1   1 ∗ 2 | |s e(m) − s e (m)s =η 2

where κσ,Rd (·, ·) is the real Gaussian kernel form. Note normalization is not required, since k Q (a) = 4 for all a with the Quat-Ext Gaussian kernel. However, our analysis here focuses on (31) and (32), which form the Quat-KLMS estimates with the quaternion Gaussian and Quat-Ext Gaussian kernels. Finally, an alternative gradient (referred to as the i -gradient), was explored in [32], which could improve convergence. However, here, we focus on the HR calculus of [11] to find the gradient of this, as well as other algorithms. We now describe the widely linear-based algorithms. For these methods, we only consider use of the quaternion Gaussian kernel. The reasoning is shown later in this section.

ˆ Qj (a) =  ˆ Q (a)(−k) 

ˆ kQ (a) =  ˆ Q (a)(−i ) 

n−1 

s (xcomp (n))

× T (xcomp (m)) s (xcomp (n))

1 − ei (m)κ Q (xi (m), x(n)) 4 1 − e j (m)κ Q (x j (m), x(n)) 4  1 − ek (m)κ Q (xk (m), x(n)) (31) 4





m=1

m=1

ˆ Q (a)(− j ), ˆ iQ (a) =  

 n−1   e(m)(s (xcomp (m))) H

(32)

where κ Q (·, ·) is the Quat-Ext Gaussian kernel, and factor (1/4) was absorbed in step-size η. An alternative approach for deriving the Quat-Ext kernel-based algorithm can be obtained by viewing the RKHS in terms of real-valued mapping (xcomp (n)) and quaternion scalar s (both defined in Appendix B). This approach

where ∇wQWL This yields



∗ {|e(n)|2 } η∇wQWL

K Q WL (n)



is the gradient with respect to wQWL .



∇wQWL {|e(n)|2 } ∗

ˆ Q WL (x(n)), w Q WL  Q WL |2 } = ∇wQWL {|d(n) −  ∗ ˆ Q WL (x(n)), w Q WL  Q WL ) = ∇wQWL {(d(n) −  ∗

(34)

ˆ Q WL (x(n)) Q WL )}. × (d (n) − w Q WL , 

PAUL AND OGUNFUNMI: KERNEL ADAPTIVE ALGORITHM

It can be seen, based on the properties from Section IV-B, that ∗ ∗ ˆ Q (x(n)) ˆ Q WL (x(n))e∗ (n)+ 1 e(n) ∇wQWL {|e(n)|2 } = − WL 2 (35) and the estimates, similar to before, are ˆ Q WL (x(n)), w Q WL (n − 1) Q WL dˆQ WL (n) =   n−1  5 −1 e(m)κ Q WL (x(m), x(n)) =η KQ (m) WL 4 m=1

1 − ei (m)κ Q WL (xi (m), x(n)) 4 1 j − e (m)κ Q WL (x j (m), x(n)) 4  1 k k − e (m)κ Q WL (x (m), x(n)) 4 (36) η η ˆ ˆ where again we assume:  Q (x(n)) =  Q (x (n)) for η{i, k, j }. Function κ Q WL (a, b) can be evaluated using ˆ Q WL (b),  ˆ Q WL (a) Q WL κ Q WL (a, b) =  j

i k = κ Q (a, b) + κ Q (a, b) + κ Q (a, b) + κ Q (a, b)

where a η bη = (ab)η for η{i, k, j } was used in factoring κ Q WL (a, b). E. IB-Quat-KLMS Algorithm For IB-Quat-KLMS, E[|e(n)|2 ] is minimized based on the estimator dˆQ IB (n) of (13). The cost function can be shown similarly to Quat-KLMS, WL-Quat-KLMS, and gradient again based on Section IV-B. Thus, using the same process ˆ Q IB (x(n)), w Q IB  Q IB dˆQ IB (n) =   n−1  5 −1 =η K Q IB (n) e(m)κ Q IB (x(m), x(n)) 4 m=1

1 − ei (m)κ Q IB (xi (m), x(n)) 4 1 j − e (m)κ Q IB (x j (m), x(n)) 4  1 k k − e (m)κ Q IB (x (m), x(n)) 4 (37) where

 T  T κ Q IB (a, b) = κ Q a T , aiT , a j T , akT , bT , biT , b j T , bkT .

From (31), (36), and (37), the difference between Quat-KLMS, WL-Quat-KLMS, and IB-Quat-KLMS are the functions κ Q (a, b), κ Q WL (a, b), and κ Q IB (a, b). These functions correspond to different transform spaces, based on their ˆ Q WL (·), and  ˆ Q IB (·). In particuˆ Q (·),  associated mappings  ˆ Q WL (·) and  ˆ Q IB (·) are extended transform spaces based lar,  on widely linear estimation. A brief convergence analysis of the various algorithms described here is listed in Appendix C. To understand the benefit of these transform spaces in learning nonlinear quaternion functions, we provide here interpretations of the various estimation approaches described.

2429

F. Benefit of WL-Based Methods ˆ Q WL (·) is defined From before, the mapping   T T ˆ Q (a) ˆ iT ˆ jT ˆ kT ˆ Q WL (a) =   Q (a) Q (a) Q (a) . We can see from this mapping that widely linear estimation with WL-Quat-KLMS is applied after the nonlinear transform, and resembles the estimator of (5) in the linear stage. ˆ Q IB (·) of IB-Quat-KLMS is defined Meanwhile, mapping  ˆ Q ([aT , aiT , a j T , akT ]T ). ˆ Q IB (a) =   For this mapping, the input vector is formed from the original input concatenated with involutions. The estimator is ˆ Q IB (x(n)), w Q IB  Q IB dˆQ IB (n) =    ˆ Q ([xT (n), xiT (n), x j T (n), xkT (n)]), w Q IB . =  Q A possible interpretation of these mappings is as follows. Initially, consider minimum MSE estimation with real-valued data dˆ = E[d|x] with desired response d = w H x + v, where w is the weight vector, and v is added noise. For quaternions, when linear regression is considered based on the real-valued components we may write dˆ{m} = E[d{m} |xa , xb , xc , xd ] with m{a, b, c, d}, d ∈ H, and x ∈ Hr , where r is the length of observation vector x. This can be equivalently expressed with the perpendicular involutions as a basis [11], [33] dˆ{m} = E[d{m} |x, xi , x j , xk ].

(38)

From [34], although the estimate dˆ = E[d|x] is valid, the regression is linear when d, x are jointly normal and real. When complex normal data is used (or quaternion normal here), the regression required is widely linear. Equation (38) shows this in a similar manner with [11]. For the nonlinear case, the desired response is d = f (x)+v, where f (. ) is a nonlinear function. This may also be expressed ˆ Q (·) is the transform map of a ˆ Q (x) + v, where  d = wH  suitable quaternion kernel function. It follows that the estimation may be described:   ˆ Q (x),  ˆ iQ (x),  ˆ Qj (x),  ˆ kQ (x) (39) dˆ{m} = E d{m} | illustrating widely linear regression. The WL-Quat-KLMS follows the required regression. We note, however, that use of the Quat-Ext Gaussian kernel form may be viewed as dˆ = E[d| (xa , xb , xc , xd )]

(40)

considering how the kernel function is formed in Appendix B. The approach is based using a real mapping (·) to learn the nonlinear function f (. ) applied to components of quaternion x.

2430

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

Based on using perpendicular involutions as an equivalent basis, we consider ˆ Q (x, xi , x j , xk )]. dˆ = E[d|

(41)

Equation (41) shows the estimation form for IB-Quat-KLMS. The use of the perpendicular involutions of x to form the input to the nonlinear stage of the quaternion Gaussian kernel may be considered to yield a similar benefit observed with the Quat-Ext Gaussian kernel. Specifically, here, involutions are used instead of components. From [6], the complexified kernel form may be viewed as a dual real channel formulation where a real kernel is applied to each channel. This form was seen in [6] to only differ from its augmented (or widely linear) extension by a scaling factor. We assume here the Quat-Ext kernel provides a similar benefit. Thus, the Quat-Ext kernel can be seen to improve performance compared to the quaternion kernel for handing channels, where WL estimation provides benefit. A complete study of the benefit of the approach of (41) in forming an extended RKHS for quaternion-valued data (versus an extended RKHS based on the real components) remains for future work. The next section, however, focuses on simulations to verify the benefit of the approaches described here.

Fig. 1.

Quat-KLMS, WL methods (input: Q-proper, noise = −17 dBm).

Fig. 2.

Quat-KLMS, WL methods (input: Q-improper, noise = −17 dBm).

V. S IMULATION R ESULTS Here, we evaluate the performance of the Quat-KLMS algorithm of Section IV based on a simulated quaternion channel with nonlinearity. The channel structure follows [5], although extended to quaternions, and is based on the Wiener nonlinear model. It consisted of widely linear portion ∗



i



j



k



z = g1 u(n) + g2 u (n) + g3 u (n) + g4 u (n) + h 1 u(n −1) ∗ ∗ ∗ (42) + h 2 u i (n − 1) + h 3 u j (n − 1) + h 4 u k (n − 1) followed by the nonlinearity: y = z + az 2 + bz 3

(43)

where coefficients g1 . . . g4 , h 1 . . . h 4 , a, b are quaternion valued. The coefficient values used were g1 = −0.45 − 0.4i + 0.3 j + 0.15k h 1 = 0.15 + 0.175i − 0.025 j + 0.1k g2 = 0.2 − 0.35i − 0.15 j − 0.05k h 2 = −0.075 + 0.15i − 0.225 j + 0.125k g3 = −0.05 − 0.1i − 0.4 j + 0.2k h 3 = −0.05 + 0.025i + 0.075 j − 0.05k g4 = −0.15 + 0.35i + 0.1 j − 0.1k h 4 = 0.175 − 0.05i − 0.075 j − 0.075k a = −0.05 + 0.075i + 0.35 j + 0.1k b = 0.03 − 0.025i − 0.25 j + 0.05k.

(44)

For input sequence u(n), the quaternion data used was either Q-proper or Q-improper. For Q-proper, the data was u(n) = 0.5u a (n) + i 0.5u b (n) + j 0.5u c (n) + k0.5u d (n) (45)

and for Q-improper   u(n) = 28/32u a (n) + i 2/32u b (n)   + j 1/32u c (n) + k 1/32u d (n)

(46)

where u a (n) to u d (n) are white Gaussian noise with unit variance. The channel output power was −4.5 dB with Q-proper input, and −2.8 dB with Q-improper input. The measurement noise was Q-proper, and various noise levels were used. This channel was tested with the following settings. Fig. 1: Q-proper input, measurement noise = −17 dBm. Fig. 2: Q-improper input, measurement noise = −17 dBm. Fig. 3: Q-proper input, measurement noise = −40 dBm. Fig. 4: Q-improper input, measurement noise = −40 dBm. Fig. 5: Q-proper input, measurement noise = −12 dBm. Fig. 6: Q-improper input, measurement noise = −12 dBm. Figs. 1 and 2 show algorithm performance with both measurement noise and nonlinearity present. Figs. 3 and 4 represent the use of a relatively noise-free channel. Finally, Figs. 5 and 6 show algorithm performance when measurement noise is substantially large compared with the nonlinearity. With Figs. 1 and 2, the results are ensemble-averaged over 100 realizations. A 50-sample moving window average

PAUL AND OGUNFUNMI: KERNEL ADAPTIVE ALGORITHM

Fig. 3.

Quat-KLMS, WL methods (input: Q-proper, noise = −40 dBm).

Fig. 4.

Quat-KLMS, WL methods (input: Q-improper, noise = −40 dBm).

Fig. 5.

Quat-KLMS, WL methods (input: Q-proper, noise = −12 dBm).

was also applied to the MSE before graphing, to provide more visible convergence curves. In remaining Figs. 3–6, an ensemble-average over 25 realizations and 100-sample moving window average were used. Figs. 1 and 2 also have an inserted graph to clarify initial convergence behavior of each algorithm. The algorithms and parameters used in Figs. 1–6 were as follows: QLMS: μ = 0.08; WL-QLMS: μ = 0.02; √ Quat-KLMS1: η = 0.35, σ = 2 2;

2431

Fig. 6.

Quat-KLMS, WL methods (input: Q-improper, noise = −12 dBm).

Quat-KLMS2: η = (0.35/40), σ = 4; WL-Quat-KLMS2: η = (0.35/4), σ = 4; IB-Quat-KLMS2: η = (0.35/4), σ = 4 where KLMS1 denotes the use of the Quat-Ext Gaussian kernel (10), and Quat-KLMS2 the Quaternion Gaussian kernel (9). The parameters η, σ for Quat-KLMS were chosen based on exhaustive trial-and-error testing. For WL-Quat-KLMS, η was chosen based on the use of WL estimation versus linear estimation, i.e., effect on filter length. The choice of η for the IB-Quat-KLMS algorithm was similarly considered, while also considering how higher order dimension terms diminish with σ in Appendix A. Methods for finding optimal values for σ here are outside the scope of this paper. It should be noted that the step size η for Quat-KLMS2 was reduced based on some observed stability issues. A bound for step-size η was considered in Appendix C, however, its evaluation is dependent on understanding the kernel transform space associated with each algorithm. This is needed to determine the eigenvalues of correlation matrix R for the transformed input. This was considered outside the scope here, and remains for future work. Finally, sparsification was used in all tests performed for reduced complexity. Sparsification methods curb the linear growth of the kernel adaptive filter structure (31)–(33) by choosing samples to discard which should minimally affect performance. The results of all tests performed used the vector quantization (VQ) approach [2]. The sparsification parameters used and results obtained for MSE, network size after 15 000 iterations are shown in Table I. Without sparsification, the network size would be 15 000 in all cases. The network size corresponds to the number of kernel evaluations required for the algorithm to compute an estimate at that point, in this case 1500 iteration. This may be seen from (31)–(33), (36) and (37). Thus, the network size gives an indication of algorithm complexity, and the computation reduction is approximately (15 000–2000)/15 000 = 87%. The same sparsification parameter value was used for all algorithms, and only varied depending on whether the input was Q-proper or Q-improper. This resulted in the same dictionary being used across methods, and was considered here

2432

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

TABLE I S PARSIFICATION PARAMETER , N ETWORK

TABLE II MSE, S TANDARD D EVIATION FOR VARIOUS A LGORITHMS T ESTED

S IZE A FTER 15 000 I TERATIONS

to provide an unbiased algorithm comparison. Performance was seen to vary less than 1 dB for all tests with sparsification. From Figs. 1 and 2, we see Quat-KLMS1, WL-Quat-KLMS2, and IB-Quat-KLMS2 perform well for both Q-proper (Fig. 1) and Q-improper (Fig. 2) input, and achieving an MSE of −12 to −13 dBm with both distribution types. Note that the performance of Q-improper differs from Q-proper since it has an uneven distribution along the real, imaginary directions, and is affected by the nonlinear channel differently. These kernel-based methods improve versus WL-QLMS (i.e., nonkernel method) around 1.5 dB for Q-proper, 3 dB for Q-improper. Quat-KLMS2, on the other hand, had surprisingly worse performance (6 dB worse with Q-proper input, 3 dB with Q-improper), compared with the Quat-KLMS1. This should be due to the inability of Quat-KLMS2 to perform well for channels which are a function of the perpendicular involutions of the input, where QWL estimation provides benefit. Meanwhile, Quat-KLMS1 can be seen to be capable of QWL estimation (Section IV-F). Note, based on (42), the channel simulated is clearly one where QWL estimation would work well. The associated RKHS for Quat-KLMS2 may also affect stability (Appendix A). In addition, understanding the exact differences in performance between the various algorithms here would likely require understanding the kernel transform spaces of each, and differences in dimensionality, convergence behavior. However, this is considered out of scope here, as mentioned previously. Finally, in both figures, QLMS had the worst MSE (7 dB worse for Q-proper input and 5 dB for Q-improper input) compared with Quat-KLMS1, as expected. Table II shows the steady-state MSE for each algorithm in Figs. 1 and 2. Both the average and standard deviation of the MSE are provided. For these results (as well as Figs. 3–6 after), we could also consider the algorithms of [32] and [35]. In [32], an involutions-based gradient, or i -gradient, was used in deriving an algorithm based on cost function (15). This provided an alternative algorithm to Quat-KLMS, which could improve rate of convergence. Meanwhile, Tobar and Mandic [35] describe a quaternion kernel least-square algorithm, which also provides convergence benefit, at cost of complexity. However, our focus here is on LMS-based approaches, and gradients based on HR calculus.

In [36]–[38], multikernel and mixture kernel methods are described. In particular, Tobar et al. [37] illustrate how a multikernel LMS algorithm could improve estimation for nonstationary signals via combining multiple kernels with different kernel bandwidths in a time-varying fashion. These works focus on use of real kernels, however, and would require an extension to quaternion data for comparison. This may be considered for future work. Figs. 3 and 4 show the performance for a relatively measurement noise-free channel (i.e., −40 dBm, compared with −17 dBm in Fig. 1). The performance of Quat-KLMS1, WL-Quat-KLMS2, and IB-Quat-KLMS2 still provide substantial improvement with both Q-proper and Q-improper input, yielding benefits of 3 and 5 dB, respectively. The results are otherwise similar to Figs. 1 and 2, as expected. Convergence is slow toward −40 dB, which may be attributed to the high-dimension space used for learning, which affects both convergence rate and misadjustment. Figs. 5 and 6 show the MSE results with measurement noise at −12 dB. Here, the measurement noise is substantial compared with the performance degradation due to nonlinearity. This can be verified by comparing with the WL-QLMS of Figs. 3 and 4. We observe that the benefit of kernel-based methods here is not substantial, although we still see Quat-KLMS1, WL-Quat-KLMS2, and IB-Quat-KLMS2 providing about 1 dB of benefit with Q-proper input, and 2 dB benefit with Q-improper input. For Quat-KLMS1, note that performance may benefit from further refining kernel parameter σ . However, this was not done here in favor of using a uniform σ for all tests. The results obtained indicate the use of kernel methods should be considered only when the main source of channel distortion is due to nonlinearity, particularly considering computation complexity. This is observed to not be the case in Figs. 5 and 6. A nonstationary channel test was also performed, which can be observed in Fig. 7. Specifically, a channel where an abrupt coefficient change occurs was tested. The test settings were the same used for Fig. 2 (Q-improper data, −17-dBm noise).

PAUL AND OGUNFUNMI: KERNEL ADAPTIVE ALGORITHM

Fig. 7.

2433

Quat-KLMS, WL methods with nonstationary channel. Fig. 8.

At iteration 7500, however, the coefficients were changed to g1 = −0.45 − 0.4i + 0.3 j + 0.15k h 1 = 0.15 + 0.175i − 0.025 j + 0.1k g2 = 0.2 − 0.35i − 0.15 j − 0.05k h 2 = −0.075 + 0.15i − 0.225 j + 0.125k g3 = −0.05 − 0.1i − 0.4 j + 0.2k, h 3 = −0.05 + 0.025i + 0.075 j − 0.05k g4 = −0.15 + 0.35i + 0.1 j − 0.1k h 4 = 0.175 − 0.05i − 0.075 j − 0.075k a = −0.05 + 0.075i + 0.35 j + 0.1k b = 0.03 − 0.025i − 0.25 j + 0.05k.

(47)

In addition, only 25 iterations were used for the ensemble averaging, and a 100-sample moving window average applied to the MSE with Fig. 7. From Fig. 7, we see that the convergence is slower for the kernel-based methods. This is expected, based on the high-dimension spaces associated with these methods, and how filter length affects rate of convergence. However, convergence behavior is reasonably quick, as well as stable, in adjusting to the abrupt channel model transition. This should be partially attributed to the VQ sparsification method, though the results also indicate the robustness of the quaternion kernel methods described. The MSE performance of all methods are essentially as expected, with Quat-KLMS1 and WL methods yielding benefit. Finally, Fig. 8 shows the use of these quaternion-based algorithms in an application using electroencephalogram (EEG) data. The test data used can be obtained from the location shown for [8]. The EEG data was collected from electrodes placed at various locations on the head, based on the 10–20 system [8]. Eight signals were collected, which were used as follows. FP1, FP2—frontal electrodes. C3, C4—central electrodes. O1, O2—occipital electrodes. velectrooculography (vEOG), hEOG—EOG measurements.

Quat-KLMS, WL methods with EEG data (prediction task).

The data used was collected for the scenario where an eye brow was raised. For forming quaternion input, the signals were grouped based on pattern similarity (i.e., FP1 and FP2 are symmetrically located, signals C3 and C4, etc.), and the eight signals were considered a tuple of quaternion inputs. Specifically, the eight input signals are divided into two consecutive quaternion inputs: (FP1 + i C3 + j O1 + k vEOG) and (FP2 + i C4 + j O2 + k hEOG). This formed the input of a one-step predictor of order L = 10, based on the various algorithms. For the tests here, the parameters used were as follows: QLMS :

μ = 0.08;

WL-QLMS : Quat-KLMS1 :

μ = 0.02; √ η = 0.6, σ = 2 2;

Quat-KLMS2 : η = (0.6/40), σ = 4; WL-Quat-KLMS2 : η = (0.6/4), σ = 4; IB-Quat-KLMS2 : η = (0.6/4), σ = 4 where σin 2 is average input power. The prediction gain performance is graphed for each algorithm. From Fig. 8, we see that the Quat-KLMS algorithms (particularly, Quat-KLMS1 and IB-Quat-KLMS2) provide improved steady-state prediction gain, as expected. QLMS shows worse prediction gain than WL-QLMS (around 2 dB worse). The convergence rate of the kernel-based algorithms are also improved (reach steady state in less than 500 iterations). The benefit of Quat-KLMS, WL versions, may be attributed to any nonlinearity in the EEG data, in how the current sample is related to past samples. The simulations of Figs. 1–8 show the ability of the quaternion kernel-based algorithms in learning nonlinear channels, as well as the benefit of the Quat-Ext Gaussian kernel and QWL estimation in improving performance with quaternion data. Note that Quat-KLMS1 performs equally well with the WL-based algorithms, which can be understood based on the Section IV description.

2434

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 10, OCTOBER 2015

VI. C ONCLUSION In this paper, we describe the derivation of a kernel adaptive algorithm with quaternion data, which we refer to as the Quat-KLMS. The approach follows that of Bouboulis and Theodoridis for CKLMS [5], and involves describing suitable kernel functions with quaternion data, and the associated RKHS spaces. In addition, a description of a modified HR calculus to perform gradient operations on a quaternion RKHS was provided. Simulations of a nonlinear quaternion channel under various SNR, as well as a nonstationary channel test, were used to verify the potential benefits of Quat-KLMS to robustly handle channel nonlinearities.

where α = (qb2 + qc2 + qd2 )1/2 , and functions exp(.), cos(. ), and sin(. ) on the right side follow the definitions of the realvalued case. Expanding (A.4) using the definitions of these functions

exp(q) =

∞  qn

!

∞ ∞   (−1)m α 2m (−1)m α 2m + (2m)! (2m + 1)! m=0 m=0 !

a

n=0

n!

× (i qb + j qc + kqd ) =

∞  qn

!

∞ 

a

n=0

n!

 (−1)m α 2m (2m)!−1 + (2m + 1)!−1

m=0

A PPENDIX A For a Gaussian kernel function applicable to quaternions, we initially consider the real and complex Gaussian kernels forms. The goal is providing a quaternion kernel definition with a similar RKHS. The real Gaussian kernel may be defined [1]–[5] ! d  −2 2 (A.1) (ai − bi ) κσ,R d (a, b) = exp −σ i=1

where a, b are real-valued input vectors of length d, and Rd denotes the associated RKHS (dimensionality affected by d). Meanwhile, the complex Gaussian kernel is [5], [39] ! d    ∗ 2 ai − bi (A.2) κσ,C d (a, b) = exp −σ −2 i=1

where a, b are now complex-valued vectors, and C d denotes the associated complex RKHS. The RKHS of the complex Gaussian kernel is considered in detail in [39], where its associated RKHS is seen to match that of the real kernel. In particular, when the input is restricted to real values, the kernel functions are equivalent. However, (A.2) provides a Hermitian inner product definition, and thus is suitable for complex data. Considering this, for the quaternion Gaussian kernel, we similarly define ! d   ∗ 2 −2 (A.3) ai − bi κσ,Q d (a, b) = exp −σ i=1

where vectors a, b consist of quaternion-valued components, and Q d denotes the associated quaternion RKHS. To verify the RKHS of the quaternion Gaussian kernel, a concern is that multiplication with quaternions, unlike real and complex data, is noncommutative. Thus, establishing the equivalence of (A.3) to the inner product κ(a, b) = ϕ H (a)ϕ(b), where ϕ(·) is the associated mapping to RKHS, may be seen to be nontrivial. This will be considered here. We start by considering the quaternion exponential function definition, which may be expressed [40]   exp(q) = exp(qa ) cos(α) + sin(α)α −1 (i qb + j qc + kqd ) (A.4)

× (i qb + j qc + kqd )

! 

(A.5)

and using the properties of quaternions, we get !  n 1 ∗ n exp(q) = n! (q + q ) 2 n=0 ∞  2m  1 ∗ × (q − q )2m 2 m=0  ! ∗ −1 −1 1 (q − q ) × (2m)! + (2m + 1)! 2   n+2m ∞ ∞  1  ∗ ∗ 2 (q + q )n (q − q )2m = n!(2m)! n=0 m=0   1 n+2m+1 ∞ 

−1

+



2

n!(2m + 1)!



(q + q )n (q − q )2m+1 . (A.6)

Finally, using (A.6) with (A.3) yields the following. Here, we consider the scalar input case only (d = 1) for tractability κσ,Q d (a, b) ∞  ∞   (−σ −2 /2)n+2m ∗ ∗ ((a − b )2 + (a − b)2 )n = n!(2m)! n=0 m=0

(−σ −2 /2)n+2m+1 n!(2m + 1)! ∗ 2 ∗ ∗ 2 n × ((a − b ) + (a − b) ) × (a − b )2  ∗ −(a − b)2 )2m+1 . (A.7) ∗



× ((a −b )2 − (a −b)2 )2m +

The vector input case may be considered as with [39]. Expanding and factoring of (A.7) is required to yield an inner product form. It is observed that, even for the scalar input case, this process is nontrivial. Instead, we initially move our attention to the alternative kernel functions described in Section IV (based on the WL-based methods). These can be expressed as

    κ_QWL(a, b) = κ_Q(a, b) + κ_Q^i(a, b) + κ_Q^j(a, b) + κ_Q^k(a, b)
                = exp( -σ^{-2} Σ_{l=1}^{d} (a_l - b_l^*)^2 ) + [ exp( -σ^{-2} Σ_{l=1}^{d} (a_l - b_l^*)^2 ) ]^i
                  + [ exp( -σ^{-2} Σ_{l=1}^{d} (a_l - b_l^*)^2 ) ]^j + [ exp( -σ^{-2} Σ_{l=1}^{d} (a_l - b_l^*)^2 ) ]^k
    κ_QIB(a, b) = κ_Q([a, a^i, a^j, a^k], [b, b^i, b^j, b^k])
                = exp( -σ^{-2} Σ_{l=1}^{d} [ (a_l - b_l^*)^2 + ((a_l - b_l^*)^2)^i + ((a_l - b_l^*)^2)^j + ((a_l - b_l^*)^2)^k ] )     (A.8)

where (A.3) is used for κ_Q(·, ·). For κ_QWL(a, b), considering (A.4), we can write

    κ_QWL(a, b) = 4 exp( -σ^{-2} Σ_{l=1}^{d} Re{ (a_l - b_l^*)^2 } ) cos( σ^{-2} | Σ_{l=1}^{d} Im{ (a_l - b_l^*)^2 } | )               (A.9)

where exp(·) and cos(·) use the real-valued definitions. In a similar fashion with (A.7), the tractability of (A.9) seems poor. However, for κ_QIB(a, b), since this can be expressed as

    κ_QIB(a, b) = exp( -σ^{-2} Σ_{l=1}^{d} [ (a_l - b_l^*)^2 + ((a_l - b_l^*)^2)^i + ((a_l - b_l^*)^2)^j + ((a_l - b_l^*)^2)^k ] )
                = exp( -σ^{-2} Σ_{l=1}^{d} 4 Re{ (a_l - b_l^*)^2 } )

we can evaluate (A.8) directly using the real-valued definition of exp(·). Here, we consider the d = 1 case

    κ_QIB(a, b) = exp( -σ^{-2} 4 Re{ (a - b^*)^2 } )
                = exp( 4σ^{-2} [ -Re{a^2} + Re{b^* a + a b^*} - Re{b^2} ] )
                = exp( -4σ^{-2} Re{a^2} ) exp( 4σ^{-2} Re{b^* a + a b^*} ) exp( -4σ^{-2} Re{b^2} )
                = exp( -4σ^{-2} Re{a^2} ) [ Σ_{n=0}^{∞} ( 4σ^{-2} Re{a b^* + b^* a} )^n / n! ] exp( -4σ^{-2} Re{b^2} ).              (A.10)

Evaluating the summation term in (A.10) using the properties of Section II yields

    Σ_{n=0}^{∞} ( 2σ^{-2} [ a b^* + b^* a + b a^* + a^* b ] )^n / n!
      = Σ_{n=0}^{∞} ( 2σ^{-2} [ a b^* + a^* b + (1/2)( a^i b^{i*} + a^j b^{j*} + a^k b^{k*} - a b^* )
                                 + (1/2)( a^{i*} b^i + a^{j*} b^j + a^{k*} b^k - a^* b ) ] )^n / n!
      = Σ_{n=0}^{∞} ( σ^{-2} [ a b^* + a^i b^{i*} + a^j b^{j*} + a^k b^{k*} + a^* b + a^{i*} b^i + a^{j*} b^j + a^{k*} b^k ] )^n / n!.   (A.11)

Note that the n = 1 term of (A.11) can be readily expressed in an inner product form. Specifically

    σ^{-2} ( a^* b + a^{i*} b^i + a^{j*} b^j + a^{k*} b^k + a b^* + a^i b^{i*} + a^j b^{j*} + a^k b^{k*} ) = ψ_1^H(a) ψ_1(b)          (A.12)

where ψ_1(x) = σ^{-1} [ x, x^i, x^j, x^k, x^*, x^{i*}, x^{j*}, x^{k*} ]^T. Thus, it remains to see whether the n > 1 terms may be expressed with inner products. For the n = 2 case

    (σ^{-4}/2!) ( a^* b + a^{i*} b^i + a^{j*} b^j + a^{k*} b^k + a b^* + a^i b^{i*} + a^j b^{j*} + a^k b^{k*} )^2
      = (σ^{-4}/2!) [ ( a^* b a^* b + a^* b a^{i*} b^i + a^* b a^{j*} b^j + ... + a^* b a^k b^{k*} )
                      + ( a^i b^{i*} a^* b + a^i b^{i*} a^{i*} b^i + a^i b^{i*} a^{j*} b^j + ... + a^i b^{i*} a^k b^{k*} )
                      + ...
                      + ( a^k b^{k*} a^* b + a^k b^{k*} a^{i*} b^i + a^k b^{k*} a^{j*} b^j + ... + a^k b^{k*} a^k b^{k*} ) ].         (A.13)

Here, we see that, for n = 2, the term is not readily expressible as an inner product. However, we consider factoring as follows. For evaluating a^* b a^* b from (A.13)

    a^* b a^* b = (1/2) a^* ( a^i b^{i*} + a^j b^{j*} + a^k b^{k*} - a b^* ) b
                = (1/2) ( a^* a^i b^{i*} b + a^* a^j b^{j*} b + a^* a^k b^{k*} b - a^* a b^* b )                                      (A.14)

where the conjugate of b a^* is evaluated based on the quaternion properties of Section II.
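The two quaternion identities doing the work above, namely the involution sum q + q^i + q^j + q^k = 4 Re{q} behind (A.10), and the factoring of a^* b a^* b in (A.14), can be spot-checked numerically. The sketch below is ours and again reuses qmul and qconj from the sketch following (A.5); inv is our own helper for the involutions q^η = -η q η.

    import numpy as np
    # reuses qmul and qconj from the sketch following (A.5)

    QI, QJ, QK = (np.array([0., 1., 0., 0.]),
                  np.array([0., 0., 1., 0.]),
                  np.array([0., 0., 0., 1.]))

    def inv(q, eta):
        # quaternion involution q^eta = -eta * q * eta, with eta in {QI, QJ, QK}
        return -qmul(eta, qmul(q, eta))

    rng = np.random.default_rng(2)
    a, b = rng.standard_normal(4), rng.standard_normal(4)

    # identity used for (A.10): q + q^i + q^j + q^k = 4 Re{q}
    q = qmul(a - qconj(b), a - qconj(b))              # (a - b*)^2
    lhs = q + inv(q, QI) + inv(q, QJ) + inv(q, QK)
    rhs = np.array([4.0 * q[0], 0.0, 0.0, 0.0])
    print(np.allclose(lhs, rhs))                      # expected: True

    # factoring of (A.14): a* b a* b = (1/2) a* (a^i b^{i*} + a^j b^{j*} + a^k b^{k*} - a b*) b
    astar_b = qmul(qconj(a), b)
    lhs14 = qmul(astar_b, astar_b)
    mid = (qmul(inv(a, QI), qconj(inv(b, QI)))
           + qmul(inv(a, QJ), qconj(inv(b, QJ)))
           + qmul(inv(a, QK), qconj(inv(b, QK)))
           - qmul(a, qconj(b)))
    rhs14 = 0.5 * qmul(qconj(a), qmul(mid, b))
    print(np.allclose(lhs14, rhs14))                  # expected: True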


TABLE III: EXPANSION AND FACTORING OF (A.13). [Table entries not reproduced in this text version.]

Table III shows the first half of all terms of (A.13) similarly expanded, and their equivalent sum. The remaining half can be similarly shown. From this table, the n = 2 case may also be expressed using an inner product form, yielding

    (σ^{-4}/2!) ( a^* b + a^{i*} b^i + a^{j*} b^j + a^{k*} b^k + a b^* + a^i b^{i*} + a^j b^{j*} + a^k b^{k*} )^2 = ψ_2^H(a) ψ_2(b)

where ψ_2(x) = (σ^{-4}/2!)^{1/2} [ x x, x x^i, x x^j, x x^k, x^i x, x^i x^i, . . . , x^{k*} x^{k*} ]^T. The vector ψ_2(x) consists of 64 elements, which are all the permutations of the multiplication of two elements chosen, with repetition, from the set {x, x^i, x^j, x^k, x^*, x^{i*}, x^{j*}, x^{k*}}. A similar process for larger n can be shown to yield

    (σ^{-2n}/n!) ( a^* b + a^{i*} b^i + a^{j*} b^j + a^{k*} b^k + a b^* + a^i b^{i*} + a^j b^{j*} + a^k b^{k*} )^n = ψ_n^H(a) ψ_n(b)      (A.15)

where ψ_n(x) = (σ^{-2n}/n!)^{1/2} [ x···x x, x···x x^i, x···x x^j, . . . , x^{k*}···x^{k*} x^{k*} ]^T consists of 8^n elements, which are all the permutations of the multiplication of n elements chosen, with repetition, from the set {x, x^i, x^j, x^k, x^*, x^{i*}, x^{j*}, x^{k*}}. Finally, combining with (A.10) yields

    κ_QIB(a, b) = exp( -σ^{-2} 4 Re{ (a - b^*)^2 } ) = φ^H(a) φ(b)                                                                        (A.16)

where φ(x) = exp( -4σ^{-2} Re{x^2} ) [ ψ_0^T(x), ψ_1^T(x), ψ_2^T(x), ψ_3^T(x), . . . ]^T. From (A.16), we see that the Mercer condition applies to the function κ_QIB(a, b) of (A.8) when d = 1 (i.e., the scalar case). The vector case should have a similar factoring following [39]; however, this remains for verification.

The validity of the function κ_QIB(a, b) of (A.8) as a kernel function may be used for determining whether (A.3) (the quaternion Gaussian kernel definition proposed) may be seen as a valid kernel. From [41], defining a kernel κ : X × X → C (same as Section III, except defined on the complex field C) and a function ξ : S → X, we can define the composite function κ ∘ ξ : S × S → C. Thus, (κ ∘ ξ)(x_1, x_2) = κ(ξ(x_1), ξ(x_2)). The function ξ represents a transformation applied to the input of kernel κ. This can be seen to reflect the relationship between (A.3) and κ_QIB(a, b) of (A.8). In [41], it is observed that the composite function κ ∘ ξ is also a kernel function on S for any arbitrary ξ. The associated Hilbert space H(κ ∘ ξ) is referred to as the pull-back of H(κ) along ξ, and can be expressed as H(κ ∘ ξ) = { f ∘ ξ : f ∈ H(κ) }. This property implies the validity of κ_QIB(a, b) of (A.8) as a kernel, given that (A.3) is a valid kernel. The reverse may also be seen, given a suitable inverse map. However, since the function exp(·) differs between real and quaternion input (A.4), a direct verification of the RKHS of (A.3) may be preferred. This remains to be considered for future work.

Finally, for the associated transform mapping of (A.16), we assume that the property φ(x^η) = φ^η(x), for η ∈ {i, j, k}, holds. This is based on the mapping ψ_n(x) of (A.15), combined with the property a^η b^η = (a b)^η for η ∈ {i, j, k}, as well as (a^ω)^ρ = (a^ρ)^ω for ω, ρ ∈ {i, j, k}. The assumption φ(x^η) = φ^η(x) is considered to extend to the vector case of κ_QIB(a, b) of (A.8), and also to apply to (A.3), as well as to κ_QWL(a, b). However, the transform spaces for these functions should be verified to confirm this. This assumption is used in deriving the various methods obtained in Section IV.
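As a sanity check on the n = 1 block ψ_1 of the explicit map in (A.16), the sketch below builds ψ_1(x) = σ^{-1}[x, x^i, x^j, x^k, x^*, x^{i*}, x^{j*}, x^{k*}]^T and confirms that ψ_1^H(a)ψ_1(b) reproduces the (real-valued) n = 1 term 4σ^{-2} Re{a b^* + b^* a} of the series in (A.10), consistent with (A.11) and (A.12). The helpers qmul, qconj, inv, QI, QJ, QK are reused from the earlier sketches and are ours, not the paper's.

    import numpy as np
    # reuses qmul, qconj and inv (with QI, QJ, QK) from the earlier sketches

    def psi1(x, sigma):
        # the 8 quaternion entries of psi_1(x) in (A.12)
        return [v / sigma for v in (x, inv(x, QI), inv(x, QJ), inv(x, QK),
                                    qconj(x), qconj(inv(x, QI)),
                                    qconj(inv(x, QJ)), qconj(inv(x, QK)))]

    def qdotH(u, v):
        # quaternion inner product u^H v = sum_m conj(u_m) v_m
        s = np.zeros(4)
        for um, vm in zip(u, v):
            s += qmul(qconj(um), vm)
        return s

    sigma = 1.5
    rng = np.random.default_rng(3)
    a, b = rng.standard_normal(4), rng.standard_normal(4)

    lhs = qdotH(psi1(a, sigma), psi1(b, sigma))
    rhs_scalar = 4 * sigma**-2 * (qmul(a, qconj(b)) + qmul(qconj(b), a))[0]   # 4 sigma^-2 Re{a b* + b* a}
    rhs = np.array([rhs_scalar, 0.0, 0.0, 0.0])
    print(np.allclose(lhs, rhs))   # expected: True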

APPENDIX B

To extend a real-valued kernel function for use with quaternions, we consider an approach similar to the derivation of the complexified real Gaussian kernel [5]. Initially, let X ⊆ R^ν and X^4 ≡ X × X × X × X ⊆ R^{4ν}. X^4 is the subspace representing the components of a ν-dimensional vector of quaternions. Then, define the quaternion vector space X = { x_a + i x_b + j x_c + k x_d : x_a, x_b, x_c, x_d ∈ X } ⊆ ℍ^ν with a quaternion product structure. We now define the RKHS H associated with a real kernel κ defined on X^4 × X^4, and its inner product ⟨·, ·⟩_H. Finally, let H^4 = H × H × H × H. Then, H^4 can be seen to be a Hilbert space with inner product

    ⟨f, g⟩_{H^4} = ⟨f_1, g_1⟩_H + ⟨f_2, g_2⟩_H + ⟨f_3, g_3⟩_H + ⟨f_4, g_4⟩_H

with f = (f_1, f_2, f_3, f_4) and g = (g_1, g_2, g_3, g_4). The Hilbert space H can be seen to represent the RKHS for learning a single component of a quaternion function output, while H^4 is the RKHS for learning all four components. However, we still require a quaternion structure. Define the space Q = { f = f_1 + i f_2 + j f_3 + k f_4 : f_1, f_2, f_3, f_4 ∈ H } with quaternion inner product

    ⟨f, g⟩_Q = ⟨f_1, g_1⟩_H + ⟨f_2, g_2⟩_H + ⟨f_3, g_3⟩_H + ⟨f_4, g_4⟩_H
               + i ( ⟨f_2, g_1⟩_H - ⟨f_1, g_2⟩_H + ⟨f_4, g_3⟩_H - ⟨f_3, g_4⟩_H )
               + j ( ⟨f_3, g_1⟩_H - ⟨f_1, g_3⟩_H + ⟨f_2, g_4⟩_H - ⟨f_4, g_2⟩_H )
               + k ( ⟨f_4, g_1⟩_H - ⟨f_1, g_4⟩_H + ⟨f_3, g_2⟩_H - ⟨f_2, g_3⟩_H )

with f = f_1 + i f_2 + j f_3 + k f_4 and g = g_1 + i g_2 + j g_3 + k g_4. Thus, f, g ∈ Q are quaternion-valued functions defined on X ⊆ ℍ^ν, and Q can be seen to be a quaternion RKHS with the real kernel κ from before. We refer to Q as a quaternion-extended RKHS.

For mapping the quaternion x to the RKHS Q, we can use the rule

    Φ̂(x) = Φ̂(x_a + i x_b + j x_c + k x_d) = Φ̂(x_a, x_b, x_c, x_d)
          = Φ(x_a, x_b, x_c, x_d) + i · Φ(x_a, x_b, x_c, x_d) + j · Φ(x_a, x_b, x_c, x_d) + k · Φ(x_a, x_b, x_c, x_d)     (B.1)

where Φ̂ is the mapping applied to the quaternion input x = x_a + i x_b + j x_c + k x_d, and Φ is the feature map of the real kernel κ, defined as Φ(x_a, x_b, x_c, x_d) = κ(·, (x_a, x_b, x_c, x_d)). With this approach, the mapping to each real component of the quaternion output, based on the real components of the quaternion input, is chosen to be the same; specifically, the feature map of the real kernel κ is used. The mapping can be simplified to

    Φ̂(x) = Φ(x_a, x_b, x_c, x_d) + i Φ(x_a, x_b, x_c, x_d) + j Φ(x_a, x_b, x_c, x_d) + k Φ(x_a, x_b, x_c, x_d)
          = (1 + i + j + k) · Φ(x_a, x_b, x_c, x_d)                                                                        (B.2)

since the kernel transform mapping Φ(x_a, x_b, x_c, x_d) is the same for each orthogonal component. Note that the mapping associated with a real kernel function is to a real-valued RKHS; for the real Gaussian kernel, an exact description of this RKHS is given in [39]. Thus, the inner product of two vectors in the RKHS Q is

    ⟨Φ̂(x), Φ̂(w)⟩_Q = Φ̂^H(w) Φ̂(x)
                     = Φ^T(w_a, w_b, w_c, w_d) Φ(x_a, x_b, x_c, x_d) |1 - i - j - k|^2
                     = 4 κ( (x_a, x_b, x_c, x_d), (w_a, w_b, w_c, w_d) )

where w = w_a + i w_b + j w_c + k w_d and x = x_a + i x_b + j x_c + k x_d are quaternion data vectors. This result, similar to the complexified real kernel trick of [5], shows that any linear method formed from quaternion inner products of the form w^H x can be converted to a nonlinear method with this approach by replacing each of the inner products with a positive definite real kernel whose arguments are the real components of the inputs x and w, i.e., κ((x_a, x_b, x_c, x_d), (w_a, w_b, w_c, w_d)). This forms an alternative RKHS for quaternion nonlinear adaptive algorithms.
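A small numerical illustration of the scaling factor behind this quaternion-extended kernel trick: with the quaternion scalar s = 1 + i + j + k from (B.2), the product s^* s equals 4, which is why the Q-inner product of two mapped inputs reduces to 4 times the real kernel evaluated on the stacked components. The sketch reuses qmul and qconj from the sketch following (A.5).

    import numpy as np
    # reuses qmul and qconj from the sketch following (A.5)

    s = np.array([1.0, 1.0, 1.0, 1.0])     # s = 1 + i + j + k, cf. (B.2) and (B.5)
    print(qmul(qconj(s), s))               # expected: [4, 0, 0, 0], hence <Phi_hat(x), Phi_hat(w)>_Q = 4 kappa(...)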

With the real Gaussian kernel, which is expressed as

    κ_{σ,R^d}(u, u') = exp( -σ^{-2} Σ_{j=1}^{d} (u_j - u'_j)^2 )                                   (B.3)

the Quat-Ext real Gaussian kernel is

    κ_{σ,Q^d}(x, w) = 4 exp( -σ^{-2} Σ_{j=1}^{d} [ (x_{a,j} - w_{a,j})^2 + (x_{b,j} - w_{b,j})^2 + (x_{c,j} - w_{c,j})^2 + (x_{d,j} - w_{d,j})^2 ] )
                    = 4 exp( -σ^{-2} Σ_{j=1}^{d} |x_j - w_j|^2 ).                                   (B.4)

The function here satisfies the Mercer condition, and may be considered a valid kernel form. It can be seen as a measure of distance between the input vectors and is also symmetric, which are beneficial characteristics of kernels [4]. However, quaternion multiplication is noncommutative. Because of this, the algorithm may not be easily cast in terms of inner products. In this case, we use the transform map Φ̂(x) for the quaternion vector x directly, which we express as

    Φ̂(x) = s · Φ(x_comp)                                                                           (B.5)

where s = (1 + i + j + k) is the quaternion scalar indicating the mapping direction, and x_comp = [x_a^T x_b^T x_c^T x_d^T]^T is a vector of the components of x. This form will be used in the analysis.
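For reference, the following sketch evaluates the Quat-Ext real Gaussian kernel (B.4) for quaternion vectors stored component-wise (an (ν, 4) array layout of our own choosing), showing that the component-sum and quaternion-modulus forms in (B.4) coincide.

    import numpy as np

    def quat_ext_gauss(x, w, sigma):
        # x, w: (nu, 4) arrays, one row [a, b, c, d] per quaternion entry; first form of (B.4)
        return 4.0 * np.exp(-sigma**-2 * np.sum((x - w)**2))

    def quat_ext_gauss_modulus(x, w, sigma):
        # same kernel written through the quaternion modulus |x_j - w_j|^2, second form of (B.4)
        mod2 = np.sum((x - w)**2, axis=1)
        return 4.0 * np.exp(-sigma**-2 * np.sum(mod2))

    rng = np.random.default_rng(5)
    x, w = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
    print(np.isclose(quat_ext_gauss(x, w, 1.5), quat_ext_gauss_modulus(x, w, 1.5)))   # expected: True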

APPENDIX C

Here, we provide a brief description of the convergence of Quat-KLMS and the WL forms, based on convergence in the mean. Following [10], to characterize the convergence of Quat-KLMS, we describe the evolution of the weight error in the following way. For Quat-KLMS, the weight update is

    w_Q(n) = w_Q(n-1) + η [ (5/4) Φ̂_Q(x(n)) e^*(n) - (1/4) Φ̂_Q(x^i(n)) e^{i*}(n)
                             - (1/4) Φ̂_Q(x^j(n)) e^{j*}(n) - (1/4) Φ̂_Q(x^k(n)) e^{k*}(n) ].        (C.1)

This may be expressed as

    w_Q(n) = w_Q(n-1) + η [ (5/4) Φ̂_Q(x(n)) e^*(n) - (1/4) [Φ̂_Q(x(n)) e^*(n)]^i
                             - (1/4) [Φ̂_Q(x(n)) e^*(n)]^j - (1/4) [Φ̂_Q(x(n)) e^*(n)]^k ]

since Φ̂_Q^η(x(n)) = Φ̂_Q(x^η(n)) and a^η b^η = (a b)^η for η ∈ {i, j, k}.
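The rewriting of (C.1) above rests on the property a^η b^η = (a b)^η (together with Φ̂_Q(x^η) = Φ̂_Q^η(x)); the former is easy to spot-check numerically, again reusing qmul and the involution helper inv (with QI, QJ, QK) from the earlier sketches.

    import numpy as np
    # reuses qmul and inv (with QI, QJ, QK) from the earlier sketches

    rng = np.random.default_rng(6)
    a, b = rng.standard_normal(4), rng.standard_normal(4)

    for eta in (QI, QJ, QK):
        lhs = qmul(inv(a, eta), inv(b, eta))    # a^eta b^eta
        rhs = inv(qmul(a, b), eta)              # (a b)^eta
        print(np.allclose(lhs, rhs))            # expected: True for each involution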


Considering that the error can be expressed as

    e(n) = d(n) - w_Q^H(n-1) Φ̂_Q(x(n))
         = w_o^H(n) Φ̂_Q(x(n)) + r(n) - w_Q^H(n-1) Φ̂_Q(x(n))
         = -v^H(n-1) Φ̂_Q(x(n)) + r(n)                                                              (C.2)

where v(n) = w_Q(n) - w_o(n) is the weight error and r(n) is the measurement noise of the nonlinear system, we can write

    w_Q(n) = w_Q(n-1) - (5η/4) [ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) v(n-1) - Φ̂_Q(x(n)) r^*(n) ]
              + (η/4) [ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) v(n-1) - Φ̂_Q(x(n)) r^*(n) ]^i
              + (η/4) [ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) v(n-1) - Φ̂_Q(x(n)) r^*(n) ]^j
              + (η/4) [ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) v(n-1) - Φ̂_Q(x(n)) r^*(n) ]^k.                       (C.3)

The mean value of the weight error can then be expressed as

    E[v(n)] = E[v(n-1)] - (5η/4) E[ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) ] E[v(n-1)]
               + (η/4) ( E[ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) ] E[v(n-1)] )^i
               + (η/4) ( E[ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) ] E[v(n-1)] )^j
               + (η/4) ( E[ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) ] E[v(n-1)] )^k
            = E[v(n-1)] - (3η/2) C_x E[v(n-1)] + η Re{ C_x E[v(n-1)] }                              (C.4)

where C_x = E[ Φ̂_Q(x(n)) Φ̂_Q^H(x(n)) ] is the covariance matrix of x(n) in the transform space. The result in (C.4) considers (3) in the last step, and assumes that x(n) and v(n) are independent and that r(n) is zero-mean and independent of x(n). For finding an upper bound on η, we may approximate Re{ C_x E[v(n-1)] } as zero, maximizing η and yielding

    E[v(n)] ≈ ( I - (3η/2) C_x ) E[v(n-1)].                                                         (C.5)
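A small numerical illustration of the recursion (C.5) is sketched below, using a synthetic positive-definite stand-in for C_x (ours, and real-valued for simplicity; in the text C_x is a covariance in the quaternion transform space, so this only illustrates the scalar condition on η stated next, with R = (3/2)C_x).

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((4, 4))
    Cx = A @ A.T + 0.1 * np.eye(4)               # synthetic positive-definite stand-in for C_x
    R = 1.5 * Cx                                 # R = (3/2) C_x
    eta_max = 2.0 / np.linalg.eigvalsh(R).max()  # bound discussed below: 0 < eta < 2 / lambda_max

    def mean_error_norm(eta, steps=200):
        # iterate E[v(n)] = (I - (3 eta / 2) C_x) E[v(n-1)] from (C.5)
        v = np.ones(4)
        M = np.eye(4) - (1.5 * eta) * Cx
        for _ in range(steps):
            v = M @ v
        return np.linalg.norm(v)

    print(mean_error_norm(0.5 * eta_max))        # well inside the bound: decays toward zero
    print(mean_error_norm(1.5 * eta_max))        # outside the bound: grows without bound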

Letting R = (3/2) C_x, the step-size bound for convergence in the mean can be expressed as [42], [43]

    0 < η < 2 / λ_max

where λ_max is the maximum eigenvalue of R. Note that in the case of WL-Quat-KLMS and IB-Quat-KLMS, the same expression results, except that the covariance matrix definition for C_x is either C_x = E[ Φ̂_QWL(x(n)) Φ̂_QWL^H(x(n)) ] or C_x = E[ Φ̂_QIB(x(n)) Φ̂_QIB^H(x(n)) ], respectively.

To compare the matrix C_x between these algorithms, note that Φ̂_QWL(·) has a dimensionality four times that of Φ̂_Q(·), while Φ̂_QIB(·) has an exponentially larger dimension size. This indicates that the eigenvalue spread is larger in general for the WL-based methods, and thus η is chosen to be small based on stability concerns, as opposed to convergence rate. However, the choice of kernel parameter σ can be used to adjust the transform map, which can reduce the eigenvalue spread, in addition to improving MSE performance.

REFERENCES

[1] W. Liu, P. P. Pokharel, and J. C. Principe, “The kernel least-mean-square algorithm,” IEEE Trans. Signal Process., vol. 56, no. 2, pp. 543–554, Feb. 2008.
[2] B. Chen, S. Zhao, P. Zhu, and J. C. Principe, “Quantized kernel least mean square algorithm,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 1, pp. 22–32, Jan. 2012.
[3] C. Richard, J. C. M. Bermudez, and P. Honeine, “Online prediction of time series data with kernels,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1058–1067, Mar. 2009.
[4] W. Liu, J. C. Principe, and S. Haykin, Kernel Adaptive Filtering: A Comprehensive Introduction. New York, NY, USA: Wiley, 2010.
[5] P. Bouboulis and S. Theodoridis, “Extension of Wirtinger’s calculus to reproducing kernel Hilbert spaces and the complex kernel LMS,” IEEE Trans. Signal Process., vol. 59, no. 3, pp. 964–978, Mar. 2011.
[6] P. Bouboulis, S. Theodoridis, and M. Mavroforakis, “The augmented complex kernel LMS,” IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4962–4967, Sep. 2012.
[7] F. A. Tobar, A. Kuh, and D. P. Mandic, “A novel augmented complex valued kernel LMS,” in Proc. IEEE 7th Sensor Array Multichannel Signal Process. Workshop (SAM), Jun. 2012, pp. 473–476.
[8] S. Javidi, C. C. Took, C. Jahanchahi, N. Le Bihan, and D. P. Mandic, “Blind extraction of improper quaternion sources,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2011, pp. 3078–3711. [Online]. Available: http://www.commsp.ee.ic.ac.uk/~mandic/research
[9] C. C. Took and D. P. Mandic, “The quaternion LMS algorithm for adaptive filtering of hypercomplex processes,” IEEE Trans. Signal Process., vol. 57, no. 4, pp. 1316–1327, Apr. 2009.
[10] C. C. Took and D. P. Mandic, “A quaternion widely linear adaptive filter,” IEEE Trans. Signal Process., vol. 58, no. 8, pp. 4427–4431, Aug. 2010.
[11] D. P. Mandic, C. Jahanchahi, and C. C. Took, “A quaternion gradient operator and its applications,” IEEE Signal Process. Lett., vol. 18, no. 1, pp. 47–50, Jan. 2011.
[12] E. J. Bayro-Corrochano and N. Arana-Daniel, “Clifford support vector machines for classification, regression, and recurrence,” IEEE Trans. Neural Netw., vol. 21, no. 11, pp. 1731–1746, Nov. 2010.
[13] T. Shaneyfelt, S. Agaian, M. Jamshidi, and S. Erdogan, “Quaternion number based vanilla recognition system for automating polination,” in Proc. Int. Conf. Syst. Sci. Eng., Jun. 2011, pp. 558–563.
[14] A. Shilton, “Mercer’s theorem for quaternionic kernels,” Dept. Elect. Electron. Eng., Univ. Melbourne, Melbourne, Australia, Tech. Rep., 2007. [Online]. Available: http://people.eng.unimelb.edu.au/shiltona/publications/mercer.pdf
[15] A. Shilton and D. T. H. Lai, “Quaternionic and complex-valued support vector regression for equalization and function approximation,” in Proc. Int. Joint Conf. Neural Netw., Aug. 2007, pp. 920–925.
[16] B. C. Ujang, C. C. Took, and D. P. Mandic, “Quaternion-valued nonlinear adaptive filtering,” IEEE Trans. Neural Netw., vol. 22, no. 8, pp. 1193–1206, Aug. 2011.
[17] J. Navarro-Moreno, R. M. Fernandez-Alcala, and J. C. Ruiz-Molina, “A quaternion widely linear series expansion and its applications,” IEEE Signal Process. Lett., vol. 19, no. 12, pp. 868–871, Dec. 2012.
[18] J. A. Espinosa-Pulido, J. Navarro-Moreno, R. M. Fernandez-Alcala, and J. C. Ruiz-Molina. (2012). “Widely linear Markov signals,” EURASIP J. Adv. Signal Process. [Online]. Available: http://asp.eurasipjournals.com/content/pdf/1687-6180-2012-256.pdf, accessed Dec. 31, 2014.
[19] F. A. Tobar and D. P. Mandic, “Quaternion reproducing kernel Hilbert spaces: Existence and uniqueness conditions,” IEEE Trans. Inf. Theory, vol. 60, no. 9, pp. 5736–5749, Sep. 2014.
[20] C.-K. Ng, “On quaternionic functional analysis,” Math. Proc. Cambridge Philosoph. Soc., vol. 143, no. 2, pp. 391–406, Sep. 2007.
[21] S. Zhou, G. Zhang, R. Chung, J. Y. J. Liou, and W. J. Li, “Real-time hand-writing tracking and recognition by integrated micro motion and vision sensors platform,” in Proc. IEEE Int. Conf. Robot. Biomimetics (ROBIO), Dec. 2012, pp. 1747–1752.
[22] D. Kim, H. Dinh, W. MacKunis, N. Fitz-Coy, and W. E. Dixon, “A recurrent neural network (RNN)-based attitude control method for a VSCMG-actuated satellite,” in Proc. Amer. Control Conf. (ACC), Jul. 2012, pp. 944–949.
[23] P.-G. Jung, G. Lim, and K. Kong, “Human posture measurement in a three-dimensional space based on inertial sensors,” in Proc. IEEE Int. Conf. Control, Autom. Syst. (ICCAS), Oct. 2012, pp. 1013–1016.
[24] T. Nitta, “A quaternary version of the back-propagation algorithm,” in Proc. IEEE Int. Conf. Neural Netw. (ICNN), Nov./Dec. 1995, pp. 2753–2756.
[25] T. Nitta, “An extension of the back-propagation algorithm to three dimensions by vector product,” in Proc. 5th IEEE Int. Conf. Tools Artif. Intell. (TAI), Nov. 1993, pp. 460–461.
[26] T. Nitta, “A backpropagation algorithm for neural networks based on 3D vector product,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Oct. 1993, pp. 589–592.
[27] P. Arena, L. Fortuna, L. Occhipinti, and M. G. Xibilia, “Neural networks for quaternion-valued function approximation,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May/Jun. 1994, pp. 307–310.
[28] P. Arena, S. Baglio, L. Fortuna, and M. G. Xibilia, “Chaotic time series prediction via quaternionic multilayer perceptrons,” in Proc. IEEE Int. Conf. Syst., Man Cybern. (ICSMC), Oct. 1995, pp. 1790–1794.
[29] L. Fortuna, G. Muscato, and M. G. Xibilia, “An hypercomplex neural network platform for robot positioning,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 1996, pp. 609–612.
[30] P. Arena, L. Fortuna, G. Muscato, and M. G. Xibilia, Neural Networks in Multidimensional Domains. Berlin, Germany: Springer-Verlag, 1998.
[31] A. W. Knapp, Basic Algebra. New York, NY, USA: Springer-Verlag, 2006.
[32] T. Ogunfunmi and T. Paul, “An alternative kernel adaptive filtering algorithm for quaternion-valued data,” in Proc. Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf. (APSIPA), Dec. 2012, pp. 1–5.
[33] C. C. Took and D. P. Mandic, “Augmented second-order statistics of quaternion random signals,” Signal Process., vol. 91, no. 2, pp. 214–224, 2011.
[34] B. Picinbono and P. Chevalier, “Widely linear estimation with complex data,” IEEE Trans. Signal Process., vol. 43, no. 8, pp. 2030–2033, Aug. 1995.
[35] F. A. Tobar and D. P. Mandic, “The quaternion kernel least squares,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2013, pp. 6128–6132.
[36] M. Yukawa, “Multikernel adaptive filtering,” IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4672–4682, Sep. 2012.
[37] F. A. Tobar, S.-Y. Kung, and D. P. Mandic, “Multikernel least mean square algorithm,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 265–277, Feb. 2014.
[38] R. Pokharel, S. Seth, and J. C. Principe, “Mixture kernel least mean square,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Aug. 2013, pp. 1–7.
[39] I. Steinwart, D. Hush, and C. Scovel, “An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels,” IEEE Trans. Inf. Theory, vol. 52, no. 10, pp. 4635–4643, Oct. 2006.
[40] S. Sarkka. Notes on Quaternions. [Online]. Available: http://www.lce.hut.fi/~ssarkka/pub/quat.pdf, accessed Dec. 31, 2014.
[41] V. I. Paulsen. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. [Online]. Available: http://www.math.uh.edu/~vern/rkhs.pdf, accessed Dec. 31, 2014.
[42] S. Haykin, Adaptive Signal Processing, 4th ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2002.
[43] P. S. R. Diniz, Adaptive Filtering: Algorithms and Practical Implementation, 3rd ed. New York, NY, USA: Springer-Verlag, 2008.

Thomas K. Paul (M’06) received the B.A.Sc. degree from the University of Toronto, Toronto, ON, Canada, in 1996, and the M.S.E.E. and Ph.D.E.E. degrees from Santa Clara University, Santa Clara, CA, USA, in 2007 and 2012, respectively. He was involved in the maintenance and improvement of their V.92 software modem product, attaining the position of Project Lead–Product Enhancement with PCTEL Inc., Bloomingdale, IL, USA. He was with Ralink Technology, Inc., Cupertino, CA, USA, where he was involved in the development of a cost-reduced version of the 802.11b WLAN chipset. He was also with Intel Corporation, Santa Clara, where he was involved in WiMAX product development. He has been with the communications industry since 1996. He is currently with Accelera Mobile Broadband, Inc., Santa Clara, where he is involved in wireless network virtualization technology. He holds a U.S. patent regarding dynamic block processing in software modems. His current research interests include communications system design and digital signal processing.

Tokunbo Ogunfunmi (SM’00) received the B.S. (Hons.) degree from the University of Ife, Ife, Nigeria, and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, USA, all in electrical engineering. He is currently a Visiting Scholar with the Department of Electrical Engineering, University of Texas at Dallas, Dallas, TX, USA, and a Faculty Member with the Department of Electrical Engineering, Santa Clara University (SCU), Santa Clara, CA, USA. He served as an Associate Dean of Research and Faculty Development with the School of Engineering, SCU, from 2010 to 2014. His current research interests include adaptive/nonlinear signal processing, digital signal processing, multimedia (speech, video), and very large scale integration/digital signal processing/field-programmable gate array implementations. He has authored several publications in these areas. Dr. Ogunfunmi is a member of the Sigma Xi and the American Association for the Advancement of Science. He is also a member of several IEEE Technical Committees. He serves on the Editorial Boards of the IEEE SIGNAL PROCESSING LETTERS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II, and Circuits, Systems and Signal Processing. He served as an IEEE Distinguished Lecturer for the Circuits and Systems Society from 2011 to 2013.
