
Projection-Based Fast Learning Fully Complex-Valued Relaxation Neural Network

Ramasamy Savitha, Member, IEEE, Sundaram Suresh, Senior Member, IEEE, and Narasimhan Sundararajan, Life Fellow, IEEE

Abstract— This paper presents a fully complex-valued relaxation network (FCRN) with its projection-based learning algorithm. The FCRN is a single hidden layer network with a Gaussian-like sech activation function in the hidden layer and an exponential activation function in the output layer. For a given number of hidden neurons, the input weights are assigned randomly and the output weights are estimated by minimizing a nonlinear logarithmic function (called the energy function) that explicitly contains both the magnitude and phase errors. A projection-based learning algorithm determines the optimal output weights corresponding to the minima of the energy function by converting the nonlinear programming problem into one of solving a set of simultaneous linear algebraic equations. The resultant FCRN approximates the desired output more accurately with a lower computational effort. The classification ability of FCRN is evaluated using a set of real-valued benchmark classification problems from the University of California, Irvine machine learning repository. Here, a circular transformation is used to transform the real-valued input features to the complex domain. Next, the FCRN is used to solve three practical problems: a quadrature amplitude modulation (QAM) channel equalization problem, an adaptive beamforming problem, and a mammogram classification problem. Performance results from this paper clearly indicate the superior classification/approximation performance of the FCRN.

Index Terms— Adaptive beamforming, classification, complex-valued neural network, energy function, quadrature amplitude modulation (QAM).

I. INTRODUCTION

MANY real applications in the areas of adaptive array signal processing [1], [2], image recognition/processing [3]–[7], telecommunications [8], etc., involve signals that are inherently complex-valued. The physical characteristics of these signals and their nonlinear transformations can be approximated efficiently if they are represented and operated on in the complex domain. Since neural networks are very good function approximators, the development of fully complex-valued neural networks that preserve and process these signals in the complex domain itself is gaining attention. Recently, it has been shown that the orthogonal decision boundaries of complex-valued neural networks help them to solve classification problems more efficiently than their real-valued counterparts [9]. Since then, several complex-valued classifiers have been developed to solve real-valued classification problems [10]–[14]. Further, it has also been shown that complex-valued neural networks have better generalization ability than real-valued neural networks [15].

The main obstacle in the development of a complex-valued neural network and its learning algorithm is the selection of an appropriate activation function that is both differentiable and bounded in the complex domain. By Liouville's theorem, an entire and bounded function is a constant in the complex domain. In the early stages, the conflict between the requirements of an activation function and Liouville's theorem was avoided by employing a split-type activation function in a multilayer perceptron framework, i.e., $f(z) = f(x) + i f(y)$; $z = x + iy$; $f \in \mathbb{R}$ [16]. Moreover, a complex-valued radial basis function (CRBF) network was developed using the Gaussian activation function ($\phi(z): \mathbb{C} \to \mathbb{R}$) [17]. However, such functions do not completely preserve the physical characteristics of the complex-valued signals. Later, the desired properties of a fully complex-valued activation function were relaxed to be analytic and bounded almost everywhere. A set of elementary transcendental functions that satisfy these relaxed desired properties were proposed as activation functions for the fully complex-valued multilayer perceptron (FC-MLP) network, and its gradient descent-based learning algorithm was derived by Kim et al. [18]. A method to improve the convergence speed of complex-valued back-propagation is given in [19]. Subsequently, a fully complex-valued radial basis function (FC-RBF) network and its gradient descent-based learning algorithm were developed in [20] and [21]. The FC-RBF network employs a complex-valued activation function of the hyperbolic secant type, which satisfies the essential properties of a fully complex-valued activation function [18] and whose magnitude response is similar to that of a real-valued Gaussian function. Convergence of these algorithms is largely influenced by the singularity of the activation function used and its derivatives [22]. Recently, Suresh et al. developed a fully complex-valued sequential learning algorithm, namely, the complex-valued self-regulatory resource allocation network (CSRAN), using the FC-RBF network as the basic building block [23] and the complex-valued extended Kalman filter [24] to update the network parameters. CSRAN uses both the magnitude and phase errors and the principles of self-regulation to solve function approximation/classification problems efficiently.

Manuscript received June 18, 2012; revised December 15, 2012; accepted December 16, 2012. Date of publication January 17, 2013; date of current version February 13, 2013. The work of R. Savitha and S. Suresh was supported by the Ministry of Education (MoE) of Singapore through TIER-I. R. Savitha and S. Suresh are with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore (e-mail: savi0001@ntu.edu.sg; [email protected]). N. Sundararajan, retired, was with the School of Electrical and Electronics Engineering, Nanyang Technological University, 639798 Singapore (e-mail: [email protected]). Digital Object Identifier 10.1109/TNNLS.2012.2235460



On the other hand, the learning algorithms in FC-MLP and FC-RBF are derived based on a real-valued mean square error function, which explicitly minimizes only the magnitude error. In addition, the mean squared error function is nonanalytic in the complex domain (not differentiable in an open set), as described by the Cauchy–Riemann equations. Therefore, pseudogradients or isomorphic $\mathbb{C}^1 \to \mathbb{R}^2$ transformations are generally used in the derivation of the learning algorithms. The use of pseudogradients affects the phase approximation capabilities of these algorithms. For better phase approximation, one needs to use an error function that simultaneously minimizes both the magnitude and phase errors [22]. If such functions are inseparable into their real and imaginary parts, then the pseudogradients are invalid and one needs to identify a mathematical tool to derive the complex gradients of such functions. Moreover, these algorithms are gradient descent-based batch learning algorithms and require a huge computational effort during training. Therefore, there is a need to develop a fully complex-valued neural network and a fast learning algorithm to overcome the aforementioned issues.

In this paper, we present a fully complex-valued single hidden layer neural network with a Gaussian-like sech activation function in the hidden layer and an exponential activation function in the output layer. For a given training dataset and a number of hidden neurons, the network parameters are estimated using a projection-based learning algorithm. The learning algorithm employs a nonlinear logarithmic error function, which explicitly contains both the magnitude and phase errors, as the energy function. The problem of finding the optimal weights is formulated as a nonlinear programming problem and solved with the help of Wirtinger calculus [25]. The Wirtinger calculus provides a framework for determining gradients of a nonanalytic nonlinear complex energy function. The projection-based learning algorithm converts the nonlinear programming problem into solving a system of linear algebraic equations and provides a solution for the optimal weights, corresponding to the minimum energy point of the energy function. This is similar to a relaxation process where the system always returns to a minimum energy state from a given initial condition [26]. Therefore, we refer to the proposed complex-valued network as a fully complex-valued relaxation network (FCRN). As the projection-based learning algorithm of FCRN estimates the optimal output weights as a solution to a system of linear equations, it requires minimal computational effort to approximate any desired function with higher accuracy, compared to other gradient-based algorithms, which are iterative.

We study the classification and function approximation abilities of FCRN in detail. First, the classification ability of FCRN is studied using a set of real-valued benchmark classification problems from the University of California, Irvine (UCI) machine learning repository. Next, the FCRN is used to solve three practical problems: a nonminimum-phase, nonlinear quadrature amplitude modulation (QAM) channel equalization problem [27], an adaptive beamforming problem [28], and a real-valued mammogram classification problem [29]. In all these problems, the performance of the FCRN is compared against the best results

Fig. 1. Architecture of the FCRN: inputs $z_1^t, \ldots, z_m^t$, hidden responses $h_k^t = \operatorname{sech}\left(\mathbf{v}_k^T(\mathbf{z}^t - \mathbf{u}_k)\right)$, and outputs $\hat{y}_l^t = \exp\left(\sum_{k=1}^{K} w_{lk} h_k^t\right)$.

reported in the literature. The quantitative and qualitative performance study results clearly indicate that the FCRN outperforms the existing algorithms in the literature.

This paper is organized as follows. In Section II, the FCRN and its projection-based learning algorithm are described in detail. In Section III, the classification performance of the FCRN is studied in comparison to other complex-valued/real-valued classifiers in the literature using a set of real-valued benchmark problems. Next, the FCRN is used to solve three practical problems and its results are compared with other results available in the literature in Section IV. Finally, Section V summarizes the conclusions from this paper.

II. FULLY COMPLEX-VALUED RELAXATION NETWORK

In this section, we first describe the architecture of the FCRN and then present the learning algorithm of the FCRN in detail.

A. FCRN Architecture

The architecture of the FCRN is shown in Fig. 1. It has a linear input layer with m neurons, a nonlinear hidden layer with K neurons, and a nonlinear output layer with n neurons. The neurons in the hidden layer employ a Gaussian-like hyperbolic secant function (sech). Hence, for an input $\mathbf{z}^t = [z_1^t, \ldots, z_m^t]^T \in \mathbb{C}^m$, the response of the kth hidden neuron ($h_k^t$) is

$h_k^t = \operatorname{sech}\left(\mathbf{v}_k^T\left(\mathbf{z}^t - \mathbf{u}_k\right)\right), \quad k = 1, \ldots, K$   (1)

where $\mathbf{v}_k = [v_{k1}, \ldots, v_{km}]^T \in \mathbb{C}^m$ and $\mathbf{u}_k = [u_{1k}, \ldots, u_{mk}]^T \in \mathbb{C}^m$ are the scaling factor and the center of the kth hidden neuron, $T$ represents the matrix transpose operator, and $\operatorname{sech}(z) = 2/(e^z + e^{-z})$.

The output of the network is given by $\hat{\mathbf{y}}^t = [\hat{y}_1^t, \ldots, \hat{y}_n^t]^T \in \mathbb{C}^n$. The neurons in the output layer employ an exponential activation function, and the response of the lth output neuron ($\hat{y}_l^t$) is

$\hat{y}_l^t = \exp\left(\sum_{k=1}^{K} w_{lk} h_k^t\right), \quad l = 1, \ldots, n$   (2)
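As a concrete illustration of (1) and (2), the following NumPy sketch computes the FCRN response for a batch of inputs. The dimensions, the random-parameter ranges, and all function names are illustrative assumptions made for this example, not details taken from the paper.

```python
import numpy as np

# Minimal sketch of the FCRN forward pass in (1)-(2); dimensions are illustrative.
rng = np.random.default_rng(0)
m, K, n, N = 4, 10, 2, 5

# Random hidden-layer parameters (scaling factors v_k, centers u_k); each entry
# is kept inside the unit disc, consistent with the assumption used later.
V = (rng.uniform(-1, 1, (K, m)) + 1j * rng.uniform(-1, 1, (K, m))) / np.sqrt(2 * m)
U = (rng.uniform(-1, 1, (K, m)) + 1j * rng.uniform(-1, 1, (K, m))) / np.sqrt(2 * m)
W = rng.standard_normal((n, K)) + 1j * rng.standard_normal((n, K))  # output weights

def sech(z):
    """Fully complex hyperbolic secant, sech(z) = 2 / (e^z + e^-z)."""
    return 2.0 / (np.exp(z) + np.exp(-z))

def fcrn_forward(Z, U, V, W):
    """Z: (N, m) complex inputs -> (N, n) complex predictions."""
    # hidden response h_k^t = sech(v_k^T (z^t - u_k))            ... eq. (1)
    H = sech(np.einsum('km,tkm->tk', V, Z[:, None, :] - U[None, :, :]))
    # output  y_hat_l^t = exp(sum_k w_lk h_k^t)                  ... eq. (2)
    return np.exp(H @ W.T)

Z = rng.uniform(-1, 1, (N, m)) + 1j * rng.uniform(-1, 1, (N, m))
print(fcrn_forward(Z, U, V, W).shape)  # (5, 2)
```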


where $w_{lk} \in \mathbb{C}$ is the weight connecting the kth hidden neuron and the lth output neuron.

Let $\{(\mathbf{z}^1, \mathbf{y}^1), \ldots, (\mathbf{z}^t, \mathbf{y}^t), \ldots, (\mathbf{z}^N, \mathbf{y}^N)\}$ be the training dataset, where $\mathbf{z}^t \in \mathbb{C}^m$ is the m-dimensional input and $\mathbf{y}^t \in \mathbb{C}^n$ is the n-dimensional target of the tth training sample. The main objective of the FCRN is to estimate the free parameters of the network (U, V, and W) such that the underlying transformation function ($F: \mathbf{z}^t \to \mathbf{y}^t$) is approximated as accurately as possible. Most of the learning algorithms reported in the literature use a mean square error deviation between the actual ($\mathbf{y}^t$) and predicted ($\hat{\mathbf{y}}^t$) outputs as the error function. However, accurate estimation of both the magnitude and phase of complex-valued signals is important in many real-world applications [30]. Hence, in this paper, we use a nonlinear logarithmic error function, with an explicit representation of both the magnitude and phase of the actual and predicted outputs, as the energy function.

B. Nonlinear Logarithmic Energy Function

The actual output ($\mathbf{y}^t$) of the tth training sample is represented in polar form as $y_l^t = r_l^t \exp(i\phi_l^t)$, where $r_l^t = \|y_l^t\|$ is its magnitude, $\phi_l^t = \arctan\left(\operatorname{Im}(y_l^t)/\operatorname{Re}(y_l^t)\right)$¹ is its phase, and $i$ is the complex operator. Similarly, the predicted output ($\hat{\mathbf{y}}^t$) of the tth training sample is represented in polar form as $\hat{y}_l^t = \hat{r}_l^t \exp(i\hat{\phi}_l^t)$, where $\hat{r}_l^t = \|\hat{y}_l^t\|$ is the estimated magnitude and $\hat{\phi}_l^t = \arctan\left(\operatorname{Im}(\hat{y}_l^t)/\operatorname{Re}(\hat{y}_l^t)\right)$ is the estimated phase. Then, the energy function $J_t$ should be a monotonically decreasing function that represents the magnitude and phase quantities explicitly, i.e., $J_t \to 0$ when $\hat{r}^t \to r^t$ and $\hat{\phi}^t \to \phi^t$. We propose an energy function that uses the logarithmic function for explicit representation of the magnitude and phase of complex signals and is of the form

$J_t = \sum_{l=1}^{n} \left(\ln \hat{y}_l^t - \ln y_l^t\right)\overline{\left(\ln \hat{y}_l^t - \ln y_l^t\right)}$   (3)

where $\overline{(\cdot)}$ is the conjugate of the complex signal $(\cdot)$ and $\ln(\cdot)$ represents the natural logarithmic function. Substituting the polar representations of the actual ($y_l^t$) and predicted ($\hat{y}_l^t$) outputs, the above equation reduces to

$J_t = \sum_{l=1}^{n} \left[\left(\ln \frac{\hat{r}_l^t}{r_l^t}\right)^2 + \left(\hat{\phi}_l^t - \phi_l^t\right)^2\right]$.   (4)

¹ $y_l^t = \operatorname{Re}(y_l^t) + i\operatorname{Im}(y_l^t)$, where $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ refer to the real and imaginary parts of a complex number, respectively.

It can be observed from (4) that the energy function represents the magnitude and phase quantities explicitly and that $J_t$ tends to 0 when $\hat{y}_l^t \to y_l^t$. It must also be noted that the energy function is second-order continuously differentiable with respect to the network parameters. For N training samples, the overall energy is defined as

$J(\mathbf{W}) = \frac{1}{2}\sum_{t=1}^{N} J_t = \frac{1}{2}\sum_{t=1}^{N}\sum_{l=1}^{n} \left[\left(\ln \frac{\hat{r}_l^t}{r_l^t}\right)^2 + \left(\hat{\phi}_l^t - \phi_l^t\right)^2\right]$.   (5)

In the next section, we derive the projection-based learning algorithm of the FCRN such that $J(\mathbf{W})$ is minimum.

C. Projection-Based Learning Algorithm for the FCRN

For a given initial condition (i.e., N training samples and K hidden neurons), the projection-based learning algorithm finds the network parameters for which the energy function is minimum, i.e., the relaxation point of the energy function. The hidden neuron centers ($\mathbf{u}_k$) and scaling factors ($\mathbf{v}_k$) of the FCRN are chosen as random constants, and the optimal output weights ($\mathbf{W}^* \in \mathbb{C}^{n \times K}$) are estimated such that the total energy reaches its minimum

$\mathbf{W}^* := \arg\min_{\mathbf{W} \in \mathbb{C}^{n \times K}} J(\mathbf{W})$.   (6)

The problem of estimating the optimal weights is converted to an unconstrained minimization problem ($J(\mathbf{W}): \mathbb{C}^{n \times K} \to \mathbb{R}$) involving minimization of $J(\mathbf{W})$. Let $\mathbf{W}^* \in \mathbb{C}^{n \times K}$; then $\mathbf{W}^*$ is the optimal output weight corresponding to the minimum of the energy function if $J(\mathbf{W}^*) \leq J(\mathbf{W})$ $\forall\, \mathbf{W} \in \mathbb{C}^{n \times K}$. The optimal $\mathbf{W}^*$ corresponding to the minimum energy point of the energy function is obtained by equating the first-order partial derivative of $J(\mathbf{W})$ with respect to the output weight to zero

$\frac{\partial J(\mathbf{W})}{\partial w_{lp}} = 0, \quad l = 1, \ldots, n, \quad p = 1, \ldots, K$.   (7)

For convenience, we rewrite the energy function as

$J(\mathbf{W}) = \frac{1}{2}\sum_{t=1}^{N}\sum_{l=1}^{n} \left(\ln \hat{y}_l^t - \ln y_l^t\right)\overline{\left(\ln \hat{y}_l^t - \ln y_l^t\right)}$.   (8)

By substituting the predicted output ($\hat{\mathbf{y}}^t$) from (2) into (8), the energy function reduces to

$J(\mathbf{W}) = \frac{1}{2}\sum_{t=1}^{N}\sum_{l=1}^{n} \left(\sum_{k=1}^{K} w_{lk} h_k^t - \ln y_l^t\right)\overline{\left(\sum_{k=1}^{K} w_{lk} h_k^t - \ln y_l^t\right)}$   (9)

where $h_k^t$ is the kth hidden neuron response for the tth sample. Since the energy function is a nonanalytic, nonlinear real-valued function of the complex-valued output weights and is inseparable into its real and imaginary parts, we use the Wirtinger calculus² [25] to obtain the first-order partial derivatives of the energy function with respect to the complex-valued output weight ($w_{lp}$).

² Let $f_R(z_c, \bar{z}_c)$ be a real-valued function of a complex-valued variable $z_c = x_r + i y_r$. Then, the following pair of derivatives is defined by the Wirtinger calculus:

R-derivative of $f_R(z_c, \bar{z}_c) = \dfrac{\partial f_R}{\partial z_c}\Big|_{\bar{z}_c = \text{constant}}$   (10)

$\overline{\text{R}}$-derivative of $f_R(z_c, \bar{z}_c) = \dfrac{\partial f_R}{\partial \bar{z}_c}\Big|_{z_c = \text{constant}}$.   (11)

It is proved in [31] that the R-derivative (10) and the $\overline{\text{R}}$-derivative (11) can be equivalently written as

$\dfrac{\partial f_R}{\partial z_c} = \dfrac{1}{2}\left(\dfrac{\partial f_R}{\partial x_r} - i\dfrac{\partial f_R}{\partial y_r}\right), \qquad \dfrac{\partial f_R}{\partial \bar{z}_c} = \dfrac{1}{2}\left(\dfrac{\partial f_R}{\partial x_r} + i\dfrac{\partial f_R}{\partial y_r}\right)$   (12)

where the partial derivatives with respect to $x_r$ and $y_r$ are true partial derivatives of the function $f_R(z_c) = f_R(x_r, y_r)$, which is differentiable with respect to $x_r$ and $y_r$.
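Before applying these derivatives to (9), a small numerical check may be helpful: for a single term of the energy function, the R-derivative obtained from (12) by real-valued finite differences should match the closed form used in the sequel (cf. (13)). The values below are arbitrary illustrative numbers, not taken from the paper.

```python
import numpy as np

# Numerical check of the R-derivative (12) for one term of the energy function,
# f(w) = (w h - ln y) * conj(w h - ln y), at an arbitrary point.
h, y, w = 0.3 - 0.4j, 1.0 + 1.0j, 0.2 + 0.7j   # illustrative values
f = lambda w_: abs(w_ * h - np.log(y)) ** 2

eps = 1e-6
df_dx = (f(w + eps) - f(w - eps)) / (2 * eps)            # partial wrt Re(w)
df_dy = (f(w + 1j * eps) - f(w - 1j * eps)) / (2 * eps)  # partial wrt Im(w)
numeric = 0.5 * (df_dx - 1j * df_dy)                     # R-derivative, eq. (12)

analytic = h * (np.conj(w) * np.conj(h) - np.log(np.conj(y)))  # closed form, cf. (13)
print(np.isclose(numeric, analytic))                     # True
```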


The Wirtinger calculus eliminates the stringent conditions of analyticity for complex differentiability imposed by the Cauchy–Riemann conditions. It defines the complex differentiability of almost all functions of interest, including the energy function, which maps $\mathbb{C} \to \mathbb{R}$. Although the derivatives defined by the Wirtinger calculus do not satisfy the Cauchy–Riemann equations, they obey all the rules of calculus (such as differentiation of products, the chain rule, etc.).

Using the Wirtinger calculus and the commutative property of the complex conjugate operator,³ the first-order partial derivative of the energy function with respect to $w_{lp}$ ($l = 1, 2, \ldots, n$ and $p = 1, 2, \ldots, K$) is given as

$\dfrac{\partial J(\mathbf{W})}{\partial w_{lp}} = \dfrac{1}{2}\sum_{t=1}^{N} h_p^t \left(\sum_{k=1}^{K} \bar{w}_{lk}\bar{h}_k^t - \ln \bar{y}_l^t\right)$.   (13)

Equating the first partial derivative to zero and rearranging (13), we get

$\sum_{k=1}^{K} \bar{w}_{lk} \sum_{t=1}^{N} h_p^t \bar{h}_k^t = \sum_{t=1}^{N} \ln \bar{y}_l^t \, h_p^t$.   (14)

Equation (14) can be written as

$\sum_{k=1}^{K} \bar{w}_{lk} A_{pk} = B_{lp}, \quad p = 1, \ldots, K, \quad l = 1, \ldots, n$   (15)

which can be represented in matrix form as

$\overline{\mathbf{W}}\,\overline{\mathbf{A}} = \mathbf{B}$   (16)

where the projection matrix $\mathbf{A} \in \mathbb{C}^{K \times K}$ is given by

$A_{pk} = \sum_{t=1}^{N} h_p^t \bar{h}_k^t, \quad p = 1, \ldots, K, \quad k = 1, \ldots, K$   (17)

and the output matrix $\mathbf{B} \in \mathbb{C}^{n \times K}$ is given by

$B_{lp} = \sum_{t=1}^{N} \ln \bar{y}_l^t \, h_p^t, \quad l = 1, \ldots, n, \quad p = 1, \ldots, K$.   (18)

Equation (15) gives the set of $n \times K$ linear equations with $n \times K$ unknown output weights $\mathbf{W}$. Note that the projection matrix is always a square matrix of order $K \times K$. We state the following propositions to find the closed-form solution for this set of linear equations.

Proposition 1: The responses of the neurons in the hidden layer are unique, i.e., $\forall\, \mathbf{z}^t$, when $k \neq p$, $h_k^t \neq h_p^t$; $k, p = 1, 2, \ldots, K$, $t = 1, \ldots, N$.

Proof: Let us assume the following: for a given $\mathbf{z}^t$,

$h_p^t = h_k^t, \quad k \neq p$.   (19)

This assumption is valid if and only if

$\operatorname{sech}\left(\mathbf{v}_p^T(\mathbf{z}^t - \mathbf{u}_p)\right) = \operatorname{sech}\left(\mathbf{v}_k^T(\mathbf{z}^t - \mathbf{u}_k)\right)$ or $\mathbf{v}_p^T(\mathbf{z}^t - \mathbf{u}_p) = \mathbf{v}_k^T(\mathbf{z}^t - \mathbf{u}_k)$.   (20)

The pair of parameters $u_{kj}$ and $u_{pj}$ (that are elements of the vectors $\mathbf{u}_k$ and $\mathbf{u}_p$, respectively), and $v_{kj}$ and $v_{pj}$ (that are elements of the vectors $\mathbf{v}_k$ and $\mathbf{v}_p$, respectively), are uncorrelated random constants chosen from a ball of radius 1

$\|u_{kj}\| < 1; \quad \|v_{kj}\| < 1, \quad k = 1, \ldots, K, \quad j = 1, \ldots, m$.   (21)

Therefore, $\mathbf{u}_k \neq \mathbf{u}_p$ and $\mathbf{v}_k \neq \mathbf{v}_p$ for any $\mathbf{z}^t$ (the tth random input vector of the training data with N samples). Hence, the responses of the kth and pth hidden neurons are not equal, i.e., $h_p^t \neq h_k^t$ $\forall\, \mathbf{z}^t$; $t = 1, \ldots, N$.

Proposition 2: The responses of the neurons in the hidden layer are nonzero, i.e., $\forall\, \mathbf{z}$, $h_k^t \neq 0$; $k = 1, 2, \ldots, K$.

Proof: Let us assume that the hidden layer response of the kth hidden neuron is 0

$h_k^t = 0$.   (22)

This is possible if and only if

$\mathbf{v}_k^T(\mathbf{z}^t - \mathbf{u}_k) = \infty$, i.e., $\mathbf{z} \to \infty$, or $\mathbf{u}_k \to \infty$, or $\mathbf{v}_k \to \infty$.   (23)

As stated in (21), the hidden layer parameters are random constants from within a circle of radius 1. The input variables $\mathbf{z}^t$ are also normalized in a circle of radius 1 such that

$|z_j| < 1, \quad j = 1, \ldots, m$.   (24)

Hence, the assumption in (22) is not valid and the response of the hidden neuron $h_k^t \neq 0$ $\forall\, \mathbf{z}^t$.

Using Propositions 1 and 2, we state the following theorem.

Theorem 1: The projection matrix $\mathbf{A}$ is a positive-definite Hermitian matrix, and hence, it is invertible.

Proof: From the definition of the projection matrix $\mathbf{A}$ given in (17)

$A_{pk} = \sum_{t=1}^{N} h_p^t \bar{h}_k^t, \quad p = 1, \ldots, K, \quad k = 1, \ldots, K$   (25)

it can be derived that the diagonal elements of $\mathbf{A}$ for the tth sample are

$A_{kk}^t = h_k^t \bar{h}_k^t, \quad k = 1, \ldots, K$.   (26)

From Proposition 2, the responses of the hidden neurons are nonzero. Hence, $A_{kk}^t \neq 0$. Therefore, (26) can be written as

$A_{kk}^t = |h_k^t|^2 > 0$.   (27)

Hence, the diagonal elements of the projection matrix are real and positive, i.e., $A_{kk}^t \in \mathbb{R} > 0$. This can be extended for the entire training sample set as

$A_{kk} = \sum_{t=1}^{N} A_{kk}^t \in \mathbb{R} > 0$.   (28)

The off-diagonal elements of the projection matrix ($\mathbf{A}$) for the tth sample are

$A_{kj}^t = h_k^t \bar{h}_j^t \quad \text{and} \quad A_{jk}^t = h_j^t \bar{h}_k^t$   (29)

$\Rightarrow A_{kj}^t = \overline{A_{jk}^t}$.   (30)

³ $\overline{z_a + z_b} = \bar{z}_a + \bar{z}_b$ and $\overline{\ln(z_a)} = \ln(\bar{z}_a)$.


Using the commutative property of the complex conjugate operator, (29) can be extended for all the N samples as

$A_{kj} = \sum_{t=1}^{N} A_{kj}^t = \sum_{t=1}^{N} \overline{A_{jk}^t} = \overline{A_{jk}}$.   (31)

From (28) and (31), it can be inferred that the projection matrix $\mathbf{A}$ is a Hermitian matrix. A Hermitian matrix is positive definite if and only if $\mathbf{q}^H \mathbf{A} \mathbf{q} > 0$ for any $\mathbf{q} \neq 0$. Let us consider a unit basis vector $\mathbf{q}_1 \in \mathbb{R}^{K \times 1}$ such that $\mathbf{q}_1 = [1 \; 0 \; \cdots \; 0]^T$. Therefore

$\mathbf{q}_1^H \mathbf{A} \mathbf{q}_1 = A_{11}$.   (32)

In (28), it was shown that $A_{kk} \in \mathbb{R} > 0$, $k = 1, \ldots, K$. Therefore

$A_{11} \in \mathbb{R} > 0 \;\Rightarrow\; \mathbf{q}_1^H \mathbf{A} \mathbf{q}_1 > 0$.   (33)

Similarly, for a unit basis vector $\mathbf{q}_k = [0 \; \cdots \; 1 \; \cdots \; 0]^T$, the product $\mathbf{q}_k^H \mathbf{A} \mathbf{q}_k$ is given by

$\mathbf{q}_k^H \mathbf{A} \mathbf{q}_k = A_{kk} > 0, \quad k = 1, \ldots, K$.   (34)

Let $\mathbf{p} \in \mathbb{C}^K$ be the linear transformed sum of K such unit basis vectors, i.e., $\mathbf{p} = \mathbf{q}_1 t_1 + \cdots + \mathbf{q}_k t_k + \cdots + \mathbf{q}_K t_K$, where $t_k \in \mathbb{C}$ is the transformation constant. Then

$\mathbf{p}^H \mathbf{A} \mathbf{p} = \left(\sum_{k=1}^{K} \mathbf{q}_k t_k\right)^H \mathbf{A} \left(\sum_{k=1}^{K} \mathbf{q}_k t_k\right) = \sum_{k=1}^{K} |t_k|^2 \mathbf{q}_k^H \mathbf{A} \mathbf{q}_k = \sum_{k=1}^{K} |t_k|^2 A_{kk}$.   (35)

As shown in (28), $A_{kk} \in \mathbb{R} > 0$. Also, note that $|t_k|^2 \in \mathbb{R} > 0$ is evident. Therefore

$|t_k|^2 A_{kk} \in \mathbb{R} > 0 \;\; \forall k \;\Rightarrow\; \sum_{k=1}^{K} |t_k|^2 A_{kk} \in \mathbb{R} > 0$.   (36)

Hence, $\mathbf{A}$ is positive definite and invertible.

The solution for $\mathbf{W}$ obtained from the set of equations given in (16) is a minimum if $\partial^2 J/\partial w_{lp}\partial \bar{w}_{lp} > 0$. The second derivative of the energy function ($J$) with respect to the output weights is given by

$\dfrac{\partial^2 J(\mathbf{W})}{\partial w_{lp}\,\partial \bar{w}_{lp}} = \dfrac{1}{2}\sum_{t=1}^{N} h_p^t \bar{h}_p^t = \dfrac{1}{2}\sum_{t=1}^{N} |h_p^t|^2 > 0$.   (37)

As the second derivative of $J(\mathbf{W})$ is positive (37), the following observations can be made: 1) the function $J$ is a convex function; and 2) the output weight $\mathbf{W}^*$ obtained as a solution of (16) is the weight corresponding to the minimum energy point of $J$.

Using Theorem 1, the solution for the system of equations in (16) can be determined as follows:

$\overline{\mathbf{W}^*} = \mathbf{B}\,\overline{\mathbf{A}}^{-1}$.   (38)

Applying the commutative law of multiplication of complex-valued conjugates

$\mathbf{W}^* = \overline{\mathbf{B}}\,\mathbf{A}^{-1}$.   (39)

Algorithm 1: Learning Algorithm of the FCRN

Given the training dataset $\{(\mathbf{z}^1, \mathbf{y}^1), \ldots, (\mathbf{z}^t, \mathbf{y}^t), \ldots, (\mathbf{z}^N, \mathbf{y}^N)\}$.

Step 1: Choose the number of hidden neurons K and the random hidden layer parameters $\mathbf{u}_k$, $\mathbf{v}_k$; $k = 1, \ldots, K$.

Step 2: Compute the hidden layer responses $h_k^t$ using

$h_k^t = \operatorname{sech}\left(\mathbf{v}_k^T(\mathbf{z}^t - \mathbf{u}_k)\right), \quad k = 1, \ldots, K$.   (40)

Step 3: Compute the projection matrix $\mathbf{A}$ using

$A_{pk} = \sum_{t=1}^{N} h_p^t \bar{h}_k^t, \quad p, k = 1, \ldots, K$.   (41)

Step 4: Compute the output matrix $\mathbf{B}$ using

$B_{lp} = \sum_{t=1}^{N} \ln \bar{y}_l^t \, h_p^t, \quad l = 1, \ldots, n, \quad p = 1, \ldots, K$.   (42)

Step 5: Compute the optimum output weights using

$\mathbf{W}^* = \overline{\mathbf{B}}\,\mathbf{A}^{-1}$.   (43)
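A compact NumPy sketch of Steps 3–5, following (41)–(43) as reconstructed above, is given below. The function name, the use of the principal-branch logarithm, and the explicit matrix inverse are simplifying assumptions made for this example.

```python
import numpy as np

def pbl_output_weights(H, Y):
    """Projection-based learning, Steps 3-5 of Algorithm 1 (sketch).

    H : (N, K) complex matrix of hidden responses, H[t, k] = h_k^t (Step 2).
    Y : (N, n) complex-coded targets (nonzero, so the principal-branch log
        is well defined).
    """
    A = H.T @ np.conj(H)             # A_pk = sum_t h_p^t conj(h_k^t)         (41)
    B = np.log(np.conj(Y)).T @ H     # B_lp = sum_t ln(conj(y_l^t)) h_p^t     (42)
    # W* = conj(B) A^{-1}; np.linalg.solve would be the numerically safer way
    # to apply the inverse in practice.
    return np.conj(B) @ np.linalg.inv(A)   # (n, K) output weights            (43)

# After training, predictions for new hidden responses H_new follow (2):
#   Y_hat = np.exp(H_new @ W.T)
```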

Thus, the estimated output weights ($\mathbf{W}^*$) correspond to the minimum energy point of the energy function. The learning algorithm of the FCRN is summarized in Algorithm 1. Thus, it can be seen that the FCRN estimates the output weights in a single step and is, hence, computationally efficient. The conditions for the global stability of the FCRN can be extended from the conditions derived for complex-valued recurrent neural networks in [32]. In the following sections, we study the classification ability of the FCRN using a set of benchmark classification problems.

III. FCRN AS A CLASSIFIER

Recent studies have shown that the orthogonal decision boundaries of a complex-valued neural network with a split-type activation function provide it with superior decision-making ability compared with its real-valued counterparts [9]. This has generated an increased interest among researchers in developing complex-valued classifiers to solve real-valued classification problems. The multilayer multivalued network (MLMVN) [10], the single-layer network with phase encoded inputs [11], [33], referred to here as the phase encoded complex-valued neural network (PE-CVNN), the bilinear branch-cut complex-valued extreme learning machine (BB-CELM), the phase-encoded complex-valued extreme learning machine (PE-CELM) [12], and the FC-RBF classifier [13] are some of the complex-valued classifiers available in the literature.

A multivalued neuron of the MLMVN [10] uses a multiple-valued threshold logic to map the complex-valued input to n discrete outputs using a piecewise continuous activation function, where n is the total number of classes. The transformation used in the MLMVN is such that it results in the same complex-valued input feature for both the real-valued features of values 0 and 1. Thus, the transformation does not perform a one-to-one mapping of the real-valued features


to the complex domain and might cause misclassification. In the PE-CVNN [11], the complex-valued features are obtained by phase encoding the real-valued features between $[0, \pi]$ using the transformation $z^t = \exp(i\pi x^t)$, where $x^t$ are the real-valued input features normalized in $[0, 1]$. An FC-RBF classifier [13] has also been developed using the phase-encoded transformation. The phase-encoded transformation maps the real-valued input features into the I and II quadrants of the complex plane, completely ignoring the other two quadrants. Therefore, the transformation does not completely exploit the advantages of the orthogonal decision boundaries. In addition, the MLMVN, PE-CVNN, and FC-RBF classifiers use gradient descent-based learning algorithms that require significant computational effort during training. Thus, these classifiers suffer from issues due to their transformations and the nature of their learning algorithms.

Recently, two complex-valued fast learning neural classifiers, the BB-CELM and the PE-CELM, were developed in [12]. These classifiers use a three-layered complex-valued neural network for classification. The input layer of a BB-CELM employs the bilinear transformation with a branch cut around $2\pi$, and the PE-CELM employs a phase-encoded transformation to transform the real-valued input features to the complex domain. These classifiers are fast learning classifiers and compute the output weights for randomly chosen hidden layer parameters. However, as the input transformations in these classifiers are similar to those of the MLMVN and PE-CVNN, they suffer from the issues due to these transformations.

In this paper, we use a circular transformation to map the real-valued input features to the complex domain. The circular transformation effectively performs a one-to-one mapping of the real-valued input features to all four quadrants of the complex domain. Hence, it efficiently exploits the orthogonal decision boundaries of the FCRN classifier.

A. Real-Valued Classification in the Complex Domain

Let $\{(\mathbf{x}^1, c^1), \ldots, (\mathbf{x}^t, c^t), \ldots, (\mathbf{x}^N, c^N)\}$, where $\mathbf{x}^t = [x_1^t \cdots x_j^t \cdots x_m^t]^T \in \mathbb{R}^m$, be a set of N observations belonging to n distinct classes, where $\mathbf{x}^t$ is the m-dimensional input feature vector of the tth observation and $c^t \in \{1, 2, \ldots, n\}$ is its class label. Solving the real-valued classification problem in the complex domain requires that the real-valued input features be mapped onto the complex space ($\mathbb{R}^m \to \mathbb{C}^m$) and that the class labels be coded in the complex domain. The complex-valued input features ($\mathbf{z}^t = [z_1^t \cdots z_j^t \cdots z_m^t]^T$) are obtained by using a circular transformation. The circular transformation for the jth feature of the tth sample is given by

$z_j^t = \sin\left(a x_j^t + i\, b x_j^t + \alpha_j\right), \quad j = 1, \ldots, m$   (44)

where $a, b \in [0, 1]$ are randomly chosen scaling constants and $\alpha_j \in [0, 2\pi]$ is used to shift the origin to enable effective usage of the four quadrants of the complex plane. As the circular transformation performs a one-to-one mapping of the real-valued input features to the complex domain and uses all four quadrants of the complex domain effectively, it overcomes the issues in the existing transformations.

The coded class label in the complex domain $\mathbf{y}^t = [y_1^t \cdots y_l^t \cdots y_n^t]^T \in \mathbb{C}^n$ is given by

$y_l^t = \begin{cases} 1 + 1i, & \text{if } c^t = l \\ -1 - 1i, & \text{otherwise} \end{cases} \qquad l = 1, \ldots, n$.   (45)

Therefore, the classification problem in the complex domain is defined as follows: given the training dataset $\{(\mathbf{z}^1, \mathbf{y}^1), \ldots, (\mathbf{z}^t, \mathbf{y}^t), \ldots, (\mathbf{z}^N, \mathbf{y}^N)\}$, estimate the decision function ($F: \mathbb{C}^m \to \mathbb{C}^n$) that enables an accurate prediction of the class labels of unseen samples. The predicted class label ($\hat{c}^t$) is obtained from the predicted output of the network ($\hat{\mathbf{y}}^t$) as

$\hat{c}^t = \arg\max_{l = 1, 2, \ldots, n} \operatorname{Re}\left(\hat{y}_l^t\right)$.   (46)
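The following sketch illustrates the circular transformation (44), the class coding (45), and the label decision (46) in NumPy. Zero-based class indices and the function names are assumptions made for the example.

```python
import numpy as np

def circular_transform(X, rng=np.random.default_rng(0)):
    """Map real features in [0, 1] to the complex domain via (44) (sketch)."""
    m = X.shape[1]
    a, b = rng.uniform(0, 1, 2)               # random scaling constants a, b in [0, 1]
    alpha = rng.uniform(0, 2 * np.pi, m)      # per-feature phase shifts alpha_j
    return np.sin(a * X + 1j * b * X + alpha) # z_j^t = sin(a x + i b x + alpha_j)

def code_labels(c, n):
    """Complex class coding of (45): 1+1i for the true class, -1-1i otherwise.
    c is assumed to hold zero-based class indices."""
    Y = np.full((c.size, n), -1.0 - 1.0j)
    Y[np.arange(c.size), c] = 1.0 + 1.0j
    return Y

def predict_labels(Y_hat):
    """Predicted class label of (46): index of the largest real part."""
    return np.argmax(Y_hat.real, axis=1)
```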

B. Datasets

Here, we investigate the classification performance of the FCRN on a set of benchmark classification problems from the UCI machine learning repository [35]. The details of the benchmark datasets used in this paper are summarized in Table I. The availability of only a small number of samples, the sampling bias, and the overlap between classes introduce additional complexity into the classification and may affect the performance of a classifier [36], [37]. Hence, we also consider the imbalance factor (I.F.) of the training dataset, defined as $\text{I.F.} = 1 - \frac{n}{N}\min_{l=1,\ldots,n} N_l$, where $N = \sum_{l=1}^{n} N_l$ and $N_l$ is the number of samples in class $l$. From the table, it can be seen that all the datasets considered, except the image segmentation dataset, are unbalanced, and their imbalance factors vary between 0.0968 and 0.68.

For all these problems, the performance of the FCRN is compared with the real-valued support vector machine (SVM), extreme learning machine (ELM), and self-adaptive resource allocation network (SRAN) [38] classifiers. As the FCRN is a complex-valued network, its performance is also compared against that of the best performing complex-valued classifiers, viz., BB-CELM [12], FC-RBF [13], and CSRAN [23]. The number of hidden neurons in ELM, BB-CELM, and FC-RBF is selected using a constructive–destructive procedure similar to that presented in [39]. The performance results for the SRAN, BB-CELM, and FC-RBF classifiers are reported from [38], [12], and [13], respectively. The results for SVM are reproduced from [40].

Performance Measures: The following performance measures are used to compare the classification performance of the classifiers.

Average Classification Efficiency ($\eta_a$): $\eta_a = \frac{1}{n}\sum_{l=1}^{n} \frac{q_{ll}}{N_l} \times 100\%$, where $q_{ll}$ is the number of correctly classified samples of class $l$ in the training/testing dataset.

Overall Classification Efficiency ($\eta_o$): $\eta_o = \frac{\sum_{l=1}^{n} q_{ll}}{N} \times 100\%$.

F1-Score: The F1-score, the harmonic mean of precision and recall, is also used to compare the classifiers on the binary classification problems. A classifier with an F1-score of 1 is considered the best classifier.

Statistical Test: The statistical significance of the classifiers is evaluated using a rank-based Friedman test [41] and a post hoc Bonferroni–Dunn test [42]. Classifiers are ranked based on their efficiencies, and classifiers with equal efficiencies are assigned their average rank.
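As an illustration of the efficiency measures defined above, a minimal sketch that computes $\eta_a$ and $\eta_o$ from true and predicted labels is shown below; zero-based labels and the function name are assumptions made for the example.

```python
import numpy as np

def efficiencies(y_true, y_pred, n):
    """Average (eta_a) and overall (eta_o) classification efficiencies (sketch)."""
    q = np.zeros((n, n), dtype=int)          # confusion matrix, q[true, predicted]
    for t, p in zip(y_true, y_pred):
        q[t, p] += 1
    per_class = np.diag(q) / q.sum(axis=1)   # q_ll / N_l for each class
    eta_a = 100.0 * per_class.mean()         # average classification efficiency
    eta_o = 100.0 * np.trace(q) / q.sum()    # overall classification efficiency
    return eta_a, eta_o
```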


TABLE I
DESCRIPTION OF THE VARIOUS REAL-VALUED CLASSIFICATION PROBLEMS USED IN THE PERFORMANCE STUDY

Type of Prob. | Data Set               | No. of Features | No. of Classes | No. of Samples (Train / Test) | I.F. (Train / Test)
Multi-Categ.  | Image Segmentation     | 19 | 7 | 210 / 2100 | 0 / 0
              | Vehicle Classification | 18 | 4 | 424 / 422  | 0.1 / 0.12
              | Glass Identification   | 9  | 6 | 109 / 105  | 0.68 / 0.73
              | Iris                   | 4  | 3 | 45 / 105   | 0 / 0
              | Wine                   | 13 | 3 | 60 / 118   | 0 / 0.29
              | Acoustic Emission [34] | 5  | 4 | 62 / 137   | 0.1 / 0.33
Binary        | Liver Disorder         | 6  | 2 | 200 / 145  | 0.17 / 0.145
              | PIMA Data              | 8  | 2 | 400 / 368  | 0.225 / 0.39
              | Breast Cancer          | 9  | 2 | 300 / 383  | 0.26 / 0.33
              | Heart Disorder         | 14 | 2 | 70 / 200   | 0.14 / 0.1
              | Ionosphere             | 34 | 2 | 100 / 251  | 0.28 / 0.29

C. Multicategory Classification Problems

First, we present the performance results of the FCRN on the multicategory benchmark classification problems. The number of neurons used in the classification, the computational effort in training, and the training/testing classification accuracies of the classifiers used in this paper on these problems are presented in Table II. From the table, it can be observed that the generalization performance of the FCRN is better than that of the other real- and complex-valued classifiers available in the literature. In comparison to the real-valued classifiers, the FCRN exhibits significant improvement in classification performance, especially on the unbalanced vehicle classification and glass identification datasets. While the overall classification performance improves by 1% on the well-balanced image segmentation dataset, the improvement is about 4% and 8% on the vehicle classification and glass identification datasets, respectively. It is also observable from the table that the FCRN classifier outperforms all the other complex-valued classifiers available in the literature. Also, it requires considerably lower computational effort for training compared with the other complex-valued batch learning classifiers. This is because the FCRN estimates the output weights in a single step using (43), and its order of complexity is $O(N \times K^2)$. On the other hand, the computational complexity of training an iterative batch learning FC-RBF network is $O(LNP)$, where L is the number of epochs and $P = K(2m + n)$ is the total number of parameters to be estimated. It must be noted that $P \gg K$ and $L \gg K$.

D. Binary Classification Problems

In this section, we study the performance of the FCRN classifier on the benchmark binary classification problems described in Table I. The overall and average classification accuracies and the F1-scores of all the classifiers on the binary classification problems used in this paper are tabulated in Table III. From the table, it can be observed that the FCRN classifier outperforms all the other real-valued/complex-valued

classifiers used in this paper. The orthogonal decision boundaries of the complex-valued classifiers help them to outperform the real-valued classifiers. Although CSRAN and FC-RBF are also complex-valued classifiers with the sech activation function in the hidden layer, the energy function used in the FCRN helps it to perform better than these classifiers. Moreover, the performance of CSRAN is not as good as that of FC-RBF and FCRN because CSRAN was originally developed to solve complex-valued function approximation problems. It has been shown in [43] that such an algorithm does not perform classification tasks efficiently.

Thus, it can be inferred from the results on the multicategory and binary benchmark classification problems presented in this section that the FCRN classifier outperforms the other complex-valued and real-valued classifiers available in the literature. This is because of the following reasons.

1) The orthogonal decision boundaries of the FCRN help it to classify more efficiently than real-valued classifiers.

2) The FCRN classifier uses a circular transformation to map the real-valued input features to the complex domain. The circular transformation maps the real-valued features to all four quadrants of the complex plane and exploits the advantages of the orthogonal decision boundaries more effectively than the transformations used in other complex-valued classifiers.

3) The FCRN uses a logarithmic energy function that is an explicit representation of both the magnitude and phase errors and, hence, approximates phase more efficiently than the complex-valued classifiers that use the mean-squared error function.

As the classification efficiencies alone are not a conclusive measure of classifier performance [44], we conduct the Friedman test [41] to establish the statistical significance of the FCRN classifier. Based on the overall efficiencies of the classifiers (ranked in Tables II and III), the average ranks of the CSRAN, SVM, ELM, SRAN, FC-RBF, BB-CELM, and FCRN classifiers are calculated as 5.0909, 5.0455, 3.8636, 4.0909, 5.3182, 3.0909, and 1.5, respectively.


TABLE II
PERFORMANCE COMPARISON FOR THE MULTICATEGORY CLASSIFICATION PROBLEMS. THE SUPERSCRIPTS ($r_l^p$) DENOTE THE RANK

Problem | Domain | Classifier | No. of Parameters | Training Time (s) | Training ηo | Training ηa | Testing ηo (rank) | Testing ηa (rank)

Image Segmentation
  Real valued    | SVM     | 127*  | 721   | 94.28 | 94.28  | 91.38 (5)   | 91.38 (5)
                 | ELM     | 1323  | 0.25  | 92.86 | 92.86  | 90.23 (6)   | 90.23 (6)
                 | SRAN    | 1296  | 22    | 97.62 | 97.62  | 93 (2)      | 93 (2)
  Complex valued | FC-RBF  | 3420  | 421   | 96.19 | 96.19  | 92.33 (4)   | 92.33 (4)
                 | CSRAN   | 4860  | 339   | 92    | 92     | 88 (7)      | 88 (7)
                 | BB-CELM | 5850  | 0.3   | 96.19 | 96.19  | 92.5 (3)    | 92.5 (3)
                 | FCRN    | 6300  | 0.4   | 96.67 | 96.67  | 93.3 (1)    | 93.3 (1)

Vehicle Classification
  Real valued    | SVM     | 340*  | 550   | 79.48 | 79.82  | 70.62 (7)   | 68.51 (7)
                 | ELM     | 3450  | 0.4   | 85.14 | 85.09  | 77.01 (4.5) | 77.59 (4)
                 | SRAN    | 2599  | 55    | 91.45 | 90.25  | 75.12 (6)   | 76.86 (6)
  Complex valued | FC-RBF  | 5600  | 678   | 88.67 | 88.88  | 77.01 (4.5) | 77.46 (5)
                 | CSRAN   | 6400  | 352   | 84.19 | 84.24  | 79.15 (3)   | 79.16 (3)
                 | BB-CELM | 8000  | 0.11  | 90.33 | 90.16  | 80.3 (2)    | 80.4 (2)
                 | FCRN    | 7200  | 0.8   | 89.15 | 88.95  | 82.62 (1)   | 82.46 (1)

Glass Identification
  Real valued    | SVM     | 183*  | 320   | 86.24 | 93.23  | 70.47 (7)   | 75.61 (7)
                 | ELM     | 1280  | 0.05  | 92.66 | 96.34  | 81.31 (6)   | 87.43 (2)
                 | SRAN    | 944   | 28    | 94.49 | 96.28  | 86.2 (3)    | 80.95 (4.5)
  Complex valued | FC-RBF  | 4320  | 452   | 95.6  | 95.54  | 83.76 (4)   | 80.95 (4.5)
                 | CSRAN   | 3840  | 452   | 87.16 | 93.67  | 83.5 (5)    | 78.09 (6)
                 | BB-CELM | 3360  | 0.08  | 93.58 | 82.92  | 88.16 (2)   | 81 (3)
                 | FCRN    | 3840  | 0.25  | 96.33 | 98.17  | 94.5 (1)    | 88.3 (1)

Iris
  Real valued    | SVM     | 13*   | 0.02  | 100   | 100    | 96.19 (6)   | 96.19 (6)
                 | ELM     | 80    | 0.01  | 100   | 100    | 96.19 (6)   | 96.19 (6)
                 | SRAN    | 64    | 24.3  | 100   | 100    | 96.19 (6)   | 96.19 (6)
  Complex valued | FC-RBF  | 220   | 352   | 95.24 | 95.24  | 100 (2.5)   | 100 (2.5)
                 | CSRAN   | 242   | 54.6  | 97.14 | 97.14  | 100 (2.5)   | 100 (2.5)
                 | BB-CELM | 88    | 0.03  | 95.24 | 95.24  | 100 (2.5)   | 100 (2.5)
                 | FCRN    | 132   | 0.01  | 100   | 100    | 100 (2.5)   | 100 (2.5)

Wine
  Real valued    | SVM     | 13*   | 0.1   | 100   | 100    | 97.46 (2.5) | 98.04 (2.5)
                 | ELM     | 153   | 0.25  | 100   | 100    | 97.46 (2.5) | 98.04 (2.5)
                 | SRAN    | 204   | 76    | 98.33 | 98.33  | 96.61 (4.5) | 97.19 (4)
  Complex valued | FC-RBF  | 580   | 187   | 100   | 100    | 90.9 (6)    | 88.98 (6)
                 | CSRAN   | 754   | 102.5 | 83.33 | 83.33  | 85.6 (7)    | 84.53 (7)
                 | BB-CELM | 928   | 0.03  | 100   | 100    | 100 (1)     | 100 (1)
                 | FCRN    | 580   | 0.05  | 100   | 100    | 96.61 (4.5) | 96.98 (5)

Acoustic Emission
  Real valued    | SVM     | 22*   | –     | 98.39 | 98.44  | 98.54 (4.5) | 97.95 (5.5)
                 | ELM     | 100   | 0.05  | 98.39 | 98.44  | 98.54 (4.5) | 97.95 (5.5)
                 | SRAN    | 100   | 22    | 96.77 | 96.875 | 99.27 (1.5) | 98.91 (1.5)
  Complex valued | FC-RBF  | 280   | 129.6 | 98.39 | 98.44  | 96.35 (7)   | 95.2 (7)
                 | CSRAN   | 224   | 32.6  | 98.39 | 98.44  | 98.54 (4.5) | 98.08 (3.5)
                 | BB-CELM | 392   | 0.03  | 96.77 | 96.88  | 98.54 (4.5) | 98.08 (3.5)
                 | FCRN    | 112   | 0.03  | 98.44 | 98.39  | 99.27 (1.5) | 98.91 (1.5)

* Number of support vectors.

Under the null hypothesis that all the classifiers are equivalent, so that their ranks must be equal, the Friedman statistic $\chi_F^2$ is 26.221. A better statistic, derived by Iman and Davenport [44], which follows the F-distribution, is 6.5916. The modified statistic follows the F-distribution with 6 and 60 degrees of freedom, and the critical value for rejecting the null hypothesis at a significance level of 0.05 is 2.25. Since the modified Friedman statistic ($F_F$) is greater than the critical value ($6.5916 \gg 2.25$), we can reject the null hypothesis, and it can be inferred that the classifiers used in this paper are not equivalent. Similar effects can be observed in the testing average efficiencies also.
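The quoted statistics can be checked directly from the reported average ranks using the standard Friedman and Iman–Davenport formulas [41], [44]; the short sketch below assumes those textbook formulas with k = 7 classifiers ranked over N = 11 datasets.

```python
import numpy as np

# Average ranks of CSRAN, SVM, ELM, SRAN, FC-RBF, BB-CELM, and FCRN as reported.
avg_ranks = np.array([5.0909, 5.0455, 3.8636, 4.0909, 5.3182, 3.0909, 1.5])
k, N = avg_ranks.size, 11

chi2_F = 12 * N / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4)
F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)   # Iman-Davenport correction
print(round(chi2_F, 3), round(F_F, 4))            # ~26.22 and ~6.59
```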


TABLE III
PERFORMANCE COMPARISON ON BENCHMARK BINARY CLASSIFICATION PROBLEMS. THE SUPERSCRIPTS ($r_l^p$) DENOTE THE RANK

Problem | Domain | Classifier | No. of Parameters | Training Time (s) | Training ηo | Training ηa | Testing ηo (rank) | Testing ηa (rank) | F1-score

Liver Disorders
  Real valued    | SVM     | 141*  | 0.0972 | 79.5  | 77.93 | 71.03 (5)   | 70.21 (5)   | 0.7494
                 | ELM     | 900   | 0.1685 | 88.5  | 87.72 | 72.41 (4)   | 71.41 (4)   | 0.7638
                 | SRAN    | 819   | 3.38   | 92.5  | 91.8  | 66.9 (7)    | 65.8 (7)    | 0.7166
  Complex valued | FC-RBF  | 560   | 133    | 77.25 | 75.86 | 74.46 (3)   | 75.41 (1)   | 0.7590
                 | CSRAN   | 560   | 38     | 71    | 72.41 | 67.59 (6)   | 69.24 (6)   | 0.6770
                 | BB-CELM | 420   | 0.06   | 77.5  | 75.69 | 75.17 (2)   | 74.64 (2)   | 0.7831
                 | FCRN    | 280   | 0.05   | 78    | 76.29 | 75.86 (1)   | 74.63 (3)   | 0.7960

PIMA Data
  Real valued    | SVM     | 221*  | 0.205  | 77    | 74.71 | 77.45 (6)   | 76.33 (4)   | 0.6635
                 | ELM     | 1100  | 0.2942 | 84.3  | 82.64 | 76.63 (7)   | 75.25 (5)   | 0.6502
                 | SRAN    | 2067  | 12.24  | 89    | 87.35 | 78.53 (3.5) | 74.9 (6)    | 0.6518
  Complex valued | FC-RBF  | 720   | 130.3  | 72    | 65.77 | 78.53 (3.5) | 68.49 (7)   | 0.5568
                 | CSRAN   | 720   | 64     | 75.5  | 73.13 | 77.99 (5)   | 76.73 (3)   | 0.6692
                 | BB-CELM | 360   | 0.15   | 76.5  | 74.77 | 78.8 (2)    | 77.31 (2)   | 0.6777
                 | FCRN    | 540   | 0.125  | 78.75 | 76.25 | 80.71 (1)   | 77.45 (1)   | 0.6870

Breast Cancer
  Real valued    | SVM     | 24*   | 0.1118 | 98.67 | 98.76 | 96.6 (4)    | 97.06 (4)   | 0.9492
                 | ELM     | 792   | 0.1442 | 100   | 100   | 96.35 (5)   | 96.5 (6)    | 0.9455
                 | SRAN    | 84    | 0.17   | 98    | 97.5  | 96.87 (3)   | 97.3 (3)    | 0.9531
  Complex valued | FC-RBF  | 400   | 158.3  | 99    | 99.02 | 97.12 (2)   | 97.45 (2)   | 0.9568
                 | CSRAN   | 800   | 60     | 98.67 | 98.57 | 96.08 (6)   | 96.86 (5)   | 0.9419
                 | BB-CELM | 600   | 0.06   | 94.33 | 94.39 | 92.69 (7)   | 91.78 (7)   | 0.8906
                 | FCRN    | 600   | 0.16   | 99    | 99.02 | 97.4 (1)    | 97.84 (1)   | 0.9608

Ionosphere
  Real valued    | SVM     | 43*   | 0.0218 | 97    | 95.83 | 91.24 (2)   | 88.51 (3)   | 0.8702
                 | ELM     | 1184  | 0.0396 | 94    | 92.27 | 89.64 (4.5) | 87.52 (5)   | 0.8496
                 | SRAN    | 777   | 3.7    | 99    | 98.6  | 90.84 (3)   | 91.88 (1)   | 0.8786
  Complex valued | FC-RBF  | 1400  | 186.2  | 98    | 97.83 | 89.64 (4.5) | 88.01 (4)   | 0.8521
                 | CSRAN   | 420   | 86     | 94    | 91.67 | 88.05 (7)   | 85.78 (6)   | 0.8260
                 | BB-CELM | 3500  | 0.17   | 95    | 94.27 | 88.85 (6)   | 85.67 (7)   | 0.8319
                 | FCRN    | 2100  | 0.0624 | 95    | 93.66 | 92.03 (1)   | 89.4 (2)    | 0.8825

Heart Disease
  Real valued    | SVM     | 42*   | 0.038  | 87.14 | 86.69 | 75.5 (7)    | 75.1 (7)    | 0.7238
                 | ELM     | 612   | 0.15   | 90    | 89.58 | 76.5 (5.5)  | 75.9 (6)    | 0.7296
                 | SRAN    | 476   | 0.534  | 91.43 | 90.83 | 78.5 (3)    | 77.525 (4)  | 0.7430
  Complex valued | FC-RBF  | 1200  | 45.6   | 94.29 | 94.17 | 78 (4)      | 77.78 (3)   | 0.7556
                 | CSRAN   | 1200  | 26     | 100   | 100   | 76.5 (5.5)  | 76.21 (5)   | 0.7376
                 | BB-CELM | 300   | 0.03   | 84.3  | 82.5  | 83 (2)      | 82.53 (2)   | 0.8061
                 | FCRN    | 600   | 0.0307 | 82.86 | 81.25 | 85.5 (1)    | 84.7 (1)    | 0.8297

* Number of support vectors.

As the null hypothesis is rejected, we next conduct the post hoc Bonferroni–Dunn test [42] to emphasize the significance of the FCRN classifier. This test assumes that the performances of two classifiers are significantly different if their corresponding average ranks differ by at least a critical difference. The critical difference for the classifiers and datasets used in this paper is calculated as 2.2052 for a significance level of 0.10. The differences in average rank between the FCRN classifier and the CSRAN, SVM, ELM, SRAN, FC-RBF, and BB-CELM classifiers are 3.5909, 3.5455, 2.3636, 2.5909, 3.8182, and 1.5909, respectively. Since the differences in average rank between the FCRN classifier and the other classifiers (except BB-CELM) are greater than the critical difference, we can infer that the FCRN classifier is better than those classifiers.

IV. EVALUATION OF THE FCRN ON PRACTICAL PROBLEMS

In this section, the FCRN is used to solve three practical problems: 1) a QAM channel equalization problem [27]; 2) an adaptive beamforming problem [28]; and 3) a mammogram classification problem [29]. The performance of the FCRN is compared with existing results in the literature for these problems.


A. QAM Channel Equalization Problem

QAM is the most commonly used modulation scheme in digital telecommunication systems. In QAM, two symbols are simultaneously transmitted through the same channel by modulating the amplitude of two carrier waves that are 90° out of phase with each other. The nonlinear characteristics of the channel introduce effects such as spectral spreading, intersymbol interference, and constellation warping during the transmission of the symbols. Hence, an equalizer at the receiver end of the communication channel is essential to nullify these effects without degrading the signal-to-noise ratio (SNR). The various channel models and the equalization schemes available in the literature have been surveyed in [8]. In this paper, we use the well-known, nonlinear complex-valued Cha and Kassam channel model [27]. The channel output ($o^t$) at time t defined in [27] is given by

$o^t = a + 0.1a^2 + 0.05a^3 + \nu^t$   (47)

where $a = (0.34 - 0.27i)s^t + (0.87 + 0.43i)s^{t-1} + (0.34 - 0.21i)s^{t-2}$, $\nu^t$ is a white Gaussian noise with zero mean and 0.01 variance, and $s^t$ is the transmitted symbol at time t. The different symbols transmitted in the channel are $\{1 + i, 1 - i, -1 + i, -1 - i\}$. Since the channel model is of order 3, the neural network equalizer input vector at time t is a 3-D complex-valued channel observation $\mathbf{z}^t = [o^t \; o^{t-1} \; o^{t-2}]^T$, and the target output ($y^t$) is the transmitted symbol $s^{t-\tau}$. The equalizer decision delay ($\tau$) is set as 1 (as used in [45]).

The FCRN is trained with a training set of 5000 samples at 20 dB SNR. The trained equalizer is then tested with a testing sample set of $10^5$ samples, generated randomly at every SNR between 4 and 20 dB. The performance of the FCRN is compared with other complex-valued learning algorithms available in the literature, viz., the FC-MLP [18], the CRBF [17], the C-ELM (with a Gaussian activation function in the hidden layer) [45], and the FC-RBF [20]. The root mean squared magnitude error ($J_{Me}$) and the average absolute phase error ($\phi_e$) are used as the performance measures for comparison

$J_{Me} = \sqrt{\dfrac{1}{N \times n}\sum_{t=1}^{N}\sum_{l=1}^{n}\left(y_l^t - \hat{y}_l^t\right)\overline{\left(y_l^t - \hat{y}_l^t\right)}}$   (48)

$\phi_e = \dfrac{1}{N \times n}\sum_{t=1}^{N}\sum_{k=1}^{n}\left|\arg\left(y_k^t\,\overline{\hat{y}_k^t}\right)\right| \times \dfrac{180}{\pi}$.   (49)

TABLE IV
PERFORMANCE COMPARISON ON THE QAM CHANNEL EQUALIZATION PROBLEM

Algo.   | K  | Training Time (s) | Training Errors (J_Me, φ_e) | Testing Errors (J_Me, φ_e)
FC-MLP  | 15 | 3862 | 0.2, 6.5  | 0.7, 31.1
CRBF    | 15 | 8107 | 0.6, 35.2 | 0.6, 39.9
C-ELM   | 15 | 0.4  | 0.6, 34.1 | 0.6, 35.1
FC-RBF  | 15 | 3840 | 0.4, 31.2 | 0.4, 14.7
CSRAN   | 7  | 45   | 0.2, 6.4  | 0.3, 7.1
FCRN    | 14 | 1    | 0.3, 22.1 | 0.3, 12.6

Fig. 2. Error probability curve: $\log_{10}(\text{SER})$ versus SNR (dB) for the CRBF, C-ELM, FC-RBF, FC-MLP, CSRAN, and FCRN equalizers, with the Bayesian boundary shown for reference.

The training time, the number of hidden neurons, and the training and testing magnitude and phase errors of the complex-valued learning algorithms are presented in Table IV. It is evident from the table that the FCRN outperforms the other algorithms, and its performance is comparable to that of CSRAN. It can also be observed that the FCRN requires only 14 neurons and very little computational effort to achieve this performance. The symbol error rate (SER) of the testing sample set is presented in Fig. 2. From the plot, it can be observed that the symbol error rate of the FCRN is lower than those of the CRBF, FC-MLP, FC-RBF, and C-ELM equalizers at every SNR of the testing sample set, and is comparable to that of the CSRAN equalizer.
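A minimal sketch of the channel model (47), useful for generating equalizer training data, is given below. The handling of the complex noise variance (0.01 split equally between the real and imaginary parts) and the function name are assumptions made for the example.

```python
import numpy as np

def cha_kassam_channel(num_symbols, rng=np.random.default_rng(0)):
    """Generate 4-QAM symbols and the nonlinear channel output of (47) (sketch)."""
    symbols = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
    s = rng.choice(symbols, size=num_symbols + 2)
    # linear part a, built from the current and two delayed symbols
    a = (0.34 - 0.27j) * s[2:] + (0.87 + 0.43j) * s[1:-1] + (0.34 - 0.21j) * s[:-2]
    noise = np.sqrt(0.01 / 2) * (rng.standard_normal(num_symbols)
                                 + 1j * rng.standard_normal(num_symbols))
    o = a + 0.1 * a ** 2 + 0.05 * a ** 3 + noise     # channel output, eq. (47)
    return s[2:], o                                   # symbols s^t aligned with o^t

# Equalizer input z^t = [o^t, o^{t-1}, o^{t-2}] with target s^{t-tau}, tau = 1.
```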

B. Adaptive Beamforming Problem

Adaptive beamforming is an antenna array signal processing problem, where the beams are directed toward the desired signal directions and the nulls are directed toward the interference directions [28]. A set of M single-transmit antennas operating at the same carrier frequency and L uniformly spaced receiver elements constitute a typical beamforming mechanism. The spacing between the receiver elements (d) is usually set at half the wavelength (λ) of the received signal. Let θ be the angle of incidence that an incoming signal makes with the receiver array broadside. Then, from basic trigonometric identities and the geometry of the sensor array, the signal received at the kth receiver antenna element ($x_k \in \mathbb{C}$) can be derived as

$x_k = \exp\left(\dfrac{i\,2\pi k d \sin\theta}{\lambda}\right)$.   (50)

Let $\eta_k$ be the noise at the kth receiver element. The total signal induced at all the m receiver elements at a given instant t is the input to the beamformer ($\mathbf{z}^t \in \mathbb{C}^m$) and is given by

$\mathbf{z}^t = [x_0 + \eta_0, \ldots, x_k + \eta_k, \ldots, x_m + \eta_m]^T$.   (51)
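The received array snapshot of (50) and (51) can be simulated as below; the per-component split of the noise power for the stated SNR and the function name are assumptions made for the example.

```python
import numpy as np

def array_snapshot(theta_deg, m=5, snr_db=50, rng=np.random.default_rng(0)):
    """Received signal of (50)-(51) for an m-element uniform linear array (sketch).

    Element spacing d = lambda / 2, so 2*pi*d/lambda = pi.
    """
    theta = np.deg2rad(theta_deg)
    k = np.arange(m)
    x = np.exp(1j * np.pi * k * np.sin(theta))            # steering vector, eq. (50)
    noise_var = 10 ** (-snr_db / 10)                      # additive noise at the given SNR
    eta = np.sqrt(noise_var / 2) * (rng.standard_normal(m)
                                    + 1j * rng.standard_normal(m))
    return x + eta                                        # z^t, eq. (51)

# Snapshots for the desired directions (+/-30 deg) and the interferers used later:
Z = np.stack([array_snapshot(t) for t in (-30, -15, 0, 15, 30)])
```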


Let $\mathbf{b} = [b_1 \ldots b_m]^T \in \mathbb{C}^m$ be the weight vector of the sensor array. The actual signal transmitted at an instant t ($y^t \in \mathbb{C}$) is

$y^t = \mathbf{b}^H \mathbf{z}^t$.   (52)

This transmitted signal ($y^t$) forms the target of the beamformer. Thus, the objective of an adaptive beamformer is to estimate the weights ($\mathbf{b}$ of (52)), given the transmitted signal ($y^t$) and the signal received by the antenna array ($\mathbf{z}^t$).

In this paper, we consider a 5-sensor uniform linear array. The desired signal directions are set as −30° and +30°, and the directions of interference considered are −15°, 0°, and +15° [28]. An additive Gaussian noise at 50 dB SNR corrupts the received signal at the array elements ($\mathbf{z}^t$). A training dataset of 250 randomly chosen samples, with 50 for each signal/interference angle, is used to train the various beamformers. A 5–5–1 network (a network with five input neurons, five hidden neurons, and one output neuron) is used for the CRBF [17], C-ELM [45], FC-RBF [20], and FCRN beamformers.

Table V presents the gains for the signals and the interference nulls for the different beamformers. From the table, it can be observed that the FCRN beamformer outperforms all the other complex-valued beamformers in interference nulling, and its interference nulling performance is better than that of the conventional optimal matrix method [46]. However, in beam pointing, although the FCRN beamformer performs better than the CRBF, C-ELM, and FC-RBF beamformers, it slightly lags behind the CSRAN and the matrix method-based beamformers.

TABLE V
PERFORMANCE COMPARISON ON THE ADAPTIVE BEAMFORMING PROBLEM

Direction of Arrival | Beam-1 (−30°) | Null-1 (−15°) | Null-2 (0°) | Null-3 (15°) | Beam-2 (30°)
FC-MLP | −13.97 | −57.2  | −57.34 | −57.33 | −13.98
CRBF   | −17.94 | −27.53 | −27    | −28.33 | −17.92
C-ELM  | −18.05 | −48.28 | −41.64 | −47.5  | −16.68
FC-RBF | −16.99 | −58.45 | −57.23 | −56.32 | −17.00
CSRAN  | −13.83 | −59.56 | −57.37 | −57.93 | −13.83
FCRN   | −14.27 | −59.68 | −60.4  | −59.68 | −14.1

C. Mammogram Classification Problem

Mammography is a noninvasive procedure that is preferred for the early diagnosis of breast cancer, to detect whether an identified abnormal mass is malignant or benign [47]. In this paper, the mammogram database available in [29] has been used. We use 9 input features to train the FCRN classifier with 97 samples and test it using the remaining 11 samples [29]. The performance results of the FCRN classifier, in comparison with other results available in the literature for this problem, are presented in Table VI. From the table, it is seen that the FCRN classifier outperforms the other classifiers by at least 4%.

TABLE VI
PERFORMANCE COMPARISON RESULTS FOR THE MAMMOGRAM PROBLEM

Classifier Domain | Classifier | Training Time (s) | K  | Testing ηo | Testing ηa
Real-valued       | ELM        | 0.032             | 30 | 91         | 91
Real-valued       | SVM [48]   | –                 | –  | –          | 95.44
Complex-valued    | FCRN       | 0.1872            | 40 | 100        | 100

V. CONCLUSION

This paper presented the FCRN and its projection-based fast learning algorithm. For a given set of hidden layer neurons and their associated random parameters, the projection-based learning algorithm determines the output weights corresponding to the minimum energy point of an energy function. The energy function is a nonlinear logarithmic function with an explicit representation of both the magnitude and phase of the target and predicted outputs. Using the Wirtinger calculus, the output weights of the FCRN were determined as the solution to a nonlinear programming problem. The projection-based learning algorithm computed the optimal output weights by converting the nonlinear programming problem into one of solving a set of linear algebraic equations. The output weights obtained were optimal and were obtained with minimal computational effort. The classification ability of the FCRN was investigated using a set of benchmark real-valued classification problems. The FCRN was then used to solve a QAM channel equalization problem, an adaptive beamforming problem, and a mammogram classification problem. The performance results showed the superior approximation/classification ability of the proposed FCRN. However, the random selection of the input weights might influence the performance of the FCRN, and these weights must therefore be chosen carefully for optimal performance.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their suggestions to improve the quality of this paper.

REFERENCES

[1] C. Shen, H. Lajos, and S. Tan, “Symmetric complex-valued RBF receiver for multiple-antenna-aided wireless systems,” IEEE Trans. Neural Netw., vol. 19, no. 9, pp. 1659–1665, Sep. 2008.
[2] R. Savitha, S. Vigneshwaran, S. Suresh, and N. Sundararajan, “Adaptive beamforming using complex-valued radial basis function neural networks,” in Proc. IEEE Region 10 Conf., Nov. 2009, pp. 1–6.
[3] I. Aizenberg, D. V. Paliy, J. M. Zurada, and J. T. Astola, “Blur identification by multilayer neural network based on multivalued neurons,” IEEE Trans. Neural Netw., vol. 19, no. 5, pp. 883–898, May 2008.
[4] M. K. Muezzinoglu, C. Guzelis, and J. M. Zurada, “A new design method for the complex-valued multistate Hopfield associative memory,” IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 891–899, Jul. 2003.
[5] G. Tanaka and K. Aihara, “Complex-valued multistate associative memory with nonlinear multilevel functions for gray-level image reconstruction,” IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1463–1473, Sep. 2009.


[6] N. Sinha, M. Saranathan, K. R. Ramakrishna, and S. Suresh, "Parallel magnetic resonance imaging using neural networks," in Proc. IEEE Int. Conf. Image Process., vol. 3, Mar. 2007, pp. 149–152.
[7] R. V. Babu, R. Savitha, and S. Suresh, "Human action recognition using a fast learning fully complex-valued classifier," Neurocomputing, vol. 89, pp. 202–212, Jul. 2012.
[8] K. Burse, R. N. Yadav, and S. C. Shrivastava, "Channel equalization using neural networks: A review," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 3, pp. 352–357, May 2010.
[9] T. Nitta, "Orthogonality of decision boundaries of complex-valued neural networks," Neural Comput., vol. 16, no. 1, pp. 73–97, 2004.
[10] I. Aizenberg and C. Moraga, "Multilayer feedforward neural network based on multi-valued neurons (MLMVN) and a backpropagation learning algorithm," Soft Comput., vol. 11, no. 2, pp. 169–183, 2007.
[11] M. F. Amin and K. Murase, "Single-layered complex-valued neural network for real-valued classification problems," Neurocomputing, vol. 72, nos. 4–6, pp. 945–955, 2009.
[12] R. Savitha, S. Suresh, N. Sundararajan, and H. J. Kim, "Fast learning complex-valued classifiers for real-valued classification problems," Int. J. Mach. Learn. Cybern., 2012, to be published.
[13] R. Savitha, S. Suresh, N. Sundararajan, and H. J. Kim, "A fully complex-valued radial basis function classifier for real-valued classification," Neurocomputing, vol. 78, no. 1, pp. 104–110, 2012.
[14] R. Savitha, S. Suresh, and N. Sundararajan, "Fast learning circular complex-valued extreme learning machine (CC-ELM) for real-valued classification problems," Inf. Sci., vol. 187, no. 1, pp. 277–290, 2012.
[15] A. Hirose and S. Yoshida, "Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 4, pp. 541–551, Apr. 2012.
[16] H. Leung and S. Haykin, "The complex backpropagation algorithm," IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2101–2104, Sep. 1991.
[17] S. Chen, S. McLaughlin, and B. Mulgrew, "Complex-valued radial basis function network, Part I: Network architecture and learning algorithms," EURASIP Signal Process. J., vol. 35, no. 1, pp. 19–31, 1994.
[18] T. Kim and T. Adali, "Fully complex multi-layer perceptron network for nonlinear signal processing," J. VLSI Signal Process. Syst. Signal, Image, Video Technol., vol. 32, nos. 1–2, pp. 29–43, 2002.
[19] B. K. Tripathi and P. K. Kalra, "On efficient learning machine with root power mean neuron in complex domain," IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 727–738, May 2011.
[20] R. Savitha, S. Suresh, and N. Sundararajan, "A fully complex-valued radial basis function network and its learning algorithm," Int. J. Neural Syst., vol. 19, no. 4, pp. 253–267, 2009.
[21] R. Savitha, S. Suresh, and N. Sundararajan, "Meta-cognitive learning in fully complex-valued radial basis function network," Neural Comput., vol. 24, no. 5, pp. 1297–1328, 2012.
[22] R. Savitha, S. Suresh, N. Sundararajan, and P. Saratchandran, "A new learning algorithm with logarithmic performance index for complex-valued neural networks," Neurocomputing, vol. 72, nos. 16–18, pp. 3771–3781, 2009.
[23] S. Suresh, R. Savitha, and N. Sundararajan, "A sequential learning algorithm for complex-valued resource allocation network-CSRAN," IEEE Trans. Neural Netw., vol. 22, no. 7, pp. 1061–1072, Jul. 2011.
[24] D. H. Dini and D. P. Mandic, "Class of widely linear complex Kalman filters," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 775–786, May 2012.
[25] W. Wirtinger, "Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen," Math. Ann., vol. 97, no. 1, pp. 357–375, 1927.
[26] S.-S. Yu and W.-H. Tsai, "Relaxation by the Hopfield neural network," Pattern Recognit., vol. 25, no. 2, pp. 197–209, 1992.
[27] I. Cha and S. A. Kassam, "Channel equalization using adaptive complex radial basis function networks," IEEE J. Sel. Areas Commun., vol. 13, no. 1, pp. 122–131, Jan. 1995.
[28] A. B. Suksmono and A. Hirose, "Intelligent beamforming by using a complex-valued neural network," J. Intell. Fuzzy Syst., vol. 15, nos. 3–4, pp. 139–147, 2004.
[29] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, P. Taylor, D. Betal, and J. Savage, "The mammographic image analysis society digital mammogram database," in Proc. Excerpta Medica Int. Congr. Ser., vol. 1069, 1994, pp. 375–378.
[30] D. V. Loss, M. C. F. D. Castro, P. R. G. Franco, and E. C. C. D. Castro, "Phase transmittance RBF neural networks," Electron. Lett., vol. 43, no. 16, pp. 882–884, 2007.
[31] R. Remmert, Theory of Complex Functions. New York: Springer-Verlag, 1991.
[32] J. Hu and J. Wang, "Global stability of complex-valued recurrent neural networks with time-delays," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 6, pp. 853–865, Jun. 2012.
[33] M. F. Amin, M. M. Islam, and K. Murase, "Ensemble of single-layered complex-valued neural networks for classification tasks," Neurocomputing, vol. 72, nos. 10–12, pp. 2227–2234, 2009.
[34] S. N. Omkar, S. Suresh, T. R. Raghavendra, and V. Mani, "Acoustic emission signal classification using fuzzy C-means clustering," in Proc. 9th Int. Conf. Neural Inf. Process., vol. 4, 2002, pp. 1827–1831.
[35] C. Blake and C. Merz. (1998). UCI Repository of Machine Learning Databases. Dept. Information & Computer Sciences, Univ. California, Irvine [Online]. Available: http://archive.ics.uci.edu/ml/
[36] S. Suresh, N. Sundararajan, and P. Saratchandran, "Risk-sensitive loss functions for sparse multi-category classification problems," Inf. Sci., vol. 178, no. 12, pp. 2621–2638, 2008.
[37] H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, Sep. 2009.
[38] S. Suresh, K. Dong, and H. J. Kim, "A sequential learning algorithm for self-adaptive resource allocation network classifier," Neurocomputing, vol. 73, nos. 16–18, pp. 3012–3019, 2010.
[39] S. Suresh, S. N. Omkar, V. Mani, and T. N. G. Prakash, "Lift coefficient prediction at high angle of attack using recurrent neural network," Aerosp. Sci. Technol., vol. 7, no. 8, pp. 595–602, 2003.
[40] G. S. Babu and S. Suresh, "Meta-cognitive neural network for classification problems in a sequential learning framework," Neurocomputing, vol. 81, pp. 86–96, Apr. 2012.
[41] M. Friedman, "A comparison of alternative tests of significance for the problem of m rankings," Ann. Math. Stat., vol. 11, no. 1, pp. 86–92, 1940.
[42] O. J. Dunn, "Multiple comparisons among means," J. Amer. Stat. Assoc., vol. 56, no. 293, pp. 52–64, 1961.
[43] S. Suresh, N. Sundararajan, and P. Saratchandran, "A sequential multicategory classifier using radial basis function networks," Neurocomputing, vol. 71, nos. 7–9, pp. 1345–1358, 2008.
[44] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," J. Mach. Learn. Res., vol. 7, pp. 1–30, Dec. 2006.
[45] M. B. Li, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "Fully complex extreme learning machine," Neurocomputing, vol. 68, nos. 1–4, pp. 306–314, 2005.
[46] R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays. Raleigh, NC: SciTech Publishing, 2004.
[47] P. Gamigami, Atlas of Mammography: New Early Signs in Breast Cancer. Cambridge, MA: Blackwell Science, 1996.
[48] T. S. Subashini, V. Ramalingam, and S. Palanivel, "Automated assessment of breast tissue density in digital mammograms," Comput. Vis. Image Understand., vol. 114, no. 1, pp. 33–43, 2010.

Ramasamy Savitha (M’11) received the B.E. degree in electrical and electronics engineering from Manonmaniam Sundaranar University, Thirunelveli, India, the M.E. degree in control and instrumentation engineering from Anna University, Chennai, India, and the Ph.D. degree from the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, in 2000, 2002, and 2011, respectively. She is currently a Post-Doctoral Research Fellow with the School of Computer Engineering, Nanyang Technological University. Her current research interests include neural networks, metacognition, and bioinformatics.


Sundaram Suresh (SM’08) received the B.E. degree in electrical and electronics engineering from Bharathiyar University, Coimbatore, India, in 1999, and the M.E. and Ph.D. degrees in aerospace engineering from the Indian Institute of Science, Bangalore, India, in 2001 and 2005, respectively. He was a Post-Doctoral Researcher with the School of Electrical Engineering, Nanyang Technological University, Singapore, from 2005 to 2007. From 2007 to 2008, he was with the National Institute for Research in Computer Science and Control, Sophia Antipolis, Nice, France, as a Research Fellow of the European Research Consortium for Informatics and Mathematics. He was with Korea University, Seoul, Korea, as a Visiting Faculty Member in industrial engineering. In 2009, he was with the Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India, as an Assistant Professor. Since 2010, he has been an Assistant Professor with the School of Computer Engineering, Nanyang Technological University. His current research interests include flight control, unmanned aerial vehicle design, machine learning, optimization, and computer vision.


Narasimhan Sundararajan (LF’69) received the B.E. degree in electrical engineering (First Class Hons.) from the University of Madras, Chennai, India, the M.Tech. degree from the Indian Institute of Technology Madras, Chennai, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Urbana, in 1966, 1968, and 1971, respectively. He joined the Indian Space Research Organization, Bangalore, India, in 1971, where he was involved in research in various capacities. Since 1991, he has been a Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He was a National Research Council Research Associate with the NASA Ames Research Center, Moffett Field, CA, in 1974, and a Senior Research Associate with the NASA Langley Research Center, Hampton, VA, from 1981 to 1986. He has authored or co-authored more than 250 papers in journals and conferences, and has authored six books on computational intelligence and neural networks. His current research interests include aerospace control and neural networks. Dr. Sundararajan is an Associate Fellow of the American Institute of Aeronautics and Astronautics and a Fellow of the Institution of Engineers, Singapore.
