
A Globally Convergent MC Algorithm With an Adaptive Learning Rate

Dezhong Peng, Member, IEEE, Zhang Yi, Senior Member, IEEE, Yong Xiang, and Haixian Zhang, Member, IEEE

Abstract— This brief deals with the problem of minor component analysis (MCA). Artificial neural networks can be exploited to achieve the task of MCA. Recent research works show that convergence of neural network based MCA algorithms can be guaranteed if the learning rates are less than certain thresholds. However, the computation of these thresholds needs information about the eigenvalues of the autocorrelation matrix of the data set, which is unavailable in online extraction of the minor component from the input data stream. In this correspondence, we introduce an adaptive learning rate into the OJAn MCA algorithm, such that its convergence condition does not depend on any unobtainable information and can be easily satisfied in practical applications.

Index Terms— Deterministic discrete time system, eigenvalue, eigenvector, minor component analysis, neural networks.

Manuscript received December 30, 2010; revised November 10, 2011; accepted December 3, 2011. Date of publication January 3, 2012; date of current version February 8, 2012. This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2011CB302201, by the National Natural Science Foundation of China under Grant 60970013 and Grant 61172180, and by the Australian Research Council under Grant DP110102076.

D. Peng, Z. Yi, and H. Zhang are with the Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China (e-mail: [email protected]; [email protected]; [email protected]).

Y. Xiang is with the School of Information Technology, Deakin University, Melbourne, VIC 3125, Australia (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNNLS.2011.2179310

I. INTRODUCTION

The eigenvector associated with the smallest eigenvalue of the autocorrelation matrix of a data set is called the minor component (MC). As an effective method for data analysis, minor component analysis (MCA) aims at extracting the MC from the data set and has many important applications [1]–[3], such as curve and surface fitting, digital beamforming, frequency estimation, moving target indication, and clutter cancellation.

Recently, neural network based MCA algorithms have received considerable research interest [11]–[13]. Batch MCA methods, e.g., power algorithms, usually depend on the computation of the correlation matrix of the inputs and are more efficient and can achieve better performance than online neural network algorithms [20]. However, the online algorithms do not need to compute and store the correlation matrix and only involve operations on vectors and scalars, so they have a lower storage requirement than the batch algorithms. So far, numerous neural network learning algorithms have been developed to solve the MCA problem [6]–[14].

For these MCA algorithms, convergence is essential, and considerable research has been conducted to analyze their convergence properties. Since direct convergence analysis is rather difficult for the stochastic discrete time (SDT) systems describing the MCA algorithms, indirect methods for studying their dynamics are commonly used. In the existing literature, the deterministic continuous time (DCT) method and the deterministic discrete time (DDT) method are two widely used approaches for analyzing stochastic learning algorithms. The DCT method transforms the original SDT system into a corresponding DCT system based on stochastic approximation theory [4]; the convergence properties of MCA neural network algorithms are then obtained indirectly by studying the dynamics of the DCT system [5]–[9]. The DDT method applies the conditional expectation to transform the original SDT system describing the learning algorithm into a corresponding DDT system; by studying the dynamics of the resulting DDT system, one can indirectly investigate the convergence properties of the original SDT system. Both methods shed some light on the convergence characteristics of the original stochastic algorithms, and both show that the selection of the learning rate plays an essential role in the dynamical behavior of the algorithms. In order to guarantee convergence, the learning rate is usually required to satisfy certain constraints.

Specifically, the use of the DCT method requires the learning rate to approach zero. However, what trajectory the learning rate should follow as it approaches zero is still an open problem, and the current solutions usually depend on the particular application [7]–[9]. On the other hand, the DDT method proves that convergence of MCA algorithms can be guaranteed by learning rates smaller than certain thresholds, which are determined by the eigenvalues of the autocorrelation matrix of the data set [11]–[13]. For example, for the MCA algorithms in [11] and [13], the learning rate η must satisfy η < 1/λ_1 and η < λ_n/λ_1^2, respectively. Here, λ_1 and λ_n are the largest and the smallest eigenvalues of the autocorrelation matrix, respectively. However, information about these eigenvalues is usually unavailable in practical applications that need to extract the MC online from the input data stream. As a result, it is difficult to determine learning rates that satisfy the convergence conditions required by the DDT method. In order to remove the difficulty in the selection of the learning rate, this correspondence introduces an adaptive learning rate which guarantees global convergence of the MCA algorithm under a mild condition that does not depend on the eigenvalues of the autocorrelation matrix and can be easily satisfied in practical applications.

II. ALGORITHM DESCRIPTION

MCA can be achieved by exploiting a single linear neuron with the following input-output relation:

$$y(k) = w^T(k)\,x(k), \quad (k = 0, 1, 2, \ldots)$$

where y(k) denotes the neuron output, the input sequence {x(k) | x(k) ∈ R^n (k = 0, 1, 2, …)} is a zero-mean stochastic process, and w(k) ∈ R^n (k = 0, 1, 2, …) stands for the weight vector of the neuron. The aim of MCA neural network learning algorithms is to extract the MC from the input sequence {x(k)} by updating the weight vector w(k) adaptively. By changing the well-known Oja learning algorithm for principal component analysis [14] into a constrained anti-Hebbian rule, the following MCA learning algorithm (OJA MCA) is obtained [3]:

$$w(k+1) = w(k) - \eta\, y(k)\big[x(k) - y(k)\,w(k)\big]$$

where η > 0 is the learning rate. Its explicitly normalized version (OJAn MCA) [3] is

$$w(k+1) = w(k) - \eta\, y(k)\left[x(k) - \frac{y(k)\,w(k)}{w^T(k)\,w(k)}\right]. \tag{1}$$

In [15], Peng et al. analyzed the convergence of the OJAn MCA algorithm (1) via the DDT method and proved that a learning rate η ensuring convergence of (1) must satisfy

$$\eta \le \frac{1}{\lambda_1 - \lambda_n}$$

where λ_1 and λ_n are, respectively, the largest and the smallest eigenvalues of the autocorrelation matrix R_x = E[x(k)x^T(k)] of the inputs {x(k)}. As previously mentioned, these eigenvalues are usually unknown when the MC has to be extracted online from the input data stream, so this condition cannot be checked in advance. To remove this difficulty, we replace the constant learning rate η in (1) by the adaptive learning rate η(k) = ‖w(0)‖/‖w(k)‖, which yields the proposed MCA algorithm

$$w(k+1) = w(k) - \frac{\|w(0)\|}{\|w(k)\|}\, y(k)\left[x(k) - \frac{y(k)\,w(k)}{w^T(k)\,w(k)}\right]. \tag{2}$$

Since the update term in (2) is orthogonal to w(k) (indeed, w^T(k)·y(k)[x(k) − y(k)w(k)/(w^T(k)w(k))] = y^2(k) − y^2(k) = 0), the weight norm satisfies

$$\|w(k+1)\| \ge \|w(k)\|, \quad (k = 0, 1, 2, \ldots) \tag{3}$$

i.e., ‖w(k)‖ is monotonically increasing. Applying the conditional expectation E{w(k+1) | w(0), x(i), i < k} to (2) and using R_x = E[x(k)x^T(k)], we obtain the corresponding DDT system

$$w(k+1) = w(k) - \frac{\|w(0)\|}{\|w(k)\|}\left[R_x\, w(k) - \frac{w^T(k)\,R_x\, w(k)}{w^T(k)\,w(k)}\, w(k)\right]. \tag{4}$$

The remainder of this correspondence studies the dynamics of the DDT system (4).
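For concreteness, the stochastic update (2) can be written as a one-line function. The sketch below is illustrative only; the data dimension, covariance, and iteration count are assumptions rather than settings from this correspondence.

```python
import numpy as np

def ojan_mca_step(w, x, w0_norm):
    # One step of OJAn MCA with the adaptive rate eta(k) = ||w(0)|| / ||w(k)||.
    eta = w0_norm / np.linalg.norm(w)            # adaptive learning rate
    y = float(w @ x)                             # neuron output y(k) = w(k)^T x(k)
    return w - eta * y * (x - y * w / (w @ w))   # normalized anti-Hebbian update (2)

# Minimal usage on zero-mean data whose weakest direction is the third axis.
rng = np.random.default_rng(0)
cov = np.diag([3.0, 2.0, 0.1])                   # minor component is [0, 0, 1]
w = rng.standard_normal(3)                       # w(0); orthogonal to the MC with probability zero
w0_norm = np.linalg.norm(w)
for _ in range(20000):
    x = rng.multivariate_normal(np.zeros(3), cov)
    w = ojan_mca_step(w, x, w0_norm)
print(w / np.linalg.norm(w))                     # approximately [0, 0, +1] or [0, 0, -1]
```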
In the following analysis, suppose that the eigenvalues of R_x are ordered as

$$\lambda_1 > \lambda_2 > \cdots > \lambda_n \ge 0 \tag{5}$$

and that {v_1, v_2, …, v_n} is an orthonormal basis of R^n in which each v_i is a unit eigenvector of R_x associated with the eigenvalue λ_i. According to the relevant properties of the Rayleigh quotient, the symmetry of R_x ensures [13], [15]

$$\lambda_1 \ge \frac{w^T R_x w}{w^T w} \ge \lambda_n \tag{6}$$

for any nonzero vector w. Since {v_1, …, v_n} is an orthonormal basis of R^n, for each k ≥ 0 the weight vector w(k) can be represented as

$$w(k) = \sum_{i=1}^{n} z_i(k)\, v_i \tag{7}$$

where z_i(k) (i = 1, 2, …, n) are the coordinates of w(k) in this basis, i.e., z_i(k) = w^T(k)v_i. Also

$$w^T(k)\,R_x\,w(k) = \sum_{i=1}^{n} \lambda_i\, z_i^2(k). \tag{8}$$

Substituting (7) into (4), it follows that

$$\sum_{i=1}^{n} z_i(k+1)\,v_i = \sum_{i=1}^{n} z_i(k)\,v_i \left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_i - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right].$$

Since the vectors v_1, v_2, …, v_n are mutually linearly independent, we have

$$z_i(k+1) = z_i(k)\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_i - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right] \tag{9}$$

where i = 1, 2, …, n. From (6) and (9), it yields

$$|z_n(k+1)| = |z_n(k)|\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_n - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right] \ge |z_n(k)| \tag{10}$$

that is, |z_n(k)| is monotonically increasing for all k ≥ 0. This means that the absolute value |z_n(k)| of the projection of w(k) on the minor eigenvector v_n gradually becomes larger during the learning procedure of the algorithm.

III. CONVERGENCE ANALYSIS

Let φ be a constant given by

$$\phi = \|w(0)\| \cdot (\lambda_1 - \lambda_n).$$

Since ‖w(k)‖ is monotonically increasing according to (3), the evolution of ‖w(k)‖ falls into one and only one of the following two cases.

Case 1: ‖w(k)‖ < φ for all k ≥ 0.

Case 2: There exists a positive integer N such that ‖w(N)‖ ≥ φ.

For Case 1, clearly, ‖w(k)‖ must converge to a constant less than or equal to φ, which greatly simplifies the proof of convergence of the algorithm, as shown in Theorem 1. In contrast, the proof of convergence for Case 2 is more difficult and complicated. Since w(k) = Σ_{i=1}^{n} z_i(k)v_i, the convergence of w(k) is determined by the evolution of the z_i(k) (i = 1, 2, …, n). Based on this consideration, we prove the convergence for Case 2 by proving the convergence of the z_i(k) one by one. Since there are obvious differences between the convergence analysis for Case 1 and that for Case 2, we treat the two cases separately. Let us begin with an important theorem about the dynamical behavior of the algorithm in Case 1.

Theorem 1: If the condition w^T(0)v_n ≠ 0 is satisfied in Case 1, then

$$\lim_{k\to\infty} \frac{w(k)}{\|w(k)\|} = \pm v_n.$$

Proof: Since w^T(0)v_n ≠ 0, it holds from (7) that z_n(0) ≠ 0. It is obvious that ‖w(k)‖ is bounded in Case 1. From (7), clearly, |z_n(k)| must also be bounded. Thus one can draw the conclusion from (10) that |z_n(k)| converges to a constant. At the same time, it follows from (6) and (9) that z_n(k) > 0 for all k > 0 if z_n(0) > 0, and z_n(k) < 0 for all k > 0 if z_n(0) < 0. This means that z_n(k) also converges if |z_n(k)| converges. Therefore

$$\lim_{k\to\infty} z_n(k) = \delta$$

where δ is some constant. Then we obtain from (9)

$$\lim_{k\to\infty} \frac{z_n(k+1)}{z_n(k)} = \lim_{k\to\infty} \left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_n - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right] = 1.$$

This implies

$$\lim_{k\to\infty} \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_n - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right) = 0. \tag{11}$$

Since ‖w(k)‖ is bounded for all k ≥ 0 in Case 1, it holds from (11) that

$$\lim_{k\to\infty}\left(\lambda_n - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right) = 0$$

which leads to

$$\lim_{k\to\infty}\big[\lambda_n\, w^T(k)w(k) - w^T(k)R_x w(k)\big] = 0.$$

Substituting (7) and (8) into the above equation, we obtain

$$\lim_{k\to\infty} \sum_{i=1}^{n-1} (\lambda_n - \lambda_i)\, z_i^2(k) = 0.$$

This yields

$$\lim_{k\to\infty} z_i(k) = 0, \quad (i = 1, 2, \ldots, n-1). \tag{12}$$

Due to lim_{k→∞} z_n(k) = δ, it follows from (7) and (12) that

$$\lim_{k\to\infty} \frac{w(k)}{\|w(k)\|} = \lim_{k\to\infty} \frac{\sum_{i=1}^{n} z_i(k)\,v_i}{\sqrt{\sum_{i=1}^{n} z_i^2(k)}} = \pm v_n.$$

This completes the proof.

Theorem 1 shows that in Case 1, the weight vector w(k) in the DDT system (4) converges to the direction of the MC v_n under the condition w^T(0)v_n ≠ 0. Next, we shall prove that the same convergence result can also be obtained in Case 2 if the condition w^T(0)v_n ≠ 0 is satisfied. First, we present a useful lemma.

Lemma 1: If the condition w^T(0)v_n ≠ 0 holds in Case 2, then

$$1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_i - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right) \ge 0, \quad (i = 1, 2, \ldots, n) \tag{13}$$

and

$$\|w(k)\| \le \|w(N)\| + (k - N)\cdot(\lambda_1 - \lambda_n)\cdot\|w(0)\| \tag{14}$$

for all k ≥ N.

Proof: See Appendix A.

Fig. 1. Line fitting (dashed line: y = 8x; solid line: the fitted line).

Fig. 2. Convergence of DirectionCosine(k) in line fitting.

It can be seen from (7) that the convergence of the weight vector w(k) depends on the evolution of its projections z_i(k) = w^T(k)v_i on the eigenvectors v_i, i = 1, 2, …, n. If we can prove that z_i(k) → 0 as k → ∞ (i = 1, 2, …, n − 1), then the weight vector w(k) must converge to the direction of the MC v_n. Following this line of thought, we propose the following two theorems, which are built upon Lemma 1.

Theorem 2: If w^T(0)v_n ≠ 0 in Case 2, then lim_{k→∞} z_1(k) = 0.

Proof: See Appendix B.

Theorem 3: If w^T(0)v_n ≠ 0 in Case 2 and lim_{k→∞} z_i(k) = 0 (i = 1, 2, …, m − 1) with 2 ≤ m ≤ n − 1, then lim_{k→∞} z_m(k) = 0.

Proof: See Appendix C.

Combining Theorems 2 and 3, and using induction, we can derive the following corollary.

Corollary 1: If w^T(0)v_n ≠ 0 in Case 2, then

$$\lim_{k\to\infty} z_i(k) = 0$$

where i = 1, 2, …, n − 1.

From Corollary 1 and (7), the following theorem is proposed.

Theorem 4: If w^T(0)v_n ≠ 0 in Case 2, then

$$\lim_{k\to\infty} \frac{w(k)}{\|w(k)\|} = \pm v_n.$$

Proof: Since w^T(0)v_n ≠ 0, it holds from (7) that |z_n(0)| > 0. Considering that ‖w(k)‖ is monotonically increasing according to (3), we have 0 ≤ |z_i(k)|/‖w(k)‖ ≤ |z_i(k)|/‖w(0)‖ (i = 1, 2, …, n − 1). From Corollary 1, it follows that lim_{k→∞} |z_i(k)|/‖w(0)‖ = 0 (i = 1, 2, …, n − 1). Thus

$$\lim_{k\to\infty} \frac{|z_i(k)|}{\|w(k)\|} = 0, \quad (i = 1, 2, \ldots, n-1). \tag{15}$$

From (7) and (10), it holds that

$$1 \ge \frac{|z_n(k)|}{\|w(k)\|} = \frac{1}{\sqrt{\sum_{i=1}^{n-1} \frac{z_i^2(k)}{z_n^2(k)} + 1}} \ge \frac{1}{\sqrt{\sum_{i=1}^{n-1} \frac{z_i^2(k)}{z_n^2(0)} + 1}}. \tag{16}$$

From Corollary 1, clearly, lim_{k→∞} 1/\sqrt{\sum_{i=1}^{n-1} z_i^2(k)/z_n^2(0) + 1} = 1. Then it results from (16) that lim_{k→∞} |z_n(k)|/‖w(k)‖ = 1. At the same time, it follows from (6) and (9) that z_n(k) does not change its sign for any k ≥ 0. This yields

$$\lim_{k\to\infty} \frac{z_n(k)}{\|w(k)\|} = \pm 1. \tag{17}$$

Based on (7), (15), and (17), we obtain

$$\lim_{k\to\infty} \frac{w(k)}{\|w(k)\|} = \lim_{k\to\infty} \frac{z_n(k)}{\|w(k)\|}\, v_n + \lim_{k\to\infty} \sum_{i=1}^{n-1} \frac{z_i(k)}{\|w(k)\|}\, v_i = \pm v_n.$$

This completes the proof.

Theorems 1 and 4 show the convergence of the DDT system (4) in Case 1 and Case 2, respectively. The conclusions drawn in these two theorems can be summarized as follows.

Theorem 5: Under the condition w^T(0)v_n ≠ 0, the weight vector w(k) in (4) satisfies

$$\lim_{k\to\infty} \frac{w(k)}{\|w(k)\|} = \pm v_n.$$

Theorem 5 shows that the only condition required to guarantee the convergence of (4) is w^T(0)v_n ≠ 0, i.e., that the initial weight vector w(0) is not orthogonal to the MC v_n. In practical applications, any randomly selected w(0) is not orthogonal to v_n with probability one. Therefore, the condition w^T(0)v_n ≠ 0 can be easily satisfied.
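Theorem 5 is easy to check numerically for the deterministic system. The sketch below iterates the coordinate recursion (9) in vector form on a synthetic autocorrelation matrix; the matrix size, spectrum, and iteration count are arbitrary illustrative choices, not values from this correspondence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic symmetric autocorrelation matrix with distinct eigenvalues as in (5).
n = 5
lam = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal eigenvectors
Rx = Q @ np.diag(lam) @ Q.T
v_n = Q[:, -1]                                     # eigenvector of the smallest eigenvalue

w = rng.standard_normal(n)                         # w(0); w(0)^T v_n = 0 has probability zero
w0_norm = np.linalg.norm(w)

for k in range(2000):
    q = (w @ Rx @ w) / (w @ w)                     # Rayleigh quotient w^T Rx w / w^T w
    eta = w0_norm / np.linalg.norm(w)              # adaptive rate eta(k) = ||w(0)|| / ||w(k)||
    w = w - eta * (Rx @ w - q * w)                 # deterministic recursion matching (9)

print(abs(w @ v_n) / np.linalg.norm(w))            # direction cosine, approaches 1
```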

IV. SIMULATION EXAMPLES

In this section, we use two application examples to illustrate the effectiveness of the MCA algorithm (2).

Example 1: This example uses the algorithm (2) to solve the problem of line fitting. Consider the line y = 8x in the 2-D plane. By adding Gaussian noise to 500 sample points taken at equal intervals on this line, a set of data points D = {d(k) | d(k) = [x(k), y(k)]^T, k = 1, 2, …, 500} is obtained, as shown in Fig. 1. The target of line fitting is to find a parameterized line model (e.g., w_1 x + w_2 y = 0) that fits the above data set, such that the sum of the squared perpendicular distances from the data points to the line is minimized [3].
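A compact sketch of this line-fitting setup is given below. The sample range, noise level, and iteration count are assumptions (the exact settings behind Figs. 1 and 2 are not specified here), and the result is scored against the known normal direction [8, -1] of the line, anticipating the MC-based solution described below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of the line y = 8x (range and noise level are assumptions).
x = np.linspace(-1.0, 1.0, 500)
D = np.column_stack([x, 8.0 * x]) + rng.normal(scale=0.5, size=(500, 2))
D -= D.mean(axis=0)                              # remove the mean

v_star = np.array([8.0, -1.0])                   # normal direction of y = 8x
w = rng.standard_normal(2)                       # w(0)
w0_norm = np.linalg.norm(w)

for k in range(5000):
    d_k = D[rng.integers(len(D))]                # randomly selected input d(k)
    eta = w0_norm / np.linalg.norm(w)            # adaptive learning rate
    y = float(w @ d_k)
    w = w - eta * y * (d_k - y * w / (w @ w))    # update (2)

cosine = abs(w @ v_star) / (np.linalg.norm(w) * np.linalg.norm(v_star))
print("DirectionCosine:", cosine)                # approaches 1, cf. Fig. 2
```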

Fig. 3. Surface fitting. (a) Ellipsoid 2x^2 + 0.5y^2 + z^2 = 1. (b) Noise-disturbed data set.

This problem can be solved by searching for the MC v* = [v*_1, v*_2]^T of the data set {d(k)}, i.e., the minor eigenvector of the autocorrelation matrix R̂_x = E{d(k)d^T(k)}, and the desired fitting line can then be expressed as v*_1 x + v*_2 y = 0 [3], [15]. After removing the mean, the vectors from the data set D are randomly selected as inputs for the proposed algorithm (2). In order to measure the performance of the algorithm, an index DirectionCosine(k) is defined as

$$\mathrm{DirectionCosine}(k) = \frac{\left|w^T(k)\,v^*\right|}{\|w(k)\|\cdot\|v^*\|}.$$

Clearly, if DirectionCosine(k) converges to 1, we obtain the optimal fitting line. Fig. 2 shows the evolution of DirectionCosine(k) for the algorithm (2), from which we can see that the performance index approaches 1 rapidly. Although there are some fluctuations in DirectionCosine(k), which are caused by the noise, the amplitudes of these random fluctuations are very small. After 200 iterations, the weight vector is w* = [1.6178, −0.2153]^T. Fig. 1 shows the fitting performance of w*, from which we can see that the fitted line is very close to the original line, which means that the proposed MCA algorithm achieves a satisfactory result.

Example 2: We consider a more complicated application of MCA, i.e., the fitting of a nonlinear surface. A 3-D data set G = {(x_i, y_i, z_i)} is generated by sampling the ellipsoid

$$2x^2 + 0.5y^2 + z^2 = 1 \tag{18}$$

in such a way that the sampling intervals on the x-y plane are uniform [3], as shown in Fig. 3(a). A noise-disturbed data set G̃ is generated by adding Gaussian noise to the points of G; Fig. 3(b) illustrates the noise-disturbed data points in G̃. The surface fitting problem is to find a parameterized model a_1 x^2 + a_2 y^2 + a_3 z^2 = 1 that fits the noisy data set G̃ = {(x̃_i, ỹ_i, z̃_i)}. It has been proven in [3] that the solution of this problem lies in searching for the MC of the data set F = {(u_i, v_i, w_i) | u_i = x̃_i^2, v_i = ỹ_i^2, w_i = z̃_i^2, and (x̃_i, ỹ_i, z̃_i) ∈ G̃}.
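The construction of the data set F can be sketched as follows; the grid resolution and noise level are assumptions, and an offline eigendecomposition of the centralized autocorrelation matrix is used here only as a reference check of the MC direction (the online algorithm (2) can be run on the same centralized data).

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample the ellipsoid 2x^2 + 0.5y^2 + z^2 = 1 on a uniform (x, y) grid.
pts = []
for x in np.linspace(-0.7, 0.7, 40):
    for y in np.linspace(-1.4, 1.4, 40):
        z2 = 1.0 - 2.0 * x**2 - 0.5 * y**2
        if z2 >= 0.0:
            pts.append((x, y, np.sqrt(z2)))
            pts.append((x, y, -np.sqrt(z2)))
G = np.array(pts)
G_noisy = G + rng.normal(scale=0.02, size=G.shape)   # noise level is an assumption

# Build F with u = x~^2, v = y~^2, w = z~^2, then centralize it.
F = G_noisy**2
F -= F.mean(axis=0)

# The MC of the centralized set F should point along [a1, a2, a3] = [2, 0.5, 1].
Rf = F.T @ F / len(F)
eigvals, eigvecs = np.linalg.eigh(Rf)                # eigenvalues in ascending order
mc = eigvecs[:, 0]                                   # eigenvector of the smallest eigenvalue
print(mc / mc[2])                                    # approximately proportional to [2, 0.5, 1]
```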

Fig. 4. Convergence of DirectionCosine(k) in surface fitting for η(k) = ‖w(0)‖/‖w(k)‖, η = 0.01, and η(k) = 1/k.

After the centralization step, the data points from the set F are randomly selected as the inputs for the MCA algorithms. Fig. 4 compares the performance index DirectionCosine(k) of (2) with that of the original OJAn MCA algorithm, where the desired vector v* is [2, 0.5, 1]^T according to (18). For the original OJAn MCA algorithm, we select the learning rate η in two different ways. The first selection is a small constant learning rate η = 0.01. The second selection is a zero-approaching learning rate η(k) = 1/k, which satisfies the conditions needed by the DCT method. From the simulation results in Fig. 4, we can see that the proposed adaptive learning rate effectively speeds up convergence of the algorithm compared with the constant learning rate η = 0.01 and the zero-approaching learning rate η(k) = 1/k. Although there are some oscillations in the convergence results obtained with the proposed learning rate, their amplitudes are very limited and are tolerable in many practical applications [10].

V. CONCLUSION

Unlike existing MCA algorithms with constant learning rates, this correspondence introduces an adaptive learning rate into the OJAn MCA algorithm. By exploiting the DDT method, we prove that convergence of the algorithm can be ensured under a mild condition that is easy to satisfy in practice. In contrast, the convergence conditions of existing MCA learning algorithms require knowledge of the eigenvalues of the autocorrelation matrix of the data set, which is not available in many practical applications.

APPENDIX A
PROOF OF LEMMA 1

Since ‖w(k)‖ is monotonically increasing according to (3) and ‖w(N)‖ ≥ φ in Case 2, then

$$\|w(k)\| \ge \phi = \|w(0)\| \cdot (\lambda_1 - \lambda_n) \tag{19}$$

for all k ≥ N. From (6) and (19), it yields

$$1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_i - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right) \ge 1 - \frac{\|w(0)\|\cdot(\lambda_1 - \lambda_n)}{\|w(k)\|} \ge 0, \quad (i = 1, 2, \ldots, n) \tag{20}$$

for all k ≥ N. Based on (6), (9), and (20), we obtain

$$|z_i(k+1)| = |z_i(k)|\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_i - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right] \le |z_i(k)|\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\lambda_n + \frac{\|w(0)\|}{\|w(k)\|}\lambda_1\right]$$

for i = 1, 2, …, n and all k ≥ N. Then, it follows that

$$\|w(k+1)\|^2 = \sum_{i=1}^{n} z_i^2(k+1) \le \sum_{i=1}^{n} z_i^2(k)\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\lambda_n + \frac{\|w(0)\|}{\|w(k)\|}\lambda_1\right]^2 = \|w(k)\|^2\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\lambda_n + \frac{\|w(0)\|}{\|w(k)\|}\lambda_1\right]^2 = \big(\|w(k)\| - \|w(0)\|\lambda_n + \|w(0)\|\lambda_1\big)^2$$

i.e., ‖w(k+1)‖ ≤ ‖w(k)‖ − ‖w(0)‖λ_n + ‖w(0)‖λ_1 for all k ≥ N. Clearly, the above inequality results in

$$\|w(k)\| \le \|w(N)\| + (k - N)\cdot(\lambda_1 - \lambda_n)\cdot\|w(0)\|$$

for all k ≥ N. This completes the proof.

APPENDIX B
PROOF OF THEOREM 2

Using Lemma 1, it holds from (6) and (9) that for all k ≥ N

$$|z_1(k+1)| = |z_1(k)|\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_1 - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right] \le |z_1(k)|. \tag{21}$$

Since ‖w(k)‖ is monotonically increasing according to (3), we have that in Case 2

$$\|w(k)\| \ge \|w(N)\| \ge \phi = \|w(0)\|\cdot(\lambda_1 - \lambda_n) \tag{22}$$

for all k ≥ N. Based on Lemma 1 and the relations in (6)–(9) and (21)–(22), it follows that for all k ≥ N

$$1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_1 - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right) \le 1 - \frac{\|w(0)\|\cdot(\lambda_1 - \lambda_2)}{\|w(k)\|\, w^T(k)w(k)}\sum_{i=2}^{n} z_i^2(k)$$

$$\le 1 - \frac{\|w(0)\|\cdot(\lambda_1 - \lambda_2)}{\|w(k)\|}\left(1 - \frac{z_1^2(N)}{w^T(N)w(N)}\right) \le 1 - \frac{\|w(0)\|\cdot(\lambda_1 - \lambda_2)}{\|w(N)\| + (k-N)\cdot(\lambda_1 - \lambda_n)\cdot\|w(0)\|}\left(1 - \frac{z_1^2(N)}{w^T(N)w(N)}\right) = 1 - \delta_1(k) \tag{23}$$

where

$$\delta_1(k) = \frac{\|w(0)\|\cdot(\lambda_1 - \lambda_2)}{\|w(N)\| + (k - N)\cdot(\lambda_1 - \lambda_n)\cdot\|w(0)\|} \cdot \frac{w^T(N)w(N) - z_1^2(N)}{w^T(N)w(N)}.$$

From Lemma 1, (9), (21), and (23), one can obtain

$$|z_1(k+1)| = |z_1(k)|\left[1 - \frac{\|w(0)\|}{\|w(k)\|}\left(\lambda_1 - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right)\right] \le |z_1(k)|\,[1 - \delta_1(k)] \le |z_1(N)|\prod_{i=N}^{k}\big[1 - \delta_1(i)\big], \quad (k \ge N). \tag{24}$$

It is easily seen that the series $\sum_{k=N}^{\infty}\delta_1(k)$ is divergent. Thus we have $\prod_{k=N}^{\infty}[1 - \delta_1(k)] = 0$. From (24), it yields lim_{k→∞} z_1(k) = 0. This completes the proof.

APPENDIX C
PROOF OF THEOREM 3

Due to w^T(0)v_n ≠ 0, it holds from (7) that |z_n(0)| > 0. From (10), clearly, |z_n(N)| ≥ |z_n(0)| > 0. Since lim_{k→∞} z_i(k) = 0 (i = 1, 2, …, m − 1), there must exist a positive integer M such that

$$\sum_{i=1}^{m-1} (\lambda_i - \lambda_m)\, z_i^2(k) < (\lambda_m - \lambda_n)\, z_n^2(N) \tag{25}$$

and

$$\sum_{i=1}^{m-1} \frac{(\lambda_i - \lambda_{m+1})\, z_i^2(k)}{w^T(N)w(N)} < \frac{\lambda_m - \lambda_{m+1}}{2}\left(1 - \frac{z_m^2(N)}{w^T(N)w(N)}\right) \tag{26}$$

for all k ≥ M. Let η(k) = ‖w(0)‖/‖w(k)‖. According to Lemma 1, it follows from (5)–(8), (10), and (25) that

$$1 - \eta(k)\left(\lambda_m - \frac{w^T(k)R_x w(k)}{w^T(k)w(k)}\right) = 1 - \frac{\eta(k)}{w^T(k)w(k)}\left[\sum_{i=m}^{n}(\lambda_m - \lambda_i)\,z_i^2(k) - \sum_{i=1}^{m-1}(\lambda_i - \lambda_m)\,z_i^2(k)\right]$$

$$\le 1 - \frac{\eta(k)}{w^T(k)w(k)}\left[(\lambda_m - \lambda_n)\,z_n^2(N) - \sum_{i=1}^{m-1}(\lambda_i - \lambda_m)\,z_i^2(k)\right].$$
