
Letter to the editor

Reply to the Comments on the ‘‘No-Prop’’ algorithm

Dear Dr. Lim:

My co-authors and I are grateful to you for pointing out the fine work of Dr. G.-B. Huang and his colleagues (Lim, 2013). Before we submitted our paper (Widrow et al., 2013) on the No-Prop algorithm for publication, we used Google to see whether we had overlooked relevant prior work. Perhaps we used the wrong key words, but we came up empty. Surely, we thought, someone must have done something like this before, yet until now we had not found it. The reviewers of our paper, who were very thorough, evidently did not know of Dr. Huang’s work.

What our work and that of Dr. Huang have in common is the independent discovery that it is not necessary to train the hidden layers of a multi-layer neural network; training the output layer is sufficient for many applications. The difference between the Extreme Learning Machine (ELM) and the No-Prop algorithm lies in the training method for the output layer. The output-layer neurons are trained independently, and No-Prop uses the LMS gradient algorithm to do this. The objective is to minimize mean square error, i.e. to find the Wiener solution for each output-layer neuron. The ELM algorithm does this by essentially inverting the covariance matrix of the neuron inputs and multiplying by the vector of cross-correlations between the neuron’s inputs and its desired response. This is a direct method for finding the Wiener solution. No-Prop uses a gradient method and ELM uses matrix inversion.
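As a minimal sketch of the comparison, assuming synthetic Gaussian inputs and an illustrative step size mu (the variable names and parameters are assumptions for illustration, not taken from either paper), the direct matrix-inversion route and the LMS gradient route to the Wiener solution for a single output-layer neuron can be written as follows:

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_inputs = 5000, 8

X = rng.standard_normal((n_samples, n_inputs))          # neuron inputs (hidden-layer outputs)
w_true = rng.standard_normal(n_inputs)                  # unknown combiner being learned
d = X @ w_true + 0.1 * rng.standard_normal(n_samples)   # desired responses

# Direct (ELM-style) route: invert the input covariance matrix R and
# multiply by the cross-correlation vector p between inputs and desired response.
R = X.T @ X / n_samples
p = X.T @ d / n_samples
w_direct = np.linalg.solve(R, p)                        # Wiener solution

# Gradient (No-Prop) route: stream the samples and apply the LMS update.
mu = 0.01                                               # assumed step size
w_lms = np.zeros(n_inputs)
for x, dk in zip(X, d):
    e = dk - w_lms @ x                                  # instantaneous error
    w_lms += 2 * mu * e * x                             # LMS weight update

print(np.max(np.abs(w_direct - w_lms)))                 # small: both approach the same solution

Both routes estimate the same Wiener weight vector; they differ only in how they get there.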


Each output neuron is a linear combiner. Training a linear combiner was discussed by Widrow and Hoff in their 1960 paper, Adaptive Switching Circuits. That paper introduced the LMS algorithm, which today, in one form or another, is used in every MODEM in the world for echo cancelling and channel equalization. It is the world’s most widely used learning algorithm, and we chose it to train the output-layer neurons.

We like ELM and we like matrix inversion, but LMS, a gradient algorithm, is very useful and has the following advantages as the training method for the output layer:

– With streaming inputs, LMS is natural.
– With a very large number of weights in the output-layer neurons, matrix inversion may be difficult to perform. LMS has no problem with large numbers of weights.
– With a large eigenvalue spread of the input covariance matrix (a large condition number), matrix inversion may be unstable and/or highly sensitive to noise. LMS is stable under these conditions, though it converges slowly; the sketch below illustrates this.
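To illustrate the last two points, here is a minimal sketch assuming a synthetic input stream whose covariance matrix has a condition number of about 10^6 (the scales, step size, and sample count are illustrative assumptions): LMS adapts stably, one sample at a time, with no matrix to invert, converging quickly along the well-excited directions and slowly along the weak ones.

import numpy as np

rng = np.random.default_rng(1)
n_inputs = 4
scales = np.array([1.0, 1e-1, 1e-2, 1e-3])   # covariance eigenvalues span a factor of 1e6
w_true = np.ones(n_inputs)

mu = 0.05                                    # assumed step size, well below 1/lambda_max
w = np.zeros(n_inputs)
for k in range(200_000):                     # samples arrive one at a time (streaming)
    x = scales * rng.standard_normal(n_inputs)
    d = w_true @ x + 0.01 * rng.standard_normal()
    e = d - w @ x                            # instantaneous error
    w += 2 * mu * e * x                      # same LMS update; no inversion required

print(w)  # strong directions are near 1.0; the weakest direction is still far away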

Bernard Widrow
ISL, Department of Electrical Engineering, Stanford University, CA, United States

References

Lim, M.-H. (2013). Comments on the ‘‘No-Prop’’ algorithm. Neural Networks, 48, 59–60.
Widrow, B., Greenblatt, A., Kim, Y., & Park, D. (2013). The No-Prop algorithm: A new learning algorithm for multilayer neural networks. Neural Networks, 37, 182–188.
