
Delay-Based Reservoir Computing: Noise Effects in a Combined Analog and Digital Implementation

Miguel C. Soriano, Member, IEEE, Silvia Ortín, Lars Keuninckx, Lennert Appeltant, Jan Danckaert, Member, IEEE, Luis Pesquera, and Guy Van der Sande

Abstract—Reservoir computing is a paradigm in machine learning whose processing capabilities rely on the dynamical behavior of recurrent neural networks. We present a mixed analog and digital implementation of this concept with a nonlinear analog electronic circuit as the main computational unit. In our approach, the reservoir network is replaced by a single nonlinear element with delay via time-multiplexing. We analyze the influence of noise on the performance of the system for two benchmark tasks: 1) a classification problem and 2) a chaotic time-series prediction task. Special attention is given to the role of quantization noise, which is studied by varying the resolution of the conversion interface between the analog and digital worlds.

Index Terms—Delay systems, dynamical systems, electronic circuits, memory capacity, pattern recognition, reservoir computing (RC), time-series prediction.

Manuscript received November 5, 2013; revised February 5, 2014; accepted March 8, 2014. This work was supported in part by MINECO, Spain, in part by the Comunitat Autònoma de les Illes Balears, in part by FEDER, in part by the European Commission under Project TEC2012-38864 and Project TEC2012-36335, in part by Grups Competitius, in part by the EC FP7 Project PHOCUS under Grant 240763, in part by the Interuniversity Attraction Pole Photonics@be, Belgian Science Policy Office, and in part by the Flemish Research Foundation. M. C. Soriano is with the Instituto de Física Interdisciplinar y Sistemas Complejos, IFISC, Palma de Mallorca E-07122, Spain (e-mail: [email protected]). S. Ortín and L. Pesquera are with the Instituto de Física de Cantabria, IFCA, Santander E-39005, Spain (e-mail: [email protected]; [email protected]). L. Keuninckx, L. Appeltant, J. Danckaert, and G. Van der Sande are with the Applied Physics Research Group, Vrije Universiteit Brussel, Brussels 1050, Belgium (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2014.2311855

I. INTRODUCTION

Reservoir computing (RC) is a machine-learning paradigm that allows for state-of-the-art performance in processing empirical data [1], [2]. The main inspiration underlying RC is the insight that neural systems can process information by generating patterns of transient activity, which are excited by input sensory signals [3]. One of the major advantages of RC is its simple linear training strategy [4]. In RC, the connections in the inner network, i.e., the reservoir, are kept constant at all times, whereas the output connections, i.e., the readout weights, are trained for the specific task.

The properties of RC can be summarized as follows. First, the input data to be processed are sent to the nodes of the reservoir. If the number of nodes in the reservoir is sufficiently large and the nodes are nonlinear, the projection of the input data onto the reservoir is functionally equivalent to a mapping onto a high-dimensional space. In this high-dimensional space, a linear separation between reservoir states caused by different input signals becomes exponentially more likely [5]. A final property of RC is the use of recurrent connections, which create a finite (fading) memory of previous inputs. Therefore, the nodes of the reservoir are still influenced by previous information when a new set of input data is received [6].

While most implementations of RC and, more generally, of recurrent neural networks are done in software [7], efficient hardware implementations of these concepts are of high interest. Hardware implementations can exploit the full potential of the intrinsic parallelism of neural networks [8], [9]. A dedicated hardware implementation for specific tasks can offer an advantage over software implementations when low power consumption or high processing speed is a priority.

Here, we utilize a delay-based implementation of RC. Delay systems represent a class of dynamical systems that meet the requirements of high dimensionality and finite memory. The high dimensionality of the system lies in the many degrees of freedom introduced by the delay time τ [10]. The delay loop also creates a recurrent connection to previous inputs and, thus, a fading memory. Following these properties, it has been shown that a recurrent network can be replaced by a single nonlinear node with delay by means of time-multiplexing [5]. A major advantage of delay-based reservoir computers is that they can be easily implemented in hardware, since they require only a single nonlinear node and a delayed feedback loop. The simplicity of this concept has motivated several hardware implementations of RC in electronics [5], optoelectronics [11]–[13], and optics [14], [15]. In these implementations, although the reservoir itself (the nonlinear delay system) is analog, the input and readout are still digital. Some attempts have been made to construct an analog readout [16], but the performance of these systems is far from what is achieved with digital readouts. The quantization noise that results from the necessary analog-digital conversion at the output can limit the system performance. In the following, we address this important issue with a hardware realization of the delay-based RC concept and thorough numerical simulations.

Here, we use an analog/digital implementation based on a simple nonlinear electronic circuit. While the nonlinear node is implemented in hardware with standard analog electronic components, the delay loop and the readout are implemented digitally. We explore the possibilities and limitations of this implementation by varying the resolution of the conversion interface between the analog and digital parts. We evaluate the system performance for two types of tasks with different sensitivity to noise.

II. CONCEPT AND IMPLEMENTATION

The delay-based RC scheme can be conceptually divided into several distinct blocks, which are shown schematically in Fig. 1. First, there is an input preprocessing stage, where the incoming data are time-multiplexed [5]. This stage also includes the addition of a constant bias voltage to the time-multiplexed data to keep the input values positive, as required by the hardware implementation. Each incoming data sample is held over the delay time, τ, and multiplied by a mask matrix M that defines the coupling weights from the input to the reservoir; a minimal sketch of this preprocessing is given below. The state of the reservoir for a given input is defined by N equidistant temporal positions in the delay line after a given computational step, i.e., one τ after that input was injected into the nonlinear node [5].
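As an illustration, the following sketch (Python/NumPy, not part of the original implementation; the binary mask values and the choice of bias are our own assumptions) shows how a scalar input sequence can be time-multiplexed with a mask of N weights and shifted to positive values:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 400                               # number of virtual nodes
gamma = 0.5                           # input scaling
M = rng.choice([-0.1, 0.1], size=N)   # hypothetical binary input mask

def preprocess(u):
    """Hold each input sample over the delay time tau and weight it with
    the mask M (one weight per virtual node); scale by gamma and add a
    bias so that the serial drive signal stays positive."""
    J = gamma * np.outer(u, M)        # shape (len(u), N): masked samples
    bias = max(0.0, -J.min())         # hypothetical bias choice
    return (J + bias).ravel()         # serial drive, one value per node slot
```

The flattened output provides one drive value per virtual-node interval θ.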



Fig. 1. Schematic view of the RC implementation based on a single Mackey–Glass nonlinear element with delay. DAC and ADC stand for digital-to-analog conversion and analog-to-digital conversion, respectively.

These N points within the delay line are often denoted as virtual nodes, and here they play a role analogous to that of the nonlinear nodes in conventional RC systems. The total number of virtual nodes (N) and the virtual node separation (θ) define the length of the delay time (τ = θN). These virtual nodes can emulate a reservoir network when θ < T, where T is the characteristic time of the system. Finally, at the output postprocessing stage, the system output is given by a linear weighted sum of the values of the virtual nodes. The weights are obtained during the training procedure with a simple multiple linear regression, without regularization. A digital-to-analog converter (DAC) and an analog-to-digital converter (ADC) with 12-bit resolution interface the digital and analog parts.

We have chosen an analog Mackey–Glass electronic circuit [17], [18] as the nonlinearity for our implementation. A delay element, implemented digitally, provides the required feedback. With the appropriate scaling [5], the Mackey–Glass system with delay can be modeled, in the presence of an external input I, as

dX/dt = −X(t) + η[X(t − τ) + γ·I(t)] / (1 + [X(t − τ) + γ·I(t)]^p)    (1)

with X denoting the dynamical variable, t a dimensionless time, and τ the delay in the feedback loop. The parameters η and γ represent the feedback strength and the input scaling, respectively. The exponent p can be used to tune the nonlinearity. Note that for the scaled model, T = 1.
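A minimal numerical sketch of (1) (Python/NumPy, assuming a coarse Euler discretization with one integration step per virtual-node interval θ; the function name and array layout are ours) illustrates how the serial drive signal from the input preprocessing is turned into virtual-node states:

```python
import numpy as np

def mackey_glass_states(J, N=400, theta=0.2, eta=0.8, p=7):
    """Euler integration of Eq. (1) with delay tau = N * theta.
    J is the serial drive signal (masked, scaled, and biased input),
    one value per virtual-node slot; gamma and the bias are assumed
    to be folded into J already.  Returns one state per slot."""
    X = np.zeros(N + len(J))                  # first N entries: zero history
    for k, u in enumerate(J):
        x_del = X[k] + u                      # X(t - tau) + effective input
        nl = eta * x_del / (1.0 + x_del**p)   # Mackey-Glass nonlinearity
        X[k + N] = X[k + N - 1] + theta * (-X[k + N - 1] + nl)
    return X[N:]
```

Reshaping the returned array to (number of samples, N) gives one reservoir state vector per input sample, which the readout then combines linearly.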

In the absence of an external input (γ = 0), the system has a zero fixed point when η < 1, which evolves into a nonzero fixed point for larger values. By increasing η further, we find periodic dynamics and, subsequently, a region of deterministic chaos. To obtain an efficient reservoir computer, we find that the system must operate at a stable fixed point in the absence of external input. Under this condition, the system can show complex, but reproducible, transient dynamics when the external input is added (γ > 0). These input-induced transient dynamics are then used for computation.

In this brief, we employ a hardware implementation of the Mackey–Glass node that uses a single bipolar transistor for the nonlinearity. The circuit diagram of the electronic implementation is depicted in Fig. 2, in which the functionality of each group of components is noted on top for clarity. Fig. 3 shows the experimental Mackey–Glass function for this implementation, together with the corresponding numerical fit. In this example, the Mackey–Glass equation fits the experimental nonlinearity with an exponent p ∼ 6. By changing the values of the resistors, the value of the exponent p can be varied.

III. RESULTS

In our implementation, one of the major factors limiting the performance of the system is the quantization noise at the output postprocessing stage.

Fig. 2. Diagram of the electronic implementation of the Mackey–Glass nonlinear node using a single bipolar transistor.

Fig. 3. Experimental nonlinear function (red solid line) compared with a fit using the Mackey–Glass nonlinearity (green dashed line). The operating points marked with dots of different colors correspond to the solid lines in Fig. 8.

In the following, we analyze the influence of noise on the performance of the system for two types of tasks: 1) noise-robust tasks (classification) and 2) noise-sensitive tasks (memory capacity and time-series prediction). These tasks have been widely used as benchmarks in machine-learning studies. For both types of tasks, quantization noise effects are studied by varying the resolution of the ADC. Moreover, for the prediction task, we also study the influence of the operating point on the system performance. We combine experimental and numerical results for the sake of generality.

A. Classification Task

We first evaluate the performance of the system on a spoken digit recognition task. The spoken digit data set consists of five female speakers uttering the numbers zero to nine, with a tenfold repetition for statistics (500 samples in total) [19]. Before injecting the information into the system, we performed standard preprocessing, creating cochleograms using the Lyon ear model [20]. The information injected into the reservoir, I(t), is given by the product of two matrices, the cochleograms C(t) and the input connectivity mask M [11].

For the characterization of the classification performance, we evaluate the word error rate (WER) as a function of some key system parameters. The WER is evaluated by choosing 20 random partitions of 25 samples each (out of the 500 spoken digits), using 475 samples for training the readout weights and keeping the remaining 25 for testing. Following this procedure, each sample is used exactly once for testing (20-fold cross validation); a sketch of this evaluation is given below. Fig. 4 shows the WER as a function of the feedback strength for 400 virtual nodes in the delay line and a separation between the virtual nodes of θ = 0.2 T. We find that the WER is systematically below 0.5% for intermediate feedback strengths in the range η = 0.3–0.9. The WER increases when the feedback strength approaches the chaotic region of the Mackey–Glass dynamics.
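The cross-validation procedure can be sketched as follows (Python/NumPy; as a simplification, we assume each utterance has already been reduced to a single reservoir state vector, whereas the actual implementation works with state sequences; all names are ours):

```python
import numpy as np

def word_error_rate(S, labels, n_folds=20, seed=0):
    """20-fold cross validation: 20 disjoint random partitions of 25
    samples each; the linear readout is trained on the remaining 475
    samples by plain least squares (no regularization).
    S: (500, N) matrix of reservoir states; labels: (500,) digit labels."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    folds = order.reshape(n_folds, -1)           # partitions of 25 samples
    Y = np.eye(10)[labels]                       # one-hot targets per digit
    errors = 0
    for test in folds:
        train = np.setdiff1d(order, test)
        Xtr = np.column_stack([S[train], np.ones(len(train))])  # bias column
        W, *_ = np.linalg.lstsq(Xtr, Y[train], rcond=None)
        Xte = np.column_stack([S[test], np.ones(len(test))])
        pred = (Xte @ W).argmax(axis=1)          # winner-take-all decision
        errors += int(np.sum(pred != labels[test]))
    return errors / len(labels)
```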


Fig. 4. Experimental results for the spoken digit recognition test. WER as a function of the feedback strength (η). The input scaling (γ) has been kept fixed at 0.5 and the exponent is measured to be p ∼ 7. The delay line includes 400 virtual nodes. The ADC and DAC have an amplitude resolution of 12 bits. Error bars: mean and standard deviation over five different realizations.

Fig. 6. Experimental time trace of the response of the nonlinear node to digit 1, represented over the duration of the spoken digit. The ADC resolution is 8 bits (left-hand side) and 6 bits (right-hand side). The parameters of the system are η = 0.8, γ = 0.5, and p ∼ 7. Insets: enlargement of the time traces over a short temporal window.

Fig. 7. Experimental results for the memory capacity as a function of the feedback strength η and input scaling γ . The number of virtual nodes is N = 400 and the exponent is p ∼ 7.

Fig. 5. Minimum WER for the spoken digit recognition task in the experiments (solid line with circles) and numerical simulations (dashed line with crosses) as a function of the number of quantization bits. The input scaling is γ = 0.5, the feedback strength η = 0.8, and the exponent is set to p ∼ 7. The number of virtual nodes is N = 400, with a separation between them of θ = 0.2 T.

The optimum performance of the system for the spoken digit classification task (0.05% WER) agrees with the good performance also found in optoelectronic (0.04% [11], 0.4% [12], and 0.6% [13]) and all-optical implementations (0.014% [14] and 3% [15]) of RC based on a single node with delay, and compares well with standard software implementations of RC (0.5% [7] and 1.24% [25]). The resolution of the measurements in the electronic implementation is given by the ADC and DAC, which have an amplitude resolution of 12 bits. As previously mentioned, the major factor limiting the performance of our delay-based RC implementation is the quantization noise at the interface between the analog electronic circuit and the digital implementation of the output postprocessing stage. We have explored the minimum ADC requirements to still obtain a WER below 0.1%. As shown in Fig. 5, low error rates are obtained for 8 bits or more in the ADC (solid line with circles). Numerical simulations of this task confirm the influence of output quantization (dashed line with crosses in Fig. 5). In addition, we have verified in the numerical simulations that quantization in the input preprocessing stage is less detrimental to the correct classification of spoken digits.
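In simulations, the effect of a finite ADC resolution can be emulated by uniform quantization of the recorded node states. A minimal sketch (Python/NumPy; the rounding scheme and the scaling to the signal's own range are our assumptions, not a description of the actual converter):

```python
import numpy as np

def quantize(x, n_bits):
    """Emulate an ideal n-bit ADC by uniform quantization of the signal
    over its own range (a simplified model of the readout converter)."""
    lo, hi = x.min(), x.max()
    levels = 2 ** n_bits
    step = (hi - lo) / (levels - 1)
    return lo + step * np.round((x - lo) / step)

# e.g., states_6bit = quantize(states, 6) before training the readout
```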

In order to understand the robustness of the classification performance with respect to quantization noise, we analyze the response of the Mackey–Glass node for the isolated digits recorded at the optimum point for identification. We have found that the shape of the recorded responses of the digits is still preserved when the ADC resolution is decreased from 8 to 6 bits. See, for example, the temporal shape of the response of the system to digit 1 in Fig. 6. These results suggest that the identification of isolated digits relies mainly on the recognition of the shape of the corresponding digit, with only a small influence of the quantization at the scale of the transient response of the system (see insets in Fig. 6).

B. Memory Capacity and Time-Series Prediction Task

In this section, the effect of quantization noise is analyzed for tasks that are considered to be more sensitive to noise. First, we evaluate the memory capacity of the system, which quantifies a basic property of RC, namely the fading memory. Second, we evaluate a standard time-series prediction task.

For the computation of the memory capacity, the input of the reservoir consists of a random sequence u(n) that is injected one sample at a time. The system is trained to reconstruct the delayed input u(n − k) from the current state of the reservoir. The memory function m(k) is given by the normalized correlation between the output and the associated delayed input u(n − k). The memory capacity, mc, is then computed as the sum of m(k) over k [21]; a sketch of this computation is given below.
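The following sketch (Python/NumPy) illustrates the memory-capacity computation; here m(k) is taken as the squared correlation coefficient between the trained output and the delayed input, a common convention for this measure [21], and the array layout is our own assumption:

```python
import numpy as np

def memory_capacity(outputs, u, k_max=50):
    """Sum of the memory function m(k) over the delays k.
    outputs: (k_max, T) array, row k-1 holding the readout trained to
    reconstruct u(n - k); u: input sequence of length T + k_max, whose
    last T samples were injected during the evaluated time window."""
    mc = 0.0
    for k in range(1, k_max + 1):
        target = u[k_max - k : len(u) - k]   # u(n - k), aligned with outputs
        mc += np.corrcoef(outputs[k - 1], target)[0, 1] ** 2
    return mc
```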


Fig. 8. Experimental (top panels) and numerical (bottom panels) results for the Santa Fe time-series prediction test. Color-coded NMSE as a function of the system parameters η and γ for different numbers of bits in the output ADC. The exponent is set to p ∼ 6, N = 400, and θ = 0.4 T. Colored lines: operating points shown in Fig. 3. The NMSE values are an average over three data partitions.

The experimental results shown in Fig. 7 indicate that the memory capacity increases with the feedback strength for η < 0.9. The largest memory capacity of our hardware implementation, mc = 7, is obtained for a feedback strength of η ∼ 0.8. In the case of an ideal, noise-free reservoir, a value of mc = 18 is achieved for η = 0.8. However, the memory capacity is significantly lower when the numerical simulations include a more realistic situation with a 12-bit ADC at the output of the nonlinearity. In this case, the largest memory capacity is mc = 8. These numerical results are in good qualitative agreement with, but slightly larger than, the experimental results reported in Fig. 7. Therefore, the experimental system has a signal-to-noise ratio (SNR) slightly lower than that of a 12-bit quantization.

The second task that we evaluate is a time-series prediction task, in which we test the performance of our scheme in the one-step-ahead prediction of a benchmark chaotic time series. We employ the Santa Fe laser time series, an experimental recording of a far-infrared laser operating in a chaotic regime [22]. These data are the continuation file of data set A (10000 samples) of the Santa Fe time-series competition [23]. The information sent to the reservoir is given by the product of the samples of the standardized Santa Fe time series and the 1-D input matrix M [24]. The 1-D input connectivity matrix has six different levels with zero mean. For the evaluation of the prediction error, we take three partitions of 3000 samples each; each partition is used once for testing. To characterize the performance of the system for this task, we compute the normalized mean squared error (NMSE) of the prediction, i.e., the mean squared difference between the predicted and target values normalized by the variance of the target; a sketch is given below. The time-series prediction task requires nonlinearity and memory. In addition, this task is known to be sensitive to the influence of quantization noise [24].

In Fig. 8 (top panels), we present the NMSE of the experimental prediction, averaged over the three data partitions, as a function of the system parameters for different numbers of bits in the output ADC. We observe a clear dependence of the NMSE on the number of bits in the output ADC, with a wider region of low NMSE for an increasing number of output bits. NMSE values below 0.05 are obtained for γ ∼ 0.3 and a wide range of feedback strengths when the ADC resolution is larger than 8 bits. For NMSE below 0.05, the relative standard deviation of the NMSE over the three data partitions is smaller than 10% (7%) for the experimental (numerical) results. The performance strongly degrades for η > 1.2 (not shown), as the dynamics of the system becomes chaotic.
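A minimal sketch of the NMSE evaluation over the test partitions (Python/NumPy; the normalization by the target variance is the usual convention for this benchmark, and the function names are ours):

```python
import numpy as np

def nmse(y_pred, y_true):
    """Mean squared prediction error normalized by the target variance."""
    return np.mean((y_pred - y_true) ** 2) / np.var(y_true)

def nmse_over_partitions(y_pred, y_true, n_parts=3):
    """Average NMSE over consecutive test partitions, mirroring the three
    3000-sample partitions described in the text."""
    pairs = zip(np.array_split(y_pred, n_parts), np.array_split(y_true, n_parts))
    return float(np.mean([nmse(p, t) for p, t in pairs]))
```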

For this task, the NMSE prediction errors obtained with our scheme (0.031), with other implementations of delay-based RC (0.02 [24] and 0.055 [14]), and with other machine-learning methods (0.0087 [25]) are comparable. The minimum NMSE saturates at ∼0.03 for resolutions larger than 10 bits. This saturation limits the performance: a further increase in the number of quantization levels no longer yields an improvement in the minimum NMSE achieved by the system. In contrast, we find that the NMSE obtained from numerical simulations keeps decreasing with the number of output bits when noise is only due to the output ADC resolution.

To understand this discrepancy between the experimental and simulation results, we have measured the effective number of bits (ENOB) of the experimental system. The ENOB characterizes the dynamic performance of an ADC and its associated circuitry, since all real ADCs introduce noise and distortion. For resolutions smaller than 8 bits, the ENOB is close to the corresponding number of resolution bits. However, for resolutions larger than 10 bits, the ENOB saturates around 9.6. This limit in the experimental ENOB explains why there is little difference between the NMSE obtained for 10 and 12 bits in the output ADC, as shown in Fig. 8 (top panels). The ENOB actually measures the total system noise, including quantization noise and analog system noise. Therefore, the ENOB value can be used to estimate the analog system noise of the hardware implementation. We have found that, for our setup, the signal-to-analog-system-noise ratio is slightly above 60 dB.

In Fig. 8 (bottom panels), we present the NMSE of the numerical simulations of the full system, including an SNR of 60 dB. The SNR is simulated by adding white Gaussian noise to (1); a sketch is given below. The numerical results in Fig. 8 are in good quantitative agreement with the experimental ones. Similar to the experimental results, the numerical NMSE saturates for more than 10 bits in the output ADC. From the numerical results, we can conclude that the analog system noise is negligible compared with the quantization noise at small resolutions. As shown in Fig. 8, an increase in the quantization noise not only raises the minimum NMSE, but also shrinks the η–γ regions with good performance, making the choice of operating point critical.
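The additive noise used in these simulations can be sketched as follows (Python/NumPy; the scaling of the noise power to a prescribed SNR is standard, and the function name is ours). As a side note, the ENOB of a converter is conventionally obtained from its measured signal-to-noise-and-distortion ratio as ENOB = (SINAD − 1.76 dB)/6.02 dB.

```python
import numpy as np

def add_noise_at_snr(x, snr_db, rng=None):
    """Add white Gaussian noise to the signal x so that the resulting
    signal-to-noise ratio equals snr_db (e.g., 60 dB as in the text)."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
```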


In delay-based RC systems with the Mackey–Glass nonlinearity, the operating point (the stable fixed point of the system) for η < 1 is zero for effective inputs with zero mean. However, in our scheme, we need a positive input due to hardware requirements. After the input is multiplied by the mask and γ, we add a bias term, b(γ, η), so that the input is always positive. Therefore, the operating point depends on the values of η, p, and γ. The parameters η and p control the shape of the nonlinear function, whereas γ determines the input bias.

The evolution of the operating point as a function of the feedback strength η and input scaling γ yields a qualitative interpretation of the performance of the system. In Fig. 8, the pink line shows the combinations of the η–γ parameters, for p = 6, yielding an operating point at the maximum of the nonlinear function (see the pink dot in Fig. 3, X = 1.23 V). When the operating point is in a strongly nonlinear region of the Mackey–Glass function (at the maximum, pink line in Fig. 8), the system does not reach the memory capacity required by the Santa Fe task and the NMSE increases. In turn, the black line accounts for the η–γ values yielding an operating point at the inflection point of the nonlinear function. When the operating point is around the inflection point of the numerical Mackey–Glass function (see the black dot in Fig. 3, X = 1.69 V), the response of the Mackey–Glass node is almost linear. This lowers the computational ability, and there is a slight increase of the NMSE at this fixed point (black line in Fig. 8). Finally, the operating point becomes nearly independent of the feedback strength η when γ > 0.3 (see the green and brown lines in Fig. 8, which correspond to the green and brown operating points shown in Fig. 3, respectively). In the experiments, the lowest prediction errors are found for the operating point X = 2.29 V (see the brown dot in Fig. 3).

Despite the good agreement between the numerical and experimental results, we find a few discrepancies. In particular, the system performance decreases abruptly for γ > 0.45 in the experimental implementation. This corresponds to operating points at X > 3.05 V, where the experimental nonlinear function is practically constant (see Fig. 3). The slight mismatch between the numerical and experimental nonlinear functions for X > 2.29 V can explain some of the differences between the numerical simulations and the experimental results, especially for γ > 0.3.

As discussed above, we find that the operating point plays an important role in the system performance. The time-series prediction task, in contrast with the classification task considered in Section III-A, requires an RC system with fading memory, i.e., delayed feedback is crucial for this task. However, there is a tradeoff between the memory capacity of an RC system and its computational capacity [6]. If the nodes of the RC system operate in a more linear regime, the system shows more memory capacity and lower computational ability. In delay-based RC, the system output oscillates around the operating point, and the amplitude of the oscillations is set by the input scaling factor γ. The operating point and the maximum amplitude of the oscillations determine the effective nonlinear function of the system, i.e., the parts of the nonlinear function that the system really explores. Overall, we find that the prediction task is more sensitive to quantization noise than the classification task.
We have also verified by numerical simulations that quantization noise due to the DAC in the input preprocessing stage and in the delay line has a negligible effect on the performance once noise in the output ADC is considered. Moreover, we have studied numerically and experimentally the influence of quantization noise on a Lorenz time-series prediction task and have reached the same conclusions. In this case, the input is a numerically generated noise-free Lorenz chaotic time series, so that the performance of the system is not affected by noise in the original data itself. Note that the Santa Fe laser time series, being an experimental recording, can contain various sources of noise.


IV. CONCLUSION

Our approach combines an analog nonlinear electronic circuit with a digital interface for RC purposes. The digital part takes care of simple input preprocessing and output postprocessing steps, which can be easily implemented in a dedicated digital board. The training of the system for a specific task is performed off-line with a linear regression algorithm. Output classification after training allows for on-line (real-time) operation with a combination of the analog electronic circuit and a dedicated digital board.

We have analyzed the influence of noise on the system performance by varying the resolution of the ADC. We find that the quantization noise at the output ADC has a crucial influence on the overall performance of the full system. A classification task is more resilient to a limited ADC resolution than a prediction task. This difference in performance can be attributed to the distinct nature of the corresponding tasks. We have also estimated the influence of the analog noise from a direct quantitative comparison with the numerical simulations, which further illustrates the benefits of comparing hardware and numerical realizations.

We have found that our system is versatile, with optimum performance for the standard tasks tested in this brief. The system also exhibits good performance over a large range of system parameters, making it robust for the benchmarks tested here. Furthermore, we have analyzed in detail the influence of the operating point on the performance of the system for the prediction task. This brief thus provides guidelines for the choice of the system parameters in experimental implementations. RC based on a combination of a simple analog electronic circuit and a dedicated digital board would offer the advantages of parallelism and full system integration with low power requirements. Taking advantage of these properties, such systems could be of practical use as control units in, e.g., sensor networks.

ACKNOWLEDGMENT

The authors would like to thank the members of the PHOCUS consortium for fruitful discussions over the last three years. M. C. Soriano would also like to thank Prof. C. R. Mirasso and Prof. I. Fischer for insightful comments.

REFERENCES

[1] D. Verstraeten, B. Schrauwen, M. d'Haene, and D. Stroobandt, "An experimental unification of reservoir computing methods," Neural Netw., vol. 20, no. 3, pp. 391–403, 2007.
[2] M. Lukoševičius, H. Jaeger, and B. Schrauwen, "Reservoir computing trends," KI-Künstliche Intell., vol. 26, no. 4, pp. 365–371, 2012.
[3] W. Maass, T. Natschläger, and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations," Neural Comput., vol. 14, no. 11, pp. 2531–2560, 2002.
[4] M. Lukoševičius and H. Jaeger, "Survey: Reservoir computing approaches to recurrent neural network training," Comput. Sci. Rev., vol. 3, no. 3, pp. 127–149, 2009.
[5] L. Appeltant et al., "Information processing using a single dynamical node as complex system," Nature Commun., vol. 2, p. 468, Sep. 2011.
[6] J. Dambre, D. Verstraeten, B. Schrauwen, and S. Massar, "Information processing capacity of dynamical systems," Sci. Rep., vol. 2, p. 514, Jul. 2012.
[7] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout, "Isolated word recognition with the liquid state machine: A case study," Inf. Process. Lett., vol. 95, no. 6, pp. 521–528, 2005.
[8] J. Misra and I. Saha, "Artificial neural networks in hardware: A survey of two decades of progress," Neurocomputing, vol. 74, nos. 1–3, pp. 239–255, 2010.
[9] J. Zhu and P. Sutton, "FPGA implementations of neural networks—A survey of a decade of progress," in Proc. Conf. Field Program. Logic, Lisbon, Portugal, 2003, pp. 1062–1066.
[10] M. Lakshmanan and D. V. Senthilkumar, Dynamics of Nonlinear Time-Delay Systems. Berlin, Germany: Springer-Verlag, 2011.


[11] L. Larger et al., "Photonic information processing beyond Turing: An optoelectronic implementation of reservoir computing," Opt. Exp., vol. 20, no. 3, pp. 3241–3249, 2012.
[12] Y. Paquot et al., "Optoelectronic reservoir computing," Sci. Rep., vol. 2, p. 287, Feb. 2012.
[13] R. Martinenghi, S. Rybalko, M. Jacquot, Y. K. Chembo, and L. Larger, "Photonic nonlinear transient computing with multiple-delay wavelength dynamics," Phys. Rev. Lett., vol. 108, no. 24, p. 244101, 2012.
[14] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, "Parallel photonic information processing at gigabyte per second data rates using transient states," Nature Commun., vol. 4, p. 1364, Jan. 2013.
[15] F. Duport, B. Schneider, A. Smerieri, M. Haelterman, and S. Massar, "All-optical reservoir computing," Opt. Exp., vol. 20, no. 20, pp. 22783–22795, 2012.
[16] A. Smerieri, F. Duport, Y. Paquot, B. Schrauwen, M. Haelterman, and S. Massar, "Analog readout for optical reservoir computers," in Proc. Adv. NIPS, vol. 25, 2012, pp. 944–952.
[17] A. Namajūnas, K. Pyragas, and A. Tamaševičius, "An electronic analog of the Mackey-Glass system," Phys. Lett. A, vol. 201, no. 1, pp. 42–46, 1995.


[18] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[19] Texas Instruments-Developed 46-Word Speaker-Dependent Isolated Word Corpus (TI46), NIST, Kekaha, HI, USA, Sep. 1991.
[20] R. F. Lyon, "A computational model of filtering, detection, and compression in the cochlea," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 1982, pp. 1282–1285.
[21] H. Jaeger, "Short term memory in echo state networks," German Nat. Center Inf. Technol., Berlin, Germany, GMD-Rep. 152, 2002.
[22] U. Huebner, N. B. Abraham, and C. O. Weiss, "Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser," Phys. Rev. A, vol. 40, no. 11, pp. 6354–6365, 1989.
[23] A. S. Weigend and N. A. Gershenfeld. (1993). Time Series Prediction: Forecasting the Future and Understanding the Past [Online]. Available: http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
[24] M. C. Soriano et al., "Optoelectronic reservoir computing: Tackling noise-induced performance degradation," Opt. Exp., vol. 21, no. 1, pp. 12–20, 2013.
[25] A. Rodan and P. Tino, "Minimum complexity echo state network," IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 131–144, Jan. 2011.
