Computers in Biology and Medicine 45 (2014) 72–79


Investigating the performance improvement of HRV Indices in CHF using feature selection methods based on backward elimination and statistical significance

Ali Narin a, Yalcin Isler b,*, Mahmut Ozer a

Bulent Ecevit University, Department of Electrical and Electronics Engineering, Zonguldak, Turkey Izmir Katip Celebi University, Department of Biomedical Engineering, Cigli, Izmir, Turkey

Article info

Abstract

Article history: Received 1 June 2013 Accepted 22 November 2013

In this study, the best combination of short-term heart rate variability (HRV) measures was investigated to distinguish 29 patients with congestive heart failure from 54 healthy subjects in the control group. In the analysis performed, wavelet packet transform based frequency-domain measures and several non-linear parameters were used in addition to standard HRV measures. The backward elimination and unpaired statistical analysis methods were used to select the best one among all possible combinations of these measures. Five distinct typical classifiers with different parameters were evaluated in discriminating these two groups using the leave-one-out cross-validation method. Each algorithm was tested 30 times to determine the repeatability of the results. The results imply that the backward elimination method gives better performance when compared to the statistical significance method in the feature selection stage. The best performance (82.75% sensitivity, 96.29% specificity, and 91.56% accuracy) was obtained by using the SVM classifier with 27 selected features including non-linear and wavelet-based measures. © 2013 Elsevier Ltd. All rights reserved.

Keywords: Heart rate variability; Feature selection; Congestive heart failure; Wavelet transform; Non-linear measures

1. Introduction

Heart failure is the inability of the heart to pump blood through the body efficiently [1,2]. Because of this disorder, many organs do not receive enough oxygen and nutrients, which damages them and reduces their ability to function properly. Since congestion is very common in this condition, it is also called congestive heart failure (CHF) [3]. Symptoms of CHF often begin slowly and occur only when the patient is very active. Occasionally, these symptoms may also begin suddenly; for example, after a heart attack or other heart-related problems. Although the diagnosis of CHF is straightforward for expert cardiologists, physicians are often challenged because of the absence of typical signs and symptoms and the possibility of attributing CHF symptoms to other conditions, especially in elderly patients, resulting in delayed interventions [4]. Decreased heart function may generally be seen on several tests, including the echocardiogram, heart catheterization, chest X-ray, chest CT scan, cardiac MRI, nuclear heart scans (MUGA, RNV), and ECG [5]. Among these tests, ECG is probably the most common method


* Corresponding author. Tel.: +90 5077016050.
E-mail addresses: [email protected] (A. Narin), [email protected] (Y. Isler), [email protected] (M. Ozer).
0010-4825/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.compbiomed.2013.11.016

in cardiac-related disease because it is a non-invasive and easy-to-use method. Nonetheless, its use in the CHF-related literature has been limited to observing whether abnormal beats are present or not. An abnormal ECG is reported with an estimated sensitivity of 81% [6]. Recently, HRV analysis, which is derived from the intervals between two adjacent peaks of ECG records, has been used in patients with CHF for both diagnosis [7–13] and prognosis [14–16]. For example, Asyali studied the discrimination of CHF patients from normal subjects using linear discriminant analysis and a Bayesian classifier [7]. In his study, only nine common long-term (24 h) time-domain and classical FFT-based frequency-domain HRV measures were used, and sensitivity and specificity rates of 81.8% and 98.1% were obtained. Isler and Kuntalp studied the CHF discrimination task using classical time- and frequency-domain measures combined with Wavelet Entropy measures and the Nearest Neighbor classifier, resulting in a sensitivity rate of 79.3% and a specificity rate of 94.4% [8]. The same authors also studied the same task by adding heart rate normalization and the k-Nearest Neighbors (k = 3) classifier, resulting in a sensitivity rate of 82.76% and a specificity rate of 100% [9]. Yu and Lee used standard time- and frequency-domain features with bi-spectral analysis based features and Support Vector Machines. They achieved a sensitivity rate of 95.55% and a specificity rate of 100% for the RBF kernel with features selected using a Genetic


Algorithm (GA) [10], and a sensitivity rate of 93.10% and a specificity rate of 98.14% for the linear kernel [11]. Jovic and Bogunovic used standard time-domain features and non-linear measures with SVM, MLP, C4.5, and Bayesian classifiers, resulting in sensitivity rates of 77.2%, 96.6%, 99.2%, and 98.4%, respectively, and specificity rates of 87.4%, 97.8%, 98.4%, and 99.2%, respectively [12]. Pecchia et al. used non-standard features and a CART classifier, resulting in a sensitivity rate of 89.7% and a specificity rate of 100% [13].

Unfortunately, although there are various HRV measures, a significant correlation between these measures has been observed. Therefore, the simultaneous use of these highly correlated HRV measures could unnecessarily complicate the implementation of classifiers, and the presentation of redundant information may result in confusion, which is called the curse of dimensionality [17]. At first glance, it seems that the problem of selecting the subset of the best d features (measures) from a given set of n features (d < n) has a straightforward solution. One can form all possible combinations of d out of n features; for each one, the measure of separability can be calculated and the best feature set can be selected. Unfortunately, such an exhaustive search method is impractical for large n and d because of the amount of computational load [18]. Alternatively, one of the feature selection algorithms can be used.

Feature selection algorithms can be divided into three groups: embedded, filter, and wrapper approaches [19–21]. Embedded methods perform variable selection during the process of training and are usually specific to the given learning machines. Filters select subsets of variables as a pre-processing step, independently of the chosen predictor (Fig. 1). Wrappers utilize the learning machine of interest as a black box to score subsets of variables according to their predictive power (Fig. 2).
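To make the filter idea concrete, the sketch below scores every feature independently of any classifier using the magnitude of an unpaired (Welch) t-statistic and keeps the highest-scoring ones. This is only an illustrative stand-in: the study itself obtains p-values from a statistics package, and the function names and the top-k selection rule here are our assumptions.

```python
import numpy as np

def welch_t_statistic(a, b):
    """Unpaired (Welch) t-statistic between two groups of one feature."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

def filter_select(X_pos, X_neg, top_k=5):
    """Filter approach: score each feature on its own (no classifier in the
    loop) and keep the top_k features with the largest |t|."""
    scores = [abs(welch_t_statistic(X_pos[:, j], X_neg[:, j]))
              for j in range(X_pos.shape[1])]
    return sorted(np.argsort(scores)[::-1][:top_k])
```

A wrapper, by contrast, would re-train the classifier for every candidate subset instead of scoring features one at a time.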
In the literature, there are many algorithms that implement these approaches [19–21]. Recently, a statistical method based on the independent t-test was proposed as an alternative feature selection method [22]. In this study, the feature selection method based on the independent statistical t-test and the well-known backward elimination method [17] were compared by means of their performances in discriminating patients with CHF from normal subjects using different classifier algorithms.

The following section covers the methods used in the study, including brief information about the studied population, the construction of the HRV dataset (i.e., the feature extraction) from the raw ECG data, the classifiers, and the feature selection steps. The third section gives the results achieved and lists the required parameters for the implemented system. Finally, the last section discusses and concludes this study.

Fig. 1. Feature selection algorithms based on the filter approaches.


2. Methods

2.1. Data

The data used in this study were obtained from the normal sinus rhythm and congestive heart failure RR interval databases from the MIT/BIH database in PhysioNET [23]. The RR interval records in these databases include beat annotation files for long-term (24 h) ECG recordings that were digitized at 128 samples per second. The beat annotations were obtained through automated analysis following manual review and correction by experts. The normal sinus rhythm database (nsr2db) has data from 54 normal subjects (30 men, 24 women) with an age range of 24–76 years; the CHF database (chf2db) has data from 29 patients with CHF (8 men, 2 women, and 19 unrecorded) with an age range of 34–79. These databases can be downloaded from http://www.physionet.org/physiobank/database/.

Although HRV analysis can be applied to both short-term (5 min) and long-term (24 h) HRV data [24], short-term HRV analysis is commonly used in the literature to enable a diagnosis from only 5 min of data. The first 5-min ectopy-free segment of every subject was selected to employ short-term HRV measures in this study because the task force recommends the use of artifact-free sections in HRV analysis [24].

2.2. Calculation of HRV measures

A study has reported the occurrence of CHF in adults at a rate of around 2%, and this value further increases to 6–10% in those over 65. Moreover, the occurrence of CHF was reported to be related to gender [25]. To clarify the roles of age and gender in discriminating CHF from NSR, both should be included in the feature set. Nonetheless, this is not possible since the gender information is not recorded for most of the patients in the chf2db database. In this study, 57 short-term (5 min) HRV measures were calculated on 29 patients with CHF and 54 subjects in the control group, including the patient age:

• time-domain HRV measures;
• frequency-domain HRV measures using
  ○ PSD estimation with the FFT algorithm;
  ○ PSD estimation with the LS algorithm;
  ○ Variance, Energy, and Entropy values from the Wavelet transform;
• non-linear HRV measures using
  ○ the Poincare plot;
  ○ symbolic dynamics;
  ○ detrended fluctuation analysis;
  ○ sample entropy.

All these HRV measures were calculated using scripts implemented by the authors in the MATLAB software package. This stage is called Feature Extraction in the pattern recognition literature [17] and is summarized in Fig. 3. After the feature extraction stage, all the features were normalized to improve classification performance [8]. All features were scaled into the interval [0, 1] using the MIN-MAX algorithm, defined as

    f'_{i,j} = (f_{i,j} − MIN(f_i)) / (MAX(f_i) − MIN(f_i))        (1)

where f_{i,j} is the j-th sample of the i-th feature.
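Eq. (1) amounts to a per-feature (column-wise) rescaling of the dataset. The sketch below is an illustrative numpy version, not the authors' MATLAB script; handling of a constant feature (zero range) is our assumption, since the paper does not discuss that edge case.

```python
import numpy as np

def minmax_normalize(features):
    """Scale each feature (column) into [0, 1] per Eq. (1).

    features: 2-D array-like, rows are samples, columns are features.
    Constant columns are mapped to 0 (an assumption of this sketch).
    """
    f = np.asarray(features, dtype=float)
    fmin = f.min(axis=0)
    frange = f.max(axis=0) - fmin
    frange[frange == 0] = 1.0   # avoid division by zero for constant features
    return (f - fmin) / frange
```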

Fig. 2. Feature selection algorithms based on the wrapper approaches.

2.2.1. Time-domain HRV measures

The derivation of standard time-domain HRV measures is quite simple. This property also helped the HRV analysis to find



Fig. 3. The feature extraction stage in heart rate variability based systems.

widespread use among clinicians and researchers. The time-domain measures are calculated directly from the raw HRV data. These measures are Mean (mean of all RR intervals), SDNN (standard deviation of all RR intervals), RMSSD (root mean square of differences between adjacent NN intervals), and SDSD (standard deviation of differences between adjacent NN intervals) [8,9].
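The four measures above follow directly from their definitions. The sketch below is an illustrative numpy version (not the authors' MATLAB scripts), assuming the RR intervals are given in seconds:

```python
import numpy as np

def time_domain_measures(rr):
    """Standard time-domain HRV measures from an RR-interval series.

    Returns Mean, SDNN, RMSSD, and SDSD as defined in the text.
    """
    rr = np.asarray(rr, dtype=float)
    diff = np.diff(rr)                       # successive differences
    return {
        "Mean": rr.mean(),                   # mean of all RR intervals
        "SDNN": rr.std(ddof=1),              # SD of all RR intervals
        "RMSSD": np.sqrt(np.mean(diff**2)),  # RMS of successive differences
        "SDSD": diff.std(ddof=1),            # SD of successive differences
    }
```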

2.2.2. Frequency-domain HRV measures

Frequency-domain HRV measures, on the other hand, are based on the power spectral density (PSD) analysis of the HRV data. The power spectra can be obtained using FFT- and LS-based periodogram estimations or wavelet-based measures including variances, energies, and entropies. Some preprocessing steps such as interpolation and detrending are necessary depending on the algorithm used. In FFT- and wavelet-based algorithms, interpolation is necessary to produce an evenly sampled time series from the HRV data, which is inherently unevenly sampled [26]. FFT-based methods also require a detrending procedure, which makes the data at least weakly stationary [27].

The spectral measures have the advantage of relating the power of variation in different frequency bands to different physiologically modulating effects. Three main spectral components are distinguished in a spectrum calculated from short-term HRV recordings: very-low-frequency (VLF), low-frequency (LF), and high-frequency (HF) components [24]. These frequency bands are bounded by the limits of 0–0.04 Hz, 0.04–0.15 Hz, and 0.15–0.40 Hz, respectively. Measurement of the power components consists of the absolute values of power (power very-low-frequency (PVLF), power low-frequency (PLF), and power high-frequency (PHF) components) in these frequency bands. LF and HF components are also measured in normalized units (normalized low-frequency (NLF) and normalized high-frequency (NHF) components), which represent the relative value of each power component in proportion to the total power minus the VLF component. The ratio of the LF to HF components (denoted RATIO) gives information on the balance of the sympathetic and parasympathetic divisions of the autonomic nervous system (ANS) [24]. The total power and these six

measures are commonly used as frequency-domain HRV measures irrespective of which periodogram method is used. In this study, these measures were calculated using the FFT-based periodogram, the LS-based periodogram, Wavelet energies, Wavelet variances, and Wavelet entropies. Among these methods, the LS periodogram is reported to be a more appropriate PSD estimation method than classical FFT-based PSD estimation methods for unevenly sampled data [28,29] such as typical RR data [30]. The FFT-based periodogram method requires both resampled and detrended data, whereas Wavelet methods require only resampled data. The LS method can be used without the need to resample and detrend the RR data [27]. On the other hand, the wavelet transform allows both the time and the frequency domains to be investigated and polynomial non-stationarities of the signal to be removed. Owing to this property, wavelet analysis is reported as suitable for analyzing HRV data [31].

The FFT-based periodogram can be calculated by using the FFT algorithm and then simply squaring the FFT coefficients. In order to calculate the wavelet measures, there is a three-step procedure defined in the literature [32,33]: first, obtaining the wavelet packet coefficients (Daubechies 4 wavelet with level 7 and N = 1200); second, calculating the wavelet energies and variances; and last, calculating the wavelet entropies. Following the definition of entropy given by Shannon [34], the wavelet multi-resolution packets method in HRV analysis is used to separate the signal by scales, defined in agreement with the traditional frequency bands used in standard HRV analysis. The wavelet coefficients are calculated using the re-sampled HRV data set. Detailed information can be found in the literature [8,9].
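With the band limits above, the FFT-periodogram measures can be sketched for a detrended, evenly resampled RR series. The function below is an illustrative numpy reimplementation only: the names, the crude mean-removal detrend, and the use of LF+HF as "total power minus VLF" are our assumptions, not the paper's exact pipeline.

```python
import numpy as np

def fft_band_powers(rr_resampled, fs=4.0):
    """FFT-periodogram band powers for an evenly resampled RR series.

    Band limits follow the text: VLF 0-0.04, LF 0.04-0.15, HF 0.15-0.40 Hz.
    fs defaults to the 4 Hz resampling rate mentioned in the paper.
    """
    x = np.asarray(rr_resampled, dtype=float)
    x = x - x.mean()                              # crude detrend for the sketch
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)    # periodogram: |FFT|^2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    def band(lo, hi):
        return psd[(freqs >= lo) & (freqs < hi)].sum()

    pvlf, plf, phf = band(0.0, 0.04), band(0.04, 0.15), band(0.15, 0.40)
    total_minus_vlf = plf + phf                   # total power minus VLF
    return {
        "PVLF": pvlf, "PLF": plf, "PHF": phf,
        "NLF": plf / total_minus_vlf,             # normalized LF
        "NHF": phf / total_minus_vlf,             # normalized HF
        "RATIO": plf / phf,                       # sympathovagal balance
    }
```

As a sanity check, a 5-min series containing a 0.1 Hz oscillation twice as large as a 0.25 Hz one should yield a RATIO near 4 (power scales with amplitude squared).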

2.2.3. Non-linear HRV measures

In addition, non-linear characteristics are certainly involved in HRV. Some investigators emphasize the importance of using non-linear techniques to quantify HRV. It has been suggested that through non-linear analysis of HRV, it might be possible to obtain valuable information for the physiological interpretation of HRV and to provide information on the diagnosis and prognosis of various


cardiac and non-cardiac patients. However, these non-linear methods have not been used systematically to investigate large patient populations. At present, they are just potential tools for HRV assessment [24]. Commonly used non-linear measures in the HRV literature are the Poincare Plot, Detrended Fluctuation Analysis, Symbolic Dynamics, and Sample Entropy. A detailed description of these techniques can be found in the related literature.

The Poincare Plot, a technique taken from non-linear dynamics, is a graph of each RR interval plotted against the next interval. The plot provides summary information as well as detailed beat-to-beat information on the behavior of the heart [35,36]. The Poincare Plot is a popular technique thanks to its simple visual interpretation and its proven clinical ability as a predictor of disease and cardiac dysfunction [37]. The Poincare Plot is drawn using the raw HRV dataset. Fitting an ellipse to the Poincare Plot's shape (Fig. 4) is a popular technique [38]. The standard deviation of the distance of the points on the plot determines the width (SD1) and length (SD2) of the ellipse [39]. The measures SD1 and SD2 are calculated as follows:

    SD1 = sqrt((1/2) SDSD^2)        (2)

    SD2 = sqrt(2 SDNN^2 − (1/2) SDSD^2)        (3)

where SDSD and SDNN are defined in the standard time-domain measures. In addition, the product, SD1·SD2, and the ratio, SD1/SD2, were also computed to describe the relationships between these components.

Detrended fluctuation analysis (DFA) is used to quantify the fractal scaling properties of short RR interval signals. This technique is a modification of the root-mean-square analysis of random walks applied to non-stationary signals [40]. The root-mean-square fluctuation of an integrated and detrended time series is measured at different observation windows and plotted against the size of the observation window on a log–log scale. First, the RR time series (of total length K) is integrated using the equation:

    y(k) = Σ_{i=1}^{k} (RR(i) − RR_avg)        (4)

where y(k) is the k-th value of the integrated series, RR(i) is the i-th inter-beat interval, and RR_avg is the average inter-beat interval over the entire series. Then, the integrated time series is divided into windows of equal length, n. In each window of length n, a least-squares line is fitted to the RR interval data (representing the trend in that window). The y coordinates of the straight line segments are denoted by y_n(k). Next, the integrated time series, y(k), is detrended by subtracting y_n(k) in each window. The root-mean-square


fluctuation of this integrated and detrended series is calculated using the equation:

    F(n) = sqrt((1/N) Σ_{k=1}^{N} [y(k) − y_n(k)]^2)        (5)

This computation is repeated over all time scales (window sizes) to discover the relationship between F(n) and the window size n (i.e., the number of beats in the window of observation). Typically, F(n) increases with the window size. The fluctuations in small windows can be characterized by a scaling exponent, the self-similarity factor, representing the slope of the line relating log F(n) to log n. In this method, a fractal-like signal results in a scaling exponent value of 1, white Gaussian noise (a totally random signal) results in a value of 0.5, and Brownian noise, whose spectral power decreases rapidly at higher frequencies, results in an exponent value of 1.5 [40,41]. This value can be viewed as an indicator of the roughness of the original time series: the larger the value, the smoother the time series [42,43].

Symbolic dynamics, as an approach to investigate complex systems, facilitates the analysis of dynamic aspects of the signal of interest. The concept of symbolic dynamics is based on a coarse-graining of the dynamics [43]. That is, the range of the original observations, or the range of some transform of the original observations such as the first difference between consecutive values, is partitioned into a finite number of regions, and each region is associated with a specific symbolic value so that each observation is uniquely mapped to a particular symbol depending on the region into which it falls. For instance, the heart rate time series (RR_1, RR_2, …, RR_n) can be transformed into a symbol sequence (s_1, s_2, …, s_n) with symbols from the alphabet {0, 1, 2, 3} using

    s_i = 0,  RR_i ≤ μ − a
    s_i = 1,  μ − a < RR_i ≤ μ
    s_i = 2,  μ < RR_i ≤ μ + a
    s_i = 3,  μ + a < RR_i

where μ denotes the mean and a is a constant equal to the standard deviation of all RR intervals [44]. In this way, some detailed information is lost, but a more general dynamic behavior can be analyzed [45]. Then, Shannon entropies are calculated from the probability values of successive L-length symbol words. In this study, all integer values of L between 1 and 10 were tested.

Sample entropy (SampEn) belongs to a new family of statistics measuring the complexity and regularity of clinical and experimental time-series data [46]. SampEn statistics provide an improved evaluation of time-series regularity and are a useful tool in studies of the dynamics of human cardiovascular physiology [46]. In SampEn, the comparison between the template vector and the rest of the vectors excludes the comparison with itself, so the estimated probabilities may be zero.

2.3. Feature selection

Fig. 4. Ellipse fitting of the Poincare Plot.
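The ellipse descriptors SD1 and SD2 shown in Fig. 4 follow from Eqs. (2) and (3) using only the time-domain quantities SDSD and SDNN, with no explicit ellipse fit. A brief illustrative sketch (not the authors' code):

```python
import numpy as np

def poincare_sd(rr):
    """SD1 (ellipse width) and SD2 (ellipse length) via Eqs. (2)-(3)."""
    rr = np.asarray(rr, dtype=float)
    sdnn = rr.std(ddof=1)                          # SD of all RR intervals
    sdsd = np.diff(rr).std(ddof=1)                 # SD of successive differences
    sd1 = np.sqrt(0.5 * sdsd**2)                   # Eq. (2)
    sd2 = np.sqrt(2 * sdnn**2 - 0.5 * sdsd**2)     # Eq. (3)
    return sd1, sd2
```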

The main purpose of feature selection is to determine the feature subset that gives the highest discrimination between the groups. Using all features in a classifier does not give the best performance in many cases [17]. Feature selection also helps people to acquire a better understanding of which features are important in diagnosing the data of interest. In this study, the independent t-test, which is commonly used to show the significance of differences between measures of two distinct groups [47], was used as a filter-based feature selection method. The "IBM SPSS Statistics 19" software package, which is



possibly the most common program in statistical analysis, was used first to find the p-values that show statistical significance. Then, the features significant at the levels of 1%, 2%, 5%, 10%, and 20% were selected and applied to the input of a classifier. Alternatively, the backward feature selection algorithm was also used as a wrapper approach in this study. In fact, both the forward and the backward feature selection algorithms are available in the literature. The former starts with an empty set and adds features until a stopping criterion concludes the search. The latter, on the other hand, begins with all features and removes them one by one iteratively [48,49]. Because the backward method starts using all features, it is assumed to be more successful in finding the optimal feature subset [19]. The classifiers LDA, KNN, MLP, RBF, and SVM were tested to determine the success of each feature selection method.

2.4. Classification

In this study, k-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Multi-Layer Perceptron (MLP), Support Vector Machines (SVM), and Radial Basis Functions (RBF) were used for the pattern classification. In the following subsections, these classifiers are described briefly.

2.4.1. K-Nearest Neighbors (KNN)

KNN is one of the most popular classification algorithms because of its simplicity. Distances between the test sample and all training samples are calculated using the Euclidean distance measure. Then, the classification of the test sample is obtained through the majority of the known classes of the k nearest training samples [17]. This algorithm was tested for k = 1, 3, …, 13. In order to avoid ties in the voting, only odd numbers were used in selecting the value of k.

2.4.2. Linear Discriminant Analysis (LDA)

LDA is a popular method for both size reduction and classification [50,51]. It is a statistical method that maximizes the ratio of the between-class to within-class scatter matrices [17].
It seeks a projection that reduces dimensionality while preserving as much of the class discriminatory information as possible. Euclidean distances between the test data and the class means of the projected data are calculated. Then, the classes of the test data are obtained by considering the minimum distances [52].

2.4.3. Multi-Layer Perceptron (MLP)

MLP is a popular method among artificial neural network structures to model linear and non-linear relationships between inputs and outputs. It is frequently used in the diagnosis of several diseases [53–55]. MLP has a three-layer structure in general. The input layer consists of 59 nodes, corresponding to the number of features in this study. The number of neurons in the hidden layer was set to all integers between 1 and 50. The activation function of the neurons in this layer is "tanh". The output layer computes the final response of the network with a "linear" activation function. The network weights, which connect the layers, are updated using the error back-propagation learning method. In this method, the mean square error between the calculated network output and the actual output is obtained, and this error value is used to update the network weights. This process is continued until the mean square error value falls below the pre-determined threshold value, which is 10^-8 [17].

2.4.4. Support Vector Machines (SVM)

SVM, based on the statistical learning theory developed by Vapnik [56], is used in many applications on both the classification

and the regression of linear and nonlinear data [57]. The main purpose is to find the decision function that separates the two classes by mapping the data to a hyper-plane. There may be several such hyper-planes; SVM aims to find the optimal hyper-plane that maximizes the distance to the adjacent points of both classes. The selected data points that define the limits of the hyper-plane are called support vectors [56]. The largest distance to these support vectors is the so-called functional margin, and generally the larger the margin, the lower the generalization error of the classifier. Among the kernel functions used in SVM, the linear kernel function is possibly the most commonly used one in the literature. Due to its simple structure, this kernel was preferred in this study. In addition, the margin parameter was tested from 0.05 to 3.0 in 0.05 increments.

2.4.5. Radial Basis Function (RBF)

RBF is another artificial neural network architecture. The transformation between the input layer and the hidden layer is non-linear, whereas that between the hidden layer and the output layer is linear. The activation function of the neurons in the hidden layer is a radial basis function (generally Gaussian). The learning consists of two steps: an unsupervised learning method to estimate the centers of the basis functions and a supervised learning method to adjust the network weights between the hidden layer and the output layer. Because RBF networks can be trained faster than MLP [58], RBF has a common use in many studies [54,59]. The distribution parameter was tested from 0.1 to 3.0 in 0.1 increments.

2.5. Validation and performance measures

The Leave-One-Out cross-validation method was employed in this paper to evaluate the performance of each classifier. In this method, the entire dataset (83 samples) except one sample was used for training the classifier, and the remaining sample was used for testing the performance of the classifier.
The process was repeated for each sample so that all samples were used for testing. The overall performance of the classifier was evaluated by taking the average of these performances. The performance of the classifier is determined by sensitivity (SEN), specificity (SPE), and accuracy (ACC). SEN is the ratio of the number of positive decisions correctly made by the recognition system to the total number of positive decisions made by the expert. SPE is the ratio of the number of negative decisions correctly made by the recognition system to the total number of negative decisions made by the expert. ACC is the ratio of the total number of correct decisions made by the recognition system to the total number of all decisions.
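The three performance measures defined above translate directly into code. The sketch below is illustrative (the label convention, with the CHF class as positive, is our assumption):

```python
import numpy as np

def performance(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy as defined in the text.

    y_true: expert labels; y_pred: recognition-system decisions;
    'positive' marks the positive (e.g. CHF) class.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    sen = np.mean(y_pred[pos] == positive)    # true-positive rate (SEN)
    spe = np.mean(y_pred[~pos] != positive)   # true-negative rate (SPE)
    acc = np.mean(y_pred == y_true)           # overall accuracy (ACC)
    return sen, spe, acc
```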

3. Results

For each subject, all the HRV measures were calculated, and all the algorithms were implemented in MATLAB 2012b using an AMD FX-8120 eight-core CPU-based computer with 8 GB of random-access memory. Implementation of the study started with loading the 5-minute HRV data into the computer memory. Time-domain measures, LS algorithm-based PSD measures, and all non-linear measures were calculated directly from the raw HRV data (Fig. 3). For extracting the FFT-based frequency-domain measures, the cubic interpolation method with a rate of 4 Hz [24] and smoothness priors based detrending with λ = 1000 [60] were used. The FFT-based periodogram was calculated by using the FFT algorithm without overlapping and then simply squaring the FFT coefficients. Both the LS and FFT algorithms were applied to find 1024 coefficients. The wavelet transform based measures from the resampled data were


calculated. The wavelet packet transform with the mother wavelet of Daubechies 4 and a scale of 7 [8] was preferred. Frequency-domain HRV measures were calculated using these coefficients from the FFT, LS, and Wavelet transforms. Among the non-linear measures, Poincare Plot measures were calculated using ellipse fitting. At the end of this stage, the HRV dataset with 59 features (age, 6 time-domain measures, 7 FFT-based PSD measures, 7 LS-based PSD measures, 7 wavelet variances, 7 wavelet energies, 7 wavelet entropies, 4 Poincare Plot measures, 10 symbolic dynamics measures, and 3 other non-linear measures) was acquired.

In order to obtain performance results, three different inputs were applied to the classifiers: all features, features selected using the backward elimination method, and features found statistically significant at different levels. In the backward elimination method, all features except one were applied to the input of the classifier. If one of these feature subsets does not reduce the discrimination performance, the algorithm is run again using this feature subset. Because the performances of some classifiers depend on the initial values of their algorithms' parameters, the results should be validated by running them several times. Each classifier configuration with different parameters was run 30 times for the selected feature subsets in order to ensure the reproducibility of the results. These classifiers were

• KNN with k = 1, 3, 5, 7, 9, 11, 13;
• LDA for both linear and polynomial kernels;
• MLP with the number of neurons in the hidden layer from 1 to 50;
• SVM; and
• RBF with the distribution parameter from 0.1 to 3.0 in 0.1 increments.

The best combination of features and classifier parameters with the highest discrimination power is reported for this stage (Table 1). In the other feature selection method, the independent t-test at statistical significance levels of 1%, 2%, 5%, 10%, and 20% was used. The numbers of selected features are 22, 26, 34, 39, and 48, respectively. The features selected at these five significance levels were applied to the input of the classifiers. The best combination of features and classifier parameters with the highest discrimination power is reported for this stage (Table 2).
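The backward elimination procedure described above can be sketched as a wrapper around any of the classifiers. In the sketch below, a 1-NN classifier scored by leave-one-out accuracy stands in for the paper's classifiers; this pairing and the stopping rule (drop any feature whose removal does not reduce accuracy) are illustrative assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-NN classifier on a feature subset."""
    Xs = X[:, feats]
    correct = 0
    for i in range(len(y)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf                       # exclude the test sample itself
        correct += y[np.argmin(d)] == y[i]
    return correct / len(y)

def backward_elimination(X, y):
    """Wrapper-style backward elimination: start from all features and
    repeatedly drop a feature whose removal does not hurt LOO accuracy."""
    feats = list(range(X.shape[1]))
    best = loo_accuracy(X, y, feats)
    improved = True
    while improved and len(feats) > 1:
        improved = False
        for f in list(feats):
            trial = [g for g in feats if g != f]
            score = loo_accuracy(X, y, trial)
            if score >= best:               # removal does not hurt -> drop it
                best, feats, improved = score, trial, True
                break
    return feats, best
```

On a toy dataset where one feature separates the classes and the other is noise, the procedure discards the noise feature and the wrapped classifier's accuracy improves.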

4. Conclusion

In this study, using HRV data gathered from online and widely used databases, the class discrimination power of different combinations of 59 distinct short-term HRV measures was investigated using both the backward elimination method and the statistical significance method at the popular significance levels of 1%, 2%, 5%, 10%, and 20%. Five commonly used classifier algorithms were used to evaluate the performance of each selected feature set. The results, as can be seen from Tables 1 and 2, show that using the features determined by both feature selection algorithms improves the performances. The maximum performances of the constructed systems were found to occur when using the features selected by the backward elimination method. The maximum performance was achieved as 82.75%, 96.29%, and 91.56% for the sensitivity, specificity, and accuracy values, respectively. These are among the high values reported in the literature. Although some studies have reported higher performances than this study, they failed to include the gender information [8,10] or used different datasets [12]. Because the gender information is not recorded for most of the patients in the chf2db database, including it in evaluating the performance of designed classifiers biases the performance estimates [9]. This maximum performance was achieved by using the SVM classifier with 27 selected features. The most important point revealed by this study is that the selected features that give the best discrimination accuracy include Age, RMSSD, SDSD, Symbolic dynamics, SD2, SD1/SD2, and FFT-, Lomb-, and Wavelet-based frequency-domain measures. This shows that including nonlinear

 KNN with k = 1, 3, 5, 7, 9, 11, 13;
 LDA with both linear and polynomial kernels;

Table 1
Classifier results using both all features and the features selected by the backward elimination method, where Classifier is the algorithm implemented, SEN is sensitivity (%), SPE is specificity (%), ACC is accuracy (%), and NF is the number of features used to achieve the classifier performances.

                 |      Using all features       |  Using backward elimination
Classifier       | SEN    SPE     ACC     NF     | SEN    SPE     ACC     NF
1-NN             | 55.17  77.77   69.87   59     | 75.86  79.62   78.31   26
3-NN             | 58.62  87.03   77.10   59     | 65.51  92.59   83.13   40
5-NN             | 51.72  92.54   78.31   59     | 65.51  96.29   85.54   27
7-NN             | 51.72  96.29   80.72   59     | 62.06  98.14   85.54   18
9-NN             | 55.17  96.29   81.92   59     | 62.06  98.14   85.54   23
11-NN            | 51.72  96.29   80.72   59     | 62.06  98.14   85.54   13
13-NN            | 48.27  98.14   80.72   59     | 51.72  98.14   81.92   23
Linear LDA       | 79.31  81.48   80.72   59     | 79.31  87.03   84.33   33
Polynomial LDA   | 68.96  85.18   79.51   59     | 75.86  90.74   85.54   42
MLP              | 62.06  100.00  86.74   59     | 82.75  92.59   89.15   18
SVM              | 75.86  85.18   81.92   59     | 82.75  96.29   91.56   27
RBF              | 65.51  85.18   78.31   59     | 58.62  96.29   83.13   18
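Several of the measures that recur in the selected feature sets (RMSSD, SDSD, and the Poincare-plot descriptors SD1 and SD2) have simple closed forms. The sketch below, on a synthetic RR series, uses the standard definitions (see Brennan et al. [39]); it is an illustration, not the authors' implementation, and the sample-variance convention (ddof=1) is an assumption.

```python
# Illustrative computation of a few HRV measures from RR intervals.
import numpy as np

def hrv_measures(rr_ms):
    """Compute RMSSD, SDSD, SD1, and SD2 from RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    d = np.diff(rr)                        # successive RR differences
    rmssd = np.sqrt(np.mean(d ** 2))       # root mean square of differences
    sdsd = np.std(d, ddof=1)               # std. deviation of differences
    sdnn = np.std(rr, ddof=1)              # overall variability
    # Poincare-plot identities: SD1^2 = var(d)/2, SD2^2 = 2*SDNN^2 - SD1^2
    sd1 = sdsd / np.sqrt(2.0)
    sd2 = np.sqrt(max(2.0 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))
    return {"RMSSD": rmssd, "SDSD": sdsd, "SD1": sd1, "SD2": sd2,
            "SD1/SD2": sd1 / sd2}

rr = [800, 810, 790, 805, 795, 820, 815, 800]   # synthetic RR series (ms)
m = hrv_measures(rr)
```

SD1 captures short-term (beat-to-beat) variability and SD2 longer-term variability, which is why their ratio appears among the discriminative features.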


Table 2
Classifier results using the features selected at statistical significance levels (p) of 1%, 2%, 5%, 10%, and 20%, where Classifier is the algorithm implemented, SEN is sensitivity (%), SPE is specificity (%), ACC is accuracy (%), and NF is the number of selected features.

                 | p=1% (NF=22)      | p=2% (NF=26)      | p=5% (NF=34)      | p=10% (NF=39)     | p=20% (NF=48)
Classifier       | SEN   SPE   ACC   | SEN   SPE   ACC   | SEN   SPE   ACC   | SEN   SPE   ACC   | SEN   SPE   ACC
1-NN             | 62.06 85.18 77.10 | 68.96 83.33 78.31 | 55.17 83.33 73.49 | 58.62 87.03 77.10 | 48.27 79.62 68.67
3-NN             | 65.51 92.59 83.13 | 65.51 94.44 84.33 | 62.06 94.44 83.13 | 58.62 88.88 78.31 | 65.51 85.18 78.31
5-NN             | 55.17 94.44 80.72 | 58.62 94.44 81.92 | 58.62 94.44 81.92 | 58.62 90.74 79.51 | 51.72 94.44 79.51
7-NN             | 55.17 92.59 79.51 | 58.62 94.44 81.92 | 58.62 96.26 83.13 | 48.27 96.29 79.51 | 48.27 96.29 79.51
9-NN             | 55.17 92.59 79.51 | 55.17 92.59 79.51 | 58.62 96.29 83.13 | 55.17 96.29 81.92 | 51.72 96.29 80.72
11-NN            | 55.17 90.74 78.31 | 55.17 92.59 79.51 | 55.17 96.29 81.92 | 55.17 96.29 81.92 | 51.72 96.29 80.72
13-NN            | 48.27 92.59 77.10 | 55.17 92.59 79.51 | 55.17 96.29 81.92 | 51.72 96.29 80.72 | 51.72 96.14 81.92
Linear LDA       | 75.86 81.48 79.51 | 78.86 83.33 80.72 | 79.31 85.18 83.13 | 79.31 83.33 81.92 | 79.31 77.77 78.31
Polynomial LDA   | 68.96 85.18 79.51 | 72.41 87.03 81.92 | 72.41 88.88 83.13 | 72.41 87.03 81.92 | 68.96 87.03 80.72
MLP              | 75.86 92.59 86.74 | 75.86 92.59 86.74 | 79.31 92.59 87.95 | 68.96 96.29 86.74 | 68.96 94.44 85.54
SVM              | 72.41 81.48 78.31 | 62.06 85.18 77.10 | 72.41 90.74 84.33 | 65.51 87.03 79.51 | 62.06 83.33 75.90
RBF              | 48.27 94.44 78.31 | 75.86 75.92 75.90 | 41.37 94.44 75.90 | 58.62 90.74 79.51 | 65.51 87.03 79.51



and wavelet-based measures in addition to the classical HRV features could significantly improve the performance of HRV analysis in the diagnosis of CHF, thanks to the non-linear nature of HRV data. Although wavelet entropy measures have been commonly used in the literature, wavelet variance measures were selected in the best configuration of this study; this finding should be examined on a larger database. On the other hand, the results also show that feature selection based on the wrapper approach (the backward elimination method) gives better performances than feature selection based on the filter approach (statistical significance). Using a genetic algorithm as a wrapper-based feature selection method may drastically increase the performances achieved by the classifiers [8,9]; because it is a very time-consuming method, this is left as future work. The beat annotation databases used in this study were downloaded from the Internet [23].

The study has some limitations. First, the comorbid conditions and the medications used by the subjects are not given in the databases, although both affect the recorded ECG data and hence the results of this type of study. Next, the leave-one-out method is reported to give optimistic results when compared to other cross-validation methods; it was nevertheless preferred because the amount of data used in the study is small, and using other validation techniques may reduce the achieved classifier performances. In addition, the sensitivity is generally lower than the specificity (Tables 1 and 2), which is possibly a consequence of the imbalanced data (54 healthy subjects versus 29 patients with CHF). Using a balanced and more widespread dataset would help to achieve more robust performances.
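The leave-one-out protocol and the sensitivity/specificity asymmetry discussed above can be sketched on synthetic data with the same 29-versus-54 class imbalance. This is an illustration, not the paper's pipeline; the linear-kernel SVM and the synthetic feature shift are assumptions.

```python
# Leave-one-out evaluation with per-class performance on imbalanced data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
y = np.array([1] * 29 + [0] * 54)          # 1 = CHF patient, 0 = healthy control
X = rng.normal(size=(83, 5)) + y[:, None]  # patients shifted by +1 in each feature

# Leave-one-out: each subject is classified by a model trained on the other 82.
pred = cross_val_predict(SVC(kernel="linear"), X, y, cv=LeaveOneOut())

tp = int(np.sum((pred == 1) & (y == 1)))   # correctly detected patients
tn = int(np.sum((pred == 0) & (y == 0)))   # correctly detected controls
sensitivity = tp / int(np.sum(y == 1))
specificity = tn / int(np.sum(y == 0))
accuracy = float(np.mean(pred == y))
```

With imbalanced classes, accuracy alone hides the asymmetry the authors note: a classifier biased toward the 54-subject majority class inflates specificity at the expense of sensitivity.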

5. Summary

The analysis of heart rate variability (HRV), which is derived from the intervals between adjacent ECG peaks, has been commonly used in the literature to discriminate patients from normal subjects. There are well-defined standards in HRV analysis based on time- and frequency-domain measures. In this study, HRV analysis was used to discriminate patients with congestive heart failure (CHF) from healthy control subjects. Recently derived wavelet packet transform based HRV measures (variance, energy, and entropy) and nonlinear measures (symbolic dynamics, sample entropy, Poincare plot, detrended fluctuation analysis) were also included to support the conventional HRV analysis methods. Because there are significant correlations between HRV measures, and using correlated measures together may cause a performance reduction, the major part of this study focused on finding the best combinations among all possible HRV measures with different well-known classifier algorithms to improve the discrimination performance in CHF. At first glance, the problem of selecting the best d features (or measures) from a given set of n features (d < n) seems easy: one can form all possible combinations of features, calculate the measure of separability for each one, and select the best feature set. Unfortunately, such an exhaustive search is impossible for large n and d because of the computational load. For this reason, two different feature selection methods were used to test five different classifier algorithms in this study. The data used in this study were obtained from the online and widely used MIT/BIH databases. The achieved results have sensitivity, specificity, and accuracy values of 82.75%, 96.29%, and 91.56%, respectively. These were obtained using the SVM classifier with 27 selected features, including nonlinear and wavelet-based measures, and are comparable to those of similar studies. The most important point revealed by this study is that the combinations of features that give the best discrimination accuracies include wavelet transform based measures and nonlinear measures in addition to the standard time- and frequency-domain HRV indices and the age of the patient. The results also show that the feature selection algorithm based on the backward elimination method gives better results than the one based on the statistical significance method.
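The combinatorial argument above can be made concrete. With the paper's n = 59 candidate measures, enumerating every non-empty subset, or even only the subsets of the d = 27 features of the best SVM configuration, is computationally hopeless:

```python
# Size of the exhaustive feature-subset search space for n = 59 HRV measures.
from math import comb

n, d = 59, 27
total_subsets = 2 ** n - 1        # every non-empty feature subset
fixed_size_subsets = comb(n, d)   # subsets with exactly d = 27 features

print(f"non-empty subsets of {n} features: {total_subsets:,}")
print(f"subsets of exactly {d} features:   {fixed_size_subsets:,}")
```

Even at a million subset evaluations per second, enumerating the size-27 subsets alone would take more than a thousand years, which is why heuristic selection methods such as backward elimination or the t-test filter are used instead.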

Conflict of interest statement

None declared.

Acknowledgements

We want to express our special thanks to Ms Deniz Angun for her support in improving the paper.

References

[1] J. Eberhart-Phillips, A. Fenaughty, A. Rarig, The Burden of Cardiovascular Disease in Alaska: Mortality, Hospitalization and Risk Factors, Section of Epidemiology, Division of Public Health, Alaska Department of Health and Social Services, Anchorage, AK, 2003.
[2] C. Flavell, L.W. Stevenson, Take heart with heart failure, Circulation 104 (2001) e89.
[3] J. Wilbur, P. James, Diagnosis and management of heart failure in the outpatient setting, Prim. Care: Clin. Off. Pract. 32 (2005) 1115–1129.
[4] N.D. Gillespie, The diagnosis and management of chronic heart failure in the older patient, Br. Med. Bull. 75–76 (1) (2006) 49–62.
[5] Medical Encyclopedia, Heart Failure, 〈http://www.nlm.nih.gov/medlineplus/ency/article/000158.htm〉, last visited in March 2013.
[6] C. Fonseca, T. Mota, H. Morais, F. Matias, C. Costa, A.G. Oliveira, F. Ceia, on behalf of the EPICA Investigators, The value of the electrocardiogram and chest X-ray for confirming or refuting a suspected diagnosis of heart failure in the community, Eur. J. Heart Fail. 6 (6) (2004) 807–812.
[7] M.H. Asyali, Discrimination power of long-term heart rate variability measures, in: Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Cancun, 2003.
[8] Y. Isler, M. Kuntalp, Combining classical HRV indices with wavelet entropy measures improves to performance in diagnosing congestive heart failure, Comput. Biol. Med. 37 (10) (2007) 1502–1510.
[9] Y. Isler, M. Kuntalp, Heart rate normalization in the analysis of heart rate variability in congestive heart failure, Proc. Inst. Mech. Eng. Part H: J. Eng. Med. 224 (3) (2010) 453–463.
[10] S.-N. Yu, M.-Y. Lee, Bispectral analysis and genetic algorithm for congestive heart failure recognition based on heart rate variability, Comput. Biol. Med. 42 (2012) 816–825.
[11] S.-N. Yu, M.-Y. Lee, Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability, Comput. Methods Progr. Biomed. 108 (2012) 299–309.
[12] A. Jovic, N. Bogunovic, Electrocardiogram analysis using a combination of statistical, geometric, and nonlinear heart rate variability features, Artif. Intell. Med. 51 (2011) 175–186.
[13] L. Pecchia, P. Melillo, M. Sansone, M. Bracale, Discrimination power of short-term heart rate variability measures for CHF assessment, IEEE Trans. Inf. Technol. Biomed. 15 (1) (2011) 40–46.
[14] J.P. Saul, Y. Arai, R.D. Berger, L.S. Lilly, W.S. Colucci, R.J. Cohen, Assessment of autonomic regulation in chronic congestive heart failure by heart rate spectral analysis, Am. J. Cardiol. 61 (1988) 1292–1299.
[15] G. Casolo, E. Balli, T. Taddei, J. Amuhasi, C. Gori, Decreased spontaneous heart rate variability in congestive heart failure, Am. J. Cardiol. 64 (1989) 1162–1167.
[16] P.F. Binkley, E. Nunziata, G.J. Haas, S.D. Nelson, R.J. Cody, Parasympathetic withdrawal is an integral component of autonomic imbalance in congestive heart failure: demonstration in human subjects and verification in a paced canine model of ventricular failure, J. Am. Coll. Cardiol. 18 (1991) 464–472.
[17] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley, New York, 2000.
[18] A. Cohen, Biomedical Signal Processing, first ed., vol. 2, CRC Press, Boca Raton, 1986.
[19] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.
[20] A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artif. Intell. 97 (1997) 245–271.
[21] R. Kohavi, G.H. John, Wrappers for feature subset selection, Artif. Intell. 97 (1997) 273–324.
[22] A. Yildiz, M. Akin, M. Poyraz, An expert system for automated recognition of patients with obstructive sleep apnea using electrocardiogram recordings, Expert Syst. Appl. 38 (2011) 12880–12890.
[23] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals, Circulation 101 (23) (2000) e215–e220.
[24] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, Heart rate variability: standards of measurement, physiological interpretation and clinical use, Circulation 93 (1996) 1043–1065.
[25] J.J. McMurray, M.A. Pfeffer, Heart failure, Lancet 365 (9474) (2005) 1877–1889.
[26] P. Laguna, G.B. Moody, R.G. Mark, Power spectral density of unevenly sampled data by least square analysis: performance and application to heart rate signals, IEEE Trans. Biomed. Eng. 45 (6) (1998) 698–715.
[27] G.D. Clifford, L. Tarassenko, Quantifying errors in spectral estimates of HRV due to beat replacement and resampling, IEEE Trans. Biomed. Eng. 52 (4) (2005) 630–638.
[28] N.R. Lomb, Least-squares frequency analysis of unequally spaced data, Astrophys. Sp. Sci. 39 (1976) 447–462.
[29] J.D. Scargle, Studies in astronomical time series analysis (II): statistical aspects of spectral analysis of unevenly spaced data, Astrophys. J. 263 (1982) 835–853.
[30] G.B. Moody, Spectral analysis of heart rate without resampling, Comput. Cardiol. 20 (1993) 715–718.
[31] U. Wiklund, M. Akay, U. Niklasson, Short-term analysis of heart-rate variability by adapted wavelet transforms, IEEE Eng. Med. Biol. Mag. 16 (5) (1997) 113–118, 138.
[32] Q.R. Quian, O.A. Rosso, E. Basar, M. Schurmann, Wavelet entropy in event related potentials: a new method shows ordering of EEG oscillations, Biol. Cybern. 84 (4) (2001) 291–299.
[33] O.A. Rosso, S. Blanco, J. Yordanova, V. Kolev, A. Figliola, E. Basar, Wavelet entropy: a new tool for analysis of short duration brain electrical signals, J. Neurosci. Methods 105 (2001) 65–75.
[34] C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948) 379–423 & 623–656.
[35] M.A. Woo, W.G. Stevenson, D.K. Moser, R.B. Trelease, R.H. Harper, Patterns of beat-to-beat heart rate variability in advanced heart failure, Am. Heart J. 123 (1992) 704–710.
[36] P.W. Kamen, H. Krum, A.M. Tonkin, Poincare plot of heart rate variability allows quantitative display of parasympathetic nervous activity, Clin. Sci. 92 (1996) 201–208.
[37] P.W. Kamen, Heart rate variability, Aust. Fam. Physician 25 (1996) 1087–1094.
[38] M.L. Marciano, F. Migaux, D. Acanfora, G. Furgi, F. Rengo, Quantification of Poincare maps for the evaluation of heart rate variability, Comput. Cardiol. (1994) 557–580.
[39] M. Brennan, M. Palaniswami, P. Kamen, Do existing measures of Poincare plot geometry reflect nonlinear features of heart rate variability? IEEE Trans. Biomed. Eng. 48 (11) (2001) 1342–1347.
[40] H.V. Huikuri, T.H. Makikallio, C.K. Peng, A.L. Goldberger, U. Hintze, M. Moller, Fractal correlation properties of R–R interval dynamics and mortality in patients with depressed left ventricular function after an acute myocardial infarction, Circulation 101 (2000) 47–53.
[41] K.K. Ho, G.B. Moody, C.K. Peng, J.E. Mietus, M.G. Larson, D. Levy, A.L. Goldberger, Predicting survival in heart failure case and control subjects by use of fully automated methods for deriving nonlinear and conventional indices of heart rate dynamics, Circulation 96 (1997) 842–848.
[42] U.R. Acharya, N. Kannathal, S.M. Krishnan, Comprehensive analysis of cardiac health using heart rate signals, Physiol. Meas. 25 (2004) 1130–1151.
[43] U.R. Acharya, N. Kannathal, O.W. Seng, L.Y. Ping, T. Chua, Heart rate analysis in normal subjects of various age groups, Biomed. Eng. Online 3 (24) (2004).
[44] P. Caminal, M. Vallverdu, B. Giraldo, S. Benito, G. Vazquez, A. Voss, Optimized symbolic dynamics approach for the analysis of the respiratory pattern, IEEE Trans. Biomed. Eng. 52 (11) (2005) 1832–1839.
[45] J.H. Xu, Z.R. Liu, R. Liu, The measures of sequence complexity for EEG studies, Chaos 4 (11) (1994) 2111–2119.
[46] J.S. Richman, M.J. Randall, Physiological time-series analysis using approximate entropy and sample entropy, Am. J. Physiol.: Heart Circ. Physiol. 278 (2000) H2039–H2049.
[47] S. Ashcroft, C. Pereira, Practical Statistics for the Biological Sciences: Simple Pathways to Statistical Analyses, first ed., Palgrave Macmillan, 2003.
[48] A.K. Jain, R.P.W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell. 22 (1) (2000) 4–37.
[49] F. Pernkopf, Bayesian network classifiers versus selective kNN classifier, Pattern Recognit. 38 (1) (2005) 1–10.
[50] C. Liu, H. Wechsler, Enhanced Fisher linear discriminant models for face recognition, in: Proceedings of the International Conference on Pattern Recognition, 1998, pp. 1368–1372.
[51] S. Chakrabarti, S. Roy, M. Soundalgekar, Fast and accurate text classification via multiple linear discriminant projections, Very Larg. Databases J. 12 (2) (2003) 170–185.
[52] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, California, 1990.
[53] K. Warwick, Artificial Intelligence: The Basics, Taylor & Francis, 2011.
[54] S. Pan, S. Iplikci, K. Warwick, T.Z. Aziz, Parkinson's disease tremor classification: a comparison between support vector machines and neural networks, Expert Syst. Appl. 39 (12) (2012) 10764–10771.
[55] S. Kocer, M.R. Canal, Classifying epilepsy diseases using artificial neural networks and genetic algorithm, J. Med. Syst. 35 (4) (2011) 489–498.
[56] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[57] H. Byun, S.W. Lee, A survey of pattern recognition applications of support vector machines, Int. J. Pattern Recognit. Artif. Intell. 17 (3) (2003) 459–486.
[58] C.G. Looney, Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists, Oxford University Press, New York, 1997.
[59] C.Y. Chang, Y.S. Tsai, I.L. Wu, Integrating validation incremental neural network and radial-basis function neural network for segmenting prostate in ultrasound images, Int. J. Innov. Comput. Inf. Control 7 (6) (2011) 3035–3046.
[60] M.P. Tarvainen, P.O. Ranta-aho, P.A. Karjalainen, An advanced detrending method with application to HRV analysis, IEEE Trans. Biomed. Eng. 49 (2) (2002) 172–175.

Ali Narin (1987) received the B.Sc. degree in Electronics and Communication Engineering from Suleyman Demirel University, Isparta, Turkey, in 2011. He has been working as a research assistant at the Department of Electrical and Electronics Engineering in Bulent Ecevit University, Zonguldak, Turkey, since 2011. His current research interests include pattern recognition and biomedical signal processing.

Yalcin Isler (1971) received the B.Sc. degree in Electrical and Electronics Engineering from Anadolu University, Eskisehir, Turkey, in 1993, the M.Sc. degree in Electronics and Communication Engineering from Suleyman Demirel University, Isparta, Turkey, in 1996, and the Ph.D. degree in Electrical and Electronics Engineering from Dokuz Eylul University, Izmir, Turkey, in 2009. He worked as a Lecturer in the Burdur Vocational School of Suleyman Demirel University (Burdur, 1993–2000), as a Software Engineer in commercial companies (Izmir, 2000–2002), as a research assistant in Bulent Ecevit University (Zonguldak, 2002–2003) and in Dokuz Eylul University (Izmir, 2003–2010), and as an Assistant Professor in Electrical and Electronics Engineering of Bulent Ecevit University (Zonguldak, 2010–2012). He has been working as an Assistant Professor with the Department of Biomedical Engineering in Izmir Katip Celebi University, Izmir, since 2012. His main research interests are biomedical signal processing, medical device design, computational neuroscience, pattern recognition, and embedded systems. He is involved in several national projects on computer science in medicine and in other fields that require designing and implementing microcontroller-based boards.

Mahmut Ozer (1970) received the B.Sc. degree in Electronics and Communications Engineering from Istanbul Technical University, Istanbul, Turkey, in 1992, and the M.Sc. degree in 1996 and the Ph.D. degree in 2001 in Electronics Engineering from Karadeniz Technical University, Trabzon, Turkey. He worked as a Service Engineer in the General Directorate of State Airports Authority (Ankara, 1992–1994), as a Lecturer in Tokat Vocational School (Tokat, 1994–2002), and as an Assistant Professor (2002–2005), an Associate Professor (2005–2010), and a Professor (2010–present) in Electrical and Electronics Engineering of Bulent Ecevit University, Zonguldak. His main research interests include computational neuroscience and biomedical signal processing.
