PcHD: personalized classification of heartbeat types using a decision tree.

Computers in Biology and Medicine 54 (2014) 79–88


PcHD: Personalized classification of heartbeat types using a decision tree☆

Juyoung Park, Kyungtae Kang*

Department of Computer Science & Engineering, Hanyang University, Ansan 426-791, Republic of Korea

Article info

Article history: Received 28 February 2014; Accepted 10 August 2014

Abstract

The computer-aided interpretation of electrocardiogram (ECG) signals provides a non-invasive and inexpensive technique for analyzing heart activity under various cardiac conditions, and the proliferation of smartphones and wireless networks now makes continuous Holter monitoring possible. However, although considerable attention has been paid to the automated detection and classification of heartbeats from ECG data, classifier learning strategies have never been used to deal with individual variations in cardiac activity. In this paper, we propose a novel method for automatically classifying an individual's ECG beats during Holter monitoring. We use the Pan-Tompkins algorithm to accurately extract features such as the QRS complex and the P wave, and employ a decision tree to classify each beat in terms of these features. Evaluations on the MIT-BIH arrhythmia database yield heartbeat classification accuracies of 94.6% before and 99% after the decision tree is personalized using a patient's own ECG data. These results are comparable to those of state-of-the-art schemes, validating the efficacy of the proposed method. © 2014 Elsevier Ltd. All rights reserved.

Keywords: Heartbeat classification; Decision tree model; Personalization; Electrocardiogram; Pan-Tompkins algorithm; Holter monitoring

1. Introduction

Arrhythmia is an irregular heartbeat or abnormal heart rhythm. Its symptoms are diverse, ranging from minor chest palpitations to a sudden heart attack. Arrhythmia that is detected at a late stage is liable to recur frequently, which is associated with a high risk of death [1]. It is therefore important for patients showing symptoms of arrhythmia, however mild, to be diagnosed as early as possible. Unfortunately, many patients are unaware of their symptoms or are unwilling to visit a hospital, and even those who seek diagnosis may exhibit normal cardiac behavior during their visit. Consequently, there is growing demand for a remote electrocardiogram (ECG) monitoring system that can function continuously and at any location. Fortunately, recently developed Holter monitoring devices can be integrated with modern smartphones to serve precisely this function (see Fig. 1), allowing patients to continue their normal routines while their ECG data are sent over a wireless network to a monitoring center for interpretation. If a decision support system identifies a dangerous condition, it alerts the medical staff at the monitoring center, who make a diagnosis and then immediately inform the patient of the most appropriate action to take.

Individual heartbeats in ECG data are primarily characterized by five peaks and valleys (labeled P, Q, R, S, and T) [2], which can be used to detect arrhythmia. A number of successful arrhythmia classification systems [3–16] have been reported; however, none of them takes individual cardiac peculiarities into account, even though it has been demonstrated that individuals of the same age, sex, weight, and height can have completely different baseline ECG patterns [17]. It has even been suggested that individuals could be identified by their ECG signals [17,18]. On the basis of these observations, we propose a new method for the arrhythmia classification of an individual's heartbeats during Holter monitoring. Specifically, we use the Pan-Tompkins algorithm [2] to accurately extract ECG features, and then use a decision tree to classify beats according to these features. We demonstrate the efficacy of this decision tree classifier through extensive experiments on the MIT-BIH arrhythmia database [19]. Our work makes two main contributions: (1) a systematic design methodology for a remote ECG monitoring system, and (2) a more accurate classification method that reduces the number of false alarms and missed events by considering more types of heartbeat.

The remainder of this paper is organized as follows. In Section 2, we briefly review work related to ECG feature detection and classification. Section 3 presents our proposed method for extracting features and classifying an individual's ECG beats. In Section 4, we profile our experimental dataset, and Sections 5 and 6 present and analyze our experimental results. Finally, in Section 7, we give some concluding remarks and outline possible directions for future study.

☆ A preliminary version of this paper appeared in the Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013).
* Corresponding author. E-mail address: [email protected] (K. Kang).

http://dx.doi.org/10.1016/j.compbiomed.2014.08.013
0010-4825/© 2014 Elsevier Ltd. All rights reserved.

2. Related work

In this section, we briefly describe previous work related to heartbeat detection and classification.

2.1. Feature extraction

ECG traces of heartbeats reflect the two main activities of the heart: ventricular activity, characterized by the QRS complex and T waves, and atrial activity, characterized by P waves [20]. Analysis of the QRS complex remains the simplest noninvasive method of diagnosing a variety of heart diseases. Pan and Tompkins [2] developed a real-time algorithm that is widely used to detect the QRS complex in ECG beats, and a number of researchers have proposed detection systems based on the QRS complex. For example, Ye et al. [3] and Prasad and Sahambi [5] characterized ECG beats using wavelet features. The wavelet transform is a powerful tool for analyzing non-stationary signals, but it is computationally expensive. De Chazal et al. [6,7], Rodriguez et al. [9], Llamedo and Martinez [10], De Oliveira et al. [12], Yeap et al. [13], and Zhang et al. [14] used waveform features to detect the QRS complex, while Osowski et al. [8] used higher-order statistics and Hermite coefficient features. Hermite coefficients can be effective for modeling cumulative indicators, but complicated signals must then be approximated by a linear combination of Hermite functions. Chiarugi et al. [20] and Almeida et al. [21] proposed techniques for detecting P waves, which characterize atrial activity, once the QRS complex has been detected.
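The Pan-Tompkins detector [2] processes the ECG in the five stages that appear as panels in Fig. 4: low-pass filtering, high-pass filtering, differentiation, squaring, and moving-window integration. The Python sketch below illustrates those stages using the integer filter coefficients published for 200 Hz sampling; it is not the authors' implementation. As simplifying assumptions, the adaptive dual thresholds and search-back logic of the original algorithm are replaced by one fixed threshold (the hypothetical parameter `threshold_ratio`) and a 200 ms refractory period, and the coefficients are not redesigned for other sampling rates such as the 360 Hz of the MIT-BIH records.

```python
from typing import List, Sequence


def _at(x: Sequence[float], i: int) -> float:
    """Return x[i], treating samples before the start of the record as zero."""
    return x[i] if i >= 0 else 0.0


def moving_sum(x: Sequence[float], width: int) -> List[float]:
    """Causal running sum of the most recent `width` samples."""
    out: List[float] = []
    acc = 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]
        out.append(acc)
    return out


def pan_tompkins_stages(x: Sequence[float], fs: float) -> List[float]:
    """Apply the five filtering stages and return the integrated signal."""
    # (1) Low-pass: H(z) = (1 - z^-6)^2 / (1 - z^-1)^2, i.e. two 6-tap running sums.
    lp = moving_sum(moving_sum(x, 6), 6)
    # (2) High-pass: the input delayed by 16 samples minus a 32-tap moving average.
    lp_sum32 = moving_sum(lp, 32)
    hp = [_at(lp, n - 16) - lp_sum32[n] / 32 for n in range(len(lp))]
    # (3) Five-point derivative, made causal by a two-sample delay.
    d = [
        (2 * hp[n] + _at(hp, n - 1) - _at(hp, n - 3) - 2 * _at(hp, n - 4)) / 8
        for n in range(len(hp))
    ]
    # (4) Squaring: all values become positive and steep QRS slopes dominate.
    sq = [v * v for v in d]
    # (5) Moving-window integration over roughly 150 ms.
    width = max(1, round(0.150 * fs))
    return [s / width for s in moving_sum(sq, width)]


def detect_qrs(x: Sequence[float], fs: float,
               threshold_ratio: float = 0.3, refractory_s: float = 0.2) -> List[int]:
    """Return indices of QRS peaks in the integrated signal.

    Simplification: a single fixed threshold (a fraction of the global maximum)
    stands in for the adaptive thresholds of the original algorithm.
    """
    mwi = pan_tompkins_stages(x, fs)
    if not mwi:
        return []
    threshold = threshold_ratio * max(mwi)
    refractory = round(refractory_s * fs)
    peaks: List[int] = []
    for n in range(1, len(mwi) - 1):
        is_local_max = mwi[n] >= mwi[n - 1] and mwi[n] > mwi[n + 1]
        if not is_local_max or mwi[n] < threshold:
            continue
        if not peaks or n - peaks[-1] > refractory:
            peaks.append(n)
        elif mwi[n] > mwi[peaks[-1]]:
            # Keep only the strongest peak within one refractory period.
            peaks[-1] = n
    return peaks
```

Because every stage is causal, each returned index lags the corresponding R point by the combined filter delay, so recovering pos-R itself would require a further search of the unfiltered signal near each detection.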

Fig. 1. Architecture of a remote cardiac monitoring system.

2.2. Classification of heartbeats from ECG data

Recently, a variety of automatic classification methods for ECG beats based on machine learning algorithms have been proposed [3–16]. Ye et al. [3] and Osowski et al. [8] used a support vector machine (SVM), a classifier known for its strong performance and good generalization. Ceylan et al. [4], Prasad and Sahambi [5], Yeap et al. [13], and Haseena et al. [15] used artificial neural networks (ANNs) based on a combination of many classifiers. De Chazal et al. [6,7] and Llamedo and Martinez [10] used linear discriminants for heartbeat classification, while Rodriguez et al. [9] and Zhang et al. [14] used decision trees, one of the most common and practical methods for inductive inference. De Lannoy et al. [11] used conditional random fields to classify sequential observations of ECG beats, and De Oliveira et al. [12] used dynamic Bayesian networks to combine information from adjacent events. However, SVM and ANN models are difficult to understand and interpret. A decision tree is much simpler, and has been shown to provide good results in arrhythmia detection [9].

Fig. 2. Overview of the proposed system.

Table 1
Comparison of related work.

Reference        | Number of segments/beats | Feature extraction  | Classification       | Number of types | Accuracy (%)
L. Zhang [14]    | 31,000                   | Wavelet             | ID3                  | 6               | 96.31
Z. Masetic [22]  | 2800                     | Autoregressive Burg | C4.5                 | 2               | 99.86
F. Charfi [24]   | –                        | Pan–Tompkins        | C4.5                 | 4               | 96.87
A. Mert [25]     | 56,569                   | Waveform            | Bagged decision tree | 6               | 99.34

2.3. Classification of ECG beats using a decision tree

The decision tree is one of the most widely used and practical learning methods, and a number of researchers have applied decision trees to heartbeat classification [14,22,24,25]. Zhang et al. [14] extracted features from ECG waveforms using wavelet transforms, and then applied a decision tree to classify the ECG signals. In the feature extraction process, they used principal component analysis to remove the mutual dependence of features, applied independent component analysis to make the features independent, and added the RR interval (the interval between successive R points); the ID3 algorithm was used for classification. They classified 31,000 beat segments from the MIT-BIH arrhythmia database as normal (N), left bundle branch block (B), right bundle branch block (R), paced beat (P), ventricular premature beat (V), or atrial premature beat (A), and achieved a recognition accuracy of 96.31%. Masetic and Subasi [22] proposed a method of detecting congestive heart failure using the autoregressive Burg algorithm and the C4.5 decision tree [23]. Beat segments were detected and separated into two categories: 'normal' and 'congestive heart failure'. Their decision tree was trained using 1300 beat segments from the MIT-BIH arrhythmia database and 1500 beat segments from the BIDMC congestive heart failure database, and the proposed model achieved an accuracy of 99.86%.
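To make the ID3 and C4.5 splitting criteria concrete, the following sketch derives an RR-interval feature (inter-RR in Table 2) from R-point positions and scores a binary threshold split on it. ID3 chooses the attribute with the largest information gain, that is, the drop in class entropy after a split; C4.5 divides that gain by the split information (the gain ratio) and handles continuous attributes by testing thresholds between sorted values. Since ID3 was designed for categorical attributes, the continuous feature is split here in the manner of C4.5. The function names and the toy record are hypothetical, and the code illustrates the criteria rather than the procedure of any cited study.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def rr_intervals(r_positions: Sequence[int], fs: float) -> List[float]:
    """inter-RR feature: seconds between consecutive R-point sample indices."""
    return [(b - a) / fs for a, b in zip(r_positions, r_positions[1:])]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_info_gain(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Information gain (ID3 criterion) of the binary split value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio (C4.5 criterion): information gain divided by split information."""
    n_left = sum(1 for v in values if v <= threshold)
    split_info = entropy(["left"] * n_left + ["right"] * (len(values) - n_left))
    if split_info == 0.0:
        return 0.0  # every sample falls on one side, so the split is useless
    return split_info_gain(values, labels, threshold) / split_info


Criterion = Callable[[Sequence[float], Sequence[str], float], float]


def best_threshold(values: Sequence[float], labels: Sequence[str], criterion: Criterion) -> float:
    """Score midpoints between consecutive distinct values (needs >= 2 distinct values)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: criterion(values, labels, t))


if __name__ == "__main__":
    # Hypothetical toy record: R points at 360 Hz, with two short (premature) intervals.
    rr = rr_intervals([0, 288, 576, 756, 1044, 1332, 1512, 1800], fs=360)
    labels = ["N", "N", "V", "N", "N", "V", "N"]
    t = best_threshold(rr, labels, gain_ratio)
    print(t, split_info_gain(rr, labels, t), gain_ratio(rr, labels, t))
```

On this toy record the two short intervals labeled V are separated perfectly at a threshold of 0.65 s, so the information gain equals the entropy of the labels; a full tree learner would apply the same scoring recursively to each child node.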


Table 2
Description of the extracted features.

Feature   | Description
pos-R     | Position of R point
pos-P     | Position of P point
val-R     | Value of R point (amplitude)
val-P     | Value of P point (amplitude)
inter-RR  | Interval between two consecutive R waves
inter-PR  | Interval between the P and R points

Fig. 3. ECG signal.
[Figure: original ECG signal, MLII lead; y-axis: Volts; x-axis: Time intervals (0.003 sec.)]

Fig. 4. Heartbeat detection module.
[Figure panels: (1) Low pass filter, (2) High pass filter, (3) Derivative, (4) Squaring, (5) Moving window integration; y-axis: Volts; x-axis: Time intervals (0.003 sec.)]

Fig. 5. Concept of the entropy of attributes.
[Figure: example attributes labeled Entropy = 0 and Entropy = 1]

Charfi and Kraiem [24] conducted a comparative study of ECG classification using decision trees. They used the Pan–Tompkins algorithm for feature extraction, and compared the C4.5, Improved C4.5, chi-squared automatic interaction detector (CHAID), and Improved CHAID classification algorithms. In classifying four types of beats (normal, right bundle branch block, and atrial fibrillation), C4.5 achieved the best accuracy (96.87%). Mert et al. [25] proposed an approach to ECG classification that uses an ensemble decision tree. In the feature extraction process, they used filtering to remove direct current (DC) bias and power

