Neural Networks 53 (2014) 1–7


Cross-person activity recognition using reduced kernel extreme learning machine

Wan-Yu Deng a,∗, Qing-Hua Zheng b, Zhong-Min Wang a

a School of Computer Science and Technology, Xi'an University of Posts & Telecommunications, 710121, China
b MOEKLINNS Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, 710049, China

Article info

Article history: Received 30 September 2013; Received in revised form 26 December 2013; Accepted 16 January 2014.

Keywords: Extreme learning machine; Reduced kernel extreme learning machine; Activity recognition; Support vector machine

Abstract

Activity recognition based on mobile embedded accelerometers is very important for developing human-centric pervasive applications such as healthcare and personalized recommendation. However, the distribution of accelerometer data is heavily affected by the individual user, so the performance of a model trained on one person degrades when it is used on others. To solve this problem, we propose a fast and accurate cross-person activity recognition model, TransRKELM (Transfer learning Reduced Kernel Extreme Learning Machine), which uses RKELM (Reduced Kernel Extreme Learning Machine) to build the initial activity recognition model. In the online phase, OS-RKELM (Online Sequential Reduced Kernel Extreme Learning Machine) is applied to update the initial model efficiently and adapt it to new device users based on recognition results with a high confidence level. Experimental results show that the proposed model can adapt the classifier to new device users quickly and obtain good recognition performance. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Automatically recognizing motion activities is very important for applications in areas such as healthcare, elderly care and personalized recommendation. Many activity recognition approaches have been established which use special-purpose hardware devices, as in Dijkstra, Kamsma, and Zijlstra (2010), or body sensor networks (Mannini & Sabatini, 2010). Although the use of numerous sensors could improve the performance of a recognition algorithm, it is unrealistic to expect the general public to use them in their daily activities because of the difficulty and time required to wear them. Since the appearance of the first commercial hand-held mobile phones in 1979, an accelerated growth has been observed in the mobile phone market, which by 2011 had reached nearly 80% of the world population (Anguita, Ghio, Oneto, Parra, & Reyes-Ortiz, 2012). With the development of micro-electro-mechanical systems, accelerometers have been miniaturized so that they can be embedded into small mobile devices, and we benefit from this to classify a set of physical activities (standing, sitting, laying, walking, walking upstairs and walking downstairs) by processing inertial body signals through a supervised machine learning



∗ Corresponding author. Tel.: +86 34534534. E-mail addresses: [email protected], [email protected] (W.-Y. Deng).
http://dx.doi.org/10.1016/j.neunet.2014.01.008

algorithm on hardware with limited resources. Compared with traditional wearable activity recognition (Roggen, Magnenat, Waibel, & Troster, 2011), developing activity recognition applications on smart phones has several advantages, such as easy device portability without the need for additional fixed equipment, and comfort for the user due to the unobtrusive sensing. One drawback of the smart phone-based approach is that energy and services on the mobile phone are shared with other applications, which becomes critical in devices with limited resources. Additionally, when the device is used by different users, the embedded accelerometer may sense different forces, because the movement patterns of different users are distinct even when they perform the same activity. The model learnt from a specific person therefore often cannot yield accurate results when used on a different person. To solve the cross-person activity recognition problem, we propose a novel, fast and simple adaptive model that can be embedded in mobile devices with limited computing resources, storage and power. First, the readings along the three axes are synthesized, and the magnitude of the synthesized acceleration is used to extract features. This eliminates the orientation dependence of the mobile device, at the cost of losing direction information. Second, RKELM (Reduced Kernel Extreme Learning Machine) is used to build the initial recognition model in the offline phase, for its fast learning speed and good generalization ability. Finally, for new users in the online phase, recognition results with high confidence are selected to generate a new training dataset, on which the recognition model is updated by taking advantage of the fast incremental


W.-Y. Deng et al. / Neural Networks 53 (2014) 1–7

Fig. 1. The overview of TransRKELM activity recognition.

updating speed and low memory cost of OS-RKELM (Online Sequential Reduced Kernel Extreme Learning Machine). Experimental results show the efficiency and high recognition ability of the proposed model. The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed model. Experiments and results are presented in Section 4. Finally, we conclude the paper in Section 5.

2. Related work

Activity recognition (Taylor, 2009) based on mobile accelerometers has attracted much attention for its wide applications in healthcare, personalized recommendation, advertisement serving and so on (Cambria & Hussain, 2012; Mital, Smith, Hill, & Henderson, 2011; Wollmer, Eyben, Graves, Schuller, & Rigoll, 2010). In the work of Ravi, Dandekar, Mysore, and Littman (2005) and Ward, Lukowicz, and Gellersen (2011), the authors used multiple accelerometers to classify different activities. Chen, Qi, Sun, and Ning (2010) used a smart phone to detect six activities in order to find the state switch point. These models can achieve high recognition accuracies because their testing and training samples come from the same batch and follow the same distribution. But since accelerometer signals generated by different people follow different distributions, the performance of a model trained on one person deteriorates when it is used on other people. Researchers have applied transfer learning to activity recognition when the persons, locations or labels change. For example, Zhao, Chen, Liu, Shen, and Liu (2011) propose a transfer learning-based algorithm which integrates a decision tree and k-means clustering for personalized activity recognition model adaptation. Chen, Zhao, Wang, and Chen (2012) trained a classifier and used it to classify the samples of the target person; the samples with high confidence were then labeled and added to the training set, and finally the model was retrained from scratch. Zheng, Hu, and Yang (2009) trained a similarity function between the activities in the source domain and the target domain, and then transferred the data from the source domain to a target domain with a different label space. won Lee and Giraud-Carrier (2007) proposed a TDT (Transfer Decision Trees) algorithm. TDT assumes that the set of attributes of the source task is a proper subset of the attributes of the target task; it learns a partial decision tree model from the source task and then transforms it as required by the training data in the target task. Torrey, Walker, Shavlik, and Maclin (2005) presented a method for transferring knowledge learnt in one task to a related task by reinforcement learning. They needed a human teacher to provide a mapping from the source task to the target task to guide this knowledge transfer. In Stikic, Van Laerhoven, and Schiele (2008), collaborative learning was introduced to add unlabeled samples to the training dataset and achieved better models, but it needs two or more learners trained separately on different feature sets, which is complicated and not power efficient.

3. The TransRKELM algorithm

In this section, we present the proposed cross-person activity recognition algorithm in detail. As illustrated in Fig. 1, the algorithm mainly contains two steps: offline classification model construction and online adaptation to a new user.

Step (1) Offline classification model construction and online activity recognition. For offline classification model construction, the readings of the three axes are first synthesized into a magnitude series to remove orientation dependence. Statistical and frequency-domain features are extracted from the magnitude series of the synthesized acceleration. Then, for its fast learning speed and good generalization capability, RKELM is used to build the classification model. For online activity recognition, each unlabeled testing sample is generated with the same method as in the offline phase; the sample is then classified by the RKELM classifier and the classification result is obtained.

Step (2) Activity recognition model updating. Based on the classification results, the confidence that a sample is correctly classified is estimated. The samples whose confidences are greater than a threshold, g, are selected to generate new training samples. Then, the RKELM model is incrementally updated using OS-RKELM. As the new training samples are collected from a new user, the updated model gradually adapts to this user (Chen et al., 2012).

3.1. Acceleration synthesization

An accelerometer detects and transforms changes in capacitance into an analog output voltage, which is proportional to acceleration. For a triaxial accelerometer, the output voltages can be mapped into accelerations along three axes, $a_x$, $a_y$, $a_z$. As $a_x$, $a_y$, $a_z$ are the orthogonal decompositions of the real acceleration, the magnitude of the synthesized acceleration can be expressed as

$$a = \sqrt{a_x^2 + a_y^2 + a_z^2}.$$

$a$ is the magnitude of the real acceleration but carries no directional information. Therefore, the acceleration magnitude-based activity recognition model is orientation independent (Chen et al., 2012).

3.2. Acceleration feature extraction

The sensor acceleration signal was separated into body acceleration and gravity using a Butterworth low-pass filter. The gravitational force has mostly low-frequency components, so a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of 17 features is obtained by calculating variables from the accelerometer signals in the time and frequency domains. These features include the mean, standard deviation, energy, mean-crossing rate, maximum value, minimum value, first quartile, second quartile, third quartile, four amplitude statistic features and four shape statistic features of the power spectral density (PSD) (Anguita et al., 2012). The Fast Fourier Transform is used to obtain the frequency-domain features. Finally, these features are combined as the input (a total of 561 features, 10 299 instances) of the activity recognition model (Anguita et al., 2012).

3.3. RKELM-based recognition model construction

Huang's Kernel Extreme Learning Machine (Huang's KELM) (Huang, Zhou, Ding, & Zhang, 2012; Huang, Zhu, & Siew, 2006) has attracted wide attention in the machine learning field for its good generalization, even better than that of the SVM (Support Vector Machine) (Huang et al., 2012). But it is unsuitable for activity recognition on mobile devices because: (1) the kernel matrix $K(X, X)$, which must be constructed on the entire dataset, is prohibitively expensive to store, especially when the number of samples increases dramatically; and (2) the model is difficult to update quickly in the online phase, since each new sample inflates the kernel matrix in both rows and columns.
To tackle these problems, we adopt a random sampling method to select a small random subset $\tilde{X} = \{\tilde{x}_i\}_{i=1}^{\tilde{n}}$ from the original $n$ data points ($\tilde{n} \ll n$), and use $K(X, \tilde{X})$ in place of $K(X, X)$ to reduce the problem size of Huang's KELM. The reduced kernel matrix $\tilde{K}$ can be computed as follows:

$$\tilde{K} = K(X, \tilde{X}) = \begin{bmatrix} K(x_1, \tilde{x}_1) & \cdots & K(x_1, \tilde{x}_{\tilde{n}}) \\ \vdots & \ddots & \vdots \\ K(x_n, \tilde{x}_1) & \cdots & K(x_n, \tilde{x}_{\tilde{n}}) \end{bmatrix}_{n \times \tilde{n}} \quad (1)$$

and the output weights can be obtained by

$$\beta = \left( \frac{I}{\lambda} + \tilde{K}^T \tilde{K} \right)^{-1} \tilde{K}^T T \quad (2)$$

where $\lambda$ is the regularization parameter used to relax the over-fitting problem. Although $\tilde{n}$ is usually small, it can also be set to a large value; a larger $\tilde{n}$ yields recognition accuracy closer to that of Huang's KELM, but makes training more expensive. In the extreme case $\tilde{n} \to n$, RKELM becomes Huang's KELM. If generalization performance is the main concern, $\tilde{n}$ can be increased; otherwise $\tilde{n}$ can be kept small. This random method has also been used in SVMs (Lee & Huang, 2007; Liu, He, & Shi, 2008). In the work of Liu et al. (2008), the authors use $h(x) = [h(w_1, b_1, x), \ldots, h(w_{\tilde{n}}, b_{\tilde{n}}, x)]$ as the mapping function $\phi(x)$ to construct the kernel matrix $\Omega_{i,j} = h(x_i) \cdot h(x_j)$, where $w_1, \ldots, w_{\tilde{n}}, b_1, \ldots, b_{\tilde{n}}$ are random values. In the work of Lee and Huang (2007), a subset $\tilde{X}$ is randomly selected from the dataset and $K(X, \tilde{X})$ is used in place of $K(X, X)$ to train a smooth support vector machine. The experiments in the literature (Lee & Huang, 2007; Liu et al., 2008) show that this random method can produce competitive generalization performance. The experiments in our previous work (Deng & Zheng, in press-a; Deng, Zheng, & Zhang, 2013) also show that RKELM can produce good generalization performance at a fast learning speed. From a theoretical aspect, we have proved that the generalization performance of RKELM is mainly guaranteed by the size of the subset rather than the quality of the subset (Deng & Zheng, in press-a); random subsets of the same size should produce similar generalization performance. The experiments in the performance evaluation section also illustrate this point. Of course, just as in the work of Wang, Kong, and Zheng (2010), where clustering is applied to optimize the algorithm, the performance of RKELM could be improved further if an optimization technique were used for sample selection. In our current work, however, we retain the random method for its simplicity and efficiency. The kernel function can be the Gaussian kernel $K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / \sigma}$, the polynomial kernel $K(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$, the sigmoid kernel $K(x_i, x_j) = \tanh(\rho \langle x_i, x_j \rangle + c)$, and so on. After extensive testing, we selected the Gaussian kernel $K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / \sigma}$ for its good performance in our current work. The RKELM algorithm can be summarized as follows.

Algorithm 1 (RKELM-Based Recognition Model). Given an activity recognition training set $\aleph = \{X = \{x_i \in R^m\}_{i=1}^{n}, T = \{t_i \in R^{\ell}\}_{i=1}^{n}\}$, a kernel function $K(x_i, x_j)$, a nonzero constant $\lambda$, and the number of selected samples $\tilde{n}$:

(1) Randomly choose $\tilde{n}$ samples $\tilde{X}$ from $X$ ($\tilde{n}$ is typically about 10% of $n$);
(2) Compute the reduced kernel matrix $\tilde{K} = K(X, \tilde{X})$ as in (1);
(3) Compute the temporary matrix
$$Z = \frac{I}{\lambda} + \tilde{K}^T \tilde{K}. \quad (3)$$
(4) Compute the output weights
$$\beta = Z^{-1} \tilde{K}^T T. \quad (4)$$
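As a concrete illustration, Algorithm 1 can be sketched in a few lines of NumPy (a hypothetical re-implementation for exposition, not the authors' MATLAB code; all function and variable names are our own):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Pairwise Gaussian kernel K(a, b) = exp(-||a - b||^2 / sigma)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma)

def train_rkelm(X, T, n_tilde, lam, sigma, seed=0):
    """Algorithm 1: random support subset -> reduced kernel -> output weights."""
    rng = np.random.default_rng(seed)
    # (1) randomly choose n_tilde support samples X_tilde from X
    idx = rng.choice(len(X), size=n_tilde, replace=False)
    X_tilde = X[idx]
    # (2) reduced kernel matrix K_tilde = K(X, X_tilde), shape n x n_tilde
    K = gaussian_kernel(X, X_tilde, sigma)
    # (3)-(4) beta = (I/lam + K^T K)^{-1} K^T T, solved without an explicit inverse
    Z = np.eye(n_tilde) / lam + K.T @ K
    beta = np.linalg.solve(Z, K.T @ T)
    return X_tilde, beta
```

Solving an $\tilde{n} \times \tilde{n}$ system instead of an $n \times n$ one is exactly where the memory and speed savings over Huang's KELM come from.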

Remark 1. Unlike SVMs, which use ingenious algorithms to find good support vectors, RKELM obtains its support vectors by random selection instead of iterative computation; thus RKELM is much faster than SVM.

Remark 2. KELM differs from ELM in two aspects: (i) the parameter $x_j$ of RKELM depends on the training data, which may bring more knowledge of the training data into model training and enables KELM to produce better generalization performance; (ii) the conclusions and theorems of ELM, such as the universal learning condition, are no longer valid for RKELM, so a new theory of RKELM needs to be established. For the details of RKELM please refer to Deng and Zheng (in press-a).

In the testing phase, for a testing sample $x$, the outputs can be calculated as follows:

$$TY_{1 \times \ell} = [K(x, \tilde{x}_1), \ldots, K(x, \tilde{x}_{\tilde{n}})]_{1 \times \tilde{n}} \, \beta_{\tilde{n} \times \ell} \quad (5)$$

$TY_{1 \times \ell} = [o_1, \ldots, o_{\ell}]$; $\ell$ is the number of output nodes, which equals the number of classes in the classification problem. RKELM then selects the maximum value of $TY_{1 \times \ell}$ and assigns its corresponding index, $j$, as the class label of the test sample. The confidence of the sample in the assigned class can be calculated by the following steps:

$$TY_i = TY_i - \min_i(TY_i) \quad (6)$$

$$\text{confidence} = \max_i(TY_i) \Big/ \sum_i TY_i. \quad (7)$$
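Equations (5)-(7) amount to the following small routine (an illustrative NumPy sketch under our own naming; the degenerate case where all outputs are equal is not handled):

```python
import numpy as np

def classify_with_confidence(TY):
    """Eq. (5) gives the raw outputs TY; eqs. (6)-(7) turn them into a
    predicted label plus a confidence score in (0, 1]."""
    TY = np.asarray(TY, dtype=float)
    label = int(TY.argmax())                     # class index with the maximum output
    shifted = TY - TY.min()                      # eq. (6): make all outputs non-negative
    confidence = shifted.max() / shifted.sum()   # eq. (7)
    return label, confidence
```

For example, outputs [0.1, 0.9, 0.0] yield label 1 with confidence 0.9; the nearer the confidence is to 1, the more the winning class dominates the other outputs.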

Table 1
Activity sample information.

Activity Name         Label   Number of samples
WALKING               1       1 722
WALKING_UPSTAIRS      2       1 544
WALKING_DOWNSTAIRS    3       1 406
SITTING               4       1 777
STANDING              5       1 906
LAYING                6       1 944
Total samples                 10 299

3.4. Recognition model adaptation to new people

In the online phase, the activity recognition model is used by new people. In order to adapt to new people, the recognition model is updated based on recognition results with high confidence by the following online learning algorithm. The model adaptation process contains two steps: (1) during recognition, the test samples whose classification confidence is larger than a threshold g are reserved and used as new training samples; (2) when the number of new training samples exceeds a predefined threshold, OS-RKELM (Deng & Zheng, in press-b) is used to update the recognition model incrementally on the new training dataset. In this section, we mainly describe the OS-RKELM algorithm. OS-RKELM derives from the sequential implementation of the least-squares solution of (4).

The output weights trained on $\aleph_0 = [X_0 = \{x_i\}_{i=1}^{n_0}, T_0 = \{t_i\}_{i=1}^{n_0}]$ are denoted as

$$\beta^{(0)} = Z_0^{-1} K_0^T T_0 \quad (8)$$

where $Z_0 = \frac{I}{\lambda} + K_0^T K_0$ and $K_0 = K(X_0, \tilde{X})$. Suppose now $n_1$ new samples $\aleph_1 = \{X_1 = \{x_i\}_{i=1}^{n_1}, T_1 = \{t_i\}_{i=1}^{n_1}\}$ are selected by (6) and (7) to adapt the recognition model. Considering both chunks of training data $\aleph_0$ and $\aleph_1$, the output weights become

$$\beta^{(1)} = Z_1^{-1} \begin{bmatrix} K_0 \\ K_1 \end{bmatrix}^T \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \quad (9)$$

where $Z_1 = \frac{I}{\lambda} + \begin{bmatrix} K_0 \\ K_1 \end{bmatrix}^T \begin{bmatrix} K_0 \\ K_1 \end{bmatrix}$ and $K_1 = K(X_1, \tilde{X})$. For sequential learning, we should express $\beta^{(1)}$ as a function of $\beta^{(0)}$ and $K_1$, not as a function of the dataset $\aleph_0$. Now $Z_1$ can be written as

$$Z_1 = \frac{I}{\lambda} + K_0^T K_0 + K_1^T K_1 = Z_0 + K_1^T K_1 \quad (10)$$

and

$$\begin{bmatrix} K_0 \\ K_1 \end{bmatrix}^T \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} = K_0^T T_0 + K_1^T T_1 = Z_0 Z_0^{-1} K_0^T T_0 + K_1^T T_1 = Z_0 \beta^{(0)} + K_1^T T_1 = (Z_1 - K_1^T K_1) \beta^{(0)} + K_1^T T_1 = Z_1 \beta^{(0)} - K_1^T K_1 \beta^{(0)} + K_1^T T_1. \quad (11)$$

Substituting (10) and (11) into (9), $\beta^{(1)}$ is given by

$$\beta^{(1)} = Z_1^{-1} (Z_1 \beta^{(0)} - K_1^T K_1 \beta^{(0)} + K_1^T T_1) = \beta^{(0)} + Z_1^{-1} K_1^T (T_1 - K_1 \beta^{(0)}). \quad (12)$$

In formula (12), $Z_1^{-1}$ rather than $Z_1$ is used to compute $\beta^{(1)}$ from $\beta^{(0)}$. The update formula for $Z_1^{-1}$ is derived using the Woodbury formula (Golub & Van Loan, 1996):

$$Z_1^{-1} = (Z_0 + K_1^T K_1)^{-1} = Z_0^{-1} - Z_0^{-1} K_1^T (I + K_1 Z_0^{-1} K_1^T)^{-1} K_1 Z_0^{-1}. \quad (13)$$

Let $G_0 = Z_0^{-1}$ and $G_1 = Z_1^{-1}$; then the equations for updating $\beta^{(1)}$ can be written as

$$G_1 = G_0 - G_0 K_1^T (I + K_1 G_0 K_1^T)^{-1} K_1 G_0, \qquad \beta^{(1)} = \beta^{(0)} + G_1 K_1^T (T_1 - K_1 \beta^{(0)}). \quad (14)$$

(14) gives the recursive formula for $\beta^{(1)}$. The recognition model adaptation algorithm can be summarized as follows.

Algorithm 2 (Recognition Model Adaptation). Given the initial model $\beta^{(0)}$, $\tilde{X}$, and $G_0 = (\frac{I}{\lambda} + K_0^T K_0)^{-1}$, and new training samples $\{X_1 = \{x_i\}_{i=1}^{n_1}, T_1 = \{t_i\}_{i=1}^{n_1}\}$:

(1) Calculate the incremental kernel matrix $K_1 = K(X_1, \tilde{X})$;
(2) Calculate $G_1 = G_0 - G_0 K_1^T (I + K_1 G_0 K_1^T)^{-1} K_1 G_0$;
(3) Calculate the output weights $\beta^{(1)} = \beta^{(0)} + G_1 K_1^T (T_1 - K_1 \beta^{(0)})$.

From (14), it can be seen that the sequential implementation of the least-squares solution of (4) is similar to the RLS (Recursive Least-Squares) algorithm in Chong and Zak (2013). Hence, all the convergence results of RLS apply here. The updating algorithm in Algorithm 2 does not need the old training samples; once samples have been used in Algorithm 1 they can be discarded. This saves memory and power, which is suitable for mobile devices.
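The chunk update above can be sketched as follows (a hypothetical NumPy illustration with our own naming, not the authors' implementation). Because of the Woodbury identity (13), the incremental result matches batch retraining on all data exactly:

```python
import numpy as np

def osrkelm_update(beta, G, K1, T1):
    """One OS-RKELM chunk update, eqs. (13)-(14).
    beta: current output weights; G: current (I/lam + K^T K)^{-1} over all data
    seen so far; K1 = K(X1, X_tilde): reduced kernel rows of the new chunk;
    T1: targets of the new chunk."""
    # G1 = G - G K1^T (I + K1 G K1^T)^{-1} K1 G   (Woodbury identity, eq. (13))
    S = np.eye(K1.shape[0]) + K1 @ G @ K1.T
    G1 = G - G @ K1.T @ np.linalg.solve(S, K1 @ G)
    # beta1 = beta + G1 K1^T (T1 - K1 beta)       (eq. (14))
    beta1 = beta + G1 @ K1.T @ (T1 - K1 @ beta)
    return beta1, G1
```

Only the $\tilde{n} \times \tilde{n}$ matrix $G$ and the weights $\beta$ are kept between chunks; the old samples themselves can be discarded, which is what makes the scheme memory-friendly on a phone.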

4. Performance evaluation

4.1. Dataset

The activity recognition dataset was collected from 30 users within an age bracket of 19-48 years (Blake & Merz, 1998), and can be downloaded from the UCI Machine Learning Repository.1 Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smart phone on the waist. 3-axial acceleration and 3-axial angular velocity were captured by the embedded accelerometer and gyroscope at a constant rate of 50 Hz, which is sufficient for capturing human body motion. The sensor acceleration signal, which has gravitational and body motion components, was separated into body acceleration and gravity using a Butterworth low-pass filter. The gravitational force is assumed to have only low-frequency components, so a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domains. In our experiments, the sensor signals were labeled manually and randomly partitioned into two sets, with 80% for training and 20% for testing. After the sensor signals were pre-processed by noise filters, they were sampled in fixed-width sliding windows of 2.56 s with 50% overlap. The window width is mainly determined by the periods of the activities: one sliding window should cover 2 to 3 periods, and since one walking step takes around one second, the length of the fixed-width sliding window is set to 2.56 s. Table 1 shows the number of samples obtained for each activity. The dataset consists of 10 299 samples collected from 30 users, which is large enough for the trained model to yield reliable results.

1 http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+ Smartphones.
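The pre-processing just described (magnitude synthesization, 2.56 s windows at 50 Hz, i.e. 128 samples with 50% overlap, then per-window statistics) can be sketched as follows. This is an illustrative subset of the full 561-feature set, with hypothetical function names:

```python
import numpy as np

def magnitude(ax, ay, az):
    """Synthesized acceleration magnitude a = sqrt(ax^2 + ay^2 + az^2)."""
    return np.sqrt(ax**2 + ay**2 + az**2)

def sliding_windows(signal, width=128, overlap=0.5):
    """Fixed-width windows: 128 samples at 50 Hz = 2.56 s, 50% overlap."""
    step = int(width * (1 - overlap))
    return [signal[s:s + width] for s in range(0, len(signal) - width + 1, step)]

def window_features(w):
    """A few of the time-domain features listed in Section 3.2."""
    mean = w.mean()
    return np.array([
        mean, w.std(), (w**2).sum() / len(w),          # mean, std, energy
        ((w[:-1] - mean) * (w[1:] - mean) < 0).sum(),  # mean-crossing count
        w.max(), w.min(),
        *np.percentile(w, [25, 50, 75]),               # first/second/third quartile
    ])
```

Stacking one such feature vector per window produces the sample matrix that is fed to Algorithm 1.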


Table 2
Performance of four classifiers (80% of the data for training and the remaining 20% for testing).

                       RKELM                             Huang's KELM            ELM                     SVM
                       λ = 2^30, ñ = 500, σ = 2^10       λ = 2^10, σ = 2^10      γ = 2^0, ñ = 1000       λ = 2^15
Training time (s)      0.64                              85.47                   2.09                    5.09
Testing time (s)       0.12                              1.60                    0.18                    2.80
Testing accuracy (%)   98.49                             99.05                   96.43                   98.77

Table 3
Confusion matrix (testing accuracy, %) of cross-person recognition by RKELM.

      u1      u2      u3      u4      u5      u6      u7      u8      u9      u10
u1    100.00  78.48   85.92   77.29   78.15   75.08   74.35   86.12   59.72   70.41
u2    91.07   93.44   87.98   73.19   67.88   78.46   75.65   80.07   60.42   60.20
u3    87.90   90.40   100.00  67.19   65.56   76.31   87.99   70.82   50.35   50.34
u4    76.95   80.13   93.26   98.44   87.42   84.92   65.91   67.26   43.75   56.46
u5    57.93   81.13   81.23   88.64   100.00  91.08   64.29   64.41   57.29   72.11
u6    91.64   79.80   81.52   86.12   82.12   96.92   78.90   74.02   37.85   53.74
u7    87.90   83.11   87.39   75.71   82.45   76.92   100.00  78.65   77.08   60.20
u8    92.80   84.44   70.09   67.51   69.54   64.62   73.38   100.00  75.35   65.31
u9    75.50   77.81   73.02   72.56   68.21   68.00   72.41   67.97   87.01   87.07
u10   74.93   80.13   82.70   78.55   78.81   72.92   56.82   77.94   73.96   100.00

4.2. Experimental results

All simulations were conducted in a MATLAB 2010 environment running on an ordinary PC with a 2.53 GHz CPU and 4 GB of memory.

4.2.1. Classification performance comparison

In this section we compare the performance of RKELM, Huang's KELM, ELM, and SVM in activity recognition. MATLAB implementations of ELM and Huang's KELM can be downloaded from Huang's homepage.2 The simulations for SVM are carried out using the LIBSVM package.3 RKELM has been implemented by ourselves in MATLAB.4 The Gaussian kernel K(x, xi) = exp(−‖x − xi‖²/σ) is used in RKELM and SVM, while the sigmoid activation function h(x, a, b) = 1/(1 + e^{−γ(a·x+b)}) is used in ELM. For the Gaussian kernel parameter σ, we estimate the generalization accuracy using different parameter values σ = [2^{−5}, ..., 2^{20}] and select the value with the best generalization performance. After σ is determined, for RKELM we estimate the generalization accuracy using different combinations of the regularization parameter λ and the subset size ñ: λ = [2^{−5}, ..., 2^{20}], while ñ is gradually increased in steps of 5. Average results of 10 trials of simulation with each combination are obtained, and the best performance is reported in this paper. For SVM, we estimate the generalization performance using different λ = [2^{−5}, ..., 2^{20}]; average results of 10 trials with each λ are obtained and the best performance is reported. For ELM with the sigmoid function, we estimate the generalization performance using different parameter values γ = [2^{−10}, ..., 2^{10}] and select an optimized value for γ; then the number of hidden nodes ñ is gradually increased in steps of 5 and the best average performance over 10 trials is reported. The parameter values, training time (seconds), testing time (seconds) and testing accuracy are listed in Table 2.
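The model-selection protocol just described (a grid over powers of two, averaging repeated trials per setting) can be sketched generically; `train_eval` is a caller-supplied callable and the grids below are abbreviated, so this is an illustration rather than the exact experimental script:

```python
import itertools
import numpy as np

def grid_search(train_eval, sigmas, lams, n_tildes, trials=10, seed=0):
    """Return the (sigma, lam, n_tilde) triple with the best mean accuracy.
    train_eval(sigma, lam, n_tilde, trial_seed) -> accuracy on a validation split."""
    rng = np.random.default_rng(seed)
    best_params, best_acc = None, -np.inf
    for sigma, lam, n in itertools.product(sigmas, lams, n_tildes):
        accs = [train_eval(sigma, lam, n, int(rng.integers(1 << 31)))
                for _ in range(trials)]
        if np.mean(accs) > best_acc:
            best_params, best_acc = (sigma, lam, n), float(np.mean(accs))
    return best_params, best_acc

# abbreviated candidate grids in powers of two, in the spirit of the paper
sigmas = [2.0**p for p in range(-5, 21, 5)]
lams = [2.0**p for p in range(-5, 21, 5)]
```

In the paper's protocol, σ is fixed first and then (λ, ñ) are tuned, which shrinks the search from a full 3-D grid to two smaller sweeps.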
As can be seen from Table 2, although RKELM does not achieve the highest accuracy, its accuracy of 98.49% is still competitive. In addition, the training and testing times of RKELM are much lower than those of SVM and Huang's KELM. The average training time of RKELM is 0.64 s, while Huang's KELM and SVM consume 85.47 s,

2 http://www.ntu.edu.sg/home/egbhuang. 3 http://www.csie.ntu.edu.tw/~cjlin/libsvm/#download. 4 http://yunpan.cn/QDqESbIYiKhid.

Fig. 2. The training accuracy and testing accuracy of 100 trials of simulation (λ = 2^30, ñ = 500, σ = 2^10).

and 15.09 s respectively. This indicates that RKELM is much faster than SVM and Huang's KELM. Since the number of selected samples in RKELM is smaller than the number of hidden neurons of ELM, the training and testing times of RKELM are also shorter than those of ELM: ELM consumes 2.09 s for training and 0.18 s for testing, while RKELM needs only 0.64 s and 0.12 s respectively. The testing time of RKELM is clearly lower than that of SVM (2.80 s) and Huang's KELM (1.60 s). In order to test the stability of the RKELM activity recognition model, 100 trials of simulation were conducted and the results are shown in Fig. 2. For each trial, the proportion of the training and testing sets is fixed at 80%:20%, but the order of the data is randomly shuffled. We find that the testing accuracy lies in the range [98%, 99%], with no dramatic ups and downs over the 100 trials. This means that the RKELM activity recognition model is very stable.

4.2.2. Cross-person recognition without model adaptation

In order to evaluate the performance degradation of the recognition model when the device user changes, an experiment on cross-person recognition was conducted. For each person ui, a classifier is learned and tested on all the people. Limited by space, we select only 10 users. The confusion matrices of testing accuracy for RKELM, ELM and SVM are shown in Tables 3-5. The element in the i-th row and j-th column represents the testing accuracy of the model trained on ui and tested on uj. Each sample in the dataset is labeled with its user id, so the training and testing data can easily be selected by user id. From Tables 3-5, we can find that RKELM and ELM obtain similar accuracies on known users. However, the accuracies of


Table 4
Confusion matrix (testing accuracy, %) of cross-person recognition by ELM.

      u1      u2      u3      u4      u5      u6      u7      u8      u9      u10
u1    100.00  61.59   66.57   67.82   63.25   55.69   52.60   63.35   40.28   63.61
u2    55.91   98.36   63.34   64.67   55.96   52.31   61.36   41.99   55.21   44.56
u3    42.94   64.57   100.00  59.62   40.40   46.15   55.19   46.26   47.92   32.99
u4    33.72   48.01   63.64   98.44   43.38   54.46   49.03   43.42   38.89   36.39
u5    49.57   62.91   67.74   69.09   100.00  59.08   45.78   47.33   42.71   41.16
u6    47.26   62.58   62.76   62.15   73.51   100.00  39.94   38.79   32.64   43.20
u7    33.14   47.02   52.20   36.91   33.11   36.62   100.00  34.88   51.04   24.83
u8    70.32   57.28   56.89   58.99   60.93   53.85   49.68   100.00  43.40   43.88
u9    23.92   27.15   29.91   41.32   35.43   23.69   41.23   29.89   77.59   41.50
u10   44.09   59.60   55.72   57.41   67.55   45.85   44.16   48.75   54.86   100.00

Table 5
Confusion matrix (testing accuracy, %) of cross-person recognition by SVM.

      u1      u2      u3      u4      u5      u6      u7      u8      u9      u10
u1    100.00  73.17   87.09   87.06   75.82   72.30   80.19   81.49   72.91   84.01
u2    87.03   90.16   90.61   77.91   76.49   80.30   79.54   72.24   80.20   76.87
u3    92.21   90.72   100.00  85.48   74.50   75.07   92.20   72.95   68.05   63.26
u4    76.65   90.06   91.78   100.00  87.41   78.76   76.29   74.02   71.52   70.74
u5    76.65   81.78   80.64   87.69   100.00  92.30   81.16   72.95   65.97   79.25
u6    90.77   80.79   85.33   82.01   86.42   100.00  81.81   76.15   56.59   82.99
u7    91.93   86.09   86.21   70.66   78.14   74.46   100.00  75.08   80.90   73.46
u8    93.08   90.39   68.32   64.35   68.54   67.38   70.12   100.00  71.52   85.03
u9    83.28   78.80   83.57   81.70   72.18   68.61   83.11   76.51   100.00  81.97
u10   85.30   85.09   78.88   77.60   88.74   80.61   78.57   75.80   80.20   96.61

Table 6
Recognition results on the known users before and after adaptation (A and B are the known users and C is the new user).

               Before adaptation   After adaptation
Train data     TrainAB             TrainAB + HC1
Test data      TestAB              TestAB
Accuracy (%)   96.95               97.71

ELM on new users decrease considerably and are much lower than those of RKELM. SVM achieves 100% accuracy on most known users, which is more accurate than RKELM, while RKELM and SVM obtain similar accuracies on new users.

4.2.3. Cross-person model adaptation

In this section, the experiments aim to test the RKELM model's adaptability to new users. Three users are selected from the 30 users as examples and are denoted A, B and C. The datasets of these users are represented as DA, DB and DC, respectively. Each dataset is randomly divided into two parts (80%:20%), represented as DA1 and DA2, DB1 and DB2, and DC1 and DC2. Without loss of generality, we first assume that A and B are known users and C is a new one. TrainAB, which equals DA1 + DB1, is used to train an RKELM model, called the initial model. The data DC1 are used to adapt the initial model to a new one. TestAB, which equals DA2 + DB2, is used to test the two models' classification capability on the known users. DC2 is used to test the two models' classification capability on the new person. For the initial model and each test sample in DC1, if the classification confidence is larger than the threshold g = 0.75, the sample is added to a new dataset, HC1. Then, using HC1, the recognition model is adapted. The performances of the initial model and the new model on the known users are shown in Table 6. We can see that after model adaptation, the new model has almost the same classification capability as the initial model. The performances of the initial model and the new model on the new person are shown in Table 7. We can see that after adaptation the accuracy is improved by about 4.3%. When B and C are assumed to be known users and A is the new person, the experimental results are shown in Tables 8 and 9. After adaptation, the accuracy is improved by about 1.6%. When A and C are assumed to be known users and B is the new person,

Table 7
Recognition results on the new user before and after adaptation (A and B are the known users and C is the new user).

               Before adaptation   After adaptation
Train data     TrainAB             TrainAB + HC1
Test data      DC2                 DC2
Accuracy (%)   88.41               92.75

Table 8
Recognition results on the known users before and after adaptation (B and C are the known users and A is the new user).

               Before adaptation   After adaptation
Train data     TrainBC             TrainBC + HA1
Test data      TestBC              TestBC
Accuracy (%)   98.46               95.38

Table 9
Recognition results on the new user before and after adaptation (B and C are the known users and A is the new user).

               Before adaptation   After adaptation
Train data     TrainBC             TrainBC + HA1
Test data      DA2                 DA2
Accuracy (%)   88.52               90.16

Table 10
Recognition results on the known users before and after adaptation (A and C are the known users and B is the new user).

                    Train data       Test data   Accuracy (%)
Before adaptation   TrainAC          TestAC      99.2
After adaptation    TrainAC + HB1    TestAC      99.3

the experimental results are shown in Tables 10 and 11. After adaptation, the performance is improved by about 1.4%. To validate the generality of these results, we run 50 trials of the simulation; in each trial, 3 users are randomly selected from the 30 users. Similar improvements are obtained in all 50 trials, which confirms that the cross-person model is effective.

Table 11
Recognition results on the new user before and after adaptation (A and C are the known users and B is the new user).

                    Train data       Test data   Accuracy (%)
Before adaptation   TrainAC          DB2         91.43
After adaptation    TrainAC + HB1    DB2         92.86

5. Conclusions

This paper addresses the recognition of daily activities from an accelerometer embedded in a mobile device when the device users vary. We propose a fast and robust activity recognition model based on RKELM to deal with the problem of varying device users. OS-RKELM, a fast online sequential learning algorithm, is used to update the recognition model online and adapt it to new users. Experimental results demonstrate that model adaptation noticeably improves the recognition accuracy. In the future, we will recruit 300 persons to collect data on more daily activities and consider cross-device, cross-location and other related problems. Additionally, the confidence threshold is very important for the model; if it could be determined automatically according to the real application instead of being set manually, the performance might be improved further, so we will continue to research the automatic determination of this threshold in future work.

Acknowledgments

The research was supported in part by the Innovation Fund Research Group under Grant No. 61221063; the National Science Foundation of China under Grant Nos. 61100166, 91118005, 91218301 and 61373116; the National High Technology Research and Development Program (863) of China under Grant No. 2012AA011003; the Cheung Kong Scholars Program; Key Projects in the National Science and Technology Pillar Program under Grant Nos. 2011BAK08B01, 2012BAH16F02 and 2013BAK09B01; the Shaanxi Provincial Youth Science and Technology Star Plan under Grant No. 2013KJXX-29; and the special funds of the key discipline construction project of Shaanxi Province ordinary high schools.

References

Anguita, D., Ghio, A., Oneto, L., Parra, X., & Reyes-Ortiz, J. L. (2012). Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In Ambient assisted living and home care (pp. 216–223). Springer.
Blake, C. L., & Merz, C. J. (1998).
UCI repository of machine learning databases [http://www.ics.uci.edu/mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
Cambria, E., & Hussain, A. (2012). Sentic computing: techniques, tools, and applications: Vol. 2. Springer.


Chen, Y., Qi, J., Sun, Z., & Ning, Q. (2010). Mining user goals for indoor location-based services with low energy and high QoS. Computational Intelligence, 26(3), 318–336.
Chen, Y., Zhao, Z., Wang, S., & Chen, Z. (2012). Extreme learning machine-based device displacement free activity recognition model. Soft Computing, 16(9), 1617–1625.
Chong, E. K., & Zak, S. H. (2013). An introduction to optimization: Vol. 76. John Wiley & Sons.
Deng, W.-Y., & Zheng, Q.-H. Kernel extreme learning machine (in press-a).
Deng, W.-Y., & Zheng, Q.-H. Online sequential kernel extreme learning machine (in press-b).
Deng, W., Zheng, Q., & Zhang, K. (2013). Reduced kernel extreme learning machine. In Proceedings of the 8th international conference on computer recognition systems CORES 2013 (pp. 63–69).
Dijkstra, B., Kamsma, Y. P., & Zijlstra, W. (2010). Detection of gait and postures using a miniaturized triaxial accelerometer-based system: accuracy in patients with mild to moderate Parkinson's disease. Archives of Physical Medicine and Rehabilitation, 91(8), 1272–1277.
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. In Johns Hopkins studies in the mathematical sciences. Baltimore, MD: Johns Hopkins University Press.
Huang, G.-B., Zhou, H., Ding, X., & Zhang, R. (2012). Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42(2), 513–529.
Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2006). Extreme learning machine: theory and applications. Neurocomputing, 70(1–3), 489–501.
Lee, Y. J., & Huang, S. Y. (2007). Reduced support vector machines: a statistical theory. IEEE Transactions on Neural Networks, 18(1), 1–13.
Liu, Q., He, Q., & Shi, Z. (2008). Extreme support vector machine classifier. In Proceedings of the 12th Pacific-Asia conference on advances in knowledge discovery and data mining (pp. 222–233).
Mannini, A., & Sabatini, A. M. (2010).
Machine learning methods for classifying human physical activity from on-body accelerometers. Sensors, 10(2), 1154–1175.
Mital, P. K., Smith, T. J., Hill, R. L., & Henderson, J. M. (2011). Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive Computation, 3(1), 5–24.
Ravi, N., Dandekar, N., Mysore, P., & Littman, M. L. (2005). Activity recognition from accelerometer data. In AAAI (pp. 1541–1546).
Roggen, D., Magnenat, S., Waibel, M., & Troster, G. (2011). Wearable computing. IEEE Robotics & Automation Magazine, 18(2), 83–95.
Stikic, M., Van Laerhoven, K., & Schiele, B. (2008). Exploring semi-supervised and active learning for activity recognition. In 12th IEEE international symposium on wearable computers, ISWC 2008 (pp. 81–88).
Taylor, J. G. (2009). Cognitive computation. Cognitive Computation, 1(1), 4–16.
Torrey, L., Walker, T., Shavlik, J., & Maclin, R. (2005). Using advice to transfer knowledge acquired in one reinforcement learning task to another. In ECML 2005 (pp. 412–424). Springer.
Wang, H., Kong, B., & Zheng, X. (2010). An improved reduced support vector machine. In 2010 IEEE youth conference on information computing and telecommunications (YC-ICT) (pp. 170–173).
Ward, J. A., Lukowicz, P., & Gellersen, H. W. (2011). Performance metrics for activity recognition. ACM Transactions on Intelligent Systems and Technology (TIST), 2(1), 6:1–6:23.
Wollmer, M., Eyben, F., Graves, A., Schuller, B., & Rigoll, G. (2010). Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework. Cognitive Computation, 2(3), 180–190.
won Lee, J., & Giraud-Carrier, C. (2007). Transfer learning in decision trees. In Proceedings of the international joint conference on neural networks (pp. 726–731).
Zhao, Z., Chen, Y., Liu, J., Shen, Z., & Liu, M. (2011). Cross-people mobile-phone based activity recognition.
In Proceedings of the twenty-second international joint conference on artificial intelligence, volume three (pp. 2545–2550).
Zheng, V. W., Hu, D. H., & Yang, Q. (2009). Cross-domain activity recognition. In Proceedings of the 11th international conference on ubiquitous computing (pp. 61–70).
