This article was downloaded by: [University Of Pittsburgh] On: 12 October 2014, At: 12:43 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Research Quarterly for Exercise and Sport Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/urqe20

Recovering Physical Activity Missing Data Measured by Accelerometers: A Comparison of Individual and Group-Centered Recovery Methods

Jie Zhuang (a), Peijie Chen (a), Chao Wang (a, b), Jing Jin (a), Zheng Zhu (a) & Wenjie Zhang (a)

(a) Shanghai University of Sport
(b) Capital University of Physical Education and Sports

Published online: 04 Dec 2013.

To cite this article: Jie Zhuang , Peijie Chen , Chao Wang , Jing Jin , Zheng Zhu & Wenjie Zhang (2013) Recovering Physical Activity Missing Data Measured by Accelerometers: A Comparison of Individual and Group-Centered Recovery Methods, Research Quarterly for Exercise and Sport, 84:sup2, S48-S55, DOI: 10.1080/02701367.2013.851060 To link to this article: http://dx.doi.org/10.1080/02701367.2013.851060


Research Quarterly for Exercise and Sport, 84, S48–S55, 2013
Copyright © AAHPERD
ISSN 0270-1367 print/ISSN 2168-3824 online
DOI: 10.1080/02701367.2013.851060

Recovering Physical Activity Missing Data Measured by Accelerometers: A Comparison of Individual and Group-Centered Recovery Methods

Jie Zhuang and Peijie Chen

Shanghai University of Sport

Chao Wang Shanghai University of Sport Capital University of Physical Education and Sports

Jing Jin, Zheng Zhu, and Wenjie Zhang Shanghai University of Sport

Purpose: The purpose of this study was to determine which method, individual information-centered (IIC) or group information-centered (GIC), is more efficient in recovering missing physical activity (PA) data. Method: A total of 2,758 Chinese children and youth aged 9 to 17 years old (1,438 boys and 1,320 girls) wore ActiGraph GT3X/GT3X+ accelerometers for 7 consecutive days. Those with no missing data (n = 900) were used to form a nonmissing sample, which, based on a semisimulation approach, was used to create a missing data set to evaluate a set of recovery methods, including 2 IIC and 22 GIC methods. Root mean square difference (RMSD), mean signed difference, and paired t test were used to determine the effectiveness of the recovery methods. Results: The smallest RMSD values, which represent the most accurate recovery, were found with: (a) GIC-Expectation-maximization (GIC-EM) regardless of gender and by age (113,957.64); (b) GIC-EM regardless of gender and age (114,367.88); (c) GIC-EM regardless of age and by gender (114,697.06); (d) GIC-EM by gender and age (116,178.34); and (e) IIC averaging of remaining days (125,851.23). Conclusion: To recover missing 7-day accelerometer-determined PA data, we recommend using the GIC-EM and IIC approaches.

Keywords: data analysis, imputation, replacement, simulation

Correspondence should be addressed to Peijie Chen, Shanghai University of Sport, 399 Chang Hai Road, Shanghai, 200438, P. R. China. E-mail: [email protected]

Missing data are a ubiquitous problem that complicates the statistical analysis of data (Fitzmaurice, 2008) and potentially threatens the validity of a research study (Roth, 1994). This is because the missing data may introduce biased results, as well as lead to a loss of statistical power and precision (Karahalios, Baglietto, Carlin, English, & Simpson, 2012). More importantly, incomplete or missing data could lead to the loss of important information, which, in turn, makes researchers delete the incomplete values or spend more time and funding to retest large samples. Unfortunately, missing data are a common phenomenon when measuring physical activity (PA) using accelerometers, which are electronic sensors that measure the quantity and intensity of movement. Commonly used accelerometers are triaxial and record acceleration in three planes by three different accelerometers positioned internally at 90° from one another. Accelerometers are considered superior to other


measurement methods because they detect PA of various intensities in free-living settings and have been demonstrated to be a valid approach for assessing children’s PA (Bassett, Mahar, Rowe, & Morrow, 2008; Crouter, Horton, & Bassett, 2012; Park, Ishikawa-Takata, Tanaka, Mekata, & Tabata, 2011; Plasqui & Westerterp, 2007; Puyau, Adolph, Vohra, & Butte, 2002; Rowlands, 2007). However, missing data are often unavoidable due to noncompliance of participants, equipment malfunction (e.g., dead battery, equipment breakage), and investigator error (e.g., not properly initializing the equipment). In fact, a high percentage of missing data is often reported when using accelerometers to measure PA. For example, in a 7-consecutive-day PA study of students in Grades 6 through 8 using accelerometers, only 50% of the participants had 7 days of completed data, and corresponding percentages of complete 6-, 5-, 4-, and 3-day data were 67%, 75%, 86%, and 92%, respectively (Van Coevering et al., 2005). In Troiano et al.’s (2008) study, only 16.8% of adolescents (aged 12 to 19 years old) provided 7 valid days of data. A 60-min allowable interruption period and wear-time criteria of 10 hr per day resulted in 95% of the subsample having at least 1 valid day and 84% having at least 4 valid days of wearing an accelerometer for 7 consecutive days (Colley, Gorber, & Tremblay, 2010). Less than 80% of the participants recorded 4 valid whole days of data (79.06%) even if the most lenient criterion of defining a valid day (≥ 6 hr) was used (Lee, Macfarlane, & Lam, 2013). Thus, it is important to find an effective statistical method to recover missing data. Statistical methods have been developed to recover missing values (Dale, Welk, & Matthews, 2002; Little & Rubin, 1987; Schafer, 1997).
The most common recovery method is the group information-centered (GIC) approach, in which a summary (e.g., mean) from the sample group replaces an individual’s missing value (Acock, 1997; Little & Rubin, 1989). Laird (1988) mentioned that the GIC approach may result in a loss of efficiency and may bias the results. The GIC approach also may not be appropriate in handling step-count data (Kang, Rowe, Barreira, Robinson, & Mahar, 2009; Kang, Zhu, Tudor-Locke, & Ainsworth, 2005; Schafer & Graham, 2002; Tudor-Locke et al., 2005). The individual information-centered (IIC) approach is also commonly used. This approach involves using a summary from the rest of the data from the same individual. Schafer and Graham (2002) suggested using each participant’s available data. Kang and Zhu (2003) compared several missing data recovery methods for step-count data and concluded that IIC methods are superior to GIC techniques. In Kang et al.’s (2005) study, participants (aged 17 to 79 years old) wore pedometers for 21 consecutive days, and missing values for varied combinations of weekdays and/or weekend days were replaced using four IIC and four GIC approaches. The researchers found that the lowest root mean square difference (RMSD) was obtained with the IIC method using the mean of remaining days (i.e., the IIC approach produced a more


accurate recovery of the missing data). In Kang et al.’s (2009) study, which used two IIC and two GIC approaches, the IIC methods produced better recovery results than did GIC methods. All of these studies compared IIC and GIC methods and found IIC to be the superior approach for missing data recovery. Some statistical packages, such as the Statistical Package for the Social Sciences (SPSS; 2003 or later versions), have an analysis function called missing value analysis, which can analyze the missing data using regression imputation and the expectation-maximization (EM) algorithm (Horton & Lipsitz, 2001; SPSS, Inc., 2002; Von Hippel, 2004). The EM algorithm (Maclachlan & Krishnan, 1977) can be separated into the expectation (E) and maximization (M) steps (Firat, Dikbas, Koc, & Gungor, 2010) for updating the estimate $\theta_n$ of the unknown parameter $\theta$ at each iteration. It is one of the most effective algorithms for maximization because it iteratively transfers maximization from a complex function to a simple, surrogate function (Becker, Yang, & Lange, 1997). EM is a generic tool that offers maximum likelihood solutions when data sets are incomplete with data values missing at random or completely at random (Griffith, 2010). The EM algorithm was found to have slightly greater precision in predicting missing values compared with multiple imputation (MI) for the measure of activity used in this analysis (metabolic equivalent minutes of moderate-to-vigorous physical activity; Catellier et al., 2005) and was the procedure chosen for imputation in the Trial for Activity in Adolescent Girls (Stevens et al., 2005). Other efforts have also been made to find the most effective way to recover missing data. For example, Lee et al.
(2013) used log-linear regression to estimate compliance rates for samples with different characteristics, so that sample size calculations can account for participant compliance in obtaining analyzable accelerometer data for 4 consecutive days. Lee (2013) proposed a new approach to the imputation of missing accelerometer data that takes into account the data available from invalid days, and the combined approach performed significantly better compared with the traditional imputation method (all t tests, p < .001) for 7 days. As mentioned, Kang and Zhu (2003) and Kang et al. (2009) proposed an IIC method to recover the missing data and applied it successfully to pedometer missing data recovery. That method, however, required cross-week information/data. In practice, 7 days is a more commonly used data collection period in PA research. In fact, the 7-day wearing period has been used in many large surveillance studies. In a study by Trost, Pate, Freedson, Sallis, and Taylor (2000), a 7-day monitoring protocol provided reliable estimates of usual PA behavior in children and adolescents. Matthews, Ainsworth, Thompson, and Bassett (2002) suggested that 7 days of accelerometer data collection are needed for a stable estimation of adults’ PA


patterns. Kang et al. (2009) selected 7 days as the number of days of data collection for their study. The 7-day wearing period was also utilized with pedometers in a large national study in Canada (Craig, Tudor-Locke, Cragg, & Cameron, 2010). The 7-day period has been instituted in the United States for objectively measured PA with participants wearing accelerometers (Tudor-Locke, Leonardi, Johnson, Katzmarzyk, & Church, 2011). Kang, Hart, and Kim (2012) selected 7 consecutive days of pedometer step counts to examine the threshold of the number of missing days that could effectively be recovered using the IIC approach. Finally, Lee (2013) used 7 days of valid accelerometer data to propose a new approach to the imputation of missing accelerometer data that takes into account the data available from invalid days. Because 7 days is the most commonly used period for PA data collection yet is a time period that experiences missing data, the question naturally arises as to which method, GIC or IIC, works best for missing data recovery during a 7-day data collection period. Using a semisimulation design and a large PA data set collected within a 7-day period, the purposes of this study therefore were twofold: (a) to compare the recovery performance of a set of missing data recovery methods, including IIC, GIC, GIC-Regression (GIC-R), and GIC-Expectation-maximization (GIC-EM) approaches; and (b) to determine the best approach for recovering missing data in 7-day PA research.

METHODS

Participants and Data Collection

The data for this study were from the Chinese City Children and Youth Physical Activity Study, a major national PA survey study. A total of 2,758 healthy Chinese city children and youth aged 9 to 17 years old (1,438 boys and 1,320 girls) were involved in the study from 7 elementary schools, 11 junior high schools, and 7 high schools. After receiving complete information about the aims and methods of the study, all participants assented and their parents or guardians gave written informed consent. The study was approved by the Ethics Advisory Committee of the Shanghai University of Sport. Participants were instructed to wear the ActiGraph GT3X/GT3X+ accelerometer (ActiGraph, Ft. Walton Beach, FL) during their waking hours for 7 consecutive days with random starting dates. Nine hundred of the 2,758 participants had complete data for 7 days, and the remaining 1,858 had at least one missing data point.

Semisimulation Data Generation

Following the procedures described by Kang et al. (2005), a semisimulation design was employed to create missing data sets to assess the recovery approach. Specifically, 900 of the 2,758 participants had no missing values and formed a nonmissing sample. The missing sample was made up of 900 participants randomly selected from the remaining 1,858 participants with missing data. The missing data pattern was copied, and the semisimulation data sample was created in a one-to-one fashion using the nonmissing data. As the true values of the missing data in the nonmissing sample were known, these data can be used to determine the most efficient recovery approach by making comparisons with the true missing values.

Recovery Methods

Mean substitution was employed to replace missing data. Each individual’s activity count average was used for IIC, and the group’s count average was used for GIC. In addition, regression was used for GIC-R, and EM was used for GIC-EM. We examined several conditions, including combinations of weekday and weekend-day information, depending on the type of missing data, and classifications by or regardless of gender and age. Replacements from the methods can be implemented by using SPSS Version 18.0 statistical software. The created missing values were replaced with the 2 IIC and 22 GIC conditions described in Table 1. The conditions include 2 IIC, 12 GIC, 6 GIC-R, and 4 GIC-EM. The EM algorithm (Dempster, Laird, & Rubin, 1977), as mentioned earlier, uses the conditional expectations of the missing data, given the observed data and the current estimates of the model parameters, which are calculated by Equation 1 in the E step.

\[ Q(\theta \mid \theta_n) = E_{z \mid x, \theta_n}\left[ \log L(\theta; x, z) \right], \tag{1} \]

where $L(\theta; x, z)$ is the likelihood function, $\theta$ is the parameter vector, $\theta_n$ is the current estimate of the model parameters, $x$ is the observed data, and $z$ is the missing data. In the M step, the model parameters are calculated using Equation 2 to maximize the expected complete-data log likelihood function from the E step.

\[ \theta^{*} = \arg\max_{\theta} Q(\theta \mid \theta_n). \tag{2} \]
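The E and M steps in Equations 1 and 2 can be illustrated with a deliberately small case. The sketch below assumes a bivariate normal model in which one variable (x) is fully observed and the other (y) is missing for some cases. It is not the SPSS Missing Value Analysis routine used in this study; the function name `em_bivariate`, the starting values, and the toy numbers are hypothetical choices made for illustration.

```python
def em_bivariate(x, y, n_iter=500, tol=1e-10):
    """EM estimates for a bivariate normal; x is complete, y uses None for missing."""
    n = len(x)
    observed = [v for v in y if v is not None]
    # x is fully observed, so its mean and variance stay at the sample values.
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n
    # Starting values for the parameters involving y (assumption: available cases).
    mu_y = sum(observed) / len(observed)
    s_yy = sum((v - mu_y) ** 2 for v in observed) / len(observed)
    s_xy = 0.0
    for _ in range(n_iter):
        # E step: expected y and y^2 for missing cases, given x and current estimates.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = yi, yi * yi
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey
        # M step: parameter values that maximize the expected complete-data log likelihood.
        new_mu_y = sum_y / n
        new_s_yy = sum_yy / n - new_mu_y ** 2
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        change = abs(new_mu_y - mu_y) + abs(new_s_yy - s_yy) + abs(new_s_xy - s_xy)
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break
    # Replace each missing y by its conditional mean under the final estimates.
    beta = s_xy / s_xx
    imputed = [mu_y + beta * (xi - mu_x) if yi is None else yi for xi, yi in zip(x, y)]
    return {"mu_y": mu_y, "s_yy": s_yy, "s_xy": s_xy, "imputed": imputed}


# Toy data (hypothetical): y = 2x + 1 wherever y is observed.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, None, 11.0, None]
result = em_bivariate(x, y)
print(result["imputed"])
```

In this toy case the observed pairs lie exactly on a line, so the replacement values should settle on that line. With real activity counts, the model would have one variable per day and the E step would condition on all observed days.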

TABLE 1
Description of 24 Missing Recovery Methods

| Missing Recovery Methods | Condition | Description |
|---|---|---|
| Individual information-centered (IIC) | 1 | Mean of remaining days |
| | 2 | Mean of remaining weekdays or weekend days, depending on the type of missing day |
| Group information-centered (GIC) | 3 | Mean of overall regardless of gender and age |
| | 4 | Mean of overall weekday or weekend day regardless of gender and age, depending on the type of missing day |
| | 5 | Mean of the same day regardless of gender and age |
| | 6 | Mean of overall regardless of age and by gender |
| | 7 | Mean of overall weekday or weekend day regardless of age and by gender, depending on the type of missing day |
| | 8 | Mean of the same day regardless of age and by gender |
| | 9 | Mean of overall regardless of gender and by age |
| | 10 | Mean of overall weekday or weekend day regardless of gender and by age, depending on the type of missing day |
| | 11 | Mean of the same day regardless of gender and by age |
| | 12 | Mean of overall by gender and age |
| | 13 | Mean of overall weekday or weekend day by gender and age, depending on the type of missing day |
| | 14 | Mean of the same day by gender and age |
| GIC-Regression | 15 | Regression residual method regardless of gender and age |
| | 16 | Regression residual method regardless of age and by gender |
| | 17 | Regression residual method regardless of gender and by age |
| | 18 | Regression variable method regardless of gender and age |
| | 19 | Regression variable method regardless of age and by gender |
| | 20 | Regression variable method regardless of gender and by age |
| GIC-Expectation-maximization | 21 | Expectation-maximization regardless of gender and age |
| | 22 | Expectation-maximization regardless of age and by gender |
| | 23 | Expectation-maximization regardless of gender and by age |
| | 24 | Expectation-maximization by gender and age |

Data Analysis

This study evaluated the effect of various combinations in recovering missing values by comparing the original known values purposely removed from the semisimulated missing data set with the replacements estimated based on the different recovery conditions. Two indexes, RMSD and mean signed difference (MSD), were used to determine the effectiveness of the various recovery conditions (Catellier et al., 2005; Kang et al., 2005, 2009). RMSD was calculated from the differences between the original and replacement values, which were then squared, averaged, and square-rooted. The formula for RMSD is as follows:

\[ \mathrm{RMSD} = \sqrt{\frac{\sum_{j=1}^{N} \left(\text{original value}_j - \text{replacement value}_j\right)^{2}}{N}}. \tag{3} \]

MSD was calculated from the differences between the original and replacement values, which were then averaged. The formula for MSD is as follows:

\[ \mathrm{MSD} = \frac{\sum_{j=1}^{N} \left(\text{original value}_j - \text{replacement value}_j\right)}{N}. \tag{4} \]

A smaller RMSD and close-to-zero MSD represent a better recovery of the missing values. Paired t tests were used to examine the mean differences between the original and replacement values. These analyses were completed using Microsoft Excel and SPSS Version 18.0 statistical software.
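The semisimulation evaluation described above can be sketched end to end on toy data: copy a missing pattern onto complete records, replace the masked values, and score the replacements with Equations 3 and 4. This is a minimal illustration, not the SPSS/Excel workflow used in the study. All function names and the three toy participants are hypothetical, and only one IIC rule (mean of remaining days) and one GIC rule (mean of the same day, ignoring gender and age) are shown.

```python
import math


def apply_missing_pattern(complete, pattern):
    """Copy the None positions of `pattern` onto a complete 7-day record."""
    return [None if p is None else c for c, p in zip(complete, pattern)]


def iic_mean_remaining(week):
    """IIC sketch: replace each missing day with the mean of that person's observed days."""
    observed = [v for v in week if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in week]


def gic_same_day_mean(weeks):
    """GIC sketch: replace each missing day with that day's mean over people who have it."""
    day_means = []
    for d in range(len(weeks[0])):
        vals = [w[d] for w in weeks if w[d] is not None]
        day_means.append(sum(vals) / len(vals))
    return [[day_means[d] if v is None else v for d, v in enumerate(w)] for w in weeks]


def rmsd(original, replacement):
    """Equation 3: root mean square difference."""
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, replacement)) / len(original))


def msd(original, replacement):
    """Equation 4: mean signed difference (original minus replacement)."""
    return sum(o - r for o, r in zip(original, replacement)) / len(original)


def evaluate(complete, masked, recovered):
    """Score only the cells that were masked, where the true value is known."""
    orig, repl = [], []
    for c_week, m_week, r_week in zip(complete, masked, recovered):
        for c, m, r in zip(c_week, m_week, r_week):
            if m is None:
                orig.append(c)
                repl.append(r)
    return rmsd(orig, repl), msd(orig, repl)


# Toy nonmissing sample (hypothetical counts) and copied missing patterns.
complete = [[10, 12, 14, 16, 18, 20, 22], [20] * 7, [30, 28, 26, 24, 22, 20, 18]]
patterns = [[None, 1, 1, 1, 1, 1, 1], [1, 1, 1, None, 1, 1, 1], [1, 1, 1, 1, 1, 1, None]]
masked = [apply_missing_pattern(c, p) for c, p in zip(complete, patterns)]

print("IIC (RMSD, MSD):", evaluate(complete, masked, [iic_mean_remaining(w) for w in masked]))
print("GIC (RMSD, MSD):", evaluate(complete, masked, gic_same_day_mean(masked)))
```

Because MSD is computed as original minus replacement, a negative value means the replacements were, on average, larger than the true values.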

RESULTS

Descriptive Statistics

The participants are grouped in Table 2 by age group criteria of the Centers for Disease Control and Prevention (2011). In the original data (N = 2,758), 32.63% (n = 900) of the cases had no missing values. This data set was used to create the artificial data set for testing. A total of 1,438 boys and 1,320 girls were in the original data set, and 445 boys and 455 girls were in the artificial data set. Participants were aged 9 to 17 years old and were classified as middle childhood (9–11 years old), young teens (12–14 years old), and teenagers (15–17 years old). In the original data set (N = 2,758), the highest mean activity counts were found on Mondays (M ± SD = 348,200.87 ± 147,885.65), and mean activity counts for Sundays were the lowest (303,308.25 ± 149,917.51). The grand mean count was 323,805.36. In the semisimulated missing data set, which is the artificial data set (n = 900), the highest mean counts were also found on Mondays (341,438.16 ± 148,725.22), and mean counts for Sundays were the lowest (290,822.70 ± 145,338.01). The grand mean count was 321,834.61. The means, standard deviations, and minimum and maximum values of the activity count data are presented by day and by data set in Table 3.

Missing Recovery

The results of the RMSD for both the IIC and GIC approaches are summarized in Table 4. A smaller RMSD value represents better recovery of the missing values. The RMSD of the IIC, GIC, GIC-R, and GIC-EM approaches

TABLE 2
Number of the Participants by Age Group

| Group | Age (years) | Original Data: Boys | Original Data: Girls | Artificial Data: Boys | Artificial Data: Girls |
|---|---|---|---|---|---|
| Middle Childhood | 9–11 | 327 | 449 | 123 | 153 |
| Young Teens | 12–14 | 599 | 509 | 178 | 176 |
| Teenagers | 15–17 | 512 | 362 | 144 | 126 |
| Total | 9–17 | 1,438 | 1,320 | 445 | 455 |

Note. Original data = data from original data set, N = 2,758; artificial data = data from semisimulated missing data set, n = 900.

ranged from 125,851.23 to 130,798.58, from 139,777.80 to 143,877.85, from 156,633.63 to 164,787.39, and from 113,957.64 to 116,178.34, respectively, for activity counts. Overall, the IIC showed a smaller RMSD than did the GIC, the GIC showed a smaller RMSD than did the GIC-R, and the GIC-EM approach showed the lowest RMSD. For the IIC approach, Condition 1 (replacing a missing value by the mean of remaining days) was the most effective in recovering missing values (RMSD = 125,851.23). For the GIC approach, Condition 14 (replacing a missing value by the mean of the same day by gender and age) was the most effective in recovering missing values (RMSD = 139,777.80). For the GIC-R approach, Condition 17 (replacing a missing value by a regression residual method regardless of gender and by age) was the most effective in recovering the missing values (RMSD = 156,633.63). For the GIC-EM approach, Condition 23 (replacing a missing value by the EM regardless of gender and by age) was the most effective in recovering the missing values (RMSD = 113,957.64). The results of the MSD for both the IIC and GIC conditions are summarized in Table 5. The MSD was included to show the direction and degree of bias that may be caused by missing data recovery methods. The smallest MSD values were found in the GIC-EM (range = −9.44 to 48.31). No approach was clearly more effective among the IIC (range = −64.40 to −76.51), GIC (range = 28.44 to 88.61), and GIC-R (range = −37.44 to 80.57). Negative MSDs, which occurred in the two IIC conditions and in Conditions 18 and 22, indicated that the replacement values tended to overestimate the original values. Paired t test results are presented under each condition in Table 5. No significant mean differences were found between the original values and the replacement values in any condition after adjusting the alpha level by the Bonferroni technique. The t values of

TABLE 3
Statistical Summary of Count Information by Day and Data Set

| Days | Original M | Original SD | Original Minimum | Original Maximum | Artificial M | Artificial SD | Artificial Minimum | Artificial Maximum |
|---|---|---|---|---|---|---|---|---|
| Mondays | 348,200.87 | 147,885.65 | 73,066.00 | 1,404,617.00 | 341,438.16 | 148,725.22 | 73,066.00 | 1,404,617.00 |
| Tuesdays | 324,521.38 | 131,641.93 | 64,199.00 | 827,505.00 | 325,046.82 | 134,952.16 | 64,199.00 | 811,836.00 |
| Wednesdays | 321,231.66 | 128,123.68 | 84,902.00 | 909,747.00 | 318,683.09 | 128,530.10 | 92,082.00 | 909,747.00 |
| Thursdays | 327,483.85 | 142,501.93 | 34,675.00 | 948,924.00 | 332,829.67 | 141,372.42 | 58,125.00 | 948,924.00 |
| Fridays | 331,778.94 | 144,345.54 | 44,953.00 | 1,010,140.00 | 328,951.89 | 142,041.75 | 71,562.00 | 1,010,140.00 |
| Saturdays | 310,112.59 | 156,271.69 | 28,435.00 | 1,092,643.00 | 315,069.94 | 163,605.70 | 28,435.00 | 1,092,643.00 |
| Sundays | 303,308.25 | 149,917.51 | 33,979.00 | 909,109.00 | 290,822.70 | 145,338.01 | 33,979.00 | 909,109.00 |
| Total | 323,805.36 | 142,955.42 | | | 321,834.61 | 143,509.34 | | |

Note. Original data = data from original data set, N = 2,758; artificial data = data from semisimulated missing data set, n = 900.

TABLE 4
Root Mean Square Difference by Method Conditions

| Missing Recovery Methods | Condition | RMSD |
|---|---|---|
| Individual information-centered | 1 | 125,851.23 |
| | 2 | 130,798.58 |
| Group information-centered (GIC) | 3 | 144,016.11 |
| | 4 | 143,686.15 |
| | 5 | 143,877.85 |
| | 6 | 141,851.89 |
| | 7 | 141,548.98 |
| | 8 | 142,015.59 |
| | 9 | 142,402.38 |
| | 10 | 141,957.58 |
| | 11 | 141,681.88 |
| | 12 | 140,164.93 |
| | 13 | 139,791.45 |
| | 14 | 139,777.80 |
| GIC-Regression | 15 | 160,721.01 |
| | 16 | 157,592.46 |
| | 17 | 156,633.63 |
| | 18 | 162,272.06 |
| | 19 | 164,787.39 |
| | 20 | 159,548.79 |
| GIC-Expectation-maximization | 21 | 114,367.88 |
| | 22 | 114,697.06 |
| | 23 | 113,957.64 |
| | 24 | 116,178.34 |

Note. See the Method section for a description of the conditions.

TABLE 5
Mean Signed Difference by Method Conditions

| Missing Recovery Methods | Condition | MSD | t value |
|---|---|---|---|
| Individual information-centered | 1 | −76.51 | −2.16 |
| | 2 | −64.40 | −1.47 |
| Group information-centered (GIC) | 3 | 46.47 | 0.70 |
| | 4 | 75.06 | 1.84 |
| | 5 | 75.20 | 1.84 |
| | 6 | 28.44 | 0.27 |
| | 7 | 66.04 | 1.44 |
| | 8 | 61.44 | 1.24 |
| | 9 | 63.76 | 1.34 |
| | 10 | 86.24 | 2.45 |
| | 11 | 88.61 | 2.60 |
| | 12 | 52.03 | 0.90 |
| | 13 | 78.88 | 2.09 |
| | 14 | 78.81 | 2.08 |
| GIC-Regression | 15 | 39.41 | 0.45 |
| | 16 | 80.57 | 1.93 |
| | 17 | 79.74 | 1.90 |
| | 18 | −37.44 | −0.40 |
| | 19 | 18.90 | 0.10 |
| | 20 | 60.88 | 0.84 |
| GIC-Expectation-maximization | 21 | 28.47 | 0.33 |
| | 22 | −9.44 | −0.04 |
| | 23 | 48.31 | 0.96 |
| | 24 | 44.22 | 0.79 |

Note. See the Method section for a description of the conditions.

the IIC, GIC, GIC-R, and GIC-EM approaches ranged from −1.47 to −2.16, from 0.27 to 2.60, from 0.10 to 1.93, and from −0.04 to 0.96, respectively.
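The paired comparison with a Bonferroni-adjusted alpha level described above can be sketched as follows. Two assumptions are ours rather than the article's: the p value uses a large-sample normal approximation to the t distribution (an approximation only, chosen to stay within the Python standard library), and the number of comparisons is set to 24, one per recovery condition, because the article does not state the divisor it used. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(original, replacement):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [o - r for o, r in zip(original, replacement)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def significant_after_bonferroni(t, n_tests, alpha=0.05):
    """Two-sided test at alpha / n_tests, using a normal approximation for p."""
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return p < alpha / n_tests


# Largest absolute t value reported in Table 5 (Condition 11), assuming 24 tests.
print(significant_after_bonferroni(2.60, n_tests=24))
```

Under these assumptions, even the largest t value in Table 5 does not reach the adjusted threshold, which is in line with the reported absence of significant differences.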

DISCUSSION

We used accelerometer data collected from children and youth aged 9 to 17 years old for 7 consecutive days. Nine hundred of the 2,758 participants formed a nonmissing sample, in which simulated missing values were recovered by 2 IIC and 22 GIC approaches. We attempted to further compare the methods in handling accelerometer-determined activity-count missing data, based on the research of Kang et al. (2005, 2009). Then, we examined the accuracy of data replacement. We found the smallest MSD t values for the GIC-EM (range = −0.04 to 0.96); by condition, the smallest absolute t values were for GIC-EM Condition 22 (−0.04), GIC-R Condition 19 (0.10), GIC Condition 6 (0.27), GIC-EM Condition 21 (0.33), and GIC-R Condition 18 (−0.40). In addition, the smallest RMSD values were in the GIC-EM approaches (range = 113,957.64–116,178.34); by condition, the smallest values were for GIC-EM Condition 23 (113,957.64), GIC-EM Condition 21 (114,367.88), GIC-EM Condition 22 (114,697.06), GIC-EM Condition 24 (116,178.34), and IIC Condition 1 (125,851.23). Our research revealed that the GIC-EM approach produced better results than did the IIC, GIC, and GIC-R approaches for recovering 7-day missing PA data, but the IIC method was still a better recovery method compared with other GIC methods. In 2003, Kang and Zhu compared IIC and GIC methods (including the EM method) and found that the IIC method was more accurate compared with the EM method. In 2005 and 2009, Kang et al. also arrived at the same conclusion that IIC is a better method than GIC. It can be concluded that IIC is certainly quite straightforward to use, while the EM approach, on the other hand, could be a

little more complex to apply and is likely not practical for researchers who may not have the expertise to use this method. Musil, Warner, Yobas, and Jones (2002) compared and contrasted five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the EM algorithm) for dealing with missing data and suggested that mean substitution was the least effective and that the EM algorithm produced estimates closest to those of the original variables. Von Hippel (2004) compared listwise deletion, pairwise deletion, regression imputation, and EM in analyzing data with missing values using SPSS Version 12.0 software; only the EM produced asymptotically unbiased estimates. Baneshi and Talei (2012) applied regression imputation, the EM algorithm, and the multivariable imputation via chained equations method; the chained equations method showed the best performance, followed by the EM model, and modern imputation methods were recommended to recover the information. Catellier et al. (2005) used the EM algorithm and MI and encouraged researchers to take advantage of software to implement missing value imputation, as estimates of activity are more precise and less biased in the presence of intermittent missing accelerometer data compared with those derived from an observed data analysis approach. The EM algorithm has been widely used. Tudor-Locke et al. (2004) used the Missing Values Analysis EM function in SPSS Version 11.0.1 to estimate the missing values in 365 days of continuous self-monitored pedometer data collected to explore the natural variability of PA in adults (aged 38 ± 9.9 years). Pelclová, Walid, and Vašíčková (2010) used the Missing Values Analysis EM function of SPSS Version 18.0 to estimate step values that were missing from a student data set (aged 16.0 ± 0.7 years).
Reza Naghavi, Shabestari, Roudsari, and Harrison (2012) used the Missing Value Analysis EM method to improve their data and to handle missing values, when they designed and validated a


questionnaire to measure the attitude of hospital staff toward work attendance during an influenza pandemic. Although the GIC approach performed well in this study, we believe that the IIC method was still more advantageous for the following reasons: (a) the individual data have larger differences and may have bigger fluctuations; (b) the group data can only represent the overall trend, excluding the instability of a single data point; and (c) the individual replacement value is always the mean of the remaining days or of the remaining weekdays or weekend days. Second, among the IIC, GIC, GIC-R, and GIC-EM approaches, the GIC-EM approach could perform better than the IIC approach. However, the EM method has an issue related to the data analysis. Unlike mean and regression imputations, the replacement values using the EM method in SPSS change every time. This means that the results (i.e., RMSD and MSD) can be different every time the SPSS missing value analysis using the EM method is run. In addition, the EM method has other problems, including the complexity of the method and the requirement of a large sample size for replacing missing values. Thus, examining the accuracy of methods of missing data recovery is of great practical value for PA researchers. More missing data recovery methods need to be compared and examined. We collected 7 days of data. Longer periods of data collection (e.g., 2 weeks, 1 month, or 3 months) should be compared in future research. In addition, monthly or seasonal factors should also be examined.

CONCLUSIONS

Determining which method, IIC or GIC, is more efficient to recover missing PA data can be valuable for researchers. An appropriate recovery method for missing data can increase the quality of data, decrease research time and costs, improve statistical power, reduce bias, and enhance utilization of data. Our study showed that the smallest RMSD values were from the GIC-EM (range = 113,957.64–116,178.34), which represents the most accurate recovery. The methods and conditions in order of accuracy are: Condition 23, GIC-EM regardless of gender and by age (113,957.64); Condition 21, GIC-EM regardless of gender and age (114,367.88); Condition 22, GIC-EM regardless of age and by gender (114,697.06); Condition 24, GIC-EM by gender and age (116,178.34); and Condition 1, IIC averaging of remaining days (125,851.23). In conclusion, to recover 7-day PA accelerometer-determined missing data, we recommend using the GIC-EM and IIC approaches.

WHAT DOES THIS ARTICLE ADD?

This study was based on research on recovering missing PA data by Kang et al. (2005, 2009). It added the GIC-R and GIC-EM algorithms and compared 2 IIC, 12 GIC, 6 GIC-R, and 4 GIC-EM methods. To the best of our knowledge, it is the first study to investigate the multifaceted effect of accelerometer data collection in children and youth on the occurrence of missing data. This research was based on a large group of 2,758 healthy Chinese children and youth aged 9 to 17 years old who wore accelerometers for 7 consecutive days. The results provide a resource to address missing data recovery methods. To recover missing 7-day PA accelerometry data, we recommend using the GIC-EM and IIC approaches.

REFERENCES

Acock, A. C. (1997). Working with missing values. Family Science Review, 10, 76–102.
Baneshi, M. R., & Talei, A. R. (2012). Does the missing data imputation method affect the composition and performance of prognostic models? Iranian Red Crescent Medical Journal, 14, 31–36.
Bassett, D. R., Mahar, M. T., Rowe, D. A., & Morrow, J. R. (2008). Walking and measurement. Medicine & Science in Sports & Exercise, 40(Suppl. 7), S529–S536.
Becker, M. P., Yang, I., & Lange, K. (1997). EM algorithms without missing data. Statistical Methods in Medical Research, 6, 38–54.
Catellier, D. J., Hannan, P. J., Murray, D. M., Addy, C. L., Conway, T. L., & Yang, S. (2005). Imputation of missing data when measuring physical activity by accelerometry. Medicine & Science in Sports & Exercise, 37(Suppl. 11), S555–S562.
Centers for Disease Control and Prevention. (2011). Positive parenting tips. Retrieved from http://www.cdc.gov/ncbddd/childdevelopment/positiveparenting/index.html
Colley, R., Gorber, S. C., & Tremblay, M. S. (2010). Quality control and data reduction procedures for accelerometry-derived measures of physical activity. Health Report/Statistics Canada, 21(1), 1–7.
Craig, C. L., Tudor-Locke, C., Cragg, S., & Cameron, C. (2010). Process and treatment of pedometer data collection for youth: The Canadian Physical Activity Levels Among Youth Study. Medicine & Science in Sports & Exercise, 42, 430–435.
Crouter, S. E., Horton, M., & Bassett, D. R. (2012). Use of a two-regression model for estimating energy expenditure in children. Medicine & Science in Sports & Exercise, 44, 1177–1185.
Dale, D., Welk, G. J., & Matthews, C. E. (2002). Methods for assessing physical activity and challenges for research. In G. J. Welk (Ed.), Physical activity assessments for health-related research (pp. 19–34). Champaign, IL: Human Kinetics.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39, 1–38.
Firat, M., Dikbas, F., Koç, A. C., & Gungor, M. (2010). Missing data analysis and homogeneity test for Turkish precipitation series. Sādhanā, 35, 707–720.
Fitzmaurice, G. (2008). Missing data: Implications for analysis. Nutrition, 24, 200–202.
Griffith, D. A. (2010). Some simplifications for the expectation-maximization (EM) algorithm: The linear regression model case. InterStat, 210(3), 1–23.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.
Kang, M., Hart, P. D., & Kim, Y. (2012). Establishing a threshold for the number of missing days using 7 d pedometer data. Physiological Measurement, 33, 1877–1885.
Kang, M., Rowe, D. A., Barreira, T. V., Robinson, T. S., & Mahar, M. T. (2009). Individual information-centered approach for handling physical activity missing data. Research Quarterly for Exercise and Sport, 80, 131–137.
Kang, M., & Zhu, W. (2003). Current issues with missing data methods in physical activity research. 2003 Daegu Universiade conference proceedings: Facing the challenge (pp. 610–616). Daegu, Korea: 2003 Daegu Universiade Conference Organizing Committee.
Kang, M., Zhu, W., Tudor-Locke, C., & Ainsworth, B. (2005). Experimental determination of effectiveness of an individual information-centered approach in recovering step-count missing data. Measurement in Physical Education and Exercise Science, 9, 233–250.
Karahalios, A., Baglietto, L., Carlin, J. B., English, D. R., & Simpson, J. A. (2012). A review of the reporting and handling of missing data in cohort studies with repeated assessment of exposure measures. BMC Medical Research Methodology, 12, 96–105.
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305–315.
Lee, P. H. (2013). Data imputation for accelerometer-measured physical activity: The combined approach. American Journal of Clinical Nutrition, 97, 965–971.
Lee, P. H., Macfarlane, D. J., & Lam, T. H. (2013). Factors associated with participant compliance in studies using accelerometers. Gait and Posture, 38, 912–917. doi:10.1016/j.gaitpost.2013.04.018
Little, R. J., & Rubin, D. B. (1987). Statistical analysis with missing data. New York, NY: Wiley.
Little, R. J., & Rubin, D. B. (1989). The analysis of social science data with missing values. Sociological Methods and Research, 18, 292–326.
Maclachlan, G. J., & Krishnan, T. (1977). The EM algorithm and extensions. New York, NY: John Wiley and Sons.
Matthews, C. E., Ainsworth, B. E., Thompson, R. W., & Bassett, D. R. (2002). Sources of variance in daily physical activity levels as measured by an accelerometer. Medicine & Science in Sports & Exercise, 34, 1376–1381.
Musil, C. M., Warner, C. B., Yobas, P. K., & Jones, S. L. (2002). A comparison of imputation techniques for handling missing data. Western Journal of Nursing Research, 24, 815–829.
Park, J., Ishikawa-Takata, K., Tanaka, S., Mekata, Y., & Tabata, I. (2011). Effects of walking speed and step frequency on estimation of physical activity using accelerometers. Journal of Physiological Anthropology, 30, 119–127.
Pelclová, J., Walid, E. A., & Vašíčková, J. (2010). Study of day, month and season pedometer-determined variability of physical activity of high school pupils in the Czech Republic. Journal of Sports Science and Medicine, 9, 490–498.
Plasqui, G., & Westerterp, K. R. (2007). Physical activity assessment with accelerometers: An evaluation against doubly labeled water. Obesity, 15, 2371–2379.
Puyau, M. R., Adolph, A. L., Vohra, F. A., & Butte, N. F. (2002). Validation and calibration of physical activity monitors in children. Obesity Research, 10, 150–157.
Reza Naghavi, S. H., Shabestari, O., Roudsari, A. V., & Harrison, J. (2012). Design and validation of a questionnaire to measure the attitudes of hospital staff concerning pandemic influenza. Journal of Infection and Public Health, 5, 89–101.
Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537–560.
Rowlands, A. V. (2007). Accelerometer assessment of physical activity in children: An update. Pediatric Exercise Science, 19, 252–266.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York, NY: Chapman and Hall.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
SPSS, Inc. (2002). MVA: Specification of algorithms. Available from http://support.spss.com/tech/stat/Algorithms/12.0/mva.pdf (login required).
Stevens, J., Murray, D. M., Catellier, D. J., Hannan, P. J., Lytle, L. A., Elder, J. P., . . . Webber, L. S. (2005). Design of the Trial of Activity in Adolescent Girls (TAAG). Contemporary Clinical Trials, 26, 223–233.
Troiano, R. P., Berrigan, D., Dodd, K. W., Masse, L. C., Tilert, T., & McDowell, M. (2008). Physical activity in the United States measured by accelerometer. Medicine & Science in Sports & Exercise, 40, 181–188.
Trost, S. G., Pate, R. R., Freedson, P. S., Sallis, J. F., & Taylor, W. C. (2000). Using objective physical activity measures with youth: How many days of monitoring are needed? Medicine & Science in Sports & Exercise, 32, 426–431.
Tudor-Locke, C., Bassett, D. R., Swartz, A. M., Strath, S. J., Parr, B. B., Reis, J. P., & Ainsworth, B. E. (2004). A preliminary study of one year of pedometer self-monitoring. Annals of Behavioral Medicine, 28, 158–162.
Tudor-Locke, C., Burkett, L., Reis, J. P., Ainsworth, B. E., Macera, C. A., & Wilson, D. K. (2005). How many days of pedometer monitoring predict weekly physical activity in adults? Preventive Medicine, 40, 293–298.
Tudor-Locke, C., Leonardi, C., Johnson, W. D., Katzmarzyk, P. T., & Church, T. S. (2011). Accelerometer steps/day translation of moderate-to-vigorous activity. Preventive Medicine, 53, 31–33.
Van Coevering, P., Harnack, L., Schmitz, K., Fulton, J. E., Galuska, D. A., & Gao, S. (2005). Feasibility of using accelerometers to measure physical activity in young adolescents. Medicine & Science in Sports & Exercise, 37, 867–871.
Von Hippel, P. L. (2004). Biases in SPSS 12.0 missing value analysis. The American Statistician, 58, 160–165.
