Journal of Sport Rehabilitation, 2016, 25, 133-136 http://dx.doi.org/10.1123/jsr.2014-0281 © 2016 Human Kinetics, Inc.

ORIGINAL RESEARCH REPORT

Comparing Computer-Derived and Human-Observed Scores for the Balance Error Scoring System

Jaclyn B. Caccese and Thomas W. Kaminski

Context: The Balance Error Scoring System (BESS) is the current standard for assessing postural stability in concussed athletes on the sideline. However, research has questioned the objectivity and validity of the BESS, suggesting that while certain subcategories of the BESS have sufficient reliability to be used in evaluation of postural stability, the total score is not reliable, demonstrating limited interrater and intrarater reliability. Recently, a computerized BESS test was developed to automate scoring. Objective: To compare computer-derived BESS scores with those taken from 3 trained human scorers. Design: Interrater reliability study. Setting: Athletic training room. Patients: NCAA Division I student athletes (53 male, 58 female; 19 ± 2 y, 168 ± 41 cm, 69 ± 4 kg). Interventions: Subjects were asked to perform the BESS while standing on the Tekscan (Boston, MA) MobileMat® BESS. The MobileMat BESS software displayed an error score at the end of each trial. Simultaneously, errors were recorded by 3 separate examiners. Errors were counted using the standard BESS scoring criteria. Main Outcome Measures: The number of BESS errors was computed for the 6 stances from the software and each of the 3 human scorers. Intraclass correlation coefficients (ICCs) were used to compare errors for each stance scored by the MobileMat BESS software with each of 3 raters individually. The ICC values were converted to Fisher Z scores, averaged, and converted back into ICC values. Results: The double-leg, single-leg, and tandem firm stances resulted in good to excellent agreement with human scorers (ICC = .999, .731, and .648, respectively). All foam stances resulted in fair agreement. Conclusions: Our results suggest that the MobileMat BESS is suitable for identifying BESS errors involving each of the 6 stances of the BESS protocol. Because the MobileMat BESS scores consistently and reliably, this system can be used with confidence by clinicians as an effective alternative to scoring the BESS.

Keywords: concussion, postural stability, reliability

The authors are with the Dept of Kinesiology and Applied Physiology, University of Delaware, Newark, DE. Address author correspondence to Jaclyn Caccese at [email protected].

It is estimated that there are 1.6 million to 3.8 million sport-related concussions annually.1 In 1999, Riemann et al2 developed the Balance Error Scoring System (BESS) as a method of evaluating postural stability for postconcussion assessment, and it is the current standard for sideline assessment.3–5 The BESS consists of a series of 3 stances (single-leg [nondominant], double-leg, and tandem [nondominant in back]) performed on both a firm and a foam surface. Each stance is maintained for 20 seconds. Subjects are asked to keep their hands on their hips and their eyes closed. During the testing period, scorers count the errors that occur. An error includes hands lifting off the iliac crest; opening the eyes; stepping, stumbling, or falling; moving the hip into 30° of abduction or more; lifting the forefoot or heel; and/or remaining out of testing position for more than 5 seconds. A maximum of 10 errors can occur for each stance. Recent guidelines for concussion assessment, the Sport Concussion Assessment Tool (SCAT),3 recommend the use of the Modified BESS.6

The Modified BESS test includes only the 3 firm-surface stances. However, the Modified BESS has little validation in the literature. Moreover, researchers have suggested that the firm surface may not be as challenging as the foam surface, so the Modified BESS may not differentiate between concussed and nonconcussed athletes.4,5

Riemann et al2 reported good to excellent intertester reliability coefficients ranging from .78 to .96. However, Finnoff et al7 have questioned the objectivity and validity of the BESS test, suggesting that while certain subcategories of the BESS test have sufficient reliability to be used in the evaluation of postural stability, the total BESS score is not reliable, demonstrating limited interrater and intrarater reliability. Bell et al8 summarized 8 articles that characterized intertester and intratester reliability of the BESS test. Across these studies, interrater reliability ranged from .57 to .85 for the total BESS score and from .44 to .96 for individual stances, with the double-leg foam stance being the least reliable. Intrarater reliability ICCs ranged from .60 to .92 for the total BESS score and from .50 to .98 for individual stances, once again with the double-leg foam stance being the least reliable. Variability in the results from these investigations is due, in part, to differences in level of training across examiners. From a clinical standpoint, this suggests that the same examiner should perform both the baseline and postconcussion assessment, although even when the same tester grades the same test twice, ICCs are low (ICC = .57).7 Inconsistent scoring of the BESS may result in inappropriate clinical decisions. Therefore, there is a need to improve both intertester and intratester reliability to make the BESS a more valid concussion-assessment tool.

Downloaded by University of Idaho Library on 12/11/16, Volume 25, Article Number 2

Recently, a computerized implementation of the BESS, the MobileMat® BESS (Tekscan, Inc, Boston, MA), was developed to automate the scoring of this concussion-assessment tool. The MobileMat BESS system consists of a portable MobileMat sensor platform and software that uses proprietary algorithms correlated to the balance errors committed by the subject during the BESS test. The portable sensor platform has been deemed a reliable system for the measurement of postural stability,9 although the BESS scoring algorithm has never been evaluated. The MobileMat BESS calculates a battery of metrics characterizing the stability of an individual's position on the mat over the duration of a trial. The metrics are designed to identify different types of balance errors (ie, stepping down with an off-foot during a single-leg trial, falling out of position, swaying, etc). The software calculates the value of the metrics and the occurrence of any corresponding errors on completion of a trial. To identify balance errors, the calculated metrics are compared with thresholds that mark acceptable ranges; the thresholds are calculated based on the entirety of the trial's data set. Out-of-range metrics identify periods in the trial where the individual has incurred a balance error. A BESS error is charged for every period in the trial where a subject has a metric in an error state. If multiple metrics are in an error state during overlapping periods, only 1 error is charged for the entire time that at least 1 of the metrics is in error.
A maximum of 10 BESS errors is reported for each trial. In addition, if subjects cannot maintain the proper stance for at least 5 seconds or do not otherwise complete the condition, they are given the maximum score of 10.

The purpose of this investigation was to compare computer-derived BESS scores with those taken from 3 trained human scorers. We hypothesized that the computer-derived BESS scores would be similar to the human-derived errors (ICC > .60).
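The error-charging rule described for the MobileMat software (one error per period in which any metric is out of range, overlapping periods counted once, a cap of 10, and an automatic 10 for an incomplete trial) can be illustrated with a minimal sketch. The actual algorithm is proprietary and is not given in this report; the interval representation, the function names, and the decision to treat periods that merely touch as one period are assumptions made only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: one (start_s, end_s) tuple per out-of-range period.
Interval = Tuple[float, float]

MAX_ERRORS = 10            # cap per trial, as stated in the text
MIN_STANCE_SECONDS = 5.0   # minimum time in proper stance, as stated in the text


def merge_overlapping(periods: List[Interval]) -> List[Interval]:
    """Merge out-of-range periods from all metrics into disjoint spans."""
    merged: List[Interval] = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend that span
            # rather than charging a second error. Treating touching
            # periods as one is an assumption of this sketch.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bess_errors(periods_by_metric: List[List[Interval]],
                seconds_in_stance: float,
                completed: bool = True) -> int:
    """Count BESS errors for one trial under the rule described in the text."""
    if not completed or seconds_in_stance < MIN_STANCE_SECONDS:
        return MAX_ERRORS
    all_periods = [p for metric in periods_by_metric for p in metric]
    return min(len(merge_overlapping(all_periods)), MAX_ERRORS)
```

For example, if a hypothetical sway metric is out of range during 2.0 to 3.0 s and a hypothetical step metric during 2.5 to 4.0 s, the two periods overlap and are charged as a single error.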

Methods

Design

This investigation was an interrater reliability study that compared computer-derived BESS scores with those obtained from 3 human scorers. Repeatability of the MobileMat BESS scoring was also explored. All testing was done in the athletic training room, where preseason baseline BESS scores are normally collected.

Participants

One hundred eleven NCAA Division I student athletes (53 male, 58 female; age 19 ± 2 y, height 168 ± 41 cm, weight 69 ± 4 kg) participated in this study as part of their preseason medical evaluation and baseline concussion screening. These student athletes were members of the men's and women's soccer, women's field hockey and volleyball, and football teams. Potential subjects were excluded if they had current symptoms of neurological or musculoskeletal injury.

Procedures

Subjects were asked to perform the BESS, which includes 3 stances, double-leg (Figure 1[a]), single-leg (nondominant) (Figure 1[b]), and tandem (nondominant in back) (Figure 1[c]), on both a firm and a foam surface while standing on the Tekscan MobileMat. In the foam condition, an Airex Balance Pad (Power Systems Inc, Knoxville, TN) was placed atop the Tekscan MobileMat BESS portable sensor. Each stance was maintained for 20 seconds with the hands on hips and eyes closed. Simultaneously, BESS test errors were recorded by 3 trained examiners (30, 35, and 60 h of grading experience). Each trial was scored by counting the errors or deviations from the proper stance using the standardized BESS scoring criteria.2 In addition, the MobileMat BESS software, guided by an algorithm based on the BESS scoring criteria, displayed an error score at the end of each trial. The number of BESS test errors was computed for each of the 6 stances from the software and each of the 3 human scorers. For each subject, human scorers were blinded to the computer-derived scores. In addition, the 3 scorers did not consult with each other after each trial. Each trial was reprocessed using the MobileMat algorithm to determine repeatability of scoring.

Figure 1. The MobileMat® Balance Error Scoring System. Firm-surface testing positions include (a) double-leg stance, (b) single-leg stance on nondominant leg, and (c) tandem stance with nondominant leg in back.

JSR Vol. 25, No. 2, 2016

Statistical Analyses

ICCs were used to compare the errors for each of the 6 stances scored by the MobileMat BESS software with those of each of the 3 human raters. The ICC(1,1) values were converted to Fisher Z scores, averaged, and converted back into ICC values. We also used ICC(2,1) values to compare the 3 human scorers. Finally, ICC(2,1) was used to evaluate the reliability of the scoring algorithm. Fleiss10 set guidelines for interrater reliability, under which ICC values above .75 indicate excellent agreement. These guidelines were used to report the ICC values obtained in this investigation.
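The statistical procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' analysis code: the single-measure ICC(1,1) and ICC(2,1) expressions are the standard ANOVA-based forms, the Fisher Z step uses the inverse hyperbolic tangent, and all function names are hypothetical. Each input table holds one row per subject and one column per rater.

```python
import math
from typing import Sequence, Tuple

Table = Sequence[Sequence[float]]  # scores[i][j]: subject i, rater j


def _mean_squares(scores: Table) -> Tuple[float, float, float, float]:
    """Return (MS_subjects, MS_raters, MS_error, MS_within) for an n x k table."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    ms_within = (ss_total - ss_rows) / (n * (k - 1))
    return ms_rows, ms_cols, ms_error, ms_within


def icc_1_1(scores: Table) -> float:
    """One-way random-effects, single-measure ICC."""
    k = len(scores[0])
    ms_rows, _, _, ms_within = _mean_squares(scores)
    return (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)


def icc_2_1(scores: Table) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC."""
    n, k = len(scores), len(scores[0])
    ms_rows, ms_cols, ms_error, _ = _mean_squares(scores)
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def average_icc(iccs: Sequence[float]) -> float:
    """Average coefficients on the Fisher Z scale and transform back.

    math.atanh raises ValueError for a coefficient of exactly 1.0, so
    perfect agreement has to be handled separately by the caller.
    """
    z_scores = [math.atanh(r) for r in iccs]
    return math.tanh(sum(z_scores) / len(z_scores))
```

In this sketch, the mat would be paired with each rater in turn (a table with 2 columns), giving 3 coefficients per stance that are then passed to `average_icc`. Averaging on the Z scale rather than averaging the raw coefficients compensates for the skewed sampling distribution of correlation-type statistics near 1.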


Results

The agreement between the mat and the human scorers ranged from fair to excellent across the 6 individual stances (Table 1), with the double-leg foam stance having the lowest agreement (ICC = .441). The agreement between the mat and human scorers was higher in the firm conditions than in the foam conditions. There was good agreement between the mat and the human scorers for the total BESS score (ICC = .631). Similarly, the agreement across the 3 human raters ranged from fair to excellent (Table 1), with the double-leg foam stance having the lowest agreement (ICC = .529). There was good agreement across the 3 human raters for the total BESS score (ICC = .732). The average number of errors reported by the mat and all 3 raters for each stance is presented in Table 2.

Finally, the repeatability of the mat across all stances was perfect (ICC = 1.0). Each time the trial was reprocessed using the MobileMat scoring algorithm, the number of calculated errors was the same.

Discussion

The purpose of this study was to compare computer-derived BESS scores with those of 3 trained human scorers. Our results showed that the mat was able to objectively score the BESS with a level of agreement similar to that across human raters. The MobileMat BESS was in better agreement with human scorers during the firm stances, although human raters also were in better agreement with one another in these conditions. Consistent with the literature,8 the greatest variation in scores both with the mat and across individuals was in the double-leg foam stance. Our interrater reliability between the mat and the human scorers across individual stances ranged from .441 to .999, which is similar to that previously reported between individuals.8 The interrater reliability between the mat and the human raters for the total BESS score was .631. This, too, is similar to other reliability studies.8 In addition, the mean total BESS score reported by the MobileMat BESS was 15.5 errors. This number of errors was also consistent with the work of Finnoff et al,7 who reported an average of 15.1 total errors across all 6 stances.

Table 1  Intraclass Correlation Coefficients (ICCs) for Each of the 6 Balance Error Scoring System Stances

                   Mat vs Human                  Human Raters
Stance             ICC     Level of agreement    ICC     Level of agreement
Double-leg firm    .999    Excellent             .999    Excellent
Single-leg firm    .731    Good                  .739    Good
Tandem firm        .648    Good                  .611    Good
Double-leg foam    .441    Fair                  .529    Fair
Single-leg foam    .545    Fair                  .534    Fair
Tandem foam        .499    Fair                  .586    Fair
Total              .631    Good                  .732    Good
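The agreement labels in Table 1 can be reproduced with a small classifier. Only the .75 cutoff for excellent agreement is stated in the text; the .40 and .60 cutoffs below are assumptions inferred from the labels in the table itself (and from the hypothesis threshold of ICC > .60), so this is a consistency sketch rather than a restatement of the Fleiss guidelines.

```python
from typing import Dict, List, Tuple

# Each stance maps to ((mat-vs-human ICC, label), (human-raters ICC, label)),
# transcribed from Table 1.
Row = Tuple[Tuple[float, str], Tuple[float, str]]

TABLE_1: Dict[str, Row] = {
    "Double-leg firm": ((0.999, "Excellent"), (0.999, "Excellent")),
    "Single-leg firm": ((0.731, "Good"), (0.739, "Good")),
    "Tandem firm": ((0.648, "Good"), (0.611, "Good")),
    "Double-leg foam": ((0.441, "Fair"), (0.529, "Fair")),
    "Single-leg foam": ((0.545, "Fair"), (0.534, "Fair")),
    "Tandem foam": ((0.499, "Fair"), (0.586, "Fair")),
    "Total": ((0.631, "Good"), (0.732, "Good")),
}


def agreement_level(icc: float) -> str:
    """Map an ICC to a label; the .40 and .60 cutoffs are assumed."""
    if icc > 0.75:
        return "Excellent"
    if icc >= 0.60:
        return "Good"
    if icc >= 0.40:
        return "Fair"
    return "Poor"


def mismatches(table: Dict[str, Row]) -> List[Tuple[str, float, str]]:
    """List every (stance, ICC, reported label) that the cutoffs do not reproduce."""
    return [
        (stance, icc, label)
        for stance, columns in table.items()
        for icc, label in columns
        if agreement_level(icc) != label
    ]
```

Under these assumed cutoffs, `mismatches(TABLE_1)` returns an empty list, meaning every label in Table 1 is consistent with them.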

Table 2  Average Number of Reported Errors

Stance             Computer-derived    Rater 1       Rater 2       Rater 3
Double-leg firm    0.0 ± 0.0           0.0 ± 0.0     0.0 ± 0.0     0.0 ± 0.0
Single-leg firm    3.7 ± 2.6           2.7 ± 2.7     2.2 ± 2.3     2.1 ± 2.5
Tandem firm        1.8 ± 1.6           1.0 ± 1.5     0.9 ± 1.4     0.8 ± 1.3
Double-leg foam    0.6 ± 0.8           0.2 ± 0.8     0.3 ± 0.9     0.3 ± 0.8
Single-leg foam    6.5 ± 2.4           8.6 ± 1.6     7.3 ± 1.8     6.9 ± 2.4
Tandem foam        2.9 ± 2.2           3.2 ± 2.2     3.5 ± 2.9     3.3 ± 3.0
Total              15.5 ± 5.0          15.7 ± 5.2    14.2 ± 6.0    13.4 ± 6.0
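Because the total BESS score is the sum of the 6 stance scores, each mean total in Table 2 should equal the sum of the corresponding stance means. The short check below, with values transcribed from Table 2 and hypothetical variable names, confirms this. The same does not hold for the values after the ± sign, since measures of spread do not add across stances.

```python
from typing import Dict, List

# Mean errors per stance, in the row order of Table 2:
# double-leg firm, single-leg firm, tandem firm,
# double-leg foam, single-leg foam, tandem foam.
STANCE_MEANS: Dict[str, List[float]] = {
    "Computer-derived": [0.0, 3.7, 1.8, 0.6, 6.5, 2.9],
    "Rater 1": [0.0, 2.7, 1.0, 0.2, 8.6, 3.2],
    "Rater 2": [0.0, 2.2, 0.9, 0.3, 7.3, 3.5],
    "Rater 3": [0.0, 2.1, 0.8, 0.3, 6.9, 3.3],
}

# Mean totals as reported in the last row of Table 2.
REPORTED_TOTALS: Dict[str, float] = {
    "Computer-derived": 15.5,
    "Rater 1": 15.7,
    "Rater 2": 14.2,
    "Rater 3": 13.4,
}


def total_errors(stance_means: List[float]) -> float:
    """Sum the stance means, rounded to the table's 1-decimal precision."""
    return round(sum(stance_means), 1)


def totals_consistent() -> bool:
    """True when every reported total equals the sum of its stance means."""
    return all(
        total_errors(means) == REPORTED_TOTALS[scorer]
        for scorer, means in STANCE_MEANS.items()
    )
```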


Bell et al8 suggested that the same individual should administer the BESS for serial testing. However, Finnoff et al7 examined individual testers grading the same test twice and found that raters had only fair agreement (ICC = .57) with themselves during the second grading session. Implementing a computer algorithm to score the BESS removes this subjective aspect of scoring. The repeatability of the mat across all stances was perfect (ICC = 1.0). Each time the trial was reprocessed using the MobileMat scoring algorithm, the number of calculated errors was the same. This suggests that the MobileMat BESS may be a useful clinical tool in evaluating postural stability.

Although we questioned the ability of the MobileMat BESS to identify certain errors, the mat achieved the same level of agreement as across raters. The algorithm was able to identify most errors, such as lifting the hands off the iliac crest and hip abduction >30°; however, it was not able to identify when subjects opened their eyes. In our investigation, limited to healthy student athletes, subjects only opened their eyes simultaneously with another balance error. In this scenario, the error of opening the eyes was accounted for in the MobileMat data. This also suggests that the error of opening the eyes may be redundant. This error was also questioned by Brown et al11 in their attempt to validate inertial measurement units for BESS application. However, our study was limited to healthy student athletes, so additional work is needed to validate the MobileMat BESS using subjects with a higher and wider range of BESS scores, such as a concussed population.

Despite a wide range of interrater and intrarater reliability, human scorers are used clinically to evaluate the BESS. Therefore, human raters were used as the standard for mat assessment in this investigation. However, future studies may quantify a 95% confidence interval using the position data provided by the MobileMat to correlate with computer-derived BESS scores to determine the reliability of the BESS in evaluating postural stability. The MobileMat BESS may be useful for reliable baseline and follow-up assessment of the BESS for athletic trainers or in situations where medical personnel inexperienced in BESS scoring may benefit from baseline and postconcussion testing.

Conclusion

Our results suggest that the MobileMat BESS is suitable for identifying BESS errors involving each of the 6 stances of the BESS protocol. The MobileMat BESS scored consistently and as reliably as human scorers and can therefore be used with confidence by clinicians as an effective alternative for scoring the BESS test. Further study involving BESS scores derived from a symptomatic population is warranted.

References

1. Langlois JA, Rutland-Brown W, Wald MM. The epidemiology and impact of traumatic brain injury: a brief overview. J Head Trauma Rehabil. 2006;21:375–378. doi:10.1097/00001199-200609000-00001
2. Riemann BL, Guskiewicz KM, Shields EW. Relationship between clinical and forceplate measures of postural stability. J Sport Rehabil. 1999;8(2):71–82.
3. Guskiewicz KM. Balance assessment in the management of sport-related concussion. Clin Sports Med. 2011;30:89–102. doi:10.1016/j.csm.2010.09.004
4. Hunt TN, Ferrara MS, Bornstein RA, Baumgartner TA. The reliability of the modified Balance Error Scoring System. Clin J Sport Med. 2009;19:471–475. doi:10.1097/JSM.0b013e3181c12c7b
5. Valovich McLeod TC, Bay RC, Lam KC, Chhabra A. Representative baseline values on the Sport Concussion Assessment Tool 2 (SCAT2) in adolescent athletes vary by gender, grade, and concussion history. Am J Sports Med. 2012;40:927–933. doi:10.1177/0363546511431573
6. McCrory P, Meeuwisse WH, Aubry M, et al. Consensus statement on concussion in sport: the 4th International Conference on Concussion in Sport, Zurich, November 2012. J Athl Train. 2013;48:554–575. doi:10.4085/1062-6050-48.4.05
7. Finnoff JT, Peterson VJ, Hollman JH, Smith J. Intrarater and interrater reliability of the Balance Error Scoring System (BESS). PM R. 2009;1(1):50–54.
8. Bell DR, Guskiewicz KM, Clark MA, Padua DA. Systematic review of the Balance Error Scoring System. Sports Health. 2011;3(3):287–295. doi:10.1177/1941738111403122
9. Brenton-Rule A, Mattock J, Carroll M, et al. Reliability of the TekScan MatScan® system for the measurement of postural stability in older people with rheumatoid arthritis. J Foot Ankle Res. 2012;5:21. doi:10.1186/1757-1146-5-21
10. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: Wiley; 1986.
11. Brown HJ, Siegmund GP, Guskiewicz KM, et al. Development and validation of an objective balance error scoring system. Med Sci Sports Exerc. 2014;46:1610–1616. doi:10.1249/MSS.0000000000000263
