Physical & Occupational Therapy in Pediatrics, 35(3):296–310, 2015 © 2015 by Informa Healthcare USA, Inc. Available online at http://informahealthcare.com/potp DOI: 10.3109/01942638.2014.975311

Inter-Rater and Test–Retest Reliability of the German Pediatric Evaluation of Disability Inventory (PEDI-G)

Christina Schulze1,2, Anders Kottorp1,2, André Meichtry1, Margareta Lilja3, & Julie Page1

1 ZHAW, Health, Winterthur, Switzerland, 2 Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden, 3 Department of Health Sciences, Luleå University of Technology, Luleå, Sweden

ABSTRACT. Activities of daily living (ADL) of children are widely assessed with the Pediatric Evaluation of Disability Inventory (PEDI). This study examined test–retest and inter-rater reliability of the German PEDI (PEDI-G). During the adaptation of the PEDI, nine items were added. In total, 117 parents of 53 children without and 64 children with a diagnosed physical disability from Austria, Germany, and Switzerland participated. Reliability was examined using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), and smallest detectable difference (SDD) for the Functional Skill Scale with and without added items and for the Caregiver Assistance Scale. Cohen’s kappa was used to calculate the reliability of the Modification Scale. All ICCs for test–retest and inter-rater reliability were above 0.75, indicating good to very good reliability. The SDD varied from 0.83 to 5.58 across PEDI domains and scales. For the Modification Scale, Cohen’s weighted kappa varied from 0.25 to 1.00, indicating sufficient reliability for some but not all items. Our findings indicate that the Functional Skill Scale and the Caregiver Assistance Scale of the PEDI-G are reliable scales that can be used to evaluate ADLs of children with and without physical disability.

KEYWORDS. Activities of daily living, children, German translation and adaptation, Pediatric Evaluation of Disability Inventory, physical disability, reliability

Address correspondence to: Christina Schulze, School of Health Professions, Institute of Occupational Therapy, Zurich University of Applied Sciences, Technikumstrasse 71, CH-8401 Winterthur, Switzerland (E-mail: [email protected]). (Received 14 January 2014; accepted 31 August 2014)

Activities of daily living (ADL) assessments are of fundamental importance in occupational therapy (Gantschnig et al., 2013; Kottorp et al., 2003; Morales et al., 2008). Currently, there is a lack of pediatric ADL assessments that have been developed and/or validated for use in German-speaking countries (Projektgruppe ergotherapeutische Befundinstrumente in der Pädiatrie, 2004). This gap means that professionals in the health care sector in these countries have virtually no reliable
assessments for systematic evaluation of children’s ADL. To obtain applicable assessments in a new context, a process of translation, adaptation, and validation of an existing assessment is recommended (Hambleton, 2001). One advantage of translating and adapting an existing validated assessment is that doing so can enhance the international comparability of clinical and research results and may also save time and financial expense (Streiner and Norman, 2008).

The Pediatric Evaluation of Disability Inventory (PEDI) is used internationally by therapists to evaluate functional ability and performance in children with and without activity limitations (Haley et al., 1992). It samples key functional ADLs of children in three domains: Self-care, Mobility, and Social Function. In the last 20 years, the PEDI has been widely used in both clinical practice and research and has been translated into several languages: Chinese, Danish, Dutch, Hebrew, Norwegian, Portuguese (Brazil), Slovenian, Spanish, Swedish, and Turkish (Berg et al., 2003a, 2003b, 2004; Chen et al., 2009; Chen et al., 2010; Elad et al., 2012; Erkin et al., 2007; Gannotti, 1998; Gannotti and Cruz, 2001; Groleger et al., 2005; Mancini, 2005; Nordmark et al., 1999; Stahlhut et al., 2010; Stahlhut et al., 2011; Wassenberg-Severijnen, 2005). Many studies on the psychometric properties of the PEDI in different samples have been published and have concluded that it is a valid and reliable assessment for the context or population in which it was tested. None of those studies included a German translation or adaptation of the PEDI, or participants from a German-speaking part of Europe (Feldman et al., 1990; Knox and Usen, 2000; McCarthy et al., 2002; Nichols and Case Smith, 1996; Ziviani et al., 2001).
The PEDI has been translated and adapted for German-speaking countries following a strict translation procedure based on international guidelines (Schulze et al., 2013). During the translation and adaptation process of the German Pediatric Evaluation of Disability Inventory (PEDI-G) for use in Austria, Germany, and Switzerland, new items were added, while many other items were adapted. Experiences from this process illustrate the challenge of achieving equivalence between the original and the target assessment. Evidence suggests that the principal constructs and assumptions of an assessment may not be stable across cultures, and highlights that the reliability of an assessment is likely to vary when it is translated and applied in a different culture (Anastasi and Urbina, 1997). Likewise, functional assessments developed in a certain cultural context and practical set-up may not be applied directly without evaluating their applicability and reliability in the recipient country’s cultural and practical context (Berg et al., 2004; Vacha-Haase, 1998). According to Streiner and Norman (2008), whether a scale that has been shown to be reliable with one group of individuals in a particular context is reliable with other people and in different situations is an empirical issue that must be assessed. Adequate reliability is a prerequisite for an assessment tool used in research and practice and needs to be investigated in each new context within the groups in which the assessment is meant to be used (Beaton et al., 2002). As many professionals may be involved in the evaluation of a child, and the evaluation and intervention process usually occurs over long periods, good reliability between different raters or clinicians and over shorter periods of time are crucial aspects of reliability that need to be evaluated further. The PEDI is used to identify activity limitations and performance restrictions in children and
to measure change over time (Haley et al., 1992). To obtain a sample of children with a wide range of abilities, we included children both with and without a motor disability. The paper-and-pencil version of the PEDI has recently been adapted into a computer adaptive test (CAT), the PEDI-CAT (Haley et al., 2012). At the time our study began, the PEDI-CAT was not available. In the German-speaking countries, computer-based evaluations are still not frequently performed; we therefore evaluated the reliability of the original paper-and-pencil version. Future studies will be needed to evaluate the PEDI-CAT in various contexts.

The aim of this study was to examine test–retest and inter-rater reliability of the PEDI-G in a sample of children with and without motor disabilities in Austria, Germany, and Switzerland. Moreover, the study investigated whether the reliability of the PEDI-G with added items differs from that of the translated PEDI without added items. A secondary aim was to determine the smallest detectable difference (SDD). The SDD indicates the difference between scores at two time points or under two circumstances that represents a “real” difference that is not potentially attributable to measurement error. This study is part of a larger project with the overall aim of translating, adapting, and validating the PEDI-G for use in Austria, Germany, and Switzerland.

METHODS

Participants

A heterogeneous sample was used to investigate aspects of reliability. In total, 117 children were included: 64 children diagnosed with a physical disability such as cerebral palsy and 53 children without a known disability. Children were included if (i) they were aged between six months and 7.5 years, (ii) they had diagnosed physical problems (for the group with diagnosed physical disability), and (iii) their primary caregivers gave informed consent and understood and spoke German. The study was approved by Austrian, German, and Swiss ethical review committees. Participation was voluntary, and the parents were informed that they could withdraw from the study at any stage. Children with cerebral palsy at Gross Motor Function Classification System (GMFCS) level V (Palisano et al., 1997, 2008) and children with autism were excluded from the study. Children without a known physical disability were included because in practice the PEDI-G can be used as a tool to discriminate between children with and without activity limitations and participation restrictions.

Children were recruited by eleven occupational therapists working in four rehabilitation centers in Austria, Germany, and Switzerland between August 2010 and December 2012. The centers were selected based on existing professional relations of the authors with institutions in the three countries. The diagnoses of the children were cerebral palsy (n = 23), developmental coordination disorder (n = 13), spina bifida (n = 6), pre-term birth (n = 5), cardiofaciocutaneous syndrome (n = 2), Lennox syndrome (n = 2), and Morbus Charcot-Marie-Tooth syndrome (n = 2). The characteristics of the children are presented in Table 1.

TABLE 1. Characteristics of Children (N = 117)

| Characteristic | Austria (n = 36) | Germany (n = 32) | Switzerland (n = 49) | Statistical test¹ | p-value |
|---|---|---|---|---|---|
| Sex: female, n (%) | 17 (47.2) | 14 (43.8) | 17 (34.7) | χ² = 1.48, df = 2 | 0.48 |
| Sex: male, n (%) | 19 (56.2) | 18 (56.2) | 32 (65.3) | | |
| Age, mean in months (SD) | 37.9 (20.9) | 31.4 (16.1) | 32.0 (19.6) | F = 1.31, df = (2,11.4) | 0.27 |
| Diagnosis: without, n (%) | 16 (44.4) | 20 (62.5) | 28 (57.1) | χ² = 2.43, df = 2 | 0.29 |
| Diagnosis: with, n (%) | 20 (55.6) | 12 (37.5) | 21 (42.9) | | |

| Scale, domain | Mean (SD), range in each of the three country samples | Statistical test¹ | p-value |
|---|---|---|---|
| Functional Skill Scale, Self-care | 50.8 (18.8), 8–72; 47.8 (23.1), 6–74; 52.1 (18.1), 5–74 | F = 0.6, df = (2,11.4) | 0.55 |
| Functional Skill Scale, Mobility | 46.7 (13.8), 6–67; 45.5 (15.9), 4–67; 47.6 (13.4), 4–67 | F = 0.22, df = (2,11.4) | 0.81 |
| Functional Skill Scale, Social Function | 48.8 (15.3), 11–63; 45.8 (18.5), 7–65; 49.0 (16.1), 6–64 | F = 0.47, df = (2,11.4) | 0.62 |
| Caregiver Assistance Scale, Self-care | 26.8 (9.1), 5–40; 25.54 (11.9), 0–39; 28.0 (8.9), 0–40 | F = 0.78, df = (2,11.4) | 0.46 |
| Caregiver Assistance Scale, Mobility | 18.1 (5.5), 6–38; 16.5 (7.6), 0–35; 17.8 (7.2), 0–35 | F = 0.61, df = (2,11.4) | 0.54 |
| Caregiver Assistance Scale, Social Function | 50.8 (18.8), 8–25; 47.76 (23.1), 0–25; 52.1 (18.1), 0–25 | F = 0.59, df = (2,11.4) | 0.48 |

¹ Chi-squared test for comparisons among countries for gender and diagnosis and one-way ANOVA for comparisons among countries for all other variables; df = degrees of freedom.

Pediatric Evaluation of Disability Inventory

The PEDI is a comprehensive clinical assessment often used by physical and occupational therapists to measure performance of activities of daily living in children (Haley et al., 1992). It is normed for children aged six months to 7.5 years. The PEDI was developed to (i) identify children with activity limitations or participation restrictions in three domains (Self-care, Mobility, and Social Function), (ii) identify the need for caregiver assistance and environmental modifications, and (iii) measure change over time. The PEDI consists of three scales: the Functional Skill Scale, the Caregiver Assistance Scale, and the Modification Scale. The PEDI domains, scales, subscales, and number of items within each subscale are summarized in Table 2.

TABLE 2. Description of Functional Skill Scale, Caregiver Assistance Scale and Modification Scale

| Domain | Functional Skill Scale | Caregiver Assistance Scale and Modification Scale |
|---|---|---|
| Self-care | A Types of food texture (4∗); B Use of utensils (5) +1 (added item); C Use of drinking containers (5); D Tooth brushing (5); E Hair brushing (4); F Nose care (5); G Hand washing (5); H Washing body and face (5); I Pullover/front opening garments (5); J Fasteners (5); K Pants (5); L Shoes/socks (5); M Toileting task (5); N Management of bladder (5); O Management of bowel (5). 15 subscales in Self-care domain with 73 items (1 item added in the PEDI-G) | A) Eating; B) Grooming; C) Bathing; D) Dressing upper body; E) Dressing lower body; F) Toileting; G) Bladder Management; H) Bowel Management. 8 items in Self-care domain |
| Mobility | A Toilet transfer (5); B Chair/wheelchair transfers (5); C Car transfers (5); D Bed mobility/transfers (4); E Tub transfers (5); F Indoor locomotion methods (3); G Indoor locomotion: distance/speed (5); H Indoor locomotion: pulls/carries objects (5); Outdoor locomotion methods (2); I Outdoor locomotion: distance/speed (5); J Outdoor surfaces (5); K Upstairs (5); L Downstairs (5); M Transfer bike trailer (added subscale with 4 items)¹; N Transfer stroller (added subscale with 4 items)². 15 subscales in Mobility domain with 59 items (8 items added in the PEDI-G) | A) Chair/toilet transfers; B) Car transfers; C) Bed mobility/transfers; D) Tub transfers; E) Indoor locomotion; F) Outdoor locomotion; G) Stairs. 7 items in Mobility domain |
| Social Function | A Comprehension of word meanings (5); B Comprehension of sentence complexity (5); C Functional use of communication (5); D Complexity of expressive communication (5); E Problem-resolution (5); F Social interactive play (adults) (5); G Peer interaction (child of similar age) (5); H Play with objects (5); I Self-information (5); J Time orientation (5); K Play with objects (5); L Self-protection (5); M Community function (5). 13 subscales in Social Function with 65 items, no added item in the PEDI-G | A) Functional comprehension; B) Functional expression; C) Joint problem-solving; D) Peer play; E) Safety. 5 items in Social Function domain |

∗ Number of items in subscale.
¹ Added subscale: Transfer bike trailer with 4 items.
² Added subscale: Transfer stroller with 4 items.

The Functional Skill Scale consists of three domains: Self-care (73 items), Mobility (59 items), and Social Function (65 items). There are a total of 41 subscales among the three domains. Each subscale includes four to six items, which are scored 1 (capable) if a skill is mastered, or 0 (unable) if not. The Caregiver Assistance Scale and the Modification Scale each have 20 items. The Caregiver Assistance Scale measures the amount of assistance required of the caregiver in order for the child to accomplish Self-care (8 items), Mobility (7 items), and Social Function (5 items) activities. The items of the Caregiver Assistance Scale are scored on a six-point ordinal scale ranging from “total assistance required” (score 0) to “independent”
(score 5). The Modification Scale evaluates the modifications the child needs in order to perform ADLs and is also assessed with 20 items, on a four-point ordinal scale ranging from “none” to “extensive” modifications needed. The Modification Scale is not a true measurement scale, but rather a frequency count of the type and extent of environmental modifications the child depends on to support performance (Haley et al., 1992).

During the translation and adaptation of the PEDI for use in Austria, Germany, and Switzerland, nine items were added to the Functional Skill Scale: one item in the Self-care domain and two new subscales of four items each in the Mobility domain. The PEDI-G therefore consisted of 206 items and 43 subscales. One item (“uses fork and knife well”) was added in the subscale “Food Textures” of the Functional Skill Scale Self-care domain. In order for the assessment to be applicable to all clients across the three countries, the expert committee responsible for the translation and adaptation of the PEDI suggested that other forms of transfer also be included in the German PEDI. Therefore two subscales (“Transfer bike trailer” and “Transfer stroller”), with four items each, were added in the Mobility domain. No item was added to the Caregiver Assistance Scale or the Modification Scale during the translation and adaptation process.

Procedure

The PEDI-G was administered by 11 occupational therapists from Austria, Germany, and Switzerland through an interview with a primary caregiver. All therapists had experience of working with young children and knowledge of child development, and could help the parent to understand and judge the PEDI-G items according to the criteria, as recommended in the manual. Before starting the data collection, all occupational therapists intensively studied the PEDI manual and received a minimum of three hours of training in administering the PEDI-G from the first author. Each therapist conducted at least two pilot PEDI-G interviews, which were not retained in the data set.
During the data collection, additional questions and misunderstandings were discussed among the therapists who performed the interviews and the first author in regular meetings, in Skype sessions, and by email.

To examine test–retest reliability, the children’s parents were interviewed a second time by the same therapist who conducted the first interview. The second PEDI-G interview was carried out within a maximum of two weeks. The mean time between test and retest was 10.5 days, with a minimum of seven days and a maximum of 14 days. Parents could choose where the PEDI-G interviews took place, either in the occupational therapy department or at the parents’ home. The duration of interviews ranged from 20 min to one and a half hours.

To examine inter-rater reliability, parents were asked whether they agreed to have the interview audiotaped. Here, two occupational therapists worked as a pair, with one scoring items during the parent interview and the other scoring items after listening to the audiotape. All audio recordings were rated by the same occupational therapist. In sum, 117 interviews were performed twice, and the first interview was audiotaped and then rated.

Data Analysis

All statistical analyses were conducted using R (R Core Team, 2013). Comparisons among the groups from the three countries (Austria,
Switzerland and Germany) were performed in relation to gender and diagnosis using the chi-square test. One-way ANOVAs were used to compare the age of the children and the mean raw scores for the Functional Skill Scale and the Caregiver Assistance Scale. There were no statistically significant between-country differences with respect to gender, age, and diagnosis. In addition, mean scores of the PEDI-G Functional Skill Scale and Caregiver Assistance Scale domains did not differ among children from each country (Table 1).

In previous reliability studies the sum scores of the PEDI were treated as interval data. In order to compare the findings with these studies, the test–retest and inter-rater analyses here were also based on treating the raw scores of the PEDI-G as interval data. To calculate the reliability for the subscales of the Functional Skill Scale, the number of items scored “capable” was summed for each subscale. Thus the sum score of a subscale can vary from 0 to 6. Likewise, the items of the three domains of the Functional Skill Scale were summed. This sum score could vary from 0 to 73 for the Self-care domain, from 0 to 59 for the Mobility domain, and from 0 to 65 for the Social Function domain of the Functional Skill Scale (Table 2). No subscales exist for the Caregiver Assistance Scale and the Modification Scale; the items of the three domains are rated on a six-point and a four-point Likert rating scale, respectively. For the Caregiver Assistance Scale the items were also summed to obtain a sum score for each domain. The sum score could vary from 0 to 40 in the Self-care domain, from 0 to 35 in the Mobility domain, and from 0 to 25 in the Social Function domain of the Caregiver Assistance Scale. Even though other reliability studies of the PEDI did not evaluate the reliability of the Modification Scale, we decided to include the Modification Scale in our analysis and calculated test–retest and inter-rater agreement for all 20 items.
As the Modification Scale is considered an ordinal scale, the items were not summed. Reliability is defined as:

\[ \text{Reliability} = \frac{\text{Between-Subject Variability}}{\text{Total Variability}} \]
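As a minimal sketch of how this ratio can be estimated, the Python function below computes a one-way random-effects ICC from ANOVA mean squares, ICC(1) = (BMS − WMS)/(BMS + (r − 1) WMS), where r is the number of trials per subject. The function name `icc1` and the toy scores are illustrative assumptions; this is not the authors' R code or study data.

```python
from statistics import mean


def icc1(scores):
    """One-way random-effects ICC for a list of per-subject score lists.

    Every subject must have the same number of trials r (e.g. test and retest).
    """
    n = len(scores)            # number of subjects
    r = len(scores[0])         # number of trials per subject
    if n < 2 or r < 2 or any(len(s) != r for s in scores):
        raise ValueError("need >= 2 subjects, each with the same r >= 2 trials")

    grand = mean(x for s in scores for x in s)
    subject_means = [mean(s) for s in scores]

    # Between-subject mean square (BMS) and within-subject mean square (WMS)
    bms = r * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    wms = sum(
        (x - m) ** 2 for s, m in zip(scores, subject_means) for x in s
    ) / (n * (r - 1))

    return (bms - wms) / (bms + (r - 1) * wms)


# Hypothetical domain sum scores for five children, each interviewed twice
toy = [[50, 52], [31, 30], [64, 66], [45, 45], [12, 15]]
print(round(icc1(toy), 3))
```

When the two scores of every child are identical, WMS is zero and the function returns 1; larger within-child differences relative to the spread between children pull the value down.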

To examine test–retest and inter-rater reliability, intraclass correlation coefficients (ICCs) were calculated from a one-way ANOVA as (BMS − WMS)/(BMS + (r − 1) WMS), where BMS and WMS are the between-subject and the within-subject mean square, respectively, and r is the number of trials. ICCs are used as a measure of association when studying the reliability of test scores over time and among raters (test–retest and inter-rater) (Norman and Streiner, 2008). Specifically, the ICC1 (one-way random effect model) is used when each subject is rated by a different rater. An ICC value > 0.75 is considered excellent, 0.60–0.75 good, 0.40–0.59 moderate, and < 0.40 poor (Fleiss et al., 2004). Test–retest and inter-rater reliability were examined using the ICC1 for all 43 subscales of the Functional Skill Scale and the 20 items of the Caregiver Assistance Scale of the PEDI-G. Confidence intervals for ICCs were adjusted by the Bonferroni correction when performing multiple comparisons.

Cohen’s weighted kappa was used to calculate test–retest and inter-rater agreement for the Modification Scale. Cohen’s kappa is based on frequencies for nominal or ordinal data and is a measure of agreement that is adjusted for the agreement
that can be expected by chance (Cohen, 1968). Kappa values range from −1 to 1 and are interpreted according to Landis and Koch (1977) as follows: values less than 0.20 imply poor agreement, values between 0.21 and 0.40 indicate fair agreement, values between 0.41 and 0.60 indicate moderate agreement, values between 0.61 and 0.80 indicate good agreement, and values between 0.81 and 1.00 indicate very good agreement.

In addition to the reliability calculation for PEDI-G subscales and items, and in order to provide further information on the precision of the PEDI measurement, the absolute reliability among scorings was determined using the standard error of measurement (SEM) and the SDD for the three domains of the Functional Skill Scale and the Caregiver Assistance Scale. Absolute reliability and ICCs were calculated for both PEDI-G versions, with and without added items. The SEM is defined in terms of the standard deviation and the reliability as:

\[ \mathrm{SEM} = \sigma \sqrt{1 - R} \]

where σ is the standard deviation of the observed scores and R is the reliability. The SEM is directly related to the reliability of a test. That is, the larger the SEM, the lower the reliability of the test and the less precision there is in the measures taken and scores obtained. The SEM also allows calculation of the SDD, which is calculated as follows:

\[ \mathrm{SDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM} = 2.77 \times \mathrm{SEM} \]

The SDD is an estimate of the smallest difference in PEDI score that is greater than what might be explained by measurement error (between raters or two time points).

RESULTS

Test–Retest Reliability of PEDI-G with added items

The ICCs for each domain (Self-care, Mobility, and Social Function) of the Functional Skill Scale and the Caregiver Assistance Scale are reported in Tables 3 and 4. All ICCs were greater than 0.95, indicating excellent test–retest reliability. The ICCs were greater than 0.75 for each of the 43 subscales of the Functional Skill Scale, indicating excellent test–retest reliability.
Additionally, the ICCs were greater than 0.75 for each of the 20 items of the Caregiver Assistance Scale, indicating excellent test–retest reliability. The identified differences between the mean scores of the six domains for the first and second interview do not show a systematic pattern.

TABLE 3. ICC, SEM, and SDD of the Functional Skill Scale

| Functional Skill Scale domain | Test–retest ICC (95% CI) | Test–retest SEM | Test–retest SDD | Inter-rater ICC (95% CI) | Inter-rater SEM | Inter-rater SDD |
|---|---|---|---|---|---|---|
| Self-care | 0.97 (0.96–1.00) | 1.95 | 5.42 | 0.98 (0.99–1.00) | 0.66 | 1.84 |
| Mobility | 0.98 (0.97–0.99) | 2.01 | 5.58 | 0.98 (0.99–1.00) | 0.66 | 1.82 |
| Social Function | 0.96 (0.96–0.99) | 1.81 | 5.02 | 0.99 (0.99–1.00) | 0.54 | 1.50 |

TABLE 4. ICC, SEM, and SDD of the Caregiver Assistance Scale

| Caregiver Assistance Scale domain | Test–retest ICC (95% CI) | Test–retest SEM | Test–retest SDD | Inter-rater ICC (95% CI) | Inter-rater SEM | Inter-rater SDD |
|---|---|---|---|---|---|---|
| Self-care | 0.97 (0.95–0.99) | 1.98 | 5.48 | 0.98 (0.97–0.99) | 0.61 | 1.7 |
| Mobility | 0.97 (0.94–0.98) | 1.67 | 4.63 | 0.99 (0.98–1.00) | 0.37 | 1.00 |
| Social Function | 0.96 (0.92–0.98) | 1.46 | 4.06 | 0.97 (0.97–0.99) | 0.30 | 0.83 |
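The SDD columns in Tables 3 and 4 follow from the SEM columns through SDD = 1.96 × √2 × SEM. The short Python sketch below reproduces that arithmetic for the test–retest values of the Functional Skill Scale. The function names `sem_from_reliability` and `sdd_from_sem` are illustrative assumptions, and small deviations from the tabulated SDDs are to be expected because the published SEMs are rounded.

```python
import math


def sem_from_reliability(sd, reliability):
    """Standard error of measurement: SEM = sd * sqrt(1 - R)."""
    return sd * math.sqrt(1 - reliability)


def sdd_from_sem(sem, z=1.96):
    """Smallest detectable difference: SDD = z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


# (SEM, published SDD) for test-retest reliability, Functional Skill Scale (Table 3)
reported = {
    "Self-care": (1.95, 5.42),
    "Mobility": (2.01, 5.58),
    "Social Function": (1.81, 5.02),
}

for domain, (sem, published_sdd) in reported.items():
    print(f"{domain}: computed {sdd_from_sem(sem):.2f}, published {published_sdd}")
```

Read this way, a change in a child's domain score between two interviews by the same rater would need to exceed roughly five to six raw points before it is distinguishable from measurement error at the 95% level.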

Inter-Rater Reliability of PEDI-G with added items

The ICCs between two raters for each domain (Self-care, Mobility, and Social Function) are presented in Tables 3 and 4. ICCs were greater than 0.95, indicating excellent inter-rater reliability. The ICCs for inter-rater reliability were greater than 0.75 for the 43 subscales of the Functional Skill Scale, indicating excellent inter-rater reliability. The ICCs were greater than 0.75 for the 20 items of the Caregiver Assistance Scale, indicating excellent inter-rater reliability.

Test–Retest and Inter-Rater Reliability of Modification Scale

The results of inter-rater and test–retest reliability of the Modification Scale using Cohen’s weighted kappa are presented in Tables 5 and 6. Kappa values varied from 0.25 to 1.00, i.e., from poor to almost perfect reliability.

TABLE 5. Test–Retest Reliability for the Modification Scale items

| Modification Scale | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Self-care | 0.68 | 0.41 | 0.38 | 0.32 | 0.39 | 0.64 | 0.73 | 0.39 |
| Mobility | 0.50 | 0.46 | 0.49 | 0.52 | 0.25 | 0.87 | 0.36 | |
| Social Function | 0.79 | 0.53 | 0.33 | 0.56 | 0.51 | | | |

Note: A-H are items of the Modification Scale.

TABLE 6. Inter-Rater Reliability for the Modification Scale items

| Modification Scale | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Self-care | 0.77 | 0.65 | 0.72 | 0.6 | 0.62 | 0.66 | 0.69 | 0.68 |
| Mobility | 0.82 | 0.69 | 0.86 | 0.82 | 0.74 | 0.84 | 0.70 | |
| Social Function | 1.00 | 1.00 | 1.00 | 0.71 | 0.65 | | | |

Note: A-H are items of the Modification Scale.

Impact of added items in the PEDI-G on Reliability

Because nine items were added during the translation and adaptation process of the PEDI-G, absolute reliability was investigated for the two translated versions: the one with added items and the one without added items. The added eight items of the Functional Skill Scale had only a minimal effect on the reliability, with ICC differences of less than 0.02. The ICCs across the domains for the version with added items were in the range 0.96–0.99, and those for the version without the additional items varied from 0.97 to 0.99.

Smallest Detectable Difference of PEDI-G with added items

The SEM and SDD for each domain of the Functional Skill Scale and the Caregiver Assistance Scale are presented in Tables 3 and 4. The SEM when evaluating test–retest and inter-rater reliability varied from 0.54 to 2.01 across all domains of the Functional Skill Scale and from 0.30 to 0.97 across all domains of the Caregiver Assistance Scale. The SDD varied from 1.50 to 5.58 across all domains of the Functional Skill Scale and from 0.83 to 5.58 across all domains of the Caregiver
Assistance Scale. This indicates that with 95% certainty a difference of 6 points for the domains of the Functional Skill Scale can be interpreted as a real difference and not due to measurement error.

DISCUSSION

Test–retest and inter-rater reliability of the German PEDI were examined for all three scales in a sample of children with and without a physical disability in Austria, Germany, and Switzerland.

Test–Retest and Inter-Rater Reliability

Excellent test–retest and inter-rater reliability were demonstrated for the PEDI-G in a sample of children with typical development and children with physical disabilities. The findings support the accuracy and stability of the PEDI-G when completed by interview in German. Our findings are comparable to the results of studies with the PEDI in other and similar contexts, samples, and languages, such as Chinese, Danish, Dutch, Norwegian, and Turkish (Berg et al., 2004; Chen et al., 2009; Erkin et al., 2007; Stahlhut et al., 2010; Wassenberg-Severijnen, 2005). Although the ICCs obtained in our study were excellent, there was slightly more variation in test–retest reliability than in inter-rater reliability. The differences between the first and second measurements are also consistent with other reliability studies of the PEDI and may, as in the other studies, be interpreted as being due to a test effect (Berg et al., 2004; Wassenberg-Severijnen, 2005). Parents may have considered the performance of their child and their assistance more consciously after the first PEDI interview. Knowing that they were going to be interviewed a second time may have motivated them to observe their child with increased attention. This is also in line with the findings of a qualitative study by Rich et al.
(2013), which explored parental experiences associated with completion of the PEDI before and after therapy intervention, and concluded that completion of the PEDI was associated with increased parental awareness of developmental patterns and greater insight into the levels of assistance they gave their child.

Inter-rater agreement on the Modification Scale overall was good. In contrast, the test–retest agreement varied from poor to perfect agreement across all three domains. Only two studies have investigated the test–retest agreement of the Modification Scale (Chen et al., 2009; Chen et al., 2010). Unfortunately, the results of those studies cannot be directly compared to our findings, as the samples consisted either of only children with typical development or of only children with cerebral palsy. The test–retest reliability of the Modification Scale in our study is not sufficient overall. Therefore, this scale may not be useful to measure change over time.
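For reference, the agreement statistic discussed here can be sketched in Python as below for a single ordinal item rated on two occasions. The linear disagreement weights, the function name `weighted_kappa`, and the toy ratings are assumptions made for illustration; the paper does not state which weighting scheme was used, and the numbers are not study data.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's weighted kappa with linear disagreement weights.

    Ratings are integers 0 .. n_categories - 1 (e.g. a four-point ordinal item).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two rating categories")
    n = len(ratings_a)
    k = n_categories

    # Observed joint proportions and the marginal proportions of each occasion
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear weight: 0 on the diagonal, 1 for the two most distant categories
    def weight(i, j):
        return abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * marg_a[i] * marg_b[j] for i, j in cells)
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - disagree_obs / disagree_exp


# Hypothetical first and second ratings of one item for eight children
first_ratings = [0, 0, 1, 2, 3, 1, 0, 2]
second_ratings = [0, 1, 1, 2, 3, 0, 0, 3]
print(round(weighted_kappa(first_ratings, second_ratings, 4), 2))
```

Because the statistic is corrected for chance agreement, items on which nearly all children receive the same rating can yield low kappa values even when raw agreement is high, which is one reason item-level values should be read alongside the rating distributions.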

PEDI-G with and without added items

Our results show that the PEDI-G with added items can also be used as a reliable tool for evaluating functional ability and performance of children’s ADLs in a German-speaking context. Adding items during the translation and adaptation process may have strengthened the cultural validity of the assessment (Nordtorp et al., 2013). However, differences in a measure modified to improve cultural validity need to be taken into consideration when comparing results with findings from other countries. Based on the results of our study, a decision on whether or not to keep the added items in the final PEDI-G is not yet possible. Our results did not provide knowledge about whether those items add any new information. However, our findings indicate that reliability was not affected by the items added to the PEDI-G.

Smallest Detectable Differences of PEDI-G with added items

Iyer et al. (2003) recommend using other means for describing meaningful clinical change in children with disabilities, such as the minimal clinically important difference (MCID), instead of age-based or norm-based scoring, because for most children with disabilities a return to age-expected performance is not a realistic goal. The MCID may be especially important to consider if the assessment will be used to evaluate the effects of interventions (Haley and Fragala, 2006). Fragala-Pinkham et al. (2005) used the PEDI in an intervention study of children with physical disabilities and considered a minimal detectable change of more than ±7 for the Mobility domain of the Functional Skill Scale (which is calculated with the ICC instead of the SEM as for the smallest detectable difference) as real change. This is in line with our results, where a difference of 6 or more raw score points in the Mobility domain of the Functional Skill Scale is considered a real difference.
Unfortunately, other reliability studies of the PEDI did not report the SEM and the smallest detectable difference within a comparable sample. Therefore, it is difficult to compare our findings with previous results. If the PEDI-G is used to evaluate the ADLs of children with and without a diagnosed physical disability by the same rater at two time points, an SEM of around two points across all domains (following a 95% confidence interval) must be considered. Accordingly, a mean difference of approximately 6 raw score points in each domain of the Functional Skill Scale can be interpreted as a “real” difference.

Methodological Consideration

In previous reliability studies the PEDI scores were treated as interval data, and thus the ICC is mostly used to evaluate different aspects of reliability. Because the assumption that ordinal data can be treated directly as interval data is open to question, so is this approach, and one might recommend a more conservative approach, e.g., calculating Cohen’s weighted kappa at item level (Merbitz et al., 1989; Wright and Linacre, 1989). However, as the evaluation of item reliability relates more to the development of assessments, we agree with the perspective of Streiner and Norman (2008), who list eleven reasons why the ICC is superior. Additionally, Shrout and Fleiss (1979) highlight the need for selecting the appropriate form of the ICC, that is, whether the data are analysed using a one-way or a two-way model to calculate the reliability of test scores. Because the first and second ratings were performed by the same rater and all audio recordings were scored by the same rater, a conservative approach was chosen by calculating the reliability based on a one-way model in which each subject was rated by a different set of raters.

Limitations

This study has potential limitations that need to be considered. Information on the parents (e.g., gender, education, and age) is lacking, and therefore the potential influence of parent characteristics on reliability was not examined. Another limitation could be that the study included participants with a wide range of function who might have had different profiles of functional limitations in their daily lives. Because the range within the sample (children with and without a diagnosed physical disability and a wide age range) influences reliability, future studies may need to evaluate reliability across subgroups as well in order to confirm or refute these findings. Further research, e.g., concerning discriminant validity of the PEDI-G in children with and without disabilities, is also needed. In addition, norms for children in German-speaking countries may be needed, as research has indicated that the US norms may not fit other contexts (Groleger et al., 2005; Nordmark et al., 1999).

CONCLUSION

Our findings support the use of the German translated and adapted (9 additional items) Pediatric Evaluation of Disability Inventory (PEDI-G). The Functional Skill Scale and the Caregiver Assistance Scale can be used as reliable measures to evaluate performance in Self-care, Mobility and Social Function in children with and without diagnosed physical disability. The results for the Modification Scale indicate sufficient reliability for some but not all items; this scale is therefore not recommended for measuring change.
The nine added items in the PEDI-G did not influence the reliability.

ACKNOWLEDGMENTS

We thank the children’s caregivers for their participation. We also thank occupational therapists in Austria, Germany and Switzerland for their assistance with data collection; the project would not have been possible without their contributions.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article. This study was supported by grants from the Swiss Foundation for Children with Cerebral Palsy and the Commission for Technology and Innovation in Switzerland.

ABOUT THE AUTHORS

Christina Schulze, Reg OT, MScOT, Zurich University of Applied Sciences Switzerland, School of Health Professions, Institute of Occupational Therapy, and PhD student at Karolinska Institutet in Sweden, Department of Neurobiology, Care Sciences and Society, Division of Occupational Therapy. Anders Kottorp, PhD, OT Reg, associate professor at Karolinska Institutet in Sweden, Department of Neurobiology, Care Sciences and Society, Division of Occupational Therapy, and affiliated professor at Zurich University of Applied Sciences, School of Health Professions, Institute of Occupational Therapy. André Meichtry, MSc ETH in Statistics, Statistical Consultant, Lecturer, School of Health Professions, Zurich University of Applied Sciences Switzerland. Margareta Lilja, Reg OT, PhD, Full professor, Department of Health Sciences, Luleå University of Technology, Luleå, Sweden and at Karolinska Institutet in Sweden, Department of Neurobiology, Care Sciences and Society, Division of Occupational Therapy. Prof. Julie Page, PhD, Head of Research and Development in Occupational Therapy, Zurich University of Applied Sciences Switzerland, School of Health Professions, Institute of Occupational Therapy.

REFERENCES

Anastasi A, Urbina S. (1997). Psychological testing. Prentice Hall.
Beaton D, Bombardier C, Guillemin F, Ferraz MB. (2002). Recommendations for the cross-cultural adaptation of health status measures. Retrieved from http://www.dash.iwh.on.ca/assets/images/pdfs/xculture2002.pdf
Berg M, Frøslie KF, Hussain A. (2003a). Applicability of pediatric evaluation of disability inventory in Norway. Scandinavian Journal of Occupational Therapy 10(3):118–126.
Berg M, Jahnsen R, Frøslie KF, Hussain A. (2004). Reliability of the Pediatric Evaluation of Disability Inventory (PEDI). Physical and Occupational Therapy in Pediatrics 24(3):61–77.
Berg M, Jahnsen R, Holm I, Hussain A. (2003b). Translation of a multi-disciplinary assessment - procedures to achieve functional equivalence. Advances in Physiotherapy 5(2):57–66.
Chen K-L, Hsieh C-L, Sheu C-F, Hu F-C, Tseng M-H. (2009). Reliability and validity of a Chinese version of the pediatric evaluation of disability inventory in children with cerebral palsy. Journal of Rehabilitation Medicine 41(4):273–278. doi: 10.2340/16501977-0319
Chen K-L, Tseng M-H, Hu F-C, Koh C-L. (2010). Pediatric evaluation of disability inventory: A cross-cultural comparison of daily function between Taiwanese and American children. Research in Developmental Disabilities 31(6):1590–1600. doi: 10.1016/j.ridd.2010.05.002
Cohen J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin 70(4).
Landis JR, Koch GG. (1977). The measurement of observer agreement for categorical data. Biometrics 33(1):159–174.
Elad D, Barak S, Eisenstein E, Bar O, Herzberg O, Brezner A. (2012). Reliability and validity of Hebrew Pediatric Evaluation of Disability Inventory (PEDI) in children with cerebral palsy – health care professionals vs. mothers. Journal of Pediatric Rehabilitation Medicine 5(2):107–115.
Erkin G, Elhan A, Aybay C, Sirzai H, Ozel S. (2007). Validity and reliability of the Turkish translation of the Pediatric Evaluation of Disability Inventory (PEDI). Disability & Rehabilitation 29(16):1271–1279.
Feldman AB, Haley SM, Coryell J. (1990). Concurrent and construct validity of the Pediatric Evaluation of Disability Inventory. Physical Therapy Journal 70(10):602–610.
Fleiss JL, Levin B, Paik MC. (2004). Statistical methods for rates and proportions. Wiley.
Fragala-Pinkham MA, Haley SM, Rabin J, Kharasch VS. (2005). A fitness program for children with disabilities. Physical Therapy 85(11):1182–1200.
Gannotti ME. (1998). The validity and reliability of the Pediatric Evaluation of Disability Inventory for children living in Puerto Rico. (Ph.D.), University of Connecticut. Available from EBSCOhost cin20 database.
Gannotti ME, Cruz C. (2001). Content and construct validity of a Spanish translation of the Pediatric Evaluation of Disability Inventory for children living in Puerto Rico. Physical and Occupational Therapy in Pediatrics 20(4):7–24.
Gantschnig BE, Page J, Nilsson I, Fisher AG. (2013). Detecting differences in activities of daily living between children with and without mild disabilities. American Journal of Occupational Therapy 67(3):319–327.
Groleger K, Vidmar G, Zupan A. (2005). Applicability of the Pediatric Evaluation of Disability Inventory in Slovenia. Journal of Child Neurology 20(5):411–416. doi: 10.1177/08830738050200050301
Haley SM, Coster WJ, Dumas HM, Moed R. (2012). PEDI-CAT: Development, standardization and administration manual. Trustees of Boston University, under license to CREcare, LLC.
Haley SM, Coster WJ, Ludlow LH, Haltiwanger JT, Andrellos PJ. (1992). Pediatric evaluation of disability inventory: Development, standardization, and administration manual. Boston, MA: New England Medical Centre Inc. and PEDI Research Group.
Haley SM, Fragala-Pinkham MA. (2006). Interpreting change scores of tests and measures used in physical therapy. Physical Therapy 86(5):735–743.
Hambleton RK. (2001). The next generation of the ITC test translation and adaptation guidelines. European Journal of Psychological Assessment 17(3):164–172.
Knox V, Usen Y. (2000). Clinical review of the Pediatric Evaluation of Disability Inventory. British Journal of Occupational Therapy 63(1):29–32.
Kottorp A, Bernspång B, Fisher AG. (2003). Validity of a performance assessment of activities of daily living for people with developmental disabilities. Journal of Intellectual Disability Research 47(Pt 8):597–605.
Iyer LV, Haley SM, Watkins MP, Dumas HM. (2003). Establishing minimal clinically important differences for scores on the pediatric evaluation of disability inventory for inpatient rehabilitation. Physical Therapy 83(10):888–898.
McCarthy ML, Silberstein CE, Atkins EA, Harryman SE, Sponseller PD, Hadley-Miller NA. (2002). Comparing reliability and validity of pediatric instruments for measuring health and well-being of children with spastic cerebral palsy. Developmental Medicine & Child Neurology 44(7):468–476.
Mancini M. (2005). Inventory Pediatric Evaluation of Disability (PEDI): Brazilian adapted version of the manual. Belo Horizonte UFMG 6:55–133.
Merbitz C, Morris J, Grip JC. (1989). Ordinal scales and foundations of misinference. Archives of Physical Medicine & Rehabilitation 70:308–312.
Morales N, Funayama C, Rangel V, Frontarolli A, Araujo R, Pinto R, Silva C. (2008). Psychometric properties of the Child Health Assessment Questionnaire (CHAQ) applied to children and adolescents with cerebral palsy. Health and Quality of Life Outcomes 6(1):109.
Nichols, Case-Smith. (1996). Reliability and validity of the Pediatric Evaluation of Disability Inventory. Pediatrics of the American Physical Therapy Association 8:15–24.
Nordmark E, Orban K, Hägglund G, Jarnlo GB. (1999). The American Paediatric Evaluation of Disability Inventory (PEDI). Applicability of PEDI in Sweden for children aged 2.0–6.9 years. Scandinavian Journal of Rehabilitation Medicine 31(2):95–100.
Nordtorp HL, Nyquist A, Jahnsen R, Moser T, Strand LI. (2013). Reliability of the Norwegian version of the Children’s Assessment of Participation and Enjoyment (CAPE) and Preferences for Activities of Children (PAC). Physical & Occupational Therapy in Pediatrics 33(2):199–212. doi: 10.3109/01942638.2012.739269
Norman GR, Streiner DL. (2008). Biostatistics: The bare essentials. B.C. Decker.
Palisano R, Rosenbaum P, Bartlett D, Livingston M. (2008). Content validity of the expanded and revised Gross Motor Function Classification System. Developmental Medicine & Child Neurology 50(10):744–750.
Palisano RJ, Rosenbaum PL, Walter S, Russell D, Wood E, Galuppi B. (1997). Development and reliability of a system to classify gross motor function in children with cerebral palsy. Developmental Medicine and Child Neurology 39:214–233.
Projektgruppe ergotherapeutische Befundinstrumente in der Pädiatrie, D. (Ed.). (2004). Befundinstrumente in der pädiatrischen Ergotherapie. Idstein: Schulz-Kirchner Verlag.
Rich D, Rigby P, Wright V. (2013). Mothers’ experiences with the Pediatric Evaluation of Disability Inventory (PEDI). Physical & Occupational Therapy in Pediatrics. doi: 10.3109/01942638.2013.823473
Schulze C, Page J, Kottorp A, Lilja M. (2013). Adapting functional assessments for use in a new context: A balancing act. Scandinavian Journal of Occupational Therapy 20(5):336–342. doi: 10.3109/11038128.2013.802010
Stahlhut M, Christensen J, Aadahl M. (2010). Applicability and intrarespondent reliability of the pediatric evaluation of disability inventory in a random Danish sample. Pediatric Physical Therapy 22(2):161–169. doi: 10.1097/PEP.0b013e3181dbf965
Stahlhut M, Gard G, Aadahl M, Christensen J. (2011). Discriminative validity of the Danish version of the Pediatric Evaluation of Disability Inventory (PEDI). Physical & Occupational Therapy in Pediatrics 31(1):78–89. doi: 10.3109/01942638.2010.486824
Streiner DL, Norman GR. (2008). Health measurement scales: A practical guide to their development and use. Oxford University Press.
Vacha-Haase T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement 58(1):6–20. doi: 10.1177/0013164498058001002
Wassenberg-Severijnen JE. (2005). Pediatric Evaluation of Disability Inventory (PEDI): Calibrating the Dutch version. (PhD Doctoral thesis), Universiteit Utrecht.
Wright B, Linacre M. (1989). Observations are always ordinal; Measurement, however, must be interval. Archives of Physical Medicine & Rehabilitation 70:857–860.
Ziviani J, Ottenbacher KJ, Shepard K, Foreman S, Astbury W, Ireland P. (2001). Concurrent validity of the Functional Independence Measure for Children (Wee-FIM) and the Pediatric Evaluation of Disability Inventory for children with developmental disabilities and acquired brain injury. Physical and Occupational Therapy in Pediatrics 21:91–101.
