International Journal of Speech-Language Pathology, 2014; 16(4): 417–426

INVITED ARTICLE

Criterion-related validity of the Test of Children’s Speech sentence intelligibility measure for children with cerebral palsy and dysarthria

MEGAN HODGE & CARRIE LYNNE GOTZKE

Int J Speech Lang Pathol Downloaded from informahealthcare.com by University of Otago on 07/10/15 For personal use only.

University of Alberta, Edmonton, Alberta, Canada

Abstract

Purpose: To evaluate the criterion-related validity of the Test of Children’s Speech (TOCS+) sentence measure (Hodge, Daniels, & Gotzke, 2009) for children with dysarthria and cerebral palsy (CP) by comparing intelligibility and rate scores obtained concurrently from the TOCS+ and from a conversational sample.

Method: Twenty children (3 to 10 years old) diagnosed with spastic CP participated. Nineteen children also had a confirmed diagnosis of dysarthria. Children’s intelligibility and speaking rate scores obtained from the TOCS+, which uses imitation of sets of randomly selected items ranging from 2–7 words (80 words in total), and from a contiguous 100-word conversational speech sample were compared.

Results: Mean intelligibility scores were 46.5% (SD = 26.4%) and 50.9% (SD = 19.1%) and mean rates in words per minute (WPM) were 90.2 (SD = 22.3) and 94.1 (SD = 25.6), respectively, for the TOCS+ and conversational samples. No significant differences were found between the two conditions for intelligibility or rate scores. Strong correlations were found between the TOCS+ and conversational samples for intelligibility (r = 0.86; p < 0.001) and WPM (r = 0.77; p < 0.001).

Conclusion: The results support the criterion validity of the TOCS+ sentence task as a time-efficient procedure for measuring intelligibility and rate in children with CP, with and without confirmed dysarthria. Children varied in their relative performance on the two speaking tasks, reflecting the complexity of factors that influence intelligibility and rate scores.

Keywords: Cerebral palsy, children, intelligibility.

Correspondence: Professor Megan Hodge, University of Alberta, Communication Sciences and Disorders, Rm. 2-70 Corbett Hall, University of Alberta, Edmonton, Alberta, T6G2G4 Canada. Email: [email protected]
ISSN 1754-9507 print/ISSN 1754-9515 online © 2014 The Speech Pathology Association of Australia Limited. Published by Informa UK, Ltd. DOI: 10.3109/17549507.2014.930174

Introduction

The critical importance of intelligible speech for successful spoken communication is widely accepted in the field of communication disorders and has been elucidated by multiple authors over several decades (e.g. Kent, Miolo, & Bloedel, 1994; Kent, Weismer, Kent, & Rosenbek, 1989; Miller, 2013; Yorkston & Beukelman, 1981). However, reliable and valid assessment of speech intelligibility continues to be challenging because of differences in how narrowly or broadly it is defined and the multiple factors that can affect its measurement (Hodge & Whitehill, 2010; Miller, 2013). These include variables related to the talker, speaking task and environment, phonetic and linguistic characteristics of the spoken material, listener characteristics, judging task and listening environment. This is particularly the case for pre-literate children or children at younger ages developmentally, where factors related to the ability to attend to and perform the speaking task may further complicate intelligibility assessment. An additional challenge in assessing children with brain disturbances is that aspects of spoken language competency other than motor control may also be affected. These include understanding and use of vocabulary, grammar and syntax and following conversational norms for using spoken language for different purposes.

The Test of Children’s Speech (TOCS+) sentence intelligibility measure (Hodge, Daniels, & Gotzke, 2009) was developed in an attempt to provide a standard protocol for obtaining intelligibility scores based on connected speech from children with developmental language ages as young as 3 years. The original impetus for developing this measure was to provide a reliable and valid intelligibility measure to use with younger English-speaking children with dysarthria. This population was considered a priority for such a measure because the chronic nature of neuromuscular impairment underlying these children’s speech disorders makes normally-articulated, natural-sounding speech rarely achievable. However, increasing intelligibility is often a primary therapeutic goal for these children (Hodge & Wellman, 1999). Speech intelligibility measures are desirable for children with dysarthria to estimate the severity of speech disorder, which influences decisions about the need for and nature of intervention and potentially discriminates between children with and without speech disability. Intelligibility measures are also needed to evaluate change in a child’s ability to make her or his words understandable to others over time or following medical or behavioural intervention.

In keeping with the original purpose for developing the TOCS+, initial evaluation of its reliability (Hodge & Gotzke, 2010) and validity has focused on children with cerebral palsy (CP). Developmental dysarthria has been identified as the most frequently occurring motor speech disorder in children with CP (Webb & Adler, 2008), which is the most common neurodisability in children (Msall & Park, 2009). CP is defined as a group of disorders of movement and posture that limits activity and is attributed to nonprogressive disturbances in the foetal or infant brain (Bax, Goldstein, Rosenbaum, Leviton, & Paneth, 2005). The impact of CP on a child’s motor control for speech production may delay early speech learning and impair the strength, speed, accuracy, co-ordination and endurance of the muscle groups used to speak (Hodge, 2010). These factors constrain development of differentiated, precise, dynamic actions of the oral articulators and their co-ordination with the respiratory–phonatory system to produce clear, efficient speech patterns.

The TOCS+ sentence measure is based on a narrow definition of intelligibility, that is, the extent to which listeners, who are not familiar with the child’s speech, can identify the words in audio recordings of connected utterances produced by the child.
On a continuum of signal dependent and signal independent measures of intelligibility (Miller, 2013), TOCS+ was designed to be closer to the signal dependent end because the listener’s task is to understand the child’s spoken words based on the sound signal without broader contextual cues beyond those contained in the utterance. The TOCS+ sentence measure is similar to the sentence format of the Sentence Intelligibility Test for Windows developed by Yorkston, Beukelman, and Hakel (1996) for adults with dysarthria; however, the vocabulary and morphosyntax of the TOCS+ items are appropriate for children with developmental language ages between 3 and 7 years. The TOCS+ sentence item pools were developed with reference to normative semantic, grammatical, and syntactic characteristics for children with a mean length of utterance (MLU) ranging from two to seven words (e.g. Leadholm & Miller, 1992; Loban, 1976; Miller & Chapman, 1986; Reznick & Goldsmith, 1989; Stemach & Williams, 1988; Tager-Flusberg, 1993; Wepman & Lozar, 1973).

Each unique sentence test contains both shorter and longer phrases and sentences. The user selects the longest utterance length to appear in a form (3, 4, 5, 6 or 7 words), based on the child’s mean length of utterance, to ensure that the language level of the items is within the child’s ability. The number of stimulus items in each sentence test depends on the number of words in the longest test items. TOCS+ software generates a sentence test by randomly selecting items from the master pools that have sentence lengths at or below the child’s estimated MLU. For example, for a child with an estimated MLU of four words, nine 2-word items (18 words), ten 3-word items (30 words) and eight 4-word items (32 words) would be selected randomly from the respective master sentence length pools to create a “unique” 80-word sentence test. This serves to reduce a listener’s ability to predict what the child will say. During item selection, the software ensures that content words (nouns, verbs, adjectives and adverbs) do not repeat within a test so that the items selected for a given test will contain unique content words. Most of the items in the 2-word pool and many of the items in the 3-word pool are noun and verb phrases, as opposed to true sentences. Complex utterances only appear in the 6- and 7-word item pools. The software controls the number and type of complex utterances that are selected for each test that has 6- and 7-word utterances so that these are the same across all tests.

TOCS+ software elicits a child’s productions through direct imitation of pre-recorded auditory models of the items, which are presented with representative photographs and text. This has the advantages of time efficiency and knowledge of the target utterances, but the disadvantage that it may not be representative of a given child’s typical speech patterns. Imitative and spontaneous speaking tasks have been used to obtain samples of young children’s speech for intelligibility judgements.
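The random item selection described earlier in this section can be illustrated with a minimal Python sketch. It is not the TOCS+ implementation: the pool format, the function names and the greedy selection strategy are assumptions made for illustration, and the control over complex 6- and 7-word utterances is not modelled.

```python
import random


def build_sentence_test(pools, counts, rng=None):
    """Randomly pick items per length without repeating content words.

    pools:  {length_in_words: [(item_text, set_of_content_words), ...]}
    counts: {length_in_words: number_of_items_to_select}
    Both structures are hypothetical; the real item pools are not shown here.
    """
    rng = rng or random.Random()
    used_content_words = set()
    test_items = []
    for length in sorted(counts):
        candidates = list(pools[length])
        rng.shuffle(candidates)  # random order yields a different form each time
        chosen = 0
        for text, content_words in candidates:
            if chosen == counts[length]:
                break
            # Skip items that would repeat a content word already in the test.
            if used_content_words.isdisjoint(content_words):
                test_items.append(text)
                used_content_words.update(content_words)
                chosen += 1
        if chosen < counts[length]:
            raise ValueError(f"pool for {length}-word items is too small")
    return test_items


def total_words(test_items):
    """Number of words across all selected items."""
    return sum(len(item.split()) for item in test_items)


# Item counts from the worked example in the text (longest item = 4 words).
COUNTS_MLU_4 = {2: 9, 3: 10, 4: 8}
```

With sufficiently large pools, `total_words(build_sentence_test(pools, COUNTS_MLU_4))` gives 80, matching the 18 + 30 + 32 words in the example. A single greedy pass can fail on small pools even when a valid selection exists, so a fuller version would need a retry or backtracking step.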
Gordon-Brannan and Hodson (2000) reported a significant correlation coefficient of 0.85 between intelligibility scores obtained from a 100-word conversational sample and a sentence imitation task for 4–5 year-old children with varying levels of phonological proficiency (adult-like to severe involvement). Despite this strong correlation, the authors argued that spontaneously generated samples of connected speech are the most representative of children’s typical spoken language and speech production abilities and the preferred method for evaluating intelligibility. However, spontaneous samples have several disadvantages compared to imitated samples; they are time-consuming to gloss, may vary considerably in phonetic and morphosyntactic content and are difficult to gloss when intelligibility is poor (Flipsen, Hammer, & Yost, 2005). Johnson, Weston, and Bain (2004) compared spontaneous speech samples with a set of imitated sentences to estimate severity of speech delay in 20 children aged 4–6 years. They found that two kinds of percentage consonant correct (PCC) scores, each derived from the two sample types, did not differ significantly between sample types and reported strong, positive correlation coefficients (r = 0.86; 0.84) between PCC scores obtained in each sample type. The authors concluded that imitative samples that are constructed using age-appropriate vocabulary and morphosyntactic structures are an appealing alternative to spontaneous samples for determining measures of phonetic ability such as PCC. Such samples are more time-efficient to collect and analyse and a gloss does not need to be created. Johnson et al. (2004) also reported that the imitative samples obtained in their study allowed for a more in-depth analysis for children who were very unintelligible because parts of the sample that were difficult to gloss were not lost.

The TOCS+ sentence measure uses a word identification task to determine intelligibility scores. In this task, listeners identify the words perceived without knowledge of the intended utterance using a word-by-word write down procedure (orthographic transcription). These orthographic transcriptions are compared to the stimulus items spoken by the child to determine the number of words that are identified correctly by the listener. The number of words identified correctly out of the total words spoken, converted to a percentage, provides the intelligibility score. Although word identification requires more time than scaling or other subjective estimates of intelligibility measurement (e.g. Hustad, Schueler, Schultz, & DuHadway, 2012; Levy, Ramig, & Camarata, 2013), it has the advantage of greater reliability and sensitivity (Hustad, 2006; Whitehill, 2002). The judging component of the TOCS+ software presents the test recordings to listeners for word identification. Listeners type their responses using the computer keyboard. The software records and saves the listener’s response for each item and later provides a preliminary score based on a word-matching algorithm.
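The word identification score described above can be sketched in Python as follows. The matching rule is an assumption: the TOCS+ word-matching algorithm is not specified here, so an order-preserving alignment from the standard library (`difflib.SequenceMatcher`) stands in for it, and all function names are hypothetical.

```python
import string
from difflib import SequenceMatcher
from statistics import mean


def tokenize(text):
    """Lower-case words with surrounding punctuation removed."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def matched_words(key, response):
    """Return (words identified correctly, words in the key) for one item.

    Stand-in matching rule (an assumption): words count as correct when they
    align, in order, between the key and the listener's transcription.
    """
    key_words, response_words = tokenize(key), tokenize(response)
    matcher = SequenceMatcher(None, key_words, response_words, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct, len(key_words)


def listener_score(keys, responses):
    """Percentage of key words one listener identified across all items."""
    correct = total = 0
    for key, response in zip(keys, responses):
        c, n = matched_words(key, response)
        correct += c
        total += n
    return 100.0 * correct / total


def child_score(keys, responses_by_listener):
    """Mean of the individual listeners' percentage scores."""
    return mean(listener_score(keys, r) for r in responses_by_listener)
```

For a key of "the dog is big" transcribed as "the dog big", three of four words match and the item contributes 75% for that listener. Any automatic score of this kind would be preliminary and would still need checking by hand, for example for homophones and misspellings.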
Like the Sentence Intelligibility Test for Windows, the duration of each of the child’s recorded utterances for the TOCS+ can be measured and summed to determine total speaking time in minutes. This value can be used to calculate speaking rate in words per minute (WPM). Rate is of interest in considering intelligibility in persons with CP as their speaking rates have been reported to be slower than expected for their age (DuHadway & Hustad, 2013; Nip, 2013). In addition, speaking rate has been the focus of some interventions to increase intelligibility in children with CP (e.g. Levy et al., 2013; Pennington, Roelant, Thompson, Robson, Steen, & Miller, 2013). Delayed alternate form and inter-rater reliability for the TOCS+ intelligibility and rate measures have been reported previously (Hodge & Gotzke, 2010).

The purpose of this study was to evaluate the criterion-related validity of the TOCS+ sentence measure for children with dysarthria and CP by comparing intelligibility and rate scores obtained concurrently from the TOCS+ and from a conversational sample. Spontaneous connected speech samples have been recommended as the “gold standard” for measuring a child’s speech ability. The study was designed to answer the following questions: (1) Do intelligibility scores based on word identification tasks and speaking rate measured in WPM differ for the TOCS+ sentence measure, compared with a 100-word conversational sample, for children with CP? and (2) What is the relationship between the intelligibility and rate scores obtained from the TOCS+ sentence and the conversational sample conditions? Findings of non-significant differences between the conditions and strong positive correlations between children’s scores in the two conditions would be interpreted as support for the criterion-related validity of the TOCS+ sentence measure.

Method

Ethical considerations

The procedures followed in the study were approved by the Health Research Ethics Board at the University of Alberta and the research ethics boards at the two hospital sites where children were recruited, and were in accordance with the Declaration of Helsinki (1975), as revised in 1983. Informed consent was obtained from the parents of the children and from the listeners. The children also assented to participate. Parents were provided with payment to cover their parking expenses for attending the data collection sessions. Each listener received an honorarium of 10 dollars for participating.

Participants

Children were recruited via convenience sampling from local community programs and two rehabilitation hospitals serving children in western Canada. Audio recordings obtained from 20 children (age range = 3–10 years; M = 5.3 years; SD = 2.3) with a medical diagnosis of spastic CP were analysed. Characteristics of the children are reported in Table I. The medical diagnosis of CP was made by experienced paediatric physiatrists who were providing clinical services to children with CP at these hospitals. To increase homogeneity of the sample studied, only children who had primarily upper motor neuron signs of CP were included. Type and distribution of motor impairment of the 20 child participants are described in Table I.
The children (13 girls; seven boys) represented the full range of severity (levels I–V) on the Gross Motor Function Classification System Expanded and Revised (Palisano, Rosenbaum, Bartlett, & Livingston, 2007). All children had developmental receptive and expressive language ages of at least 3 years, as reported by their clinical speech-language pathologists, normal hearing based on an audiological assessment conducted within 1 year of the date of recording or a standard hearing screening at the time of recording (Alberta College of Speech-Language Pathologists and Audiologists, 2008) and were able to perform the tasks required. Nineteen of the children were judged to be dysarthric as determined independently by two experienced speech-language pathologists. They ranged in severity of speech impairment from mild-to-severe and all used speech as their main means of communication for social interaction. Child 12 (Tables I and II) had delayed speech sound use for his age based on a standard articulation assessment (scored below the 16th percentile for his age) and had a mild, inconsistent dysphonic voice quality at the time of recording, but a diagnosis of dysarthria could not be made with a high level of confidence.

Table I. Child participant characteristics.

| Child | CA (years) | CP diagnosis | Sex | GMFCS level | MLU/longest utterance (words) | TOCS+ longest item (words) |
|---|---|---|---|---|---|---|
| 1 | 3 | Spastic quadriplegia | F | IV | 2.91/6 | 4 |
| 2 | 3 | Worster-Drought | M | I | 4.29/9 | 4 |
| 3 | 4 | Spastic quadriplegia | F | IV | 3.52/10 | 5 |
| 4 | 4 | Spastic hemiplegia (L) | M | II | 3.82/8 | 4 |
| 5 | 4 | Spastic quadriplegia | F | IV | 4.21/10 | 5 |
| 6 | 4 | Spastic quadriplegia | F | IV | 2.12/6 | 3 |
| 7 | 4 | Worster-Drought | F | I | 4.00/9 | 6 |
| 8 | 5 | Spastic quadriplegia | M | IV | 2.86/10 | 6 |
| 9 | 5 | Spastic quadriplegia | F | IV | 4.24/7 | 6 |
| 10 | 5 | Worster-Drought | M | II | 6.81/11 | 6 |
| 11 | 5 | Spastic diplegia | M | III | 4.64/12 | 6 |
| 12 | 6 | Spastic hemiplegia (R) | M | II | 3.92/8 | 5 |
| 13 | 6 | Spastic diplegia | F | III | 5.10/9 | 7 |
| 14 | 6 | Spastic quadriplegia | F | IV | 3.85/11 | 4 |
| 15 | 7 | Spastic diplegia | F | III | 4.17/11 | 7 |
| 16 | 8 | Spastic hemiplegia (L) | F | II | 4.68/8 | 6 |
| 17 | 8 | Spastic quadriplegia | F | IV | 3.62/11 | 7 |
| 18 | 10 | Spastic quadriplegia | F | IV | 4.21/12 | 7 |
| 19 | 10 | Spastic quadriplegia | F | IV | 6.44/18 | 7 |
| 20 | 10 | Spastic quadriplegia | M | V | 2.63/8 | 3 |

CA, chronological age; CP, cerebral palsy; Worster-Drought, Worster-Drought Syndrome, also known as congenital suprabulbar paresis; L, left; R, right; F, female; M, male; GMFCS, Gross Motor Function Classification System Level (Palisano et al., 2007), which uses a 5-level system to classify a child’s abilities and limitations in gross motor function, with level I indicating the highest level of abilities and fewest limitations; MLU, mean length of utterance in the 100-word spontaneous sample; longest utterance, number of words in longest utterance in the 100-word spontaneous sample; longest item, number of words in the longest item on the TOCS+ sentence measure.

Table II. Child participants’ intelligibility and speaking rate scores in the TOCS+ imitated and spontaneous speech samples.

| Child | Intelligibility (%): TOCS+ | Intelligibility (%): Spontaneous | Rate (WPM): TOCS+ | Rate (WPM): Spontaneous | Rate (WPM): Spontaneous adjusted |
|---|---|---|---|---|---|
| 1 | 37.9 | 52.0 | 59.9 | 73.4 | 79.8 |
| 2 | 62.7 | 64.7 | 80.5 | 81.2 | 79.2 |
| 3 | 59.1 | 39.9 | 112.8 | 136.0 | 152.9 |
| 4 | 32.9 | 58.9 | 115.1 | 129.7 | 112.0 |
| 5 | 35.1 | 54.8 | 73.2 | 109.7 | 64.6 |
| 6 | 2.0 | 17.3 | 43.1 | 48.8 | 49.7 |
| 7 | 29.9 | 32.4 | 78.5 | 103.6 | 122.3 |
| 8 | 69.6 | 47.6 | 81.5 | 87.8 | 91.1 |
| 9 | 80.6 | 75.8 | 96.6 | 75.1 | 72.1 |
| 10 | 66.0 | 79.8 | 96.0 | 108.8 | 97.7 |
| 11 | 61.7 | 57.2 | 109.1 | 92.0 | 104.7 |
| 12 | 87.3 | 75.5 | 115.6 | 93.5 | 90.0 |
| 13 | 45.0 | 59.0 | 116.4 | 124.1 | 120.6 |
| 14 | 13.7 | 18.6 | 65.1 | 56.2 | 66.8 |
| 15 | 85.2 | 75.0 | 126.9 | 131.9 | 131.3 |
| 16 | 17.7 | 35.3 | 95.4 | 105.8 | 92.7 |
| 17 | 28.6 | 45.1 | 83.4 | 78.4 | 79.3 |
| 18 | 25.2 | 40.9 | 99.7 | 110.1 | 110.6 |
| 19 | 76.9 | 64.1 | 90.9 | 64.7 | 63.9 |
| 20 | 13.0 | 23.8 | 63.9 | 71.3 | 57.8 |

WPM, words per minute; spontaneous adjusted rate was calculated using only those utterances from the spontaneous sample that contained at or below the number of words of the child’s longest TOCS+ items.

Recording procedures

All recording sessions took place in a sound booth. Recordings for both conditions were obtained using a head-mounted microphone (Shure WH20) connected to an Audio Buddy preamplifier. The output cable from the preamplifier was routed through the wall of the sound booth and then to the microphone input on the computer’s inboard audio card.

Each child was administered a unique form of the TOCS+ sentence measure. The examiner selected the longest item length in number of words to appear in the form based on the child’s estimated MLU in words from parent report. Parents were asked to estimate the average number of words (whole numbers) the child used in an utterance when speaking at home. The child was also engaged in an introductory conversation at the beginning of the session to obtain a short spoken language sample to augment the parent’s estimate of MLU in making a decision about the longest item length for the child’s TOCS+ sentence items. It would have been preferable to determine each child’s MLU in words directly from the spontaneous sample condition. However, time did not allow this as the conversational sample was recorded at the same session as the TOCS+ imitated sentence sample. Table I reports the longest item in number of words for the TOCS+ sentence imitative task (chosen based on parent estimate of MLU in words) and the MLU and longest utterance in words for the conversational sample for each child. It is evident that in some cases the longest utterance on the TOCS+ task (and parent estimate of MLU in words) was greater than the child’s MLU in the conversational sample, but in all cases the child’s longest utterance in the conversational sample was greater than in the TOCS+. This provides some assurance that the TOCS+ task was at an appropriate language level.

The examiner used the TOCS+ software to present each item to elicit the target utterances from the child. The auditory model for each stimulus item was presented in sound field and a representative colour photograph was displayed on the computer monitor. The audio stimulus files were played from the computer connected to a Technics Stereo Integrated Amplifier (model SU-V460) (both located outside of the sound booth), which was connected to two ElectroVoice S-40 compact monitor speakers located in the sound booth. Children were seated equidistant (~3 feet) between the speakers. TOCS+ software has an internal setting that controls audio playback volume. Throughout the study this was set to 50% of maximum on the computer audio control panel. The volume of the amplifier was set to correspond to a sound level meter reading peaking between 50–65 dBA for a sample recording of the sentence stimuli when the sound level meter was positioned 12 inches in front of the left speaker. Based on human calibration (young adults with normal hearing), this was judged to be a comfortable loudness level for presentation of the stimuli when seated in the child’s chair. During administration of the TOCS+ practice items, the child was asked if the words were too soft and the amplifier setting was adjusted slightly, depending on the child’s response. A demonstration of the TOCS+ intelligibility software can be found at www.tocs.plus.ualberta.ca. The software records each of the child’s utterances directly to a computer as a separate .wav file (sampling rate = 48 kHz, quantization size = 16 bits). On average, recording of the TOCS+ sentence task took ~10 minutes per child, including instructions.

At the same session, a 15-minute conversational speech sample was recorded using Shriberg and Kwiatkowski’s (1985) procedure of parallel and interactive play scenarios. Children who could not manipulate the play materials were encouraged to talk about favourite activities, using open-ended questions. In cases where this did not elicit a sufficient sample, picture description tasks were included. The sample was audio-recorded directly to computer using the TOCS+ Recorder-Player (TOCS+ RP) software (Hodge, Gotzke, & Daniels, 2006). TOCS+ RP software uses the same digital recording platform as the TOCS+ intelligibility software. The conversational speech sample was also video recorded to provide additional contextual information when glossing the child’s utterances.

Preparation of the recordings for analysis

Keys were created for scoring each sample type. For the TOCS+ items, a research assistant checked each target utterance in the transcript generated by the TOCS+ software against what the child actually said. This was verified by playing and observing the audio waveform of the child’s recording using Adobe Audition (1992) audio editing software. Changes were made as needed to match the transcript to the child’s utterances.
The final key was the corrected TOCS+ transcript. On average, these procedures took ~15 minutes per TOCS+ transcript.

For the conversational samples, research assistants used Adobe Audition software to display and play the waveform of each child’s recording to create an orthographic transcription of the sample. The orthographic transcription was verified by a second research assistant who parsed the recording into separate utterance files beginning after the first minute of the sample had elapsed and continuing until the word count reached 100 consecutive words. Choice of analysis of a 100-word conversational sample followed the procedures of Gordon-Brannan and Hodson (2000), who investigated relationships between 100-word self-generated connected speech samples and a set of imitated sentences in their study of severity measures of pre-kindergarten children’s speech. The procedures described by Flipsen (2006) were used to group unintelligible syllables into words during orthographic transcription (Intelligibility Index-original). The last complete utterance containing the hundredth word was included. Therefore, most of the children had slightly more than 100 words in the sample judged (range of 100–108). Exact repetitions, sound effects and fillers (e.g. um, ah) were not counted as words (Shriberg, Kwiatkowski, & Rasmussen, 1990). Research assistants consulted the video recording of the speech sample and the second author to confirm their transcription. The verified orthographic transcription of the parsed utterances provided a key for the intelligibility measure. The utterances to be judged were saved as separate .wav files using Adobe Audition and a play list of the audio files was created using the TOCS+ RP software to present the utterances for judging. The number of utterances judged in the children’s conversational samples ranged from 16–40. On average, it took ~2 hours to create a key for a child’s conversational sample and prepare the utterance play list for judging.

Intelligibility measure

Listeners transcribed orthographically the words perceived from the child’s recordings for the two sample conditions. Three unique listeners who were not familiar with the child’s speech judged each set of recordings (TOCS+ sentences; conversational sample) for each child so that a total of 120 listeners participated in the study (20 children × 2 sample conditions × 3 listeners per condition). Listeners were between 18 and 38 years of age, reported Canadian English as a first language, passed a standard hearing screening and were enrolled in or graduates of a university degree program. Of the three listeners who judged a child’s sample in a given condition, two were enrolled in a speech-language pathology training program and one was enrolled in or a graduate of a different program of study.
The same procedure and set-up that was used to control volume playback of the auditory stimuli to the children during TOCS+ administration was used to control volume playback of the children’s utterances to the listeners when judging both the TOCS+ and conversational samples. Listeners sat in the same location in the sound booth as the children relative to the computer and speakers. Listeners heard each item no more than twice. The judging component of the TOCS+ intelligibility software was used to administer the listening task and collect and save listeners’ responses for the TOCS+ sentence item recordings. The TOCS+ RP software, which uses the same format as the TOCS+ intelligibility software, was used to administer and collect listeners’ responses for the conversational sample recordings. TOCS+ sentence item recordings were played in random order. Conversational sample utterance recordings were played in the order that they occurred in the transcript. The percentage of words in each listener’s responses that matched the words in the respective key was determined for each child’s TOCS+ and conversational sample items. The mean of the three listeners’ results served as the intelligibility score for each child for each sample type. Three listeners’ responses were used to determine a child’s scores to provide a more stable measure of intelligibility (Yorkston & Beukelman, 1981). Inter-rater reliability for the 20 sets of three listeners’ scores for each sample type was estimated using an intra-class correlation coefficient (ICC). ICCs (Type 1, 3) of 0.98 and 0.95 were obtained, respectively, for the TOCS+ and conversational samples.

Speaking rate measure

The duration from onset to offset of each child’s production of each item on the TOCS+ sentence test and the utterances in the 100-word sample was measured using Adobe Audition to obtain WPM scores. WPM was calculated by tallying the number of words spoken in all utterances in a sample condition, based on the verified keys, divided by the sum in minutes of the durations of all utterances in the sample condition. To provide an estimate of measurement error, five imitated samples and five conversational samples were selected randomly to have the utterances measured a second time by a different researcher, who was blind to the original measures. For the five TOCS+ samples, the mean absolute difference was 1.8 WPM. For the five conversational samples, the mean absolute difference was 4.8 WPM.
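The speaking rate calculation and the measurement error estimate described above can be sketched in Python as follows. The function names and the input format (word count and duration in seconds for each utterance) are assumptions for illustration; in the study the durations were measured by hand in Adobe Audition.

```python
def words_per_minute(utterances):
    """Speaking rate for one sample condition.

    utterances: iterable of (word_count, duration_in_seconds) pairs, where
    the word counts come from the verified key (hypothetical input format).
    """
    utterances = list(utterances)
    total_words = sum(words for words, _ in utterances)
    total_minutes = sum(seconds for _, seconds in utterances) / 60.0
    return total_words / total_minutes


def mean_absolute_difference(first_pass, second_pass):
    """Mean absolute difference between two sets of WPM measurements."""
    differences = [abs(a - b) for a, b in zip(first_pass, second_pass)]
    return sum(differences) / len(differences)
```

For example, three utterances of 4, 6 and 10 words lasting 2, 3 and 5 seconds give 20 words in 10 seconds, or 120 WPM. Because only utterance durations are summed, pauses between utterances do not enter the rate. The same function could be applied to a length-restricted subset of utterances by filtering on word count first.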

Results

Mean intelligibility scores were 46.5% (SD = 26.4%; Range = 2.0–87.3) for the 80-word TOCS+ imitated sample and 50.9% (SD = 19.1%; Range = 17.3–79.8) for the 100-word conversational sample. The result of a paired samples t-test comparing intelligibility scores between sample conditions was not statistically significant (t(1, 19) = −1.40, p = 0.18). Mean rates in WPM were 90.2 (SD = 22.3; Range = 43.1–126.9) for the TOCS+ sample and 94.1 (SD = 25.6; Range = 48.8–136.0) for the 100-word conversational sample. There was not a significant difference in WPM scores between sample conditions (t(1, 19) = −1.06, p = 0.30). Strong, positive correlations were found between the two sample conditions for intelligibility (r = 0.86; p < 0.001) and WPM (r = 0.77; p < 0.001).

Table II reports the intelligibility and rate measures for the two speaking conditions for the 20 children. As can be seen in Table II, the 11 children with TOCS+ intelligibility scores < 50% had conversational intelligibility scores that were higher than their TOCS+ scores and the four children with TOCS+ intelligibility scores > 70% had conversational intelligibility scores that were lower than their TOCS+ scores. Of the five children with TOCS+ intelligibility scores > 50% and < 70%, three had lower intelligibility scores for the conversational sample and two had higher scores for the conversational sample. Of the 20 children, 12 had faster rates in the conversational sample and two had essentially the same rates in both sample conditions.

To investigate the possibility that differences in utterance lengths in the two sample conditions for a given child might account for some of the differences observed in speaking rate, a post-hoc analysis was conducted to compare WPM scores using only utterances from the conversational sample that were at or below the length in words of the longest utterances in the TOCS+ imitated sample. The adjusted rate scores for the conversational sample are reported in Table II. The mean WPM score obtained for this sub-set of utterances from the conversational samples was 91.9 (SD = 27.1; Range = 49.7–152.9). This value is within 2 WPM of the mean rate for the TOCS+ samples and marginally slower than the group mean for the full 100-word conversational sample. The Pearson product moment correlation between WPM scores in the TOCS+ and adjusted length conversational sample utterances was 0.74 (p < 0.001), very similar to that for the TOCS+ and full set of utterances in the conversational samples.

Three additional post-hoc analyses were conducted. One examined the relationship between intelligibility and rate scores in the two sample conditions, the second examined the relationship between intelligibility scores and MLU in the conversational sample condition and the third examined the relationship of age to the intelligibility and rate scores in the two speaking conditions. Figure 1 shows intelligibility scores plotted against rate scores for the 20 children for the TOCS+ and conversational samples. A Pearson product moment correlation coefficient of 0.54 (p = 0.013) was found between the intelligibility and rate scores for the imitated sample.
As can be seen in Figure 1, the child with the lowest rate also had the lowest intelligibility score and the child with the fastest rate had the second highest intelligibility score on the TOCS+. However, as is evident in Table II, children with similar rates (e.g. 100–120 WPM) had a range of intelligibility scores and children with similar intelligibility scores (e.g. 30–40%; 60–70%) had a range of speaking rates on the TOCS+. A non-significant correlation coefficient (r = 0.37; p = 0.11) was found between intelligibility and rate scores for the conversational sample. As can be seen in Figure 1 for the conversational sample condition, the two children with the lowest intelligibility scores also had the lowest rates; however, there was no discernible relationship between intelligibility and rate for the rest of the children’s scores.

The mean MLU for the children’s conversational samples was 4.1 words (SD = 1.13, Range = 2.1–6.8). A significant correlation coefficient (r = 0.58, p = 0.007) was found between intelligibility scores and MLU for the conversational samples. As can be seen from examining Tables I and II, the child with the shortest MLU (child 6) also had the lowest intelligibility and the child with the longest MLU (child 10) had the highest intelligibility. However, there was a large amount of scatter in the scores of the children, with those with an MLU between 3.5 and 4.5 words exhibiting a broad range of intelligibility scores. No significant correlations were found between age and intelligibility or rate scores for the TOCS+ samples or between age and intelligibility scores, rate scores or MLU for the conversational samples.

Figure 1. Intelligibility (%) and rate (WPM) scores for the TOCS+ and conversational sample conditions plotted by child.
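The length-matched rate comparison and the Pearson correlations reported above can be illustrated with a minimal sketch. This is not the authors' analysis code: the `Utterance` record, the function names and the toy values are assumptions introduced here, and rate is assumed to be total words divided by the summed utterance durations.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Utterance:
    """Hypothetical record for one utterance (not a TOCS+ data format)."""
    n_words: int
    seconds: float


def words_per_minute(utterances):
    """Total words divided by total utterance duration in minutes."""
    total_words = sum(u.n_words for u in utterances)
    total_minutes = sum(u.seconds for u in utterances) / 60.0
    return total_words / total_minutes


def length_matched(conversational, imitated):
    """Keep conversational utterances no longer (in words) than the longest imitated one."""
    longest = max(u.n_words for u in imitated)
    return [u for u in conversational if u.n_words <= longest]


def pearson_r(xs, ys):
    """Pearson product moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values for one child, invented for illustration only.
imitated = [Utterance(2, 1.5), Utterance(7, 4.5)]
conversational = [Utterance(3, 2.0), Utterance(9, 5.0), Utterance(6, 4.0)]

matched = length_matched(conversational, imitated)
print(words_per_minute(imitated))        # rate for the imitated sample
print(words_per_minute(conversational))  # rate for the full conversational sample
print(words_per_minute(matched))         # rate for the length-matched subset
```

In a group analysis, `pearson_r` would be applied to the per-child lists of TOCS+ and conversational scores; the significance tests reported in the text are outside the scope of this sketch.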

Discussion and conclusions

Comparison with previous studies of children’s speech intelligibility

Results for the 20 children demonstrate that, as a group, their intelligibility and rate scores were very similar in the TOCS+ imitative sentence and conversational sample conditions, with mean intelligibility scores being slightly higher (4.4%) and mean rates being slightly faster (3.9 WPM) in the conversational samples. These results support the criterion-related validity of the TOCS+ sentence measure. Gordon-Brannan and Hodson (2000) also found slightly higher mean intelligibility scores in spontaneous connected speech (76%) than in imitated sentences (74%) in their sample of 48 children aged 4–5 years. No child in their study had known neuromotor involvement that might affect speech production and all children had age-appropriate receptive language. For the imitated sentence condition the children repeated 10 five-word simple, declarative, present tense sentences that corresponded with activities shown in a picture and sampled later developing consonant sounds. Children in the “adult-like” sub-group obtained high and similar mean scores in the two conditions (96% and 95%, respectively), while the mean scores of the children with the most severe phonological impairment were 48% and 44%, respectively, for the spontaneous and imitated samples. The mean scores for this latter group are similar to the scores for the children in the current study and to those reported by Hustad et al. (2012) for 4-year-old children with CP and age-typical language for imitation of TOCS+ sentence stimuli ranging from 2–7 words in length. However, the intelligibility scores reported by Hustad et al. are not directly comparable with those of the current study because they did not administer the TOCS+ sentence measure as it was designed. Rather, they presented sets of sentences at all possible item lengths for a child to imitate.

Pennington et al. (2013) reported intelligibility scores based on word identification of connected speech (combination of imitated and spontaneous utterances) for 15 children aged 5–11 years (mean age = 8 years 2 months) with CP who underwent an experimental speech therapy procedure. The mean score for the children’s connected speech intelligibility measure pre-treatment, based on unfamiliar listeners, was 41.2% (SD = 26.1), which is slightly lower than, but comparable to, the results reported for the current study.

There was a strong correlation between intelligibility scores in the TOCS+ and conversational conditions (r = 0.86), supporting the criterion-related validity of the TOCS+ sentence measure. This value is very similar to that reported by Gordon-Brannan and Hodson (2000) (r = 0.85) for a group of 48 children aged 4–5 years with a range of phonological proficiency from severe to no discernible impairment. In the current study, intelligibility scores obtained using TOCS+ had greater variability compared with the conversational sample condition, suggesting that it provided a more sensitive measure of intelligibility in connected speech. One factor that may account for this is that content words are not allowed to repeat in a TOCS+ sentence test.
Therefore, the TOCS+ sentence sample might be expected to contain a larger number of different content words than a conversational sample where children select their own vocabulary content and may use the same word several times in a sample.

The finding that the imitated sample under-estimated the conversational sample intelligibility scores in all cases for children who scored below 50% intelligibility on the TOCS+ and over-estimated the conversational sample intelligibility score in all cases for children who scored above 70% on TOCS+ was not expected. It is hypothesized that the effect of related utterances and repeated content words in the conversational sample, compared with unrelated utterances and no repeated content words in the imitated sample, advantaged children with lower intelligibility (and a tendency for lower MLU). Conversely, it is hypothesized that the decreased planning (cognitive-linguistic-motor) load of the imitated sample advantaged the children with the highest intelligibility scores. These children also tended to have higher MLUs in the conversational samples. As can be seen in Table I, children with relatively high MLUs produced some relatively long utterances in their conversational samples, compared to the TOCS+. These longer, self-formulated utterances might be expected to increase planning load, resulting in lower-than-expected conversational intelligibility. Future research that examines the effects of cognitive and linguistic variables on measures of children’s speech ability in imitated and self-formulated speech and that controls for severity of speech and any accompanying language disorders appears warranted and has relevance for both speech assessment and therapy.

Speaking rate

No comparable data sets for measures of speaking rate for children with CP were located in a recent literature search other than the studies by Nip (2013) and Hodge and Gotzke (2010). TOCS+ uses words per minute as a unit of rate, following the conventions of Yorkston et al. (1996) for measuring speaking rate in adults with dysarthria. However, many investigations of children’s speaking rate have used articulatory rate as a measure, which eliminates pauses from the child’s utterances. DuHadway and Hustad (2013) reported slower articulatory rates in syllables per second for 19 children between 2 and 3 years of age with cerebral palsy compared with five children without CP, but this was based on single word productions. Nip (2013) also reported his results in syllables per second, with pauses of greater than 100 milliseconds removed for two of four subjects with cerebral palsy. Walker and Archibald (2006) reported articulation rates in syllables per second for 16 children reported to have typically-developing speech and who were studied longitudinally between 3 and 5 years in several contexts including spontaneous speech and imitated sentences. As with the children in the current study, mean group scores for the spontaneous sample were higher than for the imitated sentence condition, but this result was not significant. The typically-developing children had considerable individual differences in rate like the children with CP in the current study. Walker and Archibald reported no gender differences or positive correlations between articulation rate and utterance length.
The correlation between rate measures in the two sample conditions in the current study was relatively strong (r = 0.76) and the difference in group means between the two conditions was small (3.9 WPM), supporting the criterion-related validity of the TOCS+ WPM measure. However, for individual children, the difference in rate scores between the two sample types varied from essentially no difference to 36 WPM. A post-hoc analysis restricted to utterances of comparable lengths in the two sample types reduced the difference between the group means to less than 2 WPM. A limitation of the rate measure used in the study is that within-word pauses were included so it cannot be determined if articulatory rate (e.g. mean time to say each word in the sample), within utterance pause time or both accounted for the differences observed between samples. It might be expected that there would be fewer pauses in the imitated samples because the language formulation demands are low compared to the conversational sample condition.


However, the majority of children had faster speaking rates in the conversational sample condition. Another possibility is that some children used the model of the utterance that is provided by the software to pace their own production, which might be slower than their typical rate in connected speech. A follow-up study is planned to address these hypotheses.
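The distinction drawn above between a speaking rate that includes pauses and an articulation rate that excludes them can be made concrete with a short sketch. It is illustrative only: the function names and example values are hypothetical, both rates are expressed in words per minute for comparability (the cited studies report articulation rate in syllables per second), and the list of pause durations is assumed to have been measured separately.

```python
def speaking_rate_wpm(n_words, total_seconds):
    """Words per minute over the whole sample, pauses included."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return n_words / (total_seconds / 60.0)


def articulation_rate_wpm(n_words, total_seconds, pause_seconds):
    """Words per minute after subtracting measured pause time."""
    speech_seconds = total_seconds - sum(pause_seconds)
    if speech_seconds <= 0:
        raise ValueError("pause time must be shorter than total time")
    return n_words / (speech_seconds / 60.0)


# Invented example: 80 words in 60 s, of which 12 s are pauses.
n_words = 80
total_seconds = 60.0
pauses = [5.0, 7.0]

overall = speaking_rate_wpm(n_words, total_seconds)
without_pauses = articulation_rate_wpm(n_words, total_seconds, pauses)
print(overall, without_pauses)
```

Two samples with the same speaking rate can therefore differ in articulation rate, pause time or both, which is why a WPM score alone cannot separate these sources of difference.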


Relation between intelligibility and rate scores

The positive relationships found between the intelligibility and rate scores for the two samples are also of interest. A contributing factor to this may be the positive relationship that was found between intelligibility and MLU in the conversational samples. Children with higher MLUs are more likely to have greater experience using spoken language and, therefore, be more competent speakers than children with shorter MLUs. Greater speech competence would be associated with a faster speaking rate and higher intelligibility. The relationship between intelligibility and rate was significant for the TOCS+ imitated sample but not the conversational sample. It is apparent from Figure 1 that the TOCS+ imitated sentence sample provides a left-to-right ordering of children with a larger range and more linear distribution than the conversational sample, when intelligibility scores are plotted against rate scores. A potential advantage of the TOCS+ imitative sampling procedures over the conversational sampling procedures is that, when both rate and intelligibility scores are taken into consideration, the imitated sentence sample provides a more discriminating measure of severity of speech disorder for children with CP and speech impairment than intelligibility scores alone. Yorkston and Beukelman (1981) reported that intelligible words per minute or IWPM (a measure that combines intelligibility and rate) provides a more sensitive measure of disorder severity than intelligibility scores alone for adults with acquired dysarthria. Based on the results of this study, further investigation of variables that influence speaking rate in the connected speech of children with CP appears warranted, which also has relevance for speech assessment and therapy.
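As an illustration of how a combined measure can separate speakers whose intelligibility scores are identical, the sketch below computes IWPM as the number of correctly identified words divided by the sample duration in minutes. This formulation, the function names and the example values are assumptions made for illustration; they are not taken from the TOCS+ software or from the scoring procedures of this study.

```python
def intelligibility_percent(words_identified, words_total):
    """Percentage of words in the sample that listeners identified correctly."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_identified / words_total


def words_per_minute(words_total, duration_seconds):
    """Speaking rate over the whole sample."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return words_total / (duration_seconds / 60.0)


def iwpm(words_identified, duration_seconds):
    """Intelligible words per minute: identified words over duration in minutes."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return words_identified / (duration_seconds / 60.0)


# Two invented speakers: each has 40 of 80 words identified (50%),
# but speaker B takes twice as long to produce the sample.
speaker_a = {"identified": 40, "total": 80, "seconds": 40.0}
speaker_b = {"identified": 40, "total": 80, "seconds": 80.0}

for name, s in (("A", speaker_a), ("B", speaker_b)):
    print(
        name,
        intelligibility_percent(s["identified"], s["total"]),
        words_per_minute(s["total"], s["seconds"]),
        iwpm(s["identified"], s["seconds"]),
    )
```

Under these assumptions IWPM equals WPM multiplied by the proportion of words identified, so the slower speaker receives a lower combined score even though the two intelligibility percentages match.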
Use of the TOCS+ sentence measure for children with CP

In this study, TOCS+ provided a more time efficient procedure for obtaining intelligibility and rate scores for connected speech compared to a conversational sample. On average it took approximately 15 minutes to check a child’s transcript key for the TOCS+ compared to 2 hours for the conversational sample. In addition, separate .wav files had to be made from the conversational sample recording and a play list had to be prepared for playback. Despite their range of ages, all children appeared to enjoy the TOCS+ administration and engaged in the sentence imitation task. In the conversational sample condition, some children spoke readily but others, especially those with lower intelligibility, were more reluctant to speak, and this reluctance increased when the examiner did not understand what they had said. A more structured format with additional context for eliciting a spontaneous sample, such as a story retell task, or a set of simple questions (Pennington et al., 2013) may have been a better choice for eliciting a spontaneous connected speech sample.

In summary, this study demonstrated that TOCS+ software helps to standardize and expedite obtaining a signal dependent measure of intelligibility of connected speech for children with CP who are able to perform the task. The TOCS+ imitated sentence task is more efficient and controls for linguistic factors, such as number of repeated content words and utterance length, that can affect intelligibility and rate measures, compared to the conversational sample condition used in this study. TOCS+ also has acceptable criterion-related validity when compared to children’s spontaneously generated speech, based on the results for the 20 children with CP and range of severity of speech disorder who participated in this study. A disadvantage is that it is based on imitated utterances rather than self-generated spoken language for measuring intelligibility. As shown in this study, the TOCS+ sentence measure may under- or over-estimate a child’s conversational speech intelligibility and speaking rate. This is not unexpected, given the differences in these two speaking conditions. Miller (2013) suggested that the advantages of signal dependent measures of intelligibility be complemented by methods that capture more signal independent aspects of intelligibility, such as the set of “Intelligibility in Context” questions rated on a 5-point scale described by McLeod, Harrison, and McCormack (2012).
A next step to further evaluate the validity of the TOCS+ is to report how intelligibility and rate scores (including IWPM) for children with CP compare with those for a group of young children with typically-developing speech and a group of children with speech sound disorders of unknown origin, without dysarthria.

Acknowledgements

The authors want to thank the children, their families and the listeners who participated in the project, and Leslie Wellman for assistance with data collection and management.

Declaration of interest: This work was supported by funding from the Canadian Language and Literacy Research Network (Project Reference Number 27013000) and the University of Alberta. TEC Edmonton, the technology transfer office at the University of Alberta, holds the copyright for and licenses the Test of Children’s Speech software (TOCS+). Revenue generated from the license royalties is paid to the University of Alberta. The authors alone are responsible for the content and writing of the paper.


References

Alberta College of Speech-Language Pathologists and Audiologists. (2008). Hearing screening guidelines. Edmonton, AB: http://www.acslpa.ab.ca/public/data/documents/ACFC3D5.pdf
Adobe Audition (Version 1.5) [Software]. (1992). San Jose, CA: Adobe Systems.
Bax, M., Goldstein, M., Rosenbaum, P., Leviton, A., & Paneth, N. (2005). Proposed definition and classification of cerebral palsy. Developmental Medicine & Child Neurology, 47, 571–576.
DuHadway, C., & Hustad, K. (2013). Contributors to intelligibility in preschool-aged children with cerebral palsy. Journal of Medical Speech-Language Pathology, 20, 59–64.
Flipsen, P. (2006). Measuring the intelligibility of conversational speech in children. Clinical Linguistics and Phonetics, 20, 303–312.
Flipsen, P., Jr., Hammer, J. B., & Yost, K. M. (2005). Measuring severity of involvement in speech delay: Segmental and whole-word measures. American Journal of Speech-Language Pathology, 14, 298–312.
Gordon-Brannan, M., & Hodson, B. (2000). Intelligibility/severity measurements of prekindergarten children’s speech. American Journal of Speech-Language Pathology, 9, 141–150.
Hodge, M. (2010). Intervention for developmental dysarthria. In R. McCauley, & S. McLeod (Eds.), Treatment of speech sound disorders in children (pp. 557–578). Baltimore, MD: Brookes Publishing.
Hodge, M., & Wellman, L. (1999). Management of children with dysarthria. In A. Caruso, & E. Strand (Eds.), Clinical management of motor speech disorders in children (pp. 209–80). New York, NY: Thieme.
Hodge, M., & Whitehill, T. (2010). Intelligibility impairments. In J. S. Damico, M. J. Ball, & N. Müller (Eds.), Handbook of language and speech disorders (pp. 99–114). Oxford: Wiley-Blackwell.
Hodge, M., Daniels, J., & Gotzke, C. L. (2009). TOCS+ Intelligibility Measures (Version 5.3) [computer software]. Edmonton, Canada: University of Alberta.
Hodge, M., Gotzke, C. L., & Daniels, J. (2006). TOCS+ Recorder-Player (TOCS+ RP) [computer software]. Edmonton, Canada: University of Alberta.
Hodge, M. M., & Gotzke, C. L. (2010). Stability of intelligibility measures for children with dysarthria. Journal of Medical Speech-Language Pathology, 18, 61–65.
Hustad, K. C. (2006). Estimating the intelligibility of speakers with dysarthria. Folia Phoniatrica et Logopaedica, 58, 217–228.
Hustad, K. C., Schueler, B., Schultz, L., & DuHadway, C. (2012). Intelligibility of 4-year-old children with and without cerebral palsy. Journal of Speech, Language, and Hearing Research, 55, 1177–1189.
Johnson, C. A., Weston, A. D., & Bain, B. A. (2004). An objective and time-efficient method for determining severity of childhood speech delay. American Journal of Speech-Language Pathology, 13, 55–65.
Kent, R. D., Miolo, G., & Bloedel, S. (1994). The intelligibility of children’s speech: A review of evaluation procedures. American Journal of Speech-Language Pathology, 3, 81–95.
Kent, R. D., Weismer, G., Kent, J. F., & Rosenbek, J. C. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482–499.
Leadholm, B., & Miller, J. (1992). Language sample analysis: The Wisconsin guide. Madison, WI: Wisconsin Department of Public Instruction.
Levy, E. S., Ramig, L. O., & Camarata, S. M. (2013). The effects of two speech interventions on speech function in pediatric dysarthria. Journal of Medical Speech-Language Pathology, 20, 82–87.
Loban, W. (1976). Language development: Kindergarten through grade twelve. Urbana, IL: National Council of Teachers of English.
McLeod, S., Harrison, L., & McCormack, J. (2012). The Intelligibility in Context Scale: Reliability and validity of a subjective rating measure. Journal of Speech, Language, and Hearing Research, 55, 648–656.
Miller, N. (2013). Measuring up to speech intelligibility. International Journal of Language and Communication Disorders, 48, 601–612.
Miller, J., & Chapman, R. (1986). Systematic Analysis of Language Transcripts (SALT). Madison, WI: University of Wisconsin-Madison.
Msall, M., & Park, J. (2009). The changing panorama of preschool disability: Biomedical and social risks. In M. Shevell (Ed.), Neurodevelopmental disabilities: clinical and scientific foundations (pp. 27–42). London: Mac Keith Press.
Nip, I. (2013). Kinematic characteristics of speaking rate in individuals with cerebral palsy: A preliminary study. Journal of Medical Speech-Language Pathology, 20, 88–94.
Palisano, R., Rosenbaum, P., Bartlett, D., & Livingston, M. (2007). Gross Motor Function Classification System Expanded and Revised. Hamilton, ON, Canada: CanChild Centre for Childhood Disability Research, McMaster University.
Pennington, L., Roelant, E., Thompson, V., Robson, S., Steen, N., & Miller, S. (2013). Intensive dysarthria therapy for younger children with cerebral palsy. Developmental Medicine and Child Neurology, 55, 464–471.
Reznick, J., & Goldsmith, L. (1989). A multiple form word production checklist for assessing early language. Journal of Child Language, 16, 91–100.
Shriberg, L. D., & Kwiatkowski, J. (1985). Continuous speech sampling for phonologic analyses of speech-delayed children. Journal of Speech and Hearing Disorders, 50, 323–334.
Shriberg, L. D., Kwiatkowski, J., & Rasmussen, C. (1990). Prosody-Voice Screening Profile (PVSP): Scoring forms and training manual. Tucson, AZ: Communication Skill Builders.
Stemach, G., & Williams, W. (1988). Word express: The first 2500 words of spoken English. Novato, CA: Academic Therapy Publications.
Tager-Flusberg, H. (1993). Putting words together: Morphology and syntax in the preschool years. In J. Berko-Gleason (Ed.), The development of language (pp. 162–196). New York, NY: Macmillan Publishing.
Walker, J., & Archibald, L. (2006). Articulation rate in preschool children: A 3-year study. International Journal of Language and Communication Disorders, 41, 541–565.
Webb, W., & Adler, R. (2008). Neurology for the speech-language pathologist (5th ed.). St. Louis, MO: Mosby, Inc.
Wepman, J., & Lozar, B. (1973). The most frequently used words of spoken English. Journal of Psycholinguistic Research, 2, 129–136.
Whitehill, T. (2002). Assessing intelligibility in speakers with cleft palate: A critical review of the literature. Cleft Palate-Craniofacial Journal, 39, 50–58.
Yorkston, K. M., & Beukelman, D. R. (1981). Assessment of Intelligibility of Dysarthric Speech. Austin, TX: Pro-Ed.
Yorkston, K., Beukelman, D., & Hakel, M. (1996). Speech Intelligibility Test for Windows. [computer software]. Lincoln, NE: Communications Disorders Software Institute for Rehabilitation Science and Engineering at Madonna Rehabilitation Hospital.
