International Journal of Audiology 2014; 53: 719–729
Original Article
Adaptation to nonlinear frequency compression in normal-hearing adults: A comparison of training approaches
Ann-Marie Dickinson*, Richard Baker*, Catherine Siciliano* & Kevin J. Munro*,†
*School of Psychological Sciences, the University of Manchester, UK, and †Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
Abstract
Objective: To identify which training approach, if any, is most effective for improving perception of frequency-compressed speech. Design: A between-subject design using repeated measures. Study sample: Forty young adults with normal hearing were randomly allocated to one of four groups: a training group (sentence or consonant) or a control group (passive exposure or test-only). Test and training material differed in terms of material and speaker. Results: On average, sentence training and passive exposure led to significantly improved sentence recognition (11.0% and 11.7%, respectively) compared with the consonant training group (2.5%) and test-only group (0.4%), whilst consonant training led to significantly improved consonant recognition (8.8%) compared with the sentence training group (1.9%), passive exposure group (2.8%), and test-only group (0.8%). Conclusions: Sentence training led to improved sentence recognition, whilst consonant training led to improved consonant recognition. This suggests learning transferred across speakers and materials but not across stimulus type (sentences versus consonants). Passive exposure to sentence material led to an improvement in sentence recognition that was equivalent to the gains from active training. This suggests that it may be possible to adapt passively to frequency-compressed speech.
Key Words: Auditory training; analytic; synthetic; passive exposure; frequency lowering
Frequency-lowering hearing aids have been available for several decades. Benefit from this technology has traditionally been reported as highly variable (Braida et al, 1979). New digital signal processing technology has led to a renewed interest in this type of signal processing, and several different frequency-lowering algorithms are now commercially available, e.g. Audibility Extender by Widex (Korhonen & Kuk, 2008); SoundRecover or non-linear frequency compression (NLFC) by Phonak AG (Simpson et al, 2005); Digital Frequency Compression by AVR Sonovation (Davis, 2001); and Spectral IQ by Starkey (Galster et al, 2011). Traditional hearing aids provide users with real-ear gain at frequencies up to around 5 kHz (Dillon, 2001) and, despite recent advances in receiver technology, hearing-aid gain still remains poor above 6 kHz (McDermott, 2010). Laboratory studies using filtered speech have shown that an extended bandwidth, up to 9 kHz, can lead to improved speech perception and production in hearing-impaired adults and children (Stelmachowicz et al, 2001, 2004; Ricketts et al, 2008). These findings suggest that offering hearing-aid users an extended
bandwidth up to 9 kHz could lead to improved speech recognition. Non-linear frequency compression (NLFC) aims to provide an extended input frequency bandwidth to hearing-aid users by shifting sounds above a pre-defined ‘cut-off’ frequency downwards by a pre-defined ‘compression ratio’ before presenting them to the listener. This brings high-frequency sounds down into a frequency range which can be delivered by a hearing aid. Additionally, for individuals with limited residual hearing above 2 kHz, NLFC shifts sounds downwards to a frequency where their hearing is better, thereby providing audibility of high-frequency sounds (Simpson et al, 2005). McDermott (2011) performed a technical comparison between NLFC (Phonak) and Audibility Extender (Widex). He found that both schemes resulted in spectral distortion of high-frequency speech sounds. This suggests that a period of active auditory training may be beneficial for facilitating adaptation to the frequency-compressed speech signal, as opposed to the process of passive adaptation to amplification through hearing-aid use, described in the literature as ‘auditory acclimatization’ (Arlinger et al, 1996). This hypothesis has been frequently discussed in the
Correspondence: Ann-Marie Dickinson, School of Psychological Sciences, University of Manchester, Oxford Road, Manchester, M13 9PL, UK. E-mail: Ann-Marie.Dickinson@postgrad.manchester.ac.uk (Received 11 April 2013; accepted 1 May 2014) ISSN 1499-2027 print/ISSN 1708-8186 online © 2014 British Society of Audiology, International Society of Audiology, and Nordic Audiological Society DOI: 10.3109/14992027.2014.921338
720
A.-M. Dickinson et al.
Abbreviations
A-CDT: Automated CDT
CDT: Continuous discourse tracking
NLFC: Nonlinear frequency compression
VCV: Vowel-consonant-vowel
literature but has not yet been tested (Scollie, 2010). Indeed, auditory training has been shown to be useful in facilitating adaptation to other frequency-lowering technologies (Robinson et al, 2007; Kuk et al, 2009; Fullgrabe et al, 2010), although participant numbers have often been small (Fullgrabe et al, 2010) or studies have lacked a control group (Robinson et al, 2007; Kuk et al, 2009). The present study evaluated the role of auditory training in adaptation to frequency compression by asking normal-hearing adults to listen to simulations of NLFC-processed speech. By removing the confounding effects of hearing loss and age, it was anticipated that any training effects would be easier to interpret. There is currently no consensus on the most effective approach to auditory training with hearing-aid users. Synthetic training, or ‘top-down’ training, uses sentences and contextual information in an attempt to mirror everyday listening environments. Analytic training, or ‘bottom-up’ training, focuses on helping individuals to attend to the more fine-grained acoustic-phonetic properties of a signal, with the ultimate aim of relearning these cues in fluent speech (Sweetow & Palmer, 2005). A review paper which aimed to identify the effectiveness of auditory training with hearing-aid users found evidence to support both approaches but concluded that synthetic training may be more effective (Sweetow & Palmer, 2005). Commercial auditory training programs aimed at hearing-aid users tend to opt for a synthetic approach (Sweetow & Henderson Sabes, 2007). However, second-language acquisition training studies have found analytic training to be more effective (Bradlow et al, 1999; Iverson et al, 2005).
Cochlear implant simulation training studies have found word- or sentence-based training to be more effective than phoneme-based training: Stacey and Summerfield (2008) compared sentence, word, and phoneme training with normal-hearing adults listening to spectrally shifted speech and found no improvement in sentence recognition following phoneme training, although training in word and sentence recognition yielded improvements in performance. Fu et al (2005) found word-based training to be more effective than sentence training in helping normal-hearing listeners adapt to severely shifted cochlear implant simulations. It is unclear which type of training material may be more useful for users of frequency-lowering technology. Previous studies investigating auditory training with frequency transposition hearing aids have opted for a bottom-up approach which focuses on high-frequency phonemes (Korhonen & Kuk, 2008; Kuk et al, 2009). Studies evaluating NLFC hearing aids with hearing-impaired individuals have reported mixed outcomes. Simpson et al (2005), Glista et al (2009), and Wolfe et al (2010) found statistically significant improvements in detection thresholds, plural detection, and consonant recognition following NLFC use by adults and children. Conversely, Simpson et al (2006) and McDermott and Henshall (2010) failed to show any significant benefit. Common to both of these latter studies is that participants had profound high-frequency hearing thresholds, so in order to ensure audibility of the compressed signal the strongest compression setting was used. Perhaps, with training, participants in these studies may have
adapted more effectively to the distorted frequency-compressed signal. Indeed, a period of training has been shown to be essential in studies using cochlear implant simulations with normal-hearing listeners (Davis et al, 2005; Fu et al, 2005; Faulkner et al, 2006), especially where the spectral shift is large (Li et al, 2009). This study aimed to identify which type of training (analytic versus synthetic) is more effective at improving recognition of frequency-compressed speech. Two groups of listeners were trained with frequency-compressed speech using either analytic or synthetic speech materials. A passive exposure ‘placebo’ group was also recruited to control for the incidental learning which may occur merely through exposure to the training materials (Amitay et al, 2006). Finally, a test-retest control group was recruited to monitor the effect of repeated testing. It was hypothesized that analytic and synthetic training would improve the perception of frequency-compressed speech, whilst passive exposure would not lead to any improvement. The ultimate aim of this experiment was to help guide the design of a training program suitable for a study involving hearing-impaired listeners who use NLFC hearing aids.
Method
Pilot study
A pilot study (Supplementary Appendix A, available online at http://www.informahealthcare.com/doi/abs/10.3109/14992027.2014.921338) was used to assess the errors in consonant recognition incurred by different severities of NLFC setting. This was done to better understand the effect NLFC has on speech perception and to identify persistent consonant confusions. As expected from the spectral overlap of NLFC setting and consonant cues, place of articulation was most affected by NLFC, with manner affected only at the strongest NLFC setting and voicing unaffected. Common consonant confusions identified in this pilot study were used to select the sentence recognition test material used in the main study. It was hoped this would increase the sensitivity of the test to the effects of training.
Participants
Participants were required to be native speakers of British English and have no known speech or language impairments. People trained in music tend to perform better at auditory tasks (Moore & Amitay, 2007); therefore, participants were excluded from this study if they were formally trained in music. This was defined as achieving a graded examination in any instrument and/or regularly playing music in the last five years. Fifteen male and 25 female normal-hearing adults (mean age 22 years, range 18 to 29 years) were recruited from the student population at the University of Manchester. All had hearing thresholds of ≤ 20 dB HL at octave frequencies from 0.25 to 8 kHz and were naive to the test and training material used in this study. Thirteen more students volunteered but were excluded from the experiment as they failed to meet the inclusion criteria. The forty participants who met the inclusion criteria were randomly allocated to a training group: sentence training (n = 10), consonant training (n = 10), passive exposure (n = 10), or test-retest control (n = 10). The sample size was chosen based on sentence recognition improvements following phoneme and sentence training reported by Stacey and Summerfield (2008). A priori power analysis showed a minimum of eight participants would be required in each independent group for a statistical power of ≥ 80%, at a significance level of 5% on a two-tailed test. A sample size of 10 was chosen to allow
for attrition. Participants were paid £3 per hour for their time and were rewarded with a £10 bonus upon completion of the experiment. Ethical approval was obtained from the University of Manchester Research Ethics Committee.
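The a priori power analysis above can be reproduced approximately with a standard sample-size formula. The effect size the authors assumed is not reported; taking a large effect of d ≈ 1.5 (an illustrative value, plausible given the training gains in Stacey & Summerfield, 2008) reproduces the reported minimum of eight per group. A sketch, not the authors' actual calculation:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample, two-tailed t-test.

    Normal approximation plus Guenther's small-sample correction
    (z_alpha/2 squared over 4) for the t-test.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    n = 2 * ((z_a + z_b) / d) ** 2 + z_a ** 2 / 4
    return math.ceil(n)

# Assuming d = 1.5 (illustrative; the paper does not report the value used):
print(n_per_group(1.5))   # 8 participants per group
```

With smaller assumed effects the required group size grows quickly (d = 1.0 gives 17 per group), which is why the choice of effect size matters for such calculations.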
Speech processing and presentation
All test and training speech materials were processed by Phonak AG in accordance with their patented algorithm for NLFC (Phonak AG, 2009). It was hypothesized that the most severe setting of NLFC would best highlight differences between training approaches, if they existed. This was particularly important for a study with normal-hearing listeners, where a ceiling effect with the processed stimuli was a concern. Therefore, NLFC was applied at the maximum setting (cut-off 1.5 kHz, compression ratio 4:1) but with a gain of 0 dB. All training and testing was done in a sound-treated booth conforming to BS EN ISO 8253-2:2009 (BSI, 2009), with dimensions 4 × 6 metres. All test and training stimuli were presented through a Fostex 6301B3 speaker located at 0 degrees azimuth, 1.5 metres from the participant’s head at eye level. Sounds were presented via an EDIROL UA-25 24-bit, 96-kHz sound card and calibrated using a CEI-254 sound level meter type 2 (IEC 651, BS 5969, ANSI S1.4) placed at the reference test point, which was the centre position of the participant’s head, with the participant absent. All test and training material was presented at 65 dB(A), calibrated using either a reference test tone, if available, or a concatenated test stimuli signal. Reverberation time was measured and found to be low, in accordance with BS EN ISO 3382:2008 (BSI, 2008). The frequency response of the room and speaker was measured using a Bruel and Kjaer sound level meter 2250 on a tripod, and was found to be flat up to 12 kHz.
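The cut-off/compression-ratio scheme can be illustrated with a short sketch. The exact Phonak algorithm is proprietary; the mapping below (compression of distances in the log-frequency domain above the cut-off, consistent with the description in Simpson et al, 2005) is an assumption for illustration only:

```python
def nlfc(f_hz, cutoff_hz=1500.0, ratio=4.0):
    """Illustrative nonlinear frequency compression map.

    Frequencies at or below the cut-off pass through unchanged;
    frequencies above it are compressed in the log-frequency domain
    by the given ratio (a sketch, not Phonak's proprietary algorithm).
    """
    if f_hz <= cutoff_hz:
        return f_hz
    # Log-frequency distances above the cut-off are divided by the ratio:
    # log(f_out / fc) = log(f_in / fc) / ratio
    return cutoff_hz * (f_hz / cutoff_hz) ** (1.0 / ratio)

# With the study's maximum setting (1.5 kHz cut-off, 4:1 ratio),
# a 6-kHz input lands at roughly 2.1 kHz.
print(round(nlfc(6000.0)))
```

Under this assumed mapping, the whole band from 1.5 kHz up to about 10 kHz is squeezed into roughly 1.5–2.4 kHz, which illustrates why the strongest setting produces the spectral crowding of high-frequency consonant cues noted in the pilot study.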
Experimental design
As shown in Figure 1, participants were encouraged to complete the study over five consecutive days. There were three test sessions: (1)
Day 1: screening and familiarity/practice with the outcome measures; (2) Day 2: baseline performance on the outcome measures; (3) Day 5: post-training performance on the outcome measures. A summary of the test sessions and training schedules is shown in Figure 1 and discussed in detail in the sections below. Participants completed the study over a mean period of 6.4 days (SD 2.1). The mean time interval between test sessions 1 and 2 was 1.5 days (SD 0.8); the mean time interval between test sessions 2 and 3 was 3.9 days (SD 2.1). Between sessions 2 and 3, participants were randomly allocated to one of four groups: sentence training, consonant training, passive exposure, or test only. Those in the training and passive exposure groups were asked to attend four one-hour training/exposure sessions in the laboratory (the first and last of these sessions coinciding with test days: see Figure 1). Those in the test-only group were only required to visit for the three test sessions. For the purposes of the present study, perceptual learning was operationally defined as the difference in performance between the post-training and baseline test sessions, compared to the test-only group. To account for test-retest effects, the improvement in score following training was compared to the test-only group’s change in score (rather than zero).
Training procedures
Training was performed under the supervision of the first author. Both sentence and consonant training consisted of a closed-set recognition task, with visual feedback, in quiet and in speech-shaped noise at a fixed signal-to-noise ratio (SNR) of 0 dB. This SNR was chosen as it yielded a 50% score on a word recognition task during a small pilot study, so it was deemed difficult enough to avoid ceiling effects but still maintain interest. For both training programs the same broadband speech-shaped noise was used (spectrum 0.086–12.5 kHz). All participants trained in quiet and in noise for two hours each, giving a total training time of four hours. This training time is comparable to other studies investigating training in normal-hearing adults (Stacey & Summerfield, 2008). Four hours was considered long enough to
[Figure 1 flowchart: Test session 1 (practice, Day 1, 45 minutes): VCV familiarisation (20 unprocessed stimuli in quiet), VCV test (1 run in quiet), IEEE familiarisation (1 list in quiet), IEEE test (2 lists in noise). Test session 2 (pre-training, Day 2, 90 minutes): VCV test (1 run in quiet), IEEE test (2 lists in noise), followed by random allocation to group. Days 2–5: A-CDT sentence training, VCV consonant training, or A-CDT passive exposure (each 2 hours in quiet and 2 hours in noise; 60 minutes on each of Days 3 and 4), or test-only control. Test session 3 (post-training, Day 5, 90 minutes): VCV test (1 run in quiet), IEEE test (2 lists in noise).]
Figure 1. A summary of the test and training sessions. Participants were encouraged to complete the experiment in five consecutive days. Training sessions lasted one hour with the first and final hours of training being performed immediately after the tests in session 2 and before the tests in session 3.
see any effects of training whilst still maintaining the interest of the participants. It has been suggested that if training is too easy, it can be less effective: Amitay et al (2006) showed that adults improved significantly less on a discrimination task when the training task was easy (participants scored 100% correct), but improvement remained high when the training was made impossible (identical tones used to train a discrimination task). In view of this, participants who completed the sentence or consonant training were scored on their ability to complete the training task (% correct) at the start, middle, and end of their training sessions. These results were not used as an outcome measure for the study but merely to assess the difficulty of the training.
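The fixed 0 dB SNR used for training in noise simply means the speech-shaped noise was scaled to the same RMS level as the speech before mixing. A minimal sketch of that scaling (the actual mixing software is not described in the paper):

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise RMS ratio equals `snr_db`
    (0 dB means equal RMS levels), then mix sample by sample."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

At snr_db = 0 the scaled noise has exactly the speech RMS; more adverse (negative) SNRs raise the noise gain accordingly.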
SENTENCE TRAINING
Continuous discourse tracking (CDT) sentence training is traditionally scored in terms of words correct per minute when administered using live voice. This is a labour-intensive training approach. In order to address this problem, an automated version of CDT (A-CDT) was developed by Rosen and Faulkner at Speech Hearing and Phonetic Sciences, UCL (Faulkner et al, 2008), and has been shown to be as effective as live voice in training listeners to adapt to novel speech signals (Faulkner et al, 2012). A-CDT was used in this study with permission of the developers. Training material consisted of three stories spoken by two female speakers of British English. The A-CDT training software displays three correct words and three foil words (similar-sounding words) for each sentence. Visual feedback on accuracy was given by a green tick next to a selected word that was correct or a red cross next to a selected word that was incorrect. If the participant selected an incorrect word the sentence was replayed. When all the correct words had been selected the full sentence was written on the screen and the audio replayed. This training procedure has been used successfully in previous studies (Stacey & Summerfield, 2008) and this pattern of feedback, known as ‘distorted-clear-distorted’, has been shown to aid learning (Davis et al, 2005). Participants trained in thirty-minute runs, after which time the training software automatically timed out. Participants completed four runs in quiet (a total of two hours) and four runs in noise (a total of two hours) during the study.
CONSONANT TRAINING
Vowel-consonant-vowel (VCV) stimuli were presented using custom software developed in our lab by the second author. Participants were asked to choose the correct response from 20 possible responses displayed on a computer screen: /p/, /t/, /k/, /f/, /θ/, /s/, /ʃ/, /ð/, /b/, /d/, /g/, /v/, /z/, /ʧ/, /w/, /j/, /r/, /l/, /m/, /n/. Each consonant was presented in a medial context between the vowels /a/, /i/, and /u/, and three different utterances were played for each vowel-consonant-vowel combination, giving a total of nine presentations of each consonant on each run (180 presentations in total, per run). Eight runs were completed over the course of the training sessions: four runs in quiet and four in noise. Digital recordings of the VCV stimuli were made from one female speaker of British English (16-bit, 48 kHz down-sampled to 44 kHz). Feedback was provided to the participants in the form of a green tick for a correct answer and a red cross for an incorrect answer. In contrast to the sentence training, the VCVs were not replayed after participants had responded.
PASSIVE EXPOSURE
The passive exposure group were exposed to the same A-CDT sentences as the sentence training group. Each A-CDT story was combined into one audio file with a gap of one second between each
sentence. Each participant was passively exposed to two hours of sentence material in quiet and two hours in noise. Participants were asked to do academic work during the sessions and not to pay attention to the auditory stimuli.
Outcome measures
SENTENCE RECOGNITION TEST
Sentence recognition was measured with recordings of the Institute of Electrical and Electronics Engineers (IEEE) Harvard sentences (Rothauser et al, 1969) spoken by one female speaker of British English. The entire IEEE corpus consists of 72 lists, each containing 10 sentences. A sub-set of lists was selected which contained a high proportion of the most commonly confused phonemes identified in the pilot study (Supplementary Appendix A, available online at http://www.informahealthcare.com/doi/abs/10.3109/14992027.2014.921338). Additionally, any lists containing American English terms such as ‘store’, ‘braid’, and ‘cent’ were excluded to avoid incorporating words unfamiliar to participants. During the test, participants were asked to repeat what they had heard. Sentences were scored by the first author as words correct (five words per sentence). To avoid ceiling effects, sentence recognition was measured in the presence of speech-shaped background noise at an SNR of 0 dB. Both the sentences and the noise were processed with NLFC. The test was calibrated to ensure presentation of stimuli at 65 dB(A) in the soundfield. A familiarization list in quiet was presented at the practice session, followed by two lists in noise at each test session (100 words scored per test session). The order of lists was pseudo-randomized to ensure different lists were used at each test session for each participant.
CONSONANT RECOGNITION TEST
A consonant recognition test was used to assess the effect of the different training methods on specific consonant confusions. To ensure this test differed from the training task, a male speaker was used (whilst a female speaker was used in training). VCV testing was used to assess participants’ consonant errors pre- and post-training. The method was exactly the same as the VCV training described earlier, except that feedback was not provided during testing. At the first session, participants completed an unprocessed version of the test to familiarize them with the test requirements. All possible responses were displayed on a screen and participants were asked to select the response they heard. All stimuli were played at 65 dB(A) in quiet. VCV results were analysed using the FIX information transfer analysis software, an implementation of SINFA (Wang & Bilger, 1973). Results were analysed in terms of voicing, manner, and place of articulation features. Additionally, confusion matrices were generated to show how specific consonant confusions changed following training.
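Feature information transfer of this kind is grounded in Miller and Nicely's measure: the mutual information between the presented and reported feature categories, divided by the stimulus feature entropy. A simplified, unconditional version can be sketched as below (FIX/SINFA additionally partials out features iteratively, which this sketch omits):

```python
import math

def relative_info_transfer(confusions):
    """Relative information transmitted for one feature (e.g. voicing).

    `confusions` is a square count matrix: confusions[i][j] = number of
    times feature category i was presented and category j was reported.
    Returns mutual information divided by stimulus entropy (0 to 1).
    """
    total = sum(sum(row) for row in confusions)
    p_in = [sum(row) / total for row in confusions]
    p_out = [sum(row[j] for row in confusions) / total
             for j in range(len(confusions))]
    mi = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:
                p_ij = count / total
                mi += p_ij * math.log2(p_ij / (p_in[i] * p_out[j]))
    h_in = -sum(p * math.log2(p) for p in p_in if p)
    return mi / h_in

# Voicing transmitted perfectly: 10 voiced trials reported voiced and 10
# voiceless trials reported voiceless gives a relative transfer of 1.0.
print(relative_info_transfer([[10, 0], [0, 10]]))
```

Responses at chance (each category reported equally often regardless of what was presented) give a relative transfer of 0.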
Analysis
A Shapiro-Wilk test for normality was conducted on the data for each training group (sentence, consonant, passive, and test-only) and for each outcome test (sentence and consonant recognition). In all cases, the test was non-significant (p > 0.05), showing that the data did not deviate significantly from a normal distribution. Consequently, parametric tests were used for the statistical analysis. To account for procedural learning, the training groups’ performance on each test was compared to the test-only control group’s performance, rather than to zero. This was achieved firstly by using a one-way ANOVA to compare changes in score (post-training minus
pre-training scores) in the four groups, and secondly, if relevant, by using Bonferroni-adjusted pair-wise comparisons.
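The one-way ANOVA on change scores reduces to a ratio of between-group to within-group mean squares. A sketch in a few lines (the example data are invented placeholders, not the study's scores; in practice a statistics package would also supply the p-value):

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA, where each group is a list of
    change scores (post-training minus pre-training)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares and degrees of freedom
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                     for g in groups)
    df_between = len(groups) - 1
    # Within-groups sum of squares and degrees of freedom
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The corresponding p-value is read from the F distribution with (k − 1, N − k) degrees of freedom; Bonferroni adjustment then multiplies each pair-wise p-value by the number of comparisons made.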
Results
Sentence recognition test
The mean change in IEEE score for each group following training is shown in Figure 2. This was calculated as each participant’s post-training score minus pre-training score, i.e. a positive value indicates the extent to which performance improved following training. The sentence-trained group and passive exposure group showed improved mean scores of 11.0% (SD 7.9%) and 11.7% (SD 6.6%), respectively, post-training. The consonant-trained group showed a mean improvement of 2.5% (SD 7.1%) post-training. The test-only group’s score changed very little between test sessions two and three (0.4%, SD 6.8%), which suggests there was little procedural learning between test sessions two and three. A one-way ANOVA revealed a statistically significant main effect of training group (F[3, 36] = 6.64, p = 0.001). Bonferroni-adjusted pair-wise comparisons and Cohen’s d effect size calculations (Cohen, 1988) revealed that the improvement in the sentence-trained group was statistically significant compared to the change in the test-only group (p = 0.012), with a large effect size (d = 1.4). The improvement in the passive exposure group was also statistically significant compared to the change in the test-only group (p = 0.006), with a large effect size (d = 1.7). The same pair-wise comparison of performance in the consonant training and test-only groups showed no significant difference (p = 1), and the effect size was small (d = 0.3).
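Cohen's d here is the difference in mean change scores divided by the pooled standard deviation. Plugging in the means and SDs reported above for the sentence-trained and test-only groups reproduces the stated d of 1.4 (assuming the common equal-n pooled-SD formula):

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d for two equal-sized groups (pooled-SD denominator)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Sentence-trained change (11.0%, SD 7.9%) vs. test-only change (0.4%, SD 6.8%)
print(round(cohens_d(11.0, 7.9, 0.4, 6.8), 1))   # 1.4, matching the reported value
```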
Consonant recognition test
The mean change in VCV correct score for each group following training is shown in Figure 3a. The consonant-trained group improved by 8.8% (SD 5.7%), whilst the sentence-trained group and passive exposure group improved by 1.9% (SD 2.2%) and 2.8% (SD 2.7%), respectively. Again, the test-only group showed only a small change between test sessions 2 and 3 (0.8%, SD 2.5%). A one-way ANOVA revealed a statistically significant main effect of group (F[3,
Figure 2. Change in IEEE test scores in noise for each group following training (N = 10 per group). Improvement is calculated as post-training score (session 3) minus pre-training score (session 2), i.e. a positive value indicates an improvement. A statistically significant improvement (compared to the test-only group) is indicated by a star (p = 0.012 sentence group, p = 0.006 passive exposure group).
36] = 10.15, p < 0.001). A Bonferroni-adjusted pair-wise comparison and Cohen’s d effect size calculation revealed that the improvement in the consonant-trained group was statistically significant compared to the change in the test-only group (p < 0.001), and the effect size was large (d = 1.8), whilst there was no significant difference in consonant recognition between the sentence training, passive exposure, and test-only groups (p = 1). The effect size was medium for the sentence group versus test-only group pair-wise comparison (d = 0.5), and large for the passive exposure group versus test-only group pair-wise comparison (d = 0.8). The mean change in VCV place information transmitted for each group following training is shown in Figure 3b. Both groups who received training and the passive exposure control group showed some improvement in their ability to detect place of articulation information post-training. A one-way ANOVA revealed a statistically significant main effect of training group (F[3, 36] = 4.7, p = 0.007). A Bonferroni-adjusted pair-wise comparison and Cohen’s d effect size calculation revealed that the improvement in the consonant-trained group was statistically significant compared to the change in the test-only group (p = 0.005, d = 1.5; a large effect size). There was no significant difference in place information transmitted between the sentence group (p = 1, d = 1.1; a large effect size), the passive exposure group (p = 0.21, d = 0.6; a medium effect size), and the test-only group. The mean changes in VCV manner and voicing information transmitted for each group following training are shown in Figures 3c and 3d. Figure 3c shows that the consonant-trained group improved in their ability to detect manner of articulation information post-training, but a one-way ANOVA revealed this just failed to reach significance (F[3, 36] = 2.7, p = 0.06; eta squared = 0.18, a large effect). Figure 3d shows there was little change in the ability to detect voicing information post-training.
A one-way ANOVA revealed no statistically significant main effect of training group on voicing information transmitted (F[3, 36] = 0.617, p = 0.61; eta squared = 0.05, a small effect).
The effect of training on consonant confusions
The post-training VCV responses minus pre-training VCV responses are shown in Figure 4 (a, b, c, d). These confusion matrices show the effect of the different training approaches on specific consonant confusions. Each consonant was played nine times (three times with each vowel) to each participant at each test session. As each group had ten participants, each consonant was played 90 times per test session per group. The numbers in Figure 4 can be used together with the total number of presentations to calculate the percentage change in errors between test sessions 2 and 3. Perhaps unsurprisingly, the group which showed the largest reduction in consonant confusions post-training was the one which received consonant training (Figure 4b), with the biggest change being amongst the unvoiced fricatives. Specifically, /θ/ was correctly identified more often following training, as the number of confusions with /s/ and /ʃ/ reduced (by 14% and 11%, respectively). Additionally, /s/ was identified correctly more often following training, as the number of confusions with /ʃ/ reduced (50% fewer errors). However, /ʃ/ was mislabelled as /s/ more often post-training (31% more errors), which suggests participants were aware of the /s/-/ʃ/ confusion but were often unable to identify the correct label or were overcorrecting for this confusion. Recognition of the voiced fricative /z/ improved post-training, as the number of confusions with /ð/ reduced (58% fewer errors). The more obvious fricative alternative /ʒ/ was not a response option in this test as no recording of this consonant was available. Approximant /j/ was identified correctly
Figure 3. (a) Change in VCV test scores in quiet in each group following training (N = 10 per group). Improvement is calculated as post-training score (session 3) minus pre-training score (session 2), i.e. a positive score indicates improvement. A statistically significant improvement (compared to the test-only group) is indicated by a star (p < 0.001). (b) Mean change in place of articulation information transmitted during the VCV test in each group after training (N = 10 per group). Improvement is calculated as in (a). A statistically significant improvement (compared to the test-only group) is indicated by a star (p = 0.007). (c) Mean change in manner of articulation information transmitted during the VCV test in each group after training (N = 10 per group). Improvement is calculated as in (a). (d) Mean change in voicing information transmitted during the VCV test in each group after training (N = 10 per group). Improvement is calculated as in (a).

more often post-training, as the number of confusions with /l/ reduced. Additionally, /r/ was also identified correctly more often post-training, as the number of confusions with /l/ reduced (14%). Following A-CDT sentence training (Figure 4a), unvoiced fricative confusions reduced (although much less than for those who had received the consonant training). Specifically, /θ/ was identified correctly more often following training, as the confusions with /ʃ/ reduced (11% fewer errors). The unvoiced affricate /ʧ/ was also identified correctly more often post-training, as the confusions with /t/ reduced (by 10%).
The participants who were passively exposed to frequency-compressed speech (Figure 4c) confused /θ/ with /f/ more often (12%) post-exposure, whilst confusion of /θ/ with /s/ and /ʃ/ reduced slightly (by 4% and 8%, respectively). The voiced fricative /v/ was correctly labelled more often post-exposure (by 11%), as errors with several other consonants reduced. Additionally, /z/ was correctly labelled more often post-exposure (16%), as confusion with /ð/ reduced (by 19%). Following repeated testing, the test-only group (Figure 4d) confused /θ/ more with /f/ but less with /ʃ/ (both by 13%). Otherwise, consonant confusion patterns remained similar between test sessions, as would be expected.
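The place, manner, and voicing analyses reported in Figure 3 follow the information-transmission approach of Miller and Nicely (1955): the confusion matrix is collapsed by a feature, and the mutual information between stimulus and response is expressed as a proportion of the stimulus entropy. A minimal sketch, using illustrative counts rather than the study's data:

```python
import numpy as np

def relative_info_transmitted(counts):
    """Miller & Nicely (1955) relative information transmitted:
    mutual information of the confusion matrix divided by the
    entropy of the stimulus set (0 = none, 1 = perfect)."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)   # stimulus marginals (column vector)
    py = p.sum(axis=0, keepdims=True)   # response marginals (row vector)
    nz = p > 0                          # skip empty cells (0 log 0 := 0)
    mi = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return mi / hx

# Illustrative 2x2 voicing matrix (rows = presented class: voiced,
# unvoiced; columns = responded class). Not the study's data.
voicing = np.array([[80., 10.],
                    [15., 75.]])
print(relative_info_transmitted(voicing))
```

An error-free matrix (all responses on the diagonal) returns 1.0, and a matrix in which responses are independent of the stimulus returns 0.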
Figure 4. (a, b, c, d) Confusion matrices for each group showing post-training responses minus pre-training responses on the consonant recognition test. A positive value on the diagonal indicates that more correct responses were made following training, whilst a positive value away from the diagonal indicates that more errors were made. Conversely, a negative value on the diagonal indicates that fewer correct responses were made following training, whilst a negative value away from the diagonal indicates that fewer errors were made.

Performance on the training task

Mean performance on the A-CDT training task and the VCV training task is shown in Table 1. Group scores (percent correct) are shown at the start, mid-point, and end of training, scored separately for quiet and noise. For A-CDT sentences in quiet, all participants were at ceiling from the start of their training. The A-CDT training was slightly more difficult in noise; however, even at the start, participants still scored near ceiling. In a small pilot study, more challenging SNRs were trialled with the A-CDT training in an effort to avoid ceiling effects. However, owing to the many contextual cues in the A-CDT stories, participants were able to guess the correct answers from the words available on the screen, so scores remained near ceiling even at very adverse SNRs. To ensure the story was audible (and that participants were not responding solely on the basis of context), a moderate SNR of 0 dB was chosen. It should be noted that scores on the sentence recognition test in noise did not reach ceiling in any group, even after training (mean score for all participants pre-training = 34.7%, SD 9.1; post-training = 41.13%, SD 10.5). The differences in performance on the test and training sentence tasks may be explained by the differences in response methods (open set for testing versus closed set for training) and stimuli (IEEE sentence material with few contextual cues for testing versus stories full of contextual cues for training).

The VCV consonant training performance was also scored in percent correct at the start, mid-point, and end of training (scored separately for quiet and noise). All participants improved over the course of their training. Initial scores in quiet were between 62.8% and 77.8%, rising to 73.3-85.6% by the end of the two hours of training; initial scores in noise were between 40.4% and 55.6%, rising to 53.3-62.6%.

Table 1. Mean (standard deviation) training scores, in quiet and noise, at the start (0 hours), mid-point (1 hour), and end of training (2 hours). Passive exposure was not scored.

Training group   Training task              Score at start   Score at mid-point   Score at end
Sentence         A-CDT sentences in quiet   99.4% (0.01)     99.4% (0.01)         99.2% (0.01)
Sentence         A-CDT sentences in noise   96.3% (0.04)     97.5% (0.01)         97.6% (0.02)
Consonant        VCV in quiet               70.3% (5.4)      76.6% (4.0)          79.0% (3.6)
Consonant        VCV in noise               48.0% (4.8)      52.9% (5.3)          57.7% (3.5)
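The learning trajectories in Table 1 can be summarized as the gain from the start to the end of training, and the gain in the second hour alone. A small sketch using the table's mean scores (the ceiling effect for the sentence tasks and the continuing VCV improvement are both visible in the numbers):

```python
# Mean scores from Table 1 at the start, mid-point, and end of training.
table1 = {
    "A-CDT sentences in quiet": (99.4, 99.4, 99.2),
    "A-CDT sentences in noise": (96.3, 97.5, 97.6),
    "VCV in quiet":             (70.3, 76.6, 79.0),
    "VCV in noise":             (48.0, 52.9, 57.7),
}

for task, (start, mid, end) in table1.items():
    total = end - start   # overall gain over the two hours
    late  = end - mid     # gain in the second hour alone
    print(f"{task}: total {total:+.1f} pts, second hour {late:+.1f} pts")
```

The sentence tasks show near-zero change (ceiling), whereas both VCV conditions keep gaining in the second hour, consistent with the suggestion later in the text that longer consonant training might have yielded further improvement.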
Discussion

The groups showed greatest improvement on the outcome tests which mirrored their training task, i.e. those who trained with sentences improved most on the sentence task and those who trained with consonants improved most on the consonant task. As the test and training material differed in terms of speaker and material, this finding suggests learning transferred between speakers and material but not between types of stimuli. This is to be expected, as it is frequently reported in the literature that training generalizes best to similar tasks (Robinson & Summerfield, 1996); the phenomenon has recently been described as 'near-transfer' (Barcroft et al, 2011).

Passive exposure (to the A-CDT sentences) yielded an improvement in recognition of frequency-compressed IEEE sentences equivalent to that from active training. This occurred despite participants being asked not to attend to the stimuli and not being provided with any feedback. Other studies have reported similar improvements in perception of novel speech stimuli following passive exposure: Dupoux and Green (1997) found a short period of passive exposure was sufficient for normal-hearing listeners to adapt to time-compressed speech, whilst, more recently, Li et al (2009) found that passive exposure enabled normal-hearing listeners to adapt to a mildly shifted cochlear implant simulation.

It could be argued that some participants in the passive exposure group were in fact actively listening to the stimuli (rather than ignoring them as instructed) and, therefore, that attention was a confounding factor in this group. If this were the case, it might be expected that the participants who ignored the instructions (and actively listened) would show improvement, whilst those who followed the instructions (and ignored the stimuli) would not. However, all 10 participants in the passive exposure group showed improved sentence recognition post-training (with improvements ranging from 1-20%).
By comparison, nine of the 10 participants who were given sentence training showed post-training improvement, ranging from 4-26% (one showed a deterioration of 3%), whereas in the test-only control group only three participants showed improvement at the final test session (6-11%), five showed a deterioration (1-10%), and two showed no change. Looking at individual results in this way suggests that the improvement in sentence recognition following passive exposure was consistent across the whole group and occurred irrespective of attention and despite the lack of feedback.

To control for attention, participants in the passive exposure group could have been given a task with a measurable performance indicator, e.g. a quiz, game, or memory task, as this may have been more effective at controlling attention than doing academic work. However, even studies that have controlled for attention in this way have found improvements in performance following passive exposure: Amitay et al (2006) found that passive exposure to tones, combined with playing computer games (where performance on the game was measured), led to a statistically significant improvement on a frequency discrimination task which used those tones. To determine the effect of attention on passive adaptation to novel stimuli, future studies need to compare perceptual changes following passive exposure between groups who are attending to a measurable task and groups who are not.

In this study, all test and training materials were processed with the maximum setting of NLFC, which has a much bigger shift (above the cut-off frequency) than the mildly shifted CI simulation used by Li et al (2009), and passive exposure still led to improvements on a par with those achieved from active training. Perhaps this indicates that the spectral distortion introduced above the cut-off frequency by NLFC is relatively minor compared with CIs (where the whole frequency range is shifted), so that active training is not required for adaptation, at least amongst normal-hearing listeners.
The passive exposure group performed as well on the outcome tests without feedback as those who received explicit feedback through A-CDT sentence training. This seems to contradict the idea that feedback is essential for learning (Davis et al, 2005), and indicates that passive exposure to speech may be worth consideration in studies investigating NLFC, at least for normal-hearing adults. Whether this applies to hearing-impaired adults using NLFC remains to be seen. The process of passive perceptual learning with amplification (or auditory acclimatization) could be highly dependent on an individual's lifestyle: a socially active person who lives with others may have much more exposure to speech sounds through their hearing aid than an individual who is housebound and lives alone. Gatehouse (2006) suggested that 'auditory ecology' may influence outcome with hearing-aid technology, where auditory ecology is defined as an individual's auditory environment and the extent to which performance in that environment affects their lifestyle (Gatehouse et al, 1999). Whether passive exposure to speech as part of an individual's auditory ecology is sufficient for hearing-aid users to adapt to NLFC, or whether specific auditory training with feedback leads to greater adaptation, has yet to be shown.
In this study, A-CDT sentence training led to a statistically significant improvement in perception of frequency-compressed IEEE sentences in noise compared with the test-only control group. Despite using the same type of speech tokens (i.e. sentences) and the same background noise (speech-shaped noise), the IEEE test used different sentence material, a different response method (open set), and a different speaker compared with the training. It can be concluded that A-CDT training with frequency-compressed speech led to perceptual learning which generalized to new speech material and a new speaker.

Generalization of training with normal-hearing adults is well documented. Burk et al (2006) showed that word training led to generalization between speakers and response formats (open and closed), but not to new speech material, for normal-hearing young adults. Generalization following training appears to be less consistent among hearing-impaired adults. For example, Burk and Humes (2008) showed generalization to an unfamiliar speaker but not to novel material among eight older hearing-impaired adults who trained with words, and Stecker et al (2006) showed generalization between speakers but not material for hearing-aid users trained in syllable identification. Overall, the literature indicates that generalization between speakers is more common than generalization to new material following training. Indeed, in the present study, A-CDT sentence training led to a small improvement in consonant recognition, but this failed to reach significance, perhaps due to the inherent difference between the speech materials, i.e. sentences and syllables.

The group which received consonant training showed statistically significantly improved consonant recognition post-training and less consonant confusion, particularly between the unvoiced fricatives. Perhaps this is unsurprising since, despite different speakers, the VCV test and training procedures were identical. As NLFC affects the high-frequency portion of speech, i.e. consonants, this test was considered an essential outcome measure despite also being used in training. To maintain a difference between the test and training material, speakers of different sexes were used: the consonant training material used a female speaker, whilst the consonant test material used a male speaker. As female speakers' voices are higher in pitch, they are likely to be more affected by NLFC; thus, participants in this study trained in a harder condition than the test condition. This method was chosen as it has been shown to lead to increased learning when used in a closed-set condition (Burk & Humes, 2007). The VCV training and testing results in the present study demonstrate a generalization of learning between speakers of different sexes. As discussed above, this finding is in agreement with other training studies which have found that generalization of learning can occur readily between speakers (Burk et al, 2006; Burk & Humes, 2008; Stecker et al, 2006).

The consonant-trained group did show some improvement on the sentence recognition task (compared with the test-only group). However, as shown by the small effect size, this improvement was small and highly variable; perhaps with a larger number of participants this improvement would have been more robust. Additionally, consonant training performance was still improving at the end of training, which suggests that a longer training time could have led to greater improvements. Indeed, Burk and Humes (2007) assessed sentence recognition following word training and found that scores increased significantly when training time was increased from five to 15 hours. As the participants in this study completed only four hours of training, it is possible that more training may have led to further improvements.
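Effect sizes in between-group comparisons of this kind are commonly expressed as Cohen's d (Cohen, 1988). A generic sketch of the calculation, using illustrative group means and SDs rather than the study's data:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: difference between group means divided by the
    pooled standard deviation of the two groups."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Illustrative values only: mean improvement (percentage points) and SD
# for a trained group versus a test-only control group of 10 each.
d = cohens_d(mean1=5.0, sd1=6.0, n1=10, mean2=1.0, sd2=5.0, n2=10)
print(round(d, 2))
```

By Cohen's conventions, d of about 0.2 is small, 0.5 medium, and 0.8 large; high within-group variability (large SDs), as reported for the consonant-trained group's sentence scores, shrinks d even when the mean difference looks appreciable.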
This study suggests that users of NLFC hearing aids may find speech training helpful in adapting to the novel frequency-compressed speech signal provided by the frequency-lowering algorithm. This supports previous research which showed benefit in providing auditory training to users of hearing aids (Stecker et al, 2006; Sweetow & Henderson Sabes, 2007), and to those using frequency-lowering hearing aids (McDermott & Dean, 2000; Kuk et al, 2009). However, larger, adequately controlled studies providing auditory training to hearing-impaired adults who use NLFC hearing aids are required to confirm this hypothesis. To address this research question, the training techniques used in this study will be incorporated into an automated auditory training program which can be used at home by hearing-aid users. This training will combine analytic and synthetic speech material and focus on the high-frequency cues which are distorted by NLFC. Two appropriately matched groups of hearing-aid users will be recruited: one group will complete auditory training, whilst the other will complete placebo training.
Conclusion

The findings suggest that speech training is effective at improving perception of frequency-compressed speech and that training generalizes most effectively to similar tasks. However, results from the 'placebo' training group suggest that passive exposure may be as effective at improving perception of frequency-compressed sentences as active training. As this study investigated training-related changes in normal-hearing young adults, the findings cannot be applied directly to hearing-impaired individuals or older adults; however, the results suggest that sentence training and consonant training could be beneficial to adults who use NLFC hearing aids. Additionally, the process of passive adaptation to NLFC and the potential confounding factor of auditory ecology require consideration in future research.
Acknowledgements

Thank you to the editor and anonymous reviewers for their helpful comments.

Declaration of interest: This study was part of a studentship funded by Phonak AG, Staefa, Switzerland.
References

Amitay S., Irwin A. & Moore D.R. 2006. Discrimination learning induced by training with identical stimuli. Nat Neurosci, 9, 1446-1448.
Arlinger S., Gatehouse S., Bentler R.A., Byrne D., Cox R.M. et al. 1996. Report of the Eriksholm Workshop on Auditory Deprivation and Acclimatization. Ear Hear, 17, S87-S90.
Barcroft J., Sommers M.S., Tye-Murray N., Mauze E., Schroy C. et al. 2011. Tailoring auditory training to patient needs with single and multiple talkers: Transfer-appropriate gains on a four-choice discrimination test. Int J Audiol, 50, 802-808.
Bradlow A.R., Akahane-Yamada R., Pisoni D.B. & Tohkura Y. 1999. Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Percept Psychophys, 61, 977-985.
Braida L.D., Durlach N.I., Lippmann R.P., Hicks B.L., Rabinowitz W.M. et al. 1979. Hearing aids: A review of past research on linear amplification, amplitude compression, and frequency lowering. ASHA Monogr, 19, 1-114.
BSI 2008. BS EN ISO 3382-2. Acoustics - Measurement of room acoustic parameters. Part 2: Reverberation time in ordinary rooms. British Standards.
BSI 2009. BS EN ISO 8253-2. Acoustics - Audiometric test methods. Part 2: Sound field audiometry with pure-tone and narrow-band test signals. British Standards.
Burk M.H. & Humes L.E. 2007. Effects of training on speech recognition performance in noise using lexically hard words. J Sp Lang Hear Res, 50, 25-40.
Burk M.H. & Humes L.E. 2008. Effects of long-term training on aided speech recognition performance in noise in older adults. J Sp Lang Hear Res, 51, 759-771.
Burk M.H., Humes L.E., Amos N.E. & Strauser L.E. 2006. Effect of training on word-recognition performance in noise for young normal-hearing and older hearing-impaired listeners. Ear Hear, 27, 263-278.
Cohen J. 1988. Statistical Power Analysis for the Behavioral Sciences. L. Erlbaum Associates.
Davis M.H., Johnsrude I.S., Hervais-Adelman A., Taylor K. & McGettigan C. 2005. Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. J Exp Psychol Gen, 134, 222-241.
Davis W. 2001. A technical review of a frequency transposition instrument designed for people who have severe hearing loss. Hear Rev [Online]. Available: http://www.hearingreview.com/issues/articles/2001-02_04.asp [accessed 20/01/2011].
Dillon H. 2001. Hearing Aids. Thieme.
Dubno J.R., Dirks D.D. & Langhofer L.R. 1982. Evaluation of hearing-impaired listeners using a nonsense-syllable test. II. Syllable recognition and consonant confusion patterns. J Speech Hear Res, 25, 141-148.
Dupoux E. & Green K. 1997. Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. J Exp Psychol Hum Percept Perform, 23, 914-927.
Faulkner A., Rosen S. & Green T. 2012. Comparing live to recorded speech in training the perception of spectrally shifted noise-vocoded speech. J Acoust Soc Am, 132, EL336-342.
Faulkner A., Rosen S. & Norman C. 2006. The right information may matter more than frequency-place alignment: Simulations of frequency-aligned and upward shifting cochlear implant processors for a shallow electrode array insertion. Ear Hear, 27, 139-152.
Faulkner A., Rosen S., Watt C. & Gedgaudaite K. 2008. Comparison of live voice and computer-based training for upward-shifted vocoded speech simulating a cochlear implant. In: British Society of Audiology Short Papers Meeting on Experimental Studies of Hearing and Deafness. London.
Fu Q.-J., Nogaki G. & Galvin J.J. 2005. Auditory training with spectrally shifted speech: Implications for cochlear implant patient auditory rehabilitation. J Assoc Res Otolaryngol, 6, 180-189.
Fullgrabe C., Baer T., Stone M.A. & Moore B.C.J. 2010. Preliminary evaluation of a method for fitting hearing aids with extended bandwidth. Int J Audiol, 49, 741-753.
Galster J., Valentine S., Dundas A. & Fitz K. 2011. Spectral IQ: Audibly improving access to high frequency sounds. Starkey.
Gatehouse S., Elberling C. & Naylor G. 1999. Aspects of auditory ecology and psychoacoustic function as determinants of benefit from and candidature for non-linear processing hearing aids. Proceedings of the 18th Danavox Symposium: Auditory Models and Non-linear Hearing Instruments.
Glista D., Scollie S., Bagatto M., Seewald R., Parsa V. et al. 2009. Evaluation of nonlinear frequency compression: Clinical outcomes. Int J Audiol, 48, 632-644.
Iverson P., Hazan V. & Bannister K. 2005. Phonetic training with acoustic cue manipulations: A comparison of methods for teaching English /r/ and /l/ to Japanese adults. J Acoust Soc Am, 118, 3267-3278.
Korhonen P. & Kuk F. 2008. Use of linear frequency transposition in simulated hearing loss. J Am Acad Audiol, 19, 639-650.
Kuk F., Keenan D., Korhonen P. & Lau C.-C. 2009. Efficacy of linear frequency transposition on consonant identification in quiet and in noise. J Am Acad Audiol, 20, 465-479.
Li T., Galvin J.J. 3rd & Fu Q.J. 2009. Interactions between unsupervised learning and the degree of spectral mismatch on short-term perceptual adaptation to spectrally shifted speech. Ear Hear, 30, 238-249.
McDermott H. 2010. SoundRecover: The importance of perceptual bandwidth. Phonak Field News.
McDermott H. 2011. A technical comparison of digital frequency-lowering algorithms available in two current hearing aids. PLoS One, 6, e22358. doi:10.1371/journal.pone.0022358.
McDermott H. & Henshall K. 2010. The use of frequency compression by cochlear implant recipients with postoperative acoustic hearing. J Am Acad Audiol, 21, 380-389.
McDermott H.J. & Dean M.R. 2000. Speech perception with steeply sloping hearing loss: Effects of frequency transposition. Br J Audiol, 34, 353-361.
Miller G. & Nicely P. 1955. An analysis of perceptual confusions among some English consonants. J Acoust Soc Am, 27, 339-352.
Moore D. & Amitay S. 2007. Auditory training: Rules and applications. Seminars in Hearing, 28, 99-109.
Phonak AG. 2009. Method for adapting sound in a hearing aid device by frequency modification, and such a device. PCT/EP2008/056708.
Ricketts T.A., Dittberner A.B. & Johnson E.E. 2008. High-frequency amplification and sound quality in listeners with normal through moderate hearing loss. J Sp Lang Hear Res, 51, 160-172.
Robinson J.D., Baer T. & Moore B.C.J. 2007. Using transposition to improve consonant discrimination and detection for listeners with severe high-frequency hearing loss. Int J Audiol, 46, 293-308.
Robinson K. & Summerfield A.Q. 1996. Adult auditory learning and training. Ear Hear, 17, S51-S65.
Rothauser E.H., Chapman N.D., Guttman N., Nordby F.S., Silbiger H.R., Urbanek G.E. & Weinstock M. 1969. IEEE recommended practice for speech quality measurements. IEEE Trans Audio Electroacoust, 17, 227-246.
Scollie S. 2010. Frequency lowering technologies: Overview and introduction to panel session. In: R. Seewald & J. Bamford (eds.), A Sound Foundation Through Early Amplification 2010: Proceedings of the Fifth International Conference. Chicago, USA: Phonak AG, 195-200.
Simpson A., Hersbach A.A. & McDermott H.J. 2005. Improvements in speech perception with an experimental nonlinear frequency compression hearing device. Int J Audiol, 44, 281-292.
Simpson A., Hersbach A.A. & McDermott H.J. 2006. Frequency-compression outcomes in listeners with steeply sloping audiograms. Int J Audiol, 45, 619-629.
Stacey P.C. & Summerfield A. 2008. Comparison of word-, sentence-, and phoneme-based training strategies in improving the perception of spectrally distorted speech. J Sp Lang Hear Res, 51, 526-538.
Stecker G.C., Bowman G.A., Yund E.W., Herron T.J., Roup C.M. et al. 2006. Perceptual training improves syllable identification in new and experienced hearing-aid users. J Rehabil Res Dev, 43, 537-551.
Stelmachowicz P.G., Pittman A.L., Hoover B.M. & Lewis D.E. 2001. Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. J Acoust Soc Am, 110, 2183-2190.
Stelmachowicz P.G., Pittman A.L., Hoover B.M., Lewis D.E. & Moeller M.P. 2004. The importance of high-frequency audibility in the speech and language development of children with hearing loss. Arch Otolaryngol, 130, 556-562.
Sweetow R. & Palmer C.V. 2005. Efficacy of individual auditory training in adults: A systematic review of the evidence. J Am Acad Audiol, 16, 494-504.
Sweetow R.W. & Henderson Sabes J. 2007. Listening and communication enhancement (LACE). Seminars in Hearing, 28, 133-141.
Wang M.D. & Bilger R.C. 1973. Consonant confusions in noise: A study of perceptual features. J Acoust Soc Am, 54, 1248-1266.
Wolfe J., John A., Schafer E., Nyffeler M., Boretzki M. et al. 2010. Evaluation of nonlinear frequency compression for school-age children with moderate to moderately severe hearing loss. J Am Acad Audiol, 21, 618-628.

Supplementary material available online

Supplementary Appendix A available online at http://informahealthcare.com/doi/abs/10.3109/14992027.2014.921338.