Training of Cochlear Implant Users to Improve Pitch Perception in the Presence of Competing Place Cues

Andrew Vandali,1 David Sly,2 Robert Cowan,1 and Richard van Hoesel1

Objectives: Perception of musical pitch in cochlear implant (CI) systems is relatively poor compared with normal hearing and can be adversely affected by changes in spectral timbre coded by stimulation place. In this study, we evaluated whether the perception of musical pitch could be improved through specific training designed to teach listeners to attend to fundamental frequency (F0) exclusively for judgment of pitch and to spectral envelope exclusively for discrimination of spectral timbre.

INTRODUCTION It has been well established that in comparison to normal-hearing (NH) listeners, cochlear implant (CI) recipients exhibit poorer perception of pitch (Pijl 1997b; Vandali et al. 2005, 2012; Sucher & McDermott 2007), melody (Gfeller & Lansing 1991; Gfeller et al. 2002a, 2005, 2007; Leal et al. 2003; Galvin et al. 2007), and lexical tone (Wei et al. 2007; Lee & Lee 2010; Wang et al. 2011). This has been attributed mainly to an inability of current CI systems to adequately code the fine spatiotemporal information that provides the principal cues to pitch in NH (Ritsma 1967; Moore et al. 1985; Dai 2000). In CI coding strategies, changes in electrical stimulation rate can be used to elicit changes in pitch percepts for CI listeners (e.g., McKay et al. 1994; Pijl & Schwarz 1995; Moore & Carlyon 2005). However, the percept is relatively weak and more sluggish (Moore & Carlyon 2005; Lou et al. 2010) compared with that derived from fine spatiotemporal information in NH. In addition, it can be adversely affected by place or level of stimulation (e.g., Pijl 1997a; Zeng 2002; Vandali et al. 2005, 2012, 2013; Arnoldner et al. 2008; Carlyon et al. 2010). Outcomes from previous studies suggest that pitch perception in CI users might be improved by specific training designed to help listeners associate F0 with pitch, and spectral envelope information with spectral timbre, so that each of those musical attributes is better perceived. Several studies have shown that provision of specific training improved CI listeners’ ability to identify melody (e.g., Gfeller et al. 2000; Galvin et al. 2007) and instruments (Gfeller et al. 2002b). Galvin et al. (2007) reported that training using a melodic contour identification paradigm improved listeners’ abilities on that task and also generalized to improved identification of familiar melodies.
However, those tests used synthesized stimuli for which place of stimulation was held relatively fixed, and outcomes may therefore differ for natural musical sounds in which changes in spectral timbre are common. Such changes in spectral timbre have been shown to influence pitch perception for both NH (e.g., Singh & Hirsh 1992; Pitt 1994; Warrier & Zatorre 2002; Russo & Thompson 2005) and CI listeners (e.g., Sucher & McDermott 2007; Galvin et al. 2008; Vandali et al. 2012). That influence is particularly evident in listeners who are not musically trained, suggesting that musical training/experience might help listeners to discriminate pitch even when confounding effects of spectral timbre are present. This conjecture is supported by a previous study, which showed that participation in a series of rate and modulation rate-pitch discrimination tests improved the ability of musically inexperienced CI users to judge pitch when listening to complex sounds that varied simultaneously in F0 and spectral envelope (Vandali et al. 2012). Studies with NH listeners have shown significant effects of musical experience and/or training on discrimination of pitch (e.g., Spiegel & Watson 1984; Micheyl et al. 2006; Borchert

Design: A computer-based training program to improve musical pitch perception was developed that required listeners to match acoustic patterns of pitch and spectral timbre to visual patterns. Ten adult CI recipients participated: five used the training program and five acted as controls. Before training, F0 and center frequency discrimination for band-limited complex harmonic stimuli presented in the sound field were measured in all subjects using their standard clinical device(s). The F0 tests were conducted with and without spectral variations. The trainees subsequently used the training program at home for a period of 4 months, during which they were asked to train for approximately 30 min per day. The training schedule comprised two successive phases, each lasting 2 months. In the first phase, training employed a single cue (i.e., F0 for pitch or center frequency for spectral timbre) in the absence of other cue variations. In the second phase, training incorporated more complex sounds in which multiple cues were varied. Discrimination thresholds were remeasured in all subjects after each phase and again with trainees 3 months after training had ceased. Results: Trainees obtained significant improvements in F0 and center frequency discrimination as compared with control subjects for tests conducted at 2 months. The improvements in F0 discrimination were observed both in the absence and presence of small variations in place. However, the effect of training diminished for large variations in place or for higher F0s. Neither group showed further improvement in tests after additional training in the second phase. Tests conducted with trainees after training had ceased showed that F0 discrimination improvements were retained. Conclusions: The results showed that performance on pitch and timbre discrimination can be improved by training with single cues (F0 and center frequency) in the absence of other cue variations. 
Although results indicated that training with single cues can improve F0 discrimination within more complex sounds in which multiple cues vary, little improvement was seen when large variations in place were present, which suggests a diminishing effect of the training with increased influence of place-pitch. These data imply that although such training techniques may help listeners to follow melody in music, changes in instrument are likely to affect perception of melody. Results of subsequent training with complex sounds in which multiple cues varied were less conclusive and showed no further improvement. Follow-up evaluations with trainees conducted well after training had ceased verified the robustness of the training effect on F0 but not center frequency discrimination. Further studies are needed, however, to determine whether and to what degree subject motivation may be an important factor in these outcomes. Key words: Cochlear implant, Music, Pitch, Timbre, Training. (Ear & Hearing 2015;36;e1–e13)

1The Hearing CRC, Carlton, 3053, Victoria, Australia; and 2Department of Otolaryngology, University of Melbourne, Australia.

0196/0202/2015/362-00e1/0 • Ear & Hearing • Copyright © 2014 Wolters Kluwer Health, Inc. All rights reserved • Printed in the U.S.A. e1


VANDALI ET AL. / EAR & HEARING, VOL. 36, NO. 2, e1–e13

et al. 2011; Allen & Oxenham 2014). Micheyl and colleagues observed that pure-tone frequency and complex harmonic-tone F0 discrimination thresholds (F0DLs) for a group of nonmusically trained listeners were poorer than those of highly experienced musicians. However, after approximately 4 to 8 hrs of training, frequency discrimination thresholds for the nonmusically trained listeners approached those of the highly experienced musicians. Allen and Oxenham (2014) examined both F0 and spectral centroid discrimination in musicians and nonmusicians using stimuli that varied concurrently in both dimensions. The musicians had better F0 discrimination than nonmusicians, but similar spectral centroid discrimination. However, interference effects between pitch and spectral timbre were similar in both groups (when normalized for differences in sensitivity between dimensions and subjects), which suggested that musical training does not help to overcome those effects. Whether this is also the case for CI recipients has not been studied previously. Regardless, the finding that absolute pitch discrimination both in the absence and presence of spectral timbre variations was better in the musicians suggests that training in musically inexperienced CI recipients may similarly improve pitch discrimination. However, it is also possible that people become musicians simply because they are better at these tasks in the first place. The body of literature related to perceptual training paradigms is extensive (see Schmidt & Bjork 1992; Smith, Reference Note 1 for reviews). Key features common to successful training techniques include the provision of feedback (McCandliss et al. 2002), use of highly variable stimuli such as multiple talkers and varied phonetic content rather than a single talker and fixed phonemes (Logan et al. 1991; Wang et al. 1999; Iverson et al.
2005), and perceptual fading beginning with the most extreme exemplars of stimuli and progressing to more natural sounding exemplars as perception improves (Jamieson & Morosan 1986, 1989; McCandliss et al. 2002; Iverson et al. 2005; Smith, Reference Note 1). There is also some evidence that benefits can be obtained through training with primary cues alone while holding secondary or distracting auditory cues fixed, followed by the introduction of graded variations in secondary cues so that the stimuli evolve into more realistic sounds (Iverson et al. 2005). The aim of the present study was to investigate whether a training paradigm embodying the key features described above could improve listeners’ ability to perceive pitch, particularly in music where changes in pitch and timbre often occur simultaneously. The study assessed whether training involving discrimination of F0 information exclusively for judgment of pitch, and spectral envelope information exclusively for discrimination of spectral timbre, could improve pitch perception and reduce the known adverse effects that changes in spectral envelope (place of stimulation) can have on judgment of pitch. The specific hypotheses tested were the following:
1. Isolated training on a single cue (i.e., F0 for pitch or band-pass center frequency for spectral timbre) in the absence of other cue variations will improve a listener’s sensitivity to that cue in the absence of other cue variations.
2. Isolated training on those single cues in the absence of other cue variations will improve a listener’s sensitivity to F0 pitch in complex sounds in which multiple cues vary.
3. Subsequent training with complex sounds in which multiple cues vary further improves a listener’s ability to discriminate the pitch of complex sounds.

METHODS Training Program A software-based training program was developed that required listeners to match acoustic patterns of pitch and spectral timbre to visual patterns of musical notes presented on a computer screen. The training was embedded in a game format to promote increased participation. The tones used during training differed primarily in fundamental frequency (F0) and spectral envelope, both of which are the primary acoustic cues to discrimination of pitch and spectral timbre, respectively. Training commenced with the simplest tasks (i.e., discrimination of an individual musical attribute using widely spaced stimulus intervals and holding other information fixed) and progressed to more difficult tasks using more natural sounds (i.e., discrimination of stimuli spaced by smaller intervals and variation of multiple attributes of musical signals). For instance, for pitch training, the F0 interval was initially set to a size that the subject could clearly discriminate (as determined by discrimination tests conducted before the training), and this interval was reduced as the training progressed. To promote generalization of results to alternate tasks, variation across the training schedule was provided in the number of different acoustic tones presented during a training run (from two to nine tones but capped at three during the first 2 months of training) and the number of tones presented sequentially within a pattern (from one to three). The software program divided training into five Stages, each designed to train either separate or combined aspects of auditory perception: (1a) pitch; (1b) spectral timbre; (2a) combined pitch and spectral timbre; (2b) musical instruments—pitch and timbre; and (2c) musical instruments—pitch across timbre variations. Within each Stage, 12 different Levels were used which increased in difficulty via variation of stimuli, task, and visual configurations of the training screen. 
Before each training run for a given Stage-Level, the trainees were presented with an acoustic and visual representation of the tones to be discriminated, together with on-screen instructions about the task to be performed. For each run, subjects listened to a sequence of one or more acoustic tones and were required to select a matching visual sequence from many sequences displayed on the computer screen (see Fig. 1 for an example of the training screen and task). To make the training more challenging, the visual tone-tokens traversed along a predefined path on the screen. Selection of a visual pattern was performed by positioning the mouse cursor over the target pattern and pressing the mouse button. Selection of tokens that matched the target pattern eliminated those tokens only from the screen, whereas selection of an incorrect pattern inserted additional tokens on the screen. After a selection was made (correct or not), another pattern was randomly chosen from those available on the screen and presented acoustically. After all visual tokens were eliminated, or when the predefined time limit was reached, a new block (i.e., many sequences) of visual tokens appeared on the screen at the start of the path. A Level was completed when both 5 min of the game had elapsed and all tokens on the screen were eliminated. If any token reached the end of the path, the training run would stop (recorded as a failed attempt). Each Level needed to be completed before the next Level could be played.




Fig. 1. Example pitch training screen in which two different F0s are employed and two tones are presented sequentially as the target pattern. The tones are visually represented by quarter notes, which are displayed on a musical staff. Visual tokens representing higher F0 tones are positioned closer to the top of the staff than lower F0 tones. Sequences of visual tokens enter from the top-right corner of the screen (see Start arrow) and travel right-to-left along the top section of the path (see arrows), down the ramp, up the pipe (see dotted arrow), then change direction (left-to-right) and travel down the second ramp to drop off the end of the middle section of the path before finally traveling right-to-left to drop off the end of the bottom section of the path on to a drum (see End arrow). As an example, one visual instance of the target pattern is circled (see Target in bottom path) to illustrate the correct pattern that a trainee could select. For training tasks in which F0 is not discriminated (i.e., Stage 1b), the visual tokens are positioned at an equal height on the musical staff. In this case, spectral timbre is depicted by brightly lit notes for high center frequency versus dark (dull) notes for lower center frequency. For discrimination of instrument, images of the instrument are employed, and for discrimination of sung vowels, text symbols of the vowels (/a/, /i/, and /u/) are displayed.

Subsequent runs of a completed Level could be undertaken using smaller cue intervals, allowing trainees to obtain a higher performance score.

Synthetic Complex Harmonic Tone Stimuli Synthetic complex harmonic tones were used in the first three Stages of training. They were generated using additive synthesis from sine-phase (i.e., zero initial phase) harmonics of F0 that were weighted in amplitude in accordance with a bandpass filter specification. F0 Variations  •  F0 for the tones ranged from 87.3 Hz (F2 in musical notation) to 247.0 Hz (B3), which was within the range of modulation rates over which CI users are generally able to use rate-pitch cues (e.g., McDermott & McKay 1997; Geurts & Wouters 2001; Vandali et al. 2012). The F0s were quantized to 1/16th of a semitone in the Western musical scale. At the commencement of each training run, a reference F0 was randomly chosen from the available range and remained fixed within the run. For tasks involving pitch discrimination, additional F0s were chosen to produce one or two (depending on the task) fixed F0 intervals. For those pitch tasks, the spectral envelopes remained fixed across stimuli within the run and were determined as described in the next section. Center Frequency and Spectral Envelope Variations  •  The spectral envelope of the synthetic tones was shaped according to a band-pass filter (BPF) specification comprising a passband (6 dB attenuation at each edge), flanked either side by a transition band (26 dB attenuation at its edge beyond which attenuation was infinite). Tones comprised one of three possible center frequency ranges for the passband depending on the Level: “low” which spanned a range of 2 to 5 × the highest F0 of the

tones to be presented within the training run; “mid” (6 to 14 × highest F0); and “high” (10 to 31 × highest F0). Across each training run, the band-pass center frequency was randomly chosen from its assigned range, but was fixed across stimuli within a run. One of two pass-transition bandwidth profiles was also assigned to each Level: “narrow” (pass-transition width = 2-1, 3-1, 7-1 × highest F0 for the low, mid, and high center frequency ranges, respectively) and “wide” (4-2, 6-2, 12-2 × highest F0 for the low, mid, and high center frequency ranges, respectively). For tasks involving both pitch and spectral timbre variations, each attribute was individually assigned according to the rules described above. Temporal Envelope and Intensity Variations •  For all Stages, the overall duration of tones was 500 ms. The temporal envelope applied to the synthetic tones varied across Levels, but was fixed within each Level. Four different envelopes were used: “uniform” (fast 30 ms linear fade-in and fast 30 ms fade-out); “swell-up” (slow linear fade-in over the duration of tone with a fast 30 ms fade-out); “fade-out” (fast 30 ms linear fade-in with slow fade-out over the remainder of the tone); and “transient-sustained” (Attack-Decay-Sustain-Release envelope with 15 ms attack time, followed by a decay to a level of 30% of the maximum over 60 ms and ending with a 30 ms release). In early Levels of each Stage, all tones were RMS-level balanced; however, in later Levels, up to 6 dB of level roving was applied to each presented tone.
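The synthesis described above (sine-phase harmonics of F0, weighted by a band-pass profile and shaped by a temporal envelope) can be illustrated with a minimal sketch. It is not the authors' software. Several details are assumptions made only for illustration: the attenuation is interpolated linearly in dB between the stated 0, −6, and −26 dB points; the transition width is taken as the width of each flanking band; the quantization grid is referenced to A4 = 440 Hz; the sampling rate is 44.1 kHz; and all function names are hypothetical. Only the "uniform" envelope is shown.

```python
import math

def quantize_f0(f0_hz, steps_per_semitone=16, ref_hz=440.0):
    """Snap F0 to the nearest 1/16-semitone step (grid anchored at A4 = 440 Hz, an assumption)."""
    steps = round(12.0 * steps_per_semitone * math.log2(f0_hz / ref_hz))
    return ref_hz * 2.0 ** (steps / (12.0 * steps_per_semitone))

def bpf_gain_db(freq_hz, cf_hz, pass_width_hz, trans_width_hz):
    """Band-pass weight in dB: -6 dB at the passband edge, -26 dB at the
    transition-band edge, infinite attenuation beyond.
    The linear-in-dB slopes between those points are an assumption."""
    offset = abs(freq_hz - cf_hz)
    half_pass = pass_width_hz / 2.0
    if offset <= half_pass:
        return -6.0 * offset / half_pass
    if offset <= half_pass + trans_width_hz:
        return -6.0 - 20.0 * (offset - half_pass) / trans_width_hz
    return -math.inf

def synth_tone(f0_hz, cf_hz, pass_width_hz, trans_width_hz,
               dur_s=0.5, fs=44100, fade_s=0.03):
    """Additive synthesis of a complex harmonic tone with the 'uniform' envelope."""
    n = int(round(dur_s * fs))
    n_fade = int(round(fade_s * fs))
    # Keep only harmonics that fall inside the pass or transition bands.
    partials = []
    k = 1
    while k * f0_hz < fs / 2:
        g_db = bpf_gain_db(k * f0_hz, cf_hz, pass_width_hz, trans_width_hz)
        if g_db > -math.inf:
            partials.append((k * f0_hz, 10.0 ** (g_db / 20.0)))
        k += 1
    samples = []
    for i in range(n):
        t = i / fs
        # Sine phase: every harmonic starts at zero phase.
        s = sum(a * math.sin(2.0 * math.pi * f * t) for f, a in partials)
        # 30 ms linear fade-in and fade-out.
        env = min(1.0, i / n_fade, (n - 1 - i) / n_fade)
        samples.append(env * s)
    return samples

# Example: 150 Hz tone, 300 Hz center frequency, 300 Hz passband.
tone = synth_tone(quantize_f0(150.0), 300.0, 300.0, 300.0)
```

RMS balancing and level roving are omitted here; under the same assumptions they would amount to scaling `samples` by a constant per tone.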

Musical Instrument and Sung-Vowel Stimuli A prerecorded database of musical instrument sounds (Goto et al. 2003) was used to derive stimuli for Stages 2b and 2c. Samples from three musical instruments, trombone, tenor saxophone,


and piano, and from male and female rhythm and blues vocalists singing vowels /a/, /i/, and /u/ were used. All samples in the database were mono-WAV files recorded with 16 bits of precision and a 44.1 kHz sampling rate. F0 values of the samples were spaced by one semitone on the Western musical scale from F2 to B4. For intervals less than one semitone, a pitch-shifting routine was used in the training program to quantize F0 to steps of 1/8th of a semitone (double the step size used for synthetic tones). Instrument (or vowel) was fixed within each Level but varied across Levels. In Stage 2b, discrimination of both F0 and the instrument was required, whereas in Stage 2c, F0 discrimination was trained only for stimuli that differed in instrument type.

Adaptation of Tone Intervals The cue interval between stimuli (e.g., F0 for pitch) for each Level was initialized to approximately four times each subject’s discrimination threshold (measured before the training). It was adapted separately for each Level based on a trainee’s performance in that Level. Performance was measured by estimating the dissimilarity index: d-prime (d') for the stimuli used in the training run (Macmillan & Creelman 1991). If the Level was completed successfully and d' was greater than or equal to 2.0 (indicating that the trainee could discriminate the stimuli well), the interval size was reduced. If d' was less than two, the interval was usually not adjusted. Increases in the interval size were avoided unless a Level could not be completed after 20 attempts, and d' < 0.5. Intervals were not increased beyond eight times the subject’s measured difference limen (DL). The largest F0 interval permitted was 12 semitones and the smallest was 1/16th for synthetic tones or 1/8th for recorded instrument stimuli. For synthetic tones, the largest center frequency interval permitted was 24 semitones and the smallest was 1/8th.
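The adaptation rule described above can be summarized in a short sketch. The thresholds (d' of 2.0 and 0.5, 20 failed attempts, the initial four-times-DL interval, the eight-times-DL ceiling, and the interval limits) come from the text. The size of each adjustment is not stated, so the halving and doubling factor used here is purely an assumption, as are the function and parameter names.

```python
def initial_interval(dl_st, min_st=1 / 16, max_st=12.0):
    """Start at about four times the pretraining difference limen (in semitones)."""
    return min(max_st, max(min_st, 4.0 * dl_st))

def adapt_interval(interval_st, d_prime, completed, failed_attempts,
                   dl_st, min_st=1 / 16, max_st=12.0, factor=0.5):
    """Return the cue interval (semitones) to use for the next run of a Level.

    factor is a hypothetical step size; the source does not give one.
    """
    ceiling = min(max_st, 8.0 * dl_st)  # never exceed eight times the DL
    if completed and d_prime >= 2.0:
        # Stimuli were well discriminated: make the task harder.
        return max(min_st, interval_st * factor)
    if not completed and failed_attempts >= 20 and d_prime < 0.5:
        # Repeated failure with near-chance sensitivity: make the task easier.
        return min(ceiling, interval_st / factor)
    # Otherwise the interval is left unchanged.
    return interval_st
```

For recorded instrument stimuli the same sketch would apply with `min_st=1/8`, and for center frequency training with `max_st=24.0` and `min_st=1/8`, following the limits given above.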

Subjects Ten adult CI users participated in the study. The subjects were split into two groups of five trainees (TS) and five controls (CS), with the groups balanced in terms of prior musical experience and modality of hearing. Because of the limited pool of subjects, differences in gender, age, years of implantation, and implant/processor type could not be balanced between groups (see Table 1).

Training Schedule and Evaluation Protocol All subjects were initially familiarized with the concept of pitch and spectral timbre in week 1 of the study (see the training/evaluation schedule summarized in Table 2). Familiarization included tasks involving ranking of pitch according to changes in F0 and ranking of spectral timbre according to changes in center frequency of band-pass filtered complex harmonic tones. These tasks were repeated until a plateau in performance was observed so as to reduce the potential impact of task learning on subsequent outcomes. In weeks 2 to 3, formal measurements of F0 and center frequency discrimination thresholds were conducted. The trainees were then issued with the training program on a laptop computer (MS Windows 7) with output provided acoustically through powered loudspeakers (Behringer MS16). They were instructed on how to use the program and were asked to train using their everyday clinical device(s) (i.e., CI alone or CI + HA for bimodal subjects) for approximately 30 min per day or 60 min every two days.

The total duration of training, excluding evaluation sessions, was approximately 4 months (16 weeks). It was divided equally into two training phases (see Table 2). In the first phase, which addressed hypotheses 1 and 2 related to training using single cues, discrimination of F0 was initially trained in Stage 1a (pitch) over a 1-month period (weeks 3 to 6), followed by discrimination of center frequency in Stage 1b (spectral timbre) over a second 1-month period (weeks 7 to 10). After the second month of training (or approximately 2 months after the first evaluation tests for control subjects), all subjects had their second session of evaluation tests (weeks 11 to 12). Upon completion of those tests, the trainees commenced phase 2 of the training, which addressed hypothesis 3 related to training with multiple cues over another 2-month period (weeks 12 to 19). They were instructed to train for approximately 1 month in Stage 2a (combined pitch and spectral timbre) and to spend the final month training in Stages 2b (musical instruments: pitch and timbre) and 2c (musical instruments: pitch across timbre variations). Subjects TS1 and TS2 did not have access to Stage 2c and so they were only able to train with Stages 2a and 2b in phase 2. After the second phase of training (or after the second 2-month period for controls), all subjects again returned for their third session of evaluation tests (weeks 20 to 21). Approximately 3 months after completion of the training (week 32), the trainees were further tested using a subset of tests to examine the effectiveness of the training on long-term outcomes. In addition, they were at that time asked a series of questions related to their experiences when using the training program and the effect that it had on their appreciation of music and listening habits.

Evaluation Tests Stimuli and Procedures •  F0 discrimination was measured using six different tests. Stimuli used in tests 1 and 2 comprised synthetic complex harmonic tones in which only F0 varied. In test 3, a natural sung vowel was used for which formant frequencies remained relatively fixed among stimuli. Tests 4 and 5 comprised sung vowel stimuli with small variation in formant frequencies across F0s, and test 6 comprised two different sung vowels that introduced large differences in formant frequencies. Band-pass center frequency discrimination was measured in test 7 using synthetic complex harmonic tones in which F0 remained fixed and center frequency was varied. Specific details about stimuli used in each test are listed in Table 3. Tests 1 to 3 and test 7 served to examine the effect of training with a single cue (in phase 1) on subjects’ sensitivity to that cue. Tests 4 to 6 examined the effects of training (in phases 1 and 2) on F0 discrimination in complex sounds for which multiple cues varied. The discrimination tests were conducted using a two-interval, two-alternative forced choice (2I-2AFC) procedure and a method of constant stimuli. For F0 discrimination tests 1 and 2, stimulus pairs were centered on a reference (nominal) F0 of 150 Hz and were spaced by 12, 10, 8, 6, 4, 2, 1, 1/2, 1/4, and 1/8 semitones. The band-pass filter applied to the complex harmonic stimuli, defined according to the same definitions used for synthetic tones in the training program, had a center frequency (Cf) of 300 Hz in test 1 and 1200 Hz in test 2 (see Table 3). For both tests, the −6 dB passband width was 300 Hz and the −26 dB transition-band width was 600 Hz. Those tests served to measure F0 discrimination at two different spectral regions, the first involving mainly low-order (1 to 4) harmonics




TABLE 1.  Subject details

| Group | Subject | Gender | Age (year) | Years Implanted | Musical Experience | Modality of Hearing | Cochlear Implant | Clinical Processor | Clinical Strategy |
| Trainee | TS1 | M | 62 | 3 | Prior to and post CI | Bimodal | CI24RE (CA) | CP810 | ACE |
| Trainee | TS2 | F | 66 | 2.5 | Prior to CI | Bimodal | CI24RE (CA) | Freedom | ACE |
| Trainee | TS3 | F | 51 | 2 | None | CI only | CI512 | CP810 | ACE |
| Trainee | TS4 | M | 50 | 8 | None | CI only | CI24RE (CA) | CP810 | ACE |
| Trainee | TS5 | F | 69 | 10 | None | CI only | CI24M | Esprit3G | SPEAK |
| Trainee | Average | | 60 | 5.1 | | | | | |
| Control | CS1 | M | 60 | 7 | Prior to and post CI | Bimodal | CI24RE (CA) | CP810 | ACE |
| Control | CS2 | M | 83 | 8 | Prior to CI | CI only | CI24R (CS) | Freedom | ACE |
| Control | CS3 | M | 85 | 3 | None | Bimodal | CI24RE (CA) | CP810 | ACE |
| Control | CS4 | M | 65 | 12 | None | CI only | CI24M | Freedom | SPEAK |
| Control | CS5 | M | 79 | 11 | None | CI only | CI24R (CS) | Freedom | ACE |
| Control | Average | | 74 | 8.2 | | | | | |

Trainees comprised three female and two male subjects, whereas all control subjects were male. Two subjects in each group had prior musical experience and two were bimodal users of a cochlear implant (CI) and contralateral hearing aid (HA). Subjects used different generations of the Nucleus 24 CI system and either the ACE or SPEAK clinical strategy. Average age and number of years implanted are shown for each subject group. Aided acoustic audiograms for TS2, CS1, and CS3 all demonstrated steeply sloping high frequency loss above approximately 500–1000 Hz, whereas TS1 had more usable acoustic hearing in the frequency range 1000–2000 Hz but very little hearing at low frequencies.

of F0 and the second higher-order (4 to 12) harmonics. The same type of stimuli were used for center frequency discrimination in test 7, although in this case, F0 was fixed at 150 Hz, and center frequency intervals of 16, 14, 12, 10, 8, 6, 4, 2, 1, and 1/2 semitones centered on 300 Hz were used. For tests involving F0 discrimination of sung-vowel stimuli (tests 3 to 6), vowels sung by a male and a female rhythm and blues singer (obtained from the stimulus pool used for training) were used. For most subjects, the F0 of the vowels used ranged from A2 to C3 for the male singer and A3 to C4 for the female singer, although those ranges were expanded for subject CS4 (see Table  3 caption).
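The stimulus pairs described above are specified as semitone intervals centered on a reference frequency, and the sung-vowel ranges as note names. A small sketch of that arithmetic follows. It assumes that "centered" means the geometric center (equal spacing on a log-frequency axis) and that note names refer to twelve-tone equal temperament with A4 = 440 Hz; the source states neither explicitly, although its quoted frequencies (e.g., A2 = 110.0 Hz, C3 = 130.8 Hz) are consistent with the latter. The function names are hypothetical.

```python
import math

# Interval sets quoted in the text, in semitones.
F0_INTERVALS_ST = [12, 10, 8, 6, 4, 2, 1, 1 / 2, 1 / 4, 1 / 8]   # tests 1 and 2
CF_INTERVALS_ST = [16, 14, 12, 10, 8, 6, 4, 2, 1, 1 / 2]         # test 7

def centered_pair(ref_hz, interval_st):
    """Return (low, high) frequencies spaced by interval_st semitones,
    placed symmetrically about ref_hz on a log-frequency axis (assumed)."""
    half = 2.0 ** (interval_st / 24.0)  # half the interval on each side
    return ref_hz / half, ref_hz * half

_SEMITONE = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
             "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def note_to_hz(note, a4_hz=440.0):
    """Convert a note name such as 'A2' or 'G#3' to Hz (equal temperament)."""
    pitch, octave = note[:-1], int(note[-1])
    midi = 12 * (octave + 1) + _SEMITONE[pitch]
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)

# Largest F0 interval in tests 1 and 2: one octave around 150 Hz.
low, high = centered_pair(150.0, F0_INTERVALS_ST[0])
```

Under these assumptions the 12-semitone pair around 150 Hz is approximately 106.07 and 212.13 Hz, and `note_to_hz("A2")` and `note_to_hz("C3")` bracket the male-singer F0 range.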

All synthetic tones and sung vowels were RMS level–balanced and were presented in the sound field at a maximum level of 65 dB SPL (measured at the subject’s CI device) and roved in level by up to −6 dB to reduce any systematic effects of loudness on judgment of pitch or spectral timbre. The duration of each tone was 500 ms and the interstimulus interval was 300 ms. Before each test, the smallest discrimination interval was determined for which the subject could consistently (at least 7 out of 8 times) rank pitch correctly according to F0, or rank spectral timbre correctly (on a scale of dull to bright) according to center frequency. For tests 1, 2, and 7, five stimulus pairs were

TABLE 2.  Training/evaluation schedule

| Phase | Week | Training/Evaluation | Control Group Tasks | Trainee Group Tasks |
| Familiarization | 1 | Familiarization | Familiarize subjects with pitch, spectral timbre, and F0 and center frequency discrimination tests | Familiarize subjects with pitch, spectral timbre, and F0 and center frequency discrimination tests |
| Familiarization | 2–3 | First evaluation session | Measure F0 and center frequency DLs | Measure F0 and center frequency DLs; issue training program and instruct on its usage |
| Phase 1 | 3–10 | Training phase 1 | NA | Phase 1 self-administered training: Stage 1a (pitch); Stage 1b (spectral timbre) |
| Phase 1 | 11–12 | Second evaluation session | Remeasure F0 and center frequency DLs | Remeasure F0 and center frequency DLs |
| Phase 2 | 12–19 | Training phase 2 | NA | Phase 2 self-administered training: Stage 2a (combined pitch/timbre); Stages 2b and 2c (instruments) |
| Phase 2 | 20–21 | Third evaluation session | Remeasure F0 and center frequency DLs | Remeasure F0 and center frequency DLs |
| Extended post training | 32 | Fourth evaluation session | NA | Remeasure center frequency and some F0 DLs |

The study comprised a pretraining/familiarization phase, training phase 1, training phase 2, and an extended post-training phase. Evaluation sessions took place at the end of each phase. Listed in the table is the timeline for each of those phases in weeks and the tasks to be carried out for each subject group. DL, difference limen; F0, fundamental frequency; NA, not applicable.


TABLE 3.  Summary of stimuli used for discrimination tests 1–7

| Test | Discrimination | Stimuli | Varied Parameters | Fixed Parameters |
| 1 | F0 (fixed place) | Synthetic complex harmonic tone | F0: 150 Hz nominal | BPF Cf = 300 Hz; −6 dB Bandwidth = 300 Hz |
| 2 | F0 (fixed place) | Synthetic complex harmonic tone | F0: 150 Hz nominal | BPF Cf = 1200 Hz; −6 dB Bandwidth = 300 Hz |
| 3 | F0 (relatively fixed place) | Sung vowel | F0: A2 to C3; F1: 0.23 Els (59 Hz); F2: 0.25 Els (65 Hz) | Male vowel /a/; Av F1: 570 Hz; Av F2: 1134 Hz |
| 4 | F0 (small place variations) | Sung vowel | F0: A2 to C3; F1: 0.51 Els (68 Hz); F2: 0.86 Els (186 Hz) | Male vowel /i/; Av F1: 402 Hz; Av F2: 1698 Hz |
| 5 | F0 (small place variations) | Sung vowel | F0: A3 to C4; F1: 0.48 Els (99 Hz); F2: 0.47 Els (162 Hz) | Female vowel /a/; Av F1: 770 Hz; Av F2: 1314 Hz |
| 6 | F0 (large place variations) | Sung vowels | F0: A2 to C3; F1: 1.76 Els (221 Hz); F2: 3.91 Els (694 Hz) | Male vowels /a/ and /i/ |
| 7 | Center frequency | Synthetic complex harmonic tone | BPF Cf: 300 Hz nominal | F0 = 150 Hz; −6 dB Bandwidth = 300 Hz |

F0 discrimination was measured in tests 1–6 and center frequency discrimination in test 7. Synthetic complex harmonic tones were used in tests 1, 2, and 7, whereas sung vowel stimuli were used in tests 3–6. For F0 discrimination tests 1–3, place information (spectral timbre) remained relatively fixed among stimuli, whereas for tests 4–5, place information varied slightly amongst stimuli, and for test 6, two different vowels were used within the test, which introduced large variations in place amongst stimuli. Stimulus parameters that varied among stimuli and those that were fixed within a test are listed in the last two columns of the table, respectively. For the male sung-vowel tests, F0 varied between A2 and C3 (110.0 to 130.8 Hz) for all subjects except CS4 for whom the range needed to be expanded to G#2 to G#3 (103.8 to 207.7 Hz). For the female sung-vowel test, F0 varied between A3 and C4 (220.0 to 261.6 Hz) for all subjects except CS4 for whom the range was expanded to F#3 to G4 (185.0 to 392.0 Hz). For the sung-vowel stimuli in tests 3–6, the range of variation in mean place of electrical stimulation for the first (F1) and second (F2) formant frequency are listed in units of electrode place (Els) for the Nucleus 24 system in which electrodes are separated by 0.75 mm and in Hz. Average F1 and F2 frequencies for those sung vowels are also listed. BPF Cf, band-pass filter center frequency; F0, fundamental frequency.

then formed, which included that interval and the four next consecutively smaller intervals. For tests 3 to 6, the two stimuli that subjects could consistently rank were used as well as two with equally spaced intervening F0s. For tests 3, 4, and 5, the same vowel (either /a/ or /i/) was presented within each trial, whereas for test 6, vowels /a/ and /i/ were both presented in balanced random order in each trial. Feedback was provided only during the familiarization phase in week 1 and not during evaluation sessions. Within each test run, 10 repetitions of each interval were presented in randomized order (half with the lowest F0, or dullest spectral timbre, presented first). Accordingly, there were 50 trials per run for the synthetic tone tests 1, 2, and 7; 60 trials for the sung-vowel tests 3, 4, and 5; and 120 trials for the mixed sung-vowel test 6. For each test, at least two runs were administered. Short breaks of 2 to 3 min were given to subjects between tests. Response bias–corrected percent-correct scores (Pcmax) (Macmillan & Creelman 1991) were calculated for each interval and used to fit cumulative Gaussian functions using weighted linear regression. Thresholds were determined for a performance criterion of Pcmax = 76% corresponding to d-prime (d') = 1. Standard deviations of the thresholds for each test with each subject were estimated using a bootstrap procedure (Foster & Bischof 1991).

Place Variations Amongst Sung-Vowel Stimuli  •  Differences in formant frequencies across sung-vowel stimuli used in tests 3 to 6 produced variations in place of stimulation that could influence listeners’ judgment of pitch (e.g., Pijl 1997a; Zeng 2002; Vandali et al. 2005, 2012, 2013). For the male sung vowel /a/ in test 3, the mean first (F1) and second (F2) formant frequencies (measured using Praat software, version 5.0.03, www.praat.org) were 570 and 1134 Hz, respectively, and varied by up to

59 and 65 Hz, respectively, across stimuli. The third formant was much lower in intensity than F1 and F2 and varied little in place among stimuli, and so it was ignored in this analysis. Expressed in terms of changes in mean place of electrical stimulation or spectral centroid (Laneau et al. 2004), those variations were small for both F1 and F2 (i.e., less than approximately 0.25 electrode places when processed through a 22-channel Advanced Combinational Encoder (ACE) strategy such as that used by most subjects in the present study; see Table 3) compared with place differences that can be typically discriminated by CI users according to a review by Moore and Carlyon (2005). Those authors reported a median threshold of 1.2 mm or 1.6 electrode places for users of the Nucleus 22 implant system, although thresholds as low as approximately 0.25 mm, or 0.33 electrode places, were reported for the better subjects. Laneau and Wouters (2004) also reported similarly low thresholds ranging from 0.25 to 0.46 mm in a group of four Nucleus 24 CI recipients. As shall be shown later, center frequency (place) discrimination thresholds for the subjects in the present study were found to be close to those of better-performing CI subjects. Given that variations in formant place among stimuli in test 3 were smaller than subjects’ center frequency (place) discrimination thresholds, it is unlikely that those variations influenced their judgment of pitch. For tests 4 to 6, larger variations in mean place of stimulation were included. For the male sung vowel /i/ in test 4, mean F1 and F2 varied by up to 68 and 186 Hz, respectively. The large variation in F2 was due to the singer altering articulation of the vowel /i/ slightly toward /e/ across F0s. For those stimuli, mean place of electrical stimulation varied by as much as approximately 0.8 electrode places. Those variations were greater than the subjects’ center frequency (place) discrimination thresholds



VANDALI ET AL. / EAR & HEARING, VOL. 36, NO. 2, e1–e13

and so may have influenced their judgment of rate pitch. For the female sung vowel /a/ used in test 5, F1 and F2 varied by 99 and 162 Hz, respectively, and place of stimulation varied by approximately 0.5 electrode places across F0s. However, because F0 for those stimuli was approximately one octave higher than for the male vowels, it is likely that variations in place will have influenced judgment of rate pitch by a greater amount than for the male vowel. This is, in part, expected because rate pitch perception is known to be poorer at higher F0s (e.g., McKay et al. 2000; Zeng 2002; Vandali et al. 2012), but also because coding of F0 modulation depth is expected to be shallower because of fewer harmonics falling within each band-pass filter. Through measurements, we observed an average F0 modulation depth for the female vowel of about 5% of the electrical dynamic range compared to about 20% for the male vowels when stimuli were presented directly into the auxiliary input of subjects’ processors. For test 6, the stimuli in each F0 interval consisted of two different vowels (/a/ and /i/), which introduced large changes in formant frequencies across stimuli (221 and 694 Hz or 1.76 and 3.91 electrode places for F1 and F2, respectively). Outcomes of tests 4 and 5 were thus used to measure listeners’ F0 discrimination when accompanied by “small” variations in place (spectral timbre), whereas test 6 examined F0 discrimination for “large” changes in place.
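The threshold criterion given in the procedure (Pcmax = 76% corresponding to d' = 1) can be checked numerically. The sketch below assumes the standard equal-variance Gaussian model for a two-interval, two-alternative task, in which the unbiased proportion correct is Φ(d'/√2); the function name `pc_max_2afc` is hypothetical, and this is an illustration rather than the authors' analysis code.

```python
import math


def pc_max_2afc(d_prime: float) -> float:
    """Unbiased proportion correct for a given d' in a two-interval task.

    Assumes the equal-variance Gaussian model, Pc = Phi(d' / sqrt(2)).
    """
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(d'/sqrt(2)) = 0.5 * (1 + erf(d'/2)).
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))


if __name__ == "__main__":
    print(f"d' = 1 -> Pcmax = {100 * pc_max_2afc(1.0):.1f}%")  # prints 76.0%
```

With d' = 0 the same expression gives chance performance (50%), which is the lower asymptote expected for a two-alternative task.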

RESULTS F0 Discrimination F0DLs expressed as percentage of the reference F0 are plotted in Figure 2. The F0DLs are plotted on a logarithmic scale because they were found to be normally distributed on a logarithmic scale but not on a linear scale (Shapiro–Wilk test of normality, p = 0.852 versus p < 0.001, respectively). Presentation of DLs in this way is also consistent with other studies (e.g., Micheyl et al. 2006; Goldsworthy & Shannon 2014). Results are plotted for tests 1 to 6 for each trainee in the left column, for each control subject in the middle column, and for group averages in the right column. For most of the F0 discrimination tests, a trend of improvement in DLs from pretraining session 1 to post-training session 2 (i.e., the completion of training phase 1) was seen for the trainees but not for the controls. Little difference in DLs was seen for both groups of subjects when comparing results from sessions 2 to 3 (i.e., as a result of phase 2 training). Differences in pretraining F0DLs (i.e., from session 1) between the groups were examined using a one-way repeated-measures analysis of variance (RM ANOVA). For tests in which non-normally distributed data were present, a nonparametric ANOVA on rank data was performed (note that nonparametric analysis was also performed in all subsequent tests in which data were found to be non-normal). For all tests, no significant difference in pretraining DLs between groups was observed, although mean DLs for the control group were generally higher than those of the trainees. It should be noted that this trend was strongly influenced by the results of subject CS4, whose average F0DL on these tests was around 58% (or 8 semitones) compared with an average of around 10% (or 1.6 semitones) for the other four control subjects and around 12% (or 2 semitones) for the trainees. For each test, the effect of training in phase 1 on F0 discrimination was examined by conducting RM ANOVA on log F0DLs


using group (trainees and controls) and session (1 to 2) as the within-subject factors. For all tests, no effect of group was observed. However, a significant effect of session was observed for test 1 (F[1,8] = 23.73; p = 0.001) and test 3 (F[1,8] = 7.89; p = 0.023). Importantly, a significant interaction was observed between group and session for test 1 (F[1,8] = 21.19; p = 0.002), test 2 (F[1,8] = 7.37; p = 0.026), test 3 (F[1,8] = 10.27; p = 0.013), and test 4 (F[1,8] = 16.02; p = 0.004). Post hoc analysis of data using the Bonferroni test was conducted for the interaction between group and session. Outcomes showed that F0DLs for session 2 were significantly lower than those for session 1 for trainees but not controls for all four tests. The effect of further training in phase 2 compared with that obtained after phase 1 was examined using a separate RM ANOVA and log F0DLs from sessions 2 and 3. For all tests, no significant effects of factors, group and session, nor their interaction was observed. Group-averaged F0DLs for trainees in tests 1 to 4 decreased from 9.6% (1.6 semitones) in session 1 to 4.8% (0.8 semitones) and 4.2% (0.7 semitones) in sessions 2 and 3, respectively. In contrast, those for controls remained relatively steady at 18.1, 21.1, and 16.2% across sessions 1 to 3, respectively (or 7.3, 7.7, and 8.0%, respectively, if subject CS4 is excluded). Note, the above RM ANOVAs were repeated excluding CS4, and apart from differences in levels of significance, all reported outcomes remained the same. While no significant effects for tests 5 and 6 were observed after training in phase 1 or 2, trends in the data for those tests suggest that significant improvements may be exhibited in a larger subject pool. For test 5, average F0DLs for controls remained relatively fixed across sessions 1 to 3 (13.3%, 13.0%, and 13.1%) but steadily improved for trainees (17.9%, 13.0%, and 10.4%). 
For test 6, average F0DLs for controls were more variable across sessions (20.1%, 26.5%, and 15.7%) but improved for trainees (9.9%, 6.1%, and 6.1%).
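The semitone equivalents quoted alongside the percentage DLs in this section follow from the logarithmic relation between frequency ratio and musical interval. A minimal sketch, assuming each DL is an upward frequency change expressed as a percentage of the reference (the helper names are hypothetical):

```python
import math


def percent_to_semitones(percent_dl: float) -> float:
    """Convert a DL in percent of the reference frequency to semitones."""
    # One semitone is a frequency ratio of 2**(1/12).
    return 12.0 * math.log2(1.0 + percent_dl / 100.0)


def semitones_to_percent(semitones: float) -> float:
    """Inverse conversion: semitones to percent of the reference frequency."""
    return 100.0 * (2.0 ** (semitones / 12.0) - 1.0)


if __name__ == "__main__":
    # Group-averaged trainee F0DLs for tests 1 to 4 in sessions 1, 2, and 3.
    for dl in (9.6, 4.8, 4.2):
        print(f"{dl:.1f}% -> {percent_to_semitones(dl):.1f} semitones")
```

For reference, one semitone corresponds to a DL of about 5.9% under this definition.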

Center Frequency Discrimination Center frequency DLs are plotted on a logarithmic scale in Figure 3 for each subject and each subject group as a function of session. Group average center frequency DLs for trainees measured in sessions 1, 2, and 3 were 51.1, 31.9, and 42.3%, respectively, and 41.1, 46.0, and 42.8%, respectively, for controls. For all trainees, a trend of improvement in DLs between sessions 1 and 2 (i.e., after training phase 1) was seen. For most trainees, DLs increased across sessions 2 and 3, although mean DLs for session 3 remained lower than those for session 1. For controls, little difference in group average DLs across sessions was observed, although a slight trend of improvement was seen for three of the five subjects. An RM ANOVA on pretraining center frequency DLs revealed no significant difference between trainee and control group performance. The same analysis model used for log F0DLs was applied to the log center frequency DLs. For data from phase 1 of the training, no significant effect of group or session was observed, but the interaction between those two factors was significant (F[1,8] = 12.58; p = 0.008). Similar to the F0DL results for tests 1 to 4, post hoc analysis using the Bonferroni test showed that DLs in session 2 were significantly lower than those from session 1 for trainees but not controls. For data from phase 2 of the training, a significant interaction between group and session was also observed (F[1,8] = 6.44; p = 0.035), but in this case, the post


Fig. 2. F0 discrimination thresholds (F0DLs) for trainees, controls, and each group, plotted as a percentage of reference F0 on a logarithmic scale. Data are plotted for session 1 (pretraining), session 2 (completion of Phase 1 training), session 3 (completion of Phase 2 training), and session 4 (3 months after training had ceased for trainees and tests 2, 3, and 6 only). For group averaged DLs, error bars plot 5% least significant difference (LSD) of means for the effect of session within group and asterisk symbols indicate a statistically significant difference between two sessions within group.

hoc comparison showed that DLs in session 3 were significantly higher than those from session 2 for trainees only. The average center frequency DL of subjects in the present study was approximately 42.5% or around 6 semitones. When the stimuli used in this test are processed through a 22-channel ACE strategy, a change of approximately 6 semitones up, or down, from a center frequency of 300 Hz corresponds to centroid changes of +0.26, or −0.44, electrode places, respectively.

These changes are comparable to the electrode place discrimination thresholds for better-performing CI recipients (Laneau & Wouters 2004; Moore & Carlyon 2005). Furthermore, these data confirm that the center frequency DLs for the subjects in the present study were smaller than changes in formant place that would be produced across the sung-vowel stimuli in tests 4 to 6 but larger than those in test 3 and support the rationale for subdivision of sung-vowel tests described in the methods.
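The comparison with published place thresholds rests on two simple conversions: distance along the array to electrode places, and a semitone shift to frequency. The sketch below assumes the 0.75 mm electrode spacing quoted for the Nucleus array; the function names are hypothetical. It does not reproduce the centroid values reported above, which depend on the ACE filter bank.

```python
ELECTRODE_SPACING_MM = 0.75  # spacing quoted in the text for the Nucleus array


def mm_to_electrode_places(distance_mm: float) -> float:
    """Convert a distance along the electrode array (mm) to electrode places."""
    return distance_mm / ELECTRODE_SPACING_MM


def shift_by_semitones(freq_hz: float, semitones: float) -> float:
    """Frequency reached by moving `semitones` up (+) or down (-) from freq_hz."""
    return freq_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # Published place thresholds quoted in the text, in mm.
    for mm in (1.2, 0.25, 0.46):
        print(f"{mm} mm -> {mm_to_electrode_places(mm):.2f} electrode places")
    # A 6-semitone change up or down from the 300 Hz nominal center frequency.
    print(f"{shift_by_semitones(300.0, 6):.0f} Hz, {shift_by_semitones(300.0, -6):.0f} Hz")
```

On these assumptions, the 0.25 to 0.46 mm thresholds correspond to roughly 0.33 to 0.61 electrode places, which is the range against which the centroid changes of +0.26 and −0.44 electrode places are being compared.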




Fig. 3. Center frequency DLs for trainees, controls, and each group plotted as a percentage of reference center frequency on a logarithmic scale. Axis, symbols, and error bars are the same as those described in Figure 2.

Trainees’ Results After an Absence of Training To test whether the training provided any long-term improvement in trainees’ discrimination thresholds, some of the discrimination tests (2, 3, 6, and 7) were repeated in a fourth session approximately 3 months after completing the training schedule. These data are included in Figures 2 and 3 and were compared with DLs from session 3 using RM ANOVAs on log DLs, with sessions 3 and 4 as the within-subject factor. No significant effect of session was found in any of those tests. The session 4 data were also compared with DLs from session 1, and a significant improvement across sessions was observed for test 2 (F[1,4] = 16.78; p = 0.015) and test 3 (F[1,4] = 11.37; p = 0.028).
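The relation between the reported F statistics and their p values can be illustrated numerically. The sketch below relies on the identity that an F statistic with one numerator degree of freedom equals the square of a t statistic with the same error degrees of freedom, and integrates the t density with Simpson's rule. It is an illustrative check under that assumption, not the analysis software used in the study, and the function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1.0 + x * x / df) ** (-(df + 1) / 2)


def p_value_f_1_df(f_stat: float, df_error: int, steps: int = 2000) -> float:
    """p value for F(1, df_error), computed as a two-sided t test with t = sqrt(F)."""
    t = math.sqrt(f_stat)
    h = t / steps  # steps must be even for composite Simpson's rule
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    area_0_to_t = total * h / 3.0
    # Two-sided tail probability: 2 * (1 - CDF(t)) = 2 * (0.5 - area from 0 to t).
    return 2.0 * (0.5 - area_0_to_t)


if __name__ == "__main__":
    for label, f_stat in (("test 2", 16.78), ("test 3", 11.37)):
        print(f"{label}: F[1,4] = {f_stat} -> p = {p_value_f_1_df(f_stat, 4):.3f}")
```

The same function can be applied to the F[1,8] statistics reported for the phase 1 comparisons by passing `df_error=8`.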

Training Duration Information pertaining to subjects’ participation in the training (e.g., training duration) was extracted from training logs. Those data were used to examine whether there was any correlation between training effort and discrimination improvements. The average number of occasions (days) on which subjects trained over the 16-week period was 55.4 (out of 112 days), and the average duration of training on those occasions was 0.52 hrs. The total training time averaged across the five trainees was 28.8 hrs, with a range of 9 to 45 hrs and a standard deviation of 12.3 hrs. A graph plotting cumulative training durations for each subject and Stage is shown in Figure 4. The correlation between training duration and evaluation test outcomes for each test was examined using a linear regression model. Correlations with the difference in log DLs across sessions were examined for sessions 1 to 2, 2 to 3, and 1 to 3. No significant correlations were found, which is not too surprising given the small sample size (N = 5) and that learning rates are not necessarily the same across subjects, training phases, and test measures. Furthermore, it is possible that subjects’ performance may have saturated in some tests well before training had been completed.
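The summary figures for training effort can be cross-checked with simple arithmetic. This sketch uses only the group means reported above. Because the mean of per-subject products is not in general the product of the means, the result is an approximate consistency check rather than a recomputation from the training logs, and the function names are hypothetical.

```python
def approx_total_hours(mean_days: float, mean_hours_per_day: float) -> float:
    """Approximate mean total training time from the two reported means."""
    return mean_days * mean_hours_per_day


def fraction_of_days_trained(mean_days: float, period_days: int) -> float:
    """Share of the training period on which subjects trained, on average."""
    return mean_days / period_days


if __name__ == "__main__":
    total = approx_total_hours(55.4, 0.52)        # reported total: 28.8 hrs
    share = fraction_of_days_trained(55.4, 112)   # 16 weeks = 112 days
    print(f"approx. total = {total:.1f} hrs; trained on {100 * share:.0f}% of days")
```

On average, then, subjects trained on roughly every second day of the 16-week period.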

Trainees’ Comments At the completion of the study, trainees were asked to comment on whether they had found the experience enjoyable and whether it had changed their appreciation of music and listening habits in any way. Four of the five subjects reported that they enjoyed the training experience and commented that it was more like playing a game than a training task. Those trainees had little prior computer gaming experience. In contrast, TS4, who was the youngest and an experienced user of computers, was accustomed to more sophisticated games. His expectations of game play were higher than those of the other subjects, which suggests that improvements to the game play and graphical esthetics may be needed to appeal to such players. Three of the five trainees commented that their appreciation of music had improved over the course of the training. Some felt that they could hear changes in pitch, follow melody, or attend to individual instruments in music better. However, TS4 and TS5 indicated no change in their appreciation of music post-training. For TS4, this is consistent with his lack of positive appraisal of the program and his low duration of training. Most trainees indicated that their music listening habits had changed as a result of the training and indicated that they were more confident about listening to music, particularly at live performances. One trainee (TS3) said that she now listens to new (unfamiliar) songs and enjoys them. That subject has also taken up playing the piano since completing the training.

Fig. 4. Cumulative training duration for each subject and the group average. Durations are shown in each column for stages 1a, 1b, and 2a and stages 2b and 2c combined. The recommended cumulative training durations for phase 1 and phase 2 are shown by the dashed and dotted lines, respectively.

DISCUSSION Effects of Training Single Cue Training •  For F0 discrimination tests 1 to 3 and center frequency discrimination test 7, discrimination of a single cue in the absence of other cue variations (excluding loudness) was examined. The significant improvements observed in those tests support the first hypothesis that training with single cues of F0 and center frequency can improve listeners’ sensitivity to those cues. The synthetic stimuli used in tests 1, 2, and 7 were similar to those used in the training, although during training, a larger range of F0 and center frequencies, and different temporal and spectral envelopes, were used. In contrast, the stimuli used in tests 3 to 6 were sung vowels. The results from test 3 show that the effects of training generalized to discrimination improvements for more natural musical sounds in the absence of other cue variations. F0 discrimination in the presence of variations in spectral envelope was examined in tests 4 to 6. The outcomes of test 4 supported the second hypothesis that training involving discrimination of single cues can improve listeners’ sensitivity to F0 in complex sounds in which multiple cues vary. Although the results for tests 5 and 6 did not demonstrate a significant effect of training, trends in the data for those tests suggest that significant improvements may be exhibited in a larger subject pool. In addition, F0 values of the stimuli in test 5 were an octave higher than those of test 4 and were beyond the range in which temporal F0 information is well coded and perceived with CIs. Thus, the place variations in test 5 were likely to affect judgment of F0 pitch more than those in test 4, which is consistent with observed results. For test 6, much larger changes in spectral envelope were included among stimuli. The spectral envelope place variations were at least four times larger than those of tests 4 and 5 and exceeded subjects’ center frequency (place) discrimination thresholds measured in test 7. Those larger place variations may exert a stronger effect on pitch. 
In summary, the data show that training with F0-only variations can improve sensitivity to F0 in the presence of spectral envelope variations, but the benefit diminishes as the effect of place variations on judgment of F0 rate pitch increases. Multiple Cue Training  •  Outcomes of the second training phase showed that further training with combined cues did not result in F0DL improvements as compared to tests conducted after single cue training. However, because subjects were only trained in phase 2 for approximately two-thirds of the duration that they did in phase 1, it is possible that greater improvements might have been achieved with additional training. Alternatively, subjects’ performance may have saturated during the first phase of training and so no further improvements were possible irrespective of the type of training provided in phase 2. While the results do not support the third hypothesis that subsequent training with complex sounds in which multiple cues vary may further improve listeners’ ability to discriminate the pitch of complex sounds, they do not explicitly reject it either given the confounding effects of training duration. No Training (Control Subjects) •  Results for the control subjects exhibited no significant changes over time. In the first weeks of the study, care was taken to ensure that all subjects were well trained on the discrimination task so that any improvements in DLs observed in subsequent sessions could not be attributed to task learning. Tests were repeated until a plateau in performance was observed. Given the subsequent stability of control subjects’ DLs over time, that approach appears to have been effective in ensuring that most task learning occurred before commencement of the formal tests.

Comparison of Outcomes to Previous Studies The trainees obtained a greater than twofold improvement (i.e., a halving of %DLs) in tests involving discrimination of F0 for stimuli in which place information was held relatively fixed. Those results are comparable to recently reported outcomes by Goldsworthy and Shannon (2014), who measured the electrical rate discrimination thresholds of six CI recipients using an adaptive 2I-2AFC training procedure with feedback over the course of seven sessions totaling 32 hrs. The group average rate DL improved from 20.7 to 9.1% (or 3.3 to 1.5 semitones) across sessions. For tests in the present study, in which place remained relatively fixed (tests 1 to 3), the group average F0DL improved from 13.2 to 5.0% (or 2.1 to 0.84 semitones) over the course of training. Despite differences in subjects, training method, and mode(s) of stimulation, similar levels of improvement were obtained. While Goldsworthy and Shannon only reported significance of findings for a training duration of 32 hrs, effects of session for some subjects were apparent in the data as early as session 2 or 3. In the present study, significant effects were observed after the first training phase (DLs averaged across group and tests 1 to 3 at the end of phase 1 were 5.65% or 0.95 semitones) in which the average training duration was 17.3 hrs. It is possible that even shorter durations of training can be effective, although that was not tested in the present study. Short-term psychoacoustic training in NH listeners has been shown to provide similar levels of improvement in F0DLs in a much shorter period of time. Micheyl et al. (2006) used a 2I-2AFC procedure with feedback to measure F0DLs of complex harmonic tones across a period of 2.5 to 3 hrs. F0DLs improved by a factor of approximately 3 on average in nonmusicians but only by approximately 1.3 in musicians. 
Although those improvements might be attributable in part to short-term task learning, the authors also demonstrated significant improvements in pure-tone frequency discrimination in which performance for nonmusicians reached that of musicians after 4 to 8 hrs of training conducted over five 2-hr sessions. In the present study, effects of training on DLs were observed when subjects listened to acoustically presented tones through their standard clinical devices, as opposed to via direct electrical stimulation as reported by Goldsworthy and Shannon (2014). This is an encouraging outcome for CI recipients as it demonstrates that the intervention can promote substantial improvements in their ability to process pitch information in the real world without any changes to a subject’s clinical device(s). It is also encouraging that group average absolute F0DLs after training were around one semitone or lower in all tests excluding test 5, for which the F0 of stimuli was close to the upper limit of rate-pitch discrimination in CIs. Those data suggest that many CI recipients have the capacity through training to discriminate the smallest musical intervals of the western musical scale when listening to monophonic music with F0s below a few hundred hertz. In NH listeners, although it has been demonstrated that musicians have better absolute pitch discrimination abilities than nonmusicians, the interaction between timbre and pitch has been found to be similar in these two groups (e.g., Allen & Oxenham 2014). In the present study with CI recipients, outcomes demonstrated a positive effect of training on discrimination of F0 in the presence of small spectral timbre variations that diminished with increased variation. It is not clear, however, from the present study whether the interaction between




pitch and spectral timbre is affected by training. Tests involving F0 discrimination using more controlled (congruent and incongruent) spectral variations than those used in the present study would be needed to answer that question.
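The fold improvements compared in this section can be recomputed from the group averages quoted above. The following is a minimal sketch using only those reported values; the function name `fold_improvement` is hypothetical.

```python
def fold_improvement(dl_before: float, dl_after: float) -> float:
    """Ratio of pre- to post-training DL; values above 1 indicate improvement."""
    return dl_before / dl_after


if __name__ == "__main__":
    reported = {
        "Goldsworthy & Shannon (2014), rate DL": (20.7, 9.1),
        "Present study, tests 1 to 3, end of training": (13.2, 5.0),
        "Present study, tests 1 to 3, end of phase 1": (13.2, 5.65),
    }
    for label, (before, after) in reported.items():
        print(f"{label}: {fold_improvement(before, after):.2f}-fold")
```

All three ratios exceed 2, consistent with the description of a greater than twofold improvement (a halving of %DLs).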

Robustness of Training Effect The long-term effects of training on subjects’ performance were examined by remeasuring F0 DLs (tests 2, 3, and 6) and center frequency DLs (test 7) approximately 3 months after the training schedule was completed. For those four tests, no significant reduction in DLs was observed between results at the end of training phase 2 and 3 months after training had ceased. For F0 discrimination tests 2 and 3, DLs after the no-training period were also significantly better than those before training. These data demonstrate that the significant improvements in DLs obtained during training (in tests 2 and 3) were retained well after training had ceased. For data from test 6, no significant effect of training in any phase had been observed, and so no effect of an absence from training would be expected. For the center frequency test (7), although significant improvements were observed after the first phase of training, those improvements decreased after phase 2 such that they were no longer significantly different from those of the control subjects. The lack of any significant effect after training had ceased is consistent with that which had already been observed at the end of phase 2.

Familiarity and Motivation During the training phases, control subjects did not partake in any form of musical training other than incidental exposure to music during their daily routines (similar use of control subjects was reported in training studies by Gfeller et al. 2000, 2002b). In contrast, trainees became very familiar with the training discrimination tasks (albeit a different task than that used to measure DLs) and may have been more highly motivated to perform well in the evaluation tests during the training because of the additional commitment required. This raises the possibility that some of the benefits of training may have been due to increased familiarity with stimuli and procedures and level of motivation. It is unlikely that familiarity with stimuli would be a factor, because the stimuli used during training were taken from a large data set and were very rarely the same as those used during testing. Familiarity with procedural aspects and level of motivation are, however, more difficult to control, and further work is needed to better differentiate the contribution of the training program per se from the motivational effects of involvement in the training. There are some indicators in the data that suggest that the effects of motivation were small in the present study. For instance, smaller training benefits were observed in tests 5 and 6 compared to tests 1 to 4, even though initial absolute performance was comparable for all the tests. If motivation were primarily responsible for the training effects, similar improvements might be expected for all 6 tests. A further indication that motivation effects may not have been large is the finding that benefits were retained 3 months after training had ceased, although it is possible that elevated levels of motivation in trainees were maintained for that 3-month period.

Future Research and Recommendations The potential influence of subjects’ familiarity with procedures and their level of motivation on outcomes could be


more accurately controlled by providing control subjects with a placebo program/game (to be used during the training phase) that does not require them to discriminate pitch or timbre. The game task, for instance, may require them to discriminate auditory attributes such as loudness and rhythm instead. Furthermore, after completion of the training and assessment, the protocol could be repeated with trainees and controls alternating their roles. The additional training with stimuli in which multiple cues were varied provided no additional benefit after initial training with single cues in phase 1. However, the present study did not test whether initial training with simultaneous variation in multiple cues might provide similar benefits to those observed using single isolated cues. Nor did it test whether longer training with single or multiple cues might provide benefit compared to the results of phase 1. Further research is needed to address those questions. Most subjects commented that training involving discrimination of F0 and spectral envelope simultaneously was more difficult compared to training with either cue alone. This is not surprising given the interference between pitch and spectral timbre that is experienced by NH (e.g., Pitt 1994; Allen & Oxenham 2014) and CI listeners (Sucher & McDermott 2007; Galvin et al. 2008; Vandali et al. 2012). The increased difficulty may have in part been responsible for some subjects’ lower training duration in the second phase of the study, which may have contributed to the absence of any training effect in that phase. Assuming that training with multiple cues can be effective, that form of training might be improved by reducing the difficulty of the task. 
Possible approaches include (1) limiting the task to pitch discrimination in the presence of varying spectral timbre, (2) using an adaptive training procedure that commences with target tone patterns that differ only in F0 and not center frequency and gradually introduces patterns in which both cues vary, and (3) using more informative feedback during training to promote faster learning.

CONCLUSIONS Significant and robust improvements to discrimination of F0 and, to a lesser extent, center frequency of band-pass filtered complex harmonic tones were observed for CI recipients after training compared with control subjects that did not train. For tones that varied little in place of stimulation, improvements in F0 discrimination (reduction in % DLs) were twofold or greater, even for tones that were unlike those used during training, demonstrating generalization of outcomes to more complex sounds. Significant improvement was also observed for unfamiliar tones that included small variations in spectral timbre. However, for stimuli with F0s close to the upper limit of rate-pitch discrimination in CIs, or those that encompassed large variations in place, effects of training were not significant despite trends of improvement in the data. Group average absolute F0DLs averaged across all F0 tests were around 1 semitone after training compared with almost 2 semitones before training. These results suggest that training can help CI users to better delineate pitch intervals in music and thereby improve their ability to follow melody. For simple music or even complex music where vocal melodies are often monophonic and prominent, these training techniques can offer listeners self-empowering means to improve their perception of pitch


and foster greater confidence to experience music. However, large spectral timbre variations, such as those that occur with changes in instrument type, are still likely to affect perception of melody. Although the results demonstrate in the broader context that training can improve outcomes, further work is needed to assess the extent to which motivational effects can play a role in providing these benefits.

ACKNOWLEDGMENTS This research was supported by the HEARing CRC, established under the Cooperative Research Centres Program, an Australian Government Initiative. In addition, we acknowledge the support of the Bionics Institute and the support it receives from the Victorian Government through its Operational Infrastructure Support Program.

The authors declare no other conflict of interest.

Address for correspondence: Andrew Vandali, The Hearing CRC, 550 Swanston St, Carlton, 3053, Victoria, Australia. E-mail: andrewev@unimelb.edu.au

Received December 9, 2013; accepted August 18, 2014.

REFERENCES
Allen, E. J., & Oxenham, A. J. (2014). Symmetric interactions and interference between pitch and timbre. J Acoust Soc Am, 135, 1371–1379.
Arnoldner, C., Riss, D., Kaider, A., et al. (2008). The intensity-pitch relation revisited: Monopolar versus bipolar cochlear stimulation. Laryngoscope, 118, 1630–1636.
Borchert, E. M., Micheyl, C., Oxenham, A. J. (2011). Perceptual grouping affects pitch judgments across time and frequency. J Exp Psychol Hum Percept Perform, 37, 257–269.
Carlyon, R. P., Lynch, C., Deeks, J. M. (2010). Effect of stimulus level and place of stimulation on temporal pitch perception by cochlear implant users. J Acoust Soc Am, 127, 2997–3008.
Dai, H. (2000). On the relative influence of individual harmonics on pitch judgment. J Acoust Soc Am, 107, 953–959.
Foster, D. H., & Bischof, W. F. (1991). Thresholds from psychometric functions: Superiority of bootstrap to incremental and probit variance estimators. Psychol Bull, 109, 152–159.
Galvin, J. J., III, Fu, Q. J., Nogaki, G. (2007). Melodic contour identification by cochlear implant listeners. Ear Hear, 28, 302–319.
Galvin, J. J., III, Fu, Q. J., Oba, S. (2008). Effect of instrument timbre on melodic contour identification by cochlear implant users. J Acoust Soc Am, 124, EL189–EL195.
Geurts, L., & Wouters, J. (2001). Coding of the fundamental frequency in continuous interleaved sampling processors for cochlear implants. J Acoust Soc Am, 109, 713–726.
Gfeller, K., & Lansing, C. R. (1991). Melodic, rhythmic, and timbral perception of adult cochlear implant users. J Speech Hear Res, 34, 916–920.
Gfeller, K., Witt, S., Stordahl, J., et al. (2000). The effects of training on melody recognition and appraisal by adult cochlear implant recipients. J Acad Rehab Audiol, 33, 115–138.
Gfeller, K., Turner, C., Mehr, M., et al. (2002a). Recognition of familiar melodies by adult cochlear implant recipients and normal-hearing adults. Cochlear Implants Int, 3, 29–53.
Gfeller, K., Witt, S., Adamek, M., et al. (2002b). Effects of training on timbre recognition and appraisal by postlingually deafened cochlear implant recipients. J Am Acad Audiol, 13, 132–145.
Gfeller, K., Olszewski, C., Rychener, M., et al. (2005). Recognition of “real-world” musical excerpts by cochlear implant recipients and normal-hearing adults. Ear Hear, 26, 237–250.
Gfeller, K., Turner, C., Oleson, J., et al. (2007). Accuracy of cochlear implant recipients on pitch perception, melody recognition, and speech reception in noise. Ear Hear, 28, 412–423.
Goldsworthy, R. L., & Shannon, R. V. (2014). Training improves cochlear implant rate discrimination on a psychophysical task. J Acoust Soc Am, 135, 334–341.
Goto, M., Hashiguchi, H., Nishimura, T., Oka, R. (2003). RWC Music Database: Music genre database and musical instrument database. Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR Oct 2003), 229–230.
Iverson, P., Hazan, V., Bannister, K. (2005). Phonetic training with acoustic cue manipulations: A comparison of methods for teaching English /r/-/l/ to Japanese adults. J Acoust Soc Am, 118, 3267–3278.
Jamieson, D. G., & Morosan, D. E. (1986). Training non-native speech contrasts in adults: Acquisition of the English /ð/-/θ/ contrast by francophones. Percept Psychophys, 40, 205–215.
Jamieson, D. G., & Morosan, D. E. (1989). Training new, nonnative speech contrasts: A comparison of the prototype and perceptual fading techniques. Can J Psychol, 43, 88–96.
Laneau, J., & Wouters, J. (2004). Multichannel place pitch sensitivity in cochlear implant recipients. J Assoc Res Otolaryngol, 5, 285–294.
Laneau, J., Wouters, J., Moonen, M. (2004). Relative contributions of temporal and place pitch cues to fundamental frequency discrimination in cochlear implantees. J Acoust Soc Am, 116, 3606–3619.
Leal, M. C., Shin, Y. J., Laborde, M. L., et al. (2003). Music perception in adult cochlear implant recipients. Acta Otolaryngol, 123, 826–835.
Lee, C. Y., & Lee, Y. F. (2010). Perception of musical pitch and lexical tones by Mandarin-speaking musicians. J Acoust Soc Am, 127, 481–490.
Logan, J. S., Lively, S. E., Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. J Acoust Soc Am, 89, 874–886.
Luo, X., Galvin, J. J., Fu, Q. J. (2010). Effects of stimulus duration on amplitude modulation processing with cochlear implants. J Acoust Soc Am, 127, EL23–EL29.
Macmillan, N. A., & Creelman, C. D. (1991). Detection Theory: A User’s Guide. New York, NY: Cambridge University Press.
McCandliss, B. D., Fiez, J. A., Protopapas, A., et al. (2002). Success and failure in teaching the [r]-[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. Cogn Affect Behav Neurosci, 2, 89–108.
McDermott, H. J., & McKay, C. M. (1997). Musical pitch perception with electrical stimulation of the cochlea. J Acoust Soc Am, 101, 1622–1631.
McKay, C. M., McDermott, H. J., Clark, G. M. (1994). Pitch percepts associated with amplitude-modulated current pulse trains in cochlear implantees. J Acoust Soc Am, 96(5 Pt 1), 2664–2673.
McKay, C. M., McDermott, H. J., Carlyon, R. P. (2000). Place and temporal cues in pitch perception: Are they truly independent? ARLO, 1, 25–30.
Micheyl, C., Delhommeau, K., Perrot, X., et al. (2006). Influence of musical and psychoacoustical training on pitch discrimination. Hear Res, 219, 36–47.
Moore, B. C. J., Glasberg, B. R., Peters, R. W. (1985). Relative dominance of individual partials in determining the pitch of complex tones. J Acoust Soc Am, 77, 1853–1860.
Moore, B. C., & Carlyon, R. P. (2005). Perception of pitch by people with cochlear hearing loss and by cochlear implant users. In C. J. Plack & A. J. Oxenham (Eds.), Handbook of Auditory Research: Pitch Perception. New York, NY: Springer-Verlag.
Pijl, S. (1997a). Pulse rate matching by cochlear implant patients: Effects of loudness randomization and electrode position. Ear Hear, 18, 316–325.
Pijl, S. (1997b). Labeling of musical interval size by cochlear implant patients and normally hearing subjects. Ear Hear, 18, 364–372.
Pijl, S., & Schwarz, D. W. (1995). Melody recognition and musical interval perception by deaf subjects stimulated with electrical pulse trains through single cochlear implant electrodes. J Acoust Soc Am, 98(2 Pt 1), 886–895.
Pitt, M. A. (1994). Perception of pitch and timbre by musically trained and untrained listeners. J Exp Psychol Hum Percept Perform, 20, 976–986.
Ritsma, R. J. (1967). Frequencies dominant in the perception of the pitch of complex sounds. J Acoust Soc Am, 42, 191–198.
Russo, F. A., & Thompson, W. F. (2005). An interval-size illusion: The influence of timbre on the perceived size of melodic intervals. Percept Psychophys, 67(4), 559–568.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychol Sci, 3, 207–217.
Singh, P. G., & Hirsh, I. J. (1992). Influence of spectral locus and F0 changes on the pitch and timbre of complex tones. J Acoust Soc Am, 92, 2650–2661.
Spiegel, M. F., & Watson, C. S. (1984). Performance on frequency-discrimination tasks by musicians and nonmusicians. J Acoust Soc Am, 76(6), 1690–1695.
Sucher, C. M., & McDermott, H. J. (2007). Pitch ranking of complex tones by normally hearing subjects and cochlear implant users. Hear Res, 230, 80–87.




Vandali, A. E., Sucher, C., Tsang, D. J., et al. (2005). Pitch ranking ability of cochlear implant recipients: A comparison of sound-processing strategies. J Acoust Soc Am, 117, 3126–3138.
Vandali, A. E., & van Hoesel, R. J. (2012). Enhancement of temporal cues to pitch in cochlear implants: Effects on pitch ranking. J Acoust Soc Am, 132, 392–402.
Vandali, A., Sly, D., Cowan, R., et al. (2013). Pitch and loudness matching of unmodulated and modulated stimuli in cochlear implantees. Hear Res, 302, 32–49.
Wang, Y., Spence, M. M., Jongman, A., et al. (1999). Training American listeners to perceive Mandarin tones. J Acoust Soc Am, 106, 3649–3658.
Wang, W., Zhou, N., Xu, L. (2011). Musical pitch and lexical tone perception with cochlear implants. Int J Audiol, 50, 270–278.


Warrier, C. M., & Zatorre, R. J. (2002). Influence of tonal context and timbral variation on perception of pitch. Percept Psychophys, 64, 198–207.
Wei, C., Cao, K., Jin, X., et al. (2007). Psychophysical performance and Mandarin tone recognition in noise by cochlear implant users. Ear Hear, 28(2 Suppl), 62S–65S.
Zeng, F. G. (2002). Temporal pitch in electric hearing. Hear Res, 174, 101–106.

REFERENCE NOTE 1. Smith, D. J. (2010). Lexical tone and perceptual training: Implications for children with cochlear implants. PhD thesis, The University of Western Sydney.
