Hearing Research 310 (2014) 76–86


Research paper

Spatial and temporal modifications of multitalker speech can improve speech perception in older adults

Brian Gygi a,b,c,*, Valeriy Shafiro d

a National Institute for Health Research, Nottingham Hearing Biomedical Research Unit, Nottingham, UK
b Otology and Hearing Group, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
c Veterans Affairs Northern California Health Care System, Martinez, CA, USA
d Rush University Medical Center, Chicago, IL, USA


Article history: Received 17 May 2013; received in revised form 23 January 2014; accepted 28 January 2014; available online 13 February 2014.

Speech perception in multitalker environments often requires listeners to divide attention among several concurrent talkers before focusing on one talker with pertinent information. Such attentionally demanding tasks are particularly difficult for older adults, due both to age-related hearing loss (presbycusis) and to general declines in attentional processing and associated cognitive abilities. This study investigated two signal-processing techniques that have been suggested as a means of improving the speech perception accuracy of older adults: time stretching and spatial separation of target talkers. Stimuli in each experiment comprised 2–4 fixed-form utterances in which listeners were asked to consecutively 1) detect concurrently spoken keywords in the beginning of the utterance (divided attention); and 2) identify additional keywords from only one talker at the end of the utterance (selective attention). In Experiment 1, the overall tempo of each utterance was either unaltered or slowed down by 25%; in Experiment 2, the concurrent utterances were either spatially coincident or separated across a 180-degree hemifield. Both manipulations improved performance on both tasks for elderly adults with age-appropriate hearing. Increasing the divided attention load by attending to more concurrent keywords had a marked negative effect on performance of the selective attention task only when the target talker was identified by a keyword, but not when identified by spatial location. These findings suggest that the temporal and spatial modifications improved perception of multitalker speech primarily by reducing competition among the cognitive resources required to perform attentionally demanding tasks. Published by Elsevier B.V.

Keywords: speech perception; stream segregation; multitalker processing; dual-task; coordinate response measure; auditory attention

1. Introduction

1.1. Background

1.1.1. Divided and selective attention in multitalker speech perception

In everyday communication environments, listeners may often need to attend to the speech of several concurrent talkers at once before focusing on the speech of a single talker that contains pertinent information. Such situations can be described as "dual task", since listeners need to divide attention among several talkers before selectively focusing on a single one (Brungart et al., 2001); this requires both divided and selective attention. Findings of

* Corresponding author. NIHR, National Biomedical Research Unit in Hearing, 113 The Ropewalk, Nottingham NG1 5DU, UK. Tel.: +44 115 823 2603. E-mail address: [email protected] (B. Gygi). http://dx.doi.org/10.1016/j.heares.2014.01.009 0378-5955/Published by Elsevier B.V.

previous studies suggest that, in young normal-hearing listeners, decrements in speech performance in attentionally demanding tasks with multiple talkers might result from a bottleneck in information processing produced by simultaneous competition for cognitive resources (Navon and Gopher, 1979; Wickens, 1991; Shafiro and Gygi, 2007). This mechanism is sometimes referred to as "resource-limited processing" (Norman and Bobrow, 1975). Most previous speech perception research on this topic has focused on listening tasks in which a single target talker (or, rarely, two or more target talkers) is always known a priori to be present in the stimulus (Yost et al., 1996; Best et al., 2010). This approach does not capture the complexity of many real-life communication environments, in which failure to correctly identify the target talker in the first, divided attention task has negative consequences for the second, selective attention task. When only a single performance measure, proportion of words correct, or p(c), is used, the contributions of each task, with its associated attentional components, to overall speech intelligibility are difficult to ascertain.


To provide a separate assessment of the divided and selective aspects of speech perception in multitalker environments, some studies have used an explicit dual-task format which separately measures the ability to 1) detect one or more concurrently spoken target keywords in the beginning of the utterance; and then 2) identify additional keywords spoken by one of the talkers in the later part of the utterance (Shafiro and Gygi, 2007; Gygi and Shafiro, 2012; Lee and Humes, 2012; Nelson et al., 1999). These studies found that dividing attention among multiple talkers in order to detect concurrent keywords negatively affected the ability to selectively attend to, and identify, subsequent keywords of a single target talker in the later part of the task (Gygi and Shafiro, 2012; Shafiro and Gygi, 2007). As the divided attention task increased in difficulty (more concurrent keywords to detect), performance on the selective attention task also worsened, although the overall target-to-masker speech ratio remained unchanged. Thus, although divided and selective attention are often assumed to involve separate cognitive processes, this finding suggests that in such rapid online tasks both may draw on a shared pool of processing resources. Further evidence was provided by Gygi and Shafiro (2012), who showed that performance on the earlier, divided attention task was influenced by whether or not the listener needed to perform the later selective attention task: listeners' ability to divide attention among multiple keywords was better when they performed the divided attention task alone than when they had to perform a subsequent selective attention task. A possible confound of this paradigm was the dependence of the selective attention task on the divided attention task: if listeners failed to detect the requisite callsigns, they could not perform the color-number identification task.
To control for this, Gygi and Shafiro (2012) introduced another condition in which the stimulus utterances were spatially separated and the keywords to be identified in the selective attention task were located either on the left or the right. This enabled a "decoupling" of the two tasks: as the difficulty of the first part, concurrent keyword detection, increased, accuracy of keyword identification in the second part was unchanged. This suggests that the processing resources involved in the two tasks can be dissociated depending on the perceptual route to target talker identification, as was also demonstrated by Gallun et al. (2005).

1.1.2. Effects of time-stretching on speech perception

The use of automatic time stretching (TS) of speech has a long history as a potential means of improving speech perception (Di Carlo and Taub, 1972; Luterman et al., 1966a; Korabic et al., 1978; Schmitt and McCroskey, 1981; Schmitt, 1983; Gordon-Salant and Fitzgibbons, 1997; Wingfield et al., 1999; Vaughan et al., 2002; Gordon-Salant et al., 2007; Vaughan et al., 2008; Boyle et al., 2013). Although the exact motivations for using TS speech differed, the rationales were generally based either on "increased clarity", in that slowed-down speech gives greater emphasis to linguistic boundaries (Schmitt, 1983; Picheny et al., 1989; Vaughan et al., 2002), or on "greater efficiency", in that TS would lessen the processing demands on cognitive resources, e.g., by enabling listeners to fully process the contents of an attentional "buffer" before new information arrives, thereby avoiding a cognitive "bottleneck" (Korabic et al., 1978; Gordon-Salant and Fitzgibbons, 1997; Wingfield et al., 1999).
Although some of these studies have found improvements (Schmitt, 1983; McCroskey and Kasten, 1982), a greater number have found no benefit or even a decrement with time stretching (Di Carlo and Taub, 1972; Korabic et al., 1978; Luterman et al., 1966a; Picheny et al., 1989), even as TS technology has become ubiquitous and can be found in many portable listening devices and audio software applications. Various explanations have been offered for the failure to find what would seem to be an intuitive result. An issue in some older studies (e.g., Korabic et al., 1978; Schmitt, 1983) was that


techniques for speech expansion usually produced artifacts or distortions; this is much less of an issue now, since most recent algorithms are able to modify the tempo of target speech with minimal processing artifacts or changes in other voice qualities across a large range of TS rates. Wingfield et al. (1999) suggested that overly extreme expansion (beyond about 140%) might stretch syllabic durations beyond the size of a "perceptual window" (a syllable-size acoustic-perceptual buffer). The frequent lack of intelligibility benefit from TS speech in previous studies may also have resulted from methodological constraints. Most of the studies listed used perceptual tasks with low informational load: for example, target speech was always present on each trial (unlike the studies described here, which used catch trials), so that relatively few demands were placed on working memory. On the other hand, the benefits of TS may result from interactions with cognitive factors that can lead to improved speech perception, i.e. resource-limited perceptual processes (Gordon-Salant and Fitzgibbons, 1997; Nejime and Moore, 1998), rather than from improving the signal-to-noise ratio (SNR) or other sensory enhancements that surmount a data-limited process. In support of this, Gygi and Shafiro (2012) recently found that speech perception benefits of TS speech may occur primarily when the speech task imposes a high attentional load, significantly taxing working memory. The researchers tested young normal-hearing (YNH) adults on the dual speech perception task described above. In addition, utterances either had a regular tempo or were time-stretched to effect a 25% tempo reduction (i.e. the duration was increased by 25%). The results revealed that YNH listeners performed better on both divided and selective attention tasks with TS stimuli in nearly all conditions, comprising different combinations of numbers of talkers and target callsigns.
This improvement was observed even though TS did not improve the SNR of the target keywords in either divided or selective attention conditions.

1.1.3. Aging, speech perception and cognitive abilities

A growing body of research has demonstrated a link between speech perception problems and age-related decline in cognitive abilities (Pichora-Fuller, 2003; Humes, 2008). With the onset of age-related hearing loss (i.e. presbycusis), the role of attention and working memory in speech perception in noise may become even more important as the quality of peripheral signal encoding at the level of the cochlea and the auditory nerve deteriorates (Salthouse and Babcock, 1991; Humes and Floyd, 2005; Akeroyd, 2008). The relationship between divided and selective attention in the context of working memory was investigated by Humes et al. (2006) for young and elderly listeners on a task using concurrently presented speech. On nearly all of the attentional measures, the elderly listeners performed more poorly than the younger listeners, and the difficulties the elderly had with speech-on-speech masking persisted even when the stimuli were equated for audibility. For the older adults, performance on both attentional tasks was correlated with individual differences in working memory, based on the Digit Span test. These findings corroborate earlier results showing that the accuracy of performing attentionally demanding speech perception tasks can be influenced by higher-order cognitive factors such as attentional load and working memory capacity (Salthouse, 1996; Verhaeghen and De Meersman, 1998; Verhaeghen et al., 2003; Helfer and Freyman, 2008).

1.2. Present study

The purpose of the present study was twofold: (a) to investigate the performance of older adults without significant hearing loss on the dual-listening task under varying attentional load; and (b) to determine whether older adults would show perceptual



benefits of TS similar to those of young adults. If the benefits of TS speech are primarily mediated by cognitive abilities that are known to decline with age, such as attention and working memory capacity (Verhaeghen et al., 2003), older listeners may not derive speech perception benefits from TS similar to those of young listeners, due to limitations in available cognitive resources. Alternatively, if TS can alleviate the cognitive demands of a speech perception task that taxes attentional and working memory resources, older listeners may achieve the same benefit as YNH adults, or greater. Finally, testing the elderly listeners on spatially separated stimuli will reveal whether older listeners can gain a performance benefit from spatial separation similar to that of younger listeners. Following the design of Gygi and Shafiro (2012), the experiments below examined the role of temporal and spatial factors in the relationship between divided and selective attention in multitalker listening situations for elderly listeners with age-appropriate hearing (EAH). The divided attention task consisted of detecting specific keywords spoken concurrently by multiple talkers, while the selective attention task, performed on the same trial, consisted of identifying additional keywords from a single prespecified talker in the mixture. Experiment 1 tested whether performance of older adults on the divided attention task and the subsequent selective attention task would benefit from TS. Experiment 2 examined how performance on the selective attention component of the dual task was affected by differences in the perceptual route to target talker identification: whether the target talker was identified by a specific keyword or by spatial location.
Based on the previous work with YNH listeners, it was expected that perception of the speech of the target talker in the selective component of the dual task would not be affected by performance on the preceding divided attention task when the target talker was identified by spatial location rather than by a keyword. However, on both tasks, older listeners were expected to perform less accurately than the YNH listeners tested previously.

2. Experiment 1

2.1. Materials and methods

2.1.1. Stimuli and design

Stimuli were constructed from sentences in the Coordinate Response Measure (CRM) corpus (Moore, 1981). All CRM sentences are of the form "Ready (callsign) go to (color) (number) now". There are eight possible callsigns (Baron, Laker, Hopper, Tiger, Ringo, Charlie, Arrow, Eagle), four possible colors (Green, Red, Blue, White) and numbers from "1" to "8", for a total of 256 sentences. In the standardized recorded version of the corpus (Bolia et al., 2000), these sentences are spoken by sixteen different talkers (eight male and eight female), resulting in 2048 individual sentences. The sentences are normalized for root-mean-square (rms) energy, bandpass filtered from 80 Hz to 8 kHz, and have approximately the same duration (see Bolia et al., 2000 for a description of the recording process). Every stimulus contained two to four CRM utterances, with three combinations of talker sex counterbalanced in all conditions (i.e. all male, all female, or mixed). The selection of talker sex was set up to equally represent the different male and female configurations in the total number of stimuli for each condition. Half of the stimuli were composed of sentences that contained specific target callsigns (designated as "Target" stimuli; either "Baron" alone, or "Baron" and "Hopper", or "Baron", "Hopper" and "Tiger"), and half of the stimuli were foils that contained only a subset of all possible target callsigns (designated as "No Target" stimuli to the listeners).
Overall, there were three possible numbers of talkers, and three combinations of target callsigns, for a total of seven stimulus configurations. Within a given number of talkers, the attentional load

increased with the greater number of target callsigns to attend to. These conditions are outlined in Table 1, with the number of callsign targets designated by a capital "T" and the number of talkers in the stimulus designated by a lower-case "t". (Note: a condition with three target callsigns and four talkers was not used; in pilot studies this led to near-chance performance.) The experimental design can be described by a 7 × 2 matrix, with seven stimulus configurations and two tempo conditions: Regular Tempo or Time Stretched. The unprocessed CRM sentences constituted the Regular Tempo (RT) stimuli. The Time Stretched (TS) stimuli were CRM sentences that were slowed down by 25% using the Pitch Synchronous Overlap Add method (PSOLA) implemented in the Praat (version 5.0.21) speech analysis and synthesis package. This method increases the duration of a speech utterance by dividing it into several small overlapping segments and moving the segments further apart in time. The segments are then interpolated and summed using the overlap-and-add technique. This lengthens the signal but retains all other qualities of the original speech utterance. Time stretching was applied to each utterance in the stimulus before mixing. The 25% tempo reduction was chosen as the largest tempo change that did not produce noticeable signal artifacts in the utterances. Divided and selective attention were assessed for each stimulus using two dependent variables (Shafiro and Gygi, 2007; Gygi and Shafiro, 2012; Lee and Humes, 2012). The first, a measure of divided attention, was the sensitivity to the callsign target words (d′). It was based on the ability to distinguish trials that contained one or more target callsigns from those that did not. As described above, the divided attention load was manipulated by changing the number of callsign target words listeners had to detect in a stimulus from one to three.
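Praat's PSOLA aligns the analysis segments to pitch periods, which is what keeps artifacts low. The segment-shifting idea behind it can be illustrated with a simplified, non-pitch-synchronous overlap-add sketch (an illustration of the principle only, not the algorithm actually used to prepare the stimuli):

```python
import numpy as np

def ola_time_stretch(x, stretch=1.25, frame=1024, hop_ratio=0.5):
    """Crude fixed-frame overlap-add time stretch (illustrative only).

    Analysis frames are taken every `hop_in` samples and re-laid every
    `hop_out = hop_in * stretch` samples, so stretch=1.25 slows the
    tempo by 25%. PSOLA additionally aligns frames to pitch periods
    to avoid phase artifacts; this sketch does not.
    """
    hop_in = int(frame * hop_ratio)
    hop_out = int(round(hop_in * stretch))
    win = np.hanning(frame)
    n_frames = max(1, (len(x) - frame) // hop_in + 1)
    out = np.zeros(hop_out * n_frames + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = x[i * hop_in : i * hop_in + frame]
        if len(seg) < frame:
            break
        out[i * hop_out : i * hop_out + frame] += seg * win
        norm[i * hop_out : i * hop_out + frame] += win
    # Divide by the summed window envelope to keep amplitude roughly flat.
    return out / np.maximum(norm, 1e-8)
```

Applied to a one-second signal, the output duration comes out roughly 25% longer, mirroring the tempo reduction used in Experiment 1.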
The second dependent variable, a measure of selective attention, was the overall accuracy of identifying the color and number keywords spoken by a single target talker (always identified by the callsign "Baron"). The stimuli were presented in blocks of 320, which made up one test session. Half of the stimuli in every block were TS. Half of both the RT and TS stimuli were "No Target" stimuli, which did not contain all target callsigns (see below). The No Target stimuli were randomly intermixed with the Target stimuli, which contained all target callsigns. Each session tested one stimulus configuration from Table 1. In the first three test sessions (Stimulus Configuration 1T), listeners were asked to detect only a single callsign ('Baron') in each stimulus: one half of the stimuli contained one talker saying 'Baron' (targets), while the other half did not contain any utterances with this callsign (foils). In the fourth through sixth test sessions (Stimulus Configuration 2T), the target callsigns were 'Baron' and 'Hopper'. Listeners were instructed to identify the color and number spoken by the talker saying 'Baron', but only on those trials when they could detect both 'Baron' and 'Hopper'. One half of the stimuli in those sessions contained both target callsigns, while the other half were foils, which contained one utterance with the callsign 'Baron' but no utterance with the callsign 'Hopper'. In the seventh session (Stimulus Configuration 3T), listeners were again instructed to identify the color and number spoken by the talker saying 'Baron', but to do so only when they could hear all three callsigns: 'Baron', 'Hopper' and 'Tiger'. Foil stimuli in these sessions always contained one utterance with the target callsign 'Baron' and one other utterance with either 'Hopper' or 'Tiger', but not both.

Table 1. Diagram of the stimulus configurations used in these studies.

No. of talkers (t) | 1 target (Baron) | 2 targets (Baron and Hopper) | 3 targets (Baron, Hopper and Tiger)
2                  | 1T2t             | 2T2t                         | n/a
3                  | 1T3t             | 2T3t                         | 3T3t
4                  | 1T4t             | 2T4t                         | 3T4t (not used)



This composition of foil stimuli was implemented to determine whether listeners detected all target callsigns. The interfering utterances in all sessions contained any of the five remaining nontarget callsigns in the corpus.

2.1.2. Listeners and procedure

There were thirteen elderly listeners, whose mean age was 70 years (range: 61–80 years). All had pure-tone thresholds in their better ear of 40 dB HL (ANSI 1969 Hearing Level) or better at 250, 500, 1000, 2000, and 4000 Hz, and 70 dB HL or better at 6 and 8 kHz. The right- and left-ear audiograms of the listeners are shown in Fig. 1. Although they all had some hearing impairment at higher frequencies, based on the ISO 7029 (2000) statistical distribution of hearing thresholds as a function of age, the largest mean deviation from the median threshold across frequencies for any of them was +1.1 SD (i.e. the worst listener had thresholds 1.1 SD higher than the median). Thus, for this study they are designated Elderly with Age-appropriate Hearing (EAH) listeners. All had scores of 25 or higher on the Mini-Mental State Examination (Folstein et al., 1975) and were in good health, with no history of otological disorders. Each of the seven sessions lasted approximately 2 h. The laboratory conditions were designed to replicate those used in Experiment 2 of Gygi and Shafiro (2012). The listeners were tested in soundproof booths. They listened to stimuli presented through Sennheiser 250 HDII headphones. The stimuli were stored as binary wav files, played from a PC through a Gina 24 sound card, and amplified by a TDT System III headphone buffer.


Fig. 2. The two-component multi-talker speech task used in the study. Based on the Coordinate Response Measure corpus of utterances with eight callsigns, four colors and eight numbers, the task allows manipulation of attentional load by varying the number of callsign keywords the listener is asked to detect in the first part of the task, before identifying the color and number words spoken by the talker with the callsign ‘Baron’. If the listener cannot detect all target keywords, she presses the ‘No target’ button and continues to the next trial. In the example, in a 4-talker stimulus, the listener has to detect two target callsigns HOPPER and BARON (filled text), presented with two masking talkers (always of the same sex). If the listener can detect HOPPER and BARON, she will attempt to identify RED THREE spoken by the BARON talker.

The listeners were seated in front of a 17-inch LCD touch screen monitor. Stimuli were presented diotically at 67–73 dB SPL; following the original protocol of Brungart et al. (2001), the overall level of each stimulus was randomly roved over a 6 dB range in 1 dB steps. The participants responded to each stimulus using a graphical user interface that consisted of an 8 × 4 matrix of numbers and color names displayed on the screen, where each row represented a number and each column a color. The numbers were colored according to their appropriate column. The label for the corresponding color was also written next to each number. The participants either touched the screen or used a mouse, according to their preference, to select the color-number response corresponding to that of the target utterance. Stimulus presentation was listener-paced. In the beginning of each session (i.e. a block of 320 stimuli), each participant performed 20 practice trials that were not scored. Prior to each session, participants were informed that half of the stimuli would be foils. The trials were run in a "detection + identification" paradigm. The participants were instructed to make a color-number identification response only if they heard all target callsigns in the stimulus, and to press the button labeled "No Target" otherwise (a diagram of the task is shown in Fig. 2). Sensitivity to target callsigns, measured as d′, was calculated based on 'hits', i.e. when the listener made a color-number identification response (whether correct or incorrect) on a trial that contained all target callsigns for that session, and 'false alarms', when a color-number response was made on a trial that did not contain all target callsigns. The value of d′ was measured as the difference between the z-scores for hits and the z-scores for false alarms for each listener (Macmillan and Creelman, 2005). If a d′ could not be calculated (due to zero false alarms), it was set to 4.0.
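The sensitivity computation just described can be sketched in a few lines (a minimal implementation; the hit-rate correction near 100% is a standard safeguard against an infinite z-score and is an assumption, not stated in the text):

```python
from statistics import NormalDist

def dprime(hits, n_targets, false_alarms, n_foils, ceiling=4.0):
    """d' = z(hit rate) - z(false-alarm rate), after Macmillan and
    Creelman (2005). Following the study, d' is set to `ceiling`
    when it cannot be computed because there were no false alarms."""
    if false_alarms == 0:
        return ceiling
    z = NormalDist().inv_cdf
    # Cap the hit rate just below 1.0 so z() stays finite (assumed safeguard).
    hr = min(hits / n_targets, 1 - 0.5 / n_targets)
    fa = false_alarms / n_foils
    return z(hr) - z(fa)
```

For example, 120 hits on 160 target trials with 16 false alarms on 160 foil trials gives d′ ≈ 1.96, and a session with zero false alarms returns the 4.0 ceiling used in the study.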
The p(c) for color-number identification was calculated as the number of correct color-number identifications divided by the number of hit trials.1 These scores were then transformed into rationalized arcsine units (RAU) to normalize the variance.
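The RAU transform itself is not spelled out in the text; the standard rationalized arcsine formula of Studebaker (1985), which analyses of this kind conventionally use (an assumption about this study's exact computation), is:

```python
import math

def rau(correct, total):
    """Rationalized arcsine transform (Studebaker, 1985).

    Maps a count of correct responses out of `total` trials onto a
    variance-stabilized scale that closely tracks percent correct in
    the mid-range but can dip below 0 or exceed 100 at the extremes.
    """
    t = (math.asin(math.sqrt(correct / (total + 1)))
         + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * t - 23.0
```

For instance, 50 correct out of 100 maps to roughly 50 RAU, while perfect and zero scores map slightly above 100 and below 0, respectively.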

Fig. 1. Right- and left-ear audiometric thresholds for the EAH listeners in both experiments. For three of the listeners, the audiometric data were not preserved (although they fitted the experimental criteria), and are not shown.

1 The number of hits varied across conditions from a mean of 58.9 in 2T3t to 97.9 in 1T2t. As could be expected from the d′ values, the number of hits was higher in conditions with fewer talkers and target callsigns and lower in conditions with more talkers and target callsigns. A low number of hits does increase the possibility of sampling error in the p(c) estimates; however, even in the low hit-rate conditions, the p(c) values follow the linear trend of the higher hit-rate conditions, and in all cases are well above chance. Furthermore, the hit rate was rather consistent across subjects. This suggests that p(c) estimates remain reasonably accurate even in conditions with a reduced number of hits.


Fig. 3. Top panels: d′ values from Experiment 1 of this study (EAH, left graph) and d′ values for YNH listeners from Experiment 2 of Gygi and Shafiro (2012) (right graph). Lower panels: corresponding color-number identification (p(c)) data for EAH listeners from Experiment 1 of this study (left graph) and p(c) data for YNH listeners from Gygi and Shafiro (2012) (right graph). For the EAH data, configurations for which the TS performance was significantly better than the RT performance are marked with an asterisk (*). The whiskers represent 1 standard error in this and all other graphs, unless otherwise noted.

2.2. Results

The effects of TS and attentional load were evaluated using two separate ANOVAs, one based on d′ and the other based on p(c) converted to RAU scores. In each ANOVA, the results for the present EAH listeners were analyzed along with the results for YNH listeners from Gygi and Shafiro (2012) in a mixed-effect design. The effects of number of target talkers and talker sex configuration, well-known factors that have been shown to influence performance (see e.g., Iyer and Brungart, 2005), were not analyzed here because of the extensive previous research on those topics (Brungart et al.,

2006). The mean d′ and p(c) across listeners for each stimulus configuration and TS condition are displayed in Fig. 3, along with the YNH data in the comparable conditions from Gygi and Shafiro (2012). As for the YNH, the EAH listeners displayed a main effect of TS for both d′ and p(c) tasks: F(1, 11) = 27.268, p < .001 and F(1, 11) = 11.6, p = .006, respectively.2 The eta-squared for p(c), a

2 The lower degrees of freedom are because scores were not recorded for one subject in two of the stimulus configurations, and so those data were not included in the ANOVA.


measure of effect size for ANOVAs, was 0.023, a small effect size, whereas for d′ it was 0.083, a medium effect size. However, both of these measures are based on the proportion of the total variance accounted for across all configurations, and the variance accounted for by TS was dwarfed by the variance between stimulus configurations. A better estimate of the effect of TS is given by planned comparisons between the RT and TS conditions for each configuration. All the d′ TS scores were significantly greater than those for RT. For p(c), all the conditions except 2T4t and 3T3t led to significantly higher scores for TS than for RT. Tests of effect size for TS for each configuration showed that Cohen's d ranged from 0.5 to 3.1, all medium-to-large effect sizes. This demonstrates that decreasing speech tempo enabled the EAH listeners to better detect the presence of concurrently spoken keywords from multiple talkers, as well as to more accurately identify specific color-number words of one talker on trials when all callsigns were detected correctly. This pattern of results was remarkably consistent across the EAH listeners. All inter-subject correlations across stimulus configurations were significant, for both the RT and TS conditions and for both the selective and divided attention tasks, with the lowest being 0.54 and the mean being 0.85. This indicates that listeners performed in a similar way across conditions; however, there was a wide range of performance across listeners in all conditions. For instance, in the one-target 2-talker RT condition, the range of p(c) was 27–77%, and the range of d′ values was 2.1–4.0. Nearly all listeners showed improvement with TS on both tasks. Fig. 4 shows the scatterplots of d′ in the RT vs. TS conditions (upper panel), and p(c) in the RT vs. TS conditions (lower panel), both averaged across target/talker combinations. The diagonal iso-performance lines are shown in both plots.
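The per-configuration effect sizes reported above are Cohen's d values. One common formulation for a within-subject TS vs. RT comparison divides the mean difference by the standard deviation of the difference scores; this particular variant is an assumption, since the paper does not state which formulation it used:

```python
import math

def cohens_d_paired(rt_scores, ts_scores):
    """Cohen's d for a paired (within-subject) comparison, computed
    from the per-listener TS minus RT difference scores. One common
    variant; the paper does not specify which formulation it used."""
    diffs = [ts - rt for rt, ts in zip(rt_scores, ts_scores)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample standard deviation of the difference scores.
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / sd
```

Under this formulation, a consistent per-listener TS advantage with little between-listener spread yields the large d values (up to ~3) reported here.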
Every subject had a better p(c) in the TS condition, and only one subject had a worse d′. This shows that the improvement with TS was a generalized effect, not limited to a few listeners or a specific task. The overall benefit of TS for EAH listeners, for both detection and identification, was highest for the two-target callsign conditions, and slightly less for the one-target callsign conditions. However, for identification accuracy only, the effect of TS was reversed for the three-target callsign condition (possibly reflecting random variation at near-floor performance levels). As shown in Fig. 3, the benefit from TS for EAH listeners mirrored that found by Gygi and Shafiro (2012) for YNH listeners. Overall, EAH listeners performed less accurately than YNH listeners in both the divided and selective attention tasks. A mixed ANOVA combining both tasks in all stimulus conditions showed a significant effect of age group, F(1, 20) = 9.032, p = .007. However, the pattern was the same for the two groups, with no significant interaction of age and tempo, F(1, 20) = 0.085, p = .610. Both d′ and p(c) decreased as the number of competing talkers increased, probably due to the increased modulation and informational masking associated with additional nontarget talkers (for a summary of modulation masking see Stone et al., 2011). Performance on both measures also worsened as the number of target callsign keywords increased, even when the total number of talkers in the stimulus was constant. This reflects increased divided attention load, i.e. a change in perceptual accuracy which cannot be accounted for by greater masking. The negative effect of attentional load was consistent with the findings of Shafiro and Gygi (2007), who showed that increasing the divided attention load had a deleterious effect on performance in a subsequent selective attention task, indicating the sharing of resources between the two tasks.
Fig. 4. Scatterplots of the individual d′ (upper panel) and p(c) (lower panel) values in the RT vs. TS conditions of Experiment 1.

However, for the current data, there was a significant interaction between TS and divided attention load for color-number identification accuracy. This is shown in Fig. 5, which plots performance across target callsign conditions (only the one- and two-target conditions were summarized because there was only one three-target condition). For d′, performance decreased in both the RT and TS conditions as the number of target callsigns increased (i.e., as divided attention load increased). For p(c), performance in the RT condition worsened as well, but in the TS condition it was unaffected by the divided attention load. This result is consistent with the notion that TS, and the associated change in speech tempo, may have reduced the overlap in resource allocation between the two tasks.

To determine whether differences in audibility contributed to the considerable range of performance among the EAH listeners, several audiometric measures were calculated: the threshold slope from 0.5 to 8 kHz, the PTA from 0.5 to 8 kHz, and the high-frequency roll-off (i.e., the difference between the 8 kHz and 2 kHz thresholds), in each ear and in the better ear. None of these measures was significantly correlated with d′ or p(c) in any configuration in either the RT or the TS condition. Thus, it appears that audibility did not substantially influence performance.

The general benefit in performance with TS speech could be due to a more efficient allocation of cognitive resources resulting from a greater temporal separation of the divided and selective attention tasks. Effectively, as with the "greater efficiency" models described in the Introduction, TS may allow more time to complete the divided attention task before the selective attention task begins, thus enabling better speech recognition by allowing more time for cognitive processing of information in working memory. The notion of further separating the cognitive resources recruited by the divided and selective attention tasks inspired Experiment 2, which tested the effects of spatial separation on performance in complex multitalker environments.

3. Experiment 2

Another approach to minimizing the cognitive resources shared by the two tasks is through spatial separation, as implemented by Gygi and Shafiro (2012) for YNH adults. When the CRM utterances were spatially separated, the cue for color-number identification was spatial location rather than the characteristics of the talker who produced the callsign 'Baron'. As a result, performance on the selective attention task became independent of performance on the preceding divided attention task: as the difficulty of callsign detection increased due to a greater number of target callsigns, performance on color-number identification did not change. This is consistent with earlier work in which researchers manipulated the allocation of processing resources by altering the tasks (Gallun et al., 2005; Kidd et al., 2005b; Shinn-Cunningham et al., 2005; Gygi and Shafiro, 2012; Ihlefeld and Shinn-Cunningham, 2008). It is possible, however, that as speech perception in multitalker environments becomes considerably more effortful with aging, spatial separation is not as effective in separating the processing demands of the divided and selective attention tasks (Marrone et al., 2008). In this case, older listeners' keyword identification in the selective attention task would still be affected by the divided attention load of the earlier task. Alternatively, the separation of cognitive resources for the two tasks afforded by spatial separation may be robust even for older listeners, in which case the elderly would experience a benefit from the spatially separated stimuli similar to that of the YNH listeners tested by Gygi and Shafiro (2012). These competing hypotheses were tested in Experiment 2.
Fig. 5. Mean d′ (top panel) and p(c) (bottom panel) in the RT and TS conditions for EAH listeners across the number of target callsigns in Experiment 1, showing the effects of increased divided attention load.

3.1. Materials and methods
3.1.1. Stimuli and design

The stimuli were composed of sentences drawn from the CRM corpus and were configured as in Experiment 1: combinations of utterances by 2–4 talkers using 1–3 target callsigns. The major differences were that the utterances were all separated in virtual space, and no TS was applied. The spatial arrangement of concurrent utterances in Experiment 2, arranged by number of talkers, is shown in Table 2. Each CRM sentence in a stimulus was spatially located in azimuth using Head-Related Transfer Functions (HRTFs) from the CIPIC database (Algazi et al., 2001). The assignment of a particular utterance to a spatial location was done randomly.

3.1.2. Listeners and procedure

The listeners were ten elderly persons (six men, four women) who had taken part in Experiment 1. The tasks followed the format of Experiment 1, with some changes due to the spatial separation of the stimuli. On every trial, listeners completed the callsign detection task and then, if they determined that all of the required callsigns were present, identified the color-number combination of the talker located either on the left or the right at 90°. The left/right assignment of the target callsign was initially selected at random for a listener and then alternated by block: for example, the one-target 2-talker trials might start with the target callsign on the right side; in the following block, the one-target 3-talker trials would have the target on the left; then back to the right side for the one-target 4-talker trials, and so on. Listeners were told of the spatial arrangement and the location of the target talker for the color-number pairs during the instructions. Otherwise, the procedure was the same as in Experiment 1.
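Spatializing a mono utterance with measured HRTFs amounts to convolving it with the left- and right-ear head-related impulse responses (HRIRs) for the desired azimuth. The following is a minimal sketch of that step; the synthetic delay-and-gain HRIRs stand in for the measured CIPIC responses and are illustrative placeholders only.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at a virtual azimuth by convolving it with
    the left- and right-ear head-related impulse responses."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)])

fs = 44100
# Crude stand-ins for a measured HRIR pair at +90 deg azimuth: the near ear
# receives the signal first and louder; the far ear delayed and attenuated.
itd = int(0.0007 * fs)                 # ~0.7 ms interaural time difference
hrir_near = np.zeros(64); hrir_near[0] = 1.0
hrir_far = np.zeros(64); hrir_far[itd] = 0.5

utterance = np.random.randn(2000)      # noise burst standing in for a CRM sentence
binaural = spatialize(utterance, hrir_near, hrir_far)   # shape (2, 2063)
```

Each concurrent talker would be rendered this way with the HRIR pair for its own azimuth, and the renderings summed per ear to form the stimulus.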


3.2. Results

Table 2
Spatial arrangement of sentences in Experiment 2 by number of talkers.

2 Talkers: ±90°
3 Talkers: ±90° and 0°
4 Talkers: ±90° and ±30°
The mean d′ and p(c) values, the latter converted to rationalized arcsine units (RAU), were calculated for all listeners and subjected to two separate repeated-measures ANOVAs. As in Experiment 1, there were main effects of the number of talkers for both d′ and p(c) (F(6, 54) = 43.652, p < .001, and F(6, 54) = 32.605, p < .001, respectively). The mean d′ and p(c) in each stimulus configuration are shown in Fig. 6. Overall, the d′ values followed the pattern found in the other experiments: d′ decreased as the number of concurrent target callsigns to be detected increased, and, for a given number of targets, d′ was lower with a greater number of talkers. Comparison of the d′ scores for the RT condition of Experiment 1 with those of Experiment 2 for the listeners who were in both studies (n = 10) showed no main effect of experiment (F(1, 6) = 0.022, p = .887). This indicates that performance on the divided attention task was not affected by spatial separation of the stimuli; apparently, spatially separating the callsigns provides neither a benefit nor a decrement in the divided attention task. The same was found for the young listeners of Gygi and Shafiro (2012), as well as in other work using spatially separated CRM stimuli (Kidd et al., 2005a). For p(c), which involved color-number keyword identification, spatially separating the talkers produced a remarkably different pattern from that for diotic presentation. In general, for a given number of talkers in the stimulus, p(c) did not decrease as the number of target callsigns that had to be detected increased. Planned comparisons showed no significant differences between the one-target 2-talker and two-target 2-talker configurations, between the two-target 3-talker and three-target 3-talker configurations, or between the one-target 4-talker and two-target 4-talker configurations. Somewhat surprisingly, in both the three-target 3-talker and two-target 3-talker conditions p(c) was significantly higher than in the one-target 3-talker case.
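The two dependent measures can be computed directly from response counts. The sketch below uses the standard detection-theory definition of d′ and the rationalized arcsine transform commonly attributed to Studebaker (1985); the paper does not spell out either formula, and the counts here are illustrative.

```python
from statistics import NormalDist
import math

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Detection-theory sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    h = hits / (hits + misses)
    f = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf
    return z(h) - z(f)

def rau(n_correct, n_total):
    """Rationalized arcsine units (Studebaker, 1985) for a proportion correct.
    Maps 0..n_total onto roughly -23..+123, stabilizing variance near 0% and 100%."""
    t = (math.asin(math.sqrt(n_correct / (n_total + 1)))
         + math.asin(math.sqrt((n_correct + 1) / (n_total + 1))))
    return (146.0 / math.pi) * t - 23.0

# Illustrative counts: 45 hits / 5 misses, 5 false alarms / 45 correct rejections
sensitivity = d_prime(45, 5, 5, 45)     # ~2.56
score = rau(16, 32)                     # 50 RAU at 50% correct
```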
Fig. 6. Mean d′ (top panel) and p(c) (bottom panel) values for the spatially separated stimuli from Experiment 2. On the p(c) graph, bars with equivalent numbers of talkers have similar shading and are connected by lines with arrows across target conditions.

There were large inter-subject differences in performance. The range of mean performance across conditions was ~1.9 d′ units from the best to the worst listener for callsign detection, and ~39 percentage points for color-number identification, both larger than in Experiment 1. As in Experiment 1, however, the overall patterns of performance corresponded quite closely across listeners; the mean inter-subject correlations across the stimulus configurations were r = 0.84 for the callsign detection task and r = 0.79 for the color-number identification task, showing good consistency across tasks.

An additional ANOVA based on p(c) scores for the RT conditions of Experiments 1 and 2 revealed a significant main effect of spatial separation, F(1, 6) = 230.10, p < .001. As illustrated in Fig. 7, spatial separation had a positive effect on p(c). In contrast, the d′ scores, which required listeners to attend to up to four separate locations in a 180° auditory field, showed no significant difference between the diotic and spatially separated conditions. A similar pattern of results was previously reported by Nelson et al. (1999), who also found a positive effect of spatial separation for color-number identification but no effect of spatial separation on callsign detection. The finding of differences in the effect of spatial separation on d′ and p(c) supports the notion that the use of spatial cues in the color-number identification task decouples it from the prior detection task and allows processing through separate routes. These and earlier findings of Gallun et al. (2005), which also showed the independence of the two tasks in dichotic situations, suggest that variation in the amount of shared resources in diotic vs. dichotic listening could be responsible for the differences in identification performance between Experiments 1 and 2.

It appears that spatial separation of the talkers had a similar effect on EAH listeners as it did on the YNH listeners tested by Gygi and Shafiro (2012), by enabling the use of distinct processing resources. In this condition, performance of EAH and YNH listeners did not differ significantly. This finding is contrary to some previous work, which often demonstrated some decline in performance for the elderly even in conditions of spatial unmasking (Marrone et al., 2008). Possibly, the more effective allocation of processing resources afforded to EAH listeners by spatially separating the talkers was large enough that their overall performance was comparable with that of the YNH listeners.

Finally, there was a significant main effect of target side for p(c). Mean p(c) was 72.0 for targets located on the right side and 67.0 for targets located on the left side, F(1, 91) = 8.81, p = .0038. This was expected from the results with YNH listeners (Gygi and Shafiro, 2012) and from work by others showing a similar right-side advantage for the CRM stimuli (e.g., Gallun et al., 2007). However, the right-side advantage did not interact with stimulus configuration (better performance for locations on the right side was consistent across configurations) and thus did not influence the overall pattern of results in Experiment 2.

Fig. 7. Comparison of the mean d′ (top panel) and mean p(c) (bottom panel) for EAH listeners in the RT condition presented diotically (Experiment 1) with the corresponding spatially separated conditions (Experiment 2).

4. Discussion

As the difficulty of the divided attention task increased with a greater number of concurrent callsigns to detect, p(c) decreased, even for the same number of talkers in the stimulus. This suggests that performance on the two tasks, in that case, may draw on a shared pool of processing resources or was otherwise affected by a similar set of factors. On the other hand, the lack of a similar relationship between d′ and p(c) when listeners identified the target talker by spatial location suggests that, in that case, the two tasks were performed using separate processing routes. In fact, in the spatially separated condition, the EAH listeners performed almost equivalently to the YNH listeners. Although lower performance of the elderly relative to the young could be expected based on previous work with spatial unmasking (Marrone et al., 2008), this was not the case here.

As predicted, the findings of Experiment 1 were consistent with the conclusion that the positive effects of TS on speech perception are mediated by higher-level cognitive factors related to the attentional load of the task. The EAH listeners, similar to the YNH listeners in the previous study (Gygi and Shafiro, 2012), derived greater overall TS benefits when they had to detect two concurrently spoken words rather than one (Fig. 8), while the lack of benefit in the three-word condition was likely due to near-chance performance. This suggests that decreasing the speech tempo may allow more time for perceptual processing of information stored in working memory, as also follows from previous research showing that reduction of speech presentation rate leads to improvements on memory tasks (Bradlow et al., 1999).

Fig. 8. The differences in TS − RT scores, i.e., the TS benefit, for YNH and EAH listeners for d′ (top panel) and p(c) (bottom panel). Asterisks and crosses indicate significant differences between the TS and RT conditions for the EAH and YNH listeners, respectively.

Although the YNH listeners overall performed better than the EAH listeners, the amount of improvement in speech perception accuracy from TS did not differ significantly between the two groups for either d′ or p(c). This is in spite of the fact that working memory abilities are generally reduced in older compared to younger listeners.³ Thus, the amount of benefit with TS may not depend directly on baseline working memory abilities (at least as measured by the Digit Span test).

³ Forward and backward digit span test data are available for seven of the elderly participants, and while that N is quite low, the elderly still showed significantly shorter memory spans than the YNH adults.

Differences in audiometric measures such as high-frequency audibility between the YNH and EAH groups did not affect the TS benefit for the two groups. Performance on the selective and divided attention tasks also was not correlated with listeners' differences in thresholds between 8 and 2 kHz. However, it should be noted that the CRM utterances have approximately a 28 dB drop in spectrum level between 4 and 8 kHz (estimated with Welch's overlapped-segment-averaging method), which would limit audible speech information at high frequencies even for YNH listeners and, by the same token, minimize the effect of high-frequency hearing loss on performance in the elderly.

A possible mechanism underlying the positive effect of TS on speech perception accuracy is illustrated in Fig. 9, using a general model of speech perception that emphasizes central processing abilities. This model is based on several recent theoretical accounts of speech perception that have received empirical validation (Bradlow et al., 1999; Rönnberg et al., 2008; Boothroyd, 2010). After sensory encoding of the spectral and temporal features of multitalker speech input, partially processed sensory images can be preserved in working memory for a limited time. During this time, long-term memory is searched recursively for the best match with the input. When speech tempo is reduced, information passes through working memory at a slower rate, thus providing additional time for matching the partially processed input in working memory with lexical, syntactic and semantic information in long-term memory. Thus, under adverse listening conditions, reducing speech tempo allows for a greater number of iterations between working memory and long-term memory before the partially processed sensory images in working memory decay. This, in turn, provides an opportunity to find a more accurate lexical match, in the manner illustrated in Fig. 9.
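The tempo reduction itself can be reproduced with standard time-stretching techniques. Below is a deliberately naive overlap-add stretcher, a sketch of the idea only, not the PSOLA implementation the authors used (PSOLA additionally aligns frames to pitch periods to avoid artifacts); the frame and hop sizes are arbitrary illustrative choices.

```python
import numpy as np

def ola_time_stretch(x, rate, frame_len=1024, syn_hop=256):
    """Naive overlap-add time stretch: rate < 1 slows the signal down
    (output duration ~ len(x) / rate) without changing pitch grossly."""
    ana_hop = int(round(syn_hop * rate))        # read input more slowly than we write
    win = np.hanning(frame_len)
    n_frames = max(1, 1 + (len(x) - frame_len) // ana_hop)
    out_len = syn_hop * (n_frames - 1) + frame_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        frame = x[i * ana_hop : i * ana_hop + frame_len]
        if len(frame) < frame_len:
            frame = np.pad(frame, (0, frame_len - len(frame)))
        s = i * syn_hop
        out[s:s + frame_len] += frame * win     # overlap-add windowed frames
        norm[s:s + frame_len] += win
    return out / np.maximum(norm, 1e-8)         # compensate for window overlap

fs = 16000
t = np.arange(fs) / fs                          # 1 s test tone standing in for speech
x = np.sin(2 * np.pi * 220 * t)
y = ola_time_stretch(x, rate=0.75)              # 25% tempo reduction -> ~1.33x longer
```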


Fig. 9. Model of sensory-perceptual-cognitive information processing in speech perception.

5. Conclusions

The present study demonstrates that reducing speech tempo by 25% improves speech perception in EAH listeners in attentionally demanding conditions. The dual divided- and selective-attention experimental task used in this study mimics listening situations in which the target signal is represented by more than one talker, and allows a closer examination of the effects of attentional load independent of masking of the signal. The speech perception benefit of TS extended to both the detection of multiple concurrent keywords from different talkers (divided attention, measured by d′) and the identification of keywords of a single talker in the mixture (selective attention, measured by p(c)). Values of p(c) improved when the talker was identified by spatial location rather than by a specific keyword. In that case, speech perception was not affected by the difficulty of the preceding divided attention task, suggesting that the different perceptual routes to keyword identification involved separate processing resources.

These findings have implications for improving speech perception in real-world communication environments. Reduction of speech tempo using computer algorithms can be implemented in a variety of communication devices such as hearing aids, VoIP and telephone apps. The present study provides an empirical validation of the potential benefits of this approach and specifies conditions under which it may be most effective. Although the particular method for time stretching used here, PSOLA, is not practical for real-time face-to-face communication involving several talkers, these concerns could be addressed in future algorithms tailored to the needs of specific communication environments, instruments and tasks. Some potential applications of TS for improving real-time speech communication have already been suggested (e.g., a recent patent application from Apple for consistent speech processing, Foo and Hughes, 2011; see also Lasecki et al., 2013, and Novak et al., 2013, for other applications). Thus, time stretching may provide a potentially viable approach to augmenting communication in adverse listening environments or with special listener populations such as nonnative speakers or older and hearing-impaired listeners.

Authors' note

Brian Gygi, NIHR Nottingham Hearing Biomedical Research Unit, Nottingham, UK; Otology and Hearing Group, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK; Veterans Affairs Northern California Health Care System, Martinez, CA. Valeriy Shafiro, Communications Disorders and Sciences, Rush University Medical Center, Chicago, IL.

Acknowledgments

We would like to thank Nabeel Rahman and Nicholas Livingston for their assistance with administering the tests, Kara Haupt for conducting the audiometric testing, and the listeners for taking part in the study. This research was supported by a Merit Review Training Grant from the United States Department of Veterans Affairs Research Service, VA File # 06-12-00446.

Disclaimer

This article presents independent research funded by the United States Department of Veterans Affairs and the UK National Institute for Health Research (NIHR). The views expressed are those of the authors and not necessarily those of the Department of Veterans Affairs, the NHS, the NIHR or the Department of Health.


References

Akeroyd, M.A., 2008. Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. Int. J. Audiol. 47, 53–71.
Algazi, V.R., Duda, R.O., Thompson, D.M., Avendano, C., 2001. The CIPIC HRTF database. Paper presented at the 2001 IEEE Workshop on Applications of Signal Processing to Audio and Electroacoustics, New Paltz, NY, Oct. 21–24.
Best, V., Gallun, F.J., Mason, C.R., Kidd, G., Shinn-Cunningham, B.G., 2010. The impact of noise and hearing loss on the processing of simultaneous sentences. Ear Hear. 31, 213.
Bolia, R.S., Nelson, W.T., Ericson, M.A., Simpson, B.D., 2000. A speech corpus for multitalker communications research. J. Acoust. Soc. Am. 107, 1065–1066.
Boothroyd, A., 2010. Adapting to changed hearing: the potential role of formal training. J. Am. Acad. Audiol. 21, 601–611.
Boyle, P.J., Nunn, T.B., O'Connor, A.F., Moore, B.C.J., 2013. STARR: a speech test for evaluation of the effectiveness of auditory prostheses under realistic conditions. Ear Hear. 34, 203–212.
Bradlow, A.R., Nygaard, L.C., Pisoni, D.B., 1999. Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Percept. Psychophys. 61, 206–219.
Brungart, D.S., Chang, P.S., Simpson, B.D., Wang, D., 2006. Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation. J. Acoust. Soc. Am. 120, 4007.
Brungart, D.S., Simpson, B.D., Ericson, M.A., Scott, K.R., 2001. Informational and energetic masking effects in the perception of multiple simultaneous talkers. J. Acoust. Soc. Am. 110, 2527–2538.
Di Carlo, L.M., Taub, H.A., 1972. The influence of compression and expansion on the intelligibility of speech by young and aged aphasic (demonstrated CVA) individuals. J. Commun. Disord. 5, 299–306.
Folstein, M.F., Folstein, S.E., McHugh, P.R., 1975. Mini-Mental State: a practical method for grading the cognitive state of outpatients for the clinician. J. Psychiat. Res. 12, 189–198.
Foo, E.W., Hughes, G.F., 2011. Hearing Assistance System for Providing Consistent Human Speech. US Patent No. 20120215532. U.S. Patent and Trademark Office, Washington, DC.
Gallun, F.J., Mason, C.R., Kidd, G.R.J., 2005. Binaural release from informational masking in a speech identification task. J. Acoust. Soc. Am. 118, 1614.
Gallun, F.J., Mason, C.R., Kidd, G.R.J., 2007. Task-dependent costs in processing two simultaneous auditory stimuli. Percept. Psychophys. 69, 757–771.
Gordon-Salant, S., Fitzgibbons, P.J., 1997. Selected cognitive factors and speech recognition performance among young and elderly listeners. J. Speech Lang. Hear. Res. 40, 423–431.
Gordon-Salant, S., Fitzgibbons, P.J., Friedman, S.A., 2007. Recognition of time-compressed and natural speech with selective temporal enhancements by young and elderly listeners. J. Speech Lang. Hear. Res. 50, 1181–1193.
Gygi, B., Shafiro, V., 2012. Spatial and temporal factors in a multitalker dual listening task. Acta Acust. Uni. Ac. 98, 142–157.
Helfer, K.S., Freyman, R.L., 2008. Aging and speech-on-speech masking. Ear Hear. 29, 87–98.
Humes, L.E., 2008. Aging and speech communication. ASHA Lead. 13, 10–13, 33.
Humes, L.E., Floyd, S.S., 2005. Measures of working memory, sequence learning, and speech recognition in the elderly. J. Speech Lang. Hear. Res. 48, 224–235.
Humes, L.E., Lee, J.H., Coughlin, M.P., 2006. Auditory measures of selective and divided attention in young and older adults using single-talker competition. J. Acoust. Soc. Am. 120, 2926–2937.
Ihlefeld, A., Shinn-Cunningham, B., 2008. Spatial release from energetic and informational masking in a divided speech identification task. J. Acoust. Soc. Am. 123, 4380.
Iyer, N., Brungart, D., 2005. The influence of target-masker similarity in a three-talker diotic listening task. J. Acoust. Soc. Am. 117, 2485.
Kidd, G.R.J., Arbogast, T.L., Mason, C.R., Gallun, F.J., 2005a. The advantage of knowing where to listen. J. Acoust. Soc. Am. 118, 3804–3815.
Kidd, G.R.J., Mason, C.R., Gallun, F.J., 2005b. Combining energetic and informational masking for speech identification. J. Acoust. Soc. Am. 118, 982–992.
Korabic, E.W., Freeman, B.A., Church, G.T., 1978. Intelligibility of time-expanded speech with normally hearing and elderly subjects. Audiol. 17, 159–164.
Lasecki, W.S., Miller, C.D., Bigham, J.P., 2013. Warping time for more effective real-time crowdsourcing. In: Proceedings of the 2013 ACM Annual Conference on Human Factors in Computing Systems, pp. 2033–2036.
Lee, J.H., Humes, L.E., 2012. Effect of fundamental-frequency and sentence-onset differences on speech-identification performance of young and older adults in a competing-talker background. J. Acoust. Soc. Am. 132, 1700–1717.
Luterman, D.M., Welsh, O.L., Melrose, J., 1966. Responses of aged males to time-altered speech stimuli. J. Speech Hear. Res. 9 (2), 226–230.
Macmillan, N.A., Creelman, C.D., 2005. Detection Theory: a User's Guide, second ed. Erlbaum, Mahwah, NJ.
Marrone, N., Mason, C.R., Kidd, G.J.R., 2008. The effects of hearing loss and age on the benefit of spatial separation between multiple talkers in reverberant rooms. J. Acoust. Soc. Am. 124, 3064–3075.
McCroskey, R.L., Kasten, R.N., 1982. Temporal factors and the aging auditory system. Ear Hear. 3, 124–127.
Moore, T.J., 1981. Voice communication jamming research. Paper presented at Aural Communication in Aviation, Neuilly-sur-Seine, France.
Navon, D., Gopher, D., 1979. On the economy of the human-processing system. Psychol. Rev. 86, 214–255.
Nejime, Y., Moore, B.C.J., 1998. Evaluation of the effect of speech-rate slowing on speech intelligibility in noise using a simulation of cochlear hearing loss. J. Acoust. Soc. Am. 103, 572–576.
Nelson, T.W., Bolia, R.S., Ericson, M.A., McKinley, R.L., 1999. Spatial audio displays for speech communication: a comparison of free field and virtual acoustic environments. In: Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting.
Norman, D.A., Bobrow, D.G., 1975. On data-limited and resource-limited processes. Cogn. Psychol. 7, 44–64.
Novak, J.S., Archer, J., Shafiro, V., Kenyon, R., Leigh, J., 2013. On-line audio dilation for human interaction. Proc. Interspeech 2013, 1869–1871.
Picheny, M., Durlach, N., Braida, L., 1989. Speaking clearly for the hard of hearing. III. An attempt to determine the contribution of speaking rate to differences in intelligibility between clear and conversational speech. J. Speech Hear. Res. 32, 600–603.
Pichora-Fuller, M.K., 2003. Cognitive aging and auditory information processing. Int. J. Audiol. 42, 2S26–2S32.
Rönnberg, J., Rudner, M., Foo, C., Lunner, T., 2008. Cognition counts: a working memory system for ease of language understanding (ELU). Int. J. Audiol. 47, 99–105.
Salthouse, T.A., 1996. The processing-speed theory of adult age differences in cognition. Psychol. Rev. 103, 403–428.
Salthouse, T.A., Babcock, R.L., 1991. Decomposing adult age differences in working memory. Dev. Psychol. 27, 763–776.
Schmitt, J.F., 1983. The effects of time compression and time expansion on passage comprehension by elderly listeners. J. Speech Hear. Res. 26, 373–377.
Schmitt, J.F., McCroskey, R.L., 1981. Sentence comprehension in elderly listeners: the effect of rate. J. Gerontol. 36, 441–445.
Shafiro, V., Gygi, B., 2007. Perceiving the speech of multiple concurrent talkers in a combined divided and selective attention task. J. Acoust. Soc. Am. 122, EL229–235.
Sheft, S., Shafiro, V., Lorenzi, C., McMullen, R., Farrell, C., 2012. Effects of age and hearing loss on the relationship between discrimination of stochastic frequency modulation and speech perception. Ear Hear. 33, 709–720.
Shinn-Cunningham, B.G., Ihlefeld, A., Satyavarta, Larson, E., 2005. Bottom-up and top-down influences on spatial unmasking. Acta Acust. Uni. Ac. 91, 967–979.
Stone, M.A., Füllgrabe, C., Mackinnon, R.C., Moore, B.C.J., 2011. The importance for speech intelligibility of random fluctuations in "steady" background noise. J. Acoust. Soc. Am. 130, 2874–2881.
Vaughan, N.E., Furukawa, I., Balasingam, N., Mortz, M., Fausti, S.A., 2002. Time-expanded speech and speech recognition in older adults. J. Rehabil. Res. Dev. 39, 559–566.
Vaughan, N.E., Storzbach, D., Furukawa, I., 2008. Investigation of potential cognitive tests for use with older adults in audiology clinics. J. Am. Acad. Audiol. 19, 533–541.
Verhaeghen, P., De Meersman, L., 1998. Aging and the Stroop effect: a meta-analysis. Psychol. Aging 13, 120–126.
Verhaeghen, P., Steitz, D.W., Sliwinski, M.J., Cerella, J., 2003. Aging and dual-task performance: a meta-analysis. Psychol. Aging 18, 443–460.
Wickens, C.D., 1991. Processing resources and attention. In: Damos, D.L. (Ed.), Multiple-task Performance. Taylor & Francis, London; Washington, D.C., pp. 3–34.
Wingfield, A., Tun, P.A., Koh, C.K., Rosen, M.J., 1999. Regaining lost time: adult aging and the effect of time restoration on recall of time-compressed speech. Psychol. Aging 14 (3), 380–389.
Yost, W.A., Dye, R.H.J., Sheft, S., 1996. A simulated "cocktail party" with up to three sound sources. Percept. Psychophys. 58, 1026–1036.
