Effects of aging on audio-visual speech integration

Aurélie Huyse a) and Jacqueline Leybaert
Université Libre de Bruxelles, Avenue F.D. Roosevelt 50, CP 191, 1050 Brussels, Belgium

Frédéric Berthommier
Gipsa-Lab Grenoble, Domaine Universitaire BP 46, 38402 Saint Martin d'Hères Cedex, France

(Received 27 June 2012; revised 22 April 2014; accepted 14 July 2014)

This study investigated the impact of aging on audio-visual speech integration. A syllable identification task was presented in auditory-only, visual-only, and audio-visual congruent and incongruent conditions. Visual cues were either degraded or unmodified. Stimuli were embedded in stationary noise alternating with modulated noise. Fifteen young adults and 15 older adults participated in this study. Results showed that older adults had preserved lipreading abilities when the visual input was clear but not when it was degraded. The impact of aging on audio-visual integration also depended on the quality of the visual cues. In the visual clear condition, the audio-visual gain was similar in both groups, and analyses in the framework of the fuzzy-logical model of perception confirmed that older adults did not differ from younger adults in their audio-visual integration abilities. In the visual reduction condition, the audio-visual gain was reduced in the older group, but only when the noise was stationary, suggesting that older participants could compensate for the loss of lipreading abilities by using the auditory information available in the valleys of the noise. The fuzzy-logical model of perception confirmed the significant impact of aging on audio-visual integration by showing an increased weight of audition in the older group. © 2014 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4894685]

PACS number(s): 43.71.Lz, 43.71.An [PBN]

J. Acoust. Soc. Am. 136 (4), October 2014; Pages: 1918-1931

I. INTRODUCTION

It is well documented that speech perception is improved when visual speech cues are provided along with the acoustic speech signal, especially in noisy surroundings (Erber, 1969; Grant and Seitz, 2000; MacLeod and Summerfield, 1987). The gain in intelligibility provided by the visual information, called the audio-visual (AV) gain, is due to the fact that during speech perception, auditory and visual cues are merged into a unified percept, a mechanism known as AV integration. The goal of the present study is to investigate whether the integration abilities of older adults are preserved compared to those of younger adults. This question is worth examining since aging is known to affect both hearing and lipreading abilities. Age-related hearing loss (also called presbycusis) is caused by the degeneration of the cochlea, located in the inner ear, and leads to hearing impairments of varying severity, from mild to severe (Gates and Mills, 2005; Liu and Yan, 2007). The deterioration is inevitable and the prevalence is high: 40% of the population older than 65 yr has hearing loss great enough to impair communication (Ries, 1994). The loss of hearing begins in the highest frequencies, leading to major difficulties understanding speech in noisy or reverberant environments. Eventually, the loss progresses to lower frequencies (in the 2-4 kHz range). In that case, speech understanding is impaired in most daily situations, with important psychological and social implications (Ries, 1994).

a) Author to whom correspondence should be addressed. Electronic mail: [email protected]


Besides age-related hearing loss, there is also age-related lipreading loss. Indeed, several studies have shown that normally sighted older adults demonstrate decreased lipreading abilities in tasks involving words and sentences (Cienkowski and Carney, 2002; Dancer et al., 1994; Feld and Sommers, 2009). The prevalence of vision impairments (such as cataract, macular degeneration, diabetic retinopathy, and glaucoma) increases in the elderly (see Legault et al., 2010), with a negative impact on lipreading abilities (Hardick et al., 1970; Johnson and Snell, 1986; Romano and Berlow, 1974). Taken together, these data suggest that older adults may be disadvantaged in face-to-face communication because of age-related hearing loss and the age-related lipreading decrease (Feld and Sommers, 2009). Few studies have compared the AV speech perception of older adults (with normal or near-normal visual acuity) to that of younger adults, and they drew somewhat diverging conclusions. Sommers et al. (2005) found a decreased AV gain in older adults. However, they argued that the poorer AV performance was due to differences in lipreading abilities rather than differences in AV integration per se, since AV gain was similar in both groups when differences in the visual-only condition were taken into account. Other studies showed no age difference in the AV gain (Gordon and Allen, 2009; Tye-Murray et al., 2010), suggesting that aging has no significant impact on the benefit provided by visual speech cues. This holds when the visual input has good clarity. On the other hand, when the visual input is degraded, the AV gain seems to be decreased in older adults compared to younger adults (Gordon and Allen, 2009; Legault et al., 2010).


Beyond the audio-visual gain (AVG), the McGurk effect provides additional information about AV speech integration. The McGurk effect occurs when audition and vision provide incongruent tokens (McGurk and MacDonald, 1976). For example, when presented with a visual velar /ka/ and an auditory bilabial /pa/, normally hearing individuals tend to report the illusory fusion alveo-dental /ta/. McGurk fusions show that visual articulatory cues about place of articulation (lipreading) are integrated into the auditory percept, which is thereby modified. The McGurk fusion effect has been extensively investigated. Studies showed that the ability to fuse auditory and visual information appears very early, around 4 months of age (Burnham and Dodd, 2004; Rosenblum et al., 1997). The McGurk effect then increases with age, particularly between the ages of 6 and 8 yr (Sekiyama and Burnham, 2004). Studies also showed that adults are more sensitive to visual influence than children, due to more effective lipreading abilities (Massaro et al., 1986). In sum, the McGurk fusion effect evolves with chronological age. However, there is little information about how normal-hearing older adults perform with incongruent AV items. To our knowledge, only Cienkowski and Carney (2002) examined this issue specifically. They found that the percentage of fusion responses was not significantly different across age groups and concluded that older adults had preserved AV integration abilities, even though they had poorer auditory performance (lipreading abilities were not evaluated at the syllable level). However, they found a group difference in the response chosen by participants when fusion failed to occur. In that case, older participants produced the response that matched the visual component of the McGurk stimulus (visually based response), whereas younger participants leaned toward the auditory component (auditory-based response). The authors also tested a control group composed of younger adults whose hearing thresholds were shifted with noise to match those of the older adults. Since the control group also favored the visual cue, the authors concluded that when AV integration fails to occur, participants select an alternative response from the modality with the least ambiguous signal. Considering the fact that older adults may have decreased auditory and visual abilities, the goal of the present research is to examine the nature and extent of AV integration in older adults. More specifically, the impact of aging on the AV integration of speech in noise is evaluated by studying the impact of the degradation of the visual speech cue. In the present study, data were collected in a paradigm where auditory cues were degraded by a background noise that was either stationary (ST) or amplitude-modulated (AM). Many studies have shown an increase of speech intelligibility in AM noise, in comparison with ST noise, in adults (Füllgrabe et al., 2006; Nelson et al., 2003) as well as in children (Ziegler et al., 2005). These findings suggest that normal-hearing listeners are able to "listen in the valleys" of the noise, i.e., during the short temporal minima present in AM background noise but absent in stationary noise. Some pieces of the message are extracted in these dips of the modulated masker and are often sufficient to deduce the identity of the spoken syllables. This phenomenon is called "release from masking." The masking release effect is also present in older adults, but its size is significantly smaller than that observed in younger adults (Dubno et al., 2002).

Our paradigm examined AV speech perception in noise by adding a visual speech stimulus to the original masking release paradigm. Stimuli were presented in visual (V), auditory (A), and AV conditions (either congruent or incongruent). During half of the experiment, the visual speech cue was clear (VCL), while it was degraded in the other half (visual reduction, VR) in order to limit participants' lipreading. A similar paradigm was used in a pilot study with normal-hearing young adults. Results showed that visual reduction led to changes in the AV integration pattern: not only did scores in the V and AV modalities decrease, but auditory-based responses in the McGurk trials also increased. The use of visual reduction in a masking release paradigm should shed some light on how AV speech integration changes in older adults compared to younger adults. The second goal of the present study is to examine at which stage of AV processing the impact of visual reduction and of age occurs. We seek to determine whether differences between visual clear and visual reduction and between younger and older adults are due to differences in unisensory auditory and visual performance, or to differences in the weight assigned to these modalities in the integration process itself (Grant, 2002). In other words, if visual reduction effectively leads to an increase in the number of auditory-based responses, is it because audition has more weight than vision in the integration process, or simply because the visual input is less informative than the auditory input? The same question holds for the potential age differences: if older adults perform differently than younger adults, is it because their AV integration is different or because they perform differently in each unisensory modality? These questions have important clinical implications. Indeed, if older adults differ from younger adults in their ability to integrate AV speech, the focus of rehabilitation programs should be put on the training of integration, rather than on the training of auditory and visual abilities (Grant and Seitz, 1998; Schwartz, 2010). One possible approach to disentangle unisensory effects from AV integration effects is to use model predictions of AV recognition to estimate a participant's integration abilities. The "fuzzy-logical model of perception" (FLMP) (Massaro, 1987, 1998) is a model frequently used for that purpose. According to that model, auditory and visual sources of information are evaluated separately before being integrated. The key assumption is that integration is optimal and that all subjects integrate equally well (in an optimal way). Thereby, any differences in the AV output are due to differences in unisensory perception. For the study of subject variability, Schwartz (2010) proposed a variation of the FLMP, the weighted fuzzy-logical model of perception (WFLMP), in which the inputs from audition and vision are weighted. He compared this new model to the standard FLMP using various criteria (root mean square error, Bayesian model selection criterion) and found that the WFLMP fit the data better than the FLMP. The WFLMP provides a meaningful indicator of how much individuals rely on audition and on vision (Schwartz, 2010; see Huyse et al., 2013 for a recent application).


For these reasons, the WFLMP, rather than the FLMP, will be used here.

II. METHOD

A. Participants

Fifteen young adults (mean age = 20.9 yr, SD = 0.67 yr) participated in this study for course credit, and 15 older adults (mean age = 68.3 yr, SD = 0.85 yr) participated on a voluntary basis. All participants reported having normal or corrected-to-normal vision. None of them had any history of hearing disorders or used hearing aids. They had no declared neurological disorder and no history of language disorder. They were prescreened for auditory impairment using pure-tone thresholds for the frequencies 250-4000 Hz. The inclusion criterion was the following: no hearing loss up to 2000 Hz [20 dB hearing level (HL) or better] and no more than mild hearing loss at 4000 Hz (35 dB HL or better). Older participants were also screened for dementia using the Mini Mental State Exam (Folstein et al., 1975). Participants who scored below 24 (of 30) were excluded from further testing. Participants were not paid for their participation. The project was reviewed and approved by the University research ethics board.

B. Stimuli

Stimuli were composed of vowel-consonant-vowel (aCa) syllables with the consonants /p, t, k, s, f, ʃ/ interposed between two /a/ vowels. A male speaker of French was videotaped while saying these syllables. He was filmed from the bottom of the nose to the chin. The production of each stimulus began and ended in a neutral position, with the mouth closed. Videos (QuickTime movie files, 21 by 21 cm) were displayed centered on a 15-in. MacBook Pro laptop on a black background. Three productions of each /aCa/ stimulus were digitally recorded, and the audio tracks were equalized in level. All 18 stimuli (6 syllables × 3 repetitions) were used to create the audio-visual, audio-only, and visual-only trials. Stimuli were delivered through Sennheiser HD 121 Pro headphones at a comfortable presentation level. The congruent AV stimuli consisted of digital audio-video files of the speaker saying and articulating the aCa stimuli. For the auditory-only condition, an image of the speaker, appearing neutral and with the mouth closed, was presented along with the auditory stimulus. For the visual-only condition, the audio was turned off. Finally, incongruent AV syllables (McGurk stimuli) were created by carefully combining audio files with non-corresponding video files and matching their onsets. We used three repetitions of the two following stimuli: the audio /apa/ with the visual /aka/ (fusion /ata/) and the audio /afa/ with the visual /aʃa/ (fusion /asa/).
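The dubbing step itself can be sketched as follows; this is only an illustration of the idea, assuming the moviepy 1.x API, and the file names and onset value are hypothetical stand-ins for the careful manual alignment described above.

```python
# Sketch: build an incongruent (McGurk) item by dubbing an audio /apa/
# onto a video /aka/. File names are hypothetical; assumes moviepy 1.x.
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip("aka_take1.mov").without_audio()   # visual /aka/
audio = AudioFileClip("apa_take1.wav")                   # auditory /apa/

# The stimuli were aligned by matching onsets by hand; a fixed start
# time is only a placeholder for that alignment step.
dubbed = video.set_audio(audio.set_start(0.0))
dubbed.write_videofile("mcgurk_A-apa_V-aka.mp4", codec="libx264",
                       audio_codec="aac")
```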

1. Auditory noise

Each signal was digitized at a 22 050 Hz sampling frequency. Throughout all conditions of the experiment, stimuli were embedded in noise that was either stationary (i.e., unmodulated) or modulated in amplitude.


Modulation in amplitude was achieved using white Gaussian noise low-pass filtered at 500 Hz (WGNf). The expression describing the sine-wave modulator, m(t), was

$m(t) = [1 + \cos(2\pi f_m t)] \times \mathrm{WGN}_f$,  (1)

where the first-order modulation frequency f_m was 8 or 128 Hz. The noise was then added to the signal. The signal-to-noise ratio was fixed at −23 dB (prior to the 500 Hz filtering). This signal-to-noise ratio was determined in a preliminary experiment so as to yield a consonant identification performance of about 30% correct in stationary noise (A condition). The noise was manipulated so as to produce minimal masking in the background valleys.
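As an illustration, the sketch below generates such a masker and mixes it with a speech signal at a fixed SNR. The Butterworth filter order and the RMS-based SNR scaling are assumptions not specified in the text.

```python
# Sketch of the masker of Eq. (1): low-pass white Gaussian noise, either
# stationary or amplitude-modulated at f_m = 8 or 128 Hz, mixed with the
# speech at a fixed SNR.
import numpy as np
from scipy.signal import butter, lfilter

fs = 22050                           # sampling rate (Hz), as in the text

def lowpass_noise(n, cutoff=500.0):
    # White Gaussian noise, low-pass filtered at 500 Hz (WGNf).
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return lfilter(b, a, np.random.randn(n))

def masker(n, fm=None):
    wgn_f = lowpass_noise(n)
    if fm is None:                   # stationary (unmodulated) noise
        return wgn_f
    t = np.arange(n) / fs
    return (1.0 + np.cos(2 * np.pi * fm * t)) * wgn_f   # Eq. (1)

def mix_at_snr(speech, noise, snr_db=-23.0):
    # Scale the noise so that 10*log10(Ps/Pn) equals snr_db, then add it.
    ps = np.mean(speech ** 2)
    pn = np.mean(noise ** 2)
    noise = noise * np.sqrt(ps / (pn * 10 ** (snr_db / 10)))
    return speech + noise
```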

2. Visual reduction

In half of the experiment, the quality of the visual component was reduced by varying the contrast of the video around the mean intensity of the image X, for each RGB color of the image:

$Y = \mathrm{mean}(X) + \dfrac{X - \mathrm{mean}(X)}{R}$.  (2)

The contrast factor R varies at a period of 4 Hz according to the following function:

$R = k \times 10^{\,1 + 0.5\cos[\phi + f(t)]}$,  (3)

where the parameter k is set to 4 and f(t) represents the modulation, with a modulation frequency set at 4 Hz. Since the total duration of the syllable (the mouth movements) exceeded 250 ms, a contrast modulation of 4 Hz always generated periods of masking and periods of unmasking within a single item. The starting phase of the modulation, φ, was randomized in each interval between 0 and 2π. This random phase was added in order to prevent the visual modulation from being synchronized with the auditory modulation. Since φ was random, visual reduction did not affect each sample of a single phoneme equally. In the end, there were 360 items: 180 visual clear stimuli and 180 visually reduced stimuli, i.e., per visual condition, 6 syllables × 3 repetitions × 3 modalities × 3 types of noise (162) + 2 McGurk syllables × 3 repetitions × 3 types of noise (18). Four blocks of "visual clear" items and four blocks of "visual reduction" items were created; visual reduction blocks alternated with visual clear blocks.
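A minimal sketch of this contrast modulation, assuming Eqs. (2) and (3) as written above and frames stored as floating-point RGB arrays:

```python
# Sketch of the contrast reduction of Eqs. (2) and (3): each RGB frame is
# compressed toward its per-channel mean intensity by a factor R that
# oscillates at 4 Hz with a random starting phase.
import numpy as np

def contrast_factor(t, k=4.0, fm_v=4.0, phi=None):
    # Eq. (3); phi is randomized per interval, as in the text.
    if phi is None:
        phi = np.random.uniform(0.0, 2.0 * np.pi)
    return k * 10.0 ** (1.0 + 0.5 * np.cos(phi + 2.0 * np.pi * fm_v * t))

def reduce_frame(frame, r):
    # frame: H x W x 3 float array. Eq. (2), applied per RGB channel:
    # Y = mean(X) + (X - mean(X)) / R, so larger R means less contrast.
    m = frame.mean(axis=(0, 1), keepdims=True)
    return m + (frame - m) / r
```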

C. Procedure

The experiment took place in a dimly lit quiet room. The monitor was positioned at eye level, 70 cm from the participant's head. The session began with a training session composed of one visual clear block followed by one visual reduction block. Participants were asked to identify the syllable and to report it aloud. They were informed of the composition of the stimulus set (but not of the presence of the McGurk stimuli) and had the response options written on a paper in front of them during the training session but not during the experimental session.


In both groups, half the participants began the experimental session with a visual reduction block and the other half began with a visual clear block. Responses were noted by the experimenter. The total duration of the experiment was about 40 min.

III. RESULTS

Speech identification scores (% correct) and subsequently computed scores are presented in Table I. Results in noise modulated at 8 Hz and at 128 Hz were averaged, for clarity and because the masking release effect did not differ significantly between the two modulation rates.

A. Single modality conditions

First, results in the auditory-only modality (A) were analyzed. A repeated-measure analysis of variance (ANOVA) was run with visual condition (visual clear, visual reduction) and noise (stationary, modulated) as within-subject factors and group (young, older) as between-subject factor. Analyses revealed that, as expected, the effect of visual condition was not significant (p = 0.085), nor was any interaction including the visual condition variable. There was a significant main effect of noise [F(1,28) = 923.95, p < 0.0001], with modulated noise leading to higher performance than stationary noise, and a significant main effect of group [F(1,28) = 8.64, p = 0.007], with young adults having higher overall auditory performance than older adults. There was also a significant interaction between noise and group [F(1,28) = 6.64, p = 0.02]. The group effect was further analyzed for each level of the noise factor with separate ANOVAs. Group differences were not significant in ST noise (p = 0.83) but were significant in AM noise (p < 0.0001). These data show that, as expected, performance was better in modulated noise than in stationary noise, and that the size of this masking release effect was larger in the younger group than in the older group (40.28% and 33.98%, respectively, when the visual clear and visual reduction variables are averaged).

TABLE I. Global performance of participants in the auditory-only, visual-only, and audio-visual modalities, in stationary (ST) and modulated (AM) noises.

                           Visual clear                   Visual reduction
                           Young          Older           Young           Older
Auditory-only      ST      42.59 (1.29)   35.19 (3.86)    47.16 (4.28)    38.89 (4.47)
                   AM      85.19 (1.99)   70.00 (2.99)    84.63 (1.15)    72.04 (3.59)
Visual-only        ST      77.41 (2.99)   74.07 (3.33)    44.44 (5.34)    17.04 (3.32)
                   AM      74.26 (2.89)   75.37 (2.61)    48.52 (3.98)    20.37 (3.92)
Audio-visual       ST      93.33 (1.55)   89.63 (2.36)    78.15 (3.45)    60.74 (4.00)
                   AM      97.04 (0.79)   92.59 (1.91)    92.78 (1.67)    80.93 (3.35)
Masking release            42.59 (8.24)   34.81 (8.19)    37.96 (7.47)    33.15 (11.08)
Audio-visual gain  ST      88.30 (2.70)   85.47 (3.08)    59.83 (6.19)    35.29 (5.75)
                   AM      77.56 (6.29)   75.76 (4.90)    47.67 (11.87)   33.39 (7.58)


Second, results in the visual-only modality (V) were analyzed. A repeated-measure ANOVA was run with visual condition (visual clear, visual reduction) and noise (stationary, modulated) as within-subject factors and group (young, older) as between-subject factor. Analyses revealed that, as expected, the effect of noise was not significant (p = 0.177), nor was any interaction involving the noise variable. There was a significant main effect of visual condition [F(1,28) = 367.14, p < 0.0001], with visual clear blocks leading to higher performance than visual reduction blocks, and a significant main effect of group [F(1,28) = 10.92, p = 0.003], with younger adults having higher overall visual performance than older adults. Moreover, there was a significant interaction between visual condition and group [F(1,28) = 35.82, p < 0.0001]. The group effect was further analyzed for each level of the visual condition factor with separate ANOVAs. Analyses showed that when the visual input was clear, group differences were not significant (p = 0.78). However, the performance of older participants was decreased compared to that of younger participants when the visual input was degraded. These data show that, as expected, performance in the visual-only modality was better when the visual input was clear than when it was degraded. Moreover, the impact of visual reduction was larger for the older group than for the younger group.

B. Congruent AV modality

The AVG, i.e., the relative increase in AV speech perception performance due to the addition of visual information to the auditory signal (Sumby and Pollack, 1954), was calculated in both groups. The AVG was computed in both stationary and modulated noises using the following formula:

$\mathrm{AVG} = (AV - A)/(100 - A)$.  (4)

Thus, the AVG measures the gain in performance in the AV modality relative to performance in the auditory-only (A) modality, normalized by the amount by which speech intelligibility could have improved above the A scores. A repeated-measure ANOVA was run with visual condition (visual clear, visual reduction) and noise (stationary, modulated) as within-subject factors and with group as between-subject factor. Analyses revealed a significant main effect of visual condition [F(1,29) = 51.24, p < 0.0001], with visual clear blocks leading to a higher AVG than visual reduction blocks. There was also a main effect of noise [F(1,29) = 4.60, p = 0.04], with stationary noise leading to a higher AVG than modulated noise (see Table I). There was no significant main effect of group (p = 0.07) and no significant interaction including the group factor. However, further analysis in the visual reduction condition showed that the AVG in stationary noise was significantly lower in the older group than in the younger group [F(1,29) = 8.43, p = 0.01]. These data show that in the visual clear condition, older participants had AVGs similar to those of younger participants. However, as expected, AVGs were decreased by visual reduction, with older participants having a lower AVG in stationary noise than younger participants.
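As a worked example with the group means of Table I (young group, visual clear), Eq. (4) can be computed as below; note that the tabulated AVG values average participant-level gains, so Eq. (4) applied to group means only approximates them.

```python
# Eq. (4) applied to group means from Table I (young group, visual clear).
def av_gain(av, a):
    """Audio-visual gain, Eq. (4): (AV - A) / (100 - A), scores in %."""
    return (av - a) / (100.0 - a)

print(av_gain(93.33, 42.59))   # ST: ~0.88, i.e., ~88%; Table I lists 88.30
print(av_gain(97.04, 85.19))   # AM: ~0.80, i.e., ~80%; Table I lists 77.56
```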

C. Consonant confusions

To investigate whether patterns of visual and auditory confusions vary as a function of age, the confusion matrices of both groups were subjected to log-linear analysis (Bell et al., 1986; Walden et al., 1977). Analysis of the auditory-only confusion matrices (VCL and VR averaged, Fig. 1) revealed no significant main effect of group (p = 0.43 in ST noise and p = 0.37 in AM noise), showing that older adults did not make more confusions than younger adults. However, the three-way Response × Stimulus × Group interaction was significant in stationary noise [G² = 39.21, p = 0.03] and in modulated noise [G² = 63.41, p < 0.0001], suggesting that the pattern of confusions differed between groups. In stationary noise, the phoneme /k/ was confused with other phonemes (particularly the phoneme /t/) more often in the older group than in the younger group. In modulated noise, older participants tended to confuse the phoneme /f/ with the phoneme /p/ more often than younger participants.

They also tended to confuse the phoneme /s/ with the other phonemes, whereas younger participants never misidentified the stimulus /s/. These group differences for /f/ and /s/ might be due to age-related differences in the extraction process of auditory scene analysis. Indeed, speech perception in noise requires the extraction of the speech signal from the noise signal and a subsequent reconstruction of the missing information (Bregman, 1990, cited by Rimmele et al., 2012). Since it has been shown that aging adversely affects the extraction process (Alain et al., 2006), older participants are disadvantaged compared to younger participants. Moreover, the extraction of the phonemes /s/ and /f/ from the noise is more difficult than the extraction of other phonemes because they consist of noise that is similar to the masking noise (though with a different spectral distribution). Taken together, these data suggest that the differences in the pattern of confusions found in the present study are due to age-related differences in auditory scene analysis. Analysis of the visual-only confusion matrices (ST and AM noises averaged, Fig. 2) revealed a significant main effect of group in the visual reduction condition only [G² = 57.29, p < 0.0001], with older participants making more confusions than younger participants. The three-way Response × Stimulus × Group interaction was also significant in VR [G² = 153.94, p < 0.0001], suggesting that the pattern of confusions differed between the two groups.
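A sketch of this kind of log-linear test is given below: a Poisson model with all two-way associations is fitted, and the three-way Response × Stimulus × Group term is read off the residual deviance (G²) against the saturated model. The tidy data-frame layout is a hypothetical stand-in for the actual confusion matrices.

```python
# Sketch: log-linear test of the three-way Response x Stimulus x Group
# term in a confusion matrix. 'counts' is a tidy frame with one row per
# cell: columns 'stim', 'resp', 'group', and frequency 'n'.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

def three_way_g2(counts: pd.DataFrame):
    # Fit the homogeneous-association model: all two-way terms, no
    # three-way term. Its residual deviance against the saturated model
    # is G^2 for the Response x Stimulus x Group interaction.
    reduced = smf.glm(
        "n ~ C(stim) * C(resp) + C(stim) * C(group) + C(resp) * C(group)",
        data=counts, family=sm.families.Poisson()).fit()
    g2, df = reduced.deviance, reduced.df_resid
    return g2, df, chi2.sf(g2, df)
```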

FIG. 1. Confusion matrices in the auditory-only modality (VCL and VR averaged) in ST and AM noises.


FIG. 2. Confusion matrices in the visual-only modality (ST and AM noises averaged).

In the older group, compared to the younger group, each phoneme was globally confused with the others. Interestingly, no significant effect was found in the visual-only modality when the visual input was clear. Analysis of the AV confusion matrices (Figs. 3 and 4) revealed no significant main effect of group, suggesting that older participants did not make more confusions than younger participants. The three-way Response × Stimulus × Group interaction was significant only in the VR condition, in ST noise [G² = 183.15, p < 0.0001] and in AM noise [G² = 157.87, p < 0.0001]. Older participants made more confusions than younger participants for the phonemes /s/ and /f/.

D. McGurk effect

1. Auditory /afa/ dubbed on visual /aʃa/ (expected fusion /asa/)

Three responses were taken into account: the number of auditory-based responses (/afa/), visually based responses (/aʃa/), and fusion responses (/asa/). The distribution of those responses is shown in Fig. 5. For clarity, results will first be analyzed in the visual clear condition, in order to examine the impact of noise condition on the AV integration of younger and older adults.

Second, results will be analyzed in the visual reduction condition, in order to evaluate the impact of visual reduction.

a. Visual clear condition. When the visual input was clear, stationary noise led participants of both the younger and older groups to give mainly visually based responses (86.70% and 95.60%, respectively), few fusion responses (11% and 2.22%), and no auditory-based responses (0% in each group). In modulated noise, younger adults gave mainly fusion responses (56.65%) and fewer auditory-based (1.11%) and visually based responses (38.89%). Older participants continued to give mainly visually based responses (76.65%) and few auditory-based (3.33%) and fusion (17.78%) responses. A repeated-measure ANOVA with noise (stationary, modulated) and response (auditory-based, fusion, visually based) as within-subject factors and group (young, older) as between-subject factor confirmed these observations. Indeed, analyses revealed a significant interaction between response, noise, and group [F(2,56) = 11.17, p < 0.0001]. Group differences in the responses given were further analyzed with separate ANOVAs for each level of the noise factor (Table II). There was no significant group difference in stationary noise.


FIG. 3. Confusion matrices in the AV modality for the younger adults.

Group differences were found in modulated noise: younger adults made significantly more fusion responses [F(1,29) = 11.84, p = 0.002] and significantly fewer visually based responses than older adults [F(1,29) = 11.84, p = 0.003].

b. Visual reduction condition. In the visual reduction condition, a repeated-measure ANOVA with noise (stationary, modulated) and response (auditory-based, fusion, visually based) as within-subject factors and group (young, older) as between-subject factor revealed no significant difference between groups in either stationary or modulated noise. In the younger group (see Table III), compared to the visual clear condition, the number of visually based responses was decreased in stationary noise (55.56%) [F(1,14) = 9.66, p = 0.008]. The number of auditory-based responses (8.89%) and the number of fusions (22.2%) were not significantly modified. In modulated noise, the number of auditory-based responses (47.78%) increased drastically [F(1,14) = 40.75, p < 0.0001] to the detriment of fusion responses (14.4%), which were significantly decreased [F(1,14) = 39.95, p < 0.0001]. The number of visually based responses (30%) was not changed (p = 0.29). In the older group (see Table III), visual reduction led to a decrease in the number of visually based responses, in both stationary (35.56%) and modulated (18.89%) noises [F(1,14) = 60.47, p < 0.0001]. On the other hand, the number of auditory-based responses was increased, also in both stationary (22.22%) and modulated (38.89%) noises [F(1,14) = 12.73, p = 0.003 and F(1,14) = 15.48, p = 0.001, respectively].


The number of fusion responses was not significantly changed, in either stationary noise (11.11%, p = 0.16) or modulated noise (3.33%, p = 0.07). In summary, in both groups, visual reduction led mainly to a decrease in the number of visually based responses and to an increase in the number of auditory-based responses.

2. Auditory /apa/ dubbed on visual /aka/ (expected fusion /ata/)

Three responses were taken into account: the number of auditory-based responses (/apa/), fusion responses (/ata/), and visually based responses (/aka/). The distribution of those responses is shown in Fig. 6. As for the previous McGurk stimulus, results will first be analyzed in the visual clear condition, in order to examine the impact of noise condition on the AV integration of younger and older participants. Second, results will be analyzed in the visual reduction condition, in order to evaluate the impact of visual reduction.

a. Visual clear condition. When the visual input was clear, stationary noise led participants of both the younger and older groups to give mainly visually based responses (71.10% and 51.10%, respectively), fewer fusion responses (18% and 31%, respectively), and few auditory-based responses (2% in each group). In modulated noise, participants of both groups gave mainly fusion responses (66.65% in the younger group and 54.45% in the older group), fewer auditory-based responses (2.22% and 5.56%, respectively), and fewer visually based responses (21.11% and 28.89%).


FIG. 4. Confusion matrices in the AV modality for the older adults.

A repeated-measure ANOVA with noise (stationary, modulated) and response (auditory-based, fusion, visually based) as within-subject factors and group (young, older) as between-subject factor confirmed these observations in the visual clear condition. Indeed, analyses revealed a significant interaction between noise and response [F(2,56) = 57.62, p < 0.0001], with stationary noise leading to an increased number of visually based responses (p < 0.0001) and modulated noise leading to an increased number of fusion responses (p < 0.0001). The number of auditory-based responses in stationary noise was not significantly different from that in modulated noise (p = 0.26). No group difference was found in the responses made by participants, either in stationary noise (p = 0.11) or in modulated noise (p = 0.34) (Table IV).

b. Visual reduction condition. A repeated-measure ANOVA with noise (stationary, modulated) and response (auditory-based, fusion, visually based) as within-subject factors and group (young, older) as between-subject factor revealed a significant interaction between noise, response, and group [F(2,56) = 4.32, p = 0.02]. Group differences in the responses given were further investigated for each level of the noise factor with separate ANOVAs. In stationary noise, older participants gave significantly more auditory-based responses [F(1,14) = 5.07, p = 0.04] and fewer visually based responses [F(1,14) = 5.6, p = 0.03].

The number of fusions was not significantly different (p = 0.09). In modulated noise, older participants gave significantly fewer fusion responses than younger participants [F(1,14) = 14.18, p = 0.001]. There was no significant difference in the number of auditory-based responses (p = 0.90) or visually based responses (p = 0.08). In the younger group (see Table V), compared with the visual clear condition, the number of auditory-based responses was increased, in stationary noise (17.78%) [F(1,14) = 7.98, p = 0.01] as well as in modulated noise (41.11%) [F(1,14) = 29.07, p < 0.0001], whereas the number of visually based responses was decreased (20% [F(1,14) = 50.77, p < 0.0001] in stationary noise and 3.33% [F(1,14) = 8.87, p = 0.01] in modulated noise). The number of fusion responses was unchanged in stationary noise (28.89%, p = 0.09) and decreased in modulated noise (43.33%) [F(1,14) = 6.91, p = 0.02]. In the older group (see Table V), in both stationary and modulated noises, visual reduction led to an increase in the number of auditory-based responses (35.56% [F(1,14) = 21, p < 0.0001] and 40% [F(1,14) = 35.97, p < 0.0001], respectively) and to a decrease in the number of visually based responses (6.67% [F(1,14) = 19.31, p = 0.001] and 10% [F(1,14) = 4.52, p = 0.05], respectively). The number of fusion responses was unchanged in stationary noise (13.33%, p = 0.07) and decreased in modulated noise (13.33%) [F(1,14) = 26.77, p < 0.0001].


TABLE II. Group differences in the visual clear condition for the auditory /afa/ dubbed on visual /aʃa/ McGurk stimulus.

                  Stationary noise                     Modulated noise
            Young      Older      Comparison     Young      Older      Comparison
Auditory    0%         0%         NS             1.11%      3.33%      NS
Fusion      11%        2.22%      NS             56.65%     17.78%     p = 0.002
Visual      86.70%     95.60%     NS             38.89%     76.65%     p = 0.003

In summary, in both groups, visual reduction led mainly to a decrease in the number of visually based responses and to an increase in the number of auditory-based responses.

FIG. 5. Response type in the auditory /afa/ visual /aʃa/ McGurk trials for the younger (A) and older (B) groups, in visual clear and visual reduction blocks and in stationary and modulated noises.

3. Summary of the main results

Several group differences were found. First, both groups showed a clear masking release effect, but this effect was larger in the younger group. Second, both groups had similar lipreading abilities when the visual input was clear, but older participants performed worse than younger participants when the visual input was degraded. Third, the AVG of the older group was similar to that of the younger group when the visual cue was clear, and lower than that of the younger group when the visual cue was degraded. Finally, results in the McGurk trials were analyzed. In the visual clear condition, modulated noise led to a major increase in the number of fusion responses in the younger group but not in the older group. In the visual reduction condition, both groups demonstrated a decrease in the number of visually based responses and an increase in the number of auditory-based responses.

4. Modeling results

In order to disentangle unisensory effects from integration effects, the WFLMP was applied to our data. The standard form of the FLMP proposed by Massaro (1987, 1998) is a post-phonetic integration model with a statistically optimal integration rule. It can be expressed as

$P(R_i/A,V) = \dfrac{P(R_i/A)\,P(R_i/V)}{\sum_j P(R_j/A)\,P(R_j/V)}$.  (5)

In this expression, R_i and R_j are response categories, A and V are the auditory and visual stimuli, and P(R_i/A), P(R_i/V), and P(R_i/A,V) are the auditory, visual, and audio-visual response probabilities, respectively. Since the FLMP entails a fixed integration rule, a good fit of the data to the FLMP means that any differences in AV responses are due to differences in unisensory processing before AV integration occurs. The WFLMP (see Schwartz, 2010), in which the inputs from audition and vision are weighted, is defined by

$P(R_i/A,V) = \dfrac{P(R_i/A)^{\lambda_A}\,P(R_i/V)^{\lambda_V}}{\sum_j P(R_j/A)^{\lambda_A}\,P(R_j/V)^{\lambda_V}}$.  (6)

In this expression, λ_A and λ_V are subject-dependent factors used to weight the auditory and visual inputs in the computation of the AV responses. For each subject, a lambda value is defined between 0 and 1; λ_A and λ_V are then computed from lambda as λ_A = lambda/(1 − lambda) and λ_V = (1 − lambda)/lambda, with thresholds maintaining λ_A and λ_V between 0 and 1. Consequently, P_AV varies from a value close to P_A when lambda is close to 1, to a value close to P_V when lambda is close to 0, passing through a value identical to the FLMP prediction when lambda is close to 0.5, where λ_A and λ_V are both equal to 1 (Schwartz, 2010).
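A minimal sketch of Eqs. (5) and (6), with the lambda-to-weight mapping just described; the example probabilities and the category ordering are illustrative, not taken from the data.

```python
# Sketch of Eqs. (5) and (6): FLMP and WFLMP predictions of audio-visual
# response probabilities from the unisensory ones.
import numpy as np

def wflmp(p_a, p_v, lam=0.5):
    """p_a, p_v: unisensory response probabilities over the same
    categories. lam = 0.5 gives weights of 1 and reduces to Eq. (5)."""
    lam = np.clip(lam, 1e-6, 1 - 1e-6)
    k_a = min(lam / (1 - lam), 1.0)          # auditory weight, thresholded
    k_v = min((1 - lam) / lam, 1.0)          # visual weight, thresholded
    num = np.asarray(p_a) ** k_a * np.asarray(p_v) ** k_v
    return num / num.sum()                    # Eq. (6)

# Illustrative case: vision strongly favors /aka/, audition weakly /apa/.
p_v = np.array([0.05, 0.15, 0.80])            # categories: /apa/, /ata/, /aka/
p_a = np.array([0.50, 0.30, 0.20])
print(wflmp(p_a, p_v))            # FLMP: /aka/ receives ~0.70
print(wflmp(p_a, p_v, lam=0.9))   # auditory-dominant: /apa/ becomes ~0.45
```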

TABLE III. Percentage of responses in each visual condition and in each auditory noise condition for the auditory /afa/ visual /aʃa/ McGurk stimulus.

Young
                       Stationary noise                                 Modulated noise
            Visual clear   Visual reduction   Comparison    Visual clear   Visual reduction   Comparison
Auditory    0%             8.89%              NS            1.11%          47.78%             p < 0.0001
Fusion      11%            22.2%              NS            56.65%         14.40%             p < 0.0001
Visual      86.70%         55.56%             p = 0.008     38.89%         30%                NS

Older
            Visual clear   Visual reduction   Comparison    Visual clear   Visual reduction   Comparison
Auditory    0%             22.22%             p = 0.003     3.33%          38.89%             p = 0.001
Fusion      2.22%          11.11%             NS            17.78%         3.33%              NS
Visual      95.60%         35.56%             p < 0.0001    76.65%         18.89%             p < 0.0001


TABLE IV. Group differences in the visual clear condition for the auditory /apa/ dubbed on visual /aka/ McGurk stimulus.

                  Stationary noise                     Modulated noise
            Young      Older      Comparison     Young      Older      Comparison
Auditory    2%         2%         NS             2.22%      5.56%      NS
Fusion      18%        31%        NS             66.65%     54.45%     NS
Visual      71.10%     51.10%     NS             21.11%     28.89%     NS

FIG. 6. Response type in the auditory /apa/ visual /aka/ McGurk trials for the younger (A) and older (B) groups, in visual clear and visual reduction blocks and in stationary and modulated noises.

Here, we used the root mean square error (RMSE) as the assessment criterion. It is computed by taking the squared distance between the observed and predicted response probabilities, averaging them over all categories C_i and all experimental conditions E_j, and taking the square root of the result:

$\mathrm{RMSE} = \left[\dfrac{\sum_{E_j,C_i}\left[P_{E_j}(C_i) - p_{E_j}(C_i)\right]^2}{n_E\,n_C}\right]^{1/2}$.  (7)

In Eq. (7), observed probabilities are in lower case and predicted probabilities in upper case.
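A sketch of how Eq. (7) can be combined with the WFLMP of Eq. (6) to trace RMSE as a function of lambda, as in Fig. 7; the array layout is a hypothetical stand-in for the per-participant response probabilities.

```python
# Sketch: RMSE (Eq. (7)) as a function of lambda for one participant.
# Array shapes assumed: rows = experimental conditions, columns =
# response categories.
import numpy as np

def wflmp(p_a, p_v, lam):
    k_a = min(lam / (1.0 - lam), 1.0)        # auditory weight, thresholded
    k_v = min((1.0 - lam) / lam, 1.0)        # visual weight, thresholded
    num = p_a ** k_a * p_v ** k_v
    return num / num.sum()                    # Eq. (6)

def rmse(observed, predicted):
    # Eq. (7): root of the squared error averaged over all conditions
    # E_j and categories C_i.
    return np.sqrt(np.mean((predicted - observed) ** 2))

def rmse_curve(p_a, p_v, p_av, lambdas=np.linspace(0.05, 0.95, 19)):
    curve = []
    for lam in lambdas:
        pred = np.vstack([wflmp(a, v, lam) for a, v in zip(p_a, p_v)])
        curve.append(rmse(p_av, pred))
    return lambdas, np.asarray(curve)
```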

For each group and for each visual condition, the data were analyzed using linear regression of RMSE as a function of lambda (Fig. 7). In the VCL condition, there was no significant relationship between lambda and RMSE (p = 0.07 in both groups). In the VR condition, by contrast, lambda was a significant predictor of RMSE [younger group: R² = 0.10, F(1,284) = 31.21, p < 0.0001; older group: R² = 0.18, F(1,284) = 62.35, p < 0.0001]: RMSE decreased as lambda increased. In other words, the WFLMP fitted our data in the VR condition better when an auditory weight was added. This was true in both groups, but Fig. 7 shows that the slope of the regression line in the older group was steeper than that of the younger group, suggesting an increased auditory weight in the older group. To test this hypothesis, we performed a one-way analysis of covariance (ANCOVA) with RMSE as the dependent variable, visual condition (VCL, VR) and group as factors, and lambda as a covariate. All main effects and all interactions were significant. Importantly, the interaction between visual condition and lambda was significant [F(1,75) = 83.97, p < 0.0001], indicating that the impact of lambda on RMSE was not similar in VCL and in VR.
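The ANCOVA itself could be run along the following lines; this is a sketch with synthetic placeholder data, and statsmodels' OLS plus an ANOVA table is only one of several ways to fit it.

```python
# Sketch of the ANCOVA: RMSE as dependent variable, visual condition and
# group as factors, lambda as covariate. 'df' holds synthetic placeholder
# values; in practice it would hold the per-fit RMSE values.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rmse":  rng.uniform(0.05, 0.40, 120),             # placeholder values
    "cond":  np.tile(["VCL", "VR"], 60),
    "group": np.repeat(["young", "older"], 60),
    "lam":   np.tile(np.linspace(0.05, 0.95, 30), 4),
})

model = smf.ols("rmse ~ C(cond) * C(group) * lam", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # includes the lam x cond x group term
```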

TABLE V. Percentage of responses in each visual condition and in each auditory noise condition for the auditory /apa/ visual /aka/ McGurk stimulus.

Young
                       Stationary noise                                 Modulated noise
            Visual clear   Visual reduction   Comparison    Visual clear   Visual reduction   Comparison
Auditory    2%             17.80%             p = 0.01      2.22%          41.10%             p < 0.0001
Fusion      18%            28.90%             NS            66.65%         43.30%             p = 0.02
Visual      71.10%         20%                p < 0.0001    21.11%         3.33%              p = 0.01

Older
            Visual clear   Visual reduction   Comparison    Visual clear   Visual reduction   Comparison
Auditory    2%             35.60%             p < 0.0001    5.56%          40%                p < 0.0001
Fusion      31%            13.30%             NS            54.45%         13.30%             p < 0.0001
Visual      51.10%         6.67%              p = 0.001     28.89%         10%                p = 0.05


FIG. 7. Variation of the root mean square error (RMSE) as a function of the lambda parameter tuning fusion in the WFLMP for the younger group and the older group, in the visual clear (VCL) and the visual reduction (VR) condition.

This result confirms the existence of an increased auditory weight in the VR condition compared to the VCL condition. Also of interest is the significant three-way interaction between lambda, visual condition, and group [F(1,75) = 9.78, p = 0.003]. Further analyses revealed no significant group difference in the VCL condition (p = 0.54), contrary to the VR condition (p = 0.004). This result confirms the observation that the regression slope in the VR condition was steeper in the older group than in the younger group. In other words, the data suggest that the increase in the weight of audition in VR compared to VCL was stronger in the older group than in the younger group.

IV. DISCUSSION

The present study was designed to investigate the effects of aging on AV speech perception in noise. More specifically, we examined the impact of simulated decreased lipreading abilities on the AV perception of older adults with normal hearing thresholds compared to normal-hearing younger adults. Stimuli were presented in three modalities (A, V, and AV) in a masking release paradigm. Results in the A modality showed that older and younger participants had comparable performance in stationary noise. This result was expected, since participants had essentially equal audibility and since previous research demonstrated that most of the variance in speech recognition in stationary noise is due to differences in audibility (Van Rooij and Plomp, 1992).


On the other hand, older adults had lower auditory performance in modulated noise. Consequently, the size of their masking release effect was smaller. This result corroborates findings of previous research (Dubno et al., 2002, 2003; Takahashi and Bacon, 1992). It has been argued that age differences in masking release might be attributable to slightly elevated thresholds at higher frequencies in older participants (Dubno and Ahlstrom, 1997). Even though all our participants matched the inclusion criteria, it is still possible that some of the older participants had slight deficiencies at frequencies above 4000 Hz. Results in the V modality showed, as expected, similar lipreading performance in both groups in the visual clear condition. As mentioned earlier, previous studies demonstrated decreased lipreading abilities in older participants when stimuli had a high level of uncertainty (such as sentences, words, or a large panel of consonants). Here, there were only 6 syllables, corresponding to 6 different visemes, resulting in a low level of uncertainty. Taken together, these data suggest that aging affects lipreading abilities only when items are highly confusable. Analyses of the confusion matrices of both groups in the visual-only modality (visual clear blocks) revealed that older participants did not make more confusions than their younger counterparts.


The distribution of confusions between phonemes was also comparable between the two groups. A similar result was found by Walden et al. (1993) with 20 consonants, suggesting that the results of our study are not due solely to the low level of uncertainty used. Rather, it suggests that the perceptual process used to lipread may not be affected by aging, at least in the visual clear condition. When the visual speech cue was degraded, older participants demonstrated reduced lipreading abilities. They also made more confusions between phonemes than did younger participants. Moreover, the pattern of confusion was different. In the younger group, the confusions that were made concerned phonemes with close places of articulation. The older group, however, made confusions that can hardly be explained by a similar place of articulation (velar and bilabial). To summarize, these results show that visual reduction led to decreased lipreading abilities in both groups. This decrease was stronger in the older group, where even the most visually salient phonemes (/p/, /f/, /ʃ/) were highly impacted by visual reduction (more than in the younger group). In addition, unexpected confusions appeared due to visual reduction, mainly in the older participants. Taken together, these data suggest that aging affects the perceptual process used in lipreading degraded speech cues. One possible explanation might be found in the visual degradation technique used in the present study, i.e., contrast modulation. Indeed, it has been shown that contrast sensitivity declines with age (Elliott et al., 1990; Ross et al., 1985). Since normal (or corrected-to-normal) vision does not exclude the presence of deficits in contrast sensitivity (Spear, 1993), the major impact of visual degradation on the lipreading abilities of older adults, compared to younger adults, might be explained by contrast sensitivity impairments. Using the identification scores in the AV modality, we calculated the AVG of each group of participants. When the visual input was clear, results showed that the AV gain of the older group was similar to that of the younger group. This result corroborates previous findings (Gordon and Allen, 2009; Tye-Murray et al., 2010) and suggests that the AV integration abilities of older adults are intact. Results of the WFLMP analyses and the fact that we did not find significant differences between the confusion matrices support this idea. A recent study suggested that older participants might even have a higher AV gain than younger participants (Winneke and Phillips, 2011). The authors investigated age-related differences in AV speech perception by recording event-related potentials. Participants performed an object categorization task in three conditions: auditory-only, visual-only, and audio-visual. At the behavioral level, participants of both groups displayed similar AV gain. Group differences appeared at the neural level. In the older group, the facilitation of neural responses was larger and occurred earlier than in the younger group. The authors concluded that older adults benefit more from visual speech cues, probably to compensate for age-related hearing loss. On the other hand, when the visual input was degraded, younger adults had a higher AV gain than older adults, but only in stationary noise.

The fact that age had no significant impact on the AV gain in modulated noise suggests that older adults can compensate for their difficulties in lipreading by using the auditory signal available in the valleys of the modulated noise (even though older adults demonstrated a weaker masking release effect than younger adults). This result puts previous findings into perspective. The fact that the AV gain with degraded visual speech cues is weaker in older adults compared to younger adults seems to be true only when auditory performance is poor, as in Gordon and Allen (2009) and in Tye-Murray et al. (2010). Regarding the McGurk fusion effect, the fricative stimuli and the plosive stimuli led to different results. In the case of A/afa/V/aʃa/ in visual clear blocks, stationary noise led both groups to make, as expected, mainly visually based responses. This can be explained by the fact that in stationary noise, auditory /afa/ is almost never identified, whereas visual /aʃa/ is almost perfectly identified. Thus, in the case of conflicting AV stimuli, participants of both groups based their answer on the most reliable channel: vision. In modulated noise, younger participants gave mainly fusion responses (/asa/), while older participants continued to give mainly visually based responses (/aʃa/). Again, this result can be explained by identification performance in the unisensory modalities. In the older group, visual /aʃa/ remains by far better perceived than auditory /afa/, whereas in the younger group, performance in the A and V modalities is more balanced. In other words, there was a release from masking of the McGurk fusion effect in the younger group but not in the older group, probably because of the smaller masking release effect in that group. Analyses in the visual reduction condition showed that in stationary noise, both groups continued to give mainly visually based responses, though fewer than in the visual clear condition. Interestingly, visual reduction led to an increase in the number of auditory-based responses only in the older group. Again, this can be explained by performance in the unisensory modalities. In the younger group, visual /aʃa/ still led to a higher identification percentage than auditory /afa/. This was not true in the older group, which can explain why visually based responses and auditory-based responses were equivalent in the older group (in the visual reduction, stationary noise condition). The fact that auditory-based responses increased only in the older group can also be explained by the fact that, when the visual input was degraded, older participants put more weight on the auditory modality than younger participants, as shown by the WFLMP simulations. In other words, in the visual reduction condition, group differences are explained both by differences in unimodal perception and by differences in AV integration per se. Next, in modulated noise, the response pattern of both groups was marked by an increase in the number of auditory-based responses, which can be explained by the masking release effect. In the case of the A/apa/V/aka/ stimulus, in visual clear blocks, stationary noise led both younger and older adults, as expected, to give mainly visually based responses. As for the previous McGurk stimulus, this can be explained by the fact that in stationary noise, V/aka/ is by far better perceived than A/apa/.


In modulated noise, the McGurk fusion effect was released from masking in both groups. This can be explained by the fact that, in both groups, performance in the unisensory V and A modalities was balanced. Thus, when the visual input was degraded, auditory-based responses were increased in both groups. This increase was larger in the older group (particularly in stationary noise), even leading the /apa/ responses to be the main response in that group. This is interesting because, in the older group, auditory /apa/ was not better perceived than visual /aka/, so the fact that auditory-based responses were the main responses cannot be explained by performance in the unisensory modalities. Rather, analyses made in the framework of the WFLMP suggest that in the VR condition, older adults differ from younger adults in the process of AV integration. Indeed, they put more weight on the auditory modality than their younger counterparts. Older participants act as if visual reduction forced them to lean more on audition to answer. In modulated noise, auditory-based responses were drastically increased in both groups, as a result of the masking release. To summarize, in the visual clear condition, differences between younger and older participants can be explained by differences in performance in the unisensory modalities (A and V). Participants of both groups based their response on the most reliable channel. This suggests that older and younger adults do not differ in terms of AV integration but rather in terms of unimodal perception. This hypothesis is supported by analyses made in the framework of the WFLMP (see Schwartz, 2010). In the visual clear condition, simulations showed that the model fitted the data better when no specific weight was applied to either audition or vision. In other words, data in the visual clear condition are well explained by the FLMP, confirming that group differences were due to differences in pure auditory and visual performance. This corroborates the hypothesis that older and younger adults do not differ in terms of AV integration abilities, at least in the visual clear condition. Furthermore, visual reduction led to an increase in the importance of audition in the response pattern, especially in the older group. Older participants seem to compensate for the weakness of their lipreading abilities (due to visual reduction) by leaning more on audition. We showed that, in the visual reduction condition, group differences were not only explained by differences in the perception of A and V cues but also by differences in the AV integration process per se; when the visual input was degraded, older adults awarded more weight to the auditory input than younger adults. In other words, aging had a significant impact on AV speech integration when the visual speech cue was less informative, but not when it was clear. One possible explanation is based on the relation between visuo-spatial working memory (SWM), processing speed (PS), and lipreading abilities. Good lipreading requires that we store a sequence of visually observed movements and then combine them into a unified percept. With visual reduction, the phonetic information provided by the visual speech signal is reduced. SWM and PS skills decline with age, and older adults' poorer lipreading ability is correlated with the age-related decline in SWM and PS (Feld and Sommers, 2009).


In the present experiment, older adults' poorer lipreading abilities in the visual reduction condition may thus be partly a consequence of their reduced SWM and PS. We may speculate that when perceptual processing becomes more difficult, the demands placed on cognitive mechanisms such as SWM and PS increase: the ambiguity of the signal grows, and the resulting task difficulty places a heavier burden on cognitive abilities. Older participants' identification of lip-read syllables under visual reduction is therefore more impaired than that of younger participants, and older participants put more weight on the auditory information. Further research is required to determine more precisely the conditions that lead older participants to re-weight the auditory information. Progressively degrading the visual information would make it possible to measure how the weighting process within AV integration varies with the quality of the visual input. Other types of degradation of the visual phonemic cues could also be used, such as pixelization (Gordon and Allen, 2009; see the sketch below) or filtering out the high spatial frequencies of the image; these techniques should be combined with models such as the WFLMP.
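As one concrete illustration of such a progressive degradation (a minimal sketch under our own assumptions, not the manipulation actually used in this study or by Gordon and Allen, 2009), pixelization can be approximated by block-averaging each grayscale video frame; increasing the block size removes progressively more of the fine phonetic detail around the lips. The function name and block sizes are hypothetical.

import numpy as np

def pixelize(frame, block):
    # Replace every block x block patch of a grayscale frame by its mean
    # intensity. Larger blocks remove more of the fine detail (lip contours,
    # teeth, tongue) while preserving coarse facial motion.
    h, w = frame.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of the block size
    patches = frame[:h, :w].reshape(h // block, block, w // block, block)
    coarse = patches.mean(axis=(1, 3))   # one mean value per patch
    return np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)

# Hypothetical usage: a random array stands in for one grayscale video frame;
# sweeping the block size yields progressively degraded versions.
frame = np.random.rand(240, 320)
degraded_versions = [pixelize(frame, b) for b in (4, 8, 16, 32)]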

V. CONCLUSIONS AND CLINICAL IMPLICATIONS

The present research showed that when visual speech cues are not degraded (and thus carry little uncertainty), older adults have preserved lipreading abilities and preserved AV integration skills. The present study also examined how older and younger adults deal with visually degraded stimuli, an issue worth examining given that the prevalence of visual impairments increases with age (see Legault et al., 2010). Our data showed that AV integration of speech with visually degraded stimuli was altered in older adults only when auditory performance was low (stationary noise), not when it was high (modulated noise). This result suggests that older adults are able to compensate for the loss of lipreading abilities by using the available auditory information. This was also illustrated by the WFLMP simulations, which showed an increased weight of audition in the older group. Taken together, these data suggest that rehabilitation programs should include a first step in which auditory and visual abilities are evaluated, and should then focus on enhancing not only auditory but also visual performance, even in older adults with visual impairments.

ACKNOWLEDGMENTS

We are very grateful for the financial support for this project. The study was funded by a grant from the Fonds National de Recherche Scientifique (FNRS), Belgium.

Alain, C., Dyson, B. J., and Snyder, J. S. (2006). "Aging and the perceptual organization of sounds: A change of scene?," in Handbook of Models for Human Aging, edited by P. M. Conn (Academic, New York), pp. 759–770.
Bell, T. S., Dirks, D. D., Levitt, H., and Dubno, J. R. (1986). "Log-linear modeling of consonant confusion data," J. Acoust. Soc. Am. 79(2), 518–525.
Bregman, A. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, Cambridge, MA), 790 pp.


Burnham, D., and Dodd, B. (2004). "Auditory-visual speech integration by prelinguistic infants: Perception of an emergent consonant in the McGurk effect," Dev. Psychobiol. 45(4), 204–220.
Cienkowski, K. M., and Carney, A. E. (2002). "Auditory-visual speech perception and aging," Ear Hear. 23(5), 439–449.
Dancer, J., Krain, M., Thompson, C., and Davis, P. (1994). "A cross-sectional investigation of speech reading in adults: Effects of age, gender, practice, and education," Volta Rev. 96, 31–40.
Dubno, J. R., and Ahlstrom, J. B. (1997). "Additivity of multiple maskers of speech," in Modeling Sensorineural Hearing Loss, edited by W. Jesteadt (Erlbaum, Hillsdale, NJ), pp. 253–272.
Dubno, J. R., Horwitz, A. R., and Ahlstrom, J. B. (2002). "Benefit of modulated maskers for speech recognition by younger and older adults with normal hearing," J. Acoust. Soc. Am. 111(6), 2897–2907.
Dubno, J. R., Horwitz, A. R., and Ahlstrom, J. B. (2003). "Recovery from prior stimulation: Masking of speech by interrupted noise for younger and older adults with normal hearing," J. Acoust. Soc. Am. 113, 2084–2094.
Elliott, D., Whitaker, D., and MacVeigh, D. (1990). "Neural contribution to spatiotemporal contrast sensitivity decline in healthy ageing eyes," Vision Res. 30(4), 541–547.
Erber, N. P. (1969). "Interaction of audition and vision in the recognition of oral speech stimuli," J. Speech Hear. Res. 12(2), 423–425.
Feld, J. E., and Sommers, M. S. (2009). "Lipreading, processing speed, and working memory in younger and older adults," J. Speech Lang. Hear. Res. 52(6), 1555–1565.
Folstein, M. F., Folstein, S. E., and McHugh, P. R. (1975). "'Mini-mental state': A practical method for grading the cognitive state of patients for the clinician," J. Psychiatr. Res. 12(3), 189–198.
Füllgrabe, C., Berthommier, F., and Lorenzi, C. (2006). "Masking release for consonant features in temporally fluctuating background noise," Hear. Res. 211(1–2), 74–84.
Gates, G. A., and Mills, J. H. (2005). "Presbycusis," Lancet 366, 1111–1120.
Gordon, M., and Allen, S. (2009). "Audiovisual speech in older and younger adults: Integrating a distorted visual signal with speech in noise," Exp. Aging Res. 35(2), 202–219.
Grant, K. W. (2002). "Measures of auditory-visual integration for speech understanding: A theoretical perspective (L)," J. Acoust. Soc. Am. 112(1), 30–33.
Grant, K. W., and Seitz, P. F. (1998). "Measures of auditory-visual integration in nonsense syllables and sentences," J. Acoust. Soc. Am. 104, 2438–2450.
Grant, K. W., and Seitz, P. F. (2000). "The use of visible speech cues for improving auditory detection of spoken sentences," J. Acoust. Soc. Am. 108, 1197–1208.
Hardick, E. J., Oyer, H. J., and Irion, P. E. (1970). "Lipreading performance as related to measurements of vision," J. Speech Hear. Res. 13(1), 92–100.
Huyse, A., Berthommier, F., and Leybaert, J. (2013). "Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children," Ear Hear. 34(1), 110–121.
Johnson, D. D., and Snell, K. B. (1986). "Effect of distance visual acuity problems on the speech reading performance of hearing-impaired adults," J. Acad. Rehabil. Audiol. 19, 42–55.
Legault, I., Gagné, J.-P., Rhoualem, W., and Anderson-Gosselin, P. (2010). "The effects of blurred vision on auditory-visual speech perception in younger and older adults," Int. J. Audiol. 49(12), 904–911.
Liu, X. Z., and Yan, D. (2007). "Ageing and hearing loss," J. Pathol. 211(2), 188–197.
MacLeod, A., and Summerfield, Q. (1987). "Quantifying the contribution of vision to speech perception in noise," Br. J. Audiol. 21(2), 131–141.
Massaro, D. W. (1987). Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry (Erlbaum, Hillsdale, NJ), 336 pp.
Massaro, D. W. (1998). Perceiving Talking Faces: From Speech Perception to a Behavioral Principle (MIT Press, Cambridge, MA), 494 pp.


Massaro, D. W., Thompson, L. A., Barron, B., and Laren, E. (1986). "Developmental changes in visual and auditory contributions to speech perception," J. Exp. Child Psychol. 41(1), 93–113.
McGurk, H., and MacDonald, J. (1976). "Hearing lips and seeing voices," Nature 264, 746–748.
Nelson, P. B., Jin, S.-H., Carney, A. E., and Nelson, D. A. (2003). "Understanding speech in modulated interference: Cochlear implant users and normal-hearing listeners," J. Acoust. Soc. Am. 113(2), 961–968.
Ries, P. W. (1994). "Prevalence and characteristics of persons with hearing trouble: United States, 1990–91," Vital Health Stat. 10(188), 1–75.
Rimmele, J., Schröger, E., and Bendixen, A. (2012). "Age-related changes in the use of regular patterns for auditory scene analysis," Hear. Res. 289, 98–107.
Romano, P. E., and Berlow, S. (1974). "Vision requirements for lip reading," Am. Ann. Deaf 119(4), 383–386.
Rosenblum, L. D., Schmuckler, M. A., and Johnson, J. A. (1997). "The McGurk effect in infants," Percept. Psychophys. 59(3), 347–357.
Ross, J. E., Clarke, D. D., and Bron, A. J. (1985). "Effect of age on contrast sensitivity function: Uniocular and binocular findings," Br. J. Ophthalmol. 69(1), 51–56.
Schwartz, J.-L. (2010). "A reanalysis of McGurk data suggests that audiovisual fusion in speech perception is subject-dependent," J. Acoust. Soc. Am. 127(3), 1584–1594.
Schwartz, J.-L., Tiippana, K., and Andersen, T. (2010). "Disentangling unisensory from fusion effects in the attentional modulation of McGurk effects: A Bayesian modeling study suggests that fusion is attention-dependent," in International Conference on Auditory-Visual Speech Processing, p. S2-1.
Sekiyama, K., and Burnham, D. (2004). "Issues in the development of auditory-visual speech perception: Adults, infants, and children," in Eighth International Conference on Spoken Language Processing, pp. 1137–1140.
Sommers, M. S., and Spehar, B. (2007). "The effects of age and gender on lipreading abilities," J. Am. Acad. Audiol. 18(10), 883–892.
Sommers, M. S., Tye-Murray, N., and Spehar, B. (2005). "Auditory-visual speech perception and auditory-visual enhancement in normal-hearing younger and older adults," Ear Hear. 26(3), 263–275.
Spear, P. D. (1993). "Neural bases of visual deficits during aging," Vision Res. 33(18), 2589–2609.
Sumby, W. H., and Pollack, I. (1954). "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Am. 26(2), 212–215.
Takahashi, G. A., and Bacon, S. P. (1992). "Modulation detection, modulation masking, and speech understanding in noise in the elderly," J. Speech Hear. Res. 35(6), 1410–1421.
Tye-Murray, N., Sommers, M., Spehar, B., Myerson, J., and Hale, S. (2010). "Aging, audiovisual integration, and the principle of inverse effectiveness," Ear Hear. 31(5), 636–644.
Van Rooij, J. C., and Plomp, R. (1992). "Auditive and cognitive factors in speech perception by elderly listeners. III. Additional data and final discussion," J. Acoust. Soc. Am. 91(2), 1028–1033.
Walden, B. E., Busacco, D. A., and Montgomery, A. A. (1993). "Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons," J. Speech Hear. Res. 36(2), 431–436.
Walden, B. E., Prosek, R. A., Montgomery, A. A., Scherr, C. K., and Jones, C. J. (1977). "Effects of training on the visual recognition of consonants," J. Speech Hear. Res. 20(1), 130–145.
Winneke, A. H., and Phillips, N. A. (2011). "Does audiovisual speech offer a fountain of youth for old ears? An event-related brain potential study of age differences in audiovisual speech perception," Psychol. Aging 26(2), 427–438.
Ziegler, J. C., Pech-Georgel, C., George, F., Alario, F.-X., and Lorenzi, C. (2005). "Deficits in speech perception predict language learning impairment," Proc. Natl. Acad. Sci. U.S.A. 102(39), 14110–14115.
