Brain Struct Funct
DOI 10.1007/s00429-013-0704-6

ORIGINAL ARTICLE

Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users

Jae-Jin Song · Hyo-Jeong Lee · Hyejin Kang · Dong Soo Lee · Sun O. Chang · Seung Ha Oh

Received: 24 June 2013 / Accepted: 30 December 2013
© Springer-Verlag Berlin Heidelberg 2014

Abstract  While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H₂¹⁵O-positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed more by visual distractors when confronted with incongruent audiovisual stimuli. To cope with this multimodal conflict, CI users activate the left inferior frontal gyrus to adopt a top-down cognitive modulation pathway, whereas normal hearing individuals primarily adopt a bottom-up strategy.

Keywords Cochlear implant · Deafness · Positron emission tomography · Audiovisual · Plasticity

Abbreviations
NH    Normal hearing
AV    Audiovisual
STS   Superior temporal sulcus
IFG   Inferior frontal gyrus
CI    Cochlear implant
PET   Positron emission tomography
MRI   Magnetic resonance imaging
CAP   Categories of auditory performance
MTG   Middle temporal gyrus
ITG   Inferior temporal gyrus
rCBF  Regional cerebral blood flow
vPMC  Ventral premotor cortex
SMG   Supramarginal gyrus
SFG   Superior frontal gyrus
MeFG  Medial frontal gyrus

Jae-Jin Song and Hyo-Jeong Lee contributed equally to this work.

J.-J. Song
Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Bundang Hospital, Seongnam, Korea
e-mail: [email protected]

H.-J. Lee
Department of Otorhinolaryngology-Head and Neck Surgery, Hallym University College of Medicine, Chun-Cheon, Korea

H.-J. Lee · S. O. Chang · S. H. Oh
Sensory Organ Research Institute, Seoul National University Medical Research Center, Seoul, Korea

H. Kang · D. S. Lee
Department of Nuclear Medicine, Seoul National University Hospital, Seoul, Korea

S. O. Chang · S. H. Oh (corresponding author)
Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, 101 Daehak-Ro Jongno-Gu, Seoul 110-744, Korea
e-mail: [email protected]

Introduction

To function in multisensory environments, the human brain merges information from multiple sources into a coherent percept to direct attention and to coordinate behavioral responses (Corbetta and Shulman 2002; Werner and Noppeney 2010b; Strelnikov et al. 2011). For example, to process face-to-face communication between normal hearing (NH) individuals, the brain combines cues from the auditory (vocalizations) and visual (orofacial articulatory movements) modalities to reduce noise interference and increase accuracy (Ross et al. 2007; Nath and Beauchamp 2011). However, while congruent orofacial articulatory movements significantly contribute to speech comprehension (Sumby and Pollack 1954; van Wassenhove et al. 2005), incongruent audiovisual (AV) stimuli may lead to novel percepts that match neither the auditory nor the visual information (McGurk and MacDonald 1976).

These observations have generated interest in neural substrates specifically involved in AV speech processing, and indeed, regions such as the left superior temporal sulcus (STS) and the inferior frontal gyrus (IFG) have been identified as possible substrates. In NH individuals, the STS has consistently been identified as an AV integrator of both speech and non-speech stimuli (Calvert et al. 2000; Beauchamp et al. 2004a; Miller and D'Esposito 2005; Werner and Noppeney 2010a). In addition, it has been suggested that the IFG has a specific role in the processing of incongruent AV stimuli, potentially reflecting increased cognitive demands (Hein et al. 2007; Jones and Callan 2003). The STS in NH individuals shows additive or superadditive activation for AV congruence (Beauchamp 2005; Talsma et al. 2007), suggesting an important role of the STS in AV categorizations (Werner and Noppeney 2010b). Recent observations of increasing superadditivity for degraded stimuli (original auditory or visual stimuli combined with noise phase spectra) in the STS (Werner and Noppeney 2010b) are consistent with the inverse


effectiveness principle, which states that multisensory enhancement is maximal when the individual stimuli are least effective (Meredith and Stein 1983). A good example of inverse effectiveness is a deaf subject whose auditory modality has been restored by a cochlear implant (CI). Even for a proficient CI user, poorly represented temporal fine structure and limitations in encoding spectral cues yield impoverished input to the neocortex when compared to normal acoustic stimulation (Kral and O'Donoghue 2010). Consequently, when processing auditory cues, CI users benefit from congruent visual cues as an important compensatory mechanism, and they show congruent AV information fusion ability comparable to that of NH individuals (Doucet et al. 2006; Rouger et al. 2007; Tremblay et al. 2010). In addition, a recent functional imaging study has revealed that the activity of the visual cortex is positively correlated with the proficiency of auditory recovery (Strelnikov et al. 2013). Moreover, recent behavioral studies reveal that proficient CI users perform as well as NH subjects on AV incongruent tasks, whereas non-proficient CI users demonstrate inferior results and rely predominantly on visual cues (Tremblay et al. 2010; Champoux et al. 2009). In this regard, proficient CI users have shown remarkable perception abilities for both congruent and incongruent AV stimuli, whereas AV conflict in non-proficient CI users can be a major obstacle to successful rehabilitation.

In profound deafness, the deprived auditory cortical regions are taken over by intact sensory modalities such as vision (Finney et al. 2001; Rauschecker 1999) as a result of competition for cortical space. In addition, unimodal auditory speech stimuli activate the visual cortices more in CI users than in NH individuals (Giraud et al. 2001; Giraud and Truy 2002), indicating an AV coupling that is progressively tuned after CI. Moreover, unimodal visual speech cues activate auditory phonological regions of CI users more than those of NH individuals (Lee et al. 2007b). Based on these unique activation patterns of CI users for unimodal speech stimuli, we may surmise the presence of characteristic neural substrates involved in the processing of bimodal AV stimuli in CI users. However, despite recent advances in the understanding of CI users' ability to process congruent as well as conflicting AV information, the hitherto available studies are based on behavioral approaches, and thus the neural substrates involved in the integration of AV stimuli in CI users remain unknown.

Following the literature reviewed above, we hypothesize that CI users may activate the visual cortex more than NH individuals when processing congruent AV stimuli. In addition, we surmise that this vision-reliant tendency may hinder auditory perception when confronted with incongruent visual cues, and therefore CI users may utilize additional higher-order brain regions such as prefrontal


cortices to process incongruent AV information. Hence, the purpose of the current study was twofold. First, by means of H₂¹⁵O-positron emission tomography (PET) (Song et al. 2012a, 2013a), the method of choice for investigating CI users because other methodologies such as functional magnetic resonance imaging and magnetoencephalography are not feasible in CI users, we sought to reveal in CI users the neural correlates associated with the processing of congruent AV stimuli, as well as with the extraction of target auditory cues in the milieu of reinforcing or distracting visual cues. By comparing these results with those of a matched NH control group, we further characterized differences in the strategies used for multimodal speech processing between the CI users and the NH controls. Second, by correlation analyses with speech performance and deafness duration as covariates, we examined cortical regions of activation under AV stimuli that were modulated by deafness-induced plasticity before CI surgery, and cortical regions of activation that were related to CI speech outcome.

Materials and methods

Participants

Twelve post-lingually deafened adult CI users (8 males and 4 females, 31.5 ± 8.0 years) and ten control participants (7 males, 3 females) with normal hearing and vision, matched for age and education level, were enrolled (Table 1). The study was approved by the institutional review board at Seoul National University Hospital. The mean age of the patients was 31.5 ± 8.0 years (range 19–47 years). Nine of the subjects had a history of idiopathic progressive hearing loss, and the other three had histories of sudden sensorineural hearing loss, sudden hearing loss after febrile illness, and progressive sensorineural hearing loss due to chronic otitis media. All were right-handed (self-reported), had normal or corrected-to-normal visual acuity, and had no history of mental retardation or neurologic/psychiatric problems. Temporal bone computed tomography and brain magnetic resonance imaging (MRI) were available for all subjects, and no inner ear or cerebral anatomical abnormalities were found. All subjects had bilateral profound hearing loss (≥90 dB HL) preoperatively. The mean deafness duration was 7.6 ± 6.6 years (range 0.5–20 years). In cases of progressive hearing loss, some degree of brain plasticity should develop from the onset of hearing loss, prior to deafness onset (Lazard et al. 2013). However, for our deaf subjects with a long history of progressive hearing loss, the onset of hearing loss based on subjective memory was rather unreliable. Therefore, deafness duration was defined as the time elapsed since the patient could no longer communicate in the auditory mode even with the best-fitted hearing aid. After more than 3 months of practice with hearing aids, none of our patients reached a threshold of 70 dB (the criterion for severe hearing loss) (Song et al. 2009, 2012b, 2013a) at any frequency, and none displayed any improvement in auditory language skills. All enrolled CI users demonstrated good performance as defined by categories of auditory performance (CAP) scores of 6 (understands conversation without lip reading) or 7 (highest score; uses the telephone with a known speaker) (Archbold et al. 1995). Post-CI speech scores were measured using the Korean Phonetically Balanced (PB) Word Perception test approximately 1 year after implantation (6–13 months after CI, mean ± SD 9.7 ± 2.5 months) (Table 1). The PB word perception test is composed of 40 monosyllabic PB words. Two licensed speech therapists with more than 9 years of experience presented the words with auditory cues only, and subjects were instructed to verbally repeat the words. The mean duration after CI surgery was 57.8 ± 44.2 months (range 6–147 months). Deafness duration and post-CI speech score showed no significant correlation (P = 0.46, Spearman's ρ = -0.27) (Fig. 1).
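For concreteness, this correlation can be recomputed directly from the Table 1 values. Below is a minimal sketch in Python with SciPy (an illustration only; the original analyses used SPSS). P03 and P08, the two patients later excluded from the analyses, are omitted here, which yields values close to those reported (ρ ≈ -0.26 to -0.27, P ≈ 0.46; any small difference presumably reflects rounding or the exact subject set used for Fig. 1).

from scipy.stats import spearmanr

# Deafness duration (years) and post-CI word perception score (%),
# transcribed from Table 1 (P03 and P08 excluded, as in the analyses).
deafness_years = [0.5, 1, 12, 17, 6, 6, 0.5, 3, 10, 20]
word_score_pct = [85, 77.8, 33, 45, 45, 75, 80, 100, 100, 85]

rho, p = spearmanr(deafness_years, word_score_pct)
print(f"Spearman rho = {rho:.2f}, P = {p:.2f}")  # approx. rho = -0.26, P = 0.46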

Stimulus paradigm

Three Korean native speakers (two men and one woman) were videotaped while pronouncing the Korean numbers from 1 to 9. The Korean numbers 2, 3, and 4 are monosyllabic, while 1, 5, 6, 7, 8, and 9 are bisyllabic words. The differences in syllables were minimized by instructing all three speakers to pronounce each word using a normal pronunciation with even intonation, vocal intensity, and tempo. The production of each stimulus word began and ended in a neutral, closed-mouth position, for a total duration of approximately 1 s. The duration of each stimulus unit, consisting of a neutral lip position, followed by an articulatory movement, and ending in a neutral lip position (800–1,200 ms each), was 3,000 ms. Video clips and sound tracks were separated and edited using Studio Plus v 10.5 (Pinnacle, Mountain View, CA, USA) to produce four stimulus conditions: auditory stimuli without any visual cue (A-only), congruent AV stimuli (con-AVS) in which the sound matched the lip movement, incongruent AV stimuli (inc-AVS) in which the sound did not match the lip movement, and a baseline condition with a flickering cross at the center of the monitor. For con-AVS and inc-AVS, the beginning of the mouth movement and the presented sound were synchronized. For all conditions, a white crosshair on a black background was presented at the center of the screen at the beginning and end of each session. Before each experiment, all subjects were given a 10-min training session on the stimuli and tasks, practicing all four


Table 1 Demographic data of the CI user group and the control group

Subject  Age      Sex  CI side  Implanted device              Duration of       Duration of CI  Cause of  Word perception test score (%),
code     (years)                                              deafness (years)  usage (months)  deafness  auditory-only condition
P01      23       M    L        Cochlear CI24RCA              0.5               6               PSNHL     85
P02      32       F    R        Cochlear CI24RE(CA)           1                 15              PSNHL     77.8
P03      28       F    L        Cochlear CI24RCA              6                 45              PSNHL     100
P04      19       M    R        Cochlear CI24RE(CA)           12                18              PSNHL     33
P05      47       F    R        Med-El Combi40+               17                79              FI        45
P06      27       F    R        Cochlear CI22M                6                 141             SSNHL     45
P07      25       M    R        AB Clarion HiRes 90K/HiFocus  6                 48              PSNHL     75
P08      37       M    L        Cochlear CI24RCA              13                42              PSNHL     100
P09      45       F    L        Cochlear CI22M                0.5               147             COM       80
P10      35       F    L        Cochlear CI24RE(CA)           3                 24              PSNHL     100
P11      29       F    R        Cochlear CI24R(CS)            10                76              PSNHL     100
P12      31       F    L        Cochlear CI24RCA              20                52              PSNHL     85
C01      40       F    N/A      N/A                           N/A               N/A             N/A       N/A
C02      33       M    N/A      N/A                           N/A               N/A             N/A       N/A
C03      19       M    N/A      N/A                           N/A               N/A             N/A       N/A
C04      19       M    N/A      N/A                           N/A               N/A             N/A       N/A
C05      34       M    N/A      N/A                           N/A               N/A             N/A       N/A
C06      36       M    N/A      N/A                           N/A               N/A             N/A       N/A
C07      33       M    N/A      N/A                           N/A               N/A             N/A       N/A
C08      41       F    N/A      N/A                           N/A               N/A             N/A       N/A
C09      27       F    N/A      N/A                           N/A               N/A             N/A       N/A
C10      32       M    N/A      N/A                           N/A               N/A             N/A       N/A

Two patients (P03 and P08) were excluded from both behavioral and PET image analyses due to poor image quality
CI cochlear implant, P CI patients, C controls, M male, F female, R right, L left, PSNHL progressive sensorineural hearing loss, FI febrile illness, SSNHL sudden sensorineural hearing loss, COM chronic otitis media, N/A not applicable

conditions to ensure that they understood the instructions and all the conditions. However, to maintain a constant attention level, they were not informed of the existence of the target stimuli sequences inserted in each session. In other words, a target stimuli sequence (either congruent or incongruent AV) was inserted between blocks of random (mixed congruent and incongruent AV) stimuli, so the subjects were led to believe that all the AV conditions were composed of mixed congruent and incongruent AV stimuli. Moreover, they were instructed to stare at the monitor throughout the experiment and to click the left mouse button only when an even number was heard, irrespective of the visual stimuli. In this way, they perceived both AV stimuli, but had to attend only to the auditory information while disregarding the visual stimuli. Each condition was presented twice, so that a total of eight scans were acquired for each subject. To present the four conditions in a random order, minimizing bias from expectation and declining concentration, we used the hospital identification numbers as seed numbers in a random number-generation algorithm.
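The text does not specify the random number-generation algorithm itself, so the following Python sketch is only one plausible implementation of the described scheme, with a made-up hospital identification number seeding a deterministic per-subject shuffle of the eight scans (two per condition):

import random

CONDITIONS = ["A-only", "con-AVS", "inc-AVS", "baseline"]

def scan_order(hospital_id: int) -> list:
    """Return a per-subject pseudo-random order of the eight scans,
    seeded with the subject's hospital identification number."""
    rng = random.Random(hospital_id)  # deterministic per subject
    order = CONDITIONS * 2            # two scans per condition
    rng.shuffle(order)
    return order

print(scan_order(20131224))  # hypothetical hospital ID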


Stimulus presentation and image acquisition

The stimulus units for each condition were arranged and presented using Presentation software version 12.0 (Neurobehavioral Systems, Albany, CA, USA). The visual stimuli were presented on a 14.1-in. monitor located 80 cm away from the subject's eyes. The video image was 20.6 cm (8.1 in.) high × 27.7 cm (10.9 in.) wide, centered on the monitor over a black background. The sound was presented at a fixed comfortable listening level (approximately 70 dB SPL) through a headset wide enough to cover the microphone of the CI external device. The input level of the sound stimuli was controlled across words and between individuals by keeping the output loudness of the stimulation PC constant. The noise level in the scanning room was approximately 50 dB SPL and the noise attenuation of the headset was approximately 20 dB, which ensured that perception of the auditory stimuli was not disrupted. After each stimulus, a log file was automatically created that recorded the response and the time interval between the stimulus and the response.
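The log file format is not described in the paper; the sketch below therefore assumes a hypothetical record layout purely to illustrate how response accuracy and response times (for correct responses only, as in the analyses reported below) could be derived from such logs:

import statistics

# Hypothetical log records: (stimulus_word, is_even_target, responded, rt_ms)
log = [
    ("2", True,  True,  612.0),
    ("7", False, False, None),
    ("8", True,  True,  805.0),
    ("3", False, True,  590.0),   # false alarm
]

hits = [r for r in log if r[1] and r[2]]
correct_rejections = [r for r in log if not r[1] and not r[2]]
accuracy = (len(hits) + len(correct_rejections)) / len(log)
median_rt = statistics.median(r[3] for r in hits)  # correct responses only
print(f"accuracy = {accuracy:.2f}, median RT = {median_rt:.0f} ms")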

Fig. 1 Post-CI speech score of CI users plotted as a function of deafness duration, showing no significant correlation (P = 0.46, Spearman's ρ = -0.27). Blank diamonds highlight two patients with abrupt deafness

Each scan was obtained for 3 min while each stimulus set was presented, and a 10–12 min intermission was given between conditions to take into account the H₂¹⁵O uptake and washout phases. For the A-only condition, each scan was subdivided into a 4-s instruction session and a 176-s target stimuli sequence. For con-AVS and inc-AVS, the 3-min stimulus session was subdivided into a 4-s instruction session, a 20-s period of random stimuli, a 90-s target stimuli sequence, and a 66-s period of random stimuli. For con-AVS and inc-AVS, both congruent and incongruent AV stimuli were presented during the periods of random stimuli to prevent the subject from losing focus during repetitive sequences of congruent or incongruent stimuli. Depending on the subject's body weight, an intravenous injection of 46 mCi or less of H₂¹⁵O was delivered simultaneously with the beginning of a new scanning sequence. Image acquisition was performed for 2 min starting from isotope injection, using an ECAT EXACT 47 (Siemens-CTI, Knoxville, TN) PET scanner (BGO crystal detector, spatial resolution 6.1 mm, axial resolution 4.3 mm, sensitivity 214 kcps/μCi/min) in two-dimensional mode with a 16.2-cm axial field of view. A transmission scan was performed using a 68Ga rod source to establish attenuation maps immediately before each emission scan. During the emission scan, 47 slices of brain emission images were acquired over a 2-min period, during which time subjects received minimal sensory input (dimmed light and silence). Emission images were reconstructed in a 128 × 128 × 47 matrix with a pixel size of 2.1 × 2.1 × 3.4 mm, using a filtered back-projection method with a Shepp filter with a cut-off value of 0.35 cycles/pixel. All reconstructed images were corrected for attenuation, and the transaxial images were realigned to produce sagittal and coronal images. Each scan was acquired in three-dimensional mode with retracted interplane septa and dynamic scans. For image analysis, the data for each scan consisted of the summation of the dynamic frames covering only the 90-s period of target stimuli. By inserting the target stimuli between the random stimuli and not informing the participants that each session contained a target stimuli period, we precluded the possibility that different attention levels modulated the auditory association cortices (van Atteveldt et al. 2007).
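To make the frame summation concrete, here is a minimal NumPy sketch; the 5-s frame duration, the number of frames, and the target onset used in the example are illustrative assumptions, since the paper states only that the dynamic frames covering the 90-s target period were summed:

import numpy as np

# Hypothetical dynamic series: 24 frames of 5 s each over the 2-min
# acquisition, shape (frames, x, y, z) with the reconstructed matrix size.
frames = np.random.rand(24, 128, 128, 47).astype(np.float32)
FRAME_S = 5.0

def sum_target_frames(dyn, onset_s, duration_s=90.0, frame_s=FRAME_S):
    """Sum the dynamic frames that fall inside the target-stimuli window."""
    first = int(onset_s // frame_s)
    last = int(np.ceil((onset_s + duration_s) / frame_s))
    return dyn[first:last].sum(axis=0)

# e.g., a target sequence starting 20 s into the acquisition:
static_img = sum_target_frames(frames, onset_s=20.0)
print(static_img.shape)  # (128, 128, 47)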

Data analysis

For behavioral data obtained during scanning, the significance of differences in accuracy and response time (the latter only for correct responses) among the three active stimulus conditions within each group, and between the CI user and control groups, was determined with the Mann–Whitney U test. Statistical analyses were performed using SPSS for Windows version 13.0 (SPSS, Chicago, IL). Two subjects (P03 and P08) were excluded from both behavioral and PET image analyses due to poor image quality. The two groups remained matched with regard to age, sex (P = 0.684 and P = 0.143, respectively, by the Mann–Whitney U test), and education level. Twenty subjects (ten per group) were included in the analysis of PET images. Image preprocessing (realignment, spatial normalization, and spatial smoothing with a 16-mm full width at half-maximum Gaussian kernel) and statistical analyses were carried out using the SPM5 package (Wellcome Department of Cognitive Neurology, London, UK) implemented in Matlab 7.1 (Mathworks Inc., Natick, MA). For all factorial design specifications, we used relative threshold masking of 0.8, an implicit mask, and proportional scaling. In individual analyses, global normalization of H₂¹⁵O uptake was applied so that the mean count of H₂¹⁵O uptake in the PET images of each subject's brain was arbitrarily set at 50. For each subject, three contrast images were created by subtracting the images of the baseline condition from the images of the A-only, con-AVS, and inc-AVS conditions. The outputs of these individual analyses were entered into second-level group analyses. The possible influence of age and sex was factored out by including age and sex as nuisance variables in all group analyses. The locations of significant clusters were determined initially with the Anatomy Toolbox implemented in SPM5 (Eickhoff et al. 2005), and were reconfirmed by referring to the Talairach and Tournoux atlas (Talairach and Tournoux 1988).
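As an illustration of the smoothing step, the sketch below shows how a 16-mm FWHM Gaussian kernel translates into per-axis smoothing of a volume with the reconstructed voxel size; this is a minimal Python/SciPy stand-in for SPM5's smoothing routine, with a random array as a placeholder for a real image:

import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_MM = 16.0                          # kernel used in this study
VOXEL_MM = np.array([2.1, 2.1, 3.4])    # reconstructed voxel size

# A Gaussian's FWHM relates to its standard deviation by
# FWHM = 2 * sqrt(2 * ln 2) * sigma, so convert to voxel units per axis.
sigma_vox = (FWHM_MM / (2.0 * np.sqrt(2.0 * np.log(2.0)))) / VOXEL_MM

volume = np.random.rand(128, 128, 47)   # placeholder for one PET volume
smoothed = gaussian_filter(volume, sigma=sigma_vox)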


Table 2 Neural activation under A-only, congruent AV, and incongruent AV stimuli

The control group
Region                       MNI (x y z)  BA     Cluster size  T

A-only – baseline
R MTG                        72 -12 -4    21     1,798         6.47
L MTG                        -62 -24 2    22/42  1,251         4.94
R STG                        68 -36 6     22/42  i.above       4.71
L STG                        -60 -2 -4    22     i.above       3.73
L MTG                        -64 -56 4    21     i.above       4.11

Congruent AV – baseline
R inferior temporal gyrus    56 -56 -16   37     4,125         5.91
R temporo-parietal junction  50 -40 20    42     i.above       5.46
R lingual gyrus              12 -88 -12   17     1,426         5.81
L MTG                        -70 -28 0    21     181           5.39

Incongruent AV – baseline
R lingual gyrus              12 -94 -10   17     1,889         6.62
R STG                        68 -36 6     22/42  2,168         5.15
R MTG                        72 -12 -6    21     i.above       5.04
R precentral gyrus           40 -8 26     6      i.above       3.93
L MTG                        -46 -54 2    21     i.above       3.69
L MTG, middle                -70 -28 -2   21     50            4.09
L MTG, post                  -64 -56 4    21     60            3.95

The CI user group
Region                       MNI (x y z)  BA     Cluster size  T

A-only – baseline
L STG                        -66 -26 0    21     1,088         5.35
L temporal pole              -54 6 -8     22     i.above       3.66
R STG                        72 -14 -2    21     316           4.04
L thalamus                   -14 -16 12          173           3.88
R thalamus                   4 -8 6                            3.49

Congruent AV – baseline
R lingual gyrus              12 -88 -12   17     1,121         4.84
R amygdala                   32 4 -14            97            3.70
L MTG                        -68 -30 6    22     34            3.51
L lingual gyrus              -22 -84 -4   18     49            3.62

Incongruent AV – baseline
R calcarine gyrus            12 -94 -6    17     1,248         5.31
L calcarine gyrus            -6 -104 -6   17     i.above       5.01
L STG                        -68 -30 4    22     934           5.23
R ITG                        58 -58 -16   37     183           4.13
L inferior frontal gyrus     -40 20 0     47     675           4.39
R STG                        74 -22 0     22     433           4.26
R pons                       6 -12 20            202           4.26

Uncorrected P < 0.001, k = 25 voxels, T = 3.25
CI cochlear implant, A-only auditory-only, AV audiovisual, L left, R right, STG superior temporal gyrus, STS superior temporal sulcus, MTG middle temporal gyrus, ITG inferior temporal gyrus, i.above included in the above cluster

Group analyses of activated cortical areas were carried out in a voxelwise manner with a flexible factorial design with group and condition as factors, by contrasting the brain activities for the three active stimulus conditions with the brain activity for the baseline condition, using a statistical threshold of P = 0.001, uncorrected (k = 25, T = 3.25) (Table 2). In addition, as the current study mainly focuses on the CI users' strategy of multisensory perception, two additional intra-group analyses were performed for the CI user group with the contrasts CI user group (inc-AVS – con-AVS) and CI user group (con-AVS – inc-AVS), using a statistical threshold of uncorrected P = 0.005 (k = 25, T = 2.67) and an inclusive mask at P = 0.005. Between-group analyses were then performed by subtracting the areas of increased H₂¹⁵O uptake in the control group from those of the CI group for the three active stimulus conditions, or vice versa, with a threshold of uncorrected P = 0.005 (k = 25, T = 2.67) (Table 3) and inclusive masks at P = 0.005. In order to disclose the areas of activation in the CI user group relative to those of the control group, a conjunction analysis was performed using a threshold of P = 0.005, uncorrected (T = 2.67) for all three active stimulus conditions. Moreover, group-by-condition interaction analyses were performed with the contrast "CI user group (inc-AVS – con-AVS) – control group (inc-AVS – con-AVS)" and vice versa, "control group (inc-AVS – con-AVS) – CI user group (inc-AVS – con-AVS)", with uncorrected P = 0.005 (T = 2.67) (Table 4). Finally, correlation analyses were performed in a voxelwise manner between the contrast images of each condition and (1) deafness duration, (2) word perception score (the aforementioned Korean PB word perception scores), and (3) duration of CI experience (uncorrected P < 0.005, k = 10, T = 3.50, with inclusive masks at P = 0.005; Tables 5, 6), with age and sex controlled for by treating them as nuisance variables. Areas with significant effects were selected and reported.

We adopted the statistical threshold of P = 0.001, uncorrected, for detecting intra-group main effects, similar to the previous literature (Lazard et al. 2011; Lee et al. 2007b). We adopted this threshold because the aim of the study was to explore differences between the two groups with regard to the activity of areas of higher cognitive processing after the primary sensory perception stage, and in this regard this was the most stringent post hoc threshold for detecting hypermetabolic effects in auditory areas under the A-only condition and in auditory and visual areas under the con-AVS condition in both groups. By comparing the results using the statistical threshold of P = 0.001, uncorrected, with those using P = 0.05, false discovery rate (FDR) corrected (Genovese et al. 2002), we confirmed that most of the results were replicable, except that there were no suprathreshold clusters under the con-AVS condition in the CI user group when the FDR-corrected threshold was adopted. Therefore, we report results obtained with the statistical threshold of P = 0.001, uncorrected. In addition, similar to previous works (Lazard et al. 2010, 2012b; Lee et al. 2001, 2003, 2007a, b), we adopted P = 0.005, uncorrected, masked with an inclusive mask at P = 0.005, for all group comparisons and correlation analyses, to detect effects in functionally relevant areas that were already statistically significant in intra-group analyses. In addition, to compensate for the relatively low statistical threshold, we focused on overlaps between functional relevance (obtained by correlation analyses) and the results from the conjunction analysis.
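The FDR correction referenced above (Genovese et al. 2002) amounts to the Benjamini–Hochberg step-up rule applied to the voxelwise P values; here is a minimal sketch, with simulated P values standing in for a real statistical map:

import numpy as np

def bh_fdr_threshold(pvals, q=0.05):
    """Benjamini-Hochberg step-up threshold: the largest p(i) with
    p(i) <= (i/m) * q; returns 0.0 if no voxel survives."""
    p = np.sort(np.asarray(pvals).ravel())
    m = p.size
    crit = q * np.arange(1, m + 1) / m
    passed = np.nonzero(p <= crit)[0]
    return float(p[passed[-1]]) if passed.size else 0.0

# Hypothetical voxelwise p-values from one contrast:
rng = np.random.default_rng(0)
pvals = rng.uniform(size=200_000)
print(bh_fdr_threshold(pvals, q=0.05))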

Table 3 Areas of relative activation in the CI user group compared to the control group under incongruent AV, congruent AV and A-only conditions, and vice versa

The CI user group > the control group
Region                      MNI (x y z)  BA  Cluster size  T

Under the A-only condition
No suprathreshold cluster

Under the congruent AV condition
R amygdala                  30 2 -12         78            3.49
L hippocampal tail          -22 -44 4        38            3.15

Under the incongruent AV condition
L inferior frontal gyrus    -44 14 8     47  47            3.04
L middle temporal sulcus    -42 -52 4    37  48            2.90

The control group > the CI user group
Region                      MNI (x y z)  BA  Cluster size  T

Under the A-only condition
L MTG                       -64 -62 4    39  45            4.74
R ITG                       56 -56 -18   37  162           2.93

Under the congruent AV condition
R vPMC                      46 -12 26    6   1,177         4.88
R supramarginal gyrus       46 -38 20    40  i.above       3.68
R ITG                       58 -56 -12   37  513           3.55
L inferior occipital        -26 -84 -6   18  83            3.52

Under the incongruent AV condition
R vPMC                      42 -8 24     6   716           4.57
R supramarginal gyrus       54 -38 22    40  57            3.41
R inferior occipital gyrus  16 -98 -22   18  70            3.32

Uncorrected P < 0.005, k = 25 voxels, T = 2.67 (masked at P = 0.005)
CI cochlear implant, A-only auditory-only, AV audiovisual, L left, R right, G gyrus, IFG inferior frontal gyrus, MTG middle temporal gyrus, ITG inferior temporal gyrus, vPMC ventral premotor cortex

Table 4 Areas activated in the conjunction and interaction analyses

Region                    MNI (x y z)  BA  Cluster size  T

Conjunction analysis: areas more activated in the CI user group than in the control group under A-only, incongruent AV, and congruent AV conditions
L superior frontal gyrus  -26 14 62    6   69            3.30
L medial frontal gyrus    -14 52 12    10  31            3.00

Interaction analysis: areas of relative activation in CI users under the congruent AV condition subtracted from those under the incongruent AV condition
L middle frontal gyrus    -46 30 42    8   4             2.81

Uncorrected P < 0.005, k = 0, T = 2.67
CI cochlear implant, A-only auditory-only, AV audiovisual, L left

Results

Behavioral data during the experiment

On intra-group comparison, the response accuracy under the inc-AVS condition was significantly poorer than that under the con-AVS condition in the CI user group (P = 0.003), while the control group showed no significant differences in response accuracy across the three stimuli (Fig. 2, left panel). Meanwhile, with regard to response time, the CI user group displayed slower responses for the inc-AVS condition than for the A-only or con-AVS conditions (P = 0.004 and 0.001, respectively), but the control group showed a significant difference only between the A-only and inc-AVS conditions (P = 0.045; Fig. 2, right panel). On inter-group comparison, the CI user group demonstrated overall poorer response accuracy than the NH control group (Fig. 2, left panel). There were no statistically significant differences between the two groups under the A-only and con-AVS conditions (P = 0.069 and 0.314, respectively). However, the CI user group revealed significantly lower accuracy than the NH group under the inc-AVS condition (P = 0.001).


Table 5 The activated areas positively or negatively related to the duration of deafness

Positive correlation
Region                     MNI (x y z)  BA  Cluster size  T

Under the A-only condition
n.s.

Under the congruent AV condition
L hippocampal tail         -12 -38 12       149           7.17
Claustrum                  30 8 -10         72            4.44
R lingual gyrus            16 -84 -14   18  195           4.04

Under the incongruent AV condition
n.s.

Negative correlation
Region                     MNI (x y z)  BA  Cluster size  T

Under the A-only condition
n.s.

Under the congruent AV condition
L superior temporal gyrus  -58 2 -12    22  18            4.21

Under the incongruent AV condition
n.s.

Uncorrected P < 0.005, k = 10 voxels, T = 3.50, with an inclusive mask at P = 0.005
CI cochlear implant, A-only auditory-only, AV audiovisual, n.s. not significant

Table 6 The activated areas positively or negatively related to the CI word score

Positive correlation
Region            MNI (x y z)  BA  Cluster size  T

Under the A-only condition
n.s.

Under the congruent AV condition
R amygdala        32 -2 20         27            5.55

Under the incongruent AV condition
n.s.

Negative correlation
Region            MNI (x y z)  BA  Cluster size  T

Under the A-only condition
n.s.

Under the congruent AV condition
L lingual gyrus   -12 -96 -6   18  132           7.46
R lingual gyrus   14 -90 -12   18  22            3.85

Under the incongruent AV condition
R lingual gyrus   18 -94 -10   18  134           5.25
L lingual gyrus   -14 -102 -6  17  76            4.99
L fusiform gyrus  -32 -64 -2   19  14            4.58

Uncorrected P < 0.005, k = 10 voxels, T = 3.50, with an inclusive mask at P = 0.005
CI cochlear implant, A-only auditory-only, AV audiovisual, n.s. not significant

Meanwhile, with regard to response time, the CI users showed overall slower responses than the controls (Fig. 2, right panel), and the discrepancies were prominent for the A-only and inc-AVS conditions (P = 0.009 and 0.007, respectively). In contrast, the difference between the two groups was not significant for the con-AVS condition (P = 0.314). These results confirm that the participating CI users indeed retain good, albeit slower, auditory speech perception ability compared with their normal hearing peers, and that with the support of congruent visual information their speech perception is behaviorally indistinguishable from that of the controls.

Fig. 2 Behavioral data showing significant intra-group differences among the three stimuli conditions in each group (asterisks), and differences between the two groups (daggers) for all three conditions with regard to the correct trials and response time. Error bars represent SE. A-only auditory-only, AVcon congruent audiovisual, AVinc incongruent audiovisual

Significant activation in three task conditions and group differences

Figure 3 shows the activated cortical areas in the two groups under all three active stimulus conditions, with areas showing significant group differences overlaid. The activated areas and statistical details are summarized in Tables 2 and 3.

Fig. 3 Main activation effects for the three conditions in CI users (yellow) and controls (azure) (uncorrected P < 0.001, k = 25 voxels, T = 3.25), as well as relative activation foci for the contrasts "CI users – Controls" (red) and "Controls – CI users" (blue) (uncorrected P < 0.005, k = 25 voxels, T = 2.67, masked at P < 0.005). CI cochlear implant, AV audiovisual

For the A-only condition, both groups revealed activation of the bilateral primary auditory and auditory association cortices (BAs 21, 22 and 42; Table 2). As for group differences, the CI group demonstrated no suprathreshold clusters compared to the control group. In contrast, greater activation in the NH controls relative to the CI users was observed in the left lateral middle temporal gyrus (MTG, BA 39) and the right inferior temporal gyrus (ITG, BA 37) (Fig. 3, upper panels). Under the con-AVS condition, both groups showed increased regional cerebral blood flow (rCBF) in the visual cortices (BAs 17 and 18). However, compared to the salient activation of the bilateral auditory cortices in the control group, the CI group showed a relatively small activation in the left MTG (BA 22). Unlike the CI group, the control group showed increased rCBF in the right ITG (BA 37) and the right temporo-parietal junction (BA 42; Table 2). The control group demonstrated significant rCBF increases in the right ventral premotor cortex (vPMC, BA 6), supramarginal gyrus (SMG, BA 40), and ITG (BA 37) relative to the CI user group. By contrast, the CI user group showed more activation than controls in the right amygdala and the left hippocampal tail (Table 3; Fig. 3, middle panels). In the inc-AVS condition, both groups revealed activations in the auditory (BAs 21, 22 and 42) and visual (BA 17) cortices. However, the control group showed increased rCBF in the right ITG (BA 37) under the con-AVS condition, whereas the CI user group additionally

showed a significant cluster in the left IFG (BA 47) (Table 2). Moreover, the CI group displayed significantly more activation in areas of the left IFG and the left MTG (BA 37) than controls, whereas more activation in the control group than in the CI group was found in the right vPMC, the right SMG, and the right inferior occipital gyrus (BA 18) (Fig. 3, lower panels; Fig. 4, left panel and bar graphs). For the contrast CI users (con-AVS – inc-AVS), no significant effects were found even at P = 0.01, uncorrected. However, for the contrast CI users (inc-AVS – con-AVS), a marginally significant effect in the left IFG was found (P = 0.006, T = 2.50, and k = 57).

Conjunction and interaction analysis

By means of the conjunction analysis, areas in the left superior frontal gyrus (SFG, BA 6) and left medial frontal gyrus (MeFG, BA 10) were found to be hyperactivated in the CI user group relative to the control group under all three active stimulus conditions (Table 4; Fig. 4). A group-by-condition interaction between the inc-AVS and con-AVS conditions was found in an area of the left middle frontal gyrus (MFG, BA 8), showing an effect for the contrast CI user group (inc-AVS – con-AVS) – control group (inc-AVS – con-AVS). Interactions in the opposite direction showed no suprathreshold clusters (Table 4; Fig. 4, right panel and bar graphs).



Fig. 4 In the left panel, brain regions where activity is higher in CI users than controls under the incongruent AV condition (red) are displayed over the main activation effects in CI users (yellow) and controls (azure) (from Fig. 3) (uncorrected P < 0.005, k = 25 voxels, T = 2.67). The right panel shows areas of conjunction of the "CI users – Controls" contrasts in all three conditions (green), and the group-by-congruency condition interaction [blue; CI users (incongruent AV – congruent AV) – Controls (incongruent AV – congruent AV)] (uncorrected P < 0.005, T = 2.67). Plots depict the relative effect size across groups and conditions in those regions where CI users exhibited overactivation relative to controls. A-only auditory-only, AVcon congruent audiovisual, AVinc incongruent audiovisual

Correlation analysis with clinical variables

The areas of neural activation under the three conditions that were positively or negatively correlated with deafness duration and CI speech score are summarized in Tables 5 and 6 (uncorrected P < 0.005). In the con-AVS condition, the right lingual gyrus (16, -84, -14) showed a positive correlation with deafness duration (Table 5; Fig. 5, right upper), while the left lingual gyrus (-12, -96, -6) showed a negative correlation with CI speech score (Fig. 5, left lower; Table 6). In addition, the right lingual gyrus (18, -94, -10) displayed a negative correlation with CI speech score (Fig. 5, right lower; Table 6). Additionally, in the con-AVS condition, CI users performed better when they showed higher activation in a region of the right amygdala (Table 6; Fig. 6, rightmost). With duration of CI experience as a covariate of interest, however, no significant correlations were found under any of the three active stimulus conditions.


Discussion

By merging multisensory information, one obtains a more reliable percept of the environment. Regions such as the IFG (Hein et al. 2007), PMC (Skipper et al. 2007), posterior parietal cortex (Noppeney et al. 2008), and ventral occipito-temporal cortex (Beauchamp et al. 2004b), as well

as the STS, have often been suggested as integrators of AV stimuli. However, there is a scarcity of knowledge about the neural substrates responsible for merging congruent AV stimuli in CI users, as well as those utilized by such individuals for selecting the critical information needed to resolve conflicting incongruent AV stimuli. In the current study, we attempted to illuminate the neural substrates involved in AV integration mechanisms in CI users in the context of the semantic congruency of audiovisual input. In brief, CI users were more vision-reliant when AV stimuli were congruent, whereas they adopted a top-down cognitive pathway when confronted with AV conflicts.

Fig. 5 Regions of early visual cortices showing correlations with clinical factors in deaf CI users. Brain regions correlated positively with deafness duration under the congruent AV condition (red), and negatively with the CI speech score under the congruent (pink) and incongruent (azure) AV conditions, are displayed over the main activation effects in CI users under the congruent (gray) and incongruent (black) AV conditions (from Fig. 3) (P < 0.005, k = 25 voxels, T = 2.67). Note that areas of the lingual gyri are correlated positively with deafness duration, but negatively with speech score. Circles highlight two patients with abrupt deafness. CI cochlear implant, AV audiovisual

Behavioral results for AV speech stimuli

Recent behavioral studies suggest that although CI users are typically biased toward visual cues when integrating competing AV stimuli (Rouger et al. 2008), particularly proficient CI users show comparable utilization of visual and auditory cues for both congruent (Doucet et al. 2006; Rouger et al. 2007; Tremblay et al. 2010) and incongruent AV stimuli (Champoux et al. 2009; Tremblay et al. 2010). These observations are partially in accordance with our results, which revealed no statistically significant difference with regard to response accuracy and response time between the CI users and NH controls for congruent AV stimuli. However, contrary to these previous observations, the performance of the CI users was markedly degraded by simultaneously presented visual distractors, whereas the NH controls were not affected by distracting visual inputs (Fig. 2, left panel).

Increased load of normal sound processing in CI users as compared to NH controls

We presented non-degraded auditory cues to both CI users and NH controls to investigate the neural substrates involved in the processing of bimodal stimuli that are


similar to stimuli encountered in our daily environment. Under the inc-AVS condition, the NH control group may have had an advantage over the CI user group due to better perception of the auditory cues. The significantly longer response times of the CI users (Fig. 2, right panel), despite no difference in response accuracy under the A-only condition, may also indicate an additional effort to process auditory input of lower quality in CI users. The two groups also showed slightly different patterns of activation in the A-only condition. In CI users, significant clusters in the bilateral STG extended to the temporal pole, which might suggest an increased load of semantic processing resulting from degraded auditory information (Vigneau et al. 2006). In contrast, normal controls showed increased activity relative to CI users in the left MTG and right ITG, suggesting a reduced auditory processing load by virtue of clearer sound and priming effects from the previous training sessions using AV stimuli.

Fig. 6 The right amygdala, where activity was higher in CI users than controls under the congruent AV condition (red), is displayed over the main activation effects in CI users (yellow) (P < 0.005, k = 25 voxels, T = 2.67). The plot depicts the relative effect size across groups and conditions in this area. Note the positive correlation between activation of the right amygdala and CI speech score (green). Circles highlight two patients with abrupt deafness. A-only auditory-only, AVcon congruent audiovisual, AVinc incongruent audiovisual

CI users rely more on speech reading than NH controls to process congruent AV stimuli even when required to attend only to speech sounds

For congruent AV stimuli, both CI users and NH controls demonstrated increased rCBF in areas of the auditory and visual sensory cortices. The increased rCBF in areas of the bilateral STS in NH controls replicates previous reports suggesting the STS as the core integrator of congruent AV stimuli (Werner and Noppeney 2010a; Beauchamp et al. 2010; van Atteveldt et al. 2004). By contrast, compared to the widely distributed bilateral activation in areas of the auditory cortices encompassing the STS in the NH group, the CI group revealed only slight activation in an area of the auditory cortex centered at the left MTG (Fig. 3). To compensate for auditory deprivation, post-lingually deaf subjects maintain oral comprehension by developing speech reading (Lazard et al. 2012; Lee et al. 2007b). Even several years after implantation, CI users maintain a high level of reliance on speech reading due to rudimentary and approximate auditory cues (Lazard et al. 2012; Rouger et al. 2007). Therefore, the CI user group may have processed congruent AV information mainly by utilizing visual cues with minimal assistance from auditory inputs, even though they had been instructed to respond only to specific auditory information while viewing, but ignoring, the visual information. In this regard, the far lower activation of the left auditory sensory cortical area in the CI group under the con-AVS condition as compared with the NH group implies habitual vision dependence in analyzing congruent AV stimuli.

Deafness-induced and CI-related functional reorganization was also found in the limbic system

In addition to the above-mentioned areas of AV processing, during the con-AVS condition two regions of the limbic system revealed increased activation in CI users as compared with NH controls, showing two different clinical profiles. In a region of the left hippocampus, higher activation was observed in CI users as compared to NH controls under the con-AVS condition, and this activation was positively correlated with



deafness duration (Tables 3, 5). This clinical correlation suggests that, to integrate speech sounds with speaking faces, CI users with a longer deafness duration tend to rely on stored auditory memory to overcome the impoverished quality of the auditory input. A unique activation by congruent stimuli in the CI group was also noticed in an area of the right amygdala (Fig. 6; Tables 2, 3). In addition, this activation was positively correlated with speech score (Fig. 6; Table 6). Activation of the amygdala has been posited to be related to higher cognitive working memory load (Schaefer et al. 2006; Yun et al. 2010) and to perceptual AV integration of emotions in NH individuals (Mesulam 1998; Kreifelts et al. 2010). Responding to auditory information while viewing, but ignoring, visual information may evoke some uncomfortable emotion (Song et al. 2013b; Vanneste et al. 2010) in CI users, who naturally rely more on visual than auditory information when AV stimuli are congruent. By contrast, the absence of amygdala activation in CI users under inc-AVS may indicate that they were paradoxically less disturbed while ignoring non-matching visual stimuli, even though they showed poorer performance under the inc-AVS condition. As CI users improve auditory performance along with audiovisual performance (Rouger et al. 2007), those with higher auditory performance might feel more uneasy because of their strong tendency to violate the instruction. Interestingly, a recent study on post-lingually deaf patients indicated that the right amygdala is activated by color imagery, whereas normal hearing controls activate the same region for auditory imagery (Lazard et al. 2011). Taken together with these reports, we may conjecture that the right amygdalar activation reflects a neural reorganization favoring visual stimuli. Although the region centered on the right amygdala that showed a positive correlation with speech score was relatively small (k = 27), it may be a functionally relevant area because it also overlaps with areas where CI users showed activation in both intra- and inter-group comparisons. As can be observed in Fig. 6, the areas of activation in CI users in the intra- and inter-group comparisons partially overlap. Considering that an area with significant activation in an inter-group contrast but without any significance at the intra-group level may have low functional relevance, the overlap of the area showing a positive correlation with those showing significance in the intra- and inter-group comparisons may be functionally important. This is also why we used inclusive masks for all the group comparisons and correlation analyses, and why we report areas of correlation even with relatively small cluster sizes, as those areas may be statistically meaningful.

NH individuals utilize the vPMC–SMG network for AV integration more than CI users

The right vPMC and SMG displayed increased activity in the NH control group relative to the CI user group for both the con-AVS and inc-AVS conditions (Fig. 3; Table 3). The increased activity in this network may reflect involvement of the mirror neuron system. Viewing another person's actions activates cortical areas belonging to the mirror neuron system, presumably to link action execution and observation; in particular, the observation of mouth movements elicits a covert motor plan to imitate the lip motion (Nishitani and Hari 2002). The premotor area (Skipper et al. 2005; Molenberghs et al. 2010) and the SMG (Aboitiz and Garcia 2009; Molenberghs et al. 2010) are frequently suggested to be sites of the human mirror neuron system. In addition, the vPMC–SMG pathway is involved in linking articulatory motor and somatosensory representations during speech perception (Skipper et al. 2007; Guenther et al. 2006). Considering that the NH controls were more experienced with speech production than the CI users, whose average duration of deafness was 7.6 years in our series, the relative activation of the vPMC–SMG network in NH controls may reflect decreased activity of the speech-related mirror neuron system in CI users. In this regard, the more salient difference in this network between the two groups for the con-AVS relative to the inc-AVS condition supports our interpretation of the differential activity of this network as the result of differences in the recruitment of the speech-related mirror neuron system between the two groups.

CI users adopt a top-down strategy to process auditory information with visual distractors

For the inc-AVS condition, NH individuals presented an activation pattern similar to that for the con-AVS condition, and CI users also demonstrated increased rCBF in areas of the bilateral auditory and visual cortices (Fig. 3; Table 2). However, the CI user group revealed distinct activation in an area of the left IFG, and this activation was also prominent in comparison with the NH control group (Fig. 4; Table 3). In addition, the CI user group displayed trend-level activation in an area of the left IFG in the intra-group comparison using the contrast CI users (inc-AVS – con-AVS). An examination of the relative activation across conditions (Fig. 4, lower left) shows that the relative increase of activity occurred in the two conditions (A-only and inc-AVS) in which auditory decisions were not aided by congruent visual input. This suggests greater difficulty and increased cognitive load in the absence of visual aids, a situation further exacerbated by visual distractors.



The results for the NH control group are consistent with previous reports arguing for the STS as the "bottom-up" integrator of AV conflicts (van Atteveldt et al. 2004; Hein et al. 2007). In contrast, the increased rCBF in the left IFG encompassing Broca's area indicates "top-down" processing of the same AV conflict in the CI user group. This is intuitively plausible because CI users rely more on visual than auditory inputs. Hence, if the task is to extract auditory cues, they need to selectively neglect the visual cues that are their primary source of information. The IFG has been reported to be an area that subserves cognitive control (Fletcher and Henson 2001; Koechlin et al. 2003). Functional studies in rats with damaged IFG have demonstrated functional deficits in the detection of response conflict (Haddon and Killcross 2006; Marquis et al. 2007). In this regard, the left IFG may serve as a center for selective auditory cue extraction in the milieu of AV conflict. Another possible explanation for this IFG activation is stop-signal inhibition for AV conflict (Aron et al. 2003). That is, vision-reliant CI users may have utilized the IFG, an area of response inhibition (Garavan et al. 1999; Bunge et al. 2002), due to overrepresented visual processing under the inc-AVS condition.

The conjunction analysis, which revealed areas in the left SFG and MeFG as significantly activated regions in the CI users relative to the NH controls under all three stimulus conditions (Fig. 4; Table 4), may also indicate uniformly increased activity of higher cognitive functions for auditory processing in CI users. The SFG contributes particularly to working memory (Park et al. 2011), and the MeFG forms the apex of the executive system for decision-making (Koechlin and Hyafil 2007). These areas have also been reported to be activated by incongruent AV stimuli in NH subjects (Adam and Noppeney 2010; Jones and Callan 2003). From these viewpoints, the increased activity in areas of the left SFG and MeFG under all stimulus conditions may indicate a higher cognitive load in processing auditory stimuli in CI users. The group-by-congruency interaction analysis yielded an area in the left MFG (Fig. 4, right lower; Table 4) that may also indicate different applications of cognitive resources between the groups. This area was deactivated in NH controls when ignoring visual conflict, whereas it was activated in CI users by congruent visual input. By virtue of previous sensory experience, task difficulty and attentional load differed between the groups according to the visual stimuli delivered simultaneously with the auditory stimuli (Goldberg et al. 2007; Ruff et al. 2010).

One discrepancy between our results and prior studies should be addressed. While several investigators have described increased activity in the IFG of NH individuals for incongruent AV stimuli (Szycik et al. 2009; Bernstein


et al. 2008; Nath and Beauchamp 2012), the NH subjects in our study did not show such activation. This may be attributed to differences in language and task complexity between the current and previous studies. Korean numbers can be differentiated discretely with regard to articulatory orofacial movements, whereas previous studies used less easily discernible words, such as disyllabic words comprised of rhyming AV cues (Szycik et al. 2009) or monosyllabic McGurk words (Bernstein et al. 2008; Nath and Beauchamp 2012). Therefore, IFG activation for top-down auditory cue extraction may have been less critical in the current study compared with previous studies.

Previous deafness-induced plasticity in the visual cortex affects current CI outcome

The negative correlation between speech score and activity in areas of the bilateral lingual gyri (Fig. 5, lower panels; Table 6) may indicate that CI users with poorer performance rely more on visual cues, as they did during the period of deafness. In post-lingually deaf patients prior to implantation, increased metabolism in the visual areas has been reported as a predictor of poorer CI outcome (Giraud and Lee 2007). In the current study, for both AV stimuli, those CI users with the most increased activity in early visual areas were least successful in auditory perception with the CI device (Fig. 5, lower). The positive correlation with deafness duration (Fig. 5, right upper) confirms that previous deafness-induced plasticity in these areas is related to current CI auditory ability. Nonetheless, we should take the degraded input delivered by the implant into account when analyzing increased activity in the visual cortices.

A crucial difference between the current study and a recent study should be addressed. While the current study revealed negative correlations between the activity of the visual cortex and CI speech score for both AV conditions, a recent study indicated that the most significant activity cluster correlated with CI speech outcomes was observed in the visual occipital cortex for congruent AV stimuli (Strelnikov et al. 2013). This stark difference may have resulted from two factors. First, while our CI users' average duration of CI usage before the PET scan was 66.7 months, that of Strelnikov's study was 7.6 days. Considering that intra-modal compensation by speech reading is the primary source of speech processing during the period of deafness, about a week of CI usage may have been insufficient for inducing cross-modal sensory restoration by AV interaction, due to the crude nature of the CI-derived sound, and thus the CI subjects in Strelnikov's study may have benefited from their speech-reading ability while processing AV stimuli. In contrast, the average CI usage of six and a half years in our participants may have enabled sufficient cross-modal compensation, and


A crucial difference between the current study and a recent study should also be addressed. While the current study revealed negative correlations between visual cortex activity and CI speech score for both AV conditions, a recent study reported that the activity cluster most significantly correlated with CI speech outcome was located in the visual occipital cortex for congruent AV stimuli (Strelnikov et al. 2013). This stark discrepancy may have two explanations. First, while our CI users' average duration of CI usage before the PET scan was 66.7 months, that of Strelnikov's study was 7.6 days. Considering that intra-modal compensation by speech reading is the primary source of speech processing during the period of deafness, about a week of CI usage may have been insufficient to induce cross-modal sensory restoration by AV interaction, given the crude nature of the CI-derived sound; thus, the CI subjects in Strelnikov's study may have benefitted from their speech-reading ability while processing AV stimuli. In contrast, the average CI usage of six and a half years in our participants may have enabled sufficient cross-modal compensation, and therefore they performed better when they were less dependent on previously dominant visual cortical activity. Second, while our CI users' average duration of deafness was 7.6 ± 6.6 years, that of Strelnikov's study was more than 16.4 years. Deaf individuals may rely more on their speech-reading skills to process audiovisual information as the duration of deafness becomes longer. Therefore, we surmise that CI users with a relatively long duration of deafness, as in Strelnikov's cohort, may rely more on visual cortical activity due to deafness-induced plasticity and perform better when they activate the visual cortex more, whereas those with a shorter duration of deafness, as in our cohort, may utilize auditory information more, so that visual cortical activity negatively affects their speech perception. However, as acknowledged above, our cohort showed no correlation between speech perception score and duration of deafness, probably because we recruited only good performers. In contrast, Strelnikov et al. recruited subjects with relatively long durations of deafness and various outcomes (those with ≥20 years of deafness ranged from 15 to 85 % in speech perception accuracy). Therefore, our analyses focusing on the duration of deafness in isolation are limited in that our cohort is not representative of general CI users, and general deafness-induced plasticity cannot be evaluated from our cohort alone. Future studies with larger numbers of subjects varying in duration of deafness, duration of CI usage, and speech outcome should be performed to establish whether visual cortical activity is deleterious or beneficial depending on the duration of deafness.

To look for a reversal of the deafness-related decrease in left STG/MTG activity following hearing restoration with a CI (Rouger et al. 2012), we also performed correlation analyses with duration of CI experience. These indeed suggested a progressive increase of activity in the right STG–STS under all three stimulus conditions, but the effect did not reach the statistical threshold used for correlation analyses in this study (uncorrected P = 0.005, k = 10, T = 3.50); confirmation therefore awaits future studies with more subjects.
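For concreteness, the height-and-extent threshold quoted above can be illustrated as follows. The sketch assumes 7 residual degrees of freedom, chosen only because a one-tailed P = 0.005 cutoff then equals T ≈ 3.50 as quoted in the text, and it uses a simulated noise map rather than the actual SPM output.

import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(2)
t_map = rng.standard_normal((40, 48, 40))  # toy 3-D "t-map" of pure noise

dof = 7                                    # assumed residual degrees of freedom
t_crit = stats.t.isf(0.005, dof)           # one-tailed P = 0.005 height cutoff
print(f"height threshold: T = {t_crit:.2f}")

supra = t_map > t_crit                     # voxels above the height threshold
labels, n_clusters = ndimage.label(supra)  # connected-component clustering
sizes = ndimage.sum_labels(supra, labels, index=np.arange(1, n_clusters + 1))

# Keep only clusters with an extent of at least k = 10 contiguous voxels;
# for pure noise, typically none survive.
keep = np.flatnonzero(sizes >= 10) + 1
mask = np.isin(labels, keep)
print("surviving clusters:", keep.size, "| surviving voxels:", int(mask.sum()))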

Limitations of the current study and proposed future studies

To the best of our knowledge, the current study is the first to identify candidate neural substrates for the processing of congruent and incongruent AV stimuli in CI users. By disentangling the characteristic mechanisms of CI-assisted speech-processing strategies, these results may serve as a milestone for future studies investigating multisensory integration in CI users. Due to the limited number of subjects, we could not recruit a sample of CI users with a homogeneous mode of deafness, i.e., sudden or progressive hearing loss. Two of the ten CI users were deafened by sudden hearing loss, and they might differ from those with progressive hearing loss in the plastic changes of their brains. Although these two participants did not exhibit unusual behavioral results (they are indicated by circles in all correlation plots), future studies with a larger series of participants should be performed to exclude bias originating from such etiologic differences. In addition, although the influence of sex was factored out by including it as a nuisance variable in all group and correlation analyses, a residual effect of sex may still have affected the results and their interpretation, because previous literature has indicated that sex differences affect cortical speech processing (Bitan et al. 2010; Kempe et al. 2012; Koles et al. 2010). Future studies with better sex-matched groups should be performed to exclude this possible bias. Finally, because we focused on the effect of agreeing or conflicting visual inputs on auditory processing in CI subjects, we did not include a visual-only (V-only; lip reading) condition in the current study paradigm. Considering that previous studies have revealed identical performance levels (Tremblay et al. 2010; Rouger et al. 2008) but different cortical activation patterns (Rouger et al. 2012) between NH controls and CI subjects under the V-only condition, future studies comparing V-only and AV stimuli may help us understand post-CI changes in visual and audiovisual speech-processing networks.

Conclusions

Taken together, comparisons of AV integration circuits in NH controls and CI users delineated the neural substrates involved in multimodal speech processing in CI users. With multimodal input, CI users are vision-reliant when processing congruent AV stimuli and are disturbed more by visual distractors than NH controls when confronted with incongruent AV stimuli. To cope with this multimodal conflict, CI users activate prefrontal areas such as the left IFG to adopt a top-down cognitive modulation pathway, whereas NH individuals primarily adopt a bottom-up strategy and utilize multisensory integrators such as the STS or the right vPMC–SMG pathway. In sum, deafness-induced plasticity makes CI users depend more on visual processing and on higher cognitive pathways to cope with multimodal environments.

Acknowledgments The authors thank Dr. Yong-Hwi Ahn for his support on the manuscript. The first author also thanks Dr. DY Yoon for precious support to the study. This work was supported by the Korean government (MOST) through the Korea Science and Engineering Foundation (KOSEF) (no. 2012-0030102).



References

Aboitiz F, Garcia R (2009) Merging of phonological and gestural circuits in early language evolution. Rev Neurosci 20:71–84
Adam R, Noppeney U (2010) Prior auditory information shapes visual category-selectivity in ventral occipito-temporal cortex. Neuroimage 52:1592–1602
Archbold S, Lutman ME, Marshall DH (1995) Categories of auditory performance. Ann Otol Rhinol Laryngol Suppl 166:312–314
Aron AR, Fletcher PC, Bullmore ET, Sahakian BJ, Robbins TW (2003) Stop-signal inhibition disrupted by damage to right inferior frontal gyrus in humans. Nat Neurosci 6:115–116
Beauchamp MS (2005) Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics 3:93–113
Beauchamp MS, Argall BD, Bodurka J, Duyn JH, Martin A (2004a) Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat Neurosci 7:1190–1192
Beauchamp MS, Lee KE, Argall BD, Martin A (2004b) Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41:809–823
Beauchamp MS, Nath AR, Pasalar S (2010) fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. J Neurosci 30:2414–2417
Bernstein LE, Auer ET Jr, Wagner M, Ponton CW (2008) Spatiotemporal dynamics of audiovisual speech processing. Neuroimage 39:423–435
Bitan T, Lifshitz A, Breznitz Z, Booth JR (2010) Bidirectional connectivity between hemispheres occurs at multiple levels in language processing but depends on sex. J Neurosci 30:11576–11585
Bunge SA, Dudukovic NM, Thomason ME, Vaidya CJ, Gabrieli JD (2002) Immature frontal lobe contributions to cognitive control in children: evidence from fMRI. Neuron 33:301–311
Calvert GA, Campbell R, Brammer MJ (2000) Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol 10:649–657
Champoux F, Lepore F, Gagne JP, Theoret H (2009) Visual stimuli can impair auditory processing in cochlear implant users. Neuropsychologia 47:17–22
Corbetta M, Shulman GL (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci 3:201–215
Doucet ME, Bergeron F, Lassonde M, Ferron P, Lepore F (2006) Cross-modal reorganization and speech perception in cochlear implant users. Brain 129:3376–3383
Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K, Zilles K (2005) A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 25:1325–1335
Finney EM, Fine I, Dobkins KR (2001) Visual stimuli activate auditory cortex in the deaf. Nat Neurosci 4:1171–1173
Fletcher PC, Henson RN (2001) Frontal lobes and human memory: insights from functional neuroimaging. Brain 124:849–881
Garavan H, Ross TJ, Stein EA (1999) Right hemispheric dominance of inhibitory control: an event-related functional MRI study. Proc Natl Acad Sci USA 96:8301–8306
Genovese CR, Lazar NA, Nichols T (2002) Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage 15:870–878
Giraud AL, Lee HJ (2007) Predicting cochlear implant outcome from brain organisation in the deaf. Restor Neurol Neurosci 25:381–390
Giraud AL, Truy E (2002) The contribution of visual areas to speech comprehension: a PET study in cochlear implants patients and normal-hearing subjects. Neuropsychologia 40:1562–1569


Giraud AL, Price CJ, Graham JM, Truy E, Frackowiak RS (2001) Cross-modal plasticity underpins language recovery after cochlear implantation. Neuron 30:657–663
Goldberg RF, Perfetti CA, Fiez JA, Schneider W (2007) Selective retrieval of abstract semantic knowledge in left prefrontal cortex. J Neurosci 27:3790–3798
Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang 96:280–301
Haddon JE, Killcross S (2006) Prefrontal cortex lesions disrupt the contextual control of response conflict. J Neurosci 26:2933–2940
Hein G, Doehrmann O, Muller NG, Kaiser J, Muckli L, Naumer MJ (2007) Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. J Neurosci 27:7881–7887
Jones JA, Callan DE (2003) Brain activity during audiovisual speech perception: an fMRI study of the McGurk effect. Neuroreport 14:1129–1133
Kempe V, Thoresen JC, Kirk NW, Schaeffler F, Brooks PJ (2012) Individual differences in the discrimination of novel speech sounds: effects of sex, temporal processing, musical and cognitive abilities. PLoS One 7:e48623
Koechlin E, Hyafil A (2007) Anterior prefrontal function and the limits of human decision-making. Science 318:594–598
Koechlin E, Ody C, Kouneiher F (2003) The architecture of cognitive control in the human prefrontal cortex. Science 302:1181–1185
Koles ZJ, Lind JC, Flor-Henry P (2010) Gender differences in brain functional organization during verbal and spatial cognitive challenges. Brain Topogr 23:199–204
Kral A, O'Donoghue GM (2010) Profound deafness in childhood. N Engl J Med 363:1438–1450
Kreifelts B, Ethofer T, Huberle E, Grodd W, Wildgruber D (2010) Association of trait emotional intelligence and individual fMRI-activation patterns during the perception of social signals from voice and face. Hum Brain Mapp 31:979–991
Lazard DS, Lee HJ, Gaebler M, Kell CA, Truy E, Giraud AL (2010) Phonological processing in post-lingual deafness and cochlear implant outcome. Neuroimage 49:3443–3451
Lazard DS, Giraud AL, Truy E, Lee HJ (2011) Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation. Neuropsychologia 49:2475–2482
Lazard DS, Giraud AL, Gnansia D, Meyer B, Sterkers O (2012) Understanding the deafened brain: implications for cochlear implant rehabilitation. Eur Ann Otorhinolaryngol Head Neck Dis 129:98–103
Lazard DS, Lee HJ, Truy E, Giraud AL (2013) Bilateral reorganization of posterior temporal cortices in post-lingual deafness and its relation to cochlear implant outcome. Hum Brain Mapp 34:1208–1219
Lee DS, Lee JS, Oh SH, Kim SK, Kim JW, Chung JK, Lee MC, Kim CS (2001) Cross-modal plasticity and cochlear implants. Nature 409:149–150
Lee JS, Lee DS, Oh SH, Kim CS, Kim JW, Hwang CH, Koo J, Kang E, Chung JK, Lee MC (2003) PET evidence of neuroplasticity in adult auditory cortex of postlingual deafness. J Nucl Med 44:1435–1439
Lee HJ, Giraud AL, Kang E, Oh SH, Kang H, Kim CS, Lee DS (2007a) Cortical activity at rest predicts cochlear implantation outcome. Cereb Cortex 17:909–917
Lee HJ, Truy E, Mamou G, Sappey-Marinier D, Giraud AL (2007b) Visual speech circuits in profound acquired deafness: a possible role for latent multimodal connectivity. Brain 130:2929–2941
Marquis JP, Killcross S, Haddon JE (2007) Inactivation of the prelimbic, but not infralimbic, prefrontal cortex impairs the contextual control of response conflict in rats. Eur J Neurosci 25:559–566

McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748
Meredith MA, Stein BE (1983) Interactions among converging sensory inputs in the superior colliculus. Science 221:389–391
Mesulam MM (1998) From sensation to cognition. Brain 121(Pt 6):1013–1052
Miller LM, D'Esposito M (2005) Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. J Neurosci 25:5884–5893
Molenberghs P, Brander C, Mattingley JB, Cunnington R (2010) The role of the superior temporal sulcus and the mirror neuron system in imitation. Hum Brain Mapp 31:1316–1326
Nath AR, Beauchamp MS (2011) Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. J Neurosci 31:1704–1714
Nath AR, Beauchamp MS (2012) A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. Neuroimage 59:781–787
Nishitani N, Hari R (2002) Viewing lip forms: cortical dynamics. Neuron 36:1211–1220
Noppeney U, Josephs O, Hocking J, Price CJ, Friston KJ (2008) The effect of prior visual information on recognition of speech and sounds. Cereb Cortex 18:598–609
Park H, Kang E, Kang H, Kim JS, Jensen O, Chung CK, Lee DS (2011) Cross-frequency power correlations reveal the right superior temporal gyrus as a hub region during working memory maintenance. Brain Connect 1:460–472
Rauschecker JP (1999) Auditory cortical plasticity: a comparison with other sensory systems. Trends Neurosci 22:74–80
Ross LA, Saint-Amour D, Leavitt VM, Javitt DC, Foxe JJ (2007) Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cereb Cortex 17:1147–1153
Rouger J, Lagleyre S, Fraysse B, Deneve S, Deguine O, Barone P (2007) Evidence that cochlear-implanted deaf patients are better multisensory integrators. Proc Natl Acad Sci USA 104:7295–7300
Rouger J, Fraysse B, Deguine O, Barone P (2008) McGurk effects in cochlear-implanted deaf subjects. Brain Res 1188:87–99
Rouger J, Lagleyre S, Demonet JF, Fraysse B, Deguine O, Barone P (2012) Evolution of crossmodal reorganization of the voice area in cochlear-implanted deaf patients. Hum Brain Mapp 33:1929–1940
Ruff DA, Marrett S, Heekeren HR, Bandettini PA, Ungerleider LG (2010) Complementary roles of systems representing sensory evidence and systems detecting task difficulty during perceptual decision making. Front Neurosci 4:190
Schaefer A, Braver TS, Reynolds JR, Burgess GC, Yarkoni T, Gray JR (2006) Individual differences in amygdala activity predict response speed during working memory. J Neurosci 26:10120–10128
Skipper JI, Nusbaum HC, Small SL (2005) Listening to talking faces: motor cortical activation during speech perception. Neuroimage 25:76–89
Skipper JI, Goldin-Meadow S, Nusbaum HC, Small SL (2007) Speech-associated gestures, Broca's area, and the human mirror system. Brain Lang 101:260–277
Song JJ, Choi HG, Oh SH, Chang SO, Kim CS, Lee JH (2009) Unilateral sensorineural hearing loss in children: the importance of temporal bone computed tomography and audiometric follow-up. Otol Neurotol 30:604–608
Song JJ, De Ridder D, Van de Heyning P, Vanneste S (2012a) Mapping tinnitus-related brain activation: an activation-likelihood estimation metaanalysis of PET studies. J Nucl Med 53:1550–1557

Song JJ, Yoo YT, An YH, Yoo JC, Kim JS, Koo JW (2012b) Comorbid benign paroxysmal positional vertigo in idiopathic sudden sensorineural hearing loss: an ominous sign for hearing recovery. Otol Neurotol 33:137–141
Song JJ, Punte AK, De Ridder D, Vanneste S, Van de Heyning P (2013a) Neural substrates predicting improvement of tinnitus after cochlear implantation in patients with single-sided deafness. Hear Res 299:1–9
Song JJ, Vanneste S, Schlee W, Van de Heyning P, De Ridder D (2013b) Onset-related differences in neural substrates of tinnitus-related distress: the anterior cingulate cortex in late-onset tinnitus, and the frontal cortex in early-onset tinnitus. Brain Struct Funct [Epub ahead of print]
Strelnikov K, Rosito M, Barone P (2011) Effect of audiovisual training on monaural spatial hearing in horizontal plane. PLoS One 6:e18344
Strelnikov K, Rouger J, Demonet JF, Lagleyre S, Fraysse B, Deguine O, Barone P (2013) Visual activity predicts auditory recovery from deafness after adult cochlear implantation. Brain 136:3682–3695
Sumby WH, Pollack I (1954) Visual contribution to speech intelligibility in noise. J Acoust Soc Am 26:212–215
Szycik GR, Jansma H, Munte TF (2009) Audiovisual integration during speech comprehension: an fMRI study comparing ROI-based and whole brain analyses. Hum Brain Mapp 30:1990–1999
Talairach J, Tornoux P (1988) Co-planar stereotaxic atlas of the human brain: 3-dimensional proportional system: an approach to cerebral imaging. Georg Thieme, Stuttgart
Talsma D, Doty TJ, Woldorff MG (2007) Selective attention and audiovisual integration: is attending to both modalities a prerequisite for early integration? Cereb Cortex 17:679–690
Tremblay C, Champoux F, Lepore F, Theoret H (2010) Audiovisual fusion and cochlear implant proficiency. Restor Neurol Neurosci 28:283–291
van Atteveldt N, Formisano E, Goebel R, Blomert L (2004) Integration of letters and speech sounds in the human brain. Neuron 43:271–282
van Atteveldt NM, Formisano E, Goebel R, Blomert L (2007) Top-down task effects overrule automatic multisensory responses to letter-sound pairs in auditory association cortex. Neuroimage 36:1345–1360
van Wassenhove V, Grant KW, Poeppel D (2005) Visual speech speeds up the neural processing of auditory speech. Proc Natl Acad Sci USA 102:1181–1186
Vanneste S, Plazier M, der Loo E, de Heyning PV, Congedo M, De Ridder D (2010) The neural correlates of tinnitus-related distress. Neuroimage 52:470–480
Vigneau M, Beaucousin V, Herve PY, Duffau H, Crivello F, Houde O, Mazoyer B, Tzourio-Mazoyer N (2006) Meta-analyzing left hemisphere language areas: phonology, semantics, and sentence processing. Neuroimage 30:1414–1432
Werner S, Noppeney U (2010a) Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J Neurosci 30:2662–2675
Werner S, Noppeney U (2010b) Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cereb Cortex 20:1829–1842
Yun RJ, Krystal JH, Mathalon DH (2010) Working memory overload: fronto-limbic interactions and effects on subsequent working memory function. Brain Imaging Behav 4:96–108

