Brain and Cognition 90 (2014) 19–31


Visual enhancement of auditory beat perception across auditory interference levels

Yi-Huang Su

Department of Movement Science, Faculty of Sport and Health Science, Technical University of Munich, Germany


Article history: Accepted 8 May 2014

Keywords: Rhythm perception; Multisensory perception; Visual beat; Point-light figure; Inverse effectiveness; Sensorimotor synchronization

Abstract

This study dealt with audiovisual rhythm perception involving an observed movement. Two experiments investigated whether a visual beat conveyed by a bouncing human point-light figure facilitated beat perception of concurrent auditory rhythms, and whether this enhancement followed a profile of multisensory integration. In Experiment 1, participants listened to three repetitions of a metrically simple rhythm and detected a perturbation in the third repetition. The rhythm was presented alone or with a visual beat in phase with it. Both conditions were presented with or without an auditory interference sequence at four increasing tempi, which served to progressively weaken the beat of the auditory rhythm. In Experiment 2, participants tapped to a regular auditory beat in the same combinations of visual beat and auditory interference. Results showed that the visual beat improved the perception of (Experiment 1) and the synchronization to (Experiment 2) the auditory rhythms. Moreover, in both experiments, visual enhancement was greater when performance in the unisensory (auditory) conditions was poorer, consistent with the principle of inverse effectiveness. The relative multisensory gain increased as auditory performance deteriorated, except at one intermediate level. Together these results demonstrate that rhythmic visual movement aids auditory rhythm perception, which may be subserved by a perceptually integrated audiovisual beat that couples with the internal motor system.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

"Rhythm" typically refers to temporally organized patterns found in auditory stimuli such as music and speech (Large, 2008). However, humans' experiences of rhythms are often multimodal, involving visual information: e.g., we observe a musician's movements as he or she plays an instrument, we watch a dancer moving to the music, we hear the footsteps of a person while seeing the approaching gait, or we read the lip movements of someone speaking in front of us. These scenarios exemplify how auditory rhythms are closely and pervasively connected to rhythmic human movements that either give rise to, or are coordinated with, such auditory events (Merker, Madison, & Eckerdal, 2009). The effect of movements communicated visually along with the auditory rhythms has been well examined in speech: Rhythmic facial movements, such as lip movements coupled to the 3–8 Hz rhythm of syllable production, are known to facilitate auditory speech perception (Chandrasekaran, Trubanova, Stillittano, Caplier, &

Ghazanfar, 2009; Ghazanfar, 2013; Ghazanfar, Chandrasekaran, & Morrill, 2010). Although speech and music are both unique means of human communication, multisensory perception of musical rhythms remains poorly understood. For example, does the observation of a dancer's movement reinforce the perceived rhythm of the music? The present research was concerned with multisensory perception in this context. Of particular interest was whether the perception of concurrent auditory and visual rhythms – with the latter conveyed by an observed movement coordinated with the auditory rhythm – followed principles similar to those established in multisensory integration. Auditory musical rhythms often give rise to the perception of a regular beat, to which listeners naturally synchronize through various overt or covert movements (Burger, Thompson, Luck, Saarikallio, & Toiviainen, 2013; Drake, Jones, & Baruch, 2000; Merker et al., 2009; Su & Pöppel, 2012). Besides prompting listeners to move along, the presence of a beat facilitates rhythm perception (Grahn, 2012; Grahn & Brett, 2007; Grube & Griffiths, 2009; Povel & Essens, 1985) and motor synchronization to the rhythms (Patel, Iversen, Chen, & Repp, 2005; Repp, Iversen, & Patel, 2008). It is commonly found that the capacity for rhythm and beat perception differs between the two major senses, audition


and vision. When the same task is presented through comparable auditory (typically successive tones) and visual stimuli (until recently, repetitive flashes), performance in the former well surpasses that in the latter with regard to rhythmic interval timing (Grondin & McAuley, 2009), beat perception (Grahn, 2012; Grahn, Henry, & McAuley, 2011; McAuley & Henry, 2010), and sensorimotor synchronization (Jäncke, 2000; Kato & Konishi, 2006; Lorås, Sigmundsson, Talcott, Öhberg, & Stensdotter, 2012; Patel et al., 2005; Repp, 2003; see Section 1.4.2 in Repp & Su, 2013). Similarly, when concurrent auditory and visual rhythms are pitted against each other in a target–distractor paradigm (Guttman, Gilroy, & Blake, 2005; Repp & Penel, 2002; Repp & Penel, 2004), an auditory distractor affects the visual task considerably, whereas a visual distractor has minimal influence on the auditory task. The modality differences are also reflected in the relevant (sub)cortical motor areas underlying the performance (Grahn et al., 2011; Section 4.2.6 in Repp & Su, 2013). As rhythm tasks essentially tap into temporal processing of sensory stimuli, the frequently observed auditory advantage seems to be explained by the modality appropriateness hypothesis (Welch & Warren, 1980), which postulates that audition and vision are better at temporal and spatial tasks, respectively, and that the more competent modality for the task dominates. However, the idea of a general visual inferiority in the rhythm domain has recently been challenged, as there is growing evidence of the rhythmic capacity of dynamic visual stimuli. For example, apparent motion of an object improves visual rhythm perception (Brandon & Saffran, 2011; Grahn, 2012) and synchronization (Hove, Fairhurst, Kotz, & Keller, 2013a; Hove, Iversen, Zhang, & Repp, 2013b; Hove, Spivey, & Krumhansl, 2010), as compared to the repetitive, stationary flashes adopted in earlier studies. Hove et al. (2013b) even found that, for visually trained participants (video gamers), a bouncing ball as a visual distractor had a stronger effect than an auditory distractor. Furthermore, similar activations in the putamen – an area strongly implicated in beat perception (Grahn & Brett, 2007; Grahn et al., 2011) – have been found for synchronization with a moving visual stimulus (a bar moving in space) and with an auditory metronome (Hove et al., 2013a). It thus appears that rich motion information in the visual stimuli is one key to optimizing visual rhythm processing. This may be because (periodic) visual motion, like auditory rhythms, is capable of coupling with humans' internal motor system that supports rhythm and beat perception (Section 4.1.2 in Repp & Su, 2013). Inspired by these findings, and by the link between rhythm perception and human movement (Phillips-Silver & Trainor, 2007; Su & Pöppel, 2012), a recent study examined the effect of a visual stimulus whose motion profile was extracted from a natural human movement (Su, 2014b). In that study, a periodic visual stimulus was presented as a continuously bouncing human point-light figure (PLF, Johansson, 1973; see Visual stimuli in Section 2.1.2), which had been generated by recording a real human bouncing regularly. The visual beat was perceived around each bounce (maximal knee flexion) of the continuous movement, mirroring how humans would move to a musical beat (Phillips-Silver & Trainor, 2007; see also Toiviainen, Luck, & Thompson, 2010, for converging evidence of a musical pulse/beat being embodied in humans' vertical body movements).
It was found that such a visual beat, when combined with an auditory beat (i.e., yielding a bimodal beat), modulated beat perception of complex auditory rhythms, and the phase of the visual beat had more influence than that of the concurrent auditory beat. In that study, however, the effect of a unimodal visual beat was less clear because the employed auditory rhythms had complex temporal structures (the ‘metric complex’ patterns as used in Grahn, 2012), making the imposed beat less salient. The question thus

remains as to whether the given visual beat alone can influence the perception of less complex auditory rhythms. Following this finding, the present study investigated the effect of the same visual stimulus, a bouncing PLF, on the perception of metrically simple auditory rhythms. Specifically, and as has not been shown before, it examined whether the perception of concurrent audiovisual rhythms followed one of the principles that often characterize multisensory perception: the principle of inverse effectiveness (PoIE). The PoIE was initially established in the firing rates of cats' superior colliculus (SC) neurons that respond to both unisensory and multisensory stimuli (Alvarado, Vaughan, Stanford, & Stein, 2007; Meredith & Stein, 1983; Stein, Stanford, Ramachandran, Perrault, & Rowland, 2009). It describes that multisensory information is combined in such a manner that the response gain associated with multisensory cues is greater when the (most effective) unisensory stimulus strength – and accordingly the unisensory response – is weaker. For example, the weaker the response of SC neurons to a unisensory cue (e.g., the sight of an object), the more beneficial a multisensory cue (e.g., adding sound to the sight of an object) will be in eliciting a neuronal response and its behavioral consequence (e.g., speeding up orienting to the object; see Stein & Stanford, 2008). This pattern has since been shown in various human behavioral measures (e.g., RT or detection) as well as cortical activations (e.g., in the superior temporal sulcus), in the perception of both simple events (e.g., flash and beep) and complex multisensory stimuli, such as speech, objects, motion, or body gestures (Hecht, Reiner, & Karni, 2008; Jessen, Obleser, & Kotz, 2012; Rach & Diederich, 2006; Saldern & Noppeney, 2013; Senkowski, Saint-Amour, Höfle, & Foxe, 2011; Stevenson & James, 2009; Stevenson et al., 2012; Werner & Noppeney, 2010). Considering the possibly overlapping mechanisms underlying speech and music cognition (Patel, 2012; Patel, 2014), that the perception of both relies on their rhythms in a similar manner (Cason & Schön, 2012; Hausen, Torppa, Salmela, Vainio, & Särkämö, 2013; Rothermich, Schmidt-Kassow, & Kotz, 2012), and that both can be conveyed through multisensory information, it may be hypothesized that multisensory perception of musical rhythms would also follow the PoIE. The present study thus asked a novel question: Does a visual beat assist auditory beat perception, such that the benefit of the visual beat increases as the auditory beat is made less distinct by a source of rhythmic interference? This hypothesis was tested in two experiments, borrowing the paradigm from audiovisual speech perception, in which auditory word recognition with and without a visual cue (a talking face) was measured across increasing levels of auditory noise (Ross, Saint-Amour, Leavitt, Javitt, & Foxe, 2007). Here, the perception of (Experiment 1) and the synchronization to (Experiment 2) beat-based auditory rhythms with and without a concurrent visual beat were measured across increasing levels of auditory interference. In Experiment 1, the auditory rhythms were metrically simple, and the beat arises readily in the perception of these rhythms by means of temporal (grouping) accents (Grahn, 2012; Povel & Essens, 1985). A bouncing PLF as a visual beat, if present, accompanied the rhythms synchronously in the same meter.
The auditory interference was not of the noise sort that masked the signal of the auditory rhythms. Rather, it presented interference in the rhythm domain, implemented as an isochronous sequence (of a different timbre from that of the rhythms) whose period differed from the beat period of the auditory rhythms. The periods of the rhythm and of the interference were related by an integer ratio other than N:1, yielding a polyrhythm that weakened the beat saliency of the auditory rhythm (Poudrier & Repp, 2013). The level of interference was increased by elevating the tempo of the interference sequence across conditions, while keeping the tempo of the auditory rhythms constant, thereby increasing the auditory polyrhythmic complexity.


This manipulation has been found to incrementally reduce perceptual sensitivity to the attended rhythms, and the effect occurs regardless of the stimulus parameters (e.g., pitch separation) or listeners' strategies that favor either segregation or integration of the two auditory streams (Fidali, Poudrier, & Repp, 2013). In Experiment 2, the same conditions were implemented in a sensorimotor synchronization task, where the auditory rhythm was an isochronous beat that paced participants' finger taps. In both experiments, the visual beat was expected to improve auditory rhythm perception. More importantly, if audiovisual beat perception followed the PoIE, then the magnitude of visual enhancement (i.e., audiovisual relative to auditory) should increase as the auditory performance was gradually impaired by the increasing interference tempo. Results of the two experiments were also compared to assess the nature of this mechanism. As perceptual and sensorimotor timing have been shown to be dissociable (Repp, 2002, 2009), a convergent result would suggest a perceptual nature of the effect.
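As a point of reference for the gain analyses reported below: in the neurophysiological PoIE literature (Meredith & Stein, 1983), multisensory enhancement is commonly quantified as

    enhancement (%) = 100 × (CM − SMmax) / SMmax,

where CM denotes the response to the combined (multisensory) stimulus and SMmax the strongest unisensory response; inverse effectiveness means that this quantity grows as SMmax decreases. This formula is cited here as standard background, not as a measure taken from the present study; the behavioral gain measures used below are analogous ratios computed on performance scores.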


2. Experiment 1: Rhythm change-detection task

This experiment investigated whether a visual beat assisted the perception of concurrent beat-based auditory rhythms, and whether the benefit of the visual beat increased as the underlying beat of the auditory rhythms was progressively weakened by an auditory (rhythmic) interference. A detection task was employed to probe sensitivity to a perturbation in the auditory rhythms (Fidali et al., 2013) in the presence and absence of a visual beat, across increasing levels of auditory polyrhythmic complexity.

2.1. Method

2.1.1. Participants
Fourteen paid volunteers (six male, mean age 27 years, SD = 5) participated in this experiment. All reported normal or corrected-to-normal vision and normal hearing. Participants were not prescreened for musical training, and the training duration ranged from 0 to 20 years (amateur musicians), with a mean duration of 6 years (SD = 6). Amongst the amateur musicians, the learned instruments included piano (3), guitar (2), percussion (2), flute (2), and singing (1). The study was approved by the ethics commission of Technical University of Munich and was conducted in accordance with the ethical standards of the 1964 Declaration of Helsinki. All participants gave written informed consent prior to the experiment.

2.1.2. Stimuli and materials
Visual stimuli. Visual stimuli consisted of a human PLF performing a repetitive bouncing movement (i.e., knee-bending without the feet leaving the ground), the same as employed in a previous work (Su, 2014b). The movement was presented at a tempo corresponding to an inter-bounce interval of 500 ms. The PLF was initially constructed by recording a practiced actor performing this whole-body movement using a 3D motion capture system (Qualisys Oqus, 8 cameras) with a sampling rate of 200 Hz. In total, 13 markers were attached to the head, shoulders, elbows, wrists, hips, knees, and feet during the recording (Ahlström, Blake, & Ahlström, 1997). The actor performed this movement continuously, paced by an auditory metronome with an inter-beat interval (IBI) of 500 ms. Two sequences of the bouncing movement with 30 cycles each were recorded, during which maximal knee flexion was synchronized to the metronome beat. The recorded motion data were converted into a 2D (without depth information) point-light display in Matlab® R2012b (MathWorks) using Psychophysics Toolbox extensions version 3 (Brainard, 1997), and the animation was down-sampled to 100 Hz to match the monitor's frame frequency. The PLF was represented by 13 white discs against a black background, each of which subtended 0.4° of visual angle. In order to convey the human figure unambiguously, white lines were added to connect the discs. The whole PLF subtended approximately 5° and 12° when viewed at 80 cm, and was centered in the middle of the screen (Fig. 1).

In order to generate a temporally precise visual metronome from the recorded movements, several steps of editing were applied to the recording. First, as the bouncing movement was defined mainly along the vertical dimension, the spatiotemporal variations in the horizontal dimension (i.e., lateral motion), which were of a smaller magnitude and less systematic than those in the vertical dimension, were not displayed. The movement of the whole PLF was thus displayed along the vertical dimension only (see Su, 2014a, for detailed descriptions of the motion profile in both the original and the edited motion data). In addition, the two discs representing the feet were kept still, without the minor movement seen in the recording. Next, the temporal interval of each cycle was calculated for each disc. A cycle consisted of a downward and an upward movement phase; the former was defined by the trajectory between the highest and the lowest vertical positions during knee flexion, and the latter by the trajectory between the lowest and the highest vertical positions during knee extension. An algorithm was applied to identify the single cycle that contained, across all the discs, the least absolute temporal deviation from 500 ms. Following that, a slight temporal interpolation was applied to ensure that, for each disc, the downward and the upward movement phases each corresponded to 250 ms, and that all the discs reached the lowest and the highest positions of the movement at the same time. Finally, as the PLF movement was intended to be presented continuously for several cycles, a slight spatial interpolation was applied to the single cycle, so that the spatial position of each disc at the end of one cycle continued smoothly to that at the beginning of the next cycle, i.e., there was no spatial discrepancy when the movement was displayed cyclically. A single cycle lasted 500 ms.

Fig. 1. Illustration of the human point-light figure.
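The cycle-selection and phase-normalization steps described above can be expressed compactly. The following is a minimal sketch under assumed data layouts and names (not the study's actual code):

    import numpy as np

    def select_reference_cycle(cycle_durations_ms, target_ms=500.0):
        # cycle_durations_ms: one row per recorded cycle, one column per disc.
        # Pick the cycle whose per-disc durations deviate least, in summed
        # absolute terms, from the 500 ms target period.
        costs = np.abs(np.asarray(cycle_durations_ms, dtype=float) - target_ms).sum(axis=1)
        return int(np.argmin(costs))

    def resample_phase(y, n_out=25):
        # Linearly interpolate one movement phase of one disc to n_out samples
        # (25 samples = 250 ms at the 100 Hz frame rate), so that all discs
        # reach the lowest and highest positions at the same time.
        x_old = np.linspace(0.0, 1.0, len(y))
        return np.interp(np.linspace(0.0, 1.0, n_out), x_old, y)

    # Toy example: two recorded cycles, three discs each.
    best = select_reference_cycle([[503, 497, 501], [489, 512, 494]])  # -> 0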

Auditory stimuli. Two types of auditory stimuli were generated with the music software Logic 8 Express (Apple Inc., California): the rhythms and the interference sequences. The rhythms were taken from the thirty standard patterns of the 'Metric simple' rhythms as employed by Grahn (2012) and Grahn and Brett (2007) (see Table 1 in Grahn, 2012, for a complete list of the patterns). Each pattern consisted of successive empty intervals of different durations bordered by short, discrete tones. Of the thirty patterns, six had five intervals, twelve had six intervals, and the other twelve had seven intervals.




The intervals were related by integer ratios of 1:2:3:4 in different combinations and orders. The smallest interval (integer ratio 1) was 250 ms, and the beat period of the patterns was 500 ms. A silent interval of 1000 ms was added to the end of each pattern, so that each rhythm corresponded to two complete measures of a 4/4 musical meter. The tones had a synthesized sound of the instrument 'clave' with 43 ms tone duration. The interference sequence consisted of isochronous tones at four possible tempi, whose periods formed the following polyrhythmic ratios to the beat period of the rhythms: 3:8, 5:8, 7:8, and 9:8 (where 8 corresponded to the beat of the rhythm); the corresponding IBI of each interference tempo was thus 1333.33 ms, 800 ms, 571.43 ms, and 444.44 ms. The interference tones had a synthesized sound of the instrument 'bongo' with 50 ms tone duration, and were slightly attenuated (5 dB) relative to the rhythm tones (Fig. 2a).
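The interference tempi follow directly from these polyrhythmic ratios. As a minimal arithmetic sketch (reproducing the IBIs above, plus those of Experiment 2 with its 600 ms beat):

    # In a k:n polyrhythm, k interference tones span the same cycle as n beats,
    # so the interference IBI equals (beat IBI * n) / k.
    def interference_ibi(beat_ibi_ms, n_beats, k_tones):
        return beat_ibi_ms * n_beats / k_tones

    # Experiment 1: beat IBI 500 ms, ratios 3:8, 5:8, 7:8, 9:8
    print([round(interference_ibi(500, 8, k), 2) for k in (3, 5, 7, 9)])
    # -> [1333.33, 800.0, 571.43, 444.44]

    # Experiment 2: beat IBI 600 ms, ratios 5:3, 5:4, 5:6, 5:7
    print([round(interference_ibi(600, 5, k), 2) for k in (3, 4, 6, 7)])
    # -> [1000.0, 750.0, 500.0, 428.57]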

2.1.3. Procedure and design
The experimental program was controlled by a customized Matlab script using Psychophysics Toolbox version 3 routines running in a Mac OS X environment. The visual stimuli were displayed on a 17-in. CRT monitor (Fujitsu X178 P117A) with a frame frequency of 100 Hz at a spatial resolution of 1024 × 768 pixels. Participants sat at a viewing distance of 80 cm. Sounds were presented at a sampling rate of 44,100 Hz through closed studio headphones (AKG K271 MKII). Each trial started with a fixation cross in the center of the screen for 1000 ms, followed by three repetitions of an auditory rhythm. During each repetition, the number '1', '2', or '3' was shown on the screen as a reminder. The rhythms were either presented alone or accompanied by an interference sequence at one of the four tempi described under Auditory stimuli. In half of all the trials, a change could occur in the third repetition of the rhythm, such that a tone coinciding with a beat position was advanced or delayed (with equal probability) by 125 ms. The five polyrhythmic combinations of auditory rhythm and interference (including no interference) were each presented in two visual conditions: with and without a visual beat. In trials with a visual beat (i.e., audiovisual conditions, henceforth termed 'AV'), a PLF was displayed on the screen that bounced periodically and synchronously to the rhythm, i.e., the lowest position of the movement (the bounce) was synchronized to the beat of the rhythm (Fig. 3a).1 In trials without a visual beat (i.e., auditory-only conditions, henceforth termed 'A'), only the numbers indicating the auditory rhythm repetition were shown on the screen. In each trial participants were required to respond whether the third repetition of the rhythm was the same as the first two by pressing one of two predefined keys. They were instructed to ignore the auditory interference, if present, and to observe the PLF movement, if present. In order to ensure that participants attended to the figure movement, in one third of the trials with a visual figure (randomly selected), one of the 13 discs briefly turned green for 1000–1200 ms during the visual presentation. In every trial with a visual stimulus, participants were required, following their response to the rhythm change, to answer whether a dot had changed its color by pressing one of two predefined keys. Error feedback was given upon a wrong answer to the dot color change. It should be noted that the A and AV conditions represented not only the modalities in which stimuli were presented, but also the modalities in which a beat was present, which was the factor expected to modulate the perception of auditory rhythms. In the case of A, the auditory beat arose as a result of subjective accents in strongly metrical rhythms (Grahn, 2012; Grahn & Brett, 2007), whereas in AV the subjective auditory beat was accompanied by a physical visual beat. The experiment thus followed a 5 (auditory interference) × 2 (beat modality: A, AV) × 2 (change in the rhythm) within-participant design, each with 15 repetitions. The trials were assigned to five blocks of 60 trials each, with all the conditions balanced across blocks. Within a block the trials with and without the visual PLF were interleaved every five trials, the order of which was balanced across participants. All the other conditions were randomly placed within a block. Participants practiced four trials (with and without auditory interference × with and without visual PLF) before starting the experiment. The entire experiment lasted around 2 h, completed in two sessions that took place within a week. A break was required after each block, which lasted around 20 min.

2.1.4. Analysis
Sensitivity to a change in the rhythm was indexed by d′ (signal detection theory analysis, Stanislaw & Todorov, 1999), which was calculated individually for each auditory interference level under the A and AV conditions. Response bias as indexed by β was calculated in the same manner, which was intended to reveal whether the measured difference in sensitivity was associated with a shifted tendency to give one response or the other. For all the repeated-measures ANOVAs reported in this study, Greenhouse–Geisser correction was applied to the p values of effects of variables with more than two levels. Tukey HSD tests were used for post hoc comparisons following a significant main effect.

2.2. Results
The within-subject means of d′ were submitted to a 5 (auditory interference) × 2 (beat modality) repeated-measures ANOVA. The main effect of auditory interference was significant, F(4, 52) = 21.79, p < 0.001, η² = 0.63, and the post hoc tests found that d′ in the no-interference condition was greater than in all the other conditions with an interference sequence, all ps < 0.05.
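The d′ and β computations of Section 2.1.4 follow standard signal detection formulas (Stanislaw & Todorov, 1999). A minimal sketch is given below; the log-linear correction for extreme rates is an assumption for illustration, not necessarily the study's choice:

    import numpy as np
    from scipy.stats import norm

    def dprime_beta(hits, misses, fas, crs):
        # Hit and false-alarm rates, with a log-linear correction so that
        # rates of exactly 0 or 1 do not yield infinite z-scores.
        h = (hits + 0.5) / (hits + misses + 1.0)
        f = (fas + 0.5) / (fas + crs + 1.0)
        zh, zf = norm.ppf(h), norm.ppf(f)
        return zh - zf, np.exp((zf**2 - zh**2) / 2.0)  # d', beta

    d, beta = dprime_beta(hits=12, misses=3, fas=4, crs=11)  # toy counts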

Fig. 2. Illustration of the rhythm/beat tones and the interference tones as employed in each experiment. (a) Example of a rhythmic pattern accompanied by an interference sequence (marked by black arrows) in Experiment 1, with the ratio of beat periods between the two being 8:3. (b) Example of a beat sequence accompanied by an interference sequence (marked by black arrows) in Experiment 2, with the ratio of beat periods between the two being 5:3. Note that in both experiments, the first interference tone always coincided with the first rhythm/beat tone within a polyrhythmic cycle (i.e., the first black arrow here points to an interference tone superimposed on a rhythm/beat tone).

1 It is typically found that the auditory stream needs to be delayed relative to the visual one for optimal synchrony perception. However, in a previous study using very similar stimuli (Su, 2014a), it was found that optimal synchrony was perceived very close to their physical coincidence, with on average a 5 ± 9 ms lag (the underlying mechanism is discussed in detail in that study). In addition, the observed temporal integration window was considerably broad (140 ms). Thus, it was reasoned here that without introducing an additional lag between the auditory and visual streams, their synchrony should still be fairly well perceived.


Fig. 3. Illustration of the auditory and visual stimuli as presented in a trial. (a) In Experiment 1, an auditory rhythm (top panel) could be accompanied by a synchronous visual beat with the same beat period (500 ms), as well as by an auditory interference sequence (last four panels) whose period presents one of the four possible polyrhythmic relations to the beat of the auditory rhythm (depicted in gray). The numbers in the first panel depict the interval ratios within a metrically simple auditory rhythm. Here only one repetition of the stimulus combination is shown, whereas three repetitions were included in a trial. (b) In Experiment 2, a similar combination of conditions was presented, in which the auditory rhythm was an isochronous beat (IBI = 600 ms).

Fig. 4. Results of Experiment 1: (a) Mean d′ and β for auditory (A) and audiovisual (AV) conditions as a function of the auditory interference level (as indicated by the polyrhythmic complexity). Error bars represent standard errors of the means. (b) Mean d′ and β for A and AV conditions as a function of the unisensory (auditory) performance level. Levels 1–5 represent performance in descending order. (c) Left: mean percentage of correct responses for A and AV conditions as a function of the auditory performance level. Right: relative multisensory gain as a function of the auditory performance level. The solid black line represents the best-fitted linear function across the five data points, whereas the gray dotted line represents the best linear fit without the data point at level 3.

The main effect of beat modality was also significant, F(1, 13) = 17.34, p < 0.01, η² = 0.57, showing that d′ was greater with a visual beat (AV) than without (A). The interaction between the two factors was not significant, p > 0.4 (Fig. 4a, left). However, inspection of the individual data revealed that, in the unisensory conditions (A), participants' d′s did not always descend according to the ascending interference levels as implemented (i.e., increasing interference tempo). There also appeared to be considerable individual differences in the order of the performance levels relative to the five implemented interference levels (see Table A1 in the Appendix; possible reasons for this variation, which was likely associated with the polyrhythmic nature of the two auditory streams, will follow in Section 4.3). The PoIE describes multisensory enhancement as a function of the unisensory response effectiveness, which is assumed to correspond to the unisensory stimulus strength (Holmes, 2007; Holmes, 2009). This correspondence is often the case in subcortical neurons' responses to relatively simple stimuli (Meredith & Stein, 1983; Stein et al., 2009). Some human behavioral data are also consistent with this pattern, e.g., the rate of auditory word recognition decreases as the auditory signal-to-noise ratio decreases (Ross et al., 2007). In the present case, however, the unisensory d′ indexing auditory rhythm perception was not incrementally impaired according to the objective interference levels. Namely, the perceived stimulus (beat) strength was not progressively weakened by the increasing interference

tempo on an individual basis. As such, the multisensory profile of the data should be characterized not as a function of the objective stimulus effectiveness, but as a function of the response effectiveness. For this purpose, the five unisensory (A) responses (i.e., d′) of each individual were sorted in descending order, representing that participant's unisensory performance levels (or perceived unisensory stimulus strength) in descending order. For example, instead of 0, 3:8, 5:8, 7:8, and 9:8, for a particular participant the perceived unisensory stimulus strength could be in the descending order of 0, 3:8, 5:8, 9:8, and 7:8. The five levels of multisensory (AV) response for each participant were then sorted according to the order of the unisensory (A) response.2 The sorted mean d′s were submitted to a 5 (performance level) × 2 (beat modality: AV, A) repeated-measures ANOVA.

2 As discussed by Holmes (2007, 2009), sorting unisensory responses post hoc may introduce the problem that the observed multisensory gain, suggestive of the PoIE, could have been due to a statistical confound of "regression toward the mean" in the multisensory responses. However, this problem is relevant for studies that group participants post hoc based on unisensory performance and then assess the multisensory gain in a between-group manner. Here, the post hoc sorting of AV based on the response level of A was carried out on a within-participant basis, and the means across participants were analyzed in a balanced manner as a function of the five response levels (representing the perceived levels of stimulus strength). This analysis is analogous to that employed by Ross et al. (2007) across levels of objective stimulus intensity, which should not be affected by this statistical issue.
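The within-participant sorting described above amounts to reordering each participant's AV scores by the rank order of that participant's A scores. A minimal sketch with placeholder data (array names are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    dprime_A = rng.normal(1.5, 0.5, size=(14, 5))               # placeholder A d'
    dprime_AV = dprime_A + rng.normal(0.3, 0.2, size=(14, 5))   # placeholder AV d'

    order = np.argsort(-dprime_A, axis=1)                       # descending A per participant
    A_sorted = np.take_along_axis(dprime_A, order, axis=1)      # performance levels 1-5
    AV_sorted = np.take_along_axis(dprime_AV, order, axis=1)    # AV follows A's order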


A significant interaction between the two factors was found, F(4, 52) = 2.75, p < 0.05, η² = 0.17. Follow-up one-way ANOVAs conducted for each response level separately showed that, while d′ did not differ between AV and A in the first three performance levels, all three ps > 0.1, d′ was greater for AV than for A at the fourth level, F(1, 13) = 18.19, p < 0.001, η² = 0.58, as well as at the fifth level, F(1, 13) = 17.08, p < 0.01, η² = 0.57 (Fig. 4b, left). To examine whether the measured sensitivity was associated with a corresponding response bias, especially between the A and AV conditions, the within-subject means of unsorted β were submitted to the 5 × 2 ANOVA, which only yielded a significant effect of interference level, F(4, 52) = 10.73, p < 0.001, η² = 0.45, with post hoc tests showing a greater β in all the conditions with an auditory interference than without, all ps < 0.05. No other significant effect was found (Fig. 4a, right). Similarly, the within-subject means of sorted β were submitted to the 5 × 2 ANOVA, which only yielded a significant effect of performance level, F(4, 52) = 12.1, p < 0.001, η² = 0.48, with post hoc tests identifying a greater β in levels 3–5 than in level 1, all ps < 0.05. No other significant effect was found (Fig. 4b, right). Thus, the greater sensitivity found in AV compared to A, as well as its interaction with the performance level, was not associated with a corresponding response bias. While the analyses above demonstrated greater sensitivity with than without a visual beat at the two lowest auditory performance levels, also informative would be the relative multisensory gain, i.e., the enhancement of performance in the multisensory (AV) relative to the unisensory (A) condition, under each unisensory response level (Holmes, 2007). Of particular interest was whether the extent of improvement increased monotonically across decreasing levels of unisensory performance. As such, the within-subject mean percentage of correct responses was calculated for A and AV under each (sorted) auditory performance level. To characterize the relative gain, the percentage of performance improvement for each level was computed on an individual basis as (AV − A) × 100/A (Holmes, 2009; Ross et al., 2007). The best-fitted linear regression of the group means yielded a goodness of fit (R²) of 0.80. Upon closer inspection, the data point at level 3 seemed to deviate more than the other data points from the regression. Statistical tests of the z-scores of the residual values for the five data points showed that the residual for level 3 was marginally different from the other residuals (z = 0.74, 0.14, 1.39, 0.57, and 1.09, and p = 0.23, 0.45, 0.08, 0.29, and 0.14, for each of the five levels respectively). A linear regression without this data point yielded a better fit, R² = 0.91 (Fig. 4c). In sum, the relative benefit of adding a visual beat appeared to increase with decreasing levels of auditory performance, though not in a strictly linear manner. Results of this experiment demonstrated, first of all, that a visual beat conveyed by a bouncing PLF aided the perception of a concurrent auditory rhythm under all the imposed levels of auditory interference. This pattern resembled that found in audiovisual speech perception by Ross et al. (2007). The positive effect of a rhythmic visual movement on auditory rhythm perception thus seems to parallel the effect of rhythmic facial movements on auditory speech perception (Ghazanfar, 2013).
Moreover, the degree of visual facilitation was more prominent when performance in the auditory condition was worse, specifically at the two lowest performance levels, suggestive of the PoIE. The multisensory benefit as characterized by a relative (AV vs. A) gain also increased with decreasing auditory performance levels, with the improvement at level 3 conforming less to the linear trend. Note that the gain represents the increase in the multisensory response relative to the corresponding unisensory baseline, while the multisensory response itself also decreases across decreasing unisensory response levels (Holmes, 2007). Overall it appeared that, to some extent, the perception of a concurrent audiovisual beat in a rhythm followed a similar rule as described by the PoIE. Namely, the visual beat was of greater assistance when beat perception of the auditory rhythms was more weakened.
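For concreteness, the gain computation and linear fit used in Fig. 4c can be sketched as follows, with placeholder percentage-correct arrays already sorted by descending unisensory performance (names are hypothetical; for Experiment 2, (A − AV) × 100/A on SDasy replaces this gain formula, since lower variability means better synchronization):

    import numpy as np

    rng = np.random.default_rng(1)
    pc_A = np.clip(rng.normal(80, 8, size=(14, 5)), 1, 100)          # placeholder A (% correct)
    pc_AV = np.clip(pc_A + rng.normal(4, 3, size=(14, 5)), 1, 100)   # placeholder AV

    gain = (pc_AV - pc_A) * 100.0 / pc_A          # relative gain per participant and level
    mean_gain = gain.mean(axis=0)                 # group mean per performance level
    levels = np.arange(1, 6)
    slope, intercept = np.polyfit(levels, mean_gain, 1)   # best-fitted line
    resid = mean_gain - (slope * levels + intercept)
    z = (resid - resid.mean()) / resid.std(ddof=1)        # z-scored residuals
    r2 = 1 - (resid**2).sum() / ((mean_gain - mean_gain.mean())**2).sum()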

3. Experiment 2: Rhythm synchronization task

Using a synchronization task, this experiment examined whether the same findings as in Experiment 1 – visual enhancement of auditory beat perception, and greater benefit of the visual beat under lower auditory performance – could be reflected in the beat strength for sensorimotor coupling. Participants tapped to a regular auditory beat in the presence and absence of a synchronous visual beat, across incremental levels of auditory polyrhythmic interference. It was hypothesized that the visual beat would help stabilize the taps more, compared to no visual beat, when the taps became more variable due to the interference.

3.1. Method

3.1.1. Participants
Eleven paid participants and the author (four male, mean age 27 years, SD = 6) took part in this experiment. All were right-handed, and all reported normal or corrected-to-normal vision and normal hearing. Musical training ranged from 0 to 20 years (amateur musicians), with a mean duration of 10 years (SD = 7). Amongst the amateur musicians, the learned instruments included piano (5), guitar (2), percussion (1), flute (1), and violin (1). All participants gave written informed consent prior to the experiment.

3.1.2. Stimuli and materials
Visual stimuli. Visual stimuli consisted of the same bouncing PLF as in Experiment 1. The period of the movement was now 600 ms, corresponding to the IBI of the auditory beat sequence (see Auditory stimuli).

Auditory stimuli. Two types of auditory stimuli were generated: the beat sequence and the interference sequence. The beat sequence consisted of isochronous tones with an IBI of 600 ms. The interference sequence consisted of isochronous tones at four possible tempi, corresponding to an IBI of 1000 ms, 750 ms, 500 ms, and 428.57 ms, respectively. The beat sequence and the interference sequence formed a polyrhythm with the following ratios: 5:3, 5:4, 5:6, and 5:7 (as in Fidali et al., 2013). The beat tones had a synthesized sound of the instrument 'bongo' with 50 ms tone duration, and the interference tones had a synthesized sound of the instrument 'clave' with 43 ms tone duration (Fig. 2b).

Tapping measurement. An ultrasonic 3-dimensional motion capture system, Zebris (Zebris Medical GmbH, Isny, Germany), was employed to register the trajectories of the finger taps at a sampling rate of 200 Hz. The Zebris system includes a base station whose ultrasonic transmitters send out pulses, which are received by a small ultrasonic microphone serving as a sensor. To record the finger taps, a sensor was attached to the foremost joint of each participant's index finger. Participants rested their elbows and forearms on the desk, and tapped with the index finger of their dominant (right) hand at a fixed position, marked by a cross, on an even surface of the desk. Participants were instructed to lay their palm flat in a natural manner and to move only the index finger. They were also instructed to tap with minimized contact time with the surface and a clear extension in the movement. The experimenter gave sufficient instruction and ensured that each participant tapped in the required manner before starting the experiment. The trajectories of the taps, i.e., the 3-D coordinates of the sensor at each unit time, were measured in real time via the propagation time of the ultrasonic pulses (Cogollor et al., 2012; Quinlan, Mullett, Stapleton, FitzPatrick, & McCormack, 2006).
The timing of the taps was later extracted from the positions of each finger trajectory (see Section 3.1.4).


3.1.3. Procedure and design
The setup for stimulus presentation was the same as in Experiment 1. The experimental program running in Matlab also triggered the recording of the Zebris system on a trial basis to ensure optimal synchronization between the two systems; the asynchrony between the two systems was on average about 3 ms. The sounds of the ultrasonic signals were not audible to the participants, who wore closed headphones throughout the experiment. During the experiment, the trajectories of the finger taps were visualized in real time in the program 3DAWIN running on a separate computer (visible only to the experimenter). Participants initiated each trial by pressing the space key with the left hand. Two task variations were included: tapping to the auditory beat (sync-A), or tapping to the visual beat (sync-V), with the former constituting the major part of the experiment. In the sync-A trials, participants were required to tap with their index finger as synchronously as possible to a sequence of 39 isochronous auditory beats, starting from the third beat. In 80% of the sync-A trials, an interference sequence could be heard in parallel to the beat sequence from the fifth beat tone on, which the participants were instructed to ignore. The interference sequence was at one of four possible tempi, two faster and two slower than the beat tempo, yielding four polyrhythmic ratios (see Auditory stimuli). Note that, regardless of the interference tempo, an interference tone always coincided with every fifth beat tone in the beat sequence. In half of the sync-A trials, there was no additional visual stimulus except for a fixation cross in the middle of the screen (A condition). In the other half of the sync-A trials, a PLF was displayed on the screen and bounced synchronously to the entire beat sequence, i.e., the lowest position of the movement was synchronized to each beat tone (AV condition). Participants were instructed to observe the figure movement, if present, during the task (Fig. 3b). To ensure visual attention in the sync-A trials, in 30% of the trials with a PLF (randomly selected), the same manipulation of a dot color change as in Experiment 1 was implemented. In the sync-V trials, the beat sequence was the PLF bouncing regularly with a period of 600 ms, i.e., there was no auditory beat sequence. Participants were instructed to synchronize their finger taps regularly to the lowest position (the bounce) of the figure movement (V condition). In all the sync-V trials, an auditory interference sequence at one of four possible tempi (the same as in the sync-A trials) could be heard in parallel to the visual beat sequence, which the participants were instructed to ignore. The sync-V trials were implemented as control conditions to examine whether performance in the bimodal sync-A trials could be attributed to tapping to the visual stimulus alone. There were 5 (auditory interference) × 2 (beat modality: AV, A) conditions for the sync-A trials and 4 conditions of auditory interference for the sync-V trials. Ten repetitions were included for each experimental condition. The trials were presented in five blocks. Within a block, the sync-A and sync-V trials were grouped in sub-blocks of 20 and 8 trials each, and the order of presentation (sync-A first or sync-V first) was randomly chosen for each block. All the other experimental conditions were balanced across blocks and presented in a random order within a block. Participants practiced three sync-A and two sync-V trials before starting the experiment.
The entire experiment lasted around 1.5 h, completed in one session. A break was required after each block, which lasted around 15 min.

3.1.4. Analysis
No participant exhibited substantial difficulty in carrying out the task, although all reported that the sync-V trials were considerably more difficult than the sync-A trials. The tap data were first processed on a trial basis. The timing of each tap was extracted from the time point of the lowest position in each periodic finger trajectory. Taps in every trial were also inspected visually by plotting the time course of each finger trajectory relative to each beat onset.


In case of a tap having a slightly longer dwell time, the first identified time point of the lowest position was taken as the tap time. The first two taps of each trial were excluded from analyses, leaving 35 analyzed taps per trial. The variability of the taps, measured by the within-trial standard deviation of the tap-beat asynchronies, was first calculated to indicate whether the synchronization in a trial was successful (Repp, 2003): if the variability exceeded 10% and 15% of the IBI in a sync-A and a sync-V trial, respectively, the trial was considered unsuccessful. Occasional trials with a few missing taps, likely due to task difficulty in the presence of an auditory interference, were included in further analyses if (1) no more than five taps were missing, and (2) the variability of the remaining taps met the aforementioned criterion. Otherwise the trials were considered unsuccessful. The few trials with too many taps, which occurred mostly in the sync-V task, were considered unsuccessful, as the taps in these trials exhibited a systematic drift likely due to the auditory interference. On average, 4% of the sync-A trials and 8% of the sync-V trials were unsuccessful and excluded from further analyses. Amongst the successful trials, the main parameter of task performance was the within-trial standard deviation of the tap-beat asynchronies (SDasy), which served as an inverse measure of synchronization stability (i.e., higher variability indicates poorer synchronization; see Section 1.1 in Repp & Su, 2013). In addition, the mean tap-beat asynchrony (negative mean asynchrony, NMA) was calculated for each trial to supplement information about the synchronization behavior. The asynchronies for the sync-V trials were calculated between tap times and the time points of the lowest position in the PLF movement.
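As an illustration of the two analysis steps just described, the sketch below extracts tap times as the local minima of the vertical finger trajectory and computes SDasy against the beat times (names and the simple minimum detection are assumptions; the trial-exclusion checks are omitted):

    import numpy as np

    def tap_times(z, fs=200.0):
        # The first sample of each local minimum of the vertical trajectory is
        # taken as the tap time (cf. the dwell-time rule above); real
        # recordings would be smoothed before this step.
        idx = [i for i in range(1, len(z) - 1)
               if z[i] < z[i - 1] and z[i] <= z[i + 1]]
        return np.asarray(idx, dtype=float) / fs

    def sd_asynchrony(taps, beats):
        # Within-trial SD of tap-beat asynchronies (SDasy): each tap is paired
        # with its nearest beat; lower values mean more stable tapping.
        nearest = beats[np.argmin(np.abs(beats[None, :] - taps[:, None]), axis=1)]
        return np.std(taps - nearest, ddof=1)

    beats = 0.6 * np.arange(1, 38)                              # 600 ms beats (toy)
    z = np.cos(2 * np.pi * np.arange(0, 22.2, 0.005) / 0.6)     # toy trajectory
    print(sd_asynchrony(tap_times(z), beats))                   # ~0 for this toy case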
3.2. Results
Data in the sync-A trials (A and AV) were analyzed first: the within-subject means of SDasy were submitted to a 5 (auditory interference) × 2 (beat modality) repeated-measures ANOVA, which yielded a main effect of auditory interference, F(4, 44) = 9.2, p < 0.005, η² = 0.46 (due to large variances, only a marginally significant difference was identified between interference conditions 5:0 and 5:6 in the post hoc tests, p = 0.058), and a main effect of visual beat, F(1, 11) = 19.75, p < 0.001, η² = 0.64, with lower variability for AV than for A. The interaction was not significant (Fig. 5a). Inspection of the individual data again revealed that, as observed in Experiment 1, participants differed in the order of their performance levels relative to the order of the imposed interference levels (see Table A1 in the Appendix; surprisingly, four participants synchronized better under an interference condition than in the condition without interference). As such, the same sorting procedure as described in Experiment 1 was applied here, such that for each participant the five SDasy means in the multisensory (AV) conditions were sorted according to descending levels of unisensory (A) performance, i.e., the SDasy in the A conditions in ascending order. The sorted means were submitted to a 5 (response level) × 2 (beat modality) repeated-measures ANOVA, which identified a significant interaction between the two factors, F(4, 44) = 3.71, p < 0.05, η² = 0.25. Follow-up one-way ANOVAs conducted for each unisensory performance level separately revealed that, while SDasy in AV and A did not differ at level 1, p > 0.7, SDasy in AV was lower than in A at level 2, F(1, 11) = 8.29, p < 0.05, η² = 0.43, at level 3, F(1, 11) = 9.49, p < 0.05, η² = 0.46, at level 4, F(1, 11) = 13.46, p < 0.01, η² = 0.55, and at level 5, F(1, 11) = 19.08, p < 0.005, η² = 0.63 (Fig. 5b). To characterize the relative multisensory gain, similar procedures as described in Experiment 1 were carried out, which


Fig. 5. Results of Experiment 2: (a) Mean SDasy for auditory (A), audiovisual (AV) and visual (V) conditions as a function of the auditory interference (polyrhythmic complexity) level. Error bars represent standard error of the means. (b) Mean SDasy for A and AV as a function of the auditory performance level. Note that increasing SDasy indicates decreasing performance. (c) Relative multisensory gain as a function of the auditory performance level. The solid black line represents the best linear fit across the five data points, whereas the gray dotted line represents the best linear fit without the data point in level 3. (d) Mean NMA for A, AV, and V as a function of the interference level.

computed the percentage of gain in tapping stability as (A − AV) × 100/A for each participant under each unisensory performance level.3 The best-fitted linear regression of the mean data points yielded an R² of 0.73. The tests of the residuals of the data points showed the following for each respective response level: z = 1.08, 0.63, 1.40, 0.36, and 0.59; p = 0.14, 0.26, 0.08, 0.36, and 0.28. It appeared that the data point at level 3 deviated somewhat more than the others from the linear fit. Linear regression without this data point yielded a better fit, R² = 0.88 (Fig. 5c). Interestingly, the pattern of the data fit – with the data point at level 3 showing a lower fit than the others – echoed that found in Experiment 1. To examine whether synchronization in the AV conditions depended solely on the presence of the visual beat, SDasy in the sync-V trials was included in the analysis, and the unsorted within-subject means of SDasy were submitted to a 4 (auditory interference, levels 2–5) × 3 (beat modality: AV, A, and V) repeated-measures ANOVA. A significant main effect of beat modality was found, F(2, 22) = 45.51, p < 0.001, η² = 0.81, with the post hoc tests showing greater variability in V (sync-V trials) than in either A or AV (sync-A trials), both ps < 0.001 (Fig. 5a). Thus, the performance in the AV condition could not be attributed to tapping to the visual beat alone, as that would have led to much higher variability. Furthermore, if the improvement in AV relative to A were associated only with the visual beat, instead of with a coherent percept of an audiovisual beat, strong correlations of SDasy should be found between (A − AV) and V at all the performance levels (Ross et al., 2007). Correlation analyses across participants found, on the contrary, a significant correlation only at the lowest performance level, n = 12, r = 0.59, p = 0.04. This pattern is similar to that reported in Ross et al.'s study on speech perception, where significant correlations between AV and V were only found at the highest auditory noise levels. Presumably, a stronger dependence on the visual information started to emerge as the auditory interference reached a considerably detrimental level. Overall, the multisensory gain observed here was not likely due to selective attending to the intact stimulus modality.

3 As SDasy is negatively associated with tap stability, "A − AV" instead of "AV − A" was calculated to represent the extent of variability reduction, and thus the improvement in synchronization.

Finally, to compare tap-beat asynchronies between unisensory (A and V) and multisensory (AV) conditions, a 4 (auditory interference, levels 2–5) × 3 (beat modality: AV, A, V) repeated-measures ANOVA was conducted for NMA, which yielded a significant main effect of beat modality, F(2, 22) = 5.56, p < 0.05, η² = 0.39. Post hoc tests showed only that the magnitude of NMA was greater (i.e., more negative) in V than in A or AV, both ps < 0.001. Due to large variances, the effect of interference level was only close to significant, F(3, 33) = 3.74, p = 0.054, η² = 0.25 (Fig. 5d). Overall, taps preceded the beat to a larger extent when synchronizing to a visual beat than to either an auditory or an audiovisual beat. In sum, results of synchronization with unisensory (A) and multisensory (AV) beats paralleled the perceptual results of Experiment 1: across all levels of auditory interference, taps were more stable when paced by an audiovisual beat than by an auditory beat. Moreover, the positive effect of the visual beat was greater when the taps became more destabilized by the auditory interference. As in Experiment 1, this pattern emerged for visual enhancement as a function of the auditory performance level, not of the objective interference level. The relative multisensory gain was likewise characterized by increasing improvement (AV relative to A) across decreasing unisensory (A) performance levels, with the improvement at level 3 being less consistent with the linear trend of the function. Thus, the multisensory beat strength for sensorimotor coupling also appeared to follow the profile of the PoIE to some extent. Finally, tapping to the visual beat alone in the presence of an auditory interference was highly variable, as has been reported in other target–distractor paradigms using repetitive visual flashes (Repp & Penel, 2004). This confirmed that the observed audiovisual improvement was attributable to the multisensory beat strength, rather than to the visual beat alone. The interpretation was corroborated by the observed mean NMA, which was considerably different (i.e., more negative) when taps were paced by a visual beat than by an auditory or audiovisual beat.

4. Discussion

In the framework of common coding for action and perception (Prinz, 1997), the link between musical rhythms and human movement manifests not only when a listener actively moves to the rhythm (Burger et al., 2013; Su & Pöppel, 2012), but also when humans observe other individuals moving along with the rhythm they hear, e.g., watching dancers moving to music (Cross, Kraemer,


Hamilton, Kelley, & Grafton, 2009). Thus far, except for indirect evidence in a previous study (Su, 2014b), it has not been shown how such visually communicated rhythmic movements interact with auditory rhythms in perception. The present study investigated the mechanisms underlying multisensory (audiovisual) rhythm perception, in which the visual part consisted of a PLF that moved regularly to the beat of the auditory rhythms. Two experiments converged to show that a visual beat conveyed by a humanlike movement aided beat perception of a concurrent auditory rhythm, leading to overall higher sensitivity to a change in the auditory rhythm (Experiment 1) and more stable taps synchronized to a regular auditory beat (Experiment 2). Moreover, the benefit of the visual beat – as indexed by the performance in the audiovisual compared to the auditory conditions – was more prominent when performance in the latter was poorer. In both experiments, the relative multisensory gain followed a linear increase across four of the five performance levels. The trend thus suggests, notably, that the mechanism underlying audiovisual beat perception follows, at least partially, the PoIE (Stein et al., 2009).

4.1. Visual movement beat aids auditory beat perception

Whereas the effect of rhythmic visual information on auditory perception has been established in the speech domain (Ghazanfar, 2010, 2013; Kriegstein et al., 2008; Ross et al., 2007), similar effects have not been examined in the context of musical rhythms. However, several parallels can be drawn between speech and music perception (Patel, 2012, 2014), as both kinds of information are temporally structured (Hausen et al., 2013), conveyed considerably (though not only) through the auditory modality, and often linked to rhythmic movements that are visually communicated at the same time. Furthermore, the perception of both speech and musical rhythms can be based on a metrical beat (Lidji, Palmer, Peretz, & Morningstar, 2011; Rothermich et al., 2012). As such, the idea of a relevant visual rhythm enhancing auditory musical rhythm perception seems plausible. There is already indication that dynamic visual information can influence auditory temporal perception of non-speech stimuli: for example, a coherent visual motion flow (Su & Jonikaitis, 2011), or the same bouncing PLF as presently adopted (Su, 2014b), has been found to modulate auditory tempo and beat perception, respectively. Besides, it has also been shown that visual cues conveyed by naturalistic, rhythmic human movements, such as dancing, walking, and clapping, do not benefit additionally from concurrent auditory cues (the music being moved to) in self-recognition of these actions, presumably because the observed movements already provide sufficient (and perhaps preferred) cues in this regard (Sevdalis & Keller, 2010). Following these effects of dynamic visual stimuli, the present study provides the first evidence that a visual spatiotemporal rhythm, also communicated by a humanlike movement, can serve as a beat that improves the perception of a concurrent auditory rhythm. The effectiveness of the present visual beat is consistent with the finding of Hove et al. (2013a) that modality appropriateness (Welch & Warren, 1980) depends on each modality's affinity for processing specific kinds of information, rather than on a mere distinction between temporal and spatial attributes of the task.
Whereas audition is sensitive to the temporal structure of successive discrete events, vision extracts rhythms better from continuous motion (containing spatiotemporal periodicity) than from static flashes (see also Grahn, 2012). Furthermore, the present visual stimulus may have the additional advantage of engaging observers' internal motor system through action observation (Cross et al., 2009; Gilaie-Dotan, Kanai, Bahrami, Rees, & Saygin, 2013; Saygin, 2007; van Kemenade, Muggleton, Walsh, & Saygin, 2012); one area relevant to action observation, the premotor cortex, is also strongly implicated in beat perception (Section 4.1.2 in Repp & Su, 2013).

Two mechanisms may have jointly contributed to the effect of the present visual beat. One is that the bouncing humanlike figure elicits internal simulation of the movement in the observers (Jeannerod, 2001). As overt bouncing movement has been found to impose metrical accents on the perception of an ambiguous auditory rhythm (Phillips-Silver & Trainor, 2007), an observed bounce may similarly lead to enhanced internal motor resonance that marks a regular beat. The strengthened beat percept in turn improves the perception of, and the synchronization to, the auditory rhythms. In the study of Phillips-Silver and Trainor (2007), however, the same effect of body movement on auditory metrical interpretation was not found with movement observation. With regard to sensory–motor coupling, overt movements presumably have a stronger influence on perception than covert movements (Su & Pöppel, 2012). Thus, it might be that metrical interpretation (i.e., hearing an ambiguous rhythm in a particular metrical frame, such as a duple or a triple meter) entails a complete hierarchical representation of the rhythm, and motor simulation is not effective enough to impose several levels of rhythmic structure in perception. In the present case, the auditory rhythm already had a clear underlying beat in a duple meter, and beat perception may thus be further reinforced, or 'rescued' in the presence of an interference, by the internally simulated movement. Moreover, as motor areas of the brain are engaged in processing beat-based auditory rhythms even in the absence of overt movements (Section 4.1.2 in Repp & Su, 2013), the simulated movement may enhance this process internally. As such, the present auditory and visual rhythms may interact on the basis of a common action representation (Alaerts, Swinnen, & Wenderoth, 2009; Kaplan & Iacoboni, 2007).

A second, more parsimonious explanation is that a visual beat can be conveyed by the motion parameters of moving stimuli. Previous synchronization studies have demonstrated that a beat can be perceived in the trajectories of point-light biological motion or of motion with other naturalistic velocity profiles (Hove et al., 2013b; Luck & Sloboda, 2008, 2009; Wöllner, Deconinck, Parkinson, Hove, & Keller, 2012). A visual beat may thus be effectively communicated by the motion profile of the present stimulus. It has recently been shown that both audition and vision can process beat-based rhythms given appropriate stimuli (Grahn, 2012). The present finding further suggests that a beat may be a cross-modal percept, whose effect can be transferred not only from audition to vision (Grahn et al., 2011), but also the other way around, at least when effective visual stimuli are employed. The present results, however, do not imply that the audiovisual interaction occurs because the visual rhythm (or the visual beat) is readily re-coded into an auditory pattern, as some studies have suggested (Guttman et al., 2005; Karabanov, Blom, Forsman, & Ullen, 2009; however, McAuley & Henry, 2010, did not support this claim). There is no evidence thus far that moving visual stimuli are more effectively translated into auditory temporal patterns than stationary ones. Therefore, a stronger effect of the former on rhythm perception cannot be explained mainly by inter-sensory re-coding.
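To make the motion-parameter account concrete, the sketch below extracts candidate beat times from a simulated vertical bounce. Everything in it is illustrative: the sinusoidal trajectory, the sampling rate, and the bounce period are assumptions standing in for the motion-captured stimulus, and taking trajectory troughs as beats is only one plausible reading of the kinematics (peak velocity would be another candidate cue; cf. Su, 2014a).

```python
import numpy as np

# Illustrative stand-in for the bouncing point-light figure: a sinusoidal
# vertical trajectory (the actual stimulus was motion-captured, not synthetic).
fs = 100.0                              # sampling rate in Hz (assumed)
t = np.arange(0.0, 4.0, 1.0 / fs)       # 4 s of motion
period = 0.5                            # one bounce every 500 ms (assumed)
y = np.cos(2.0 * np.pi * t / period)    # vertical position of the figure

# Candidate visual beats at the trajectory troughs, i.e., the lowest point
# of each bounce: interior samples smaller than both neighbors.
mid = y[1:-1]
troughs = (mid < y[:-2]) & (mid < y[2:])
beat_times = t[1:-1][troughs]

# Velocity via finite differences, in case peak velocity were used instead.
v = np.gradient(y, 1.0 / fs)

print(np.round(beat_times, 2))          # -> one candidate beat every 0.5 s
```

With real motion-capture data the trajectory would need smoothing before taking such extrema, but the principle – a beat time series read directly off the motion profile – stays the same.
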
Considering the sensorimotor crosstalk for dynamic visual stimuli proposed above, and without presuming re-coding of sensory information, a common representation of the audiovisual beat at the sensorimotor level remains plausible. This is consistent with recent proposals for audiovisual speech perception, which postulate that concurrent auditory and visual signals are processed in parallel and in a modality-specific manner at the sensory level, without transforming one into the other or transforming both into a common code (Altieri, Pisoni, & Townsend, 2011). On this account, multisensory benefit does not depend on the re-coding of sensory information. Visual enhancement of auditory (speech) perception can be achieved by the two sensory pathways communicating with each other directly (Schroeder, Lakatos,
Kajikawa, Partan, & Puce, 2008), or through a common motor scheme at a higher level (Galantucci, Fowler, & Turvey, 2006; Scott, McGettigan, & Eisner, 2009). Furthermore, activations in the motor system may feed back to modulate sensory processes through cortical oscillations (Schroeder, Wilson, Radman, Scharfman, & Lakatos, 2010), among which beta-band activity is associated with the perceived beat in an auditory rhythm (Fujioka, Ween, Jamali, Stuss, & Ross, 2012; Iversen, Repp, & Patel, 2009). As such, an audiovisual interplay in perception based on a convergent link to the motor system remains a promising hypothesis for further exploration.

4.2. Multisensory enhancement of beat perception

The present study is the first attempt to characterize multisensory beat perception parametrically, and a clear pattern can be noted: visual enhancement increases as the auditory performance deteriorates. That is, the weaker an auditory rhythm appears in perception, the more its perception can be 'rescued' by an accompanying visual beat. This pattern, which is in accordance with the PoIE, is more consistent when comparing the main measures between AV and A, but less unequivocal in the relative gains. Although many studies have shown that the perception of various multisensory stimuli follows the PoIE (Hecht et al., 2008; Jessen et al., 2012; Rach & Diederich, 2006; Saldern & Noppeney, 2013; Senkowski et al., 2011; Stevenson & James, 2009; Stevenson et al., 2012; Werner & Noppeney, 2010), few have measured it parametrically across several stimulus or response levels. Some studies provide only partial or no support for this principle in audiovisual speech perception (Ross et al., 2007; Tye-Murray, Sommers, Spehar, Myerson, & Hale, 2010). In particular, Ross et al. (2007) show that multisensory enhancement of speech perception can be characterized in several ways, and that the data obey the PoIE only when certain analyses are chosen. Clearly, whether or not experimental data abide by the PoIE is sensitive to the tasks and stimuli, as well as to the analysis methods used to index the gain (Holmes, 2007, 2009; Lippert, Logothetis, & Kayser, 2007; Ross et al., 2007). In addition, both Ross et al. (2007) and the present study show that, at an intermediate response level, the relative gain deviates more from the linear trend that describes response gains at the other levels. It might be that the mechanisms for binding complex multisensory information, as found in speech and music, are influenced by other factors: stimulus factors such as temporal frequency or predictability, as well as cognitive factors such as prior knowledge or expertise, can all play a role in inter-sensory integration (Vroomen & Keetels, 2010). The behavioral pattern and neural computation in this case may be more complex and less uniform than those for integrating simple multisensory signals in subcortical neurons (Meredith & Stein, 1983). Nevertheless, similar integration mechanisms seem to exist for speech and for musical rhythms involving visually communicated movements. This result further points to overlapping processes underlying speech and music perception that may be linked to the motor system. Given that the observed multisensory gain abides by the PoIE at least partially, to what extent does the present audiovisual interaction qualify as 'integration'?
In its original definition (Meredith & Stein, 1983), 'integration' implies that the combination of sensory information is reflected not only in the behavioral response, but also in the neural mechanisms that underlie the response. Specifically, a common neuronal population or brain area responds to both uni- and multisensory events with an operational profile following the PoIE and other additivity rules (Stein & Stanford, 2008). Here, as the auditory and visual stimuli are independent (albeit relevant and coordinated) streams that obviously do not originate from the same source, it seems unlikely that signals are fused at lower levels. An integrated audiovisual beat percept, perhaps at the sensorimotor level, seems plausible. Future studies may explore the neural activities supporting this integration, e.g., whether areas associated with beat perception, such as the putamen (Grahn et al., 2011; Hove et al., 2013a) or other secondary motor areas, respond to uni- and multisensory beats with a profile that follows the established rules (Stein & Stanford, 2008).

The same mechanism likely underlies the multisensory gain for the present perceptual and synchronization tasks, as a similar result pattern was observed in both. Given that perceptual timing is found to be more sensitive to the temporal structure of the stimuli than motor timing (Repp, 2002, 2009), the effect on sensorimotor synchronization presumably occurs at the perceptual level, i.e., in the perceived beat strength that is used to couple finger movements. Moreover, while beat perception is required for overtly synchronizing to the beat, the latter entails additional mapping of sensory rhythms onto motor behaviors (Sowiński & Bella, 2013), and it is unclear whether the integration principles would be reflected in this mapping. Elliott, Wing, and Welchman (2010) investigated the integration of simple rhythmic audiovisual cues in a finger-tapping paradigm, and found that synchronization to an auditory metronome was improved more by a concurrent visual metronome (flashes) when the auditory metronome was more irregular due to temporal jitter (they employed only two levels of jitter). This pattern resembles the multisensory gain in the present synchronization task, and the authors likewise attribute the effect to the central (rather than peripheral/motor) timing variability associated with the cues. Their findings are discussed with regard to the optimal integration model (Ernst & Bülthoff, 2004), which predicts that visual cues receive more weight in the integration when auditory cues become less reliable. Although a similar computation might explain the perceived beat strength in the present synchronization task, the lack of correlation between the multisensory gain and the visual performance argues against this possibility. The extent to which the audiovisual beat of complex stimuli is optimally integrated remains subject to future modeling.

4.3. Temporal sensitivity in auditory polyrhythms

Both experiments yielded an unexpected finding: the order of the unisensory (auditory) performance levels did not match that of the implemented interference levels, suggesting that the beat strength of the auditory rhythms was not progressively weakened by the increasing polyrhythmic complexity. This finding is intriguing and warrants some discussion of auditory polyrhythm perception, although the topic is less central to the main issue of the present research. With regard to temporal sensitivity in polyrhythms, two opposing hypotheses have been outlined in Fidali et al. (2013). One is that, according to Jones, Jagacinski, Yee, Floyd, and Klapp (1995), when two isochronous sequences forming a polyrhythm are perceptually integrated into a single stream (Bregman, 1990),4 the resultant (non-isochronous) inter-onset intervals are shorter than those of each respective stream. According to Weber's law (e.g., Monahan & Hirsh, 1990), this should lead to improved sensitivity to a temporal perturbation, as sensitivity is greater for smaller intervals.

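The first hypothesis is easy to verify arithmetically. The sketch below (illustrative; the cycle duration is normalized to 1 rather than set to the study's actual value) merges the two isochronous streams of an 8:3 polyrhythm, one of the ratios used in Experiment 1, and lists the inter-onset intervals of the integrated stream: none exceeds the faster stream's period, and the intervals shrink wherever the two streams interleave.

```python
from fractions import Fraction

def merged_iois(m, n):
    """Inter-onset intervals of an m:n polyrhythm integrated into a single
    stream, with the cycle duration normalized to 1."""
    onsets = sorted({Fraction(i, m) for i in range(m)} |
                    {Fraction(j, n) for j in range(n)})
    onsets.append(Fraction(1))                  # close the cycle
    return [b - a for a, b in zip(onsets, onsets[1:])]

iois = merged_iois(8, 3)                        # an 8:3 ratio from Experiment 1
print([str(x) for x in iois])                   # 1/8, 1/8, 1/12, 1/24, ...
print(max(iois) <= Fraction(1, 8))              # True: never longer than the
                                                # faster stream's own period
```

By Weber's law, detection thresholds scale with interval size, so the shorter merged intervals should, on this account alone, make perturbations easier to detect.
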
The other hypothesis states that, despite the aforementioned advantage, increasing polyrhythmic complexity would increase the memory demand for perceptual integration, as well as give rise to many different inter-onset intervals in the integrated stream, which increases the variability of the internal timekeeper. These factors should hinder perturbation detection.

4 Auditory streaming (Bregman, 1990; Repp, 2009) is typically examined using two sequences of the same timbre with different pitches. Stream segregation is promoted by a wider pitch separation and an increasing rate (temporal frequency) of the sequences.


Fidali et al. (2013) showed that the perturbation detection rate decreases with increasing polyrhythmic complexity, regardless of the pitch separation and of the instruction to integrate or segregate the two streams. Their results suggest that perceptual integration occurs at all levels of polyrhythmic complexity, which impairs temporal sensitivity to perturbations. In the present experiments, however, performance did not deteriorate in step with increasing polyrhythmic complexity, and there were considerable individual differences in the performance levels in relation to the complexity levels (see Table A1 in Appendix A).

Although it is not clear why the present results diverged from those of Fidali et al. (2013) and did not show a uniformly detrimental effect of polyrhythmic complexity on auditory task performance, several explanations are possible. First, the rhythm and the interference were presented by two different percussion sounds, one of a higher and the other of a lower timbre. The two streams might tend to be integrated when the interference was slower, but segregated when the interference was faster (Bregman, 1990). Assuming that integration impairs temporal processing of the target rhythms (Fidali et al., 2013), the tendency to segregate the interference from the rhythms at a higher tempo might explain why, for several participants, the highest interference tempo did not yield the worst performance. On the other hand, if the interference was inevitably integrated with the rhythms owing to their polyrhythmic relation, it is also possible that integration to some extent improved the perception of the rhythms. This may be due to the shortened inter-onset intervals (Jones et al., 1995) in the integrated rhythms, or because the interference sequence served as a polymetric reference for the rhythms in a manner similar to a same-meter beat (Repp, 2008). The latter idea seems less likely but not entirely implausible, as some polyrhythmic combinations in the present experiments, e.g., 5:3, 5:4, or 8:5, may be metrically comprehensible to musically trained participants (Bogacz, 2005), who might be able to perceptually integrate the two different beat periods of the polyrhythmic streams (Poudrier & Repp, 2013; Vuust, Roepstorff, Wallentin, Mouridsen, & Ostergaard, 2006).

Overall, individuals seem to differ widely in their reactions to polyrhythms, i.e., whether the streams are integrated or segregated, and whether either strategy is beneficial for timing the events. Nevertheless, the main finding of the multisensory gain, as well as its pattern of inverse effectiveness, is not compromised by this heterogeneity, because the critical comparisons were made between the performance levels in the multisensory (AV) and the unisensory (A) conditions, with the former sorted by the outcome of the latter in a within-participant manner (a sketch of this sorting logic follows below). Whether the presence of a visual beat would alter the tendency to integrate or segregate the two auditory streams under each polyrhythmic ratio remains an empirical question in itself; thus far, however, no evidence has pointed to this possibility.

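As a sketch of that sorting logic (hypothetical numbers throughout; the gain index here is a generic AV-over-A improvement and may differ from the exact formula used in the study), each participant's interference conditions are first ordered by unisensory performance, and the AV conditions are then read out at the same within-participant levels:

```python
import numpy as np

# Hypothetical scores: rows = participants, columns = interference conditions
# in their implemented order; higher = better performance.
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 3.0, size=(12, 5))         # auditory-only conditions
AV = A + rng.uniform(0.0, 1.0, size=(12, 5))    # audiovisual conditions

# Sort each participant's conditions by descending A performance, so that
# "level 1" is that participant's best unisensory condition (cf. Table A1),
# and reorder the AV scores by the same indices.
order = np.argsort(-A, axis=1)
A_sorted = np.take_along_axis(A, order, axis=1)
AV_sorted = np.take_along_axis(AV, order, axis=1)

# A generic relative multisensory gain per sorted level (an assumption, not
# necessarily the index used in the study).
gain = (AV_sorted - A_sorted) / A_sorted

# Inverse effectiveness would appear as mean gain rising from level 1 to 5.
print(np.round(gain.mean(axis=0), 3))
```

Because the sorting is done within each participant, the comparison at each level is immune to the heterogeneity in which polyrhythmic ratio happened to be hardest for whom.
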
Finally, although the implication of the observed heterogeneity in polyrhythm perception is beyond the scope of the present research, further investigation of individual differences in processing polyrhythmic patterns may help clarify how humans time such complex stimuli, e.g., whether a beat-based mechanism applies (Teki, Grube, & Griffiths, 2012).

4.4. Conclusion and research outlook

The present research links multisensory perception and rhythm perception involving an observed movement, and demonstrates a positive effect of a visual beat conveyed by a humanlike movement on beat perception of auditory rhythms. Moreover, the perception of concurrent auditory and visual beats follows an operational profile similar to the PoIE, as often described in the multisensory
integration literature. The effect thus seems to result from an integrated percept of an audiovisual beat, possibly associated with convergent activations in the internal motor system. From here, several questions on this topic are worth addressing in subsequent studies. For example, is biological motion more efficient in cross-modal modulation than artificial or other object motions, such as a bouncing ball (Hove et al., 2013b) or a moving bar (Grahn, 2012)? Do all rhythmic human movements have the same effect, or are movements related to musical activities more advantageous for rhythm perception? Also, does a human figure play an important role in internal action simulation (Saygin & Stadler, 2012) that perpetuates the link between movement and rhythm in perception? (However, see Hove et al., 2010, and Su, 2014a, for potential arguments against this idea.) Finally, as the present visual stimulus consists of a relatively simple movement, it would be worth investigating whether and how similar cross-modal effects apply to more realistic dance movements that may communicate different metrical structures through different body parts (Toiviainen et al., 2010). In sum, this study reveals an audiovisual interplay in the perception of dynamic rhythmic stimuli, and its mechanisms may provide knowledge of how sensory information is combined to eventually guide human actions.

Acknowledgments

This work and the author were supported by a Grant from the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), SU 782/1-1. The author thanks Joachim Hermsdörfer for advice on the Zebris motion capture system, and two anonymous reviewers for constructive comments on an earlier version of the manuscript.

Appendix A

See Table A1.

Table A1
The auditory interference conditions (denoted by the polyrhythmic ratio) in the order of descending unisensory (auditory) performance for each participant in each experiment; Level 1 represents the best and Level 5 the worst performance within each participant.

Experiment 1
             Level 1   Level 2   Level 3   Level 4   Level 5
Subject 1    8:0       8:3       8:9       8:5       8:7
Subject 2    8:5       8:3       8:0       8:9       8:7
Subject 3    8:0       8:3       8:9       8:5       8:7
Subject 4    8:0       8:7       8:3       8:9       8:5
Subject 5    8:0       8:3       8:9       8:5       8:7
Subject 6    8:0       8:3       8:9       8:5       8:7
Subject 7    8:3       8:0       8:7       8:5       8:9
Subject 8    8:0       8:7       8:3       8:9       8:5
Subject 9    8:0       8:3       8:5       8:9       8:7
Subject 10   8:0       8:3       8:5       8:9       8:7
Subject 11   8:0       8:5       8:3       8:9       8:7
Subject 12   8:0       8:5       8:7       8:3       8:9
Subject 13   8:0       8:3       8:5       8:7       8:9
Subject 14   8:3       8:7       8:0       8:5       8:9

Experiment 2
             Level 1   Level 2   Level 3   Level 4   Level 5
Subject 1    5:0       5:3       5:4       5:7       5:6
Subject 2    5:0       5:3       5:4       5:7       5:6
Subject 3    5:0       5:3       5:4       5:6       5:7
Subject 4    5:0       5:3       5:4       5:7       5:6
Subject 5    5:0       5:3       5:4       5:7       5:6
Subject 6    5:0       5:7       5:3       5:4       5:6
Subject 7    5:0       5:7       5:3       5:4       5:6
Subject 8    5:3       5:0       5:7       5:4       5:6
Subject 9    5:4       5:6       5:3       5:0       5:7
Subject 10   5:0       5:3       5:4       5:6       5:7
Subject 11   5:3       5:4       5:7       5:0       5:6
Subject 12   5:6       5:0       5:7       5:3       5:4


References

Ahlström, V., Blake, R., & Ahlström, U. (1997). Perception of biological motion. Perception, 26(12), 1539–1548. http://dx.doi.org/10.1068/p261539.
Alaerts, K., Swinnen, S., & Wenderoth, N. (2009). Interaction of sound and sight during action perception: Evidence for shared modality-dependent action representations. Neuropsychologia, 47, 2593–2599.
Altieri, N., Pisoni, D. B., & Townsend, J. T. (2011). Some behavioral and neurobiological constraints on theories of audiovisual speech integration: A review and suggestions for new directions. Seeing and Perceiving, 24(6), 513–539. http://dx.doi.org/10.1163/187847611X595864.
Alvarado, J. C., Vaughan, J. W., Stanford, T. R., & Stein, B. E. (2007). Multisensory versus unisensory integration: Contrasting modes in the superior colliculus. Journal of Neurophysiology, 97(5), 3193–3205. http://dx.doi.org/10.1152/jn.00018.2007.
Bogacz, S. (2005). Understanding how speed affects performance of polyrhythms: Transferring control as speed increases. Journal of Motor Behavior, 37(1), 21–34.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Brandon, M., & Saffran, J. R. (2011). Apparent motion enhances visual rhythm discrimination in infancy. Attention, Perception & Psychophysics, 73(4), 1016–1020. http://dx.doi.org/10.3758/s13414-011-0106-x.
Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge: MIT Press.
Burger, B., Thompson, M. R., Luck, G., Saarikallio, S., & Toiviainen, P. (2013). Influences of rhythm- and timbre-related musical features on characteristics of music-induced movement. Frontiers in Psychology, 4, 1–10. http://dx.doi.org/10.3389/fpsyg.2013.00183.
Cason, N., & Schön, D. (2012). Rhythmic priming enhances the phonological processing of speech. Neuropsychologia, 50(11), 2652–2658. http://dx.doi.org/10.1016/j.neuropsychologia.2012.07.018.
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Computational Biology, 5, e1000436.
Cogollor, J. M., Hughes, C., Ferre, M., Rojo, J., Hermsdörfer, J., Wing, A., et al. (2012). Handmade task tracking applied to cognitive rehabilitation. Sensors, 12(12), 14214–14231. http://dx.doi.org/10.3390/s121014214.
Cross, E. S., Kraemer, D. J. M., Hamilton, A. F. de C., Kelley, W. M., & Grafton, S. T. (2009). Sensitivity of the action observation network to physical and observational learning. Cerebral Cortex, 19(2), 315–326. http://dx.doi.org/10.1093/cercor/bhn083.
Drake, C., Jones, M., & Baruch, C. (2000). The development of rhythmic attending in auditory sequences: Attunement, referent period, focal attending. Cognition, 77, 251–288.
Elliott, M., Wing, A., & Welchman, A. (2010). Multisensory cues improve sensorimotor synchronisation. European Journal of Neuroscience, 31, 1828–1835.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8(4), 162–169. http://dx.doi.org/10.1016/j.tics.2004.02.002.
Fidali, B. C., Poudrier, È., & Repp, B. H. (2013). Detecting perturbations in polyrhythms: Effects of complexity and attentional strategies. Psychological Research, 77, 183–195. http://dx.doi.org/10.1007/s00426-011-0406-8.
Fujioka, T., Ween, J. E., Jamali, S., Stuss, D. T., & Ross, B. (2012). Changes in neuromagnetic beta-band oscillation after music-supported stroke rehabilitation. Annals of the New York Academy of Sciences, 1252, 294–304. http://dx.doi.org/10.1111/j.1749-6632.2011.06436.x.
Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13(3), 361–377.
Ghazanfar, A. A. (2013). Multisensory vocal communication in primates and the evolution of rhythmic speech. Behavioral Ecology and Sociobiology, 67(9), 1441–1448. http://dx.doi.org/10.1007/s00265-013-1491-z.
Ghazanfar, A. A., Chandrasekaran, C., & Morrill, R. J. (2010). Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience, 31(10), 1807–1817. http://dx.doi.org/10.1111/j.1460-9568.2010.07209.x.
Gilaie-Dotan, S., Kanai, R., Bahrami, B., Rees, G., & Saygin, A. P. (2013). Neuroanatomical correlates of biological motion detection. Neuropsychologia, 51(3), 457–463. http://dx.doi.org/10.1016/j.neuropsychologia.2012.11.027.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental Brain Research, 220(1), 51–61. http://dx.doi.org/10.1007/s00221-012-3114-8.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893–906.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). FMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage, 54(2), 1231–1243. http://dx.doi.org/10.1016/j.neuroimage.2010.09.033.
Grondin, S., & McAuley, J. D. (2009). Duration discrimination in crossmodal sequences. Perception, 38(10), 1542–1559. http://dx.doi.org/10.1068/p6359.
Grube, M., & Griffiths, T. D. (2009). Metricality-enhanced temporal encoding and the subjective perception of rhythmic sequences. Cortex, 45(1), 72–79. http://dx.doi.org/10.1016/j.cortex.2008.01.006.
Guttman, S. E., Gilroy, L. A., & Blake, R. (2005). Hearing what the eyes see: Auditory encoding of visual temporal sequences. Psychological Science, 16(3), 228–235. http://dx.doi.org/10.1111/j.0956-7976.2005.00808.x.
Hausen, M., Torppa, R., Salmela, V. R., Vainio, M., & Särkämö, T. (2013). Music and speech prosody: A common rhythm. Frontiers in Psychology, 4, 1–16. http://dx.doi.org/10.3389/fpsyg.2013.00566.
Hecht, D., Reiner, M., & Karni, A. (2008). Multisensory enhancement: Gains in choice and in simple response times. Experimental Brain Research, 189(2), 133–143. http://dx.doi.org/10.1007/s00221-008-1410-0.
Holmes, N. P. (2007). The law of inverse effectiveness in neurons and behaviour: Multisensory integration versus normal variability. Neuropsychologia, 45(14), 3340–3345. http://dx.doi.org/10.1016/j.neuropsychologia.2007.05.025.
Holmes, N. P. (2009). The principle of inverse effectiveness in multisensory integration: Some statistical considerations. Brain Topography, 21(3–4), 168–176. http://dx.doi.org/10.1007/s10548-009-0097-2.
Hove, M. J., Fairhurst, M. T., Kotz, S. A., & Keller, P. E. (2013a). Synchronizing with auditory and visual rhythms: An fMRI assessment of modality differences and modality appropriateness. NeuroImage, 67, 313–321. http://dx.doi.org/10.1016/j.neuroimage.2012.11.032.
Hove, M. J., Iversen, J. R., Zhang, A., & Repp, B. H. (2013b). Synchronization with competing visual and auditory rhythms: Bouncing ball meets metronome. Psychological Research, 77(4), 388–398. http://dx.doi.org/10.1007/s00426-012-0441-0.
Hove, M. J., Spivey, M. J., & Krumhansl, C. L. (2010). Compatibility of motion facilitates visuomotor synchronization. Journal of Experimental Psychology: Human Perception and Performance, 36(6), 1525–1534. http://dx.doi.org/10.1037/a0019059.
Iversen, J. R., Repp, B. H., & Patel, A. D. (2009). Top-down control of rhythm perception modulates early auditory responses. Annals of the New York Academy of Sciences, 1169, 58–73. http://dx.doi.org/10.1111/j.1749-6632.2009.04579.x.
Jäncke, L. (2000). Cortical activations during paced finger-tapping applying visual and auditory pacing stimuli. Cognitive Brain Research, 10, 51–66.
Jeannerod, M. (2001). Neural simulation of action: A unifying mechanism for motor cognition. NeuroImage, 14(1), S103–S109. http://dx.doi.org/10.1006/nimg.2001.0832.
Jessen, S., Obleser, J., & Kotz, S. A. (2012). How bodies and voices interact in early emotion perception. PLoS ONE, 7(4), e36070. http://dx.doi.org/10.1371/journal.pone.0036070.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 14(2), 201–211. http://dx.doi.org/10.3758/BF03212378.
Jones, M. R., Jagacinski, R. J., Yee, W., Floyd, R. L., & Klapp, S. T. (1995). Tests of attentional flexibility in listening to polyrhythmic patterns. Journal of Experimental Psychology: Human Perception and Performance, 21(2), 293–307.
Kaplan, J. T., & Iacoboni, M. (2007). Multimodal action representation in human left ventral premotor cortex. Cognitive Processing, 8(2), 103–113. http://dx.doi.org/10.1007/s10339-007-0165-z.
Karabanov, A., Blom, O., Forsman, L., & Ullen, F. (2009). The dorsal auditory pathway is involved in performance of both visual and auditory rhythms. NeuroImage, 44(2), 480–488. http://dx.doi.org/10.1016/j.neuroimage.2008.08.047.
Kato, M., & Konishi, Y. (2006). Auditory dominance in the error correction process: A synchronized tapping study. Brain Research, 1084(1), 115–122. http://dx.doi.org/10.1016/j.brainres.2006.02.019.
Kriegstein, von K., Dogan, Ö., Grüter, M., Giraud, A.-L., Kell, C. A., Grüter, T., et al. (2008). Simulation of talking faces in the human brain improves auditory speech recognition. Proceedings of the National Academy of Sciences of the United States of America, 105(18), 6747–6752.
Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.), The psychology of time (pp. 189–231). United Kingdom: Emerald.
Lidji, P., Palmer, C., Peretz, I., & Morningstar, M. (2011). Listeners feel the beat: Entrainment to English and French speech rhythms. Psychonomic Bulletin & Review, 18(6), 1035–1041. http://dx.doi.org/10.3758/s13423-011-0163-0.
Lippert, M., Logothetis, N. K., & Kayser, C. (2007). Improvement of visual contrast detection by a simultaneous sound. Brain Research, 1173, 102–109. http://dx.doi.org/10.1016/j.brainres.2007.07.050.
Lorås, H., Sigmundsson, H., Talcott, J. B., Öhberg, F., & Stensdotter, A. K. (2012). Timing continuous or discontinuous movements across effectors specified by different pacing modalities and intervals. Experimental Brain Research, 220(3–4), 335–347. http://dx.doi.org/10.1007/s00221-012-3142-4.
Luck, G., & Sloboda, J. (2008). Exploring the spatio-temporal properties of simple conducting gestures using a synchronization task. Music Perception, 25(3), 225–239. http://dx.doi.org/10.1525/mp.2008.25.3.225.
Luck, G., & Sloboda, J. A. (2009). Spatio-temporal cues for visually mediated synchronization. Music Perception, 26(5), 465–473. http://dx.doi.org/10.1525/mp.2009.26.5.465.
McAuley, J. D., & Henry, M. J. (2010). Modality effects in rhythm processing: Auditory encoding of visual rhythms is neither obligatory nor automatic. Attention, Perception & Psychophysics, 72(5), 1377–1389. http://dx.doi.org/10.3758/APP.72.5.1377.
Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science (New York, NY), 221(4608), 389–391.
Merker, B., Madison, G., & Eckerdal, P. (2009). On the role and origin of isochrony in human rhythmic entrainment. Cortex, 45, 4–17.
Monahan, C., & Hirsh, I. (1990). Studies in auditory timing: 2. Rhythm patterns. Perception & Psychophysics, 47(3), 227–242.
Patel, A. D. (2012). The OPERA hypothesis: Assumptions and clarifications. Annals of the New York Academy of Sciences, 1252(1), 124–128. http://dx.doi.org/10.1111/j.1749-6632.2011.06426.x.
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hearing Research, 308, 98–108. http://dx.doi.org/10.1016/j.heares.2013.08.011.
Patel, A. D., Iversen, J. R., Chen, Y., & Repp, B. H. (2005). The influence of metricality and modality on synchronization with a beat. Experimental Brain Research, 163, 226–238.
Phillips-Silver, J., & Trainor, L. J. (2007). Hearing what the body feels: Auditory encoding of rhythmic movement. Cognition, 105(3), 533–546. http://dx.doi.org/10.1016/j.cognition.2006.11.006.
Poudrier, È., & Repp, B. H. (2013). Can musicians track two different beats simultaneously? Music Perception, 30(4), 369–390. http://dx.doi.org/10.1525/MP.2013.30.4.369.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2(4), 411–440.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9(2), 129–154.
Quinlan, J. F., Mullett, H., Stapleton, R., FitzPatrick, D., & McCormack, D. (2006). The use of the Zebris motion analysis system for measuring cervical spine movements in vivo. Proceedings of the Institution of Mechanical Engineers. Part H, Journal of Engineering in Medicine, 220(8), 889–896. http://dx.doi.org/10.1243/09544119JEIM53.
Rach, S., & Diederich, A. (2006). Visual–tactile integration: Does stimulus duration influence the relative amount of response enhancement? Experimental Brain Research, 173(3), 514–520. http://dx.doi.org/10.1007/s00221-006-0452-4.
Repp, B. H. (2002). Perception of timing is more context sensitive than sensorimotor synchronization. Perception & Psychophysics, 64(5), 703–716.
Repp, B. H. (2003). Rate limits in sensorimotor synchronization with auditory and visual sequences: The synchronization threshold and the benefits and costs of interval subdivision. Journal of Motor Behavior, 35(4), 355–370. http://dx.doi.org/10.1080/00222890309603156.
Repp, B. H. (2008). Multiple temporal references in sensorimotor synchronization with metrical auditory sequences. Psychological Research, 72, 79–98.
Repp, B. H. (2009). Segregated in perception, integrated for action: Immunity of rhythmic sensorimotor coordination to auditory stream segregation. Quarterly Journal of Experimental Psychology (2006), 62(3), 426–434. http://dx.doi.org/10.1080/17470210802479105.
Repp, B. H., Iversen, J. R., & Patel, A. D. (2008). Tracking an imposed beat within a metrical grid. Music Perception, 26(1), 1–18.
Repp, B. H., & Penel, A. (2002). Auditory dominance in temporal processing: New evidence from synchronization with simultaneous visual and auditory sequences. Journal of Experimental Psychology: Human Perception and Performance, 28(5), 1085–1099.
Repp, B., & Penel, A. (2004). Rhythmic movement is attracted more strongly to auditory than to visual rhythms. Psychological Research, 68(4). http://dx.doi.org/10.1007/s00426-003-0143-8.
Repp, B. H., & Su, Y.-H. (2013). Sensorimotor synchronization: A review of recent research (2006–2012). Psychonomic Bulletin & Review, 20(3), 403–452. http://dx.doi.org/10.3758/s13423-012-0371-2.
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex (New York, NY: 1991), 17(5), 1147–1153. http://dx.doi.org/10.1093/cercor/bhl024.
Rothermich, K., Schmidt-Kassow, M., & Kotz, S. A. (2012). Rhythm's gonna get you: Regular meter facilitates semantic sentence processing. Neuropsychologia, 50(2), 232–244. http://dx.doi.org/10.1016/j.neuropsychologia.2011.10.025.
Saldern, von S., & Noppeney, U. (2013). Sensory and striatal areas integrate auditory and visual signals into behavioral benefits during motion discrimination. Journal of Neuroscience, 33(20), 8841–8849. http://dx.doi.org/10.1523/JNEUROSCI.3020-12.2013.
Saygin, A. P. (2007). Superior temporal and premotor brain areas necessary for biological motion perception. Brain, 130(Pt 9), 2452–2461. http://dx.doi.org/10.1093/brain/awm162.
Saygin, A. P., & Stadler, W. (2012). The role of appearance and motion in action prediction. Psychological Research, 76(4), 388–394. http://dx.doi.org/10.1007/s00426-012-0426-z.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences, 12(3), 106–113. http://dx.doi.org/10.1016/j.tics.2008.01.002.
Schroeder, C. E., Wilson, D. A., Radman, T., Scharfman, H., & Lakatos, P. (2010). Dynamics of active sensing and perceptual selection. Current Opinion in Neurobiology, 20(2), 172–176. http://dx.doi.org/10.1016/j.conb.2010.02.010.
Scott, S. K., McGettigan, C., & Eisner, F. (2009). A little more conversation, a little less action—Candidate roles for the motor cortex in speech perception. Nature Reviews Neuroscience, 10, 295–302.
Senkowski, D., Saint-Amour, D., Höfle, M., & Foxe, J. J. (2011). Multisensory interactions in early evoked brain activity follow the principle of inverse effectiveness. NeuroImage, 56(4), 2200–2208. http://dx.doi.org/10.1016/j.neuroimage.2011.03.075.
Sevdalis, V., & Keller, P. E. (2010). Cues for self-recognition in point-light displays of actions performed in synchrony with music. Consciousness and Cognition, 19(2), 617–626. http://dx.doi.org/10.1016/j.concog.2010.03.017.
Sowiński, J., & Bella, S. D. (2013). Poor synchronization to the beat may result from deficient auditory-motor mapping. Neuropsychologia, 51(10), 1952–1963. http://dx.doi.org/10.1016/j.neuropsychologia.2013.06.027.
Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149.
Stein, B. E., & Stanford, T. R. (2008). Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience, 9, 225–266.
Stein, B. E., Stanford, T. R., Ramachandran, R., Perrault, T. J., & Rowland, B. A. (2009). Challenges in quantifying multisensory integration: Alternative criteria, models, and inverse effectiveness. Experimental Brain Research, 198(2–3), 113–126. http://dx.doi.org/10.1007/s00221-009-1880-8.
Stevenson, R. A., Bushmakin, M., Kim, S., Wallace, M. T., Puce, A., & James, T. W. (2012). Inverse effectiveness and multisensory interactions in visual event-related potentials with audiovisual speech. Brain Topography, 25(3), 308–326. http://dx.doi.org/10.1007/s10548-012-0220-7.
Stevenson, R., & James, T. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage, 44, 1210–1223.
Su, Y.-H. (2014a). Peak velocity as a cue in audiovisual synchrony perception of rhythmic stimuli. Cognition, 131(3), 330–344.
Su, Y.-H. (2014b). Audiovisual beat induction in complex auditory rhythms: Point-light figure movement as an effective visual beat. Acta Psychologica. http://dx.doi.org/10.1016/j.actpsy.2014.05.016.
Su, Y.-H., & Jonikaitis, D. (2011). Hearing the speed: Visual motion biases the perception of auditory tempo. Experimental Brain Research, 214(3), 357–371. http://dx.doi.org/10.1007/s00221-011-2835-4.
Su, Y.-H., & Pöppel, E. (2012). Body movement enhances the extraction of temporal structures in auditory sequences. Psychological Research, 76(3), 373–382. http://dx.doi.org/10.1007/s00426-011-0346-3.
Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience, 5, 90. http://dx.doi.org/10.3389/fnint.2011.00090.
Toiviainen, P., Luck, G., & Thompson, M. R. (2010). Embodied meter: Hierarchical eigenmodes in music-induced movement. Music Perception, 28, 59–70.
Tye-Murray, N., Sommers, M., Spehar, B., Myerson, J., & Hale, S. (2010). Aging, audiovisual integration, and the principle of inverse effectiveness. Ear and Hearing, 1. http://dx.doi.org/10.1097/AUD.0b013e3181ddf7ff.
van Kemenade, B. M., Muggleton, N., Walsh, V., & Saygin, A. P. (2012). Effects of TMS over premotor and superior temporal cortices on biological motion perception. Journal of Cognitive Neuroscience, 24(4), 896–904. http://dx.doi.org/10.1162/jocn_a_00194.
Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: A tutorial review. Attention, Perception & Psychophysics, 72(4), 871–884. http://dx.doi.org/10.3758/APP.72.4.871.
Vuust, P., Roepstorff, A., Wallentin, M., Mouridsen, K., & Ostergaard, L. (2006). It don't mean a thing... Keeping the rhythm during polyrhythmic tension, activates language areas (BA47). NeuroImage, 31, 832–841. http://dx.doi.org/10.1016/j.neuroimage.2005.12.037.
Welch, R., & Warren, D. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.
Werner, S., & Noppeney, U. (2010). Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cerebral Cortex, 20(8), 1829–1842. http://dx.doi.org/10.1093/cercor/bhp248.
Wöllner, C., Deconinck, F. J. A., Parkinson, J., Hove, M. J., & Keller, P. E. (2012). The perception of prototypical motion: Synchronization is enhanced with quantitatively morphed gestures of musical conductors. Journal of Experimental Psychology: Human Perception and Performance, 38(6), 1390–1403. http://dx.doi.org/10.1037/a0028130.
