Dichotic masking of consonants by vowels.

Bruno H. Repp*
Psycholinguistics Laboratories, University of Chicago, Chicago, Illinois 60637
(Received 28 June 1974; revised 11 December 1974)

The dichotic masking effects of a vowel (/a/) on CV syllables (stop + /a/) were investigated by comparing the effect of CV masks with (a) the effect of CV masks in which the vowel was shortened and (b) the effect of an isolated vowel mask. The vowel exerted a clear masking effect, both when isolated and when in CV context, but this effect was much less pronounced than the "masking" produced by a competing consonant. It was hypothesized that consonants exert their "masking" effect on consonant (CV) targets at a central (phonetic) level, while vowel masking is largely a "peripheral" (auditory) phenomenon. Detailed analysis of the vowel masking effect revealed several distinct components. The two most interesting effects were (a) a relationship between stimulus onset asynchrony and the probability of voiced responses, which was interpreted as masking and substitution of the cue of voice onset time, and (b) a relationship between the pitch contour of the vowel mask and the probability of a voiced response, which suggested that the pitch contour of an isolated vowel may act as a voicing cue when paired with a consonant in the other ear. While all these effects were interpreted as consequences of relatively peripheral binaural mixing, the possibility that part of the vowel masking effect occurred at a later, phonetic processing stage could not be excluded.

Subject Classification: 70.30; 65.48, 65.58.

J. Acoust. Soc. Am., Vol. 57, No. 3, March 1975
Copyright © 1975 by the Acoustical Society of America
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 84.88.136.149 On: Tue, 16 Dec 2014 10:12:05

INTRODUCTION

The present study investigates the dichotic (forward, backward, and simultaneous) masking effects of vowels on consonants. The extent and the temporal characteristics of such effects are of theoretical interest for several reasons, as outlined below.

A number of recent studies have presented dichotic CV syllables at various stimulus onset asynchronies (SOAs), in order to determine the time course of dichotic interference (e.g., Studdert-Kennedy, Shankweiler, and Schulman, 1970; Pisoni, 1972; Berlin, Lowe-Bell, Cullen, Thompson, and Loovis, 1973; Kirstein, 1973; McNeill and Repp, 1973; Pisoni and McNabb, 1974; Repp, 1975). With the exception of Pisoni (1972), who used extremely short stimuli and varied interstimulus interval instead of SOA, all these experiments used syllables of 100-350-msec duration which overlapped temporally until SOA exceeded the duration of the leading stimulus. Therefore, forward and backward masking (in terms of SOA) were contaminated with the simultaneous masking effect that the vowel of the leading syllable may have had on the consonant (the formant transitions) of the lagging syllable. There are reasons to believe that this masking effect was very slight. Experiments on dichotic simultaneous masking (often called "central masking") of nonspeech sounds at the detection threshold have shown only relatively weak interference, except when target and mask onsets were close to each other, i.e., at very short SOAs (e.g., Elliott, 1965; Zwislocki, Damianopoulos, Buining, and Glantz, 1967; Deatherage and Evans, 1969). On these grounds, one might expect dichotic "forward masking" (in terms of SOA) between CV syllables to be at a maximum around SOAs of -20 to -40 msec,¹ when the onset of the vowel of the leading syllable coincides with the acoustically much weaker formant transitions of the lagging syllable. This has never been observed: forward masking is always at a maximum at SOA = 0 (McNeill and Repp, 1973; Repp, 1975). Moreover, it is a well-established finding that backward masking between CV syllables exceeds forward masking--the so-called "lag effect" (Studdert-Kennedy et al., 1970; Kirstein, 1973; Porter, 1974)--while any strong masking effect of the vowel should produce precisely the opposite result. Clearly, it is the information contained in the competing consonant that produces most of the interference.

Nevertheless, it is of interest to know whether the overlapping vowel has any effect at all. For example, "forward masking" may be largely due to the simultaneous masking effect of the vowel of the leading syllable. Such a finding would be of considerable importance, since it would indicate that the interference between competing consonants is unidirectional and that it is primarily a backward masking effect, perhaps due to interruption of the cortical processing of the leading consonant, while forward masking may be a more peripheral phenomenon due to auditory interaction at a subcortical level of binaural convergence (where dichotic masking between nonspeech sounds is assumed to arise; cf. Zwislocki, …).

The effect of the vowel can be assessed by comparing CV masks of full duration with CV masks in which the vowel has been shortened. Porter (1971) has taken this approach and apparently found no significant differences. However, the way in which he reported his data (viz., only as the difference between backward and forward masking) allows no good conclusions about the actual effects of his two masks, so that a more detailed investigation is warranted. The first part of the present study compares the effect of consonant-vowel (CV) masks with the effect of "consonant" (C) masks in which the vowel has been reduced to minimal duration.

Thus, one way of studying the masking effect of the vowel is to eliminate the vowel from the mask, as far as this is possible, and to see how much this reduces interference. This method assesses the effect of the physical presence (or duration) of the vowel, since the vowel cannot be completely eliminated after a stop consonant and thus remains present as a phonetic segment. Another method is to eliminate the consonant in the mask by gating out the transitional part of the CV syllable and to see how much masking remains. This approach has been taken in the second part of the present study, which investigates the masking effect of isolated vowels on CV targets and compares it with that of CV masks. Thus, while the first method compares two masks that are acoustically different but phonetically similar, the second method compares two masks that are acoustically more similar but phonetically different.

There is little doubt that dichotic interference between competing consonants arises largely at a late (phonetic) stage in processing. It may consist in the interaction of feature representations which perhaps converge on a unilateral speech processor (Studdert-Kennedy, Shankweiler, and Pisoni, 1972), or it may arise even later, after separate categorization of each syllable, from confusions in a response selection mechanism (Repp, 1975). It is almost certain that, by eliminating the masking consonant, a large part of such high-level interference is removed. The question is whether an isolated vowel interferes with a consonant target at a late stage in processing at all, or whether vowel masking is strictly a more peripheral phenomenon, i.e., interference at an earlier, auditory level of processing, as opposed to interference at the phonetic level.²

Steady-state vowels are acoustically very different from the rapid transitions that characterize stop consonants. Stop consonants are "encoded" into the vowel and are likely to require specialized auditory and phonetic processing mechanisms that are not used in recognizing vowels. The existence of such specialized mechanisms has often been inferred from the result that stop consonants are perceived categorically while vowels are not (Stevens, Liberman, Öhman, and Studdert-Kennedy, 1969; Pisoni, 1971)--although the reason for this difference now appears to be the unavailability of transitional information in auditory short-term memory (Pisoni, 1971, 1973; Studdert-Kennedy, in press)--from the finding of consistent dichotic right-ear advantages for competing stop consonants but not for competing vowels (Shankweiler and Studdert-Kennedy, 1967; Studdert-Kennedy and Shankweiler, 1970), and from the finding of a "lag effect" for competing consonants but not for competing vowels (Porter, Shankweiler, and Liberman, 1969). Although some of these differences are less evident when vowels are more "encoded," such as in rapid speech (e.g., Stevens, 1968; Haggard, 1971) or under difficult listening conditions (Weiss and House, 1973), this is less relevant in the present case, where the vowel is isolated and steady-state and, moreover, constant from trial to trial. There seems to be little reason to expect the masking vowel to interfere with the target consonant at a late stage in processing, since the vowel presumably contains no features, and generates no responses, that are in strong conflict with features or responses for the target consonant. One might expect, therefore, that vowel masking, if present at all, will reflect relatively peripheral interference, perhaps due to auditory interaction at the point of first binaural convergence. This effect should be primarily a simultaneous masking effect and should be similar in nature to that of the masking vowel in CV context, which is assessed by the CV-C comparison.³ There should be relatively little forward or backward masking due to the vowel itself.

A further question of interest is whether vowel masks produce a "lag effect," i.e., more backward than forward masking (in terms of SOA). Since they are expected to exert mainly a forward masking effect (in terms of SOA; actually a simultaneous masking effect), and since the "lag effect" has been assumed to arise at a phonetic stage in processing (Kirstein, 1973; Porter, 1974), this effect should be absent with vowel masks. A study by Darwin (1971) in which CV targets were paired with CV or V (isolated vowel) masks at SOA = ±60 showed that CV was much more effective than V as a backward mask, while both were about equally effective as forward masks. As a result, CV masks produced a lag effect but V masks did not. Moreover, this difference seemed to be primarily due to differential masking of the "place" feature of the target, which is the feature that is truly encoded in the formant transitions, suggesting that the CV mask interfered specifically with the "decoding" of the target while the V mask did not.

The present study attempts a more comprehensive investigation of these effects. In addition to C and V masks, the present study used a short vowel (v) mask in a third condition, in order to assess the effect of vowel duration (and energy) in isolated vowel masks. All masks were presented at two intensities, one equal to target intensity and the other 8 dB higher. This large design necessitated some other restrictions: the targets remained at a constant intensity and were always presented in the right ear. The comparison data for CV masks were taken from an earlier study by Repp (1975). The present stimulus tapes were also taken from the earlier study, after obtaining the masking stimuli by an electronic gating procedure, and the stimulus intensities of the earlier experiment were matched as closely as possible.

I. METHOD

A. Stimuli

The source tape was recorded at Haskins Laboratories (New Haven, Connecticut) using their Parallel Formant Synthesizer and Pulse Code Modulation System. It contained all possible dichotic pairs, including identical pairs, of the six syllables /ba/, /da/, /ga/, /pa/, /ta/, and /ka/. Each pair occurred at each of eight different SOAs (0, 20, 40, 60, 90, 120, 240, and 480 msec), resulting in 8 × 36 = 288 stimulus pairs. They were arranged into two nearly equivalent half-sets of 144, each containing 8 lists of 18 pairs. The sequence of stimulus pairs within each half-set was completely random. The interval between successive pairs was about 4 sec, with a longer interval between lists. Channel A always lagged behind channel B (except at SOA = 0).
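The stimulus bookkeeping above (6 × 6 ordered pairs at 8 SOAs, arranged into two half-sets of 8 lists × 18 pairs) can be checked with a short Python sketch. The text does not say how the "nearly equivalent" half-sets were formed, so the split used here (18 of the 36 pairs at each SOA going to each half-set) is a hypothetical choice, as are the function names and random seeds.

```python
import itertools
import random

SYLLABLES = ["ba", "da", "ga", "pa", "ta", "ka"]
SOAS_MS = [0, 20, 40, 60, 90, 120, 240, 480]

def all_stimulus_pairs():
    """Every ordered pair (lagging channel A, leading channel B), identical pairs included, at every SOA."""
    return [(a, b, soa)
            for soa in SOAS_MS
            for a, b in itertools.product(SYLLABLES, repeat=2)]

def split_into_half_sets(pairs, seed=0):
    """Hypothetical split: at each SOA, 18 of the 36 pairs go to each half-set."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for soa in SOAS_MS:
        at_soa = [p for p in pairs if p[2] == soa]
        rng.shuffle(at_soa)
        half_a.extend(at_soa[:18])
        half_b.extend(at_soa[18:])
    return half_a, half_b

def make_lists(half_set, list_len=18, seed=0):
    """Randomize the order within a half-set, then cut it into consecutive lists."""
    rng = random.Random(seed)
    order = half_set[:]
    rng.shuffle(order)
    return [order[i:i + list_len] for i in range(0, len(order), list_len)]

pairs = all_stimulus_pairs()
print(len(pairs))  # 288
half_a, half_b = split_into_half_sets(pairs)
lists_a = make_lists(half_a)
print(len(half_a), len(lists_a), len(lists_a[0]))  # 144 8 18
```

Balancing the split by SOA is only one way to make the two half-sets "nearly equivalent"; any split that keeps each half-set at 144 pairs gives the same list arithmetic.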


TABLE I. Important temporal characteristics of the synthetic syllables (in milliseconds).

                                /ba/   /da/   /ga/   /pa/   /ta/   /ka/
Total duration                   350    350    350    335    335    335
Onset of transitions (a,b)        95     90     85     70     70     70
Onset of vowel portion (b)       135    130    135    120    115    120
Duration of transitions (a)       45     45     55     55     50     55

(a) This includes the brief burst preceding the transitions.
(b) Measured from "syllable onset."

The syllables exhibited several distinctive characteristics in addition to formant transitions and voice onset time, the necessary acoustic features. The most important of these additional features was the fundamental frequency (F0, or pitch) contour of the vowel portion following the formant transitions: "V-" vowels (i.e., vowels preceded by voiceless consonants) started at a higher F0, which fell through the duration of the vowel at a faster rate than that of "V+" vowels (see below). Individual syllables also differed in total duration, and in onset time and duration of the formant transitions, as shown in Table I. (The values in Table I were taken from the computer synthesis specifications, which used 5-msec time frames.) The syllables were temporally aligned with respect to their "onsets," which were faint acoustic events (probably computer artifacts) that were inaudible as such but distinguishable on oscilloscope tracings. The syllables were thus not precisely synchronized or separated with respect to the onset of formant transitions (presumably the best reference point), a factor that was not under the author's control and had to be ignored in the analysis. (The actual SOAs with respect to the onset of formant transitions differed from the "nominal" SOAs by up to ±25 msec.)

From the source tape, six pairs of experimental tapes (the pairs corresponding to the two stimulus half-sets) were prepared. Three pairs of tapes were forward masking (FM) tapes and three were backward masking (BM) tapes. Each of the three pairs of FM (BM) tapes was assigned a different mask type (C, V, or v). The masks were produced by electronically gating out certain parts of the syllables on one (the masking) channel, while the syllables on the other channel (the targets) were dubbed directly from the source tape.

The equipment used in generating the experimental tapes consisted of two tape recorders (Magnecord 1022), two electronic switches (Grason-Stadler model 1287), an amplifier (Grason-Stadler model 1288), a level/zone detector (Grason-Stadler model 1270), and two timers (Grason-Stadler model 1216A). The level detector was set to react to the amplified syllable onsets and started the first timer, which in turn opened the first switch for 140 msec (C masks). The offset of the first timer closed the first switch and started the second timer, which opened the second switch for either (arbitrarily) 300 msec (V masks) or 50 msec (v masks). The output from the appropriate switch was recorded on the masking channel of the experimental tape and was simultaneously monitored on a dual-trace storage oscilloscope (Tektronix type 564B). The rise/fall time of the electronic switches was set at 1 msec, which generated sharp onsets and offsets but no audible transients or clicks, at least not at the relatively low intensities used in the experiment.

Pilot studies had shown that C masks in isolation were as readily identifiable as full CV syllables. Informal tests also showed that the difference in pitch contour between "V+" and "V-" vowel masks could be distinguished by a careful listener; however, little effect of this variable was expected in the experiment.

Each experimental tape contained the appropriate stimulus half-set recorded twice in succession, to make it possible to start with any list and continue through the whole half-set without interruption. Care was taken to record all targets and masks at the same intensity levels. Multiple checks revealed the variation between channels and between tapes to be well within 1 dB. Subsequently, the playback intensities were calibrated such that they approximated as closely as possible the target intensity (46 dB SPL) and mask intensities (46 and 54 dB SPL) used in the earlier CV masking experiment (Repp, 1975). This match was made visually on a storage oscilloscope and was confirmed (for target syllables) by voltmeter and sound-level meter (peak) readings. Accuracy was within 1 dB.
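The two-timer gating described above can be mimicked digitally. The following Python sketch is only an illustration under stated assumptions: the original procedure was analog, the 10-kHz sampling rate and the flat stand-in "syllable" are hypothetical, and the 1-msec rise/fall is modeled as a linear ramp because the text gives only the rise/fall time, not its shape. The names `gate` and `make_masks` are illustrative.

```python
FS_HZ = 10_000           # hypothetical sampling rate for this digital sketch

C_CUT_MS = 140           # first timer: consonantal part -> C mask
V_TIMER_MS = 300         # second timer for V masks (outlasts the syllable)
SHORT_V_TIMER_MS = 50    # second timer for v masks
RAMP_MS = 1.0            # rise/fall time of the electronic switches

def gate(samples, fs_hz, start_ms, dur_ms, ramp_ms=RAMP_MS):
    """Pass `samples` only while a switch window [start_ms, start_ms + dur_ms) is open.

    Linear ramps of `ramp_ms` (an assumed shape) are applied at window onset and
    offset; if the syllable ends before the window closes, the syllable's own
    offset ends the mask.
    """
    n = len(samples)
    start = round(start_ms * fs_hz / 1000)
    window_stop = round((start_ms + dur_ms) * fs_hz / 1000)
    ramp = max(1, round(ramp_ms * fs_hz / 1000))
    out = [0.0] * n
    for i in range(start, min(n, window_stop)):
        gain = min(1.0, (i - start + 1) / ramp, (window_stop - i) / ramp)
        out[i] = samples[i] * gain
    return out

def make_masks(syllable, fs_hz=FS_HZ):
    """Return (C, V, v) masks; time zero is the detected syllable onset."""
    c_mask = gate(syllable, fs_hz, 0, C_CUT_MS)
    # The second timer starts when the first one ends, i.e., at the cut-point.
    v_mask = gate(syllable, fs_hz, C_CUT_MS, V_TIMER_MS)
    short_v_mask = gate(syllable, fs_hz, C_CUT_MS, SHORT_V_TIMER_MS)
    return c_mask, v_mask, short_v_mask

# Usage: a flat 350-msec stand-in for a voiced syllable (3500 samples at 10 kHz).
syllable = [1.0] * 3500
c_mask, v_mask, short_v_mask = make_masks(syllable)
```

Because the 300-msec timer outlasts the syllable, the V mask here ends with the syllable itself and comes out at 210 msec (2100 samples), the same duration the text gives for "V+" masks.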

For the masks, a "cut-point" was chosen at 140 msec from onset, which was the earliest point by which the formant transitions of all syllables were completed (cf. Table I). As illustrated in Fig. 1, C masks consisted of the consonantal part before the cut-point, with 5-25 msec of steady-state vowel (depending on the individual syllable), while V masks consisted of the vowel portion after the cut-point. The duration of V masks was 210 msec and 195 msec for "V+" and "V-" vowels, respectively. "V+" vowels had a linearly falling F0 contour from 120 to 100 Hz, while the F0 of "V-" vowels started at 128 Hz and remained there for 20 msec before it fell linearly to 100 Hz. The duration of v masks was arbitrarily chosen to be 50 msec, so that they represented the first 50 msec of V masks. The final F0 of "V+" and "V-" v masks was 116 and 124 Hz, respectively.

FIG. 1. Schematic illustration of the four kinds of masking stimuli and their temporal relationship to each other (see text). [Time axis in msec, marked at 0, ~80, 140, 190, and 350; regions labeled "Onset," "Transitions," and "Vowel"; masks CV, C, V, and v.]

The clipping of the initial part of the syllables in the case of vowel masks resulted in a systematic shift of SOAs. This shift is appropriately calculated with respect to the onset of the formant transitions of the target and was between +45 and +55 msec for voiced targets, and +70 msec for voiceless targets (cf. Table I and Fig. 1). When voiced and voiceless targets were combined in the data analysis, an average shift factor of +60 msec was adopted, so that SOAs ranged from -420 to +60 msec in the vowel "FM" condition, and from +60 to +300 msec in the vowel BM condition. Thus, three of the SOAs in the FM condition were actually BM relationships and were treated as such in the data analysis.
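The SOA shift follows directly from the 140-msec cut-point and the transition onsets in Table I. A minimal Python sketch, assuming (as the text indicates) that the shift is the delay between the vowel-mask onset at the cut-point and the target's transition onset, and writing FM SOAs as negative values (mask leading), reproduces the per-target shifts, the +60 msec average, and the -420 to +60 msec range of the vowel "FM" condition. The variable names are illustrative.

```python
CUT_POINT_MS = 140
# Onset of formant transitions (including burst), msec from "syllable onset" (Table I).
TRANSITION_ONSET_MS = {"ba": 95, "da": 90, "ga": 85, "pa": 70, "ta": 70, "ka": 70}
# Nominal FM SOAs; negative values mean the mask leads the target.
NOMINAL_FM_SOAS_MS = [0, -20, -40, -60, -90, -120, -240, -480]

def soa_shift(target):
    """Delay of a vowel mask's onset (at the cut-point) relative to the target's transitions."""
    return CUT_POINT_MS - TRANSITION_ONSET_MS[target]

shifts = {syl: soa_shift(syl) for syl in TRANSITION_ONSET_MS}
# Three voiced and three voiceless targets, so the plain mean is the combined shift.
mean_shift = sum(shifts.values()) / len(shifts)
effective_fm = [soa + mean_shift for soa in NOMINAL_FM_SOAS_MS]
now_bm = [soa for soa in effective_fm if soa > 0]     # mask now lags: a BM relationship
still_fm = [soa for soa in effective_fm if soa <= 0]

print(shifts)        # {'ba': 45, 'da': 50, 'ga': 55, 'pa': 70, 'ta': 70, 'ka': 70}
print(mean_shift)    # 60.0
print(effective_fm)  # [60.0, 40.0, 20.0, 0.0, -30.0, -60.0, -180.0, -420.0]
```

The three positive values (+60, +40, +20 msec) are the FM SOAs that became BM relationships; the five that remain are consistent with the five FM SOAs in the vowel-mask design.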

B. Design and analysis

The experiment had a complete factorial design within subjects (Ss). There were 12 experimental conditions resulting from the combination of three factors: mask type (C, V, v), masking condition (FM or BM), and mask intensity (46 or 54 dB). The division into stimulus half-sets led to 24 experimental conditions, which were administered to each of 12 Ss in a balanced Latin-square design. There were four sessions per S, with six experimental conditions in each. In addition to these, each S received four control lists, one at the beginning of each session. Each control list consisted of a complete half-set of 144 target syllables played to the right ear without masks in the left ear. Since the targets were taken from the tape that was to follow in the first experimental trial of the particular session, the design of control lists was balanced over stimulus half-sets, tapes, and Ss.

A comparison between all three mask types was not so much of interest as the comparisons between CV and C masks, and between V and v masks. (A statistical comparison between CV and V masks was not deemed necessary.) The CV masking data at the corresponding target and mask intensities were taken from Repp (1975). In the comparison of CV and C masks, "mask duration" constituted a between-group factor. Since the earlier study had used different groups of Ss in FM and BM, these two conditions had to be treated separately in order to keep apart between-S and within-S effects. In the comparison of CV and C masks, the CV data were averaged over the laterality factor (left versus right ear) to increase the stability of the data. Otherwise, the statistical analysis was conducted as in Repp (1975). In addition to the mask intensity factor, CV and C conditions shared precisely the same design on the stimuli: 8 SOAs by 36 stimulus pairs. To facilitate analysis, the stimulus pairs were classified into four "feature contrast" groups, according to whether the two consonants in each pair contrasted in both, either, or none of the two articulatory features, voicing and place. Thus, the design on the stimuli was effectively an 8 × 4 factorial. The three linear contrasts of the feature contrast factor were treated separately: the "identity effect" (identical versus nonidentical pairs), the "similarity effect" (double-feature contrasts versus single-feature contrasts), and the "feature-specific effect" (single-feature voicing contrasts versus place contrasts).

In the comparison between vowel masks, FM and BM had to be treated separately too, because of the SOA shift discussed above. There were only five SOAs in FM but ten in BM. In each condition, there were four within-S factors: mask duration (V vs v), mask intensity, SOA, and "vowel identity." The last factor resulted from the two different pitch contours of the vowels: target-mask pairs were classified according to whether the target did or did not contrast in voicing with the consonant that originally preceded the vowel mask, or, in other words, whether the mask was identical with the vowel portion of the target or not. This factor was included only as a control.

C. Procedure

The S was seated in a sound-isolated booth equipped with a writing table, paper, and pencil. Playback was via a Magnecord 1022 tape recorder, a pair of Grason-Stadler E10589A transformers, a pair of calibrated attenuators assembled at this laboratory, and a pair of Grason-Stadler PDH-10 earphones. Targets were always in the right ear and masks in the left ear. The S was instructed to write down the targets only and to ignore the other ear as much as possible. It was stressed that a response had to be given to each stimulus pair, and that, with C masks, the mask would be different from the target in most (but not all) pairs. The exact nature of the stimuli and of the experimental conditions (but not of the experimental hypotheses) was explained to each S prior to the experiment. Control trials always started with the first list of the respective half-set. Subsequent experimental trials each started with a different list, determined by random sampling without replacement from the numbers 2-8, individually for each S and session. Each session lasted approximately 90 minutes, including short breaks between trials. All sessions took place on different days, with varying intervals in between.
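The four feature-contrast groups and the three contrasts on them can be made concrete with a small Python sketch. The voicing and place assignments of /b d g p t k/ are standard phonetics; the contrast weights are one orthogonal coding consistent with the verbal description above, not coefficients reported in the paper, and all names are illustrative.

```python
import itertools
from collections import Counter

# Voicing and place of articulation for the six stop consonants.
FEATURES = {
    "b": ("voiced", "labial"), "d": ("voiced", "alveolar"), "g": ("voiced", "velar"),
    "p": ("voiceless", "labial"), "t": ("voiceless", "alveolar"), "k": ("voiceless", "velar"),
}
GROUPS = ["0-FC", "1-FC voicing", "1-FC place", "2-FC"]

def contrast_group(target, mask):
    """Classify a target-mask pair by which articulatory features differ."""
    (t_voice, t_place), (m_voice, m_place) = FEATURES[target], FEATURES[mask]
    voicing, place = t_voice != m_voice, t_place != m_place
    if voicing and place:
        return "2-FC"
    if voicing:
        return "1-FC voicing"
    if place:
        return "1-FC place"
    return "0-FC"

# Hypothetical orthogonal weights over GROUPS (in the order listed above).
CONTRASTS = {
    "identity": [3, -1, -1, -1],        # identical vs. nonidentical pairs
    "similarity": [0, -1, -1, 2],       # double- vs. single-feature contrasts
    "feature-specific": [0, 1, -1, 0],  # voicing vs. place contrasts
}

def contrast_value(group_means, weights):
    """Weighted sum of the four group means (e.g., percent correct per group)."""
    return sum(w * m for w, m in zip(weights, group_means))

counts = Counter(contrast_group(t, m) for t, m in itertools.product(FEATURES, repeat=2))
print([counts[g] for g in GROUPS])  # [6, 6, 12, 12]
```

The unequal group sizes (6, 6, 12, 12 of the 36 pairs) are worth keeping in mind when interpreting contrasts computed on unweighted group means.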

D. Subjects

The Ss were volunteers recruited through an ad in the student newspaper and were paid $2.00 per hour. Prospective Ss had to pass a criterion of at least 50% correct on the first control trial. Eight Ss failed this criterion and were rejected. The 12 Ss who passed the criterion were seven females and five males, aged 16 to 25. All except two females were right-handed. No S admitted any hearing defect, and all were naive with respect to synthetic speech and dichotic listening.

II. RESULTS

A. Control trials

The average score (percentage of correct responses) in control trials was 63.6, with individual scores ranging from 48.8 to 83.9. This result was comparable to that obtained in the CV masking experiments (Repp, 1975). There was some improvement with practice: control trial scores rose by an average of 8.1% over the four sessions (p < 0.04). Average performance in experimental trials rose by 5.7%, which did not reach significance but represents a comparable effect.

B. Consonant masks

The separate analysis of C masking replicated virtually all the basic results that had been obtained previously with CV masks and which have been discussed in detail by Repp (1975). They are shown in Fig. 2, which may be compared with the results for comparable target and mask intensities in Figs. 2 and 3 of Repp (1975). There is no need to enumerate and discuss these findings further here.

FIG. 2. C masking functions for identical pairs (0-FC), single-feature contrasts (1-FC), and double-feature contrasts (2-FC), at two mask (M) intensities. No-mask control (C) performance is indicated by dashed lines to the margins. [Horizontal axis: SOA (msec), from -480 (FM) to +480 (BM).]

The effects that are of special interest here are the …

It is quite clear that the dominant factor in masking is the consonant of the masking syllable, since C masking is substantial. The vowel of CV masks was expected to exert a simultaneous masking effect where it overlapped the target consonant, i.e., between SOA = -60 and SOA = -240 in FM. Figures 3 and 4 indicate that, indeed, CV masking exceeded C masking in that region, particularly at the higher mask intensity. The interaction between mask duration and SOA was significant in FM (…

FIG. 4. CV and C masking functions, separately for identical and nonidentical pairs, at the lower mask intensity (46 dB). [Curves labeled ID and NONID; horizontal axis: SOA (msec), FM and BM.]
