Dichotic masking of consonants by vowels
Bruno H. Repp*
Psycholinguistics Laboratories, University of Chicago, Chicago, Illinois 60637
(Received 28 June 1974; revised 11 December 1974)
The dichotic masking effects of a vowel (/a/) on CV syllables (stop + /a/) were investigated by comparing the effect of CV masks with (a) the effect of CV masks in which the vowel was shortened and (b) the effect of an isolated vowel mask. The vowel exerted a clear masking effect, both when isolated and when in CV context, but this effect was much less pronounced than the "masking" produced by a competing consonant. It was hypothesized that consonants exert their "masking" effect on consonant (CV) targets at a central (phonetic) level, while vowel masking is largely a "peripheral" (auditory) phenomenon. Detailed analysis of the vowel masking effect revealed several distinct components. The two most interesting effects were (a) a relationship between stimulus onset asynchrony and the probability of voiced responses, which was interpreted as masking and substitution of the cue of voice onset time, and (b) a relationship between the pitch contour of the vowel mask and the probability of a voiced response, which suggested that the pitch contour of an isolated vowel may act as a voicing cue when paired with a consonant in the other ear. While all these effects were interpreted as consequences of relatively peripheral binaural mixing, the possibility that part of the vowel masking effect occurred at a later, phonetic processing stage could not be excluded.
Subject Classification: 70.30; 65.48, 65.58.
J. Acoust. Soc. Am., Vol. 57, No. 3, March 1975
Copyright © 1975 by the Acoustical Society of America
INTRODUCTION
The present study investigates the dichotic (forward, backward, and simultaneous) masking effects of vowels on consonants. The extent and the temporal characteristics of such effects are of theoretical interest for several reasons, as outlined below.
A number of recent studies have presented dichotic CV syllables at various stimulus onset asynchronies (SOAs), in order to determine the time course of dichotic interference (e.g., Studdert-Kennedy, Shankweiler, and Schulman, 1970; Pisoni, 1972; Berlin, Lowe-Bell, Cullen, Thompson, and Loovis, 1973; Kirstein, 1973; McNeill and Repp, 1973; Pisoni and McNabb, 1974; Repp, 1975). With the exception of Pisoni (1972), who used extremely short stimuli and varied interstimulus interval instead of SOA, all these experiments used syllables of 100-350-msec duration which overlapped temporally until SOA exceeded the duration of the leading stimulus. Therefore, forward and backward masking (in terms of SOA) were contaminated with the simultaneous masking effect that the vowel of the leading syllable may have had on the consonant (the formant transitions) of the lagging syllable. There are reasons to believe that this masking effect was very slight. Experiments on dichotic simultaneous masking (often called "central masking") of nonspeech sounds at the detection threshold have shown only relatively weak interference, except when target and mask onsets were close to each other, i.e., at very short SOAs (e.g., Elliott, 1965; Zwislocki, Damianopoulos, Buining, and Glantz, 1967; Deatherage and Evans, 1969). On these grounds, one might expect dichotic "forward masking" (in terms of SOA) between CV syllables to be at a maximum around SOAs of -20 to -40 msec,¹ when the onset of the vowel of the leading syllable coincides with the acoustically much weaker formant transitions of the lagging syllable. This has never been observed: forward masking is always at a maximum at SOA = 0 (McNeill and Repp, 1973; Repp, 1975). Moreover, it is a well-established finding that backward masking between CV syllables exceeds forward masking--the so-called "lag effect" (Studdert-Kennedy et al., 1970; Kirstein, 1973; Porter, 1974)--while any strong masking effect of the vowel should produce precisely the opposite result. Clearly, it is the information contained in the competing consonant that produces most of the interference.
Nevertheless, it is of interest to know whether the overlapping vowel has any effect at all. For example, "forward masking" may be largely due to the simultaneous masking effect of the vowel of the leading syllable. Such a finding would be of considerable importance, since it would indicate the interference between competing consonants is unidirectional and that it is primarily a backward masking effect, perhaps due to interruption of the cortical processing of the leading consonant, while forward masking may be a more peripheral phenomenon due to auditory interaction at a subcortical level of binaural convergence (where dichotic masking between nonspeech sounds is assumed to arise--cf. Zwislocki,
The effect of the vowel can be assessed by comparing CV masks of full duration with CV masks in which the vowel has been shortened. Porter (1971) has taken this approach and apparently found no significant differences. However, the way in which he reported his data (viz., only as the difference between backward and forward masking) allows no good conclusions about the actual effects of his two masks, so that a more detailed investigation is warranted. The first part of the present study compares the effect of consonant-vowel (CV) masks with the effect of "consonant" (C) masks in which the vowel has been reduced to minimal duration.
Thus, one way of studying the masking effect of the vowel is to eliminate the vowel from the mask, as far as this is possible, and to see how much this reduces
interference. This method assesses the effect of the physical presence (or duration) of the vowel, since it cannot be completely eliminated after a stop consonant and thus remains present as a phonetic segment. Another method is to eliminate the consonant in the mask by gating out the transitional part of the CV syllable and to see how much masking remains. This approach has been taken in the second part of the present study which investigates the masking effect of isolated vowels on CV targets and compares it with that of CV masks. Thus, while the first method compares two masks that are acoustically different but phonetically similar, the second method compares two masks that are acoustically more similar but phonetically different.
There is little doubt that dichotic interference between competing consonants arises largely at a late (phonetic) stage in processing. It may consist in the interaction of feature representations which perhaps converge on a unilateral speech processor (Studdert-Kennedy, Shankweiler, and Pisoni, 1972), or it may arise even later, after separate categorization of each syllable, from confusions in a response selection mechanism (Repp, 1975). It is almost certain that, by eliminating the masking consonant, a large part of such high-level interference is removed. The question is whether an isolated vowel does interfere with a consonant target at a late stage in processing at all, or whether vowel masking is strictly a more peripheral phenomenon, i.e., interference at an earlier, auditory level of processing, as opposed to interference at the phonetic level.²
Steady-state vowels are acoustically very different from the rapid transitions that characterize stop consonants. Stop consonants are "encoded" into the vowel and are likely to require specialized auditory and phonetic processing mechanisms that are not used in recognizing vowels. The existence of such specialized mechanisms has often been inferred from the result that stop consonants are perceived categorically while vowels are not (Stevens, Liberman, Öhman, and Studdert-Kennedy, 1969; Pisoni, 1971)--although the reason for this difference appears now to be the unavailability of transitional information in auditory short-term memory (Pisoni, 1971, 1973; Studdert-Kennedy, in press)--from the finding of consistent dichotic right-ear advantages for competing stop consonants but not for competing vowels (Shankweiler and Studdert-Kennedy, 1967; Studdert-Kennedy and Shankweiler, 1970), and from the finding of a "lag effect" for consonants but not for competing vowels (Porter, Shankweiler, and Liberman, 1969). Although some of these differences are less evident when vowels are more "encoded," such as in rapid speech (e.g., Stevens, 1968; Haggard, 1971) or under difficult listening conditions (Weiss and House, 1973), this is less relevant in the present case where the vowel is isolated and steady-state, and, moreover, constant from trial to trial. There seems to be little reason to expect the masking vowel to interfere with the target consonant at a late stage in processing, since the vowel presumably contains no features, and generates no responses, that are in strong conflict with features or responses for the target consonant. One might expect, therefore, that vowel masking, if present at all, will reflect relatively peripheral interference, perhaps due to auditory interaction at the point of first binaural convergence. This effect should be primarily a simultaneous masking effect and should be similar in nature to that of the masking vowel in CV context which is assessed by the CV-C comparison.³ There should be relatively little forward or backward masking due to the vowel itself.
A further question of interest is whether vowel masks produce a "lag effect," i.e., more backward than forward masking (in terms of SOA). Since they are expected to exert mainly a forward masking (in terms of SOA; actually simultaneous masking) effect, and since "lag effect" has been assumed to arise at a phonetic stage in processing (Kirstein, 1973; Porter, 1974), this effect should be absent with vowel masks. A study by Darwin (1971) in which CV targets were paired with CV or V (isolated vowel) masks at SOA = ±60 showed that CV was much more effective than V as a backward mask, while both were about equally effective as forward masks. As a result, CV masks produced a lag effect but V masks did not. Moreover, this difference seemed to be primarily due to differential masking of the "place" feature of the target, which is the feature that is truly encoded in the formant transitions, suggesting that the CV mask interfered specifically with the "decoding" of the target while the V mask did not. The present study attempts a more comprehensive investigation of these effects.
In addition to C and V masks, the present study used a short vowel (v) mask in a third condition, in order to assess the effect of vowel duration (and energy) in isolated vowel masks. All masks were presented at two intensities, one equal to target intensity and the other 8 dB higher. This large design necessitated some other restrictions: the targets remained at a constant intensity and were always presented in the right ear. The comparison data for CV masks were taken from an earlier study by Repp (1975). The present stimulus tapes were also taken from the earlier study, after obtaining the masking stimuli by an electronic gating procedure, and the stimulus intensities of the earlier experiment were matched as closely as possible.
I. METHOD
A. Stimuli
The source tape was recorded at Haskins Laboratories (New Haven, Connecticut) using their Parallel Formant Synthesizer and Pulse Code Modulation System. It contained all possible dichotic pairs, including identical pairs, of the six syllables, /ba/, /da/, /ga/, /pa/, /ta/, and /ka/. Each pair occurred at each of eight different SOAs (0, 20, 40, 60, 90, 120, 240, and 480 msec), resulting in 8 x 36 = 288 stimulus pairs. They were arranged into two nearly equivalent half-sets of 144, each containing 8 lists of 18 pairs. The sequence of stimulus pairs within each half-set was completely random. The interval between successive pairs was about 4 sec, with a longer interval between lists. Channel A always lagged behind channel B (except at SOA = 0).
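The size of the stimulus set described under "A. Stimuli" can be checked with a short sketch. It assumes that the 36 pairs are ordered (channel A, channel B) combinations of the six syllables, which is the only reading that yields 36 pairs including identical ones; the variable and function names are illustrative and do not come from the paper.

```python
from itertools import product

# The six synthetic syllables and the eight nominal SOAs listed in the text.
SYLLABLES = ["ba", "da", "ga", "pa", "ta", "ka"]
SOAS_MSEC = [0, 20, 40, 60, 90, 120, 240, 480]


def build_stimulus_set():
    """Return every (channel_a, channel_b, soa) combination.

    Assumption: pairs are ordered, so /ba/-/da/ and /da/-/ba/ count
    separately and identical pairs such as /ba/-/ba/ are included.
    """
    pairs = list(product(SYLLABLES, repeat=2))
    return [(a, b, soa) for soa in SOAS_MSEC for (a, b) in pairs]


stimuli = build_stimulus_set()
n_pairs = len(set((a, b) for a, b, _ in stimuli))   # distinct syllable pairs
half_set_size = len(stimuli) // 2                   # two half-sets
pairs_per_list = half_set_size // 8                 # 8 lists per half-set

print(n_pairs, len(stimuli), half_set_size, pairs_per_list)
```

The sketch only reproduces the counts; how the pairs were actually distributed over the two "nearly equivalent" half-sets is not specified in the text and is not modeled here.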
The syllables exhibited several distinctive characteristics in addition to formant transitions and voice onset time, the necessary acoustic features. The most important of these additional features was the fundamental frequency (F0, or pitch) contour of the vowel portion following the formant transitions: "V-" vowels (i.e., vowels preceded by voiceless consonants) started at a higher F0 which fell through the duration of the vowel at a faster rate than that of "V+" vowels (see below). Individual syllables also differed in total duration, and in onset time and duration of the formant transitions, as shown in Table I. (The values in Table I were taken from the computer synthesis specifications which used 5-msec time frames.) The syllables were temporally aligned with respect to their "onsets," which were faint acoustic events (probably computer artifacts) that were inaudible as such but distinguishable on oscilloscope tracings. The syllables were thus not precisely synchronized or separated with respect to the onset of formant transitions (presumably the best reference point), a factor that was not under the author's control and had to be ignored in the analysis. (The actual SOAs with respect to the onset of formant transitions differed from the "nominal" SOAs by up to ±25 msec.)

TABLE I. Important temporal characteristics of the synthetic syllables (in milliseconds).

                              /ba/  /da/  /ga/  /pa/  /ta/  /ka/
Total duration                 350   350   350   335   335   335
Onset of transitions (a,b)      95    90    85    70    70    70
Onset of vowel portion (b)     135   130   135   120   115   120
Duration of transitions (a)     45    45    55    55    50    55

(a) This includes the brief burst preceding the transitions.
(b) Measured from "syllable onset."

From the source tape, six pairs of experimental tapes (the pairs corresponding to the two stimulus half-sets) were prepared. Three pairs of tapes were forward masking (FM) tapes and three were backward masking (BM) tapes. Each of the three pairs of FM (BM) tapes was assigned a different mask type (C, V, or v). The masks were produced by electronically gating out certain parts of the syllables on one (the masking) channel, while the syllables on the other channel (the targets) were dubbed directly from the source tape.
For the masks, a "cut-point" was chosen at 140 msec from onset, which was the earliest point by which the formant transitions of all syllables were completed (cf. Table I). As illustrated in Fig. 1, C masks consisted of the consonantal part before the cut-point, with 5-25 msec of steady-state vowel (depending on the individual syllable), while V masks consisted of the vowel portion after the cut-point. The duration of V masks was 210 and 195 msec for "V+" and "V-" vowels, respectively. "V+" vowels had a linearly falling F0 contour from 120-100 Hz, while the F0 of "V-" vowels started at 128 Hz and remained there for 20 msec before it fell linearly to 100 Hz. The duration of v masks was arbitrarily chosen to be 50 msec, so that they represented the first 50 msec of V masks. The final F0 of "V+" and "V-" v masks was 116 and 124 Hz, respectively.

FIG. 1. Schematic illustration of the four kinds of masking stimuli and their temporal relationship to each other (see text). [Time axis in msec, marked at 0, ~80, 140, 190, and 350; segments labeled "Onset," "Transitions," and "Vowel."]

The equipment used in generating the experimental tapes consisted of two tape recorders (Magnecord 1022), two electronic switches (Grason-Stadler model 1287), an amplifier (Grason-Stadler model 1288), a level/zone detector (Grason-Stadler model 1270), and two timers (Grason-Stadler model 1216A). The level detector was set to react to the amplified syllable onsets and started the first timer which in turn opened the first switch for 140 msec (C masks). The offset of the first timer closed the first switch and started the second timer which opened the second switch for either (arbitrarily) 300 msec (V masks) or 50 msec (v masks). The output from the appropriate switch was recorded on the masking channel of the experimental tape and was simultaneously monitored on a dual-trace storage oscilloscope (Tektronix type 564B). The rise/fall time of the electronic switches was set at 1 msec, which generated sharp onsets and offsets but no audible transients or clicks, at least not at the relatively low intensities used in the experiment.
Pilot studies had shown that C masks in isolation were as readily identifiable as full CV syllables. Informal tests also showed that the difference in pitch contour between "V+" and "V-" vowel masks could be distinguished by a careful listener; however, little effect of this variable was expected in the experiment.
Each experimental tape contained the appropriate stimulus half-set recorded twice in succession to make it possible to start with any list and continue through the whole half-set without interruption. Care was taken to record all targets and masks at the same intensity levels. Multiple checks revealed the variation between channels and between tapes to be well within 1 dB. Subsequently, the playback intensities were calibrated such that they approximated as much as possible the target intensities (46 dB SPL) and mask intensities (46 dB and 54 dB SPL) used in the earlier CV masking experiment (Repp, 1975). This match was made visually on a storage oscilloscope and was confirmed (for target syllables) by volt meter and sound-level meter (peak) readings. Accuracy was within 1 dB.
The clipping of the initial part of the syllables in the case of vowel masks resulted in a systematic shift of SOAs. This shift is appropriately calculated with respect to the onset of the formant transitions of the target and was between +45 and +55 msec for voiced targets, and +70 msec for voiceless targets (cf. Table I and Fig. 1). When voiced and voiceless targets were combined in the data analysis, an average shift factor of +60 msec was adopted, so that SOAs ranged from -420 to +60 msec in the vowel "FM" condition, and from +60 to +300 msec in the vowel BM condition. Thus, three of the SOAs in the FM condition were actually BM relationships and were treated as such in the data analysis.
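The timing arithmetic behind the masks and the SOA shift can be sketched from the Table I values. The formulas below are inferred, not stated by the author: the steady-state vowel left in a C mask is taken as the cut-point minus the vowel onset, the V mask duration as the total duration minus the cut-point, and the SOA shift as the cut-point minus the transition onset. The sketch further assumes that all eight nominal SOAs were used on both FM and BM tapes, that the +60 msec average shift applies uniformly, and that a shifted SOA of 0 is counted with FM. All names are illustrative.

```python
# Values transcribed from Table I (msec, measured from "syllable onset").
TABLE_I = {
    "ba": {"total": 350, "transition_onset": 95, "vowel_onset": 135},
    "da": {"total": 350, "transition_onset": 90, "vowel_onset": 130},
    "ga": {"total": 350, "transition_onset": 85, "vowel_onset": 135},
    "pa": {"total": 335, "transition_onset": 70, "vowel_onset": 120},
    "ta": {"total": 335, "transition_onset": 70, "vowel_onset": 115},
    "ka": {"total": 335, "transition_onset": 70, "vowel_onset": 120},
}
CUT_POINT = 140          # msec from syllable onset
NOMINAL_SOAS = [0, 20, 40, 60, 90, 120, 240, 480]
AVERAGE_SHIFT = 60       # msec, the average shift adopted in the text


def mask_segments(syllable):
    """Derive mask timing for one syllable (inferred formulas, see lead-in)."""
    t = TABLE_I[syllable]
    return {
        "c_mask_vowel_remnant": CUT_POINT - t["vowel_onset"],
        "v_mask_duration": t["total"] - CUT_POINT,
        "soa_shift": CUT_POINT - t["transition_onset"],
    }


def shifted_soas():
    """Apply the average shift and regroup SOAs into FM (<= 0) and BM (> 0)."""
    fm_tape = [-soa + AVERAGE_SHIFT for soa in NOMINAL_SOAS]  # mask leads
    bm_tape = [soa + AVERAGE_SHIFT for soa in NOMINAL_SOAS]   # mask lags
    distinct = set(fm_tape) | set(bm_tape)
    fm = sorted(v for v in distinct if v <= 0)
    bm = sorted(v for v in distinct if v > 0)
    return fm, bm


for name in TABLE_I:
    print(name, mask_segments(name))
fm, bm = shifted_soas()
print(len(fm), "FM SOAs:", fm)
print(len(bm), "BM SOAs:", bm)
```

Under these assumptions the sketch reproduces the 5-25 msec vowel remnant in C masks, the 210- and 195-msec V mask durations, the +45 to +55 and +70 msec shifts, and the five FM and ten BM SOAs mentioned in the design description. Its largest BM value does not match the +300 msec upper bound quoted above, so the assumptions should be treated with caution.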
B. Design and analysis
The experiment had a complete factorial design within subjects (Ss). There were 12 experimental conditions resulting from the combination of three factors: mask type (C, V, v), masking condition (FM or BM), and mask intensity (46 or 54 dB). The division into stimulus half-sets led to 24 experimental conditions which were administered to each of 12 Ss in a balanced Latin-square design. There were four sessions per S, with six experimental conditions in each.
In addition to these, each S received four control lists, one at the beginning of each session. Each consisted of a complete half-set of 144 target syllables played to the right ear without masks in the left ear. Since the targets were taken from the tape that was to follow in the first experimental trial of the particular session, the design of control lists was balanced over stimulus half-sets, tapes, and Ss.
A comparison between all three mask types was not so much of interest as the comparisons between CV and C masks, and between V and v masks. (A statistical comparison between CV and V masks was not deemed necessary.) The CV masking data at the corresponding target and mask intensities were taken from Repp (1975). In the comparison of CV and C masks, "mask duration" constituted a between-group factor. Since the earlier study had used different groups of Ss in FM and BM, these two conditions had to be treated separately in order to keep apart between-S and within-S effects.
In addition to the mask intensity factor, CV and C conditions shared precisely the same design on the stimuli: 8 SOAs by 36 stimulus pairs. To facilitate analysis, the stimulus pairs were classified into four "feature contrast" groups, according to whether the two consonants in each pair contrasted in both, either, or none of the two articulatory features, voicing and place. Thus, the design on the stimuli was effectively an 8 x 4 factorial. The three linear contrasts of the feature contrast factor were treated separately: the "identity effect" (identical versus nonidentical pairs), the "similarity effect" (double-feature contrasts versus single-feature contrasts), and the "feature-specific effect" (single-feature voicing contrasts versus place contrasts). In the comparison of CV and C masks, the CV data were averaged over the laterality factor (left versus right ear), to increase the stability of the data. Otherwise, the statistical analysis was conducted as in Repp (1975).
In the comparison between vowel masks, FM and BM had to be treated separately too, because of the SOA shift discussed above. There were only five SOAs in FM but ten in BM. In each condition, there were four within-S factors: mask duration (V vs v), mask intensity, SOA, and "vowel identity." The last factor resulted from the two different pitch contours of the vowels: target-mask pairs were classified according to whether the target did or did not contrast in voicing with the consonant that originally preceded the vowel mask, or, in other words, whether the mask was identical with the vowel portion of the target or not. This factor was included only as a control.
C. Procedure
The S was seated in a sound-isolated booth equipped with a writing table, paper, and pencil. Playback was via a Magnecord 1022 tape recorder, a pair of Grason-Stadler E10589A transformers, a pair of calibrated attenuators assembled at this laboratory, and a pair of Grason-Stadler PDH-10 earphones. Each session lasted approximately 90 minutes, including short breaks between trials. Targets were always in the right ear and masks in the left ear.
The S was instructed to write down the targets only and to ignore the other ear as much as possible. It was stressed that a response had to be given to each stimulus pair, and that, with C masks, the mask would be different from the target in most (but not all) pairs. The exact nature of the stimuli and of the experimental conditions (but not of the experimental hypotheses) was explained to each S prior to the experiment. Control trials always started with the first list of the respective half-set. Subsequent experimental trials each started with a different list, determined by random sampling without replacement from the numbers 2-8, individually for each S and session. All sessions took place on different days, with varying intervals in between.
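As a rough illustration of the four "feature contrast" groups described under "Design and analysis," the 36 stimulus pairs can be classified by comparing the voicing and place of the two stops. The place labels (labial, alveolar, velar) are standard phonetic descriptions rather than terms used in the text, and the group names and function name are hypothetical.

```python
from collections import Counter
from itertools import product

# Voicing and place of articulation for the six stops (standard phonetics;
# the labels themselves are not taken from the paper).
FEATURES = {
    "ba": ("voiced", "labial"),
    "da": ("voiced", "alveolar"),
    "ga": ("voiced", "velar"),
    "pa": ("voiceless", "labial"),
    "ta": ("voiceless", "alveolar"),
    "ka": ("voiceless", "velar"),
}


def feature_contrast(first, second):
    """Classify a pair by which of the two features differ."""
    voicing_differs = FEATURES[first][0] != FEATURES[second][0]
    place_differs = FEATURES[first][1] != FEATURES[second][1]
    if voicing_differs and place_differs:
        return "both"            # double-feature contrast
    if voicing_differs:
        return "voicing only"    # single-feature voicing contrast
    if place_differs:
        return "place only"      # single-feature place contrast
    return "none"                # identical pair


group_sizes = Counter(
    feature_contrast(a, b) for a, b in product(FEATURES, repeat=2)
)
print(dict(group_sizes))
```

With ordered pairs, the groups are unequal in size (6 identical, 6 voicing-only, 12 place-only, 12 double-feature pairs), which is worth keeping in mind when reading the three contrasts (identity, similarity, feature-specific) defined on this factor.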
D. Subjects
The Ss were volunteers recruited through an ad in the student newspaper and were paid $2.00 per hour. Prospective Ss had to pass a criterion of at least 50% correct on the first control trial. Eight Ss failed this criterion and were rejected. The 12 Ss who passed the criterion were seven females and five males, aged 16 to 25. All except two females were right-handed. No S admitted any hearing defect, and all were naive with respect to synthetic speech and dichotic listening.
II. RESULTS
A. Control trials
The average score (percentage of correct responses) in control trials was 63.6, with individual scores ranging from 48.8 to 83.9. This result was comparable to that obtained in the CV masking experiments (Repp, 1975). There was some improvement with practice: Control trial scores rose by an average 8.1% over the four sessions (p < 0.04). Average performance in experimental trials rose by 5.7%, which did not reach significance but represents a comparable effect.
B. Consonant masks
The separate analysis of C masking replicated virtually all the basic results that had been obtained previously with CV masks and which have been discussed in detail by Repp (1975). They are shown in Fig. 2, which may be compared with the results for comparable target and mask intensities in Figs. 2 and 3 of Repp (1975). There is no need to enumerate and discuss these findings here further.
The effects that are of special interest here are the

FIG. 2. C masking functions for identical pairs (0-FC), single-feature contrasts (1-FC), and double-feature contrasts (2-FC), at two mask (M) intensities. No-mask control (C) performance is indicated by dashed lines to the margins. [Abscissa: SOA (msec), FM and BM.]

FIG. 4. CV and C masking functions, separately for identical and nonidentical pairs, at the lower mask intensity (46 dB). [Abscissa: SOA (msec), FM and BM.]

It is quite clear that the dominant factor in masking is the consonant of the masking syllable, since C masking is substantial. The vowel of CV masks was expected to exert a simultaneous masking effect where it overlapped the target consonant, i.e., between SOA = -60 and SOA = -240 in FM. Figures 3 and 4 indicate that, indeed, CV masking exceeded C masking in that region, particularly at the higher mask intensity. The interaction between mask duration and SOA was significant in FM
(•