The role of medial consonant transitions in word perception
Lynn A. Streeter
Bell Laboratories, Murray Hill, New Jersey 07974
Georgia N. Nigro
Department of Psychology, Yale University, New Haven, Connecticut 06520 and Haskins Laboratories, New Haven, Connecticut 06520
(Received 31 May 1978; revised 14 February 1979)
In VCV nonsense forms (such as /•d•/), both the CV transition and the VC transition are perceptible in isolation, yet the CV transition dominates identification of the stop consonant. This raises the question: what role, if any, do VC transitions play in word perception? Stimuli were two-syllable English words in which the medial consonant was either a stop or a fricative (e.g., "feeding" and "gravy"). Each word was constructed in three ways: (1) the VC transition was incompatible with the CV transition in place of articulation, manner of articulation, or both; (2) the VC transition was eliminated and the steady-state portion of the first vowel was substituted in its place; and (3) the original word. All versions of a particular word were identical with respect to duration, pitch contour, and amplitude envelope. An intelligibility test revealed no differences among the three conditions. However, data from a paired-comparison preference task and an unspeeded lexical decision task indicated that incompatible VC transitions hindered word perception, whereas the absence of VC transitions did not. In the speeded lexical decision task there were clear differences among the three conditions for word stimuli, but not for nonword stimuli constructed in an analogous fashion. We discuss the use of lexical decision tasks for speech quality assessment and possible processes by which listeners recognize spoken words.
PACS numbers: 43.70.Dn, 43.70.Ve
INTRODUCTION
An inherent aspect of spoken language is its redundancy. That is, more than one acoustic cue can contribute to the perception of a single linguistic event. One such redundancy can be found in VCV forms such as /eda/, in which the consonantal transitions out of the first vowel (VC transition) and the consonantal transitions into the second vowel (CV transition) both signal the same stop consonant, in this case /d/. It has been shown that when the VC transition signals one stop consonant and the CV transition signals a different one, the CV transition dominates the listener's percept (Abbs, 1971; Dorman, Raphael, Liberman, and Repp, 1975; Repp, 1977a, 1977b; Fujimura, Macchi, and Streeter, 1978). Using two synthetic nonsense forms, /b•bd•/ and /b•gd•/, Dorman et al. varied the period of silence (closure duration) between the two syllables and asked subjects to report what they heard. At silent intervals of 50 ms or less, subjects reported hearing /b•d•/; that is, they did not report hearing the consonant signaled by the VC transition. Fujimura et al. (1978) also found that subjects relied on the CV transitions in VCV forms when the CV and VC transitions conflicted, and showed that this effect was not based on acoustic asymmetries present in the speech signal but was due to a perceptual bias. Repp (1977a) obtained the same result with a discrimination task: at short closure durations, VCV stimuli with conflicting VC and CV transitions were difficult to discriminate from stimuli in which the two transitions were compatible or in which the VC transition was absent.

Thus, previous research indicates that at short closure durations the CV transitions in VCV forms dominate perception of the medial consonant. Our research attempted to determine what role VC transitions play in a more natural stimulus domain, namely English words. Since word recognition is fundamental to language comprehension, it is critical to understand the relation between the acoustic structure of spoken words and the process by which they are recognized.

Four tasks were used to examine perception of the VC transitions. In the first task, an intelligibility test, listeners identified words in the three conditions. The second task was a preference test in which listeners heard pairs of words and indicated which member of the pair sounded more natural. In the third task, subjects judged whether items were words or nonwords. In the final task, subjects again judged whether items were words or nonwords, but under time pressure.

J. Acoust. Soc. Am. 65(6), June 1979
Our reason for using a number of tasks was twofold. First, we were interested in comparing results from traditional speech quality measures (i.e., intelligibility and preference tasks) with tasks that have rarely been used for quality assessment purposes (i.e., lexical decision tasks) but that may be more sensitive to differences between stimuli than the more conventional measures. Second, results from lexical decision reaction time tasks may give some indication of how auditorily presented words are recognized. The speeded lexical decision task has frequently been used with visually presented material (cf. Landauer and Freedman, 1968; Meyer, 1970; Meyer and Schvaneveldt, 1976; Landauer, Didner, and Fowlkes, 1975). For example, this technique has been used to investigate the sequential stages involved in translating a visually presented word into its spoken name (Landauer et al., 1975). Applied to the auditory domain, the technique may therefore be useful for determining the stages that intervene between receipt of the acoustic stimulus and the decision that a particular stimulus is an English word.

© 1979 Acoustical Society of America

I. METHOD
Stimulus materials
Thirty-six two-syllable English words served as the material for stimulus preparation. All words had the following properties: (1) each word began with a consonant or a consonant cluster, followed by a tense vowel, either /i/ as in "pea" or /e/ as in "pay"; (2) the first syllable was always stressed, whereas the second syllable was always unstressed (reduced); (3) the medial consonant in all words was either a stop or a fricative and was one of the following: /b/, /p/, /d/, /t/, /v/, /f/, /z/, or /s/. Examples of the words are "labor," "fever," and "races." All words were of moderate frequency of occurrence in written English, according to two different word counts. Based on one count (Carroll, Davies, and Richman, 1971), the average frequency was 133/5 000 000 (SD = 263); based on a second count (Kucera and Francis, 1976), it was 32/1 000 000 (SD = 71). A trained male speaker recorded all stimulus materials used in all experiments in a single recording session. Stimuli were uttered in isolation, in random order, in a sound-insulated booth. Materials were digitized at 10 000 samples/s and submitted to linear predictive coding (LPC) analysis (Atal and Hanauer, 1971) and formant tracking (Olive, 1971). These analyses compute parameters, either pseudoarea function parameters or formant frequencies and formant amplitudes, every 10 ms.

Three versions of each word were constructed. The first condition was the original word as uttered by the speaker and subsequently LPC analyzed and synthesized. In the second condition, the first syllable was taken from a new word environment. For example, the word "fable" was formed by excising /f/, /e/, and the /d/ transition from the word "faded" and concatenating them with the second syllable of "fable" to form "fadble." Thus, in condition two the VC transition could conflict with the CV transition.¹ There were four subconditions within the second condition: (1) The VC transition conformed to the CV transition in both manner and place of articulation. For example, the first syllable came from the word "stable" and was concatenated with the second syllable of "staple." (2) The VC and CV transitions were both stops but differed in place of articulation (labial or alveolar). For example, the first syllable came from "lady" and the second syllable came from "label." (3) Manner of articulation differed between the VC and CV transitions: the VC was always from a stop environment, whereas the CV was from a fricative environment. For example, the first syllable came from "traitor" and the second syllable from "traces." (4) The two transitions differed in both manner and place of articulation: the VC transition was appropriate for one of the stops, whereas the CV transition was appropriate for one of the fricatives. If the VC was labial, the CV was alveolar, and vice versa. For example, the first syllable came from "gable" and the second from "gazing."

The two syllables were concatenated, and the duration of the first syllable was adjusted to match the duration of the first syllable in the original word. Likewise, the pitch and amplitude contours of the original word were imposed on the concatenated version. All concatenation, duration adjustments, and amplitude and pitch contour replacements were done on the LPC-analyzed materials using an interactive program on a DDP-224 computer (Nakatani, 1976).

In the third condition, the first-syllable environments were identical to those described for the conflicting transition condition, but the VC transition was eliminated entirely and an extension of the steady-state portion of the first vowel was substituted in its place. Again, the syllable duration patterns and the pitch and amplitude contours of the naturally uttered word were imposed on the steady-state extended words.

In summary, there were three versions of each word: (1) the original LPC-analyzed word, (2) the conflicting transition word, and (3) the transition-absent word. Note that all versions of a particular word were identical with respect to syllable duration, pitch contour, and source amplitude.

¹In all cases the medial burst and/or noise portions were considered to be part of the second syllable.

II. EXPERIMENT I-INTELLIGIBILITY
A. Method
1. Stimulus materials
All 36 words of the same experimental version (original, conflicting transition, and transition-absent) were randomized separately and recorded on analog tape. There was a total of three tapes, one for each experimental condition.
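The four subconditions of the conflicting-transition version described under Method, which also label the rows of Table I, reduce to a feature comparison between the medial consonant whose environment supplied the VC transition and the one that supplied the CV transition. The sketch below is illustrative only: `FEATURES` and `subcondition` are hypothetical names, and grouping /v/ and /f/ with the labials is our assumption based on the two-way labial/alveolar contrast the text describes.

```python
# Hypothetical feature table for the eight medial consonants used in the study.
# Grouping /v/ and /f/ as "labial" is an assumption matching the text's
# two-way labial/alveolar split.
FEATURES = {
    "b": ("stop", "labial"), "p": ("stop", "labial"),
    "d": ("stop", "alveolar"), "t": ("stop", "alveolar"),
    "v": ("fricative", "labial"), "f": ("fricative", "labial"),
    "z": ("fricative", "alveolar"), "s": ("fricative", "alveolar"),
}


def subcondition(vc: str, cv: str) -> str:
    """Label a spliced stimulus by agreement of manner and place.

    vc: medial consonant of the word that supplied the first syllable (VC transition)
    cv: medial consonant of the word that supplied the second syllable (CV transition)
    """
    vc_manner, vc_place = FEATURES[vc]
    cv_manner, cv_place = FEATURES[cv]
    manner = "+manner" if vc_manner == cv_manner else "-manner"
    place = "+place" if vc_place == cv_place else "-place"
    return f"{manner}, {place}"


# Example splices given in the Method:
# (first-syllable source, its medial consonant, second-syllable source, its medial consonant)
EXAMPLES = [
    ("stable", "b", "staple", "p"),
    ("lady", "d", "label", "b"),
    ("traitor", "t", "traces", "s"),
    ("gable", "b", "gazing", "z"),
]

if __name__ == "__main__":
    for first, vc, second, cv in EXAMPLES:
        print(f"{first} + {second}: {subcondition(vc, cv)}")
```

Run on the four example splices, this yields the four row labels in order (+manner, +place through -manner, -place); the "faded" + "fable" splice, subcondition("d", "b"), falls under +manner, -place.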
2. Subjects
High school students in the Murray Hill area participated in the experiment. There were eight subjects in each group.
3. Procedure
Subjects were run in groups of four or five in a sound-insulated booth. Each subject heard only the stimuli in one of the three experimental conditions. Stimuli were presented binaurally through earphones at a comfortable listening level. Subjects heard 36 English words presented one at a time; their task was to write down the word they heard. If unsure, they were to make their best guess. Each word was separated from the next by five seconds of silence.
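Each word in this experiment is scored as the proportion of the eight listeners in a group who wrote it down correctly, and proportions close to 1.0 are compressed by the ceiling effect discussed in the results. A minimal sketch of the arcsine transformation applied to such proportions before an analysis of variance follows. The exact form of the transform and the ANOVA design used for Experiment I are not stated here, so the 2·asin(√p) variant, the simple independent-groups F ratio, and the function names (`arcsine_transform`, `one_way_f`, `score_condition`) are all assumptions for illustration.

```python
import math


def arcsine_transform(p: float) -> float:
    """Variance-stabilizing transform 2*asin(sqrt(p)) for a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p))


def score_condition(correct_counts: list[int], n_listeners: int = 8) -> list[float]:
    """Per-word proportion correct (count / listeners), arcsine transformed."""
    return [arcsine_transform(c / n_listeners) for c in correct_counts]


def one_way_f(groups: list[list[float]]) -> float:
    """F ratio for a one-way ANOVA with independent groups: MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

In use, one would call score_condition once per condition on the 36 per-word counts and pass the three resulting lists to one_way_f. A design that treats words as a repeated factor across conditions would need a different error term.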
B. Results
All renderings that were pronounceable as the intended word were scored as correct. The percentage correct for each of the three experimental conditions was as follows: (1) original word, 91% (standard error = 3%); (2) conflicting transitions, 89% (standard error = 4%); and (3) transition-absent, 91% (standard error = 3%). An analysis of variance on the arcsine-transformed proportion correct for each word confirmed that there were no differences among the three conditions (F < 1). Thus, the different ways in which the words were constructed did not differentially affect intelligibility judgments. This is not surprising, since intelligibility tests are characteristically subject to ceiling effects: it is often possible to identify words even when their quality is poor.

III. EXPERIMENT II-PREFERENCE
A. Method
1. Stimulus materials
An analog tape of lexically identical word pairs was constructed. In the conflicting transition condition, the identity of the word was based on the CV transition; e.g., "fadble" was defined to be "fable." Each of the three versions of each word occurred with every other version of the same word in all possible permutations. These 216 trials were recorded in random order, with the first 108 trials constituting one block and the second 108 constituting a second block.
2. Subjects
Ten high school students from the Murray Hill area participated in the experiment. None of the subjects had participated in the first experiment.
3. Procedure
Word pairs were presented binaurally through headphones at a comfortable listening level. Subjects were told that they would hear a pair of utterances. Both utterances in a pair were the same word, but they differed in how they had been constructed by the computer. Subjects indicated which member of the pair sounded more natural. Subjects were run in groups of five in a sound-insulated booth. Block order was counterbalanced across the subject groups.
B. Results
Table I shows the results for the three word versions, subdivided into source environment types: +manner, +place; +manner, -place; -manner, +place; and -manner, -place. (A "+" indicates that both transitions were in accord for a particular feature, whereas a "-" indicates that the two transitions differed for that feature. This notation does not apply to the original stimuli.) A test of the pairs of conditions using a combined word and subject error term (min-F test, F″; Winer, 1971) failed to show a reliable difference among the three condition pairs (F″ = 2.07, df = 2, 17, p > 0.10). However, the preference means for comparison 1 and comparison 2 differed significantly from chance (0.50, no preference for either condition); only the 95% confidence interval for comparison 3 (original versus transition-absent) includes the 0.50 chance point. Thus, the original words were preferred to words in the conflicting condition, and transition-absent words were preferred to the conflicting words.

In analyzing manner by place effects for each comparison separately, there were marginally significant manner effects (p < 0.10 in both cases) for the original versus conflicting and the transition-absent versus conflicting comparisons. In addition, there was a marginal place effect in the transition-absent versus conflicting comparison (p < 0.10). In summary, both original words and transition-absent words were preferred to conflicting words, but there was no significant difference in preference between transition-absent and original words.

TABLE I. Preference judgments. Percentage of judgments favoring underscored condition.

                     Comparison 1       Comparison 2             Comparison 3
                     original versus    transition absent        original versus
                     conflicting        versus conflicting       transition absent
+manner, +place          50%                49%                      51%
+manner, -place          62%                59%                      54%
-manner, +place          63%                60%                      52%
-manner, -place          67%                68%                      57%
Mean                     60%                59%                      54%
Standard error           2.6%               2.3%                     2.5%

IV. EXPERIMENT III-UNSPEEDED LEXICAL DECISION TASK
An equal number of phonologically regular "nonwords" were intermixed with the words, and listeners indicated whether a given stimulus was a word or a nonword. Presumably, to perform this task, listeners must search their mental lexicons to determine
whether or not a given stimulus exists. We reasoned that the extent to which a given auditory stimulus deviates from the way in which the word has been heard in the past should be reflected in the word/nonword judgments and in the listeners' confidence in their decisions. In other words, words of poor quality should be judged to be nonwords more often than words of higher quality, and poor-quality words should be assigned lower confidence ratings. The unspeeded lexical decision task has been used previously to assess speech quality; the task correlated significantly with intelligibility scores, but it was more sensitive than the intelligibility measure (Streeter, 1976).

A. Method
1. Stimulus materials
In addition to the 36 words in each of the three conditions, an equal number of nonwords was constructed by taking the first syllable from each of the words and combining it with a second syllable that had the same phonetic properties as the second syllables of the words. Whenever possible, first and second syllables from the word set were intermixed to form phonologically regular nonwords. For example, the syllables in "faces" and "staple" were intermixed to form the two nonwords "faple" and "staces." These nonwords were constructed in an analogous fashion to the words. Thus, there were three versions of each nonword: (1) the original nonword as uttered by the speaker in the previously described recording session and subsequently LPC analyzed; (2) the conflicting-transition nonword, in which the VC transition varied in compatibility with the CV transition in manner, place, or both manner and place of articulation; and (3) the transition-absent nonword, in which the VC transition was eliminated and the steady-state portion of the first vowel extended.

Subjects were tested in groups of four to six in a sound-insulated booth. Stimuli were presented binaurally through headphones at a comfortable listening level. There were three groups of ten subjects; each group listened to only one of the three tapes.

B. Results
Each word/nonword judgment was combined with its confidence rating to form a single score. The scores ranged from one to ten: a "one" indicated that the subject had incorrectly classified the stimulus and was "very confident" of the judgment, whereas a "ten" indicated that the subject's word/nonword judgment was correct and that the subject had assigned the highest confidence rating to it.
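The one-to-ten score folds correctness and confidence into a single ordered scale, and the next step normalizes each subject's scores separately. A minimal sketch of both steps follows. The number of confidence levels is not given in this section, so the five-point confidence scale (1 = least, 5 = most confident), the exact mapping onto 1..10, and the use of the population standard deviation are assumptions; `combined_score` and `z_normalize_by_subject` are hypothetical names.

```python
from statistics import mean, pstdev


def combined_score(correct: bool, confidence: int, levels: int = 5) -> int:
    """Map a word/nonword judgment plus confidence onto a 1..(2*levels) scale.

    Assumes confidence runs from 1 (least) to `levels` (most confident):
    incorrect + most confident -> 1, correct + most confident -> 2*levels.
    """
    if not 1 <= confidence <= levels:
        raise ValueError("confidence out of range")
    return levels + confidence if correct else levels + 1 - confidence


def z_normalize_by_subject(scores: dict[str, list[float]]) -> dict[str, list[float]]:
    """Convert each subject's raw scores to z scores using that subject's own mean and SD."""
    normalized = {}
    for subject, values in scores.items():
        mu = mean(values)
        sd = pstdev(values)
        # A subject who gave the same score throughout has no spread; map to zeros.
        normalized[subject] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return normalized
```

Under this assumed mapping, combined_score(True, 3) returns 8. Normalizing within subject removes individual differences in how listeners use the rating scale before scores are averaged by condition.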
To minimize the effect of subject differences in scale usage, the one-to-ten scores were normalized separately by subject. These averaged normalized scores (z scores) are shown for each condition in Table II. (Note that the mean of the z distribution is zero; consequently, scores below zero are below the mean, and scores above zero are above the mean.) In general, words were rated higher than nonwords (F″ = 6.77, df = 1, 41, p