The role of medial consonant transitions in word perception

Lynn A. Streeter
Bell Laboratories, Murray Hill, New Jersey 07974

Georgia N. Nigro
Department of Psychology, Yale University, New Haven, Connecticut 06520 and Haskins Laboratories, New Haven, Connecticut 06520

(Received 31 May 1978; revised 14 February 1979)

In VCV nonsense forms (such as /•d•/), while both the CV transition and the VC transition are perceptible in isolation, the CV transition dominates identification of the stop consonant. Thus, the question arises, what role, if any, do VC transitions play in word perception? Stimuli were two-syllable English words in which the medial consonant was either a stop or a fricative (e.g., "feeding" and "gravy"). Each word was constructed in three ways: (1) the VC transition was incompatible with the CV in either place, manner of articulation, or both; (2) the VC transition was eliminated and the steady-state portion of the first vowel was substituted in its place; and (3) the original word. All versions of a particular word were identical with respect to duration, pitch contour, and amplitude envelope. While an intelligibility test revealed no differences among the three conditions, data from a paired comparison preference task and an unspeeded lexical decision task indicated that incompatible VC transitions hindered word perception, but lack of VC transitions did not. However, there were clear differences among the three conditions in the speeded lexical decision task for word stimuli, but not for nonword stimuli that were constructed in an analogous fashion. We discuss the use of lexical decision tasks for speech quality assessment and possible processes by which listeners recognize spoken words.

PACS numbers: 43.70.Dn, 43.70.Ve

J. Acoust. Soc. Am. 65(6), June 1979; 0001-4966/79/061533-09$00.80; © 1979 Acoustical Society of America

INTRODUCTION

An inherent aspect of spoken language is its redundancy. That is, more than one acoustic cue can contribute to the perception of a single linguistic event. One such redundancy can be found in VCV forms such as /eda/, in which the consonantal transitions out of the first vowel (VC transition) and the consonantal transitions into the second vowel (CV transition) both signal the same stop consonant, in this case /d/. It has been shown that when the VC transition signals one stop consonant, whereas the CV transition signals a different stop consonant, the CV transition dominates the listener's percept (Abbs, 1971; Dorman, Raphael, Liberman, and Repp, 1975; Repp, 1977a, 1977b; Fujimura, Macchi, and Streeter, 1978). Using two synthetic nonsense forms, /b•bd•/ and /b•gd•/, Dorman et al. varied the period of silence (closure duration) between the two syllables, and asked subjects to report what they heard. At silent intervals of 50 ms or less, subjects reported hearing /b•d•/; that is, they did not report hearing the consonant signaled by the VC transition. Fujimura et al. (1978) also found that subjects relied on the CV transitions in VCV forms when the CV and VC transitions conflicted, and showed that this effect was not based on acoustic asymmetries present in the speech signal, but rather was due to a perceptual bias. Repp (1977a) obtained the same result with a discrimination task; at short closure durations VCV stimuli with conflicting VC and CV transitions were difficult to discriminate from stimuli in which the two transitions were compatible or in which the VC transition was absent. Thus, previous research indicates that at short closure durations the CV transitions in VCV forms dominate perception of the medial consonant. Our research attempted to determine what role VC transitions play in a more natural stimulus domain, namely English words. Since word recognition is fundamental to language comprehension, it is critical to understand the relation between the acoustic structure of spoken words and the process by which they are recognized.

Four tasks were used to examine perception of the VC transitions. In the first task, an intelligibility test, listeners identified words in the three conditions. The second task was a preference test in which listeners heard pairs of words and indicated which member of the pair sounded more natural. In the third task, subjects judged whether items were words or nonwords. In the final task, subjects again judged whether items were words or nonwords, but under time pressure.

Our reason for using a number of tasks was twofold. First, we were interested in comparing results from traditional speech quality measures (i.e., intelligibility and preference tasks) with tasks that have rarely been used for quality assessment purposes (i.e., lexical decision tasks), but which may be more sensitive to differences between stimuli than the more conventional measures. Second, results from lexical decision reaction time tasks may give some indication of how auditorily presented words are recognized. The speeded lexical decision task frequently has been used with visually presented material (cf. Landauer and Freedman, 1968; Meyer, 1970; Meyer and Schvaneveldt, 1976; Landauer, Didner, and Fowlkes, 1975). For example, this technique has been used to investigate the sequential stages involved in translating a visually presented word into its spoken name (Landauer et al., 1975). Thus, this technique applied to the auditory domain may be useful for determining the stages that intervene between receipt of the acoustic stimulus and the decision that a particular stimulus is an English word.

I. METHOD

Stimulus materials

Thirty-six two-syllable English words served as the material for stimulus preparation. All words had the following properties: (1) each word began with a consonant or a consonant cluster and was followed by a tense vowel, either /i/ as in "pea" or /e/ as in "pay"; (2) the first syllable was always stressed, whereas the second syllable was always unstressed (reduced); (3) the medial consonant in all words was either a stop or a fricative and was one of the following: /b/, /p/, /d/, /t/, /v/, /f/, /z/, or /s/. Examples of some of the words are: "labor," "fever," and "races." All words were of moderate frequency of occurrence in written English, according to two different word counts. Based on one count (Carroll, Davies, and Richman, 1971), the average frequency was 133/5 000 000 (SD = 263); the average frequency using a second count (Kucera and Francis, 1976) was 32/1 000 000 (SD = 71).

A trained male speaker recorded all stimulus materials used in all experiments in a single recording session. Stimuli were uttered in isolation in a random order in a sound-insulated booth. Materials were digitized at 10 000 samples/s and submitted to linear predictive coding analysis (LPC) (Atal and Hanauer, 1971) and formant tracking (Olive, 1971). These analyses compute parameters, either pseudoarea function parameters or formant frequencies and formant amplitudes, every 10 ms.

Three versions of each word were constructed. The first condition was the original word as uttered by the speaker and subsequently LPC analyzed and synthesized. In the second condition, the first syllable was taken from a new word environment. For example, a version of the word "fable" was formed by excising the /f/, the /e/, and the /d/ transition from the word "faded" and concatenating them with the second syllable of "fable" to form "fadble." Thus, in condition two the VC transition could conflict with the CV transition. In all cases the medial burst and/or noise portions were considered to be part of the second syllable.

There were four subconditions within the second condition: (1) The VC transition conformed to the CV transition in both manner and place of articulation. For example, the first syllable came from the word "stable" and was concatenated with the second syllable of "staple." (2) The VC and CV transitions were both stops, but differed in place of articulation (labial or alveolar). For example, the first syllable came from "lady" and the second syllable came from "label." (3) Manner of articulation differed between the VC and CV transitions: the VC was always from a stop environment, whereas the CV was from a fricative environment. For example, the first syllable came from "traitor" and the second syllable from "traces." (4) The two transitions differed in both manner and place of articulation: the VC transition was appropriate for one of the stops, whereas the CV transition was appropriate for one of the fricatives. If the VC was labial, the CV was alveolar, and vice versa. For example, the first syllable came from "gable" and the second from "gazing."

The two syllables were concatenated, and the duration of the first syllable was adjusted to be the same as the duration of the first syllable in the original word. Likewise, the pitch and amplitude contours of the original word were imposed on the concatenated version. All concatenating, duration adjustments, and amplitude and pitch contour replacements were done on the LPC analyzed materials using an interactive program on a DDP-224 computer (Nakatani, 1976).

In the third condition, the first syllable environments were identical to those described for the conflicting transition condition, but in this condition the VC transition was eliminated entirely and an extension of the steady-state portion of the first vowel substituted in its place. Again, the syllable duration patterns and the pitch and amplitude contours of the naturally uttered word were imposed on the steady-state extended words.

In summary, there were three versions of each word: (1) the original LPC analyzed word, (2) the conflicting transition word, and (3) the transition-absent word. Note that all versions of a particular word were identical with respect to syllable duration, pitch contour, and source amplitude.

II. EXPERIMENT I-INTELLIGIBILITY

A. Method

1. Stimulus materials

All 36 words of the same experimental version (original, conflicting transition, and transition-absent) were randomized separately and recorded on analog tape. There was a total of three tapes, one for each experimental condition.
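The transition-absent construction described in Sec. I (the VC transition replaced by an extension of the first vowel's steady state, with syllable duration held constant) can be illustrated at the level of analysis frames. The following is only a minimal sketch under stated assumptions, not the authors' program: the function name `remove_vc_transition` and the index `transition_start` are hypothetical, each frame is assumed to be a list of parameter values (for example formant frequencies), and holding the single last steady-state frame is an assumption, since the text does not say how the steady-state portion was chosen. Only the sampling rate and the 10-ms frame step come from the text.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 10_000  # from the text: digitized at 10 000 samples/s
FRAME_STEP_MS = 10       # from the text: parameters computed every 10 ms
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_STEP_MS // 1000

Frame = List[float]  # ASSUMPTION: one frame = one list of analysis parameters


def remove_vc_transition(first_syllable: Sequence[Frame],
                         transition_start: int) -> List[Frame]:
    """Replace the VC transition frames with copies of the last steady-state frame.

    `transition_start` (hypothetical) is the index of the first frame that
    belongs to the VC transition. The number of frames is preserved, so the
    duration of the first syllable does not change.
    """
    n_frames = len(first_syllable)
    if not 0 < transition_start <= n_frames:
        raise ValueError("transition_start must lie inside the syllable")
    steady = first_syllable[transition_start - 1]
    kept = [list(frame) for frame in first_syllable[:transition_start]]
    held = [list(steady) for _ in range(n_frames - transition_start)]
    return kept + held


# Toy first syllable: two steady-state frames followed by two transition frames.
frames = [[700.0, 1200.0], [700.0, 1200.0], [650.0, 1400.0], [600.0, 1600.0]]
flattened = remove_vc_transition(frames, transition_start=2)
```

In this toy case the two transition frames are overwritten by the steady-state values while the frame count stays at four, which mirrors the constraint that all versions of a word share the same syllable durations. Pitch and amplitude contours, which the text says were copied from the original word, are outside the scope of the sketch.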

2. Subjects

High school students in the Murray Hill area participated in the experiment. There were eight subjects in each group.

3. Procedure

Subjects were run in groups of four or five in a sound-insulated booth. Each subject heard only stimuli in one of the three experimental conditions. Stimuli were presented binaurally through earphones at a comfortable listening level. Subjects heard 36 English words presented one at a time; their task was to write down the word they heard. If unsure, they were to make their best guess. There were five seconds of silence separating each word.

B. Results

All renderings that were pronounceable as the intended word were scored as correct. The percentage correct for each of the three experimental conditions was as follows: (1) original word, 91% (standard error = 3%); (2) conflicting transitions, 89% (standard error = 4%); and (3) transition-absent, 91% (standard error = 3%). An analysis of variance using an arcsine transformation of proportion correct for each word confirmed that there were no differences among the three conditions (F < 1). Thus, the different ways in which the words were constructed did not differentially affect intelligibility judgments. This is not surprising, since intelligibility tests are characteristically subject to ceiling effects. It is often possible to identify words even when the quality is poor.

III. EXPERIMENT II-PREFERENCE

A. Method

1. Stimulus materials

An analog tape of lexically identical word pairs was constructed. That is, in the conflicting transition condition, the identity of the word was based on the CV transition, e.g., "fadble" was defined to be "fable." Each of the three versions of each word occurred with every other version of the same word in all possible permutations. These 216 trials (36 words × 6 ordered pairs of versions) were recorded in random order with the first 108 trials constituting one block, and the second 108 constituting a second block.

2. Subjects

Ten high school students from the Murray Hill area participated in the experiment. None of the subjects had participated in the first experiment.

3. Procedure

Word pairs were presented binaurally through headphones at a comfortable listening level. Subjects were told that they would hear a pair of utterances. The two utterances in a pair were the same word, but differed in how they had been constructed by the computer. Subjects indicated which member of the pair sounded more natural. Subjects were run in groups of five in a sound-insulated booth. Block order was counterbalanced across the subject groups.

B. Results

Table I shows the results for the three word versions subdivided into source environment types: +manner, +place; +manner, -place; -manner, +place; and -manner, -place. (A "+" indicates that both transitions were in accord for a particular feature, whereas "-" indicates that the two transitions differed for a particular feature. This notation does not apply to the original stimuli.) A test of the pairs of conditions using a combined word and subject error term (min-F test, F"; Winer, 1971) failed to show a reliable difference among the three condition pairs (F" = 2.07, df = 2, 17, p > 0.10). However, note that the preference means for comparison 1 and comparison 2 differed significantly from chance (0.50, no preference for either condition). Only comparison 3's (original versus transition-absent) 95% confidence interval includes the 0.50 chance point. Thus, the original words were preferred to words in the conflicting condition, and transition-absent words were preferred to the conflicting words.

In analyzing manner by place effects for each comparison separately, there were marginally significant manner effects (p < 0.10 in both cases) for the original versus conflicting and for the transition-absent versus conflicting words. In addition, there was a marginal place effect in the transition-absent versus conflicting comparison (p < 0.10).

In summary, both original words and transition-absent words were preferred to conflicting words. However, there was no significant difference between transition-absent and original words in preference ratings.

TABLE I. Preference judgments. Percentage of judgments favoring underscored condition.

                   Comparison 1        Comparison 2          Comparison 3
                   original versus     transition absent     original versus
                   conflicting         versus conflicting    transition absent
+manner, +place    50%                 49%                   51%
+manner, -place    62%                 59%                   54%
-manner, +place    63%                 60%                   52%
-manner, -place    67%                 68%                   57%
mean               60%                 59%                   54%
standard error     2.6%                2.3%                  2.5%

IV. EXPERIMENT III-UNSPEEDED LEXICAL DECISION TASK

An equal number of phonologically regular "nonwords" were intermixed with the words, and listeners indicated whether a given stimulus was a word or a nonword. Presumably, to perform this task, listeners must search their mental lexicons to determine

whether or not a given stimulus exists. We reasoned that the extent to which a given auditory stimulus deviates from the way in which the word has been heard in the past should be reflected in the word/nonword judgments and in the listeners' confidence in their decisions. In other words, words of poor quality should be judged to be nonwords more often than words of higher quality, and poor quality words should have lower confidence ratings assigned to them. The unspeeded lexical decision task has been used previously to assess speech quality; the task correlated significantly with intelligibility scores, but it was more sensitive than the intelligibility measure (Streeter, 1976).

A. Method

1. Stimulus materials

In addition to the 36 words in each of the three conditions, an equal number of nonwords was constructed by taking the first syllable from each of the words and combining it with a second syllable that had the same phonetic properties as the second syllables of the words. Whenever possible, first and second syllables from the word set were intermixed to form phonologically regular nonwords. For example, the syllables in "faces" and "staple" were intermixed to form the two nonwords "faple" and "staces." These nonwords were constructed in an analogous fashion to the words. Thus, there were three versions of each nonword: (1) the original nonword as uttered by the speaker in the previously described recording session and subsequently LPC analyzed; (2) conflicting-transition nonword, in which the VC transition varied in compatibility with the CV transition in manner or place or both manner and place of articulation; (3) transition-absent nonword, in which the VC transition was eliminated and the steady-state portion of the first vowel extended.

Subjects were tested in groups of four to six in a sound-insulated booth. Stimuli were presented binaurally through headphones at a comfortable listening level. There were three groups of ten subjects; each group listened to only one of the three tapes.

B. Results

Each word/nonword judgment was combined with its confidence rating to form a single score. The scores ranged from one to ten; a "one" indicated that the subject had incorrectly classified the stimulus and was "very confident" of the judgment, whereas a "ten" indicated that the subject's word/nonword judgment was correct, and further the subject had assigned the highest confidence rating to it.
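The scoring rule just described, together with the per-subject normalization that the following paragraph applies to it, might be sketched as below. This is an illustration under explicit assumptions rather than the authors' procedure: a five-point confidence scale is assumed because it is the simplest scale that yields scores from one to ten (the scale itself is not described in this text), the population standard deviation is assumed for the z scores, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

N_CONFIDENCE_LEVELS = 5  # ASSUMPTION: five-point confidence scale, not stated in the text


def combined_score(correct: bool, confidence: int) -> int:
    """Fold a word/nonword judgment and its confidence rating into one 1-10 score.

    1 = incorrect judgment given with the highest confidence,
    10 = correct judgment given with the highest confidence.
    """
    if not 1 <= confidence <= N_CONFIDENCE_LEVELS:
        raise ValueError("confidence rating out of range")
    if correct:
        return N_CONFIDENCE_LEVELS + confidence
    return N_CONFIDENCE_LEVELS + 1 - confidence


def z_normalize_by_subject(scores: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Convert each subject's scores to z scores using that subject's own mean and SD."""
    normalized: Dict[str, List[float]] = {}
    for subject, values in scores.items():
        mu = mean(values)
        sd = pstdev(values)  # ASSUMPTION: population SD; the text does not specify
        if sd == 0:
            raise ValueError(f"subject {subject} gave identical scores; z is undefined")
        normalized[subject] = [(value - mu) / sd for value in values]
    return normalized


# Toy data: two hypothetical subjects who use the scale differently.
raw = {
    "s1": [combined_score(True, 5), combined_score(True, 3), combined_score(False, 2)],
    "s2": [combined_score(True, 2), combined_score(False, 1), combined_score(False, 4)],
}
z_scores = z_normalize_by_subject(raw)
```

Normalizing within each subject removes differences in how liberally individual listeners use the ends of the scale, so that condition means reflect relative rather than absolute ratings.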

To minimize the effect of subject differences in scale usage, the one-to-ten scores were normalized separately by subject. These averaged normalized scores (z scores) are shown for each condition in Table II. (Note that the mean of the z distribution is zero. Consequently, scores below zero are below the mean, while scores greater than zero are above the mean.) In general, words were rated higher than nonwords

(F" = 6.77, df = 1, 41, p
