Organization in Infant Speech Perception*

JOANNE L. MILLER Northeastern University

PETER D. EIMAS Brown University

ABSTRACT

Over the past few years it has become increasingly clear that infants are capable of distinguishing speech contrasts that are differentiated by a single acoustic parameter and, moreover, that they perceive these distinctions in terms of adult phonetic categories. In a series of experiments, we examined whether infants also perceive organization in speech; specifically, whether they are sensitive not only to individual elements, but also to the combinations of elements that make up basic units. We found evidence that two- to four-month-old infants perceive organization at both the syllabic and segmental levels: infants noticed the rearrangement of consonants and vowels to form new syllables and the rearrangement of phonetic feature values to form new phonetic segments. These findings provide further support for the assumption that very young infants have quite sophisticated speech perception abilities, which constitute an important prerequisite for the acquisition of language.

The acquisition of language is certainly one of the most remarkable accomplishments of childhood. A normal child, with even minimal linguistic experience, will learn a language and, moreover, will do so in a relatively rapid and seemingly effortless manner, quite unlike the slow, difficult process of learning to read (Liberman, Liberman, Mattingly, & Shankweiler, 1978). Given the enormous complexity of language, and the success and apparent ease with which it is mastered by the child, many theorists have proposed that the infant possesses innate predispositions that guide him through the language development process: abilities and strategies that allow him to discover the structure of language and to develop the processes necessary for its use in communication (Chomsky, 1965; Lenneberg, 1967; McNeill, 1970; Slobin, 1973). Within this view of language development, a critical issue is the specification of these innate abilities and of how they interact with early linguistic experience to allow the acquisition process to proceed. In order for an infant to begin to acquire a language, he must be able to perceive and distinguish among the sounds of the language, the carriers of the more abstract linguistic message. Although this requirement seems quite straightforward, it is not, inasmuch as the sound system of any language, the phonetic structure, is itself a complex level of language. The perceived phonetic segments (the consonants and vowels we hear when we listen to speech) are not related in a simple one-to-one fashion to any acoustic segments in the speech waveform. Rather, given the nature of the articulatory system that produces speech, there is a restructuring of information, analogous to that which exists at the higher levels of language (Liberman, 1970; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). The listener, in perceiving speech, must process the speech waveform in a way that recovers the phonetic segments intended by the speaker. If the infant does possess certain innate predispositions for language learning, we might expect that some of the earliest manifested abilities would be those that permit the perception of speech in terms of its phonetic structure, that is, those that allow the infant to perceive linguistically relevant acoustic distinctions and to recover the phonetic segments from the speech stream.

Considerable progress has been made over the past ten years in understanding the nature of infant speech perception, much of it due to the use of synthetically produced speech, which allows for the precise manipulation of acoustic parameters. For recent comprehensive reviews of this literature, the reader is referred to Eimas and Tartter (1979) and Jusczyk (1980). Most of the research has focused on the various phonetic contrasts the infant can perceive and the manner in which perception occurs. It can be concluded on the basis of this work that young infants, with minimal exposure to language, are quite sophisticated in their ability to perceive speech. First, infants are sensitive to the acoustic properties that signal phonetic distinctions and are able to make rather fine-grained distinctions along these parameters. More specifically, they can perceive contrasts in vowel quality, such as [a] vs. [i] or [i] vs. [I], when the vowel is specified by the frequencies of the steady-state formants, a major cue for vowel identity (e.g., Swoboda, Morse, & Leavitt, 1976; Trehub, 1973). In addition, they can perceive many (if not all) of the consonantal distinctions. For example, they can discriminate two consonants that vary in the phonetic feature of voicing, such as [b] vs. [p] or [d] vs. [t], when that distinction is cued by voice-onset-time (VOT) (e.g., Eimas, 1975a; Eimas, Siqueland, Jusczyk, & Vigorito, 1971); they show evidence of distinguishing consonants that vary with respect to place of articulation, such as [b] vs. [d] or [b] vs. [g], when that contrast is cued either by formant transition or by burst frequency (e.g., Miller, Morse, & Dorman, 1977; Morse, 1972); and they perceive differences in manner of articulation, such as [b] vs. [m], when the contrast is signaled primarily by the presence of a nasal resonance (Eimas & Miller, 1979). Second, the nature of the infant's perception mirrors that of the adult in an important respect. For both adults and infants, vowels (at least relatively long, steady-state vowels) are perceived in a nearly continuous manner (e.g., Fry, Abramson, Eimas, & Liberman, 1962; Pisoni, 1973; Swoboda, Kass, Morse, & Leavitt, 1978; Swoboda, Morse, & Leavitt, 1976). That is to say, for adults, the classification of a given formant pattern into a particular vowel category (e.g., [i] vs. [I]) does not limit its discriminability: two vowels that differ in formant structure but have the same identity are typically discriminated quite well. Similarly, infants can discriminate two vowels that belong to the same adult category. On the other hand, many consonantal distinctions, for example those based on voicing or place of articulation, are perceived by both adults and infants in a more categorical fashion, such that two acoustic variants of a given (adult) phonetic category are typically discriminated quite poorly (e.g., Eimas, 1974, 1975a; Eimas et al., 1971; Liberman, Harris, Hoffman, & Griffith, 1957; Pisoni, 1973).1 In fact, it is just this finding, namely that young infants do perceive consonantal distinctions in terms of adult phonetic categories, that has led to the hypothesis that they possess mechanisms tuned for the perception of speech, in particular for the categorization of sounds into phonetic classes (see Eimas & Tartter, 1979). Finally, cross-language research on infants, in conjunction with studies on adults, has indicated that, although specific experience may not be necessary to perceive some contrasts, exposure to a language can modulate the way in which a particular contrast is perceived (e.g., Eimas, 1975b; Lasky, Syrdal-Lasky, & Klein, 1975; Streeter, 1976). Thus, the nature of the adult's perception is a function of the interaction of innately given perceptual abilities and exposure to a particular language.

The infant, then, can perceive variations in many acoustic parameters that specify phonetic contrasts and, at least for consonantal distinctions, does so in terms of adult phonetic categories. But the perception of speech, that is, the extraction of phonetic segments from the speech waveform, requires more than this. First, as we indicated above, the relation between perceived phone and acoustic signal is enormously complex. For one thing, it appears that any particular phonetic contrast, for example, the contrast in voicing between [b] and [p], can be cued by a number of acoustic parameters; there are, in effect, multiple cues for a given phonetic distinction. For another, even considering a single cue, the exact form of the acoustic information varies considerably as a function of a number of contextual factors such as phonetic context, speaker, and rate of speech. The adult shows evidence of perceptual constancy: he perceives a given distinction in spite of its being specified by multiple, context-dependent cues (see Miller & Eimas, 1979, for a further discussion of these issues). The issue with respect to development is whether the young infant also shows evidence of such perceptual constancy, or whether this is somehow learned through the course of language development. Although there is very little research on this matter, the available data do suggest that the infant may perceive at least certain kinds of variant information in terms of phonetic categories, that is, show evidence of at least some types of perceptual constancy (Fodor, Garrett, & Brill, 1975; Holmberg, Morgan, & Kuhl, 1977; Kuhl, 1976).

A second prerequisite for adult-like phonetic perception, one that goes beyond the ability to perceive distinctions cued by a single acoustic parameter, is the ability to recognize combinations of phonetic elements, inasmuch as it is the combination of elements that specifies the basic units of language. The combination of phones (consonants and vowels) specifies the syllables of the language, and the combination of values along the phonetic features (voicing, place of articulation, manner of articulation) defines the individual phones of the language. Thus, in order to perceive syllables as units, the infant must recognize combinations of phones and, in order to perceive phonetic segments as units, he must recognize combinations of feature values. In a series of studies, we examined whether young infants, two to four months of age, show evidence of such pattern recognition, that is, whether they can go beyond treating each phonetic element as a separate, unrelated entity and recognize the organization that exists among elements.2 We should point out that this organization based on elements can actually be described in two ways. One is in terms of the organization of phonetic elements, e.g., consonant and vowel. The other is in terms not of phonetic elements, but rather of the acoustic properties that underlie those phonetic elements. An important consideration, of course, is which framework is correct with respect to infant perception, an issue to which we shall return later in the paper. For ease of explication, we shall describe our experiments in terms of the organization of phonetic elements: features and phones.

*This research was supported by NICHD Grant HD-05331 to P.D. Eimas and NICHD Contract NIH-71-2420 to Haskins Laboratories. We would like to thank Dr. A.M. Liberman for making available the facilities at Haskins Laboratories for the preparation of stimulus materials and Robin Brill, Sally Mac, Arlene McAuley, and June Shepp for testing the infants.

Canad. J. Psychol./Rev. canad. Psychol., 1979, 33 (4)

1 Recent investigations have shown that, at least for adults, numerous factors influence the degree to which any set of stimuli (speech or non-speech) is categorically perceived. These include the nature of the acoustic information specifying the perceived distinction, the nature of the experimental procedure used to assess discrimination, and the amount of training given the subject. For further discussion of these issues, the reader is referred to Pisoni (1978).

2 It is possible that the infant's perception of complex patterns is holistic in nature, that is, based on the entire structure of a pattern and not derived from initial analytic processes which decompose the pattern into components or features. Of course, if this is true, then the infant's perception of pattern is, by definition, organized. There are thus three possibilities concerning the nature of infant pattern perception: (1) perception is holistic; (2) perception is based on an initial analysis of features, and there is some organization of the features; and (3) perception is based on an initial analysis of features, but without any organization. Our study permits us to eliminate either the first and second possibilities, or the third possibility, but does not permit us to differentiate between the first two. Indeed, we know of no way to make this distinction experimentally with infant subjects. There is, however, rather extensive behavioral as well as neurophysiological evidence for an analysis into components or features in human and non-human pattern perception. Given this evidence, and the infant's sensitivity to these components, we believe that it is likely that the perceptual system of infants analyzes complex patterns into components. Consequently, in this paper we have elected to describe the infant's perceptual system in terms of components, and not in terms of holistic patterns.

EXPERIMENT I: THE SYLLABLE

Our first experiment was concerned with the infant's ability to organize, or perceive the combination of, consonants and vowels that form consonant-vowel (CV) syllables. Specifically, the stimuli were four synthetically generated CV syllables, [ba], [da], [bae], and [dae], the spectrograms of which are shown in Figure 1. These four syllables can be differentiated from each other on the basis of the consonant, [b] or [d], and the vowel, [a] or [ae]. As can be seen from Figure 1, the consonantal distinction is specified by the nature of the initial formant transitions: the second and third formants (F2 and F3) are rising for [b] and falling for [d]. The vowel contrast is signaled primarily by the frequencies of F2 and F3, which are higher for [ae] than for [a].3 With these four syllables, we designed an experiment to investigate whether infants would recognize a recombination of consonants and vowels to form new syllables. We tested two types of change: a change in a single phone and a change in the combination of phones. In each test, two pairs of stimuli were presented for discrimination. It is important to note that the stimuli were presented in an alternating manner that precluded their having any inherent order (see the Method section for details).

Infants in the single group (Group S) were tested on the discrimination of two pairs of syllables that differed either only in vowel ([ba]-[da] vs. [bae]-[dae]) or only in consonant ([ba]-[bae] vs. [da]-[dae]). From earlier research, we fully expected that these infants would show evidence of discriminating the single-phone changes, whether based on consonantal or vocalic distinctions. Infants in the recombination group (Group R) were also tested on the discrimination of two pairs of syllables but, in this case, the two consonants and the two vowels occurred in both sets, and the sets differed only in the pairing of consonant and vowel ([ba]-[dae] vs. [bae]-[da]). If infants are responsive only to the individual consonants and vowels, and do not perceive the manner in which they are combined, that is, the syllabic structure, then we would expect no evidence that these patterns are discriminated. However, if they are able to perceive not only the separate phonetic segments, but also the combinations of segments, that is, which consonant is paired with which vowel, then evidence of discrimination should be obtained.

[Figure 1: spectrogram panels [ba], [da], [bae], [dae]; time scale 300 msec.]
FIGURE 1 Spectrograms of the four syllables used in Experiment I.

3 As is apparent from Figure 1, it is not the case that the perceived consonant corresponds to the first acoustic segment (the transitions) and the perceived vowel to the second (the steady-state formants). Although for these syllables the consonant is specified solely by the initial formant transitions, the vowel is specified by both the transitions and the steady-state formants. That is, F2 and F3 are relatively higher for [ae] than for [a] throughout the entire syllable. The fact that the same acoustic segment (the transitions) conveys information about more than one phonetic segment (consonant and vowel) is a consequence of coarticulation during speech production. This type of parallel transmission of information is pervasive in speech and is one of the properties making the relation between acoustic feature and perceived phonetic segment so complex (cf. Liberman, 1970; Liberman et al., 1967).

METHOD

Procedure

The procedure we used was the high-amplitude sucking (HAS) technique, originally developed by Siqueland and DeLucia (1969) and described in earlier papers (see Eimas, 1974, 1975a, for a detailed description of the procedure and apparatus). Essentially, an infant is brought into a dimly lit, sound-treated room and placed in an infant seat approximately 45 cm in front of a loudspeaker. Just under the speaker is a colourful projected picture that serves to keep the infant oriented towards the speaker. He is given a blind nipple to suck, which is held in place by an experimenter. Throughout the session this experimenter listens to music over earphones, thereby masking the stimuli and eliminating the possibility of experimenter bias. At the outset of a session, a second experimenter, positioned at the control equipment in a separate room, sets a criterion level such that approximately 20 to 40 of the infant's sucking responses during a one-minute period would be counted as high-amplitude, criterion sucks. As soon as a one-minute baseline reading is taken with the criterion level so set, contingent sound presentation begins. Specifically, contingent on each high-amplitude suck, the infant is presented one syllable (plus intersyllabic silence), provided that there is at least one second between sucking responses. If the interval between sucks is less than one second, as is typically the case since infants suck in bursts, the timing mechanism is reset so that he hears a series of syllables, separated by the intersyllabic silent interval. No sound is presented more than one second after the last criterial response.

Contingent sound presentation typically results in an initial increase in sucking rate followed by a decrease in rate, or satiation. When a set of satiation criteria is met, the tape recorder output is switched to the second channel. For infants in the two experimental groups (Groups S and R), this results in the second set of sounds being presented; for infants in the control group (Group C), the same set of sounds is presented as before. After the stimulus shift (or what would have been the shift for the control infants), the second set of sounds is presented for four minutes. The satiation criteria are as follows: (1) the sucking rate must decrease by 20% or more for two consecutive minutes compared with the immediately preceding minute; (2) there must be at least three minutes of sound presentation before the two satiation minutes; (3) there must be at least fifteen sucks in each of the two satiation minutes; and (4) the above criteria must be met within twenty minutes. Subjects who fail to meet these criteria, who persistently cry or sleep for at least two minutes, or who fail to accept the nipple or will not suck are eliminated from the experiment.

Three groups of infants were tested: S (single), R (recombination), and C (control). Group S contained 24 infants. Half of these were tested on a change in vowel only ([ba]-[da] vs. [bae]-[dae]) and the other half on a change in consonant only ([ba]-[bae] vs. [da]-[dae]). Within each subgroup, half of the infants heard one set before shift and the other half heard the other set before shift.
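The satiation criteria in the procedure amount to a decision rule applied to a minute-by-minute count of criterion sucks. The Python function below is a minimal sketch of that rule, not the control logic actually used in the laboratory. It assumes that the record starts with the first minute of contingent sound presentation (baseline excluded), that criterion (1) compares each of the two satiation minutes with the single minute immediately preceding them, and that the twenty-minute limit counts minutes of sound presentation. The function name and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def satiation_minute(
    sucks_per_minute: Sequence[int],
    min_presentation_minutes: int = 3,  # criterion (2)
    min_sucks: int = 15,                # criterion (3)
    max_minutes: int = 20,              # criterion (4), assumed to count sound-presentation minutes
    decrement_pct: int = 20,            # criterion (1)
) -> Optional[int]:
    """Return the 1-based minute of sound presentation after which the stimulus
    shift would occur, or None if the satiation criteria are never met.

    sucks_per_minute[i] holds the criterion sucks in minute i + 1 of contingent
    sound presentation (baseline minute assumed excluded).
    """
    n = len(sucks_per_minute)
    # s indexes the first of the two satiation minutes; criterion (2) requires
    # at least `min_presentation_minutes` minutes of sound before it.
    for s in range(min_presentation_minutes, n - 1):
        if s + 2 > max_minutes:  # second satiation minute would exceed the limit
            break
        reference = sucks_per_minute[s - 1]  # the immediately preceding minute
        pair = sucks_per_minute[s:s + 2]     # the two candidate satiation minutes
        # Criterion (1): both minutes at least decrement_pct % below the reference.
        # Integer arithmetic avoids floating-point trouble at exactly 20%.
        decreased = all(100 * r <= (100 - decrement_pct) * reference for r in pair)
        # Criterion (3): enough sucks in both satiation minutes.
        enough = all(r >= min_sucks for r in pair)
        if decreased and enough:
            return s + 2  # 1-based number of the second satiation minute
    return None


print(satiation_minute([40, 55, 60, 58, 45, 40]))
```

For this hypothetical record, minutes 5 and 6 (45 and 40 sucks) both fall at least 20% below minute 4 (58 sucks), and three minutes of sound precede them, so the function returns 6 and the shift would follow the sixth minute.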
Group R contained 12 infants who were tested on the contrast between [ba]-[dae] and [da]-[bae]. Again, half heard one set before shift and half heard the other set before shift. Finally, Group C contained 36 infants: 12 tested on a set taken from the single-vowel condition ([ba]-[da]), 12 tested on a set taken from the single-consonant condition ([ba]-[bae]), and 12 tested on a set from the recombination condition ([ba]-[dae]). Of course, these infants heard the same two sounds throughout the experimental session.

Stimuli

The stimuli were four five-formant CV syllables, [ba], [da], [bae], and [dae], produced on the OVE IIIc serial synthesizer at Haskins Laboratories (see Fig. 1). For each syllable, there was an initial transition period in the lower three formants over the first 50 msec, followed by 250 msec of steady-state information, for a total syllable duration of 300 msec. The higher two formants, F4 and F5, were kept constant throughout the syllable at 3515 and 4265 Hz, respectively. (F5 was very weak and is therefore not readily apparent in the spectrograms; the same is true of F4 for [a].) All syllables had the same first formant (F1), which rose over the first 50 msec from 359 Hz to 771 Hz. Finally, all syllables had falling fundamental frequency and amplitude contours.

Within each vowel context, [a] and [ae], the consonantal distinction was cued by rising F2 and F3 transitions for [b] and falling transitions for [d]. Consider first [ba] and [da]. The steady-state values of F2 and F3 were 1233 and 2520 Hz, respectively. The F2 and F3 transitions for [ba] rose from initial values of 897 and 2000 Hz, respectively, and those for [da] fell from values of 1620 and 2998 Hz, respectively. Next consider [bae] and [dae]. In order to create the change in vowel, we first raised the steady-state values of F2 and F3 to new values of 1796 and 2780 Hz, respectively. Then, working within the constraints of the step sizes of the synthesizer, we raised the F2 and F3 transitions by the same amounts as the steady states, so that the slopes of the transitions for [bae] and [dae] would be approximately the same as those for [ba] and [da]. That is, as can be seen in Figure 1, F2 and F3 are simply displaced upward for [bae] and [dae] compared to [ba] and [da]. Finally, since the OVE IIIc is a serial synthesizer, the amplitudes of the individual formants vary according to the relative frequencies of the formants. As is apparent from the spectrograms, this resulted in a considerably more intense F4 for the syllables with [ae] than for those with [a]; the amplitudes of the other formants are similar across the two vowel contexts.
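The construction of [bae] and [dae] follows a simple rule: shift each [a] onset by the same amount as the corresponding steady state, which leaves the transition slopes unchanged. The Python sketch below applies that rule to the F2 and F3 values given above. It assumes linear 50-msec transitions, and the onsets it derives for [bae] and [dae] (e.g., F2 = 1460 Hz for [bae]) are idealized: the actual values were constrained by the synthesizer's step sizes and are not reported here. The helper names are hypothetical.

```python
from typing import Dict

TRANSITION_MS = 50  # transitions occupy the first 50 msec of each syllable

# Steady-state F2/F3 values (Hz) for each vowel, from the text.
STEADY: Dict[str, Dict[str, int]] = {
    "a": {"F2": 1233, "F3": 2520},
    "ae": {"F2": 1796, "F3": 2780},
}
# F2/F3 onset values (Hz) for [ba] and [da], from the text.
ONSET_A: Dict[str, Dict[str, int]] = {
    "b": {"F2": 897, "F3": 2000},   # rising transitions
    "d": {"F2": 1620, "F3": 2998},  # falling transitions
}


def onsets(consonant: str, vowel: str) -> Dict[str, int]:
    """Idealized onsets: the [a] onsets displaced by the steady-state shift."""
    base = ONSET_A[consonant]
    return {f: base[f] + STEADY[vowel][f] - STEADY["a"][f] for f in ("F2", "F3")}


def slopes(consonant: str, vowel: str) -> Dict[str, float]:
    """Transition slope in Hz/msec, assuming a linear transition."""
    start = onsets(consonant, vowel)
    return {f: (STEADY[vowel][f] - start[f]) / TRANSITION_MS for f in ("F2", "F3")}


for c in ("b", "d"):
    for v in ("a", "ae"):
        print(f"[{c}{v}]", onsets(c, v), slopes(c, v))
```

Under these assumptions, [ba] and [bae] share rising slopes (positive values) and [da] and [dae] share falling slopes (negative values), which is the sense in which the [ae] syllables are the [a] syllables displaced upward.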
Each of the four syllables was digitized using the Haskins PCM system, and six different tapes were made. Two tapes were created for Group S: one contained [ba] and [da] on one channel and [bae] and [dae] on the other, and the other contained [ba] and [bae] on one channel and [da] and [dae] on the other. One tape was made for Group R, which contained [ba] and [dae] on one channel and [da] and [bae] on the other. Finally, three tapes were made for Group C, each containing the same two syllables on both channels: [ba] and [da], [ba] and [bae], or [ba] and [dae]. On all tapes, the two stimuli on a given channel were recorded in an alternating sequence with an ISI of 700 msec, to yield one stimulus per second. Moreover, the timing of the syllables on the two channels was synchronized to ensure a smooth transition between channels at stimulus shift. All syllables were presented to the infant at approximately 75 dB SPL against a background level of 63 dB SPL.

It is important to note that the alternating sequence and constant ISI, in conjunction with the nature of the sucking procedure, preclude the possibility that the infant consistently hears one syllable of a pair first and the other second. That is, since it is the infant who initiates and terminates sound presentation by sucking, and since the time at which he begins a sucking burst is independent of the position of the tape head with respect to the two syllables, a burst of sucking sometimes produces a series beginning with one of the syllables and sometimes a series beginning with the other; which syllable begins or ends a series varies essentially randomly throughout the session. Consequently, the syllables have no inherent order. As a result, discrimination in the experimental groups cannot be based only on detection of a change in what would be the 'first' or 'second' stimulus of a pair, as would be possible if the stimuli were ordered.

Subjects

The subjects for this study were 72 normal infants, two, three, and four months of age, with approximately one-third of the infants at each age level and approximately half males and half females. To obtain complete data on 72 infants, we also tested 117 infants who did not complete the experimental procedures. Of these infants, 91% were dropped from the experiment before shift and 9% after shift. Twenty per cent of the rejected infants never accepted the nipple or never sucked, 4% never met the satiation criteria, 74% were eliminated for persistent crying or sleeping, and 2% were eliminated for equipment failure or experimenter error. The 72 infants were randomly assigned to the three treatment groups with the constraints that approximately equal numbers of males and females and equal numbers at each age be assigned to each group.

RESULTS

Summary data for the preshift and postshift performance of the three groups of infants are presented in Table 1. Simple analyses of variance on the three preshift measures (baseline, third minute before shift, mean of two minutes before shift) indicated no reliable differences in sucking rate among the three groups, F(2,69) < 1 in each case.

TABLE 1
Mean response rate (criterion sucks per minute) during three preshift periods and two postshift periods for the Control, Single, and Recombination groups in Experiment I (the scores for the two different Single groups are given in parentheses)

                                    Control  Single (Vowel, Consonant)  Recombination
PRESHIFT
  Baseline                          27.3     29.7 (31.3, 28.0)          27.1
  Third minute before shift         56.7     57.8 (60.2, 55.3)          60.3
  Mean of two minutes before shift  35.8     35.8 (37.8, 33.8)          33.5
POSTSHIFT
  Mean of two minutes after shift   35.8     46.4 (43.9, 48.9)          46.1
  Mean of four minutes after shift  34.3     44.6 (43.0, 46.3)          47.1

[Figure 2: panel titled CONSONANT-VOWEL; y-axis: mean difference score (sucks per minute) for the Control, Single, and Recombination groups.]
FIGURE 2 Mean difference score, in sucks per minute, for the three groups in Experiment I. Each difference score was calculated by subtracting the mean number of criterion sucks during the two minutes before shift from the mean number of criterion sucks during the two minutes after shift.
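The difference scores in Figure 2 are postshift minus preshift rates. Because the same infants contribute to both periods, the mean of the individual difference scores equals the difference between the group means, so the group-level values can be recovered from Table 1. The Python sketch below does this for the two- and four-minute postshift periods; it uses only the group means in Table 1 (the individual scores on which the analyses of variance were run are not reported here), and the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# Group means from Table 1 (criterion sucks per minute):
# (two minutes before shift, two minutes after shift, four minutes after shift)
TABLE_1: Dict[str, Tuple[float, float, float]] = {
    "Control": (35.8, 35.8, 34.3),
    "Single": (35.8, 46.4, 44.6),
    "Single-Vowel": (37.8, 43.9, 43.0),
    "Single-Consonant": (33.8, 48.9, 46.3),
    "Recombination": (33.5, 46.1, 47.1),
}


def difference_scores(pre: float, post_2min: float, post_4min: float) -> Tuple[float, float]:
    """Postshift minus preshift rate for the two- and four-minute postshift periods."""
    # Rounded to one decimal place, the precision of Table 1, to hide float noise.
    return round(post_2min - pre, 1), round(post_4min - pre, 1)


for group, means in TABLE_1.items():
    two_min, four_min = difference_scores(*means)
    print(f"{group:17s} 2-min: {two_min:5.1f}   4-min: {four_min:5.1f}")
```

For Group C the two-minute difference is 0.0, and for the vowel and consonant subgroups it is 6.1 and 15.1, matching the increases reported in the text.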


Since, as is typical of this procedure, there was considerable individual variability in the mean sucking rate during the two minutes prior to shift, the analyses of postshift data were based on difference scores. The main analyses were based on the difference in sucking rate between the two minutes before shift and the two minutes after shift. Figure 2 shows that there was no mean change in sucking rate for infants in Group C, whereas the two experimental groups showed a similar increase in rate. An analysis of variance confirmed that there was a reliable difference among groups, F(2,69) = 5.8, p < .01, and post hoc comparisons indicated that, whereas both experimental groups differed from the control group (p < .05 in each case), they did not differ reliably from each other (p > .10). Thus, the infants provided clear evidence of discriminating not only a change in a single phonetic segment, but also a recombination of consonant and vowel to form new syllables.

At this point we should look more closely at Group S which, as the reader will recall, was actually made up of two subgroups: those infants who received a change in consonant and those who received a change in vowel. As indicated above, the main analyses showed a reliable difference between Groups S and C when the consonant and vowel subgroups were treated together. However, from Table 1 it is apparent that a change in consonant and a change in vowel were not equivalent: for the single-vowel (SV) group, there was a mean increase in sucking rate from 37.8 to 43.9 (a difference of 6.1), whereas for the single-consonant (SC) group the increase was from 33.8 to 48.9 (a difference of 15.1). Two additional analyses were performed to ascertain whether the consonant and vowel changes functioned in a reliably different manner. The first was a one-way analysis of variance on four groups, C, SC, SV, and R. As expected, there was a reliable group effect, F(3,69) = 4.77, p < .005. Post hoc analyses, though, showed that while Groups R and SC differed reliably from Group C (p < .05 in each case), Group SV did not differ reliably from Group C (p < .20), although it also did not differ reliably from Group SC (p < .20). Thus the evidence for discriminating a change of vowel was only marginal. The second analysis largely confirmed these findings. This was a two-way analysis of variance, Groups (SV vs. SC vs. R) × Condition (experimental vs. control). As we described earlier, the control infants were actually divided into three subgroups, each of which received a stimulus pair given to one of the three experimental groups. This analysis permits a comparison of each experimental group with its respective control group. There was a highly reliable effect of condition, F(1,66) = 11.9, p < .001, indicating a difference between infants who received a change at stimulus shift and those who did not. If the two experimental single groups had functioned differently with respect to their controls, there would have been a Groups × Condition interaction. Although there was a trend in that direction, the interaction failed to reach statistical significance, F(2,66) = 2.16, p < .13.

Thus, the results of the two additional analyses, taken together, are somewhat equivocal with respect to the relation between vowel and consonant discrimination. They do suggest, however, that the vowel contrast may have been somewhat more difficult than the consonantal contrast. It is difficult to say why infants may have found the vowel discrimination task more difficult than the consonant discrimination task, especially since there is no suggestion in the literature that vowels in general are more difficult to discriminate than consonants. We can, however, offer a speculation. It may be that the particular vowel contrast we chose was somewhat less discriminable, and hence less salient, than the consonantal contrast we chose, and that this, coupled with the stimulus variation inherent in our procedure, led infants to have relatively greater difficulty discriminating vowels. Specifically, infants in the vowel group may have attended more to the highly salient variation in the consonant, both before and after the stimulus shift, than to the less salient change in vowel at stimulus shift, which may sometimes have gone unnoticed. On the other hand, infants in the consonant group would have heard the less salient variation in the vowel throughout the session and, at shift, a highly salient change in consonant, which was readily detected and discriminated. Further research, with controls for the level of discriminability of the two distinctions, is required to assess the viability of this explanation. We should note, however, that infants in Group R must have discriminated the vowel contrast; otherwise, they would have heard a single vowel before and after shift, paired with the same two consonants, [b] and [d], which would have provided no basis for discrimination. Finally, we conducted a comparable set of analyses on the difference scores based on responding during the entire four-minute postshift period.
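The two-way analysis pairs each experimental group with the control subgroup that heard one of its stimulus pairs, giving a 3 × 2 between-subjects design with 12 infants in every cell. The short Python sketch below is only an illustration of the degrees-of-freedom bookkeeping for such a design (not the authors' analysis code); it shows why the condition effect and the interaction are tested against 66 error degrees of freedom. The variable names are illustrative.

```python
from itertools import product

GROUPS = ("SV", "SC", "R")                # which stimulus contrast was tested
CONDITIONS = ("experimental", "control")  # change at shift vs. no change
N_PER_CELL = 12                           # 12 infants in every Groups x Condition cell

# Each control subgroup heard one stimulus pair from its matching experimental group.
cells = {(g, c): N_PER_CELL for g, c in product(GROUPS, CONDITIONS)}

n_total = sum(cells.values())               # total number of infants
df_groups = len(GROUPS) - 1                 # main effect of Groups
df_condition = len(CONDITIONS) - 1          # main effect of Condition
df_interaction = df_groups * df_condition   # Groups x Condition interaction
df_error = n_total - len(cells)             # within-cell (error) degrees of freedom

print(f"N = {n_total}")
print(f"Condition:   F({df_condition},{df_error})")
print(f"Interaction: F({df_interaction},{df_error})")
```

The resulting F(1,66) and F(2,66) match the degrees of freedom reported for the condition effect and the interaction.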
These results were essentially the same as those based on the first two minutes of postshift: they showed that infants discriminated a change in a single phonetic segment and in the combination of segments and, furthermore, that the consonantal distinction may have been somewhat easier than the vowel distinction.

EXPERIMENTS II AND III: THE CONSONANT

We have shown that infants are responsive to particular combinations of consonants and vowels, that is, that they are sensitive to syllabic structure. Our next question was whether they are also sensitive to the structure of a single consonant. The phonetic segments of the language are themselves composed of subunits, the phonetic features, such as voicing, place of articulation, and manner of articulation, and, as we indicated earlier, infants can discriminate consonants that differ in a single feature. We asked whether infants are sensitive to the combinations of feature values that make up a single consonant, just as they are sensitive to the combinations of segments that make up a given syllable. To this end we conducted two studies, one examining the features of voicing and place of articulation, the other examining the features of manner of articulation and place of articulation. The rationale for these studies was the same as that of the consonant-vowel study.

Let us first consider the voicing-place study. The four syllables used in this study, [ba], [pa], [da], and [ta], are shown in Figure 3. They differ from each other in two features, voicing and place of articulation. The voicing distinction between the voiced consonants, [b] and [d], and the voiceless consonants, [p] and [t], is signaled by the acoustic parameter of VOT; in the spectrograms this appears as a change in the duration of the aspiration and of the first formant at the beginning of the syllable. The place distinction between the labial consonants, [b] and [p], and the alveolar consonants, [d] and [t], is cued (as it was in the previous study) by the F2 and F3 transitions, which are rising for the labials and falling for the alveolars. Thus each consonant (the voiced labial [b], the voiced alveolar [d], the voiceless labial [p], and the voiceless alveolar [t]) is defined by a specific combination of VOT and formant-transition values.

FIGURE 3 Spectrograms of the four syllables used in Experiment II ([ba], [da], [pa], and [ta]; each 240 msec long).

From earlier work on infant perception, we expected that infants would perceive a change in voicing alone, i.e., [ba] and [da] vs. [pa] and [ta], and a change in place alone, i.e., [ba] and [pa] vs. [da] and [ta]. The critical question is whether they would also discriminate pairs of syllables for which only the combination of voicing and place values has changed, that is, [ba] and [ta] vs. [da] and [pa]. For this set, both values of the voicing feature (voiced and voiceless) and both values of the place feature (labial and alveolar) are present in both stimulus pairs; only the combination of feature values differs. Thus, infants will discriminate these pairs only if they are sensitive to the relation between feature values.

An analogous argument applies to the manner-place study, which used the syllables shown in Figure 4, [ba], [ma], [da], and [na]. For these syllables, the place distinction between the labial consonants, [b] and [m], and the alveolar consonants, [d] and [n], was again cued by the F2 and F3 transitions; the manner distinction between the stop consonants, [b] and [d], and the nasal consonants, [m] and [n], was cued by nasal resonance and the slope of F1. For this study, as for the voicing-place study, the critical issue is whether infants can discriminate a change in the combination of values: [ba] and [na] vs. [da] and [ma].

FIGURE 4 Spectrograms of the four syllables used in Experiment III ([ba], [da], [ma], and [na]; [ba] and [da] are 300 msec long, [ma] and [na] 340 msec).

METHOD

Procedure

The same procedure was used in each of these experiments as in the consonant-vowel study. Infants were assigned to groups in the following manner (half of the infants in each group heard one of the syllable pairs tested before shift):

(1) Voicing-place. Group R consisted of 16 infants tested on the contrast [ba]-[ta] vs. [pa]-[da]; Group S consisted of 32 infants, half tested on the place contrast, [ba]-[pa] vs. [da]-[ta], and half on the voicing contrast, [ba]-[da] vs. [pa]-[ta]; and Group C consisted of 16 infants, five given [ba]-[pa], five [pa]-[ta], and six [pa]-[da].

(2) Manner-place. Group R again consisted of 16 infants, this time tested on the contrast [ba]-[na] vs. [ma]-[da]; Group S consisted of 32 infants, half tested on the manner contrast, [ma]-[na] vs. [ba]-[da], and half on the place contrast, [ba]-[ma] vs. [da]-[na]; and Group C consisted of 16 infants, five given [ma]-[ba], five [ba]-[da], and six [na]-[ba].

Stimuli

Voicing-place. The four three-formant syllables used in this study (see Fig. 3) were generated on the Haskins Laboratories parallel resonance synthesizer. Each was 240 msec long, consisting of an initial 36 msec transition segment followed by 204 msec of steady-state formants. Both the fundamental frequency and the overall amplitude had falling contours. The steady-state formant values were 769, 1232, and 2525 Hz for F1, F2, and F3, respectively. F1 started at 537 Hz for all syllables. The voicing distinction was cued by a change in VOT: [ba] and [da] each had a VOT of 0 msec, and [pa] and [ta] each had a VOT of 40 msec. The change in VOT from 0 to 40 msec was accomplished by exciting the higher formants with a noise source rather than a periodic source and by eliminating F1 for the first 40 msec. The place distinction was signaled by a change in the F2 and F3 transitions: F2 and F3 had starting values of 921 and 2018 Hz, respectively, for the labial consonants and 1695 and 3363 Hz, respectively, for the alveolar consonants. Each of the four syllables was digitized, and six tapes were constructed, corresponding to the six contrasts tested across the groups described in the Procedure section. Each tape contained an alternating sequence of the two syllables on each channel, with an ISI of 760 msec. As in the prior study, stimuli were presented at approximately 75 dB SPL.

Manner-place. Each of the syllables used in this experiment (see Fig. 4) was a three-formant pattern generated on the Haskins Laboratories parallel resonance synthesizer. Consider first the two stop-vowel syllables, [ba] and [da]. Each was 300 msec in duration, consisting of an initial 45 msec of transition and a 255 msec steady-state segment. Each had a falling fundamental frequency contour and an overall amplitude contour that rose slightly over the course of the transition and then fell gradually. The steady-state formant values were 769, 1232, and 2525 Hz for F1, F2, and F3, respectively. F1 began at 537 Hz for both syllables. The syllables differed, however, in the starting values of F2 and F3: 921 and 2018 Hz for [ba], and 1695 and 3363 Hz for [da]. Thus, as in the voicing-place study, the labial value was cued by rising transitions and the alveolar value by falling transitions. Now consider the two nasal stimuli, [ma] and [na]. These were identical to their [ba] and [da] counterparts except for two changes that signaled a change in manner. First, a 60 msec nasal resonance, 20 msec of which overlapped the onset of the syllable, was added to each syllable, yielding a total stimulus duration of 340 msec. Second, the F1 transition was flattened by raising its starting frequency from 537 to 734 Hz. Thus both place distinctions were cued in the same way (a change in F2 and F3), as were both manner distinctions (a change in the nasal resonance and the F1 slope). As in the previous studies, digitized versions of the syllables were used to prepare six stimulus tapes, each containing an alternating sequence of syllables recorded synchronously on both channels. To maintain a constant rate of stimulus presentation on the tapes, two ISIs were needed: the ISI following [ba] and [da] was 700 msec, and that following [ma] and [na] was 660 msec. In this way, all stimuli were recorded at a rate of one stimulus per second.
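The design logic of the two consonant studies, and the timing arithmetic in the Stimuli section, can be checked mechanically. The sketch below is a minimal illustration, not the authors' procedure: it encodes each syllable as a pair of feature values taken from the text, labels a pre-/post-shift contrast as a single-feature change or a recombination, and confirms that each stimulus duration plus its following ISI gives one stimulus per second. The dictionary layout and the function names (`project`, `classify_contrast`, `nasal_duration`) are hypothetical, and comparing the pairs as multisets of values is one reasonable reading of "both values ... are present in both stimulus pairs."

```python
# Illustrative sketch only: feature values and timings come from the text;
# the data layout and function names are hypothetical.

VOICING_PLACE = {
    "ba": {"voicing": "voiced", "place": "labial"},
    "da": {"voicing": "voiced", "place": "alveolar"},
    "pa": {"voicing": "voiceless", "place": "labial"},
    "ta": {"voicing": "voiceless", "place": "alveolar"},
}

MANNER_PLACE = {
    "ba": {"manner": "stop", "place": "labial"},
    "da": {"manner": "stop", "place": "alveolar"},
    "ma": {"manner": "nasal", "place": "labial"},
    "na": {"manner": "nasal", "place": "alveolar"},
}


def project(pair, syllables, feature):
    """Values of one feature across a syllable pair, ignoring order."""
    return sorted(syllables[s][feature] for s in pair)


def classify_contrast(pre, post, syllables):
    """Label a pre-/post-shift contrast.

    'none'          - same syllables before and after shift (control)
    'single'        - the pairs differ in the values of exactly one feature
    'recombination' - each feature has the same values in both pairs;
                      only the pairing of values has changed
    'multiple'      - the pairs differ in more than one feature
    """
    if sorted(pre) == sorted(post):
        return "none"
    features = next(iter(syllables.values())).keys()
    changed = [f for f in features
               if project(pre, syllables, f) != project(post, syllables, f)]
    if not changed:
        return "recombination"
    return "single" if len(changed) == 1 else "multiple"


def nasal_duration(stop_ms=300, resonance_ms=60, overlap_ms=20):
    """Length of [ma]/[na]: only the non-overlapping resonance adds time."""
    return stop_ms + resonance_ms - overlap_ms


# Onset-to-onset interval = stimulus duration + following ISI (msec).
TIMING = {
    "voicing-place": [(240, 760)],
    "manner-place": [(300, 700), (nasal_duration(), 660)],
}

if __name__ == "__main__":
    print(classify_contrast(("ba", "ta"), ("pa", "da"), VOICING_PLACE))
    print(classify_contrast(("ba", "da"), ("pa", "ta"), VOICING_PLACE))
    print(classify_contrast(("ba", "na"), ("ma", "da"), MANNER_PLACE))
    # Ignoring voicing, the two recombination pairs carry the same place values:
    print(project(("ba", "ta"), VOICING_PLACE, "place")
          == project(("pa", "da"), VOICING_PLACE, "place"))
    for study, pairs in TIMING.items():
        print(study, [dur + isi for dur, isi in pairs])
```

Run as a script, it prints `recombination`, `single`, `recombination`, `True`, and then `[1000]` for the voicing-place study and `[1000, 1000]` for the manner-place study. The `project` comparison restates the argument made for Group R: a listener who registered only one feature would hear the same pair of values before and after shift, so recovery of sucking requires sensitivity to how the values are combined.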

FIGURE 5 Mean difference score, in sucks per minute, for the three groups of Experiments II (voicing-place panel) and III (manner-place panel). Each difference score was calculated by subtracting the mean number of criterion sucks during the two minutes before shift from the mean number of criterion sucks during the two minutes after shift.
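Because each difference score is a postshift mean minus a preshift mean for the same infant, a group's mean difference score equals the difference between that group's postshift and preshift means. The sketch below applies this to the group means reported in Tables II and III as an illustration only; since the table values are rounded to one decimal place, the results may differ slightly from scores computed from individual infants, and the names `GROUP_MEANS`, `difference_score`, and `difference_scores` are hypothetical.

```python
# Group means (criterion sucks per minute) as reported in Tables II and III:
# (mean of two minutes before shift, mean of two minutes after shift).
# Names and layout are illustrative.
GROUP_MEANS = {
    "voicing-place": {
        "Control": (40.5, 37.7),
        "Single": (35.2, 43.1),
        "Recombination": (33.6, 39.3),
    },
    "manner-place": {
        "Control": (36.8, 36.1),
        "Single": (36.0, 42.4),
        "Recombination": (37.2, 43.6),
    },
}


def difference_score(before, after):
    """Postshift minus preshift rate; positive values mean recovery of sucking."""
    return round(after - before, 1)


def difference_scores(means):
    """Apply difference_score to every group of every study."""
    return {
        study: {group: difference_score(*pair) for group, pair in groups.items()}
        for study, groups in means.items()
    }


if __name__ == "__main__":
    for study, scores in difference_scores(GROUP_MEANS).items():
        print(study, scores)
```

This reproduces the qualitative pattern described in the Results: Group C declines slightly in both studies (-2.8 and -0.7), while Groups S and R recover (+7.9 and +5.7 for voicing-place, +6.4 and +6.4 for manner-place).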

Subjects

Sixty-four normal infants, two, three, and four months of age, approximately half male and half female, served as subjects in each of the two experiments. To obtain complete data on 128 infants, it was also necessary to test an additional 371 infants, 90% of whom were eliminated from the experiment before stimulus shift. The infants were rejected for the following reasons: 25% never sucked or accepted the nipple; 8% never met the satiation criteria; 63% cried persistently or slept; and for 4% there was equipment failure or experimenter error. In each experiment, infants were assigned to groups so that each group contained approximately the same number of males and females and approximately the same number of infants at each age.

RESULTS

Summary pre- and postshift data for the voicing-place study are given in Table II and for the manner-place study in Table III. Analyses on the three preshift measures revealed no significant differences among groups for either study, F(2,61) ≤ 1.86, p > .10 in each case. As in the consonant-vowel study, the main postshift analyses were conducted on the two-minute difference scores. Figure 5, which shows the mean scores for each group in each experiment, indicates a pattern of performance very similar to that in the consonant-vowel study: for both the voicing-place and manner-place studies, there is a clear difference between Group C, which showed an average decrease in sucking after shift, and Groups S and R, which showed a recovery in sucking. However, separate analyses of variance on the two studies failed to substantiate these impressions. Although there was a significant group effect for the voicing-place study, F(2,61) = 3.06, p < .05, post hoc comparisons revealed an ambiguous picture: Group S was reliably different from Group C (p < .05), but Group R was not different from either Group S or Group C (p > .10). For the manner-place data, the main effect was not statistically significant, F(2,61) = 1.84, p < .20.

TABLE II
Mean response rate (criterion sucks per minute) during three preshift periods and two postshift periods for the Control, Single, and Recombination groups in Experiment II (the scores for the two different Single groups are given in parentheses)

Period | Control | Single (Voicing, Place) | Recombination
PRESHIFT
Baseline | 35.1 | 31.6 (32.2, 31.1) | 30.8
Third minute before shift | 63.5 | 56.2 (54.9, 57.6) | 54.6
Mean of two minutes before shift | 40.5 | 35.2 (35.8, 34.5) | 33.6
POSTSHIFT
Mean of two minutes after shift | 37.7 | 43.1 (41.4, 44.8) | 39.3
Mean of four minutes after shift | 36.8 | 42.6 (41.8, 43.5) | 37.4

TABLE III
Mean response rate (criterion sucks per minute) during three preshift periods and two postshift periods for the Control, Single, and Recombination groups in Experiment III (the scores for the two different Single groups are given in parentheses)

Period | Control | Single (Manner, Place) | Recombination
PRESHIFT
Baseline | 35.2 | 34.0 (34.1, 33.9) | 29.2
Third minute before shift | 57.2 | 59.9 (62.9, 56.9) | 61.8
Mean of two minutes before shift | 36.8 | 36.0 (40.2, 31.7) | 37.2
POSTSHIFT
Mean of two minutes after shift | 36.1 | 42.4 (44.7, 40.0) | 43.6
Mean of four minutes after shift | 37.1 | 44.6 (44.8, 44.3) | 41.8

Inspection of the individual subject data in the two studies on feature combination indicated considerable individual variation, resulting in a failure to achieve statistical significance. Inasmuch as the trends in the two studies were similar to each other and to that of the previous study, we conducted an analysis in which we pooled the data of the two studies by treating them as replications. This analysis revealed no effect of replication and, more importantly, no interaction with replication (F < 1 in each case), thereby justifying the pooling of the data. There was a significant effect of groups, F(2,122) = 4.75, p < .02. Moreover, post hoc comparisons showed that both experimental groups were reliably different from the control group (p < .05 in each case) but were not reliably different from each other (p > .10). Thus, at least when all subjects are considered together, the pattern of these studies mirrors that of the consonant-vowel study, indicating that infants are sensitive both to changes in a single feature and to particular arrangements of feature values within a single consonant. A comparable set of analyses on the four-minute difference scores yielded the same general pattern of results.

DISCUSSION

Our studies show that young infants, two to four months of age, are sensitive to the combinations of elements that make up basic units of speech: combinations of feature values that define the phonetic segments and combinations of phonetic segments that define the syllable. Thus the infant can distinguish phonetic units not only on the basis of a single acoustic parameter but also on the basis of the relation, or organization, among parameters. Whether the ability to recognize such rearrangements of pattern is innately given or requires some minimal experience listening to speech is currently unknown. What is clear, however, is that well before infants are able to articulate or attach meaning to sequences of sounds, they have at least one of the prerequisites for forming a unified percept at the segmental and syllabic levels.

We should point out that although this type of pattern recognition is critical for language development, it is by no means limited to it; rather, it is an important aspect of perceptual development in general. A number of investigators have, in fact, examined this issue with respect to infant visual and auditory perception. Although there is some controversy over the age at which this ability is first evidenced, there is general agreement that by at least a few months of age the infant is capable of differentiating patterns on the basis of the arrangement of their elements (e.g., Chang & Trehub, 1977; Cornell, 1975; Fagan, 1977; Osman, 1977; see Cohen & Gelber, 1975, and Fantz, Fagan, & Miranda, 1975, for reviews of the literature on vision). It will be especially interesting to determine whether the developmental time course of this ability runs parallel in the domains of speech and visual perception.

Earlier we indicated that the problem of organization in speech perception can be conceptualized in two different ways, namely, in terms of either acoustic properties or linguistic units. With respect to infant speech perception, the issue becomes one of determining the level, auditory or phonetic, at which the recognition of pattern occurs. If recognition is at an auditory level, then the infant is sensitive to the combination of acoustic properties of the waveform. In the syllable study, this would mean that the infant noticed the particular combination of the direction of the formant transitions and the relative level or frequency of the formants. In the consonant studies, it would mean that the infant was responding to the combination of the VOT value of the stimulus and the direction of the transitions in the voicing-place study, and to the presence of nasal resonance and the direction of the formant transitions in the manner-place study. According to this hypothesis, the infant in all instances responds on the basis of acoustic pattern. If recognition is instead at the phonetic level, then the infant is sensitive not to the combinations of acoustic properties per se, but to the combination of phonetic values specified by those properties; that is, in the syllable study, recognition of the pattern occurred after the phonetic segments had been identified and, in the consonant studies, after the phonetic feature values had been extracted. According to this hypothesis, then, the infant responds on the basis of phonetic, not acoustic, values. Although any complete theory of infant speech perception will require the specification of the level, auditory or phonetic, at which such pattern recognition occurs, identifying that level will be extremely difficult, if not impossible.

RÉSUMÉ

Over the past few years it has become increasingly evident that the infant is capable of distinguishing speech contrasts that differ in a single acoustic parameter, and even of perceiving these distinctions in terms of adult phonetic categories. A series of experiments sought to determine whether the infant can also perceive the organization of speech or, more specifically, whether the infant is sensitive not only to individual elements but also to the combination of elements that make up basic units. The results show that infants two to four months of age perceive such organization at both the syllabic and the segmental levels: the infants notice the rearrangement of consonants and vowels into new syllables and the rearrangement of phonetic values into new phonetic segments. These results lend further support to the hypothesis that very young infants possess relatively sophisticated speech perception abilities, abilities that constitute an important prerequisite for the acquisition of language.

REFERENCES

CHANG, H.-W., & TREHUB, S.E. Auditory processing of relational information by young infants. J. exp. child Psychol., 1977, 24, 324-331.
CHOMSKY, N. Aspects of the theory of syntax. Cambridge: MIT Press, 1965.
COHEN, L.B., & GELBER, E.R. Infant visual memory. In L.B. COHEN & P. SALAPATEK (Eds.), Infant perception: From sensation to cognition (Vol. 1). New York: Academic Press, 1975.
CORNELL, E.H. Infants' visual attention to pattern arrangement and orientation. Child Develop., 1975, 46, 229-232.
EIMAS, P.D. Auditory and linguistic processing of cues for place of articulation by infants. Percept. Psychophys., 1974, 16, 513-521.
EIMAS, P.D. Speech perception in early infancy. In L.B. COHEN & P. SALAPATEK (Eds.), Infant perception: From sensation to cognition (Vol. 2). New York: Academic Press, 1975. (a)
EIMAS, P.D. Auditory and phonetic coding of the cues for speech: Discrimination of the [r-l] distinction by young infants. Percept. Psychophys., 1975, 18, 341-347. (b)
EIMAS, P.D., & MILLER, J.L. Discrimination of the information for manner of articulation by young infants. Unpublished manuscript, 1979.
EIMAS, P.D., SIQUELAND, E.R., JUSCZYK, P., & VIGORITO, J. Speech perception in infants. Science, 1971, 171, 303-306.
EIMAS, P.D., & TARTTER, V.C. On the development of speech perception: Mechanisms and analogies. In H.W. REESE & L.P. LIPSITT (Eds.), Advances in child development and behavior (Vol. 13). New York: Academic Press, 1979.
FAGAN, J.F., III. An attention model of infant recognition. Child Develop., 1977, 48, 345-359.
FANTZ, R.L., FAGAN, J.F., III, & MIRANDA, S.B. Early visual selectivity. In L.B. COHEN & P. SALAPATEK (Eds.), Infant perception: From sensation to cognition (Vol. 1). New York: Academic Press, 1975.
FODOR, J.A., GARRETT, M.F., & BRILL, S.L. Pi ka pu: The perception of speech sounds by prelinguistic infants. Percept. Psychophys., 1975, 18, 74-78.
FRY, D., ABRAMSON, A.S., EIMAS, P.D., & LIBERMAN, A.M. The identification and discrimination of synthetic vowels. Lang. Speech, 1962, 5, 171-189.

HOLMBERG, T.L., MORGAN, K.A., & KUHL, P.K. Speech perception in early infancy: Discrimination of fricative consonants. Paper presented at the 94th meeting of the Acoustical Society of America, Miami Beach, 1977.
JUSCZYK, P.W. Infant speech perception: A critical appraisal. To be published in P.D. EIMAS & J.L. MILLER (Eds.), Perspectives on the study of speech. Hillsdale: Erlbaum Associates, 1980.
KUHL, P.K. Speech perception in early infancy: Perceptual constancy for vowel categories. Paper presented at the 92nd meeting of the Acoustical Society of America, San Diego, 1976.
KUHL, P.K. Speech perception in early infancy: Perceptual constancy for the vowel categories /a/ and /ɔ/. Paper presented at the 93rd meeting of the Acoustical Society of America, State College, 1977.
LASKY, R.E., SYRDAL-LASKY, A., & KLEIN, R.E. VOT discrimination by four- to six-and-a-half-month-old infants from Spanish environments. J. exp. child Psychol., 1975, 20, 215-225.
LENNEBERG, E.H. The biological foundations of language. New York: Wiley, 1967.
LIBERMAN, A.M. The grammars of speech and language. Cognit. Psychol., 1970, 1, 301-323.
LIBERMAN, A.M., COOPER, F.S., SHANKWEILER, D.P., & STUDDERT-KENNEDY, M. Perception of the speech code. Psychol. Rev., 1967, 74, 431-461.
LIBERMAN, A.M., HARRIS, K.S., HOFFMAN, H.S., & GRIFFITH, B.C. The discrimination of speech sounds within and across phoneme boundaries. J. exp. Psychol., 1957, 54, 358-368.
LIBERMAN, I.Y., LIBERMAN, A.M., MATTINGLY, I.G., & SHANKWEILER, D. Orthography and the beginning reader. Paper presented at the Cross-Language Conference on Orthography, Reading, and Dyslexia, Bethesda, Maryland, 1978.
McNEILL, D. The acquisition of language: The study of developmental psycholinguistics. New York: Harper & Row, 1970.
MILLER, C., MORSE, P., & DORMAN, M. Cardiac indices of infant speech perception: Orienting and burst discrimination. Quart. J. exp. Psychol., 1977, 29, 533-545.
MILLER, J.L., & EIMAS, P.D. Feature detectors and speech perception: A critical evaluation. To be published in the Proceedings of the Symposium on the Role of Feature Detectors in the Recognition of Pattern and Form, Austin, Texas, 1979.
MORSE, P.A. The discrimination of speech and nonspeech in early infancy. J. exp. child Psychol., 1972, 14, 477-492.
OSMAN, N.B. A study of the perception of compounds in three-month-old infants. Honors thesis, Department of Psychology, Brown University, 1977.
PISONI, D.B. Auditory and phonetic memory codes in the discrimination of consonants and vowels. Percept. Psychophys., 1973, 13, 253-260.
PISONI, D.B. Speech perception. In W.K. ESTES (Ed.), Handbook of learning and cognitive processes (Vol. 6). Hillsdale, NJ: Erlbaum Associates, 1978.
SIQUELAND, E.R., & DELUCIA, C.A. Visual reinforcement of nonnutritive sucking in human infants. Science, 1969, 165, 1144-1146.
SLOBIN, D.I. Cognitive prerequisites for the development of grammar. In C.A. FERGUSON & D.I. SLOBIN (Eds.), Studies of child language development. New York: Holt, Rinehart, & Winston, 1973.
STREETER, L. Language perception of 2-month-old infants shows effects of both innate mechanisms and experience. Nature, 1976, 259, 38-41.
SWOBODA, P.J., KASS, J., MORSE, P.A., & LEAVITT, L.A. Memory factors in infant vowel discrimination of normal and at risk infants. Child Develop., 1978, 49, 332-339.
SWOBODA, P.J., MORSE, P.A., & LEAVITT, L.A. Continuous vowel discrimination in normal and at risk infants. Child Develop., 1976, 47, 459-465.
TREHUB, S.E. Infants' sensitivity to vowel and tonal contrasts. Develop. Psychol., 1973, 9, 91-96.
