ACOUSTIC REDUNDANCY AND THE PERCEPTION OF TIME-COMPRESSED SPEECH

ARTHUR WINGFIELD

Brandeis University, Waltham, Massachusetts

An experiment is reported in which time-compressed sentences were heard spoken either in normal intonation or in intonation patterns that conflicted with their underlying syntactic structure. Although there was an overall decrement in intelligibility with increasing compression, sentences heard in normal intonation were significantly better able to withstand the debilitating effects of compression than those with anomalous intonation. An error analysis of subject responses suggests that intonation normally operates to supply supplemental cues for determining syntactic structure as a step in the perceptual coding of heard speech.

In the everyday perception of speech, information is available from a variety of sources beyond that contained in the speech signal itself. Such additional sources include knowledge of the context: both linguistic context from preceding utterances, and situational context from environmental circumstances. Both of these can facilitate perception as they determine stimulus probability for the listener. Another common source is visual information accompanying the speech in the form of facial expression, lip movements, gesture, and body posture (Ewertsen and Birk Nielsen, 1971; Birk Nielsen, 1972; Birk Nielsen and Lieth, 1973). To the extent that these and other additional sources of information aid or supplement the speech signal itself, they can be said to add useful redundancy to the perceptual act.

There is another source of redundancy in speech perception that is easily overlooked since it forms an integral part of the speech signal itself. This relates to the intonation pattern of the heard speech. Although the term intonation must include changes in loudness, stress, and melodic pattern, acoustic pauses in speech have received the most research attention. As much as 40 to 50% of ordinary speaking time is occupied by pauses that are systematically related to the content of the speech message (Goldman-Eisler, 1968).
Although characteristic of speech production, timing patterns are also known to represent critical perceptual features from the level of within-morpheme units to rhythmic patterns across entire phrases (Huggins, 1972). On the sentence level, pauses and stress pattern can both emphasize important words, in the form of "oral underlining," and can distinguish intended meaning in ambiguous structures. One can, for example, change the perception of "lighthouse keeper" into "light housekeeper" by simply altering the time interval between the constituent morphemes (Lieberman, 1967). However, in ordinary conversation hesitation pauses and other acoustic factors may have an even more important role. Acoustic pauses may also mark perceptual processing "units" in speech and provide necessary time for their analysis (Dittman and Llewellyn, 1967; Aaronson, Markowitz, and Shapiro, 1971). Furthermore, the points they mark may relate to encoding and decoding of syntactic structure by indicating major linguistic clauses (Wilkes and Kennedy, 1969; Wingfield and Klein, 1971).

Studies of acoustic pauses alone, as they relate to syntactic analysis in speech perception, are not without ambiguity (Ruder and Jensen, 1972). This problem may reflect the likelihood that pauses represent only the most easily measurable component of the full complex of acoustic features associated with intonational change.

Like other potential sources of supplementary information, the role of intonation pattern in speech perception can easily go unnoticed under ideal listening conditions. Conditions are not always ideal, however: the speech may be poorly articulated, heard in a noisy background, or simply spoken with extreme rapidity. One way to examine this question is to study the role of intonation under controlled conditions where a generally "degraded" signal is used. Artificially accelerated or time-compressed speech has a special advantage in this regard. It can increase word rates up to and beyond tolerable limits, while leaving the relative temporal pattern of speech and silent periods essentially intact. The increasing use of time-compressed speech in research arises from the availability of new and efficient compression methods.

Downloaded From: http://jslhr.pubs.asha.org/ by a Northwestern University User on 09/15/2016 Terms of Use: http://pubs.asha.org/ss/rights_and_permissions.aspx
One such technique is the so-called "sampling method" of speech compression, which deletes small segments of the recorded message at regular intervals and then abuts the remaining segments in time. When the deleted segments are small (less than 30 msec) there is little chance of discarding entire critical features. The remaining time-abutted segments are then played back at normal speed, resulting in speech reproduced in less than normal time, and without the distortion of vocal pitch or quality that would, for example, accompany tape recorder playback at faster than normal speed. The degree of compression is varied by the frequency with which the tape segments are deleted. Although highly dependent on message complexity, good intelligibility for speech reproduced in as little as 40 to 50% of normal playing time can be easily demonstrated (Foulke, 1971).

The materials employed in the present study were sentences specially constructed through a tape-splicing procedure, such that their intonation patterns could be made either to agree or conflict with their underlying syntactic structure (Garrett, Bever, and Fodor, 1966; Wingfield and Klein, 1971). Of primary interest are, first, the degree to which the supplementary information supplied by appropriate intonation may overcome the debilitating effects of time compression, and second, the specific role this information may play in the perceptual decoding of the heard speech.

WINGFIELD: Redundancy in Speech Perception
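The sampling method described above can be illustrated with a minimal digital sketch. It is not the electro-mechanical procedure used in the study: the function name, the representation of the recording as a list of samples, and the sample rate are all assumptions made for illustration. A fixed-length segment is discarded in every cycle, and the degree of compression is set by how much signal is kept between deletions, which is equivalent to varying the frequency of deletion.

```python
def compress_by_sampling(samples, sample_rate_hz, ratio, discard_ms=20.0):
    """Shorten `samples` to roughly `ratio` of their original duration.

    Hypothetical digital analogue of the sampling method: in every cycle a
    run of samples is kept and the next `discard_ms` worth is deleted; the
    kept runs are then abutted in time.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    discard_len = round(sample_rate_hz * discard_ms / 1000.0)
    # keep / (keep + discard) = ratio  =>  keep = discard * ratio / (1 - ratio)
    keep_len = round(discard_len * ratio / (1.0 - ratio))
    if discard_len < 1 or keep_len < 1:
        raise ValueError("segments are too short for this sample rate")
    cycle = keep_len + discard_len
    compressed = []
    for start in range(0, len(samples), cycle):
        # Keep the first part of each cycle; the remainder is dropped.
        compressed.extend(samples[start:start + keep_len])
    return compressed


if __name__ == "__main__":
    one_second = list(range(8000))  # stand-in for 1 s of audio at 8 kHz
    for ratio in (0.8, 0.6, 0.4):
        out = compress_by_sampling(one_second, 8000, ratio)
        print(ratio, len(out) / len(one_second))
```

Because the kept and discarded lengths are rounded to whole samples, the achieved ratio is only approximately the requested one. Playing the result back at the original sample rate leaves pitch unchanged, since no segment is sped up.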

METHOD

Stimulus Sentences

The stimuli consisted of 10-word English sentences heard either in normal intonation or in an intonation pattern appropriate to a sentence with a different syntactic structure. This variation was accomplished by recording pairs of sentences that shared an identical five-word sequence, but for which the surrounding linguistic context put the major syntactic boundary at different points within the sequence. In the example below, the identical five-word sequence is "influence voting machines were installed," and the major syntactic boundary is indicated by a comma.

(1) To avoid any attempts to influence voting, machines were installed.
(2) Due to our new mayor's influence, voting machines were installed.

The sentences were recorded by a female speaker of American English in a normal fluent manner. From each pair, two new sentences of anomalous intonation were then constructed by cross splicing the five-word segments of each sentence into the context of the other. This cross splicing produces recordings of an additional pair of sentences with the acoustic pauses and associated intonational cues no longer coinciding with the major syntactic boundary as defined by the words in the sentences. In the examples below, the commas indicate the points of acoustic marking as taken from the matched sentences from which they were derived.

(1) To avoid any attempts to influence, voting machines were installed.
(2) Due to our new mayor's influence voting, machines were installed.

A total of 20 sentence pairs spoken in normal intonation, plus another 20 pairs in anomalous intonation, were constructed in this way. All sentences consisted of a dependent clause plus an independent clause, or an independent plus a dependent clause. This structure was chosen because sentences of this type are especially unambiguous in both syntax and acoustic intonation when spoken normally.
Locations of the major syntactic boundary were varied from the second to the seventh word in the sentences to minimize any possible systematic influence of their serial position. Splices were silent and unnoticed in playback, such that the anomalous sentences sounded as clear and as naturally fluent as those heard in normal intonation.
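The cross-splicing logic can be sketched symbolically, using the example sentence pair given above. Text strings stand in for the tape recordings and a comma stands in for the acoustic marking, so this is only an illustration of how the segments are exchanged, not of the tape-splicing procedure itself. The function and variable names are hypothetical.

```python
def cross_splice(context_a, segment_a, context_b, segment_b):
    """Exchange the shared five-word segments between two recorded sentences.

    Each segment carries its own acoustic marking (shown as a comma), so the
    spliced sentences keep the words but inherit the other sentence's marking.
    """
    return f"{context_a} {segment_b}", f"{context_b} {segment_a}"


# Normal-intonation recordings, split into context plus shared segment.
context_1 = "To avoid any attempts to"
segment_1 = "influence voting, machines were installed."
context_2 = "Due to our new mayor's"
segment_2 = "influence, voting machines were installed."

anomalous_1, anomalous_2 = cross_splice(context_1, segment_1, context_2, segment_2)
print(anomalous_1)
print(anomalous_2)
```

The two printed strings correspond to the anomalous-intonation examples (1) and (2): the word sequence of each sentence is unchanged, but the marked boundary has moved.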

Journal of Speech and Hearing Research, 18, 96-104, 1975

Speech Compression

The speech materials were recorded at an original fast-normal rate of 207 words per minute and then compressed using the previously described sampling method on an electro-mechanical compressor of the Fairbanks type. Sampling was on a periodic basis with deleted segments of 20 msec occurring regularly both between and within words. The entire set of sentences was compressed to 80, 70, 60, 50, and 40% of normal playing time, corresponding to speaking rates of approximately 259, 296, 345, 414, and 518 words per minute.
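The reported speaking rates follow from dividing the original rate by the fraction of normal playing time. The short check below reproduces that arithmetic. The function name is hypothetical, and rounding halves upward is an assumption that happens to match the approximate figures given in the text.

```python
import math

ORIGINAL_WPM = 207                       # original recording rate from the text
PERCENT_OF_NORMAL_TIME = [80, 70, 60, 50, 40]


def compressed_wpm(original_wpm, percent_of_normal_time):
    """Speaking rate after compression to a percentage of normal playing time."""
    exact = original_wpm * 100 / percent_of_normal_time
    return math.floor(exact + 0.5)       # round to nearest, halves upward (assumed)


rates = [compressed_wpm(ORIGINAL_WPM, p) for p in PERCENT_OF_NORMAL_TIME]
for percent, wpm in zip(PERCENT_OF_NORMAL_TIME, rates):
    print(f"{percent}% of normal time: about {wpm} words per minute")
```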

Listening Tasks and Subjects

Subjects were told that they would be hearing a series of short samples of English speech recorded at various speaking rates and were given a brief explanation of time compression. The task was presented as a test of perception of such materials. The instructions were to listen carefully to each passage and, when it was finished, to write down as much of it as could be remembered. No specific mention was made either of the fact that all the speech materials would be 10-word sentences or that some of them would be heard with anomalous intonation.

Each subject heard a total of 20 test sentences, four in each of the five compression ratios discussed. The order of presentation of compression ratios was systematically varied among subjects, as were the particular sentences heard at each compression ratio. Half the sentences at each ratio were normal, and half were in anomalous intonation. No subject heard the same sentence more than once, but, across all subjects, all sentences were heard in both intonation patterns and at all compression ratios.

Materials were presented on a high-quality tape recorder and monitored over binaural earphones at an average sound pressure level of 70 dB (re 0.0002 dyne/cm²). The main experiment was preceded by 15 practice sentences to familiarize the subjects with the testing procedures and the general nature of the speech materials.

Ten university undergraduates served as subjects. All reported normal hearing, and all spoke American English as their first language.

RESULTS

Scoring

Credit was given for all words reported correctly, regardless of order. Additions to words (for example, reporting quick as quickly) did not lose credit, but failure to give an ending when it was present (for example, reporting quickly as quick) lost credit for the entire word. Intelligibility was taken as the mean percentage of words in each sentence reported correctly.
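One way to make the scoring rule concrete is the sketch below. It is an interpretation, not the procedure actually used by the scorers: the function names are hypothetical, and treating an "addition" as any reported word that begins with the target word is a simplifying assumption (it would, for instance, also accept "together" for "to").

```python
import re


def tokenize(text):
    """Lowercase a written response and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def percent_words_correct(target, response):
    """Percentage of target words credited, ignoring word order.

    A target word is credited if the response contains it exactly, or
    (assumed reading of the rule) a longer word that begins with it, such
    as "quickly" for "quick". A shortened word earns no credit. Each
    response word can be credited only once.
    """
    targets = tokenize(target)
    if not targets:
        raise ValueError("target sentence contains no words")
    remaining = tokenize(response)
    correct = 0
    for word in targets:
        exact = [r for r in remaining if r == word]
        extended = [r for r in remaining if r.startswith(word)]
        candidates = exact or extended   # prefer an exact match
        if candidates:
            remaining.remove(candidates[0])
            correct += 1
    return 100.0 * correct / len(targets)


print(percent_words_correct("quick", "quickly"))
print(percent_words_correct("quickly", "quick"))
```

Mean intelligibility for a condition would then be the average of these per-sentence percentages.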

Intelligibility

FIGURE 1. Intelligibility of sentences heard in normal and in anomalous intonation as a function of degree of time compression of the speech signal. Intelligibility at normal speech rates is shown for comparison. (Plot not reproduced. Horizontal axis: Compression Ratio, from Normal Rate through 0.80 and 0.60 to 0.40. Curves: Normal Intonation and Anomalous Intonation.)

The main results are summarized in Figure 1, which shows mean intelligibility scores as a function of degree of compression for sentences heard in normal and anomalous intonation. Analysis of variance confirmed the significance of both the progressive decrement in intelligibility with time compression (F = 76.36; df = 5, 45; p
