TIME-INTENSITY TRADE FOR SPEECH: A TEMPORAL SPEECH-STENGER EFFECT

MICHAEL J.M. RAFFIN, DAVID J. LILLY, and AARON R. THORNTON

University of Iowa, Iowa City

Time-intensity trade for selected spondaically stressed words was investigated using a centering method for interaural time delays of 0.00, 1.00, 2.00, 2.25, 2.50, and 2.75 msec at five levels of presentation: 10-, 25-, 40-, 55-, and 70-dB HL (ANSI, 1969). Lateralization effects increased with level of presentation, with a maximum lateralization effect of between 22 and 30 dB occurring with an interaural time delay of 2.25 msec. Multiple images were perceived by all subjects with an interaural time delay of 2.75 msec and by some subjects with an interaural time delay of 2.50 msec at high levels of presentation. No "ear effect" was observed for any of the listeners. A potential clinical application is discussed for this temporal speech-Stenger effect.

When the same acoustic signal is presented simultaneously to both ears, and when both signals are at equal sensation levels, the subjective percept is a sound image that appears to be inside the head, usually in the median plane. Several factors may affect the apparent position of the sound image within the head. Venturi (1796) probably was the first investigator to postulate that this percept is dependent upon differences in interaural intensity. Shaxby and Gage (1932) showed experimentally that interaural differences in the phase (or time of arrival) of the signal also may affect the apparent position of the sound image within the head. Subsequent studies have corroborated this interaction between interaural time and interaural intensity differences. In general, a time advantage in one ear can compensate for an intensity advantage in the other ear. Studies that deal with these reciprocal compensatory relations often are considered under the heading of "time-intensity trade" experiments (Hafter and Carrier, 1972).
Downloaded From: https://jslhr.pubs.asha.org/ by a University of Texas, Austin User on 03/28/2018 Terms of Use: https://pubs.asha.org/ss/rights_and_permissions.aspx

Three psychophysical procedures have been used to study the stimulus parameters involved in the lateralization produced by interaural time and interaural intensity differences for pure tones, noise, and click stimuli. The matching method requires the subject to use a "pointer" to indicate the apparent location of a test signal presented with certain interaural time and intensity differences (Moushegian and Jeffress, 1959; von Bekesy, 1930). The scaling method utilizes a visible scale set in front of the listener to represent his perceived auditory "space." The listener reports the number on the scale that corresponds most closely with the position of the perceived auditory image. As the image is displaced further from the median plane, the subject must assign an increasingly larger number to its position (Sayers, 1964; Sayers and Lynn, 1968; Sayers and Toole, 1964; Toole and Sayers, 1965). The centering method utilizes the ability of a listener to offset an interaural intensity difference by an opposing interaural time difference to produce a sound image perceived in the center of the head (Blodgett, Wilbanks, and Jeffress, 1956; David, Guttman, and van Bergeijk, 1958, 1959; Deatherage and Hirsh, 1959; Harris, 1960; Shaxby and Gage, 1932). This method yields a traditional measure of binaural interaction: the binaural trading ratio. This ratio is determined by the interaural time difference necessary to cancel a given interaural intensity advantage. Computationally, the binaural trading ratio (in µsec/dB) provides a measure of interaural time divided by an equally effective amount of interaural intensity.

David et al. (1959) point out that experimental results obtained with click stimuli cannot be used to predict pure-tone data. The binaural trading ratio for pure tones ranges from 0.3 µsec/dB to 2.5 µsec/dB (Moushegian and Jeffress, 1959; Shaxby and Gage, 1932; Whitworth and Jeffress, 1961). For clicks presented at similar sensation levels, the binaural trading ratio falls between 100 and 800 µsec/dB (David et al., 1959).

Implicit in the use of the binaural trading ratio is the assumption that functionally identical values for interaural time difference and interaural intensity difference in some way involve an identical neural representation. Consequently, models of binaural interaction typically postulate a transformation of time and intensity into a single neural code (van Bergeijk, 1962). Support for this view comes from the fact that subjects in some experiments reported hearing a fused auditory image that could be centered with appropriate balance of interaural time differences and interaural intensity differences (David et al., 1959; Harris, 1960).
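As a rough illustration of the binaural trading ratio described above, the sketch below divides an interaural time difference (in µsec) by the interaural intensity difference (in dB) that offsets it. The function name `trading_ratio_usec_per_db` and the sample values are hypothetical, chosen only to show the units; they are not data from any of the cited studies.

```python
def trading_ratio_usec_per_db(itd_usec: float, iid_db: float) -> float:
    """Return the binaural trading ratio in microseconds per decibel.

    itd_usec: interaural time difference favoring one ear (microseconds).
    iid_db:   interaural intensity difference favoring the other ear (dB)
              that brings the image back to the median plane.
    """
    if iid_db == 0:
        raise ValueError("interaural intensity difference must be nonzero")
    return itd_usec / iid_db


# Hypothetical values: a 500-usec lead in one ear offset by a
# 10-dB intensity advantage in the other ear.
example_ratio = trading_ratio_usec_per_db(500.0, 10.0)
print(f"{example_ratio:.1f} usec/dB")
```

A small ratio means that a little time difference trades against a lot of intensity; a large ratio means the opposite.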
Journal of Speech and Hearing Research 19 749-766 1976

In exception to this generalization are the data of Banister (1926), Whitworth and Jeffress (1961), and Hafter and Jeffress (1968). When presented with binaural stimuli, their subjects reported hearing two separate images rather than a fused image. One image was amenable to changes in position through interaural intensity differences, whereas the other image was moved principally by changes in interaural time differences. Accordingly, some investigators argue that no combination of interaural time and interaural intensity differences can bring both images to the median plane of the head at the same time (Gilliom and Sorkin, 1972; Hafter and Carrier, 1972). Hafter and Jeffress (1968) propose that subjects (in experiments whose results indicate the perception of a single image) might actually hear two images but, owing to the nature of the instructions, center on only one of the images.

Clinically, only the lateralization produced by an interaural intensity difference has been incorporated into an audiometric test. When Stenger (1900, 1907) presented identical tones to the two ears of listeners with normal hearing, they reported the presence of a sound image in the median plane. When the intensity of the signal presented to one ear was increased, while the intensity of the signal presented to the other ear remained unchanged, the signal appeared to lateralize completely to the ear receiving the more intense signal. Under these conditions, most listeners were unable to discern whether the less intense signal was present or absent. This effect commonly is referred to as the Stenger effect.

The Stenger test is particularly valuable for patients who exhibit unilateral hearing loss with no evidence of organic disease (Politzer, 1909; Priest, 1945). Unfortunately, this procedure is not applicable for all patients with unilateral hearing impairment. Patients with diplacusis binauralis, for example, often are unable to demonstrate the Stenger effect for pure tones. Moreover, the pure-tone Stenger procedure can yield spurious test data with some clinical audiometers that use a separate oscillator for each channel. If these two oscillators are not phase locked, binaural beats may be perceived when the signals (nominally at the same frequency) are presented simultaneously to both ears (Hopkinson, 1972). To circumvent these potential problems, the pure-tone Stenger test has been modified so that speech may be used as the test stimulus (Johnson, Work, and McCoy, 1956; Taylor, 1949).

For most patients, the conventional (intensive) pure-tone or speech Stenger tests can provide evidence regarding the existence of unilateral nonorganic hearing loss (Menzel, 1962). In addition, these tests can be used to estimate the magnitude of the nonorganic component for some patients (Menzel, 1960, 1965). Ventry and Chaiklin (1965), however, have shown that the intensive Stenger procedures are most valuable for the patient who exhibits a large difference between auditory threshold in the better ear and alleged threshold in the poorer ear.

Our review of the literature failed to reveal any time-intensity trade experiment for speech. Published research for noise, for clicks and for pure tones, however, suggests that it should be possible to demonstrate a time-intensity trade for speech signals.
If interaural time differences and interaural intensity differences both affect the apparent location of the speech signal within the head, and if the image is fused, then a clinical application of these phenomena may be realized. Specifically, systematic manipulation of interaural time differences for speech may be used to circumvent some of the problems inherent to the conventional intensive Stenger test.

Any investigation of the time-intensity trade for speech, however, must be designed to evaluate also the presence of a potential "ear effect." For example, if time-intensity trade data for speech signals vary according to which ear is stimulated first, then a straightforward clinical application of this time-intensity trade might be difficult. Kimura (1961a, b) found a right-ear superiority for recall of verbal material. Additional research documents the predominance of right-ear scores over left-ear scores in the recognition of dichotically presented speech for right-handed individuals (Bartz et al., 1967; Berlin et al., 1973; Broadbent and Gregory, 1964; Bryden, 1963; Cooper et al., 1967; Gerber and Goldman, 1971; Lowe-Bell et al., 1970; Satz et al., 1965; Studdert-Kennedy and Shankweiler, 1970). Other studies show a failure of left-handed listeners to demonstrate a preference for either ear, which may relate, in part, to the fact that left-handed individuals may show mixed cerebral dominance for speech (Benton, 1970; Branch, Milner, and Rasmussen, 1964; Cyr, Danitoff, and Berry, 1971; Hecaen and Ajuriguerra, 1964; Knox and Boone, 1970; Penfield and Roberts, 1959).

Because the present experiment is concerned with potential clinical application of the time-intensity trade for speech, lateralization produced by interaural time disparities will be referred to as a "temporal speech-Stenger effect" in the remainder of this report. The primary goals of this experiment were:

1. To determine if a time-intensity trade could be demonstrated for speech signals.
2. To determine whether subjects demonstrate an "ear effect" for the temporal speech-Stenger effect.
3. To determine the magnitude of the interaural time delay that produced maximum lateralization effect while still maintaining the percept of a single fused image.
4. To determine whether the magnitude of the temporal speech-Stenger effect exceeds the uncertainty usually associated with clinical audiometric measurements.
5. To determine whether intrasubject and intersubject variability for the temporal speech-Stenger effect is comparable to corresponding measures obtained for the intensive speech-Stenger effect.

METHOD

Subjects

Five young adults, between 24 and 30 years of age, were selected as subjects for this study. All subjects had equal air- and bone-conduction thresholds of 10 dB hearing level (HL) (ANSI, 1969) or less at octave frequencies from 250 through 4000 Hz. Air-conduction thresholds for these subjects also were 10 dB HL or less for 125, 1500, 3000, 6000, and 8000 Hz. In addition, the speech-reception threshold (SRT) for each subject did not exceed 5 dB HL. No subject had a history of ear disease. Two male subjects and one female subject were right handed. One male and one female were left handed.

Apparatus and Calibration

A speech audiometer (Grason-Stadler, Model 162) was used to determine the SRT for each subject and to control the signal levels in the main experiment. The output of a magnetic tape recorder (Magnecord, Model 1020) was fed to the auxiliary input of the audiometer. All of the data for this experiment were gathered with the subject seated in a double-walled sound-insulated test room (Industrial Acoustics, Model 1204-A). The magnetic tape recorder was calibrated to conform with specifications promulgated by the National Association of Broadcasters (NAB, 1965).

One channel of the speech audiometer was fed directly to a dynamic earphone (Telephonics, TDH-39) mounted in a neoprene cushion (Telephonics, MX-41/AR). This channel was designated as the "lagging" channel in the main experiment. The second channel of the speech audiometer was fed to an impedance-matching transformer (United Transformer, Type LS-33), through the experimenter's attenuator, through the subject's attenuator, through a second impedance-matching transformer (United Transformer, LS-33), and then to a second (matched) earphone (Telephonics, TDH-39) mounted in a cushion (Telephonics, MX-41/AR). All of the attenuators used in this experiment were found to be linear (±0.2 dB) within the range of attenuation values required for the study. A remote monitoring circuit that was part of a method-of-adjustment system was used to determine the setting on the subject's attenuator.

Selection of Speech Materials

Since the speech-Stenger test often is performed with spondaically stressed CID W-1 words (Hirsh et al., 1952), these words also were used to construct the temporal speech-Stenger test tape. Because of published research indicating that these words were not recorded at equal levels (Lilly, Sung, and Franzen, 1966), and that they are not equally intelligible when presented auditorily (Bowling and Elpern, 1961; Curry and Cox, 1966; Beattie, Svihovec, and Edgerton, 1975), the words hothouse, drawbridge, northwest, inkwell, farewell, and duckpond were deleted from the master list. The remaining 30 words were used in the recording of our experimental speech-Stenger test tape.

Recording of Speech Materials

Initial Master Tape. A male talker with general American speech was seated in an anechoic chamber as he read the 30 selected W-1 words. A microphone (Bruel and Kjaer, Type 4131) was affixed on a stand 20 cm from the talker's mouth. The output of this microphone was amplified and fed to the input of a 1-dB step attenuator. This attenuator allowed the words to be recorded at a level near 0 VU while the talker maintained a comfortable level of speaking. From the attenuator, the signal was fed to the auxiliary input of the tape recorder. A second VU meter was placed in front of the speaker so that he could monitor himself.

Experimental Test Tape. Figure 1 is a block diagram of the instrumentation used for recording the temporal speech-Stenger test tape. A tape deck (Sony, Model TC-650) was used to play back the initial master tape. Although the reproduced equalization of this tape deck was calibrated optimally, it met the NAB (1965) frequency-response criteria only for the range from 100 through 7500 Hz. The output of the tape deck was fed to an attenuator (Hewlett-Packard, Model 350-D), a power amplifier (Langevin, Model AM-50), and then to a loudspeaker (KLH, Model 32) located within the anechoic chamber. Two microphones (Bruel and Kjaer, Type 4131) with their respective power supplies (Bruel and Kjaer, Type 2801) were used to transduce the acoustic signals from the loudspeaker. The output from each microphone power supply was led out of the chamber through a 0.1-dB step (precision) attenuator to one of the two channels of the tape recorder. These attenuators were used to set the VU meters for each channel to 0 VU for each carrier phrase and for each syllable of each word. The materials thus recorded showed fluctuations no greater than • VU.

[Figure 1 block diagram. Labeled elements: tape deck; attenuator; "leading channel" microphone and microphone power supply; "lagging channel" microphone and microphone power supply; two precision attenuators; tape recorder. Distance between microphones (d) = c × t; therefore d = (34.45 cm/msec)(2.25 msec) = 77.5 cm.]

FIGURE 1. Block diagram of instrumentation used for recording the temporal speech-Stenger test tape. Computational example shows approximate distance (d) between the two microphones (77.5 cm) for an interchannel time delay (t) of 2.25 msec. Abbreviation (c) denotes speed of sound in air.

Deatherage (1966) found that multiple images are perceived when the interaural time delay exceeds 2 msec for nonspeech stimuli. In a pilot study, we found that multiple images were not perceived for speech stimuli until the interaural time delay was 3 msec. In addition, we found that interaural time delays of between 2 and 3 msec produced the maximum lateralization effect. Accordingly, six interaural time delays were chosen for the experiment (0.00, 1.00, 2.00, 2.25, 2.50, and 2.75 msec).

For the 0-msec time-delay list, each of the two microphones was placed exactly the same distance from the loudspeaker. A 1000-Hz calibration tone was played through the loudspeaker, and the output of the two microphones was monitored on a dual-beam oscilloscope (Tektronix, Type 502A). This calibration tone also was recorded at 0 VU for each channel. For each subsequent list, as the interchannel time delay increased, one of the microphones was moved further away from the loudspeaker. This microphone (the lower one in Figure 1) and its associated channel on the tape recorder were defined as the "lagging channel." The position of the upper microphone in Figure 1 remained fixed. This microphone and its associated channel were defined as the "leading channel." The distance between the two microphones was computed for each time delay. An example of this computation is provided in Figure 1 for the 2.25-msec interchannel time delay. When each experimental test list was recorded, the relative interchannel phase of the 1000-Hz calibration tone always was used to adjust precisely each interchannel time delay.
Identification phrases for each time-delay list later were spliced to the beginning of each list. The calibration tone recorded at 0 VU was spliced to the beginning of the experimental test tape.
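The microphone-spacing computation in Figure 1 (d = c × t) can be repeated for all six delays with a short sketch. The speed of sound (34.45 cm/msec) and the 1000-Hz calibration tone come from the text; the function names are hypothetical. The phase calculation is an assumption: it uses the standard relation 360° × f × t, taken modulo 360°, to show what interchannel phase lag each delay would imply for the calibration tone. The paper does not report the phase readings themselves.

```python
SPEED_OF_SOUND_CM_PER_MSEC = 34.45  # value of c used in Figure 1
CAL_TONE_HZ = 1000.0                # calibration-tone frequency from the text
DELAYS_MSEC = [0.00, 1.00, 2.00, 2.25, 2.50, 2.75]


def mic_separation_cm(delay_msec, c=SPEED_OF_SOUND_CM_PER_MSEC):
    """Extra path length (cm) that produces the given interchannel delay."""
    return c * delay_msec


def expected_phase_lag_deg(delay_msec, freq_hz=CAL_TONE_HZ):
    """Phase lag (degrees, 0 to <360) of a steady tone delayed by delay_msec.

    Assumption: simple 360 * f * t relation, wrapped to one cycle.
    """
    cycles = freq_hz * delay_msec / 1000.0
    return round(360.0 * cycles, 6) % 360.0


for t in DELAYS_MSEC:
    d = mic_separation_cm(t)
    phase = expected_phase_lag_deg(t)
    print(f"{t:.2f} msec: d = {d:6.1f} cm, phase lag = {phase:5.1f} deg")
```

Because phase wraps every full cycle (1 msec at 1000 Hz), the phase reading alone cannot distinguish 1.00 from 2.00 msec; under this reading, the computed distance sets the coarse position and the phase check refines it.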

Procedure

A method of adjustment was used to gather data for the experiment. Each subject was instructed to adjust the image of a speech stimulus until it was perceived to be in the center of his head. These adjustments were made by controlling the level of the signal in the leading channel. In addition, each subject was instructed to report immediately any confusion in image localization or the perception of multiple images. After each trial, the subject reset his attenuator knob to some random position chosen by the experimenter. The settings of the experimenter's attenuators also were chosen at random within a range of values that would allow for median-plane localization (within the subject's attenuator range). Resetting the subject's attenuator after each trial also helped to minimize positional references that the subject might have established for the knob.

The channel in which the stimulus lagged was chosen as the reference channel. Each of the time delays (0.00, 1.00, 2.00, 2.25, 2.50, and 2.75 msec) was presented at 10, 25, 40, 55, and 70 dB HL. For each of the 30 resulting conditions, each subject was given five trials for the right ear leading and five trials for the left ear leading. For all trials, the same earphone was used with the leading channel. Each subject, therefore, completed the experiment after 300 judgments of median-plane localization. The choices of leading ear, time delay, and level of presentation in the reference channel were each selected at random for each subject.
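The trial count follows from the design: 6 delays × 5 levels × 2 leading ears × 5 trials = 300 judgments per subject. The sketch below enumerates that design and shuffles it. The function name `build_trial_order` and the seed are hypothetical, and the fully interleaved shuffle is an assumption made for illustration; the text says only that leading ear, delay, and level were selected at random, and does not say whether trials were blocked.

```python
import itertools
import random

DELAYS_MSEC = [0.00, 1.00, 2.00, 2.25, 2.50, 2.75]
LEVELS_DB_HL = [10, 25, 40, 55, 70]
LEADING_EARS = ["right", "left"]
TRIALS_PER_CELL = 5


def build_trial_order(seed):
    """Return a shuffled list of (delay, level, leading_ear) trials.

    Assumption: every trial is interleaved at random; a blocked order
    would be an equally valid reading of the procedure.
    """
    cells = itertools.product(DELAYS_MSEC, LEVELS_DB_HL, LEADING_EARS)
    trials = [cell for cell in cells for _ in range(TRIALS_PER_CELL)]
    random.Random(seed).shuffle(trials)
    return trials


trial_order = build_trial_order(seed=1)
print(len(trial_order))  # 6 * 5 * 2 * 5 judgments per subject
```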

RESULTS

Table 1 provides a summary of the group data. Means, standard deviations, and ranges are listed for all experimental conditions.

TABLE 1. Summary of group data, obtained on five normal-hearing listeners.

Interaural Time   Level of Presentation   Mean     Standard         Range
Delay (msec)      (dB HL)                 (dB)     Deviation (dB)   (dB)

0.00              10                       1.4     1.31             -0.1 to 3.1
0.00              25                       1.0     0.69              0.1 to 1.6
0.00              40                       1.2     0.53              0.5 to 1.9
0.00              55                       1.2     0.84              0.2 to 2.1
0.00              70                       1.2     1.85             -0.2 to 4.3
1.00              10                       7.0     2.02              4.2 to 9.6
1.00              25                      14.0     2.42             12.2 to 18.2
1.00              40                      17.3     4.01             14.6 to 24.0
1.00              55                      20.2     3.70             15.7 to 25.7
1.00              70                      20.4     5.09             16.4 to 29.2
2.00              10                       9.7     0.61              9.1 to 10.4
2.00              25                      18.96    2.83             14.7 to 22.4
2.00              40                      21.5     2.54             18.6 to 25.2
2.00              55                      23.7     1.45             22.3 to 26.0
2.00              70                      26.9     5.79             21.8 to 36.8
2.25              10                       9.94    1.27              8.8 to 12.1
2.25              25                      21.5     1.89             18.4 to 23.4
2.25              40                      25.6     1.48             23.3 to 27.0
2.25              55                      27.3     0.77             26.8 to 28.5
2.25              70                      29.9     5.16             26.3 to 38.9
2.50              10                       6.6     1.38              4.4 to 8.2
2.50              25                      15.9     3.26             12.3 to 19.7
2.50              40                      18.4     3.30             15.4 to 22.5
2.50              55                      23.2     1.81             20.1 to 24.6
2.50              70                      28.7     6.02             23.0 to 36.7
2.75              10                       5.58    1.74              3.7 to 8.2
2.75              25                      11.6     2.21             10.0 to 15.5
2.75              40                      15.5     1.89             12.7 to 18.0
2.75              55                      18.2     1.30             16.3 to 19.9
2.75              70                      24.1     7.97             15.8 to 37.3

To determine the interactions of time delay, level of presentation, and ear effect, two four-way (time delay × level of presentation × ear × subject) analyses of variance (ANOVAs) (Lindquist, 1956; Winer, 1962) were computed using the Statistical Analysis System (Service, 1972). A 0.01 level of confidence for these analyses was chosen prior to the collection of data. Because the data from published research were inconclusive with regard to the left-ear dominance for left-handed individuals, it was decided that the first ANOVA would be computed with the ear variable defined as the right versus the left ear for all listeners (df = 20,80; F = 0.709; p < 0.8062). For the second ANOVA, the ear variable was defined as the dominant versus the nondominant ear based on the individual's handedness (df = 20,80; F = 1.476; p < 0.1136). The results of the ANOVA for simple effects also failed to demonstrate the presence of an ear effect (df = 1,4; F = 0.618; p < 0.5208) for the task of median-plane localization using speech signals.

The results of three-way analyses of variance (time delay × level of presentation × subject) revealed a time delay by level of presentation interaction. As a result of this significant interaction (df = 20,80; F = 10.612; p < 0.0001), additional ANOVAs were computed (Myers, 1973). These analyses revealed that the introduction of a time delay produces a significant effect (p < 0.0002) at each level of presentation, and that there was no effect of level of presentation with a time delay of 0.00 msec. For all other interaural time delays, a significant effect was noted (p < 0.0004).

The group data also are plotted in Figures 2 and 3. Figure 2 shows the interaural intensity difference (IID, in dB) required to obtain an image localized in the median plane as a function of level of presentation (in dB HL) for each interaural time delay (ITD, in msec). The effect is negligible at a time delay of 0.00 msec regardless of level of presentation.
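The group means in Table 1 can be scanned for the delay that gives the largest mean effect at each level of presentation. The sketch below is a minimal tabulation with the means transcribed from Table 1; the dictionary layout and the function name `peak_delay_by_level` are hypothetical, and the comparison is purely descriptive (it is not a substitute for the analyses of variance reported above).

```python
DELAYS_MSEC = [0.00, 1.00, 2.00, 2.25, 2.50, 2.75]
LEVELS_DB_HL = [10, 25, 40, 55, 70]

# Mean interaural intensity difference (dB) from Table 1,
# keyed by delay; each list follows the order of LEVELS_DB_HL.
MEAN_IID_DB = {
    0.00: [1.4, 1.0, 1.2, 1.2, 1.2],
    1.00: [7.0, 14.0, 17.3, 20.2, 20.4],
    2.00: [9.7, 18.96, 21.5, 23.7, 26.9],
    2.25: [9.94, 21.5, 25.6, 27.3, 29.9],
    2.50: [6.6, 15.9, 18.4, 23.2, 28.7],
    2.75: [5.58, 11.6, 15.5, 18.2, 24.1],
}


def peak_delay_by_level():
    """Map each level to (delay with the largest mean effect, that mean)."""
    peaks = {}
    for i, level in enumerate(LEVELS_DB_HL):
        best = max(DELAYS_MSEC, key=lambda d: MEAN_IID_DB[d][i])
        peaks[level] = (best, MEAN_IID_DB[best][i])
    return peaks


for level, (delay, mean_db) in peak_delay_by_level().items():
    print(f"{level:2d} dB HL: peak at {delay:.2f} msec ({mean_db} dB)")
```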
For all other time delays, the magnitude of the effect increases with level of presentation. At 10 dB HL, the magnitude of the ITD effect is between 5 and 10 dB. It was not expected that this magnitude would exceed 10 dB at 10 dB HL, since an effect greater than 10 dB would have required the signal in the leading channel to be attenuated to some infrathreshold level. Although the ITD effect tends to increase with level of presentation, the effect asymptotes at 55 dB HL with an interaural time delay of 1.00 msec.

Figure 3 shows the magnitude of the effect (the interaural intensity difference required to obtain an image localized in the median plane) as a function of interaural time delay (in msec) for each level of presentation (in dB HL). It should be noted that for all levels of presentation, a maximum effect occurs at 2.25-msec time delay. At this interaural time delay, a mean effect of about 30 dB was obtained at 70 dB HL in the reference ear. The magnitude of the effect increases slowly from 0.00 to 2.25 msec interaural time delay and then decreases rapidly with larger interaural time delays. This pattern was noted for all levels of presentation. To determine whether the maximum effect (for an interaural time delay of 2.25 msec at each level of presentation) was significantly greater than the

