On the role of spatial and temporal cues in the perception of the pitch of complex tones

Kengo Ohgushi

NHK Broadcasting Science Research Laboratories, 1-10-11 Kinuta, Setagaya, Tokyo 157, Japan

(Received 9 September 1976; revised 27 March 1978)

This paper describes a new approach to pitch perception. It focuses attention on the slight difference between the pitch of a complex tone and the pitch of a pure tone with the same (fundamental) frequency. This approach is based on the assumption that pitch perception is based on both spatial and temporal cue analysis. In this study, the values provided by the temporal cue are calculated from physiological data from the auditory nerve fibers. The possible ratios of the pitch of complex tones relative to the pitch of pure tones at various frequencies are predicted. Psychophysical experimental results strongly support this prediction. In addition, another experiment suggests that the above psychophysical effect is not based upon a mutual masking effect in the spatial domain.

PACS numbers: 43.66.Hg, 43.66.Ba, 43.66.Mk

I. INTRODUCTION

In normal listening, a complex tone consisting of many harmonics, such as the tone of a musical instrument, is always perceived as a single tone with one definite pitch. It was thought in the past that the pitch of a complex tone was equal to the pitch of a pure tone whose frequency is equal to the fundamental frequency of the complex tone.

However, Walliser (1969) published a new experimental result. In the experiment, complex tones consisted of short pulses with fundamental frequency between 230 and 380 Hz which were filtered through an octave band from 1.4 to 2.8 kHz. The pitch of the complex tones was slightly but systematically lower than the pitch of the pure tone with the same frequency. Furthermore, Terhardt (1971) carried out similar and extensive experiments with several kinds of complex tones. The results revealed a small but systematic pitch discrepancy in the fundamental frequency range below 1 kHz regardless of whether or not the fundamental frequency was present.

The question of whether the pitch of complex tones is based on spatial cue analysis or temporal cue analysis has been discussed by many scientists. The most recent theories leading toward an explanation of the pitch of complex tones were presented by Terhardt (1974), Wightman (1973), and Goldstein (1973). These theories virtually ignored temporal cue analysis and concerned themselves only with pitch mechanisms in the spatial domain. However, the possibility of at least partial use of temporal cue analysis cannot be excluded. For example, a study of frequency difference limens for a short-duration pure tone suggests that a temporal mechanism operates for frequencies below 5 kHz (Moore, 1973). Accordingly, I propose to explain the above-mentioned psychophysical effect on the basis of both temporal and spatial cue analyses.

THE BASIC IDEA OF PITCH MATCHING

The physical acoustic stimulus is transformed into neural activity in the auditory nervous system. Neural activity is thought to carry two types of information: one by the spatial distribution (or tonotopic organization) of firing fibers and the other by the temporal distribution of neural impulses.

In this study, the following two items are thought to lead toward pitch matching between two successive tones:

(1) The spatial positions of neural excitation maxima for the first tone coincide with or are as close as possible to those for the second tone.

(2) Time intervals between successive neural impulses (temporal distribution of neural impulses) for the first tone coincide with or are as close as possible to those for the second tone.

For the sake of convenience, the subjective pitch value P will be introduced henceforth. It is assumed that the P of a (complex) tone is determined by the values given by the temporal cue and the spatial cue. The subjective pitch value P may then be expressed as

\[ P = W_S \cdot V_s + W_T \cdot V_t, \tag{1} \]

where Vs and Vt mean the value given by the spatial cue and that given by the temporal cue, respectively. WS and WT represent the relative weights given the spatial and the temporal cues, and

\[ W_S + W_T = 1. \tag{2} \]

The values of WS and WT are considered to be influenced by the number of harmonic components, the intensities of the individual harmonic components, and the fundamental frequency.

a) A portion of this paper was published in J. Acoust. Soc. Jpn. 32(11) (1976) in Japanese.

J. Acoust. Soc. Am. 64(3), Sept. 1978; 0001-4966/78/6403-0764$00.80; © 1978 Acoustical Society of America

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 128.189.93.16 On: Wed, 10 Dec 2014 01:21:13

II. PREDICTION OF PITCH

A. Calculation method for pitch discrepancy between pure tones and complex tones

The calculation of pitch discrepancy between pure tones and complex tones is made as follows: Let η be the ratio of a pure-tone frequency to the fundamental frequency of a complex tone when the pure-tone frequency is adjusted so that it and the complex tone are equal in pitch. Since a pure tone and a complex tone which give equal pitch are near in (fundamental) frequency, η can be considered as nearly equal to the value of Pc (pitch value of complex tone) divided by Pp (pitch value of pure tone). Then, η can be expressed as

\[ \eta = \frac{P_c}{P_p} = \frac{W_{Sc} \cdot V_s + W_{Tc} \cdot V_t}{W_{Sp} \cdot V_s + W_{Tp} \cdot V_t}, \tag{3} \]

where suffixes p and c represent a pure tone and a complex tone. From Eq. (2), we have

\[ W_{Sp} = 1 - W_{Tp}, \qquad W_{Sc} = 1 - W_{Tc}. \tag{4} \]

Substituting Eq. (4) into Eq. (3), we have

\[ \eta = \frac{W_{Tc} \cdot (V_t - V_s) + V_s}{W_{Tp} \cdot (V_t - V_s) + V_s}. \tag{5} \]

If the values of WTp, WTc, Vs, and Vt are given, η can be calculated from Eq. (5) and can be compared with the experimental results of Terhardt (1971).
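As a numerical illustration of Eqs. (3) and (5), the following Python sketch evaluates the pitch ratio from the two temporal weights and the two cue values. The function names and the example values are hypothetical and chosen only for illustration; they are not taken from the paper. The second function evaluates Eq. (3) directly with WS = 1 - WT, so that the agreement of the two forms can be checked.

```python
def pitch_ratio_eq5(wt_c, wt_p, v_s, v_t):
    """Pitch ratio from Eq. (5).

    wt_c, wt_p: temporal-cue weights for the complex and the pure tone.
    v_s, v_t:   values given by the spatial and the temporal cue.
    """
    return (wt_c * (v_t - v_s) + v_s) / (wt_p * (v_t - v_s) + v_s)


def pitch_ratio_eq3(wt_c, wt_p, v_s, v_t):
    """Pitch ratio from Eq. (3), using WS = 1 - WT from Eq. (2)."""
    p_complex = (1.0 - wt_c) * v_s + wt_c * v_t  # Eq. (1) for the complex tone
    p_pure = (1.0 - wt_p) * v_s + wt_p * v_t     # Eq. (1) for the pure tone
    return p_complex / p_pure


# Hypothetical example values (assumptions, not data from the paper):
# the temporal cue value lies 2% below the spatial cue value, and the
# complex tone relies more heavily on the temporal cue than the pure tone.
example = pitch_ratio_eq5(wt_c=0.6, wt_p=0.3, v_s=1.0, v_t=0.98)
print(f"pitch ratio = {example:.4f}")
```

With Vt slightly below Vs and a larger temporal weight for the complex tone than for the pure tone, the sketch yields a ratio slightly below 1. When the two weights are equal, the ratio is exactly 1 regardless of Vt.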

FIG. 1. Interspike interval histograms for a single auditory nerve fiber of a squirrel monkey when a tone of (a) 100 Hz or (b) 1700 Hz was presented. Abscissa: time (ms). Bin size is 0.2 ms in the top panel and 0.1 ms in the bottom panel. (From Rose et al., 1967, 1968.)

B. Physiological correlates of spatial cue and temporal cue

Consider, first, the manner in which spatial and temporal cues are transmitted in the auditory nervous system. Auditory neurons are known to be arranged in order of their characteristic frequency (CF, the frequency to which the neuron is most sensitive) on a neuronal layer (Tsuchitani and Boudreau, 1966). The spatial cue seems to be transmitted as a spatial response pattern on a neuronal layer. The value given by the spatial cue may be regarded as the CF of the neuron at which the spatial excitation pattern becomes maximal. In order to transmit the cue of the place of the maximal response, the spatial response pattern on a neuronal layer is required. Therefore, not only the neurons at which the spatial response pattern becomes maximal, but also the surrounding neurons on a neuronal layer may transmit the spatial cue. On the other hand, it is well known that auditory nerve fibers (primary neurons) discharge impulses corresponding to peaks of the stimulating waveform for low-frequency tones (Rose et al., 1967). Figure 1 shows interspike interval histograms of a single auditory nerve fiber of the squirrel monkey in response to a sinusoidal tone stimulus, published by Rose et al. (1967, 1968). The abscissa represents the interspike interval with a bin of 0.2 ms (a) and 0.1 ms (b), and the ordinate indicates the number of intervals in each bin. The histograms show that the interspike interval distribution is polymodal and that each modal peak corresponds to the period of the tone waveform or to one of its integral multiples.

Up to now, some theories (Ritsma, 1962) of pitch perception have suggested that pitch corresponds to the time interval between successive pronounced positive peaks in the temporal fine structure of the displacement waveform of the basilar membrane. Since the time interval between successive peaks is preserved in the response of an auditory nerve fiber as shown in Fig. 1, interspike intervals grouped around the period of a tone stimulus are expected to transmit the temporal cue for pitch perception. Furthermore, it may be possible that the modal values of the second and other peaks also transmit the temporal cues.

Since Vs and Vt correspond, in physical terms, to the (fundamental) frequency of (complex) tones and the inverse of the period of the tones, respectively, the values of Vs and Vt are equal. If the values of Vs and Vt are kept equal within the auditory nervous system, then our hypothesis cannot explain the above-mentioned pitch discrepancy.

Consider, then, how the value of Vt can differ from Vs in the auditory nervous system; in other words, how the value of μ1 can deviate from 1. Here, μ1 is defined as the period of a tone divided by the most frequent interspike interval corresponding to the period of the tone, namely, the modal value of the first peak in Fig. 1.

A neuron has a relative refractory period as well as an absolute refractory period. When the membrane potential of a neuron exceeds the threshold potential for firing, a spike occurs and the threshold is reset to its maximal value. This period is called an absolute refractory period. During this period, a neuron cannot discharge an impulse. The threshold then decays to its resting value. The decaying process is called a relative refractory period. During this period, a neuron can discharge if the membrane potential reaches a high enough level.

If the period of a tone becomes shorter compared with the relative refractory period, it becomes difficult for a neuron to discharge impulses corresponding to successive peaks of a tone stimulus waveform. The interspike interval may then become longer and the value of μ1 can be slightly lower than 1. This tendency is expected to become more pronounced with increasing frequency of the tone. If the frequency increases further, it becomes impossible for a neuron to discharge impulses to two successive peaks of the tone stimulus waveform. When the frequency of the tone stimulus decreases and the period of the tone becomes far longer than the relative refractory period, a neuron can discharge impulses with a shorter interval than the peak interval of the waveform. In this case, the value of μ1 can be larger than 1. Consequently, there is a possibility that the values of Vs and Vt differ from each other in the auditory nervous system.

Figure 1 shows examples of the cases of a low frequency and a high frequency. A short check informed us that the values of μ1 in the cases of 100 and 1700 Hz were larger and smaller than 1, respectively. This tendency has also been observed in the auditory neuron of an insect (Ishizuka and Yagi, 1976). The result met our expectation.

We calculated the values of μ1 as a function of frequency from the data for three auditory nerve fibers (UNIT 65-48-1, 66-151-5, and 65-340-8) published by Rose et al. (1967). These are shown in Fig. 2. The abscissa shows the frequency of the sinusoidal tone stimulus and the ordinate shows the value of μ1. Figure 2 suggests that the interspike intervals corresponding to the period of the waveform are longer than the period above 200-300 Hz. On the contrary, the intervals are found to be shorter than the period below 200-300 Hz. This suggests that the values of Vs and Vt are generally different.

FIG. 2. Ratio of the period of pure tone to the modal value of the first peak corresponding to the period of a tone in the interspike interval histogram. The ratios were calculated from the data of Rose et al. (1967).

C. Prediction of the pitch discrepancy

Now, let us try to determine the values of Vs and Vt.

First, Vs is considered as the resonant (characteristic) frequency of the maximally vibrating location on the basilar membrane, corresponding to the (fundamental) frequency. If Vs is normalized to the (fundamental) frequency, normalized Vs (which is expressed as vs) can be expressed as

\[ v_s = 1. \]

Vt, on the other hand, is expected to be determined from the interspike interval corresponding to the period of the waveform of the tone. Interspike interval histograms will be examined in order to determine the value of Vt. Let Vt be normalized to the period of the waveform of the tone. The normalized Vt (which is expressed as vt) is considered to be calculated from interspike interval histograms (e.g., Fig. 1) as

\[ v_t = w_1 \cdot \frac{1}{\tau_1} + w_2 \cdot \frac{2}{\tau_2} + w_3 \cdot \frac{3}{\tau_3} + \cdots = \sum_{i=1}^{M} w_i \cdot \frac{i}{\tau_i}, \tag{6} \]

where

\[ \sum_{i=1}^{M} w_i = 1, \tag{7} \]

and τi means the value of the time interval between the zero point and the ith modal peak on the histogram divided by the period of the stimulating tone, wi represents the relative weight given the ith modal peak, and M represents the final modal peak number. As Fig. 2 clearly shows, μ1 (= reciprocal value of τ1) depends on the frequency of the tone.

Now, let μi be the reciprocal value of τi/i. The value of μ2 for several frequencies calculated from physiological experimental data (Rose et al., 1967) is shown in Fig. 3. This shows that if the frequency is below 1.6 kHz the value of μ2 is around 1.0, irrespective of frequency. Furthermore, no significant evidence could be found that μi (i ≥ 3) depends on frequency. Then, taking μi = 1 for i ≥ 2, vt can be expressed as

\[ v_t = w_1 \cdot \mu_1 + (1 - w_1). \tag{8} \]

FIG. 3. Ratio of the twofold value of the period of tone to the modal value of the second peak in the interspike interval histogram. The ratio was calculated from the same data as Fig. 2.
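The weighted-sum form of vt and its reduction to Eq. (8) can be illustrated with a short Python sketch. It assumes that vt is a weighted sum, over the modal peaks of the interspike interval histogram, of i divided by the normalized position of the ith peak (peak position divided by the stimulus period), with weights that sum to 1. The function names, argument names, and numerical values are hypothetical and are not data from the paper.

```python
def normalized_temporal_value(weights, peak_ratios):
    """Weighted sum over modal peaks: sum_i w_i * (i / ratio_i), i = 1..M.

    weights:     relative weight of each modal peak (assumed to sum to 1).
    peak_ratios: position of the ith modal peak divided by the stimulus period.
    """
    if len(weights) != len(peak_ratios):
        raise ValueError("weights and peak_ratios must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        w * i / ratio
        for i, (w, ratio) in enumerate(zip(weights, peak_ratios), start=1)
    )


def reduced_temporal_value(w1, mu1):
    """Eq. (8): only the first peak deviates from its nominal position."""
    return w1 * mu1 + (1.0 - w1)


# Hypothetical histogram summary: the first peak sits 2% later than the
# period, while the second and third peaks sit exactly at 2 and 3 periods.
weights = [0.5, 0.3, 0.2]
peak_ratios = [1.02, 2.0, 3.0]

full = normalized_temporal_value(weights, peak_ratios)
reduced = reduced_temporal_value(w1=weights[0], mu1=1.0 / peak_ratios[0])
print(f"full sum = {full:.6f}, Eq. (8) = {reduced:.6f}")
```

When every peak beyond the first lies at its nominal multiple of the period, the full sum and Eq. (8) agree, which is the simplification the text relies on.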


Let us try to determine the value of w1. Rose et al. (1967) published the conditional probability that a nerve fiber which discharged at time zero will discharge again in response to the 1st-12th cycles following the time-zero discharge. w1 can be considered as the conditional probability that a nerve fiber will discharge again in response to the first cycle following the time-zero discharge. The conditional probability can also be calculated from an interspike interval histogram, dividing the number of intervals corresponding to the period by the total number of intervals.

As is shown above, the value of vt can be determined from physiological experimental data.

The values of vt for various frequencies were calculated from Eq. (8) using the physiological data of the above-mentioned three auditory nerve fibers. These values are shown in Fig. 4. The abscissa shows the frequency of the pure-tone stimulus and the ordinate shows the value of vt. This figure reveals that the value of vt is larger than 1 below 200-300 Hz and smaller at higher frequencies. This strongly suggests that vt in general is different from vs.

FIG. 4. Value of vt calculated from interspike interval histograms for auditory nerve fibers and conditional probability that a nerve fiber which discharged at time zero will discharge again in response to the 1st-12th cycles following the time-zero discharge (From Rose et al., 1967).

The value of vt shown in Fig. 4 was calculated for pure tone. It is possible that vt for complex tone is different from vt for pure tone. Furthermore, it may be that vt varies according to kinds of complex tones and that vt varies with the CF of auditory nerve fibers since the vibrating waveform of the basilar membrane varies with its location. However, we do not have enough physiological data (interspike interval histograms for complex tones) to draw firm conclusions.

In this study, it is assumed that vt for complex tone does not vary with CF and that vt for complex tone is equal to vt for pure tone. This assumption can be regarded as reasonable for the following reason. Almost all fibers responding to a harmonic complex tone, irrespective of their CF, are expected to discharge spikes almost synchronously to the fundamental frequency period, since each location on the basilar membrane generally vibrates with the fundamental frequency period. This tendency can be seen in the results of physiological experiments using such harmonic complex tones as human speech (Watanabe and Sakai, 1973) and synthetic vowels (Moore and Cashin, 1976).

Next, it is necessary to determine the values of WTc and WTp. It is impossible to determine these values directly from physiological data. However, we can infer whether or not WTp is larger than WTc from the following consideration: In the case of a pure tone, almost all auditory nerve fibers responding to the tone are expected to transmit the temporal cue even though the CF of a fiber is not equal to the frequency of the pure tone stimulus, since the fibers discharge spikes almost synchronously to the waveform of the tone (Rose et al., 1967). In other words, the number of the fibers transmitting the temporal cue can be regarded as nearly equal to that transmitting the spatial cue.

In the case of a complex tone, a larger number of fibers are expected to respond to the stimulus than in the case of a pure tone. So far as the peripheral nervous system (primary neuron level) is concerned, only the fibers whose CFs are around the fundamental frequency of the complex tone can probably transmit the spatial cue for the pitch corresponding to the fundamental frequency. Accordingly, almost all fibers (primary neurons) responding to the complex tone may transmit the temporal cue but only those fibers whose CFs are near the fundamental frequency may transmit the spatial cue.
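The histogram-based estimate of w1 mentioned in the text (the number of intervals corresponding to the period divided by the total number of intervals) can be sketched in Python as follows. The sketch also reads off the modal value of the first peak to obtain the ratio of the period to that modal interval. The window used to decide which bins belong to the first peak (from 0.5 to 1.5 periods) is an assumption made here for illustration, as are the function name and the synthetic histogram; none of them come from the paper or from the data of Rose et al.

```python
def first_peak_stats(bin_centers_ms, counts, period_ms):
    """Estimate (w1, mu1) from an interspike interval histogram.

    w1:  fraction of all intervals that fall in the first modal peak.
    mu1: stimulus period divided by the modal interval of the first peak.
    The first peak is taken as the bins within [0.5, 1.5) periods,
    which is an assumed window, not a rule stated in the paper.
    """
    lo, hi = 0.5 * period_ms, 1.5 * period_ms
    first_peak = [(t, n) for t, n in zip(bin_centers_ms, counts) if lo <= t < hi]
    total = sum(counts)
    in_peak = sum(n for _, n in first_peak)
    if total == 0 or in_peak == 0:
        raise ValueError("histogram has no intervals in the first peak")
    w1 = in_peak / total
    modal_interval_ms = max(first_peak, key=lambda tn: tn[1])[0]
    mu1 = period_ms / modal_interval_ms
    return w1, mu1


# Synthetic histogram for a hypothetical 1-kHz tone (period 1.0 ms).
centers = [0.8, 0.9, 1.0, 1.1, 1.2, 2.0, 2.1, 3.0]
counts = [5, 10, 20, 40, 5, 15, 3, 2]

w1, mu1 = first_peak_stats(centers, counts, period_ms=1.0)
vt = w1 * mu1 + (1.0 - w1)  # Eq. (8)
print(f"w1 = {w1:.2f}, mu1 = {mu1:.3f}, vt = {vt:.3f}")
```

In this synthetic case the modal interval is slightly longer than the period, so the period-to-mode ratio and the resulting vt both fall below 1, the direction the text describes for higher frequencies.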

If N is defined as

\[ N = \frac{\text{the number of the fibers transmitting the temporal cue}}{\text{the number of the fibers transmitting the spatial cue}}, \tag{9} \]

then the relation between N in the case of a complex tone and N in the case of a pure tone can be expressed as

\[ N_c > N_p, \tag{10} \]

where c represents a complex tone and p a pure tone. In other words, the weight of the available temporal cue compared with the available spatial cue for pitch perception in the case of a complex tone can be expected to be larger than that of a pure tone. Thus, we can deduce the following three conditions concerning WTp and WTc:

0 -
