The role of excitation-pattern, temporal-fine-structure, and envelope cues in the discrimination of complex tones

Helen M. Jackson and Brian C. J. Moore a)

Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, England

(Received 9 August 2013; revised 16 January 2014; accepted 21 January 2014)

The discrimination of bandpass-filtered harmonic (H) from inharmonic (I) tones (produced by shifting all components of the H tones upwards by a fixed amount in Hz) could be based on shifts in the pattern of ripples in the excitation pattern (EP) or on changes in the temporal fine structure evoked by the tones. The predictions of two computational EP models were compared with measured performance. One model used auditory filters with bandwidth values specified by Glasberg and Moore [(1990). Hear. Res. 47, 103–138] and one used filters that were twice as sharp. Stimulus variables were passband width, fundamental frequency, harmonic rank (N) of the lowest component within the passband, component phase (cosine or random), signal-to-noise ratio (SNR), and random perturbation in level of each component in the tones. While the EP models correctly predicted the lack of an effect of phase and some of the trends in the data as a function of fundamental frequency and N, neither model predicted the worsening in performance with increasing passband width or the lack of effect of SNR and level perturbation. It is concluded that discrimination of the H and I tones is not based solely on the use of EP cues.

© 2014 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4864306]

PACS number(s): 43.66.Hg, 43.66.Fe, 43.66.Ba, 43.66.Nm [VMR]

I. INTRODUCTION

Discrimination of the fundamental frequency (F0) of complex tones is usually good when the tones contain low harmonics, but worsens when the number of the lowest harmonic present, N, increases above about 7, reaching a plateau when N is about 14 (Hoekstra and Ritsma, 1977; Houtsma and Smurzynski, 1990). The worsening as N is increased from 7 to about 14 has been interpreted by some as resulting from a progressive reduction of the ability to resolve the components in the complex tone (Houtsma and Smurzynski, 1990; Shackleton and Carlyon, 1994). This is referred to here as the "resolvability" hypothesis. An alternative hypothesis is based on the idea that the waveform evoked by a sound at a given point on the basilar membrane can be considered as a slowly varying temporal envelope imposed on a more rapidly oscillating carrier, the temporal fine structure (TFS). In this paper, TFS is used to refer either to the rapid oscillations of the waveform on the basilar membrane (TFSBM) or to the representation of those oscillations in the detailed timing of spikes in the auditory nerve (TFSn), following the nomenclature proposed by Moore (2014). When the harmonics of a complex tone interact on the basilar membrane, pitch may be extracted from the time intervals between peaks in TFSBM close to adjacent envelope maxima (Schouten, 1940; de Boer, 1956; Schouten et al., 1962). This idea is illustrated in the upper panel of Fig. 1, which shows the waveform that might be evoked at a place on the basilar membrane with characteristic frequency (CF) = 2000 Hz in response to a complex tone with components at 1800, 2000, and 2200 Hz; the F0 is 200 Hz and N = 9. It is assumed that most nerve spikes are evoked at times corresponding to the largest peaks in TFSBM; these occur close to peaks in the envelope (Javel, 1980). The times when most spikes occur are indicated by 1, 2, 3 and 1′, 2′, 3′. It is assumed further that pitch is extracted from the time intervals between TFSBM peaks close to adjacent envelope maxima. As illustrated in Fig. 1, the most prominent such interval (the time between 1 and 1′, 2 and 2′, or 3 and 3′) is 5 ms. The reciprocal of this value corresponds to the pitch that is heard most often (matching that of a sinusoid with frequency = 200 Hz). However, there are other "candidate" time intervals, such as 4.5 ms (the interval between 2 and 1′ or 3 and 2′) and 5.5 ms (the interval between 1 and 2′ or 2 and 3′). Consistent with this, such stimuli can evoke more than one pitch, and each pitch corresponds to one of the candidate time intervals (corresponding to frequencies of 222, 200, and 182 Hz) (Schouten, 1940; de Boer, 1956; Schouten et al., 1962). If all components in the complex tone are shifted upwards by the same amount in Hz, say by 30 Hz, a shift in pitch is heard (Schouten, 1940; de Boer, 1956; Schouten et al., 1962). This is as predicted, since the time intervals between TFSBM peaks close to adjacent envelope maxima also shift, while the envelope repetition rate remains unaltered. This is illustrated in the lower panel of Fig. 1. The most prominent interval shifts to 4.93 ms (corresponding to 203 Hz). Correspondingly, an upward shift in pitch is usually heard, although, again, the pitch is ambiguous. When N is increased, adjacent peaks in TFSBM (e.g., those separated by the interval between 1′ and 2′ in Fig. 1) become closer together in time. For example, if N is increased to 15 for the harmonic tone, so that harmonics at 3000, 3200, and 3400 Hz are present, the intervals between adjacent peaks in TFSBM are close to 0.31 ms. The most prominent candidate time intervals for pitch would then be 4.69, 5.0, and 5.31 ms. These intervals are closer to one another than the candidate intervals with N = 9. Also, when N is increased there are more TFSBM peaks with similar high amplitude, so there are more candidate time intervals for pitch. Given that nerve spikes do not occur at exactly the same phase of TFSBM on every cycle (Javel, 1980), and that there is probably some jitter in the mechanism that measures the candidate intervals (Moore and Glasberg, 2010), a point may be reached with increasing N when the candidate intervals cannot be distinguished from one another. At this point TFSn may become unusable for F0 discrimination, and only a weak pitch based on the envelope repetition rate is heard (Moore and Moore, 2003). Hence, the worsening in F0 discrimination with increasing N could result from a progressive loss of the ability to use TFSn information (Moore et al., 2006; Hopkins and Moore, 2007; Ives and Patterson, 2008). This is referred to here as the "TFS hypothesis."

The decision as to whether the resolvability hypothesis or the TFS hypothesis is more nearly correct depends partly on the extent to which harmonics with numbers in the range 7 to 14 are resolved, which is still a matter of debate (Moore and Gockel, 2011). The experiments described here were aimed at deciding whether complex tones whose audible harmonics all lie above the seventh are discriminated based on their TFS or on cues related to changes in the excitation pattern resulting from (partial) resolution of the harmonics. The experiments were based on a method of measuring sensitivity to TFS described by Hopkins and Moore (2007) and later modified by Moore and Sek (2009a; Sek and Moore, 2012).

FIG. 1. The upper part of the figure shows a simulation of the waveform on the basilar membrane in response to a complex tone with components at 1800, 2000, and 2200 Hz. The lower part shows the waveform when all components are shifted up by 30 Hz, to 1830, 2030, and 2230 Hz. From Moore (2014) with permission of the author.

a) Author to whom correspondence should be addressed. Electronic mail: [email protected]

J. Acoust. Soc. Am. 135 (3), March 2014

Pages: 1356–1370
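The candidate intervals in the account above follow from simple arithmetic. The sketch below is a simplification introduced here for illustration, not part of the original analysis: it assumes that TFSBM peaks near the envelope maxima recur at the period of the component nearest the CF (fc), and that the candidate intervals are the integer numbers of TFSBM periods closest to the envelope period 1/F0. The function name `candidate_intervals_ms` is hypothetical.

```python
def candidate_intervals_ms(fc_hz, f0_hz):
    """Return three candidate pitch intervals (ms) around the envelope period.

    Simplifying assumption: TFS peaks near envelope maxima are spaced by
    1/fc, so the candidates are k/fc for the integer k closest to fc/F0
    and its two neighbours.
    """
    k0 = round(fc_hz / f0_hz)  # whole TFS cycles per envelope period
    return [1000.0 * k / fc_hz for k in (k0 - 1, k0, k0 + 1)]


if __name__ == "__main__":
    cases = [("H, N = 9", 2000.0), ("I, shifted 30 Hz", 2030.0), ("H, N = 15", 3200.0)]
    for label, fc in cases:
        intervals = candidate_intervals_ms(fc, 200.0)
        pitches = [1000.0 / t for t in intervals]  # reciprocal of each interval
        print(label, [f"{t:.2f} ms" for t in intervals], [f"{p:.0f} Hz" for p in pitches])
```

For fc = 2000 Hz this gives 4.50, 5.00, and 5.50 ms (about 222, 200, and 182 Hz); moving fc to 2030 Hz shifts the central interval to about 4.93 ms (203 Hz); and with fc = 3200 Hz (N = 15) the intervals are about 4.69, 5.00, and 5.31 ms, so the candidates crowd together as N increases.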
Subjects were required to discriminate a harmonic complex tone (H) with a given fundamental frequency, F0, from a complex tone in which all components were shifted upwards by the same amount in hertz, ΔF, resulting in an inharmonic tone (I). The two tones had the same envelope repetition rate (equal to F0), but had different TFS, as illustrated in the upper and lower panels of Fig. 1. To reduce cues related to differences in the excitation patterns of the H and I tones, all tones were passed through a bandpass filter with a flat passband and skirts that decreased in level at a rate of 30 dB/octave. In what follows, the width of the passband is defined as the width of the flat region. To mask combination tones and to prevent components with frequencies well away from the filter passband from being audible, the stimuli were presented in a threshold-equalizing noise (TEN; Moore et al., 2000). In the variation of the task described by Moore and Sek (2009a), there were two successive observation intervals, each of which contained four successive tones. In one interval, the sequence was HHHH and in the other interval the sequence was HIHI. Subjects were required to indicate the interval in which the sound appeared to change. As described above, frequency-shifted tones usually have a slightly different pitch from their harmonic counterparts. Hence, it is assumed that subjects performed the task by choosing the interval in which they heard the pitch of the tones go up and down. This is consistent with the reports of the subjects, who stated that they listened for changes in pitch, not for changes in roughness or some other sound attribute, such as timbre. According to the resolvability hypothesis, the task is performed using shifts in the ripples of the excitation pattern along the tonotopic axis. According to the TFS hypothesis, the task is performed by detecting changes in the time intervals between peaks in TFSBM close to adjacent envelope maxima, for example detecting the change from 5 to 4.93 ms in the intervals illustrated in Fig. 1. Note that in the studies of Hopkins and Moore (2007) and Moore and Sek (2009a), N was used to refer to the harmonic number corresponding to the center of the bandpass filter through which the stimuli were passed. Here, N is used to refer to the number of the lowest harmonic within the passband.
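A minimal sketch of how one H or I tone, and one four-tone observation interval, might be synthesized is given below. It is an illustration under stated assumptions, not the stimulus software of Moore and Sek (2009a): the bandpass filter is applied as a gain on each component rather than as a linear-phase filter, the lower edge of the flat passband is placed at (N − 0.5)F0 (as in experiment 1), each tone is scaled to unit rms instead of being calibrated in dB SPL, and the background TEN is omitted. Durations and gaps follow the values given in Sec. III B. The function names are hypothetical.

```python
import numpy as np

FS = 48_000  # sampling rate (Hz), as in the experiments


def filter_gain_db(freqs_hz, f_lo, f_hi, slope_db_per_oct=30.0):
    """Gain (dB) of a bandpass with flat passband [f_lo, f_hi] and skirts linear in octaves."""
    f = np.atleast_1d(np.asarray(freqs_hz, dtype=float))
    gain = np.zeros_like(f)
    below, above = f < f_lo, f > f_hi
    gain[below] = -slope_db_per_oct * np.log2(f_lo / f[below])
    gain[above] = -slope_db_per_oct * np.log2(f[above] / f_hi)
    return gain


def complex_tone(f0, n_lowest, width_f0, shift_hz=0.0, phase="cosine",
                 dur_s=0.2, ramp_s=0.02, rng=None):
    """H tone (shift_hz = 0) or I tone (every component shifted up by shift_hz)."""
    f_lo = (n_lowest - 0.5) * f0               # assumed lower edge of flat passband
    f_hi = f_lo + width_f0 * f0
    k = np.arange(1, int((FS / 2 - shift_hz) // f0))
    freqs = k * f0 + shift_hz                  # same shift in Hz for all components
    amps = 10.0 ** (filter_gain_db(freqs, f_lo, f_hi) / 20.0)
    if phase == "cosine":
        phases = np.zeros(len(freqs))
    else:                                      # new random phases for every tone
        rng = np.random.default_rng() if rng is None else rng
        phases = rng.uniform(0.0, 2.0 * np.pi, len(freqs))
    t = np.arange(int(round(dur_s * FS))) / FS
    x = (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
    n_ramp = int(round(ramp_s * FS))           # 20-ms raised-cosine ramps
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    x[:n_ramp] *= ramp
    x[-n_ramp:] *= ramp[::-1]
    return x / np.sqrt(np.mean(x ** 2))        # unit rms (level calibration omitted)


def observation_interval(pattern, f0, n_lowest, width_f0, shift_hz,
                         phase="cosine", gap_s=0.1, rng=None):
    """Concatenate four tones, e.g. pattern 'HIHI' or 'HHHH', with silent gaps."""
    gap = np.zeros(int(round(gap_s * FS)))
    parts = []
    for i, kind in enumerate(pattern):
        if i:
            parts.append(gap)
        parts.append(complex_tone(f0, n_lowest, width_f0,
                                  shift_hz if kind == "I" else 0.0, phase, rng=rng))
    return np.concatenate(parts)
```

For example, `observation_interval("HIHI", 200.0, 9, 5, 40.0)` returns 52,800 samples: four 200-ms tones separated by three 100-ms gaps. Weighting discrete components by the filter's magnitude response keeps the sketch short; a real linear-phase filter would also add a fixed delay.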
H. M. Jackson and B. C. J. Moore: Discrimination of complex tones

When N = 9, performance of normal-hearing subjects in this task is usually very good for F0s in the range 100–400 Hz (Hopkins and Moore, 2007; Moore et al., 2009; Moore and Sek, 2009a). Performance worsens as N is increased, and the task often cannot be performed reliably when N = 16 (Hopkins and Moore, 2007; Moore et al., 2009; Moore and Sek, 2009a). The question addressed by the present experiments was: Does the worsening in performance with increasing N reflect a progressive reduction in the ability to use excitation-pattern cues or a progressive loss of the ability to use TFS information? The experiments described here involved several manipulations of the stimuli that would be expected to influence the ability to use excitation-pattern cues. To predict the pattern of thresholds to be expected from these manipulations if changes in the excitation pattern were the sole cue used to perform the task, a computational excitation-pattern model was developed. The implementation of this model is described below. Two versions of the model were developed. Model A used estimates of the bandwidths and shapes of the auditory filters derived from simultaneous masking experiments (Glasberg and Moore, 1990), and model B used modified auditory filters whose bandwidths were smaller by a factor of two, which is towards the lower end of the range of bandwidth values estimated using forward masking for signals with low sound levels (Moore and Glasberg, 1981; Glasberg and Moore, 1982; Oxenham and Shera, 2003; Oxenham and Simonson, 2006). Our main motivation was to compare the pattern of results predicted by the model with the pattern actually measured using human listeners.

The experiments described here also addressed the issue of whether performance in discriminating the H and I tones might depend on cues in the temporal envelopes of the signals as represented in the auditory system. In the version of the task used by Hopkins and Moore (2007), most of the stimuli had components added in cosine phase. This results in a waveform with one large envelope peak per period. Although the H and I tones had the same Hilbert envelope (Bracewell, 1986), it is possible that the representation of the envelope in the auditory system differed slightly for the H and I tones, since for the I tones the position of TFSBM peaks relative to the temporal envelope peaks changes from one envelope peak to the next, as illustrated in the lower panel of Fig. 1. Hence, subjects might have discriminated the stimuli using subtle differences in temporal envelope shape. This cue can be made ineffective by selecting the component phases randomly for every tone. This results in an envelope shape that varies markedly from one tone to the next, regardless of whether the tone is harmonic or inharmonic, as illustrated in Fig. 1 of Sek and Moore (2012). Hopkins and Moore (2007) re-ran a subset of their conditions using stimuli with random-phase components. They found that the task could still be performed, a result that was confirmed by Moore and Sek (2009a). However, to our knowledge, the present paper is the first to report a direct comparison of performance for discrimination of the H and I tones by the same subjects using cosine- and random-phase complex tones. Experiment 1 assessed the effect of varying passband width, keeping constant the overall level of the signal relative to that of the background noise.
If performance were based on the detection of shifts in the pattern of ripples along the tonotopic axis, one might expect performance to improve with increasing passband width, since this would lead to a greater number of ripples in the excitation pattern. In fact, performance tended to worsen with increasing passband width. Experiment 2 assessed the possibility that the results of experiment 1 were determined by changes in the effective signal-to-TEN ratio across the different passband widths. In experiment 3, thresholds were measured for several combinations of F0 and N to assess the extent to which the model could predict the pattern of results for a wide range of conditions. Experiment 4 assessed the effect of randomly perturbing the level of each component in the stimuli. This was expected to disrupt any regular pattern of ripples in the excitation pattern and to lead to markedly poorer performance if excitation-pattern cues were the only cues used to perform the task.

II. EXCITATION-PATTERN MODEL

A. Background and rationale for model

Figure 2 shows the excitation patterns for examples of an H tone (solid line) and an I tone (dashed line) for a condition tested in experiment 1 of this paper. The excitation patterns were calculated using the model described by Glasberg and Moore (1990), but with the modification described by Moore et al. (1997). In this model, excitation patterns are calculated from the outputs of simulated auditory filters as a function of filter center frequency, after taking into account the transmission of sound through the outer and middle ear. The input signals were generated using the same software as used to generate the experimental stimuli (Moore and Sek, 2009a), and included the background TEN. The level of the signal was 15 dB higher than the level per ERBN of the TEN at 1 kHz, where ERBN refers to the equivalent rectangular bandwidth of the auditory filter for listeners with normal hearing, as specified by Glasberg and Moore (1990). The nominal F0 was 200 Hz, the value of N was 9, and the passband width was 5F0; the flat part of the passband extended from 6.5F0 to 11.5F0. The frequency shift was 0.5F0, leading to the maximum difference between the H and the I tones. Ripples in the excitation pattern can be seen corresponding to individual components; for this frequency shift, peaks in the pattern for the H tone coincide with dips in the pattern for the I tone. However, the magnitude of the ripples is small. It has been proposed that a difference between two excitation patterns can be detected when the excitation level at any point (any single CF) differs by a criterion amount (Zwicker, 1956). For the case illustrated in Fig. 2, the largest difference in excitation level is less than 2 dB. The difference would be smaller than 0.8 dB for a frequency shift corresponding to the threshold value measured for human listeners, which was about 0.2F0 for this condition. However, it seems unlikely that discrimination of the H and I tones is based on the detection of a difference at a single point on the excitation pattern, since the smallest detectable change at a single CF appears to be greater than 2 dB (Moore and Sek, 2009b), and since the background TEN introduces random perturbations in level. It is more likely that discrimination would be based on changes in the pattern of ripples, specifically shifts in the ripples along the tonotopic axis.

FIG. 2. Examples of excitation patterns for an H tone and an I tone when the frequency shift ΔF was 0.5F0. The value of F0 was 200 Hz and N was 9.
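For reference, the ERBN and ERBN-number (Cam) formulas of Glasberg and Moore (1990) and a symmetric rounded-exponential, roex(p), weighting function can be written as below. This is a simplified sketch, not the excitation-pattern model used here: it omits the outer- and middle-ear transfer and the refinements of the published model (for example, level dependence of the filter shapes). It shows why doubling p, as done for model B, halves the bandwidth: for W(g) = (1 + pg)exp(−pg), where g is the deviation from the center frequency fc divided by fc, the equivalent rectangular bandwidth is 4fc/p. The function names and the `sharpening` parameter are hypothetical.

```python
import numpy as np


def erb_n(f_hz):
    """ERBN in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * np.asarray(f_hz) / 1000.0 + 1.0)


def cam(f_hz):
    """ERBN-number in Cams (Glasberg and Moore, 1990)."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz) / 1000.0 + 1.0)


def cam_to_hz(c):
    """Inverse of cam()."""
    return (10.0 ** (np.asarray(c) / 21.4) - 1.0) * 1000.0 / 4.37


def roex_weight(f_hz, fc_hz, sharpening=1.0):
    """Symmetric roex(p) weighting; sharpening=2 doubles p (model-B-like filter)."""
    p = sharpening * 4.0 * fc_hz / erb_n(fc_hz)   # p = 4 fc / ERB
    g = np.abs(np.asarray(f_hz) - fc_hz) / fc_hz  # normalized frequency deviation
    return (1.0 + p * g) * np.exp(-p * g)


if __name__ == "__main__":
    # CFs spaced 0.1 Cam apart over an example range of 2400-3600 Hz
    cf_grid = cam_to_hz(np.arange(cam(2400.0), cam(3600.0), 0.1))
    print(len(cf_grid), "CFs from", round(float(cf_grid[0])), "Hz")
    # The area under W(f) approximates the ERB and halves when p is doubled
    step = 0.5
    f = np.arange(step, 3000.0, step)
    for s in (1.0, 2.0):
        area = roex_weight(f, 1000.0, s).sum() * step
        print(f"sharpening={s}: area={area:.1f} Hz, ERBN/s={float(erb_n(1000.0)) / s:.1f} Hz")
```

At 1 kHz, ERBN is about 132.6 Hz, so a model-B-like filter at that frequency would have an equivalent rectangular bandwidth of about 66.3 Hz.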
Profile-analysis studies (Green et al., 1987; Bernstein and Green, 1987) have shown that a regularly spaced pattern of ripples in an excitation pattern, for example as evoked by a stimulus with log-spaced components in which every second component is incremented in level, is more detectable than a single ripple (peak), produced by incrementing the level of a single component. To assess the performance that would be predicted based on shifts in the pattern of ripples, a computational excitation-pattern model was developed. The model was required to discriminate between an interval containing HHHH and one containing HIHI, where the components of the I tones had an upward frequency shift of ΔF (in Hz) relative to the components of the H tones. To take into account the effect of random variability introduced by the TEN, the model was not based on average excitation patterns, but rather simulated the forced-choice task that was performed by the subjects. There are several strategies that a subject might use to select the interval containing the HIHI tones based solely on shifts in the ripples in the excitation pattern. Here, a strategy was selected that was thought to make nearly optimal use of the ripples. The strategy was based on the assumption that the subject can make use of stimuli that are relatively easy to discriminate, presented early during an adaptive run when the value of ΔF is large, to construct a template representing the difference that is to be detected (Dau et al., 1996). The use of a template generally leads to better performance than when no template is used. The strategy was also based on the assumption that the subject is able to attend selectively to the region of the excitation pattern that is most informative, ignoring regions that are dominated by the background TEN.
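The template-matching strategy can be sketched as follows, anticipating the steps listed in Sec. II B. This is an illustrative reconstruction under assumptions, not the authors' published model code: excitation patterns are assumed to be supplied already in dB on a uniform CF grid, the frequency shift in the cross-correlation is represented as a lag in grid points (a simplification), and "largest peak at a non-zero shift" is read as the maximum of each normalized cross-correlation over all non-zero lags. All function names are hypothetical.

```python
import numpy as np


def interval_diff(eps_db, sigma_db, rng):
    """EPdiff for one interval from the excitation patterns of tones T1..T4.

    eps_db has shape (4, n_cf). Gaussian internal noise (s.d. sigma_db) is
    added at every CF; then mean(T1, T3) - mean(T2, T4) is returned.
    """
    noisy = eps_db + rng.normal(0.0, sigma_db, size=eps_db.shape)
    return 0.5 * (noisy[0] + noisy[2]) - 0.5 * (noisy[1] + noisy[3])


def build_template(idiff_trial1, idiff_trial2):
    """Template: mean of the I-interval EPdiff arrays from the first two trials."""
    return 0.5 * (np.asarray(idiff_trial1) + np.asarray(idiff_trial2))


def informative_cf_range(f0, n_lowest, passband_hz):
    """CF range (N - 1)F0 to (N + ncomps)F0 used for the cross-correlation."""
    ncomps = passband_hz / f0
    return (n_lowest - 1) * f0, (n_lowest + ncomps) * f0


def normalized_xcorr(x, template, max_lag):
    """Normalized cross-correlation of x with template for lags -max_lag..max_lag."""
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.empty(len(lags))
    n = len(x)
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[lag:], template[:n - lag]
        else:
            a, b = x[:n + lag], template[-lag:]
        denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
        values[i] = np.sum(a * b) / denom if denom > 0 else 0.0
    return lags, values


def choose_interval(diff1, diff2, template, max_lag):
    """Return 0 or 1: the interval whose xcorr peak at a non-zero lag is larger."""
    peaks = []
    for d in (diff1, diff2):
        lags, xc = normalized_xcorr(d, template, max_lag)
        peaks.append(np.max(xc[lags != 0]))
    return int(np.argmax(peaks))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cf = np.arange(200)
    template = np.sin(2 * np.pi * cf / 20)   # idealized ripple, period 20 grid points
    diff_i = np.roll(template, 3)            # ripple shifted by 3 grid points
    diff_h = 0.1 * rng.standard_normal(200)  # unstructured fluctuation
    print(choose_interval(diff_i, diff_h, template, max_lag=10))  # → 0
```

Because the decision uses the peak at any non-zero lag rather than the value at one fixed lag, the same template remains usable when ΔF on a given trial differs from the 0.5F0 used to build it.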


B. Steps in the model

The sequence of steps in the model was as follows:


1. For a given trial, a complete stimulus waveform file was generated using a 48-kHz sampling rate. This contained the HIHI and HHHH tones as well as the masking TEN, exactly as in the experiments using human listeners.
2. The waveform file was split into eight sections, each containing a single complex tone with its onset and offset ramps (each section was 9600 samples long, equivalent to 200 ms). The sections for each interval are denoted T1, T2, T3, and T4.
3. A Hamming window was applied to the samples for each section, and a Fourier transform with a size of 8192 points (equivalent to 171 ms) was calculated to determine the spectrum of that section.
4. The magnitude spectrum determined in step 3 was entered into the excitation-pattern model to estimate the excitation pattern for each H tone and each I tone. The excitation patterns were calculated at intervals of 0.1 Cam on the ERBN-number scale (Cams are the units of the ERBN-number scale; Moore, 2012). Model A used the estimates of the bandwidth and shape of the auditory filters specified by Glasberg and Moore (1990). There is some evidence that auditory filters may be somewhat sharper than estimated by Glasberg and Moore (Oxenham and Shera, 2003). To assess the effect of using sharper filters, model B was similar to model A, but used modified auditory filters with bandwidths one-half of those specified by Glasberg and Moore. This was achieved by doubling the values of the parameter "p" in the rounded-exponential filter model (Patterson and Nimmo-Smith, 1980). This yielded filters with bandwidths comparable to the smallest published bandwidth estimates (Oxenham and Shera, 2003; Oxenham and Simonson, 2006).
5. To simulate the effect of internal noise corresponding to variability in neural responses, a Gaussian-distributed random number with a mean of zero and a standard deviation of σ was added to the excitation level at each frequency. This had the effect of perturbing the excitation pattern of each tone in each interval. The value of σ was adjusted for each model (A and B) so that predicted performance matched observed human performance for a baseline condition in each experiment. More information on this is given below.
6. For each interval, the perturbed excitation patterns were averaged for sections T1 and T3 and for sections T2 and T4. This gave, respectively, EP1 and EP2 for interval 1 (the HIHI interval) and EP3 and EP4 for interval 2 (the HHHH interval). The difference in dB for each point (each CF) on the averaged excitation patterns was taken (EP1 − EP2 and EP3 − EP4), giving the arrays EPdiff1 for interval 1 and EPdiff2 for interval 2. Figure 3 shows examples of the arrays for a passband width of 5F0, N = 13, F0 = 200 Hz, and ΔF = 100 Hz. For the interval containing the I tones, the array showed a pattern of ripples (dashed line), whereas the array for the other interval did not (dotted line). The array for the interval containing all H tones is not flat because of the random fluctuations produced by the background TEN.
7. It was assumed that performance was based on comparison of the EPdiff1 and EPdiff2 arrays with a template that was built up during the first two trials of an adaptive run, when the value of ΔF was relatively large (usually well above threshold) and it would have been clear to human listeners (in most conditions) which interval contained the I tones.
The template, EPT, was formed by taking the mean of the arrays corresponding to EPdiff for the interval containing the I tones across the first two trials, where ΔF was always 0.5F0. Figure 3 also shows an example of a template (solid line). Steps 1 to 6 were repeated for each remaining trial (1000 in total).
8. Normalized cross-correlation functions were calculated between EPdiff1 and EPT and between EPdiff2 and EPT, the variable being the shift in frequency in Hz (since all components in each I tone were shifted upwards by the same amount in frequency from the corresponding components in each H tone). The cross-correlations were calculated only over the range of CFs that would be informative, i.e., for which the components in the tones would have been detectable in the TEN. This range was taken as extending from (N − 1)F0 to (N + ncomps)F0, where ncomps is the passband width divided by F0. For example, for F0 = 200 Hz, N = 13, and a passband width of 1000 Hz (5F0), the range of CFs was from 2400 to 3600 Hz (this is the range shown in Figs. 2 and 3). The cross-correlations for intervals 1 and 2 are denoted xcorr1 and xcorr2, respectively.
9. The interval whose normalized cross-correlation function had the largest peak at a non-zero value of the frequency shift was selected by the model as the interval containing the HIHI tones. The model choice was then scored as correct or incorrect.

FIG. 3. Examples of the arrays EPdiff1 (dashed line) and EPdiff2 (dotted line) that were assumed to be used for discrimination of the H and I tones. EPdiff1 shows a pattern of ripples, while EPdiff2 does not. See text for details. The template is shown by the solid line.

Note that the approach of using the cross-correlation and basing performance on the largest peak at a non-zero value of the frequency shift (steps 8 and 9) meant that the template was useful even for the random-phase conditions and even when the frequency shift differed from that used to form the template. The model completed 1002 trials (two to construct the template and 1000 to assess performance) for each of a range of fixed values of ΔF for each condition, and the number of times the correct interval was selected was counted for each value of ΔF. Probit analysis (Finney, 1971) was then used to determine the 70.7%-correct point on the resulting psychometric function, giving the predicted threshold. The threshold for each condition was estimated twice, using independent sets of 1002 trials, to allow an estimate of the variability of the predictions.
Except when predicted thresholds were very close to 0.5F0, the two threshold estimates were within 5% of each other. The source code for the model is available at http://github.com/helenjackson/excitationpatternmodel.

C. Selection of the value of the internal noise variable σ

The value of σ was adjusted such that the threshold predicted by each model matched that measured for human listeners for one baseline condition in each experiment; this was usually the condition that led to the lowest threshold. To determine the appropriate value of σ, the model completed 5002 trials for a series of values of ΔF for each of a range of values of σ. The value of σ was varied in 0.25-dB steps. The predicted threshold, as estimated using probit analysis, was compared to the measured threshold for the baseline condition, and the value of σ that gave the closest correspondence between the two was selected. The threshold predicted using the selected value of σ was within 3% of the mean measured threshold for each baseline condition in all cases, except when the closest correspondence was obtained with σ set to zero. The selected values are specified below for each experiment.
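The threshold-estimation and calibration steps can be sketched as follows. As a simplifying assumption, the probit step is approximated by an unweighted least-squares fit of probits against log10 ΔF after correcting for the 50% guessing rate, rather than by Finney's (1971) iterative maximum-likelihood procedure. `predict_threshold` stands for a hypothetical function that runs the model for a given σ and returns its predicted threshold, and the 0 to 3 dB search range for σ is an assumption (the text specifies only the 0.25-dB step).

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def threshold_70_7(delta_fs, p_correct, floor=0.5):
    """ΔF at 70.7% correct from a probit-style straight-line fit on log10(ΔF).

    Proportions are corrected for guessing, p' = (p - floor) / (1 - floor),
    converted to probits, and a line is fitted by ordinary least squares.
    """
    xs, zs = [], []
    for df, p in zip(delta_fs, p_correct):
        p_adj = (p - floor) / (1.0 - floor)
        if 0.0 < p_adj < 1.0:  # points at chance or at 100% have no finite probit
            xs.append(math.log10(df))
            zs.append(_Z.inv_cdf(p_adj))
    if len(xs) < 2:
        raise ValueError("need at least two points between chance and 100% correct")
    mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
    slope = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
             / sum((x - mx) ** 2 for x in xs))
    intercept = mz - slope * mx
    z_target = _Z.inv_cdf((0.707 - floor) / (1.0 - floor))
    return 10.0 ** ((z_target - intercept) / slope)


def select_sigma(predict_threshold, measured_threshold, sigmas):
    """Internal-noise s.d. whose predicted threshold is closest to the measured one."""
    return min(sigmas, key=lambda s: abs(predict_threshold(s) - measured_threshold))


sigma_grid = [0.25 * i for i in range(13)]  # 0 to 3 dB in 0.25-dB steps (range assumed)
```

A weighted maximum-likelihood fit would give less influence to points near chance or near 100% correct; the unweighted version is used here only to keep the sketch short.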


III. GENERAL EXPERIMENTAL METHOD

A. Procedure for measuring discrimination of the H and I tones

The procedure was the same as described by Moore and Sek (2009a). The subject was required to discriminate between H tones and I tones in which each component was shifted upwards by a given number of Hz, ΔF. To avoid problems due to the ambiguity of the pitch of the stimuli (Schouten et al., 1962), the task was designed so that subjects did not have to indicate the direction of any pitch shift between the H and I tones, but only to detect a change. In a given trial, subjects were presented with two sets of four tones: either HHHH followed by HIHI or HIHI followed by HHHH. The interval that contained the HIHI sequence was varied randomly from trial to trial. The subject was asked to indicate which of the two intervals contained HIHI. Subjects responded by using a mouse to click one of two buttons on a computer screen, each of which flashed during the corresponding interval. Subjects received visual feedback on their performance via color flashes on the computer screen. A two-down one-up adaptive procedure was used to estimate the value of ΔF leading to 70.7% correct. The value of ΔF started at 0.5F0, which is the value leading to the largest possible difference between the H and I tones. Before the second trial after the first turnpoint, ΔF was changed by a factor of 1.25³; the factor was then changed to 1.25² and kept at this value until the second trial after the second turnpoint. After this, the factor was set to 1.25. Eight turnpoints were obtained, and the threshold was calculated as the geometric mean of the values of ΔF at the last six turnpoints. If the value of ΔF requested by the procedure exceeded 0.5F0 more than three times during a run, the task was switched to a non-adaptive procedure, with the value of ΔF fixed at 0.5F0. Subjects completed 20 trials in this non-adaptive procedure. Each subject completed two runs for each condition.
If the standard deviation of the logarithms of the values of ΔF at the last six turnpoints of a run was greater than 0.2, the results for that run were discarded and a new run was performed.

B. Stimuli

All signals were digitally generated using a personal computer. The digital signals were used as input to the excitation-pattern models. An external M-Audio Audiophile soundcard (Cumberland, RI) was used to generate analog signals for presentation to subjects, using a 48-kHz sampling rate and 16-bit resolution. The output of the soundcard was used to drive one earpiece of a Sennheiser HD580 headset (Wedemark, Germany). The H tones initially contained many harmonics of a given F0. The I tones were generated from the H tones by shifting all components upwards in frequency by ΔF (Hz). Components were added in either random or cosine starting phase. For the random-phase stimuli, the starting phases were re-selected for every tone. The H and I tones were passed through a bandpass filter with a linear-phase response, a flat passband, and skirts with a slope of 30 dB/oct. The width of the passband is specified below for each experiment. Each tone lasted 200 ms, including 20-ms raised-cosine onset and offset ramps. The gap between successive tones within an observation interval was 100 ms. The two observation intervals were separated by 300 ms. A background TEN was used to mask combination tones and to limit the audibility of components falling on the skirts of the filter. The TEN started 300 ms before the first interval started and ended 300 ms after the second interval had finished, again including 20-ms onset and offset ramps. The level of the TEN, specified as the level in a 1-ERBN-wide band centered on 1000 Hz, was set to 15 dB below the overall level of the signal, except where noted below.

C. Procedure for measurement of absolute threshold

The stimuli were presented at a fixed sensation level (SL). To determine the appropriate level, the absolute threshold for detecting a sinewave with a frequency corresponding to N was measured using an adaptive two-alternative forcedchoice, two-down one-up procedure. The signal duration was 200 ms, including 20-ms raised-cosine on and off ramps. The two intervals in which the signal might occur were indicated by lights. The intervals were separated by 500 ms. The starting level was always 60 dB sound pressure level (SPL). The initial step size was 6 dB. The step size was decreased to 4 dB after one turnpoint and to 2 dB after the second turnpoint. The procedure terminated after six turnpoints had occurred, and the threshold was taken as the mean of the signal levels at the last four turnpoints. In all experiments, the overall level of the H and I tones was set 30 dB above the absolute threshold level for a sinusoid at the frequency corresponding to N. IV. EXPERIMENT 1: EFFECT OF VARYING THE WIDTH OF THE PASSBAND

A. Rationale and conditions

In this experiment, the effect of varying the width of the passband was assessed. The bandpass filter had a passband with a width of 1, 3, or 5 times F0. If performance were based on the detection of shifts in the pattern of ripples along the tonotopic axis, it might be expected to improve with increasing passband width, since this would lead to a greater number of ripples in the excitation pattern. F0s of 75 and 200 Hz were used. The lower edge of the flat region of the passband fell at 8.5 or 12.5 times F0, so that for the H tone N was 9 or 13. As the passband width was increased, the lower edge of the passband was kept fixed, and the upper edge frequency was increased. Components were added in either random or cosine phase.

B. Subjects

Seven subjects (three male) with absolute thresholds of 15 dB hearing level (HL) or below for both ears at all audiometric frequencies took part. Their ages ranged from 22 to 31 yr. All subjects had some experience playing nonkeyboard musical instruments. Musically trained subjects were used to avoid long learning effects, which can occur in frequency discrimination tasks, although Moore and Sek (2009a) found no significant effect of training for the procedure used here. As no training effect was expected, subjects were not trained before the experiment was run. However, after completing the last block of conditions, subjects were retested on the first condition tested to check that performance was similar to that measured initially. This was the case for all subjects.

J. Acoust. Soc. Am., Vol. 135, No. 3, March 2014

C. Data analysis

As each subject completed two runs for each condition, and each run could result in either a threshold estimate or a percent-correct score, all scores had to be converted into a common format for analysis. The two scores for each subject and condition were combined into a single threshold estimate as follows. For a given subject and condition, if one or both runs led to a percent-correct score, and if the overall score was greater than 14/20 or 25/40 (the scores required to be considered above chance at the 0.05 level, based on the binomial distribution), it was converted to a d′ value using the table presented by Hacker and Ratcliff (1979). The threshold value of ΔF that would be obtained for a d′ value of 0.77 (the d′ value corresponding to 70.7% correct) was then extrapolated from this value, assuming that d′ is proportional to ΔF. If the percent-correct score was not significantly different from chance, the threshold associated with it was set to 0.5F0, the value leading to the biggest possible difference between the H and I tones. When both scores for a given subject and condition were percent-correct scores, the value obtained in this way was taken as the final threshold estimate. In other cases, the final estimate was taken as the geometric mean of the two threshold estimates (whether measured directly or extrapolated).

D. Results

For this and all subsequent experiments reported in this paper, although overall performance varied across subjects, the pattern of results was consistent across subjects. Hence, the analyses focus on the mean data. A within-subjects analysis of variance (ANOVA) was conducted on the measured data, using the logarithm of the single combined threshold for each subject and condition as the variate. The effects of passband width [F(2,12) = 11.65; p = 0.002], F0 [F(1,6) = 29.1; p = 0.002], and N [F(1,6) = 119.6; p < 0.001] were all significant. The effect of phase was not significant [F(1,6) = 0.25; p = 0.633]. There was a significant interaction between passband width, N, and phase [F(2,12) = 4.4; p = 0.036], but it accounted for only a very small proportion of the variance in the data. The interaction between passband width and F0 failed to reach significance [F(2,12) = 3.77; p = 0.054]. The lack of effect of phase suggests that performance of the task was not based on the use of subtle differences in temporal envelope shape for the H and I tones. As there was no significant effect of phase, no significant two-way interaction of phase with passband width or N, and the three-way interaction was small, to simplify data presentation the geometric mean across the two phase conditions was taken for each condition. The open circles in Fig. 4 show the threshold estimates obtained in this way. Error bars indicate ±2 standard errors based on across-subject variation, corresponding approximately to a 95% confidence interval for the mean. The horizontal dotted line at 0.5F0 shows the maximum threshold possible. Performance was generally better for F0 = 200 Hz than for F0 = 75 Hz. Also, performance was better for N = 9 than for N = 13. For F0 = 200 Hz, the mean data showed slightly worse performance when the width of the passband was 5F0 than when it was 3F0 or 1F0. For F0 = 75 Hz, the mean data showed no clear change in performance with bandwidth. However, the interaction between F0 and passband width failed to reach significance, as noted above.

H. M. Jackson and B. C. J. Moore: Discrimination of complex tones

E. Comparison of data with model predictions

As described earlier, the value of r was adjusted until the predicted threshold matched the mean measured threshold for a "baseline" condition. For this experiment, the baseline condition was F0 = 200 Hz, N = 9, and passband width = 1F0. This condition was selected for two reasons: first, it led to the lowest threshold; second, for this condition the ninth component would have been close to the limit of peripheral resolution (Plomp, 1964; Bernstein and Oxenham, 2003; Moore and Gockel, 2011), while TFS processing ability might plausibly have been good (Moore et al., 2009). Model A predicted a higher threshold than measured for the baseline condition even when r was 0 dB, so r was set to 0 dB. For model B, the value of r was set to 1.75 dB. The predicted thresholds did not differ for the two phase conditions, which was expected since the excitation-pattern model is based on the power spectra of the stimuli. The filled symbols in Fig. 4 show predictions based on random-phase stimuli; the predictions would be essentially identical for cosine-phase stimuli. Error bars indicate ±2 standard errors, based on the two threshold estimates for each condition. When the predicted psychometric function failed to cross 70.7% correct, the predicted threshold was plotted at 0.6F0, for visual clarity. The thresholds predicted by model A (filled squares) were higher than measured thresholds for all conditions, even though r was zero. Indeed, model A predicted that thresholds should be unmeasurable for F0 = 75 Hz, N = 13, and the passband widths of 3F0 and 5F0, which was not the case. Thus measured performance was not adequately predicted using a model with the degree of frequency selectivity specified by Glasberg and Moore (1990). However, for F0 = 200 Hz, model A did predict a slight worsening of performance with increasing passband width, which is consistent with the measured thresholds. The worsening may have occurred because the overall signal-to-TEN ratio was fixed, so the effective signal-to-TEN ratio for each component in the passband decreased as the passband width was increased. This explanation was assessed in experiment 2. The thresholds predicted by model B tended to be below the measured thresholds, except for the condition used to select the value of r (F0 = 200 Hz, N = 9, passband width of 1F0) and the corresponding condition for N = 13. Model B did not predict the worsening in performance with increasing passband width that occurred for F0 = 200 Hz. Indeed, for F0 = 200 Hz and N = 13, the model predicted an improvement as the passband width was increased from 1F0 to 3F0, followed by a slight worsening with a further increase in passband width to 5F0. This pattern of predictions probably resulted from two opposing factors: an increase in the number of components within the passband would create a clearer pattern of ripples, but this would be offset by the decreasing signal-to-TEN ratio for each component.
FIG. 4. The open circles show results of experiment 1 (geometric mean thresholds across subjects), plotted as a function of passband width. F0 was 75 Hz (left) or 200 Hz (right). N was 9 (top) or 13 (bottom). Squares and triangles show predictions of models A and B, respectively. Error bars indicate ±2 standard errors for both human data and model predictions. The horizontal dotted line at 0.5F0 shows the maximum threshold possible. When a threshold was predicted to be unmeasurable, it is plotted at an ordinate value of 0.6.

Thresholds were predicted by both models to be higher for F0 = 75 Hz than for F0 = 200 Hz, and higher for N = 13 than for N = 9, which is consistent with the measured thresholds. The predicted effect of F0 is mainly a consequence of the fact that the bandwidth of the auditory filter, as a proportion of the CF, is somewhat greater for frequencies around 750 Hz (the region of the passband for F0 = 75 Hz) than for frequencies around 2000 Hz (the region of the passband for F0 = 200 Hz). Hence, the ripples in the excitation pattern corresponding to individual components were more marked for F0 = 200 Hz than for F0 = 75 Hz. The predicted increase in thresholds with increasing N occurs because the bandwidth of the auditory filters increases with increasing CF, so the ripples in the excitation pattern become progressively less deep as N increases. In summary, model A predicted performance that was consistently worse than measured performance, even though the internal noise was set to zero. Model B predicted performance for most conditions that was better than measured performance, and model B did not correctly predict the worsening in performance with increasing passband width or decreasing F0. Thus neither model was fully consistent with the data.
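The frequency dependence invoked above can be checked with the ERBN formula of Glasberg and Moore (1990), ERBN = 24.7(4.37F/1000 + 1), with F in Hz. The sketch below computes ERBN relative to CF and the number of harmonics that fall within one ERBN at CF = N·F0, a rough index of how shallow the ripples in the excitation pattern will be. Treating model B's "twice as sharp" filters as a halving of ERBN is an assumption made only for illustration; the models themselves compute full excitation patterns.

```python
def erb_n(f_hz: float) -> float:
    """ERB_N of the auditory filter in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def relative_bandwidth(cf_hz: float) -> float:
    """ERB_N expressed as a proportion of the centre frequency."""
    return erb_n(cf_hz) / cf_hz


def harmonics_per_erb(f0_hz: float, n: int, sharpening: float = 1.0) -> float:
    """Number of harmonics within one filter bandwidth at CF = n * f0.

    Larger values imply shallower ripples in the excitation pattern.
    sharpening=2.0 halves the bandwidth, an assumed stand-in for model B.
    """
    return erb_n(n * f0_hz) / sharpening / f0_hz


print(f"{relative_bandwidth(750.0):.3f}")   # 0.141 (passband region for F0 = 75 Hz)
print(f"{relative_bandwidth(2000.0):.3f}")  # 0.120 (passband region for F0 = 200 Hz)
for n in (9, 13, 16):
    # F0 = 200 Hz: normal-width filters versus filters assumed twice as sharp.
    print(n, round(harmonics_per_erb(200.0, n), 2),
          round(harmonics_per_erb(200.0, n, sharpening=2.0), 2))
```

Both trends described in the text follow directly: the relative bandwidth is larger near 750 Hz than near 2000 Hz, and the number of harmonics per filter grows with N.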

V. EXPERIMENT 2: EFFECT OF VARYING THE SIGNAL-TO-NOISE RATIO

A. Rationale and conditions

In experiment 1, the overall level of the complex tones was fixed for each subject and the level per ERBN of the TEN was set 15 dB below that level. Hence, as the passband width was increased, the level of each sinusoidal component decreased and the signal-to-TEN ratio per component decreased correspondingly. The decrease was close to 3 dB when the passband width was increased from 1F0 to 5F0. It is possible that both the models and the subjects were sensitive to this change in signal-to-noise ratio, and that this led to the effect of passband width. Experiment 2 assessed this possibility. To assess the effect that a 3-dB change in signal-to-TEN ratio would have on performance, the condition with F0 = 200 Hz, N = 13, and the passband width of 1F0 was repeated with the level of the TEN set 12 dB below the overall level of the stimulus rather than 15 dB below as in experiment 1. With the higher TEN level, the signal-to-TEN ratio for each component was the same as for the passband width of 5F0 in experiment 1. The condition with the original relative TEN level was also repeated, as most subjects had not taken part in experiment 1. Both random-phase and cosine-phase conditions were used.

B. Subjects

Seven subjects (four male) with absolute thresholds of 20 dB hearing level (HL) or below in both ears at all audiometric frequencies took part. Their ages ranged from 23 to 30 yr. All were musically trained. Two of the female subjects had also taken part in experiment 1, so they received no further training. The five naive subjects were trained using four conditions from experiment 1, which took about half an hour to complete. Although training effects were not expected, this training was included because feedback from the subjects in experiment 1 suggested that some time to become familiar with the task would be helpful.

C. Procedure and data analysis

The procedure was the same as for experiment 1, with the following exceptions. First, the absolute threshold at the frequency corresponding to N was estimated twice instead of once, and the mean of the two estimates was used to set the SL of the stimuli. Second, the step size in ΔF was a factor of 1.25³ until one turnpoint occurred, was reduced to 1.25² until the second turnpoint occurred, and was 1.25 thereafter. If the value of ΔF requested by the procedure exceeded 0.5F0 more than twice before the second turnpoint, or at all after this, the task switched to the non-adaptive procedure. Third, 40 trials were completed for the non-adaptive procedure rather than 20. These changes were made to reduce the number of times that a subject could complete a run and achieve a threshold measurement by chance, when the H and I tones could not actually be discriminated, and to give a more accurate estimate of the percent-correct score when the non-adaptive procedure was used. Each subject completed three runs for each condition rather than two. The results for each subject and condition were combined into a single threshold estimate in the same way as for experiment 1. As there were 40 trials in the non-adaptive procedure in this experiment, the scores required for performance to differ significantly from chance at the 5% level were calculated from the binomial distribution to be >25/40, >47/80, or >69/120.

D. Results

A within-subjects ANOVA was conducted on the logarithms of the thresholds, with factors relative TEN level and phase. There were no significant effects. The open symbols in Fig. 5 show the geometric mean data across subjects and across the two phase conditions. Consistent with the results of the ANOVA, the mean threshold was almost the same for the two relative TEN levels: 0.178F0 for the relative level of 15 dB, and 0.179F0 for the relative level of 12 dB.

FIG. 5. The open circles show results of experiment 2 (F0 = 200 Hz, N = 13), plotted as a function of the level of the TEN relative to the overall level of the signal. For visual clarity, predictions of models A and B are offset from their correct positions along the x axis. Otherwise as Fig. 4.
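The chance criteria and the conversion from percent correct to threshold used in experiments 1 and 2 can be sketched as follows. Two substitutions are assumptions: the closed-form two-alternative relation d′ = √2·z(P) stands in for the Hacker and Ratcliff (1979) table (for two alternatives the two should agree), and a perfect score is clipped to 1 − 1/(2n) so that d′ stays finite. The function names and the `delta_f` argument (the fixed ΔF used in a non-adaptive block) are hypothetical.

```python
import math
from statistics import NormalDist


def min_correct_above_chance(n_trials: int, alpha: float = 0.05) -> int:
    """Smallest score k with P(X >= k) <= alpha when X ~ Binomial(n_trials, 0.5)."""
    tail = 0.0
    # Accumulate the upper tail P(X >= k) from k = n_trials downwards.
    for k in range(n_trials, -1, -1):
        tail += math.comb(n_trials, k) / 2 ** n_trials
        if tail > alpha:
            return k + 1
    return 0


def dprime_two_interval(p_correct: float) -> float:
    """d' for an unbiased two-interval forced-choice task: sqrt(2) * z(P)."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(p_correct)


def threshold_from_score(n_correct: int, n_trials: int, delta_f: float, f0: float) -> float:
    """Extrapolate a threshold from a block run at a fixed (hypothetical) delta_f."""
    if n_correct < min_correct_above_chance(n_trials):
        return 0.5 * f0  # not above chance: use the largest possible threshold
    # Assumption: clip a perfect score so that d' stays finite.
    p = min(n_correct / n_trials, 1.0 - 0.5 / n_trials)
    d_target = dprime_two_interval(0.707)  # about 0.77
    # Assumes d' grows in proportion to delta_f.
    return delta_f * d_target / dprime_two_interval(p)


def geometric_mean(values):
    """Combine threshold estimates from separate runs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


for n in (20, 40, 80, 120):
    # Criterion in the form used in the text: "more than k/n" correct.
    print(f"> {min_correct_above_chance(n) - 1}/{n}")
```

The loop reproduces the one-tailed criteria quoted in the text (>14/20, >25/40, >47/80, >69/120).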


E. Comparison of data and model predictions

The value of r was adjusted as before so as to match performance as closely as possible for the signal-to-TEN ratio of 15 dB. The resulting value of r was 0.25 dB for model A and 3.5 dB for model B. For both models, the value of r was larger than for experiment 1, because the subjects of experiment 2 generally performed more poorly than those of experiment 1. The filled symbols in Fig. 5 show the thresholds predicted by each model for each condition. For both models, performance was predicted to worsen markedly with the decrease in signal-to-TEN ratio, which is not consistent with the experimental data. Model A predicted that threshold would be unmeasurable for the signal-to-TEN ratio of 12 dB, whereas a threshold could be measured for all subjects. The predicted worsening in performance with the decrease in SNR can account for why model A predicted a worsening in performance with increasing passband width in experiment 1. As mentioned earlier, the non-monotonic predictions of model B for experiment 1 can be explained in terms of the interaction between two factors in that experiment: as the passband width increased, the pattern of ripples would have become more distinct, but the TEN would have had a greater disruptive effect on each part of the pattern. In summary, the predictions of the models for the data of experiment 1 were probably influenced by the decrease in effective signal-to-TEN ratio with increasing passband width. However, the predicted worsening in performance with decrease in signal-to-TEN ratio in experiment 2 was not consistent with the measured data, which showed no clear change when the signal-to-TEN ratio was decreased by 3 dB. An effect of signal-to-TEN ratio presumably would have been found if the ratio had been varied over a wider range.

The lack of effect of signal-to-TEN ratio over the small range tested here suggests that the worsening in human performance with increasing passband width found in experiment 1 was unlikely to have been due to the change in signal-to-TEN ratio.

VI. EXPERIMENT 3: EFFECT OF VARYING F0 AND HARMONIC RANK

A. Rationale and conditions

Experiment 3 was intended to provide data for a wide range of values of F0 and N, for comparison with model predictions. Values of F0 were 50, 75, 100, 200, and 400 Hz and values of N were 9, 13, or 16. Components were added in either random or cosine phase and the signal-to-TEN ratio was 15 dB. The bandpass filter had a central flat region with a width of 5F0.

B. Subjects, procedure, and data analysis

The seven subjects were the same as for experiment 1. Experiment 3 was carried out after experiment 1, so no training was given. The procedure and method of data analysis were the same as for experiment 1.

C. Results

A within-subjects ANOVA was conducted using the logarithm of the single combined threshold for each subject and condition as the variate. The factors were F0, N, and phase. The effects of F0 [F(4,24) = 19.2; p < 0.001] and N [F(2,12) = 137.3; p < 0.001] were both significant. The effect of phase was not significant [F(1,6) = 1.48; p = 0.27], and there were no interactions of phase with any other factor. There was a significant interaction between F0 and N [F(8,48) = 7.63; p < 0.001]. Since there was no significant effect of phase, the data were averaged across the cosine- and random-phase conditions. The open symbols in Fig. 6 show the geometric mean thresholds across subjects. Performance generally improved with increasing F0 up to 200 Hz, and then flattened off (except for N = 16). Performance worsened with increasing N, consistent with the results of experiment 1 and with earlier results (Hopkins and Moore, 2007; Moore et al., 2009). Some subjects gave thresholds close to 0.5F0 for the F0s of 50, 75, and 100 Hz for N = 13 and N = 16, but for N = 9 all subjects but one gave thresholds well below 0.5F0 for F0s of 75 Hz and above. For N = 16, mean thresholds were close to the maximum possible threshold, but still below it. As noted earlier, previous studies have shown that the task often cannot be performed reliably when N = 16 (Hopkins and Moore, 2007; Moore et al., 2009; Moore and Sek, 2009a). The fact that most subjects here could perform the task reliably for F0s above 50 Hz may be related to the fact that all of our subjects were musically trained.

FIG. 6. The open circles show results of experiment 3, plotted as a function of F0. The value of N was 9 (top), 13 (middle), or 16 (bottom) and the passband width was 5F0. Otherwise as Fig. 4.

D. Comparison of data and model predictions

Since the subjects were the same as for experiment 1, the r values were kept the same as for experiment 1, that is, 0 dB for model A and 1.75 dB for model B. The filled symbols in Fig. 6 show the predicted thresholds for each model and each condition. Both models predicted that thresholds should increase with decreasing F0 and with increasing N, consistent with the pattern in the experimental data. However, model A predicted thresholds that were higher than measured thresholds for most conditions, the exceptions being F0 = 400 Hz with N = 9 and 13. Model A predicted that threshold should not be measurable for N = 16 for all F0s up to 200 Hz, whereas most subjects were able to perform the task at above-chance levels for all F0s except 50 Hz. Model A also predicted that thresholds should not be measurable for N = 13 and F0 = 50, 75, and 100 Hz, and for N = 9 and F0 = 50 Hz, predictions that were again inconsistent with the experimental data. Model B predicted thresholds that were lower than measured thresholds for N = 9 and 13, but predicted and measured thresholds matched reasonably well for N = 16. However, model B predicted that thresholds should not be measurable for N = 16 and F0 = 75 Hz, whereas all subjects performed above chance for F0 = 75 Hz.

E. Discussion

It is notable that the human listeners achieved above-chance performance for N = 16 and F0 = 75 to 400 Hz. For F0 = 75 Hz, above-chance performance was obtained for 34 out of 52 runs (7 subjects × 2 phases × 3 repetitions). The combined geometric means for this condition were below 0.4F0 for both phases, with standard errors of about 0.04F0. Even model B, with its assumed very sharp filters, predicted that the task should not be possible for N = 16 and F0 = 75 Hz, presumably because the ripples in the calculated excitation pattern were too small to be usable given the random fluctuations in excitation level produced by the background TEN. This supports the observation from experiment 2 that the models were more affected by the TEN than the human listeners were. For the condition with F0 = 400 Hz and N = 16, the lowest component in the passband of the H tone had a frequency of 6400 Hz, and the lowest component that would have been above threshold in the TEN had a frequency of 5800 or 5400 Hz. This is above the usual assumed limit of about 5000 Hz for the use of phase-locking information (Johnson, 1980; Palmer and Russell, 1986), although the upper limit in humans is not known and, in any case, there is no "hard" limit. The present results are, however, consistent with those of Moore and Sek (2009b), obtained using the same task as was employed here. They found that most listeners could perform the task for an F0 of 800 Hz when the lowest audible component in the H tone had a frequency of 8000 Hz. Moore and Sek argued that the results supported the idea that TFS information can be used to discriminate complex tones when all of their audible frequency components lie above 5000 Hz. The present results are consistent with this conclusion. The recent results of Moore and Ernst (2012) suggest that the frequency discrimination of pure tones may also be based on phase-locking information for frequencies up to about 8000 Hz, consistent with the modeling results of Goldstein and Srulovicz (1977) and Heinz et al. (2001).

VII. EXPERIMENT 4: EFFECT OF PERTURBING COMPONENT LEVELS

A. Rationale and conditions

If the H and I tones are discriminated using excitation-pattern cues, then performance should worsen markedly if these cues are disrupted by randomly perturbing (roving) the level of each component in the complex tones. This prediction was tested in experiment 4. The stimuli were similar to a subset of those used for experiment 1, except that a random level perturbation was applied to each component in each stimulus. The perturbation in dB was chosen from a uniform distribution with range ±P, where P was 0, 3, or 5 dB. F0 was 200 Hz, the passband width was 1F0 or 5F0, and N was 13. Both random-phase and cosine-phase conditions were used. The level of the TEN in dB/ERBN was 15 dB below the overall level of the complex. The top row of Fig. 7 shows examples of excitation patterns for F0 = 200 Hz, N = 13, P = 5 dB, and passband width = 5F0 for a single interval containing HIHI tones. Excitation patterns are shown as the average for two H tones (first and third tones in the interval, solid lines) and for two I tones (second and fourth tones in the interval, dashed lines), for model A (left) and model B (right). The bottom row shows examples of the templates constructed by the models (solid lines) and the differences between the excitation levels of the averaged H and I tones in the corresponding top panels (dashed lines). The templates and excitation-pattern differences generated by model A do not show a clear pattern of ripples. The templates and excitation-pattern differences generated by model B show some ripples, but they are much less regular than those shown in Fig. 3, due to the random level perturbation applied to each component. Note that the overall range of level perturbation for the largest value used (10 dB for P = 5 dB) was several times the depth of the ripples in the excitation pattern, even for model B, which meant that the level perturbation should have been very effective in disrupting the pattern of the ripples (Green, 1988).
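To make the level perturbation concrete, the sketch below generates one H or I tone in which each component receives an independent level perturbation drawn uniformly from ±P dB, with the 200-ms duration and 20-ms raised-cosine ramps used throughout. It is a simplified, hypothetical generator, not the authors' stimulus code: only components inside the flat region of the passband are synthesized, the filter skirts and the TEN are omitted, and the sampling rate and the 50-Hz ΔF in the usage lines are arbitrary choices.

```python
import math
import random


def make_tone(f0, n_low, width_f0, delta_f, rove_db, rng, fs=48000,
              dur_s=0.2, ramp_s=0.02, phase="random"):
    """Hypothetical H (delta_f = 0) or I (delta_f > 0) tone with level roving.

    Simplifications (assumptions): components outside the flat passband,
    the filter skirts, and the background TEN are all omitted.
    """
    lower = (n_low - 0.5) * f0  # lower edge of flat region, e.g. 12.5*F0 for N = 13
    upper = lower + width_f0 * f0
    freqs = [k * f0 + delta_f for k in range(1, int(upper // f0) + 2)
             if lower <= k * f0 + delta_f <= upper]
    # Independent perturbation per component, uniform on [-P, +P] dB.
    gains = [10.0 ** (rng.uniform(-rove_db, rove_db) / 20.0) for _ in freqs]
    phases = [rng.uniform(0.0, 2.0 * math.pi) if phase == "random" else 0.0
              for _ in freqs]
    n = int(round(dur_s * fs))
    n_ramp = int(round(ramp_s * fs))
    samples = []
    for i in range(n):
        t = i / fs
        x = sum(g * math.cos(2.0 * math.pi * f * t + ph)
                for f, g, ph in zip(freqs, gains, phases))
        # Raised-cosine onset and offset ramps.
        if i < n_ramp:
            x *= 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            x *= 0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        samples.append(x)
    return freqs, gains, samples


rng = random.Random(1)
# One HIHI interval: F0 = 200 Hz, N = 13, passband 5F0, P = 5 dB.
interval = [make_tone(200.0, 13, 5, df, 5.0, rng) for df in (0.0, 50.0, 0.0, 50.0)]
print(interval[0][0])  # [2600.0, 2800.0, 3000.0, 3200.0, 3400.0]
print(interval[1][0])  # [2650.0, 2850.0, 3050.0, 3250.0, 3450.0]
```

Because a fresh perturbation is drawn for every component of every tone, the two H tones in an interval generally differ in spectral shape, which is what makes the regular pattern of ripples an unreliable cue.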
Thus, it was expected that both models would predict higher thresholds when P was 5 dB than when it was 0 dB. If human listeners do not use the regular pattern of ripples in the excitation pattern to perform the task, but instead use TFS cues, then measured thresholds would not necessarily worsen greatly with increasing P; the extent of any worsening would depend on the extent to which TFS cues were disrupted by the perturbation. This is considered in more detail later.

FIG. 7. Examples of the averaged excitation patterns for the HIHI interval of one trial when P was 5 dB and the passband width was 5F0. Excitation patterns were averaged for two H tones (first and third tones in the interval, solid lines) and for two I tones (second and fourth tones in the interval, dashed lines), for model A (left) and model B (right). The bottom row shows examples of the templates constructed by the models (solid lines) and the differences between the excitation levels of the averaged H and I tones in the corresponding top panels (dashed lines).

B. Subjects, procedure, and data analysis

The seven subjects were the same as for experiment 2. Experiment 4 was carried out after experiment 2, so no training was given. The procedure and method of data analysis were the same as for experiment 3.

C. Results

A within-subjects ANOVA was conducted, with factors passband width, phase, and amount of level perturbation, using the logarithm of the single combined threshold for each subject and condition as the variate. There was a significant effect of passband width [F(1,6) = 8.53; p = 0.027]; consistent with the results of experiment 1, thresholds increased with increasing passband width. There were no other significant effects or interactions. As the ANOVA showed no significant effect of phase, or interactions with phase, the geometric mean of the random-phase and cosine-phase thresholds was calculated. The geometric mean thresholds across subjects are shown as open circles in Fig. 8. Randomly perturbing the levels of the individual components had little effect.

D. Comparison of data with model predictions

As the subjects were the same as for experiment 2, the values of r for the two models were kept the same as for experiment 2, that is, 0.25 dB for model A and 3.5 dB for model B. The filled squares and up-pointing triangles in Fig. 8 show model predictions. The chosen values of r gave accurate predictions for both models for the condition giving the best performance, namely that with a passband width of 1F0 and P = 0. Model A predicted that the threshold should not be measurable for any other condition, which is not consistent with the experimental data. Model B predicted that the threshold should just be measurable (the threshold was just below the maximum possible threshold) for the passband width of 1F0 and P = 3 dB, but predicted that threshold should not be measurable for all remaining conditions. Although both models predicted that threshold should not be measurable for any value of P for the passband width of 5F0, the predicted percent correct for fixed values of ΔF did decrease markedly with increasing P. Hence, both models predicted a substantial worsening in performance with increasing P, whereas the measured data showed no significant worsening. Thus the predicted pattern of results was clearly different from the measured pattern. Both models predicted that performance should be worse for the passband width of 5F0 than for the passband width of 1F0. However, for the comparable conditions of experiment 1, model B predicted that performance should be slightly better for the passband width of 5F0 than for the passband width of 1F0. The discrepancy is a consequence of the higher value of r used to model the data in experiment 4, which was in turn a consequence of the fact that the subjects in experiment 4 generally performed slightly more poorly than the subjects of experiment 1. Since both models predicted that thresholds should be unmeasurable for all conditions with the passband width of 5F0, the models were run again for this passband width, but with r set to 0 and 2.5 dB for models A and B, respectively. We refer to the models with these values of r as A2 and B2.

FIG. 8. The open circles show results of experiment 4 (F0 = 200 Hz, N = 13), plotted as a function of the amount of level perturbation of the individual components for passband widths of 1F0 (top) and 5F0 (bottom). Predictions are shown for models A and B, and for versions of the models with lower values of r (designated A2 and B2, lower panel only). For visual clarity, predictions of the models are offset from their correct positions along the x axis. Otherwise as Fig. 4.
The thresholds predicted by these two models are shown in the lower panel of Fig. 8. Model A2 predicted a threshold that was higher than the measured threshold for P = 0, and predicted that thresholds should be unmeasurable for P = 3 and 5 dB. Model B2 predicted a threshold similar to the measured threshold for P = 0, but predicted that threshold should increase markedly with increasing P, reaching 0.48F0 for P = 5 dB. The predicted increase was greater than the observed (non-significant) increase. In summary, as expected, the excitation-pattern models predicted that performance would be adversely affected by perturbation of the levels of the individual components for both passband widths, but the measured data showed no significant effect of level perturbation. This suggests that performance was not based solely on the use of excitation-pattern cues.

VIII. GENERAL DISCUSSION

A. Performance of the excitation-pattern models

For experiment 1, which examined the effects of varying passband width for F0 = 75 and 200 Hz and N = 9 and 13, model A predicted performance that was consistently worse than measured performance. Model B predicted performance for most conditions that was better than measured performance, and model B did not correctly predict the worsening in performance with increasing passband width. Thus neither model was fully consistent with the data. For experiment 2, which examined the effect of varying the signal-to-TEN ratio for F0 = 200 Hz, N = 13, and a passband width of 1F0, the measured thresholds did not change significantly with a 3-dB decrease in signal-to-TEN ratio, while the predicted thresholds increased markedly. For experiment 3, which measured thresholds for N = 9, 13, and 16, using F0s ranging from 50 to 400 Hz, the models predicted some of the general trends in the data, but did not fully account for the pattern of results. For experiment 4, both models predicted that performance would be strongly adversely affected by perturbation of the levels of the individual components, but the measured data showed no significant effect of level perturbation. Overall, these outcomes suggest that subjects did not rely exclusively on differences in excitation patterns between the H and I tones to perform the task. In the next three subsections, we consider the extent to which the pattern of results in the four experiments is consistent with the TFS hypothesis.

B. Effects of F0, N, and passband width
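The interval values worked through in the paragraph that follows can be reproduced with a simple approximation, assumed here rather than taken from the paper: with the most informative auditory filter centered on harmonic N, each candidate Iaem equals the envelope period 1/F0 plus (k + ΔF/F0) periods of the TFS at N·F0, for k = −1, 0, 1. The function below is an illustrative sketch of that arithmetic only, not a model of basilar-membrane filtering or of the authors' analysis.

```python
def iaem_ms(f0_hz, n, delta_f_hz=0.0, ks=(-1, 0, 1)):
    """Approximate Iaem values in ms for an H (delta_f = 0) or I tone.

    Assumption: the filter is centred on harmonic n, so the TFS period is
    1/(n * f0); a shift of delta_f moves each interval by delta_f/f0 TFS periods.
    """
    t_env = 1000.0 / f0_hz          # envelope period (ms)
    t_tfs = 1000.0 / (n * f0_hz)    # TFS period at CF = n * f0 (ms)
    shift = delta_f_hz / f0_hz      # shift expressed as a fraction of a TFS period
    return [t_env + (k + shift) * t_tfs for k in ks]


for n in (9, 16):
    h = iaem_ms(100.0, n)                   # H tone, F0 = 100 Hz
    i = iaem_ms(100.0, n, delta_f_hz=50.0)  # I tone, shifted by 50 Hz
    rel = [100.0 * (b - a) / a for a, b in zip(h, i)]
    print(n, [round(v, 2) for v in h], [round(v, 2) for v in i],
          [round(r, 1) for r in rel])
```

For N = 9 the first pair of intervals differs by 6.25% and for N = 16 by about 3.3%, matching the roughly 6% versus 3% contrast described in the text.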

In this section we consider whether the changes in performance with F0, N, and passband width observed in experiments 1 and 3 are consistent with an explanation based on the use of TFS cues. If the TFS hypothesis holds, then performance is based on estimation of the time intervals between peaks in the TFSBM close to adjacent envelope maxima (aem) on the basilar membrane (Schouten et al., 1962; Moore and Sek, 2009b); such intervals are referred to here as Iaem.

Consider first the effect of N. As N increases, the frequency of the carrier at the place of maximal response increases, so adjacent peaks in the TFS of the waveform become closer together in time. As a result, the values of Iaem for the H and I tones become more similar, both in absolute terms and as a percentage of their size. For example, consider discrimination of H and I tones with an F0 of 100 Hz, corresponding to an envelope period of 10 ms. Assume that the frequency shift, ΔF, is 50 Hz. For N = 9, the largest shift in the TFSBM would occur for an auditory filter centered close to 900 Hz. For the H tone, the most prominent values of Iaem would be 8.89, 10, and 11.11 ms. For the I tone, the corresponding intervals would be 9.44, 10.56, and 11.67 ms. The auditory system therefore has to discriminate intervals such as 8.89 and 9.44 ms, which differ by 6.2%, and 10 and 10.56 ms, which differ by 5.6%. If N is increased to, say, 16, then the largest shift in the TFS would occur for an auditory filter centered close to 1600 Hz. The prominent values of Iaem for the H tone become 9.38, 10, and 10.63 ms, while those for the I tone become 9.69, 10.31, and 10.94 ms. In this case, the intervals to be discriminated differ by only about 3%. This could partly account for the worsening in performance with increasing N.

Consider next the effect of F0. As F0 decreases, the values of Iaem become progressively longer. The auditory system may find it difficult to measure long time intervals with high accuracy (de Cheveigne and Pressnitzer, 2006), and there may be an upper limit to the time intervals that can be measured (Krumbholz et al., 2000); this could account for the increase in thresholds with decreasing F0. This explanation is consistent with earlier work showing that the ability to process TFS information in complex tones worsens for very low F0s (Moore et al., 2009). It is also consistent with the increase in pure-tone frequency difference limens for very low frequencies (Moore, 1973).

The deleterious effect of increasing passband width could be explained in terms of the TFS hypothesis by the worsening in effective signal-to-TEN ratio, as discussed earlier. Recall that the overall level of the signal relative to the TEN was held constant. As the passband width was increased, the signal-to-TEN ratio at the output of each auditory filter would have decreased. However, experiment 2 showed no effect of changing the signal-to-TEN ratio by 3 dB, suggesting that some other factor contributed to the effect of passband width. Another possible explanation comes from the idea that pitch extraction involves the combination of information across CFs, even under conditions where such combination is disadvantageous (Gockel et al., 2004). For the I tone, the TFSBM information differs (conflicts) across the outputs of different auditory filters, for example filters centered at the lower and upper edges of the passband. The TFSBM information may be easier to analyze when the conflict is reduced by decreasing the bandwidth of the stimulus. This is illustrated in Fig. 9, which shows 20-ms samples of responses of simulated (gammatone) auditory filters to an I tone with a nominal F0 of 200 Hz when ΔF was 100 Hz. The passband of the stimulus extended from 1700 to 2700 Hz (so that the 9th–13th harmonics of the H tone fell within it) and components were added in cosine phase. For simplicity, no TEN was added.
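The worked example for N = 9 and 16, and the across-CF conflict for the Fig. 9 stimulus, rest on simple interval arithmetic. The Python sketch below reproduces that arithmetic under idealizing assumptions that are ours rather than the authors': the relevant auditory filter is treated as passing a single carrier at a stated frequency (N × F0 in the worked example), the H-tone values of Iaem are the envelope period 1/F0 plus an integer number of carrier periods, and the I-tone values are the same grid displaced by ΔF/F0 of a carrier period. We take the displacement to be towards shorter intervals; for ΔF = F0/2, as in both examples, either direction gives the same set of intervals. The function names are hypothetical.

```python
"""Idealized Iaem arithmetic for H and I tones (hypothetical helpers, not the authors' code)."""


def h_intervals_ms(f0, carrier_hz, ks=(-1, 0, 1)):
    """H tone: envelope period (1/f0) plus k carrier periods, in ms."""
    period = 1000.0 / f0
    carrier = 1000.0 / carrier_hz
    return [period + k * carrier for k in ks]


def i_intervals_ms(f0, carrier_hz, delta_f, ks=(0, 1, 2)):
    """I tone: the H-tone grid displaced by -delta_f/f0 carrier periods (assumed direction)."""
    period = 1000.0 / f0
    carrier = 1000.0 / carrier_hz
    return [period + (k - delta_f / f0) * carrier for k in ks]


def percent_differences(h, i):
    """Pairwise relative difference (%) between matched H and I intervals."""
    return [100.0 * (b - a) / a for a, b in zip(h, i)]


if __name__ == "__main__":
    f0, delta_f = 100.0, 50.0
    for n in (9, 16):
        cf = n * f0  # filter assumed to be centred on the Nth harmonic
        h = h_intervals_ms(f0, cf)
        i = i_intervals_ms(f0, cf, delta_f)
        pct = percent_differences(h, i)
        print(f"N={n}: H={[round(x, 2) for x in h]} I={[round(x, 2) for x in i]} "
              f"diff%={[round(p, 2) for p in pct]}")

    # Fig. 9 stimulus: I tone with F0 = 200 Hz and delta_f = 100 Hz, seen by two filters.
    for cf in (1900.0, 2500.0):
        print(f"CF={cf:.0f} Hz: I={[round(x, 2) for x in i_intervals_ms(200.0, cf, 100.0)]}")
```

Run as a script, it prints interval sets that match the values quoted in the text to within rounding, and it shows that, under the same idealization, the I-tone intervals for the Fig. 9 stimulus differ between filters centered at 1900 and 2500 Hz.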
In Fig. 9, the top panel shows the output of a simulated auditory filter with a CF of 1900 Hz in response to this I tone, and the bottom panel shows the output of a simulated auditory filter with a CF of 2500 Hz in response to the same tone. The values of Iaem differ for the two filters, which creates a conflict of information across CFs. The wider the bandwidth of the stimulus, the more auditory filters respond, and hence the more conflicting sets of information the auditory system has to process. Also, the time intervals to be discriminated differ by a greater percentage at low CFs than at high CFs, as described above when discussing the effect of varying N. It may be that increasing the passband width by increasing the upper cutoff frequency merely has the effect of adding less informative TFSBM information and “diluting” the average difference in Iaem between the H and I tones.

FIG. 9. Twenty-ms samples of responses of simulated (gammatone) auditory filters to an I tone with a nominal F0 of 200 Hz, N = 9, and ΔF = 100 Hz. The passband of the stimulus extended from 1700 to 2700 Hz and components were added in cosine phase. The top and bottom panels are for CFs of 1900 and 2500 Hz, respectively. The most prominent values of Iaem are indicated. The Hilbert envelopes of the waveforms are indicated by dashed lines.

C. Effect of level perturbation

Adding a level perturbation to all components in the tones would not alter the values of Iaem for the H tones, since each tone remains strictly periodic. The level perturbation would be expected to make the TFS vary somewhat across I tones, perhaps making it more difficult to detect differences in TFS between the H and I tones. Also, a component with a large amplitude might dominate the waveform at the outputs of auditory filters with CFs close to its frequency, particularly if neighboring components happened to have low amplitudes; this would reduce the amount of temporal envelope fluctuation and make envelope maxima less distinct. Therefore, performance based on TFS cues should worsen for a sufficiently large value of P. However, moderate level perturbation might not prevent TFS cues from being used effectively. This is illustrated in Fig. 10, which shows 20-ms samples of waveforms at the output of a simulated (gammatone) auditory filter with CF = 3100 Hz in response to random-phase H and I tones with F0 = 200 Hz, N = 13, passband width = 5F0, and ΔF = 100 Hz. For simplicity, no TEN was added. The top panel is for an H tone with P = 0 dB and the next lower panel is for an H tone with P = 5 dB. In both cases, the most prominent value of Iaem is 5 ms, corresponding to 1/F0. The lower two panels show samples of waveforms for I tones with ΔF = 100 Hz and P = 0 or 5 dB (bottom). The most prominent values of Iaem are only slightly affected by the level perturbation, and the values of Iaem for the H and I tones are clearly different, both without and with level perturbation. This analysis indicates that the TFS cues for discrimination of the H and I tones are robust, and are only minimally affected by a level perturbation of ±5 dB. Hence, the fact that performance of the human subjects was hardly affected by the level perturbation is consistent with the use of TFS cues.

FIG. 10. Illustration of the effect of perturbing the level of each component on the waveforms at the output of a simulated auditory filter with CF = 3100 Hz. The input signals had F0 = 200 Hz, N = 13, passband width = 5F0, and components were added in random phase. The panels show samples of the waveforms for, from top to bottom: (1) an H tone with P = 0 dB; (2) an H tone with P = 5 dB; (3) an I tone with ΔF = 100 Hz and P = 0 dB; (4) the same tone but with P = 5 dB. The most prominent values of Iaem are indicated. The Hilbert envelopes of the waveforms are shown by dashed lines.

D. Lack of effect of phase

There was no significant main effect of phase in any of the four experiments. This suggests that subtle differences between the temporal envelope shapes of the H and I tones were not used to perform the task. Another possible temporal-envelope-related cue is the depth of envelope modulation at the outputs of the auditory filters as a function of filter CF. The output of an auditory filter centered between two components within the passband would have a larger envelope modulation depth than that of a filter centered on a component. Hence, the envelope modulation depth would show a CF-dependent ripple. When all of the components are shifted in frequency, the CF-dependent ripple in modulation depth would shift, and this might provide a detection cue. Several pieces of evidence suggest that this cue was not used. First, for high N, the depth of modulation would have varied only very slightly as a function of CF, but quite large changes in modulation depth are required for the change to be detectable (Wakefield and Viemeister, 1990). Second, the perturbation of the level of each component used in experiment 4 would have strongly disrupted the across-CF ripple in temporal envelope modulation depth, but the level perturbation had little effect on performance. Finally, subjects reported listening for changes in pitch across tones, not changes in roughness or any other sensation that might be related to the depth of temporal envelope modulation. Performance of the task using TFS cues does require temporal envelope fluctuations to be present, since it is assumed that listeners perform the task by estimating time intervals between peaks in TFSBM close to adjacent envelope maxima, Iaem, as illustrated in Figs. 9 and 10. If the envelopes at the outputs of the auditory filters were flat, or nearly so, the auditory system might have difficulty in determining which time intervals in TFSBM to estimate in order to discriminate the H and I tones.
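The kind of gammatone simulation behind Figs. 9 and 10 can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' simulation: a 4th-order gammatone filter with bandwidth 1.019 ERBN (a common choice, with ERBN from Glasberg and Moore, 1990), a 48-kHz sampling rate, no bandpass filtering or TEN, an I tone formed by shifting harmonics N to N + 4 of the H tone up by ΔF, and component levels drawn uniformly within ±P dB (the distribution is our assumption). All function names are hypothetical. The sketch checks two properties relevant to the argument: the Hilbert envelope at the filter output stays periodic at 1/F0 for cosine- and random-phase tones, with and without level perturbation; and, for ΔF = F0/2, the filtered I-tone waveform is inverted one envelope period later whereas the H-tone waveform repeats, which corresponds to TFS peaks near adjacent envelope maxima being displaced by half a carrier period.

```python
import numpy as np

FS = 48_000  # sampling rate in Hz (assumed)


def erb_n(f_hz):
    """ERB_N in Hz (Glasberg and Moore, 1990): 24.7 * (4.37 F + 1), F in kHz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def gammatone_ir(cf, fs=FS, dur=0.05, order=4):
    """Peak-normalized gammatone impulse response with bandwidth 1.019 ERB_N."""
    t = np.arange(int(round(dur * fs))) / fs
    b = 1.019 * erb_n(cf)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * cf * t)
    return g / np.max(np.abs(g))


def complex_tone(f0, n_low, n_high, delta_f=0.0, phase="cosine",
                 perturb_db=0.0, dur=0.2, fs=FS, seed=0):
    """Harmonics n_low..n_high of f0, each shifted up by delta_f (0 gives an H tone).

    Each component level is drawn uniformly within +/- perturb_db dB (assumed).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(round(dur * fs))) / fs
    freqs = np.arange(n_low, n_high + 1) * f0 + delta_f
    if phase == "cosine":
        phases = np.zeros(len(freqs))
    else:
        phases = rng.uniform(0.0, 2 * np.pi, len(freqs))
    level_db = rng.uniform(-1.0, 1.0, len(freqs)) * perturb_db
    amps = 10.0 ** (level_db / 20.0)
    return sum(a * np.cos(2 * np.pi * f * t + p) for f, a, p in zip(freqs, amps, phases))


def filter_steady(x, cf, fs=FS):
    """Gammatone-filter x and keep only the steady-state part of the output."""
    g = gammatone_ir(cf, fs)
    return np.convolve(x, g)[len(g):len(x)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def normalized_acf(x, lag):
    """Normalized correlation between x and x delayed by lag samples."""
    a, b = x[:-lag], x[lag:]
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def envelope_period_ms(y, f0, fs=FS):
    """Lag (ms) of the envelope-autocorrelation peak within +/-40% of 1/f0."""
    env = hilbert_envelope(y)
    env = env - env.mean()
    nominal = int(round(fs / f0))
    lags = np.arange(int(0.6 * nominal), int(1.4 * nominal))
    best = lags[np.argmax([normalized_acf(env, int(lag)) for lag in lags])]
    return 1000.0 * best / fs


def tfs_correlation_at_period(y, f0, fs=FS):
    """Correlation of the filtered waveform with itself one envelope period later."""
    return normalized_acf(y, int(round(fs / f0)))


if __name__ == "__main__":
    for label, delta_f in (("H", 0.0), ("I", 100.0)):
        for cf in (1900.0, 2500.0):
            y = filter_steady(complex_tone(200.0, 9, 13, delta_f=delta_f), cf)
            print(label, cf, envelope_period_ms(y, 200.0),
                  round(tfs_correlation_at_period(y, 200.0), 3))
```

For components that are all multiples of F0/2, these two properties hold exactly in the steady state, whatever the component amplitudes and phases. That is why the sketch discards the filter's onset transient and analyzes 150 ms of output, an integer number of stimulus periods, so that the FFT-based envelope has no edge effects.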
However, even for random-phase stimuli, for which the envelope fluctuations are less prominent than for cosine-phase stimuli, simulations suggest that the waveforms at the outputs of auditory filters have distinct periodic envelope fluctuations; see Fig. 10. This could explain why performance did not differ significantly across phase conditions. The absence of a significant main effect of phase does not necessarily imply that the harmonics on which performance was based were resolved. Indeed, it seems almost certain that for N = 16 (experiment 3), all harmonics were completely unresolved. While an effect of phase on pitch indicates that at least some harmonics are unresolved, the presence of unresolved harmonics does not always result in a phase effect (Moore et al., 2006; Oxenham et al., 2009).

E. Conclusions on the adequacy of the TFS hypothesis

We conclude that the main features of the data from experiments 1–4 are consistent with the TFS hypothesis. It would be desirable to produce more quantitative predictions using a model of pitch that incorporates the use of TFS information (Meddis and O’Mard, 1997; Bernstein and Oxenham, 2005; Ives and Patterson, 2008). However, such models are designed to predict pitch values and pitch strength from average neural responses. They do not take into account trial-by-trial variability in neural responses resulting from background noise or from inherent neural variability. Hence, they do not readily lend themselves to generating quantitative predictions of discrimination performance in the presence of background noise.

IX. SUMMARY AND CONCLUSIONS

The predictions of two excitation-pattern models were compared with measured data for a wide range of stimulus manipulations involving the discrimination of bandpass-filtered H and I tones presented in a background of TEN. There was no effect of component phase (cosine or random) in any of the experiments, consistent with the predictions of the excitation-pattern models. The lack of a phase effect suggests that subjects did not use cues related to temporal envelope shape to perform the task. While the excitation-pattern models predicted some of the trends in the data, for example the worsening of performance with increasing N and decreasing F0, they incorrectly predicted that a 3-dB decrease in signal-to-TEN ratio would impair performance, and they incorrectly predicted that randomly perturbing the level of each component in each tone over a range of ±3 or ±5 dB would strongly impair performance. Also, the models predicted that it should not be possible to perform the task for N = 16 and F0 = 75 Hz, but thresholds could be measured for all subjects for this condition. Overall, these outcomes suggest that subjects did not rely solely on differences in excitation patterns between the H and I tones to perform the task. This indirectly supports the idea that TFS cues were used. It is argued that the general trends in the data are consistent with the use of TFS cues. However, quantitative predictions using a model that incorporates the use of TFS cues are needed to test this idea further.

ACKNOWLEDGMENTS

This work was supported by Deafness Research UK (now merged with Action on Hearing Loss) and by the Medical Research Council (Grant number G0701870). We thank Brian Glasberg for help with modeling and statistical analysis, Aleksander Sek for providing the original source code for the TFS1 test, and Kristian Glass and Simon Jackson for help with getting the models running.

Bernstein, J. G., and Oxenham, A. J. (2003). “Pitch discrimination of diotic and dichotic tone complexes: Harmonic resolvability or harmonic number?,” J. Acoust. Soc. Am. 113, 3323–3334.
Bernstein, J. G., and Oxenham, A. J. (2005). “An autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination,” J. Acoust. Soc. Am. 117, 3816–3831.
Bernstein, L. R., and Green, D. M. (1987). “Detection of simple and complex changes of spectral shape,” J. Acoust. Soc. Am. 82, 1587–1592.
Bracewell, R. N. (1986). The Fourier Transform and Its Applications (McGraw Hill, New York), pp. 267–272.
Dau, T., Püschel, D., and Kohlrausch, A. (1996). “A quantitative model of the ‘effective’ signal processing in the auditory system. I. Model structure,” J. Acoust. Soc. Am. 99, 3615–3622.
de Boer, E. (1956). “Pitch of inharmonic signals,” Nature 178, 535–536.
de Cheveigne, A., and Pressnitzer, D. (2006). “The case of the missing delay lines: Synthetic delays obtained by cross-channel phase interaction,” J. Acoust. Soc. Am. 119, 3908–3918.
Finney, D. J. (1971). Probit Analysis (Cambridge University Press, Cambridge), pp. 1–333.
Glasberg, B. R., and Moore, B. C. J. (1982). “Auditory filter shapes in forward masking as a function of level,” J. Acoust. Soc. Am. 71, 946–949.
Glasberg, B. R., and Moore, B. C. J. (1990). “Derivation of auditory filter shapes from notched-noise data,” Hear. Res. 47, 103–138.
Gockel, H., Carlyon, R. P., and Plack, C. J. (2004). “Across-frequency interference effects in fundamental frequency discrimination: Questioning evidence for two pitch mechanisms,” J. Acoust. Soc. Am. 116, 1092–1104.
Goldstein, J. L., and Srulovicz, P. (1977). “Auditory-nerve spike intervals as an adequate basis for aural frequency measurement,” in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, London), pp. 337–346.
Green, D. M. (1988). Profile Analysis (Oxford University Press, Oxford), pp. 1–144.
Green, D. M., Onsan, Z. A., and Forrest, T. G. (1987). “Frequency effects in profile analysis and detecting complex spectral changes,” J. Acoust. Soc. Am. 81, 692–699.
Hacker, M. J., and Ratcliff, R. (1979). “A revised table of d′ for M-alternative forced choice,” Percept. Psychophys. 26, 168–170.
Heinz, M. G., Colburn, H. S., and Carney, L. H. (2001). “Evaluating auditory performance limits: I. One-parameter discrimination using a computational model for the auditory nerve,” Neur. Comput. 13, 2273–2316.
Hoekstra, A., and Ritsma, R. J. (1977). “Perceptive hearing loss and frequency selectivity,” in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, London, England), pp. 263–271.
Hopkins, K., and Moore, B. C. J. (2007). “Moderate cochlear hearing loss leads to a reduced ability to use temporal fine structure information,” J. Acoust. Soc. Am. 122, 1055–1068.
Houtsma, A. J. M., and Smurzynski, J. (1990). “Pitch identification and discrimination for complex tones with many harmonics,” J. Acoust. Soc. Am. 87, 304–310.
Ives, D. T., and Patterson, R. D. (2008). “Pitch strength decreases as F0 and harmonic resolution increase in complex tones composed exclusively of high harmonics,” J. Acoust. Soc. Am. 123, 2670–2679.
Javel, E. (1980). “Coding of AM tones in the chinchilla auditory nerve: Implications for the pitch of complex tones,” J. Acoust. Soc. Am. 68, 133–146.
Johnson, D. H. (1980). “The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones,” J. Acoust. Soc. Am. 68, 1115–1122.
Krumbholz, K., Patterson, R. D., and Pressnitzer, D. (2000). “The lower limit of pitch as determined by rate discrimination,” J. Acoust. Soc. Am. 108, 1170–1180.
Meddis, R., and O’Mard, L. (1997). “A unitary model of pitch perception,” J. Acoust. Soc. Am. 102, 1811–1820.
Moore, B. C. J. (1973). “Frequency difference limens for short-duration tones,” J. Acoust. Soc. Am. 54, 610–619.
Moore, B. C. J. (2012). An Introduction to the Psychology of Hearing, 6th Ed. (Brill, Leiden, The Netherlands), pp. 1–441.
Moore, B. C. J. (2014). Auditory Processing of Temporal Fine Structure: Effects of Age and Hearing Loss (World Scientific, Singapore), pp. 1–182.
Moore, B. C. J., and Ernst, S. M. (2012). “Frequency difference limens at high frequencies: Evidence for a transition from a temporal to a place code,” J. Acoust. Soc. Am. 132, 1542–1547.
Moore, B. C. J., and Glasberg, B. R. (1981). “Auditory filter shapes derived in simultaneous and forward masking,” J. Acoust. Soc. Am. 70, 1003–1014.
Moore, B. C. J., and Glasberg, B. R. (2010). “The role of temporal fine structure in harmonic segregation through mistuning,” J. Acoust. Soc. Am. 127, 5–8.
Moore, B. C. J., Glasberg, B. R., and Baer, T. (1997). “A model for the prediction of thresholds, loudness and partial loudness,” J. Audio Eng. Soc. 45, 224–240.
Moore, B. C. J., Glasberg, B. R., Flanagan, H. J., and Adams, J. (2006). “Frequency discrimination of complex tones; assessing the role of component resolvability and temporal fine structure,” J. Acoust. Soc. Am. 119, 480–490.
Moore, B. C. J., and Gockel, H. (2011). “Resolvability of components in complex tones and implications for theories of pitch perception,” Hear. Res. 276, 88–97.
Moore, B. C. J., Hopkins, K., and Cuthbertson, S. J. (2009). “Discrimination of complex tones with unresolved components using temporal fine structure information,” J. Acoust. Soc. Am. 125, 3214–3222.
Moore, B. C. J., Huss, M., Vickers, D. A., Glasberg, B. R., and Alcantara, J. I. (2000). “A test for the diagnosis of dead regions in the cochlea,” Br. J. Audiol. 34, 205–224.
Moore, B. C. J., and Sek, A. (2009a). “Development of a fast method for determining sensitivity to temporal fine structure,” Int. J. Audiol. 48, 161–171.
Moore, B. C. J., and Sek, A. (2009b). “Sensitivity of the human auditory system to temporal fine structure at high frequencies,” J. Acoust. Soc. Am. 125, 3186–3193.
Moore, G. A., and Moore, B. C. J. (2003). “Perception of the low pitch of frequency-shifted complexes,” J. Acoust. Soc. Am. 113, 977–985.
Oxenham, A. J., Micheyl, C., and Keebler, M. V. (2009). “Can temporal fine structure represent the fundamental frequency of unresolved harmonics?,” J. Acoust. Soc. Am. 125, 2189–2199.
Oxenham, A. J., and Shera, C. A. (2003). “Estimates of human cochlear tuning at low levels using forward and simultaneous masking,” J. Assoc. Res. Otolaryngol. 4, 541–554.
Oxenham, A. J., and Simonson, A. M. (2006). “Level dependence of auditory filters in nonsimultaneous masking as a function of frequency,” J. Acoust. Soc. Am. 119, 444–453.
Palmer, A. R., and Russell, I. J. (1986). “Phase-locking in the cochlear nerve of the guinea-pig and its relation to the receptor potential of inner haircells,” Hear. Res. 24, 1–15.
Patterson, R. D., and Nimmo-Smith, I. (1980). “Off-frequency listening and auditory filter asymmetry,” J. Acoust. Soc. Am. 67, 229–245.
Plomp, R. (1964). “The ear as a frequency analyzer,” J. Acoust. Soc. Am. 36, 1628–1636.
Schouten, J. F. (1940). “The perception of pitch,” Philips Tech. Rev. 5, 286–294.
Schouten, J. F., Ritsma, R. J., and Cardozo, B. L. (1962). “Pitch of the residue,” J. Acoust. Soc. Am. 34, 1418–1424.
Sek, A., and Moore, B. C. J. (2012). “Implementation of two tests for measuring sensitivity to temporal fine structure,” Int. J. Audiol. 51, 58–63.
Shackleton, T. M., and Carlyon, R. P. (1994). “The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination,” J. Acoust. Soc. Am. 95, 3529–3540.
Wakefield, G. H., and Viemeister, N. F. (1990). “Discrimination of modulation depth of SAM noise,” J. Acoust. Soc. Am. 88, 1367–1373.
Zwicker, E. (1956). “Die elementaren Grundlagen zur Bestimmung der Informationskapazität des Gehörs (The foundations for determining the information capacity of the auditory system),” Acustica 6, 356–381.
