Neural network models of sound localization based on directional filtering by the pinna

Chalapathy Neti a) and Eric D. Young
Department of Biomedical Engineering and Hearing Science Center, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205

Michael H. Schneider b)
Department of Mathematical Sciences, The Johns Hopkins University, Baltimore, Maryland 21218

(Received 1 April 1992; accepted for publication 5 August 1992)

Three-layer neural-network functions were developed to transform spectral representations of pinna-filtered stimuli at the input to a space-mapped representation of sound-source direction at the output. The inputs are modeled after transfer functions of the external ear of the cat; the output is modeled on the spatial sensitivity of superior colliculus neurons. Network solutions are obtained by backpropagation and by a method that enforces uniform task distribution in the hidden layer of the model. Solutions are characterized using bandlimited inputs to study the relative strength of potential sound localization cues in various frequency regions. This analysis suggests that the frequency region containing the first spectral notch (5-18 kHz) provides the best localization cues. Response properties of model neurons were studied using input patterns modeled after auditory nerve response profiles to pure tones at various frequencies and sound levels. The response properties of hidden layer model neurons resemble cochlear nucleus types III and IV and their composites. Neurons in both hidden and output layers show the properties of spectral notch detectors. Although neural networks have limitations as models of real neural systems, the results illustrate how they can provide insight into the computation of complex transformations in the nervous system.

PACS numbers: 43.64.Bt, 43.64.Ha, 43.64.Tk, 43.66.Qp

INTRODUCTION

The neural code for the location of acoustic stimuli is fundamentally different from the codes for tactile and visual stimuli. Locations on the body surface or in the visual field are encoded directly by activity at corresponding loci in the primary afferent populations. By contrast, spatial characteristics of a sound source must be extracted within the central auditory system from multiple monaural and binaural cues. Three general classes of cues have been demonstrated to be important for localization of sound sources: interaural time delays (ITDs), interaural level differences (ILDs), and spectral cues. The primary cues for detecting azimuth in the horizontal plane are ITDs created by the delay of the waveform to the distal ear and ILDs generated by the acoustic shadow of the head (Durlach and Colburn, 1978; Kuhn, 1987; Mills, 1972). Spectral cues produced by directional filtering of the pinna are the principal source of information about the elevation of a sound source (Blauert, 1983; Butler and Belendiuk, 1977; Hebrank and Wright, 1974; Roffler and Butler, 1968; Wightman and Kistler, 1989), but spectral cues can also provide information about source azimuth (Belendiuk and Butler, 1975; Musicant and Butler, 1984).

a) Current address: IBM, Boca Raton, FL 33429.
b) Current address: AT&T Bell Labs., Holmdel, NJ 07733.

J. Acoust. Soc. Am. 92 (6), December 1992

Spectral cues can be monaural, i.e., information can be conveyed solely by the spectrum of the sound in one ear; however, there is also a binaural component to spectral cues, in that ILD can be a strong function of frequency near spectral peaks and notches in either ear. That is, ILDs and spectral cues are different aspects of the same phenomenon. In this paper, we use "ILD" to refer to differences in stimulus level between the ears at a particular frequency or over a range of frequencies; by "spectral cues" we mean the variation in stimulus level across frequency either monaurally or binaurally.
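As a minimal sketch of the two definitions above, the following Python fragment computes an ILD at each frequency and a monaural spectral profile from per-channel levels in dB. The function names, the left-minus-right sign convention, the use of the mean level as the reference for the spectral profile, and the numerical values are all assumptions made for illustration; they are not taken from the models in this paper.

```python
def ild_spectrum(left_db, right_db):
    """ILD at each frequency: left-ear level minus right-ear level, in dB."""
    return [l - r for l, r in zip(left_db, right_db)]


def spectral_cue(levels_db):
    """Variation in level across frequency: each level relative to the mean, in dB."""
    mean_level = sum(levels_db) / len(levels_db)
    return [level - mean_level for level in levels_db]


# Hypothetical levels (dB) in four frequency channels.
# The right ear has a spectral notch in the third channel.
left_db = [10.0, 12.0, 11.0, 9.0]
right_db = [6.0, 8.0, -8.0, 5.0]

ild = ild_spectrum(left_db, right_db)   # binaural comparison per channel
right_cue = spectral_cue(right_db)      # monaural profile of the right ear
notch_channel = right_cue.index(min(right_cue))

print(ild)            # [4.0, 4.0, 19.0, 4.0]
print(notch_channel)  # 2
```

In this toy example the ILD is constant except in the channel containing the notch, which illustrates why ILD becomes a strong function of frequency near a notch in either ear.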

The directional filtering properties of the pinna have been studied by measuring the transfer function from free field to a point near the tympanic membrane in human subjects (Blauert, 1983; Hebrank and Wright, 1974; Mehrgardt and Mellert, 1977; Middlebrooks et al., 1989; Shaw and Teranishi, 1968) and in animals (e.g., Musicant et al., 1990 and Rice et al., 1992 in cats; Jen and Sun, 1988 in bats; Carlile, 1990 in ferrets). These measurements of pinna transfer functions show the existence of systematic cues that could potentially define source location uniquely, both in azimuth and elevation. The extraction of sound source location from spectral cues is the principal focus of this paper.

Figure 1 shows examples of pinna transfer functions¹ measured in anesthetized cats (Rice et al., 1992). These data are typical of transfer functions for sounds originating in front of a cat. Examination of the variation of transfer function shape as sound source direction changes reveals that the variations can be broadly categorized into three frequency regions: at frequencies below 5 kHz (AL region), the transfer functions have a relatively smooth spectral shape; in the midfrequency region between 5 and 18 kHz, transfer functions are characterized by a prominent notch in the spectrum (the first notch or FN); and in the high-frequency (HF) region above 18 kHz, transfer functions are characterized by complex and highly variable spectral shapes. Figure 1(a) illustrates the variation of transfer function shape with elevation at a fixed azimuth of 15°. As elevation increases, there is almost no change in gain in the AL region; there is an orderly increase in the frequency of the FN; and there are unsystematic complex changes in gain and transfer function shape in the HF region. Similar effects are seen for changes in azimuth at constant elevation in Fig. 1(b), except that now there is substantial gain increase across frequency as azimuth increases, i.e., as the stimulus source moves closer to the ear in which the measurement is being made. This gain change is the source of azimuth-dependent ILDs.

The fact that FN frequency varies with both elevation and azimuth suggests a scheme to specify uniquely the location of a sound source in the frontal region, using only the

0001-4966/92/123140-17$00.80 © 1992 Acoustical Society of America

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 155.33.16.124 On: Sat, 22 Nov 2014 10:15:01

[FIG. 1(a): pinna transfer functions at 15° AZ for several elevations (curve labels include -15.0°, 7.5°, and 30.0° EL), plotted against frequency (axis ticks at 2, 10, and 40 kHz), with the AL, FN, and HF regions marked. Plot not recovered.]

[Figure residue: remaining panels of FIG. 1 (axes "Frequency, kHz" and "Azimuth, degrees"; curve labels include 30° EL and 60° EL) and panels (c) and (d) of the hidden-unit figures (axis "Frequency, kHz"; legends "AZ = 0, EL = 52.5 to 90.0", "AZ = -30, EL = -30.0 to -7.5", and "AZ = -15, EL = -30.0 to -15.0"). Plots not recovered.]
FIG. 12. Properties of a hidden-layer unit from a fault-tolerant solution with monaural inputs. Layout same as Fig. 11. This unit has no spontaneous activity, so inhibitory regions cannot be seen in the response maps.

FIG. 13. Properties of a hidden-layer unit with type IV properties. Layout of the figure is the same as Fig. 11. Unit is from a solution with monaural inputs trained with backpropagation.

Figure 11(d) shows the input patterns for nine directions at which excitatory responses are obtained (listed in the legend). These spectra show FNs at frequencies between 8 and 11 kHz, where the connection strength has a broad negative peak and the response map shows inhibitory responses. The pattern of excitatory and inhibitory weights at other frequencies also shows a general correspondence to the spectral patterns in Fig. 11(d), so that this unit can be interpreted as a matched filter for such stimuli. The same interpretation can be made from the spatial response map, Fig. 11(b). Ignoring the four excitatory responses at high elevations, the excitatory response directions are below a diagonal line running from 22.5° elevation at -30° azimuth to -7.5° elevation at +30° azimuth. Figure 1(c) shows that this is approximately a line of constant FN frequency for a frequency of about 11.5 kHz. Below this line, the FN frequency is below 11 kHz, but above 8 kHz. Thus, the excitatory spatial response area corresponds to source directions with FN frequencies within the inhibitory region of the unit's response map; the large negative weighting of connections at these frequencies apparently prevents excitatory responses unless the stimulus has a notch near 10 kHz. The four excitatory responses at high elevations are outside the region of orderly FN behavior summarized in Fig. 1(c); transfer functions from these directions show a deep notch near 10 kHz and other features consistent with the connection strength diagram (see Rice et al., 1992, Fig. 5). Thus, all measures of response for the unit in Fig. 11 are consistent with one another.
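The matched-filter interpretation can be illustrated with a toy unit, sketched below in Python. The six-channel layout, the weights, the bias, and the sigmoid output function are hypothetical choices for illustration only and are not the trained weights of the unit in Fig. 11. The sketch shows only the qualitative point: strongly negative weights in one frequency band keep the unit suppressed unless the stimulus has little energy (a notch) in that band.

```python
import math


def unit_output(weights, bias, spectrum):
    """Sigmoid of the weighted sum of the input spectrum (one model unit)."""
    net = bias + sum(w * x for w, x in zip(weights, spectrum))
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical weights over six frequency channels. Channels 2 and 3 stand in
# for the 8-11 kHz region, where the connection strength is strongly negative.
weights = [0.5, 0.5, -2.0, -2.0, 0.5, 0.5]
bias = -1.0

flat = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]     # no notch: inhibition dominates
notched = [1.0, 1.0, 0.1, 0.1, 1.0, 1.0]  # notch falls on the negative weights

flat_out = unit_output(weights, bias, flat)
notched_out = unit_output(weights, bias, notched)

print(flat_out < notched_out)  # True
```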

Figure 12 shows a hidden unit from a fault-tolerant solution. In this case, lack of spontaneous activity conceals inhibitory regions and makes interpretation of the response map equivocal. The spatial response map [Fig. 12(b)] shows excitatory responses to high elevation stimuli; these stimuli have notches at high frequencies and a dip at 5-6 kHz, with a broad peak of energy near 10 kHz, as shown in Fig. 12(d). The connection strength plot [Fig. 12(c)] makes it clear how the unit gains its selectivity for these high elevation stimuli; the importance of the dip at 5-6 kHz is apparent, since it corresponds to the large negative weighting centered on 6 kHz. However, these features are less apparent in the frequency response map, because the inhibitory responses cannot be observed.
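Why a lack of spontaneous activity conceals inhibition can be seen in a small sketch. The Python fragment below assumes, purely for illustration, a unit whose rate is its spontaneous rate plus a weighted tone input, rectified at zero; the model units in this study are not claimed to have exactly this form, and the weights and rates are hypothetical. With a nonzero spontaneous rate an inhibitory channel shows up as a decrease in rate; with zero spontaneous rate it is indistinguishable from a channel with no input.

```python
def tone_response_map(weights, spontaneous, tone_level=1.0):
    """Rate evoked by a tone in each channel, rectified at zero."""
    return [max(0.0, spontaneous + w * tone_level) for w in weights]


def label_map(rates, spontaneous):
    """Label each channel E (above spontaneous), I (below), or 0 (no change)."""
    labels = []
    for rate in rates:
        if rate > spontaneous:
            labels.append("E")
        elif rate < spontaneous:
            labels.append("I")
        else:
            labels.append("0")
    return labels


# Hypothetical weights: one excitatory and one inhibitory channel.
weights = [0.0, 0.5, -1.0, 0.0]

with_spont = label_map(tone_response_map(weights, 0.5), 0.5)
no_spont = label_map(tone_response_map(weights, 0.0), 0.0)

print(with_spont)  # ['0', 'E', 'I', '0']
print(no_spont)    # ['0', 'E', '0', '0']
```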

Figure 13 shows a case in which the response map reveals very little about the response characteristics of the unit. Once again, however, the unit's spatial response map can be understood in terms of the connection strength diagram and the spectra of the input patterns giving excitatory responses. In this case, the unit gives excitatory responses to spectral patterns with a FN between 8 and 9 kHz and a broad peak slightly above 10 kHz [Fig. 13(d)]. The connection strength plot shows a region of large negative weighting at the frequency of the FN and a weak excitatory region near the energy peak in the stimuli. These features cannot be seen in the response map, which is entirely inhibitory; in fact, the small excitatory region in the response map just below 20 kHz corresponds to nothing in particular in the stimulus patterns in Fig. 13(d).

A second aspect of connectivity which is important for interpreting the functional role of a unit is its projective field (Lehky and Sejnowski, 1988), i.e., the targets and weighting of its outputs. In the case of the network solutions studied here, hidden units usually have projective fields which are predictable from their spatial receptive fields. Most hidden units in our solutions have spatial receptive fields like those shown in Figs. 11-13, with excitatory responses over some fairly large connected region of space and inhibitory or no responses at surrounding regions. The weighting of hidden units' outputs at the output layer reflects these responses in that positive weights appear at points in the output array representing regions of space where the hidden unit gives excitatory responses and negative weights appear at points representing regions where the hidden unit gives inhibitory responses (Neti, 1990). Thus the strategy adopted by the network solutions for carrying out the transformation from spectral patterns to sound source direction is to produce hidden units with a variety of spatial selectivities, which appear to be strongly influenced by FN frequency, and then map the hidden unit outputs to the output layer in a way that reflects the spatial sensitivity. Naturally, a wide variety of hidden unit spatial selectivities is seen and these patterns are not duplicated from solution to solution.

III. DISCUSSION

The purpose of this study is to understand the importance, for extracting sound location information, of the various features in the spectral representation of signals at the eardrum and to understand the types of neural systems that are useful for extracting that information. We show that a three-layer neural network can be trained to perform the transformation from spectral information to sound source direction with a variety of input coding schemes. Neural networks are highly simplified models of real neural systems, but they provide a means of studying those aspects of neural information processing that involve patterns of connectivity in hierarchical neural systems. The general nature of the solution to the problem studied in this paper involves generating hidden-layer units that respond to relatively large connected subregions of space and then mapping the outputs of those hidden units onto the output layer with connection weighting that parallels their spatial responsiveness. With this scheme, it is only necessary to generate a number of different hidden-layer spatial response profiles that is adequate to allow different source directions to be discriminated. Although we carried out only limited tests of the effects of hidden-unit number, it is clear that a relatively small number, less than 10, of hidden units is sufficient.

A. Importance of the FN cue

A principal result of the modeling studies is that the FN in pinna transfer functions is of paramount importance to the network solutions that were generated. This is true for all input coding schemes and training algorithms that were considered. The importance of the FN region to the computations is clearly demonstrated by the partial-information analysis of Fig. 7, but it is also suggested by other results. The output-layer units develop an excitatory-inhibitory-excitatory organization of their response maps that is strongly correlated with the spectra of stimuli in the FN region. In fact, in some situations (Fig. 9), the response maps can be interpreted as matched filters for the FN feature of the input patterns. In the hidden layer, spatial response maps frequently show excitatory response regions in which the FN frequency corresponds to deep negative peaks in the connection strength patterns from the input layer (Figs. 11 and 13). Finally, the most robust model configurations in the partial-information analysis are binaural models presented with FN information. This result is consistent with the hypothesis, presented in Fig. 1(c) (Rice et al., 1992), that comparison of the FN frequencies in the two ears is a powerful sound localization cue.
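A partial-information test of this kind amounts to presenting a model with only part of the input spectrum. The sketch below, in Python, shows one hypothetical way to do that: channels outside the chosen frequency regions are set to zero. The region boundaries (5 and 18 kHz) come from the text; the half-open treatment of the boundaries, the zero fill value, the function names, and the example spectrum are assumptions, and the procedure actually used for Fig. 7 may differ.

```python
# Region boundaries in kHz, taken from the text; treating each range as
# half-open (low <= f < high) is an assumption of this sketch.
REGIONS = {
    "AL": (0.0, 5.0),
    "FN": (5.0, 18.0),
    "HF": (18.0, float("inf")),
}


def region_of(freq_khz):
    """Return the name of the frequency region containing freq_khz."""
    for name, (low, high) in REGIONS.items():
        if low <= freq_khz < high:
            return name
    raise ValueError("frequency must be non-negative")


def band_limit(freqs_khz, spectrum, keep):
    """Zero every channel whose frequency lies outside the regions in keep."""
    return [x if region_of(f) in keep else 0.0
            for f, x in zip(freqs_khz, spectrum)]


# Hypothetical six-channel input pattern.
freqs_khz = [2.0, 4.0, 8.0, 12.0, 20.0, 30.0]
spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

fn_only = band_limit(freqs_khz, spectrum, {"FN"})
print(fn_only)  # [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]
```

Passing a set such as `{"AL", "FN"}` keeps two regions at once, which allows combinations of regions to be compared in the same way.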

The idea summarized in Fig. 1(c), that sound localization can be based on comparison of FN frequencies in the two ears, does not imply a particular mechanism for using the binaural FN information. In particular, the models do not extract FN frequencies for each ear separately in monaural calculations and then combine that information at a higher level. Instead, as was discussed above, the binaural models depend heavily on EI (or IE) units in the hidden layer. The weight patterns of these units tend to be symmetrical between the two ears (i.e., positive from one ear, negative from the other) with the largest weights in the FN frequency region; the shapes of the weight patterns are otherwise similar to those shown in Figs. 10-13 for monaural models. With these weight patterns, the binaural models seem to compute a form of binaural difference spectrum at the hidden layer and then use that information in a way similar to the way monaural spectra are used in the monaural models analyzed in detail above. Such binaural spectra have strong peak-notch features in the FN frequency region because of the occurrence of notches in the two ears at nearby frequencies (see Fig. 8 of Musicant et al., 1990 and Fig. 6 of Rice et al., 1992). The shapes of these peak-notch features convey the same information as FN frequency in the scheme of Fig. 1(c). The nervous system could do a similar sort of calculation in the lateral superior olivary nucleus or the inferior colliculus, where units are sensitive to ILDs (Irvine, 1986), and then analyze the information in the form of binaural spectral patterns.

Behavioral evidence that cats use the FN information is lacking at present. Cats do better in spatial discrimination tasks with broadband stimuli, in that their minimum audible angle is smaller for noise than for tones (Martin and Webster, 1987). This result is consistent with the use of a spectral cue like the FN; however, a similar result obtained in Old World monkeys (Brown et al., 1980) has been interpreted as implying that ITD cues derived from envelope fluctuations are important. It is not possible to discriminate these two hypotheses with current data, but if spectral cues are being used, the results reported here clearly predict that cats will show a particular reliance on the frequency region from 5 to 18 kHz when they are localizing sounds in the frontal field where FN frequency is a strong cue.

It is important to note that the importance of the FN cue is an emergent property of the network solutions. There are no constraints on the way in which the models combine information across frequency regions, except that combination of information across frequency is a key aspect of the transformations performed in the models. This sort of information processing may be an important aspect of complex auditory pattern recognition tasks in general, and the use of spectral cues for sound localization may provide a convenient, straightforward, and easily interpreted paradigm for studying complex stimulus processing in the auditory system.
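The binaural difference spectrum that the EI (or IE) hidden units in this section seem to compute can be sketched in a few lines of Python. The transfer functions and source spectra below are hypothetical dB values chosen for illustration. The sketch shows two things under that assumption: notches at nearby frequencies in the two ears produce an adjacent peak and notch in the difference spectrum, and, because levels in dB add, the source spectrum cancels in the difference.

```python
def ear_spectrum(source_db, transfer_db):
    """Level at the eardrum in dB: source spectrum plus transfer function gain."""
    return [s + h for s, h in zip(source_db, transfer_db)]


def difference_spectrum(left_db, right_db):
    """Binaural difference spectrum: left minus right in each channel (dB)."""
    return [l - r for l, r in zip(left_db, right_db)]


# Hypothetical transfer functions (dB) with notches in neighbouring channels:
# channel 2 in the left ear, channel 1 in the right ear.
left_tf = [0.0, -2.0, -20.0, -3.0]
right_tf = [-1.0, -18.0, -4.0, -2.0]

flat_source = [0.0, 0.0, 0.0, 0.0]
coloured_source = [5.0, -3.0, 2.0, 7.0]   # hypothetical non-flat source

diff_flat = difference_spectrum(ear_spectrum(flat_source, left_tf),
                                ear_spectrum(flat_source, right_tf))
diff_coloured = difference_spectrum(ear_spectrum(coloured_source, left_tf),
                                    ear_spectrum(coloured_source, right_tf))

print(diff_flat)      # [1.0, 16.0, -16.0, -1.0]
print(diff_coloured)  # [1.0, 16.0, -16.0, -1.0]
```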

B. Models with binaural input coding

A surprising result of the partial-information analysis of Fig. 7 is the lack of dependence of models with binaural input coding on information in the low frequency region. The acoustics of the head and pinna produce a very straightforward cue for azimuth in this frequency region, the difference in sound intensity in the two ears, or ILD (Irvine, 1987; Musicant et al., 1990, Fig. 8; Rice et al., 1992, Figs. 6 and 13). Intuitively, binaural models seem to be the best way to analyze ILDs. However, the changes in ILD as azimuth changes are reflected in monaural gain changes [Fig. 1(b)] that can be utilized in monaural models, at least if stimulus amplitude is fixed. In fact, network solutions with binaural input coding developed very little dependence on the AL region. This fact is illustrated by the weights from the input layer to the hidden unit layer; in binaural models, the weights in the AL regions tend to be near zero, whereas significant weights develop in the higher frequency regions (Neti, 1990).
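The qualification "at least if stimulus amplitude is fixed" can be made concrete with a toy calculation, sketched below in Python. The linear gain of 0.25 dB per degree of azimuth, the sign convention, and the stimulus levels are hypothetical and are not measured values. With a fixed stimulus level, the level in one ear identifies azimuth; if the stimulus level can vary, two different azimuths can give the same monaural level, whereas the ILD is unaffected by the stimulus level.

```python
SLOPE_DB_PER_DEG = 0.25  # hypothetical gain change per degree of azimuth


def ear_levels(stimulus_db, azimuth_deg):
    """Return (left, right) levels in dB; positive azimuth is toward the right ear."""
    shift = SLOPE_DB_PER_DEG * azimuth_deg
    return stimulus_db - shift, stimulus_db + shift


def ild(stimulus_db, azimuth_deg):
    """Right-minus-left level difference in dB."""
    left, right = ear_levels(stimulus_db, azimuth_deg)
    return right - left


# Two hypothetical stimuli that differ in both level and azimuth.
_, right_a = ear_levels(60.0, 20.0)   # 60 dB source at +20 degrees
_, right_b = ear_levels(65.0, 0.0)    # 65 dB source straight ahead

print(right_a == right_b)                 # True: monaural level is ambiguous
print(ild(60.0, 20.0), ild(65.0, 0.0))    # 10.0 0.0: ILD separates the two
```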

The reason for the low dependence of the binaural models on the AL region requires further study; however, four hypotheses for this behavior can be offered. First, the range of azimuths used in the training and test sets (-30° to +30°) is not wide. It may be that, over this range, the change in low-frequency ILD is not particularly large and the other cues, especially in the FN region, are strong enough that the low-frequency ILD cues are overwhelmed. As was mentioned above, significant ILD cues appear at frequencies above the AL region because of the binaural interaction of the complex transfer functions in the FN and HF regions (Musicant et al., 1990, Fig. 8; Rice et al., 1992, Fig. 6). These cues are at least as strong as those in the AL region, although they are more complex because the ILD varies strongly with frequency. The network solutions with binaural inputs seem to use these cues, because the weight patterns from the two ears to the hidden layer are large in the FN and HF regions and are sometimes mirror images there, i.e., in EI (or IE) hidden units.

Second, the range of elevations represented in the training and test sets is larger (-30° to +90°) than the range of azimuths (-30° to +30°). As a result, the network solutions are forced to weight information about elevation heavily, but ILDs in the AL region convey no information about elevation over this range of azimuths.

Third, the training and test patterns used in this study are all based on a flat-spectrum stimulus source as filtered by the head and pinnae. A flat-spectrum source simplifies the analysis problem, which amounts to separating the transfer function of the pinnae from the source spectrum. In fact, with a flat source spectrum, almost as much information about source location can be gained from comparison of stimulus level across frequency in a monaural model as can be gained by comparison of the two ears in a binaural model. In real life, the auditory system must contend with unknown source spectra, in which case comparison across frequency in a monaural model provides less certain results. With unknown spectra, the relatively simple transfer functions in the AL region might provide advantages for estimating azimuth, and might be weighed more heavily.

Fourth, most of the networks (and all the binaural networks) were trained and tested with fixed level stimuli. In this situation the average amplitude of the input pattern over any frequency region becomes a sufficient cue, especially for azimuth [see Fig. 1(b)]. The importance of ILD cues in general, and AL region cues in particular, is reduced. In the more realistic situation where stimulus amplitude is unpredictable, the ILD cues in the AL region are likely to be more important in the model.

C. Response properties of units compared with neurophysiology

The response properties of hidden units were examined with stimuli similar to those used in single unit studies in order to relate the types of units that develop in network solutions to neural response types that have been seen in the auditory system. This comparison has been done with neurons in the cochlear nucleus, because detailed response map data are available for the cochlear nucleus. The most interesting aspect of this analysis is the occurrence of units resembling the type IV principal neurons of the dorsal cochlear nucleus (Evans and Nelson, 1973; Spirou and Young, 1991; Young and Brownell, 1976). In a recent study of type IV units (Spirou and Young, 1991), a model for the organization of their response maps was presented which is similar to the model neurons described in Figs. 10 (left column) and 13. In this model, type IV units receive excitatory inputs at BF and stronger inhibitory input from interneurons whose BFs are usually slightly below the type IV BF, although the inhibitory tuning curves still overlap type IV BF. This organization is similar to that of the type IV units from the network solutions, in that weak excitatory areas are bounded by strong inhibitory areas; in both models, recruitment of the inhibitory inputs as tone level increases produces the nonmonotonic type IV response characteristic. In the same paper, it is shown that DCN type IV units are very sensitive to small changes in the width of a spectral notch centered on BF (Spirou and Young, 1991), leading to the speculation that type IV units are notch detectors, which serve a role in processing spectrally encoded information about sound location.
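How recruitment of a stronger inhibitory input at higher tone levels yields a nonmonotonic rate-level function can be sketched with a toy calculation. The Python fragment below is a hypothetical piecewise-linear illustration, not the model of Spirou and Young (1991) and not a unit from the network solutions: the spontaneous rate, gains, and thresholds are invented, and only a tone at BF is considered. Weak excitation with a low threshold raises the rate at low levels, and stronger inhibition with a higher threshold drives it back down as level increases.

```python
def type_iv_rate(level_db, spontaneous=5.0,
                 exc_gain=1.0, exc_threshold=10.0,
                 inh_gain=3.0, inh_threshold=30.0):
    """Rate at a given tone level: weak excitation minus stronger inhibition.

    All parameter values are hypothetical. The rate is rectified at zero.
    """
    excitation = exc_gain * max(0.0, level_db - exc_threshold)
    inhibition = inh_gain * max(0.0, level_db - inh_threshold)
    return max(0.0, spontaneous + excitation - inhibition)


levels = [0.0, 20.0, 30.0, 40.0, 50.0]
rates = [type_iv_rate(level) for level in levels]

print(rates)  # [5.0, 15.0, 25.0, 5.0, 0.0]
```

The rate rises above the spontaneous level at low tone levels and falls to zero at high levels, which is the qualitative shape of the nonmonotonic response described above.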

The examples shown in Figs. 12 and 13 point out the fact that neurophysiological response maps generated using tones as inputs do not necessarily reveal much about responses to natural complex stimuli. The lack of correspondence can be due either to the lack of spontaneous activity, which masks inhibitory components of the response map, or to the failure of response maps to reveal the extent of local variation of inhibition, as in cases like Fig. 13. This result shows that it is necessary to develop better techniques for characterizing complex units in the auditory system, and further suggests that techniques that reveal something about the patterns of connectivity of individual cells are desirable.

D. Limitations and future studies

The results described above demonstrate the usefulness of neural network models as one means of approaching the problem of information processing in the nervous system. However, there are a number of aspects of the methods used in this study that could be improved. First, the input patterns are idealizations of the auditory nerve population profiles with which the central nervous system is normally presented. In particular, the effects of cochlear filtering and nonlinearities and the variations in the shape of input patterns as stimulus level changes have not been modeled. The largest effects which have been ignored are the changes in the representation of spectra due to rate saturation (Sachs and Young, 1979; Young et al., 1990). Saturation effects were completely ignored for the pinna-filtered stimuli and were only partly accounted for in generating responses to tones. However, it seems likely that improvements in the accuracy of modeling of input patterns would not substantially change the conclusions of this study.

Second, we have used stimuli from a limited range of azimuths. This caused some problems of interpretation, noted above, and should be corrected. In addition, it would be interesting to extend the model to stimuli with spectra on the sides and behind the animal, where the FN is not as consistent and simple a cue.

Third, we have assumed a fixed, flat source spectrum and (usually) a fixed stimulus level. The effects of source spectrum on the organization and response properties of network solutions is a particularly interesting problem. Some aspects of this problem were discussed above and it will be interesting to see how the results change if nonflat source spectra are used in the training and test sets. For example, it is likely that nonflat source spectra will force increased reliance on binaural comparisons in all frequency regions, because comparison of the spectra in the two ears is the only way that the system can separate a completely unknown source spectrum from the pinna transfer functions, given that the transfer functions are different in the two ears. Similar comments can be made about stimulus level.

A final weakness of the present model is the lack of constraints on connections, which has resulted in unrealistic tuning and response properties of hidden units. As we gain more information about the response properties of cochlear nucleus units, or of units in more central auditory nuclei, it will be useful to constrain the hidden layers of neural network models to be tonotopically organized, so that each unit receives input from only a limited frequency range along the input layer, and to constrain their connections so that the response properties of hidden units resemble the responses of units in the real auditory system. By studying the patterns of projection that develop from such model central neurons when a network is trained, it should be possible to gain further insights into the value of such neurons for various auditory functions.

ACKNOWLEDGMENTS

This work was supported by Grant No. DC00115 from the NIDCD, Grant No. N0014-89-J-1390 from the ONR, and Grant No. ECS91-11548 from NSF. J. J. Rice provided the pinna transfer functions that were used as input patterns and Bertrand Delgutte provided the tuning curves that were used in the tone-response calculations. We are grateful for the assistance of Phyllis Taylor with graphics and to our colleagues in the Hearing Sciences Center for useful suggestions on the manuscript.

¹The term "transfer function" is used in this paper to refer to the magnitudes of transfer functions only; i.e., their phase spectra are not considered.

Aitkin, L., and Martin, R. (1987). "The representation of stimulus azimuth by high best frequency azimuth-selective neurons in the central nucleus of the inferior colliculus of the cat," J. Neurophysiol. 57, 1185-1200.
Aitkin, L., and Martin, R. (1990). "Neurons in the inferior colliculus of cats sensitive to sound-source elevation," Hear. Res. 50, 97-106.
Belendiuk, K., and Butler, R. (1975). "Monaural localization of low-pass noise bands in the horizontal plane," J. Acoust. Soc. Am. 58, 701-705.
Blauert, J. (1983). Spatial Hearing: The Psychophysics of Human Sound Localization (MIT, Cambridge, MA).
Brown, C., Beecher, M., Moody, D., and Stebbins, W. (1980). "Localization of noise bands by Old World monkeys," J. Acoust. Soc. Am. 68, 127-132.
Butler, R., and Belendiuk, K. (1977). "Spectral cues utilized in the localization of sound in the median sagittal plane," J. Acoust. Soc. Am. 61, 1264-1269.
Caird, D., and Klinke, R. (1983). "Processing of binaural stimuli by cat superior olivary complex neurons," Exp. Brain Res. 52, 385-399.
Carlile, S. (1990). "The auditory periphery of the ferret. I: Directional response properties and the pattern of interaural level differences," J. Acoust. Soc. Am. 88, 2180-2195.
Durlach, N., and Colburn, H. (1978). "Binaural phenomena," in Handbook of Perception, edited by E. Carterette and M. Friedman (Academic, New York), Vol. IV, pp. 365-466.
Evans, E., and Nelson, P. (1973). "The responses of single neurones in the cochlear nucleus of the cat as a function of their location and the anaesthetic state," Exp. Brain Res. 17, 402-427.
Gill, P., Murray, W., and Wright, M. (1981). Practical Optimization (Academic, New York).
Goldberg, J., and Brown, P. (1969). "Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: Some physiological mechanisms of sound localization," J. Neurophysiol. 32, 613-636.
Hebrank, J., and Wright, D. (1974). "Spectral cues used in the localization of sound sources on the median plane," J. Acoust. Soc. Am. 56, 1829-1834.
Hornik, K., Stinchcombe, M., and White, H. (1989). "Multilayer feedforward neural networks are universal approximators," Neural Networks 2, 359-366.
Irvine, D. (1986). The Auditory Brainstem (Springer-Verlag, Berlin).
Irvine, D. (1987). "Interaural intensity differences in the cat: Change in sound pressure level at the two ears associated with azimuthal displacements in the frontal horizontal field," Hear. Res. 26, 267-286.
Jen, P., and Sun, D. (1988). "Directionality of sound pressure transformation at the pinna of echolocating bats," Hear. Res. 34, 101-118.
Kim, D., and Molnar, C. (1979). "A population study of cochlear nerve fibers: Comparison of spatial distributions of average-rate and phase-locking measures of responses to single tones," J. Neurophysiol. 42, 16-30.

Knudsen,E. (1982). "Auditory and visualmapsof spacein the optic tectum of the owl," J. Neurosci. 2, 1177-1194.

Knudsen,E., andKonishi,M. (1978). "Spaceandfrequencyarerepresentedseparately in auditorymidbrainof theowl," J. Neurophysiol. 41, 870884.

Kuhn, G. (1987). "Physicalacousticsand measurements pertainingto directional hearing," in DirectionalHearing, edited by W. Yost and G. Gourevitch (Springer-Verlag,New York), pp. 3-25. Lehky,S., andSejnowski, T. (1988). "Networkmodelof shape-from-shading:neuralfunctionarisesfrombothreceptiveandprojective fields,"Nature 333, 452-454.

Liberman,M. (1982). "The cochlearfrequencymap for the cat: Labeling auditory-nervefibersof knowncharacteristicfrequency,"J. Acoust.Soc. Am. 72, 1441-1449.

Lippmann,R. (1987). "An introductionto computingwith neuralnets," IEEE ASSP Mag. April, pp. 4-22. Martin, R., and Webster,W. (1987). "The auditory spatialacuity of the domesticcat in the interaural horizontal and median vertical planes," Hear. Res. 30, 239-252.

Mehrgardt,S., andMellert, V. (1977). "Transformationcharacteristics of the external human ear," J. Acoust. Soc. Am. 61, 1567-1576.

Middlebrooks, J., and Knudsen, E. (1984). "A neural code for auditory spacein the cat'ssuperiorcolliculus,"J. Neurosci.4, 2621-2634. Middlebrooks, J., Makous, J., and Green, D. (1989). "Directional sensitivity of sound-pressure levelsin the humanear canal," J. Acoust.Soc.Am. 86, 89-108.

Mills, A. (1972). "Auditory localization," in Foundations of Modern Auditory Theory, edited by J. Tobias (Academic, New York), Vol. II, pp. 301-348.

Musicant, A., and Butler, R. (1984). "The influence of pinnae-based spectral cues on sound localization," J. Acoust. Soc. Am. 78, 1195-1200.

Musicant, A. D., Chan, J. C. K., and Hind, J. E. (1990). "Direction-dependent spectral properties of cat external ear: New data and cross-species comparisons," J. Acoust. Soc. Am. 87, 757-781.

Neti, C. (1990). "Neural Network Models of Sound Localization Based on Directional Filtering of the Pinna," Ph.D. thesis, The Johns Hopkins Univ.

Neti, C., Schneider, M., and Young, E. (1992). "Maximally fault tolerant neural networks," IEEE Trans. Neural Networks 3, 14-23.

Palmer, A., and King, A. (1985). "A monaural space map in the guinea-pig superior colliculus," Hear. Res. 17, 267-280.

Poggio, T., and Girosi, F. (1990). "Regularization algorithms for learning that are equivalent to multilayer networks," Science 247, 978-982.

Rice, J. J., May, B. J., Spirou, G. A., and Young, E. D. (1992). "Pinna-based spectral cues for sound localization in the cat," Hear. Res. 58, 132-152.

Roffler, S., and Butler, R. (1968). "Factors that influence the localization of sound in the vertical plane," J. Acoust. Soc. Am. 43, 1255-1259.


J. Acoust. Soc. Am., Vol. 92, No. 6, December 1992

Rumelhart, D., Hinton, G., and Williams, R. (1986). "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, edited by D. Rumelhart and J. McClelland (MIT, Cambridge, MA), pp. 318-362.

Sachs, M., and Abbas, P. (1974). "Rate versus level functions for auditory-nerve fibers in cats: tone-burst stimuli," J. Acoust. Soc. Am. 56, 1835-1847.

Sachs, M., and Young, E. (1979). "Encoding of steady-state vowels in the auditory nerve: Representation in terms of discharge rate," J. Acoust. Soc. Am. 66, 470-479.

Sachs, M., Winslow, R., and Sokolowski, B. (1989). "A computational model for rate-level functions from cat auditory-nerve fibers," Hear. Res. 41, 61-70.

Sejnowski, T., Koch, C., and Churchland, P. (1988). "Computational neuroscience," Science 241, 1299-1306.

Shaw, E., and Teranishi, R. (1968). "Sound pressure generated in an external-ear replica and real human ears by a nearby point source," J. Acoust. Soc. Am. 44, 240-249.

Shofner, W., and Sachs, M. (1986). "Representation of a low-frequency tone in the discharge rate of populations of auditory nerve fibers," Hear. Res. 21, 91-95.

Shofner, W. P., and Young, E. D. (1985). "Excitatory/inhibitory response types in the cochlear nucleus: Relationships to discharge patterns and responses to electrical stimulation of the auditory nerve," J. Neurophysiol. 54, 917-939.

Spirou, G., and Young, E. (1991). "Organization of dorsal cochlear nucleus type IV unit response maps and their relationship to activation by bandlimited noise," J. Neurophysiol. 66, 1750-1768.

Tsuchitani, C., and Boudreau, J. (1969). "Stimulus level of dichotically presented tones and cat superior olive S-segment cell discharge," J. Acoust. Soc. Am. 46, 979-988.

White, H. (1989). "Learning in Artificial Neural Networks: A Statistical Perspective," Discussion paper, UCSD, Department of Economics.

Wightman, F., and Kistler, D. (1989). "Headphone simulation of free-field listening. II: Psychophysical validation," J. Acoust. Soc. Am. 85, 868-878.

Wise, L., and Irvine, D. (1983). "Auditory response properties of neurons in deep layers of cat superior colliculus," J. Neurophysiol. 49, 674-685.

Yin, T., and Chan, J. (1990). "Interaural time sensitivity in medial superior olive of cat," J. Neurophysiol. 64, 465-488.

Yin, T., and Kuwada, S. (1984). "Neuronal mechanisms of binaural interaction," in Dynamic Aspects of Neocortical Function, edited by G. Edelman, W. Gall, and W. Cowan (Wiley, New York), pp. 263-313.

Young, E., and Brownell, W. (1976). "Responses to tones and noise of single cells in dorsal cochlear nucleus of unanesthetized cats," J. Neurophysiol. 39, 282-300.

Young, E., Rice, J., and Spirou, G. (1990). "Auditory-nerve rate representation of sound localization information present in spectra generated by direction-dependent pinna filtering," Soc. Neurosci. Abst. 16, 875.
