Relations among different measures of speech reception in subjects using a cochlear implant
W. M. Rabinowitz, D. K. Eddington, L. A. Delhorne, and P. A. Cuneo

Citation: The Journal of the Acoustical Society of America 92, 1869 (1992); doi: 10.1121/1.405252
View online: http://dx.doi.org/10.1121/1.405252
View Table of Contents: http://asa.scitation.org/toc/jas/92/4
Published by the Acoustical Society of America

Relations among different measures of speech reception in subjects using a cochlear implant

W. M. Rabinowitz
Research Laboratory of Electronics, Massachusetts Institute of Technology, Room 36-789, 50 Vassar Street, Cambridge, Massachusetts 02139 and Department of Otology and Laryngology, Harvard Medical School, Boston, Massachusetts 02115

D. K. Eddington
Research Laboratory of Electronics, Massachusetts Institute of Technology, 50 Vassar Street, Cambridge, Massachusetts 02139, Cochlear Implant Research Laboratory, Massachusetts Eye and Ear Infirmary, 243 Charles Street, Boston, Massachusetts 02114, and Department of Otology and Laryngology, Harvard Medical School, Boston, Massachusetts 02115

L. A. Delhorne
Research Laboratory of Electronics, Massachusetts Institute of Technology, 50 Vassar Street, Cambridge, Massachusetts 02139

P. A. Cuneo
Cochlear Implant Research Laboratory, Massachusetts Eye and Ear Infirmary, 243 Charles Street, Boston, Massachusetts 02114

(Received 11 September 1991; revised 21 May 1992; accepted 17 June 1992)

A comprehensive set of speech-reception measures was obtained in a group of about 20 postlingually deafened adult users of the Ineraid multichannel cochlear implant. The measures included audio, visual, and audiovisual recognition of words embedded in two types of sentences (with differing degrees of difficulty) and audio-only recognition of isolated monosyllabic words, consonant identification (12 alternatives, /Ca/), and vowel identification (8 alternatives, /bVt/). For most implantees, the audiovisual gains in the sentence tests were very high. Quantitative relations among audio-only scores were assessed using power-law transformations suggested by Boothroyd and Nittrouer [J. Acoust. Soc. Am. 84, 101-114 (1988)] that can account for the benefit of sentence context (via a factor k) and the relation between word and phoneme recognition (via a factor j). Across the broad range of performance that existed among the subjects, substantial order was observed among measures of speech reception along the continuum from recognition of words in sentences, words in isolation, speech segments, and the retrieval of underlying phonetic features. Correlations exceeded 0.85 among direct and sentence-derived measures of isolated word recognition as well as among direct and word-derived measures of segmental recognition. Results from a variety of other studies involving presentation of limited auditory signals, single-channel and multichannel implants, and tactual systems revealed a similar pattern among word recognition, overall consonant identification performance, and consonantal feature recruitment. Finally, improving the reception of consonantal place cues was identified as key to producing the greatest potential gains in speech reception.

PACS numbers: 43.71.Ky, 43.66.Ts, 43.71.Es, 43.64.Me

INTRODUCTION

It is now well established that cochlear implants provide useful benefits to speech communication for many postlingually deafened individuals. Significant studies are available covering open-set measures of speech reception (e.g., word, sentence, and speech tracking tests) and closed-set measures designed to reveal underlying cues that are being received (e.g., speech-segment recognition with natural and synthesized stimuli). These studies are available from centers testing users of the multichannel Nucleus device (Dowell et al., 1985, 1987; Blamey et al., 1987; Skinner et al., 1988, 1991), the multichannel Ineraid device (Eddington, 1983; Dorman et al., 1988, 1989a,b, 1990a), and from the University of Iowa, where users of these two multichannel devices as well as users of a variety of other single-channel and multichannel devices have been studied (Gantz et al., 1988; Tye-Murray and Tyler, 1989; Tyler et al., 1989a-c). Notwithstanding the extensive results from these and other studies, the quantitative relations among different measures of speech reception within implantees (and across devices) have received only limited attention. The main goal of this paper is to explore those relations in a group of about 20 individuals using the Ineraid device. Insofar as orderly relations are found, (a) they can help relate what sometimes appears as a disjoint set of descriptive measures of performance, (b) they can reduce future needs for conducting as many tests, and (c) they may aid in predicting performance with other auditory prosthetic devices (e.g., tactile aids) on the basis of more limited evaluation.

Articulation theory (ANSI, 1969) represents one framework for exploring such relations. Performance on a given speech-reception test increases monotonically as a function of the Articulation Index (AI), a quantity ranging from zero to one. For normal-hearing individuals, AI reflects speech-signal audibility: the proportion of the speech levels and spectral components available to the listener. Thus, decreasing speech-to-noise ratio and/or decreasing a system's passband decreases AI. The main uses of the theory follow from its assumption that a given AI, irrespective of how it is achieved, results in a given level of performance. Application to implantee speech reception is hampered by two facts. First, for most of the speech-reception tests that we use, the articulation functions (performance versus AI) have not been measured. Second, without speechreading cues, present implantees have equivalent AIs that are relatively low (for average performers equivalent AIs are probably near 0.1; for the best performers equivalent AIs may reach 0.4); hence, within the AI framework, one would be operating close to the "floor."

Boothroyd and Nittrouer (1988; Nittrouer and Boothroyd, 1990) suggested an alternative approach, using simple power-law equations for relating different speech-reception measures. The equations (given below) were motivated by the application of probability theory to a model of the effects of speech context. We have generalized their approach to allow for representation of other factors that can affect the relative difficulty among tests. For example, they determine a factor k that relates measures of word recognition among sentences with varying degrees of predictability (from near zero to high) but which are otherwise similar (with the same length and talker, and similar vocabularies). We will use k to derive an estimate of isolated word recognition from tests of word recognition in contextual sentences (of varying lengths) for subsequent comparison with an independent measure of isolated word recognition (obtained with a different talker and vocabulary). Similarly, Boothroyd and Nittrouer estimate a factor j by comparing word and phoneme recognition measures obtained from the same test words. Our use of j will be to estimate a phoneme recognition probability from measures of word recognition for subsequent comparison with independent measures of consonant and vowel identification accuracy (obtained, again, with a different talker and phoneme corpora). We will show that, when applied to the broad range of speech-reception abilities exhibited among our implantees, high correlations exist among direct and estimated measures of word and phoneme recognition. Finally, we will exploit the relation between consonant identification and word recognition to interpret and predict speech-reception performance with a wide variety of auditory prostheses.
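As a concrete sketch of these two power-law relations (our own code, not from the paper; function names and the demonstration values of k and j are illustrative only):

```python
# Power-law relations of Boothroyd and Nittrouer (1988), as used in the text:
#   sentence context:  ps = 1 - (1 - pw)**k   (words in sentences vs. isolated words)
#   word structure:    pw = pp**j             (whole words vs. constituent phonemes)
# All scores are proportions correct in [0, 1].

def words_in_sentences(pw, k):
    """Predict a words-in-sentences score from an isolated-word score."""
    return 1.0 - (1.0 - pw) ** k

def isolated_from_sentences(ps, k):
    """Invert the context relation: estimate an isolated-word score."""
    return 1.0 - (1.0 - ps) ** (1.0 / k)

def words_from_phonemes(pp, j):
    """Predict a word score from a phoneme score (j between 2 and 3 for meaningful words)."""
    return pp ** j

def phonemes_from_words(pw, j):
    """Invert the j relation: estimate a phoneme score from a word score."""
    return pw ** (1.0 / j)

# Illustrative values only: k = 4.5 (easy, high-context sentences), j = 2.5.
pw = 0.12
ps = words_in_sentences(pw, k=4.5)    # context boosts the word score substantially
pp = phonemes_from_words(pw, j=2.5)   # implied phoneme score exceeds the word score
```

Because both relations are strictly monotonic, each can be inverted exactly, which is what permits the sentence-derived and word-derived estimates compared later in the paper.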


I. METHODS

A. Implant system

The Ineraid cochlear implant (Richards Medical Co.) consists of an implanted electrode array, a percutaneous pedestal and connector, and an external sound processor (Eddington, 1983; Youngblood and Robinson, 1988). The sound processor has an ear-level input microphone, a wideband automatic gain control (AGC), and a four-channel overlapping bandpass filter system with crossover frequencies of approximately 0.7, 1.4, and 2.3 kHz. The four analog filter outputs are delivered (via the percutaneous connector) individually to four monopolar intracochlear electrodes, with a common return electrode in the temporalis muscle. Outputs from the lowest-frequency filter are delivered to the most apical electrode, etc. This strategy attempts to partially mimic the normal sound-frequency to cochlear-place transformation that occurs with acoustic hearing but is absent for electroauditory stimulation. Gain controls include user adjustments for input sensitivity (preceding the AGC) and volume (following the AGC) and channel-specific gains that are set (internally) for each individual.
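The four-channel band split described above can be sketched in a few lines. This is our own illustrative model, not the Ineraid signal path: ideal FFT filters stand in for the actual overlapping analog bandpass filters, and the outer band edges (100 Hz and 4.5 kHz) are assumptions; only the crossovers at 0.7, 1.4, and 2.3 kHz come from the text.

```python
import numpy as np

# Hypothetical sketch of an Ineraid-style four-channel band split.
FS = 10_000  # Hz; matches the 10-kHz sampling rate used for the test tokens
EDGES = [100.0, 700.0, 1400.0, 2300.0, 4500.0]  # crossovers per the text; outer edges assumed

def four_channel_split(x):
    """Return four band-limited signals; band 0 (lowest) would drive the most apical electrode."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    bands = []
    for lo, hi in zip(EDGES[:-1], EDGES[1:]):
        mask = (freqs >= lo) & (freqs < hi)     # ideal (brick-wall) passband
        bands.append(np.fft.irfft(X * mask, n=len(x)))
    return bands

# Example: a 1-kHz tone should emerge almost entirely in channel 2 (0.7-1.4 kHz).
t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 1000 * t)
bands = four_channel_split(tone)
energies = [float(np.sum(b ** 2)) for b in bands]
```

The sketch makes the frequency-to-place mapping explicit: each band feeds one electrode, ordered from apex (low frequencies) to base.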

B. Subjects

All of the subjects were postlingually deafened and implanted with the Ineraid cochlear prosthesis. Age at implantation ranged from 30 to 87 years (mean = 50, s.d. = 14) and duration of profound deafness ranged from 0.5 to 43 years (mean = 15, s.d. = 15). All but two subjects were implanted at the Massachusetts Eye and Ear Infirmary. Implantation began at MEEI in 1985 and has proceeded since 1987 (under an NIDCD-funded Program Project) at a rate of about five per year. The other two subjects were implanted in 1985 at St. Raphael's Hospital (New Haven, CT) as part of a program which has subsequently terminated; they joined our research program in early 1990.

C. Speech reception tests

Speech reception was assessed using a variety of measures that are part of our routine test battery. The tests are intended to document implantees' speech-reception abilities at different levels and provide information regarding the underlying cues that are being received.

Isolated word recognition, using only auditory input (no speechreading cues), was measured using the standardized NU6 monosyllabic word subtest of the minimal auditory capabilities (MAC) battery (Owens et al., 1985). This test uses a single recording of 50 test words spoken by a male talker on an audio cassette (Auditec, Inc.). There is a fixed interword answer time of 4.5 s, which is usually adequate but sometimes rushes a subject's answer. A few words on the recording are consistently identified incorrectly by normal-hearing listeners. Successive test administrations used the same words and presentation order but no feedback was provided.

Word recognition in sentences was measured using the videodisc recordings of the CUNY sentences (Boothroyd et al., 1985) and our own videotape recordings of the IEEE/Harvard sentences (IEEE, 1969; Grant and Braida, 1991). The CUNY corpus contains 60 lists of 12 sentences produced by a single female speaker. Each list contains one sentence each of three, four, five, ..., and up to 14 words, for a total of 102 words/list. The IEEE corpus contains 72 lists of ten sentences produced by one female talker and one male talker (fixed within a list and identified as LD and KG, respectively, in Grant and Braida, 1991). Each sentence contains five key words, for a total of 50 key words/list. The main difference between the CUNY and IEEE sentence sets concerns their level of difficulty. The CUNY sentences are relatively easy; they are everyday-like and have high internal predictability. Example sentences include "Cake is sweet.", "Take your baseball glove to the game.", and "Did you know that they both get season tickets for the opera every year?". In contrast, the IEEE sentences are more difficult, with few contextual cues to aid key-word identification. Example sentences (with key words italicized) include "Glue the sheet to the dark blue background." and "These days a chicken leg is a rare dish." For both sentence sets, individual lists were used only once per listener and no feedback was given.

These tests were conducted in three modes: V = vision (speechreading) alone, A = audio (implant) alone, and AV = audiovisual (implant plus speechreading). Scoring is based on counting all words (CUNY) and key words (IEEE) that are identified correctly verbatim, including plurality, verb tense, etc. (Such a "strict" scoring rule sometimes seems misrepresentative; however, it is straightforward to apply and informal evaluation of more lenient rules did not change scores appreciably.) Subjects dictated their answers to an experimenter who sat nearby; when there was any uncertainty as to what the subject said, it was resolved by verbal or, on rare occasions if necessary, written interaction before proceeding to the next sentence. In a typical test session, two or three lists were tested for each mode and corpus.

Speech segment recognition, using auditory input only, was assessed by measuring the identification of closed sets of consonants and vowels. Consonant recognition tests used a 12-alternative identification paradigm for the consonants /p,t,k,b,d,g,f,s,v,z,m,n/ presented in a /Ca/ context, e.g., pa, ta, ka, etc. Vowel recognition was measured using an eight-alternative identification paradigm for the monophthongal vowels /ɑ/ (as in "hot"), /æ/ (as in "cat"), /i/ (as in "beet"), /ɛ/ (as in "bet"), /ɪ/ (as in "hid"), /u/ (as in "boot"), /ʌ/ (as in "cut"), and /ʊ/ (as in "could"). The vowels were presented in a /bVt/ context, e.g., bat, bit, beet, etc.¹

The tokens for these tests were digitized (10-kHz sampling rate, 4.5-kHz bandwidth) natural productions from a single male speaker. Each syllable was represented by three tokens to reduce the influence of artifactual cues that may be available from use of a single token (e.g., Uchanski et al., 1991). The experiments were run under computer control in blocks of 72 trials for consonants (2 presentations × 3 tokens × 12 consonants) and 48 trials for vowels (2 presentations × 3 tokens × 8 vowels), with syllables and tokens drawn randomly without replacement. Test syllables were usually presented in isolation, without any carrier phrase. (For a few subjects, tests were also conducted preceding each stimulus presentation with the word "ready;" identification performance was unaffected.) Trial-by-trial feedback was not provided. Prior to the first block of testing in a session, the subject could request presentation of stimulus exemplars with feedback. In a typical test session, three to five blocks each for consonant and vowel identification were measured.

All tests were conducted with the subject seated within a sound-proof booth at 1-1.5 m from a small loudspeaker (Realistic Minimus 7 or Yamaha S10X) and, for audiovisual tests, about 1.5 m from a 48-cm color monitor (Sony KV2080). Sound was presented at a comfortable listening level and, while the subject was free to adjust the sound-processor input sensitivity and volume as desired, only minor adjustments to their normal settings were sometimes made.

D. Score selection

Each subject was administered the above tests repeatedly. As observed by others (Tyler et al., 1986; Dorman et al., 1990b; Spivak and Waltzman, 1990), performance typically exhibits various courses of improvement over time. With the exception of our most recent implantees, most subjects have now been followed sufficiently long (typically, more than 18 months of processor use) that asymptotic performance appears to have been reached. For this paper, we will focus on results obtained by averaging performance on the most recent test administrations. Measures obtained during the first 3 months of processor use, when learning effects are generally largest, have been excluded. Using these guidelines, the scores below are based typically on two administrations of the NU6 word test (with results for 23 subjects), six sentence lists for each test mode (V, A, and AV) for the two sentence corpora (20 and 22 subjects for CUNY and IEEE, respectively), and about 500 trials each for consonant and vowel recognition (18 subjects).

One subject (S05) has shown a pattern of results over time which is unique and contradictory to the "asymptotic" description given above. Following implantation in 1985 and improvements in performance over the next 12 months, his scores on a variety of speech reception tests then remained constant from 1987 through 1989. In January 1990 (60 months after processor fitting), routine administration of our sentence recognition tests revealed an obvious improvement in performance (i.e., his AV score on CUNY sentences increased from a previous average of 14.4% to 40.7%). During his next few laboratory visits, improvements were found on all other reception measures and since then (April 1990) his performance has again remained constant. The cause of this sudden and generalized improvement is unknown. Since two sets of "asymptotic" scores exist for this subject, both are included in our analyses of correlations among reception measures; for descriptive statistical summaries, however, only his more recent scores are used.

II. RESULTS

A. Isolated word recognition

Results from the NU6 monosyllabic word recognition test range from 0% to 45%, with a mean of 11.6% (s.d. = 13%, n = 23). The score distribution (see Fig. 1) includes five subjects unable to recognize any words (0%), 12 subjects with measurable but only modest recognition scores (from 2% to 14%), and six subjects with considerable word recognition performance (20% or more). Based on similar distributions of scores reported for other groups of Ineraid users as well as users of the multichannel Nucleus device (e.g., Gantz et al., 1988; Dorman et al., 1989b), this distribution appears representative of performance with present-day multichannel implant systems. However, with a mean score near 12%, this test is very difficult and, additionally, it is based on a single set of 50 words. How the NU6 score relates to other recognition measures will be of central interest below.

FIG. 1. Monosyllabic word recognition (NU6) scores arranged in increasing order for 23 subjects, with two values for S05 ("o" for old and "n" for new).

FIG. 2. Audiovisual word recognition in CUNY sentences (upper panel) and IEEE sentences (lower panel); results for V (open symbols) and AV (closed symbols) versus A. In this and subsequent figures, some data have been shifted by small amounts to reduce overlap among points.

B. Word recognition in sentences

Performance on the CUNY and IEEE sentences, for V, A, and AV modes, is summarized in Fig. 2. As with the NU6 scores, variability among subjects is large for each test mode and sentence set. Nevertheless, the audiovisual benefit due to the implant is clear: Average scores are V = 44%, A = 31%, and AV = 76% for the CUNY sentences and V = 18%, A = 13%, and AV = 57% for the more difficult IEEE sentences. A useful metric summarizing the benefit to speechreading is G = 100(AV - V)/(100 - V) (Tyler et al., 1990). This quantity expresses the percentage audiovisual gain normalized by the potential benefit possible given an individual's speechreading score. (It is analogous to the often-used correction for chance in closed-set tests, but with the individual's V score replacing chance performance.) The average Gs are 62% and 49% for the CUNY and IEEE sentences, respectively.

Four subjects obtain no benefit to speechreading using their implants. For these subjects AV = V and G = 0. The remaining subjects (those with AV > V) also exhibit positive scores on all three sound-only tests.² While these five "failures" must not be ignored, the results from the remaining group provide a useful benchmark against which to compare alternative prostheses (tactual aids and acoustic hearing aids). We suggest this because, with a particular prosthesis, it is useful to know not only the likelihood of a positive outcome but, given such an outcome, how large a benefit can be expected. For the subjects with an AV benefit, the average scores are V = 47% and 19%, A = 41% and 16%, AV = 88% and 71%, and G = 80% and 65% for the CUNY and IEEE sentences, respectively (and 15% for the NU6 words). On the easier CUNY materials, AV scores are often limited by ceiling effects. Clearly, on sentence tests such as these, the benefits due to the implant are routinely quite large.

The relation among the sound-only test scores will now be examined in greater detail. NU6 word and A-IEEE scores are similar, with averages of 12% and 13%, respectively; A-CUNY scores (average = 31%) are generally much higher (see Fig. 3), presumably due to the availability of context. In order to compare individuals' scores on these tests, we first need to compensate for these overall differences between the tests. Following Boothroyd and Nittrouer (1988), we express the average relation among these tests in terms of the following equation: ps = 1 - (1 - pw)^k, where ps is the proportion correct for words in sentences, pw is the proportion correct for isolated words, and the exponent k > 1 is a free parameter. Basically, the subjects are assumed to apply "top-down" knowledge of the language to reduce the error rate of word recognition within sentences. Boothroyd and Nittrouer show that the exponent k can be interpreted as reflecting statistically independent channels of information arising from contextual constraints. This interpretation can be generalized, however, to think of k as also reflecting other factors that influence the relative difficulty among tests (differences in talker, vocabulary, etc.). Operationally, as k increases, ps grows increasingly rapidly as a function of pw (see Fig. 4). Combining results across subjects, we determined a single k factor relating A scores for each sentence test to NU6 scores:³ k-CUNY = 4.50 and k-IEEE = 1.14. The relatively high k-CUNY reflects the low degree of difficulty for these sentences. For the IEEE sentences, with k just above unity, the benefit associated with the sentence context is indeed small.

Using these two values of k, estimates of isolated word recognition were then derived from each subject's A-CUNY and A-IEEE scores. Specifically, the equation above was solved for pw as a function of ps. These derived isolated word scores (denoted A-CUNY-k and A-IEEE-k) correlate strongly with the measured NU6 word score (see Fig. 5): Pearson product-moment correlations are r = 0.93 and 0.85 for A-CUNY-k and A-IEEE-k, respectively (and r = 0.93 between A-CUNY-k and A-IEEE-k). Since these transformations remove the average differences among the tests for our subject population, the remaining deviations (from r = 1.0) primarily reflect (a) intersubject differences in the ability to use context and (b) the effects of the limited sample sizes used to estimate all of these scores (e.g., NU6 is based on a particular 50 words). These deviations are generally modest but the outliers are notable. On the CUNY sentences, one subject has particularly low A-CUNY-k = 2.4% given her NU6 = 11%; on the IEEE sentences, one subject has particularly high A-IEEE-k = 64.5% given her NU6 = 36%. In the main, however, these three tests yield consistent measures of performance, if one first accounts for average test difficulty. This consistency suggests that repeated, but infrequent, use of the same NU6 test may not be a serious problem.

In subsequent analyses, we use both the NU6 score and an average of the three word scores (NU6, A-CUNY-k, and A-IEEE-k), denoted WORD, to characterize isolated word recognition. That is, we assume that the three component scores provide independent estimates of a subject's "true" score. This WORD measure has very high correlations with the three component scores (average r = 0.97), in part because 1/3 of WORD is each component score but also because of averaging the other two components.

FIG. 3. Relation between word recognition in CUNY sentences and that in isolation (NU6). Line drawn is from the equation given later in the text, with k = 4.50.

FIG. 4. Illustrative curves for the effects of different k factors (the parameter alongside each curve). Because our NU6 scores do not exceed 50%, the curves are shown only over this range for the abscissa.

FIG. 5. Relation between estimates of isolated word recognition derived from sentence scores, A-CUNY-k (○) and A-IEEE-k (×), and NU6 isolated word score.
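The derived-score computations used in this section can be made concrete in a short script. This is our own illustrative sketch (the function names and the sample subject's scores are invented); only the k values and the G and WORD definitions come from the text:

```python
# Sketch of the derived-score computations described in Sec. II B (our own code).
K_CUNY, K_IEEE = 4.50, 1.14  # context factors fitted in the text

def derived_isolated(ps, k):
    """Solve ps = 1 - (1 - pw)**k for pw: a sentence-derived isolated-word score."""
    return 1.0 - (1.0 - ps) ** (1.0 / k)

def speechreading_gain(av, v):
    """G = 100(AV - V)/(100 - V): audiovisual gain normalized by headroom (percent)."""
    return 100.0 * (av - v) / (100.0 - v)

def word_composite(nu6, a_cuny, a_ieee):
    """WORD: mean of NU6 and the two sentence-derived isolated-word scores (proportions)."""
    return (nu6
            + derived_isolated(a_cuny, K_CUNY)
            + derived_isolated(a_ieee, K_IEEE)) / 3.0

# Hypothetical subject: NU6 = 12%, A-CUNY = 31%, A-IEEE = 13%, V = 44%, AV = 76%.
word = word_composite(0.12, 0.31, 0.13)
g = speechreading_gain(76.0, 44.0)
```

Note how the large k for the easy CUNY sentences pulls a 31% sentence score down to a much lower implied isolated-word score, while the IEEE score (k near 1) is left nearly unchanged.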

C. Segmental recognition

Stimulus-response confusion matrices were formed from the raw identification data obtained in the tests with 12 consonants (denoted C12) and eight vowels (denoted V8). The percent-correct scores for C12 and V8 (available for 19 subjects) exhibit a broad range from near chance (8.3% for C12 and 12.5% for V8) up to about 85%. Direct correlations of these C12 and V8 scores with the two word recognition measures NU6 and WORD yield a range of r = 0.77-0.81. While moderately high, these correlations are limited by an underlying nonlinear relationship: In contrast to the broad distribution of segmental scores, the word recognition measures are bunched toward low values (mean near 12%) and initially improve only slowly while segmental scores increase rapidly. If this curvilinear dependence is removed, higher linear correlations result.

To expand word-score differences near zero, transformations of the form (NU6)^(1/j) and (WORD)^(1/j), with 1/j < 0.5, were explored. These transforms were motivated (once again following Boothroyd and Nittrouer, 1988) by reasoning that recognition of a nonsense CVC word requires identification of three phonemes: the initial consonant, the medial vowel, and the final consonant; assuming equal probabilities for all three phonemes (pp), one expects word recognition pw = pp^3. When, as in our case, word recognition is assessed with meaningful words, language constraints reduce the number of independent elements and, therefore, on the average one need not perfectly recognize all three phonemes, but probably between two and three. Inverting this relation leads to the prediction that segmental scores should relate linearly with pw^(1/j), with 2 < j < 3.

FIG. 6. Relation between segmental reception (consonants C12 above; vowels V8 below) and transformed-WORD scores (obtained using 1/j = 0.375). Lines drawn are the best-fit linear regression with indicated correlation (r). In each panel, the lower abscissa shows WORD values on the transformed scale and the upper abscissa is linear in transformed units.

FIG. 9. Relation between V8 feature and overall ITs.
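The linearizing transform described above is simple to reproduce; this sketch is our own (the subject scores below are invented for illustration), using the 1/j = 0.375 value from the text and an ordinary least-squares line as in Fig. 6:

```python
import numpy as np

# Our own sketch of the linearizing transform: raise WORD scores (proportions)
# to the power 1/j so that segmental scores can be fit with a straight line.
INV_J = 0.375  # value used in the text (Fig. 6)

word = np.array([0.02, 0.05, 0.10, 0.20, 0.35, 0.45])  # hypothetical WORD scores
c12 = np.array([20.0, 30.0, 42.0, 55.0, 68.0, 75.0])   # hypothetical C12 % correct

x = word ** INV_J                          # transformed-WORD axis
slope, intercept = np.polyfit(x, c12, 1)   # best-fit linear regression
r = np.corrcoef(x, c12)[0, 1]              # linear correlation on the transformed scale
```

The compressive exponent spreads out the low word scores, removing the curvilinear dependence so that a single correlation coefficient summarizes the relation.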

identification of factors that underlie the large differences in speech reception exhibited among implantees. Potentially useful factors include psychophysical measures that are based on simpler, nonspeech stimuli and objective measures based on electrically evoked responses. Given the high correlations described above, the specific choice of speech-reception measure used for such studies can only be of minor influence. However, as noted for our results, simple linear correlations upon measured variables may underestimate the strength of the relations.

Other implications derive from the extent to which the results generalize with respect to variations in speech-reception tests and variations in prostheses.

A. Test variations

Boothroyd and Nittrouer (1988) use the factor j to relate word and phoneme scores obtained from separate scorings of the same test (same items, talker, and recording). In this case, j can be relatively insensitive to the particular choice of test items. Our use of j to relate word-derived and independent measures of segmental reception is susceptible to greater variation (although for our data, variation of j between 2 and 3 produced systematic but only small changes in correlation). In particular, if the difficulty of one test were substantially increased, by using much larger stimulus sets in the segmental tests and/or talkers and recordings that were particularly less clear, the quantitative relation between percent-correct segment identification and (WORD)^(1/j) could change.

For relating sentence and word recognition, k factors for other corpora and recordings would be expected to be unique. For example, the commonly used CID sentences (of which there are only 10 lists) and the BKB sentences (21 lists, developed in England; Bench and Bamford, 1979) are both relatively simple; their k factors are probably closer to that for the CUNY sentences (k = 4.50) than the IEEE sentences (k = 1.14).

Sentence tests potentially tap linguistic knowledge (and suprasegmental perception) which could vary greatly across subjects. Why then are there high correlations between isolated word scores derived from sentence tests and NU6 scores? By design, the sentence tests have been constructed to require only a modest language level, which most of our subjects certainly possess. This serves to substantially diminish the contribution that language-level differences could make, thereby facilitating the high correlations we observe.

B. Application to other prostheses

The relations observed here should aid in interpreting and predicting the benefits to speech reception that occur with other auditory prostheses. To a great extent, the overall relations are general attributes of speech reception, not dependent specifically upon the Ineraid device. To illustrate the use of these relations, we will draw upon a variety of studies of limited auditory and tactual presentation of speech. We will focus on measures of consonant identification performance and word recognition, when measured, as well as that predicted using the results in Figs. 6 and 7. Only measures obtained with sound alone (i.e., without speechreading) will be considered.⁷ In studies where multiple conditions were tested, the best results are used. In some studies feature scores were given in terms of percent correct and/or not for the three features manner, voicing, and place. We estimated percent ITs using (a) the raw confusion matrix, (b) a simulation of the relation between percent correct and percent IT, and/or by (c) averaging available component feature scores.
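Where a raw confusion matrix is available (option (a) above), percent information transfer can be estimated in a few lines. This is the standard maximum-likelihood estimate of relative transmitted information in the Miller-Nicely tradition, in our own code:

```python
import math

def percent_it(confusion):
    """Percent transmitted information, T(x;y)/H(x), from a stimulus-by-response count matrix."""
    n = float(sum(sum(row) for row in confusion))
    p_x = [sum(row) / n for row in confusion]                      # stimulus probabilities
    p_y = [sum(confusion[i][j] for i in range(len(confusion))) / n
           for j in range(len(confusion[0]))]                      # response probabilities
    t = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count > 0:
                p_xy = count / n
                t += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))    # mutual information
    h_x = -sum(p * math.log2(p) for p in p_x if p > 0)             # stimulus entropy
    return 100.0 * t / h_x

# Perfect identification transmits all the information; pure guessing transmits none.
perfect = [[10, 0], [0, 10]]
guessing = [[5, 5], [5, 5]]
```

Feature ITs (MAN, VOI, PLA) follow from the same function after pooling the rows and columns of the full matrix into the feature classes (e.g., voiced versus voiceless).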

A summary of illustrative studies with auditory presentation of signals is given in Table III. Results from the first three studies are from normal-hearing listeners with "impoverished" signals. Faulkner et al. (1989) presented voice fundamental frequency (F0) during intervals of voiced speech and an aperiodic noise (Nx) during intervals of voiceless (but not silent) speech. Van Tasell et al. (1987) and Rosen (1989) presented different forms of speech (or its envelope) modulating a wideband noise (SMN); their purpose was to eliminate all spectral cues while attempting to preserve temporal cues available from the speech signal. For these three studies, manner (MAN) and voicing (VOI) information is sometimes high (above roughly 50%), but place (PLA) information is very low (below 11%). The overall (OVL) information does not exceed 45% (nor does percent correct). While these signals can provide support for speechreading, their audio-only word recognition is expected to be very low (not more than a few percent). The fourth study, from Rosen et al. (1989), shows average results from four users of the House/3M single-channel implant; these scores are similar to those from the three previous studies. [Carney et al. (1990) present results from an "average sample" of eight "successful" House/3M users; their scores appear lower than those in Rosen et al. (1989).]

The next three studies illustrate that single-channel electrical stimulation is capable of yielding much better performance. Shannon et al. (1991) report results from an exceptional performer with a single-channel auditory brainstem implant (ABI) that was driven with a pulsatile system designed to maximize temporal/envelope information. Rabinowitz and Eddington (1989) tested an exceptional Ineraid user (S04, see Fig. 1) in a single-channel analog condition where the four filter outputs were summed and delivered to a single electrode. In both studies, high MAN and VOI scores were found, PLA increased to about 35%, and OVLs (and percents correct) were above 60%. Based on the OVL and PLA scores, we would predict word scores of about 15%; measured scores were 12% (NU6, Shannon et al.) and 15% (NU6 and IEEE sentences, Rabinowitz and Eddington). Edgerton and Brimacombe (1984) report limited data on four subjects using an "optimized" House/3M device. Consonant identification ranged from 13% to 59% correct. The score from the best subject (59%) is similar to that of the two previous examples.

Our multichannel results are given as the next two entries. Compared to the exceptional single-channel examples cited above, results for our "average" multichannel user (see Sec. II) are similar for PLA and somewhat lower otherwise; the word score for this group averaged 14%. Results for our "better" subjects are higher, particularly for PLA (averag-


TABLE III. Consonant identification performance from several studies (see text for details).

                                                                                          % Correct   % Information transmitted
No.     Study                             Subjects       Signal processing        Test       OVL        OVL   MAN   VOI   PLA

1       Faulkner et al. (1989)            5 N            Acoustic F0 + Nx         12 /aCa/    23         45    19    48     8
2       Van Tasell et al. (1987)          12 N           Acoustic SMN             19 /aCa/    35        ≈40    37    46   <10
3       Rosen (1989)                      2 N            Acoustic SMN             12 /aCa/    45        ≈40    50    70    11
4       Rosen et al. (1989)               4 CI           1 Channel House/3M       12 /aCa/    22        ≈50    53    56    11
5       Shannon et al. (1991)             1 ABI          1 Channel Pulsatile      16 /aCa/    62         68    73    86    41
6       Rabinowitz and Eddington (1989)   1 CI           1 Channel Analog         12 /Ca/     65         64    94    60    30
7       Edgerton and Brimacombe (1984)    1 CI           1 Channel "Optimized"    12 /aCa/    59        (Insufficient data)
                                                         House/3M
8(a)    Rabinowitz et al. (1992)          "Average" CI   4 Channel Analog         12 /Ca/     53         54    67    37    30
8(b)    Rabinowitz et al. (1992)          "Better" CI    4 Channel Analog         12 /Ca/     75         74    88    61    54
9(a)    Blamey et al. (1987)              8 CI           22 Channel F0F1F2        12 /aCa/    37         48   ≈50    55    33
9(b)    Blamey et al. (1987)              Best 2         22 Channel F0F1F2        12 /aCa/    78         74   ≈70    87    62
10(a)   Skinner et al. (1991)             5 CI           22 Channel WSP           12 /aCa/    57         55    64    53    18
10(b)   Skinner et al. (1991)             5 CI           22 Channel MSP           12 /aCa/    60         59    81    61    19

ing 54%) and, as well, for WORD (averaging 34%).

The next two entries are from Blamey et al. (1987) and are illustrative of performance with the 22-channel Nucleus device. Results were reported for 28 subjects, subgroups of which were tested using a variety of encoding strategies; we consider only the data obtained with the "F0F1F2" strategy. The average results from the eight users tested are somewhat below those for our "average" user, except for the PLA score of 33%, which is similar to our 30%. The results from their two best users are much higher (and very similar to ours), with high values for all features and OVL. Word scores were not given.

The final two entries are from Skinner et al. (1991). They tested five users of the 22-channel Nucleus device, with both the latest implementation of the F0F1F2 strategy (WSP) and the newer "multipeak" strategy (MSP) that supplies F0F1F2 encoding plus some band-energy level information. As noted by Skinner et al., consonant-identification performance was very similar for both strategies but, interestingly, word recognition was different, averaging 13% for WSP and (much higher) 29% for MSP. Clearly, this pattern of results is at variance with our picture. The WSP word score of 13% is roughly consistent with the consonant scores. For the MSP word score of 29%, we would predict improved consonant recognition (above ≈70% correct), which could occur only by increasing PLA substantially above the observed 19%.

In the case of tactual presentation of speech, the completely novel representations that must be learned imply that the time required to obtain asymptotic performance on word and sentence tasks is likely to be considerable (as in, say, learning a new language; see also Cowan et al., 1988). However, performance on the identification of isolated speech segments will asymptote much more quickly. Conceivably, connected speech may impose special demands upon the tactual system and, therefore, performance on isolated segments may not predict asymptotic word and sentence recognition (e.g., Sparks et al., 1979; Weisenberger et al., 1989). Nevertheless, we will use segmental performance, together with the above relations, to place an upper bound on the levels of word recognition that might be possible.

Despite much previous research on tactual speech communication systems (for review, see Reed et al., 1989; CHABA, 1991), only a limited number of recent studies report segmental identification results.

Carney (1988) evaluated a single-channel aid and a 24-channel, thigh-worn tactile vocoder, and tested live-voice presentation of 20 consonants (/Ca/). Training was limited to only eight sessions over 3 weeks; results were similar and relatively low for both devices (percent correct = 28%). These results are roughly similar to those from the first four studies in Table III, corresponding to near-zero predicted word recognition.

Blamey et al. (1988) evaluated the eight-channel, hand-worn electrotactile "Tickle Talker" that implements an F0F2 encoding strategy and tested live-voice presentation of 12 consonants (/oCo/). Following 70 h of training over 6 months, seven subjects (normal hearing but artificially deafened) obtained an average score of 47 percent correct and IT measures of OVL = 48%, MAN = 35%, VOI = 66%, and PLA = 9%. The overall scores are consistent with a potential word score of ≈8%, but the very low score for place probably makes this an overestimate.

Weisenberger et al. (1989) evaluated the 16-channel vibratory "Queens" device that implements a spectral display and is worn on the forearm (Brooks and Frost, 1983). Live-voice presentation of 21 consonants (/Ca/) was tested. Overall percent correct was 52%. This performance, especially on a 21-consonant test, is relatively high and corresponds to a word score approximately in the range 10%-15%, which is close to our average multichannel implant performance. Interestingly, Brooks et al. (1986) report on a single subject who, after about 200 h of training, was able to identify 88 of an open set of 1000 test words, including one-, two-, and three-syllable words, presented up to three times if the answer was incorrect, but without any feedback. This measured score of 8.8% demonstrates that open-set word recognition with a tactile device is indeed possible, and is close to the value that we predict from segmental performance.

Finally, Tadoma is a natural (i.e., no device) tactual speech communication method used by certain highly trained, deaf-blind individuals. The user places a hand on the face of a talker and directly monitors articulatory actions (jaw and lip positions, laryngeal vibration, oral airflow, etc.). In an evaluation by Reed et al. (1985), nine users were tested on the identification of 24 consonants [/Ca/; see also Henderson (1989)]. ITs from one user were OVL = 69%, MAN = 56%, VOI = 74%, and PLA = 53%. Monosyllabic word recognition averaged 40% (range 26-56%), and three proficient users who were also tested on the IEEE sentences obtained scores from 45% to 60%. All of these scores are very high, similar to those of the best implantees, and consistent with the relations we have identified above.
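The word-score predictions used throughout this section (segment scores bounding word recognition) rest on the power-law transformation of Boothroyd and Nittrouer (1988) relating segmental and whole-word probabilities, pw = p^j. A numerical sketch only, with j = 2.5 assumed purely for illustration (the paper fits its own value, and its predictions also weigh PLA separately):

```python
def predict_word_score(p_seg: float, j: float = 2.5) -> float:
    """Power-law mapping from segmental to isolated-word recognition,
    p_w = p_seg**j (Boothroyd and Nittrouer, 1988).
    j = 2.5 is an illustrative assumption, not this paper's fitted value."""
    return p_seg ** j

# A 52% segment score (the Queens-vocoder result) maps to roughly 20%
# with j = 2.5; a larger j (longer or harder words) lowers the prediction.
word = predict_word_score(0.52)
```

The monotone dependence on j is the point: the same segmental score bounds word recognition more tightly as the number of independently received segments per word grows.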

We are grateful for the continuing and enthusiastic cooperation of the subjects throughout the research. Helpful suggestions and comments were provided by L. D. Braida, K. W. Grant, M. Svirsky, P. J. Blamey, two anonymous reviewers, and the editor. This work was supported by Grant No. PO1 DC00361 (Dr. J. B. Nadol, P.I.) from the National Institute on Deafness and Other Communication Disorders.

For most implantees the audiovisual gain in tests of sentence reception is generally very high. Particularly for simple everyday-like sentences, this gain is of limited utility in differentiating the degree of benefit provided by the implant. Relations among audio-only measures of performance support the following conclusions. (1) Among implantees, word recognition and segmental performance are highly correlated (r > 0.85) if a sufficient range of performance is considered. Over restricted ranges, e.g., among subjects with intermediate levels of performance, correlations are only moderate (r ≈ 0.5). (2) Results with tactual presentation also exhibit consistent patterns between consonant and word recognition. Modest levels of word recognition appear possible with at least one existing device. Demonstrating similar results in a larger subject population with a wearable device is an important next step. (3) The performance of exceptional single-channel implant subjects is indeed intriguing (see also Hochmair-Desoyer et al., 1985, and Tyler et al., 1989a,b), both in terms of their overall performance and their modest levels of place cue reception. However, when groups of either average or best performers are compared, subjects using multichannel devices obtain superior speech reception. (4) Increasing reception of consonantal place cues is key to further improvements in speech reception (see also Tye-Murray and Tyler, 1989). Manner and voicing

We chose a /bVt/ context because it minimizes durational cues to vowel identity (within natural productions), thereby providing a better opportunity to reveal reception of formant-frequency distinctions that are more salient (and, therefore, important) in running speech.

The exceptions are S12, with Gs (CUNY and IEEE) of 4% and 6%, scores of 0% on the audio-only CUNY and IEEE tests but 2% on NU6, and S13, with Gs of 62% and 42%, 7% on audio-only CUNY, 1% on audio-only IEEE, and 0% on NU6.

Using the audio-only CUNY and IEEE scores as estimates of ps (for each test) and NU6 as an estimate of pw, we then averaged values of k = log(1 - ps)/log(1 - pw); however, because measurement errors on very low (and high) probabilities unduly influence estimation of k (Boothroyd and Nittrouer, 1988), we excluded subjects with NU6
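This k estimation can be sketched numerically. Assuming hypothetical scores for a single subject (the helper name is ours), k follows from the Boothroyd and Nittrouer (1988) relation ps = 1 - (1 - pw)^k:

```python
import math

def context_factor_k(p_s: float, p_w: float) -> float:
    """Sentence-context factor k from the Boothroyd-Nittrouer relation
    p_s = 1 - (1 - p_w)**k, i.e. k = log(1 - p_s) / log(1 - p_w).
    p_s: proportion of words correct in sentences (e.g., CUNY, IEEE)
    p_w: proportion of isolated words correct (e.g., NU6)."""
    return math.log(1.0 - p_s) / math.log(1.0 - p_w)

# Hypothetical subject: 60% of words in sentences, 30% on NU6.
k = context_factor_k(0.60, 0.30)
# Scores near 0% or 100% make the logarithms blow up, which is why
# such subjects were excluded before averaging k.
```

k > 1 indicates that sentence context boosts word recognition above the isolated-word level, as observed for most implantees here.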
