JOURNAL OF NEUROPHYSIOLOGY Vol. 66, No. 1, July 1991. Printed in U.S.A.

Coding Visual Images of Objects in the Inferotemporal Cortex of the Macaque Monkey

KEIJI TANAKA, HIDE-AKI SAITO, YOSHIRO FUKADA, AND MADOKA MORIYA

Laboratory for Neural Information Processing, Frontier Research Program, The RIKEN Institute, Hirosawa, Wako-city, Saitama, 351-01; Department of Information Science and Communication Technology, Faculty of Engineering, Tamagawa University, Tamagawagakuen, Machida-city, Tokyo, 194; and Department of Psychology, Teikyo University, Otsuka, Hachioji-city, Tokyo, 192-03 Japan

SUMMARY AND CONCLUSIONS

1. The inferotemporal cortex (IT) has been thought to play an essential and specific role in visual object discrimination and recognition, because a lesion of IT in the monkey results in a specific deficit in learning tasks that require these visual functions. To understand the cellular basis of the object discrimination and recognition processes in IT, we determined the optimal stimulus of individual IT cells in anesthetized, immobilized monkeys.
2. In the posterior one-third or one-fourth of IT, most cells could be activated maximally by bars or disks just by adjusting the size, orientation, or color of the stimulus.
3. In the remaining anterior two-thirds or three-quarters of IT, most cells required more complex features for their maximal activation.
4. The critical feature for the activation of individual anterior IT cells varied from cell to cell: a complex shape in some cells and a combination of texture or color with contour-shape in other cells.
5. Cells that showed different types of complexity for the critical feature were intermingled throughout anterior IT, whereas cells recorded in single penetrations showed critical features that were related in some respects.
6. Generally speaking, the critical features of anterior IT cells were moderately complex and can be thought of as partial features common to images of several different natural objects. The selectivity to the optimal stimulus was rather sharp, although not absolute. We thus propose that, in anterior IT, images of objects are coded by combinations of active cells, each of which represents the presence of a particular partial feature in the image.

INTRODUCTION

The inferotemporal cortex (IT) of the macaque monkey is an extrastriate visual area, which receives visual inputs from the primary visual area (V1) after relays in two prestriate areas: V2 and then V4 (Desimone et al. 1980; Rockland and Pandya 1979; Zeki 1971). It, in turn, projects to the limbic system: to the amygdaloid complex directly (Amaral and Price 1984; Herzog and Van Hoesen 1976; Iwai and Yukie 1987; Turner et al. 1980) and to the hippocampus both indirectly via the parahippocampal gyrus (for review see Van Hoesen 1982) and directly (Yukie and Iwai 1988). Some integrative function may be suggested by this anatomic organization; experimentally, monkeys in which IT was ablated bilaterally showed severe and specific deficits in learning tasks that required visual discrimination or recognition of objects (for review see Dean 1976; Gross 1972; Mishkin 1982). It has thus been thought that IT plays an essential and specific role in object discrimination and recognition.

To understand the cellular mechanisms of the visual object discrimination and recognition processes in IT, it is necessary to know how visual images of objects are coded by IT cells. Previous studies found cells that selectively responded to the sight of a brushlike pattern (Desimone et al. 1984; Gross et al. 1969, 1972; Schwartz et al. 1983), a face (Baylis et al. 1987; Bruce et al. 1981; Desimone et al. 1984; Perrett et al. 1982; Yamane et al. 1988), or a hand (Desimone et al. 1984; Gross et al. 1969, 1972), but there is no clear evidence for the presence of cells that selectively respond to complex features other than these three. We therefore designed the present study to find general rules of analysis and representation of visual object information in IT. We examined responses of individual IT cells with an extensive set of visual stimuli, including realistic objects as well as bars and disks, and carefully determined the stimulus features necessary and sufficient for the maximal activation. We used an anesthetized and paralyzed preparation, because such an extensive test of responses needs long stable recordings from single cells, which may be easier in anesthetized monkeys than in behaving monkeys. Because previous studies have suggested that IT is not a homogeneous region (Baylis et al. 1987; Cowey and Gross 1970; Desimone and Gross 1979; Iwai and Mishkin 1967; Iwai and Yukie 1987, 1988; Seltzer and Pandya 1978, 1989; Weller and Kaas 1987; Yukie and Iwai 1988), we paid special attention to the distribution of cells with different types of selectivity; therefore the same monkeys were repeatedly used for recordings to make many penetrations distributed over a wide area in the same hemisphere.
We found that most cells in the anterior two-thirds or three-quarters of IT were maximally activated only by stimuli more complex than bars, disks, and gratings. The critical feature for the activation varied from cell to cell: an integrated shape in some cells and a combination of contour shape and texture or color within the contour in other cells. These selective responses of anterior IT cells may constitute a basis of the critical role of this region in object discrimination and recognition. Some of these results were previously reported in abstract form (Saito et al. 1987; Tanaka et al. 1987).

0022-3077/91 $1.50 Copyright © 1991 The American Physiological Society


METHODS

Preparation and recording

Three Japanese monkeys (Macaca fuscata) weighing from 6.5 to 7.5 kg were used. The general methods of preparation and recordings are the same as those described previously (Saito et al. 1986; Tanaka et al. 1986, 1989; Tanaka and Saito 1989). Briefly, the monkeys were first prepared for repeated recordings by a single aseptic surgery performed under anesthesia with pentobarbital sodium (initially 35 mg/kg followed by 5-10 mg·kg⁻¹·h⁻¹): a brass block for head fixation was attached to the top of the skull with stainless steel bolts and acrylic resin, and the right lateral surface of the skull was exposed and covered with acrylic resin for later unit recordings. Extracellular single-cell recordings were made once a week on each monkey: an endotracheal cannula was inserted into the trachea under initial anesthesia with ketamine hydrochloride; the monkey was immobilized with gallamine triethiodide and anesthetized with a gas mixture of N2O and O2 and a small dose (up to 1 mg·kg⁻¹·h⁻¹) of pentobarbital sodium throughout the recording; and a glass-coated platinum-iridium electrode was inserted into the brain through a small hole in the resin-coated skull, which was made at the beginning of each recording session and filled with resin when the recording was finished.

Only one or two penetrations were made in a day, but mapping in the same hemisphere was continued for up to 12 mo. Figure 1 shows the distribution of electrode penetrations in the three monkeys. All the penetrations were made in the right hemisphere. The mapped region in monkey IC and monkey MO covered the prelunate gyrus, which is occupied by V4, and the posterior and anterior parts of the IT. Because cells were recorded within 3 mm from the initial unit in all but four penetrations (the mean of the distance of the final unit from the initial unit was 1.81 mm), the region did not cover the lower bank of the superior temporal sulcus or the ventral region medial to the anterior middle temporal sulcus. Monkey SU was used for preliminary experiments.

Visual stimuli

The corneas were covered with contact lenses with artificial pupils 3 mm in diameter. A translucent tangent screen was placed 114 cm from the corneas. The power of the contact lenses was determined so that images on the screen focused on the retina. Several retinal landmarks were projected onto the screen by the use of a reversible retinoscope, and the position of the fovea was determined geometrically from these by referring to photographs of the fundus. The fovea could be exactly indicated in a photograph of the fundus.

Three-dimensional (3-D) objects were presented in front of the screen, and photographs of various patterns were projected onto the screen. The monkey saw the stimuli monocularly, usually by the eye contralateral to the recording site. The other eye was occluded by a removable opaque plate. The screen was illuminated by two sets of fluorescent lights that were attached to the ceiling behind the monkey. The luminance of the screen was 3.1 cd/m²; that of a white paper sheet placed in front of the screen, 9.0 cd/m²; that of a black paper sheet, 0.92 cd/m²; and that of a light slit projected onto the screen, 30 cd/m².

Stimulus selectivity of cells was examined as follows. Responses of a cell were first tested with a routine set of stimuli, and the

FIG. 1. Functional mapping was made over a wide posterior-anterior extent of the temporal cortex. Top: lateral view of the temporal part of the right hemispheres of 3 monkeys (SU, IC, and MO). Dots indicate positions of penetrations. The position of recording of illustrated cells in the following figures is indicated by the figure and cell number attached to the dot (for example, 4.1 represents cell 1 of Fig. 4). Bottom: frontal section of the brain at anterior 10. Straight line indicates the angle of penetration, and stippled area indicates the mapped region. Scale bars: 10 mm. Abbreviations: STS, superior temporal sulcus; LF, lateral fissure; IOS, inferior occipital sulcus; LuS, lunate sulcus; AP0, A10, and A20, level of 0, anterior 10, and anterior 20 in Horsley-Clarke coordinates.


effective stimuli were listed. The routine set was composed of 1) slits and spots of various sizes and colors, projected from a handheld projector; 2) rectangular and circular paper cuts of various sizes and colors, presented in front of the screen; 3) regular geometric patterns, such as stripes, dot patterns, concentric rings, and patterns like windmills, projected from a handheld projector; 4) plastic and sponge spheres, sponge cubes, plastic cylinders, and feather brooms, of various colors; 5) 3-D animal imitations made of vinyl, cloth, or plastic, including imitations of tiger, tabby cat, spotted dog, zebra, giraffe, gorilla, hawk, duck, frog, raccoon, monkey, human head, and human hand; 6) 3-D plant imitations made of plastic, including banana, apple, orange, maize, pineapple, grapefruit, melon, cabbage, carrot, potato, cucumber, watermelon, eggplant, onion, potato, a bunch of grapes, ivy, a potted plant, a cut piece of apple, and an obliquely cut sweet potato (which was fed every day to the monkeys in the cage); and 7) the experimenter’s hands, body, and face. Various sides of the objects were presented with various orientations.

If a cell gave a consistent response to one of the 3-D objects, we tried to clarify which component or combination of components of the image was essential for the activation. If a cell was consistently activated by more than two different stimuli, we started from the features common to the stimuli. We made two-dimensional (2-D) paper models that simulated the image of the object and compared the response of the cell to these 2-D paper models and the original 3-D object. The paper model simulated not only the shape of the outline, but also the texture and the color of the image. If the cell responded to the 2-D model as strongly as to the 3-D object, we then reduced the complexity of the 2-D model step by step and assumed that the simplest 2-D feature that fully activated the cell was the feature that the cell extracted from the image.
(The criteria to determine if two responses are equally strong will be described below.) The initial listing of effective stimuli was mostly done qualitatively by hearing the firing through an audiomonitor, and the subsequent procedure of reducing the complexity of the effective stimulus was performed quantitatively.

For the quantitative test, the 2-D and 3-D stimuli were attached to the end of a stick and presented by hand in synchrony with a sound from a computer. The timing of the presentation was accurate to well within 0.3 s. A large opaque plate with a window usually 40° in diameter was placed 10 cm in front of the open eye to make the experimenter’s hand invisible to the monkey. The stimuli were usually presented with slow (0.5-2 Hz) oscillations of small amplitudes (0.5-3°). Although the 2-D models were usually presented with a stick, any contribution of the stick to the activation was excluded by presenting the models attached on a translucent plate. Two to 15 different stimuli were combined in a stimulus set, and they were presented in turn so that the stimulus rotated among the combined set. The cycle was repeated, usually 10 times. Although each cell was tested with several stimulus sets, each set always contained a common stimulus, usually the optimal stimulus, so that the relative effectiveness of all the stimuli could be evaluated by reference to the response to the common stimulus. The magnitude of response was represented by the mean firing rate during the time of the presentation minus the spontaneous firing rate just before the presentation. The time epoch during which the response was measured was shifted by an amount equal to the latency of the response of individual cells.

A statistical evaluation was performed as follows. As described above, a single test was usually composed of 10 cycles of sequential presentation of 2-15 different stimuli.
We averaged the responses to all stimuli within a single cycle and divided the magnitude of each response by this cycle average. This procedure was performed for all cycles to yield a set of normalized magnitudes for each stimulus pattern. The Kolmogorov-Smirnov test (a nonparametric test) was then applied between two sets of the normalized magnitudes, usually between responses to the optimal stimulus and responses to another stimulus. The symbols * and + in figures indicate that the response marked by the symbol is significantly smaller than the response to the optimal stimulus, with a possible error <1% for * and <5% for +. We assumed two responses to be equally strong if the smaller response was ≥70% of the larger response and the difference was not significant in the statistical test (P > 0.05).
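The evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function names, the example firing rates, and the use of the standard asymptotic approximation for the Kolmogorov-Smirnov P value are all assumptions, because the text does not say how the P value was obtained.

```python
import math
from bisect import bisect_right


def normalize_by_cycle(cycles):
    """Divide each response by the mean response of its own cycle.

    cycles: list of dicts, one per cycle, mapping stimulus name -> response
    magnitude (firing rate above the spontaneous rate). Cycle means are
    assumed to be nonzero.
    Returns a dict mapping stimulus name -> list of normalized magnitudes.
    """
    normalized = {}
    for cycle in cycles:
        cycle_mean = sum(cycle.values()) / len(cycle)
        for stimulus, magnitude in cycle.items():
            normalized.setdefault(stimulus, []).append(magnitude / cycle_mean)
    return normalized


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic P value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical distribution functions.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.3:
        # The series below converges slowly here; the P value is ~1.
        return d, 1.0
    # Asymptotic Kolmogorov series (an assumption; small samples are approximate).
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def equally_strong(resp_a, resp_b, norm_a, norm_b, alpha=0.05):
    """Criterion from the text: smaller response >= 70% of larger, and P > alpha."""
    smaller, larger = sorted((resp_a, resp_b))
    _, p = ks_two_sample(norm_a, norm_b)
    return larger > 0 and smaller >= 0.7 * larger and p > alpha


# Hypothetical firing rates (impulses/s) for 10 cycles of three stimuli.
cycles = [
    {"optimal": 20.0 + i % 3, "paper_model": 19.0 + (i + 1) % 3, "bar": 4.0 + i % 2}
    for i in range(10)
]
norm = normalize_by_cycle(cycles)
d, p = ks_two_sample(norm["optimal"], norm["bar"])
marker = "*" if p < 0.01 else "+" if p < 0.05 else ""
print(f"optimal vs bar: D={d:.2f}, P={p:.2g} {marker}")
```

Dividing by the cycle average removes slow drifts in overall responsiveness across cycles before the two sets of magnitudes are compared, which is presumably why the normalization precedes the test.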

It should be noted that there was a limitation in identifying a selectivity of response for color of stimulus in the present study. As described above, we simulated the view of effective 3-D objects by paper models. Color of the model or part of the model was changed by replacing it by a paper cut of other colors. We considered the response to be color-selective if the response was considerably reduced by changing the color to white and black as well as to other colors. However, sheets of paper of different colors were different not only in the wavelength composition (hue) but also in the total energy (brightness) of the light reflecting from the surface. Therefore it is possible that the apparent color-selectivity of a response was only due to the selectivity for the exact value of the brightness. We cannot exclude this possibility, but we do not think it plausible for the following reasons. 1) Among the initial routine set of stimuli, the color-selective cells usually responded to several different 3-D objects that were similar in hue but different in brightness. 2) We prepared paper sheets of ~25 different colors. For each color there were other colors close in hue but different in brightness. We tested several cells with such pairs of paper sheets and observed that the magnitude of response depended on the similarity in hue but not on the similarity in brightness.

The extent of the receptive field was determined by use of the optimal stimulus. We plotted the border by the center of the stimulus, and therefore we tried to use a smaller stimulus as far as the cell responded maximally. The extent of the receptive field thus determined is larger than “the minimum response field” (Barlow et al. 1967) that must be plotted by the inner edge of the stimulus. The positions on the tangent screen were converted into those in visual angle, and the area of the receptive field was calculated as if it were elliptic in shape. The averaged data shown in Table 4 and Fig. 18 were calculated by averaging values of the square root of the area.

RESULTS

Discrimination between the posterior and anterior parts of IT

Response properties were studied in a total of 725 cells located in visual areas inferior to the superior temporal sulcus. They were classified as Primary, Texture, Elaborate, Others, Weak, and Unresponsive cells according to the criteria described below. Primary cells dominated the prelunate gyrus and a posterior part of IT, whereas Elaborate cells dominated the remaining anterior part of IT.

Primary cells are those that could be activated by slits or spots, by adjusting the orientation, size, and/or color. Because the slit could be replaced by an ellipse and the spot by a small square, the exact shape of the stimulus was not important for their activation.

Texture cells are those that were activated by some simple texture such as a stripe or dot pattern, regardless of the shape of the outer boundary. A single component of the texture resulted in little or no activation. Although Primary cells and Texture cells were activated by simple stimuli, they responded to complex stimuli that contained the critical feature as strongly as to the simple stimuli.


Elaborate cells are those that required a particular shape or a combination of a shape with a color or texture. The selectivity of these cells could not be explained by the selectivity for orientation, size, and color of stimuli, or by the selectivity to some simple texture.

Others included two cells in IT that were activated only by a temporal change of color (from orange to red in 1 cell and from green to red in the other), seven cells in IT that were activated only by some complex movements, and three cells in the prelunate gyrus that had pure inhibitory center and wide excitatory surround regardless of the sign of the contrast of stimuli.

Weak cells are those that showed only a weak response to certain visual stimuli. The response was too unstable to study the stimulus selectivity reliably. Unresponsive cells are those that did not respond to any of the visual stimuli included in the initial routine set.


Although stimuli were usually given through the contralateral eye, a change of the eye did not improve the responsiveness of Weak and Unresponsive cells. Binocular view was also tested for these cells, but we did not try to align the two visual axes exactly. There thus remains a possibility that some of the cells required particular disparity values for the activation. Auditory and tactile stimuli were also given to Weak and Unresponsive cells, but none of them was activated by these non-visual stimuli. The classification was done mostly qualitatively by hearing the firing through the audiomonitor, but if necessary quantitative measurements of responses (by making peristimulus time histograms or PSTHs) were done to eliminate an uncertainty present in the qualitative determination. PSTHs were made for 298 cells. The distribution of cells of each category is shown in Figs. 2 and 3. Although Texture, Weak, and Unresponsive cells


FIG. 2. Top: Primary cells dominated in the prelunate gyrus and posterior IT but were few in anterior IT. An open circle indicates an oriented cell without color selectivity, a star a color-selective cell without orientation selectivity, and a star within a circle an oriented color-selective cell. Cells without either orientation or color selectivity are differentially indicated by their selectivity for the size: a downward open triangle indicates a cell preferring a small size, an upward filled triangle a cell preferring a large size, and a small filled circle a cell without size preference. Broken line indicates border between anterior and posterior IT, which was determined on the basis of the ratio of Primary cells to Elaborate cells recorded in single penetrations; and the dotted line indicates the border between posterior IT and the prelunate gyrus. inh, cell was inhibited by the stimulus. Cells marked in a cluster were recorded within a single penetration. Bottom: Elaborate cells dominated anterior IT, but were few in the posterior regions. Filled square, an identified Elaborate cell; open square, an unidentified Elaborate cell. f, a face cell. Border between the prelunate gyrus and posterior IT is indicated only by an arrowhead in these maps and the following figures.


FIG. 3. Top: Texture cells were scattered throughout the whole mapped region. Asterisk, a Texture cell; downward filled triangle, an Other cell. Bottom: Weak and Unresponsive cells were distributed throughout the whole mapped region. Open circle, a Weak cell; filled circle, an Unresponsive cell.

were scattered throughout the mapped region (Fig. 3), Primary and Elaborate cells showed clear characteristics in their distribution. Primary cells dominated a posterior part of IT as well as the prelunate gyrus, whereas Elaborate cells dominated the remaining anterior part of IT (Fig. 2). The distribution of Primary cells was complementary to that of Elaborate cells, and the change from the dominance of Primary cells to that of Elaborate cells was abrupt rather than gradual. We therefore divided IT into posterior and anterior parts by the use of this discontinuity. Quantitatively, the border line is drawn between the penetrations in which Primary cells were more numerous than Elaborate cells (posterior to the line) and the penetrations in which Elaborate cells predominated (anterior to the line). The border is shown by broken lines in Figs. 2 and 3. The posterior part will be called “posterior IT” and the anterior part “anterior IT.” Although the results are shown only for monkeys IC and MO in Figs. 2 and 3, results consistent with these were also obtained in monkey SU: Primary cells dominated the posteriorly situated two penetrations and Elaborate cells dominated the remaining, more anteriorly situated penetrations (see Fig. 1). Table 1 shows the number of cells in each category recorded in the prelunate gyrus, posterior IT, and anterior IT.
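As an arithmetic check, the column percentages of Table 1 can be recomputed from the cell counts. The sketch below uses the counts given in Table 1. The dictionary layout and function names are hypothetical, and reading "clearly responsive cells" as all cells other than Weak and Unresponsive is an assumption, although it reproduces the 69% figure quoted for anterior IT.

```python
# Cell counts per category, taken from Table 1.
COUNTS = {
    "Prelunate gyrus": {"Primary": 109, "Texture": 8, "Elaborate": 3,
                        "Others": 3, "Weak": 19, "Unresponsive": 13},
    "Posterior IT": {"Primary": 101, "Texture": 3, "Elaborate": 12,
                     "Others": 1, "Weak": 14, "Unresponsive": 10},
    "Anterior IT": {"Primary": 53, "Texture": 27, "Elaborate": 193,
                    "Others": 8, "Weak": 68, "Unresponsive": 80},
}


def percentages(column):
    """Percentage of the column total for each category, rounded to integers."""
    total = sum(column.values())
    return {category: round(100 * n / total) for category, n in column.items()}


def elaborate_share_of_responsive(column):
    """Elaborate cells as a percentage of cells that are neither Weak nor
    Unresponsive (assumed meaning of 'clearly responsive')."""
    responsive = sum(n for c, n in column.items() if c not in ("Weak", "Unresponsive"))
    return round(100 * column["Elaborate"] / responsive)


for region, column in COUNTS.items():
    print(region, sum(column.values()), percentages(column))
print("Anterior IT, Elaborate among responsive cells (%):",
      elaborate_share_of_responsive(COUNTS["Anterior IT"]))
```

The three column totals sum to the 725 cells of the whole sample.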

The border between the prelunate gyrus and posterior IT was tentatively drawn by extending the anterior tip of the inferior occipital sulcus. It is indicated by dotted lines in the top maps of Fig. 2 and by arrowheads in other maps. The difference between the prelunate gyrus and posterior IT was small, whereas anterior IT was very different from these two regions. Elaborate cells constituted 45% of the whole sample and even 69% of the clearly responsive cells in anterior IT. We thus established that cells that required complex features dominate the anterior part of IT. This region occupied the anterior two-thirds to three-quarters of the IT.

TABLE 1. Classification of cells

Cell Type                 Prelunate Gyrus    Posterior IT    Anterior IT
Primary                   109 (70%)          101 (72%)        53 (12%)
  Oriented                 59                 49              16
  Color                     7                 16              11
  Oriented color            7                  6               1
  Nonoriented noncolor     36                 30              25
Texture                     8 (5%)             3 (2%)         27 (6%)
Elaborate                   3 (2%)            12 (9%)        193 (45%)
  Identified                1                  7              82
  Face                      0                  1              30
  Unidentified              2                  4              81
Others                      3 (2%)             1 (1%)          8 (2%)
Weak                       19 (12%)           14 (10%)        68 (16%)
Unresponsive               13 (8%)            10 (7%)         80 (19%)
Total                     155 (100%)         141 (100%)      429 (100%)

Values are numbers of cells; numbers in parentheses are percentages of column totals. IT, inferotemporal cortex.

Primary cells in the prelunate gyrus and posterior IT

Here we will describe response properties of Primary cells, which dominated the prelunate gyrus and posterior IT. Slightly more than one-half of the Primary cells in the prelunate gyrus and posterior IT showed orientation selectivity; namely, they required the stimulus to be elongated along a particular axis of orientation (noted as “oriented” in Table 1). Figure 4, top (cell 1) shows an example that was recorded in posterior IT. The cell responded much more strongly to a light slit elongated along 4:30-10:30 (B) than to a small square (A). A slit in the orthogonal orientation did not activate the cell at all. Such oriented cells frequently showed two other kinds of stimulus selectivity. One is inhibition by long length of the stimulus. The degree of inhibition by long stimuli was moderate in cell 1 of Fig. 4 (C). An inhibition by >50% was observed in one-third of the oriented cells (Table 2). The other property is a preference for sign of the contrast. Cell 1 of Fig. 4 preferred a light stimulus: it was activated by a white bar presented on a gray background (D) but not by a black bar on the same background (E), although the contrasts of the two stimuli were almost the same (0.46 and 0.53 log unit). In the present sample, two-fifths of the oriented cells showed such selectivity for sign of the contrast (Table 3). The cells that preferred light stimuli and those that preferred dark stimuli were equally common.

FIG. 4. Cell 1: Primary cell with orientation selectivity, recorded in posterior IT. Thin broken lines indicate the receptive field. Response was moderately inhibited by lengthening the stimulus further beyond the optimal (C). Cell was activated by a white bar on a gray background but not by a black bar on the gray background (D and E). In this and following figures, white patterns lighter than the background are indicated by drawing the outline (except for in D). Figures above each PSTH indicate the magnitude of the response (see METHODS for the way of calculating the magnitude) normalized to the maximal response to the optimal stimulus. * and + indicate that the response is significantly smaller than the maximum response, with a possible error <1% for * and <5% for +. Cell 2: Primary cell with color selectivity, recorded in posterior IT. Stimuli were paper cuts presented on a gray background. The color (F-J) but not the shape (K and L) of stimulus was crucial. Response was inhibited by increasing the size of stimulus (M).

TABLE 2. Frequency of occurrence of the inhibition by long or large size in Primary cells

                        Prelunate Gyrus    Posterior IT    Anterior IT    Total
Oriented                13/43              24/50           4/8            41/101
Color                   4/6                5/11            2/3            11/20
Oriented color          1/5                1/3             0/1            2/9
Nonoriented noncolor    22/31              9/29            3/23           34/83
Total                   40/85              39/93           9/35           99/254

Values are the number of cells that showed >50% inhibition by increasing the length or size of the stimulus out of the number of cells tested for the inhibition. IT, inferotemporal cortex.

Some of the Primary cells required a particular color. They showed a weak response or none to a white or black stimulus. More than one-half of these color-sensitive cells did not require the stimulus to be elongated (noted as “color” in Table 1). Cell 2 of Fig. 4 is an example of these “color” cells. The cell responded to a red disk (F) but not to a white (J) or a black disk (not shown). It even seemed to be suppressed by a spot of other colors (G-I). The shape of the stimulus did not affect the activation (K and L), but the size of the stimulus was important. The magnitude of the response increased gradually as the size of the disk increased from 1° (50% maximum) to 2.5° (75% maximum) and then to 5° (maximum) but decreased when the size of the spot further increased to 10° (M, 28% maximum). Such inhibition by large size was shown by one-third of the color-coded cells (Table 2). The sign of the luminance contrast did not matter to the activation of cell 2 of Fig. 4. It responded both to a red disk (F, darker than the background) and to a red light spot projected onto the screen (not shown, 1.41 times as large as the response to the disk). Some other color-coded Primary cells responded only to patterns lighter or darker than the background (Table 3).

The remaining cells showed neither orientation selectivity nor color selectivity but were not completely nonselective. Two-thirds of these required the stimulus to be within some small size (up to 5°, Table 2), and many of them showed a selectivity for sign of contrast (Table 3).
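The two contrast values quoted for cell 1 of Fig. 4 (0.46 and 0.53 log unit) can be reproduced from the luminances listed in METHODS. The short sketch below rests on assumptions that the text does not state explicitly: contrast is taken as the absolute base-10 logarithm of the stimulus-to-background luminance ratio, the gray background is taken to be the tangent screen (3.1 cd/m²), and the white and black bars are taken to have the luminances of the white and black paper sheets. The function and constant names are hypothetical.

```python
import math

# Luminances reported in METHODS (cd/m^2).
SCREEN = 3.1        # tangent screen, assumed here to be the gray background
WHITE_PAPER = 9.0   # white paper sheet in front of the screen
BLACK_PAPER = 0.92  # black paper sheet in front of the screen


def log_contrast(stimulus, background):
    """Absolute log10 ratio of stimulus to background luminance, in log units."""
    return abs(math.log10(stimulus / background))


white_bar = log_contrast(WHITE_PAPER, SCREEN)
black_bar = log_contrast(BLACK_PAPER, SCREEN)
print(f"white bar on gray: {white_bar:.2f} log unit")
print(f"black bar on gray: {black_bar:.2f} log unit")
```

Under these assumptions the light and dark bars differ from the background by nearly the same factor, so the difference in response between D and E reflects the sign of the contrast rather than its magnitude.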

TABLE 3. Frequency of occurrence of the selectivity for the sign of contrast in Primary cells

                        Prelunate Gyrus    Posterior IT     Anterior IT    Total
Oriented                (4, 5, 11)/20      (4, 4, 15)/23    (0, 0, 2)/2    (8, 9, 28)/45
Color                   (1, 1, 1)/3        (1, 1, 2)/4      (0, 2, 0)/2    (2, 4, 3)/9
Nonoriented noncolor    (4, 3, 11)/18      (3, 1, 5)/9      (0, 1, 1)/2    (7, 5, 17)/29
Total                   (9, 9, 23)/41      (8, 6, 22)/36    (0, 3, 3)/6    (17, 18, 48)/83

Values in denominators indicate numbers of cells tested for the preference for sign of contrast. Values in numerators indicate numbers of cells that responded to light patterns more than twice as strongly as to dark patterns (left), cells that preferred dark patterns (middle), and cells that responded to both patterns (right). IT, inferotemporal cortex.

CLUSTERING. Cells of these subgroups of Primary cells did not occur randomly but tended to be grouped in each given penetration. Color-coded cells occurred in a restricted number of penetrations, and usually the cells sampled in a single penetration were either mostly oriented or mostly nonoriented (Fig. 2).

Texture cells

Before going to the Elaborate cells, which dominated the anterior part of IT, we will explain the properties of the Texture cells, which were scattered throughout the whole mapped region. The cells classified as Texture cells are those for which we succeeded in reconstructing the optimal texture. We used stripe patterns, regular dot patterns, random dot patterns, patterns with parallel fragmentary line segments, and patterns with randomly oriented fragmentary line segments. The cells that were sensitive to the shape of the outer boundary in addition to the texture were classified into Elaborate cells and excluded from Texture cells. The majority of the Texture cells (30/38) were maximally activated by stripe patterns. Figure 5, top (cell 1) shows an example that was recorded in the prelunate gyrus. The cell responded to a stripe pattern (A) but not to a single black

FIG. 5. Cell 1: Texture cell responding to a grating of particular orientation, recorded in the prelunate gyrus. A and B: a rectangular grating with a spatial frequency of 2 cycles/deg was presented within a window 20° in diameter. C and D: a black bar 0.25° in width (C) and 3 black bars, each 0.25° in width with 0.5° interval (D), were presented within the same window. Cell 2: Texture cell responding to a dot pattern, recorded in the prelunate gyrus. E: a dot pattern composed of light spots on dark background was presented within a window 20° in diameter. Diameter and interval of spots were 0.5 and 1°, respectively. F: a dot pattern composed of dark spots on light background. Spatial parameter of the pattern was the same as in E. G and H: a rectangular grating with 1° interval.

Downloaded from www.physiology.org/journal/jn by ${individualUser.givenNames} ${individualUser.surname} (192.236.036.029) on August 17, 2018. Copyright © 1991 American Physiological Society. All rights reserved.


bar that was a component of the stripe pattern (C). Some response appeared when three black bars were combined (D), but the magnitude of the response was still one-third of that of the response elicited by the stripe pattern. The magnitude of the response increased further as the number of bars increased and reached a maximum with 10 bars (not shown). The cell also did not respond to single bars that were larger in width than the component bar of the stripe pattern. The response was selective for orientation (B) and for the spatial frequency of the pattern. These were common properties of the Texture cells that responded to stripe patterns (29/30). The optimal frequency ranged from 0.25 to 4 cycles/deg. Another five cells were activated maximally by a dot pattern. One of these, recorded in the prelunate gyrus, is illustrated in Fig. 5, bottom (cell 2). A stripe pattern activated the cell in an orientationally selective manner (G and H), but a pattern composed of light spots on a dark background activated it more than twice as strongly as the stripe pattern (E). A dot pattern with the same spatial organization but with reversed contrast was not effective at all (F). For this cell it is probable that the orderly arrangement of spots was essential, because random dot patterns were not effective. On the other hand, the shape of the texture components was not critical, because the light spots could be replaced by light squares. What was critical for the activation may have been a rather orderly arrangement of small light patches. In addition to the cells that selectively responded to a particular kind of texture, we saw three cells that responded to textured patterns regardless of the kind of the texture. They did not respond to patterns without a texture. 
We did not find Texture cells that selectively responded to texture patterns composed of fragmentary line segments, but a few Elaborate cells required patterns composed of parallel fragmentary line segments within a contour of a specific shape. About one-quarter of the Texture cells (10/38) were inhibited by stimuli of large size. Interestingly, a few of them preferred stimuli elongated along a particular orientation.


FIG. 6. Distribution of the ratio of the magnitude of the response elicited by the identified 2-D critical feature to that of the response elicited by the most effective 3-D object is shown for identified Elaborate cells (open bars). Distribution of the corresponding ratio, but with the most effective 2-D stimulus, calculated for unidentified Elaborate cells is also shown for a comparison (dotted bars).
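The ratio plotted in Fig. 6 can be illustrated with a short sketch. This is only a minimal illustration, not the analysis procedure used in the study: the function names and the response magnitudes are invented for the example, and the criterion of 0.7 (see METHODS) is treated here as a simple threshold on the ratio.

```python
# Hypothetical sketch of the Fig. 6 ratio. All response magnitudes are invented.

def response_ratio(rate_2d, rate_3d):
    """Ratio of the response to a 2-D stimulus to the response to the best 3-D object.

    Both arguments are response magnitudes (for example, impulses/s).
    """
    if rate_3d <= 0:
        raise ValueError("response to the 3-D object must be positive")
    return rate_2d / rate_3d


def is_identified(rate_2d, rate_3d, criterion=0.7):
    """True if the 2-D feature evokes more than `criterion` of the 3-D response."""
    return response_ratio(rate_2d, rate_3d) > criterion


# Invented response magnitudes (2-D stimulus, 3-D object) for two cells.
examples = {
    "cell_a": (36.0, 30.0),  # 2-D model stronger than the object (ratio > 1)
    "cell_b": (12.0, 30.0),  # 2-D model much weaker than the object
}
labels = {name: is_identified(r2d, r3d) for name, (r2d, r3d) in examples.items()}
```

In this sketch a ratio above 1, as for `cell_a`, corresponds to a 2-D model that is more effective than the original 3-D object.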

Elaborate cells in anterior IT

Here we will explain response properties of Elaborate cells in anterior IT. Elaborate cells can be further divided into three groups. Ninety of the 208 Elaborate cells were “identified cells” for which we succeeded in reconstructing the stimulus feature required for the activation by 2-D stimuli made of paper (see METHODS). Another 31 were “face cells,” which selectively responded to the presentation of a face. The remaining 87 cells were “unidentified cells” that selectively responded to one or a few 3-D objects but for which we failed to reduce the effective stimuli to a particular 2-D feature.

The procedure of reducing the complexity of the effective stimulus was performed quantitatively. The open bars of Fig. 6 show the distribution of the ratio of the magnitude of the response to the identified 2-D feature to that of the response to the most effective 3-D object, from which the reduction of complexity was begun, for 63 identified Elaborate cells. The ratio was by definition >0.7 (see METHODS) and was usually >1. The ratio >1 means that the 2-D model evoked a response stronger than that to the original 3-D object. This occurred because the contrast was increased, and the size and even some parameters of the shape were adjusted to the preference of the cell during the course of the reduction process. For the remaining 27 identified cells, the quantitative test was started with some 2-D patterns. The dotted bars of Fig. 6 show the ratio of the magnitude of the response to the most effective 2-D stimulus to that of the response to the most effective 3-D object for 33 unidentified Elaborate cells. For the remaining 54 unidentified cells, quantitative tests were not performed because it was clear in the qualitative tests that none of the 2-D models that we made to simulate the image of the 3-D object activated the cell.

We did not study the properties of face cells extensively, because there have been many extensive studies for similar cells located in the depth of the superior temporal sulcus (Bruce et al. 1981; Desimone et al. 1984; Perrett et al. 1982) and in the ventral surface of the IT (Yamane et al. 1988).

The main objective of this paper is to describe the properties of the critical (necessary and sufficient) features for the activation of the identified Elaborate cells, with the aim of finding out general principles of coding of images of objects in anterior IT. We will show several examples to illustrate the degree of complexity of the critical features and how steeply selective the responses were to these features.

The identified critical features of 67 Elaborate cells were specified just by shape of the contour and sign of the contrast, whereas another 22 cells required the texture or the color within the contour in addition to the contour-shape (shape + texture, 15 cells; shape + color, 5 cells; shape + texture + color, 2 cells). The remaining one cell required a combination of the texture and the color.

SHAPE-ONLY CELLS. Figures 7-9 show six examples of cells for which critical features were specified only by shape.

Cell 1 of Fig. 7 responded to brushes, pineapple leaves, and other shapes that had many projections from a central body. We quantified the response with star-shaped patterns. A star with eight projections activated the cell as strongly as these initial objects (A). Disks (B), squares (C), and triangles (-0.04 of the maximum) of various sizes were tested, but none of them were effective. A star with 16 projections was as effective as the star with 8 projections (0.87 of the maximum), but a shape with four projections evoked only a 44% maximum response (D) and a shape with three projections a 17% maximum response. On the other hand, the response to stars was invariant for the size, as documented in Fig. 13. To check the possibility that only some part of the star-like shape was crucial for the activation, we tested the response with partial blocks of the star. The top side (E), right side (F), bottom side (G), and left side (H) were all effective to some extent, especially the bottom side, but the magnitude of the response was less than one-third of the response elicited by the whole star. We also presented stripe patterns with spatial frequencies between 0.25 and 2 cycles/deg and with different orientations 45° apart, but none of the stimuli activated the cell (<0.01 of the maximum). Thus the overall shape, or Gestalt, must have been crucial for the activation. We did not test systematically whether or not a regular arrangement of projections was a requirement; but, because the cell responded to a wide variety of objects (see above), multiple projections from a central body rather than a regular star shape in a strict sense seems to have been the essential requirement. The preference for abruptness of the projections was not tested. The color did not influence the magnitude of the response (qualitative tests); but the sign of contrast of the pattern, namely, whether it was darker or lighter than the background, was very important. This cell never responded to light stars (0.02 of the maximum). This importance of sign of contrast was a general feature among the shape-only Elaborate cells.

Three cells selectively responded to a rounded shape without a projection or a corner (like disks and ellipses), one cell responded to a rectangle, one cell responded to a triangle, and one cell responded to a cross. Three cells selectively responded to an elongated cut regardless of the orientation of the long axis.

FIG. 7. Cell 1: Elaborate cell responding to a star shape. Invariance of the response for the size of the stimulus is shown in Fig. 13. Cell 2: Elaborate cell responding to a rounded tongue projected from 7 o’clock. Selectivity of this cell for the orientation of the stimulus is shown in Fig. 12A.

The shape required for cell 2 of Fig. 7 was more complex than the above-described examples. The cell responded maximally to a rear view of a gorilla model within the initial set of stimuli (0.84 of the maximum response shown in I) with some good responses to a rear view of the head of a tiger model (0.59 of the maximum) and a rear view of a mannequin’s head with black hair (0.50 of the maximum). The critical feature was identified to be a rounded tongue (I). The tongue had to be projected from the direction of 7

o’clock (see Fig. 12A). Tongues having corners did not activate the cell (J and K) and neither did a disk itself (L). The cell responded to the stimulus if a short skirt was attached to the disk in the appropriate direction (M), but the response was still smaller than half-maximum. A combination of the disk with a long but narrow bar also elicited only half the maximum response (N). Thus it seems safe to conclude that a rounded tongue was the necessary and sufficient feature for the activation. The final two histograms were added to show how the response tolerated changes in size of the patterns (O and P). A white tongue evoked a much smaller response (0.29 of the maximum) than a black tongue, and a black outline (the width was 0.25°) with a white inside was even less effective (0.17 of the maximum).

Two cells selectively responded to a concave or a convex edge, one cell responded to a narrow bar with a special curvature, and one cell responded to an elongated cut sharpened at both ends.

Figures 8 and 9 show further examples of the shape-only Elaborate cells. The critical features of these cells are different from those of the above-described examples in that they can be assumed to be composed of more than two components. The segmentation of the critical feature into components may be arbitrary in the examples shown in Fig. 8, but the segmentation is rather objective in the examples shown in Fig. 9.

Cell 1 of Fig. 8 responded maximally to an apple model among the initial routine set of stimuli. The apple had a stem, and when the stem was removed the cell ceased to respond. We immediately realized that a combination of a round body and a projection from it was important. A projection from a disk made of black paper activated the cell more strongly than the original apple model (A). A bar or a disk by itself resulted in little or no activation of the cell (B and C). The direction of the projection was critical, as shown in Fig. 12G. The projection needed to be longer than some minimum: the maximum response was obtained with a 2.5° projection; if the projection was 1° the response was only one-third of maximum (D). The width of the projection was also critical. The maximum response was obtained with a 0.25°-wide projection, and the response was reduced significantly if it was made as wide as 1.5° (E) or as narrow as 0.05° (0.69 of the maximum). If two more projections were attached to the disk at positions near the first projection, the response was halved (0.48 of the maximum). The selectivity for the shape of the body was moderate: the response was halved if the disk was replaced by a square (F) or by a star (G). The selectivity for the size of the disk was also moderate. The response was significantly reduced if the size of the disk was halved (0.55 of the maximum). The response did not decrease when the disk was doubled (0.95 of the maximum), but it completely disappeared if the bar was attached to a dark straight edge of a massive body that was so big that the other borders of the body were outside the stimulus window (H).

Cell 2 of Fig. 8 responded to a light T-shaped cut in an orientationally selective manner (I and J). A cross that could be made by attaching another bar to the optimal T-shaped cut was not effective (K), and the right-side L (L), left-side L (M), or horizontal bar (N), which were components of the T-shaped cut, were not effective, either. The angle was important in some graded manner. If the two light bars that constituted the horizontal bar were tilted by 30°, there remained the initial transient excitation, but the sustained component of the response vanished (O and P).

Two cells selectively responded to five bars projecting from a disk (see Fig. 12F), and one cell selectively responded to a combination of three bars that joined at one end (see Fig. 12B).

FIG. 8. Cell 1: Elaborate cell responding to a combination of a disk and a bar projecting from the disk. Selectivity of this cell for the orientation of the stimulus is shown in Fig. 12G. Cell 2: Elaborate cell responding to a T shape.

FIG. 9. Cell 1: Elaborate cell responding to a combination of a dark disk and a light disk above the dark disk. For the orientation selectivity of the response, see Fig. 12C. Cell 2: Elaborate cell responding to a combination of a dark ovoid and a light disk within the ovoid.

In the case of cell 1 of Fig. 9, the investigation of the critical feature started with the fact that a rear view of an experimenter’s head, but not a mannequin’s head detached from its body, was effective in activation (qualitative observations). We then placed a white board under the mannequin’s head and obtained a good response from the cell (0.71 of the response shown in A). Finally, the critical feature was identified as a combination of a black disk and a white disk (A). A black disk had to be placed above a white disk (H, see also Fig. 12C). Either component, a black disk or a white disk, in isolation was not effective at all (B and C). The response vanished if both disks were made black (0.08 of the response to the stimulus shown in A, not

shown) or white (qualitative observation). Although the orientation of the alignment of two disks was critical, the distance between two disks was less important. The magnitude of the response slightly decreased when the borders of two disks touched each other (0.78, smaller than the response shown in A by P < 0.05) but did not change even when the distance between the borders increased to 16.5° (1.08 of the response shown in A). The selectivity for shape of either component was moderate. The magnitude of the response decreased by ~40% when either disk was replaced by a square of a similar size (D and E), and the magnitude of the response was reduced by 55% when both disks were replaced by squares (F). The possibility that this reduction was caused by a subtle change in the size that accompanied the change in the shape was excluded, because the magnitude of the response did not change significantly when either disk was changed in size by up to 30% (exemplified by G).

Cell 2 of Fig. 9 responded maximally to an obliquely cut sweet potato among the initial set of stimuli, and a 2-D model composed of a dark ovoid elongated vertically and a light disk placed within and near the bottom end of the dark ovoid activated the cell as strongly as the original stimulus did (I). Although the cut surface of the sweet potato made a contrast with the peel in both luminosity and color, the color contrast was not required (qualitative observation in this cell). The response was obtained with any configuration of color, if it satisfied the condition that the inner part was lighter than the outer part. The response failed when the contrast was reversed (L). Either the outer ovoid (J) or the inner disk (-0.02 of the maximum response) by itself did not activate the cell at all. The inner disk needed to be placed near the bottom end of the outer ovoid (K). Compared with this strict requirement on the positional relationship between the two components, the selectivity for the shape of either component was rather moderate. A replacement of the inner disk with a square reduced the response considerably (M), but with a hexagon the reduction was small and not statistically significant (N). A replacement of the outer ovoid with a triangle reduced the response considerably (O), but a replacement with an ellipse resulted in a small, nonsignificant reduction in the response (P).

There were five more cells that had critical features similar to that of cell 2 of Fig. 9, namely, a light disk within a dark ovoid. These six cells were recorded in four penetrations made within a region 1.2 mm in diameter in monkey SU. Although the critical feature was the same in shape, the best orientation was different; and the tuning properties for parameters of the shape, for example, the strictness of the selectivity for the shape of each component and the position of the inner disk within the ovoid, varied from cell to cell.

There were also a cell responding to a configuration in which a pair of dark rectangles overlaid on an end of a light square, a cell responding to a combination of a big rectangle and a small horizontal bar above it (Fig. 12D), and a cell responding to a dark bar projecting from a light disk.

COMBINATION CELLS. Now we turn to the Elaborate cells in which critical features were specified by combination of shape and other submodalities. Figures 10 and 11 show four examples of such cells.

A combination of a shape and a texture was required for the activation of cell 1 of Fig. 10. The critical feature of the cell was derived from combined hindlimbs of a tabby cat model (1.29 of the maximum response shown in A). The width decreased from the proximal to the distal, and there was a grating orthogonal to the axis of the limbs. Finally, we concluded that a vertical grating within a triangle directed to the right was the critical feature for the activation (A). A white or black triangle without a grating did not activate the cell (B). With a horizontal grating the magnitude of the response was one-third of the maximum (C). With a grid that contained the vertical grating as a component, the response was close to the maximum (D). The shape of the outline had to be a triangle directed to the right. A vertical grating within a rectangle, either wide or narrow (E and F), or a triangle directed to the left (G) was not effective, but a vertical grating within a disk evoked about one-third maximum response (H).

Three cells specifically responded to a grating within a circle, two of them with selectivity for the orientation of the grating and the remaining one without the selectivity for the orientation; and one cell specifically responded to a grating within a square.

FIG. 10. Cell 1: Elaborate cell that required integration of texture and shape. The critical feature was a vertical grating within a triangle directed to the right. Interval of the grating was 1°, and width of black lines was 0.4°. Cell 2: Elaborate cell that required integration of color and shape. The critical feature was a green star.

A combination of a shape and a color was required for the activation of cell 2 of Fig. 10. The cell responded selectively to a green star. A green star (I) but not green cuts of other shapes (J-L) activated the cell. The same star-shaped cuts of other colors were not effective, either (M-O). We suspected that some spatial frequency components but not the shape of star itself were responsible for the activation, and tested the response with green and white gratings of various spatial frequencies and with various orientations. But we could not activate the cell with the gratings (exemplified by P). The cell responded to a green star both on a white sheet and a black sheet, although we did not measure the responses quantitatively. Therefore the sign of the luminosity contrast might not be important for the activation.

There were also two cells that selectively responded to a red star-shaped cut and one cell that responded to a yellow elongated cut sharpened at both ends.

Finally, we see examples of cells with critical features that represented a combination of textured and/or colored segments. Figure 11 shows two examples. For the activation of cell 1, a combination of two textured bodies was required. The critical feature was derived from a front view of a tabby cat (0.70 of the maximum response shown in A). The cat had a horizontal grating on the face and a vertical grating on the body. The forelimbs placed under the face had also a vertical grating. A combination of a horizontal grating in the top half and a vertical grating in the bottom half of a 5° circular area (A) activated the cell more strongly than the cat did (1 vs. 0.70, P < 0.01). The vertical grating or the horizontal grating within the half area by itself was much less effective (B and C). A vertical or horizontal grating occupying the whole circular area did not activate the cell (qualitative observations). A grid that occupied the circular area also was not effective (D). A combination of a vertical and a horizontal grating was required. We next examined the effect of changing the outline of the textured area. The response was significantly smaller when the pattern occupied a wide area (E, 40° in diameter), whereas a combination of two disks, one occupied by a horizontal grating and the other occupied by a vertical grating, elicited a comparably strong response (F). From these results, it may be concluded that only the position and area extent, not the exact boundary shape of each grating, was critical. The response was significantly reduced if the two disks were separated (G) or alternated in their position (H; see also Fig. 12E).

FIG. 11. Cell 1: Elaborate cell responding to a combination of 2 orthogonally striped cuts. Interval of the grating was 1°, and width of black lines was 0.4°. For the selectivity of the response for the orientation, see Fig. 12E. Cell 2: Elaborate cell responding to a combination of a dotted brown disk and a narrow bar. Interval and diameter of the dots were 0.125 and 0.05°, respectively.

The critical feature of cell 2 of Fig. 11 was a combination of a dotted brown disk and a narrow light bar. The cell first responded to a potato and a pineapple attached to an end of a white bar but not to a brown or red sphere attached to a white bar. The view was simulated by a dotted brown disk attached to an end of a white bar, and it activated the cell as strongly as the 3-D objects (I). The shape of the body was important in the sense that the response was much reduced when the disk was replaced by a square (J). The color of the disk was also important because the response disappeared when the color was changed to green (K), yellow (L), blue, and red (not shown). The response was significantly reduced when the dots were removed (M). This corresponds to the initial observation that a brown sphere attached to a white bar was not effective, because the sphere was smooth on the surface. From these results, we concluded that the critical feature included the color and texture as well as the shape of the block. In addition, the bar was indispensable because the response failed when the brown dotted disk was presented in isolation (N). The bar had to be brighter than the background: the response was reduced significantly if it was made darker than the background (O).

There were also a cell that required a combination of white and blue blocks and a cell that required a red star attached to an end of a blue rectangle.

In conclusion, the critical features for the activation of identified Elaborate cells were more complex than orientation of contours, size and color of stimuli, and some simple texture, which are sufficient for cells in the lower stages, but were nevertheless not complex enough to specify a concept of some particular real object. Instead, the critical features seemed to be partial features common to images of several different objects. There are two exceptions to this general

conclusion. Thirty-one cells responded selectively to a face (see Figs. 14 and 17), and 4 cells responded only to the realistic silhouette of a hand (see Fig. 17).

The number of cells that required a certain color for their activation was small. Color was relevant only for 6 of 90 (6.7%) identified Elaborate cells. Within the whole sample of cells in anterior IT for which we identified the critical features (n = 162), namely, Primary, Texture, and identified Elaborate cells, only 17 cells (10.5%) were color relevant.

The selectivity of cells for the stimulus features was moderately steep: to patterns far from optimal, the cells did not yield even one-fifth of the maximum responses. In many cells the spontaneous activity was even suppressed by the inappropriate patterns. Nevertheless, for each cell there were many suboptimal stimuli that evoked responses larger than one-fifth of the maximum but still significantly less than the maximum. The selectivity for the shape was usually less steep in Combination cells than in Shape-only cells (compare Figs. 10 and 11 with Figs. 7-9), and the selectivity for the segmental shape was usually less steep in the cells in which critical features can be segmented into multiple components than in the cells in which critical features are composed of a single segment (compare Figs. 8 and 9 with Fig. 7).

SELECTIVITY OF THE IDENTIFIED ELABORATE CELLS FOR THE ORIENTATION OF THE STIMULUS. The response of the identified Elaborate cells was selective for the orientation of the stimulus. This conclusion is based on qualitative tests routinely performed for all cells in the course of identifying the critical feature and an additional quantitative measurement performed for 28 cells (with >4 orientations for 19 cells and with 2 opposing orientations for 9 cells). Figure 12 lists quantified results of eight cells, selected to represent the whole variety. Rotation of the optimal pattern by 90° decreased the magnitude of the response by >50% for most cells (A-F in Fig. 12, 12/19). The tuning of the remaining cells was broader: the response was reduced by >50% only by a rotation of 180° (G, 6/19), or the cell showed only <50% change (H, 2/28).

FIG. 12. Tuning curves of 8 Elaborate cells for the orientation of the stimulus. Abscissas are deviations from optimal orientations. Positive value is in the clockwise direction. Ordinates are magnitudes of responses normalized by the maximum response to the optimal orientation. Thin horizontal lines are drawn at 0.5 to visualize the steepness of tuning. A, C, E, and G are for the same cells as cell 2 of Fig. 7, cell 1 of Fig. 9, cell 1 of Fig. 11, and cell 1 of Fig. 8, respectively. H is the most flat tuning out of 28 cells tested quantitatively. * indicates that the response is significantly smaller than the maximum response with a possible error < 1%.

SELECTIVITY OF THE IDENTIFIED ELABORATE CELLS FOR THE SIZE OF THE STIMULUS PATTERN. A moderate change in size was tested for many cells in the course of identifying the critical feature (see cell 1 of Fig. 9), but a change larger than four times was tested only for 10 cells. This limitation was partly caused by the troublesome procedure of making very big and very small stimuli by hand. Cells 1 and 2 of Fig. 13 are the 2 extremes among these 10 cells. Cell 1 (see also Fig. 7, top) showed comparable responses to star-shaped cuts tolerating a change in size as large as eightfold. The response of cell 2 was significantly reduced when the optimal pattern was halved or doubled. The other eight cells were intermediate between these two examples.

FIG. 13. Degree of tolerance to size changes of the stimulus varied from cell to cell. Two extremes are shown. Cell 1 is the same as cell 1 of Fig. 7.

CELLS SELECTIVELY RESPONDING TO THE SIGHT OF A FACE. Thirty-one cells in anterior IT selectively responded to the sight of a face. These responses were selective to a face in the sense that they showed weak or no responses to the other 3-D objects included in the initial routine set and to various regular 2-D patterns. Most cells responded to faces of both monkeys and humans, but five cells responded only to a monkey face and two cells only to a human face. Three of these cells responded to a lateral view of head (profile) and the remaining cells to a front view of head.

Cells that selectively responded to the sight of a face have been found in the depth of the superior temporal sulcus (Bruce et al. 1981; Desimone et al. 1984; Perrett et al. 1982). These responses of cells in the fundus of the superior temporal sulcus were reported to be insensitive to the orientation of the face in the frontoparallel plane (Desimone et al. 1984; Perrett et al. 1982), whereas the responses of the present cells were selective for the orientation of the face in the frontoparallel plane. For all of 10 cells tested quantitatively, a rotation of the face 90° from the optimal orientation reduced the response by >50% (Fig. 14, A-F). Eleven other cells that were tested only qualitatively also showed a sensitivity for the orientation. The remaining 10 cells were examined only with upright faces and were not tested on the selectivity for the orientation. The optimal orientations of the 21 cells included many different orientations, although the upright face was more frequently represented than the other orientations (Fig. 14G). Interestingly, for some cells, a face rotated 180° from the optimal evoked larger responses than faces rotated 90° from the optimal (Fig. 14, C, E, and F; all P < 0.01). The cell shown in Fig. 14F is the most striking example of this.

UNIDENTIFIED ELABORATE CELLS. For 87 Elaborate cells, we failed to reduce the requirement for the activation from 3-D objects to particular 2-D features. Fifty-eight cells responded equally well to more than two objects, and the remaining 29 cells responded only to single objects. It is possible that these unidentified Elaborate cells responded to some depth cue, because we always tried to reduce the requirement to 2-D features. Cell 1 of Fig. 15 was suggestive. It responded to the bottom of a pineapple (A), a potato (B), a brown sphere (C), and a red sphere (not shown), but not to spheres of other colors or a brown

FIG. 14. A-F: tuning curves of 6 face cells for the orientation of the face. Abscissas are absolute orientations of the face. Zero is the upright position. Ordinates are magnitudes of the responses normalized by the maximum response to the optimal orientation. G: distribution of the best orientation among 21 face cells.


feather brush. The feature common to the effective objects was the spherical shape and the brown or red color. However, a brown disk was much less effective (D). We then suspected that some subtle texture on the surface was required and added various kinds of regular and irregular textures to the brown disk. These textured disks still failed to activate the cell (exemplified by E and F). We concluded that some depth cue that was difficult to reconstruct by hand might be responsible. Binocular disparity was irrelevant, because these Elaborate cells were activated by a monocular presentation of the 3-D objects.

It is also possible that some unidentified Elaborate cells responded to more than two independent features. Cell 2 of Fig. 15 is the most suggestive example for this. The cell responded equally well to a dot (G) and a windmill pattern (H) but not to patterns composed of concentric rings (I). We suspected that some particular spatial frequency component was responsible and tested this possibility by presenting rectangular gratings of various spatial frequencies, but the response evoked by the gratings was only up to 30% of the maximum (exemplified in J and K).

FIG. 15. Cell 1: example of an unidentified Elaborate cell. Bottom of a pineapple (A), a potato (B), and a brown sphere (C) activated the cell, whereas a brown disk was not effective (D). We attached various textures on the brown disk, but none of them activated the cell (E and F). Cell 2: another example of an unidentified Elaborate cell. A dot pattern (G) and a radial pattern like a windmill (H) activated the cell, whereas a pattern composed of concentric rings (I), vertical stripes (J), or horizontal stripes (K) failed to activate the cell. Interspot interval, interval between neighboring rings, and interstripe interval were 4°. Diameter of spots, width of light rings, and width of light bars were 2°. Patterns were projected within a stimulus window 40° in diameter.

Distribution of Elaborate cells

Because Elaborate cells showed a wide variety in the complexity of the critical feature and because anterior IT is huge in its areal extent, it seemed possible that anterior IT is divided into several subareas with the response specificity of cells made up in progressive anatomic stages from subarea to subarea. If this is true, one would expect that the level of complexity of the response specificity varies in different subareas. To test this possibility, we further classified the identified Elaborate cells according to the level of the complexity of the critical feature and examined the distribution of cells of the subgroups.

We tested two systems of classification. In the first we distinguished Elaborate cells selective only for shape and Elaborate cells selective for shape combined with texture or color. In Fig. 16, top, the shape-only cells are indicated by circles, and the combination cells are indicated by combined symbols. There was no clear separation between the shape-only cells and the combination cells. There were some combination cells even at the posterior edge of the region, and there were some shape-only cells at the most anterior part within the mapped region. Cells that selectively responded to faces showed a grouping at the most posterior part in anterior IT.

In the second way of subdividing the identified Elaborate cells, we asked whether the critical feature can be divided into multiple segments. The cells for which critical features can be divided into multiple segments are called multisegment, and the cells with critical features composed of single segments are called single-segment. Although this criterion is more or less arbitrary, the conclusion about the clustering does not change by shifting the border of division. In Fig. 16, bottom, single-segment cells are indicated by filled circles and multisegment cells by open circles. The single-segment cells that were classified with great uncertainty are indicated by filled triangles, and the multisegment cells classified with great uncertainty are indicated by open triangles. There was no clear regional separation between single-segment cells and multisegment cells, although the penetrations were probably too sparse to detect a fine organization.

We thus failed to find a way in which anterior IT can be divided into several subdivisions containing cells of different properties, but this does not rule out organizations on more microscopic levels. For example, it has been established that V1 cells lined up in a direction normal to the cortical surface show similar preference for the orientation of contours and similar ocular dominance. Although we do not have enough data to conclude whether IT cells lined up in the direction normal to the surface show related critical features, we have some suggestive preliminary results. It usually took several hours to identify the critical feature of a cell, so the number of cells that could be studied along a single penetration was limited. However, in several penetrations successive cells showed similar critical features, and


FIG. 16. Distribution of subgroups of Elaborate cells. Top: identified Elaborate cells are subdivided by their critical features into shape only, shape + texture, shape + color, texture + color, shape + texture + color, and face. There was no clear separation between cells requiring shape only and cells requiring shape + something, although the face cells clustered at the most posterior part of anterior IT (around anterior 8). Bottom: identified Elaborate cells are subdivided into single-segment cells, the critical features of which can be assumed to be composed of a single segment, multisegment cells, the critical features of which can be divided into >2 segments, and face cells. Filled and open triangles indicate cells that were classified, with uncertainty, into single-segment and multisegment cells. There was no clear separation between single-segment and multisegment cells.

it took less time than usual to identify their critical features. Five of these penetrations are illustrated in Fig. 17. In the first penetration (the leftmost), the critical features of the cells were related to gratings. The first two cells were Texture cells, which responded to a grating. They had surround inhibition and only responded to a grating within a limited area. Then two Elaborate cells were recorded. One responded to a combination of a vertical grating and a horizontal grating (cell 1 of Fig. 11), and the other responded to a grating within a circular area.

In the second penetration, the cells showed critical features that can be related to the sight of a hand. They included two Texture cells that responded to rather rough gratings, an Elaborate cell that responded to an elongated block regardless of the orientation, two Primary cells that responded to an elongated block with orientation selectivity, two Elaborate cells that responded only to the realistic silhouette of a hand, and two Elaborate cells that responded to a combination of 5 bars projecting from a disk. We can speculate that the features detected by the former five cells are related to the sight of fingers.

In the third penetration, we recorded an Elaborate cell that selectively responded to a T-shaped cut (cell 2 of Fig. 8) and three Elaborate cells that selectively responded to a combination of a bar with a disk (one of which is cell 1 of Fig. 8). If we regard the T shape as a projection of a short bar from a long bar, then all the cells were related to a “projection.”

In the fourth penetration, we recorded three cells that selectively responded to faces, and they responded to different orientations. The cell recorded at the deepest position also responded well to a face, but it responded to a rear view of a head as well. The requirement was finally reduced to a combination of a dark big body with a smaller light piece.


FIG. 17. Five penetrations in which cells showed related critical features. Positions of recording of cells are indicated by short horizontal lines on the electrode track (vertical line). Top is the surface of the cortex. Elab., identified Elaborate cells; unidentified, unidentified Elaborate cells. Because we did not determine the position of the transition from the gray matter to the white matter, we cannot determine the layer localization of recording positions. Positions of the entrances of these penetrations are indicated in Fig. 1 by 17.1-17.5.

FIG. 18. Averaged square root of the area of the receptive field. Areal extent of the receptive fields was measured by assuming they were elliptic in shape, and the square root of the areal extent was averaged over cells recorded in a single penetration. Circles indicate the averaged value by their diameter and the position of the penetration by the position of their center. Figures attached to circles indicate the number of cells over which the averaging was made.
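As a rough sketch of the measure plotted in Fig. 18, the snippet below computes the square root of the area of an elliptical field and averages it over the cells of one penetration. It assumes that each field is described by its full width and height in degrees; the paper does not state how the axes were taken, and the function names and sample values are hypothetical.

```python
import math


def sqrt_rf_area(width_deg, height_deg):
    """Square root of the area of an ellipse whose full axes are width_deg and height_deg."""
    area = math.pi * (width_deg / 2.0) * (height_deg / 2.0)
    return math.sqrt(area)


def penetration_average(fields):
    """Average the square-root-of-area measure over the cells of one penetration."""
    values = [sqrt_rf_area(w, h) for w, h in fields]
    return sum(values) / len(values)


# Hypothetical (width, height) pairs in degrees for three cells of one penetration.
fields = [(10.0, 8.0), (20.0, 15.0), (12.0, 12.0)]
print(round(penetration_average(fields), 2))
```

Taking the square root gives a value in degrees, so it can be compared directly with a linear field dimension such as a diameter.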


TABLE 4. Square root of the area of the receptive field

             Prelunate Gyrus, °     Posterior IT, °        Anterior IT, °
Primary      1.94 ± 1.18 (96)       3.66 ± 4.34 (96)       12.38 ± 8.89 (39)
Texture      4.17 ± 3.77 (7)        3.03 ± 1.36 (3)        13.83 ± 10.01 (16)
Elaborate    2.25 ± 0.78 (2)        10.12 ± 7.96 (11)      13.62 ± 7.32 (152)

Values are means ± SD; numbers in parentheses are numbers of cells tested. IT, inferotemporal cortex.
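The entries of Table 4 can be pooled per region with a cell-weighted mean, which also checks that the cell counts add up to the regional totals of 105, 110, and 207 cells. This is a minimal sketch: the pooled means are derived here from the table and are not reported in the paper, and the names are illustrative.

```python
# (mean, SD, number of cells) per cell class, copied from Table 4; values in degrees.
table4 = {
    "prelunate gyrus": {"Primary": (1.94, 1.18, 96), "Texture": (4.17, 3.77, 7), "Elaborate": (2.25, 0.78, 2)},
    "posterior IT": {"Primary": (3.66, 4.34, 96), "Texture": (3.03, 1.36, 3), "Elaborate": (10.12, 7.96, 11)},
    "anterior IT": {"Primary": (12.38, 8.89, 39), "Texture": (13.83, 10.01, 16), "Elaborate": (13.62, 7.32, 152)},
}


def cells_tested(region):
    """Total number of cells tested in a region, summed over cell classes."""
    return sum(n for _, _, n in table4[region].values())


def pooled_mean(region):
    """Cell-weighted mean of the class means in a region."""
    total = sum(mean * n for mean, _, n in table4[region].values())
    return total / cells_tested(region)


for region in table4:
    print(region, cells_tested(region), round(pooled_mean(region), 2))
```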

Finally, in the fifth penetration, we succeeded in identifying the critical features for only two cells. One cell responded to a dark star-shaped cut and the other cell responded to a red star attached to an end of a blue tongue.

Size and position of the receptive field

The extent of the receptive field was determined for 105 cells in the prelunate gyrus, 110 cells in posterior IT, and 207 cells in anterior IT. The receptive field was much larger in anterior IT than in posterior IT and the prelunate gyrus. Figure 18 shows the value of the square root of the area of the receptive fields averaged over cells recorded in single penetrations; the diameter of the circle indicates the averaged value and the center of the circle indicates the position of the penetration. There is a clear discontinuity at the border between posterior IT and anterior IT, determined by the discontinuity in the selectivity of responses. The mean field size is given in Table 4 separately for cells of different classes in different regions. All of the Primary, Texture, and Elaborate cells in anterior IT had large receptive fields, whereas all of the Primary and Texture cells in the prelunate gyrus and posterior IT had small receptive fields. The Elaborate cells in posterior IT were rather exceptional in the posterior regions: their fields were as large as the fields of anterior IT cells.

The central visual field was over-represented throughout the mapped region. Receptive fields of cells in anterior IT usually included the fovea (152/207), and most cells in the prelunate gyrus and posterior IT had their receptive fields near the fovea with the center within 5° (94/105 in the prelunate gyrus and 93/110 in posterior IT).

DISCUSSION

The highlight of the present results is that most cells in the anterior two-thirds or three-quarters of IT were maximally activated only by more complex stimuli than bars, disks, and simple texture. The selectivity of responses cannot be explained by the selectivity for the orientation, size, and color of stimuli or by the selectivity to simple texture. This is in line with the previous findings by Gross, Desimone, and colleagues in IT that there are cells specifically responding to brushlike shapes, hands, or faces (Desimone et al. 1984; Gross et al. 1969, 1972; Schwartz et al. 1983). We expanded these previous findings to the coding of general objects by finding cells that each responded specifically to a feature unique to images of a limited number of real objects. The critical feature varied from cell to cell, and the


critical features of anterior IT cells as a whole may constitute a basic set of features with which the image of any particular object can be discriminated from images of the other objects. This set of selective responses of cells to various features may constitute a basis for the critical role of this region in visual object discrimination and recognition. On the other hand, we confirmed the previous results that anterior IT cells have large receptive fields and posterior IT cells have smaller receptive fields (Desimone and Gross 1979; Gross et al. 1969, 1972).

Combination coding

Although responses of anterior IT cells were thus selective to particular features, the features coded by individual cells were not complex enough to indicate a concept of particular real objects. Instead, they may be partial features common to images of several different objects. This means that simultaneous activation of a few to a few tens of cells is required to indicate the concept of a particular object. In addition, because the responses of the cells were selective for the orientation of the stimulus, different sets are required for indicating views of the object with different orientations. Images of objects are thus coded by combinations of active cells, each of which represents the presence of a particular partial feature. We will call this type of coding “combination coding.” An advantage of the combination coding over the “local coding” (Barlow 1972) is the capability of generalization, although not so strong as that of the “distributed coding.” That is, knowledge acquired for an item represented by a population of cells is automatically generalized for other items represented by mostly overlapping populations (Hinton et al. 1986).

Why is the coding of faces and that of hands different from the coding of the other general objects? We confirmed the previous results (Bruce et al. 1981; Desimone et al. 1984; Gross et al. 1972) by finding cells that responded only to the sight of a face or the realistic silhouette of a hand. Activity of a single cell, instead of a population of cells, seems enough to indicate the presence of a face or a hand. This may be explained by two reasons. One is the especially frequent occurrence of seeing faces and hands. The other is the special meanings of faces and hands. The face and hand have meanings quite distinct from those of the other objects, and therefore knowledge about faces and hands should not be generalized to the other objects.

Although responses of the face cells are highly selective in the sense that they do not respond to objects other than faces, their selectivity among different faces is not sharp. They responded to virtually all different faces with only broad tunings (Baylis et al. 1985; Yamane et al. 1988). On the basis of these findings, Rolls (1987) proposed that different faces are coded in a distributed manner by many face cells. This distributed coding may have the advantage of capability of generalization among faces. Desimone et al. (1984) proposed that a way of distributed coding similar to that proposed by Rolls is also used for the coding of general objects by emphasizing their finding that “Many IT cells responded equally to nearly every stimulus tested, and most of the stimulus-selective cells gave at least a small response


to virtually every stimulus tested, especially visually complex stimuli” (p. 2061 in Desimone et al. 1984). Our combination coding is different from such a distributed coding scheme. We propose that activity of each individual cell represents the presence of a particular partial feature in the image, and therefore a cell is active only when the monkey sees images of a limited number of objects that contain the critical feature of the cell. However, the discrepancy with Desimone et al. (1984) may be mostly due to a difference in emphasis.
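The idea of combination coding can be illustrated with a toy sketch in which each object image is represented by the set of feature cells it activates. Everything in it is an assumption made for illustration: the cell and object names are hypothetical, and the Jaccard overlap and best-match readout are simple stand-ins chosen here, not mechanisms proposed in the paper.

```python
# Each object image is represented by the set of feature cells it activates.
object_codes = {
    "object_a": {"cell_01", "cell_02", "cell_03", "cell_04"},
    "object_b": {"cell_02", "cell_03", "cell_04", "cell_05"},
    "object_c": {"cell_10", "cell_11", "cell_12"},
}


def overlap(code_x, code_y):
    """Jaccard overlap between two sets of active cells (0 = disjoint, 1 = identical)."""
    union = code_x | code_y
    return len(code_x & code_y) / len(union) if union else 0.0


def best_match(active_cells, codes):
    """Return the stored object whose code overlaps most with the active set."""
    return max(codes, key=lambda name: overlap(active_cells, codes[name]))


print(overlap(object_codes["object_a"], object_codes["object_b"]))
print(overlap(object_codes["object_a"], object_codes["object_c"]))
print(best_match({"cell_02", "cell_03", "cell_05"}, object_codes))
```

Objects that share most of their active cells, such as the first two here, are the ones between which acquired knowledge would generalize, whereas an object coded by a disjoint set of cells is unaffected.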

Anatomic substance of anterior IT

The IT has been divided into TEO and TE on the basis of cytoarchitectural properties and effects of lesions (Iwai and Mishkin 1967; von Bonin and Bailey 1947, 1950). Recent anatomic studies (Felleman et al. 1986; Fenstemaker et al. 1984; Ungerleider et al. 1986) further divided TE into posterior and anterior parts by showing that V4 projects to the posterior part of TE as well as to TEO. These anatomic studies also gave a criterion to determine the border between TEO and TE. The projection from V4 to TEO is topographically organized, whereas the projection from V4 to the posterior part of TE shows no topographical organization. On the basis of the correspondence in positions relative to the sulci, we tentatively suggest that our posterior IT corresponds to TEO and our anterior IT includes both the posterior and anterior parts of TE. Our finding that the size of the receptive fields increased rather dramatically as the recording position went from posterior IT to anterior IT can be explained by the difference in the connections from V4, namely, the presence and the absence of a topographical organization. Although we thus found physiological counterparts for the anatomic segregation between TEO and the posterior part of TE, we failed to find differences between the posterior and anterior parts of TE. We also failed to find a difference between V4 and TEO. The conclusion here is in line with the conclusion of Fenstemaker et al. (1985) that the receptive field became larger and the stimulus selectivity became more complex approaching the border between TEO and TE.

Finally, we ask how the elaborate selectivity to complex features comes about. Because cells with different levels of complexity were intermingled throughout anterior IT, it is not likely that the selectivity develops along global connections from subregion to subregion of anterior IT. We raise two alternative possibilities, in which Elaborate cells in posterior IT are evaluated differently. Elaborate cells were found in posterior IT, although their population was small. If we emphasize the presence of Elaborate cells in posterior IT, we would hypothesize that the selectivity is mostly made up in posterior IT. Primary cells, which constitute a majority in posterior IT, converge to make Elaborate cells through local connections in posterior IT, and these posterior Elaborate cells project to anterior IT. If we think that the number of Elaborate cells in posterior IT is too small, we would hypothesize that the selectivity develops rapidly through local connections in anterior IT. This latter possibility may be supported by the fact that in anterior IT there were several penetrations in which the critical features of cells varied in complexity but could be related.

The authors are grateful to Profs. Charles G. Gross and David H. Hubel for critical reading of an early version of the manuscript. They also thank D. Brockmeyer for improving the English.

Present address of Madoka Moriya: Dept. of Anatomy, Faculty of Medicine, Tokai University, Bouseidai, Isehara-city, Kanagawa, 259-11 Japan.

Address for reprint requests: K. Tanaka, Laboratory for Neural Information Processing, Frontier Research Program, The RIKEN Institute, Hirosawa, Wako-city, Saitama, 351-01 Japan.

Received 18 June 1990; accepted in final form 7 March 1991.

REFERENCES

AMARAL, D. G. AND PRICE, J. L. Amygdalo-cortical projections in the monkey (Macaca fascicularis). J. Comp. Neurol. 230: 465-496, 1984.
BARLOW, H. B. Single units and sensation: a neuron doctrine for perceptual psychology? Perception 1: 371-394, 1972.
BARLOW, H. B., BLAKEMORE, C., AND PETTIGREW, J. D. The neural mechanism of binocular depth discrimination. J. Physiol. Lond. 193: 327-342, 1967.
BAYLIS, G. C., ROLLS, E. T., AND LEONARD, C. M. Selectivity between faces in the responses of a population of neurons in the cortex in the superior temporal sulcus of the monkey. Brain Res. 342: 91-102, 1985.
BAYLIS, G. C., ROLLS, E. T., AND LEONARD, C. M. Functional subdivisions of the temporal lobe neocortex. J. Neurosci. 7: 330-342, 1987.
BRUCE, C. J., DESIMONE, R., AND GROSS, C. G. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J. Neurophysiol. 46: 369-384, 1981.
COWEY, A. AND GROSS, C. G. Effects of foveal prestriate and inferotemporal lesions on visual discrimination by rhesus monkeys. Exp. Brain Res. 11: 128-144, 1970.
DEAN, P. Effects of inferotemporal lesions on the behavior of monkeys. Psychol. Bull. 83: 41-71, 1976.
DESIMONE, R., ALBRIGHT, T. D., GROSS, C. G., AND BRUCE, C. Stimulus-selective properties of inferior temporal neurons in the macaque. J. Neurosci. 4: 2051-2062, 1984.
DESIMONE, R., FLEMING, J., AND GROSS, C. G. Prestriate afferents to inferior temporal cortex: an HRP study. Brain Res. 184: 41-55, 1980.
DESIMONE, R. AND GROSS, C. G. Visual areas in the temporal cortex of the macaque. Brain Res. 178: 363-380, 1979.
FELLEMAN, D. J., KNIERIM, J. J., AND VAN ESSEN, D. C. Multiple topographic and non-topographic subdivisions of the temporal lobe revealed by the connections of area V4 in macaques. Soc. Neurosci. Abstr. 12: 1182, 1986.
FENSTEMAKER, S. B., ALBRIGHT, T. D., AND GROSS, C. G. Organization and neuronal properties of visual area TEO. Soc. Neurosci. Abstr. 11: 1012, 1985.
FENSTEMAKER, S. B., OLSON, C. R., AND GROSS, C. G. Afferent connections of macaque visual areas V4 and TEO. ARVO Abstr. 25: 213, 1984.
GROSS, C. G. Visual functions of inferotemporal cortex. In: Handbook of Sensory Physiology, edited by R. Jung. Berlin: Springer-Verlag, 1972, vol. VIII, part 3B, p. 451-482.
GROSS, C. G., BENDER, D. B., AND ROCHA-MIRANDA, C. E. Visual receptive fields of neurons in inferotemporal cortex of the monkey. Science Wash. DC 166: 1303-1306, 1969.
GROSS, C. G., ROCHA-MIRANDA, C. E., AND BENDER, D. B. Visual properties of neurons in inferotemporal cortex of the macaque. J. Neurophysiol. 35: 96-111, 1972.
HERZOG, A. G. AND VAN HOESEN, G. W. Temporal neocortical afferent connections to the amygdala in the rhesus monkey. Brain Res. 115: 57-69, 1976.
HINTON, G. E., MCCLELLAND, J. L., AND RUMELHART, D. E. Distributed representations. In: Parallel Distributed Processing, edited by D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. Cambridge, MA: MIT Press, 1986, p. 77-109.
IWAI, E. AND MISHKIN, M. Further evidence on the locus of the visual area in the temporal lobe of the monkey. Exp. Neurol. 25: 585-594, 1967.
IWAI, E. AND YUKIE, M. Amygdalofugal and amygdalopetal connections with modality-specific visual cortical areas in macaques (Macaca fuscata, M. mulatta, and M. fascicularis). J. Comp. Neurol. 261: 362-387, 1987.
IWAI, E. AND YUKIE, M. A direct projection from hippocampal field CA1 to ventral area TE of inferotemporal cortex in the monkey. Brain Res. 444: 397-401, 1988.
MISHKIN, M. A memory system in monkey. Philos. Trans. R. Soc. Lond. B Biol. Sci. 298: 85-95, 1982.
PERRETT, D. I., ROLLS, E. T., AND CAAN, W. Visual neurons responsive to faces in the monkey temporal cortex. Exp. Brain Res. 47: 329-342, 1982.
ROCKLAND, K. S. AND PANDYA, D. N. Laminar origins and terminations of cortical connections of the occipital lobe in the rhesus monkey. Brain Res. 179: 3-20, 1979.
ROLLS, E. T. Information representation, processing, and storage in the brain: analysis at the single neuron level. In: The Neural and Molecular Bases of Learning, edited by J.-P. Changeux and M. Konishi. New York: Wiley, 1987, p. 503-540.
SAITO, H., TANAKA, K., FUKUMOTO, M., AND FUKADA, Y. The inferior temporal cortex of the macaque monkey: II. The level of complexity in the integration of pattern information. Soc. Neurosci. Abstr. 13: 628, 1987.
SAITO, H., YUKIE, M., TANAKA, K., HIKOSAKA, K., FUKADA, Y., AND IWAI, E. Integration of direction signals of image motion in the superior temporal sulcus of the macaque monkey. J. Neurosci. 6: 145-157, 1986.
SCHWARTZ, E. L., DESIMONE, R., ALBRIGHT, T. D., AND GROSS, C. G. Shape recognition and inferior temporal neurons. Proc. Natl. Acad. Sci. USA 80: 5776-5778, 1983.
SELTZER, B. AND PANDYA, D. N. Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res. 149: 1-24, 1978.
SELTZER, B. AND PANDYA, D. N. Intrinsic connections and architectonics of the superior temporal sulcus in the rhesus monkey. J. Comp. Neurol. 290: 451-471, 1989.
TANAKA, K., FUKADA, Y., FUKUMOTO, M., AND SAITO, H. The inferior temporal cortex of the macaque monkey: I. Regional difference in response properties of cells. Soc. Neurosci. Abstr. 13: 627, 1987.
TANAKA, K., FUKADA, Y., AND SAITO, H. Underlying mechanisms of the response specificity of expansion/contraction and rotation cells in the dorsal part of the medial superior temporal area of the macaque monkey. J. Neurophysiol. 62: 642-656, 1989.
TANAKA, K., HIKOSAKA, K., SAITO, H., YUKIE, M., FUKADA, Y., AND IWAI, E. Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey. J. Neurosci. 6: 134-144, 1986.
TANAKA, K. AND SAITO, H. Analysis of motion of the visual field by direction, expansion/contraction, and rotation cells clustered in the dorsal part of the medial superior temporal area of the macaque monkey. J. Neurophysiol. 62: 626-641, 1989.
TURNER, B. H., MISHKIN, M., AND KNAPP, M. Organization of the amygdalopetal projections from modality-specific cortical association areas in the monkey. J. Comp. Neurol. 191: 515-543, 1980.
UNGERLEIDER, L. G., DESIMONE, R., AND MORAN, J. Asymmetry of central and peripheral field inputs from area V4 into the temporal and parietal lobes of the macaque. Soc. Neurosci. Abstr. 12: 1182, 1986.
VAN HOESEN, G. W. The parahippocampal gyrus: new observations regarding its cortical connections in the monkey. Trends Neurosci. 5: 345-350, 1982.
VON BONIN, G. AND BAILEY, P. The Neocortex of Macaca Mulatta. Urbana, IL: Univ. of Illinois Press, 1947.
VON BONIN, G. AND BAILEY, P. The Isocortex of the Chimpanzee. Urbana, IL: Univ. of Illinois Press, 1950.
WELLER, R. E. AND KAAS, J. H. Subdivisions and connections of inferior temporal cortex in owl monkeys. J. Comp. Neurol. 256: 137-172, 1987.
YAMANE, S., KAJI, S., AND KAWANO, K. What facial features activate face neurons in the inferotemporal cortex of the monkey? Exp. Brain Res. 73: 209-214, 1988.
YUKIE, M. AND IWAI, E. Direct projections from the ventral TE area of the inferotemporal cortex to hippocampal field CA1 in the monkey. Neurosci. Lett. 88: 6-10, 1988.
ZEKI, S. M. Cortical projections from two prestriate areas in the monkey. Brain Res. 34: 19-35, 1971.
