Opinion

Space reconstruction by primary visual cortex activity: a parallel, non-computational mechanism of object representation

Moshe Gur
Department of Biomedical Engineering, Technion, Israel Institute of Technology, Haifa, Israel

The current view posits that objects, despite changes in appearance, are uniquely encoded by 'expert' cells. This view is untenable. First, even if cell ensemble responses are invariant and unique, we are consciously aware of all of the objects' details. Second, beyond the problem of detail preservation, data show that the current hypothesis also fails to account for uniqueness and invariance. I present an alternative view whereby objects' representation and recognition are based on parallel representation of space by primary visual cortex (V1) responses. Information necessary for invariance and other attributes is handled in series by other cortical areas through integration, interpolation, and hierarchical convergence. The parallel and serial mechanisms combine to enable our flexible space perception. Only in this alternative view is conscious perception consistent with the underlying architecture.

Corresponding author: Gur, M. ([email protected]).
Keywords: vision; object representation; recognition; conscious perception; parallel processing.
0166-2236/© 2015 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tins.2015.02.005

'Where there is a question of "mind" the nervous system does not integrate itself by centralization upon one pontifical cell. Rather it elaborates a million-fold democracy whose each unit is a cell' [1].

It is time for a different view on the neural basis of conscious object perception
How the retinal image is transformed into our object-based 3D perception has been the focus of much research in the past five decades. While most investigators have shied away from dealing with the question of how objective, physical brain activity generates private subjective percepts, trying to understand what brain activity is likely to generate conscious visual perception has attracted much attention recently [2–4]. Let me stress that here I do not deal with the question of how neural activity generates conscious perception; rather, I discuss the neuronal substrate of perception: what neural activity results in perception? This question is relevant to all models wishing to find the relationship, or

correlation, between activity in various visual areas and conscious perception. Models that aim to describe a path from visual input to action that bypasses perception are not considered. Practically all models trying to explain space perception are influenced by the notion that the 'brain is a remarkable computer' [5]. Thus, all suggested brain mechanisms are ones that can, in principle, be instantiated by computer programs. This approach is powerful in allowing rigorous simulation and testing of various models but, as I show below, it restricts our thinking to 'computer-friendly' theories that ignore prima facie perceptual evidence. Here I offer a different, non-computational view that is driven by well-accepted anatomical, physiological, and perceptual data (Box 1).

Current view of object representation and recognition
The dominant view is that images are analyzed into edges and line segments by V1 feature-selective cells and that, after several steps of hierarchical convergence and integration, small ensembles of expert cells, by their collective responses, represent objects uniquely and invariantly.
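This feedforward scheme is easy to caricature in code. The listing below is my toy illustration, not a model taken from this article or from the models it cites: alternating stages of oriented filtering (feature selectivity) and max-pooling (invariance), with all filter shapes and sizes chosen arbitrarily. Units deeper in the hierarchy respond to more elaborate combinations over progressively larger regions, at the price of discarding exact positions, which is precisely the trade-off questioned below.

import numpy as np

def oriented_filters(size=7, n_orient=4):
    """A bank of crude oriented bar filters (stand-ins for V1 simple cells)."""
    c = size // 2
    ys, xs = np.mgrid[0:size, 0:size] - c
    filters = []
    for k in range(n_orient):
        theta = np.pi * k / n_orient
        d = np.abs(-np.sin(theta) * xs + np.cos(theta) * ys)  # distance from an oriented line
        f = np.where(d < 1.0, 1.0, -0.25)
        filters.append(f / np.abs(f).sum())
    return filters

def convolve2d_valid(img, f):
    """Plain 'valid' 2D correlation (kept dependency-free on purpose)."""
    h, w = f.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * f)
    return out

def max_pool(x, k=2):
    """Spatial max-pooling: the canonical invariance operation."""
    H, W = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    return x[:H, :W].reshape(H // k, k, W // k, k).max(axis=(1, 3))

def hierarchy(img, n_layers=3):
    """Alternate filtering and pooling; deeper maps are more selective and
    coarser, i.e., their effective receptive fields keep growing."""
    maps = [img]
    for _ in range(n_layers):
        responses = [convolve2d_valid(maps[-1], f) for f in oriented_filters()]
        combined = np.maximum.reduce(responses)  # crude feature combination
        maps.append(max_pool(np.maximum(combined, 0.0)))
    return maps

img = np.zeros((64, 64))
img[20:44, 31:33] = 1.0  # a vertical bar as input
for level, m in enumerate(hierarchy(img)):
    print(f"level {level}: map shape {m.shape}")

The shrinking map sizes at each level are the code-level counterpart of the growing RFs described below for the ventral hierarchy.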

Glossary
2-Deoxyglucose (2DG) imaging: uses the radioactive tracer 14C-2-deoxy-D-glucose to image active (glucose-consuming) brain areas.
Fovea: a small retinal area of 1.5 mm diameter with a high density of photoreceptors. Responsible for high-acuity vision.
Hypercolumn: a module in V1 comprising orientation and eye-dominance columns. All cells within a hypercolumn receive input from the same VF area.
Lateral geniculate nucleus (LGN): a subcortical relay nucleus receiving its sensory input from the retina. Its main output is to V1.
Receptive field (visual) (RF): an area in the outside world or on the retina whose stimulation leads to responses of a particular visual neuron.
Retinotopy: a projection of visual input from the retina to a brain area that preserves retinal topography.
Simple cell (in V1): has an orientation-specific RF with nonoverlapping regions that are excited by either light increments or light decrements.
Sparse population coding: coding an item such as a face by a relatively small number of nerve cells.
Visual dorsal system: originates in V1 and continues along the dorsal cortical surface into the parietal cortex. Believed to be involved in non-conscious analysis of spatial locations, shapes, and orientations of objects, leading to reaching and grasping movements.
Visual ventral system: originates in V1 and continues along the ventral cortical surface into the temporal cortex. Assumed to lead to conscious perception and recognition of visual shapes and objects.


Box 1. On the uniqueness of space perception
No sensory attribute of the world except space is perceived as such; all are transformed by the brain into new, internal-only constructs. For example, wavelengths are transformed into colors and air movements into sounds. When seeing colors, hearing sounds, or feeling pain we have no direct information on the nature of the energies in the outside world that are the source of these internal percepts. So when a tree falls in the forest and there is no one around, there is no sound, just air movement. This internal-only representation of sensory data is often used to show that the brain is not a passive, computer-like data processor but rather an organ creating a new, not externally observed reality.

For obvious reasons this subjective reality has created difficulties in accepting as valid data first-person reports on sensory perception. If perception of blue exists only inside one's head, how can personal reports serve as reliable, objective scientific data? Generations of psychophysicists have found clever ways around this problem, so today such reports are important components of any scientific investigation of sensory perception.

Interestingly, the unique attributes of space perception that make it much less subjective than other senses have not been appreciated. Space, unlike other sensory information, is not perceived as a unique 'something else' structure but rather as a one-to-one correspondence between the internal percept and the layout of actual space elements. At the center of the visual field we perceive not only what is out there but do so with an amazing exactitude of detail and topography. Thus, our space perception can be objectively verified in at least two important ways. One is that, for a given spatial pattern, say a collection of dots of various shapes and intensities, all observers' reports will be very similar; all will state that the rightmost dot is more circular and less bright than the one below, and so on. This high degree of exactitude shared by all observers lends considerable credibility to subjective reports. The second verification of subjective reports is the ability to compare such reports with objective measurements by physical devices such as a photometer. Thus, all basic spatial characteristics reported by an observer (e.g., location, size, intensity) are completely verifiable by objective measurements. We can thus conclude that, since perception of the organization and structure of basic spatial elements closely corresponds to physical reality, evidence from perception is essential to our understanding of neural mechanisms of space perception.

Hubel and Wiesel's evidence [6,7] of orientation-selective cells in the cat and monkey V1 established the basic tenets for all succeeding models of object representation and recognition. They suggested that feature selectivity in V1 cells is generated by hierarchical convergence of cells with concentric receptive fields (RFs) (see Glossary) in the subcortical lateral geniculate nucleus (LGN) and predicted that further convergence in areas downstream from V1 would enable encoding of increasingly complex features and simultaneously allow a considerable degree of invariance. The next two areas downstream from the monkey V1, V2 and V4, show only a modest increase in feature selectivity [8–12] and it is the temporal cortex where cells that clearly respond to elaborate integrative features are found. Numerous studies showed that cells in the monkey inferotemporal (IT) cortex are selective for various complex shapes, including faces [13–21]. The hierarchical transformations leading to face-selective cells are paralleled by an increase in spatial integration, from cells integrating over a few minutes of arc in V1 to cells at the pinnacle of the hierarchy responding within a large portion of the visual field (VF). Research in homologous areas in the human visual cortex is consistent with single-cell data from the monkey IT cortex [22–25]. Experimentalists invariably identify the temporal cortex as the


site of object representation and recognition [15–17,20–25]. Presumably, responses of orientation-selective V1 cells give rise to perception of short line segments [26] while perception of more complex objects such as eyes or faces is predicated on the responses of cells in areas that are higher on the hierarchical ladder, such as V4 or the IT cortex [27]. It is thought that the collective properties of expert cells enable individual objects to be recognized despite changes in global parameters such as size or viewpoint [15].

Single-cell recordings also showed that RF properties can be modulated by top-down mechanisms. Context, experience, and, most notably, attention affect the responses of cells in V1 [28], V2 and V4 [29,30], and the IT cortex [31,32].

The physiological findings from all levels of the visual cortex that were taken as confirmation of object representation by sparse population coding (Box 2) have influenced practically all computational models of object representation and recognition. Physiology-based models, either those stressing feedforward mechanisms [33–35] or those adding top-down or lateral interactions related to attention, detail scrutiny, or awareness [36–38], have at their core a process of hierarchical convergence leading to feature elaboration. Other models that are only loosely patterned after visual cortex physiology, while emphasizing various aspects of network interactions, also assume that objects are ultimately represented by cell assemblies generated by hierarchical convergence [39–41] (Box 2).

Fundamental to the current view of object representation are two main processes: encoding and hierarchical convergence. The view assumes that once parallel retinal space information reaches feature-selective V1 cells, it is transformed into a code carried by cells that, by hierarchical convergence, respond to increasingly elaborate features. Suggested codes may be relatively simple, whereby single cells' responses encode spatial features such as lines or faces, or more complex, whereby dynamic ensembles from various levels of the visual hierarchy combine to encode the required spatial information. How such a code is decoded into our parallel, detailed space perception is usually not dealt with.

Problems with the current view
Most physiological research regarding object representation has concentrated on the properties of face-selective cells found in the IT cortex. Here I thus use findings and ideas related to 'face cells' as examples of the more general case of object representation and recognition. The idea that an ensemble of a few hundred expert cells can respond uniquely to thousands of faces yet be invariant to many possible changes in their global appearance (e.g., size, contrast, position, tilt, viewpoint, noise, shadows; see Figure 1) is generally accepted, although it is hard to think of any realistic implementation. Indeed, no system, physiological or physiologically inspired, was ever tested with stimuli that come close to our perceptual ability, and the available physiological evidence is mostly anecdotal and does not support true invariance (see below). Moreover, even if we accept that expert cell ensemble responses are able to uniquely


Box 2. Grandmother cells versus population coding
It is generally assumed that objects – faces in particular – are represented by 'expert' cell ensembles in 'high' visual areas. Presumably, cells in those ensembles acquire their properties through hierarchical convergence, such that cells in the lower rungs of the hierarchy are selective for elementary features such as the orientation of lines while cells higher in the hierarchy integrate across space to generate selectivity for more complex, abstract features, culminating in cells responding to a very narrow range of stimuli across a large expanse of space.

How specific single cells' responses are has been debated in the decades since the discovery of cells selective for faces and body parts [67]. One view, used largely as a straw man (although see [68] for a different opinion), was that a highly specific single cell, termed a grandmother cell, represents a single face (Figure I, bottom). This view was never considered seriously, since it is hard to accept that for every object that we recognize, or will recognize in the future, there is a dedicated cell. Furthermore, it is unlikely that single-cell responses would distinguish a single individual face among many thousands yet retain this distinction across the many changes in the acute appearance of this face due to variations in size, orientation, or contrast.

The dominant view is that faces are represented by the collective responses of cell populations or ensembles, where each cell is selective for the object's category while the information specific to a particular face is encoded by the whole ensemble (Figure I, top). The collective response is also assumed to be invariant to changes in the object's acute appearance. The idea that a cell population can outperform each individual member is not limited to face-selective ensembles but can be found as an explanation for discrepancies between single-cell discrimination abilities and the abilities of the whole organism. For example, a single cell's best orientation discrimination threshold is 5–10° [59,60] while the human discrimination threshold is [...] and optimal stimuli range from 4° to 10°, such results are only to be expected. However, using real faces, line-drawn faces, and photographs we found (M. Gur and A. Grinfeld, unpublished) that the threshold for face discrimination or recognition is approximately 0.2°. (The reader is encouraged to view faces as well as other objects at appropriate distances to gain an informal impression.) Thus, faces can be perceived, discriminated, and recognized without activating the 'face' cells in the IT cortex.

No canonical, invariant representation is perceived
If object representation, invariance, and, ultimately, recognition are based on the collective properties of expert cells, we should perceive objects in their normalized, invariant versions. However, we perceive objects in their acute form; for example, small or large, tilted, rotated, or smeared. The idea of an invariant transform leading to recognition is probably correct but it is not perceptually evident.
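Box 2's contrast between the two schemes can be made concrete with a toy readout comparison. The sketch below is my own illustration, with all sizes and the nearest-pattern decoder assumed for convenience: a 'grandmother' scheme needs one dedicated unit per face, whereas a population scheme represents each face as a graded response vector across a small ensemble, from which identity can be read out, although nothing in the readout explains how a decoded label would turn back into the detailed, parallel percept of that face.

import numpy as np

rng = np.random.default_rng(0)
n_faces, n_cells = 1000, 200  # many faces, relatively few 'expert' cells

# Population scheme: each face evokes a graded response vector across the
# ensemble; no single cell is diagnostic on its own.
population_code = rng.random((n_faces, n_cells))

# Grandmother scheme: one dedicated cell per face (one-hot). This needs at
# least as many cells as faces, which is the classic objection.
grandmother_code = np.eye(n_faces)

def decode(codebook, response):
    """Nearest-pattern readout: which stored face matches the response best?"""
    return int(np.argmin(np.linalg.norm(codebook - response, axis=1)))

# A noisy presentation of face #42 is still decodable from the ensemble...
probe = population_code[42] + 0.05 * rng.standard_normal(n_cells)
print("decoded identity:", decode(population_code, probe))  # 42, with high probability

# ...but decoding yields only a label; the within-face details the text
# insists we consciously perceive are nowhere in this readout.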


Figure 3. Various dot collections that are perceived as meaningful figures or objects. Note that the image is devoid of line segments, edges, corners, and T junctions.
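As a toy illustration of the distinction this figure serves later in the text (directly perceived dots versus inferred global structure), the sketch below, my own construction rather than anything from the article, keeps every dot available as-is and extracts a grouping property, the collection's dominant orientation, in a separate integrative step (here simply PCA over the dot coordinates).

import numpy as np

# Directly represented elements: a collection of dot positions, roughly
# collinear along a diagonal (a crude stand-in for one row of Figure 3).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 12)
dots = np.stack([t * 10, t * 10], axis=1) + 0.3 * rng.standard_normal((12, 2))

# Each dot is available as such: position (and, in principle, size and
# intensity); nothing is encoded away.
print("first three dots:\n", np.round(dots[:3], 2))

# The inferred global property: the grouping's dominant orientation,
# extracted by an integrative step over all dots.
centered = dots - dots.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
angle = np.degrees(np.arctan2(vt[0, 1], vt[0, 0])) % 180
print(f"inferred grouping orientation: {angle:.1f} deg")  # close to 45 deg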


Evidence from cortical impairments
The hierarchical convergence view leads to the straightforward prediction that damage to any link of the hierarchical chain should disable all aspects of object perception. However, lesions in the monkey V4 [50] and IT [51] areas impaired recognition of objects only after the objects had been globally transformed by changing size, contrast, or degree of occlusion, while non-transformed objects were normally detected. Likewise, studies in humans with face or object agnosia [52–56] have consistently shown that low-level vision is not impaired; subjects were able to compare objects, read, and perceive all details comprising the objects. In other words, damage to areas downstream from V1 severely impairs holistic perception and recognition of transformed objects but not perception of non-transformed objects or of spatial details and topography.

An alternative view: V1 response patterns are the foundation of our space perception
That space is retinotopically mapped onto V1 has been known since the early 20th century [57]. A modern version of this mapping was introduced by Horton and Hoyt [58], who showed that the retinal 2D image is transformed into a V1 activity map that preserves the topographical relationship but does so nonlinearly. This reconstruction was beautifully demonstrated by 2-deoxyglucose (2DG) imaging [26]. The term 'map' hides what is central to the alternative hypothesis: mapping is not just an organizational tool to maintain the topography of retinal inputs. The profound importance of what is implied by mapping is that the visual image is reconstructed in V1 by point-to-point responses, regardless of feature extraction by specialized cells. Thus, a V1 map is in fact the way the system continuously and meticulously ensures that the detailed parallel spatial information imaged on the retina reaches V1. A line, a collection of dots, or any complex object such as a face is available to the neural machinery as such, by generating a corresponding activity pattern in V1 [26]. Thus, all of the information that is unique to any conceivable spatial input is immediately available as V1 activity patterns. (Note that in describing my hypothesis, I use the term 'representation' as being synonymous with V1 pre-integrative activity patterns.)

The suggestion that V1 response patterns are the substrate of a fundamental parallel spatial representation is based on evidence showing that V1 is the only area where response properties are compatible with the highest perceptual resolution and topography. In the alert monkey, very small RF widths (5′–10′ at approximately 2° eccentricity [59–61]) are found. Such widths are compatible with our best resolution at that eccentricity. Closer to the center of vision, even smaller RFs are found (M. Gur and D.M. Snodderly, unpublished). 2DG imaging data [26] show that hypercolumns at the foveal representation extend for approximately 4′ and that the resolution of V1 response patterns is approximately 1.5′.
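The nonlinearity of this mapping can be made concrete with the widely used cortical magnification approximation M(E) ≈ 17.3/(E + 0.75) mm of V1 per degree of visual angle at eccentricity E degrees; the constants are a standard fit associated with the Horton and Hoyt revision [58], quoted here from general knowledge rather than from this article. Integrating M gives each eccentricity's cortical distance from the foveal representation, and the numbers show how central vision claims a grossly disproportionate share of V1 while neighbors remain neighbors:

import numpy as np

def magnification(ecc_deg):
    """Cortical magnification M(E), in mm of V1 per degree of visual field.
    M(E) = 17.3 / (E + 0.75) is a standard fit (constants assumed here)."""
    return 17.3 / (ecc_deg + 0.75)

def cortical_distance(ecc_deg):
    """Distance (mm) from the foveal representation, from integrating M:
    D(E) = 17.3 * ln((E + 0.75) / 0.75)."""
    return 17.3 * np.log((ecc_deg + 0.75) / 0.75)

for e in [0.5, 1, 2, 5, 10, 40]:
    print(f"E = {e:5.1f} deg:  M = {magnification(e):5.2f} mm/deg,  "
          f"D = {cortical_distance(e):6.1f} mm")

# The map is monotonic (retinal neighbors stay cortical neighbors) but
# strongly nonlinear: the central 2 deg occupy about as much cortex as
# the following 8 deg.
print("0-2 deg  :", round(cortical_distance(2), 1), "mm of cortex")
print("2-10 deg :", round(cortical_distance(10) - cortical_distance(2), 1), "mm of cortex")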

The large V1 cortical area dedicated to central vision [26,58] ensures that even the most detailed high-resolution spatial patterns will be reconstructed by response patterns without losing even the smallest detail. In areas downstream from V1, RF size increases considerably; thus, response patterns there are far too coarse to support our fine spatial perception.

What is the cellular basis of the V1 non-integrative, point-to-point response patterns?
The Tootell et al. [26] 2DG imaging study, which is the only one to demonstrate V1 response patterns to high-resolution stimuli, could not provide information on the cellular basis of the response patterns. It is noteworthy, however, that 2DG activity was seen in almost all V1 layers, with the highest contrast found in layers 3 and 4Cβ. I can thus only point out that layer 4Cβ contains cells with concentric RFs that will generate activity patterns similar to those generated by retinal ganglion cells and by LGN cells. In addition, many stimuli, such as any dot pattern or very short line segments, will generate responses in many orientation-selective cells regardless of the cells' specific selectivity, particularly in layer-3 cells, which have very small RFs [60,61]. V1 cells that integrate stimuli across their RFs, and cells in V1 and areas downstream from V1 with large RFs [8,62], are unlikely to be the basis of the fundamental spatial representation.

Why V1 and not the LGN?
It is much more likely that V1 response patterns rather than LGN ones are the basis of space perception, since V1, unlike the LGN, is part of the neocortex and has very extensive input/output connections with cortical and subcortical areas [63]. Also, perceptually important functions such as binocular vision and stereopsis, and the localization of contrast regardless of polarity, are predicated on binocular cells and on cells responding to both light increments and decrements, respectively, which are found in V1 but not in the LGN [64].

Two complementary mechanisms
The alternative hypothesis thus suggests that the acute V1 2D response patterns are the foundation of space perception in providing the details and the topographical organization of spatial elements. However, our perception goes further than this 2D representation. There must be processes that enable perception of 3D spatial organization, binding, foreground/background separation, coordinate transforms, and object invariance; such processes rely on global features that are independent of the unique within-object characteristics and must be extracted from the 2D patterns as well as from external sources such as the eyes' vergence-state signal. V1 feature-extracting cells, visual areas from V1 on, and some non-visual areas are there to provide information on, for example, distance, size, orientation, contrast, shading, and continuity. Such information accompanies the acute image, allowing it to be compared with a memory prototype and enabling object recognition. In other words, all cortical activity from V1 integrating cells through all other visual and non-visual areas is secondary, auxiliary, and informational – not representational.
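A hedged sketch of how the two mechanisms might combine, my own construction and deliberately simplistic: the V1-like pattern carries the acute image as such; an auxiliary channel carries a global parameter (here a tilt estimate, standing in for what orientation-tuned cells elsewhere might supply); and recognition compares the image with a stored prototype only after the auxiliary information has been used to undo the global transformation.

import numpy as np

def make_bar(n, half_len, angle_deg):
    """A toy 'acute' V1-like activity pattern: a bar at a given tilt.
    The pattern itself is the representation; nothing is encoded."""
    theta = np.radians(angle_deg)
    ys, xs = np.mgrid[0:n, 0:n] - n // 2
    u = np.cos(theta) * xs + np.sin(theta) * ys   # along the bar
    v = -np.sin(theta) * xs + np.cos(theta) * ys  # across the bar
    return ((np.abs(u) < half_len) & (np.abs(v) < 1.5)).astype(float)

def rotate(img, angle_deg):
    """Nearest-neighbor rotation about the center: a toy coordinate
    transform standing in for the 'informational' normalization step."""
    n = img.shape[0]
    c = n // 2
    theta = np.radians(angle_deg)
    ys, xs = np.mgrid[0:n, 0:n] - c
    # Inverse-map each output pixel back into the source image.
    sx = (np.cos(theta) * xs + np.sin(theta) * ys + c).round().astype(int)
    sy = (-np.sin(theta) * xs + np.cos(theta) * ys + c).round().astype(int)
    out = np.zeros_like(img)
    ok = (sx >= 0) & (sx < n) & (sy >= 0) & (sy < n)
    out[ok] = img[sy[ok], sx[ok]]
    return out

def match(a, b):
    """Normalized correlation between a pattern and a memory prototype."""
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

prototype = make_bar(48, 10, 0)   # stored memory: a horizontal bar
acute = make_bar(48, 10, 30)      # what is currently seen: the same bar, tilted

# Auxiliary (informational) channel: a tilt estimate supplied by other
# areas; here it is simply given.
estimated_tilt = 30.0
normalized = rotate(acute, -estimated_tilt)

print("match before normalization:", round(match(acute, prototype), 2))
print("match after  normalization:", round(match(normalized, prototype), 2))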


Another way of looking at the differences between the two mechanisms, the representational and the informational, is to think of them as direct versus inferred perception. As demonstrated in Figure 3, we directly perceive the detailed localized information conveyed by individual dots: size, shape, intensity, and position. However, we can extrapolate beyond individual dots to perceive particular groupings as vertical or horizontal (first row), geometric figures (second row), or complex figures such as an animal or a face (third row). Thus, interpolation, grouping, completion, integration, and comparison with memory allow us to infer the existence of complex objects from the directly perceived spatial elements. It is interesting to note that we are able to correctly recognize a figure such as a rabbit even when it is presented for 100 ms or less [65,66].

In summary, my view is that space perception is based on two complementary mechanisms, one representational and the other informational. The first process, a parallel spatial representation, is the basis of our conscious space perception and is detail preserving, non-encoded, and non-integrative. This stage is implemented in V1, which reconstructs space as high-resolution (in central vision) retinotopic response patterns. This representation preserves the parallel nature of space but may be modified and enhanced to enable our space perception, which is flexible and richer than a rigid 2D representation. Thus, a second process complements the basic one by extracting information from the 2D image through integration, interpolation, and hierarchical convergence. The combined activity of both mechanisms, the parallel and the serial, the direct and the inferred, enables the full gamut of space perception, leading eventually to object perception and recognition.

Does the alternative hypothesis resolve the flaws inherent to the current view?
Loss of spatial details
By assuming that activation patterns generated in V1 are accessible to perception, there is no convergent, integrative stage inserted between detailed input and perception and thus no details are lost. Since V1 is the only area where fine spatial information exists, and since no encoding/decoding is called for, the idea that pre-integrative V1 activity is the foundation of our space perception is simple, parsimonious, and, in my view, inevitable. There are cases, indeed, where it seems that almost unmodified V1 activity, without interaction with other areas, leads directly to perception; consider the myriad sparkling dots generated by a firework – they are spatially random, novel, and not perceived as an object but just as a bunch of sparkling dots. How would the current view account for their perception? Dot cells in V1 converging on firework cells in V4? The alternative hypothesis offers a simple explanation: those sparkling dots generate a dot pattern in V1 that leads to a corresponding percept without additional feature encoding.
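That explanation can be simulated directly. In the sketch below (my own; all filter sizes arbitrary), convolving a random 'spark' image with small concentric difference-of-Gaussians RFs, of the kind noted above for layer 4Cβ cells, yields an activity map whose peaks simply restate the sparks' positions: the pattern is reconstructed point for point, and no oriented feature or object label is extracted anywhere.

import numpy as np

def dog_kernel(size=9, sigma_c=1.0, sigma_s=2.5):
    """Difference-of-Gaussians: a concentric, non-oriented RF profile."""
    c = size // 2
    ys, xs = np.mgrid[0:size, 0:size] - c
    r2 = xs**2 + ys**2
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - surround

def convolve_same(img, k):
    """'Same'-size 2D correlation via zero padding (dependency-free)."""
    h = k.shape[0] // 2
    padded = np.pad(img, h)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def local_maxima(x, floor=0.05):
    """Interior pixels strictly greater than their four neighbors."""
    m = np.zeros_like(x, dtype=bool)
    core = x[1:-1, 1:-1]
    m[1:-1, 1:-1] = ((core > x[:-2, 1:-1]) & (core > x[2:, 1:-1]) &
                     (core > x[1:-1, :-2]) & (core > x[1:-1, 2:]))
    return m & (x > floor)

rng = np.random.default_rng(2)
image = np.zeros((40, 40))
rows = rng.integers(1, 39, 15)
cols = rng.integers(1, 39, 15)
image[rows, cols] = rng.random(15) + 0.5  # random 'sparks'

activity = convolve_same(image, dog_kernel())
peaks = np.argwhere(local_maxima(activity))

spark_set = set(zip(rows.tolist(), cols.tolist()))
hits = sum((r, c) in spark_set for r, c in map(tuple, peaks))
print(f"{hits}/{len(peaks)} activity peaks fall exactly on spark positions")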

Uniqueness and invariance
The alternative hypothesis posits that responses of face cell ensembles provide auxiliary information on global properties such as category. Thus, the fact that face cell ensembles cannot, in all probability, represent hundreds of different faces yet be indifferent to the enormous global changes in individual appearance poses no difficulty; it simply means that the acute versions of any given face are perceived through V1 responses and that the extra information needed for their holistic perception and recognition is supplied by other cortical areas.

The ability to perceive spatial elements that are not encoded by V1 orientation-selective cells
This is a direct consequence of the alternative hypothesis' premise that any pre-integrative responses in V1 may be perceived.

Small perceived faces do not stimulate IT cells
This simply means that small faces, like large ones, are perceived through the response patterns generated in V1 and that the extra information needed for their recognition is supplied by cortical areas other than the IT cortex. The very large RFs found in expert IT cells may be useful in cases where only a part of the object is sampled by the central fovea and additional saccades are needed for a full, detailed view. Such a mechanism is not necessary for small objects that are fully captured by the fovea.

Acute, not canonical, objects are perceived
The position that object perception is predicated on the responses of high-level expert cells leads to the prediction that we should perceive transformed, canonical objects. Since the alternative hypothesis assumes that V1 response patterns underlie our space perception, the fact that acute, not normalized, objects are perceived is a natural outcome.

Damage to areas downstream from V1
Representation, and eventually recognition, through hierarchical convergence predicts, contrary to findings in both monkeys and humans, that lesions in downstream areas would lead to a complete loss of object representation, perception, and recognition. Since the alternative hypothesis assumes that areas downstream from V1 provide information enabling invariance, it neatly explains experimental results showing not a total inability to recognize objects but rather an inability to recognize transformed ones. The alternative hypothesis also explains why detail perception is not lost, since such details are assumed to be represented by V1 patterns and are not affected by damage to downstream areas.

Implications of the alternative hypothesis
We need to remind ourselves that both space and perceived space are parallel and simultaneous in their intrinsic occurrences. The ability to process and ultimately perceive in parallel seems both natural and essential for normal vision but is not unique to the visual system. Parallel processing is ubiquitous in the brain even when the very nature of the information is serial, as is the case for the auditory system. Clearly, the brain's parallel abilities are what enable it to deal quickly and efficiently with a great deal of information; they are what make it different from any machine we know, computers included. The preservation of the parallel organization of spatial elements from the retina to perception has not been considered


by the hierarchical convergence hypothesis, where a bottleneck of information flow exists when the entire parallel V1 input is converted into increasingly serial processing and representation. Such mechanisms not only ignore perceptual reality but also strip the brain of what makes it unique by reducing it, by and large, to a serial processor. The alternative view retains the highly efficient parallel representation reaching V1 while utilizing the additional information generated by integration and convergence to enable, when needed, binding, 3D information, object invariance, and other auxiliary mechanisms.

From neural activity to perception
The central issue that both the current view and the alternative one are dealing with is what neural activity leads to object representation, perception, and, at times, recognition. Both hypotheses trivially assume that activity of cells in the visual cortex leads to perception. My suggestion that V1 patterns are the foundation of space perception and that objects are represented by V1 activity is no more radical than the assumption that responses in a fairly small number of temporal cortex cells lead to face perception and recognition. Indeed, the latter is the more complicated proposition; it is simpler and more parsimonious to have a parallel spatial V1 pattern leading to a parallel percept than for a spatial pattern encoded in the collective responses of expert cells to be somehow decoded back into a parallel spatial percept.

Pre-integrative V1 patterns are the main components of conscious space perception
There is an ongoing debate about whether V1 contributes directly to conscious perception. An extreme stance was taken by Crick and Koch [2], who suggested that since V1 has no direct connections with frontal areas it cannot have direct access to conscious perception. Others [3,4] offer a less radical view whereby simple, but not complex, features such as line segments that are encoded by V1 orientation-selective cells may lead directly to perception. The alternative view implies that V1 is the major contributor to our conscious space perception. If we exclude the possibility that population coding by space-integrating cells can extract the very basic geometry of spatial elements (see above), then, since we perceive fine details of space and since V1 is the only cortical area that represents these details, we must conclude that V1 activity is consciously perceived.

A fresh look at the role of feature-selective cells
As argued above, the view that perception and recognition of complex objects are predicated on the responses of a group of expert cells requires that those cells encode the fine details that characterize a specific object while being invariant to any number of global changes in the object's appearance. Such requirements are unrealistic. If we give up the idea of object representation by expert cells, we can look at single cells anew and consider what global features, such as size, contrast, or viewpoint, may be encoded by separate ensembles located in different visual areas. For example, we may conclude that while details that are unique to a face (e.g., eyelashes, lips, eyes) are represented by V1 patterns, V1 feature-selective cells provide information on the orientation and size of the within-face elements, middle temporal (MT) cells encode its motion, V4 cells respond best to its orientation, posterior IT cells are tuned to its overall size, and anterior IT cells encode its category (Box 3).



Box 3. Predictions


A new hypothesis is worthy of consideration if it can explain a body of evidence that cannot be explained otherwise and in doing so provides a new perspective on the field of study. I believe that I have been successful in doing so. A new hypothesis should also generate testable predictions. Although in such a complex and multilevel system as the visual system it is hard to find a critical prediction that can totally refute the hypothesis, in what follows I offer several useful predictions.

Confirmed 'predictions'
Here I have indicated major flaws in current thinking that are explained by my hypothesis. However, it is sufficient to observe that our conscious perception of fine spatial details is compatible only with V1 pre-integrative activity to hypothesize that conscious perception is based on this activity. The other observations, not explained by the current view, can then be presented as predictions of the alternative hypothesis. Thus, perceiving non-encoded dots and acute, not canonical, objects, perceiving very small faces that do not stimulate IT face cells, and observing that damage to the temporal lobe prevents recognition of transformed but not of non-transformed objects can all be seen as confirmed predictions.

Testable predictions
(i) Cells at various levels of the visual cortex would not be sensitive to small changes in the spatial elements comprising their optimal stimuli. Simple cells would give the same responses to many different patterns (see Figure 2 in main text), V4 cells responding to complex objects within, say, a 5° RF would not discriminate between stimuli that differ by a few arc minutes in their internal arrangements, and, most notably, face-selective cells would give the same response to faces where, for example, the eyes have slightly different shapes.
(ii) Presenting subjects with novel random dot patterns will evoke very little, if any, bottom-up response beyond V1. Such an experiment can be conducted in humans by visual evoked potential (VEP) recordings, which should show little activity outside V1 within the bottom-up timeframe of 100 ms.
(iii) Using electric or magnetic means to directly and exclusively stimulate the 'face patches' in the human temporal cortex will not evoke a vivid, normal perception of faces.
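Prediction (i) can be phrased as a simulation. The sketch below is my construction under textbook assumptions (a linear Gabor-shaped RF followed by rectification, with a deliberately broad envelope; all parameters arbitrary): spatially different dot arrangements confined to the RF's excitatory ridge evoke responses that agree to within about 1%, so the cell cannot report which arrangement was shown, even though observers would see each one as distinct.

import numpy as np

def gabor(size=21, sigma=30.0, wavelength=10.0, theta_deg=0.0):
    """The standard linear model of a simple-cell RF (broad envelope here)."""
    theta = np.radians(theta_deg)
    ys, xs = np.mgrid[0:size, 0:size] - size // 2
    u = np.cos(theta) * xs + np.sin(theta) * ys
    envelope = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * u / wavelength)

def simple_cell_response(stimulus, rf):
    """Linear summation over the RF followed by half-wave rectification."""
    return max(float((stimulus * rf).sum()), 0.0)

rf = gabor()

def dots(rows, col=10):
    """Three-dot stimuli placed along the RF's central excitatory ridge."""
    s = np.zeros((21, 21))
    for r in rows:
        s[r, col] = 1.0
    return s

patterns = [dots([5, 10, 15]), dots([6, 10, 14]), dots([4, 9, 16])]
for i, p in enumerate(patterns):
    print(f"pattern {i}: response = {simple_cell_response(p, rf):.2f}")
# Near-identical responses (within about 1%) to clearly different dot
# arrangements: the linear stage integrates over the RF and discards the
# within-RF layout.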

Box 4. Outstanding questions
• The alternative hypothesis posits that activity in V1 is consciously perceived. Can a similar suggestion be made regarding other sensory systems, such as the somatosensory and auditory systems?
• It is argued here that activity in different visual loci can be associated and perceived without convergence in higher areas. Can evidence for similar mechanisms be observed in other sensory systems?
• Within a sensory modality, what are the space–time constraints of association without convergence? In other words, how far removed in time and cortical distance can separate elements be and remain associated and directly and simultaneously perceived?
• Can we perceptually associate, in parallel, spatial elements affecting separate loci within one hemisphere only, or can we also associate across hemispheres?
• Can such interaction without convergence be used to explain multisensory integration? For example, when viewing a talking person, detailed spatial and auditory information is combined although detailed representation exists only at separate and distant cortical loci (V1 and the primary auditory cortex).
• Can association without convergence be the key to the unique properties of memory processes?


Concluding remarks: perception integrates over disparate brain loci
When we perceive elements that are arranged at different spatial locations, this organization, which is parallel and simultaneous, generates corresponding activity in V1 that reaches perception largely unaltered. So, two dots generate two activity loci and are perceived as such. We perceive their relative locations, sizes, and any other characteristics without sending this information to a 'higher' area, which means that the brain can compare, associate, and perceive across disparate V1 loci. Furthermore, when an object is seen as a unified entity, it follows that the brain can relate and associate neural activity existing at separate V1 loci as well as at other brain loci providing auxiliary information. In other words, when an object is perceived, our brain creates a percept based on activity in physically distant loci without sending all such activities elsewhere. A similar idea was expressed years ago by Sherrington [1] who said, in rejecting William James's pontifical cell notion, that conscious perception is based on 'a million-fold democracy whose each unit is a cell'.

The ability to relate disparate cortical loci without converging the information onto a comparator or an integrator seems a natural and easy task for our perception, yet it is one that no physical device, including computers, can perform. It is interesting to note that, since the current view rejects the idea of a gnostic or grandmother cell [67] in favor of ensemble coding, the same conclusion of perceptual integration without convergence onto a higher stage can be drawn. The clear difference between the two views is that whereas in the alternative view perception of basic spatial elements is directly evoked by V1 patterns, in the current view there are several stages of convergence and integration before the disparate cortical loci generate a code that is somehow decoded and then 'reaches' perception (Box 4).

Acknowledgments
The author is grateful to E. Greene, T. Hendel, A. Reeves, and D. Rose for their willingness to engage in a continuous and helpful dialog during several stages of the manuscript. He thanks R. Malach, D. Sagi, and R. Shapley for their insightful comments.

References
1 Sherrington, C.S. (1940) Man on his Nature, Cambridge University Press
2 Crick, F.A. and Koch, C. (1995) Are we aware of neural activity in primary visual cortex? Nature 375, 121–123
3 Paradiso, M.A. (2002) Perceptual and neuronal correspondence in primary visual cortex. Curr. Opin. Neurobiol. 12, 155–161
4 Tong, F. (2003) Primary visual cortex and visual awareness. Nat. Rev. Neurosci. 4, 219–229
5 Hinton, G.E. (1992) How neural networks learn from experience. Sci. Am. 267, 145–161
6 Hubel, D.H. and Wiesel, T.N. (1962) Receptive fields, binocular interactions and functional architecture in the cat's visual cortex. J. Physiol. 160, 106–154
7 Hubel, D.H. and Wiesel, T.N. (1968) Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–245
8 Anzai, A. (2007) Neurons in monkey visual area V2 encode combinations of orientations. Nat. Neurosci. 10, 1313–1321
9 Hegde, J. and Van Essen, D.C. (2007) A comparative study of shape representation in macaque visual areas V2 and V4. Cereb. Cortex 17, 1100–1116

10 Gallant, J.L. et al. (1996) Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J. Neurophysiol. 76, 2718–2739
11 Carlson, E.T. et al. (2011) A sparse object coding scheme in area V4. Curr. Biol. 21, 288–293
12 Roe, A.W. et al. (2014) Toward a unified theory of visual area 4. Neuron 74, 12–29
13 Perrett, D.L. et al. (1982) Visual neurons responsive to faces in the monkey temporal cortex. Exp. Brain Res. 47, 329–342
14 Orban, G.A. (2008) Higher order visual processing in macaque extrastriate cortex. Physiol. Rev. 88, 59–89
15 Freiwald, W.A. and Tsao, D.Y. (2010) Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330, 845–851
16 Eifuku, S. et al. (2011) Neural representation of personally familiar faces in the anterior inferior temporal cortex of monkeys. PLoS ONE 6, e18913
17 Hung, C.P. et al. (2005) Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866
18 Yamins, D.K.L. et al. (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl. Acad. Sci. U.S.A. 111, 8619–8624
19 Ohayon, S. et al. (2012) What makes a cell face selective? The importance of contrast. Neuron 74, 567–581
20 Sato, T. et al. (2013) Object representation in inferior temporal cortex is organized hierarchically in a mosaic-like structure. J. Neurosci. 33, 11642–16656
21 Hirabayashi, T. et al. (2013) Microcircuits for hierarchical elaboration of object coding across primate temporal areas. Science 341, 191–195
22 Axelrod, V. and Yovel, G. (2012) Hierarchical processing of face viewpoint in human visual cortex. J. Neurosci. 32, 2442–2452
23 Parvizi, J. et al. (2012) Electrical stimulation of human fusiform face-selective regions distorts face perception. J. Neurosci. 32, 14915–14920
24 Anzellotti, S. et al. (2014) Decoding representations of face identity that are tolerant to rotation. Cereb. Cortex 24, 1988–1995
25 Ramirez, F.M. (2014) The neural code for face orientation in the human fusiform face area. J. Neurosci. 34, 12155–12167
26 Tootell, R.B.H. et al. (1988) Functional anatomy of macaque striate cortex. II. Retinotopic organization. J. Neurosci. 8, 1531–1568
27 Balduzzi, D. and Tononi, G. (2009) Qualia: the geometry of integrated information. PLoS Comput. Biol. 5, 1–24
28 Li, W. et al. (2004) Perceptual learning and top-down influences in primary visual cortex. Nat. Neurosci. 7, 651–657
29 McAdams, C.J. and Maunsell, J.H.R. (1999) Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. J. Neurosci. 19, 431–441
30 Reynolds, J.H. et al. (1999) Competitive mechanisms subserve attention in macaque areas V2 and V4. J. Neurosci. 19, 1736–1753
31 Chelazzi, L. et al. (1998) Responses of neurons in inferior temporal cortex during memory-guided visual search. J. Neurophysiol. 80, 2918–2940
32 Davidesco, I. et al. (2013) Spatial and object-based attention modulates broadband high-frequency responses across the human visual cortical hierarchy. J. Neurosci. 33, 1228–1240
33 Serre, T. et al. (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29, 411–426
34 Rolls, E.T. (2012) Invariant visual object and face recognition: neural and computational bases, and a model, VisNet. Front. Comput. Neurosci. 6, 35
35 Ghodrati, M. et al. (2014) The importance of visual features in generic vs. specialized object recognition: a computational study. Front. Comput. Neurosci. 8, 86
36 Lamme, V.A. and Roelfsema, P.R. (2000) The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci. 23, 571–579
37 Hochstein, S. and Ahissar, M.A. (2002) A view from the top: hierarchies and reverse hierarchies in the visual system. Neuron 36, 791–804
38 Tschechne, S. et al. (2014) Hierarchical representation of shapes in visual cortex – from localized features to figural shape segregation. Front. Comput. Neurosci. 8, 93
39 von der Malsburg, C. (1999) The what and the why of binding: the modeler's perspective. Neuron 24, 95–104
40 Ullman, S. et al. (2002) Visual features of intermediate complexity and their use in classification. Nat. Neurosci. 5, 682–687
41 Kelso, S.J.A. (2008) An essay on understanding the mind. Ecol. Psychol. 20, 180–208


42 Sally, S.L. and Gurnsey, R. (2007) Foveal and extra-foveal orientation discrimination. Exp. Brain Res. 183, 351–360
43 Quiroga, R.Q. (2012) Concept cells: the building blocks of declarative memory functions. Nat. Rev. Neurosci. 13, 587–597
44 Rolls, E.T. and Baylis, G.C. (1986) Size and contrast have only small effects on the responses to faces of neurons in the cortex of the superior temporal sulcus of the monkey. Exp. Brain Res. 65, 38–48
45 Desimone, R. (1984) Stimulus-selective properties of inferior temporal neurons in macaque. J. Neurosci. 4, 2051–2062
46 Tovee, M.J. et al. (1994) Translation invariance in the responses to faces of single neurons in the temporal visual cortex areas of the alert macaque. J. Neurophysiol. 72, 1049–1059
47 Rolls, E.T. et al. (1977) Activity of neurons in the inferotemporal cortex of the alert monkey. Brain Res. 130, 229–238
48 Eifuku, S. (2004) Neural correlates of face identification in the monkey anterior temporal cortical areas. J. Neurophysiol. 91, 358–371
49 Ito, M. (1995) Size and position invariance of neuronal responses in monkey inferotemporal cortex. J. Neurophysiol. 73, 218–226
50 Schiller, P.H. (1995) Effect of lesions in visual cortical area V4 on the recognition of transformed objects. Nature 376, 342–344
51 Weiskrantz, L. and Saunders, R.C. (1984) Impairments of visual object transforms in monkeys. Brain 107, 1033–1072
52 Roberts, D.J. et al. (2013) Efficient visual object and word recognition relies on high spatial frequency coding in the left posterior fusiform gyrus: evidence from a case-series of patients with ventral occipitotemporal damage. Cereb. Cortex 23, 2568–2580
53 Avidan, G. et al. (2011) Impaired holistic processing in congenital prosopagnosia. Neuropsychologia 49, 2541–2552
54 Konen, C.S. et al. (2011) The functional neuroanatomy of object agnosia: a case study. Neuron 71, 49–60
55 Ariel, R. and Sadeh, G. (1996) Congenital visual agnosia and prosopagnosia in a child: a case report. Cortex 32, 221–240
56 Gilaie-Dotan, S. (2009) Seeing with profoundly deactivated mid-level visual areas: non-hierarchical functioning in the human visual cortex. Cereb. Cortex 19, 1687–1703
57 Holmes, G. and Lister, W.T. (1916) Disturbances of vision from cerebral lesions, with special reference to the cortical representation of the macula. Brain 39, 34–73
58 Horton, J.C. and Hoyt, W.F. (1991) The representation of the visual field in the human striate cortex: a revision of the classic Holmes map. Arch. Ophthalmol. 109, 816–824
59 Snodderly, D.M. and Gur, M. (1995) Organization of striate cortex in alert trained monkeys (Macaca fascicularis): ongoing activity, stimulus selectivity, and widths of receptive field activating regions. J. Neurophysiol. 74, 2100–2125
60 Gur, M. et al. (2005) Orientation and direction selectivity of single cells in V1 of alert monkeys: functional relationships and laminar distributions. Cereb. Cortex 15, 1207–1221
61 Gur, M. and Snodderly, D.M. (2008) Physiological differences between neurons in layer 2 and layer 3 of primary visual cortex (V1) of alert macaque monkeys. J. Physiol. 586, 2293–2306
62 Willmore, B.D.B. et al. (2010) Neural representation of natural images in visual area V2. J. Neurosci. 30, 2102–2114
63 Felleman, D.J. and Van Essen, D.C. (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47
64 Schiller, P.H. and Malpeli, J.G. (1978) Functional specificity of lateral geniculate laminae of the rhesus monkey. J. Neurophysiol. 41, 788–797
65 Greene, E. (2007) Retinal encoding of ultrabrief shape recognition cues. PLoS ONE 2, e871
66 Greene, E. and Ogden, R.T. (2012) Evaluating the contributions of recognition using the minimal transient discrete cue protocol. Behav. Brain Funct. 8, 53
67 Gross, C.G. (2002) Genealogy of the "grandmother cell". Neuroscientist 8, 84–90
68 Bowers, J.S. (2010) More on grandmother cells and the biological implausibility of PDP models of cognition: a reply to Plaut and McClelland (2010) and Quian Quiroga and Kreiman (2010). Psychol. Rev. 117, 300–308
69 Vogels, R. (1990) Population coding of stimulus orientation by striate cortical cells. Biol. Cybern. 64, 25–31
70 Doi, E. and Lewicki, M.S. (2014) A simple model of optimal population coding for sensory systems. PLoS Comput. Biol. 10, e1003761
