
Recent developments in a computational theory of visual attention (TVA)

Claus Bundesen, Signe Vangkilde, Anders Petersen
University of Copenhagen, Denmark

Corresponding author: Claus Bundesen, Department of Psychology, University of Copenhagen, Øster Farimagsgade 2A, DK-1353 Copenhagen S, Denmark. E-mail: [email protected]

Article history: Received 24 July 2014; Received in revised form 10 November 2014; Available online xxxx

Keywords: Visual attention; TVA

Abstract

This article reviews the foundations of the theory of visual attention (TVA) and describes recent developments in the theory. TVA is based on the principle of biased competition: All possible visual categorizations ascribing features to objects compete (race) to become encoded into visual short-term memory before it is filled up. Each of the possible categorizations is supported by sensory evidence, but the competition is biased by multiplication with attentional weights (high weights on important objects) and perceptual biases (toward use of important categories). The way sensory evidence and attentional biases interact is specified in the rate and weight equations of TVA, so TVA represents a mathematical formalization of the biased competition principle. In addition to describing TVA as a psychological theory, we present the neural interpretation of TVA, NTVA.

© 2014 Elsevier Ltd. All rights reserved.


1. A computational theory of visual attention (TVA)

In this article, the foundations of the theory of visual attention, TVA (Bundesen, 1990), are reviewed and recent developments in the theory are described. TVA is based on the principle of biased competition (Desimone & Duncan, 1995): All possible visual categorizations ascribing features to objects compete (race) to become encoded into visual short-term memory before it is filled up. Each of the possible categorizations is supported by sensory evidence, but the competition is biased by multiplication with attentional weights (high weights on important objects) and perceptual biases (toward use of important categories). The way sensory evidence and attentional biases interact is specified in the rate and weight equations of TVA, so TVA represents a mathematical formalization of the biased competition principle. Bundesen (1990) showed how TVA can account for many psychological findings (reaction times and error rates) in the field of visual attention. More recently, Duncan et al. (1999) showed how TVA provides a method to quantify individual variation in attentional abilities, which has been used to study many normal and clinical populations. Later TVA was extended by Bundesen, Habekost, and Kyllingsbæk (2005, 2011) and Bundesen and Habekost (2008) into a neural theory of visual attention (NTVA), which accounts for many effects of attention observed in firing rates of single cells in the visual system of primates.

1.1. Basic assumptions


In TVA, both visual recognition and selection of objects in the visual field consist in making visual categorizations of the form "object x has feature i" or, equivalently, "object x belongs to category i." A categorization is made (i.e., selected) when it is encoded into visual short-term memory (VSTM). If and when the categorization is made, object x is said to be selected and also to be recognized as a member of category i. Thus, selection and recognition are viewed as two aspects of the same process. If and when a visual categorization of an object completes processing, the categorization enters VSTM if memory space for the categorization is available in VSTM. The storage capacity of VSTM (parameter K) is normally assumed to be about four independent objects (see, e.g., Luck & Vogel, 1997; Shibuya & Bundesen, 1988); however, K seems to vary not only between individuals but also on a trial-by-trial basis (see Dyrholm et al., 2011). Clearing VSTM opens a race among all objects in the visual field to become encoded into VSTM. An object becomes encoded in VSTM if, and only if, some categorization of the object becomes encoded in VSTM. Thus, each object x may be represented in the encoding race by all possible categorizations of the object.

1.1.1. Rate equation

By the rate equation of TVA, the rate, v(x, i), at which a particular visual categorization, "x belongs to i," is encoded into VSTM is given by a product of three terms:


\[
v(x, i) = g(x, i)\, b_i \, \frac{w_x}{\sum_{z \in S} w_z} \qquad (1)
\]

The first term, g(x, i), is the strength of the sensory evidence that x belongs to category i. The second term, b_i, is a perceptual decision bias associated with category i (0 ≤ b_i ≤ 1). The third term is the relative attentional weight of object x, that is, the weight of object x, w_x, divided by the sum of weights across all objects in the visual field, S.

1.1.2. Weight equation

The attentional weights in the rate equation of TVA are assumed to be derived from pertinence values. Every visual category j is supposed to have a pertinence, p_j, which is a nonnegative real number reflecting the current importance of attending to objects that belong to category j. By the traditional weight equation of TVA, the attentional weight of an object x in the visual field is

\[
w_x = \sum_{j \in R} g(x, j)\, p_j \qquad (2)
\]

where R is the set of all visual categories, g(x, j) is the strength of the sensory evidence that object x belongs to category j, and p_j is the pertinence of category j. By Eq. (2), the attentional weight of an object is a weighted sum of pertinence values. The pertinence of a given category enters the sum with a weight equal to the strength of the sensory evidence that the object belongs to the category. Based on partial-report experiments with color singletons, Nordfang, Dyrholm, and Bundesen (2013) recently proposed a generalization of the weight equation such that

\[
w_x = \kappa_x \sum_{j \in R} g(x, j)\, p_j \qquad (2')
\]

where κ_x is the strength of the local feature contrast of object x. Thus, Nordfang, Dyrholm, and Bundesen (2013) provided evidence that the attentional weight of an object is a product of both bottom-up (feature contrast) and top-down (feature relevance) components. Note that, for displays in which the local feature contrast κ_x is the same for all objects x, the change from Eq. (2) to Eq. (2') is immaterial because the ratios between the attentional weights of different objects remain the same.

1.2. Mechanisms of selection

TVA assumes two mechanisms of selection: a mechanism for selection of objects and a mechanism for selection of categories. Selection of objects is controlled by pertinence values and attentional weights. If selection of objects with a particular feature is desired, the pertinence of the feature in question should be high. The weight equation implies that when a feature has a high pertinence, objects with the given feature get high attentional weights. By the rate equation, therefore, objects with the given feature are likely to win the processing race and be encoded into VSTM. Selection of categories is controlled by decision bias parameters. Whereas pertinence values determine which objects are selected (filtering), decision biases determine how the objects are categorized (pigeonholing). If particular types of categorizations are important, the bias values of the corresponding categories should be high so that the desired types of categorizations are likely to be made (cf. the rate equation). Coupled to a front-end system that supplies appropriate measures (g values) of the strength of the sensory evidence concerning the features of objects in its visual field, and given adequate settings of pertinence (p) and bias (b) parameters, the specified system accomplishes both filtering and pigeonholing by an encoding race between visual categorizations whose rate parameters are determined by the simple algebraic operations of the rate and weight equations.
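To make these operations concrete, the following Python sketch computes attentional weights and rate values for a hypothetical four-object display and then simulates the encoding race into VSTM. All numerical values (evidence strengths, pertinences, feature contrasts, biases, exposure duration, and t_0) are invented for illustration; the sketch is not a re-implementation of any published fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical display: two red targets and two blue distractors.
# g[x][i]: strength of sensory evidence that object x has feature i (invented values).
g = {
    "T1": {"red": 0.9, "blue": 0.1},
    "T2": {"red": 0.8, "blue": 0.2},
    "D1": {"red": 0.1, "blue": 0.9},
    "D2": {"red": 0.2, "blue": 0.8},
}
pertinence = {"red": 1.0, "blue": 0.1}   # p_j: it is currently important to attend to red objects
kappa = {x: 1.0 for x in g}              # kappa_x: local feature contrast (Eq. 2')
bias = {"red": 0.9, "blue": 0.9}         # b_i: perceptual decision biases

# Weight equation (Eq. 2'): w_x = kappa_x * sum_j g(x, j) * p_j
w = {x: kappa[x] * sum(g[x][j] * pertinence[j] for j in pertinence) for x in g}
w_total = sum(w.values())

# Rate equation (Eq. 1): v(x, i) = g(x, i) * b_i * w_x / sum_z w_z
v = {(x, i): g[x][i] * bias[i] * w[x] / w_total for x in g for i in bias}

def simulate_trial(K=4, exposure=0.2, t0=0.02):
    """Exponential race into VSTM: an object is encoded when its fastest
    categorization finishes, provided this happens before the effective
    exposure ends and before the K slots of VSTM are filled."""
    finish = {x: min(rng.exponential(1.0 / v[(x, i)]) for i in bias) for x in g}
    deadline = max(exposure - t0, 0.0)
    winners = [x for x in sorted(finish, key=finish.get) if finish[x] <= deadline]
    return winners[:K]

print("attentional weights:", w)      # targets receive much higher weights
print("one simulated trial:", simulate_trial())
```

With these settings the two red objects receive most of the attentional weight and tend to win the race (filtering); changing the b values would instead change how quickly any given categorization is made (pigeonholing).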

In this sense, the theory yields a truly computational and mechanistic account of selective attention in vision.

1.3. Applications to basic studies of human performance

TVA has been applied to a broad range of findings on single-stimulus recognition and selection from multielement displays, both in normal subjects and in patient populations (see Bundesen & Habekost, 2008, 2014, for overviews). In addition, the scope of the theory has been extended to other cognitive domains such as memory and executive control.

1.3.1. Single-stimulus recognition

TVA provides a mathematical derivation of the biased-choice model of Luce (1963) for single-stimulus recognition (see Bundesen, 1990, 1993). This model has been successful in explaining many experimental findings on effects of visual discriminability and bias (see, e.g., Townsend & Ashby, 1982; Townsend & Landon, 1982). TVA also provides predictions of the time needed to recognize a singly presented simple stimulus. The v value on the left-hand side of Eq. (1) is a hazard rate. Thus, at any time t, the instantaneous value of v(x, i) is the conditional probability density that the categorization that object x belongs to category i is encoded into VSTM at time t, given that the categorization has not been encoded before time t, and given that storage space for the categorization is available in VSTM. This assumption implies that, if v(x, i) = 0 until time t_0 and v(x, i) is kept constant for times t > t_0, the time taken to make the categorization "x belongs to i" is exponentially distributed, so the probability p(t) of having made the categorization at or before time t is

\[
p(t) = 1 - \exp[-v(x, i)(t - t_0)] \qquad (3)
\]

When incorrect categorizations due to guessing or perceptual confusions among stimuli can be neglected, p(t) equals the probability of recognizing x as a member of i from a postmasked presentation with an exposure duration of t, given that t_0 equals the longest ineffective exposure duration. Recently, Dyrholm et al. (2011) provided evidence of a small trial-by-trial variation in t_0, which could be approximated by a normal distribution. By this approximation, the time taken to make a particular categorization is no longer exponentially distributed but follows an ex-Gaussian distribution (i.e., a convolution of a normal and an exponential distribution; Luce, 1986). Eq. (3) has yielded close and useful fits to many empirical data sets, including data from patient studies based on TVA (see Habekost & Starrfelt, 2009, for a review). The equation presupposes that perceptual confusions among stimuli can be neglected, but Kyllingsbæk, Markussen, and Bundesen (2012) recently showed how the time course of visual recognition of mutually confusable stimuli can be analyzed by use of TVA (also see Logan, 1996). Kyllingsbæk et al. proposed and tested a Poisson counter model of visual recognition of briefly presented, mutually confusable single stimuli in pure accuracy tasks (i.e., unspeeded tasks). The model implies that during stimulus analysis, tentative categorizations that stimulus x belongs to category i are made at a constant Poisson rate, v(x, i). The analysis is continued until the stimulus disappears, and the overt response is based on the categorization made the greatest number of times. This model provided a close fit to individual data on identification of digits and an apparently perfect fit to data on identification of Landolt rings.
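The Poisson counter idea can be sketched in a few lines. The rates and stimulus set below are invented, and the published model's treatment of guessing, confusability structure, and parameter estimation is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_counter_trial(rates, exposure, rng):
    """One unspeeded identification trial: tentative categorizations accumulate
    at constant Poisson rates during the exposure; the response is the category
    with the highest count (ties resolved by a random choice)."""
    counts = {cat: rng.poisson(rate * exposure) for cat, rate in rates.items()}
    best = max(counts.values())
    winners = [cat for cat, n in counts.items() if n == best]
    return rng.choice(winners)

# Invented v(x, i) values for a stimulus "E" that is easily confused with "F".
rates = {"E": 30.0, "F": 20.0, "L": 5.0}    # tentative categorizations per second
exposures = [0.01, 0.05, 0.1, 0.2]           # seconds

for t in exposures:
    correct = np.mean([poisson_counter_trial(rates, t, rng) == "E"
                       for _ in range(5000)])
    print(f"exposure {t*1000:4.0f} ms: P(correct) = {correct:.2f}")
```

Accuracy grows with exposure duration because longer exposures give the correct counter more opportunities to take the lead.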


1.3.2. Temporal attention in single-stimulus recognition

Vangkilde, Coull, and Bundesen (2012) and Vangkilde, Petersen, and Bundesen (2013) extended TVA to temporal attention (also see Matthias et al., 2009, 2010). Temporal expectations guide our attention in time (see, e.g., Nobre, 2001, 2010; Nobre & Rohenkohl, 2014). Beneficial effects of valid temporal expectations on motoric responses have been demonstrated repeatedly over the past century (Woodrow, 1914; for reviews, see Los, 2010; Niemi & Näätänen, 1981). Vangkilde, Coull, and Bundesen (2012) explored effects of temporal expectation on visual processing speed in a cued single-stimulus recognition paradigm with briefly presented, postmasked stimuli. Temporal expectations were manipulated by different hazard rate functions for the cue-stimulus foreperiod. For example, in one experiment, the length of the foreperiod from the cue to the stimulus was distributed exponentially. For each block of trials, the participants knew from which of two exponential distributions with different hazard rates the foreperiods would be drawn. The hazard rate could be either high (1.33 s⁻¹) or low (0.22 s⁻¹), corresponding to mean foreperiods of 750 ms and 4500 ms, respectively. In either condition, the probability p(t) of correct report as a function of the stimulus duration (t) was well described by Eq. (3), t_0 being the threshold of conscious perception (the longest ineffective exposure duration) and v(x, i) being the speed of encoding into VSTM at times t > t_0. As manipulated by the hazard rate, temporal expectation had no effect on the threshold of conscious perception but a strong effect on the speed of subsequent encoding into VSTM. Averaged across participants, the speed of encoding was lowered by 30% in the low hazard rate condition. This effect was found even though no general decrease in processing speed with time-on-task occurred. Thus, the effect was independent of the actual duration of the foreperiod on a given trial but depended entirely on expectation. These findings were extended in a parametric exploration of the effect of different levels of temporal expectation (Vangkilde, Petersen, & Bundesen, 2013). In this study, the length of the cue-stimulus foreperiod was exponentially distributed with one of six hazard rates, presumably giving rise to one of six levels of temporal expectation within a particular block. Again, we found no effect of expectation on the threshold of conscious perception but a strong and remarkably systematic increase in perceptual processing speed with increasing expectation. Specifically, to a very good approximation, processing speed could be described by a linearly increasing function of the logarithm of the level of expectancy given by the six different hazard rates (see Fig. 1). Similar effects were recently found by Sørensen, Vangkilde, and Bundesen (2014) when the design outlined above was extended to multielement displays.
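The reported relation between temporal expectation and encoding speed can be illustrated with a toy calculation that combines an assumed log-linear dependence of v on the hazard rate λ with Eq. (3). The intercept and slope below are placeholders, not the estimates of Vangkilde, Petersen, and Bundesen (2013).

```python
import numpy as np

# For an exponentially distributed foreperiod the hazard rate is constant and
# equals the rate parameter lambda, so expectancy is constant over the waiting time.
hazard_rates = np.array([0.1, 0.2, 0.4, 0.8, 1.6, 3.2])   # per second, as in Fig. 1
mean_foreperiods = 1.0 / hazard_rates                      # 10 s down to about 0.31 s

# Hypothetical log-linear description of processing speed (letters/s):
v0, slope = 40.0, 8.0                       # placeholder intercept and slope
v = v0 + slope * np.log(hazard_rates)

# Probability of correct report for a 50-ms postmasked exposure via Eq. (3), t0 = 20 ms.
t, t0 = 0.050, 0.020
p_correct = 1.0 - np.exp(-v * (t - t0))

for lam, speed, p in zip(hazard_rates, v, p_correct):
    print(f"lambda = {lam:4.1f}/s   v = {speed:5.1f}/s   p(correct, 50 ms) = {p:.2f}")
```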

Vangkilde, Coull, and Bundesen (2012) and Vangkilde, Petersen, and Bundesen (2013) explained the effect in terms of TVA by assuming that temporal expectations affect perception by changing perceptual biases (values of b parameters). Specifically, a strong expectation that a stimulus letter will appear at the next moment should yield an increase in the b values of letter types, which should speed the recognition of the stimulus letter if it appears when it is highly expected.

Fig. 1. Best linear fit to the mean estimated processing speed v (letters/s) for each hazard rate condition plotted on a logarithmic scale. Conditions are: λ = 0.1 s⁻¹ (violet), λ = 0.2 s⁻¹ (blue), λ = 0.4 s⁻¹ (green), λ = 0.8 s⁻¹ (yellow), λ = 1.6 s⁻¹ (orange), and λ = 3.2 s⁻¹ (red). Error bars show standard errors of the means. The linear fit accounted for 99.4% of the variance of the estimated v values. From "Temporal Expectancy in the Context of a Theory of Visual Attention" by S. Vangkilde, A. Petersen, and C. Bundesen, 2013, Philosophical Transactions of the Royal Society, 368, 20130054, p. 8.

1.3.3. Components of perceptual bias

Bundesen, Vangkilde, and Habekost (in press) proposed the general multiplicative hypothesis that the perceptual bias associated with feature i, b_i, is itself a product of three terms:

\[
b_i = A\, p_i\, u_i \qquad (4)
\]

where A is the level of alertness, p_i is the subjective prior probability of being presented with feature i, and u_i is the subjective importance ("utility") of identifying feature i. By the rate equation of TVA, the rate of processing of a visual categorization that a given object has feature i is directly proportional to b_i. The multiplicative structure of Eq. (4) implies that b_i = 0 if, and only if, either A = 0, p_i = 0, or u_i = 0. That b_i = 0 if A = 0 entails the biologically very plausible implication that, if the level of alertness is zero, no visual categorizations are made. The implication that b_i = 0 if the subjective prior probability of being presented with feature i, p_i, is zero means that no objects are seen as members of category i if the subject is absolutely certain that the category is empty. This would seem to characterize an ideal observer. The implication that b_i = 0 if the subjective importance of identifying feature i, u_i, is zero also would seem to be a characteristic of an ideal observer. Humans may not be ideal observers, but the hypothetical description of performance given by Eq. (4) at least provides a benchmark for the analysis of how alertness, prior knowledge, and importance may directly affect human performance.

1.3.4. Selection from multielement displays

TVA provides a mathematical derivation of the fixed-capacity independent race model (FIRM; Shibuya & Bundesen, 1988) for selection from multielement displays consisting of homogeneous elements (in the sense of Bundesen, 1990, p. 524). According to FIRM, the processing of a stimulus display occurs in two stages. At the first stage, an attentional weight is computed for each element in the display. The weight of an element is a measure of the strength of the sensory evidence that the element is a target. At the second stage, the race between the elements takes place. At this stage of the system, the total processing capacity is assumed to be a constant, C elements per second. In the derivation of FIRM from TVA, C is defined as the sum of all v values across all perceptual categorizations of all elements in the visual field (cf. Townsend & Ashby, 1983):





\[
C = \sum_{x \in S} v(x) = \sum_{x \in S} \sum_{i \in R} v(x, i) \qquad (5)
\]

The processing capacity C is distributed across the display elements in proportion to their weights. Thus, every element in the display is allocated a certain fraction of the total processing capacity, and that fraction equals the attentional weight of the element divided by the sum of the attentional weights across all of the elements in the display. In many visual search tasks, the stimuli fall into two categories: targets (i.e., items to be reported or otherwise responded to) and distractors (items to be ignored), such that every target has approximately the same attentional weight as any other target, and every distractor has approximately the same attentional weight as any other distractor. In such tasks, the efficiency


(or inefficiency) of top-down selection can be measured by the ratio, a, between the attentional weights of a distractor and a target:

\[
a = w_{\text{distractor}} / w_{\text{target}} \qquad (6)
\]



The time it takes to encode an element into VSTM depends on the amount of processing capacity allocated to the element. Specifically, the encoding time is assumed to be exponentially distributed with a rate parameter equal to the amount of processing capacity allocated to the element. Encoding times for different elements are stochastically independent, and the selected elements are those elements whose encoding processes complete before (a) the stimulus presentation terminates and (b) VSTM has been filled up. FIRM makes precise predictions of effects of variations in the exposure duration of the stimuli. Shibuya and Bundesen (1988) tested such predictions in a comprehensive study of partial report of digits from mixtures of letters and digits with exposure durations ranging from 10 ms up to 200 ms. The participants were instructed to report as many digits (targets) as possible from the stimulus display while ignoring the letters (distractors). Each display was terminated by a pattern mask. The results from one representative participant are illustrated in Fig. 2. For each combination of exposure duration, number of targets, and number of distractors, the figure shows the probability distribution of the number of correctly reported targets. Each panel shows the results for a particular combination of the numbers of targets (T) and distractors (D). The top curve in the panel shows the probability of reporting at least 1 target correct as a function of exposure duration, the second curve from the top shows the probability of reporting at least 2 targets correct, and so on. According to FIRM,

the processing rate of a target and a distractor in each of the display types is given by C/(T + aD) and aC/(T + aD), respectively. The close fit shown by the smooth curves was obtained with VSTM capacity K at a value of 3.7 elements, total processing capacity C at 49 elements/s, distractibility parameter a (the weight ratio of a distractor to a target) at 0.40, and longest ineffective exposure duration t0 at 19 ms. Noninteger values of VSTM capacity K were treated as probability mixtures such that, for example, a value of 3.7 elements for K represented a mixture of values of 3 and 4 with a probability of .7 that K = 4 on a given trial. Many other well-established results have been closely fit by TVA. The results include findings of stochastic independence (Bundesen, Kyllingsbæk, & Larsen, 2003; Kyllingsbæk & Bundesen, 2007) in recognition of multiple features, effects of object integrality (Duncan, 1984), effects of number and spatial position of targets in studies of divided attention (e.g., Posner, Nissen, & Ogden, 1978; Sperling, 1960, 1967), effects of selection criterion and the number of distractors in studies of focused attention (e.g., Bundesen & Pedersen, 1983; Treisman & Gelade, 1980; Treisman & Gormican, 1988), and effects of consistent practice in search (Schneider & Fisk, 1982; also see Kyllingsbæk, Schneider, & Bundesen, 2001).
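FIRM's predictions can also be approximated by simulation. The sketch below uses the parameter values quoted above for the representative participant (K = 3.7, C = 49 elements/s, a = 0.40, t_0 = 19 ms) but replaces the closed-form expressions used in the original fits with a simple Monte Carlo run, so it is an illustration rather than a reconstruction of the published analysis.

```python
import numpy as np

rng = np.random.default_rng(2)

def firm_trial(T, D, exposure, C=49.0, alpha=0.40, K_mix=3.7, t0=0.019, rng=rng):
    """Simulate one FIRM partial-report trial; return the number of targets reported."""
    # Noninteger K is treated as a probability mixture (e.g., 3.7 -> K = 4 with prob .7).
    K = int(K_mix) + (rng.random() < (K_mix - int(K_mix)))
    weights = np.array([1.0] * T + [alpha] * D)     # targets weight 1, distractors weight a
    rates = C * weights / weights.sum()             # processing rates C/(T + aD) and aC/(T + aD)
    finish = rng.exponential(1.0 / rates)           # independent exponential encoding times
    deadline = max(exposure - t0, 0.0)
    order = np.argsort(finish)                      # race order; the first K finishers fill VSTM
    encoded = [i for i in order if finish[i] <= deadline][:K]
    return sum(1 for i in encoded if i < T)         # indices below T are targets

for exposure in [0.02, 0.05, 0.10, 0.20]:
    scores = [firm_trial(T=4, D=4, exposure=exposure) for _ in range(5000)]
    print(f"{exposure*1000:3.0f} ms exposure: mean targets reported = {np.mean(scores):.2f}")
```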

Fig. 2. Relative frequency of scores of j or more correctly reported targets as a function of exposure duration with j, number of targets T, and number of distractors D as parameters in partial report experiment of Shibuya and Bundesen (1988). Data are shown for Subject MP. Parameter j varies within panels; j is 1 (open circles), 2 (open squares), 3 (solid squares), 4 (solid circles), or 5 (triangle). T and D vary among panels. Smooth curves represent a theoretical fit to the data by the FIRM model. For clarity, observed frequencies less than .02 were omitted from the figure. Adapted from ‘‘Visual Selection From Multielement Displays: Measuring and Modeling Effects of Exposure Duration,’’ by H. Shibuya and C. Bundesen, 1988, Journal of Experimental Psychology: Human Perception and Performance, 14, p. 595.



1.3.5. Attentional dwell time

The majority of TVA-based studies have investigated attention using brief (e.g., 20–200 ms) exposures of simultaneously presented stimuli. However, Petersen, Kyllingsbæk, and Bundesen (2012) recently elaborated TVA in the temporal domain by presenting a computational model (a theory of temporal visual attention; TTVA) to account for classical findings of attentional dynamics when two stimuli are presented in close temporal proximity. Specifically, TTVA accounts for the attentional dwell time phenomenon, in which the report of a masked target (T2) is severely impaired when it is presented with a delay of less than 500 ms after a spatially separate masked target (T1) (see Duncan, Ward, & Shapiro, 1994; Ward, Duncan, & Shapiro, 1996). TTVA assumes that a stimulus encoded and retained in VSTM takes up visual processing resources that could otherwise have been employed for encoding subsequent stimuli into VSTM. These resources are locked until the stimulus in VSTM has been recoded to a nonvisual (e.g., auditory, motoric, or amodal) format, which gives rise to the long dwell time. Fig. 3 shows the observed and predicted performance with T1 and T2 for 3 individual subjects tested by Petersen, Kyllingsbæk, and Bundesen (2012) and for a group of subjects reported in Duncan, Ward, and Shapiro (1994). Petersen, Kyllingsbæk, and Bundesen (2013) used TTVA to explain the well-known finding that removal of the mask of T1 results in a faster recovery of T2 performance (Moore et al., 1996; Raymond, Shapiro, & Arnell, 1992). This effect has previously been linked to a shorter dwell time on unmasked targets, but Petersen, Kyllingsbæk, and Bundesen (2013) found evidence that both targets and masks lock attentional resources, so that masks effectively function as distractors; the fast recovery of T2 performance may thus simply be due to T2 not having to compete with other objects for processing resources after the presentation (and decay) of T1. For stimuli presented in synchrony, TTVA reduces to TVA, so TTVA explains all behavioral findings previously explained by TVA.
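The resource-locking idea can be caricatured as follows. This is a deliberately simplified toy rather than the fitted TTVA model: the processing speed, exposure duration, T1 encoding probability, and mean locking time are placeholder values, and the model's actual treatment of masks, VSTM capacity, and parameter estimation is ignored.

```python
import numpy as np

rng = np.random.default_rng(3)

def t2_report_prob(soa, n=20000, v=40.0, t0=0.02, exposure=0.1,
                   mean_lock=0.35, p_t1=0.9, rng=rng):
    """Toy resource-locking account: while an encoded T1 occupies VSTM, no capacity
    is available for T2; capacity is released after an exponentially distributed
    "recoding" time. All parameter values are placeholders."""
    hits = 0
    for _ in range(n):
        locked_until = rng.exponential(mean_lock) if rng.random() < p_t1 else 0.0
        # Effective processing window for T2: from soa + t0 until soa + exposure,
        # excluding the time during which resources are still locked by T1.
        start = max(soa + t0, locked_until)
        window = max(soa + exposure - start, 0.0)
        if rng.exponential(1.0 / v) <= window:
            hits += 1
    return hits / n

for soa in [0.1, 0.2, 0.3, 0.5, 0.7, 0.9]:
    print(f"SOA {soa*1000:3.0f} ms: P(T2 reported) = {t2_report_prob(soa):.2f}")
```

Even this crude version reproduces the qualitative signature of the dwell time: T2 report improves steeply as the SOA grows beyond a few hundred milliseconds.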


Fig. 3. Observed T1 (pT1, squares) and T2 (pT2, circles) performances as functions of stimulus onset asynchrony (SOA) for the 3 individual subjects tested by Petersen, Kyllingsbæk, and Bundesen (2012) and for the group average reported in Duncan, Ward, and Shapiro (1994). Only trials without eye movements were analyzed. Bars indicate the observed proportion of trials in which eye movements were registered, peye. Error bars indicate the standard deviations of pT1 and pT2 assuming that the responses from the 3 individual subjects were approximately binomially distributed. Dashed lines (predicted T1 performance) and solid lines (predicted T2 performance) show least squares fits of TTVA to the four data sets. From ‘‘Measuring and Modeling Attentional Dwell Time’’ by A. Petersen, S. Kyllingsbæk, and C. Bundesen, 2012, Psychonomic Bulletin & Review, 19, p. 1030.


1.3.6. Relation to other cognitive domains

Gordon Logan has extended TVA to other cognitive domains than visual attention. First, Logan (1996) proposed a combination of TVA with the Contour Detector (CODE) theory of perceptual grouping (gestalt formation) based on nearness that was proposed by van Oeffelen and Vos (1982, 1983). The resulting theory, the CODE Theory of Visual Attention (CTVA), integrates space-based and object-based approaches to visual attention. Thus, the theory explains many effects of spatial distance in visual attention (cf. Logan, 1996; Logan & Bundesen, 1996). CTVA has been formalized by Bundesen (1998). Second, Logan and Gordon (2001) extended CTVA to ECTVA: a theory of executive control of TVA in dual-task situations. This theory accounts for crosstalk, set-switching costs, and concurrence costs in dual-task settings. The theory is based on the assumption that superordinate, executive processes coordinate and control subordinate processes by manipulating their parameters. TVA is used to describe the subordinate processes, and a task set is defined as a set of TVA parameters (primarily g, b, and p values) that suffices to configure a subordinate process to perform a task. Set-switching costs are explained in terms of the time it takes to change parameters and the number of parameters that need to be changed. Finally, by combining ECTVA with the exemplar-based random walk model of categorization proposed by Nosofsky and Palmeri (1997), Logan (2002) created an Instance Theory of Attention and Memory (ITAM). The model of Nosofsky and Palmeri (1997) is itself a combination of Nosofsky's (1986) generalized context model of categorization and Logan's (1988) instance theory of automaticity. In ITAM, as in Logan's (1988) instance theory of automaticity, a category is represented as a set of instances (individual examples or members of the category). Thus, ITAM is an integrated theory of attention, memory, and categorization.

2. A neural interpretation of TVA (NTVA)

2.1. Filtering and pigeonholing at the single-cell level


The neural theory of visual attention (NTVA; Bundesen, Habekost, & Kyllingsbæk, 2005) provides a neural interpretation of the two central equations of TVA, the rate and weight equations. The equations jointly describe two mechanisms of attentional selection: one for selection of objects and one for selection of categories or, equivalently, features. NTVA specifies how the mechanisms work at the single-cell level: Filtering (selection of objects) works by changing the number of cortical neurons in which an object is represented, whereas pigeonholing (selection of features) works by changing the rate of firing in cortical neurons coding for particular features (see Fig. 4). As explained below, in NTVA, the total neural activation representing a visual categorization of the form ‘‘object x has feature i’’ is directly proportional to both (a) the number of neurons representing the categorization, which is controlled by filtering, and (b) the level of activation of the individual neurons representing the categorization, which is controlled by pigeonholing, and the rate equation simply expresses these direct proportionalities. Filtering makes the number of cells in which an object is represented increase with the behavioral importance of the object. Thus, in NTVA, visual processing occurs in parallel, with differential allocation of resources so that important objects become represented in more cells than do less important objects. Specifically, in NTVA, the probability that a cortical neuron represents a particular object in its classical receptive field equals the attentional weight of the object divided by the sum of the attentional weights across all objects in the neuron’s receptive field (see Bundesen, Habekost, & Kyllingsbæk, 2005, for a hypothetical neural network that may be used for the selection of objects).
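A small simulation can caricature the two mechanisms at the population level; neuron counts, baseline and driven firing rates, and the weights below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(4)

def population_activation(n_neurons, weights, obj_features, pref_feature,
                          baseline=5.0, drive=30.0, bias=1.0, rng=rng):
    """Total activation of a population of feature-i coding neurons.
    Filtering: each neuron represents one object, chosen with probability
    proportional to the attentional weights. Pigeonholing: the bias b_i scales
    the evoked rate of every neuron coding the feature."""
    objects = list(weights)
    p = np.array([weights[o] for o in objects], dtype=float)
    p /= p.sum()
    assigned = rng.choice(objects, size=n_neurons, p=p)
    rates = np.where([pref_feature in obj_features[o] for o in assigned],
                     baseline + bias * drive,   # the neuron's object has its preferred feature
                     baseline)                  # otherwise only baseline firing
    return rates.sum() - n_neurons * baseline   # activation above baseline

weights = {"red_target": 0.8, "blue_distractor": 0.2}      # filtering favours the target
obj_features = {"red_target": {"red"}, "blue_distractor": {"blue"}}

for bias in (0.5, 1.0):                                     # pigeonholing: weak vs. strong b_red
    act = population_activation(200, weights, obj_features, "red", bias=bias)
    print(f"b_red = {bias}: total activation for 'x is red' = {act:.0f}")
```

Doubling b_red doubles the total activation for the categorization, and shifting attentional weight toward the red object would increase it in the same multiplicative way, mirroring the two factors of the rate equation.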

Fig. 4. Attentional selection in NTVA: Combined effects of filtering (selection of objects) and pigeonholing (selection of features) on the set of cortical spike trains representing a particular visual categorization of the form "object x has feature i." The 4 conditions (quadrants) correspond to the factorial combinations of 2 levels of filtering (weak vs. strong support to object x) by 2 levels of pigeonholing (weak vs. strong support to feature i). The two axes of the figure are filtering (number of cells) and pigeonholing (rate of firing). Filtering changes the number of cortical neurons in which an object is represented. Pigeonholing changes the rate of firing of cortical neurons coding for a particular feature. From "A Neural Theory of Visual Attention: Bridging Cognition and Neurophysiology," by C. Bundesen, T. Habekost, and S. Kyllingsbæk, 2005, Psychological Review, 112, p. 292.

The weight equation of TVA describes how attentional weights are computed. The computation must occur before processing resources (cells) can be distributed in accordance with the weights. Therefore, in NTVA, a typical perceptual cycle consists of two waves: a wave of unselective processing, in which attentional weights are computed, followed by a wave of selective processing, once processing resources have been allocated in accordance with the weights. During the first wave, cortical processing resources may be distributed at random (unselectively) across the visual field. At the end of the first wave, an attentional weight has been computed for each object in the visual field and the weight has been stored in a priority map. The weights are used for reallocation of attention (visual processing capacity) by dynamic remapping of receptive fields of cortical neurons. The remapping of receptive fields makes the number of neurons allocated to an object increase with the attentional weight of the object. Thus, during the second wave, cortical processing is selective in the sense that the amount of processing resources allocated to an object (the number of neurons that represent the properties of the object) varies with the attentional weight of the object. Because more processing resources are devoted to behaviorally important objects than to less important ones, the important objects are processed faster, and are therefore more likely to become encoded into VSTM. Judging from data by Chelazzi et al. (1998, 2001), the second wave of processing may begin about 200 ms after the presentation of the stimulus array. NTVA assumes that a typical neuron in the visual system is specialized to represent a single feature (the feature preferred by the neuron). The feature for which the neuron is specialized can be a more or less simple physical feature or a microfeature in a distributed representation (cf. Hinton, McClelland, & Rumelhart, 1986). NTVA also assumes that a visual neuron responds to the properties of only one object at any given time. In the first wave of processing the object is selected at random among the objects in the neuron’s classical receptive field, but in the second wave of processing, the probability that the neuron represents a particular object equals the attentional weight of the object divided by the sum of the attentional weights across all objects in the receptive field. In NTVA, the activation of a neuron by the appearance of an object in its receptive field is defined as the increase in firing rate above a baseline rate representing the spontaneous (undriven) activity of the neuron. If the baseline rate is zero, the activation equals the firing rate. In the suggested interpretation, the rate equation of TVA:

\[
v(x, i) = g(x, i)\, b_i \, \frac{w_x}{\sum_{z \in S} w_z}
\]

describes the combined effects of filtering and pigeonholing on the total activation of the population of neurons representing the categorization ‘‘object x has feature i.’’ The v value on the left-hand side of the rate equation is the total activation of the neurons that represent the categorization at the level of processing in the brain where objects compete for entrance into VSTM. At this level of processing, the classical receptive fields of neurons are assumed to be so large that each one covers nearly the entire visual field. Because bi and

\[
\frac{w_x}{\sum_{z \in S} w_z}
\]

may only take on values up to 1, the rate equation implies that the strength of the sensory evidence that x has feature i, g(x, i), is the highest possible value of v(x, i). Specifically, g(x, i) equals the total activation of the set of all neurons coding feature i when all of these cells represent object x (say, when x is the only object in the visual field) and when the featural bias in favor of i is maximal (i.e., bi = 1).


When the proportion of feature-i coding neurons representing object x:

\[
\frac{w_x}{\sum_{z \in S} w_z}
\]

is smaller than 1, the total activation representing the categorization "object x has feature i" is scaled down by multiplication with this factor on the right-hand side of the equation. The total activation representing the categorization also varies in direct proportion to the level of activation of the individual neurons representing the categorization. The bias parameter b_i is a scale factor that multiplies activations of all feature-i coding neurons, so the total activation representing the categorization "object x has feature i" is also directly proportional to b_i. In summary, the neural interpretation of TVA's rate equation essentially reduces to the statement that the total activation representing the categorization "object x has feature i" is directly proportional to both the number of neurons representing the categorization (which is controlled by filtering) and the level of activation of the individual neurons representing the categorization (which is controlled by pigeonholing).
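Written out with two auxiliary quantities introduced only for this sketch (N_i, the number of neurons coding feature i, and ā(x, i), the mean activation of one such neuron when it represents object x at maximal bias), the summary above amounts, on average, to

\[
v(x, i) \;=\; \underbrace{\frac{w_x}{\sum_{z \in S} w_z}\, N_i}_{\text{neurons representing } x \text{ (filtering)}} \times \underbrace{b_i\, \bar{a}(x, i)}_{\text{activation per neuron (pigeonholing)}} \;=\; g(x, i)\, b_i \, \frac{w_x}{\sum_{z \in S} w_z},
\]

since, as noted above, g(x, i) = N_i ā(x, i) is precisely the total activation obtained when all feature-i neurons represent object x and b_i = 1.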

2.2. Related models of attention

The rate equation of TVA summarizes the way the strength g(x, i) of the sensory evidence supporting the categorization "object x has feature i" interacts with two types of biases: the relative attentional weight of object x,

\[
\frac{w_x}{\sum_{z \in S} w_z}
\]

and the perceptual bias associated with feature i, b_i, in determining the total competitive strength v(x, i) of the given categorization. As explained in the introduction, the rate equation makes TVA a theory of biased competition in the general sense of Desimone and Duncan (1995). Of course, TVA is special in assuming two different types of biases (one associated with objects and one with features) and also in being formalized. In addition to being a theory of biased competition, NTVA is a feature similarity gain model in the sense of Treue and Martinez-Trujillo (1999) and Martinez-Trujillo and Treue (2004). A gain model of attention is a model in which attention works by multiplicative scaling of neuronal responses by a certain gain factor (see McAdams & Maunsell, 1999). Treue and Martinez-Trujillo proposed a fundamental principle, the feature similarity gain principle, which says that the gain factor of a neuron increases with increasing similarity between the sensory selectivity (the stimulus preferences) of the neuron and the currently attended features. Thus, assuming that a neuron encodes those features it prefers, attention to feature i should increase the responses of neurons encoding feature i or encoding features similar to feature i. In terms of TVA, the gain factor in question is the multiplicative perceptual bias b_i applied to neurons that are specialized for signalling feature i. NTVA also implies the saliency or priority map hypothesis proposed by Itti and Koch (2000), Koch and Ullman (1985), Bisley and Goldberg (2010), and others. By this hypothesis, the visual system contains one or more maps of attentional weights or attentional weight components. Specifically, according to a recent version of the hypothesis, the lateral intraparietal area (LIP) acts as a priority map in which objects are represented by activity proportional to their behavioral priority. Bisley and Goldberg (2010) point to evidence that the priority map combines bottom-up inputs from both the traditional dorsal and ventral streams of visual processing, including areas V2, V3, V3a, MT, MST, V4, and IT, with top-down inputs from a wide range of cortical and subcortical areas,


including the frontal eye fields, anterior cingulate cortex, the claustrum, and many thalamic nuclei. As also assumed in NTVA, Bisley and Goldberg assume that the priority (attentional weight) of a stimulus reflects, in part, an evaluation of the subjective importance of the stimulus. The priorities in the map are thought to be used by the oculomotor system to target saccades and by the visual system to guide visual attention. Bisley and Goldberg (2010) seem to assume that visual attention to objects is a serial process, in which objects are processed one by one, but in NTVA this is only true in special cases (see Bundesen, 1990, pp. 536–537). Finally, TVA is closely related to the normalization models of Reynolds and Heeger (2009; see also Moran & Desimone, 1985; Reynolds, Chelazzi, & Desimone, 1999) and Lee and Maunsell (2009). The model by Reynolds and Heeger was originally suggested as a way of implementing biased competition, which was inspired by analyses of responses of cortical visual neurons to simultaneously presented pairs of stimuli. In the normalization model, the excitatory neural activation caused by the stimuli (the stimulus drive) is represented in a map that shows the activation as a function of both the locations of the receptive-field centers of neurons and their feature preferences. The map of the stimulus drive first becomes multiplied point-by-point with an attention field and then becomes normalized by being divided point-by-point with the suppressive drive to yield the output neural firing rates. The suppressive drive is computed by convolving the point-by-point product of the stimulus drive and the attention field with a Gaussian kernel. Thus, the division by the suppressive drive has the effect that the attention-weighted stimulus drive from a preferred stimulus is normalized with respect to the activity in other neurons that respond to the surrounding spatial and featural context. This model may be able to explain the way in which the shape of contrast-response functions for attended and nonattended stimuli changes depending on both the size of the stimulus and the size of the attentional field. The effect of the normalization is akin to the effect of using relative (i.e., normalized) instead of absolute attentional weights in the rate equation of TVA. On the other hand, for a pair of adequate stimuli both of which are presented within the classical receptive field of a recorded cell, the normalization model implies that the response (firing rate) of the cell is a weighted average of the responses obtained when either stimulus is presented alone (see Reynolds, Chelazzi, & Desimone, 1999), whereas NTVA implies that the response to the stimulus pair is a probability mixture of the responses obtained when the stimuli are presented alone (see Bundesen, Habekost, & Kyllingsbæk, 2005): With probability p, the cell responds as if only Stimulus 1 had been presented, and with probability 1 − p, the cell responds as if only Stimulus 2 had been presented.
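The contrast between the two accounts of pair responses can be made concrete with invented numbers; both give a similar mean here, but the mixture account predicts strongly bimodal trial-to-trial responses.

```python
import numpy as np

rng = np.random.default_rng(5)

r1, r2 = 60.0, 10.0       # invented firing rates to Stimulus 1 / Stimulus 2 alone (spikes/s)
w1, w2 = 0.7, 0.3         # attentional weights of the two stimuli in the receptive field

# Normalization-style prediction: every trial yields (roughly) the weighted average.
r_norm = (w1 * r1 + w2 * r2) / (w1 + w2)

# NTVA prediction: a probability mixture; on each trial the cell acts as if only
# one of the stimuli were present, with probabilities given by the relative weights.
p1 = w1 / (w1 + w2)
trials = np.where(rng.random(10000) < p1, r1, r2)

print(f"weighted average (normalization): {r_norm:.1f} spikes/s")
print(f"mixture: mean {trials.mean():.1f} spikes/s, trial-to-trial SD {trials.std():.1f}")
```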
2.3. Functional anatomy

NTVA is a fairly abstract neurocomputational model. As emphasized by Bundesen, Habekost, and Kyllingsbæk (2005), the model does not depend in a critical way on particular anatomical localizations of the proposed computations. On the other hand, the model suggests some ways in which visual computations may be distributed across the human brain. One plausible distribution is given by the so-called thalamic version of NTVA, which is illustrated in Fig. 5. For further discussions of the anatomy of visual attention, see Bundesen, Habekost, and Kyllingsbæk (2005, 2011), Habekost and Starrfelt (2009), and Gillebert et al. (2012).

2.4. Applications to single-cell studies

Quantitative applications of NTVA to attentional effects observed in single-cell studies can be found in Bundesen, Habekost, and Kyllingsbæk (2005) and Bundesen and Habekost (2008). The applications comprise attentional effects on the rate of firing with multiple stimuli in the receptive field of a neuron, effects on the rate of firing with a single stimulus in the receptive field, and effects on baseline firing. Neural filtering is viewed as dynamic remapping of receptive fields, changing the number of neurons representing perceptual categorizations of a given type of objects. Neural pigeonholing is viewed as a mechanism that selects the set of neurons that prefer a given feature for a multiplicative increase or decrease in activation.


Fig. 5. Possible distribution of visual processing across the human brain. Visual information from the eye enters the lateral geniculate nucleus (LGN) of the thalamus (1) and is transmitted to striate and extrastriate cortical areas where g values (strengths of evidence that objects at particular scales and positions have particular features) are computed (2). The g values are multiplied by p (pertinence) values, and the products are transmitted from the cortex to a priority map in the pulvinar (Pul) nucleus of the thalamus, where the products are summed up as attentional weights of the stimulus objects (3). After the first (unselective) wave of processing, cortical processing capacity is redistributed by means of attentional weight signals (w) from the pulvinar to the cortex, so that during the second (selective) wave of processing, objects with high attentional weights are processed by many neurons (4). The resulting g values are multiplied by b (bias) values, and the products are transmitted from the cortex to a multiscale VSTM map of locations, which is tentatively localized in the thalamic reticular nucleus (TRN) (5). When the VSTM map is initialized, objects in the visual field effectively start a race to become encoded into VSTM. In this race, each object is represented by all possible categorizations of the object, and each possible categorization participates with an activation (v value) proportional to the corresponding g value multiplied by the corresponding b value. For the winners of the race, the TRN gates activation representing a categorization back to some of those cells in LGN whose activation supported the categorization (6). Thus activity in neurons representing winners of the race is sustained by positive feedback. From ‘‘A Neural Theory of Visual Attention: Bridging Cognition and Neurophysiology,’’ by C. Bundesen, T. Habekost, and S. Kyllingsbæk, 2005, Psychological Review, 112, p. 295.

3. Final remark

In this article, we have reviewed the foundations of a formal computational theory of visual attention, TVA (Bundesen, 1990), and presented some recent developments in the theory. We have also summarized a neural interpretation of TVA, NTVA (Bundesen, Habekost, & Kyllingsbæk, 2005, 2011).

Acknowledgment

This work was supported by a grant from the University of Copenhagen Programme of Excellence to Claus Bundesen.

References

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523–547. Bundesen, C. (1993). The relationship between independent race models and Luce’s choice axiom. Journal of Mathematical Psychology, 37, 446–471. Bundesen, C. (1998). A computational theory of visual attention. Philosophical Transactions of the Royal Society of London, Series B, 353, 1271–1281. Bundesen, C., Vangkilde, S., & Habekost, T. (in press). Components of visual bias: A multiplicative hypothesis. Annals of the New York Academy of Sciences. Bundesen, C., & Habekost, T. (2008). Principles of visual attention: Linking mind and brain. Oxford University Press. Bundesen, C., Habekost, T., & Kyllingsbæk, S. (2005). A neural theory of visual attention: Bridging cognition and neurophysiology. Psychological Review, 112, 291–328. Bundesen, C., Habekost, T., & Kyllingsbæk, S. (2011). A neural theory of visual attention and short-term memory. Neuropsychologia, 49, 1446–1457. Bundesen, C., & Habekost, T. (2014). Theory of visual attention (TVA). In A. C. Nobre & S. Kastner (Eds.), The Oxford handbook of attention (pp. 1095–1121). Oxford University Press. Bundesen, C., Kyllingsbæk, S., & Larsen, A. (2003). Independent encoding of colors and shapes from two stimuli. Psychonomic Bulletin & Review, 10, 474–479. Bundesen, C., & Pedersen, L. F. (1983). Color segregation and visual search. Attention, Perception, & Psychophysics, 33, 487–493. Chelazzi, L., Duncan, J., Miller, E. K., & Desimone, R. (1998). Responses of neurons in inferior temporal cortex during memory-guided visual search. Journal of Neurophysiology, 80, 2918–2940. Chelazzi, L., Miller, E. K., Duncan, J., & Desimone, R. (2001). Responses of neurons in macaque area V4 during memory-guided visual search. Cerebral Cortex, 11, 761–772. Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222. Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113, 501–517. Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Chavda, S., & Shibuya, H. (1999). Systematic analysis of deficits in visual attention. Journal of Experimental Psychology: General, 128, 450–478. Duncan, J., Ward, R., & Shapiro, K. (1994). Direct measurement of attentional dwell time in human vision. Nature, 369, 313–315. Dyrholm, M., Kyllingsbæk, S., Espeseth, T., & Bundesen, C. (2011). Generalizing parametric models by introducing trial-by-trial parameter variability: The case of TVA. Journal of Mathematical Psychology, 55, 416–429. Gillebert, C. R., Dyrholm, M., Vangkilde, S., Kyllingsbæk, S., Peeters, R., & Vandenberghe, R. (2012). Attentional priorities and access to short-term memory: Parietal interactions. Neuroimage, 62, 1551–1562. Habekost, T., & Starrfelt, R. (2009). Visual attention capacity: A review of TVA-based patient studies. Scandinavian Journal of Psychology, 50, 23–32. Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.). Parallel distributed processing (Vol. 1, pp. 77–109). Cambridge, MA: MIT Press. Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506. Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227. Kyllingsbæk, S., & Bundesen, C. (2007). 
Parallel processing in a multi-feature wholereport paradigm. Journal of Experimental Psychology: Human Perception and Performance, 33, 64–82. Kyllingsbæk, S., Markussen, B., & Bundesen, C. (2012). Testing a Poisson counter model for visual identification of briefly presented, mutually confusable single stimuli in pure accuracy tasks. Journal of Experimental Psychology: Human Perception and Performance, 38, 628–642. Kyllingsbæk, S., Schneider, W. X., & Bundesen, C. (2001). Automatic attraction of attention to former targets in visual displays of letters. Attention, Perception, & Psychophysics, 63, 85–98. Lee, J., & Maunsell, J. H. R. (2009). A normalization model of attentional modulation of single unit responses. PLoS One, 4(2), e4651. Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492–527. Logan, G. D. (1996). The CODE theory of visual attention: An integration of spacebased and object-based attention. Psychological Review, 103, 603–649. Logan, G. D. (2002). An instance theory of attention and memory. Psychological Review, 109, 376–400. Logan, G. D., & Bundesen, C. (1996). Spatial effects in the partial report paradigm: A challenge for theories of visual spatial attention. In D. L. Medin (Ed.). The psychology of learning and motivation (Vol. 35, pp. 243–282). San Diego, CA: Academic Press. Logan, G. D., & Gordon, R. D. (2001). Executive control of visual attention in dualtask situations. Psychological Review, 108, 393–434. Los, S. A. (2010). Foreperiod and sequential effects: Theory and data. In A. C. Nobre & J. T. Coull (Eds.), Attention and time (1st ed., pp. 289–302). Oxford University Press. Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.). Handbook of mathematical psychology (Vol. 1, pp. 103–189). New York: Wiley. Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press. Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281.


C. Bundesen et al. / Vision Research xxx (2014) xxx–xxx Martinez-Trujillo, J. C., & Treue, S. (2004). Feature-based attention increases the selectivity of population responses in primate visual cortex. Current Biology, 14, 744–751. Matthias, E., Bublak, P., Costa, A., Müller, H. J., Schneider, W. X., & Finke, K. (2009). Attentional and sensory effects of lowered levels of intrinsic alertness. Neuropsychologia, 47, 3255–3264. Matthias, E., Bublak, P., Müller, H. J., Schneider, W. X., Krummenacher, J., & Finke, K. (2010). The influence of alertness on spatial and nonspatial components of visual attention. Journal of Experimental Psychology: Human Perception and Performance, 36, 38–56. McAdams, C. J., & Maunsell, J. H. R. (1999). Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. Journal of Neuroscience, 19, 431–441. Moore, C. M., Egeth, H., Berglan, L. R., & Luck, S. J. (1996). Are attentional dwell times inconsistent with serial visual search? Psychonomic Bulletin & Review, 3, 360–365. Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229, 782–784. Niemi, P., & Näätänen, R. (1981). Foreperiod and simple reaction time. Psychological Bulletin, 89, 133–162. Nobre, A. C. (2001). Orienting attention to instants in time. Neuropsychologia, 39, 1317–1328. Nobre, A. C. (2010). How can temporal expectations bias perception and action? In A. C. Nobre & J. T. Coull (Eds.), Attention and time (1st ed., pp. 371–392). Oxford University Press. Nobre, A. C., & Rohenkohl, G. (2014). Time for the fourth dimension in attention. In A. C. Nobre & S. Kastner (Eds.), The Oxford handbook of attention (pp. 676–723). Oxford University Press. Nordfang, M., Dyrholm, M., & Bundesen, C. (2013). Identifying bottom-up and topdown components of attentional weight by experimental analysis and computational modeling. Journal of Experimental Psychology: General, 142, 510–535. Nosofsky, R. M. (1986). Attention, similarity, and the identification– categorization relationship. Journal of Experimental Psychology: General, 115, 39–57. Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266–300. Petersen, A., Kyllingsbæk, S., & Bundesen, C. (2012). Measuring and modeling attentional dwell time. Psychonomic Bulletin & Review, 19, 1029–1046. Petersen, A., Kyllingsbæk, S., & Bundesen, C. (2013). Attention dwells on both targets and masks. Journal of Vision, 13, 1–12. Posner, M. I., Nissen, M. J., & Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick & I. J. Saltzman (Eds.), Modes of perceiving and processing information (pp. 137–157). Hillsdale, NJ: Erlbaum. Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849–860.


Reynolds, J. H., Chelazzi, L., & Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience, 19, 1736–1753. Reynolds, J. H., & Heeger, D. J. (2009). The normalization model of attention. Neuron, 61, 168–185. Schneider, W., & Fisk, A. D. (1982). Degree of consistent training: Improvements in search performance and automatic process development. Attention, Perception, & Psychophysics, 31, 160–168. Shibuya, H., & Bundesen, C. (1988). Visual selection from multielement displays: Measuring and modeling effects of exposure duration. Journal of Experimental Psychology: Human Perception and Performance, 14, 591–600. Sørensen, T. A., Vangkilde, S., & Bundesen, C. (2014). Components of attention modulated by temporal expectation. Journal of Experimental Psychology: Learning, Memory, and Cognition. http://dx.doi.org/10.1037/a0037268. Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74(11, Whole No. 498). Sperling, G. (1967). Successive approximations to a model for short-term memory. Acta Psychologica, 27, 285–292. Townsend, J. T., & Ashby, F. G. (1982). Experimental test of contemporary mathematical models of visual letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 8, 834–864. Townsend, J., & Ashby, F. G. (1983). The stochastic modeling of elementary psychological processes. Cambridge, UK: Cambridge University Press. Townsend, J. T., & Landon, D. E. (1982). An experimental and theoretical investigation of the constant-ratio rule and other models of visual letter confusion. Journal of Mathematical Psychology, 25, 119–162. Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136. Treisman, A. M., & Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95, 15–48. Treue, S., & Martinez-Trujillo, J. C. M. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399, 575–579. van Oeffelen, M. P., & Vos, P. G. (1982). Configurational effects on the enumeration of dots: Counting by groups. Memory & Cognition, 10, 396–404. van Oeffelen, M. P., & Vos, P. G. (1983). An algorithm for pattern description on the level of relative proximity. Pattern Recognition, 16, 341–348. Vangkilde, S., Coull, J. T., & Bundesen, C. (2012). Great expectations: Temporal expectation modulates perceptual processing speed. Journal of Experimental Psychology: Human Perception and Performance, 38, 1183–1191. Vangkilde, S., Petersen, A., & Bundesen, C. (2013). Temporal expectancy in the context of a theory of visual attention. Philosophical Transactions of the Royal Society of London, Series B, 368, 20130054. http://dx.doi.org/10.1098/ rstb.2013.0054. Ward, R., Duncan, J., & Shapiro, K. (1996). The slow time-course of visual attention. Cognitive Psychology, 30, 79–109. Woodrow, H. (1914). The measurement of attention. Psychological Monographs, 17(5, Whole No. 76), 1–158.

