© 2014 American Psychological Association 0096-1523/14/$12.00 http://dx.doi.org/10.1037/a0037744

Journal of Experimental Psychology: Human Perception and Performance 2014, Vol. 40, No. 6, 2214-2242

An Information Capacity Limitation of Visual Short-Term Memory

David K. Sewell, Simon D. Lilburn, and Philip L. Smith
The University of Melbourne

Research suggests that visual short-term memory (VSTM) has both an item capacity, of around 4 items, and an information capacity. We characterize the information capacity limits of VSTM using a task in which observers discriminated the orientation of a single probed item in displays consisting of 1, 2, 3, or 4 orthogonally oriented Gabor patch stimuli that were presented in noise for 50 ms, 100 ms, 150 ms, or 200 ms. The observed capacity limitations are well described by a sample-size model, which predicts invariance of $\sum_i (d'_i)^2$ for displays of different sizes and linearity of $(d')^2$ for displays of different durations. Performance was the same for simultaneously and sequentially presented displays, which implicates VSTM as the locus of the observed invariance and rules out explanations that ascribe it to divided attention or stimulus encoding. The invariance of $\sum_i (d'_i)^2$ is predicted by the competitive interaction theory of Smith and Sewell (2013), which attributes it to the normalization of VSTM trace strengths arising from competition among stimuli entering VSTM.

Keywords: attention, visual short-term memory, mathematical modeling, capacity limitations

This article was published Online First September 15, 2014.
David K. Sewell, Simon D. Lilburn, and Philip L. Smith, Melbourne School of Psychological Sciences, The University of Melbourne.
The research in this article was supported by Australian Research Council Discovery Grants DP110103406 and DP140102970 to Philip L. Smith, Discovery Early Career Award DE140100772 to David K. Sewell, and an Australian Postgraduate Award to Simon D. Lilburn. We thank Ed Vogel and Chris Donkin for their comments on a previous version of this article.
Correspondence concerning this article should be addressed to David K. Sewell, Melbourne School of Psychological Sciences, The University of Melbourne, Victoria 3010, Australia. E-mail: [email protected]

Many tasks we carry out in daily life require us to make decisions about transient stimulus events. In these situations, the time needed to make a decision is often much longer than the time for which the stimulus is physically present, so the decision must be based on our memory of the stimulus rather than on the stimulus as it is perceived. In the laboratory, decisions about the identities of brief visual stimuli are well described by models in which an encoded stimulus representation is preserved without appreciable decay for the time needed to make a decision (Ratcliff & Rouder, 2000; Smith, Ratcliff, & Wolfgang, 2004). The lack of decay during the decision time suggests that such decisions are based on durable representations in visual short-term memory (STM), whose role is to buffer stimulus information against iconic decay or overwriting by backward masks while the decision is being made.

The stage of memory that supports psychophysical decisions, variously termed visual STM (VSTM) or visual working memory, has been the subject of intensive inquiry over the last decade or more. Much of the recent debate has focused on whether the capacity of VSTM is best characterized as a limitation on the number of items it can hold (Awh, Barton, & Vogel, 2007; Cowan, 2001; Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001; Zhang & Luck, 2008), the amount of information it can hold (Fougnie & Alvarez, 2011), or some combination of the two (Alvarez & Cavanagh, 2004; Awh, Barton, & Vogel, 2007; Fukuda, Vogel, Mayr, & Awh, 2010). It has also focused on whether encoded features in VSTM are bound or free (Bays & Husain, 2008; Bays, Wu, & Husain, 2011; Fougnie & Alvarez, 2011) or, as an intermediate position between these two theoretical extremes, whether the coding of features is hierarchical or clustered in some way (Brady & Alvarez, 2011; Orhan & Jacobs, 2013). One of the more recent developments in this literature has focused on the precision with which stimulus information is held in VSTM, and how precision is affected by factors such as memory load (Anderson, Vogel, & Awh, 2011; Fougnie, Asplund, & Marois, 2010; Wilken & Ma, 2004) and encoding time (Bays, Gorgoraptis, Wee, Marshall, & Husain, 2011), and has led to the development of detailed statistical models of responding in tasks requiring reproduction of continuous-valued stimulus features (van den Berg, Awh, & Ma, 2014; van den Berg, Shin, Chou, George, & Ma, 2012).

In this article, we investigate the VSTM information limits that affect simple psychophysical decision making. Specifically, we report evidence for a sample-size model of the information capacity of VSTM. This model was first proposed as a model for the capacity limitations of divided attention by Taylor, Lindsay, and Forbes (Lindsay, Taylor, & Forbes, 1968; Taylor, Lindsay, & Forbes, 1967) and was subsequently investigated by Shaw (1980). The empirical signature of the model is the invariance of $\sum_i (d'_i)^2$ across displays of different sizes, where d' is the sensitivity measure of signal detection theory (Green & Swets, 1966; Macmillan & Creelman, 1991). Symbolically, the model predicts that


$$\sum_{i=1}^{m} (d'_i)^2 = c, \qquad (1)$$

where m is the display size and c is a constant. When there are two items in the display the value of $(d'_i)^2$ will be half that for a one-item display, and when there are three items in the display it will be a third that for a one-item display, and so on.


The prediction is a powerful and highly diagnostic one because it is wholly parameter free. Later in the article, we show how this key property of the sample-size model is reproduced in a neurally inspired model of attention and decision making, Smith and Sewell's (2013) competitive interaction theory.

The sample-size model has typically been interpreted as reflecting an encoding capacity limitation, that is, a limitation on people's ability to simultaneously form representations of multiple stimuli. In concurrent tasks, in which people are asked to devote varying fractions of their attentional capacity to two different tasks (e.g., Sperling & Dosher, 1986), the model predicts the attention operating characteristic describing the locus of the pairs of d' values will be a quadrant of a circle (Norman & Bobrow, 1975). Bonnel and colleagues (Bonnel & Hafter, 1998; Bonnel & Miller, 1994) reported evidence supporting this prediction, showing that the sample-size model gave a good account of performance in concurrent tasks of this kind. However, Palmer (1990) argued, based on the results of a probe task in which people reported the contents of a single location probed after display offset, that the sample-size model was better conceived of as a limitation of the STM system, rather than encoding capacity. The VSTM capacity interpretation is supported by the fact that performance in simple visual search tasks, in which people search for targets in briefly presented displays of variable size, is often well described by unlimited-capacity signal detection models, in which the only source of performance limitation is decision noise (Eckstein, Thomas, Palmer, & Shimozaki, 2000; Palmer, 1994; Palmer, Ames, & Lindsey, 1993; Palmer, Verghese, & Pavel, 2000). These tasks show no evidence of an encoding capacity limitation of any kind, suggesting that the capacity limitations indexed by the sample-size model may arise later in the processing system.

We report further support for the sample-size model and provide strong support for Palmer's view that it reflects an information capacity limit of VSTM that is distinct from the item capacity limit, typically estimated to be around four items (Cowan, 2001; Luck & Vogel, 1997; Pashler, 1988). In Experiment 1, we show that the predictions of the sample-size model hold uniformly for exposure durations of between 50 ms and 200 ms and for display sizes up to four items, confirming and extending Palmer's (1990) results. In Experiment 2, we use a simultaneous-sequential presentation paradigm (Shiffrin & Gardner, 1972) and show that performance with sequentially presented stimuli does not differ appreciably from that with simultaneously presented stimuli for stimulus exposures of 100 ms, 150 ms, or 200 ms for displays of one, two, or four items. Such differences as are present are in the direction of slightly poorer performance with sequential presentation. Because all of the available encoding capacity in the sequential presentation condition could be concentrated on one item at a time, a limited-capacity encoding model predicts better performance with sequential than with simultaneous presentation, contrary to our findings.

Later in this article we describe a computational model based on Smith and Sewell's (2013) competitive interaction theory of attention and decision making, which makes the same predictions as the sample-size model. The model attributes the invariance of $\sum_i (d'_i)^2$ to VSTM itself, rather than to VSTM encoding, consistent with Palmer's interpretation. Mathematically, the model is a form of normalization model (Lee & Maunsell, 2009; Reynolds &


Heeger, 2009), in which the strength of item representations in VSTM is reduced in proportion to the number of items in the memory system. The representational assumptions of the model are such that it predicts the invariance of $\sum_i (d'_i)^2$.

Our main theoretical aim in this article is to highlight the formal relationship between the classic sample-size model and Smith and Sewell's competitive interaction theory. We believe this relationship is important because we show the parameter-free predictions of the sample-size model (see Equation 1) provide an excellent account of our data, and consequently, we believe the model reveals something fundamental about the storage capacity limits of the VSTM system. Because Smith and Sewell's competitive interaction theory predicts sample-size effects through dynamic VSTM trace normalization, it shows how the (re)allocation of memory resources in models of visual processing can be achieved in real time, in a biologically plausible computational framework. Our findings extend already successful (static) normalization models of visual processing (e.g., Lee & Maunsell, 2009; Reynolds & Heeger, 2009) to the VSTM domain while also showing that dynamic normalization provides a natural account of performance when stimuli are presented sequentially.

Approaches to Studying Visual Short-Term Memory

To place our study in the broader context of the VSTM literature, we review the major methodological approaches recently used to study VSTM. A key point of difference among methodologies for studying VSTM is the psychophysical complexity of the decision process. Some tasks require a decision—often between two mutually exclusive response alternatives—about a single probed stimulus (e.g., "Is this stimulus identical to the item previously presented at this location?"). Other tasks require a global decision to be made about an entire array of stimuli (e.g., "Are any of the stimuli different to the items previously presented at their corresponding locations?"). Differences in decision complexity as a function of display size limit the extent to which it is possible to distinguish between representational or memory processes, on the one hand, and decision processes, and the associated differences in decision noise, on the other. Accordingly, different approaches each have their own advantages and limitations.

Many studies in the contemporary VSTM literature have used a change-detection task, introduced by Phillips (1974) and popularized by Luck, Vogel, and colleagues (e.g., Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2006). Psychophysically, the change-detection task is a two-interval same-different task (Macmillan & Creelman, 1991) in which people are asked to judge whether two consecutively presented visual arrays are the same or whether one of the elements is different. More recently, researchers have investigated the precision with which features are represented in VSTM and how precision is influenced by the number of other features or objects in the display (e.g., Fougnie et al., 2010; Wilken & Ma, 2004; Zhang & Luck, 2008). These studies of representational precision have used the psychophysical method of adjustment (see Engen, 1971, for a discussion of the history) in which people are asked to reproduce some continuous attribute of a stimulus using a computer mouse or other analog device (e.g., Wilken & Ma, 2004). Although the method of adjustment fell into disuse historically because of concerns that it confounded fidelity of sensory representations with precision of motor reproduction (Engen,

2216

SEWELL, LILBURN, AND SMITH

1971; Green & Swets, 1966, pp. 401-403), it has recently been revived as a way to distinguish trials on which participants have partial information about a sensory attribute from trials on which they appear to be guessing (Zhang & Luck, 2008). Whether or not participants truly guess on some proportion of trials—that is, make decisions in the complete absence of sensory information rather than on the basis of noisy or imperfect information—has been theoretically important in attempting to distinguish between discrete item, or slot, models of VSTM (Rouder et al., 2008; Zhang & Luck, 2008) and continuous resource models of memory (Bays, Catalao, & Husain, 2009; Wilken & Ma, 2004).

The question of whether VSTM representations are better characterized as continuous or discrete is a controversial one (e.g., Bays & Husain, 2008, 2009; Cowan & Rouder, 2009; Thiele, Pratte, & Rouder, 2011). A number of recent studies have compared the fits of slot, resource, and hybrid models to data from experiments using the method of adjustment (e.g., van den Berg et al., 2014) and some authors have argued that the best model is a form of resource model in which the precision of item representations varies randomly from trial to trial (Mazyar, van den Berg, & Ma, 2012; van den Berg et al., 2012, 2014).¹ We do not address this aspect of the debate in detail here, as our main aim is to explore the predictions of Smith and Sewell's (2013) competitive interaction theory. In our view, the "slots versus resources" debate remains unresolved despite recent advances in statistical models of the continuous report task. For example, there is striking converging evidence from a variety of task domains that has been argued to imply some sort of object-based limit on working memory capacity (Cowan, 2001, 2005). Within the visual domain, qualitative signatures favoring discrete object representations have been reported, including "plateau" effects in estimated VSTM precision for set sizes exceeding item capacity limits (Anderson et al., 2011; Zhang & Luck, 2008, although see van den Berg & Ma, in press, for a critique of this literature), linearity of receiver operating characteristic (ROC) curves (Donkin, Nosofsky, Gold, & Shiffrin, in press; Rouder et al., 2008), electrophysiological evidence relating changes in neural response to object load (Anderson et al., 2011; Ikkai, McCollough, & Vogel, 2010; Vogel & Machizawa, 2004; Vogel, McCollough, & Machizawa, 2005), and fMRI evidence (Cowan et al., 2011; Todd & Marois, 2004; Xu & Chun, 2006).

Rather than using change detection and/or the representational precision approach to characterize VSTM capacity, we instead used a two-alternative, orthogonal-discrimination task (Smith et al., 2004) in a poststimulus probe paradigm (Downing, 1988; Hawkins et al., 1990; Müller & Humphreys, 1991; Palmer, 1990). In the orthogonal discrimination task, participants discriminate the orientations of vertical and horizontal Gabor patch stimuli (Gaussian vignetted sinusoidal gratings). In the poststimulus probe paradigm, people report the contents of a single display location in a backwardly masked display that is probed after stimulus offset. We chose to use orthogonal discrimination because of our larger theoretical interest in the memory representations that support psychophysical decision making (Smith & Ratcliff, 2009; Smith & Sewell, 2013).
Orthogonal discrimination has been used in many attention and decision-making studies (Carrasco, Penpeci-Talgar, & Eckstein, 2000; Lee, Koch, & Braun, 1997; Sewell & Smith, 2012; Smith et al., 2004; Smith, Ellis, Sewell, & Wolfgang, 2010) and its psychophysical properties are well understood. Its contrast sensitivity function is similar to that for yes-no detection (Thomas

& Gille, 1979), but it has the methodological advantage of being largely unbiased.

Because the poststimulus probe paradigm requires report of the contents of a single display location only, it controls the complexity of the decision task and ensures that any display-size effects are not due to changes in decisional complexity and associated decision noise. Our use of the poststimulus probe task combined with a simultaneous-sequential presentation manipulation in Experiment 2 allowed us to exclude both stimulus encoding and decisional complexity as possible causes of the invariance of $\sum_i (d'_i)^2$ and suggests it represents instead an information capacity limit of VSTM itself. We also carried out a modeling study in which we compared the predictions of the sample-size model with a limited-capacity exponential race model, which has been shown to provide a good account of encoding capacity limitations in some tasks (Bundesen, 1987; Bundesen, Pedersen, & Larsen, 1984; Shibuya & Bundesen, 1988), and show that the sample-size model outperforms the race model. These results further reinforce our contention, and that of Palmer (1990), that the sample-size effects in our data reflect a capacity limitation of VSTM itself rather than VSTM encoding.

The Sample-Size Model

The sample-size model (Bonnel & Hafter, 1998; Bonnel & Miller, 1994; Lindsay et al., 1968; Palmer, 1990; Taylor et al., 1967; Shaw, 1980) is most simply expressed in discrete time, but it can be generalized to continuous time in a natural way. In discrete time, the model assumes that the stimulus exposure duration, t, is subdivided into a set of n observation intervals, each of duration Δ, with t = nΔ. In each interval, an independent sample is drawn from a normally distributed random variable with mean μ and standard deviation σ, and these observations are summed across intervals. When there is only a single stimulus in the display, the stimulus representation is based on all n observations. The mean and variance of the sum are then nμ and nσ², respectively, and $d' = \sqrt{n}(\mu/\sigma)$. The predicted square-root law improvement in the signal-to-noise ratio with increasing sample size follows from elementary sampling theory.

When there are m stimuli in the display and all are equally relevant to the task, the observations are assumed to be distributed equally across stimuli, so that each stimulus only receives n/m observations, which reduces the sensitivity per item to $d' = \sqrt{n/m}(\mu/\sigma)$. That is, d' decreases in proportion to the square root of the number of items in the display.

¹ One issue that arises when attempting to evaluate the apparently superior performance of variable precision models over slot models is that, for large display sizes, variable precision models assume a large concentration of probability mass around zero (e.g., Mazyar et al., 2012; van den Berg et al., 2012, 2014). That is, the models assume that items are supposedly retained in memory, but with no, or virtually no, information about their feature values. This allows variable precision models to account for data from trials on which people appear to be guessing—a finding that has typically been used to support slot models. How these "content-free" memory representations differ from items that have not secured a slot in VSTM remains somewhat unclear. It is likely that resolving this issue will require a model that specifies the psychological—and perhaps biological—computations required to establish and maintain representations in VSTM. We believe the competitive interaction theory is a reasonable step in this direction, and we seek to develop that account of VSTM here.


Denoting the sensitivity to a single stimulus in an m-item display by d'_m, the model predicts

$$d'_m = \frac{d'_1}{\sqrt{m}}, \qquad (2)$$

or equivalently,

$$\sum_{i=1}^{m} (d'_i)^2 = c \text{ (constant)}, \qquad (3)$$

which is Equation 1. It states that the sums of the squared d's will be the same for displays of different sizes. The prediction holds uniformly for all values of exposure duration, t.

The sampling theory account of the model is the standard one found in the literature. When viewed in sampling terms, the model appears most naturally to characterize an encoding capacity limitation, in which the number of observations, n, is an encoding resource that is proportional to the encoding time, t. However, the encoding interpretation of the model is inconsistent with our finding in Experiment 2 that performance does not differ with simultaneous and sequential presentation of the stimulus array. With sequential presentation, the time available to encode each item is held constant as display size is varied, so an encoding interpretation predicts that there should be no display size effect.

An alternative to the sampling interpretation is that the samples in the model represent a memory resource rather than an encoding resource. Smith and Sewell (2013, p. 616) described a neural coding model based on the attention model of Smith (1998) in which representations in VSTM depend on the Poisson statistics of a pool of neurons. This pool is shared among the items in VSTM. For Poisson-coded representations, (d')² is a natural measure of stimulus information because the mean and variance of a Poisson process both equal its intensity parameter, λ. Consequently, $d' = \lambda/\sqrt{\lambda} = \sqrt{\lambda}$ and $(d')^2 = \lambda$. Poisson processes possess a superposition property: An aggregation of n independent Poisson processes with intensity λ is the same as a single Poisson process with intensity nλ. For a superposition of n Poisson processes, (d')² = nλ; that is, (d')² is additive across processes. If the pool of neurons is shared equally among the m items in VSTM, the superposition property gives $(d'_m)^2 = (d'_1)^2/m$, which is the sample-size model.

A number of objections to pure resource models like the Poisson coding model have been raised in the literature. One is that there is good evidence that there is an item limit on VSTM of around four items (Anderson et al., 2011; Cowan, 2001; Luck & Vogel, 1997; Zhang & Luck, 2008). A second is that ROC analysis has provided evidence that participants do in fact guess on some proportion of trials in VSTM tasks (Donkin et al., in press; Rouder et al., 2008). Guessing is more consistent with item models of VSTM than with resource models, because the latter imply a progressive degradation of VSTM trace strength with increasing display size—although, as we noted in Footnote 1, variable precision models attribute apparent guessing to trial-to-trial variability in encoding precision. Item-limited models imply that participants must guess once the VSTM item limit has been exceeded. A third objection is that resource sharing in VSTM appears to be more coarse grained than is predicted by continuous resource models. The latter imply that resources can be freely allocated among an indefinite number of items. However, the finding that resource allocation appears to be limited to a fixed number of objects defined by an individual's VSTM storage capacity (Barton, Ester, & Awh, 2009) appears inconsistent with the assumptions of pure resource models. Evidence for coarse-grained resource allocation led Zhang and Luck (2008) to propose their slot-plus-averaging model, which we discuss subsequently.
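To make the discrete-time sampling derivation concrete, the following sketch (our illustration, not code from the article; the per-sample signal-to-noise ratio μ/σ and the sample budget n are assumed values) evaluates d'_m and the sum of the squared sensitivities for display sizes one through four, reproducing the invariance in Equations 1-3.

```python
# Analytic sketch of the discrete-time sampling model (our illustration).
# A total of n independent Gaussian samples, each with signal-to-noise
# ratio mu/sigma, is divided equally among the m items in the display;
# d' per item falls as 1/sqrt(m), so sum_i (d'_i)^2 is constant.
import math

mu, sigma, n = 0.25, 1.0, 60   # assumed values; n chosen divisible by 1-4

for m in (1, 2, 3, 4):
    k = n // m                              # observations per item
    d_prime = math.sqrt(k) * (mu / sigma)   # sum of k samples: mean k*mu, sd sqrt(k)*sigma
    total = m * d_prime ** 2                # sum of squared d' across items
    print(f"m={m}: d'_m = {d_prime:.3f}, sum_i (d'_i)^2 = {total:.3f}")
```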

Experiment 1

Experiment 1 investigated probed VSTM recall as a function of display size and stimulus exposure. Because we were interested in performance under conditions in which the theoretical four-item capacity of VSTM is not exceeded (Cowan, 2001; Luck & Vogel, 1997), we limited the maximum size of our displays to four items. We were also interested in memory representations of stimuli before they are identified by decision processes (Ratcliff & Rouder, 2000; Smith, Ratcliff, & Sewell, 2014; Smith et al., 2004). We assume that decision processes act to provide categorical (i.e., named) representations of stimuli that can be stored in an articulatory loop in STM (Sperling, 1960). In our orthogonal discrimination task, this would result in a categorical representation of a stimulus as either "vertical" or "horizontal." As our aim was to assess the capacity limitations of the precategorical stage of VSTM representation, we restricted our maximum exposure duration to 200 ms to try to minimize the likelihood that decision processes would have identified the stimulus before presentation of the report cue.

Method

Stimuli and apparatus. Stimuli were presented on a linearized 21-in. Sony Trinitron Multiscan G520 monitor driven by a Cambridge Research Systems ViSaGe framestore. The monitor refresh rate was set at 100 Hz, giving a frame duration of 10 ms. The mean luminance of the display was 30 cd/m². Software written in C++ controlled stimulus presentation and recorded responses. Stimuli were Gaussian vignetted, 3.5 cpd sinusoidal luminance gratings (Gabor patches) that subtended 0.97° of visual angle. The form of the stimuli was as given by Graham (1989, p. 53). Under normal viewing conditions, Gabor patches can be encoded in around 50 ms-60 ms, even when they are backwardly masked (Liu, Wolfgang, & Smith, 2009; Smith, 2000; Smith et al., 2010; Smith & Wolfgang, 2004). To allow us to assess the time course of VSTM processing, we embedded the stimuli in dynamic noise to reduce the rate of encoding (Ratcliff & Smith, 2010; Smith et al., 2014). The noise patches were constructed by assigning to each 4 × 4 pixel area in the stimulus a luminance value sampled from a truncated Gaussian distribution with mean set equal to the background display luminance. The truncated distribution was scaled to fit within 20% of the total luminance range. Participants viewed the display from a distance of 100 cm. Viewing position was stabilized with a chinrest.
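The stimulus construction just described can be sketched in a few lines. The following is our illustrative reconstruction, not the authors' code; the pixel counts, contrast scaling, and Gaussian envelope width are assumptions rather than values taken from the article.

```python
# Sketch of the stimuli described above (our reconstruction; sizes and
# scalings are assumed). A Gabor patch is a sinusoidal grating vignetted
# by a Gaussian window; a noise patch assigns each 4 x 4 pixel block a
# luminance offset drawn from a truncated Gaussian around the background.
import numpy as np

def gabor(size_px=64, cycles=4.0, orientation_deg=0.0, sd_frac=0.15):
    """Gaussian-vignetted sinusoidal grating (contrast image in [-1, 1])."""
    x = np.linspace(-0.5, 0.5, size_px)
    X, Y = np.meshgrid(x, x)
    theta = np.deg2rad(orientation_deg)
    U = X * np.cos(theta) + Y * np.sin(theta)   # coordinate along the grating
    carrier = np.sin(2.0 * np.pi * cycles * U)
    envelope = np.exp(-(X**2 + Y**2) / (2.0 * sd_frac**2))
    return carrier * envelope

def noise_patch(size_px=64, block=4, sd=0.10, clip=0.20, rng=None):
    """Blockwise truncated-Gaussian luminance noise around the background."""
    if rng is None:
        rng = np.random.default_rng()
    nblocks = size_px // block
    offsets = np.clip(rng.normal(0.0, sd, (nblocks, nblocks)), -clip, clip)
    return np.kron(offsets, np.ones((block, block)))   # upsample to pixels

# Interleave 10-ms frames of Gabor and noise, as in the display sequence:
frames = [gabor(orientation_deg=90.0) if i % 2 == 0 else noise_patch()
          for i in range(10)]                          # 100-ms exposure
```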



Participants. Five observers, including two of the authors (DS and SL) and another member of the laboratory (VT), participated in Experiment 1. Two naive observers were recruited from outside the laboratory and remunerated at a rate of A$12 per experimental session. All observers completed a variable number of practice and calibration sessions prior to completing 14 experimental sessions. In the calibration sessions, stimulus contrast was adjusted by the experimenter to ensure that a range of performance, ranging from near-chance to near-perfect, was obtained with the smallest set size. Each session lasted approximately 35 min.

Design and procedure. A 4 (Set Size: 1, 2, 3, or 4 stimuli) × 4 (Exposure Duration: 50 ms, 100 ms, 150 ms, or 200 ms) within-subjects design was used. Each session of the experiment consisted of 384 trials, yielding 5,376 trials per observer. Trial presentation order was randomized, and all trial types were randomly interleaved; the general structure is depicted schematically in Figure 1. Each trial began with a two-second uniform field, followed by a 0.29° fixation cross for two seconds, which remained on screen for the remainder of the trial. The stimulus array was then presented by interleaving 10 ms duration frames of Gabor and noise patches. Stimuli could appear 3.5° from fixation at any of the four cardinal locations. Each location containing a stimulus was backward masked with a high-contrast checkerboard pattern mask (Smith, 2000). Fifty ms after mask onset, a report cue was presented, prompting observers to report the orientation (vertical or horizontal) of the Gabor patch at the probed location via a button-press response. Masks and report cues remained on screen until a response was made. Auditory accuracy feedback via distinctive tones was given immediately after each response.

Results and Discussion

We present the results in two parts. In this section we present the group data, focusing on the performance of the simplest, parameter-free version of the sample-size model. After describing Experiment 2, we provide a more detailed analysis of the data from both experiments at the individual observer level, in which we compare the performance of the sample-size model to a limited-capacity exponential race model.

Figure 2 shows the proportion of correct responses, P(C), averaged across observers, as a function of set size and exposure duration. A 4 (Set Size) × 4 (Exposure Duration) repeated measures ANOVA on the data revealed a significant set size effect, F(3, 12) = 31.12, MSE = .001, p < .001, ηp² = .89, reflecting poorer performance at larger set sizes. The improvement in performance with increasing exposure duration was reflected in a main effect of exposure duration, F(3, 12) = 354.15, MSE = .001, p < .001, ηp² = .99. There was also a significant Set Size × Exposure Duration interaction, F(9, 36) = 7.06, MSE = .001, p < .001, ηp² = .64, showing that the rate of improvement in performance with increased exposure was greater for smaller set sizes. The results are similar to those previously reported by Vogel, Woodman, and Luck (2006) for a change-detection task.

The solid lines in Figure 2 show the predictions of the sample-size model (Equation 3). For an unbiased observer in a two-alternative forced choice (2AFC) task, d' = 2z[P(C)], where z[.] is the standard normal deviate (z-score).² To obtain the predictions in Figure 2, we used this formula to obtain d'_1 from P_1(C), the proportion of correct responses for single-item displays. We then used Equation 2 to obtain predicted values of d'_m for m = 2, 3, 4 for each exposure duration, which we back-transformed using the formula

$$P_m(C) = \Phi\left(\frac{d'_m}{2}\right), \qquad (4)$$

to obtain the predicted proportion of correct responses. Here the symbol Φ(.) denotes the Gaussian distribution function. Figure 2 shows the data exhibit the predicted invariance of $\sum_i (d'_i)^2$. This method of obtaining predictions presupposes that performance in the single-item reference condition is measured essentially without error. Clearly, a more realistic assumption would be to treat performance in the reference condition as a latent quantity, which must be estimated from the data with some amount of error and, indeed, this assumption is needed to obtain a reasonable account of performance at the level of individual observers. At the group level, however, the parameter-free sample-size model does a fairly good job of predicting performance in displays of two, three, and four items.

Our results agree with those of Palmer (1990), who showed that the function relating sensory thresholds to display size for a line-length discrimination task had a slope of 1/2 for displays of one, two, and four items when the data were plotted in logarithmic coordinates. The predicted slope of the display size function of 1/2 in his study follows from the same square-root law that gives the invariance of $\sum_i (d'_i)^2$. Our results have replicated Palmer's with a different task (orientation discrimination vs. line-length discrimination) and, in addition, we have shown the sample-size model holds uniformly at all of the exposure durations we studied and across a range of performance from floor to ceiling.
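The prediction pipeline just described is simple enough to state in a few lines. The sketch below is our illustration (the single-item accuracy is a made-up value): it converts P_1(C) to d'_1, applies the sample-size law of Equation 2, and back-transforms through Equation 4.

```python
# Sketch of the parameter-free prediction pipeline (our illustration).
# Given single-item accuracy at one exposure duration, Equation 2 gives
# d'_m and Equation 4 back-transforms it to predicted accuracy.
from scipy.stats import norm

def predict_accuracy(p1, m):
    """Predicted P(C) for an m-item display from single-item accuracy p1."""
    d1 = 2.0 * norm.ppf(p1)        # d' = 2 z[P(C)] for an unbiased observer
    dm = d1 / (m ** 0.5)           # sample-size model, Equation 2
    return norm.cdf(dm / 2.0)      # Equation 4

p1 = 0.90                          # illustrative single-item accuracy
for m in (2, 3, 4):
    print(f"m={m}: predicted P(C) = {predict_accuracy(p1, m):.3f}")
```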

² A widely used probability model for stimulus orientation in continuous reproduction tasks assumes a von Mises distribution of stimulus variability. The von Mises distribution is a close approximation to the wrapped normal distribution, which describes the probability law of a normally distributed random variable on a circular domain. It has sometimes been argued that noise in any kind of judgment involving orientation should be modeled using a von Mises distribution because of the periodic nature of the stimulus domain, rather than the more traditional normal distribution of signal detection theory, which we have used here. We do not believe it is appropriate to use a von Mises distribution for discrimination tasks, because continuous reproduction and discrimination judgments depend on different aspects of the encoded stimulus representation and require different probability models. The representation presumably takes the form of a statistically variable pattern of activity across a set of cortical filters that serve as basis functions for orientation coding. The von Mises distribution serves as a probability model for observers' point estimates of the orientation at which this pattern of activity is maximal (e.g., van den Berg et al., 2012). In contrast, the probability model for discrimination assumes that observers compare the strength of activity in a pair of oriented detectors centered on the stimulus orientations, which sample this pattern of activity and respond according to whichever is the greater. Trial-to-trial variability in strength will depend on the entire pattern of activity across the set of basis functions, not just on the orientation of the maximum, and also on the orientation bandwidths of the detectors that sample it. As a result, point estimates of the maximum and of the strength of the activity in a pair of detectors centered on the stimulus orientations will only be indirectly related to each other. The relationship can only be characterized via a computational model that specifies how activity is distributed across basis functions and the bandwidths of the detectors that sample it. For these reasons, we believe variability in discrimination judgments is more appropriately represented using the standard Gaussian signal detection model.



Figure 1. Stimuli and display timing for Experiment 1. On each trial, a variable number of stimuli were presented at the four cardinal locations around fixation. Stimuli were then backwardly masked, and a report cue prompted the observer to report the orientation of the stimulus presented at the probed location.

Palmer (1990) attributed the sample-size capacity limitations in his data to memory limitations. Like us, he used a design in which stimuli were presented simultaneously. However, such designs are unable to distinguish capacity limitations associated with the storage of items in VSTM from capacity limitations arising at encoding. Indeed, as discussed previously, the natural interpretation of the sample-size model is that it represents an encoding capacity limitation in which the number of samples, n, is a finite resource that is proportional to the exposure duration, t. Consequently, Experiment 1 provides evidence for a sample-size limitation on information held in VSTM, but is unable to specify whether the limit is one of encoding capacity or memory resources.

To attempt to distinguish these possibilities, in Experiment 2 we used a simultaneous-sequential design (Scharff, Palmer, & Moore, 2011; Shiffrin & Gardner, 1972), in which stimuli were either presented simultaneously, as in Experiment 1, or sequentially, in random order. In the sequential presentation condition, the onset of each stimulus followed 50 ms after the offset of its predecessor so there was only one stimulus in the display at a time. If the display-size effect in Figure 2 were due to the division of encoding

resources across multiple items, then it should be abolished when stimuli are presented sequentially. The encoding-resources prediction presupposes that only filled display locations—that is, locations that actually contain stimuli—consume encoding resources and empty display locations do not. However, another possibility is that the encoding resources allocated to any location depend on the spatial extent of the attended region, irrespective of its contents, as envisaged in the zoom-lens model of attention of Eriksen and colleagues (Eriksen & St. James, 1986; Eriksen & Yeh, 1985). This second, attended-region, account can be rejected using the results of Experiment 1. In Experiment 1, the size of the attended region should have remained constant for displays of different sizes, because the ordering of conditions and the locations of the stimuli within the display were randomized from trial to trial. If the encoding resources allocated to a stimulus depend on the size of the attended region rather than on the number of items it contains, there should have been no display-size effect whatsoever in Experiment 1. The large and systematic effects in Figure 2 suffice to reject this alternative and show that the only plausible version of an encoding resources account is one in which resources depend on the number of items simultaneously present in the display. This possibility is tested by the simultaneous-sequential design of Experiment 2.

Experiment 2

Method

Figure 2. Group average data from Experiment 1 (filled symbols) along with predictions of the parameter-free sample-size model (solid lines). Error bars are 1 standard error of the mean.

Stimuli and apparatus. The stimuli and apparatus were the same as in Experiment 1, except that we tested a smaller number of exposure durations and display sizes, to allow for a comparison of simultaneous and sequential presentation.

Participants. Five observers, including two of the authors (DS and SL) and another member of the laboratory (SS), participated in Experiment 2. Two naive observers were recruited from outside the laboratory and remunerated at a rate of $12 per hr. All observers completed a variable number of practice and calibration sessions, in which the level of the stimulus contrast was set individually for each observer, as described in Experiment 1. They



then completed 18 experimental sessions, each lasting approximately 50 min.

Design and procedure. The design was a 3 (Set Size: 1, 2, or 4) × 3 (Exposure Duration: 100 ms, 150 ms, or 200 ms) × 2 (Presentation Mode: simultaneous or sequential) within-subjects design. The procedure for the simultaneous condition was identical to that in Experiment 1. For arrays comprising more than one element in the sequential condition, there was a 50 ms stimulus onset asynchrony (SOA) between the onset of the backward mask of the preceding item and the onset of the next item. Stimulus presentation in the sequential condition was otherwise identical to that in the simultaneous condition. The report cue was presented 50 ms after the onset of the backward mask for the last item in the display. Within each session, observers completed alternating blocks of simultaneous and sequential trials. There were 18 blocks of trials in a session, comprising 36 trials each, resulting in 324 simultaneous and sequential trials per session (5,832 total trials per observer).

Results and Discussion

Here we again present only an analysis of the group data, and defer consideration of the individual observer data to the following section. Figure 3 shows the proportion of correct responses as a function of display size, exposure duration, and presentation mode (simultaneous or sequential) averaged across observers. A 2 (Presentation Mode) × 3 (Set Size) × 3 (Exposure Duration) repeated measures ANOVA on the proportion correct data confirmed that performance was affected by exposure duration, F(2, 8) = 56.25, MSE = .01, p < .001, ηp² = .93, and set size, F(2, 8) = 30.21, MSE = .004, p < .001, ηp² = .88. As in Experiment 1, these factors interacted, F(4, 16) = 5.56, MSE = .001, p = .005, ηp² =


Figure 3. Group average data from Experiment 2 (symbols) along with predictions of the parameter-free sample-size model (solid line). Filled symbols are simultaneous presentation, open symbols are sequential presentation. The predictions for the simultaneous and sequential presentation conditions were generated from single-item display performance averaged across simultaneous and sequential conditions. Error bars are 1 standard error of the mean. Because performance in the two conditions was so similar, the plotting symbols have been slightly offset along the abscissa for presentation purposes.

.58, reflecting a faster rate of improvement at lower set sizes. There was a marginal effect of presentation mode, F(1, 4) = 7.31, MSE = .001, p < .06, ηp² = .65, reflecting the fact that performance in the sequential condition (P(C) = .75) was slightly worse than in the simultaneous condition (P(C) = .76). Presentation mode did not interact with either of the other variables (maximum F = 2.16, minimum p = .18). Overall, then, there is no evidence that sequential presentation of the memory array conferred any performance benefit—contrary to the predictions of the limited-capacity encoding account (see Figure 3).

We also considered whether serial position effects for sequentially presented displays might account for the lack of any sequential presentation benefit. For example, items presented earlier in a sequence have to be retained in memory for longer, relative to the simultaneous condition. Any detrimental effect of temporal decay for early items might offset potential benefits of sequential presentation. The serial position data for sequentially presented four-item displays, at each level of exposure duration, are shown in Figure 4. A 3 (Exposure Duration) × 4 (Serial Position) repeated measures ANOVA on the serial position data revealed an effect of exposure duration, F(2, 8) = 49.95, MSE = .005, p < .001, ηp² = .93, reflecting better performance at longer exposure durations, in addition to a serial position effect, F(3, 12) = 15.35, MSE = .007, p < .001, ηp² = .79. The interaction was not significant, F(6, 24) = 1.42, MSE = .004, p = .25, ηp² = .26. Contrary to a decay account of the lack of a sequential presentation benefit, at each level of exposure duration, performance on the first list item is highly accurate. A paired-samples t test, averaging across exposure durations, revealed a significant reduction, M = −.20, in accuracy across serial positions 1 and 2, t(4) = 5.02, p = .007, r² = .86. Although Figure 4 suggests some evidence of an upturn in performance for the most recently presented item, we did not find statistical support for this in our data. The mean increase in accuracy across serial positions 3 and 4 was relatively small, M = .04, and the difference nonsignificant, t(4) = 1.37, p = .24, r² = .32 (cf. Phillips & Christie, 1977; Woodman, Vogel, & Luck, 2012).

Given the primacy effect in our sequential data, we were particularly interested in whether the predictions of the sample-size model would accommodate these data. The solid lines in Figure 3 show the predictions of the parameter-free sample-size model for the simultaneous and sequential presentation conditions, obtained in the same way as for Experiment 1. Although the data for Experiment 2 are noisier than those for Experiment 1, they agree in showing that the sample-size model accounts for the display size effect at all exposure durations in both simultaneous and sequential presentation modes. This is particularly surprising because it shows that the information limit on VSTM capacity is robust against variability in the quality of item-level representations in memory. Critically, the finding of similar display-size effects with simultaneous and sequential presentation rejects the limited-capacity encoding interpretation of this model. It suggests, rather, that the locus of the effect is the information capacity of VSTM itself, as Palmer (1990) proposed. In a subsequent section, we describe models of this kind.

First, however, we compare the performance of the sample-size model, at the individual and group level, to that of a limited-capacity exponential race model. Exponential models have successfully characterized the extraction of information from brief,



Figure 4. Effect of serial position on accuracy for sequentially presented four-item displays in Experiment 2 at each level of exposure duration. Error bars are 1 standard error of the mean.

multielement arrays (Bundesen, 1987; Bundesen et al., 1984; Shibuya & Bundesen, 1988).

Comparing Signal Detection and Exponential Race Models

Like other signal detection models, the sample-size model assumes that performance depends on the strength of a noise-perturbed stimulus representation (Green & Swets, 1966; Swets, 1984), which increases with exposure duration through the action of temporal integration processes (Liu et al., 2009; Smith & Ratcliff, 2009; Smith & Sewell, 2013). We have shown above that the sample-size model provides a remarkably accurate parameter-free description of the group data from both of our experiments. However, an alternative view that is well established in the literature is that stimuli race to enter a limited-capacity system of some kind (Bundesen, 1990; Fisher, 1982; Shibuya & Bundesen, 1988). Unlike signal detection or temporal integration models, race models assume that items enter the limited-capacity system in an all-or-none way: The probability of an item being correctly identified is a function of the probability that it enters the limited-capacity system, which, in tasks using brief stimulus exposures, is often identified theoretically with VSTM (Bundesen, 1987; Bundesen et al., 1984; Rouder et al., 2008). The probability of an item entering VSTM will depend on exposure duration and may depend on display size even when the VSTM item capacity is not exceeded, because the processes that select stimuli into VSTM may themselves be limited in capacity. In this article, we limit ourselves to contrasting the broad classes of signal detection and object-based race models with minimal processing assumptions. The versions of the models considered here were chosen because of their previous success in handling data from experiments similar to ours, and also because of their simplicity and tractability.

In view of the success of race models in describing performance with brief, multielement arrays, we sought to determine whether a model of this kind could provide an account of our data. The most common and most tractable assumption is that the time taken for an item to enter VSTM is exponentially distributed (Bundesen, 1987, 1990; Bundesen et al., 1984; Townsend, 1972). That is, the selection of an item into VSTM is a Poisson event. Allowing for temporal delays associated with early perceptual processing, the probability that an item will be selected into VSTM, denoted P_memory, given exposure duration, t, can be described by a shifted exponential distribution, with rate parameter β and temporal offset t₀,

$$P_{memory} = 1 - e^{-\beta(t - t_0)}, \qquad (t - t_0) > 0. \qquad (5)$$

The temporal offset term, t₀, is needed for an exponential model to describe data from experiments using very short exposure durations, in which performance is at or near chance (e.g., Shibuya & Bundesen, 1988). The (t − t₀) in Equation 5 can be interpreted as the effective stimulus processing time, where the offset depends on the time required to complete early perceptual processing. The effective processing time therefore reflects the time available for transferring a perceptual representation of the stimulus into VSTM. The exponential rate parameter, β, characterizes the rate at which stimulus information is selected into VSTM and becomes available to a decision mechanism. In perceptually unlimited capacity models, β is independent of the number of items in the display. In such models, the only capacity limitation is the item capacity of VSTM itself. In perceptually limited capacity systems, β decreases with increases in display size. As we found display size effects in Experiments 1 and 2 with even the smallest displays (e.g., 1 vs. 2 item displays), we have focused on perceptually limited-capacity systems.
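As a concrete illustration of Equation 5, the sketch below (ours, with assumed parameter values rather than fitted ones) computes the selection probability for several exposure durations, dividing the rate by display size as in the limited-capacity variant discussed later.

```python
# Sketch of the shifted-exponential selection probability of Equation 5
# (our illustration; beta and t0 below are assumed values, not fitted
# parameters from the article).
import numpy as np

def p_memory(t_ms, beta, t0_ms):
    """Probability that an item has been selected into VSTM by time t."""
    dt = np.maximum(np.asarray(t_ms, dtype=float) - t0_ms, 0.0)
    return 1.0 - np.exp(-beta * dt)

beta1, t0 = 0.02, 30.0        # per-ms rate for one item; perceptual delay
for m in (1, 2, 4):
    p = p_memory([50, 100, 200], beta1 / m, t0)   # limited capacity: rate/m
    print(f"m={m}: P_memory at 50/100/200 ms =", np.round(p, 3))
```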



Under exponential processing assumptions, VSTM selection is an all-or-none process in which the probability of selection is the principal determinant of performance. When an item is not selected, the observer is forced to guess. For an unbiased observer in a 2AFC task, the probability of a correct guess is one half. The overall probability of a correct response, corrected for guessing, is

$$P(C) = P_{memory} + \frac{1}{2}(1 - P_{memory}). \qquad (6)$$

Rouder et al. (2008) argued that Equation 6 is unduly restrictive, because it assumes that participants pay attention on each trial. They proposed that it is more plausible to assume that people experience occasional lapses in concentration or attention, which result in no usable stimulus information on some trials. Letting α denote the proportion of attended trials, and assuming unbiased guessing on the proportion of trials where an attentional lapse occurs, the predicted proportion of correct responses is

$$P(C) = \alpha\left[P_{memory} + \frac{1}{2}(1 - P_{memory})\right] + \frac{1}{2}(1 - \alpha). \qquad (7)$$

Algebraically, Equations 6 and 7 can be rewritten as

$$P(C) = \frac{1}{2} + \frac{1}{2}P_{memory} \qquad (8)$$

and

$$P(C) = \frac{1}{2} + \frac{\alpha}{2}P_{memory}, \qquad (9)$$

respectively. Written in this way, it is evident that the effect of the attention-lapse assumption is simply to add a free parameter that describes how accuracy grows as a function of stimulus information. This endows the model with much greater flexibility than the simple exponential-selection-plus-guessing model of Equations 6 and 8, because the α parameter is not constrained by any of the physical properties of the display. This contrasts with the sample-size model, which, in its simplest form, is completely parameter free. In using Equation 7 as the equation for our race model we were seeking to ensure that the model had every opportunity of competing successfully against the sample-size model.

We considered two versions of the exponential race model, which differed in the constraints they imposed on the rate parameter, β. The more flexible unconstrained encoding model assumed separate values of β for each set size; these were estimated freely from the data, along with the t₀ and α parameters. The second model was a fixed encoding capacity model that assumed encoding rates were reduced in proportion to the number of items in the display: β_m = β₁/m. This version of the model simultaneously predicted the effects of exposure duration and set size using a single estimated value of β, with t₀ and α again as free parameters.

For model comparison purposes, we also considered two parameterized versions of the sample-size model. In one, the flexible growth version of the model, we treated d'_1 as a latent variable that we estimated from the data at each exposure duration. This model required the sample-size prediction to hold for every exposure duration but imposed no constraints on the rate of growth of d' with time. This required four free parameters to fit Experiment 1 and three to fit Experiment 2. We also considered a more constrained, linear growth, version of the model, based on the prediction that (d')² grows linearly with exposure duration. This prediction follows from the square-root law for the growth in sensitivity with sample size. As a result, it suffices to estimate the latent values of d'_1 at two exposure durations only. This model, which simultaneously tested the invariance of $\sum_i (d'_i)^2$ and the linearity of $(d')^2$, had two free parameters for both Experiments 1 and 2.

To compare performance of the parameterized sample-size and exponential race models, we fitted the models described above to the data from each individual observer, as well as the group average, by minimizing a likelihood ratio statistic, G²,

$$G^2 = 2\sum_{i=1}^{c} N_i \sum_{j=1}^{2} p_{ij} \ln\left(\frac{p_{ij}}{\pi_{ij}}\right). \qquad (10)$$

In Equation 10, the summation over i runs over the c conditions in the experiment, N_i is the number of trials in condition i, p_ij is the observed proportion of responses in outcome category j (i.e., correct or incorrect), and π_ij is the proportion predicted by the model. For Experiment 1, c = 16 and N_i = 336; for Experiment 2, c = 18 and N_i = 324. Because different versions of the sample size and exponential models varied in terms of their numbers of free parameters, we carried out model comparison using the Bayesian Information Criterion (BIC),

$$BIC = G^2 + k\ln(N), \qquad (11)$$

where k is the number of free parameters in the model, and N is the total number of observations on which G² was computed. For Experiment 1, N = 5376 and for Experiment 2, N = 5832. Like the Akaike Information Criterion (AIC = G² + 2k), the BIC combines an index of model fit with a penalty term for model complexity, which is based on the number of free parameters. The penalty term reflects the inherent tradeoff between model complexity and fit to data; more complex models tend to fit better. Unlike the AIC, the penalty in the BIC scales with the overall sample size, which makes it less likely to prefer more complex models as the sample size increases (Liu & Smith, 2009; Wagenmakers & Farrell, 2004). Because the BIC incorporates a more severe penalty term for complexity, it is useful to compare model selection outcomes under both AIC and BIC. To foreshadow our results, for both experiments, the data strongly favor the sample-size model over the exponential model regardless of whether the AIC or BIC is used for model comparison. However, at the level of individual observers, the specific version of the preferred model differs depending on which statistic is used.³
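To make the model comparison machinery concrete, the sketch below (our illustration, with made-up proportions) computes the G² statistic of Equation 10 and the BIC and AIC penalties of Equation 11.

```python
# Sketch of the fit statistics in Equations 10 and 11 (our illustration,
# with made-up proportions). p_obs and p_pred hold proportions correct per
# condition; the j = 1, 2 outcome categories are correct and incorrect.
import numpy as np

def g_squared(p_obs, p_pred, n_trials):
    """Likelihood ratio statistic G^2 summed over conditions (Equation 10)."""
    obs = np.stack([np.asarray(p_obs), 1.0 - np.asarray(p_obs)], axis=1)
    pred = np.stack([np.asarray(p_pred), 1.0 - np.asarray(p_pred)], axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(obs > 0.0, obs * np.log(obs / pred), 0.0)
    return 2.0 * np.sum(n_trials * terms.sum(axis=1))

def bic(g2, k, n_total):
    """Equation 11: BIC = G^2 + k ln(N)."""
    return g2 + k * np.log(n_total)

def aic(g2, k):
    """AIC = G^2 + 2k."""
    return g2 + 2.0 * k

# Two illustrative conditions of 336 trials each, as in Experiment 1:
g2 = g_squared([0.90, 0.80], [0.88, 0.82], n_trials=336)
print(round(g2, 2), round(bic(g2, k=2, n_total=5376), 2), round(aic(g2, k=2), 2))
```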

Experiment 1

For Experiment 1, we parameterized the sample size and exponential models as follows. For the linear growth sample-size

³ For model selection purposes, we do not consider the performance of the parameter-free sample-size model. Because that model utilizes a subset of the data to generate predictions, it is unclear what an appropriate penalty term to apply, in an AIC or BIC sense, would be. For qualitative comparison purposes with the other parameterized models, we report G² as an index of model fit for the parameter-free sample-size model in Tables 1 and 2.


model, the number of VSTM samples for a single-item array, d'_1, was estimated as a free parameter for the 50 ms and 200 ms exposure duration conditions, resulting in two free parameters. Predictions for the intermediate exposure durations were made by interpolating linearly between these two points in (d')² space in accordance with Equation 2. For the flexible growth version of the sample-size model, d'_1 was estimated for each level of exposure duration, resulting in four free parameters. For each model, we estimated d' from the proportion of correct responses, P(C), in single-item displays and used Equation 4 to predict performance in displays of two, three, and four items. For both versions of the exponential model, we estimated the perceptual encoding time, t₀, and the probability of attending to the trial, α, from the data. For the fixed encoding rate exponential model, the sum of the encoding rates, β, was estimated as a third free parameter from the data (i.e., β_m = β/m). For the unconstrained encoding model, separate encoding rates were estimated for each level of set size, resulting in six free parameters.

Figure 5 shows the fits of the flexible growth and the linear growth versions of the sample-size model to the group data from Experiment 1, together with the fits of the unconstrained encoding and fixed-capacity exponential race models. Fit statistics for each model (G², BIC, and AIC) are shown in Table 1. Best fitting parameter estimates for the models at both the individual and group average levels are presented in the Appendix (Tables A1 and A2). It is clear from the figure that the sample-size model and the exponential model both provide good accounts of the general patterns in the data. Specifically, the models correctly predict the effects of set size and exposure duration. For four out of five observers (observer JH being the exception) a version of the sample-size model provided, in a BIC sense, a better account of the data than either of the exponential models (see Table 1).
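The linear interpolation step is easy to state explicitly. The sketch below is our illustration; the two latent d'_1 values are hypothetical stand-ins for the free parameters that would be estimated at 50 ms and 200 ms.

```python
# Sketch of the linear growth constraint (our illustration; the two latent
# d'_1 values are assumed, not estimates from the article). (d')^2 grows
# linearly with exposure duration, so intermediate durations are obtained
# by interpolating in (d')^2 space.
import numpy as np

d50, d200 = 0.8, 2.2                      # hypothetical latent d'_1 values
durations = np.array([50.0, 100.0, 150.0, 200.0])
dsq = np.interp(durations, [50.0, 200.0], [d50**2, d200**2])
d1 = np.sqrt(dsq)                          # implied d'_1 at each duration
print({int(t): round(v, 3) for t, v in zip(durations, d1)})
```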


Although model comparison using the AIC identifies a more diverse set of preferred models across observers—the flexible growth version of the sample-size model provides the best account of the data of observers DS and HW, whereas the linear growth sample-size model provides the best account of observer VT's data—for three out of five observers, a version of the sample-size model is preferred (observers JH and SL being exceptions). For the group data, regardless of whether AIC or BIC is used, the linear growth sample-size model provides the best fit to data.

As discussed above, the least constrained versions of the sample size and exponential models differ in the relative flexibility with which they address set size and exposure duration effects. Whereas the sample-size model is highly constrained in the set size domain, it is relatively unrestricted in the temporal domain. By contrast, the exponential model has greater flexibility in the set size domain, but is more constrained in the temporal domain—although the temporal offset parameter, t₀, endows the model with much greater freedom than it would otherwise have. It is therefore important to contrast the performance of the models under strong constraints in both the temporal and set size domains. For this comparison, we contrasted the fits of the two-parameter linear growth sample-size model with the three-parameter fixed encoding capacity exponential model. For four out of five observers (observer JH again being the exception), in addition to the group average data, the two-parameter sample-size model provided a superior fit in a BIC sense, compared with the three-parameter exponential model (mean ΔBIC = 9.79).


Figure 5. Fits of the parameterized sample-size models (top panels: linear interpolation [linear growth] and flexible growth versions) and exponential models (bottom panels: fixed encoding capacity and unconstrained encoding versions) to the data from Experiment 1.


Table 1
Fit Statistics (G², BIC, and AIC) for Different Versions of the Sample-Size and Exponential Models in Experiment 1

            SS0p      SS2p                       SS4p                       Exp3p                       Exp6p
Observer    G²        G²      BIC      AIC       G²      BIC      AIC       G²      BIC       AIC       G²      BIC      AIC
DS          67.13     69.24   86.42    73.24     35.85   70.21*   43.85     77.79   103.56    83.79     46.50   98.04    58.50
HW          30.85     21.17   38.35*   25.17     14.98   49.34    22.98     23.32   49.09     29.32     14.97   66.51    26.97
JH          106.42    31.33   48.51    35.33     30.85   65.21    38.85     22.39   48.15*    28.39     20.81   72.35    32.81
SL          38.80     33.34   50.52*   37.34     26.44   60.80    34.44     33.21   58.97     39.21     17.88   69.42    29.88
VT          29.64     21.94   39.12*   25.94     18.73   53.09    26.73     26.34   52.11     32.34     18.44   69.98    30.44
Group       7.50      8.46    25.64*   12.46     4.54    38.90    12.54     19.72   45.49     25.72     11.54   63.08    23.54

Note. For each observer, the lowest BIC is marked with an asterisk. For the parameter free sample-size model, only G² is shown (see text for details). BIC = Bayesian Information Criterion; AIC = Akaike Information Criterion; SS0p = parameter free sample-size model; SS2p = two-parameter linear growth sample-size model; SS4p = four-parameter flexible growth sample-size model; Exp3p = three-parameter fixed encoding capacity exponential model; Exp6p = six-parameter unconstrained encoding exponential model.

Experiment 2

We fitted the sample-size and exponential race models to the individual observer and group data from Experiment 2, using a procedure similar to that used for Experiment 1. Because the absence of any benefit of sequential array presentation immediately discounts an encoding capacity interpretation of the sample-size model, we interpreted the display size effects predicted by the model as a VSTM information capacity limit, arising from a limit on the number of stimulus samples available to represent the set of stored items. Accordingly, for the flexible growth and linear growth versions of the model, we constrained the model parameters to be the same in the simultaneous and sequential presentation conditions. In the case of the exponential model, the absence of a sequential presentation advantage is strong evidence against a limited-capacity perceptual encoding process. Nevertheless, we investigated whether the model could account for the data from the simultaneous condition. Because sequential presentation involved a fixed processing load of one item, regardless of overall display size, we estimated a single encoding rate parameter that was applied to all set sizes in the sequential condition. For the fixed encoding capacity model, we constrained the encoding rates to sum to a constant and estimated a single, separate encoding rate parameter for all set sizes under sequential presentation, as in the sketch below.
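A minimal sketch of this rate-constraint structure (function names and values are illustrative, not the authors' implementation):

```python
def encoding_rates(m, beta_total, beta_seq, sequential):
    """Per-item encoding rates in the fixed encoding capacity exponential model.
    Simultaneous: m items divide a fixed total rate, so beta_m = beta / m.
    Sequential: items are processed one at a time, so a single rate applies
    to every set size."""
    if sequential:
        return [beta_seq] * m
    return [beta_total / m] * m

# Simultaneous rates fall with set size; sequential rates do not.
print(encoding_rates(4, beta_total=8.0, beta_seq=8.0, sequential=False))  # [2.0, 2.0, 2.0, 2.0]
print(encoding_rates(4, beta_total=8.0, beta_seq=8.0, sequential=True))   # [8.0, 8.0, 8.0, 8.0]
```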

Fits of the parameterized sample-size and exponential race models to the data from Experiment 2 are shown in Figure 6. The corresponding fit statistics (G², BIC, and AIC) are shown in Table 2. Best-fitting parameter estimates for individual observers and the group data are presented in the Appendix (Tables A3 and A4). As in Experiment 1, the sample-size model provides a very good fit to the data. It was able to fit the relatively small differences in performance across the simultaneous and sequential conditions by assuming invariance of performance in the two conditions.

Figure 6. Fits of the parameterized sample-size model (top panels: linear growth and flexible growth versions) and exponential model (bottom panels: fixed encoding capacity and unconstrained encoding versions) to the data from Experiment 2. Filled symbols denote data from simultaneous presentation conditions; open symbols, sequential presentation. Symbols have been slightly offset along the abscissa for presentation purposes.


Table 2
Fit Statistics (G², BIC, and AIC) for Different Versions of the Sample-Size and Exponential Models in Experiment 2

            SS0p      SS2p                       SS4p                       Exp3p                        Exp6p
Observer    G²        G²      BIC      AIC       G²      BIC      AIC       G²       BIC      AIC        G²       BIC      AIC
DS          56.16     41.05   58.39    45.05     31.67   57.69*   37.67     42.68    77.37    50.68      26.02    78.05    38.02
MP          51.77     45.37   62.71*   49.37     40.36   66.37    46.36     123.24   157.93   131.24     117.18   169.21   129.18
RGD         28.14     29.43   46.77*   33.43     25.62   51.63    31.62     95.25    129.94   103.25     77.46    129.49   89.46
SL          28.10     27.31   44.65*   31.31     20.00   46.01    26.00     45.16    79.85    53.16      32.11    84.14    44.11
SS          38.82     24.70   42.04*   28.70     21.68   47.69    27.68     161.77   196.46   169.77     148.14   200.17   160.14
Group       3.98      9.08    26.42*   13.08     3.79    29.80    9.79      70.21    104.90   78.21      62.35    114.37   74.35

Note. For each observer, the lowest BIC is marked with an asterisk. For the parameter free sample-size model, only G² is shown (see text for details). BIC = Bayesian Information Criterion; AIC = Akaike Information Criterion; SS0p = parameter free sample-size model; SS2p = two-parameter linear growth sample-size model; SS4p = flexible growth sample-size model; Exp3p = fixed encoding capacity exponential model; Exp6p = unconstrained encoding exponential model. The column labels follow Experiment 1; in Experiment 2 the flexible growth sample-size model had three free parameters and the fixed encoding capacity exponential model had four (see text).

In contrast, the exponential model, although it provides a reasonable characterization of performance in the simultaneous condition, is unable to characterize the set size effects in the sequential condition. The model comparison results for Experiment 2 are therefore unambiguous. For all five observers, as well as the group data, a version of the sample-size model is preferred over either version of the exponential model. The BIC favors the linear growth model for all observers and the group average, apart from observer DS; under the AIC, the flexible growth version of the sample-size model is preferred for all observers, including the group average.

Turning to the versions of the sample-size and exponential models that were constrained in both the temporal and set size domains, we contrasted the fits of the two-parameter linear growth sample-size model with the four-parameter fixed encoding capacity exponential model. For all five observers and for the group average data, the two-parameter sample-size model provided a better fit in a BIC sense compared with the four-parameter exponential model (mean ΔBIC = 77.40).

In summary, then, our comparison of models for individual observer performance consistently favored the sample-size model over the exponential encoding model. These results are in addition to our more general conclusion that limited-capacity VSTM encoding models cannot account for the results of Experiment 2, because these models predict no display size effects with sequential presentation.
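The reported mean ΔBIC can be recovered directly from the BIC values in Table 2:

```python
# BIC(fixed encoding capacity exponential) - BIC(linear growth sample-size),
# for the five observers of Experiment 2 (values from Table 2).
delta_bic = [
    77.37 - 58.39,    # DS
    157.93 - 62.71,   # MP
    129.94 - 46.77,   # RGD
    79.85 - 44.65,    # SL
    196.46 - 42.04,   # SS
]
print(round(sum(delta_bic) / len(delta_bic), 2))  # 77.4, as reported
```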

A Computational Model of VSTM Item and Information Capacity

In this section we describe a computational model of VSTM capacity based on the competitive interaction theory of Smith and Sewell (2013), which predicts the sample-size effects we have observed experimentally. The competitive interaction theory generalizes the integrated system theory of attention and decision making of Smith and Ratcliff (2009). Whereas the integrated system theory was developed to account for performance in visual attention tasks in which a single stimulus is presented in an otherwise empty display, Smith and Sewell's theory was developed to account for tasks in which targets are presented among distractors. Both theories are formulated as continuous-time dynamic models that comprise submodels of perceptual encoding, VSTM, attention, and decision making. In each model, the role of VSTM is to preserve a durable representation of a target stimulus while it is being identified by a decision process. Smith and Sewell's model ascribes sample-size effects to the information capacity of VSTM itself, which is characterized as a trace strength normalization property. The version of the VSTM model reported here has both item and information capacity limits, so its predictions for large display sizes agree with those of slot models, in which observers are forced to guess when the item capacity of VSTM is exceeded. Although we have found the inclusion of an item limit to be a useful theoretical property (see Smith & Sewell, 2013, for details), it is not required to address the effects in our data.

The main elements of Smith and Sewell's (2013) model are illustrated in Figure 7. The model assumes that VSTM selection occurs via a process of competitive interaction. This process is similar to that envisaged in the biased competition model of Desimone and Duncan (1995), but is formulated in an explicit, quantitative way. Mathematically, the dynamics of selection and VSTM trace formation are described by coupled systems of competitively interacting shunting differential equations (Grossberg, 1980, 1987a, 1987b, 1988). These equations describe the time course of VSTM formation and how the contents of VSTM depend jointly on the observer's attentional set and on the nature and number of stimuli in the display. The process of VSTM trace formation occurs via a pair of interacting pathways that Smith and Ratcliff (2009) called energy and form pathways and Smith and Sewell (2013) called where and what pathways.

Contrast energy in the where pathway signals the presence of target-like stimuli at particular locations in the display. The activity in the where pathway gates the stimulus information carried by the what pathway at the selected locations into VSTM. Selection is assumed to be object-based (Duncan, 1984): Perceptual objects that have target features are selected into VSTM. Only those stimuli that are selected by the where pathway compete to enter VSTM; the remaining stimuli are rejected preattentively. Competitively interacting shunting equations exhibit a normalization property (Foley, Grossberg, & Mingolla, 2012; Grossberg, 1980, 1987a, 1987b, 1988; Smith et al., 2010) in which the strengths of the representations of individual items are reduced in proportion to the number of items in the memory system. This is the basis of the model's ability to reproduce the predictions of the sample-size model and, moreover, to predict VSTM resource allocation dynamics in the simultaneous-sequential paradigm, which the encoding capacity interpretation of the sample-size model cannot.


Figure 7. Overview of the competitive interaction model of Smith and Sewell (2013). Upper panel: what pathway and VSTM, with attention gain; lower panel: where pathway, with attention gain.

The normalization property of the shunting equations used to represent VSTM in the model endows it with properties similar to those of the normalization models of attention that have been proposed by Reynolds and Heeger (2009) and Lee and Maunsell (2009). Unlike normalization models, however, which predict only the asymptotic strengths of stimulus representations, Smith and Sewell's (2013) competitive interaction model also predicts their time courses.

VSTM trace formation in Smith and Sewell's (2013) model is described by three systems of shunting equations, which describe, respectively, activity in the where pathway, the what pathway, and the VSTM trace itself. These systems are depicted in Figure 7; the mathematical details are summarized in the Appendix. The nodes in the pathways denote stimuli; Figure 7 depicts a display with two stimuli, but the model applies to displays of arbitrary size. Nodes in the where pathway are denoted υᵢ (Greek upsilon), i = 1, 2, …, m; nodes in the what pathway are denoted νᵢ (Greek nu); and nodes in the VSTM trace are denoted ωᵢ (Greek omega). The activities of the nodes are continuous functions of time, as shown in Equations A4, A6, and A7, but, for simplicity, the time dependency is omitted from the notation in the figure.

As shown in the lower panel of Figure 7, nodes in the where pathway are excited by the contrast energy in the set of m stimuli, I²_{υ,i}, i = 1, 2, …, m. In this notation, I_{υ,i} denotes the intensity, or amplitude, of the stimulus at location i, which is a function of its visually transduced contrast, and I²_{υ,i} is its squared amplitude, or power. The stimulus intensities are written with a second subscript, υ, denoting the pathway, to emphasize that the information encoded by the where and what pathways may be different. The time course of perceptual encoding is described by a sensory response function, μᵢ(t) (Equation A1). Mathematically, the sensory response is obtained by putting a brief, pulsed stimulus of duration d through a linear filter consisting of a series of cascaded exponential stages (Equation A2). Linear filter models of visual temporal sensitivity are commonly used in visual psychophysics to describe the temporal smoothing effects of early visual processing (Busey & Loftus, 1994; Watson, 1986). The temporal properties of the sensory response are controlled by separate onset and offset parameters, the relative values of which vary depending on whether stimuli are backwardly masked: Unmasked stimuli are subject to relatively slow iconic decay; masked stimuli are rapidly suppressed by the mask (see Equations A1 and A2 for details).

The activity in the where pathway depends jointly on the contrast energy in the stimulus and on the observer's cognitive representation of the target stimuli. In general, I²_{υ,i} can be thought of as the energy in a saliency map (Itti & Koch, 2001; Li, 2002), in which large values of I²_{υ,i} provide strong evidence for a target stimulus at location i. Nodes in the where pathway are excited by external stimuli; they mutually inhibit one another and they self-excite via nonlinear feedback, via the feedback function, f(υᵢ). When there is only a single stimulus in the display at location i, activity in the display at that location will drive node υᵢ to saturation; that is, the stimulus at location i will be maximally selected.

When there is more than one stimulus in the display, the dynamics become more complex. The VSTM selection performed by the where pathway then depends on the form of the feedback function (Grossberg, 1980, 1987a, 1987b, 1988). When the feedback function is increasing and positively accelerating, like a power function, the where pathway selects the strongest, or most target-like, stimulus in the display and suppresses all other stimuli. Under these circumstances, the model reproduces the predictions of the signal detection MAX model (Smith & Sewell, 2013; see also Appendix).
When the feedback function is positively and then negatively accelerating, like a sigmoid, all stimuli above a certain threshold are selected and all stimuli below that threshold are suppressed.
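The dependence of selection on the shape of the feedback function can be illustrated with a discrete-time caricature of recurrent competition (apply the feedback function elementwise, then renormalize). This toy iteration is not the model's continuous-time shunting dynamics (Equations A4-A7), and the sigmoid's semisaturation constant is an arbitrary choice:

```python
import numpy as np

def compete(x, f, n_iter=60):
    """Toy recurrent competition: elementwise nonlinear feedback followed
    by divisive renormalization (a stand-in for lateral inhibition)."""
    for _ in range(n_iter):
        y = f(x)
        x = y / y.sum()
    return x

x0 = np.array([.53, .51, .49, .47])
x0 = x0 / x0.sum()

power   = lambda w: w ** 2                        # faster-than-linear feedback
sigmoid = lambda w: w ** 2 / (0.1 ** 2 + w ** 2)  # saturating, sigmoid-like feedback

print(compete(x0, power).round(3))    # -> [1. 0. 0. 0.]: winner-take-all (search mode)
print(compete(x0, sigmoid).round(3))  # -> [0.25 0.25 0.25 0.25]: all retained (acquisition mode)
```

Faster-than-linear feedback amplifies ratios between activities on every pass, so the strongest stimulus absorbs all of the activity; saturating feedback compresses ratios toward unity, so every suprathreshold stimulus survives.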


In the competitive interaction model, the activity in the where pathway is modulated by spatial attention, via the attentional gain, γᵢ. The values of γᵢ are points on a spatiotemporal attention weight function, similar to that proposed in the episodic theory of attention of Sperling and Weichselgartner (1995). Attentional modulation of the where pathway means that attended stimuli are more strongly selected and are more rapidly transferred to VSTM, as proposed by Smith and colleagues (Liu et al., 2009; Smith, 2000; Smith & Wolfgang, 2004, 2007; Smith, Wolfgang, & Sinclair, 2004) and Carrasco and McElree (2001). The where pathway does not carry the information used to make perceptual decisions; rather, it gates the information carried by the what pathway into VSTM.

As shown in the upper panel of Figure 7, nodes in the what pathway, denoted νᵢ, are excited by external stimuli. The model assumes that the growth in activity of the nodes in the what pathway depends on the contrast energy in the stimuli, which is proportional to the time-integrated squared amplitude, denoted I²_{ν,i}. As discussed by Smith and Sewell (2013), the idea that the what pathway responds to contrast energy follows from the assumption that stimulus information in the pathway is encoded perceptually by matched filters, which are optimum linear filters for detecting targets with known spatiotemporal characteristics (Cooper & McGillem, 1986, p. 345).

As shown in the upper panel of Figure 7, stimuli encoded by the what pathway compete to enter VSTM. The strength of the VSTM trace for the stimulus at location i is denoted ωᵢ. VSTM in the competitive interaction model has both an item capacity and an information capacity, which is realized mathematically using a combination of additive and multiplicative inhibition (Equation A7). Multiplicative inhibition, acting in concert with threshold nonlinearities, endows the system with an item capacity, which we refer to as the K-capacity, following Shibuya and Bundesen (1988). The K-capacity expresses the idea that the unit of attentional selection is the perceptual object (Duncan, 1984; Egly, Driver, & Rafal, 1994; Moore, Yantis, & Vaughan, 1998) and that the capacity of VSTM is, at least in part, item-based (Cowan, 2001; Luck & Vogel, 1997; Rouder et al., 2008). We include discussion of the K-capacity here to provide a complete overview of the competitive interaction theory, and to show how it is possible for item limits to coexist with information limits within a unified computational framework. As we noted previously, however, the K-capacity is not needed to account for our results because our display sizes did not exceed the theoretical item capacity limits in the model.

The VSTM equations in the competitive interaction model also have an additive inhibition component, which means that representations in VSTM compete with one another, as depicted by the arrow connecting the nodes ωᵢ in Figure 7. The normalization properties of competitively interacting systems of shunting equations mean that VSTM trace strength is reduced in proportion to the number of items in memory. The nodes in VSTM are excited by activity in the what pathway, νᵢ. The assumption that activity in the what pathway is encoded by matched filters, whose outputs are proportional to stimulus energy, means that the VSTM traces are also a function of stimulus contrast energy. As a result, VSTM normalizes the squared amplitudes rather than the amplitudes themselves. The normalization properties of VSTM predict the invariance of Σᵢ(dᵢ′)² we have found experimentally, under the assumption that d′ is proportional to the VSTM trace strength of the probed item.⁴
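The normalization property at the heart of this account can be illustrated with a deliberately simplified dynamical sketch. Here trace energies (squared strengths) grow toward a shared capacity θ², so that Σωᵢ² → θ² and, for m matched items, ωᵢ(∞) → √(θ²/m). This one-equation toy stands in for, and is far simpler than, the full shunting system of Equation A7; the rate constant is arbitrary:

```python
import numpy as np

def vstm_traces(m, theta=0.5, rate=20.0, dt=1e-3, duration=1.0):
    """Euler-integrate a toy normalization system on the energy scale:
    d(omega_i^2)/dt = rate * (theta^2 - sum_j omega_j^2) for each of m items,
    so the shared inhibition term caps total trace energy at theta^2."""
    x = np.zeros(m)                      # x_i = omega_i^2 (trace energy)
    for _ in range(int(duration / dt)):
        x += dt * rate * (theta ** 2 - x.sum())
    return np.sqrt(x)                    # trace strengths omega_i

for m in (1, 2, 3, 4):
    print(m, vstm_traces(m).round(2))    # -> 0.50, [0.35 ...], [0.29 ...], [0.25 ...]
```

With d′ proportional to ω, these asymptotes match the values √(0.5²/m) quoted below for Figure 8, and the conserved total energy Σωᵢ² = θ² is what surfaces behaviorally as the invariance of Σᵢ(dᵢ′)².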


As shown in Figure 7, the model assumes that item strengths in VSTM are weighted by attention. The effect of attention on VSTM trace strength is separate from its effect on selection strength, via modulation of where pathway activity. The dual locus of attention in the model allows it to account for attentional effects when targets are presented either in isolation or among distractors and is consistent with physiological evidence showing that attention can act at multiple points in the processing system (Buffalo, Fries, Landman, Liang, & Desimone, 2011).

Figure 8 shows the time course of activity in the where pathway, the what pathway, and the VSTM trace strength for display sizes of m = 1, 2, 3, 4 target items, presented for 500 ms. We used long exposure durations to generate these predictions to show the model's performance when exposures are sufficient to allow VSTM trace strength to reach its asymptotic maximum. The predictions were obtained by setting the mean of Iᵢ, the intensity of the target stimuli, arbitrarily to 0.5. We use this value for illustrative purposes only; none of the main properties of the model are altered by assuming different values. The predictions are for a noiseless display; Smith and Sewell (2013) showed that the model is robust to the effects of added noise. To allow the activities associated with the stimuli in the figure to be distinguished visually, the actual set of intensities used in the simulation was Iᵢ = {.53, .51, .49, .47}. We assumed that the stimulus intensities in the where and what pathways were the same, so we have written Iᵢ without a second subscript denoting the pathway. The other model parameters used to generate the predictions in the figure are listed in Table A5 in the Appendix.

To generate the predictions in the figure, we assumed a sigmoid feedback function, f(υᵢ), in the where pathway and set the item capacity (K-capacity) of VSTM to four items. With these settings, the system attempts to transfer into VSTM as many of the items in the display as exceed the selection threshold. If the number of above-threshold items exceeds the K-capacity, the K strongest items are transferred to VSTM while the weaker items are excluded. When the feedback function is quadratic, the system selects and forms a VSTM representation of only the largest, or most target-like, stimulus in the display, regardless of assumptions about item capacity limits. Smith and Sewell (2013) termed the two modes of information acquisition in the model, which are determined by the shape of the feedback function, acquisition mode and search mode. The feedback function describes the way in which the top-down allocation of resources interacts with bottom-up stimulus-driven processes to determine which stimuli are selected into VSTM and made available to the decision process.

As shown in the top left-hand panel of Figure 8, when there is only a single stimulus in the display, m = 1, the stimulus at that location is strongly selected by the where pathway.

⁴ Smith and Sewell (2013, p. 616) discussed an alternative neural coding model in which stimuli are represented in the activity of excitatory-inhibitory pairs of Poisson shot noise processes (cf. Smith, 2010; Smith & McKenzie, 2011). Like the computational model outlined in the main text, this model naturally predicts the additivity of (d′)², owing to the mean and variance of a Poisson process being equal to the Poisson rate. However, because both the mean and variance of the underlying Poisson processes scale with object load, normalization would be assumed to occur at the level of the excitatory-inhibitory shot noise rates, rather than squared trace strength, as it is in the model discussed in the main text.


Figure 8. Where pathway activity, what pathway activity, and VSTM trace strength predicted by the competitive interaction model of Smith and Sewell (2013) for simultaneously presented stimuli in displays of one, two, three, or four items (m = 1 to m = 4; mean asymptotic trace strengths 0.50, 0.35, 0.29, and 0.25, respectively). The stimulus intensities used to generate the predictions were Iᵢ = {.53, .51, .49, .47}.

The activation in the pathway, υᵢ(t), increases to a maximum that depends on the relative magnitudes of the feedback function, f(υᵢ), and the pathway decay parameter (see Equation A4). Activity in the where pathway gates stimulus information at the target location into VSTM. The what pathway forms a veridical representation of Iᵢ, the intensity of the target stimulus: asymptotically, νᵢ(t) → .53, which is the value of Iᵢ used in the simulation. More generally, the asymptotic strength of activation in the what pathway depends on the perceptual encoding dynamics of the stimulus, determined by Equations A1 and A2. When stimulus exposure durations are long, perceptual processing is able to run to completion and the activation in the what pathway has time to reach its theoretical maximum of Iᵢ. If perceptual processing is truncated, for example, by presentation of a backward mask, the asymptotic activation would be reduced. Because in Figure 8 there is only a single stimulus in the display, there is also no competition for VSTM capacity, and a veridical VSTM representation of the stimulus is formed from the information carried by the what pathway: ωᵢ(∞) → νᵢ(∞) ≈ 0.5.


When the system is in acquisition mode, it attempts to select into VSTM as many stimuli possessing target features as possible. When there are two or more targets in the display, the where pathway attempts to select all of them and the what pathway forms representations of each target. Asymptotic where pathway activity is reduced by selection competition, but this has a comparatively small effect on the gated what pathway activity when exposure duration is long, as shown in the panels in the middle column of Figure 8. For all display sizes, the what pathway forms a veridical representation of the stimulus intensity at location i. The principal effect of larger displays is on VSTM trace strength, ωᵢ(t), shown in the panels on the right. Competition among stimuli entering VSTM leads to normalization of VSTM trace strengths (Equation A7). When there are two stimuli in the display, the mean asymptotic trace strength is √(0.5²/2) ≈ 0.35; when there are three stimuli it is √(0.5²/3) ≈ 0.29; and when there are four it is √(0.5²/4) = 0.25. That is, the model reproduces the predictions of the sample-size model for displays up to the item capacity of VSTM. The predictions of the competitive interaction model do not depend on sampling assumptions as they do for the sample-size model; instead, they arise from the normalization properties of the VSTM model. Because the VSTM growth equations operate on squared-amplitude, energy representations of stimulus intensities, they normalize the sum of the squared trace strengths, not the trace strengths themselves. Consequently, the model predicts Σᵢ(dᵢ′)² = c and not Σᵢdᵢ′ = c.

The main aim of Smith and Sewell's (2013) competitive interaction theory was to provide a general framework for unifying the results of a variety of attention and VSTM paradigms rather than a model for fitting data. Nevertheless, the theory is able to make specific quantitative predictions of performance in our experimental task, under the assumption that d′ is proportional to asymptotic VSTM trace strength, ω(∞). This proportionality follows most simply and directly from a constant variance signal detection model, but the model can be extended to obtain predictions for both response time and accuracy using a diffusion model, as discussed by Smith and Sewell (2013).⁵ The predictions of the theory depend on the values of some 14 to 17 numerical constants, which are listed in Table A5, and which potentially can be treated as parameters of the model; but the only values that need to be adjusted to predict performance in our paradigm quantitatively are those that specify the growth of accuracy as a function of exposure duration.

Figure 9a shows the overall proportions of correct responses, P(C), for Experiment 1 as a function of exposure duration and set size, together with the proportions predicted by the competitive interaction model. The parameters that were adjusted to generate these predictions were the onset and offset rates, β_on and β_off, which determine the perceptual processing dynamics of masked stimuli; the attention gain, γ, which was assumed to be the same for all locations and all display sizes; the precision, or dispersion, of the sigmoid feedback function; and a scaling parameter, θ, that maps VSTM trace strength to d′. The numerical values of the parameters used to generate these predictions are described in Table A5 of the Appendix. Figure 9a shows that, with an appropriate choice of parameter values, the theory predicts the combined effects of exposure duration and set size found in Experiment 1.


Specifically, it predicts sample-size effects for each of the four exposure durations. The theoretical significance of this result is that it provides a distributed, computational framework that reproduces the predictions of the sample-size model without invoking sampling assumptions.

As well as the invariance of Σᵢ(dᵢ′)², the second distinctive prediction made by the sample-size model is that (d′)² grows linearly with exposure duration. This prediction follows from the fact that d′ in that model is proportional to √(n/m), the square root of the total number of samples divided by the number of stimuli in the display, where the total number of samples is proportional to exposure duration. We did not investigate the effects of very long exposures in our experiments because of the difficulty of characterizing d′ reliably at high accuracies and because we regarded it as a priori unlikely that d′ would continue to increase with exposure indefinitely, as this would require a process with an unlimited integration time.

The fits of our individual observer data in the previous section were equivocal in their support for the temporal integration properties of the sample-size model, although they strongly supported its display-size predictions. Depending on whether model selection was carried out using AIC or BIC, the best model for a given individual was typically either the linear growth model, which assumes linearity of (d′)², or the flexible growth model, which does not. The latter model assumes that the predictions of the sample-size model hold at every exposure duration, but does not impose any constraints on how sensitivity increases with exposure. Nevertheless, in Experiment 1, the predicted linearity of (d′)² holds fairly well, particularly at the group level. Figure 9b shows values of (d′)² computed from the group data of Figure 9a, together with best-fitting straight lines. The average correlation between exposure duration and (d′)² in the figure is r = .94. Figure 9c shows the squared asymptotic VSTM traces, ω²(∞), that give rise to the predicted proportions of correct responses in Figure 9a. Like the plot of (d′)² values in Figure 9b, the plot of ω²(∞) is fairly linear: The average correlation between exposure duration and ω²(∞) in the figure is r = .95. As discussed by Smith and Sewell (2013), if the rate at which stimulus information is extracted by the what pathway is comparatively slow relative to the rate at which stimuli are selected by the where pathway, the model predicts approximately linear temporal integration functions over a fairly wide range of exposure durations. In sum, then, the competitive interaction theory successfully reproduces both the display size and the exposure duration predictions of the sample-size model found in our data.

One of the most striking features of the competitive interaction theory is its predictions for sequential stimulus presentation. As we discussed previously, the conventional stimulus sampling interpretation of the sample-size model, which posits a limit on encoding capacity, does not predict display size effects under sequential presentation.

⁵ The proportionality also obtains under the assumption that encoding variability scales with mean trace strength, as it would under a Poisson-like coding scheme. However, it is unclear whether such an account, when combined with a diffusion model, would successfully account for the shapes of correct and error reaction time (RT) distributions, along with their relative orderings, without further assumptions to normalize the variance.

Figure 9. A: P(C) data for Experiment 1 (symbols) and competitive interaction theory predictions (solid lines). B: (d′)² data for Experiment 1 (symbols) and best-fitting straight lines (solid lines). C: VSTM trace strength predictions, ω²(t), from competitive interaction theory (symbols) and best-fitting straight lines (solid lines).

Figure 10 compares the predicted performance of the competitive interaction theory for a four-item display in which items are presented either simultaneously or sequentially, for 200 ms per item. The predictions for simultaneous presentation in the three top panels essentially reproduce those for an unlimited-exposure four-item display in Figure 8. As in Figure 8, when the feedback function is sigmoid, all target-like stimuli in the display are selected by the where pathway. The what pathway forms veridical representations of all of the targets and attempts to transfer them to VSTM. Because the number of targets in the display does not exceed the item capacity of VSTM, all stimuli are transferred to VSTM, but the trace strengths are normalized (approximately) to ωᵢ(∞) = √(0.5²/4) = 0.25.

The bottom three panels of Figure 10 show the effects of sequential presentation. The panel on the lower left shows that there are some costs in selection efficiency as subsequent items are added to the display. Once an item is selected, its where pathway representation remains active until it decays (Equation A4), which occurs comparatively slowly. Consequently, the representations of early items compete with those of later items.

The effect of competition is to progressively reduce the strength of where pathway activity for later items. This gives the set of activation functions the scalloped appearance they have in the figure. The progressive reduction in where pathway activation translates into a decrease in what pathway activation, but, as a comparison of the middle panels shows, the effects are comparatively small: Asymptotic what pathway activation with sequential presentation is virtually indistinguishable from simultaneous presentation. In either instance, the what pathway forms approximately veridical representations of the stimulus intensities, νᵢ(∞) ≈ 0.5. The most interesting property of the model is illustrated in the panel on the lower right, which shows the VSTM trace strengths. As new items are added to the memory, VSTM capacity is dynamically reallocated. Initially, VSTM forms a veridical representation of the first presented item, ω₁ = 0.5. When a second item is presented, the available capacity is reallocated equally between the two items, ωᵢ = √(0.5²/2) ≈ .35, i = 1, 2. Reallocation again occurs after the third item, ωᵢ = √(0.5²/3) ≈ .29, and after the fourth, ωᵢ = √(0.5²/4) = .25.
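The dynamic reallocation just described can be caricatured by letting each active trace relax toward the current normalized target θ²/m(t), where m(t) counts the items presented so far. The explicit normalization target here stands in for the emergent competition of the shunting equations, and the onset times and relaxation rate are illustrative:

```python
import numpy as np

def sequential_traces(onsets, theta=0.5, gain=20.0, dt=1e-3, duration=1.0):
    """Toy reallocation dynamics: each presented item's trace energy
    relaxes toward theta^2 / m(t), with m(t) the number of items shown
    so far, producing the scalloped trace strengths of Figure 10."""
    x = np.zeros(len(onsets))                 # trace energies omega_i^2
    for step in range(int(duration / dt)):
        t = step * dt
        active = np.asarray(onsets) <= t      # items presented by time t
        m_t = max(int(active.sum()), 1)
        x[active] += dt * gain * (theta ** 2 / m_t - x[active])
    return np.sqrt(x)

# Four items presented sequentially at 200 ms intervals:
print(sequential_traces([0.0, 0.2, 0.4, 0.6]).round(2))  # -> [0.25 0.25 0.25 0.25]
```

Along the way, the first trace passes through approximately 0.50, 0.35, and 0.29 before settling at 0.25, mirroring the reallocation sequence described above.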

Figure 10. Where pathway activity, υᵢ(t), what pathway activity, νᵢ(t), and VSTM trace strength, ωᵢ(t), predicted by the competitive interaction model for a four-item display under simultaneous (top panels) and sequential (bottom panels) presentation.
In these equations, the additive inhibition terms (the terms in braces) express the information capacity of VSTM; the multiplicative inhibition terms express its item capacity. The constants γᵢ are attention gain terms that determine the attention allocated to the stimulus at location i; θ determines the asymptotic trace strength; and K determines the item capacity of VSTM. The function D(ω) is a so-called "soft threshold," that is, a smooth approximation to a step function. In our model, D(ω) was implemented as a smooth, sigmoidal approximation to the unit step.

The argument of the multiplicative inhibition term counts the number of other stimuli, not including the one at location i, that are seeking to access VSTM. This number is compared with a number K that is strictly less than the item capacity of VSTM and is again thresholded and then subtracted from 1.0. (We use K = 3.5 in our model to implement an item capacity of four items.) The term as a whole will be equal to 1.0 if there are fewer than four other items in VSTM with trace strengths above some threshold value and equal to 0 if there are more than four items. When there are fewer than four items, the K-capacity term has no effect on the growth of VSTM; when there are more than four items, dωᵢ/dt goes to zero and the trace for item i stops growing. This means that once four items are established in VSTM, the memory is full and no new items will be admitted. The parameter ε determines the minimum trace strength required for an item to become established in VSTM.
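The K-capacity gating just described can be sketched as follows. The logistic form of the soft threshold D and its steepness are assumptions made for illustration; the text specifies only that D smoothly approximates a step function:

```python
import numpy as np

def D(u, tau=0.05):
    """Soft threshold: a smooth, sigmoidal approximation to a unit step.
    (The logistic form and the steepness tau are illustrative assumptions.)"""
    return 1.0 / (1.0 + np.exp(-u / tau))

def k_capacity_gate(omega, i, K=3.5, eps=0.05):
    """Multiplicative item-capacity term for the trace at location i:
    softly count the other traces established above eps, compare the
    count with K, threshold again, and subtract from 1.0."""
    others = np.delete(omega, i)
    established = D(others - eps).sum()   # soft count of established traces
    return 1.0 - D(established - K)

omega = np.array([0.25, 0.25, 0.25, 0.25, 0.0])  # four stored items, a fifth arriving
print(round(k_capacity_gate(omega, 4), 3))  # ~0: growth of the fifth trace is blocked
print(round(k_capacity_gate(omega, 0), 3))  # ~1: the four stored traces are unaffected
```

Setting K = 3.5, strictly between three and four, is what lets the smooth count implement a crisp four-item limit.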

Model for Change Detection (Vogel et al., 2006)

To model the change-detection data of Vogel et al. (2006) in Figure 10, we assumed a MAX-model version of a same-different model, similar to the one proposed by Wilken and Ma (2004). The same-different signal detection model for a single pair of stimuli is described by Macmillan and Creelman (1991, pp. 150-151). The model can be represented geometrically as three normal distributions on a line: a central distribution, with mean zero, representing the sensory effect on "same" (no change) trials, flanked symmetrically by a pair of distributions with means d and −d, each with probability mass of 1/2, which together represent the sensory effect on "different" (change) trials.

The three distributions express the idea that the decision rule depends only on the magnitude of the change and is indifferent to its sign or direction, and that positive and negative changes are equally likely. The MAX model generalizes the single-location model to multielement displays. To make a decision, the observer sets a pair of criteria at c and −c, symmetrically above and below the mean of the "same" distribution. The observer responds "no change" only if the sensory effect sampled at every location falls in the interval −c to c; if the sensory effect at any location falls above c or below −c, the observer responds "change." For an m-location display, assuming independence of the sensory effects across locations, the probability of a correct no change response, P(Same | same), is

P(Same | same) = [Φ(c) − Φ(−c)]ᵐ,

where Φ(·) denotes the standard normal distribution function.
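Under the stated geometry, the corresponding probability on change trials follows directly: the m − 1 unchanged locations behave as on a same trial, and the changed location contributes Φ(c − d) − Φ(−c − d), which is identical for the +d and −d cases by symmetry. A sketch, assuming unit-variance normal distributions:

```python
from scipy.stats import norm

def p_same_given_same(c, m):
    """P("no change" | same): the sensory effect at each of m locations,
    N(0, 1), must fall inside (-c, c); by independence the per-location
    probabilities multiply."""
    return (norm.cdf(c) - norm.cdf(-c)) ** m

def p_same_given_change(c, m, d):
    """P("no change" | change): m - 1 unchanged locations behave as on a
    same trial; the changed location has mean +d or -d with probability
    1/2 each, and the two cases yield the same probability by symmetry."""
    changed = norm.cdf(c - d) - norm.cdf(-c - d)
    return (norm.cdf(c) - norm.cdf(-c)) ** (m - 1) * changed

# Correct rejections fall with display size m at a fixed criterion,
# because more locations can spuriously exceed the criterion.
for m in (1, 2, 4, 8):
    print(m, round(p_same_given_same(c=1.5, m=m), 3))
```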
