Behavioural Processes 41 (1997) 227 – 236

Win-stay/lose-shift and win-shift/lose-stay learning by pigeons in the absence of overt response mediation

Christopher K. Randall, Thomas R. Zentall *

Department of Psychology, University of Kentucky, Lexington, KY 49596, USA

Received 23 January 1997; received in revised form 29 April 1997; accepted 22 May 1997

Abstract

Win-stay/lose-shift and win-shift/lose-stay behavior in pigeons was compared using a two-alternative conditional discrimination for which the number of trials involving each of the task components could be precisely controlled. One group was rewarded for pecking the location just pecked if those pecks were followed by food and for pecking the other location if those pecks were not followed by food (win-stay/lose-shift). Another group was rewarded for pecking the location just pecked if those pecks were not followed by food and for pecking the other location if those pecks were followed by food (win-shift/lose-stay). With increasing delay to comparison choice, pigeons were more accurate on trials when initial pecking was followed by the absence of food than by food (Experiment 1). However, when hypothesized overt response mediation was discouraged (Experiment 2), a win-stay superiority effect emerged with increasing delay to comparison choice. Thus, unlike rats, pigeons may be somewhat predisposed to repeat a response to a location to which responses have been previously rewarded. © 1997 Elsevier Science B.V.

Keywords: Win-stay/lose-shift; Win-shift/lose-stay; Response

1. Introduction

Rats show evidence of a natural predisposition to avoid recently visited locations whether they have been fed there (e.g. Dember and Fowler, 1958) or not (Timberlake and White, 1990). Such shift behavior does not appear to be characteristic of all species, however. For example, there is a prevalent notion that pigeons are predisposed to exhibit stay behavior. This hypothesis has been supported by findings that pigeons tend to perseverate (Goodwin, 1967; Bond et al., 1981; Zentall et al., 1990). Feral pigeons, for example, habitually forage in established patches (Goodwin, 1967) and possess excellent long-term memory for those locations (Levi, 1974).

* Corresponding author. Tel.: +1 606 2574076; e-mail: [email protected]

0376-6357/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S 0 3 7 6 - 6 3 5 7 ( 9 7 ) 0 0 0 4 8 - X


C.K. Randall, T.R. Zentall / Behavioural Processes 41 (1997) 227–236

Although some of this tendency to perseverate may be under control of a predisposition to avoid novel alternatives (neophobia), under natural, or even seminatural conditions, contingencies established by the natural distribution of food may encourage perseveration over alternation. For example, if pigeons are allowed to feed in a particular location, but they are removed (or they leave) prior to depletion of the patch, returning to that patch would be reinforcing. On the other hand, if rats tend to consume the small amount of food that they find in a patch, it would not be appropriate for them to return. Under laboratory conditions, there is suggestive evidence that when the contingencies for staying and shifting are manipulated, pigeons show a tendency to stay, especially following a reinforced response. Shimp (1966), for example, reported that, when trained to respond to two alternatives each associated with a different reinforcement probability, pigeons exhibit a significant tendency to repeat a just-reinforced response, independent of the overall reinforcement probability associated with that alternative. Similarly, Williams (1972) reported that pigeons learn to repeat a recently-reinforced response more when that behavior was reinforced with a high probability (0.80), than when it was reinforced with a lower probability (0.65). Finally, Zeiler (1987) demonstrated that pigeons persistently respond to an alternative associated with a high overall density of reinforcement even when responses to that alternative are not reinforced locally. Although these studies suggest that pigeons possess a win-stay response bias, when different operant procedures are used, pigeons can acquire a win-shift pattern of responding. Hearst (1962), for example, trained pigeons on a win-shift, delayed-alternation task in which reinforcement was available only when pigeons alternated responding between two keys from trial to trial. 
Although the pigeons rapidly learned the delayed alternation task and maintained a moderate level of performance at up to a 10 s delay between choices, no comparison was made between win-shift and win-stay acquisition. Similarly, Williams (1971a,b) trained pigeons on a non-spatial delayed alternation task in which

reinforcement was contingent on alternating responses between two hues. This procedure revealed a tendency in pigeons to respond perseveratively; however, that behavior was markedly reduced when pigeons were trained with a relatively large fixed ratio (FR) response requirement. The large FR (15 or 30 pecks) afforded pigeons the opportunity to switch between response alternatives after making a few perseverative responses. Williams also reported that alternation accuracy following incorrect (nonreinforced) trials was better than that observed after correct (reinforced) trials. A variable that appears to affect shift versus stay behavior in rats is the separation of the response alternatives. Williams (1991) has concluded that when response alternatives are widely separated (e.g. in a radial-arm maze; Olton, 1979) there is a tendency for animals to exhibit shift behavior. When the response alternatives are closely spaced, however (e.g. two bars in an operant chamber; Evenden and Robbins, 1984) stay behavior is more likely. The only study with pigeons to compare performance of a win-stay/lose-shift task with that of a win-shift/lose-stay task was reported by Shimp (1976). Using a variation of the delayed alternation procedure, Shimp (1976) compared the performance of pigeons assigned to either a win-stay/lose-shift or a win-shift/lose-stay condition. In this task, the cue for reinforcement on any particular trial was the response reinforced on the previous trial. In the win-stay/lose-shift condition, following a reinforced choice, the probability of reinforcement for repeating that choice was 0.80 and the probability of reinforcement for choosing the other response alternative was 0.20. Those probabilities were reversed in the win-shift/ lose-stay condition. With this procedure, clearly, the optimal strategy for win-stay/lose-shift subjects was to repeat the response following reinforcement and to respond to the other alternative following non-reinforcement. 
On the other hand, the optimal strategy for win-shift/lose-stay birds was to select the other alternative following a reinforced response and to repeat a non-reinforced response.
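The arithmetic behind these optimal strategies can be made explicit. The following is a minimal sketch, not Shimp's analysis: it covers only the choice that follows a reinforced response, uses the 0.80 and 0.20 probabilities quoted above, and the function name and the `p_repeat` parameter are hypothetical.

```python
# Illustrative arithmetic only. The function name and p_repeat are hypothetical;
# the 0.80 / 0.20 values are the probabilities quoted for Shimp (1976).
P_HIGH = 0.80
P_LOW = 0.20


def expected_reinforcement(condition, p_repeat):
    """Expected reinforcement probability on the choice after a reinforced response.

    condition: "win-stay/lose-shift" or "win-shift/lose-stay"
    p_repeat:  probability that the bird repeats the just-reinforced response
    """
    if condition == "win-stay/lose-shift":
        p_if_repeat, p_if_switch = P_HIGH, P_LOW
    elif condition == "win-shift/lose-stay":
        p_if_repeat, p_if_switch = P_LOW, P_HIGH
    else:
        raise ValueError(f"unknown condition: {condition}")
    return p_repeat * p_if_repeat + (1.0 - p_repeat) * p_if_switch
```

Under these assumptions, a bird that always repeats a reinforced response earns food with probability 0.80 in the win-stay/lose-shift condition and 0.20 in the win-shift/lose-stay condition, whereas a bird that is indifferent (`p_repeat = 0.5`) earns 0.50 in either condition.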


Shimp (1976) reported that pigeons in the win-shift/lose-stay condition developed position biases that significantly impaired task acquisition, and he subsequently altered the training procedure for that group. Consequently, differences in training between the two conditions precluded a direct comparison of the acquisition data between groups. Thus, win-stay and win-shift behavior in pigeons have not been compared under comparable conditions.

One problem with interpreting research in which response consequences have been used as discriminative stimuli is that there is an inherent asymmetry in behavior following each of the two outcomes that may provide an advantage for matching accuracy following a no-food (lose) event. To see this most clearly, imagine a pigeon performing a spatial delayed alternation task (i.e. win-shift). Following an initial response that results in no food (i.e. an incorrect response), the pigeon can simply continue (or resume) pecking the key just pecked with minimal interference from the consequence of the key peck (i.e. little spatial memory is required). Following an initial response that results in food (i.e. a correct response), however, the pigeon’s head is likely to be in the feeder. Thus, following a food event, making a correct choice would require memory for the spatial location of the first response. A similar asymmetry would be present during win-stay training. The pigeon could learn that following an initial response that results in no food, it should move to and peck the alternative response key for reinforcement, but following an initial response that results in food, it could not use such response-produced cues because its head would be in the feeder. Thus, under either training condition, one might expect matching accuracy following a lose event to be superior to that following a win event. 
Although this asymmetry is most obvious in the case of a spatial discrimination, it is also likely that in a hue discrimination, the absence of an interfering feeder presentation on lose trials would facilitate memory for the hue just pecked. The purpose of the present experiments was to present pigeons with a task that would assess both their natural tendencies to stay or shift following


reinforcement or non-reinforcement and their ability to remember those win and lose events over time. The reason for the delay manipulation was to ask if pigeons are able to remember win and lose events comparably. The purpose of the first experiment was to ask if prior research could have been biased by the asymmetry of response to the consequences following win (food) and lose (no-food) events, which might have provided a response-mediated advantage on lose trials. In Experiment 2, the task was designed to avoid differential task requirements on win and lose trials. Both experiments were designed to avoid differential natural consequences of stay and shift behavior (as would be present in, e.g. a radial arm maze because reentry into a visited arm under natural conditions would be unlikely to result in reinforcement whether food had been found there originally or not). Because in prior research (e.g. delayed alternation) the cue that signals which response is to be reinforced is under the subject’s control (i.e. the win or lose cue depended on whether the subject made a correct response or not on the preceding trial), in the present research, correct responding on each trial was determined by specific events that occurred earlier in the trial and were under experimenter control.

2. Experiment 1

2.1. Method

2.1.1. Subjects

The subjects were ten White Carneaux pigeons, obtained from the Palmetto Pigeon Plant (Sumter, SC), and were randomly divided into two groups (win-stay/lose-shift or win-shift/lose-stay). All subjects had previous experience on delayed-matching-to-sample (DMTS) tasks with hue stimuli. The pigeons were maintained at 80–85% of their free-feeding weights throughout the experiment and they were housed in individual wire-mesh cages with free access to both water and grit. The colony was maintained in a climate-controlled vivarium that operated on a 12/12 h light/dark cycle. All experimental protocols were approved by the Animal Care and Use Committee at the University of Kentucky.


2.1.2. Apparatus

The experiment was conducted in a BRS/LVE (Laurel, MD) operant chamber with inside dimensions measuring 37 × 41 × 37 cm (l × w × h). The response panel was equipped with three rectangular response keys, each 3.0 × 2.5 cm, spaced 1.5 cm apart and located 25 cm above the grid floor. The center key was not used in this experiment. Side keys were illuminated with a green hue by stimulus projectors mounted behind the response panel. A 6.0 cm² opening provided access to a food hopper mounted behind the response panel. This opening was equidistant from the center key and the grid floor and was illuminated by a white light (GE No. 1820) during all food presentations. A shielded houselight (GE No. 1820), centered in the chamber ceiling, provided general chamber illumination during intertrial intervals (ITIs). Extraneous sounds were masked by white noise (at 72 dB) presented through a panel-mounted speaker. A chamber-mounted fan provided ventilation. Experimental sessions were controlled and monitored by a microcomputer connected to the operant chamber via a Rayfield interface and electromechanical relay equipment, all located in an adjoining room.

2.1.3. Procedure

Preliminary training. Subjects were shaped to respond to the green stimulus presented on either side key. A single peck to a singly-lit key resulted in 2 s access to mixed grain.

Conditional discrimination training. Conditional discrimination trials began with the presentation of a green stimulus on either side key (randomly alternated between trials). Following ten responses (fixed ratio, FR, 10) to the illuminated key, either food (2 s access to the grain hopper with the hopper light on; the win event) or no food (2 s of hopper light alone; the lose event) was presented. Following the 2 s win or lose event, both side keys were illuminated with the green hue. Subjects were rewarded for matching or mismatching (FR 1) the original stimulus location (left or right) depending on whether the preceding event was win or lose, and whether the group assignment was win-shift/lose-stay or win-stay/lose-shift. Correct comparison choices resulted in 1.5 s access to the food hopper and a 10 s ITI; incorrect responses terminated the trial and immediately initiated the ITI. For two of the subjects in each group, a correction procedure was in effect: an incorrect response restarted the trial (up to a maximum of five times). For the remaining subjects, a non-correction procedure was in effect: they progressed to the next trial following an incorrect response. Each session consisted of 64 trials, counterbalanced for the location of initial stimulus and the consequences of an initial response (win or lose). Discrimination training continued to an overall performance criterion of 90% correct or better for two consecutive sessions.

Delay testing. As each pigeon reached the performance criterion, mixed delays were introduced between the food or no-food event and comparison choice. Each of the ten test sessions consisted of an equal number of trials at delays of 0, 1, 2, and 4 s. The number of trials at each delay (16) was balanced for both position of initial stimulus (right or left) and task component (win or lose). The order of the 64 trials in each test session was randomly determined.

2.2. Results

2.2.1. Acquisition

One subject (in the win-stay/lose-shift group) failed to learn the task (i.e. it never performed at a level significantly above chance). Data from that subject were omitted from the analyses. Although not all of the remaining nine subjects attained the two-session criterion on both components of their tasks, they all attained a criterion of one session at 90% correct or better on both task components. Thus, sessions to this one-session criterion were used to analyze task acquisition. Differences in the rate of acquisition for the four conditions (win-stay, lose-shift, win-shift, and lose-stay) were quite small. The win-stay/lose-shift pigeons acquired the win-stay component in 10.7 sessions and the lose-shift component in 12.0 sessions. The win-shift/lose-stay pigeons acquired the win-shift component in 12.0 sessions and the lose-stay component in 10.7 sessions. The correction versus noncorrection manipulation had little effect on accuracy in acquisition (or delay testing) and thus, this variable was ignored in all analyses. A three-way mixed-design analysis of variance (ANOVA) was performed on the acquisition data with group (win-stay/lose-shift versus win-shift/lose-stay), task component (win versus lose event following the initial response), and procedure (correction versus noncorrection) as factors. The analysis indicated that none of the main effects or interactions was significant, all Fs < 1. All statistical analyses were computed at the 0.05 significance level. It should be noted that the individual components of the task for each group are acquired (on average) faster than the entire task. This is the case because task acquisition to criterion can occur no faster than acquisition to criterion of the slowest component (and if there is a post-criterion drop in the performance of the first acquired component, it could take even longer).

2.2.2. Delay test

Treatment effects did appear on the delay test. A three-way mixed-design ANOVA was performed on the delay data. The three factors were: group (win-stay/lose-shift versus win-shift/lose-stay), task component (win versus lose trial), and delay (0, 1, 2, 4 s). The analysis indicated that there was a significant main effect of task component, F(1,7) = 22.02. Performance on lose trials (84.1% correct) was better than on win trials (68.0% correct). There was also a significant Group × Task component interaction, F(1,7) = 10.62. This interaction was produced by better performance on win trials and poorer performance on lose trials by the win-stay/lose-shift group (see Fig. 1). As expected, a main effect of delay was also found, F(3,21) = 40.29. In addition, the Task component × Delay interaction was also significant, F(3,21) = 16.77. This interaction can be attributed to the sharp decrease in task accuracy over delays on win trials relative to lose trials for both groups. No other effect was significant.
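The degrees of freedom reported for the delay test follow from the design. As a minimal sketch, assuming the standard partition for a mixed design with one between-subjects factor (group) and within-subject factors (task component, delay), and using the nine subjects retained in Experiment 1, the helper below reproduces them. The function names are hypothetical and are not part of the original analysis.

```python
def between_df(n_subjects, n_groups):
    """(effect df, error df) for the between-subjects main effect of group."""
    return n_groups - 1, n_subjects - n_groups


def within_df(n_subjects, n_groups, within_levels, with_group=False):
    """(effect df, error df) for a within-subject effect in a mixed design.

    within_levels: level counts of the within-subject factors in the effect,
                   e.g. (2,) for task component or (2, 4) for component x delay.
    with_group:    True for the interaction of that effect with group.
    """
    effect = 1
    for levels in within_levels:
        effect *= levels - 1
    # The error term is the within effect crossed with subjects nested in groups.
    error = effect * (n_subjects - n_groups)
    if with_group:
        effect *= n_groups - 1
    return effect, error
```

With nine subjects in two groups, `within_df(9, 2, (2,))` gives (1, 7) for task component, `within_df(9, 2, (4,))` gives (3, 21) for delay, and `within_df(9, 2, (2, 4))` gives (3, 21) for the Task component × Delay interaction, matching the values reported above.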

Fig. 1. Mean percent correct performance (±S.E.) for pigeons in Group win-stay/lose-shift (WSt/LSh) and Group win-shift/lose-stay (WSh/LSt) as a function of trial type (win, lose) and delay (0, 1, 2 and 4 s).

2.3. Discussion

Although significant effects of the manipulation were not found in acquisition, clear evidence for better performance on lose trials than on win trials appeared in the delay test. This effect appeared both as a main effect of task component and as a Task component × Delay interaction (the difference between lose and win performance increased with increasing delays). These effects are consistent with the hypothesis that there may be a bias in favor of higher levels of performance on lose trials because overt responding can mediate lose-trial but not win-trial performance. Thus, although win-trial performance may reflect the true effects of memory for the spatial location associated with a win outcome, the same may not be true of lose-trial performance. If response mediation is responsible for superior lose-trial performance, one might expect such performance to be better when the response requirement following a lose outcome was to stay rather than to shift (i.e. it should be relatively easier to maintain pecking the sample response key than to move over to the other key). Although neither the Group × Task component interaction nor the Group × Task component × Delay interaction reached statistical significance, those effects may have been constrained somewhat by the large task-component effect.


The present findings also suggest that prior research in which the outcome of the preceding response or trial (i.e. food or no food) was the discriminative stimulus for choice on the current trial (e.g. Shimp, 1976) may not properly address the question of whether stay or shift behavior is predisposed in pigeons. Consequently, the purpose of Experiment 2 was to compare the relative acquisition and delay performance by pigeons of a win-stay/lose-shift task with that of a win-shift/lose-stay task, under conditions that should greatly reduce the likelihood of differential response-mediating behavior on win and lose outcome trials.

3. Experiment 2

To avoid the potential bias produced by differential opportunity to use response-produced cues following win and lose events, pigeons in Experiment 2 were required to peck a center response key during the 2 s of no food (i.e. the lose event) prior to the onset of the comparison stimuli. With this procedure, on trials in which the initial response was not reinforced, the location of the pigeon’s head throughout the no-food event could not be used to mediate that delay. If the results of Experiment 1, as well as prior research on shift versus stay behavior (depending on its outcome), were influenced by differential opportunity to use response mediation, then one might expect to see a pattern of results with the present procedure that is different from that found in Experiment 1.

3.1. Method

3.1.1. Subjects

Ten experimentally-naive White Carneaux pigeons purchased as retired breeders (5–8 years old) from the Palmetto Pigeon Plant (Sumter, SC) served in the present experiment. Subjects were housed and maintained as described in Experiment 1.

3.1.2. Apparatus

The experiment was conducted in an operant chamber, similar to the one used in Experiment 1, with the following exceptions: The response panel had a matrix of 25 response keys arranged in a 5 × 5 array. Only three of the 25 keys were used. Those keys, each measuring 1.5 cm in diameter and spaced 6.0 cm apart, were the center and two outermost keys (right and left) in the middle row of the matrix. The hopper opening located in the center of the panel (equidistant from the center of the matrix and the grid floor) measured 6 cm² and was illuminated by a light (GE No. 1820) during mixed grain presentations. A house light (GE No. 1820) centered in the ceiling provided general chamber illumination during the ITIs.

3.1.3. Procedure

Preliminary training. Subjects were autoshaped to respond to three stimuli: a white stimulus on the center key, a green stimulus on the right key, and a red stimulus on the left key. Reinforcement consisted of 2.0 s access to Purina Pro Grains. After autoshaping, pretraining progressed from an FR 1 to an FR 10 schedule on both side keys. Following successful responding in this pretraining phase, subjects began conditional discrimination training.

Conditional discrimination training. Pigeons were trained on either a win-stay/lose-shift or a win-shift/lose-stay conditional discrimination. To maximize the distinctiveness of the response alternatives, the color of the initial stimulus was redundant with its spatial location for both groups (i.e. the right key was always illuminated with a green hue and the left key with a red hue). All trials began with the illumination of one of the two side keys. On win trials, ten pecks to the illuminated key resulted in 2 s access to mixed grain, and a left/right comparison choice. On lose trials, ten pecks to the illuminated key resulted in a 2 s presentation of a white stimulus on the center key. If a peck occurred at any time during the 2 s interval, the white stimulus (lose event) was terminated at the end of 2 s; however, if a peck did not occur during this interval, the stimulus was terminated immediately following the first peck to the key. Data from the few trials in which a peck occurred to the illuminated center key only after 2 s had elapsed were not included in the analyses. Following either a win or lose event, the left and right comparison keys were illuminated. A comparison choice was defined as the first key to be pecked five times successively (FR 5, see Williams, 1971a; Zentall et al., 1990) and reinforcement was provided if appropriate. Trials were separated by 10 s ITIs with the house light lit. Discrimination training continued until subjects reached an overall performance criterion of 90% correct or better for two consecutive 64-trial sessions. Delay testing began on the next session.

Delay testing. As each pigeon reached the acquisition criterion, mixed delays were introduced between food or no-food event and comparison choice. Subjects received ten 64-trial sessions that included mixed delays of 0, 2, 4 and 8 s. An equal number of trials occurred at each delay (16), balanced for both position of initial stimulus (right or left) and trial type (win or lose). The order of the trials was randomly determined.
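The trial logic of this procedure can be summarized in a short sketch. The code below is an illustration under stated assumptions, not the authors' control program: all names are hypothetical, the random-number seed is arbitrary, and the session builder assumes that "balanced" means every delay × side × trial-type cell occurs equally often within a session.

```python
import random
from collections import Counter
from itertools import product

WIN_STAY = "win-stay/lose-shift"
WIN_SHIFT = "win-shift/lose-stay"
DELAYS_S = (0, 2, 4, 8)        # Experiment 2 test delays, in seconds
SIDES = ("left", "right")
TRIAL_TYPES = ("win", "lose")
TRIALS_PER_SESSION = 64
LOSE_EVENT_S = 2.0             # scheduled duration of the center-key lose event


def correct_side(group, sample_side, trial_type):
    """Return the comparison key that is reinforced on this trial."""
    if group not in (WIN_STAY, WIN_SHIFT):
        raise ValueError(f"unknown group: {group}")
    # Stay after a win in the win-stay group, and after a loss in the win-shift group.
    stay = (group == WIN_STAY) == (trial_type == "win")
    if stay:
        return sample_side
    return "right" if sample_side == "left" else "left"


def lose_event_duration(first_peck_latency_s):
    """Return (event duration, include_in_analysis) for a lose trial.

    A center-key peck within 2 s ends the event at 2 s; otherwise the event
    ends at the first peck, and the trial is dropped from the analyses.
    """
    pecked_in_time = first_peck_latency_s <= LOSE_EVENT_S
    duration = LOSE_EVENT_S if pecked_in_time else first_peck_latency_s
    return duration, pecked_in_time


def build_delay_session(rng):
    """Return 64 shuffled (delay, sample_side, trial_type) tuples, 16 per delay."""
    cells = list(product(DELAYS_S, SIDES, TRIAL_TYPES))   # 16 distinct cells
    repeats = TRIALS_PER_SESSION // len(cells)            # 4 trials per cell
    trials = cells * repeats
    rng.shuffle(trials)
    return trials
```

For example, `build_delay_session(random.Random(0))` yields one 64-trial test session, and `Counter(delay for delay, _, _ in session)` confirms 16 trials at each delay.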

3.2. Results

Acquisition. A two-way ANOVA performed on the number of sessions to criterion failed to yield significant effects, all Fs < 1. On average, the win-stay/lose-shift group acquired the task in 23.7 sessions while the win-shift/lose-stay group acquired the task in 23.3 sessions. Win-stay and win-shift task components were acquired in 16.3 and 16.5 sessions, respectively, and lose-shift and lose-stay task components were acquired in 23.2 and 18.5 sessions, respectively.

Delay test. The delay data were analyzed with a three-way ANOVA with group (win-stay/lose-shift versus win-shift/lose-stay), task component (win versus lose), and delay (0, 2, 4, 8 s) as factors. As can be seen in Fig. 2, overall, pigeons were generally more accurate on win trials (66.3% correct) than on lose trials (59.6% correct). This conclusion was supported by a significant main effect of task component, F(1,8) = 5.87. There was also a significant main effect of delay, F(3,24) = 29.83, but there was no significant effect of Group, F(1,8) = 2.58. Of particular interest with regard to the purpose of the present experiment, accuracy on win-stay trials was better than accuracy on win-shift, lose-shift, or lose-stay trials. This observation was supported by a significant Group × Task component interaction, F(1,8) = 6.58. Over delays, win-stay performance (73.9% correct) exceeded win-shift (58.6% correct), lose-stay (59.0% correct), and lose-shift (60.2% correct) performance. Also of interest, from the perspective of evaluating memory effects, was the absence of significant interactions of Group × Delay, F < 1, Task component × Delay, F < 1, and Group × Task component × Delay, F(3,24) = 1.41. Thus, increasing the delay did not have a differential effect on task accuracy.

Fig. 2. Mean percent correct performance (±S.E.) for pigeons in Group win-stay/lose-shift (WSt/LSh) and Group win-shift/lose-stay (WSh/LSt) as a function of trial type (win, lose) and delay (0, 2, 4 and 8 s).
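The overall win-trial and lose-trial percentages reported for the delay test can be recovered from the four cell means. The sketch below assumes that the two groups were of equal size, which the text does not state explicitly for Experiment 2, so that each marginal mean is the simple average of two cells; the variable and function names are hypothetical.

```python
# Cell means (percent correct, pooled over delays) as reported in the text.
cell_means = {
    ("win", "stay"): 73.9,    # win-stay/lose-shift group, win trials
    ("win", "shift"): 58.6,   # win-shift/lose-stay group, win trials
    ("lose", "stay"): 59.0,   # win-shift/lose-stay group, lose trials
    ("lose", "shift"): 60.2,  # win-stay/lose-shift group, lose trials
}


def marginal_mean(trial_type):
    """Unweighted mean over the two groups for one trial type (assumes equal n)."""
    values = [v for (kind, _), v in cell_means.items() if kind == trial_type]
    return sum(values) / len(values)


win_mean = marginal_mean("win")    # about 66.25
lose_mean = marginal_mean("lose")  # about 59.6
```

Under that assumption, the win-trial mean of about 66.25% agrees with the reported 66.3% up to rounding, and the lose-trial mean agrees with the reported 59.6%. The win-stay cell alone accounts for the win-trial advantage, which is what the Group × Task component interaction reflects.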

3.3. Discussion

The absence of significant differences in the acquisition of win-stay/lose-shift versus win-shift/lose-stay tasks is consistent with the results of Experiment 1. When delays were interposed between sample presentation and comparison choice, however, win-stay accuracy generally exceeded win-shift, lose-stay, and lose-shift accuracy. This result is different from that found in Experiment 1. In Experiment 1, for both groups, accuracy on lose trials was better than accuracy on win trials. The present findings support the hypothesis that the results of Experiment 1, as well as the results of earlier research on shift versus stay performance, were influenced by differential response mediation that likely facilitated delay performance on lose trials. A factor that also may have contributed to the difference in findings between the two experiments was the increased differentiation of the two response alternatives (see Williams, 1991). In Experiment 2 the two comparison keys were further apart and were distinguished by distinctive hues.

Unlike previous studies, the present design permits an analysis of pigeons’ response tendencies following lose samples in addition to, and independent of, their behavior following win samples. Notably, better performance on lose trials was not apparent during either acquisition or delay testing in Experiment 2. Although one might imagine that under natural conditions a lose-shift strategy would be an advantage over a lose-stay strategy, pigeons did not acquire the lose-shift task significantly faster than the lose-stay task. Even when delays were introduced, differences between lose-shift and lose-stay matching accuracy did not emerge.

A number of studies have described divergent delay functions following conditional discrimination training that involves food and no-food samples in pigeons (see Colwill, 1984; Grant, 1991; Maki et al., 1981; Sherburne and Zentall, 1993a,b; Wilson and Boakes, 1985). The typical observation is that accuracy on food sample trials declines precipitously across a retention interval, whereas accuracy on no-food sample trials is remarkably unaffected by the delay. This asymmetry is presumed to be indicative of a single-code-default coding strategy that is adopted by pigeons when presented with these distinct samples (see Grant, 1991). 
Specifically, pigeons appear to code (remember) only food samples and respond appropriately as long as their memory for that sample persists across delays. Thus, if there is memory of a food sample at the time of comparison choice, the pigeon learns to make a response to one of the comparisons. If no memory of a food sample is present, the pigeon learns to make a response to the other comparison. The divergent

delay functions are believed to reflect the relatively rapid decay of the pigeons’ memory for food over a retention interval, together with a consistently accurate default response on no-food-sample trials because there is no coded event to be forgotten. This presence-versus-absence sample effect, which is found in simple conditional discriminations (i.e. delayed matching), is relevant to the present results because the win and lose samples used here are perhaps analogous to the food and no-food samples used by others. Examination of the retention functions for the data from Experiment 1 suggests just such divergent retention functions. On lose (no-food) sample trials, the retention functions were relatively flat (see, in particular, the retention functions for the win-shift/lose-stay group, the left panel of Fig. 1). An examination of the retention data from Experiment 2, however, indicates little evidence of divergent retention functions on win and lose trials for pigeons in either group. Although win-stay trial accuracy was better than that in the other conditions, the slope of the win-stay function is remarkably parallel to that of the other three functions. The difference in procedures used in Experiments 1 and 2 may account for this difference in the relative slopes of the retention functions. In Experiment 1, no response was required during the no-food (lose) event, whereas eating occurred during the food event. In Experiment 2, however, responding was required following both food and no-food events. The results of Experiments 1 and 2 suggest that the basis for the single-code-default coding strategy proposed to account for divergent food/no-food-sample delayed matching retention functions may be differential responding, rather than food (or eating) and its absence. In Experiment 2, when the pigeons pecked at the food on food-sample trials and they pecked at the center response key on no-food-sample trials, parallel retention functions were found. 
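The divergent retention functions predicted by a single-code-default strategy can be illustrated with a toy model. This is a sketch under strong assumptions and is not a model fitted to the present data: memory for the food sample is assumed to decay exponentially, the decay rate is an arbitrary hypothetical value, and the default response is assumed to be made without error.

```python
import math


def p_remember_food(delay_s, decay_rate):
    """Probability that the food code survives the delay (assumed exponential decay)."""
    return math.exp(-decay_rate * delay_s)


def single_code_default_accuracy(delay_s, decay_rate=0.2):
    """Predicted accuracy on food and no-food sample trials at one delay.

    Food trials are correct only while the food code persists; once it is
    lost, the bird makes the default (no-food) response and is wrong.
    No-food trials always receive the default response, so they stay correct.
    """
    return {
        "food": p_remember_food(delay_s, decay_rate),
        "no_food": 1.0,
    }


# Predicted functions over the Experiment 1 delays (0, 1, 2, 4 s).
predictions = {d: single_code_default_accuracy(d) for d in (0, 1, 2, 4)}
```

In this idealized form, food-trial accuracy falls steadily with delay while no-food accuracy stays flat, which is the qualitative pattern described above for Experiment 1. Parallel functions, as observed in Experiment 2, are not what such a model predicts.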
Thus, there is no indication in Experiment 2 of the use of a single-code-default strategy.

The present results also may be relevant to the literature on conditional discriminations in which the pigeon’s behavior is the source of the conditional cue. For example, Lydersen and Perkins (1974) have shown that pigeons can learn to match accurately when the only relevant discriminative stimulus is the number of responses (8 versus 16) they have made to an otherwise undifferentiated stimulus. Similarly, Urcuioli and Honig (1980) have shown that when pigeons are required to respond to samples of one hue with a fixed number of pecks (fixed ratio ten) and to respond to samples of a different hue with two responses separated by a minimum amount of time (differential-reinforcement-of-a-low-rate-of-responding 3 s), not only is acquisition of the conditional discrimination enhanced, but also the response requirement, rather than the sample hue, appears to acquire control of comparison choice. The present procedure can be seen, potentially, as a special case of control of comparison choice by response-produced cues. In the present case, however, the conditional cue required for comparison choice would have to be a compound consisting of (the response to) a spatial location and (the response associated with) the food or no-food outcome that follows. Furthermore, on no-food trials, the commonality of response following the initial response to the left or right response key should ensure that differential predispositions regarding win and lose events and the succeeding stay or shift behavior, rather than response mediation, account for differential task accuracy.

One curious finding in Experiment 2 was the relatively poor level of accuracy at the 0 s delay, especially on shift trials (see Fig. 2). This result is surprising given that accuracy on these same trials was required to be at 90% or higher just prior to the introduction of delays. 
However, such a decrease in accuracy has been reported in other conditional-discrimination research when mixed delays are introduced following training at 0 s delay (Zentall et al., 1978; Honig, 1987), particularly when the to-be-remembered events are food and the absence of food (e.g. Sherburne and Zentall, 1995; Zentall et al., 1995). Zentall et al. (1997) have recently suggested that such a decrease in matching accuracy on 0 s delay matching trials may result from the general increase in delay of reinforcement associated with sample-oriented behavior when many of the trials in a session involve relatively longer delays.


4. General discussion

Pigeons readily learn a conditional discrimination that involves either a win-stay/lose-shift or a win-shift/lose-stay response rule. On delay trials, however, win-stay accuracy appears to exceed win-shift, lose-stay, and lose-shift accuracy. On the other hand, although pigeons may indeed be predisposed to exhibit win-stay behavior (see Goodwin, 1967; Bond et al., 1981; Zentall et al., 1990), under the present conditions they appear to be able to show considerable flexibility, and they readily adopt and respond according to a win-shift response strategy when that strategy is appropriate to the particular experimental contingencies.

At a broader level, one might ask about the ecological generality of these results. To what extent is timed access to the feeder analogous to the partial depletion of a patch? Would the results have been different if the pigeons had been fed from a pellet dispenser? Although the present results are consistent with our notions of the foraging strategies of pigeons, one might expect those strategies to be influenced as well by the nature of the reinforcement context. Thus, the foraging strategy adopted by the pigeon may critically depend on the way the pigeon ‘interprets’ environmental contingencies. Another factor that may affect the nature of the foraging strategy is the relation of the response to the reinforcer. Although there is a tendency to think of responses as being interchangeable, a very different pattern of results might be found if the experiments were repeated with treadle stepping replacing key pecking as the targeted behavior. It is perhaps in experiments of the present kind that results of studies investigating the foraging strategies of animals in natural or artificial settings can find common ground with those of laboratory experiments in which the acquisition of stimulus control in conditional discriminations has been studied.
By determining the conditions under which an animal’s predispositions affect task acquisition and memory we may be able to better understand the influences of evolution on the processes involved in animal learning.
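The two response rules compared in these experiments can be stated compactly. The following is a minimal illustrative sketch, not the authors' experimental control code; the names `Side`, `Rule`, `other` and `correct_choice` are hypothetical and introduced only for this example. It returns the comparison location that would be reinforced, given the location initially pecked and whether those pecks were followed by food.

```python
from typing import Literal

Side = Literal["left", "right"]
Rule = Literal["win-stay/lose-shift", "win-shift/lose-stay"]


def other(side: Side) -> Side:
    """Return the alternative location in a two-alternative task."""
    return "right" if side == "left" else "left"


def correct_choice(rule: Rule, pecked: Side, fed: bool) -> Side:
    """Return the location that is reinforced at comparison choice.

    Hypothetical helper: under win-stay/lose-shift the bird should repeat
    the location after food and switch after no food; under
    win-shift/lose-stay the mapping is reversed.
    """
    stay = fed if rule == "win-stay/lose-shift" else not fed
    return pecked if stay else other(pecked)
```

For example, under the win-shift/lose-stay rule a pigeon that pecked the left key and received food would be reinforced for choosing the right key.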


Acknowledgements

This research was supported by National Science Foundation grants BNS 8418275 and BNS 9019080 awarded to TRZ. The authors wish to thank Philipp J. Kraemer and Donald F. McCoy for their advice throughout this project and they gratefully acknowledge the assistance of Lou M. Sherburne, Karen L. Roper and Zhongbiao Zhang in conducting this research.

References

Bond, A.B., Cook, R.G., Lamb, M.R., 1981. Spatial memory of rats and pigeons in the radial arm maze. Anim. Learn. Behav. 9, 575–580.
Colwill, R.M., 1984. Disruption of short-term memory for reinforcement by ambient illumination. Q. J. Exp. Psychol. 36B, 235–258.
Dember, W.N., Fowler, H., 1958. Spontaneous alternation behavior. Psychol. Bull. 55, 412–428.
Evenden, J.L., Robbins, T.W., 1984. Win-stay behavior in the rat. Q. J. Exp. Psychol. 36B, 1–26.
Goodwin, D., 1967. Pigeons and Doves of the World. London: Trustees of the British Museum (Natural History).
Grant, D.S., 1991. Symmetrical and asymmetrical coding of food and no-food samples in delayed matching in pigeons. J. Exp. Psychol.: Anim. Behav. Process. 17, 186–193.
Hearst, E., 1962. Delayed alternation in the pigeon. J. Exp. Anal. Behav. 5, 225–228.
Honig, W.K., 1987. Memory interval distribution effects in pigeons. Anim. Learn. Behav. 15, 6–14.
Levi, W.M., 1974. The Pigeon. Sumter, SC: Levi.
Lydersen, T., Perkins, D., 1974. Effects of response-produced stimuli upon conditional discrimination performance. J. Exp. Anal. Behav. 21, 307–314.
Maki, W.S., Olson, D.J., Rego, S., 1981. Directed forgetting in pigeons: Analysis of cue functions. Anim. Learn. Behav. 9, 189–195.
Olton, D.S., 1979. Mazes, maps and memory. Am. Psychol. 34, 583–596.
Sherburne, L.M., Zentall, T.R., 1993a. Coding of feature and no feature events by pigeons performing a delayed conditional discrimination. Anim. Learn. Behav. 21, 92–100.
Sherburne, L.M., Zentall, T.R., 1993b. Asymmetrical coding of food and no-food events by pigeons: Sample pecking versus food as the basis of the sample code. Learn. Motiv. 24, 141–155.
Sherburne, L.M., Zentall, T.R., 1995. Delayed matching in pigeons with food and no-food samples: Further examination of backward associations. Anim. Learn. Behav. 23, 177–181.
Shimp, C.P., 1966. Probabilistically reinforced choice behavior in pigeons. J. Exp. Anal. Behav. 9, 443–455.
Shimp, C.P., 1976. Short-term memory in the pigeon: The previously reinforced response. J. Exp. Anal. Behav. 26, 487–493.
Timberlake, W., White, W., 1990. Winning isn’t everything: Rats need only food deprivation and not food reward to efficiently traverse a radial arm maze. Learn. Motiv. 21, 153–163.
Urcuioli, P.J., Honig, W.K., 1980. Control of choice of conditional discriminations by sample-specific behaviors. J. Exp. Psychol.: Anim. Behav. Process. 6, 251–277.
Williams, B.A., 1971a. Color alternation learning in the pigeon under fixed-ratio schedules of reinforcement. J. Exp. Anal. Behav. 15, 129–140.
Williams, B.A., 1971b. Non-spatial delayed alternation by the pigeon. J. Exp. Anal. Behav. 16, 15–21.
Williams, B.A., 1972. Probability learning as a function of momentary reinforcement probability. J. Exp. Psychol.: Anim. Behav. Process. 17, 363–368.
Williams, B.A., 1991. Choice as a function of local versus molar reinforcement contingencies. J. Exp. Anal. Behav. 56, 455–473.
Wilson, B., Boakes, R.A., 1985. A comparison of the short-term memory performances of pigeons and jackdaws. Anim. Learn. Behav. 13, 285–290.
Zeiler, M.D., 1987. On optimal response strategies. J. Exp. Psychol.: Anim. Behav. Process. 13, 31–39.
Zentall, T.R., Clement, T.S., Kaiser, D.H., 1997. Delayed matching in pigeons: Can apparent memory loss be attributed to the delay of reinforcement of sample-orienting behavior? Behav. Proc. (submitted).
Zentall, T.R., Hogan, D.E., Howard, M.M., Moore, B.S., 1978. Delayed matching in the pigeon: Effect on performance of sample-specific observing responses and differential delay behavior. Learn. Motiv. 9, 202–218.
Zentall, T.R., Sherburne, L.M., Urcuioli, P.J., 1995. Coding of hedonic and nonhedonic samples by pigeons in many-to-one delayed matching. Anim. Learn. Behav. 23, 189–196.
Zentall, T.R., Steirn, J.N., Jackson-Smith, P., 1990. Memory strategies in pigeons’ performance of a radial-arm-maze analog task. J. Exp. Psychol.: Anim. Behav. Process. 16, 358–371.

Biographies Christopher K. Randall is now at the Department of Psychology and Education, Mount Holyoke College, South Hadley, MA 01075-1462.
