JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR
1991, 55, 37-46
NUMBER 1 (JANUARY)

PREFERENCE FOR CONDITIONED REINFORCEMENT

BEN A. WILLIAMS AND ROGER DUNN

UNIVERSITY OF CALIFORNIA-SAN DIEGO AND SAN DIEGO STATE UNIVERSITY

Pigeons were presented with a concurrent-chains schedule in which both choice alternatives led to the same terminal-link stimulus, which was followed by food. Superimposed on the food-reinforced presentations of the terminal-link stimulus was a second schedule of presentations of the same stimulus that were followed by no food. The absolute number of these no-food stimulus presentations was held constant while their relative frequency assigned to one or the other choice alternative was systematically varied. Preference for a given choice alternative tracked the relative frequency of these stimulus presentations, thus demonstrating that they served as reinforcers. These results resolve conflicts in the literature regarding the effect of conditioned reinforcement on choice.

Key words: conditioned reinforcement, choice, concurrent chains, timeout, key peck, pigeons

This research was supported by NIMH Grant MH 42797 to the University of California-San Diego and NIMH Grant MH 40853 to San Diego State University. Correspondence and reprint requests may be addressed to Ben Williams, Department of Psychology, University of California-San Diego, La Jolla, California 92093-0109 or Roger Dunn, Department of Psychology, San Diego State University, Imperial Valley Campus, Calexico, California 92231.

A common procedure for studying conditioned reinforcement involves a chain schedule in which the initial links are concurrently available (concurrent chains) and the terminal-link stimuli are correlated with different parameters of primary reinforcement. The critical measure is the relative rate of responding ("preference") in the initial links of the chains. The procedure is useful for studying conditioned reinforcement based on the assumption that the determinants of conditioned reinforcement can be assessed by the effects of the various characteristics of the terminal-link schedules of primary reinforcement on preference (see Fantino, 1977, for a review).

Despite the development of the procedure as a method for studying conditioned reinforcement, the status of conditioned reinforcement as an explanation for behavior maintained by concurrent chains has been uncertain. For example, delay-reduction theory (Squires & Fantino, 1971; see also Fantino & Davison, 1983) maintains that the strength of the terminal-link stimuli as conditioned reinforcers is determined by the relative reduction in time to reinforcement signaled by stimulus onset. But, at the same time, delay-reduction theory has no role for the frequency of stimulus presentation. Instead, the relative frequency of terminal-link presentation is assumed to affect behavior via the number of primary reinforcer presentations. The result is a hybrid theory of concurrent-chains performance in which the terminal-link stimuli are assumed to be conditioned reinforcers, but no provision is made for their frequency of presentation as a variable that affects behavior.

A second example of the uncertain status of the role of terminal-link stimuli is apparent in the investigations of Mazur (1984, 1985, 1986), which have assessed preference for various combinations of reinforcement delays in the terminal links of a discrete-trial procedure. The result of this research has been a theoretical formulation that predicts choice solely in terms of the delay values of the terminal links, without reference to any function played by the stimuli correlated with those different delays. Yet it is known that the stimuli play an important role. When the same stimulus is correlated with both terminal-link delay values, preference for the shorter delay is substantially reduced (Williams & Fantino, 1978), sometimes to indifference (Navarick & Fantino, 1976). Thus, an important issue is how to conceptualize that stimulus function. Does the stimulus have conditioned reinforcement value and consequently affect behavior according to the same functional relationships that have been established for primary reinforcers, or is the stimulus effect discriminative in nature, in a manner that awaits adequate definition?

Clear evidence that stimulus value is involved was provided by Dunn, Williams, and Royalty (1987). They presented pigeons with a concurrent-chains schedule with the modification that occasional presentations of one of the terminal-link stimuli were added to the procedure, and these additional presentations were correlated with extinction. Such extinction presentations substantially devalued the particular stimulus with which they were correlated, as demonstrated by preference shifting strongly away from the terminal link associated with that stimulus. Such effects were independent of any effect on the primary reinforcement contingencies, because in control conditions the additional extinction periods, when correlated with different stimuli, had no effect on preference. Thus, Dunn et al. established that stimulus value per se must be considered as a critical variable.

But a second feature of the results of Dunn et al. (1987) poses a puzzle that awaits solution. The additional extinction presentations were presented either independently of initial-link responding according to a variable-time (VT) schedule or were contingent on one of the initial-link responses (on a variable-interval or VI schedule). We reasoned that both procedures should reduce the preference for the terminal link correlated with the devalued stimulus, but because the terminal-link stimulus should continue to possess some conditioned reinforcing properties (although lessened by the correlation with extinction), the more frequent presentation of that stimulus contingent on the initial-link response might be expected to offset the effect of the extinction training. Choice of the devalued stimulus in the initial link should then be greater in the condition with extinction periods presented on the response-contingent VI schedule than in the VT condition, in which the presentation was independent of responding. In other words, choice of a conditioned reinforcer should be modulated by its frequency of response-contingent presentation.
In fact, however, there was no consistent difference between the two conditions, indicating that the contingency had no reliable effect.

The failure to find a schedule effect in our previous study raises the question of whether response contingencies for putative conditioned reinforcers obey the same functional relations as do those for primary reinforcers. To the extent that such stimuli really do possess conditioned value, preference for such stimuli should increase as a function of their frequency of presentation contingent on a particular behavior. In fact, we have been unable to find a single demonstration of such an effect. Moreover, Schuster (1969) reported data in the opposite direction. In his study, pigeons were presented with a concurrent chain in which the conditioned reinforcer (the brief stimuli correlated with food presentation) was presented on a fixed-ratio (FR) 11 schedule during one of two terminal links. Despite there being a higher response rate during that terminal link, preference for that terminal link was systematically reduced. Thus, not only was there no evidence for the conditioned reinforcers in the terminal link having conditioned value, but the results also supported the view that they were actually aversive.

Unfortunately, Schuster's (1969) experiments can be criticized on several grounds. As noted by Gollub (1970), the higher response rates that occurred during the terminal link with the FR 11 schedule might themselves be responsible for the reduced preference for that terminal link, in light of other evidence showing that pigeons find schedules that require such rates (e.g., differential reinforcement of high rates) to be relatively aversive (Fantino, 1968). It is also uncertain how the FR 11 schedule during the terminal-link stimulus should affect the preference in the initial links, because the stimuli produced by the FR 11 schedule were themselves never immediately contingent on choice responding. It is only if the brief stimuli are assumed to increase the value of the terminal-link stimulus itself, perhaps through higher order conditioning, that such an effect would be expected, and there are enormous complications in knowing whether such higher order conditioning effects should occur (e.g., the FR 11 stimuli might compete with the terminal-link stimulus for association with food via the mechanisms of overshadowing).
It is also possible that the FR 11 schedule so reduced the value of the brief stimuli that their conditioned reinforcement properties were extinguished. Given that high rates in the terminal link occurred with the FR 11 schedule, which meant that the brief stimuli were themselves occurring at a high rate without reinforcement, it is possible that the frequency of pairing with food was not sufficient to maintain the conditioned reinforcement properties. Because of these interpretative complications, Schuster's results cannot be considered definitive.

What is essential for a clear demonstration of the effects of conditioned reinforcement on choice is that preference track the frequency of conditioned reinforcement presentation, without the confounding effects of differences in primary reinforcement and without the confounding effects of changes in the value of the conditioned reinforcer itself. The present study was designed to provide such a demonstration. Pigeons were presented with a concurrent-chains procedure in which a single terminal-link stimulus, correlated with a constant food schedule, was contingent on both choice alternatives. Superimposed on this food schedule was an independent schedule of stimulus presentations in which the same stimulus was correlated with extinction. The total number of these extinction presentations was held constant, so that the overall percentage of stimulus presentations paired with food was also constant. The critical variable was the percentage of these extinction presentations that were assigned to one or the other choice alternative. To the extent that the stimulus had conditioned value, preference should be greater for the choice alternative followed most frequently by the conditioned reinforcer, regardless of whether those presentations led to food. On the other hand, if the controlling relationship was the signaling properties of the stimulus with respect to food, an opposite effect might be expected, because choice of the alternative that led to a greater number of stimulus presentations would mean that the subject was producing a higher frequency of periods of extinction.

METHOD

Subjects

Four male White King pigeons, maintained at 85% of their free-feeding weights, served. The duration of the food deliveries was adjusted for each subject during initial training to reduce the need for supplemental feedings. When necessary, supplemental feedings were given approximately 4 hr after the experimental sessions. Water and grit were available freely in the home cages. All subjects had prior experience with a concurrent-chains procedure.

Apparatus

Each subject was assigned to one of four experimental chambers. The chambers were cubes, 32 cm on a side, housed within wooden enclosures. In each, one side panel was a Plexiglas door; the remaining sides and ceiling were aluminum. There were three translucent response keys on the front panel. The keys were 2.5 cm in diameter and evenly separated, 6.0 cm apart, 24 cm above the grid floor. The keys could be transilluminated with various colors. The food-hopper opening was located 9 cm beneath the center key. When activated, the solenoid-operated hopper was illuminated by white light and allowed access to mixed grain. A houselight mounted in the center of the ceiling provided general chamber illumination except during operation of the hopper. Stimuli, contingencies, and data collection were controlled by an XT-compatible computer with Turbo Pascal® software.

Procedure

Food delivery was arranged on the concurrent-chains schedule shown in Figure 1. During the initial links, the two side keys were illuminated white. Entry to the terminal link was contingent on responses on the side keys. During a terminal link the side keys were dark and the center key was illuminated. The color on the center key, green or blue, differed across subjects. The initial links were VI 120-s schedules; the terminal link was fixed-interval (FI) 20 s. A VI 30-s schedule of center keylight presentations also operated during initial links, as represented on the right in Figure 1. Upon completion of each interval in the schedule, the terminal-link entry was assigned to follow the next response to one of the two side keys. The percentages of these added presentations of the center keylight assigned to each alternative were varied across conditions. In the first series of conditions, the added center keylight presentations did not end in food delivery but terminated automatically after 20 s. Otherwise they were identical to the terminal links from the basic schedule that did end in food.
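As a rough check on the schedule values just described, the nominal share of center keylight presentations that follow left-key responses can be worked out from the scheduled rates. The sketch below is illustrative only. It assumes (none of this is stated in the text) that the two VI 120-s initial links run as independent timers, that food-ending terminal-link entries are split evenly between the two keys, and that obtained rates equal scheduled rates. The function and constant names are hypothetical.

```python
# Scheduled rates per second of initial-link time.
# Assumption: obtained rates equal these scheduled rates.
FOOD_RATE_PER_KEY = 1 / 120   # one VI 120-s schedule on each side key
ADDED_RATE = 1 / 30           # single VI 30-s schedule of added presentations


def proportion_ending_in_food():
    """Nominal share of center keylight presentations that end in food
    when the added presentations end in extinction."""
    food_rate = 2 * FOOD_RATE_PER_KEY
    return food_rate / (food_rate + ADDED_RATE)


def proportion_following_left(p_left):
    """Nominal share of all center keylight presentations that follow a
    left-key response when a proportion p_left of the added presentations
    is assigned to the left key."""
    # Assumption: food-ending entries come equally from both keys.
    left_rate = FOOD_RATE_PER_KEY + p_left * ADDED_RATE
    total_rate = 2 * FOOD_RATE_PER_KEY + ADDED_RATE
    return left_rate / total_rate


if __name__ == "__main__":
    print(round(proportion_ending_in_food(), 3))
    for p in (0.20, 0.50, 0.80):
        print(p, round(proportion_following_left(p), 3))
```

Under these assumptions about one third of all center keylight presentations end in food, and assignment probabilities of .20 and .80 correspond to nominal left-key shares of .30 and .70 of all presentations.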
In all conditions, the terminal links and periods of extinction were inaccessible for 1.8 s following changeovers between alternatives (a changeover delay or COD). In the event that both a terminal-link entry and period of extinction were scheduled to follow an initial-link response, the selection of the outcome was random.

The duration of access to the food hopper and the number of food deliveries per session were adjusted on an individual basis to reduce the need for supplemental feeding. Prior work in this laboratory with schedules providing infrequent reinforcement suggested that responding is more likely to be maintained when feeding in the home cage is minimized. Hopper durations and the maximum number of food deliveries per session were 3 s and 60 per session for Subjects G42 and G59 and 4 s and 80 per session for Y69 and R90. These values were not manipulated during the course of the experimental conditions.

Experimental conditions. In the first condition, 50% of the extinction periods followed the left response and 50% followed the right response. The percentages of extinction periods following the two responses were varied over the next three conditions in an ABA pattern. First, 80% of the extinction periods, then 20%, then 80% again, followed responses on one alternative (left key for 2 birds, right key for the other 2).

The experimental conditions also included two control manipulations. For conditions .80-R and .20-R, the added presentations of the center keylight ended in food delivery (i.e., all presentations of the center keylight ended in food, including those scheduled by the VI 30-s schedule). For conditions .80-S and .20-S, the additional presentations of the center key terminal link again ended in extinction, but with a key color (blue or green) different from that used in the terminal links of the concurrent-chains food schedule. The final three conditions for each subject replicated the ABA comparison used in the first series of conditions in which the center keylight during extinction was identical to that used in the terminal links ending in food. The sequence of conditions for each subject is shown in Table 1.

Assessment of preference. The dependent variable was the relative rate of responding (preference) in the initial link to the left alternative. The number of responses on the initial-link stimulus on the left was divided by the number of responses on both initial-link stimuli. After 15 sessions (and for each session thereafter) in a condition, the relative response rates for the nine preceding sessions were divided into blocks of three sessions. Preference was considered stable when the block means (M) did not differ by more than 0.05 and showed neither an upward trend (M1 < M2 < M3) nor a downward trend (M1 > M2 > M3). All values reported are the means of the 9-day periods for which stability was achieved.

Fig. 1. Schedules of center keylight presentations. In the schedule on the left, the center keylight was correlated with the terminal links of the concurrent chains and each presentation of the center keylight ended in food delivery on an FI 20-s schedule. The schedule on the right was superimposed on the concurrent chains. Upon completion of a single VI schedule, the response contingency was assigned to one of the two side keys. In most conditions, the center keylight on this schedule did not end in food, terminating on a fixed-time (FT) 20-s schedule. The value of p was varied across conditions.

Table 1

The first four columns show the probability of assignment of the periods of extinction to the left alternative, the proportions of initial-link responses to the left, the proportion of food-reinforced terminal-link entries from the left, and the obtained proportion of extinction presentations following left responses. The remaining columns show absolute response rates during the initial link, during each terminal link of the basic food schedule, and during the extra terminal-link presentations normally correlated with extinction. The last column shows the number of sessions per condition.

                Proportions                 Response rate
Bird  p (left)  Response  Terminal EXT      Initial  Terminal Terminal EXT      EXT      Sessions
                          links             link     left     right    left     right
G59   .50       .49       .49      .50      61.8     34.4     30.7     29.2     31.0     18
      .20       .33       .50      .19      81.7     21.0     20.7     20.1     18.3     21
      .80       .72       .53      .81      93.2     23.7     28.5     24.0     32.4     28
      .20       .35       .48      .22      97.5     19.2     19.8     16.4     19.5     22
      .20-R     .21       .42      .21      40.3     27.0     32.1     30.6     28.5     34
      .80-R     .81       .54      .80      49.7     31.2     25.5     28.6     31.0     28
      .80-S     .53       .51      .79      63.9     45.1     49.6     1.2      1.8      34
      .20-S     .69       .53      .19      81.9     39.6     55.2     0.9      0.3      19
      .20       .43       .52      .22      44.6     35.1     38.7     33.6     32.7     27
      .80       .75       .54      .74      56.6     33.9     41.7     32.7     33.6     32
      .20       .18       .45      .20      97.1     36.9     35.4     27.9     39.0     30
G42   .50       .54       .51      .52      25.7     12.8     13.5     15.4     10.1     29
      .20       .27       .51      .20      48.4     21.0     16.2     15.6     14.4     18
      .80       .43       .53      .83      28.7     38.1     31.2     33.0     30.6     20
      .20       .35       .53      .19      22.5     44.7     33.6     42.3     32.4     21
      .20-S     .41       .51      .20      17.9     83.7     80.1     0.0      0.0      34
      .80-S     .40       .47      .82      24.3     78.0     89.5     0.0      0.0      35
      .80-R     .86       .56      .78      37.3     19.0     18.3     16.9     14.8     32
      .20-R     .22       .49      .18      46.7     15.9     12.9     16.5     12.0     30
      .20       .29       .50      .20      44.0     27.0     21.9     25.2     20.7     27
      .80       .55       .54      .83      26.7     31.2     22.5     24.9     21.0     31
      .20       .40       .51      .19      28.6     17.4     14.7     19.2     12.6     32
Y69   .50       .40       .47      .50      19.1     45.2     58.2     49.0     47.5     23
      .80       .51       .49      .76      30.8     74.6     74.8     67.5     78.8     22
      .20       .29       .42      .19      35.9     73.5     47.7     67.2     67.2     20
      .80       .33       .47      .75      33.8     80.7     77.3     84.5     85.1     26
      .80-S     .36       .43      .80      25.8     91.6     98.4     0.0      0.0      34
      .20-S     .39       .49      .21      29.2     81.0     89.2     0.0      0.0      27
      .20-R     .18       .39      .20      23.7     63.2     71.5     70.2     68.3     22
      .80-R     .49       .50      .79      15.2     72.6     54.6     82.8     70.8     20
      .80       .37       .42      .78      43.0     90.2     95.3     89.9     90.4     27
      .20       .21       .35      .17      54.3     83.4     94.8     93.0     86.4     29
      .80       .33       .38      .77      40.9     97.1     95.7     98.2     99.0     32
R90   .50       .43       .49      .51      26.8     17.6     14.0     15.8     20.1     25
      .80       .30       .44      .75      50.3     28.8     35.7     31.5     39.6     21
      .20       .15       .32      .18      46.4     11.4     12.6     10.8     9.3      21
      .80       .24       .35      .84      46.2     16.8     20.7     17.1     18.0     22
      .80-R     .74       .54      .81      38.9     10.2     9.6      7.8      7.5      23
      .20-R     .23       .44      .20      40.2     14.2     15.0     18.6     14.7     20
      .20-S     .42       .47      .21      47.3     30.3     28.8     0.0      0.0      26
      .80-S     .38       .47      .82      59.4     27.9     25.2     0.0      0.0      30
      .80       .41       .46      .80      31.3     55.8     54.5     49.3     56.4     23
      .20       .31       .41      .19      33.2     49.2     51.6     49.7     55.2     31
      .80       .45       .48      .79      30.0     29.4     30.2     34.0     26.3     32

RESULTS

The relative response rates in each condition for each subject are represented in Figure 2. In general, the results were similar for the two replications of the two presentations of the ABA manipulation of conditioned-reinforcement frequency, and for all birds preference was sensitive to the rate of conditioned reinforcement. Preference for the left key was below .50 for the majority of conditions regardless of the assignment of the extinction periods, indicating a response bias toward the right key (with the exception of G59). This bias was less evident during the initial condition in which the probability of the extinction periods was equal for the two choice alternatives, suggesting that the bias developed as a function of the asymmetry in the contingency that occurred in subsequent conditions. Regardless of this bias, preference for the left key generally increased when the frequency of the extinction presentations contingent on the left key was increased (the .80 condition) and decreased when their frequency of presentation was reduced (the .20 condition). Averaged over replications, preference for the left key when the probability of conditioned reinforcement assignment was .80 versus when it was .20 was .74 versus .32, .49 versus .33, .39 versus .25, and .35 versus .23
for Subjects G59, G42, Y69, and R90, respectively. Thus, all subjects preferred the left key more when a greater frequency of conditioned reinforcement was contingent on left responses, despite the fact that the added presentations of the conditioned reinforcer always ended in extinction.

Table 1 includes the absolute response rates during the terminal-link presentations, separated according to whether they ended in extinction or food, and as a function of whether they were presented following the left versus right choice alternative. It is possible that different contingencies during the center key presentations could be discriminated as a function of the relative frequency with which they were produced by the different choice alternatives. For example, a center key presentation in the .20 condition would have a higher probability of reinforcement following a left key choice than following a right key choice. Some discrimination did in fact occur. Averaged over both reinforced and nonreinforced terminal-link presentations, the mean response rate was 43.2 during the terminal link associated with the .20 probability and 41.3 during the terminal link associated with the .80 probability. Although this difference was very small, it was consistent for all 4 subjects. There was no difference in the response rate as a function of whether the terminal-link stimulus ended in food or no food; here the mean rate during reinforced terminal-link entries was 42.4 and that during nonreinforced terminal-link entries was 42.2.

Fig. 2. Preference for the left alternative across probabilities of added center keylight presentations contingent on a response to the left key. Data are from the last nine sessions of training. The length of the error bars corresponds to one standard deviation of the daily relative response rates across those sessions. The center keylight presentations were not followed by food delivery (EXT) in most conditions. In other conditions, the center keylight was followed by food (RFT) or a different key color signaled the period of extinction (SIG).

Figure 2 also shows the results from the control conditions. In one of these (labeled RFT), the center key presentations produced by the VI 30-s schedule were correlated with an FI 20-s food schedule, just as the regular terminal-link presentations were. To the extent that choice was controlled solely by the conditioned reinforcement value of the terminal-link stimulus, the degree of preference should have been the same as for the conditions with the same stimuli but with the extra terminal-link presentations ending in extinction. Alternatively, if the delayed food contingencies also contributed to the preference, preference should be more extreme when all terminal-link presentations ended in food. For all 4 subjects, the latter was the case. Averaged over all subjects, preference for the left key was .21 for the .20-R condition and .81 for the .80-R condition, in contrast to the values of .28 and .49 from the .20 and .80 conditions in which the extra center key presentations ended in extinction.

The second set of control conditions (labeled SIG) had the additional center key presentations ending in extinction, but these were correlated with a different key color than the terminal-link presentations ending in food. Table 1 shows that all subjects discriminated this contingency; response rates during the extinction periods were near zero and those during the food-reinforced terminal links were quite high. Thus, the additional center key presentations were effectively discriminated as periods of timeout. As shown in Figure 2, the effects of this contingency on the degree of preference were very small, because little difference occurred as a function of the percentage of timeout presentations. Nevertheless, the effect was consistent across subjects. All 4 subjects slightly preferred the choice alternative correlated with the smaller number of timeout presentations. Note that this punishment effect is in the opposite direction to the conditioned-reinforcement effect seen when the additional extinction presentations were correlated with the same key color as the terminal-link presentations ending in food.
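The preference measure and the stability criterion described under Assessment of preference in the Method can be sketched in code. This is a minimal illustration and not the authors' software: the function names are hypothetical, reading "did not differ by more than 0.05" as the range of the three block means is an assumption, and the requirement of at least 15 sessions in a condition is not enforced here.

```python
def relative_response_rate(left_responses, right_responses):
    """Preference for the left alternative: left initial-link responses
    divided by the responses on both initial links."""
    return left_responses / (left_responses + right_responses)


def is_stable(last_nine, tolerance=0.05):
    """Apply the stability criterion to the relative response rates of
    the nine most recent sessions, ordered oldest first."""
    if len(last_nine) != 9:
        raise ValueError("expected exactly nine sessions")
    # Means of three consecutive blocks of three sessions.
    m1, m2, m3 = (sum(last_nine[i:i + 3]) / 3 for i in (0, 3, 6))
    # Assumption: the 0.05 limit applies to the range of the block means.
    within_tolerance = max(m1, m2, m3) - min(m1, m2, m3) <= tolerance
    upward_trend = m1 < m2 < m3
    downward_trend = m1 > m2 > m3
    return within_tolerance and not upward_trend and not downward_trend


if __name__ == "__main__":
    # Hypothetical session data, not taken from the study.
    flat = [0.30, 0.32, 0.31, 0.33, 0.34, 0.32, 0.31, 0.30, 0.32]
    rising = [0.30] * 3 + [0.32] * 3 + [0.34] * 3
    print(is_stable(flat), is_stable(rising))
```

With the hypothetical data above, the flat series meets the criterion, whereas the rising series fails it because its block means increase monotonically even though they span less than 0.05.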

DISCUSSION

The present results support the view that the stimuli correlated with the terminal links of a concurrent-chains schedule have conditioned value: Preference in the initial link of the schedule consistently tracked the relative frequency of occurrence of the terminal-link stimuli. Of greatest importance, this change in preference occurred despite the fact that the additional stimulus presentations were correlated with extinction, which meant that the subjects chose the alternative that led to the greater frequency of timeout periods. Timeout per se was shown to be slightly aversive in the conditions in which the extra extinction periods were signaled by different stimuli (see also Dunn, 1990). The opposite pattern that was obtained when the extra stimulus presentations were the same as the regular terminal-link stimulus demonstrates that it is not the signal value of the terminal-link stimulus that determines its effect on choice. To the extent that the additional presentations of the terminal-link stimulus conveyed any information about the reinforcement consequences of a given choice, that information was that the response with more frequent terminal-link entry was correlated with more frequent presentations of extinction. The fact that preference was increased by these additional presentations of extinction can apparently be explained only in terms of the conditioned value of the terminal-link stimulus. Thus, the frequency of response-contingent conditioned reinforcers modulates choice in the same manner as has been demonstrated for primary reinforcers (e.g., Herrnstein, 1961).

The fact that preference is controlled by the relative frequency of terminal-link presentation, independent of the frequency of food presentation, may not be regarded by some as critical evidence that the terminal-link stimulus possessed conditioned value (Mazur, personal communication). According to this alternative perspective, the terminal-link stimulus may signal an upcoming reinforcer, which is probabilistic in occurrence and undifferentiated in probability as a function of the choice response, so that different numbers of terminal-link presentations are viewed by the subject as indicators of different numbers of food presentations.
Thus, it is not that the signal has value in its own right, but rather that it confuses the subject into learning a spurious correlation between choice and food. The difficulty with this account is that it requires a concept of "signal" that allows the subject to learn a correlation between choice and reinforcement probability that is exactly the opposite of the real contingencies. Given that the concept of "information" need no longer be constrained by the objective contingencies, it seems dubious that such a view can be distinguished empirically from the concept of conditioned value.

Given the clear effects obtained in the present study, some comment is appropriate regarding their conflict with the previous results of Schuster (1969), who found that choice was inversely related to the frequency of the putative conditioned reinforcers. One major difference between the two procedures was the schedule of the stimulus or conditioned-reinforcer presentations. Whereas here they were immediately contingent on choice behavior, in Schuster's study they were contingent on responding in the terminal link of the chain. A second difference was the overall percentage of stimulus presentations followed by food. Here that percentage was fixed, and only the relative allocation of the stimulus presentations to one or the other choice alternative was varied. In Schuster's presentation the stimuli were presented on an FR 11 schedule, which meant that both the number of stimulus presentations and their percentage of pairing with food varied with the subjects' response rates. Both of these variables are of potential importance for a variety of possible reasons (see above). In any event the present procedure leaves little doubt that choice is positively correlated with the frequency of stimulus presentation when these confounding effects are removed.

An interesting feature of the present results was the difference between the degree of preference for the additional stimulus presentations as a function of whether these additional presentations ended in food. Although preference tracked the relative frequency of stimulus presentations regardless of their outcome, the degree of preference was significantly greater when all stimulus presentations were followed by food. There are at least three plausible explanations for this difference. First, it is possible that the terminal-link presentations differed in value as a function of their associated probabilities of reinforcement.
As shown in Table 1, this assumption is partly supported by the small but consistent differences in terminal-link response rates as a function of the probability of presentation of the extinction periods. On average, response rates were two or three responses per minute lower when the terminal link was entered from the side associated with the .80 probability than when the terminal link was entered from the side with the .20 probability. Although this difference is very small, it may underestimate the difference in value of the terminal-link stimulus at the time of terminal-link entry, because these response rates were averaged over the entire terminal-link duration and it is likely that this discrimination was at its maximum just after terminal-link entry. Thus, the difference in terminal-link frequency would be compensated for to some degree by an effect on stimulus value that was in the opposite direction. The result should be an attenuation in preference for the choice alternative associated with the higher probability of terminal-link entry.

A second explanation is that the degree of preference varied with the overall value of the terminal-link stimulus. Because reinforcement followed all terminal-link entries in the RFT conditions, the terminal-link stimulus during those conditions was presumably of greater value than during the other conditions in which the majority of terminal-link entries ended in extinction. The difficulty with this explanation is that it is not obvious why the overall value should affect the degree of preference, given that the effects of reinforcement frequency and reinforcement magnitude (which presumably would be equivalent to stimulus value) are generally regarded to be independent (see Williams, 1988, for a review). Moreover, there are other data that show the opposite pattern of results, in that preference is enhanced by increasing the absolute duration of the terminal links while keeping their ratio constant (e.g., Williams & Fantino, 1978).

The most straightforward explanation for the stronger preferences with the .20-R and .80-R conditions is that choice may have been controlled not only by the conditioned reinforcement value of the terminal-link stimulus but also by the direct strengthening effect of delayed reinforcers.
That is, the different conditions were similar in the relative frequency of the same terminal-link stimulus but differed in terms of the occurrence of the delayed primary reinforcers. Such a possibility clearly explains the present pattern of results, but it requires delayed reinforcement effects over delays as long as 20 s. Evidence for such a possibility is provided by Williams and Fantino (1978), who demonstrated clear preference for the shorter of two long delays (e.g., 15 vs. 30 s) even when the same stimuli occurred in the intervening delay periods. Such results, combined with the present findings, suggest that delayed reinforcement effects extend over longer intervals than might be expected on the basis of research with unsignaled delay-of-reinforcement contingencies (Sizemore & Lattal, 1978; Williams, 1976), in which very short delays (e.g., 3 s) have reduced response rates by 75% to 80% from the baseline values with zero delay. Why choice procedures should produce such greater sensitivity to extended delay contingencies is unclear.

Finally to be considered are the implications of these results for models of choice behavior maintained by concurrent chains. Delay-reduction theory (Fantino, 1977; Fantino & Davison, 1983) fails because it omits any role for the frequency of conditioned reinforcement independent of differences in the relative frequency of primary reinforcement. Similarly, theories of choice based on timing theory (Gibbon, Church, Fairhurst, & Kacelnik, 1988) also fail because they omit any role for conditioned reinforcement entirely. Clearly, the present data contradict any account that assumes choice is controlled solely by the time to reinforcement, however that time is calculated. On the other hand, accounts that assume that choice is controlled solely by the conditioned-reinforcement value of the terminal-link stimuli (Vaughan, 1985) are also challenged by the present data because of the differences that were obtained as a function of whether or not the additional terminal-link presentations ended in food. As noted above, this difference suggests that the delayed effects of primary reinforcement should be considered separately from the immediate effects of conditioned reinforcement (although it is also clear that this distinction is not forced by the data, given the alternative explanations of the difference).
The only major theory of choice that explicitly distinguishes these different effects is that of Killeen (1982; see also Killeen & Fantino, 1990). All of the present data are at least qualitatively consistent with his account. Whether his theory can also predict the quantitative features of the present data remains to be seen.
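The point about delay-reduction theory can be illustrated with a minimal sketch. It assumes the Squires and Fantino (1971) form of the model, in which the predicted choice proportion depends only on the rates of primary reinforcement, the terminal-link durations, and the average time to primary reinforcement from initial-link onset. The function name and all numeric values are hypothetical and are not the schedule values of the present experiment; the sketch only shows that the model has no term through which the relative frequency of no-food stimulus presentations could act.

```python
def delay_reduction_preference(r_left, r_right, t_left, t_right, T):
    """Predicted proportion of initial-link responses to the left alternative.

    Assumed form (Squires & Fantino, 1971):
        r_left * (T - t_left) / (r_left * (T - t_left) + r_right * (T - t_right))

    r_left, r_right: overall rates of primary reinforcement on each alternative
    t_left, t_right: average terminal-link durations (s)
    T: average time to primary reinforcement from initial-link onset (s)
    """
    left = r_left * (T - t_left)
    right = r_right * (T - t_right)
    # A terminal link that signals no reduction in delay (t >= T) has no
    # conditioned reinforcing strength in this model.
    if left <= 0 and right <= 0:
        raise ValueError("neither terminal link signals a delay reduction")
    if right <= 0:
        return 1.0  # exclusive preference for the left alternative
    if left <= 0:
        return 0.0  # exclusive preference for the right alternative
    return left / (left + right)


# Hypothetical case: equal food rates and equal terminal-link durations.
# The share of no-food presentations assigned to the left alternative is
# varied, but it is not an argument of the model, so the prediction is fixed.
for share_no_food_left in (0.20, 0.50, 0.80):
    predicted = delay_reduction_preference(1.0, 1.0, 20.0, 20.0, 80.0)
    print(share_no_food_left, predicted)
```

Under these assumptions the predicted proportion stays at indifference for every assignment of the no-food presentations, whereas the obtained preferences tracked that assignment.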

REFERENCES

Dunn, R. (1990). Timeout from concurrent schedules. Journal of the Experimental Analysis of Behavior, 53, 163-174.
Dunn, R., Williams, B., & Royalty, P. (1987). Devaluation of stimuli contingent on choice: Evidence for conditioned reinforcement. Journal of the Experimental Analysis of Behavior, 48, 117-131.
Fantino, E. (1968). Effects of required rates of responding upon choice. Journal of the Experimental Analysis of Behavior, 11, 15-22.
Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313-339). Englewood Cliffs, NJ: Prentice-Hall.
Fantino, E., & Davison, M. (1983). Choice: Some quantitative relations. Journal of the Experimental Analysis of Behavior, 40, 1-13.
Gibbon, J., Church, R. M., Fairhurst, S., & Kacelnik, A. (1988). Scalar expectancy theory and choice between delayed rewards. Psychological Review, 95, 102-114.
Gollub, L. R. (1970). Information on conditioned reinforcement: A review of Conditioned Reinforcement, edited by Derek P. Hendry. Journal of the Experimental Analysis of Behavior, 14, 361-372.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
Killeen, P. R. (1982). Incentive theory: II. Models for choice. Journal of the Experimental Analysis of Behavior, 38, 217-232.
Killeen, P. R., & Fantino, E. (1990). Unification of models for choice between delayed reinforcers. Journal of the Experimental Analysis of Behavior, 53, 189-200.
Mazur, J. E. (1984). Tests of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes, 10, 426-436.
Mazur, J. E. (1985). Probability and delay of reinforcement as factors in discrete-trial choice. Journal of the Experimental Analysis of Behavior, 43, 341-351.
Mazur, J. E. (1986). Fixed and variable ratios and delays: Further tests of an equivalence rule. Journal of Experimental Psychology: Animal Behavior Processes, 12, 116-124.
Navarick, D. J., & Fantino, E. (1976). Self-control and general models of choice. Journal of Experimental Psychology: Animal Behavior Processes, 2, 75-87.
Schuster, R. H. (1969). A functional analysis of conditioned reinforcement. In D. P. Hendry (Ed.), Conditioned reinforcement (pp. 192-234). Homewood, IL: Dorsey Press.
Sizemore, O. J., & Lattal, K. A. (1978). Unsignalled delay of reinforcement in variable-interval schedules. Journal of the Experimental Analysis of Behavior, 30, 169-175.
Squires, N., & Fantino, E. (1971). A model for choice in simple concurrent and concurrent-chains schedules. Journal of the Experimental Analysis of Behavior, 15, 27-38.
Vaughan, W., Jr. (1985). Choice: A local analysis. Journal of the Experimental Analysis of Behavior, 43, 383-405.
Williams, B. A. (1976). The effects of unsignalled delayed reinforcement. Journal of the Experimental Analysis of Behavior, 26, 441-449.
Williams, B. A. (1988). Reinforcement, choice, and response strength. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens' handbook of experimental psychology: Vol. 2. Learning and cognition (2nd ed., pp. 167-244). New York: Wiley.
Williams, B. A., & Fantino, E. (1978). Effects on choice of reinforcement delay and conditioned reinforcement. Journal of the Experimental Analysis of Behavior, 29, 77-86.

Received July 6, 1990
Final acceptance September 14, 1990
