Cognitive Science 39 (2015) 457–495 Copyright © 2014 Cognitive Science Society, Inc. All rights reserved. ISSN: 0364-0213 print / 1551-6709 online DOI: 10.1111/cogs.12170

A Cognitive Model of Dynamic Cooperation With Varied Interdependency Information

Cleotilde Gonzalez (a), Noam Ben-Asher (a), Jolie M. Martin (a), Varun Dutt (b)

(a) Dynamic Decision Making Laboratory, Department of Social and Decision Sciences, Carnegie Mellon University
(b) School of Computing and Electrical Engineering, School of Humanities and Social Sciences, Indian Institute of Technology

Received 1 September 2013; received in revised form 12 December 2013; accepted 7 January 2014

Abstract

We analyze the dynamics of repeated interaction of two players in the Prisoner’s Dilemma (PD) under various levels of interdependency information and propose an instance-based learning cognitive model (IBL-PD) to explain how cooperation emerges over time. Six hypotheses are tested regarding how a player accounts for an opponent’s outcomes: the selfish hypothesis suggests ignoring information about the opponent and utilizing only the player’s own outcomes; the extreme fairness hypothesis weighs the player’s own and the opponent’s outcomes equally; the moderate fairness hypothesis weighs the opponent’s outcomes less than the player’s own outcomes to various extents; the linear increasing hypothesis increasingly weighs the opponent’s outcomes at a constant rate with repeated interactions; the hyperbolic discounting hypothesis increasingly and nonlinearly weighs the opponent’s outcomes over time; and the dynamic expectations hypothesis dynamically adjusts the weight a player gives to the opponent’s outcomes, according to the gap between the expected and the actual outcomes in each interaction. When players lack explicit feedback about their opponent’s choices and outcomes, results are consistent with the selfish hypothesis; however, when this information is made explicit, the best predictions result from the dynamic expectations hypothesis.

Keywords: Instance-based learning theory; Cognitive modeling; Prisoner’s dilemma; Cooperation; Social behavior; Interdependency information

Correspondence should be sent to Prof. Cleotilde Gonzalez, Dynamic Decision Making Laboratory, Department of Social and Decision Sciences, Carnegie Mellon University, 4609 Winthrop Street, Pittsburgh, PA 15213. E-mail: [email protected]


C. Gonzalez et al. / Cognitive Science 39 (2015)

1. Introduction

Our society relies on the idea that we can interact closely and amicably with other individuals, groups, and organizations in order to effectively achieve goals. For example, in the corporate world, the viability of business ventures requires managers and stakeholders to work together and cooperate. Cooperation within and across organizations is highly predictive of the creation and success of new business initiatives (Cable & Shane, 1997; Pinto, Pinto, & Prescott, 1993). Cooperation helps to build consensus and to implement important decisions (Gersick & Davis-Sacks, 1990; Pinto et al., 1993). In personal relationships, long-lasting friendships and marriages often require social bonds that are developed through close cooperation (Tallman & Hsiao, 2004); and in international policy, countries often develop coalitions with the expectation of cooperation to achieve mutual benefits (Bueno de Mesquita, 2006). Yet, in reality, conflicts are pervasive in human life across all these levels: Business ventures often fail due to the lack of cooperation between individuals, many marriages end in divorce, and many countries engage in conflicts that last many decades. How can we better understand the emergence and maintenance of cooperation between two players? Behavioral game theory (BGT) often describes real-life interactions between two players in terms of abstract games in which an individual may cooperate or compete with an opponent (Camerer, 2003). In contrast with classical economic models, BGT makes no assumptions about how correct beliefs about one another’s behavior are formed; rather, BGT explains how cooperation is impacted by social dimensions such as interpersonal relationships, past encounters, identities, and emotions (Camerer, 2003; Camerer & Fehr, 2006). There are, however, at least two factors that limit the use of BGT models for understanding the emergence of cooperation in real interactions.
First, most BGT results rely on one-time interactions. Although one-time interactions are relevant in our everyday decisions, cooperation often develops through repeated interactions. Research on learning that can explain the emergence, sustainability, and adaptability of cooperation is very scarce, and many existing learning models underplay the role of cognitive processes necessary to recognize, remember, adapt, and respond to one’s prior history of interactions with an opponent (Camerer, 2003). Second, most BGT studies rely on complete interdependency information (about possible actions and the resulting outcomes) being provided to the players (Camerer, 2003). However, we rarely have complete and accurate information regarding each actor’s possible actions and the corresponding outcomes in social interactions. Information has been regarded as an important factor in the emergence of cooperation (Camerer, 2003; Gonzalez & Martin, 2011; Martin, Gonzalez, Juvina, & Lebiere, 2013; Rapoport & Chammah, 1965), yet very little empirical research exists that systematically tests the effects of different amounts of information in repeated interactions (Martin et al., 2013; Rapoport & Chammah, 1965). This research advances our understanding of the emergence and maintenance of cooperation under different levels of information by analyzing sequential behavior and the ensuing dynamics of cooperation in a large data set involving repeated two-person interactions in the Prisoner’s Dilemma (PD) (Martin et al., 2013); proposing an instance-based learning cognitive model for repeated social interactions (IBL-PD) that builds on an IBL model of individual learning in repeated binary choice (Gonzalez & Dutt, 2011; Lejarraga, Dutt, & Gonzalez, 2012); proposing hypotheses of how a player weighs an opponent’s outcomes while making repeated choices over time; and testing such hypotheses by running simulations with IBL-PD and comparing the results to learning patterns in human data.

1.1. The Prisoner’s Dilemma, interdependency information, and main empirical phenomena

Social dilemmas are often represented with economic incentives that depend on the actions of two players. These situations, often referred to as 2 × 2 games (two players, each with two possible actions in each trial) (Rapoport, Guyer, & Gordon, 1976), include classic conundrums such as the PD. Fig. 1 shows a payoff matrix of the PD with two players, 1 and 2, each with two options: C (cooperate) and D (defect). Assuming that both players choose independently and simultaneously, mutual cooperation yields an equal outcome for each player (+1). If one player cooperates and the other defects, the player who defected receives a highly positive outcome (+10) and the one who cooperated receives a highly negative outcome (−10). If both players defect, each incurs an equal loss (−1). In a one-shot game, each individual’s outcome is greater from defecting than from cooperating, regardless of what action the other player takes (against a cooperator, +10 rather than +1; against a defector, −1 rather than −10). Mutual defection is the Nash equilibrium; however, mutual cooperation leads to a larger joint (social) outcome in the long term. In a repeated PD, where the same two individuals play the same game repeatedly over multiple trials (e.g., 200 trials), players face a tradeoff between the temptation of the short-term gain from defection and a potential long-term gain from mutual cooperation.
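The dominance argument above can be checked mechanically. The following Python sketch encodes the Fig. 1 payoffs (the dictionary layout and the function names are our own, hypothetical choices), confirms that defection strictly dominates cooperation for Player 1 (the game is symmetric, so the same holds for Player 2), and computes the joint outcome of repeating one action profile over 200 trials.

```python
# Fig. 1 payoffs: (Player 1 action, Player 2 action) -> (payoff to Player 1, payoff to Player 2)
PAYOFF = {
    ("D", "D"): (-1, -1),
    ("D", "C"): (10, -10),
    ("C", "D"): (-10, 10),
    ("C", "C"): (1, 1),
}
ACTIONS = ("C", "D")


def defection_dominates() -> bool:
    """True if D earns Player 1 strictly more than C, whatever Player 2 chooses."""
    return all(PAYOFF[("D", other)][0] > PAYOFF[("C", other)][0] for other in ACTIONS)


def joint_outcome(a1: str, a2: str, trials: int = 200) -> int:
    """Summed payoff of both players if the profile (a1, a2) is repeated on every trial."""
    p1, p2 = PAYOFF[(a1, a2)]
    return (p1 + p2) * trials


print(defection_dominates())    # True
print(joint_outcome("C", "C"))  # 400
print(joint_outcome("C", "D"))  # 0
print(joint_outcome("D", "D"))  # -400
```

The individually dominant choice (D) and the jointly best profile (C, C) disagree, which is exactly the tension the repeated game exposes.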
Fig. 1 payoff matrix (cells show Player 1 payoff, Player 2 payoff):

                        Player 2: D    Player 2: C
    Player 1: D         −1, −1         10, −10
    Player 1: C         −10, 10        1, 1

Fig. 1. Prisoner’s Dilemma payoff matrix. An example of a Prisoner’s Dilemma payoff matrix (Rapoport & Chammah, 1965) with Option D denoting defection and Option C denoting cooperation. The cells show a pair of outcomes (x, y), where x is the payoff to Player 1 and y is the payoff to Player 2. These outcomes remained constant across trials.

Overall, mutual cooperation leads to a larger joint outcome (e.g., joint outcome of +400 assuming that both players consistently cooperate in each trial over 200 trials) than do mixed actions (e.g., joint outcome of 0 assuming that both players consistently take opposite actions in each trial) or mutual defection (e.g., joint outcome of −400 assuming that both players consistently defect in each trial). Early studies of human behavior in the PD show some evidence of how information provided to players (i.e., the presence or absence of a payoff matrix) determines the amount of cooperation that emerges through repeated interaction (Rapoport & Chammah, 1965). Rapoport and Chammah (1965) expected that the matrix’s absence would result in sustained cooperation, given the consistent positive payoffs that both players would get from mutual cooperation (e.g., in Fig. 1, +1, +1), as opposed to the unstable mixed outcomes (CD and DC) or negative DD outcomes. They also expected that its presence would highlight the dominance of the defection strategy and inhibit cooperation. Their results showed exactly the opposite: The total amount of mutual cooperation observed when the matrix was explicitly provided to both players was greater than that observed when the matrix was concealed (Rapoport & Chammah, 1965). Presumably, the matrix served as a reminder of an implicit agreement that had to be made between the players to cooperate in the long term, regardless of the short-term equilibrium strategy from game theory (to defect). Given the great significance of Rapoport and Chammah’s (1965) results for real social interactions, it is puzzling that little empirical research has been devoted to investigating the effects of information content and format on cooperation in the PD and other social dilemmas. Most research conducted with social dilemmas utilizes “matrix presentation,” a condition where a representation of the players’ interdependency is provided, even when the game is repeated. Martin et al. (2013) report a large experiment conducted using the PD game in Fig.
1 to test the effects of four levels of information: Individual, where each player is unaware of their interdependence with another person and is shown only their own actions and outcomes (i.e., no payoff matrix is provided); Minimal, which builds on the Individual level by informing players of the existence of another person and of the influence that the player and their opponent have on one another’s outcomes, but without giving players explicit information about their opponent’s actions or outcomes; Experiential, which builds on the Minimal level by providing players with explicit information about their opponent’s actions and outcomes in each trial; and Descriptive, which builds on the Experiential level by providing the payoff matrix explicitly from the outset and throughout the repeated trials of the game. The data set reported in Martin et al. (2013) included 240 participants who were randomly paired to play the PD game for 200 repeated trials (the endpoint was unknown to participants). In each trial, each player selected one of the two actions and then received feedback about the actions and outcomes of the trial, including those of the opponent depending on the level of information. Thirty pairs (60 participants) were assigned to each of the four information conditions: Individual, Minimal, Experiential, and Descriptive. (These levels of information are called “No-Info,” “Min-Info,” “Mid-Info,” and “Max-Info,” respectively, in Martin et al., 2013. Please refer to their publication for a detailed description of the methods.) Their findings include a significant effect of the information condition on the average levels of individual and mutual cooperation, as well as on the trends of cooperation over repeated trials. Martin et al. (2013) found greater average levels of cooperation with higher levels of information.
They also found decreasing trends of cooperation in the Individual and Minimal conditions, and significantly increasing trends in the Experiential and Descriptive conditions, over the 200 trials.

Table 1
Summary of tests of the selfish hypothesis, w_t = 0, for t = 1 to 200

Each cell shows MSD (correlation) between model and human data over the 200 trials. The last four columns are the repetition propensities.

Hypotheses (parameters) / Information level | Proportion of cooperation | Mistrust (D after DD) | Forgiveness (C after CD) | Abuse (D after DC) | Trust (C after CC)

Selfish hypothesis, w_t = 0, for t = 1 to 200 (d = 5 and σ = 1.5)
Individual | 0.0191 (0.76*) | 0.0589 (0.77*) | 0.0070 (0.69*) | 0.0084 (0.45*) | 0.002 (0.75*)
Minimal | 0.0241 (0.62*) | 0.0772 (0.70*) | 0.0091 (0.73*) | 0.011 (0.47*) | 0.0025 (0.67*)
Experiential | 0.0671 (0.19*) | 0.1037 (0.45*) | 0.003 (0.75*) | 0.0032 (0.71*) | 0.0393 (0.06*)
Descriptive | 0.1355 (0.05) | 0.1527 (0.53*) | 0.0032 (0.75*) | 0.0019 (0.66*) | 0.1047 (0.07)

Selfish hypothesis, w_t = 0, for t = 1 to 200 (parameters d and σ calibrated to human data in the Individual and Minimal conditions)
Individual (d = 3.583, σ = 4.396) | 0.0043 (0.70*) | 0.0163 (0.70*) | 0.0051 (0.63*) | 0.002 (0.43*) | 0.0029 (0.73*)
Minimal (d = 0.631, σ = 1.713) | 0.0047 (0.61*) | 0.0098 (0.68*) | 0.0067 (0.67*) | 0.0024 (0.43*) | 0.0032 (0.66*)

Note. *Indicates significance at p < 0.05. The table shows a summary of MSD values and correlations over the 200 trials between the model data and human data in the four information levels (top section). These results are model generalizations, using parameter values (d = 5 and σ = 1.5) carried over from calibration to a different data set (Lejarraga et al., 2012). The bottom section shows results from calibrating the d and σ model parameters to the Individual and Minimal conditions.
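Table 1 summarizes each model-to-human comparison with two statistics computed over the per-trial values: the mean squared deviation (MSD), which is sensitive to the level of the curves, and a correlation, which is sensitive only to their shape over time. The Python sketch below shows one way to compute both for a pair of per-trial series; it assumes the inputs are already averaged across simulated and human players for each trial, and the function names are hypothetical.

```python
import math
from typing import Sequence


def msd(model: Sequence[float], human: Sequence[float]) -> float:
    """Mean squared deviation between two per-trial series of equal length."""
    if len(model) != len(human) or not model:
        raise ValueError("series must be non-empty and of equal length")
    return sum((m - h) ** 2 for m, h in zip(model, human)) / len(model)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two per-trial series (similarity of time trends)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative, made-up proportions of cooperation on three trials
model = [0.5, 0.4, 0.3]
human = [0.6, 0.4, 0.2]
print(round(msd(model, human), 4))        # 0.0067
print(round(pearson_r(model, human), 4))  # 1.0
```

In this toy example the trends are perfectly correlated even though the levels differ, which is why the two statistics are reported together.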


dependency information influenced the dynamics of cooperation in varied ways only after extended repeated interactions. Fig. 3 shows the proportions of sequential dependencies over the course of 200 trials for each of the four conditions. This complete set of sequential dependencies (repetition propensities) emerging from the sequential interactions was not reported in Martin et al. (2013), and it is needed to precisely characterize reciprocal behaviors in the PD (Rapoport et al., 1976). These propensities describe the player’s behavior in each trial (at the individual level) contingent upon the actions and the corresponding outcomes of both players in the preceding trial. Mistrust is the decision a player makes to defect (D) at time t, after both players mutually defected (DD) at time t − 1. It denotes a player’s unwillingness to shift away from mutual defection, probably due to the perceived risk that one’s opponent will continue to defect. Forgiveness refers to the decision to continue cooperating (C) at time t, even though mutual cooperation was not achieved due to the other’s defection (CD) at time t − 1. It denotes an attempt to teach cooperation by example, repeating an unreciprocated attempt to cooperate. Abuse refers to the decision to continue to defect (D) at time t after a profitable defection (DC) at t − 1, and denotes a player’s tendency to exploit their opponent. Trust is a decision to continue to cooperate at time t, after successful mutual cooperation (CC) at time t − 1. This measure denotes resistance to the temptation to maximize short-term gain and the overcoming of the fear of being exploited, and it might indicate that trust has been established between the two players.¹

Fig. 3. Repetition propensities over time. Each panel in the figure shows one type of sequential dependency: Mistrust, top left; Forgiveness, top right; Abuse, bottom left; and Trust, bottom right, per trial, for t = 1 to 200, for each of the four information conditions (Individual, Minimal, Experiential, and Descriptive).

Participants in the four information conditions differed significantly in their average proportions of Mistrust (F(3, 236) = 3.774, p = .011, η² = .046), Forgiveness (F(3, 236) = 2.649, p = .05, η² = .032), Abuse (F(3, 236) = 3.797, p = .011, η² = .046), and Trust (F(3, 236) = 16.1, p < .001, η² = .17). To tease these apart further, we first compare Mistrust, Forgiveness, Abuse, and Trust across pairs of conditions. Next, we examine the trends over time. In pair-wise comparisons we find no differences in Mistrust, Forgiveness, and Abuse except between the Individual and Descriptive conditions. The tests show no differences between the Individual and Minimal conditions: Mistrust, t(118) = 1.023, p = n.s.; Forgiveness, t(118) = 0.595, p = n.s.; Abuse, t(118) = 0.593, p = n.s. Similarly, there are no differences between the Individual and Experiential conditions: Mistrust, t(118) = 1.446, p = n.s.; Forgiveness, t(118) = 1.909, p = n.s.; Abuse, t(118) = 1.561, p = n.s. There are also no differences between the Experiential and Descriptive conditions: Mistrust, t(118) = 1.526, p = n.s.; Forgiveness, t(118) = 0.148, p = n.s.; Abuse, t(118) = .924, p = n.s. However, tests comparing the Individual and Descriptive conditions show significantly higher levels of Mistrust (t(118) = 3.085, p = .003) and Abuse (t(118) = 2.594, p = .012) in the Individual condition than in the Descriptive condition, as well as marginal differences in Forgiveness (t(118) = 1.737, p = .085).
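Each repetition propensity defined above is a conditional frequency: the proportion of trials on which a player repeats an action, given the joint outcome on the preceding trial. The Python sketch below assumes that a player’s and the opponent’s actions are coded as sequences of "C" and "D"; it pools all trials for a single player, whereas Fig. 3 plots per-trial proportions across players, so it illustrates the definitions rather than the exact analysis. The names are hypothetical.

```python
from typing import Dict, Optional, Sequence

# (own action at t-1, opponent action at t-1) -> (propensity, own action at t that counts)
PROPENSITIES = {
    ("D", "D"): ("Mistrust", "D"),
    ("C", "D"): ("Forgiveness", "C"),
    ("D", "C"): ("Abuse", "D"),
    ("C", "C"): ("Trust", "C"),
}


def repetition_propensities(own: Sequence[str], opp: Sequence[str]) -> Dict[str, Optional[float]]:
    """Proportion of repeats after each joint outcome; None if that outcome never occurred."""
    if len(own) != len(opp):
        raise ValueError("action sequences must have equal length")
    counts = {name: [0, 0] for name, _ in PROPENSITIES.values()}  # [repeats, opportunities]
    for t in range(1, len(own)):
        name, target = PROPENSITIES[(own[t - 1], opp[t - 1])]
        counts[name][1] += 1
        if own[t] == target:
            counts[name][0] += 1
    return {name: (hits / n if n else None) for name, (hits, n) in counts.items()}


print(repetition_propensities("CCDDC", "CDDCC"))
# {'Mistrust': 1.0, 'Forgiveness': 0.0, 'Abuse': 0.0, 'Trust': 1.0}
```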
Pair-wise comparisons of Trust across conditions, however, show a different pattern. There is no difference in Trust between the Individual and Minimal conditions, t(118) = .096, p = n.s. However, Trust is significantly lower in the Individual condition than in the Experiential condition (t(118) = 3.408, p < .001), significantly lower in the Experiential condition than in the Descriptive condition (t(118) = 2.187, p = .031), and significantly lower in the Individual condition than in the Descriptive condition (t(118) = 5.196, p < .001). Analyses of the trends over time show the following patterns. We find increasing Mistrust in the Individual condition (b = +0.002, t(198) = 11.48, p < .001) and in the Minimal condition (b = +0.0015, t(198) = 13.56, p < .001), but stable trends in the Experiential (b = .0001, t(198) = 1.38, n.s.) and Descriptive (b = .0001, t(198) = .778, n.s.) conditions. Forgiveness is stable in the Individual (b < .0001, t(198) = 1.505, n.s.) and Minimal (b < .0001, t(198) = .076, n.s.) conditions, but decreases significantly in the Experiential (b = −0.0003, t(198) = 5.435, p < .001) and Descriptive (b = −0.0002, t(198) = 4.268, p < .001) conditions. In all conditions, Abuse decreases significantly: Individual condition (b = −0.0001, t(198) = 2.223, p = .027), Minimal condition (b = −0.0002, t(198) = 3.044, p = .003), Experiential condition (b = −0.0005, t(198) = 8.858, p < .001), and Descriptive condition (b = −.0004, t(198) = 8.00, p < .001). Finally, Trust decreases over time in the Individual (b = −0.0005, t(198) = 9.181, p < .001) and Minimal (b = −0.0006, t(198) = 10.31, p < .001) conditions, but increases in the Experiential (b = +0.001, t(198) = 22.14, p < .001) and Descriptive (b = +0.001, t(198) = 29.47, p < .001) conditions.

These analyses show that the dynamics of cooperation differ considerably in the short term compared to the long term. Furthermore, the analyses of repetition propensities shed new light on the differences across levels of information. While Mistrust, Forgiveness, and Abuse show similar patterns across levels of information, Trust is greater for higher levels of information. The tests of trends over time show that Trust trends are very similar to the overall trends of cooperation: decreasing over trials in the Individual and Minimal conditions versus increasing in the Descriptive and Experiential conditions; and higher levels of information are also associated with decreasing Forgiveness and Abuse over the 200 trials.
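The trend statistics above are reported as a slope b with a t statistic on 198 degrees of freedom, which is what an ordinary least-squares regression of a 200-point per-trial series on trial number yields (df = 200 − 2). The Python sketch below assumes that is how the trends were computed; the source does not spell out the procedure, and the function name and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_trend(series: Sequence[float]) -> Tuple[float, float, int]:
    """OLS slope b of a per-trial series on trial t = 1..n, its t statistic, and df = n - 2."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three trials")
    xs = range(1, n + 1)
    mx = (n + 1) / 2
    my = sum(series) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, series))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, series))
    if sse == 0:
        raise ValueError("perfect linear fit: t statistic undefined")
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b, b / se_b, n - 2


# Hypothetical 200-trial series: slow decline plus alternating noise
series = [0.5 - 0.001 * t + (0.01 if t % 2 else -0.01) for t in range(1, 201)]
b, t_stat, df = linear_trend(series)
print(df)  # 198
```

Because t = b / SE(b), a decreasing trend gives a negative slope and a negative t statistic.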

2. Models of learning and the IBL-PD model

A large number of models in the BGT tradition make one common assumption: that players have full and complete information regarding the state of the environment, including the actions taken and the outcomes received by themselves and their opponents, as well as each player’s forgone payoffs (outcomes that would have been received had one chosen the other option). In a review of learning models, Camerer (2003; table 6.3, p. 272) concludes that most models rely on the full information assumption, and they would not be able to predict behavior in conditions where this information is not available. Also, most of these models make cognitively implausible assumptions, such as players being able to remember large amounts of information. Reinforcement learning models are those that need the least amount of information (just the player’s choices and payoffs, and the opponent’s choices and payoffs), and these models are also well founded in psychological research (Erev & Roth, 1998, 2001). Although these models have been successful in accounting for the dynamics of human behavior, there are some limits to what they can predict in the context of social dilemmas. For example, Erev and Roth (2001) noted that simple reinforcement learning models predicted the effect of experience in two-person games only in situations where players did not punish or reciprocate. However, behavioral data indicate that players do reciprocate, and the ability to punish and be punished influences behavior. Furthermore, such models predict a decrease in cooperation over time, while most behavioral experiments demonstrate an increase in cooperation due to the possibility of reciprocation (Rapoport & Chammah, 1965; Rapoport & Mowshowitz, 1966).
To account for reciprocity, Erev and Roth made two explicit modifications to their basic reinforcement model (Erev & Roth, 1998): If a player adopts a reciprocation strategy, the player will cooperate in the next trial only if the opponent cooperated in the current trial; and the probability that a player will continue with the same strategy depends on the number of times the reciprocation strategy was played in the past. Although these changes may accurately represent the kind of cognitive reasoning that people actually use in social dilemmas, they bypass learning from experience and are unlikely to generalize to other situations with different action sets or outcomes; most important, this updated model still assumes the presence of information regarding the opponent’s action and outcomes.

Similarly, the work of Axelrod and others (Axelrod, 1980, 1984; Axelrod & Hamilton, 1981; Rapoport et al., 1976) makes less stringent cognitive assumptions when explaining the evolution of cooperation. Axelrod’s influential work was founded on evolutionary biology and formal game theory, particularly related to the repeated PD and the dynamics of cooperation. He organized a tournament calling for the submission of computational models that would play the repeated PD against one another, using the history of past actions and outcomes to determine their next action (possibly by predicting what actions the other strategies would take). The winning model was the simplest one entered. It used a Tit-for-Tat strategy: It starts with cooperation and thereafter does whatever its opponent did in the previous trial. From Axelrod’s tournament, it is clear that although the dynamics of cooperation can be very complex, they can often be explained with very simple models of learning and adaptation. A similar conclusion was also reached from more recent tournaments of competing models of decisions from experience (see Erev, Ert, & Roth, 2010a; Erev, Ert, Roth, Haruvy, Herzog, Hau et al., 2010b). Although Tit-for-Tat is a simple strategy, it relies on the assumption that a player knows the opponent’s last action. Thus, this strategy cannot be applied in situations where such information is not available. Here we propose an IBL cognitive model to account for the behavioral phenomena described above in the PD, across differing levels of information.
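Tit-for-Tat’s dependence on the opponent’s last action becomes explicit when the strategy is written as a function. In this Python sketch (all names hypothetical), each strategy receives only the opponent’s previous action, or None on the first trial; a player in the Individual or Minimal condition would have no such value to pass in.

```python
from typing import Callable, List, Optional, Tuple

# A strategy maps the opponent's previous action ("C", "D", or None on trial 1) to an action.
Strategy = Callable[[Optional[str]], str]


def tit_for_tat(opponent_last: Optional[str]) -> str:
    """Cooperate on the first trial, then copy the opponent's previous action."""
    return "C" if opponent_last is None else opponent_last


def always_defect(opponent_last: Optional[str]) -> str:
    """A fixed strategy that ignores the opponent entirely."""
    return "D"


def play(s1: Strategy, s2: Strategy, trials: int) -> Tuple[str, str]:
    """Repeated simultaneous play; each side sees only the other's last action."""
    h1: List[str] = []
    h2: List[str] = []
    for _ in range(trials):
        a1 = s1(h2[-1] if h2 else None)
        a2 = s2(h1[-1] if h1 else None)
        h1.append(a1)
        h2.append(a2)
    return "".join(h1), "".join(h2)


print(play(tit_for_tat, always_defect, 5))  # ('CDDDD', 'DDDDD')
print(play(tit_for_tat, tit_for_tat, 3))    # ('CCC', 'CCC')
```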
The proposed model, IBL-PD, builds on a model of individual learning and decisions from experience in repeated binary choice (IBL model; Gonzalez & Dutt, 2011; Lejarraga et al., 2012), which we present next.

2.1. The IBL model of individual binary decisions from experience

The IBL model of binary choice is primarily concerned with individual learning processes determined by the information available to the model. The key representation of cognitive information in the IBL model is an instance (an experience stored in memory). Each instance is an action-outcome pair, represented in memory as an episode: [Action, Outcome].² When making a decision at time t, the IBL model selects the option with the highest blended value, which is a weighted average of experienced outcomes in instances that belong to a given option. The blended value V of option j at time t is defined as:

$$V_j = \sum_{i=1}^{n} p_{ij}\, x_{ij} \qquad (1)$$

where x_ij is the outcome stored in instance i for option j, and p_ij is the probability of retrieving instance i from memory for blending purposes. The variable n is the number of instances containing the different experienced outcomes for option j up to the last trial.
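Eq. 1 is a probability-weighted average over an option’s stored outcomes. The Python sketch below computes it and selects the option with the highest blended value. How the retrieval probabilities p_ij are derived (from memory activation in IBL theory) is outside this section, so here they are simply supplied as illustrative, hypothetical numbers; the function names are also hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def blended_value(outcomes: Sequence[float], retrieval_probs: Sequence[float]) -> float:
    """Eq. 1: V_j = sum over i of p_ij * x_ij for the instances of one option j."""
    if len(outcomes) != len(retrieval_probs):
        raise ValueError("one retrieval probability per instance is required")
    if not math.isclose(sum(retrieval_probs), 1.0):
        raise ValueError("retrieval probabilities for an option should sum to 1")
    return sum(p * x for p, x in zip(retrieval_probs, outcomes))


def choose(memory: Dict[str, Tuple[List[float], List[float]]]) -> str:
    """Select the option whose blended value is highest."""
    return max(memory, key=lambda option: blended_value(*memory[option]))


# Hypothetical memory: option -> (experienced outcomes, retrieval probabilities)
memory = {
    "D": ([10.0, -1.0], [0.3, 0.7]),   # V_D = 3.0 - 0.7 = 2.3
    "C": ([1.0, -10.0], [0.6, 0.4]),   # V_C = 0.6 - 4.0 = -3.4
}
print(choose(memory))  # D
```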


hypothesis (w_t = 1). When surprise is 1 at trial t (i.e., the opponent’s actions completely contradict the player’s expectations), w_t = 0 and the model reduces to the selfish hypothesis, which, as observed in Figs. 4 and 5, results in a low level of cooperation. When surprise is 0 at trial t (i.e., the opponent’s actions exactly meet the player’s expectations), w_t = 1 and the model reduces to the extreme fairness hypothesis, which results in rapid and high levels of cooperation over time (see Fig. 7). The key feature of this formulation is that w_t is adjusted dynamically on each trial, according to the surprise produced by the difference between the expected and the actual behavior of the opponent in that trial. Following past formulations (Erev et al., 2010a; Gonzalez et al., 2011), we define surprise at trial t as follows:

$$\mathrm{Surprise}_t = \frac{\mathrm{Gap}_t}{\mathrm{Mean}(\mathrm{Gap}_t) + \mathrm{Gap}_t} \qquad (8)$$

where the Gap at time t is defined by:

$$\mathrm{Gap}_t = \left| V_{j(t-1)} - (X_{ij} + O_{ij}) \right| \qquad (9)$$

The Gap is the absolute value of the difference between the expected utility of choosing option j, which is the blended value V_j from the previous time period (t − 1), and its actual joint outcome, which is the sum of the player’s outcome and the opponent’s outcome (X_ij + O_ij). The Mean Gap at time t is defined by assuming a horizon of 200 trials of repeated PD as follows:

$$\mathrm{Mean}(\mathrm{Gap}_t) = \mathrm{Mean}(\mathrm{Gap}_{t-1})\left(1 - \frac{1}{200}\right) + \mathrm{Gap}_t\left(\frac{1}{200}\right) \qquad (10)$$

Because Gap_t and Mean(Gap_t) are both nonnegative, the normalization in Eq. 8 ensures that Surprise_t lies between 0 and 1, and as a result, w_t also lies between 0 and 1. The important aspect of this mechanism is that it is dynamic and depends on the difference between the player’s expectations and the actual obtained outcomes (i.e., surprise) on each trial. The IBL-PD model with w_t defined by Eq. 7 was run for 100 simulated pairs of players in the Experiential condition. Fig. 11 (top panel) presents the resulting learning dynamics for the proportion of cooperation plotted against the human data in the Experiential condition. The bottom panels show the corresponding repetition propensities. The dynamic expectations hypothesis has a dramatic effect on the fit between human data and model predictions: the IBL-PD model matches the human dynamics of cooperation and the corresponding propensities very well. To explain the connection between the dynamics of cooperation and surprise, we calculated the average Gap_t (Eq. 9) and the average Mean(Gap_t) (Eq. 10) across all 100 model participants in each trial. Fig. 12 (top panel) shows the model’s predictions of the average gap between the expected and actual outcomes over the course


Fig. 11. Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200. Generalization results to the Experiential condition. Top panel shows the proportion of individual cooperation from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and σ = 1.5). Bottom panel shows the repetition propensities from the human data in the Experiential condition and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and σ = 1.5).
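The dynamic expectations mechanism (Eqs. 8–10, with w_t = 1 − Surprise_t as in the Fig. 11 caption) can be sketched as a small stateful update. Two details are assumptions not given in this section: the starting value of Mean(Gap), and the convention that surprise is 0 when both Gap_t and Mean(Gap_t) are 0, where Eq. 8 would otherwise be undefined. The class and argument names are hypothetical.

```python
class DynamicExpectations:
    """Sketch of Eqs. 8-10: w_t = 1 - Surprise_t, driven by expected vs. joint outcomes."""

    def __init__(self, horizon: int = 200, initial_mean_gap: float = 0.0) -> None:
        self.horizon = horizon            # Eq. 10 assumes a 200-trial horizon
        self.mean_gap = initial_mean_gap  # assumption: starting value not given in this section

    def update(self, expected_value: float, own_outcome: float, opponent_outcome: float) -> float:
        """Return w_t for the current trial and update Mean(Gap)."""
        # Eq. 9: distance between V_j(t-1) and the actual joint outcome X_ij + O_ij
        gap = abs(expected_value - (own_outcome + opponent_outcome))
        # Eq. 10: running mean that gives the newest gap a weight of 1/horizon
        self.mean_gap = self.mean_gap * (1 - 1 / self.horizon) + gap / self.horizon
        # Eq. 8: normalized surprise in [0, 1]; assumption: 0 when both terms are 0
        denom = self.mean_gap + gap
        surprise = gap / denom if denom > 0 else 0.0
        return 1.0 - surprise


model = DynamicExpectations(initial_mean_gap=1.0)
print(model.update(expected_value=2.0, own_outcome=1, opponent_outcome=1))  # 1.0 (no gap)
w = model.update(expected_value=2.0, own_outcome=-10, opponent_outcome=10)  # joint 0, gap 2
print(round(w, 3))  # 0.333
```

A met expectation pushes w_t toward 1 (extreme fairness), while a large unexpected gap pushes it toward 0 (selfish).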

The results from fitting the IBL-PD model’s parameters to each of the two conditions are shown in Fig. 14. The model fits human behavior in the Experiential condition extremely well. The fits to the Descriptive condition, although very good, still show some remaining deviations from human data that need to be resolved. Several possibilities to


Notes
1. Mistrust, Forgiveness, Abuse, and Trust are called Alpha, Beta, Gamma, and Delta, respectively, in Rapoport et al. (1976).
2. Oftentimes (as in social conflict situations), the attributes that define the current situation are trivial and do not change throughout the interaction. Thus, the instance representation simplifies to [Action, Outcome] and disregards the situation attributes.

References
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, J. R., & Lebiere, C. (2003). The Newell test for a theory of mind. Behavioral and Brain Sciences, 26(5), 587–639.
Axelrod, R. (1980). Effective choice in the Prisoner’s Dilemma. Journal of Conflict Resolution, 24(1), 3–25.
Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211, 1390–1396.
Balliet, D., Parks, C., & Joireman, J. (2009). Social value orientation and cooperation in social dilemmas: A meta-analysis. Group Processes & Intergroup Relations, 12(4), 533–547.
Ben-Asher, N., Dutt, V., & Gonzalez, C. (2013). Accounting for the integration of descriptive and experiential information in a repeated Prisoner’s Dilemma using an instance-based learning model. Ottawa, Canada: Behavior Representation in Modeling and Simulation (BRiMS).
Ben-Asher, N., Lebiere, C., Oltramari, A., & Gonzalez, C. (2013). Balancing fairness and efficiency in repeated societal interaction. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 175–180). Austin, TX: Cognitive Science Society.
Bereby-Meyer, Y., & Roth, A. E. (2006). The speed of learning in noisy games: Partial reinforcement and the sustainability of cooperation. The American Economic Review, 96(4), 1029–1042.
Bueno de Mesquita, B. (2006). Game theory, political economy, and the evolving study of war and peace. American Political Science Review, 100(4), 637–642.
Cable, D. M., & Shane, S. (1997). A prisoner’s dilemma approach to entrepreneur-venture capitalist relationships. Academy of Management Review, 22(1), 142–176.
Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. Princeton, NJ: Princeton University Press.
Camerer, C. F., & Fehr, E. (2006). When does “economic man” dominate social behavior? Science, 311(5757), 47–52.
Cohen, R. L. (1982). Perceiving justice: An attributional perspective. In J. Greenberg & R. L. Cohen (Eds.), Equity and justice in social behavior (pp. 119–160). New York: Academic Press.
Erev, I., Ert, E., & Roth, A. E. (2010a). A choice prediction competition for market entry games: An introduction. Games, 1(2), 117–136.
Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S., Hau, R., Hertwig, R., Stewart, T., West, R., & Lebiere, C. (2010b). A choice prediction competition for choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47.
Erev, I., & Haruvy, E. (2013). Learning and the economics of small decisions. In J. H. Kagel & A. E. Roth (Eds.), The handbook of experimental economics (Vol. 2). Princeton, NJ: Princeton University Press.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88(4), 848–881.

C. Gonzalez et al. / Cognitive Science 39 (2015)


parameter values (d = 5 and r = 1.5) obtained from calibration to the data set in Lejarraga et al. (2012) (hereafter referred to as the default parameters). Then, for each level of information, we present results from the best-fitting IBL parameters to evaluate how well the model can account for any remaining differences from human behavior. To assess the accuracy of the model’s predictions, we calculated the mean squared deviation (MSD) between the model and human values of the average dependent measure (e.g., the average proportion of cooperation per trial), and we used the Pearson correlation coefficient (r) to assess the similarity of the time trends in the model and human data. Both measures were computed for the proportion of individual cooperation and for the repetition propensities mentioned earlier (i.e., Mistrust, Forgiveness, Abuse, and Trust).

3.1. Selfish hypothesis

SVO research provides a framework for characterizing w, which is often based on archetypal, fixed motivations for the weight given to the opponent’s outcomes (Murphy & Ackermann, 2013). The selfish hypothesis (w = 0 over all trials) is expected to explain behavior in the Individual and Minimal conditions, because players in these conditions did not receive explicit information about the opponent’s outcomes and actions. Players are therefore expected to use only their own outcomes when evaluating which choice to make. In contrast, when explicit interdependency information is available, as in the Experiential and Descriptive conditions, players are expected to take that information into account when evaluating their choices, so the selfish hypothesis should provide a poorer account of their behavior. The IBL-PD model with default parameters and w = 0 over 200 trials was run for 100 simulated pairs of players. Fig. 4 illustrates the proportion of cooperation for the IBL-PD model compared to human data over the course of 200 trials for each of the four levels of information, and Fig. 5 presents the learning dynamics for the corresponding repetition propensities. The IBL-PD model under the selfish hypothesis closely predicts the dynamics of cooperation and the repetition propensities in the Individual and Minimal information conditions. Given no knowledge of the opponent’s outcome, players use only their own outcome when evaluating their decisions, which increasingly reinforces defection over cooperation. The decreasing trend in cooperation over time produced by the selfish hypothesis results from the model’s attempt to maximize its own individual gains (i.e., blended values) through the blending equation (Eq. 4 with w = 0). The [D, 10] instances may be retrieved and reinforced more often because the blended value of the “D” option is higher than the blended value of the “C” option. Furthermore, experiences (i.e., instances) of [C, −10] contribute to an aversion to cooperation. Because both players act under the same reasoning and expectations, given that both lack information about each other, Mistrust (D after DD) increases over time. As observed in the human data, the Minimal information level did not elicit different behavior from the Individual level of information: knowing that there exist dependencies


Fig. 4. Selfish hypothesis, w_t = 0. Generalization results: proportion of cooperation. The figure shows the proportion of individual cooperation from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5). Each panel shows the human and model results for one of the four levels of interdependency information: Individual, Minimal, Experiential, and Descriptive conditions.

with another person, without any additional information about that person’s actions and outcomes, did not alter human behavior. The model’s predictions also reveal that the selfish hypothesis does not explain behavior in the Experiential and Descriptive conditions. That is, once explicit information about the opponent’s actions and outcomes is available, players use it and move away from the predictions of selfish behavior. As seen in Table 1 (top section), the MSD for the proportion of cooperation increases with the amount of interdependency information, suggesting that the IBL-PD model under the selfish hypothesis makes poorer predictions the more information players have about the opponent. The correlations over trials likewise indicate good predictions in the Individual and Minimal conditions but poor predictions (with negative correlations) in the Experiential and Descriptive conditions.

3.1.1. Fitting the IBL-PD model parameters to the Individual and Minimal conditions

As seen in Figs. 4 and 5, with the default parameters (d = 5, r = 1.5) the predicted proportion of cooperation and the predicted repetition propensities, mainly Trust and Mistrust, deviate slightly from human behavior in the Individual and Minimal conditions. To determine whether the model can account for these differences, we calibrated the IBL-PD parameters (d and r) in the Individual and Minimal conditions so as to minimize the MSD between the observed and the predicted proportion of cooperation at each trial. We used a Genetic Algorithm to calibrate the values


Fig. 5. Selfish hypothesis, w_t = 0. Generalization results: repetition propensities. The figure shows the repetition propensities from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5). Each of the four panels shows one repetition propensity measure (Mistrust, Forgiveness, Abuse, or Trust), and within each panel the graphs show the four levels of interdependency information: Individual, Minimal, Experiential, and Descriptive conditions.

of the two free parameters, with d varying between 0.01 and 20 and r varying between 0.01 and 10. Each iteration of the optimization process simulated 100 pairs of players. The same values of d and r were used for all pairs and for both players within a pair, and the parameter values did not change during the 200 trials. After 100 pairs of players had been simulated with a given parameter set, the simulation was evaluated by computing the MSD between the average proportion of cooperation across the 100 simulated pairs and the average proportion of human cooperation in each of the 200 trials. The results from fitting the IBL-PD model’s parameters are shown in Fig. 6 and Table 1 (bottom section). After its parameters (d and r) were fitted to each of the two data sets, the IBL-PD model under the selfish hypothesis accounts extremely well for the dynamics of cooperation and the repetition propensities in both the Individual and Minimal information conditions. The resulting parameter values

Table 1
Summary of tests of the selfish hypothesis, w_t = 0, for t = 1 to 200

| Hypothesis (parameters) | Information level | Proportion of cooperation, MSD (r2) | Mistrust (D after DD), MSD (r2) | Forgiveness (C after CD), MSD (r2) | Abuse (D after DC), MSD (r2) | Trust (C after CC), MSD (r2) |
|---|---|---|---|---|---|---|
| Selfish hypothesis, w_t = 0, for t = 1 to 200 (d = 5 and r = 1.5) | Individual | 0.0191 (0.76*) | 0.0589 (0.77*) | 0.0070 (0.69*) | 0.0084 (0.45*) | 0.002 (0.75*) |
| | Minimal | 0.0241 (0.62*) | 0.0772 (0.70*) | 0.0091 (0.73*) | 0.011 (0.47*) | 0.0025 (0.67*) |
| | Experiential | 0.0671 (0.19*) | 0.1037 (0.45*) | 0.003 (0.75*) | 0.0032 (0.71*) | 0.0393 (0.06*) |
| | Descriptive | 0.1355 (0.05) | 0.1527 (0.53*) | 0.0032 (0.75*) | 0.0019 (0.66*) | 0.1047 (0.07) |
| Selfish hypothesis, w_t = 0, for t = 1 to 200; parameters d and r calibrated to human data in the Individual and Minimal conditions | Individual (d = 3.583, r = 4.396) | 0.0043 (0.70*) | 0.0163 (0.70*) | 0.0051 (0.63*) | 0.002 (0.43*) | 0.0029 (0.73*) |
| | Minimal (d = 0.631, r = 1.713) | 0.0047 (0.61*) | 0.0098 (0.68*) | 0.0067 (0.67*) | 0.0024 (0.43*) | 0.0032 (0.66*) |

Note. *Indicates significance at p < 0.05. The table summarizes the MSD values and correlations over the 200 trials between model and human data in the four information levels (top section). These results are model generalizations, using parameter values (d = 5 and r = 1.5) carried over from calibration to a different data set (Lejarraga et al., 2012). The bottom section shows results from calibrating the d and r model parameters to the Individual and Minimal conditions.


result in higher levels of noise and greater reliance on more recent information in the Individual condition than in the Minimal condition. Perhaps the additional information provided in the Minimal condition, namely that players were interdependent with a human opponent, created a need to rely more consistently on information from further in the past. In any case, the different parameter values yield accurate accounts of the proportion of cooperation and the repetition propensities in these two conditions (see Table 1, bottom section). The following hypotheses, in which w_t differs from zero, are relevant only to the Experiential and Descriptive conditions. We first test these hypotheses in the Experiential condition and then make a final comparison to behavior in the Descriptive information condition.

3.2. Extreme fairness hypothesis

Fairness is often defined as equality in an exchange, and conflicts are assumed to emerge more often when exchanges between two entities are unequal (Molm, 2003). Indeed, the more an actor attributes responsibility for an inequality to another person, the more unfair that person’s behavior is expected to be perceived (Cohen, 1982; Molm, 2003). Some argue that humans are sensitive to fairness because forming secure social bonds is fundamental to survival (Ben-Asher, Lebiere, Oltramari, & Gonzalez, 2013; Tabibnia, Satpute, & Lieberman, 2008). The extreme fairness hypothesis assumes that players weight the opponent’s outcomes equally with their own in all trials (w = 1). The IBL-PD model with default parameters and w = 1 over 200 trials was run for 100 simulated pairs of players. Fig. 7 (top panel) illustrates the proportion of cooperation for the IBL-PD model compared to human data over the course of 200 trials, and Fig. 7 (bottom panel) presents the learning dynamics for the corresponding repetition propensities against human behavior in the Experiential condition.
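Why w matters can be illustrated with a static comparison of weighted payoffs, in which each outcome is valued as the player’s own payoff plus w times the opponent’s payoff. The minimal sketch below assumes the PD payoffs described in the text for the hyperbolic discounting hypothesis (Section 3.5): +10 and −10 for the mixed outcomes, and +1 or −1 per player for mutual cooperation and mutual defection. The names `PAYOFFS`, `weighted_table`, and `dominant_choice` are illustrative, and the code does not simulate the IBL-PD model’s memory, retrieval, or blending.

```python
from typing import Dict, Optional, Tuple

# Assumed PD payoffs as (player's payoff, opponent's payoff), indexed by
# (player's choice, opponent's choice): +10/-10 for the mixed outcomes and
# +1/-1 per player for mutual cooperation/defection.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}


def weighted_outcome(own: float, other: float, w: float) -> float:
    """Value of an outcome: own payoff plus w times the opponent's payoff."""
    return own + w * other


def weighted_table(w: float) -> Dict[Tuple[str, str], float]:
    """Weighted value of every cell of the payoff matrix for a given w."""
    return {cell: weighted_outcome(x, o, w) for cell, (x, o) in PAYOFFS.items()}


def dominant_choice(w: float) -> Optional[str]:
    """Return 'C' or 'D' if that choice is strictly better against both of the
    opponent's choices under weight w, or None if neither dominates."""
    table = weighted_table(w)
    if all(table[("C", opp)] > table[("D", opp)] for opp in ("C", "D")):
        return "C"
    if all(table[("D", opp)] > table[("C", opp)] for opp in ("C", "D")):
        return "D"
    return None


if __name__ == "__main__":
    for w in (0.0, 1.0):
        print(w, dominant_choice(w), weighted_table(w))
```

Under these assumed payoffs, both comparisons favor defection whenever w < 9/11 (about 0.82) and favor cooperation whenever w > 9/11, so the selfish (w = 0) and extreme fairness (w = 1) settings fall on opposite sides of that threshold. This is only a one-shot comparison; in the model, choices emerge from blended experience over trials.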
Under the extreme fairness hypothesis, the IBL-PD model is much more altruistic, and thus less concerned with exploitation, than actual humans in the Experiential condition: it generates a rapid increase in cooperation that reaches an optimal level by the end of the 200 trials. The relatively high MSD values indicate poor predictions of the overall proportion of cooperation and of the repetition propensities (Mistrust and Trust in particular). The dynamics of cooperation (correlation values) also suggest that extreme fairness does not account for human behavior in the Experiential condition. In fact, this hypothesis worsens the model’s predictions compared to the selfish hypothesis (see the top section of Table 2). In an ideal world, if each player maximized the sum of their own and their opponent’s outcomes, weighting them equally, both would reach near-optimal cooperation levels rather quickly. However, humans did not behave in this way.

Fig. 6. Selfish hypothesis, w_t = 0. Best fit to human data. Results from fitting the IBL-PD model’s parameters to the Individual information condition (d = 3.583 and r = 4.396) and to the Minimal information condition (d = 0.631 and r = 1.173). Top panels show the proportion of individual cooperation in the Individual and Minimal information conditions. Bottom panels show the corresponding repetition propensities.

3.3. Moderate fairness hypothesis

Measures of SVO often rely on extreme values of w_t, as in the selfish and extreme fairness hypotheses. However, the increasing trend in cooperation over time produced by extreme fairness indicates that people are unlikely to treat their own and the opponent’s outcomes equally in the PD, so intermediate values of w may be more plausible. To determine the constant value of w that best represents the human data, we fitted w to the individual proportion of cooperation in the Experiential information condition. The IBL model parameters, d and r, were kept at their default values (generalized from Lejarraga et al., 2012). The value of w was varied between 0.001 and 0.990 in steps of 0.001, with the goal of minimizing the MSD between the model’s data and the observed human behavior. This calibration resulted in w = .706, with an MSD of .015. The IBL-PD model with w_t = .706 over 200 trials was then run for 100 simulated pairs of players, using the default values of d and r. Fig. 8 (top panel) illustrates the proportion of cooperation for the IBL-PD model compared to human data over the course of 200 trials, and Fig. 8 (bottom panels) presents the learning dynamics for the corresponding repetition propensities against human behavior in the Experiential condition. Although these results clearly improve on the extreme fairness and selfish hypotheses (see also Table 2), the model does not fully


Fig. 7. Extreme fairness hypothesis, w_t = 1. Generalization results for the Experiential condition. Top panel shows the proportion of individual cooperation from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5) in the Experiential condition. Bottom panel shows the repetition propensities from the human data in the Experiential condition and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5).

capture the trends of cooperation and Trust over trials (these correlations are significant and negative). Whereas humans initially decrease cooperation and then learn to increase it with more practice, the IBL-PD model under moderate fairness continues to decrease cooperation over repeated trials.
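The moderate fairness calibration described above is a one-dimensional grid search: each candidate w (0.001 to 0.990 in steps of 0.001) is scored by the MSD between the model’s and the humans’ per-trial proportion of cooperation, while Pearson’s r measures how well the time trends agree. A minimal sketch follows; `simulate` is a hypothetical stand-in for running 100 simulated IBL-PD pairs with a given w and returning the average proportion of cooperation per trial, which is not reproduced here.

```python
import math
from typing import Callable, Sequence, Tuple


def msd(model: Sequence[float], human: Sequence[float]) -> float:
    """Mean squared deviation between two per-trial series of equal length."""
    if len(model) != len(human) or len(model) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum((m - h) ** 2 for m, h in zip(model, human)) / len(model)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two per-trial series (trend similarity)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def grid_search_w(
    simulate: Callable[[float], Sequence[float]],
    human: Sequence[float],
    lo: float = 0.001,
    hi: float = 0.990,
    step: float = 0.001,
) -> Tuple[float, float]:
    """Return (best w, its MSD) over the grid lo, lo + step, ..., hi."""
    n_steps = round((hi - lo) / step)
    best_w, best_score = lo, math.inf
    for i in range(n_steps + 1):
        w = round(lo + i * step, 6)  # avoid accumulating float error
        score = msd(simulate(w), human)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score


if __name__ == "__main__":
    # Toy stand-in for the IBL-PD simulation (hypothetical, for illustration only).
    def toy_simulate(w: float) -> list:
        return [w * t / 200 for t in range(1, 201)]

    target = toy_simulate(0.706)
    print(grid_search_w(toy_simulate, target))
```

With the toy simulator, the search recovers the w used to generate the target series; with the real model, each grid point would require a full set of simulated pairs, which is why the step size matters for cost.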

Experiential information
Selfish hypothesis, w_t = 0, for t = 1 to 200 (d = 5 and r = 1.5)
Extreme fairness hypothesis, w_t = 1, for t = 1 to 200 (d = 5 and r = 1.5)
Moderate fairness hypothesis, w_t = 0.706 (d = 5 and r = 1.5)
Linear increasing hypothesis, w_t = 0.246 + 0.0029 × t (d = 5 and r = 1.5)
Hyperbolic discounting hypothesis, w_t = 1/[1 + k(200 − t)] for t = 1 to 200, with k fitted to 0.028 (d = 5 and r = 1.5)
Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200 (d = 5 and r = 1.5)
Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200; parameters d = 3.56 and r = 1.41 were calibrated to human data in the Experiential condition

Descriptive information
Selfish hypothesis, w_t = 0, for t = 1 to 200 (d = 5 and r = 1.5)
Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200 (d = 5 and r = 1.5)

Hypotheses (w_t function and IBL model parameters)

1

0.0756 (0.44*)

0.0368 (0.35*)

0.0104 (0.51*)

0.0581 (0.18*)

0.022 (0.69*)

0.004 (0.63*)

0.0169 (0.47*)

0.0325 (0.55*)

0.0275 (0.32*)

0.0129 (0.75)

0.0223 (0.29*)

0.015 (0.43*)

0.1527 (0.53*)

0.2673 (0.06)

0.359 (0.18*)

0.1355 (0.05)

0.1037 (0.45*)

Mistrust D DD MSD (r2)

0.0671 (0.19*)

Proportion of Cooperation MSD (r2)

0.0027 (0.72*)

0.0032 (0.75*)

0.002 (0.69*)

0.0025 (0.72*)

0.0029 (0.68*)

0.0028 (0.75*)

0.0025 (0.77*)

0.0011 (0.70*)

0.003 (0.75*)

Forgiveness C CD MSD (r2)

Repetition Propensities

Table 2 Summary of results in Experiential condition (top section) and the Descriptive condition (bottom section)

0.0020 (0.60*)

0.0019 (0.66*)

0.0023 (0.51*)

0.0018 (0.61*)

0.0025 (0.63*)

0.002 (0.62*)

0.0018 (0.61*)

0.0104 (0.63*)

0.0032 (0.71*)

Abuse D DC MSD (r2)

(continued)

0.0135 (0.91*)

0.1047 (0.07)

0.0041 (0.85*)

0.018 (0.86*)

0.0397 (0.18*)

0.0187 (0.59*)

0.0119 (0.21*)

0.3432 (0.62*)

0.0393 (0.06*)

Trust C CC MSD (r2)


Mistrust D DD MSD (r2) 0.0192 (0.64*)

Proportion of Cooperation MSD (r2) 0.0181 (0.73*)

0.0029 (0.73*)

Forgiveness C CD MSD (r2)

Repetition Propensities

0.0015 (0.67*)

Abuse D DC MSD (r2)

0.0111 (0.91*)

Trust C CC MSD (r2)

Note. *Indicates significance at p < 0.05. The table summarizes the MSD values and correlations over the 200 trials between the model and human data. For completeness, the selfish hypothesis results for the Experiential and Descriptive information conditions are repeated from Table 1 (top row in each section).

Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200. Parameters d = 2.04 and r = 0.49 were calibrated to human data in the Descriptive condition

Hypotheses (wt function and IBL model parameters)

1

Table 2. (continued)


3.4. Linear increasing hypothesis

Recent conclusions on the measurement of SVO suggest that a static, constant w may be inadequate, particularly for addressing preference change over time and with different amounts of information (Murphy & Ackermann, 2013). From this point on, we consider preferences that change over time as a function of repeated interactions with an opponent. As a first step, we consider functions that represent a player’s improving understanding, over time, that the joint outcomes depend on the actions of both players. In line with the behavioral findings, cooperation emerges gradually as a result of repeated interactions. This improved understanding of the benefits of joint outcomes may take different functional forms. The first one we consider is a simple linear increasing function. The linear increasing hypothesis suggests that a player increases consideration for the opponent’s outcomes at a constant rate over repeated interactions. We used a genetic algorithm to find the best-fitting linear function, varying the slope and intercept between −1 and 1 to minimize the MSD of the proportion of cooperation between the model and human data. The resulting linear equation was:

$$w_t = 0.246 + 0.0029 \times t \tag{5}$$
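Evaluating Eq. 5 at the first and last trials shows the range of weights this fit implies:

```latex
w_{1} = 0.246 + 0.0029 \times 1 = 0.2489, \qquad
w_{200} = 0.246 + 0.0029 \times 200 = 0.826
```

Under this fit, the weight on the opponent’s outcome grows by 0.0029 per trial and never reaches the extreme fairness value of w = 1 within the 200 trials.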

The IBL-PD model with default parameters and w defined by Eq. 5 was run for 100 simulated pairs of players. Fig. 9 (top panel) illustrates the proportion of cooperation for the IBL-PD model compared to human data in the Experiential condition over the course of 200 trials, and Fig. 9 (bottom panels) presents the learning dynamics for the corresponding repetition propensities. The results show a substantial improvement in accounting for human behavior compared to the previous hypotheses (see also Table 2). Yet the best linear function does not capture the shape and trends of cooperation and Trust over time: the increase in the proportion of cooperation occurs too late in the model compared to the human data, and the model’s Trust propensity does not reproduce the increase in Trust shown in the human data.

3.5. Hyperbolic discounting hypothesis

As the results above suggest, the assumption that w increases linearly, at a constant rate, over time may be too stringent to represent human behavior well. A more plausible assumption is that the long-term benefits of mutual cooperation and the short-term benefits of defection are not weighted equally in the human mind. According to research on hyperbolic discounting, when given two similar rewards, humans prefer the one that arrives sooner rather than later (Thaler, 1981). In the case of the PD rewards (Fig. 1), because the incentive to defect is larger than the incentive to cooperate, and the benefits of cooperation arrive only after long-term interaction, the initial preference for defection is very strong. The hyperbolic discounting function has been used successfully in many studies to model human preferences that


Fig. 8. Moderate fairness hypothesis, w_t = 0.706. Generalization results for the Experiential condition. Top panel shows the proportion of individual cooperation from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5). Bottom panel shows the repetition propensities from the human data in the Experiential condition and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5).

emerge over time (Frederick et al., 2002). The function captures how humans discount later rewards by a factor that increases with the delay before the reward is obtained.


Fig. 9. Linear increasing hypothesis, w_t = 0.246 + 0.0029 × t, for t = 1 to 200. Generalization results for the Experiential condition. Top panel shows the proportion of individual cooperation from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5). Bottom panel shows the repetition propensities from the human data in the Experiential condition and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5).

Under the hyperbolic discounting hypothesis, w may take a functional form where the weight for the opponent’s outcome in the Blending mechanism (Eq. 4) increases nonlinearly over time as follows:

$$w_t = \frac{1}{1 + k\,(200 - t)} \tag{6}$$

Here k is a parameter governing the degree of discounting. To evaluate the plausibility of this functional form of w, we first determined the best value of k by fitting it to the human data in the Experiential information condition, keeping the other parameters, d and r, at their default values. The goal of the optimization was to maximize the fit between the cooperation rates of the IBL-PD model and of the human data at each trial: k was varied between 0 and 5 in steps of 0.001 to find the value that minimized the MSD between the model’s predictions and the observed human behavior. Calibrating k on the Experiential condition resulted in k = 0.028, with an MSD of 0.0581. The IBL-PD model with w_t defined by Eq. 6 was run for 100 simulated pairs of players. Fig. 10 (top panel) illustrates the proportion of cooperation for the IBL-PD model compared to human data in the Experiential condition over the course of 200 trials, and Fig. 10 (bottom panels) presents the learning dynamics for the corresponding repetition propensities. These results do not support the hyperbolic discounting hypothesis. Under this assumption, the weight given to the opponent’s outcome is small for most of the game, and the IBL-PD model decreases its proportion of cooperation over time. There is then a sudden transition toward cooperation near the end, produced by the greater weight given to the opponent’s outcomes as the end of the game approaches. According to Eq. 6, w_t approaches 1 toward the end of the interaction. The weighted value of the mixed outcomes is therefore close to zero (for the mixed decisions, CD and DC, one player gets +10 and the other gets −10). The model’s decisions then depend mostly on the weighted sum of outcomes from mutual cooperation, which is always positive and close to 2, and from mutual defection, which is always negative and close to −2.
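The shape of Eq. 6 with the fitted k = 0.028 can be checked numerically. The minimal sketch below evaluates w_t at a few trials and the weighted joint values at t = 200. It assumes, as described above, mixed payoffs of +10 and −10 and mutual payoffs of +1 or −1 per player, and it treats the weighted value of an outcome as the player’s own payoff plus w_t times the opponent’s payoff (an assumption about the input to Eq. 4, which is not reproduced here). The function names are illustrative.

```python
K = 0.028       # discounting parameter fitted to the Experiential data (from the text)
HORIZON = 200   # number of trials in the repeated PD


def w_hyperbolic(t: int, k: float = K, horizon: int = HORIZON) -> float:
    """Eq. 6: weight on the opponent's outcome at trial t (1-based)."""
    if not 1 <= t <= horizon:
        raise ValueError("t must be between 1 and horizon")
    return 1.0 / (1.0 + k * (horizon - t))


def weighted(own: float, other: float, w: float) -> float:
    """Own payoff plus w times the opponent's payoff (assumed weighting)."""
    return own + w * other


if __name__ == "__main__":
    for t in (1, 50, 100, 150, 190, 200):
        print(f"t = {t:3d}: w_t = {w_hyperbolic(t):.3f}")
    w_end = w_hyperbolic(HORIZON)
    # Assumed payoffs: CC = (1, 1), DD = (-1, -1), mixed = (10, -10)
    print("CC:", weighted(1, 1, w_end), "DD:", weighted(-1, -1, w_end),
          "DC:", weighted(10, -10, w_end))
```

With these values, w_t is about 0.152 at t = 1 and 0.263 at t = 100, rises to about 0.417 at t = 150 and 0.781 at t = 190, and reaches 1 at t = 200, where the weighted values are 2 for CC, −2 for DD, and 0 for the mixed outcomes.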
Given that mutual cooperation leads to positive outcomes and mutual defection leads to negative outcomes, both players increasingly prefer to cooperate simultaneously. The human transition toward cooperation happens earlier and is more gradual than this IBL-PD model predicts. The MSD values indicate poor fits for the proportion of cooperation and the repetition propensities (Mistrust and Trust in particular). Abuse appears to be on target with the human data, but Forgiveness and Mistrust are not; in particular, the model’s Mistrust is higher than that observed in humans.

3.6. Dynamic expectations hypothesis

Although many studies support hyperbolic discounting as a good description of behavior when short- and long-term rewards trade off, a large number of studies also suggest that a constant degree of discounting (i.e., a constant k) produces behavioral patterns that are often contradicted by human data (Frederick et al., 2002). Rather than following a single fixed functional form, w_t may be dynamic, taking values in each trial that depend on individual experiences, actions, and expectations regarding the opponent. A


Fig. 10. Hyperbolic discounting hypothesis, w_t = 1/[1 + k(200 − t)] for t = 1 to 200, with k fitted to 0.028. Generalization results for the Experiential condition. Top panel shows the proportion of individual cooperation from the human data and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5). Bottom panel shows the repetition propensities from the human data in the Experiential condition and the results from the IBL-PD model from t = 1 to 200 with parameters (d = 5 and r = 1.5).

sophisticated player might dynamically adjust his or her consideration for an opponent’s outcomes according not only to the number of interactions but also to the opponent’s previous actions. Humans are dynamic and often adjust their decisions to the circumstances of the environment, forming their own individual expectations and acting accordingly (Gonzalez et al., 2003). Understanding how people’s preferences change and


the factors that influence interdependence is an extremely important challenge in social dilemmas that has only started to be investigated (Murphy & Ackermann, 2013). The dynamic expectations hypothesis proposes that the weight a player gives to the opponent’s outcomes is adjusted as a function of the expected and the actual outcomes obtained. The difference between the expected outcomes and those actually obtained is deemed a “surprise,” a concept recently introduced in the IBL model and related cognitive models to explain individual behavior in a market entry task (Erev et al., 2010a; Gonzalez, Dutt, & Lejarraga, 2011; Nevo & Erev, 2012). Surprise is often defined as disconfirmed expectations, and the degree of surprise is typically assumed to vary inversely with how strongly an outcome was expected (Maguire, Maguire, & Keane, 2011). For example, the level of surprise experienced has been found to relate to the uncertainty of an event and to the difficulty of integrating it with existing information (Maguire et al., 2011). In the repeated PD game, periods of low surprise may involve reciprocity: the player expects the opponent to defect or to cooperate, and the opponent does what the player expects. As demonstrated in our analyses of sequential dependencies, the trends over time of Mistrust (defection after both players defected in the previous trial) are stable, whereas the trends of Trust (cooperation after both players cooperated in the previous trial) increase over time in the Experiential and Descriptive information conditions. This suggests that the player’s ability to predict the opponent’s reciprocal cooperation accentuates the long-term benefits of mutual cooperation. In this way, a player would behave more “fairly” during periods of low surprise, expecting mutual cooperation.
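The four repetition propensities (Mistrust and Trust as defined above, Forgiveness and Abuse as defined in the next paragraph) are conditional proportions: given the joint outcome of the previous trial, how often does the player make a particular choice? The minimal sketch below pools them over one player’s whole choice sequence; the analyses here track them per trial across pairs, so this pooled version and the names `PROPENSITIES` and `repetition_propensity` are illustrative assumptions.

```python
from typing import Optional, Sequence

# Propensity name -> (player's previous choice, opponent's previous choice,
#                     player's current choice that counts toward the propensity)
PROPENSITIES = {
    "Mistrust": ("D", "D", "D"),     # D after DD
    "Forgiveness": ("C", "D", "C"),  # C after CD
    "Abuse": ("D", "C", "D"),        # D after DC
    "Trust": ("C", "C", "C"),        # C after CC
}


def repetition_propensity(
    own: Sequence[str], other: Sequence[str], name: str
) -> Optional[float]:
    """Among trials whose previous trial had the named joint outcome, the
    proportion in which the player made the named choice; None if that
    joint outcome never occurs before the last trial."""
    if len(own) != len(other):
        raise ValueError("choice sequences must have equal length")
    prev_own, prev_other, choice = PROPENSITIES[name]
    eligible = [
        t for t in range(1, len(own))
        if own[t - 1] == prev_own and other[t - 1] == prev_other
    ]
    if not eligible:
        return None
    return sum(own[t] == choice for t in eligible) / len(eligible)


if __name__ == "__main__":
    player, opponent = "CCDDC", "CDDCC"
    for name in PROPENSITIES:
        print(name, repetition_propensity(player, opponent, name))
```

In the example, each joint outcome occurs once before the final trial, so every propensity is either 0 or 1; with 200 trials and many pairs the proportions become graded.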
On the other hand, periods of high surprise may involve a lack of reciprocity, which could be positive (the player defects and the opponent cooperates) or negative (the player cooperates and the opponent defects). However, as shown in the analyses of sequential dependencies, the trends of Forgiveness (cooperation after the player cooperated and the opponent defected in the previous trial) and of Abuse (defection after the player defected and the opponent cooperated in the previous trial) both decrease significantly over time in the Experiential and Descriptive conditions. Thus, the player’s inability to predict the opponent’s actions (i.e., surprising actions) would decrease over time through repeated interaction with the same opponent. Players would become able to predict each other’s actions more accurately over time, and lower surprise would allow cooperation to emerge as the players gain a better understanding of each other’s motivations and a better ability to interpret their mutual dependencies. Under the dynamic expectations hypothesis, a player accounts for the opponent’s outcome as a function of a normalized gap between the expected and the actual outcomes (i.e., surprise). We assume that the value of w_t (the weight on the opponent’s outcome at trial t) is reduced by surprise as follows:

$$w_t = 1 - \text{Surprise}_t \tag{7}$$

This formulation assumes that the way a player accounts for the opponent’s outcomes varies between two extremes, the selfish hypothesis (w_t = 0) and the extreme fairness hypothesis (w_t = 1). When surprise is 1 at trial t (i.e., the opponent’s actions are exactly contrary to the player’s expectations), w_t = 0, which reduces to the selfish hypothesis formulation and, as observed in Figs. 4 and 5, results in a low level of cooperation. When surprise is zero at trial t (i.e., the opponent’s actions exactly meet the player’s expectations), w_t = 1, which reduces to the extreme fairness hypothesis formulation and results in rapid, high cooperation over time (see Fig. 7). The key aspect of our formulation is that w_t is adjusted dynamically in each trial, according to the surprise resulting from the difference between the expected and the actual behavior of the opponent in that trial. Following past formulations (Erev et al., 2010a; Gonzalez et al., 2011), we define surprise at trial t as:

$$\text{Surprise}_t = \frac{\text{Gap}_t}{\text{Mean}(\text{Gap}_t) + \text{Gap}_t} \tag{8}$$

where the Gap at time t is defined by:

$$\text{Gap}_t = \left| V_j(t-1) - \left(X_{ij} + O_{ij}\right) \right| \tag{9}$$
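Eqs. 7–9, together with the running mean of the gap given in Eq. 10 just below, amount to a small per-trial update. The minimal sketch below rests on stated assumptions: the caller supplies the expected value (the blended value V_j from trial t − 1 for the chosen option) and the two realized outcomes; the starting value of Mean(Gap) is set to 0, which the text does not specify; and surprise is taken as 0 when both the gap and its running mean are 0, to avoid division by zero. The class name `SurpriseTracker` is hypothetical.

```python
from dataclasses import dataclass

HORIZON = 200  # horizon used for the running mean of the gap (Eq. 10)


@dataclass
class SurpriseTracker:
    """Tracks Mean(Gap) across trials and returns w_t = 1 - Surprise_t."""

    mean_gap: float = 0.0  # assumed starting value; not specified in the text

    def update(self, expected: float, own: float, other: float) -> float:
        gap = abs(expected - (own + other))                                # Eq. 9
        self.mean_gap = self.mean_gap * (1 - 1 / HORIZON) + gap / HORIZON  # Eq. 10
        denom = self.mean_gap + gap
        surprise = gap / denom if denom > 0 else 0.0                       # Eq. 8
        return 1.0 - surprise                                              # Eq. 7


if __name__ == "__main__":
    tracker = SurpriseTracker()
    # Expectations start at the prepopulated value of +30 and move toward the
    # joint outcome of mutual cooperation (1 + 1 = 2), so the gap shrinks.
    for expected in (30, 12, 7, 4, 3):
        print(round(tracker.update(expected, own=1, other=1), 4))
```

As the gap shrinks from 28 to 1 over these five illustrative trials, the returned w_t rises monotonically (from about 0.005 to about 0.18), in the same direction as the decreasing surprise described for Fig. 12.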

The Gap is the absolute value of the difference between the expected utility of choosing option j, which is the blended value V_j from the previous time period (t − 1), and its actual joint outcome, which is the sum of the player’s outcome and the opponent’s outcome (X_ij + O_ij). The Mean Gap at time t is defined by assuming a horizon of 200 trials of repeated PD, as follows:

$$\text{Mean}(\text{Gap}_t) = \text{Mean}(\text{Gap}_{t-1})\left(1 - \frac{1}{200}\right) + \text{Gap}_t\,\frac{1}{200} \tag{10}$$

The normalization of surprise in Eq. 8 ensures that Surprise_t lies between 0 and 1, and as a result w_t also lies between 0 and 1. The important aspect of this mechanism is that it is dynamic and depends, on each trial, on the difference between the player’s expectations and the outcomes actually obtained (i.e., surprise).

The IBL-PD model with w_t defined by Eq. 7 was run for 100 simulated pairs of players in the Experiential condition. Fig. 11 (top panel) presents the resulting learning dynamics for the proportion of cooperation plotted against the human data in the Experiential condition, and the bottom panels show the corresponding repetition propensities. The dynamic expectations hypothesis has a dramatic effect on the fit between human data and model predictions: the IBL-PD model matches the human dynamics of cooperation and the corresponding propensities very well. To explain the connection between the dynamics of cooperation and surprise, we calculated the average Gap_t (Eq. 9) and the average Mean(Gap_t) (Eq. 10) across all 100 model participants in each trial. Fig. 12 (top panel) shows the model’s predictions of the average gap between the expected and actual outcomes over the course

of the 200 trials. The middle panel shows the average Mean(Gap_t) over the same 200 trials. The Gap decreased quickly during the first few trials. Initially, expectations were based on the values stored in the model’s prepopulated instances (+30), which do not correspond to any actual outcome; with even a little experience, expectations were adjusted toward the actual outcomes within a relatively small number of trials. After this major adaptation, and with the accumulation of experiential information, the Gap continued to decrease, although less dramatically than at first: the two simulated players adjust their expectations to each other’s actions and to the corresponding observed outcomes in a relatively steady process. The bottom panel in Fig. 12 illustrates the average surprise over the 200 trials (Eq. 8). As expected, surprise decreases as experience with the same opponent accumulates, implying that the weight given to the opponent’s outcome increases with experience (according to Eq. 7). The decrease in surprise (and thus the increase in w_t) is driven by the decrease in Gap_t and the increase in Mean(Gap_t) presented above. As the information gathered through repeated interaction is used to draw inferences about the mutual dependencies between the two players, surprise decreases and the weight that the opponent’s outcomes receive in the selection of strategy increases. This, in turn, highlights the benefits of mutual cooperation compared to mutual defection. Thus, the dynamics of surprise eventually influence the model’s tendency to cooperate.

3.6.1. Fitting the IBL model parameters to data in the Experiential and Descriptive conditions

Fig. 13 compares the generalization of the IBL-PD model under the dynamic expectations hypothesis with the human data in the Descriptive information condition.
These results suggest that the same model also accounts for behavior in the Descriptive information condition. This generalization was achieved with no structural changes to the model used for the Experiential condition. The results for the Experiential and Descriptive conditions are encouraging and support the robustness of the IBL model across different levels of information. Yet they also reveal some differences between the model’s predictions and human behavior. In the Experiential condition, Fig. 11 shows that the rate of increase in human cooperation (and in Mistrust, for example) stabilizes in the final trials, while the model continues an increasing trend over the last trials. In the Descriptive condition, Fig. 13 suggests that presenting the payoff matrix together with experiential information may push humans toward cooperation: the behavioral data show a higher initial level of cooperation in the Descriptive than in the Experiential condition. To evaluate whether the model can account for these remaining differences, we calibrated the IBL model’s parameters (d and r) separately for the Experiential and the Descriptive conditions. The calibration process was identical to that performed for the selfish hypothesis in the Individual and Minimal conditions.

Fig. 11. Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200. Generalization results to the Experiential condition. Top panel shows the proportion of individual cooperation in the human data and the results from the IBL-PD model from t = 1 to 200 with parameters d = 5 and r = 1.5. Bottom panel shows the repetition propensities in the human data in the Experiential condition and the results from the IBL-PD model from t = 1 to 200 with the same parameters.
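The repetition propensities in the bottom panels relate a player’s next choice to the previous joint outcome. A minimal sketch follows, assuming a propensity is the share of trials on which a player repeats its previous action after a given joint outcome; Trust is then the probability of cooperating again after mutual cooperation, as described in the Discussion. The function name and the toy action sequences are hypothetical, and the sketch pools one player’s whole sequence rather than computing the per-trial curves shown in the figures.

```python
from typing import Optional, Sequence


def repetition_propensity(own: Sequence[str], opp: Sequence[str],
                          prev_own: str, prev_opp: str) -> Optional[float]:
    """Share of trials on which a player repeats its previous action,
    among trials that follow the joint outcome (prev_own, prev_opp).

    Actions are coded "C" (cooperate) or "D" (defect). Returns None when
    the conditioning joint outcome never occurs.
    """
    if len(own) != len(opp):
        raise ValueError("action sequences must have equal length")
    total = repeats = 0
    for t in range(1, len(own)):
        if own[t - 1] == prev_own and opp[t - 1] == prev_opp:
            total += 1
            if own[t] == own[t - 1]:
                repeats += 1
    return repeats / total if total else None


# Trust: cooperating again after mutual cooperation (hypothetical data).
player = "CCDCC"
opponent = "CCCCD"
trust = repetition_propensity(player, opponent, "C", "C")  # 2 / 3
```

Mutual cooperation occurs three times before the last trial here, and the player cooperates again after two of them, which gives a Trust value of 2/3.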

The results from fitting the IBL-PD model’s parameters to each of the two conditions are shown in Fig. 14. The model fits human behavior in the Experiential condition extremely well. The fits to the Descriptive condition, although very good, still show some deviations from the human data that need to be resolved. Several possibilities for resolving these remaining differences in the Descriptive condition are addressed in the Discussion section. Numerical results from the different hypotheses are summarized in Tables 1 and 2. Overall, the best accounts of the behavioral data are obtained with the selfish hypothesis in the Individual and Minimal conditions, and with the dynamic expectations hypothesis in the Experiential and Descriptive conditions.

Fig. 12. Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200: surprise values and the Gap. Top panels show Gap_t and Mean(Gap_t) from the IBL-PD model for t = 1 to 200. Bottom panel shows the surprise values from the IBL-PD model in the Experiential condition.

Fig. 13. Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200. Generalization results to the Descriptive condition. Top panel shows the proportion of individual cooperation in the human data and the results from the IBL-PD model from t = 1 to 200 with parameters d = 5 and r = 1.5. Bottom panel shows the repetition propensities in the human data in the Descriptive condition and the results from the IBL-PD model from t = 1 to 200 with the same parameters.

4. Discussion and conclusions

This research advances our understanding of the socio-cognitive mechanisms involved in the emergence of cooperation and trust in situations that are representative of real-life interactions with various levels of interdependency information. Building on behavioral findings from a large data set collected in a well-known social dilemma, the PD (Martin et al., 2013), we demonstrate that interdependency information has different effects on the dynamics of interaction in the short and the long term. While cooperation decreases in all information conditions in the short term, it emerges and increases in the long term when more explicit information about the opponent is available. Our analyses also show interesting differential effects of interdependency information on the patterns of repetition propensities over time. Most noteworthy are the trends
found in the Trust repetition propensities over time, which closely mirror the dynamics of cooperation. Trust (and cooperation) decreases over time with less interdependency information and increases over time with more interdependency information. These trends suggest that to obtain long-term benefits, players need to establish trust, that is, an expectation that the opponent will continue to cooperate given that both players cooperated in previous trials.

Fig. 14. Dynamic expectations hypothesis, w_t = 1 − Surprise_t, for t = 1 to 200. Best fits to human data in the Experiential and Descriptive conditions. Panels show results from fitting the IBL-PD model’s parameters to each of the two conditions: d = 3.56 and r = 1.41 in the Experiential condition, and d = 2.04 and r = 0.49 in the Descriptive condition. Top panels show the proportion of individual cooperation in the human data and in the IBL-PD model fitted to the Experiential condition (left) and to the Descriptive condition (right). Bottom panels show the corresponding repetition propensities in the human data and in the IBL-PD model fitted to the Experiential condition (left) and to the Descriptive condition (right).

To explain these behavioral trends, we proposed a cognitive model (IBL-PD) that builds on a model of individual decisions from experience in binary choice tasks (Gonzalez & Dutt, 2011; Lejarraga et al., 2012). The IBL-PD model relies on the IBLT idea of maximizing rewards (here, joint outcomes) (Gonzalez et al., 2003) and on the concept of SVO, in which the opponent’s outcome is weighted relative to one’s own outcome (Balliet et al., 2009; Fiedler et al., 2013; Murphy & Ackermann, 2013). Not only does the current IBL-PD model provide a plausible computational integration of SVO into a cognitive learning theory, but it also formalizes a fundamental connection between SVO and dynamic learning processes based on expectations and actual behavior (i.e., surprise). It is important to highlight that the cognitive mechanisms of the IBL-PD model (activation, retrieval probability, and blending) are those used for dynamic decision making at the individual level. Thus, this research demonstrates both the generality of the IBL model and the robustness of the theory (Gonzalez et al., 2003). Our results show very accurate generalizations from the individual IBL mechanisms. For example, the selfish hypothesis, where w_t = 0, is equivalent to the Blending equation in the model used in individual binary choice studies (Gonzalez & Dutt, 2011; Lejarraga et al., 2012). The IBL Blending mechanism represents the player’s expected utility, which was expanded to account for the opponent’s outcome as well as the player’s own outcome. The SVO assumption is that an opponent’s outcome receives some weight (w) during the decision-making process. When w_t = 0, each player learns from experience with the objective of maximizing individual gains only. Predictions from the IBL-PD model under the selfish hypothesis show trends of cooperation and repetition propensities that closely resemble human behavior in the Individual and Minimal levels of information. The decreasing trend in cooperation results from attempts to maximize individual gains, which reinforce defection, the more personally rewarding choice in any given trial.
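A minimal sketch of how the Blending equation can carry the weight w. It assumes the standard IBL retrieval probability (a softmax over activations with temperature τ = σ√2; Gonzalez & Dutt, 2011) and assumes that each retrieved instance contributes the player’s own outcome plus w times the opponent’s outcome. Function names, outcomes, and activations are hypothetical, and `sigma` stands for the IBL noise parameter.

```python
import math
from typing import List, Sequence, Tuple


def retrieval_probabilities(activations: Sequence[float], sigma: float) -> List[float]:
    """Softmax (Boltzmann) retrieval probabilities with temperature
    tau = sigma * sqrt(2), the standard IBL form."""
    tau = sigma * math.sqrt(2)
    scaled = [a / tau for a in activations]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [x / total for x in weights]


def blended_value(instances: Sequence[Tuple[float, float]],
                  activations: Sequence[float], sigma: float, w: float) -> float:
    """Blended value of one option from its (own, opponent) outcome instances.

    Assumption: each instance contributes own + w * opponent, so w = 0
    reproduces the individual-choice Blending equation and w = 1 values
    the equally weighted joint outcome.
    """
    if len(instances) != len(activations):
        raise ValueError("one activation per instance is required")
    probs = retrieval_probabilities(activations, sigma)
    return sum(p * (own + w * opp) for p, (own, opp) in zip(probs, instances))
```

With two equally active instances holding hypothetical (own, opponent) outcomes (4, 2) and (0, 6), the blended value is 2.0 when w = 0 (the individual-choice Blending equation) and 6.0 when w = 1 (the equally weighted joint outcome).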
Behavioral results indicate that mere knowledge that some interdependency exists (in the Minimal condition) does not make a difference in these interactions: the absence of social information takes the sting out of the social dilemma (see also Martin et al., 2013). However, as discussed above, when the IBL model’s parameters d and r were fitted to the Individual and Minimal levels of information, we observed a better correlation with the human trend in the Individual than in the Minimal condition, as well as a difference in the parameter values that best account for human cooperation over time. Fitting to the Minimal condition resulted in lower d and r values than those fitted to the Individual condition. This suggests reliance on more recent instances and less accurate retrievals in the Individual than in the Minimal condition. Thus, knowing that an opponent exists, even without explicit information about the opponent’s outcomes, may affect memory processes, motivating the use of more historical information and more accurate retrievals than when players think they are playing alone. Further studies that track such historical information in participants’ memory are needed. Together with the extensive demonstrations of the robustness of the IBL model mechanisms in individual choice tasks (Gonzalez, 2013; Gonzalez & Dutt, 2011; Lejarraga et al., 2012), this result suggests that in social dilemmas with no explicit interdependency information, the dynamics of cooperation rely on the same set of cognitive mechanisms as in individual repeated-choice tasks: each player learns from experience with the objective of maximizing individual gains, while constrained by memory attributes such as the frequency and recency of information (activation).
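The interpretation of the fitted parameters above can be illustrated with the noise-free part of the IBL base-level activation, ln Σ (t − t_p)^(−d), combined with a softmax retrieval probability with temperature τ = σ√2 (standard IBL forms; dropping the stochastic noise term is our simplification). The sketch compares the probability of retrieving an instance last seen 20 trials ago with that of one seen 2 trials ago; the function names, lags, and parameter values are illustrative, not the fitted ones.

```python
import math
from typing import List, Sequence


def base_level_activation(lags: Sequence[int], d: float) -> float:
    """Noise-free base-level activation ln(sum(lag ** -d)), where each lag
    (>= 1) is how many trials ago the instance was observed."""
    return math.log(sum(lag ** (-d) for lag in lags))


def retrieval_probabilities(activations: Sequence[float], sigma: float) -> List[float]:
    """Softmax over activations with temperature tau = sigma * sqrt(2)."""
    tau = sigma * math.sqrt(2)
    scaled = [a / tau for a in activations]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [x / total for x in weights]


def older_instance_share(d: float, sigma: float,
                         recent_lag: int = 2, old_lag: int = 20) -> float:
    """Probability of retrieving an instance last seen old_lag trials ago
    rather than one last seen recent_lag trials ago."""
    acts = [base_level_activation([recent_lag], d),
            base_level_activation([old_lag], d)]
    return retrieval_probabilities(acts, sigma)[1]


# Lower decay d keeps older experience more retrievable ...
print(older_instance_share(d=5.0, sigma=1.0), older_instance_share(d=2.0, sigma=1.0))
# ... and lower noise sigma concentrates retrieval on the most active instance.
print(older_instance_share(d=5.0, sigma=1.5), older_instance_share(d=5.0, sigma=0.5))
```

Lowering d raises the older instance’s share (more reliance on historical information), whereas lowering σ sharpens retrieval toward the most active instance.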

Results from the extreme fairness hypothesis (w_t = 1) predicted that high, nearly optimal levels of cooperation would be quickly achieved if each individual maximized the equally weighted sum of both players’ outcomes. Observed human behavior shows that this is not what players do. Slightly better results from the model under the linear increasing hypothesis lend some support to the empirical observation that an individual’s regard for the other’s outcomes increases with repeated interactions; however, this hypothesis did not capture the initial gradual increase in cooperation. A less stringent alternative, the hyperbolic discounting hypothesis, was expected to better account for the differences between short-term and long-term cooperation, but instead cooperation emerged very late in the interaction, failing to capture human behavior. The IBL-PD model’s predictions under the dynamic expectations hypothesis performed best in the Experiential and Descriptive conditions. Thus, it seems that humans calibrate the consideration they give to the opponent’s outcomes dynamically, according to their own past experience and as a function of the gap between the expected and the obtained outcome (i.e., surprise). This dynamic consideration of the opponent’s outcomes performed best when the d and r parameters were calibrated to human cooperation. However, after fitting these parameters to the Descriptive condition, some differences between the model and human behavior remained. How descriptive information works in conjunction with experiential information has been studied recently (e.g., Lejarraga & Gonzalez, 2011). Current evidence seems to indicate that people ignore descriptive information and rely purely on experiential information (at least at the individual level). Our results suggest that this may not be the case in social dilemmas.
But how does descriptive information mediate the effects of experiential information in social dilemmas? According to our dynamic expectations hypothesis, the presence of the payoff matrix from the outset would set initial expectations differently. Participants would be “less surprised” by the opponent’s behavior, given that they could see the possible actions and outcomes of the opponent and make some inferences about the opponent’s expected behavior accordingly. In the IBL-PD model, we used initial expectations (prepopulated instances) that were relatively arbitrary and were inherited from past work with the IBL model at the individual level (Gonzalez & Dutt, 2011; Lejarraga et al., 2012). The presence of the payoff matrix may be modeled by changing the value of these initial expectations in the prepopulated instances. Preliminary work with the IBL-PD model suggests that initial expectations are modified by the explicit information in the payoff matrix, reducing surprise in subsequent experiential actions (Ben-Asher, Dutt, & Gonzalez, 2013). However, more theoretical work and guided experimentation are needed to understand these effects. Although our results demonstrate the far-reaching potential of a theory developed to explain individual behavior (IBLT; Gonzalez et al., 2003), many more demonstrations of the predictive potential of the IBL-PD model in other social dilemmas are needed. For example, we used only one of the six possible matrices in the PD studies presented by Rapoport and Chammah (1965). It would be interesting to extend our results and demonstrate the generality of the IBL-PD model across different PD matrices and even across multiple social dilemmas (Rapoport et al., 1976). Similarly, more demonstrations related
to the availability of information are needed, including the model’s ability to capture well-known effects of random matching (Erev & Haruvy, 2013) and noisy payoffs (Bereby-Meyer & Roth, 2006). Additionally, an important observation relates to SVO and the way we have used the concept in our IBL-PD model. Traditionally, SVO has been used as a measure of individual motives and of how those motives predict the actions taken in social dilemmas (Balliet et al., 2009; Fiedler et al., 2013; Murphy & Ackermann, 2013). The relationship between individual SVO measures and cooperation in social dilemmas has been small (Balliet et al., 2009). An interesting idea for future research involves collecting SVO measures from individuals and using the resulting value as the w value in the IBL-PD Blending equation. Such a procedure would allow for predictions about cooperative behavior at the individual level. Finally, these findings may be used to design new approaches for improving collaboration in natural repeated interactions. Our results suggest that motivations to cooperate in day-to-day repeated interactions are often dynamic, and that decisions are frequently guided by experience rather than by explicit reciprocal strategies such as tit-for-tat or other rules of social behavior. In a business setting, for example, the motivation of two opponents to cooperate may be determined mostly by their history of interaction, by recent cooperative or noncooperative actions taken by each opponent, and by the desire to increase personal gains and to succeed in a business venture, rather than by an explicit (and often unavailable) description of possible actions and outcomes. In the absence of descriptive information, people may draw upon the success or failure of similar past interactions and make implicit inferences about each other’s motivations.
Assuming learning across repeated interactions, these inferences are expected to change over time and are likely to be influenced by the availability of information. It is clear that some explicit knowledge of an opponent’s outcomes and actions is essential for the development of cooperation. It is also clear that even when full information is given, this development is slow and suboptimal. Our results suggest that learning is slow because consideration for an opponent’s outcome decreases or increases dynamically as a function of the gap between the player’s expected and obtained outcome (i.e., surprise). The maximum possible cooperation between two players can be reached more quickly when individuals are less surprised by the actions that the other person is likely to take or by the outcomes that both of them will receive.
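The future-research idea above, using an individual’s SVO measure as w, could be prototyped as follows. The angle formula is the SVO Slider Measure of Murphy, Ackermann, and Handgraaf (2011); the mapping from angle to w (the tangent of the angle, read as the relative weight on the other’s payoff in the utility cos θ · own + sin θ · other, clipped to [0, 1]) is a hypothetical choice of ours, not a procedure from this study.

```python
import math


def svo_angle_degrees(mean_self: float, mean_other: float) -> float:
    """SVO Slider Measure angle (Murphy, Ackermann, & Handgraaf, 2011):
    arctan((mean_other - 50) / (mean_self - 50)), in degrees.

    atan2 matches arctan of the ratio whenever mean_self > 50 and avoids
    dividing by zero.
    """
    return math.degrees(math.atan2(mean_other - 50.0, mean_self - 50.0))


def svo_to_weight(angle_deg: float) -> float:
    """Hypothetical mapping from an SVO angle to the IBL-PD weight w.

    Reads the angle as a utility cos(theta) * own + sin(theta) * other, so
    the weight on the other's outcome relative to one's own is tan(theta).
    The result is clipped to [0, 1], the range between the selfish (w = 0)
    and extreme fairness (w = 1) hypotheses.
    """
    w = math.tan(math.radians(angle_deg))
    return min(max(w, 0.0), 1.0)
```

An individualistic profile with mean allocations of 100 to self and 50 to other gives 0° and w = 0 (the selfish hypothesis); equal mean allocations of 85 to each give 45° and w ≈ 1 (the extreme fairness hypothesis). Competitive angles, which are negative, are clipped to w = 0 under this assumption.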

Acknowledgments This research was supported by a Defense Threat Reduction Agency (DTRA) grant to Cleotilde Gonzalez and Christian Lebiere (HDTRA1-09-1-0053); and by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on. We thank Hau-yu Wong for her review and comments.

Notes

1. Mistrust, Forgiveness, Abuse, and Trust are called Alpha, Beta, Gamma, and Delta, respectively, in Rapoport et al. (1976).
2. Oftentimes (as in social conflict situations), the attributes that define the current situation are trivial and do not change throughout the interaction. Thus, the instance representation simplifies to [Action, Outcome] and disregards the situation attributes.

References

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, J. R., & Lebiere, C. (2003). The Newell test for a theory of mind. Behavioral and Brain Sciences, 26(5), 587–639.
Axelrod, R. (1980). Effective choice in the Prisoner’s Dilemma. Journal of Conflict Resolution, 24(1), 3–25.
Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211, 1390–1396.
Balliet, D., Parks, C., & Joireman, J. (2009). Social value orientation and cooperation in social dilemmas: A meta-analysis. Group Processes & Intergroup Relations, 12(4), 533–547.
Ben-Asher, N., Dutt, V., & Gonzalez, C. (2013). Accounting for the integration of descriptive and experiential information in a repeated Prisoner’s Dilemma using an instance-based learning model. Ottawa, Canada: Behavior Representation in Modeling and Simulation (BRiMS).
Ben-Asher, N., Lebiere, C., Oltramari, A., & Gonzalez, C. (2013). Balancing fairness and efficiency in repeated societal interaction. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 175–180). Austin, TX: Cognitive Science Society.
Bereby-Meyer, Y., & Roth, A. E. (2006). The speed of learning in noisy games: Partial reinforcement and the sustainability of cooperation. The American Economic Review, 96(4), 1029–1042.
Bueno de Mesquita, B. (2006). Game theory, political economy, and the evolving study of war and peace. American Political Science Review, 100(4), 637–642.
Cable, D. M., & Shane, S. (1997). A prisoner’s dilemma approach to entrepreneur-venture capitalist relationships. Academy of Management Review, 22(1), 142–176.
Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. Princeton, NJ: Princeton University Press.
Camerer, C. F., & Fehr, E. (2006). When does “economic man” dominate social behavior? Science, 311(5757), 47–52.
Cohen, R. L. (1982). Perceiving justice: An attributional perspective. In J. Greenberg & R. L. Cohen (Eds.), Equity and justice in social behavior (pp. 119–160). New York: Academic Press.
Erev, I., Ert, E., & Roth, A. E. (2010a). A choice prediction competition for market entry games: An introduction. Games, 1(2), 117–136.
Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S., Hau, R., Hertwig, R., Stewart, T., West, R., & Lebiere, C. (2010b). A choice prediction competition for choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47.
Erev, I., & Haruvy, E. (2013). Learning and the economics of small decisions. In J. H. Kagel & A. E. Roth (Eds.), The handbook of experimental economics (Vol. 2). Princeton, NJ: Princeton University Press.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88(4), 848–881.
Erev, I., & Roth, A. E. (2001). Simple reinforcement learning models and reciprocation in the Prisoner’s Dilemma game. In G. Gigerenzer & R. Selten (Eds.), Bounded rationality: The adaptive toolbox (pp. 215–231). Cambridge, MA: MIT Press.
Fiedler, S., Glöckner, A., Nicklisch, A., & Dickert, S. (2013). Information search and social value orientation in social dilemmas: An eye-tracking analysis. Organizational Behavior and Human Decision Processes, 120(2), 272–284.
Frederick, S., Loewenstein, G., & O’Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, 40(2), 351–401.
Gersick, C. J., & Davis-Sacks, M. L. (1990). Summary: Task forces. In J. R. Hackman (Ed.), Groups that work (and those that don’t) (pp. 146–154). San Francisco, CA: Jossey-Bass.
Gonzalez, C. (2013). The boundaries of Instance-based Learning Theory for explaining decisions from experience. In V. S. Pammi & N. Srinivasan (Eds.), Progress in brain research (Vol. 202, pp. 73–98). Amsterdam: Elsevier.
Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118(4), 523–551.
Gonzalez, C., Dutt, V., & Lejarraga, T. (2011). A loser can be a winner: Comparison of two instance-based learning models in a market entry competition. Games, 2(1), 136–162.
Gonzalez, C., Lerch, J. F., & Lebiere, C. (2003). Instance-based learning in dynamic decision making. Cognitive Science, 27(4), 591–635.
Gonzalez, C., & Martin, J. M. (2011). Scaling up Instance-Based Learning Theory to account for social interactions. Negotiation and Conflict Management Research, 4(2), 110–128.
Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153.
Lejarraga, T., & Gonzalez, C. (2011). Effects of feedback and complexity on repeated decisions from description. Organizational Behavior and Human Decision Processes, 116(2), 286–295.
Maguire, R., Maguire, P., & Keane, M. T. (2011). Making sense of surprise: An investigation of the factors influencing surprise judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(1), 176–186.
Martin, J. M., Gonzalez, C., Juvina, I., & Lebiere, C. (2013). A description-experience gap in social interactions: Information about interdependence and its effects on cooperation. Journal of Behavioral Decision Making. Advance online publication.
Molm, L. D. (2003). Power, trust, and fairness: Comparisons of negotiated and reciprocal exchange. Advances in Group Processes, 20, 31–65.
Murphy, R. O., & Ackermann, K. A. (2013). Social value orientation: Theoretical and measurement issues in the study of social preferences. Personality and Social Psychology Review. Advance online publication.
Murphy, R. O., Ackermann, K. A., & Handgraaf, M. J. J. (2011). Measuring social value orientation. Judgment and Decision Making, 6(8), 771–781.
Nevo, I., & Erev, I. (2012). On surprise, change, and the effect of recent outcomes. Frontiers in Psychology, 3, 24.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1993). The adaptive decision maker. New York: Cambridge University Press.
Pinto, M. B., Pinto, J. K., & Prescott, J. E. (1993). Antecedents and consequences of project team cross-functional cooperation. Management Science, 39(10), 1281–1297.
Rapoport, A., & Chammah, A. M. (1965). Prisoner’s dilemma: A study in conflict and cooperation. Ann Arbor, MI: University of Michigan Press.
Rapoport, A., Guyer, M. J., & Gordon, D. G. (1976). The 2×2 game. Ann Arbor, MI: University of Michigan Press.
Rapoport, A., & Mowshowitz, A. (1966). Experimental studies of stochastic models for the Prisoner’s dilemma. System Research and Behavioral Science, 11(6), 444–458.
Tabibnia, G., Satpute, A. B., & Lieberman, M. D. (2008). The sunny side of fairness: Preference for fairness activates reward circuitry (and disregarding unfairness activates self-control circuitry). Psychological Science, 19(4), 339–347.
Tallman, I., & Hsiao, Y.-L. (2004). Resources, cooperation, and problem solving in early marriage. Social Psychology Quarterly, 67(2), 172–188.
Thaler, R. (1981). Some empirical evidence on dynamic inconsistency. Economics Letters, 8(3), 201–207.
