J Neurophysiol 111: 1823–1832, 2014. First published February 12, 2014; doi:10.1152/jn.00393.2013.

Intelligence moderates neural responses to monetary reward and punishment Daniel R. Hawes,1 Colin G. DeYoung,2 Jeremy R. Gray,3 and Aldo Rustichini4 1

Department of Applied Economics, University of Minnesota, St. Paul, Minnesota; 2Department of Psychology, University of Minnesota, Minneapolis, Minnesota; 3Department of Psychology, Michigan State University, East Lansing, Michigan; and 4 Department of Economics, University of Minnesota, Minneapolis, Minnesota Submitted 31 May 2013; accepted in final form 6 February 2014

intelligence; reinforcement learning; risk; decision making; reward; punishment

in intelligence are linked to systematic differences in preferences and choice behavior during laboratory experiments (Benjamin et al. 2005; Burks et al. 2009; Rustichini 2009; Shamosh et al. 2008; Shamosh and Gray 2008). Furthermore, measures of intelligence (IQ) correlate with important life outcomes pertaining to educational achievement, job performance, wealth, and health status (Deary et al. 2007; Gottfredson 1997; Gottfredson and Deary 2004; Lawlor et al. 2006). Importantly for modern theories of decision making, differences in IQ consistently predict parameters describing individuals’ preferences with respect to risk and temporally delayed rewards (Burks et al. 2009; Rustichini et al. 2009; Shamosh et al. 2008; Shamosh and Gray 2008), suggesting that these fundamental decision parameters may be critically influenced by common neurobiological mechanisms related to intelligence. An important step toward understanding

INDIVIDUAL DIFFERENCES

Address for reprint requests and other correspondence: D. R. Hawes, Department of Applied Economics, University of Minnesota, 213 Ruttan Hall, 1994 Buford Avenue, St. Paul, MN 55108 (e-mail: [email protected]). www.jn.org

the functional role of intelligence in decision making is to investigate the existence of a link between IQ and the basic experience of rewarding and punishing outcomes of decisions. Finding such a link would open the possibility that the relation of IQ to preferences may be at least partly rooted in how decision outcomes are experienced ex post, rather than being limited to how options are evaluated ex ante. The striatum and medial prefrontal cortex are essential structures for human reward processing, and figure centrally in the brain systems that mediate goal directed behavior and experience-based learning (Balleine et al. 2007; Daw et al. 2011; Delgado 2007; Hawes et al. 2012; Schönberg et al. 2007), making these structures natural regions of interest for investigating the influence of intelligence on decision making and preferences. The present study focused on brain responses in the striatum and medial prefrontal cortex during reward processing to establish neural feasibility of a functional link between intelligence and gain and loss processing. To do so, we evaluated 94 subjects’ behavior and neural responses within a paradigm very similar to a task developed by Delgado et al. (2000). In our version of this well-known task, participants guessed whether a computer-generated number would be high or low and received monetary gains and punishments depending on whether their guesses were correct or incorrect. To minimize differences in reward responses due to randomization or systematic differences in guessing success, we experimentally manipulated the task so as to present each subject with the same pseudorandom sequence of gains and losses. That is, the computer-generated number was produced after the subject’s guess and was chosen such that each subject would experience the same sequence of gains and losses. Hence, our design eliminated the opportunity for subjects to experience different performance histories and thus minimized any potential concern that between-subject variance in choice behavior and outcome evaluation could be caused by differences in the history of obtained rewards. Instead, remaining individual differences in responses to gains and losses during our task were restricted to differences in preferences for reward/punishment or to individual differences in how the outcome of reward/punishment is experienced. Our analysis was primarily designed to assess whether an association of IQ with neural responses to rewards and punishments exists. Toward this aim we focused attention on the influence of intelligence on feedback processing in the rostral part of the caudate. This region was chosen a priori because of its joint implication for reward/punishment processing, reinforcement learning, and decision making in previous studies with similar task design (Delgado et al. 2000; Li et al. 2011; van den Bos et al. 2012a). Additionally, the caudate is identi-

0022-3077/14 Copyright © 2014 the American Physiological Society

1823

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

Hawes DR, DeYoung CG, Gray JR, Rustichini A. Intelligence moderates neural responses to monetary reward and punishment. J Neurophysiol 111: 1823–1832, 2014. First published February 12, 2014; doi:10.1152/jn.00393.2013.—The relations between intelligence (IQ) and neural responses to monetary gains and losses were investigated in a simple decision task. In 94 healthy adults, typical responses of striatal blood oxygen level-dependent (BOLD) signal after monetary reward and punishment were weaker for subjects with higher IQ. IQ-moderated differential responses to gains and losses were also found for regions in the medial prefrontal cortex, posterior cingulate cortex, and left inferior frontal cortex. These regions have previously been identified with the subjective utility of monetary outcomes. Analysis of subjects’ behavior revealed a correlation between IQ and the extent to which choices were related to experienced decision outcomes in preceding trials. Specifically, higher IQ predicted behavior to be more strongly correlated with an extended period of previously experienced decision outcomes, whereas lower IQ predicted behavior to be correlated exclusively to the most recent decision outcomes. We link these behavioral and imaging findings to a theoretical model capable of describing a role for intelligence during the evaluation of rewards generated by unknown probabilistic processes. Our results demonstrate neural differences in how people of different intelligence respond to experienced monetary rewards and punishments. Our theoretical discussion offers a functional description for how these individual differences may be linked to choice behavior. Together, our results and model support the hypothesis that observed correlations between intelligence and preferences may be rooted in the way decision outcomes are experienced ex post, rather than deriving exclusively from how choices are evaluated ex ante.

1824

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

METHODS

The research was approved and conducted in accordance with the stipulations of the Yale University Institutional Review Board. Participants We collected data from 114 male, right-handed subjects. Subjects were recruited from Yale University (n ⫽ 25) and the surrounding community by distribution of fliers and Internet advertisements. We performed functional magnetic resonance imaging (fMRI) preprocessing on 104 subjects, leaving out 10 subjects who exhibited highly irregular responses in parts of the experimental battery (e.g., reporting having fallen asleep during the scan). Of the 104 subjects, data were discarded for 6 subjects because of excessive head motion in the scanner and for 4 subjects because of poor quality of obtained structural images. These exclusions left 94 participants in our analysis. Subjects’ average IQ was 122.9 (minimum: 95.5, maximum 148.0, SD: 11.6). Median age of our subjects was 22 yr (minimum: 18 yr, maximum: 38 yr). The sample was selected to be all male for the purposes of genetic research unrelated to the present study (Shehzad et al. 2012). Measures Subjects were administered the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler 1999), which provides an estimate of full-scale IQ using four subtests (vocabulary, similarities, block design, and matrix reasoning). Subjects further completed a battery of questionnaires and cognitive tasks that included an n-back working memory task. In the working memory task, subjects viewed a series of words and judged whether each word matched the one appearing three previously. Performance on this task (d-prime) was used as an indicator of working memory. fMRI Procedures Subjects performed the task described below and three additional unrelated tasks for a scanning time of 1.25 h. Imaging data were collected using a 3-Tesla Siemens Trio scanner at the Yale Magnetic Resonance Research Center. For each participant, a high-resolution T1-weighted anatomic image [MPRAGE; time repetition (TR) ⫽ 2,500 ms; time echo (TE) ⫽ 3.34 ms; inversion time ⫽ 1,100 ms; flip angle ⫽ 7°; slices ⫽ 256, voxel size ⫽ 1 ⫻ 1 ⫻ 1 mm] and 180 contiguous functional volumes [gradient-echo EPI sequence; TR ⫽ 2,000 ms; TE ⫽ 25 ms; field of view (FOV) ⫽ 240 cm; flip angle ⫽

80°; voxel size ⫽ 3.75 ⫻ 3.75 ⫻ 4 mm] were acquired. Participants viewed stimuli projected onto a screen through a mirror mounted on the head coil. Responses were made via fiber-optic response buttons using the fingers of the right hand. Stimuli were presented in PsyScope (Cohen et al. 1993). Caudate volume was calculated in Freesurfer using the “asegstats2table” command in its default settings (Fischl and Dale 2000). Stimuli and Design Subjects were instructed to guess whether an upcoming computer generated number would be either Low (in the set 1, 2, 3) or High (4, 5, 6). Subjects received a reward of $2 for each correct guess and a punishment of ⫺$1 for each incorrect guess. Forty of these rewardrelevant trials were interspersed with 20 reward-neutral control trials, during which subjects were also instructed to press a button but for which they received neither feedback nor monetary rewards/punishments. As depicted in Fig. 1, during reward-relevant trials subjects first saw a one-dollar bill displayed on the screen for 3 s. During this time subjects indicated their guess regarding the upcoming number. After 3 s, subjects first saw a computer-generated number for 1 s and then, depending on trial type, a green upward or a red downward arrow containing the words “you win” or “you lose” for another 1 s. All trials were separated by fixation periods of 3, 5, or 7 s in duration (jittered). Reward-neutral trials started with a 3-s display of a gray rectangle of the same size as the dollar bill, followed by an asterisk for 1 s and a blue rectangle containing the word “same” for another 1 s. Each subject saw the same sequence of gain and loss trials (20 gains and 20 losses in total) in the same order: the computer responded to the subject’s guess in each trial by presenting a high or low number to match the predetermined outcome of the fixed trial sequence. Subjects were unaware that the computer’s number-generating process was fixed in this way and received instructions only about the mechanics and incentives of the task. Subjects were not instructed about the underlying reward-generating process. Therefore, reward-motivated subjects had a monetary incentive to try to discover possible patterns or regularities in the computer’s number-generating behavior. fMRI Data Analysis All data were preprocessed and analyzed using FSL version 4.1.9 (Jenkinson et al. 2012). Motion correction was performed using MCFLIRT. T1-weighted anatomic images were registered to MNI space using 12 degrees of transformational freedom as the FSL registration defaults. Functional data were preprocessed by applying slice time correction, spatial smoothing (using a 7-mm Gaussian kernel), linear trend removal, and temporal high-pass filtering (using FSL default settings). Subjects were eliminated if motion correction indicated deviations in the estimated center of mass ⬎3 mm, leading to the elimination of 6 subjects. For statistical analysis, we computed a general linear model (GLM) on the blood oxygen level-dependent (BOLD) signal time course of 94 subjects. Our model contained four predictors, specified as a zero-one box car-shaped variable, 2 s in duration. One of these predictors matched the onset of relevant trials (Rel); another predictor matched the onset of control trials (Ctrl). The remaining two predictors indicated the valence of feedback as positive (FB⫹) or negative (FB⫺) and coincided with the onset of each type of feedback stimulus, respectively. Predictors were convolved with a double gamma function estimate of the hemodynamic response. The model also contained the motion correction parameters (MCP) obtained during preprocessing. The final GLM was given by BOLD ⫽ a ⫹ b ⫻ Rel ⫹ c ⫻ Ctrl ⫹ d ⫻ FB⫹ ⫹ e ⫻ FB⫺ ⫹ X' ⫻ MCP ⫹ error. Significant differences in gain vs. loss responses were identified according to t-tests performed on the whole brain contrast FB⫹–FB⫺.

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

fied as the only subcortical structure for which anatomic volume is correlated with IQ (Grazioplene RG, Ryman S, Grey Jeremy R, Rustichini A, Jung RE, and DeYoung CG, unpublished observations). Our main investigation entailed extracting the strength of neural responses to gains and losses in the caudate and demonstrating their statistical association with IQ. Following our finding of these associations, we utilized more exploratory analysis of the behavioral data to investigate a potential theoretical model that might explain our findings and that might be tested more extensively in future research. In particular we considered the relation of our results to known reinforcement learning functions of the regions of interest and considered the relation of IQ to how subjects may learn reward associations in a task like ours. Because the number of trials in our task was insufficient for fitting a reinforcement learning model for each subject individually, the suggested model is ultimately not critically tested by our data and should therefore be understood as an additional interpretational aid, complementing the central correlational analysis of this research.

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

1825

Using FSL defaults for computing significance of contingent clusters, based on the number of voxels and the smoothness of the data, we set a cluster-forming threshold at z ⬎ 7.5. Voxels within these clusters were significant at P ⬍ 0.01. Regression weights for the GLM were extracted separately for each subject and then correlated with IQ scores. Subject-specific regression weights were obtained by performing the above GLM on voxels falling into an anatomic template of the caudate head according to the Talairach atlas (Talairach and Tournoux 1988). Importantly, we performed individual differences analysis on this a priori anatomically defined region of interest (ROI), rather than the functionally identified region that showed the strongest contrast between gain and loss feedback. Use of a priori ROI for the investigation of individual differences is preferable to investigation of ROIs showing the strongest contrast because important influences of individual differences may often imply small main (i.e., group average) effects. Alternatively, identifying ROIs based on the presence of individual-difference effects renders any subsequent test of individual differences based on that ROI nonindependent (Vul et al. 2009). We performed additional exploratory analysis on three functionally identified ROIs that showed significant activation for gains compared with losses in our task. These regions comprised ROIs in the ventromedial prefrontal cortex [vmPFC; 575 contiguous voxels, maximum z-stat at 4, 50, ⫺2 (MNI)], the posterior cingulate cortex [pCC; 257 contiguous voxels, maximum z-stat at 0, ⫺32, 32 (MNI)], and also an ROI in the left inferior frontal lobe [liPFC; 23 contiguous voxels, maximum z-stat at ⫺34, 54, ⫺8 (MNI)]. The vmPFC and pCC both have been linked to the subjective utility of monetary outcomes of risky choices (Wu et al. 2011), and the liPFC has been associated with neural representations of loss aversion (Tom et al. 2007). Additional event-related averaging of BOLD signal was performed for the depiction of BOLD time courses in RESULTS. Event-related averages were computed relative to average BOLD at trial onset. Regression Analysis Regressions in Tables 2–5 were performed using the statistical programming software R. Regressions of behavioral data were computed using mixed-effects linear logistic regression with coefficients conditioned at the subject level, where appropriate (Tables 3–5), using the general linear modeling packages lme4 (Bates et al. 2012). The

regressions reported in Table 2 were performed using robust regression to control for potential influence of outliers using the modeling package robust (Wang et al. 2008).

RESULTS

Brain Imaging Results Striatum. Averaged across all subjects, and ignoring for the moment the effects of IQ, neuroimaging results for our large sample replicate and extend findings reported by Delgado et al. (2000) on a sample of 9 subjects. We found increased BOLD response in the caudate at the onset of reward relevant trials, and this BOLD response remained elevated after revelation of gains but decreased steeply below baseline after revelation of losses. Figure 2 shows this pattern of BOLD response for the anatomically identified caudate head. Brain regions showing significantly more activation after gains than after losses are shown in Fig. 3 and listed in Table 1. No brain areas showed significantly larger activation after losses compared with gains. We compared subject-specific responses to gains and losses by obtaining each subjects’ predictor of percentage BOLD change following rewarding (FB⫹) and punishing (FB⫺) feedback using the regression model described in METHODS. For the anatomically defined caudate, the difference relation between gain and loss responses and subject IQ is illustrated in Fig. 2, D and E, as well as in the regressions of Table 2. For the caudate [468 contiguous voxels, barycenter: 10, 4, ⫺4 (MNI)], percentage BOLD change after losses is positively predicted by larger IQ (i.e., the effect is less negative with higher IQ) (␤ ⫽ 0.87, P ⫽ 0.03). Controlling for caudate volume, age, and performance in the working memory task does not significantly alter the coefficient obtained for IQ. The equivalent regressions for the BOLD predictor of gains (FB⫹) showed no significant correlation between IQ and gain responses (P ⫽ 0.26). The simple correlation between IQ and the neural loss response is significant at r ⫽ 0.21 (P ⫽ 0.047). The

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

Fig. 1. Task design. Subjects engaged in 40 reward relevant trials. At the beginning of these trials a U.S. dollar bill was displayed for 3 s, during which subjects pressed 1 of 2 buttons indicating their guess as to whether a computer-generated number would be Low (1–3) or High (4 – 6). Guesses were followed by 2 feedback screens, each 1 s in duration. Reward relevant trials were interspersed with 20 reward-neutral trials, which were signaled by a gray rectangle. In the image shown (but not in the task itself), a green outline marks the sequence of screens seen during gain trials, a red outline marks the sequence of screens for loss trials, and a gray outline marks the sequence of screens for rewardneutral control trials. Correct or incorrect outcomes were rewarded or punished with $2 or ⫺1, respectively.

1826

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

B % BOLD from global baseline

% BOLD from global baseline

A

0.1

0.0

Condition Loss Win

−0.1

0.2

0.1

0.0

Condition Loss (high IQ) Loss (low IQ)

−0.1

Win (high IQ) Win (low IQ)

0

3

5

8

0

3

C

y = 13

x=8

D

8

z=2

GLM coefficient for BOLD response to Gains (FB+)

E

100

GLM coefficient for BOLD response to Losses (FB−)

100 %BOLD change

%BOLD change

5

Time since trial onset (in seconds)

0

−100

0

−100

100

110

120

130

140

150

100

110

IQ

120

130

140

150

IQ

Fig. 2. Blood oxygen level-dependent (BOLD) response is FB⫹ ⬎ FB⫺ (gain ⬎ loss) in the bilateral caudate. BOLD response in caudate after gains is significantly higher than after losses. This results holds for whole brain analysis as well as a masked regression on the anatomically defined caudate (shown). A: event-related average BOLD response in caudate replicates the time course identified by Delgado et al. (2000). B: separate event-related average BOLD for 47 highest IQ and 47 lowest IQ subjects demonstrates that BOLD decrease after losses is less pronounced for higher IQ subjects. C: the mask used for analysis of caudate. D: general linear model (GLM) regression coefficients for BOLD response after gains plotted against IQ (r ⫽ 0.13, P ⫽ 0.185). E: GLM regression coefficients for BOLD response after losses plotted against IQ (r ⫽ 0.20, P ⫽ 0.047).

effect of IQ on gain and loss responses is further illustrated in Fig. 2, A and B, which shows average BOLD time course for the 47 highest and 47 lowest IQ subjects of our sample. The results indicate reduced differential responses to monetary outcomes for higher IQ, driven in particular by the subdued response after losses. vmPFC and liPFC. We performed the same analysis on BOLD responses after gains and losses for the vmPFC, pCC, and left inferior/middle frontal gyrus (liPFC). For the

vmPFC, we found a significant relation between intelligence and the BOLD responses to gains as well as losses: r ⫽ 0.36 (P ⬍ 0.01) for losses, and r ⫽ 0.30 (P ⬍ 0.01) for gains. For the liPFC, the correlation with the loss predictor is r ⫽ 0.29 (P ⬍ 0.01), and that for gains is r ⫽ 0.31 (P ⬍ 0.01). A marginally significant correlation was found for loss responses in the pCC with r ⫽ 0.20 (P ⫽ 0.051), but not for gain responses: r ⫽ 0.11 (P ⫽ 0.27). These results are further illustrated in Figs. 4, 5, and 6.

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

Time since trial onset (in seconds)

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

1827

Fig. 3. Regions of interest showing significantly higher activation for gains than losses in whole brain analysis. Tal, Talairach coordinates.

Model Our task is analog to a repeated, simultaneous choice, matching pennies game between a participant and a computer opponent. The participant chooses between two options (l and r), while the opponent chooses between L and R. Suppose the opponent (the environment, or any rewardgenerating process) is choosing L with a fixed probability p independently in every period, and the participant believes that the choice of L is fixed in every period and independent, but occurs with probability q, possibly different from p. Table 1. Regions of interest differentiating between gain and loss feedback Region Identified by FB⫹ ⬎ FB⫺ Ventromedial prefrontal cortex Posterior cingulate cortex Caudate (bilateral) Left medial/inferior prefrontal cortex

Peak (X, Y, Z)

No. of Voxels

Average z Statistic

43 45 51

88 47 67

4 52 33

575 257 2160

7.54 7.46 5.64

61

91

33

23

7.21

Regions with significant activation for the gain feedback (FB⫹) ⬎ loss feedback (FB⫺) contrast in whole brain analysis (cluster correction z ⬎7.5, P ⬍ 0.01).

Prediction error is, as usual, the difference between the realized reward (according to p) and the participant’s subjective expectation (according to q, when the action is the optimal one with respect to q) of the reward. For example, if q ⬎ ½, the individual thinks that L is more likely, so he will then choose l all the time and have a q-expected payoff 2q ⫺ 1. The prediction error is 1 ⫺ (2q ⫺ 1) with probability p and ⫺1 ⫺ (2q ⫺ 1) with probability 1 ⫺ p. In this simple case, it is easy to see that the expected (with respect to the true probability p) prediction error of a participant with belief q has some basic properties, which hold more generally: If the belief is correct, that is, q ⫽ p, then the expected prediction error is zero. If individuals learn about the true distribution by Bayesian updating and those with higher IQ are willing to consider a larger set of initial values, then they are more likely to hold a correct belief and thus are more likely to have a zero prediction error. When the behavior of the opponent is partially predictable (that is, p is not ½), then the expected prediction error can be positive or negative. For example, if p ⬎ ½ and q ⬎ ½ (so the action chosen by the individual is the correct one), the p-expected prediction error is 2(p ⫺ q). This is positive when p is larger than q because the individual chooses the right action all the times (he is on the right side of ½), but he is guessing right more times than he thinks. The case closest to the environment in our experiment is that where the behavior of the opponent is unpredictable, that is, Table 2. Impact of IQ on caudate BOLD signal after losses Model 1 (b/SE)

Model 2 (b/SE)

Model 3 (b/SE)

Dependent variable: FB⫺ ⫺115.6* (51.0) 0.99* (0.41)

Intercept IQ Working memory, d-prime

⫺114.6* (56.7) 0.99* (0.5)

⫺141.6* (64.8) 0.93* (0.44)

⫺0.3 (6.6)

⫺1.2 (6.5) 4.5 (5.8)

3

Caudate volume, cm R2

0.061

0.062

0.069

The regression coefficient for neural responses to losses correlates significantly with intelligence. Controlling for working memory and caudate volume does not affect the independent effect of intelligence. BOLD responses after losses increase with higher IQ. Results are from a robust regression (*P ⬍ 0.05). IQ scores on standard scale: mean 100, SD 15; caudate volume: mean 81 cm3, SD 92 cm3; d-prime as a measure of working memory: mean 1.9, SD 0.83.

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

Our findings provide the first neurobiological evidence for a link between intelligence and ex post processing, or experience, of monetary rewards and punishments. The additional results for vmPFC, because of its role in coding expected utility, suggest a potential link to differences in experienced utility for the outcomes in our task. Given the prominent roles of the caudate and the medial prefrontal cortex during reinforcement learning, and the conceptual link between our task and standard reinforcement learning tasks, our results suggest the hypothesis that outcomes of probabilistic events have differential impact on negative and positive reinforcement signals for subjects who differ in IQ. Support for this hypothesis would be an important next step toward explaining how individual differences in intelligence relate to long-term differences in attitudes to risk, and to economic preferences generally. Within the limitations of the experimental design, we therefore considered further evidence for this conceptual link between the reinforcement learning value of gain/loss responses and IQ in the behavioral data. In particular, we considered a link motivated by the following theoretical view of how prediction errors relate to cognitive ability in tasks like ours.

1828

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

A 100

%BOLD change

B % BOLD from global baseline

GLM coefficient for BOLD response to Losses (FB )

0

100

100

110

120

130

140

150

0.00

Condition Loss

0.05

Win

0

3

5

8

GLM coefficient for BOLD response to Gains (FB+)

D

%BOLD change

100

0

100

100

110

120

130

140

150

IQ

Fig. 4. Results for ventromedial prefrontal cortex. A: GLM regression coefficients for BOLD response after losses plotted against IQ (r ⫽ 0.36, P ⬍ 0.01). B: event-related average plot. C: GLM regression coefficients for BOLD response after gains plotted against IQ (r ⫽ 0.30, P ⬍ 0.01). D: location of regions of interest (ROI).

p ⫽ ½. In this case the p-expected prediction error is negative no matter what q is. This is because the p-expected payoff is zero no matter what the policy of the individual is, and the q-expected payoff is positive or zero. So in the completely unpredictable case, any deviation of q from ½ makes the individual overconfident in his ability to predict, that is, he overestimates the expected payoff, which is bound to be zero, thus the expected disappointment. In our experimental environment, the computer’s behavior has no exploitable pattern. Thus, if participants differ in IQ, and if those with high IQ have beliefs closer to the truth, then the prediction error of those with lower IQ will be lower (more negative) than the prediction error of participants with higher IQ. One way in which subjects can search for patterns in the computer’s behavior in our task and obtain predictions that are closer to the truth is to consider longer histories of observed computer choices during the pattern search. We thus consider the relation between subject behavior and observed computer choices in the behavioral data. Behavioral Results Response times in our task did not substantially differ with respect to subject IQ (r ⫽ ⫺0.14, P ⫽ 0.16). A mixed-effects logistic panel regression (Table 3) grouped by subject showed

that subjects’ guesses in any given trial were influenced primarily by the computer choices in the most recent two trials, with subjects more likely to choose the option not recently selected by the computer. The payoff outcome (gain or loss) of the previous trials did not exhibit a significant effect on subjects’ choices; thus we did not find evidence of a pervasive use of a win-stay/lose-shift heuristic. Additionally, the observed overall frequency of High choices by the computer did not influence subjects’ behavior after controlling for the computer’s choice one and two trials back (Table 3, model 3), indicating that subjects modified their choices with respect to past observations of computer behavior in a manner that extends beyond responding to current total frequency. Subjects appeared engaged, consciously or unconsciously, in an attempt to learn and exploit perceived patterns in the reward-generating process by responding to recently observed computer choices. We found that this behavior was systematically moderated by intelligence. For this moderation analysis, we first considered the observed trial-by-trial frequencies of observed computer choices for each subject. In particular we considered three frequencies: F0t , the unconditional frequency of observed computer choices at trial t; F1t , the frequency of observed computer choices up to trial t conditional on the computer’s choice in the previous trial; and F2t , the computer choices conditional on the compu-

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

Time since trial onset (in seconds)

IQ

C

0.05

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

A 100

%BOLD change

B % BOLD from global baseline

GLM coefficient for BOLD response to Losses (FB )

0

100

0.10

0.05

0.00

Condition Loss Win

0.05

100

110

120

130

140

150

0

3

5

8

GLM coefficient for BOLD response to Gains (FB+)

D

%BOLD change

100

0

100

100

110

120

130

140

150

IQ

Fig. 5. Results for left inferior frontal lobe. A: GLM regression coefficients for BOLD response after losses plotted against IQ (r ⫽ 0.31, P ⬍ 0.01). B: event-related average plot. C: GLM regression coefficients for BOLD response after gains plotted against IQ (r ⫽ 0.30, P ⬍ 0.01). D: location of ROI.

ter’s choices two trials back.1 Hence, a given value F0 describes the overall frequency of High/Low responses of the computer. F1 indicates the frequency of pairs of High-High, High-Low, Low-High, and Low-Low choices, which is to say the expected probability of a High/Low number given the identity of the previous number. Likewise, F2 considers the expected probability of a High/Low number given the identity of the number two periods back. On the basis of each of these frequencies, we calculated the trial-by-trial expected value of guessing High for each subject and entered these values into a mixed-effects logistic panel regression, grouped by subject, to assess the impact of these conditional frequencies on subject choice (Table 4). As already demonstrated by the results in Table 3, expected values based on the unconditional frequency of computer choices, EV0, were unrelated to subject choices. However, expected values based on conditional frequencies relating to one and two trials back, denoted EV1 and EV2, respectively, significantly predicted subjects’ guessing behavior. For subjects at the lower range of intelligence for our sample, conditional frequencies one trial back positively predicted guessing behavior, whereas subjects at the higher range of intelligence displayed a larger effect of the events two trials back. Note that individuals with 1 Considering no more than two previous trials is justified in light of our analysis, described above.

lower IQ within our sample have an average IQ with respect to the overall population. Because our experiment was not designed to differentiate between competing possible models as to how subjects use past information to determine future choices, our analysis remains restricted to showing that intelligence predicts the extent to which information farther back in time predicts subjects’ behavior. To illustrate this conclusion more clearly without assuming any specific functional form for information integration, Table 5 combines information from one period and two periods back into a single regressor, EV1⫹2, consisting of the unweighted average of F1t and F2t . In this analysis, controlling for F1t does not eliminate the hypothesized effect that the influence of the composite information on subjects’ guesses significantly increases with IQ. Hence, we conclude that subjects with higher IQ were influenced by events one and two periods back, whereas subjects with lower IQ were chiefly responding to events only one period back. This conclusion is further illustrated in Table 6, which reports predicted probabilities of guessing High based on the true observed expected value of this guess. Table 6 shows that the influence of the expected value computed by considering only one period back is more influential for subjects when IQ is low, whereas the influence of the expected value considering events two periods back moderates subject choices when IQ is high. An informative result is obtained by further comparing

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

Time since trial onset (in seconds)

IQ

C

1829

1830

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

A 100

%BOLD change

B % BOLD from global baseline

GLM coefficient for BOLD response to Losses (FB )

0

100

0.10

0.05

0.00

Condition Loss

0.05

Win

100

110

120

130

140

150

0

5

8

GLM coefficient for BOLD response to Gains (FB+)

D

%BOLD change

100

0

100

100

110

120

130

140

150

IQ

Fig. 6. Results for posterior cingulate cortex. A: GLM regression coefficients for BOLD response after losses plotted against IQ (r ⫽ 0.20, P ⬍ 0.051). B: event-related average plot. C: GLM regression coefficients for BOLD response after gains plotted against IQ (r ⫽ 0.11, P ⬍ 0.27). D: location of ROI.

the average expected value of a reward-maximizing subject using the model based on one period back, mean(EV1), to that for a subject using the model based on one and two periods back, mean(EV1⫹2). For this comparison, using the actual observed histories of subjects in our task, we find that the model which considers the longer history produces a significantly smaller average expected value across all trials [mean(EV1) ⫽ 0.269, mean(EV1⫹2) ⫽ 0.253, P ⫽ 0.006]. This result should not be surprising, and it is implicit in the model we consider, since by considering a more complex relationship among observed data, the mean(EV1⫹2) essentially produces a less noisy estimate of the true expected value, which is zero in our task. In evaluating these findings it is important to remember that, in our task, considering more information and a larger class of models is useful only to learn that there is nothing meaningful for the subject to learn, given the pseudorandom nature of the outcomes. However, reaching this conclusion should produce reduced prediction error signals for punishing outcomes, and this is consistent with the caudate responses we report. DISCUSSION

The results reported in this article provide an important first step in understanding the functional role of intelligence during decision making, by establishing the existence of a link between intelligence and neural reward processing. Our main

result demonstrates that IQ moderates BOLD responses to monetary outcomes of decisions. In the caudate this moderating effect is in large part due to a relation between IQ and responses to monetary losses. Our simple modeling account extends a possible explanation for how these neural findings may link to behavior in the context of prediction error-based reward learning. The model conceptually describes higher IQ subjects as considering richer sets of possibilities (or mental models) during reward learning, and it predicts that for random reward processes such as the one considered in our task, subjects with higher IQ should experience, on average, reduced negative prediction errors. Assuming disutility from negative prediction errors, this account could be extended to an explanation of how cognitive ability affects willingness to take risks. Specifically, our results provide first support for the hypothesis that observed correlations between IQ and economic preferences could be based on systematic differences in the way rewards and punishments are experienced ex post. This effect of intelligence would be in addition to, but distinct from, its role during the ex ante evaluation of decision options. We find support for this hypothesis in the observed correlations between IQ and neural activity in the caudate following punishment, as well as increased activity in utility tracking regions (vmPFC). Our findings and the discussed model are consistent and compatible with current research considering contributions of

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

C

3

Time since trial onset (in seconds)

IQ

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

Table 3. Influences of past computer actions on subject choices Model 1 (b/SE)

Model 2 (b/SE)

Model 3 (b/SE)

1831

Table 5. Relation of IQ to conditional probabilities influencing choice behavior Model 1 (b/SE)

Dependent variable: Subject’s choice of H Intercept

0.091 (0.051) ⫺1.136‡ (0.333)

Freq Hcpu

0.453‡ (0.084) ⫺0.091 (0.374) ⫺0.210† (0.073) ⫺0.44‡ (0.073) ⫺0.110 (0.073)

Hcpu 1-back Hcpu 2-back Hcpu 3-back Hcpu 1-back ⫻ Won

4,802 ⫺2,389

4,784 ⫺2,368

4,790 ⫺2,367

Mixed-effects logistic panel regression grouped by subject [N ⫽ 3,478 observations (94 subjects ⫻ 37 trials; 3 trials were not used because of reference 3-back)]. Hcpu i-back is a dummy variable representing whether the computer chose “High” i periods back. Freq Hcpu is the experienced frequency of High choices by the computer up to each trial, expressed as the difference from 0.5. *P ⬍ 0.05; †P ⬍ 0.01: ‡P ⬍ 0.001. BIC, Bayesian Information Criterion; LL, log-likelihood.

cognitive ability to reinforcement learning and the influence of experimentally induced increases in cognitive load on decision makers’ reliance on mental models during reinforcement learning tasks (Collins et al. 2012; Otto et al. 2013; van den Bos et al. 2012b). However, the proposed model is not critically tested on our data and should therefore be viewed as an exploratory account and interpretative aid for a possible mechanism underlying the neural results that constitute the main result of this study. Table 4. Influence of conditional transition probabilities on subject behavior Model 1 (b/SE)

Model 2 (b/SE)

Dependent variable: subject’s choice of H Intercept EVm0(H) EVm1(H) EVm2(H) EVm0(H) ⫻ IQ EVm1(H) ⫻ IQ EVm2(H) ⫻ IQ BIC LL

0.102 (0.05) ⫺1.039 (2.36) 2.66* (1.22) ⫺1.87* (1.00) 0.001 (0.02) ⫺0.02* (0.01) 0.02* (0.01) 4,828 ⫺2,382

0.06 (0.05) 2.45* (1.06) 2.00* (0.93) ⫺0.02* (0.01) 0.02* (0.01) 4,828 ⫺2,389

Mixed-effects logistic panel regression grouped by subject (N ⫽ 3,478 observations). Expected values of choosing High [EVmi(H)] were calculated based on the frequencies, Ft, of past computer choices as described in text. *P ⬍ 0.05. IQ scores on standard scale: mean 100, SD 15; EV centered: mean 0.5.

Dependent variable: subject’s choice of H Intercept

0.066 (0.05) 4.45† (1.59) ⫺4.00* (1.96) ⫺0.04† (0.01) 0.03* (0.015)

EVm1(H) EVm1⫹m2(H) EVm1(H) ⫻ IQ EVm1⫹m2(H) ⫻ IQ BIC LL

4,828 ⫺2,389

Mixed-effects logistic panel regression grouped by subject (N ⫽ 3,478 observations). EVmi(H) expresses the expected value as a simple average of the expected conditional probabilities i periods back. *P ⬍ 0.05; †P ⬍ 0.01. IQ scores on standard scale: mean 100, SD 15; EV centered: mean 0.5.

Our results add to an emerging literature suggesting the potential of investigating neural correlates of stable individual differences in cognitive ability as a means of better understanding neural computations underlying feedback-based learning (Collins et al. 2012; Otto et al. 2013; van den Bos et al. 2012b). Our results for medial prefrontal cortex, left inferior frontal gyrus, and posterior cingulate cortex further suggest that investigation into the relation between intelligence and experienced utility may benefit particularly from analyzing the functional integration of utility-coding prefrontal areas with subcortical regions during reinforcement learning.2 In addition to the link to theories of behavior discussed in this article, our results mark a neurophysiological path toward a potential bridge between theoretical neuroscience and psychological research on intelligence: Although it has been shown that general intelligence is predicted by subjects’ efficiency during associative learning (Kaufman et al. 2009), the mechanism via which associative learning contributes to general intelligence has remained essentially unexplored in psy2 Notably, a link between developmental changes in reinforcement learning and striatal-medial prefrontal cortex connectivity has been identified in a recent study (van den Bos et al. 2012a). Additionally, neural responses in medial orbitofrontal cortex and dorsomedial striatum have been shown to co-vary as a function of causal contingency (Tanaka et al. 2008).

Table 6. Predicted probabilities of choosing High given the experienced expected values for High p(H IQ, EVm1, EVm2)

Expected Value EVm1 ⫽ EVm2 ⫽ 0 EVm1 ⫽ 0.5 ⬎ EVm2 ⫽ 0 EVm1 ⫽ 0 ⬍ EVm2 ⫽ 0.5 EVm1 ⫽ EVm2 ⫽ 0.5

min IQ (IQ ⫽ 100)

max IQ (IQ ⫽ 140)

0.51 0.63 0.57 0.68

0.51 0.51 0.64 0.65

Rows show different combinations of expected values computed for the model considering 1 period back (EVm1) and the model considering 2 periods back (EVm2). Min IQ data are for the lowest IQ subject in our sample, max IQ data are for the highest IQ subject in our sample. Predicted probabilities (p) are based on regression results in Table 5.

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

0.168 (0.10) ⫺0.12 (0.09)

Hcpu 1-back ⫻ Lost BIC LL

0.386‡ (0.07) ⫺0.23 (0.360) ⫺0.273† (0.087) ⫺0.373‡ (0.086)

1832

INTELLIGENCE MODERATES REINFORCEMENT SIGNALS

chology. Further investigation of the computational role of IQ during associative learning may therefore benefit from a targeted investigation of how IQ relates to the complexity of mental models held by decision makers. ACKNOWLEDGMENTS We thank technical support staff and seminar participants at the University of Minnesota, as well as conference participants at the Annual Meeting for Cognitive Neuroscience 2012, for valuable feedback. In particular, we thank Edward Patzelt, Philip Burton, Piotr Evdokimov, Rachael Grazioplene, KimSau Chung, Claudia Civai, and Mauricio Delgado. GRANTS

DISCLAIMER Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. DISCLOSURES No conflicts of interest, financial or otherwise, are declared by the authors. AUTHOR CONTRIBUTIONS D.R.H. analyzed data; D.R.H. and A.R. interpreted results of experiments; D.R.H. prepared figures; D.R.H. and A.R. drafted manuscript; D.R.H., C.G.D., J.R.G., and A.R. edited and revised manuscript; D.R.H. and A.R. approved final version of manuscript; C.G.D. and J.R.G. conception and design of research; C.G.D. and J.R.G. performed experiments. REFERENCES Balleine BW, Delgado MR, Hikosaka O. The role of the dorsal striatum in reward and decision-making. J Neurosci 27: 8161– 8165, 2007. Bates D, Maechler M, Bolker B. lme4: Linear mixed-effects models using S4 classes (Online). Version 0.999999-0. http://CRAN.R-project.org/package⫽ lme4 [2012]. Benjamin D, Brown S, Shapiro J. Who is ‘behavioral’? Cognitive ability and anomalous preferences. J Eur Econ Assoc 11: 1231–1255, 2013. Burks SV, Carpenter Jeffery P, Goette L, Rustichini A. Cognitive skills affect economic preferences, strategic behavior and job attachment. Proc Natl Acad Sci USA 106: 7745–7750, 2009. Cohen J, MacWhinney B, Flatt M, Provost J. PsyScope: an interactive graphic system for designing and controlling experiments in the psychology laboratory using Macintosh computers. Behav Res Methods Instrum Comput 25: 257–271, 1993. Collins AG, Frank MJ. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur J Neurosci 35: 1024 –1035, 2012. Daw N, Gershman SJ, Seymoour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69: 1204 –1215, 2011. Deary IJ, Strand S, Smith P, Fernandes C. Intelligence and educational achievement. Intelligence 35: 13–21, 2007. Delgado MR. Reward-related responses in the human striatum. Ann NY Acad Sci 1104: 70 – 88, 2007.

J Neurophysiol • doi:10.1152/jn.00393.2013 • www.jn.org

Downloaded from http://jn.physiology.org/ by 10.220.32.246 on August 11, 2017

This work was supported by National Institute of Mental Health Grant F32 MH077382 (to C. G. DeYoung) and National Science Foundation Grants DRL 0644131 (to J. R. Gray) and SES-1061817 (to A. Rustichini).

Delgado MR, Nystrom L, Fissel C. Tracking the hemodynamic responses to reward and punishment in the striatum. J Neurophysiol 84: 3072–3077, 2000. Fischl B, Dale AM. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc Natl Acad Sci USA 97: 11050 – 11055, 2000. Gottfredson LS. Why g matters: the complexity of everyday life. Intelligence 24: 79 –132, 1997. Gottfredson LS, Deary IJ. Intelligence predicts health and longevity, but why? Curr Dir Psychol Sci 13: 1– 4, 2004. Hawes DR, Vostroknutov A, Rustichini A. Experience and abstract reasoning in learning backward induction. Front Neurosci 6: 23, 2012. Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL. Neuroimage 62: 782–790, 2012. Lawlor D, Clark H, Smith GD, Leon D. Childhood intelligence, educational attainment and adult body mass index: findings from a prospective cohort and within sibling-pairs analysis. Int J Obes 30: 1758 –1765, 2006. Li J, Delgado MR, Phelps EA. How instructed knowledge modulates the neural systems of reward learning. Proc Natl Acad Sci USA 108: 55– 60, 2011. Otto AR, Gershman SJ, Markman AB, Daw ND. The curse of planning dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 24: 751–761, 2013. Rustichini A. Neuroeconomics: what have we found, and what should we search for. Curr Opin Neurobiol 19: 672– 677, 2009. Schönberg T, Daw ND, Joel D, O’Doherty JP. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J Neurosci 27: 12860 –12867, 2007. Shamosh NA, DeYoung CG, Green AE, Reis DL, Johnson MR, Conway AR, Gray JR. Individual differences in delay discounting relation to intelligence, working memory, and anterior prefrontal cortex. Psychol Sci 19: 904 –911, 2008. Shamosh NA, Gray JR. Delay discounting and intelligence: a meta-analysis. Intelligence 36: 289 –305, 2008. Shehzad Z, DeYoung CG, Kang Y, Grigorenko EL, Gray JR. Interaction of COMT val158 met and externalizing behavior: relation to prefrontal brain activity and behavioral performance. Neuroimage 60: 2158 –2168, 2012. Talairach J, Tournoux P. 3-Dimensional proportional system: an approach to cerebral imaging. In: Co-Planar Stereotaxic Atlas of the Human Brain. New York: Thieme, 1988. Tanaka SC, Balleine BW, O’Doherty JP. Calculating consequences: brain systems that encode the causal effects of actions. J Neurosci 28: 6750 – 6755, 2008. Tom SM, Fox CR, Trepel C, Poldrack RA. The neural basis of loss aversion in decision-making under risk. Science 315: 515–518, 2007. van den Bos W, Cohen MX, Kahnt T, Crone EA. Striatum-medial prefrontal cortex connectivity predicts developmental changes in reinforcement learning. Cereb Cortex 22: 1247–1255, 2012a. van den Bos W, Crone EA, Güro˘glu B. Brain function during probabilistic learning in relation to IQ and level of education. Dev Cogn Neurosci 2: S78 –S89, 2012b. Vul E, Harris C, Winkielman P, Pashler H. Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspect Psychol Sci 4: 274 –290, 2009. Wang J, Zamar R, Marazzi A, Yohai V, Salibian-Barrera M, Maronna R, Zivot E, Rocke D, Martin D, Konis K. robust: Insightful Robust Library (Online); R package version 0.3– 4. http://CRAN.R-project.org/package⫽ robust [2008]. Wechsler D. Wechsler Abbreviated Scale of Intelligence (3rd ed.). San Antonio, TX: Pearson Education, 1999. Wu SW, Delgado MR, Maloney LT. The neural correlates of subjective utility of monetary outcome and probability weight in economic and in motor decision under risk. J Neurosci 31: 8822– 8831, 2011.

Intelligence moderates neural responses to monetary reward and punishment.

The relations between intelligence (IQ) and neural responses to monetary gains and losses were investigated in a simple decision task. In 94 healthy a...
983KB Sizes 2 Downloads 3 Views