CHAPTER FIVE

Reinforcement Learning and Tourette Syndrome

Stefano Palminteri*,1, Mathias Pessiglione†

*Laboratoire des Neurosciences Cognitives (LNC), Ecole Normale Supérieure (ENS), Paris, France
†Motivation Brain and Behaviour Team (MBB), Institut du Cerveau et de la Moelle (ICM), Paris, France
1Corresponding author: e-mail address: [email protected]

Contents
1. Reinforcement Learning: Concepts and Paradigms
2. Neural Correlates of Reinforcement Learning
   2.1 Electrophysiological correlates in monkeys
   2.2 Functional magnetic resonance imaging correlates in humans
   2.3 Parkinson's disease and reinforcement learning
3. Tourette Syndrome and Reinforcement Learning
   3.1 Experimental study 1: Tourette syndrome and subliminal instrumental learning (Palminteri, Lebreton, et al., 2009)
   3.2 Experimental study 2: Tourette syndrome and reinforcement of motor skill learning (Palminteri et al., 2011)
   3.3 Experimental study 3: Tourette syndrome and probabilistic reinforcement learning (Worbe et al., 2011)
4. Conclusions and Perspectives
Acknowledgments
References


Abstract

In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, carried out by our team over the last few years. This report is preceded by an introduction aimed at providing the reader with the state of the art concerning the neural bases of reinforcement learning at the time of these studies and the scientific rationale behind them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex–basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well

as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evident maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter.

ABBREVIATIONS
ADHD  attention deficit-hyperactivity disorder
DA  dopamine
DBS  deep brain stimulation
fMRI  functional magnetic resonance imaging
OCD  obsessive–compulsive disorder
PD  Parkinson disease
RL  reinforcement learning
TD  temporal difference (learning)
TS  Tourette syndrome
VPFC  ventral prefrontal cortex
VS  ventral striatum

1. REINFORCEMENT LEARNING: CONCEPTS AND PARADIGMS

Reinforcement learning (RL) deals with the ability to learn the associations between stimuli, actions, and the occurrence of pleasant events, called rewards, or unpleasant events, called punishments. The term reinforcement indicates the process of forming and strengthening these associations through the reinforcer, which encompasses both rewards (positive reinforcers) and punishments (negative reinforcers). These associations affect the learner's behavior in a variety of ways: they shape vegetative and automatic responses as a function of reward and punishment anticipation, and they also bias the learner's actions. RL has an evident adaptive value, and it is unsurprising that it has been observed in extremely distant zoological phyla, such as Nematoda, Arthropoda, Mollusca and, of course, Chordata (Brembs, 2003; Murphey, 1967; Rankin, 2004). Modern neurocomputational accounts of RL lie at the convergence of two scientific threads of the twentieth century: animal learning and artificial intelligence (Dickinson, 1980; Sutton & Barto, 1998). The heritage of the first thread includes behavioral paradigms and psychological concepts;

the heritage of the second one is to be found in the mathematical formalization of these concepts and paradigms. The computational and the psychological views share the basic idea that the learner (the animal or the automaton) wants something (goal-directedness). This feature distinguishes RL from other learning processes, such as procedural or observational learning (Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, 2005). From this standpoint, two features emerge: RL is selectional (the agent must try and select among several alternative choices) and associative (these choices must be associated with a particular state). In the animal learning literature, RL was originally referred to as conditioning. The experimental paradigms in conditioning belong to two main classes: classical conditioning and instrumental conditioning. The minimal conditioning processes imply the building up of associations between a reinforcer and a stimulus or an action. In classical conditioning, the reinforcer is delivered irrespective of the learner's behavior, and the observed response is represented by innate preparatory responses. The typical example is Pavlov's dog learning to salivate (innate response) in response to a bell (stimulus) that announced the delivery of food (reinforcer) (Pavlov, 1927). In instrumental conditioning, the delivery of the reinforcer is contingent on a behavioral response. This feature was observable in the early experimental observations of this process, provided by Thorndike and Skinner: an animal enclosed in a box had to learn to perform specific actions (string pulling, lever pressing) in order to escape captivity or get food (Skinner, 1938; Thorndike, 1911). Regarding the causal determinants of conditioning, several conditions have been shown to be necessary: temporal contiguity (an action or a stimulus must be temporally close to the outcome for an association to be established), contingency (the probability of an outcome should be higher after the action or the stimulus, i.e., the action or stimulus should be predictive of the outcome), and prediction error (an action or a stimulus is associated with an outcome only if that outcome was not already fully predicted by the learner) (Rescorla, 1967). Rescorla and Wagner first introduced the latter idea (Rescorla & Wagner, 1972). They were interested in understanding a particular conditioning effect called the "blocking effect" (Kamin, 1967). In Kamin's blocking paradigm, an animal is first exposed to a conditioned stimulus (e.g., a bell), which predicts the occurrence of a reinforcer (e.g., food). After learning the association between the bell and the food, another stimulus (e.g., a light) is presented together with the food. Hence both the bell and the light are stimuli

that predict the food. However, when tested, the animal does not learn the association between the light and the food, as if it were "blocked" by the first association. Rescorla and Wagner proposed that conditioning occurs not only because two events co-occur, but because that co-occurrence is unanticipated on the basis of current knowledge. In the example above, the occurrence of food is already fully predicted by the bell, so no novel association with the light is learned. The primitive learning signal of their model is a "prediction error," defined as the difference between the predicted and the obtained reinforcer. The reinforcer (reward or punishment) prediction error is a measure of the prediction's accuracy, and the Rescorla and Wagner model is an error minimization model. From the artificial intelligence perspective, RL is a field of machine learning aimed at finding computational solutions to a class of problems closely related to the psychological paradigms described in the case of instrumental conditioning (Sutton & Barto, 1998). The agent is conceived as navigating through states of the environment, selecting actions and collecting a quantitative reward,¹ which should be maximized. From this learning perspective, two main functions arise as necessary for an RL agent: predicting the expected reward in a given state (reward prediction) and optimally selecting actions for reward maximization (choice). The most influential modern RL models incorporate temporal difference (TD) learning. The TD learning algorithm builds accurate reward predictions from delayed rewards; the learning rule of this model is not dissimilar from that used in the Rescorla and Wagner model, and it is based on a reward prediction error term. Q-learning is an extension of TD learning that separately learns the reward to be expected following each available action. The optimal choice then simply becomes choosing the action with the highest reward expectation (Watkins & Dayan, 1992). Q-learning is also based on a TD error (a minimal illustrative sketch of these update rules is given at the end of this section). Thus, thanks to RL algorithms, the experimenter can extrapolate key computational variables of these models and make quantitative predictions on how neural and behavioral data should evolve under the assumptions of the model. These computational constructs are referred to as "hidden variables," as opposed to the experimental observables (choices, reaction times) from which they are derived. In the next section, we shall see where these computational hidden variables, focusing on prediction errors, have been mapped in the primate brain.

¹ The reward of computational modeling is a quantitative term that can take negative values and therefore represent punishments as well.


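To make these constructs concrete, the following is a minimal illustrative sketch of a Rescorla–Wagner-style value update and a single-state (bandit) Q-learning trial with softmax choice. The parameter values and the two-option task are assumptions chosen for illustration; they are not the algorithms or parameters used in the studies discussed below.

```python
# Minimal sketch of prediction-error-based learning (illustrative only):
# a Rescorla-Wagner value update and a bandit-style Q-learning trial.
import math
import random

ALPHA = 0.2  # learning rate (assumed value)
BETA = 3.0   # inverse temperature controlling choice stochasticity (assumed value)

def rescorla_wagner_update(value, outcome, alpha=ALPHA):
    """Update a stimulus value from the prediction error (outcome - prediction)."""
    prediction_error = outcome - value
    return value + alpha * prediction_error

def softmax_choice(q_values, beta=BETA):
    """Pick an action with probability increasing in its learned value."""
    weights = [math.exp(beta * q) for q in q_values]
    threshold, cumulative = random.random() * sum(weights), 0.0
    for action, w in enumerate(weights):
        cumulative += w
        if threshold < cumulative:
            return action
    return len(weights) - 1

def q_learning_trial(q_values, reward_probs, alpha=ALPHA):
    """One instrumental trial: choose, observe a probabilistic reward, update."""
    action = softmax_choice(q_values)
    reward = 1.0 if random.random() < reward_probs[action] else 0.0
    prediction_error = reward - q_values[action]  # the TD/Rescorla-Wagner error term
    q_values[action] += alpha * prediction_error
    return action, reward, prediction_error

# Example: two options rewarded with probability 0.8 and 0.2
q = [0.0, 0.0]
for _ in range(100):
    q_learning_trial(q, reward_probs=[0.8, 0.2])
print(q)  # the first option should end up with the higher learned value
```

In a full TD setting, the prediction error would also include the discounted value of the next state; the single-state case above corresponds to the simple form most commonly fitted to the instrumental tasks discussed later in this chapter.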

2. NEURAL CORRELATES OF REINFORCEMENT LEARNING²

2.1. Electrophysiological correlates in monkeys

In this section, we review the most significant contributions to the understanding of the neural substrates of RL, coming from electrophysiological studies in monkeys and from functional neuroimaging and neuropsychology in humans. A large series of experiments performed by Wolfram Schultz and colleagues in the 1990s provided the first evidence of a neural system representing RL variables in the primate brain. At that time, dopamine (DA) function was mainly associated with many debilitating conditions including Parkinson's disease, Tourette syndrome, schizophrenia, attention deficit-hyperactivity disorder (ADHD), and addictions (Kienast & Heinz, 2006). It soon appeared that, instead of motor parameters, the behavioral variables most prominently associated with the dopaminergic response were reward-related (Mirenowicz & Schultz, 1996). In a seminal paper published in Science in 1997, Schultz and colleagues showed that during a classical conditioning task, at the moment of the outcome, the activity of midbrain dopaminergic neurons encoded the discrepancy between the reward and its prediction, such that an unpredicted reward elicits an increase in activity (positive prediction error), a fully predicted reward elicits no response (no prediction error), and the omission of a predicted reward induces a depression of activity (negative prediction error; Fig. 5.1A) (Schultz, Dayan, & Montague, 1997). The prediction error hypothesis of dopaminergic neurons' response during learning has since been replicated with other paradigms and by other groups (Bayer & Glimcher, 2005; Fiorillo, Tobler, & Schultz, 2003; Morris, Nevet, Arkadir, Vaadia, & Bergman, 2006). In summary, single-cell recording studies in monkeys consistently showed that during RL, dopaminergic neurons represent a theoretical learning signal (hidden variable): the reward prediction error. In the next section, we will show that the same learning signal has been shown to underpin human RL.

² A vast and rich literature exists concerning the neural bases of reinforcement learning in rodents. We opted to restrict this chapter to primate studies, because they were the first to test and adopt computational concepts and models. Note, however, that recent studies strongly suggest that the same neurobiological and computational models are valid for both orders (Steinberg et al., 2013).


[Figure 5.1 near here. Panels A and B are schematics plotting DA level over time around reward and punishment delivery, for healthy subjects (A) and for unmedicated and medicated PD and TS patients (PD with l-dopa, TS with neuroleptics) (B).]

Figure 5.1 (A) A schematic representing dopaminergic signals following positive and negative outcomes, compared to the baseline level, in healthy subjects. The green and the red lines, respectively, represent the level to reach (either above or below baseline) in order to express a signal strong enough to induce reward or punishment learning. This schematic is based on the results originally reported in Schultz, Dayan, and Montague (1997). (B) The same processes are represented for unmedicated and medicated PD and TS patients, where the DA baselines are supposed to be modified by the clinical and pharmacological condition. When the dopaminergic signal does not reach the green line, reward learning does not occur (this is the case for unmedicated PD and medicated TS). When the dopaminergic signal does not reach the red line, punishment learning does not occur (this is the case for unmedicated TS and medicated PD). This schematic represents a possible interpretation of the results obtained in experimental study 1 (Palminteri, Lebreton, et al., 2009).

2.2. Functional magnetic resonance imaging correlates in humans

The findings from electrophysiological studies in nonhuman primates presented earlier motivated subsequent functional magnetic resonance imaging (fMRI) experiments in humans, aimed at finding corresponding neural representations of the reward prediction error in the human brain. The first evidence for reward prediction error encoding in the human brain was provided by Berns, McClure, Pagnoni, and Montague (2001). In this study, they gave

squirts of juice and water either in a predictable or in an unpredictable manner, and they found that unpredictable reward sequences selectively induced activation of the ventral striatum (VS) and of the ventral prefrontal cortex (VPFC) (both target structures of the midbrain dopaminergic neurons) compared to predictable reward sequences, indicating that positive prediction errors, rather than reward itself, induced increased activity in these areas. These results were later replicated (O'Doherty, Deichmann, Critchley, & Dolan, 2002). These first studies used the so-called categorical approach to fMRI data analysis. Though this approach has the advantage of being easy to implement and explain, it has the great disadvantage of preventing one from capturing the online temporal evolution of RL signals (Friston et al., 1996). This is crucial for RL variables, such as reward predictions and reward prediction errors, which are expected to change radically over time. In fact, as learning occurs, reward prediction signals increase and prediction errors decrease: a feature completely missed with cognitive subtraction. A second wave of studies used a different approach, called "model-based fMRI," which allows one to follow learning-related changes in reward prediction and prediction error encoding (O'Doherty, Hampton, & Kim, 2007). This approach begins with computing the model estimates of the hidden computational variables according to the RL algorithm (most often a simple TD learning model for classical conditioning tasks and a Q-learning model for instrumental conditioning tasks), from the subjects' behavioral data. The fMRI data analysis then consists of searching for brain areas whose neural activity covaries with the model's estimates of the computational variables (a schematic sketch of this procedure is given at the end of this section). Following this model-based approach, a study from O'Doherty and colleagues using a classical conditioning procedure revealed that responses in the VS and in the VPFC were significantly correlated with this error signal (O'Doherty, Dayan, Friston, Critchley, & Dolan, 2003). Similar results were obtained by the same group in a subsequent experiment in which they contrasted a classical conditioning and an instrumental conditioning procedure (O'Doherty et al., 2004). These first results have since been replicated consistently with different kinds of rewards (primary, as well as secondary), different paradigms (classical, as well as instrumental conditioning), and by different groups (see e.g., Abler, Walter, Erk, Kammerer, & Spitzer, 2006; Kim, Shimojo, & O'Doherty, 2006; Palminteri, Boraud, Lafargue, Dubois, & Pessiglione, 2009; Rutledge, Dean, Caplin, & Glimcher, 2010). Thus, reward prediction errors have been reported consistently in the basal ganglia (VS) and in the VPFC, which are main projection sites of the dopaminergic neurons (Draganski et al., 2008).


The consensual interpretation of these results, built in analogy with electrophysiology studies in nonhuman primates, has been that these signals reflect the midbrain dopaminergic input to these areas. This idea has been further supported by another experiment in which the authors utilized a special MRI sequence to enhance sensitivity in the midbrain. They reported that the responses of the dopaminergic nuclei were compatible with the reward prediction error hypothesis (D'Ardenne, McClure, Nystrom, & Cohen, 2008). However, functional imaging only provides us with "functional correlates," which, in principle, could be merely epiphenomenal. This limitation is not specific to fMRI; it is also common to electrophysiological techniques. To assess causal relations between a neural system and a behavior, neuroscientists must observe the behavioral output of a perturbation of the system (Siebner et al., 2009). The perturbation can be the administration of a given molecule or an accidental brain injury. The causal implication of dopaminergic transmission in fMRI prediction error signals was provided by a pharmacological perturbation fMRI study in which subjects performed an instrumental learning task with probabilistic monetary rewards while given a dopaminergic treatment. The treatment was either a DA enhancer (levodopa), a DA antagonist (haloperidol), or placebo (Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006). The fMRI results showed again that reward prediction errors were represented in the VS; furthermore, they showed that the DA treatments modified the amplitude of these signals, such that l-dopa amplified prediction errors and haloperidol blunted them, establishing a direct link between dopaminergic transmission and fMRI prediction error signals. Moreover, these medications affected learning performance in accordance with their neural effects (enhancement under l-dopa, impairment under haloperidol), suggesting a causal role of DA modulation in reward learning. In summary, the study of the neural bases of RL in humans has consistently shown that (1) reward prediction errors are represented in the striatum and in the prefrontal cortex (mainly in their ventral parts, VS and VPFC) and that (2) dopaminergic pharmacological manipulations significantly affect these signals and consequently the behavioral performance.
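As an illustration of the model-based fMRI logic described above, the sketch below derives a trial-by-trial prediction error regressor from a subject's choices and outcomes. The fixed learning rate, the variable names, and the omission of parameter fitting and hemodynamic convolution are simplifying assumptions; this is not the actual analysis pipeline of the cited studies.

```python
# Schematic sketch of "model-based fMRI" (illustrative assumptions only):
# replay a subject's choices with a simple Q-learning model to obtain the hidden
# variable of interest (the prediction error) on every trial. In a real analysis,
# the learning rate would be fitted to the behavioral data and the resulting
# regressor convolved with a hemodynamic response function within a GLM.

def prediction_error_regressor(choices, outcomes, n_options=2, alpha=0.3):
    """Return the trial-by-trial prediction errors implied by the model."""
    q = [0.0] * n_options
    regressor = []
    for action, outcome in zip(choices, outcomes):
        pe = outcome - q[action]     # hidden variable for this trial
        regressor.append(pe)
        q[action] += alpha * pe      # value update used to predict the next trial
    return regressor

# Hypothetical usage: these prediction errors would then be correlated with the
# BOLD signal of each voxel to locate areas encoding the learning signal.
print(prediction_error_regressor(choices=[0, 0, 1, 0], outcomes=[1.0, 0.0, 0.0, 1.0]))
```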

2.3. Parkinson's disease and reinforcement learning

Given the prominent implication of the dopaminergic system in RL, it is unsurprising that the first neuropsychological investigations involved Parkinson's disease (PD). In a seminal study, Frank and colleagues

administered an instrumental learning task to a cohort of PD patients either medicated (on) or unmedicated (off) with l-dopa (Frank, Seeberger, & O'Reilly, 2004). Their results showed that off patients were impaired in learning from positive outcomes, whereas on patients were impaired in learning from negative outcomes. This result is consistent with the idea that reward and punishment learning are driven by dopaminergic positive and negative prediction errors, respectively. According to this interpretation, the level of dopaminergic transmission cannot increase enough to produce positive prediction errors in off patients, because of the neural loss in their midbrain DA nuclei, so that positive outcomes are not able to induce learning. On the contrary, in on patients, where the level of DA has been artificially increased by the treatment with l-dopa, negative prediction errors (pauses in DA transmission) are not possible, leading to an impairment in learning from negative outcomes (Fig. 5.1B). These results have been further replicated (in part or in full) by our group and others (Bódi et al., 2009; Frank, Samanta, Moustafa, & Sherman, 2007; Palminteri, Lebreton, et al., 2009; Rutledge et al., 2009; Voon et al., 2010). In summary, these results indicated that a dopaminergic motor disease such as PD can display nonmotor symptoms in a fashion that is fully compatible with the hypothesis of dopaminergic encoding of prediction errors during RL. On this basis, a natural extension of these studies was to study RL in Tourette syndrome (TS).

3. TOURETTE SYNDROME AND REINFORCEMENT LEARNING

In 1885, Georges Albert Édouard Brutus Gilles de la Tourette, working at the Pitié-Salpêtrière in Paris, published an article reporting the accurate clinical description of nine patients presenting a tic disorder characterized by onset in childhood, multiple physical (motor) tics, and at least one vocal (phonic) tic. This article is still considered the first accurate description of this particular tic disorder, which has since been named TS after its author (Rickards & Cavanna, 2009). There is no cure for TS and no medication that works universally for all individuals without significant adverse effects (Hartmann & Worbe, 2013). The classes of medication with the most proven efficacy in treating tics are typical and atypical neuroleptics (DA antagonists), including risperidone, haloperidol, and pimozide. They can have long-term and short-term adverse effects, at the motor level (DA antagonist-induced parkinsonian

syndrome), at the cognitive level (executive dysfunction), and at the affective level (blunted affect). Recently, a promising new molecule, aripiprazole, a DA partial agonist, has been proposed for the treatment of TS symptoms. For drug-resistant adulthood-persistent TS patients, deep brain stimulation (DBS) treatment is under study in several clinical centers (McNaught & Mink, 2011). The precise etiology of TS is unknown. An influential pathophysiological hypothesis implicates a condition of elevated DA levels. This hypothesis was first suggested by the observation of the beneficial effects of DA antagonists on TS symptoms. Consistent with this hypothesis, several genetic, postmortem, and neuroimaging studies have supported a dopaminergic hyperfunctioning, though these observations have recently been challenged (Gilbert et al., 2006; Malison et al., 1995; McNaught & Mink, 2011; Tarnok et al., 2007; Yoon, Gause, Leckman, & Singer, 2007; Yoon, Rippel, et al., 2007). TS displays frequent comorbidities with other psychiatric diseases. Among them, the most common are obsessive–compulsive disorder (OCD) and ADHD, which have also been associated with cortico-striatal dysfunction (Worbe, Mallet, et al., 2010). Recent anatomical and functional connectivity studies suggest that different phenotypes of TS (in terms of tic complexity or psychiatric comorbidities) could be associated with the impairment of distinct cortico-striatal circuits (Worbe et al., 2012; Worbe, Gerardin, et al., 2010). In summary, TS tics are believed to result from DA-induced dysfunction in cortical and subcortical regions. More precisely, even if still debated, TS seems to be associated with a functional hyperdopaminergia. In addition, TS is treated with DA antagonists. These observations, together with the well-established implication of DA and frontostriatal circuits in RL, motivated a series of experiments, described below, aimed at exploring RL in this pathology.

3.1. Experimental study 1: Tourette syndrome and subliminal instrumental learning (Palminteri, Lebreton, et al., 2009)

The first investigation of RL in TS patients employed a subliminal instrumental learning task, with monetary gains and losses, which had already been shown to activate the VS in a previous fMRI study (Pessiglione et al., 2008). The learning task was subliminal in that the contextual cues associated with gains and losses were presented for a very short period (50 ms), between two visual masks (Kouider & Dehaene, 2007). The advantage of using subliminal visual presentation is to ensure that basic RL processes are not

perturbed by high-level cognitive processes (Dehaene, Changeux, Naccache, Sackur, & Sergent, 2006). In addition to TS patients, in this experimental study we tested a cohort of PD patients. The PD patients were tested twice: once on and once off dopaminergic medication. TS patients were split into two groups according to their medication status: unmedicated or medicated with DA antagonists. The behavioral data analysis showed the following pattern: unmedicated PD patients were impaired in reward learning, as were medicated TS patients, while medicated PD patients and unmedicated TS patients were impaired in punishment learning (Fig. 5.2A). Several significant conclusions could be drawn from these results: (1) the double dissociation between medication status (on vs. off l-dopa) and outcome valence (reward vs. punishment), first shown by Frank and colleagues in PD patients, is robust across tasks, and in particular extends to the unconscious case; (2) this double dissociation is robust across pathologies and pharmacological models, since it was replicated in TS patients (Frank et al., 2004). These results support the hypothesis that bursts in DA transmission encode positive prediction errors and therefore drive reward learning, whereas dips in DA transmission encode negative prediction errors and therefore drive punishment learning (Fig. 5.1). Thus, this study provides the first experimental evidence of a functional hyperdopaminergia outside the motor domain in TS, by showing that unmedicated TS patients behave in a similar way to l-dopa-medicated PD patients. A toy simulation of this interpretation is sketched below.
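The following toy simulation is our own illustrative sketch, not the model fitted in the study: a Q-learner whose sensitivity to positive and negative prediction errors is unbalanced reproduces the qualitative pattern of Fig. 5.1B. All parameter values are assumptions.

```python
# Toy simulation of the Fig. 5.1B interpretation (illustrative assumptions only):
# if dips in DA transmission (negative prediction errors) cannot be expressed,
# punishment learning fails while reward learning is spared, and vice versa.
import math
import random

def softmax(q, beta=5.0):
    weights = [math.exp(beta * v) for v in q]
    total = sum(weights)
    return [w / total for w in weights]

def simulate(alpha_pos, alpha_neg, outcome, n_trials=120):
    """Gain condition: outcome = +1; loss condition: outcome = -1.
    Option 0 yields the outcome with p = 0.2 and option 1 with p = 0.8, so the
    'correct' choice is option 1 for gains and option 0 for losses."""
    q = [0.0, 0.0]
    correct = 0
    for _ in range(n_trials):
        action = 0 if random.random() < softmax(q)[0] else 1
        reward = outcome if random.random() < (0.2, 0.8)[action] else 0.0
        pe = reward - q[action]
        q[action] += (alpha_pos if pe > 0 else alpha_neg) * pe
        correct += action == (1 if outcome > 0 else 0)
    return correct / n_trials

random.seed(1)
# Blunted negative prediction errors (a crude stand-in for hyperdopaminergia,
# as hypothesized for unmedicated TS or l-dopa-medicated PD):
print(simulate(0.3, 0.0, +1.0), simulate(0.3, 0.0, -1.0))  # gains learned, losses not
# Blunted positive prediction errors (as hypothesized for unmedicated PD or
# DA antagonist-medicated TS):
print(simulate(0.0, 0.3, +1.0), simulate(0.0, 0.3, -1.0))  # losses learned, gains not
```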

3.2. Experimental study 2: Tourette syndrome and reinforcement of motor skill learning (Palminteri et al., 2011)

In a subsequent study, we investigated the effect of reward-based reinforcement on motor skill learning and the role of DA in this process. RL theory has been extensively used to understand choice behavior: DA signals reward prediction errors in order to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcers not only affect choices but also motor skills, such as playing piano or football. Here, we employed a novel paradigm to demonstrate that monetary rewards can improve motor skill learning in humans. Indeed, healthy participants got progressively faster at executing sequences of key presses that were repeatedly rewarded with 10 euros compared with 1 cent. Interestingly, control tests revealed that the effect of reinforcement on motor skill learning was independent of subjects' awareness of the sequence–reward associations: a result that is reminiscent of what we have shown in experimental study 1 concerning the possibility of unconscious instrumental learning.


[Figure 5.2 near here. Panel A plots learning performances for reward and punishment in Parkinson (PD OFF, PD ON) and Tourette (TS OFF, TS ON) groups. Panel B plots motor learning (reaction time over trials for the 1-cent sequence) and the reinforcement learning effect for TS OFF, controls, and TS ON. Panel C shows reward prediction error (RPE) encoding in the ventral striatum (neural and behavioral RPE) and learning curves over trials for TS OFF, TS AA, and TS PA groups.]

Figure 5.2 (A) A schematic summarizing the behavioral results of experimental study 1 (Palminteri, Lebreton, et al., 2009). The graphs show the interaction between reinforcement valence (positive or negative) and medication status. The same pattern can be observed in PD and TS (ON, medicated; OFF, unmedicated). (B) This schematic summarizes the main results of experimental study 2 (Palminteri et al., 2011). Motor skill learning is impaired in TS, compared to controls, irrespective of medication status, whereas the effect of reinforcement on motor learning follows a completely different pattern: it is exacerbated in unmedicated TS patients (TS OFF) compared to healthy controls, and absent in medicated TS patients (TS ON). (C) This schematic summarizes the main results of experimental study 3 (Worbe et al., 2011). Reward prediction error encoding was found in the VS (among other areas, such as the VPFC). Learning performances were blunted in DA antagonist-medicated patients (TS AA) compared to unmedicated patients (TS OFF) and partial agonist-medicated patients (TS PA). Note that all the graphs here represent ideal values meant to illustrate the pattern of the experimental results, but not the experimental results themselves (except for the ventral striatal activation).


TS patients, who were either medicated or unmedicated with DA antagonists, as in the previous study, performed the same behavioral task. We also included patients with focal dystonia, as an example of a hyperkinetic motor disorder unrelated to DA. The behavioral data analysis, based on computational modeling, showed the following dissociation: while motor skills were affected in all patient groups, RL was selectively enhanced in unmedicated patients with TS and impaired by DA antagonists (Fig. 5.2B). These results support the idea that RL has multiple behavioral effects, which are all mediated by DA transmission (Niv, Daw, Joel, & Dayan, 2007; Suri & Schultz, 1998). Clinically, the results further support the hypothesis that overactive DA transmission leads to excessive reinforcement of motor sequences, which might explain the formation of tics in TS (see Section 4 of this chapter). A toy illustration of how a reward prediction error could modulate motor improvement is sketched below.
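The following sketch is a purely illustrative toy, not the computational model actually fitted in the study: it shows how a single reward prediction error could both update a sequence's value and speed up its execution, so that a highly rewarded sequence improves faster than a poorly rewarded one. The execution-time rule and all parameters are assumptions.

```python
# Toy illustration (assumptions only): the same reward prediction error that
# updates a sequence's value also scales the trial-to-trial motor improvement.

def practice(reward, n_repetitions=15, alpha=0.3, speedup=0.3,
             rt_start=1.2, rt_floor=0.4):
    """Return execution times (in s) across repetitions of one sequence;
    reward is expressed on an arbitrary 0-1 scale (high vs. low incentive)."""
    value, rt, times = 0.0, rt_start, []
    for _ in range(n_repetitions):
        pe = reward - value                    # reward prediction error
        value += alpha * pe                    # value update (choice side)
        rt -= speedup * pe * (rt - rt_floor)   # PE-weighted motor improvement
        times.append(round(rt, 3))
    return times

print(practice(reward=1.0))    # high incentive: execution speeds up markedly
print(practice(reward=0.01))   # low incentive: little improvement
```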

3.3. Experimental study 3: Tourette syndrome and probabilistic reinforcement learning (Worbe et al., 2011)

In the last study, we investigated instrumental learning with a task involving probabilistic monetary gains, which had already been shown to activate the VPFC and the VS as a function of reward prediction and prediction error (Palminteri, Boraud, et al., 2009). In this study, we investigated the effect of different clinical phenotypes in terms of tic complexity and psychiatric comorbidity. Indeed, it has been suggested that the heterogeneity of clinical phenotypes in TS may relate to the dysfunction of distinct frontal cortex–basal ganglia circuits (Worbe et al., 2012; Worbe, Gerardin, et al., 2010). To assess RL performance across various clinical phenotypes and different pharmacological treatments, we recruited a large cohort of TS patients. Subjects (patients and controls) were scanned using functional MRI while they performed the probabilistic instrumental learning task. The fMRI data analysis confirmed the implication of the VPFC and the VS in reward encoding. Reward-related activation in limbic circuits was independently reduced by two factors: the presence of associated obsessive–compulsive symptoms and medication with DA antagonists. Computational modeling with standard RL algorithms indicated that, for both factors, the diminished reward-related activation could account for the impaired choice performance. Furthermore, RL performance and related brain activations were not affected by aripiprazole, a recent medication that acts as a DA partial agonist (Kawohl, Schneider, Vernaleken, & Neuner, 2009). These results support

the hypothesis that the heterogeneity of clinical phenotypes in TS patients relates to the dysfunction of distinct frontal cortex–basal ganglia circuits, and suggest that, unlike DA antagonists, DA partial agonists may preserve reward sensitivity and hence avoid blunting motivational drives. In summary, this study replicated the finding of a reward learning impairment in TS patients associated with (1) medication with DA antagonists, and extended this finding to (2) comorbid OCD. This experiment also showed that RL performance and reward-related brain activity were significantly correlated (Fig. 5.2C).

4. CONCLUSIONS AND PERSPECTIVES

A long line of experimental research in cognitive neuroscience indicates that RL is based on a teaching signal called the prediction error, the difference between the obtained and the expected outcome, and that this signal is represented by DA neurons projecting to frontal cortex–basal ganglia circuits. This has been confirmed by different techniques, such as electrophysiology, fMRI, and neuropharmacology, in nonhuman primates, healthy subjects, PD patients and, more recently, TS patients. The blunting of positive prediction errors observed in off PD patients might provide a mechanism for the expressed symptoms of PD at both the motor and the cognitive–psychiatric level. For instance, if an action is not reinforced when rewarded, selection of that action will not be facilitated in the future, and the consequent deficit in movement selection could account for some motor symptoms, such as akinesia and rigidity (Bar-Gad & Bergman, 2001). At another level, reduced reward sensitivity could account for psychiatric symptoms, such as depression or apathy (Agid et al., 2003; Weintraub, Comella, & Horn, 2008). Conversely, in medicated patients, the impairment in punishment avoidance, explained by the impossibility of expressing negative prediction errors, may account for the DA dysregulation syndrome, which encompasses the manifestation of different impulse control deficits (addictions, pathological gambling, hypersexuality) secondary to DA replacement therapy (Lawrence, Evans, & Lees, 2003; Voon et al., 2009). An abnormally high DA level blunts negative prediction errors and therefore makes the punishing consequences of these maladaptive behaviors (i.e., losing money in the case of gambling) ineffective in preventing them in the future. In experimental study 1, we found that TS mirrored PD patients with respect to reward and punishment learning, since unmedicated TS patients were impaired in punishment learning. A parsimonious explanation of their

deficit is to hypothesize an impairment in coding negative prediction errors, which is compatible with the idea of overactive DA transmission in TS (Fig. 5.1B). The idea of a functional hyperdopaminergia in TS patients was also supported by experimental study 2, in which we showed that the TS condition, in unmedicated patients, was associated with an exacerbated effect of reinforcement on motor learning performance compared to healthy controls, captured by enhanced reward prediction errors in the computational analysis. Traditionally, the pathological hyperactivity of the DA system has been linked to tic generation through a supposed role in the disinhibition of inappropriate motor patterns (the tics) and their positive reinforcement (Leckman, 2002; Mink, 2003). Against this view, we speculate that the absence of negative reinforcement, rather than an excessive positive reinforcement, is to be linked to the tics. Accordingly, the most plausible scenario is that the absence (or the reduction) of negative reinforcement, due to an excessive dopaminergic state, impedes the negative selection (the extinction) of inappropriate motor patterns. In healthy subjects, an inappropriate movement may occasionally be emitted during the lifetime (tics are in fact very frequent during childhood), but it is rapidly suppressed by negative reinforcement. On the contrary, in subjects with an abnormally high dopaminergic level (TS patients), this negative selection process would fail, and the tic would persist. This would also consistently explain the beneficial effects of DA antagonist treatment on TS symptoms: by reducing the DA level, these molecules allow the tic to be negatively reinforced and finally suppressed. At the cognitive level, a pathophysiological process similar to the one proposed for PD could account for the impulse control disorders, whose frequency is enhanced in TS (Frank, Piedad, Rickards, & Cavanna, 2011). In all three experimental studies, we showed that DA antagonist administration in TS patients blunted reward prediction errors at both the computational–behavioral and the neural level. This observation had also been reported in healthy subjects (Pessiglione et al., 2006). A possible mechanism for this reward learning inhibition is the blunting of the expression of dopaminergic positive prediction errors, thereby reducing reward sensitivity (Fig. 5.1B). This property would also explain certain cognitive and affective side effects of DA antagonist treatment, such as apathy, a general lack of motivation (Hartmann & Worbe, 2013). Interestingly, the DA partial agonist has been shown to preserve reward sensitivity at both the behavioral and the neural level. This preserved reward sensitivity could then explain why this molecule displays reduced side effects.


In experimental study 3, we also found an instrumental learning deficit in TS patients with OCD comorbidity, which correlated with blunted activity in the VPFC. This finding is consistent with evidence describing RL deficits in OCD patients (Cavanagh, Gründler, Frank, & Allen, 2010; Chamberlain et al., 2008; Figee et al., 2011; Nielen, den Boer, & Smid, 2009; Palminteri, Clair, Mallet, & Pessiglione, 2012; Remijnse et al., 2006). Thus, since it has been shown with a variety of behavioral tasks and clinical models, an RL deficit may represent a neuropsychological feature of OCD. These findings of a neural and behavioral reward processing impairment are consistent with the alleged dysfunction of ventral frontal cortex–basal ganglia loops that has been reported in OCD and in comorbid TS-OCD (Aouizerate et al., 2004; Rotge et al., 2010; Worbe, Gerardin, et al., 2010). Although the connection between RL impairment and obsessive–compulsive symptoms remains to be articulated, we speculate here that repetitive behaviors or thoughts might arise from aberrant reinforcement processes in a manner similar to that described for tics in TS. Future research should also focus on studying reinforcement in other pathologies of the TS spectrum, such as ADHD, which is also characterized by monoaminergic dysfunction and treated with dopaminergic medication (Biederman & Faraone, 2005). In summary, RL is a process whose dysfunction could in part be responsible for the behavioral manifestations of TS at different levels (from "lower" motor symptoms to "higher" cognitive and psychiatric symptoms). Thus, the formal framework of RL can provide fundamental insights for the comprehension of neuropsychiatric disorders. From this perspective, the experimental studies presented here can be considered part of a newborn and promising discipline, computational psychiatry, which aims to explain neuropsychiatric diseases with formal and quantitative behavioral models (Maia & Frank, 2011; Montague, Dolan, Friston, & Dayan, 2012). Beyond their interest for the pathophysiology of TS, these data also have implications for the implementation of current treatments and the development of new ones, at the pharmacological, surgical, and behavioral therapy levels (Hartmann & Worbe, 2013; McNaught & Mink, 2011). We have already shown that different kinds of pharmacological treatment differentially affect RL, possibly explaining the different expression of side effects. On the other hand, behavioral therapy, which is largely based on conditioning procedures, should take into account the medication status of the patient. For instance, on the basis of our results, negative reinforcement is not likely to be effective in unmedicated TS patients, whereas the opposite can be true for medicated ones. Concerning surgical

approaches, the known implication of different subcortical nuclei in RL could inform and affect the choice of new target nuclei. As previously mentioned, the sensitivity of RL paradigms to the dopaminergic status has proven very robust across neuropsychiatric pathologies and treatments. Accordingly, RL tasks, such as the probabilistic RL task with monetary gains and losses, with proven sensitivity to the dopaminergic status and well-established dopaminergic subcortical neural correlates, could potentially be adapted and standardized in order to be used in daily neuropsychological assessment as a proxy of dopaminergic functioning, and therefore used to assess a patient's propensity to display particular psychiatric symptoms or treatment side effects (Murray et al., 2008; Palminteri, Justo, et al., 2012; Pessiglione et al., 2006; Voon et al., 2010). Short- and mid-term experimental perspectives include the study of the effect of DBS on RL, as has already been done in PD, but not in TS (Frank et al., 2007; Palminteri et al., 2013). DBS is increasingly studied as a treatment for TS. The nuclei targeted so far include the globus pallidus and the VS (Viswanathan, Jimenez-Shahed, Baizabal Carvallo, & Jankovic, 2012). All these structures have been previously implicated in RL (see previous sections). Studying the effect of DBS on RL performance should extend the studies presented earlier. From a fundamental perspective, local field potential recordings in these subjects will give us the unique opportunity to study RL-related electrophysiological signals in humans (Cohen et al., 2009; Priori et al., 2013). Studying RL performance in TS from a developmental perspective would represent another interesting avenue. As a matter of fact, brain circuits mature at different speeds; for instance, the motor circuit matures before the VPFC (Giedd et al., 1999; Gogtay et al., 2004). TS being a developmental disorder, it would be interesting to map the RL capabilities of TS patients along the time course of brain maturation. Finally, the decision-making and learning community has recently witnessed the blossoming of studies directed toward the understanding of "model-based" RL ("model-based" is used here in a different sense from the "model-based fMRI" described earlier). This learning approach, though more precise and flexible, is computationally more complex, because it requires mentally simulating alternative courses of action (Daw, Niv, & Dayan, 2005; Samejima & Doya, 2007). Recently, model-based computation has been shown to be underpinned by the dorsal prefrontal cortex, a region that has been classically associated with cognitive

control (Gläscher, Daw, Dayan, & O'Doherty, 2010; Koechlin & Summerfield, 2007; Wunderlich, Dayan, & Dolan, 2012). Further research should investigate model-based learning performance in TS patients, as well as the effect of DA antagonists on this process (a schematic contrast between model-free and model-based evaluation is sketched below).
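The contrast can be illustrated with a schematic sketch; the small two-step task, transition probabilities, and function names below are hypothetical, chosen only to make the distinction concrete.

```python
# Schematic contrast between "model-free" values, cached from past prediction
# errors, and "model-based" values, recomputed by mentally simulating the
# consequences of each action with an internal model of the task.
# All task contents below are hypothetical, for illustration only.

TRANSITIONS = {"left": [(0.7, "A"), (0.3, "B")],    # internal model: action ->
               "right": [(0.3, "A"), (0.7, "B")]}   # (probability, next state)
STATE_REWARD = {"A": 1.0, "B": 0.0}                  # currently known rewards

def model_based_value(action):
    """Look ahead: simulate the action's outcomes with the internal model."""
    return sum(p * STATE_REWARD[state] for p, state in TRANSITIONS[action])

def model_free_update(cached_value, received_reward, alpha=0.2):
    """Cached value is only revised after the fact, through a prediction error."""
    return cached_value + alpha * (received_reward - cached_value)

print(model_based_value("left"), model_based_value("right"))  # 0.7 vs 0.3
# If the reward contingencies change (e.g., state B suddenly becomes rewarded),
# model-based values adjust immediately, whereas cached model-free values only
# catch up through further trial-and-error experience.
```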

ACKNOWLEDGMENTS
Maël Lebreton played a very important and active role in experimental studies 1 and 2. Yulia Worbe designed and conducted experimental study 3. Yulia Worbe and Andreas Hartmann took care of the TS patients and provided clinical data. David Grabli had a similar role for the PD and dystonic patients. S. P. received a PhD fellowship from the Neuropole de Recherche Francilien (Nerf). The studies were funded by the Fyssen Fondation (FF), the Ecole de Neuroscience de Paris (ENP), the Agence Nationale de la Recherche (ANR), and the Association Française du Syndrome de Gilles de la Tourette (AFSGT).

REFERENCES
Abler, B., Walter, H., Erk, S., Kammerer, H., & Spitzer, M. (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. NeuroImage, 31(2), 790–795. http://dx.doi.org/10.1016/j.neuroimage.2006.01.001.
Agid, Y., Arnulf, I., Bejjani, P., Bloch, F., Bonnet, A. M., Damier, P., et al. (2003). Parkinson's disease is a neuropsychiatric disorder. Advances in Neurology, 91, 365–370. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12442695.
Aouizerate, B., Guehl, D., Cuny, E., Rougier, A., Bioulac, B., Tignol, J., et al. (2004). Pathophysiology of obsessive-compulsive disorder: A necessary link between phenomenology, neuropsychology, imagery and physiology. Progress in Neurobiology, 72(3), 195–221. http://dx.doi.org/10.1016/j.pneurobio.2004.02.004.
Bar-Gad, I., & Bergman, H. (2001). Stepping out of the box: Information processing in the neural networks of the basal ganglia. Current Opinion in Neurobiology, 11(6), 689–695. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11741019.
Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1), 129–141. http://dx.doi.org/10.1016/j.neuron.2005.05.020.
Berns, G. S., McClure, S. M., Pagnoni, G., & Montague, P. R. (2001). Predictability modulates human brain response to reward. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 21(8), 2793–2798. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11306631.
Biederman, J., & Faraone, S. V. (2005). Attention-deficit hyperactivity disorder. Lancet, 366(9481), 237–248. http://dx.doi.org/10.1016/S0140-6736(05)66915-2.
Bódi, N., Kéri, S., Nagy, H., Moustafa, A., Myers, C. E., Daw, N., et al. (2009). Reward learning and the novelty-seeking personality: A between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients. Brain: A Journal of Neurology, 132(Pt. 9), 2385–2395. http://dx.doi.org/10.1093/brain/awp094.
Brembs, B. (2003). Operant conditioning in invertebrates. Current Opinion in Neurobiology, 13(6), 710–717. http://dx.doi.org/10.1016/j.conb.2003.10.002.
Cavanagh, J. F., Gründler, T. O. J., Frank, M. J., & Allen, J. J. B. (2010). Altered cingulate sub-region activation accounts for task-related dissociation in ERN amplitude as a function of obsessive-compulsive symptoms. Neuropsychologia, 48(7), 2098–2109. http://dx.doi.org/10.1016/j.neuropsychologia.2010.03.031.


Chamberlain, S. R., Menzies, L., Hampshire, A., Suckling, J., Fineberg, N. A., del Campo, N., et al. (2008). Orbitofrontal dysfunction in patients with obsessive-compulsive disorder and their unaffected relatives. Science (New York, N.Y.), 321(5887), 421–422. http://dx.doi.org/10.1126/science.1154433.
Cohen, M. X., Axmacher, N., Lenartz, D., Elger, C. E., Sturm, V., & Schlaepfer, T. E. (2009). Neuroelectric signatures of reward learning and decision-making in the human nucleus accumbens. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 34(7), 1649–1658. http://dx.doi.org/10.1038/npp.2008.222.
D'Ardenne, K., McClure, S. M., Nystrom, L. E., & Cohen, J. D. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science (New York, N.Y.), 319(5867), 1264–1267. http://dx.doi.org/10.1126/science.1150605.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711. http://dx.doi.org/10.1038/nn1560.
Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J., & Sergent, C. (2006). Conscious, preconscious, and subliminal processing: A testable taxonomy. Trends in Cognitive Sciences, 10(5), 204–211. http://dx.doi.org/10.1016/j.tics.2006.03.007.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge University Press. Retrieved from http://books.google.com/books?hl=it&lr=&id=2y84AAAAIAAJ&pgis=1.
Draganski, B., Kherif, F., Klöppel, S., Cook, P. A., Alexander, D. C., Parker, G. J. M., et al. (2008). Evidence for segregated and integrative connectivity patterns in the human basal ganglia. The Journal of Neuroscience, 28(28), 7143–7152. http://dx.doi.org/10.1523/JNEUROSCI.1486-08.2008.
Figee, M., Vink, M., de Geus, F., Vulink, N., Veltman, D. J., Westenberg, H., et al. (2011). Dysfunctional reward circuitry in obsessive-compulsive disorder. Biological Psychiatry, 69(9), 867–874. http://dx.doi.org/10.1016/j.biopsych.2010.12.003.
Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science (New York, N.Y.), 299(5614), 1898–1902. http://dx.doi.org/10.1126/science.1077349.
Frank, M. C., Piedad, J., Rickards, H., & Cavanna, A. E. (2011). The role of impulse control disorders in Tourette syndrome: An exploratory study. Journal of the Neurological Sciences, 310(1–2), 276–278. http://dx.doi.org/10.1016/j.jns.2011.06.032.
Frank, M. J., Samanta, J., Moustafa, A. A., & Sherman, S. J. (2007). Hold your horses: Impulsivity, deep brain stimulation, and medication in parkinsonism. Science (New York, N.Y.), 318(5854), 1309–1312. http://dx.doi.org/10.1126/science.1146157.
Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science (New York, N.Y.), 306(5703), 1940–1943. http://dx.doi.org/10.1126/science.1102941.
Friston, K. J., Price, C. J., Fletcher, P., Moore, C., Frackowiak, R. S., & Dolan, R. J. (1996). The trouble with cognitive subtraction. NeuroImage, 4(2), 97–104. http://dx.doi.org/10.1006/nimg.1996.0033.
Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., et al. (1999). Brain development during childhood and adolescence: A longitudinal MRI study. Nature Neuroscience, 2(10), 861–863. http://dx.doi.org/10.1038/13158.
Gilbert, D. L., Christian, B. T., Gelfand, M. J., Shi, B., Mantil, J., & Sallee, F. R. (2006). Altered mesolimbocortical and thalamic dopamine in Tourette syndrome. Neurology, 67(9), 1695–1697. http://dx.doi.org/10.1212/01.wnl.0000242733.18534.2c.
Gläscher, J., Daw, N., Dayan, P., & O'Doherty, J. P. (2010). States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4), 585–595. http://dx.doi.org/10.1016/j.neuron.2010.04.016.
Gogtay, N., Giedd, J. N., Lusk, L., Hayashi, K. M., Greenstein, D., Vaituzis, A. C., et al. (2004). Dynamic mapping of human cortical development during childhood through early adulthood. Proceedings of the National Academy of Sciences of the United States of America, 101(21), 8174–8179. http://dx.doi.org/10.1073/pnas.0402680101.
Hartmann, A., & Worbe, Y. (2013). Pharmacological treatment of Gilles de la Tourette syndrome. Neuroscience and Biobehavioral Reviews, 37(6), 1157–1161. http://dx.doi.org/10.1016/j.neubiorev.2012.10.014.
Kamin, L. J. (1967). "Attention-like" processes in classical conditioning. Hamilton, Ontario: Department of Psychology, McMaster University.
Kawohl, W., Schneider, F., Vernaleken, I., & Neuner, I. (2009). Aripiprazole in the pharmacotherapy of Gilles de la Tourette syndrome in adult patients. The World Journal of Biological Psychiatry: The Official Journal of the World Federation of Societies of Biological Psychiatry, 10(4 Pt. 3), 827–831. http://dx.doi.org/10.1080/15622970701762544.
Kienast, T., & Heinz, A. (2006). Dopamine and the diseased brain. CNS & Neurological Disorders—Drug Targets, 5(1), 109–131. http://dx.doi.org/10.2174/187152706784111560.
Kim, H., Shimojo, S., & O'Doherty, J. P. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biology, 4(8), e233. http://dx.doi.org/10.1371/journal.pbio.0040233.
Koechlin, E., & Summerfield, C. (2007). An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11(6), 229–235. http://dx.doi.org/10.1016/j.tics.2007.04.005.
Kouider, S., & Dehaene, S. (2007). Levels of processing during non-conscious perception: A critical review of visual masking. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 362(1481), 857–875. http://dx.doi.org/10.1098/rstb.2007.2093.
Lawrence, A. D., Evans, A. H., & Lees, A. J. (2003). Compulsive use of dopamine replacement therapy in Parkinson's disease: Reward systems gone awry? Lancet Neurology, 2(10), 595–604. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/14505581.
Leckman, J. F. (2002). Tourette's syndrome. Lancet, 360(9345), 1577–1586. http://dx.doi.org/10.1016/S0140-6736(02)11526-1.
Maia, T. V., & Frank, M. J. (2011). From reinforcement learning models to psychiatric and neurological disorders. Nature Neuroscience, 14(2), 154–162. http://dx.doi.org/10.1038/nn.2723.
Malison, R. T., McDougle, C. J., van Dyck, C. H., Scahill, L., Baldwin, R. M., Seibyl, J. P., et al. (1995). [123I]beta-CIT SPECT imaging of striatal dopamine transporter binding in Tourette's disorder. The American Journal of Psychiatry, 152(9), 1359–1361. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7653693.
McNaught, K. S. P., & Mink, J. W. (2011). Advances in understanding and treatment of Tourette syndrome. Nature Reviews. Neurology, 7(12), 667–676. http://dx.doi.org/10.1038/nrneurol.2011.167.
Mink, J. W. (2003). The basal ganglia and involuntary movements. Archives of Neurology, 60, 1365–1368.
Mirenowicz, J., & Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564), 449–451. http://dx.doi.org/10.1038/379449a0.
Montague, P. R., Dolan, R. J., Friston, K. J., & Dayan, P. (2012). Computational psychiatry. Trends in Cognitive Sciences, 16(1), 72–80. http://dx.doi.org/10.1016/j.tics.2011.11.018.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience, 9(8), 1057–1063. http://dx.doi.org/10.1038/nn1743.
Murphey, R. M. (1967). Instrumental conditioning of the fruit fly, Drosophila melanogaster. Animal Behaviour, 15(1), 153–161. http://dx.doi.org/10.1016/S0003-3472(67)80027-7.
Murray, G. K., Corlett, P. R., Clark, L., Pessiglione, M., Blackwell, A. D., Honey, G., et al. (2008). Substantia nigra/ventral tegmental reward prediction error disruption in psychosis. Molecular Psychiatry, 13(3), 267–276. http://dx.doi.org/10.1038/sj.mp.4002058, 239.


Nielen, M. M., den Boer, J. A., & Smid, H. G. O. M. (2009). Patients with obsessive-compulsive disorder are impaired in associative learning based on external feedback. Psychological Medicine, 39(9), 1519–1526. http://dx.doi.org/10.1017/S0033291709005297.
Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology, 191(3), 507–520. http://dx.doi.org/10.1007/s00213-006-0502-4.
O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38(2), 329–337. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12718865.
O'Doherty, J. P., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science (New York, N.Y.), 304(5669), 452–454. http://dx.doi.org/10.1126/science.1094285.
O'Doherty, J. P., Deichmann, R., Critchley, H. D., & Dolan, R. J. (2002). Neural responses during anticipation of a primary taste reward. Neuron, 33(5), 815–826. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11879657.
O'Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104, 35–53. http://dx.doi.org/10.1196/annals.1390.022.
Palminteri, S., Boraud, T., Lafargue, G., Dubois, B., & Pessiglione, M. (2009). Brain hemispheres selectively track the expected value of contralateral options. The Journal of Neuroscience, 29(43), 13465–13472. http://dx.doi.org/10.1523/JNEUROSCI.1500-09.2009.
Palminteri, S., Clair, A.-H., Mallet, L., & Pessiglione, M. (2012). Similar improvement of reward and punishment learning by serotonin reuptake inhibitors in obsessive-compulsive disorder. Biological Psychiatry, 72(3), 244–250. http://dx.doi.org/10.1016/j.biopsych.2011.12.028.
Palminteri, S., Justo, D., Jauffret, C., Pavlicek, B., Dauta, A., Delmaire, C., et al. (2012). Critical roles for anterior insula and dorsal striatum in punishment-based avoidance learning. Neuron, 76(5), 998–1009. http://dx.doi.org/10.1016/j.neuron.2012.10.017.
Palminteri, S., Lebreton, M., Worbe, Y., Grabli, D., Hartmann, A., & Pessiglione, M. (2009). Pharmacological modulation of subliminal learning in Parkinson's and Tourette's syndromes. Proceedings of the National Academy of Sciences of the United States of America, 106(45), 19179–19184. http://dx.doi.org/10.1073/pnas.0904035106.
Palminteri, S., Lebreton, M., Worbe, Y., Hartmann, A., Lehéricy, S., Vidailhet, M., et al. (2011). Dopamine-dependent reinforcement of motor skill learning: Evidence from Gilles de la Tourette syndrome. Brain, 134(8), 2287–2301. http://dx.doi.org/10.1093/brain/awr147.
Palminteri, S., Serra, G., Buot, A., Schmidt, L., Welter, M.-L., & Pessiglione, M. (2013). Hemispheric dissociation of reward processing in humans: Insights from deep brain stimulation. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, pii: S0010-9452(13)00072-5. http://dx.doi.org/10.1016/j.cortex.2013.02.014. [Epub ahead of print].
Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. London: Oxford University Press. Retrieved from http://psychclassics.yorku.ca/Pavlov/.
Pessiglione, M., Petrovic, P., Daunizeau, J., Palminteri, S., Dolan, R. J., & Frith, C. D. (2008). Subliminal instrumental conditioning demonstrated in the human brain. Neuron, 59(4), 561–567. http://dx.doi.org/10.1016/j.neuron.2008.07.005.
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042–1045. http://dx.doi.org/10.1038/nature05051.
Priori, A., Giannicola, G., Rosa, M., Marceglia, S., Servello, D., Sassi, M., et al. (2013). Deep brain electrophysiological recordings provide clues to the pathophysiology of Tourette syndrome. Neuroscience and Biobehavioral Reviews, 37(6), 1063–1068. http://dx.doi.org/10.1016/j.neubiorev.2013.01.011.
Rankin, C. H. (2004). Invertebrate learning: What can't a worm learn? Current Biology, 14(15), R617–R618. http://dx.doi.org/10.1016/j.cub.2004.07.044.
Remijnse, P. L., Nielen, M. M., van Balkom, A. J., Cath, D. C., van Oppen, P., Uylings, H. B. M., et al. (2006). Reduced orbitofrontal-striatal activity on a reversal learning task in obsessive-compulsive disorder. Archives of General Psychiatry, 63(11), 1225–1236. http://dx.doi.org/10.1001/archpsyc.63.11.1225.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74(1), 71–80.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rickards, H., & Cavanna, A. E. (2009). Gilles de la Tourette: The man behind the syndrome. Journal of Psychosomatic Research, 67(6), 469–474. http://dx.doi.org/10.1016/j.jpsychores.2009.07.019.
Rotge, J.-Y., Langbour, N., Guehl, D., Bioulac, B., Jaafari, N., Allard, M., et al. (2010). Gray matter alterations in obsessive-compulsive disorder: An anatomic likelihood estimation meta-analysis. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 35(3), 686–691. http://dx.doi.org/10.1038/npp.2009.175.
Rutledge, R. B., Dean, M., Caplin, A., & Glimcher, P. W. (2010). Testing the reward prediction error hypothesis with an axiomatic model. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 30(40), 13525–13536. http://dx.doi.org/10.1523/JNEUROSCI.1747-10.2010.
Rutledge, R. B., Lazzaro, S. C., Lau, B., Myers, C. E., Gluck, M. A., & Glimcher, P. W. (2009). Dopaminergic drugs modulate learning rates and perseveration in Parkinson's patients in a dynamic foraging task. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(48), 15104–15114. http://dx.doi.org/10.1523/JNEUROSCI.3524-09.2009.
Samejima, K., & Doya, K. (2007). Multiple representations of belief states and action values in corticobasal ganglia loops. Annals of the New York Academy of Sciences, 1104, 213–228. http://dx.doi.org/10.1196/annals.1390.024.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. http://dx.doi.org/10.1126/science.275.5306.1593.
Siebner, H. R., Bergmann, T. O., Bestmann, S., Massimini, M., Johansen-Berg, H., Mochizuki, H., et al. (2009). Consensus paper: Combining transcranial stimulation with neuroimaging. Brain Stimulation, 2(2), 58–80. http://dx.doi.org/10.1016/j.brs.2008.11.002.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Oxford, England: Appleton-Century. Retrieved from http://en.wikipedia.org/wiki/The_Behavior_of_Organisms.
Steinberg, E. E., Keiflin, R., Boivin, J. R., Witten, I. B., Deisseroth, K., & Janak, P. H. (2013). A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience, 16(7), 966–973. http://dx.doi.org/10.1038/nn.3413.
Suri, R. E., & Schultz, W. (1998). Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Experimental Brain Research. Experimentelle Hirnforschung. Expérimentation Cérébrale, 121(3), 350–354. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9746140.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tarnok, Z., Ronai, Z., Gervai, J., Kereszturi, E., Gadoros, J., Sasvari-Szekely, M., et al. (2007). Dopaminergic candidate genes in Tourette syndrome: Association between tic severity and 3′ UTR polymorphism of the dopamine transporter gene. American Journal of Medical Genetics Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 144B(7), 900–905. http://dx.doi.org/10.1002/ajmg.b.30517.
Theoretical neuroscience: Computational and mathematical modeling of neural systems. (2005). The MIT Press. ISBN: 0262541858.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: The Macmillan Company. Retrieved from http://www.psycontent.com/content/xg6830n4711655l1/.
Viswanathan, A., Jimenez-Shahed, J., Baizabal Carvallo, J. F., & Jankovic, J. (2012). Deep brain stimulation for Tourette syndrome: Target selection. Stereotactic and Functional Neurosurgery, 90(4), 213–224. http://dx.doi.org/10.1159/000337776.
Voon, V., Fernagut, P.-O., Wickens, J., Baunez, C., Rodriguez, M., Pavon, N., et al. (2009). Chronic dopaminergic stimulation in Parkinson's disease: From dyskinesias to impulse control disorders. Lancet Neurology, 8(12), 1140–1149. http://dx.doi.org/10.1016/S1474-4422(09)70287-X.
Voon, V., Pessiglione, M., Brezing, C., Gallea, C., Fernandez, H. H., Dolan, R. J., et al. (2010). Mechanisms underlying dopamine-mediated reward bias in compulsive behaviors. Neuron, 65(1), 135–142. http://dx.doi.org/10.1016/j.neuron.2009.12.027.
Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. http://dx.doi.org/10.1007/BF00992698.
Weintraub, D., Comella, C. L., & Horn, S. (2008). Parkinson's disease – Part 3: Neuropsychiatric symptoms. The American Journal of Managed Care, 14(2 Suppl.), S59–S69. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18402509.
Worbe, Y., Gerardin, E., Hartmann, A., Valabrégue, R., Chupin, M., Tremblay, L., et al. (2010). Distinct structural changes underpin clinical phenotypes in patients with Gilles de la Tourette syndrome. Brain: A Journal of Neurology, 133(Pt. 12), 3649–3660. http://dx.doi.org/10.1093/brain/awq293.
Worbe, Y., Malherbe, C., Hartmann, A., Pélégrini-Issac, M., Messé, A., Vidailhet, M., et al. (2012). Functional immaturity of cortico-basal ganglia networks in Gilles de la Tourette syndrome. Brain: A Journal of Neurology, 135(Pt. 6), 1937–1946. http://dx.doi.org/10.1093/brain/aws056.
Worbe, Y., Mallet, L., Golmard, J.-L., Béhar, C., Durif, F., Jalenques, I., et al. (2010). Repetitive behaviours in patients with Gilles de la Tourette syndrome: Tics, compulsions, or both? PloS One, 5(9), e12959. http://dx.doi.org/10.1371/journal.pone.0012959.
Worbe, Y., Palminteri, S., Hartmann, A., Vidailhet, M., Lehéricy, S., & Pessiglione, M. (2011). Reinforcement learning and Gilles de la Tourette syndrome. Archives of General Psychiatry, 68(12), 1257–1266.
Wunderlich, K., Dayan, P., & Dolan, R. J. (2012). Mapping value based planning and extensively trained choice in the human brain. Nature Neuroscience, 15(5), 786–791. http://dx.doi.org/10.1038/nn.3068.
Yoon, D. Y., Gause, C. D., Leckman, J. F., & Singer, H. S. (2007). Frontal dopaminergic abnormality in Tourette syndrome: A postmortem analysis. Journal of the Neurological Sciences, 255(1–2), 50–56. http://dx.doi.org/10.1016/j.jns.2007.01.069.
Yoon, D. Y., Rippel, C. A., Kobets, A. J., Morris, C. M., Lee, J. E., Williams, P. N., et al. (2007). Dopaminergic polymorphisms in Tourette syndrome: Association with the DAT gene (SLC6A3). American Journal of Medical Genetics Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 144B(5), 605–610. http://dx.doi.org/10.1002/ajmg.b.30466.