Cognition 142 (2015) 299–311


The role of explanatory considerations in updating

Igor Douven a,⇑, Jonah N. Schupbach b,1

a Sciences, Normes, Décision, Paris-Sorbonne University, France
b Department of Philosophy, University of Utah, United States

Article info

Article history: Received 4 July 2014; Revised 27 April 2015; Accepted 29 April 2015; Available online 9 June 2015.

Keywords: Explanation; Updating; Inference; Probability; Bayesianism

Abstract

There is an ongoing controversy in philosophy about the connection between explanation and inference. According to Bayesians, explanatory considerations should be given weight in determining which inferences to make, if at all, only insofar as doing so is compatible with Strict Conditionalization. Explanationists, on the other hand, hold that explanatory considerations can be relevant to the question of how much confidence to invest in our hypotheses in ways which violate Strict Conditionalization. The controversy has focused on normative issues. This paper investigates experimentally the descriptive question of whether judgments of the explanatory goodness of hypotheses do play a role when people revise their degrees of belief in those hypotheses upon the receipt of new evidence. We present the results of three experiments that together strongly support the predictive superiority of the explanationist position.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Human learning involves the modifying ("updating") of degrees of confidence (or "credences") over time in response to new information. A crucial question facing any account of human learning is thus what factors and principles determine when and to what extent we change our credences. The Bayesian response to this question is that, upon learning that some proposition is undoubtedly true, an agent's updated credences ought to match his or her prior credences (taken just prior to learning that new information) conditional on the supposition that that proposition holds true. In other words, where credences are formally represented as probabilities, posterior (updated) probabilities are determined by prior conditional probabilities. This idea gets officially canonized in the following rule:

Strict Conditionalization: Upon learning A ∈ 𝒜 and nothing else between times t1 and t2, an agent's credences are to be updated so as to satisfy the equality Pr_t2(B) = Pr_t1(B | A) for all propositions B ∈ 𝒜 (provided Pr_t1(A) > 0).

⇑ Corresponding author. E-mail addresses: [email protected] (I. Douven), [email protected] (J.N. Schupbach).
1 The authors contributed equally to this article.
http://dx.doi.org/10.1016/j.cognition.2015.04.017

Here, 𝒜 is an algebra of propositions over which the probability measures Pr_t1—representing the agent's credences at t1—and Pr_t2—representing the agent's credences at t2—are defined, and Pr_t1(B | A) designates the conditional probability of B given A at t1. Adherence to Strict Conditionalization commits Bayesians to the claim that prior conditional credences alone determine posterior credences.

An ongoing controversy in philosophy points to an alternative theory, however. In debate over the confirmation-theoretic status of explanatory considerations, so-called explanationists hold that judgments of the explanatory goodness of hypotheses are directly relevant to the question of how much confidence we should invest in those hypotheses. Moreover, this claim is often taken to imply that explanatory considerations influence our credences in ways not captured by Strict Conditionalization.2 By contrast, Bayesians hold that explanatory considerations have no home in confirmation theory, at least not in any way that might conflict with Strict Conditionalization (Salmon, 2001; van Fraassen, 1989).
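To fix ideas, here is a minimal toy illustration of Strict Conditionalization. The two-hypothesis setup and the numbers are ours, chosen purely for illustration; they are not drawn from the paper.

    # Toy prior credence function Pr_t1 over four hypothesis/evidence combinations
    # (H1 or H2, E or not-E); the numbers are arbitrary and for illustration only.
    prior = {
        ("H1", "E"): 0.30, ("H1", "not-E"): 0.20,
        ("H2", "E"): 0.10, ("H2", "not-E"): 0.40,
    }

    def conditionalize(prior, evidence):
        # Strict Conditionalization: Pr_t2(H) = Pr_t1(H | evidence) for every H.
        pr_e = sum(p for (h, e), p in prior.items() if e == evidence)
        return {h: sum(p for (h2, e), p in prior.items() if h2 == h and e == evidence) / pr_e
                for h in {h for (h, _) in prior}}

    # Upon learning E (and nothing else), the new credence in H1 is
    # Pr_t1(H1 | E) = 0.30 / (0.30 + 0.10) = 0.75.
    print(conditionalize(prior, "E"))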

2 Some explanationists have proposed ways in which explanatory considerations might influence posterior probabilities purely via prior conditional probabilities, thus making explanation's influence on updating compatible with Strict Conditionalization (see, e.g., Okasha (2000), Lipton (2004), and Weisberg (2009); see Douven (2011) for a critical discussion of this approach). This is akin to the proposal, made by some psychologists (e.g., Lombrozo, 2007), that priors may be informed by considerations of simplicity. Such an approach provides a gloss on, rather than an alternative to, Bayesianism. In this paper, we reserve the label "explanationism" for approaches that are incompatible with Bayesianism.


The philosophical controversy has focused on normative issues—particularly the issue of whether explanatory considerations ought to be given special confirmatory weight in the logic of updating. This paper instead presents three experiments that collectively aim to investigate the respective descriptive merits of Bayesianism and explanationism. For our purposes, when taken as descriptive theories, we understand these positions as involving the following general claims:

Bayesianism: People actually update their credences in accordance with Strict Conditionalization.

Explanationism: In updating their credences, people take into account their explanatory judgments in a way not already captured by Strict Conditionalization.

Importantly, there are two distinct ways in which explanationism could prove descriptively superior to Bayesianism. Firstly, explanatory considerations may have a significant role in a descriptive account of updating in addition to conditional probabilities. The implication here would be that the Bayesian model is on the right track, but ultimately in need of explanationist augmentation. Secondly, explanatory judgments may have a significant role in a descriptive account of updating in place of conditional probabilities. Here, the supported model would be one that abandons the Bayesian model altogether, replacing—rather than augmenting—conditional probabilities with explanatory judgments.

Our investigation aims to shed light on the following specific questions:

Q1. How do Bayesianism and explanationism compare with regard to their descriptive adequacy? Do judgments of the explanatory goodness of hypotheses play an essential role in updating in a way that is incompatible with the Bayesian doctrine?

Q2. If explanatory judgments are found to have such a role, do conditional probabilities retain an important influence in updating alongside such judgments?

Q3. What sort of explanatory judgments in particular (if any) factor into updating?

Past studies suggest that people's updates deviate from Strict Conditionalization (e.g., Phillips & Edwards, 1966; Robinson & Hastie, 1985; Zhao, Crupi, Tentori, Fitelson, & Osherson, 2012). There is also empirical work showing that explanatory considerations do have an impact on people's beliefs (e.g., Douven & Verbrugge, 2010; Khemlani & Johnson-Laird, 2011, 2012, 2013; Koehler, 1991; Lombrozo, 2006, 2007; Lombrozo & Carey, 2006; Pennington & Hastie, 1992). And while of late a broadly Bayesian approach to learning has come to resonate among many working in cognitive psychology (e.g., Baratgin, Over, & Politzer, 2013; Elqayam & Evans, 2013; Oaksford & Chater, 2007, 2013; Over, 2009) and cognitive neuroscience (e.g., Doya, Ishii, Pouget, & Rao, 2006; Friston & Stephan, 2007; Hohwy, 2013), we are not aware of any research in those areas that could be said to favor Strict Conditionalization over some probabilistic version of explanationism. Indeed, Oaksford and Chater (2013:374) conclude their discussion of the issue of belief change in the context of the new Bayesian paradigm in the psychology of reasoning with the remark that "it is unclear what are the rational probabilistic constraints on dynamic inference." At any rate, a systematic empirical comparison of Bayesianism and explanationism is, to the best of our knowledge, not yet to be found in the literature. With the following three experiments, we aim to begin filling this lacuna.

2. Experiment 1

Experiment 1 was originally conducted with an eye toward comparing different probabilistic measures of explanatory goodness (the results of this comparison were presented in Schupbach, 2011). It was only later realized that the data gathered in the experiment might also shed light on questions Q1–Q3. For the purposes of comparing measures of explanation, subjective probability judgments (credences) as well as objective probabilities were used to calculate values of explanatory goodness according to the various measures, which were then compared with participants' judgments of explanatory goodness. Here, our goal is different. Instead of aiming to determine which measure of explanation best predicts actual judgments of explanatory goodness, we are interested in the role (if any) that such judgments play in updating credences. To answer this question, we reanalyzed the data from the earlier experiment. We begin by summarizing the experiment.

2.1. Participants

Twenty-six students from the University of Pittsburgh were individually interviewed as part of the study. In return for their participation, they received $10 each. The mean age of the participants was 20 years (SD = 2). Twelve of the participants were female.

2.2. Materials and procedure

Experiment 1's materials and method were based closely upon those used by Phillips and Edwards (1966)—and more recently by Tentori, Crupi, Bonini, and Osherson (2007). Two opaque urns were used in the interview, each containing 40 balls: one urn ("urn A") contained 30 black balls and 10 white ones, and the other urn ("urn B") contained 15 black balls and 25 white ones. At the beginning of the interview, each participant was shown both urns and informed of their respective contents. The participant was also given a visual representation of these contents, which he or she was allowed to consult at any point during the interview. Which urn would be used throughout the remainder of the interview was next decided via an actual flip of a fair coin. Each participant saw that the coin flip determined the choice of urn; however, whether the chosen urn was A or B was concealed from the participant. Ten balls were then randomly drawn from the chosen urn without replacement. Throughout the interview, the drawn balls were lined up in front of the participant, in the order in which they had been drawn.

After each ball was drawn, the participant was asked first to judge the explanatory goodness, in light of the evidence so far, of the hypothesis that urn A had been selected (H_A), and then to do the same for the hypothesis that urn B had been selected (H_B). The participant had to answer these questions by making a mark on a continuous scale with range [−1, 1]. The scale included five interpreted points labeled at equal distances, the leftmost label reading that the hypothesis at issue was an extremely poor explanation of the evidence so far, the rightmost reading that the hypothesis was an extremely good explanation, and the labels in between reading that the hypothesis was a poor/neither poor nor good/good explanation, in the obvious order. After these questions had been answered, the participant was asked how likely it was in his or her judgment that urn A had been selected, given the color of the ball or balls that so far had been drawn. The same question was asked concerning urn B, but here the participant was notified that the answer to this question and



the answer to the previous question had to sum to 1 (or 100, if the estimates were given in percentages).

[Fig. 1. The left panel shows posteriors (blue) and objective conditional probabilities (red) after each draw for the 26 participants; the right panel plots subjective posteriors against objective conditional probabilities, with an added smoother to facilitate visual interpretation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)]

2.3. Descriptive statistics

Throughout the following, we use "subjective posterior probabilities" (or simply "subjective posteriors," or even just "posteriors") to refer to participants' credences in H_A as they are updated upon learning the results of each draw. Additionally, we use "objective conditional probabilities" to refer to the probabilities of H_A calculated from the chance setup of the experiments conditional upon the results corresponding to each posterior. So, for instance, according to Bayesianism, upon learning the results of the first four draws, {d1, d2, d3, d4}, a participant's posterior ought to correspond to the conditional probability Pr(H_A | d1 ∧ d2 ∧ d3 ∧ d4).3

3 Given that in the context of the experiment H_A and H_B were mutually exclusive and jointly exhaustive, posteriors as well as objective conditional probabilities for H_B are obtained by simply subtracting the corresponding posteriors or objective conditional probabilities for H_A from 1.

Comparing participants' posteriors with corresponding objective conditional probabilities, as in Fig. 1, indicates that there is a central tendency bias present in participants' subjective probability judgments.4 If there were a perfect match between the two, all plotted points in the right panel of the figure would lie on the diagonal. Instead, posteriors smaller than .5 tend to be higher than the corresponding objective conditional probabilities, while posteriors greater than .5 tend to be lower than them. The central tendency bias is confirmed by further analysis: 87.3% of all subjective posterior probabilities are closer to .5 than the corresponding objective conditional probabilities, which is reliably different from what one would expect to happen by chance, χ²(1) = 143.27, p < .0001; the mean deviation from .5 for subjective probabilities is also significantly less than that for the objective conditional probabilities, t(259) = 16.27, p < .0001. While the deviations of posteriors from objective conditional probabilities are more or less normally distributed around a mean of 0.02 and with a standard deviation of 0.17, the greatest deviations are to be found at the outer ends of the scale, as the right plot in Fig. 1 already suggests. Specifically, for objective conditional probabilities in the interval [0, 1/3), the average absolute deviation from subjective posteriors equals .15; for those in the interval [1/3, 2/3), the average absolute deviation from posteriors equals .05; and for objective conditional probabilities in the interval [2/3, 1], the average absolute deviation equals .13.

4 According to Stevens (1971:428), the central tendency effect is "one of the most obstinate" response biases. It has been known in the psychological literature at least since Hollingworth (1910). For recent evidence of the effect, see Seth and Shimojo (2001), Matthews and Stewart (2009), and Karolis, Iuculano, and Butterworth (2011). In the present case, where the measurements concern probabilities, there may be a deeper explanation of the bias. For instance, according to Kahneman and Tversky's (1979) prospect theory, people tend to overweight small probabilities and to underweight large probabilities because they are less sensitive to changes in probability toward the end points of the scale than around the midpoint.

Table 1 gives the descriptive statistics as well as the correlation matrix of the four relevant variables: subjective posteriors, objective conditional probabilities, judgments on the explanatory goodness of hypothesis H_A, and judgments on the explanatory goodness of hypothesis H_B. Means of subjective posteriors and objective conditional probabilities differ significantly, t(259) = 2.08, p = .039, but, as the table shows, they are still highly correlated. How closely each participant's posteriors come, on average, to their corresponding objective conditional probabilities can be measured by taking the sum of the Euclidean distances between these values (see Schupbach, 2011). The average value of this sum for each participant was 1.37 (SD = 0.69).
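The objective conditional probabilities used in these comparisons follow directly from the chance setup described in Section 2.2 (urn A: 30 black and 10 white balls; urn B: 15 black and 25 white; a fair coin selects the urn; draws are made without replacement). The following Python sketch is our illustration of that computation, not code from the paper; the function and variable names are ours.

    def posterior_urn_a(draws):
        # Pr(H_A | draws) for the Experiment 1 setup: a fair coin selects urn A
        # (30 black, 10 white balls) or urn B (15 black, 25 white balls); balls
        # are then drawn without replacement. `draws` is a sequence of strings,
        # e.g. ["black", "white", ...] (this representation is ours).
        def sequence_likelihood(black, white, seq):
            # Probability of this exact draw sequence from an urn with the given
            # initial composition, updating the counts after every draw.
            prob = 1.0
            for color in seq:
                total = black + white
                if color == "black":
                    prob *= black / total
                    black -= 1
                else:
                    prob *= white / total
                    white -= 1
            return prob

        like_a = sequence_likelihood(30, 10, draws)
        like_b = sequence_likelihood(15, 25, draws)
        return 0.5 * like_a / (0.5 * like_a + 0.5 * like_b)

    # Example: three black balls and one white ball in the first four draws.
    print(posterior_urn_a(["black", "black", "white", "black"]))  # approx. 0.78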

Table 1
Descriptive statistics and correlations between the variables.

        M        SD      O       S       A       B
O       .48      .37     1.00    .91     .85     −.75
S       .50      .27             1.00    .84     −.81
A       0.10     0.54                    1.00    −.56
B       −0.09    0.48                            1.00

Note: O = objective conditional probabilities of H_A; S = subjective posteriors of H_A; A = judgments on explanatory goodness of H_A; and B = judgments on explanatory goodness of H_B. All correlation coefficients are significant at an α-level of .0001.
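The central tendency analysis reported above (the proportion of posteriors lying closer to .5 than the corresponding objective conditional probabilities, a chi-square test of that proportion against chance, and a paired t-test on the deviations from .5) can be sketched as follows. This is our reconstruction of the kind of computation described, assuming NumPy arrays S and O holding the 260 subjective posteriors and matching objective conditional probabilities; it is not the authors' analysis script.

    import numpy as np
    from scipy import stats

    def central_tendency_check(S, O):
        # S: array of the 260 subjective posteriors; O: the matching objective
        # conditional probabilities (both hypothetical inputs here).
        closer = np.abs(S - 0.5) < np.abs(O - 0.5)
        k, n = int(closer.sum()), len(S)
        # Chi-square test of the observed split against the 50/50 split expected
        # if posteriors were no more likely than not to lie closer to .5.
        chi2, p_chi2 = stats.chisquare([k, n - k])
        # Paired t-test on the absolute deviations from .5.
        t, p_t = stats.ttest_rel(np.abs(S - 0.5), np.abs(O - 0.5))
        return k / n, (chi2, p_chi2), (t, p_t)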


Inspection of Fig. 2 reveals that some, but not all, participants always judged one hypothesis to be explanatorily good more or less to the extent that they judged the other to be explanatorily bad; their judgments at each stage of the experiment thus summed approximately to 0. Table 1 gives the correlation between judgments on the two hypotheses taken over all participants. Looking at the individual level, one finds that correlations range from r(8) = −1, p < .0001, to r(8) = .41, p = .24. The mean of the correlations for the individual participants equals −.65 (SD = 0.42), and the median equals −.89; thus, for most participants, the judgment of the explanatory goodness of the one hypothesis is indeed strongly negatively correlated with the judgment of the explanatory goodness of the other hypothesis. We can consider the "symmetry" of participants' explanatory judgments by summing the two judgments at each step and then summing these sums over the 10 steps. It turns out that there is no significant correlation between that symmetry and how closely participants' subjective probability judgments come to corresponding objective conditional probabilities (r(24) = .003, p = .99).

2.4. Regression analyses

In the main part of the analysis, we fitted various linear models to the data. All models had subjective posterior probabilities of H_A (S)—consisting of 26 (participants) × 10 (draws) = 260 data points—as a response variable. We started by fitting a model with objective conditional probabilities of H_A (O), judgments of the explanatory goodness of H_A (A), and judgments of the explanatory goodness of H_B (B), as well as all possible interaction terms, as predictors. Because none of the interactions were significant, we continued with the model with predictors O, A, and B, and we compared this model with all models with some subset of {O, A, B} as predictors. It was of particular interest to see how the "Bayesian" model, which has only O as a predictor, fared in comparison with models that included A, B, or both.5 Explanationism predicts that such comparisons will turn out in favor of the model that has at least one of A and B among its predictors.

Although there are moderate to high correlations among the predictors, the condition number κ, calculated as recommended in Belsley, Kuh, and Welsch (1980), was found to equal 8.82, which means that there is no real reason to be concerned about multicollinearity (Baayen, 2008:182). Furthermore, diagnostic plots and tests (Shapiro–Wilk test and Breusch–Pagan test) for the various models did not reveal any deviations from normality or homoscedasticity.

Table 2 gives various statistics relevant to comparing the models. Because the explanationist model includes predictors over and above the one present in the Bayesian model, it might yield a better fit to the data just for that reason. To address this concern, we compared the models using metrics that penalize for additional parameters, namely, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), which weigh model fit and model complexity against each other. ΔAIC and ΔBIC are the values for each model minus the smallest AIC or BIC value. AIC and BIC values are not interpretable per se; they are to be used comparatively, in that models with smaller values are taken to be better predictors of the data. According to Burnham and Anderson (2002:70 f), a difference in AIC value greater than 10 means that the model with the higher value enjoys basically no empirical support.
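The model comparison just described can be reproduced in outline with standard regression software. The sketch below uses Python's statsmodels as an illustration; the synthetic stand-in data, the DataFrame, and the column names S, O, A, and B are our assumptions, not the authors' materials or code. Note also that packages count parameters slightly differently (for example, whether the error variance is included), so absolute AIC and BIC values may differ by a constant from those in Table 2; differences between models are unaffected.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Stand-in data with the experiment's shape (260 rows); the real analysis
    # would use the observed S, O, A, B values instead.
    rng = np.random.default_rng(0)
    O = rng.uniform(0, 1, 260)
    A = rng.uniform(-1, 1, 260)
    B = rng.uniform(-1, 1, 260)
    df = pd.DataFrame({"O": O, "A": A, "B": B,
                       "S": 0.37 + 0.26 * O + 0.17 * A - 0.19 * B + rng.normal(0, 0.1, 260)})

    # The seven models compared in Table 2.
    formulas = {
        "MO1": "S ~ O", "MA1": "S ~ A", "MB1": "S ~ B",
        "MOA1": "S ~ O + A", "MOB1": "S ~ O + B",
        "MAB1": "S ~ A + B", "MOAB1": "S ~ O + A + B",
    }
    fits = {name: smf.ols(f, data=df).fit() for name, f in formulas.items()}

    for name, fit in fits.items():
        print(f"{name}: LL={fit.llf:.2f}  AIC={fit.aic:.2f}  BIC={fit.bic:.2f}  R2={fit.rsquared:.2f}")

    # Likelihood ratio test of a nested model (here MO1) against the full model.
    lr_stat, p_value, df_diff = fits["MOAB1"].compare_lr_test(fits["MO1"])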
Note in particular that the difference in AIC value between the Bayesian model MO1 and the explanationist model MOAB1 is greater than 120, with the latter model having the lower value.6 Because all simpler models are nested within the MOAB1 model, each of them can be compared with that model by means of a likelihood ratio test. The χ² column in Table 2 gives the outcomes of these tests. The χ² values obtained in the tests were all highly significant, indicating that the MOAB1 model provides a better fit to the data than any of the simpler models. Furthermore, the MOA1 and the MOB1 models also do significantly better than the Bayesian model MO1, χ²(1) = 27.41, p < .0001, and χ²(1) = 56.74, p < .0001, respectively. Applying Vuong's (1989) closeness test for nonnested models to MOA1 and MOB1 showed that neither model is significantly preferred over the other. Comparing by means of the same test the nonnested models MO1, MA1, and MB1 shows that the Bayesian model is significantly preferred to either of the other models (p < .0001 in both cases) and that neither of MA1 and MB1 is significantly preferred over the other. On the other hand, a further application of Vuong's test shows that MAB1, which leaves out O as a predictor, does significantly better than MO1 (p = .024). Even more remarkable is that, in terms of AIC and BIC values, MAB1 does second best. Finally, the R² column in Table 2 gives the squared correlation between the fitted values of a model and the observed values. It is readily seen that while most models provide at least relatively good fits to the data, the fit of MOAB1 is particularly good. (Note that in terms of R² value MAB1 does second best as well.) In sum, the explanationist model MOAB1 best fits the data robustly across all of these comparative tests.

Table 3 gives the results of the regressions for the Bayesian model MO1 and for the explanationist model MOAB1. We see that according to the former, the predicted subjective posterior equals 0.18 + 0.66O, whereas according to the latter, the predicted subjective posterior equals 0.37 + 0.26O + 0.17A − 0.19B, where in both models the predictors are highly significant. Recall from our question Q1 (in Section 1) that we are interested in whether, as Bayesians would expect, objective conditional probabilities alone have a significant role in a model of updating, or whether explanatory factors are an essential part of a satisfactory model. The results, in particular that O is highly significant in MO1, support the Bayesian tenet that objective conditional probabilities are crucial for modeling subjective posteriors. It would in effect have been highly surprising if the analysis had shown that people's posteriors do not track objective conditional probabilities at all, or only to a very moderate extent. However, the fact that all three of O, A, and B are highly significant in the better fitting explanationist model MOAB1 is strong evidence that people change their subjective probabilities upon the receipt of new evidence not only on the basis of objective probabilistic considerations, but also on the basis of explanatory considerations.

This last point is worth highlighting with regard to our question Q2—whether explanatory considerations are important in addition to, or in place of, objective conditional probabilities. The fact that objective conditional probabilities retain an important role alongside explanatory judgments in MOAB1 suggests that descriptive Bayesianism is more in need of explanationist augmentation than replacement. At the same time, the results on MAB1 show that replacing objective conditional probabilities with judgments of explanatory goodness already leads to greater predictive accuracy. Finally, with respect to Q3, it is worth noting that A and B are both significant predictors in MOAB1, despite the fact that the

5 O could still be a good predictor of subjective posteriors if the two variables were negatively correlated with each other. Such a correlation would obviously go against Bayesianism. However, it is still warranted to call the model with O as a single predictor a "Bayesian model," in that Bayesianism predicts that model to perform best. (Thanks to an anonymous referee here.)

6 We find a similar difference in the AICc values for these models, which some prefer over AIC values (Hurvich & Tsai, 1991). We can also calculate and compare Akaike weights, as recommended in Wagenmakers and Farrell (2004). Doing so allows us to say that MOAB1 is 1.76 × 10²⁶ times likelier to be the correct model than is MO1.
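For readers who want to check the figure in footnote 6: with the Akaike weights of Wagenmakers and Farrell (2004), the evidence ratio of two models is exp(ΔAIC/2). The snippet below is merely our arithmetic check, using the ΔAIC of 120.87 between MO1 and MOAB1 from Table 2.

    import math

    # Evidence ratio w(MOAB1) / w(MO1) = exp(Delta_AIC / 2), with Delta_AIC the
    # AIC difference between MO1 and MOAB1 (120.87 according to Table 2).
    delta_aic = 120.87
    print(math.exp(delta_aic / 2))  # approx. 1.75e26, in line with the reported 1.76e26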


[Fig. 2. The development of judgments (over the 10 draws) of explanatory goodness of H_A (blue) and H_B (red) for the 26 participants, shown in one panel per participant; judgments range from −1 to 1. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)]

Table 2
Comparison of the seven regression models.

         k    LL        AIC        ΔAIC      BIC        ΔBIC      χ²        R²
MOAB1    5    264.82    −519.64    0.00      −501.84    0.00      –         .90
MOA1     4    216.09    −424.18    95.46     −409.94    91.90     97.46     .85
MOB1     4    230.76    −453.51    66.13     −439.27    62.57     68.13     .86
MAB1     4    240.37    −472.74    46.90     −458.49    43.34     48.90     .87
MO1      3    202.39    −398.77    120.87    −388.09    113.75    124.87    .83
MA1      3    132.53    −259.06    260.57    −248.38    253.45    264.57    .71
MB1      3    107.89    −209.78    309.86    −199.10    302.74    313.86    .65

Note: k is the number of parameters and LL the log-likelihood. All χ² values are significant at the p < .0001 level.

Table 3
Regression results for the Bayesian model MO1 and the best explanationist model MOAB1.

Model     Variable     B        SE B    t        β
MO1       Intercept    0.18     0.01    16.42
          O            0.66     0.02    35.70    0.91
MOAB1     Intercept    0.37     0.02
          O            0.26     0.04
          A            0.17     0.02
          B            −0.19    0.02
