Cognitive Psychology 69 (2014) 1–24


Costs and benefits of automatization in category learning of ill-defined rules

Maartje E.J. Raijmakers a,b,*, Verena D. Schmittmann c, Ingmar Visser a,b

a Department of Psychology, University of Amsterdam, The Netherlands
b Amsterdam Brain and Cognition (ABC), University of Amsterdam, The Netherlands
c Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, The Netherlands


Article history: Accepted 11 December 2013

Keywords: Category learning; Latent Markov analysis; Representational shifts; Strategies; Automaticity; Individual differences; Exemplar-based learning; Rule-based learning; Ill-defined categories

Abstract

Learning ill-defined categories (such as the structure of Medin & Schaffer, 1978) involves multiple learning systems and different corresponding category representations, which are difficult to detect. Application of latent Markov analysis allows detection and investigation of such multiple latent category representations in a statistically robust way, isolating low performers and quantifying shifts between latent strategies. We reanalyzed data from three experiments presented in Johansen and Palmeri (2002), which comprised prolonged training of ill-defined categories, with the aim of studying the changing interactions between underlying learning systems. Our results broadly confirm the original conclusion that, in most participants, learning involved a shift from a rule-based to an exemplar-based strategy. Separate analyses of latent strategies revealed that (a) shifts from a rule-based to an exemplar-based strategy resulted in an initial decrease of speed and an increase of accuracy; (b) exemplar-based strategies followed a power law of learning, indicating automatization once an exemplar-based strategy was used; (c) rule-based strategies changed from using pure rules to rules-plus-exceptions, which appeared as dual processes, as indicated by the accuracy and response-time profiles. Results suggest an additional pathway of learning ill-defined categories, namely one involving a shift from a simple rule to a complex rule, after which this complex rule is automatized as an exemplar-based strategy.

© 2014 Published by Elsevier Inc.

* Corresponding author. Address: Department of Psychology, University of Amsterdam, Weesperplein 4, 1018 XA Amsterdam, The Netherlands. Fax: +31 206390279. E-mail address: [email protected] (M.E.J. Raijmakers).
0010-0285/$ - see front matter © 2014 Published by Elsevier Inc. http://dx.doi.org/10.1016/j.cogpsych.2013.12.002


1. Introduction

Human category learning is a highly debated subject in cognitive science with increasingly complex conclusions about underlying learning mechanisms being rule based or similarity based (Ashby & Maddox, 2010; Goldstone & Kersten, 2003; Palmeri & Gauthier, 2004). Over many decades of category learning research, the focus has shifted from rule-based learning of well-defined categories (e.g., Bruner, Goodnow, & Austin, 1956) to learning ill-defined categories, i.e., category structures that are only partially describable by simple rules (e.g., Medin & Schaffer, 1978; Rosch & Mervis, 1975) and/or that are only well-defined by integrating information from multiple dimensions (Ashby & Ell, 2001). To understand the results of these and related studies, many single-component models of category learning have been proposed in the literature. These include prototype models (e.g., Posner & Keele, 1968; Rosch & Mervis, 1975), exemplar models (e.g., Kruschke, 1992), connectionist models (e.g., Gluck & Bower, 1988), Bayesian models (e.g., Anderson, 1991), and decision-boundary models (e.g., Ashby & Gott, 1988). Exemplar-based models in particular have been very successful in explaining many empirical results in category learning research (e.g., Kruschke, 1992; Nosofsky, 1986, 1988; Nosofsky & Palmeri, 1997). Notwithstanding their success, there are important empirical results that the single-component models fail to account for. Those results reveal that different types of representations are formed within and between experiments (Ashby & Ell, 2001; Erickson & Kruschke, 1998). For example, analysis of the performance after learning an ill-defined category structure showed that individuals differed in the types of generalizations they made, suggesting the involvement of different learning systems (e.g., Erickson & Kruschke, 1998; Johansen & Palmeri, 2002; Nosofsky, Clark, & Shin, 1989).
Dissociation studies and cognitive neuroscience studies revealed additional evidence for the existence of multiple systems of category learning (e.g., Davis, Love, & Preston, 2012; Maddox & Ing, 2005; Nomura et al., 2007), which is now a more commonly accepted hypothesis (Ashby & Maddox, 2010; Hélie, Waldschmidt, & Ashby, 2010). Hence, several hybrid models that combine multiple learning systems were introduced (Anderson & Betz, 2001; Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Erickson & Kruschke, 1998; Nosofsky & Palmeri, 1998; Vandierendonck, 1995; for an overview see Palmeri, Wong, & Gauthier, 2004). Assuming the existence of multiple modes of categorization learning, an important question in current research is how those learning systems interact (Ashby & Crossley, 2010; Ashby & Maddox, 2010). In this article, we study the interaction between two category-learning systems by means of detailed analyses of participants’ process of learning an ill-defined category structure, using data partially presented in Johansen and Palmeri (2002). Although they found representational shifts during learning of an ill-defined categorization structure, representational formats are latent, that is, not directly observable. In the current article, we use a statistical approach that allows for the identification of (subgroups of) participants that use different representations. This approach extends the Johansen and Palmeri (2002) results by analyzing these subgroups of participants separately, allowing for a more detailed characterization of the process of changing representations. Following Rickard (2004, p. 65), to denote a type of categorization-learning process (either rule-based or exemplar-based) we will henceforth use the term categorization strategy, which is defined as ‘‘a unique series of mental steps toward a solution’’ and which ‘‘does not necessarily have direct implications regarding intention or awareness’’.
Before presenting our statistical approach we discuss the occurrence of representational shifts in relation to theories of automatization. This results in several hypotheses about the (latent) categorization strategies during learning.

1.1. Representational shifts and automatization

Johansen and Palmeri (2002) showed that there exist important inter-individual differences in learning ill-defined categories: during and at the end of their experiment, some participants had formed exemplar-based representations and others had formed rule-based representations. They also observed intra-individual differences: individual participants seemed to change their representations from rule-based to exemplar-based during the course of learning.


Johansen and Palmeri (2002) defined a process model with a specific shift mechanism that outperforms single-process models in modeling the generalization data. The model showed an initial focus of attention to a single dimension and a gradual increase of attention spreading as a function of trial number. That is, the representational shift is a gradual process and does not directly depend on performance. The assumption that adults would start learning an ill-defined category structure by testing simple rules is obvious, since one-dimensional rules also dominate free classification (Ashby, Queller, & Berretty, 1999). However, the shift mechanism is less obvious (or maybe underspecified). Strategy shifts in categorization are believed to depend on the costs and benefits in cognitive load, speed, and accuracy for the different strategies (Bourne, Healy, Kole, & Graham, 2006). Memorization strategies are typically faster than algorithms (e.g., Rickard, 2004; Touron & Hertzog, 2004), although Bourne et al. (2006) report that response times are lower for the individual’s preferred strategy, either algorithm execution or memorization. Hence, the application of a simple rule while monitoring exceptions is difficult and relatively slow. These costs would explain representational shifts in learning an ill-defined rule from an algorithmic process, such as the use of an explicit rule with exceptions, to an automatic process, such as the retrieval of exemplars from memory.1 In that case cognitive load and accompanying speed would drive a shift from rule-based to exemplar-based strategies, resulting in automatization of the categorization. Following this line of reasoning, hypothesizing a shift from a rule-based to an exemplar-based strategy would imply a simultaneous increase in classification speed.
Several specific criteria have been proposed to assess automaticity that can also be tested in category learning (Moors & De Houwer, 2006; Schneider & Chein, 2003; Shiffrin & Schneider, 1977). One set of criteria concerns the characteristics of the behavior that results after the process of automatization. Behavioral inflexibility of automatic categorization implies a large effect of switching response locations. Efficiency of automatic categorization implies a very limited effect on speed and accuracy of executing a dual task (Schneider & Chein, 2003). Hélie et al. (2010) show that these criteria for automaticity hold after extensive learning (>10,000 trials) of different types of category structures, a simple rule structure, a disjunctive rule-based structure, and an information-integration structure. A second set of criteria follows from Logan’s (1988) instance theory and Rickard’s (2004) component power laws model of memory retrieval. These criteria concern the characteristics of the processes leading up to and during automatization. According to Logan (1988) and Rickard (2004), automatization could occur in two ways: a shift of strategy from algorithm execution to memorization and/or an increase of speed of memorization. The proposed automatization processes are characterized by a change of response times (RTs) with experience. The first prediction that both theories make, albeit for different reasons, is that RTs of the memorization strategy follow a power law of learning, which implies that for the RTs at the group level the rate of change of the standard deviation is identical to that of the mean. This law is derived for the case that there are independent RT-distributions for each item, which is true for memorization but not for algorithms. In the case that there is a shift from algorithm to memory retrieval one would expect a distortion of the power law at the start of learning. 
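The mean/SD prediction can be illustrated with a small simulation. The sketch below assumes hypothetical per-participant retrieval times that follow a power function with a common exponent; under that assumption the group mean and standard deviation shrink at the same rate, so their ratio stays constant over practice. All numeric values are illustrative, not estimates from the data.

```python
import random
import statistics

C = 0.4  # common power-law exponent (hypothetical value)
random.seed(1)
# Per-person scale parameters a_i (ms); only the scale varies across people.
scales = [random.uniform(800, 1600) for _ in range(200)]

def group_stats(n):
    """Group mean and SD of RTs after n practice trials: RT_i(n) = a_i * n^-C."""
    rts = [a * n ** -C for a in scales]
    return statistics.mean(rts), statistics.stdev(rts)

# If mean and SD follow the same power function, their ratio is constant
# across practice, which is the group-level signature described in the text.
ratios = [group_stats(n)[1] / group_stats(n)[0] for n in (1, 4, 16, 64)]
print(ratios)
```

A shift from an algorithm to memory retrieval partway through learning would break this proportionality early on, which is the distortion of the power law mentioned above.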
The distribution of RTs is most likely to be different for algorithms than for memorization strategies, but hard to derive analytically (Logan, 1988). With a memorization strategy, items are learned independently. In contrast, with a rule strategy items are learned together. Hence it is expected that the speed of memory retrieval for specific items reflects the number of training occasions.

1.2. Goals and hypotheses

The main goal of this paper is to study the interaction between learning systems by describing the costs and benefits of the different categorization strategies throughout the learning process in relation to shifts between strategies. The ill-defined category structures applied in Johansen and Palmeri (2002) resulted in ample strategy shifts during learning and their data are thus an important source of information about such a learning process. Using their data, we aim to make a detailed analysis of distinct categorization strategies, which specifies their costs and benefits in terms of accuracy and speed and which might explain reasons for shifting.

1 Bourne et al. (2006) show that a reverse shift, from a memorization strategy to an algorithmic strategy, can also occur under specific conditions for which the cost/benefit ratio supports it.


In more detail, the hypotheses are as follows. We hypothesize, in accordance with Johansen and Palmeri (2002), the existence of multiple learning pathways: first, applying a rule-based strategy from start (i.e., the first transfer block) to end (i.e., after 32 learning blocks); second, applying an exemplar-based strategy from start to end; third, starting with a rule-based strategy and shifting towards an exemplar-based strategy. We first test the above hypotheses by detecting strategies and quantifying the transitions between them. Subsequently, we test additional hypotheses about the distinct strategies. First, we hypothesize that a shift from a rule-based strategy to an exemplar-based strategy goes together with an increase in speed. From the literature, it is not clear how accuracy would change as a result of a representational shift from a rule-based to an exemplar-based strategy. Second, we expect to find a power law of learning for the memorization strategy, but not for the rule strategy. Third, for an exemplar-based strategy we expect a relation between frequency of presentation and classification speed for individual items. Fourth, within each categorization strategy, we expect a gradual increase in accuracy: when using the memorization strategy participants are expected to make fewer mistakes with more practice; when using a rule-based strategy, participants are expected to learn the exceptions by memorization and increase their accuracy as a result. Finally, in a more explorative way we will study the speed and accuracy changes for regular vs. exception items for the rule-based categorization strategy.

1.3. Latent Markov analysis of learning an ill-defined two-category structure

The most widely applied ill-defined two-category structure was introduced by Medin and Schaffer (1978; see also Johansen & Palmeri, 2002; Nosofsky & Palmeri, 1998; Nosofsky, Palmeri, & McKinley, 1994; Nosofsky et al., 1989), and is presented schematically in Table 1.
Items are composed of four binary dimensions. In a learning experiment, the training items (A1–A5 and B1–B4) are presented repeatedly one by one in a random order, and classified as category A or B by the participant, who receives immediate feedback on the correctness of each classification (e.g., Johansen & Palmeri, 2002, Experiment 1). The category structure for categories A and B, as shown in Table 1, cannot be correctly represented by a simple, one-dimensional, rule. However, if exceptions are taken into account, the category structure can be defined by the rule that only items with a 1 in the first dimension belong to category A. According to that rule, items A5 and B1 are the exceptions (note that there also exists an alternative simple rule resulting in two exceptions which involves dimension 3 of the stimuli; other one-dimensional rules imply more exceptions). To assess the representation that people form during learning, Johansen and Palmeri (2002) examined responses to critical transfer items, the so-called generalization patterns. Johansen and Palmeri included several transfer blocks during the learning process. Transfer items (T1–T7, see Table 1) were presented together with training items without feedback. The pattern of responses (i.e., generalization pattern) to the critical transfer items is indicative of the kind of representation

Table 1
Typical ill-defined category structure.

Training items                            Transfer items
Category A            Category B
A1   1 1 1 2          B1   1 1 2 2       T1   1 2 2 1 *
A2   1 2 1 2          B2   2 1 1 2       T2   1 2 2 2 *
A3   1 2 1 1          B3   2 2 2 1       T3   1 1 1 1
A4   1 1 2 1          B4   2 2 2 2       T4   2 2 1 2 *
A5   2 1 1 1                             T5   2 1 2 1 *
                                         T6   2 2 1 1 *
                                         T7   2 1 2 2

Note: Each stimulus is composed of four dimensions. Each dimension can assume one of two possible values (denoted as 1 and 2). A1–A5 and B1–B4 are the nine training items. T1–T7 are the seven transfer items. Critical transfer items for determining the kind of generalization are marked with an asterisk. Johansen and Palmeri (2002) used this category structure in their Experiment 1.
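The one-dimensional rules and their exceptions can be checked mechanically. The following sketch encodes the training items of Table 1 and recovers the exceptions to each one-dimensional rule (item coordinates transcribed from Table 1):

```python
# Training items of the 5/4 structure (Table 1); tuples hold the four dimension values.
training = {
    "A1": (1, 1, 1, 2), "A2": (1, 2, 1, 2), "A3": (1, 2, 1, 1),
    "A4": (1, 1, 2, 1), "A5": (2, 1, 1, 1),
    "B1": (1, 1, 2, 2), "B2": (2, 1, 1, 2), "B3": (2, 2, 2, 1), "B4": (2, 2, 2, 2),
}

def exceptions(rule_dim):
    """Training items misclassified by the rule 'value 1 on rule_dim -> category A'."""
    out = []
    for name, dims in training.items():
        predicted = "A" if dims[rule_dim] == 1 else "B"
        if predicted != name[0]:  # item's true category is its label's first letter
            out.append(name)
    return out

print(exceptions(0))  # Rule on dimension 1 -> ['A5', 'B1']
print(exceptions(2))  # Rule on dimension 3 -> ['A4', 'B2']
```

This confirms that the dimension-1 rule leaves A5 and B1 as exceptions, and the alternative dimension-3 rule leaves A4 and B2.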


formed during learning. With these critical transfer items Johansen and Palmeri detected both rule-based and exemplar-based representations, as both inter-individual and intra-individual differences. In their statistical analyses of the categorization data, Johansen and Palmeri compared the observed generalization patterns to expected generalization patterns. Expected patterns were derived from rule-based, exemplar-based, and prototype categorization process models. However, in general, this type of analysis, i.e. based on pattern matching, does not result in a robust classification of individuals into different types (van der Maas & Straatemeier, 2008). The major problem of pattern matching is that one needs to settle on a criterion for a match between observed and expected pattern. Johansen and Palmeri (2002) chose 100% and 80% for collecting evidence for the detection of expected generalization patterns. Van der Maas and Straatemeier showed with simulation studies that the optimal criterion for assigning observed patterns to expected patterns depends on the amount of noise in the data (e.g., noise due to random patterns, errors in the execution of a strategy, and/or gradually changing behavior). For several cases it was shown that the application of pattern matching techniques in the presence of unanticipated noise leads to erroneous conclusions (Jansen & van der Maas, 1997; Quinlan, van der Maas, Jansen, Booij, & Rendell, 2007; Raijmakers, Jansen, & van der Maas, 2004; Raijmakers, Van Koten, & Molenaar, 1996; Straatemeier, van der Maas, & Jansen, 2008; van der Maas & Straatemeier, 2008). Choosing optimal cutoff criteria requires statistical modeling; otherwise, classification carries a great risk of false positives. This means that one would wrongfully conclude that a specific strategy is used by some of the participants.
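For concreteness, the pattern-matching approach criticized here can be sketched as follows (a minimal illustration, not J&P's actual procedure): an observed pattern is assigned to the expected pattern it agrees with most, but only if the agreement reaches the chosen criterion.

```python
# Expected generalization patterns over the five critical transfer items.
EXPECTED = {"Rule 1": "AABBB", "Rule 2": "BBABA", "Exemplar": "ABBBA"}

def classify(observed, criterion=0.8):
    """Assign a 5-item pattern to the best-agreeing strategy, given a hard cutoff."""
    best, score = None, -1.0
    for label, pattern in EXPECTED.items():
        agree = sum(o == e for o, e in zip(observed, pattern)) / len(pattern)
        if agree > score:
            best, score = label, agree
    return best if score >= criterion else "unclassified"

print(classify("AABBB"))  # exact match -> 'Rule 1'
print(classify("ABBAA"))  # 4/5 agreement with ABBBA passes the .8 cutoff -> 'Exemplar'
print(classify("AAAAA"))  # agrees with no expected pattern well -> 'unclassified'
```

Exactly this hard cutoff, whose optimal value depends on the unknown noise level, is what the latent-variable approach replaces with estimated response probabilities.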
In addition to the analysis with pattern matching techniques, Johansen and Palmeri (2002) fitted multiple computational process models to the data. Of these models, the shift model for category learning fit best. This is a confirmatory analysis with sound statistical tools for parameter estimation. However, based on these models individuals were not assigned to specific strategies or pathways; therefore, the strategies themselves and their interaction could not be studied. We follow a different analysis approach. Instead of matching generalization patterns to expected patterns, we model the data with a categorical latent variable technique: latent Markov analysis (LMA, Rabiner, 1989; Visser, Schmittmann, & Raijmakers, 2007, chap. 14; Visser, 2011). In other domains of cognition, LMA and latent class analysis (i.e., a special case of LMA) have proven useful in the analysis of response patterns aimed at detecting multiple latent strategies, i.e. strategies that cannot be observed directly. Examples in the area of cognitive development include analyses of the balance scale task (Boom, Hoijtink, & Kunnen, 2001; Jansen & van der Maas, 1997, 2002), the conservation of liquid task (Schmittmann, Dolan, van der Maas, & Neale, 2005), a discrimination-learning task (Raijmakers, Dolan, & Molenaar, 2001; Schmittmann, Visser, & Raijmakers, 2006), a dimensional change card sorting task (van Bers, Visser, van Schijndel, Mandell, & Raijmakers, 2011), the triad classification task (Raijmakers et al., 2004), and implicit learning tasks (Visser, Raijmakers, & Molenaar, 2002, 2007). The use of latent Markov models for classifying response patterns into different types has a number of important advantages with respect to pattern matching techniques. First, with these techniques the criterion for classifying patterns into types is based on sound statistical inference.
Second, the use of latent variable techniques provides the possibility of detecting groups of participants that show hitherto unknown types of behavior. In particular, the parameters in a LMA provide estimates of the probabilities of responses within subgroups. If the probabilities of responses to two-choice items within a subgroup were around .5, for example, the generalization pattern characterizing this subgroup would resemble guessing. Third, LMA allows for modeling the conditional probabilities of making transitions from one strategy to another strategy. LMA makes it possible to test whether the shift model for categorization corresponds to the latent structure, as this is present in the data. In addition, LMA allows for testing more specific hypotheses about distinct strategies once a latent Markov model has been selected for the data. One such hypothesis concerns error variability in response patterns. Assuming that our cognitive system is noisy, it is to be expected that participants make errors in executing a strategy. However, a straightforward execution of a strategy implies equal error rates for all items. With LMA, we test whether some items are more error-prone, given a strategy. A second, specific hypothesis about categorization strategies concerns changes over the course of learning within the strategy. According to automatization theories (Logan, 1988; Rickard, 2004), the accuracy of responses generated with a memorization strategy


increases with experience. For transfer items, this would translate into increased certainty with experience. For a rule-plus-exception strategy, the accuracy of training items is expected to increase with experience, but this would not translate into increased certainty of transfer items if the rule were applied to unknown items. With LMA, we could model the uncertainty of responses within a latent subgroup as a function of block number. Subsequently, the contribution of this predictor variable to the fit of the model can be tested. Finally, LMA provides a statistically robust method for assigning individual subjects to one of the latent strategies at each point in time (Visser, 2011). This allows us to analyze the characteristics of categorization behavior in terms of accuracy and speed, for the latent subgroups of participants, separately. The main disadvantage of LMA is that it requires relatively large data sets consisting of long time series of a few subjects or short time series of many subjects. However, with the present data this requirement is readily met. How do the computational models, such as Johansen and Palmeri's (2002) shift model and RULEX (Nosofsky & Palmeri, 1998), relate to LMA? First of all, the latent Markov models (LMMs) are descriptive models of the data. In contrast, the computational models incorporate a process explanation for the observed data. A computational model represents a formal, psychologically plausible, account of the learning process, which is not true for LMMs, at least not without additional assumptions (cf. Schmittmann et al., 2006). Hence, the LMMs should not be viewed as an alternative to the computational models. Rather, LMA has its own particular contribution to the analysis of data. As demonstrated in the present article, the LMA provides a formal, explorative description of the observed data that is not directly dependent on a theoretical model of representations.
This introduces the possibility of detecting unforeseen strategies, such as response patterns that do not denote a clear strategy at some point of the learning process. The latter is important for testing specific hypotheses about distinct categorization strategies. The present article is fully based on the experiments described in Johansen and Palmeri (2002). In addition to the data already presented, we include the latency data in complementary analyses. In the methods section of Experiment 1, we briefly introduce latent Markov analysis (LMA) because this technique forms the core of our statistical approach. In the Appendix we provide a more detailed description with an example of a latent Markov model. In subsequent sections we provide a detailed and robust description of representational shifts for all three experiments. For Experiment 2, we will test hypotheses about changes in both the accuracy and response times during learning derived from theories of automatization.

2. Experiment 1, Johansen and Palmeri (2002)

As this experiment has been published, we limit our description of the method to a brief summary, and refer the reader to the original article by Johansen and Palmeri (henceforth J&P) for the details.

2.1. Method

2.1.1. Participants

Participants were 198 undergraduate students who received credit in an introductory psychology course. All participants were tested individually.

2.1.2. Stimulus materials

Stimuli were computer-generated drawings of rockets that varied along four binary dimensions. The category structure is presented in Table 1. As shown in Table 1, 5 stimuli belong to category A, 4 stimuli belong to category B, and 7 are new transfer items. J&P showed that the category structure could be represented in (at least) three different ways: by an exemplar representation (E), by a Rule 1 representation (R1), and by a Rule 2 representation (R2).
For R1, category A is characterized by value 1 on dimension 1, with the exception of stimulus A5, while category B is characterized by value 2 on dimension 1, with exception of stimulus B1. The expected generalization pattern of the critical transfer items (see Table 1) of a participant following R1 is AABBB. R2 is based on dimension 3 of


the stimuli: value 1 characterizes category A, and value 2 characterizes category B, with stimuli A4 and B2 being exceptions. The expected generalization pattern is BBABA. The generalization pattern according to an exemplar representation was predicted by applying the exemplar learning model ALCOVE (Kruschke, 1992) to the category learning task under a wide set of parameters. The most prominent generalization pattern in these simulations is ABBBA (E).

2.1.3. Procedure

Participants were trained to categorize the 9 training stimuli into the two categories during 32 training blocks. During each training block all training stimuli were presented in a random order. Corrective feedback was given after each response. Transfer blocks were presented after 2, 4, 8, 16, 24, and 32 training blocks. During each transfer block both the training items and the transfer items were presented without corrective feedback.

2.1.4. Statistical approach: latent Markov analysis

Latent Markov analysis (LMA) is a useful statistical technique to model multiple latent types of response patterns and the transitions between them (Langeheine & van de Pol, 1990; Rabiner, 1989). We describe the approach conceptually in the following, and provide more technical details in the Appendix. In the case of Experiment 1 in J&P, the data to be analyzed consist of responses to 5 transfer categorization items (i.e., the critical transfer items in Table 1: T1, T2, T4, T5, T6) for 6 transfer blocks distributed over the learning process. Responses are binary (category A or B) and the observed data form a distribution of generalization patterns. A generalization pattern is the series of responses for these 5 items in one block, e.g. AABBB. Fig. 1 shows the distributions after learning blocks 2 and 32 of Experiment 1. These distributions are hypothesized to be the result of different types of categorization strategies: Rule 1, Rule 2, and an Exemplar-based strategy (see Section 1).
Each categorization strategy is expected to result in a typical (i.e., most frequent) generalization pattern. For Rule 1, the expected typical generalization pattern is AABBB, for Rule 2 BBABA, for the exemplar-based strategy ABBBA.
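The two rule-based patterns follow mechanically from applying each rule to the critical transfer items; a minimal sketch (item values taken from Table 1):

```python
# Critical transfer items from Table 1 (four dimension values each).
critical = {"T1": (1, 2, 2, 1), "T2": (1, 2, 2, 2), "T4": (2, 2, 1, 2),
            "T5": (2, 1, 2, 1), "T6": (2, 2, 1, 1)}

def rule_pattern(dim):
    """Generalization pattern for 'value 1 on dimension dim -> A, else B'."""
    return "".join("A" if v[dim] == 1 else "B" for v in critical.values())

print(rule_pattern(0))  # Rule 1 (dimension 1) -> AABBB
print(rule_pattern(2))  # Rule 2 (dimension 3) -> BBABA
```

The exemplar-based pattern ABBBA, in contrast, cannot be derived from a single dimension; it follows from overall similarity to the training exemplars, as in the ALCOVE simulations mentioned above.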

[Fig. 1: two bar-chart panels, ''block 2'' (left) and ''block 32'' (right), plotting the proportion of participants (0.00–0.30) for each of the 32 possible generalization patterns, AAAAA through BBBBB.]

Fig. 1. The observed frequency distribution of the generalization patterns of the critical transfer items in Experiment 1 (after Johansen & Palmeri, 2002, Fig. 2). Left panel shows transfer block 2, right panel transfer block 32. The x-axis shows the proportion of participants. The y-axis shows all possible generalization patterns. The dark bars represent the patterns that were expected (Rule 1: AABBB; Exemplar: ABBBA; Rule 2: BBABA).


However, if the participants do not execute the strategy consistently, some deviations from the typical pattern are expected as well. This results in a multivariate binomial distribution of responses, where each generalization pattern has an assigned probability of occurring, given a specific strategy in a given transfer block. As each strategy is characterized by different typical generalization patterns, and therefore, different probabilities associated with each generalization pattern, the complete data set is generated by a mixture of multivariate binomial distributions, with one component for each strategy. Over time, a participant could switch between different strategies, for instance, applying Rule 1 in the first transfer block, Rule 2 in the second and third transfer block, and the Exemplar strategy in the subsequent transfer blocks. With LMA, we can disentangle the underlying strategies of responding, and obtain estimates of the frequencies of strategy application at the beginning, and the occurrence of strategy shifts during the course of learning. At each measurement occasion (here, in each transfer block), a participant is assumed to be in one of a number of distinct states, where the states correspond to different strategies of responding. Fig. 2 shows a graphical representation of a 3-state latent Markov model that agrees with the expected model put forward by J&P for Experiment 1. The circles represent different states: SE, SR1, SR2. Each state is associated with specific conditional probabilities of responding to each respective item; for instance, if the conditional probabilities of responding A to the 5 transfer items were (.85, .82, .12, .16, .10), respectively, we would expect generalization pattern AABBB to occur most frequently.
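The mixture-of-multivariate-binomials idea can be sketched directly. Below, the response probabilities for the Rule 1 state are the illustrative values used in the text; the Exemplar-state values and the mixing weights are hypothetical:

```python
from itertools import product

# Probability of responding 'A' to each of the five critical items, per state.
p_A = {"Rule 1": [.85, .82, .12, .16, .10],      # values from the text's example
       "Exemplar": [.85, .15, .12, .16, .90]}    # hypothetical, mode ABBBA

def pattern_prob(pattern, probs):
    """P(pattern | state): product of independent item-response probabilities."""
    out = 1.0
    for response, p in zip(pattern, probs):
        out *= p if response == "A" else 1 - p
    return out

# Given the Rule 1 state, AABBB is the most likely of all 32 patterns.
patterns = ["".join(t) for t in product("AB", repeat=5)]
best = max(patterns, key=lambda pat: pattern_prob(pat, p_A["Rule 1"]))
print(best)  # AABBB

# Mixture: weight each state's distribution by its occupancy probability.
weights = {"Rule 1": 0.6, "Exemplar": 0.4}       # hypothetical mixing weights
mix = sum(w * pattern_prob("AABBB", p_A[s]) for s, w in weights.items())
```

Each state thus induces a full probability distribution over the 32 patterns, and the observed distribution (Fig. 1) is modeled as a weighted mixture of these state-specific distributions.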
Furthermore, each state has an associated initial probability of occurring at the first measurement occasion, and each ordered pair of states (S1, S2) has an associated transition probability of switching from state S1 to state S2 from one measurement occasion to the next. For more technical information, we refer the reader to the Appendix. In applying explorative LMA, our first aim was to obtain a parsimonious and accurate description of the data that could confirm or falsify the expected model for the data. We firstly needed to infer the optimal number of categorization strategies to describe the data. To this end we fitted multiple models to the data with an increasing number of states. Subsequently, we selected the most parsimonious, best fitting model by means of model selection techniques. We used the Bayesian Information Criterion, BIC (Schwarz, 1978) to select between models. BIC considers the loglikelihood of the model in relation to the number of participants and the number of degrees of freedom. Lower BIC values indicate preferable models. After finding the optimal explorative model, we tested specific hypotheses about the data. We expressed our hypotheses in terms of equality constraints, which were imposed on the parameters of the
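As a sketch of this selection step, BIC trades loglikelihood against the number of parameters scaled by the log of the sample size; the loglikelihoods and parameter counts below are hypothetical, with n = 198 participants as in Experiment 1:

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion (Schwarz, 1978); lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fits: the 4-state model gains loglikelihood, but not enough
# to pay for its extra parameters, so the 3-state model is preferred.
models = {"3-state": bic(-2350.0, 29, 198), "4-state": bic(-2340.0, 44, 198)}
best = min(models, key=models.get)
print(best)  # 3-state
```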

Fig. 2. Diagram of a Markov model as would be expected for Experiment 1. The circles indicate a Rule 1 state (SR1), a Rule 2 state (SR2), and an exemplar state (SE). pR1, pR2, and pE indicate the initial probabilities of states SR1, SR2, and SE, respectively. The arrows indicate the transitions between states; for example, AR1–E is the conditional probability of making a transition from state SR1 to state SE.
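The dynamics depicted in Fig. 2 can be illustrated with a few lines of Python. The initial and transition probabilities below are purely illustrative assumptions of ours (not estimates from the data); they encode the expectation that the exemplar state, once entered, is not left:

```python
import random

# Illustrative (not estimated) parameters for the 3-state model of Fig. 2
states = ["SR1", "SR2", "SE"]
initial = {"SR1": 0.5, "SR2": 0.4, "SE": 0.1}            # pR1, pR2, pE
transition = {                                           # rows sum to 1
    "SR1": {"SR1": 0.7, "SR2": 0.2, "SE": 0.1},
    "SR2": {"SR1": 0.0, "SR2": 0.8, "SE": 0.2},
    "SE":  {"SR1": 0.0, "SR2": 0.0, "SE": 1.0},          # absorbing state
}

def simulate(n_blocks, rng=random):
    """Draw one latent strategy sequence across transfer blocks."""
    draw = lambda dist: rng.choices(list(dist), weights=dist.values())[0]
    seq = [draw(initial)]
    for _ in range(n_blocks - 1):
        seq.append(draw(transition[seq[-1]]))
    return seq

random.seed(1)
print(simulate(6))  # e.g. a sequence such as ['SR1', 'SR1', 'SE', ...]
```

A simulated sequence of this kind corresponds to one participant's latent strategy trajectory over the transfer blocks; the LMA works in the opposite direction, recovering the parameters from many such observed trajectories.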

observation probabilities. Imposing equality constraints reduces the number of freely estimated parameters in the model.

Firstly, we tested a more stringent interpretation of a categorization strategy by constraining the uncertainty to be equal across all transfer items, conditional on state. For example, bR1(T1, T2, T4, T5, T6) = (1 − a, 1 − a, a, a, a), with a freely estimated and expected to be near 0. The lower the value of a, the more certainly responses result in generalization pattern AABBB; the closer a gets to .5, the more uncertain the generalization pattern becomes; in the limit, if a equals .5, stimuli are assigned to categories A and B at random. To test whether parameter constraints are acceptable, we used the loglikelihood difference test: minus twice the difference in the loglikelihoods of the models with and without the constraints is chi-square distributed, with degrees of freedom equal to the difference in degrees of freedom of the two models (Azzalini, 1996; Wickens, 1982). In this constrained model, we estimate one parameter for the observation probabilities bR1 instead of the five parameters of the unconstrained model, which results in df = 4 for the chi-square test between models.

Secondly, we tested whether the uncertainty of responses decreases with experience, that is, with block number. To model this effect, we extended the latent Markov model by introducing block number as a covariate of the conditional probabilities:

bR1(T1, T2, T4, T5, T6) = (1 − a, 1 − a, a, a, a) + (b, b, b, b, b) × (block number),

where parameters a and b are both conditional on state. The statistical significance of the contribution of the covariate was tested using the loglikelihood difference statistic, comparing models with b freely estimated and with b fixed at zero.

To test further hypotheses about specific categorization strategies, we needed to assign individual participants to the most likely sequence of states.
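Both model-comparison tools can be reproduced in a few lines. The Python sketch below (the helper names are ours, not from the paper) computes BIC as −2 log L + df · ln(N) and the loglikelihood difference test, using values as reported in Table 2; the closed-form chi-square survival function covers the df values needed here:

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square variable.
    Closed forms: erfc for df = 1, a finite series for even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        m = df // 2
        term, total = 1.0, 1.0
        for j in range(1, m):          # partial sum of the Poisson series
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    raise ValueError("only df = 1 or even df handled in this sketch")

def bic(loglik, n_par, n):
    # BIC = -2 log L + (number of parameters) * ln(number of data points)
    return -2.0 * loglik + n_par * math.log(n)

def lr_test(loglik_full, df_full, loglik_constrained, df_constrained):
    """Loglikelihood difference test for nested models."""
    stat = 2.0 * (loglik_full - loglik_constrained)
    ddf = df_full - df_constrained
    return stat, ddf, chi2_sf(stat, ddf)

# Values as reported in Table 2 (loglikelihoods entered as negative numbers)
N = 780
print(round(bic(-2112.36, 35, N), 2))            # 4-state model, ≈ 4457.80
stat, ddf, p = lr_test(-2112.36, 35, -2113.03, 31)   # 4b vs. "4 states"
print(round(stat, 2), ddf, round(p, 2))          # stat ≈ 1.34 on df = 4
```

With p well above .05, the equality constraints of model 4b do not significantly worsen the fit, which is exactly the comparison reported in Table 2.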
The assignment was based on the posterior probabilities of the responses given the optimal model with all acceptable parameter constraints (see Visser et al., 2011).

2.2. Results

We limit the presentation of the experimental results to the results of the LMA; we refer the reader to J&P for additional analyses. As in J&P, 68 participants with an error rate of more than 25% on the last four training blocks were removed from further analyses. Table 2 shows the results of the LMA. The most parsimonious, best fitting explorative model is a 4-state model, which is the explorative model with the lowest BIC. Introducing equality constraints on the observation probabilities was only feasible for the state representing the exemplar-based generalization (model 4b in Table 2). Introducing block number as a covariate on the observation probabilities for the Exemplar state (model

Table 2
Fit statistics of latent Markov models for the data of Experiment 1.

Model        Loglike     df    BIC       N     Δlogl    p
2 states     2312.67     13    4711.90   780
3 states     2186.93     23    4527.02   780
4 states     2112.36     35    4457.80   780
5 states     2073.19     49    4472.68   780
4a           2153.48     23    4460.12   780   82.24    0.00
4b           2113.03     31    4432.49   780    1.34    0.86
4b-cov-R     2111.57     33    4442.90   780    2.91    0.23
4b-cov-E*    2109.72     32    4432.54   780    6.61    0.01

Note: The asterisk indicates the most parsimonious, best fitting model. Model 4a is a 4-state model with equality constraints on the conditional probabilities for the states representing the different kinds of generalization: R1, R2, and E. Model 4b is a 4-state model with equality constraints for the state representing E only. df = degrees of freedom; BIC = Bayesian Information Criterion; N = number of data points (# measurements × # participants); Δlogl = loglikelihood difference statistic of the constrained and the corresponding unconstrained model. For models 4a and 4b, the corresponding unconstrained model is model "4 states". Model 4b-cov-R (4b-cov-E) is equivalent to model 4b but with block number as a covariate of the conditional probabilities of the Rule (Exemplar) state(s). Models 4b-cov-R and 4b-cov-E are compared to model 4b; note that in this comparison model 4b has the fewest parameters. p = p-value of the corresponding chi-square test, indicating whether the decrease in fit of the constrained model is significant.


4b-cov-E: χ2(1) = 6.61, p = .01), but not for the rule states, results in a model with a significantly better fit.

Fig. 3a depicts the parameter estimates of the observation probabilities of the four states in model 4b-cov-E. Three states roughly correspond to the three expected types of generalization: E, R1, and R2. E is a clear exemplar-based generalization with small uncertainties that decrease with experience from .15 (first transfer block) to .03 (last transfer block). The estimated initial probability of E is zero (i.e., estimated on the boundary), meaning that participants did not apply E in the first transfer block. R1 has low error rates for T1, T2, and T4 (
