Cortex 75 (2016) 255–259

Available online at www.sciencedirect.com

ScienceDirect Journal homepage: www.elsevier.com/locate/cortex

Discussion forum

Invalid assumptions in clustering analyses of category fluency data: Reply to Sung, Gordon and Schretlen (2015)

Steven Verheyen a, Wouter Voorspoels a, Julia Longenecker b, Daniel R. Weinberger c,d,e,f,g, Brita Elvevåg h,i and Gert Storms a,*

a Brain & Cognition Research Unit, University of Leuven, Belgium
b Cognition and Brain in Psychopathology Lab, University of Minnesota, Minneapolis, MN, USA
c Lieber Institute for Brain Development, Johns Hopkins University School of Medicine, Baltimore, MD, USA
d Department of Psychiatry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
e Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
f Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
g Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
h Psychiatry Research Group, Department of Clinical Medicine, University of Tromsø, Norway
i Norwegian Centre for Integrated Care and Telemedicine (NST), University Hospital of North Norway, Tromsø, Norway

The semantic structure of a category is reflected in the stable and meaningful relations that exist between exemplars of that category. Sung, Gordon, and Schretlen (2016) claim that this semantic structure can be inferred from category fluency tasks via clustering analyses. Indeed, the sequence of words that is generated in a category fluency task does not resemble a haphazard order: It generally shows signs of at least some semantic organization, in that consecutively mentioned words are often closely semantically related and clusters of meaningfully related words can be discerned. It is therefore not surprising that the idea of using category fluency data as a window on the semantic structure has appealed to so many, with the expectation of providing better insight into neurocognitive distortions in patients with a variety of neurological and psychiatric conditions.

Unfortunately, the procedures advocated by Sung et al. (2016) are the wrong tools to tackle the interesting questions regarding potential semantic distortion, degradation or change. Their outcomes warrant little or no conclusions regarding semantic distortions in patient groups, nor do they shed light on the cognitive or semantic difficulties that patients face. Our arguments for this claim are grounded soundly in empirics and plain logic, and we welcome the opportunity to reiterate them in non-technical terms. Where appropriate, we point the interested reader to publications that hold more technical elaborations. We develop our case along two invalid assumptions of the clustering procedures: (i) The procedures supposedly provide a measurement of semantic structure that is sufficiently accurate to allow group comparisons. There is strong evidence that this assumption is false. (ii) The procedures tacitly assume that any semantic deviations in patients with a condition that affects cortical function are highly consistent and thus systematic. There is overwhelming evidence that this assumption is not warranted.

DOI of original article: http://dx.doi.org/10.1016/j.cortex.2015.02.013.
* Corresponding author. Brain & Cognition Research Unit, University of Leuven, Tiensestraat 102, Box 3711, BE-3000 Leuven, Belgium. E-mail address: [email protected] (G. Storms).
http://dx.doi.org/10.1016/j.cortex.2015.05.012
0010-9452/© 2015 Elsevier Ltd. All rights reserved.

1. The false assumption of accurate measurement

The category fluency task is a hybrid task involving various subprocesses. It not only relies on the semantic knowledge of an individual, but also requires the retrieval of words from semantic memory, self-monitoring, working memory, strategy generation, and divergent thinking (Schretlen & Vannorsdall, 2010). There is considerable behavioral (Shao, Janse, Visser, & Meyer, 2014), clinical (Henry & Crawford, 2004), and neuroimaging (Costafreda et al., 2006) evidence that suggests a strong involvement of executive functioning in category fluency tasks. Category fluency performance is also affected by factors such as age, education, gender, and intelligence (Acevedo et al., 2000; Bolla, Gray, Resnick, Galante, & Kawas, 1998), which are more likely to affect one's executive functioning than one's semantic structure. According to many, category fluency tasks are therefore first and foremost tests of frontal lobe functioning, not of semantic memory per se (Baldo & Shimamura, 1998).

Due to the potential involvement of many neurocognitive processes, the response sequences of individuals can differ considerably, even if we can safely assume that there is a single semantic structure (for example because the individuals are from the same, healthy, population). This makes inferring semantic structure from the category fluency responses a tremendous challenge. Even if one is correct in assuming that all individuals in a sample share a common semantic structure, current clustering analyses of category fluency responses do not allow one to arrive at an accurate estimate of this structure. We would expect from an accurate procedure for measuring semantic structure that, at the very least, it yields a highly similar outcome on each replication of measuring the same thing.
That is, we would expect the estimated semantic relatedness between any two exemplars included in the analysis to be more or less the same across different measurements. If the relatedness between a donkey and a horse is inferred to be high, we would expect a similarly high relatedness one week later. We would also expect to see the same patterns across healthy individuals who recognize that donkeys and horses are both four-legged farm animals.

As we have shown for the procedures advocated by Sung et al. (2016), repeatedly measuring the same semantic structure by sampling from a population of healthy controls results in substantially different structures for each sample (Voorspoels et al., 2014). Since there is no reason to assume that the healthy individuals actually have substantially different semantic representations, clearly the measurement procedures are too sensitive to extra-semantic characteristics of the particular response sequences generated by the individuals in a sample. Given the many processes involved in a category fluency task, it is not easy to identify exactly which factors are responsible for this finding, but the overall result is that the estimate of semantic structure is not accurate.

White, Voorspoels, Storms, and Verheyen (2014) showed that administering a category fluency task twice to the same group of healthy undergraduate students and applying the clustering procedures Sung et al. (2016) advocate resulted in different semantic structures. That is, even when one has every reason to believe that the semantic structure is, in fact, identical, clustering analyses yield substantial differences. Following the procedures advocated by Sung et al. (2016), researchers would conclude that the semantic structure of the sample of healthy undergraduate students was significantly altered in the course of a week! This would be an absurd inference. Thus, another conclusion presents itself as more likely, namely that the clustering procedure does not provide an accurate measurement of semantic structure. This interpretation was corroborated by another finding in the same study: the semantic structures that were derived from the category fluency data did not correspond to a direct measure of semantic structure provided by the exact same students (White et al., 2014), suggesting that the structures derived from the category fluency data do not succeed at capturing the true semantic structure.

A plausible explanation for the lack of accuracy is that the derivation of semantic relatedness is not very direct. A variety of processes play a role in category fluency tasks, and these can be highly variable between and within individuals, contexts, and occasions. The observed inaccuracy may be due to any of the functions involved in generating a category's members and be vulnerable to a range of environmental influences and task cues (Elvevåg, Storms, Heit, & Goldberg, 2005; Hornberger, Bell, Graham, & Rogers, 2009).
However, the estimate of semantic structure yielded by the clustering procedure is assumed to be representative of all individuals in the population, which is inconsistent with the observed differences across samples of the same population. In sum, the assumption that clustering analyses of category fluency data yield accurate measurements of semantic structure is refuted by empirical results. The resulting structures are clearly not sufficiently accurate to warrant group comparisons, as even the putative semantic structures inferred from consecutive category fluency tasks differ widely. In the next section we will establish that when these clustering procedures are applied to category fluency data from a group of patients with conditions that affect cortical function, these issues become aggravated.
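
As a rough illustration of the replication check described in this section, the sketch below splits a set of response sequences into random halves, derives a proximity estimate from each half, and correlates the two estimates. It is a minimal sketch under stated assumptions: the proximity measure (mean positional distance between two words) is a simple stand-in chosen for brevity, not the procedure used by Sung et al. or in the cited replication studies, and all function names are hypothetical.

```python
import itertools
import math
import random


def pair_distances(sequences):
    """Mean absolute positional distance for each word pair.

    ASSUMPTION: positional distance is only a stand-in proximity
    measure; it is not the clustering procedure under discussion.
    """
    gaps = {}
    for seq in sequences:
        position = {}
        for i, word in enumerate(seq):
            position.setdefault(word, i)  # keep the first mention only
        for a, b in itertools.combinations(sorted(position), 2):
            gaps.setdefault((a, b), []).append(abs(position[a] - position[b]))
    return {pair: sum(g) / len(g) for pair, g in gaps.items()}


def correspondence(dist_a, dist_b):
    """Pearson correlation over the word pairs present in both structures."""
    shared = sorted(set(dist_a) & set(dist_b))
    if len(shared) < 2:
        return float("nan")
    xs = [dist_a[p] for p in shared]
    ys = [dist_b[p] for p in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation undefined without variation
    return cov / math.sqrt(vx * vy)


def split_half_correspondences(sequences, n_splits=100, seed=0):
    """Correspondence between random halves of one sample, repeated."""
    rng = random.Random(seed)
    pool = list(sequences)
    half = len(pool) // 2
    results = []
    for _ in range(n_splits):
        rng.shuffle(pool)
        first = pair_distances(pool[:half])
        second = pair_distances(pool[half:])
        results.append(correspondence(first, second))
    return results
```

If the procedure measured a shared structure accurately, the values returned by `split_half_correspondences` for a sample of healthy controls would be expected to be consistently high; values that are low or vary widely across splits would point to the kind of inaccuracy discussed above.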

2. The false assumption of systematic deviations

The clustering procedures Sung et al. (2016) advocate assume that a single semantic structure, shared by all individuals in the sample who provided the data, underlies performance in the category fluency task. The output of the procedures is indicative of this assumption, providing a single semantic structure for the entire group of individuals who generated response sequences for a particular category. Assuming a single semantic structure for a group of individuals is sensible if and only if one is willing to accept that every individual in the group has a sufficiently similar semantic structure. This might be a plausible assumption for healthy controls, but for a wide range of clinical conditions that affect cortical function it implies that any distortions in the semantic structure are systematic across the individual patients.

The assumption of a group semantic structure thus comes with a pervasive restriction on the type of conclusions one can logically draw regarding distortions of the semantic structure of patients. For example, Sung et al. (2012) observed that the clustering analysis outcome of supermarket items fluency data of patients with schizophrenia shows eggs clustered with lettuce and tomatoes, instead of with breakfast items as in the output of healthy controls. By highlighting these differences between the two groups, one is effectively endorsing the assumption that all these types of patients have a conception of eggs that is shifted towards vegetables such as lettuce and tomatoes, and away from breakfast items. The procedure after all professes to capture the common semantic structure in the patients' data. The logical conclusion this result would yield is that patients with schizophrenia have a highly consistent conception of eggs, which is equally clear and precise as that of healthy controls, yet systematically different.

Below, we will first establish that the assumption of systematic semantic deviations is not supported by the available data. We will then show how applying clustering procedures that operate under this faulty assumption obscures rather than aids the understanding of semantic difficulties in patients with disorders that affect cortical function.

The assumption of a single semantic structure shared by all patients with a specified condition implies that at the level of the category fluency data, deviations in response patterns are consistent across the individual patients.
For instance, if the category fluency data of patients with schizophrenia were systematically different from that of healthy controls, one would expect that the category fluency data of these patients are mutually more similar than the category fluency data of patients and controls. Indeed, if a single semantic structure underlies the category fluency data of the patients, their response sequences should resemble each other, and given the systematic deviation from the semantic structure of healthy controls, they should resemble the response sequences of healthy controls less. However, Elvevåg and Storms (2003) showed that the correspondence between the category fluency data of patients with schizophrenia is not larger than the correspondence between the data of patients with schizophrenia and healthy controls. In other words, the response sequences of individual patients are more similar to the response sequences of healthy controls than to the response sequences of other patients with schizophrenia. This would be impossible to observe if the clustering procedures' assumption of systematic deviations were true.

Voorspoels et al. (2014) demonstrated the invalidity of this assumption by repeatedly dividing the category fluency data from a large group of patients with schizophrenia into two halves and applying the clustering procedure to the data of each half. The correspondence of the halves was problematically low, yielding widely different semantic structures on each repetition. Storms, Dirikx, Saerens, Verstraeten, and De Deyn (2003a) compared the semantic structures resulting from category fluency data from different independent studies of patients with dementia of the Alzheimer type, again revealing little or no consistency across different samples of patients with the same clinical condition. Even the results reported by Sung et al. (2016) in Figure 3 refute the assumption of a shared semantic structure: For the clustering solutions discussed (with 5, 6, 7, and 8 factors), their figure shows that the correspondence between the semantic structures of two halves of the data from the group with schizophrenia is smaller than or identical to the correspondence between patients and controls. Put differently, the semantic structure of a group of patients with schizophrenia resembles the semantic structure of healthy controls more than it does the semantic structure of another group of patients with schizophrenia, thus arguing against the logic of inferring a single structure for the entire group of patients.

Applying clustering procedures in spite of the violations of their assumptions can lead researchers astray. In a re-analysis of a series of published results from category fluency tasks on putative semantic impairments in patients with schizophrenia and dementia of the Alzheimer type, Storms et al. (2003a) found that the data from patients were invariably more variable than the data from healthy controls. The documented variability in the patient data was so extreme that it even resembled random data (see also Hornberger et al., 2009). A similar conclusion can be drawn from Figure 3 in Sung et al. (2016): For the discussed clustering solutions, the figure shows that the correspondence between the semantic structures derived from two halves of the data from the group with schizophrenia is only slightly larger than the correspondence between two randomly generated data sets. Regardless of the amount of structure present in the input data, clustering algorithms will always produce a solution by picking up on contingencies.
Given the resemblance to random data, pointing out that the relatedness of two exemplars in the patient solution differs from their relatedness in the control solution does not provide any insight into the nature of the group difference. Particular differences are merely the result of chance. In fact, any difference is likely to arise. Differences between semantic structures of samples from two populations are only meaningful to the extent that they cannot be accounted for by the differences across separate samples of each population. This attribution of variability is at the core of any statistical inference technique, but generally is not undertaken in clustering analyses of category fluency data. When one does take within-group variability into account for these data, the between-group differences are swamped by within-group differences (Voorspoels et al., 2014). Applying this logic to the results in Figure 3 in Sung et al. (2016) also indicates that the variability within the patients is larger than the variability between patients and controls. When the degree of within-group consistency is not higher than the degree of between-group consistency, one runs the risk of mistaking differences that exist within different samples of the patient group or the control group for between-group differences (see Hornberger et al., 2009; Voorspoels et al., 2014; White et al., 2014).

It is clear that the assumption of a single semantic structure shared by all individuals in a sample of patients with a condition that affects cortical function does not withstand empirical tests. Perhaps advocates of the procedures under scrutiny might assert that their results point to idiosyncratic semantic deficits or general degradation of semantic knowledge. Yet, if that is their hypothesis, they have used the wrong tool for the job (Storms, Dirikx, Saerens, Verstraeten, & De Deyn, 2003b). Idiosyncrasy or general degradation implies that each individual patient deviates in a different way from healthy controls, and thus that each patient has a separate, and different, semantic structure. Establishing heterogeneous patient-specific deficits requires a method that allows a structure to be mapped out for one person's data collected at one time. The clustering procedures advocated by Sung et al. (2016) infer the semantic structure from data of several individuals. If disruptions were to be patient-specific, the semantic structure obtained on the basis of different idiosyncratic sequences would not be representative of any individual patient (Voorspoels et al., 2014). A patient whose semantic memory degrades or changes gradually loses the conventional meaning of concepts, up to the point where they are no longer available. Degradation of semantic memory thus implies absence of semantic structure and as such is incompatible with clustering analyses, since the aim of clustering analyses is to represent the structure.

In sum, when using clustering procedures on category fluency data, it is tacitly assumed that a single semantic structure is present in the population of patients under examination, implying consistent and systematic distortions within a particular class of patients. Practitioners of clustering procedures seemingly blindly adopt the far-reaching assumption of a systematic impact of a particular clinical condition on semantic structure.
The conclusion that appears far more plausible is that patients suffer from idiosyncratic distortions of semantic structure, a general semantic degradation, or deficits in the variety of non-semantic processes associated with category fluency tasks. Such conclusions, however, cannot be drawn from comparing clustering outcomes for differences between a single patient sample and a single control sample: Arguing on the basis of these comparisons that certain patient groups have a vague conception of eggs, that they are less consistent as to what eggs mean in the context of supermarket items, or that they are less clear about whether eggs are vegetables or breakfast items runs counter to the very assumption of the clustering procedures. Indeed, the procedure aims to reveal the meaningful semantic relations among a category's exemplars, shared by all individuals in the sample. Either the outcomes are interpreted in that specific way or they should not be interpreted at all. The available evidence is strongly in favor of the latter option.
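
The logic of weighing a between-group difference against within-group variability, as discussed in this section, can be sketched with a label-permutation test. This is an illustrative substitute under stated assumptions and not the analysis reported in the cited studies: the structure estimate (how often two words are named back to back) is a deliberately simple stand-in for a clustering solution, the function names are hypothetical, and the vocabulary is assumed to contain at least two words.

```python
import random
from itertools import combinations


def adjacency_rates(sequences, vocabulary):
    """Share of sequences in which each word pair is named back to back.

    ASSUMPTION: adjacency rates stand in for a derived semantic structure.
    """
    pairs = list(combinations(sorted(vocabulary), 2))
    counts = dict.fromkeys(pairs, 0)
    for seq in sequences:
        adjacent = {tuple(sorted(p)) for p in zip(seq, seq[1:])}
        for pair in adjacent:
            if pair in counts:
                counts[pair] += 1
    return {pair: counts[pair] / len(sequences) for pair in pairs}


def structure_gap(group_a, group_b, vocabulary):
    """Mean absolute difference between two groups' adjacency rates."""
    rates_a = adjacency_rates(group_a, vocabulary)
    rates_b = adjacency_rates(group_b, vocabulary)
    return sum(abs(rates_a[p] - rates_b[p]) for p in rates_a) / len(rates_a)


def permutation_p_value(patients, controls, vocabulary, n_perm=1000, seed=0):
    """Compare the observed group gap with gaps under shuffled labels.

    Returns the observed gap and the share of relabelings (plus one,
    for the observed labeling itself) whose gap is at least as large.
    """
    rng = random.Random(seed)
    observed = structure_gap(patients, controls, vocabulary)
    pooled = list(patients) + list(controls)
    n_patients = len(patients)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = structure_gap(pooled[:n_patients], pooled[n_patients:], vocabulary)
        if gap >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```

A large returned proportion would mean that arbitrary relabelings of participants produce differences as large as the observed patient-control difference, in which case reading meaning into any particular difference between the two solutions would not be justified.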

3. Conclusion

In summary, we have provided an overview of the evidence that shows that clustering analyses of category fluency data rest on two invalid assumptions: These analyses (i) do not yield accurate measurements of semantic structure and they (ii) wrongfully assume that any semantic deviations in patients with conditions that affect cortical function are highly consistent and thus systematic. This makes them inadequate tools to further our understanding of the neurocognitive problems underlying the category fluency output of patients with conditions that affect cortical functions. Clustering analyses yield conclusions that are simply not justified because they discard the inordinate variability in category fluency data. We therefore strongly encourage the complete abandonment of these procedures in favor of methods with less inappropriate assumptions.

Acknowledgments

SV and WV are postdoctoral fellows at the Research Foundation-Flanders (FWO).

References

Acevedo, A., Loewenstein, D. A., Barker, W. W., Harwood, D. G., Luis, C., Bravo, M., et al. (2000). Category fluency test: normative data for English- and Spanish-speaking elderly. Journal of the International Neuropsychological Society, 6, 760–769.
Baldo, J. V., & Shimamura, A. P. (1998). Letter and category fluency in patients with frontal lobe lesions. Neuropsychology, 12, 259–267.
Bolla, K. I., Gray, S., Resnick, S. M., Galante, R., & Kawas, C. (1998). Category and letter fluency in highly educated older adults. The Clinical Neuropsychologist, 12, 330–338.
Costafreda, S. G., Fu, C. H. Y., Lee, L., Everitt, B., Brammer, M. I., & David, A. S. (2006). A systematic review and quantitative appraisal of MRI studies of verbal fluency: role of the left inferior frontal gyrus. Human Brain Mapping, 27, 799–810.
Elvevåg, B., & Storms, G. (2003). Scaling and clustering in the study of semantic disruptions in patients with schizophrenia: a re-evaluation. Schizophrenia Research, 63, 237–246.
Elvevåg, B., Storms, G., Heit, E., & Goldberg, T. (2005). Category content and structure in schizophrenia: an evaluation using the instantiation principle. Neuropsychology, 19, 371–380.
Henry, J. D., & Crawford, J. R. (2004). A meta-analytic review of verbal fluency performance following focal cortical lesions. Neuropsychology, 18, 284–295.
Hornberger, M., Bell, B., Graham, K. S., & Rogers, T. T. (2009). Are judgements of semantic relatedness systematically impaired in Alzheimer's disease? Neuropsychologia, 47, 3084–3094.
Schretlen, D. J., & Vannorsdall, T. D. (2010). Calibrated Ideational Fluency Assessment (CIFA) professional manual. Lutz, FL: Psychological Assessment Resources, Inc.
Shao, Z., Janse, E., Visser, K., & Meyer, A. (2014). What do verbal fluency tasks measure? Predictors of verbal fluency performance in older adults. Frontiers in Psychology, 5, 772.
Storms, G., Dirikx, T., Saerens, J., Verstraeten, S., & De Deyn, P. P. (2003a). On the use of scaling and clustering in the study of semantic deficits. Neuropsychology, 17, 289–301.
Storms, G., Dirikx, T., Saerens, J., Verstraeten, S., & De Deyn, P. P. (2003b). On what we cannot learn from proximity data. Neuropsychology, 17, 323–329.
Sung, K., Gordon, B., & Schretlen, D. J. (2016). Semantic structure can be inferred from category fluency tasks via clustering analyses: Reply to Voorspoels et al. (2014). Cortex, 75, 249–254.
Sung, K., Gordon, B., Vannorsdall, T. D., Ledoux, K., Pickett, E. J., Pearlson, G. D., et al. (2012). Semantic clustering of category fluency in schizophrenia examined with singular value decomposition. Journal of the International Neuropsychological Society, 18, 565–575.
Voorspoels, W., Storms, G., Longenecker, J., Verheyen, S., Weinberger, D. R., & Elvevåg, B. (2014). Deriving semantic structure from category fluency: clustering techniques and their pitfalls. Cortex, 55, 130–147.
White, A., Voorspoels, W., Storms, G., & Verheyen, S. (2014). Problems of reliability and validity with similarity derived from category fluency. Psychiatry Research, 220, 1125–1130.

Received 14 April 2015
Reviewed 21 April 2015
Revised 30 April 2015
Accepted 11 May 2015
