


Commentary

Semantic structure can be inferred from category fluency tasks via clustering analyses: Reply to Voorspoels et al. (2014)

Kyongje Sung a, Barry Gordon a,b and David J. Schretlen c,d

a Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
b Department of Cognitive Science, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
c Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
d Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD, USA

1. Introduction

We (Sung et al., 2012) used singular value decomposition (SVD) to investigate semantic and associative clustering of words given by participants during category word fluency tasks. In that study, healthy controls (NC) and persons with schizophrenia (SZ) showed different word association patterns. Voorspoels et al. (2014) recently criticized the application of SVD and other clustering methods, such as multidimensional scaling (MDS) and hierarchical clustering, to verbal fluency productions. Voorspoels et al. did not criticize the clustering procedures themselves or the usefulness of fluency tasks, but rather the appropriateness of verbal fluency data as input measures for clustering analyses. However, their criticism rests on inadequate and/or insufficient data analyses, a logical problem in their analyses, and a misinterpretation of the analytic procedures used by other investigators. It appears that an unconventional and inconsistent definition of measurement "reliability" also biased their perspective.

We and other researchers have used statistical clustering methods to investigate the underlying semantic structure that gives rise to verbal fluency productions (e.g., Aloia, Gourovitch, Weinberger, & Goldberg, 1996; Chan et al., 1993; Sumiyoshi et al., 2001; Sung et al., 2012). These investigations typically aim to identify clusters of semantically related exemplars of a given category (e.g., animal names) and often compare differences in clustering patterns between healthy controls and persons whose semantic systems are compromised by illness. An assumption in these studies is that verbal fluency productions reflect hidden semantic associations among concepts (words) in the semantic system. MDS and other classical clustering methods use the distance between two named words (a "proximity" measure defined by the number of other words between them) to assess the strength of the association between concepts: the shorter the distance, the stronger the association. SVD analyses assume that if two concepts are associated in the semantic system, they tend to be named together by many participants. This "co-occurrence" assumption is fundamental to many scientific applications of SVD (e.g., Alter, Brown, & Botstein, 2000; Landauer, 2007).

Voorspoels et al. (2014) asserted that proximity and co-occurrence measures of association are inherently unreliable, and that observed differences between the word clustering patterns of NC and patients (such as SZ) are not likely to
reflect true differences, thereby invalidating conclusions based on the observations.
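
As an illustration of the proximity measure just described, the following sketch (Python, with a hypothetical response list; not code from any of the studies cited above) computes a word-by-word distance matrix from the order in which a single participant named the words.

```python
from itertools import combinations
import numpy as np

def proximity_matrix(fluency_output):
    """Distance between each pair of named words, defined as the number
    of other words produced between them (adjacent words -> distance 0)."""
    order = {word: i for i, word in enumerate(fluency_output)}
    words = list(order)
    dist = np.zeros((len(words), len(words)))
    for (i, w1), (j, w2) in combinations(enumerate(words), 2):
        d = abs(order[w1] - order[w2]) - 1   # number of intervening words
        dist[i, j] = dist[j, i] = d
    return words, dist

# Hypothetical animal-fluency response from one participant.
response = ["cat", "dog", "horse", "lion", "tiger", "giraffe", "elephant"]
words, dist = proximity_matrix(response)
print(words)
print(dist)  # e.g., lion-tiger are adjacent (distance 0); cat-giraffe are far apart
```

In practice such matrices are averaged over participants before clustering, with shorter average distances taken to indicate stronger associations.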

2. Reliability of the proximity measurement and results of MDS

To test the reliability of MDS findings, Voorspoels et al. (2014) selected 100 samples of 20 participants from a pool of NC (N = 204). For each sample, they calculated proximity measures for the 12 most frequently named animals in an animal fluency test and performed an MDS analysis, yielding 100 different MDS representations of the same 12 words. To compare these 100 MDS results, Voorspoels et al. first plotted the locations of the 12 animals from one NC sample as reference positions. They then applied a Procrustes geometric transformation to each of the remaining 99 MDS representations with respect to those 12 reference positions. In Fig. 1, the reference positions of the 12 words from the first NC sample (left panel) are plotted with "+" signs. The position of giraffe for the first sample is circled. The transformed locations of giraffe for the other 99 samples are plotted as dots (Voorspoels et al. did not present the other animals from the 99 samples). The same 100-sample analysis was done separately for SZ (N = 204; right panel). Based on Fig. 1, Voorspoels et al. argued that if the 100 MDS representations reliably show the same semantic association pattern, then the 100 locations for giraffe should be packed closely together. Although no objective criterion defines how close they should be, Voorspoels et al. argued that the variability in the locations of giraffe was too large to interpret the clustering as a reliable pattern in either group. We regard the method Voorspoels et al. used to compare the 100 MDS results as incomplete and misleading.

The variability of the 99 locations for giraffe relative to its reference location should be examined in terms of the goodness of fit of the Procrustes transformations. Also, the relationships between giraffe and the other animals should be examined within each of the 100 samples separately. For example, suppose that the location of giraffe in the 100th sample is the one marked with a square box in the left panel of Fig. 1. This giraffe clearly was not clustered with the other wild animals (lion, elephant, and tiger) by the reference sample and was far away from the other giraffes. However, this does not mean that giraffe was distant from the other wild animals in the 100th sample. If the Procrustes transformation yielded a poor fit, giraffe could well have clustered with the wild animals in the 100th sample even though it did not cluster with the same animals in the reference configuration. Thus, finding that the location of giraffe in the 100th sample does not cluster closely with the other wild animals of the reference group is only informative if we know the fit of the Procrustes transformation for the 100th sample and the word clustering patterns in that sample. Voorspoels et al. (2014) did not provide this essential information, nor other MDS-related statistics (e.g., stress values). Interestingly, even without this additional information, we can see in Fig. 1 that the locations of giraffe were scattered across all four quadrants in SZ, whereas in NC they cluster in the third and fourth quadrants, where all the wild animals are located. This observation, which Voorspoels et al. failed to note, suggests that their own analyses reveal a striking difference in the clustering patterns of these two groups.
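
To illustrate the kind of fit check we have in mind, the sketch below (Python, with hypothetical 12 x 2 coordinate arrays standing in for two MDS configurations) uses scipy.spatial.procrustes, whose returned disparity quantifies how well one configuration maps onto the other; only when this disparity is small is it meaningful to compare the transformed location of a single word, such as giraffe, across samples.

```python
import numpy as np
from scipy.spatial import procrustes

# Hypothetical 2-D MDS coordinates for the same 12 animal words:
# one reference sample and one comparison sample (values are made up).
rng = np.random.default_rng(0)
reference = rng.normal(size=(12, 2))   # reference configuration
sample_100 = rng.normal(size=(12, 2))  # another sample's configuration

# Procrustes superimposition: translates, scales, and rotates sample_100
# onto the reference. `disparity` is the residual sum of squared
# differences after the best-fitting transformation (0 = perfect fit).
ref_std, transformed, disparity = procrustes(reference, sample_100)
print(f"Procrustes disparity (goodness of fit): {disparity:.3f}")

# Only if the disparity is small does it make sense to compare the
# transformed location of a single word (e.g., giraffe) with its
# reference location; otherwise within-sample distances should be used.
giraffe = 3  # hypothetical row index of "giraffe"
shift = np.linalg.norm(transformed[giraffe] - ref_std[giraffe])
print(f"Displacement of 'giraffe' after transformation: {shift:.3f}")
```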

Fig. 1. Two-dimensional MDS representations of words analyzed in Voorspoels et al. (2014) for 100 NC (left panel) and 100 SZ (right panel) samples. Two gray lines and a black open square were added to the original figure to better identify the representations (see text). Thus, the intersection of the two gray lines is not the origin of the axes of the 2-D representations. The original figure did not have scale values on its axes. Adapted from Voorspoels et al. (2014) with permission.


Voorspoels et al. also investigated the similarity of the proximity measures between NC and SZ prior to the use of MDS. In this analysis, they reasoned that if proximity measures based on word-word distances actually reflect hidden association strengths, then the correlation of proximity measures between NC and SZ should be low if the groups have different word association patterns. Conversely, a high correlation would signify no appreciable difference between the semantic systems of the two groups. The correlation of proximity measures from the 100-sample analysis was .3, which was low according to Voorspoels et al. However, they further showed that the correlations between NC and SZ tended to increase as the sample size increased (from .3 for n = 20 to .82 for n = 204), which should not happen if the two groups truly differ in underlying semantic organization. They even argued that a perfect correlation would be possible with a much larger sample size. Although we do not know the exact reason for the high proximity correlation at large sample sizes, we (Sung et al., 2012) found that animal naming may not be the best fluency task to discriminate SZ from NC, especially when the analysis is limited to the 12 most frequently named animals.

Regardless, we find the logic of the correlation analysis by Voorspoels et al. to be problematic. While it may be true that two similar semantic systems yield highly correlated proximity measures, highly correlated proximity measures do not guarantee similarity between the two semantic systems on which they are based. For example, assume three different groups whose proximity measures for the same five words are shown in Fig. 2. These proximity matrices were created so that the correlations were .94 between groups A and B, .91 between A and C, and .76 between B and C (see footnote 3). Fig. 2 shows that the MDS solutions for groups A and C place the five words on a straight line in the order 1, 2, 3, 4, 5 without error (i.e., perfect one-dimensional solutions). It is not easy to identify the solution for group B despite the high correlation (.94) between groups A and B. A perfect MDS solution does exist for group B when the words are represented in two-dimensional space. This demonstration shows that proximity measures can be highly correlated (groups A and B) even when the underlying semantic structures differ substantially. Thus, the correlations of proximity measures demonstrated by Voorspoels et al. (2014) do not necessarily support their claim.
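
The same point can be checked numerically. The sketch below (a toy construction in the spirit of Fig. 2, not the actual proximity matrices we built) generates two distance matrices that correlate highly even though one configuration is perfectly one-dimensional and the other requires two dimensions, as revealed by the MDS stress values.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr
from sklearn.manifold import MDS

# Group A: five "words" lying on a line (perfect 1-D structure).
points_a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
# Group B: the same words arranged so that two dimensions are required.
points_b = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0], [4.0, 0.0]])

dist_a = squareform(pdist(points_a))
dist_b = squareform(pdist(points_b))

# Correlation of the off-diagonal proximity (distance) entries.
iu = np.triu_indices(5, k=1)
r, _ = pearsonr(dist_a[iu], dist_b[iu])
print(f"Correlation of proximity measures: {r:.2f}")  # high

# Stress of 1-D vs 2-D metric MDS solutions for each matrix.
for name, d in [("A", dist_a), ("B", dist_b)]:
    for k in (1, 2):
        mds = MDS(n_components=k, dissimilarity="precomputed", random_state=0)
        mds.fit(d)
        print(f"Group {name}, {k}-D stress: {mds.stress_:.3f}")
# Group A fits in one dimension with near-zero stress; group B needs two,
# even though the two distance matrices are highly correlated.
```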

3. Reliability of SVD analysis

To test the stability of the word clusters reported in our previous study (Sung et al., 2012), Voorspoels et al. examined the split-half correlation by constructing two sub-groups from 204 NC and another two sub-groups from 204 SZ. They performed an SVD analysis for each sub-group separately and extracted a 25-factor solution (see footnote 4). Voorspoels et al. then calculated the word-word associations (i.e., cosines of angles) within each SVD solution and examined the similarity of these cosine values between the two NC split halves and between the two SZ split halves. In SVD analysis, fluency data are represented as a matrix of 1s and 0s, depending on whether or not a particular word (represented in rows) was given by a participant (represented in columns). SVD decomposes this input matrix and re-represents each word in multi-dimensional space as a vector, whose dimensionality is determined by the researcher. The association strength between two words is then measured by the cosine of the angle between their word vectors (+1: highly associated, 0: independent, and -1: mutually exclusive).
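
The following sketch illustrates this representation with a tiny hypothetical word-by-participant matrix (it is not the PROPACK/Matlab code used in Sung et al., 2012).

```python
import numpy as np

# Hypothetical binary fluency matrix: rows = words, columns = participants,
# 1 if the participant named that word, 0 otherwise.
words = ["cat", "dog", "lion", "tiger", "giraffe"]
X = np.array([
    [1, 1, 1, 0, 1, 1],   # cat
    [1, 1, 0, 1, 1, 1],   # dog
    [0, 1, 1, 1, 0, 1],   # lion
    [0, 1, 1, 1, 0, 0],   # tiger
    [0, 0, 1, 1, 0, 1],   # giraffe
], dtype=float)

k = 3                                    # number of factors (dimensions) retained
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vectors = U[:, :k] * s[:k]          # each word as a k-dimensional vector

# Association strength = cosine of the angle between word vectors
# (+1 highly associated, 0 independent, -1 mutually exclusive).
unit = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
cosines = unit @ unit.T

for i, w1 in enumerate(words):
    for j, w2 in enumerate(words):
        if j > i:
            print(f"cos({w1}, {w2}) = {cosines[i, j]:.2f}")
# Because the first k singular vectors are the same whether k or 25 factors
# are extracted, solutions with increasing k can be examined incrementally.
```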


Voorspoels et al. repeated the above split-half analysis 500 times, as if there were 500 separate split-half studies. Using the top 40 animal exemplars, the 500 repeated analyses showed a mean correlation between the two halves of NC of .20 and a mean correlation between the two halves of SZ of .17. Voorspoels et al. concluded that the clustering patterns of words shown by the NC and SZ groups did not differ substantially, casting doubt on both the validity of the SVD results and the conclusions of Sung et al. (2012). Importantly, however, Voorspoels et al. made a simple analytic mistake that renders their claim a "straw man" argument.

Voorspoels et al. (2014) said that they replicated the SVD procedures of Sung et al. (2012), including the extraction of 25 factors. However, we never used 25-factor solutions to compare NC and SZ. Rather, as stated in that paper, we "first sought 25-dimensional SVD solutions for all four matrices using PROPACK software (Larsen, 2004) for Matlab (Version 7.8, MathWorks). While arbitrarily chosen, we assumed that the number of meaningful dimensions would be smaller than 25" (Sung et al., 2012, p. 568). In other words, not knowing the appropriate number of factors in advance, we chose a reasonably large number of factors and examined different SVD solutions with an increasing number of factors. This was possible because, by the nature of the SVD algorithm, the first 10 factors are identical whether 10 or 25 factors are extracted. We concluded that 4- or 5-factor solutions best discriminated the two groups in terms of word clustering patterns. Voorspoels et al. mistakenly assumed that we examined the semantic clustering patterns with 25-factor solutions.

Given this error by Voorspoels et al. (2014), we wondered how stable the word clusters would be in the same split-half analysis when about 5 factors are used. We followed Voorspoels et al.'s split-half procedure and examined the actual correlations of word-word similarity between the even- and odd-numbered NC and SZ participants reported in Sung et al. (2012) for all 24 possible factor solutions, starting with two factors. Twenty words were examined, half the number analyzed in the original study. In Fig. 3, the result of a single split-half analysis is presented for each curve except the simulation result.

Fig. 3 highlights four findings. First, the correlations between even- and odd-numbered NC were higher than those between even- and odd-numbered SZ across all 24 factor solutions except the two-factor solution. This exception occurred because the top four animals were better clustered in SZ than in NC only when two factors were considered (see Sung et al., 2012). Second, the maximum difference in correlations between NC (r = .409) and SZ (r = .138) emerged for the 7-factor solution. Third, with 25 factors, the correlations were .26 for NC and .186 for SZ, similar to those found by Voorspoels et al. Fourth, the correlations between NC and SZ (dark broken line; see footnote 5) were either similar to or worse than those of the split-half SZ comparison.

3 Correlations were calculated without "0" cells.
4 In Sung et al. (2012) we used the term "dimension" rather than "factor." In this commentary, we use the two terms interchangeably.
5 The correlations were the averages of the correlations between odd-NC and even-SZ pairs and between even-NC and odd-SZ pairs.




Fig. 2. Made-up examples of three sets of proximity measures of the same five words.

These findings show that word clustering patterns extracted using SVD generally were more stable in NC than in SZ, and that the claim by Voorspoels et al. (2014) was simply based on one of the worst possible cases. A correlation of .409 with 7 factors in NC seems reasonably high to us, although others might disagree. Therefore, we performed a simulation to determine the correlation that would be expected if there were no meaningful co-occurrence patterns among the words. We constructed a random matrix with the same dimensions as the matrix for the even-numbered NC (202 words by 54 participants) in the following way. For each row of the random matrix, the corresponding row of the even-numbered NC matrix was examined to determine how many people actually said the word represented in that row. For example, the first row of the even-numbered NC matrix represents "cat" (rows were sorted by word frequency). Since 49 of the 54 even-numbered NC said "cat" during animal naming, we randomly selected 49 cells of the first row of the random matrix and filled them with 1s (placing 0s in the remaining cells). We repeated this procedure until all rows of the random matrix were filled with 1s and 0s. This construction copied the exact word frequencies of the even-numbered NC matrix, but no systematic co-occurrence pattern of 1s and 0s should appear because the positions within each row were selected at random. A second random matrix was constructed independently in the same way to provide a pair of random matrices. A split-half analysis was then performed on this pair of random matrices, analogous to the even- and odd-numbered NC pair analysis.
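
A sketch of this row-frequency-preserving null simulation follows (with hypothetical dimensions and word frequencies and an illustrative factor number; the actual analysis used the frequencies of the even-numbered NC matrix and examined factor solutions from 2 to 25). The cosine computation is the same as illustrated earlier.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_matrix_like(row_counts, n_participants):
    """Binary matrix whose rows have the given word frequencies, with the
    positions of the 1s chosen at random (no systematic co-occurrence)."""
    mat = np.zeros((len(row_counts), n_participants))
    for row, count in enumerate(row_counts):
        cols = rng.choice(n_participants, size=count, replace=False)
        mat[row, cols] = 1.0
    return mat

def word_cosines(matrix, k):
    """Cosine similarities between word vectors from a k-factor SVD solution."""
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)
    vecs = U[:, :k] * s[:k]
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs @ vecs.T

# Hypothetical word frequencies for 40 words named by 54 participants.
row_counts = rng.integers(3, 50, size=40)
n_participants, k, n_sims = 54, 7, 100
iu = np.triu_indices(40, k=1)

split_half_rs = []
for _ in range(n_sims):
    half_a = random_matrix_like(row_counts, n_participants)
    half_b = random_matrix_like(row_counts, n_participants)
    cos_a, cos_b = word_cosines(half_a, k), word_cosines(half_b, k)
    split_half_rs.append(np.corrcoef(cos_a[iu], cos_b[iu])[0, 1])

print(f"Null split-half correlation: mean = {np.mean(split_half_rs):.3f}, "
      f"SD = {np.std(split_half_rs):.3f}")
```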

Fig. 3. Average correlations between even-odd halves of NC (dark solid line), SZ (gray solid line), NC-SZ (dark broken line), and the random matrix simulation (solid line with circle markers) across all 24 different factor solutions (see text). The sample sizes were 54 for even-numbered NC, 55 for odd-numbered NC, 51 for even-numbered SZ, and 51 for odd-numbered SZ. Except for the simulation result, the correlations are the result of a single split-half analysis. The gray area indicates the range of correlations within two standard deviations above the simulation means.



This entire process was repeated 100 times, as if there were 100 independent split-half studies. The means of the 100 simulated correlations are presented in Fig. 3. Fig. 3 shows that most correlations of similarity measures between the even- and odd-numbered NC groups fell more than two standard deviations above the simulated means. In particular, the correlations for the 7- and 8-factor solutions fell more than three standard deviations above the simulated means, and the correlation difference between NC and SZ was greatest for these solutions. In our original study (Sung et al., 2012), we did not examine 7- and 8-factor solutions because we prematurely concluded that factors beyond 5 would not improve discrimination of the two groups. Regardless of this oversight, it is clear that the claim by Voorspoels et al. (2014) was based on the worst possible case, one that we neither relied on nor even reported in our previous study.

4. How reliable should behavioral data be?

In reviewing Voorspoels et al. (2014), it seems that they applied two separate and inconsistent criteria to define acceptable levels of within- and between-group variability. It is hard to ignore the difference in overall clustering patterns between NC and SZ in Fig. 1, unless variability within the NC group is deemed totally unacceptable. Ironically, Voorspoels et al. described a correlation of .62 between NC and SZ from the proximity analysis as a clear indication of a considerable group difference. They further stated, "But even with as many as 204 participants per group, there still is a difference (i.e., .82 is still different from 1)" (Voorspoels et al., 2014, p. 138).

We began this reply to Voorspoels et al. (2014) by noting their criticism of basic assumptions about the application of clustering analyses to verbal fluency data, not of the fluency task itself. They acknowledge that many traditional verbal fluency measures, such as overall productivity, errors, clustering, and switching, are useful. They imply that the reliability of these variables may be better than that of the measures they calculated from MDS analysis to demonstrate the group difference. However, the question remains: are they more reliable? The database from which our 109 NC were sampled in Sung et al. (2012) included 97 healthy adults who were examined twice, an average of 5.5 (SD = .8) years apart, using the same verbal fluency tasks (see footnote 6). At the time of the first test, the average age of the participants was 57.1 (SD = 17.0) years, and they had completed an average of 14.1 (SD = 3.0) years of schooling. Forty-three participants were male, 84 were Caucasian, and 13 were African American. The correlations of correct words produced across sessions for each task are shown in Table 1.

The test-retest reliability results in Table 1 are generally consistent with other findings (e.g., Lemay, Bedard, Rouleau, & Tremblay, 2004; Ross et al., 2007; Strauss, Sherman, & Spreen, 2006). Considering that there was practically no change in overall productivity from the first to the second session at the group level, these correlations demonstrate considerable but inevitable individual variability.

6 In Sung et al. (2012), the first-session data were analyzed for those who participated twice.

Table 1. The mean (standard deviation) number of words named in four fluency tests at sessions 1 and 2 and their correlations (n = 97).

Fluency test               Mean (Std. Dev.)    Correlation* (95% CI)
Letter S (session 1)       14.3 (4.1)          .487 (.318, .626)
Letter S (session 2)       14.8 (4.3)
Letter P (session 1)       13.6 (4.5)          .595 (.449, .719)
Letter P (session 2)       14.0 (4.8)
Animal (session 1)         18.7 (5.6)          .753 (.651, .828)
Animal (session 2)         18.1 (5.9)
Supermarket (session 1)    24.7 (6.6)          .587 (.439, .704)
Supermarket (session 2)    24.7 (6.9)

* All correlations: p < .01, two-tailed.

The reliability of most cognitive tests is far from perfect. Opinions about what constitutes clinically or scientifically acceptable reliability vary, but Voorspoels et al. (2014) appear to allow only a narrow window of acceptable reliability estimates, one that even verbal fluency productivity measures would likely fail to meet.
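
For reference, test-retest correlations and 95% confidence intervals of the kind reported in Table 1 can be obtained with the Fisher z transformation; the sketch below uses made-up session scores rather than our actual data.

```python
import numpy as np
from scipy import stats

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson correlation with a confidence interval via the Fisher z transformation."""
    r, p = stats.pearsonr(x, y)
    z = np.arctanh(r)                      # Fisher transformation of r
    se = 1.0 / np.sqrt(len(x) - 3)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, p, (lo, hi)

# Made-up letter-fluency scores for 97 participants tested twice.
rng = np.random.default_rng(1)
session1 = rng.normal(14.3, 4.1, size=97)
session2 = 0.5 * session1 + rng.normal(7.4, 3.5, size=97)

r, p, (lo, hi) = pearson_with_ci(session1, session2)
print(f"r = {r:.3f}, p = {p:.3g}, 95% CI = ({lo:.3f}, {hi:.3f})")
```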

5. Conclusion

We have demonstrated that the criticism by Voorspoels et al. (2014) of MDS analysis as applied to verbal fluency data was based on incomplete and misleading analyses. We have also shown that their criticism of the SVD analysis in our previous study rested on a worst-case scenario that we never considered in that study. The level of correlations we found for the SVD analysis still might not satisfy some researchers. However, the results of MDS and SVD analyses generally agree with the word clusters reported in non-clustering studies (i.e., common animal groups such as African/wild, domestic, sea animals, and birds) (Ledoux et al., 2014; Troyer, Moscovitch, & Winocur, 1997). In some cases, clustering analysis has revealed word clusters missed in previous studies (e.g., semantic clusters in letter fluency; Sung, Gordon, Yang, & Schretlen, 2013), further demonstrating the utility of these analyses as applied to verbal fluency data.

Acknowledgments

This research was supported by The Therapeutic Cognitive Neuroscience Fund (BG), the Benjamin and Adith Miller Family Endowment on Aging, Alzheimer's, and Autism Research (BG), NIMH grants MH60504 and MH43775 (DJS), and the National Alliance for Research on Schizophrenia and Depression (DJS). The Johns Hopkins Medicine Institutional Review Board approved this study. Under an agreement with Psychological Assessment Resources, Inc., Dr. Schretlen is entitled to a share of royalties on sales of a test used in the study that is described in this article. The terms of this arrangement are managed by the Johns Hopkins University in accordance with its conflict of interest policies.




References

Aloia, M. S., Gourovitch, M. L., Weinberger, D. R., & Goldberg, T. E. (1996). An investigation of semantic space in patients with schizophrenia. Journal of the International Neuropsychological Society, 2(4), 267-273.
Alter, O., Brown, P. O., & Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences of the United States of America, 97(18), 10101-10106.
Chan, A. S., Butters, N., Paulsen, J. S., Salmon, D. P., Swenson, M. R., & Maloney, L. T. (1993). An assessment of the semantic network in patients with Alzheimer's disease. Journal of Cognitive Neuroscience, 5(2), 254-261.
Landauer, T. K. (2007). LSA as a theory of meaning. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 3-34). Mahwah, NJ: LEA.
Ledoux, K., Vannorsdall, T. D., Pickett, E. J., Bosley, L. V., Gordon, B., & Schretlen, D. J. (2014). Capturing additional information about the organization of entries in the lexicon from verbal fluency productions. Journal of Clinical and Experimental Neuropsychology, 36(2), 205-220.
Lemay, S., Bedard, M. A., Rouleau, I., & Tremblay, P. L. (2004). Practice effect and test-retest reliability of attentional and executive tests in middle-aged to elderly subjects. Clinical Neuropsychologist, 18(2), 284-302.
Ross, T. P., Calhoun, E., Cox, T., Wenner, C., Kono, W., & Pleasant, M. (2007). The reliability and validity of qualitative scores for the Controlled Oral Word Association Test. Archives of Clinical Neuropsychology, 22(4), 475-488.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). Oxford; New York: Oxford University Press.
Sumiyoshi, C., Matsui, M., Sumiyoshi, T., Yamashita, I., Sumiyoshi, S., & Kurachi, M. (2001). Semantic structure in schizophrenia as assessed by the category fluency test: effect of verbal intelligence and age of onset. Psychiatry Research, 105(3), 187-199.
Sung, K., Gordon, B., Vannorsdall, T. D., Ledoux, K., Pickett, E. J., Pearlson, G. D., et al. (2012). Semantic clustering of category fluency in schizophrenia examined with singular value decomposition. Journal of the International Neuropsychological Society, 18(3), 565-575.
Sung, K., Gordon, B., Yang, S., & Schretlen, D. J. (2013). Evidence of semantic clustering in letter-cued word retrieval. Journal of Clinical and Experimental Neuropsychology, 35(10), 1015-1023.
Troyer, A. K., Moscovitch, M., & Winocur, G. (1997). Clustering and switching as two components of verbal fluency: evidence from younger and older healthy adults. Neuropsychology, 11(1), 138-146.
Voorspoels, W., Storms, G., Longenecker, J., Verheyen, S., Weinberger, D. R., & Elvevag, B. (2014). Deriving semantic structure from category fluency: clustering techniques and their pitfalls. Cortex, 55, 130-147.

Received 5 November 2014
Reviewed 16 December 2014
Revised 22 January 2015
Accepted 22 January 2015

