Mem Cogn DOI 10.3758/s13421-013-0378-6

Grammatical context constrains lexical competition in spoken word recognition

Julia Strand · Andrea Simenstad · Allison Cooperman · Jonathon Rowe

© Psychonomic Society, Inc. 2013

Abstract

When perceiving spoken language, listeners must match the incoming acoustic-phonetic input to lexical representations in memory. Models that quantify this process propose that the input activates multiple lexical representations in parallel and that these activated representations compete for recognition (Weber & Scharenborg, 2012). In two experiments, we assessed how grammatically constraining contexts alter the process of lexical competition. The results suggest that grammatical context constrains the lexical candidates that are activated to grammatically appropriate competitors. Stimulus words with little competition from items of the same grammatical class benefit more from the addition of grammatical context than do words with more within-class competition. The results provide evidence that top-down contextual information is integrated in the early stages of word recognition. We propose adding a grammatical class level of analysis to existing models of word recognition to account for these findings.

Keywords: Auditory word recognition · Language comprehension · Lexical processing · Psycholinguistics · Speech perception

J. Strand (*) · A. Simenstad · A. Cooperman · J. Rowe
Carleton College, Northfield, USA
e-mail: [email protected]

Recognizing spoken words seems to occur instantly and effortlessly for most people, but it requires that the listener rapidly and accurately map the incoming speech signal onto meaningful representations stored in the mental lexicon. Many models of spoken word recognition (see Weber & Scharenborg, 2012, for a recent review) agree that stimulus input activates multiple representations in the mental lexicon and that these activated representations compete for recognition.

For example, hearing the word cat will also activate phonologically similar words, including sat and cap. Each similar competitor in the mental lexicon, often called a neighbor, competes with the stimulus word, causing words with many neighbors to be recognized more slowly and less accurately than those with few neighbors (Goldinger, Luce, & Pisoni, 1989; Kaiser, 2003; Luce & Pisoni, 1998; Sommers, 1996). In addition to the amount of competition present, other lexical factors that affect spoken word recognition include word frequency (frequently used words are recognized more quickly and accurately; see, e.g., Dahan, Magnuson, & Tanenhaus, 2001; Luce & Pisoni, 1998) and neighbor frequency (words with high-frequency neighbors are recognized less accurately than words with low-frequency neighbors; see Luce & Pisoni, 1998).

Although the specific implementations differ, models of word recognition, including jTRACE (Strauss, Harris, & Magnuson, 2007), the neighborhood activation model (NAM; see Luce & Pisoni, 1998), MERGE (Norris & McQueen, 2000), PARSYN (Luce, Goldinger, Auer, & Vitevitch, 2000), and Shortlist B (Norris & McQueen, 2008), all incorporate mechanisms that explain the effects of word frequency and competition on spoken word recognition.

Although models of word recognition have not included the grammatical class (e.g., noun, verb) of competitors as a factor that may influence recognition, a large body of evidence demonstrates that grammatical class can have strong influences on language-processing domains (for reviews, see Mirman, Strauss, Dixon, & Magnuson, 2010; Vigliocco, Vinson, Druks, Barber, & Cappa, 2011).
For example, neuroimaging and case studies suggest some neuroanatomical distinctions between the regions required for processing nouns and verbs (Miceli, Silveri, Nocentini, & Caramazza, 1988; Shapiro, Pascual-Leone, Mottaghy, Gangitano, & Caramazza, 2001), including double dissociations for verb and noun retrieval among patients with aphasia (Damasio & Tranel, 1993). In


speech production, word substitution errors are almost always of the same grammatical class as the target word, providing evidence that selection of grammatical class occurs earlier in speech production than does phonological assignment (Fay & Cutler, 1977). In addition, the tip-of-the-tongue (TOT) phenomenon, in which an individual is temporarily unable to retrieve and produce a known word, demonstrates the influence of grammatical class on word selection (Abrams & Rodriguez, 2005; Abrams, Trunk, & Merrill, 2007). When an individual experiences a TOT state, exposure to a phonologically similar prime word of a different grammatical class than the target word (e.g., a verb prime for a noun target) facilitates the resolution of the TOT state, whereas priming with a word of the same class does not (Abrams & Rodriguez, 2005). This suggests that grammatical class constrains which words are activated during word production and that phonologically similar words of the same grammatical class provide competition.

Despite evidence that grammatical class can influence other domains of language processing, no research to date has explicitly assessed how grammatical class influences competition among lexical neighbors. Related research has demonstrated that ambiguous words whose interpretations are both nouns (e.g., glass) are recognized more slowly than ambiguous words whose interpretations span nouns and verbs (e.g., bark; Mirman et al., 2010), suggesting that competition is stronger between words of the same class. However, Mirman et al. did not assess the role of lexical competition or the grammatical classes of the neighbors. In addition, no research has directly evaluated whether the processes of lexical competition differ when words are presented in grammatical context.
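For concreteness, the one-phoneme neighbor definition used throughout this literature (a single phoneme substitution, addition, or deletion) can be sketched as follows. The code and its toy phoneme transcriptions are ours, purely for illustration, and are not drawn from any of the norms cited here.

```python
# Sketch of the one-phoneme neighbor rule (substitution, addition, or
# deletion). Phoneme transcriptions are illustrative ARPABET-like lists,
# invented for this example rather than taken from any published lexicon.

def is_neighbor(word, other):
    """True if `other` differs from `word` by exactly one phoneme
    substitution, addition, or deletion."""
    a, b = list(word), list(other)
    if a == b:
        return False
    if len(a) == len(b):
        # substitution: exactly one mismatched position
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        # addition/deletion: removing one phoneme from the longer form
        # must yield the shorter form
        if len(a) > len(b):
            a, b = b, a
        return any(b[:i] + b[i + 1:] == a for i in range(len(b)))
    return False

lexicon = {
    "cat":  ["k", "ae", "t"],
    "sat":  ["s", "ae", "t"],       # substitution
    "cap":  ["k", "ae", "p"],       # substitution
    "cast": ["k", "ae", "s", "t"],  # addition
    "at":   ["ae", "t"],            # deletion
    "dog":  ["d", "ao", "g"],       # not a neighbor
}

neighbors = sorted(w for w, ph in lexicon.items() if is_neighbor(lexicon["cat"], ph))
print(neighbors)  # ['at', 'cap', 'cast', 'sat']
```

A categorical rule of this kind counts competitors but says nothing about how strongly each one competes, which is the limitation the phi-square measures introduced below are meant to address.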
Indeed, models of lexical competition are primarily based on data derived from words presented in isolation or within carrier phrases that provide no context for the target word (e.g., "repeat the word cat") rather than in more meaningful frames (e.g., "the boy had a pet cat"). However, there is ample evidence that words are recognized differently in meaningful context than in isolation. For example, spoken word recognition occurs more quickly in a syntactically meaningful context than in a nonmeaningful one (Marslen-Wilson & Tyler, 1980). Listeners are more likely to perceive an ambiguous sound as "to" when it precedes a verb (e.g., "we tried . . . go") than when it precedes a noun ("we tried . . . gold"; Isenberg, 1980). In addition, when presented with a phoneme ambiguous between /b/ and /p/, listeners' perception of the phoneme shifts to accommodate sentence context, as in "she ran hot water for the p/bath" (Connine, 1987). Although it is clear that context influences word recognition in some manner, models of word recognition are divided about how and when that influence occurs. Models of word recognition that are modular and primarily stimulus driven (i.e., that give priority to the bottom-up signal; Forster, 1978) would predict that all neighbors are unavoidably activated

even when they are incongruent with grammatical context. Therefore, in the sentence "the boy had a pet cat," these models predict that sat will be partially activated, even though it is contextually inappropriate. To address whether contextually inappropriate candidates are considered, Zwitserlood (1989) presented participants with semantically constraining sentences that ended in a word fragment, followed by primes related either to the final word of the sentence or to a neighbor of that word. For example, when a sentence strongly biased the final word to be captain, participants, after hearing the first syllable, made a lexical decision on ship (an associate of captain) or money (an associate of capital). Zwitserlood found some facilitation for both associates, suggesting that context does not constrain activation (but see Janse & Quené, 2004, for methodological critiques). In contrast to the modular models (e.g., Forster, 1978), more interactive models argue that context can serve to limit or constrain the lexical alternatives that are activated, such that context influences processing at the earliest stages (Brock & Nation, 2013; Dahan, Swingley, Tanenhaus, & Magnuson, 2000; Dahan & Tanenhaus, 2004). Using a visual world paradigm and tracking participants' eye movements, Dahan et al. (2000) found that presenting a gender-marked determiner in French reduced activation of grammatically incongruent neighbors. For example, when a feminine determiner was presented just before a feminine noun, participants did not fixate on phonologically similar but grammatically incongruous masculine nouns. Supporting event-related potential data also show that words that violate predictions based on grammatical context elicit distinctive patterns of neural activation that differ from those observed for semantic violations (Hagoort, 2003; Kutas & Hillyard, 1984; Van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005).
These findings have been used to support a word recognition model in which context eliminates incongruent candidates and facilitates word recognition in real time, rather than as a secondary processor that evaluates the lexical selection. Additional evidence that context is immediately integrated with phonological information comes from an artificial lexicon study (Magnuson, Tanenhaus, & Aslin, 2008). In this study, participants learned words representing a series of shapes (nouns) and textures that could be applied to the shapes (adjectives). Although some nouns and adjectives were phonologically similar, they did not compete with one another, provided that syntactic context was available to explicitly predict the class of the target word. These results further support the idea that top-down, contextual expectations about grammatical class may influence which phonologically appropriate competitors are activated. The studies above demonstrate the strong influences that contextual information can have on word recognition. However, no research to date has assessed how the processes of lexical competition change in the presence or absence of


context. Given the influence of syntactic expectations on word recognition (Dahan et al., 2000; Dahan & Tanenhaus, 2004; Magnuson et al., 2008) and the other domains in which top-down information influences bottom-up processing of speech (Reicher, 1969; Warren, 1970), we predicted that the grammatical context in which words are presented would influence the processes of lexical competition. If the preceding grammatical context constrains activation to grammatically appropriate words (in line with more interactive models), then only competitors that share the grammatical class of the stimulus word should be expected to compete with it. For example, if context constrains which words are activated, then in the sentence "The girl thought about the cat," the noun mat would provide competition for the final word, whereas the verb sat would not. Thus, we hypothesized that words with little within-class competition (e.g., nouns with primarily verb competitors) should show a significant benefit from the addition of grammatical context, because the grammatical context would eliminate the majority of the competitors. However, words with many within-class competitors should not show as large an advantage, because the grammatical context does not substantially reduce the number of activated competitors.
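As a toy numerical illustration of this prediction (the words, classes, and competition weights below are invented for the sketch, not taken from the stimulus set), restricting the competitor set to same-class words removes far more of the competition for a grammatically sparse word than for a grammatically dense one:

```python
# Toy illustration of the hypothesis: if grammatical context restricts the
# competitor set to same-class words, a grammatically sparse word loses far
# more of its competition than a grammatically dense word does. The classes
# and competition weights are invented for illustration only.

# target -> {competitor: (grammatical class, competition weight)}
competitors = {
    "cat":  {"mat": ("noun", 0.8), "cap": ("noun", 0.7), "sat": ("verb", 0.3)},
    "wine": {"while": ("conj", 0.9), "fine": ("adj", 0.8), "vine": ("noun", 0.4)},
}
target_class = {"cat": "noun", "wine": "noun"}  # both targets are nouns

reduction = {}
for word, comps in competitors.items():
    total = sum(w for _, w in comps.values())
    within = sum(w for cls, w in comps.values() if cls == target_class[word])
    reduction[word] = 1 - within / total  # competition removed by context
    print(f"{word}: context removes {reduction[word]:.0%} of its competition")
```

On these made-up numbers, context eliminates only a sixth of cat's competition (its main competitors are nouns) but roughly four fifths of wine's, so wine would be predicted to show the larger context benefit.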

Experiment 1

Lexical variables

Our hypothesis requires a method for quantifying the amount of lexical competition a given word encounters, as well as a way of determining how much of that competition comes from words of the same grammatical class. The most common method for quantifying lexical competition is to define a neighbor as any word that can be formed by a single phoneme substitution, addition, or deletion. For example, neighbors of boat include moat (a substitution), boast (an addition), and bow (a deletion). Although the number of neighbors (neighborhood size) a word has predicts word recognition accuracy (Goldinger et al., 1989; Luce & Pisoni, 1998), neighbor-based approaches have limitations: They implicitly assume that all neighbors provide the same amount of competition to the stimulus word and that words that differ by more than one phoneme do not provide any competition. More recently, an alternative measure of calculating lexical competition, called phi-square density, was proposed (Strand, 2013; Strand & Sommers, 2011). The major difference between neighbor-based approaches and phi-square density is that the latter quantifies the amount of competition that each competitor word provides to the stimulus word continuously. Neighbor-based measures quantify how many words serve as competitors; phi-square density quantifies how much each word competes on a graded scale. Given that graded activation (meaning words are activated to the extent that they are

perceptually similar to the target) is a feature of many models of word recognition (e.g., Luce & Pisoni, 1998; McClelland & Elman, 1986), this method more accurately represents the assumptions of the models. Phi-square density correlates with measures of neighborhood size but accounts for significant unique variance in word recognition accuracy beyond that which is explained by measures of neighborhood size (Feld & Sommers, 2011; Strand, 2013; Strand & Sommers, 2011). Phi-square density is based on the phi-square statistic, which quantifies the perceptual similarity of any phoneme pair on the basis of the pattern of responses to those phonemes in a forced choice identification task (Iverson, Bernstein, & Auer, 1998). For example, when asked to identify /b/ or /v/ in noise, listeners demonstrate similar patterns of responses (e.g., often mistaking them for each other or for /m/, but very rarely for /h/ or /z/). Therefore, /b/ and /v/ have a high phi-square value, reflecting this similarity. However, /b/ and /n/ have a low phi-square value, indicating that listeners choose different response alternatives for each (instead opting for /t/ or /d/ when presented with /n/). Phi-square values range from 0 (indicating no overlap in the response categories for two phonemes) to 1 (indicating identical patterns of responses). These values correlate highly with forced choice phoneme identification scores but avoid the limitations of raw identification scores, such as response bias (see Iverson et al., 1998, for a detailed discussion of these issues). Phi-square values were obtained from an existing data set (Strand & Sommers, 2011). To calculate the perceptual similarity of two words, the phi-square values for each phoneme of the stimulus word and each phoneme of the competitor are multiplied. For example, the phi-square similarity of vote and boat is Φ²(v|b) * Φ²(o|o) * Φ²(t|t). Words with more similar segments (e.g., vote and boat) have higher phi-square similarity values than do words with few overlapping or less similar segments (e.g., boat and node). As target words (see Appendix 1), we randomly selected 240 consonant–vowel–consonant (CVC) words (half nouns, half verbs) from a 1,575-word lexicon that included all CVC words that have entries in both the Subtitle word frequency norms (Brysbaert, New, & Keuleers, 2012) and the English Lexicon Project (ELP; an online database of 40,000 phonologically transcribed word forms; Balota et al., 2007).¹ These databases provided the phonological transcriptions of the words and the frequency with which each word occurs as a given grammatical class. The phi-square similarity values comparing each of the 240 stimulus words and every other word in the 1,575-word reference lexicon were calculated. Critically, given the evidence that words outside the neighborhood can provide competition (Strand & Sommers, 2011), all CVCs were included as competitors, regardless of whether they would be considered neighbors.

¹ Multiple instances of the same homophone (e.g., feet and feat) were replaced, as were proper nouns.

Then, each phi-square similarity value was


weighted by the competitor word's log-transformed frequency of occurrence (Brysbaert et al., 2012; see Luce & Pisoni, 1998, for a similar procedure). Therefore, the frequency-weighted phi-square similarity for vote|boat is higher than that for moat|boat, both because vote and boat are more perceptually similar than moat and boat and because vote occurs more frequently than moat. The frequency-weighted phi-square similarities of the target word and every other word in the 1,575-word lexicon were summed to arrive at phi-square density. To determine how the grammatical classes of the competitors influence recognition for each word, we also calculated the amount of each word's phi-square density that comes from within-class competitors. In this measure (grammatical density), words provide competition only if they have the same grammatical class as the target word. For example, when the grammatical density for boat is calculated, the frequency-weighted phi-square density of moat is included, because it is a noun, but the frequency-weighted phi-square density of bode is not, because it is a verb. Although this procedure is relatively straightforward for words that can serve as only one part of speech, another process is necessary for the vast majority of English words, which can occur as multiple grammatical classes. For example, the word vote occurs most frequently as a verb (in 52% of cases reported in the Subtitle Norms; Brysbaert et al., 2012) but serves as a noun in the remainder of the cases. It would be misleading to entirely exclude vote from the grammatical density of boat because it is not primarily a noun, but it is equally misleading to treat it similarly to moat, which occurs as a noun in 100% of cases. Therefore, we calculated the log-transformed frequency with which each word serves as each grammatical class (Brysbaert et al., 2012) and weighted the phi-square similarity value by that value.
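The full pipeline described above (phoneme-by-phoneme phi-square products, log-frequency weighting, and summation, with grammatical density weighting each competitor by the frequency with which it occurs as the target's class) can be sketched as follows. Every numerical value here is an invented placeholder; the real values come from the phi-square data set (Strand & Sommers, 2011) and the Subtitle norms (Brysbaert et al., 2012).

```python
import math

# Sketch of phi-square density vs. grammatical density for a target noun.
# All similarities, frequencies, and noun proportions are invented.

# phi-square similarity of phoneme pairs (0 = never confused, 1 = identical
# response patterns); symmetric, with identical phonemes fixed at 1.0
PHI2 = {("v", "b"): 0.45, ("m", "b"): 0.20, ("d", "t"): 0.35}

def phi2(p, q):
    if p == q:
        return 1.0
    return PHI2.get((p, q), PHI2.get((q, p), 0.01))

def word_similarity(stim, comp):
    # position-by-position product, e.g. phi2(v|b) * phi2(o|o) * phi2(t|t)
    return math.prod(phi2(p, q) for p, q in zip(stim, comp))

# target: boat (a noun). competitor -> (phonemes, raw frequency, P(noun))
target = ["b", "o", "t"]
competitors = {
    "vote": (["v", "o", "t"], 3200.0, 0.48),  # mostly a verb
    "moat": (["m", "o", "t"], 60.0, 1.00),    # always a noun
    "bode": (["b", "o", "d"], 30.0, 0.00),    # always a verb
}

def log_freq(f):
    return math.log10(f) if f > 0 else 0.0

# phi-square density: similarity weighted by overall log frequency, summed
phi_density = sum(
    word_similarity(target, ph) * log_freq(f)
    for ph, f, _ in competitors.values()
)

# grammatical density: weight instead by the log frequency with which the
# competitor occurs as the target's class (noun), so bode contributes
# nothing and vote counts for less than its full frequency
gram_density = sum(
    word_similarity(target, ph) * log_freq(f * p_noun)
    for ph, f, p_noun in competitors.values()
)

assert gram_density < phi_density  # necessarily smaller, as the text notes
print(round(phi_density, 2), round(gram_density, 2))
```

Note how bode drops out of grammatical density entirely while vote contributes at less than its full weight, mirroring the vote|boat example in the text.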
For example, the phi-square similarity of vote|boat is weighted by the log-transformed frequency with which vote occurs as a noun, rather than the log-transformed frequency with which it occurs overall. Therefore, grammatical density will necessarily be smaller than phi-square density, because some words (e.g., bode in this example) are excluded altogether, and others (e.g., vote in this example) are weighted by less than their full frequency. Finally, these frequency-weighted values were summed to arrive at the grammatical density. By calculating both phi-square density and grammatical density, we can determine the overall amount of competition a word encounters and the amount of competition that comes from within-class words. For example, the nouns cake and wine have relatively similar phi-square densities but differ in grammatical densities. Cake is grammatically dense, because the words that provide the most competition to it (e.g., cork and lake) tend to be nouns. Wine is relatively grammatically sparse, because much of its competition comes from words of a different grammatical class (e.g., while and fine). The goal of this investigation was to isolate the influence of grammatical density on recognition accuracy in constrained

and unconstrained contexts. However, multiple other lexical variables influence word recognition accuracy, including phi-square density, frequency, the length of the word in milliseconds, and phonotactic probability (Vitevitch & Luce, 2004). In order to remove the influence of these variables on recognition accuracy and ensure that the effects obtained were attributable to grammatical density alone, it was necessary to generate a measure that represents grammatical density with the variance attributable to the other lexical variables removed. Therefore, we conducted a simple linear regression predicting the grammatical density of both nouns and verbs from their phi-square density, frequency, length, and phonotactic probability. This process allowed us to represent the standardized grammatical density, with the influence of the other lexical variables partialled out (see Gahl, Yao, & Johnson, 2012, for a similar procedure). Because there are many more nouns than verbs in the word norms (Brysbaert et al., 2012), nouns will necessarily have higher grammatical density; standardizing the noun and verb grammatical densities separately allows us to compare nouns and verbs on the same scale.² Table 1 shows the comparison between cake and wine, which are similar on phi-square density and the other lexical variables, but the higher grammatical density value for cake leads to large differences in the standardized grammatical density values. Although measures based on the phi-square statistic are relatively new to the field, they provide several advantages over the traditional neighbor-based approach. First, they allow lexical competition to be quantified on a continuous scale, rather than categorically, meaning that the extent to which words compete depends on their perceptual similarity. Second, continuous measures provide a way to overcome the problem of words that serve as more than one part of speech.
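The residualization step can be sketched as an ordinary least-squares regression followed by z-scoring of the residuals; the item-level values below are random stand-ins, not the actual lexical measures.

```python
import numpy as np

# Sketch of the residualization step: regress grammatical density on the
# other lexical predictors (OLS) and z-score the residuals. The data are
# random stand-ins with roughly the magnitudes reported in Table 2.
rng = np.random.default_rng(0)
n = 120  # e.g., the nouns, standardized separately from the verbs

phi_density = rng.normal(42, 18, n)
frequency = rng.normal(3, 1, n)
length_ms = rng.normal(550, 70, n)
phonotactic = rng.normal(1.13, 0.04, n)
# in this sketch, grammatical density tracks phi-square density plus noise
gram_density = 0.6 * phi_density + rng.normal(0, 6, n)

# design matrix with an intercept column; least-squares fit
X = np.column_stack([np.ones(n), phi_density, frequency, length_ms, phonotactic])
beta, *_ = np.linalg.lstsq(X, gram_density, rcond=None)
residuals = gram_density - X @ beta

# z-scored residuals: the "standardized grammatical density"
std_gram_density = (residuals - residuals.mean()) / residuals.std()
```

By construction, the resulting measure has a mean of 0 and an SD of 1 within each class and is uncorrelated with the partialled-out predictors, which is what licenses comparing nouns and verbs on the same scale.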
If, instead, we were to define grammatical density as the proportion of neighbors that are of the same grammatical class, we would likely need to establish some cutoff above which words are classified as being one part of speech or another. In the example above, for instance, should vote be classified as a within-class neighbor of boat, or should it be excluded altogether? Using continuous measures of perceptual similarity and weighting them by continuous measures of the frequency with which words occur as each part of speech allows a more fine-grained approach to modeling the influence of grammatical class on lexical competition.

² When standardized grammatical density was calculated for nouns and verbs combined (rather than for both classes separately), the main findings did not differ.

Method

Participants

Forty native English speakers with self-reported normal hearing and normal or corrected-to-normal vision were recruited from

Table 1. Descriptive data for sample items to illustrate the standardization process

                                    cake      wine
Phi-square density                  30.69     30.15
Grammatical density                 22.61     16.95
Frequency                           3.36      3.49
Phonotactic probability             1.18      1.15
Length (ms)                         551       602
Standardized grammatical density    1.11      −1.75

Note. Frequency values were obtained from Brysbaert, New, and Keuleers (2012); phonotactic probability values were obtained from Vitevitch and Luce (2004).

the Carleton College community. Participants (28 female, 12 male) ranged in age from 19 to 27 years (M = 20.8, SD = 2.2). Testing took approximately 30 min, and participants were awarded $5 for their time. Carleton College's Institutional Review Board approved all of the research procedures.

Stimuli

The 240 CVC stimulus words (half nouns, half verbs) were presented as the final word of short sentences. Half of the sentences were grammatically unconstrained and allowed the target word to be either a noun or a verb (e.g., "Type the word + [target]"). The other half were grammatically constrained according to the syntactic norms of simple verb phrases and noun phrases, such that each sentence led to the exclusive expectation of a nonfinite verb or a noun (e.g., "The boy began to + [target]" for verbs and "The boy considered the + [target]" for nouns). The sentences were constructed to limit the influence of semantic constraints on the target word and were piloted to ensure that these properties held. Stimuli were recorded by a female speaker with a standard midwestern accent (using a Blue Microphones Yeti USB microphone) and equated on RMS amplitude (using Audacity, version 2.0.2). Words were recorded in isolation and then edited together with the sentences to eliminate the possibility of coarticulation in the speech preceding the target word.

Procedure

Participants were seated in a sound-attenuating chamber at a comfortable distance from an iMac computer (OS X 10.6). Stimulus presentation and participant responses were controlled with PsyScope (Version X, B57), and sentences were presented at approximately 68-dB sound pressure level via the computer's internal speakers. The final words (but not the preceding sentences) were presented in a background of multitalker babble (signal-to-noise ratio = −2 dB). The preceding sentences were presented without background noise to ensure that the participants were successfully able to understand the sentences and, therefore, form grammatical expectations. Each participant identified 60 nouns and 60 verbs in each context (i.e., unconstrained and constrained). Stimulus presentations were counterbalanced across participants so that no participant responded to the same word more than once, and the order of stimulus presentation was randomized, with all sentence types intermixed. Critically, for each target word, the same speech token was presented at the end of the grammatically unconstrained and constrained contexts, so the comparison of identification accuracy should depend only on the constraining context, and not on any idiosyncrasies of the recording. Participants were presented with the sentence stimuli and were asked to identify the final word of each sentence by typing their response on a keyboard. They were encouraged to guess even when unsure. After they entered their response, there was a 1-s intertrial interval before the next sentence began.

Prior to analysis, the responses were hand-checked for obvious entry errors. Following the procedure described in Luce and Pisoni (1998), responses that were phonologically identical to the target (e.g., guise to guys) were counted as correct, superfluous punctuation was removed (e.g., fan // to fan), and, when the entry did not represent a real word but differed from the target by one letter, the entry was corrected (e.g., guidew to guide or shein to shine). These corrections represented approximately 1% of the responses. Word recognition accuracy was then calculated for each of the target words in both grammatically unconstrained and constrained contexts. Due to experimental error, 16 words that serve as both nouns and verbs were presented in an inappropriate grammatical context. For example, the word vote was presented in a noun-constraining context, despite occurring as a verb slightly more often. These words were therefore excluded, and the analysis was conducted on the 224 remaining words. Descriptive statistics for the lexical variables for both nouns and verbs are shown in Table 2.

Table 2. Descriptive statistics for lexical variables for nouns and verbs

                            Nouns (N = 109)       Verbs (N = 115)
                            Mean       SD         Mean       SD
Frequency                   2.88       0.76       3.40       0.96
Phi-square density          41.76      17.76      45.93      17.06
Grammatical density         28.61      12.43      18.17      7.46
Phonotactic probability     1.13       0.04       1.14       0.04
Length (ms)                 551        69         557        80

Note. Frequency values were obtained from Brysbaert, New, and Keuleers (2012); phonotactic probability values were obtained from Vitevitch and Luce (2004). Because standardized grammatical density was calculated for nouns and verbs separately, both classes have a mean of 0 and an SD of 1.
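The response-cleaning rules described in the Method section (homophones counted correct, punctuation stripped, one-letter nonword entries corrected) could be implemented along these lines. The homophone map and word list below are illustrative stand-ins for the real lexicon, and a strict one-letter edit rule would not catch transpositions such as shein to shine, which presumably required hand-checking.

```python
import re

# Sketch of the response-scoring rules from the Method. The homophone map
# and word list are tiny illustrative stand-ins, not the actual lexicon.
HOMOPHONES = {"guise": {"guys"}}  # target -> acceptable homophone spellings
REAL_WORDS = {"guide", "shine", "fan", "guise", "guys"}

def one_letter_off(a, b):
    """True for a single substitution, insertion, or deletion of one letter."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def score(response, target):
    """Return the cleaned response and whether it counts as correct."""
    cleaned = re.sub(r"[^a-z]", "", response.lower())  # strip punctuation
    if cleaned not in REAL_WORDS and one_letter_off(cleaned, target):
        cleaned = target                               # e.g., guidew -> guide
    correct = cleaned == target or cleaned in HOMOPHONES.get(target, set())
    return cleaned, correct

print(score("guys", "guise"))    # homophone counted correct
print(score("fan //", "fan"))    # punctuation stripped
print(score("guidew", "guide"))  # one-letter nonword entry corrected
```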


Results and discussion

Although predicting word recognition accuracy from lexical factors is commonly done using analysis of variance (ANOVA), this method has serious limitations (Baayen, Davidson, & Bates, 2008; Dixon, 2008; Jaeger, 2008). As a result, Dixon and Jaeger advocate the use of mixed-effects logistic regression analyses, rather than ANOVAs, for accuracy data, a method that many other psycholinguistic investigations have recently adopted (Adani, 2011; Bicknell, Elman, Hare, McRae, & Kutas, 2010; Brouwer, Mitterer, & Huettig, 2012; Kantola & van Gompel, 2011; Kootstra, Van Hell, & Dijkstra, 2010). The advantages of this method are that it can simultaneously include random effects of subject and item variance and that it allows binomial dependent measures (such as accuracy) to be used directly as the input. Data analysis was conducted using R (R Development Core Team, 2008) and the R packages lme4 and languageR (see Baayen, 2008). We included items and subjects as random effects and included context (constrained or unconstrained), class (noun or verb), and standardized grammatical density as fixed effects, along with the critical context × standardized grammatical density interaction. Conditional R² for the model was calculated following the procedure described in Nakagawa and Schielzeth (2013; see Appendix 2 for model output). The analysis revealed that words were identified more accurately in constrained than in unconstrained contexts, β = 0.62, SE = 0.05, z = 12.01, p < .001. There was no significant effect of class, β = 0.33, SE = 0.21, z = 1.63, p = .10, indicating that nouns and verbs were identified at approximately similar rates. There was also no significant effect of standardized grammatical density, β = 0.21, SE = 0.13, z = 1.61, p = .10, demonstrating that standardized grammatical density did not influence recognition accuracy when collapsing across constrained and unconstrained contexts.
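In equation form, the fitted model corresponds roughly to the following mixed-effects logistic regression (our notation, not the authors'; GD denotes standardized grammatical density, with random intercepts u_i for subjects and w_j for items):

```latex
\operatorname{logit} \Pr(\mathrm{correct}_{ij}) =
  \beta_0
  + \beta_1\,\mathrm{Context}_j
  + \beta_2\,\mathrm{Class}_j
  + \beta_3\,\mathrm{GD}_j
  + \beta_4\,(\mathrm{Context}_j \times \mathrm{GD}_j)
  + u_i + w_j
```

The hypothesis of interest concerns β₄: if grammatical context changes how much within-class competition matters, the slope of GD should differ across the two context conditions.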
Of most interest to the present investigation was a significant standardized grammatical density × context interaction, β = −0.15, SE = 0.05, z = −2.88, p = .004, indicating that the influence of standardized grammatical density differed depending on whether the context was constrained or unconstrained (see Fig. 1). In unconstrained contexts, the relationship between standardized grammatical density and accuracy was positive, whereas in constrained contexts, the relationship was negative (although the correlations between standardized grammatical density and accuracy did not approach significance in either context, p > .42 for both). The results revealed that standardized grammatical density had different effects on word recognition when the context was constrained or unconstrained. When grammatical context was added, recognition of grammatically sparser words increased, relative to grammatically denser words. This provides additional evidence (e.g., Dahan et al., 2000; Magnuson et al., 2008) that grammatical context may serve to constrain the lexical alternatives that are considered candidates for lexical

Fig. 1 The influence of standardized grammatical density on word recognition accuracy in constrained and unconstrained contexts

competition. Grammatical context eliminates a larger subset of the competitors of grammatically sparse words than of grammatically dense words; therefore, grammatically sparse words benefit more from the addition of context than do grammatically dense words. However, a limitation of Experiment 1 is that the word-recognition-in-noise task depends both on online word-parsing processes and on a postprocessing decisional stage (Pisoni, 1996). The present results could thus reflect a response bias operating at the decisional stage (participants opting to report grammatically congruent responses) rather than a process that occurs in real time during lexical competition. To help address this issue, a second experiment was conducted in which response speed was also measured.

Experiment 2

Method

Participants

Twenty-six participants with self-reported normal hearing and normal or corrected-to-normal vision were recruited from the Carleton College community. Participants (19 female, 7 male) ranged in age from 18 to 21 years (M = 19.5, SD = 1.2). One participant was excluded for having a high error rate and long reaction times (RTs). Testing took approximately 30 min, and participants were awarded $5 for their time. Carleton College's Institutional Review Board approved all of the research procedures.

Procedure

Experiment 2 followed the structure of Experiment 1, but 240 nonwords were also included in addition to the 240 target words presented in Experiment 1. The nonwords were


phonotactically legal English CVCs (e.g., boke and lib). One hundred twenty of these nonwords were presented in unconstrained contexts, 60 in noun-constraining contexts, and 60 in verb-constraining contexts. All stimuli were presented in the absence of background noise at approximately 68 dB SPL. Each stimulus item was presented to each participant only once, and stimulus order was randomized, with all sentence and word types intermixed. Rather than identifying the final word of the sentence (as in Experiment 1), participants were asked to determine whether the final word of the sentence was a real English word as quickly and accurately as possible. Participants pressed a button with their right hand (using an IoLab ButtonBox) to indicate that the last word of the sentence was a real English word, and another button with their left hand to indicate that it was not a word.

Results and discussion

Word analysis

RTs for accurate identifications were obtained and were square-root transformed. Individual RTs longer than 2 s were removed (N = 38; fewer than 1% of responses). The analysis paralleled that of Experiment 1, using Markov chain Monte Carlo sampling to assess significance levels (Baayen, 2008). We included items and subjects as the random effects and included context (constrained or unconstrained), class (noun or verb), and standardized grammatical density as fixed effects, as well as the standardized grammatical density × context interaction. The analysis revealed that words were identified more quickly in constrained than in unconstrained contexts, β = −0.85, SE = 0.15, t = −5.72, p < .001. There was no significant effect of class, β = 0.52, SE = 0.33, t = −1.62, p = .11.
There was a significant effect of standardized grammatical density, β = −1.13, SE = 0.28, t = −4.04, p < .001, and a standardized grammatical density × context interaction, β = 0.57, SE = 0.15, t = 3.78, p < .001, indicating that the influence of standardized grammatical density differed depending on whether the context was constrained or unconstrained. In the unconstrained context, standardized grammatical density facilitated recognition (led to faster reaction times), r = −.21, p = .001, but the benefit disappeared in the constrained context (p = .85; see Fig. 2). The significant interaction indicates that words that were more grammatically sparse benefitted more from the addition of context than did more grammatically dense words.

Fig. 2 The influence of standardized grammatical density on lexical decision latency in constrained and unconstrained contexts

Nonword analysis

The speed with which nonwords are rejected (correctly identified as not being real words) also provides information about the processes of lexical competition; nonwords with many
neighbors are recognized more slowly than nonwords with few neighbors (Luce & Pisoni, 1998). RTs to correct lexical decisions were calculated and square-root transformed. Nonwords in grammatically constrained contexts were rejected (correctly identified as nonwords) more quickly than nonwords in unconstrained contexts, t(238) = 3.51, p = .001, Cohen’s d = 0.44. To assess whether nonword recognition is influenced by grammatical density as well (independently of any other lexical factors that may influence recognition), we also calculated values for lexical competition and grammatical density for the nonwords that appeared in grammatically constraining contexts. We calculated phi-square density values for each of the nonword stimuli by comparing them with each of the 1,575 words in the reference lexicon, following the same procedure that was used for words. For example, fid was compared with every other word in the lexicon (e.g., did|fid, boat|fid, face|fid, NNN|fid), and these values were summed. Although nonwords have no grammatical class of their own, we treated the nonwords in noun-constraining contexts as nouns and the nonwords in verb-constraining contexts as verbs for the purpose of calculating grammatical density. Therefore, in the sentence “He thought about the fid,” we weighted the phi-square density of each competitor by the frequency with which it occurs as a noun. In the example above, the phi-square value of did would not be included in the grammatical density of fid, because it cannot occur as a noun, but the phi-square values of fad and face would be. Different lists of nonwords were used in the constrained and unconstrained contexts, and given that it is not possible to calculate the grammatical density of nonwords in the absence of constraining context, the 120 nonwords in the unconstrained context have no values for grammatical density.
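The class-weighted density measure described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the phi-square similarity values and the class-frequency proportions are invented inputs, and the function and variable names are ours.

```python
# Minimal sketch (assumed, not the authors' code): grammatical density as the
# sum of phi-square similarity values, with each competitor weighted by the
# proportion of its occurrences in the contextually appropriate class.

def grammatical_density(stimulus, lexicon, phi_square, class_proportions, target_class):
    """Sum phi-square similarity over competitors, weighting each competitor
    by how often it occurs as a member of target_class."""
    density = 0.0
    for word in lexicon:
        if word == stimulus:
            continue  # a stimulus does not compete with itself
        similarity = phi_square[(word, stimulus)]          # e.g., phi2(did|fid)
        weight = class_proportions[word].get(target_class, 0.0)
        density += similarity * weight
    return density

# Toy example for the nonword "fid" in a noun-constraining context.
# All numeric values below are invented for illustration.
lexicon = ["did", "fad", "face"]
phi_square = {("did", "fid"): 0.6, ("fad", "fid"): 0.5, ("face", "fid"): 0.2}
class_proportions = {"did": {"verb": 1.0},                 # never a noun: excluded
                     "fad": {"noun": 1.0},
                     "face": {"noun": 0.9, "verb": 0.1}}
print(round(grammatical_density("fid", lexicon, phi_square, class_proportions, "noun"), 2))  # prints 0.68
```

As in the text, did contributes nothing to the noun-weighted density of fid, whereas fad and face contribute in proportion to their noun usage.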
The analysis described below therefore reflects only the nonwords that appeared in grammatically constraining contexts. To examine the influence of grammatical density separately from that of neighborhood density, we calculated a linear regression to generate residuals for the standardized grammatical


density of the nonwords with the influence of phi-square density removed, following the procedure used above for words. This analysis was done separately for nonwords in verb-constraining contexts and noun-constraining contexts. This standardized grammatical density was used as the criterion variable in subsequent analyses (following the procedure for the word analyses). Nonwords in noun-constraining contexts and verb-constraining contexts were identified equally quickly, t(118) = 0.19, p = .85, Cohen’s d = 0.03. Unlike the word analysis, each nonword was identified in only one context (constrained or unconstrained). Therefore, a correlation was conducted to assess the relationship between standardized grammatical density and RT, rather than using the mixed-effects model. In the grammatically constrained context, standardized grammatical density and RT were significantly correlated: Nonwords with low standardized grammatical density were identified more quickly than nonwords with high standardized grammatical density, r = .29, p = .002 (see Fig. 3). The correlation between standardized grammatical density and RT was significant both for nonwords in noun-constraining contexts, r = .27, p = .04, and for nonwords in verb-constraining contexts, r = .31, p = .02. This suggests that, in grammatically constraining contexts, nonwords that are perceptually similar to many words of the appropriate grammatical class take longer to eliminate from consideration than nonwords that do not sound like grammatically appropriate words.
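The residualization step (removing the influence of phi-square density before standardizing) amounts to an ordinary least-squares regression followed by z-scoring the residuals. The sketch below is illustrative only; the data values are invented.

```python
# Minimal sketch (assumed): regress grammatical density on phi-square density
# and keep the standardized residuals, i.e., the portion of grammatical
# density not explained by overall neighborhood competition.
import numpy as np

def standardized_residuals(grammatical_density, phi_square_density):
    x = np.asarray(phi_square_density, dtype=float)
    y = np.asarray(grammatical_density, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)    # fit y = intercept + slope * x
    residuals = y - (intercept + slope * x)
    return residuals / residuals.std(ddof=1)  # z-score the residuals

# Illustrative values only:
z = standardized_residuals([0.9, 0.4, 0.7, 0.2, 0.5], [1.2, 0.8, 1.0, 0.5, 0.9])
print(z.round(2))
```

By construction, the resulting predictor has mean zero and unit variance and is uncorrelated with phi-square density in the fitted sample.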

Fig. 3 The influence of standardized grammatical density on lexical decision latency for nonwords in grammatically constraining contexts

General discussion

Thus far, models of word recognition have not incorporated the effects of grammatical context on lexical competition

during spoken word recognition (but see Mirman et al., 2010). However, our results suggest that grammatical context may limit the lexical activation to grammatically appropriate competitors of the stimulus word. As a sentence unfolds, it provides expectations about grammatical context and reduces or eliminates activation for competitors that do not satisfy the grammatical requirements for a particular target word (e.g., a verb competitor for a noun target). Although removing competitors will benefit all words to some extent, it preferentially benefits grammatically sparse words, because a larger proportion of their competitors are eliminated by the context. The word analyses from Experiments 1 and 2 demonstrate that the influence of standardized grammatical density depends on whether grammatical context is present. Although there is a moderate benefit of standardized grammatical density in unconstrained contexts, that benefit disappears in grammatically constrained contexts, at which point standardized grammatical density impairs recognition. Therefore, responses to grammatically sparse words were facilitated to a greater degree than responses to grammatically dense words. The nonword analysis from Experiment 2 provides further evidence that context limits the set of words that compete. Lexical decisions to nonwords are assumed to occur when the lexical search space has been exhausted and no matching candidate is found (Forster & Bednall, 1976). If grammatical context eliminates incompatible candidates, it reduces the necessary search space. Thus, when presented with a nonword serving as a noun that is perceptually similar to only a few other nouns, listeners search the lexical candidates and more rapidly reject the stimulus as a word. These results provide further support for previous findings (Dahan et al., 2000; Magnuson et al., 2008) that grammatical context affects lexical activation by reducing activity for competitors that are contextually inappropriate.
Although current models of isolated spoken word recognition (i.e., PARSYN, NAM, TRACE) do not include a mechanism to account for the influence of grammatical class in lexical competition, small modifications to the architectures of the existing models could explain the present findings. The TRACE model of word recognition (McClelland & Elman, 1986; Strauss et al., 2007) is a connectionist model that includes three levels of processing: a feature layer, a phoneme layer, and a word layer.3 As the speech stimulus unfolds, activation spreads up from the feature layer through the phoneme layer to the word layer. Additionally, there are feedback connections from the word layer to the phoneme layer and lateral inhibition within layers. The lateral inhibition accounts for lexical competition effects (lexical nodes within the word layer inhibit one another), and the feedback connections between word and phoneme layers account for top-down lexical processes, such as the phoneme restoration effect (Warren, 1970). To account for the present findings, we propose adding a grammatical class level to the architecture of the model, above the word level. A syntactic or grammatical class level of processing in spoken word recognition would parallel the syntactic frame of word production models (Dell, 1986; Levelt, 1999), as well as sentence-processing models supported by ERP data (Hagoort, 2003). In the production models, specification of grammatical class occurs separately from lexical selection. These syntactic frames account for the findings that word substitution errors usually occur within grammatical class (Fay & Cutler, 1977); these errors represent successful syntactic frames but failures in phonological or lexical selection. The proposed addition of a grammatical class level closely parallels the principles of a computational model of parsing, the unification space model (Vosse & Kempen, 2000). In this model, when a word in the mental lexicon is retrieved, it activates a lexical frame, which is a basic syntactic structure that specifies the appropriate grammatical contexts for an activated word. As words continue to be processed, the lexical frames are bound together to form a sentence (see Hagoort, 2003, for ERP support for this model). Given the emphasis of the present research on the processes of lexical competition, we frame our discussion in light of models that explicitly include mechanisms to account for lexical competition, but acknowledge that there are many parallels between this class of models and others (e.g., Friederici, 2002; Hagoort, 2005). The proposed grammatical class level contains bidirectional excitatory connections with the word level. When a listener hears an utterance, the preceding grammatical context increases activation for the contextually appropriate grammatical class node.

3 TRACE is not unique in its ability to be modified to account for these findings; it is presented here because of its established place in the field.
Activation from the grammatical class node spreads down to grammatically appropriate word nodes, raising the activation level for those nodes. As acoustic-phonetic input feeds up through the feature and phoneme levels and begins to activate the word nodes as well, activation is highest for word nodes that are both grammatically and phonologically appropriate. Because words that are grammatically incongruent are less activated than those that are grammatically congruent, they inhibit the target word less. Therefore, when presented in grammatical context, grammatically incongruent competitors provide less competition for the stimulus word. Thus, adding context selectively benefits words with less within-class competition. The present research is not able to assess the time course over which the grammatical class activation accrues (relative to the lexical nodes), but future work should seek to address this. The grammatical class level of analysis also accounts for the nonword analysis in Experiment 2. Because nonwords lack entries in the lexicon, a nonword is correctly rejected when the system realizes that no lexical entry is consistent

with the phonological input. Given that grammatically constraining context reduces the number of active lexical candidates, those words can be examined and rejected as matching the input more quickly than when many candidate competitors are present. The grammatical class level of analysis can also account for the finding that standardized grammatical density provides small benefits for RT in the grammatically unconstrained context. In a grammatically unconstrained context, no top-down information will initially be present, since the sentence is not preferentially activating any specific grammatical class node. As bottom-up speech input unfolds, activation spreads from the feature level to the phoneme level to the word level. As multiple words of the same grammatical class are activated, that activation accumulates in the appropriate grammatical class node. Therefore, perceiving a grammatically dense stimulus word will cause heightened activation in the appropriate grammatical class node, whereas the grammatical class level activation of a grammatically sparse word will be more dispersed across class nodes. Assuming excitatory feedback connections from the grammatical class level to the word level, higher levels of activity in the appropriate grammatical class node will provide additional activation for the stimulus word. This additional activation could be sufficient to raise the activation level of the stimulus word enough to push it over the recognition threshold. For grammatically sparse words, activity in the grammatical class level will be more distributed across class nodes, and therefore the stimulus word will receive less activation from the grammatical class level. In real-world speech processing, listeners undoubtedly make use of many top-down, contextual cues to process spoken language, in addition to the bottom-up acoustic-phonetic information.
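The proposed mechanism, in which context pre-activates a grammatical class node that in turn excites class-consistent word nodes, can be illustrated with a minimal interactive-activation sketch. The update rule and all parameter values below are arbitrary placeholders of our own, not fitted TRACE parameters.

```python
# Minimal sketch (assumed parameters): a noun-constraining context activates
# the "noun" class node, which feeds excitation down to noun word nodes,
# while word nodes laterally inhibit one another.
import numpy as np

words = ["cat", "sat", "cap"]            # target plus two phonological competitors
word_class = {"cat": "noun", "sat": "verb", "cap": "noun"}
bottom_up = np.array([1.0, 0.8, 0.8])    # acoustic-phonetic match to the input "cat"

def run(context_class=None, steps=15, class_gain=0.3, inhibition=0.2, decay=0.1):
    act = np.zeros(len(words))
    for _ in range(steps):
        # Top-down excitation from the contextually appropriate class node.
        top_down = np.array([class_gain if word_class[w] == context_class else 0.0
                             for w in words])
        # Lateral inhibition: each node is suppressed by its competitors' activity.
        lateral = inhibition * (act.sum() - act)
        act = np.clip(act + 0.05 * (bottom_up + top_down - lateral - decay * act), 0.0, 1.0)
    return dict(zip(words, act.round(3)))

print(run())          # no context: "sat" competes freely with "cat"
print(run("noun"))    # noun context: the verb competitor is disadvantaged
```

In the constrained run, the grammatically incongruent competitor sat ends up less active and so exerts less inhibition on the target, which is the pattern the word analyses suggest.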
Incorporating the influence of top-down cues into models of word recognition may therefore improve their predictive power. This proposed change to models of word recognition is consistent with research suggesting early influences of top-down grammatical information occurring simultaneously with bottom-up acoustic-phonetic activation (Dahan et al., 2000; Magnuson et al., 2008). A grammatical class level allows the model to represent the effects of grammatical context on the processes of lexical competition and word recognition by creating a set of candidates that are both grammatically and phonologically appropriate. Future research should address how other types of cues, including semantic or contextual cues, may also constrain which lexical candidates are activated for competition.

Acknowledgments This work was supported in part by a grant from the Howard Hughes Medical Institute to the Carleton College Interdisciplinary Science and Math Initiative (Grant 52006286). We are grateful to Sarah Meerts, Julie Neiworth, and three anonymous reviewers for helpful comments on an earlier draft. Portions of this research were presented at the Association for Psychological Science Annual Meeting, May 2013.


Appendix

Table 3 Stimuli used in Experiments 1 and 2

Nouns: badge, bag, ban, batch, beach, beef, bell, bike, boss, cake, calf, cash, cause, chain, chase, chum, coach, cub, dash, date, dice, dot, duck, fad, faith, fang, fees, feet, fig, folk, food, foot, fuss, game, germ, goal, gong, goose, gown, guide, guys, haze, hearse, hedge, hoof, hoop, house, hub, jade, jaws, jib, joys, juice, keys, knife, latch, leaf, lid, lodge, loop, luck, lull, lung, mace, mice, mob, mood, moss, muck, mud, mug, myth, name, neck, nick, pawn, peg, pep, pod, pub, robe, room, rug, sauce, shade, shape, sheen, shin, shirt, shoes, song, soup, suit, sun, surf, tag, thing, tour, town, toys, van, vase, vine, wage, wife, wig, wine, year, yen

Verbs: bake, beg, bet, bide, boil, bought, budge, burn, call, catch, caught, cease, check, cheer, chose, cook, cope, dare, deem, dig, dine, dug, fade, fail, fall, fed, feed, feel, fell, fight, fill, fit, fought, gain, gave, give, got, guess, had, has, heed, hid, hide, hope, jog, join, keep, knock, laugh, lead, learn, leave, led, let, lied, lit, look, lose, lurk, made, make, met, mock, move, need, paid, pave, pull, push, put, ran, rang, reach, read, reek, rid, rowed, run, rung, sang, sat, save, seek, seep, serve, shed, shine, shook, shoot, shout, shows, shut, sing, sit, soak, take, talk, taught, teach, tell, tied, took, tuck, turn, use, veer, walk, wash, watch, weave, wed, win, wish, woke, wrote

Nonwords used in Experiment 2 (phonemic transcription): bɛm, bɛs, biv, bok, bɔn, bug, bul, dʒɛr, dʒɝz, dʒig, dʒɪs, dʒoɪg, dʒoɪt, fæl, fæt∫, fæθ, faɪd, fet∫, fɛg, fɛn, zil, fɪk, fod, fɔd, fɔz, fuf, fʊs, fʌb, fʌl, fʌm, fʌt, fʌθ, gæd, gaʊd, ged, geθ, gɛd, gɛk, gɛn, gɝn, gid, gɪŋ, gɔm, gut, guv, hæb, haʊt, hɑdʒ, hem, hɛz, him, hos, hun, hʌd, hʌp, hʌz, kaɪb, kaɪd, kaɪn, kaʊf, ke∫, keb, kɛp, kɛz, kim, kit, kob, kɔs, kʊp, ledʒ, lɛk, lɝd, lit, liθ, lɪ∫, lɪb, lɪg, lɪn, lɪθ, lot, luf, lʌd, lʌs, lʌt, mɑt∫, meb, mɝf, mɝt, miv, mok, mos, mɔk, mɔt, mʌb, mʌn, mʌθ, næf, paɪt, paʊd, paʊn, pɑv, pɛd, pɛdʒ, pɛl, pɝn, pɪd, pʊd, pʊn, pʌŋ, ræf, raʊd, raʊs, rɛm, rɛt, rib, sɛp, sɝd, sɝk, sɝm, sig, sɪf, sos, suf, ∫æn, ∫et, ∫ɛn, ∫ɛt, ∫ɝn, ∫ɪd, ∫op, ∫oθ, ∫un, ∫uv, ted, tes, tɝt, tɝt∫, tɝz, tɪd, tɪg, tɪv, toɪk, tɔz, tup, tʊb, tʊt, tʌp, tʌz, θek, θit, θɪg, wɛg, wɛk, wɝn, wɝt∫, wim, wo∫, won, wɔg, yer, yoz, zaɪd, zaɪz, zaʊt∫, zed, zɪr, zol, zɔt, zʌŋ

Table 4 GLMM model output, Experiment 1

Fixed effects | β | SE(β) | z-value | p
(Intercept) | −1.25 | 0.35 | −3.61 |
