JSLHR

Research Article

Articulatory and Acoustic Correlates of Contrastive Focus in Congenitally Blind Adults and Sighted Adults

Lucie Ménard (a), Annie Leclerc (a), and Mark Tiede (b)

Purpose: The role of vision in speech representation was investigated in congenitally blind speakers and sighted speakers by studying the correlates of contrastive focus, a prosodic condition in which phonemic contrasts are enhanced. It has been reported that the lips (visible articulators) are less involved in implementing the rounding feature for blind speakers. If the weight of visible gestures in speech representation is reduced in blind speakers, they should show different strategies to mark focus-induced prominence. Method: Nine congenitally blind French speakers and 9 sighted French speakers were recorded while uttering sentences in neutral and contrastive focus conditions. Internal lip area, upper lip protrusion, and acoustic values (formants, fundamental frequency, duration, and intensity) were measured.

Results: In the acoustic domain, both groups signaled focus by using comparable values of fundamental frequency, intensity, and duration. Formant values in sighted speakers were more affected by the prosodic condition. In the articulatory domain, sighted speakers significantly altered lip geometry in the contrastive focus condition compared with the neutral condition, whereas blind speakers did not. Conclusion: These results suggest that implementation of prosodic focus is affected by congenital visual deprivation. The authors discuss how these findings can be interpreted in the framework of the perception-for-action-control theory.

(a) Center for Research on Brain, Language, and Music, Université du Québec à Montréal, Montreal, Quebec, Canada
(b) Haskins Laboratories, New Haven, CT

Correspondence to Lucie Ménard: [email protected]
Editor: Jody Kreiman
Associate Editor: Anne Smith

Received December 11, 2012
Revision received June 28, 2013
Accepted October 29, 2013
DOI: 10.1044/2014_JSLHR-S-12-0395

Disclosure: The authors have declared that no competing interests existed at the time of publication.

Key Words: vowels, sensory deprivation, speech production, speech perception

Journal of Speech, Language, and Hearing Research • Vol. 57 • 793–804 • June 2014 • © American Speech-Language-Hearing Association

Previous studies have suggested that congenital visual deprivation affects speech production. In blind babies, for example, patterns of early babbling differ from those of sighted babies (Lewis, 1975). Developmental trajectories in language acquisition also differ between blind and sighted children. We recently conducted experiments aimed at investigating acoustic and articulatory differences in speech produced by congenitally blind speakers and sighted speakers (Leclerc, 2007; Ménard, Cathiard, Dupont, & Tiede, 2013; Ménard, Dupont, Baum, & Aubin, 2009; Ménard, Toupin, et al., 2013). The experiments revealed that despite having better auditory discrimination abilities, blind speakers had reduced values of vowel contrast distances in the acoustic space compared with their sighted peers. Furthermore, at the articulatory level, we showed that the range of upper lip protrusion between rounded and unrounded French vowels was reduced in blind speakers compared with sighted speakers. In contrast to this, tongue position and shape varied to a greater extent in blind participants than in sighted participants. Those results led us to propose that vision regulates, to some extent, the phonetic implementation of phonological features. To describe our conclusions, we used the perception-for-action-control theory (PACT) framework, which has been presented by Schwartz, Basirat, Ménard, and Sato (2012), in which speech goals correspond to multisensory perceptuo-motor units. In the course of speech development, perception and action are tightly linked, and speech perception involves procedural knowledge of speech production mechanisms. Furthermore, perceptual mechanisms provide speech gestures with auditory, visual, and somatosensory templates that guide and maintain their development. Congenitally blind speakers have developed articulatory strategies based on nonvisual sensory templates and, thus, are less likely than sighted speakers to associate (visible) lip gestures with these templates.

To explore the link between multimodal feedback and speech production, we conducted an experiment using prosodic contrasts as a means of manipulating distinctiveness. It has been reported that in accented constituents, phonemic contrasts are enhanced, thus increasing the constituent’s saliency for the listener (Cho, 2005; de Jong, 1995; Hay, Sato, Coren, Moran, & Diehl, 2006). In such prosodically strong conditions, speakers often hyperarticulate the canonical speech features associated with vowels and consonants (Cho, Lee, & Kim, 2011; de Jong, 1995). As suggested by Hay et al. (2006), the different strategies used to increase perceptual saliency can be explored by investigating distinctiveness-enhancing contexts such as focus. Those strategies would depend on the phonological contrasts of a language and also on the different weights given to possible phonetic realizations of a contrast. We hypothesized that if the weighting given to gestures in French with an associated visual component (such as lip protrusion) was greater in sighted speakers than in congenitally blind speakers, then those gestures could be used in different ways to signal focus in these two speaker groups. To investigate this, we conducted a study of the articulatory (lip geometry) and acoustic correlates of vowels produced in a neutral condition and under contrastive focus, in congenitally blind and sighted French speakers.

Contrastive Focus

Variation in prosodic structure is a source of variability in produced speech (Cho et al., 2011). In discourse organization, important information is signaled by different (e.g., syntactic or lexical) means, including prosodic prominence. This prominence is referred to as focus. A unit bearing broad focus in a sentence is the unit introducing new information into the discourse, without any intended emphasis. Narrow focus, however, affects the unit put forward by the speaker in contrast to other units (e.g., the word apple in the sentence “No, I ate the apple” in reply to the question “Did you eat the orange?”). Contrastive focus is a type of narrow focus defined as involving the selection by the speaker of a given constituent of the message to be emphasized as opposed to another constituent in a paradigmatic comparison (Bartels & Kingston, 1994; Dahan & Bernard, 1996; Di Cristo, 2000; Ladd, 1996; Pierrehumbert & Hirschberg, 1990; Selkirk, 1984; Touati, 1987). Sometimes referred to as focal accent, contrastive emphasis, contrastive stress, or contrastive focus, this type of focus indicates a contrast between the emphasized unit and others. In the current article, we deal with contrastive focus in French conveyed by prosody.

There are multiple correlates of prosodic focus. At the intonational level, focused segments are produced with a large and sharp rise in the fundamental frequency (F0) and/or intensity of the focused constituent and an increased duration of the focused syllable (e.g., Clech-Darbon, Rebuschi, & Rialland, 1999; de Jong & Zawaydeh, 2002; Delais-Roussarie, Rialland, Doetjes, & Marandin, 2002; Di Cristo, 1998; Dohen & Lœvenbruck, 2004; Dohen, Lœvenbruck, Cathiard, & Schwartz, 2004; Heldner & Strangert, 2001; Jun & Fougeron, 2000; Rossi, 1999; Smiljanic, 2004; Touati, 1987). Although the terminology used may differ, a number of articulatory studies in several languages have examined what we consider to be contrastive focus. Studies carried out in English have shown that this prosodic prominence is associated with larger, faster, and longer lip opening gestures and/or jaw movements (Beckman & Edwards, 1994; Cho, 2005, 2006; de Jong, 1995; Erickson, 1998, 2002; Harrington, Fletcher, & Beckman, 2000; Lœvenbruck, 1999). These articulatory correlates are often related to spectral changes, with low vowels being more peripheral in the acoustic space (Cho, 2005) and vowel space being expanded in focused positions (Cho et al., 2011; Hay et al., 2006; van Bergem, 1993). To explain the effects of contrastive focus on the kinematic patterns of the supraglottal articulators, it has been proposed that the lowering of the jaw and the larger lip opening would increase the sonority feature (Beckman, Edwards, & Fletcher, 1992; Erickson, 2002; Harrington et al., 2000), whereas the tongue dorsum would be controlled independently to enhance place features (de Jong, 1995; Cho et al., 2011; Cho & McQueen, 2005). Dohen, Lœvenbruck, and Hill (2009), in a description of lip geometry involved in the focus versus neutral distinctions in French, reported a global increase of lip protrusion and lip opening in focused constituents. It is not clear, however, which segments (rounded or unrounded, high or low vowels) are most affected. This pattern is referred to as lip feature enhancement; such maneuvers aim to enhance sonority by making the syllable more salient, and thus more intelligible, for the perceiver.

French Vowel System: Auditory and Visual Correlates

French oral vowels contrast along three phonological dimensions: height, place of articulation, and rounding. The vowel space is delimited by the four vowels /i/, /y/, /u/, and /a/. The vowels /i/, /y/, and /u/ are high vowels, whereas the vowel /a/ is a low vowel. The three high vowels further contrast along place of articulation (/i/ and /y/ are front, /u/ is back) and rounding (/i/ is unrounded and /y/ is rounded). In traditional phonetic description (Ladefoged, 1982; Vaissière, 2007), changes in height degree require the tongue dorsum to be farther from the palatal vault. This is achieved, partly, by jaw lowering. Contrasts between places of articulation mainly involve tongue position, the tongue being more backward in back vowels than in front vowels. As for lip rounding, compared with their unrounded counterparts, rounded vowels are generally produced with a reduced lip opening and more protruded lips. However, trading relationships between various articulators can act to modify vocal tract geometry (Perkell, Matthies, Svirsky, & Jordan, 1993). As demonstrated by simulations with an articulatory-to-acoustic model of the vocal tract (Maeda, 1979; Ménard, Schwartz, & Boë, 2004), across speakers, the rounding contrast might involve smaller movements of the lips but some displacement of the tongue.

In acoustic terms, changes in height degree (when going from /i/ to /a/, for instance) are mainly related to the first formant frequency (F1) dimension, and changes in place of articulation mainly affect F2. Lip rounding involves a decrease in F2, and sometimes F3 (as is the case for some vocal tract configurations for the /i/ vs. /y/ pair; see Schwartz, Beautemps, Abry, & Escudier, 1993).

In the visual channel, some of the articulatory maneuvers described above can be seen by the addressee and, thus, have visual consequences (Scarborough, Keating, Mattys, Cho, & Alwan, 2009; Munhall, Jones, Callan, Kuratate, & Vatikiotis-Bateson, 2004). Abry et al. (1980) and Abry and Boë (1986) have extensively studied lip geometry in French vowels and have shown the relevance of interlabial width, interlabial height, interlabial area, and lip protrusion. Using audiovisual perceptual tests, Cathiard (1994) and Cathiard et al. (1992) demonstrated that interlabial area and upper lip protrusion are related to the perception of rounded and unrounded vowels. Lip protrusion would be particularly relevant in the perceptual distinction between /i/ and /y/ (Cathiard, 1994). Those geometrical features are involved in prosodic focus, at both the production and perception levels. For instance, Dohen and Lœvenbruck (2009) presented participants with utterances produced in narrow focus (subject, verb, or object) in three conditions (audio only, visual only, or audiovisual). Perceivers could correctly identify focus more often in the audiovisual condition than in the audio-only or visual-only conditions. Significant correlations were found between perceptual scores, on the one hand, and interlabial area and jaw lowering, on the other (Dohen et al., 2004).

In the current study, we investigated the effect of congenital visual deprivation on phonemic representations of French vowels, through manipulations of prosodic context. It has been proposed that the articulatory and acoustic correlates of focus vary depending on the features’ weights in the phonemic representations (Hay et al., 2006).
Therefore, differences between blind and sighted speakers in marking prosodic focus-induced prominence might be revealed in speech target differences between these speaker groups. Since we previously showed that blind speakers produce smaller lip protrusion contrasts than their sighted peers (Ménard et al., 2009; Ménard, Toupin, et al., 2013), which we suggested may reflect the reduced weight attributed to this visible gesture in blind speakers, we hypothesized that blind and sighted speakers would differ in the lip protrusion and lip area strategies they used to implement prosodic focus. In the focus context, compared with the neutral context, all vowels are expected to be spoken with larger lip areas and upper lip protrusion, according to Dohen et al. (2009). The difference between focus and neutral conditions would be reduced for blind participants compared with sighted participants. To investigate other potential effects of visual deprivation on prosody, we aimed to also examine traditional acoustic correlates (F0, intensity, duration).

Method

Procedure

The experiment was approved by the Université du Québec à Montréal Institutional Review Board. We recruited a subset of nine sighted participants (six men and three women) and nine congenitally blind participants (six men and three women) from our previous study (Ménard et al., 2009). All participants were native speakers of Canadian French who lived in the Montreal area. The blind participants had congenital, complete visual impairment, classified as Class 3, 4, or 5 in the International Disease Classification of the World Health Organization, and they had never had any perception of light or movement. They ranged in age from 40 to 52 years (M = 44 years). They had no self-reported language disorders or motor deficits. Table 1 presents pertinent characteristics of the blind participants. Nine sighted adult participants were also recruited and formed the control group. They had either perfect vision (20/20) or impaired vision that was near-perfect when corrected by lenses. The control subjects ranged in age from 22 to 39 years (M = 33 years).¹ All participants passed a 20-dB HL pure-tone screening procedure at 500, 1000, 2000, 4000, and 8000 Hz.

The corpus consisted of the four French vowels /i y u a/ embedded in CVC syllables where C was one of the following consonants: /b d g/. Those three consonants were chosen because they offer various degrees of lingual coarticulatory resistance (Manuel & Stevens, 1995; Noiray, Ménard, & Iskarous, 2013). Labial consonants do not involve the tongue and thus are maximally coarticulated with the vowels that follow, whereas alveolar and velar consonants involve the tongue dorsum and/or the tongue body and thus are produced with a lesser degree of coarticulation with the upcoming vowel (Krull, 1987; Nearey & Shammass, 1987). Because the prosodic variation under study involves hyperarticulation related to reduced coarticulation, the three consonants were chosen to elicit various degrees of coarticulation.

The resulting syllables were embedded in carrier sentences of the type “Le mot CVC me plaît” (“I like the word CVC”). Ten repetitions of each sentence were obtained in random order in each of the following prosodic conditions. First, sentences were elicited in a neutral manner, without any particular accent, as if the subjects were speaking to a friend (neutral condition; e.g., “Le mot bab me plaît” [“I like the word bab”]). After each sentence, the experimenter repeated the sentence in an interrogative form, but replaced the target syllable with a different one (e.g., “Le mot did te plaît?”). The participant was instructed to repeat the sentence with the original target syllable, as if he or she was correcting the experimenter’s error (e.g., “Le mot BAB me plaît”). This last condition was called the focus condition (or contrastive focus condition). Ten repetitions of each sentence, in each of the two prosodic conditions, were performed. Items were randomized across vowels and consonants. As judged by three trained phoneticians, the prosodic contrasts produced by the participants were equally salient.

¹ Despite the fact that sighted participants were, on average, 10 years younger than blind participants, they had comparable results on the audiologic screening tests. Thus, the group differences reported here cannot be related to age differences at the auditory level.
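As an illustration only, the elicitation design described in the Procedure (four vowels, three symmetric consonant contexts, 10 repetitions, each neutral production followed by an experimenter prompt and a corrective focus production) could be sketched in Python as follows. The function name `build_trials`, the random choice of the experimenter's replacement syllable, and the seed are assumptions made for this sketch; the article does not state how the replacement syllable was selected or how the randomization was implemented.

```python
import itertools
import random

VOWELS = ["i", "y", "u", "a"]
CONSONANTS = ["b", "d", "g"]
REPETITIONS = 10


def build_trials(seed=0):
    """Return a shuffled list of elicitation trials for one speaker.

    Each trial holds the neutral sentence, the experimenter's prompt
    (with a different syllable), and the corrective focus sentence.
    """
    rng = random.Random(seed)
    # Symmetric CVC syllables: bib, byb, bub, bab, did, ...
    syllables = [c + v + c for c, v in itertools.product(CONSONANTS, VOWELS)]
    trials = []
    for syllable in syllables:
        for _ in range(REPETITIONS):
            # Assumption: the replacement syllable is drawn at random.
            foil = rng.choice([s for s in syllables if s != syllable])
            trials.append({
                "target": syllable,
                "neutral": f"Le mot {syllable} me plaît.",
                "prompt": f"Le mot {foil} te plaît?",
                # Capitals mark the focused syllable, as in the article's example.
                "focus": f"Le mot {syllable.upper()} me plaît.",
            })
    rng.shuffle(trials)  # items randomized across vowels and consonants
    return trials


trials = build_trials()
```

Under these assumptions, each speaker contributes 12 syllables × 10 repetitions = 120 trials, that is, 120 neutral and 120 focus tokens.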

Ménard et al.: Contrastive Focus in Blind and Sighted Speakers


Table 1. Characteristics of the nine blind speakers.

Subject  Gender  Age  Etiology of blindness                        Vision at birth  Current vision
S1       F       48   Retinitis pigmentosa                         U                RE = 3/210, LE = 0
S2       F       40   Congenital cataract                          U                RE = 0, LE = 6/1260
S3       M       52   Optic atrophy                                Total blindness  RE = 0, LE = 0
S4       M       40   Detachment of the retina                     U                RE = 2/180, LE = 2/105
S5       M       42   Congenital cataract and congenital glaucoma  U                U (total blindness)
S6       M       52   Congenital cataract                          Total blindness  RE = 3/180, LE = 2/180
S7       F       45   Congenital cataract                          Total blindness  U (total blindness)
S8       F       52   Retinitis pigmentosa                         U                U (total blindness)
S9       M       42   Retinitis pigmentosa                         Total blindness  U (total blindness)

Note. F = female; M = male; LE = left eye; RE = right eye; U = undetermined.

The participants were recorded using the Labiomètre Indépendant Programmable Temps Réel Avec ChromaKey (Liptrack) system, a program originally developed at the Institut de la Communication Parlée (now the Grenoble Images Parole Signal et Automatique laboratory; Lallouache, 1991), synchronized with a high-quality microphone. They were seated comfortably in a quiet room, with their heads immobilized by a helmet. Their lips were painted with blue make-up, in accordance with a detection method originally developed at the Grenoble Images Parole Signal et Automatique laboratory (Lallouache, 1991). Frontal and lateral views of the lips were obtained using a 45-degree mirror mounted to the helmet. The acoustic signal was transduced with a Shure SM-86 microphone and preamplified before being recorded using a Canopus ADVC-100 digital audio/video converter at a 48,000-Hz sampling rate.

Data Analysis

Images were digitized at the National Television System Committee (NTSC) standard 29.97-Hz frame rate, in which one video image is composed of two interleaved fields. By extracting these fields and interpolating, a 60-Hz frame rate was obtained. Lip geometry was analyzed by looking at two components (Abry et al., 1980; Noiray, Cathiard, Ménard, & Abry, 2011): interlabial lip area (hereafter referred to as lip area) and upper lip protrusion. Those components have proved suitable for phonetic analyses of French vowels, as described earlier. For each vowel, the lip area and the protrusion of the upper lip relative to a consistent reference line were tracked on each image between the vowel acoustic onset and offset, using a supervised MATLAB procedure developed in our laboratory (see Figure 1). Only the vowel embedded in the target syllable CVC was analyzed. Lip area and lip protrusion values were extracted for each vowel at the vowel midpoint.
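The field extraction and midpoint sampling described above can be illustrated with a short Python sketch. This is not the authors' supervised MATLAB procedure: the function names, the representation of a frame as a list of pixel rows, and the assumption that fields are evenly spaced in time at twice the 29.97-Hz frame rate (59.94 Hz, close to the nominal 60 Hz) are all choices made for this example. The interpolation step that restores full vertical resolution is omitted.

```python
def split_fields(frame):
    """Split one interlaced frame (a list of pixel rows) into its two fields."""
    first_field = frame[0::2]   # rows 0, 2, 4, ...
    second_field = frame[1::2]  # rows 1, 3, 5, ...
    return first_field, second_field


def midpoint_field_index(onset_s, offset_s, field_rate_hz=2 * 29.97):
    """Return the index of the field closest in time to the vowel midpoint.

    onset_s and offset_s are the acoustic vowel boundaries in seconds,
    measured from the start of the recording.
    """
    midpoint_s = (onset_s + offset_s) / 2.0
    return round(midpoint_s * field_rate_hz)


# Hypothetical 4-row frame and vowel boundaries, for illustration only.
frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
first, second = split_fields(frame)
index = midpoint_field_index(1.0, 1.2)
```

Which of the two fields comes first in time depends on the video source, so the field order used here is an assumption that would need to be checked against the actual recordings.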

Figure 1. Interlabial area (a) and upper lip protrusion (b) measurements are extracted from frontal and lateral video images. The dashed line corresponds to the upper lip protrusion value.

In the acoustic domain, fundamental frequency (F0) values were extracted by the autocorrelation method at the vowel midpoint, using Praat software.² The first two formant frequencies (F1 and F2) were then measured at the vowel midpoint, using the linear predictive coding autocorrelation algorithm implemented in Praat. The number of poles varied from 10 to 14 (and the number of linear predictive coding coefficients varied from 20 to 28), which is in the range of the parameters used by Hillenbrand, Getty, Clark, and Wheeler (1995). Data were downsampled to 22,050 Hz. A 14-ms Hamming window was used, with a pre-emphasis factor of 0.98 (pre-emphasis from 50 Hz). The automatically extracted formant values were overlaid on a wide-band spectrogram with a spectral slice obtained by a fast Fourier transform analysis. When large discrepancies were observed, either (a) between the overlaid formant values and the spectrogram or (b) between the overlaid formant values and the spectral slice, the prediction order of the automatic detection algorithm was readjusted, and the analysis was repeated. Formant frequency values and F0 values, in Hz, were transformed into mel units for subsequent analysis, to account for the ear’s tonotopic frequency scaling and to allow between-speaker comparisons. Vowel duration was also extracted, as well as root-mean-square (RMS) values at vowel midpoint.

In keeping with the method used by Lane, Matthies, Perkell, Vick, and Zandipour (2001), Ménard et al. (2007), and others, we computed measures of the contrast distance between vowel categories. (Those measures have been used in studies of speech production in cochlear implant users.) For each participant, Euclidean distances were calculated between the mean F1 and F2 values of a given vowel for all possible vowel pairs, in the acoustic space. The distances were then averaged across participants to produce a measure of average vowel spacing (AVS; Lane et al., 2001).

Two separate, repeated-measures multivariate analyses of variance (MANOVAs) were carried out on the data averaged across the 10 repetitions (because this method provides a better measure of targets than the individual values). The first MANOVA was conducted using the articulatory data, with (a) lip area and upper lip protrusion as the dependent variables; (b) group (blind or sighted) as the between-subject factor; and (c) prosodic condition (neutral or focus), consonant context (/b/, /d/, or /g/), and vowel (/i/, /y/, /u/, or /a/) as within-subject independent variables. The second MANOVA used the same between-subject factor and independent variables, but the dependent variables were the acoustic measurements F0, F1, F2 (in mels), RMS values at vowel midpoint, and vowel duration. Lastly, the effects of the independent variables on AVS values were explored using a repeated-measures ANOVA.

² Available at http://www.fon.hum.uva.nl/praat/.
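The mel transformation and the vowel contrast measure described in the Data Analysis section can be sketched in Python as follows. Two points are assumptions: the article does not say which Hz-to-mel formula was used, so the common 2595·log10(1 + f/700) variant is substituted here, and the sketch computes one value per speaker by averaging the Euclidean distances over all vowel pairs, which is one reading of the description above. The function names and the formant values in the example are hypothetical.

```python
import math
from itertools import combinations


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def average_vowel_spacing(mean_formants_hz):
    """Mean Euclidean distance between all vowel pairs in the F1 x F2 mel plane.

    mean_formants_hz maps a vowel label to its (mean F1, mean F2) in Hz
    for one speaker in one condition.
    """
    points = {
        vowel: (hz_to_mel(f1), hz_to_mel(f2))
        for vowel, (f1, f2) in mean_formants_hz.items()
    }
    distances = [
        math.dist(points[a], points[b]) for a, b in combinations(points, 2)
    ]
    return sum(distances) / len(distances)


# Hypothetical formant means (Hz) for one speaker; not data from the article.
example = {"i": (300, 2200), "y": (300, 1900), "u": (320, 800), "a": (700, 1400)}
avs = average_vowel_spacing(example)
```

With four vowels there are six vowel pairs, so each AVS value in this sketch is the mean of six distances. Computing it separately for each speaker, prosodic condition, and consonantal context would give the kind of values that a repeated-measures ANOVA could take as input.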

Results

Acoustic Results

The average values of the acoustic variables are shown in Figures 2 and 3. Values of the first two formants (F1 and F2) for the four vowels under study in the neutral and focused conditions are depicted in Figure 2 for the sighted speakers (upper row) and for the blind speakers (lower row). To improve clarity, the values are linked by a solid line that delimits the vowel space. Data are shown separately for vowels in the /b/ context (left panels), the /d/ context (middle panels), and the /g/ context (right panels). Figure 3 presents average F0 values, vowel duration values, and RMS values across subject groups, prosodic conditions, and consonantal contexts. At the multivariate level, a significant main effect of group was found, F(6, 11) = 381.4, p < .001, Wilks’s Λ = .005 (ηp² = 0.97), as well as a significant main effect of prosodic condition, F(6, 11) = 407.6, p < .001, Wilks’s Λ = .004 (ηp² = 0.98). The interaction between the prosodic condition and the group was also significant, F(6, 11) = 430.9, p < .001, Wilks’s Λ = .004 (ηp² = 0.98). Univariate results were then considered for each dependent variable.

Regarding formant values (Figure 2), for both participant groups and for all three consonantal contexts, it was observed that the vowel space was lower in the F1 versus F2 dimension in the focus condition than in the neutral condition. This pattern was confirmed by the univariate results of the MANOVA, which revealed a significant main effect of the prosodic condition on F1, F(1, 16) = 35.23, p < .001. A significant three-way interaction between speaker group, prosodic condition, and consonantal context was also found, F(2, 32) = 9.41, p < .001. Indeed, F1 values were increased significantly more when going from the neutral condition to the focused condition for vowels in the /bVb/ and /gVg/ contexts than for vowels in the /dVd/ context for the sighted group, whereas this prosody-related difference was similar (and significant) for all three consonantal contexts for blind speakers. A significant three-way interaction between prosodic condition, vowel, and speaker group showed that the high vowels /i/, /y/, and /u/ were significantly less affected by prosodic context than the low vowel /a/ in sighted speakers. F1 increased significantly more in focused contexts in sighted speakers than in blind speakers. Results of univariate analyses conducted on F2 values revealed a significant interaction between prosodic condition and speaker group, F(1, 16) = 4.58, p < .05, with F2 values increasing more from the neutral condition to the focus condition in the sighted group than in the blind group. The difference between the neutral and focused condition was significantly smaller for the low vowel /a/ than for the three high vowels /i/, /y/, and /u/, for both speaker groups, F(3, 48) = 2.97, p < .05. Average F0 values are plotted in Figure 3 (left panel). As can be observed on this graph, F0 values were significantly higher in the focused condition than in the neutral condition. The main effect of the prosodic condition was significant, F(1, 16) = 1,692.21, p < .001.
No significant effect of speaker group on F0 values was found. Looking at vowel duration values, univariate results for the data depicted in Figure 3 (middle panel) across speaker groups, prosodic conditions, and consonantal contexts suggest that blind speakers produced significantly longer vowels than sighted speakers, F(1, 16) = 60.38, p < .001. Overall, vowels produced under the focus condition were also significantly longer than vowels produced in the neutral condition, F(1, 16) = 2,384.95, p < .001. A significant interaction between the prosodic condition and the speaker group also appeared, F(1, 16) = 78.64, p < .001, with vowel lengthening under focus being more pronounced for blind speakers than for sighted speakers. Vowels were also significantly longer in the /gVg/ context than in the /bVb/ and /dVd/ contexts, when pooling across prosodic conditions and for both speaker groups, F(2, 32) = 2,339.57, p < .001. This effect of the consonant was greater in the neutral condition for blind speakers, as shown by a significant interaction, F(2, 32) = 1,872.56, p < .001. RMS values are plotted in Figure 3 (right panel). Univariate results showed that vowels produced under contrastive focus were significantly louder than vowels produced in the neutral condition, F(1, 16) = 1,857.75, p < .001. Also, sighted speakers produced significantly louder vowels than congenitally blind speakers, F(1, 16) = 89.73, p < .001. As can be observed in Figure 3, a significant interaction between subject group, prosodic condition, and consonantal context was found, F(2, 32) = 1,094.17, p < .001. Indeed, for sighted speakers, vowels’ RMS values were significantly higher in the /bVb/ context than in the /dVd/ or /gVg/ context, and more so in the focus condition than in the neutral condition. However, a different pattern was observed for blind speakers, for whom vowels’ RMS values were higher in the /bVb/ context than in the /dVd/ context in the neutral condition, but the values did not differ across consonantal contexts in the focus condition.

Figure 2. Average values of F1 and F2 (in mels) for the vowels /i y u a/ for the sighted speakers (upper row) and for the blind speakers (lower row). Data are presented in the /bVb/ context in the left graphs, in the /dVd/ context in the middle graphs, and in the /gVg/ context in the right graphs. To improve clarity, data produced in the neutral condition are linked by a solid black line, and data produced in the focus condition are linked by a solid red line.

Figure 3. Average F0 values (in mels), average vowel duration values (in seconds), and average root-mean-square (RMS) values for both speaker groups (error bars are standard errors). Vowels produced in the neutral condition are linked by a black line, and vowels produced in the focus condition are linked by a red line. Vowels produced in the /bVb/ context correspond to the solid line, vowels produced in the /dVd/ context correspond to the dashed line, and vowels produced in the /gVg/ context correspond to the dotted line.


Finally, vowel contrast, as measured by AVS values, is represented in Figure 4 for the speaker groups, prosodic conditions, and consonantal contexts. As predicted from our previous study (Ménard et al., 2009), sighted speakers had significantly larger AVS values than blind speakers, F(1, 16) = 9.86, p < .01 (ηp² = 0.62). As revealed by the significant interaction of prosodic condition and speaker group, F(1, 16) = 9.75, p < .01 (ηp² = 0.63), average contrast distances were increased under focus versus the neutral condition to a greater extent for sighted participants than for their blind peers. Consonantal context had a significant effect on AVS, with contrast distances being lower for vowels in the /dVd/ context than for vowels in the /bVb/ and /gVg/ contexts, F(2, 32) = 141.66, p < .001 (ηp² = 0.90). This factor interacted with speaker group, such that those differences were produced to a greater extent for sighted speakers than for blind speakers, F(2, 32) = 11.80, p < .01 (ηp² = 0.41).

Figure 4. Average vowel space values (in mels) and standard errors for blind and sighted speakers. Vowels produced in the neutral condition are linked by a black line, and vowels produced in the focus condition are linked by a red line. Vowels produced in the /bVb/ context correspond to the solid line, vowels produced in the /dVd/ context correspond to the dashed line, and vowels produced in the /gVg/ context correspond to the dotted line.

Articulatory Results

In this section, we investigate the associated articulatory correlates of contrastive focus in terms of lip geometry. If vision regulates the internal organization of the speech target, a visible parameter such as lip geometry might participate in the implementation of the target at different levels for the two speaker groups. Results from measures of lip area and upper lip protrusion are shown in Figure 5. For the sake of clarity, dispersion ellipses (± 2 SD) are plotted for all values in the lip area versus lip protrusion space, for both speaker groups (upper row: sighted speakers; lower row: blind speakers), prosodic conditions (solid black line: neutral; dashed red line: focus), and for the three consonantal contexts (left graphs: /bVb/; middle graphs: /dVd/; right graphs: /gVg/). At the multivariate level, MANOVA results revealed that the vowel had a significant effect on the articulatory data, F(6, 4) = 99.12, p < .001, Wilks’s Λ = .007, as did the prosodic condition, F(2, 8) = 13.22, p < .01, Wilks’s Λ = .232. The group factor significantly interacted with the prosodic condition, F(2, 8) = 14.57, p < .05, Wilks’s Λ = .276. The vowel factor explains the larger proportion of the variance (ηp² = 0.98), followed by prosodic condition (ηp² = 0.77). Concerning lip area values, univariate results revealed a significant effect of prosodic condition, as a main effect, F(1, 9) = 74.42, p
