Methods for quantifying on-off speech patterns under delayed auditory feedback.

Journal of Psycholinguistic Research, Vol. 1, No. 1, 1971

Methods for Quantifying On-Off Speech Patterns Under Delayed Auditory Feedback

Stephen Breskin,1 Louis J. Gerstman,1,2 and Joseph Jaffe1

Received July 15, 1970

Changes in the on-off patterns of speech under delayed auditory feedback (DAF) have been represented by gross measures such as total reading time, percent phonation time, and total pause time. The research reported in the present paper employed an analog-to-digital converter to assess changes (from normal speech to DAF speech) in 11 descriptive measures derived from the distribution of pause and vocalization durations. These measures reflected variations in individual speech performance under DAF which were consistent with the subjective ratings of the authors. It was concluded that independent correlates of subject, task, and environmental variables would be required to determine the usefulness of these measures.

This research was supported by the Department of Mental Hygiene, New York State, and by a General Research Support grant from the National Institute for Mental Health to the Research Foundation for Mental Hygiene, Inc.
1Department of Communication Sciences, New York State Psychiatric Institute, New York, New York.
2City College, The City University of New York, New York, New York.

INTRODUCTION

This paper proposes a new approach to the specification of speech performance under delayed auditory feedback (DAF). Its virtues are that the method is objective and quantitative, independent of the talker's articulatory intent or lexical content, and free from the subjectivity of listener ratings, yet it yields measurements presumed to be relevant to the physiological substrate of speech.

Previous attempts to quantify DAF effects have depended on the experimenter knowing in advance what the talker is trying to say, which he ensures by having the talker read a standard passage aloud. Thereafter he may either focus on articulatory deviations from the talker's presumed intent or opt for acoustic descriptors of the overall difference between two readings of the same passage, once with no interference and again under DAF. Examples of the former approach are reports of stuttering (Fairbanks, 1955; Fairbanks and Guttman, 1958), mispronunciations (Atkinson, 1953), syllable repetitions and omissions (Tiffany and Hanley, 1956), and omitted word endings (Korowbow, 1955). Examples of the latter are reports of increased reading time and increased speech intensity (Black, 1951; Lee, 1950; Spilka, 1954). In either approach, however, the experimenter achieves his precision at the price of forgoing analysis of spontaneous speech, which not only constitutes the majority of talking behavior, but which has also been shown to possess markedly different temporal properties from those of reading (Goldman-Eisler, 1968). In an important symposium commentary, Chase (1967, p. 50) asserts that:

"We should be mindful of the fact that the speech motor system can undoubtedly also operate in different modes. If I am imitating another person's speech, without any concern about transmitting information but simply doing an acoustic matching job, the neurophysiology underlying the organization of my speech motor activity is probably quite different from the neurophysiology underlying the same patterns of sounds when I am spontaneously piecing these sounds together in an attempt to transmit information. Our laboratory experiences have taught us that the disorganization of speech motor activity that results from delayed auditory feedback of speech is less marked if the subject is reading a passage than if the subject is engaged in spontaneous propositional speech."

It seems clear that the above distinction cannot be meaningfully examined until techniques are devised which can be applied as easily to spontaneous speech as to passages read aloud.

METHOD

Our laboratory has developed both a measuring device and a mathematical model to describe speech rhythms and variations in them as a function of time (Cassotta et al., 1964; Jaffe and Feldstein, 1970). An analog-to-digital converter samples the presence or absence of speech at intervals as short as 100 msec. With reference to a preset threshold, we term absence of speech state 0 and presence of speech state 1. Adjacent samples in the identical state are presumed to reflect a continuous state in the interval between them. The system has a built-in "hangover time" so that unvoiced stop consonants (i.e., pauses of less than 190 msec) are bridged, which makes the system's output comparable to what the human ear perceives.
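The on-off coding just described can be illustrated with a minimal sketch. It is not the laboratory's device: the function names (`to_states`, `bridge_short_pauses`, `run_lengths`) are hypothetical, and the hangover is modeled, as an assumption, by relabeling interior silent runs shorter than a chosen number of frames as speech.

```python
from itertools import groupby

def to_states(amplitudes, threshold):
    """Code each reading as 1 (speech) if it exceeds the threshold, else 0."""
    return [1 if a > threshold else 0 for a in amplitudes]

def bridge_short_pauses(states, max_gap):
    """Relabel interior runs of 0 shorter than max_gap frames as 1.

    This stands in for the "hangover time"; leading and trailing
    silences are left untouched (an assumption of this sketch).
    """
    runs = [(s, len(list(g))) for s, g in groupby(states)]
    out = []
    for i, (s, n) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        if s == 0 and interior and n < max_gap:
            s = 1
        out.extend([s] * n)
    return out

def run_lengths(states, state):
    """Durations, in samples, of each uninterrupted run of the given state."""
    return [len(list(g)) for s, g in groupby(states) if s == state]
```

If the input were coded in 10-msec frames (an assumed resolution), a 190-msec hangover would correspond to `max_gap=19`, and taking every 20th frame of the bridged sequence would give one sample per 200 msec.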


In the experiment that follows we employ this system to analyze the effects of DAF on the on-off properties of speech, and we explore the basis for the use of various descriptive measures to assess these effects. Six female college students participated in the experiment as paid volunteers. They were seated comfortably in a room having a microphone connected to an Ampex PR-10 recorder (located in another room) and delivered a spontaneous 15-min monologue on a topic of personal interest. The first 5 min was a normal monologue (NM) free of experimental manipulation, while during the next 10 min S spoke with headphones on. This 10-min condition comprised 5 min of DAF interspersed with 5 min of synchronous auditory feedback (SAF).3 The DAF was produced at 190-msec delay and was played back at an intensity which varied with S's own vocal intensity. A sampling rate of 300 times per minute was chosen because its 200-msec interval is the shortest time after a first sample at which a second sample could be physically independent.4 The results were then analyzed using the PDP-8/I computer. Since a preliminary analysis indicated a carryover effect from DAF to SAF, the SAF condition was dropped from subsequent analyses, and the NM condition was used as the S's own control.

While speech samples of 5-min duration have been found to represent the temporal characteristics of speech reliably (Jaffe and Breskin, 1970), it was of interest to determine if the 5-min speech samples were representative of an individual's performance, or whether the sample overlooked some idiosyncratic behavior occurring in periods of less than 5-min duration. To test for this, the 5-min frequency distributions of sound and of silence, for each S and in both NM and DAF conditions, were compared to the distributions for each minute by means of the Kolmogorov-Smirnov test (K-S test; Siegel, 1956). No significant differences were found in any of the comparisons and thus, for all further analyses, data within each 5-min set are combined.

3The order of conditions for each of the 10 min was: SAF, DAF, SAF, SAF, DAF, DAF, SAF, DAF, DAF, and SAF.
4The sampling rate of 300 times per minute is equivalent to one sample every 200 msec. This interval was chosen to be slightly longer than the delay time and hangover time of 190 msec.

RESULTS

Prior to any statistical investigation of results, the authors listened to the actual speech of the Ss and made the following observations. For the speech of two Ss it was difficult to distinguish minutes of DAF from those of NM. These Ss suffered little disturbance by the DAF and were termed "mildly disrupted." For two other Ss, speech under DAF was quite different from NM. During DAF these Ss stammered frequently, their temporal rhythm of speaking was altered markedly, and they were termed "severely disrupted." The remaining two Ss were somewhere in between and were termed "moderately disrupted."

Table I shows for each S the frequency distributions of sound and of silence under NM and DAF conditions.

Table I. Frequency Distribution of Pause and Vocalization Durations for NM and DAF (in Number of Sample Units at 300 per Minute)

Duration      Subject 1  Subject 2  Subject 3  Subject 4  Subject 5  Subject 6
                NM  DAF    NM  DAF    NM  DAF    NM  DAF    NM  DAF    NM  DAF
Pause distribution
1               69   19    55   27    44   45    72   23    63   80    25   41
2               30   24    21   21    29   26    44   17    15   23     7   22
3               16   18    18   16     9   19    17    8    19   14     6    9
4               21   18    13   10     9    7    10   10    12   16     8    8
5                8   14     4    1    16    6     5    9    12    9     5    6
6                9   11    10   11     7    5     5    5     8    7     1    5
7                5   13    10    3     7    7     3    2     4    7     4    1
8                5   12     9    4     9    3     1    3     4    2     3    1
9                3   10     6    1     1    4     1    4     1    2     2    1
10 or more      11   24    11   34    10   23     3   26     8   17    11   11
Vocalization distribution
1               21   34    19   13    13   27    19   12    19   27     6    7
2               33   60    32   12    14   22    25   14    31   40    11   17
3               19   52    27   24    14   13    14   15    23   29     2   11
4               15   25    19   17    13   21    19   15     9   24     6   11
5               13    4     8   23    11   10     6   15    15   11     6    5
6               20    4     8    6    13   12    11    9     3   12     3   11
7               15    0     8    9    12    5    13    4     5    7     7    6
8               17    2     5    6    13   12     7    3     8    4     3    6
9                4    0     8    6    11    6    11    4     2    2     6    6
10 or more      20    0    22   12    27   17    34   16    31   21    13   25

Table II lists a number of descriptors of the NM and DAF monologues which have been derived from the distributions above. These descriptors will be discussed in turn.

Experimenter's rating refers to the subjective ratings described above.

Subject No. identifies the performance of an individual S.

TPT refers to the traditional measure of total pause time5 and is equivalent in our data to the product of the number of 0's counted and the sampling interval of 0.2 sec. When DAF is compared to NM, we note that, on the average, the mildly disrupted Ss paused 2% less, the moderately disrupted Ss 16% more, and the severely disrupted Ss 75% more.

q0 is the conditional probability of speaking given previous silence and is equivalent to the number of transitions from state 0 to state 1 divided by the total number of 0's. We note an increase in q0 for the mildly disrupted Ss, a decrease for the moderately disrupted Ss, and an even greater decrease for the severely disrupted Ss.

q1 is the conditional probability of speaking given previous speech and is determined by the number of transitions from state 1 to state 1 (i.e., consecutive samples both in state 1) divided by the total number of 1's. We note that q1 decreased for all Ss, with subject No. 1 showing the largest decrease and subject No. 2 the least.

5Other traditional measures such as total phonation time, percent phonation time, and percent pause time are linearly derivable from the TPT result.

Table II. Descriptive Statistics of NM and of DAF

Experimenter's rating   Mildly disrupted  Moderately disrupted Severely disrupted
Subject No.                   5         6         2         3         1         4
TPT (sec)        NM         132       181       169       109       117       108
                 DAF        139       155       179       153       205       177
q0               NM       0.221     0.080     0.234     0.260     0.303     0.424
                 DAF      0.256     0.136     0.143     0.189     0.176     0.121
q1               NM       0.819     0.876     0.808     0.852     0.806     0.848
                 DAF      0.776     0.855     0.800     0.805     0.615     0.810
MPT (sec)        NM       0.904     2.517     1.075     0.770     0.660     0.672
                 DAF      0.783     1.472     1.400     1.058     1.136     1.651
MVT (sec)        NM       1.103     1.617     1.039     1.353     1.032     1.317
                 DAF      0.892     1.381     1.002     1.023     0.520     1.050
MEPT (sec)       NM       0.433     0.633     0.533     0.486     0.433     0.340
                 DAF      0.378     0.410     0.700 illegible     0.856     0.820
MEVT (sec)       NM       0.700     1.367     0.710     1.192     0.911     1.033
                 DAF      0.652     1.136     0.876     0.804     0.490     0.874
PIR (sec)        NM       0.692     1.156     0.958     0.848     0.644     0.348
                 DAF      0.734     0.640     1.564     1.068     1.004     1.402
VIR (sec)        NM       1.214     1.124     0.970     1.144     1.020     1.322
                 DAF      0.750     1.420     0.708     1.078     0.324     0.666
PDmax                     0.048     0.158     0.196a    0.108     0.322a    0.369a
VDmax                     0.128     0.112     0.132     0.190a    0.448a    0.164a

aSignificant beyond the 0.05 level.
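The definitions of TPT, q0, and q1 given above can be sketched as a short computation on a sequence of 0/1 samples. This is an illustration only: the function name `markov_descriptors` is hypothetical, and the sketch divides by the number of samples that have a successor, which can differ by one from the total counts of 0's and 1's named in the text.

```python
def markov_descriptors(states, dt=0.2):
    """Estimate TPT, q0, q1, MPT, and MVT from a 0/1 state sequence.

    dt is the sampling interval in seconds (0.2 sec at 300 samples per minute).
    """
    n0 = n1 = n01 = n11 = 0
    for prev, cur in zip(states, states[1:]):
        if prev == 0:
            n0 += 1      # a silent sample followed by another sample
            n01 += cur   # counts 0 -> 1 transitions
        else:
            n1 += 1      # a speech sample followed by another sample
            n11 += cur   # counts 1 -> 1 transitions
    if n01 == 0 or n1 == n11:
        raise ValueError("sequence must contain both 0->1 and 1->0 transitions")
    q0 = n01 / n0
    q1 = n11 / n1
    return {
        "TPT": states.count(0) * dt,  # number of 0's times the sampling interval
        "q0": q0,
        "q1": q1,
        "MPT": dt / q0,               # mean pause time
        "MVT": dt / (1 - q1),         # mean vocalization time
    }
```

As a check on the arithmetic, q0 = 0.221 gives 0.2/0.221, or about 0.905 sec, close to the 0.904 sec listed as MPT for subject No. 5 under NM.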

[Figure 1: three panels (MILD, MODERATE, SEVERE); horizontal axis, pause duration (sec); vertical axis, cumulative percent; separate curves for NM and DAF, with PDmax marked.]

Fig. 1. Cumulative percentage function of increasing pause duration for the six Ss under NM and DAF conditions. PDmax is shown as the maximum displacement between NM and DAF distributions.



MPT is the mean pause time and may be obtained from the frequency distribution of 0's or, equivalently, from the product of the reciprocal of q0 and the sampling interval of 0.2 sec. We note a decrease in MPT for the mildly disrupted Ss and a progressive increase for the moderately and severely disrupted Ss.

MVT is the mean vocalization time and may be obtained from the frequency distribution of 1's or, equivalently, from the product of the reciprocal of 1 - q1 and the sampling interval of 0.2 sec. We note a decrease in MVT for all Ss, with subject No. 1 showing the largest decrease and subject No. 2 the least.

MEPT is the median pause time and was obtained from the frequency distribution of 0's. When NM is compared to DAF, we note a slight decrease for mildly disrupted Ss, a slight increase for moderately disrupted Ss, and a large increase for severely disrupted Ss.

MEVT is the median vocalization time and was obtained from the frequency distribution of 1's. When NM is compared to DAF, we note decreases for all Ss, with the largest taking place for the severely disrupted Ss.

PIR is the pause interquartile range and is determined by subtracting the pause duration at the 25th percentile from the duration at the 75th percentile. With the exception of S#6, we note an increase in the PIR from NM to DAF conditions, with the largest increase occurring in the severely disrupted Ss.

VIR is the vocalization interquartile range and is determined by subtracting the vocalization duration at the 25th percentile from the duration at the 75th percentile. With the exception of S#6, we note a decrease in the VIR from NM to DAF conditions, with the largest decrease occurring in the severely disrupted Ss.

PDmax is the maximum difference between the percentage cumulative distribution functions of pausing under NM and DAF conditions. These distribution functions are shown in Fig. 1. A statistical test of the percentage difference of these functions (the K-S test) yielded significant results for both of the severely disrupted Ss.

VDmax is the maximum difference between the percentage cumulative distribution functions of vocalization under NM and DAF conditions. These distribution functions are shown in Fig. 2. The K-S test yielded significant results for both of the severely disrupted Ss.
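The maximum-displacement measures can be sketched as follows for two frequency distributions tabulated over the same duration classes. The function names are hypothetical, the counts in the usage note are invented for illustration, and the critical value uses the commonly tabulated large-sample coefficient of 1.36 for the two-sided 0.05 level; whether that approximation suits binned counts of this size is an assumption of the sketch.

```python
def ks_dmax(freq_a, freq_b):
    """Maximum absolute difference between two cumulative proportion curves.

    freq_a and freq_b are counts over the same ordered duration classes.
    """
    total_a, total_b = sum(freq_a), sum(freq_b)
    cum_a = cum_b = 0
    dmax = 0.0
    for a, b in zip(freq_a, freq_b):
        cum_a += a
        cum_b += b
        dmax = max(dmax, abs(cum_a / total_a - cum_b / total_b))
    return dmax

def ks_critical_05(n_a, n_b):
    """Large-sample two-sided critical value at the 0.05 level."""
    return 1.36 * ((n_a + n_b) / (n_a * n_b)) ** 0.5
```

For example, with invented counts `[50, 30, 20]` and `[20, 30, 50]`, the cumulative proportions are 0.5, 0.8, 1.0 and 0.2, 0.5, 1.0, so the maximum displacement is 0.3; with 100 observations in each distribution the critical value is about 0.192, and the difference would be judged significant.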

DISCUSSION

This paper has examined various quantitative descriptions of alteration in individual speech performance under NM and DAF conditions. The measure TPT provides a gross description of pause behavior, while MPT, MVT, MEPT, MEVT, q0, and q1 provide measures of central tendency and PIR, VIR, PDmax, and VDmax provide measures of the range in individual variation.

[Figure 2: three panels (MILD, MODERATE, SEVERE); horizontal axis, vocalization duration (sec); vertical axis, cumulative percent; separate curves for NM and DAF, with VDmax marked.]

Fig. 2. Cumulative percentage function of increasing vocalization duration for six Ss under NM and DAF conditions. VDmax is shown as the maximum displacement between NM and DAF distributions.

The interdependence of the above measures is largely determined by the nature of the underlying distributions of pause and vocalization durations. Jaffe et al. (1964) have found these distributions, for individual talkers, to be independent of each other and to approximate decaying exponential functions under normal spontaneous speech. These findings impose direct interdependencies upon q0 and MEPT, q0 and PIR, q1 and MEVT, and q1 and VIR.6 Consequently, we need only refer to q0 and q1 to specify the entire distribution of pauses and vocalizations, respectively. In the present study, subsequent analysis has indicated that the distributions of pause and of vocalization under DAF are independent of each other for all Ss, but that these distributions are not describable by the decaying exponential functions.7 Thus, under DAF, there is additional information in the measures MEPT, MEVT, PIR, and VIR that is not derivable from either q0 or q1.8

There is a particular advantage to the use of the measures PDmax and VDmax. Both may be referred to a standard sampling distribution (Goodman, 1954), thereby enabling the experimenter to evaluate statistically alterations in individual speech performance. With the subject serving as his own control, the experimenter avoids the pitfalls attendant on the formation of equivalence groups.

While there is no necessary correlation among the eleven descriptive measures (except for q0 with MPT and for q1 with MVT as previously noted), it is of interest to know the potential of each measure for describing alteration of speech under DAF. This can best be accomplished by obtaining a large group of subjects upon which all measures are calculated and then intercorrelated.

As for the potential use of these measures to reflect relevant speech changes under DAF, independent correlates of subject, task, and environmental variables are required in order to determine the predictability of each of these measures. This research is presently being conducted by our laboratory.

6Explicitly, MEPT = [log 0.5/log(1 - q0)] × (0.2 sec); PIR = [-log 3/log(1 - q0)] × (0.2 sec); MEVT = (log 0.5/log q1) × (0.2 sec); and VIR = (-log 3/log q1) × (0.2 sec).
7A median chi-square test (Siegel, 1956) employed to test for the independence of the pause and vocalization distributions showed nonsignificance (i.e., independence) for all six Ss. Also, the DAF distributions in Table I did not exhibit the decreasing monotonic property of the decaying exponential function.
8It should be noted that, regardless of the underlying distribution, the following relationships hold: MPT = (1/q0) × (0.2 sec); MVT = [1/(1 - q1)] × (0.2 sec); and TPT = [(1 - q1)/(1 + q0 - q1)] × (duration of speech sample = 300 sec).
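The relationships in these footnotes can be worked through numerically. The sketch below is an interpretation rather than the authors' program: it is written in terms of the probability of remaining in a state from one sample to the next (1 - q0 for pauses, q1 for vocalizations, following the definitions of q0 and q1 in the text), and the function names are hypothetical.

```python
import math

def exponential_descriptors(stay_prob, dt=0.2):
    """Mean, median, and interquartile range of durations, in seconds,
    when the chance of remaining in the state at each sample is stay_prob
    (the decaying exponential case)."""
    mean = dt / (1 - stay_prob)
    median = math.log(0.5) / math.log(stay_prob) * dt
    iqr = -math.log(3) / math.log(stay_prob) * dt
    return mean, median, iqr

def total_pause_time(q0, q1, duration=300.0):
    """Expected total pause time, in seconds, for a sample of given duration."""
    return (1 - q1) / (1 + q0 - q1) * duration
```

With q1 = 0.819 (subject No. 5 under NM), the median comes to about 0.694 sec, against a tabulated MEVT of 0.700 sec. With q0 = 0.256 and q1 = 0.776 (the same subject under DAF), the expected total pause time is 140 sec, against a tabulated TPT of 139 sec.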


ACKNOWLEDGMENT

We thank Mr. Edward L. Lorick for computer processing of the data.

REFERENCES

Atkinson, C. J. (1953). Adaptation to delayed sidetone. J. Speech Hearing Disorders 18: 386-391.
Black, J. W. (1951). The effect of delayed sidetone upon vocal rate and intensity. J. Speech Hearing Disorders 16: 56-60.
Cassotta, L., Feldstein, S., and Jaffe, J. (1964). AVTA: A device for automatic vocal transaction analysis. J. Exptl. Anal. Behav. 7: 99-104.
Chase, R. A. (1967). In Millikan, C. H., and Darley, F. L. (eds.), Brain Mechanisms Underlying Speech and Language. Grune and Stratton, New York.
Fairbanks, G. (1955). Selective vocal effects of delayed auditory feedback. J. Speech Hearing Disorders 20: 333-345.
Fairbanks, G., and Guttman, N. (1958). Effects of delayed auditory feedback upon articulation. J. Speech Hearing Res. 1: 12-22.
Goldman-Eisler, F. (1968). Psycholinguistics. Academic Press, London.
Goodman, L. A. (1954). Kolmogorov-Smirnov tests for psychological research. Psychol. Bull. 51: 160-168.
Jaffe, J., and Breskin, S. (1970). Temporal patterns of speech and sample size. J. Speech Hearing Res. 13: 667-668.
Jaffe, J., and Feldstein, S. (1970). Rhythms of Dialogue. Academic Press, New York.
Jaffe, J., Cassotta, L., and Feldstein, S. (1964). Markovian model of time patterns of speech. Science 144: 884-886.
Korowbow, N. (1955). Reactions to stress: a reflection of personality trait organization. J. Abnorm. Soc. Psychol. 51: 464-468.
Lee, B. S. (1950). Effects of delayed speech feedback. J. Acoust. Soc. Am. 22: 824-826.
Siegel, S. (1956). Nonparametric Statistics. McGraw-Hill, New York.
Spilka, B. (1954). Some vocal effects of different reading passages and time delays in speech feedback. J. Speech Hearing Disorders 19: 37-47.
Tiffany, W. R., and Hanley, C. N. (1956). Adaptation to delayed sidetone. J. Speech Hearing Disorders 21: 164-172.
