Journal of Psycholinguistic Research, Vol. 1, No. 1, 1971
Methods for Quantifying On-Off Speech Patterns Under Delayed Auditory Feedback
Stephen Breskin,1 Louis J. Gerstman,1,2 and Joseph Jaffe1
Received July 15, 1970
Changes in the on-off patterns of speech under delayed auditory feedback (DAF) have been represented by gross measures such as total reading time, percent phonation time, and total pause time. The research reported in the present paper employed an analog-to-digital converter to assess changes (from normal speech to DAF speech) in 11 descriptive measures derived from the distribution of pause and vocalization durations. These measures reflected variations in individual speech performance under DAF which were consistent with the subjective ratings of the authors. It was concluded that independent correlates of subject, task, and environmental variables would be required to determine the usefulness of these measures.
This research was supported by the Department of Mental Hygiene, New York State, and by a General Research Support grant from the National Institute for Mental Health to the Research Foundation for Mental Hygiene, Inc.
1Department of Communication Sciences, New York State Psychiatric Institute, New York, New York.
2City College, The City University of New York, New York, New York.
INTRODUCTION
This paper proposes a new approach to the specification of speech performance under delayed auditory feedback (DAF). Its virtues are that the method is objective and quantitative, independent of the talker's articulatory intent or lexical content, free from the subjectivity of listener ratings, yet it yields measurements presumed to be relevant to the physiological substrate of speech. Previous attempts to quantify DAF effects have depended on the experimenter knowing in advance what the talker is trying to say, which he ensures by having the talker read a standard passage aloud. Thereafter he may either focus on articulatory deviations from the talker's presumed intent or opt for acoustic descriptors of the overall difference between two readings of the same passage, once with no interference and again under DAF. Examples of the former approach are reports of stuttering (Fairbanks, 1955; Fairbanks and Guttman, 1958), mispronunciations (Atkinson, 1953), syllable repetitions and omissions (Tiffany and Hanley, 1956), and omitted word endings (Korowbow, 1955). Examples of the latter are reports of increased reading time and increased speech intensity (Black, 1951; Lee, 1950; Spilka, 1954). In either approach, however, the experimenter achieves his precision at the price of foregoing analysis of spontaneous speech, which not only constitutes the majority of talking behavior, but which has also been shown to possess markedly different temporal properties from those of reading (Goldman-Eisler, 1968).
In an important symposium commentary, Chase (1967, p. 50) asserts that: "We should be mindful of the fact that the speech motor system can undoubtedly also operate in different modes. If I am imitating another person's speech, without any concern about transmitting information but simply doing an acoustic matching job, the neurophysiology underlying the organization of my speech motor activity is probably quite different from the neurophysiology underlying the same patterns of sounds when I am spontaneously piecing these sounds together in an attempt to transmit information. Our laboratory experiences have taught us that the disorganization of speech motor activity that results from delayed auditory feedback of speech is less marked if the subject is reading a passage than if the subject is engaged in spontaneous propositional speech."
It seems clear that the above distinction cannot be meaningfully examined until techniques are devised which can be applied as easily to spontaneous speech as to passages read aloud.
METHOD
Our laboratory has developed both a measuring device and a mathematical model to describe speech rhythms and variations in them as a function of time (Cassotta et al., 1964; Jaffe and Feldstein, 1970). An analog-to-digital converter samples the presence or absence of speech at intervals as short as 100 msec. With reference to a preset threshold, we term absence of speech state 0 and presence of speech state 1. Observations of adjacent samples in the identical state are presumed to have been continuous in the interval between samples. The system has a built-in "hangover time" so that unvoiced stop consonants may be bridged (i.e., pauses less than 190 msec), which makes the system's output comparable to what the human ear perceives.
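The on-off coding described above can be illustrated with a short sketch. It assumes a list of per-sample signal levels and a threshold, both hypothetical (the original device operated on the analog signal), codes each sample as state 0 or 1, and collapses runs of identical states into durations counted in sample units. The function names are illustrative, and the hangover bridging is not modeled.

```python
from itertools import groupby


def code_states(levels, threshold):
    """Code each sample as 1 (speech) if it exceeds the threshold, else 0 (silence)."""
    return [1 if level > threshold else 0 for level in levels]


def run_lengths(states):
    """Collapse a 0/1 sequence into (state, duration) runs, duration in sample units."""
    return [(state, sum(1 for _ in group)) for state, group in groupby(states)]


def durations(states, state):
    """Durations of all runs of the given state (0 = pause, 1 = vocalization)."""
    return [length for s, length in run_lengths(states) if s == state]


# Hypothetical signal levels, one per sampling instant.
levels = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.7, 0.6, 0.9, 0.0]
states = code_states(levels, threshold=0.5)
pauses = durations(states, 0)
vocalizations = durations(states, 1)
```

Tallying `pauses` and `vocalizations` by length gives frequency distributions of the kind analyzed in the Results.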
In the experiment that follows we employ this system to analyze the effects of DAF on the on-off properties of speech, and we explore the basis for the use of various descriptive measures to assess these effects.
Six female college students participated in the experiment as paid volunteers. They were seated comfortably in a room having a microphone connected to an Ampex PR-10 recorder (located in another room) and spoke spontaneously a 15-min monologue on a topic of personal interest. The first 5 min was a normal monologue (NM) free of experimental manipulation, while during the next 10 min S spoke with headphones on. This 10-min condition comprised 5 min of DAF interspersed with 5 min of synchronous auditory feedback (SAF).3 The DAF was produced at 190-msec delay and was played back at an intensity which varied with S's own vocal intensity. A sampling rate of 300 times per minute was chosen as the nearest time after a first sample that the second sample could be physically independent.4 The results were then analyzed using the PDP-8/I computer. Since a preliminary analysis indicated carryover effect from DAF to SAF, the SAF condition was dropped from subsequent analyses, and the NM condition was used as the S's own control.
While speech samples of 5-min duration have been found to represent the temporal characteristics of speech reliably (Jaffe and Breskin, 1970), it was of interest to determine if the 5-min speech samples were representative of an individual's performance, or whether the sample overlooked some idiosyncratic behavior occurring in periods of less than 5-min duration. To test for this, the 5-min frequency distributions of sound and of silence, for each S and in both NM and DAF conditions, were compared to the distributions for each minute by means of the Kolmogorov-Smirnov test (K-S test; Siegel, 1956). No significant differences were found in any of the comparisons and thus, for all further analyses, data within each 5-min set are combined.
3The order of conditions for each of the 10 min was: SAF, DAF, SAF, SAF, DAF, DAF, SAF, DAF, DAF, and SAF.
4The sampling rate of 300 times per minute is equivalent to one sample every 200 msec. This rate was chosen to be slightly longer than the delay time and hangover time of 190 msec.
RESULTS
Table I. Frequency Distribution of Pause and Vocalization Durations for NM and DAF (in Number of Sample Units at 300 per Minute)

Subject              1           2           3           4           5           6
Duration          NM   DAF    NM   DAF    NM   DAF    NM   DAF    NM   DAF    NM   DAF
Pause distribution
1                 69    19    55    27    44    45    72    23    63    80    25    41
2                 30    24    21    21    29    26    44    17    15    23     7    22
3                 16    18    18    16     9    19    17     8    19    14     6     9
4                 21    18    13    10     9     7    10    10    12    16     8     8
5                  8    14     4     1    16     6     5     9    12     9     5     6
6                  9    11    10    11     7     5     5     5     8     7     1     5
7                  5    13    10     3     7     7     3     2     4     7     4     1
8                  5    12     9     4     9     3     1     3     4     2     3     1
9                  3    10     6     1     1     4     1     4     1     2     2     1
10 or more        11    24    11    34    10    23     3    26     8    17    11    11
Vocalization distribution
1                 21    34    19    13    13    27    19    12    19    27     6     7
2                 33    60    32    12    14    22    25    14    31    40    11    17
3                 19    52    27    24    14    13    14    15    23    29     2    11
4                 15    25    19    17    13    21    19    15     9    24     6    11
5                 13     4     8    23    11    10     6    15    15    11     6     5
6                 20     4     8     6    13    12    11     9     3    12     3    11
7                 15     0     8     9    12     5    13     4     5     7     7     6
8                 17     2     5     6    13    12     7     3     8     4     3     6
9                  4     0     8     6    11     6    11     4     2     2     6     6
10 or more        20     0    22    12    27    17    34    16    31    21    13    25

Prior to any statistical investigation of results, the authors listened to the actual speech of the Ss and made the following observations. For the speech of two Ss it was difficult to distinguish minutes of DAF from those of NM. These Ss suffered little disturbance by the DAF and were termed "mildly disrupted." For two other Ss, speech under DAF was quite different from
NM. During DAF these Ss stammered frequently, their temporal rhythm of speaking was altered markedly, and they were termed "severely disrupted." The remaining two Ss were somewhere in between and were termed "moderately disrupted."
Table I shows for each S the frequency distributions of sound and of silence under NM and DAF conditions. Table II lists a number of descriptors of the NM and DAF monologues which have been derived from the distributions above. These descriptors will be discussed in turn.
Experimenter's rating refers to subjective ratings as described in the preceding section.
Subject No. identifies the performance of an individual S.
TPT refers to the traditional measure of total pause time5 and is equivalent in our data to the product of the number of 0's counted and the sampling interval of 0.2 sec. When DAF is compared to NM, we note that, on the average, the mildly disrupted Ss paused 2% less, the moderately disrupted Ss 16% more, and the severely disrupted Ss 75% more.
q0 is the conditional probability of speaking given previous silence and is equivalent to the number of transitions from state 0 to state 1 divided by the total number of 0's. We note an increase in q0 for the mildly disrupted Ss, a decrease for the moderately disrupted Ss, and an even greater decrease for the severely disrupted Ss.
q1 is the conditional probability of speaking given previous speech and is determined by the number of transitions from state 1 to state 1 divided by the total number of 1's. We note that q1 decreased for all Ss, with subject No. 1 showing the largest decrease and subject No. 2 the least.
5Other traditional measures such as total phonation time, percent phonation time, and percent pause time are linearly derivable from the TPT result.

Table II. Descriptive Statistics of NM and of DAF

Experimenter's rating     Mildly disrupted     Moderately disrupted     Severely disrupted
Subject No.                  5        6             2        3               1        4
TPT (sec)     NM           132      181           169      109             117      108
              DAF          139      155           179      153             205      177
q0            NM         0.221    0.080         0.234    0.260           0.303    0.424
              DAF        0.256    0.136         0.143    0.189           0.176    0.121
q1            NM         0.819    0.876         0.808    0.852           0.806    0.848
              DAF        0.776    0.855         0.800    0.805           0.615    0.810
MPT (sec)     NM         0.904    2.517         1.075    0.770           0.660    0.672
              DAF        0.783    1.472         1.400    1.058           1.136    1.651
MVT (sec)     NM         1.103    1.617         1.039    1.353           1.032    1.317
              DAF        0.892    1.381         1.002    1.023           0.520    1.050
MEPT (sec)    NM         0.433    0.633         0.533    0.486           0.433    0.340
              DAF        0.378    0.410         0.700    (illegible)     0.856    0.820
MEVT (sec)    NM         0.700    1.367         0.710    1.192           0.911    1.033
              DAF        0.652    1.136         0.876    0.804           0.490    0.874
PIR (sec)     NM         0.692    1.156         0.958    0.848           0.644    0.348
              DAF        0.734    0.640         1.564    1.068           1.004    1.402
VIR (sec)     NM         1.214    1.124         0.970    1.144           1.020    1.322
              DAF        0.750    1.420         0.708    1.078           0.324    0.666
PDmax                    0.048    0.158         0.196a   0.108           0.322a   0.369a
VDmax                    0.128    0.112         0.132    0.190a          0.448a   0.164a

aSignificant beyond the 0.05 level.
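The transition-based descriptors can be sketched in a few lines of code. The example below is a minimal illustration, not the authors' program: it assumes a 0/1 sample sequence at 0.2-sec spacing, takes q0 as the fraction of 0-samples followed by a 1 and q1 as the fraction of 1-samples followed by another 1 (the reading consistent with MVT = 0.2/(1 - q1)), and counts only samples that have a successor in the denominators. The function name and the example sequence are hypothetical.

```python
def transition_stats(states, interval=0.2):
    """Estimate q0 and q1 from a 0/1 sequence and derive TPT, MPT, and MVT (in sec).

    Assumes the sequence contains at least one 0 and one 1 that have a successor;
    otherwise a ZeroDivisionError is raised.
    """
    n0 = n1 = n01 = n11 = 0
    for prev, cur in zip(states, states[1:]):
        if prev == 0:
            n0 += 1
            n01 += cur      # counts 0 -> 1 transitions
        else:
            n1 += 1
            n11 += cur      # counts 1 -> 1 transitions
    q0 = n01 / n0
    q1 = n11 / n1
    return {
        "q0": q0,
        "q1": q1,
        "TPT": states.count(0) * interval,   # total pause time
        "MPT": interval / q0,                # mean pause time
        "MVT": interval / (1.0 - q1),        # mean vocalization time
    }


# Hypothetical ten-sample sequence, for illustration only.
example = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
stats = transition_stats(example)
```

On a record this short the estimates are only illustrative; the 5-min samples of the experiment supply 1500 observations per condition.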
Fig. 1. Cumulative percentage function of increasing pause duration for the six Ss under NM and DAF conditions. PDmax is shown as the maximum displacement between NM and DAF distributions. [Panels: MILD, MODERATE, SEVERE; ordinate: cumulative percent; abscissa: pause duration (sec).]
MPT is the mean pause time and may be obtained from the frequency distribution of 0's or, equivalently, from the product of the reciprocal of q0 and the sampling interval of 0.2 sec. We note a decrease in MPT for the mildly disrupted Ss and a progressive increase for the moderately and severely disrupted Ss.
MVT is the mean vocalization time and may be obtained from the frequency distribution of 1's or, equivalently, from the product of the reciprocal of 1 - q1 and the sampling interval of 0.2 sec. We note a decrease in MVT for all Ss, with subject No. 1 showing the largest decrease and subject No. 2 the least.
MEPT is the median pause time and was obtained from the frequency distribution of 0's. When NM is compared to DAF, we note a slight decrease for mildly disrupted Ss, a slight increase for moderately disrupted Ss, and a large increase for severely disrupted Ss.
MEVT is the median vocalization time and was obtained from the frequency distribution of 1's. When NM is compared to DAF, we note decreases for all Ss, with the largest taking place for the severely disrupted Ss.
PIR is the pause interquartile range and is determined by subtracting the pause duration at the 25th percentile from that duration at the 75th percentile. With the exception of S#6, we note an increase in the PIR from NM to DAF conditions with the largest increase occurring in the severely disrupted Ss.
VIR is the vocalization interquartile range and is determined by subtracting the vocalization duration at the 25th percentile from that duration at the 75th percentile. With the exception of S#6, we note a decrease in the VIR from NM to DAF conditions with the largest decrease occurring in the severely disrupted Ss.
PDmax is the maximum difference between the percentage cumulative distribution functions of pausing under NM and DAF conditions. These distribution functions are shown in Fig. 1. A statistical test of the percentage difference of these functions (the K-S test) yielded significant results for both of the severely disrupted Ss.
VDmax is the maximum difference between the percentage cumulative distribution functions of vocalization under NM and DAF conditions. These distribution functions are shown in Fig. 2. The K-S test yielded significant results for both of the severely disrupted Ss.
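The computation of PDmax can be sketched as follows. The counts are taken to be the subject 2 pause frequencies of Table I (durations 1 through 9 and "10 or more" sample units); the function names are illustrative, and the large-sample 0.05 critical value used for comparison is the familiar two-sample approximation, which is an assumption here and not necessarily the exact procedure the authors applied.

```python
import math


def cumulative(freqs):
    """Cumulative proportions of a frequency distribution."""
    total = sum(freqs)
    running = 0
    proportions = []
    for f in freqs:
        running += f
        proportions.append(running / total)
    return proportions


def d_max(freq_a, freq_b):
    """Largest absolute gap between two cumulative proportion curves."""
    return max(abs(a - b) for a, b in zip(cumulative(freq_a), cumulative(freq_b)))


def critical_05(n_a, n_b):
    """Large-sample two-tailed 0.05 critical value: 1.36 * sqrt((n_a + n_b) / (n_a * n_b))."""
    return 1.36 * math.sqrt((n_a + n_b) / (n_a * n_b))


# Pause frequencies assumed for subject 2 (Table I), in sample units of 0.2 sec.
nm = [55, 21, 18, 13, 4, 10, 10, 9, 6, 11]
daf = [27, 21, 16, 10, 1, 11, 3, 4, 1, 34]

pd_max = d_max(nm, daf)
crit = critical_05(sum(nm), sum(daf))
```

With these counts `pd_max` rounds to 0.196, the value listed for subject 2 in Table II, and it exceeds the approximate critical value of about 0.162.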
Fig. 2. Cumulative percentage function of increasing vocalization duration for six Ss under NM and DAF conditions. VDmax is shown as the maximum displacement between NM and DAF distributions. [Panels: MILD, MODERATE, SEVERE; ordinate: cumulative percent; abscissa: vocalization duration (sec).]
DISCUSSION
This paper has examined various quantitative descriptions of alteration in individual speech performance under NM and DAF conditions. The measure TPT provides a gross description of pause behavior, while MPT, MVT, MEPT, MEVT, q0, and q1 provide measures of central tendency and PIR, VIR, PDmax, and VDmax provide measures of the range in individual variation. The interdependence of the above measures is largely determined by the nature of the underlying distributions of pause and vocalization durations. Jaffe et al. (1964) have found these distributions, for individual talkers, to be independent of each other and to approximate decaying exponential functions, under normal spontaneous speech. These findings impose direct interdependencies upon q0 and MEPT, q0 and PIR, q1 and MEVT, and q1 and VIR.6 Consequently, we need only refer to q0 and q1 to specify the entire distribution of pauses and vocalizations, respectively. In the present study, subsequent analysis has indicated that the distributions of pause and of vocalization under DAF are independent of each other for all Ss, but that these distributions are not describable by the decaying exponential functions.7 Thus, under DAF, there is additional information in the measures MEPT, MEVT, PIR, and VIR that is not derivable from either q0 or q1.8
There is a particular advantage to the use of the measures PDmax and VDmax. Both may be referred to a standard sampling distribution (Goodman, 1954), thereby enabling the experimenter to evaluate statistically alterations in individual speech performance. With the subject serving as his own control, the experimenter avoids the pitfalls attendant on the formation of equivalence groups.
While there is no necessary correlation among the eleven descriptive measures (except for q0 with MPT and for q1 with MVT as previously noted), it is of interest to know the potential of each measure for describing alteration of speech under DAF. This can best be accomplished by obtaining a large group of subjects upon which all measures are calculated and then intercorrelated.
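A brief derivation may make these interdependencies concrete. It is a sketch under the assumption that durations, counted in sample units, follow a geometric distribution (the discrete analog of a decaying exponential) with continuation probability p; under the definitions given in the Results, p = 1 - q0 for pauses and p = q1 for vocalizations. The symbols p, D, m, and \Delta (the 0.2-sec sampling interval) are introduced here for illustration only.

```latex
\begin{align*}
P(D = k) &= (1 - p)\,p^{\,k-1}, \qquad P(D > k) = p^{\,k}, \qquad k = 1, 2, \dots \\
\text{mean duration} &= \frac{\Delta}{1 - p} \\
\text{median duration:} \quad p^{\,m} = 0.5 \;&\Rightarrow\; m\,\Delta = \frac{\log 0.5}{\log p}\,\Delta \\
\text{interquartile range} &= \left(\frac{\log 0.25}{\log p} - \frac{\log 0.75}{\log p}\right)\Delta
  = \frac{-\log 3}{\log p}\,\Delta
\end{align*}
```

Under this assumption a single parameter fixes the mean, the median, and the interquartile range, which is why the median and range measures carry no additional information when the distributions are exponential.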
As for the potential use of these measures to reflect relevant speech changes under DAF, independent correlates of subject, task, and environmental variables are required in order to determine the predictability of each of these measures. This research is presently being conducted by our laboratory.
6Explicitly, MEPT = [log 0.5/log(1 - q0)] × (0.2 sec); PIR = [-log 3/log(1 - q0)] × (0.2 sec); MEVT = (log 0.5/log q1) × (0.2 sec); and VIR = (-log 3/log q1) × (0.2 sec).
7A median chi-square test (Siegel, 1956) employed to test for the independence of the pause and vocalization distributions showed nonsignificance (i.e., independence) for all six Ss. Also, the DAF distributions in Table I did not exhibit the decreasing monotonic property of the decaying exponential function.
8It should be noted that, regardless of the underlying distribution, the following relationships hold: MPT = (1/q0) × (0.2 sec); MVT = [1/(1 - q1)] × (0.2 sec); and TPT = [(1 - q1)/(1 + q0 - q1)] × (duration of speech sample = 300 sec).
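The distribution-free relationships for MPT, MVT, and TPT can be checked numerically. The sketch below assumes the subject 1 NM values of Table II (q0 = 0.303, q1 = 0.806); the constant and function names are illustrative only.

```python
INTERVAL = 0.2    # sampling interval in sec
SAMPLE = 300.0    # duration of the speech sample in sec


def mpt(q0):
    """Mean pause time: reciprocal of q0 times the sampling interval."""
    return INTERVAL / q0


def mvt(q1):
    """Mean vocalization time: reciprocal of (1 - q1) times the sampling interval."""
    return INTERVAL / (1.0 - q1)


def tpt(q0, q1):
    """Total pause time: long-run proportion of 0's times the sample duration."""
    return (1.0 - q1) / (1.0 + q0 - q1) * SAMPLE


# Subject 1, NM condition (values assumed from Table II).
q0, q1 = 0.303, 0.806
mpt_s1 = mpt(q0)
mvt_s1 = mvt(q1)
tpt_s1 = tpt(q0, q1)
```

These evaluate to roughly 0.660 sec, 1.031 sec, and 117.1 sec, in close agreement with the tabled MPT, MVT, and TPT for that subject.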
ACKNOWLEDGMENT
We thank Mr. Edward L. Lorick for computer processing of the data.
REFERENCES
Atkinson, C. J. (1953). Adaptation to delayed sidetone. J. Speech Hearing Disorders 18: 386-391.
Black, J. W. (1951). The effect of delayed sidetone upon vocal rate and intensity. J. Speech Hearing Disorders 16: 56-60.
Cassotta, L., Feldstein, S., and Jaffe, J. (1964). AVTA: A device for automatic vocal transaction analysis. J. Exptl. Anal. Behav. 7: 99-104.
Chase, R. A. (1967). In Millikan, C. H., and Darley, F. L. (eds.), Brain Mechanisms Underlying Speech and Language. Grune and Stratton, New York.
Fairbanks, G. (1955). Selective vocal effects of delayed auditory feedback. J. Speech Hearing Disorders 20: 333-345.
Fairbanks, G., and Guttman, N. (1958). Effects of delayed auditory feedback upon articulation. J. Speech Hearing Res. 1: 12-22.
Goldman-Eisler, F. (1968). Psycholinguistics. Academic Press, London.
Goodman, L. A. (1954). Kolmogorov-Smirnov tests for psychological research. Psychol. Bull. 51: 160-168.
Jaffe, J., and Breskin, S. (1970). Temporal patterns of speech and sample size. J. Speech Hearing Res. 13: 667-668.
Jaffe, J., and Feldstein, S. (1970). Rhythms of Dialogue. Academic Press, New York.
Jaffe, J., Cassotta, L., and Feldstein, S. (1964). Markovian model of time patterns of speech. Science 144: 884-886.
Korowbow, N. (1955). Reactions to stress: a reflection of personality trait organization. J. Abnorm. Soc. Psychol. 51: 464-468.
Lee, B. S. (1950). Effects of delayed speech feedback. J. Acoust. Soc. Am. 22: 824-826.
Siegel, S. (1956). Nonparametric Statistics. McGraw-Hill, New York.
Spilka, B. (1954). Some vocal effects of different reading passages and time delays in speech feedback. J. Speech Hearing Disorders 19: 37-47.
Tiffany, W. R., and Hanley, C. N. (1956). Adaptation to delayed sidetone. J. Speech Hearing Disorders 21: 164-172.