I have found a variety of practice effects on speech production planning by comparing SOTs in everyday conversation and those in TV programs. Preplanned speech in TV programs tended to contain more PHONOLOGICAL and SYNTACTIC errors, anticipatory PHONOLOGICAL errors and SYNTACTIC lexical unit errors, PHONOLOGICAL omission errors (especially telescopings), contextual lexical unit errors, lexical unit errors involving CCs, PHONOLOGICAL errors with a greater anticipatory error-source distance, and vowel errors than spontaneous speech in everyday conversation. Spontaneous speech in everyday conversation, on the other hand, tended to contain more LEXICAL errors, perseveratory PHONOLOGICAL errors and SYNTACTIC lexical unit errors, non-contextual lexical unit errors, lexical unit errors involving OCs, blend errors, lexical unit errors with a greater perseveratory error-source distance, errors involving verbs and adjectives, and consonant errors than preplanned speech in TV programs. Dell’s and Levelt’s models can account for many of the effects. The present study shows that content-practice can influence not only macroplanning but also microplanning in Levelt’s terms, and that content-practice cannot be fully explained by means of strengthened connections alone. Most of the content-practice effects can be accounted for by the following three characteristics of content-practice. First, content-practice shifts the locus of errors down to the level of phonology. Second, it enables the speaker to plan ahead with less influence from the previously processed elements. Third, content-practice prevents elements outside of the linguistic context from intruding into the current speech plan.
At the same time, this study has shown that there are some aspects of the speech production planning mechanism which are unsusceptible to practice. Furthermore, it became evident that content-practice and the other two types of practice show characteristics different from one another, although they might have some aspects in common.
These findings suggest that SOT researchers should collect data from the same type of source with respect to practice. Practice could be a factor that might disrupt any “More Errors” arguments about SOTs (Cutler 1982).
However, there are aspects of speech production planning that I could not discuss in this paper. First, I have not considered the rate of errors per given length of utterance, which was investigated by Schwartz et al. (1994) and Dell et al. (1997), who argued that practice reduces the number of errors and that more anticipation errors are likely to occur when the error rate is lower. Unlike their tongue-twister experiments, where the speaker says fixed sentences, it would be difficult to measure error rates in a study like the present one. Error rates must also interact with another factor, speech rates. If these factors were taken into consideration, it would be possible to solve the problem of whether practice increases or decreases particular kinds of errors (3.2). Second, the present study could have looked at how practice interacts with the influence of orthographic representations. One could hypothesize that orthographic similarities between errors and their targets are more likely to be found in preplanned speech than in spontaneous speech because speakers in TV programs often use written scripts to practice their speech, although no data were collected from utterances that may involve reading. Third, it would have been interesting to examine what kinds of semantic relationships hold between lexical substitution errors and their targets, how phonologically similar they are to each other, and whether these factors differ due to practice.
Admittedly, the practice effects found in the present study may not apply straightforwardly to other languages. It is necessary to look at crosslinguistic data on practice effects on SOTs in order to probe into universal aspects of practice effects and those of the speech production planning mechanism.
Notes
[1] This paper is a substantially expanded version of Kawachi (1999, 2000, 2001) and a long version of Kawachi (2002). The data can be found at the following website.
http://www.acsu.buffalo.edu/~kawachi/kawachi2002data.html
[2] Individual erroneous linguistic elements in SOTs will be called “SOT errors” or “errors.”
[3] Kamio and Tonoike (1979) demonstrated how regularities found in phonological SOTs in English apply to phonological SOTs in Japanese. They pointed out examples of Japanese SOTs that are difficult to classify into a single SOT type, some of which will be discussed later in 2.1.5.2. The data that they analyzed are made up of two types, one type of which was collected by the authors from everyday conversation by means of the “pen-and-paper” method (Poulisse 1999:96) and the other of which was “still in the process of being collected” mainly from discussion, interviews, and baseball programs on radio and TV by means of the tape-recording method (Kamio and Tonoike 1979:278). Many of them were recorded and transcribed by other people (p.308).
Tabusa (1982) showed how an element before or after the error influences the error in terms of assimilation and dissimilation. Most of the SOTs in his data were collected from TV programs and the remainder from public speech such as announcements in a train or bus and school ceremonies. He used the pen-and-paper method for data collection from the two different types of sources. The TV programs that he examined are mainly news, and include Diet sessions, sport broadcasts, discussions on political issues, and quizzes. It is not clear whether the programs were live-broadcast or edited. He also included data from those scenes in TV programs (e.g., news) where the participant was reading, assuming that speech errors and reading errors occur through a similar mechanism (p.8). His interest was in illustrating how different types of units are assimilated to or dissimilated from their neighboring elements, not in investigating statistical distributions of different SOT types.
Kubozono (1985, 1989) argued that the mora is a psychologically real unit that plays an important role in speech production in Japanese, by presenting SOTs where morae rather than syllables serve as units. There is one passage where he took discourse contexts into consideration: he speculated that non-syllabic morae, which were borrowed from Chinese, would appear more frequently in academic speech, where many Sino-Japanese morphemes tend to be used, than in non-academic speech (Kubozono 1985:237). The data in Kubozono (1985) consisted of his own data and data in Kamio and Tonoike (1979) and Tabusa (1982), and the data in Kubozono (1989) consisted of his own data and data from Terao (1984), Tabusa (1982), and Tonoike (1983) (Kubozono 1985:227, 1989:251). He did not mention how the data were collected.
Terao (1987, 1995) found that the nominative case particle ga, the accusative case particle o, and the genitive case particle no, especially the pair of the first two, are more likely to be involved in substitutions and exchanges than any other pair of particles. He argued that this is because those particles are activated under the influence of the predicate; however, another explanation is that these particles may be just more frequent than others. Terao (1987) collected his data from natural conversation, which he claims includes conversation in TV and radio programs. He employed the pen-and-paper method for 78% of the data and the tape-recording method for the rest. Terao (1995) used data augmented from the ones used in Terao (1987).
SOTs in Japanese were experimentally studied by Wells-Jensen (1999). She had speakers of English, Hindi, Japanese, Spanish, and Turkish narrate a videotaped cartoon with its sound muted and compared SOTs in five languages. She found that, although the language production mechanism is the same regardless of the language, and languages show equal complexity as a whole, languages differ in those aspects where complexity appears and this is reflected in different patterns of SOTs across languages. Some of her findings about Japanese SOTs are as follows.
(a)
1. There are significantly more SOTs involving syllables in Japanese than in the other four languages (pp.154-155, 179-181). This may be because of the almost consistent CV syllable structure. She finds no instances where morae rather than syllables serve as planning units (p.181), contrary to the claim by Kubozono (1985, 1989).
2. Although SOTs involving inflectional morphemes are more common than derivational SOTs in many languages, Japanese, which has little inflectional morphology, has significantly fewer inflectional SOTs than the other four languages (p.156, pp.171-172).
3. Japanese, which has a large number of semantically related and phonologically similar transitive and intransitive verb pairs which are stored as separate entries in the lexicon, shows many more lexical substitutions involving these verbs than the other four languages (pp.167-168).
4. Since the consonant inventory of Japanese is small compared to the other four languages, it shows significantly fewer paradigmatic consonant SOTs than those languages (p.179).
Note that data in her study were collected consistently from the same type of source.
In sum, there are no Japanese SOT corpora gathered exclusively from real spontaneous speech in daily life or exclusively from preplanned speech in daily life.
[4] The author of this paper is a researcher highly skilled in SOT detection and collection, who has received instruction in phonetics and in psycholinguistics from Dr. Jeri Jaeger. On the other hand, it is not clear how motivated the subjects of the experiments in (1) were.
[5] Contrary to findings by Dell (1990), Schwartz et al. (1994), and Dell et al. (1997), more anticipatory errors were found as the speech rate increased in Wells-Jensen’s (1999) experiment. She holds that this may be because the task “forced speakers to plan ahead in terms of content in order to get the correct truth value for their utterances” (p.121).
[6] The argument here applies to data collection from adults, but not from children.
[7] Nevertheless, there are those situations where speech is unsuspendable and cannot be tape-recorded. Examples are conversations among strangers that one overhears on the street or on the bus, and public lectures where tape-recording is prohibited.
[8] Specifically, when schoolchildren make excuses to their teachers for being late for school, when people make marriage proposals to their girlfriends or boyfriends, when parents convince their children to behave, when a robber demands money in a bank, when telling stories, etc.
[9] Practice of public speech, the researcher’s own public speech, and conversation with authority would be placed in the figure as follows.
Figure 1.1’: Different kinds of data collection contexts defined by suspendability and spontaneity
                      more spontaneous               more preplanned
more suspendable      everyday conversation          practice of public speech
                                                     the researcher’s own public speech
more unsuspendable    conversation with authority    conversation in TV programs, public speech
[10] At the beginning of a live program in Japan, the emcee says that it is live and/or the screen says so.
Even in a live-broadcast program, the speech is more or less preplanned, except that the speech is spontaneous in some unpredictable portions of programs like play-by-play sport broadcasting. Any program has its script, and its participants are expected to follow it, although it depends on the program in how much detail the script is written and how precisely the participants have to follow it (Yumi Tanaka: personal communication).
[11] Nevertheless, some portions of such live-broadcast programs still had to be excluded, so that only uneditable portions were examined and the speech analyzed was produced through the same production mechanism as in the naturalistic setting. First, no data was collected from utterances that may involve reading rather than speaking. This is the reason why news and sport broadcasting programs were avoided. Even in talk shows and entertainment shows, the speaker can sometimes read a letter or fax or written material on a board. In some cases, the speaker looks at the script at hand or some other written material shown by the director while speaking. When a videotape is played within a program, it is possible that the speaker may be using the script. There is a further case where the speaker is talking with someone on the phone. In such a case, the speaker in the TV studio is not reading but the person on the phone who is not filmed may be speaking from notes. Hence, the latter’s utterance was excluded. Thus, any reading portions of the programs, that is, those cases where written material that is related to what a speaker is saying is or may be visible to the speaker, were eliminated so that this study does not include any data concerning reading, on the grounds that reading must involve a processing mechanism different from speaking. For this reason, the present study does not gather data from news programs or news embedded in live-broadcast talk shows and entertainment shows. Second, no data comes from utterances in such videotaped reports as documentaries embedded in live-broadcast TV programs simply because they are not live but videotaped and must have been edited. Third, Japanese sentences produced by a simultaneous interpreter who is translating what another speaker says in another language were excluded on the assumption that translation is different in processing from normal speech production planning. Fourth, the present study does not deal with songs performed in live-broadcast TV programs, because pronunciations, especially vowel lengths and pitch accents, are somewhat different from those in speech, and also because, even when the singer makes an error, the target is unknown unless the caption shows it or the researcher is familiar with the song.
[12] First, SOTs have to be discerned from restarts or false starts. According to Wells-Jensen (1999), the following types of restarts, which are not errors in speech production planning, are not SOTs.
(a) change of proposition
e.g. And the brother come ... has to save the fish ... fishbowl again.
(The speaker probably wanted to say “comes into the room” or “comes over.” The “fish ... fishbowl” part is a stutter, which is not a SOT, either, as mentioned later.)
(b) substitution of a lexical item for its subordinate lexical item
e.g. Kind of like an Irish da ... jig.
(The speaker probably wanted to say “dance.”)
(c) clarification of a pronoun referent
e.g. He’s yelling at him ... yelling at the cat.
(d) insertion of optional adverbial information
e.g. And the f ... and now the fish is mad.
(e) cases where it is impossible to tell what the target was (for example, when the speaker said only one consonant)
e.g. They’re bobbin’ their heads, /k/ singin’ with the song.
Wells-Jensen (1999:86-87)
A restart like (a) occurs when the speaker starts to express a proposition, notices that it was not appropriate, and then corrects it by restarting with another proposition. It often involves the substitution of a verb phrase for another verb phrase. The issue is whether the speaker actually intended to say the wrong proposition or not. If the wrong proposition were a deviation from the speaker’s intended one, it would have to be classified as a SOT, like a lexical substitution where the wrong lexical item is a deviation from the speaker’s intended lexical item. However, it is usually the case that the speaker actually intended to say the wrong proposition. Therefore, a change of proposition should not be considered to be a SOT.
(b) is a difficult case. Is it the case that the speaker actually intended to say the superordinate lexical item, realized, after saying it, that it had not been suitable, and replaced it by its subordinate one? As long as it is not a deviation from the speaker’s intention, it should not be a SOT. Hence, the issue is whether it is the superordinate lexical item or the subordinate one that the speaker intended. Levelt et al. (1999:3-4) examine this issue in terms of “perspective taking.” According to them, when a word is produced, the first process involved is “conceptual preparation,” after which the lexical concept is activated. The process makes reference to the discourse context (“discourse record”: Levelt 1989). The activated concept is mapped onto the lexical item at the stage of lexical selection, but the mapping is not necessarily one-to-one. There can be many-to-one relationships where different concepts can be expressed with the same lexical item (polysemy). Can there be one-to-many relationships? Levelt et al. maintain that even after a single concept is activated, there is more than one way to refer to the same object, mentioning the case where the same object may be called “animal,” “horse,” “mare,” and so on. However, do these different lexical items express the same concept? Obviously, no. Their argument is based on the following presupposition about what Levelt (1989) calls the “hypernym problem.”
When lemma A’s meaning entails lemma B’s meaning, B is a hypernym of A. If A’s conceptual conditions are met, then B’s are necessarily also satisfied. Hence, if A is the correct lemma, B will (also) be retrieved.
Levelt (1989:201)
In short, given a concept to be expressed, the hypernym is also accessed. Therefore, if a lexical item is replaced by its hypernym or superordinate item, the substitution can occur at the level of lexical selection rather than that of conceptual preparation. If this is the case, the substitution must be a SOT. Therefore, the indeterminacy lies in whether the substitution occurs at the level of conceptual preparation partly determined by perspective taking or at the level of lexical selection. Further research seems to be necessary to solve this problem. Nonetheless, this type of substitution is extremely rare, and my data does not contain a single instance.
(c) is due to an error in conceptual preparation, not in lexical selection. The process of conceptual preparation is required to take the discourse context into consideration. At first, the concept was formulated correctly and a right lexical item was selected according to the standard of the speaker’s intention. The speaker found the use of the pronoun inappropriate in the context, and corrected it with the use of the common noun that fits the context. Therefore, this type of error is not a SOT.
Also in a restart like (d), the concept for the expression without the optional adverbial information is formulated in light of the discourse context. At first, the concept accords with the speaker’s intention. The speaker later finds the expression without the optional adverbial information inappropriate and corrects it with a more elaborate expression.
(e) is also a difficult case, because, if the researcher were able to reconstruct what the speaker intended to say from the erroneous portion of the utterance and the context where it occurred, it might be classified as a SOT. It is even more difficult when the tape-recording method or an experimental method is employed, because the researcher cannot ask the subject about the target. However, errors of this type are so rare that their occurrence is negligible: I found only two utterances that belong to this type in the TV programs.
The second type of errors in speech to be differentiated from SOTs are deliberately-given, ungrammatical utterances. Errors sometimes provoke laughter and Japanese comedians on TV attempt to make people laugh, but it is rare for them to make use of ungrammatical utterances that they intentionally produce. I encountered only one such instance in one of the TV programs, which seems to have been deliberately produced, judging from the context. The speaker, who regularly says Tooi toko(ro) doomo. (far place thank.you) ‘Thank you for coming from distant places.’ at the beginning of the section of the program where the “error” occurred, said Toitokodoidoko (NONWORD). The “error” was not corrected; instead, he made a comment on his “error” by saying Kan-ja-tta. (bite-put.away-PAST) Kan-ja-tta-nee. (bite-put.away-PAST-SFT) ‘I’ve made a speech error (lit. I have bitten my tongue). I’ve made a speech error, haven’t I?’ In the previous discourse, the speaker had made several speech errors including SOTs and another speaker pointed out that he was making more speech errors on that day than usual. Because he often makes fun of people including himself by pointing out their errors, he probably wanted to show that his tongue faltered during the program, and it is very likely that the error was intentional.
Third, SOTs are different from discourse anomalies. Wells-Jensen (1999:89) discusses cases where the inappropriate use of a pronoun shows a discourse anomaly but is not a SOT, specifically cases where the referent of the pronoun has not appeared in the discourse but can be inferred from the context, where the number of the pronoun is incorrect, and where the referent of the pronoun is ambiguous.
There are also discourse anomalies that do not involve pronouns and are distinct from SOTs, as in the following example that occurred in everyday conversation. One speaker (A) started the conversation with the topic of weather, as is common with conversations in Japanese, by saying Kotosi no natu wa raku-desu-yo-nee. (this.year GEN summer TOP easy-COP.PLT-SFT-SFT) ‘This summer is easy.’ The other speaker (B), however, was not sure what the topic that A initiated was, and said Kikoo desu-ka? (weather COP.PLT-Q) ‘Is it the weather that you are talking about?’ The information that A provided by his utterance may have been insufficient to B, but A intended the utterance. Therefore, the utterance, which was a discourse anomaly to B, is not a SOT.
Fourth, SOTs need to be distinguished from expressions or pronunciations idiosyncratic to an individual. The distinction was easily made in the case of the naturalistic data collection. However, if the speaker’s idiolect is unfamiliar to the researcher, it is difficult to make a clear distinction between a SOT and the speaker’s idiolect. It is even more difficult in the case of data collection from TV programs, because the researcher cannot always reconstruct the intended utterance with precision by asking the speaker about it. Nevertheless, if the speaker self-corrects the SOT that has been made, or at least notices it (for example, after the listener points it out), the researcher can tell that it was not the speaker’s idiolect but a SOT. Out of 249 SOTs collected from TV programs in this study, 195 SOTs were self-corrected by the speakers. Out of the remaining 54 (21.7%), some are clearly SOTs, judging from each speaker’s correct utterances in other places in the program. For the rest, it is almost impossible to tell precisely whether what appears to be a SOT is really a SOT or the speaker’s idiolect, but it is very likely that they are SOTs.
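The proportion of SOTs that were not self-corrected can be checked with a few lines of arithmetic. The following is a minimal sketch in Python using only the two counts reported in this note; the variable names are illustrative and are not part of the original study.

```python
# Counts reported in this note for the TV-program data.
total_tv_sots = 249    # SOTs collected from TV programs
self_corrected = 195   # SOTs self-corrected by the speakers

# SOTs that were not self-corrected, and their share of the total.
not_self_corrected = total_tv_sots - self_corrected
share_percent = round(100 * not_self_corrected / total_tv_sots, 1)

print(not_self_corrected, share_percent)
```

The two printed values correspond to the count of remaining SOTs and the percentage given in parentheses in the text.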
In addition to the aforementioned four types of errors in speech, there are additional types of anomalous utterances that appear to be errors but should be distinguished from SOTs (Wells-Jensen 1999:13). First, repetitions that are used for clarification or emphasis are not SOTs. If a word is repeated in a sentence, the sentence as a whole is very often ungrammatical, technically speaking. However, such repetitions occur with the speaker’s intention of clarification or emphasis. Since they accord with the speaker’s intention, they are not SOTs. Second, stutters, stammers, and other disfluent utterances caused by motor disfluencies (e.g., an incorrectly-articulated utterance given with the mouth full) are not SOTs. SOTs occur during speech production planning, not at the stage of motor planning or commands in the articulatory systems.