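Kwansei Gakuin University Repository

PRODUCTION OF ENGLISH SCHWA BY JAPANESE SPEAKERS

Author (English)

Kaori Sugiura

Degree

Ph.D. (Language, Communication, and Culture)

Degree-granting institution

Kwansei Gakuin University

Degree number

34504 Kō No. 568

URL

http://hdl.handle.net/10236/00025148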


PRODUCTION OF ENGLISH SCHWA BY JAPANESE SPEAKERS

by

Kaori Sugiura

A Dissertation Presented to
The Graduate School of Language, Communication, and Culture
Kwansei Gakuin University

In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy


Doctor of Philosophy Dissertation

PRODUCTION OF ENGLISH SCHWA BY JAPANESE SPEAKERS

by

Kaori Sugiura

Members of Evaluation Committee

Major Advisor:

Associate Advisor:

Associate Advisor:

Associate Advisor:


Acknowledgements

This dissertation was completed thanks to the advice, encouragement, and support of many people.

First, I would like to express my sincere gratitude to my supervisor, Hiromi Otaka, who inspired my interest in phonetics and phonology and who patiently gave me valuable advice and feedback on my dissertation drafts. I also greatly appreciate his forbearance over the long years it took me to finish the dissertation.

In addition to my supervisor, I would like to thank the rest of my thesis committee members, Takaaki Kanzaki, Naoya Hase, and Keiichi Ishikawa, for their insightful comments and suggestions. I also wish to express my thanks to Katsumasa Yagi, Shuhei Kadota, and Hisao Asada for their invaluable advice, especially during my graduate school days.

I am grateful to Yuriko Kaito (Kansai University), who provided me with a supportive and stimulating academic environment in which to work.

I am also grateful to the members of the Fundamental Research Group of LET (The Japan Association for Language Education and Technology) Kansai, who made valuable comments on my study.

I thank my colleagues Thomas Pals and Byron O’Neil and the many students who participated in the experiments.

Special thanks go to Mark Sheehan, a former colleague, for his consistent encouragement and advice on my written English. I would also like to express my deep gratitude to the members of staff at Kwansei Gakuin University Language Center for their generous time and support. I am also extremely grateful to Kwansei Gakuin University for the scholarship that enabled me to undertake a PhD program at the Graduate School of Language, Communication, and Culture for three years.

This research was partially supported by the Grant-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology in Japan (Nos. 24720262 and 26370681). Finally, I want to express my deepest appreciation to my parents and sister for their understanding and generous support of my graduate research.


Abstract

PRODUCTION OF ENGLISH SCHWA BY JAPANESE SPEAKERS

by

Kaori Sugiura

This dissertation acoustically investigates how Japanese speakers deal with the English schwa when speaking English. This investigation is of interest because the Japanese language does not have a reduced vowel equivalent to the English schwa. In addition, English and Japanese differ in language rhythm: English has a stress-timed rhythm in which schwa plays a crucial role, whereas Japanese has a mora-timed rhythm, in which a reduced vowel like schwa is not necessary to generate the rhythm. Given these factors, the questions raised here are as follows: (1) how Japanese speakers pronounce the English schwa (in loanword adaptation into Japanese and in L2 speech production) and (2) whether and how Japanese speakers can improve their pronunciation of the English schwa through training with auditory word repetition. The acoustic analyses focused on the duration ratio and the quality of schwa. The results of the online experiment on loanword adaptation indicate that, when auditory input is provided, the English schwa is basically adapted into phonetic approximations that reflect the quality of schwa, which varies in accordance with the phonetic environment. This suggests that Japanese speakers of English are sensitive to subtle differences in its quality. In addition, when orthographic information is available alongside auditory input, it greatly affects the adaptation of schwa. The results of the word-reading task demonstrate that Japanese learners of English at the intermediate and high levels have problems with the duration of schwa in the initial syllable. Additionally, the duration aspect of schwa develops more easily than the quality aspect, suggesting that Japanese speakers can effectively exploit the phonetic cue of duration in L2 acquisition. Lastly, the results of the three experiments investigating the effect of immediate repetition of auditory words on pronunciation improvement confirmed that repetition is effective in restructuring the existing phonological representation in terms of duration, resulting in pronunciation improvement. More specifically, (1) a weakly represented linguistic pattern (the weak-strong stress pattern, with schwa in the initial syllable of a word) is strengthened by repetition; (2) a certain amount of orthographic input is necessary after intensive repetition of auditory words, suggesting that the timing and amount of auditory and orthographic input greatly influence the formation of a target-like L2 phonological representation of schwa (this was clearly shown in the duration ratio); and (3) attention to syllables in auditory words facilitates (albeit insufficiently) the shaping of English rhythm in the phonological representation.

In conclusion, the dissertation suggests the importance of L2 pronunciation pedagogy and of further research from the perspective of L2 phonological processing.


Contents

Members of Evaluation Committee
Acknowledgements
Abstract
Introduction
Chapter 1 Literature Review
1.1 Characteristics of English Schwa
1.1.1 Two Types of Schwa
1.1.2 Phonetic Characteristics of Schwa in English
1.1.2.1 Duration
1.1.2.2 Effect of speech rate on the duration of schwa
1.1.2.3 Formant frequencies
1.1.2.4 Effect of schwa position in a word on its quality
1.1.2.5 Vowel duration variability and distribution of schwa
1.1.2.6 Summary of the phonetic characteristics of schwa
1.1.3 Phonological Characteristics of English Schwa
1.1.3.1 Phonological status of schwa
1.1.3.2 Historical development of schwa
1.1.3.3 Syllabic consonants and schwa in a word final syllable
1.1.3.4 Summary of the phonological characteristics of schwa
1.2 English and Japanese Phonologies

1.2.2 Devoiced Vowels
1.2.3 A Mora and a Syllable
1.2.4 English and Japanese Word Accents
1.2.5 Language Rhythm
1.2.6 Summary of English versus Japanese Phonologies
1.3 Preference of Word Accent Patterns in English and Japanese
1.3.1 Preference of Word Accent Patterns by English-speaking Children
Japanese-speaking Children
1.3.3 Summary of Preference of Word Accent Patterns in English and Japanese
1.4 Loanword Adaptation
1.4.1 Loanword Adaptation Theories
1.4.1.1 Phonological views
1.4.1.2 Phonetic views
1.4.1.3 Integrated views
1.4.2 Summary of the Findings and Further Research
1.5 Second Language Acquisition of Word Stress and Schwa
1.5.1 Duration of Schwa
1.5.2 Quality of Schwa
1.5.3 Relations between Schwa and Rhythm: Schwa’s Phonetic Behaviors in a Rhythmic Unit
1.5.4 Summary of the Findings and Further Research
1.6 Second Language Pronunciation Training

1.6.1 Effects of Intensive Experience of L2 Auditory Input on L2 Pronunciation Improvement
1.6.2 Summary of the Findings and Further Research
Chapter 2 Study 1: Loanword Adaptations of English Schwa into Japanese
2.1 Introduction
2.2 Background
2.3 Research Questions
2.4 Methods
2.4.1 Participants
2.4.2 Experimental Paradigm
2.4.3 Materials
2.4.4 Presentation of Materials
2.4.5 Carrier Sentences
2.4.6 Procedures
2.4.7 Analysis
2.5 Results and Discussion
2.5.1 Schwa in the Pre-tonic Syllable
2.5.2 Schwa in the Post-tonic Syllable
2.5.3 Orthographic Effect
2.5.4 Comparison of Results with Existing Loanwords
2.6 Chapter 2: Conclusions and Further Research
2.6.1 Summary of Findings
2.6.2 Limitations and Further Research
2.6.3 Pedagogical Implications
Chapter 3 Study 2: Production of Schwa by Japanese Speakers of English

3.2 Background
3.2.1 Phonetic Features Used in English and Japanese
3.2.2 The Characteristics of Schwa according to its Position
3.2.3 The Prediction for which Position Schwa is Easy to Acquire: Markedness Differential Hypothesis
3.3 Research Questions and Hypotheses
3.4 Methods
3.4.1 Participants
3.4.2 Materials
3.4.3 Analysis
3.5 Results and Discussion
3.5.1 Results and Discussion on Duration Ratio
3.5.1.1 Results on the duration
3.5.1.2 Discussion on duration ratio
3.5.2 Results and Discussion on Quality
3.5.2.1 Results on F1 values
3.5.2.2 Discussion on F1 values
3.5.2.3 Results on F2 values
3.5.2.4 Discussion on F2 values
3.5.2.5 Summary of F1 and F2 values
3.6 Chapter 3 Conclusion and Further Research
3.6.1 Summary of Findings
3.6.2 Limitations and Further Research
Chapter 4 Study 3: Pronunciation Training for Schwa by Auditory Word Repetition

4.1.1 Characteristics of Schwa Realized by Japanese Speakers of English
4.1.1.1 Duration ratio
4.1.1.2 Quality
4.1.2 Auditory Words Repetition
4.1.3 Experimental Design
4.1.4 Second Language Acquisition Process and the Present Study
4.2 Experiment 1: Effects of the Number of Repetitions and Characteristics of Word Stimuli on Pronunciation Improvement
4.2.1 Introduction to Experiment 1
4.2.2 Background
4.2.2.1 Amount of input (number of repetitions)
4.2.2.2 Word familiarity
4.2.2.3 Position of schwa in a word (word stress pattern)
4.2.2.4 Persistent effect
4.2.3 Research Questions
4.2.4 Method
4.2.4.1 Participants
4.2.4.2 Materials
4.2.4.3 Counterbalancing the materials
4.2.4.4 Procedures
4.2.4.5 Analysis
4.2.5 Results and Discussion
4.2.5.1 Duration ratio

4.2.5.3 Quality: F2 values
4.2.6 Chapter 4.2 Conclusion and Further Study
4.2.6.1 Summary of findings
4.2.6.2 Limitation and further study
4.3 Experiment 2: Effect of Orthographic Information on Pronunciation Improvement
4.3.1 Introduction to Experiment 2
4.3.2 Background and Present Study
4.3.2.1 Effect of orthography on L2 pronunciation
4.3.2.2 Pronunciation training and elicitation task
4.3.3 Research Questions
4.3.4 Methods
4.3.4.1 Participants
4.3.4.2 Materials
4.3.4.3 Procedures
4.3.4.4 Analysis
4.3.5 Results and Discussion 1: Comparison of the Pronunciation in Non-studied Conditions in Tests 1 and 2
4.3.5.1 Duration ratio in non-studied conditions in Tests 1 and 2
4.3.5.2 Quality in non-studied conditions in Tests 1 and 2
4.3.5.3 Summary of duration ratio and quality in non-studied conditions in Tests 1 and 2
4.3.6 Results and Discussion 2: Training Effects on Duration

4.3.6.1 Duration ratio: Test 1
4.3.6.2 Duration ratio: Test 2
4.3.6.3 Duration ratio: Presentation timing of orthography in Tests 1 and 2
4.3.6.4 Discussion on the duration ratio
4.3.7 Results and Discussion 3: Training Effects on Quality
4.3.7.1 Quality: Test 1
4.3.7.2 Quality: Test 2
4.3.8 Chapter 4.3 Conclusion and Further Study
4.3.8.2 Limitations and further study
4.4 Experiment 3: Learners’ Attention to the Phonological Form of Auditory Words
4.4.1 Introduction to Experiment 3
4.4.2 Background
4.4.3 Research Questions
4.4.4 Methods
4.4.4.1 Participants
4.4.4.2 Materials
4.4.4.3 Procedures
4.4.4.4 Analysis
4.4.5 Results and Discussion
4.4.5.1 Results and discussion on duration ratio
4.4.5.2 Results and discussion on quality
4.4.6 Discussion
4.4.6.1 Test 1

4.4.7 Chapter 4.4 Conclusion and Further Study
4.4.7.2 Limitations and further study
4.5 Summary of Experiments 1, 2, and 3 and Further Study
Conclusion
References


Introduction

This dissertation acoustically investigates how Japanese speakers produce the English schwa [ə] when adapting it into Japanese and when speaking English as a second language (L2), and whether or not they can improve the pronunciation of schwa by the immediate repetition of auditory words that include schwa.

The English schwa, which is articulated with the tongue returning to its mid-central rest position (Whitley, 2004, p. 150), is a unique vowel compared to the full vowels. First, this sound occurs only in reduced syllables that do not receive stress, and it is likely to be pronounced with shorter duration and lower pitch and intensity than a vowel in a stressed syllable (Wallace, 1994). Second, its quality is greatly influenced by the adjacent phonetic environment (e.g., Kondo, 1994). Third, the schwa occurs more often than any other vowel in English and appears in all vowel orthographies (Cruttenden, 2014, p. 138). Lastly, the schwa plays an important role in creating a stress-timed language rhythm (e.g., Abercrombie, 1967, p. 97; Pike, 1946). Since language rhythm affects speech intelligibility (e.g., Tajima, Port, & Dalby, 1997), it is crucial for L2 learners to acquire English schwa.

Investigating how Japanese speakers pronounce English schwa is of interest for two reasons. First, the Japanese language does not have such a reduced vowel in its vowel system. Second, English and Japanese differ in language rhythm: English has a stress-timed rhythm in which schwa plays a crucial role, while Japanese has a mora-timed rhythm, in which a reduced vowel like schwa is not required to produce the rhythm.

This dissertation, which focuses on English schwa produced by Japanese speakers, examines two aspects of the schwa in particular: the duration ratio of schwa to a stressed vowel in a word (henceforth, the duration ratio) and the quality, both of which greatly influence English rhythm (e.g., Beckman, 1986, for the duration ratio; Grabe & Low, 2002, for the quality). In terms of the duration ratio, Japanese learners of English might have difficulty in pronouncing a native-like schwa because, in the rhythm of the Japanese language, each mora is pronounced with approximately equal duration, whereas English has a stress-timed rhythm characterized by the alternation of stressed and unstressed syllables, in which the stressed syllables are significantly longer than the unstressed ones, where schwa appears most frequently (Bolinger, 1965). As for quality, probably because Japanese does not have a sound that is phonologically identical to the schwa (Vance, 1987), Japanese speakers might depend on their L1 full vowels and fail to produce the target quality when pronouncing schwa.

This dissertation centers on an investigation of, first, how Japanese speakers perceive and pronounce the English schwa. Second, the research delves into whether or not and how they can improve the English schwa through training with auditory word repetition, by focusing on the duration ratio and the quality.

Study 1 investigates how Japanese speakers perceive English schwa and produce it from the perspective of loanword adaptation (Chapter 2). The study sheds light on whether or not Japanese speakers are sensitive to the quality of schwa, which varies with neighboring phonetic environments.

Then, Study 2 investigates how Japanese speakers of English pronounce the English schwa in a word when it is presented in written form (Chapter 3). The positions of schwa (initial, medial, and final syllables) in a word are examined. Since investigations of the contexts in which schwa may be more difficult for Japanese speakers are lacking (Tomita, Yamada, & Takatsuka, 2010), the study will contribute to the literature.

Based on the findings from these two studies investigating the characteristics of English schwa perceived and produced by Japanese speakers, and to address the second objective, Study 3 of this dissertation investigates whether and how Japanese speakers can improve their pronunciation of English schwa through pronunciation training (Chapter 4).

As a training method, the immediate repetition of auditory words, in which learners listen to and repeat the presented auditory words as quickly as possible, is used because the method is considered to allow learners to imitate specific phonetic details of stimuli that are not used phonologically in their first language (Goldinger, 1998). As a unique aspect of this study, the amount (i.e., the number of repetitions) and quality (i.e., word familiarity, the position of schwa in a word, orthographic information, and acoustically enhanced syllables of a word) of the auditory words as language input are controlled in the experiments because these factors are crucial for facilitating the processing of L2 input. This investigation is expected to reveal how much auditory input and what type of linguistic information Japanese learners are sensitive to and utilize to improve their pronunciation of schwa. The findings of this study are discussed in terms of phonological encoding in the model of L2 speech production (Kormos, 2006), and, using the model, further research on improving the production of English schwa by Japanese learners of English is proposed. The view of L2 speech processing will pave the way for new perspectives on L2 pronunciation research and teaching.

The dissertation is organized as follows: Chapter 1 provides a literature review, including studies on English schwa, Japanese and English phonology, the theories of loanword adaptation, production of English schwa (or unstressed vowels) by L2 learners, and oral repetition in L2 pronunciation training. Chapter 2 (Study 1) presents findings on how Japanese speakers perceive English schwa and adapt it into Japanese sounds. Chapter 3 (Study 2) reports the position in which it is most difficult for Japanese learners of English at intermediate and advanced levels to produce English schwa in the same manner as native speakers of English, in terms of the duration ratio of schwa to a stressed vowel and the quality of schwa. Chapter 4 (Study 3) shows the effect of auditory word repetition on the improvement of English schwa, in terms of the duration and quality of schwa, by Japanese learners at the intermediate level. Finally, the Conclusion summarizes the findings from the perspective of L2 speech production and argues for the significance of this dissertation.


Chapter 1 Literature Review

1.1 Characteristics of English Schwa

Schwa is uniformly transcribed as /ə/ in the International Phonetic Alphabet. However, it is often observed that the quality of schwa in languages varies greatly across contexts. Focusing on the English schwa, this section first introduces the lexical and phonetic schwa. Then, it provides the phonetic and phonological characteristics of schwa. In terms of phonetic nature, its duration and quality are introduced along with how other factors, such as phonetic environments, the positions in a word, and speech rates, influence the phonetic characteristics of schwa. The information on the phonetic characteristics of schwa in this study is crucial, as it establishes the criteria for judging the existence of schwa produced by the participants.

In terms of phonology, schwa is explained from the perspective of historical phonology. In addition, the issue of the ubiquity of syllabic consonants and schwa in the final syllable of a word (Cohen, 1957; Toft, 2002) is described for a better understanding of English schwa.

1.1.1 Two Types of Schwa

Two types of schwa can be categorized according to their phonological characteristics. The first is obligatory schwa¹ (Bolinger, 1985) or a lexical vowel reduction (Van Bergem, 1995). This type of schwa can occur in any position (i.e., initial, medial, or final) in a content word (e.g., ‘ago,’ ‘atom,’ ‘column,’ ‘telephony,’ and ‘sofa’²) (Roca & Johnson, 1999; Whitley, 2004). It occurs irrespective of local contexts, such as stress and speaking rate. The present study examined this lexical schwa.

¹ Since schwas in function words are not lexically significant, as opposed to the ones in content words, several factors, such as planning problems in speech production, predictability, the segmental context, and the rate of speech, greatly influence their phonetic forms (Jurafsky, Bell, Fosler-Lussier, Girand, & Raymond, 1998).

The second is a non-obligatory or non-lexical schwa, called an acoustic vowel reduction (Van Bergem, 1995). It occurs in function words (e.g., ‘a,’ ‘the,’ ‘to,’ and ‘of’) (Jurafsky et al., 1998). This type of schwa occurs when vowels are unintentionally reduced due to speech rate and speech style. Thus, it is not phonologically represented in a word (Dalby, 1984; Van Bergem, 1995).

As a non-lexical schwa, a transitional schwa (Davidson, 2005, 2006; Gick & Wilson, 2006) appears in the sequence of a high tense vowel (i.e., [i]) plus liquid (i.e., [l] or [r]), as in the English words ‘heel,’ ‘hail,’ and ‘hire.’ It does not have phonemic status because it appears as a phonetic result of the tongue passing through a schwa-like configuration during the transition from the preceding high vowel toward the coda [l]. Producing a transitional schwa is a strategy for reconciling an intrinsic conflict between the articulatory targets (e.g., [i] and [l] in ‘heal’ [hi:l]). Compared to the lexical schwa, the transitional schwa exhibits a lower first formant frequency and a very short duration, which result from closing the mouth more than when pronouncing the lexical schwa (Flemming, 2004). It should be kept in mind that defining non-lexical schwa is not straightforward. Specifically, no linguistic criterion exists for deciding exactly under what conditions³ an underlying full vowel in a function word becomes schwa. However, among the function words, ‘a’ and ‘the’ are most often realized with schwa (about 76% and 70%, respectively), compared to other function words, such as ‘in,’ ‘on,’ and ‘and’ (Jurafsky et al., 1998).

³ It is known that the vowel in a function word such as ‘at,’ ‘in,’ and ‘on’ can be full and stressed in careful speech, but may become a schwa in casual speech.

1.1.2 Phonetic Characteristics of Schwa in English

This section describes the phonetic nature of schwa in terms of duration of schwa, effect of speech rate on the duration, quality (formant frequencies), and effect of schwa’s position in a word on its quality. In addition, the relationship between the duration and quality of schwa is introduced.

1.1.2.1 Duration

Wallace (1994) examined the phonetic characteristics of schwa using a large speech corpus of two American male speakers in three different modes (conversation, sentence reading, and wordlist reading). She indicated that the duration of the schwa varied depending on its position within a word. In word-initial and word-internal syllables (between 33 and 53 milliseconds), the schwa had a relatively shorter duration than in word-final syllables (59 to 131 milliseconds). The schwa also showed a longer average duration in the word mode (71 milliseconds) than in the sentence and conversation modes (50 and 58 milliseconds).

In terms of the duration ratio of a weak syllable to a stressed syllable in different word positions, the schwa in a non-final syllable of a word was 0.47 of the length of a stressed vowel, whereas its counterpart in a final syllable was 0.63. These findings are consistent with those of previous studies. Liberman (1960), who compared vowels with and without stress in minimal pairs distinguished by stress (e.g., CONtrast vs. conTRAST), found that stressed vowels were 66% longer than unstressed vowels. Oller (1973) and Lehiste (1975) demonstrated that an unstressed syllable was 65% shorter than a stressed syllable.

In sum, the duration of schwa is sensitive to its position in a word, and the duration ratio of a weak syllable including a schwa to a strong syllable is around 0.4–0.65.
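The duration ratio summarized above can be illustrated with a minimal Python sketch. The function name `duration_ratio` and the example measurements are hypothetical and chosen only for illustration; they are not data or procedures taken from the studies cited here.

```python
def duration_ratio(schwa_ms: float, stressed_ms: float) -> float:
    """Return the schwa duration divided by the stressed-vowel duration."""
    if schwa_ms < 0 or stressed_ms <= 0:
        raise ValueError("schwa_ms must be >= 0 and stressed_ms must be > 0")
    return schwa_ms / stressed_ms


# Hypothetical measurements in milliseconds (illustrative only, not study data).
example_ratio = duration_ratio(schwa_ms=50.0, stressed_ms=100.0)

# Compare the ratio with the approximate range summarized in the text (0.4 to 0.65).
within_reported_range = 0.4 <= example_ratio <= 0.65

print(f"duration ratio = {example_ratio:.2f}, within 0.4-0.65: {within_reported_range}")
```

A ratio well above this range would suggest that the weak syllable is not sufficiently shortened relative to the stressed syllable.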

1.1.2.2 Effect of speech rate on the duration of schwa

Speech rate is a linguistic factor that influences the acoustic characteristics of vowels (Bell et al., 2003; Wallace, 1994) or causes vowels to become increasingly reduced in fast speech. Although many studies have examined whether speaking rate affects the quality and duration of full vowels (e.g., Crystal & House, 1988a, 1988b; Fourakis, 1991; Tuller, Kelso, & Harris, 1982), few studies have dealt with the schwa in English.

In their study of the relationship between speech rate and the duration of vowels, including schwa, in Dutch, Van Son and Pols (1992) found that the duration of schwa was unrelated to speech rate and claimed that a schwa is by nature too short to be shortened further. However, this result might have been due to the speech rate not being fast enough to make the schwa shorter, and further evidence may be needed to support it. Although it remains unconfirmed whether speech rate influences the reduction of the schwa, several studies have investigated the relationship between speech rate and the deletion of schwa.

First, Rubach (1977) reported that a reduced vowel followed by a sonorant consonant tends to be shortened in rapid and casual speech; for example, in the word ‘nation,’ [-ʃən] becomes [-ʃn]. In addition, Patterson, LoCasto, and Connine (2003) conducted a large-scale corpus study showing that the speech rate affects schwa deletion, although it is not the most influential factor in the phenomenon. According to Davidson (2006), the elision of pre-tonic schwa (e.g., ‘believe’ → ‘blieve’) in fast speech is due to the overlapping of sequential gestures (Byrd & Tan, 1996). Specifically, the schwa may appear to be deleted on the surface, but it is actually produced and is merely hidden by the overlap. For example, in the case of /#C1əC2-/ sequences (e.g., dəl-, məl-, bəl-, səl-), if the consonant overlaps the schwa, a remnant of the schwa should still appear on the surface. However, if the portion of the schwa remaining in the acoustic signal is very short, it may be difficult to distinguish the schwa both perceptually and visually from the following /l/. This is because /l/ has a vowel-like formant structure that is influenced by co-articulation; thus, the remaining schwa portion cannot be distinguished from the following /l/ (Ladefoged & Maddieson, 1996). In summary, speech rate can potentially influence the realization of phonetic variants of schwa in English; thus, this factor merits consideration in the present study.


1.1.2.3 Formant frequencies

Vowel quality is described in terms of formant frequencies. The first formant frequency (F1) reflects the degree of mouth opening; the second formant frequency (F2) primarily reflects the front-back position of the tongue (Lindblom & Sundberg, 1971). Due to the vowel reduction, schwa is likely to appear in the central vowel position and the F1 and F2 range of middle frequency values. An acoustical analysis by Wallace (1994), based on the large corpus of speech produced by two male speakers, shows that schwa has an F1 of 497 Hz, close to the value of a neutral vowel (i.e., 500 Hz) and an F2 of 1685 Hz.

Researchers have held two main views on the phonetic realization of schwa in terms of formant frequencies. Most researchers agree that a schwa occurs at the center of the vowel diagram as a result of destressing (Crystal, 1985; Delattre, 1969; Fourakis, 1991; Gimson, 1980; Koopmans-van Beinum, 1980). Due to this centralization, the formant frequencies of the schwa approach those of a neutral vowel, that is, around 500 Hz, 1500 Hz, and 2500 Hz for F1, F2, and F3, respectively (Fant, 1960).
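The neutral-vowel values of roughly 500, 1500, and 2500 Hz can be reproduced with the standard uniform-tube approximation, in which the vocal tract is idealized as a tube closed at the glottis and open at the lips, with resonances at F_n = (2n − 1)c / 4L. The sketch below is only an illustration under stated assumptions: the tract length (17.5 cm) and the speed of sound (350 m/s) are commonly used textbook values, not figures given in the text, and the function name `neutral_formants` is hypothetical.

```python
def neutral_formants(tract_length_m: float = 0.175,
                     speed_of_sound_m_s: float = 350.0,
                     count: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Both default values are illustrative assumptions, not values from the text.
    """
    if tract_length_m <= 0 or speed_of_sound_m_s <= 0 or count < 1:
        raise ValueError("length, speed, and count must be positive")
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * speed_of_sound_m_s / (4 * tract_length_m)
            for n in range(1, count + 1)]


# Rounded to whole hertz to avoid displaying floating-point noise.
print([round(f) for f in neutral_formants()])
```

Under these assumptions, a shorter vocal tract yields proportionally higher resonances, which is one reason why reference values for a neutral vowel differ across speakers.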

Another view is that schwa is assimilated into neighboring phonetic contexts (De Jong, Beckman, & Edwards, 1993; Fowler, 1981; Magen, 1984), and thus its formant frequencies demonstrate more variation than those of stressed vowels (Browman & Goldstein, 1990).

Recent studies resolved this issue and concluded that both centralization and phonetic contextual assimilation are involved in determining the acoustic qualities of schwa (Browman & Goldstein, 1992). Kondo (1994) clearly demonstrated that the quality of schwa in British English is realized in a certain target area (i.e., around 500 Hz) in F1, but F2 values vary according to the neighboring phonetic environments due to co-articulation. Kondo’s (1994) study examined acoustic sequences of VCəCV⁴ produced by British English speakers, with combinations of three vowels [ɪ, æ, u] and three consonants [p, t, k] that differ in place of articulation (the labial consonant [p], the coronal consonant [t], and the velar consonant [k]). The example sentences used for data collection were:

Please dip a pin in the solution.

You may pick a kitten from the basket.

Table 1.1 shows the F1 and F2 values at the midpoint of the schwa produced in the three consonant × three vowel contexts. In terms of F1, the formant values are relatively consistent regardless of the types of vowels and consonants (e.g., F1 = 301, 276, and 284 Hz for the vowel [ɪ] with the consonants [p], [t], and [k], respectively). As for F2, the values differ according to the place of articulation of the neighboring consonants (e.g., F2 = 1391, 1695, and 1945 Hz for the vowel [ɪ] with the consonants [p], [t], and [k], respectively). Thus, this study showed that schwas with a neighboring [k] are distributed in a higher F2 region of the vowel space than those with the neighboring consonant [p] due to the co-articulation of successive sounds. From these findings, Kondo (1994) concluded that the schwa is specified for F1 but unspecified for F2.
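The contrast between a stable F1 and a context-dependent F2 can be made concrete by computing the spread (maximum minus minimum) of each formant across the three consonant contexts. The sketch below uses the values reported in Table 1.1; the dictionary layout and the helper name `spread` are illustrative choices and not part of Kondo’s (1994) analysis.

```python
# F1 and F2 values (Hz) of schwa from Table 1.1, keyed by vowel and consonant context.
TABLE_1_1 = {
    "ɪ": {"F1": {"p": 301, "t": 276, "k": 284}, "F2": {"p": 1391, "t": 1695, "k": 1945}},
    "æ": {"F1": {"p": 321, "t": 283, "k": 293}, "F2": {"p": 1266, "t": 1567, "k": 1850}},
    "u": {"F1": {"p": 312, "t": 275, "k": 286}, "F2": {"p": 1263, "t": 1640, "k": 1562}},
}


def spread(values_by_consonant: dict[str, int]) -> int:
    """Return max minus min across consonant contexts (a simple variability measure)."""
    values = list(values_by_consonant.values())
    return max(values) - min(values)


# Spread of each formant across [p], [t], [k] for every vowel context.
spreads = {
    vowel: {formant: spread(by_consonant) for formant, by_consonant in formants.items()}
    for vowel, formants in TABLE_1_1.items()
}

for vowel, result in spreads.items():
    print(f"[{vowel}] F1 spread = {result['F1']} Hz, F2 spread = {result['F2']} Hz")
```

In every vowel context the F2 spread is roughly ten times or more the F1 spread, which is in line with the description of schwa as specified for F1 but unspecified for F2.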

Table 1.1

A sample of F1 and F2 values of schwa in the VCəCV sequences produced by a British English speaker

                F1 (Hz)                  F2 (Hz)
Vowel     [p]    [t]    [k]       [p]     [t]     [k]
[ɪ]       301    276    284       1391    1695    1945
[æ]       321    283    293       1266    1567    1850
[u]       312    275    286       1263    1640    1562

Note. Adapted from “Targetless schwa: is that how we get the impression of stress timing in English?” by Y. Kondo, 1994, Proceedings of the Edinburgh Linguistics Department Conference ’94, p. 67.

1.1.2.4 Effect of schwa position in a word on its quality

Traditionally, all reduced vowels were transcribed as [ə] (Chomsky & Halle, 1968), but several researchers have argued that two types of reduced vowels exist, [ɨ] as found in ‘roses’ [ˈroʊzɨz] and [ə] as in ‘Rosa’ [ˈroʊzə], and that they should not be described in a uniform way (e.g., Flemming & Johnson, 2007; Kenstowicz, 1994; Ladefoged, 2001). Both [ɨ] and [ə] are central vowels but are distinguished by height (Trager & Smith, 1951) (see also Figure 1.1).

In addition, Flemming and Johnson (2007) verified that a word-final schwa (e.g., ‘sofas,’ ‘Rosas’) has a different phonetic quality from schwas in other positions (e.g., ‘suggest,’ ‘today,’ ‘begin,’ ‘probable’), even though it has been uniformly transcribed as [ə]. They argued that the word-final schwa tends to exhibit a mid-vowel quality (i.e., 539 Hz for F1 and 1797 Hz for F2), while schwas occurring in other positions generally have an F1 of 449 Hz and a relatively higher F2 of 1922 Hz, which is close to that of [i] (Figure 1.2) (see also Cruttenden, 2014, p. 138, for variants of schwa /ə/).

Figure 1.1. Formant frequencies of all tokens of [ɨ] (filled triangles) and schwa (open squares) from the minimal pairs; the mean formant frequencies of the full vowels (gray circles). Adapted from “Rosa’s roses: reduced vowels in American English,” by E. Flemming and S. Johnson, 2007, Journal of the International Phonetic Association, 37, p. 93. Copyright 2007 by International Phonetic Association.

Figure 1.2. Formant frequencies of word-final schwa (filled triangles) and schwa (open squares) from the minimal pairs; the mean formant frequencies of the full vowels (gray circles). Adapted from “Rosa’s roses: reduced vowels in American English,” by E. Flemming and S. Johnson, 2007, Journal of the International Phonetic Association, 37, p. 93. Copyright 2007 by International Phonetic Association.

1.1.2.5 Vowel duration variability and distribution of schwa

With regard to the relation between the duration and quality of schwa, Grabe and Low (2002) proposed an acoustic-phonetic measure of rhythm called the Pairwise Variability Index (PVI). Their rhythmic classification of speech differs from traditional systems of classification in that it relies on phonetic rather than phonological features. Instead of measuring phonological units, such as inter-stress intervals or syllable durations, this index calculates durational variability according to successive acoustic-phonetic intervals (Grabe & Low, 2002). The PVI has two variants: the normalized PVI (nPVI) and the raw PVI (rPVI).

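A formula for calculating the nPVI is given in equation (a).

(a) $$\mathrm{nPVI} = 100 \times \left[ \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right| \Big/ (m-1) \right]$$

where m is the number of items in an utterance and d_k is the duration of the kth item.

The latter, the rPVI, is the measurement of the variability of consonantal intervals. A formula for calculating the rPVI score is given in equation (b).

(b) $$\mathrm{rPVI} = \left[ \sum_{k=1}^{m-1} \left| d_k - d_{k+1} \right| \Big/ (m-1) \right]$$

where m is the number of intervals, vocalic or intervocalic, in the text and d_k is the duration of the kth interval.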

With this rhythm indicator, stress-timed languages are expected to show relatively large nPVI values, which means that the duration of vowels in successive syllables varies (i.e., a stressed syllable is usually longer than a successive unstressed syllable). Syllable-timed languages, on the other hand, are expected to have relatively low nPVI scores (i.e., vowels in successive syllables tend to be equal in duration).
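The two indices can be illustrated with a minimal sketch in Python. Following Grabe and Low (2002), the nPVI is taken here as the mean absolute difference between successive interval durations, each normalized by the mean of the pair and multiplied by 100, and the rPVI as the mean absolute difference without normalization. The function names and the duration values are hypothetical and serve only as an illustration; they are not data from the studies cited here.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalized PVI: mean normalized difference of successive durations, times 100."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are required")
    pairs = list(zip(durations[:-1], durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)


def rpvi(durations: Sequence[float]) -> float:
    """Raw PVI: mean absolute difference of successive durations."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are required")
    pairs = list(zip(durations[:-1], durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical vowel durations in milliseconds (illustration only).
    alternating = [120, 60, 120, 60]  # long-short alternation, as in stress timing
    even = [90, 90, 90, 90]           # equal durations, as in syllable timing
    print(round(npvi(alternating), 1), npvi(even), rpvi(alternating))
```

With these invented values, the alternating sequence yields a much larger nPVI than the even sequence, which mirrors the expected difference between stress-timed and syllable-timed languages.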

Low, Grabe, and Nolan (2000) applied nPVI to British English (BE) and Singapore English (SE) to investigate the variability in the duration of successive vowels and the degree of vowel reduction in unstressed syllables, as reflected in their spectral patterns. They found that SE demonstrated significantly less variability between successive vowels (52.3 in the nPVI) than BE, and that its reduced vowels were located in the peripheral area of the F1 and F2 space. This finding suggests that the presence of schwa in a language can be predicted by vocalic variability to some extent; unstressed vowels in languages with low nPVI values may potentially be less reduced.

1.1.2.6 Summary of the phonetic characteristics of schwa

The findings on the phonetic characteristics of schwa, taken together, can be summarized as follows:

(1) There are two types of schwa: the obligatory lexical schwa, which appears in content words (e.g., ‘ago,’ ‘atom’), and the acoustic schwa, which appears in function words (e.g., ‘the,’ ‘a’).

(2) The duration of schwa is sensitive to its position in a word.

(3) Speech rate also seems to influence the deletion of schwa in a word.

(4) The F1 value of a schwa appears to be comparatively stable, while that of F2 varies according to the features of the adjacent consonants.

(5) The position of schwa in a word influences its acoustic qualities.

(6) The presence of schwa might be predicted by vocalic variability to some extent. That is, schwa is likely to exist in languages with large nPVI values (i.e., the duration of vowels in successive syllables varies).


Thus, these phonetic characteristics must be carefully considered in the present study.

1.1.3 Phonological Characteristics of English Schwa

This section introduces the phonological nature of schwa in terms of its phonological status, its historical development, and the relation between syllabic consonants and schwa in a word-final syllable.

1.1.3.1 Phonological status of schwa

This section briefly describes the phonological characteristics of schwa. Chomsky and Halle (1968, p. 111) explained the schwa phenomena with the phonological rules illustrated in Figure 1.3. The figure reflects the traditional view of English vowel reduction, in which every unstressed vowel is reduced to schwa. However, Burzio (1994) argued that this explanation is too simplistic and that vowel reduction is generally blocked in an unstressed closed syllable with obstruents,5 but not in one with sonorant consonants.

5 Obstruents include stops (e.g., [p], [t], [k]), fricatives (e.g., [f], [s], [ʒ]), and affricates (e.g., [tʃ], [dʒ]); sonorants refer to nasals (e.g., [n], [m], [ŋ]), liquids (e.g., [l], [r]), and glides (e.g., [j], [w]).

Figure 1.3. Schwa phenomena with the phonological rules. [Figure not reproduced; panels: “Vowel tensing rule in English” and “Vowel reduction rule in English.”]

1.1.3.2 Historical development of schwa

Ahn (2001) described the historical development of the phonological characteristics of schwa. The English schwa has diachronically changed in form over the last thousand years. Ahn (2001) stated that in Old English (OE) (eighth to eleventh centuries), schwa was an unreduced vowel even in unstressed syllables, and it was clearly pronounced. Although the schwa in Present-Day English is pronounced almost identically despite being represented by various spellings (e.g., ‘banana,’ ‘medium,’ ‘item,’ ‘random’), the schwa in OE represented different sounds with different spellings (Kenyon, 1966, p. 202). However, there is some evidence that certain vowels in unstressed positions were reduced from full vowels to weak and short forms in OE. This fluctuation of the schwa’s pronunciation is reflected in the spelling of <e> in West Saxon,6 for example, in heofenas (‘heaven’) and adesa (‘adze’) (Hogg, 1992, p. 247).

By the eleventh century, distinctions between the vowels [ɛ], [ʊ], and [a] had become vague in unstressed final syllables. This led to a historical transition to a uniform sound, [ə] (Gimson, 1972). This phenomenon had spread to all the vowels in unstressed syllables by the mid-fifteenth century, and the pronunciation of [ə] was similar to that of ME (see also Cruttenden, 2014, p. 139).

6 The West Saxon dialect is most often referred to as Old English.

1.1.3.3 Syllabic consonants and schwa in a word final syllable

Let us look closely at the historical change in word-final vowels, with which the current study deals. In Old English (OE), a vowel in the word-final position was fully articulated. However, in Middle English (ME) (eleventh to the late-fifteenth centuries), the final vowel was gradually reduced and evolved into a schwa. After the ME period, the word-final schwa was further reduced and ultimately deleted (Minkova, 1991b). Yamane (1996) analyzed this historical sound change using observations from Burzio (1994, pp. 112–126), Minkova (1982, 1991b), Kiparsky (1977), Kiparsky and O’Neil (1976, pp. 553–554), and Wells (1990). She found that the development of word-final schwa deletion varied with the environment within the final syllable. The relevant phonetic and phonological environments were as follows: open syllables, closed syllables with sonorant codas (e.g., [l, m, n, ŋ, r, w, or j]), and closed syllables with non-sonorant codas. Schwa deletion occurred over the course of three different periods: Late Old English (LOE), Late Middle English7 (LME), and Modern English (MnE).

As shown in Figure 1.4(a), in word-final open syllables, word-final schwa deletion became obligatory after the LME period. To explain this phenomenon, Yamane adopted Minkova’s (1982, 1991b) argument that schwa deletion occurs to compensate for open syllable lengthening (and vice versa) in order to preserve the metrical rhythm of the language.

7 LOE is a language that was used in the tenth to eleventh centuries, and LME was used in the fourteenth to sixteenth centuries (Stewart & Vaillette, 2001, p. 414).

Schwa deletion in closed syllables with sonorant consonants, as shown in Figure 1.4(b), was optional before the MnE period. Yamane (1996) noted that one of the reasons for this deletion was found in metrical poetry. In many cases, had schwa not been optionally deleted, the verse would have been unmetrical (Kiparsky, 1977; Kiparsky & O’Neil, 1976, pp. 553–554). Yamane (1996) further argued that schwa deletion became obligatory in the MnE period, since word-final sonorant consonants could become syllable nuclei without the insertion of schwa8 (Wells, 1990). Finally, Yamane (1996) contended that in word-final syllables with a vowel preceding a non-sonorant coda, as indicated in Figure 1.4(c), the vowel cannot be reduced or deleted because the remaining non-sonorant cannot play the role of the nucleus of the syllable on its own and needs the support of the vowel to form a syllable (Burzio, 1994, pp. 112–126). She concluded that the sonority of the word-final syllable seems to play a significant role in the possibility of vowel reduction and deletion. She added the interesting observation that liquid syllabic consonants, such as [l] and [r], trigger the deletion of schwa because these are the consonants closest to vowels in sonority and can take the phonological status of vowels (Hayes, 1995). Her observation has given us an important insight into word-final vowel deletion from the point of view of phonology.

8 However, Wells (1995) states that “the correct analysis of syllabic consonants is to treat them as constituting a phonetic manifestation of an underlying sequence of schwa plus an ordinary consonant. Thus ‘bottle’ is phonemically [batəl] and ‘button’ is phonemically [bʌtən].” That is, phonetically, he might accept that all sonorant consonants in word-final position can serve as syllable nuclei without an inserted schwa, but phonologically he might not. However, he admitted that “English has many words that exhibit fluctuation between a syllabic consonant and [ə] plus a non-syllabic consonant,” although he indicated the nasal syllabic consonants as examples.

Figure 1.4. Optional or obligatory schwa deletion in the three types of word-final environments in each period. PrWd = Prosodic Word, F = Foot, $’ = stressed syllable, $ = unstressed syllable, @ = schwa, # = word boundary. LOE = Late Old English, LME = Late Middle English, prE = present English.9 Adapted from “Rekishiteki Oninhenka to Saitekisei Riron: Schwa shoushitsu no Baai,” in Onin Kenkyuukai Souritsu 10 Shu Nen Kinen Ronbunn Shu, edited by Onin Ron Kenkyu Kai, 1996, p. 168. Copyright 1996 by Kaitakusha.

9 According to Traugott (2008, p. 23), “The approximate periods of the history of English is referred to: Old English 650-1150, Middle English 1150-1500, Early Modern English 1500-1750, Modern English 1750-1970, Present Day English 1970.”

Toft’s (2002) empirical study supports Hayes’s (1995) view that liquid syllabic consonants, such as [l] and [r], trigger the deletion of schwa in a word-final syllable because the consonants closest to vowels in sonority can take on the phonological status of vowels. She argues that the syllabic consonants /l̩/ and /n̩/, where /l̩/ is higher than /n̩/ in sonority, behave differently. The syllabic consonant /n̩/ is actually pronounced [ən] when preceded by a non-coronal consonant (/p/, /b/, /k/, /g/, etc.), whereas syllabic /l̩/ is always pronounced as [l̩] regardless of the preceding consonant. Based on the phonetic evidence, as shown in Figure 1.5, Toft gave a phonological account of the phonetic behavior of the syllabic consonants: a syllabic /l̩/ is directly attached to a nuclear constituent in the phonological structure, whereas a syllabic /n̩/ is attached to the onset and sometimes spreads to the nucleus.

Figure 1.5. Structures for syllabic /l/ and syllabic /n/. “N” stands for nucleus and “O” for onset. X shows a slot.10 From “The phonetics and phonology of some syllabic consonants in Southern British English,” by Z. Toft, 2002, ZAS Papers in Linguistics, 28, p. 134. Copyright by ZAS (Center for General Linguistics).

10 This is referred to as “occupying the nucleus position on the syllable tier” and used in the generative theory.

1.1.3.4 Summary of the phonological characteristics of schwa

The previous sections gave an overview of schwa in light of historical phonology and presented the issue of the presence and absence of schwa in unstressed closed syllables with sonorant consonants in the word-final position. The findings on the phonological characteristics of schwa are summarized as follows:

(1) In terms of the dynamic change of a full vowel to schwa, the reduction started in OE and prevailed in ME, becoming stable in the present form around the fifteenth to sixteenth centuries. This process probably occurred due to the economy principles in language.11

(2) Regarding the characteristics of syllabic consonants, the syllabic consonant /n̩/ is realized as [ən] when preceded by a non-coronal consonant (/p/, /b/, /k/, /g/, etc.), whereas syllabic /l̩/ is always realized as [l̩] regardless of the preceding consonant. Syllabic consonants should be carefully dealt with in the experiments in the present study.

11 Regarding the economy principles in language, Vicentini (2003) states that “in such a dynamic process as linguistic change, words are constantly being shortened, permuted, eliminated, borrowed and altered in meaning, but, thanks to the Principle of Least Effort, an equilibrium with a maximum of economy is always preserved” (p. 40).

1.2 English and Japanese Phonologies

This section presents the similarities and differences in vowels, syllables, word accents, and language rhythms between English and Japanese, in particular, to understand the acquisition of English schwa by Japanese learners of English.

1.2.1 Vowels

There are several differences between the vowels in English and Japanese in terms of the size of the inventory. American English exhibits nine phonemic vowels (Ladefoged, 1993), including [i, ɪ, ɛ, æ, e, ə, ʌ, ʊ, ɔ, and a] in the vowel space, whereas Japanese has five full vowels (i.e., [i, e, a, o, and u]), which are phonetically realized as monophthongs that are not reduced in length.

English does not possess phonemically contrastive vowels in terms of length, although it does have phonetically contrastive vowels; for example, a lax vowel /ɪ/ as in ‘bit’ /bɪt/ and a tense vowel /i/ as in ‘beet’ /bit/. Because of this difference, Japanese speakers of English often tend to distinguish English contrastive vowels by employing long and short categorical boundaries according to the Japanese strategy of phonemic distinction (Flege, 1995).

On the other hand, in Japanese, vowel length can be phonemically contrastive between short and long versions (i.e., one mora vs. two moras in syllable weight). The Japanese short vowels have their long counterparts: /a, aa/, /e, ee/, /i, ii/, /o, oo/, and /u, uu/. For example, in ‘biru’ (building), a short vowel /i/ is used, whereas in ‘biiru’ (beer), a long vowel is used. In this way, there is a contrast between long and short vowels in Japanese.

in peripheral areas in the F1 and F2 acoustic vowel space, as shown in Figure 1.6. The Japanese vowel space has an empty central area with no central vowel categories, whereas English has a mid–central vowel. Tables 1.2 and 1.3 indicate the vowel formant frequencies for Japanese vowels (Imaishi, 1997) and English vowels (Peterson & Barney, 1952). Due to the lack of a central vowel in the Japanese vowel system, it is plausible that the schwa produced by Japanese speakers of English might be mapped onto the sounds distributed in their native vowel spaces.


Figure 1.6. The vowels of Standard Japanese (left) and English (right). Adapted from “Japanese,” by H. Okada, 1991, Journal of the International Phonetic Association, 22(1), p.94. Copyright 1991 by Cambridge University Press and adapted from Technology Enhanced Accent Modification. (2008). American English Vowels. Retrieved from http://www.tap.msu.edu/team/online/Default.aspx

Table 1.2

F1, F2, and third formant frequencies (F3) for Japanese vowels (females)

Vowel    F1 (Hz)   F2 (Hz)   F3 (Hz)
[a]      978       1384      2716
[i]      381       2866      3699
[u]      390       1274      2760
[e]      510       2509      3246
[o]      567       894       3099

Table 1.3

F1, F2, and F3 for English vowels (females)

Vowel    F1 (Hz)   F2 (Hz)   F3 (Hz)
[i]      300       2800      3300
[ɪ]      430       2500      3100
[ɛ]      600       2350      3000
[æ]      860       2050      2850
[a]      850       1200      2800
[ɔ]      590       900       2700
[ʊ]      470       1150      2700
[u]      370       950       2650
[ʌ]      760       1400      2800
[ə]      575       1700      2800

1.2.2 Devoiced Vowels

Although the Japanese language does not have a phonological mid-central vowel, i.e., schwa (Kondo, 2000), it has a so-called devoiced vowel that shares common phonetic characteristics with schwa. It is, for example, intrinsically short in duration, low in intensity, and often occurs in unaccented syllables (Kondo). In Japanese, the high vowels /i/ and /ɯ/ frequently become voiceless between voiceless consonants and between a voiceless consonant and a pause (Kondo).

Although devoiced vowels and schwa share phonetic similarities, they differ in the process of phonetic realization (Kondo, 1994). Electromyographic data obtained by Yoshioka (1981) showed that vowel devoicing in Japanese occurs not from centralization or reduction, but from the glottal–gestural overlap between the consonants and vowels. One might assume that Japanese speakers of English have no difficulty in producing schwa because they can produce a weak vowel as a devoiced vowel in their speech. However, because the processes of articulating schwa and articulating the devoiced vowels are different, Japanese speakers may not be able to apply the same strategy for pronouncing the devoiced vowels to the production of the English schwa.

1.2.3 A Mora and a Syllable

English and Japanese syllables have different internal constituency: Japanese has a subsyllabic unit, a mora (Arisaka, 1940; Han, 1962; Hattori, 1965; Sugito, 1989; see Otaka, 2009, pp. 2-3, for a review), while English does not have a subsyllabic unit.


vowel (V), with optional initial and final margins (i.e., onsets and codas), which are typically consonants (Cs) (English Phonetic Society of Japan, 2004, pp. 229-230). English allows at most three consonants in an onset and four consonants in a coda (e.g., CCCV, ‘string’; VCCCC ‘sixths’) (Otaka, 1998, p.61).

A Japanese mora typically consists of a consonant plus a vowel. In addition, there are three special phonemes that can each form a mora (i.e., mora phonemes) (Otaka, 2009, p. 2): 1) hatsuon: /N/ (e.g., /taNbo/ ‘rice fields’ has three moras), 2) sokuon: geminate obstruents, transcribed as /Q/ followed by an obstruent (e.g., /kiQta/ ‘cut’ has three moras), and 3) hikion: the second half of a long vowel, transcribed as /R/ (e.g., /ojiRsan/ ‘grandfather’ has four moras, while /ojisan/ ‘man’ has three moras) (see Otaka, 2009; Warner & Arai, 2001). Because of the relatively simple structure of a consonant and vowel (e.g., CV, V), the open syllabicity, and the practice of kana orthography, in which kanas (syllabaries) correspond with moras, it is easy to count the moras (Otaka, 2009, p. 2).
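The counting procedure described above can be sketched in a few lines of Python. This is a simplified illustration, not a full phonological analysis: it assumes that the input is written in the phonemic notation used in this section, with lowercase vowels (a, i, u, e, o) and the uppercase symbols N, Q, and R for the three special moras, and it counts one mora for each vowel and one for each special symbol. The function name `count_moras` is hypothetical.

```python
VOWELS = set("aiueo")          # each vowel closes a (C)V mora
SPECIAL_MORAS = set("NQR")     # hatsuon /N/, sokuon /Q/, hikion /R/


def count_moras(transcription: str) -> int:
    """Count moras in a phonemic transcription such as 'taNbo' or 'kiQta'.

    Assumes the notation described in the text: every vowel and every
    special symbol (N, Q, R) contributes exactly one mora; other
    consonants are onsets and contribute none.
    """
    return sum(1 for ch in transcription if ch in VOWELS or ch in SPECIAL_MORAS)


if __name__ == "__main__":
    for word in ["taNbo", "kiQta", "biru", "biRru"]:
        print(word, count_moras(word))
```

Under these assumptions, /taNbo/ and /kiQta/ each count as three moras, while ‘biru’ (building) has two moras and ‘biiru’ (beer), written here as /biRru/, has three.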

Since Japanese speakers are accustomed to the simple structures of the mora, they may face difficulty in perceiving English words (e.g., Tajima & Erickson, 2001) and pronouncing complex English syllables appropriately (Otaka, 1998, p. 220).

1.2.4 English and Japanese Word Accents

English and Japanese exhibit different types of accents. English exhibits a stress accent; Japanese possesses a pitch accent. One of the differences between the two types of word accents lies in the phonetic features that realize the accents. The English accent is realized with pitch (fundamental frequency, F0), amplitude (intensity), and duration (Beckman, 1986), while Japanese mainly uses pitch (Beckman, 1986; Sugito, 1969). It should be noted here that Beckman (1986) demonstrated an interesting similarity between the word accents in the two languages by comparing fundamental frequencies of disyllabic words. She revealed that the contrast between falling (L) and rising (H) intonation contours in Japanese (e.g., initial vs. final accented words as in kata ‘shoulder’: HL vs. kata ‘form’: LH) was very similar to its English counterpart (initial vs. final stressed words, as in English ‘CONtrast’ vs. ‘conTRAST’). This finding suggests that the pitch factors in the two languages correspond to each other, and the type of pitch accent tone is limited to rising (H) and falling (L) (Akita, 2001, p. 137). Therefore, it can be assumed that for Japanese speakers of English, it is not difficult to perceive the pitch information of the English word accent.

Another difference between English and Japanese word accents is that in English, the phonetic features (e.g., pitch, intensity, and duration) involved in word accents are closely related to creating a stress-timed rhythm (Ueyama, 2000). On the other hand, the Japanese accent is independent of the rhythmic aspect observed in the English stress accent (Haraguchi, 1977). The rhythmic structure of Japanese largely depends on the number of moras and pauses with no contribution of pitch accent.

1.2.5 Language Rhythm

English and Japanese have been traditionally classified into different timing categories. English is a stress-timed language (Abercrombie, 1967), where stressed syllables, alternating with weak syllables, occur in the stream of speech at an equal interval of time called an inter-stress interval (ISI), or a foot.

On the other hand, Japanese is a mora-timed language (Hattori, 1960; Trubetzkoy, 1939; see Warner & Arai, 2001 for a review), where the mora called ‘haku’ (beat) is regularly produced at an equal time interval as in syllable-timed rhythms.

Note here that many researchers have taken the position that the theory of the ISI in stress-timed languages is more uncertain than has previously been asserted. Such theorists have put forward evidence from acoustic data that stressed syllables in English are not spaced at regular intervals (e.g., Classe, 1939; Lehiste, 1977). Furthermore, researchers have also argued that foot isochrony is a perceptual phenomenon, not a physical one. Darwin and Donovan (1980), Donovan and Darwin (1979), and Lehiste (1977), for example, found that listeners tend to perceive regularity in speech even where there is no such regular rhythm in a strict sense in the acoustic data.

Regarding mora-timed rhythm, Port, Dalby, and O'Dell (1987) provided a new definition of mora-timing, claiming that a word’s duration can be predicted from the number of moras in the word. Warner and Arai (2001) reviewed a number of major studies concerning mora-timing in Japanese, and concluded that although the concept of mora-timing began with the notion that moras were regularly timed, it seems that mora-timing is a matter of perceived duration rather than physical duration, as Bloch (1950) has also proposed.

In sum, English and Japanese employ different language rhythms. English is traditionally classified as a stress-timed language (Abercrombie, 1967; Pike, 1945), and Japanese has a mora-timed rhythm, although there are different opinions about the realizations of each rhythm (Bloch, 1950; Hattori, 1960; Trubetzkoy, 1939; see Warner & Arai, 2001 for a review). This difference in language rhythm might affect the production and perception of schwa by Japanese speakers of English in the present study.

1.2.6 Summary of English versus Japanese Phonologies

The previous sections briefly compared Japanese and English phonologies in terms of vowels, syllable units (a syllable vs. a mora), word accent, and temporal organization. The major differences between Japanese and English phonology are summarized as follows:

(1) The Japanese accent is independent of the rhythmic aspect, and Japanese, unlike English, does not have a schwa, which plays an important role in creating the English rhythm.

(2) In realizing the accent patterns, English uses several elements including pitch, intensity, and duration, whereas Japanese mainly uses pitch.

(3) English uses phonetically contrastive vowels in quality (i.e., lax and tense), whereas Japanese employs a phonemic category using length (i.e., short and long).

It is expected that due to these phonological differences in accent and vowels between the two languages, Japanese learners of English might have difficulty in dealing with the English accent and schwa. Thus, particular attention should be paid to the use of


1.3 Preference of Word Accent Patterns in English and Japanese

The present section reviews past studies that investigated the preference of accent patterns by children in English and Japanese. If the common preference in word accent patterns is found cross-linguistically in the first language (L1) acquisition, the preference might also be possible in second language (L2) acquisition.

English-speaking Children

In the acquisition of L1 phonology, a number of empirical studies reported that young children have difficulty in learning schwa, showing that they tend to produce deformed (i.e., a full vowel) or omitted unstressed syllables in their speech (e.g., Allen & Hawkins, 1980; Carter & Gerken, 2003, 2004; Demuth, 1994; Gerken, 1991, 1994; Gerken, Landau, & Remez, 1990; Gerken & McIntosh, 1993; Vihman, 1996).

Allen and Hawkins (1978, 1980) were the first scholars to claim that young English-speaking children have difficulty in alternating stressed and unstressed syllables. According to the researchers, English-speaking infants tend to omit unstressed functional morphemes that precede syllables with primary stress in multisyllabic words in their spontaneous speech; for example, ‘elephant’ becomes ‘ephant*’12 and ‘banana’ becomes ‘nana*.’ Also, they stated that children slowly develop the ability to reduce unstressed vowels and that the speech production of one- or two-year-olds sounds syllable-timed (Allen & Hawkins, 1980). Three-year-old children accurately produce reduced syllables 33–70% of the time, but by the age of four or five, as their speech rate increases, they are able to produce a greater number of reduced vowels with an adult-like stress-timed rhythm (Allen & Hawkins, 1978). Similarly, Nittrouer (1993) showed that young children produce a longer reduced vowel in place of a formal schwa as late as seven years of age.

Researchers studying young children’s phonological development have maintained that there is a template for the development of language-specific timing patterns. Allen and Hawkins (1978, 1980) propose that children are in favor of producing a certain rhythmic pattern, i.e., the trochaic (strong–weak) foot in phonological development.

Gerken (1991) and her colleagues claim that young children13 are more likely to produce determiners such as ‘the’ with a schwa when these could be prosodified as the weak syllable of a strong–weak foot, as shown in example (a), than as a part of a weak–strong syllable foot, as shown in example (b) (Gerken). In (a) and (b), ‘S’ stands for a stressed syllable, ‘W’ for a weak syllable, and ‘Ft’ for foot.

(a) Ken [pushed the]Ft [dog]Ft.
         S      W

(b) Ken [pushes]Ft the [dog]Ft.
         S   W     W

Wijnen, Krikhaar, and Os (1994) examined rhythmic pattern information from the lexicon associated with the independently addressable onset, peak, and coda slots of disyllabic templates. In the disyllabic template, the leftmost peak slot is marked ‘strong’ (S), and the rightmost peak slot is marked ‘weak’ (W). They analyzed two Dutch children whose ages ranged from 18 to 35 months and demonstrated that SW trochaic words are almost always produced correctly, whereas iambic words and weak-initial words (e.g., a WSW sequence of syllables such as in the word ‘banana’) are often deformed or omitted. The data obtained in this study generally agree with the patterns of weak vowel omission reported by Allen and Hawkins (1978, 1980).

Demuth14 (1994, 1995) cross-linguistically explained the truncation of functional categories, which are likely to be produced as syllables containing a schwa sound in early children’s speech. She proposed the metrical model of production (MMP), adapting the phonological conception of ‘foot.’ In her study, the language-specific metrical phonologies were considered for both penultimate stress (i.e., English) and stress-final languages (i.e., Sesotho15). The details of MMP are as follows:

(1) Stressed syllables of a word are most likely to be retained (e.g., S [SW])

(2) Unstressed syllables of a prosodic word are most likely to be omitted or reduced

(e.g., trochaic foot + pre-tonic syllable, W [S W])

(3) Unstressed syllables that fall within a foot are more likely to be retained than extra-metrical syllables

(e.g., iambic foot [WS], trochaic foot [SW])

14 See Demuth (1994, 1995) for details on the development of prosodic words from a cross-linguistic perspective.

15 Sesotho is the southern Bantu language, spoken in Lesotho and adjacent parts of South Africa to the north (Doke & Mofokeng, 1985).