Lingual articulation
of devoiced vowels
in Japanese
RIKEN, 2 February, 2016
Jason A. Shaw, Shigeto Kawahara, James Whang, Jeff Moore
The Keio Institute of Cultural and Linguistic Studies
01.
Introduction
00/00/0000 FILE NAME GOES HERE PAGE 2
INTRODUCTION
Phonological rules vs. spontaneous speech processes (phonetics)
• Description of sound patterns in terms of phonological vs. phonetic processes is a classic debate in Laboratory Phonology (e.g., Browman and Goldstein, 1990)
• There is general agreement that some patterns are phonological in nature (i.e., categorical addition, deletion, change), while others are phonetic (i.e., gradient overlap of articulatory gestures) (e.g., Zsiga, 1995, 1997)
• Whether a pattern is phonological or phonetic implicates different models of linguistic knowledge and, therefore, of language acquisition and learnability
Browman, C., & Goldstein, L. (1990). Tiers in Articulatory Phonology, with some implications for casual speech. In Kingston & Beckman (Eds.), Papers in Laboratory Phonology I: Between the grammar and physics of speech (pp. 341–376). Cambridge: Cambridge University Press.
Vowel devoicing
• Several languages have been reported to devoice vowels between voiceless
consonants: Greek, Shanghai Chinese, Korean, Montreal French, Tokyo Japanese (Jun et al. 1998)
• These languages have features in common:
(1) voicing contrasts in consonants
(2) variation in syllable duration (long vs. short syllables).
Phonetic vs. phonological descriptions of vowel devoicing
• Simple phonetic description:
– Laryngeal gestures overlap to passively devoice vowels (e.g., Jun and Beckman, 1993).
• Simple phonological description:
– deletion: V → ∅ / [-voice] ___ [-voice] (Kondo, 2001), OR
– devoicing: V → [-voice] / [-voice] ___ [-voice] (Tsuchida, 1997)
[Figure: gestural schematic of /ʃut/ on the laryngeal tier and the oral tier, illustrating devoicing]
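The two rules differ in their output, so a small sketch that applies each one to a segment string makes the contrast concrete. This is an illustration only. It assumes a hypothetical, non-exhaustive inventory of voiceless consonants and restricts the target V to the high vowels /i, u/, following the Japanese pattern. The environment is checked against the input form, so the rule applies simultaneously to every matching vowel. Devoicing is marked with the IPA ring below (U+0325).

```python
# Hypothetical, non-exhaustive inventories used only for this illustration.
VOICELESS = {"p", "t", "k", "s", "ʃ", "ɸ", "ts", "h"}
HIGH_VOWELS = {"i", "u"}
RING_BELOW = "\u0325"  # IPA diacritic for voicelessness


def apply_rule(segments, mode):
    """Apply 'delete' (V -> ∅) or 'devoice' (V -> [-voice]) between voiceless consonants.

    Context is read from the input form, so all matching vowels change at once.
    """
    if mode not in ("delete", "devoice"):
        raise ValueError("mode must be 'delete' or 'devoice'")
    output = []
    for i, seg in enumerate(segments):
        prev_seg = segments[i - 1] if i > 0 else None
        next_seg = segments[i + 1] if i + 1 < len(segments) else None
        in_context = prev_seg in VOICELESS and next_seg in VOICELESS
        if seg in HIGH_VOWELS and in_context:
            if mode == "devoice":
                output.append(seg + RING_BELOW)
            # mode == "delete": the vowel is simply not copied to the output
        else:
            output.append(seg)
    return output


print(apply_rule(["ʃ", "u", "t"], "delete"))        # ['ʃ', 't']
print(apply_rule(["k", "i", "d", "e"], "devoice"))  # ['k', 'i', 'd', 'e'] (context not met)
```

Both rules share the same environment. They differ only in whether the vowel survives on the oral tier, and that is the contrast the lingual (EMA) data are meant to adjudicate.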
Japanese high vowel devoicing
• Possibly the best-studied case of vowel devoicing
– Extensive phonetic work on laryngeal articulation, acoustics of devoicing, and perception of devoiced/deleted vowels (e.g., Hirose, 1971; Beckman and Shoji, 1984; Fujimoto et al., 2002; Whang, 2016), including in child-directed speech (Martin et al., 2015)
– Phonological arguments for deletion (Kondo, 2001) and opposing arguments for devoicing (Kawahara, 2015)
Japanese laryngeal gestures
• In devoiced vowel contexts, e.g., /kite/, there is a single laryngeal gesture of greater magnitude than a single consonant gesture, cf. /kide/ (Fujimoto et al., 2002)
Fujimoto, M., Murano, E., Niimi, S., & Kiritani, S. (2002). Differences in glottal opening pattern between Tokyo and Osaka dialect speakers: Factors contributing to vowel devoicing. Folia Phoniatrica et Logopaedica, 54(3), 133–143.
“Kiss Ted” quickly
• When laryngeal gestures are brought close together in time, they merge into a single gesture whose magnitude equals the sum of the component laryngeal gestures (Munhall and Löfqvist, 1992)
Munhall, K., & Löfqvist, A. (1992). Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics, 20(1), 111–126.
[Figure: laryngeal gestures for /s/ and /t/ in three panels]
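The following is a toy illustration of the aggregation idea, not Munhall and Löfqvist's analysis. It assumes, hypothetically, that each glottal opening gesture is a Gaussian bump in time and that overlapping gestures add linearly. As two equal gestures are brought closer together, their two peaks blend into one, and the height of that single peak approaches the sum of the components.

```python
import math


def glottal_opening(t, center, amplitude=1.0, width=40.0):
    """Hypothetical Gaussian-shaped glottal opening gesture; t and center in ms."""
    return amplitude * math.exp(-0.5 * ((t - center) / width) ** 2)


def combined_profile(separation_ms, width=40.0):
    """Linear sum of two equal gestures (e.g. for /s/ and /t/) centred at +/- separation/2."""
    times = range(-300, 301)  # 1 ms steps
    return [
        glottal_opening(t, -separation_ms / 2, width=width)
        + glottal_opening(t, separation_ms / 2, width=width)
        for t in times
    ]


def count_peaks(y):
    """Number of strict local maxima in a sampled profile."""
    return sum(1 for i in range(1, len(y) - 1) if y[i] > y[i - 1] and y[i] > y[i + 1])


for sep in (0, 40, 200):
    y = combined_profile(sep)
    # One merged peak at 0 and 40 ms separation; two separate peaks at 200 ms.
    print(f"separation={sep:3d} ms  peaks={count_peaks(y)}  max={max(y):.2f}")
```

Under this linear-summation assumption, the merged peak can never exceed the sum of the component gestures (2.0 here). That ceiling is the point of contrast with the Fujimoto et al. (2002) finding of a Japanese devoicing gesture larger than that sum.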
Japanese laryngeal gestures
• The magnitude of the laryngeal gesture in Japanese devoicing contexts is greater than the sum of two voiceless consonant gestures
(Fujimoto et al., 2002)
Summary of past work
• The laryngeal data indicate that devoicing in Japanese is actively controlled.
– Possibly due to reinterpretation of passive devoicing as active phonological control in modern Tokyo Japanese.
• Acoustic evidence for the presence or absence of a lingual vowel articulation has been largely equivocal, with some studies claiming that the vowel has been deleted (Beckman and Shoji, 1984) and others claiming that it is present (Faber and Vance, 2001; cf. Whang, 2016)
Our contribution to the debate
• Electromagnetic Articulography (EMA) data on the lingual articulation of vowels in voiced and devoiced contexts.
• A computational method for evaluating vowel deletion on the basis of movement kinematics.
02.
EMA experiment
EMA experiment
• Investigate lingual articulation of devoiced vowels in environments that vary in predictability.
• Compare articulatory dynamics of devoiced vowels to voiced vowels and to the “deletion trajectory” generated via simulation.
Materials (target words)
Devoicing/deletion   Voiced vowel   Entropy
masutaa              masuda         1.99 (W,K)
hakusai              yakuzai        1.89 (W)
ʃutaisei             ʃudaika        1.46
ɸusoku               ɸuzoku         1.08 (W,K)
katsutoki            katsudou       0.09 (W,K)
10–15 repetitions of the target words in the carrier phrase okee ______ to itte ‘OK, say ______’. Participants were instructed to speak as if they were making a request of a friend. Target words were interspersed with 10 words lacking /u/.
Procedure
• Stimuli were presented on a monitor in random order.
• To promote fluent reading, target words were previewed before being displayed in the carrier sentence.
• A native speaker of Japanese monitored pronunciation, manually advancing trials after accepting or rejecting each token.
[Figure: trial sequence: target preview (500 ms), then target sentence]
Post-processing
• Head movements corrected computationally
• Data rotated to the occlusal (bite) plane
• Robust smoothing (Garcia, 2010)
[Figure: EMA sensors]
Garcia, D. (2010). Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), 1167–1178.
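Garcia's (2010) smoother is built on a penalized least-squares fit that is solved in the DCT domain. The sketch below covers only the non-robust core of that method, for one evenly sampled trajectory, and it assumes a fixed, hand-chosen smoothing parameter `s`. The published method also selects `s` by generalized cross-validation, iteratively down-weights outliers, and handles missing values; none of that is reproduced here. The function names are illustrative.

```python
import math
import random


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (O(n^2), fine for short trajectories)."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct2(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c[k]
            * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def dct_smooth(y, s):
    """Penalized least-squares smoothing with a FIXED parameter s (non-robust sketch).

    DCT coefficient k is shrunk by 1 / (1 + s * lambda_k**2), where
    lambda_k = 2 - 2*cos(k*pi/n) is the eigenvalue of the second-difference
    penalty. Larger s gives a smoother output; s = 0 returns y unchanged.
    """
    n = len(y)
    coeffs = dct2(y)
    gamma = [1.0 / (1.0 + s * (2.0 - 2.0 * math.cos(k * math.pi / n)) ** 2) for k in range(n)]
    return idct2([g * c for g, c in zip(gamma, coeffs)])


# Usage on a synthetic, noisy sensor trace (hypothetical data).
rng = random.Random(1)
clean = [10.0 + 3.0 * math.sin(2 * math.pi * i / 60) for i in range(60)]
noisy = [v + rng.gauss(0.0, 0.3) for v in clean]
smoothed = dct_smooth(noisy, s=5.0)
```

The 1st (k = 0) coefficient is never shrunk, so the smoothed trace always keeps the mean of the input. As `s` grows, the output approaches that constant mean.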
Acoustic results: vowel devoicing
In line with current descriptions of Tokyo Japanese, /u/ was devoiced between voiceless consonants and voiced otherwise.
[Figure: acoustic comparison of /ɸusoku/ (devoiced /u/) and /ɸuzoku/ (voiced /u/)]
Results: fusoku~fuzoku (S01)
[Figure: vertical position (mm) of tongue sensors over time (ms), /ɸusoku/ vs. /ɸuzoku/]
• TD is lower for the devoiced vowel.
• TT rise is earlier for [s] than for [z].
Results: shutaisei~shudaika (S01)
[Figure: vertical position (mm) of tongue sensors over time, shutaisei vs. shudaika]
• TD and TB are lower for the devoiced /u/.
• The TT gesture is earlier for /t/ than for /d/.
Results: hakusai~yakuzai (S01)
[Figure: vertical position (mm) of tongue sensors over time, hakusai vs. yakuzai]
• TD is lower for the devoiced vowel.
• TT rise is earlier after the devoiced vowel.
Results: katsutoki~katsudou (S01)
[Figure: vertical position (mm) of tongue sensors over time, katsutoki vs. katsudou]
• TD is lower for the devoiced vowel.
• TT release is earlier for the devoiced vowel.
Results: masutaa~masuda (S01)
[Figure: vertical position (mm) of tongue sensors over time, masutaa vs. masuda]
• Temporal difference in TD rise: the devoiced vowel is later.
• TB rise: the devoiced vowel is later and higher.
03.
Phonological deletion vs. phonetic reduction
Analysis
• Voicing effect: do voiced and devoiced vowel trajectories differ?
– Discrete Cosine Transform (DCT) on TD trajectories
– MANOVA on DCT coefficients
• Phonological deletion: is the /u/ absent in any of these words?
– Simulate a “targetless” trajectory
– Compare DCT coefficients of the “targetless” trajectory to the data, cf. a t-test against zero.
Discrete Cosine Transform (DCT)
A complex curve can be represented as the sum of cosines:
[Figure: 1st, 2nd, and 3rd cosine components plus an intercept, and their sum]
Rao, K. R., & Yip, P. (2014). Discrete cosine transform: Algorithms, advantages, applications. Academic Press.
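Here is a minimal sketch of the decomposition, assuming an orthonormal DCT-II. The deck does not state which normalization the authors used, so the coefficient magnitudes below will not match the tables. Under this convention, the 1st coefficient is the mean height scaled by √n, which is why it tracks average TD height. Summing only the first few cosines gives a smooth approximation of the curve. The trajectory is hypothetical.

```python
import math


def dct2(x):
    """Orthonormal DCT-II: coefficient k weights a cosine with k half-cycles."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def reconstruct(coeffs, n_keep):
    """Sum of the first n_keep cosines (intercept included) evaluated at every sample."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
            * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for k in range(n_keep)
        )
        for i in range(n)
    ]


# Hypothetical TD height (mm): falling V1-to-V2 baseline plus a rise for an intervening /u/.
n = 40
td = [12.0 - 5.0 * i / (n - 1) + 3.0 * math.sin(math.pi * i / (n - 1)) for i in range(n)]
coeffs = dct2(td)
print("1st coefficient / sqrt(n):", coeffs[0] / math.sqrt(n))
print("mean height:             ", sum(td) / n)
approx = reconstruct(coeffs, 4)  # 4-coefficient approximation
```

Keeping all `n` coefficients reproduces the trajectory exactly. Keeping only the first four trades a small reconstruction error for a compact description of each token.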
How many DCT coefficients are needed?
• Nearly lossless compression (99.6%) with 6 coefficients.
• We used 4 DCT coefficients (99.0%).
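The deck does not say how the 99.6% and 99.0% figures were computed. One common choice, assumed here purely for illustration, is the share of a trajectory's variance around its own mean that a k-coefficient reconstruction recovers. With an orthonormal DCT-II, this share can be read straight off the coefficients by Parseval's theorem, without rebuilding the curve. The helper name and data are hypothetical.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def variance_explained(x, n_keep):
    """Share of x's variance (around its mean) recovered by its first n_keep DCT coefficients.

    With an orthonormal DCT, the squared error of a truncated reconstruction equals the
    energy of the dropped coefficients (Parseval), and the variance term equals the energy
    of every coefficient except the 1st, which only encodes the mean.
    """
    c = dct2(x)
    total = sum(v * v for v in c[1:])
    kept = sum(v * v for v in c[1:n_keep])
    return kept / total if total > 0 else 1.0


# Hypothetical TD trajectory (mm) with an intervening-vowel rise.
n = 40
td = [12.0 - 5.0 * i / (n - 1) + 3.0 * math.sin(math.pi * i / (n - 1)) for i in range(n)]
for k in (4, 6):
    print(f"{k} coefficients: {100 * variance_explained(td, k):.1f}% of variance")
```

In practice, one would average this share over all tokens before choosing how many coefficients to keep.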
Interpretation of cosine components
[Figure: raw data (green), mean DCT (black), and e-to-a line (red)]
• 1st DCT coefficient → average TD height
• 2nd DCT coefficient → V-to-V trajectory
• 3rd DCT coefficient → intervening vowel
• 4th DCT coefficient → coarticulation
Effect of voicing on lingual articulation
S03: mean (standard deviation) of DCT coefficients
              1st coeff      2nd coeff     3rd coeff      4th coeff
shudaika      69.47 (3.01)   6.31 (1.59)   -4.54 (0.74)   0.94 (0.48)
shutaisei     62.18 (6.34)   6.17 (1.83)   -0.04 (2.27)   0.63 (0.95)
[F(1,23) = 23.30, p < .0001***; Wilks' Λ = 0.3209]
To assess the effect of vowel voicing on lingual articulation, we conducted a MANOVA on the DCT coefficients.
Are the blue (voiced) and red (voiceless) lines significantly different?
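The bracketed statistic can be computed from scratch for the two-group case used here: voiced vs. devoiced tokens, each described by its 4 DCT coefficients. Wilks' Λ = det(W) / det(W + B), where W and B are the within-group and between-group sums-of-squares-and-cross-products matrices. Values near 1 mean the groups overlap; values near 0 mean they separate. For exactly two groups, Λ also converts to an exact F with (p, N − p − 1) degrees of freedom. That conversion is shown for completeness and may not be the statistic the authors' software reported. All data in the usage lines are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp(rows, center):
    """Sums of squares and cross-products of rows around a given center."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                out[i][j] += d[i] * d[j]
    return out


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def wilks_lambda(group_a, group_b):
    """Wilks' Lambda = det(W) / det(W + B) for a one-way MANOVA with two groups."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    grand = mean_vector(group_a + group_b)
    p = len(grand)
    w_a, w_b = sscp(group_a, mean_a), sscp(group_b, mean_b)
    within = [[w_a[i][j] + w_b[i][j] for j in range(p)] for i in range(p)]
    # Between-group SSCP: each group mean's deviation from the grand mean, weighted by size.
    between = [[0.0] * p for _ in range(p)]
    for size, m in ((len(group_a), mean_a), (len(group_b), mean_b)):
        for i in range(p):
            for j in range(p):
                between[i][j] += size * (m[i] - grand[i]) * (m[j] - grand[j])
    total = [[within[i][j] + between[i][j] for j in range(p)] for i in range(p)]
    return det(within) / det(total)


def two_group_f(lam, n_total, p):
    """Exact F for two groups: F(p, n_total - p - 1) = (1 - L) / L * (n_total - p - 1) / p."""
    return (1.0 - lam) / lam * (n_total - p - 1) / p


# Hypothetical 4-coefficient DCT vectors for voiced and devoiced tokens.
voiced = [[69.1, 6.2, -4.6, 0.9], [70.3, 6.5, -4.4, 1.1], [68.8, 6.0, -4.9, 0.8],
          [69.9, 6.7, -4.3, 1.0], [69.5, 6.1, -4.5, 0.7]]
devoiced = [[62.0, 6.1, -0.3, 0.6], [61.4, 6.4, 0.5, 0.7], [63.0, 5.9, -0.1, 0.5],
            [62.5, 6.3, 0.2, 0.8], [61.8, 6.0, -0.4, 0.6]]
lam = wilks_lambda(voiced, devoiced)
print(f"Wilks' Lambda = {lam:.4f}, F(4, 5) = {two_group_f(lam, 10, 4):.2f}")
```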
Vowel deletion (“noisy” null) trajectories
Noisy null trajectories (black lines) were generated by stochastic sampling from Gaussian distributions defined by the mean DCT coefficients fit to the direct e-to-a trajectory (red line) and the standard deviations of the DCT coefficients fit to the raw data (green lines).
[Figure: raw data (green lines), direct e-to-a trajectory (red line), and noisy null trajectories (black lines)]
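Here is a minimal sketch of the noisy-null procedure described above. It rests on four assumptions. Trajectories are equal-length TD height samples. The DCT is the orthonormal DCT-II. The direct e-to-a line runs between the mean first and mean last samples of the observed tokens; the deck does not say how the line's endpoints were chosen. Each coefficient is sampled independently. Function names are hypothetical.

One consistency check falls out of the math: the 3rd coefficient of any straight line is exactly zero. This matches the 0.00 means reported for the simulated 3rd coefficient in the S03 table.

```python
import math
import random
import statistics


def dct_coefs(x, n_coef):
    """First n_coef orthonormal DCT-II coefficients of a trajectory."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n_coef)
    ]


def from_coefs(coefs, n):
    """Length-n trajectory rebuilt from leading orthonormal DCT-II coefficients."""
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for k, c in enumerate(coefs)
        )
        for i in range(n)
    ]


def direct_line_coefs(tokens, n_coef=4):
    """DCT coefficients of a straight line from the mean start to the mean end height."""
    n = len(tokens[0])
    start = statistics.fmean(t[0] for t in tokens)
    end = statistics.fmean(t[-1] for t in tokens)
    line = [start + (end - start) * i / (n - 1) for i in range(n)]
    return dct_coefs(line, n_coef)


def noisy_null(tokens, n_coef=4, n_sim=None, seed=0):
    """Sample 'targetless' coefficient vectors: means from the line, SDs from the data."""
    rng = random.Random(seed)
    means = direct_line_coefs(tokens, n_coef)
    token_coefs = [dct_coefs(t, n_coef) for t in tokens]
    sds = [statistics.stdev([c[k] for c in token_coefs]) for k in range(n_coef)]
    n_sim = len(tokens) if n_sim is None else n_sim
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(n_sim)]


# Hypothetical tokens: e-to-a TD trajectories with a /u/ rise of variable size.
rng = random.Random(42)
n = 30
tokens = []
for _ in range(12):
    amp = rng.uniform(1.0, 3.0)
    tokens.append([12.0 - 5.0 * i / (n - 1) + amp * math.sin(math.pi * i / (n - 1))
                   for i in range(n)])

sims = noisy_null(tokens)
null_trajectories = [from_coefs(c, n) for c in sims]  # the simulated "black lines"
```

The simulated coefficient vectors (`sims`) are what a MANOVA would compare against the observed tokens' coefficients. The rebuilt trajectories are only needed for plotting.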
Phonological deletion
S03: mean (standard deviation) of DCT coefficients
                    1st coeff      2nd coeff     3rd coeff      4th coeff
shudaika (data)     69.47 (3.01)   6.31 (1.59)   -4.54 (0.74)   0.94 (0.48)
shudaika (sim)      60.49 (3.01)   5.49 (1.59)   -0.00 (0.74)   0.61 (0.48)
shutaisei (data)    62.18 (6.34)   6.17 (1.83)   -0.04 (2.27)   0.63 (0.95)
shutaisei (sim)     61.21 (6.34)   6.30 (1.83)    0.00 (2.27)   0.70 (0.95)
To evaluate whether vowels are specified in the phonological output, we conducted MANOVAs comparing the DCT coefficients of the data with those of the “noisy null” hypothesis.
shudaika, data vs. sim: [F(1,23) = 57.49, p < .0001; Wilks' Λ = 0.0564]
shutaisei, data vs. sim: [F(1,23) = 3.39, p = 0.4940; Wilks' Λ = 0.8439]
Phonetic reduction in fusoku (cf. fuzoku)
Main effect of voicing [F(1,21) = 22.92, p < .001; Wilks' Λ = 0.2798]
But both fusoku and fuzoku are significantly different from the noisy null.
Phonetic reduction vs. phonological deletion: the view from DCT components
[Figure: DCT components under phonetic reduction vs. phonological deletion]
Summary: Vowel reduction, deletion (6 participants)
Target word                  Reduction   Deletion
hakusai (cf. yakuzai)        5/6         1/6
ʃutaisei (cf. ʃudaika)       6/6         4/6
ɸusoku (cf. ɸuzoku)          4/6         1/6
katsutoki (cf. katsudou)     6/6         3/6
masutaa (cf. masuda)         6/6         0/6
Total                        27/30       9/30
Deletion criteria are possibly too lax for the /aXda/ context, as TD rises from the /a/ position for /d/.
Deletions and reductions pattern together.
Resulting clusters from most to least deletion:
ʃt >> tst >> ɸs, ks
Lingual vowel gestures are reduced/deleted in voiceless contexts
• Vowel height is largely predictable from the large laryngeal gesture, since devoicing targets high vowels.
• Uninformative gestures are often subject to reduction/deletion (Cohen Priva, 2012, 2014; Hume, 2016; Aylett and Turk, 2004, 2006; Bell et al., 2009; Whang, 2016).
• In the absence of the large laryngeal gesture (as for voiced vowels), the lingual gesture is needed to express the vowel height contrast.
Effect of surface consonant cluster type?
• Results are consistent with syllable contact laws (e.g., Murray and Vennemann, 1983; Vennemann, 1987).
• Deletion is most likely when the resulting cluster is fricative-stop (FS), followed by affricate-stop (aFS), followed by fricative-fricative (FF) and stop-fricative (SF).
FS >> aFS >> FF, SF
Gender differences
                             Reduction          Deletion
Target word                  Male    Female     Male    Female
hakusai (cf. yakuzai)        2/3     3/3        0/3     1/3
ʃutaisei (cf. ʃudaika)       3/3     3/3        2/3     2/3
ɸusoku (cf. ɸuzoku)          2/3     2/3        1/3     0/3
katsutoki (cf. katsudou)     3/3     3/3        2/3     1/3
masutaa (cf. masuda)         3/3     3/3        0/3     0/3
Totals                       13/15   14/15      5/15    4/15
Some limitations of the analysis
• Distributions of word tokens are modelled as either deletion or reduction.
– This precludes the possibility that individual speakers sometimes delete vowels in a word.
• The noisy null hypothesis is linear, so it doesn't take into account contributions of consonants to TD height (e.g., masutaa).
04.
Phonetic reduction as variable deletion
Inter-speaker variation: fusoku~fuzoku
[Figure: fusoku~fuzoku trajectories for speakers S01–S06, grouped as phonetic reduction (not deletion), complete vowels (no reduction), and deletion]
Within-speaker, within-word variation (S01)
Although the distribution of fusoku tokens is significantly different both from the voiced version (fuzoku) and from the noisy null, could variable deletion be a better characterization than phonetic reduction?
Phonetic reduction or variable deletion?
Token-by-token evaluation
We fit a naïve Bayes classifier to the data and used it to generate (posterior) deletion probabilities.
Training data = voiced tokens & noisy null
Test data = voiceless tokens
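Here is a minimal Gaussian naïve Bayes sketch of the token-by-token evaluation. It makes two assumptions. First, each class (voiced "vowel present" tokens vs. noisy-null "deletion" samples) is modelled by an independent normal distribution per DCT coefficient. Second, the two classes have equal prior probability; the deck does not state the priors. Each voiceless test token then receives a posterior P(D). The `summarize` helper computes the two per-word summaries reported for each target word: mean P(D) and the share of tokens with P(D) > .5. All names and numbers are hypothetical.

```python
import math
import statistics


def fit_gaussian_class(rows):
    """Per-coefficient (mean, SD) for one class; naive Bayes treats coefficients as independent."""
    return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*rows)]


def log_likelihood(x, params):
    """Sum of univariate normal log-densities across coefficients."""
    total = 0.0
    for value, (mu, sd) in zip(x, params):
        total += -0.5 * math.log(2.0 * math.pi * sd * sd) - (value - mu) ** 2 / (2.0 * sd * sd)
    return total


def deletion_probability(x, vowel_params, null_params, prior_deletion=0.5):
    """Posterior P(deletion | x) for one token under a two-class Gaussian naive Bayes model."""
    log_vowel = math.log(1.0 - prior_deletion) + log_likelihood(x, vowel_params)
    log_null = math.log(prior_deletion) + log_likelihood(x, null_params)
    top = max(log_vowel, log_null)  # subtract the max before exponentiating to avoid underflow
    p_vowel, p_null = math.exp(log_vowel - top), math.exp(log_null - top)
    return p_null / (p_vowel + p_null)


def summarize(probabilities):
    """Mean P(D) and the share of tokens with P(D) > .5."""
    share = sum(p > 0.5 for p in probabilities) / len(probabilities)
    return statistics.fmean(probabilities), share


# Hypothetical DCT coefficient vectors (1st-4th coefficients) for one word.
vowel_train = [[69.0, 6.3, -4.5, 0.9], [70.1, 6.0, -4.8, 1.0],
               [68.7, 6.6, -4.2, 0.8], [69.6, 6.2, -4.6, 1.1]]  # voiced tokens
null_train = [[60.4, 5.4, 0.1, 0.6], [60.9, 5.6, -0.3, 0.7],
              [60.1, 5.5, 0.2, 0.5], [60.6, 5.3, -0.1, 0.6]]    # noisy-null samples
test_tokens = [[61.0, 5.6, 0.0, 0.6], [69.2, 6.3, -4.4, 1.0]]  # voiceless tokens

vowel_params = fit_gaussian_class(vowel_train)
null_params = fit_gaussian_class(null_train)
probs = [deletion_probability(x, vowel_params, null_params) for x in test_tokens]
print(summarize(probs))
```

Because naïve Bayes scores each coefficient separately, it is easy to see which coefficient drives the classification. This fits the observation that the 3rd coefficient (intervening vowel) gives the greatest separation.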
Classification parameters (/ɸusoku/)
[Figure: class distributions for the 1st–4th DCT coefficients]
Greatest separation for the 3rd DCT coefficient.
Deletion probabilities by token: /ɸusoku/ (all speakers)
[Figure: posterior deletion probability by token, from less than .1 chance of deletion to greater than .9 chance of deletion]
Mean P(D): .404; 42% of tokens had P(D) > .5
Parameters and probabilities: /ʃutaisei/ (all speakers)
[Figure: 1st–4th DCT coefficient distributions and posterior deletion probability by token]
Mean P(D): .698; 75% of tokens had P(D) > .5
Parameters and probabilities: /katsutoki/ (all speakers)
[Figure: 1st–4th DCT coefficient distributions and posterior deletion probability by token]
Mean P(D): .597; 64% of tokens had P(D) > .5
Parameters and probabilities: /hakusai/ (all speakers)
[Figure: 1st–4th DCT coefficient distributions and posterior deletion probability by token]
Mean P(D): .258; 25% of tokens had P(D) > .5
Parameters and probabilities: /masutaa/ (all speakers)
[Figure: 1st–4th DCT coefficient distributions and posterior deletion probability by token]
Mean P(D): .198; 17% of tokens had P(D) > .5
Summary: deletion probability
Target word                  Reduction   Deletion   P(D)
hakusai (cf. yakuzai)        5/6         1/6        .258
ʃutaisei (cf. ʃudaika)       6/6         4/6        .698
ɸusoku (cf. ɸuzoku)          4/6         1/6        .405
katsutoki (cf. katsudou)     6/6         3/6        .597
masutaa (cf. masuda)         6/6         0/6        .198
Total                        27/30       9/30
Resulting clusters from most to least deletion:
ʃt >> tst >> ɸs >> ks
Token-by-token analysis
• The token-by-token analysis indicates that “phonetic reduction” can be understood as variable deletion.
• Vowels are variably deleted in all devoicing contexts, although percentages vary across contexts and speakers.
Conclusions: Japanese high vowel devoicing
• Japanese devoiced vowels are characterized by:
– large laryngeal gestures (past work), and
– variable deletion of the lingual gesture (this study).
• Deletion probabilities vary across speakers but are uniformly influenced by linguistic factors,
– like other types of well-studied variation (e.g., Guy, 1997; Coetzee and Kawahara, 2013).
More broadly
• The structure of phonological form can be rigorously informed by phonetic data.
• We have provided a specific example of a general approach:
– The key tenet is explicit (and realistic) specification of competing phonological hypotheses in the dimensions of the phonetic data.
05.
Acknowledgments
Research supported by the Japan Society for the Promotion of Science postdoctoral fellowship to Jason Shaw and grant-in-aid to Shigeto Kawahara (#15F15715).
fusoku~fuzoku (two subjects): [Figure: speakers S03 and S04]
hakusai~yakuzai: [Figure: speakers S03 and S04]
katsutoki~katsudou: [Figure: speakers S03 and S04]
masutaa~masuda: [Figure: speakers S03 and S04]
Summary: Reduction or deletion (F3)
Devoicing/deletion           Reduction   Deletion
hakusai (cf. yakuzai)        Yes         No
ʃutaisei (cf. ʃudaika)       Yes         Yes
ɸusoku (cf. ɸuzoku)          Yes         No
katsutoki (cf. katsudou)     Yes         No
masutaa (cf. masuda)         Yes         No
Summary: Reduction or deletion (M4)
Devoicing/deletion           Reduction        Deletion
hakusai (cf. yakuzai)        No               No
ʃutaisei (cf. ʃudaika)       Yes              No (p = .03)
ɸusoku (cf. ɸuzoku)          Yes              No
katsutoki (cf. katsudou)     Yes              Yes
masutaa (cf. masuda)         Yes (p = .02)    No
Summary: Reduction or deletion (F5)
Devoicing/deletion           Reduction   Deletion
hakusai (cf. yakuzai)        Yes         Yes
ʃutaisei (cf. ʃudaika)       Yes         No
ɸusoku (cf. ɸuzoku)          Yes         No
katsutoki (cf. katsudou)     Yes         No
masutaa (cf. masuda)         Yes         No
Summary: Reduction or deletion (M6)
Devoicing/deletion           Reduction   Deletion
hakusai (cf. yakuzai)        Yes         No
ʃutaisei (cf. ʃudaika)       Yes         Yes
ɸusoku (cf. ɸuzoku)          Yes         Yes
katsutoki (cf. katsudou)     Yes         Yes
masutaa (cf. masuda)         Yes         No
Summary: Reduction or deletion (F7)
Devoicing/deletion           Reduction   Deletion
hakusai (cf. yakuzai)        Yes         No
ʃutaisei (cf. ʃudaika)       Yes         Yes
ɸusoku (cf. ɸuzoku)          No          No
katsutoki (cf. katsudou)     Yes         No
masutaa (cf. masuda)         Yes         No
Summary: Reduction or deletion (M8)
Devoicing/deletion           Reduction   Deletion
hakusai (cf. yakuzai)        Yes         No
ʃutaisei (cf. ʃudaika)       Yes         Yes
ɸusoku (cf. ɸuzoku)          No          No
katsutoki (cf. katsudou)     Yes         Yes
masutaa (cf. masuda)         Yes         No
Quantifying predictability
• Entropy is a simple measure of information content from:
• Basic insight: an informative event (high entropy) is one that is not predictable (low probability).
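Assuming the standard Shannon definitions, measured in bits, the quantities used here can be written as follows. The surprisal of a single outcome is its negative log probability. Entropy is the expected surprisal. Conditional vowel entropy restricts the distribution to vowels that follow a given consonant c.

```latex
I(x) = -\log_2 p(x)

H(X) = -\sum_{x} p(x)\,\log_2 p(x)

H(V \mid C = c) = -\sum_{v} p(v \mid c)\,\log_2 p(v \mid c)
```

For instance, a consonant followed equally often by two vowels gives H = 1 bit. A consonant always followed by the same vowel gives H = 0, so the vowel is fully predictable.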
Conditional vowel entropy
• Variation in vowel entropy as a function of the preceding consonant.
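Here is a short sketch of how conditional vowel entropy could be estimated from (consonant, vowel) counts, using relative frequencies as probability estimates. The counts below are invented for illustration; they are not the corpus counts behind the entropy values listed with the target words. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_vowel_entropy(cv_pairs):
    """Entropy (bits) of the vowel distribution after each preceding consonant.

    cv_pairs: iterable of (consonant, vowel) tokens, e.g. drawn from a corpus.
    Probabilities are estimated as relative frequencies.
    """
    by_consonant = defaultdict(Counter)
    for consonant, vowel in cv_pairs:
        by_consonant[consonant][vowel] += 1
    result = {}
    for consonant, counts in by_consonant.items():
        total = sum(counts.values())
        probs = [n / total for n in counts.values()]
        # H = sum of p * log2(1/p): expected surprisal of the following vowel.
        result[consonant] = sum(p * math.log2(1.0 / p) for p in probs)
    return result


# Hypothetical counts: /k/ before two vowels equally often, /ts/ only before /u/.
pairs = [("k", "a")] * 4 + [("k", "u")] * 4 + [("ts", "u")] * 8
print(conditional_vowel_entropy(pairs))  # {'k': 1.0, 'ts': 0.0}
```

A consonant that is almost always followed by one vowel yields entropy near zero. In that context, the identity of the following vowel carries little information.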