
(1)

Japanese geminate devoicing once again: Insights from Information Theory

Shigeto Kawahara kawahara@icl.keio.ac.jp

FAJL 8, Feb 19th, 2016 Mie University

(2)

Synopsis

In Japanese loanwords, voiced geminates devoice in response to OCP(voice) (e.g. /doggu/ => [dokku]), but voiced singletons do not.

I have been working on this phenomenon almost throughout my academic career (since 2004).

I attempt to develop a new analysis based on Shannon’s (1948) Information Theory.

(3)

Information theory in linguistics

The starting premise: languages convey messages in an efficient way.

This is an old idea, with renewed interest in recent years.

Martinet (1972): phonological contrasts that carry high functional loads are less likely to neutralize diachronically.

Also pursued in syntax-related research, e.g. by Florian Jaeger (2010, Cognitive Psychology).

Martinet, André. 1972. Function, structure, and sound change. Reader in historical and comparative linguistics, ed. by A.R. Keiler, 139–74. New York: Holt, Rinehart & Winston, Inc.


(4)

Surviving truncation in Chinese (Shaw et al. 2014)

Chinese has a truncation-based compounding process that is very similar to that of Japanese.

(5)

The Chinese data

(6)

A bit of digression: Japanese data?

The difference between Chinese and Japanese is that second elements can survive in truncation in Chinese, but very rarely in Japanese.

東大(東京大学)、京大(京都大学)、名大(名古屋大学)、九大(九州大学)、外大(外国語大学)、神大(神戸大学)、北大(北海道大学)、明大(明治大学)、慶大(慶應大学)、早大(早稲田大学)、中大(中央大学)、駒大(駒沢大学)、日大(日本大学)、青学(青山学院)、国研(国語研究所)

阪大(大阪・大学)、こいちゅん(恋する・フォーチュンクッキー)、スタコン(ミスター・コンテスト)、レコパ(レコード・キーパー)

(7)

Informativity plays a role

Shaw et al. (2014, Morphology) show that the elements that survive in Chinese truncation are the informative ones.

Informativity is quantified as “the probability of being able to guess, correctly, the original word from the truncated bit” (a rough approximation).

(8)

Family size as a predictor of informativity

The larger the family size, the less likely one is to guess the original word correctly.
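A minimal sketch of the family-size idea, under simplifying assumptions: the toy lexicon is a subset of the university names listed on the Japanese data slide, the split into two elements is assumed here for illustration, and the guess is taken to be uniform over all compounds that share an element. The function names are hypothetical, and this is not the measure used by Shaw et al. (2014), only a rough approximation of the intuition.

```python
from collections import Counter

# Toy lexicon: (first element, second element) pairs. The segmentation
# is an assumption made for illustration only.
COMPOUNDS = [
    ("東京", "大学"), ("京都", "大学"), ("名古屋", "大学"), ("九州", "大学"),
    ("神戸", "大学"), ("北海道", "大学"), ("明治", "大学"), ("中央", "大学"),
    ("青山", "学院"), ("国語", "研究所"),
]


def family_sizes(compounds):
    """Count how many compounds each element occurs in (its family size)."""
    sizes = Counter()
    for first, second in compounds:
        sizes[first] += 1
        sizes[second] += 1
    return sizes


def guess_probability(element, compounds):
    """Chance of guessing the right compound from one element,
    assuming a uniform guess among the compounds containing it."""
    return 1 / family_sizes(compounds)[element]


if __name__ == "__main__":
    for element in ("東京", "大学"):
        print(element, family_sizes(COMPOUNDS)[element],
              guess_probability(element, COMPOUNDS))
```

In this toy lexicon 大学 has a large family, so hearing it alone says little about which compound was intended, whereas 東京 identifies its compound uniquely.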

(9)

Deletion of onset [t,d] in English

Onset consonants usually do not delete (cf. McCarthy 2008, Phonology; Wilson 2001, Phonology etc.).

Raymond et al. (2006, JVC) studied deletion of onset [t, d] in the Buckeye corpus of spontaneous interview speech.

[t, d] deletion is found in frequent words like somebody, lady, and better, especially when the following context makes that word predictable; e.g. ladies and gentlemen. cf. “All the single ladies”.

(10)

[Figure: three spectrograms; x-axis: Time (s), 0 to 0.9; y-axis: Frequency (Hz), 0 to 5000]

(11)

Yoshinoya-Effect

This is commonplace in Japanese as well: [aʃʃita] or [ʃitaa] from /arigatoogozaimaʃita/ and [zaasu] from /ohayoogozaimasu/.

I would like to call this the “Yoshinoya (吉野家) effect”: phrases that are common, i.e. predictable, undergo heavy reduction/lenition.

This observation is very common, but it is usually relegated to “a matter of performance” (my impression).

(12)

Some other relevant work

Aylett and Turk (2004, Lg & Sp; 2006, JASA)

Bell et al. (2009, JML)

Cohen-Priva (2012, Stanford diss.; 2014, LabPhon)

Shaw & Kawahara (in progress; see below)

Whang (2016, NYU diss.)

(13)

Anti-Chomskian?

Such probabilistic predictability should not affect linguistic patterns (Chomsky 1957, Syntactic Structures).

“[I]n the context “I saw a fragile __,” the words “whale” and “of” may have equal (i.e. zero) frequency in the past linguistic experience of a speaker who will recognize that one of these substitutions, but not the other, gives a grammatical sentence.” (p.16)

(14)

Colorless green ideas

The grammaticality of “Colorless green ideas sleep furiously” shows after all that syntactic knowledge is independent of lexical knowledge (SS, p.15)?

Pereira (2000, Phil. Trans. of the Royal Society): “Formal grammar and information theory: Together again?”

p(Colorless green ideas sleep furiously) / p(Furiously sleep ideas green colorless) = 2*10^5

(15)

Anti-Chomskian?

Chomsky has repeatedly argued that language is not designed for communication.

In all honesty, I don’t understand this argument, but let’s put this issue on the back burner and come back to it during the Q&A period.

This talk pursues the complete opposite line of attack: what if languages are designed to convey messages in an efficient way?

(16)

The Japanese data: The pattern under question and its brief history

(17)

The data

Geminates devoice when there is another voiced obstruent within the same morpheme (Nishimura 2003, MA thesis; Kawahara 2006, Lg).

/beddo/=>[betto]; /biggu/=>[bikku]

cf. /reddo/=>*[retto]; /eggu/ => *[ekku]

But singletons do not devoice in the same environment.

cf. /bado/ => *[bato]; /bagu/ => *[baku]

(18)

Experimental confirmation of the data

[Figure: bar plot of devoicing responses (%, y-axis 0 to 40) for four conditions: /d...dd/, /...dd/, /d...d/, /...d/]

Kawahara & Sano (2016, JJL), head-to-head experiment; “do you usually pronounce the Japanese word for “dog” as [doggu] or [dokku]?”

(19)

Kawahara (2006)

An Optimality Theoretic analysis:

Faith(voi)sing >> OCP(voice) >> Faith(voi)gem

/dagu/   | Faith(voi)sing | OCP(voice) | Faith(voi)gem
[daku]   | *!             |            |
[dagu]   |                | *          |

/doggu/  | Faith(voi)sing | OCP(voice) | Faith(voi)gem
[dokku]  |                |            | *
[doggu]  |                | *!         |
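The evaluation in the two tableaux can be sketched as a small program. This is only an illustration of strict domination, not part of the original analysis: the violation marks are copied from the tableaux, and the names `RANKING`, `TABLEAUX`, `violation_profile`, and `optimal` are hypothetical.

```python
# Constraint ranking from the slide, highest-ranked first.
RANKING = ["Faith(voi)sing", "OCP(voice)", "Faith(voi)gem"]

# Violation counts copied from the two tableaux; a constraint that is
# not listed for a candidate has zero violations.
TABLEAUX = {
    "/dagu/": {
        "[daku]": {"Faith(voi)sing": 1},
        "[dagu]": {"OCP(voice)": 1},
    },
    "/doggu/": {
        "[dokku]": {"Faith(voi)gem": 1},
        "[doggu]": {"OCP(voice)": 1},
    },
}


def violation_profile(violations, ranking):
    """Return the violations as a tuple ordered by the ranking."""
    return tuple(violations.get(constraint, 0) for constraint in ranking)


def optimal(candidates, ranking):
    """Pick the winner under strict domination: the candidate whose
    violation profile is lexicographically smallest."""
    return min(candidates,
               key=lambda cand: violation_profile(candidates[cand], ranking))


if __name__ == "__main__":
    for underlying, candidates in TABLEAUX.items():
        print(underlying, "=>", optimal(candidates, RANKING))
```

Comparing tuples lexicographically mirrors strict domination: a single violation of a higher-ranked constraint outweighs any number of violations lower down.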

(20)

P-Map (Steriade 2001/2008)

Where does the ranking Faith(voi)sing >> Faith(voi)gem come from?

Kawahara (2006, Lg): A voicing contrast is more perceptible in singletons than in geminates, because voiced geminates are half-devoiced (see below for more on this) and hence less perceptible.

Heavily influenced by Steriade’s P-map Theory (2008, Kiparsky’s Festschrift).

(21)

An alternative hypothesis pursued in this talk

The new hypothesis pursued today: A voicing contrast in geminates is less informative than a voicing contrast in singletons.

The guiding intuition behind this new hypothesis: voiced geminates are allowed only in loanwords, and therefore, a voicing distinction is not very common among geminates (cf. Rice 2006, Toronto Working Papers).

Information Theory provides a way to formalize this idea.

(22)

A brief introduction to Information Theory

(23)

Information theory

A theory proposed by Claude Shannon (1948) in “A mathematical theory of communication”, published in the Bell System Technical Journal.

It provides a way to quantify information mathematically.

The overall goal was to develop a method for “efficient communication”.

(24)

Given binary switches

# of binary “switches” | # of distinctions | exponential notation | log notation
0                      | 1                 | 2^0=1                | log2(1)=0
1                      | 2                 | 2^1=2                | log2(2)=1
2                      | 4                 | 2^2=4                | log2(4)=2
3                      | 8                 | 2^3=8                | log2(8)=3
4                      | 16                | 2^4=16               | log2(16)=4
5                      | 32                | 2^5=32               | log2(32)=5

How many distinctions can you make?

How many switches are necessary?
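The two questions on this slide correspond to an exponential and a logarithm, which can be sketched as follows. The function names are hypothetical, and rounding up for cases that are not powers of two is an assumption added here (the slide only lists exact powers).

```python
import math


def distinctions(n_switches):
    """How many distinctions can n binary switches make? (2^n)"""
    return 2 ** n_switches


def switches_needed(n_distinctions):
    """How many binary switches are necessary for n distinctions?
    log2(n), rounded up when n is not a power of two."""
    return math.ceil(math.log2(n_distinctions))


if __name__ == "__main__":
    for n in range(6):
        d = distinctions(n)
        print(n, d, switches_needed(d))
```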

(25)

PROBABILITY AND INFORMATIVITY

A rare event is more informative (causes bigger “surprisal”).

Info(x) = log2(1/P(x)) = -log2(P(x))

“How many binary switches do you need to convey that information?”

[Figure: information in bits (y-axis, 0 to 6) plotted against Pr(x) (x-axis, 0.0 to 1.0)]

(26)

Intuitively...

“The culprit is either in Kanto, Tohoku, Chuubu, Hokuriku, Kinki, Chuugoku, Shikoku, Okinawa, Hokkaido, or somewhere abroad”

(27)

We find this funny, because we know that he’s offering no information; i.e. there is no surprisal.

In terms of informativity, if p=1, then Info=-log2(1)=0. Zero information!

(28)

But wait, he’s excluding Kyushu. So he’s saying “somewhere other than Kyushu”, whose probability is, say, 0.9.

In terms of informativity, -log2(0.9)=0.15 bits, not very informative.

(29)

But what if he were saying that “the culprit was in Kyushu”? Its probability is much lower, say, 0.1. Its informativity would be -log2(0.1) = 3.3 bits.

In summary, in terms of informativity: Saying X is in Kyushu (3.3) > Saying X is not in Kyushu (0.15) > X could be anywhere (0).

See http://matome.naver.jp/odai/2127017524627754901 for more.
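The three cases in this example can be checked with a few lines of code. This is a minimal sketch of Info(x) = log2(1/P(x)); the function name `surprisal` is hypothetical, and the probabilities 1, 0.9, and 0.1 are the illustrative values assumed on the slides, not measured quantities.

```python
import math


def surprisal(p):
    """Information content, in bits, of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1 / p)


if __name__ == "__main__":
    # Probabilities are the illustrative values assumed on the slides.
    statements = {
        "X could be anywhere": 1.0,
        "X is not in Kyushu": 0.9,
        "X is in Kyushu": 0.1,
    }
    for statement, p in statements.items():
        print(f"{statement}: {surprisal(p):.2f} bits")
```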

(30)

Shannon (average) entropy

A rare event is highly informative, but its probability (i.e. its weight in the average) is low.

Entropy = log2(1/P)*P + log2(1/Q)*Q

when P and Q (= 1 - P) are the probabilities of the two choices.

A binary contrast is most informative when the two choices are equally likely (i.e. at chance level).

Its maximum value is 1 bit.

[Figure: entropy in bits (y-axis, up to 1.0) plotted against Pr(x) (x-axis, 0.0 to 1.0)]
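A short sketch of the entropy of a binary contrast, showing that it peaks at 1 bit when the two choices are equally likely and falls toward 0 as one choice comes to dominate. The function name `binary_entropy` is hypothetical; treating the endpoints p = 0 and p = 1 as zero entropy follows the usual convention.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary choice with
    probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # by convention, a certain outcome carries no information
    q = 1.0 - p
    return math.log2(1 / p) * p + math.log2(1 / q) * q


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
        print(f"p = {p:.2f}: {binary_entropy(p):.3f} bits")
```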

(31)

Applying Information Theory to the Japanese geminate devoicing pattern

(32)

Calculating entropy for a real language example

Count([t])=6,166,896 & Count([d])=1,986,985

p([t])=0.76 & p([d])=0.24

log2(1/0.76)=0.40 & log2(1/0.24)=2.03

Entropy=0.40*0.76+2.03*0.24=0.80 bits

Counts based on the CSJ.

(33)

Calculating entropy

Count([tt])=478,525 & Count([dd])=7,727

p([tt])=0.98 & p([dd])=0.02

log2(1/0.98)=0.02 & log2(1/0.02)=5.98

Entropy=0.02*0.98+5.98*0.02=0.12 bits (computed from the unrounded probabilities)
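The two entropy calculations can be reproduced directly from the counts. This is a minimal sketch: the function name `entropy_from_counts` is hypothetical, and the counts are the CSJ figures quoted on the slides. Working from unrounded probabilities gives the totals shown on the slides, even though the rounded intermediate values do not add up exactly.

```python
import math


def entropy_from_counts(counts):
    """Shannon entropy, in bits, of a contrast whose members occur
    with the given frequencies."""
    total = sum(counts)
    return sum(math.log2(total / c) * (c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Counts quoted on the slides (based on the CSJ).
    singleton = entropy_from_counts([6_166_896, 1_986_985])  # [t] vs. [d]
    geminate = entropy_from_counts([478_525, 7_727])         # [tt] vs. [dd]
    print(f"t-d:   {singleton:.2f} bits")
    print(f"tt-dd: {geminate:.2f} bits")
```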

(34)

Entropy based on the CSJ-Core

[Figure: entropy of voicing contrasts based on the CSJ-Core; entropy in bits (y-axis, 0.0 to 1.0) for p-b, t-d, k-g, pp-bb, tt-dd, kk-gg, by token frequency and by type frequency]

(35)

Entropy based on Amano & Kondo (2001)

[Figure: entropy of voicing contrasts based on A & K; entropy in bits (y-axis, 0.0 to 1.0) for p-b, t-d, k-g, pp-bb, tt-dd, kk-gg, by token frequency and by type frequency]

(36)

Informativity to faithfulness strength

A voicing contrast is more informative in singletons than in geminates in the Japanese lexicon.

This difference could be the source of the ranking Faith(voi)sing >> Faith(voi)gem, independently proposed by Kawahara (2006, Lg)! Informativity can be projected onto “faithfulness strength”.

Note that there is still a role for abstract phonological grammar: devoicing is not context-free but is conditioned by a grammatical factor (i.e. OCP(voice)).

(37)

Place effects

[Figure: “The effect of place”; probability of devoicing (y-axis, 0.0 to 0.5) for bb, dd, gg]

Based on Kawahara & Sano (2013, LgSci), a corpus study using the CSJ.

Setting aside [bb], for which there are not many examples, [gg] is more likely to devoice than [dd]; in most entropy calculations, voicing is more informative in [dd] than in [gg].

(38)

Further predictions

The more frequent the word is, the easier it is for the listener to recover what that word is. Hence the importance of the voicing contrast is reduced.

The longer the word, the more information listeners get from other segments. This reduces the importance of the voicing contrast in geminates (cf. Cohen-Priva 2012, Stanford diss., 2014, LabPhon).

(39)

Effects of lex freq and word length

rating~freq+mora (freq: t=5.4, p<.001; mora: t=3.2, p<.01)

[Figure panel: “Effects of lex freq”; naturalness rating (y-axis) against log lexical frequency (x-axis); rho=.59]

[Figure panel: “Effects of word length”; naturalness rating (y-axis) against word length in moras (x-axis); rho=.28]

Based on Kawahara’s (2011, JEAL) naturalness ratings.

(40)

Slight complications?

Now we seem to be talking about informativity in the context of lexical access.

How difficult does devoicing make lexical access?

The domain within which to calculate entropy may now be the word rather than the segment (cf. Cohen-Priva 2014; Hume 2016, Phonological Studies).

This idea is now getting very close to the P-map (Steriade 2008, Kiparsky’s Festschrift).

(41)

If the word is the domain…

If the word is the domain of entropy calculation, it may provide us with another way of looking at OCP(voice).

Native vocabulary items do not have two voiced obstruents (Ito & Mester 1986, LI).

Hearing one voiced obstruent gives rise to the expectation that no voiced obstruent will appear again within the same morpheme.

(42)

What’s wrong with Kawahara (2006)?

Nothing, as far as the phonological analysis is concerned.

Well, it cannot account for the frequency effect without an extra mechanism (Coetzee & Kawahara 2013, NLLT), nor for the word length effect.

And Kawahara (2006) left one question unanswered: why Japanese speakers do not implement full voicing during geminates.

(43)

Devoicing is not necessarily the norm

[Figure: acoustic displays of voiced geminate [dd] (time axis 0 to 0.5 s) in (a) Arabic, (b) Hindi, (c) Norwegian, (d) Swedish]

(44)

Arabic vs. Japanese

[Figure: waveforms and spectrograms (Time: 0 to 0.6 s; Frequency: 0 to 5000 Hz) of Arabic [haddu] and Japanese [heddo]]

(45)

Entropy in Arabic

[Figure: entropy of voicing contrasts in Arabic; entropy in bits (y-axis, 0.0 to 1.2) for t-d, k-g, tt-dd, kk-gg, by token frequency and by type frequency]

(46)

Explaining phonetic

differences

Japanese speakers do not implement voicing during geminates, because it is not informative.

Arabic speakers do implement voicing during geminates, because it is informative.

(47)

Matsuura (2016): voiced geminates in a dialect of Japanese

In some Kyushu dialects, [voice] is contrastive in geminates.

(48)

A bit of digression

Recent EMA experiment with Jason Shaw on the lingual gesture of Japanese devoiced vowels.

Their lingual gestures seem reduced compared to voiced counterparts, in both temporal and spatial dimensions.

[Figure labels: e, o, voiced [u], devoiced [u]]

(49)

Summary

Information Theory explains:

why geminates are more likely to devoice than singletons (in response to OCP(voice)).

why [gg] is more likely to devoice than [dd].

why lexical frequencies and word length affect the likelihood of devoicing.

why Japanese speakers do not implement voicing during geminates as much as Arabic speakers.

(50)

From the lexicon to grammar

I’m not denying the role of a traditional “generative phonological mechanism”.

Weights of constraints are projected from entropy values calculated from the lexicon.

(cf. GLA: Boersma & Hayes 2001, LI)

(51)

[Diagram: Lexicon, Phonology, and Phonetics, with links labelled “uncontroversial” and “very controversial, may threaten the phonology-phonetics divide”]

(52)

Remaining questions

Most work using Information Theory to explain phonological patterns has focused on deletion.

The current analysis implies that Information Theory may work for (context-sensitive) feature neutralization.

Does Information Theory make the right predictions about other features and other phonological processes (like epenthesis; Hume BLS 30; Hume & Bromberg 2005, conference handout)?

Articulatory strengthening in strong positions (Hume 2016).

(53)

Provocative ending

I have moved from the Rutgers linguistics department to the Keio Institute of Cultural and Linguistic Studies. Now I have more opportunities to interact with non-linguists and explain my research to them. I even work with ALS patients to help them preserve their voice.

It is much better if I use vocabulary that is not specific to linguistics. Well, this is purely sociological, and has nothing to do with academic truth, though.

(54)

Provocative ending

I feel that linguists are (perhaps unconsciously) proud of using notations that are very specific to linguistics (which may come from the belief that language is special).

But everything else being equal, why not use tools that are not specific to linguistics (like Information Theory)?

(55)

Acknowledgements

Jason Shaw for continuing inspiration and discussion on this and related topics. Also thanks to Beth Hume for the initial inspiration and her comments on a previous version.

JSPS Grants (#26770147 and #26284059), and especially #15F15715, which makes my collaboration with Jason possible.

Shin-ichiro Sano, Mafuyu Kitahara, and Uriel Cohen-Priva for their help with getting the frequency data.

Thanks to my colleagues at Keio.
