Academic year: 2018


(1)

Can Japanese speakers

compensate for coarticulation

due to [l] and [r]?

FAJL 8 @ Mie University

Chika Takahashi & Shigeto Kawahara (Keio University)

2016/2/18

Contact: kawahara@icl.keio.ac.jp

(2)

Introduction

(3)

Context effect in speech

perception

Speech production of a segment is influenced by surrounding segments (a.k.a. coarticulation). Speech perception of a segment is likewise influenced by surrounding segments.

Classic work by Ladefoged & Broadbent (1957) shows that the perception of vowel height is affected by the precursor sentence.

(4)

Ladefoged & Broadbent

(1957)

[bɪt] [bɛt]

(5)

Context effect as

normalization

Context effect is a way to deal with context-dependent variability (due to coarticulation).

Mann (1980): Given a [d]-[g] continuum, English listeners hear more of the continuum as [g] after [l] than after [r].

(6)

Mann (1980)

[Plot: % /g/ responses in three contexts: /al_/, /ar_/, and isolation]

(7)

Compensation for

coarticulation

Listeners assume that after [l], the speaker’s tongue position is fronted, make up for this assumed fronting, and are therefore more likely to judge the continuum as [g].

This theory is called “compensation for coarticulation” (Fowler 2006, P&P).

Front: [l], [d]

Back: [r], [g]

(8)

Mann (1986)

Japanese listeners are famously unable to hear the difference between English [l] and [r] (Goto 1971, et seq.).

Mann (1986) argues that Japanese speakers cannot hear this difference, but that they nevertheless compensate for coarticulation due to [l] and [r].

(9)

Mann (1986)

Figure 2 (Mann 1986, p. 187).

The pattern of "ga" responses given to stimuli along an acoustic /da/-/ga/ continuum when the stimuli were presented in isolation (left panels), and when they were preceded by /al/ and /ar/ (right panels). From top to bottom, subjects include: (1) native speakers of English who are 100% correct in identifying /l/ and /r/, (2) native speakers of Japanese who are 99% correct in labeling /l/ and /r/, and (3) native speakers of Japanese who perform at chance level in labeling /l/ and /r/.

[Plot: % /g/ responses (0-100) by stimulus number (1-7), running from [d] to [g]; left panels: isolated CV stimuli; right panels: curves for preceding [ar] and preceding [al]; slide annotations: English listeners, Japanese listeners]

(10)

General auditory

contrast?

But why? Are Japanese listeners aware of the different articulatory gestures of [l] and [r] anyway? Mann (1986) attributes this result to a universal perceptual mechanism.

An alternative explanation of the contrast effect is general auditory contrast (Lotto & Kluender 1998).

High F3: [l], [d]

Low F3: [r], [g]

(11)

[Spectrograms, two panels: Low after high; High after low; x-axis Time (s), 0-1; y-axis Frequency (Hz), 0-5000]

(12)

General auditory

contrast

In this theory, listeners do not need to know how [l] and [r] are articulated.

Context effects arise as the result of auditory contrast.

This theory is further supported by the observation that non-speech precursors can cause context effects (Lotto and Kluender 1998).

(13)

Lotto and Kluender’s

(1998) results

Though see

Viswanathan et al. (2009, 2012) for a reply.

(14)

The Current

Experiment

(15)

Questions about Mann

(1986)

Do all Japanese speakers show contrast effect due to [l] and [r]? The general auditory contrast theory predicts that they

should.

Does the magnitude of contrast effect correlate with the ability to distinguish [l] and [r]?

The compensation for coarticulation theory (perhaps) predicts a positive correlation.

Is context effect universal (cf. Beddor et al. 2002; Kang et al. in press; Yu et al. 2013)?

In Mann (1986), context=natural speech; target=synthetic speech. There could have been some unnaturalness.

(16)

The current experiment

The current experiment tested the ability to distinguish [l] and [r] and the context effect due to [l]-[r] at the same time, in the same participants. The current experiment also used a synthetic [l]-[r] continuum (Kingston et al. 2014).

The auditory contrast theory predicts that the higher the F3 of the liquid is, the more [g] responses we should get. Would we observe a simple, linearly increasing effect of the [l]-[r] continuum on the [d]-[g] judgment?

(17)

Stimulus structure

[aXYa] where X is a {r-l} continuum and Y is a {d-g} continuum.

[l]-[r] continuum before [d]

[d]-[g] continuum after [l]

(18)

Method: Stimuli

The two surrounding vowels are always identical, [a] with F3 of 2500 Hz.

A liquid continuum {r-l} was created by varying F3: for the [r]-endpoint, it fell to 2000 Hz, and for the [l]-endpoint, it rose to 2800 Hz.

The continuum was created with 6 step increments.

The liquid portion was followed by a 95 ms gap with low-frequency periodic energy to mimic closure voicing of [d] and [g].

The [d]-[g] continuum was created by varying F3: in the [da] endpoint, F3 began at 2690 Hz, while in the [ga] endpoint, it began at 2104 Hz, again with 6 step increments.
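
As a rough sketch of how the two F3 continua could be laid out, the snippet below interpolates between the endpoint values given above. It assumes that "6 step increments" means six equal steps (seven stimuli per continuum) and that the steps are linear in Hz; neither detail is stated on the slide, and the function name `f3_continuum` is made up for illustration.

```python
def f3_continuum(start_hz, end_hz, n_increments=6):
    """Return F3 targets from start_hz to end_hz in equal linear steps.

    Assumption: 6 increments yield 7 stimuli, spaced linearly in Hz.
    """
    step = (end_hz - start_hz) / n_increments
    return [start_hz + i * step for i in range(n_increments + 1)]


# Endpoint values taken from the slide.
liquid_f3 = f3_continuum(2000.0, 2800.0)  # [r]-endpoint -> [l]-endpoint
stop_f3 = f3_continuum(2104.0, 2690.0)    # [ga]-endpoint -> [da]-endpoint

for name, values in (("liquid", liquid_f3), ("stop", stop_f3)):
    print(name, [round(v, 1) for v in values])
```

Under these assumptions, crossing the two continua would give 7 x 7 = 49 distinct [aXYa] stimuli.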

(19)

Illustration with

spectrograms

[Spectrograms of four stimuli: arda, arga, alda, alga; x-axis Time (s), 0-1; y-axis Frequency (Hz), 0-5000]

(20)

Procedure

In the listening phase, listeners heard one stimulus and were asked to judge whether the second syllable was [da] or [ga].

The order of the stimuli was randomized within each block.

All listeners went through 8 blocks.

In the second phase of the experiment, the listeners were presented with the [ar] and [al] endpoint stimuli in isolation, and were asked to identify these sounds (20 trials). D-prime was calculated for each listener as a measure of their ability to perceive the difference between [r] and [l].

30 native speakers of Japanese participated in this study.
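
The blocked presentation described above could be set up along the following lines. This is only a sketch: it assumes that each block contains every combination of 7 liquid steps and 7 stop steps exactly once, which the slides do not state, and `make_blocks` and the seed are hypothetical.

```python
import itertools
import random


def make_blocks(n_liquid_steps=7, n_stop_steps=7, n_blocks=8, seed=0):
    """Return n_blocks lists, each a fresh random order of all stimuli.

    Assumption: every (liquid step, stop step) pair occurs once per block.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    stimuli = list(itertools.product(range(n_liquid_steps), range(n_stop_steps)))
    blocks = []
    for _ in range(n_blocks):
        block = stimuli[:]  # copy so each block is shuffled independently
        rng.shuffle(block)
        blocks.append(block)
    return blocks


blocks = make_blocks()
print(len(blocks), "blocks of", len(blocks[0]), "trials each")
```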

(21)

Analyzing the

identification patterns

Various logistic models were fit, and the model with the best AIC was chosen.

logit(Y) = β0 + β1·DG + β2·RL + β3·DG·RL + e

β2 = context effect
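
To make the role of each coefficient concrete, the sketch below evaluates the model's predicted response probability for given coefficient values. The function name, the coding of DG and RL as numeric continuum steps, and the coefficient values are all illustrative assumptions, not the fitted values from the experiment.

```python
import math


def predicted_probability(dg, rl, b0, b1, b2, b3):
    """Inverse logit of b0 + b1*DG + b2*RL + b3*DG*RL."""
    linear = b0 + b1 * dg + b2 * rl + b3 * dg * rl
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients, chosen only for illustration.
coefs = dict(b0=-3.0, b1=1.0, b2=0.5, b3=0.0)

# With b3 = 0, moving one step along the liquid continuum shifts the
# log-odds by b2 at every step of the stop continuum.
for rl in (0, 6):
    row = [round(predicted_probability(dg, rl, **coefs), 2) for dg in range(7)]
    print("rl =", rl, row)
```

A β2 near zero would make the two printed rows nearly identical, which corresponds to a listener showing no context effect.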

(22)

D-prime

d-prime = z(hit) - z(FA)

Measure of the ability to distinguish /r/ and /l/. Higher d-prime values indicate higher sensitivity to the contrast.

                    stimuli
response        l (True)    r (False)
l (True)        hit         false alarm
r (False)       miss        correct rejection
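
A minimal sketch of the d-prime calculation, using only the Python standard library. The slides do not say how hit or false-alarm rates of exactly 0 or 1 were handled, so the log-linear correction below is an assumption, as are the function name and the example counts.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    rates away from 0 and 1, where z would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 10 [l] trials and 10 [r] trials.
print(round(d_prime(hits=9, misses=1, false_alarms=2, correct_rejections=8), 2))
```

A listener responding at chance gets a d-prime near 0, and a listener who systematically swaps the two labels gets a negative value.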

(23)

Prediction 1

[Plot: y-axis pred (0.00-1.00), x-axis dg (0-6); separate curves after [al] and after [ar]]

Those who are sensitive to the [l]-[r] distinction would show a strong context effect.

(24)

Prediction 2

Those who are not sensitive to the [r]-[l] distinction would show a weak context effect.

[Plot: y-axis pred (0.00-1.00), x-axis dg (0-6)]

(25)

Results

(26)

Average identification

functions

β<.001

No context effects for Japanese

listeners?

(27)

Interspeaker differences

[Plots for two listeners: y-axis pred (0.00-1.00), x-axis dg (0-6); curves after [al] and after [ar]; panel annotations: β=1.05 and β<.001]

(28)

Correlation between d’ and

magnitude of context effect

[Scatter plot: correlation of d-prime and the beta coefficient for the effect of the /r/-/l/ continuum on stop judgment; x-axis: d-prime, a measure of the ability to distinguish [l] and [r] (−1 to 4); y-axis: beta coefficient, the magnitude of context effect (−1.0 to 1.0)]

r = -0.4, p <.01

Those with high d-prime values show an “anti-compensation for coarticulation” effect.

Those with low d-prime values can differ in how they are affected by context.
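
For readers who want to see how such a correlation is computed, here is a small sketch of Pearson's r between per-listener d-prime values and per-listener beta coefficients. The function name is hypothetical and the numbers are made up purely to show the mechanics; they are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five imaginary listeners; NOT the study's data.
d_primes = [0.0, 0.5, 1.0, 2.0, 3.0]
betas = [0.6, 0.1, 0.3, -0.2, -0.5]
print(round(pearson_r(d_primes, betas), 2))
```

A negative r, as reported on the slide, means that listeners with higher d-prime values tend to have lower (more negative) beta coefficients.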

(29)

It is not the case that the most [r]-like liquid induces the most [d] responses.

This is not expected under the auditory contrast theory, and is perhaps hard to explain under the compensation for coarticulation theory.

All the data together

(30)

Result summary

There are three groups of Japanese listeners:

1. those who show the expected context effect.

2. those who show an unexpected context effect (i.e. assimilators).

3. those who are insensitive.

Those who can distinguish [r] and [l] tend to belong to Group 2.

The relationship between the liquid’s F3 and the perceived F3 of the following stop is not (negatively) linear.

(31)

What do the current results say

about the theories of speech

perception?

These results are predicted by neither the compensation for coarticulation theory nor general auditory contrast. We could only partially replicate Mann (1986).

After all, where does “assimilation effect” come from?

“Mis-parsing” explanation pursued in Kingston’s lab at UMass; e.g. the low frequency of [r]’s F3 is “mis-parsed” as information belonging to the stop, inducing [g]-responses.

But why does mis-parsing happen and when?

(32)

Why assimilation?

Those who know English well may be sensitive to lexical statistics.

The IPhOD calculator (Vaden et al 2009):

                          rd        ld        rg        lg
raw frequency             0.00380   0.00244   0.00068   0.00011
conditional probability   0.848     0.957     0.152     0.043

In terms of conditional probability, the bias toward [d] is stronger after [l] (0.957) than after [r] (0.848).

No explicit instructions that the stimuli were English words.
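
The conditional probabilities listed on this slide appear to be each sequence's raw frequency divided by the summed raw frequency of the two sequences that begin with the same liquid. That reading is inferred from the numbers rather than stated, and the function name is hypothetical; the sketch below re-derives the values under it.

```python
# Raw frequencies as given on the slide (IPhOD; Vaden et al 2009).
raw = {"rd": 0.00380, "ld": 0.00244, "rg": 0.00068, "lg": 0.00011}


def conditional_probabilities(raw_freq):
    """P(stop | liquid) for each liquid-stop sequence (inferred definition)."""
    cond = {}
    for seq, freq in raw_freq.items():
        liquid = seq[0]
        total = sum(f for s, f in raw_freq.items() if s[0] == liquid)
        cond[seq] = freq / total
    return cond


cond = conditional_probabilities(raw)
for seq in ("rd", "rg", "ld", "lg"):
    print(seq, round(cond[seq], 3))
```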

(33)

Discussion and remaining

questions

Not all Japanese speakers show context effect due to [l] and [r].

The results are not compatible with

either compensation for coarticulation or general auditory contrast.

What’s the mechanism behind

“assimilation”?

(34)

My teacher told me to…

(35)

Acknowledgments

John Kingston for our collaboration (past and present)

JSPS grants to the second author: #26770147 and #26284059.

Members of the Keio phonetics/phonology study group.

Participants at Tokyo Circle of Phonologists (1/30/2016).
