fajl8 最近の更新履歴川原繁人の論文倉庫3

(1)

Can Japanese speakers

compensate for coarticulation

due to [l] and [r]?

FAJL 8 @ Mie University

Chika Takahashi & Shigeto Kawahara (Keio University)

2016/2/18

Contact: [email protected]

(2)

Introduction

(3)

Context effect in speech

perception

Speech production of a segment is influenced by surrounding segments (a.k.a. coarticulation). Speech perception of a segment is likewise influenced by surrounding segments.

Classic work by Ladefoged & Broadbent (1957) shows that the perception of vowel height is affected by the precursor sentence.

(4)

Ladefoged & Broadbent

(1957)

[bɪt] [bɛt]

(5)

Context effect as

normalization

Context effect is a way to deal with context-dependent variability (due to coarticulation).

Mann (1980): Given a [d]-[g] continuum, English listeners hear more of the continuum as [g] after [l] than after [r].

(6)

Mann (1980)

[Figure: % /g/ responses along the [d]-[g] continuum in three conditions: after /al_/, after /ar_/, and in isolation]

(7)

Compensation for

coarticulation

Listeners assume that after [l], the speaker’s tongue position is fronted, make up for this assumed fronting, and are therefore more likely to judge the continuum as [g].

This theory is called “compensation for coarticulation” (Fowler 2006, P&P).

Front: [l], [d]
Back: [r], [g]

(8)

Mann (1986)

Japanese listeners are famously unable to hear the difference between English [l] and [r] (Goto 1971, et seq.).

Mann (1986) argues that Japanese speakers cannot hear this difference, but they nevertheless compensate for coarticulation due to [l] and [r].

(9)

Mann (1986)

[Figure 2 from Mann (1986), "Speech perception", p. 187. Caption: The pattern of "ga" responses given to stimuli along an acoustic /da/-/ga/ continuum when the stimuli were presented in isolation (left panels), and when they were preceded by /al/ and /ar/ (right panels). From top to bottom, subjects include: (1) native speakers of English who are 100% correct in identifying /l/ and /r/, (2) native speakers of Japanese who are 99% correct in labeling /l/ and /r/, and (3) native speakers of Japanese who perform at chance level in labeling /l/ and /r/. Axes: stimulus number (1-7, from [d] to [g]) and % /g/ responses (0-100). Slide labels: English listeners, Japanese listeners.]

(10)

General auditory

contrast?

But why? Are Japanese listeners aware of the different articulatory gestures of [l] and [r] anyway? Mann (1986) attributes this result to a universal perceptual mechanism.

An alternative explanation of the effect is general auditory contrast (Lotto & Kluender 1998).

High F3: [l], [d]
Low F3: [r], [g]

(11)

[Spectrograms, two panels: "Low after high" and "High after low". Axes: Time (s), 0-1; Frequency (Hz), 0-5000]

(12)

General auditory

contrast

In this theory, listeners do not need to know how [l] and [r] are articulated.

Context effects arise as the result of auditory contrast.

This theory is further supported by the observation that non-speech precursors can cause context effects (Lotto and Kluender 1998).

(13)

Lotto and Kluender’s

(1998) results

Though see

Viswanathan et al. (2009, 2012) for a reply.

(14)

The Current

Experiment

(15)

Questions about Mann

(1986)

Do all Japanese speakers show a contrast effect due to [l] and [r]? The general auditory contrast theory predicts that they should.

Does the magnitude of contrast effect correlate with the ability to distinguish [l] and [r]?

The compensation for coarticulation theory (perhaps) predicts a positive correlation.

Is context effect universal (cf. Beddor et al. 2002; Kang et al. in press; Yu et al. 2013)?

In Mann (1986), context=natural speech; target=synthetic speech. There could have been some unnaturalness.

(16)

The current experiment

The current experiment tested both the ability to distinguish [l] and [r] and the context effect due to [l]-[r], at the same time and in the same participants. The current experiment also used a synthetic [l]-[r] continuum (Kingston et al. 2014).

The auditory contrast theory predicts that the higher the F3 is, the more [g] responses we should get. Would we observe a simple linear increasing effect of the [l]-[r] continuum on the [d]-[g] judgment?

(17)

Stimulus structure

[aXYa] where X is a {r-l} continuum and Y is a {d-g} continuum.

[l]-[r] continuum before [d]

[d]-[g] continuum after [l]

(18)

Method: Stimuli

• The two surrounding vowels are always identical, [a] with F3 of 2500 Hz.

• A liquid continuum {r-l} was created by varying F3: for the [r]-endpoint, it fell to 2000 Hz, and for the [l]-endpoint, it rose to 2800 Hz.

• The continuum was created with 6 step increments.

• The liquid portion was followed by a 95 ms gap with low-frequency periodic energy to mimic closure voicing of [d] and [g].

• The [d]-[g] continuum was created by varying F3: in the [da] endpoint, F3 began at 2690 Hz, while in the [ga] endpoint, it began at 2104 Hz, again with 6 step increments.
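The two F3 continua described in this list can be laid out numerically. The following is only a minimal sketch: it assumes that "6 step increments" means six equal, linearly spaced increments in Hz (seven stimuli per continuum, indexed 0 to 6), and that the two continua were fully crossed. Neither the spacing nor the crossing is stated on the slide, and the function name `f3_continuum` is hypothetical.

```python
def f3_continuum(start_hz, end_hz, n_increments=6):
    """Return F3 values from start_hz to end_hz in equal linear increments.

    Linear spacing in Hz is an assumption; the slides do not state the spacing.
    """
    step = (end_hz - start_hz) / n_increments
    return [start_hz + i * step for i in range(n_increments + 1)]


# Liquid continuum: F3 from the [r] endpoint (2000 Hz) to the [l] endpoint (2800 Hz).
liquid_f3 = f3_continuum(2000, 2800)

# Stop continuum: F3 onset from the [da] endpoint (2690 Hz) to the [ga] endpoint (2104 Hz).
stop_f3 = f3_continuum(2690, 2104)

# Assumed full crossing: every [aXYa] stimulus is one (liquid F3, stop F3) pair.
stimuli = [(liq, stp) for liq in liquid_f3 for stp in stop_f3]

print(liquid_f3)
print(stop_f3)
print(len(stimuli))
```

Under these assumptions each continuum has seven members, which is consistent with the 0-6 axis used in the identification plots.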

(19)

Illustration with

spectrograms

[Spectrograms of four stimuli: arda, arga, alda, alga. Axes: Time (s), 0-1; Frequency (Hz), 0-5000]

(20)

Procedure

• In the listening phase, listeners heard one stimulus and were asked to judge whether the second syllable was [da] or [ga].

• The order of the stimuli was randomized within each block.

• All listeners went through 8 blocks.

• In the second phase of the experiment, the listeners were presented with the [ar] and [al] endpoint stimuli in isolation, and were asked to identify these sounds (20 trials). D-prime was calculated for each listener as a measure of their ability to perceive the difference between [r] and [l].

• 30 native speakers of Japanese participated in this study.
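The blocked, randomized presentation described above could be set up roughly as follows. This is a sketch under stated assumptions, not the script used in the experiment: it assumes that each block contains every stimulus exactly once and that the stimulus set is a 7 x 7 crossing of continuum steps. The function `build_trial_list` and the step indices are hypothetical.

```python
import random


def build_trial_list(stimuli, n_blocks=8, seed=None):
    """Return (block, stimulus) pairs with stimulus order shuffled within each block."""
    rng = random.Random(seed)
    trials = []
    for block in range(n_blocks):
        order = list(stimuli)   # copy, so the master list is never reordered
        rng.shuffle(order)      # new random order for this block
        trials.extend((block, stim) for stim in order)
    return trials


# Hypothetical stimulus set: (liquid step, stop step), each indexed 0-6.
stimuli = [(rl, dg) for rl in range(7) for dg in range(7)]

trials = build_trial_list(stimuli, n_blocks=8, seed=1)
print(len(trials))
```

Seeding the generator per listener makes a given presentation order reproducible, which helps when checking responses against the trial log.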

(21)

Analyzing the identification patterns

Various logistic models were fit, and the model with the best AIC was chosen.

$\mathrm{logit}(Y) = \beta_0 + \beta_1 \mathrm{DG} + \beta_2 \mathrm{RL} + \beta_3 (\mathrm{DG} \times \mathrm{RL}) + e$

$\beta_2$ = context effect
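To see what the RL coefficient does in this model, the sketch below evaluates the predicted probability of a [g] response at the two ends of the liquid continuum. The coefficient values are invented for illustration and are not estimates from the study. Coding DG and RL as steps 0-6 (0 = [d] and [r] ends) is likewise an assumption, and `prob_g` is a hypothetical helper name.

```python
import math


def prob_g(dg, rl, b0, b1, b2, b3):
    """Predicted P([g] response) under logit(Y) = b0 + b1*DG + b2*RL + b3*DG*RL."""
    eta = b0 + b1 * dg + b2 * rl + b3 * dg * rl
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, for illustration only (not fitted values).
coefs = dict(b0=-3.0, b1=1.0, b2=0.3, b3=0.0)

for dg in range(7):
    after_r = prob_g(dg, rl=0, **coefs)  # assumed [r] end of the liquid continuum
    after_l = prob_g(dg, rl=6, **coefs)  # assumed [l] end of the liquid continuum
    print(dg, round(after_r, 3), round(after_l, 3))
```

With a positive RL coefficient, the identification curve after the [l]-like liquid lies above the curve after the [r]-like liquid, which is the direction expected under both compensation for coarticulation and auditory contrast. A negative coefficient reverses the ordering.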

(22)

D-prime

d-prime = z(hit) - z(FA)

Measure of the ability to distinguish /r/ and /l/. Higher d-prime values indicate higher sensitivity to the contrast.

Response \ Stimulus    l (True)    r (False)
l (True)               hit         false alarm
r (False)              miss        correct rejection
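The d-prime formula can be computed directly from the four response counts. The sketch below uses Python's standard-library `statistics.NormalDist` for the z-transform. The log-linear correction (adding 0.5 to each cell) is an assumption added here so that hit or false-alarm rates of exactly 0 or 1 do not produce infinite z-scores; the slides do not say how such cases were handled. The counts in the example are made up.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), treating [l] as the signal category.

    A log-linear correction (+0.5 per cell) is applied; this is an assumption,
    not something stated in the slides.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Made-up counts for one listener: 10 [l] trials and 10 [r] trials.
print(d_prime(hits=8, misses=2, false_alarms=3, correct_rejections=7))
```

A listener who responds at chance gets a d-prime near zero, and a listener who systematically swaps the two labels gets a negative value.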

(23)

Prediction 1

[Plot: pred (0.00-1.00) against dg (0-6), with separate curves after [al] and after [ar]]

Those who are sensitive to the [l]-[r] distinction would show a strong context effect.

(24)

Prediction 2

Those who are not sensitive to the [r]-[l] distinction would show a weak context effect.

[Plot: pred (0.00-1.00) against dg (0-6)]

(25)

Results

(26)

Average identification functions

β<.001

No context effects for Japanese listeners?

(27)

Interspeaker differences

[Plots, two panels: pred (0.00-1.00) against dg (0-6), with curves after [al] and after [ar]. Panel annotations: β=1.05 and β<.001]

(28)

Correlation with d’ and

magnitude of context effect

[Scatterplot: "Correlation of d-prime and beta coefficient effect of /r/-/l/". Axes: d-prime, a measure of the ability to distinguish /r/ and /l/ (ticks -1 to 4); beta coefficient for the effect of the /r/-/l/ continuum on the stop judgment, i.e. the magnitude of the context effect (ticks -1.0 to 1.0)]

r = -0.4, p <.01

Those with high d-prime values show an “anti-compensation for coarticulation” effect.

Those with low d-prime values can differ in how they are affected by the context.
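The correlation reported on this slide relates each listener's d-prime to that listener's context-effect coefficient. A minimal sketch of a Pearson correlation is given below. The five data points are invented placeholders, not values from the experiment, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder data: one d-prime and one context-effect coefficient per listener.
d_primes = [0.1, 0.5, 1.2, 2.0, 3.1]
betas = [0.4, 0.6, 0.1, -0.2, -0.5]

print(round(pearson_r(d_primes, betas), 2))
```

A negative coefficient means that listeners with higher d-prime tend to have smaller, or negative, context-effect coefficients.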

(29)

All the data together

It is not the case that the most [r]-like liquid induces the most [d] responses.

This is not expected under the auditory contrast theory, and is perhaps hard to explain under the compensation for coarticulation theory.

(30)

Result summary

There are three groups of Japanese listeners:
1. those who show the expected context effect;
2. those who show an unexpected context effect (i.e. assimilators);
3. those who are insensitive.

Those who can distinguish [r] and [l] tend to belong to Group 2.

The relationship between the liquid’s F3 and the perceived F3 of the following stop is not (negatively) linear.

(31)

What do the current results say

about the theories of speech

perception?

These results are predicted by neither the compensation for coarticulation theory nor general auditory contrast. We could only partially replicate Mann (1986).

After all, where does “assimilation effect” come from?

“Mis-parsing” explanation pursued in Kingston’s lab at UMass; e.g. the low frequency of [r]’s F3 is “mis-parsed” as information belonging to the stop, inducing [g]-responses.

But why does mis-parsing happen and when?

(32)

Why assimilation?

Those who know English well may be sensitive to lexical statistics.

The IPhOD calculator (Vaden et al. 2009):

Raw frequency:
rd 0.00380    ld 0.00244
rg 0.00068    lg 0.00011

Conditional probability:
rd 0.848    ld 0.957
rg 0.152    lg 0.043

Bias toward [d] is slightly stronger after [l] than after [r].

No explicit instructions that the stimuli were English words.
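The conditional probabilities on this slide can be recovered from the raw frequencies by normalizing within each preceding liquid. The sketch below assumes that "conditional probability" means P(stop | liquid) computed over the two stops [d] and [g] only. That reading reproduces the slide's figures, but the slide does not state it explicitly. The helper name `conditional` is hypothetical.

```python
# Raw cluster frequencies as listed on the slide (IPhOD, Vaden et al. 2009).
raw = {"rd": 0.00380, "rg": 0.00068, "ld": 0.00244, "lg": 0.00011}


def conditional(raw_freq):
    """Return P(stop | liquid), normalizing over [d] and [g] for each liquid."""
    probs = {}
    for liquid in ("r", "l"):
        total = raw_freq[liquid + "d"] + raw_freq[liquid + "g"]
        for stop in ("d", "g"):
            probs[liquid + stop] = raw_freq[liquid + stop] / total
    return probs


probs = conditional(raw)
for cluster in ("rd", "rg", "ld", "lg"):
    print(cluster, round(probs[cluster], 3))
```

Printing the two [d] probabilities side by side makes the direction of the lexical bias easy to check against the raw counts.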

(33)

Discussion and remaining

questions

Not all Japanese speakers show a context effect due to [l] and [r].

The results are not compatible with either compensation for coarticulation or general auditory contrast.

What’s the mechanism behind “assimilation”?

(34)

My teacher told me to…

(35)

Acknowledgments

John Kingston for our collaboration (past and present)

JSPS grants to the second author: #26770147 and #26284059.

Members of the Keio phonetics/phonology study group.

Participants at Tokyo Circle of Phonologists (1/30/2016).
