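An examination of factors determining the limits of sound source segregation (summary of a presentation awarded at the 23rd Annual Meeting)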


The Japanese Psychonomic Society

NII-Electronic Library Service

The Japanese Journal of Psychonomic Science

2005, Vol. 24, No. 1, 127-128

Summary of Awarded Presentation 2P15

Factors affecting the perception of multiple simultaneous voices

Takayuki KAWASHIMA*, 1) and Takao SATO**

The University of Tokyo

The maximum number of streams that could be heard from a mixed sound was measured. In each trial a mixed voice and a single voice (probe) were successively presented. The 7 adult subjects were required to judge whether or not the probe was present in the multiple voices. The perceptual limit (the maximum number of streams) was calculated by multiplying the true hit ratio by the number of speakers. The estimated perceptual limit was approximately 3 when more than 4 speakers were presented. The results indicate that auditory processing is more effective than previously believed (cf. Kashino & Hirahara, 1996). The data indicated that the probe could be detected more easily when it was presented before the mixed voices rather than after the mixed voices. This may indicate that the perceptual limit results from a limited attention capacity.

Key words: multiple voices, attention, sound source segregation, masking

We have measured the maximum number of perceptual streams that can be heard from a sound which arises from multiple simultaneous sources. The limit of perception was then considered to measure the efficiency of auditory processing.

Other than Kashino and Hirahara (1996), there are few studies that have attempted to measure the limit of perception of multiple simultaneous sounds. In their report multiple voices were presented to the subjects. The number of speakers (voices) was manipulated as an independent variable, and the subjects were required to give their number. The proportion of correct responses was very high (near to unity) when 1 or 2 speakers were presented but decreased rapidly when more than 3 speakers were presented. The authors therefore concluded that the maximum number of streams that can be perceived is approximately two.

One drawback of estimating the perceptual limit by asking the subjects to count the number of voices is that they can answer (correctly) without perceiving the individual voices separately. For example, they might use the timbre of the mixed sound as a cue for estimating the number. It is possible, therefore, that Kashino and Hirahara (1996) did not measure the perceptual limit correctly.

* 21st century COE program "Center for ary cognitive sciences," Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902

1) Takayuki Kawashima is now at the Department of Cognitive and Behavioral Science, Graduate School of Arts and Sciences, The University of Tokyo

** Department of Psychology, Graduate School of Humanities and Sociology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033

We estimated the limit of perception with a new method. In every trial of the present experiment we presented mixed voices and a single voice (probe) successively. The subjects were required to judge whether or not the probe was present in the multiple voices. At the conclusion we calculated the true hit ratio ($H_t$) according to Equation 1 (Macmillan & Creelman, 1991, p. 89).

$$H_t = \frac{H - F}{1 - F} \qquad (1)$$

The symbols $H$ and $F$ in Equation 1 represent the hit ratio and the false alarm ratio respectively. For each condition of the number of speakers, $H_t$ was calculated and the number of streams was estimated by multiplying $H_t$ by the number of concurrent speakers. This procedure ensured that the estimation of the perceptual limit was based on a perceptual separation of a mixed sound.
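The estimation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in Equation 1, not the authors' analysis code: the function names are hypothetical, and the numbers in the usage line are made up for the example rather than taken from the experiment.

```python
def true_hit_ratio(hit_ratio: float, false_alarm_ratio: float) -> float:
    """Hit ratio corrected for guessing: (H - F) / (1 - F), as in Equation 1."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    if not 0.0 <= false_alarm_ratio < 1.0:
        # F = 1 would make the denominator zero.
        raise ValueError("false_alarm_ratio must lie in [0, 1)")
    return (hit_ratio - false_alarm_ratio) / (1.0 - false_alarm_ratio)


def estimated_streams(hit_ratio: float, false_alarm_ratio: float,
                      n_speakers: int) -> float:
    """Estimated number of streams: true hit ratio times the number of speakers."""
    return true_hit_ratio(hit_ratio, false_alarm_ratio) * n_speakers


# Hypothetical values for illustration only (not data from the study).
print(estimated_streams(hit_ratio=0.75, false_alarm_ratio=0.5, n_speakers=4))  # 2.0
```

With a perfect hit ratio and no false alarms the true hit ratio is 1, so the estimate equals the number of speakers, which is the largest value this measure can take in a given condition.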

Methods

Participants The participants were 7 Japanese adults with normal hearing.

Apparatus and stimuli A set of 30 Japanese words was digitally recorded on audio tape (44100 Hz, 16 bit) in a soundproof room by 7 female speakers who did not participate as subjects. All of the words consisted of 4 moras and their average duration was 0.87 seconds. All of the stimuli were presented diotically through headphones (Sennheiser HDA200), and a single word was played at a sound level of 63 dB SPL. The stimulus presentation and data acquisition were controlled by a personal computer (Apple Power Mac G4).

Procedure As mentioned above, we presented a mixed sound and a single voice successively in every trial. The probability that the mixed sound contained the probe was 0.5. In one condition, the probe was presented before the multiple voices (preprobe condition), and in the other condition, it was presented after the multiple voices (postprobe condition). A silent interval (0.3 seconds) was inserted between the two sounds in both conditions. There were 6 conditions in which 1, 2, 3, 4, 5, and 6 voices produced the mixed sound. These 6 conditions were presented in a random order in a block. The mixed voices were composed of different speakers, each of whom said a different word. The speaker of the probe voice was randomly selected from the 7 speakers in every trial. The participants repeated their judgments 33 times in each condition. Three of the 7 subjects were tested first in the preprobe condition and the remainder were tested first in the postprobe condition.
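As a rough illustration of the trial structure described in the Procedure, the following Python sketch builds one session for a single probe-position condition. It is not the authors' software. Several details are assumptions that the text does not state: that the 33 repetitions are organized as 33 blocks, that probe presence is drawn independently on each trial, that a present probe is one of the voices in the mixture, and that an absent probe uses a word not contained in the mixture. All names are hypothetical.

```python
import random

N_SPEAKERS = 7                     # recorded speakers (from the text)
N_WORDS = 30                       # recorded words (from the text)
VOICE_COUNTS = (1, 2, 3, 4, 5, 6)  # voices per mixture (from the text)
REPEATS = 33                       # judgments per condition (from the text)


def make_trial(n_voices, rng):
    """Build one trial: a mixture of (speaker, word) pairs plus a probe."""
    # Different speakers, each saying a different word (from the text).
    speakers = rng.sample(range(N_SPEAKERS), n_voices)
    words = rng.sample(range(N_WORDS), n_voices)
    mixture = list(zip(speakers, words))

    # Assumption: probe presence is drawn independently with p = 0.5.
    probe_present = rng.random() < 0.5
    if probe_present:
        # Assumption: a present probe is one of the voices in the mixture.
        probe = rng.choice(mixture)
    else:
        # Assumption: an absent probe uses a word not in the mixture.
        unused_words = [w for w in range(N_WORDS) if w not in words]
        probe = (rng.randrange(N_SPEAKERS), rng.choice(unused_words))
    return {"n_voices": n_voices, "mixture": mixture,
            "probe": probe, "probe_present": probe_present}


def make_session(seed=0):
    """Assumption: 33 blocks, each holding the 6 conditions in random order."""
    rng = random.Random(seed)
    trials = []
    for _ in range(REPEATS):
        order = list(VOICE_COUNTS)
        rng.shuffle(order)
        trials.extend(make_trial(n, rng) for n in order)
    return trials


session = make_session()
print(len(session))  # 198 trials = 6 conditions x 33 repetitions
```

The probe position (before or after the mixture) is treated here as a property of the whole session, so the same generator would be run once for the preprobe condition and once for the postprobe condition.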

Results and Discussion

The mean values of the estimated perceptual limit across the subjects are shown in Figure 1. In the postprobe condition the number of streams was approximately 3 when more than 4 speakers were presented (square symbols). This suggests that at least 3 streams can be separated from multiple simultaneous voices and indicates that auditory processing is more efficient than previously reported.

There was a considerable difference in the estimated number of streams between the 2 probe conditions when more than 3 speakers were presented. A two-way ANOVA revealed that the effect of the interaction was significant (F(5, 30) = 14.12, p < 0.01), and the simple main effect of the probe position was significant when more than 3 speakers were presented (p < 0.05). If information of the individual sounds was lost by mutual interference in the peripheral auditory system (as in energetic masking), then the changed probe position would not affect the estimation of the number of streams. The difference between the probe conditions therefore may indicate that the perceptual limit is determined by cognitive (central) factors, such as attention or memory.

[Figure 1 plot. Horizontal axis: Number of Talkers, 1 to 6.]

Figure 1. The estimated number of streams as a function of the number of concurrent speakers. The symbols (□ and small ○) indicate the average values across the subjects. Error bars are the SEMs. The thin line represents the theoretical maximum number.

It is not clear how the limit of perception is changed when sounds other than human voices create a mixed sound, or when objects are presented across different sense modalities. Measurement of the perceptual limit under these conditions will be helpful for understanding how our perceptual world arises.

References

Kashino, M. & Hirahara, T. 1996 One, two, many: judging the number of concurrent talkers. Journal of the Acoustical Society of America, 99, 2597.

Macmillan, N. A. & Creelman, C. D. 1991 Detection theory: A user's guide. Cambridge: Cambridge University Press.
