
employing Point-of-View Reinforcement

Kenji Nagamatsu and Hidehiko Tanaka

University of Tokyo

Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan, 113

1 Introduction

Two different words may not be similar in general; rather, they are similar under some aspects or points of view. This paper proposes a new similarity measure between words based on points of view. The method uses co-occurrence-probability-based similarity as a basis and extends it by weighting the values according to the relevance between the input words and point-of-view words (called point-of-view reinforcement).

2 Similarity with Point-of-View

Based on both corpus- and feature-based measures, our similarity $\mathit{Sim}(w_1, w_2, w_p)$ is defined as follows.

$$
\mathit{Sim}(w_1, w_2, w_p) = \sum_{\forall w \in \mathit{Co}(w_1) \cap \mathit{Co}(w_2)} \frac{\Pr(w \mid w_1, w_p) + \Pr(w \mid w_2, w_p)}{2}
$$

$\Pr(w \mid w', w_p)$ denotes the co-occurrence probability of $w$ conditioned by $w'$ and reinforced by a point-of-view word $w_p$, and $\mathit{Co}(w)$ denotes the set of words co-occurring with $w$.
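The basic similarity sums, over every word shared by the two co-occurrence sets, the average of the two reinforced probabilities. The Python sketch below mirrors only that sum-of-averages form. It assumes the co-occurrence sets are given as a dictionary from a word to a set of words and that the reinforced probability is supplied as a function; the names `pov_similarity`, `co`, and `prob` are hypothetical, not taken from the original system.

```python
from typing import Callable, Dict, Set

# prob(w, w0, wp) stands for Pr(w | w0, wp); it is supplied by the caller.
ProbFn = Callable[[str, str, str], float]


def pov_similarity(
    w1: str,
    w2: str,
    wp: str,
    co: Dict[str, Set[str]],
    prob: ProbFn,
) -> float:
    """Sum over shared co-occurring words w of the average of
    Pr(w | w1, wp) and Pr(w | w2, wp)."""
    shared = co.get(w1, set()) & co.get(w2, set())
    return sum((prob(w, w1, wp) + prob(w, w2, wp)) / 2 for w in shared)


if __name__ == "__main__":
    # Toy data (hypothetical): two shared words, constant probability stub.
    co = {"car": {"road", "drive", "engine"}, "truck": {"road", "drive", "cargo"}}
    print(pov_similarity("car", "truck", "vehicle", co, lambda w, w0, wp: 0.25))
    # prints 0.5
```

Words outside the intersection contribute nothing, so two words with disjoint co-occurrence sets get a similarity of 0.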

The point-of-view reinforcement is responsible for modulating this basic similarity by point-of-view words.

$$
\Pr(w \mid w', w_p) = \frac{\alpha^{\delta(w_p, w)}\, f(w \mid w')}{\left(\alpha^{\delta(w_p, w)} - 1\right) f(w \mid w') + \sum_{\forall x \in \mathit{Co}(w')} f(x \mid w')}
$$

$f(w \mid w')$ denotes the normal co-occurrence frequency, and $\alpha$ is a parameter controlling how strongly the relatedness of the pair $(w_p, w)$ affects the similarity.
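The reinforcement can be read as multiplying the frequency of $w$ by a weight and renormalizing: since $w \in \mathit{Co}(w')$, the denominator equals the weighted frequency of $w$ plus the unweighted frequencies of every other co-occurring $x$. The Python sketch below assumes the weight is a base parameter (called `alpha` here) raised to the relatedness factor (called `delta` here); both names are hypothetical labels for illustration. Frequencies are passed as a dictionary from each $x \in \mathit{Co}(w')$ to $f(x \mid w')$.

```python
from typing import Dict


def reinforced_prob(
    w: str,
    freqs: Dict[str, float],
    delta: float,
    alpha: float,
) -> float:
    """Reinforced co-occurrence probability of w given w'.

    freqs maps every x in Co(w') to f(x | w'); delta is the relatedness
    between the point-of-view word and w; alpha is the base parameter.
    Computes alpha**delta * f(w|w') /
             ((alpha**delta - 1) * f(w|w') + sum_x f(x|w')).
    """
    f_w = freqs.get(w, 0.0)
    weight = alpha ** delta  # assumed weight form: base ** relatedness
    denom = (weight - 1.0) * f_w + sum(freqs.values())
    return weight * f_w / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical frequencies f(x | w') for three co-occurring words.
    freqs = {"road": 6, "drive": 3, "cargo": 1}
    print(reinforced_prob("road", freqs, delta=1.0, alpha=1.0))  # prints 0.6
    print(reinforced_prob("road", freqs, delta=1.0, alpha=2.0))  # prints 0.75
```

With `alpha = 1.0` the weight is 1 and the function returns the plain relative frequency $f(w \mid w') / \sum_x f(x \mid w')$, that is, the unreinforced co-occurrence probability; a larger base pushes related words' probabilities up more sharply.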

$\delta(w_p, w)$ is the factor indicating the relatedness between the input words and a point-of-view word. It is defined as the mutual information content between $w_p$ and $w$, and is approximated with another type of co-occurrence data extracted from a tagged corpus.
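The text does not specify how the mutual information is estimated from the tagged corpus. As one common choice, the Python sketch below computes pointwise mutual information in bits, $\log_2 \frac{P(x, y)}{P(x)\,P(y)}$, from raw counts, with all three probabilities estimated as counts over the same total `n`. The count tables, the helper names `pair_key` and `pointwise_mi`, and the zero back-off for unseen pairs are assumptions, not the authors' procedure.

```python
import math
from typing import Dict, Tuple


def pair_key(x: str, y: str) -> Tuple[str, str]:
    """Order-independent key for a word pair."""
    return (x, y) if x <= y else (y, x)


def pointwise_mi(
    x: str,
    y: str,
    pair_counts: Dict[Tuple[str, str], int],
    word_counts: Dict[str, int],
    n: int,
) -> float:
    """PMI in bits: log2(P(x, y) / (P(x) * P(y))) with P(.) = count / n."""
    c_xy = pair_counts.get(pair_key(x, y), 0)
    c_x = word_counts.get(x, 0)
    c_y = word_counts.get(y, 0)
    if c_xy == 0 or c_x == 0 or c_y == 0:
        return 0.0  # assumed back-off: no evidence, treated as no association
    # (c_xy / n) / ((c_x / n) * (c_y / n)) simplifies to c_xy * n / (c_x * c_y)
    return math.log2(c_xy * n / (c_x * c_y))


if __name__ == "__main__":
    # Hypothetical counts from a tagged corpus.
    pairs = {pair_key("tire", "wheel"): 4}
    words = {"tire": 8, "wheel": 10}
    print(pointwise_mi("wheel", "tire", pairs, words, n=80))  # prints 2.0
```

Positive values mean the two words co-occur more often than chance would predict, negative values less often; returning 0 for an unseen pair is a neutral choice that claims neither.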

3 Experiments

One experiment is a selectivity test [Nagamatsu and Tanaka, 1996] with large sets of synonym and non-synonym word pairs. It evaluates the overall behavior of the similarity measures (see the figure).

The result clearly shows that the corpus-based measures (co, pov*) are superior to the thesaurus-based ones (link#, depth). Moreover, among the corpus-based measures, employing point-of-view reinforcement (pov) yields higher selectivity than the original measure, co (the lower a curve lies in the figure, the higher the selectivity of the measure).
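The selectivity test itself is defined in [Nagamatsu and Tanaka, 1996]. The Python sketch below only illustrates one plausible way to obtain curves of the kind plotted in the figure, assuming each point comes from a similarity threshold $t$: the share of synonym pairs scoring at least $t$ against the share of non-synonym pairs scoring at least $t$. The function name `coverage_curve` and the threshold sweep are assumptions.

```python
from typing import List, Sequence, Tuple


def coverage_curve(
    synonym_scores: Sequence[float],
    non_synonym_scores: Sequence[float],
) -> List[Tuple[float, float]]:
    """For each threshold t taken from the synonym scores (high to low),
    return (fraction of synonym pairs with score >= t,
            fraction of non-synonym pairs with score >= t)."""
    points = []
    for t in sorted(set(synonym_scores), reverse=True):
        syn_cov = sum(s >= t for s in synonym_scores) / len(synonym_scores)
        non_cov = sum(s >= t for s in non_synonym_scores) / len(non_synonym_scores)
        points.append((syn_cov, non_cov))
    return points


if __name__ == "__main__":
    # Hypothetical similarity scores for four synonym and four non-synonym pairs.
    syn = [0.9, 0.8, 0.4, 0.7]
    non = [0.1, 0.5, 0.3, 0.85]
    print(coverage_curve(syn, non))
    # prints [(0.25, 0.0), (0.5, 0.25), (0.75, 0.25), (1.0, 0.5)]
```

At a fixed synonym coverage, a smaller non-synonym coverage means the measure admits fewer unrelated pairs, which matches reading lower curves as more selective.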

The other is an experiment employing human subjects. It measures the correlation between similarity values and the subjects' rating scores (see the table).

[Figure: selectivity test. x-axis: coverage of synonym pairs (80% to 100%); y-axis: coverage of non-synonym pairs (10% to 100%); curves: depth, link#, co, pov(1.2), pov(1.5), pov(2.0), resnik95.]

Correlation between similarity values and human rating scores:

Sim measure   Whole   IPAL    Bunrui-Goi-Hyo
resnik95      0.426   0.235   0.420
pov(2.0)      0.424   0.232   0.495
pov(1.2)      0.390   0.210   0.415
depth         0.380   0.164   0.449
link#         0.365   0.104   0.442
co            0.344   0.211   0.306

This experiment shows that, overall, the thesaurus-based measures (depth, link#, resnik95) correlate more highly with human judgment than the corpus-based one (co). By employing point-of-view reinforcement, however, the derived measures (pov*) become competitive with the thesaurus-based measures: on the whole set, pov(2.0) is on par with resnik95 (0.424 vs. 0.426), and with an adequately selected parameter the highest correlation is achieved on Bunrui-Goi-Hyo (0.495).
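The table does not state which correlation coefficient was used. Assuming Pearson's product-moment correlation between a measure's similarity values and the human rating scores for the same word pairs, the coefficient can be computed as in the Python sketch below; the function name `pearson` is hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical similarity values and human ratings for three word pairs.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # prints 1.0
```

If the original study used a rank correlation instead, the same function applied to the ranks of the two sequences would give Spearman's coefficient (ignoring ties).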

4 Conclusion

From the experiments we conclude that the proposed similarity measure distinguishes synonym pairs from non-synonym pairs better than the other similarity measures (selectivity test) and that it correlates highly with the rating scores given by human subjects.

References

[Nagamatsu and Tanaka, 1996] Kenji Nagamatsu and Hidehiko Tanaka. Estimating point-of-view-based similarity using pov reinforcement & similarity propagation. In Proceedings of PACLIC 11, Language, Information and Computation, pages 373–382, December 1996.

[Resnik, 1995] Philip Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, volume 1, pages 448–453, 1995.