employing Point-of-View Reinforcement
Kenji Nagamatsu and Hidehiko Tanaka
University of Tokyo
Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan, 113
1 Introduction
Two different words may not be similar in general;
rather, they are similar under some aspects or points of
view. This paper proposes a new similarity measure
between words based on points of view. The method
uses co-occurrence probability-based similarity as a
basis and extends it by weighting the values according
to the relevance between the input words and point-of-view
words (called point-of-view reinforcement).
2 Similarity with Point-of-View
Based on both corpus- and feature-based measures, the
formulation of our similarity $\mathit{Sim}(w_1, w_2, w_p)$ is defined as follows.
\[
\mathit{Sim}(w_1, w_2, w_p) = \sum_{\forall w \in \mathit{Co}(w_1) \cap \mathit{Co}(w_2)} \frac{\Pr(w \mid w_1, w_p) + \Pr(w \mid w_2, w_p)}{2}
\]
$\Pr(w \mid w', w_p)$ denotes the co-occurrence probability of
$w$ conditioned by $w'$ and reinforced by a point-of-view
word $w_p$, and $\mathit{Co}(w)$ denotes the set of words co-occurring with $w$.
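As a minimal sketch of the summation above, the following Python function adds up the averaged reinforced probabilities over the words that co-occur with both input words. The names `pov_similarity`, `co`, and `pr` are hypothetical, and the toy `pr` is a fixed lookup table that ignores the point-of-view word; it stands in for the reinforced probability only to make the example runnable.

```python
from typing import Callable, Dict, Set


def pov_similarity(
    w1: str,
    w2: str,
    wp: str,
    co: Dict[str, Set[str]],
    pr: Callable[[str, str, str], float],
) -> float:
    """Sum the mean of Pr(w | w1, wp) and Pr(w | w2, wp) over Co(w1) & Co(w2)."""
    shared = co[w1] & co[w2]  # words co-occurring with both inputs
    return sum((pr(w, w1, wp) + pr(w, w2, wp)) / 2.0 for w in shared)


# Toy data (invented for illustration, not taken from the paper).
co = {
    "doctor": {"hospital", "patient", "nurse"},
    "nurse": {"hospital", "patient", "doctor"},
}
table = {
    ("hospital", "doctor"): 0.4,
    ("patient", "doctor"): 0.2,
    ("hospital", "nurse"): 0.6,
    ("patient", "nurse"): 0.2,
}


def toy_pr(w: str, w0: str, wp: str) -> float:
    """Placeholder probability: looks up (w, w0) and ignores wp."""
    return table[(w, w0)]


score = pov_similarity("doctor", "nurse", "medicine", co, toy_pr)
print(round(score, 3))
```

Only "hospital" and "patient" are shared here, so the sum has two terms. Passing `pr` as a callable keeps the summation independent of how the reinforced probability is computed.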
The point-of-view reinforcement is responsible for
modulating this basic similarity by point-of-view words.
\[
\Pr(w \mid w', w_p) = \frac{\alpha^{\rho(w_p, w)} \, f(w \mid w')}{\left(\alpha^{\rho(w_p, w)} - 1\right) f(w \mid w') + \sum_{\forall x \in \mathit{Co}(w')} f(x \mid w')}
\]
$f(w \mid w')$ denotes the normal co-occurrence frequency,
and $\alpha$ is a parameter controlling how the relatedness
between the point-of-view word and the co-occurring word,
$\rho(w_p, w)$, affects the similarity.
$\rho(w_p, w)$ is the factor indicating the relatedness
between input words and a point-of-view word. It is
defined as the mutual information content between $w_p$ and
$w$ and is approximated with another type of co-occurrence
data extracted from a tagged corpus.
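The reinforcement step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `reinforced_prob` takes the reinforcement weight for the pair (point-of-view word, co-occurring word) as a plain number, so it does not depend on how that weight is derived. The `pmi` helper assumes the pointwise form of mutual information estimated from raw counts, and obtaining the weight as `alpha ** rho` is likewise an assumption. All names and counts are hypothetical.

```python
import math
from typing import Dict


def pmi(joint: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))) from raw counts.

    Assumption: the paper only says the factor is mutual information
    approximated from co-occurrence data; this pointwise form is a guess.
    """
    return math.log2((joint * total) / (count_x * count_y))


def reinforced_prob(w: str, freq: Dict[str, float], weight: float) -> float:
    """Reinforced co-occurrence probability of w given one conditioning word.

    freq maps every x in Co(w') to f(x | w'); weight is the reinforcement
    factor for (w_p, w). With weight == 1 this is the relative frequency.
    """
    total = sum(freq.values())
    f_w = freq[w]
    return weight * f_w / ((weight - 1.0) * f_w + total)


# Toy counts (invented for illustration).
freq = {"bank": 6.0, "water": 3.0, "boat": 1.0}
plain = reinforced_prob("water", freq, 1.0)  # no reinforcement

rho = pmi(joint=20, count_x=100, count_y=50, total=1000)
alpha = 1.5  # hypothetical parameter value
boosted = reinforced_prob("water", freq, alpha ** rho)  # assumed weight form

print(plain, rho, boosted)
```

A weight above 1 raises the probability of a co-occurring word that is related to the point of view, and a weight below 1 lowers it. The denominator replaces the word's own frequency with its weighted frequency, which keeps the result between 0 and 1 for any positive weight.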
3 Experiments
One experiment is a selectivity test ([Nagamatsu and
Tanaka, 1996]) with large word-pair sets of synonyms
and non-synonyms. This evaluates the overall behavior of
the similarity measures (see the figure).
The result shows clearly that the corpus-based measures
(co, pov*) are superior to the thesaurus-based
ones (link#, depth). Moreover, among these corpus-based
measures, employing the point-of-view reinforcement
(pov) makes the selectivity higher than that of the original
co (the lower a data sequence is located, the higher the
selectivity of the measure).
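The selectivity test itself is defined in [Nagamatsu and Tanaka, 1996] and is not specified here. As a rough sketch only, the following Python code assumes that each point of a curve is obtained by fixing a similarity threshold and measuring what fraction of synonym pairs and what fraction of non-synonym pairs score at or above it. The function name, the thresholding rule, and the scores are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def coverage_curve(
    syn_scores: Sequence[float],
    non_scores: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (synonym coverage, non-synonym coverage) for each threshold.

    Assumption: a pair is "covered" when its similarity is >= the threshold.
    """
    curve = []
    for t in thresholds:
        syn_cov = sum(s >= t for s in syn_scores) / len(syn_scores)
        non_cov = sum(s >= t for s in non_scores) / len(non_scores)
        curve.append((syn_cov, non_cov))
    return curve


# Toy similarity scores (invented for illustration).
syn = [0.9, 0.8, 0.6, 0.4]
non = [0.7, 0.3, 0.2, 0.1]
curve = coverage_curve(syn, non, thresholds=[0.5, 0.25])
print(curve)
```

Under this reading, a measure is more selective when it covers the same share of synonym pairs while admitting a smaller share of non-synonym pairs.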
The other is an experiment employing human subjects.
This shows the correlation between similarity values and
the rating scores given by human subjects.
[Figure: selectivity test results. Axes: coverage of synonym pairs, coverage of non-synonym pairs. Measures plotted: depth, link#, co, pov(1.2), pov(1.5), pov(2.0), resnik95.]
Sim measure   Whole   IPAL    Bunrui-Goi-Hyo
resnik95      0.426   0.235   0.420
pov(2.0)      0.424   0.232   0.495
pov(1.2)      0.390   0.210   0.415
depth         0.380   0.164   0.449
link#         0.365   0.104   0.442
co            0.344   0.211   0.306
This experiment shows that the thesaurus-based measures
(depth, link#, resnik95) have higher correlation
with human judgment than the corpus-based one (co).
By employing point-of-view reinforcement, however, the
derived measures (pov*) become even better than
the thesaurus-based measures, and when the reinforcement
parameter is adequately selected, the highest correlation is
achieved.
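The text does not say which correlation coefficient was used. As a minimal sketch, assuming the Pearson coefficient, the agreement between a measure's similarity values and human rating scores could be computed as below. The function name and the sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Toy data (invented): similarity values and human ratings for four word pairs.
similarities = [0.10, 0.35, 0.50, 0.80]
ratings = [1.0, 2.0, 4.0, 5.0]
r = pearson(similarities, ratings)
print(round(r, 3))
```

If the ratings are ordinal, a rank-based coefficient would be a reasonable alternative; the choice here is only for illustration.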
4 Conclusion
From the experiments it is concluded that the proposed
similarity measure can distinguish synonym pairs from
non-synonym pairs better than other similarity measures
(selectivity test) and that the measure has high
correlation with the rating scores by human subjects.
References
[Nagamatsu and Tanaka, 1996] Kenji Nagamatsu and Hidehiko
Tanaka. Estimating point-of-view-based similarity using
pov reinforcement & similarity propagation. In Proceedings
of PACLIC 11 Language, Information and Computation,
pages 373-382, December 1996.
[Resnik, 1995] Philip Resnik. Using information content to
evaluate semantic similarity in a taxonomy. In Proceedings
of the 14th International Joint Conference on Artificial
Intelligence, volume 1, pages 448-453, 1995.