A Study on Cluster-Oriented Indexing

IPSJ SIG Technical Report (Information Processing Society of Japan)

2004-NL-159 (2004/1/13)

An Approach to Cluster-based Indexing

Akiko AIZAWA
National Institute of Informatics

This paper introduces a framework and implementation of an information retrieval system that utilizes clusters of similar documents. The proposed method first generates document clusters together with their representative terms and phrases based on the term distribution or term sequence match. Next, considering each document cluster as a single virtual document, an extended index is created. Upon a query submission, the system uses both the original and the extended indices and returns the integrated result. An example is shown where indices generated based on different viewpoints are used to enhance the flexibility of the retrieval system.

1. Introduction

This report discusses "cluster-oriented indexing" (cluster-based indexing) for text information retrieval systems [1]. The use of document clusters in information retrieval has a long history [2]. It rests on what van Rijsbergen (1979) called "the clustering hypothesis": closely associated documents tend to be relevant to the same requests [3].

A classical technique that exploits associations among documents is relevance feedback, for example the method of Rocchio, in which the query is reformulated using documents judged relevant and the search is repeated [4].

Document clusters and document associations have been used in several retrieval and browsing systems: associative information access with DualNAVI by Takano et al. (2001) [5], the Scatter/Gather browsing system of Cutting et al. (1992) [6], Web document clustering by Zamir and Etzioni (1998) [7], and the work in [8]. Associations among retrieved documents have also been exploited for query expansion, as in the local feedback of Attar and Fraenkel (1977) [9], the concept-based query expansion of Qiu and Frei (1993) [11], the local and global document analysis of Xu and Croft (1996) [10], and [12].

In contrast to these approaches, this report proposes a retrieval system that treats each document cluster as a single virtual document, registers the clusters in an extended index, and uses this index together with the original one. The remainder of the report describes the representation of document clusters and the overall processing flow, two methods for generating clusters (one based on term distributions and one based on term-sequence matches), and an example of index construction and integrated retrieval on the NTCIR-1 collection.

2. Framework

2.1 Representation of document clusters

We represent a document cluster as a pair $(S_T, S_D)$ consisting of a document set $S_D \subset D$ and a term set $S_T \subset T$ that characterizes it, where $D$ is the set of all documents and $T$ the set of all terms.

Figure 1: (diagram relating the document set D and the term set T)

A cluster thus summarizes the textual content of its member documents by its characteristic terms. The processing of the retrieval system, shown in Figures 2 and 3, consists of three steps: (1) document cluster generation, (2) index creation, and (3) retrieval with index integration.

2.2 Processing steps

(1) Document cluster generation

Document clusters are generated in advance from the target collection; each consists of a group of documents together with the terms that characterize it.

Description format of a document cluster with m documents and n terms:

  <CLUSTER>
    <SUMMARY>...</SUMMARY>
    <DOCSIZE>m</DOCSIZE>
    <DOCLIST>
      <DOC><ID>ID1</ID><TEXT>text 1</TEXT></DOC>
      <DOC><ID>ID2</ID><TEXT>text 2</TEXT></DOC>
      ...
      <DOC><ID>IDm</ID><TEXT>text m</TEXT></DOC>
    </DOCLIST>
    <TERMSIZE>n</TERMSIZE>
    <TERMLIST>
      <DOC><ID>ID1</ID><TEXT>term 1</TEXT></DOC>
      <DOC><ID>ID2</ID><TEXT>term 2</TEXT></DOC>
      ...
      <DOC><ID>IDn</ID><TEXT>term n</TEXT></DOC>
    </TERMLIST>
  </CLUSTER>
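The cluster description can be produced mechanically from a cluster's documents and terms. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, assuming the element names exactly as they appear in the description format (including the reuse of DOC/ID/TEXT for TERMLIST entries). The helper names cluster_to_xml and _add_entries and the sample data are hypothetical; the report does not describe its own generator.

```python
import xml.etree.ElementTree as ET


def _add_entries(parent, tag, entries):
    """Append a <tag> element holding one <DOC><ID/><TEXT/></DOC> per (id, text) pair."""
    container = ET.SubElement(parent, tag)
    for entry_id, text in entries:
        entry = ET.SubElement(container, "DOC")
        ET.SubElement(entry, "ID").text = entry_id
        ET.SubElement(entry, "TEXT").text = text
    return container


def cluster_to_xml(summary, docs, terms):
    """Serialize one cluster; docs and terms are lists of (id, text) pairs."""
    root = ET.Element("CLUSTER")
    ET.SubElement(root, "SUMMARY").text = summary
    ET.SubElement(root, "DOCSIZE").text = str(len(docs))    # m
    _add_entries(root, "DOCLIST", docs)
    ET.SubElement(root, "TERMSIZE").text = str(len(terms))  # n
    _add_entries(root, "TERMLIST", terms)
    return ET.tostring(root, encoding="unicode")


xml_text = cluster_to_xml(
    "hypothetical cluster summary",
    [("d1", "first member document"), ("d2", "second member document")],
    [("t1", "cluster"), ("t2", "indexing"), ("t3", "retrieval")],
)
print(xml_text)
```

The children are emitted in the order of the format (SUMMARY, DOCSIZE, DOCLIST, TERMSIZE, TERMLIST), so the output can be fed to an ordinary XML-aware indexer as one virtual document.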

Several methods have been proposed for generating such clusters. Dhillon and Modha (2001) proposed concept decompositions [13], which cluster documents and use the resulting cluster centroids as "concept vectors" to approximate the term-document matrix, and compared the approach with the truncated singular value decomposition used in LSI (Latent Semantic Indexing) [14].

Slonim and Tishby (2000) clustered documents using word clusters obtained by the Information Bottleneck method [17], and Dhillon et al. (2003) proposed information-theoretic co-clustering, which clusters terms and documents simultaneously [16].

In this report, we use information-theoretic co-clustering of terms and documents [1] and document clustering based on term-sequence matches [15].

(2) Index creation

An index over the generated clusters, hereafter called the "extended index," is created. Whereas the original index takes each document as its unit, the extended index takes each document cluster as a single virtual document, so it can be created with the same indexing machinery as the original index.

(3) Retrieval and index integration

A feature of the extended index is that similar documents are grouped, so that retrieval can operate on clusters as well as on individual documents. By using the original index and the extended index together, the system can present retrieval results from different viewpoints (Figure 3).
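The following sketch illustrates integrated retrieval over the two indices under one simple merging policy, which is an assumption and may differ from the system described here: each matching cluster passes its score on to its member documents, and a document keeps the higher of its direct and cluster-derived scores. Scoring by the number of matched query terms is a placeholder for tf-idf, and all function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(texts):
    """Inverted index mapping each term to the ids of the entries containing it."""
    index = defaultdict(set)
    for entry_id, text in texts.items():
        for term in text.split():
            index[term].add(entry_id)
    return index


def score(query_terms, index):
    """Number of distinct query terms each entry matches (placeholder for tf-idf)."""
    scores = Counter()
    for term in set(query_terms):
        for entry_id in index.get(term, ()):
            scores[entry_id] += 1
    return scores


def ranked(hits):
    """Sort by descending score, breaking ties by id."""
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


def integrated_search(query_terms, doc_index, cluster_index, members):
    """Query both indices; clusters pass their score on to their member documents."""
    doc_hits = score(query_terms, doc_index)
    cluster_hits = score(query_terms, cluster_index)
    merged = dict(doc_hits)
    for cluster_id, s in cluster_hits.items():
        for doc_id in members[cluster_id]:
            merged[doc_id] = max(merged.get(doc_id, 0), s)
    return ranked(merged), ranked(cluster_hits)


docs = {"d1": "cluster analysis", "d2": "inverted index", "d3": "text retrieval"}
# Hypothetical cluster c1: S_D = {d1, d3}, indexed by its term set S_T.
doc_index = build_index(docs)
cluster_index = build_index({"c1": "cluster index retrieval"})
members = {"c1": ["d1", "d3"]}

doc_ranking, cluster_ranking = integrated_search(
    ["cluster", "index"], doc_index, cluster_index, members)
print(doc_ranking)      # [('d1', 2), ('d3', 2), ('d2', 1)]
print(cluster_ranking)  # [('c1', 2)]
```

In the example, d3 contains neither query term but is still retrieved through its cluster, which is the kind of effect the extended index is meant to provide.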

Figure 2: Index creation
Figure 3: Retrieval and index integration

3. Document Cluster Generation

This report considers two methods for generating document clusters: (i) clustering based on the similarity of term distributions, and (ii) clustering based on term-sequence matches.

3.1 Clustering based on term distributions

Figure: A representation of a cluster that is composed of subsets of documents and terms. Before agglomeration, the cluster is a block of the term-document matrix with |S_T| = 3 terms and |S_D| = 3 documents, each cell holding freq(t_i, d_j) (0 or 1); after agglomeration, it becomes a single cell with freq(S_T, S_D) = 7.

This method co-clusters terms and documents on the basis of mutual information [1]: it extracts a group of documents $S_D$ together with the corresponding group of terms $S_T$. In the term-document frequency matrix, a cluster corresponds to the block formed by $S_T$ and $S_D$, and agglomerating the block into a single cell gives the cluster frequency

$$ \mathrm{freq}(S_T, S_D) = \sum_{t_i \in S_T} \sum_{d_j \in S_D} \mathrm{freq}(t_i, d_j) $$

In the figure, the seven nonzero entries of the 3 × 3 block sum to $\mathrm{freq}(S_T, S_D) = 7$.
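The cluster frequency is a plain double sum over the block. A minimal sketch, assuming a sparse dictionary keyed by (term, document) pairs; the function name is hypothetical, and the placement of the two zeros in the 3 × 3 example is an assumption chosen only so that the block contains seven 1s as in the figure.

```python
def cluster_freq(freq, s_t, s_d):
    """freq(S_T, S_D): sum of freq(t_i, d_j) over t_i in S_T and d_j in S_D.

    Pairs missing from the sparse dictionary count as zero.
    """
    return sum(freq.get((t, d), 0) for t in s_t for d in s_d)


# Hypothetical 3 x 3 block with seven 1s and two 0s (zero positions assumed).
freq = {
    ("t1", "d1"): 1, ("t1", "d2"): 1, ("t1", "d3"): 1,
    ("t2", "d1"): 0, ("t2", "d2"): 1, ("t2", "d3"): 1,
    ("t3", "d1"): 1, ("t3", "d2"): 1, ("t3", "d3"): 0,
}
S_T = {"t1", "t2", "t3"}
S_D = {"d1", "d2", "d3"}
print(cluster_freq(freq, S_T, S_D))  # 7
```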

3.2 Clustering based on term-sequence matches

The second method follows [15]: documents that share identical term sequences (for example, reused passages of text) are grouped into a cluster $S_D$, and the shared sequences serve as its representative phrases. For retrieval, the terms in $S_T$ represent the cluster.

Figure 4: (diagram over the term sets T, T* and the document sets D, D*)
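A minimal sketch of term-sequence-based grouping, using exact word n-gram matching as a stand-in technique; the matching procedure of [15] may differ. The function names and the parameters n and min_docs are hypothetical. Each shared n-gram is kept as a representative phrase in S_T, and n-grams shared by exactly the same set of documents are merged into one cluster.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """All contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sequence_clusters(docs, n=3, min_docs=2):
    """Return (S_T, S_D) pairs for documents sharing identical word n-grams.

    S_T is a set of shared phrases; S_D is the frozenset of documents sharing them.
    """
    where = defaultdict(set)  # n-gram -> ids of documents containing it
    for doc_id, text in docs.items():
        for gram in ngrams(text.lower().split(), n):
            where[gram].add(doc_id)
    by_docs = defaultdict(set)  # document set -> phrases shared by exactly those documents
    for gram, doc_ids in where.items():
        if len(doc_ids) >= min_docs:
            by_docs[frozenset(doc_ids)].add(" ".join(gram))
    return [(phrases, doc_ids) for doc_ids, phrases in by_docs.items()]


docs = {
    "d1": "cluster based indexing of text",
    "d2": "we study cluster based indexing methods",
    "d3": "query expansion with local feedback",
}
for s_t, s_d in sequence_clusters(docs):
    print(sorted(s_t), sorted(s_d))
# ['cluster based indexing'] ['d1', 'd2']
```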

4. Implementation Example

4.1 Term weighting

Index terms are weighted with tf-idf. The clustering methods used here, like those of [17][1][16], are based on information-theoretic criteria.

4.2 Implementation

The system is implemented with the full-text search engine namazu [18], and the example uses the NTCIR-1 test collection [19].

The mutual information between the term set $T$ and the document set $D$ is

$$ I(T, D) = \sum_{t_i \in T} \sum_{d_j \in D} P(t_i, d_j) \log \frac{P(t_i, d_j)}{P(t_i)\,P(d_j)} \tag{1} $$

where $T$ is the set of terms, $D$ the set of documents, and $P(\cdot)$ denotes probability.
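Equation (1) can be evaluated directly from term-document counts. This sketch assumes maximum-likelihood estimates P(t_i, d_j) = freq(t_i, d_j) / Σ freq, with the marginals P(t_i) and P(d_j) obtained by summation, and uses natural logarithms; the function name and the toy matrices are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(counts):
    """I(T, D) in nats from joint counts {(term, doc): freq}, with P = freq / total."""
    total = sum(counts.values())
    p_t = defaultdict(float)  # marginal P(t_i)
    p_d = defaultdict(float)  # marginal P(d_j)
    for (t, d), c in counts.items():
        p_t[t] += c / total
        p_d[d] += c / total
    mi = 0.0
    for (t, d), c in counts.items():
        if c == 0:
            continue  # 0 * log 0 is taken as 0
        p_td = c / total
        mi += p_td * math.log(p_td / (p_t[t] * p_d[d]))
    return mi


diagonal = {("a", "x"): 1, ("b", "y"): 1}  # each term occurs in its own document
uniform = {("a", "x"): 1, ("a", "y"): 1, ("b", "x"): 1, ("b", "y"): 1}  # independent
print(mutual_information(diagonal))  # log 2, about 0.693
print(mutual_information(uniform))   # 0.0
```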

The idf factor of tf-idf weighting can be interpreted in terms of (1). Since $P(t_i, d_j) / (P(t_i) P(d_j)) = P(d_j \mid t_i) / P(d_j)$, the logarithmic term of (1) for a term $w_i$ (in place of $t_i$) is $\log P(d_j \mid w_i) / P(d_j)$. Let $N$ be the total number of documents and $N_i$ the number of documents containing $w_i$. With the approximations $P(d_j \mid w_i) \approx 1/N_i$ and $P(d_j) \approx 1/N$,

$$ \mathrm{idf}_b(w_i) = \log \frac{P(d_j \mid w_i)}{P(d_j)} = \log \frac{N}{N_i} \tag{2} $$

which is the ordinary idf. The same argument extends to clusters $(S_T, S_D)$ [1][15].
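Equation (2) in code, as a check that the probability ratio under the stated approximations reduces to N/N_i. The function name and the collection sizes in the example are hypothetical.

```python
import math


def idf_b(n_docs, doc_freq):
    """Eq. (2): log P(d_j|w_i)/P(d_j) with P(d_j|w_i) ~ 1/N_i and P(d_j) ~ 1/N."""
    p_d_given_w = 1.0 / doc_freq  # 1 / N_i
    p_d = 1.0 / n_docs            # 1 / N
    return math.log(p_d_given_w / p_d)


# N = 1000 documents, term contained in N_i = 10 of them (hypothetical numbers).
print(idf_b(1000, 10), math.log(1000 / 10))
```

A term that occurs in every document (N_i = N) receives an idf of zero, as expected.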

Treating a cluster $(S_T, S_D)$ as a single virtual document, the idf of a term $w_i$ in the extended index is obtained in the same way, using the approximations $P(S_D \mid w_i) \approx |S_D|/N_i$ and $P(S_D) \approx |S_D|/N$:

$$ \mathrm{idf}_e(w_i) = \log \frac{P(S_D \mid w_i)}{P(S_D)} = \log \frac{|S_D|/N_i}{|S_D|/N} = \log \frac{N}{N_i} \tag{3} $$

Under these approximations, the extended idf coincides with the ordinary idf $\log (N/N_i)$.
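A sketch of Equation (3) under the approximations P(S_D | w_i) ≈ |S_D|/N_i and P(S_D) ≈ |S_D|/N, confirming that the cluster size cancels. The range check 0 < |S_D| ≤ N_i ≤ N is an added assumption (it holds, for instance, when every member document of the cluster contains w_i), so that |S_D|/N_i is a valid probability; function names and numbers are hypothetical.

```python
import math


def idf_e(n_docs, doc_freq, cluster_size):
    """Eq. (3): log P(S_D|w_i)/P(S_D) with P(S_D|w_i) ~ |S_D|/N_i and P(S_D) ~ |S_D|/N."""
    if not 0 < cluster_size <= doc_freq <= n_docs:
        raise ValueError("expected 0 < |S_D| <= N_i <= N")
    p_s_given_w = cluster_size / doc_freq  # |S_D| / N_i
    p_s = cluster_size / n_docs            # |S_D| / N
    return math.log(p_s_given_w / p_s)


def idf_b(n_docs, doc_freq):
    """Ordinary idf, log(N / N_i), for comparison."""
    return math.log(n_docs / doc_freq)


for size in (1, 5, 10):
    print(size, idf_e(1000, 10, size), idf_b(1000, 10))
```

The printed values agree for every cluster size, which is the point of Equation (3): with these approximations, clusters and documents receive the same idf for a given term.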

Figure 5: Example using NTCIR (1)

Figure 6: Example using NTCIR (2)

References

[1] Akiko Aizawa. A Method of Cluster-Based Indexing of Textual Data. Proc. of COLING 2002, pp. 1-7, 2002.
[2] R. K. Belew. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press, 2000.
[3] C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979.
[4] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
[5] A. Takano, Y. Niwa, S. Nishioka, T. Hisamitsu, M. Iwayama, and O. Imaichi. Associative Information Access using DualNAVI. Proc. of NLPRS 2001, pp. 771-772, 2001.
[6] D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. Proc. of ACM SIGIR '92, pp. 318-329, 1992.
[7] Oren Zamir and Oren Etzioni. Web Document Clustering: A Feasibility Demonstration. Proc. of ACM SIGIR '98, pp. 46-54, 1998.
[8] ..., Vol. J82-D-I, No. 1, pp. 140-149, 1999.
[9] R. Attar and A. S. Fraenkel. Local Feedback in Full-Text Retrieval Systems. Journal of the Association for Computing Machinery, Vol. 24, No. 3, pp. 397-417, 1977.
[10] Jinxi Xu and W. Bruce Croft. Query Expansion Using Local and Global Document Analysis. Proc. of ACM SIGIR '96, pp. 4-11, 1996.
[11] Y. Qiu and H. P. Frei. Concept Based Query Expansion. Proc. of ACM SIGIR '93, pp. 160-169, 1993.
[12] ..., Vol. 98, No. 58, pp. 165-172, 1998.
[13] I. S. Dhillon and D. S. Modha. Concept Decompositions for Large Sparse Text Data using Clustering. Machine Learning, Vol. 42, No. 1, pp. 143-175, 2001.
[14] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, Vol. 41, pp. 391-407, 1990.
[15] Akiko Aizawa. Analysis of Source Identified Text Corpora: Exploring the Statistics of Reused Text and the Authorship. Proc. of ACL 2003, pp. 383-390, 2003.
[16] I. S. Dhillon, S. Mallela, and D. S. Modha. Information-Theoretic Co-clustering. Proc. of ACM SIGKDD 2003, pp. 89-98, 2003.
[17] N. Slonim and N. Tishby. Document Clustering Using Word Clusters via the Information Bottleneck Method. Proc. of ACM SIGIR 2000, pp. 208-215, 2000.
[18] http://www.namazu.org/
[19] http://research.nii.ac.jp/ntcir/
