Toward Improving Call Center Operations: Analysis of Response Manuals and Examination of Retrieval Methods
Analysis of FAQ and improvement of information retrieval for call center operation
Ryoma Yamashita ∗1, Kensuke Hara ∗2, Satoshi Tamura ∗1, Satoru Hayamizu ∗1
∗1 Univ. of Gifu
∗2 Seino Information Service Co., Ltd.
At a call center, an operator searches for an appropriate manual based on an inquiry given by the customer. However, it is difficult to choose a correct manual from retrieval results. The purpose of this study is to analyze manuals used in an actual call center in order to build an effective retrieval method such as a two-question-and-two-answer scheme. We applied a clustering technique to the manuals and analyzed data by cluster visualization. We then investigated the potential to employ the approach. We finally compared the proposed method with the baseline method and showed its effectiveness.
1. Introduction
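Document retrieval is performed frequently in many situations in daily life and at work, so being able to retrieve the appropriate document from a huge document collection is very important. In call center operations as well, when an inquiry is received, the operator searches the response manuals and answers on that basis. Because the response has to be both quick and accurate, improved retrieval accuracy for response manuals is particularly desirable. In a typical retrieval system, several answer candidates are presented for the query entered by the user, and the user selects the appropriate one. However, many operators have had the experience of being unable to judge the contents quickly and make a selection when multiple results are presented. In addition, partly because specialized knowledge is required, operators with little call center experience are sometimes unable to search accurately with appropriate keywords.

Given these issues, and with the aim of assisting call center operations, we consider replacing the conventional retrieval system with a dialogue-based one. A dialogue format has the advantage that the answers can be narrowed down during the dialogue. In this study we aim to improve retrieval accuracy with a two-question-and-two-answer format in which the system asks a question back. In particular, asking an additional question for an ambiguous query is expected to lead to the best answer. The open issue in this framework is what the system should ask in order to reach an appropriate answer. As a solution, we propose that, instead of presenting individual response texts for the user's query, the system presents several groups, each containing multiple response texts. Once the group to be searched is fixed, the answer candidates are narrowed down, so retrieval accuracy is expected to improve. We therefore first grouped and analyzed the response manuals by clustering. Next, we verified how effective it is for narrowing down answers to run the search after the user has selected a cluster. Finally, based on these results, we discuss what is needed to realize a two-question-and-two-answer retrieval system.

Section 2 gives an overview of the data used in the experiments. Section 3 describes the clustering of the response manuals. Section 4 describes the method and results of the evaluation experiments on retrieval accuracy. Section 5 gives a summary, and Section 6 presents future work.

Contact: Ryoma Yamashita, Univ. of Gifu, [email protected]

Table 1: Question texts (queries). The correct document IDs correspond to Table 2.

Question text (query)                                                          Correct document ID
Can I view information older than what is shown in the electronic forms?      NT00026B92
excel vertical writing horizontal writing                                      NT00025336
The next-page bar is missing in the electronic forms                           NT00029B12

2. Question-Answering Corpus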
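This section describes the question-answering corpus used in this study. It consists of 17,157 question texts used at a call center help desk and 5,171 response manuals for those questions.
In this study, the response manuals were used as the target of clustering, and both the question texts and the response manuals were used in the evaluation experiments on retrieval accuracy. Section 2.1 describes the question texts and Section 2.2 the response manuals. Finally, Section 2.3 describes the document preprocessing.

2.1 Question texts (queries)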
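The question texts are a corpus of questions actually asked at the call center. In addition to the question text, each question carries a label (the correct document ID) indicating which manual is appropriate for it. In the experiments, this label is treated as the correct document for the query. Examples of question texts are shown in Table 1.

2.2 Response manuals (answer texts)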
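Each response manual has five pieces of information: an ID, the answer body, the answer title, a major category (form), and a middle category (system). In the experiments, documents such as image-only manuals were excluded, leaving 5,171 manuals in total. As examples, actual answer titles and answer bodies are shown in Table 2. The answer title is often a sentence similar to the query. Details of the major and middle categories are shown in Table 3. These labels are manually assigned categories of the response manuals.

2.3 Document preprocessing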
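The question texts and answer texts contain notational variation caused by differences between full-width and half-width characters and between upper-case and lower-case letters. To deal with this, alphabetic characters and digits were converted to half-width and all other characters to full-width, and all alphabetic characters were unified to lower case. Symbols and punctuation were also removed. The morphological analyzer MeCab was used to split text into words. As the MeCab dictionary we used mecab-ipadic-NEologd, which is said to handle named entities well.

The 34th Annual Conference of the Japanese Society for Artificial Intelligence, 2020

Table 2: Response manual texts. The document IDs correspond to Table 1.

Document ID: NT00026B92
  Answer title: I want to see past records of cargo accidents and bank transfer statements listed in the electronic forms
  Answer body: Explain that records older than those currently viewable cannot be produced
  Categories: business overview, electronic form system

Document ID: NT00025336
  Answer title: Input fields become vertical writing in excel
  Answer body: Cancel vertical writing from the alignment button
  Categories: business overview, pc-related

Document ID: NT00029B12
  Answer title: The page-turn button of the electronic forms has disappeared / Cannot turn pages on the electronic form screen
  Answer body: Display it by checking View, Toolbars, Page bar
  Categories: trouble handling, electronic form system

Table 3: Category information

Granularity               Categories
Major category (form)     business overview, trouble handling (2 types)
Middle category (system)  pc-related, ssx-related, infrastructure-related, Kangaroo, data exchange, digital tachograph, Profit, mail, voice response system, waybill and tag printing system, electronic form system, telephone/fax-related (12 types)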
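The normalization steps of Section 2.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Unicode NFKC normalization as a stand-in for the width unification, the function name `normalize` is hypothetical, and the MeCab tokenization step is omitted so that the example needs only the standard library.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Unify character width and case, then drop symbols and punctuation.

    Assumption: NFKC is used here to approximate the width rule in the text
    (full-width alphanumerics become half-width, half-width katakana become
    full-width). The paper does not say which tool was used.
    """
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    # Keep only word characters (letters, digits, kana, kanji) and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    return text.strip()


if __name__ == "__main__":
    print(normalize("ＥＸＣＥＬ　縦書き、横書き。"))
```

The normalized string would then be passed to a morphological analyzer such as MeCab to obtain the word sequence used in the later sections.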
3. Grouping and Analysis of the Response Manuals
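Document clustering was used to analyze the response manuals. The aims are to narrow down similar documents by clustering and to capture the characteristics of each group of documents. To group the response manuals, they were first split by the manually assigned major and middle category labels, and document clustering was then applied within each label. This yielded 107 clusters. Section 3.1 describes the document vectorization used in the experiments and Section 3.2 the clustering method. Section 3.3 presents the visualization of the clusters and a discussion.

3.1 Document vectorization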
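TF-IDF and word2vec [Mikolov 13] were used for document vectorization. Clustering was performed on the IDF-weighted sum of the distributed word representations obtained with word2vec, and the vectors obtained with TF-IDF were used for document retrieval. Both are described below.

3.1.1 TF-IDF
TF-IDF is a weighting method based on the assumption that a word that occurs frequently in a document and rarely in other documents characterizes that document. TF-IDF is computed by Eq. (1). The resulting values give a document vector whose dimensionality equals the vocabulary size.

\[ \mathrm{TFIDF}(w, d) = \mathrm{TF}(w, d) \cdot \mathrm{IDF}(w) \tag{1} \]

Here, TF(w, d) is the term frequency (TF) of word w in document d, and IDF(w) is the inverse document frequency (IDF) of word w. Let C(w, d) be the number of occurrences of word w in document d, DF(w) the document frequency (DF), that is, the number of documents in which w occurs, |d| the document length, and |D| the number of documents. TF and IDF are then given by the following equations.

\[ \mathrm{TF}(w, d) = \frac{C(w, d)}{|d|} \tag{2} \]

\[ \mathrm{IDF}(w) = \log\left(\frac{|D|}{\mathrm{DF}(w)}\right) + 1 \tag{3} \]

3.1.2 Weighted sum of distributed word representations
Word vectors are created with word2vec, and the sum of the vectors of the words occurring in a text is computed. Each word vector is multiplied by the word's IDF value as a weight, so that important words have a larger influence. Eq. (4) shows how the vector of the i-th document D_i is computed. Here E(w_j) is the vector of word w_j obtained by word embedding, and N is the number of words occurring in the text. In this study the dimensionality of the word vectors is fixed at 300, so each document is represented by a fixed-length 300-dimensional vector.

\[ D_i = \sum_{j=1}^{N} \mathrm{IDF}(w_j) \cdot E(w_j) \tag{4} \]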
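Eqs. (1) to (4) can be written down directly. The sketch below is illustrative only: the toy corpus and the two-dimensional embedding table `emb` are made-up placeholders (the paper uses 300-dimensional word2vec vectors), the function names are hypothetical, and every queried word is assumed to occur in at least one document so that DF(w) is never zero.

```python
import math
from collections import Counter


def tf(word, doc):
    """Eq. (2): occurrences of the word divided by the document length |d|."""
    return Counter(doc)[word] / len(doc)


def idf(word, docs):
    """Eq. (3): log(|D| / DF(w)) + 1, where docs is a list of token lists."""
    df = sum(1 for d in docs if word in d)
    return math.log(len(docs) / df) + 1


def tfidf(word, doc, docs):
    """Eq. (1): product of TF and IDF."""
    return tf(word, doc) * idf(word, docs)


def weighted_sum(doc, docs, emb):
    """Eq. (4): sum of IDF(w_j) * E(w_j) over the tokens of one document."""
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    for w in doc:
        weight = idf(w, docs)
        for i, x in enumerate(emb[w]):
            vec[i] += weight * x
    return vec


if __name__ == "__main__":
    # Toy data (assumption): three tokenized documents, 2-d embeddings.
    docs = [["excel", "vertical"], ["form", "page"], ["form", "excel"]]
    emb = {"excel": [1.0, 0.0], "vertical": [0.0, 1.0],
           "form": [0.5, 0.5], "page": [0.2, 0.8]}
    print(tfidf("vertical", docs[0], docs))
    print(weighted_sum(docs[0], docs, emb))
```

Because "vertical" occurs in only one document, it receives a larger IDF weight than "excel" and therefore contributes more to the document vector, which is the effect described in Section 3.1.2.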
3.2 Document clustering
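This section describes the clustering method used in the experiments. Steinbach et al. [Steinbach 00] report experimental results for common document clustering methods such as hierarchical clustering and k-means. In this study, k-means was used for clustering. Within each manually labeled group, clustering was performed so that each cluster contains about 100 documents. The advantage of such unsupervised clustering is that grouping is possible without labels, but the resulting groups are not necessarily the intended ones. It is therefore important for a human to identify some property of each cluster.

3.3 Visualization of the characteristic words of each cluster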
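The clustering step can be sketched with a small pure-Python k-means (Lloyd's algorithm). This is an illustration under assumptions, not the authors' code: the paper states only that clusters of about 100 documents were targeted, so the rule in `n_clusters` (ceiling of the group size divided by 100) is a guess, and the initialization, iteration count, and two-dimensional toy points are likewise placeholders.

```python
import math
import random


def n_clusters(n_docs, target=100):
    """Assumed rule: enough clusters that each holds at most about `target` documents."""
    return max(1, math.ceil(n_docs / target))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on lists of floats; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    labels, centers = kmeans(pts, k=2)
    print(labels, centers)
```

In the setting of the paper, `points` would be the IDF-weighted document vectors of one manually labeled group, and `k` would be chosen per group.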
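The characteristic words of each cluster were visualized as word clouds, with word size based on the TF-IDF value. In this study, the characteristics of each cluster were analyzed by visualizing multiple characteristic words. Figure 1 shows part of the visualization results for the groups. For some clusters the characteristic words were extracted correctly, but there were also clusters whose contents showed no apparent difference at first glance. Differences between clusters are discussed on the basis of the cluster estimation results described in Section 4.1.

4. Evaluation Experiments on Retrieval Accuracy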
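Figure 2 gives an overview of the retrieval model proposed in this study. In this model, the user selects a cluster, which restricts the search range and is expected to make it possible to identify the document correctly. Two points therefore need to be verified: whether the clusters presented to the user can be estimated correctly, and whether cluster selection is effective for retrieval. This section presents the cluster estimation method and its results, and describes evaluation experiments that confirm whether searching within a cluster improves retrieval accuracy. To confirm the effectiveness, we compared a search over all documents, as in a typical retrieval system, with a search that uses the cluster estimation results.
Section 4.1 explains the cluster estimation method, Section 4.2 the retrieval method, and Section 4.3 presents the experimental results and a discussion.

Figure 1: Part of the word cloud visualization of each cluster. The size of each word is based on its TF-IDF value. The word 原票 in the figure refers to a waybill (送り状).

Figure 2: Flow of retrieval using the cluster estimation results. U denotes the collection of response manuals, C a cluster, D a document, and Q a query. When D4 is the correct document, clusters similar to the query are presented, and documents are retrieved from within the selected cluster.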
4.1 Cluster estimation
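In this experiment we verified whether the cluster to which the correct document belongs can be estimated from the query. The query vector created by the weighted sum of distributed word representations was used as input, and accuracy was verified with a simple three-layer neural network. The output is the cluster number of the correct document associated with each query. The amount of data used is shown in Table 4. Accuracy was used to evaluate cluster estimation.

4.2 Document retrieval method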
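Document retrieval was based on the vector space model. The query and the documents were each vectorized, and the similarity between the vectors was computed. Gomaa and Fahmy [Gomaa 13] survey methods for text similarity. Similarity measures are also used in document retrieval, and in this study cosine similarity was used. Eq. (5) shows how the cosine similarity is computed, where x and y are document vectors.

\[ \cos(x, y) = \frac{x \cdot y}{|x||y|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \tag{5} \]

Documents are vectorized with TF-IDF as defined in Section 3.1.1. The 3,431 test items shown in Table 4 were used for evaluation.

Table 4: Amount of data used for query cluster estimation

Use              Number of items
Training data    12,009
Validation data  1,715
Test data        3,431

Table 5: Accuracy of cluster estimation

Rank of the correct cluster  Accuracy
Top 1                        66.2%
Within top 3                 85.7%

4.2.1 Evaluation method
In document retrieval, whether the correct document appears near the top of the results is an important indicator. To evaluate the change in the number of relevant documents near the top of the results, we used the mean recall over the top 10 results (mean Recall; mRecall). C_i denotes the number of correct documents for the i-th query, and R_i denotes the number of relevant documents within the top 10 results for the i-th query.

\[ \mathrm{mRecall} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathrm{Recall}(i) \tag{6} \]

\[ \mathrm{Recall}(i) = \frac{R_i}{C_i} \tag{7} \]

In addition, since retrieval performance is better the higher the correct document is ranked, we evaluated with Mean Average Precision (MAP). Eq. (8) gives MAP, which is the average over all |Q| queries of the AP value AP(i) of the i-th query obtained by Eq. (9). In Eq. (9), O(r_k) is the rank of the k-th correct document r_k.

\[ \mathrm{MAP} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathrm{AP}(i) \tag{8} \]

\[ \mathrm{AP}(i) = \frac{1}{C_i} \sum_{k=1}^{C_i} \frac{k}{O(r_k)} \tag{9} \]

Furthermore, in a dialogue format it is important that the correct result appears at rank 1. We therefore used the proportion of queries whose correct document appears at rank 1 as an evaluation metric, Precision@1.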
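The similarity of Eq. (5) and the per-query metrics of Eqs. (7) and (9) can be sketched as below. The function names and the toy ranking are hypothetical. One point is an assumption of this sketch rather than something the paper states: correct documents that fall outside the top k are counted as contributing zero to AP.

```python
import math


def cosine(x, y):
    """Eq. (5): cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def recall_at_k(ranked, relevant, k=10):
    """Eq. (7): relevant documents found in the top k, divided by C_i."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def average_precision(ranked, relevant, k=10):
    """Eq. (9): (1 / C_i) * sum of k / O(r_k) over correct documents in the top k."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1              # this is the hits-th correct document
            total += hits / rank   # k / O(r_k)
    return total / len(relevant)


def precision_at_1(ranked, relevant):
    """1.0 if the top-ranked document is correct, otherwise 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


if __name__ == "__main__":
    ranked = ["d3", "d1", "d7", "d2"]   # toy search result, best first
    relevant = {"d1", "d2"}             # toy set of correct documents
    print(recall_at_k(ranked, relevant))
    print(average_precision(ranked, relevant))
    print(precision_at_1(ranked, relevant))
```

Averaging these per-query values over all queries gives mRecall (Eq. 6), MAP (Eq. 8), and Precision@1.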
4.3 Results and discussion
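4.3.1 Accuracy of cluster estimation
Table 5 shows the experimental results for cluster estimation. Assuming that about three clusters are presented to the user, we checked whether the correct cluster is estimated within the top three. The results confirm that the correct cluster was estimated at rank 1 for 66.2% of the queries, and within the top three for 85.7% of the queries.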
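The top-1 and top-3 figures of Table 5 correspond to a top-k accuracy over the network's softmax outputs. The following sketch shows how such a figure could be computed; the function name and the probability rows are invented for illustration and are not data from the paper.

```python
import math


def top_k_accuracy(probs, gold, k):
    """Fraction of queries whose correct cluster is among the k highest outputs.

    probs: one list of per-cluster probabilities per query
    gold:  index of the correct cluster for each query
    """
    hits = 0
    for p, g in zip(probs, gold):
        top = sorted(range(len(p)), key=lambda c: p[c], reverse=True)[:k]
        hits += g in top
    return hits / len(gold)


if __name__ == "__main__":
    # Made-up softmax outputs for three queries over three clusters.
    probs = [[0.7, 0.2, 0.1],
             [0.1, 0.3, 0.6],
             [0.2, 0.5, 0.3]]
    gold = [0, 1, 2]
    print(top_k_accuracy(probs, gold, k=1))
    print(top_k_accuracy(probs, gold, k=2))
```

Presenting the k best clusters to the user corresponds to reading off `top` for a single query.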
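Figure 3: Word clouds of the top three estimated clusters for the query "The left and down arrow keys on the keyboard respond poorly". The values in parentheses are the softmax outputs.

Figure 4: Word clouds of the top three estimated clusters for the query "The PC starts up by itself". The values in parentheses are the softmax outputs. The leftmost word cloud is the correct cluster for the query, and those to its right are the estimated clusters.

4.3.2 Discussion of cluster estimation
Cluster estimation failed for some queries, so we examined the output values of the neural network. The output layer uses the softmax function, and its values can be regarded as the probability of belonging to each cluster. Figure 3 shows an example of a correct estimate. It is the cluster estimation result for the query "The left and down arrow keys on the keyboard respond poorly". According to a domain expert, the clusters ranked first and second differ in that, within the PC-related category, they represent "failure/maintenance" and "operating instructions" respectively. This can be inferred from words such as "maintenance" in the first cluster and "kana input" in the second. The input query concerns a failure, so the estimate is considered correct. However, the keywords shown look similar at first glance, and distinguishing the clusters appears difficult without an understanding of the business.

Figure 4 shows an example in which the cluster could not be estimated. The top-ranked output in Figure 4 is the same as in Figure 3 and represents the documents on "failure/maintenance". If the query is interpreted as describing a failure, the estimate cannot be called wrong outright, but the correct cluster is a different one. Bias in the training data is a likely cause, but the fact that the correct cluster could not be estimated leaves some doubt about the validity of the clustering.

4.3.3 Retrieval accuracy of each retrieval method
In the two-question-and-two-answer framework of this study, there is a step in which the user selects one of the presented clusters. In this experiment we evaluated retrieval accuracy for the case where the cluster estimated at rank 1 is selected (Assumption 1), and the case where the top three are considered and the correct cluster is selected from among them (Assumption 2). In addition, we compared these with full-text search, which does not narrow the search range.

Table 6 shows the experimental results for retrieval accuracy. Assumption 2 gave the highest value on every metric. Thus, if the user can select the correct cluster, retrieval accuracy can be expected to improve. Assumption 1 gave the lowest mRecall. When retrieval uses the cluster estimation results, if the correct cluster is not obtained at the estimation stage, the correct document never appears in the search results, which is considered to have a large effect on the final retrieval accuracy. For MAP and Precision@1, retrieval with cluster selection gave higher values than full-text search. Narrowing the search range by cluster selection may therefore work effectively.

Table 6: Experimental results for retrieval accuracy. "Cluster selection" refers to retrieval within the selected cluster. The number after @ indicates how many of the top results are evaluated.

Method                            mRecall@10  MAP@10  Precision@1
Full-text search                  0.630       0.366   0.257
Cluster selection (Assumption 1)  0.589       0.383   0.286
Cluster selection (Assumption 2)  0.769       0.510   0.387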
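The difference between full-text search and search within a selected cluster can be illustrated with a small sketch. Everything here is a toy assumption: the document IDs, two-dimensional vectors, and cluster assignments are invented, and the `search` function is a hypothetical stand-in for the retrieval step, ranking by cosine similarity.

```python
import math


def cosine(x, y):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def search(query_vec, doc_vecs, cluster_of=None, selected=None, top=10):
    """Rank documents by cosine similarity, optionally within one cluster."""
    candidates = [d for d in doc_vecs
                  if selected is None or cluster_of[d] == selected]
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Toy data (assumption): four documents in two clusters, D4 is correct.
    doc_vecs = {"D1": [1.0, 0.0], "D2": [0.9, 0.1],
                "D3": [0.0, 1.0], "D4": [0.6, 0.8]}
    cluster_of = {"D1": 0, "D2": 0, "D3": 1, "D4": 1}
    query = [1.0, 0.2]
    print(search(query, doc_vecs))                 # full search: D4 is ranked third
    print(search(query, doc_vecs, cluster_of, 1))  # correct cluster: D4 is ranked first
    print(search(query, doc_vecs, cluster_of, 0))  # wrong cluster: D4 cannot appear
```

The last call mirrors the behavior discussed for Assumption 1: once a wrong cluster is chosen, the correct document is excluded from the candidates, which is consistent with the lower mRecall in Table 6.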
5. Summary
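In this study we examined a two-question-and-two-answer framework for document retrieval. We first analyzed the response manuals actually used for document retrieval at a call center help desk, and then verified retrieval accuracy using the question-answering corpus. To analyze the response manuals, we visualized the characteristic words of the document clustering results as word clouds. We then verified retrieval accuracy using the clustering results. By inserting a step that estimates the cluster from the query, the correct document can be expected to appear at rank 1 more often than with ordinary search over all documents.

6. Future Work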
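In this experiment, effectiveness was verified by objective evaluation, but whether the approach is useful in actual practice has not been verified. In particular, a human cannot necessarily select the appropriate cluster easily, so a subjective evaluation, for example with user questionnaires, is needed. In addition, although the results show improvement over full-text search on certain metrics, the retrieval accuracy cannot be called sufficient for practical use. In this experiment documents were vectorized with the classical methods TF-IDF and word2vec, whereas in recent years vectorization and retrieval with neural networks have become common and high accuracy has been reported. It is necessary to improve accuracy with these techniques and to verify whether the proposed framework remains effective.

References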
[Mikolov 13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, Jeff Dean: Distributed representations of words and phrases and their compositionality, In Advances in Neural Information Processing Systems 26, pp.3111-3119, 2013
[Steinbach 00] Michael Steinbach, George Karypis, Vipin Kumar: A Comparison of Document Clustering Techniques, In KDD-Workshop on Text Mining, 2000
[Gomaa 13] Wael H. Gomaa, Aly A. Fahmy: A survey of text similarity approaches, International Journal of Computer Applications, vol.68, no.13, pp.13-18, 2013