Academic year: 2021


Consideration of Design Guide for Constructing General Purpose System using TETDM

Tomoki Kajinami 1*, Koichi Tashiro 2, Takuma Tonegawa 2, Yuuya Kitamura 2, Yasufumi Takama 2

1 Kanagawa Institute of Technology
2 Tokyo Metropolitan University

Abstract: This paper considers a policy for coordinating tools in the development of systems using TETDM. TETDM is a total environment for text data mining that supports various mining tasks by combining small mining tools. However, a useful guide for designing a system built from several small tools developed by different tool developers has not yet been considered. This paper describes a design guide that reconciles the user's purpose with the system's specifications in order to construct a general-purpose system, and shows a practical example.

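* Contact: Department of Information and Computer Sciences, Faculty of Information Technology, Kanagawa Institute of Technology, 1030 Shimo-Ogino, Atsugi, Kanagawa 243-0292, Japan. E-mail: [email protected]

1 Introduction

This paper examines a policy for coordinating tools in system development using TETDM. TETDM is an integrated environment for text data mining that handles diverse tasks by linking small-scale tools with one another [6]. Tools fall into two types, "mining processing tools" and "visualization tools". The user selects one mining processing tool and one visualization tool of their choice and analyzes text by using them as a one-to-one pair. Several such pairs can be used at the same time: each of the multiple panels in TETDM holds one pair of a mining processing tool and a visualization tool, so text can be analyzed from various viewpoints simultaneously. TETDM also provides a mechanism for linking tools, in which data output by one tool is used by another. As a result, TETDM can serve not only for text analysis but also as a platform for developing text data mining systems composed of multiple tools.

However, no guideline has yet been examined for designing a system by linking small-scale tools that were developed individually by multiple developers. For system development aimed at building an interactive clustering environment, this paper describes a system design guideline that reconciles the purpose-first and means-first orientations. This study builds an interactive clustering environment for text without changing the TETDM specification and within the limits that the specification imposes. Its significance lies in the following three points.

1. By showing an example that implements an interactive clustering environment on TETDM without departing from TETDM's built-in tool-linkage specification, the work can be applied to system development that builds new platforms on TETDM.
2. By proposing a method for handling, in an integrated manner, multiple mining processing tools that are not intended to be paired with a visualization tool, it indicates a direction for extending TETDM while retaining TETDM's characteristic one-to-one correspondence between mining processing tools and visualization tools.
3. By enabling a unified exchange of data for interactive clustering, it makes clustering-related tools easier to compare within the same environment.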


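This paper aims to build an interactive clustering environment on TETDM by linking multiple tools. The environment is general-purpose: it is specialized neither for a particular clustering method nor for a particular task such as classifying technical documents or product reviews. We therefore examine a tool-linkage strategy designed with general applicability in mind.

The rest of this paper is organized as follows. Section 2 reviews research on applications of TETDM and clarifies the position of this paper. Section 3 identifies the roles of tools in an interactive clustering environment, abstracts the data exchanged between tools, and defines the data types. Section 4 proposes the management panel method, which handles tools in an integrated manner, and Section 5 shows a prototype system implementation.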

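2 Related Work

2.1 Use and Extension of TETDM

With TETDM, a user can choose any combination of the available tools, within the range the system permits, and obtain the results of text analysis. One practical application is the analysis of medical records in clinical settings [7]. Research has also extended TETDM by linking it with existing analysis software such as R [8]. Other work on extending TETDM focuses on the fact that the user must actively choose the combination of a mining processing tool and a visualization tool. Taking advantage of the fact that TETDM's core program is also open source, these studies support the task of combining tools [2][4].

This paper does not assume a particular setting in which text analysis is performed, nor does it aim at close integration with existing software. Although it is a study on extending TETDM, it does not touch TETDM's core program. It builds a new platform within the existing framework for implementing the two types of tools (mining processing tools and visualization tools) and for linking tools according to the TETDM specification.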

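2.2 Interactive Clustering

Interactive clustering is a way of producing clustering results that meet the user's requirements. It supports the user in entering the parameters and constraints needed for clustering [3]. The user attaches constraints to the clustering that reflect their own intentions and background knowledge, repeatedly interacts with the clustering results, and eventually obtains the desired result. Interactive clustering has been applied to document classification [5].

This paper aims to build an interactive clustering environment. We consider TETDM, in which text data is viewed from various perspectives through combinations of tools and analyzed interactively, to be a good match for an interactive clustering environment, in which different clustering results are viewed side by side and clustering is then repeated according to the user's intentions and background knowledge.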

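3 Tool Linkage for Clustering

This paper considers a system design strategy that assumes an environment in which the user can compare multiple clustering results and can dynamically set the clustering method and its parameters. We also assume that the user wants to see four kinds of information in a (visualized) clustering result: the set of clusters, the clusters it contains, the documents assigned to each cluster, and the words in each document. Section 3.1 divides the clustering flow into three stages and relates them to mining processing tools and visualization tools. Section 3.2 describes the data types used for linkage between tools in interactive clustering. Section 3.3 describes the tool classification work and the consistency checks of tool linkage that we actually carried out during system development by multiple people.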

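3.1 Clustering Flow

The clustering procedure can be broadly divided into the three stages below. The visualization stage maps naturally onto TETDM's visualization tools, whereas preprocessing and clustering are both implemented as mining processing tools.

Preprocessing: the stage that vectorizes the documents to be clustered and computes feature values.
Clustering: the stage that clusters the documents with an arbitrary clustering method.
Visualization: the stage that outputs or filters the clustering result with a visualization method suited to the selected clustering method or to the user's intent.

Because the whole clustering system is composed by combining multiple tools, connecting at least one tool per stage is enough to complete one full clustering run.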
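The three stages can be pictured as a chain of functions, each consuming the output of the previous one. The following is a minimal Java sketch under stated assumptions: the class `ClusteringPipeline`, its method names, and the toy stage implementations are hypothetical illustrations and are not taken from TETDM's API.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from TETDM.
final class ClusteringPipeline {
    // Chains the three stages: preprocessing -> clustering -> visualization.
    static String run(String[][] documents,
                      Function<String[][], double[][]> preprocess,
                      Function<double[][], int[]> cluster,
                      Function<int[], String> visualize) {
        return preprocess.andThen(cluster).andThen(visualize).apply(documents);
    }

    // Toy preprocessing: one feature per document (its word count).
    static double[][] wordCounts(String[][] documents) {
        double[][] vectors = new double[documents.length][1];
        for (int d = 0; d < documents.length; d++) {
            vectors[d][0] = documents[d].length;
        }
        return vectors;
    }

    // Toy clustering: label 1 if the first feature exceeds the mean, else 0.
    static int[] splitAtMean(double[][] vectors) {
        double mean = 0.0;
        for (double[] row : vectors) {
            mean += row[0];
        }
        mean /= vectors.length;
        int[] labels = new int[vectors.length];
        for (int d = 0; d < vectors.length; d++) {
            labels[d] = vectors[d][0] > mean ? 1 : 0;
        }
        return labels;
    }

    // Toy visualization: render the cluster labels as text.
    static String asText(int[] labels) {
        return Arrays.toString(labels);
    }
}
```

Swapping a single argument of `run` replaces one stage without touching the others, which mirrors the idea that one tool per stage is enough for a complete run.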


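3.2 Data Types Required for Clustering

In considering how roles are divided among tools, this section follows the data-flow-oriented approach used in system design [1] and examines the data exchanged between tools in concrete terms.

We examine what data is needed between the preprocessing, clustering, and visualization stages. The data types should be as simple as possible, yet sufficient to represent the elements the user needs (the set of clusters, the clusters it contains, the documents assigned to each cluster, and the words in each document). Following the TETDM specification, the user's input to the system is a document file in text format. The input is not limited to a single file; multiple document files are also supported. We further assume that TETDM's standard functions have already split each document into paragraphs, sentences, and words, that values such as the number of sentences and words in a document are stored in specific variables, and that a particular word or other item can be uniquely identified by specifying an array element number (ID).

Accordingly, this paper defines three kinds of data to be exchanged between tools, namely the document vector list, the cluster-document list, and the cluster-word list, and maps them onto the data types that TETDM provides for tool linkage. The document vector list is a two-dimensional matrix of documents and words whose entries are the weights of each word computed by an arbitrary feature measure (such as TF-IDF). The cluster-document list is a two-dimensional matrix with clusters as rows and the documents contained in each cluster as columns. The cluster-word list is a two-dimensional matrix with clusters as rows and the words contained in each cluster as columns.

Table 1 shows how the document vector list, cluster-document list, and cluster-word list correspond to the data types provided by TETDM. Two types are listed for the cluster-document list (boolean[][] and int[][] in Table 1) so that it can support both a binary representation, in which the number of columns equals the total number of documents and a document is marked 1 if it belongs to the cluster and 0 otherwise, and a representation that stores the IDs of the documents in a cluster as an array.

Table 1: Concrete data and their data types.

Data                    Type
Document vector list    double[][]
Cluster-document list   boolean[][], int[][]
Cluster-word list       double[][]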
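The two representations of the cluster-document list carry the same information, so one can be converted into the other. The sketch below illustrates this in Java. The class and method names are hypothetical and chosen only for illustration; it assumes document IDs are zero-based array indices.

```java
// Hypothetical helper: converts between the two cluster-document list forms.
final class ClusterDocumentList {
    // Binary form -> ID form: row c lists the IDs of the documents in cluster c.
    static int[][] toIdLists(boolean[][] membership) {
        int[][] ids = new int[membership.length][];
        for (int c = 0; c < membership.length; c++) {
            int count = 0;
            for (boolean inCluster : membership[c]) {
                if (inCluster) count++;
            }
            ids[c] = new int[count];
            int next = 0;
            for (int d = 0; d < membership[c].length; d++) {
                if (membership[c][d]) ids[c][next++] = d;
            }
        }
        return ids;
    }

    // ID form -> binary form: one column per document, true if it is in the cluster.
    static boolean[][] toMembership(int[][] ids, int documentCount) {
        boolean[][] membership = new boolean[ids.length][documentCount];
        for (int c = 0; c < ids.length; c++) {
            for (int d : ids[c]) {
                membership[c][d] = true;
            }
        }
        return membership;
    }
}
```

The binary form has a fixed width and suits overlap checks between clusters, while the ID form is more compact when clusters are small relative to the document collection.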

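3.3 Design Strategy in Practice

Based on the stage classification and data definitions of Sections 3.1 and 3.2, the system design strategy actually followed in this work is as follows.

1. Prepare cards on which to write a tool's name, the content of its input and output data, and its processing.
2. Have multiple developers (project members) fill in the cards.
3. Examine closely whether the input and output data of the tools match.
4. Reconsider the tools' input and output data, and split or merge tools.

Each tool is represented by one card, and the input and output data are made consistent so that the tools classified into the three stages (preprocessing, clustering, visualization) can be connected. When the input and output data cannot be made consistent, we examine whether the relationship between the processing and its input and output data is appropriate, and whether the tool's processing can be split or merged. Tools that are hard to assign to any of preprocessing, clustering, or visualization, and tools that depend on a particular clustering method, are placed in a separate option category.

Steps (1) and (2) correspond to a purpose-first orientation that considers the purpose of the development project, while steps (3) and (4) correspond to a means-first orientation that considers the means feasible under the TETDM specification. In other words, we examine what the input and output data are for "what developers and users want to achieve", and adjust them so that the resulting data flow fits TETDM's tool-linkage mechanism.

Table 2 shows some of the tools that were actually proposed, classified by the clustering stages described above. Items in parentheses belong to the option category. This study was carried out at an educational institution and also served as a learning exercise on clustering methods for some of the authors (engineering undergraduate and graduate students). It therefore does not aim to enumerate all existing clustering methods.

Table 2: Clustering stages and tool groups.

Stage           Tools
Preprocessing   TF-IDF computation, BM25 computation
Clustering      K-means, hierarchical clustering, constrained hierarchical clustering, (centroid computation, distance computation)
Visualization   Network diagram, hierarchical structure diagram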
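The card-based check of step 3 can be sketched as a small program: each card records a tool's stage and its input and output data, and every pair of tools in adjacent stages is checked for a match. This is only an illustrative sketch. The class `ToolCards`, the field names, and the data labels used as strings are assumptions, not artifacts from the project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the tool cards; names and labels are illustrative only.
final class ToolCards {
    enum Stage { PREPROCESSING, CLUSTERING, VISUALIZATION }

    // One card per tool: name, stage, and the data it consumes and produces.
    static final class Card {
        final String name;
        final Stage stage;
        final String input;
        final String output;

        Card(String name, Stage stage, String input, String output) {
            this.name = name;
            this.stage = stage;
            this.input = input;
            this.output = output;
        }
    }

    // Lists every upstream/downstream pair in adjacent stages whose data differ.
    static List<String> mismatches(List<Card> cards) {
        List<String> problems = new ArrayList<>();
        for (Card up : cards) {
            for (Card down : cards) {
                boolean adjacent = down.stage.ordinal() == up.stage.ordinal() + 1;
                if (adjacent && !up.output.equals(down.input)) {
                    problems.add(up.name + " -> " + down.name);
                }
            }
        }
        return problems;
    }
}
```

Each reported pair is a candidate for step 4: either the data definitions on the cards are revised, or a tool is split or merged until the pair connects. Not every reported pair is necessarily an error, since some tools may be meant to work only with specific partners.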


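4 Tool Management with the Management Panel Method

This section proposes the management panel model: a management panel that manages multiple mining processing tools is built on TETDM, and the combination of tools is changed through this single panel. For convenience, we call the basic tool management adopted in TETDM the "basic method" and tool management based on the management panel model the "management panel method".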

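4.1 Problems with the Basic Method

Figure 1 shows a clustering environment based on the basic method. In the basic method, mining processing tools may be linked with one another (that is, they may exchange data), but each panel holds one mining processing tool paired one-to-one with one visualization tool. From the tool developer's point of view, a mining processing tool must always be developed on the assumption that it will be used together with some visualization tool. Using TETDM's linkage function between mining processing tools, it is possible to create a tool whose output data is used only by other mining processing tools, but in principle TETDM's design policy is a one-to-one correspondence between mining processing tools and visualization tools. This is also a problem for tool users. To specify a combination of tools, a user must either know in advance which tools can be combined with which, or find out by carefully reading the descriptions that accompany the tools. For this reason, research has attempted to improve tool selection [2][4].

When building an interactive clustering environment, what matters to the user is not which tool (by name or function) is combined in which panel, but how to supply the parameters needed for clustering (the preprocessing stage) and whether the desired clustering method can be selected. This creates the need for an interface that handles the clustering-related tools implemented on TETDM in an integrated manner.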

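4.2 The Management Panel Model

Figure 2 gives an overview of a clustering environment based on the management panel model. The management panel model proposed in this paper builds a virtual integrated environment for clustering on top of TETDM. The specification for placing mining processing tools and visualization tools in panels, and the predefined methods and data types used for data linkage between tools, are used as they are. No changes are made to TETDM's core program.

Figure 1: Clustering with the basic method.
Figure 2: Clustering with the management panel method.

In accordance with the TETDM specification, the management panel itself is placed on a TETDM panel as a one-to-one pair of a mining processing tool and a visualization tool. As its name suggests, the management panel manages the group of clustering-related tools, and it presents two pieces of information to the user: which kinds of preprocessing are available and which clustering methods are available. On the management panel, the user selects the preprocessing method and the clustering method they want to use. The management panel's mining processing tool then runs the relevant mining processing tools according to the selected combination of preprocessing and clustering. Different combinations of preprocessing and clustering can be run in parallel.

Panels other than the management panel hold mining processing modules that inherit the information selected on the management panel. The process of individually selecting and specifying a particular preprocessing or clustering method is thus hidden from the user. The only choice the user makes on panels other than the management panel, according to their own requirements, is what information they want to see, that is, the visualization method.

With the management panel method, the user no longer needs to worry about the combination of mining processing tool and visualization tool for each panel. The user can handle multiple tools together through a single management panel and, in addition, can use the buttons, input forms, and other controls that appear on panels holding a visualization tool for clustering results to filter the output and to give feedback on the results. The option-category tools described in Section 3.3 are mining processing tools closely tied to a particular clustering method, so they are best implemented so that parameters are entered from the panel that outputs the result of that clustering method. Adopting the management panel method makes it possible to build a general-purpose interactive clustering environment in which the management panel integrates multiple tools while the per-panel interaction that characterizes TETDM is retained. Whereas panels are highly independent in the basic method, the management panel method centrally manages the mining processing tools that perform processing common to multiple panels.

Figure 3: Clustering environment based on the management panel model.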
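The role of the management panel described above can be sketched as a small registry that lists the available preprocessing and clustering tools and runs whichever pair the user selects. The following Java sketch is a simplified illustration under stated assumptions: `ManagementPanel` and its methods are hypothetical names, tools are modeled as plain functions, and none of this reflects TETDM's actual classes or linkage methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the management panel's bookkeeping; not TETDM's API.
final class ManagementPanel {
    // Preprocessing tools: word-ID documents -> document vector list.
    private final Map<String, Function<int[][], double[][]>> preprocessors = new LinkedHashMap<>();
    // Clustering tools: document vector list -> cluster-document list (ID form).
    private final Map<String, Function<double[][], int[][]>> clusterers = new LinkedHashMap<>();

    void registerPreprocessor(String name, Function<int[][], double[][]> tool) {
        preprocessors.put(name, tool);
    }

    void registerClusterer(String name, Function<double[][], int[][]> tool) {
        clusterers.put(name, tool);
    }

    // The two pieces of information presented to the user.
    Set<String> availablePreprocessors() {
        return preprocessors.keySet();
    }

    Set<String> availableClusterers() {
        return clusterers.keySet();
    }

    // Runs the selected pair; other panels would only visualize the result.
    int[][] run(String preprocessor, String clusterer, int[][] documents) {
        Function<int[][], double[][]> p = preprocessors.get(preprocessor);
        Function<double[][], int[][]> c = clusterers.get(clusterer);
        if (p == null || c == null) {
            throw new IllegalArgumentException("unknown tool selection");
        }
        return c.apply(p.apply(documents));
    }
}
```

Because every registered tool shares the same input and output types, any preprocessing tool can be combined with any clustering tool, and several `run` calls with different selections can feed different visualization panels.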

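5 Implementation of the Clustering Environment

We build a prototype clustering environment on TETDM that follows the linkage rules described in Section 3 and the management panel model described in Section 4.

Figure 3 shows an example run of the clustering environment. The leftmost panel in the figure (outlined in red) is the management panel, on which the feature measure and the clustering method can be selected. In the current environment, the feature measure can be chosen from TF-IDF and BM25, and the clustering method from K-means and hierarchical clustering (nearest neighbor method). The clustering results are displayed in the panels to the right of the management panel, using text and graphics.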
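As an illustration of what a preprocessing tool hands to the clustering stage, the sketch below computes a document vector list (double[][]) with TF-IDF weights. The paper does not state which TF-IDF variant the prototype uses, so the formula here (raw term count multiplied by the natural logarithm of N/df) is an assumption, and the class `TfIdf` is a hypothetical name. Documents are assumed to be given as arrays of zero-based word IDs.

```java
// Hypothetical TF-IDF sketch; the weighting variant is an assumption.
final class TfIdf {
    // docs[d][i] is the word ID of the i-th token in document d.
    // Returns a documents-by-words matrix of TF-IDF weights.
    static double[][] weights(int[][] docs, int vocabularySize) {
        int n = docs.length;
        double[][] w = new double[n][vocabularySize];
        int[] documentFrequency = new int[vocabularySize];

        // Term frequency: raw count of each word in each document.
        for (int d = 0; d < n; d++) {
            for (int word : docs[d]) {
                w[d][word] += 1.0;
            }
            for (int word = 0; word < vocabularySize; word++) {
                if (w[d][word] > 0.0) documentFrequency[word]++;
            }
        }

        // Multiply by the inverse document frequency ln(N / df).
        for (int d = 0; d < n; d++) {
            for (int word = 0; word < vocabularySize; word++) {
                if (documentFrequency[word] > 0) {
                    w[d][word] *= Math.log((double) n / documentFrequency[word]);
                }
            }
        }
        return w;
    }
}
```

A BM25 tool could return a matrix of the same shape, which is what lets the management panel offer the two feature measures interchangeably.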

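6 Conclusion

This paper examined a policy for coordinating tools in system development using TETDM. Assuming system development aimed at building an interactive clustering environment, we reconciled the user's requirements with the TETDM specification. Focusing on data flow, we mapped the data types needed at each clustering stage onto the tool-linkage specification defined in TETDM. We also proposed the management panel model, which suits an interactive clustering environment by supporting the specification of parameters and the selection of clustering methods. As a prototype, we implemented a system in which two feature computation methods and the clustering method can be specified.

In future work, we will enable users to specify constraints and enable processing that responds to user feedback, and we will enrich the clustering environment by adding more preprocessing tools and clustering tools.

We believe that an environment built with the proposed framework is useful not only to users who want to analyze documents by clustering, but also to researchers who want to evaluate new clustering or visualization methods by comparing them with other methods.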

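References

[1] DeMarco, T. (author), Takanashi, T., Kuroda, J. (supervising translators): Structured Analysis and System Specification: Modeling Techniques for Clarifying the Target System, Nikkei BP Publishing Center (1994) (in Japanese)
[2] Nakagaichi, R., Kawamoto, K., Sunayama, W.: Tool Selection Support for Beginners in Text Mining Using the Integrated Environment TETDM, The 27th Annual Conference of the Japanese Society for Artificial Intelligence, 3B3-NFC-01a-1 (2013) (in Japanese)
[3] Nakamura, T., Kamidoi, Y., Wakabayashi, S., Yoshida, N.: Interactive Clustering of High-Dimensional Data Using Feature Extraction from Clustering Results, IPSJ Transactions on Databases, Vol. 47, No. SIG 19 (TOD 32), pp. 28–41 (2006) (in Japanese)
[4] Otsuka, N., Matsushita, M.: Toward Supporting Trial and Error in Text Analysis: A Study on the Interface of TETDM, The 2nd Meeting of the SIG on Interactive Information Access and Visual Mining, SIG-AM-02-10, pp. 56–61 (2012) (in Japanese)
[5] Sato, Y., Iwayama, M.: A Proposal of Interactive Document Classification Technology Applying Semi-Supervised Clustering, IPSJ SIG Technical Report, Vol. 2009-DBS-148, No. 7, pp. 1–6 (2009) (in Japanese)
[6] Sunayama, W., Takama, Y., Nishihara, Y., Tokunaga, H., Kushima, M., Abe, H., Kajinami, T.: Development of TETDM, an Integrated Environment for Text Data Mining, Transactions of the Japanese Society for Artificial Intelligence, Vol. 28, No. 1, pp. 1–12 (2013) (in Japanese)
[7] Tani, E., Sunayama, W.: A System Supporting Comparison of the Characteristics of Novices and Veterans in Electronic Medical Records, The 3rd Meeting of the SIG on Interactive Information Access and Visual Mining, SIG-AM-03-07, pp. 37–43 (2013) (in Japanese)
[8] Tokunaga, H.: Development of TETDM Modules for Text Mining with R, The 27th Annual Conference of the Japanese Society for Artificial Intelligence, 3B3-NFC-01b-2 (2013) (in Japanese)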
