
IPSJ SIG Technical Report, Vol.2016-SLP-110, No.1, 2016/2/5
ⓒ 2016 Information Processing Society of Japan

Spoken Term Detection Using Information of Collocation

Odawara Kazunari†1, Yamashita Yoichi†2,a)

†1 Presently with Graduate School of Information and Engineering, Ritsumeikan University
†2 Presently with College of Information and Engineering, Ritsumeikan University
a) [email protected]

Abstract: This paper proposes a spoken term detection (STD) method that uses collocation information for a query word and the word reliability of candidate segments, in addition to the matching score between the phoneme sequences of the spoken documents and the query word. The method detects candidate segments in spoken documents by DP matching in phoneme units. The detected candidate segments are then re-evaluated using collocation information learned from web text and the word reliability of each candidate segment. We compare three measures of collocation information: the TF-IDF value, the number of collocation words appearing before and after the candidate segment, and the word reliability scores obtained during speech recognition. Experimental results show that introducing collocation information improves STD performance and that the measure based on the number of collocation words is the most effective. Adding the word reliability of candidate segments to the collocation information improves STD performance further.

Keywords: STD, DP Matching, Word Collocation, TF-IDF

1. Introduction

At present, spoken documents are usually searched using metadata such as titles and tags added by the content creator. When no metadata is provided, or when the metadata does not adequately describe the content, it is difficult to find the desired spoken document. Search over spoken documents therefore requires retrieval techniques that find the target content efficiently without relying on creator-supplied metadata.

One such technique is spoken term detection (STD), which automatically detects a given query term in spoken documents and has been studied actively [1], [2]. STD can be realized by matching the query term against the word sequence obtained by applying large-vocabulary continuous speech recognition to the target spoken documents. However, this approach cannot retrieve unknown words, that is, words not registered in the recognizer's dictionary, or words that were misrecognized.

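To solve this problem, a widely used approach recognizes speech in subword units shorter than words, such as phonemes or syllables, and detects sequences similar to the query term; it achieves a certain level of performance even for unknown words [3], [4], [5], [6]. Because this approach detects subword sequences in the spoken documents that resemble the subword sequence of the query term, false alarms become a problem. On the other hand, because speech is represented with a limited inventory of subword units, the unknown-word problem is avoided. Meanwhile, to detect every segment in which the query term is spoken, that is, to raise the detection rate, a major issue is how to account for the acoustic variability of spoken documents when matching against the query term. Methods proposed so far include using multiple speech recognition results to account for acoustic variability [7] and representing spoken documents by symbol sequences obtained by vector-quantizing acoustic features instead of phonemes [8]. All of these methods detect the query term by matching it against symbol sequences generated from the spoken documents by speech recognition, so they can be said to use only the acoustic characteristics of the spoken documents.

In a spoken document, a query term is uttered as part of a sentence and is related to the surrounding words. Using such relations between words as linguistic information may suppress false alarms and improve STD performance. STD methods that use linguistic information include a method that attaches case particles before and after the query term [9] and a method that detects known words in spoken documents represented as word lattices using word co-occurrence information based on topic estimation [10]. In addition, when a spoken document is processed by continuous word recognition, a word reliability is computed for each word. This reliability is high when the recognizer judges its result to be correct. When the word to be detected is an unknown word or was misrecognized, the word reliability in the segment to be detected tends to be low. Examining the word reliability of a candidate segment may therefore reduce false detections.

This paper describes a method that improves STD performance using word collocation information and the word reliability of candidate segments. Candidate segments for the query term are generated by conventional STD based on phoneme sequence matching. Each candidate segment is then re-evaluated using the collocation between the query term and the words appearing before and after the segment, together with the word reliability of the segment obtained during speech recognition, and the query term is detected on the basis of the re-evaluated score. Section 2 outlines the proposed detection method, Section 3 gives the concrete computation of the collocation information and of the word reliability of candidate segments, Section 4 reports the evaluation experiments, and Section 5 presents conclusions and future work.

2. Spoken term detection method

The proposed STD method detects candidate segments for the query term by conventional continuous DP matching over subword sequences based on acoustic information, and then detects the query term by re-evaluating the scores using collocation information.

2.1 Overview of query term detection using collocation information and the word reliability of candidate segments

Fig. 1 shows the processing flow of the proposed STD method. The steps are as follows.

[Fig. 1: Flow of the STD method using collocation information.]

(1) Recognition of the spoken documents
The spoken documents to be searched are processed by speech recognition in advance to obtain two kinds of recognition results: a phoneme sequence and a word sequence.

(2) Generation of candidate segments for the query term
When a query term is entered, continuous DP matching is applied to the phoneme-sequence recognition results of the spoken documents, and the detected segments are taken as candidate segments for the query term. The threshold is set rather loosely here, so that the query term is detected easily and the segments in which it is actually spoken are missed as rarely as possible.

(3) Generation of the collocation word list
The query term is submitted to a web search, and web documents related to the query term are collected. Using a set of web documents prepared beforehand by web search together with the documents collected for the query term, a list of collocation words considered to be highly related to the query term is generated. Details are given in Section 3.1.

(4) Generation of the word set used for collocation with the query term
The position of the candidate segment is located in the word-sequence recognition result of the spoken document. Among the M words preceding and the M words following the candidate segment (2M words in total), the words contained in the collocation word list generated in step (3) form the set $U = \{u_1, \ldots, u_K\}$ (where $K \le 2M$), which is used as the collocation word set for re-evaluation. The collocation information for the candidate segment is computed from this set. Details are given in Section 3.2.

(5) Computation of the word reliability of the candidate segment
A candidate segment $\tilde{w}$ is obtained by symbol-sequence matching between the phoneme sequence of the spoken document and that of the query term. The segment is therefore not aligned to word boundaries and may span several words of the spoken document. Let $b_i$ ($1 \le i \le B$) be the words contained in the candidate segment, $d_i$ the number of frames of word $b_i$ that fall within $\tilde{w}$, and $c_i$ the word reliability of $b_i$ obtained during speech recognition. The word reliability $J_{\tilde{w}}$ of the candidate segment is computed as

$$J_{\tilde{w}} = \frac{\sum_{i=1}^{B} c_i d_i}{\sum_{i=1}^{B} d_i} \tag{1}$$

(6) Detection of the query term
For a candidate segment $\tilde{w}$ of query term $w$, the re-evaluated score $C_{\tilde{w}}$ is obtained from the matching score $L_{\tilde{w}}$ of step (2), the collocation information $I_{\tilde{w}}$ of step (4), and the word reliability $J_{\tilde{w}}$ of step (5) as

$$C_{\tilde{w}} = \frac{1 + \alpha I_{\tilde{w}} - \beta J_{\tilde{w}}}{L_{\tilde{w}}} \tag{2}$$

where $\alpha$ is the weight of the collocation information and $\beta$ is the weight of the word reliability of the candidate segment. The query term is finally detected by thresholding this score $C_{\tilde{w}}$.

3. Collocation information

This section describes how the collocation word list is generated and how the collocation information for a candidate segment is computed.

3.1 Generation of the collocation word list

The method generates a collocation word list as the set of words that tend to co-occur with the query term. Fig. 2 shows the flow of list construction. Given a query term $w$, the query term is submitted to a web search and web documents related to it are collected. The documents are morphologically analyzed, only the nouns that appear are extracted, and the TF-IDF value of each noun is computed. TF-IDF is computed from the number of documents $N$, the term frequency $tf$, and the document frequency $df$ as

$$\mathrm{TF\text{-}IDF} = tf \times \log \frac{N}{df} \tag{3}$$

and the collocation word list consists of the $T$ words with the highest TF-IDF values.

[Fig. 2: Flow of making the collocation word list.]

[Fig. 3: Documents used for calculating TF-IDF.]

Computing TF-IDF generally requires a document collection. In this study, as shown in Fig. 3, a word is submitted to a web search and the top $S$ web pages are concatenated and treated as a single document. The document obtained by searching for a word $x$ is denoted $D(x)$. Given a set of words other than the query term, $V = \{v_1, \ldots, v_{49}\}$, the document collection for TF-IDF consists of $D(w)$ and $D(v_1), \ldots, D(v_{49})$, 50 documents in total. $D(v_1), \ldots, D(v_{49})$ are prepared in advance. Thus, in Equation (3), $tf$ is computed from $D(w)$, $df$ is computed from $D(w), D(v_1), \ldots, D(v_{49})$, and $N = 50$. The top 49 terms of the 2013 Google search term ranking were used as $V$. Some of them are shown in Table 1. Google was used as the search engine for obtaining $D(x)$, and the number of web pages retrieved was $S = 500$. Because some URLs could not be fetched, the number of pages actually obtained may be smaller than 500. ChaSen (茶筌) was used as the morphological analyzer, and the number of words in the collocation word list was set to $T = 150$ on the basis of preliminary experiments. Table 2 shows the top five collocation words obtained for the query term "安保理" (Anpori, the UN Security Council).

Table 1 Examples of top-ranked words in the search term ranking.

NISA, パズドラ (Puzzle & Dragons), あまちゃん (Amachan), 堺雅人 (Masato Sakai), 安藤美姫 (Miki Ando), 滝川クリステル (Christel Takigawa), 富士山 (Mt. Fuji), iPhone, 東京スカイツリー (Tokyo Skytree)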
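The list construction of Section 3.1 (Equation (3) followed by a top-T cut) can be illustrated with a minimal sketch. It assumes that every document D(x) has already been reduced to a list of nouns, that tf is the raw count in D(w), and that the logarithm is natural; the paper does not state the tf normalization or the log base, and the function name `collocation_list` is hypothetical.

```python
import math
from collections import Counter


def collocation_list(query_doc_nouns, background_docs_nouns, top_t=150):
    """Rank the nouns of D(w) by tf * log(N / df) and keep the top T.

    query_doc_nouns:       nouns extracted from D(w)
    background_docs_nouns: one noun list per background document D(v_k)
    Assumption: raw counts for tf and the natural logarithm.
    """
    docs = [query_doc_nouns] + list(background_docs_nouns)
    n_docs = len(docs)                      # N (50 in the paper)
    tf = Counter(query_doc_nouns)           # term frequency in D(w) only
    df = Counter()
    for doc in docs:
        df.update(set(doc))                 # each document counts once per word
    scores = {word: count * math.log(n_docs / df[word])
              for word, count in tf.items()}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_t]
```

A word that occurs in every document gets a score of zero because log(N/df) = 0, which is what pushes generic web vocabulary out of the list.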

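Returning to the detection flow of Section 2.1, steps (2), (5), and (6) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the DP matching here uses a unit-cost edit distance normalized by the query length in place of an acoustic subword distance, every end position under the threshold is reported without pruning overlaps, and Equation (2) is taken in the form (1 + αI − βJ) / L with L > 0. All function names are hypothetical.

```python
def continuous_dp_match(query, document, threshold):
    """Spot spans of `document` that resemble `query` (free start and end).

    Returns (start, end, distance) tuples, where document[start:end] is the
    span and distance is the edit distance divided by len(query).
    Assumption: unit substitution/insertion/deletion costs.
    """
    n = len(query)
    if n == 0:
        return []
    # prev[i] = (cost, start) for aligning query[:i] with a span ending at j-1
    prev = [(i, 0) for i in range(n + 1)]
    hits = []
    for j, symbol in enumerate(document, start=1):
        cur = [(0, j)]  # empty query prefix: a new span may start at position j
        for i in range(1, n + 1):
            substitute = (prev[i - 1][0] + (query[i - 1] != symbol), prev[i - 1][1])
            skip_query = (cur[i - 1][0] + 1, cur[i - 1][1])   # query symbol unmatched
            skip_document = (prev[i][0] + 1, prev[i][1])      # extra document symbol
            cur.append(min(substitute, skip_query, skip_document))
        cost, start = cur[n]
        if cost / n <= threshold:
            hits.append((start, j, cost / n))
        prev = cur
    return hits


def segment_reliability(words):
    """Equation (1): frame-weighted mean reliability; words = [(c_i, d_i), ...]."""
    total_frames = sum(d for _, d in words)
    return sum(c * d for c, d in words) / total_frames


def rescore(l_score, i_score, j_score, alpha, beta):
    """Equation (2) as reconstructed here: C = (1 + alpha*I - beta*J) / L."""
    if l_score <= 0:
        raise ValueError("this sketch assumes a strictly positive matching score L")
    return (1.0 + alpha * i_score - beta * j_score) / l_score
```

With a unit-cost distance an exact match yields L = 0, which `rescore` rejects; a real system would use real-valued acoustic distances, and how a zero score is handled is not discussed in the paper.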
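Table 2 Example of collocation words for the query term "安保理" (Anpori).

| Rank | Word | TF-IDF value |
|---|---|---|
| 1 | 安保理 (Security Council) | 8.2 |
| 2 | 決議 (resolution) | 2.4 |
| 3 | 国連 (United Nations) | 2.3 |
| 4 | 常任 (permanent) | 1.6 |
| 5 | 採択 (adoption) | 1.4 |

3.2 Computation of the collocation information

The following three methods of computing the collocation information are examined.

3.2.1 Method 1: TF-IDF

For a query term $w$, let $F_{\tilde{w}}(u_i)$ be the TF-IDF value of a collocation word $u_i$ that appears within the $M$ words before or after the candidate segment. The collocation information is computed by Equation (4).

$$I_{\tilde{w}} = \sum_{u_i \in U} \frac{F_{\tilde{w}}(u_i)}{2M} \tag{4}$$

3.2.2 Method 2: number of collocation word occurrences

As a preliminary analysis, we examined how often collocation words appear among the words before and after a candidate segment. Candidate segments were divided into correct segments, in which the query term is actually spoken, and incorrect segments, which are false alarms. Table 3 and Fig. 4 show the average number of collocation words as the number of context words $M$ is varied. Regardless of $M$, the number of collocation words differs greatly between correct and incorrect segments.

Table 3 Comparison of the average number of collocation words.

| M | Correct segments | Incorrect segments |
|---|---|---|
| 10 | 0.48 | 0.0001 |
| 100 | 2.61 | 0.0007 |

[Fig. 4: Change in the average number of collocation words.]

The collocation information is therefore computed from the number $K$ of collocation words that appear within the $M$ words before and after the candidate segment:

$$I_{\tilde{w}} = \sum_{u_i \in U} \frac{1}{2M} = \frac{K}{2M} \tag{5}$$

3.2.3 Method 3: word reliability

The collocation information is computed using the word reliability obtained when the spoken document is processed by continuous speech recognition. Let $R_{\tilde{w}}(u_i)$ be the word reliability of a collocation word $u_i$ that appears within the $M$ words before or after the candidate segment. The collocation information is computed as

$$I_{\tilde{w}} = \sum_{u_i \in U} \frac{R_{\tilde{w}}(u_i)}{2M} \tag{6}$$

4. Evaluation experiments

4.1 Experimental conditions

The test collection of NTCIR-9 SpokenDoc [13] was used as the experimental data. The spoken documents to be searched are the 177 core lectures (99 by male speakers, 78 by female speakers) of the Corpus of Spontaneous Japanese (CSJ) [14], [15]. As recognition results for these documents, we used the word recognition results (word correct rate 76.68%) and the syllable recognition results obtained with a syllable trigram (syllable recognition rate 81.8%) distributed by the NTCIR-9 SpokenDoc task organizers. The evaluation queries were the 50 query terms of the unknown-word set for the core lectures [13]. Some of the 50 unknown words are shown in Table 4. Their average length is 11.8 phonemes, with a maximum of 21 and a minimum of 6.

Table 4 Examples from the 50 unknown words.

コンテキストディペンデント (context dependent), 名犬ラッシー (Lassie), 知床 (Shiretoko), スティーブンキング (Stephen King), 石川島造船所 (Ishikawajima Shipyard), 安保理 (UN Security Council), ユニバーサルスタジオ (Universal Studios), 談洲楼焉馬 (Danshuro Enba), NATO 軍 (NATO forces)

Recall, precision, F-measure, and MAP are used as evaluation measures. All of them take values between 0 and 1, and values closer to 1 indicate higher accuracy. The baseline is the result of the continuous DP matching over subword sequences described in Section 2.1. The number of words in the collocation word list was $T = 150$. The number of context words $M$ was varied from 0 to 300 in steps of 10, the weight of the collocation information $\alpha$ from 0 to 100 in steps of 5.0, and the weight of the word reliability $\beta$ from 0 to 2.0 in steps of 0.2.

4.2 Experimental results

Table 5 shows the recall, precision, F-measure, and MAP obtained in the experiments. The parameter values listed are those that gave the highest F-measure.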
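Since the operating points are chosen by F-measure, it is worth recalling that the F-measure is the harmonic mean of recall and precision. The short sketch below recomputes it from the recall and precision values reported in Table 5 and checks that they agree with the reported F-measure to within rounding; the names `f_measure`, `TABLE5`, and `check_table5` are hypothetical.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (same unit in and out)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# (recall %, precision %, reported F-measure %) as listed in Table 5
TABLE5 = {
    "baseline": (55.9, 60.6, 58.2),
    "method 1": (64.3, 61.3, 62.8),
    "method 2": (56.6, 76.8, 65.2),
    "method 3": (63.5, 64.5, 64.0),
    "method 2 + word reliability": (65.7, 68.5, 67.1),
}


def check_table5(rows=TABLE5, tol=0.05):
    """Map each row to True if the recomputed F matches the reported value.

    tol = 0.05 is the largest error that rounding to one decimal can cause.
    """
    return {name: abs(f_measure(r, p) - f) <= tol
            for name, (r, p, f) in rows.items()}
```

All five rows are consistent, which also confirms the column order recall, precision, F-measure in the table.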

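For reference when reading the comparison of Methods 1 to 3, Equations (4) to (6) of Section 3.2 can be written as one small function. This is a sketch under assumptions: the context is given as (word, reliability) pairs for the up to 2M words around the candidate segment, every occurrence of a collocation word counts (tokens rather than distinct words, which the paper does not specify), and M = 0 is treated as the baseline with zero collocation information. The function name `collocation_info` is hypothetical.

```python
def collocation_info(context, tfidf, m):
    """Collocation information I for one candidate segment, Methods 1 to 3.

    context: (word, word_reliability) pairs for the M words before and the
             M words after the candidate segment (at most 2M items)
    tfidf:   collocation word list as a {word: TF-IDF value} mapping
    m:       number of context words M on each side
    """
    if m <= 0:
        # M = 0 corresponds to the baseline: no context, no collocation term.
        return {"method1": 0.0, "method2": 0.0, "method3": 0.0}
    # U = u_1 .. u_K: context words that are in the collocation word list
    hits = [(word, rel) for word, rel in context if word in tfidf]
    norm = 2 * m
    return {
        "method1": sum(tfidf[word] for word, _ in hits) / norm,  # Equation (4)
        "method2": len(hits) / norm,                              # Equation (5)
        "method3": sum(rel for _, rel in hits) / norm,            # Equation (6)
    }
```

Written this way, the three methods differ only in the weight given to each collocation word found in the context: its TF-IDF value, a constant 1, or its word reliability.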
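Fig. 5 shows the recall-precision curves of the baseline and Methods 1 to 3. Compared with the baseline, Method 1 improved the F-measure by 4.6 points and MAP by 13.5 points, Method 2 improved the F-measure by 7.0 points and MAP by 17.1 points, and Method 3 improved the F-measure by 5.8 points and MAP by 15.4 points. Because Method 2 gave the largest improvement, the subtraction of the word reliability of the candidate segment was incorporated into Method 2, which yielded a further improvement over Method 2 of 1.9 points in F-measure and 6.7 points in MAP. Accuracy improved with every method, which confirms the usefulness of the proposed approach.

[Fig. 5: Recall-precision curves.]

Fig. 6 plots the F-measure when the number of context words $M$ is fixed at the value giving the best F-measure for each method and the weight $\alpha$ is varied. Fig. 7 plots the F-measure when $\alpha$ is fixed and $M$ is varied. In these plots, $\alpha = 0$ and $M = 0$, respectively, correspond to the baseline. Because the best F-measure of Methods 1 to 3 occurred within $M \le 100$ and $\alpha \le 50$, the plots are shown up to $M = 100$ and $\alpha = 50$.

[Fig. 6: Change of F-measure with the weight α.]

[Fig. 7: Change of F-measure with the number of context words.]

Fig. 6 shows that Method 2, which uses the number of collocation word occurrences as the collocation information, gives a higher F-measure than the other two methods for every value of $\alpha$. Fig. 7 shows that, for every proposed method, the F-measure rises as the number of context words $M$ used to compute the collocation information increases, but that increasing $M$ beyond about 40 to 60 brings little further improvement. We presume that this is because the content of the speech changes and the related words change with it.

5. Conclusion

For spoken term detection (STD), we proposed a method that uses linguistic features in addition to the acoustic features of spoken documents, and verified its effectiveness. The linguistic features are expressed through the collocation between the query term and the words that appear before and after a candidate segment. Three methods of quantifying the word collocation information were compared. The methods differ in the weight used when the collocation words that appear are summed: Method 1 can be regarded as using the TF-IDF value as the weight, Method 2 as using equal weights, and Method 3 as using the word reliability obtained during speech recognition. We expected that assigning a different weight to each word would give more appropriate collocation information, but Method 2, which gives every word the same weight and thus considers only the number of words, performed best. Compared with the baseline, which uses only acoustic information, Method 2 improved the F-measure by 7.0 points and MAP by 17.1 points (66.0% against 48.9% in Table 5). Adding the subtraction of the word reliability of the candidate segment to Method 2 improved the F-measure by a further 1.9 points and MAP by a further 6.7 points.

In this paper, the words that co-occur with the query term are restricted to nouns and selected by TF-IDF. Changing this selection method or considering other weighting schemes may make weighting of the collocation words effective, and we plan to investigate this. Methods that learn the collocation between the query term and the words around a candidate segment directly from a corpus are another topic for future work.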

Table 5 Results of STD.

| Method | Collocation information | Word reliability of candidate segment | M | α | β | Recall (%) | Precision (%) | F-measure (%) | MAP (%) |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | None | No | 0 | 0 | 0 | 55.9 | 60.6 | 58.2 | 48.9 |
| Method 1 | TF-IDF value | No | 50 | 10 | 0 | 64.3 | 61.3 | 62.8 | 62.4 |
| Method 2 | Number of collocation words | No | 60 | 25 | 0 | 56.6 | 76.8 | 65.2 | 66.0 |
| Method 3 | Word reliability | No | 40 | 35 | 0 | 63.5 | 64.5 | 64.0 | 64.3 |
| Method 2 | Number of collocation words | Yes | 180 | 65 | 1.6 | 65.7 | 68.5 | 67.1 | 72.7 |

Acknowledgments: This work used the Corpus of Spontaneous Japanese and the CSJ Spoken Document Retrieval Collection of the NTCIR-9 SpokenDoc task.

References

[1] T. Akiba, K. Aikawa, Y. Itoh, T. Kawahara, H. Nanjo, H. Nishizaki, N. Yasuda, Y. Yamashita, and K. Itou, "Construction of a test collection for spoken document retrieval from lecture audio data", Journal of Information Processing, Vol.17, pp.82-94, 2009.
[2] Akiba, "Current status and issues of spoken document retrieval" (in Japanese), IPSJ SIG Technical Report, Vol.2010-SLP-82, No.10, pp.1-6, 2010.
[3] Iwata et al., "Effectiveness of a new subword model and subword acoustic distance in vocabulary-free spoken document retrieval" (in Japanese), IPSJ Journal, Vol.48, No.5, pp.1990-2000, 2007.
[4] T. Akiba, H. Nishizaki, K. Aikawa, T. Kawahara, and T. Matsui, "Overview of the IR for Spoken Documents Task in NTCIR-9 Workshop", Proceedings of NTCIR-9 Workshop Meeting, pp.223-235, 2011.
[5] Sakamoto, Yamamoto, and Nakagawa, "Evaluation of a spoken term detection method for spoken documents with spoken queries using syllable trigrams with distances" (in Japanese), 7th Spoken Document Processing Workshop, SDPW2013-05, pp.1-4, 2013.
[6] Ohno and Akiba, "An STD method based on line detection using syllable durations" (in Japanese), 8th Spoken Document Processing Workshop, SDPW2012-02-01, pp.1-8, 2012.
[7] Natori, Nishizaki, and Sekiguchi, "A study of spoken term detection using multiple speech recognition systems" (in Japanese), IPSJ SIG Technical Report, Vol.2009-SLP-79, No.19, pp.1-6, 2009.
[8] Sakamoto et al., "Spoken term detection from spoken documents using vector quantization of acoustic information" (in Japanese), IPSJ Journal, Vol.55, No.12, pp.2537-2545, 2014.
[9] Nanjo, Maeda, and Yoshimi, "A study of query expansion for spoken term detection" (in Japanese), IPSJ SIG Technical Report, Vol.2014-SLP-101, No.16, pp.1-6, 2014.
[10] Haiyang LI, Tiean ZHENG, Guibin ZHENG, and Jiqing HAN, "Confidence Measure Based on Context Consistency Using Word Occurrence Probability and Topic Adaptation for Spoken Term Detection", IEICE Transactions on Information and Systems, Vol.E97-D, No.3, pp.554-561, 2014.
[11] Kawasaki and Akiba, "A spoken document retrieval model robust to false positive errors using word co-occurrence" (in Japanese), Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan, 3-Q5-5, pp.193-196, 2014.
[12] Kakuchi, Imono, Tsuchiya, and Watabe, "A method for correcting misrecognized words in speech recognition based on word co-occurrence information" (in Japanese), IEICE Technical Report, AI2012-44, pp.19-24, 2013.
[13] Nishizaki, Hu, Nanjo, Itoh, Akiba, Kawahara, Nakagawa, Matsui, Yamashita, and Aikawa, "Construction and analysis of a test collection for spoken term detection" (in Japanese), IPSJ Journal, Vol.54, No.2, pp.471-483, 2013.
[14] Maekawa, "Overview of the Corpus of Spontaneous Japanese" (in Japanese), National Institute for Japanese Language and Linguistics, Ver.1.2.
[15] http://www.ninjal.ac.jp/corpus_center/csj/
