
Automatic identification of academic articles in Japanese PDF files towards open-access


IPSJ SIG Technical Report (Information Processing Society of Japan), 2006-FI-82(8), 2006-DD-54(8), 2006/3/22

Teru AGATA (Asia University)*, Michiko NOZUE (Railway Technical Research Institute), Atushi IKEUCHI (Daito Bunka Univ.), Takashi KUNO (Sakushingakuin University), Emi ISHIDA (Surugadai University), Shuichi UEDA (Keio University)
*e-mail: [email protected]

Abstract

As open access becomes common, a rapidly growing number of researchers publish their research results on the web through self-archiving. Such material can be reached through general-purpose search engines, but it is often buried among the other results. The aim of this research is to build a system that automatically identifies academic articles, and content comparable to articles (quasi-articles), among web content. We ran automatic identification experiments with several classifiers, including SVM, and compared them in terms of precision, recall, and F-value. The classifiers used two kinds of attributes: the terms appearing in the PDF files and a set of empirical rules. The experiments brought out the characteristics of each method: for example, SVM gave high precision and naive Bayes gave high recall.

I. Introduction

A. What is open access?

In recent years, interest in open access has been growing in scholarly communication. Various organizations and individuals now contribute, on both the organizational and the technical side, to spreading and promoting open access 1). In the words of the Budapest Open Access Initiative (BOAI), the outcome of a meeting on open access held in 2001, open access means "providing scholarly literature electronically on a worldwide scale through completely free and unrestricted access" 2). Among open access advocates, however, the definition is not uniform on points such as whether it should be limited to peer-reviewed articles and whether an embargo period after journal publication is acceptable.
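Three ways of realizing open access are generally recognized: "self-archiving", in which researchers post preprints or postprints on their own websites; registration on websites that anyone can access, such as institutional repositories; and "open access journals", for which the authors pay the publication costs and readers read free of charge. Providing academic articles as open access gives researchers, as authors, an institutional basis for distributing their results widely, and gives them, as readers, easy access to research resources. Its benefits are increasingly recognized: studies report that articles provided as open access are cited at a higher rate than articles that are not. As of February 2006, more than 90% of major scholarly journals permit self-archiving of postprints or preprints 3).

B. Means of access to open access articles

The expectations for open access and the trend toward it are clear, but at present not every academic article can be used free of charge on the network. The disparities between languages and between fields are especially marked. The advanced initiatives usually cited are all from overseas. In Japan, J-STAGE 5), operated by the Japan Science and Technology Agency (JST) 4), provides more than 100 open access journal titles, but taken as a whole the digitization of Japanese scholarly journals lags far behind, and in the humanities and social sciences in particular there are many areas where not even bibliographic databases are complete 6).

Even if many academic articles did become open access, the problem of finding and obtaining them would remain. Standards such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) 7) have been devised in response, and services such as OAIster offer cross-repository search over OAI-PMH-compliant institutional repositories. At present 6,509,564 records from 604 institutions are registered there, but whether researchers actually use them is unknown. An open access journal is easy to reach if the user already knows it as an information source, but for material provided on authors' websites there is no option other than searching with a general search engine such as Google. In practice, anyone who searches a general search engine for an academic article by a particular author or on a particular topic is likely to run into a huge amount of search noise, so some kind of unified interface is desirable.

In the English-speaking world there are several well-known search engines for academic articles, such as CiteSeer.IST 8) and Google Scholar 9). CiteSeer.IST collects a limited set of documents centered on computer science, and its scale is not very large. Google Scholar does not restrict the field, but its results contain only sites that have traditionally carried scholarly communication, such as scholarly societies, academic publishers, university libraries, and database vendors, so its crawl targets appear to be restricted 10). In other words, Google Scholar does not retrieve open access articles published on authors' websites.

In summary, (1) the advanced search services for scholarly information are centered on the English-speaking world, and (2) even those services are oriented toward scholarly communication built around journals and do not extend their collection to researchers' personal websites. Given that adequate means of access are not available, being able to identify open access academic articles across the whole web with sufficient precision should make a practical scholarly information retrieval system possible. Against this background, this study examines the automatic identification of academic articles in Japanese-language web content through comparative experiments with a variety of classification methods.

C. Formal and informal scholarly communication

Search engines specialized in scholarly information, such as Google Scholar, are oriented toward scholarly information that travels through formal channels. However, the importance of informal communication, such as preprints exchanged among researchers and internally produced research reports, has been pointed out repeatedly in the field of scholarly communication 11). In examining the automatic identification of academic articles, we therefore aim at identification that also covers informal communication. Because informal communication takes many forms, as a first step we add to the identification targets content that is comparable to academic articles, such as research reports (hereafter "quasi-articles").

Figure 1: Academic articles on the web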

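D. Automatic identification targeting PDF files

When the full text of an academic article is provided as a file today, the file formats in use include PDF, HTML, XML, TeX, and MS Word. Of these, PDF lets readers view a document with its layout and design preserved and also allows viewing conditions to be set, so it has become the most common distribution format. As Figure 1 suggests, most academic articles and comparable content can be assumed to be PDF. We therefore took the identification of academic articles and quasi-articles among the PDF files in web content as our first task.

II. Building the experimental collection

Figure 2: Procedure of the identification experiment (Internet, extraction of PDF files, PDF file set, manual classification into academic articles, quasi-articles, and non-articles, automatic identification)

A. Collecting PDF files

The PDF file collection was built in two rounds, in May 2005 and half a year later in November 2005. From the six noun dictionary files of ipadic 2.5.1 (213,020 words in total) we randomly sampled 9,750 words (first round) and 10,250 words (second round) and submitted each word to a search engine. The search was restricted to "PDF files" and "Japanese", and at most the top 100 results were collected per query word. After removing duplicates from the output, the number of distinct URLs was 307,514 (first round) and 441,598 (second round), and we attempted to download the PDF file at each URL. After removing files that could not be downloaded, zero-byte files, damaged files, encrypted files, and files that were not PDF, we obtained 248,314 PDF files in the first round and 349,971 in the second. Removing the duplicates between the first and second rounds gave 599,673 files.

B. Judging academic articles and quasi-articles

From the whole PDF file collection we randomly sampled 20,000 files, and six judges decided for each file whether it was an academic article, a quasi-article, or a non-article. When 12,000 files had been judged, all six judges re-judged the 565 files that had been classified as academic articles or quasi-articles in order to unify the criteria and keep the judgments as consistent as possible.

The criteria for an academic article included: (1) it has the form of an article; (2) the title, author names, and affiliations are stated; (3) it has citations or references; (4) one article consists of one file; and (5) it is at least two pages long. Quasi-articles were files that fail part of the criteria for academic articles, including informal research reports for internal use. Concretely these were "bulletin papers such as research notes", "research reports (published outside scholarly journals)", "manuscripts of oral presentations", "collections of several articles", "fragments of articles or scholarly books", "graduation theses and master's theses", "teaching materials", and the like.

Although the collection targeted Japanese files, some foreign-language files were included through misdetection; these were judged as non-articles.

C. Characteristics of the experimental collection

(1) Basic attributes of the experimental collection

Table 1 shows the number of files, file size, number of pages, and share of portrait-oriented files for academic articles, quasi-articles, and files other than articles (hereafter "non-articles"). Academic articles make up only 1.63% of the 20,000 PDF files, and less than 5% even when quasi-articles are included. For the purposes of automatic identification this is a heavily imbalanced collection. The average file size is larger for articles and quasi-articles than for non-articles. The average number of pages is higher for quasi-articles than for articles, and non-articles tend to have considerably fewer pages. The share of portrait-oriented files is the proportion of files whose pages are taller than they are wide; all files judged to be articles are portrait.

Table 1: Basic attributes of the experimental collection
Attribute | Academic articles | Quasi-articles | Non-articles
Number of files | 326 | 624 | 19,050
Average file size | 497,622.7 bytes | 436,736.4 bytes | 295,111.9 bytes
Average number of pages | 10.94 pages | 13.86 pages | 6.88 pages
Share of portrait orientation | 100.00% | 98.54% | 92.50%

(2) Distribution of domains

Table 2 shows the top five subdomains within the jp domain, which accounted for the large majority of the top-level domains of the collected URLs. For academic articles and quasi-articles, ac.jp is the most common subdomain. The domains and subdomains of non-articles are far more varied, with co.jp the most common among them.

Table 2: Distribution of JP subdomains
Subdomain | Academic articles | Quasi-articles | Non-articles
ac | 172 (52.60%) | 269 (43.32%) | 1,749 (9.30%)
go | 52 (15.90%) | 109 (17.55%) | 1,889 (10.05%)
co | 29 (8.87%) | 47 (7.57%) | 3,023 (16.08%)
or | 24 (7.34%) | 59 (9.50%) | 2,127 (11.32%)
ne | 5 (1.53%) | 20 (3.22%) | 1,322 (7.03%)
Other | 45 (13.76%) | 117 (18.84%) | 8,687 (46.21%)

(3) Subject fields of the articles

Table 3 shows the distribution of subject fields of the article files. Technology and engineering is the largest field, followed by the natural sciences and the social sciences. It is to be expected that articles published on the web are mostly in the natural sciences and in technology and engineering, but a comparable number of articles in the humanities and social sciences are also published. Even in the humanities and social sciences, where the provision of full-text databases lags behind the natural sciences, research results appear to be actively published on the web.

Table 3: Subject fields of the articles (NDC classes)
NDC class | Academic articles | Quasi-articles
00 General works | 33 (10.1%) | 22 (3.5%)
10 Philosophy | 20 (6.1%) | 15 (2.4%)
20 History | 12 (3.7%) | 22 (3.5%)
30 Social sciences | 64 (19.6%) | 161 (25.8%)
40 Natural sciences | 64 (19.6%) | 175 (28.0%)
50 Technology / engineering | 88 (27.0%) | 145 (23.2%)
60 Industry | 22 (6.7%) | 60 (9.6%)
70 Arts / fine arts | 4 (1.2%) | 14 (2.2%)
80 Language | 10 (3.1%) | 5 (0.8%)
90 Literature | 9 (2.8%) | 5 (0.8%)

(4) Text extraction and tokenization of PDF files

The classifiers used in the experiments cannot handle PDF files directly, so text data has to be extracted from the PDF files. We used Xpdf 3.01pl2 12) for this extraction. PDF is a data format that can reproduce a layout when displayed or printed, and internally it can also hold information about document structure. Most PDF files, however, carry only layout information. As a result, when text is extracted, Xpdf often converts places where layout is specified into line breaks and spaces. Because it is difficult to tell intended line breaks and spaces from those introduced by Xpdf, we applied no special post-processing such as removing line breaks and spaces and used the converted files as they were.

Japanese is an agglutinative language, so the text data has to be split into tokens (character strings or words). For tokenization we used the morphological analyzer MeCab 0.81 13) and bigrams. Below, the tokens produced by the former are referred to as mecab and those produced by bigram splitting as bigram. No selection was made among the extracted tokens; all of them were used.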
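The bigram tokenization described in (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes character bigrams that do not span whitespace, which the paper does not specify, and the function name `char_bigrams` is hypothetical.

```python
from collections import Counter


def char_bigrams(text: str) -> list[str]:
    """Split text into overlapping character bigrams.

    Assumption: bigrams are formed inside each whitespace-delimited chunk and
    never span a space or line break. The paper only says that the Xpdf output
    was used without post-processing, so this boundary rule is illustrative.
    """
    tokens: list[str] = []
    for chunk in text.split():
        if len(chunk) == 1:
            # A lone character cannot form a bigram; keep it as its own token.
            tokens.append(chunk)
        tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


def term_counts(text: str) -> Counter:
    """Count bigram occurrences, giving one term attribute per distinct bigram."""
    return Counter(char_bigrams(text))
```

Because every distinct bigram becomes an attribute, the attribute space grows quickly with the size of the collection, which is consistent with the high dimensionality the paper reports for the term-based approach.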
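III. Experimental setup

A. Attributes used for identification

Identifying academic articles is a text classification task, so the first option, looking at the content of the text, is to use the terms that appear in a PDF file as the attributes. This term-based approach has a large body of prior research, and well-established text classifiers can be applied. However, even with mecab tokens, which give comparatively few attributes, the term-based approach yields 77,814 attributes, and without attribute selection the classifiers that can be applied are limited.

Our earlier study suggested that identifying articles in web content is a difficult task, one of extracting a very small number of articles from an enormous number of non-articles, and that an approach based on terms alone is insufficient 14). We therefore also tried an approach based on other attributes.

There is as yet no systematic research on how people decide that a document in a collection is an article, so the set of attributes for automatic identification cannot be derived from theory. People presumably combine many factors, however: not only the content of an article but also its layout, its structural characteristics, and where it was obtained. Under two conditions, that an attribute is empirically thought to be related to articles and that it can be obtained from a PDF file and fed to a classifier, we adopted the 19 attributes (rules) in 4 categories shown in Table 4.

Table 4: Attributes used for rule-based identification
Category | Attribute
Structure | File size
Structure | Number of pages
Structure | Page shape (portrait or landscape)
Source | Whether the URL is in ac.jp
Source | Whether the URL is from go.jp
Style | Whether sentences end in the plain "de aru" style or the polite "desu/masu" style
Style | Whether conversation appears (whether 「ね。」 or 「」 is used at the end of sentences)
Style | Whether hiragana appears (that is, whether the file is in a foreign language)
Keyword | 「研究」 (research)
Keyword | 「文献」 (literature)
Keyword | 「被験者」 (subject, participant)
Keyword | 「調査」「分析」「実験」 (survey, analysis, experiment)
Keyword | 「紀要」「研究報告」「研究ノート」 (bulletin, research report, research note)
Keyword | 「図」「表」 (figure, table)
Keyword | 「本稿」「本研究」「本論文」 (this paper, this study, this article)
Keyword | 「研究成果」「研究結果」 (research outcomes, research results)
Keyword | 「考察」「考慮」 (discussion, consideration)
Keyword | 「引用文献」「参照文献」「参考文献」 (cited literature, references)
Keyword | 「大学」「研究所」「研究センター」 (university, research institute, research center)

Below, automatic identification with these attributes is called the "rule-based approach" to distinguish it from the "term-based approach". Using such attributes requires prior knowledge about academic articles, so the approach lacks generality. Compared with the term-based approach, however, it works from a very small number of attributes, which makes it advantageous when machine resources and time are limited.

B. Classification methods and their implementations

To improve identification performance and to compare the characteristics of the methods, we considered methods from as wide a range as possible. For the term-based approach we used three methods: SVM and AdaBoost, which are highly regarded in text classification, and the Bayesian filter, which is widely used as a spam filter. For the rule-based approach we used naive Bayes, a decision tree (C4.5), and voting as a meta-classification method, in addition to SVM and AdaBoost.

For the rule-based approach the methods were run in Weka (Waikato Environment for Knowledge Analysis) 15). Weka is a data mining tool developed in Java, mainly by the machine learning group at the University of Waikato (New Zealand), and implements a large number of classifiers based on machine learning. We used Weka 3.4.7 as a rule and the development version, Weka 3.5.2, where necessary. Weka was not used for the term-based approach because, as noted above, the terms in the experimental collection form a high-dimensional attribute set that Weka could not handle. For the term-based approach a different implementation was therefore used for each method.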
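As an illustration of how some of the Table 4 attributes might be derived from a file's URL, extracted text, and page metadata, here is a minimal sketch. It is not the authors' code: the function name, the argument list, the plain-versus-polite heuristic, and the choice of three keyword groups are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Three of the eleven keyword groups in Table 4, chosen for illustration.
KEYWORD_GROUPS = {
    "kw_research": ["研究"],
    "kw_references": ["引用文献", "参照文献", "参考文献"],
    "kw_figure_table": ["図", "表"],
}
HIRAGANA = re.compile(r"[\u3041-\u3096]")


def rule_features(url, text, n_pages, size_bytes, page_width, page_height):
    """Build a dict of rule-based attributes for one PDF file (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    features = {
        # Structure
        "size_bytes": size_bytes,
        "n_pages": n_pages,
        "portrait": page_height > page_width,
        # Source
        "from_ac_jp": host.endswith(".ac.jp"),
        "from_go_jp": host.endswith(".go.jp"),
        # Style: a crude heuristic (assumption) comparing sentence endings.
        "plain_style": text.count("である。")
        > text.count("です。") + text.count("ます。"),
        "has_hiragana": bool(HIRAGANA.search(text)),
    }
    # Keywords: one boolean per group, true if any word of the group occurs.
    for name, words in KEYWORD_GROUPS.items():
        features[name] = any(word in text for word in words)
    return features
```

A feature dict of this kind can be written out as one row per file and loaded into a general-purpose learning tool; with only 19 columns, any of the classifiers listed in section B can handle it.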
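(1) Support vector machine

The support vector machine (hereafter SVM) is a two-class classifier proposed by Vladimir N. Vapnik 16). An SVM constructs a hyperplane that separates positive from negative examples and learns by maximizing the margin, that is, the minimum distance between the separating hyperplane and the examples closest to it (the support vectors). By mapping the data into a high-dimensional space with a kernel function, it performs linear separation in that space. Its characteristic features are high generalization performance and, through the kernel method, the ability to handle very high-dimensional data, and it has been applied widely in text classification, where the number of input attributes tends to be large. Together with AdaBoost, described below, it has been reported to perform well in text classification.

As SVM implementations we used SVMlight 6.01 17) for the term attributes and LIBSVM 2.81 18), called from Weka, for the rule-based attributes.

(2) AdaBoost

Boosting, like bagging, is a form of ensemble learning. It improves performance by learning how to combine and weight several weak learners whose individual accuracy is not very high. AdaBoost is an improvement on the early boosting methods. In the experiments of Schapire and Singer 19), a classifier that used AdaBoost to combine weak learners based on the presence or absence of a word outperformed classifiers based on the nearest neighbor method (k-NN) and on naive Bayes. AdaBoost closely resembles SVM in being based on margin theory, but it differs in that "different norms can correspond to different margins", "the amount of computation required differs", and "a different approach is used to search efficiently in high dimensions" 20).

As the boosting implementation for the term attributes we used AdaBoost.MH in BoosTexter 21). We chose BoosTexter because it is an AdaBoost implementation that can handle the term attributes, which number more than 700,000 with mecab, without dimensionality reduction. BoosTexter implements three variants of AdaBoost; we used AdaBoost.MH because it outperformed the other two in earlier research 22). For the rule-based attributes we used the Weka module. In both cases AdaBoost was combined with decision trees consisting of a single node (decision stumps) as weak learners, and identification was run with 10 and 100 boosting rounds.

(3) Naive Bayes / Bayesian filter

The naive Bayesian classifier is a simple system based on Bayes' probability model. The name "naive" comes from the assumption that the attributes are independent of each other and that no latent attribute has an influence. Because the model is simple, it is easy to extend theoretically and has a wide range of applications.

The Bayesian filter is an application of the naive Bayes method. At present it is used mainly in systems that detect spam among e-mail messages, and many systems have been developed, especially since the publication of "A plan for spam" 23). When a Bayesian filter is applied to spam, it learns a spam probability for each token appearing in non-spam and spam messages and uses those probabilities to detect spam among newly received messages. Spam can be identified from its content, but style, such as the way the subject line is written, is also said to be effective for identification.

As the Bayesian filter implementation we used bsfilter 24), which can handle Japanese. bsfilter also implements the well-known Paul Graham method, but we used the Gary Robinson-Fisher method 25), which is considered more accurate. bsfilter computes a spam probability for each file. In spam filtering, a file with a high probability is judged to be spam; in our experiments such a file is judged to be a "non-article".

(4) Decision tree (C4.5)

A decision tree is a highly readable classifier, and in recent years it has often been used as the weak learner in ensemble methods such as AdaBoost. Representative decision tree algorithms include CART 26), ID3 27), and C4.5 28). Here we used C4.5 as implemented in Weka (module name J48) to analyze the learned results. Figure 3 shows part of the decision tree actually learned from cross-validation training set No. 1.

Figure 3: Part of the generated decision tree

(5) Meta-classifier (Voting)

For rule-based identification we also used a meta-classifier that combines several classifiers. There was a positive reason for this, namely to improve performance on the difficult task of identifying academic articles by using predictions from as many viewpoints as possible, and a passive one, namely that rule-based identification has few attributes and little computation, so running several classifiers at once places no burden on machine resources.

In the experiments we used Weka's Vote module to combine the predictions of three classifiers: naive Bayes, AdaBoost (100), and the decision tree. SVM was left out because it failed to identify any articles.

C. Evaluation measures

We used precision (P), recall (R), and the F-value (F1, F2) for evaluation. Precision indicates how accurately articles were detected, and recall indicates how exhaustively they were identified. Because precision and recall generally trade off against each other, performance cannot be evaluated from precision alone or recall alone, so we used the F-value as an overall measure. The F-value shifts the weight between precision and recall according to the value of α. The usual choice is α = 0.5, which gives the harmonic mean of precision and recall. For the purpose of this system, however, which is the automatic identification of articles and comparable content, recall is considered more important. We therefore use both F1, with α = 0.5, and F2, with α = 0.33, which gives more weight to recall.

$$P = \frac{\text{number of files correctly identified by the system}}{\text{number of files the system identified as articles}}$$

$$R = \frac{\text{number of files correctly identified by the system}}{\text{total number of articles}}$$

$$F = \frac{1}{\alpha \cdot \dfrac{1}{P} + (1 - \alpha) \cdot \dfrac{1}{R}}$$

In the experiments the data were split into training and test sets for 4-fold cross-validation. Each measure was computed on each data set, and the values were then averaged (macro-averaging).

IV. Results

Corresponding to the two aims of this study, we ran two automatic identification experiments: one in which only files manually judged to be academic articles were learned and identified as articles, and one in which files judged to be quasi-articles were also treated as articles.

Table 5: Automatic identification of academic articles
Attributes | Method | Tokens | Precision | Recall | F1 | F2
Terms | SVM | mecab | 75.0% | 27.7% | .404 | .350
Terms | SVM | bigram | 72.7% | 27.4% | .398 | .346
Terms | AdaBoost(10) | mecab | 52.1% | 40.3% | .455 | .436
Terms | AdaBoost(100) | mecab | 54.9% | 40.7% | .467 | .445
Terms | AdaBoost(1000) | mecab | 60.5% | 38.3% | .469 | .437
Terms | Bayesian filter | mecab | 11.3% | 89.5% | .200 | .270
Terms | Bayesian filter | bigram | 4.7% | 92.3% | .090 | .128
Rules | Naive Bayes | | 23.3% | 89.3% | .370 | .459
Rules | Decision tree (J48) | | 43.0% | 23.6% | .305 | .278
Rules | AdaBoost(10) | | 42.2% | 33.1% | .371 | .357
Rules | AdaBoost(100) | | 46.7% | 39.3% | .427 | .415
Rules | SVM | | 0.0% | 0.0% | N/A | N/A
Rules | Vote | | 44.4% | 53.7% | .486 | .502

Table 6: Automatic identification of academic articles and quasi-articles
Attributes | Method | Tokens | Precision | Recall | F1 | F2
Terms | SVM | mecab | 74.2% | 48.2% | .584 | .546
Terms | SVM | bigram | 74.9% | 47.8% | .584 | .544
Terms | AdaBoost(10) | mecab | 58.0% | 43.2% | .495 | .472
Terms | AdaBoost(100) | mecab | 62.4% | 51.5% | .564 | .547
Terms | AdaBoost(1000) | mecab | 67.5% | 51.3% | .583 | .557
Terms | Bayesian filter | mecab | 11.3% | 89.5% | .200 | .270
Terms | Bayesian filter | bigram | 13.6% | 91.3% | .236 | .314
Rules | Naive Bayes | | 36.3% | 72.6% | .484 | .545
Rules | Decision tree (J48) | | 66.2% | 44.5% | .532 | .500
Rules | AdaBoost(10) | | 64.2% | 43.3% | .517 | .486
Rules | AdaBoost(100) | | 65.4% | 43.7% | .524 | .491
Rules | SVM | | 68.1% | 43.6% | .532 | .495
Rules | Vote | | 59.2% | 55.1% | .571 | .564

A. Results for academic articles

Table 5 shows the results of automatic identification for academic articles. No method exceeds 50% in both precision and recall at the same time, so overall the identification performance cannot be called sufficient. Viewed measure by measure, however, the characteristics of each method stand out clearly, which makes the results interesting.

In terms of precision and recall, SVM (mecab) detected articles with a high precision of 75%. The Bayesian filter (bigram) reached a high recall of 92%, but its precision was low. In terms of balance with precision, rule-based naive Bayes achieved high recall while retaining a certain level of precision. Apart from the meta-classifier, the Bayesian filter and naive Bayes were the only methods whose recall exceeded their precision.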
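As a check on the definitions in section III.C, the F-values in Table 5 can be approximately recomputed from the reported precision and recall. The sketch below is illustrative and its function names are hypothetical. The published figures are macro-averages over four folds, so recomputing from the averaged P and R agrees only to about the third decimal place. Returning `None` when P or R is zero mirrors the N/A entry in the table and is an assumption about how that case was handled.

```python
from typing import Optional, Sequence


def f_value(precision: float, recall: float, alpha: float) -> Optional[float]:
    """F = 1 / (alpha / P + (1 - alpha) / R); None if P or R is zero."""
    if precision == 0 or recall == 0:
        return None  # undefined; Table 5 reports N/A for this case
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


def macro_average(per_fold_scores: Sequence[float]) -> float:
    """Average a measure computed separately on each cross-validation fold."""
    return sum(per_fold_scores) / len(per_fold_scores)


# SVM (mecab) row of Table 5: P = 75.0%, R = 27.7%
f1_svm = f_value(0.750, 0.277, alpha=0.5)    # close to the reported .404
f2_svm = f_value(0.750, 0.277, alpha=0.33)   # close to the reported .350

# Vote row of Table 5: P = 44.4%, R = 53.7%
f1_vote = f_value(0.444, 0.537, alpha=0.5)   # close to the reported .486
f2_vote = f_value(0.444, 0.537, alpha=0.33)  # close to the reported .502
```

With α = 0.33 the recall term carries weight 0.67, which is why F2 falls below F1 for the high-precision SVM and rises above F1 for Vote, whose recall exceeds its precision.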

In terms of the F-value, the meta-classifier Vote scores highest. Apart from Vote, the highest F1 is .469 for AdaBoost (1000) with mecab in the term-based approach. For F2, which emphasizes recall, naive Bayes scores high.

B. Results for academic articles and quasi-articles

Table 6 shows the results of automatic identification for academic articles and quasi-articles together. Quasi-articles might be expected to be hard to distinguish automatically, yet including them improved performance overall compared with targeting academic articles alone. F1 is highest for SVM with mecab, whose accuracy improved further, and F2 is highest for the meta-classifier.

C. Discussion

In the rule-based identification of academic articles, SVM did not identify a single article correctly. This may be due to a problem in the choice of attributes, but it is notable given that the other methods obtained a reasonable number of correct answers. Pinning down the cause requires a detailed analysis, but the result is one indication of how difficult this task is.

The rule-based approach uses only 19 attributes, yet it performs as well as the term-based approach, or better.

In future work we will apply dimensionality reduction, using methods such as latent semantic indexing and principal component analysis, to the term-based approach to reduce the number of dimensions, and will integrate it with the rule-based approach to improve identification performance.

Notes and references

1) Tokizane Soichi. "Trends in open access" (in Japanese). Joho Kanri. Vol.47, No.9, 2004, p.616-624.
2) Budapest Open Access Initiative. 2002. <http://www.soros.org/openaccess/read.shtml>
3) <http://romeo.eprints.org/stats.php>
4) "Japan Science and Technology Agency" <http://www.jst.go.jp/>
5) "J-STAGE" <http://www.jstage.jst.go.jp/>
6) Takagi Gen. "Self-archiving for researchers" (in Japanese). Joho no Kagaku to Gijutsu. Vol.55, No.10, 2005, p.434.
7) Open Archives Initiative Protocol for Metadata Harvesting. <http://www.openarchives.org/OAI/openarchivesprotocol.html>
8) "CiteSeer.IST". <http://citeseer.ist.psu.edu/cs>
9) "Google Scholar Beta" <http://scholar.google.com/>
10) Google Scholar is a beta version, and it is not clear what will happen when the official service begins.
11) Diana Crane (translation supervised by Tsuda Yoshinari). Invisible Colleges: Diffusion of Knowledge in Scientific Communities (Japanese edition). Tokyo, Keibundo, 1979, 260p.
12) "Xpdf" <http://www.foolabs.com/xpdf/>
13) "MeCab: Yet Another Part-of-Speech and Morphological Analyzer" <http://chasen.org/~taku/software/mecab/>
14) Ishida Emi et al. "Automatic identification of academic articles in Japanese PDF files" (in Japanese). Proceedings of the 2005 joint research conference of the Japan Society of Library and Information Science and the Mita Society for Library and Information Science, Keio University, 2005-10-22/23, p.165-168.
15) "Weka" <http://www.cs.waikato.ac.nz/~ml/weka>
16) Vladimir N. Vapnik. The nature of statistical learning theory, 2nd ed. New York, Springer, 2000, xix, 314p.
17) "SVMlight" <http://svmlight.joachims.org/>
18) "LIBSVM -- A Library for Support Vector Machines" <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>
19) Schapire, R.E.; Singer, Y. "BoosTexter: A Boosting-based System for Text Categorization". Machine Learning. Vol.39, No.2/3, 2000, p.135-168.
20) Yoav Freund; Robert Schapire (translated by Abe Naoki). "Introduction to boosting" (in Japanese). Journal of the Japanese Society for Artificial Intelligence. Vol.14, No.5, 1999, p.771-780.
21) "BoosTexter" <http://www.research.att.com/sw/tools/BoosTexter/>
22) R. E. Schapire. "The boosting approach to machine learning: an overview". MSRI workshop on nonlinear estimation and classification. 2001, p.149-172.
23) Paul Graham (translation supervised by Kawai Shiro). Chapter 8, "A Plan for Spam". Hackers and Painters (Japanese edition). Tokyo, Ohmsha, 2005, p.127-135.
24) "bsfilter / bayesian spam filter". <http://bsfilter.org/>
25) Gary Robinson. A statistical approach to the spam problem. <http://www.linuxjournal.com/article/6467>
26) Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and regression trees. Belmont, Wadsworth International Group, 1984, 358p.
27) Quinlan, J. R. "Induction of decision trees". Machine Learning. Vol.1, No.1, 1986, p.81-106.
28) J. R. Quinlan (translation supervised by Furukawa Koichi). AI ni yoru data kaiseki [Data analysis with AI] (in Japanese). Tokyo, Toppan, 1995, 293p.
