JAIST Repository: Improving the Accuracy of Classification Systems Using Boosting Methods

Academic year: 2021

(1) JAIST Repository https://dspace.jaist.ac.jp/. Title: Improving the Accuracy of Classification Systems Using Boosting Methods (ブースティング手法を用いた分類システムの精度向上). Author(s): Ayakawa, Satoshi (綾川, 聡司). Issue Date: 2003-03. Type: Thesis or Dissertation. Text version: author. URL: http://hdl.handle.net/10119/454. Description: Supervisor: Ho Tu Bao, School of Knowledge Science, Master's thesis. Japan Advanced Institute of Science and Technology.

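(2) Master's Thesis. Improving the Accuracy of Classification Systems Using Boosting Methods. Supervisor: Professor Ho Tu Bao. Department of Knowledge System Science, School of Knowledge Science, Japan Advanced Institute of Science and Technology. 150003. Satoshi Ayakawa. Examination committee: Professor Ho Tu Bao (chair), Associate Professor Masato Ishizaki, Associate Professor Kenji Sato, Associate Professor Yukio Hayashi. February 2003. Copyright © 2003 by Satoshi Ayakawa.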

(3) Contents
1 Introduction
  1.1 Research background
  1.2 Research purpose
  1.3 Structure of this thesis
2 Classification systems
  2.1 What is a classification system
  2.2 Classification using decision trees (C4.5)
    2.2.1 Building a decision tree
    2.2.2 Tree pruning
  2.3 Evaluation methods for classification systems
3 Boosting
  3.1 What is boosting
  3.2 AdaBoost
  3.3 Characteristics of AdaBoost
4 Improving AdaBoost
  4.1 Proposed improvement
  4.2 Model 1 (AdaBoost_M1)
  4.3 Model 2 (AdaBoost_M2)
5 System overview
  5.1 Overview
  5.2 Classification system
  5.3 Evaluation system

(4) 6 Experiments
  6.1 Datasets
  6.2 Overview of the experiments
  6.3 Experimental procedure
  6.4 Results and discussion
    6.4.1 Comparison of C4.5 and BC4.5
    6.4.2 Comparison of BC4.5 and BC4.5_M1
    6.4.3 Comparison of BC4.5 and BC4.5_M2
7 Conclusion

(5) List of figures
2.1 Process of data classification
2.2 Decision tree
2.3 k-fold cross validation
3.1 AdaBoost algorithm
4.1 AdaBoost_M1 algorithm
4.2 AdaBoost_M2 algorithm
5.1 Simple flowchart of the classification system
5.2 Simple flowchart of the evaluation system
6.1 Comparison of Pruned C4.5 and Un-pruned BC4.5
6.2 Comparison of Pruned C4.5 and Pruned BC4.5
6.3 Comparison of Pruned BC4.5 and Un-pruned BC4.5
6.4 Comparison of Un-pruned BC4.5 and Un-pruned BC4.5_M1
6.5 Comparison of Un-pruned BC4.5 and Pruned BC4.5_M1
6.6 Comparison of Un-pruned BC4.5_M1 and Pruned BC4.5_M1
6.7 Comparison of Un-pruned BC4.5 and Un-pruned BC4.5_M2
6.8 Comparison of Un-pruned BC4.5 and Pruned BC4.5_M2
6.9 Comparison of Un-pruned BC4.5_M2 and Pruned BC4.5_M2

(6) List of tables
6.1 Characteristics of the datasets
6.2 Comparison of C4.5 and BC4.5
6.3 Comparison of BC4.5 and BC4.5_M1
6.4 Comparison of BC4.5 and BC4.5_M2

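(7) Chapter 1 Introduction
1.1 Research background and purpose
In recent years the volume of data has grown steadily with the development of information technology. The information industry therefore needs to extract useful information and knowledge from huge volumes of data (data mining). Data mining is a way of extracting, from huge data of various types, knowledge that can be used in scientific research, business management, production control, and market analysis. Because it extracts knowledge from data, it is also called KDD (Knowledge Discovery in Databases).
There are many data mining techniques. This study focuses on classification systems, which predict unknown data by classifying past data. A good classifier gives appropriate predictions for unseen cases, so research on improving the accuracy of classification systems is ongoing. One approach combines the classification rules (hypotheses) that a classification system extracts from data in order to build a more accurate rule. This approach is called ensemble learning and includes methods such as boosting and bagging.
This study focuses on improving the accuracy of classification systems with boosting. Among the many boosting methods, we use the AdaBoost algorithm, which can be applied to classification systems. AdaBoost has given good results in experiments with various learning systems, but it also has some problems. For example, it is known to be not very effective when the accuracy of the system it is applied to is too low, when the rules are too complex, or when the training data contains many erroneous examples.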

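(8) One line of current boosting research studies the relationship between the learning system and the boosting method in order to solve these problems. This thesis focuses on that line of research.
1.2 Purpose of this research
The main purpose of this research is to reduce the error rate of classification systems. To that end we set three goals: to survey boosting, which is one way of improving the accuracy of a classification system; to investigate the relationship between boosting and the classification system; and to propose improvements to boosting. Concretely, we applied AdaBoost to the two kinds of decision tree, of differing accuracy, that the decision-tree classification system C4.5 generates (one is the rule set called the "un-pruned tree"; the other is the "pruned tree", which adapts that rule set to more general data) and compared the resulting accuracy. We also devised two algorithms that modify AdaBoost, applied them to both kinds of C4.5 tree, and compared the results.
1.3 Structure of this thesis
This thesis consists of seven chapters. Chapter 2 describes classification systems, in particular the decision-tree system C4.5, and introduces methods for evaluating classification systems. Chapter 3 introduces boosting and describes the AdaBoost algorithm, one of the boosting methods. Chapter 4 proposes improvements to AdaBoost and presents the two modified algorithms. Chapter 5 outlines the system developed in this research. Chapter 6 describes the data used in the experiments, the experimental procedure, the results, and a discussion of the results. Chapter 7 gives the conclusions.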

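(9) Chapter 2 Classification systems
2.1 What is a classification system
A classification system builds classification rules from a learning dataset (training data) and uses them to assign predictions (class labels) to unknown data. Classification systems include decision trees, neural networks, and Nearest Neighbor.
Data classification has two steps. First, classification rules are built from the labels or concepts of the data. The rules are obtained by analyzing a database whose records are described by attributes. Each record also carries one predetermined attribute (the class attribute), its label. Such a collection of records is called a dataset, and its records are called examples or objects. A system that learns from training data in which every example already has a class label is said to perform supervised learning. In contrast, there are classification systems that perform unsupervised learning (clustering). In that setting the records have no predetermined class label, and the attributes and number of the classes are determined during learning. This thesis deals only with supervised learning.
A learned model is usually presented in the form of classification rules, for example a decision tree or a formula. Figure 2.1(a) gives an example. The learning dataset is customer credit information. The class attribute is set to "credit_rating", so the generated rules conclude either "fair" or "excellent". The generated rules are then used to predict unknown data.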

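(10) (a) Learning
Training data:
name      age     income  credit_rating
Sandy     <=30    low     fair
Bill      <=30    low     excellent
Courtney  31…40   high    excellent
Susan     >40     med     fair
Claire    >40     med     fair
Andre     31…40   high    excellent
Classification rule produced by the classification system: If age = "31…40" and income = high Then credit_rating = excellent.

(b) Classification
Test data:
name    age     income  credit_rating
Frank   >40     high    fair
Sylvia  <=30    low     fair
Anne    31…40   high    excellent
New data: (John, 31…40, high). Credit rating? excellent.

(a) Learning: the classification system analyzes the training data and generates classification rules.
(b) Classification: the test data is used to evaluate the classification rules. If the evaluation is satisfactory, satisfactory predictions can also be expected when the rules are applied to new data.
Figure 2.1: Process of data classification

The next step is shown in Figure 2.1(b). The generated classification rules are used to classify data. The operation on the left side of the figure evaluates the predictive accuracy of the rules. The evaluation of classification systems is explained in Section 2.3. If the rules were evaluated on the training data, the result would not be a real evaluation, because the rules are specialized to that data. The test data used here must therefore differ from the training data. If the evaluation of the classifier is satisfactory, the class labels of unknown data (data whose class label is not known) can be predicted with high accuracy.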

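(11) There are many classification methods. Here we introduce the decision-tree classification system (C4.5) used in this thesis.
2.2 Classification using decision trees (C4.5)
A decision tree is a tree-structured flowchart like the one in Figure 2.2. Each branching point (node) corresponds to an attribute of the data, each branch to a value of that attribute, and each leaf to a class label. The topmost node is the root. Figure 2.2 is a decision tree for the concept of whether a customer who wants a computer will buy one or not. To predict unknown data with this tree, classification starts at the root (here "age"), proceeds to the next node, and ends at a leaf, which gives the result. Unlike neural networks and similar models, a decision tree therefore yields "rules" that are very easy to understand. The rules can be written simply in an everyday form such as [IF ... THEN ...].
Decision-tree classification systems include ID3, CART, and C4.5. This thesis uses C4.5. Section 2.2.1 explains how it builds a decision tree. C4.5 also prunes the tree, because the tree it first builds has many branches and its rules are highly specialized to the training data (overfitting). Pruning is explained in Section 2.2.2.
[Figure: decision tree. Root node "age?" with branches <=30, 30…40, and >40; internal nodes "student?" (branches no/yes) and "credit_rating?" (branches excellent/fair); leaves labeled yes or no.]
Figure 2.2: Decision tree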

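(12) 2.2.1 Building a decision tree
Building a decision tree starts by deciding the node. If all class labels in the dataset under examination are the same, the node becomes a leaf. Otherwise an "attribute" is selected for the node. This is repeated, and learning ends when every path has reached a leaf.
C4.5 selects the attribute with an evaluation function called the gain ratio, explained below.
Consider a set S consisting of s examples. Suppose the class attribute of S has m classes ($C = \{C_1, \ldots, C_m\}$) and let $s_i$ be the number of examples in S that belong to class $C_i$. The amount of information needed to describe S is then
\[ I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i) \tag{2.1} \]
where $p_i$ is the probability that an example drawn from S belongs to $C_i$ ($p_i = s_i / s$). The base of the logarithm is 2 because the information is encoded in bits (for example, $\log_2(8)$ is 3 bits).
Next consider an attribute A with n values ($A = \{a_1, \ldots, a_n\}$). Attribute A divides S into n subsets $\{S_1, \ldots, S_n\}$, where $S_j$ contains the examples of S whose value of A is $a_j$. If A is chosen as the attribute of the node, these subsets branch off from the node for S and become new nodes. The next task is to compute the entropy obtained when A is the node attribute. Let $s_{ij}$ be the number of examples of class $C_i$ in subset $S_j$. The entropy is
\[ E(A) = \sum_{j=1}^{n} \frac{s_{1j} + \cdots + s_{mj}}{s}\, I(s_{1j}, \ldots, s_{mj}) \tag{2.2} \]
Here $(s_{1j} + \cdots + s_{mj})/s$ weights subset $S_j$ by its frequency within S, and $I(s_{1j}, \ldots, s_{mj})$ is the amount of information needed to describe $S_j$, computed in the same way as equation (2.1). From these two equations, the gain obtained by using A as the node attribute is
\[ \mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A) \tag{2.3} \]
ID3, the predecessor of C4.5, used this gain as its evaluation function.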
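Equations (2.1) to (2.3) can be sketched in Python as follows. This is a minimal illustration, not the C4.5 implementation: the function names `info`, `expected_info`, and `gain` are hypothetical, and the example data is the training table of Figure 2.1(a).

```python
from collections import Counter
from math import log2

def info(labels):
    """I(s_1, ..., s_m) of equation (2.1): entropy of the class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def expected_info(values, labels):
    """E(A) of equation (2.2): entropy of each subset S_j, weighted by |S_j| / |S|."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(s) / total * info(s) for s in subsets.values())

def gain(values, labels):
    """Gain(A) of equation (2.3)."""
    return info(labels) - expected_info(values, labels)

# Training data of Figure 2.1(a): attribute "age" and class attribute "credit_rating".
age = ["<=30", "<=30", "31...40", ">40", ">40", "31...40"]
credit = ["fair", "excellent", "excellent", "fair", "fair", "excellent"]

print(info(credit))       # information needed to describe the whole set
print(gain(age, credit))  # gain obtained by splitting on "age"
```

With three "fair" and three "excellent" examples the set needs 1 bit; splitting on age leaves only the "<=30" subset mixed, so the gain is 2/3 of a bit.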

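(13) Using the gain as the evaluation function gives good results, but it tends to favor attributes that take many values, for example a name in a list. If such an attribute is used as a node, it produces many subsets that each contain a single example. To avoid this, C4.5 applies a kind of normalization that adjusts the part of the gain that comes merely from having many values. For an example, consider the information content of a message that conveys not the class it belongs to but the outcome of the test itself. This is expressed as
\[ \mathrm{Split\_info}(A) = -\sum_{i=1}^{n} \frac{s_i}{s} \log_2 \frac{s_i}{s} \tag{2.4} \]
where $s_i$ here denotes the number of examples in subset $S_i$. This is the total information obtained by dividing S into n subsets, whereas the information gain (Gain) measures the part of that information that is relevant to classification. The gain ratio obtained when A is the node attribute therefore follows from equations (2.3) and (2.4):
\[ \mathrm{Gain\_Ratio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{Split\_info}(A)} \tag{2.5} \]
C4.5 computes the gain ratio of every attribute in the set in this way and selects the attribute with the highest value as the node attribute.
2.2.2 Tree pruning
The decision tree built this way branches very heavily in order to classify the training set. It is a tree specialized to that set alone (overfitting). The branches of the overly complex tree must be pruned so that it can also handle unknown data.
How, then, are branches pruned? There are two methods.
The first, called "pre-pruning", decides not to split a set of training examples any further. Its advantage is that no time is wasted building structure that would later be simplified away. A typical approach finds the best way to split the subset and evaluates it in terms of statistical significance, information gain, error reduction, and so on. If this evaluation falls below some threshold, the split is rejected and the tree for that subset becomes the most appropriate leaf. The difficulty is that the threshold is very hard to set: if it is too high the tree becomes too simple, and if it is too low the method has no effect at all.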

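(14) The second method, called "post-pruning", goes back and removes parts of the structure after it has been built. The tree is built as usual, and the parts that fit the training data too closely are then cut away. The computation spent on building parts of the tree that are later discarded is an inherent cost. Even so, pruning after the tree has been grown takes longer but is the more reliable method.
2.3 Evaluation methods for classification systems
Estimating the accuracy of a classification system is important for knowing how well its rules predict unknown data. For example, when a classification system is trained on past sales data to predict what customers will buy, evaluation tells us how far the predictions can be trusted for real customers. Evaluation is equally important when comparing the performance of classification systems. This section introduces two common evaluation methods, holdout and cross-validation.
Holdout first divides the given data at random into training data and test data. Typically two thirds of all the data is assigned to the training data and the rest to the test data. The classification system learns from the training data, and the resulting rules are evaluated on the test data. This gives a better evaluation than testing on the data used for training. The procedure is further repeated k times, and the average is taken as the evaluation of the classification system.
k-fold cross-validation divides the given data at random into k datasets of equal size, $S_1, S_2, \ldots, S_k$. Training and testing are then repeated k times as follows. In the i-th round, $S_i$ is used as the test data and the remainder as the training data. The classification system learns from the training data, and the resulting rules are evaluated on the test data. The average of the k results is the evaluation of the classification system. 10-fold cross-validation is usually recommended for evaluating classification systems. In this thesis the classification systems were evaluated by running this procedure 10 times.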

(15) [Figure: the data is split into $S_1, S_2, \ldots, S_k$; one part is the test data and the rest is the training data; the training data goes to the classification system, which produces rules; the rules are evaluated on the test data (evaluation 1).]
The figure shows the first round of k-fold cross-validation. The round is repeated k times, and the average of the evaluations is the evaluation of the classifier.
Figure 2.3: k-fold cross-validation
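The repeated k-fold procedure described in Section 2.3 can be sketched as follows. This is an illustrative outline under stated assumptions, not the thesis's CV system: `k_fold_indices` and `repeated_cv_error` are hypothetical names, and `fit` and `error_rate` are caller-supplied functions standing in for the classification system and its evaluation.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cv_error(data, fit, error_rate, k=10, repeats=10, seed=0):
    """Average test error over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        folds = k_fold_indices(len(data), k, rng)
        for i in range(k):
            # Fold i is the test data; all other folds form the training data.
            test = [data[j] for j in folds[i]]
            train = [data[j] for fold in folds[:i] + folds[i + 1:] for j in fold]
            model = fit(train)
            errors.append(error_rate(model, test))
    return sum(errors) / len(errors)
```

With k = 10 and repeats = 10 the average is taken over 100 train/test rounds, which matches the 10-times 10-fold evaluation used in this thesis.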

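(16) Chapter 3 Boosting
3.1 What is boosting
The basic idea of boosting is to combine rules into a better rule. If the rules are $\{h_1, h_2, \ldots, h_T\}$, the combined rule can be written as
\[ f(x) = \sum_{t=1}^{T} \alpha_t h_t(x) \]
where $\alpha_t$ is the coefficient of $h_t$. Both $\alpha_t$ and $h_t$ are computed during the boosting learning process. Algorithms of this kind are called ensemble learning algorithms. Other examples are Bagging and Arcing.
The roots of boosting lie in research on PAC learning. According to the account followed here, it began when Kearns and Valiant established that learning machines with accuracy better than random can be combined into a better learning machine. In 1989 Schapire devised the first boosting method with theoretical guarantees. This early boosting was tested in experiments on OCR based on neural networks, but it had various problems when applied to the learning algorithms used in practice.
In 1995 Freund and Schapire developed AdaBoost (Figure 3.1), which can be applied to practical learning algorithms. Its operation is explained in the following sections. In theory the algorithm can avoid overfitting as learning proceeds, but experiments have shown that it does not give good results when the training data contains noise.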

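(17) Recent boosting research includes the development of more effective algorithms and the study of how boosting relates to the performance of the underlying learning machine. This research focuses on analyzing the relationship between AdaBoost and the performance of the base learning machine. We also modified AdaBoost and analyzed the same relationship for the modified versions. The AdaBoost algorithm and the modifications are described below.
3.2 AdaBoost
AdaBoost (Adaptive Boosting) is an algorithm proposed by Freund and Schapire to solve the problems of early boosting. It is shown in Figure 3.1, and its basic operation is as follows. The input is training data $\{(x_1, y_1), \ldots, (x_m, y_m)\}$, where each $x_i$ belongs to an example space X and each $y_i$ belongs to a label set Y. The base learning machine (BaseLearner) is then called in T rounds ($t = 1, \ldots, T$). The key idea is to use resampling according to a probability distribution (or weights) defined over the training data. The weight that this distribution assigns to example i in round t is written $D_t(i)$. All weights are initially equal, but in each round the weights of incorrectly predicted examples are increased, so that BaseLearner concentrates on the harder examples.
The task of BaseLearner within AdaBoost is to find and output a weak hypothesis $h_t : X \to Y$ suited to the distribution $D_t$. The quality of the weak hypothesis $h_t$ is measured by its error probability under $D_t$:
\[ \epsilon_t = \sum_{i : h_t(x_i) \neq y_i} D_t(i) \]
Note that $D_t$ is the distribution over examples that was used when training BaseLearner.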

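(18) The algorithm AdaBoost
Input: m training examples $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ with labels $y_i \in Y = \{1, \ldots, k\}$
       base learning machine: BaseLearner
       number of trials T
Step 1: Initialize the weight of every example by $D_1(i) = 1/m$.
Step 2: For $t = 1, 2, \ldots, T$:
  1. Train BaseLearner using the probability distribution $D_t$.
  2. Obtain a hypothesis $h_t : X \to Y$.
  3. Compute the error probability of $h_t$ under $D_t$: $\epsilon_t = \sum_{i : h_t(x_i) \neq y_i} D_t(i)$.
     If $\epsilon_t > 1/2$, set $T = t - 1$ and return to the first step.
  4. Compute the importance of the hypothesis: $\beta_t = \epsilon_t / (1 - \epsilon_t)$.
  5. Update the weights: $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} \beta_t & \text{if } h_t(x_i) = y_i \\ 1 & \text{otherwise} \end{cases}$
     where $Z_t$ is a normalization constant.
Step 3: Final hypothesis: $h_{fin}(x) = \arg\max_{y \in Y} \sum_{t : h_t(x) = y} \log \frac{1}{\beta_t}$
Figure 3.1: AdaBoost algorithm

Each time a weak hypothesis $h_t$ is obtained, $\beta_t$ is set as shown in Figure 3.1. It indicates the importance of $h_t$. If $\epsilon_t \le 1/2$ then $\beta_t \le 1$, which can be assumed without loss of generality. The smaller $\epsilon_t$ is, the smaller $\beta_t$ becomes, and the larger the vote $\log(1/\beta_t)$ that $h_t$ receives. Next, the distribution $D_t$ is updated by the rule in Figure 3.1. The rule reduces the weights of the examples that the weak hypothesis classified correctly, so that after normalization the weights of the misclassified examples increase. In this way the weight concentrates on the difficult examples.
The final hypothesis $h_{fin}$ is obtained as a weighted majority vote over the T weak hypotheses $h_t$, using their importances $\beta_t$.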
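The steps of Figure 3.1 can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not from the thesis: the base learner is a hypothetical one-dimensional threshold rule (`stump_learner`) rather than C4.5; the weights are passed to the learner directly instead of being used for resampling; the loop simply stops when the weighted error exceeds 1/2; and an error of exactly 0 is clamped to a tiny value so that the vote stays finite.

```python
from math import log

def stump_learner(xs, ys, weights):
    """Hypothetical base learner: the threshold rule on 1-D inputs with the
    lowest weighted error (predict `lo` if x <= thr, otherwise `hi`)."""
    labels = sorted(set(ys))
    best_err, best_h = None, None
    for thr in sorted(set(xs)):
        for lo in labels:
            for hi in labels:
                h = lambda x, thr=thr, lo=lo, hi=hi: lo if x <= thr else hi
                err = sum(w for x, y, w in zip(xs, ys, weights) if h(x) != y)
                if best_err is None or err < best_err:
                    best_err, best_h = err, h
    return best_h

def adaboost(xs, ys, base_learner, T):
    m = len(xs)
    D = [1.0 / m] * m                                   # Step 1: uniform weights
    hypotheses, betas = [], []
    for _ in range(T):                                  # Step 2
        h = base_learner(xs, ys, D)                     # 2.1, 2.2: weak hypothesis
        eps = sum(d for x, y, d in zip(xs, ys, D) if h(x) != y)   # 2.3
        if eps > 0.5:
            break                                       # assumption: stop boosting here
        eps = max(eps, 1e-10)                           # assumption: avoid beta = 0
        beta = eps / (1.0 - eps)                        # 2.4
        hypotheses.append(h)
        betas.append(beta)
        D = [d * (beta if h(x) == y else 1.0) for x, y, d in zip(xs, ys, D)]  # 2.5
        Z = sum(D)
        D = [d / Z for d in D]

    def h_fin(x):                                       # Step 3: weighted majority vote
        votes = {}
        for h, beta in zip(hypotheses, betas):
            votes[h(x)] = votes.get(h(x), 0.0) + log(1.0 / beta)
        return max(votes, key=votes.get)
    return h_fin

# Toy data that no single threshold rule classifies perfectly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
h3 = adaboost(xs, ys, stump_learner, T=3)
print([h3(x) for x in xs])
```

On this toy data a single threshold rule misclassifies two of the six points, while the weighted vote of three rules reproduces all six labels.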

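(19) 3.3 Characteristics of AdaBoost
The algorithm has many properties that make it attractive in practice. It is simple, easy to program, and computationally efficient. Apart from the number of trials T it has no parameters to tune. It requires no detailed knowledge of BaseLearner and can be combined with any algorithm that finds hypotheses. Moreover, theoretical performance guarantees hold under mild conditions on the predictive accuracy of BaseLearner and the size of the training data. As a result, instead of having to find a learning algorithm that is highly accurate over the whole learning domain, it is enough to find one that is only slightly more accurate than random guessing. However, boosting is considered theoretically ineffective when the training data is insufficient, when the weak hypotheses are too complex, or when the predictive accuracy of BaseLearner is too low.
AdaBoost has been evaluated experimentally by many researchers. Freund and Schapire [2] applied AdaBoost to C4.5, a decision-tree classification system, and to an algorithm that uses single-node decision trees (decision stumps). They demonstrated that boosted decision stumps perform as well as C4.5 and that combining AdaBoost with C4.5 improves accuracy.
Quinlan [1] applied two ensemble learning algorithms, AdaBoost and Bagging, to C4.5. Both were effective for C4.5, and AdaBoost was more effective than Bagging. The number of boosting trials in that experiment was 10.
Boosting has also been reported to be effective in experiments with various other learning systems. There are, however, reports of cases where it is not. For example, in the OCR experiments of Freund and Schapire [5], when the data contains very many outliers, concentrating the weight on difficult examples can seriously harm learning. Several improved algorithms have been published to deal with such cases.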

(20) Boosting has also been put to a less conventional use, the detection of outliers. This exploits the fact that boosting concentrates learning on the examples that were predicted incorrectly.

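(21) Chapter 4 Improving AdaBoost
4.1 Proposed improvement
To analyze the behavior of AdaBoost in more depth, this thesis modifies the algorithm and compares the result with AdaBoost. Improvements to AdaBoost have been studied in various ways. For example, Quinlan [1] modified the way the final hypothesis is formed, with reasonably good results. As mentioned in the previous chapter, there are also versions that modify the weight update so that learning does not concentrate too heavily on misclassified examples.
This thesis modifies the weight update. AdaBoost updates the weights in the same way for every example. However, the labels $y_i \in Y = \{1, \ldots, k\}$ do not occur in equal proportions in the training data, and a label that occurs rarely can be regarded as more important to predict than a label that occurs often. For example, suppose there are 10 training examples with labels $y_i \in Y = \{1, -1\}$. If 7 of them have label 1, only 3 have label −1, and label −1 can be regarded as more important to predict than label 1. We modified the algorithm so that this consideration enters the weight update of AdaBoost, and compared how the performance of AdaBoost changes as a result.
Two modified algorithms were created. In this thesis Model 1 is called AdaBoost_M1 and Model 2 is called AdaBoost_M2. They are explained in the following sections.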

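(22) 4.2 Model 1 (AdaBoost_M1)
This algorithm adds the following steps to the AdaBoost learning process so that learning takes into account an importance derived from the proportion of each label in the training data.
First, during preparation for learning, the proportion of each label, $D(y_i)$ with $y \in Y = \{1, \ldots, k\}$, is computed.
Learning by BaseLearner, the error rate, and the importance of the hypothesis are computed in the same way as in ordinary AdaBoost.
Next, the weights are updated as shown in Figure 4.1. For each correctly predicted example, the weight is multiplied by $\beta_t$ and by the proportion $D(y_i)$ of that example's actual label. The weights of correctly predicted examples therefore decrease. Because of the multiplication by the label proportion, the weights of examples whose label has a small proportion become even smaller than those of examples whose label has a large proportion. The remaining steps are the same as in ordinary AdaBoost.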

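(23) The algorithm Modified AdaBoost 1 (AdaBoost_M1)
Input: m training examples $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ with labels $y_i \in Y = \{1, \ldots, k\}$
       proportion of each label in the training data: $D(y_i)$
       base learning machine: BaseLearner
       number of trials T
Step 1: Initialize the weight of every example by $D_1(i) = 1/m$.
Step 2: For $t = 1, 2, \ldots, T$:
  1. Train BaseLearner using the probability distribution $D_t$.
  2. Obtain a hypothesis $h_t : X \to Y$.
  3. Compute the error probability of $h_t$ under $D_t$: $\epsilon_t = \sum_{i : h_t(x_i) \neq y_i} D_t(i)$.
     If $\epsilon_t > 1/2$, set $T = t - 1$ and return to the first step.
  4. Compute the importance of the hypothesis: $\beta_t = \epsilon_t / (1 - \epsilon_t)$.
  5. Update the weights: $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} \beta_t \times D(y_i) & \text{if } h_t(x_i) = y_i \\ 1 & \text{otherwise} \end{cases}$
     where $Z_t$ is a normalization constant.
Step 3: Final hypothesis: $h_{fin}(x) = \arg\max_{y \in Y} \sum_{t : h_t(x) = y} \log \frac{1}{\beta_t}$
Figure 4.1: AdaBoost_M1 algorithm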
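The only step of Figure 4.1 that differs from Figure 3.1 is the weight update in Step 2.5. A minimal sketch of that step is given below; the function names `label_proportions` and `m1_update` are hypothetical, and the example uses the 7-versus-3 label split mentioned in Section 4.1.

```python
from collections import Counter

def label_proportions(ys):
    """D(y): proportion of each label in the training data."""
    m = len(ys)
    return {y: count / m for y, count in Counter(ys).items()}

def m1_update(D, ys, predictions, beta, proportions):
    """Step 2.5 of Figure 4.1: a correctly predicted example is multiplied by
    beta * D(y_i); a misclassified example keeps its weight. The result is
    then normalized so that the weights sum to 1."""
    new = [d * (beta * proportions[y] if p == y else 1.0)
           for d, y, p in zip(D, ys, predictions)]
    Z = sum(new)
    return [w / Z for w in new]

# 10 examples: 7 with label 1 and 3 with label -1; only the last one is misclassified.
ys = [1] * 7 + [-1] * 3
predictions = [1] * 7 + [-1, -1, 1]
D = [0.1] * 10
eps = sum(d for d, y, p in zip(D, ys, predictions) if p != y)
beta = eps / (1.0 - eps)
new_D = m1_update(D, ys, predictions, beta, label_proportions(ys))
print(new_D[0], new_D[7], new_D[9])
```

In this example the misclassified example ends up with the largest weight, and a correctly classified example with the rare label −1 ends up with a smaller weight than a correctly classified example with label 1, as Section 4.2 describes.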

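(24) 4.3 Model 2 (AdaBoost_M2)
This algorithm also brings an importance derived from the label proportions into the AdaBoost learning process, but it does so by updating the weights separately for each label. The idea is that, in each round of AdaBoost, after BaseLearner has finished learning, the training data is divided by label, the weights are updated per label, and the whole is then recombined.
As Figure 4.2 shows, the procedure is the same as ordinary AdaBoost up to item 2 of Step 2. Next, the per-label error probabilities $\epsilon(y)$, $y \in Y = \{1, \ldots, k\}$, are computed from the weights. If any of these error rates reaches 1/2, learning is terminated. The per-label importances of the hypothesis, $\beta(y)$, $y \in Y = \{1, \ldots, k\}$, are then computed. Next, the average of the per-label error rates is taken and used to compute the importance of the hypothesis as a whole. The weights are updated using the per-label importances, and finally the weights are normalized.
The final hypothesis is a majority vote that uses the per-hypothesis importances computed in item 6 of Step 2.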

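(25) The algorithm Modified AdaBoost 2 (AdaBoost_M2)
Input: m training examples $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ with labels $y_i \in Y = \{1, \ldots, k\}$
       base learning machine: BaseLearner
       number of trials T
Step 1: Initialize the weight of every example by $D_1(i) = 1/m$.
Step 2: For $t = 1, 2, \ldots, T$:
  1. Train BaseLearner using the probability distribution $D_t$.
  2. Obtain a hypothesis $h_t : X \to Y$.
  3. Compute the per-label error probabilities under $D_t$: $\epsilon(1), \ldots, \epsilon(k)$.
     If $\epsilon(y) > 1/2$ for some $y \in Y = \{1, \ldots, k\}$, set $T = t - 1$ and return to the first step.
  4. Compute the per-label importance of the hypothesis: $\beta(y) = \epsilon(y) / (1 - \epsilon(y))$ for $y \in Y = \{1, \ldots, k\}$.
  5. Compute the average of the per-label error rates: $\epsilon_t = \frac{1}{k} \sum_{Y=1}^{k} \epsilon(Y)$.
  6. Compute the importance of the hypothesis: $\beta_t = \epsilon_t / (1 - \epsilon_t)$.
  7. Update the weights per label: $D'_{t+1}(i) = D_t(i) \times \begin{cases} \beta(y_i) & \text{if } h_t(x_i) = y_i \\ 1 & \text{otherwise} \end{cases}$
  8. Normalize the weights.
Step 3: Final hypothesis: $h_{fin}(x) = \arg\max_{y \in Y} \sum_{t : h_t(x) = y} \log \frac{1}{\beta_t}$
Figure 4.2: AdaBoost_M2 algorithm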
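Items 3 to 8 of Step 2 in Figure 4.2 can be sketched as one function. The thesis does not spell out how the per-label error $\epsilon(y)$ is normalized, so this sketch assumes it is the weighted error among the examples whose true label is y, divided by the total weight of those examples. The names `per_label_errors` and `m2_round` are hypothetical, and returning `None` to signal "stop boosting" (including the degenerate case where every weight would become 0) is a choice made for this illustration.

```python
def per_label_errors(D, ys, predictions):
    """eps(y): weighted error among examples with true label y
    (assumed normalization: divided by the total weight of that label)."""
    total, wrong = {}, {}
    for d, y, p in zip(D, ys, predictions):
        total[y] = total.get(y, 0.0) + d
        wrong[y] = wrong.get(y, 0.0) + (d if p != y else 0.0)
    return {y: wrong[y] / total[y] for y in total}

def m2_round(D, ys, predictions):
    """One round of Figure 4.2, items 3 to 8. Returns (new weights, beta_t),
    or None when boosting should stop."""
    eps = per_label_errors(D, ys, predictions)                # item 3
    if any(e > 0.5 for e in eps.values()):
        return None
    beta_y = {y: e / (1.0 - e) for y, e in eps.items()}       # item 4
    eps_t = sum(eps.values()) / len(eps)                      # item 5
    beta_t = eps_t / (1.0 - eps_t)                            # item 6
    new = [d * (beta_y[y] if p == y else 1.0)                 # item 7
           for d, y, p in zip(D, ys, predictions)]
    Z = sum(new)
    if Z == 0.0:
        return None          # assumption: nothing left to reweight
    return [w / Z for w in new], beta_t                       # item 8

ys = [1] * 7 + [-1] * 3
D = [0.1] * 10

# One error per label: eps(1) = 1/7 and eps(-1) = 1/3, so the round proceeds.
ok = m2_round(D, ys, [-1, 1, 1, 1, 1, 1, 1, -1, -1, 1])

# Two of the three -1 examples wrong: eps(-1) = 2/3 > 1/2, so the round stops,
# although the overall weighted error is only 0.2.
stopped = m2_round(D, ys, [1] * 7 + [1, 1, -1])
print(ok, stopped)
```

Under this assumed definition the stopping test is stricter than in Figure 3.1: a hypothesis can have a low overall error and still be rejected because it does poorly on one label.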

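(26) Chapter 5 System overview
5.1 Overview
To investigate the relationship between boosting and the accuracy of classification systems, the following systems were built in this research.
- "un-pruned BC4.5": AdaBoost applied to the C4.5 tree before pruning
- "pruned BC4.5": AdaBoost applied to the C4.5 tree after pruning
- "un-pruned BC4.5_M1": AdaBoost_M1 applied to the C4.5 tree before pruning
- "pruned BC4.5_M1": AdaBoost_M1 applied to the C4.5 tree after pruning
- "un-pruned BC4.5_M2": AdaBoost_M2 applied to the C4.5 tree before pruning
- "pruned BC4.5_M2": AdaBoost_M2 applied to the C4.5 tree after pruning
- "CV": a 10-times 10-fold cross-validation system
In total seven systems were built: the six classification systems above and one system for evaluating them. The programs were written with "Microsoft Visual C++ 6.0". C4.5 itself is based on the source code of "C4.5 Release 8" from "http://www.cse.unsw.edu.au/~quinlan/". The classification systems and the CV system communicate through files.
Section 5.2 describes the classification systems and Section 5.3 the CV system in detail.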

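(27) 5.2 Classification system
The classification systems apply each of the three boosting algorithms to the C4.5 tree before and after pruning. Six systems were built, but they all work in essentially the same way. In the boosting algorithm, if the weighted error rate of a hypothesis exceeds 0.5 after BaseLearner has finished learning, learning is supposed to be redone. C4.5, however, cannot produce a different rule from the same dataset, so boosting is simply stopped at that point.
The changes made in order to apply boosting to C4.5 are as follows.
- Changes to C4.5
  - Made it compile with Visual C++.
  - Changed the input (explained below).
  - The parameters needed for C4.5 learning use their default values.
  - Removed all functions other than decision-tree construction and tree pruning.
  - Made learning without pruning possible.
  - Restricted the output to the error rate on the test data.
- Additions for boosting
  - Added the parameters and functions needed for boosting.
  - Made the number of trials configurable.
  - Learning ends as soon as the weighted error rate exceeds 0.5.
  - Outputs the test-data error of the boosted rule and the number of trials.
The flow up to the output of the error rate is as follows.
The classification system takes as input, in order, the file name, the kind of decision tree, and the number of boosting trials. With the file name "DF", an example input is
C:\BC45 DF 1 10
The kind of decision tree is 1 for the un-pruned tree and 2 for the pruned tree. After reading the input, the program reads two files, "DF.names" and "DF.data".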

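(28) Next the parameters are initialized.
The following steps are then repeated T times, where T is the number of trials.
1. Generate a decision tree with C4.5.
2. Compute the weighted error rate (if the error rate is 0.5 or more, learning ends).
3. Update the weights.
4. Normalize the weights.
After learning ends, the "DF.test" file is read. The first round of boosting is always the result of the classification system without AdaBoost (that is, plain C4.5), so its error rate is output. The rules generated over all trials are then combined into the final hypothesis, and its error rate is output. The output thus consists of three values: the error rates before and after applying AdaBoost, and the number of boosting trials.
Figure 5.1 shows the flowchart of the classification system.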

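(29) [Figure: flowchart with the following boxes. Read the "DF.names" and "DF.data" files. i = 1. Weighted learning with C4.5. Test ε ≤ 0.5 (T/F). Update the weights. Test T ≥ i (T/F). i = i + 1. Read the "DF.test" file. Voting: classify by weighted majority vote over the hypotheses obtained in each round and compute the error rate.]
Figure 5.1: Simple flowchart of the classification system
Here ε is the weighted error rate of C4.5 and T is the number of trials.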

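(30) 5.3 Evaluation system
The CV system communicates with the classification systems through files. As in the previous section, let the dataset name be "DF". The system first reads the "DF.all" file and then performs the following steps 10 times.
1. Shuffle the data.
2. Divide the data into 10 parts.
3. For each of the parts, do the following (10 times in total):
   i. Write that part to the "DF.test" file.
   ii. Write the remainder to the "DF.data" file.
   iii. Run learning with each of the six classification systems.
   iv. Receive the error rates output by the classification systems.
4. Compute the average of the error rates.
After these steps the error rates are averaged once more and the result is output. The system therefore computes the average error rate over a total of 100 classification runs per classification system.
Figure 5.2 shows a simple flowchart of the CV system.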

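(31) [Figure: flowchart with the following boxes. Read the DF.all file. i = 1. Shuffle the data. Split the data into 10 parts S[1] ... S[10]. j = 1. Write S[j] to "DF.test" and the remainder to "DF.data". Output of the error rates by the six classification systems. Test j ≥ 10, otherwise j = j + 1. Test i ≥ 10, otherwise i = i + 1. Output the average error rate.]

(32) Figure 5.2: Simple flowchart of the evaluation system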

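(33) Chapter 6 Experiments
6.1 Datasets
The purpose of the experiments is to measure the accuracy of the classification systems. A classification system shows different accuracy on different datasets, so the more datasets the better, and they should be varied. For evaluating a system, unusual datasets are unnecessary, and well-known datasets are preferred. A database on the Web satisfies all of these conditions. It is hosted on the UCI (University of California Irvine) site (http://www.ics.uci.edu/mlearn/MLRepository.html) and is maintained for specialists who evaluate learning systems.
This research used 30 datasets obtained from the UCI database, centered on the 27 datasets used in [1]. All systems in this research accept only files in C4.5 format, so the datasets were converted to that format. Only datasets without missing values were selected. The attribute values are either continuous or discrete, and the classes are discrete.
Table 6.1 shows the characteristics of each dataset.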

(34) Dataset         Examples  Classes  Continuous attributes  Discrete attributes
anneal               898        6        6                      32
audiology            226       24       70                       0
auto                 205        7       14                      12
breast               699        2       10                       0
chess               3500        2        0                      36
cleve                303        2        6                       7
diabetes             768        2        8                       0
DNA                 3186        3        0                     180
flare               1066        2        2                       8
glass                214        5       10                       0
heart                270        2       13                       0
hepatitis            155        2        6                      14
horse-colic          368        2        6                      16
hypothyroid         2800        2        7                      22
ionosphere           351        2       34                       0
iris                 150        3        4                       0
labor-neg             85        2        8                       8
letter             20000       26       16                       0
lymphography         148        4        3                      15
segment             2310        7       19                       0
shuttle            58000        7        9                       0
sick-euthyroid      3163        2        7                      18
sonar                208        2       59                       0
soybean-large        316       19        0                      35
splice              3190        3        0                      59
vehicle              840        4       18                       0
vote                 435        2        0                      16
waveform-21         5000        3       21                       0
wine                 178        3       13                       0
zoo                  101        7        0                      18
Table 6.1: Characteristics of the datasets

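(35) 6.2 Overview of the experiments
Three experiments were carried out in this research:
- comparison of C4.5 and BC4.5
- comparison of BC4.5 and BC4.5_M1
- comparison of BC4.5 and BC4.5_M2
Each experiment also compares the un-pruned tree with the pruned tree.
The number of boosting trials for the boosted classification systems was set to 10, as in [1]. In every experiment the comparison is based on the estimated error rate measured with the CV system.
6.3 Experimental procedure
The experimental procedure is as follows.
1. Select a dataset.
2. Load the dataset into the CV system.
3. Obtain the error rates of all classification systems.
This was done for every dataset. Once all error rates had been output, the three comparisons above were made.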

(36) 6.4 Results and discussion
6.4.1 Comparison of C4.5 and BC4.5
                 C4.5                 BC4.5 un-pruned   BC4.5 pruned
Dataset          un-pruned  pruned    T      err        T      err
                 err        err
anneal           0.0578     0.0793    10.0   0.0508     10.0   0.0569
audiology        0.2296     0.2213     9.9   0.2230     10.0   0.2551
auto             0.1762     0.1962    10.0   0.1640     10.0   0.1821
breast           0.0592     0.0557    10.0   0.0392     10.0   0.0388
chess            0.0056     0.0057    10.0   0.0056     10.0   0.0074
cleve            0.2356     0.2438    10.0   0.1990     10.0   0.2064
diabetes         0.2593     0.2600    10.0   0.2517     10.0   0.2486
DNA              0.0835     0.0761    10.0   0.0832     10.0   0.1744
flare            0.1884     0.1728     9.9   0.1769     10.0   0.1708
glass            0.3169     0.3169    10.0   0.2541     10.0   0.2509
heart            0.2478     0.2178    10.0   0.1996     10.0   0.1915
hepatitis        0.2133     0.2043    10.0   0.1796     10.0   0.1808
horse-colic      0.1742     0.1459    10.0   0.1601     10.0   0.1513
hypothyroid      0.0078     0.0073    10.0   0.0087     10.0   0.0080
ionosphere       0.1075     0.1059    10.0   0.0730     10.0   0.0718
iris             0.0500     0.0580     9.7   0.0553     10.0   0.0600
labor-neg        0.2180     0.2193     9.8   0.1570     10.0   0.1977
letter           0.1190     0.1197    10.0   0.0531     10.0   0.0552
lymphography     0.2601     0.2348    10.0   0.1992     10.0   0.2092
segment          0.0339     0.0333    10.0   0.0206     10.0   0.0215
shuttle          0.0002     0.0003     9.2   0.0002      9.9   0.0002
sick-euthyroid   0.0238     0.0213    10.0   0.0226     10.0   0.0219
sonar            0.2725     0.2592     9.9   0.1971      9.9   0.1816
soybean-large    0.0918     0.0845    10.0   0.0729     10.0   0.0594
splice           0.0789     0.0576     9.9   0.0677     10.0   0.0625
vehicle          0.2789     0.2726    10.0   0.2457     10.0   0.2434
vote             0.0545     0.0476    10.0   0.0517     10.0   0.0435
waveform-21      0.2376     0.2355    10.0   0.1721     10.0   0.1737
wine             0.0711     0.0732     8.6   0.0502      9.4   0.0440
zoo              0.0731     0.0813     2.3   0.0691      8.6   0.1180
ave              0.1409     0.1369     9.6   0.1168      9.9   0.1229
Table 6.2: Comparison of C4.5 and BC4.5
T for BC4.5 is the average number of boosting trials.

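(37) Table 6.2 shows that for C4.5, as expected from its design, the pruned tree gave better results than the un-pruned tree. The pruned C4.5 tree was therefore compared with the two kinds of BC4.5.
[Figure: scatter plot titled "Comparison of C4.5 and Unpruned BC4.5"; x axis: Unpruned BC4.5, y axis: C4.5; both axes run from 0.0 to 0.3.]
Figure 6.1: Comparison of Pruned C4.5 and Un-pruned BC4.5
Compared with the pruned C4.5 tree, Un-pruned BC4.5 reduced the error rate on 23 of the 30 datasets. The average error rate fell by about 2.0%. Figure 6.1 plots the error rate of Un-pruned BC4.5 on the x axis against the error rate of Pruned C4.5 on the y axis. Un-pruned BC4.5 had a worse error rate on 7 datasets, but the graph confirms that none of them worsened substantially.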

(38) [Figure: scatter plot titled "Comparison of C4.5 and Pruned BC4.5"; x axis: Pruned BC4.5, y axis: C4.5; both axes run from 0.0 to 0.3.]
Figure 6.2: Comparison of Pruned C4.5 and Pruned BC4.5
Compared with the pruned C4.5 tree, Pruned BC4.5 reduced the error rate on 24 of the 30 datasets. The average error rate fell by about 1.5%. Figure 6.2 plots the error rate of Pruned BC4.5 on the x axis against the error rate of Pruned C4.5 on the y axis. Pruned BC4.5 had a worse error rate on 6 datasets, and the graph shows that the error rate increased substantially on three of them (audiology, DNA, zoo).

(39) [Figure: scatter plot titled "Comparison of Unpruned BC4.5 and Pruned BC4.5"; x axis: Pruned BC4.5, y axis: Unpruned BC4.5; both axes run from 0.0 to 0.3.]
Figure 6.3: Comparison of Un-pruned and Pruned BC4.5
Both Un-pruned and Pruned BC4.5 gave better results than C4.5. The reduction in average error rate shows that AdaBoost gives better results when applied to the un-pruned C4.5 tree than to the pruned tree. Figure 6.3 shows that Pruned BC4.5 is better than Un-pruned BC4.5 on many datasets, but on a few datasets its error rate is substantially higher. Overall, AdaBoost is therefore more effective when applied to the un-pruned C4.5 tree than to the pruned tree.

(40) 6.4.2 Comparison of BC4.5 and BC4.5_M1
The previous section showed that, for BC4.5, the un-pruned version gave better results than the pruned version, so BC4.5_M1 is compared with Un-pruned BC4.5.
                 BC4.5            BC4.5_M1
                 un-pruned        un-pruned        pruned
Dataset          T      err       T      err       T      err
anneal           10.0   0.0508     6.7   0.0597    10.0   0.1173
audiology         9.9   0.2230     1.1   0.2305     7.4   0.2612
auto             10.0   0.1640    10.0   0.1912     9.8   0.2468
breast           10.0   0.0392     9.0   0.0416    10.0   0.0392
chess            10.0   0.0056    10.0   0.0056    10.0   0.0057
cleve            10.0   0.1990    10.0   0.1983    10.0   0.1962
diabetes         10.0   0.2517    10.0   0.2530    10.0   0.2462
DNA              10.0   0.0832     5.7   0.0838    10.0   0.1579
flare             9.9   0.1769     1.0   0.1884    10.0   0.1708
glass            10.0   0.2541     9.3   0.2803    10.0   0.2783
heart            10.0   0.1996    10.0   0.2004    10.0   0.1848
hepatitis        10.0   0.1796     5.4   0.2027    10.0   0.2053
horse-colic      10.0   0.1601     9.9   0.1601    10.0   0.1434
hypothyroid      10.0   0.0087     4.6   0.0083    10.0   0.0165
ionosphere       10.0   0.0730    10.0   0.0778    10.0   0.0794
iris              9.7   0.0553     9.9   0.0573    10.0   0.0633
labor-neg         9.8   0.1570     9.9   0.1720    10.0   0.2280
letter           10.0   0.0531    10.0   0.0635    10.0   0.0639
lymphography     10.0   0.1992     7.7   0.2297    10.0   0.2373
segment          10.0   0.0206    10.0   0.0239    10.0   0.0248
shuttle           9.2   0.0002     9.7   0.0003    10.0   0.0010
sick-euthyroid   10.0   0.0226     3.2   0.0232    10.0   0.0329
sonar             9.9   0.1971     9.9   0.2005    10.0   0.1937
soybean-large    10.0   0.0729     8.9   0.0807    10.0   0.0791
splice            9.9   0.0677     9.3   0.0750    10.0   0.0810
vehicle          10.0   0.2457    10.0   0.2461    10.0   0.2420
vote             10.0   0.0517    10.0   0.0545    10.0   0.0440
waveform-21      10.0   0.1721    10.0   0.1783    10.0   0.1764
wine              8.6   0.0502     9.8   0.0489     9.8   0.0512
zoo               2.3   0.0691     2.0   0.0788     5.1   0.1447
ave               9.6   0.1168     8.1   0.1238     9.7   0.1337

(41) Table 6.3: Comparison of BC4.5 and BC4.5_M1
[Figure: scatter plot titled "Comparison of Unpruned BC4.5 and Unpruned BC4.5_M1"; x axis: Unpruned BC4.5_M1, y axis: Unpruned BC4.5; both axes run from 0.0 to 0.3.]
Figure 6.4: Comparison of Un-pruned BC4.5 and Un-pruned BC4.5_M1
Un-pruned BC4.5_M1 has a better error rate than C4.5. Compared with Un-pruned BC4.5, however, its error rate is higher on 27 of the 30 datasets, and the average error rate is higher by about 0.8%. Figure 6.4 plots the error rate of Un-pruned BC4.5_M1 on the x axis against the error rate of Un-pruned BC4.5 on the y axis. The graph shows no particularly large deterioration for Un-pruned BC4.5_M1, but the error rate increased on almost every dataset.

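(42) [Figure: scatter plot titled "Comparison of Unpruned BC4.5 and Pruned BC4.5_M1"; x axis: Pruned BC4.5_M1, y axis: Unpruned BC4.5; both axes run from 0.0 to 0.3.]
Figure 6.5: Comparison of Un-pruned BC4.5 and Pruned BC4.5_M1
Pruned BC4.5_M1 also has a better error rate than C4.5. Compared with Un-pruned BC4.5, it reduced the error rate on 7 of the 30 datasets, which is more datasets than Un-pruned BC4.5_M1 improved. On average, however, its error rate is about 1.7% higher than that of Un-pruned BC4.5. Figure 6.5 plots the error rate of Pruned BC4.5_M1 on the x axis against the error rate of Un-pruned BC4.5 on the y axis. The graph shows several datasets on which the error rate of Pruned BC4.5_M1 worsened considerably. Overall the result is not very good, although on 4 datasets Pruned BC4.5_M1 had the lowest error rate of the six classification systems compared.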

(43) [Figure: scatter plot titled "Comparison of Unpruned BC4.5_M1 and Pruned BC4.5_M1"; x axis: Pruned BC4.5_M1, y axis: Unpruned BC4.5_M1; both axes run from 0.0 to 0.3.]
Figure 6.6: Comparison of Un-pruned BC4.5_M1 and Pruned BC4.5_M1
Both Un-pruned and Pruned BC4.5_M1 gave better results than C4.5, but both were worse than BC4.5. Figure 6.6 shows that Pruned BC4.5_M1 is better than Un-pruned BC4.5_M1 on many datasets, but on a few datasets its error rate is substantially higher. Overall, AdaBoost_M1 is therefore more effective when applied to the un-pruned C4.5 tree than to the pruned tree.

(44) 6.4.3 Comparison of BC4.5 and BC4.5_M2
As in the previous section, BC4.5_M2 is compared with Un-pruned BC4.5.
                 BC4.5            BC4.5_M2
                 un-pruned        un-pruned        pruned
Dataset          T      err       T      err       T      err
anneal           10.0   0.0508     0.9   0.0578     0.9   0.0793
audiology         9.9   0.2230     0.0   0.2296     0.0   0.2213
auto             10.0   0.1640     1.0   0.1762     1.0   0.1962
breast           10.0   0.0392    10.0   0.0382    10.0   0.0373
chess            10.0   0.0056    10.0   0.0058    10.0   0.0170
cleve            10.0   0.1990    10.0   0.2428    10.0   0.2552
diabetes         10.0   0.2517     1.8   0.2591     1.9   0.2602
DNA              10.0   0.0832    10.0   0.0742    10.0   0.0703
flare             9.9   0.1769     0.0   0.1884     0.0   0.1728
glass            10.0   0.2541     1.0   0.3169     1.0   0.3169
heart            10.0   0.1996    10.0   0.2100    10.0   0.1822
hepatitis        10.0   0.1796     7.9   0.1853     2.0   0.2003
horse-colic      10.0   0.1601    10.0   0.1573    10.0   0.1540
hypothyroid      10.0   0.0087    10.0   0.0138    10.0   0.0097
ionosphere       10.0   0.0730    10.0   0.0813    10.0   0.0792
iris              9.7   0.0553    10.0   0.0500    10.0   0.0567
labor-neg         9.8   0.1570     5.4   0.1720     1.7   0.2283
letter           10.0   0.0531     5.6   0.1106     9.1   0.1098
lymphography     10.0   0.1992     1.2   0.2601     0.9   0.2348
segment          10.0   0.0206    10.0   0.0290     9.9   0.0300
shuttle           9.2   0.0002     1.0   0.0002     1.2   0.0003
sick-euthyroid   10.0   0.0226     9.9   0.0298    10.0   0.0256
sonar             9.9   0.1971     6.3   0.2480     7.7   0.2385
soybean-large    10.0   0.0729     0.8   0.0918     0.8   0.0845
splice            9.9   0.0677    10.0   0.0541    10.0   0.0639
vehicle          10.0   0.2457     1.1   0.2789     1.2   0.2726
vote             10.0   0.0517    10.0   0.0446    10.0   0.0437
waveform-21      10.0   0.1721    10.0   0.1857    10.0   0.1858
wine              8.6   0.0502    10.0   0.0591    10.0   0.0617
zoo               2.3   0.0691     1.0   0.0731     1.0   0.0813
ave               9.6   0.1168     6.2   0.1308     6.0   0.1323
Table 6.4: Comparison of BC4.5 and BC4.5_M2

(45) [Figure: scatter plot titled "Comparison of Unpruned BC4.5 and Unpruned BC4.5_M2"; x axis: Unpruned BC4.5_M2, y axis: Unpruned BC4.5; both axes run from 0.0 to 0.3.]
Figure 6.7: Comparison of Un-pruned BC4.5 and Un-pruned BC4.5_M2
Un-pruned BC4.5_M2 has a better error rate than C4.5. Compared with Un-pruned BC4.5, it reduced the error rate on 4 of the 30 datasets, but its average error rate is about 1.4% higher. On many datasets boosting also stopped before the full number of trials, so learning did not continue. Figure 6.7 plots the error rate of Un-pruned BC4.5_M2 on the x axis against the error rate of Un-pruned BC4.5 on the y axis. The graph shows large deteriorations in error rate for Un-pruned BC4.5_M2, and the error rate is worse on almost every dataset.

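(46) [Figure: scatter plot titled "Comparison of Unpruned BC4.5 and Pruned BC4.5_M2"; x axis: Pruned BC4.5_M2, y axis: Unpruned BC4.5; both axes run from 0.0 to 0.3.]
Figure 6.8: Comparison of Un-pruned BC4.5 and Pruned BC4.5_M2
Pruned BC4.5_M2 has a better error rate than C4.5. Compared with Un-pruned BC4.5, it reduced the error rate on 6 of the 30 datasets, but its average error rate is about 1.6% higher. On 4 of the datasets where the error rate fell, however, it had the lowest error rate of all the classification systems used in the experiments. In addition, on 2 datasets where AdaBoost had increased the error rate, it reduced the error rate relative to C4.5. Again, on many datasets boosting stopped before the full number of trials, so learning did not continue. Figure 6.8 plots the error rate of Pruned BC4.5_M2 on the x axis against the error rate of Un-pruned BC4.5 on the y axis. The graph shows large deteriorations in error rate for Pruned BC4.5_M2, and the error rate is worse on almost every dataset.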

(47) [Figure: scatter plot titled "Comparison of Unpruned BC4.5_M2 and Pruned BC4.5_M2"; x axis: Pruned BC4.5_M2, y axis: Unpruned BC4.5_M2; both axes run from 0.0 to 0.3.]
Figure 6.9: Comparison of Un-pruned BC4.5_M2 and Pruned BC4.5_M2
AdaBoost_M2 was able to reduce the error rate of C4.5, but both versions were worse than AdaBoost. The system applied to the un-pruned tree had the better average error rate, although Figure 6.9 shows little difference between the two systems. With both systems there were many datasets on which boosting stopped early and learning did not progress. The algorithm needs further improvement so that learning can continue.

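(48) Chapter 7 Conclusion
The experiments confirmed that AdaBoost and the modified versions of AdaBoost can improve the accuracy of C4.5. They also confirmed that applying boosting to the "Un-pruned tree" of C4.5 reduces the error rate more than applying it to the "Pruned tree". When boosting was applied to the "Un-pruned tree", the error rate increased on some datasets but accuracy never dropped drastically. When it was applied to the "Pruned tree", the error rate increased drastically on several datasets. The "Pruned tree" is a more accurate hypothesis than the "Un-pruned tree". From these observations, AdaBoost appears not to be effective for hypotheses whose accuracy is already too high. The experimental results do not mean simply that it is ineffective for hypotheses with a low error rate. Rather, boosting is not very effective when the underlying learning algorithm already performs well.
This research also modified the AdaBoost algorithm, but the modified algorithms did not improve the accuracy of C4.5 more than AdaBoost did. The system that applies AdaBoost_M2 to the "Pruned tree" had the worst error reduction overall, yet on some datasets it achieved the largest reduction in error rate. It was also effective on some datasets for which AdaBoost is not. Experiments with other learning algorithms and further improvements are needed, but this algorithm may prove effective for well-performing algorithms, so-called strong learning algorithms.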

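(49) Finally, regarding future work: the experiments confirmed that the effectiveness of boosting changes with the underlying learning algorithm, and that the reduction in error varies widely between datasets. With the datasets used here, however, no clear difference in effectiveness could be attributed to the number of examples, the number of attributes, or the number of labels. To improve boosting effectively, it is therefore also necessary to investigate how its effectiveness changes with the characteristics of the dataset.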

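(50) Acknowledgments
I express my deep gratitude to Professor Ho Tu Bao for his consistent guidance and advice on every aspect of this research, from its start to its completion. I sincerely thank Associate Professor Masato Ishizaki for his advice on the direction of the research. I sincerely thank Research Associate Nguyen Dung Trong for his warm comments and careful guidance, from elementary questions to research methods.
Finally, I thank everyone in the Ho-Ishizaki laboratory for their valuable comments at every stage of this research.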

(51) References
[1] Quinlan J.R., 1998, Bagging, boosting and C4.5.
[2] Robert E. Schapire, 1999, A Brief Introduction to Boosting.
[3] Michael J. Kearns and Umesh V. Vazirani, 1994, An Introduction to Computational Learning Theory.
[4] Yoav Freund, Robert E. Schapire, 1995, A decision-theoretic generalization of on-line learning and an application to boosting.
[5] Yoav Freund, Robert E. Schapire, 1996, Experiments with a new boosting algorithm.
[6] Robert E. Schapire, 1990, The strength of weak learnability. Machine Learning.
[7] Yoav Freund, Robert Schapire (trans. Naoki Abe), 1999, ブースティング入門 (A Short Introduction to Boosting).
[8] Michael J.A. Berry, Gordon Linoff (trans. SAS Institute, Atsushi Ehara, Eisaku Sato), データマイニング手法 (Data Mining Techniques), Kaibundo, 1999.

(52) [9] J.R. Quinlan (supervising trans. Koichi Furukawa), AI によるデータ解析 (Programs for machine learning), Toppan, 1995.
[10] H.M. Deitel, P.J. Deitel (trans. Ryuichi Kojima), Computer Science Textbook C 言語プログラミング (C Programming), Prentice Hall, 1998.
[11] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor, Morgan Kaufmann Publishers, 2000.
