• 検索結果がありません。

並列分散処理による2段階ランキングアルゴリズム

N/A
N/A
Protected

Academic year: 2021

シェア "並列分散処理による2段階ランキングアルゴリズム"

Copied!
7
0
0

読み込み中.... (全文を見る)

全文

(1)Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. ฒྻ෼ࢄॲཧʹΑΔ 2 ஈ֊ϥϯΩϯάΞϧΰϦζϜ ෢Ҫɹྑଠ1,a). ৽ඒɹྱ඙1,b). ֓ཁɿ΢ΣϒάϥϑΛ࠶ߏங͢Δ͜ͱͰ, ϥϯΩϯάΞϧΰϦζϜΛ࣮ߦ͢ΔࡍʹඞཁͱͳΔղੳ࣌ؒΛ ୹ॖ͢Δ͜ͱ͕ՄೳͰ͋Δ. ϥϯΩϯάΞϧΰϦζϜͰղੳ͢Δࡍ, ฒྻ෼ࢄԽॲཧʹదԠͰ͖ΔΑ͏ʹ ͢Δ͜ͱͰ, ߋͳΔղੳ࣌ؒͷ୹ॖ͕‫ظ‬଴Ͱ͖Δ. ຊ‫ڀݚ‬͸, ΢ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ෼ࢄ ॲཧՄೳͳάϥϑ΁࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ͏ࡍʹฒྻ෼ࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁ ͳ࣌ؒΛ୹ॖ͢Δํ๏ΛఏҊ͢Δ. ຊఏҊํ๏͸, ΢ΣϒάϥϑͷϊʔυΛ SCAN ʹΑΓ Cluster ͱ Hub, Outlier ʹ෼ผ͠,Cluster ͱ Hub Ͱߏ੒͞ΕΔάϥϑͱ Cluster ಺ͷϊʔυͰߏ੒͞ΕΔάϥϑʹ΢Σϒ άϥϑΛ࠶ߏங͢Δ.Cluster ಺ͷάϥϑΛϥϯΩϯάΞϧΰϦζϜͰղੳ͢Δࡍ, ͓‫͍ޓ‬ͷ Cluster ͸ؔ܎ ੑ͕ແ͍ͨΊฒྻ෼ࢄॲཧ͕ՄೳʹͳΓ, ϥϯΩϯάΞϧΰϦζϜͷղੳ࣌ؒΛ୹ॖ͢Δ͜ͱ͕ग़དྷΔ.. 1. ͸͡Ίʹ. Cluster Λ࡞੒͢Δ.Cluster ʹଐ͞ͳ͔ͬͨϊʔυΛ 2 ͭ Ҏ্ͷ Cluster ͱͭͳ͕Γ͕͋ΔϊʔυΛ Hub ͱ͠,1 ͭͷ. ΢Σϒάϥϑͱ͸΢ΣϒΛάϥϑߏ଄Խͨ͠΋ͷͰ͋Γ,. Έͷ Cluster ͱͭͳ͕Γ͕͋ΔϊʔυΛ Outlier ͱ෼ผ͢. ϊʔυΛ΢Σϒϖʔδ, ΤοδΛϦϯΫͰද‫ͨ͠ݱ‬΋ͷͰ. Δ. ͜ΕʹΑΓ, ΢ΣϒάϥϑΛ Cluster ͱ Hub Ͱߏ੒͞Ε. ͋Δ. ΢ΣϒάϥϑΛղੳ͢Δ͜ͱͰ஌‫ݟ‬Λಋग़͢Δ͜ͱ. Δάϥϑͱ Cluster ಺ͷϊʔυͰߏ੒͞ΕΔάϥϑʹ΢Σ. ͕੝ΜʹߦΘΕ͓ͯΓ, ओͳղੳ๏ͱͯ͠ϥϯΩϯάΞϧ. ϒάϥϑΛ࠶ߏங͢Δ.Cluster ಺ͷάϥϑΛ PageRank Ͱ. ΰϦζϜͰ͋Δ PageRank[1] ΍ HITS[2], ίϛϡχςΟந. ղੳ͢Δࡍ, ͓‫͍ޓ‬ͷ Cluster ͸ؔ܎ੑ͕ແ͍ͨΊฒྻ෼. ग़ͱͯ͠ trawling[3] ΍ DBG[4] ͕͋Δ. ۙ೥Ͱ͸, ϊʔυ. ࢄॲཧ͕ՄೳʹͳΔ. ߋʹ, ΫϥελͰ੾Γ෼͚Δ͜ͱͰ 1. ਺ͱΤοδ਺͕਺ԯ‫ͳʹݸ‬Δάϥϑ͕ଘࡏ͍ͯ͠Δ. ྫ͑. ճ͋ͨΓͷ PageRank ͷղੳର৅ͱͳΔϊʔυͱΤοδ͕. ͹ Netcraft ͸ 2014 ೥ 10 ݄ͷ࣌఺Ͱ, શੈքͰগͳ͘ͱ΋. ‫ݮ‬গ͢Δ. Ҏ্ΑΓ,PageRank ͷղੳ࣌ؒΛ୹ॖ͢Δ͜ͱ. 10 ԯϖʔδҎ্ଘࡏ͢Δͱใࠂ͍ͯ͠Δ [5]. ΢Σϒάϥ. ͕ग़དྷΔ.. ϑ͸‫ۃ‬Ίͯେ‫ن‬໛ʹͳ͓ͬͯΓ, ղੳʹ͔͔Δ࣌ؒ͸๲େ ʹͳΔ. ྫ͑͹, ϥϯΩϯάΞϧΰϦζϜͰ͋Δ PageRank ͷ৔߹,N ‫ݸ‬ϊʔυ͕͋Δ࣌, ‫ྔࢉܭ‬͸ O(N 2 ) ͱͳΔ. ͦͷ. 2. ઌߦ‫ڀݚ‬ େ‫ن‬໛ͳ΢ΣϒάϥϑΛ࠶ߏங͢Δ͜ͱ͸, ΢Σϒάϥ. ͨΊ, େ‫ن‬໛ͳάϥϑʹରͯ͠ղੳ࣌ؒΛ୹ॖ͢Δํ๏͕. ϑΛ༻͍Δղੳํ๏ʹ͓͍ͯॏཁͳٕज़Ͱ͋Δ. ΢Σϒά. ඞཁͰ͋Δ. ղੳ࣌ؒΛ୹ॖ͢Δํ๏ [6],[7],[8],[9],[10] ͸ෳ. ϥϑͷ࠶ߏங͸େ͖͘෼͚Δͱ, ූ߸ԽʹΑΔํ๏ͱߏ଄. ਺͋Δ͕,1 ͭͱͯ͠΢ΣϒάϥϑΛ࠶ߏங͠ղੳ࣌ؒΛ. վྑʹΑΔํ๏ͷ 2 छྨʹ෼͔ΕΔ.. ୹ॖ͢Δํ๏͕‫͛ڍ‬ΒΕΔ. ຊ‫Ͱڀݚ‬͸, ϥϯΩϯάΞϧ ΰϦζϜͰ͋Δ PageRank ͷղੳ࣌ؒͷߋͳΔ୹ॖΛ໨ඪ. 2.1 ූ߸Խ. ͱ͠,PageRank ͷղੳΛฒྻ෼ࢄԽग़དྷΔΑ͏ͳ‫ܗ‬ঢ়ʹά. ූ߸ԽʹΑΔํ๏͸‫ط‬ଘͷσʔλѹॖΛ࢖༻͢Δ͔, ΢Σ. ϥϑΛ࠶ߏங, ղੳʹඞཁͳ࣌ؒΛ୹ॖ͢Δํ๏ΛఏҊ͢. ϒάϥϑͷಛ௃Λ༻͍Δํ๏Ͱ͋Δ.P. Boldi Β͸΢Σϒά. Δ. ຊํ๏Ͱ͸ Xu Β͕ఏҊͨ͠ SCAN[11] Λ༻͍ͯάϥϑ. ϥϑͷಛ௃ͱ WebGraph[12] ͱ͍͏໊ͷํ๏ΛఏҊͨ͠.. Λ࠶ߏங͢Δ.SCAN ͸ߏ଄తྨࣅ౓ʹ‫͍ͨͮج‬άϥϑΫ. ΢Σϒάϥϑͷಛ௃ͱͯ͠, ‫ॴہ‬ੑͱྨࣅੑ, ࿈ଓੑΛ‫͛ڍ‬. ϥελϦϯάํ๏ͷ 1 ͭͰ͋Δ. ߏ଄ྨࣅ౓ͷԼ‫ݶ‬஋Λࣔ. ͍ͯΔ.. ͢ᮢ஋ ε ͱ,Cluster ͷ࠷খϊʔυ਺Λࣔ͢ᮢ஋ЖΛ༻͍ͯ. ɾ‫ॴہ‬ੑ ɹଟ͘ͷ΢Σϒϖʔδ͸ಉ͡ϗετ಺ͷϖʔ. 1. a) b). ެཱ͸ͩͯ͜ະདྷେֶ Future University Hakodate [email protected] [email protected]. ⓒ 2014 Information Processing Society of Japan. δ ʹ ༠ ಋ ͢ Δ ͨ Ί ͷ Ϧ ϯ Ϋ Λ ‫ ؚ‬Μ Ͱ ͍ Δ. ྫ ͑ ͹,“home”,“next”,“previous”,“up” ͳͲͷϦϯΫ͕‫͛ڍ‬Β. 1.

(2) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. ΕΔ. ྫͰ্͛ͨϦϯΫ͸ಉ͡ϗετ಺ͰϦϯΫ͞ΕΔ͜ ͱ͕ଟ͍. ͭ·Γ, ϦϯΫ‫ͱݩ‬ϦϯΫઌͷ΢Σϒϖʔδͷ. URL Λൺֱ͢Δͱઌ಄จࣈྻ (ϗετ໊) ͕Ұக͠΍͘͢, ‫ॴہ‬ੑ͕‫ݱ‬ΕΔ. ɾྨࣅੑ ɹಉ͡ϗετͷ΢Σϒϖʔδ͸ࣅͨΑ͏ͳ΢Σϒϖʔδʹ ϦϯΫΛ࣋ͭ܏޲͕͋Δ. ྫ͑͹, ‫ڞ‬௨ͷςϯϓϨʔτͰ࡞ ੒͞ΕΔϖʔδͳͲ͕‫͛ڍ‬ΒΕΔ. ྫͰ্͛ͨϖʔδ͸ಉ. ਤ 1. ѹॖલ. ਤ 2. ѹॖ‫ޙ‬. ͡ߏ଄ͷϖʔδ͕ͨ͘͞Μ͋ΔͨΊಉ͡΢ΣϒϖʔδʹϦ. ͍Δ. ߏ଄վྑʹΑΔํ๏͸, ϊʔυ਺ͱΤοδ਺Λ࡟‫͢ݮ‬. ϯΫΛషΔ͜ͱ͕ଟ͍. ͭ·Γ, ಉҰϗετ໊ͷ΢Σϒϖʔ. Δ͜ͱ͕ग़དྷ͔ͭ,PageRank ͷղੳ࣌ؒΛ୹ॖ͢Δ͜ͱ͕. δͷϦϯΫͷ URL Λൺֱ͢Δͱ URL ͕Ұக͠΍͘͢, ྨ. ՄೳͰ͋Δ. Αͬͯຊ‫Ͱڀݚ‬͸,PageRank ͷղੳ࣌ؒͷ୹. ࣅੑ͕‫ݱ‬ΕΔ.. ॖΛ໨ඪͱ͍ͯ͠ΔͨΊߏ଄վྑʹΑΔํ๏Λ༻͍Δ. ै. ɾ࿈ଓੑ. དྷํ๏͸Τοδ਺΍ϊʔυ਺Λ࡟‫͠ݮ‬,PageRank ͷղੳ࣌. ɹ΢Σϒϖʔδʹ‫·ؚ‬ΕΔϦϯΫͷ URL ͸Ұఆͷ‫·ݻ‬Γ. ؒΛ୹ॖ͢Δ͜ͱʹ͸੒ޭ͍ͯ͠Δ. ຊ‫Ͱڀݚ‬͸, ΢Σϒά. Ͱ‫ݱ‬ΕΔ܏޲͕͋Δ. ྫ͑͹, େྔͷࣸਅΛ‫͍ͯ͠ࡌܝ‬Δ. ϥϑΛ PageRank ͷղੳ͕ฒྻ෼ࢄॲཧՄೳͳάϥϑ΁࠶. ϖʔδ͕‫͛ڍ‬ΒΕΔ. ྫͰ্͛ͨϖʔδ͸͋ΔҰఆͷ࿈ଓ. ߏஙΛߦ͍,PageRank ͷղੳΛߦ͏ࡍʹฒྻ෼ࢄॲཧΛ༻. ͨ͠ϦϯΫͷ‫·ݻ‬Γ͕ଘࡏ͢Δ. ͭ·Γ,URL Λࣙॻॱʹ੔. ͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ୹ॖ͢Δํ๏ΛఏҊ͢Δ.. ྻ͠ॱʹ ID ΛׂΓ౰ͯΔͱ਺ࣈ͕࿈ଓ͠, ࿈ଓੑ͕‫ݱ‬ΕΔ.. P. Boldi Β͸্‫ه‬ͷಛ௃Λ༻͍ͯ, ࿈ଓੑΛར༻ͨࠩ͠. 3. SCAN. ෼ූ߸Խͱ, ‫ॴہ‬ੑͱྨࣅੑΛར༻ͨ͠ϥϯϨϯάεѹॖ. ఏҊํ๏Ͱ༻͍ΔΫϥελϦϯάํ๏ͷ SCAN[11] ʹ. Λ༻͍ͨํ๏ΛఏҊ͍ͯ͠Δ. ͜ΕʹΑΓ,1 ϦϯΫ͋ͨΓ. ͍ͭͯઆ໌͢Δ.SCAN ͸ΫϥελϦϯάର৅ͱ͢Δάϥ. 3.08bit ʹ཈͑Δ͜ͱʹ੒ޭ͍ͯ͠Δ.S. Tei Β͸ WebGraph. ϑ G = (V, E) Λղੳ͠, ߏ଄ྨࣅ౓ͷԼ‫ݶ‬஋Λࣔ͢ᮢ஋ ε. ͷࠩ෼ූ߸Խͷ‫ࢉޙ‬ज़ූ߸Λ༻͍Δ͜ͱͰ, ΢Σϒάϥϑ. ͱ,Cluster ͷ࠷খϊʔυ਺Λࣔ͢ᮢ஋ЖʹԠͨ͡ structure-. ͷ৘ใྔΛ WebGraph ΑΓ΋ 2 ׂ΄Ͳଟ͘࡟‫͢ݮ‬Δ͜ͱ. connected cluster Λ࡞੒͢Δ.Cluster ʹଐ͞ͳ͔ͬͨϊʔ. ʹ੒ޭ͍ͯ͠Δ [13]. ͔͠͠, ූ߸ԽʹΑΔํ๏Ͱ͸৘ใྔ. υ͸ Hub ͔ Outlier ͷ෼ผ͕ߦΘΕΔ.. Λ࡟‫͢ݮ‬Δ͜ͱ͸ग़དྷΔ͕, ϊʔυ਺ͱΤοδ਺Λ࡟‫ݮ‬ग़. ࠷ॳʹ͢΂ͯͷϊʔυʹରͯ͠ unclassfied ͷϥϕϧΛ. དྷͳ͍. ·ͨ, ෮߸Խ͢Δඞཁ͕͋Γ PageRank ͷղੳ࣌ؒ. ༩͑Δ.SCAN ͸ unclassfied ͷϥϕϧ͕෇͍͍ͯΔϊʔυ. ͕૿Ճͯ͠͠·͏. Αͬͯຊ‫Ͱڀݚ‬͸, ූ߸ԽʹΑΔํ๏͸. ʹରͯ͠, ͦͷϊʔυ͕ core Ͱ͋Δ͔Ͳ͏͔ͷ൑ఆΛߦ. ༻͍ͳ͍.. ͏.core ϊʔυ͸ Cluster ͷத৺ͱͳΔϊʔυͰ͋Δ.core Ͱ͋Δ͔൑அ͢ΔͨΊʹ͸, ର৅ͱͳΔ unclassfied ͳϊʔ. 2.2 ߏ଄վྑ. υͷߏ଄తྡ઀ϊʔυू߹ΛఆΊ, ߏ଄తྨࣅ౓Λ‫ࢉܭ‬, ᮢ. ߏ଄վྑ͸΢Σϒάϥϑͷߏ଄ΛҰ෦૊Έସ͑Δ͜ͱ. ஋ ε Λ༻͍ͯ ε-neighborhood Λࢉग़͢Δ͜ͱͰ core ͔Ͳ. Ͱϊʔυ਺΍Τοδ਺Λ࡟‫͢ݮ‬Δํ๏Ͱ͋Δ.G. Buehrer. ͏͔൑அ͢Δ͜ͱ͕ग़དྷΔ. ҎԼʹߏ଄తྡ઀ϊʔυू߹. Β͸ Virtual Node Miner[6] ͱ‫ݺ‬͹ΕΔ, Սۭͷϊʔυ (Ҏ. ͱߏ଄తྨࣅ౓,ε-neighborhood ͷఆٛΛ‫ه‬ड़͢Δ.. Լ,VN) Λ΢Σϒάϥϑʹૠೖ͢Δ͜ͱͰΤοδ਺Λ࡟‫ݮ‬. ɾߏ଄తྡ઀ϊʔυू߹. ͢Δํ๏ΛఏҊͨ͠.VN ͸ਤ 1 ͷΑ͏ͳ‫׬‬શ 2 ෦άϥϑ. v ∈ V Ͱ͋Δͱ͖, ߏ଄తྡ઀ϊʔυू߹͸ϊʔυ v ʹ. ͷதؒʹਤ 2 ͷΑ͏ʹૠೖ͢Δ. ͜ΕʹΑͬͯ, ϦϯΫΛ. ΤοδͰ઀ଓ͢Δϊʔυͱ, ϊʔυ v ࣗΒͰߏ੒͞ΕΔू. ू໿͠ϊʔυΛ࡟‫͍ͯ͠ݮ‬Δ.Virtual Node Miner ͸‫׬‬શ. ߹ Γ(v) Ͱද͞ΕΔ.. 2 ෦άϥϑͷ୳ࠪʹ࠷খ஋ಠཱஔ‫׵‬଒Λར༻͍ͯ͠Δ. ͜ ΕʹΑΓ, ϊʔυ਺໿ 7800 ສ, Τοδ਺໿ 3 ԯͷ΢Σϒά. Γ(v) = {v ∈ V | {v, w} ∈ E} ∪ {v}. ϥϑʹରͯ͠, ໿ 1700 ສͷ VN Λૠೖ͠Τοδ਺Λ 1/7. ɾߏ଄తྨࣅ౓. ʹ࡟‫ݮ‬ग़དྷͨͱ͍ͯ͠Δ. ย੉Β͸ LittleWeb[7] ͱ‫ݺ‬͹Ε. |Γ(v)| ͕ྡ઀ϊʔυू߹ʹ‫·ؚ‬ΕΔϊʔυ਺ͱ͢Δͱ, ϊʔ. Δ, ྨࣅϊʔυू໿ʹΑΔ΢Σϒάϥϑѹॖख๏ΛఏҊ͠. υ v, w ؒͷߏ଄తྨࣅ౓͸ σ(v, w) Ͱද͞ΕΔ.. ͨ.LittleWeb ͸ू໿ର৅ͱͳΔϊʔυΛ୳ࠪ͢ΔͨΊͷΫ ϥελϦϯάॲཧͱ, ΫϥελϦϯά݁ՌΛ༻͍ͨߏ଄վ. |Γ(v) ∩ Γ(w)| σ(v, w) = p |Γ(v)||Γ(w)|. ྑͷ 2 εςοϓͰ΢ΣϒάϥϑΛ࠶ߏங͍ͯ͠Δ. ͜Εʹ. ɾε-neighborhood. ΑΓ, ϊʔυΛ 67.3 ˋ, ΤοδΛ 56.3 ˋ࡟‫͍ͯ͠ݮ‬Δ. ߋ. ߏ ଄ ྨ ࣅ ౓ ͷ Լ ‫ ݶ‬஋ Λ ࣔ ͢ ᮢ ஋ ε Λ ༻ ͍ Δ ͜ ͱ Ͱ,ε-. ʹ,PageRank ͷղੳ࣌ؒΛ 67 ˋ୹ॖ͢Δ͜ͱʹ੒ޭͯ͠. neighborhood ͱද͞ΕΔ.. ⓒ 2014 Information Processing Society of Japan. 2.

(3) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. Nε (v) = {w ∈ Γ(v)|σ(v, w) ≧ ε} ɾCore. ε ∈ R,µ ∈ N,v ∈ V ,|Nε (v)| Λϊʔυ v ͷ ε-neighborhood ͷϊʔυ਺ͱ͢Δͱ,core ͸ COREε,µ (v) Ͱද͞ΕΔ.. COREε,µ (v) ⇔ |Nε (v)| ≧ µ SCAN ͷ൑ఆʹΑΓϊʔυ͕ core Ͱ͋ͬͨ࣌, ͜ͷ core ϊʔυʹϢχʔΫ ID Ͱ͋Δ clusterID ͷϥϕϧΛ༩͑,core ϊʔυΛத৺ʹ structure-connected cluster Λ࡞੒͢Δ. ϊʔυ͕ core Ͱͳ͔ͬͨ࣌, ͦͷϊʔυʹ non-menber ͷ ϥϕϧΛ༩͑Δ. ɹ SCAN ͸ core ϊʔυͱ൑ఆ͞Εͨϊʔυ͔Β,structure. reachability Ͱ౸ୡՄೳͳϊʔυ͕ॴଐ͢Δ Cluster ʹ clusterID ͷϥϕϧΛ༩͑Δ. ࠷ॳʹ, Ωϡʔ Q ΁ core ϊʔυͷ ε-neighborhood ʹ‫·ؚ‬ΕΔ͢΂ͯͷϊʔυΛૠೖ͢Δ. ࣍ ʹ, Ωϡʔ Q ΁ૠೖ͞Εͨϊʔυʹରͯ͠,direct structure. reachability ͱͳΔϊʔυͷू߹Ͱ͋Δ R Λ‫ٻ‬ΊΔ. ҎԼʹ direct structure reachability ͱ structure reachability ͷఆ ٛΛ‫ه‬ड़͢Δ. ɾDirect structure reachability ϊʔυ v ͱϊʔυ w ʹ͓͚Δ direct structure reachability ͸ DirREACHε,µ (v, w) Ͱද͞ΕΔ.. 4. ఏҊํ๏ ຊ‫Ͱڀݚ‬͸, ΢ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ෼ ࢄॲཧՄೳͳάϥϑ΁࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ ͏ࡍʹฒྻ෼ࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ ୹ॖ͢Δํ๏ΛఏҊ͢Δ. ఏҊํ๏ʹΑΔ΢Σϒάϥϑղ ੳํ๏͸,3 ͭͷεςοϓͰߦΘΕΔ.. Step 1 SCAN Λ༻͍ͯ Web graph ΛΫϥελϦϯά͠ Cluster ͱ Hub,Outlier ʹϊʔυ෼ผ͢Δ. Step 2 PageRank ͷղੳ͕ฒྻ෼ࢄॲཧՄೳʹ͢ΔͨΊ,Cluster ͱ Hub Λ༻͍ͨ Compression graph ͱ Cluster ಺ ͷ΢ΣϒάϥϑͰ͋Δ Cluster graph ͷ 2 छྨͷάϥ ϑΛ࡞੒͠,Web graph Λ࠶ߏங͢Δ.. Step 3 Compression graph ͷ PageRank ஋Λࢉग़͠,Cluster ͱ Hub ʹΑΔϥϯΩϯάΛ࡞੒͢Δ. ͦͷ‫ޙ‬,Cluster. graph ͷ PageRank ஋Λࢉग़͠,Cluster ಺ͷϊʔυʹ ΑΔϥϯΩϯάΛ࡞੒͠, ͦΕͧΕͷϥϯΩϯάΛϚʔ δ͢Δ. ਤ 3 ͸ఏҊํ๏ͷྲྀΕΛਤࣔͨ͠΋ͷͰ͋Δ. Ҏ߱, ͦΕͧ ΕͷεςοϓΛৄ͘͠આ໌͢Δ.. DirREACHε,µ (v, w) ⇔ COREε,µ (v) ∧ Nε (v) ɾStructure reachability. ε ∈ R,µ ∈ N,v ∈ V ͱ͢Δͱ, ϊʔυ v ͱϊʔυ w ʹ͓͚ Δ Structure reachability ͸ REACHε,µ (v, w) Ͱද͞ΕΔ.. REACHε,µ (v, w) ⇔ ∃1 , ...vn ∈ V : v1 = v ∧ vn = w ∧ ∀i ∈ {1, ..., n − 1} : DirREACHε,µ (vi , vi+1 ) ϊʔυू߹ R Λ‫ٻ‬ΊΔʹ͸, Ωϡʔ Q ʹ‫·ؚ‬ΕΔશͯͷ ϊʔυͷߏ଄తྡ઀ϊʔυू߹ʹରͯ͠, ߏ଄తྨࣅ౓Λ‫ܭ‬ ࢉ͢Δ. ࠷‫ʹޙ‬ϊʔυू߹ R ʹ‫·ؚ‬ΕΔϊʔυͷॴଐ͢Δ. Cluster ͕ unclassfied ͷ࣌, ϊʔυΛΩϡʔ Q ΁ૠೖ͢Δ. ϊʔυू߹ R ʹ‫·ؚ‬ΕΔϊʔυ͕ unclassfied, ·ͨ͸ non-. menber ͷ࣌, ࡞੒ͨ͠ clusterID ͷϥϕϧΛ༩͑Δ. Ұ࿈ͷ ॲཧΛΩϡʔ Q ͕ͳ͘ͳΔ·Ͱߦ͍,structure-connected. cluster Λ࡞੒͠ଓ͚Δ. ɹ structure-connected cluster ͷ࡞੒͕ऴྃͨ͠Β,non-. menber ͷϥϕϧ͕෇͍ͨϊʔυʹରͯ͠ Hub ͱ Outlier ͷ෼ผΛߦ͏.non-menber ͷϥϕϧ͕෇͍ͨϊʔυ͕ 2 ͭ Ҏ্ͷҟͳΔ Cluster ͱ઀ଓ͍ͯ͠Δ৔߹, ͦͷϊʔυʹର. ਤ 3 ఏҊํ๏ͷྲྀΕ. ͯ͠ Hub ͷϥϕϧΛ༩͑Δ.1 ͭͷΈͷ Cluster ͱ઀ଓͯ͠ ͍Δ৔߹, ͦͷϊʔυʹରͯ͠ Outlier ͷϥϕϧΛ༩͑Δ. ɹҎ্ΑΓ, શͯͷϊʔυʹରͯ͠ϥϕϧ͕෇͚ऴΘΓ. SCAN ʹΑΔΫϥελϦϯά͸‫ྃ͢׬‬Δ. ⓒ 2014 Information Processing Society of Japan. 4.1 ΫϥελϦϯά ఏ Ҋ ํ ๏ Ͱ ͸,SCAN Λ ༻ ͍ ͯ Ϋ ϥ ε λ Ϧ ϯ ά Λ ߦ. 3.

(4) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. ͏.SCAN ͸άϥϑ಺ͷ͢΂ͯͷΤοδΛղੳ͠, ߏ଄త ྨࣅ౓ͷԼ‫ݶ‬஋Λද͢ᮢ஋ ε ͱ Cluster ͷ࠷খϊʔυ਺Λ ද͢ᮢ஋ЖʹԠͨ͡ Cluster Λ࡞੒͢Δ.Cluster ʹଐ͞ͳ ͔ͬͨϊʔυʹରͯ͠ 2 ͭҎ্ͷ Cluster ͱΤοδͰ઀ଓ ͞Ε͍ͯΔϊʔυ͸ Hub ͱ͠,1 ͭͷΈͷ Cluster ͱΤοδ Ͱ઀ଓ͞Ε͍ͯΔϊʔυ͸ Outlier ͱ෼ผ͢Δ. Ϋϥελ ϦϯάॲཧͷྫͰ͋Δਤ 4 ͸্ஈͷ Web graph ʹରͯ͠. SCAN Λ࣮ߦ͠, Լஈ͸ϊʔυΛ Cluster ͱ Hub,Outlier ʹ ෼ผͨ͠άϥϑͰ͋Δ.. ਤ 5. ࠶ߏஙͷྲྀΕ. ͏. ࣍ʹ Cluster ಺ͷ΢ΣϒάϥϑͰ͋Δ Cluster graph ͷ. PageRank ஋Λࢉग़͠,Cluster ಺ͷϊʔυʹର͢ΔϥϯΩϯ άΛ‫ٻ‬ΊΔ. ͜ͷ࣌,Cluster ֎ͷϊʔυؒͷΤοδ͸ແࢹ͢ Δ.Cluster graph ͷϥϯΩϯά͕‫·ٻ‬Γ࣍ୈ,Compression. graph ͷରԠ͢Δ Cluster ෦෼ʹ Cluster graph ͷϥϯΩϯ άΛૠೖ͠, ΢ΣϒάϥϑͷϥϯΩϯάΛ࡞੒͢Δ. ղੳͷ ਤ 4. ΫϥελϦϯάͷྲྀΕ. εςοϓͷྫͰ͋Δਤ 6 ͸, ॳΊʹ Compression graph Λ. PageRank Λ༻͍ͯϥϯΩϯάΛ࡞੒. ࣍ʹ,Cluster graph Λ PageRank Λ༻͍ͯϥϯΩϯάΛ࡞੒. ͜ͷ࣌,Hub ʹؔ. 4.2 ࠶ߏங SCAN ͷ݁ՌΛ༻͍ͯ Web graph Λ࠶ߏங͢Δ.Clus-. ͯ͠͸ಛʹૢ࡞͸ߦΘͳ͍. ࠷‫ʹޙ‬, ͦΕͧΕͷղੳ݁ՌͰ ͋ΔϥϯΩϯάΛϚʔδ͍ͯ͠Δ.. ter ͱ Hub Ͱߏ੒͞Εͨ΢ΣϒάϥϑͰ͋Δ Compression graph ͱ Cluster ಺ͷ΢ΣϒάϥϑͰ͋Δ Cluster graph ͷ. 4.4 ಛ௃. 2 छྨΛ࡞੒͢Δ. ࠶ߏஙͷྫͰ͋Δਤ 5 ͸ 4.1 Ͱ SCAN Λ. ຊ‫Ͱڀݚ‬͸, ΢ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ෼. ༻͍ͯϊʔυΛ෼ผ্ͨ͠ஈͷάϥϑΑΓ,Cluster ͱ Hub. ࢄॲཧՄೳͳάϥϑ΁࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ. ͷΈͰߏ੒͞ΕΔ Compression graph ͱ,Cluster ಺ͷϊʔ. ͏ࡍʹฒྻ෼ࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ୹. υͰߏ੒͞ΕΔ Cluster graph ʹ࠶ߏஙͨ͠΋ͷΛԼஈʹ. ॖ͢Δํ๏ΛఏҊ͍ͯ͠Δ͕, ฒྻ෼ࢄॲཧ͸ Cluster ಺ͷ. ࣔ͢.. ϊʔυʹର͢ΔϥϯΩϯάΛ‫ٻ‬ΊΔ࣌ʹߦΘΕΔ.Cluster ಺ͷϊʔυʹର͢ΔϥϯΩϯάΛ‫ٻ‬ΊΔࡍ,Cluster ֎ͷ. 4.3 PageRank. ϊʔυؒͷΤοδ͸ແࢹ͢ΔͨΊ, ͓‫͍ޓ‬ͷ Cluster ͸ؔ܎. PageRank Ͱͷղੳεςοϓ͸ 2 ͭͷεςοϓʹ෼͔Εͯ. ੑ͕ແ͘ͳΓ, ฒྻ෼ࢄॲཧ͕ՄೳͱͳΔ. ߋʹ,Cluster ͝. ͍Δ.1 ͭ໨͸ Compression graph ͷղੳ,2 ͭ໨͸ Cluster. ͱʹ PageRank ஋Λࢉग़͢ΔͨΊ,1 ճ͋ͨΓͷ PageRank. graph ͷղੳͰ͋Δ.2 ͭ໨ͷ Cluster graph ͷղੳ͸ฒྻ. ஋ͷࢉग़࣌ʹ‫ࢉܭ‬ର৅ͱͳΔϊʔυͱΤοδ͕‫ݮ‬গ͢Δ.. ෼ࢄॲཧΛ༻͍Δ. ਤ 6 ͸ղੳͷεςοϓΛਤࣔͨ͠΋ͷͰ. Ҏ্ΑΓ, ΢ΣϒάϥϑΛ PageRank Ͱղੳ͢Δࡍͷղੳ. ͋Δ. ࠷ॳʹ Cluster ͱ Hub Ͱߏ੒͞Εͨ΢ΣϒάϥϑͰ. ࣌ؒΛ୹ॖ͢Δ͜ͱ͕ՄೳͱͳΔ.. ͋Δ Compression graph ͷ PageRank ஋Λࢉग़͠,Cluster ͱ Hub ʹର͢ΔϥϯΩϯάΛ‫ٻ‬ΊΔ. ͜ͷ࣌, ҟͳΔ 2 ͭ. 5. ߟ࡯. ͷ Cluster ʹଐ͢ΔϊʔυؒͷΤοδͷΈΛߟྀ͠,Cluster. ఏҊํ๏ʹରͯ͠ 4 ͭͷߟ࡯͢΂͖఺͕͋Δ.. ಺ͷϊʔυؒͷΤοδ͸ແࢹ͢Δ. ·ͨ, ಉ͡ Cluster ؒͷ. 1 ͭ໨ʹຊఏҊํ๏ͷଥ౰ੑΛߟ͑Δ. ฒྻ෼ࢄॲཧΛ. Τοδ͕ 2 ͭҎ্ଘࡏͯ͠΋ 1 ͭͷΈͷΤοδͱͯ͠ѻ. ߦ͍͍ͨ৔߹, ߟྀ͠ͳ͚Ε͹ͳΒͳ͍໰୊ͱͯ͠, ෼ׂ. ⓒ 2014 Information Processing Society of Japan. 4.

(5) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. ਤ 6. ղੳͷྲྀΕ. ͨ͠ॲཧͲ͏͠ʹؔ܎ੑ͕ແ͍͔Ͳ͏͔ߟ͑Δඞཁ͕͋. Δ. ͜ͷ෼ผํ๏ʹΑΓ, ΠϯͷΈͷΤοδΛॴ༗͍ͯ͠Δ. Δ. ຊఏҊํ๏͸ Cluster ಺ͷϊʔυΛ PageRank ʹͯղ. ϊʔυΛ੾Γམͱ͢͜ͱͰ PageRank ͷղੳ͕࣌ؒߋʹ୹. ੳ͢Δࡍʹฒྻ෼ࢄॲཧΛߦ͏͕,Cluster ಺ͷϊʔυ͸ଞ. ॖग़དྷΔͱߟ͑Δ.. ͷ Cluster ΍ Hub,Outlier ͱ͸Τοδʹͯ઀ଓ͞Ε͍ͯͳ. 4 ͭ໨ʹ΢ΣϒάϥϑͷΫϥελϦϯάʹֻ͔Δ࣌ؒʹ. ͍. ͢ͳΘͪ,Cluster ಺ͷϊʔυ͸ Cluster ಺ͷϊʔυؒ. ؔ͢Δ‫ݕ‬౼Ͱ͋Δ. ఏҊํ๏͸,PageRank ʹΑΔ΢Σϒά. ͷΈʹͯΤοδ͕͋Γ, ؔ܎ੑ͕͋Δ. Αͬͯ,Cluster ಺ͷ. ϥϑͷղੳʹલॲཧͱͳΔ΢ΣϒάϥϑͷΫϥελϦϯά. PageRank ղੳ͸ಠཱͨ͠ॲཧͱͳΓฒྻ෼ࢄԽ͕Մೳͱ. ࡞‫͕͋ۀ‬ΔͨΊ, ୯७ʹ PageRank ʹΑΔղੳΑΓ΋ॲཧ. ͳΔ.. ਺͕ଟ͘ͳ͍ͬͯΔ. ͦͷͨΊ, ΫϥελϦϯάॲཧ͸୹. 2 ͭ໨ʹղੳର৅ͱͳΓ͏Δ΢ΣϒάϥϑΛߟ͑Δ.SCAN. ࣌ؒͰऴΘΔ͜ͱ͕๬·͍͠. ࣮૷͢Δࡍ,SCAN ͷॲཧΛ. ͸૬‫܎ؔͳີʹޓ‬ੑ, ͢ͳΘͪ૬‫ͳີʹޓ‬Τοδ͕͋Δ෦. MapReduce ʹରԠͤͨ͞ PSCAN[14] ΍, Ϋϥελ܎਺͕. ෼Λ Cluster ͱͯ͠ൈ͖ग़͢. Αͬͯ, ղੳର৅ͱ͢Δ΢Σ. େ͖͍‫࣮ݱ‬ͷάϥϑߏ଄ʹରͯ͠ΫϥελϦϯάॲཧ࣌ؒ. ϒάϥϑʹ͸શମʹૈີ͕ଘࡏ͢ΔάϥϑͰ͋Δඞཁੑ. Λ୹ॖͰ͖Δվྑ‫ ܕ‬SCAN[15] ͕͋Γ, ͜ΕΒΛ༻͍Δ͜. ͕͋Δͱߟ͑Δ. ۩ମతͳղੳର৅ͱͳΔ΢Σϒάϥϑ͸,. ͱͰΑΓॲཧ࣌ؒΛ୹ॖ͢Δ͜ͱ͕ՄೳʹͳΔͷͰ͸ͳ͍. ΢ΣϒΛීวతʹΫϩʔϦϯάͨ͠άϥϑͰ͋Δ͜ͱ͕๬. ͔ͱࢥΘΕΔ.. ·͍͠. ຊఏҊํ๏Ͱ PageRank ղੳͷߴ଎Խ͕๬Ίͳ͍ ΢Σϒάϥϑ͸, ୯ҰυϝΠϯΛର৅ͱͨ͠άϥϑ΍ϒϩ άαΠτΛΫϩʔϦϯάͨ͠άϥϑ͸ղੳ͕࣌ؒ୹ॖͮ͠ Β͍ͱࢥΘΕΔ.. 6. MapReduce ʹΑΔఏҊํ๏ͷฒྻ෼ࢄԽ ຊఏҊํ๏͸ϊʔυ਺͕ԯ୯ҐҎ্ͷ΢Σϒάϥϑ͕ ղੳର৅ͱͳΔͨΊ, ΫϥελϦϯάॲཧͱ PageRank ʹ. 3 ͭ໨ʹΤοδͷѻ͍ʹؔ͢Δ‫ݕ‬౼Ͱ͋Δ. ‫ࡏݱ‬͸ SCAN. ΑΔղੳ͸ MapReduce Λ༻͍ͨฒྻ෼ࢄॲཧΛߦ͏ํ. ͷํ๏Λ࢖༻͠,Cluster ͱͷΤοδʹ஫໨ͯ͠ Hub ͱ Out-. ਑ΛऔΔ.MapReduce ͸େ‫ن‬໛σʔλΛฒྻ෼ࢄॲཧ͢. lier ͷ෼ผΛߦ͓ͬͯΓ, Τοδ͕͋Δ͔ͳ͍͔ͷΈΛ෼. ΔͨΊͷϑϨʔϜϫʔΫͰ͋Δ.Map ॲཧͰσʔλΛఆ. ผ৚݅ͱ͍ͯ͠Δ. ͜͜ʹΤοδͷຊ਺, ͢ͳΘͪΤοδʹ. ٛͨ͠ํ๏Ͱ෼ׂΛߦ͍,Reduce ॲཧͰఆٛͨ͠ํ๏Ͱ. ॏΈͷ৘ใΛ௥Ճ͢Δ. ͜ͷ৘ใͷ௥ՃʹΑΓ, ॏΈͷԼ‫ݶ‬. ෼ׂͨ͠σʔλΛऩू͠, ॲཧΛߦ͏.SCAN ʹؔͯ͠͸. ஋ͷᮢ஋Λઃఆ͠, ᮢ஋ʹຬͨͳ͍ॏΈ෇͖ΤοδΛॴ༗. PSCAN[14] Ͱ,PageRank ʹؔͯ͠͸ Data-intensive text. ͍ͯ͠ΔϊʔυΛ੾Γམͱ͢͜ͱͰ PageRank ͷղੳ͕ߋ. processing with MapReduce[16] Ͱ‫ ʹط‬MapReduce Ͱఆٛ. ʹߴ଎Խग़དྷΔͱߟ͑Δ. ͜ͷଞʹΤοδͷํ޲ʹΑΔ෼. ͞Ε͍ͯΔͨΊ,Web graph ͷ࠶ߏஙͰ͋Δ Cluster graph. ผख๏ʹ‫ݕ‬౼ͷ༨஍͕͋Δͱߟ͑Δ. ର৅ͱ͢Δϊʔυ͕. ͱ Compression graph ʹؔͯ͠ MapReduce ͷఆٛΛड़. ྆ํ޲ͷΤοδ, ͢ͳΘͪΠϯͱΞ΢τ྆ํͷΤοδΛॴ. ΂Δ.. ༗͍ͯ͠Δϊʔυͱ, ยํ޲ͷΈΤοδ, ͢ͳΘͪΠϯ·. ·ͣ,Cluster graph ʹ͓͚Δ MapReduce ʹ͍ͭͯड़΂. ͨ͸Ξ΢τยํͷΤοδΛॴ༗͍ͯ͠Δϊʔυͱʹ෼ผ͢. Δ.Key/Value ͸ද 1 ʹࣔ͢΋ͷΛ༻͍Δ. දதͷ label. ⓒ 2014 Information Processing Society of Japan. 5.

(6) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. ͸ϊʔυʹ෇͍͍ͯΔ Cluster ΍ Hub,Outlier ͜ͱͰ͋. Hub ʹΑΔϥϯΩϯάΛ࡞੒͢Δ. ͦͷ‫ޙ‬,Cluster graph. Γ,node ͸ϊʔυʹ‫ݻ‬༗Ͱ༩͑ΒΕ͍ͯΔ ID,outdegree. ͷ PageRank ஋Λࢉग़͠,Cluster ಺ͷϊʔυʹΑΔϥϯΩ. ͸ϊʔυ͔Βग़͍ͯΔϦϯΫઌͷ node ͷ͜ͱΛࣔ͢. ·. ϯάΛ࡞੒, Ϛʔδ͢Δ.Cluster ಺ͷϊʔυʹର͢Δϥϯ. ͨ,Cluster ͱ Hub ʹ͸ͦΕͧΕ‫ݻ‬༗ͷ ID ͕༩͑ΒΕ͍ͯ. ΩϯάΛ‫ٻ‬ΊΔ࣌,Cluster ֎ͷϊʔυؒͷΤοδ͸ແࢹ͢. Δ΋ͷͱ͢Δ.. ΔͨΊ, ͓‫͍ޓ‬ͷ Cluster ͸ؔ܎ੑ͕ແ͘ͳΓ, ฒྻ෼ࢄॲ. ද 1. Cluster graph ʹ͓͚Δ Key/Value. Reduce. ͨΓͷ PageRank ͷղੳର৅ͱͳΔϊʔυͱΤοδ͕‫ݮ‬গ. Key. Value. label. node. ͢Δ. Ҏ্ΑΓ,PageRank ͷղੳ࣌ؒΛ୹ॖ͢Δ͜ͱ͕ग़. output. label. node, outdegree. དྷΔ. ͜ΕʹΑΓ,PageRank Λ༻͍͍ͯΔ‫ࡧݕ‬Τϯδϯͳ. input. Cluster ID. node, outdegree. ͲͰ, ΑΓߴ౓ͳ‫݁ࡧݕ‬ՌΛฦ͢͜ͱ͕ՄೳʹͳΓ, ৘ใࣾ. output. node. outdegree. ձʹ͓͚Δ඼࣭ͷߴ͍αʔϏεΛఏ‫͢ڙ‬Δ͜ͱʹߩ‫͖Ͱݙ‬. input Map. ཧ͕ՄೳͱͳΔ. ߋʹ, ΫϥελͰ੾Γ෼͚Δ͜ͱͰ 1 ճ͋. Δ. ࠓ‫ޙ‬͸࣮‫ݧ‬Λߦ͍, ຊఏҊํ๏͕Մೳ͔‫ূݕ‬Λߦ͏༧ఆ. Cluster graph ʹ͓͚Δ MapReduce ͸, ·ͣ Map ॲཧͱ. Ͱ͋Δ. ՄೳͰ͋Ε͹, ‫ࡏݱ‬͸ SCAN Λར༻͍ͯ͠Δ͕, ଞ. ͯ͠,input Ͱ͸,Cluster ΍ Hub,Outlier ͷϥϕϧΛ Key ͱ. ͷΫϥελϦϯάํ๏Ͱ΋ఏҊํ๏͕࣮‫͖Ͱݱ‬ΔՄೳੑ͕. ͠, ͦΕͧΕʹରԠ͢Δ node Λ Value ʹૠೖ͢Δ.output. ͋ΔͱࢥΘΕΔͨΊ, ௥ՃͰ࣮‫ݧ‬Λߦ͏༧ఆͰ͋Δ.. Ͱ͸,Value ͷ node ʹରԠ͢Δ outdegree Λ௥Ճ͢Δ. ࣍ʹ. Reduce ॲཧͱͯ͠,input Ͱ͸,Key ͔Β Cluster ͷϥϕϧ. ࢀߟจ‫ݙ‬. ͕෇͍ͨ΋ͷΛऩू͢Δ.output Ͱ͸,Value ʹ֨ೲ͞Εͯ. [1]. ͍Δ node Λ Key ʹૠೖ͢Δ.Reduce ॲཧΛ Cluster ID ͝ ͱʹ‫܁‬Γฦ͠, શͯͷ Cluster graph Λ࡞੒͢Δ.. [2]. ࣍ʹ,Compression graph ʹ͓͚Δ MapReduce ʹ͍ͭͯ ड़΂Δ.Key/Value ͸ද 2 ʹࣔ͢΋ͷΛ༻͍Δ. ද 2. [3]. Compression graph ʹ͓͚Δ Key/Value Key. Value. input. label. node. Map. output. label. node, outdegree. input. Hub ID. node, outdegree. Reduce. output. Hub ID. Cluster ID. Map ॲཧʹؔͯ͠͸ Cluster graph ͱಉ͡ॲཧΛߦ͏.Re-. [4]. [5] [6]. duce ॲཧͱͯ͠,input Ͱ͸,Key ͔Β Hub ͷϥϕϧ͕෇͍ ͨ΋ͷΛऩू͢Δ. ऩू‫ޙ‬,Value ͷ outdegree ͔Β, Ͳͷ. Cluster ͱ઀ଓ͍ͯ͠Δͷ͔Λௐ΂,Cluster ID Λ output ͷ Value ʹ௥Ճ͢Δ.Reduce ॲཧΛ Hub ID ͝ͱʹ‫܁‬Γฦ. [7]. ͠,Compression graph Λ࡞੒͢Δ.. 7. ͓ΘΓʹ ຊ‫Ͱڀݚ‬͸, ΢ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ෼ ࢄॲཧՄೳͳάϥϑ΁࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ. [8]. [9]. ͏ࡍʹฒྻ෼ࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ୹ ॖ͢Δํ๏ΛఏҊ͢Δ. ఏҊํ๏ʹΑΔ΢Σϒάϥϑղੳ. [10]. ํ๏͸,3 ͭͷεςοϓͰߦΘΕΔ.1 ͭ໨͸,SCAN Λ༻͍ ͯ Web graph ΛΫϥελϦϯά͠ Cluster ͱ Hub,Outlier. [11]. ʹϊʔυ෼ผ͢Δ.2 ͭ໨ʹ,PageRank ͷղੳ͕ฒྻ෼ࢄॲ ཧՄೳʹ͢ΔͨΊ,Cluster ͱ Hub Λ༻͍ͨ Compression. graph ͱ Cluster ಺ͷ΢ΣϒάϥϑͰ͋Δ Cluster graph ͷ 2 छྨͷάϥϑΛ࡞੒,Web graph Λ࠶ߏங͢Δ.3 ͭ໨ ͸,Compression graph ͷ PageRank ஋Λࢉग़͠,Cluster ͱ. ⓒ 2014 Information Processing Society of Japan. [12]. Brin, S. and Page, L.: The anatomy of a large-scale hypertextual Web search engine, Computer networks and ISDN systems, Vol. 30, No. 1, pp. 107–117 (1998). Kleinberg, J. M.: Authoritative sources in a hyperlinked environment, Journal of the ACM (JACM), Vol. 46, No. 5, pp. 604–632 (1999). Kumar, R., Raghavan, P., Rajagopalan, S. and Tomkins, A.: Trawling the Web for emerging cyber-communities, Computer networks, Vol. 31, No. 11, pp. 1481–1493 (1999). Reddy, P. K. and Kitsuregawa, M.: An approach to relate the web communities through bipartite graphs, Web Information Systems Engineering, 2001. Proceedings of the Second International Conference on, Vol. 1, IEEE, pp. 301–310 (2001). Netcraft: October 2014 Web Server Survey. Buehrer, G. and Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities, Proceedings of the 2008 International Conference on Web Search and Data Mining, ACM, pp. 95–106 (2008). ย੉߂থɼ্ాߴಙɼࢁ໊ૣਓɿLittleWeb: ྨࣅϊʔυू ໿ʹΑΔ Web άϥϑѹॖख๏ɼThe 2nd Forum on Data Engineering and Information Management, DBSJ, pp. E1–4 (2010). Kohlsch¨ utter, C., Chirita, P.-A. and Nejdl, W.: Efficient parallel computation of pagerank, Advances in information retrieval, Springer, pp. 241–252 (2006). Wicks, J. and Greenwald, A.: Parallelizing the computation of pagerank, Algorithms and Models for the WebGraph, Springer, pp. 202–208 (2007). Kamvar, S., Haveliwala, T., Manning, C. and Golub, G.: Exploiting the block structure of the web for computing pagerank, Stanford University Technical Report (2003). Xu, X., Yuruk, N., Feng, Z. and Schweiger, T. A.: Scan: a structural clustering algorithm for networks, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 824–833 (2007). Boldi, P. and Vigna, S.: The webgraph framework I: compression techniques, Proceedings of the 13th international conference on World Wide Web, ACM, pp.. 6.

(7) ৘ใॲཧֶձ‫ڀݚ‬ใࠂ IPSJ SIG Technical Report. [13]. [14]. [15]. [16]. Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.20 Vol.2014-EMB-35 No.20 2014/11/18. 595–602 (2004). γϡ΢ςΠɼɹยࢁ‫܆‬ɿࢉज़ූ߸Λ༻͍ͨ΢Σϒάϥϑ ද‫ݱ‬ͷͨΊͷѹॖํ๏ɼThe 6nd Forum on Data Engineering and Information Management, DBSJ, pp. D6–1 (2014). Zhao, W., Martha, V. and Xu, X.: PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce, Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on, IEEE, pp. 862–869 (2013). Ԙ઒ߒতɼ౻‫ݪ‬༃޺ɼَ௩ɹਅɿߏ଄తྨࣅ౓ʹ‫ͮ͘ج‬ά ϥϑΫϥελϦϯάͷߴ଎ԽɼThe 6th Forum on Data Engineering and Information Management, DBSJ, pp. D6–2 (2014). Lin, J. and Dyer, C.: Data-intensive text processing with MapReduce, Synthesis Lectures on Human Language Technologies, Vol. 3, No. 1, pp. 1–177 (2010).. ⓒ 2014 Information Processing Society of Japan. 7.

(8)

参照

関連したドキュメント

[r]

定期的に採集した小学校周辺の水生生物を観 察・分類した。これは,学習指導要領の「身近

[r]

血は約60cmの落差により貯血槽に吸引される.数

The goods and/or their replicas, the technology and/or software found in this catalog are subject to complementary export regulations by Foreign Exchange and Foreign Trade Law

過水タンク並びに Sr 処理水貯槽のうち Sr 処理水貯槽(K2 エリア)及び Sr 処理水貯槽(K1 南エリア)の放射能濃度は,水分析結果を基に線源条件を設定する。RO

過水タンク並びに Sr 処理水貯槽のうち Sr 処理水貯槽(K2 エリア)及び Sr 処理水貯槽(K1 南エリア)の放射能濃度は,水分析結果を基に線源条件を設定する。RO

[r]