並列分散処理による2段階ランキングアルゴリズム
7
0
0
全文
(2) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.2 Vol.2014-EMB-35 No.20 2014/11/18. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ΕΔ. ྫͰ্͛ͨϦϯΫಉ͡ϗετͰϦϯΫ͞ΕΔ͜ ͱ͕ଟ͍. ͭ·Γ, ϦϯΫͱݩϦϯΫઌͷΣϒϖʔδͷ. URL Λൺֱ͢Δͱઌ಄จࣈྻ (ϗετ໊) ͕Ұக͘͢͠, ॴہੑ͕ݱΕΔ. ɾྨࣅੑ ɹಉ͡ϗετͷΣϒϖʔδࣅͨΑ͏ͳΣϒϖʔδʹ ϦϯΫΛ͕࣋ͭ͋Δ. ྫ͑, ڞ௨ͷςϯϓϨʔτͰ࡞ ͞ΕΔϖʔδͳͲ͕͛ڍΒΕΔ. ྫͰ্͛ͨϖʔδಉ. ਤ 1. ѹॖલ. ਤ 2. ѹॖޙ. ͡ߏͷϖʔδ͕ͨ͘͞Μ͋ΔͨΊಉ͡ΣϒϖʔδʹϦ. ͍Δ. ߏվྑʹΑΔํ๏, ϊʔυͱΤοδΛ͢ݮ. ϯΫΛషΔ͜ͱ͕ଟ͍. ͭ·Γ, ಉҰϗετ໊ͷΣϒϖʔ. Δ͜ͱ͕ग़དྷ͔ͭ,PageRank ͷղੳ࣌ؒΛॖ͢Δ͜ͱ͕. δͷϦϯΫͷ URL Λൺֱ͢Δͱ URL ͕Ұக͘͢͠, ྨ. ՄೳͰ͋Δ. ΑͬͯຊͰڀݚ,PageRank ͷղੳ࣌ؒͷ. ࣅੑ͕ݱΕΔ.. ॖΛඪͱ͍ͯ͠ΔͨΊߏվྑʹΑΔํ๏Λ༻͍Δ. ै. ɾ࿈ଓੑ. དྷํ๏ΤοδϊʔυΛ͠ݮ,PageRank ͷղੳ࣌. ɹΣϒϖʔδʹ·ؚΕΔϦϯΫͷ URL Ұఆͷ·ݻΓ. ؒΛॖ͢Δ͜ͱʹޭ͍ͯ͠Δ. ຊͰڀݚ, Σϒά. ͰݱΕΔ͕͋Δ. ྫ͑, େྔͷࣸਅΛ͍ͯ͠ࡌܝΔ. ϥϑΛ PageRank ͷղੳ͕ฒྻࢄॲཧՄೳͳάϥϑ࠶. ϖʔδ͕͛ڍΒΕΔ. ྫͰ্͛ͨϖʔδ͋ΔҰఆͷ࿈ଓ. ߏஙΛߦ͍,PageRank ͷղੳΛߦ͏ࡍʹฒྻࢄॲཧΛ༻. ͨ͠ϦϯΫͷ·ݻΓ͕ଘࡏ͢Δ. ͭ·Γ,URL Λࣙॻॱʹ. ͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛॖ͢Δํ๏ΛఏҊ͢Δ.. ྻ͠ॱʹ ID ΛׂΓͯΔͱࣈ͕࿈ଓ͠, ࿈ଓੑ͕ݱΕΔ.. P. Boldi Β্هͷಛΛ༻͍ͯ, ࿈ଓੑΛར༻ͨࠩ͠. 3. SCAN. ූ߸Խͱ, ॴہੑͱྨࣅੑΛར༻ͨ͠ϥϯϨϯάεѹॖ. ఏҊํ๏Ͱ༻͍ΔΫϥελϦϯάํ๏ͷ SCAN[11] ʹ. Λ༻͍ͨํ๏ΛఏҊ͍ͯ͠Δ. ͜ΕʹΑΓ,1 ϦϯΫ͋ͨΓ. ͍ͭͯઆ໌͢Δ.SCAN ΫϥελϦϯάରͱ͢Δάϥ. 3.08bit ʹ͑Δ͜ͱʹޭ͍ͯ͠Δ.S. Tei Β WebGraph. ϑ G = (V, E) Λղੳ͠, ߏྨࣅͷԼݶΛࣔ͢ᮢ ε. ͷࠩූ߸Խͷࢉޙज़ූ߸Λ༻͍Δ͜ͱͰ, Σϒάϥϑ. ͱ,Cluster ͷ࠷খϊʔυΛࣔ͢ᮢЖʹԠͨ͡ structure-. ͷใྔΛ WebGraph ΑΓ 2 ׂ΄Ͳଟ͘͢ݮΔ͜ͱ. connected cluster Λ࡞͢Δ.Cluster ʹଐ͞ͳ͔ͬͨϊʔ. ʹޭ͍ͯ͠Δ [13]. ͔͠͠, ූ߸ԽʹΑΔํ๏Ͱใྔ. υ Hub ͔ Outlier ͷผ͕ߦΘΕΔ.. Λ͢ݮΔ͜ͱग़དྷΔ͕, ϊʔυͱΤοδΛݮग़. ࠷ॳʹͯ͢ͷϊʔυʹରͯ͠ unclassfied ͷϥϕϧΛ. དྷͳ͍. ·ͨ, ෮߸Խ͢Δඞཁ͕͋Γ PageRank ͷղੳ࣌ؒ. ༩͑Δ.SCAN unclassfied ͷϥϕϧ͕͍͍ͯΔϊʔυ. ͕૿Ճͯ͠͠·͏. ΑͬͯຊͰڀݚ, ූ߸ԽʹΑΔํ๏. ʹରͯ͠, ͦͷϊʔυ͕ core Ͱ͋Δ͔Ͳ͏͔ͷఆΛߦ. ༻͍ͳ͍.. ͏.core ϊʔυ Cluster ͷத৺ͱͳΔϊʔυͰ͋Δ.core Ͱ͋Δ͔அ͢ΔͨΊʹ, ରͱͳΔ unclassfied ͳϊʔ. 2.2 ߏվྑ. υͷߏతྡϊʔυू߹ΛఆΊ, ߏతྨࣅΛࢉܭ, ᮢ. ߏվྑΣϒάϥϑͷߏΛҰ෦Έସ͑Δ͜ͱ. ε Λ༻͍ͯ ε-neighborhood Λࢉग़͢Δ͜ͱͰ core ͔Ͳ. ͰϊʔυΤοδΛ͢ݮΔํ๏Ͱ͋Δ.G. Buehrer. ͏͔அ͢Δ͜ͱ͕ग़དྷΔ. ҎԼʹߏతྡϊʔυू߹. Β Virtual Node Miner[6] ͱݺΕΔ, Սۭͷϊʔυ (Ҏ. ͱߏతྨࣅ,ε-neighborhood ͷఆٛΛهड़͢Δ.. Լ,VN) ΛΣϒάϥϑʹૠೖ͢Δ͜ͱͰΤοδΛݮ. ɾߏతྡϊʔυू߹. ͢Δํ๏ΛఏҊͨ͠.VN ਤ 1 ͷΑ͏ͳશ 2 ෦άϥϑ. v ∈ V Ͱ͋Δͱ͖, ߏతྡϊʔυू߹ϊʔυ v ʹ. ͷதؒʹਤ 2 ͷΑ͏ʹૠೖ͢Δ. ͜ΕʹΑͬͯ, ϦϯΫΛ. ΤοδͰଓ͢Δϊʔυͱ, ϊʔυ v ࣗΒͰߏ͞ΕΔू. ू͠ϊʔυΛ͍ͯ͠ݮΔ.Virtual Node Miner શ. ߹ Γ(v) Ͱද͞ΕΔ.. 2 ෦άϥϑͷ୳ࠪʹ࠷খಠཱஔΛར༻͍ͯ͠Δ. ͜ ΕʹΑΓ, ϊʔυ 7800 ສ, Τοδ 3 ԯͷΣϒά. Γ(v) = {v ∈ V | {v, w} ∈ E} ∪ {v}. ϥϑʹରͯ͠, 1700 ສͷ VN Λૠೖ͠ΤοδΛ 1/7. ɾߏతྨࣅ. ʹݮग़དྷͨͱ͍ͯ͠Δ. ยΒ LittleWeb[7] ͱݺΕ. |Γ(v)| ͕ྡϊʔυू߹ʹ·ؚΕΔϊʔυͱ͢Δͱ, ϊʔ. Δ, ྨࣅϊʔυूʹΑΔΣϒάϥϑѹॖख๏ΛఏҊ͠. υ v, w ؒͷߏతྨࣅ σ(v, w) Ͱද͞ΕΔ.. ͨ.LittleWeb ूରͱͳΔϊʔυΛ୳ࠪ͢ΔͨΊͷΫ ϥελϦϯάॲཧͱ, ΫϥελϦϯά݁ՌΛ༻͍ͨߏվ. |Γ(v) ∩ Γ(w)| σ(v, w) = p |Γ(v)||Γ(w)|. ྑͷ 2 εςοϓͰΣϒάϥϑΛ࠶ߏங͍ͯ͠Δ. ͜Εʹ. ɾε-neighborhood. ΑΓ, ϊʔυΛ 67.3 ˋ, ΤοδΛ 56.3 ˋ͍ͯ͠ݮΔ. ߋ. ߏ ྨ ࣅ ͷ Լ ݶ Λ ࣔ ͢ ᮢ ε Λ ༻ ͍ Δ ͜ ͱ Ͱ,ε-. ʹ,PageRank ͷղੳ࣌ؒΛ 67 ˋॖ͢Δ͜ͱʹޭͯ͠. neighborhood ͱද͞ΕΔ.. ⓒ 2014 Information Processing Society of Japan. 2.
(3) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.2 Vol.2014-EMB-35 No.20 2014/11/18. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. Nε (v) = {w ∈ Γ(v)|σ(v, w) ≧ ε} ɾCore. ε ∈ R,µ ∈ N,v ∈ V ,|Nε (v)| Λϊʔυ v ͷ ε-neighborhood ͷϊʔυͱ͢Δͱ,core COREε,µ (v) Ͱද͞ΕΔ.. COREε,µ (v) ⇔ |Nε (v)| ≧ µ SCAN ͷఆʹΑΓϊʔυ͕ core Ͱ͋ͬͨ࣌, ͜ͷ core ϊʔυʹϢχʔΫ ID Ͱ͋Δ clusterID ͷϥϕϧΛ༩͑,core ϊʔυΛத৺ʹ structure-connected cluster Λ࡞͢Δ. ϊʔυ͕ core Ͱͳ͔ͬͨ࣌, ͦͷϊʔυʹ non-menber ͷ ϥϕϧΛ༩͑Δ. ɹ SCAN core ϊʔυͱఆ͞Εͨϊʔυ͔Β,structure. reachability Ͱ౸ୡՄೳͳϊʔυ͕ॴଐ͢Δ Cluster ʹ clusterID ͷϥϕϧΛ༩͑Δ. ࠷ॳʹ, Ωϡʔ Q core ϊʔυͷ ε-neighborhood ʹ·ؚΕΔͯ͢ͷϊʔυΛૠೖ͢Δ. ࣍ ʹ, Ωϡʔ Q ૠೖ͞Εͨϊʔυʹରͯ͠,direct structure. reachability ͱͳΔϊʔυͷू߹Ͱ͋Δ R ΛٻΊΔ. ҎԼʹ direct structure reachability ͱ structure reachability ͷఆ ٛΛهड़͢Δ. ɾDirect structure reachability ϊʔυ v ͱϊʔυ w ʹ͓͚Δ direct structure reachability DirREACHε,µ (v, w) Ͱද͞ΕΔ.. 4. ఏҊํ๏ ຊͰڀݚ, ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ ࢄॲཧՄೳͳάϥϑ࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ ͏ࡍʹฒྻࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ ॖ͢Δํ๏ΛఏҊ͢Δ. ఏҊํ๏ʹΑΔΣϒάϥϑղ ੳํ๏,3 ͭͷεςοϓͰߦΘΕΔ.. Step 1 SCAN Λ༻͍ͯ Web graph ΛΫϥελϦϯά͠ Cluster ͱ Hub,Outlier ʹϊʔυผ͢Δ. Step 2 PageRank ͷղੳ͕ฒྻࢄॲཧՄೳʹ͢ΔͨΊ,Cluster ͱ Hub Λ༻͍ͨ Compression graph ͱ Cluster ͷΣϒάϥϑͰ͋Δ Cluster graph ͷ 2 छྨͷάϥ ϑΛ࡞͠,Web graph Λ࠶ߏங͢Δ.. Step 3 Compression graph ͷ PageRank Λࢉग़͠,Cluster ͱ Hub ʹΑΔϥϯΩϯάΛ࡞͢Δ. ͦͷޙ,Cluster. graph ͷ PageRank Λࢉग़͠,Cluster ͷϊʔυʹ ΑΔϥϯΩϯάΛ࡞͠, ͦΕͧΕͷϥϯΩϯάΛϚʔ δ͢Δ. ਤ 3 ఏҊํ๏ͷྲྀΕΛਤࣔͨ͠ͷͰ͋Δ. Ҏ߱, ͦΕͧ ΕͷεςοϓΛৄ͘͠આ໌͢Δ.. DirREACHε,µ (v, w) ⇔ COREε,µ (v) ∧ Nε (v) ɾStructure reachability. ε ∈ R,µ ∈ N,v ∈ V ͱ͢Δͱ, ϊʔυ v ͱϊʔυ w ʹ͓͚ Δ Structure reachability REACHε,µ (v, w) Ͱද͞ΕΔ.. REACHε,µ (v, w) ⇔ ∃1 , ...vn ∈ V : v1 = v ∧ vn = w ∧ ∀i ∈ {1, ..., n − 1} : DirREACHε,µ (vi , vi+1 ) ϊʔυू߹ R ΛٻΊΔʹ, Ωϡʔ Q ʹ·ؚΕΔશͯͷ ϊʔυͷߏతྡϊʔυू߹ʹରͯ͠, ߏతྨࣅΛܭ ࢉ͢Δ. ࠷ʹޙϊʔυू߹ R ʹ·ؚΕΔϊʔυͷॴଐ͢Δ. Cluster ͕ unclassfied ͷ࣌, ϊʔυΛΩϡʔ Q ૠೖ͢Δ. ϊʔυू߹ R ʹ·ؚΕΔϊʔυ͕ unclassfied, ·ͨ non-. menber ͷ࣌, ࡞ͨ͠ clusterID ͷϥϕϧΛ༩͑Δ. Ұ࿈ͷ ॲཧΛΩϡʔ Q ͕ͳ͘ͳΔ·Ͱߦ͍,structure-connected. cluster Λ࡞͠ଓ͚Δ. ɹ structure-connected cluster ͷ࡞͕ऴྃͨ͠Β,non-. menber ͷϥϕϧ͕͍ͨϊʔυʹରͯ͠ Hub ͱ Outlier ͷผΛߦ͏.non-menber ͷϥϕϧ͕͍ͨϊʔυ͕ 2 ͭ Ҏ্ͷҟͳΔ Cluster ͱଓ͍ͯ͠Δ߹, ͦͷϊʔυʹର. ਤ 3 ఏҊํ๏ͷྲྀΕ. ͯ͠ Hub ͷϥϕϧΛ༩͑Δ.1 ͭͷΈͷ Cluster ͱଓͯ͠ ͍Δ߹, ͦͷϊʔυʹରͯ͠ Outlier ͷϥϕϧΛ༩͑Δ. ɹҎ্ΑΓ, શͯͷϊʔυʹରͯ͠ϥϕϧ͕͚ऴΘΓ. SCAN ʹΑΔΫϥελϦϯάྃ͢Δ. ⓒ 2014 Information Processing Society of Japan. 4.1 ΫϥελϦϯά ఏ Ҋ ํ ๏ Ͱ ,SCAN Λ ༻ ͍ ͯ Ϋ ϥ ε λ Ϧ ϯ ά Λ ߦ. 3.
(4) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.2 Vol.2014-EMB-35 No.20 2014/11/18. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ͏.SCAN άϥϑͷͯ͢ͷΤοδΛղੳ͠, ߏత ྨࣅͷԼݶΛද͢ᮢ ε ͱ Cluster ͷ࠷খϊʔυΛ ද͢ᮢЖʹԠͨ͡ Cluster Λ࡞͢Δ.Cluster ʹଐ͞ͳ ͔ͬͨϊʔυʹରͯ͠ 2 ͭҎ্ͷ Cluster ͱΤοδͰଓ ͞Ε͍ͯΔϊʔυ Hub ͱ͠,1 ͭͷΈͷ Cluster ͱΤοδ Ͱଓ͞Ε͍ͯΔϊʔυ Outlier ͱผ͢Δ. Ϋϥελ ϦϯάॲཧͷྫͰ͋Δਤ 4 ্ஈͷ Web graph ʹରͯ͠. SCAN Λ࣮ߦ͠, ԼஈϊʔυΛ Cluster ͱ Hub,Outlier ʹ ผͨ͠άϥϑͰ͋Δ.. ਤ 5. ࠶ߏஙͷྲྀΕ. ͏. ࣍ʹ Cluster ͷΣϒάϥϑͰ͋Δ Cluster graph ͷ. PageRank Λࢉग़͠,Cluster ͷϊʔυʹର͢ΔϥϯΩϯ άΛٻΊΔ. ͜ͷ࣌,Cluster ֎ͷϊʔυؒͷΤοδແࢹ͢ Δ.Cluster graph ͷϥϯΩϯά͕·ٻΓ࣍ୈ,Compression. graph ͷରԠ͢Δ Cluster ෦ʹ Cluster graph ͷϥϯΩϯ άΛૠೖ͠, ΣϒάϥϑͷϥϯΩϯάΛ࡞͢Δ. ղੳͷ ਤ 4. ΫϥελϦϯάͷྲྀΕ. εςοϓͷྫͰ͋Δਤ 6 , ॳΊʹ Compression graph Λ. PageRank Λ༻͍ͯϥϯΩϯάΛ࡞. ࣍ʹ,Cluster graph Λ PageRank Λ༻͍ͯϥϯΩϯάΛ࡞. ͜ͷ࣌,Hub ʹؔ. 4.2 ࠶ߏங SCAN ͷ݁ՌΛ༻͍ͯ Web graph Λ࠶ߏங͢Δ.Clus-. ͯ͠ಛʹૢ࡞ߦΘͳ͍. ࠷ʹޙ, ͦΕͧΕͷղੳ݁ՌͰ ͋ΔϥϯΩϯάΛϚʔδ͍ͯ͠Δ.. ter ͱ Hub Ͱߏ͞ΕͨΣϒάϥϑͰ͋Δ Compression graph ͱ Cluster ͷΣϒάϥϑͰ͋Δ Cluster graph ͷ. 4.4 ಛ. 2 छྨΛ࡞͢Δ. ࠶ߏஙͷྫͰ͋Δਤ 5 4.1 Ͱ SCAN Λ. ຊͰڀݚ, ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ. ༻͍ͯϊʔυΛผ্ͨ͠ஈͷάϥϑΑΓ,Cluster ͱ Hub. ࢄॲཧՄೳͳάϥϑ࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ. ͷΈͰߏ͞ΕΔ Compression graph ͱ,Cluster ͷϊʔ. ͏ࡍʹฒྻࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ. υͰߏ͞ΕΔ Cluster graph ʹ࠶ߏஙͨ͠ͷΛԼஈʹ. ॖ͢Δํ๏ΛఏҊ͍ͯ͠Δ͕, ฒྻࢄॲཧ Cluster ͷ. ࣔ͢.. ϊʔυʹର͢ΔϥϯΩϯάΛٻΊΔ࣌ʹߦΘΕΔ.Cluster ͷϊʔυʹର͢ΔϥϯΩϯάΛٻΊΔࡍ,Cluster ֎ͷ. 4.3 PageRank. ϊʔυؒͷΤοδແࢹ͢ΔͨΊ, ͓͍ޓͷ Cluster ؔ. PageRank Ͱͷղੳεςοϓ 2 ͭͷεςοϓʹ͔Εͯ. ੑ͕ແ͘ͳΓ, ฒྻࢄॲཧ͕ՄೳͱͳΔ. ߋʹ,Cluster ͝. ͍Δ.1 ͭ Compression graph ͷղੳ,2 ͭ Cluster. ͱʹ PageRank Λࢉग़͢ΔͨΊ,1 ճ͋ͨΓͷ PageRank. graph ͷղੳͰ͋Δ.2 ͭͷ Cluster graph ͷղੳฒྻ. ͷࢉग़࣌ʹࢉܭରͱͳΔϊʔυͱΤοδ͕ݮগ͢Δ.. ࢄॲཧΛ༻͍Δ. ਤ 6 ղੳͷεςοϓΛਤࣔͨ͠ͷͰ. Ҏ্ΑΓ, ΣϒάϥϑΛ PageRank Ͱղੳ͢Δࡍͷղੳ. ͋Δ. ࠷ॳʹ Cluster ͱ Hub Ͱߏ͞ΕͨΣϒάϥϑͰ. ࣌ؒΛॖ͢Δ͜ͱ͕ՄೳͱͳΔ.. ͋Δ Compression graph ͷ PageRank Λࢉग़͠,Cluster ͱ Hub ʹର͢ΔϥϯΩϯάΛٻΊΔ. ͜ͷ࣌, ҟͳΔ 2 ͭ. 5. ߟ. ͷ Cluster ʹଐ͢ΔϊʔυؒͷΤοδͷΈΛߟྀ͠,Cluster. ఏҊํ๏ʹରͯ͠ 4 ͭͷߟ͖͕͋͢Δ.. ͷϊʔυؒͷΤοδແࢹ͢Δ. ·ͨ, ಉ͡ Cluster ؒͷ. 1 ͭʹຊఏҊํ๏ͷଥੑΛߟ͑Δ. ฒྻࢄॲཧΛ. Τοδ͕ 2 ͭҎ্ଘࡏͯ͠ 1 ͭͷΈͷΤοδͱͯ͠ѻ. ߦ͍͍ͨ߹, ߟྀ͠ͳ͚ΕͳΒͳ͍ͱͯ͠, ׂ. ⓒ 2014 Information Processing Society of Japan. 4.
(5) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.2 Vol.2014-EMB-35 No.20 2014/11/18. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ਤ 6. ղੳͷྲྀΕ. ͨ͠ॲཧͲ͏͠ʹؔੑ͕ແ͍͔Ͳ͏͔ߟ͑Δඞཁ͕͋. Δ. ͜ͷผํ๏ʹΑΓ, ΠϯͷΈͷΤοδΛॴ༗͍ͯ͠Δ. Δ. ຊఏҊํ๏ Cluster ͷϊʔυΛ PageRank ʹͯղ. ϊʔυΛΓམͱ͢͜ͱͰ PageRank ͷղੳ͕࣌ؒߋʹ. ੳ͢ΔࡍʹฒྻࢄॲཧΛߦ͏͕,Cluster ͷϊʔυଞ. ॖग़དྷΔͱߟ͑Δ.. ͷ Cluster Hub,Outlier ͱΤοδʹͯଓ͞Ε͍ͯͳ. 4 ͭʹΣϒάϥϑͷΫϥελϦϯάʹֻ͔Δ࣌ؒʹ. ͍. ͢ͳΘͪ,Cluster ͷϊʔυ Cluster ͷϊʔυؒ. ؔ͢Δݕ౼Ͱ͋Δ. ఏҊํ๏,PageRank ʹΑΔΣϒά. ͷΈʹͯΤοδ͕͋Γ, ؔੑ͕͋Δ. Αͬͯ,Cluster ͷ. ϥϑͷղੳʹલॲཧͱͳΔΣϒάϥϑͷΫϥελϦϯά. PageRank ղੳಠཱͨ͠ॲཧͱͳΓฒྻࢄԽ͕Մೳͱ. ࡞͕͋ۀΔͨΊ, ୯७ʹ PageRank ʹΑΔղੳΑΓॲཧ. ͳΔ.. ͕ଟ͘ͳ͍ͬͯΔ. ͦͷͨΊ, ΫϥελϦϯάॲཧ. 2 ͭʹղੳରͱͳΓ͏ΔΣϒάϥϑΛߟ͑Δ.SCAN. ࣌ؒͰऴΘΔ͜ͱ͕·͍͠. ࣮͢Δࡍ,SCAN ͷॲཧΛ. ૬ؔͳີʹޓੑ, ͢ͳΘͪ૬ͳີʹޓΤοδ͕͋Δ෦. MapReduce ʹରԠͤͨ͞ PSCAN[14] , Ϋϥελ͕. Λ Cluster ͱͯ͠ൈ͖ग़͢. Αͬͯ, ղੳରͱ͢ΔΣ. େ͖͍࣮ݱͷάϥϑߏʹରͯ͠ΫϥελϦϯάॲཧ࣌ؒ. ϒάϥϑʹશମʹૈີ͕ଘࡏ͢ΔάϥϑͰ͋Δඞཁੑ. ΛॖͰ͖Δվྑ ܕSCAN[15] ͕͋Γ, ͜ΕΒΛ༻͍Δ͜. ͕͋Δͱߟ͑Δ. ۩ମతͳղੳରͱͳΔΣϒάϥϑ,. ͱͰΑΓॲཧ࣌ؒΛॖ͢Δ͜ͱ͕ՄೳʹͳΔͷͰͳ͍. ΣϒΛීวతʹΫϩʔϦϯάͨ͠άϥϑͰ͋Δ͜ͱ͕. ͔ͱࢥΘΕΔ.. ·͍͠. ຊఏҊํ๏Ͱ PageRank ղੳͷߴԽ͕Ίͳ͍ Σϒάϥϑ, ୯ҰυϝΠϯΛରͱͨ͠άϥϑϒϩ άαΠτΛΫϩʔϦϯάͨ͠άϥϑղੳ͕࣌ؒॖͮ͠ Β͍ͱࢥΘΕΔ.. 6. MapReduce ʹΑΔఏҊํ๏ͷฒྻࢄԽ ຊఏҊํ๏ϊʔυ͕ԯ୯ҐҎ্ͷΣϒάϥϑ͕ ղੳରͱͳΔͨΊ, ΫϥελϦϯάॲཧͱ PageRank ʹ. 3 ͭʹΤοδͷѻ͍ʹؔ͢Δݕ౼Ͱ͋Δ. ࡏݱ SCAN. ΑΔղੳ MapReduce Λ༻͍ͨฒྻࢄॲཧΛߦ͏ํ. ͷํ๏Λ༻͠,Cluster ͱͷΤοδʹͯ͠ Hub ͱ Out-. ΛऔΔ.MapReduce େنσʔλΛฒྻࢄॲཧ͢. lier ͷผΛߦ͓ͬͯΓ, Τοδ͕͋Δ͔ͳ͍͔ͷΈΛ. ΔͨΊͷϑϨʔϜϫʔΫͰ͋Δ.Map ॲཧͰσʔλΛఆ. ผ݅ͱ͍ͯ͠Δ. ͜͜ʹΤοδͷຊ, ͢ͳΘͪΤοδʹ. ٛͨ͠ํ๏ͰׂΛߦ͍,Reduce ॲཧͰఆٛͨ͠ํ๏Ͱ. ॏΈͷใΛՃ͢Δ. ͜ͷใͷՃʹΑΓ, ॏΈͷԼݶ. ׂͨ͠σʔλΛऩू͠, ॲཧΛߦ͏.SCAN ʹؔͯ͠. ͷᮢΛઃఆ͠, ᮢʹຬͨͳ͍ॏΈ͖ΤοδΛॴ༗. PSCAN[14] Ͱ,PageRank ʹؔͯ͠ Data-intensive text. ͍ͯ͠ΔϊʔυΛΓམͱ͢͜ͱͰ PageRank ͷղੳ͕ߋ. processing with MapReduce[16] Ͱ ʹطMapReduce Ͱఆٛ. ʹߴԽग़དྷΔͱߟ͑Δ. ͜ͷଞʹΤοδͷํʹΑΔ. ͞Ε͍ͯΔͨΊ,Web graph ͷ࠶ߏஙͰ͋Δ Cluster graph. ผख๏ʹݕ౼ͷ༨͕͋Δͱߟ͑Δ. ରͱ͢Δϊʔυ͕. ͱ Compression graph ʹؔͯ͠ MapReduce ͷఆٛΛड़. ྆ํͷΤοδ, ͢ͳΘͪΠϯͱΞτ྆ํͷΤοδΛॴ. Δ.. ༗͍ͯ͠Δϊʔυͱ, ยํͷΈΤοδ, ͢ͳΘͪΠϯ·. ·ͣ,Cluster graph ʹ͓͚Δ MapReduce ʹ͍ͭͯड़. ͨΞτยํͷΤοδΛॴ༗͍ͯ͠Δϊʔυͱʹผ͢. Δ.Key/Value ද 1 ʹࣔ͢ͷΛ༻͍Δ. දதͷ label. ⓒ 2014 Information Processing Society of Japan. 5.
(6) Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.2 Vol.2014-EMB-35 No.20 2014/11/18. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ϊʔυʹ͍͍ͯΔ Cluster Hub,Outlier ͜ͱͰ͋. Hub ʹΑΔϥϯΩϯάΛ࡞͢Δ. ͦͷޙ,Cluster graph. Γ,node ϊʔυʹݻ༗Ͱ༩͑ΒΕ͍ͯΔ ID,outdegree. ͷ PageRank Λࢉग़͠,Cluster ͷϊʔυʹΑΔϥϯΩ. ϊʔυ͔Βग़͍ͯΔϦϯΫઌͷ node ͷ͜ͱΛࣔ͢. ·. ϯάΛ࡞, Ϛʔδ͢Δ.Cluster ͷϊʔυʹର͢Δϥϯ. ͨ,Cluster ͱ Hub ʹͦΕͧΕݻ༗ͷ ID ͕༩͑ΒΕ͍ͯ. ΩϯάΛٻΊΔ࣌,Cluster ֎ͷϊʔυؒͷΤοδແࢹ͢. Δͷͱ͢Δ.. ΔͨΊ, ͓͍ޓͷ Cluster ؔੑ͕ແ͘ͳΓ, ฒྻࢄॲ. ද 1. Cluster graph ʹ͓͚Δ Key/Value. Reduce. ͨΓͷ PageRank ͷղੳରͱͳΔϊʔυͱΤοδ͕ݮগ. Key. Value. label. node. ͢Δ. Ҏ্ΑΓ,PageRank ͷղੳ࣌ؒΛॖ͢Δ͜ͱ͕ग़. output. label. node, outdegree. དྷΔ. ͜ΕʹΑΓ,PageRank Λ༻͍͍ͯΔࡧݕΤϯδϯͳ. input. Cluster ID. node, outdegree. ͲͰ, ΑΓߴͳ݁ࡧݕՌΛฦ͢͜ͱ͕ՄೳʹͳΓ, ใࣾ. output. node. outdegree. ձʹ͓͚Δ࣭ͷߴ͍αʔϏεΛఏ͢ڙΔ͜ͱʹߩ͖Ͱݙ. input Map. ཧ͕ՄೳͱͳΔ. ߋʹ, ΫϥελͰΓ͚Δ͜ͱͰ 1 ճ͋. Δ. ࠓޙ࣮ݧΛߦ͍, ຊఏҊํ๏͕Մೳ͔ূݕΛߦ͏༧ఆ. Cluster graph ʹ͓͚Δ MapReduce , ·ͣ Map ॲཧͱ. Ͱ͋Δ. ՄೳͰ͋Ε, ࡏݱ SCAN Λར༻͍ͯ͠Δ͕, ଞ. ͯ͠,input Ͱ,Cluster Hub,Outlier ͷϥϕϧΛ Key ͱ. ͷΫϥελϦϯάํ๏ͰఏҊํ๏͕࣮͖ͰݱΔՄೳੑ͕. ͠, ͦΕͧΕʹରԠ͢Δ node Λ Value ʹૠೖ͢Δ.output. ͋ΔͱࢥΘΕΔͨΊ, ՃͰ࣮ݧΛߦ͏༧ఆͰ͋Δ.. Ͱ,Value ͷ node ʹରԠ͢Δ outdegree ΛՃ͢Δ. ࣍ʹ. Reduce ॲཧͱͯ͠,input Ͱ,Key ͔Β Cluster ͷϥϕϧ. ࢀߟจݙ. ͕͍ͨͷΛऩू͢Δ.output Ͱ,Value ʹ֨ೲ͞Εͯ. [1]. ͍Δ node Λ Key ʹૠೖ͢Δ.Reduce ॲཧΛ Cluster ID ͝ ͱʹ܁Γฦ͠, શͯͷ Cluster graph Λ࡞͢Δ.. [2]. ࣍ʹ,Compression graph ʹ͓͚Δ MapReduce ʹ͍ͭͯ ड़Δ.Key/Value ද 2 ʹࣔ͢ͷΛ༻͍Δ. ද 2. [3]. Compression graph ʹ͓͚Δ Key/Value Key. Value. input. label. node. Map. output. label. node, outdegree. input. Hub ID. node, outdegree. Reduce. output. Hub ID. Cluster ID. Map ॲཧʹؔͯ͠ Cluster graph ͱಉ͡ॲཧΛߦ͏.Re-. [4]. [5] [6]. duce ॲཧͱͯ͠,input Ͱ,Key ͔Β Hub ͷϥϕϧ͕͍ ͨͷΛऩू͢Δ. ऩूޙ,Value ͷ outdegree ͔Β, Ͳͷ. Cluster ͱଓ͍ͯ͠Δͷ͔Λௐ,Cluster ID Λ output ͷ Value ʹՃ͢Δ.Reduce ॲཧΛ Hub ID ͝ͱʹ܁Γฦ. [7]. ͠,Compression graph Λ࡞͢Δ.. 7. ͓ΘΓʹ ຊͰڀݚ, ΣϒάϥϑΛ PageRank ͷղੳ͕ฒྻ ࢄॲཧՄೳͳάϥϑ࠶ߏஙΛߦ͍,PageRank ͷղੳΛߦ. [8]. [9]. ͏ࡍʹฒྻࢄॲཧΛ༻͍Δ͜ͱͰ, ղੳʹඞཁͳ࣌ؒΛ ॖ͢Δํ๏ΛఏҊ͢Δ. ఏҊํ๏ʹΑΔΣϒάϥϑղੳ. [10]. ํ๏,3 ͭͷεςοϓͰߦΘΕΔ.1 ͭ,SCAN Λ༻͍ ͯ Web graph ΛΫϥελϦϯά͠ Cluster ͱ Hub,Outlier. [11]. ʹϊʔυผ͢Δ.2 ͭʹ,PageRank ͷղੳ͕ฒྻࢄॲ ཧՄೳʹ͢ΔͨΊ,Cluster ͱ Hub Λ༻͍ͨ Compression. graph ͱ Cluster ͷΣϒάϥϑͰ͋Δ Cluster graph ͷ 2 छྨͷάϥϑΛ࡞,Web graph Λ࠶ߏங͢Δ.3 ͭ ,Compression graph ͷ PageRank Λࢉग़͠,Cluster ͱ. ⓒ 2014 Information Processing Society of Japan. [12]. Brin, S. and Page, L.: The anatomy of a large-scale hypertextual Web search engine, Computer networks and ISDN systems, Vol. 30, No. 1, pp. 107–117 (1998). Kleinberg, J. M.: Authoritative sources in a hyperlinked environment, Journal of the ACM (JACM), Vol. 46, No. 5, pp. 604–632 (1999). Kumar, R., Raghavan, P., Rajagopalan, S. and Tomkins, A.: Trawling the Web for emerging cyber-communities, Computer networks, Vol. 31, No. 11, pp. 1481–1493 (1999). Reddy, P. K. and Kitsuregawa, M.: An approach to relate the web communities through bipartite graphs, Web Information Systems Engineering, 2001. Proceedings of the Second International Conference on, Vol. 1, IEEE, pp. 301–310 (2001). Netcraft: October 2014 Web Server Survey. Buehrer, G. and Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities, Proceedings of the 2008 International Conference on Web Search and Data Mining, ACM, pp. 95–106 (2008). ย߂থɼ্ాߴಙɼࢁ໊ૣਓɿLittleWeb: ྨࣅϊʔυू ʹΑΔ Web άϥϑѹॖख๏ɼThe 2nd Forum on Data Engineering and Information Management, DBSJ, pp. E1–4 (2010). Kohlsch¨ utter, C., Chirita, P.-A. and Nejdl, W.: Efficient parallel computation of pagerank, Advances in information retrieval, Springer, pp. 241–252 (2006). Wicks, J. and Greenwald, A.: Parallelizing the computation of pagerank, Algorithms and Models for the WebGraph, Springer, pp. 202–208 (2007). Kamvar, S., Haveliwala, T., Manning, C. and Golub, G.: Exploiting the block structure of the web for computing pagerank, Stanford University Technical Report (2003). Xu, X., Yuruk, N., Feng, Z. and Schweiger, T. A.: Scan: a structural clustering algorithm for networks, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 824–833 (2007). Boldi, P. and Vigna, S.: The webgraph framework I: compression techniques, Proceedings of the 13th international conference on World Wide Web, ACM, pp.. 6.
(7) ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. [13]. [14]. [15]. [16]. Vol.2014-DBS-160 No.20 Vol.2014-OS-131 No.2 Vol.2014-EMB-35 No.20 2014/11/18. 595–602 (2004). γϡςΠɼɹยࢁ܆ɿࢉज़ූ߸Λ༻͍ͨΣϒάϥϑ දݱͷͨΊͷѹॖํ๏ɼThe 6nd Forum on Data Engineering and Information Management, DBSJ, pp. D6–1 (2014). Zhao, W., Martha, V. and Xu, X.: PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce, Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on, IEEE, pp. 862–869 (2013). Ԙߒতɼ౻ݪ༃ɼَ௩ɹਅɿߏతྨࣅʹͮ͘جά ϥϑΫϥελϦϯάͷߴԽɼThe 6th Forum on Data Engineering and Information Management, DBSJ, pp. D6–2 (2014). Lin, J. and Dyer, C.: Data-intensive text processing with MapReduce, Synthesis Lectures on Human Language Technologies, Vol. 3, No. 1, pp. 1–177 (2010).. ⓒ 2014 Information Processing Society of Japan. 7.
(8)
関連したドキュメント
[r]
定期的に採集した小学校周辺の水生生物を観 察・分類した。これは,学習指導要領の「身近
[r]
血は約60cmの落差により貯血槽に吸引される.数
The goods and/or their replicas, the technology and/or software found in this catalog are subject to complementary export regulations by Foreign Exchange and Foreign Trade Law
過水タンク並びに Sr 処理水貯槽のうち Sr 処理水貯槽(K2 エリア)及び Sr 処理水貯槽(K1 南エリア)の放射能濃度は,水分析結果を基に線源条件を設定する。RO
過水タンク並びに Sr 処理水貯槽のうち Sr 処理水貯槽(K2 エリア)及び Sr 処理水貯槽(K1 南エリア)の放射能濃度は,水分析結果を基に線源条件を設定する。RO
[r]