A Social Network Analysis Model Incorporating Text Information on Social Media
Authors
五十嵐 未来, 照井 伸彦
Journal
DSSR Discussion Papers
Issue
J-7
Pages
1-27
Year of publication
2020-05
URL
http://hdl.handle.net/10097/00127725
Data Science and Service Research
Discussion Paper
Discussion Paper No.J-7
A Social Network Analysis Model Incorporating Text Information on Social Media
五十嵐 未来, 照井 伸彦
May 2020
Center for Data Science and Service Research Graduate School of Economics and Management
Tohoku University 27-1 Kawauchi, Aobaku
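Abstract
In recent years, when social networks are modeled and analyzed, it has become increasingly important to capture community structure by considering not only network information but also the text that people generate on social media and elsewhere. Taking text into account makes it possible to analyze social networks with complex community structures, in which the densely connected parts of the network contain several groups that reflect people's interests. In this study, we extend the joint model of network data and text data proposed for this purpose by Igarashi and Terui (2020). By formulating the edge-generation probability so that it differs across the nodes involved, the extended model accounts for a general property of social networks: node degrees follow a power law. In an empirical analysis using Twitter data, we apply the proposed model to real data and show that it predicts better than existing models from previous studies.

Keywords: social network analysis; community detection; text analysis; topic modeling; Bayesian estimation; heterogeneity of node degree

* Doctoral student, Graduate School of Economics and Management, Tohoku University, 27-1 Kawauchi, Aoba-ku, Sendai, Miyagi 980-8576, Japan (E-mail: [email protected]). This work was supported by JSPS KAKENHI Grant Number 18J20698.
Professor, Graduate School of Economics and Management, Tohoku University (E-mail: [email protected]). This work was supported by JSPS KAKENHI (A) 17H01001.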
1
Introduction
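Social networking sites (SNS) have spread and e-commerce sites have risen. As a result, analyzing the social networks that surround consumers and understanding their structure now play an important role in firms' marketing activities. Methods for social network analysis have been developed mainly in statistics and sociology, and many statistical models for capturing network structure have been proposed (e.g., Snijders and Nowicki, 1997; Airoldi et al., 2008). These models treat the nodes and edges of a network as observed data and aim to extract community structure from them. In a social network the nodes represent people. Research that refines network models by also considering auxiliary data, such as people's attributes and behavior, has likewise been pursued actively (e.g., Handcock et al., 2007). In particular, social media have spread and e-commerce sites with review (word-of-mouth) functions have emerged. Many recent social network analysis models therefore combine the network with user-generated content (UGC), especially text (e.g., Liu et al., 2009; Bouveyron et al., 2018).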
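One advantage of a model that uses text as well as network information is that it can extract community structures that are hard to capture from either source alone. For example, consider a community made up of the students in one grade at a school. These students are presumably related to one another in some way, so they form a dense network. A model that uses only network information would therefore recognize a single community in that network. At the same time, however, the students are likely to have various hobbies, such as music, reading, and sports. Treating the students who share a hobby as forming several separate communities may give a more meaningful segmentation. Igarashi and Terui (2020) call such communities topic-based communities and propose detecting them with a model that uses both network and text. Online social networks, typified by social media, are likely to contain two kinds of connection: ties based on real-world social relationships, and ties based on shared interests, that is, topic-based communities. Estimating users' interests from the text they post on such media therefore makes it possible to refine social network analysis models.

Another property of social networks is that node degrees follow a power law. A very small number of people have ties with many others in the network, while the vast majority are tied to only a few people. Degrees in real social networks are thus heterogeneous across nodes. However, Igarashi and Terui (2020) do not account for this degree heterogeneity, and neither do many representative network models such as the stochastic blockmodel (Snijders and Nowicki, 1997).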
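In this study, we extend the model of Igarashi and Terui (2020). We replace its blockmodel-style edge-generation probability with edge probabilities that vary from node to node, so that the model accounts for degree heterogeneity. In the empirical analysis, we compare the proposed model with a baseline that ignores this heterogeneity, as the standard stochastic blockmodel does, and show that the proposed model predicts better out of sample.

The rest of the paper is organized as follows. Section 2 reviews previous research on social network analysis and clarifies the aim and positioning of this study. Section 3 describes the proposed model, and Section 4 derives its estimation method. Section 5 reports an empirical study using Twitter data. Section 6 gives conclusions and directions for future work.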
2
Previous Research
2.1
Development of Social Network Analysis Models
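Statistics and sociology have a long history of research that models social networks in order to understand their structure. A representative example is the stochastic blockmodel (Stochastic Block Models, SBM; Wang and Wong, 1987; Snijders and Nowicki, 1997). The SBM assumes that each node belongs to exactly one of $K$ communities. Let $z_i \in \{1, \ldots, K\}$ denote the community of node $i$. The probability that an edge is generated between nodes $i$ and $j$ is then $\psi_{z_i z_j}$, where $\Psi = (\psi_{kk'})$ is a $K \times K$ matrix of parameters.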
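The SBM edge rule can be made concrete with a short simulation. The following is a minimal sketch that assumes NumPy and 0-based community labels. The function name `sample_sbm` is hypothetical and is not taken from the paper. It builds the matrix of pairwise probabilities $\psi_{z_i z_j}$ and draws one Bernoulli edge per ordered pair.

```python
import numpy as np


def sample_sbm(z, Psi, rng):
    """Sample a directed adjacency matrix from a stochastic blockmodel (sketch).

    z   : length-D sequence of community labels in {0, ..., K-1}
          (single membership: one label per node)
    Psi : K x K array; Psi[k, k2] is the edge probability from community k to k2
    rng : numpy.random.Generator
    """
    z = np.asarray(z)
    Psi = np.asarray(Psi, dtype=float)
    D = z.shape[0]
    # Edge probability for every ordered pair (i, j) is psi_{z_i z_j}
    P = Psi[z[:, None], z[None, :]]
    A = (rng.random((D, D)) < P).astype(int)
    np.fill_diagonal(A, 0)    # no self-loops: a_ii = 0
    np.fill_diagonal(P, 0.0)
    return A, P


rng = np.random.default_rng(0)
A, P = sample_sbm([0, 0, 1, 1], [[0.9, 0.1], [0.05, 0.8]], rng)
print(P[0, 1], P[0, 2], P[2, 0])  # 0.9 0.1 0.05
```

Every pair of nodes that shares a community pair gets the same probability. The MMSB and the models discussed below relax exactly this restriction.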
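Since its introduction, the SBM has been extended in many directions. For example, Airoldi et al. (2008) proposed the mixed membership stochastic blockmodel (Mixed Membership Stochastic Blockmodels, MMSB) as an extension of the SBM. The SBM assumes a single membership for each node. The MMSB instead allows a node to belong to a different community in each of its relationships with other nodes. For the relationship from node $i$ to node $j$, let $s_{ij}$ be the community to which node $i$ belongs and $r_{ji}$ the community to which node $j$ belongs. The probability that an edge is generated between them is then $\psi_{s_{ij} r_{ji}}$. With this extension the MMSB can represent overlapping communities, which never overlap in the SBM, and so allows more realistic modeling.

Sociologists also know that relationships between nodes are influenced by node-specific features such as gender and age (Hoff et al., 2002; Handcock et al., 2007; Krivitsky et al., 2009). This study, however, focuses on online social networks such as social media, so it does not consider such features. On anonymous social media such as Twitter, users can register accounts while hiding personal information such as age and gender. In that situation, a user who forms a tie with someone can draw on only two kinds of information: the network that person has formed and the content that person has posted. If such data were available, they could easily be incorporated into the proposed model, which would also allow analysis from a sociological perspective.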
2.2
Joint Modeling of Network and Text Information
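The studies of social network models reviewed in the previous subsection use only network information. More recently, models that use both network and text information have been studied actively, in order to better understand the structure of online social networks such as Twitter and Facebook. For example, Chang and Blei (2010) applied a topic model to the text attached to each node and proposed the relational topic model (Relational Topic Model, RTM). In the RTM, the topic proportions assigned to the nodes' texts determine the probability of a link between nodes. The aim of the RTM, however, is to estimate the topics in the text while taking network information into account. The aim of this study is the reverse: to understand the community structure of the network while taking text information into account.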
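Chang and Blei (2010) incorporate text into a network model through latent Dirichlet allocation (LDA; Blei et al., 2003) or one of its extensions, and several other studies take the same approach. For example, Liu et al. (2009) proposed Topic-Link LDA. It shares this study's aim of detecting community structure by considering node-specific text information. Unlike this study, however, it assumes, as the SBM does, that each node belongs to a single community. Moreover, Liu et al. (2009) define the edge-generation probability through the similarity of the nodes' topic and community proportions, so the target network is assumed to be an undirected graph. Blockmodels, including the one proposed here, instead model the network with a $K \times K$ matrix of edge-probability parameters, so they can be applied whether or not the graph is directed. In addition, Bouveyron et al. (2018) proposed the stochastic topic block model (Stochastic Topic Block Model, STBM), which extends the SBM by adding a model of the text.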
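These models are extensions of the SBM that assume a single membership. Zhu et al. (2013), by contrast, proposed a network analysis model that assumes mixed membership and uses both text and network information. In this respect its structure resembles the proposed model. The main difference is that in Zhu et al. (2013) the communities assigned to edges and the topics assigned to words follow the same distribution. In other words, Zhu et al. (2013) treat the community dimension and the topic dimension as identical. In real social networks, however, communities and topics do not necessarily correspond one to one. For example, consider a network in which members interested in music and in sports are densely linked. If the model of Zhu et al. (2013) detected this community, the single community would correspond to a single topic that mixes two semantic groups, music and sports, and the topic would be hard to interpret. In this study, communities and topics are instead assumed to follow different distributions. For a network like this one, a single community can therefore be associated with several separate topics, such as a music topic and a sports topic. Section 3 gives the formulation in detail.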
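Building on these existing models, Igarashi and Terui (2020) proposed a joint model of network and text that assumes mixed membership of nodes. In this study, we extend that model so that the edge probabilities become heterogeneous, node-specific parameters. This allows the model to account for the degree heterogeneity that social networks generally exhibit. In earlier work, Karrer and Newman (2011) also dropped the SBM's assumption of edge-generation probabilities that are homogeneous across nodes. They introduced each node's expected degree as a parameter and used it to adjust the edge-generation probability to the nodes involved. Our formulation takes a different approach from Karrer and Newman (2011): it defines the edge-generation probability itself as a node-specific parameter.

Table 1 summarizes how this study compares with the previous studies discussed so far. First, some models treat only the network or only the text as observed data. The proposed model uses both for social network analysis, so, as noted above, it may reveal network structures that are hard to capture from either source alone. Second, some existing models also handle both sources. Compared with them, this study is distinctive in three ways: it allows mixed membership of nodes, it applies to both directed and undirected graphs, and it models the degree heterogeneity of social networks. In light of these comparisons, Section 5 evaluates predictive performance against two comparison models. The first is Igarashi and Terui (2020), which corresponds to the proposed model without degree heterogeneity. The second is Airoldi et al. (2008), which corresponds to a model that does not use text information.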
3
Model
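This section first introduces the model of Igarashi and Terui (2020), on which the proposed model is based. It then describes the model used in this study and highlights the differences. Both models observe two kinds of data: the adjacency matrix $A$, which carries the network information, and the bag-of-words set $W$, which carries the text specific to each node.

First, consider a directed graph with $D$ nodes. Its adjacency matrix $A$ is a $D \times D$ matrix whose elements are binary variables describing the relationships between nodes. $a_{ij} = 0$ means that there is no edge from $i$ to $j$, and $a_{ij} = 1$ means that there is one. Self-loops are not considered, so $a_{ii} = 0$ for all $i$.

Igarashi and Terui (2020) make the following assumption about the relationship from node $i$ to node $j$. The sender $i$ belongs to a latent community $s_{ij} \in \{1, \ldots, K\}$, where $K$ is the number of communities, and the receiver $j$ belongs to a latent community $r_{ji} \in \{1, \ldots, K\}$. The matrix representations of these latent communities are $S = (s_{ij})$ and $R = (r_{ji})$. In the generative process, the sender and receiver communities follow categorical distributions, $s_{ij} \mid \eta_i \sim \mathrm{Categorical}(\eta_i)$ and $r_{ji} \mid \eta_j \sim \mathrm{Categorical}(\eta_j)$. Here $\eta_i = (\eta_{i1}, \ldots, \eta_{iK})$ gives the community membership proportions of node $i$ and satisfies $\sum_k \eta_{ik} = 1$. The matrix representation of the community distributions is $H = (\eta_1, \ldots, \eta_D)$. The prior of $H$ is a Dirichlet distribution, $\eta_i \mid \gamma \sim \mathrm{Dirichlet}(\gamma)$, where the hyperparameter $\gamma = (\gamma_1, \ldots, \gamma_K)$ must be tuned for estimation.

Given $s_{ij}$ and $r_{ji}$, the relationship $a_{ij}$ between nodes $i$ and $j$ follows a Bernoulli distribution, $a_{ij} \mid s_{ij} = k, r_{ji} = k', \Psi \sim \mathrm{Bernoulli}(\psi_{kk'})$. Here $\psi_{kk'}$ is the probability that an edge is generated when the sender's community is $k$ and the receiver's community is $k'$. The $K \times K$ matrix of edge probabilities is $\Psi = (\psi_{kk'})$. Each element has a beta prior, $\psi_{kk'} \mid \delta_{kk'}, \epsilon_{kk'} \sim \mathrm{Beta}(\delta_{kk'}, \epsilon_{kk'})$, where the hyperparameters $\delta$ and $\epsilon$ have the same dimensions as $\Psi$. Given the community distributions $H$, the distribution of the network data is therefore

$$
p(A, S, R, \Psi \mid H) = p(A \mid S, R, \Psi)\, p(S \mid H)\, p(R \mid H)\, p(\Psi \mid \delta, \epsilon)
= \prod_{i=1}^{D} \prod_{j=1, j \neq i}^{D} \left\{ p(a_{ij} \mid s_{ij}, r_{ji}, \Psi)\, p(s_{ij} \mid \eta_i)\, p(r_{ji} \mid \eta_j) \right\}
\prod_{k=1}^{K} \prod_{k'=1}^{K} p(\psi_{kk'} \mid \delta_{kk'}, \epsilon_{kk'}). \tag{1}
$$

Next, consider the text specific to each node. For the text generated by node $i$, we ignore word order and observe the $M_i$ words stored in bag-of-words form. The $m$-th word $w_{im}$ of node $i$ has a latent community $x_{im} \in \{1, \ldots, K\}$ and a latent topic $z_{im} \in \{1, \ldots, L\}$, where $L$ is the number of topics. The arrays of word communities and word topics are $X$ and $Z$; their $i$-th elements are $M_i$-dimensional vectors.

In the generative process, the word community follows a categorical distribution, $x_{im} \mid \eta_i \sim \mathrm{Categorical}(\eta_i)$. Recall that $\eta_i$ also generates the node communities $s_{ij}$ and $r_{ji}$. Thus $\eta_i$ is shared by the network model and the text model, and it is the parameter that links the two sources of information.

Given the word community, the word topic follows a categorical distribution, $z_{im} \mid x_{im} = k, \Theta \sim \mathrm{Categorical}(\theta_k)$. Here $\theta_k = (\theta_{k1}, \ldots, \theta_{kL})$ gives the topic proportions of community $k$ and satisfies $\sum_l \theta_{kl} = 1$. The matrix representation of the topic distributions is $\Theta = (\theta_1, \ldots, \theta_K)$, with Dirichlet prior $\theta_k \mid \alpha \sim \mathrm{Dirichlet}(\alpha)$.

Given the word topic $z_{im}$, the word $w_{im} \in \{1, \ldots, V\}$, where $V$ is the vocabulary size, follows the categorical distribution of that topic, $w_{im} \mid z_{im} = l, \Phi \sim \mathrm{Categorical}(\phi_l)$. Here $\phi_l = (\phi_{l1}, \ldots, \phi_{lV})$ is the word distribution, which gives the probability of each word under the topic and satisfies $\sum_v \phi_{lv} = 1$. The matrix representation of the word distributions is $\Phi = (\phi_1, \ldots, \phi_L)$, with Dirichlet prior $\phi_l \sim \mathrm{Dirichlet}(\beta)$. Again given $H$, the distribution of the text data is therefore

$$
p(W, X, Z, \Theta, \Phi \mid H) = p(W \mid Z, \Phi)\, p(Z \mid X, \Theta)\, p(X \mid H)\, p(\Theta \mid \alpha)\, p(\Phi \mid \beta)
= \prod_{i=1}^{D} \prod_{m=1}^{M_i} \left\{ p(w_{im} \mid z_{im}, \Phi)\, p(z_{im} \mid x_{im}, \Theta)\, p(x_{im} \mid \eta_i) \right\}
\prod_{k=1}^{K} p(\theta_k \mid \alpha) \prod_{l=1}^{L} p(\phi_l \mid \beta). \tag{2}
$$

The conditional distributions (1) and (2) are assumed to be independent given $H$. The joint distribution of Igarashi and Terui (2020) is therefore the product of (1), (2), and the density of $H$:

$$
\begin{aligned}
p(A, W, S, R, X, Z, H, \Psi, \Theta, \Phi) ={}& \prod_{i=1}^{D} \left[ \prod_{j=1, j \neq i}^{D} \left\{ p(a_{ij} \mid s_{ij}, r_{ji}, \Psi)\, p(s_{ij} \mid \eta_i)\, p(r_{ji} \mid \eta_j) \right\} \prod_{m=1}^{M_i} \left\{ p(w_{im} \mid z_{im}, \Phi)\, p(z_{im} \mid x_{im}, \Theta)\, p(x_{im} \mid \eta_i) \right\} \right] \\
&\times \prod_{i=1}^{D} p(\eta_i \mid \gamma) \prod_{k=1}^{K} \prod_{k'=1}^{K} p(\psi_{kk'} \mid \delta_{kk'}, \epsilon_{kk'}) \prod_{k=1}^{K} p(\theta_k \mid \alpha) \prod_{l=1}^{L} p(\phi_l \mid \beta).
\end{aligned} \tag{3}
$$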
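A forward simulation makes the generative process behind Eqs. (1)–(3) concrete. It shows how the same $\eta_i$ drives both the sender/receiver communities and the word communities. The sketch below is minimal and assumes NumPy, 0-based indices, and small arbitrary sizes. The function `simulate_joint` and its argument names are hypothetical, and `delta` and `eps` stand for the two Beta hyperparameters of $\psi_{kk'}$.

```python
import numpy as np


def simulate_joint(D, K, L, V, M, gamma, delta, eps, alpha, beta, rng):
    """Forward-simulate the homogeneous network + text model (sketch).

    M: list of word counts M_i per node; delta, eps: K x K Beta hyperparameters.
    """
    eta = rng.dirichlet(gamma, size=D)      # D x K community proportions eta_i
    Psi = rng.beta(delta, eps)              # K x K edge probabilities psi_{kk'}
    Theta = rng.dirichlet(alpha, size=K)    # K x L topic proportions theta_k
    Phi = rng.dirichlet(beta, size=L)       # L x V word distributions phi_l

    A = np.zeros((D, D), dtype=int)
    S = np.full((D, D), -1)                 # S[i, j] = s_ij (sender community)
    R = np.full((D, D), -1)                 # R[j, i] = r_ji (receiver community)
    for i in range(D):
        for j in range(D):
            if i == j:
                continue                    # no self-loops
            S[i, j] = rng.choice(K, p=eta[i])
            R[j, i] = rng.choice(K, p=eta[j])
            A[i, j] = int(rng.random() < Psi[S[i, j], R[j, i]])

    X, Z, W = [], [], []
    for i in range(D):
        x = rng.choice(K, size=M[i], p=eta[i])                            # x_im
        z = np.array([rng.choice(L, p=Theta[k]) for k in x], dtype=int)  # z_im
        w = np.array([rng.choice(V, p=Phi[l]) for l in z], dtype=int)    # w_im
        X.append(x)
        Z.append(z)
        W.append(w)
    return dict(eta=eta, Psi=Psi, Theta=Theta, Phi=Phi, A=A, S=S, R=R, X=X, Z=Z, W=W)


rng = np.random.default_rng(42)
K, L, V = 3, 4, 20
out = simulate_joint(D=6, K=K, L=L, V=V, M=[5, 8, 3, 6, 4, 7],
                     gamma=np.ones(K), delta=np.full((K, K), 0.1),
                     eps=np.full((K, K), 0.1), alpha=np.full(L, 0.1),
                     beta=np.full(V, 0.1), rng=rng)
```

The network and the words are generated independently once `eta` is fixed. This is the conditional independence given $H$ that makes the joint density factor as in Eq. (3).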
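The model of Igarashi and Terui (2020) aims to uncover community structure in a network while taking into account the text that users generate; in other words, it aims to find topic-based communities. It assumes that the edge probability between nodes, $a_{ij} \mid s_{ij} = k, r_{ji} = k' \sim \mathrm{Bernoulli}(\psi_{kk'})$, is the same for all nodes. As explained earlier, however, degrees in real social networks generally vary greatly across nodes. Because the model of Igarashi and Terui (2020) cannot capture this property, it may not fit real network data well.

To address this problem, we extend the model by replacing the edge-generation part with $a_{ij} \mid s_{ij} = k, r_{ji} = k' \sim \mathrm{Bernoulli}(\psi_{jkk'})$. Here $\psi_{jkk'}$ is the probability that an edge is generated when the sender's community is $k$ and the receiver's community is $k'$. It is a heterogeneous parameter that depends on the receiving node $j$. Suppose, for example, that the receiver $j$ is a hub node that attracts many edges within community $k'$. The formulation expresses this through a large value of $\psi_{jkk'}$. The proposed model thus reflects the degree heterogeneity of social networks: it estimates the edge-probability parameters separately for each node, in line with that node's degree, which makes the model closer to real social networks. For node $i$, the $K \times K$ matrix of edge probabilities is $\Psi_i = (\psi_{ikk'})$. Each element has a beta prior, $\psi_{ikk'} \mid \delta_{kk'}, \epsilon_{kk'} \sim \mathrm{Beta}(\delta_{kk'}, \epsilon_{kk'})$.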
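Relative to the homogeneous model, the only change is that the Bernoulli parameter is looked up in the receiver's own matrix $\Psi_j$. Below is a minimal sketch that assumes NumPy and 0-based indices; the function name is hypothetical. In the toy setting, node 0 plays the role of a hub.

```python
import numpy as np


def sample_edges_heterogeneous(S, R, Psi_nodes, rng):
    """Draw a_ij ~ Bernoulli(psi_{j, s_ij, r_ji}) with receiver-specific matrices.

    S[i, j] = s_ij and R[j, i] = r_ji (0-based); Psi_nodes has shape D x K x K,
    and Psi_nodes[j] is the K x K edge-probability matrix of receiver j.
    """
    D = S.shape[0]
    A = np.zeros((D, D), dtype=int)
    for i in range(D):
        for j in range(D):
            if i == j:
                continue
            p = Psi_nodes[j, S[i, j], R[j, i]]   # depends on the receiver j
            A[i, j] = int(rng.random() < p)
    return A


# Toy example: everyone is in community 0; node 0 is a "hub" receiver.
D, K = 4, 2
S = np.zeros((D, D), dtype=int)
R = np.zeros((D, D), dtype=int)
Psi_nodes = np.zeros((D, K, K))
Psi_nodes[0] = 1.0           # every edge directed to node 0 is generated
A = sample_edges_heterogeneous(S, R, Psi_nodes, np.random.default_rng(0))
print(A.sum(axis=0))         # in-degrees: [3 0 0 0]
```

Setting `Psi_nodes[j]` to the same matrix for every `j` recovers the homogeneous model of Igarashi and Terui (2020).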
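Apart from the points described above, the model used in this study keeps the formulation of Igarashi and Terui (2020). The conditional distribution of the network data given the community distributions $H$, Eq. (1), therefore becomes

$$
p(A, S, R, \Psi \mid H) = p(A \mid S, R, \Psi)\, p(S \mid H)\, p(R \mid H)\, p(\Psi \mid \delta, \epsilon)
= \prod_{i=1}^{D} \left[ \prod_{j=1, j \neq i}^{D} \left\{ p(a_{ij} \mid s_{ij}, r_{ji}, \Psi_j)\, p(s_{ij} \mid \eta_i)\, p(r_{ji} \mid \eta_j) \right\} \prod_{k=1}^{K} \prod_{k'=1}^{K} p(\psi_{ikk'} \mid \delta_{kk'}, \epsilon_{kk'}) \right], \tag{4}
$$

where $\Psi = (\Psi_1, \ldots, \Psi_D)$.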
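For a given assignment of latent communities, the first factor of Eq. (4), $p(A \mid S, R, \Psi)$, reduces to a sum of Bernoulli log-probabilities. Below is a minimal sketch that assumes NumPy and 0-based indices; the function name is hypothetical. The priors on $S$, $R$ and $\Psi$ are left out.

```python
import numpy as np


def network_loglik(A, S, R, Psi_nodes):
    """log p(A | S, R, Psi) for the heterogeneous model (first factor of Eq. 4).

    A: D x D 0/1 adjacency; S[i, j] = s_ij and R[j, i] = r_ji (0-based);
    Psi_nodes: D x K x K, where Psi_nodes[j] holds psi_{j k k'}.
    """
    D = A.shape[0]
    ll = 0.0
    for i in range(D):
        for j in range(D):
            if i == j:
                continue  # a_ii is not modeled
            p = Psi_nodes[j, S[i, j], R[j, i]]
            ll += np.log(p) if A[i, j] == 1 else np.log1p(-p)  # log(1 - p)
    return ll


A = np.array([[0, 1], [0, 0]])
S = np.zeros((2, 2), dtype=int)
R = np.zeros((2, 2), dtype=int)
print(network_loglik(A, S, R, np.full((2, 1, 1), 0.25)))  # log(0.25) + log(0.75)
```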
4
Conditional Posterior Distributions and Parameter Estimation
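Previous studies have proposed many methods for estimating topic models, including variational Bayes and online (sequential) learning. One of the most widely used is collapsed Gibbs sampling (CGS; Griffiths and Steyvers, 2004). CGS integrates out the model parameters when it derives the posterior distribution of the latent variables, which makes sampling efficient. Below we derive the conditional posterior distributions needed to run CGS for the proposed model.

The proposed model has four parameters: the community distributions $H$, the edge probabilities $\Psi$, the topic distributions $\Theta$, and the word distributions $\Phi$. Because the priors are conjugate to the likelihood, their conditional posterior distributions are known distributions; the detailed derivations are given in Appendix A. The remaining latent variables are the sender and receiver latent communities $S$ and $R$, the latent word communities $X$, and the latent topics $Z$. Using the posterior distributions derived in Appendix A, their conditional posterior distributions are

$$
\begin{aligned}
&p(s_{ij} = k, r_{ji} = k' \mid a_{ij}, A_{\backslash ij}, S_{\backslash ij}, R_{\backslash ji}, X, \gamma, \delta, \epsilon) \\
&\propto \iint p(s_{ij} = k \mid \eta_i)\, p(r_{ji} = k' \mid \eta_j)\, p(x_i \mid \eta_i)\, p(x_j \mid \eta_j)\, p(\eta_i \mid S_{\backslash ij}, R_{\backslash ji}, X, \gamma)\, p(\eta_j \mid S_{\backslash ij}, R_{\backslash ji}, X, \gamma)\, d\eta_i\, d\eta_j \\
&\quad \times \int p(a_{ij} \mid \psi_{jkk'})\, p(\psi_{jkk'} \mid A_{\backslash ij}, S_{\backslash ij}, R_{\backslash ji}, \delta, \epsilon)\, d\psi_{jkk'} \\
&= \frac{N_{ik \backslash ij} + M_{ik} + \gamma_k}{\sum_t \left( N_{it \backslash ij} + M_{it} + \gamma_t \right)} \times \frac{N_{jk' \backslash ji} + M_{jk'} + \gamma_{k'}}{\sum_t \left( N_{jt \backslash ji} + M_{jt} + \gamma_t \right)} \times \frac{\left( n^{(+)}_{jkk' \backslash ij} + \delta_{kk'} \right)^{I(a_{ij} = 1)} \left( n^{(-)}_{jkk' \backslash ij} + \epsilon_{kk'} \right)^{I(a_{ij} = 0)}}{n^{(+)}_{jkk' \backslash ij} + n^{(-)}_{jkk' \backslash ij} + \delta_{kk'} + \epsilon_{kk'}},
\end{aligned} \tag{5}
$$

$$
\begin{aligned}
&p(x_{im} = k, z_{im} = l \mid W, S, R, X_{\backslash im}, Z_{\backslash im}, \alpha, \beta, \gamma) \\
&\propto \int p(s_i, r_i \mid \eta_i)\, p(x_{im} = k \mid \eta_i)\, p(\eta_i \mid S, R, X_{\backslash im}, \gamma)\, d\eta_i \times \int p(z_{im} = l \mid \theta_k)\, p(\theta_k \mid X_{\backslash im}, Z_{\backslash im}, \alpha)\, d\theta_k \\
&\quad \times \int p(w_{im} = v \mid \phi_l)\, p(\phi_l \mid W_{\backslash im}, Z_{\backslash im}, \beta)\, d\phi_l \\
&= \frac{N_{ik} + M_{ik \backslash im} + \gamma_k}{\sum_t \left( N_{it} + M_{it \backslash im} + \gamma_t \right)} \times \frac{M_{kl \backslash im} + \alpha_l}{\sum_q \left( M_{kq \backslash im} + \alpha_q \right)} \times \frac{M_{lv \backslash im} + \beta_v}{\sum_u \left( M_{lu \backslash im} + \beta_u \right)}.
\end{aligned} \tag{6}
$$

In Eq. (5), $N_{ik}$ is the number of times $k$ has been assigned as the sender or receiver latent community in the $D-1$ relationships of node $i$. $M_{ik}$ is the number of times $k$ has been assigned as a word community of node $i$. Consider the $D-1$ relationships in which node $i$ is the receiver. Among those with communities $k, k'$ assigned, $n^{(+)}_{ikk'}$ counts the relationships that have an edge and $n^{(-)}_{ikk'}$ counts those that do not. In Eq. (6), $M_{kl}$ is the number of times topic $l$ has been assigned to a word assigned to community $k$. $M_{lv}$ is the number of times topic $l$ has been assigned to vocabulary item $v$, and $v$ denotes the vocabulary index of $w_{im}$. The subscript $\backslash$ means that the datum being resampled is excluded from these counts.

CGS repeatedly samples the latent communities and topics of each relationship and each word according to Eqs. (5) and (6). The first iterations form a burn-in period that depends on the initial values and are discarded. The estimates are the expected values of the four integrated-out parameters, computed from the remaining samples.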
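The two conditionals translate almost directly into code. The sketch below is minimal and assumes NumPy and 0-based indices. It builds the normalized $K \times K$ table of Eq. (5) and the $K \times L$ table of Eq. (6), then draws one pair. The count arrays are assumed to have already been decremented for the datum being resampled (the $\backslash ij$ and $\backslash im$ subscripts). Function and argument names are hypothetical, and the count bookkeeping of a full sampler is left out.

```python
import numpy as np


def edge_pair_conditional(a_ij, Ni, Mi, Nj, Mj, n_plus_j, n_minus_j, gamma, delta, eps):
    """Normalized K x K table p(s_ij = k, r_ji = k' | rest), Eq. (5).

    Ni, Mi, Nj, Mj      : length-K community counts of nodes i and j (relations / words)
    n_plus_j, n_minus_j : K x K counts of edges / non-edges received by node j
    All counts must already exclude the pair (i, j) being resampled.
    """
    send = (Ni + Mi + gamma) / np.sum(Ni + Mi + gamma)   # sender factor
    recv = (Nj + Mj + gamma) / np.sum(Nj + Mj + gamma)   # receiver factor
    denom = n_plus_j + n_minus_j + delta + eps
    edge = (n_plus_j + delta) / denom if a_ij == 1 else (n_minus_j + eps) / denom
    prob = send[:, None] * recv[None, :] * edge
    return prob / prob.sum()


def word_pair_conditional(v, Ni, Mi, M_kl, M_lv, gamma, alpha, beta):
    """Normalized K x L table p(x_im = k, z_im = l | rest), Eq. (6), for word id v.

    M_kl : K x L community-topic counts; M_lv : L x V topic-word counts.
    All counts must already exclude the word (i, m) being resampled.
    """
    comm = (Ni + Mi + gamma) / np.sum(Ni + Mi + gamma)
    topic = (M_kl + alpha) / np.sum(M_kl + alpha, axis=1, keepdims=True)
    word = (M_lv[:, v] + beta[v]) / np.sum(M_lv + beta, axis=1)
    prob = comm[:, None] * topic * word[None, :]
    return prob / prob.sum()


def sample_pair(prob, rng):
    """Draw one (row, column) index pair from a normalized 2-D table."""
    idx = rng.choice(prob.size, p=prob.ravel())
    return tuple(int(t) for t in np.unravel_index(idx, prob.shape))


rng = np.random.default_rng(0)
K = 3
zeros_k, zeros_kk = np.zeros(K), np.zeros((K, K))
n_plus = zeros_kk.copy()
n_plus[0, 1] = 10.0   # receiver j already has many edges from community 0 into 1
prob = edge_pair_conditional(1, zeros_k, zeros_k, zeros_k, zeros_k, n_plus, zeros_kk,
                             np.ones(K), np.full((K, K), 0.1), np.full((K, K), 0.1))
k, k2 = sample_pair(prob, rng)
```

In the usage example, the pair $(0, 1)$ receives most of the mass because receiver $j$'s counts already favor it. This is the mechanism through which $\psi_{jkk'}$ captures hub-like receivers.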
5
Empirical Analysis
5.1
Data
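This section presents an empirical analysis of Twitter data. It shows that the proposed model yields useful analyses of real online social networks. This subsection first describes the dataset and its preprocessing. We study the network centered on the official English-language Twitter account of Nintendo Co., Ltd. The data were collected and processed as follows.

First, using follow relationships as of May 1, 2018, we randomly sampled users who follow the Nintendo account. Next, we randomly sampled other users who follow the sampled users. Then, in the network formed by these users, users whose average of in-degree and out-degree was 3 or less were treated as outliers and removed from the dataset. This left 3,500 users and 68,949 edges in total. The directed graph formed by these users is used as the network information.

Next, we describe how the text data were constructed. For the 3,500 accounts sampled above, we extracted all the text from posts made between September 1, 2017, and February 28, 2018.¹ The text was then preprocessed in this order:
1. splitting sentences into sets of words;
2. converting to lowercase;
3. removing numbers, symbols, and common stop words (a, the, I, etc.);
4. reducing inflected forms to their stems (stemming).

Two further filters were applied to the processed text to avoid distorting topic estimation. Low-frequency words were removed: words that appear 20 times or fewer in the corpus, or that are used by 20 or fewer users. High-frequency words used by 50 or more users were removed as well. The resulting corpus contains 9,001 distinct words, with an average of 98.2 words per node. The next subsection explains how the numbers of communities and topics in the proposed model are chosen, and then discusses the estimates of the proposed model on this dataset.

¹ During preprocessing it became clear that most users had posted about Nintendo Direct, an event announcing new titles held in March 2018. Posts about a single event shared by so many users could distort topic estimation. To avoid this, we ended the observation period for the text data on February 28, 2018.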
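The degree filter and the word-frequency filters described above can be sketched as follows. The sketch assumes NumPy, a dense 0/1 adjacency matrix, and posts that are already tokenized, lowercased, and stemmed; those earlier steps are not shown. The function names are hypothetical, and the degree filter is applied once rather than iterated. The default thresholds follow the values stated in the text.

```python
import numpy as np
from collections import Counter


def filter_users(A, threshold=3):
    """Indices of users whose mean of in- and out-degree exceeds `threshold`."""
    A = np.asarray(A)
    indeg = A.sum(axis=0)
    outdeg = A.sum(axis=1)
    keep = (indeg + outdeg) / 2 > threshold   # users at or below it are dropped as outliers
    return np.flatnonzero(keep)


def filter_vocab(docs, min_count=20, min_users=20, max_users=50):
    """Drop rare and very common words from {user: [token, ...]} documents.

    A word is kept only if its corpus frequency exceeds `min_count` and the
    number of distinct users who use it is above `min_users` and below `max_users`.
    """
    freq = Counter(tok for toks in docs.values() for tok in toks)
    users = Counter(tok for toks in docs.values() for tok in set(toks))
    vocab = {t for t in freq if freq[t] > min_count and min_users < users[t] < max_users}
    filtered = {u: [t for t in toks if t in vocab] for u, toks in docs.items()}
    return filtered, vocab
```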
5.2
Results
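Any blockmodel analysis, including one with the proposed model, requires the number of communities to be fixed in advance; here the number of topics must be fixed as well. Previous studies treat the choice of the number of communities as model comparison with an information criterion. Proposed methods include BIC (Handcock et al., 2007; Saldaña et al., 2017), the integrated completed likelihood (Daudin et al., 2008; Bouveyron et al., 2018), and variational Bayes (Latouche et al., 2012). This study instead uses the widely applicable information criterion (WAIC; Watanabe, 2010) for model comparison. WAIC is a relatively recent information criterion that is now used in many fields. Details of WAIC for the proposed model are given in Appendix B. Table 2 reports the WAIC computed on the dataset constructed in Section 5.1, with the numbers of communities and topics each ranging from 5 to 10. Each run used 5,000 iterations, of which the first 2,000 were discarded as burn-in. The hyperparameters were set to $\alpha_l = 0.1\ \forall l$, $\beta_v = 0.1\ \forall v$, $\gamma_k = 1.0\ \forall k$, and $\delta_{kk'} = \epsilon_{kk'} = 0.1\ \forall k, k'$. The model with 7 communities and 7 topics was selected, and the discussion of the Twitter data below uses this model.

First, the global parameters, which do not depend on individual nodes, show what people in the detected communities are interested in. Figure 2 lists, for each topic, the 10 words with the highest estimated probability in its word distribution, which allows each topic to be interpreted. The meaning and representative words of each topic are as follows:
- Topic 1: animation (representative words: blackclov, hunterxhunt, jojosbizarreadventur, etc.).
- Topic 2: streaming in general (teamemmmmsi, twitchkitten, roku, etc.).
- Topic 3: music (vevo, spinrilla, etc.).
- Topic 4: game streaming (criticalrol, zeldathon, etc.).
- Topic 5: reading […] (digitalmarket, smm, contentmarket, etc.).
- Topic 7: sports (oiler, tfc, etc.).

Figure 3 shows the estimated topic distribution of each community, that is, the share of each topic within each community.

Next, we examine the estimates of the node-specific local parameters. Figures 4 and 5 show the estimated edge probabilities and community distributions for nodes 1 and 237. Node 1 has in-degree 6 and out-degree 0, while node 237 has in-degree 657 and out-degree 37. The estimates faithfully reflect this difference in degree centrality. Node 1 belongs mainly to communities 1 and 6, and the edge probabilities for those communities are estimated to be low. Node 237 belongs mainly to communities 1 and 5, and its edge probabilities for those communities are estimated to be high. Letting the edge-probability parameters capture heterogeneity in how readily nodes form edges thus makes the network model more flexible, and it is expected to improve predictive performance on test data. The next subsection tests this expectation by comparing the proposed model with existing models from previous studies.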
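The choice of $K = 7$ and $L = 7$ above comes from taking the smallest entry of Table 2. The quick check below assumes NumPy and copies the WAIC values reported in Table 2; lower WAIC is better.

```python
import numpy as np

Ks = [5, 6, 7, 8, 9, 10]   # number of communities (rows of Table 2)
Ls = [5, 6, 7, 8, 9, 10]   # number of topics (columns of Table 2)
waic = np.array([
    [4422206.32, 4340879.93, 4321068.95, 4333535.35, 4354814.11, 4553144.83],
    [4333313.32, 4333488.66, 4351008.38, 4309479.01, 4302773.27, 4280703.13],
    [4313265.58, 4285253.01, 4272682.48, 4346780.91, 4301005.75, 4414800.13],
    [4320416.87, 4282485.37, 4326300.05, 4324393.23, 4321806.29, 4426226.19],
    [4429170.84, 4329997.66, 4439594.82, 4407656.85, 4296128.61, 4301655.85],
    [4361219.83, 4342899.53, 4282056.30, 4306509.44, 4306244.12, 4406655.34],
])
r, c = np.unravel_index(np.argmin(waic), waic.shape)  # position of the minimum WAIC
best = (Ks[r], Ls[c])
print(best)  # (7, 7)
```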
5.3
Evaluation of Predictive Performance
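This subsection evaluates the predictive performance of the proposed model on test data against comparison models. From the existing models in previous studies, we chose Airoldi et al. (2008) and Igarashi and Terui (2020) as comparison models. The model of Airoldi et al. (2008) corresponds to the model of Igarashi and Terui (2020) without text information. Comparing these two therefore shows how using text information affects predictive performance. The model of Igarashi and Terui (2020), in turn, is the homogeneous model obtained by removing edge-probability heterogeneity from the proposed model. Comparing these two shows how the heterogeneity structure affects predictive performance.

In Section 5.2, the model was estimated with all network and text data as training data. Here, for each node, 90% of its $D-1$ relationships were used as training data and the remaining 10% were held out as test data. As before, all text data were used for training, and the number of iterations and the hyperparameters were unchanged. Under these conditions we estimated each model on the training data and obtained estimates of each parameter. Denote the estimated community distributions and edge probabilities by $\hat{H}$ and $\hat{\Psi}$. For the proposed model, for example, the predictive probability of a test datum $a_{ij} \in A_{\mathrm{test}}$ is

$$
p(a_{ij} = 1) = \sum_{k=1}^{K} \sum_{k'=1}^{K} \hat{\eta}_{ik}\, \hat{\eta}_{jk'}\, \hat{\psi}_{jkk'}. \tag{7}
$$

The predictive probabilities of the models of Airoldi et al. (2008) and Igarashi and Terui (2020) are likewise computed from the product of the community distributions and the edge probabilities.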
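Eq. (7) is a bilinear form in the membership vectors, so it can be evaluated for all pairs at once. Below is a minimal sketch that assumes NumPy and point estimates `H_hat` ($D \times K$) and `Psi_nodes_hat` ($D \times K \times K$, receiver-specific). For the homogeneous comparison models it uses a single `Psi_hat` ($K \times K$) instead. All names are hypothetical.

```python
import numpy as np


def edge_prob_hetero(H_hat, Psi_nodes_hat):
    """P[i, j] = sum_k sum_k' eta_ik eta_jk' psi_{j k k'}  (Eq. 7)."""
    P = np.einsum("ik,jl,jkl->ij", H_hat, H_hat, Psi_nodes_hat)
    np.fill_diagonal(P, 0.0)   # self-loops are not modeled
    return P


def edge_prob_homo(H_hat, Psi_hat):
    """Homogeneous version: P[i, j] = sum_k sum_k' eta_ik eta_jk' psi_{k k'}."""
    P = H_hat @ Psi_hat @ H_hat.T
    np.fill_diagonal(P, 0.0)
    return P
```

When every receiver shares the same matrix, the two functions agree. This mirrors the relationship between the proposed model and Igarashi and Terui (2020).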
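Table 3 reports the area under the curve (AUC) of each model as the numbers of communities and topics each vary from 5 to 10. The proposed model (Hetero in the table) outperforms the comparison model of Igarashi and Terui (2020) (Homo in the table) for almost all combinations. Degree heterogeneity is common in social networks. A model that accounts for it, by assuming that edge-generation probabilities differ across nodes, therefore predicts better. Next, compare Airoldi et al. (2008) (MMSB in the table) with Igarashi and Terui (2020). With few communities ($K = 5, 6$), the AUC of Airoldi et al. (2008) is generally higher. With many communities ($K = 8, 9, 10$), the AUC of Igarashi and Terui (2020) is generally higher. For the Twitter network used here, these results suggest that network information alone suffices to divide users into broad communities. For finer communities, clustering improves when the topic groupings in the text are considered as well. In other words, as described in Section 1, large communities such as the students in a school grade contain smaller groups that share hobbies and interests, such as music or sports, that is, topic-based communities. The results suggest that modeling which finds these groups is useful for more refined network analysis.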
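The text does not describe how the AUC in Table 3 is computed. One standard way, sketched here under that assumption, is the rank (Mann–Whitney) form. The AUC is the fraction of (held-out edge, held-out non-edge) pairs in which the edge gets the higher predicted probability, with ties counted as one half. The sketch needs only NumPy, and the function name is hypothetical.

```python
import numpy as np


def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    greater = np.sum(pos[:, None] > neg[None, :])   # correctly ordered pairs
    ties = np.sum(pos[:, None] == neg[None, :])      # tied pairs count 1/2
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Held-out pairs: 1 = edge present in the test set, 0 = absent
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise comparison needs memory proportional to (number of test edges) × (number of test non-edges). For large held-out sets, a sort-based routine such as scikit-learn's `roc_auc_score` is more economical.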
6
Conclusion
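This study aimed to make social network analysis more realistic and meaningful. To that end, we proposed a model that uses not only network information but also text on social media, which reflects people's interests, and that also incorporates the degree heterogeneity characteristic of social networks. Compared with existing models from previous studies, the proposed model has four distinguishing features:
- it uses the text attached to each node in the network;
- it allows each node to belong to a different community in each of its relationships;
- it can be applied to both undirected and directed graphs;
- it accounts for degree heterogeneity by making the edge-probability parameters node-specific.

Together, these features make it possible to detect topic-based communities, in which edges are densely concentrated and the topics of the generated texts come from the same distribution. At the same time, the model fits typical social networks well even when degrees vary greatly across nodes.

The empirical analysis showed that the proposed model, estimated by collapsed Gibbs sampling, captures meaningful community and topic structures in real Twitter data. It also predicts better than the existing models for almost all combinations of the numbers of communities and topics. Estimating edge probabilities with degree heterogeneity taken into account thus improves predictive performance. The results further suggest a practical point for the analysis of online social networks. To move beyond the broad community structure of a network and analyze finer clusters, it helps to model topic-based communities using the text of each node.

Because this study focused on online social networks, it assumed that people forming ties consider only the other person's network and text. Attribute information such as age and gender, and behavioral information, were left out of the proposed model because such data were unavailable. In the literature on social network analysis, however, many studies have shown that such node-specific (or dyadic and triadic) features affect network formation (Hoff et al., 2002; Handcock et al., 2007). In this study, the edge-formation probability depends on the communities of the two parties to a relationship and on the edge probability of the receiving side. In light of previous research, it would be worthwhile to extend it with node-specific features such as attribute and behavioral information, and such an extension is technically feasible. We leave this for future work, contingent on the availability of such data.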
References
Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(SEP):1981–2014, 2008.
Blei, D. M., Ng, A. Y., and Jordan, M. I. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(4-5):993–1022, 2003.
Bouveyron, C., Latouche, P., and Zreik, R. The stochastic topic block model for the clustering of vertices in networks with textual edges. Statistics and Computing, 28(1):11–31, 2018.
Chang, J. and Blei, D. M. Hierarchical relational models for document networks. The Annals of Applied Statistics, 4(1):124–150, 2010.
Daudin, J.-J., Picard, F., and Robin, S. A mixture model for random graphs. Statistics and Computing, 18(2):173–183, 2008.
Griffiths, T. L. and Steyvers, M. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Supplement 1):5228–5235, 2004.
Handcock, M. S., Raftery, A. E., and Tantrum, J. M. Model-based clustering for social networks. Journal of the Royal Statistical Society. Series A: Statistics in Society, 170(2):301–354, 2007.
Hoff, P. D., Raftery, A. E., and Handcock, M. S. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
Igarashi, M. and Terui, N. Characterization of Topic-based Online Communities by Combining Network Data and User Generated Content. Statistics and Computing, 2020 (forthcoming).
Karrer, B. and Newman, M. E. Stochastic blockmodels and community structure in networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 83(1), 2011.
Krivitsky, P. N., Handcock, M. S., Raftery, A. E., and Hoff, P. D. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks, 31(3):204–213, 2009.
Latouche, P., Birmelé, E., and Ambroise, C. Variational Bayesian inference and complexity control for stochastic block models. Statistical Modelling: An International Journal, 12(1):93–115, 2012.
Liu, Y., Niculescu-Mizil, A., and Gryc, W. Topic-link LDA: Joint models of topic and author community. Proceedings of the 26th International Conference On Machine Learning, ICML 2009.
Saldaña, D. F., Yu, Y., and Feng, Y. How Many Communities Are There? Journal of Computational and Graphical Statistics, 26(1):171–181, 2017.
Snijders, T. A. and Nowicki, K. Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure. Journal of Classification, 14(1):75–100, 1997.
Wang, Y. J. and Wong, G. Y. Stochastic Blockmodels for Directed Graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.
Watanabe, S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11:3571–3594, 2010.
Zhu, Y., Yan, X., Getoor, L., and Moore, C. Scalable text and link analysis with mixed-Topic link models. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.
Tables
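Table 1: Comparison of the proposed model with existing models

| Study | Observed data | Membership | Graph direction | Degree heterogeneity |
|---|---|---|---|---|
| Blei et al. (2003) | Text only | Mixed | - | - |
| Snijders and Nowicki (1997) | Network only | Single | Directed or undirected | Not considered |
| Airoldi et al. (2008) | Network only | Mixed | Directed or undirected | Not considered |
| Chang and Blei (2010) | Network/text | Mixed | Undirected only | Not considered |
| Liu et al. (2009) | Network/text | Single | Undirected only | Not considered |
| Bouveyron et al. (2018) | Network/text | Single | Directed or undirected | Not considered |
| Zhu et al. (2013) | Network/text | Mixed | Directed or undirected | Not considered |
| Igarashi and Terui (2020) | Network/text | Mixed | Directed or undirected | Not considered |
| Karrer and Newman (2011) | Network only | Single | Directed or undirected | Node-specific expected-degree parameters introduced |
| This study | Network/text | Mixed | Directed or undirected | Edge probabilities defined as heterogeneous parameters |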
Table 2: Model comparison by WAIC. K is the number of communities and L the number of topics; the minimum value is marked with *.

| | L=5 | L=6 | L=7 | L=8 | L=9 | L=10 |
|---|---|---|---|---|---|---|
| K=5 | 4422206.32 | 4340879.93 | 4321068.95 | 4333535.35 | 4354814.11 | 4553144.83 |
| K=6 | 4333313.32 | 4333488.66 | 4351008.38 | 4309479.01 | 4302773.27 | 4280703.13 |
| K=7 | 4313265.58 | 4285253.01 | 4272682.48* | 4346780.91 | 4301005.75 | 4414800.13 |
| K=8 | 4320416.87 | 4282485.37 | 4326300.05 | 4324393.23 | 4321806.29 | 4426226.19 |
| K=9 | 4429170.84 | 4329997.66 | 4439594.82 | 4407656.85 | 4296128.61 | 4301655.85 |
| K=10 | 4361219.83 | 4342899.53 | 4282056.30 | 4306509.44 | 4306244.12 | 4406655.34 |
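Table 3: Comparison of predictive performance by AUC. MMSB denotes Airoldi et al. (2008), Homo denotes Igarashi and Terui (2020), and Hetero denotes the model of this study. An asterisk (*) marks the largest AUC for each combination of the number of communities (K) and topics (L).

| K | Model | L=5 | L=6 | L=7 | L=8 | L=9 | L=10 |
|---|---|---|---|---|---|---|---|
| 5 | MMSB | 0.897 | 0.897 | 0.897 | 0.897 | 0.897 | 0.897 |
| 5 | Homo | 0.896 | 0.900 | 0.890 | 0.883 | 0.905 | 0.890 |
| 5 | Hetero | 0.917* | 0.920* | 0.924* | 0.921* | 0.923* | 0.924* |
| 6 | MMSB | 0.913 | 0.913 | 0.913 | 0.913 | 0.913 | 0.913 |
| 6 | Homo | 0.904 | 0.908 | 0.910 | 0.909 | 0.908 | 0.906 |
| 6 | Hetero | 0.925* | 0.930* | 0.922* | 0.930* | 0.923* | 0.921* |
| 7 | MMSB | 0.907 | 0.907 | 0.907 | 0.907 | 0.907 | 0.907 |
| 7 | Homo | 0.920 | 0.900 | 0.895 | 0.900 | 0.903 | 0.911 |
| 7 | Hetero | 0.929* | 0.926* | 0.929* | 0.930* | 0.929* | 0.931* |
| 8 | MMSB | 0.904 | 0.904 | 0.904 | 0.904 | 0.904 | 0.904 |
| 8 | Homo | 0.925 | 0.918 | 0.916 | 0.921 | 0.897 | 0.916 |
| 8 | Hetero | 0.928* | 0.930* | 0.927* | 0.930* | 0.929* | 0.928* |
| 9 | MMSB | 0.912 | 0.912 | 0.912 | 0.912 | 0.912 | 0.912 |
| 9 | Homo | 0.920 | 0.922 | 0.918 | 0.917 | 0.922 | 0.920 |
| 9 | Hetero | 0.922* | 0.923* | 0.925* | 0.927* | 0.927* | 0.922* |
| 10 | MMSB | 0.906 | 0.906 | 0.906 | 0.906 | 0.906 | 0.906 |
| 10 | Homo | 0.923* | 0.926* | 0.923 | 0.925* | 0.920 | 0.917 |
| 10 | Hetero | 0.923* | 0.921 | 0.926* | 0.925* | 0.923* | 0.930* |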
Figures
Figure 1: Graphical representation of the proposed model
Figure 2: Top 10 words by estimated word-distribution value for each topic
Topic 1: nonfollow, blackclov, hunterxhunt, jojosbizarreadventur, mkleosaga, wnf, hori, mdva, hyrulesaga, nyxl
Topic 2: teamemmmmsi, dokkan, twitchkitten, vgc, roku, wizebot, ryzen, freebiefriday, streamersconnect, nbaliv
Topic 3: trapadr, vevo, ddrive, leed, spinrilla, ifb, gainwithpyewaw, gainwithxtiandela, horford, suav
Topic 4: criticalrol, zeldathon, orton, fursuitfriday, dramaalert, sdlive, htgawm, sml, robloxdev, yoongi
Topic 5: iartg, amread, erotica, asmsg, momlif, hemp, writerslif, bookreview, kindleunlimit, bookboost
Topic 6: growthhack, digitalmarket, gdpr, smm, contentmarket, gamedesign, podernfamili, socialmediamarket, bigdata, emailmarket
Topic 7: savvi, lube, foodporn, oiler, austria, tfc, crowdfir, tranc, tock, thexfil
Figure 3: Estimated topic distribution of each community (one panel per community, Communities 1–7; horizontal axis: topic 1–7; vertical axis: topic distribution, 0 to 1)
Figure 4: Estimated edge probabilities and community distribution for node 1 (left: edge probability by sender community and receiver community, Communities 1–7; right: community distribution over Communities 1–7)
Figure 5: Estimated edge probabilities and community distribution for node 237 (left: edge probability by sender community and receiver community, Communities 1–7; right: community distribution over Communities 1–7)
Appendices
A
Derivation of the Conditional Posterior Distributions
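Section 4 derived the conditional posterior distributions of the latent communities and latent topics (Eqs. 5 and 6). Obtaining them first requires the conditional posterior distributions of the four parameters: the community distributions, the edge probabilities, the topic distributions, and the word distributions. Because the priors are conjugate to the likelihood, these posteriors are

$$
p(\eta_i \mid S, R, X, \gamma) = \frac{\Gamma\left( \sum_k (N_{ik} + M_{ik} + \gamma_k) \right)}{\prod_k \Gamma(N_{ik} + M_{ik} + \gamma_k)} \prod_{k=1}^{K} \eta_{ik}^{\,N_{ik} + M_{ik} + \gamma_k - 1}, \tag{8}
$$

$$
p(\psi_{ikk'} \mid A, S, R, \delta, \epsilon) = \frac{\Gamma\left( n^{(+)}_{ikk'} + n^{(-)}_{ikk'} + \delta_{kk'} + \epsilon_{kk'} \right)}{\Gamma\left( n^{(+)}_{ikk'} + \delta_{kk'} \right) \Gamma\left( n^{(-)}_{ikk'} + \epsilon_{kk'} \right)}\, \psi_{ikk'}^{\,n^{(+)}_{ikk'} + \delta_{kk'} - 1} \left( 1 - \psi_{ikk'} \right)^{n^{(-)}_{ikk'} + \epsilon_{kk'} - 1}, \tag{9}
$$

$$
p(\theta_k \mid X, Z, \alpha) = \frac{\Gamma\left( \sum_l (M_{kl} + \alpha_l) \right)}{\prod_l \Gamma(M_{kl} + \alpha_l)} \prod_{l=1}^{L} \theta_{kl}^{\,M_{kl} + \alpha_l - 1}, \tag{10}
$$

$$
p(\phi_l \mid W, Z, \beta) = \frac{\Gamma\left( \sum_v (M_{lv} + \beta_v) \right)}{\prod_v \Gamma(M_{lv} + \beta_v)} \prod_{v=1}^{V} \phi_{lv}^{\,M_{lv} + \beta_v - 1}. \tag{11}
$$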
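Because (8)–(11) are Dirichlet and beta densities, their means have closed forms. These means are what is needed when the four integrated-out parameters are estimated from the retained CGS samples. Below is a minimal sketch that assumes NumPy count arrays with the shapes stated in the docstring; the function name is hypothetical. Averaging its output over the retained sweeps would give the final estimates.

```python
import numpy as np


def posterior_means(N, M_dk, n_plus, n_minus, M_kl, M_lv, gamma, delta, eps, alpha, beta):
    """Posterior means implied by Eqs. (8)-(11) for one CGS state.

    N, M_dk          : D x K relation / word community counts (N_ik, M_ik)
    n_plus, n_minus  : D x K x K edge / non-edge counts per receiver (n^(+), n^(-))
    M_kl             : K x L community-topic counts
    M_lv             : L x V topic-word counts
    gamma (K,), delta and eps (K x K), alpha (L,), beta (V,) : hyperparameters
    """
    eta = np.asarray(N + M_dk + gamma, dtype=float)
    eta = eta / eta.sum(axis=1, keepdims=True)                 # mean of Dirichlet (8)
    psi = (n_plus + delta) / (n_plus + n_minus + delta + eps)  # mean of Beta (9)
    theta = np.asarray(M_kl + alpha, dtype=float)
    theta = theta / theta.sum(axis=1, keepdims=True)           # mean of Dirichlet (10)
    phi = np.asarray(M_lv + beta, dtype=float)
    phi = phi / phi.sum(axis=1, keepdims=True)                 # mean of Dirichlet (11)
    return eta, psi, theta, phi
```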
B
Definition of WAIC for the Proposed Model
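WAIC for the proposed model is defined as follows:

$$
\mathrm{lpd}^{(i)} = \log \left( \frac{1}{G} \sum_{g=b+1}^{G} \prod_{j=1, j \neq i}^{D} p\left( a_{ij} \mid H^{(g)}, \Psi_j^{(g)} \right) \prod_{m=1}^{M_i} p\left( w_{im} \mid H^{(g)}, \Theta^{(g)}, \Phi^{(g)} \right) \right), \tag{12}
$$

$$
p^{(i)}_{\mathrm{waic}} = \frac{G}{G-1} \left( \frac{1}{G} \sum_{g=b+1}^{G} \left( \ell_i^{(g)} \right)^2 - \left( \frac{1}{G} \sum_{g=b+1}^{G} \ell_i^{(g)} \right)^2 \right), \qquad \ell_i^{(g)} = \sum_{j=1, j \neq i}^{D} \log p\left( a_{ij} \mid H^{(g)}, \Psi_j^{(g)} \right) + \sum_{m=1}^{M_i} \log p\left( w_{im} \mid H^{(g)}, \Theta^{(g)}, \Phi^{(g)} \right), \tag{13}
$$

$$
\mathrm{WAIC} = -2 \sum_{i=1}^{D} \left( \mathrm{lpd}^{(i)} - p^{(i)}_{\mathrm{waic}} \right). \tag{14}
$$

Here $p(a_{ij} \mid H^{(g)}, \Psi_j^{(g)})$ and $p(w_{im} \mid H^{(g)}, \Theta^{(g)}, \Phi^{(g)})$ are evaluated at the parameters estimated from the CGS sample at iteration $g$. They are defined as

$$
p\left( a_{ij} \mid H^{(g)}, \Psi_j^{(g)} \right) = \sum_{k=1}^{K} \sum_{k'=1}^{K} \eta_{ik}^{(g)}\, \eta_{jk'}^{(g)} \left( \psi_{jkk'}^{(g)} \right)^{I(a_{ij} = 1)} \left( 1 - \psi_{jkk'}^{(g)} \right)^{I(a_{ij} = 0)}, \tag{15}
$$

$$
p\left( w_{im} \mid H^{(g)}, \Theta^{(g)}, \Phi^{(g)} \right) = \sum_{k=1}^{K} \sum_{l=1}^{L} \eta_{ik}^{(g)}\, \theta_{kl}^{(g)}\, \phi_{l w_{im}}^{(g)}. \tag{16}
$$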
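Eqs. (12)–(16) can be evaluated from stored CGS draws. The sketch below assumes NumPy and 0-based word ids. `node_loglik` computes node $i$'s log-likelihood under one draw: the sum of the logarithms of (15) over $j \neq i$ plus those of (16) over the node's words. `waic` assumes `loglik[g, i]` holds that quantity for each retained draw $g$, with burn-in already removed, so that $G$ is the number of rows. A log-mean-exp keeps Eq. (12) numerically stable. Function names are hypothetical.

```python
import numpy as np


def node_loglik(i, A, words_i, H, Psi_nodes, Theta, Phi):
    """Log of the product in Eq. (12) for node i under one draw, via Eqs. (15)-(16)."""
    eta_i = H[i]
    ll = 0.0
    for j in range(A.shape[0]):
        if j == i:
            continue
        # psi^{I(a=1)} (1 - psi)^{I(a=0)} element-wise, then sum over k, k'  (Eq. 15)
        psi = Psi_nodes[j] if A[i, j] == 1 else 1.0 - Psi_nodes[j]
        ll += np.log(eta_i @ psi @ H[j])
    word_probs = eta_i @ Theta @ Phi          # p(w = v) for every v  (Eq. 16)
    ll += np.sum(np.log(word_probs[words_i]))
    return ll


def waic(loglik):
    """WAIC from a G x D matrix of node-level log-likelihoods (Eqs. 12-14)."""
    ll = np.asarray(loglik, dtype=float)
    G = ll.shape[0]
    m = ll.max(axis=0)
    lpd = m + np.log(np.exp(ll - m).sum(axis=0)) - np.log(G)   # Eq. (12), log-mean-exp
    p_waic = ll.var(axis=0, ddof=1)                            # Eq. (13): G/(G-1) factor
    return -2.0 * np.sum(lpd - p_waic)                         # Eq. (14)
```

`ddof=1` reproduces the $G/(G-1)$ correction in Eq. (13). When all draws agree, the penalty vanishes and WAIC reduces to $-2$ times the total log-likelihood.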