情報検索技術に基づくブロッククローン検出
全文
(2) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ͍ؔͷҰ෦ʹίʔυΫϩʔϯ͕·ؚΕͨ߹ɼݕग़ ࿙Ε͕ੜ͡ΔՄೳੑ͕͋Δɽ͜ͷΑ͏ͳݕग़࿙ΕΛݮΒ͢ ͨΊʹɼؔ୯ҐΑΓখཻ͍͞ͰίʔυΫϩʔϯͷݕ ग़Λߦ͏͖Ͱ͋Δɽͦ͜ͰɼຊͰڀݚίʔυϒϩοΫ ୯ҐͷίʔυΫϩʔϯʢϒϩοΫΫϩʔϯʣΛݕग़͢Δख ๏ΛఏҊ͢Δɽ͜͜ͰίʔυϒϩοΫΛɼؔͱɼؔ ෦ͷ ifɼforɼwhile จͷதͰހׅғ·Εͨ෦ͱఆٛ͢ Δɽຊख๏Ͱɼ·ͣιʔείʔυʹରͯ͠ߏจղੳΛߦ. λΠϓ 3 λΠϓ 2 ͷҧ͍ʹՃ͑ͯɼจͷૠೖআɼม ߋͳͲ͕ߦΘΕ͍ͯΔίʔυΫϩʔϯɽ λΠϓ 4 ྨࣅͨ͠ॲཧΛ࣮ߦ͢Δ͕ɼߏจ্ͷ࣮͕ҟ ͳΔίʔυΫϩʔϯ λΠϓ 4 ͷίʔυΫϩʔϯͱͯ͠ɼҎԼͷͷ͕͛ڍΒ ΕΔɽ. • ݅ॲذཧ܁Γฦ͠ॲཧͳͲͷ੍ߏޚͷ࣮͕ ҟͳΔɽ. ͍ɼίʔυϒϩοΫͷநग़Λߦ͏ɽͦͷޙɼநग़֤ͨ͠ίʔ. • தؒഔհมͷར༻ͷ༗ແ͕ଘࡏ͍ͯ͠Δɽ. υϒϩοΫʹରͯ͠ TF-IDF ๏Λ༻͍ͯಛϕΫτϧʹม. • จͷฒͼସ͕͑ൃੜ͍ͯ͠Δɽ. ͠ɼಛϕΫτϧؒͷྨࣅΛ͢ࢉܭΔ͜ͱͰϒϩοΫ Ϋϩʔϯͷݕग़Λߦ͏ɽ·ͨɼMulti-Probe LSH [9] Λ༻. 2.1 ؔΫϩʔϯݕग़๏. ͯ͠ಛϕΫτϧͷΫϥελϦϯάΛߦ͏ɽMulti-Probe. ࢁதΒใٕࡧݕज़Λར༻͢Δ͜ͱʹΑͬͯɼҙຯత. LSH ͱɼैདྷͷ LSH ͷϝϞϦ༻ྔ͕ଟ͍ͱ͍͏. ʹॲཧ͕ྨࣅͨؔ͠ΫϩʔϯΛݕग़͢Δख๏ΛఏҊ͠. Λվྑͨ͠ΞϧΰϦζϜͰ͋ΔɽʢҎ߱ɼ୯ʹ“LSH”ͱද. ͨɽίʔυย୯ҐͰίʔυΫϩʔϯͷݕग़Λߦ͏߹ɼߏ. ͨ͠ه߹ LSH ΞϧΰϦζϜશൠΛࢦ͠ɼLSH Ξϧΰ. จͷෆશͳ෦Ͱऴྃ͢ΔίʔυยͳͲɼूΛߦ͏͜. ϦζϜΛ۠ผ͢Δࡍ“Multi-Probe LSH”ɼ“ैདྷͷ LSH”. ͱ͕ࠔͰ͋ΔίʔυΫϩʔϯ͕ଟ͘ݕग़͞ΕΔڪΕ͕͋. ͱ͍͏දهΛ༻͍Δʣ͜ͷ͜ͱͰɼΑΓগͳ͍ϝϞϦ༻. Δ [14]ɽҰํɼؔΫϩʔϯॲཧͷ༰͕·ͱ·͍ͬͯ. ྔͰߴͳݕग़Λ࣮ͨ͠ݱɽ. ΔͨΊɼ։ൃऀʹͱͬͯूͷରʹͳΓ͍͢ίʔυΫ. ධՁ࣮ͰݧɼؔΫϩʔϯݕग़๏ [13] ͱ CCFinder [7]. ϩʔϯ͕ݕग़Ͱ͖Δɽ·ͨؔΫϩʔϯݕग़๏ɼλΠϓ. ͷ 2 ͭͱݕग़ਫ਼ͱݕग़࣌ؒͷ͔؍ΒൺֱΛߦͬͨɽ. 1 ͔ΒλΠϓ 4 ·ͰͷίʔυΫϩʔϯΛݕग़ՄೳͰ͋Δɽ. CCFinder ਆ୩Β͕։ൃͨ͠ίʔυΫϩʔϯݕग़πʔϧ. ͜ͷख๏ɼ·ͣɼೖྗ͞Εͨιʔείʔυதͷϫʔυʹ. Ͱ͋Γɼࣈ۟୯ҐͷίʔυΫϩʔϯݕग़͕ՄೳͰ͋Δɽ3. ֤͍ؔͯͮجΛಛϕΫτϧʹม͢Δɽ͜͜Ͱϫʔυ. ͭͷ C ޠݴͷϓϩδΣΫτʹରͯ͠ద༻ͨ݁͠Ռɼຊख๏. ͱɼҎԼͷ 2 ͭΛରͱ͢Δɽ. ͕૯߹తʹߴ͍ਫ਼ͰΑΓଟ͘ͷίʔυΫϩʔϯΛݕग़͢. • มؔͳͲʹ͚ͭΒΕͨࣝผࢠ໊Λߏ͢Δ୯ޠ. Δ͜ͱ͕Ͱ͖ͨɽ·ͨɼຊख๏ͷݕग़ʹ͔͔Δ࣌ؒ 3 . • ݅จ܁Γฦ͠จͳͲͷߏจʹར༻͞ΕΔ༧ޠ. ҎԼͱͳΓɼؔΫϩʔϯݕग़๏ͱ CCFinder ΑΓߴ. ͦͯ͠ɼಛϕΫτϧؒͷྨࣅΛٻΊΔ͜ͱʹΑͬͯΫ. ʹίʔυΫϩʔϯΛݕग़͢Δ͜ͱ͕Ͱ͖ͨɽ·ͨɼݕग़࣌. ϩʔϯϖΞͷू߹ΛϦετͱͯ͠ग़ྗ͢Δɽ·ͨɼྨࣅ. ؒͱεέʔϥϏϦςΟͷධՁߦ͍ɼ100MLOC ͷେن. ͷࢉܭͷલʹ LSH [6] Λར༻͠ɼಛϕΫτϧͷΫϥε. ϓϩδΣΫτʹରͯ͠ 4 ࣌ؒఔͰݕग़Ͱ͖Δ͜ͱΛ֬ೝ. λϦϯάΛߦ͏͜ͱʹΑͬͯɼݕग़ͷߴԽΛߦ͍ͬͯΔɽ. ͨ͠ɽ Ҏ߱ɼ2 ষͰɼίʔυΫϩʔϯͱؔΫϩʔϯݕग़๏. 2.2 ؔΫϩʔϯݕग़๏ͷ. ʹ͍ͭͯड़Δɽ3 ষͰɼຊͰڀݚఏҊ͢ΔϒϩοΫΫ. 2.1 અͰɼؔΫϩʔϯݕग़๏ͷ֓ཁʹ͍ͭͯઆ໌͠. ϩʔϯݕग़๏ʹ͍ͭͯड़Δɽ4 ষͰɼຊख๏ͷධՁ࣮. ͨɽ͔͠͠ɼؔΫϩʔϯݕग़๏ʹରͯ͠ҎԼͷ 2 ͭͷ. ͍ͯͭʹݧड़Δɽ࠷ʹޙɼ5 ষͰ·ͱΊͱࠓޙͷ՝ʹ. ͕͛ڍΒΕΔɽ. ͍ͭͯड़Δɽ. 2. ίʔυΫϩʔϯ ຊষͰɼຊڀݚͷഎͯ͠ͱܠίʔυΫϩʔϯɼ͓Αͼ ࢁதΒͷؔΫϩʔϯݕग़๏ͱɼͦͷʹ͍ͭͯड़. 1 ͭɼؔ୯Ґͷݕग़ʹΑΔͰ͋Δɽ͜ͷख ๏ɼؔશମͰͳ͘ɼҰ෦ͷΈ͕ίʔυΫϩʔϯʹ ͳ͍ͬͯΔͷΛݕग़͢Δ͜ͱ͕Ͱ͖ͳ͍ɽྫ͑ਤ 1 ͷ Α͏ʹɼ͍ؔͷҰ෦ʹίʔυΫϩʔϯ͕·ؚΕΔ ߹ɼݕग़࿙Ε͕ੜ͡ΔՄೳੑ͕͋Δɽ. ΔɽRoy ΒίʔυΫϩʔϯͷఆٛͱͯ͠ɼίʔυΫϩʔ. 2 ͭɼϝϞϦ༻ྔ͕ଟ͍ͱ͍͏Ͱ͋Δɽؔ. ϯؒͷҧ͍ͷ߹͍ʹ͖ͮجҎԼͷ 4 ͭͷλΠϓʹྨ͠. ΛಛϕΫτϧʹม͢Δํ๏ͱͯؔ͠Ϋϩʔϯݕग़. ͍ͯΔ [12]ɽ. ๏ TF-IDF ๏Λ༻͍͍ͯΔ͕ɼTF-IDF ๏ͰશؔͰ. λΠϓ 1. ۭനλϒͷ༗ແɼίʔσΟϯάελΠϧɼί. ग़ͨ͠ݱϫʔυͷछྨ͕࣍ͳͱݩΔͨΊɼಛϕΫτ. ϝϯτͷ༗ແͳͲͷҧ͍Λআ͖શʹҰக͢Δίʔυ. ϧͷ͕࣍ݩඇৗʹେ͖͘ͳΔʹ͋ΔɽಛϕΫτϧ. Ϋϩʔϯ.. ֤ؔʹ 1 ͭͣݸ༩͑ΔͨΊɼ࣍ݩͷେ͖͍ಛϕΫ. λΠϓ 2. λΠϓ 1 ͷҧ͍ʹՃ͑ͯɼม໊ͳͲͷϢʔβʔ. ఆ໊ٛɼมͷ͕ͲͳܕҟͳΔίʔυΫϩʔϯ.. c 2017 Information Processing Society of Japan . τϧΛଟอ࣋͢Δ͜ͱʹͳΓɼϝϞϦ༻ྔ͕େ͖͘ͳ Δɽ·ͨɼߴʹݕग़Λߦ͏ͨΊʹ LSH Λ༻͍ͯΫϥελ. 2.
(3) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report Ƈ. Ƈ. 㛗. ſ ƀƇ ʰɥŚ ʰɨŚ ƈ. 厦 㛵. ſ ƀƇ ʰɥŚ ʰɨŚ ƈ. ୍⮴㒊ศ. ᩘ ƈ. ਤ 1. ͦͷͨΊɼಛϕΫτϧͷΫϥελϦϯάͰɼLSH [6] ͷ ҰछͰ͋ΓɼۭؒྔࢉܭͷվྑΛߦͬͨ Multi-Probe LSH. [9] Λద༻ͨ͠ɽ 3.1 ༻ޠͷఆٛ. ƈ. ͍ؔͷҰ෦ʹίʔυΫϩʔϯΛؚΉྫ. Fig. 1 An example of parts of long function containing code clones. 3.1.1 ίʔυϒϩοΫ ϓϩάϥϛϯά͍͓ͯʹޠݴɼෳͷ໋ྩจΛҰׅΓʹ ·ͱΊͨͷΛίʔυϒϩοΫͱ͍͏ɽଟ͘ͷϓϩάϥϛ ϯάͰޠݴɼίʔυϒϩοΫΛೖΕࢠߏʹ͢Δ͜ͱ͕ Ͱ͖ɼมͷείʔϓͱͯ͠ͷҙຯΛ࣋ͭ͜ͱ͕͋Δɽ. ϦϯάΛߦ͍ͬͯΔ͕ɼLSH ਫ਼Λ্͛ΔͱϝϞϦ༻ ྔ͕େ͖͘ͳΔಛ͕͋ΔɽΑͬͯɼେنϓϩδΣΫτ ʢLinux Kernel ʣʹରͯ͠ίʔυΫϩʔϯͷݕग़Λߦͬ ͨ߹ɼϝϞϦෆͰݕग़Λڪ͍ͳ͖ͰྃΕ͕͋Δɽ ͦ͜Ͱɼ্ͷ 2 ͭͷΛ౿·͑ͨ৽͍͠ίʔυΫ ϩʔϯݕग़๏ͷඞཁੑ͕ߟ͑ΒΕΔɽຊͰڀݚؔ୯Ґ. ຊख๏ͰɼҎԼͷ 2 ͭͷ݅ͷ͍ͣΕ͔Λຬͨ͢ίʔ υϒϩοΫΛݕग़ରͱ͢Δɽରޠݴ C ͱޠݴJava ͢ͱޠݴΔɽ ݅ 1 ؔͷ“{ }”Ͱғ·Εͨൣғ ݅ 2 ifɼelseɼforɼwhileɼdo-whileɼswitch จͷ“{ }” Ͱғ·Εͨൣғ. ΑΓݕग़ཻΛখͨ͘͞͠ɼίʔυϒϩοΫ୯Ґͷίʔυ. ͨͩ͠ɼݱ͕”} {“ʹޙΕͳ͍୯จͷ໋ྩจίʔυϒϩο. Ϋϩʔϯݕग़๏ΛఏҊͨ͠ɽίʔυϒϩοΫ୯ҐͰݕग़Λ. Ϋͱͯ͠ͷవ·Γ͕ͳ͍ͨΊݕग़ରͱ͠ͳ͍ɽ·ͨਤ 3. ߦ͏͜ͱͰɼؔΫϩʔϯݕग़๏Ͱݕग़Ͱ͖ͳ͔ͬͨ. ͷ Block A ʹର͢Δ Block B Block C ͷΑ͏ʹɼೖΕ. ίʔυΫϩʔϯΛݕग़͕ՄೳʹͳΔɽ·ͨɼۭؒྔࢉܭͷ. ࢠߏͷଆͷίʔυϒϩοΫݕग़ՄೳͰ͋Γɼݕग़ର. গͳ͍ಛϕΫτϧͷ࣮ํ๏ LSH ʹมߋ͢Δ͜ͱͰɼ. Λ࠶ؼతʹ୳ࡧ͢Δɽ. େنϓϩδΣΫτͷద༻͕ՄೳͱͳΔɽ. 3.1.2 ϫʔυ. 3. ఏҊ͢Δݕग़ख๏ ຊͰڀݚɼ2.1 અͰઆ໌ͨ͠ࢁதΒͷؔΫϩʔϯݕ ग़๏ΛʹجɼίʔυϒϩοΫ୯ҐͷΫϩʔϯݕग़ʹରԠ͢ ΔΑ͏มߋͨ͠ख๏ʢϒϩοΫΫϩʔϯݕग़๏ʣΛఏҊ͢ Δɽຊख๏ͷ֓ཁΛਤ 2 ʹࣔ͢ɽຊख๏ओʹҎԼͷ 5 ͭ ͷεςοϓͰ࣮ߦ͞ΕΔɽ. STEP 1 ߏจղੳΛߦ͍ιʔείʔυ͔Βநߏจ Λੜ͠ɼੜͨ͠நߏจ͔ΒίʔυϒϩοΫ ʢ3.1.1 અࢀরʣΛऔΓग़͢ɽ. STEP 2 STEP 1 Ͱநग़֤ͨ͠ίʔυϒϩοΫ͔Βɼϫʔ υʢ3.1.2 અࢀরʣͷநग़Λߦ͏ɽ. ຊख๏ͰҎԼͷ݅ 3ɼ4 ͷ͍ͣΕ͔Λຬͨ͢ͷΛ ϫʔυͱͯ͠ఆٛ͢Δɽ ݅ 3 ༧ޠ ݅ 4 ࣝผࢠ໊Λߏ͢Δ୯ޠ ࣝผࢠ໊͕ෳͷ୯͔ޠΒߏ͞ΕΔ߹ɼҎԼͷํ๏ Ͱϫʔυ୯Ґʹׂ͢Δɽ. • ϋΠϑϯΞϯμʔείΞͳͲͷ۠Γʹ߸هΑΔ ׂ. • ࣝผࢠ໊தͷେจࣈʹͳ͍ͬͯΔΞϧϑΝϕοτʹΑ Δׂ ·ͨɼ2 จࣈҎԼͷࣝผࢠɼͦΕΒΛ·ͱΊͯಉҰͷϝ λϫʔυͱͯ͠ѻ͏ɽྫ͑ɼ܁Γฦ͠จͰΑ͘ར༻͞. STEP 3 TF-IDF ๏Λར༻͠ɼSTEP 2 Ͱநग़ͨ͠ϫʔ. ΕΔ i j ͱ͍ͬͨมɼҙຯใ͕ࠐΊΒΕ͍ͯͳ͍. υʹॏΈ͚Λߦ͍ɼ֤ίʔυϒϩοΫΛಛϕΫτ. มͱͯ͠ѻ͏ͨΊͰ͋Δɽ͞Βʹɼ͍݅༻ʹذΒΕ. ϧʹม͢Δɽ. Δ if whileɼ܁Γฦ͠ʹ༻͍ΒΕΔ for while ͷ༧. STEP 4 LSH Λར༻͠ɼSTEP 3 ͰٻΊ֤ͨίʔυϒ ϩοΫʹର͢ΔಛϕΫτϧͷΫϥελϦϯάΛߦ͏ɽ. STEP 5 STEP 4 ͰٻΊͨίʔυϒϩοΫͷ֤Ϋϥελ ͷதͰɼಛϕΫτϧؒͷྨࣅͷࢉܭΛߦ͍ɼϒ ϩοΫΫϩʔϯʢ3.1.3 અࢀরʣΛݕग़͢Δɽ ຊख๏ͱؔΫϩʔϯݕग़๏ͷओͳ૬ҧɼίʔυΫ ϩʔϯͷݕग़ཻͰ͋ΔɽؔΫϩʔϯݕग़๏Ͱؔͷ ݕग़ͷΈΛߦ͏͕ɼຊख๏ͰಛϕΫτϧͷํࢉܭ๏ มߋ͠ɼؔͱؔͷίʔυϒϩοΫͷ྆ํΛݕग़͢Δɽ ͔͠͠ɼຊख๏ݕग़ཻΛখ͘͢͞Δ͜ͱͰݕग़ର ͕૿͑ɼͦΕʹ͍ݕग़࣌ؒɼϝϞϦ༻ྔ͕૿େ͢Δɽ. c 2017 Information Processing Society of Japan . ޠϫʔυͱͯ͠ѻ͏ɽͳ͓ɼ֤ϫʔυͷେจࣈͱখจࣈ ʹΑΔ۠ผ͚ͭͣɼಉҰͷϫʔυͱͯ͠ѻ͏ɽ. 3.1.3 ϒϩοΫΫϩʔϯ ݅ 5 ίʔυϒϩοΫؒͷྨࣅ͕ᮢҎ্. sim(CB1 , CB2 ) ≥ p. (0 ≤ p ≤ 1). ݅ 6 ίʔυϒϩοΫؒʹڞ௨෦͕ͳ͍. CB1 ∩ CB2 = φ CB1 ɼCB2 ͦΕͧΕΛਅʹแؚ͢ΔԿͳΔίʔυϒ ϩοΫϒϩοΫΫϩʔϯϖΞͰͳ͍ͱ͖ɼCB1 ɼCB2 Λ. 3.
(4) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ^dWϭ. ^dWϮ. ^dWϯ. ^dWϰ. ^dWϱ. 䝤䝻䝑䜽 . ὁὊἛ ׅૠ ᶖᶖᶖ ᵑ ᶗᶗᶗ ᵐ Ḡ Ḡ. 䝤䝻䝑䜽 ଵ , ଶ , ଷ , ⊃. 䝤䝻䝑䜽 䝤䝻䝑䜽. 䝤䝻䝑䜽. 䝋䞊䝇䝁䞊䝗. ὁὊἛ ׅૠ ᶖᶖᶖ ᵒ ᶗᶗᶗ ᵐ Ḡ Ḡ. ᢳ㇟ᵓᩥᮌ. 䝤䝻䝑䜽 䝤䝻䝑䜽 䝤䝻䝑䜽. 䝤䝻䝑䜽 ଵ , ଶ , ଷ , ⊃. 䝽䞊䝗䝸䝇䝖 ≉ᚩ䝧䜽䝖䝹 ਤ 2. 䜽䝷䝇䝍. ˩ࡇ ⇼∓⇩⇕⇕ݣ∓∞∙ ⇼∓⇩⇕″ ݱ •†‧ ⇼∓⇩⇕‴ ⇼∓⇩⇕‵ •† • ⇼∓⇩⇕‶ ⇼∓⇩⇕‶ ݱ •†• ⇼∓⇩⇕‷ . 䜽䝻䞊䞁䝨䜰䝸䝇䝖. ఏҊख๏ͷ֓ཁ. Fig. 2 An overview of our detection approach. ůŽĐŬ. Ƈ. Ƈ. ſƀƇ. ſƀƇ. ůŽĐŬ. ſƀƇ ʰɥŚ ƈ. ůŽĐŬ. ƈ. ůŽĐŬ . ſƀƇ ʰɥŚ ƈ. ʰɨŚ. ƈ. ƈ. ਤ 3. 3.2 ίʔυϒϩοΫͱϫʔυͷநग़ ຊख๏ͰߏจղੳΛߦ͍ɼίʔυϒϩοΫͷநग़Λߦ ůŽĐŬ . ͚ΒΕΔɽຊख๏Ͱɼߏจղੳʹ ANTLR v4*1 Λར༻. ʰɨŚ. ͨ͠ɽ. ƈ. ೖΕࢠߏʹ͋Δ. STEP I ιʔείʔυʹରͯ͠ߏจղੳΛߦ͍ɼநߏ. ਤ 4. ڞ௨෦͕͋Δ. ίʔυϒϩοΫ. ίʔυϒϩοΫϖΞ. Fig. 3 Code blocks in. ͏ɽίʔυϒϩοΫநग़ͷख๏ҎԼͷ 6 ͭͷεςοϓʹ. Fig. 4 Code block pair. nested structure. จΛੜ͢Δɽ. STEP II நߏจ͔Βؔʢ3.1.1 અͷ݅ 1 Λຬͨ ͢ίʔυϒϩοΫʣͷ෦ΛऔΓग़͢ɽ. sharing a common part. STEP III STEP II ͰऔΓग़ͨ͠෦Λ࠷֎ଆͷίʔ Ƈ ſƀƇ.
(5) . ƈ. ਤ 5. STEP IV STEP II ͰऔΓग़ͨ͠෦͔Βɼίʔυϒ. ſƀƇ.
(6) . ſƀƇ ʰɥŚ ƈ. ƈ. υϒϩοΫͱͯ͠நग़͢Δɽ. Ƈ. ſƀƇ ʰɥŚ ƈ.
(7) . ʰɨŚ. ƈ. ϩοΫʢ3.1.1 અͷ݅ 2 Λຬͨ͢ίʔυϒϩοΫʣͷ.
(8) . ʰɨŚ. ෦ΛऔΓग़͢ɽ. STEP V STEP IV ͰऔΓग़ͨ͠෦ΛೖΕࢠؔʹ ͋ΔίʔυϒϩοΫͱͯ͠நग़͢Δɽ. ƈ. ۃେίʔυϒϩοΫϖΞͱॏෳͨ͠ίʔυϒϩοΫϖΞ. STEP VI Ҏ߱ɼਂ͞༏ઌ୳ࡧͰநߏจ͔Βίʔυ ϒϩοΫΛநग़͢Δɽ. Fig. 5 Code block pair overlapped. ίʔυϒϩοΫͷநग़ޙɼ֤ίʔυϒϩοΫʹ·ؚΕ. with a maximam code block pair. Δϫʔυͷநग़Λߦ͏ɽ ۃେϒϩοΫΫϩʔϯͱͿݺɽຊख๏ͰɼۃେϒϩοΫ ΫϩʔϯΛϒϩοΫΫϩʔϯͱఆٛ͢Δɽ. 3.3 ಛϕΫτϧͷࢉܭ. ݅ 6 Ͱࣔͨ͠Α͏ʹɼϒϩοΫΫϩʔϯϖΞίʔυ. ಛϕΫτϧͷͰࢉܭɼϫʔυʹର͠ TF-IDF ๏ [2]. ϒϩοΫؒʹڞ௨෦͕ͳ͍͜ͱ͕݅Ͱ͋Δɽίʔυϒ. Λར༻ͯ͠ॏΈΛ͠ࢉܭɼͦͷΛಛྔͱ֤ͯ͠ίʔυ. ϩοΫؒʹڞ௨෦͕ଘࡏ͢Δ߹ɼҰํͷίʔυϒϩο. ϒϩοΫΛಛϕΫτϧʹม͢ΔɽΑͬͯɼ֤ίʔυϒ. Ϋ͕ଞํΛแؚ͍ͯ͠Δ͜ͱΛ͍ࣔͯ͠Δɽྫ͑ɼਤ 4. ϩοΫͷಛϕΫτϧͷ࣍ݩιʔείʔυதʹଘࡏ͢. ͷίʔυϒϩοΫ A ͱ B ڞ௨෦͕ଘࡏ͠ɼแؚؔ. ΔશϫʔυͷछྨͱͳΔɽຊख๏Ͱɼtf ίʔυϒ. ʹ͋ΔͨΊϒϩοΫΫϩʔϯϖΞͰͳ͍ɽ. ϩοΫதͷϫʔυͷग़ݱසΛɼidf ιʔείʔυத. ·ͨɼۃେϒϩοΫΫϩʔϯΛϒϩοΫΫϩʔϯͱఆٛ ͢Δͱɼ͍͑ݴΔͱͦΕͧΕೖΕࢠؔʹ͋Δίʔυ. ͷϫʔυͷرগ͞Λද͍ͯ͠Δɽtf ࣜ (1)ɼidf ࣜ. (2) Ͱ༩͑ΒΕΔɽ. ϒϩοΫͷྨࣅ͕ᮢҎ্ͷ߹ɼ࠷֎ଆͷίʔυϒ. tf i,j = . ϩοΫϖΞΛϒϩοΫΫϩʔϯͱ͢Δͱ͍͏ҙຯͰ͋Δɽ ྫ͑ɼਤ 5 ͷίʔυϒϩοΫ A ͱ CɼB ͱ D ͦΕͧΕ. idf i = log. ͷྨࣅ͕ᮢҎ্ͱͳΔ߹ɼ࠷֎ଆͷίʔυϒϩο Ϋ A ͱ C ΛϒϩοΫΫϩʔϯͱ͢Δɽ. c 2017 Information Processing Society of Japan . ni,j k∈bj nk,j. *1. |F | |{f : f wi }|. (1) (2). http://www.antlr.org/. 4.
(9) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report ද 1. ͜͜Ͱɼni,j ίʔυϒϩοΫ bj ʹ͓͚Δϫʔυ wi ͷग़ݱճɼ k∈bj nk,j ίʔυϒϩοΫ bj ʹ͓͚Δશ. ݕग़ରϓϩδΣΫτ. Table 1 Target projects. ϫʔυͷग़ݱճͷɼ|F | શؔͷɼ|{f : f wi }| ϫʔυ wi ͕ग़͢ݱΔؔͷΛ͍ࣔͯ͠Δɽ ؔΫϩʔϯݕग़๏ͱൺֱͯ͠ɼtf ͷٻΊํΛίʔυ. ϓϩδΣΫτ. όʔδϣϯ. ޠݴ. ن. Apache HTTPD*3. 2.2.14. C/C++. 343 KLOC. PostgreSQL*4. 8.5.1. C/C++. 937 KLOC. Python*5. 2.5.1. C/C++. 435 KLOC. ϒϩοΫ୯Ґʹมߋ͕ͨ͠ɼidf ͷٻΊํίʔυϒϩο Ϋ୯Ґʹมߋͤͣؔ୯Ґͷ··Ͱ͋ΔɽͳͥͳΒɼίʔ υϒϩοΫ୯ҐͰ idf ΛٻΊΔͱϫʔυͷॏΈ͚ʹภ. LSH ͷ࣮ͱͯ͠ FALCONN [1]*2 ϥΠϒϥϦΛར༻ͨ͠ɽ. Γ͕ੜͯ͡͠·͏͔ΒͰ͋Δɽ͋Δϫʔυ͕͍ͭ͘ͷίʔ υϒϩοΫʹ·ؚΕΔ͔ग़ݱॴʹΑͬͯҟͳΔɽྫ͑. 3.5 ಛϕΫτϧͷྨࣅͷࢉܭ. ɼਤ 4 ͷؔͷม a ϒϩοΫ A ͱ B ͷ 2 ݸͷίʔ. ຊख๏ͰίαΠϯྨࣅΛ༻͍ͯΫϩʔϯϖΞͷఆ. υϒϩοΫʹ·ؚΕΔ͕ɼม b ϒϩοΫ A ͷΈʹ͔͠ ·ؚΕͳ͍ɽͦͷͨΊɼιʔείʔυதͷग़ݱճಉ͡. Λߦ͏ɽίαΠϯྨࣅଟ࣍ݩϕΫτϧͷྨࣅΛද͢ ईͰ͋Γɼ࣍ ͕ݩV Ͱ͋Δ 2 ͭͷಛϕΫτϧ a, b ؒͷ. ʹؔΘΒͣɼग़͢ݱΔίʔυϒϩοΫͷ͕ҟͳͬͯ͠. ྨࣅҎԼͷࣜ (3) Ͱ༩͑ΒΕΔɽ. |V | i=1 ai bi sim(a, b) = cos(a, b) = |V | 2 |V | 2 a i=1 i i=1 bi. ·͏ɽidf ϫʔυͷرগੑʹ͍ͯͮجॏΈ͚Λߦͬ ͍ͯΔͨΊɼภΓΛແͨ͘͢Ίʹؔ୯ҐͰٻΊͨɽ. 3.4 ಛϕΫτϧͷΫϥελϦϯά. (3). TF-IDF ๏ͷࣜࢉܭΑΓɼಛྔৗʹਖ਼ͷΛऔΔͨΊɼ. ಛϕΫτϧͷΫϥελϦϯάͱͯ͠ɼؔΫϩʔϯݕ. ίαΠϯྨࣅ 0 ͔Β 1 ͷൣғͱͳΔɽίαΠϯྨࣅ. ग़๏ͱಉ༷ʹ LSH Λ༻͍ͨɽΫϥελϦϯάΛߦ͏͜ͱ. ͕ᮢҎ্Ͱ͋ΕɼͦΕΒ 2 ͭͷίʔυϒϩοΫΫ. ʹΑͬͯɼΫϩʔϯϖΞͱͳΓ͏ΔީิΛߜΔ͜ͱ͕Ͱ͖ɼ. ϩʔϯϖΞͰ͋Δͱఆ͢Δɽ. ߴͳΫϩʔϯϖΞͷݕग़͕ՄೳͱͳΔɽ͔͠͠ɼେن ͷσʔληοτΛ LSH ΞϧΰϦζϜΛ༻͍ͯߴ͍ਫ਼Ͱ. 4. ධՁ࣮ݧ ຊষͰɼຊͰڀݚఏҊͨ͠ϒϩοΫΫϩʔϯݕग़๏ͷ. ٻΊΑ͏ͱ͢ΔͱɼϝϞϦ༻ྔ͕ඇৗʹେ͖͘ͳΔͱ͍ ͏͕͋Δ [8], [9].. LSH ͷΞϧΰϦζϜɼۭؒతʹۙͨ͠ೋ͕ಉ͡ ϋογϡʹͳΔ͕֬ߴ͘ͳΔΑ͏ͳϋογϡؔΛ. ධՁ࣮͍ͯͭʹݧड़ΔɽධՁ࣮Ͱݧɼຊख๏ͱطଘख ๏ͱͷൺֱͱɼຊख๏ͷݕग़࣌ؒͱεέʔϥϏϦςΟͷධ ՁΛߦͬͨɽ. ༻͍ɼಉ͡ϋογϡΛऔΔΛಉ͡όέοτʹೖΕΔ͜ ͱͰΫϥελϦϯάΛߦ͏ɽ͔͠͠ LSH ֬తख๏Ͱ. 4.1 ؔΫϩʔϯݕग़๏ͱ CCFinder ͱͷൺֱ. ͋ΔͨΊɼۙͨ͠ೋ͕ۮવʹผͷόέοτʹೖΔՄೳ. ຊઅͰؔΫϩʔϯݕग़๏ͱ CCFinder [7] ͷ 2 ͭͷ. ੑ͕͋Δɽैདྷͷ LSH ͰෳͷϋογϡؔΛ༻͍Δ. طଘख๏ͱͷൺֱ࣮͍ͯͭʹݧड़Δɽຊ࣮Ͱݧɼର. ͜ͱͰ֬తͳࠩޡΛগͳ͘͠ਫ਼Λ্͍͛ͯΔɽͦͷͨ. ϓϩδΣΫτ͔Β࡞ͨ͠ϕϯνϚʔΫʹର͢Δݕग़ਫ਼. Ίɼେنͳσʔληοτʹରͯ͠ΑΓଟͷϋογϡ. ͱݕग़࣌ؒͷ͔؍ΒൺֱΛߦͬͨɽCCFinder ࠃ. ͕ؔඞཁͱͳΓɼ͜ΕʹΑΓ LSH ͷϝϞϦ༻ྔͷ૿. ֎ͷۀاɾେֶͰ༻͞Ε͍ͯΔͨΊɼൺֱରʹՃ͠. େʹͭͳ͕͍ͬͯΔɽ. ͨɽຊ࣮ݕͰݧग़ରͱͨ͠ϓϩδΣΫτͷҰཡΛද 1 ʹ. ͦ͜ͰɼLv ΒϝϞϦ༻ྔͷվྑΛߦͬͨ Multi-Probe. LSH[9] ΛఏҊͨ͠ɽMulti-Probe LSH ɼ͋Δ͕ೖΔό. ࣔ͢ɽ. 4.1.1 ϕϯνϚʔΫͷ࡞ํ๏ ϕϯνϚʔΫͷ࡞ҎԼͷ 3 εςοϓͰߦͬͨɽ. έοτ͚ͩͰͳ͘ɼۭؒతʹۙͨ͠όέοτ܈ௐΔ ͱ͍͏ͷͰ͋Δɽ͜ΕʹΑΓɼগͳ͍ϋογϡؔͰ. ( 1 ) ද 1 ͷϓϩδΣΫτʹର͠ɼຊख๏ɼؔΫϩʔϯ. ۮવผͷόέοτʹೖͬͨͷݟམͱ͠Λ͍Ͱ͍Δɽ࣮. ݕग़๏ɼCCFinder ͷ 3 ͭͷख๏ͰίʔυΫϩʔϯΛ. ࡍʹɼ192 ࣍ݩͷσʔληοτʹରͯ͠ಉ͡࠶( ݱ0.90). ݕग़ɽ. ΛಘΔͨΊʹɼैདྷͷ LSH 49 ݸͷϋογϡ͕ؔඞཁ. ( 2 ) ֤ख๏͕֤ϓϩδΣΫτ͔Βݕग़ͨ͠ΫϩʔϯϖΞ͔. ͩͬͨͷʹର͠ɼMulti-Probe LSH 3 ݸͷϋογϡؔ. Βɼ30 ݸͷΫϩʔϯϖΞΛϥϯμϜαϯϓϦϯά͠ɼ. Ͱؒ࣌ࡧݕΛ΄΅མͱͣ͞ʹୡͨ͠ͱ͍͏݁Ռ͕ใࠂ͞. ߹ ܭ270 ݸͷΫϩʔϯϖΞू߹Λ࡞ɽ. Ε͍ͯΔ [9]ɽ ຊख๏ͰϝϞϦ༻ྔͷվྑΛߦ͏ͨΊɼMulti-Probe. LSH Λ༻͍ͯΫϥελϦϯάΛߦ͏ɽͳ͓ɼMulti-Probe. ( 3 ) (2) Ͱ࡞ͨ͠ 270 ݸͷΫϩʔϯϖΞʹର͠ɼࢹͰ *2 *3 *4 *5. c 2017 Information Processing Society of Japan . https://falconn-lib.org/ http://httpd.apache.org/ http://www.postgresql.org/ http://www.python.org/. 5.
(10) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report ද 2. ू·ͨಉ࣌मਖ਼ͷอकରͱͳΔίʔυΫϩʔϯ. Table 2 Evaluation of detection accuracy. ͔ͷஅΛߦ͍ɼϕϯνϚʔΫΛ࡞ɽ ͳ͓ɼϕϯνϚʔΫʹ٬؍ੑΛ࣋ͨͤΔͨΊɼΞϯέʔ. ݕग़ख๏. ݕग़ର. ద߹. ࠶ݱ. F. Apache HTTPD. 0.90. 0.74. 0.81. PostgreSQL. 0.57. 0.87. 0.69. Python. 0.90. 0.53. 0.67. ߹ܭ. 0.68. 0.70. 0.69. Apache HTTPD. 0.87. 0.53. 0.66. ؔΫϩʔϯ. PostgreSQL. 0.83. 0.74. 0.78. ݕग़๏. Python. 0.30. 0.21. 0.25. ߹ܭ. 0.67. 0.47. 0.55. Apache HTTPD. 0.70. 0.55. 0.62. PostgreSQL. 0.13. 0.33. 0.19. Python. 0.87. 0.63. 0.73. ߹ܭ. 0.57. 0.52. 0.54. τʹΑΓୈࡾऀʹίʔυΫϩʔϯͷஅΛґཔͨ͠ɽΞϯ ຊख๏. έʔτͷ֓ཁΛҎԼʹࣔ͢ɽ ௐࠪର. ҎԼͷ 3 ໊ʹґཔ. • ίʔυΫϩʔϯͷ ऀڀݚ1 ໊ • ίʔυΫϩʔϯͷ͍ͯ͠ࣄैʹڀݚΔେֶӃੜ 2 ໊ ࣭༰. ू·ͨಉ࣌मਖ਼ͷอकରͱͳΔ͔. ճํࣜ. ೋʢ͍/͍͍͑ʣ. ্هͷ࣭Λɼݕग़݁Ռ͔ΒαϯϓϦϯάͨ͠ 270 ݸͷΫ CCFinder. ϩʔϯϖΞʹରͯ͠ߦͬͨɽͦͯ͠ɼաͰ͋Δ 2 ਓҎ ্͕ʮ͍ʯͱճͨ͠ΫϩʔϯϖΞΛਖ਼ղͱ͠ɼຊ࣮Ͱݧ ༻͍Δਖ਼ղΫϩʔϯϖΞू߹ͱͯ͠ϕϯνϚʔΫΛ࡞͠ ͨɽຊ࣮Ͱݧɼ֤ϓϩδΣΫτͷਖ਼ղΫϩʔϯϖΞɼ. ද 3. Apache HTTPD ͕ 74 ݸɼPostgreSQL ͕ 46 ݸɼPython. ຊઅͰɼؔΫϩʔϯݕग़๏ͱ CCFinder ͱͷൺֱ࣮. ݕग़࣌ؒͷൺֱ. Table 3 Comparison of detection time. ͕ 62 ͨͬͳͱݸɽ. 4.1.2 ൺֱ݁Ռ. ݕग़ਫ਼ͷධՁ. ݕग़ର. ຊख๏. ؔΫϩʔϯݕग़๏. CCFinder. Apache HTTPD. 1m 39s. 4m 7s. 2m 1s. PostgreSQL. 2m 27s. 8m 47s. 5m 30s. Python. 1m 15s. 3m 33s. 3m 10s. ݧͷ݁Ռʹ͍ͭͯड़Δɽຊ࣮ͰݧɼϕϯνϚʔΫΛ༻ ͍ͨݕग़ਫ਼ͱݕग़࣌ؒͷ͔؍ΒൺֱΛߦͬͨɽݕग़ਫ਼ ͷࢦඪͱͯ͠ɼద߹ɼ࠶ݱɼF ͷ 3 ͭͷࢦඪΛ༻. ૯߹తʹ࠶͕֬ͱ͍͜ߴ͕ݱೝͰ͖ͨɽ. ͍ͨɽຊ࣮͚͓ʹݧΔ 3 ͭͷࢦඪΛද 2 ʹࣔ͢ɽ. ݕग़࣌ؒ. ద߹ ຊख๏ͰɼApache HTTPD ͱ Python ʹ͓͍ͯɼؔ. ݕग़࣌ؒͷൺֱͰɼද 1 ͷ 3 ͭͷϓϩδΣΫτʹର͢ Δݕग़࣌ؒΛଌఆͨ͠ɽຊ࣮ݧͷ࣮ߦڥɼCPU Intel. Ϋϩʔϯݕग़๏ CCFinder ΑΓߴ͍ద߹͕ಘΒΕͨɽ. Xeon 2.80GHz 4coreɼϝϞϦ 16GBɼϋʔυσΟεΫυϥ. ·ͨɼPostgreSQL ʹ͓͍ͯɼCCFinder ΑΓߴ͍ద. ΠϒɼOS Windows 10 64bit Ͱ͋Δɽ. ߹͕ಘΒΕ͕ͨɼؔΫϩʔϯݕग़๏ΑΓ͍ద߹ͱ. ݕग़࣌ؒͷൺֱ݁ՌΛද 3 ʹࣔ͢ɽຊख๏Ͱɼݕग़ର. ͳͬͨɽ3 ͭͷશͯͷϓϩδΣΫτͷ߹Ͱܭద߹ 0.68. શͯͷϓϩδΣΫτʹରͯ͠ 3 ҎԼͰίʔυΫϩʔϯ. ͱɼؔΫϩʔϯݕग़๏ͱಉఔͷద߹ɼCCFinder Α. Λݕग़͕Ͱ͖ͨɽ·ͨɼؔΫϩʔϯݕग़๏ʹରͯ͠ 3∼4. Γߴ͍ద߹Ͱ͋Δ͜ͱ͕֬ೝͰ͖ͨɽ. ׂఔɼCCFinder ʹରͯ͠ 4∼8 ׂఔͱɼൺֱख๏ΑΓ. ࠶ݱ. ࣌ؒͰݕग़͢Δ͜ͱ͕֬ೝͰ͖ͨɽ. ຊख๏ͰɼApache HTTPDɼPostgreSQL ʹ͓͍ͯɼ ؔΫϩʔϯݕग़๏ CCFinder ΑΓߴ͍࠶͕ݱಘΒ Εͨɽ·ͨɼPython ʹ͓͍ͯɼؔΫϩʔϯݕग़๏Α Γߴ͍࠶͕ݱಘΒΕ͕ͨɼCCFinder ΑΓ͍࠶ݱ. 4.2 ݕग़࣌ؒͱεέʔϥϏϦςΟͷධՁ ຊઅͰɼఏҊख๏ͷݕग़ن͝ͱͷݕग़࣌ؒͱεέʔ ϥϏϦςΟͷධՁʹ͍ͭͯड़Δɽݕग़ରͷنͷࢦඪ. ͱͳͬͨɽ3 ͭͷશͯͷϓϩδΣΫτͷ߹Ͱܭɼ࠶ݱ. ʹίϝϯτۭߦΛআ͍ͨ LOC Λ༻͍Δ͜ͱͱͨ͠ɽ. 0.70 ͱɼؔΫϩʔϯݕग़๏ͱ CCFinder ΑΓ૯߹తʹ࠶. ܭଌʹ cloc*6 Λ༻͍ͨɽݕग़ରɼIJaDataset 2.0*7 ͔. ͕֬ͱ͍͜ߴ͕ݱೝͰ͖ͨɽ. ΒϥϯμϜʹϑΝΠϧΛબ͠ɼද 4 ʹࣔͨ͠ LOC ͝ͱ. F. ͷαϒηοτγεςϜΛ࡞ͨ͠ɽຊ࣮ݧͷ࣮ߦڥɼ. ຊख๏ͰɼApache HTTPD ʹ͓͍ͯ F 0.81 ͱɼؔ. CPU Intel Xeon 2.80GHz 4coreɼϝϞϦ 32GBɼιϦο. Ϋϩʔϯݕग़๏ CCFinder ΑΓߴ͍ F ͕ಘΒΕͨɽ. υεςʔτυϥΠϒɼOS Windows 10 64bit Ͱ͋Δɽ·. PostgreSQL ʹ͓͍ͯ F 0.69 ͱɼCCFinder ΑΓߴ. ͨɼJava ԾϚγϯͷελοΫྖҬΛ 1GBɼώʔϓྖҬ. ͍ F ͕ಘΒΕ͕ͨɼؔΫϩʔϯݕग़๏ΑΓ͍ F . Λ 15GB ͱઃఆ࣮ͯ͠ߦͨ͠ɽ. ͱͳͬͨɽ·ͨɼPython ʹ͓͍ͯ F 0.67 ͱɼؔΫ. ݕग़࣌ؒͷධՁ݁ՌΛද 4 ʹࣔ͢ɽ͜ΕʹΑΓɼຊख๏. ϩʔϯݕग़๏ΑΓߴ͍ F ͕ಘΒΕ͕ͨɼCCFinder Α. Ͱ 100MLOC ͷݕग़ରʹରͯ͠ 4 ࣌ؒఔͰݕग़Մೳ. Γ͍ F ͱͳͬͨɽ3 ͭͷશͯͷϓϩδΣΫτͷ߹ܭͷ. F 0.69 ͱͳΓɼؔΫϩʔϯݕग़๏ͱ CCFinder ΑΓ. c 2017 Information Processing Society of Japan . *6 *7. http://cloc.sourceforge.net http://secold.org/projects/seclone. 6.
(11) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report ද 4. ݕग़ن͝ͱͷݕग़࣌ؒ. Table 4 Detection time for varying input size LOC. 1K. 10K. 100K. 1M. 10M. 100M. ࣌ؒ. 1s. 2s. 15s. 2m 53s. 32m 59s. 4h 5m 17s. Ͱ͋Δ͜ͱ͕֬ೝͰ͖ͨɽ. ɪɪɫśɏſɏɏƀɏɏſɏɏ Ƌƀ ɪɪɬśƇ ɪɪɭśɏɏ ʰɏŚ ɪɪɮś ɪɪɯśſŞʴƀƇ ɪɪɰśɏ ſƀŚ ɪɫɥś ʰɏɏɏ ſƀŚ ɪɫɨśɏ ſƀŚ ɪɫɩśƈ ɪɫɪśŵƋ ɪɫɫśƋ 䠄䝁䝯䞁䝖┬␎䠅 ɪɫɬśƋŵ ɪɫɭśŚ ɪɫɮśƈ. 4.3 ϒϩοΫΫϩʔϯͷ࣮ྫ ຊઅͰɼΞϯέʔτʹͯอकରͷίʔυΫϩʔϯͱ. (a) httpd/srclib/apr/file io/unix/readwrite.c. அ͞ΕͨΫϩʔϯϖΞͷத͔Βɼຊख๏ʹΑͬͯݕग़͠ ͨϒϩοΫΫϩʔϯͷ࣮ྫΛࣔ͢ɽ ಉؔ͡ʹଘࡏ͢ΔϒϩοΫΫϩʔϯ ؔΫϩʔϯݕग़๏Ͱؔ୯Ґͷݕग़ͷͨΊɼಉؔ͡ ͰίϐʔΞϯυϖʔετΛߦ͏ͳͲͯؔ͠Ͱൃੜ ͨ͠ίʔυΫϩʔϯΛݕग़Ͱ͖ͳ͔͕ͬͨɼຊख๏Ͱݕ ग़ՄೳͰ͋ΔྫΛ֬ೝͰ͖ͨɽ ͍ؔͷҰ෦͕Ұக͢ΔϒϩοΫΫϩʔϯ. 90 ߦҎ্ͷ͍ؔͷҰ෦͕Ұக͢ΔϒϩοΫΫϩʔ ϯͰ͋Δɽؔ୯ҐͰҟͳΔॲཧΛߦ͍ͬͯΔͨΊɼؔ Ϋϩʔϯݕग़๏Ͱ͜ͷΑ͏ͳίʔυΫϩʔϯΛݕग़Ͱ ͖ͳ͔ͬͨɽ͔͠͠ɼຊख๏Ͱݕग़ՄೳͰ͋ΔྫΛ֬ೝ Ͱ͖ͨɽ จͷૠೖ͕ߦΘΕͨϒϩοΫΫϩʔϯ ؔ୯ҐͰɼྨࣅͨ͠ॲཧΛߦ͏λΠϓ 4 ͷίʔυΫ. ɪɫɰśɏſɏɏƀɏɏ ſɏɏ Ƌƀ ɪɬɥśƇ ɪɬɨśɏɏ ʰɏŚ ɪɬɩś ɪɬɪśɏ ſƀŚ ɪɬɫś ɪɬɬśſŞʴƀƇ ɪɬɭś ʰɏɏɏ ſƀŚ ɪɬɮś ɪɬɯśſ ŠʰɏƀƇ ɪɬɰśɏ ſƀŚ ɪɭɥśŚ ɪɭɨśƈ ɪɭɩśƈ ɪɭɪś ɪɭɫśſ ſŞʴƀƀƇ ɪɭɬś ʰɏɏɏſƀŚ ɪɭɭśƈ ɪɭɮś ɪɭɯśɏ ſƀŚ ɪɭɰś ɪɮɥśŚ ɪɮɨśƈ. (b) httpd/srclib/apr/file io/unix/readwrite.c ਤ 6. ϑΝΠϧͷग़ྗॲཧΛߦ͏ϒϩοΫΫϩʔϯ. Fig. 6 Block clones writing data to a file. ϩʔϯͰ͋Δ͕ɼؔΫϩʔϯݕग़๏Ͱ࣮ࡍʹݕग़Ͱ͖ ͳ͔ͬͨɽ͔͠͠ݕग़ཻΛԼ͛Δ͜ͱͰɼຊख๏Ͱݕ. ݕग़๏ͷ༗༻ੑ͕֬ೝͰ͖Δ [10]ɽ͔͠͠ɼؔΫϩʔϯ. ग़ՄೳͰ͋ΔྫΛ֬ೝͰ͖ͨɽ. ݕग़๏Ͱؔ୯Ґͷݕग़ͷͨΊʹݕग़࿙Ε͕ੜ͍ͯ͡Δ. ྨࣅͨ͠ॲཧΛߦ͏ϒϩοΫΫϩʔϯ. ՄೳੑΛߟ͑ɼݕग़ཻΛίʔυϒϩοΫ୯Ґʹখ͘͞͠. ਤ 6 ɼϑΝΠϧͷग़ྗॲཧΛߦ͏λΠϓ 4 ͷϒϩοΫ. ͨຊख๏ΛఏҊͨ͠ɽຊ࣮ݧͷ݁Ռɼ3 ͭͷϓϩδΣΫτ. ΫϩʔϯͰ͋Δɽ৭ͷίʔυยͱ৭ͷίʔυย͕Ұக. ͷ߹ʹܭରͯ͠ద߹ɼ࠶ݱɼF ͷ 3 ͭͷࢦඪͰؔ. ՕॴΛද͍ͯ͠ΔɽͲͪΒϑΝΠϧͷೖग़ྗؔ࿈ͷॲཧ. Ϋϩʔϯݕग़๏ CCFinder ΑΓߴ͍͕ಘΒΕɼݕग़ਫ਼. Λߦ͏͕ؔͩɼਤ 6(b) ͷ apr file sync ͕ؔɼਤ 6(a). ͷͰ؍ຊख๏ͷ༗༻ੑΛ֬ೝͰ͖ͨɽ. ͷ apr file flush ؔʹՃ͑ͯಠࣗͷॲཧΛߦ͏ͨΊɼؔ. ݕग़࣌ؒͱεέʔϥϏϦςΟ. Ϋϩʔϯݕग़๏Ͱݕग़Ͱ͖ͳ͔ͬͨɽ͔͠͠ݕग़ཻΛ. ຊख๏ؔΫϩʔϯݕग़๏ CCFinder ΑΓߴʹݕ. Լ͛Δ͜ͱͰ֤ؔͷڞ௨෦Λ͚ͭݟग़͠ɼਤ 6 ͷΑ͏. ग़Ͱ͖Δ͜ͱ͕֬ೝͰ͖ͨɽ·ͨɼ100MLOC ͱ͍͏େن. ʹݕग़ՄೳͰ͋ΔྫΛ֬ೝͰ͖ͨɽ. ϓϩδΣΫτʹରͯ͠ 4 ࣌ؒఔͰݕग़Ͱ͖Δ͜ͱ֬ ೝͰ͖ͨɽݕग़ͷߴԽͷཧ༝ͱͯ͠ɼMulti-Probe LSH. 4.4 ߟ ຊઅͰɼఏҊख๏ɼ͓ΑͼධՁ࣮ݧͷ݁Ռʹ͍ͭͯͷ. Λ༻͍ͯΫϥελϦϯάΛߦ͏͜ͱͰɼϝϞϦ༻ྔ͕ݮ ΓɼΑΓޮతͳ͑ߦ͕ࢉܭΔ͜ͱʹΑΓߴԽʹ͕ͬܨ. ٞΛߦ͍ɼຊख๏ͷ༗༻ੑɼ֦ுੑɼ͓ΑͼධՁ࣮ݧͷ. ͨͱߟ͑ΒΕΔɽ. ଥੑʹ͍ͭͯͷߟΛߦ͏ɽ. ϒϩοΫΫϩʔϯͷ࣮ྫ. ݕग़ਫ਼. ϒϩοΫΫϩʔϯͷ࣮ྫʹΑͬͯɼ͍ؔͷҰ෦͕. ࢁதΒ͕ߦͬͨ Tempero ΒͷίʔύεΛ༻͍ͨධՁ࣮. Ұக͢ΔίʔυΫϩʔϯɼಉؔ͡ʹଘࡏ͢Δίʔυ. ͰݧɼؔΫϩʔϯݕग़๏ద߹͕ 90%Λ͑Δͱ͍. ΫϩʔϯͳͲɼؔΫϩʔϯݕग़๏Ͱݕग़Ͱ͖ͳ͔ͬͨ. ͏ใࠂ͕͞Ε͍ͯΔ [13]ɽ·ͨɼপాΒ͕ߦͬͨόάΛؚ. ίʔυΫϩʔϯΛ֬ೝͰ͖ͨɽΑͬͯɼؔΫϩʔϯݕग़. Ήίʔυยʹର͢ΔؔΫϩʔϯݕग़ख๏ͱ CCFinder ͷ. ๏ʹΑΔݕग़࿙ΕͷݮΛࣔͤͨɽ. ൺֱ࣮ʹݧΑΓɼؔΫϩʔϯݕग़๏͕ CCFinder ΑΓߴ. ຊख๏ͷ֦ுੑ. ͍ద߹ΛಘΒΕΔ͜ͱใࠂ͞Ε͓ͯΓɼؔΫϩʔϯ. c 2017 Information Processing Society of Japan . ຊख๏ͷ࣮ɼ ࡏݱC ͱޠݴJava ʹޠݴͷΈରԠ͠. 7.
(12) Vol.2017-SE-196 No.19 2017/7/20. ใॲཧֶձڀݚใࠂ IPSJ SIG Technical Report. ͍ͯΔɽ͔͠͠ɼຊख๏Ͱ ANTLR Λ༻͍ͯߏจղੳΛ. ༻ੑΛධՁ͢Δඞཁ͕͋Δɽ͞ΒʹɼMeCC ͳͲͷଞ. ߦ͓ͬͯΓɼANTLR ʹͯߏจղੳΛߦ͏ͨΊͷจ๏ϑΝ. ͷπʔϧͱͷൺֱΛߦ͏ඞཁ͕͋Δɽ. Πϧ͕ 100 छྨҎ্༻ҙ͞Ε͍ͯΔ͜ͱ͔Βɼଞͷޠݴ ͷ֦ு͕༰қʹՄೳՄೳͰ͋Δɽ ·ͨɼຊख๏ͰίʔυϒϩοΫΛ“{ }”ʹғ·Εͨൣ ғʹΑͬͯநग़͍ͯ͠Δɽ͔͠͠ɼࣈԼ͛ʹΑΔίʔυϒ. ँࣙ ຊ༷͍͓ͯʹڀݚʑͳ͝ྗڠΛ͍͍ͨͩͨຊి ࣜגؾձࣾ લాਓࢯɼौ୩݈հࢯʹਂ͘͢ँײΔɽຊݚ ڀ JSPS Պݚඅ 25220003ɼ16K16034ɼ15H06344 ͷॿ Λड͚ͨɽ. ϩοΫͷநग़ɼ͋ΔҰఆߦͷ·ͱ·ΓΛίʔυϒϩο Ϋͱͯ͠நग़͢ΔͳͲɼநग़ํ๏ʹଞͷख๏Λద༻͢Δ͜. ࢀߟจݙ. ͱՄೳͰ͋Δɽ. [1]. ධՁ࣮ݧͷଥੑ ຊ࣮Ͱݧɼ3 ͭͷ C ޠݴͷϓϩδΣΫτͷΈʹରͯ͠ ൺֱΛߦ͏͜ͱʹΑͬͯຊख๏ͷ༗༻ੑΛࣔͨ͠ɽ͔͠ ͠ɼࠓޙɼଞͷ࣮͞ͰޠݴΕͨଟ͘ͷϓϩδΣΫτʹର. [2]. ͯ͠ద༻͠ɼҰൠੑΛࣔ͢ඞཁੑ͕͋Δɽ·ͨɼࠓճϕ ϯνϚʔΫͷ࡞ʹ͓͍ͯɼΞϯέʔτʹͯաͰ͋Δ. [3]. 2 ਓҎ্͕อकରͱͳΓ͏ΔίʔυΫϩʔϯͱఆͨ͠ ߹Λਖ਼ղͱͯ͠ѻͬͨɽ͔͠͠ɼΑΓਖ਼֬ੑΛ্͛Δͨ Ίʹɼҙׂ͕ݟΕͨΫϩʔϯϖΞʹ͍ͭͯٞΛߦ͏ඞ. [4]. ཁੑߟ͑ΒΕΔɽ. 5. ·ͱΊͱࠓޙͷ՝ ຊͰڀݚɼใٕࡧݕज़Λར༻ͨ͠ϒϩοΫΫϩʔϯ. [5]. [6]. ݕग़ख๏ͷఏҊΛߦͬͨɽຊख๏ͰɼߏจղੳΛߦ͍ ίʔυϒϩοΫͷநग़Λߦ͍ɼίʔυϒϩοΫதͷࣝผࢠ ༧ʹޠར༻͞Ε͍ͯΔ୯͔ޠΒϫʔυΛநग़͢Δɽͦ. [7]. ͯ͠ɼTF-IDF Λར༻֤ͯ͠ϫʔυʹର͢ΔॏΈΛ͠ࢉܭɼ ͦͷॏΈΛಛྔͱ֤ͯ͠ίʔυϒϩοΫΛಛϕΫτ ϧʹม͢ΔɽͦͷޙɼಛϕΫτϧؒͷྨࣅΛ͢ࢉܭ. [8]. Δ͜ͱʹΑͬͯɼҙຯతʹॲཧ༰͕ྨࣅͨ͠ϒϩοΫΫ ϩʔϯͷݕग़Λߦ͏ɽ·ͨɼLSH ΞϧΰϦζϜΛ༻͍ͯ͋. [9]. Β͔͡ΊಛϕΫτϧͷΫϥελϦϯάΛߦ͏͜ͱʹΑͬ ͯɼߴͳϒϩοΫΫϩʔϯͷݕग़Λ࣮ͨ͠ݱɽ ධՁ࣮Ͱݧɼ3 ͭͷ C ϓϩδΣΫτʹର͠ɼݕग़ਫ਼ ͱݕग़࣌ؒͷ͔؍ΒɼؔΫϩʔϯݕग़๏ͱ CCFinder. [10]. ͷ 2 ͭͷख๏ͱൺֱΛߦͬͨɽͦͷ݁Ռɼຊख๏͕૯߹త ʹߴ͍ਫ਼Ͱଟ͘ͷίʔυΫϩʔϯΛݕग़Ͱ͖Δ͜ͱ͕֬ ೝͰ͖ͨɽ·ͨɼݕग़࣌ؒ 3 ҎԼͱͳΓɼؔΫϩʔ. [11]. ϯݕग़๏ͱ CCFinder ΑΓߴʹίʔυΫϩʔϯΛݕग़Ͱ ͖ͨɽ͞ΒʹɼؔΫϩʔϯݕग़๏Ͱݕग़Ͱ͖ͳ͔ͬͨ. [12]. ίʔυϒϩοΫ୯ҐͷίʔυΫϩʔϯΛݕग़Ͱ͖ͨɽ ࠓޙͷ՝ͱͯ͠ɼҎԼ͕͛ڍΒΕΔɽ. • ϫʔυͷॏΈͷʹࢉܭɼLSIʢLatent Semantic Index-. [13]. ingʣ[2] ɼLDAʢLatent Dirichlet Allocationʣ[4] Λ ༻͍ͨख๏ͱൺֱΛߦ͏ඞཁ͕͋Δɽ. • ຊख๏Ͱ“{ }”ʹΑͬͯίʔυϒϩοΫͷఆٛΛߦͬ ͍ͯΔ͕ɼίʔυϒϩοΫͷఆٛநग़ํ๏Λ࠶ߟ͢ Δඞཁ͕͋Δɽ. [14]. Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I. and Schmidt, L.: Practical and Optimal LSH for Angular Distance, Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 1225–1233 (2015). Baeza-Yates, R. and Ribeiro-Neto, B.: Modern information retrieval: The concepts and technology behind search, Addison-Wesley (2011). Baxter, I. D., Yahin, A., Moura, L., Anna, M. S. and Bier, L.: Clone detection using abstract syntax trees, Proceedings of International Conference on Software Maintenance, pp. 368–377 (1998). Blei, D. M., Ng, A. Y. and Jordan, M. I.: Latent dirichlet allocation, Journal of machine Learning research, Vol. 3, No. Jan, pp. 993–1022 (2003). ංޙ๕थɼೇຊਅೋɼҪ্ࠀɿίʔυΫϩʔϯݕग़ͱͦ ͷؔ࿈ٕज़ɼిࢠใ௨৴ֶձจࢽɼVol. J91-D, No. 6, pp. 1465–1481 (2008). Indyk, P. and Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality, Proceedings of the 30th annual ACM symposium on Theory of computing, pp. 604–613 (1998). Kamiya, T., Kusumoto, S. and Inoue, K.: CCFinder: a multilinguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering, Vol. 28, No. 7, pp. 654–670 (2002). ݹլࢤٱɿϋογϡΛ༻͍ͨྨࣅٕࡧݕज़ͱͦͷԠ༻ɼ ిࢠใ௨৴ֶձૅجɾڥքιαΠΤςΟ Fundamentals Reviewɼ Vol. 7, No. 3, pp. 256–268 (2014). Lv, Q., Josephson, W., Wang, Z., Charikar, M. and Li, K.: Multi-probe LSH: efficient indexing for highdimensional similarity search, Proceedings of the 33rd international conference on Very large data bases, pp. 950–961 (2007). Numata, S., Yoshida, N., Choi, E. and Inoue, K.: On the Effectiveness of Vector-Based Approach for Supporting Simultaneous Editing of Software Clones, pp. 560–567, Springer International Publishing (2016). Rattan, D., Bhatia, R. and Singh, M.: Software clone detection: A systematic review, Information and Software Technology, Vol. 55, No. 7, pp. 1165–1199 (2013). Roy, C. K., Cordy, J. R. and Koschke, R.: Comparison and evaluation of code clone detection techniques and tools: A qualitative approach, Science of Computer Programming, Vol. 74, No. 7, pp. 470–495 (2009). ࢁத༟थɼቐɹԸಬɼ٢ాଇ༟ɼҪ্ࠀɿใٕࡧݕज़ ʹߴͮ͘جͳؔΫϩʔϯݕग़ɼใॲཧֶձจࢽɼ Vol. 55, No. 10, pp. 2245–2255 (2014). Yamanaka, Y., Choi, E., Yoshida, N., Inoue, K. and Sano, T.: Applying clone change notification system into an industrial development process, Proceedings of the 21st International Conference on Program Comprehension, pp. 199–206 (2013).. • ଞͷେنϓϩδΣΫτʹରͯ͠ద༻͠ɼຊख๏ͷ༗ c 2017 Information Processing Society of Japan . 8.
(13)
図
関連したドキュメント
Here is the “surprise”: the validity of assumption (2.14) on Claim 2.3 for some hyperbolic/Petrowski-type systems is verified (see Section 4) by precisely the same hard analysis
Here is the “surprise”: the validity of assumption (2.14) on Claim 2.3 for some hyperbolic/Petrowski-type systems is verified (see Section 4) by precisely the same hard analysis
Bae, “Blind grasp and manipulation of a rigid object by a pair of robot fingers with soft tips,” in Proceedings of the IEEE International Conference on Robotics and Automation
T´oth, A generalization of Pillai’s arithmetical function involving regular convolutions, Proceedings of the 13th Czech and Slovak International Conference on Number Theory
Moreover, it is important to note that the spinodal decomposition and the subsequent coarsening process are not only accelerated by temperature (as, in general, diffusion always is)
de la CAL, Using stochastic processes for studying Bernstein-type operators, Proceedings of the Second International Conference in Functional Analysis and Approximation The-
In this paper, we obtain some better results for the distance energy and the distance Estrada index of any connected strongly quotient graph (CSQG) as well as some relations between
This is applied in Section 3 to linear delayed neutral difference- differential equations and systems, with bounded operator-valued coefficients: For weighted LP-norms or