• 検索結果がありません。

グラフ再構築による大規模ウェブグラフ解析の性能評価

N/A
N/A
Protected

Academic year: 2021

シェア "グラフ再構築による大規模ウェブグラフ解析の性能評価"

Copied!
6
0
0

読み込み中.... (全文を見る)

全文

(1)Vol.2015-DBS-162 No.18 2015/11/26. IPSJ SIG Technical Report. 1,a). ,. 1,b). ,. . .. ,. ,. , .. , ,. .. , ,. , . 6. .. .. ,. , ,. .. .. 1.. , ,. ,. .. ,. ,. ,. .. . .. , .. 2014. 9. Internet Live Stats. ,. 2.. 10. ,. [1].. [7].. ,. .. ,. ,2. .. Step 1 . ,. [2],[3],[4],[5],[6] 1. . .. Step 2. , , , [7].. , ,. Compression graph Cluster graph. , ,. ,. 2 .. . ,. .. 1. a) b). Future University Hakodate [email protected] [email protected]. ⓒ 2015 Information Processing Society of Japan. 2.1. 1.

(2) Vol.2015-DBS-162 No.18 2015/11/26. IPSJ SIG Technical Report. 3. 3 .1. Compression graph. Cluster graph. ,2 ,3. .1 Compression graph. graph. Cluster. , .. PageRank. . Compression graph. PageRank Cluster graph. 1. PageRank. .Compression graph ,. .. 1. 2. ,. , 2. PageRank .. Cluster graph. PageRank. ,. .. .. ,. , 2.2. Compression graph. PageRank. ,Cluster graph .. PageRank. .. Compression graph. ,Cluster. Cluster graph. . 2. ,. .. ,. , .. ,. 2. .. 3. .. 4.. 2.1. , , Compression graph. , ,. Cluster graph. .. . SCAN[8], PageRank =2. .SCAN. .SCAN. Hub,Outlier. ,. Cluster. 3. ,. Outlier. , .SCAN. MacBook Air. 6.2. .. .. . Spark. ε=0.7,. 2 Hadoop,PageRank. .. 4. .. Laboratory for Web Algorithmics[9] 3. CPU. 2. ⓒ 2015 Information Processing Society of Japan. . 1 Intel Core i5 1.4GHz. Memory. 8.0GB. Disk. SSD 128GB. 4.1. 2.

(3) Vol.2015-DBS-162 No.18 2015/11/26. IPSJ SIG Technical Report. 3 2 Hadoop. 1.0.3. Spark. 1.4.0. 4.3 .. 5. .. SCAN ,. 3 Data Set. Nodes. Edges. cnr-2000. 325,557. 3,216,152. .. , .. , ,. PageRank. ,. .. PageRank. .. 4. .. 6. 5. , .. 231. 32. 40. 12. 4. 5. PageRank. 476. 5.1. 173. 1. Compression graph. Cluster graph. .. 4. 4.2. . .. 1000 .. ,. ,. Compression graph. 4. PageRank. . , .. 5. ,Cluster graph. PageRank. . ,. ,. .. Gap. Gap. .. SCAN. ,. Hub. Cluster graph 1.0. Gap = OriginalRank − P roposalRank. .Hub. , .. PageRank ,Hub .. OriginalRank. ,. ,Compression graph. PageRank. Cluster graph. ,P roposalRank. .. ,Hub. PageRank .. 5. ,. 100. P roposalRank. ,. OriginalRank ,100. . .. ⓒ 2015 Information Processing Society of Japan. . 2 5. . ,. 100. P roposalRank 3.

(4) Vol.2015-DBS-162 No.18 2015/11/26. IPSJ SIG Technical Report. 4. 5. OriginalRank. 6.1. ,100 .. , ,. .. .. , 2. 5.2. .. 6.1.1. 1. .. , .P. Boldi. SCAN Outlier. .. WebGraph[9]. ,. .. , .. ,. .. , . ,. . .. ,SCAN. ,“home”,“next”,“previous”,“up”. ,. . .. ,. .. ,. URL. (. ,. ). ,. . .. , ,. .. ,. .. .. 2. . . PageRank. .. ,. URL. PageRank. URL. ,. .. ,. . ,. URL. .. .. ,. .. 6.. . , SCAN. ,URL. ID .. ,. . P. Boldi. , ,. ⓒ 2015 Information Processing Society of Japan. 4.

(5) Vol.2015-DBS-162 No.18 2015/11/26. IPSJ SIG Technical Report. .. ,1. 3.08bit. ,. .S. Tei. WebGraph. .. , WebGraph. 6.2 SCAN. 2. [10].. ,. SCAN[8]. , .. .SCAN. ,. G = (V, E). PageRank. .. ,. ,. ε. ,Cluster. structure-. connected cluster. .. 6.1.2. Hub. .Cluster. Outlier. . unclassfied. .G. Buehrer Virtual Node Miner[2]. ,. .SCAN. (. unclassfied. ,. ,VN). core. .core .VN. 6. 7. Cluster. .core. 2. .. , ,. .Virtual Node Miner. 2. ε. 7800 ,. 1700. ,. ,. ε-neighborhood. . ,. unclassfied. ,. core. .. 3. ,ε-neighborhood. VN. .. 1/7 v ∈ V. .. ,. v ,. Γ(v). v. . Γ(v) = {v ∈ V | {v, w} ∈ E} ∪ {v}. |Γ(v)|. ,. v, w. 6. σ(v, w). .. 7. LittleWeb[3]. |Γ(v) ∩ Γ(w)| σ(v, w) = ! |Γ(v)||Γ(w)|. , .LittleWeb. ε-neighborhood ,. ε. 2 . 56.3. neighborhood. , .. 67.3. ,PageRank. , Nε (v) = {w ∈ Γ(v)|σ(v, w) ≥ ε}. 67. .. ,ε-. .. , ,PageRank .. Core ε ∈ R,µ ∈ N,v ∈ V ,|Nε (v)| ,core. ,PageRank .. v. ε-neighborhood. COREε,µ (v). .. COREε,µ (v) ⇔ |Nε (v)| ! µ. 6.1.3 .. ,. SCAN. core ID. ,PageRank. ,. clusterID. core ,core. structure-connected cluster. .. ,. core. ,. ,. .. non-menber. . ,. ⓒ 2015 Information Processing Society of Japan. SCAN. core. ,structure 5.

(6) Vol.2015-DBS-162 No.18 2015/11/26. IPSJ SIG Technical Report. reachability. Cluster. terID. .. ,. Q. clus6. ε-neighborhood ,. PageRank. core .. Q. 1000. ,direct structure. reachability. R. direct structure reachability. ,. .. structure reachability. ,. .. ,. Direct structure reachability v. w. .. direct structure reachability. DirREACHε,µ (v, w). . [1]. DirREACHε,µ (v, w) ⇔ COREε,µ (v) ∧ Nε (v). [2]. Structure reachability ε ∈ R,µ ∈ N,v ∈ V. ,. Structure reachability. v. w. REACHε,µ (v, w). . [3]. REACHε,µ (v, w) ⇔ ∃1 , ...vn ∈ V : v1 = v ∧ vn = w ∧ ∀i ∈ {1, ..., n − 1} : DirREACHε,µ (vi , vi+1 ) R. ,. [4]. Q ,. .. [5]. R unclassfied. ,. R menber. ,. .. .. .. Cluster. .. ,. Q. .. unclassfied,. non-. clusterID. .. Q cluster. Web. ,structure-connected. structure-connected cluster. ,non-. menber. Hub. 2. Cluster. ,. Hub. [8]. Outlier. .non-menber .1. ,. [9]. Cluster Outlier. .. ,. 10 AFP ⟨http://www.afpbb.com/articles/2015-08-27 . /3026121⟩ Buehrer, G. and Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities, Proceedings of the 2008 International Conference on Web Search and Data Mining, ACM, pp. 95–106 (2008). LittleWeb Web The 2nd Forum on Data Engineering and Information Management, DBSJ, pp. E1–4 (2010). Kohlsch¨ utter, C., Chirita, P.-A. and Nejdl, W.: Efficient parallel computation of pagerank, Advances in information retrieval, Springer, pp. 241–252 (2006). Wicks, J. and Greenwald, A.: Parallelizing the computation of pagerank, Algorithms and Models for the WebGraph, Springer, pp. 202–208 (2007).. [6]. [7]. .. AFP:. (2008). MapReduce 2 PageRank The 7th Forum on Data Engineering and Information Management, DBSJ, pp. E5–4 (2015). Xu, X., Yuruk, N., Feng, Z. and Schweiger, T. A.: Scan: a structural clustering algorithm for networks, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 824–833 (2007). Boldi, P. and Vigna, S.: The webgraph framework I: compression techniques, Proceedings of the 13th international conference on World Wide Web, ACM, pp. 595–602 (2004).. [10]. SCAN. The 6nd Forum on Data Engineering and Information Management, DBSJ, pp. D6–1 (2014).. .. 7. , , ,. .. , , .. , ,. .. ,. , ⓒ 2015 Information Processing Society of Japan. .. , 6.

(7)

参照

関連したドキュメント

Bae, “Blind grasp and manipulation of a rigid object by a pair of robot fingers with soft tips,” in Proceedings of the IEEE International Conference on Robotics and Automation

Standard domino tableaux have already been considered by many authors [33], [6], [34], [8], [1], but, to the best of our knowledge, the expression of the

T´oth, A generalization of Pillai’s arithmetical function involving regular convolutions, Proceedings of the 13th Czech and Slovak International Conference on Number Theory

Zheng and Yan 7 put efforts into using forward search in planning graph algorithm to solve WSC problem, and it shows a good result which can find a solution in polynomial time

東京都は他の道府県とは値が離れているように見える。相関係数はこう

We then prove the existence of a long exact sequence involving the cohomology groups of a k-graph and a crossed product graph.. We finish with recalling the twisted k-graph C

This paper presents an investigation into the mechanics of this specific problem and develops an analytical approach that accounts for the effects of geometrical and material data on

(4S) Package ID Vendor ID and packing list number (K) Transit ID Customer's purchase order number (P) Customer Prod ID Customer Part Number. (1P)