Matrix Representations of Graphs and Their Experimental Comparison for Detecting Non-subgraphs by Eigenvalues

全文

(1)Journal of Information Processing Vol.22 No.4 638–641 (Oct. 2014) [DOI: 10.2197/ipsjjip.22.638]. Technical Note. Matrix Representations of Graphs and Their Experimental Comparison for Detecting Non-subgraphs by Eigenvalues Kaoru Katayama1,a). Takumi Sato1,b). Received: March 23, 2014, Accepted: May 17, 2014. Abstract: Eigenvalues of graphs have been used for detecting non-subgraphs or non-supergraphs based on their interlacing property. However the detected subgraphs are often restricted to induced subgraphs or trees due to their matrix representations. We consider five matrix representations of a graph, which can be used to detect general non-subgraphs or non-supergraphs, and compare them experimentally. Keywords: graph, eigenvalue, matrix. 1. Introduction Graphs are commonly used as models of real data such as XML documents, 3D objects, chemical compounds, Web data and social networks [1]. Eigenvalues of graphs are used in indexing [2], [3], [4] and clustering [5] of graphs. The interlacing property of eigenvalues [6] is applied to detecting non-subgraphs or non-supergraphs of a query graph from a graph database. Although the adjacency matrix is commonly used as a representation of a graph, that of a subgraph of the graph does not have the prerequisite for the property. That is, the adjacency matrix of the subgraph is not a principal submatrix of the adjacency matrix of the graph in general. In such cases, the detected subgraphs are restricted to induced subgraphs or trees due to their matrix representations. In order to detect general non-subgraphs, we define a matrix representation of a labeled graph in Ref. [7], which have the prerequisite of the theorem. However the size of the matrix is much larger than the size of the adjacency matrix and it costs more to compute the eigenvalues. In this paper, we present five matrix representations of a graph for detecting non-subgraphs or non-supergraphs, which include a matrix modified slightly from that used in Ref. [7]. They are based on the adjacency matrix, the signless Laplacian matrix and the Seidel matrix of a graph or its line graph. For non-regular graphs, there is no simple relation among the eigenvalues of these matrices [8]. We compare the processing time and the number of detected non-subgraphs experimentally for the matrix representations. To our knowledge, this is the first such experimental comparison of these matrix representation in terms of detecting non-subgraphs or non-supergraphs according to the interlacing property of the eigenvalues. In Ref. [7], it is shown that the processing time and the number of detected non-supergraphs are improved by decomposing graphs according 1 a) b). Graduate School of System Design, Tokyo Metropolitan University, Hino, Tokyo 191–0065, Japan [email protected] [email protected]. c 2014 Information Processing Society of Japan . to labels of the vertices and the edges. We also decompose graphs as the same way before computing their eigenvalues in this work.. 2. Preliminaries 2.1 Graphs and Matrices We focus on a labeled undirected simple graph g = (V g , E g , L(V)g , L(E)g , μg , νg ). V g is a set of vertices. E g is a set of edges where an edge e in E g is an ordered pair (v1 , v2 ) of vertices v1 and v2 in V g . L(V)g and L(E)g are sets of labels of vertices and edges, respectively. μg and νg are labeling functions V g → L(V)g and E g → L(E)g , respectively. The line graph L(g) of a simple graph g is the graph where a vertex is associated with each edge of g and two vertices are adjacent if and only if the corresponding edges of g have a vertex in common. Figure 1 shows examples of a graph and its line graph. We use the signless Laplacian matrix and the Seidel matrix of a graph in addition to the adjacency matrix and the incidence matrix as matrix representations of a graph. The signless Laplacian matrix Pg of a graph g is a |V g | × |V g | matrix where the i-th diagonal element is the degree of the vertex vi ∈ V g , the (i, j) element is 1 if (vi , v j ) ∈ E g and the other elements are 0. The Seidel matrix S g = (si j ) of a graph g is a |V g | × |V g | matrix where the diagonal elements are 0, the (i, j) element is −1 if (vi , v j ) ∈ E g and the other elements are 1. 2.2 Graphs and Eigenvalues Let {αi }i=1,...,n and {β j } j=1,...,m be two ordered sequences of real numbers where m < n, α1 ≤ α2 ≤ · · · ≤ αn and β1 ≤ β2 ≤ . . . ≤ βm , respectively. We say that {β j } j=1,...,m interlaces {αi }i=1,...,n , if the condition, αk ≤ βk ≤ αk+(n−m) , is satisfied for k = 1, . . . , m.. g. L(g). Fig. 1 Graph g and its line graph L(g).. 638.

(2) Journal of Information Processing Vol.22 No.4 638–641 (Oct. 2014). g. g[1]. g[1][3]. Fig. 2 Graph g and decomposed graphs g[1], g[1][3].. Theorem 1 (Interlace Theorem) [6] Let A and B be n × n and m × m real symmetric matrices where n > m, respectively. If B is a principal submatrix of A, eigenvalues of B interlace those of A. If matrix representations of a graph and its subgraph have the prerequisite of this theorem, we can check whether a graph is not a subgraph of another graph by comparing their eigenvalues. 2.3 Graph Decomposition by Labels We decompose graphs according to labels of the vertices and the edges before computing their eigenvalues as follows, which is the same as Ref. [7]. VE decomposing graphs according to labels of vertices and then decomposing the result according to labels of edges EV decomposing graphs according to labels of edges and then decomposing the result according to labels of vertices Example 1 Figure 2 shows an example of the decomposition. A graph g is decomposed into g[1] according to the label 1 of the vertex. g[1] is decomposed into g[1][3] according to the label 3 of the edge. If g s is a subgraph of g, the graphs into which g s is decomposed according to a label are also subgraphs of the graphs into which g is decomposed according to the same label. In detecting non-subgraphs or non-supergraphs, eigenvalues of decomposed graphs which have the same labels are compared based on the theorem. Before computing the eigenvalues, we compare the sizes of a matrix representations of decomposed graphs of g and g s . If that of g s is larger than that of g with the same labels, g s is not a subgraph of g. In such case, there is no need to compute their eigenvalues.. 3. Matrix Representations for Detecting Nonsubgraphs We used the following matrix representation of a graph for detecting non-subgraphs based on the interlace theorem in Ref. [7]. The extended incidence matrix C g of a graph g is the matrix ⎤ ⎡ ⎢⎢⎢ Q N ⎥⎥⎥ ⎥⎦ where N = (ni j ) is the |V g | × |E g | incidence matrix of ⎢⎣t N R g, Q = (qi j ) and R = (ri j ) are |V g | × |V g | and |E g | × |E g | diagonal matrices whose i-th diagonal element is the label μg (vi ) of the vertex vi ∈ V g and the label νg (ei ) of the edge ei ∈ E g , respectively. t N is the transpose of a matrix N. In this paper, we assign the label νg (e j ) of the edge e j ∈ E g to the (i, j) element ni j of N. In order to reduce the sizes of matrices and the cost of computing their eigenvalues, we consider the following smaller matrix representations. Adjacency matrix of line graph Although the adjacency matrix of a subgraph g s of a graph g is not the principal submatrix of the adjacency matrix of g in general, the adjacency s matrix AL(g ) of the line graph L(g s ) is always the principal submatrix of the adjacency matrix AL(g) of L(g). However g. c 2014 Information Processing Society of Japan . cannot be restored from L(g) in general. We assign labels of vertices and edges of g to the corresponding edges and vertices of L(g) as the graphs of Fig. 1, respectively. The i-th diagonal element of AL(g) is the label μL(g) (vi ) of the vertex vi ∈ V L(g) , the (i, j) element is the label νL(g) (vi , v j ) of the edge (vi , v j ) ∈ E L(g) and the other elements are 0. Seidel matrix of line graph As with the adjacency matrix of a graph g, the Seidel matrix S g of g does not have the prerequisite of the interlace theorem, but the Seidel matrix S L(g) of the line graph L(g) does. Since there is no simple way to assign labels of vertices and edges of L(g) to elements of the Seidel matrix like the adjacency matrix, we do not assign the labels to the elements and use the Seidel matrix as the definition. Signless Laplacian matrix The signless Laplacian matrix Pg of a graph g does not have the prerequisite of the interlace theorem. However we can use it as follows. Pg is equal to the multiplication N × t N of the |V g | × |E g | incidence matrix N and its transpose t N. Although Pg = N × t N does not have the prerequisite, the matrix t N × N does. In addition, the non-zero eigenvalues of N × t N are equal to those of t N × N. Therefore we can use eigenvalues of t N × N or Pg = N × t N to detecting non-subgraphs based on the theorem by adding a necessary number of zeros to them. In order to reduce the cost of computing eigenvalues, we choose the matrix whose size is smaller. We do not assign labels of the vertices and the edges to elements of t N × N or Pg = N × t N and use these matrices as the definition. Signless Laplacian matrix of line graph If a graph g s is a subgraph of a graph g, the line graph L(g s ) is also a subgraph of L(g). Therefore we can use the signless Laplacian matrix PL(g) of the line graph in the same way as Pg . Example 2 Figure 3 shows the extended incidence matrix C g , the adjacency matrix AL(g) , the Seidel matrix S L(g) , the two signless Laplacian matrices Pg = N × t N, PL(g) , and the matrix t N × N for the graph g and its line graph L(g) of Fig. 1. N is the incidence matrix of g. Since the size of t N × N is smaller than Pg = N × t N in this case, we compute the eigenvalues of t N × N instead of those of Pg = N × t N. The eigenvalues of t N × N and Pg √ √ √ √ are {2 − 2, 2, 2 + 2} and {0, 0, 2 − 2, 2, 2 + 2}, respectively.. ⎡ ⎢⎢⎢1 ⎢⎢⎢⎢0 ⎢⎢⎢ ⎢⎢⎢0 ⎢⎢⎢0 g C = ⎢⎢⎢⎢ ⎢⎢⎢0 ⎢⎢⎢0 ⎢⎢⎢ ⎢⎢⎢0 ⎣ 0. S L(g). t. 0 1 0 0 0 3 0 0. ⎡ ⎢⎢⎢ 0 = ⎢⎢⎢⎢⎣−1 1. ⎡ ⎢⎢⎢2 N × N = ⎢⎢⎢⎣⎢1 0. ⎤ 0⎥⎥ ⎥ 0⎥⎥⎥⎥ ⎥ 0⎥⎥⎥⎥ ⎥ 3⎥⎥⎥⎥ L(g) ⎥,A 3⎥⎥⎥⎥⎥ 0⎥⎥⎥⎥⎥ ⎥ 0⎥⎥⎥⎦ 3 ⎡ ⎢⎢⎢1 ⎤ ⎢⎢⎢1 −1 1 ⎥⎥ ⎢⎢ ⎥⎥⎥ g t 0 −1⎥⎥ , P = N × N = ⎢⎢⎢⎢0 ⎢⎢⎢ ⎦ −1 0 ⎢⎢⎣0 0 ⎡ ⎤ ⎤ 1 0⎥⎥ 1 1 0 ⎢ ⎥ ⎢⎢ ⎥⎥ ⎥ 2 1⎥⎥⎥⎥ , PL(g) = ⎢⎢⎢⎢1 2 1⎥⎥⎥⎥ ⎣ ⎦ ⎦ 1 2 0 1 1 0 0 1 0 0 3 3 0. 0 0 0 1 0 0 3 3. 0 0 0 0 1 0 0 3. 0 3 3 0 0 3 0 0. 0 0 3 3 0 0 3 0. ⎡ ⎢⎢⎢3 = ⎢⎢⎢⎢⎣1 0. 1 2 1 0 0. ⎤ 0⎥⎥ ⎥ 1⎥⎥⎥⎥ , ⎦ 3. 1 3 1. 0 1 2 1 0. 0 0 1 1 0. ⎤ 0⎥⎥ ⎥ 0⎥⎥⎥⎥ ⎥ 0⎥⎥⎥⎥ , ⎥ 0⎥⎥⎥⎥⎦ 0. Fig. 3 Matrix representations of graph g and its line graph L(g) of Fig. 1.. 639.

(3) Journal of Information Processing Vol.22 No.4 638–641 (Oct. 2014). 4. Experimental Evaluation For the five matrix representations of a graph g and a smaller graph g s , which are described in Section 3, we compare the processing time and the number of detected non-subgraphs in the cases of varying the number of vertices of g s , the densities of g and g s , and the number of labels of vertices of g and g s . The processing time is the time required to compute eigenvalues of the decomposed graphs of g and g s and compare them. We check whether g s is not a subgraph of g using the following procedure. ( 1 ) decompose g and g s according to labels of the vertices and the edges in each of the ways VE and EV, which are described in Section 2.3 ( 2 ) compare the sizes of matrix representations of each of the decomposed graphs of g and g s ( 3 ) check whether eigenvalues of each of the decomposed graphs of g s interlace eigenvalues of the decomposed graph of g with the same label The processing time and the required memory size depend on the sizes of matrix representations of g and g s , and the number of labels of their vertices and edges since they are decomposed according to labels. In each experiment, we prepare 500 pairs of graphs g s and g and repeat the experiment twice. These graphs are generated randomly with the graph generator which we develop. The results of the experiments are the average performance for two sets of 500 pairs of graphs. All algorithms are implemented using Matlab R2012b. The experiments done on a PC running Microsoft Windows 7 Professional with the Intel Core i3 3.3 GHz processor and 16 GB RAM. The ways VE and EV for decomposing graphs have little affect on the results in the case of using the same matrix representation. Table 1 shows the result when we vary the number of vertices of g s from 50 to 90. The number of vertices of g is 100. The denTable 1 (a) Processing Time (sec) Vertices Extended Incidence Matrix Adjacency Matrix of Line Graph Seidel Matrix of Line Graph Signless Laplacian Matrix Signless Lapalcian Matrix of Line Graph. VE EV VE EV VE EV VE EV VE EV. 50 6.34 6.44 2.78 2.76 2.73 2.70 0.56 0.57 3.18 3.14. 60 6.79 6.88 2.91 2.92 2.85 2.84 0.61 0.62 3.33 3.32. (a) Processing Time (sec) Density Extended Incidence Matrix Adjacency Matrix of Line Graph Seidel Matrix of Line Graph Signless Laplacian Matrix Signless Laplacian Matrix of Line Graph. VE EV VE EV VE EV VE EV VE EV. 30 5.07 5.25 1.73 1.72 1.69 1.69 0.62 0.64 1.91 1.90. c 2014 Information Processing Society of Japan . 35 6.27 6.34 2.39 2.35 2.34 2.30 0.67 0.69 2.74 2.70. 70 7.61 7.68 3.19 3.18 3.13 3.11 0.65 0.66 3.56 3.52. sity of each g and g s is 0.4. We assign 3 labels to each of vertices and edges in g and g s . As the number of vertices of g s increases, the number of detected non-subgraphs increases in all matrix representations. This is because it becomes more difficult that eigenvalues of g s interlace those of g as the difference between the sizes of their matrix representations decreases. The largest number of non-subgraphs is detected in the case of using adjacency matrices of their line graphs. However it differs only slightly from that in the case of using extended incidence matrices or signless Laplacian matrices. The least number of non-subgraphs is detected in the case of using Seidel matrices of their line graphs. We detect non-subgraphs the fastest in the case of using signless Laplacian matrices. This is due to the size of a signless Laplacian matrix of a graph, which depends on only the number of the vertices, is smaller than other matrices. The processing time for signless Laplacian matrices at 90 vertices becomes faster than that for 80 vertices since many non-subgraphs can be detected without computing their eigenvalues by comparing the sizes of each decomposed graphs in the second step of the procedure. Table 2 shows the result when we vary the densities of g and g s from 30 to 50. The numbers of vertices of g and g s are 100 and 70, respectively. We assign 3 labels to each of vertices and edges of g and g s . As the density of g s decreases, the number of detected non-subgraphs increases in all the matrix representations. This is because difference between the sizes of matrix representations of g and g s decreases as the density of g s decreases. The number of detected non-subgraphs and the processing time for the five matrix representations shows the same tendency as the results shown in Table 1. The largest number of non-subgraphs is detected in the case of using adjacency matrices of their line graphs. As the densities of graphs increase, we can detect them much faster in the case of using signless Laplacian matrices than the case of using the other matrix representations.. Varying number of vertices of smaller graph. (b) Number of Detected Non-subgraphs 80 8.78 8.79 3.68 3.65 3.63 3.61 0.66 0.67 3.76 3.73. 90 10.05 9.97 4.31 4.23 4.25 4.15 0.59 0.59 3.59 3.56. Vertices Extended Incidence Matrix Adjacency Matrix of Line Graph Seidel Matrix of Line Graph Signless Laplacian Matrix Signless Laplacian Matrix of Line Graph. VE EV VE EV VE EV VE EV VE EV. 50 74 74 79 79 56 56 75 75 72 72. 60 203 203 204 204 159 159 203 203 195 195. 70 391 391 391 391 348 348 391 391 383 383. 80 484 484 484 484 472 472 484 484 478 478. 90 500 500 500 500 500 500 500 500 500 500. 45 355 355 358 358 324 324 355 355 346 346. 50 340 340 344 344 307 307 340 340 338 338. Table 2 Varying densities of pair of graphs. (b) Number of Detected Non-subgraphs 40 6.97 6.80 2.92 2.81 2.85 2.76 0.66 0.65 3.42 3.32. 45 8.08 7.99 3.68 3.59 3.61 3.52 0.68 0.68 4.37 4.30. 50 9.42 9.36 4.60 4.53 4.52 4.42 0.71 0.71 5.56 5.54. Density Extended Incidence Matrix Adjacency Matrix of Line Graph Seidel Matrix of Line Graph Signless Laplacian Matrix Signless Laplacian Matrix of Line Graph. VE EV VE EV VE EV VE EV VE EV. 30 427 427 428 428 381 381 428 428 415 415. 35 385 385 387 387 341 341 386 386 379 379. 40 376 376 376 376 340 340 376 376 369 369. 640.

(4) Journal of Information Processing Vol.22 No.4 638–641 (Oct. 2014). Table 3 Varying number of vertex labels of pair of graphs. (b) Number of Detected Non-subgraphs (a) Processing Time (sec) Labels Extended Incidence Matrix Adjacency Matrix of Line Graph Seidel Matrix of Line Graph Signless Laplacian Matrix Signeless Laplacian Matrix of Line Graph. VE EV VE EV VE EV VE EV VE EV. 2 21.82 21.51 11.86 11.75 11.67 11.57 1.22 1.22 14.00 13.99. 3 7.65 7.59 3.20 3.13 3.14 3.08 0.70 0.70 3.73 3.64. 4 3.72 3.84 1.32 1.32 1.29 1.28 0.51 0.53 1.42 1.41. 5 2.07 2.13 0.73 0.71 0.71 0.69 0.40 0.40 0.69 0.68. 6 1.45 1.60 0.53 0.53 0.51 0.51 0.34 0.36 0.45 0.44. Table 3 shows the result when we vary the number of labels of vertices of g and g s from 2 to 6. We assign 3 labels to edges of g and g s . The numbers of vertices of g and g s are 100 and 70, respectively. The densities of g and g s are 0.4. As the number of labels of vertices increases, the number of detected non-subgraphs increases in all the matrix representations. This is because we check interlacing of eigenvalues of the more decomposed graphs of g and g s as the the number of labels of vertices increases. In the case of using extended incidence matrices, adjacency matrices of their line graphs and signless Laplacian matrices, the largest number of non-subgraphs is detected for almost any number of labels of vertices. As the number of the labels decreases, we can detect them faster in the case of using signless Laplacian matrices than the case of using the other matrix representations. This is because the sizes of signless Laplacian matrices of decomposed graphs become smaller than the sizes of other matrix representations of them as the number of the labels decreases.. 5. Conclusion We present the five symmetric matrix representations of a graph g, which have the prerequisite of the interlace theorem. That is, the matrix representation of a subgraph g s of g is a principal submatrix of the matrix representation of g, in general. We compare them experimentally in terms of the processing time and the number of non-subgraphs which are detected by comparing eigenvalues of g and g s based on the theorem. The graphs are decomposed according to labels of the vertices and the edges before computing their eigenvalues. In the results of our experiment, the largest or almost the largest number of non-subgraphs are detected in the shortest time when we use the signless Laplacian matrix as the matrix representation.. Labels Extended Incidence Matrix Adjacency Matrix of Line Graph Seidel Matrix of Line Graph Signless Laplacian Matrix Signless Laplacian Matrix of Line Graph. [6] [7]. [8]. VE EV VE EV VE EV VE EV VE EV. 2 132 132 133 133 119 119 132 132 145 145. 3 384 384 384 384 336 336 384 384 380 380. 4 478 478 478 478 458 458 478 478 470 470. 5 498 498 498 498 495 495 498 498 496 496. 6 500 500 500 500 500 500 500 500 500 500. Computer Science, Vol.6332, pp.205–220, Springer Berlin/Heidelberg (2010). Haemers, W.H.: Interlacing Eigenvalues and Graphs, Linear Algebra Appl., pp.593–616 (1995). Katayama, K., Amagasa, Y. and Nagaya, H.: Detecting Non-subgraphs Efficiently by Comparing Eigenvalues of Decomposed Graphs, IEICE Trans. Information and Systems, Vol.E95-D, No.11, pp.2724–2727 (2012). Cvetković, D., Rowlinson, P. and Simić, S.: An Introduction to the Theory of Graph Spectra, Cambridge University Press (2009).. Kaoru Katayama received his Ph.D. degree in informatics from Kyoto University in 2000. He is currently an associate professor in the Graduate School of Information and Communication Systems, Tokyo Metropolitan University. His research interests are in the areas of data engineering and data mining.. Takumi Sato received his Bachelor degree in engineering from Tokyo Metropolitan University in 2014. He is currently a master’s student in the Graduate School of Information and Communication Systems, Tokyo Metropolitan University. His research interests are in the area of data engineering.. References [1] [2]. [3]. [4] [5]. Cook, D.J. and Holder, L.B. (Eds.): MINING GRAPH DATA, John Wiley & Sons, Inc. (2007). ¨ Zhang, N., Ozsu, M.T., Ilyas, I.F. and Aboulnaga, A.: FIX: Featurebased Indexing Technique for XML Documents, Proc. 32nd International Conference on Very Large Data Bases, pp.259–270, VLDB Endowment (2006). Shokoufandeh, A., Macrini, D., Dickinson, S., Siddiqi, K. and Zucker, S.W.: Indexing Hierarchical Structures Using Graph Spectra, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.27, No.7, pp.1125–1140 (2005). Demirci, M.F., van Leuken, R.H. and Veltkamp, R.C.: Indexing through Laplacian spectra, Computer Vision and Image Understanding, Vol.110, No.3, pp.312–325 (2008). Vinh, N., Inokuchi, A. and Washio, T.: Graph Classification Based on Optimizing Graph Spectra, Discovery Science, Lecture Notes in. c 2014 Information Processing Society of Japan . 641.

(5)