• 検索結果がありません。

Grammar-based Compression for Unrooted Trees and Subclass of Plane Graphs Using Integer Linear Programming

N/A
N/A
Protected

Academic year: 2021

シェア "Grammar-based Compression for Unrooted Trees and Subclass of Plane Graphs Using Integer Linear Programming"

Copied!
2
0
0

読み込み中.... (全文を見る)

全文

(1)Vol.2016-MPS-108 No.11 Vol.2016-BIO-46 No.11 2016/7/4. IPSJ SIG Technical Report. Grammar-based Compression for Unrooted Trees and Subclass of Plane Graphs Using Integer Linear Programming Morihiro Hayashida1,a). Yue Cao1. Yang Zhao2. Abstract: We address a problem of finding generation rules from biological data. In the previous study, the extraction of generation rules from glycans and RNAs represented by rooted tree structures was examined. Grammars were defined for rooted ordered and unordered trees, and methods for finding the minimum grammars that produce only given tree structures were developed. In nature and organisms, however, there are various kinds of structures other than rooted tree ones. In this study, we relax the limitation of structures to be compressed, and propose grammars representing unrooted trees and some subclass of plane graphs together with an integer linear programming-based method for finding its minimum grammars.. 1.. Introduction. Data compression is related with information that it contains. Our purpose is to extract useful information from data through compression. In the previous study, we developed integer linear programming-based methods for finding the minimum grammar that represents given multiple rooted trees, and applied to glycans and RNAs [1]. It, however, is considered that data in nature cannot be always represented by rooted trees. Fig. 1 shows rooted and unrooted trees. The rooted tree in the left is the same as the unrooted tree in the right if the rooted tree becomes unrooted. Although it is difficult to find substructures in the rooted tree, we can find several substructures in the unrooted tree. In this study, we relax the limitation of structures to be compressed, and propose grammars for unrooted trees and some subclass of planar graphs with an integer programming-based method for finding the minimum grammar for ordered version of its grammars.. 2.. Method. We define a grammar for an unrooted edge-labeled tree by 4tuple (Σ, Γ, S , ∆) as an extension of [2], where Σ is a set of terminal symbols (labeled edges), Γ is a set of nonterminal symbols, S is the start symbol in Γ, and ∆ is a set of production rules represented by Fig. 2. A nonterminal symbol corresponds to a connected unrooted subtree, which links to the remaining part with at most two nodes, called tags. If a subtree with two tags is not symmetric, either tag must be determined to link to the node of one side. That is the reason why direction is attached on a non1. 2. a). Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611–0011, Japan The National Institute of Advanced Industrial Science and Technology, 2-4-7, Aomi, Koto, Tokyo 135–0064, Japan [email protected]. ⓒ 2016 Information Processing Society of Japan. a. b. b. a. b a. a. a. b b. a. a. b. b a. b. Fig. 1 Example of rooted and unrooted trees.. A. B. C A. B. C A. a. A. B. C A. B. C A. a. A. B. C. A. a. Fig. 2 Production rules for an unrooted tree. Black circles denote tags. A nonterminal symbol includes at most two tags.. terminal symbol. In addition to the grammar, we also define a grammar for a subclass of planar graphs to deal with graphs including closed paths by adding production rules represented by Fig. 3. Fig. 4 shows an example of a plane graph and its grammar that generates only the graph. The edge-labeled graph is transformed from the purine chemical compound. Let G(V, E) be an undirected planar graph with a set V of ver1.

(2) Vol.2016-MPS-108 No.11 Vol.2016-BIO-46 No.11 2016/7/4. IPSJ SIG Technical Report. C. B. A. A. C A. B. C. B. Fig. 3 Additional production rules for a planar graph.. n. N. N. n N H. N. c. n. n. c n. n m. m. Ei0 ,S 0 , j0 ,T 0 ∪ Ei00 ,S 00 , j00 ,T 00 = Ei,S , j,T , Ei0 ,S 0 , j0 ,T 0 ∩ Ei00 ,S 00 , j00 ,T 00 = ∅, Ei0 ,S 0 , j0 ,T 0 , ∅, and Ei00 ,S 00 , j00 ,T 00 , ∅. I(G) contains a large number of elements for unordered trees and planar graphs, and the number of nonterminal symbols becomes also large. We consider an ordered version of these grammars, and represent S ∈ E(i) by two nodes, starting and ending nodes in clockwise direction. Then, a nonterminal symbol is represented by Gi,s,e, j,t, f for s, e ∈ E(i) and t, f ∈ E( j) instead of Gi,S , j,T , and we propose the following integer linear programming formulation for finding the minimum grammar representing only a given plane graph G. min. S. A. B. B. F. A. C. D. H. I. H. I. X. pu. u∈S(G). subject to x,∅,,∅ = 1, xi,s,e, j,t, f = 1 for all (i, s, X e, j, t, f ) ∈ I(G) s. t. |Ei,s,e, j,t, f | = 1, xi,s,e, j,t, f ≤ yi0 ,s0 ,e0 , j0 ,t0 , f 0 ,i00 ,s00 ,e00 , j00 ,t00 , f 00 (i0 ,s0 ,e0 , j0 ,t0 , f 0 ,i00 ,s00 ,e00 , j00 ,t00 , f 00 )∈C(Gi,s,e, j,t, f ). for all (i, s, e, j, t, f ) ∈ I(G) s. t. |Ei,s,e, j,t, f | ≥ 2, 1 y ≤ (xi0 ,s0 ,e0 , j0 ,t0 , f 0 + xi00 ,s00 ,e00 , j00 ,t00 , f 00 ), 2 su ≤ pu < 1 + su for all u ∈ S(G), X 1 su = xi,s,e, j,t, f . |E| {(i,s,e, j,t, f )∈I(G)|Gi,s,e, j,t, f is isomorphic to u} i0 ,s0 ,e0 , j0 ,t0 , f 0 ,i00 ,s00 ,e00 , j00 ,t00 , f 00. C. D. E. D. c. E. F. F. G. n 3.. F. G. G. I. m. Fig. 4 Example of a plane graph and its grammar. The edge-labeled graph (the upper right) is transformed from the purine chemical compound (the upper left). Dashed circles denote nitrogen atoms.. tices and a set E of labeled edges. Consider the following cutting procedure from G to G0 (V 0 , E 0 ) at a vertex i with a set S of vertices, where S ⊂ E(i) = { j ∈ V|(i, j) ∈ E}. (i) add a new vertex i0 to G (i.e., V 0 = V ∪ {i0 }). (ii) replace edges connected to i which endpoints are in S with edges connected to i0 (i.e., E 0 = E ∪ {(i0 , j)| j ∈ S } − {(i, j)| j ∈ S }). Then, if G0 is not connected and consists of two connected components, we say that G is separable with i and S . In a similar way, we consider the cutting procedure from G at two vertices i and j with two sets S and T , where S ⊂ E(i) and T ⊂ E( j). Gi,S , j,T (Vi,S , j,T , Ei,S , j,T ) is defined as a connected subgraph of G obtained by the cutting procedure, which includes non-empty sets S , T , does not include E(i) − S and E( j) − T , where i, j ∈ V ∪ {}, S ⊂ E(i), T ⊂ E( j),  means nothing. G is represented by G,∅,,∅ . Gi,S , j,T is distinguished from G j,T,i,S in almost all cases. Suppose that S(Gi,S , j,T ) and I(Gi,S , j,T ) are sets of all subgraphs Gi0 ,S 0 , j0 ,T 0 and its indices (i0 , S 0 , j0 , T 0 ), respectively, of Gi,S , j,T by the cutting procedure. Consider the case that Gi,S , j,T is decomposed into Gi0 ,S 0 , j0 ,T 0 and Gi00 ,S 00 , j00 ,T 00 . Let C(Gi,S , j,T ) be a set of all index combinations (i0 , S 0 , j0 , T 0 , i00 , S 00 , j00 , T 00 ) that Vi0 ,S 0 , j0 ,T 0 ∪ Vi00 ,S 00 , j00 ,T 00 = Vi,S , j,T , Vi0 ,S 0 , j0 ,T 0 ∩ Vi00 ,S 00 , j00 ,T 00 = {i, j}, ⓒ 2016 Information Processing Society of Japan. Discussion. In this study, we proposed the definition of grammars for unrooted trees and some subclass of planar graphs, and integer linear programming-based method for finding the minimum grammar representing only a given plane graph. We can prove that the proposed grammar generates only a planar graph. It, however, is difficult to concretely show the subclass of planar graphs that the grammar can generate, and to determine whether or not our method can be applied to a given plane graph. As future work, we would like to uncover the subclass that the grammar can generate, apply the proposed method to actual plane graphs, and extend the grammar to more complicated structures.. Acknowledgements This work was partially supported by Grant-in-Aid #16K00392 from MEXT, Japan. References [1]. [2]. Zhao, Y., Hayashida, M., Cao, Y., Hwang, J. and Akutsu, T.: Grammarbased compression approach to extraction of common rules among multiple trees of glycans and RNAs, BMC Bioinformatics, Vol. 16, p. 128 (2015). Akutsu, T.: A bisection algorithm for grammar-based compression of ordered trees, Information Processing Letters, Vol. 110, pp. 815–820 (2010).. 2.

(3)

Fig. 2 Production rules for an unrooted tree. Black circles denote tags. A nonterminal symbol includes at most two tags.
Fig. 3 Additional production rules for a planar graph.

参照

関連したドキュメント

Making use of Linear operator theory, we define a new subclass of uniformly convex functions and a corresponding subclass of starlike functions with nega- tive coefficientsG. The

Theorem 1. Tarnanen uses the conjugacy scheme of the group S n in order to obtain new upper bounds for the size of a permutation code. A distance that is both left- and right-

Motivated by the new perturbation results of closed linear generalized inverses [12], in this paper, we initiate the study of the following problems for bounded homogeneous

As just mentioned, the method used for recognizing circulant graphs is based on the notions of coherent configurations and Schur rings generated by graphs and on the in-

Chapoton pointed out that the operads governing the varieties of Leibniz algebras and of di-algebras in the sense of [22] may be presented as Manin white products of the operad

Upon deleting the tripleton part, and recalling our earlier analysis of the tripartite pairing (Theorem 3.1), we see that each partition τ contributes to σ a sign which involves

3 by two simple examples: we first give another solution of (2) obtained when m = 2, and then a generating function proof of MacMahon’s formula for the number of standard tableaux of

Definition 18 A total labeling of a finite leaf labeled tree with leaves labeled from a totally ordered set such as N ∪ {∞} is a maxmin labeling if each internal vertex, v, has label