Kyushu University Institutional Repository
歩行からのグラフ推論
丸山, 修
Interdisciplinary Graduate School of Engineering Sciences, Kyushu University
https://doi.org/10.11501/3084068
出版情報:Kyushu University, 1995, 博士(理学), 課程博士 バージョン:
権利関係:
December 1995
Osamu Maruyama
Let G b an edge-colored graph. A walk on G is a path in G containing all edge of G. Th trace of a path in G is the s quence of the edge-colors s n in the path.
The graph in} renee from a walk is the problem of, given a string x of color , finding an edg -color d graph with th minimum numb r of edges which r aliz a walk with trace x.
This th sis is d vot d to the study of the graph inf rene from a walk and it mod
ified versions call d th graph inference from partial walks and the walk reali:::ability problem. Here, a partial walk on a graph is a path in the graph. These graph inference problems are closely related to inferring a Markov chain from its output, th identifi
cation of a finite tate autornaton from it input/output behavior, and th probl n1 of constructing an dg -weight d graph from distance data.
For the graph inference from a walk, all results so far known are concerned with degree-bounded graphs as follows: (i) Ragha van gave an 0 ( n log n )-tin1e algori thn1 to solve the graph inferenc from a walk for graphs of bounded degree two, which ar linear chains and cycles, wh r n is the length of a given string. (ii) He al o showed that the following variant of th graph inference from a walk for graphs of bounded degree k is NP-complete for any k 2:: 3; Given a string x and a positive integer ]{, to find a graph of bounded degree k with at most R nodes which realizes a walk with trace x.
First, we focus our attention on th tructure of tree without any degr e-bound constraint. We consider the tree inf renee from a walk that is th probl n1 of finding from a string x th smallest undirected edge-colored tre that realizes a walk with trace x. Our first contribution is an algorithm that solv this probl n1 in tin1c lin ar in the length of x. This linear-tim algorithm is developed by devising a tr r writing
syst m with the 1hurch-Ro scr prop rty.
The NP-complet n ss result of (ii) on d gr -bound cl graph and our linear-ti1ne algorithm on trees naturally raise a question wh ther the problen1 for trees of bounded degre k is solvable in polynomial bme for k � 3. Our econd contribution i. an gativ answ r to thi qu stion. We prov that for any k � 3, the graph infer nc fr01n a v alk for trees of bounded degree k is NP-complete even if the numb r of color. J r tricted to k + 1. The numb r k +
1
of colors is optirnal b caus wh n th nun1b r of olor · i.at most k, the problem for trees of bounded degree k is olvabl in linear time.
Our third contribution is a thorough refinement of the abov NP- ompl en r sult on tre s of bounded degr k for k � 3. vVe onsider fairly in1pl tr . of bounded degree three called
(1 ,1
)-cat rpillars. A(1
1 )-caterpillar is a tr obtain d from a linear chain l by attaching at most one other appendage edg to ea h nod of the linear chain l. The shap of a (1,1)-cat rpillar is very sin1ilar to that of it ba kl on linear chain. By exploiting a technique of mbedding a tr e into a (1,1 )-caterpillar we prove that the problem is NP-con1plet even for this class of imple tr . Thi result is strong nough to convince us that the cla of ( 1,1 )-caterpillars is one of th irnpl t classes of graphs for which the graph infer nc from a walk is P-compl t .Because of the NP-completeness result on trees of bounded d gr k
(
k �3)
wfurther investigate the probl m fro1n th view point of how n arly optimal solution can b produc d in polynomial-time. W prove that there is no polynornial-time approxi
mation sch m (PTAS) for th problem unless P=NP by showing that th problen1 i MAX SNP-hard. nder the same assumption that P#NP, we show that th problem
can not be approximat d in polyno1nial time with ·ratio les than two. In similar way w can show that all th above r sults cone rning polynon1ial-tim approxin1ation hol l ev n for
(1 ,l
)-caterpillars. From these nonapproximability results it een1s to be . tillrather hard to approxin1ate the probl 111 on d grec-boundecl tr es.
Instead of inferring graphs from a ingl walk, w consider th problem of inferring graphs from a finite number of partial walk . vV ask: Gi v n a finite ct of string of colors, what is the small st tr e T such that for each string ::r E , th tree T realize sorne partial walk with trace x. We call the probl m the tree inference from partial walks. In contrast with the cas of a single walk we prO\' that th tr infer nee from partial walks turns to b P-complete. This problem r mains NP-complete even if the number of colors is restrict d to three. On th other hand the probl rn with at most two colors is solvable in linear time. Moreover, it i . hown that th r i no polynomial-tirne approximation scheme for the problem unles P=NP. However we yield a polynomial-time approximation algorithm for th problem. The perfonnance of the algorithm is guarant d in terms of the comp1 e sian in th trees T constru ted, that is,
IISII- ITI
whereIISII
is the total length of th tring in S' and11'1
i the number of edges in T.We al o consider the problen1 of inferring a lin ar chain from partial walks. While the linear chain inference from a single walk, as stat d above in ( i) is olva ble in polynomial time, the problem from parbal walks is P-complete even if th number of colors is thre (Th problern with at most two colors is solvable in lin ar tin1 ).
Although the problem is shown to b unfortunately MAX SNP-hard, we pr nt a polynomial-tim approxin1ation algorithn1 wh r th approxin1ation perforn1ance i analyzed in term of th length of a linear chain.
Our last contribution is a s ries of results on th walk realizability probl m. The walk realizability proble m is to decid if, given a graph G in addition to a tring x of colors, G realiz a walk with trace x, i .. if the string x can be produce l by walking on all edges of G. W introduce a linear chain d composition of a graph and how
that th size of the linear chain decomposition is a k y to the walk realizability. he problem is shown to be in LOG if a graph is d composabl into a con tant number of linear chains. W next show that the probl m is solvable in poly no1nial tin1e for tr cs 1' decomposable into O(log m
)
lin ar chains where rn is the number of dg of 1'. How ver, the problen1 for the class of tr es, which ar decompo able into O(m) linear chains, turns NP-complete even for (1,1)-caterpillars, which ar imple trees of bounded degre three.Acknowledgments
For most I would lik to thank rny supervisor, Prof. Satoru l\Ii ano. Fron1 th v r l eginning, he boosted my mathematical . eH-confidenc hy treating rne as a. pe r. H has collaborated on much of my r search. The guts of research h how d n1e ha,·e ] -ft a deep impact on me. This exc ptional hon sty gave me a generou look at th inn r working of a truly creative mind. He has gr atly influenc d my graduaL car r and I will be forever grateful for these five years I have . pent working with hirn. Knowing Prof. Miyano has been without any doubt, the high point of my intell ctual life.
I enjoyed the numerous m etings I have had with Prof. Setsuo ArikaY� a during which he monitored my progress, bolstered my confidence and at the a111 time pro
vided rne with his candid opinion on my work. Every n1e ting with him was a highl · fulfilling experience.
I wish to thank Prof. Fumihiro Mat uo, who has support d my research a tivity in various ways. His study group has been one of th b st ource of infonnation to n1e.
Prof. Ayumi Shinohara has be n a constant source of in piration and an infinit source of inter sting is u s to me. He introduc d me to th ar a of graph infer nee and much of n1y initial as well as current work have b en done with the fruitful dis u sions with him. H has b en very g nerous with his time during which I gleane l imn1ense technical knowledg from hirn. Like Prof. Miyano, Prof. Shinohara' intcll tual hon
esty has always rnade it possibl for me to learn from him.
Prof. Yasuo Kawahara has cr at d a pleasant. atmo ph re. I wi h to xpre rny gratitud to him for valuable comment and critici rn on many occa ion .
Thanks are due to Prof. Marek Karpinski at · niversity of Bonn. I had tu ii d there
over on m on th, which was a wand rful experience for n1e, and the work of Chapter 4 was started there.
Thanks arc al. o due to Ms. Erika Tateishi, who coauthored one of n1y papers. It was fantastic for me to share her id a with m .
I wish I could h r xpress my gratitudes and thanks to all of great and wonderful
p ople who helped, ncouraged and advised rne in my r earch of the e five years. If I did, the part of acknowledgments would be thick r than the main part. v\ ithout anyon
,
I could not have finished any portion of the work done here.I gr atly thank the Japan Society for the Promotion of Science
(
J P)
for its fellowship that supported me for rny last two years in graduate school.
Last, I would lik to thank my mother, who has be n most loving and under tanding me specially in my absence at horne. 1 his thesis is dedi cat cl to Ya u u k l\ 1aruyama
(1937-1989).
1 Introduction
1.1 Importance of th graph inference problems 1.2 Contributions ... .
1.2.1 1.2.2 1.2.3
Graph inference from a walk Graph inference fron1 partial walks Walk r alizability problem
2 Inferring a Graph fron1 a Walk 2.1 Preliminaries . . . . . . 2.2 Degr -bound d graphs .
2.3 Trees without any degre -bound constraint 2.4 Trees of bounded degree three
2.4.1 Gen ral results . 2.4.2 (1,1)-caterpillars 2.5 Approximability ..
2.5.1 MAX SNP .
2.5.2 Hardness of approxirnations 2.6 Concluding remarks . . . . . 3 Inferring a Graph from Partial Walks
Vll
1 4 7 7 11 14
17 17 19 20 25 26 4 50 51 -5 56 58
3.1 Tr e inference fro1n partial walks . . . . 3.2 Linear chain inference from partial walk 3.3 Polynomial-time approximation algorithm
3.3.1 On trees . . . . . 3.3.2 On lin a.r chains . 3.4 Concluding remarks . . .
4 Realizing a Walk on a Graph 4.1 Walk realizability problem 4.2 Realizing walks on trees 4.3 Concluding remarks . 5 Conclusions
Bibliography
58 64 66 67 6 69
72 73 76
87 90
Introduction
Imagine a robot trying to construct a treet map of an unfamiliar city by walking along all streets in the city. Typically, ach street has the name of it elf. Ho\vev r th names of streets are not necessarily distinct, i.e., some tr ets have th same name.
This situation evidently makes the task of the robot more challenging. ow suppo that th robot has so poor eyesight that the robot can onl . e the narn of the . tr ct that the robot is walking on.
This problem can be formalized as follows: Let G be an edge-color d graph. A walk on G is a path in G containing all edg s of G. The trace of a path in G is the s qu nc of the colors s en in the path.
Instance: Th trace of a walk on a graph G.
Output: Th graph G.
For a particular trace of a string x, there may b many graph that are con i tent with x, i.e., realiz a walk with trac x. For example, the string x = abaabbaabcddcd d can b produced as the trace of a walk on any of the graph in Fig. 1.1.
For a sp cified clas C of dge-colored graphs and a tring :r w wish Lo find a graph G E C realizing a walk with trace x such that th nun1ber of dges in G is minimized. Let C be a class of edge-colored graphs. The problen1 for C an be defined
1
a
b
a ab
a
b
a
c
c
(a)
a
d
e
(b)
(
c)
(d)
d d
cc
d
e
d
eFigure
1.1:Each graph realiz
sa walk with trac
x = abaabbaabcddcd d.a follows:
Graph Inference fron1 a Walk for C
Instance: A string x over a finite alphabet D .
Output: A graph in C with th minimum number of edge which realiz a walk with trac x.
In Fig. 1.1, (a.) is the optimum solution for linear chains, (b) i th optin1un1 olution for trees, (c) is th optimum solution for graphs of bounded degree four, and
( cl)
i the optimum solubon for graphs of bounded degree five. The graph inf ren from a.walk is the main problem we int nsively discuss in this th sis. Next we n1odify th problem and investigate the
graph in} r nee from partial walks
and the wal
k reali:::nbilityproblem.
A partial walk on a. graph is c1 fined as a path in th graph. E 1 e ially. the graph inferenc from partial walks provid s another method of graph inf r nc where we are given the traces obtained by a. finite number of partial walks on a graph. VV d fine the problem as follows:Graph Inference from Partial Walks for C
Instance: A finite set S of strings over a finite alphabet �.
Output: A graph G in C with the minin1u111 numb r of dg uch that for ach tring
x in S, the graph G r aliz s a partial walk with tra.c x.
These graph inference problems can be classified as th probl m of r con tructing a graph G from (partial) structural information of G u h a inferring a l\1arkov chain from its output, the identification of a finite tate auton1aton frorn i input/output behavior, and the probl m of constructing an eclg -weighted graph from eli. tan data.
tudiecl from various viewpoints in Con1put r . ci nee.
1.1 Importance of the graph inference problems
Rudi h
[
Rud85]
consider d the probl m of inf rring a Marko chain fron1 it output.Here, a Markov chain is a finit . tat rna hine, each state of which ha one arc lab led 0 and one arc lab led 1 with the transition probabiliti s on th two arc. umn11ng up to one. The output of a Markov chain is a string over 0 and 1 which is produced in infinit transitions from states to states according to the transition probabilitie of th Markov chain. Rudich
[
Rud85]
developed an algorithm that for the binar output of a Markov chain in the limit, reconstructs the und rlying Markov chain structur a well as the associated transition probabilities. The graph inf rene fro1n a walk an be viewed as r constructing the structure of a Markov chain fron1 it output.Gold
[
Gol78]
and Angluin[
Ang7]
considered the problem of id n tifying th sinallest automaton consistent with given input
/
output b havior. This probl n1 i losely related to th following graph inference problem for dir ct d graphs: We are giv n two finite sets Sp and Sn of strings called po itive exampl and n gative xampl , resp - tiv ly, and ask d to find an dge-color d directed graph G su h that for ea h positiv example x E SP, the graph G realizes some partial walk with trace x and that for ach negative xample y E Sn any partial walk with tra y cannot be realiz d in If n is mpty, i.e., there are no negativ xan1ples, and in addition, the nu1nb r of dg of G is minimized, th above graph inference problern is equivalent to the graph inference from partial walks. For the id ntification prohl m of auton1ata Gold
[
ol7]
and Angluin[
Ang78]
show d that th problem is, in general, P-con1pl te. Mor over Pitt and Warmuth[
PW93]
gave an intere ting negative re ult on approxin1ation algorithrnfor the probl m. They showed that for any con tant k no polynornial-tin1 algorithn1 can be guarante d to find a consist nt finite automaton of siz at 1no t (
opt)k
whereopt
i the size of th smalle t finite automaton consist nt with given behavior.Fron1 the vi wpoint of learning th ory, Angluin [Ang 7] e tabli heel a learning al
gorithm for finite state automata with memb r hip and quival nee qu ri s. Riv t and Schapire [RS93] pr s nted an ext nsion of Angluin 's algorithm to the prol l m of l arning finite stat automata fron1 its input/output b havior, ven in th ab nc of a m ans of resetting th machine to a start state. The absenc of the re t m chani n1 is recovered by inf renee of a
homing
squence
for an unknown automaton. Riv t and Schapire [RS94] also studied learning algorithms using an alternative r pr entation for finit state automata bas d on a quantity called the div r ity. Freund t al. [FKR+93]gav algorith1ns for learning finite state automata on the basis of a singl long walk in an av rag case.
When instead of a color a nu1nerical value is assigned into each edge of a graph, it would b also inter sting to infer a graph from distance data betwe n nod . Th prob
lem of constructing an edge-weighted graph (in many ca es, tree) from di tan data is motivated by applications in n1olecular biology computational lingui tic and oth r areas. Therefor various optimization criteria of th problen1 have been con idered.
On of the standard problems is de cribed a follow : W ar giv n an n x n di tanc matrix D =
( di,j)
and asked to find an dge-weighted tree T with eli tingui h d 1 ave v1, v2, ..
. , Vn that realizes D, i. . , the co td[j
of the path fromVi
to Vj i. xa tlydi,j
for ach 1�
i� j �
n, and such that th number of edg is minimized. l�or this case, polynomial-time algorithms ar giv n in {WSSB77, H i 9, R 9]. Ilow v r other optimization criteria for the probl m of in£ rring a graph fron1 di tanc da a wer proposed and NP-complet ne r ults wer r port d [Day 7 Win FI-(\tV95].For xan1ple, in
[FKW95]
it is shown that for an optin1ization criterion to find a treC' T with the minirnized sum of dge-wcights uch thatJ[1
� di 1, th r is a con. tantE > 0 uch that no polynomial-time algorithm can approximate the opti1nal solution
within a ratio of n( unl s P=NP. In order to handle actual numeri unreliable data in many appbcations, interesting approache. were tak n in
[F
1 2FK\i\19-, K\V95].
Especially, Kannan and Warnow
[KW95]
proposed to con truct a tree from partial ord rs of pairwis distances of distinguished node . They showed that the con ist ncy probl m is NP-complet in general but for unw ight d binary trees the con truction problem can b solvabl in polynomial time.In the model of the graph inference from a walk, the available information to r - construct an dg -color d graph G i only the trace information of orne ingl walk on G. If we could have ace ss to G in order to get more information of \ it n1a b easier to reconstruct G. Such a situation has b n d alt with in the problen1 of 1 arn
ing an unknown graph, and some of important results und r various condition were published
(
e.g.,[DP90,
BS94,BRS95]).
Broadly speaking, in th probl 111 we n1u t r - construct an unknown graph G by wand ring actively through G and gath ring partial information of G we can s ther . B nder and Slonin1[B 94]
showed that two cooperating robots can l arn any strongly connect d graph with n distingui hab]e nod s in expected time polynomial in n without membership qu ri s while imultaneou ly learning a homing sequenc . To find ho1ning sequence the algorith1n u ad litional information
(
e.g., indegree, outdegree, or color of node)
.B
tk et al.[BR 5]
con id r to learn a complet map of an unknown nvironrnent(
i . . , a graph)
by a natural earch method, called the pi c rneal search. The "pi cemea] constraint' requires that ach of the learn r's xploration phases must b of limit d duration. Th y devi d lin ar-tirn piecemeal-s arch learning algorith1ns on some kinds of graph .D
cng and Papadim-itriou [DP90] devised algoritluns to explore all edges o[ an unknown direct l, strongl connected graph. They evaluated the algorithm-performance by n1eans of the ratio of th total number o[ edges traversed divided by the optin1al numb r of traver als.
1.2 Contributions
In this section, we briefly view our contributions to th graph inf rene [rom a walk and relat d works. Th first and s cond parts are concern d with the graph inferenc problems from a walk and from partial walks, respectiv ly. The third part is on the walk realizability problem.
1.2.1 Graph inference from a walk
A walk on an dge-colored graph G is a path in G which contain all edg of G, and th trace of a path is th string of edge-color seen in the path. L t C b a cla . of graphs. The graph inference from a walk for C is d fin d a the problen1 of finding for a string x, a graph in C with the minimum numb r of dges which r aliz s a walk for x.
The graph in£ r nee problen1 from a walk was first deeply discu ed by lain and Rivest [AR90]. L t k be a fixed positiv integer. They con idered th probl n1 for undirected graphs of bounded degre k, and s ttled the ca. e for k = 2. A graph of bounded d gr two can be cha.ract rized a eith r a lin ar chain or a cycle. Th y devised O(n3)-tin1 and O(n5)-tin1 algorithms on linear chain and ycle , re p tiv ly, wher n is the length of a given string. In sequence, Raghavan [Rag94] developed a faster algorithm on graphs of bounded degree two ru_nning in 0( n log n) tim . ot that the graph inf renee problem is trivial for directed graph of bounded clegre two. On the oth r hand, Ragha.va.n [Rag94] gave a ha.rdne s result on a variant of th proble1n for graphs of bound d degr e k (k
� 3)
that is, given a ti·ing x and a positive int ger]{, Lo find a graph of bound d degrcP k with at mo t }\'
nod
s which realize. a walkwith trac x is NP-complet for all k
�
3. How v r, it has not b en known if the original problen1 of the graph inference fron1 a walk for graph of bounded d gr k i NP-compl t yet.Th above results, which are about graphs with a bounded degr e are aU for the graph inference from a walk. For this problem, we first ettle th ca. for undir ct d trees without any degree-bound constraint, called the
tree inference from
a walk. B inquiring into som structural properties of such trees wed v lop anO(n)-time
algorithm to solve the problem, i.e., to produce, from a string x, the small t tr realizing a walk with trace x, wh re n is the length of x. Becaus th structur of tr e 1 more complex than that of graphs of bound d degree two, it is quite remarkabl that our algorithm on tre s runs fa ter than Raghavan's algorithm on graph of bounde 1 degr e two, which is currently the fast st on on uch graphs. W d fine a Lr e r wri - ing system which is bas d on a binary rela,tion on th s t of undirect d edg -colored trees. This tree rewriting system is a key idea of the linear-time algorithm. W show that the tree rewriting ysten1 satisfi s th Church-Ro s r property. In addition wh n rewriting a tree T by the system, the r suiting tre i strictly mall r than T. Th combination of th se properties is ess ntially a rea on to allow the tr inf r n e fro1n a walk to have a polynomial-tim algorithm.
Second, we ask if th graph inferen e from a walk for tr s of bounded d gr three is solvabl in polynon1ial tim . This qu stion is inspired by our linear-tin1 algor] Lhn1 on trees and the NP-con1pleteness r sult by Raghavan on graph of bound d degre three.
When the answer to the question is affirmative we can ay that th structure of tr s is a stronger factor in th complexity of th graph inferenc problem than bounding the maximum d gr of grapbs. Unfortunately, th problem is ett]cd negaLiv ly as follow :
vVe prove that th graph infer nee from a. walk for tre of bound d degr e thr e i P-complet . However this r ult docs not giv . us any id a of wh th r or not th proble1n with a. constant number of colors is still NP-compl te b causf' th reduction w giv to show the NP-hardness of the problem does not bound the alphabet iz of colors by a constant. By exploiting a technique of r clueing the numb r of colors on tr s, we then show that th problem for trees of bounded degr thr e r main NP-complete ev n if the numb r of colors is restricted to four. This re ult can be gen ralized in the following way: For any k 2::: 3, the problem for trees of bound d degree k is N P-complete even when the number of colors is k + 1. It can b aid that the number k + 1 of colors is optimal since the proble1n for trees of bounded degr e k with at most k colors is solved by the linear-time algorithm that we construct for the tre inf renee from a walk.
hird, we consid r very simple trees with degree-bound thr , called ( 1) )-cat rpillar · . Let a and {3 be positive integers. An (a, {3)-caterpillar is defin d as a tr onstructed from a linear chain, called the backbon , by attaching at most a other app ndage linear chains with length at most {3, call d hair to each nod of the backbon . Obviou ly an
(
a, {3)-caterpillar is a tree of bounded degr a+ 2. Espe ially a ( 1 1 )-cat rpillar is so simple that it looks like a linear chain b cause each nod of th backbon , which is a linear chain, has at most one hair with length on.
W prove that th graph inf r neeFigur 1.2: A (1,1)-cat rpillar.
from a walk for (a, {3)- caterpillar is NP-compl t for a h a 2::: 1 and 2::: 1. It is very int resting that, ven for (1 ,1 )-caterpillars the N P-completen s result sti 11 ·tand
up.
A
con1pared with the result that the bncar chain infer nee from a walk is olvable in polynomial time, which was given by A larn aucl Rivest [AR90] and Raghavan [Rag94], it can be said that allowing each nod of a lin ar hain to have at n1o. t one app ndage edge makes the graph inference from a walk intra tabl . Here, let
b
be thlength of th backbon of a fixed arbitrary (1,1)-caterpilla.r. Our reduction bowing th NP-hardness for (
1,1
)-caterpillars does not work for the r tricted ca. that th total number of hairs is at 1nost O(logb).
From thi observation, the graph infer n e from a walk for (1,1)-cat rpillars with O(logb)
hairs seems to b solvabl in pol no1nial time although it has not been proved yet. We believe that th cla s of(1
1 )- aterpillars can be regarded as one of the simpl st class s of graphs for which the graph inf r nee fron1 a walk is NP-complete.Under the natural assumption P#NP th graph infer nee from a walk for tree. of bounded degree
k (k � 3)
does not have any polynomial-tim algorithn1 b au of it.NP-completeness. W wish to develop polynomial-tin1e approximation algorithn1s for the problem, but yet we give so1n hardne s r ult on polynomial-tim approxin1ation of th problen1. We prove that th problem has no polynomial-tim approximation schem (PTAS) unless P= P. W say that a problem has a PTAS if for e ery fixe l
c > 0, the probl m can be approximated with ratio 1 + c in polynornial tim . lVIor
pr cisely peaking, w prove that th probl m is MAX S P-hard. Th compl xity class MAX SNP is a clas. of optirnization problem which i syntactically d fin d by P apadimi triou and Yannakakis
[PY91],
together with a one pt of reduction, call d 1-r duction, that preserves approximation schemes. By th result du to rora et al.[ALM+92],
a MAX SNP-hard problem has no PT S unle s P= P. Mor over, we giv another nonapproximability re ult of th graph inf r nc from a walk for tr of bound cl degree k fork � 3.
We show that there is no polynon1ial-tim approxin1ationalgorithn1 with ratio less than two unl ss P= P. Thi. a trivial eonsequ nee of th reduction showing th P-hardness.
Surprisingly, all the abov results cone rning polynomial-time approxin1ation are shown to b true even for (1,1)-caterpillar . These nonapproximability re. ults as ert that it is even hard to approximately solve the graph in£ rene fron1 a walk for degree
bound d trees in polynomial time.
All the above our r sults of the graph infer nee from a walk, which ar provided in Chapter 2, are published in [MM92 MM95a].
1.2.2 Graph inference from partial walks
In Chapter 3, we discuss the problem of inferring a graph from a finit nun1b r of partial walks instead of a single walk. A partial walk on a graph G i defined a a path in G. Let C b a cla s of graphs. The
gTaph injeT
ncfron""L paTtial walk. for
C i. th problem of, given a finite set S of strings over an finite alphabet, flncling a graph with the minimum numb r of edges which realiz s some partial walk with trace x for ach string x E S. Obviously, this problem is an ext nd d ver ion of th graph inf r n from a walk b cause wh n I I = 1, i.e. the numb r of tring in xactly one th problems are equivalent.First, we consid r th problen1 for undir cted tre , called th
tre inf T nee jTo7n
paTtial walks.
As previou ly stat d, one can infer a tree from a ingle walk in linear time. However, we show that th tre inference from partial walk i NP-con1pl t . Frorn this result, w can say that th trace of a single walk on a tree T ha n1or information of T for reconstructing T than the trac of a finit nun1ber of partial walks on T. This observation would be true for arbitrary graph . The tre in£ renee from partial walks is still NP-eomplet even if the alphab t ·iz of olor i thr e. The number thre of colors is optimal because if the nun1ber of colors i at n1o t two, thproblen1 is solvable in tim linear in th total length of string. m ' b applying our linear-time algorithm for the tree inference from a walk.
We al o di cuss how to approximately solv th problem in polynomial time. v\
prov a n gative result that the problem has no PT S un 1 ss P= P by . ho\r\ ing the MAX S P-hardness of the problem. On th other hand, fortunat ly the problem is shown to be approximately solvable in polynomial time. The perforn1anc of our approximation algorit hrn is guarante d in terms of the
compr ssion
in th tree T construct d, that is, IISII-ITI wher II 'II is th total length of th tring. in and ITI is th number of edg s in T. We prov that there is a polynomial-time approxirr ation algorithm to find, from a finit set S of strings, a tr T realizing om partial vvalk with trace x for each string x E S such that C:::: Cm/(ISI-1)
wh re Cis th compression in T and Cm is the n1aximum compression for S. The algori thn1 i con truct d by employing the polynomial-time approximation algorithn1 for the rninimmn cornn1on supertree problem [YM93]
.We also consider th problem for undire ted linear chains, which i r ferr d to
th linear chain inference from partial walk
. This problem is clo ly relat d to h hortest common superstring problem[
GMS OJ becauset
hedirect d
v r ion of our probl m i.e., the graph inf r nc from partial walk for directed linear chain i intrin ically equival nt to the shortest common sup rstring probl m. This problen1 i on of th most basic and important problem in Cornputer Sci nc and ha pl ntiful r ult (e.g.[GMSSO, TU88, Tur89, Li90, Ukk90 BJL +94, TY93, Micl94 Gus94, JL94, I�PS94 CGPR94, AS94,
JT95]).
The n1inimum con1n1on sup rtree problem [YM93], who polynomial-time approximation algorithm is employed in uch an algori hn1 for th tree inf renee from partial walks, is also related to th hort t comn1on uper. iring b cause the minin1um common supertree problem can b vi w d as an ext n ion ofth shortest con1mon up rstring. In the sa1nc way as t h � ca e of tree., we pro\·e that the Enear chain infer nc from partial walks is P-c01nplet , while the lin ar chain in£ r nee from a single walk is solvable in polyno1nial time
[AR90,
Rag94]
. This re ult holds even if th alphab t size is three. The number thre is al. o optimal because th problem with at most two colors is solvable in lin ar tim . To how th P-harclne . of the problem we giv a reduction from the shorte t common superstring probl In.It is interesting that although th short t common sup r tring problem is till NP
compl te even if the alphabet size is restricted to two
[GMS 0],
y t th linear chain inference from partial walks is solvabl in linear time under th sam con traint on th alphabet size.We show that the linear chain inference from partial walk ha no PT AS unles P=NP, which is a direct consequen e of the MAX
S
P-hardn s . But w an g1v polynomial-time approximation algorithms that mploy approximation algorithn1 for the problem of shortest superstrings with flipping[JLD92]. A
sup rstring of a L ,_' of strings with flipping is a string s which contains for each tring x ES
ith r x or th reversal of x as a substring of s. Th p rformances of our approximation algorithm are analyzed in t rn1s of the length of a linear chain l and the c01npr ssion in l. LetS
b a finite set of string . We show that one approximation algorithm ati fie that C 2:: Cm /2
, where C is the compression in the produc d linear chain l and m i the maximum compression forS
and that another algorithm find a linear hain with at most 3 ·opt( S)
edges, wh reopt( S)
is the number of edg s in the shorte t lin ar hain realizing all partial walks forS.
The results in Chapter 3 are based on
[MM92].
In1agine a guard with a. security agency having a sidelin . The ta k which the agency takes upon th guard is to patrol all . treets of a. ity by walking along thf' street ..
His sid line consists of many kinds of action such a.s withdrawing hi savings fron1 a. bank, visiting a. company and buying something a.t hops. He know a sequence of actions which ma.k s a profit. However, he can not always take any actions an wh re.
For xa.mple, when withdrawing n1oney there must b a. bank ther . Therefore, it is needed for him to test if the profitable sequence of a tions can be realized whil p rforming his duty as a guard i.e., walking on all street of the city.
We formalize and discuss the abov problem in Chapter 4. As a n1od l of a cit , we consid r an edge-colored graph. Each color itself corresponds to an action of the guard' id lin . ow uppose that wh n the guard traverses an dge e th guard must tak th action assigned to e. Let C b a class of graph . Our probl n1 i d fin d as follows:
Walk Realizability Problem for C
Instance: A tring x over a finite alphab t L.J and a graph G.
Question: Does G r aliz a. walk with trace ·?
The walk realizability problen1 is closely related to the graph inf rene fron1 a walk.
When we regard a string x as a eries of actions and a graph as a con tra.int on action th graph inf r nc proble1n is to design the small t con traint graph which accepts the s ri s of actions. On the oth r hand, the walk realizability problen1 i to te t if a constraint graph can be sw pt by a given eri of action .
Another r la.ted problem is th partial walk rea.Jizability probl n1, which i to a k, given a graph G and a tring x, if G realizes a partial walk whos s quencf' of edge-
colors is x, where a partial walk on a graph G is a path in 0. It is ca. ily seen that the partial walk realizability problen1 is complete for NLO
[
Sav70]
. The. e two problen1s cliff r on the point that all dges must be tra.v rs d with a specifi d qucncc of edgecolors for the walk realizability problen1 whil th partial walk r alizabilit problern does not require this condition.
We introduce a notion of the linear chain decomposition of a graph and how that the size of the lin ar chain decompo ition is a key to the con1pl xit of th walk realizability problem. Our first observation on the walk realizabilit probl m i that it is solvable nond t rministically in log-space, i.e. in th class NLOG, if a graph is undirected and decomposable into a constant numb r of lin ar chain . Thi r ult al o holds for directed graphs.
Furthermore, we show that the walk realizability problem is solvabl in polynomial tim for undirected trees T d con1posable into O(log
1n)
lin ar chain,
w h r 111 i. thnumber of edges rn T. Unfortunat ly, it do s not e n1 that our algorithn1 on uch trees is applicable to gen ral graphs decornposable into O(log Til) lin ar chain .
In contrast, w prov that th walk r alizability problem for the class of undirected trees, which are d composabl into
0(
m)
linear chains turns NP-con1plete. We refine this result by providing furth r observation on th probl m. W ay that a node i proper if all of the dg s incident to the node have mutually di. tinct color . We can easily see that if all nod s in a giv n tre ar proper, the walk realizability probl In i solvabl in polynomial tirne by applying the linear-time algorithm we d v lop for the tree infer nee frorn a walk. However, it is quit remarkable that the walk realizability problern becomes NP-compl t for the cla of tr s having xactly one nod being not proper. Obviously, if tr s are directed, the walk realizability probl 1n i olvable in polynomial tim .A (1,1)-caterpillar is an undirected tre obtained from a linear chain l by attaching at n1ost one hair to each node of the bnear chain l. For uch imple tr e like linear chains, we show that the walk r alizability probl m remain NP-compl te though the probl n1 for linear chains is in NLOG. It is really int re. ting that the NP-complet n result still holds even if th number of colors is two. Wh n any ( 1
1 )
- aterpillar i monotone, i.e., the nu1nber of colors is one, the problem is trivial. By an a y n1odification of a r duction showing that th walk realizability problem for (1 ,1
)
-cat rpillars is NP-hard, th problem for the class of m shes can be shown NP-compl tc, wh re a mesh is an undirected graph with the node set V ={ vi,j 11 :::;
ij :::; n}
and th dg set E ={ {vi ,j , vi ,j +
1} I 1 :::;
i:::; n, 1 :::; j :::; n
-1 }
U{ {vi
,J ,vi+
1,j } I 1 :::;
i:::; n-
1 1:::; j :::; n} .
The results of this subsection are based on [MM95b].
Inferring a Graph from a Walk
In this chapter we consider the graph inference from a walk. After revie\Ying some important results given by Aslam and Rivest
[AR90)
and Raghavan [Rag94) we pre nt our results on the problem. First, we give necessary definitions and notations used throughout this thesis.2.1 Preliminaries
Let � be a finite alphabet. Th s t of all strings ov r � i denoted by Th reversal of a string x is written as xR. Th 1 ngth of a string x is denoted by lxl. To represent the cardinality of a set S, we use the same notation lSI. The concat nation of strings x and y is written as x · y, or simply xy. For strings x1, . . . x n
TI�1
xisimply xn. For convenience, x0 is d fined a the e1npty string c.
A c
oloT
is a symbol in�. We will con ider edge-colored graph G = (V E, c) wh rc : E ----* � is called the
edge-coloTing of
G. H reafter a graph n1eans an dg -coloredgraph without any notic . Let G be a graph. A
paTtial walk
on G i a path in G.Let vo, v1, ... , Vn be th sequenc of nodes of a partial walk
w
on G. \t\ e denot i byw[i]
for0 ::; i ::;
n. We callw[O]
andw[n]
thtaTt
and end node of the partial walkw,
resp ctively. A partial walkw
is aid to beclosed
if th tart node ofw
coincides 17with the end noel . Let
e1
... , enbe the equence of edges of a partial walk
uon a graph
G =(V E,
c).h
traceof
wis f1�1
c( i)
·L t
xbe a . tring in
_/'. If wi a partial walk with trace
x, wi call d a
partial walk for x.For a graph
G, wsay that
G realizes
a partial walk for
xif th r is a partial walk for
xon
G.Let
:r =f17=1 i with
iin
I:.For a partial walk
wfor
x,th
ubwalkof w for sub tring TI7=j. i with
1:::; j :::; k:::;
nis the partial walk with the nod sequence of
w[j -1)
w[j]
· · · ,w[k]. If
a partial walk on
Gcontains all edges of
G,it is called a
walk on G.A
node v of a graph i said to be
properif the colors assign d to th edg incident to v are mutually distinct.
Agraph
Gis said to be
properif all nodes of
Gar proper.
For graphs
Gand
G',if
Gand
G'ar isomorphic including edge labels, we id ntify
Gwith
G'without any notice.
A cycle
is an undirected graph
l =(V, E, c) with V
={
v1v2, ... , vm} and
E ={{v1,v2}, {v2,v3}, ... , {vm_1,vm},{vm,v1}}.
A linear chaini· an un lir
t dgraph
l =
(V,E,c) with V
={v1,v2, ... ,vm} and
E ={{v1,v2}, {v2 v3}, ... , {vm-l,l.m}}.
The
labelof
lis f1�11 c( {Vi, Vi+I} )
. A directed linear chainis
l =(V, E, c) wh r V
={vi,v2, ... ,vm} and
E ={(v1,v2) (v2,v3) ... ,(vm-l,vm)}. Th label of th direct d linear chain
lis defined in the same way as that of a linear chain.
Let
Cbe a class of graphs. The
graph inference from a walk foris d flned a follows:
Graph Inference from a Walk for C
Instance: A
string
xover a finite alphabet
I:.Output: A
graph in
Cwith the minimun1 number of eclg which realiz , a walk for
X.
2.2 Degree-bounded graphs
If C i. th coll ction of all graphs, i.e., there i no constraint on graphs in C, the solution of the graph inference from a walk for C is i1nn1 diate. For xamplc, given a string x of the colors a, b, c, d, the graph in Fig. 2.1 is th mallest graph r alizing a walk for x.
Figure 2.1: When given a string x of n colors, any graph reaUzing a walk for ::r ha at least n edges.
Th graph in Fig. 2.1 is allowed to contain self-loop . Even if any elf-loop are not allowed, the solution is still clear (s Fig. 2.2).
Figure 2.2:
d
Evid ntly, the maximu1n degre s of both the graphs in 1� ig. 2.1 and 2.2 d pend on the alphab t size, i.e., th number of color of the string x. The following qu tiou ari es
out of the above fact on degrees: When th maxirnu1n de re of graph is bound d by a onstant, say k, is it easy to find fron1 a tring :r the small t graph of bound d degree k r alizing a walk with trace x? For the question, A. lam and Rivest
[
AR90]
and Raghavan
[
Rag94]
gave some important results.Aslam and Rivest
[
AR90]
dev loped polynomial-tim algorithm for th graph inferenc frorn a walk for undir cted graph. of bounded degree two. Such a graph can b classified as a lin ar chain or a cycle. Their algorithm run in
O(n3)
time andO(n.s)
tim for linear chains and cycl s, r spectiv ly, where
n
is th length of a giv n tring.A faster algorithm on graphs of bound d degree two is given by Raghavan
[
Rag94]
:Theore1n
2.1(Raghavan [Rag94])
The graph inf ren from a walk for graphs of bound d degree two is solvable in0( n
logn)
tim , where n is the length of a gi en . tr·ing.otice that if graphs of bound d d gree two ar diT cted, the graph inferenc fro1n a walk turns to be trivial.
L t k
�
3 b a fixed integer. Raghavan solv d a problem sirnilar to th graphinfer nc from a walk for graphs of bound d d gree k which wa a k d to b tra tabl or NP-con1plete by Aslam and Rive t
[
AR90]
.Theorem
2.2(Raghavan [Rag94])
Let k�
3. Giv n a tring x and a po itiv int g r J{, the probl m of deciding if there exi ts a graph of bound d d gr k with at most ]{ nod s which realizes a walk for x is NP-con1pl te.2.3 Trees without any degree..:bound constraint
As we hav s n in Section 2.2, all of th pr v1ou works ar conn ct d with degree
bounded graphs. In this s ction, w consid r th graph inf r nc fron1 a walk for
undir ted trees without any degre -bound constraint. We call the problem the
tre inferenc jTom
a walk.Exan1ple 2.1 Given the string x = daabbcacbbcbbabcaccbbabbcdbbc, what is the snJall
est tree realizing a walk for x? That is i 01norphic to th tree in Fig. 2.3.
a
c
b
a
d
Figure 2.3:
V\ e prove the following th orem:
Theorem 2.3 The tree inference fron1 a walk is solvable in 0( n) time, wh T n 1 the length of a given stTing.
W here construct a tree rewriting system and then giv a proof of Th or m 2.3.
In fact, we have another simpl proof of the theorem, which wa given in
[MM92].
The r ason why the former has b en selected is as follows: The tree r writing y ten1 provide us a clear understanding of th problem as w ll a some helpful prop rti cone rning th graph inf r nee fron1 a walk and the graph inf r nee fron1 partial walks, which will be used in the later sections and chapters.
Let 8 b an arbitrary set. L t ____, x be a binary relation on,_, i.e .. a subset o[ 'x S'.
We denote
(x,y)
E---tx byx
---tx y. 1h compo ition of two binary r lation i defined in a usual way. Th r fl xiv and transitive closure of ---tx is denoted by�X
and th tran jtive closure of---tx
is denoted by�X·
An ele1nentx
E S is calledirreducible
if there is no
y
E S withx
____, x y. Ifx
� x y andy
is irr ducibl then we say thaty
is an---tx-normal form of x,
denoted byX(x).
binary r lation ---tx is noe!h rian if ther is no infinit s qu nc x1---t
xx
2 ---t x · · · ____, x Xn ____, x · · · . A binary relation____, x 1s
confluent
if, for any x, Yo, Y1 E S, x�
x Yo and x�
x Y1 impl that th re iz E S with Yo
�
x z and y1 � x z. A binary relation---t
x ilo c al
yl con flu
nt if, for anyx,
Yo, Y1 E S,x
____, x Yo andx ---t x
y1 imply that there is z E S with Yo � x z and y1 �x z . Letf
be a function from S toN, whereN
is the set of nonnegativ integ r . A binary r lation---t
x isstrictly deer
aing
with respect to the functionf
if, for anyx
y
E Sx
---txy
impli sf(x)
>f(y).
The following r sults are well known.
Proposition
2.1(Huet [Hue80])
(1)
For a noeth rian relation, it is conflu nt if and only if it i locally confluent.(2)
Let ____, x be a confluent binary relatlon on a s t S. 1 h n, for each l m ntx
E c;an ____, x-normal form of
x
is uniqu if it exists.Proposition
2.2(Aslan1 and Rivest [AR90])
If a binary rela ion____, x on con- flu nt and strictly decreasing with r sp ct to a functionf
implies
f(X(x)) :::; f(y)
for anyx
andy E S.W h r defin a binary relation on th t of tr
____,
N,
then x �xy
Definition
2.1 Le
t T1 =(V,
E,c)
be a tre which includ adjacentdgcs
1 ={
Vt v2}
and 2 =
{
v2 v3}
with c(
1)
= c(
e2) (see Fig.
2.4(
a))
. L tT2
be th tr c obtain dfrom T1 by identifying v3 with v1 togeth r with th adjacent edges
1
and e2 (. ce Fig.2.4 (b)). For such trees T1 and T2, we say th at T2 is an
edg -folding of T1.
Th binary relation -+ F on th e set of trees is defined as the set of pairs (T1,T2)
su h that T2 is an edge-folding of T1.(
a) T1Figure 2.4: t1, t2 and t3 in (a) and (b) ar arbitrary tr and a is an arbitrary color.
Trivially, -+ F is strictly decreasing with respect to th number of dge . Hereafter for a string x, w denote a lin ar chain of label x by lx. The following l n1ma would be trivial:
Lemma 2.1 Let x be a string and T be a tree. The tree T realizes a walk for x if and only if lx �FT.
Notic that -+ F is noeth rian. For proving Lemma 2.2, it is uffici nt to how that
-+ F is locally confluent by Proposition 2.1.
Lemma 2.2 The binary relation -+ F is conflu nt.
We can see that for each string x, an -+ F-norma.l fon11 of x umqu ly XJ ts by Proposition 2.1 (2).
Lemn1a 2.3 Let ;r be a string. The ---+p-normal form of :r, i.e., F(lx), is the mallest tree realizing a walk for x.
Proof. Sine Lx �F F(lx), it follow from L n1ma 2.1 that F(lx) al o realizes a walk for x. Let T b a tree which realizes a walk for x. By Lemn1a. 2.J, the lin a.r chain lx satisfies lx �FT. Sine the binary r lati on ---+ F i confluent (Lernma 2.2) and strictly decreasing with resp ct to the number of edges, we see from Proposition 2.2 that F(lx) has edges less than or qual to T. H nc F( lx) is the smallest tr e that r aliz a walk
w for x. 0
W can easily s the fact that, for any tree T realizing a walk for a string x th tree Tis proper if and only if T is isomorphic to F(lx) because of Lemma 2.1 and th definition of the binary relation ---+ F· By this fact and Lemma 2.3 the following lemrna is clear:
Lemma 2.4 L et x be a string. The smallest tree realizing a wa.lk for x is uniqu and proper.
Evid ntly, the algorithm EDGE-FOLD produc s a prop r tree T, whi h i repr - sented as an array indexed on the vertices and the colors. It is clear that T realizes a walk for a giv n string. Thus, we hav the following:
Lemn1a 2.5 The algori thm EDGE-FOLD pToduces from a t ring x th smalle t tre realizing a walk for x.
The number of steps executed by v ry iteration of the loop of EDGE-FOLD is bound d by a constant. Thus EDGE-FOLD runs in O(n) time.
I*
X= X1 · · · Xn(
xi E�)*I begin
end;
u
:= 1;v
:=1;
fori
:= 1to
nif T[u,
xi]
= 0then v
:=v
+1;
T[u,
xi]
=v; T[v,
xi]
:=u· I* { u, v}
is an edge labeled Xi*I u
:=v;
else u
:=T[u,
xi] end if
end;
return
TAlgorithm
: EDGE-FOLD2.4 Trees of bounded degree three
In this section, we consider the graph inf renee from a walk for tree with degree
bound. This probl m aris s in th light of th P-compl t n ss resul on graphs of bound d degree k for k
�
3 (Theorem 2.2) and our lin ar-time algorithn1 on trees (Th orem 2.3). Our qu stion is wh ther or not the graph inference from a walk for tr es of bound d d gree three i tractabl . vVe answer th que tion negativ ly and moreover show that it is still hard to solv the probl rn approximat ly.Her , we give the decision version of the graph inference from a walk. L t be a
cla s of graphs. The decision problem for can be de. cribcd a follo\vs:
Instance: A string x over a finite alphabet L.J and a po itivc integ r f{.
Question: Is there a graph inC with at 1nost ]{edges which realiz a walk for 1·?
We first address the graph inference problem for the cla s of trees of bound d d gr three. After that, we restrict such trees into sin1pler trees of bound cl d gree thr , call cl (1 ,1 )-caterpillars.
2.4.1 General results
We shall see an example of the decision probl m on tre s of bounded degr thre .
Example 2.2 Let ]{ = 5. For a string x = abbccd, the graph in Fig. 2.5 is a tr c of bounded degree thTee with J( edges which realizes a walk foT x. On th oth r hand, foT y = abbccddccaad, theTe is no tTee of bounded degree three with at nJost f{ clges
which Tealizes a walk foT y.
c d
---
Figure 2.5: The graphical repr s ntation of a nod , o, 1 omitt d.
Our main object of this subsection i to prove the following th orem:
Theorem 2.4 Th gTaph inference hozn a walk for the cla s of unclir ct cl trees of bounded degree three is NP-complete.
The following fact, which is trivial, is u eful when w construct a tree realizing a closed walk:
occurs in x exactly twice, then t include. exactly one edge labeled a and th edo'e i traversed exa.ctly twice by w in di tinct directions.
Proof of Theoren1. 2.4. Obviously, the problen1 is in NP. To . how that the proble1n is NP-hard, we give a reduction from th vertex cover problem
[GJ79].
which i to decide if, given a graph G =(V, E)
and a positiv integer ]{,
ther is a v r x cov r of size at n1ost ]( for G, that i , a sub. et U� V
withIU I ::;
]{ uch that for a h edge{
u,v}
EE
at least one of u andv
b longs to U. W denote this problem b VERT� X COVER. Especially, wh n graphs are restricted to graphs of bounded degr e k without any self-loop, VERTEX COVER is denoted byk-DEGR
VERTEX COVER. It i known that3-DEGREE
VERTEX COVER remains NP-compl te[GJ79].
L t f{::; l VI
be a positive integ r and
G
=(V, E)
b a graph of bound d d gree three without any self-loop, whereV
={v1, ... ,vn}
andE
={ 1, ... em}·
We will define an alphabet. .__;and construct a string x ov r � and a positive integer ]{' such that th r i a tre of bound d d gr three with at n1ost ]{' edges which realiz a walk for x if and only if G has a vertex cover of siz ]{ or less. Hereafter a tree m ans a tr e of bound d d gre three. The alphabet � is defin d as follow :
� =
{ a1, a2} U {91, 92, 93} U { h1, h2} U { r1, r2} U {
sl?s2}
U{0,1 #}
u
{
ai, aiI
i =1,
2,3}
U {
i ,jI
i = 1 . . . n andj
=1
, 2,3}
uv.
In order to defin the string x, we in trod uc th four .kinds of trings a follows:
(1)
Wand vV': Let fl = 53n + n1 + 2 where n =l V I
and 1n=lEI.
vV d finJ.L
vv =