ラベル選択を有する最小全域木問題
藤芳明生 鈴木昌和
茨城大学工学部情報工学科 九州大学大学院数理学研究院
[email protected] [email protected]
Abstract
In this paper, we study the minimum spanning tree problem for vertex-labeled
graphs, where the weights of edges may vary depending on the selection of labels
of vertices at both ends. The problem is especially important as the application to mathematical OCR. It is shown that the problem is NP-hard. However, there exists alinear-time algorithm for graphs of small tree-width. In this paper, a linear-time algorithm for series-parallelgraphs is presented. The relation tothe generalized minimumspanning tree problem is discussed.
1
Introduction
The minimum spanning tree problem is one ofthe most famous combinatorial problems in computer science. Fast algorithms to solve the problem
are
well-known. In this paper, we study a generalization of the problem for vertex-labeled graphs, where the weight of edges mayvary
dependingon
the selection of labels of vertices at both ends.An instance of the problem is illustrated in Fig. 1 (a): a finite set of vertex labels is $\Sigma=\{a, b,c, d\}$; vertices
are
indicated by dotted rectangles; each vertex has at leastone
candidates of labels represented by circled symbols; each weighted edge connects labels that belong to different vertices; andsome
pairs oflabels may not be connected. For this instance of the problem, the minimum spanning tree is illustrated in Fig. 1 (b): exactlyone
label is selected from candidates for each vertex; and the graph induced by selected labels and selectededges becomesa
spanning treewhere thesum
of weights of edges is the minimum. We also introduce the notion of a base graph. The corresponding base graph is illustrated in Fig. 1 (c). We say that two verticesare
connected ifsome
candidates of labels of two verticesare
connected.For the development of mathematical OCR [1], the problem is especially important. As shown in Fig. 2 (a) and (b), a mathematical OCR system constructs a graph that expresses possible adjacency connections of bounding boxes from
a
scanned image. At this moment, several character recognition candidatesmay remain for each bounding box. Eachedge is weighted by the positional relation andco-occurrence
ofcharacter recognition candidates. In order to outputabetter recognition resultas
shown in Fig. 2 (c), thesystem wants to find the minimum spanning tree from the graph not only by selecting character recognition candidates for boundingboxes but also by determining adjacency connections of bounding boxes.Figure 1: (a)
an
instance ofthe minimum spanning tree problem with label selection, (b) the minimum spanning tree, and (c) the base graph.2
NP-Hardness
To show the NP-hardness, we reduce the Boolean satisfiability problem (SAT) to this problem.
Theorem 1 The minimum spanning tree problem with label selection is NP-hard.
Sketch
of
proof.Given a CNF
formula,we
can
constructa
graph such that it hasa
spanning tree if and only if the formula has
a
truth assignment. For example, the graph correspondingtothe CNF formula $(x_{1}\vee x_{2}^{-}\vee x_{5})$A$(x_{2}\vee x_{3}\vee x_{4}^{-})$A$(x_{3}\vee x_{4}^{-}\vee\overline{x}_{5})$ isillustratedin Fig. 3. All edges are weighted by the unit value. $\square$
3
Linear-Time
Algorithm
Inthis section, wepresent a linear-time algorithm for graphs whose base graph is a series-parallel graph [4]. The idea behind the algorithm is described as follows:
.
When two graphs are connected in series, the minimum spanning tree is obtained by simply connecting the two minimum spanning trees of original graphs.(a)
$\mu(a,$ $b)= \int_{l}^{b}\overline{\Theta}\frac{\ }{(\epsilon)}$
(b)
$b$ $d-c$
(c) $\mu-(-a-,-b-)-=-(\backslash -+$
a
$\Theta-(-c-)$Figure 2: (a)
a
scanned image, (b) thegraph expressing possible adjacencyconnections of bounding boxes, (c) the correct recognition result.Figure 3: the DAG corresponding to the formula.
$\bullet$ When two graphs are connected in parallel, the minimumspanning tree is obtained
by connecting the minimum spanning tree of
one
original graph and the minimum two disconnected spaming subtrees of the other graph.Let us write $G(s, t)$ to
mean
that the graph $G$ has two distinguished vertices, namely,the
source
$s$ and the sink $t$.
Definition 1 A graph is
a
series-parallel graph $(SPG)$ if (1) it consists of a single edgeconnecting $s$and$t$, or (2) it
can
be produced by asequence of thefollowing two operations:Series Composition: Given two series-parallel graphs $G_{1}(s_{1}, t_{1})$ and $G_{2}(s_{2}, t_{2})$, form a
Parallel Composition: Given two series-parallel graphs $G_{1}(s_{1}, t_{1})$ and $G_{2}(s_{2}, t_{2})$, form
a new
graph $G(s, t)$ by identifying $s=s_{1}=s_{2}$, and $t=t_{1}=t_{2}$.
Due to the recursive definit\’ion of SPGs,
we
can
obtainan
vertex-labeled ordered tree in accordance with a decomposition ofan
SPG.Deflnition 2 A series-paralleltree $(SPT)T$for
an
SPG $G(s, t)=(V, E)$ isan
edge-labeled rooted tree definedas
follows: The set of vertex labels is $\{S, P\}\cup E$.
$\bullet$ If $G(s,t)$ consists of
a
single edge, then$T=(r,\emptyset)$ where $r$ is
a
new
vertex (the rootof $T)$, and the label of$r$ is $(s, t)$
.
$\bullet$ If $G(s,t)$ is obtained by a series composition of
$G_{1}(s_{1}, t_{1})$ and $G_{2}(s_{2}, t_{2})$, and $T_{1}=$
$(V_{1},E_{1})$ and$T_{2}=(V_{2}, E_{2})$
are SPTs
of them, then$T=(\{r\}\cup V_{1}\cup V_{2},$ $\{(r,r_{1}), (r,r_{2})\}\cup$ $E_{1}\cup E_{2})$ where $r$ isa new
vertex (the root of$T$), and $r_{1}$ and $r_{2}$are
the roots of$T_{1}$and $T_{2}$, the label of$r$ is $S$, the first child of$r$ is $r_{1}$, and the second child of$r$ is $r_{2}$
.
$\bullet$ If$G(s_{1}t)$ is
obtained
bya
parallel composition of$G_{1}(s_{1}, t_{1})$ and $G_{2}(s_{2},t_{2})$, and $T_{1}=$ $(V_{1}, E_{1})$ and$T_{2}=(V_{2}, E_{2})$
are
SPTsofthem, then$T=(\{r\}\cup V_{1}\cup V_{2},$ $\{(r,r_{1}), (r, r_{2})\}\cup$ $E_{1}\cup E_{2})$ where $r$ isa
new vertex (the root of$T$), and $r_{1}$ and $r_{2}$are
the roots of$T_{1}$and $T_{2}$, the label of$r$ is $P$, the order of children of$r$ is ignored.
Notethat thechildren of
a
vertex labeled $S$are
ordered, while thechildren ofa
vertexlabeled $P$
are
unordered. All edges of $D$ appear exactlyonce
as
a
label of leaves. AnSPG may have many corresponding SPTs since the above decomposition is not unique in general. It is known that an SPT is obtained from any SPG in linear time depending on the number of edges of
an
SPG [4].Let $\Sigma=\{\sigma_{1}, \sigma_{2}, \ldots , \sigma_{m}\}$ be the set of vertex lables. Note that the number of vertex
labels $m$ is
a
constant number.The algorithm consists of the following functions Main and Calculate:
$\overline{Ehnction}$
Main$\overline{Input:}$
an SPG $G(s, t)=(V, E)$ anda corresponding SPT $T=(V_{T}, E_{T})$;Output: the minimumweight of spanning trees of $G$;
1: Let $u$ be the root vertex of$T$
:
2: $(A, B)$ $:=$ Calculate$(u)$;
3: $\min=\infty$
4: for $i:=$ lto $m$ do
$\check{o}$: for $j$ $:=1$ to $m$ do 6: if $A[i,$$j]<$ 7漉$n$ then 7: $\min:=A[i,j]$; 8: end if 9: end for 10: end for 11: return $\min$; $\overline{\frac{EunctionCa1cu1ate}{Input:avertexu\in V_{\mathcal{T}_{1}}}}$
Output: arrays of real numbers $A[1\ldots m, 1\ldots m]$ and $B[1\ldots m, 1\ldots m]$;
1: if the label of$u$ is $(v_{1}.v_{2})\in E$ then
2: for $i:=$ lto $m$ do
3: for $j:=$ lto $m$ do
4: if there extsts an weighted edge $e$ between the lable candidate $\sigma i$ of$v_{1}$ and the
lable candidate $\sigma j$ of $v_{2}$ then
5: $A[i_{1}j]$ $:=$ the weight of $e$.
6: else 7: $A[i,j]:=\infty$; $s$: end if 9: $B[i,j]:=0_{:}$
.
10: end for 11: end 最)r12: else if the label of$u$ is $S$ then
13: Let $u_{1}$ and $u2$ be the first and second child of$u$, respectively;
14: $(A_{1}, B_{1})$ :$=$ Calculate(ui);
5: $(A_{2}, B_{2})$ $:=$ Calculate$(u_{2})$;
16: for $i$ $:=1$ to $m$ do
17: for $j$ $:=1$ to $m$ do lS: $m$伽ノ 1 $=\infty$;
19: $minB=\infty$;
20: for $k:=1$ to $m$ do
21: if $A_{1}[i,$ $k]$ 十 $A_{2}[$ん,$j]< \min$ノ 1 then
22: $minA:=A_{1}[i$,た$]$ $+A_{2}[$た,$j]$;
23: end if
24: if $A_{1}[i,$ $k]$ 十 $B_{2}[k,$
$j]<minB$
then25: $minB:=A_{1}|$嗣 $+B_{2}[k,j]$;
26: end if
27 if $B_{1}[i,$$k]+A_{2}[k,$
$j]<m$
伽$B$ then28: $minB:=B_{1}[i,$$k]+A_{2}[$た,$j]$; 29: end if 30: end for 31: $A[i,j]:=minA$; 32: $B[i,j]:=minB_{:}$
.
33: end for 34: end for35: else if the lable of $u$ is $P$ then
36: Let $u_{1}$ and $u_{2}$ be the children of$u$;
37: $(A_{1}, B_{1})$ :$=$ Calculate(ui);
38: $($ノ12,$B_{2})$ $:=Calculate(u_{2})$
:
39: 最)r $i:=1$ to $m$ do
40: for ブ:$=1$ to $m$ do
41: if $A_{1}[i, j]+B_{2}[i,j]<B_{1}[i,j]+A_{2}[i,j]$ then
42: $A[i,j]:=A_{1}[i_{:}j]+B_{2}[i,j]$;
43’. else
45: end if 46: $B[i,j]:=B_{1}[i,j]+B_{2}[i.j]$; 47: end for $4S$: end for 49: end if
$\underline{60:return(A,B);}$
4
Relation to the
Generalized Minimum
Spanning
Tree
Prob-lem
The minimum spanning tree problem with label selection is closely related to the
gener-alized minimum spanning tree problem (GMSTP) [2, 3]. The following results
are
known forGMSTP
[3]:$\bullet$ GMSTP is NP-haid, and the problem is still NP-hard
even on
trees.$\bullet$ If the number of clusters is fixed, then GMSTP
can
be solved in polynomial-time with respect to the number ofvertices.
The result of this paper
can
be translated into the following new results forGMSTP:
.
GMSTP is still NP-hardeven
ifthe size of each cluster is at most 2..
Ifthe tree-widthofthe base graph issmall, tllenGMSTPcan be solved in polynomial-time with respect to the number ofvertices.References
[1] Yuko Eto and Masakazu Suzuki. Mathematical formula recognition using virtual link network. In Proceedings
of
the 6th IntemationalConference
on Document Analysisand Recognition (ICDA$R$ 2001), pp. $430-437_{:}$ 2001.
[2] Young-Soo Myung, Chang-Ho Lee, and Dong-Wan Tcha. Onthe generalized minimum spanning tree problem. Networks, Vol. 26, No. 4, pp. 231-241, 1995.
[3] Petrica Claudiu Pop. The generalized minimum spanning tree problem. $PhD$ thesis,
Twente University Press, http://doc.utwente.nl/38643/, December 2002.
[4] Kazuhiko Takamizawa, Takao Nishizeki, and Nobuji Saito. Linear-time computability of combinatorial problems on series-parallel graphs. Joumal