Probabilistic Inference by Minimizing the TRW Free Energy using CCCP IBIS2010 poster

(1)

Probabilistic Inference by minimizing the TRW

Free Energy using CCCP

Yu Nishiyama

1

, Xingyao Ye

2

, Alan L. Yuille

2

1

_{Laboratory for Integrated Theoretical Neuroscience, RIKEN, Brain Science Institute, Wako, JAPAN}

2

_{Department of Statistics, UCLA, Los Angeles, USA}

Introduction

◮ Probabilistic inference – computing marginals and/or the most probable

assignment (MAP estimate) given a large-scale graphical model.

◮ Many applications – e.g., computer vision, protein folding, genetic analysis,

neural science.

◮ No exact inference algorithms as belief propagation in tree graphs when

graphical models have cycles.

◮ Goal – developing approximate algorithms for graphs with cycles, favoring

algorithms which are as accurate as possible and are guaranteed to converge.

◮ Efficient TRW-BP algorithms [Wainwright et. al. 2005]

⊲ guaranteed to compute the global minimum of TRW free energy, though

not convergent.

◮ TRW-GP [Globerson and Jaakkola 2007]

⊲ _{guaranteed to converge by optimizing the convex dual of a modified TRW}

free energy.

◮ sum-TRW-S [Meltzer et al 2009]

⊲ _{TRW-BP is convergent if messages are updated with right update order.}

Purpose

◮ Developing a set of probabilistic inference algorithms by minimizing the

TRW free energy using CCCP.

Markov Random Field & Free Energy

◮ Markov Random Fields (MRFs) defined over graph _G _{= (}_V_, _E₎

p(x; θ) = exp  



X

i∈V

θ_i(x_i) + X ij∈E

θ_ij(x_i, x_j) − Φ(θ)  

 .

◮ The true singleton and pairwise marginals _p ₌ _{{_p_i₍_x_i₎_}_, _{_p_ij₍_x_i_, _x_j₎_}} are

searched within the local polytope.

L(G) =   

 

b ≥ 0

P xi

b_i(x_i) = 1, ∀i ∈ V

P xj

b_ij(x_i, x_j) = b_i(x_i), P

xi

b_ij(x_i, x_j) = b_j(x_j), ∀x_i, ∀x_j, ∀ij ∈ E.

  

 

.

◮ The beliefs _b which minimize the free energy yield approximations to the

marginals b∗ ≃ p. The free energies have the following form;

F (b) = −b · θ − S(b),

where −b · θ = P

i∈V

hθ_i(x_i)i_b_i + P ij∈E

hθ_ij(x_i, x_j)i_b

ij and S(b) is the entropy term.

Tree-ReWeighted (TRW) Entropy

◮ The TRW entropy is given by the convex combination of tree entropies

S_T (b):

S_TRW (b) = X

T∈T

ρ(T)S_T(b) = X i∈V

S_i(b_i) − X

ij∈E

ρ_ijI_ij(b_ij).

⊲ T is a spanning tree on graph G. T is the set of all spanning trees. ρ(.) is

a probability distribution over the spanning trees. ρ_ij is the edge

appearance probability of ij ∈ E.

◮ Edge appearance vector _ρ_e ₌ _{_ρ_ij_} is chosen within the spanning tree

polytope:

I

T (G) =

(

ρ_e ∈ IR|E||ρ_e = X T∈T

ρ(T )v(T) )

.

⊲ v(T) ∈ {0, 1}|E| is an indicator vector such that [v(T)]_ij = 1, ij ∈ T, and

otherwise 0.

◮ The TRW entropy can be rewritten over the local polytope _L₍_G₎ as

¯

S_TRW(b) = X

i∈V

c_iS_i(b_i) + X

ij∈E

c_ijS_ij(b_ij). (1)

⊲ c_ij = ρ_ij and c_i = 1 − P

j∈Ni ρij. Ni is the set of neighboring nodes of node i.

◮ The TRW entropy _S_TRW₍_b₎ is concave over the local polytope _L₍_G₎.

◮ The Bethe-approximation entropy corresponds to _ρ_e ₌ ₁, though ₁ _∈_/_T_I ₍_G₎.

Hence the Bethe-approximation entropy is not guaranteed to be concave if the graph is loopy.

Concave-Convex Procedure(CCCP)

◮ Let _F_vex be the set of all convex functions and Dec₍_F ₎ be the set of all pairs

of two convex functions such that the difference of the two convex functions

is equal to the objective function F (b):

Dec₍_F _{) =} ₍_f_vex_, _g_vex₎ _{∈ F}2

vex |fvex(b) − gvex(b) = F (b) .

◮ Theorem

Given a series of decompositions of an objective function F (b) into convex

and concave functions, {(f_vext , g_vext ) ∈ Dec₍_F ₎_}, the iterative algorithm

∂f_vext (bt+1) = ∂g_vext (bt) (2 )

is guaranteed to monotonically decrease the objective function F (b).

◮ Let CCCP₍_F _; _f 0

vex, gvex0 ) denote the specific CCCP algorithm under a

decomposition (f_vex0 , g_vex0 ). The family of CCCP algorithms is given by the

collection:

CCCP₍_F _{) =} _{CCCP₍_F _; _f_vex_, _g_vex₎_|₍_f_vex_, _g_vex₎ _∈ Dec₍_F ₎_} _. (3)

◮ _A CCCP(F ; f 0

vex, gvex0 ) can be

expanded into a family of

CCCP algorithms CCCP₍_F ₎.

geometrical meaning

time t time t+1 time t+2

TRW-CCCP

◮ Let _f_vex be a convex free energy which has the form of

f_vex(b; u) = −b · θ − u · S, (4)

where u · S = P

i∈V

u_iS_i(b_i) + P

ij∈E

u_ijS_ij(b_ij), and u = ({u_i}, {u_ij}) is chosen in a set U

to ensure that f_vex(b; u) is convex.

◮ We decompose the TRW free energy, using two convex free energies

f_vex(b; ˜u) and g_vex(b; u), as

F_TRW (b) = −b · θ + f_vex(b; ˜u) − g_vex(b; u), (5)

where u˜ = c + u. The set in which CCCP algorithms are guaranteed to

converge is

UCCCP =

n

u ∈ IR|V|+|E||u ∈ U, u˜ ∈ Uo. (6)

◮ Resulting TRW-CCCP algorithms are

Algorithm TRW-CCCP; A family of CCCP to minimize the TRW free energy.

1. Initialization: bα(1)(xα) = bα(0)(xα) ∝ exp h

θ_α(x_α)

c_α i

α ∈ V ∪ E

2. Choose a free vector u in the set UCCCP. 3. Inner loop:

b_i(x_i) ∝ h bij(xi)

bi(xi)

i

˜ uij ˜ ui+˜_uij

b_i(x_i)

b_ij(x_i, x_j) ∝ h bij(xi)

bi(xi)

i−_˜ ui˜

ui+˜_uij

b_ij(x_i, x_j) ij ∈ E

4. Outer loop:

bα(t+2)(xα) ∝

h

b_α(t+1)(x_α)

b_α(t)(x_α)

iu_u_˜α_α

bα(t+1)(xα) α ∈ V ∪ E

5. Output: A set of approximate marginals b∗(≃ p) when outer loop converges. Otherwise, set beliefs to b(t) = b(t+1), b(t+1) = b(t+2) and go to 2.

Numerical Results: Ising Spin Models

◮ In our experiments,

TRW-CCCP under any

setting of parameter u˜

computed the same

pseudo-marginals b∗ as that

from TRW-BP. Larger

settings of u made

TRW-CCCP local and

smaller settings of u made

TRW-CCCP non-local.

◮ Two dimensional grid (10x10 size)

5 10 15 20 25

‑1.26 ‑1.24 ‑1.22 ‑1.2 ‑1.18 ‑1.16 ‑1.14 ‑1.12 ‑1.1

Mixed(10x10grid)

step(outer loop)

TRW Free Energy

tilde‑u=0.1

tilde‑u=0.2

tilde‑u=0.5

tilde‑u=1

tilde‑u=2

tilde‑u=5

tilde‑u=10

0 200 400 600 800 1000

0

0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01

Mixed(5x5grid), Inner Loop=1

step

difference of beliefs

TRW‑BP

tilde‑u=1

tilde‑u=2

20 40 60 80 100 120 140 160

20

40 60 80

100 120

140

Mixed(10x10grid)

step(outer loop)

Number of inner loop

tilde‑u=0.1

tilde‑u=0.2

tilde‑u=0.5

tilde‑u=1

tilde‑u=2

tilde‑u=5

tilde‑u=10

100 200 300 400 500 600 700 800 900 1000

0

0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01

Mixed(5x5grid), Inner Loop=2

step

difference of beliefs

TRW‑BP

tilde‑u=1

tilde‑u=2

Conclusions

◮ We developed a new set of convergent CCCP algorithms which are

guaranteed to find the global minimum of the TRW free energy. The

algorithms include many free parameters (u ∈ IR|V|+|E|) which could be altered

to change the convergence rate. TRW-CCCP seems more stable for difficult problems such that energy functions have large random weights and