(1)

Model-based Kernel Sum Rule with Applications to State Space Models

Yu Nishiyama^1, Motonobu Kanagawa^2, Arthur Gretton^3, Kenji Fukumizu^4

1. The University of Electro-Communications, 2. The Graduate University for Advanced Studies, 3. University College London, 4. The Institute of Statistical Mathematics

ABC in Montreal @ NIPS2014 Workshop, 12/12/2014 (Fri.)

(2)

Likelihood-Free (LF)

Approximate Bayesian Computation (ABC)

Kernel Bayes' Rule (KBR)

[This Talk]

▶ keywords: nonparametric, kernel methods,

(3)

This Talk

ABC vs KBR [Fukumizu et al., NIPS2011]

KBR and others

▶ kernel embedding [Smola et al., ALT2007]
▶ kernel sum rule (KSR) [Song et al., ICML2009]
▶ kernel Bayes' rule (KBR) [Fukumizu et al., NIPS2011]

My Contribution (Poster)

▶ parametric & nonparametric

(4)

ABC vs KBR

ABC

▶ generates an approximate posterior sample

X1, . . . , XT ∼ q(x|y) ∝ p(y|x)π(x)

under likelihood-free.

ABC rejection algorithm:

(i) Generate Xt from π(·).

(ii) Generate Yt from p(·|Xt).

(iii) If D(y, Yt) < τ, accept Xt; otherwise reject.

(iv) Go to (i).
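The rejection loop above can be sketched in a few lines of Python. The Gaussian prior, likelihood, and distance D below are illustrative stand-ins, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(y_obs, n_accept, tau, prior_sample, likelihood_sample, dist):
    """Rejection ABC: keep X_t whenever a pseudo-observation Y_t ~ p(.|X_t)
    falls within tolerance tau of the observed data y_obs."""
    accepted = []
    while len(accepted) < n_accept:
        x = prior_sample()            # (i)  draw X_t from the prior pi(.)
        y = likelihood_sample(x)      # (ii) draw Y_t from p(.|X_t)
        if dist(y_obs, y) < tau:      # (iii) accept if D(y, Y_t) < tau
            accepted.append(x)
    return np.array(accepted)

# Toy example: pi = N(0, 1), p(y|x) = N(x, 0.5^2), observed y = 1.0.
sample = abc_rejection(
    y_obs=1.0, n_accept=200, tau=0.1,
    prior_sample=lambda: rng.normal(0.0, 1.0),
    likelihood_sample=lambda x: rng.normal(x, 0.5),
    dist=lambda a, b: abs(a - b),
)
print(sample.mean())  # crude posterior-mean estimate
```

Shrinking tau reduces the bias but lowers the acceptance rate, which is exactly the trade-off criticized on the next slide.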

(5)

ABC vs KBR

Assumption (ABC)

▶ π(x) and p(y|x) are probabilistic models.
▶ Sampling from π(x) and p(y|x) is possible.

ABC weak points:

▶ If τ is finite, the posterior estimate is biased.
▶ If τ is small, Xt is rarely accepted.
▶ If y is high-dimensional, acceptance is also rare
(huge computational cost).

▶ Summary statistics for y are not obvious.

(6)

ABC vs KBR

Assumption (KBR)

▶ π(x) and p(y|x) are unknown.
▶ Instead, one is given samples

(Xi, Yi) ∼ p(y|x) ptrain(x)

X̃i ∼ π(x) (not necessarily)

KBR

[Fukumizu et al., NIPS2011]

▶ nonparametric Bayes' rule computation.
▶ Bayes' rule in a feature space (reproducing kernel Hilbert space).

(7)

ABC vs KBR

KBR good points:

▶ Complex-shaped π(x) and p(y|x) are OK.
▶ no tolerance value τ.
▶ consistency (sample size n → ∞ with regularization ϵn → 0).
▶ applicable to any domain X, Y (e.g., strings, graphs, images).

(8)

ABC vs KBR

Kernel ABC

[Fukumizu et al., NIPS2011]

▶ apply KBR to a sample (Xt, Yt), t = 1, . . . , T, drawn from known p(y|x)π(x).

▶ rejection ABC vs KBR

[Figure: CPU time (sec) vs. mean squared error, in 2 dim. and 6 dim., comparing KBR, COND, and ABC over a range of sample sizes.]

(9)

Probabilities in Feature Space

Kernel Embedding

[Smola et al., ALT2007]

[Diagram: a probability P in probability space is mapped to its kernel mean mP in feature space.]

▶ Map a probability P to an RKHS H.
▶ kernel mean

mP := EX∼P[k(·, X)] = ∫_X k(·, x) dP(x) ∈ H.

k(·, X): feature function.

▶ P ↦ mP is one-to-one (characteristic kernel).
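In practice mP is estimated by the empirical kernel mean over a sample. A small numpy sketch with a Gaussian kernel; the bandwidth and the N(0, 1) target are illustrative choices, for which E[k(0, X)] = 1/√2 can be checked in closed form:

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma**2))

def empirical_kernel_mean(sample, sigma=1.0):
    """Return m_hat_P(.) = (1/n) sum_i k(., X_i), estimating E_{X~P}[k(., X)]."""
    def m_hat(x):
        return np.mean([gauss_kernel(x, xi, sigma) for xi in sample])
    return m_hat

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=500)   # sample from P = N(0, 1)
m_hat = empirical_kernel_mean(X)
# m_hat(0) estimates E[k(0, X)], which equals 1/sqrt(2) for sigma = 1, X ~ N(0, 1)
print(m_hat(0.0))
```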

(10)

Probabilities in Feature Space

Why kernel mean?

▶ Expectation of RKHS function f

EX∼P[f(X)] = EX∼P[⟨f, k(·, X)⟩H] = ⟨f, mP⟩H.

▶ it suffices to estimate mP.
▶ no density estimation needed.

Kernel Bayesian Inference (KBI)

▶ Infer mP instead of P.
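As a concrete instance of the identity above, take f = k(·, c), an element of H; then ⟨f, m̂P⟩ is just an average of kernel evaluations, and for a Gaussian kernel with X ∼ N(0, 1) the exact expectation is available for comparison. The choice of c and the bandwidth are illustrative:

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    return np.exp(-((x - y) ** 2) / (2 * sigma**2))

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=2000)   # sample from P = N(0, 1)

# Take f = k(., c), an RKHS element. By the reproducing property,
# <f, m_hat_P> = (1/n) sum_i k(c, X_i), which estimates E_{X~P}[f(X)]
# with no density-estimation step.
c = 0.5
estimate = np.mean(gauss_kernel(c, X))

# Closed form for comparison: E[exp(-(c - X)^2 / 2)] = exp(-c^2/4) / sqrt(2)
exact = np.exp(-c**2 / 4) / np.sqrt(2)
print(estimate, exact)
```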

(11)

Inference in Feature Space

Kernel Sum Rule (KSR)

[Song et al., ICML2009]

[Diagram: the sum rule takes a prior Π on X to QY on Y; the kernel sum rule does the same in feature space, taking mΠ to mQY.]

▶ sum rule: q(y) = ∫_X p(y|x)π(x) dx.
▶ estimators:

m̂Π = ∑_{i=1}^l γi kX(·, X̃i),  m̂QY = ∑_{i=1}^n wi kY(·, Yi).
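The nonparametric KSR weight update has the same regularized Gram-matrix form that reappears in the filtering algorithm later, w = (GX + nϵIn)⁻¹ GXX̃ γ. A hedged numpy sketch; the regularization constant, bandwidth, and toy data are illustrative choices, not from the slides:

```python
import numpy as np

def gram(A, B, sigma=1.0):
    """Gaussian Gram matrix G[i, j] = k(A[i], B[j]) for 1-d point sets."""
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * sigma**2))

def kernel_sum_rule(gamma, X_tilde, X, eps=0.1, sigma=1.0):
    """Nonparametric KSR: map prior weights gamma on points X_tilde to
    weights w on the training outputs Y_i, so that
    m_hat_QY = sum_i w_i k_Y(., Y_i), via w = (G_X + n eps I)^{-1} G(X, X_tilde) gamma."""
    n = len(X)
    G_X = gram(X, X, sigma)
    K = gram(X, X_tilde, sigma)
    return np.linalg.solve(G_X + n * eps * np.eye(n), K @ gamma)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=100)             # training inputs X_i
Y = np.sin(X) + 0.1 * rng.normal(size=100)   # training outputs Y_i ~ p(y|x)
X_tilde = rng.normal(0.0, 0.5, size=30)      # sample representing the prior pi
gamma = np.full(30, 1.0 / 30)

w = kernel_sum_rule(gamma, X_tilde, X)
print(w @ Y)  # rough estimate of E[Y] under q(y)
```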

(12)

Inference in Feature Space

Kernel Bayes' Rule (KBR)

[Diagram: Bayes' rule on probabilities (prior Π and observation y to posterior QX|y) lifts to the kernel Bayes' rule in feature space (mΠ and y to mQX|y).]

▶ Bayes' rule: q(x|y) ∝ p(y|x)π(x).
▶ Posterior estimator: m̂QX|y = ∑_{j=1}^n wj kX(·, Xj).
▶ weights by matrix multiplication: w = M(KBR)(m̂Π).
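A sketch of one way the posterior weights can be computed, following the KBR construction of [Fukumizu et al., NIPS2011] as I understand it (prior weights via the kernel sum rule, then a Tikhonov-regularized "squared" inverse); the regularization constants, kernel, and toy data are all illustrative assumptions:

```python
import numpy as np

def gram(A, B, sigma=1.0):
    """Gaussian Gram matrix G[i, j] = k(A[i], B[j]) for 1-d point sets."""
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * sigma**2))

def kbr_posterior_weights(gamma, X_tilde, X, Y, y, eps=0.01, delta=1e-4, sigma=1.0):
    """Posterior weights w with m_hat_{QX|y} = sum_j w_j k_X(., X_j)."""
    n = len(X)
    G_X, G_Y = gram(X, X, sigma), gram(Y, Y, sigma)
    # Kernel sum rule: push the prior embedding onto the training points.
    mu = np.linalg.solve(G_X + n * eps * np.eye(n), gram(X, X_tilde, sigma) @ gamma)
    L = np.diag(mu)
    LG = L @ G_Y
    kY = gram(Y, np.array([y]), sigma).ravel()
    # w = (Lambda G_Y) ((Lambda G_Y)^2 + n delta I)^{-1} Lambda k_Y(y)
    return LG @ np.linalg.solve(LG @ LG + n * delta * np.eye(n), L @ kY)

rng = np.random.default_rng(0)
X = rng.normal(size=200)               # training inputs
Y = X + 0.3 * rng.normal(size=200)     # training outputs, p(y|x) = N(x, 0.3^2)
X_tilde = rng.normal(size=50)          # sample representing the prior pi(x)
gamma = np.full(50, 1.0 / 50)
w = kbr_posterior_weights(gamma, X_tilde, X, Y, y=0.5)
print(w @ X)  # crude posterior-mean estimate sum_j w_j X_j
```

Everything is matrix algebra on Gram matrices; no density p(y|x) or π(x) is ever evaluated, which is the likelihood-free point of the slide.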

(13)

Intuitive

[Diagram: given training pairs (X1, Y1), (X2, Y2), predict the output y for a new input x.]

KSR

▶ If x is close to X1, y is close to Y1.
▶ If x is far from X2, y is far from Y2.
▶ closeness? ⇒ similarity via the kernel k(x, X1).
▶ Introduce RKHS HX on X.
▶ Vector-valued kernel ridge regression.
▶ Introduce kernel k(y, Y1) (RKHS HY) on Y.

(14)

Intuitive

[Diagram: given training pairs (X1, Y1), (X2, Y2), infer the input x from an observed output y.]

KBR

▶ Predict x for y.
▶ Is one-way KSR in the reverse direction enough?

⇒ No. Take the prior π(x) into account.

(15)

KSR

&

KBR can be a building block

KBR-based LF Algorithms

▶ State Space Models

[Fukumizu et al., NIPS2011].

▶ Partially Observable Markov Decision Process

[Nishiyama et al., UAI2012].

▶ Predictive State Representation

[Boots et al., UAI2013].

State Space Models

[Fukumizu et al., NIPS2011]

▶ filtering algorithm
▶ full-nonparametric
▶ transition process p(x′|x) (nonparametric).
▶ observation process p(z|x) (nonparametric).

(16)

Remain & Poster

Objective

▶ flexible inference (parametric & nonparametric).

State Space Models (proposal)

▶ kernel LF filtering algorithm

▶ transition process (parametric)

additive Gaussian noise models

xt+1 = f(xt) + ςt,  ςt ∼ N(0, Σ)

1. f may be nonlinear.

2. Other noise models are possible (α-stable, normal inverse Gaussian, variance Gamma distributions).
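The additive-noise transition model above is straightforward to simulate; the nonlinear f, the noise covariance Σ, and the horizon below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_transitions(x0, f, Sigma_chol, T):
    """Roll the state forward: x_{t+1} = f(x_t) + s_t, s_t ~ N(0, Sigma),
    drawing the Gaussian noise via a Cholesky factor of Sigma."""
    d = len(x0)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(T):
        noise = Sigma_chol @ rng.normal(size=d)
        xs.append(f(xs[-1]) + noise)
    return np.stack(xs)

# Illustrative nonlinear dynamics f(x) = 0.9 sin(x), Sigma = 0.1^2 I.
traj = simulate_transitions(x0=np.array([1.0]), f=lambda x: 0.9 * np.sin(x),
                            Sigma_chol=0.1 * np.eye(1), T=100)
print(traj.shape)  # (101, 1)
```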

(17)

Parametric

&

Nonparametric

Example (Vision-based Robot Localization)

▶ State x ⇒ robot's position.
▶ Observation z ⇒ image (structured data).
▶ Transition process p(x′|x)
⇒ physical system (parametric).
▶ Observation process p(z|x)
⇒ complex (nonparametric).

▶ KBR is applicable to any domain.

(18)

Parametric

&

Nonparametric

Sum Rule in Feature Space

▶ nonparametric KSR (Non-KSR)

[Song et al., ICML2009]

▶ model-based KSR (Mb-KSR)

[This NIPS2014 Workshop]

Model-based KSR (Mb-KSR)

(19)

Parametric

&

Nonparametric

Model-based KSR (Mb-KSR)

▶ input: m̂Π := ∑_{i=1}^l γi kX(·, X̃i).
▶ output: m̂QY = ∑_{i=1}^l γi mY|X̃i.

mY|X̃i: closed-form kernel embedding.

Closed-form Kernel Embedding

▶ Gaussian model p(y|x) + Gaussian kernel k(·,·).
▶ Other pairs are possible.
▶ conjugate pair.
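For the Gaussian pair, the embedding mY|x is available in closed form because convolving the Gaussian kernel with Gaussian noise just widens the bandwidth: with p(y|x) = N(y; f(x), σ²Id) and kλ(y, y′) = exp(−‖y − y′‖²/(2λ²)), one gets E[kλ(y, Y)] = (λ²/(λ²+σ²))^{d/2} exp(−‖y − f(x)‖²/(2(λ²+σ²))). A sketch checking this against Monte Carlo; f(x) and the parameter values are illustrative:

```python
import numpy as np

def closed_form_embedding(y, fx, lam, sigma):
    """m_{Y|x}(y) = E_{Y ~ N(f(x), sigma^2 I)}[k_lam(y, Y)] in closed form:
    a Gaussian convolution that widens the kernel bandwidth lam^2 -> lam^2 + sigma^2."""
    d = len(y)
    s2 = lam**2 + sigma**2
    return (lam**2 / s2) ** (d / 2) * np.exp(-np.sum((y - fx) ** 2) / (2 * s2))

rng = np.random.default_rng(0)
lam, sigma, d = 1.0, 0.5, 2
fx = np.array([0.3, -0.2])   # f(x) for some state x (illustrative)
y = np.array([0.0, 0.1])

cf = closed_form_embedding(y, fx, lam, sigma)

# Monte Carlo check of the same expectation.
Y = fx + sigma * rng.normal(size=(200000, d))
mc = np.mean(np.exp(-np.sum((Y - y) ** 2, axis=1) / (2 * lam**2)))
print(cf, mc)
```

This is what lets Mb-KSR propagate a kernel mean through a parametric transition exactly, with no samples from p(y|x).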

(20)

Parametric

&

Nonparametric

Bayesian Inference in Feature Space

▶ Apply Non-KSR & Mb-KSR.

[Diagram: three chains X–Y–Z whose two edges are each nonparametric or parametric: (Nonparametric, Nonparametric), (Nonparametric, Parametric), (Parametric, Nonparametric).]

▶ Left: Non-KSR + Mb-KSR
▶ Right:

(21)

State Space Models in Feature Space

Filtering in State Space Models.

▶ Transition Process (parametric)

▶ Observation Process (nonparametric)

[Diagram: hidden Markov chain Xt−1 → Xt → Xt+1 (unknown states) emitting observations Zt−1, Zt, Zt+1 (known).]

▶ Prediction Step

▶ Apply Mb-KSR.

▶ Filtering Step

▶ Apply KBR.

(22)

State Space Models in Feature Space

Kernel LF filtering Algorithm.

▶ computations are matrix multiplications.

Algorithm 1: KBR-Filter (Transition: probabilistic model, Observation: nonparametric)

Initial belief: prior kernel mean mX1.
Observation: z1 ∈ Z.
Initial filtering: αX1|z1 ← KBR algorithm with prior mX1.
for t = 2 : T do
  Weight computation 1: βZt|z1:t−1 = (GX + ϵnIn)−1 GX′|X αXt−1|z1:t−1.
  Observation: zt ∈ Z.
  Weight computation 2: αXt|z1:t = RX|Z(βZt|z1:t−1) kZ(zt).
end for
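The loop in Algorithm 1 transcribes directly into matrix code. The skeleton below assumes the Gram matrices and the KBR operator are supplied from training data; G_trans (standing in for GX′|X), kZ, and kbr are illustrative stand-ins, and the dummy run only exercises the loop's shape:

```python
import numpy as np

def kbr_filter(z_seq, G_X, G_trans, kZ, kbr, alpha0):
    """Skeleton of the KBR filtering loop (Algorithm 1).
    alpha_t: posterior weights on the training states X_i;
    beta_t: predictive weights on the training observations Z_i.
    z_seq[0] is assumed already absorbed into the initial filtering alpha0."""
    n = G_X.shape[0]
    eps = 0.01
    reg = np.linalg.inv(G_X + n * eps * np.eye(n))  # (G_X + eps_n I_n)^{-1}, reused each step
    alpha = alpha0
    alphas = [alpha0]
    for z in z_seq[1:]:
        beta = reg @ (G_trans @ alpha)   # prediction: beta = (G_X + eps I)^{-1} G_{X'|X} alpha
        alpha = kbr(beta, kZ(z))         # filtering:  alpha = R_{X|Z}(beta) k_Z(z)
        alphas.append(alpha)
    return alphas

# Dummy run with identity Gram matrices and a trivial kbr, just to show the flow;
# real G_X, G_trans come from training kernels, kbr from the KBR operator.
n = 3
alphas = kbr_filter(
    z_seq=[0.0, 1.0, 2.0],
    G_X=np.eye(n),
    G_trans=np.eye(n),
    kZ=lambda z: np.ones(n),
    kbr=lambda beta, kz: beta * kz,
    alpha0=np.full(n, 1.0 / n),
)
print(len(alphas))
```

As the slide says, every per-step computation is a matrix multiplication on weight vectors; no sampling is involved.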

(23)

Experiments

Synthetic Experiment

▶ proposed vs. full-nonparametric (fKBF) [Fukumizu et al., NIPS2011].

[Figure: mean squared error vs. sample size (100–400) in three synthetic settings, comparing fKBF and the proposed method.]

(24)

Experiments

Vision-based Robot Localization Problem

▶ COsy Localization Database (COLD) [Pronobis et al., 2009].

(a) Corridor

(b) Two-persons office

(25)

Experiments

Vision-based Robot Localization Problem

▶ RMSE vs sample size

[Figure: robot trajectory in the x–y plane, and RMSE vs. training sample size (200–800) for NAI, NN, fKBF, and Analy.]

NAI: naive (maximum similarity) NN: k nearest neighbors

fKBF: full-nonparametric

Analy: proposed

(26)

Conclusion

KBR is a likelihood-free approach.

Probabilities are inferred in feature space.

Various algorithms in feature space.

Parametric & nonparametric inference in feature space.
