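Classification, Transformation, and Knowledge Discovery for Large-Scale Data Using Statistical Models

Tomoyuki Higuchi (樋口知之)

The Institute of Statistical Mathematics, Research Organization of Information and Systems & JST CREST

10th Workshop on Information-Based Induction Sciences (IBIS 2007), November 5 (Mon), 6 (Tue), and 7 (Wed), 2007, Tokyo Institute of Technology, Suzukakedai Campus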

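Outline

1. Handling outliers and missing values

2. Online processing and time-series models

3. Non-Gaussian information processing

- Constructing distributions numerically

- Model Averaging

- Conditional Dynamic Linear Model

4. Sequential Monte Carlo (SMC)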

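Is massive data a giant trash bin?

Large-scale data, left as it is, is in practice nothing but a heap of scrap.

By sorting and organizing it: kitchen waste, plastic, bottles, aluminum cans, newspapers and paper.

So is the analysis of massive data something like panning for gold?

This is not a story about alchemy.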

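How the words are used

Data, Information, Knowledge

○○ knowledge (Wisdom): not dealt with here

Information science (AI, information processing, computational statistics)

Statistical science

○○ extraction, ○○ discovery, ○○ processing

Sensing

Parts that are not clearly recognized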

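Research areas related to processing ultra-large-scale data (information)

Statistical science
•Generative model building ※
•Tradition and accumulated experience

Machine learning
•Discriminative model building ※ (discriminant function building)
•Optimization

Data mining
•Pattern enumeration
•Fast search

IBIS

Generative model: 『Express the data-generating process with conditional probabilities, write down the joint distribution of all variables, and then use Bayes' formula as needed.』 Modeling the whole → the work of prediction and control can be carried out with a clear view.

Discriminative model: 『Extract and model only the conditional probabilities needed for the given purpose.』 Modeling of similarity, $K(x_i, x_j)$.

Creation of a new academic field

※ Phrases in 『』 use terminology from Bishop, "Pattern Recognition and Machine Learning" (2006), and from Iba's commentary (IEICE Technical Report NC2006-55 (2006-10), 61-66).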

NSF: Office of Cyberinfrastructure

■ Cyber-Enabled Discovery and Innovation

Cyber-Enabled Discovery and Innovation (CDI) is NSF’s bold five-year initiative to create revolutionary science and engineering research outcomes made possible by innovations and advances in computational thinking.

Computational thinking is defined comprehensively to encompass computational concepts, methods, models, algorithms, and tools.

* From Data to Knowledge: enhancing human cognition and generating new knowledge from a wealth of heterogeneous digital data;

* Understanding Complexity in Natural, Built, and Social Systems: deriving fundamental insights on systems comprising multiple interacting elements; and

* Building Virtual Organizations: enhancing discovery and innovation by bringing people and resources together across institutional, geographical and cultural boundaries.

※ This program is expected to start at $26M (about 3 billion yen) for this fiscal year and increase significantly in future years.

■ Sustainable Digital Data Preservation and Access Network Partners

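Preliminary noise processing is actually essential

Putting garbage data through a sieve

Sifting flour removes dirt and foreign matter, loosens the flour into a finer texture, and works air into it. Likewise for data: remove outliers, fill in missing values, restore the ordering, and so on.

•If the mesh is too fine, only water passes: data that might lead to new findings is thrown away as well.
•If the mesh is too coarse, even pebbles pass: large amounts of data containing outliers are handed to the next step.
•Straining with just the right mesh size gives a loose, free-flowing result that goes on to the next analysis process.

"Now, how should I get started ...?"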

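Getting the degree of information reduction (irreversible transformation) right

•Boil too long and both nutrients and flavor leach out: overdo the processing and even necessary information is thrown away.
•Boil too little and bitterness remains: with too little processing, a mixture of gems and stones floods out.
•There is an optimal degree of cooking.

However good the ingredients are, ...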

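Chain Structure Graphical Model

State (cannot be observed): $x_0 \to x_1 \to x_2 \to \cdots \to x_t \to \cdots$ (system model)

Observation (can be observed): $y_1, y_2, \ldots, y_t, \ldots$, each generated from the state through $p(y_t \mid x_t)$ (observation model)

$$x_t = F x_{t-1} + G v_t$$
$$y_t = H x_t + e_t$$

The state $x_t$ and the observation $y_t$ are vector quantities. Notation: $\{x_t\}_{t=0}^{N}$, $\{y_t\}_{t=1}^{N}$, and $y_{1:t} \equiv [y_1, y_2, \ldots, y_t]$.

Taking daily stock price data as an example:
•$p(x_t \mid y_{1:t-1})$: today's state based on the data up to yesterday
•$p(x_t \mid y_{1:t})$: today's state based on the data up to today
•$p(x_t \mid y_{1:T})$: today's state looked back on years later, once all the data have been obtained

The state $x_t$ links the past & present to the present & future.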

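Interpolation and extrapolation

State (cannot be observed): $x_0 \to x_1 \to \cdots \to x_{t-1} \to x_t \to \cdots \to x_T \to x_{T+1}$ (system model)

Observation (can be observed): $y_1, \ldots, y_{t-1}, y_t, \ldots, y_T$ (observation model); some $y_t$ are missing values or outliers

Prepare many latent variables:

$$x_t = [x_{1,t}, x_{2,t}, \ldots, x_{M,t}]'$$

Number of data points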

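The art of modeling, rather than the development of clever algorithms, is the key!

Example: seasonal adjustment (Kitagawa and Higuchi, 1998), monthly data
•Year-over-year ratio for the same month
•Seasonally adjusted data (US Census)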

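Seasonal adjustment model (for quarterly data)

$$y_t = \mu_t + s_t + e_t$$

with trend component $\mu_t$, seasonal component $s_t$, and observation noise $e_t$:

$$\mu_t = 2\mu_{t-1} - \mu_{t-2} + v_{\mu,t}, \qquad v_{\mu,t} \sim N(0, \tau_\mu^2)$$
$$s_t = -(s_{t-1} + s_{t-2} + s_{t-3}) + v_{s,t}, \qquad v_{s,t} \sim N(0, \tau_s^2)$$
$$e_t \sim N(0, \sigma^2)$$

State space representation of the seasonal adjustment model

$$x_t = F x_{t-1} + G v_t, \qquad y_t = H x_t + e_t$$

$$x_t = [\mu_t, \mu_{t-1}, s_t, s_{t-1}, s_{t-2}]', \qquad v_t = [v_{\mu,t}, v_{s,t}]'$$

$$F = \begin{bmatrix} 2 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad G = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad H = [\,1 \;\; 0 \;\; 1 \;\; 0 \;\; 0\,]$$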
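The state space form of the quarterly seasonal adjustment model can be checked numerically. The sketch below is a minimal illustration, not the author's code: it assumes the state ordering $[\mu_t, \mu_{t-1}, s_t, s_{t-1}, s_{t-2}]'$, and the helper names (`seasonal_adjustment_matrices`, `step`, `observe`) are hypothetical.

```python
def seasonal_adjustment_matrices():
    """Return (F, G, H) for a 2nd-order trend plus a quarterly seasonal component.

    Assumed state ordering: [mu_t, mu_{t-1}, s_t, s_{t-1}, s_{t-2}].
    """
    F = [
        [2, -1, 0, 0, 0],    # mu_t = 2 mu_{t-1} - mu_{t-2} + v_mu
        [1, 0, 0, 0, 0],     # carry mu_{t-1} forward
        [0, 0, -1, -1, -1],  # s_t = -(s_{t-1} + s_{t-2} + s_{t-3}) + v_s
        [0, 0, 1, 0, 0],     # carry s_{t-1} forward
        [0, 0, 0, 1, 0],     # carry s_{t-2} forward
    ]
    G = [[1, 0], [0, 0], [0, 1], [0, 0], [0, 0]]
    H = [1, 0, 1, 0, 0]      # y_t = mu_t + s_t + e_t
    return F, G, H


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def step(x_prev, v):
    """One system-model step: x_t = F x_{t-1} + G v_t."""
    F, G, _ = seasonal_adjustment_matrices()
    return [a + b for a, b in zip(matvec(F, x_prev), matvec(G, v))]


def observe(x, e=0.0):
    """Observation model: y_t = H x_t + e_t."""
    _, _, H = seasonal_adjustment_matrices()
    return sum(h * xi for h, xi in zip(H, x)) + e
```

With zero system noise, `step([10, 9, 1, -2, 0.5], [0, 0])` extrapolates the trend linearly and makes the four most recent seasonal terms sum to zero.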

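Empirical Bayes: determining the hyperparameters

$$\mathrm{AIC} = -2 \log p(y_{1:T} \mid \alpha^2, \sigma^2) + 2\,(\#\,\text{hyperparameters})$$

$$p(y_{1:T}) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1})$$

Each factor $p(y_t \mid y_{1:t-1})$ is obtained at the filter step for that time point.

[Figure: smoothed trend $\mu_{t|T}$ for three values of $\alpha^2$, with panels labelled "Too smooth", "AIC best", and "Too rough" (Kitagawa, 1994). Values shown: $\alpha^2 = 0.0321$, AIC = 2562; $\alpha^2 = 3.21 \times 10^{-4}$, AIC = 2506; $\alpha^2 = 3.21 \times 10^{-6}$, AIC = 2556.]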
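The decomposition $p(y_{1:T}) = \prod_t p(y_t \mid y_{1:t-1})$ means the log-likelihood, and hence the AIC, falls out of a single Kalman filter pass. The sketch below assumes a scalar first-order trend model ($\mu_t = \mu_{t-1} + v_t$, $y_t = \mu_t + e_t$), which is simpler than the model on the slide; the function names and the diffuse initial variance `p0` are illustrative assumptions.

```python
import math


def local_level_loglik(y, tau2, sigma2, m0=0.0, p0=1e6):
    """Log-likelihood of y under mu_t = mu_{t-1} + v_t, y_t = mu_t + e_t.

    tau2 is the system noise variance, sigma2 the observation noise variance.
    """
    m, p = m0, p0
    loglik = 0.0
    for obs in y:
        # One-step-ahead prediction of the state (random walk: mean unchanged).
        p_pred = p + tau2
        # Predictive distribution p(y_t | y_{1:t-1}) = N(m, s).
        s = p_pred + sigma2
        err = obs - m
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + err * err / s)
        # Filtering update.
        gain = p_pred / s
        m = m + gain * err
        p = (1.0 - gain) * p_pred
    return loglik


def aic(y, tau2, sigma2, n_hyper=2):
    """AIC = -2 log-likelihood + 2 (number of hyperparameters)."""
    return -2.0 * local_level_loglik(y, tau2, sigma2) + 2 * n_hyper
```

Evaluating `aic` on a grid of `tau2` values and keeping the minimum mirrors the "AIC best" selection; a very small `tau2` gives an overly smooth trend and a very large one an overly rough trend.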

Fixed-Lag Smoother

Filter distribution: $p(x_{t'} \mid y_{1:t'})$

Fixed-interval smoothing distribution: $p(x_{t'} \mid y_{1:T}) \Rightarrow p(x_{t'} \mid y_{1:t'+4})$

Augmented state for particle $i$ (lag 4):

$$\Xi^{(i)}_{t'+4} = \left[ x^{(i)}_{t'+4|t'+4},\ x^{(i)}_{t'+3|t'+4},\ x^{(i)}_{t'+2|t'+4},\ x^{(i)}_{t'+1|t'+4},\ x^{(i)}_{t'|t'+4} \right]$$

$$p(x_t \mid y_{1:T}) \approx p(x_t \mid y_{1:t+L})$$

Take the lag $L$ long ($L = 20$).

[Figure: time axis from December 2001 through March, June, and September to December 2002, with markers at $t = 1$, $t = t'$, and $t = T$.]

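The mechanism of non-Gaussian information processing: non-Gaussian smoothing

Trend model:

$$\mu_t = \mu_{t-1} + v_t$$
$$y_t = \mu_t + w_t$$

Noise distributions:
•Normal distribution: $w_t \sim N(0, \sigma^2)$
•Cauchy distribution: $v_t \sim C(0, \tau^2)$

[Figure: Data, with the Gaussian smoother and the non-Gaussian smoother.]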

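Automatic identification of jumps

(Kitagawa and Gersch, 1996)

System model: $v_t = \mu_t - \mu_{t-1}$

Pearson family:

$$p(v \mid b, \tau) = \frac{\Gamma(b)}{\Gamma(b - 1/2)\,\Gamma(1/2)} \cdot \frac{\tau^{2b-1}}{(v^2 + \tau^2)^{b}}, \qquad 0.5 < b \le +\infty$$

•$b = 1$: Cauchy distribution (Lorentz distribution)
•$b = +\infty$: Gaussian distribution

[Figure: trend $\mu_t$, increments $v_t$, and the density $p(v \mid \cdot)$.]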
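The Pearson family density for the trend increment $v_t$ is easy to evaluate in log space. The sketch below assumes the Pearson type VII parameterization $p(v \mid b, \tau) = \frac{\Gamma(b)}{\Gamma(b - 1/2)\Gamma(1/2)} \frac{\tau^{2b-1}}{(v^2 + \tau^2)^b}$ with $b > 1/2$; the function names are illustrative.

```python
import math


def pearson_density(v, b, tau):
    """Pearson type VII density with shape b > 0.5 and dispersion tau > 0."""
    if b <= 0.5 or tau <= 0.0:
        raise ValueError("requires b > 0.5 and tau > 0")
    # Work with log-gamma to avoid overflow for large b.
    log_const = math.lgamma(b) - math.lgamma(b - 0.5) - math.lgamma(0.5)
    log_kernel = (2.0 * b - 1.0) * math.log(tau) - b * math.log(v * v + tau * tau)
    return math.exp(log_const + log_kernel)


def cauchy_density(v, tau):
    """Cauchy (Lorentz) density with scale tau, the b = 1 member of the family."""
    return tau / (math.pi * (v * v + tau * tau))
```

Setting `b = 1` reproduces the Cauchy density, whose heavy tails are what allow occasional large jumps in the trend; larger `b` gives progressively lighter tails.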

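Automatic identification of outliers

Observation model: $e_t = y_t - \mu_t$

[Figure: series data $y_t$, trend $\mu_t$, residual $e_t$, and the density $p(e_t \mid \cdot)$; a second panel shows the time-series data after outlier processing.]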

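Modeling the quirks of outliers

■ Model the quirks of the measuring instrument

Observation model: $e_t = y_t - \mu_t$

Normal Mixture:

$$p(e_t \mid \cdot) = \alpha\, N(0, \sigma_s^2) + (1 - \alpha)\, N(\mu_{out}, \sigma_{out}^2)$$

$1 - \alpha$: proportion of outliers

[Figure: series data and the time-series data after outlier processing.]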
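Under a two-component normal mixture for the observation noise, Bayes' formula gives the probability that a given residual came from the outlier component. The sketch below is a hedged illustration: the parameter names (`alpha`, `sigma2`, `mu_out`, `sigma2_out`) and the function names are assumptions chosen to match the mixture form, with $1 - \alpha$ taken as the proportion of outliers.

```python
import math


def normal_pdf(e, mean, var):
    """Density of N(mean, var) at e."""
    return math.exp(-0.5 * (e - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(e, alpha, sigma2, mu_out, sigma2_out):
    """p(e) = alpha N(0, sigma2) + (1 - alpha) N(mu_out, sigma2_out)."""
    regular = alpha * normal_pdf(e, 0.0, sigma2)
    outlier = (1.0 - alpha) * normal_pdf(e, mu_out, sigma2_out)
    return regular + outlier


def outlier_probability(e, alpha, sigma2, mu_out, sigma2_out):
    """Posterior probability that residual e belongs to the outlier component."""
    outlier = (1.0 - alpha) * normal_pdf(e, mu_out, sigma2_out)
    return outlier / mixture_density(e, alpha, sigma2, mu_out, sigma2_out)
```

A residual near zero is attributed almost entirely to the regular component, while a residual many standard deviations away is attributed to the outlier component, which is how the mixture identifies outliers automatically.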

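Mobile Robot Localization

(D. Fox et al., "Particle filter for mobile robot localization," 2001)

Self-guided tour robot inside the Smithsonian museum

Problems, ordered toward the more difficult:
•Position tracking
•Global localization problem (initial position unknown)
•Kidnapped robot problem (the robot is carried off somewhere without warning)
•Multi-robot localization problem

Conditions:
•The exhibition areas inside the museum have complex shapes
•The positions of glass cases and the like change for special exhibitions
•Several similar-looking places exist within the exhibition areas
•The robot must drive itself through crowds
•A system that is inexpensive and simple to implement is desirable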

Experiences with Interactive Museum Tour-Guide Robots

Wolfram Burgard
University of Freiburg, Department of Computer Science, Autonomous Intelligent Systems
http://www.informatik.uni-freiburg.de/~burgard

Probabilistic Robotics (確率ロボティクス)
Sebastian Thrun, Wolfram Burgard, and Dieter Fox (authors), Ryuichi Ueda (上田隆一, translator)

Motion model

$$x_t = f(x_{t-1}, u_{t-1}, v_t), \qquad y_t = h(x_t, w_t)$$

$$\begin{aligned}
p(x_t \mid x_{t-1}, u_{t-1}) &= \int p(x_t, v_t \mid x_{t-1}, u_{t-1})\, dv_t \\
&= \int p(x_t \mid v_t, x_{t-1}, u_{t-1})\, p(v_t \mid x_{t-1}, u_{t-1})\, dv_t \\
&= \int \delta\big(x_t - f(x_{t-1}, u_{t-1}, v_t)\big)\, p(v_t)\, dv_t
\end{aligned}$$

The slide abbreviates the result as a convolution, $p(v_t) * f(x_{t-1}, u_{t-1})$: convolution of conventional robot kinematics and two independent zero-mean random variables.

Motion model $p(x_t \mid x_{t-1}, u_{t-1})$: collect patterns and construct the system model numerically.

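Observation model

$$x_t = f(x_{t-1}, u_{t-1}, v_t), \qquad y_t = h(x_t, w_t)$$

$$\begin{aligned}
p(y_t \mid x_t) &= \int p(y_t, w_t \mid x_t)\, dw_t \\
&= \int p(y_t \mid x_t, w_t)\, p(w_t \mid x_t)\, dw_t \\
&= \int \delta\big(y_t - h(x_t, w_t)\big)\, p(w_t)\, dw_t
\end{aligned}$$

The slide abbreviates the result as a convolution of $p(w_t)$ with $h$.

Perceptual model $p(y_t \mid x_t)$: the observation error model can likewise be constructed numerically.

Observation error suffered by the sensor: ordinary observation error plus errors that would normally be treated as outliers.

Example: a planar 2D laser range finder.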

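Graphical model of the self-organizing (self-tuning) time-series model

(Kitagawa, 1996)

Cannot be observed: $\theta_0 \to \theta_1 \to \cdots \to \theta_{t-1} \to \theta_t$ and $x_0 \to x_1 \to \cdots \to x_{t-1} \to x_t$

Can be observed: $y_1, \ldots, y_{t-1}, y_t$

$$x_t \sim p(\cdot \mid x_{t-1}, \theta_{1,t}), \qquad y_t \sim p(\cdot \mid x_t, \theta_{2,t})$$

$$\theta_t \equiv [\theta_{1,t}', \theta_{2,t}']', \qquad \theta_t \sim p(\cdot \mid \theta_{t-1}, \psi)$$

Applied research and development is very active in the field of robotics, mainly for online processing.

{Ghahramani, Jordan, Hinton} {Shumway & Stoffer}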

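Embedding into the state vector and online Model Averaging

Stack many latent variables into the state vector:

$$x_t = [x_{1,t}, x_{2,t}, \ldots, x_{M,t}]'$$

SOSSM with latent switching variable: the state is augmented with the indicator $I_t$,

$$x_t = [\alpha_t', I_t]'$$

(Higuchi, 2000, 2001) in Sequential Monte Carlo Methods in Practice (eds. A. Doucet, J.F.G. de Freitas, and N.J. Gordon)

Evolution of $I_t$: Markov switching prior

$I_t$ specifies which model is used. Many heterogeneous models are considered simultaneously, and Model Averaging is achieved online.

All that remains is to apply the particle filter!

Set of observation models: $y_t = h_i(x_t, w_t)$, $i = 1, \ldots, M$

Example: prepare many Box-Cox transformations,

$$\begin{cases} (y_t^{\lambda} - 1)/\lambda = x_t + w_t, & \lambda \ne 0 \\ \log y_t = x_t + w_t, & \lambda = 0 \end{cases}$$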
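A family of observation models indexed by the Box-Cox parameter $\lambda$ can be generated from one small function. The sketch below implements the standard Box-Cox transformation for positive data; the function name `box_cox` and the candidate grid in the usage note are illustrative assumptions.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam.

    (y**lam - 1) / lam for lam != 0, and log(y) for lam == 0.
    """
    if y <= 0.0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

For instance, `[box_cox(y, lam) for lam in (0, 0.5, 1)]` gives the transformed observation under three candidate models; an indicator variable can then select among them at each time step.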

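Automatic identification of jumps (example)

$$\mu_t = \mu_{t-1} + v_t, \qquad y_t = \mu_t + w_t, \qquad w_t \sim N(0, \sigma^2)$$

•Linear Gaussian trend model: $v_t \sim N(0, \tau^2)$
•Linear non-Gaussian trend model: $v_t \sim C(0, \tau^2)$

[Figure: true trend with a small dip, compared with the estimates from the two models.]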

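Self-organizing state space model

Automatic simultaneous estimation of variance changes

$$\mu_t = \mu_{t-1} + v_t, \qquad v_t \sim C(0, \tau_t^2)$$
$$y_t = \mu_t + w_t, \qquad w_t \sim N(0, \sigma^2)$$
$$\log_{10} \tau_t^2 = \log_{10} \tau_{t-1}^2 + \varepsilon_t, \qquad \varepsilon_t \sim C(0, \xi^2)$$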

Local level model with switching system/observation variance

•Kim and Nelson (1999)
•Fruhwirth-Schnatter (2001)

US/UK real exchange rate from Jan. 1885 to Nov. 1999 (Grilli and Kaminsky (1991), Engle and Kim (1999))

The real exchange rate is defined as the relative price of UK to US producer goods: the US/UK nominal exchange rate times the UK producer price index divided by the US producer price index.

[Figure: given trend $\mu_t^{true}$, with regimes $I_t^{true} = 1$ (small observation noise) and $I_t^{true} = 2$ (large observation noise).]

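Simulation Data

Model:

$$y_t = H_{I_t} x_t + E_{I_t} w_t, \qquad H_{I_t} = 1, \qquad x_t = \mu_t$$

$$E_{I_t} = \sigma_{I_t} = \begin{cases} \text{small}, & I_t = 1 \\ \text{large}, & I_t = 2 \end{cases}$$

This is the same as identifying outliers, except that a time-series structure with the Markov property lies behind it.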

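Posterior distribution: estimating the regime

$$\widehat{\Pr}(I_t = k \mid y_{1:T}) = \frac{1}{N}\, \#\{\, i : I^{(i)}_{t|T} = k,\ i = 1, \ldots, N \,\}$$

[Figure: estimated regime probabilities against the true regimes $I_t^{true} = 1$ and $I_t^{true} = 2$; points with larger observation noise are marked.]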
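With particles, the posterior probability of a regime at a given time is estimated as the fraction of smoothed particles in that regime. The sketch below assumes the smoothed indicator values for one time point are held in a plain list; the function name `regime_posterior` is hypothetical.

```python
def regime_posterior(indicators, k):
    """Estimate Pr(I_t = k | y_{1:T}) as the share of particles with I_t = k.

    indicators: smoothed indicator values I_{t|T}^{(i)}, one per particle.
    """
    if not indicators:
        raise ValueError("need at least one particle")
    return sum(1 for value in indicators if value == k) / len(indicators)
```

Applying this at every time point `t` gives a regime-probability curve that can be compared with the true regime sequence in a simulation study.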

Conditional Dynamic Linear Model (CDLM)

Time-Dependent Gaussian Mixture Model

$$x_t = F_{\lambda} x_{t-1} + G_{\lambda} u_t$$
$$y_t = H_{\lambda} x_t + D_{\lambda} w_t$$

$F_\lambda$, $G_\lambda$, $H_\lambda$, and $D_\lambda$ are constant matrices once $I_t = \lambda$ is given.

$I_t$: latent indicator variable; a stationary, discrete, first-order homogeneous Markov chain.

Transition probability: $\pi_{ij} = \Pr(I_t = j \mid I_{t-1} = i)$
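The latent indicator of a CDLM follows a homogeneous first-order Markov chain, so simulating it needs only the transition matrix. The sketch below is an illustration under assumptions: states are numbered from 0 (the slides number regimes from 1), `pi[i][j]` stands for $\Pr(I_t = j \mid I_{t-1} = i)$, and the function name is hypothetical.

```python
import random


def simulate_regimes(pi, i0, T, rng):
    """Simulate I_1, ..., I_T from a homogeneous first-order Markov chain.

    pi[i][j] is the probability of moving from state i to state j;
    i0 is the initial state and rng is a random.Random instance.
    """
    states = []
    current = i0
    for _ in range(T):
        u = rng.random()
        cumulative = 0.0
        nxt = len(pi[current]) - 1  # fallback guards against rounding in the row sum
        for j, prob in enumerate(pi[current]):
            cumulative += prob
            if u < cumulative:
                nxt = j
                break
        current = nxt
        states.append(current)
    return states
```

For example, `simulate_regimes([[0.95, 0.05], [0.10, 0.90]], 0, 200, random.Random(1))` produces long runs in each regime; given the sampled regime, the model matrices are fixed and the model is linear Gaussian.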

Rao-Blackwellization

$$\iint g(x_1, x_2)\, p(x_1, x_2)\, dx_1\, dx_2 = \int \left[ \int g(x_1, x_2)\, p(x_1 \mid x_2)\, dx_1 \right] p(x_2)\, dx_2$$

If the bracketed term $[\,\cdot\,]$ can be obtained analytically, it suffices to carry out the Monte Carlo integration with samples $\{x_2^{(j)}\}_{j=1}^{m} \sim p(x_2)$.

In the CDLM case, $p(x_1, x_2)$ corresponds to $p(x_t, I_{1:t} \mid y_{1:t})$, and

$$p(x_t \mid I^{(j)}_{1:t}, y_{1:t}) = N\big(x^{(j)}_{t|t}, V^{(j)}_{t|t}\big).$$

How should samples $I^{(j)}_{1:t}$ that follow $p(I_{1:t} \mid y_{1:t})$ be obtained?
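The benefit of Rao-Blackwellization shows up in a toy problem where the inner integral is known in closed form. The sketch below is an assumed example, not taken from the slides: $x_2$ is a fair binary variable, $x_1 \mid x_2 \sim N(m_{x_2}, 1)$, and $g(x_1, x_2) = x_1$, so the inner expectation is simply $m_{x_2}$.

```python
import random


def estimate_mean(n, means, rng):
    """Estimate E[x1] by plain Monte Carlo and by Rao-Blackwellization.

    x2 is drawn uniformly from {0, 1}; x1 | x2 ~ N(means[x2], 1).
    Returns (plain_estimate, rao_blackwell_estimate).
    """
    plain_sum = 0.0
    rb_sum = 0.0
    for _ in range(n):
        x2 = rng.randrange(2)
        # Plain Monte Carlo also samples x1 and averages it.
        plain_sum += rng.gauss(means[x2], 1.0)
        # Rao-Blackwellization replaces x1 by its exact conditional mean.
        rb_sum += means[x2]
    return plain_sum / n, rb_sum / n
```

Both estimators target $(m_0 + m_1)/2$, but the Rao-Blackwellized one removes the sampling noise of $x_1$. In the CDLM the same idea applies with the Kalman filter supplying the analytic part and only the indicator sequence being sampled.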

Conditioning notation: filter distribution

$$\mathrm{KF}^{(j)}_{t-1} \equiv \big(x^{(j)}_{t-1|t-1},\ V^{(j)}_{t-1|t-1}\big)$$

and

$$p(x_{t-1} \mid I^{(j)}_{1:t-1}, y_{1:t-1}) = N\big(x^{(j)}_{t-1|t-1},\ V^{(j)}_{t-1|t-1}\big) = p(x_{t-1} \mid \mathrm{KF}^{(j)}_{t-1})$$

Conditioning notation: predictive distribution

State prediction:

$$p(x_t \mid I^{(j)}_{1:t}, y_{1:t-1}) = p(x_t \mid I^{(j)}_t, \mathrm{KF}^{(j)}_{t-1}) = N\big(x^{(j)}_{t|t-1},\ V^{(j)}_{t|t-1}\big)$$

$$x^{(j)}_{t|t-1} = F_{I^{(j)}_t}\, x^{(j)}_{t-1|t-1}, \qquad V^{(j)}_{t|t-1} = F_{I^{(j)}_t}\, V^{(j)}_{t-1|t-1}\, F'_{I^{(j)}_t} + G_{I^{(j)}_t}\, G'_{I^{(j)}_t}$$

Observation prediction:

$$p(y_t \mid I^{(j)}_{1:t}, y_{1:t-1}) = p(y_t \mid I^{(j)}_t, \mathrm{KF}^{(j)}_{t-1}) = N\big(H_{I^{(j)}_t}\, x^{(j)}_{t|t-1},\ H_{I^{(j)}_t}\, V^{(j)}_{t|t-1}\, H'_{I^{(j)}_t} + D_{I^{(j)}_t}\, D'_{I^{(j)}_t}\big)$$
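Given the regime, the predictive distributions follow from one Kalman prediction step. The sketch below is a scalar illustration under assumptions: the regime's coefficients `F`, `G`, `H`, `D` are plain numbers stored in a dictionary, the noise terms have unit variance, and the function name `predict` is hypothetical.

```python
def predict(kf_prev, regime):
    """Kalman prediction step for a scalar conditional dynamic linear model.

    kf_prev: (x_filt, v_filt), the filtered mean and variance at time t-1.
    regime:  dict with scalar coefficients "F", "G", "H", "D" for the chosen I_t.
    Returns ((x_pred, v_pred), (y_mean, y_var)).
    """
    x_filt, v_filt = kf_prev
    f, g, h, d = regime["F"], regime["G"], regime["H"], regime["D"]
    # State prediction: x_{t|t-1} = F x_{t-1|t-1}, V_{t|t-1} = F V F' + G G'.
    x_pred = f * x_filt
    v_pred = f * v_filt * f + g * g
    # Observation prediction: N(H x_{t|t-1}, H V_{t|t-1} H' + D D').
    y_mean = h * x_pred
    y_var = h * v_pred * h + d * d
    return (x_pred, v_pred), (y_mean, y_var)
```

Evaluating the observation at `(y_mean, y_var)` for each candidate regime gives the likelihood term that weights a particle's indicator value.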

Basics of Sequential Monte Carlo (SMC), 1

Posterior probability (target function):

$$\pi(I_{1:T}) = p(I_{1:T} \mid y_{1:T})$$

With a trial function $q(I_{1:T})$, the importance weight is $w = \pi(I_{1:T}) / q(I_{1:T})$. This form is ill-suited to online computation.

SIS framework: take the filter distribution as the target at each time,

$$\pi_t(I_{1:t}) = p(I_{1:t} \mid y_{1:t}),$$

build the trial function sequentially,

$$q_t(I_{1:t}) = q_1(I_1)\, q_2(I_2 \mid I_1) \cdots q_t(I_t \mid I_{1:t-1}) = q_{t-1}(I_{1:t-1})\, q_t(I_t \mid I_{1:t-1}),$$

and use the importance weight

$$w_t = \frac{\pi_t(I_{1:t})}{q_t(I_{1:t})}.$$

Basics of SMC, 2

General SIS framework:

$$\begin{aligned}
\pi_t(x_{1:t}) = p(x_{1:t} \mid y_{1:t}) &= \frac{p(y_t \mid x_{1:t}, y_{1:t-1})\, p(x_{1:t} \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} \\
&\propto p(y_t \mid x_{1:t}, y_{1:t-1})\, p(x_t \mid x_{1:t-1}, y_{1:t-1})\, p(x_{1:t-1} \mid y_{1:t-1}) \\
&= p(y_t \mid x_{1:t}, y_{1:t-1})\, p(x_t \mid x_{1:t-1}, y_{1:t-1})\, \pi_{t-1}(x_{1:t-1})
\end{aligned}$$

Letting

$$u_t = \frac{\pi_t(x_{1:t})}{\pi_{t-1}(x_{1:t-1})\, q_t(x_t \mid x_{1:t-1})},$$

the weight is updated as

$$w_t = \frac{\pi_t(x_{1:t})}{q_t(x_{1:t})} = \frac{\pi_{t-1}(x_{1:t-1})}{q_{t-1}(x_{1:t-1})} \cdot \frac{\pi_t(x_{1:t})}{\pi_{t-1}(x_{1:t-1})\, q_t(x_t \mid x_{1:t-1})} = w_{t-1}\, u_t.$$

For the CDLM, with $I_{1:t}$ in place of $x_{1:t}$, the likelihood term is $p(y_t \mid I_{1:t}, y_{1:t-1}) \equiv p(y_t \mid I_t, \mathrm{KF}_{t-1})$.

For each particle, a sum over a number of terms determined by $M$ and $K$ is required, where $M$ is the number of models and $K$ is the number of elements of $I_t$.

Basics of SMC, 3

Basic-form particle filter in the SIS framework
(Monte Carlo filter (Kitagawa, 1993), Bootstrap filter (Gordon et al., 1993))

$$\begin{aligned}
\pi_t(x_{1:t}) = p(x_{1:t} \mid y_{1:t}) &\propto p(y_t \mid x_{1:t}, y_{1:t-1})\, p(x_t \mid x_{1:t-1}, y_{1:t-1})\, p(x_{1:t-1} \mid y_{1:t-1}) \\
&= p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, p(x_{1:t-1} \mid y_{1:t-1})
\end{aligned}$$

$$u_t = \frac{\pi_t(x_{1:t})}{\pi_{t-1}(x_{1:t-1})\, q_t(x_t \mid x_{1:t-1})} \propto \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{q_t(x_t \mid x_{1:t-1})}$$

Choosing the system model as the trial function, $q_t(x_t \mid x_{1:t-1}) = p(x_t \mid x_{t-1})$, leaves only the observation model in the weight:

$$u_t \propto p(y_t \mid x_t)$$
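When the system model is used as the trial function, the particle filter reduces to three steps per time point: propagate, weight by the observation model, resample. The sketch below is a minimal bootstrap filter for an assumed scalar model ($x_t = x_{t-1} + v_t$, $y_t = x_t + w_t$ with Gaussian noise); the function name, the $N(0, 1)$ initial distribution, and the parameter names are illustrative choices.

```python
import math
import random


def bootstrap_filter(y, n_particles, tau, sigma, rng):
    """Bootstrap particle filter for x_t = x_{t-1} + v_t, y_t = x_t + w_t.

    tau and sigma are the standard deviations of v_t and w_t.
    Returns the list of filtered means E[x_t | y_{1:t}].
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for obs in y:
        # Sampling: draw from the system model p(x_t | x_{t-1}).
        particles = [x + rng.gauss(0.0, tau) for x in particles]
        # Weighting: u_t is proportional to the observation model p(y_t | x_t).
        weights = [math.exp(-0.5 * ((obs - x) / sigma) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

Because the noise enters only through sampling and through the density evaluation, swapping `rng.gauss` or the weight formula for a heavy-tailed alternative changes the model without changing the algorithm.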

Simplified version that uses the predictive distribution as the trial function

$$\begin{aligned}
p(I_{1:t} \mid y_{1:t}) &= \frac{p(y_t, I_{1:t} \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} \\
&\propto p(y_t \mid I_{1:t}, y_{1:t-1})\, p(I_t \mid I_{1:t-1}, y_{1:t-1})\, p(I_{1:t-1} \mid y_{1:t-1})
\end{aligned}$$

For particle $j$:

$$p(I^{(j)}_{1:t} \mid y_{1:t}) \propto \underbrace{p(y_t \mid I^{(j)}_t, I^{(j)}_{1:t-1}, y_{1:t-1})}_{\text{to Resampling}}\ \underbrace{p(I^{(j)}_t \mid I^{(j)}_{1:t-1}, y_{1:t-1})}_{\text{to Sampling}}\ p(I^{(j)}_{1:t-1} \mid y_{1:t-1})$$

Target Tracking Problem: multiple models

Position $(x_t, y_t)$; the models below are written for the $x$ coordinate.

1. Constant Position Model: $\dfrac{d s_{x,t}}{dt} = w_{x,t}$

2. Constant Velocity Model: $\dfrac{d v_{x,t}}{dt} = w^{cv}_{x,t}$

3. Constant Acceleration Model: $\dfrac{d a_{x,t}}{dt} = w^{ca}_{x,t}$

4. Constant Jerk Model: $\dfrac{d\, \nabla a_{x,t}}{dt} = w^{cj}_{x,t}$

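Target Tracking Problem: state vector

$$x_t = F_{I_t} x_{t-1} + G_{I_t} u_t$$

$$x_t = [s_{x,t}, v_{x,t}, a_{x,t}, s_{y,t}, v_{y,t}, a_{y,t}]', \qquad u_t = [u_{x,t}, u_{y,t}]'$$

The components are position, velocity, and acceleration; with jerk included, $x_{x,t} = [s_{x,t}, v_{x,t}, a_{x,t}, \nabla a_{x,t}]'$.

Constant Velocity Model: $\dfrac{d v_{x,t}}{dt} = u^{cv}_{x,t}$, with

$$F_{I_t, x} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$$

Constant Acceleration Model: $\dfrac{d a_{x,t}}{dt} = u^{ca}_{x,t}$, with

$$F_{I_t, x} = \begin{bmatrix} 1 & \Delta t & (\Delta t)^2/2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{bmatrix}$$

Differences in variance can be expressed through $G$.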
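The transition matrices of the constant-velocity and constant-acceleration models come from discretizing the kinematics over a time step $\Delta t$. The sketch below writes them out for one coordinate as an illustration; the function names and the per-coordinate state orderings $[s, v]$ and $[s, v, a]$ are assumptions.

```python
def f_constant_velocity(dt):
    """Transition matrix for the state [position, velocity]."""
    return [[1.0, dt],
            [0.0, 1.0]]


def f_constant_acceleration(dt):
    """Transition matrix for the state [position, velocity, acceleration]."""
    return [[1.0, dt, 0.5 * dt * dt],
            [0.0, 1.0, dt],
            [0.0, 0.0, 1.0]]


def propagate(F, x):
    """Noise-free system step x_t = F x_{t-1}."""
    return [sum(a * b for a, b in zip(row, x)) for row in F]
```

For example, a target at position 1 moving with velocity 3 is at position 7 after `dt = 2` under the constant-velocity model. Choosing which function builds `F` at each step is exactly the role of the indicator variable in the switching model.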

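Exchanging information between state vectors of different dimensions

Constant position model ⇒ constant velocity model

When a particle switches from the constant position model to the constant velocity model, the missing velocity component has to be supplied. Three options:

1) Set the velocity to 0.
2) Resample from the distribution of the initial constant velocity model.
3) Resample from the constant velocity model distribution one period earlier, that is, from the particles $\{ v^{(j)}_{x,t-1|t-1} \}_{j=1}^{N}$ that were in the constant velocity model at time $t-1$.

[Equations garbled in extraction: Gaussian system noise models, $N(0, \tau^2)$, for the position $s_{x,t}$, the velocity $v_{x,t} = \nabla s_{x,t}$, and the acceleration $a_{x,t} = \nabla v_{x,t}$.]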

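Procedure for simplified online Model Averaging

Fixed-Lag Smoother with Model Averaging

$$P_t \equiv \{x_{t|t-1},\ V_{t|t-1}\}, \qquad S_t \equiv \{x_{t|t+1},\ V_{t|t+1}\}$$

$S_t$ is the estimate of the state vector $x$ given the data up to the next time point.

For the $j$-th particle ($j = 1, \ldots, m$), the Kalman filter step takes $\mathrm{KF}^{(j)}_{t-1}$ and $y_t$ to $\mathrm{KF}^{(j)}_{t}$, while the indicator history $I^{(j)}_{t-(L+1)|t-1}, I^{(j)}_{t-L|t-1}, \ldots, I^{(j)}_{t-1|t-1}, I^{(j)}_{t|t-1}$ is carried along.

$$\hat{I}_{t-(L+1)} : \text{the mode of } \{ I^{(j)}_{t-(L+1)|t} \}_{j=1}^{m}$$

[Diagram residue: with the indicator fixed at its mode, the blocks $\mathrm{KF}^{\wedge}_{t-L+1}$, $\mathrm{KF}^{\wedge}_{t-L+2}$, $P^{\wedge}_{t-L+1}$, and $S^{\wedge}_{t-L+2}$ are computed from $y_{t-(L+1)}$ onward.]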

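TESD: the fourth science, the fourth methodology

T: Theory
E: Experiment
S: Simulation
D: Massive data processing

Deductive / Inductive

Data assimilation

Modeling

The driving force of science: prediction and control

It moves forward, but which direction it goes needs to be controlled.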

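Published June 2007

Real-World Innovation through Bayesian Modeling (ベイジアンモデリングによる実世界イノベーション)

This month's issue: "From Global Models to Local Models: State Space Models and Simulation," Tomoyuki Higuchi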
