Gaussian Process Models and Bayesian Optimization


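▶ Problem setting 1: function estimation

We want to estimate the function f accurately:

$$ f^{*} = \arg\min_{\hat{f} \in \mathcal{F}} \sum_{i=1}^{n} \bigl( f(x_i) - \hat{f}(x_i) \bigr)^2 $$

▶ Problem setting 2: optimization

We want to find the parameter x that maximizes the function f:

$$ x^{*} = \arg\max_{x \in \{x_1, \ldots, x_n\}} f(x) $$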

[Figures: animation frames for Steps 0 to 9. Axes: Input (horizontal), Output (vertical). Legend: Objective, Prediction, Observations, Next Sample, Uncertainty.]

[Figures: animation frames for Steps 0 to 7. Axes: Input (horizontal), Output (vertical). Legend: Objective, Prediction, Observations, Next Sample, Uncertainty.]

[Figure: frame for Step 4. Axes: Input (horizontal), Output (vertical). Legend: Objective, Prediction, Observations, Next Sample, Uncertainty.]

Predictive Distribution

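▶ Linear model

$$ y = Xw + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I) $$

▶ Labeled and unlabeled data

$$ (X_L, y_L) \leftarrow \{(x_i, y_i)\}_{i \in L}, \quad (X_U, \cdot) \leftarrow \{(x_i, \cdot)\}_{i \in U} $$

▶ Prior predictive distribution

$$ w \sim N(0, \sigma_0^2 I) \;\Rightarrow\; \hat{y}_U \sim N(\mu_U, \Sigma_U) $$

▶ Posterior predictive distribution

$$ w \mid (X_L, y_L) \sim N(w_L, S_L) \;\Rightarrow\; \hat{y}_U \mid (X_L, y_L) \sim N(\mu_{U|L}, \Sigma_{U|L}) $$

(From the previous lecture)

$$ w_L = \Bigl( X_L^\top X_L + \frac{\sigma^2}{\sigma_0^2} I \Bigr)^{-1} X_L^\top y_L, \quad S_L = \Bigl( \frac{1}{\sigma^2} X_L^\top X_L + \frac{1}{\sigma_0^2} I \Bigr)^{-1} $$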

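▶ Prior predictive distribution of the Bayesian linear model: $\hat{y}_U \sim N(\mu_U, \Sigma_U)$

$$ \mu_U = 0, \quad \Sigma_U = \sigma_0^2 X_U X_U^\top + \sigma^2 I $$

▶ Posterior predictive distribution of the Bayesian linear model: $\hat{y}_U \mid (X_L, y_L) \sim N(\mu_{U|L}, \Sigma_{U|L})$

$$ \mu_{U|L} = \sigma_0^2 X_U X_L^\top \bigl( \sigma_0^2 X_L X_L^\top + \sigma^2 I \bigr)^{-1} y_L $$

$$ \Sigma_{U|L} = \sigma_0^2 X_U X_U^\top + \sigma^2 I - \sigma_0^2 X_U X_L^\top \bigl( \sigma_0^2 X_L X_L^\top + \sigma^2 I \bigr)^{-1} \sigma_0^2 X_L X_U^\top $$

(The derivation uses the formula for the conditional distribution of a multivariate normal.)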
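As a numerical illustration of the posterior predictive formulas, here is a minimal NumPy sketch. The function name `blm_posterior_predictive`, the synthetic data, and the values of `sigma0_sq` and `sigma_sq` are assumptions made for this example only. It builds the blocks of the joint prior covariance of $(y_U, y_L)$, conditions on $y_L$, and then compares the result with the weight-space mean $X_U w_L$, where $w_L = (X_L^\top X_L + (\sigma^2/\sigma_0^2) I)^{-1} X_L^\top y_L$.

```python
import numpy as np

def blm_posterior_predictive(X_L, y_L, X_U, sigma0_sq=1.0, sigma_sq=0.1):
    """Posterior predictive mean and covariance of y_U given (X_L, y_L)."""
    n_L, n_U = X_L.shape[0], X_U.shape[0]
    # Blocks of the joint prior covariance of (y_U, y_L).
    S_UU = sigma0_sq * X_U @ X_U.T + sigma_sq * np.eye(n_U)
    S_UL = sigma0_sq * X_U @ X_L.T
    S_LL = sigma0_sq * X_L @ X_L.T + sigma_sq * np.eye(n_L)
    # Solve linear systems instead of forming the inverse explicitly.
    mu = S_UL @ np.linalg.solve(S_LL, y_L)
    Sigma = S_UU - S_UL @ np.linalg.solve(S_LL, S_UL.T)
    return mu, Sigma

# Synthetic data (illustrative only): 5 labeled and 3 unlabeled points, d = 2.
rng = np.random.default_rng(0)
X_L = rng.normal(size=(5, 2))
X_U = rng.normal(size=(3, 2))
y_L = X_L @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=5)

sigma0_sq, sigma_sq = 1.0, 0.1
mu, Sigma = blm_posterior_predictive(X_L, y_L, X_U, sigma0_sq, sigma_sq)

# Weight-space posterior mean, for comparison with the function-space result.
w_L = np.linalg.solve(X_L.T @ X_L + (sigma_sq / sigma0_sq) * np.eye(2), X_L.T @ y_L)
print(np.allclose(mu, X_U @ w_L))
```

The two routes should agree up to rounding. The function-space form inverts an $n_L \times n_L$ matrix and the weight-space form a $d \times d$ one, so which is cheaper depends on the relative sizes of $n_L$ and $d$.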

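When a multivariate normal distribution is written as

$$ \begin{bmatrix} z_a \\ z_b \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix} \right), $$

the mean $\mu_{a|b}$ and the covariance matrix $\Sigma_{a|b}$ of the conditional distribution $z_a \mid z_b \sim N(\mu_{a|b}, \Sigma_{a|b})$ can be written as follows:

$$ \mu_{a|b} = \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (z_b - \mu_b), \quad \Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} $$

(For a proof see, for example, Section 2.3.1 of "Pattern Recognition and Machine Learning".)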
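The conditioning formula can be tried on a small case. The sketch below is an illustration, not part of the lecture material: the helper name `condition_gaussian` and the numbers are assumptions. It conditions a bivariate normal with unit variances and correlation 0.8 on $z_b = 1$.

```python
import numpy as np

def condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, z_b):
    """Mean and covariance of z_a | z_b for a jointly Gaussian (z_a, z_b)."""
    # S_ba is the transpose of S_ab because the joint covariance is symmetric.
    gain = np.linalg.solve(S_bb, S_ab.T).T   # equals S_ab @ inv(S_bb)
    mu_cond = mu_a + gain @ (z_b - mu_b)
    S_cond = S_aa - gain @ S_ab.T
    return mu_cond, S_cond

# Bivariate example: unit variances, covariance 0.8, observed z_b = 1.
mu_c, S_c = condition_gaussian(
    mu_a=np.array([0.0]), mu_b=np.array([0.0]),
    S_aa=np.array([[1.0]]), S_ab=np.array([[0.8]]), S_bb=np.array([[1.0]]),
    z_b=np.array([1.0]),
)
print(mu_c, S_c)   # mean 0.8, variance 1 - 0.8**2 = 0.36
```

Observing $z_b$ shifts the mean of $z_a$ toward the observation and always reduces its variance, here from 1 to 0.36.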

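▶ Bayesian linear model

$$ y_U = X_U w + \varepsilon, \quad w \sim N(0, \sigma_0^2 I), \quad \varepsilon \sim N(0, \sigma^2 I), \quad w \perp \varepsilon $$

Show that the mean vector and the covariance matrix of the prediction $y_U$ in this model are, respectively,

$$ E[y_U] = 0, $$

$$ \mathrm{Cov}[y_U] = E\bigl[ (y_U - E[y_U]) (y_U - E[y_U])^\top \bigr] = \sigma_0^2 X_U X_U^\top + \sigma^2 I. $$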
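Before attempting the proof, the claim can be checked by simulation. The following Monte Carlo sketch draws $w$ and $\varepsilon$ independently and compares the empirical mean and covariance of $y_U = X_U w + \varepsilon$ with the stated expressions. The values of `sigma0_sq` and `sigma_sq`, the random `X_U`, and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0_sq, sigma_sq = 2.0, 0.5      # illustrative values
X_U = rng.normal(size=(3, 2))       # n_U = 3 points, d = 2 features
n_samples = 200_000

# Draw w ~ N(0, sigma0^2 I) and eps ~ N(0, sigma^2 I) independently.
W = np.sqrt(sigma0_sq) * rng.normal(size=(n_samples, 2))
E = np.sqrt(sigma_sq) * rng.normal(size=(n_samples, 3))
Y = W @ X_U.T + E                   # each row is one draw of y_U

emp_mean = Y.mean(axis=0)
emp_cov = np.cov(Y, rowvar=False)
theory_cov = sigma0_sq * X_U @ X_U.T + sigma_sq * np.eye(3)

print(np.round(emp_mean, 3))
print(np.round(emp_cov - theory_cov, 3))
```

The empirical mean and the covariance difference should both be close to zero, with a sampling error that shrinks as `n_samples` grows. A simulation of this kind supports the result but does not replace the derivation.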

Gaussian Process Models

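▶ Kernel function: a function $k(x, x')$ that expresses the similarity between two instances $x$ and $x'$

▶ Inner-product (linear) kernel

$$ k(x, x') = x^\top x' $$

▶ Polynomial kernel (of degree q)

$$ k(x, x') = (x^\top x' + 1)^q $$

▶ Gaussian kernel

$$ k(x, x') = \exp\left( -\frac{\lVert x - x' \rVert_2^2}{2 s^2} \right) $$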
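The three kernels translate directly into code. This is a minimal sketch in which the function names and the example vectors are chosen for illustration; `q` is the polynomial degree and `s` the width of the Gaussian kernel, as in the formulas.

```python
import numpy as np

def linear_kernel(x, x2):
    """Inner-product kernel: x^T x'."""
    return float(np.dot(x, x2))

def polynomial_kernel(x, x2, q=2):
    """Polynomial kernel of degree q: (x^T x' + 1)^q."""
    return float((np.dot(x, x2) + 1.0) ** q)

def gaussian_kernel(x, x2, s=1.0):
    """Gaussian kernel: exp(-||x - x'||^2 / (2 s^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * s ** 2)))

x, x2 = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(linear_kernel(x, x2))       # inner product: 1*2 + 2*0 = 2
print(polynomial_kernel(x, x2))   # (2 + 1)^2 = 9
print(gaussian_kernel(x, x2))     # exp(-5 / 2)
```

The Gaussian kernel equals 1 when the two instances coincide and decays toward 0 as they move apart, at a rate controlled by `s`.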

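▶ Labeled and unlabeled data

$$ X_L \in \mathbb{R}^{n_L \times d}, \quad X_U \in \mathbb{R}^{n_U \times d} $$

▶ Kernel matrix (rows are indexed by the instances in $X_L$, columns by the instances in $X_U$)

$$ K(X_L, X_U) = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_{n_U}) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_{n_U}) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_{n_L}, x_1) & k(x_{n_L}, x_2) & \cdots & k(x_{n_L}, x_{n_U}) \end{bmatrix} \in \mathbb{R}^{n_L \times n_U} $$

▶ Kernel vector

$$ k(X_L, x_i) = \begin{bmatrix} k(x_1, x_i) \\ k(x_2, x_i) \\ \vdots \\ k(x_{n_L}, x_i) \end{bmatrix} \in \mathbb{R}^{n_L} $$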
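A kernel matrix can be built without explicit loops. The sketch below assumes the Gaussian kernel and uses a hypothetical helper name, `gaussian_kernel_matrix`. It computes all pairwise squared distances at once, and the kernel vector is then just one column of the matrix.

```python
import numpy as np

def gaussian_kernel_matrix(XA, XB, s=1.0):
    """Return K with K[i, j] = exp(-||XA[i] - XB[j]||^2 / (2 s^2))."""
    sq_dists = (
        np.sum(XA ** 2, axis=1)[:, None]
        + np.sum(XB ** 2, axis=1)[None, :]
        - 2.0 * XA @ XB.T
    )
    # Rounding can make tiny distances slightly negative; clip them to zero.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * s ** 2))

X_L = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # n_L = 3, d = 2
X_U = np.array([[1.0, 1.0], [2.0, 2.0]])               # n_U = 2, d = 2

K_LU = gaussian_kernel_matrix(X_L, X_U)                # shape (n_L, n_U)
k_vec = gaussian_kernel_matrix(X_L, X_U[:1])[:, 0]     # kernel vector for the first unlabeled point
print(K_LU.shape, k_vec.shape)
```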

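▶ Gaussian process model

A kernelized version of the Bayesian linear model.

▶ Prior predictive distribution of the Gaussian process model: $\hat{y}_U \sim N(\mu_U, \Sigma_U)$

$$ \mu_U = 0, \quad \Sigma_U = \sigma_0^2 K(X_U, X_U) + \sigma^2 I $$

▶ Posterior predictive distribution of the Gaussian process model: $\hat{y}_U \mid (X_L, y_L) \sim N(\mu_{U|L}, \Sigma_{U|L})$

$$ \mu_{U|L} = \sigma_0^2 K(X_U, X_L) \bigl( \sigma_0^2 K(X_L, X_L) + \sigma^2 I \bigr)^{-1} y_L $$

$$ \Sigma_{U|L} = \sigma_0^2 K(X_U, X_U) + \sigma^2 I - \sigma_0^2 K(X_U, X_L) \bigl( \sigma_0^2 K(X_L, X_L) + \sigma^2 I \bigr)^{-1} \sigma_0^2 K(X_L, X_U) $$

(The derivation uses the formula for the conditional distribution of a multivariate normal.)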
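The posterior predictive distribution can be computed in a few lines of NumPy. This is a sketch under stated assumptions: a Gaussian kernel, hypothetical helper names `rbf` and `gp_posterior`, a toy one-dimensional data set, and arbitrary hyperparameter values. It applies the inverse through a Cholesky factorization, a common choice for numerical stability; the lecture does not prescribe a particular solver.

```python
import numpy as np

def rbf(XA, XB, s=1.0):
    """Gaussian kernel matrix between the rows of XA and the rows of XB."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * s ** 2))

def gp_posterior(X_L, y_L, X_U, sigma0_sq=1.0, sigma_sq=1e-2, s=1.0):
    """Posterior predictive mean and covariance of y_U given (X_L, y_L)."""
    K_LL = sigma0_sq * rbf(X_L, X_L, s) + sigma_sq * np.eye(len(X_L))
    K_UL = sigma0_sq * rbf(X_U, X_L, s)
    K_UU = sigma0_sq * rbf(X_U, X_U, s) + sigma_sq * np.eye(len(X_U))
    c = np.linalg.cholesky(K_LL)                           # K_LL = c c^T
    alpha = np.linalg.solve(c.T, np.linalg.solve(c, y_L))  # K_LL^{-1} y_L
    v = np.linalg.solve(c, K_UL.T)                         # c^{-1} K_LU
    mu = K_UL @ alpha
    Sigma = K_UU - v.T @ v                                 # K_UU - K_UL K_LL^{-1} K_LU
    return mu, Sigma

# Toy data (illustrative only): three labeled points, five prediction points.
X_L = np.array([[0.0], [0.5], [1.0]])
y_L = np.sin(3.0 * X_L[:, 0])
X_U = np.linspace(0.0, 1.0, 5)[:, None]

mu, Sigma = gp_posterior(X_L, y_L, X_U)
print(np.round(mu, 3))
print(np.round(np.sqrt(np.diag(Sigma)), 3))   # pointwise predictive standard deviations
```

The diagonal of `Sigma` gives the pointwise predictive variances, which are smaller than the prior variance $\sigma_0^2 + \sigma^2$ wherever the labeled data are informative.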

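[Figure: frame for Step 4. Axes: Input (horizontal), Output (vertical). Legend: Objective, Prediction, Observations, Next Sample, Uncertainty.]

▶ Posterior predictive distribution of the Gaussian process model at a single point: $\hat{y}_i \mid (X_L, y_L) \sim N(\mu_{i|L}, \Sigma_{i|L})$

$$ \mu_{i|L} = \sigma_0^2 k(X_L, x_i)^\top \bigl( \sigma_0^2 K(X_L, X_L) + \sigma^2 I \bigr)^{-1} y_L $$

$$ \Sigma_{i|L} = \sigma_0^2 k(x_i, x_i) + \sigma^2 - \sigma_0^2 k(X_L, x_i)^\top \bigl( \sigma_0^2 K(X_L, X_L) + \sigma^2 I \bigr)^{-1} \sigma_0^2 k(X_L, x_i) $$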

Bayesian Optimization

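▶ Labeled data set L: $\{(x_i, y_i)\}_{i \in L}$

▶ Unlabeled data set U: $\{(x_i, \cdot)\}_{i \in U}$ (the outputs $y_i$ have not been observed yet)

▶ Algorithm

Input: $\{(x_i, y_i)\}_{i \in L}$, $\{(x_i, \cdot)\}_{i \in U}$
while experimental resources (budget or time) remain do
    for i ∈ U do
        compute the acquisition function a(x_i)
    end for
    i' ← arg max_{i ∈ U} a(x_i)
    obtain y_{i'} (by running an experiment, for example)
    L ← L ∪ {i'}, U ← U \ {i'}
end while
i_best ← arg max_{i ∈ L} y_i
Output: the best input-output pair (x_{i_best}, y_{i_best})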
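The loop can be sketched end to end on a finite candidate set. Everything specific in the code below is an assumption made for illustration: the helper names (`rbf`, `gp_mean_std`, `probability_of_improvement`, `bayes_opt`), the Gaussian-kernel surrogate with fixed hyperparameters, the choice of probability of improvement as the acquisition function, the fixed evaluation budget standing in for "resources", and the toy objective.

```python
import math
import numpy as np

def rbf(XA, XB, s=0.2):
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * s ** 2))

def gp_mean_std(X_L, y_L, X_U, sigma0_sq=1.0, sigma_sq=1e-4):
    """Pointwise posterior predictive mean and standard deviation."""
    K_LL = sigma0_sq * rbf(X_L, X_L) + sigma_sq * np.eye(len(X_L))
    K_UL = sigma0_sq * rbf(X_U, X_L)
    mu = K_UL @ np.linalg.solve(K_LL, y_L)
    # Diagonal of K_UL K_LL^{-1} K_LU, subtracted from the prior variance.
    var = sigma0_sq + sigma_sq - np.sum(K_UL * np.linalg.solve(K_LL, K_UL.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def probability_of_improvement(mu, std, y_max):
    z = (mu - y_max) / std
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def bayes_opt(X, objective, init_idx, budget, acquisition=probability_of_improvement):
    labeled = list(init_idx)                              # index set L
    y = {i: objective(X[i]) for i in labeled}             # observed outputs
    unlabeled = [i for i in range(len(X)) if i not in y]  # index set U
    for _ in range(budget):                               # "while resources remain"
        if not unlabeled:
            break
        y_L = np.array([y[i] for i in labeled])
        mu, std = gp_mean_std(X[labeled], y_L, X[unlabeled])
        a = acquisition(mu, std, y_L.max())               # a(x_i) for every i in U
        i_next = unlabeled[int(np.argmax(a))]
        y[i_next] = objective(X[i_next])                  # run the "experiment"
        labeled.append(i_next)                            # L <- L ∪ {i'}
        unlabeled.remove(i_next)                          # U <- U \ {i'}
    i_best = max(labeled, key=lambda i: y[i])
    return i_best, y[i_best], labeled

# Toy problem: 51 candidate inputs on [0, 1], two initial observations.
X = np.linspace(0.0, 1.0, 51)[:, None]
f = lambda x: float(np.sin(6.0 * x[0]) * x[0])
i_best, y_best, visited = bayes_opt(X, f, init_idx=[0, 50], budget=10)
print(X[i_best, 0], y_best, len(visited))
```

Passing a different function as `acquisition` changes how the next sample is chosen without touching the rest of the loop. In practice the kernel hyperparameters would usually be re-estimated as data accumulate, which this sketch omits.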

▶ Probability of Improvement (PI)

$$ a(x_i) = P_{i|L}\Bigl( y_i \ge \max_{h \in L} y_h \Bigr), \quad i \in U $$

where $\max_{h \in L} y_h$ is the largest output value among the labeled instances.
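When the predictive distribution of $y_i$ is Gaussian with mean $\mu_{i|L}$ and standard deviation $\sqrt{\Sigma_{i|L}}$, PI reduces to a normal CDF evaluated at the standardized gap to the current best value. The sketch below uses only the standard library; the function names and the example numbers are illustrative assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, std, y_max):
    """P(y_i >= y_max) when y_i ~ N(mu, std^2)."""
    return normal_cdf((mu - y_max) / std)

# Predicted mean equal to the current best: improvement is a coin flip.
pi_at_best = probability_of_improvement(mu=1.0, std=0.5, y_max=1.0)
# Same mean below the current best, with large versus small uncertainty.
pi_uncertain = probability_of_improvement(mu=0.0, std=2.0, y_max=1.0)
pi_confident = probability_of_improvement(mu=0.0, std=0.1, y_max=1.0)
print(pi_at_best, pi_uncertain, pi_confident)
```

For a point whose predicted mean lies below the current best, larger uncertainty gives a higher PI, because only the upper tail of the distribution can exceed the incumbent.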

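▶ Expected Improvement (EI)

$$ a(x_i) = E_{i|L}\Bigl[ \max\Bigl( f(x_i) - \max_{h \in L} y_h,\; 0 \Bigr) \Bigr], \quad i \in U $$

where $\max_{h \in L} y_h$ is the largest output value among the labeled instances.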
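Under the usual definition of EI, which counts only positive improvement, $E[\max(f(x_i) - \max_{h \in L} y_h, 0)]$, a Gaussian predictive distribution $N(\mu, s^2)$ gives the closed form $(\mu - y_{\max})\,\Phi(z) + s\,\phi(z)$ with $z = (\mu - y_{\max})/s$, where $\Phi$ and $\phi$ are the standard normal CDF and PDF. A minimal standard-library sketch follows; the function name and the example values are assumptions.

```python
import math

def expected_improvement(mu, std, y_max):
    """E[max(f - y_max, 0)] when f ~ N(mu, std^2)."""
    z = (mu - y_max) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - y_max) * cdf + std * pdf

# Same predicted mean as the current best, different uncertainty.
ei_low_std = expected_improvement(mu=1.0, std=0.1, y_max=1.0)
ei_high_std = expected_improvement(mu=1.0, std=1.0, y_max=1.0)
print(ei_low_std, ei_high_std)
```

With the mean fixed at the current best, EI is proportional to the standard deviation, so the more uncertain point is preferred. Unlike PI, EI also weighs how large the improvement is expected to be.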

▶ The trade-off between exploration and exploitation

[Figures: animation frames for Steps 0 to 7. Axes: Input (horizontal), Output (vertical). Legend: Objective, Prediction, Observations, Next Sample, Uncertainty.]

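▶ Logistic regression

$$ \hat{P}(y = 1 \mid x) = f_{\mathrm{sigmoid}}\Bigl( w_0 + \sum_{j=1}^{d} w_j x_j \Bigr), \quad f_{\mathrm{sigmoid}}(z) = \frac{1}{1 + \exp(-z)} $$

[Figure: "Logistic Function", plotted for z from −5 to 5 with values between 0.0 and 1.0.]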
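The logistic model is straightforward to evaluate. The sketch below is illustrative: the function names, weights, and input are assumptions. It uses a branch on the sign of `z` so that the exponential never overflows for inputs of large magnitude.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # safe: z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def predict_proba(x, w, w0):
    """P(y = 1 | x) for a logistic regression with weights w and intercept w0."""
    return sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, x)))

# Linear score is 0.5 * 1.0 - 0.25 * 2.0 = 0, so the probability is 0.5.
p = predict_proba(x=[1.0, 2.0], w=[0.5, -0.25], w0=0.0)
print(p)
```

The function maps any real-valued score to the interval (0, 1) and satisfies `sigmoid(z) + sigmoid(-z) = 1`.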

[Figure: two panels over x from 0.0 to 0.8. First panel: y and p(y=1) against x, with legend Mean, Truth, Data, Confidence. Second panel: f(x) against x, ranging from −1.5 to 1.5, with legend Mean, Truth, Confidence.]