• 検索結果がありません。

⇓ ラグランジアン :

テンソル結果 2: 計算速度

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0 10 20 30 40 50

Fraction of observed elements

Computation time (s)

As a Matrix Constraint Mixture Tucker (large) Tucker (exact)

しかも凸最適化は速い!

(Tomioka et al. 10)

. . . . . .

手法3: Operator Splitting

Outline

.

. .

1

イントロ

.

. .

2 準備

.

. .

3 手法

1: Prox

作用素

.

. .

4 手法

2: Legendre

変換

.

. .

5 手法3: Operator Splitting

.

. .

6 まとめ

.

. .

7

Appendix

冨岡 亮太(東大) 51 / 66

. . . . . .

手法3: Operator Splitting

凸最適化 方程式のゼロ点

.

最小化問題

.

.

.

. . .

.

.

min

x

f (x ) + g(x ).

.

方程式のゼロ点

.

.

.

. . .

.

.

Find x such that (T

f

+ T

g

)(x )

0,

where

T

f

= ∂f,

T

g

= ∂g (

劣微分作用素

).

(Prox

作用素

):

x

= prox

f

(z ) = argmin

x∈Rn

µ

f

(x

) + 1

2

xz2

Tf

(x ) + (x

z

)

0

prox

f

(z) = (I +

Tf

)

1

(z )

. . . . . .

手法3: Operator Splitting

凸最適化 方程式のゼロ点

.

最小化問題

.

.

.

. . .

.

.

min

x

f (x ) + g(x ).

.

方程式のゼロ点

.

.

.

. . .

.

.

Find x such that (T

f

+ T

g

)(x )

0,

where

T

f

= ∂f,

T

g

= ∂g (

劣微分作用素

).

(Prox

作用素

):

x = prox

f

(z ) = argmin

x∈Rn

µ

f (x

) + 1

2 x

z

2

T

f

(x ) + (x z )

0

prox

f

(z) = (I + T

f

)

1

(z )

冨岡 亮太(東大) 52 / 66

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find

x ηTf

(x ) +

ηTg

(x )

0

find

x x

+

ηTf

(x)

x−ηTg

(x )

find

x

(I +

ηTf

)(x ) = (x

−ηTg

(x ))

find

x x

= prox

ηf

(x

−ηTg

(x))

反復式

:

xt+1

= prox

ηtf

(x

t −ηt∇g(xt

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find x ηT

f

(x ) + ηT

g

(x )

0

find

x x

+

ηTf

(x)

x−ηTg

(x )

find

x

(I +

ηTf

)(x ) = (x

−ηTg

(x ))

find

x x

= prox

ηf

(x

−ηTg

(x))

反復式

:

xt+1

= prox

ηtf

(x

t −ηt∇g(xt

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

冨岡 亮太(東大) 53 / 66

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find x ηT

f

(x ) + ηT

g

(x )

0

find x x + ηT

f

(x) x ηT

g

(x )

find

x

(I +

ηTf

)(x ) = (x

−ηTg

(x ))

find

x x

= prox

ηf

(x

−ηTg

(x))

反復式

:

xt+1

= prox

ηtf

(x

t −ηt∇g(xt

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find x ηT

f

(x ) + ηT

g

(x )

0

find x x + ηT

f

(x) x ηT

g

(x )

find x (I + ηT

f

)(x ) = (x ηT

g

(x ))

find

x x

= prox

ηf

(x

−ηTg

(x))

反復式

:

xt+1

= prox

ηtf

(x

t −ηt∇g(xt

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

冨岡 亮太(東大) 53 / 66

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find x ηT

f

(x ) + ηT

g

(x )

0

find x x + ηT

f

(x) x ηT

g

(x )

find x (I + ηT

f

)(x ) = (x ηT

g

(x ))

find x x = prox

ηf

(x ηT

g

(x))

反復式

:

xt+1

= prox

ηtf

(x

t −ηt∇g(xt

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find x ηT

f

(x ) + ηT

g

(x )

0

find x x + ηT

f

(x) x ηT

g

(x )

find x (I + ηT

f

)(x ) = (x ηT

g

(x ))

find x x = prox

ηf

(x ηT

g

(x))

反復式

:

x

t+1

= prox

ηtf

(x

t

η

t

g(x

t

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

冨岡 亮太(東大) 53 / 66

. . . . . .

手法3: Operator Splitting

Forward-Backward Splitting ( IST)

(Lions & Mercier 76)

minimize

x

f (x) + g(x )

find x T

f

(x ) + T

g

(x)

0

find x ηT

f

(x ) + ηT

g

(x )

0

find x x + ηT

f

(x) x ηT

g

(x )

find x (I + ηT

f

)(x ) = (x ηT

g

(x ))

find x x = prox

ηf

(x ηT

g

(x))

反復式

:

x

t+1

= prox

ηtf

(x

t

η

t

g(x

t

))

Douglas Rachford Splitting

も同様の方法で導出できる(複雑すぎる ので省略)

. . . . . .

まとめ

Outline

.

. .

1

イントロ

.

. .

2 準備

.

. .

3 手法

1: Prox

作用素

.

. .

4 手法

2: Legendre

変換

.

. .

5 手法

3: Operator Splitting

.

. .

6 まとめ

.

. .

7

Appendix

冨岡 亮太(東大) 54 / 66

. . . . . .

まとめ

メッセージ

ブラックボックス最適化から中身を考慮した最適化へ 微分不可能でも怖くない

Prox

作用素や凸共役を計算しよう

. . . . . .

Appendix

Outline

.

. .

1

イントロ

.

. .

2 準備

.

. .

3 手法

1: Prox

作用素

.

. .

4 手法

2: Legendre

変換

.

. .

5 手法

3: Operator Splitting

.

. .

6 まとめ

.

. .

7 Appendix

冨岡 亮太(東大) 56 / 66

. . . . . .

Appendix

凸共役の性質 (contd.)

凸共役(

Legendre

変換)と畳み込み

: (f + g)

(y) = inf

α

(f

(y

α) +

g

(α)) .

最小化と凸共役

:

inf

x f

(x ) =

sup

x

(〈

0,x〉 −f

(x)) =

−f

(0).

Fenchel

双対

:

inf

x

(f (x) +

g(x

)) =

(f +

g)

(0) = sup

α

(

−f

(

α)−g

(α))

.

. . . . . .

Appendix

凸共役の性質 (contd.)

凸共役(

Legendre

変換)と畳み込み

: (f + g)

(y) = inf

α

(f

(y

α) +

g

(α)) .

最小化と凸共役

:

inf

x

f (x ) = sup

x

(〈

0,

x 〉 − f (x)) = f

(0).

Fenchel

双対

:

inf

x

(f (x) +

g(x

)) =

(f +

g)

(0) = sup

α

(

−f

(

α)−g

(α))

.

冨岡 亮太(東大) 57 / 66

. . . . . .

Appendix

凸共役の性質 (contd.)

凸共役(

Legendre

変換)と畳み込み

: (f + g)

(y) = inf

α

(f

(y

α) +

g

(α)) .

最小化と凸共役

:

inf

x

f (x ) = sup

x

(〈

0,

x 〉 − f (x)) = f

(0).

Fenchel

双対

:

inf

x

(f (x) + g(x )) = (f + g)

(0) = sup

α

( f

(

α)

g

(α)) .

. . . . . .

Appendix

Fenchel 双対定理

inf

x

(f (Ax ) + g(x )) = sup

α

³ f

(

α)

g

(A

α)

´ .

等式制約付き最適化問題

minimize

x∈Rn,z∈Rm

f (z ) + g(x), subject to Ax = z

の双対問題として導出できる.

(http://www.ibis.t.u-tokyo.ac.jp/RyotaTomioka/Notes/DerivingDual)

凸共役の一覧表があれば機械的に双対問題を作れる点が美味しい.

冨岡 亮太(東大) 58 / 66

. . . . . .

Appendix

Fenchel 双対の例 (1): SVM

.

主問題

.

.

.

. . .

.

.

min

w

f (y X w) + φ

λ

(x)

 

 

 

f (z) = X

m i=1

max(0, 1 z

i

), φ

λ

(x ) = λ

2 x

2

.

.

双対問題

.

.

.

. . .

.

.

max

α

f

(

α)

φ

λ

(X

y ))

 

 

 

f

(−α) = (P

m

i=1

α

i

(0

α

1),

+∞ (oterwise), φ

λ

(v ) = 1

v

2

.

冨岡 亮太(東大) 59 / 66

. . . . . .

Appendix

Fenchel双対の例(2):正則化付きロジスティック回帰(Jaakkola & Haussler 99)

.

主問題

.

.

.

. . .

.

.

min

w

f (y X w) + φ

λ

(x)

 

 

 

f (z) = X

m i=1

log(1 + exp( z

i

)), φ

λ

(x ) = λ

2 x

2

.

.

双対問題

.

.

.

. . .

.

.

max

α

f

(

α)

φ

λ

(X

y))

 

 

 

 

 

f

(

α) =

X

m

i=1

α

i

log(α

i

)

+(1 α

i

) log(1 α

i

), φ

λ

(v ) = 1

v

2

,

(a) primal losses

O 1

Hinge Loss Logistic Loss

(b) dual losses

O 1

Hinge Loss Logistic Loss

冨岡 亮太(東大) 60 / 66

. . . . . .

Appendix

Fenchel 双対の例 (3): 線形計画

.

主問題

.

.

.

. . .

.

.

(P) min c

x,

s.t. Ax = b, x 0.

.

主問題

.

.

.

. . .

.

.

(P

) min

x

f (Ax ) + g(x)

 

 

 

 

 

f(z) =

( 0 (z = b), + (otherwise), g(x ) =

( c

x (x 0), + (otehrwise).

.

双対問題

.

.

.

. . .

.

.

(D) max b

y, s.t. A

y c.

.

双対問題’

.

.

.

. . .

.

.

(D

) max

y

f ( y) g(A

y)

 

 

f

(y ) = b

y , g

(α) =

(

0 (α c), +∞ (oterwise).

冨岡 亮太(東大) 61 / 66

. . . . . .

Appendix

凸じゃない最適化は?

一般的に対処するのは困難

基本的な方針は凸最適化を繰り返し解くこと EM

Difference of Convex (DC) programming CCCP

冨岡 亮太(東大) 62 / 66

. . . . . .

Appendix

文献

全体的なまとめ

Tomioka, Suzuki, & Sugiyama (2011) Augmented Lagrangian Methods for Learning, Selecting, and Combining Features. In Sra, Nowozin, Wright., editors,Optimization for Machine Learning, MIT Press.

Combettes & Pesquet (2010) Proximal splitting methods in signal processing. InFixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer-Verlag.

Boyd, Parikh, Peleato, & Eckstein (2010) Distributed optimization and statistical learning via the alternating direction method of multipliers.

教科書

Rockafellar (1970) Convex Analysis. Princeton University Press.

Bertsekas (1999) Nonlinear Programming. Athena Scientific.

Nesterov (2003) Introductory Lectures on Convex Optimization: A Basic Course. Springer.

. . . . . .

Appendix

文献

Proximal Point Algorithm/Augmented Lagrangian

Arrow, Hurwicz, & Uzawa (1958) Studies in Linear and Non-Linear Programming. Stanford University Press.

Moreau (1965) Proximité et dualité dans un espace Hilbertien. Bul letin de la S. M. F.

Rockafellar (1976) Monotone operators and the proximal point algorithm.

SIAM J Control Optim 14, 877–898.

Bertsekas (1982) Constrained Optimization and Lagrange Multiplier Methods. Academic Press.

Tomioka, Suzuki, & Sugiyama (2011) Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparse Learning. Arxiv:0911.4046.

IST/FISTA

Nesterov (2007) Gradient Methods for Minimizing Composite Objective Function.

Beck & Teboulle (2009) A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J Imag Sci 2, 183–202.

冨岡 亮太(東大) 64 / 66

. . . . . .

Appendix

文献

Operator Splitting

Lions & Mercier (1976) Splitting Algorithms for the Sum of Two Nonlinear Operators. SIAM J Numer Anal 16, 964–979.

Gabay & Mercier (1976) A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl 2, 17–40.

Eckstein & Bertsekas (1992) On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators.

マルチタスク/マルチカーネル

Evgeniou, Micchelli, & Pontil (2005) Learning Multiple Tasks with Kernel Methods. JMLR 6, 615–637.

Bach, Thibaux, & Jordan (2005) Computing regularization paths for learning multiple kernels. Advances in NIPS, 73–80.

Suzuki & Tomioka (2009) SpicyMKL. Arxiv:0909.5026.

関連したドキュメント