• 検索結果がありません。

B. Proof of the main theorem

N/A
N/A
Protected

Academic year: 2021

シェア "B. Proof of the main theorem"

Copied!
12
0
0

読み込み中.... (全文を見る)

全文

(1)

Supplementary Material:

Stochastic Dual Coordinate Descent with Alternating Direction Multiplier Method

Taiji Suzuki S-TAIJI@IS.TITECH.AC.JP

Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo 152-8552, JAPAN

A. Derivation of the proximal operation for the smoothed hinge loss

By the definition of the smoothed hinge loss, we have that, for−1≤yiv≤0, fi(v) = sup

u∈R

{uv−fi(u)}= sup

u∈R

uv−1

2(1−yiu)2

= sup

u∈R

1

2(1 +yiv)2−1 2−1

2(1 +yiv−yiu)2

=1

2(1 +yiv)2−1 2, andfi(v) =∞otherwise.

Since

fi(v) C +1

2(q−v)2= ( 1

2C(1 +yiv)22C1 +12(q−v)2 (−1≤yiv≤0),

∞ (otherwise),

=



1+C 2C

v+y1+Ci−qC2

+v

2(yi−qC)2

2C(1+C) +q22 (−1≤yiv≤0),

∞ (otherwise).

Thus by minimizing this with respect tov, we have that

prox(u|fi/C) =





Cu−yi

1+C (−1≤ Cuy1+Ci−1 ≤0),

−yi (−1> Cuy1+Ci−1), 0 (otherwise).

B. Proof of the main theorem

In the section, we give the proofs of the theorems in the main body. For notational simplicity, we rewrite the dual problem as follows:

x∈Xmin,y∈Y

Xn i=1

gi(xi) +φ(y), (S-1a)

s.t. Zx+By= 0, (S-1b)

whereZ ∈ Rp×n,B ∈ Rp×d. This is equivalent to the dual optimization problem in the main text whengi = fi and φ=nψ(·/n)(or equivalentlyφ=nψ). We writeg(x) =Pn

i=1gi(xi).

(2)

Then we consider the following update rule:

y(t)←arg min

y

φ(y)− hw(t−1), Zx(t−1)+Byi+ρ

2kZx(t−1)+Byk2+1

2ky−y(t−1)kQ

x(t)i ←arg min

xI

X

i∈I

gi(xi)− hw(t−1), ZIxI +By(t)i+ρ

2kZIxI+Z\Ix(t−1)\I +By(t)k2+1

2kxI−x(t−1)I kGii

w(t)=w(t−1)−γρ{n(Zx(t)+By(t))−(n−n/K)(Zx(t−1)+By(t−1))}.

Assumption 1 can be interpreted as follows. There is an optimal solution(x, y)and corresponding Lagrange multiplier wsuch that

∂g(x)∋Zw, ∂φ(y)∋Bw.

We denote by∇f(x)an arbitrary element of the subgradient∂f(x)of a convex functionf atx. Moreover, we suppose that each (dual) loss functiongiisv-strongly convex andφish-smooth:

gi(xi)−gi(xi)≥ hzi w, xi−xii+vkxi−xik2

2 .

We also assume that there exithandvφsuch that, for ally, uand ally ∈ Y, there exitsyb ∈ Ywhich depends ony and we have

φ(y)−φ(y)≥ hBw, y−yi+vφ

2 kPKer(B)(y−y)k2, φ(u)−φ(Bw)≥ hy, u−Bwi+h

2ku−Bwk2.

Note that the primal and dual are flipped compared with the main text. Once can check that there is a correspondence betweenvψ, hin the main text andvφ andhsuch thatvφ =vnψ andh=nh.

Define

F(x, y) :=

Xn i=1

gi(xi) +φ(y)− hw, Zx+Byi (=nFD(x, y)).

By the definition ofw, one can easily check that

F(x, y)−F(x, y)≥ nv

2 kx−xk2≥0.

We define

R(x, y, w)

=F(x, y)−F(x, y) +2

ρkw(t)−wk2+ρ(1−γ)

2 kZx+Byk2+1

2kx−xk2vIp+H+ 1

2Kky− Yk2Q. Here again we have that R = nRD. Let nˆ = n/K, the expected cardinality of |I|, and let DiagI(S)be a block diagonal matrix whoseIk×Ik (k = 1, . . . , K)diagonal elements are non-zero and given by(Diag(S))Ik,Ik = SIk,Ik

(k= 1, . . . , K).

Theorem 2. Suppose thatγ= 4n1,DiagI(G)≻2γρ(n−n)Diagˆ I(ZZ)andBis injective. Then, under the assump- tions, the objective function converges R-linearly:

R(x(t), y(t), w(t))≤ 1− µ

K T

R(x(0), y(0), w(0)), E[F(x(t), y(t))−F(x, y)]≤

1− µ K

T

R(x(0), y(0), w(0)), where

µ:= min (1

2

v v+σmax(H)

, hρσmin(BB)

2 max{1,4hρ,4hσmax(Q)}, Kvφ

max(Q), Kvσmin(BB) σmax(Q)(ρσmax(ZZ) + 4v)

) ,

(3)

In particular, we have that

E[kw(t)−wk2]≤ ρ 2

1− µ K

T

R(x(0), y(0), w(0)).

Theorem 1 in the main text can be obtained using the relationvφ = vnφ,h = nh, F = nFD andR = nRD. The convergence of the primal objective is obtained by using the following fact: Sincegis strongly convex aroundx, we have that

g(x)−g(x)≥ hZw, x−xi+vkx−xk2

2 (∀x)

⇒g(u)≤g(Zw) +hx, u−Zwi+ku−Zwk2 2v (∀u).

where we usedZλ∈∂g(x). Using this, we have that, 1

n Xn i=1

fi(ziw(t))−1 n

Xn i=1

fi(ziw)≤D

Zx/n, w(t)−wE

+kZ(w(t)−w)k2 2nv

=D

−y/n, B(w(t)−w)E

+kZ(w(t)−w)k2

2nv ,

where we used the relationZx+By= 0. Moreover, using the relationψ(Bw)≤ψ(Bw) +hy/n, B(w−w)i+ l1kw−wk+l2kw−wk2and the Jensen’s inequalityE[kw(T)−wk]2≤E[kw(T)−wk2], we obtain the assertion.

Proof of Theorem 2.

Step 1(Deriving a basic inequality):

g(x(t))−g(x(t−1)) +φ(y(t))−φ(y(t−1))

=X

i∈I

gi(x(t)i )−X

i∈I

gi(x(t−1)i ) +φ(y(t))−φ(y(t−1))

=X

i∈I

gi(x(t)i )− hw(t−1), Zx(t)+By(t)i+ρ

2kZx(t)+By(t)k2+1

2kx(t)I −x(t−1)I k2GI,I +hw(t−1), Zx(t)+By(t)i −ρ

2kZx(t)+By(t)k2−1

2kx(t)I −x(t−1)I k2GI,I

−X

i∈I

gi(x(t−1)i ) +φ(y(t))−φ(y(t−1)). (S-2)

Here we define thatZ˜I = [Z\IZI]andx˜:=

"

x(t−1)\I xI

#

for a givenxI, and

˜

gI(xI) :=X

i∈I

gi(xi)−D

w(t−1),Z˜Ix˜+By(t)E +ρ

2kZ˜Ix˜+By(t)k2+1

2kxI−x(t−1)I k2GI,I.

Then by the update rule ofx(t), we have that

˜

gI(x(t)I )≤g˜I(xI)−v

2kx(t)I −xIk2−ρ

2kZI(x(t)I −xI)k2−1

2kx(t)I −xIkGI,I,

(4)

which implies X

i∈I

gi(x(t)i )−D

w(t−1), Zx(t)+By(t)E +ρ

2kZx(t)+By(t)k2+1

2kx(t)I −x(t−1)I k2GI,I

≤X

i∈I

gi(xi)−D

w(t−1), ZIxI+Z\Ix\I+By(t)E +ρ

2kZIxI +Z\Ix(t−1)\I +By(t)k2+1

2kxI−x(t−1)I k2GI,I

−v

2kx(t)I −xIk2−ρ

2kZI(x(t)I −xI)k2−1

2kx(t)I −xIkGI,I

=X

i∈I

gi(xi)−D

w(t−1), ZI(xI−x(t)I )E

−D

w(t−1), Zx(t)+By(t)E +ρ

2kZIxI+Z\Ix(t−1)\I +By(t)k2−ρ

2kZx(t)+By(t)k2

2kZx(t)+By(t)k2+1

2kxI−x(t−1)I k2GI,I

−v

2kx(t)I −xIk2−ρ

2kZI(x(t)I −xI)k2−1

2kx(t)I −xIkGI,I

=X

i∈I

gi(xi)−D

w(t−1), ZI(xI−x(t)I )E

−v

2kx(t)I −xIk2−ρ

2kZI(x(t)I −xI)k2−1

2kx(t)I −xIkGI,I

−ρhZ\Ix(t)\I +By(t), ZI(x(t)I −xI)i+ρ

2kZIxIk2−ρ

2kZIx(t)I k2+1

2kxI−x(t−1)I k2GI,I

−D

w(t−1), Zx(t)+By(t)E +ρ

2kZx(t−1)+By(t)k2. Using this, the RHS of Eq. (S-2) can be further bounded by

(RHS)≤X

i∈I

gi(xi)−X

i∈I

gi(x(t−1)i )− hw(t−1), ZI(xI −x(t)I )i

−v

2kx(t)I −xIk2−ρ

2kZI(x(t)I −xI)k2−1

2kx(t)I −xIkGI,I

−ρhZ\Ix(t)\I +By(t), ZI(x(t)I −xI)i+ρ

2kZIxIk2−ρ

2kZIx(t)I k2 +1

2kxI −x(t−1)I k2GI,I−1

2kx(t)I −x(t−1)I k2GI,I

+φ(y(t))−φ(y(t−1)). (S-3)

Here, we bound the term

−ρhZ\ix(t)\i +By(t), ZI(x(t)I −xI)i+ρ

2kZIxIk2−ρ

2kZIx(t)I k2. By Lemma 3, the expectation of this term is equivalent to

Eh

−ρ

nhZx(t−1)+By(t), Z(nx(t)−(n−ˆn)x(t−1)−nxˆ )ii + ρ

2Kkx(t−1)−xk2Diag

I(ZZ)−ρ 2Eh

kx(t)−x(t−1)k2Diag

I(ZZ)

i.

Note that, for any block diagonal matrixSwhich satisfiesSIk,Ik′ = (Si,j)(i,j)∈Ik×Ik′ =O(∀k6=k), we have that E[kx(t)I −xIk2SI,I] = E[kx(t)I −x(t−1)I +x(t−1)I −xIk2SI,I]

= E[kx(t)I −x(t−1)I k2SI,I] + E[2hx(t)I −x(t−1)I , x(t−1)I −xIiSI,I] + E[kx(t−1)I −xIk2SI,I]

= E[kx(t)−x(t−1)k2S] + E[2hx(t)−x(t−1), x(t−1)−xiS] + 1

Kkx(t−1)−xk2S

= E[kx(t)−xk2S]−E[kx(t−1)−xk2S] + 1

Kkx(t−1)−xk2S

= E[kx(t)−xk2S]−

1− 1 K

E[kx(t−1)−xk2S],

(5)

where the expectation is taken with respect to the choice ofI∈ {I1, . . . , IK}. Moreover, for a fixed vectorq, we have that E[hqI, x(t)I −xIi]

=E[hqI, x(t)I −x(t−1)I +x(t−1)I −xIi] = E[hq, x(t)−x(t−1)i] + E[hqI, x(t−1)I −xIi]

=E[hq, x(t)−x(t−1)i] + E

" K X

k=1

1[I=Ik]hqIk, x(t−1)Ik −xIki

#

=E[hq, x(t)−x(t−1)i] + 1 K

XK k=1

1[I=Ik]hqIk, x(t−1)Ik −xIki= E[hq, x(t)−x(t−1)i] + 1

Khq, x(t−1)−xi

=E

q, x(t)

1− 1 K

x(t−1)− 1 Kx

.

Then, by taking expectation with respect toIand multiplying both sides of the above inequality byn, we have that nE[g(x(t)) +φ(y(t))−g(x(t−1))−φ(y(t−1))]

≤g(x)−g(x(t−1)) + E[hw(t−1), Z(nx(t)−(n−n)xˆ (t−1)−nxˆ )i]

−Ehnv

2 kx(t)−xk2+nρ

2 kx(t)−xk2Diag

I(ZZ)+n

2kx(t)−xk2Diag

I(G)

i +(n−n)vˆ

2 kx(t−1)−xk2] + (n−ˆn)ρ

2 kx(t−1)−xk2Diag

I(ZZ)+n−nˆ

2 kx(t−1)−xk2Diag

I(G)

+ Eh

−ρhZx(t−1)+By(t), Z(nx(t)−(n−n)xˆ (t−1)−ˆnx)ii +ρˆn

2 kx(t−1)−xk2Diag

I(ZZ)−nρ 2 Eh

kx(t)−x(t−1)k2Diag

I(ZZ)

i +nˆ

2kx(t−1)−xk2Diag

I(G)−n

2E[kx(t)−x(t−1)k2Diag

I(G)]

+nφ(y(t))−nφ(y(t−1)). (S-4)

Here, note that the last two termnφ(y(t))−nφ(y(t−1))is bounded as nφ(y(t))−nφ(y(t−1))

=ˆn(φ(y(t))−φ(y(t−1))) + (n−n)(φ(yˆ (t))−φ(y(t−1)))

≤ˆn(φ(y)−φ(y(t−1))) +D

∇φ(y(t)),(n−n)(yˆ (t)−y(t−1)) + ˆn(y(t)−y)E

−nhˆ

2 kBw− ∇φ(y(t))k2.

for arbitraryy∈ Ywhere we used Lemma 4 in the last line. Define

˜

w(t):=w(t−1)−ρ(Zx(t−1)+By(t)).

Note thatB(t)−Q(y(t)−y(t−1))∈∂φ(y(t)).

Next, addingE[nhw, Z(x(t−1)−x(t)) +B(y(t−1)−y(t))i]to the both sides of Eq. (S-4), we have that nE[F(x(t), y(t))−F(x(t−1), y(t−1))]

≤ˆn(F(x, y)−F(x(t−1), y(t−1)))

+ E[hw(t−1)−w, Z(nx(t)−(n−ˆn)x(t−1)−nxˆ )i]

+ E[hw˜(t)−w, B(ny(t)−(n−n)yˆ (t−1)−nyˆ )i]

− hQ(y(t)−y(t−1)), ny(t)−(n−n)yˆ (t−1)−nyˆ i

−Ehnv

2 kx(t)−xk2+n

2kx(t)−xk2Hi

(6)

+(n−n)vˆ

2 kx(t−1)−xk2+n

2kx(t−1)−xk2H + Eh

−ρhZx(t−1)+By(t), Z(nx(t)−(n−ˆn)x(t−1)−nxˆ )ii

−n 2Eh

kx(t)−x(t−1)k2Hi

−ˆnh

2 kBw− ∇φ(y(t))k2. (S-5) Step 2(Rearranging cross terms between(x(t), y(t), w(t))and(x(t−1), y(t−1), w(t−1))):

Now, we definexˆ(t):=nx(t)−(n−n)xˆ (t−1)andyˆ(t):=ny(t)−(n−n)yˆ (t−1). Then by the update rule ofw(t), we have thatw(t)=w(t−1)−γρ(Zxˆ(t)+Byˆ(t)). We evaluate the termE[hw(t−1)−w, Z(ˆx(t)−ˆnx)i] + E[hw˜(t)−w, B(ˆy(t)− ˆ

ny)i]:

hw(t−1)−w, Z(ˆx(t)−ˆnx)i+hw˜(t)−w, B(ˆy(t)−nyˆ )i

=hw(t−1)−w, Z(ˆx(t)−ˆnx)i+hw(t−1)−ρ(Zx(t−1)+By(t))−w, B(ˆy(t)−nyˆ )i

=hw(t)+γρ(Zxˆ(t)+Byˆ(t))−w, Z(ˆx(t)−ˆnx)i

+hw(t)+γρ(Zxˆ(t)+Byˆ(t))−ρ(Zx(t−1)+By(t))−w, B(ˆy(t)−nyˆ )i

=− 1

γρhw(t)−w, w(t)−w(t−1)i

+γρkZxˆ(t)+Byˆ(t)k2−ρhZx(t−1)+By(t), B(ˆy(t)−nyˆ )i

=− 1 2γρ

kw(t)−wk2+kw(t)−w(t−1)k2− kw(t−1)−wk2 +γρkZxˆ(t)+Byˆ(t)k2−ρhZx(t−1)+By(t), B(ˆy(t)−nyˆ )i

= 1 2γρ

−kw(t)−wk2+kw(t−1)−wk2 +γρ

2 kZxˆ(t)+Byˆ(t)k2

−ρhZx(t−1)+By(t), B(ˆy(t)−nyˆ )i.

Therefore,

hw(t−1)−w, Z(ˆx(t)−nxˆ ) +hw˜(t)−w, B(ˆy(t)−nyˆ )i

−ρhZx(t−1)+By(t), Z(nx(t)−(n−n)xˆ (t−1)−ˆnx)i

= 1 2γρ

−kw(t)−wk2+kw(t−1)−wk2 +γρ

2 kZˆx(t)+Byˆ(t)k2

−ρhZx(t−1)+By(t), Zxˆ(t)+Byˆ(t)i

= 1 2γρ

−kw(t)−wk2+kw(t−1)−wk2 +γρ

2 n2kZx(t)+By(t)k2+γρ

2 (n−n)ˆ 2kZx(t−1)+By(t−1)k2

−γρn(n−n)hZxˆ (t)+By(t), Zx(t−1)+By(t−1)i

−ρhZx(t−1)+By(t), Z(nx(t)−(n−n)xˆ (t−1)) +B(ny(t)−(n−n)yˆ (t−1))i.

Next, we expand the non-squared term:

−γρn(n−n)hZxˆ (t)+By(t), Zx(t−1)+By(t−1)i

−ρhZx(t−1)+By(t), Z(nx(t)−(n−n)xˆ (t−1)) +B(ny(t)−(n−n)yˆ (t−1))i

=−γρn(n−n)hZxˆ (t)−Zx, Zx(t−1)−Zxi

−γρn(n−n)hByˆ (t)−By, By(t−1)−Byi

−γρn(n−n)hZxˆ (t)−Zx, By(t−1)−Byi

−γρn(n−n)hByˆ (t)−By, Zx(t−1)−Zxi

(7)

−nρhZx(t−1)−Zx, Z(x(t)−x)i+ (n−ˆn)ρkZx(t−1)−Zxk2 + (n−n)ρhByˆ (t)−By, B(y(t−1)−y)i −nρkBy(t)−Byk2

−ρhZx(t−1)−Zx, B(ny(t)−(n−n)yˆ (t−1)−nyˆ )i

−ρhB(y(t)−y), Z(nx(t)−(n−n)xˆ (t−1)−nxˆ )i

=−(γρn(n−n) +ˆ nρ)hZx(t)−Zx, Zx(t−1)−Zxi

−(γρn(n−n)ˆ −(n−n)ρ)hByˆ (t)−By, By(t−1)−Byi

−γρn(n−n)hZxˆ (t)−Zx, By(t−1)−Byi

−(γρn(n−n) +ˆ nρ−(n−n)ρ)hByˆ (t)−By, Zx(t−1)−Zxi + (n−n)ρkZxˆ (t−1)−Zxk2−nρkBy(t)−Byk2

−ρ(n−ˆn)hZx(t−1)−Zx, B(y−y(t−1))i

−ρnhB(y(t)−y), Z(x(t)−x)i. (S-6)

Using the relation

hZx(t)−Zx, By(t−1)−Byi=hZ(x(t)−x), B(y(t)−y)i+hZ(x(t)−x), B(y(t−1)−y(t))i, hBy(t)−By, Zx(t−1)−Zxi=hB(y(t)−y(t−1)), Z(x(t−1)−x)i+hB(y(t−1)−y), Z(x(t−1)−x)i, the RHS of Eq. (S-6) is equivalent to

−(γρn(n−n) +ˆ nρ)hZx(t)−Zx, Zx(t−1)−Zxi

−(γρn(n−n)ˆ −(n−n)ρ)hByˆ (t)−By, By(t−1)−Byi + (n−n)ρkZxˆ (t−1)−Zxk2−nρkBy(t)−Byk2

+{−(γρn(n−n) +ˆ ρˆn) +ρ(n−n)}hZxˆ (t−1)−Zx, B(y(t−1)−y)i

−(γρn(n−n) +ˆ ρn)hB(y(t)−y), Z(x(t)−x)i

−γρn(n−n)hZ(xˆ (t)−x), B(y(t−1)−y(t))i

−(γρn(n−n) +ˆ ρˆn)hBy(t)−By(t−1), Zx(t−1)−Zxi.

The last two terms are transformed to

−γρn(n−n)hZˆ (x(t)−x), B(y(t−1)−y(t))i

−(γρn(n−ˆn) +ρˆn)hBy(t)−By(t−1), Zx(t−1)−Zxi

=γρn(n−ˆn)hZ(x(t)−x(t−1)), B(y(t)−y(t−1))i

−ρˆnhBy(t)−By, Zx(t−1)−Zxi+ρˆnhBy(t−1)−By, Zx(t−1)−Zxi.

Thus, the RHS of Eq. (S-6) is further transformed to

−(γρn(n−n) +ˆ nρ)hZx(t)−Zx, Zx(t−1)−Zxi

−(γρn(n−n)ˆ −(n−ˆn)ρ)hBy(t)−By, By(t−1)−Byi + (n−n)ρkZxˆ (t−1)−Zxk2−nρkBy(t)−Byk2

+{−γρn(n−n) +ˆ ρ(n−ˆn)}hZx(t−1)−Zx, B(y(t−1)−y)i

−(γρn(n−n) +ˆ ρn)hB(y(t)−y), Z(x(t)−x)i +γρn(n−ˆn)hZ(x(t)−x(t−1)), B(y(t)−y(t−1))i

−ρˆnhBy(t)−By, Zx(t−1)−Zxi.

(8)

By Lemma 5 andZx=−By, this is equivalent to

−1

2(γρn(n−n) +ˆ nρ){kZx(t)−Zxk2+kZx(t−1)−Zxk2− kZx(t)−Zx(t−1)k2}

−1

2(γρn(n−n)ˆ −(n−ˆn)ρ){kBy(t)−Byk2+kBy(t−1)−Byk2− kBy(t)−By(t−1)k2} + (n−n)ρkZxˆ (t−1)−Zxk2−nρkBy(t)−Byk2

−1

2{−γρn(n−n) +ˆ ρ(n−ˆn)}(kZx(t−1)−Zxk2+kB(y(t−1)−y)k2− kZx(t−1)+By(t−1)k2) +1

2(γρn(n−n) +ˆ ρn)(kZx(t)−Zxk2+kB(y(t)−y)k2− kZx(t)+By(t)k2) +γρn(n−ˆn)hZ(x(t)−x(t−1)), B(y(t)−y(t−1))i

−ρˆnhBy(t)−By, Zx(t−1)−Zxi

=−ρˆn

2 kZx(t−1)−Zxk2+1

2(γρn(n−n) +ˆ nρ)kZx(t)−Zx(t−1)k2

−ρˆn

2 kBy(t)−Byk2+1

2(γρn(n−n)ˆ −(n−n)ρ)kByˆ (t)−By(t−1)k2

−1

2{γρn(n−n)ˆ −ρ(n−n)}kZxˆ (t−1)+By(t−1)k2

−1

2{γρn(n−n) +ˆ ρn}kZx(t)+By(t)k2

+γρn(n−ˆn)hZ(x(t)−x(t−1)), B(y(t)−y(t−1))i

−nρhByˆ (t)−By, Zx(t−1)−Zxi

=−nρˆ

2 kZx(t−1)+By(t)k2 +1

2(γρn(n−n) +ˆ nρ)kZx(t)−Zx(t−1)k2 +1

2(γρn(n−n)ˆ −(n−ˆn)ρ)kBy(t)−By(t−1)k2

−1

2{γρn(n−n)ˆ −ρ(n−n)}kZxˆ (t−1)+By(t−1)k2

−1

2{γρn(n−n) +ˆ ρn}kZx(t)+By(t)k2

+γρn(n−ˆn)hZ(x(t)−x(t−1)), B(y(t)−y(t−1))i. (S-7) Since

γρn(n−ˆn)hZ(x(t)−x(t−1)), B(y(t)−y(t−1))i

≤γρn(n−n)ˆ

2 {kZ(x(t)−x(t−1))k2+kB(y(t)−y(t−1))k2}, the RHS of Eq. (S-7) is bounded by

−ˆnρ

2 kZx(t−1)+By(t)k2 +1

2(2γρn(n−ˆn) +nρ)kZx(t)−Zx(t−1)k2 +1

2(2γρn(n−ˆn)−(n−ˆn)ρ)kBy(t)−By(t−1)k2

−1

2{γρn(n−n)ˆ −ρ(n−ˆn)}kZx(t−1)+By(t−1)k2−1

2{γρn(n−n) +ˆ ρn}kZx(t)+By(t)k2. Combining this and Eq. (S-5), and noticingkZx(t)−Zx(t−1)k=kZI(x(t)I −x(t−1)I )k=kx(t)−x(t−1)kDiag

I(ZZ), we

(9)

obtain

nE[F(x(t), y(t))−F(x(t−1), y(t−1))]

≤ˆn(F(x, y)−F(x(t−1), y(t−1))) + 1

2γρ

−kw(t)−wk2+kw(t−1)−wk2

−nρˆ

2 kZx(t−1)+By(t)k2 +1

2{γρn2−γρn(n−ˆn)−ρn}kZx(t)+By(t)k2 +1

2{γρ(n−n)ˆ 2−γρn(n−n) +ˆ ρ(n−n)}kZxˆ (t−1)+By(t−1)k2

−Ehnv

2 kx(t)−xk2+n

2kx(t)−xk2Hi +(n−n)vˆ

2 kx(t−1)−xk2+n

2kx(t−1)−xk2H +γρn(n−n)Eˆ h

kx(t)−x(t−1)k2Diag

I(ZZ)

i−n

2E[kx(t)−x(t−1)k2Diag

I(G)] + (γρn(n−ˆn)−(n−n)ρˆ

2 )kB(y(t)−y(t−1))k2

− hQ(y(t)−y(t−1)), ny(t)−(n−ˆn)y(t−1)−nyˆ i

−nhˆ

2 kBw− ∇φ(y(t))k2.

Since we have assumedDiagI(G)≻2γρ(n−n)Diagˆ I(ZZ), it holds that γρn(n−n)Eˆ h

kx(t)−x(t−1)k2Diag

I(ZZ)

i

−n

2E[kx(t)−x(t−1)k2Diag

I(G)]≤0.

Moreover, we have that

− hQ(y(t)−y(t−1)), ny(t)−(n−n)yˆ (t−1)−nyˆ i

=−nky(t)−y(t−1)k2Q+1

2{ky(t)−y(t−1)k2Q+ky(t−1)−yk2Q− ky(t)−yk2Q}

=−

n−nˆ 2

ky(t)−y(t−1)k2Q+nˆ

2ky(t−1)−yk2Q−ˆn

2ky(t)−yk2Q. Finally, we achieve

nE[F(x(t), y(t))−F(x(t−1), y(t−1))]

≤ˆn(F(x, y)−F(x(t−1), y(t−1))) + 1

2γρ

−kw(t)−wk2+kw(t−1)−wk2

−ρn(1−γ)

2 kZx(t)+By(t)k2+ρ(n−n)(1 +ˆ γ)

2 kZx(t−1)+By(t−1)k2

−Ehnv

2 kx(t)−xk2+n

2kx(t)−xk2Hi +(n−n)vˆ

2 kx(t−1)−xk2+n

2kx(t−1)−xk2H +γρn(n−n)Eˆ h

kx(t)−x(t−1)k2Diag

I(ZZ)

i−n

2E[kx(t)−x(t−1)k2Diag

I(G)]

−nρˆ

2 kZx(t−1)+By(t)k2 + (γρn(n−ˆn)−(n−n)ρˆ

2 )kB(y(t)−y(t−1))k2

(10)

n−ˆn 2

ky(t)−y(t−1)k2Q+nˆ

2ky(t−1)−yk2Q−nˆ

2ky(t)−yk2Q

−nhˆ

2 kBw− ∇φ(y(t))k2. (S-8)

Note that Eq. (S-8) holds for arbitraryy∈ Y. Step 3:(Deriving the assertion)

(i) Now, since we can take∇φ(y(t)) =Bw(t−1)−ρ(Zx(t−1)+By(t))−Q(y(t)−y(t−1)), it holds that kBw− ∇φ(y(t))k2=kB(w−w(t−1))−ρ(Zx(t−1)+By(t))−Q(y(t)−y(t−1))k2. SinceBis injective, this gives that

−h

2kBw− ∇φ(y(t))k2

≤ −hσmin(BB)kw−w(t−1)k2+ 2hρ2kZx(t−1)+By(t)k2+ 2hkQ(y(t)−y(t−1))k2

≤ −hσmin(BB)kw−w(t−1)k2+ 2hρ2kZx(t−1)+By(t)k2+ 2hσmax(Q)ky(t)−y(t−1)k2Q. Now, dividing both sides bymax{1,4hρ,4hσmax(Q)}(≥1), we have

−nhˆ

2 kBw− ∇φ(y(t))k2

≤ − nhˆ σmin(BB)

max{1,4hρ,4hσmax(Q)}kw−w(t−1)k2+ˆnρ

2 kZx(t−1)+By(t)k2+nˆ

2ky(t)−y(t−1)k2Q. (S-9) (ii) Next, it holds that, for someyb∈ Y,

1 2

F(x, y)−F(x(t−1), y(t−1))

≤ −vφ

4 kPKer(B)(y(t−1)−yb)k2. (S-10) On the other hand, for arbitrarya >0, it follows that

−ρ

8kZx(t−1)+By(t−1)k2

≤ −1

8(1−a)kZ(x(t−1)−x)k2−1

8(1−a−1)kB(y(t−1)−by)k2. Thus, settinga= 1 +ρσmax2v(ZZ), we have that

−ρ

8kZx(t−1)+By(t−1)k2

≤ρ 8

2v

ρσmax(ZZ)σmax(ZZ)kx(t−1)−xk2

−ρ 8

2vρ

ρσmax(ZZ) + 4vσmin(BB)kPKer(B) (y(t−1)−yb)k2

=v

4kx(t−1)−xk2− vρσmin(BB)

4(ρσmax(ZZ) + 4v)kPKer(B) (y(t−1)−yb)k2. (S-11) Combining Eqs. (S-10), (S-11), we have that

ˆ n 2

F(x, y)−F(x(t−1), y(t−1))

−nρˆ

8nkZx(t−1)+By(t−1)k2

≤nvˆ

4 kx(t−1)−xk2−ˆn nmin

nvφ, nρvσmin(BB) ρσmax(ZZ) + 4v

ky(t−1)−byk2Qmax(Q)

≤nvˆ

4 kx(t−1)−xk2−ˆn nmin

nvφ, nρvσmin(BB) ρσmax(ZZ) + 4v

ky(t−1)− Yk2Q

max(Q) . (S-12)

(11)

(iii) By the assumptionDiagI(G)≻2γρ(n−n)Diagˆ I(ZZ), it holds that γρn(n−n)Eˆ h

kx(t)−x(t−1)k2Diag

I(ZZ)

i−n

2E[kx(t)−x(t−1)k2Diag

I(G)]≤0. (S-13) (iv) Therefore, ifγ= 4n1 , applying Eq. (S-9), Eq. (S-12) and Eq. (S-13) to Eq. (S-8), for

ν =nˆ nmin

(1 4

v v+σmax(H)

, hρσmin(BB)

2 max{1,4hρ,4hσmax(Q)}, nvφ/ˆn

max(Q), nvσmin(BB)/nˆ 4σmax(Q)(ρσmax(ZZ) + 4v)

) ,

we have that

Eh

F(x(t), y(t))−F(x, y) + 1

2nγρkw(t)−wk2 +ρ(1−γ)

2 kZx(t)+By(t)k2+1

2kx(t)−xk2vIp+H+ nˆ

2nky(t)−yk2Qi

≤(1−ν) (

F(x(t−1), y(t−1))−F(x, y) + 1

2nγρkw(t−1)−wk2 +ρ(1−γ)

2 kZx(t−1)+By(t−1)k2+1

2kx(t−1)−xk2vIp+H+ nˆ

2nky(t−1)−yk2Q )

.

Settingµ:=nν/ˆn, this gives the assertion.

Lemma 3.

Eh

−ρhZ\ix(t)\i +By(t), ZI(x(t)I −xI)i+ρ

2kZIxIk2−ρ

2kZIx(t)I k2i

≤Eh

−ρ

nhZx(t−1)+By(t), Z(nx(t)−(n−ˆn)x(t−1)−x)ii + ρ

2nkx(t−1)−xk2Diag

I(ZZ)−ρ 2Eh

kx(t)−x(t−1)k2Diag

I(ZZ)

i .

Proof.

ρhZ\Ix(t−1)\I , ZI(xI−x(t)I )i+ρhBy(t), ZI(xI−x(t)I )i+ρ

2kZIxIk2−ρ

2kZIx(t)I k2

=ρhZx(t−1), ZI(xI−x(t)I )i+ρhBy(t), ZI(xI−x(t)I )i+ρ

2kZIxIk2−ρ

2kZIx(t)I k2

−ρhZIx(t−1)i , ZI(xI−x(t)I )i

=ρhZx(t−1), ZI(xI−x(t−1)I +x(t−1)I −x(t)I )i+ρhBy(t), ZI(xI −x(t−1)I +x(t−1)I −x(t)I )i +ρ

2kZIxIk2−ρ

2kZIx(t)I k2−ρhZIx(t−1)I , ZI(xI−x(t)I )i

=ρhZx(t−1)+By(t), ZI(xI −x(t−1)I )i +ρ

2kZI(x(t−1)I −xI)k2−ρ

2kZI(x(t)I −x(t−1)I )k2 +ρhZx(t−1)+By(t), Z(x(t−1)−x(t))i.

参照

関連したドキュメント

Faculty of Engineering, Tokyo Institute of Technology, 2– 12–1–S3–5 Ookayama, Meguro-ku, Tokyo, 152–8552 Japan a) E-mail: [email protected]

Department of Physics, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8551, Japan. (Dated: October

Global Scientific Information and Computing Center, Tokyo Institute of Technology, 2–12–1 Ookayama, Meguro-ku, Tokyo, 152–8550 Japan..

Watabe, Shibaura Institute of Technology, Japan I.Volovich, Steklov, Mathematical Institute, Russia QBIC members in Tokyo University of Science Sponsor Tokyo University of Science

Abstract. In this article we expose a proof of the Canonical Decomposi- tion Theorem of irreducible 3-manifolds along tori and annuli, also known as JSJ Theorem. This proof will

Ozawa; A functional analysis proof of Gromov’s polynomial growth

[1] Nakayama, T., Support Theorem for mild solutions of SDE’s in Hilbert spaces, Journal of mathematical sciences, the University of Tokyo, 2004.

Department of Mathematical and Computing Sciences, Tokyo Institute of Technology Department of Mathematics, Ewha Women’s University Sunyoung