集中講義「深層学習および機械学習の数理」
レポート課題 締め切り:9月25日
2020-9-2
Taiji Suzuki 6th Laboratory of Mathematical Informatics (Room 352, Faculty of Engineering Building 6) e-mail: [email protected] 以下の問題のうち2問以上を選択して解答してください.レポートは日本語で構いません.
1. Consider the second order polynomial kernel k(x, y) = (1 +x⊤y)2 on R2. Construct a function having a formf(x) =∑m
j=1αjk(xj, x) withxj∈R2 andαj ∈R(j= 1, . . . , m) such that {
f(x)≤0 (∥x∥ ≤1), f(x)>0 (∥x∥>1), forx∈R2
2. Derive the dual problem of support vector machine by using Fenchel’s duality theorem:
min
α∈Rn
∑n i=1
max {
1−yi
∑n j=1
k(xi, xj)αj,0 }
+λ 2
∑n i=1
∑n j=1
αiαjk(xi, xj).
Fenchel’s duality theorem is given in Theorem 2.
3. Let k : Rd ×Rd → R be a characteristic kernel that has H as its corresponding RKHS. For a probability measureP, define
mP(x) =
∫
k(x, X)dP(X).
Assume thatk is transition invariant, i.e. there exitsψ such thatk(x, y) =ψ(x−y). Then show that, for any probability measuresP andQ,
∥mP −mQ∥H = 0 ⇔ P =Q.
Hint: Use Bochner’s theorem and the necessary and sufficient condition of positive definiteness.
4. For 1 ≤ q < ∞, let ∥w∥q := (∑d
i=1|wi|q)1/q for w ∈ Rd, and Hq := {f(x) = w⊤x (x ∈ Rd) |
∥w∥q ≤1, w∈Rd}. Givenx1, . . . , xn∈Rn, its empirical Rademacher complexity is denoted by
Rˆn(Hq) = Eσ
[ sup
f∈Hq
1 n
∑n i=1
σif(xi)
x1, . . . , xd
] .
Now, suppose that 1≤p, q <∞satisfiesq < p. Then, show that Rˆn(Hp)≤d1/p∗−1/q∗Rˆn(Hq) wherep∗ =p/(p−1) andq∗=q/(q−1).
5. Prove the Massart’s theorem: For a finite set of functions F ={f1, . . . , fM} where eachfi (i = 1, . . . , M) is a function fromRd to Rsatisfying supx|fi(x)| ≤R, it holds that
Rˆn(F) = Eσ
[ sup
f∈F
1 n
∑n i=1
σif(xi)
(xi)ni=1 ]
= Eσ
[
sup
f∈F∪{−f′|f′∈F}
1 n
∑n i=1
σif(xi)(xi)ni=1 ]
≤R
√2 log 2M
n .
Hint: You may use the following inequalities:
1
• Jensen’s inequality: exp(sEσ[g(σ1, . . . , σn)])≤Eσ[exp(sg(σ1, . . . , σn))] fors∈Randg:Rn → R.
• exp(maxf∈F′F(f))≤∑
f∈F′exp(F(f)) forF :F′ →R.
• Hoeffding’s inequality: Eσi[exp(σia)]≤exp(a2/2) fora∈R.
6. Suppose that∥x∥∞<1 (a.s.). Derive an upper bound of the Rademacher complexity of a neural network model:
F=
∑M j=1
αjη(a⊤jx)|αj ∈R, aj ∈Rd, max
1≤j≤M∥aj∥p≤C1, ∥α∥q ≤C2
where 1≤p, q≤ ∞andη(u) = max{u,0}.
7. Write your opinion about the future direction of machine learning and artificial intelligence tech- niques. Of course, you may discuss the future of deep learning.
Definition 1 (Legendre transform, convex conjugate) Let f : Rd →Rbe a convex function. Its convex conjugate functionf∗:Rd→R∪ {+∞} is defined by
f∗(y) = sup
x∈Rd{x⊤y−f(x)}. This transformationf 7→f∗ is called Legendre transformation.
Theorem 2 (Fenchel’s duality theorem (special case)) Letf :Rn →Randg:Rd →Rbe convex functions, and let A∈Rn×d. Then it holds that
inf
x∈Rd{f(Ax) +g(x)}=− inf
y∈Rn{f∗(y) +g∗(−A⊤y)} where f∗, g∗are the convex conjugate functions off andg respectively.
2