• 検索結果がありません。

2010/12/10: On the Extension of Trace Norm to Tensors

N/A
N/A
Protected

Academic year: 2021

シェア "2010/12/10: On the Extension of Trace Norm to Tensors"

Copied!
16
0
0

読み込み中.... (全文を見る)

全文

(1)

On the extension of trace norm to tensors

Ryota Tomioka1, Kohei Hayashi2, Hisashi Kashima1

1The University of Tokyo

2Nara Institute of Science and Technology

2010/12/10

(2)

Convex low-rank tensor completion Sensors Time Featur es Factors (loadings) Core (interactions) I1 I2 I3 I1 I2 I3 r1 r2 r3 r1 r2 r3 Tucker decomposition

(3)

Conventional formulation (nonconvex) mode-k product observation minimize C,U1,U2,U3 !Ω ◦ (Y − C ×1 U 1 ×2 U2 ×3 U3) ! 2 F + regularization. minimize X !Ω ◦ (Y − X ) ! 2 F s.t. rank(X ) ≤ (r1, r2, r3).

Alternate minimization

(4)

Our approach

Matrix

Estimation of low-rank matrix (hard) Trace norm minimization (tractable)

[Fazel, Hindi, Boyd 01]

Tensor

Estimation of

low-rank tensor

(hard)

Rank defined in the sense of Tucker decomposition Extended trace norm minimization (tractable) Generalization

(5)

Trace norm regularization (for matrices) X ∈ RR×C , m = min(R, C) Linear sum of singular-values • Roughly speaking, L1 regulariza6on on the singular‐values. • Stronger regulariza6on ‐‐> more zero singular‐values     ‐‐>  low rank. • Not obvious for tensors (no singular‐values for tensors)

!X!

tr

=

m

!

j=1

σ

j

(X)

(6)

Mode-k unfolding (matricization) I1 I2 I3 I1 I2 I2 I2 I3 Mode-1 unfolding I1 I2 I3 I2 I3 I3 I3 I1 Mode-2 unfolding X (1) X (2)

(7)

Elementary facts about Tucker decomposition

X = C ×

1

U

1

×

2

U

2

×

3

U

3 r1 r2 r3

C

I1 r

U

1 1 I2 r2

U

2 I3 r3

U

3 Mode-1 unfolding Mode-2 unfolding Mode-3 unfolding X(1) = U 1C(1)(U 3 ⊗ U 2)! X(2) = U 2C(2)(U 1 ⊗ U 3)! X(3) = U 3C(3)(U 2 ⊗ U 1)! rank ≦ r1 rank ≦ r2 rank ≦ r3 The rank of X(k) is no more than the rank of C(k)

(8)

What it means • We can use the trace norm of an unfolding of a  tensor X to learn low‐rank X.

Tensor X is

low-rank

∃k, r

k

<I

k

Unfolding X

k

is

a low-rank

matrix

Matricization Tensorization

(9)

Approach 1: As a matrix • Pick a mode k, and hope that the tensor to be  learned is low rank in mode k. minimize X ∈RI1×···×IK 1 2λ!Ω ◦ (Y − X ) !2F + !X(k)!∗,

Pro: Basically a matrix problem

→ Theoretical guarantee (Candes & Recht 09) Con: Have to be lucky to pick the right mode.

(10)

Approach 2: Constrained optimization • Constrain so that each unfolding of X is  simultaneously low rank. minimize X ∈RI1×···×IK 1 2λ!Ω ◦ (Y − X ) !2F + K ! k=1 γk!X(k)!.

γk: tuning parameter usually set to 1.

Pro: Jointly regularize every mode

Con: Strong constraint

(11)

Approach 3: Mixture of low-rank tensors • Each mixture component Zk is regularized to be  low‐rank only in mode‐k. minimize Z1,...,ZK 1 ! ! ! ! !Ω ◦ " Y − #K k=1 Zk $!! ! ! ! 2 F + K # k=1 γk#Zk(k)#,

Pro: Each Zk takes care of each mode

(12)

Numerical experiment • True tensor: Size 50x50x20, rank 7x8x9. No noise (λ=0). • Random train/test split. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10−4 10−2 100 102

Fraction of observed elements

Generalization error As a Matrix (mode 1) As a Matrix (mode 2) As a Matrix (mode 3) Constraint Mixture Tucker (large) Tucker (exact) Optimization tolerance Tucker=EM algo (nonconvex)

(13)

Computation time • Convex formula6on is also fast 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 10 20 30 40 50

Fraction of observed elements

Computation time (s) As a Matrix Constraint Mixture Tucker (large) Tucker (exact)

(14)

Phase transition behaviour 0 20 40 60 80 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 [10 12 13] [10 12 9] [16 6 8] [17 13 15] [20 20 5] [20 3 2] [30 12 10] [32 3 4] [40 20 6] [40 3 2] [40 9 7] [4 42 3] [5 4 3] [7 8 15] [7 8 9] [8 5 6]

Sum of true ranks

Fraction required for perfect reconstruction

(15)

Summary • Low‐rank tensor comple6on can be computed in a convex  op6miza6on problem using the trace norm of the unfoldings. ‐ No need to specify the rank beforehand. • Convex formula6on is more accurate and faster than  conven6onal EM‐based Tucker decomposi6on. • Curious “phase transi6on” found → compressive‐sensing‐ type analysis is an on‐going work. • Technical report: Arxiv:1010.0789 (including op6miza6on) • Code: ‐ h]p://www.ibis.t.u‐tokyo.ac.jp/RyotaTomioka/So_wares/Tensor

(16)

Acknowledgment

• This work was supported in part by MEXT KAKENHI  22700138, 22700289, and NTT Communica6on 

参照

関連したドキュメント

As an application, in Section 5 we will use the former mirror coupling to give a unifying proof of Chavel’s conjecture on the domain monotonicity of the Neumann heat kernel for

In particular, in view of the results of Guillemin [16] [17], this means that on Toeplitz operators T Q of order ≤ −n, the Dixmier trace Tr ω T Q coincides with the residual trace

Section 4 will be devoted to approximation results which allow us to overcome the difficulties which arise on time derivatives while in Section 5, we look at, as an application of

Then it follows immediately from a suitable version of “Hensel’s Lemma” [cf., e.g., the argument of [4], Lemma 2.1] that S may be obtained, as the notation suggests, as the m A

Applications of msets in Logic Programming languages is found to over- come “computational inefficiency” inherent in otherwise situation, especially in solving a sweep of

In Section 3 using the method of level sets, we show integral inequalities comparing some weighted Sobolev norm of a function with a corresponding norm of its symmetric

In Section 7, we state and prove various local and global estimates for the second basic problem.. In Section 8, we prove the trace estimate for the second

Shi, “The essential norm of a composition operator on the Bloch space in polydiscs,” Chinese Journal of Contemporary Mathematics, vol. Chen, “Weighted composition operators from Fp,