On the extension of trace norm to tensors
Ryota Tomioka1, Kohei Hayashi2, Hisashi Kashima1
1The University of Tokyo
2Nara Institute of Science and Technology
2010/12/10
Convex low-rank tensor completion Sensors Time Featur es Factors (loadings) Core (interactions) I1 I2 I3 I1 I2 I3 r1 r2 r3 r1 r2 r3 Tucker decomposition
Conventional formulation (nonconvex) mode-k product observation minimize C,U1,U2,U3 !Ω ◦ (Y − C ×1 U 1 ×2 U2 ×3 U3) ! 2 F + regularization. minimize X !Ω ◦ (Y − X ) ! 2 F s.t. rank(X ) ≤ (r1, r2, r3). •
Alternate minimization
Our approach
Matrix
Estimation of low-rank matrix (hard) Trace norm minimization (tractable)[Fazel, Hindi, Boyd 01]
Tensor
Estimation of
low-rank tensor
(hard)
Rank defined in the sense of Tucker decomposition Extended trace norm minimization (tractable) Generalization
Trace norm regularization (for matrices) X ∈ RR×C , m = min(R, C) Linear sum of singular-values • Roughly speaking, L1 regulariza6on on the singular‐values. • Stronger regulariza6on ‐‐> more zero singular‐values ‐‐> low rank. • Not obvious for tensors (no singular‐values for tensors)
!X!
tr=
m!
j=1σ
j(X)
Mode-k unfolding (matricization) I1 I2 I3 I1 I2 I2 I2 I3 Mode-1 unfolding I1 I2 I3 I2 I3 I3 I3 I1 Mode-2 unfolding X (1) X (2)
Elementary facts about Tucker decomposition
X = C ×
1U
1×
2U
2×
3U
3 r1 r2 r3C
I1 rU
1 1 I2 r2U
2 I3 r3U
3 Mode-1 unfolding Mode-2 unfolding Mode-3 unfolding X(1) = U 1C(1)(U 3 ⊗ U 2)! X(2) = U 2C(2)(U 1 ⊗ U 3)! X(3) = U 3C(3)(U 2 ⊗ U 1)! rank ≦ r1 rank ≦ r2 rank ≦ r3 The rank of X(k) is no more than the rank of C(k)What it means • We can use the trace norm of an unfolding of a tensor X to learn low‐rank X.
Tensor X is
low-rank
∃k, r
k<I
kUnfolding X
kis
a low-rank
matrix
Matricization TensorizationApproach 1: As a matrix • Pick a mode k, and hope that the tensor to be learned is low rank in mode k. minimize X ∈RI1×···×IK 1 2λ!Ω ◦ (Y − X ) !2F + !X(k)!∗,
Pro: Basically a matrix problem
→ Theoretical guarantee (Candes & Recht 09) Con: Have to be lucky to pick the right mode.
Approach 2: Constrained optimization • Constrain so that each unfolding of X is simultaneously low rank. minimize X ∈RI1×···×IK 1 2λ!Ω ◦ (Y − X ) !2F + K ! k=1 γk!X(k)!∗.
γk: tuning parameter usually set to 1.
Pro: Jointly regularize every mode
Con: Strong constraint
Approach 3: Mixture of low-rank tensors • Each mixture component Zk is regularized to be low‐rank only in mode‐k. minimize Z1,...,ZK 1 2λ ! ! ! ! !Ω ◦ " Y − #K k=1 Zk $!! ! ! ! 2 F + K # k=1 γk#Zk(k)#∗,
Pro: Each Zk takes care of each mode
Numerical experiment • True tensor: Size 50x50x20, rank 7x8x9. No noise (λ=0). • Random train/test split. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10−4 10−2 100 102
Fraction of observed elements
Generalization error As a Matrix (mode 1) As a Matrix (mode 2) As a Matrix (mode 3) Constraint Mixture Tucker (large) Tucker (exact) Optimization tolerance Tucker=EM algo (nonconvex)
Computation time • Convex formula6on is also fast 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 10 20 30 40 50
Fraction of observed elements
Computation time (s) As a Matrix Constraint Mixture Tucker (large) Tucker (exact)
Phase transition behaviour 0 20 40 60 80 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 [10 12 13] [10 12 9] [16 6 8] [17 13 15] [20 20 5] [20 3 2] [30 12 10] [32 3 4] [40 20 6] [40 3 2] [40 9 7] [4 42 3] [5 4 3] [7 8 15] [7 8 9] [8 5 6]
Sum of true ranks
Fraction required for perfect reconstruction
Summary • Low‐rank tensor comple6on can be computed in a convex op6miza6on problem using the trace norm of the unfoldings. ‐ No need to specify the rank beforehand. • Convex formula6on is more accurate and faster than conven6onal EM‐based Tucker decomposi6on. • Curious “phase transi6on” found → compressive‐sensing‐ type analysis is an on‐going work. • Technical report: Arxiv:1010.0789 (including op6miza6on) • Code: ‐ h]p://www.ibis.t.u‐tokyo.ac.jp/RyotaTomioka/So_wares/Tensor
Acknowledgment
• This work was supported in part by MEXT KAKENHI 22700138, 22700289, and NTT Communica6on