2010/12/10: On the Extension of Trace Norm to Tensors

(1)

On the extension of trace norm to tensors

Ryota Tomioka1_{, Kohei Hayashi}2_{, Hisashi Kashima}1

1_{The University of Tokyo}

2_{Nara Institute of Science and Technology}

2010/12/10

(2)

Convex low-rank tensor completion Sensors Time Featur es Factors (loadings) Core (interactions) I1 I2 I3 I1 I2 I3 r1 r2 r3 r1 r2 r3 Tucker decomposition

(3)

Conventional formulation (nonconvex) mode-k product observation minimize C,U1,U2,U3 !Ω ◦ (Y − C ×1 U 1 ×2 U2 ×3 U3) ! 2 F + regularization. minimize X !Ω ◦ (Y − X ) ! 2 F s.t. rank(X ) ≤ (r1, r2, r3). •

Alternate minimization

(4)

Our approach

Matrix

Estimation of low-rank matrix (hard) Trace norm minimization (tractable)

[Fazel, Hindi, Boyd 01]

Tensor

Estimation of

low-rank tensor

(hard)

Rank defined in the sense of Tucker decomposition Extended trace norm minimization (tractable) Generalization

(5)

Trace norm regularization (for matrices) X _{∈ R}R×C , m = min(R, C) Linear sum of singular-values • Roughly speaking, L1 regulariza6on on the singular‐values. • Stronger regulariza6on ‐‐> more zero singular‐values ‐‐> low rank. • Not obvious for tensors (no singular‐values for tensors)

!X!

tr

=

m

!

j=1

σ

_j

(X)

(6)

Mode-k unfolding (matricization) I1 I2 I3 I1 I2 I2 I2 I3 Mode-1 unfolding I1 I2 I3 I2 I3 I3 I3 I1 Mode-2 unfolding X ₍₁₎ X ₍₂₎

(7)

Elementary facts about Tucker decomposition

X = C ×

1

U

1

×

2

U

2

×

3

U

3 r1 r2 r3

C

I1 r

_U

1 1 I₂ r2

U

2 I3 r3

U

3 Mode-1 unfolding Mode-2 unfolding Mode-3 unfolding X₍₁₎ = U ₁C₍₁₎(U ₃ _{⊗ U} ₂)! X₍₂₎ = U ₂C₍₂₎(U ₁ _{⊗ U} ₃)! X₍₃₎ = U ₃C₍₃₎(U ₂ _{⊗ U} ₁)! rank ≦ r1 rank ≦ r2 rank ≦ r3 The rank of X(k) is no more than the rank of C(k)

(8)

What it means • We can use the trace norm of an unfolding of a tensor X to learn low‐rank X.

Tensor X is

low-rank

∃k, r

k

<I

k

Unfolding X

k

is

a low-rank

matrix

Matricization Tensorization

(9)

Approach 1: As a matrix • Pick a mode k, and hope that the tensor to be learned is low rank in mode k. minimize X ∈RI1×···×IK 1 2λ!Ω ◦ (Y − X ) !2F + !X(k)!∗,

Pro: Basically a matrix problem

→ Theoretical guarantee (Candes & Recht 09) Con: Have to be lucky to pick the right mode.

(10)

Approach 2: Constrained optimization • Constrain so that each unfolding of X is simultaneously low rank. minimize X ∈RI1×···×IK 1 2λ!Ω ◦ (Y − X ) !2F + K ! k=1 γ_k_!X_(k)_!_∗.

γk: tuning parameter usually set to 1.

Pro: Jointly regularize every mode

Con: Strong constraint

(11)

Approach 3: Mixture of low-rank tensors • Each mixture component Zk is regularized to be low‐rank only in mode‐k. minimize Z1,...,ZK 1 2λ ! ! ! ! !Ω ◦ " Y − #K k=1 Zk $!! ! ! ! 2 F + K # k=1 γ_k_#Z_k(k)_#_∗,

Pro: Each Zk takes care of each mode

(12)

Numerical experiment • True tensor: Size 50x50x20, rank 7x8x9. No noise (λ=0). • Random train/test split. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10−4 10−2 100 102

Fraction of observed elements

Generalization error As a Matrix (mode 1) As a Matrix (mode 2) As a Matrix (mode 3) Constraint Mixture Tucker (large) Tucker (exact) Optimization tolerance Tucker=EM algo (nonconvex)

(13)

Computation time • Convex formula6on is also fast 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 10 20 30 40 50

Fraction of observed elements

Computation time (s) As a Matrix Constraint Mixture Tucker (large) Tucker (exact)

(14)

Phase transition behaviour 0 20 40 60 80 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 [10 12 13] [10 12 9] [16 6 8] [17 13 15] [20 20 5] [20 3 2] [30 12 10] [32 3 4] [40 20 6] [40 3 2] [40 9 7] [4 42 3] [5 4 3] [7 8 15] [7 8 9] [8 5 6]

Sum of true ranks

Fraction required for perfect reconstruction

(15)

Summary • Low‐rank tensor comple6on can be computed in a convex op6miza6on problem using the trace norm of the unfoldings. ‐ No need to specify the rank beforehand. • Convex formula6on is more accurate and faster than conven6onal EM‐based Tucker decomposi6on. • Curious “phase transi6on” found → compressive‐sensing‐ type analysis is an on‐going work. • Technical report: Arxiv:1010.0789 (including op6miza6on) • Code: ‐ h]p://www.ibis.t.u‐tokyo.ac.jp/RyotaTomioka/So_wares/Tensor

(16)

Acknowledgment

• This work was supported in part by MEXT KAKENHI 22700138, 22700289, and NTT Communica6on