Sparse Additive Matrix Factorization for Robust PCA and Its Generalization
Shinichi Nakajima (Nikon)
Masashi Sugiyama (Tokyo Tech.)
S. Derin Babacan (Illinois Univ.)
✤ Model-induced regularization
✤ Formulation of SAMF
✤ Mean update (MU) algorithm
✤ Experimental results
Sparse additive matrix factorization (SAMF) is a Bayesian model that allows arbitrary sparsity design (like Group-LASSO). It makes use of the model-induced regularization of variational Bayesian matrix factorization.
Contents:
Model-induced regularization
Bayesian matrix factorization (Probabilistic PCA)
Observations: V ∈ R^{L×M}
Parameters: A ∈ R^{M×H} (loading matrix), B ∈ R^{L×H} (hidden variables)
Hyperparameters: C_A = diag(c_{a_1}², …, c_{a_H}²), C_B = diag(c_{b_1}², …, c_{b_H}²)
Likelihood: p(V|A,B) ∝ exp(−‖V − BA^⊤‖²_Fro / (2σ²))
Priors: p(A|C_A) ∝ exp(−tr(A C_A^{−1} A^⊤)/2), p(B|C_B) ∝ exp(−tr(B C_B^{−1} B^⊤)/2)
Approximate V with the low-rank matrix BA^⊤.
[Salakhutdinov&Mnih:NIPS2008, Tipping&Bishop:JRSS1999]
[Figure: VB vs. ML estimates of each singular-value coefficient γ_h; VB shrinks γ_h toward zero, ML does not.]
Model induced regularization (MIR)
[Nakajima&Sugiyama:JMLR2011]
Let V = ∑_{h=1}^H γ_h ω_{b_h} ω_{a_h}^⊤ be the SVD of the observed matrix V.
The VB solution is written as Û^VB = ∑_{h=1}^H γ̂_h^VB ω_{b_h} ω_{a_h}^⊤, where γ̂_h^VB = 0 whenever the observed singular value γ_h falls below a threshold.
Factorization induces implicit regularization and low-rank sparsity.
A one-dimensional example (1×1 matrix, H = 1):

MF + Gauss prior:
p(y|a,b) ∝ exp(−(y − ab)²/2)
p(a|c_a) ∝ exp(−a²/(2c_a²)), p(b|c_b) ∝ exp(−b²/(2c_b²))

Linear + ARD prior:
p(y|u) ∝ exp(−(y − u)²/2)
p(u|c_u) ∝ exp(−u²/(2c_u²))

Correspondence: u = ab (a → u/b, b → c_u, with c_a = 1, c_b → ∞).
The linear model reduces to ML estimation when c_u → ∞, while the factorized model remains regularized unless b → ∞.
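The one-dimensional example can be checked numerically. The sketch below runs plain VB coordinate ascent on q(a)q(b) for y = ab + noise with unit noise variance; the function name, initialization, and prior widths are illustrative choices, not taken from the slides.

```python
def vb_1d(y, ca=1.0, cb=100.0, n_iter=1000):
    """VB coordinate ascent for y = a*b + N(0,1) noise, with priors
    a ~ N(0, ca^2), b ~ N(0, cb^2) and q(a) = N(mu_a, s_a), q(b) = N(mu_b, s_b)."""
    mu_a, s_a = 1.0, 1.0   # illustrative initialization
    mu_b, s_b = 1.0, 1.0
    for _ in range(n_iter):
        s_a = 1.0 / (mu_b**2 + s_b + 1.0 / ca**2)   # posterior variance of a
        mu_a = s_a * y * mu_b                        # posterior mean of a
        s_b = 1.0 / (mu_a**2 + s_a + 1.0 / cb**2)
        mu_b = s_b * y * mu_a
    return mu_a * mu_b  # VB estimate of u = ab (ML would simply give u = y)

# Even with a nearly flat prior on b (cb large), the VB estimate of u
# is shrunk below the ML value, and small observations collapse to zero:
print(vb_1d(4.0))   # strictly between 0 and 4
print(vb_1d(0.5))   # (near) zero
```

This reproduces the MIR effect: the factorization alone, not the prior width, produces the shrinkage and the exact-zero regime.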
Sparse additive matrix factorization (SAMF)
Idea: By using factorization, we will construct a Bayesian variant of
Group-LASSO, where no manual parameter tuning is required.
Sparse matrix factorization (SMF) term

Construction of an SMF term:
1. Partition the matrix into parts.
2. Rearrange each part into a matrix.
3. Factorize each rearranged matrix.
Partition-wise sparsity is expected!

Formal expression: U = G(U'^(1), …, U'^(K)) with U'^(k) = B^(k) A^(k)⊤, such that each partition corresponds to one factorized part, where G(·) is a one-to-one map from the entries of {U'^(k)} to the entries of U.
Examples of SMF term (∘: Hadamard product):
- Row-wise sparse term: the l-th row of U is b_l a^(l)⊤ (one factor pair per row).
- Column-wise sparse term: the m-th column of U is b^(m) a_m (one factor pair per column).
- Element-wise sparse term: U = B ∘ A, i.e., U_{lm} = B_{lm} A_{lm}.
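The two extreme cases above can be sketched in a few lines; shapes and values here are hypothetical, chosen only to show how zeroing one factor entry prunes the corresponding group.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 4, 5

# Element-wise SMF term: U = B ∘ A (Hadamard product); a zero in B (or A)
# removes a single entry of U.
A_elem = rng.standard_normal((L, M))
B_elem = rng.standard_normal((L, M))
U_elem = B_elem * A_elem

# Row-wise SMF term: the l-th row of U is b_l * a^(l); driving the scalar
# b_l to zero prunes the whole l-th row at once.
a_rows = rng.standard_normal((L, M))
b_rows = rng.standard_normal(L)
b_rows[2] = 0.0                      # the third row is switched off
U_row = b_rows[:, None] * a_rows

assert np.allclose(U_row[2], 0.0)    # entire row pruned by one factor
```

The sparsity pattern (entry, row, column, or any partition) is thus a pure modeling choice in how the factors are attached.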
Sparse additive matrix factorization (SAMF) model

The SAMF model is a sum of SMF terms: V = ∑_{k=1}^K U^(k) + E.
✤ An extension of robust PCA [Candes+:CoRR2009, Babacan+:IEEETSP2012].
Ex: robust PCA corresponds to V = U^low-rank + U^element-wise + E.
✤ A unified framework of factorization: a single theory is applied to all SMF terms, giving a useful algorithm called mean update (MU).
Formulation of SAMF

Observations: V ∈ R^{L×M}
Parameters: {A^(k), B^(k)} for k = 1, …, K
Hyperparameters: {C_A^(k), C_B^(k)} for k = 1, …, K
Likelihood: p(V | {A^(k), B^(k)}) ∝ exp(−‖V − ∑_{k=1}^K U^(k)‖²_Fro / (2σ²))
Priors: Gaussian priors on A^(k) and B^(k), as in plain MF, for each k.
Bayesian inference is intractable!

Variational Bayesian approximation:
Trial distribution r(Θ); free energy F(r) = ⟨log( r(Θ) / p(V, Θ) )⟩_{r(Θ)}, minimized s.t. a constraint on r.
The Bayes posterior is the unconstrained minimizer of the free energy.
Imposing the decomposability constraint r(Θ) = ∏_k r(A^(k)) r(B^(k)) makes the posterior tractable.
Standard VB Iteration

The standard VB procedure gives an iterative algorithm for the means and covariances of r(A^(k)) and r(B^(k)), with additional update rules for empirical variational Bayes (hyperparameters) and for unknown noise variance σ².

Note 1: The standard VB algorithm gives a local solution.
Note 2: [Babacan+:IEEETSP2012] derived an almost identical algorithm for robust PCA.
Mean Update (MU) Algorithm

The VBMF solution for a plain MF model can be computed analytically, so a partial analytic solution is available for SAMF:

Theorem 1: Given {U^(k')}_{k'≠k}, the VB posterior on (A^(k), B^(k)) coincides with the VB posterior of a plain MF model, for each k.

Mean update (MU) algorithm:
1. U^(k) is updated for each k, given {U^(k')}_{k'≠k}, by the analytic VBMF solution.
2. σ² is updated, given {U^(k)}_{k=1}^K.
3. Iterate 1 and 2.

It can also be applied to empirical VB. [Nakajima+:NIPS2012]
Each step gives the global solution of its subproblem, but there is no guarantee of the global optimality of MU; its advantage will be shown experimentally.
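The alternating structure of MU can be sketched for the robust PCA (`LE') case. This is only a structural sketch: where the real algorithm plugs in the analytic VB solution of a plain MF model at each step, the code substitutes simple shrinkage operators (singular-value and entry-wise soft-thresholding), so `lam_lr` and `lam_el` are hypothetical knobs rather than the paper's tuning-free estimators.

```python
import numpy as np

def mu_robust_pca(V, lam_lr=1.0, lam_el=1.0, n_iter=50):
    """Mean-update-style loop for V ~ U_lowrank + U_element + noise.

    Sketch only: soft-thresholding stands in for the analytic per-term
    VB solution, and the noise-variance update (step 2) is omitted.
    """
    U_lr = np.zeros_like(V)
    U_el = np.zeros_like(V)
    for _ in range(n_iter):
        # Update the low-rank term, given the element-wise term.
        R = V - U_el
        Uu, s, Vt = np.linalg.svd(R, full_matrices=False)
        U_lr = Uu @ np.diag(np.maximum(s - lam_lr, 0.0)) @ Vt
        # Update the element-wise term, given the low-rank term.
        R = V - U_lr
        U_el = np.sign(R) * np.maximum(np.abs(R) - lam_el, 0.0)
    return U_lr, U_el
```

Substituting the analytic VBMF solution for each shrinkage step, and adding the σ² update, would recover the actual MU iteration.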
Experimental results
MU vs Standard VB (with artificial data)

MU tends to give a better local solution (lower free energy and smaller reconstruction error) than standard VB, although the computation time for each step is a bit longer.
`LRCE'-SAMF: U_LRCE = U^low-rank + U^row-wise + U^column-wise + U^element-wise

[Figure: free energy F/(LM) vs. iteration, and reconstruction error ‖Û − U*‖²_Fro/(LM) (overall / low-rank / row / column / element), for MeanUpdate, Standard(iniML), Standard(iniMLSS), Standard(iniRan).]
`LE'-SAMF (robust PCA): U_LE = U^low-rank + U^element-wise

[Figure: free energy F/(LM) vs. iteration, and reconstruction error ‖Û − U*‖²_Fro/(LM) (overall / low-rank / row / column / element), for MeanUpdate, Standard(iniML), Standard(iniMLSS), Standard(iniRan).]
Free energy and reconstruction error (blue: MU, red: standard VB).
Computation time:
[Figure: time (sec) vs. iteration for MeanUpdate, Standard(iniML), Standard(iniMLSS), Standard(iniRan), in both settings.]
FG/BG video separation

Robust PCA (`LE'-SAMF) [Candes+:CoRR2009, Babacan+:IEEETSP2012]:
V = U^BG + U^FG + E = U^low-rank + U^element-wise + E,
where V is the (pixels × time) video matrix: the background is modeled as low-rank, and the foreground as element-wise sparse.
Segmentation-based SAMF (sSAMF)

Sparsity design based on pre-segmentation: the element-wise foreground term is replaced by a segment-wise term U^segment-wise over the (pixels × time) matrix.
Efficient graph-based image segmentation [Felzenszwalb&Huttenlocher04] takes only ~0.05 sec per image.
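A segment-wise term can be sketched as one gate coefficient per pre-computed segment, so that zeroing the gate removes the whole segment from the foreground. The labels, shapes, and function name below are hypothetical, for illustration only.

```python
import numpy as np

def segment_term(values, labels, gates):
    """Segment-wise sparse term for one frame (sketch).

    values: (pixels,) per-pixel magnitudes
    labels: (pixels,) segment id of each pixel, from pre-segmentation
    gates:  (num_segments,) per-segment coefficients; gate 0 kills a segment
    """
    return gates[labels] * values

labels = np.array([0, 0, 1, 1, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
gates = np.array([0.0, 1.0, 0.0])   # only segment 1 stays in the foreground
u = segment_term(values, labels, gates)
assert np.allclose(u, [0, 0, 3, 4, 5, 0])
```

This is exactly the partition-rearrange-factorize recipe with the partitions chosen by the segmentation rather than by rows, columns, or single entries.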
Segmentation-based SAMF: V = U^low-rank + U^segment-wise + E
`LE'-SAMF vs sSAMF (with the Caviar dataset)

`LE'-SAMF: U^low-rank + U^element-wise
sSAMF: U^low-rank + U^segment-wise

The segment-wise sparse term works better than the element-wise term!

[Figure: (a) Original, (b) BG (`LE'-SAMF), (c) FG (`LE'-SAMF), (d) Segmented, (e) BG (sSAMF), (f) FG (sSAMF).]