An Inexact Coordinate Descent Method for the
Weighted
$l_{1}$
-regularized
Convex optimization
Problem
Xiaoqin
Hua*
NobuoYamashita\daggerOctober26,2012
Abstract. The
purpose
ofthispaper
istopropose
an
inexactcoordinate descent(ICD) method for solving the weighted $l_{1}$-regularizedconvex
optimization problem witha
box constraint.The proposed algorithm solves
a
one
dimensional subproblem inexactly ateachiteration. Wegive criteriaof the inexactness under which the
sequence
generated by the proposed methodconverges
toan
optimal solution anditsconvergence
rateisatleast$R$-linearwithoutassumingtheuniqueness ofsolutions.
Keywords. $l_{1}$-regularized
convex
optimization,inexactcoordinate descentmethod,linearcon-vergence,
error
bound.AMS
SubjectClassmcation(2000). $65K05,90C25,90C30.$1 Introduction
We consider thefollowing weighted$l_{1}$-regulanized
convex
optimization problem:minimize $F(x)$ $:=g(Ax)+\langle b,$ $x \rangle+\sum_{i=1}^{n}\tau_{i}|x_{i}|$
(1. 1)
subjectto $l\leq x\leq u,$
where$g$ : $\mathcal{R}^{m}arrow(-\infty, \infty]$ is
a
strictlyconvex
functionon
dom$g,$ $A\in \mathcal{R}^{m\cross n}$ and $b\in \mathcal{R}^{n}.$Moreover, $\tau,$
$l$ and
$u$
are
$n$-dimensionalvectors such that$l_{i}\in[-\infty, \infty),$ $u_{i}\in(-\infty, \infty],$$\tau_{i}\in$$[0, \infty)$ and $l_{i}<u_{i}$ for each$i=1,$ $\cdots,$ $n$. Thenonnegative scalar constant $\tau_{i}$ is called weight
and theterm $\sum_{i=1}^{n}\tau_{i}|x_{i}|$ iscalled the $l_{1}$-regularization function. Forconvenience,
we
denote thedifferentiabletermof$F$by$f$, thatis,$f(x)$ $:=g(Ax)+\langle b,$$x\rangle.$
$*$
Department of Applied Mathematics and Physics,GraduateSchool of$Info\ovalbox{\tt\small REJECT}$atics, Kyoto University, Kyoto 606-8501,JAPAN.$E$-mail:
\dagger Deparmentof Apphed Mathematlcs and Physics,GraduateSchool ofInformatics, KyotoUniversity, Kyoto 606-8501,JAPAN. $E$
The problem(1.1) contains
many
well-known problemsas
specialcases
[7, 17, 15]. When$\tau_{i}=0$ for all index $i$, the problem (1.1) is reduced to the differentiable
convex
problemwith
a
box constraint. When $l_{i}=-\infty$ and $u_{i}=\infty$ for each $i$, it is reduced to theun-constrained $l_{1}$-regulanized
convex
problem. When$\tau_{i}$ is
a
fixed positive constant$\hat{\tau}$ for all $i,$
$b=0$ and$g(y)$ $:= \frac{1}{2}\Vert y-z\Vert^{2}$ with
some
$z\in \mathcal{R}^{m}$, itis the famous $l_{1^{-}}l_{2}$ problem. Anotherimportant $s_{m}pecia1$
case
is the $l_{1}$-regularized logistic regression problem where$g$ is given by
$g(y)$
$:= \frac{1}{m}\sum_{i=1}\log(1+\exp(-y_{i}))$. Each special
case
haswide applications in the real life suchas
the compressed sensing [15], the feature selection in the data classification [7], the datamining [9], geophysics [1] and
so
on.
Typically, the scales of these weighted $l_{1}$-regularizedconvex
optimizationproblemsare
verylarge and the objective functionsare
notdifferentiable everywhere duetothe regularization function.Moreover, theoptimal solutionsare
possiblynotunique becausethe matrix $A$
may
nothavefull columnrank. Thus theNewton-type methodssuch
as
theinteriorpoint methodcan
notbe applied directly.Inthe past,thecoordinate descent($CD$)methodisverifiedtobe
one
of thefeasible methodsfor the large scaleoptimization problems [4, 11, 14, 17]. The $CD$ methodminimizesthe
ob-jective function withrespectto
one
vaniable while all the othersare
fixed ateachiteration. Theidea ofthis method isverysimpleand the storage of calculationsislittle. In
some
special cases,it
can
beimplementedinparallel. Luo and Tseng [17]proveditsglobal andlinearconvergence
for
a
smooth problem, thatis, $\tau_{i}=0$ for all $i$. Note that the problem (1.1)can
bereformu-lated
as a
smooth problem. However, thereformulatedproblemhastwice vanables. In2001,Tseng [11] showed the global
convergence
ofa
block coordinate descent ($BCD$) methodforminimizing
an
nondifferentiable function withcertainseparability. But theexactminimizersofthesubproblemmustbefound
on
eachiteration in [11, 17]. Itispossible for the $l_{1^{-}}l_{2}$problem,while usuallyitishard for the general$l_{1}$-regularized
convex
problem.Inthis
paper, we propose an
inexactcoordinate descent(ICD)method and extend the results of LuoandTseng[17]. Roughly speaking,we
extendin the following threeaspects:$\bullet$ The smooth
convex
problemisextendedtothat with the$l_{1}$-regularized function.$\bullet$ On eachiterationstep,
we
acceptan
inexactsolution ofa
subprobleminstead of theexact
solution.
$\bullet$ The linear
convergence
rateisextendedtothe nonsmooth problem.In this paper, underthe
same
basic assumptionsas
in [17],we
show that the proposed ICDmethod is not only globally convergentbut also withatleast$R$-linearconvergencerate under
This
paper
is
organizedas
follows.
In Section 2,we
derive theoptimal conditions
of theproblem (1.1)and also define the$\epsilon$-optimality conditions which
are
relatedtoan
inexactsolu-tion. In Section3,
we
presenta
framework of theICD method and makesome
assumptionsforthe“inexact solutions”. Theglobal
convergence
andlinearconvergence
rateare
established inSection 4.Finally,
we
conclude thispaper
inSection5.Throughout this
paper,
we
use
the following notations. Fora
differentiable function $h,$$\nabla h$ denotes the gradient of$h$ and $\nabla^{2}h$ denotes the Hessian matrix of$h.$ $\nabla_{i}h$ denotes the $i$th
coordinate ofthe gradient vector $\nabla h$
.
If $h$ isconvex
andnondifferentiable, $\partial h$ denotes thesubdifferential of$h$
.
Forany
real number $x,$ $|x|$ denotes the absolute value of$x$.
Fora
givenvector$x\in \mathcal{R}^{n}$,
we
denote by$x_{i}$ theithcoordinate of$x$
.
We denote the 2-norm of$x$ by $\Vert x\Vert.$For
any
matrix $A$, the transpose of$A$is denoted by $A^{T}$ and $A_{j}$ denotes the $j$th column. Forthefunction $F:\mathcal{R}^{n}arrow \mathcal{R}$ and
a
vector$x\in \mathcal{R}^{n}$,we
sometimesuse
a
notation $F(x_{1}, \cdots, x_{n})$instead of$F(x)$
.
Since the length of the
paper
mustbewithin 17pages, we
omitall proofs of theoremsinthe subsequentsections. The full proofscan
be foundin[16].2 Preliminaries
Throughout the
paper, we
make the following basicassumptionsfor the problem(1.1).Assumption
2.1.
For the problem(1.1),we assume
that$(a)A_{j}$ is
a
nonzero vectorfor
all$j\in\{1,2, \cdots, n\}.$$(b)l_{i}<0<u_{i}$
for
all$i\in\{1,2, \cdots, n\}.$$(c)$ Theset
of
theoptimalsolutions, denoted by$X^{*}$, isnonempty.$(d)$ The
effective
domainof
$g$,denotedbydom$g$, isnonemptyand open.$(e)g$ istwicecontinuously
differentiable
ondom$g.$$(f)\nabla^{2}g(Ax^{*})$ ispositive
definitefor
everyoptimalsolution$x^{*}\in X^{*}.$We make
a
few remarkson
these assumptions. In Part (a), if $A_{j}$ is zero, then $x_{j}^{*}$ of theoptimal solution$x^{*}$
can
beeasily determined. Thuswe can remove
$x_{j}$ from the problem (1.1).
Part (b)is justforsimplification. If both $l_{i}$ and
$u_{i}$
are
positiveforsome
$i\in\{1,2, \cdots, n\}$,we
mayreplace $x_{i},$ $l_{i}$and
$u_{i}$by$\overline{x}_{i}+\frac{l+u}{2}$, $\frac{l-u}{2}$ and $\frac{u-l}{2}$
.
Then the problem(1.1)isreformulated into thecase
without$l_{1}$-regularizedtermfor theindex$i$.
If$g$ isstronglyconvex
andtwicefunction,
an
exponential function, andeven some
complicate functions in the $l_{1}$-regularizedlogistic regression problem satisfy (e) and(f). Note that
we
do notassume
theboundness ofthe optimal solutionset$X^{*}.$
Next,
we
presentsome
propertiesunderAssumption 2.1 thatare
used in the subsequentsec-tions. FromAssumption 2.1(e) and(f), thereexists
a
sufficiently small closed neighborhood$B(Ax^{*})$ of$Ax^{*}$ such that $B(Ax^{*})\subseteq$ dom
$g$ and$\nabla^{2}g$ is positive definitein $B(Ax^{*})$
.
Further-more, it implies that $g$ is strongly
convex
in $B(Ax^{*})$, i.e., there existsa
scalar $\sigma>0$ such that$g(y)-g(z)-\langle\nabla g(z), y-z\rangle\geq\sigma\Vert y-z\Vert^{2}, \forall y, z\in B(Ax^{*})$ . (2.1)
2.1 Optimalityconditions
The KKT conditions [13] forthe problem(1.1)
are
describedas
follows.$\nabla_{i}f(x)+\tau_{i}\partial|x_{i}|-\mu_{i}+v_{i}\ni 0,$
$x_{i}\geq l_{i}, \mu_{i}\geq 0, \mu_{i}(x_{i}-l_{i})=0, i=1, \cdots, n$, (2.2)
$x_{i}\leq u_{i}, v_{i}\geq 0, v_{i}(u_{i}-x_{i})=0,$
where $\partial|$ $|$ is the subdifferential of the absolute value function. Since the problem (1.1) is
convex, $x$ satisfying(2.2) is
a
solution of the problem(1.1). The KKT conditions (2.2)can
berewritten
as
follows.Lemma
2.1.
$A$ vector $x$ is an optimal solutionof
theproblem (1.1)if
and onlyif
oneof
thefollowingstatementsholds
for
each$i.$(i) $\nabla_{i}f(x)\geq\tau_{i}$ and$x_{i}=l_{i}.$
(ii) $\nabla_{i}f(x)=\tau_{i}$and$l_{i}\leq x_{i}\leq 0.$
(iii) $|\nabla_{i}f(x)|\leq\tau_{i}$and$x_{i}=0.$
(iv) $\nabla_{i}f(x)=-\tau_{i}$and$0\leq x_{i}\leq u_{i}.$
(v) $\nabla_{i}f(x)\leq-\tau_{i}$ and$x_{i}=u_{i}.$
Next,
we
representtheseconditionsas a
fixedpoint ofsome
operator. To thisend,we
firstdefine
a
mapping$T_{\tau}$ : $\mathcal{R}^{n}arrow \mathcal{R}^{n}$as
where the scalarfunction $(a)_{+}$ is defined by $(a)_{+};= \max(0, a)$ and
sgn
$(a)$ isa
$sign$functiondefined
as
follows:$sgn(a):=\{\begin{array}{ll}-1 if a<0,0 if a=0,1 if a>0.\end{array}$
It
can
beverified that$T_{\tau}$ is nonexpensive,i.e., $\Vert T_{\tau}(y)-T_{\tau}(z)\Vert\leq\Vert y-z\Vert$,forany
$y,$$z\in$ dom$F.$Let $[x]_{[l,u]}^{+}$ denote the orthogonalprojectionof
a
vector$x$ ontothe box $[l, u]$. Thisprojectionisalsononexpensive anditsithcoordinate
can
be writtenas
$[x_{i}]_{1\prime}^{+}l_{t},u_{i}]$ $:=$ mid$\{x_{i}, l_{i}, u_{i}\}$,where$mid\{x_{i}, l_{i}, u_{i}\}$ is defined by$mid\{x_{i}, l_{i}, u_{i}\}$ $:= \max\{l_{i}, \min\{u_{i}, x_{i}\}\}.$
By usingthemappings$T_{\tau}$ and $[\cdot]_{[l,u]}^{+}$,
we
definea
mapping $P_{\tau,l,u}(x)$ : $\mathcal{R}^{n}arrow \mathcal{R}^{n}$ by$P_{\tau,l,u}(x):=[T_{\tau}(x-\nabla f(x))]_{[l,u]}^{+}$. (2.4)
Since $[x]_{[l,u]}^{+}$ and$T_{\tau}$
are
nonexpensive,we
have that$\Vert P_{\tau,l,u}(y)-P_{\tau,l,u}(z)\Vert\leq\Vert y-z-\nabla f(y)+\nabla f(z)\Vert,$ $\forall y,$$z\in$ domF. (2.5)
Now, theoptimal solutions
can
be describedas a
fixedpointof themapping$P_{\tau,l,u}.$Theorem2.1. For theproblem (1.1), a vector$x$ belongsto theoptimalsolutionset$X^{*}if$and
only
if
$x=P_{\tau,l,u}(x)$, i.e., $X^{*}=\{x|x\in$ dom$g,$$x=P_{\tau,l,u}(x)\}.$Since the solutionset$X^{*}$isnotnecessarilybounded,thelevel setof$F$
may
benotbounded.Nevertheless,
as an
extension of [17, Lemma 3.3],we
can
show the compactness of the set$\Omega(\zeta)$ $:=\{t|t=Ax, F(x)\leq\zeta, x\in[l, u]\}.$
Lemma
2.2.
Foragiven constantvalue$\zeta$, theset$\Omega(\zeta)$ isacompactsubsetof
dom$g.$
Next,
we
show that $\nabla g$ isLipschitz continuouson
some
compact setincluding $\Omega(\zeta)$.
Forthispurpose, wedefineaset$\Omega(\zeta)+B(\epsilon_{0})$ as$\Omega(\zeta)+B(\epsilon_{0})$ $:=\{p+v|p\in\Omega(\zeta), \Vert v\Vert\leq\epsilon_{0}\},$
where$\epsilon_{0}$ is
a
positiveconstant. Itiseasy
tosee
that theset$\Omega(\zeta)+B(\epsilon_{0})$ isa
compactset.Lemma
2.3.
There existconstants $L>0$ and $\epsilon_{0}>0$ such that$\Omega(\zeta)+B(\epsilon_{0})\subseteq$ dom$g$ and$\Vert\nabla g(y)-\nabla g(z)\Vert\leq L\Vert y-z\Vert for$ally, $z\in\Omega(\zeta)+B(\epsilon_{0})$.
Similarto [18, Lemma2.1],
we
can
prove thefollowing invariant property of the optimal solutionset$X^{*}.$2.2 $\epsilon$-optimality conditions
Inthis subsection,
we
givea
definition ofa
relaxed optimality conditions anda
relation betweentheconditionsand themapping$P_{\tau,l,u}.$
Definition
2.1.
Wesaythatthe$\epsilon$-optimality conditionsfor
theproblem(1.1)holdat$x$
if
one
of
thefollowingstatementsholdsfor
each $i.$(i) $\nabla_{i}f(x)-\tau_{i}\geq-\epsilon and|x_{i}-l_{i}|\leq\epsilon.$
(ii) $|\nabla_{i}f(x)-\tau_{i}|\leq\epsilon$ and$l_{i}-\epsilon\leq x_{i}\leq\epsilon.$
(iii) $|\nabla_{i}f(x)|\leq\tau_{i}+\epsilon and|x_{i}|\leq\epsilon.$
(iv) $|\nabla_{i}f(x)+\tau_{i}|\leq\epsilon and-\epsilon\leq x_{i}\leq u_{i}+\epsilon.$
(v) $\nabla_{i}f(x)+\tau_{i}\leq\epsilon and|x_{i}-u_{i}|\leq\epsilon.$
Definition 2.2. Wesay$x$ isan $\epsilon$-approximatesolution
of
the problem (1.1)if
the$\epsilon$-optimalityconditions holdat$x.$
Note that the optimality condition in Lemma 2.1
can
be obtained by Definition 2.1 with $\epsilon=0.$Forconvenience,
we
define thefollowingfive indexsets:$J_{1}(x, \epsilon):=\{i|\nabla_{i}f(x)-\tau_{i}\geq-\epsilon, |x_{i}-l_{i}|\leq\epsilon\}$; $J_{2}(x, \epsilon):=\{i||\nabla_{i}f(x)-\tau_{i}|\leq\epsilon, l_{i}-\epsilon\leq x_{i}\leq\epsilon\}$; $J_{3}(x, \epsilon):=\{i||\nabla_{i}f(x)|\leq\tau_{i}+\epsilon, |x_{i}|\leq\epsilon\}$;
$J_{4}(x, \epsilon):=\{i||\nabla_{i}f(x)+\tau_{i}|\leq\epsilon, -\epsilon\leq x_{i}\leq u_{i}+\epsilon\}$ ; $J_{5}(x, \epsilon):=\{i|\nabla_{i}f(x)+\tau_{i}\leq\epsilon, |x_{i}-u_{i}|\leq\epsilon\}.$
Thenthe$\epsilon$-optimality conditionshold at
$x$ if and only if$\bigcup_{i=1}^{5}J_{i}(x, \epsilon)=\{1,2, \cdots, n\}.$
Throughout thepaper, for simplicity,
we
assume
that$\epsilon<\frac{1}{2}\min_{i}\{-l_{i}, u_{i}\}$. (2.6) Bythedefinition of$T_{\tau}(x)$ and$P_{\tau,l,u}(x)$ in(2.3)and (2.4),
we
givean
equivalent description of the $\epsilon$-optimality conditions by the next theorem,which will be used for constmcting
an
inexact $CD$ method and
investigating
itsconvergence
properties, where the proofis omittedhere.
Theorem
2.2.
The $\epsilon$-optimality conditions holdat$x$
if
and onlyif
$|x_{i}-P_{\tau,l,u}(x)_{i}|\leq\epsilon$ holdsfor
each$i.$3
Inexact
coordinate
descent
(ICD)method
In this section,
we
will first presenta
framework for theICD method, and then givesome
assumptions for the“inexact solutions”.
A general framework of theICD method
can
be describedas
follows. Inexact coordinate descent(ICD)method:Step$0$
:
Choosean
initialpoint$x^{0}\in[l, u]$ and let$r$ $:=0.$Step
1:
Ifsome
termination
conditionholds,then stop.Step
2:
Choosean
index$i(r)\in\{1, \cdots, n\}$, getan
approximate solution$x_{i(r)}^{r+1}$ ofthefollowingone
dimensional subproblem:$x_{i(r)}\in\{\iota_{()}\leq x_{i(r)}\leq u_{i(r)}\}minimizeF (x_{1}^{r}, x_{2}^{r}, \cdots , x_{i(r)-1}^{r}, x_{i(r)}, x_{i(r)+1}^{r}, \cdots , x_{n}^{r})$ . (3.1) Step
3:
Set$x_{j}^{r+1}$ $:=x_{j}^{r}$forall$j\in\{1, \cdots, n\}$ such that$j\neq i(r)$ and let$r:=r+1$.
GotoStep1.
Note that theexactsolution of the subproblem (3.1)isunique fromAssumption 2.1(a)and
the strict convexity of$g$
.
Weuse
thenotation $i(r)$ forthe index chosen attherthiteration. Forsimplicity,
we
willuse
$i$instead of$i(r)$ when$i(r)$ is clearfrom the context.For theglobal convergenceof the ICD method, itis importantto define theinexactness of
theapproximatesolutions of the subproblem(3.1)and choose
an
appropriateindex$i(r)$ in Step2.
For theinexactness,
we
requirethe following assumptions:Assumption3.1. Weassume that the followingstatementshold:
(i) $F(x_{1}^{r}, x_{2}^{r}, \cdots, x_{i-1}^{r}, x_{i}^{r+1}, x_{i+1}^{r}, \cdots,x_{n}^{r})\leq$ $\min$ $F(x_{1}^{r}, x_{2}^{r}, \cdots, x_{i-1}^{r}, x_{i}, x_{i+1}^{r}, \cdots, x_{n}^{r})$. $x.\in\{l_{i},0,u_{\iota},x_{i}^{r}\}$
(ii) $x_{i}^{r+1}$ is feasible, i.e., $x_{i}^{r+1}\in[l_{i}, u_{i}].$ $(i\ddot{u})x_{i}^{r+1}$ is
an
$\epsilon^{r+1}$-approximatesolutionof
the subproblem(3.1).(iv) Conditions
on
$\epsilon^{r+1}:\epsilon^{r+1}\leq\min\{\delta_{r}, \alpha_{r}|x_{i}^{r+1}-x_{i}^{r}|, \epsilon^{r}\}$, where $\{\delta_{r}\}$ isa monotonicallydecreasingsequencesuchthat$\lim_{rarrow\infty}\delta_{r}=0$, and$\alpha_{r}\in[0,\overline{\alpha}]$ withapositiveconstant
$\overline{\alpha}.$
(v) Conditions
on
$\alpha_{r}:\alpha_{r}<\frac{\sigma\min_{j}\Vert A_{j}\Vert^{2}}{L\max\Vert A_{j}\Vert^{2}+1}$holds
for
sufficiently large $r$, where $\sigma$ is a $j$positive constant
defined
in (2.1), and$L$ is theLipschitz constantof
$\nabla g$given in LemmaHere
we
makea
simpleexplanation. Part (i) enforcesnotonly that $\{F(x^{r})\}$ is decreasingbutalso that$\{F(x^{r+1})\}$ isless than$F(x_{1}^{r}, x_{2}^{r}, \cdots, x_{i-1}^{r}, x_{i}, x_{i+1}^{r}, \cdots, x_{n}^{r})$at
a
pointwhere$F$isnonsmooth. This condition is
easy
to check when computing. Italso playsa
key role for theconvergence
of$\{x^{r}\}$ when theobjective function is notdifferentiable. InPart (iii), recall thatthe $\epsilon$-optimality conditions forthe
one
dimensional subproblem(3.1) is that oneof $(i)-(v)$ in
Definition 2.1 holdsat $x_{i(r)}$. Theassumptions(i)-(iv)
are necessary
forthe globalconvergence
while the assumption(v) for$\alpha_{r}$ isusedto guarantee the linearconvergencerate of$\{x^{r}\}.$Note that if
we
obtain theexact solution of the subproblem(3.1)on
eachiteration, then thesequence
$\{x^{r}\}$ satisfies Assumption3.1
automatically. Hence, the classical $CD$ method isa
specialcase
of the ICD method.For the choice of the coordinate $i(r)$ in Step 2,
we
adopt the following almost cycle$mle,$whichis
an
extension of the classical cyclic rule in [3].Almostcyclic$mle$
:
Thereexists an integer$B\geq n$, such that
every
coordinateis iteratedupon atleastonce every
$B$ successiveiterations.
In the next section,
we
will show the ICD method with almost cycle ruleconverges
$R$-linearlyto
a
solution underAssumption2.1 and3.1.4 Global
and
linear
convergence
Inthis section,
we
will show the global and linearconvergenceofthe ICD method.Com-pared with the classicalexact $CD$ method, the ICD method has many “inexact” factors. Thus
we
needsome
preparations.First ofall,
we
illustratea
brief outline of the proof.(1) $\lim_{rarrow\infty}\{x^{r+1}-x^{r}\}=0$
.
(Lemma 4.3)(2) $Ax^{r}arrow Ax^{*}$ where$x^{*}$is
one
ofthe optimal solutions. (Theorem 4.1)(3) Sufficient decreasing: $F(x^{r})-F(x^{r+1})\geq\eta\Vert x^{r}-x^{r+1}\Vert^{2}$ for
some
positive constant $\eta.$(Lemma 4.8)
(4) Error bound: $\Vert Ax^{r}-Ax^{*}\Vert\leq\kappa\Vert x^{r}-P_{\tau,l,u}(x^{r})\Vert$for
some
$\kappa$. (Lemma 4.9)(5) Linear
convergence.
(Theorems4.2and4.3)Note that since it is not
necessary
for thematrix $A$tohave full column rank, $Ax^{r}arrow Ax^{*}$Forconvenience,
we
define twovectors$\tilde{x}^{r+1}$ and$x^{r+1}$as
follows.$\tilde{x}^{r+1}:=(x_{1}^{r}, x_{2}^{r}, \cdots, x_{i(r)-1}^{r},\tilde{x}_{i(r)}^{r+1}, x_{i(r)+1}^{r}, \cdots, x_{n}^{r})$ (4.1) and
$x^{r+1}:=(x_{1}^{r}, x_{2}^{r}, \cdots, x_{i(r)-1}^{r}, x_{i(r)}^{r+1}, x_{i(r)+1}^{r}, \cdots, x_{n}^{r})$, (4.2)
where$x_{i(r)}^{r+1}$and$\tilde{x}_{i(r)}^{r+1}$
are
an
$\epsilon^{r+1}$-approximatesolution and theexactsolutionof the subproblem
(3. 1),respectively.
Inthefirst part of thissection,
we
show$\lim_{rarrow\infty}\{F(\tilde{x}^{r})-F(x^{r})\}=0$and$\lim_{rarrow\infty}\{x^{r+1}-x^{r}\}=0.$Tothisend,
we
need thefollowing function $h_{i}$ : $\mathcal{R}^{n}\cross \mathcal{R}^{n}arrow \mathcal{R}$and Lemma4.1.
$h_{i}(y, z) :=\nabla_{i}f(z)(y_{i}-z_{i})+\tau_{i}(|y_{i}|-|z_{i}|)$
$=\{\begin{array}{ll}(\nabla_{i}f(z)+\tau_{i})(y_{i}-z_{i}) if y_{i}\geq 0, z_{i}\geq 0,\nabla_{i}f(z)(y_{i}-z_{i})+\tau_{i}(y_{i}+z_{i}) if y_{i}\geq 0, z_{i}\leq 0,(43)\nabla_{i}f(z)(y_{i}-z_{i})+\tau_{i}(-y_{i}-z_{i}) if y_{i}\leq 0, z_{i}\geq 0,(\nabla_{i}f(z)-\tau_{i})(y_{i}-z_{i}) if y_{i}\leq 0, z_{i}\leq 0.\end{array}$
Lemma
4.1.
There existsa
positiveconstant$M$such$that|x_{i(r)}^{r+1}- \tilde{x}_{i(r)}^{r+1}|\leq\frac{2M}{\Vert A_{i(r)}\Vert}$for
all$r.$Lemma
4.2.
$\lim_{rarrow\infty}\{F(\tilde{x}^{r})-F(x^{r})\}=0.$Usingtheabove lemmas,
we
can
show that $\{x^{r+1}-x^{r}\}$converges
to$0$bya
similarway
of[18,Lemma 4.1].
Lemma4.3. Forthesequence$\{x^{r}\}$generated by the$ICD$method,
we
have$\lim_{rarrow\infty}\{x^{r+1}-x^{r}\}=0.$Inthe second partof this section,
we
will show theconvergence
of$\{Ax^{r}\}$. Since $\{Ax^{r}\}$ isbounded,there exist$t^{\infty}\in \mathcal{R}^{n}$ and
an
infiniteset $\mathcal{X}$such that$\lim_{rarrow\infty,r\in \mathcal{X}}Ax^{r}=t^{\infty}$. (4.4) Then with thecontinuityof$\nabla g$,
we
have$\lim_{rarrow\infty,r\in \mathcal{X}}\nabla f(x^{r})=d^{\infty}$, (4.5) where
$d^{\infty}:=A^{T}\nabla g(t^{\infty})+b$
.
(4.6)For theset $\mathcal{X}$,
we
have the following result with Lemma4.3, whichprovidesan
interestingLemma
4.4.
Forany $s\in\{0,1, \cdots, B-1\}$,where$B$is the integerdefined
inthe almostcyclerule, wehave$\lim_{rarrow\infty,r\in \mathcal{X}}\nabla f(x^{r-s})=d^{\infty}.$
Lemma4.4 implies that for each$i\in\{1,2, \cdots, n\}$,for
any
$s\in\{0,1, \cdots, B-1\}$,we
have$\lim_{rarrow\infty,r\in \mathcal{X}}\nabla_{i}f(x^{r-s})=d_{i}^{\infty}$
.
(4.7) Fixed coordinate $i$, let, $(r, i)$ denote the largest integer $\overline{r}$, which does not exceed$r$, such
thatthe$i$thcoordinateof
$x$ isiterated
upon
atthe$\overline{r}$thiteration, thatistosay,
forall$r\in \mathcal{X}$,
we
have
$x_{i}^{r}=x_{i}^{\varphi(r,i)}$. (4.8) Since the coordinateis chosenbythe almost cyclerule, therelation $r-B+1\leq,$ $(r, i)\leq r$
holds for all$r\in \mathcal{X}$
.
From(4.7),we
further obtain$\lim_{rarrow\infty,r\in \mathcal{X}}\nabla_{i}f(x^{\varphi(r,i)})=d_{i}^{\infty}$. (4.9) Now
we
define thefollowing sixindex sets associated with$d_{i}^{\infty}$as
$J_{1}^{\infty}:=\{i|d_{i}^{\infty}>\tau_{i}\}$; $J_{2}^{\infty}:=\{i|d_{i}^{\infty}<-\tau_{i}\})$ $J_{3}^{\infty}:=\{i||d_{i}^{\infty}|<\tau_{i}\}$; $J_{4}^{\infty}:=\{i|d_{i}^{\infty}=\tau_{i}, \tau_{i}>0\}$; $J_{5}^{\infty}:=\{i|d_{i}^{\infty}=-\tau_{i}, \tau_{i}>0\}$; $J_{6}^{\infty}:=\{i|d_{i}^{\infty}=0, \tau_{i}=0\}.$
Note that $\bigcup_{i=1}^{6}J_{i}^{\infty}=\{1,2, \cdots, n\}$. Nexttwo lemmas give sufficient conditions underwhich
$\{x_{i}^{r}\}_{\mathcal{X}}$ is fixed
or
liesinsome
interval.Lemma 4.5. Suppose thatAssumption 3.1(i) and(iii) hold, andthat$x^{\varphi(r,i)}$ is avector,
where
the i-th coordinate is chosenon the, $(r, i)$-th iteration. Let$L$and$\epsilon_{0}$ be theconstantsgiven in
Lemma 2.3.
If
$\epsilon^{\varphi(r,i)}<\epsilon_{0}$, then thefollowingstatementshold
for
anyfixed
$i$:(i)
If
$\nabla_{i}f(x^{\varphi(r,i)})-\tau_{i}>L\Vert A_{i}\Vert^{2}\epsilon^{\varphi(r,i)}$and$x_{i}^{\varphi(r,i)}\leq\epsilon^{\varphi(r,i)}+l_{i}$, then$x_{i}^{\varphi(r,i)}=l_{i}.$(ii)
If
$\nabla_{i}f(x^{\varphi(r,i)})+\tau_{i}<-L\Vert A_{i}\Vert^{2}\epsilon^{\varphi(r,i)}$ and$u_{i}-\epsilon^{\varphi(r,i)}\leq x_{i}^{\varphi(r,i)}$, then$x_{i}^{\varphi(r,i)}=u_{i}.$(iii)
If
$\nabla_{i}f(x^{\varphi(r,i)})+\tau_{i}>L\Vert A_{i}\Vert^{2}\epsilon^{\varphi(r,i)}and|x_{i}^{\varphi(r,i)}|\leq\epsilon^{\varphi(r,i)}$ , then$x_{i}^{\varphi(r,i)}\leq 0.$Lemma4.6.
Supposethat Assumption3.1
holds. Then,for
sufficiently large$r$,we
have $\{x_{i}^{r}\}_{\mathcal{X}}=l_{i}, \forall i\in J_{1}^{\infty}$; (4.10) $\{x_{i}^{r}\}_{\mathcal{X}}=u_{i}, \forall i\in J_{2}^{\infty}$; (4.11) $\{x_{i}^{r}\}_{\mathcal{X}}=0, \forall i\in J_{3}^{\infty}$; (4.12) $l_{i}\leq\{x_{i}^{r}\}_{\mathcal{X}}\leq 0, \forall i\in J_{4}^{\infty}$; (4.13) $0\leq\{x_{i}^{r}\}_{\mathcal{X}}\leq u_{i},\forall i\in J_{5}^{\infty}$; (4.14) $l_{i}\leq\{x_{i}^{r}\}_{\mathcal{X}}\leq u_{i}, \forall i\in J_{6}^{\infty}$. (4.15)Next,
we
will show $Ax^{r}arrow Ax^{*}$, where$x^{*}$ isan
arbitrary optimal solution of the problem(1.1). For this
purpose, we
recall Hoffman’serror
bound [2].Lemma4.7. Let$B\in \mathcal{R}^{k\cross n},$ $C\in \mathcal{R}^{k\cross n}$ and$e\in \mathcal{R}^{k},$ $d\in \mathcal{R}^{k}$
.
Suppose that the linearsystem$By=e,$$Cy\leq d$ isconsistent. There existsascalar$\theta>0$depending onlyon $B$ and$C$, and
such that,
for
any$\overline{x}\in[l, u],$ $l,$ $u\in \mathcal{R}^{n}$, there is apoint$\overline{y}\in \mathcal{R}^{n}$ satisfying $By=e,$$C\overline{y}\leq d$ and$\Vert\overline{x}-\overline{y}\Vert\leq\theta(\Vert B\overline{x}-e\Vert+\Vert(C\overline{x}-d)_{+}\Vert$, where $(x_{i})_{+}:= \max\{0, x_{i}\}.$Theorem
4.1.
Let$x^{*}be$an
optimal solutionofthe
problem(1.1). Thenwe
$ha \nu e\lim_{rarrow\infty}Ax^{r}=Ax^{*}.$Theorem4.1 implies that thereexists
a
scalar$\overline{r}>0$,forany
$r\geq\overline{r}$,such that$Ax^{r}\in B(Ax^{*})$,where$B(Ax^{*})$ istheclosed ball definedjustbefore(2.1). Notethat$g$isstrongly
convex.
Inthe third part of thissection,we
show the sufficient decreasing of$\{F(x^{r})\}$forsufficientlylarge$r.$
Lemma
4.8.
UnderAssumption 3.1, there existsascalar$\eta>0$such that$F(x^{r})-F(x^{r+1})\geq$$\eta\Vert x^{r}-x^{r+1}\Vert^{2}$
for
sufficientlylarge$r.$In the last part ofthis section, before showing the global and linear
convergence
of $\{x^{r}\},$wefirstrecall akind oftheLipschitzerrorbound in [10, 11, 18].
Lemma4.9. There existsascalarconstant$\kappa>0$ such that
for
any$Ax^{r}\in B(Ax^{*})$,$\Vert Ax^{r}-Ax^{*}\Vert\leq\kappa\Vert x^{r}-P_{\tau,l,u}(x^{r})\Vert$
.
(4.16)The following result is adirectextensionof[17,Lemma$4.5(a)$] totheproblem(1.1).
Lemma
4.10.
UnderAssumption 3.1, there exists a constant. $>0$such that the inequality$\Vert Ax^{r}-Ax^{*}\Vert\leq.\sum_{h=r}^{r+B-1}\Vert x^{h}-x^{h+1}\Vert$holds
for
sufficientlylarge$r.$Theorem
4.2.
Suppose that $\{x^{r}\}$ isgenemtedby the$ICD$ method with the almostcycle rule. Let $F^{*}$ denote the optimalvalueof
the problem(1.1). Then $\{F(x^{r})\}$ converges to $F^{*}at$least$B$-step$Q$-linearly.
Theorem 4.3. Suppose that $\{x^{r}\}$ isgeneratedby the $ICD$method with the almostcycle rule. Then thereexists
an
optimalsolution $x^{*}of$the problem(1.1)suchthat $\{x^{r}\}$converges to$x^{*}at$least$R$-linearly.
5
Conclusions
Inthispaper, wehave presentedaframework of the ICD method for solving$l_{1}$-regularized
con-vex
optimization(1.1). We also have established the$R$-linearconvergence rateof this methodunder the almost cycle$mle$. The keytotheICD method lies in Assumption3.1 forthe”inexact
solution”. On each
iteration
step,we
only needto findan
approximate solution, thatraisesthepossibilitytosolve general $l_{1}$-regularized
convex
problem.The proposed ICD method solves
an
one-dimensional subproblemon
each iteration. TheBlock Coordinate Descentmethod, which solves asmall scalemulti-dimensionalsubproblem,
is efficient for
some
practical problems. Thus it is interesting to extend the proposed ICD method to the ”inexact” block$CD$method.References
[1] A. Gholami and H. R. Siahkoohi, Regularization of linear and non-linear
geophysi-calill-posed problems withjoint sparsity constraints,Geophysical Joumal Intemational,
Vol.180, No.2,
pp.
871-882, 2010.[2] A. J. Hoffman, On approximate solutions of systems of linear inequalities, Joumal of Research of the National Bureau ofstandards, Vol.49, No.4,pp. 263-265, 1952.
[3] D. G. Luenberger, Linear and nonlinearprogramming, Addision-Wesley, Reading,
Mas-sachusetts, 1973.
[4] H. Liu, M. Palatucci, and J. Zhang, Blockwise coordinate descent procedures for the
multi-tasklasso, with applications to neuralsemanticbasis discovery, Proceedings of the 26th Annual Intemational Conference
on
MachineLeaming, pp. 649-656,2009.[5] J. M. Borwein and A. S.
Lewis.
Convex analysisandnonlinearoptimization:
theory andexamples, secondedition,New York: spinger-verlag, 2000, Canadian mathematical
soci-etybooksin mathematics.
[6] J. M. Ortega and W. C. Rheinboldt, Iterative solution of nonlinear equations in several
variables,Reprinted by SIAM, Philadelphia,
2000.
[7] K. Koh, S. J. Kim, and S. Boyd, Aninterior-point method for large- scale $l_{1}$-regularized
logistic regression, Joumal of Machine Learmng Research8, pp.1519-1555, 2007.
[8] M. A. T. Figueiredo,R. D.Nowak and S. J. Wright. Gradientprojectionfor
sparse
recon-stmction: Applicationto compressedsensing and otherinverse problems, IEEE Joumal
of Selected Topics in Signal Processing,Vol.1, No.4, pp. 586-597, 2007.
[9] M.Y.Park, and T.Hastie,Ll-regularization path algorithm for generalized linearmodels,
Joumalof theRoyalStatisticalSociety: SeriesB(StatisticalMethodology),Vol. 69,No.4,
pp.
659-677,2007.
[10] P.Tseng, Approximation
accuracy,
gradientmethods,anderror
bound for stmcturedcon-vex
optimization, Mathematical Programming: Series A andB -20th IntemationalSym-posium
on
MathematicalProgramming-ISMP2009,Vol. 125NO. 2,pp.
263-295,2010.
[11] P. Tseng, Convegence ofa block coordinate descent method for nondifferentiable
mini-mization, Joumal ofoptimization and Applications, Vol.109, pp. 475-494,2001.
[12] P. Tseng and S. Yun. A coordinate gradient descent method for nonsmooth separable
minimization, MathProgramming, Vol. 117,
pp.
387-423,2009.
[13] R. T. Rockafellar, Convex analysis, Princeton University Press, Princeton, New Jersey,
1970.
[14] T. T. Wu, and K. Lange, Coordinate descent algorithms for lasso penahzed regression,
The Annals ofAppliedStatistics,Vol.2, No.1,
pp.
224-244,2008.
[15] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for $l_{1^{-}}$
minimizationwith applicationstocompressedsensing, SIAM J. ImagingSciences, Vol.1,
No.1,
pp.
143-168,2008.[16] X. Q. Hua, N. Yamashita, An Inexact Coordinate Descem Method for the
Weighted $l_{1}$-regularized Convex optimization Problem, Technical Reports 2012,
[17] Z. Q. Luo and P. Tseng, On the
convergence
ofthe coordinate descent methodforconvex
differentiableminimization, Joumal ofoptimizationandApplications, Vol.72,
pp.
7-35,1992.
[18] Z. Q. Luo and P. Tseng, On the linear
convergence
of descent methods forconvex
es-sentially smooth minimization, SIAM J. Control and optimization, Vol.30, pp. 408-425,
1992.