On global minimization for general $p$-regularized subproblems with $p>2$ (Nonlinear Analysis and Convex Analysis)

全文

(1)47. 数理解析研究所講究録第2011巻 2016年 47-56. On. global. minimization for. subproblems. general p ‐regularized. with. p>2. Reuy‐Lin Sheu1 1.. Department. of. Mathematics, National Cheng Kung University, Tainan, Taiwan. Abstract The. p‐‐regularized subproblem (p‐RS). Newton‐like step for unconstrained. regularization. RS). term. \displaystyle\frac{$\sigma$}{p}\Vertx\Vert^{p}. .. In this. is. regularizing. a. term for. computing. optimization, which incorporates. article,. for p>2 with necessary and sufficient. resolve the. we. optimality. global. a. a. weighted. minimizers of. (\mathrm{p}-. conditions.. Key Words: Thrust‐region subproblem; Local minimizer; Extended Trust‐region sub‐. problem.. 1. Introduction For. an. has. an. the. global. unconstrained optimization. attractive local convergence property. to. imizes. a. near a. guarantee. Levenberg‐Marquardt type of methods. maintain. to minimize. convergence for Newtons method with. ity, however, requires modifications the. problem. positive‐definite approximate. or. a. f. over. \mathbb{R}^{n} , Newtons method. second order critical. an. analyzable computational complex‐. sufficient descent. most. Hessian of. point. Ensuring. at each. step. Unlike. quasi‐Newton methods which always. f. ,. the. p‐‐regularized subproblem. globally the second order Taylors polynomial of f plus. a. weighted (by. $\sigma$. min‐. ) higher.

(2) 48. order regularization term. The. (p‐RS) where. ,. It is often assumed that. f. is smooth. decrease in the value of. f. ,. global. accepted;. it is. f. enough. convergence. If the. global. takes the. following model. \displaystyle \min_{x\in \mathb {R}^{n} \{g(x)=\frac{1}{2}x^{T}Hx+c^{T}x+\frac{ $\sigma$}{p}\Vert x\Vert^{p}\},. H is the Hessian of. $\sigma$>0, p>2 and. the desired. subproblem. but. at any. iterate, regardless of its definiteness.. to have. a. symmetric Hessian and. minimizer of. rejected. (p‐RS). renders. otherwise with. to obtain. satisfactory. a. increase in. an. $\sigma$. to. enhance the regularization force.. literature, (p‐RS). In most. common. p=3. is known. choice among all others.. due to Griewank convergence and. [9]. and later. and. by. Gould and Toint. Cartis,. regularization. The idea of the cubic. considered. was. the cubic. as. and. [2].. which is the. regularization. many authors with. complexity analysis. See Nesterov. [17];. and Erdmann. with. Polyak [15];. When p=4. first. thorough global. Weiser Deuflhard. (p‐RS). ,. was. reduces to. a. form of the double well potential function which has many applications in solid mechanics and quantum mechanics. general p>2. in. [5, 18]. Gould,. Robinson and Thorne. [8]. studied. (p‐RS). for. a. comparison with the the trust‐region subproblem. (TRS). \displaystyle \min. \displaystyle \frac{1}{2}x^{T}Hx+c^{T}x. \mathrm{s}. \mathrm{t}. \Vert x\Vert^{2}\leq\triangle, x\in \mathbb{R}^{n}. In this. article,. we. characterize. necessary and sufficient. (p‐RS) completely. global optimality conditions for p=3. using the secular function (to be specified later) for p=4 Notations. matrix. Let. v. denote the. P\in \mathbb{R}^{n\times n}, P\succ(\succeq)0. means. P is denoted. by \det(P) whereas. Diag (x). diagonal. $\beta$\in \mathbb{R},. is. a. \displaystyle\mathrm{s}\mathrm{i}\mathrm{g}\mathrm{n}($\beta$)=\frac{$\beta$}{|$\beta$|}. eigenvalue of. P.. the. matrix with. if. p>2 by (i) extending the. for any. $\beta$\neq 0. ,. in. in. [2]. identity. n. diagonal components being. otherwise. For any. (semi)definite.. matrix of order. analysis. by. I. x_{1} ,. .. symmetric. The determinant of For. \cdots. \mathrm{s}\mathrm{i}\mathrm{g}\mathrm{n}( $\beta$)=0 Finally, $\lambda$_{i}(P) .. the. [18].. optimal value of problem. that P is positive. (ii). and. ,. a. x_{n}. .. vector. For. a. x\in \mathbb{R}^{n}, number. is the ith smallest.

(3) 49. 2Main results for We first observe that the. global minimization. objective. function. g(x). (p‐RS). of. is. coercive, i.e.,. \displaystyle \lim g(x)=+\infty.. | x\Vert\rightarrow+\infty. Consequently, the global analysis is the first. minimizer of. (p‐RS) always. exists.. The. starting point of the. order and the second order necessary conditions for any local minimizer. of g.. Lemma 1 Assume that \underline{x} is. a. local minimizer. of (p‐RS), p>2. \nabla g(\underline{x})=(H+ $\sigma$\Vert\underline{x}\Vert^{p-2}I)\underline{x}+c=0. .. It holds that. (1). ,. \nabla^{2}g(\underline{x})=(H+ $\sigma$\Vert\underline{x}\Vert^{p-2}I)+ $\sigma$(p-2)\Vert\underline{x}\Vert^{p-4}\underline{xx}^{T}\succeq 0 where. \nabla g, \nabla^{2}g denote. the. gradient and the Hessian of g(x) respectively. ,. The next theorem shows. $\sigma$\Vert\underline{x}\Vert^{p-2}I\succeq sufficiency. O. The. that,. a. local minimizer \underline{x} becomes. necessity has been shown by Theorem. The point x^{*} is. critical point. satisfy ing \nabla g(x^{*})=0. all the. minimizers. global. If x^{*}=0_{n}. ,. are. then. a. global. minimizer. 2 in. [8].. if and We. and. of (p‐RS) for p>2 if. only. only. and. H+ $\sigma$\Vert x^{*}\Vert^{p-2}I\succeq 0 Moreover, .. if H+. prove the. only if it. the. P_{2}. is. norms. a. of. equal.. $\sigma$\Vert x^{*}\Vert^{p-2}=0. so. that. H=H+ $\sigma$\Vert x^{*}\Vert^{p-2}I\succeq 0 Consequently, x^{T}Hx\geq 0, .. global. global. here.. Theorem 1. Proof. (2). ,. c=-(H+ $\sigma$\Vert x^{*}\Vert^{p-2}I)x^{*}=0. and. \forall x\in \mathbb{R}^{n} It follows that x^{*}=0_{n} is .. minimizer since. g(x)=\displaystyle \frac{1}{2}x^{T}Hx+c^{T}x+\frac{ $\sigma$}{p}\Vert x\Vert^{p}\geq\frac{ $\sigma$}{p}\Vert x\Vert^{p}>0=g(0) , \foral x\neq 0_{n}=x^{*}.. a.

(4) 50. Now. x^{*}\neq 0_{n} i.e., \Vert x^{*}\Vert>. we assume. ,. assumption, Q\succeq 0 Then, for .. O. Define. any x\in \mathbb{R}^{n} and. Q=H+ $\sigma$\Vert x^{*}\Vert^{p-2}I According. to the. .. x\neq x^{*}. ,. it holds that. g(x) = \displaystyle \frac{1}{2}x^{T}Hx+c^{T}x+\frac{ $\sigma$}{p}\Vert x\Vert^{p} = \displaystyle \frac{1}{2}x^{T}Qx+c^{T}x-\frac{1}{2}( $\sigma$\Vert x^{*}\Vert^{p-2})x^{T}x+\frac{ $\sigma$}{p}\Vert x\Vert^{p}. =\displaystyle\frac{1}{2}x^{T}Qx+c^{T_{X} +\frac{$\sigma$}{p}\Vertx^{*}\Vert^{p}( \frac{|x\Vert^{2} {\Vertx^{*}\Vert^{2} )^{2}e-\frac{p}{2}\frac{|x ^{2} {\Vertx^{*}|^{2} ). Define. f(t)=t^{2}2,. p>2. .. It is. strictly. convex. for t>0. .. (3). Therefore,. f(t)=t^{B}2\displaystyle \geq f(1)+f'(1)(t-1)=1+\frac{p}{2}(t-1) , \forall t>0. By substituting. t with. \displaystle\frac{|x\Vert^{2} \Vertx^{*}\Vert^{2}. ,. we. have. (\displaystyle\frac{|x|^{2} {\Vertx^{*}|^{2} )^{2}-\frac{p}{2}\frac{|x|^{2} {\Vertx^{*}|^{2} \geq21-\frac{p}{2}. Then,. g(x) \displaystyle \geq \frac{1}{2}x^{T}Qx+c^{T}x+\frac{ $\sigma$}{p}\Vert x^{*}\Vert^{p}(1-\frac{p}{2}) By Q\succeq 0 quadratic. ,. the lower. in terms of. minimizer of the. x. bounding function of .. convex. Since x^{*} satisfies function in the. g in the. right. (4). .. hand side of. (H+ $\sigma$\Vert x^{*}\Vert^{p-2}I)x^{*}=Qx^{*}=-c,. right. hand side of. (4).. As. a. (4). is. x^{*} is. convex a. global. consequence,. g(x)\displaystyle \geq\frac{1}{2}(x^{*})^{T}Qx^{*}+c^{T}x^{*}+\frac{ $\sigma$}{p}\Vert x^{*}\Vert^{p}(1-\frac{p}{2})=g(x^{*}) and x^{*} is. a. Finally,. global from. \displaystyle \frac{1}{2}x^{T}Qx+c^{T}x Qx=-c. and. Remark 1. found. minimizer of. (3),. and. (p‐RS).. if \hat{x} is also. a. global. minimizer of. (\displaystyle\frac{|x\Vert^{2} {\Vertx^{*}|^{2} )^{2}B-2\Vertx\Vert^{2}. 2\overline{\Vert x^{*}| ^{2} simultaneously.. (p‐RS),. This. can. \hat{x} must minimize both. happen. if and. only. if. can. be. \Vert\hat{x}\Vert=\Vert x^{*}\Vert.. When p=3 , two other. in Theorem 3.1 in. l2J. proofs of the. and Theorem 10 in. necessary and. sufficient. [15J respectively. ,. condition. We notice that the. proof.

(5) 51. in. [2]. is inherited. from. of. that. the necessary and. subproblem l37 and the proof in l15\displaystyle \int highly relies Our. proof is. between. much easier in. g(x). and. g(x^{*}). understanding,. sufficient condition for the trust‐region the special structure. on. based. simply. since it is. on a. of the direct. case. p=3.. comparison. .. To characterize the set of. global. (p‐RS),. minimizers of. we. may. assume. that H is. diagonal, i.e.,. H=\mathrm{D}\mathrm{i}\mathrm{a}\mathrm{g}(\mathrm{a}_{1}, \ldots, $\alpha$_{n}). (5). ,. where. $\alpha$_{1}=\ldots=$\alpha$_{k}<$\alpha$_{k+1}\leq\ldots\leq$\alpha$_{n} and k is the. multiplicity. of the smallest. eigenvalue decomposition of a. diagonal. version of. Theorem 2 The set. (p‐RS). H. .. Let. eigenvalue. y=U^{T}x. .. Let x^{*} be any. that t^{*} is. independent of the. global. minimizers. If H+ $\sigma$ t^{*}I is. choice of x^{*} since the .. .. .. specific. we. obtain. singleton. n. ,. .. is,. k ‐dimensional. t^{*}=\Vert x^{*}\Vert^{p-2}\geq 0. of all the. That. or a. global. Notice. .. minimizers. are. t^{*}\displaystyle \in(\max\{- $\alpha \Delta \sigma$, 0\}, +\infty). define the set of the. minimizer x^{*} is. uniquely. defined. .. global. by (the. that) . on a. norms. satisfying (H+ $\sigma$ t^{*}I)x^{*}=-c. invertible, the global. a. and define. \ell_{2}. x_{i}^{*}=\underline{-c_{i}}i=1 $\alpha$_{i}+ $\sigma$ t^{*} By summing all. is either. (p‐RS). minimizer of. ,. x^{*}. and. be the. in terms of y.. equal. By Theorem 1, $\alpha$_{i}+ $\sigma$ t^{*}\geq 0, \forall i=1 2, Moreover, the solutions. Otherwise, let H=U $\Sigma$ U^{T}. .. \Vert y\Vert=\Vert U^{T}x\Vert=\Vert x\Vert. Then. of global minimizers of (p‐RS). Proof. still unknown t^{*}. $\alpha$_{1}. (x_{i}^{*})^{2}, t^{*}. is. necessarily. a. ,. nonnegative. ... .. ,. n.. root of the. following secular function. open interval:. h(t)=\displaystyle \sum_{i=1}^{n}\frac{c_{i}^{2} {($\alpha$_{i}+ $\sigma$ t)^{2} -t\frac{2}{\mathrm{p}-2}, t\in I_{g}=(\max\{-\frac{$\alpha$_{1} { $\sigma$},0\}, +\infty). .. (6).

(6) 52. \displaystyle \lim_{t\rightar ow\max\{-\lrcorner^{$\alpha$_{0\}} $\sigma$},h(t)>0, \displaystyle \lim_{t\rightar ow+\infty}h(t)=-\infty. Since. secular function. On the other not. can. h(t). has. unique. a. hand, H+ $\sigma$ t^{*}I. root. is. happen for $\alpha$_{1}>0. ) Then,. I_{g}. on. singular. ,. h(t). and. is. strictly decreasing. on. I_{g}. the. ,. which must be t^{*}.. in which. c_{1}^{2}+\ldots+c_{k}^{2}=0. ,. case. t^{*}=\displaystyle \frac{- $\alpha$ 1}{ $\sigma$} (Obviously, .. this. and $\alpha$_{i}+ $\sigma$ t^{*}>0, i=k+1, k+2 ,. case. ... .. ,. n. such that. is a. \displaystyle \hat{x}^{*}=(0, \ldots, 0, \frac{-c_{k+1} {$\alpha$_{k+1}-$\alpha$_{1} , \ldots, \frac{-c_{ $\eta$} {$\alpha$_{n}-$\alpha$_{1} )^{T}. one. (H-$\alpha$_{1}I)x^{*}=-c By summing. trivial solution to. .. all. (\hat{x}_{i}^{*})^{2}. in. (7) (7),. we. again obtain. secular function. \displaystyle\hat{h}(t)=\sum_{i=k+1}^{n}\frac{ _{i}^{2} {($\alpha$_{i}+$\sigma$t)^{2} -t\frac{2}{p-2},t\inI_{\hat{g} =[-\frac{$\alpha$_{1} {$\sigma$},+\infty) Notice that. then. \hat{h}(t). If does. is the. t^{*}=- $\alpha$\lrcorner $\sigma$. minimizer of. is also. strictly decreasing root of. unique. \hat{h}(t). on. on. I_{\hat{g}. and. I_{\overline{g} Thus, .. \displaystyle \lim_{t\rightar ow+\infty}\hat{h}(t)=-\infty. \hat{x}^{*} defined. by (7). (8). .. .. If. is the. \hat{h}. (-\displaystyle \frac{ $\alpha$ 1}{ $\sigma$})=0,. unique global. (p‐RS).. \hat{h}. (-\displaystyle \frac{ $\alpha$ 1}{ $\sigma$})<0 then (8) has no solution and the trivial solution \hat{x}^{*} to (H-$\alpha$_{1}I)x^{*}=-c not satisfy t^{*}=\displaystyle \frac{- $\alpha$ 1}{ $\sigma$}=\Vert\hat{x}^{*}\Vert^{p-2} Then, any x^{*} satisfying ,. .. (x_{1}^{*})^{2}+\displaystyle\ldots+(x_{k}^{*})^{2}+\sum_{i=k+1}^{n}\frac{ _{i}^{2} {($\alpha$_{i}-$\alpha$_{1})^{2} =(\frac{-$\alpha$_{1} {$\sigma$})^{\frac{2}{p-2} is. minimizer of. global. a. dimensional. (p‐RS). Namely,. sphere centered. at. the. global. (9). minimum solution set forms. a. k‐. (0, \displaystyle \cdots, 0, -\frac{\mathrm{C}k+1}{$\alpha$_{k+1- $\alpha$ 1} , \cdots, -\frac{\mathrm{c}_{n} {$\sigma$_{n}- $\sigma$ 1}) with the radius. \sqrt{(\frac{$\alpha$_{1}{$\sigma$})^{\frac{2}p-2} \sum_{i-k+1}^{n}\frac{ _i}^{2}{($\alpha$_{i}-$\alpha$_{1})^{2} . Otherwise, \hat{h} obtain. a. contradiction that. Finally, can. we. show that. be obtained. have. (-\displaystyle \frac{ $\alpha$ 1}{ $\sigma$})>0. ,. then. (p‐RS). (p‐RS). by solving. (8). an. has. has. no. no. possesses. solution and. global. some. equivalently. (9). cannot hold for any x^{*}. minimizer. The. hidden. proof. convexity that. reformulated. convex. its. is thus. .. We. complete.. global minimizer. programming. We first.

(7) 53. Proposition. 1. Suppose. H is. Let x^{*} be any. diagonal.. c_{t}x_{i}^{*}\leq 0, Proof. Comparing. x^{*} with. i=1 ,. .. .. .. ,. global. minimizer. of (p‐RS),. then. n.. \overline{x}=(-x_{1}^{*}, x_{2}^{*}, x_{3}^{*}, \ldots, x_{n}^{*}). ,. we. immediately have. 0\geq g(x^{*})-g(\overline{x})=c_{1}(x_{1}^{*}-\overline{x}_{1})=2c_{1}x_{1}^{*}. A similar argument. to all other. applying. components yields the result. This completes. proof.. the. By Proposition 1, (p‐RS). and. (10). below share the. same. optimal solution. \displaystyle\min\sum_{i=1}^{n}\{-$\alpha$_{2^{X_{i}^{2} $\Delta$+c_{$\eta$}x_{i}\+\frac{$\sigma$}{p}(\sum_{i=1}^{n}x_{i}^{2})^{2}$\epsilon$. \mathrm{s}.. Introducing. the. i=1 ,. c_{t}x_{i}\leq 0,. ... .. ,. (10). n.. the nonlinear one‐to‐one map:. x_{i}=\left\{ begin{ar y}{l \sqrt{z_i},\mathrm{i}\mathrm{f}c_{\dot{$\tau$}\leq0,\ i=1,. n,\ -\sqrt{z_i},\mathrm{i}\mathrm{f}c_{\dot{$\tau$}>0, \end{ar y}\right.. problem (10) becomes the following min \mathrm{s}.. The. \mathrm{t}.. global optimal. mation. set.. \mathrm{t}.. convex. (11). program:. -\displaystyle\sum_{i=1}^{n}|c_{$\eta$}|\sqrt{z_{i}+\frac{1}{2}\sum_{i=1}^{n}$\alpha$_{i}z_{i}+\frac{$\sigma$}{p}(\sum_{i=1}^{n}z_{i})^{2}$\epsilon$. z_{i}\geq 0,. solution of. (12). i=1 , can. ... .. ,. (12). n.. be converted to generate x^{*}. through. (11).. Remark 2 The. first. two derivatives. of the. secular. function h(t). h'(t)=\displaystyle\sum_{i=1}^{n}\frac{-2$\sigma$c_{i}^{2}{($\alpha$_{i}+$\sigma$t)^{3}-\frac{2}{p-2}t^{\frac{4-}{p-}f_{2} and. h'(t)=\displaystyle\sum_{i=1}^{n}\frac{6$\sigma$^{2}c_{i}^{2}{($\alpha$_{i}+$\sigma$t)^{4}-\frac{2(4-p)}{(p-2)^{2}t^{\frac{6-}{\mathrm{p}\text{∽}2-2.. are. the transfor‐.

(8) 54. h(t). strictly decreasing. is. that. ensures. choosing. the. c\neq 0). is. always. convex. convex. a. only for p\geq 4. finite. h(t). $\sigma$,. .. subinterval. can. be made. For. p=3 if H\not\geq 0 (which ,. of I_{g} covering t^{*} by properly ,. convex.. Notice that the secular. though.. Conclusions. solving. regularization subproblem. unconstrained optimization. regularization subproblem on. and. regularization parameter. Since the cubic for. I_{g}. and h is restricted to. function for (TRS). 3. on. the. between the two types of. problems,. is very close to. p‐‐regularized subproblems. for. can. (TRS). be used to it is in. modify Newtons method. generally. believed that the cubic. spirit. Our comprehensive analysis. general p>2 gives. the most detailed. subproblems and confirms that they do share. comparison. many similar. features.. References [1]. D. Bienstock and A.. Michalka, Polynomial Solvability of. Subproblem, ACM‐SIAM Symposium. [2]. [3]. C.. Cartis,. N. I. M.. optimization. Part. Mathematical. Programming, 127 (2011),. A. R.. Conn,. E.. Erkut,. search 46. N. I. M. on. Discrete. Algorithms (2014),. Gould, Ph.. I:. L.. motivation,. pp. 380‐390.. convergence and numerical. The discrete p ‐dispersion pp. 48−60. results,. pp. 245‐295.. Toint, Trust‐Region Methods, Number. optimization, Philadelphia, SIAM,. (1990),. of the Tmst‐Region. Gould, Ph. L. Toint, Adaptive cubic regularisation methods for. unconstrained. SIAM Series. [4]. on. Variants. 01 MPS‐. 2000.. problem, European Journal of Operational Re‐.

(9) 55. [5]. S. C. Fang, D. and Its. 0ptimization. (2014). Mech. Solids. [6]. M. R.. Gao, G.. GAREY,. X.. Lin, R. L. Sheu,. in The. W.. n ‐dimensional. Real. [8]. D. M.. D. S.. 2(1981),. pp. 186‐197.. N. I. M.. Gould, D.. P.. Robinson,. regularised subproblems. [9]. A.. Part. I,. to appear in Math.. JOHNSON, Computers and Intractability: A guide. Gay, Computing optimal locally. 2(2010),. Space‐. arXiv: 1410.5925. of NP‐completeness, Freeman, San Francisco,. [7]. Xing, Double Well Potential Function. in. to the. 1979.. constrained steps, SIAM J. Sci. Stat.. H. Sue. theory. Comput.,. Thorne, On solving trust‐region and other. optimization, Mathematical Programming Computation,. pp. 21‐57.. Griewank,. The. modification of Newton^{\mathrm{Z}}s method for unconstrained optimization. by bounding cubic terms, Technical Report \mathrm{D}\mathrm{A}\mathrm{M}\mathrm{T}\mathrm{P}/\mathrm{N}\mathrm{A}12 Department of Applied ,. Mathematics and Theoretical. [10]. P.. Hansen,. I.. TIMS/ORSA [11]. [12]. [13]. Y.. Hsia, R.. Sheu,. Trust. on. a. network, Presentation. 1981.. at. the. Meeting, Washington, D.C. (1988). Region Subproblem with. a. Fixed Number of Additional. Linear. Inequality Constraints has Polynomial Complexity, arXiv:1312.1398,. R. A.. Horn, C. R. Johnson, Matrix Analysis, Cambridge: Cambridge University. Press,. 1985.. J. M.. Martínez, Local. spheres, SIAM. [14]. Moon, Dispersing facilities. Joint National. L.. Physics, Cambridge University, Cambridge, UK,. J. J.. J.. minimizers. of quadratic function. optimization, 4(1994),. Moré, D. C. Sorensen, Computing. COMpUt., 4(1983),. pp. 553‐572.. on. 2013. Euclidean balls and. pp. 159‐176.. a. trust. region step, SIAM J. Sci. Statist..

(10) 56. [15]. Y.. Nesterov, B. T. Polyak, Cubic regularization of. performance,. [16]. D. C.. M.. Program., 108(2006),. Sorensen, Newtons method with. Numer.. [17]. Math.. Anal., 19(1982),. Weiser,. P.. Deuflhard,. Y.. global. pp. 177‐205.. model trust. region modification, SIAM J.. pp. 409‐426.. B.. nonlinear elastomechanics,. [18]. a. Newton method and its. Erdmann, Affine conjugate adaptive Newton methods for. Optim.. Methods. Softw., 22(2007),. pp. 413‐431.. Xia, S. C. Fang, R. L. Sheu, W. Xing, Double Well Potential Function and Its. optimization Solids. (2014). in The. n ‐dimensional. arXiv: 1404. 1963. Real. Space—. Part. II,. to appear in Math. Mech..

(11)