
A hybrid method of three-term conjugate gradient method and memoryless quasi-Newton method for unconstrained optimization

Shummin Nakayama

(Received September 29, 2017; Revised May 21, 2018)

Abstract. Memoryless quasi-Newton methods are studied for solving large-scale unconstrained optimization problems. Nakayama et al. (2017) proposed a memoryless quasi-Newton method based on the spectral-scaling Broyden family and showed that the method satisfies the sufficient descent condition and converges globally. To relax the conditions on parameters in the method, we apply the modification technique by Kou and Dai (2015) to the method of Nakayama et al., and we give a hybrid method of the three-term conjugate gradient method and the memoryless quasi-Newton method based on the spectral-scaling Broyden family. We show that our method satisfies the sufficient descent condition, and we prove that the method converges globally. Furthermore, we give a concrete choice of parameters for our method. Finally, some numerical results are given.

AMS 2010 Mathematics Subject Classification. 90C30, 90C06.

Key words and phrases. Unconstrained optimization, memoryless quasi-Newton method, Broyden family, three-term conjugate gradient method, sufficient descent condition, global convergence.

§1. Introduction

Quasi-Newton methods are known as effective numerical methods for solving the following unconstrained optimization problem:

\min_{x \in \mathbb{R}^n} f(x),

where f is a smooth function. We denote its gradient ∇f by g. The quasi-Newton method is an iterative method of the form:

(1.1)  x_{k+1} = x_k + \alpha_k d_k,


where x_k ∈ R^n is the k-th approximation to a solution, the step size α_k > 0 is obtained by some line search, and the search direction d_k is given by

(1.2)  d_k = -H_k g_k.

Here, we denote ∇f(x_k) by g_k, and H_k is an approximation to the inverse Hessian ∇²f(x_k)^{-1}. In this paper, we fix the initial direction by d_0 = -g_0. The matrix H_k is updated at each iteration such that the secant condition

H_k y_{k-1} = s_{k-1}

is satisfied, where s_{k-1} and y_{k-1} are defined by

s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}  and  y_{k-1} = g_k - g_{k-1},

respectively. The well-known updating formulas are the BFGS, DFP and symmetric rank-one (SR1) formulas. This paper focuses on the Broyden family

(1.3)  H_k = H_{k-1} - \frac{H_{k-1} y_{k-1} y_{k-1}^T H_{k-1}}{y_{k-1}^T H_{k-1} y_{k-1}} + \frac{s_{k-1} s_{k-1}^T}{s_{k-1}^T y_{k-1}} + \theta_{k-1} w_{k-1} w_{k-1}^T,

(1.4)  w_{k-1} = \sqrt{y_{k-1}^T H_{k-1} y_{k-1}} \left( \frac{s_{k-1}}{s_{k-1}^T y_{k-1}} - \frac{H_{k-1} y_{k-1}}{y_{k-1}^T H_{k-1} y_{k-1}} \right),

where θ_{k-1} is a parameter. In particular, the formula (1.3) becomes the DFP formula when θ_{k-1} = 0, and the BFGS formula when θ_{k-1} = 1. Moreover, the Broyden family (1.3) is a convex combination of the DFP and BFGS formulas if θ_{k-1} ∈ [0, 1], and this interval is called the convex class. The BFGS formula (θ_{k-1} = 1) is known as one of the best choices in practice. On the other hand, Zhang and Tewarson [27] dealt with the preconvex class, namely the interval θ_{k-1} > 1, to find a better choice than the BFGS formula. If s_{k-1}^T y_{k-1} > 0, θ_{k-1} ≥ 0 and H_{k-1} is symmetric positive definite, then the matrix H_k updated by the Broyden family (1.3) is also symmetric positive definite. This guarantees that the search direction satisfies the descent condition, namely, g_k^T d_k = -g_k^T H_k g_k < 0.
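As an illustration, the update (1.3)–(1.4) can be coded in a few lines; the following Python/NumPy sketch is ours (the function name broyden_family_update is not from the paper) and assumes s_{k-1}^T y_{k-1} > 0 and a symmetric positive definite H_{k-1}.

    import numpy as np

    def broyden_family_update(H, s, y, theta):
        # Broyden family update (1.3)-(1.4): theta = 0 gives DFP, theta = 1 gives BFGS.
        Hy = H @ y
        yHy = y @ Hy            # y^T H y, positive when H is positive definite
        sy = s @ y              # s^T y, positive under the Wolfe conditions
        w = np.sqrt(yHy) * (s / sy - Hy / yHy)          # (1.4)
        return (H - np.outer(Hy, Hy) / yHy
                  + np.outer(s, s) / sy
                  + theta * np.outer(w, w))             # (1.3)

For θ ∈ [0, 1] the result stays symmetric positive definite whenever s^T y > 0, so d_k = -H_k g_k is a descent direction.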

Since quasi-Newton methods need to store matrices, it is difficult to apply them directly to large-scale unconstrained optimization problems. In order to remedy this difficulty, Shanno [22] proposed the memoryless quasi-Newton method for solving large-scale unconstrained optimization problems, which avoids the storage of matrices. Specifically, Shanno substituted (1.3) and (1.4) with


H_{k-1} = I into (1.2) and derived the following search direction:

(1.5)  d_k = -g_k + \left( \theta_{k-1} \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( 1 + \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} \right) d_{k-1} + \left( \theta_{k-1} \frac{d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} + (1 - \theta_{k-1}) \frac{y_{k-1}^T g_k}{y_{k-1}^T y_{k-1}} \right) y_{k-1}.
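The direction (1.5) needs only inner products, so it can be evaluated in O(n) memory; a minimal sketch under the same assumptions, with a hypothetical helper name:

    import numpy as np

    def shanno_direction(g, d_prev, s, y, theta):
        # Memoryless direction (1.5): the Broyden family with H_{k-1} = I,
        # evaluated from inner products only (no matrix is ever formed).
        dy = d_prev @ y
        beta = (theta * (y @ g) / dy
                - (1.0 + theta * (y @ y) / (s @ y)) * (s @ g) / dy)
        zeta = theta * (d_prev @ g) / dy + (1.0 - theta) * (y @ g) / (y @ y)
        return -g + beta * d_prev + zeta * y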

The memoryless quasi-Newton method is closely related to the nonlinear conjugate gradient (CG) method. Under the exact line search, namely g_k^T d_{k-1} = 0, the search direction (1.5) with θ_{k-1} = 1 becomes

(1.6)  d_k = -g_k + \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} d_{k-1},

which is identical to the nonlinear CG method with the Hestenes-Stiefel (HS) formula (see [17, 19], for example). Recently, three-term CG methods have received much attention (see [1, 18, 26], for example). By using the memoryless quasi-Newton method based on the BFGS formula, several three-term CG methods have been proposed (see [2, 24, 25], for example).

In this decade, several memoryless quasi-Newton methods have been studied. Kou and Dai [14] proposed the modified self-scaling memoryless BFGS method. Furthermore, several researchers have paid attention to memoryless quasi-Newton methods based on other updating formulas instead of the BFGS formula. Nakayama et al. [15] proposed the memoryless quasi-Newton method based on the SR1 formula with the spectral-scaling secant condition [5]. The above methods always satisfy the sufficient descent condition, which means that there exists a positive constant c such that

(1.7)  g_k^T d_k \le -c \|g_k\|^2 for all k,

where ∥·∥ is the ℓ₂ norm. Moreover, they showed the global convergence of the method for general objective functions. Nakayama et al. [16] also proposed the memoryless quasi-Newton method based on the spectral-scaling Broyden family [4] and gave a sufficient condition for the global convergence of the method. In their numerical experiments, they showed that the proposed method with the preconvex class performed better than the method with the convex class did.

This paper focuses on the modification of the method by Kou and Dai [14]. Considering their modification technique, we modify the memoryless quasi-Newton method based on the Broyden family [16] and propose a new method, which always satisfies the sufficient descent condition. We show that the method converges globally for general objective functions. Furthermore,


we give parameters for which our method can be regarded as a three-term CG method.

This paper is organized as follows. We recall some preliminaries and previous research in Section 2. In Section 3, we propose a new method and show its global convergence. Finally, some numerical results are given.

§2. Preliminaries and previous research

2.1. Preliminaries

In this subsection, we recall some preliminaries. We first make the following standard assumptions for the objective function.

Assumption 1.

(i) The level set L = {x | f(x) ≤ f(x_0)} at the initial point x_0 is bounded, namely, there exists a positive constant â such that

(2.1)  ∥x∥ ≤ â for all x ∈ L.

(ii) The objective function f is continuously differentiable on an open convex neighborhood N of L, and its gradient g is Lipschitz continuous on N, namely, there exists a positive constant L such that

(2.2)  ∥g(u) - g(v)∥ ≤ L∥u - v∥ for all u, v ∈ N.

Note that, under Assumption 1, there exists a positive constant Γ such that

(2.3)  ∥g(x)∥ ≤ Γ for all x ∈ L.

Throughout this paper, we assume that

g_k ≠ 0 for all k ≥ 0; otherwise a stationary point has been found.

In the line search procedure, we require the step size α_k in (1.1) to satisfy the Wolfe conditions:

(2.4)  f(x_k + \alpha_k d_k) - f(x_k) \le \delta \alpha_k g_k^T d_k,

(2.5)  g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k,

where 0 < δ < σ < 1. If the search direction satisfies the descent condition, then condition (2.5) yields

(2.6)  d_k^T y_k \ge -(1 - \sigma) g_k^T d_k.

Since this paper deals with the Wolfe conditions (2.4)–(2.5), we have

(2.7)  d_k^T y_k > 0 \ (s_k^T y_k > 0) for all k ≥ 0,

which implies ∥s_k∥ ≠ 0 and ∥y_k∥ ≠ 0.
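For illustration, a trial step size can be tested against (2.4)–(2.5) as follows; this sketch is ours, f and grad are user-supplied callables, and the defaults δ = 0.1 and σ = 0.9 are the values used in the experiments of Section 4.

    def satisfies_wolfe(f, grad, x, d, alpha, delta=0.1, sigma=0.9):
        # Check the Wolfe conditions (2.4)-(2.5) for a trial step size alpha.
        gd = grad(x) @ d                                         # g_k^T d_k < 0
        armijo = f(x + alpha * d) - f(x) <= delta * alpha * gd   # (2.4)
        curvature = grad(x + alpha * d) @ d >= sigma * gd        # (2.5)
        return armijo and curvature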

2.2. Modified self-scaling memoryless BFGS method

In this subsection, we review the memoryless quasi-Newton method by Kou and Dai [14]. Modifying the memoryless quasi-Newton method based on the self-scaling BFGS method, Kou and Dai [14] proposed the following search direction:

(2.8)  d_k = -g_k + \left( \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( \tau_{k-1} + \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} \right) d_{k-1} + \nu_{k-1} \frac{d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} y_{k-1},

where ν_{k-1} ∈ [0, 1] is a parameter and τ_{k-1} is a scaling parameter satisfying

\tau_{k-1} \in \left[ \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}}, \ \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right],

which corresponds to the interval proposed by Oren and Luenberger [20, 21]. We note that since s_{k-1}^T y_{k-1} > 0,

(2.9)  \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \le \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}}

holds by the Cauchy-Schwarz inequality. The search direction (2.8) with ν_{k-1} = 0 can be regarded as a CG method of the form d_k = -g_k + β_k d_{k-1}, where β_k is defined by

\beta_k = \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( \tau_{k-1} + \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}}.

Furthermore, if g_k^T d_{k-1} = 0, then the search direction (2.8) becomes (1.6). The following lemma was given by [14, Lemma 3.1].

Lemma 2.1. If ν_{k-1} ∈ [0, 1) is a constant and (2.7) holds, then the search direction (2.8) satisfies the sufficient descent condition (1.7) for some positive constant c.


In order to guarantee the global convergence, Kou and Dai modified the direction (2.8) as follows:

(2.10)  d_k = -g_k + \beta_k^{KD} d_{k-1} + \zeta_k^{KD} y_{k-1},

\beta_k^{KD} = \max \left\{ \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( \tau_{k-1} + \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}}, \ \rho \frac{g_k^T d_{k-1}}{\|d_{k-1}\|^2} \right\}

and

\zeta_k^{KD} = \begin{cases} 0 & \text{if } \beta_k^{KD} = \rho \frac{g_k^T d_{k-1}}{\|d_{k-1}\|^2}, \\[4pt] \nu_{k-1} \frac{d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} & \text{otherwise}, \end{cases}

where ρ ∈ (0, 1) is a parameter. Note that the above modified search direction always satisfies the sufficient descent condition (1.7). In the line search procedure, they used the improved Wolfe conditions [6]:

(2.11)  f(x_k + \alpha_k d_k) - f(x_k) \le \min \left\{ \epsilon |f(x_k)|, \ \delta \alpha_k g_k^T d_k + \eta_k \right\}

and (2.5), where 0 < δ < σ < 1, ϵ > 0 is a small number and η_k = 1/k². The following theorem was given by [14, Theorem 4.2].

Theorem 2.2. Suppose that Assumption 1 is satisfied. Consider the method (1.1) and (2.10) with ν_{k-1} ∈ [0, 1), where the step size α_k satisfies the improved Wolfe conditions (2.5) and (2.11). Then the method converges in the sense that

(2.12)  \liminf_{k \to \infty} \|g_k\| = 0

holds.
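Before moving on, the safeguarded direction (2.10) can be sketched as follows; the function name kou_dai_direction is our own choice, and tau, nu and rho stand for the parameters τ_{k-1}, ν_{k-1} and ρ of this subsection.

    import numpy as np

    def kou_dai_direction(g, d_prev, s, y, tau, nu, rho):
        # Safeguarded Kou-Dai direction (2.10).
        dy = d_prev @ y
        beta_raw = (y @ g) / dy - (tau + (y @ y) / (s @ y)) * (s @ g) / dy
        floor = rho * (g @ d_prev) / (d_prev @ d_prev)
        if beta_raw > floor:
            beta, zeta = beta_raw, nu * (d_prev @ g) / dy
        else:
            beta, zeta = floor, 0.0   # truncation active: drop the y_{k-1} term
        return -g + beta * d_prev + zeta * y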

2.3. Memoryless quasi-Newton method based on the Broyden family

In this subsection, we recall the memoryless quasi-Newton method based on the Broyden family by Nakayama et al. [16]. They focused on the following spectral-scaling Broyden family [4]:

(2.13)  H_k = H_{k-1} - \frac{H_{k-1} y_{k-1} y_{k-1}^T H_{k-1}}{y_{k-1}^T H_{k-1} y_{k-1}} + \frac{1}{\gamma_{k-1}} \frac{s_{k-1} s_{k-1}^T}{s_{k-1}^T y_{k-1}} + \theta_{k-1} w_{k-1} w_{k-1}^T,

where w_{k-1} appears in (1.4) and γ_{k-1} > 0 is a scaling parameter. The formula (2.13) is the Broyden family based on the spectral-scaling secant condition [5]:

H_k y_{k-1} = \frac{1}{\gamma_{k-1}} s_{k-1},

where H_k is an approximation to (\gamma_{k-1} \nabla^2 f(x_k))^{-1}. Nakayama et al. [16] gave the search direction of a memoryless quasi-Newton method based on (2.13), which is

(2.14)  d_k = -g_k + \left( \theta_{k-1} \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( \hat{\gamma}_{k-1} + \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} \right) d_{k-1} + \left( \theta_{k-1} \frac{d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} + (1 - \theta_{k-1}) \frac{y_{k-1}^T g_k}{y_{k-1}^T y_{k-1}} \right) y_{k-1},

where γ̂_{k-1} = 1/γ_{k-1}. Note that the search direction (2.14) with θ_{k-1} = 1 corresponds to (2.8) with ν_{k-1} = 1 and τ_{k-1} = γ̂_{k-1}. They gave the following proposition.

Proposition 2.3. If conditions (2.7),

(2.15)  \hat{\gamma}_{k-1} \ge \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}}

and

(2.16)  0 < \theta_{\min} \le \theta_{k-1} \le \theta_{\max} < 2

hold, then the search direction (2.14) satisfies the sufficient descent condition (1.7) with c := \min \left\{ \frac{\theta_{\min}}{2}, \ 1 - \frac{\theta_{\max}}{2} \right\}, where θ_min and θ_max are constants satisfying 0 < θ_min ≤ 1 ≤ θ_max < 2.

In order to establish the global convergence of the method, Nakayama et al. [16] modified (2.14) as follows:

(2.17)  d_k = -g_k + \beta_k^{NNY} d_{k-1} + \zeta_k^{NNY} y_{k-1},

(2.18)  \beta_k^{NNY} = \max \left\{ \theta_{k-1} \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( \hat{\gamma}_{k-1} + \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}}, \ 0 \right\},

and

(2.19)  \zeta_k^{NNY} = \mathrm{sgn}(\beta_k^{NNY}) \left( \theta_{k-1} \frac{d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} + (1 - \theta_{k-1}) \frac{y_{k-1}^T g_k}{y_{k-1}^T y_{k-1}} \right),

where sgn(·) is defined by

\mathrm{sgn}(a) = \begin{cases} 1 & a > 0, \\ 0 & a = 0. \end{cases}
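In code, (2.17)–(2.19) can be sketched as follows; the helper name nny_direction is ours, and gamma_hat stands for γ̂_{k-1} = 1/γ_{k-1}.

    import numpy as np

    def nny_direction(g, d_prev, s, y, theta, gamma_hat):
        # Direction (2.17) with beta from (2.18) and zeta from (2.19).
        dy = d_prev @ y
        beta = max(theta * (y @ g) / dy
                   - (gamma_hat + theta * (y @ y) / (s @ y)) * (s @ g) / dy, 0.0)
        if beta > 0.0:                # sgn(beta) = 1
            zeta = theta * (d_prev @ g) / dy + (1.0 - theta) * (y @ g) / (y @ y)
        else:                         # sgn(0) = 0
            zeta = 0.0
        return -g + beta * d_prev + zeta * y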


Theorem 2.4. Suppose that Assumption 1 is satisfied. Consider the method (1.1) and (2.17) with

(2.20)  \hat{\gamma}_{k-1} = \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}}

and (2.16), where the step size α_k satisfies the Wolfe conditions (2.4)–(2.5). Then the method converges in the sense that (2.12) holds.

§3. A hybrid method of three-term CG method and memoryless quasi-Newton method

In their numerical experiments, Nakayama et al. [16] showed that the method (1.1) and (2.17) with

(3.1)  \hat{\gamma}_{k-1} = \theta_{k-1} \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}}

performed better than the method with (2.20) did. However, (3.1) may not satisfy the condition (2.15), which does not guarantee the global convergence of the method with (3.1). In order to relax the condition (2.15), we apply the modification technique by Kou and Dai [14] to the method of Nakayama et al. [16], and we give a new method. We show that the proposed method satisfies the sufficient descent condition without the condition (2.15), and the method converges globally. We note that the proposed method can adopt the parameter (3.1).

3.1. Proposed method

Based on the modification described in Section 2.2, we propose the following search direction:

(3.2)  d_k = -g_k + \left( \theta_{k-1} \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} - \left( \hat{\gamma}_{k-1} + \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{s_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} \right) d_{k-1} + \nu_{k-1} \left( \theta_{k-1} \frac{d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} + (1 - \theta_{k-1}) \frac{y_{k-1}^T g_k}{y_{k-1}^T y_{k-1}} \right) y_{k-1},

where ν_{k-1} is a parameter. Note that the search direction (3.2) with ν_{k-1} = 1 is identical to (2.14), and that (3.2) with θ_{k-1} = 1 corresponds to (2.8) with τ_{k-1} = γ̂_{k-1}. In the same way as (2.17), we modify (3.2) as follows:

(3.3)  d_k = -g_k + \beta_k^{NNY} d_{k-1} + \zeta_k^{new} y_{k-1}


and

(3.4)  \zeta_k^{new} = \nu_{k-1} \zeta_k^{NNY},

where β_k^{NNY} and ζ_k^{NNY} are defined by (2.18) and (2.19), respectively. Note that, if β_k^{NNY} > 0 is satisfied, then the search direction (3.3) is identical to (3.2). Otherwise, the search direction (3.3) becomes the steepest descent direction (d_k = -g_k).
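The proposed direction differs from (2.17) only in that ζ_k^{NNY} is damped by ν_{k-1}; a minimal sketch with our own helper name, where gamma_hat may be supplied by, e.g., (2.20) or (3.1):

    import numpy as np

    def proposed_direction(g, d_prev, s, y, theta, nu, gamma_hat):
        # Proposed direction (3.3)-(3.4): NNY coefficients with zeta scaled by nu.
        dy = d_prev @ y
        beta = max(theta * (y @ g) / dy
                   - (gamma_hat + theta * (y @ y) / (s @ y)) * (s @ g) / dy, 0.0)
        if beta > 0.0:
            zeta = nu * (theta * (d_prev @ g) / dy
                         + (1.0 - theta) * (y @ g) / (y @ y))   # zeta_new of (3.4)
        else:
            zeta = 0.0    # the direction reduces to steepest descent, d_k = -g_k
        return -g + beta * d_prev + zeta * y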

To establish the sufficient descent property of the proposed method, we impose the following condition on θ_{k-1}:

(3.5)  0 \le \theta_{k-1} \le \bar{c}_1^2,

where 1 < c̄₁ < 2 is a constant. As a choice of ν_{k-1}, we consider the following:

(3.6)  \begin{cases} 0 \le \nu_{k-1} \le \bar{\nu} & \text{if } 0 \le \theta_{k-1} \le 1, \\[4pt] 0 \le \nu_{k-1} \le \frac{\bar{c}_1}{\sqrt{\theta_{k-1}}} - 1 & \text{if } 1 < \theta_{k-1} \le \bar{c}_1^2, \end{cases}

where 0 ≤ ν̄ < 1. Then we give the sufficient condition for the search direction (3.3) to satisfy (1.7) as follows.

Proposition 3.1. Assume that (2.7) holds. Then the search direction (3.3) with (3.5)–(3.6) satisfies the sufficient descent condition (1.7) for some positive constant c.

Proof. By (2.18), we first note that β_k^{NNY} ≥ 0 holds for all k ≥ 1. For the case β_k^{NNY} = 0, the search direction (3.3) becomes the steepest descent direction, which implies that the sufficient descent condition (1.7) with c = 1 holds. Otherwise, the search direction (3.3) is identical to (3.2). Thus, it is sufficient to show that (3.2) satisfies (1.7).

Using the relation 2u^T v ≤ ∥u∥² + ∥v∥² for any vectors u and v, we have

(1 + \nu_{k-1}) \frac{(y_{k-1}^T g_k)(d_{k-1}^T g_k)}{d_{k-1}^T y_{k-1}} = \left( \frac{\sqrt{2}\, d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} y_{k-1} \right)^{\!T} \left( \frac{1 + \nu_{k-1}}{\sqrt{2}} g_k \right) \le \frac{1}{2} \left( \left\| \frac{\sqrt{2}\, d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} y_{k-1} \right\|^2 + \left\| \frac{1 + \nu_{k-1}}{\sqrt{2}} g_k \right\|^2 \right),

and hence it follows from (2.7), (3.2) and s_{k-1} = α_{k-1} d_{k-1} that

g_k^T d_k = -\|g_k\|^2 + \theta_{k-1}(1 + \nu_{k-1}) \frac{(y_{k-1}^T g_k)(d_{k-1}^T g_k)}{d_{k-1}^T y_{k-1}} - \left( \hat{\gamma}_{k-1} + \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{\alpha_{k-1}(d_{k-1}^T g_k)^2}{d_{k-1}^T y_{k-1}} + \nu_{k-1}(1 - \theta_{k-1}) \frac{(y_{k-1}^T g_k)^2}{y_{k-1}^T y_{k-1}}

\le -\|g_k\|^2 + \frac{\theta_{k-1}}{2} \left( \left\| \frac{\sqrt{2}\, d_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} y_{k-1} \right\|^2 + \left\| \frac{1 + \nu_{k-1}}{\sqrt{2}} g_k \right\|^2 \right) - \left( \hat{\gamma}_{k-1} + \theta_{k-1} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \right) \frac{\alpha_{k-1}(d_{k-1}^T g_k)^2}{d_{k-1}^T y_{k-1}} + \nu_{k-1}(1 - \theta_{k-1}) \frac{(y_{k-1}^T g_k)^2}{y_{k-1}^T y_{k-1}}

= -\left( 1 - \frac{\theta_{k-1}(1 + \nu_{k-1})^2}{4} \right) \|g_k\|^2 - \hat{\gamma}_{k-1} \frac{\alpha_{k-1}(d_{k-1}^T g_k)^2}{d_{k-1}^T y_{k-1}} + \nu_{k-1}(1 - \theta_{k-1}) \frac{(y_{k-1}^T g_k)^2}{y_{k-1}^T y_{k-1}}

\le -\left( 1 - \frac{\theta_{k-1}(1 + \nu_{k-1})^2}{4} \right) \|g_k\|^2 + \nu_{k-1}(1 - \theta_{k-1}) \frac{(y_{k-1}^T g_k)^2}{y_{k-1}^T y_{k-1}}.

We consider the case 0 ≤ θ_{k-1} ≤ 1. Using ν_{k-1}(1 - θ_{k-1}) ≥ 0 and the Cauchy-Schwarz inequality,

g_k^T d_k \le -\left( 1 - \frac{\theta_{k-1}(1 + \nu_{k-1})^2}{4} \right) \|g_k\|^2 + \nu_{k-1}(1 - \theta_{k-1}) \frac{\|y_{k-1}\|^2 \|g_k\|^2}{\|y_{k-1}\|^2} = -\left( (1 - \nu_{k-1}) - \frac{\theta_{k-1}}{4} \left( 1 - 2\nu_{k-1} + \nu_{k-1}^2 \right) \right) \|g_k\|^2 = -(1 - \nu_{k-1}) \left( 1 - \frac{\theta_{k-1}}{4} (1 - \nu_{k-1}) \right) \|g_k\|^2

holds. Since it follows from 0 ≤ ν_{k-1} and θ_{k-1} ≤ 1 that

-\left( 1 - \frac{\theta_{k-1}}{4} (1 - \nu_{k-1}) \right) \le -\left( 1 - \frac{1}{4} \right) = -\frac{3}{4},

we obtain from 1 - ν_{k-1} > 0 and (3.6)

g_k^T d_k \le -\frac{3(1 - \nu_{k-1})}{4} \|g_k\|^2 \le -\frac{3(1 - \bar{\nu})}{4} \|g_k\|^2.


We next consider the case 1 < θ_{k-1} ≤ c̄₁². Since ν_{k-1}(1 - θ_{k-1}) ≤ 0 holds, we have from (3.6)

g_k^T d_k \le -\left( 1 - \frac{\theta_{k-1}(1 + \nu_{k-1})^2}{4} \right) \|g_k\|^2 \le -\left( 1 - \frac{\theta_{k-1}}{4} \frac{\bar{c}_1^2}{\theta_{k-1}} \right) \|g_k\|^2 = -\left( 1 - \frac{\bar{c}_1^2}{4} \right) \|g_k\|^2.

Therefore, the search direction (3.2) satisfies the sufficient descent condition (1.7) with c = \min \left\{ \frac{3(1 - \bar{\nu})}{4}, \ 1 - \frac{\bar{c}_1^2}{4} \right\}.

3.2. Global convergence

In this subsection, we prove the global convergence of the proposed method for general objective functions. We first introduce the following property.

Property 1. Consider the method (1.1) and (3.3). Suppose that there exists a positive constant ε such that

(3.7)  ε ≤ ∥g_k∥ for all k

holds. Then we say that the method has Property 1 if there exists a positive constant c̄₂ such that

\hat{\gamma}_{k-1} \le \bar{c}_2 \|d_{k-1}\| for all k.

The next proposition guarantees that the proposed method with some concrete choices of γ̂_{k-1} has Property 1.

Proposition 3.2. Suppose that Assumption 1 is satisfied. Consider the method (1.1) and (3.3), where the step size α_k satisfies the Wolfe conditions (2.4)–(2.5). Then the following hold:

1. If we set

(3.8)  \hat{\gamma}_{k-1} = \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}},

then the method has Property 1.

2. If we set

(3.9)  \hat{\gamma}_{k-1} = \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}},

then the method has Property 1.

Proof. By (2.1) and (2.2), we obtain

∥y_{k-1}∥ ≤ L∥s_{k-1}∥ ≤ L(∥x_k∥ + ∥x_{k-1}∥) ≤ 2Lâ.

If (3.7) holds, then it follows from (1.7), (2.2), (2.6) and (2.9) that

\frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \le \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \le \frac{2L^2 \hat{a} \|s_{k-1}\|}{\alpha_{k-1} c (1 - \sigma) \varepsilon^2} = \frac{2L^2 \hat{a}}{c(1 - \sigma)\varepsilon^2} \|d_{k-1}\|.

Therefore, the methods have Property 1.

We note that the proposed method with (2.20) has Property 1. Also, the method with (3.1) has Property 1. We emphasize that the proposed method can choose the parameter (3.1), which is a good choice in the previous research [16].

The following lemma corresponds to [16, Lemma 3.3]. Since the proof is almost the same as that of [16, Lemma 3.3], we omit it. Note that the lemma imposes a different condition on θ_{k-1} from that of [16, Lemma 3.3].

Lemma 3.3. Consider the method (1.1) and (3.3) with (3.5), where the step size α_k satisfies the Wolfe conditions (2.4)–(2.5). Suppose that Assumption 1 is satisfied and there exists a positive constant ε such that (3.7) holds. If the method has Property 1, then there exist constants b > 1 and ξ > 0 such that for all k,

(3.10)  \beta_k^{NNY} \le b

and

(3.11)  \|s_{k-1}\| \le \xi \implies \beta_k^{NNY} \le \frac{1}{2b}.

The next lemma corresponds to [7, Lemma 3.4] and [9, Lemma 4.1].

Lemma 3.4. If all assumptions of Lemma 3.3 are satisfied, then d_k ≠ 0 and

\sum_{k=0}^{\infty} \|u_k - u_{k-1}\|^2 < \infty

holds, where u_k = d_k / ∥d_k∥.

Proof. Since d_k ≠ 0 follows from (1.7) and (3.7), the vector u_k is well-defined. By defining

v_k = \frac{1}{\|d_k\|} \left( -g_k + \zeta_k^{new} y_{k-1} \right) \quad \text{and} \quad \eta_k = \beta_k^{NNY} \frac{\|d_{k-1}\|}{\|d_k\|},

equation (3.3) is written as u_k = v_k + η_k u_{k-1}.

Then we get from the fact ∥u_k∥ = ∥u_{k-1}∥ = 1 that

(3.12)  ∥v_k∥ = ∥u_k - η_k u_{k-1}∥ = ∥η_k u_k - u_{k-1}∥.

From the relations β_k^{NNY} ≥ 0 and (3.12), we obtain

(3.13)  ∥u_k - u_{k-1}∥ ≤ (1 + η_k)∥u_k - u_{k-1}∥ = ∥u_k - η_k u_{k-1} + η_k u_k - u_{k-1}∥ ≤ ∥u_k - η_k u_{k-1}∥ + ∥η_k u_k - u_{k-1}∥ = 2∥v_k∥.

Since the search direction satisfies the descent condition, it follows from (2.6) that

g_k^T d_{k-1} \ge \sigma g_{k-1}^T d_{k-1} \ge -\frac{\sigma}{1 - \sigma} d_{k-1}^T y_{k-1} \quad \text{and} \quad d_{k-1}^T y_{k-1} = g_k^T d_{k-1} - g_{k-1}^T d_{k-1} \ge g_k^T d_{k-1}.

Hence we obtain

(3.14)  \left| \frac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} \right| \le \max \left\{ \frac{\sigma}{1 - \sigma}, \ 1 \right\}.

Using ν_{k-1} < 1, 0 ≤ θ_{k-1} < 4, (3.4) and (3.14), we get

(3.15)  \|\zeta_k^{new} y_{k-1}\| \le \nu_{k-1} \left( \theta_{k-1} \left| \frac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} \right| \|y_{k-1}\| + |1 - \theta_{k-1}| \frac{\|y_{k-1}\| \|g_k\|}{\|y_{k-1}\|^2} \|y_{k-1}\| \right) < 4 \max \left\{ \frac{\sigma}{1 - \sigma}, \ 1 \right\} \|y_{k-1}\| + 3 \|g_k\|.


By (2.1), (2.2), (2.3), (3.13) and (3.15), for any positive integer m, we have

\sum_{k=0}^{m} \|u_k - u_{k-1}\|^2 \le 4 \sum_{k=0}^{m} \|v_k\|^2 \le 4 \sum_{k=0}^{m} \left( 4\|g_k\| + 4 \max \left\{ \frac{\sigma}{1-\sigma}, 1 \right\} \|y_{k-1}\| \right)^2 \frac{1}{\|d_k\|^2} \le 64 \left( \Gamma + 2L\hat{a} \max \left\{ \frac{\sigma}{1-\sigma}, 1 \right\} \right)^2 \sum_{k=0}^{m} \frac{1}{\|d_k\|^2}.

Since the search direction satisfies the sufficient descent condition, we obtain \sum_{k=0}^{\infty} 1/\|d_k\|^2 < \infty by (3.7) (see [23, Lemma 3.1]). Therefore, we have \sum_{k=0}^{\infty} \|u_k - u_{k-1}\|^2 < \infty.

Let \mathbb{N} denote the set of all positive integers. For λ > 0 and a positive integer ∆, we define

K_{k,\Delta}^{\lambda} := \{ i \in \mathbb{N} \mid \|s_{i-1}\| > \lambda, \ k \le i \le k + \Delta - 1 \}.

Let |K_{k,\Delta}^{\lambda}| denote the number of elements in K_{k,\Delta}^{\lambda}. The following lemma shows that if the magnitude of the gradient is bounded away from zero and (3.10)–(3.11) hold, then a certain fraction of the steps cannot be too small. This lemma corresponds to [1, Lemma 3] and [9, Lemma 4.2]. Since the proof is almost the same as that of [1, Lemma 3] and [9, Lemma 4.2], we omit it.

Lemma 3.5. Suppose that all assumptions of Lemma 3.3 hold. Then there exists λ > 0 such that, for any ∆ ∈ \mathbb{N} and any index k_0, there is an index \hat{k} \ge k_0 such that

|K_{\hat{k},\Delta}^{\lambda}| > \frac{\Delta}{2}.

Using Lemmas 3.4 and 3.5, we obtain the following global convergence theorem for our method. Since the proof of the theorem is exactly the same as that of [7, Theorem 3.6], we omit it.

Theorem 3.6. Suppose that Assumption 1 is satisfied. Consider the method (1.1) and (3.3) with (3.5)–(3.6). Assume that the method has Property 1 and the step size α_k satisfies the Wolfe conditions (2.4)–(2.5). Then the method converges in the sense that

(3.16)  \liminf_{k \to \infty} \|g_k\| = 0

holds.

Closing this section, we briefly consider a suitable choice of parameters. Nakayama et al. [16] showed that the parameters

\theta_{k-1} = 1 + \frac{|g_k^T d_{k-1}|}{\|g_{k-1}\| \|d_{k-1}\|} \quad \text{and} \quad \theta_{k-1} = 1 + \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}}

were better choices than θ_{k-1} = 1 in their numerical experiments. These results mean that the preconvex class is efficient, and hence we consider the following parameter:

(3.17)  \theta_{k-1} = 1 + t_k |g_k^T d_{k-1}|,

where t_k is a scalar parameter. Then, if we use the exact line search, the parameter θ_{k-1} becomes 1, and hence the search direction (3.2) with (3.17) is identical to the search direction of the CG method with the HS formula (1.6). Therefore, we regard the proposed method with (3.17) as a hybrid method of the three-term CG method and the memoryless quasi-Newton method.

Now we give concrete choices of θ_{k-1} and ν_{k-1} that satisfy conditions (3.5)–(3.6). Let c̄₁ = 1.99. If 1 < θ_{k-1} ≤ 1.2, then

\frac{\bar{c}_1}{\sqrt{\theta_{k-1}}} - 1 \ge \frac{1.99}{\sqrt{1.2}} - 1 \approx 0.8166

holds. Using the above inequality and (3.17), we propose the following parameters:

(3.18)  \theta_{k-1} = 1 + \min \left\{ \frac{d_{k-1}^T g_k}{g_{k-1}^T d_{k-1}}, \ 0.2 \right\} \quad \text{and} \quad \nu_{k-1} = 0.8.

Note that (3.18) satisfies conditions (3.5)–(3.6).
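A sketch of the rule (3.18), paired with the scaling parameter (3.9) that is used for the proposed method in the experiments of Section 4; the function name is hypothetical and all inputs are NumPy vectors.

    def parameters_new(g, g_prev, d_prev, s, y):
        # Parameter rule (3.18) with the spectral-scaling parameter (3.9).
        theta = 1.0 + min((d_prev @ g) / (g_prev @ d_prev), 0.2)
        nu = 0.8
        gamma_hat = (s @ y) / (s @ s)   # (3.9)
        return theta, nu, gamma_hat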

§4. Numerical experiments

In this section, we report numerical experiments with the proposed method of the form (1.1) and (3.3). We tested 138 problems from the CUTEr library [3, 10]. The problems were used in [16], and their names and dimensions n are given in Table 1. All codes were written in C by modifying the software package CG_DESCENT Version 5.3 [11, 12, 13]. They were run on a PC with a 3.5 GHz Intel Core i5, 32.0 GB RAM and Linux OS Ubuntu 16. We stopped the algorithm if


held. The line search procedure was the default procedure of CG-DESCENT, which implies that we used the parameter values of σ = 0.9 and δ = 0.1 in the Wolfe conditions (2.4)–(2.5).

To compare numerical performance among the tested methods, we adopt the performance profiles based on CPU time by Dolan and Moré [8]. Let P and S be the set of n_p problems and the set of n_s solvers, respectively. For each problem p ∈ P and each solver s ∈ S, we define t_{p,s} = CPU time required to solve problem p by solver s. The performance ratio is given by r_{p,s} = t_{p,s} / \min_s t_{p,s}. Then, the performance profile P : R → [0, 1] is defined by

P(\tau) = \frac{1}{n_p} \left| \{ p \in P \mid r_{p,s} \le \tau \} \right| \quad \text{for all } \tau \ge 1.

Note that P(τ) is the probability for solver s ∈ S that the performance ratio r_{p,s} is within a factor τ ≥ 1 of the best result. The left side of the figure of performance profiles gives the percentage of the test problems for which a method attains the best result. The top curve corresponds to the method that solves the most problems within a factor τ of the best result. In order to prevent measurement error, we set the minimum CPU time to 0.1 seconds.
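A sketch of this computation; times is assumed to be an n_p × n_s NumPy array of measured CPU times (with np.inf marking failures).

    import numpy as np

    def performance_profile(times, tau):
        # Dolan-More profile: fraction of problems solved within factor tau, per solver.
        times = np.maximum(times, 0.1)                     # minimum CPU time of 0.1 s
        ratios = times / times.min(axis=1, keepdims=True)  # r_{p,s} = t_{p,s}/min_s t_{p,s}
        return (ratios <= tau).mean(axis=0)                # P(tau) for each solver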

Table 2 presents the choices of parameters θ_{k-1}, γ̂_{k-1} and ν_{k-1} for the tested methods. ML1–3, KD and NEW are methods of the form (1.1) and (3.3). Note that ML1–3 correspond to memoryless quasi-Newton methods by Nakayama et al. [16], and KD corresponds to the method by Kou and Dai [14]. In our numerical experiments, for ML1–3 and KD, we chose the parameters recommended in [14, 16]. NEW is the proposed method with (3.18); its parameters performed better in our preliminary numerical experiments. CGD is the well-known benchmark method based on the CG method by Hager and Zhang [12, 13], namely CG_DESCENT 5.3.

Figure 1 shows the performance profiles of the methods in Table 2. As mentioned at the beginning of Section 3, we see that ML3 performed better than ML2 did. This implies that the scaling parameter (3.1) is more efficient than (2.20). Since ML3 and NEW performed better than ML1 and KD did, respectively, we see that the preconvex class is significant for the Broyden family. We see that KD and NEW were superior to CGD and performed slightly better than ML3 did. Note that ML3 does not guarantee the global convergence. On the other hand, KD and NEW guarantee the global convergence. Thus, the proposed method is theoretically superior to the memoryless quasi-Newton method by Nakayama et al. [16].


Table 1: Test problems (names and dimensions) from the CUTEr library

name       n      name       n      name       n      name       n
AKIVA      2      DIXMAANC   3000   HEART8LS   8      PENALTY1   1000
ALLINITU   4      DIXMAAND   3000   HELIX      3      PENALTY2   200
ARGLINA    200    DIXMAANE   3000   HIELOW     3      PENALTY3   200
ARGLINB    200    DIXMAANF   3000   HILBERTA   2      POWELLSG   5000
ARWHEAD    5000   DIXMAANG   3000   HILBERTB   10     POWER      10000
BARD       3      DIXMAANH   3000   HIMMELBB   2      QUARTC     5000
BDQRTIC    5000   DIXMAANI   3000   HIMMELBF   4      ROSENBR    2
BEALE      2      DIXMAANJ   3000   HIMMELBG   2      S308       2
BIGGS6     6      DIXMAANK   3000   HIMMELBH   2      SCHMVETT   5000
BOX3       3      DIXMAANL   3000   HUMPS      2      SENSORS    100
BOX        10000  DIXON3DQ   10000  JENSMP     2      SINEVAL    2
BRKMCC     2      DJTL       2      KOWOSB     4      SINQUAD    5000
BROWNAL    200    DQDRTIC    5000   LIARWHD    5000   SISSER     2
BROWNBS    2      DQRTIC     5000   LOGHAIRY   2      SNAIL      2
BROWNDEN   4      EDENSCH    2000   MANCINO    100    SPARSINE   5000
BROYDN7D   5000   EG2        1000   MARATOSB   2      SPARSQUR   10000
BRYBND     5000   ENGVAL1    5000   MEXHAT     2      SPMSRTLS   4999
CHAINWOO   4000   ENGVAL2    3      MOREBV     5000   SROSENBR   5000
CHNROSNB   50     ERRINROS   50     MSQRTALS   1024   STRATEC    10
CLIFF      2      EXPFIT     2      MSQRTBLS   1024   TESTQUAD   5000
COSINE     10000  EXTROSNB   1000   NONCVXU2   5000   TOINTGOR   50
CRAGGLVY   5000   FLETCBV2   5000   NONDIA     5000   TOINTGSS   5000
CUBE       2      FLETCHCR   1000   NONDQUAR   5000   TOINTPSP   50
CURLY10    10000  FMINSRF2   5625   OSBORNEA   5      TOINTQOR   50
CURLY20    10000  FMINSURF   5625   OSBORNEB   11     TQUARTIC   5000
CURLY30    10000  FREUROTH   5000   OSCIPATH   10     TRIDIA     5000
DECONVU    63     GENHUMPS   5000   PALMER1C   8      VARDIM     200
DENSCHNA   2      GENROSE    500    PALMER1D   7      VAREIGVL   50
DENSCHNB   2      GROWTHLS   3      PALMER2C   8      VIBRBEAM   8
DENSCHNC   2      GULF       3      PALMER3C   8      WATSON     12
DENSCHND   3      HAIRY      2      PALMER4C   8      WOODS      4000
DENSCHNE   3      HATFLDD    3      PALMER5C   6      YFITU      3
DENSCHNF   2      HATFLDE    3      PALMER6C   8      ZANGWIL2   2
DIXMAANA   3000   HATFLDFL   3      PALMER7C   8
DIXMAANB   3000   HEART6LS   6      PALMER8C   8

Table 2: Tested methods

Method name   θ_{k-1}                                                γ̂_{k-1}   ν_{k-1}
ML1(BFGS)     1                                                      (3.1)     1
ML2           1 + min{ |g_k^T d_{k-1}| / (∥g_k∥ ∥d_{k-1}∥), 0.9 }    (2.20)    1
ML3           1 + min{ |g_k^T d_{k-1}| / (∥g_k∥ ∥d_{k-1}∥), 0.9 }    (3.1)     1
KD            1                                                      (3.9)     0.8
NEW           1 + min{ g_k^T d_{k-1} / (g_{k-1}^T d_{k-1}), 0.2 }    (3.9)     0.8
CGD           CG_DESCENT Ver. 5.3


[Figure: curves P(τ), 1 ≤ τ ≤ 4, for CGD, NEW, KD, ML3, ML2 and ML1(BFGS)]

Figure 1: Performance profiles based on CPU time

§5. Conclusions

In this paper, we have modified the memoryless quasi-Newton method based on the spectral-scaling secant condition [16] and proposed a new method. Furthermore, we have shown that the method always satisfies the sufficient descent condition and converges globally. In numerical experiments, we have shown that the proposed method performs better than existing memoryless quasi-Newton methods do, and we have found suitable parameters for the proposed method. A further study is to find more suitable choices for the parameters θ_{k-1}, γ̂_{k-1} and ν_{k-1}.

Acknowledgment

I would like to thank Professor Hiroshi Yabe and Professor Yasushi Narushima for useful discussions and proofreading the manuscript. The author is grateful to the anonymous referee whose comments helped to improve the paper.

References

[1] M. Al-Baali, Y. Narushima and H. Yabe, A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization, Computational Optimization and Applications, 60 (2015), 89–110.


[2] N. Andrei, Accelerated adaptive Perry conjugate gradient algorithms based on the self-scaling memoryless BFGS update, Journal of Computational and Applied Mathematics, 325 (2017), 149–164.

[3] I. Bongartz, A.R. Conn, N.I.M. Gould and P.L. Toint, CUTE: constrained and unconstrained testing environment, ACM Transactions on Mathematical Software, 21 (1995), 123–160.

[4] Z. Chen and W. Cheng, Spectral-scaling quasi-Newton methods with updates from the one parameter of the Broyden family, Journal of Computational and Applied Mathematics, 248 (2013), 88–98.

[5] W.Y. Cheng and D.H. Li, Spectral scaling BFGS method, Journal of Optimization Theory and Applications, 146 (2010), 305–319.

[6] Y.H. Dai and C.X. Kou, A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search, SIAM Journal on Optimization, 23 (2013), 296–320.

[7] Y.H. Dai and L.Z. Liao, New conjugacy conditions and related nonlinear conjugate gradient methods, Applied Mathematics and Optimization, 43 (2001), 87–101.

[8] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), 201–213.

[9] J.C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization, 2 (1992), 21–42.

[10] N.I.M. Gould, D. Orban and P.L. Toint, CUTEr and SifDec: A constrained and unconstrained testing environment, revisited, ACM Transactions on Mathematical Software, 29 (2003), 373–394.

[11] W.W. Hager, Hager’s web page: http://people.clas.ufl.edu/hager/.

[12] W.W. Hager and H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM Journal on Optimization, 16 (2005), 170–192.

[13] W.W. Hager and H. Zhang, Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent, ACM Transactions on Mathematical Software, 32 (2006), 113–137.

[14] C.X. Kou and Y.H. Dai, A modified self-scaling memoryless Broyden-Fletcher-Goldfarb-Shanno method for unconstrained optimization, Journal of Optimization Theory and Applications, 165 (2015), 209–224.

[15] S. Nakayama, Y. Narushima and H. Yabe, A memoryless symmetric rank-one method with sufficient descent property for unconstrained optimization, Journal of the Operations Research Society of Japan, 61 (2018), 53–70.


[16] S. Nakayama, Y. Narushima and H. Yabe, Memoryless quasi-Newton methods based on spectral-scaling Broyden family for unconstrained optimization, Journal of Industrial and Management Optimization, to appear.

[17] Y. Narushima and H. Yabe, A survey of sufficient descent conjugate gradient methods for unconstrained optimization, SUT Journal of Mathematics, 50 (2014), 167–203.

[18] Y. Narushima, H. Yabe and J.A. Ford, A three-term conjugate gradient method with sufficient descent property for unconstrained optimization, SIAM Journal on Optimization, 21 (2011), 212–230.

[19] J. Nocedal and S.J. Wright, “Numerical optimization”, 2nd edition, Springer, 2006.

[20] S.S. Oren, Self-scaling variable metric (SSVM) algorithms, Part II: Implementation and experiments, Management Science, 20 (1974), 863–874.

[21] S.S. Oren and D.G. Luenberger, Self-scaling variable metric (SSVM) algorithms, Part I: Criteria and sufficient conditions for scaling a class of algorithms, Management Science, 20 (1974), 845–862.

[22] D.F. Shanno, Conjugate gradient methods with inexact searches, Mathematics of Operations Research, 3 (1978), 244–256.

[23] K. Sugiki, Y. Narushima and H. Yabe, Globally convergent three-term conjugate gradient methods that use secant condition and generate descent search directions for unconstrained optimization, Journal of Optimization Theory and Applications, 153 (2012), 733–757.

[24] S. Yao and L. Ning, An adaptive three-term conjugate gradient method based on self-scaling memoryless BFGS matrix, Journal of Computational and Applied Mathematics, 332 (2018), 72–85.

[25] L. Zhang, W. Zhou and D.H. Li, A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence, IMA Journal of Numerical Analysis, 26 (2006), 629–640.

[26] L. Zhang, W. Zhou and D.H. Li, Some descent three-term conjugate gradient methods and their global convergence, Optimization Methods and Software, 26 (2007), 697–711.

[27] Y. Zhang and R.P. Tewarson, Quasi-Newton Algorithms with updates from the preconvex part of Broyden’s family, IMA Journal of Numerical Analysis, 8 (1988), 487–509.

Shummin Nakayama

Department of Applied Mathematics, Tokyo University of Science 1-3, Kagurazaka, Shinjuku, Tokyo 162-8601, Japan
