TITLE: Studies on Solution Methods for Nonlinear Programming Problems (Dissertation, full text)
AUTHOR(S): Fukushima, Masao
CITATION: Fukushima, Masao. Studies on Solution Methods for Nonlinear Programming Problems. Kyoto University, 1979, Doctor of Engineering
ISSUE DATE: 1979-05-23
URL: https://doi.org/10.14989/doctor.r3905
STUDIES ON SOLUTION METHODS FOR
NONLINEAR PROGRAMMING PROBLEMS
MASAO FUKUSHIMA
KYOTO UNIVERSITY
KYOTO, JAPAN
JANUARY 1979
STUDIES ON SOLUTION METHODS
FOR
NONLINEAR PROGRAMMING PROBLEMS
by
Masao Fukushima
Submitted in partial fulfillment of the requirements for the degree of
DOCTOR OF ENGINEERING
at
Kyoto University
Kyoto, Japan
January 1979
PREFACE
Optimization is a principle which plays an important role in various fields of engineering. In practice, an optimization problem is normally formulated as a mathematical model consisting of an objective function to optimize and constraints specifying feasible solutions. Such a problem is generally called a mathematical programming problem. The analysis of mathematical programming problems containing, in particular, nonlinearity in the objective function or the constraints is called nonlinear programming. Specifically, nonlinear programming involves such diverse fields as characterizing optimality conditions, clarifying duality correspondences and developing numerical methods to solve such problems.
The origin of nonlinear programming goes as far back as the pioneering work of Kuhn and Tucker in 1951. Most attention in the early years was focused on theoretical subjects like optimality and duality theories. Later, the rapid progress of computer machinery in the 1960's made it possible to handle relatively large and complex problems, and hence the development of efficient numerical methods became a major subject of research in this field. In those years, however, most of the solution techniques were proposed independently and the interrelationship between theory and practice remained almost neglected. Only recently has it been recognized that theory and practice should be integrated to provide a unified framework for the study of nonlinear programming. Moreover, one of the recent trends is to study the problems in a still more general setting and to develop effective numerical methods applicable to such problems by establishing a theoretical framework based on new concepts. On the other hand, it is also necessary to restrict our attention to a certain class of problems and then develop a detailed theory and a most relevant numerical solution technique.
With these situations in mind, this thesis attempts to present new ideas and to give some insights into nonlinear programming problems. Nonlinear programming is a relatively young field of study, and extensive research efforts are now under way. It seems that nonlinear programming is still less established in both theory and practice compared with linear programming. The author hopes that the work contained in the thesis will be helpful for future research in this growing field.
ACKNOWLEDGEMENTS
I wish to express my sincere gratitude to Professor H. Mine of Kyoto University for his invaluable guidance and criticism ever since my student days. Without his constant encouragement and helpful suggestions, this work would not exist today.

I also wish to acknowledge Associate Professor K. Ohno of Kyoto University for his innumerable suggestions and comments. In particular, the work in Chapters 7 through 9 of this thesis was motivated by discussions with him.

I appreciate the stimulating atmosphere provided by the Department of Applied Mathematics and Physics at Kyoto University. In this respect, I wish to acknowledge Professor T. Hasegawa, Associate Professor T. Ibaraki, Dr. H. Kawai, and others too many for me to enumerate individually. Furthermore, I am grateful to all my friends and colleagues who have encouraged and assisted me toward the completion of the thesis. In particular, I should mention that Mr. Y. Fujii participated in part in completing the work in Chapter 4 of the thesis. Finally, I wish to express my appreciation to Mr. N. Navarrete, Jr. for his kindness in checking and correcting my English.
CONTENTS

PREFACE
ACKNOWLEDGEMENTS
CHAPTER 1. INTRODUCTION
  1.1 Nonlinear Programming
  1.2 Review of NLP Theory
  1.3 Review of NLP Methods
  1.4 Outline of the Thesis
CHAPTER 2. FOUNDATION OF PENALTY FUNCTION METHODS FOR GENERAL CONVEX PROGRAMS
  2.1 Introduction
  2.2 A Penalty Function Approach
  2.3 Infimal Convergence
  2.4 Convergence Theorems
  2.5 Examples of Penalty Functions
  2.6 Conclusion
CHAPTER 3. APPLICATION OF FENCHEL DUALITY TO PENALTY METHODS FOR GENERAL CONVEX PROGRAMS
  3.1 Introduction
  3.2 A General Convex Program and Fenchel Duality
  3.3 Conjugate Penalty Methods
  3.4 Advantage of Conjugate Penalty Methods
  3.5 A Conjugate Interior Penalty Method
  3.6 Numerical Examples
  3.7 Conclusion
CHAPTER 4. APPLICATION OF FENCHEL DUALITY TO MULTIFACILITY LOCATION PROBLEMS
  4.1 Introduction
  4.2 The Multifacility Location Problems
  4.3 The Dual Problem
  4.4 Examples
  4.5 Computational Results
  4.6 Conclusion
CHAPTER 5. MINIMIZATION OF THE SUM OF A CONVEX FUNCTION AND A CONTINUOUSLY DIFFERENTIABLE FUNCTION
  5.1 Introduction
  5.2 The Problem
  5.3 Algorithm
  5.4 Convergence of Algorithm
  5.5 Rate of Convergence
  5.6 Conclusion
CHAPTER 6. GENERALIZATION OF THE PROXIMAL POINT ALGORITHM TO CERTAIN NONCONVEX PROGRAMS
  6.1 Introduction
  6.2 Preliminaries
  6.3 Algorithm
  6.4 Convergence of Algorithm
  6.5 Rate of Convergence
  6.6 Discussion
CHAPTER 7. MULTILEVEL DECOMPOSITION OF NONLINEAR PROGRAMMING PROBLEMS BY DYNAMIC PROGRAMMING
  7.1 Introduction
  7.2 Decomposability and Separability
  7.3 Main Results
  7.4 Application to a Quadratic Fractional Programming Problem
  7.5 Continuous Objective and Constraint Functions
  7.6 Conclusion
CHAPTER 8. DECOMPOSITION OF NONLINEAR CHANCE-CONSTRAINED PROGRAMMING PROBLEMS BY DYNAMIC PROGRAMMING
  8.1 Introduction
  8.2 The Problem and Assumptions
  8.3 Main Results
  8.4 Example
  8.5 Conclusion
CHAPTER 9. DECOMPOSITION OF MULTIPLE CRITERIA MATHEMATICAL PROGRAMMING PROBLEMS BY DYNAMIC PROGRAMMING
  9.1 Introduction
  9.2 The Problem and Definitions
  9.3 Main Results
  9.4 Computational Aspects
  9.5 Conclusion
CHAPTER 10. CONCLUSION
APPENDIX. CONVEX SETS AND FUNCTIONS
REFERENCES
CHAPTER 1
INTRODUCTION
This chapter states the historical background and the present status of nonlinear programming and presents the outline of this thesis.
1.1 Nonlinear Programming
The mathematical programming problem is that of minimizing (or maximizing) a certain objective function with respect to decision variables subject to constraints. It is frequently encountered not only in operations research but also in various fields such as economics, business, engineering, and other social and natural sciences. If the objective function and the constraint functions are not all linear, the problem is called the nonlinear programming (NLP) problem, in contrast with the linear programming (LP) problem in which all functions are linear. Moreover, if the objective function and the constraints are both convex, then the problem is called the convex programming problem and is of particular importance due to the many nice properties it has.
The area of NLP covers a variety of topics such as devising methods of finding optimal solutions, investigating the algorithmic characteristics of those methods, and studying various types of optimality conditions and duality correspondences. NLP has a history of about thirty years since it became commonly recognized as a separate field in operations research. There still remain, however, a number of unresolved subjects, while LP has been almost completely established in theory via comprehensive studies of systems of linear equalities and inequalities and has been extensively applied in practice as a powerful tool of decision making.
Nevertheless, starting from the famous pioneering work on NLP by Kuhn and Tucker, and passing through the theory of convex analysis of Fenchel and Rockafellar, who succeeded in outgrowing classical analysis by systematically replacing differentiability assumptions with convexity assumptions, a number of important theoretical results have been obtained up to the present. For convex programming in particular, an excellent unified theory has been established which stands comparison with the theory of LP.
On the other hand, as to computational aspects of finding optimal solutions of NLP problems, several general-purpose algorithms have been devised in step with the progress of computer machinery in recent years. Those algorithms are, however, still unsatisfactory from a practical viewpoint, and continual efforts are being made to improve their efficiency and to overcome various difficulties in solving actual problems. It seems to the author that research in this branch will go on boundlessly, since an absolutely superior method, such as the simplex method in LP, is unlikely to emerge.
1.2 Review of NLP Theory
Kuhn and Tucker succeeded in extending the classical Lagrange multiplier rule for equality constrained NLP problems to the case of inequalities in their fundamental paper [K6] in 1951. Prior to Kuhn and Tucker, Fritz John [J2] had made an attempt to generalize the method of Lagrange to inequality constrained problems, but his result was unsatisfactory in the sense that certain irregular situations could not be excluded. Kuhn and Tucker pointed out that a certain regularity condition, called a constraint qualification, needs to be considered to avoid such difficulties. The optimality conditions obtained by them have been called the Kuhn-Tucker conditions, though it was found later that Karush [K2] had derived equivalent conditions prior to them. It is very interesting to note that the motivations for the works of Kuhn and Tucker, John, and Karush were entirely different from each other. Namely, Karush was inspired by the calculus of variations, John considered a problem in geometrical inequalities, and the background of Kuhn and Tucker lay in a wide variety of fields such as network theory, duality in topology, the theory of games and linear programming. Therefore, it may be reasonable to say that the present research in NLP has its origin in the work of Kuhn and Tucker. In fact, it was their paper that gave the field the name nonlinear programming.

Since then, the study of optimality conditions has been one of the major topics in NLP. In particular, a number of results were published during the 1960's, and several of them were collected by Fiacco and McCormick [F8] and Mangasarian [M1]. Moreover, constraint qualifications other than that of Kuhn and Tucker have been proposed, and more sophisticated optimality conditions have been obtained by many authors. Relationships among various constraint qualifications were clarified by Bazaraa, Goode and Shetty [B1].
Recently, attempts have begun to extend the Kuhn-Tucker conditions to more general NLP problems involving nondifferentiable functions. In particular, Pshenichnyi [P8] in 1971 considered necessary optimality conditions for the NLP problem with quasidifferentiable functions, and Clarke [C4] in 1976 and Hiriart-Urruty [H4] in 1978 studied NLP problems involving functions which have generalized gradients in the sense of Clarke [C3].
Kuhn and Tucker [K6] also showed the equivalence of an NLP problem and a saddle point problem for the Lagrangean associated with the original problem. In the late 1960's, Gale [G1] and Geoffrion [G5] developed an elegant duality theory for convex programming by the use of a perturbation function. Duality theory in convex programming has further been established almost completely through a series of works of Rockafellar, whose idea is to utilize the concept of conjugate functions originated by Fenchel [F2]. Most of the results of Rockafellar may be found in [R5][R7].

For nonconvex programs, an earlier attempt was made by Gould [G9] in 1969 to generalize the Kuhn-Tucker equivalence theorem concerning the saddle point theorem by introducing the notion of multiplier functions instead of Lagrange multipliers. The multiplier functions were further investigated in detail, and the conditions for the local saddle point theorem to hold were studied by Arrow, Gould and Howe [A6].

On the other hand, by utilizing nonlinear perturbation functions, Rockafellar showed in his recent papers [R9][R10] that many results already obtained for convex programs can be extended even to nonconvex cases, and he constructed an elegant nonconvex duality theory. These results are particularly important since they establish a background for the multiplier methods for solving general NLP problems.
1.3 Review of NLP Methods
Methods of solving NLP problems may be roughly classified into two categories, i.e., primal methods and dual methods. Typical primal methods in the early years are the feasible direction methods by Zoutendijk [Z4] and the gradient projection method by Rosen [R12]. The latter, combined with the variable metric methods for unconstrained minimization, has been improved by Goldfarb [G7] and Murtagh and Sargent [M19] to assure fast convergence.

Generalizing the simplex method of LP, Wolfe [W5] proposed the reduced gradient method for solving NLP problems with linear constraints. Later this method was modified by Abadie [A1][A2] to cope with nonlinear constraints. This generalized reduced gradient (GRG) method is now recognized as one of the most efficient methods in NLP.

More recently, primal methods that require the solution of an auxiliary quadratic programming problem at each iteration have been proposed and shown to converge rapidly by Han [H1] in 1976 and Powell [P7] in 1977.
Dual methods of a primitive type are the penalty methods, often called sequential unconstrained minimization techniques (SUMT) and compiled by Fiacco and McCormick [F8], although they seem not to have been explicitly recognized as dual methods. In the 1960's, penalty methods received much attention from practical viewpoints due to their simplicity and wide applicability to actual problems. However, it has gradually been recognized that penalty methods have a serious difficulty in the sense that the penalty functions tend to become considerably ill-conditioned as the iteration proceeds. These properties of penalty methods have been studied in detail by Lootsma [L5] and Ryan [R14].

In order to avoid such a difficulty, new methods, which make use of penalty functions combined with the Lagrangean, were proposed in 1969 independently by Hestenes [H2] and Powell [P6]. These methods have indeed clear dual characteristics and are called the multiplier methods. It should be noted, however, that the origin of their idea may be found in an earlier paper by Arrow and Solow [A7]. Although Hestenes and Powell considered only equality constrained problems, their idea has later been generalized to deal with inequality constraints as well by Rockafellar [R6] and Kort and Bertsekas [K5]. The multiplier methods are now widely accepted as among the superior methods for NLP problems. The efficiency of the multiplier methods has been validated by Bertsekas [B7][B10]. Furthermore, interesting approaches based on the multiplier methods have been proposed by Mangasarian [M2], Fletcher [F10] and Tapia [T1].
Very recently, nondifferentiable optimization problems have drawn significant attention, since actual problems often involve functions that are not everywhere differentiable. Earlier attempts were made for problems with nondifferentiable convex functions by Bertsekas and Mitter [B11] in 1973, Wolfe [W6] in 1975 and Lemarechal [L2] in 1975. Subsequently, the idea of Wolfe and Lemarechal was generalized by Feuer [F4] in 1976 and Mifflin [M4] in 1977 to handle more general classes of nondifferentiable functions. Nondifferentiable optimization is one of the newest topics in NLP and is now under active investigation.
In addition to the above-mentioned methods for general NLP problems, there are also many algorithms devised for large-scale mathematical programming problems. Most of the earlier works in this field dealt with large-scale LP problems. Among them, the most important result might be the Dantzig-Wolfe decomposition principle [D1]. For mathematical programming problems other than LP problems, several methods have been proposed, such as Benders' decomposition methods [B4][G6], Rosen's partitioning procedures [R13] and Geoffrion-Silverman's primal decomposition methods [G4][S3]. These methods for large-scale problems are collected in Lasdon [L1] and Geoffrion [G3]. On the other hand, based on the principle of optimality in dynamic programming (DP) by Bellman [B2], several attempts have been made to deal with large-scale mathematical programming problems. It seems, however, that DP has been applied to large-scale NLP problems relatively less than the above-mentioned methods. Thus it might be valuable to study the applicability of DP to various classes of large-scale NLP problems.
1.4 Outline of the Thesis
The main purpose of this thesis is to clarify various properties of NLP problems and to propose several methods to solve these problems. First, a foundation of the traditional SUMT for convex programming problems is studied, and a dual method of SUMT is proposed to overcome computational difficulties in the traditional SUMT. The same idea as that used in the dual method is also applied to multifacility location problems. Next, certain nonconvex programming problems are considered and algorithms for solving these problems are proposed. Finally, the applicability of DP to various NLP problems is investigated in detail. The organization of each chapter is briefly summarized as follows:
In Chapter 2, a class of penalty functions for solving convex programming problems with a general constraint set is considered. Convergence theorems for penalty methods are established by utilizing the notion of infimal convergence of a sequence of functions. It is shown that most existing penalty functions are included in the class of penalty functions defined in this chapter. Thus this chapter provides a foundation of penalty methods for convex programming problems.
In Chapter 3, a new class of sequential unconstrained optimization methods, called the conjugate penalty methods, is proposed for solving convex programming problems. The conjugate penalty methods utilize conjugate convex functions and are based on Fenchel's duality theorem in convex analysis. It is shown that, under certain assumptions, conjugate penalty functions behave quite mildly and hence avoid the ill-conditioning of the ordinary penalty methods of Chapter 2. Convergence of the methods is proved and the relationship between ordinary and conjugate penalty methods is shown.
In Chapter 4, a dual approach is developed for solving multifacility location problems under very general settings. The formulation of the dual problem is essentially based on Fenchel's duality theorem, which also plays a crucial role in the previous chapter. It should be noted that the multifacility location problem is applicable to various actual problems, for example, locating facilities such as plants, warehouses and fire stations, locating machines in a factory, and designing communication networks.
In Chapters 5 and 6, two algorithms are proposed to minimize a general function which is the sum of a continuously differentiable function and a convex function. The objective function is thus in general neither convex nor differentiable. It is noted that this class of problems contains, as a special case, the problem of minimizing a continuously differentiable function over a closed convex set. In view of this, the algorithm proposed in Chapter 5 is regarded as a natural extension of the Frank-Wolfe method to the more general problems. On the other hand, the algorithm proposed in Chapter 6 is a generalized version of the proximal point algorithm for monotone operators. Convergence of the algorithms is proved and the rate of convergence is discussed.
In Chapters 7 through 9, sufficient conditions are presented under which large-scale NLP problems are decomposed into subproblems by dynamic programming. Specifically, Chapter 7 considers multilevel decomposition of general NLP problems, Chapter 8 deals with stochastic NLP problems with chance constraints, and Chapter 9 is concerned with multiple criteria NLP problems, for which the concept of Pareto optimality replaces the ordinary concept of optimality. The underlying ideas in Chapters 7 through 9 are basically the same, although the methods used in proving the results are significantly different from each other.
The final chapter is the conclusion of the thesis. The Appendix provides a summary of definitions and notations in convex analysis that are frequently used in the thesis.
Finally, it should be mentioned that the materials of the thesis are based on the author's original works. Specifically, Chapter 2 is taken from the paper [M7] published in the Journal of Optimization Theory and Applications; Chapter 3 from the papers [M15] and [M6] published, respectively, in the SIAM Journal on Control and Optimization and in the Memoirs of the Faculty of Engineering, Kyoto University; Chapter 4 from the paper [M10] published in the Proceedings of the International Conference on Cybernetics and Society, Tokyo, 1978; Chapter 5 from the paper [M8] to be published in the Journal of Optimization Theory and Applications; Chapter 6 from the paper [F14] to be presented at the Tenth International Symposium on Mathematical Programming, Montreal, 1979; Chapters 7 and 8 from the papers [M13] and [M14], respectively, published in the Journal of Mathematical Analysis and Applications; and Chapter 9 from the paper [M9] to be published in the International Journal of Systems Science.
CHAPTER 2
FOUNDATION OF PENALTY FUNCTION METHODS
FOR GENERAL CONVEX PROGRAMS
In this chapter, a class of penalty functions for solving convex programming problems with general constraint sets is considered. Convergence theorems for the penalty function methods are established by utilizing the concept of infimal convergence of a sequence of functions. It is shown that most existing penalty functions are included in the class of penalty functions discussed in this chapter.
2.1 Introduction
We consider the following general convex programming problem:

(P)   minimize f(x) subject to x ∈ C,

where f is a closed convex function on R^n and C is a closed convex subset of R^n. Many kinds of sequential unconstrained minimization methods for solving problem (P) have appeared in the literature. Those methods are surveyed in, for example, [F8][L5]. A typical idea underlying those techniques is transforming problem (P) into a sequence of unconstrained problems, say { (P_k) ; k=1,2,... }, by means of a sequence of auxiliary functions which in general contain one or several parameters. We shall conventionally term those functions penalty functions. The constraint set C of problem (P) is usually represented by a system of equalities and/or inequalities of functions, and those auxiliary functions are composed in terms of the problem functions. In the following development, however, we do not explicitly assume constraint functions.
The purpose of this chapter is to develop a unified theory of sequential unconstrained minimization methods for the general convex programming problem (P). We shall consider penalty functions for (P) without taking account of constraint functions. It is noted that Fiacco [F5][F6] and Fiacco and Jones [F7] generalize and characterize the penalty function method for solving (P), where they do not assume convexity. However, their approaches are essentially based on the interior and exterior penalty function theory. On the other hand, we consider here only convex cases, but our characterization is more general than theirs, because we make no distinction either between interior and exterior penalty functions or between equality and inequality constraints.

This chapter is organized as follows: Section 2.2 describes the general penalty function methods. Section 2.3 defines the concept of infimal convergence of functions, which plays a crucial role in this chapter. Section 2.4 proves the convergence theorems for the penalty function methods. Section 2.5 shows that the results given here are applicable to most existing penalty function methods.
2.2 A Penalty Function Approach
It is obvious that problem (P) is equivalent to the following problem:

(P)   minimize f(x) + δ_C(x) over x ∈ R^n,

where δ_C is the indicator function of C defined by

δ_C(x) = 0 if x ∈ C,  = +∞ if x ∉ C.

Since C is closed and convex, δ_C is a closed convex function.

In the following, we shall assume that the minimum set, denoted by M, is nonempty and bounded; hence

M = { x ∈ C ; f(x) ≤ f(y) for all y ∈ C }

is a compact convex set. This assumption is the same as assuming that f and C have no direction of recession in common [R5, Thm. 27.3], or equivalently that f + δ_C has no direction of recession [R5, Thm. 27.2]. The reader may refer to [A3] for existence criteria of minima. We shall make an additional assumption that int(dom f) ⊃ C, which will be needed in the proof of the theorem given in Section 2.4. However, there is little loss of generality in making this assumption, because in most practical problems f is everywhere finite, i.e., dom f = R^n.

In solving problem (P) via a penalty method, it is required in general that:

A1. For a sequence of penalty functions { U_k ; k=1,2,... }, there is a sequence { x_k ; k=1,2,... } where each x_k minimizes U_k(x) over R^n, i.e.,

U_k(x_k) = min { U_k(x) ; x ∈ R^n }.

A2. Every cluster point of { x_k ; k=1,2,... } is in M, and U_k(x_k) tends to the minimum value of (P).

Therefore, a feature of penalty function algorithms is successive minimization of the following type:

(P_k)   minimize U_k(x) over x ∈ R^n,  k=1,2,... .

Let us suppose that the penalty functions U_k take the form

U_k(x) = f(x) + h_k(x),

as most penalty functions do. In order to fulfill the requirements A1 and A2, it is expected that the whole information of the constraint set C of problem (P) is embedded in the sequence of functions { h_k ; k=1,2,... }, or intuitively speaking, that h_k converges to δ_C in a certain sense.
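As a concrete illustration of this successive minimization, here is a small sketch in Python. The example is my own, not from the thesis: f(x) = (x - 2)^2, C = (-∞, 1], and the exterior quadratic penalty h_k(x) = k·max(0, x - 1)^2; the golden-section routine is merely a stand-in for any unconstrained minimization method.

```python
# Illustrative sketch (not from the thesis): exterior quadratic penalty for
#   minimize f(x) = (x - 2)^2  subject to  x <= 1,  i.e. C = (-inf, 1].
# Here U_k(x) = f(x) + h_k(x) with h_k(x) = k * max(0, x - 1)^2, and the
# sequence h_k converges to the indicator function of C as k grows.

def f(x):
    return (x - 2.0) ** 2

def h(k, x):
    return k * max(0.0, x - 1.0) ** 2

def minimize_1d(phi, lo=-10.0, hi=10.0, iters=200):
    """Golden-section search for a convex function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    for _ in range(iters):
        if phi(c) < phi(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def penalty_method(ks=(1, 10, 100, 1000)):
    """Successively minimize U_k; the minimizers x_k approach C."""
    return [minimize_1d(lambda x, k=k: f(x) + h(k, x)) for k in ks]

xs = penalty_method()
```

In closed form the minimizer of U_k here is x_k = (2 + k)/(1 + k), so x_k → 1 ∈ M and U_k(x_k) → min_C f = 1, exactly the behavior demanded by requirement A2.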
2.3 Infimal Convergence
In this section, we sketch the notion of infimal convergence of a sequence of functions to a function, introduced by Wijsman [W2][W3]. For any function g on R^n and for any r > 0, define the function g_r on R^n by

g_r(x) = inf { g(y) ; y ∈ B_r(x) },

where B_r(x) is the closed ball of radius r with center x.

Definition 2.1. Let { g_k ; k=1,2,... } be a sequence of functions on R^n. The sequence is said to converge infimally to a function g if, for every x ∈ R^n,

lim_{r→0} lim inf_{k→∞} (g_k)_r(x) = lim_{r→0} lim sup_{k→∞} (g_k)_r(x) = g(x).

If { g_k } converges infimally to g, we write g_k →_inf g, while for pointwise convergence we write as usual g_k → g. It is noted, as indicated by Wijsman [W3], that infimal and pointwise convergence are not comparable in general. But it is intuitively clear that if g_k → g and g_k →_inf ĝ, then ĝ ≤ g.
Example 2.1. For k=1,2,..., let g_k : R → (-∞,+∞] be

g_k(x) = kx if -1/k ≤ x ≤ 1/k,  = +∞ otherwise.

Then g_k → g, where g(x) = 0 if x = 0, = +∞ if x ≠ 0, and g_k →_inf ĝ, where ĝ(x) = -1 if x = 0, = +∞ if x ≠ 0.
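The gap between the pointwise and infimal limits in Example 2.1 can be checked numerically. The following sketch (my own illustration, not part of the thesis) approximates (g_k)_r(0) by sampling the ball B_r(0) on a fine grid.

```python
# Numerical check of Example 2.1:
#   g_k(x) = k*x on [-1/k, 1/k], +inf elsewhere.
# Pointwise, g_k(0) = 0 for every k, yet the regularization
# (g_k)_r(0) = inf of g_k over B_r(0) equals -1 once k >= 1/r
# (attained near y = -1/k), so the infimal limit at 0 is -1, not 0.

INF = float("inf")

def g(k, x):
    return k * x if -1.0 / k <= x <= 1.0 / k else INF

def g_r(k, r, x, samples=10001):
    """(g_k)_r(x): infimum of g_k over the closed ball B_r(x), by sampling."""
    pts = [x - r + 2 * r * i / (samples - 1) for i in range(samples)]
    return min(g(k, p) for p in pts)

pointwise = g(1000, 0.0)        # value at 0 stays 0 for every k
infimal = g_r(1000, 0.01, 0.0)  # approximately -1
```

The discrepancy at x = 0 is exactly why the theory below works with infimal rather than pointwise convergence of the penalty terms.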
2.4 Convergence Theorems
In this section, we consider a class of penalty functions which guarantee the validity of a sequential unconstrained minimization method for solving (P), and then for an important subclass of (P) we restate it in a somewhat concrete form.

In order that property A1 holds, we shall always assume the following condition:

B1. For every k, h_k is a closed convex function with dom f ∩ dom h_k ≠ ∅, and f and h_k have no direction of recession in common.

Under this condition, the minimum set of (P_k), denoted by M_k, is nonempty, compact and convex.

The key condition for property A2 to hold is the following:

B2. { h_k ; k=1,2,... } converges infimally to δ_C.

The following theorem asserts that a sequential unconstrained minimization method via penalty functions which satisfy B1 and B2 is valid for solving (P).

Theorem 2.1. Under conditions B1 and B2, every sequence { x_k ∈ M_k ; k=1,2,... } has a convergent subsequence, and each cluster point of the sequence is in M. Moreover, lim_{k→∞} U_k(x_k) = min_C f.
Proof. First we prove that the sequence { U_k } converges infimally to f + δ_C. Suppose x ∈ C. As C ⊂ int(dom f) by our assumption, there exists r_0 > 0 such that B_r(x) ⊂ int(dom f) if 0 < r < r_0. By definition,

(U_k)_r(x) = inf { f(y) + h_k(y) ; y ∈ B_r(x) }.

Since f is convex, f is continuous on int(dom f); that is, y ∈ B_r(x) implies f(x) - ε < f(y) < f(x) + ε, where ε → 0 as r → 0. Thus we have

(U_k)_r(x) ≤ inf { f(x) + ε + h_k(y) ; y ∈ B_r(x) } = f(x) + ε + (h_k)_r(x)

and

(U_k)_r(x) ≥ inf { f(x) - ε + h_k(y) ; y ∈ B_r(x) } = f(x) - ε + (h_k)_r(x),

hence

lim_{r→0} lim sup_{k→∞} (U_k)_r(x) ≤ lim_{r→0} { f(x) + ε + lim sup_{k→∞} (h_k)_r(x) } = f(x)

and

lim_{r→0} lim inf_{k→∞} (U_k)_r(x) ≥ lim_{r→0} { f(x) - ε + lim inf_{k→∞} (h_k)_r(x) } = f(x).

For x ∉ C, it is obvious that

lim_{r→0} lim sup_{k→∞} (U_k)_r(x) = lim_{r→0} lim inf_{k→∞} (U_k)_r(x) = +∞.

Consequently,

lim_{r→0} lim sup_{k→∞} (U_k)_r = lim_{r→0} lim inf_{k→∞} (U_k)_r = f + δ_C.

Choose ε > 0 arbitrarily. Since U_k has no direction of recession, the set M_k^ε = { x ; U_k(x) ≤ min_C f + ε } is compact for every k. Similarly, the set M^ε = { x ∈ C ; f(x) ≤ min_C f + ε } is compact. Since U_k converges infimally to f + δ_C, applying [R5, Thm. 7.1] and taking account of compactness, M_k^ε converges to M^ε with respect to the Hausdorff metric as k → ∞. (A definition of the Hausdorff metric on a space of compact subsets may be found, for example, in Berge [B6].) Since M_k^ε is nonempty for every k greater than some K, we have M_k ⊂ M_k^ε for k ≥ K. Therefore every sequence { x_k ∈ M_k } has a convergent subsequence and each cluster point is in M^ε. On the other hand, M^ε is nonempty and compact, and M^ε converges nonincreasingly to M with respect to the Hausdorff metric as ε tends to zero [R5, Thm. 27.2]. Therefore every cluster point of { x_k } is in M.

Now we prove the last assertion. Without loss of generality, we assume that { x_k } converges to x̄ ∈ M. Then

f(x̄) + δ_C(x̄) = lim_{r→0} lim sup_{k→∞} (U_k)_r(x̄)
  ≥ lim sup_{k→∞} U_k(x_k)   ( since x_k minimizes U_k )
  ≥ lim inf_{k→∞} U_k(x_k)
  ≥ lim_{r→0} lim inf_{k→∞} (U_k)_r(x̄)   ( by the definition of (U_k)_r )
  = f(x̄) + δ_C(x̄),

since U_k converges infimally to f + δ_C. Therefore

lim_{k→∞} U_k(x_k) = min_C f.

This completes the proof. □
As indicated in Section 2.3, pointwise convergence does not imply infimal convergence. In practice, it seems difficult to establish the infimal convergence condition B2 directly. However, if C has nonempty interior, it will be shown that there is a close relationship between pointwise and infimal convergence. If C is represented in terms of a system of inequalities of functions, then int C ≠ ∅ is ordinarily satisfied except for pathological cases. The following condition will be substituted for B2:

B2'. { h_k } converges pointwise to δ_C except on the boundary of C, i.e., lim_{k→∞} h_k(x) = 0 if x ∈ int C, = +∞ if x ∉ C.

It should be noted that this condition says nothing explicitly about the behavior of { h_k } on the boundary of C. The following lemma, however, shows that, for any boundary point x of C, h_k(x) never tends to a negative limit.
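Condition B2' is easy to verify for the classical penalty families. The sketch below (my own illustration, not from the thesis) checks it for an interior log-barrier and an exterior quadratic penalty on C = (-∞, 1].

```python
# Sketch: two classical penalty families for C = (-inf, 1] and condition B2'.
#   interior (log-barrier): h_k(x) = -(1/k) * ln(1 - x) for x < 1, +inf otherwise
#   exterior (quadratic):   h_k(x) = k * max(0, x - 1)^2
# Both tend to 0 on int C and to +inf outside C as k -> infinity, and on the
# boundary x = 1 neither tends to a negative limit, consistent with Lemma 2.1.

import math

INF = float("inf")

def barrier(k, x):
    return -math.log(1.0 - x) / k if x < 1.0 else INF

def exterior(k, x):
    return k * max(0.0, x - 1.0) ** 2

interior_pt, outside_pt, boundary_pt = 0.5, 1.5, 1.0
ks = [1, 10, 100, 10000]

barrier_in = [barrier(k, interior_pt) for k in ks]      # -> 0 on int C
exterior_out = [exterior(k, outside_pt) for k in ks]    # -> +inf outside C
boundary_vals = [exterior(k, boundary_pt) for k in ks]  # all 0, so liminf >= 0
```

Note that the barrier is already +∞ everywhere outside C, while the exterior penalty only grows to +∞ there as k increases; B2' accepts both behaviors.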
Lemma 2.1. Assume that int C is nonempty. If { h_k } is a sequence of convex functions satisfying B2', then lim inf_{k→∞} h_k(x) ≥ 0 for any x on the boundary of C.

Proof. Let x̄ be any boundary point of C. There exist x_1, x_2 ∈ int C such that x_1 = λx̄ + (1-λ)x_2 for some 0 < λ < 1. By the convexity of h_k, for every k,

h_k(x̄) ≥ { h_k(x_1) - (1-λ)h_k(x_2) } / λ.

Thus

lim inf_{k→∞} h_k(x̄) ≥ lim_{k→∞} { h_k(x_1) - (1-λ)h_k(x_2) } / λ = 0.  □

Theorem 2.2. Assume that int C is nonempty. If { h_k } is a sequence of convex functions satisfying B2', then the sequence converges infimally to δ_C.
Proof. By definition, it suffices to show that

lim_{r→0} lim inf_{k→∞} h_k^r(x) = 0 if x ∈ C , = +∞ if x ∉ C ,

and

lim_{r→0} lim sup_{k→∞} h_k^r(x) ≤ 0 if x ∈ C .
First we prove that lim_{r→0} lim inf_{k→∞} h_k^r(x) ≥ 0 for all x ∈ R^n . Since int C is nonempty, there is an n-dimensional simplex S with vertices { b_0 , b_1 , ... , b_n } in int C . For any ε > 0 , there exist K_i , i=0,1,...,n, such that |h_k(b_i)| < ε for all k ≥ K_i , i=0,1,...,n. Put K = max { K_0 , K_1 , ... , K_n } . Let b be the barycenter of S . For any k , epi h_k is supported by a hyperplane, say z = <a_k,x> + β_k , at b . In particular, for k ≥ K we have

−ε < h_k(b) = <a_k,b> + β_k < ε

and

ε > h_k(b_i) ≥ <a_k,b_i> + β_k , i=0,1,...,n .

That is, <a_k, b_i − b> < 2ε , i=0,1,...,n . Since ε is arbitrary, a_k tends to the null vector and β_k also tends to zero as k → ∞ . Therefore

lim_{r→0} lim inf_{k→∞} h_k^r(x) = lim_{r→0} lim inf_{k→∞} inf_{y∈B_r(x)} h_k(y)
 ≥ lim_{r→0} lim inf_{k→∞} inf_{y∈B_r(x)} { <a_k,y> + β_k }
 = lim_{r→0} lim inf_{k→∞} { <a_k,x> − r‖a_k‖ + β_k }
 = 0 .
Secondly we prove that, for x ∉ C , lim_{r→0} lim inf_{k→∞} h_k^r(x) = +∞ . Assume the contrary. Then we must have a sequence { x_k } converging to x with lim_{k→∞} h_k(x_k) < +∞ . For such a sequence, we can choose y ∉ C and generate a sequence { z_k } in int C such that y = (x_k + z_k)/2 and z_k → z ∈ int C . The existence of such y follows from the closedness of C . For sufficiently large k , we may assume that every z_k stays in an n-dimensional simplex S in int C . As shown previously, for any ε > 0 there exists some K such that h_k(b_i) < ε for all vertices b_i of S . Since h_k is convex and each z_k ∈ S is represented as z_k = λ_0 b_0 + λ_1 b_1 + ... + λ_n b_n , where λ_0 + λ_1 + ... + λ_n = 1 and each λ_i ≥ 0 ,

h_k(z_k) ≤ λ_0 h_k(b_0) + λ_1 h_k(b_1) + ... + λ_n h_k(b_n) < ε

for k ≥ K . Thus lim sup_{k→∞} h_k(z_k) ≤ 0 . Again by the convexity of h_k , h_k(x_k) ≥ 2h_k(y) − h_k(z_k) . Consequently lim_{k→∞} h_k(x_k) = +∞ , because lim inf_{k→∞} { 2h_k(y) − h_k(z_k) } = +∞ . This is a contradiction.
Lastly we show that lim_{r→0} lim sup_{k→∞} h_k^r(x) ≤ 0 if x ∈ C . For any r > 0 , there is z ∈ int C ∩ B_r(x) . By definition, h_k^r(x) ≤ h_k(z) for all k , thus

lim sup_{k→∞} h_k^r(x) ≤ lim_{k→∞} h_k(z) = 0 .

Consequently lim_{r→0} lim sup_{k→∞} h_k^r(x) ≤ 0 . This completes the proof. □
Combining Theorems 2.1 and 2.2, we immediately obtain the following theorem. The proof is omitted.

Theorem 2.3 Assume that int C is nonempty. Under conditions B1 and B2', every sequence { x_k ∈ M_k ; k=1,2,... } has a convergent subsequence and every cluster point of the sequence is in M . Moreover, lim_{k→∞} U_k(x_k) = min_C f .
2.5 Examples of Penalty Functions
In this section, we consider the following problem:
(P') minimize f(x) subject to g_i(x) ≤ 0 , i=1,...,m ,

where f, g_1 , ... , g_m are everywhere finite convex functions on R^n . Let C = { x ∈ R^n ; g_i(x) ≤ 0 , i=1,...,m } and C⁰ = { x ∈ R^n ; g_i(x) < 0 , i=1,...,m } . It is well known that if C⁰ is nonempty, then C⁰ = int C . It is now possible to exemplify several classes of penalty functions with respect to the infimal convergence condition B2 or B2' for problem (P'). In the following, let { t_k } be a sequence of positive numbers strictly decreasing to zero.
(a) Interior Penalty Functions [ F8 ][ L5 ]

The interior penalty functions are frequently called barrier functions because of their barrier property of preventing violation of the constraints. It is noted that these functions are defined so long as C⁰ is nonempty. Two well-known interior penalty functions are defined as

h_k(x) = −t_k Σ_{i=1}^m 1/g_i(x) if x ∈ C⁰ , = +∞ otherwise ,

and

h_k(x) = −t_k Σ_{i=1}^m log[ −g_i(x) ] if x ∈ C⁰ , = +∞ otherwise ,

which are called the inverse barrier function and the logarithmic barrier function, respectively. For both of them, it is easy to verify that h_k converges pointwise to h , where h(x) = 0 if x ∈ C⁰ , = +∞ otherwise, and that h_k converges infimally to δ_C .
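As an illustration ( a toy example of our own, not part of the thesis ), consider minimizing f(x) = x subject to g(x) = −x ≤ 0 , whose solution is x̄ = 0 . With the logarithmic barrier above, U_k(x) = x − t_k log x has the exact minimizer x_k = t_k , so the unconstrained minimizers approach x̄ from the interior of C . A simple golden-section search reproduces this:

```python
import math

def golden_min(F, lo, hi, tol=1e-10):
    """Golden-section search for a unimodal F on [lo, hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if F(c) <= F(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0

def f(x):        # objective
    return x

def g(x):        # constraint g(x) = -x <= 0, i.e. x >= 0
    return -x

def U(x, t):
    # logarithmic barrier penalty: h_k(x) = -t * log(-g(x)), defined on C0 only
    return f(x) - t * math.log(-g(x))

minimizers = []
for t in [1.0, 0.1, 0.01, 0.001]:
    xk = golden_min(lambda x: U(x, t), 1e-12, 10.0)
    minimizers.append(xk)    # analytically x_k = t

print(minimizers)  # decreasing, tends to the solution 0 from inside C
```

Each computed minimizer agrees with the analytic value t_k , and the sequence tends to the constrained solution 0 through strictly feasible points.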
(b) Exterior Penalty Functions [ F8 ][ L5 ]

In exterior penalty function methods, no penalty is assigned to feasible points, whereas the penalty for constraint violation increases as the parameters change. These functions are sometimes called loss functions. One of the most familiar is the quadratic loss function defined by

h_k(x) = Σ_{i=1}^m { max[ 0, g_i(x) ] }² / t_k .

Then h_k converges pointwise to δ_C and h_k converges infimally to δ_C .
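A companion sketch for the quadratic loss function ( again a toy problem of our own ): for f(x) = x² and the single constraint g(x) = 1 − x ≤ 0 , the penalty function is U_k(x) = x² + max(0, 1−x)²/t_k , whose minimizer works out to x_k = 1/(1 + t_k) . The iterates approach the solution x̄ = 1 from outside the feasible region, the characteristic behavior of exterior methods:

```python
import math

def golden_min(F, lo, hi, tol=1e-10):
    """Golden-section search for a unimodal F on [lo, hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if F(c) <= F(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0

def U(x, t):
    # quadratic loss penalty for f(x) = x^2 with g(x) = 1 - x <= 0
    return x**2 + max(0.0, 1.0 - x)**2 / t

ts = [1.0, 0.1, 0.01, 0.001]
xs = [golden_min(lambda x, t=t: U(x, t), 0.0, 2.0) for t in ts]
print(xs)  # analytically 1/(1+t): 0.5, 0.909..., 0.990..., 0.999..., each < 1
```

Every iterate is infeasible ( x_k < 1 ), in contrast to the interior method above, yet the sequence converges to the constrained solution.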
(c) Exponential Penalty Functions [ A4 ][ E1 ][ M18 ]

Another interesting class of penalty functions is the exponential penalty function, first introduced by Allran and Johnsen [ A4 ]. A general class of these functions is the following [ M18 ]:

h_k(x) = s_k Σ_{i=1}^m exp[ g_i(x)/t_k ] ,

where 0 < t_k ≤ s_k ≤ 1 . Suppose that C⁰ is nonempty; then

lim_{k→∞} h_k(x) = 0 if x ∈ C⁰ , = +∞ if x ∉ C ,

while h_k converges infimally to δ_C .
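The stated pointwise behavior is easy to observe numerically ( hypothetical data of our own ): with a single constraint g(x) = 1 − x and s_k = t_k , the penalty t_k exp[ (1−x)/t_k ] decays to 0 at strictly feasible points and grows without bound at infeasible ones:

```python
import math

def h(x, t):
    # exponential penalty with s_k = t_k and a single constraint g(x) = 1 - x
    return t * math.exp((1.0 - x) / t)

for t in [1.0, 0.1, 0.01]:
    interior = h(2.0, t)   # g(2) = -1 < 0 : penalty -> 0 as t decreases
    exterior = h(0.0, t)   # g(0) = +1 > 0 : penalty -> +infinity
    print(t, interior, exterior)
```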
Remark 2.1 It is noted that the class of multiplier functions defined by Evans and Gould [ E2 ] generalizes the exterior and exponential penalty functions. For convex programming problems, the Evans-Gould multiplier functions seem almost equivalent in principle to our penalty functions. However, our approach is more general than theirs because we do not treat the constraint set in terms of constraint functions.
Remark 2.2 In exterior penalty function methods, it is not required that int C be nonempty. In particular, if each g_i is affine and C = { x ∈ R^n ; g_i(x) = 0 , i=1,...,m } , then, for example, the quadratic loss function takes the form

h_k(x) = Σ_{i=1}^m { g_i(x) }² / t_k ;

moreover, h_k converges pointwise to δ_C and h_k converges infimally to δ_C .
2.6 Conclusion
We have seen that the convergence of various kinds of penalty function methods independently proposed by many authors can be described in a unified manner in terms of the notion of infimal convergence of convex functions.
From a numerical point of view, however, it should be noted that these methods have the unfavorable property of inevitably becoming ill-conditioned as the sequential unconstrained minimization proceeds. An attempt to overcome this difficulty is proposed in the next chapter.
CHAPTER 3
APPLICATION OF FENCHEL DUALITY TO PENALTY METHODS
FOR GENERAL CONVEX PROGRAMS
In this chapter, a new class of sequential unconstrained optimization methods, called the conjugate penalty methods, is proposed for solving convex programming problems. The methods utilize conjugate convex functions and are based on Fenchel's duality theorem in convex analysis. It is shown that, under certain conditions, conjugate penalty functions behave quite mildly and hence avoid the ill‑conditioning of ordinary penalty methods of Chapter 2. Convergence of the methods is proved and the relationship between ordinary and conjugate penalty methods is shown.
3.1 Introduction
Recently, many authors have reported various methods for solving NLP problems by transforming each constrained optimization problem into unconstrained optimization problems [ B9 ][ F8 ][ L5 ][ R14 ]. A characteristic underlying those methods is that a solution of the original problem can be obtained as a limit of sequential solutions to the transformed unconstrained problems. Among those methods, the sequential unconstrained minimization techniques [ F8 ], commonly abbreviated SUMT, have been widely used in practice. These methods, sometimes called penalty function methods or penalty methods, reduce the computational process to unconstrained minimization of a transformed function, called a penalty function, which combines the objective function, the constraint functions and one or more parameters. A foundation of those methods for convex programs is stated in detail in the previous chapter of this thesis. Meanwhile, it is well known that the penalty functions inevitably become ill-conditioned near the boundary of the constraint region as the iteration proceeds [ L5 ][ L8 ]. Indeed, such ill-conditioning causes serious computational difficulties in solving actual problems.
In this chapter, restricting our attention to convex programs, we present a new class of sequential unconstrained optimization methods which we call conjugate penalty methods. Under appropriate assumptions they circumvent the ill-conditioning of ordinary penalty methods. The idea is to dualize ordinary penalty methods by use of Fenchel's duality theorem [ R5 ], which is stated in the Appendix. Specifically, the conjugate penalty method involves sequential unconstrained maximizations of conjugate penalty functions, which behave quite mildly near the solution as the sequential maximization proceeds. It is shown that maximizing the conjugate penalty function is dual to minimizing the ordinary penalty function.
The concept of conjugate convex (concave) functions, originated by Fenchel [ F2 ] and applied variously to NLP, e.g., [ B11 ][ F1 ][ K5 ][ R4 ][ R5 ], plays a central role in this chapter. The material from convex analysis used in this chapter can be found in the Appendix.
This chapter is organized as follows: Section 3.2 shows a duality between two extremum problems derived from a general convex program. Section 3.3 defines the conjugate penalty method and proves its convergence. In Section 3.4, several conditions are given under which the conjugate penalty functions are well-behaved. Section 3.5 discusses the conjugate interior penalty methods, restricted to convex programs with inequality constraints, in order to go into further detail.
Finally, Section 3.6 gives illustrative numerical examples.
3.2 A General Convex Program and Fenchel Duality
Consider the following general convex programming problem:
(P) minimize f(x) subject to x e C ,
where f is a closed convex function on R^n and C is a nonempty closed convex set in R^n . The convex programming problem (P) is equivalent to the following 'unconstrained' problem:

minimize U(x) ≜ f(x) − γ_C(x) over x ∈ R^n ,   (3.1)

where γ_C is the indicator function of C defined by

γ_C(x) = 0 if x ∈ C , = −∞ if x ∉ C .

Obviously, γ_C is a closed concave function on R^n .

We assume the following:

C1. The finite minimum of U is uniquely attained at x̄ . Namely, x̄ is the unique minimum of problem (P).

C2. ri(dom f) and ri C have a point in common.

In order to guarantee the existence of a minimum ( possibly not unique ), we may suppose that f and C have no direction of recession in common [ R5 , Thm.27.3]. Some other conditions for the existence of minima are found in [ A3 ]. The latter assumption is automatically satisfied when f is finite everywhere, i.e., dom f = R^n , and ri C is nonempty, as is almost always the case in practical problems.
Let V be the closed concave function defined by

V(y) ≜ γ_C*(y) − f*(y) ,   (3.2)

where f* and γ_C* are the conjugates of f and γ_C , respectively. ( Note that γ_C is the negative of the indicator function δ_C of Chapter 2. ) The function γ_C* is the negative of the support function of C [ R5 , p.28] and, hence, a positively homogeneous closed concave function.

The following lemma is derived from Fenchel's duality theorem.
Lemma 3.1 Let U and V be defined by (3.1) and (3.2), respectively. If assumptions C1 and C2 are satisfied, then

f(x̄) = min_{x∈R^n} U(x) = sup_{y∈R^n} V(y) ,

where the supremum is attained. Moreover, the maximum set of V is ∂f(x̄) ∩ ∂γ_C(x̄) , and conversely, for each maximum ȳ of V the set ∂f*(ȳ) ∩ ∂γ_C*(ȳ) is the singleton { x̄ } , where ∂ denotes the subdifferential operator.

Proof. The first part is immediate from Fenchel's duality theorem with g(x) = γ_C(x) . We now prove the latter half. It can be shown that the following conditions are equivalent ( [ R5 , Thm.23.5] ):

(i) f(x̄) − γ_C(x̄) = γ_C*(ȳ) − f*(ȳ) ;
(ii) f(x̄) + f*(ȳ) = <x̄,ȳ> and γ_C(x̄) + γ_C*(ȳ) = <x̄,ȳ> ;
(iii) ȳ ∈ ∂f(x̄) ∩ ∂γ_C(x̄) ;
(iv) x̄ ∈ ∂f*(ȳ) ∩ ∂γ_C*(ȳ) .

Let x' be an arbitrary point in ∂f*(ȳ) ∩ ∂γ_C*(ȳ) ; then obviously f(x') − γ_C(x') = f(x̄) − γ_C(x̄) . Thus, ∂f*(ȳ) ∩ ∂γ_C*(ȳ) must be the singleton { x̄ } by the uniqueness assumption C1. □

The maximum set ∂f(x̄) ∩ ∂γ_C(x̄) of V is clearly a closed convex set. In particular, when f is differentiable, the maximum set of V is considerably simplified by the following.
Lemma 3.2 Let all requirements of Lemma 3.1 be satisfied. If, in addition, f is differentiable at x̄ , then the supremum of V is uniquely attained at ȳ = ∇f(x̄) .

Proof. Since ∂f(x̄) = { ∇f(x̄) } and the maximum set of V is nonempty by Lemma 3.1, it is necessarily { ∇f(x̄) } . □
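As a quick numerical sanity check of Lemmas 3.1 and 3.2 ( a toy instance of our own, not from the thesis ), take n = 1 , f(x) = x²/2 and C = [1, 2] . Then f*(y) = y²/2 and γ_C*(y) = min{ y, 2y } , the primal minimum is f(x̄) = 1/2 at x̄ = 1 , and the maximum of V should be attained at ȳ = ∇f(x̄) = 1 . A brute-force grid search suffices:

```python
def f(x): return 0.5 * x * x
def f_star(y): return 0.5 * y * y        # convex conjugate of x^2/2

def gamma_star(y):
    # concave conjugate of the indicator of C = [1, 2]: inf over x in C of <x, y>
    return min(1.0 * y, 2.0 * y)

def V(y): return gamma_star(y) - f_star(y)

# primal: f is increasing on [1, 2], so the constrained minimum is at x = 1
primal = f(1.0)

# dual: maximize V over a fine grid
ys = [i / 1000.0 for i in range(-3000, 3001)]
y_best = max(ys, key=V)

print(primal, V(y_best), y_best)  # -> 0.5 0.5 1.0
```

Both extremal values agree at 1/2 and the dual maximizer is ȳ = 1 = ∇f(x̄) , as Lemma 3.2 predicts.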
3.3 Conjugate Penalty Methods
There are a number of methods that solve problem (P) by transforming it into a sequence of unconstrained problems of the form:

(P_k) minimize U_k(x) ≜ f(x) − h_k(x) over x ∈ R^n .

For each k = 1, 2,... , the auxiliary problem (P_k) is solved, and the optimal solution to problem (P) is obtained as a limit of the sequence of optimal solutions to problems (P_k). According to the types of h_k , the functions U_k are classified into several classes, e.g., barrier functions, loss functions, and exponential penalty functions [ F8 ][ L5 ][ L8 ][ R14 ]. Here, we call those functions generically penalty functions. The penalty functions U_k should be constructed so that:

A1. For every k , there exists a (unique) x_k that minimizes U_k over R^n .

A2. x_k converges to x̄ as k → ∞ , and the limit of U_k(x_k) is the minimum value of problem (P).
In the following, let appropriate conditions be implicitly assumed so that properties A1 and A2 above are fulfilled. Such conditions are studied in Chapter 2 of this thesis. Moreover, we assume the following:

C3. Each h_k is a closed concave function with ri(dom f) ∩ ri(dom h_k) ≠ ∅ ;

C4. int(dom f) ≠ ∅ and f is differentiable on int(dom f) ;

C5. Every x_k , a minimum of U_k , belongs to int(dom f) .
Let us define the conjugate penalty functions V_k on R^n by

V_k(y) ≜ h_k*(y) − f*(y) ,   (3.3)

where f* and h_k* are the conjugates of f and h_k , respectively. Now consider the sequence of problems:

(Q_k) maximize V_k(y) over y ∈ R^n .

Since f* and h_k* are convex and concave, respectively, each problem (Q_k) is to find an unconstrained maximum of the concave function V_k .

It is now possible to define a conjugate penalty method for solving problem (P) in a manner quite similar to that of ordinary penalty methods. Specifically, we try to solve problem (P) by successive maximizations of the conjugate penalty functions V_k , k = 1, 2,... . The methods may therefore be regarded as SUMT procedures.
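To make the construction concrete, consider a one-dimensional sketch of our own ( not from the thesis ): f(x) = x²/2 and C = [1, ∞) , with the concave logarithmic barrier h_k(x) = t_k log(x − 1) on int C . Then f*(y) = y²/2 and a short computation gives h_k*(y) = y + t_k + t_k log(y/t_k) for y > 0 , so V_k is explicitly available and concave. Its maximizer y_k solves y² − y − t_k = 0 , which is also the minimizer x_k of the ordinary penalty function U_k(x) = x²/2 − t_k log(x − 1) ; thus y_k = ∇f(x_k) = x_k , as Theorem 3.1 below asserts.

```python
import math

def golden_max(F, lo, hi, tol=1e-12):
    """Golden-section search for the maximum of a unimodal F on [lo, hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if F(c) >= F(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0

def V(y, t):
    # conjugate penalty V_k(y) = h_k*(y) - f*(y), valid for y > 0
    return (y + t + t * math.log(y / t)) - 0.5 * y * y

for t in [1.0, 0.1, 0.01, 0.001]:
    yk = golden_max(lambda y: V(y, t), 1e-9, 5.0)
    xk = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t))  # root of y^2 - y - t = 0
    print(t, yk, xk)  # yk matches xk = grad f(x_k); both tend to 1
```

In this example V_k stays smooth and well-scaled near ȳ = 1 as t_k → 0 , in contrast to the steep barrier in the primal; conditions under which this mild behavior holds in general are the subject of Section 3.4.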
The following theorem proves convergence of the conjugate penalty method.
Theorem 3.1 Let V_k be defined by (3.3). If assumptions C1 through C5 are satisfied, then for every k there exists a unique maximum y_k of V_k , and y_k = ∇f(x_k) , where x_k is a minimum of problem (P_k). Moreover, y_k and V_k(y_k) converge to ∇f(x̄) and f(x̄) , respectively, as k → ∞ .

Proof. By C3, the existence of a maximum y_k of V_k follows from Fenchel's duality theorem. By the differentiability of f