2.2 Convex Functions

(1)

Chapter 2

Elements of Convex Analysis

(COSS 2018 Reading Material)

Kazuo Murota (July 14, 2018) Minimal technical elements from convex analysis are given in this section. For comprehensive account, the reader is referred to books on convex analysis [1, 2, 3, 5, 6, 7, 8, 9, 10].

2.1 Convex Sets

For two vectorsa= (a₁,a₂, . . . ,a_n),b= (b₁,b₂, . . . ,b_n)∈(R∪ {−∞,+∞})ⁿwe de- fineclosed interval[a,b]andopen interval(a,b)as

[a,b] = [a,b]_R={x∈Rⁿ|a_i≤x_i≤b_i(i=1,2, . . . ,n)}, (2.1) (a,b) = (a,b)_R={x∈Rⁿ|a_i<x_i<b_i(i=1,2, . . . ,n)}, (2.2) where, ifai=−∞, for example,ai≤xiis to be understood as−∞<xi.

A setS⊆Rⁿis calledconvexif it satisfies the condition

x,y∈S, 0≤λ≤1 =⇒ λ^x^{+ (1}−λ^)y∈S, (2.3) where an empty set is a convex set. Aconvex polyhedronis a convex setSdescribed by a finite number of linear inequalities as

S={x∈Rⁿ|

∑

ⁿ

j=1

a_{i j}x_j≤b_i(i=1,2, . . . ,m)}, (2.4) wherea_{i j}∈Randb_i∈R(i=1,2, . . . ,m;j=1,2, . . . ,n).

For a finite number of pointsx¹,x², . . . ,x^min a setS, a point represented as λ1x¹+λ2x²+···+λmx^m (2.5)

39

(2)

40 2 Elements of Convex Analysis (COSS 2018 Reading Material) with nonnegative coefficientsλi(1≤i≤m)having unit sum (∑^mi=1λi=1) is called aconvex combination of those points. IfSis convex, any convex combination of points inSbelongs toS, and the converse is also true. Therefore,Sis convex if and only ifS=S, whereSdenotes the set of all possible convex combinations of a finite number of points ofS.

The intersection of any (finite or infinite) number of convex sets is a convex set. For any setS, the intersection of all the convex sets containingSis the smallest convex set containingS, which is called theconvex hullofSand denoted as conv(S).

The convex hull ofScoincides with the set of all convex combinations of points in S. That is, we have conv(S) =S. The convex hull of a setSis not necessarily closed (in the topological sense). The smallest closed convex set containingSis called the closed convex hullofS. For a finite setS, the convex hull is always closed.

The affine hullof a set S is defined to be the smallest affine set (a translation of a linear space) containingS, and is denoted by affS. Therelative interiorofS, denoted as riS, is the set of pointsx∈Ssuch that{y∈Rⁿ| ∥y−x∥<ε} ∩affSis contained inSfor someε^>0. In other words, the relative interior ofSis the set of the interior points ofSwith respect to the topology induced from affS.

For two setsSandT, the set

S+T={x+y|x∈S,y∈T} (2.6)

is called theMinkowski sumofSandT. IfSandT are convex, the Minkowski sum S+Tis a convex set.

A setSis aconeif it satisfies

x∈S, λ ^>⁰ ⁼⇒ λ^x∈S. (2.7)

A cone that is convex is called aconvex cone. In other words, a setSis a convex cone if and only if it satisfies the condition

x,y∈S, λ^,µ^>⁰ ⁼⇒ λ^x⁺µ^y∈S. (2.8)

2.2 Convex Functions

For a function f:Rⁿ→R∪ {−∞,+∞}we define

domf=dom_Rf ={x∈Rⁿ| −∞<f(x)<+∞}, (2.9) which is called theeffective domainof f.

A function f :Rⁿ→Ris said to beconvexif it satisfies

λ^f^{(x) + (1}−λ⁾^f^(y)≥f(λ^{x+ (1}−λ^)y) ^(x,^y∈Rⁿ; 0≤λ ≤1). (2.10)

(3)

Note that−∞is excluded from the possible function values of a convex function, and that the inequality (2.10) is satisfied, by convention, if both sides are equal to+∞. A convex function having a nonempty effective domain is called aproper convex function. A function isstrictly convex if it satisfies (2.10) with strict inequalities, i.e., if

λ^f^{(x) + (1}−λ⁾^f^(y)^> ^f⁽λ^{x+ (1}−λ^)y) ^(x,^y∈domf; 0<λ ^<^1). ^(2.11) A functiong:Rⁿ→Risconcaveif−gis convex, that is, if

λ^{g(x) + (1}−λ^)g(y)≤g(λ^x^{+ (1}−λ^)y) ^(x,^y∈Rⁿ; 0≤λ≤1). (2.12) Theepigraphof a function f :Rⁿ→R, denoted as epif, is the set of points in Rⁿ×Rlying above the graph ofα⁼^f(x). Namely,

epif={(x,α⁾∈Rⁿ⁺¹|α≥f(x)}. (2.13) Then we have

f is a convex function ⇐⇒ epif is a convex set. (2.14) A function f is said to beclosed convexif epif is a closed convex set inRⁿ⁺¹.

Theindicator functionof a setS⊆Rⁿis a functionδS:Rⁿ→ {0,+∞}defined by

δS(x) =

{0 (x∈S),

+∞(x̸∈S). (2.15)

Then we have

Sis a convex set ⇐⇒ δSis a convex function. (2.16) For a family of convex functions{f_k|k∈K}, indexed byK, the pointwise maximum of those functions, f(x) =sup{f_k(x)|k∈K}, is again a convex function, where the index setKhere may possibly be infinite. In particular, the maximum of a finite or infinite number of affine functions

f(x) =sup{αk+⟨p_k,x⟩ |k∈K} (2.17) is a convex function, whereαk∈Randp_k∈Rⁿfork∈Kand

⟨p,x⟩=

∑

n i=1

p_ix_i (2.18)

denotes theinner productof p= (p₁,p₂, . . . ,p_n)andx= (x₁,x₂, . . . ,x_n).

A function defined on Rⁿ is said to be polyhedral convex if its epigraph is a convex polyhedron inRⁿ⁺¹. A polyhedral convex function is exactly such a function that can be represented as the maximum of a finite number of affine functions (i.e., (2.17) with finiteK) on an effective domain represented as (2.4).

(4)

42 2 Elements of Convex Analysis (COSS 2018 Reading Material) For two functions f,g:Rⁿ→R, theirsumis the function f+g:Rⁿ→Rdefined naturally by

(f+g)(x) =f(x) +g(x) (x∈Rⁿ), (2.19) and theirinfimal convolutionis the function f2g:Rⁿ→R∪{−∞,+∞}defined by (f2g)(x) =inf{f(y) +g(z)|x=y+z, y,z∈Rⁿ} (x∈Rⁿ). (2.20) The sum of two convex functions is convex, and the infimal convolution of two convex functions is convex if it does not take the value of−∞. If f andgare the indicator functions of setsSandT, then f+gand f2gare the indicator functions of the intersectionS∩T and the Minkowski sumS+T, respectively.

For a function f and a vectorp, we denote by f[−p]the function defined by f[−p](x) =f(x)− ⟨p,x⟩ (x∈Rⁿ). (2.21) This is convex for a convex function f.

2.3 Minimization and Subgradients

The most appealing property of a convex function is that local minimality is equivalent to global minimality. A point (or vector)xis said to be a (global)minimizerof

f if the inequality

f(x)≤f(y) (2.22)

holds for everyy. A pointxis alocal minimizer if the inequality (2.22) holds for everyyin some neighborhood ofx. Obviously, global minimality implies local minimality. The converse is not true in general, but it is true for convex functions.

Theorem 2.1.For a convex function, local minimality implies global minimality.

Proof. Letxbe a local minimizer of a convex functionf. Then we have f(z)≥f(x) for allzin some neighborhoodUofx. For anyy, we can chooseλ^<1 sufficiently close to 1 such thatz=λ^x^{+ (1}−λ^)y^{belongs to}^U. Then it follows from (2.10) that

λ^f^{(x) + (1}−λ⁾^f^(y)≥ f(λ^x^{+ (1}−λ^{)y) =}^f^(z)≥f(x).

This implies f(y)≥ f(x). ⊓⊔

The set of the minimizers of f is denoted as

argminf =argmin_Rf ={x∈Rⁿ| f(x)≤f(y) (∀y∈Rⁿ)}. (2.23) This is a convex set if f is convex.

Thesubdifferentialof a function f at a pointx∈domf is defined to be the set

∂^f^{(x) =}{p∈Rⁿ|f(y)−f(x)≥ ⟨p,y−x⟩(∀y∈Rⁿ)}. (2.24)

(5)

Note thatp∈∂^f^(x)if and only ifx∈argminf[−p]; in particular, 000∈∂^f^(x)^{if and} only ifx∈argminf. For a convex function f, the set∂^f^(x)is nonempty ifxis in the relative interior of domf. An element of∂^f^(x)is called asubgradientof f atx.

If f is convex and differentiable atx, the subdifferential∂^f^(x)consists of a single element, which is thegradientof f atx:

∇f = (∂^f

∂^x1

, ∂^f

∂^x2

, . . . , ∂^f

∂^xn

)

. (2.25)

The directional derivative of a function f at a pointx∈domf in a direction d∈Rⁿis defined by

f^′(x;d) =lim

α↓0

f(x+α^d)−f(x)

α ^(2.26)

when this limit (finite or infinite) exists, whereα↓0 means thatα tends to 0 from the positive side (α ^>0). For a convex function f, the limit exists for alld, and f^′(x;d)is a convex function ind. For a polyhedral convex function f, there exists ε^>0, independent ofx∈domf, such that

f^′(x;d) =f(x+d)−f(x) (∥d∥ ≤ε^). ^(2.27)

2.4 Conjugacy

As Fig. 2.1 (a,b) shows, a convex function f(x)can be recovered from tangent lines as the upper envelope of all tangent lines with different slopes. Letα be the vertical intercept of the tangent line with slope p. Since α is dependent on slope p, we denoteα ⁼−f^•(p). By considering the minimum distance between the graph of y= f(x)and the liney=px, we see that the minimum of f(x)−pxover allxis equal toα⁼−f^•(p); cf., Fig. 2.1(c). That is,

f^•(p) =sup{px−f(x)|x∈R} (p∈R). (2.28) This function f^•(p)should be equivalent to the original function f(x)in some ap- propriate sense.

For a function f :Rⁿ→R with domf ̸= /0, the convex conjugate (or simply conjugate) of f is a function f^•:Rⁿ→Rdefined by

f^•(p) =sup{⟨p,x⟩ −f(x)|x∈Rⁿ} (p∈Rⁿ), (2.29) which is indeed a convex function since it is the maximum of (infinitely many) affine functions in pindexed byx. The function f^•is also called the (convex)Legendre–

Fenchel transform of f, and the mapping f 7→ f^• is referred to as the (convex) Legendre–Fenchel transformation. Similarly, theconcave conjugateof a function g:Rⁿ→Rwith domg̸=/0 is a concave functiong^◦:Rⁿ→Rdefined by

(6)

44 2 Elements of Convex Analysis (COSS 2018 Reading Material)

-x 6

y f(x)

slopep

−f^•(p)

(a)

-x 6

y

(b)

-x 6

y f(x)

slopep

−f^•(p) 6

? (c)

Fig. 2.1 Tangent lines of a convex function

g^◦(p) =inf{⟨p,x⟩ −g(x)|x∈Rⁿ} (p∈Rⁿ). (2.30) Note thatg^◦(p) =−(−g)^•(−p).

For a function f, the conjugate function of the conjugate off, i.e.,(f^•)^•, is called thebiconjugateof f and denoted as f^••. The biconjugate of f is the largest closed convex function that is dominated pointwise by f.

Theorem 2.2.The Legendre–Fenchel transform f^•in(2.29)is a closed proper con- vex function for any function f withdomf ̸=/0, and f^••= f for a closed proper convex function f .

This theorem shows that the Legendre–Fenchel transformation f 7→ f^• gives a symmetric (or involutive) one-to-one correspondence in the class of all closed proper convex functions.

For a setS⊆Rⁿ, the conjugateδS•of its indicator functionδSis expressed as δS•(p) =sup{⟨p,x⟩ |x∈S} (p∈Rⁿ), (2.31) which is called the support functionof S. The biconjugate δS•• of the indicator functionδSof a setSis the indicator function of the closed convex hull ofS.

By Theorem 2.2 and the definition (2.24) we obtain the relationship p∈∂^f^(x) ⇐⇒ x∈argminf[−p]

⇕

f(x) +f^•(p) =⟨p,x⟩

⇕

x∈∂^f^•^(p) ⇐⇒ p∈argminf^•[−x]

(2.32)

for a closed proper convex function f and vectors x,p∈Rⁿ. For a closed convex function f and a pointxin the relative interior of domf, the support function of the subdifferential∂^f^(x)coincides with the directional derivative f^′(x;d)as a function ind, i.e.,

(δ_∂f(x))^•(d) =f^′(x;d) (d∈Rⁿ). (2.33)

(7)

The addition (2.19) and the infimal convolution (2.20) are conjugate operations with respect to the Legendre–Fenchel transformation. For proper convex functions

f andgwe have

(f2g)^•=f^•+g^•, (2.34)

(f+g)^•= f^•2g^•, (2.35)

where the latter is true under the assumption that ri(domf)∩ri(domg)̸=/0.

Example 2.1.The conjugate of a quadratic functionf(x) =¹₂x^⊤Axdefined by a positive definite symmetric matrixAcan be computed as follows. The maximizerxon the right-hand side of (2.29) is determined fromp=∇f(x) =Axasx=A⁻¹p. Then

f^•(p) =p^⊤x−1

2x^⊤Ax=1

2p^⊤A⁻¹p.

Since ∇f(x) =Ax and ∇f^•(p) =A⁻¹p, we indeed have the equivalence “p ∈

∂^f^(x) ⇐⇒ x∈∂^f^•(p)” in (2.32).

2.5 Duality

The separation theorem for functions asserts that, if a convex function pointwise dominates a concave function, then there exists an affine function that lies between the convex function and the concave function; see Fig. 2.2 (a).

Theorem 2.3 (Separation for convex functions).Let f:Rⁿ→Rbe a proper con- vex function and g:Rⁿ→Ra proper concave function, and assume that(a1)or (a2)below is satisfied:

(a1) ri(domf)∩ri(domg)̸=/0,

(a2) f and g are polyhedral, anddomf∩domg̸=/0.

If f(x)≥g(x) (∀x∈Rⁿ), there existα^∗∈Rand p^∗∈Rⁿsuch that

f(x)≥α^∗⁺⟨p^∗,x⟩ ≥g(x) (∀x∈Rⁿ). (2.36)

Note that the convexity assumption is critical in Theorem 2.3. In Fig. 2.2 (b), we have f(x)≥g(x)for allx, but there exists no affine functionα^∗⁺^p^∗^xthat separates

f(x)andg(x).

TheFenchel dualityis a min-max relation between a pair of convex function f and concave functiongand their conjugate functions f^•andg^◦.

Theorem 2.4 (Fenchel duality).Let f :Rⁿ→Rbe a proper convex function and g:Rⁿ→Ra proper concave function, and assume that at least one of the following four conditions(a1)∼(b2)below is satisfied:

(8)

46 2 Elements of Convex Analysis (COSS 2018 Reading Material) f(x)

g(x)

α^∗+p^∗x

(a) Convex-concave pair

f(x)

g(x) α^∗+p^∗x (b) Nonconvex-concave pair Fig. 2.2 Separation theorem

(a1) ri(domf)∩ri(domg)̸=/0,

(a2) f and g are polyhedral, anddomf∩domg̸=/0, (b1) f and g are closed¹, andri(domf^•)∩ri(domg^◦)̸=/0, (b2) f and g are polyhedral, anddomf^•∩domg^◦̸=/0.

Then it holds that

inf{f(x)−g(x)|x∈Rⁿ}=sup{g^◦(p)−f^•(p)|p∈Rⁿ}. (2.37) Moreover, if this common value is finite, the supremum is attained by some p∈ domf^•∩domg^◦under the assumption of(a1)or(a2), and the infimum is attained by some x∈domf∩domg under the assumption of(b1)or(b2).

If the supremum in (2.37) is attained byp=p^∗, we have

argmin(f−g) =argminf[−p^∗]∩argmaxg[−p^∗]. (2.38) Remark 2.1.Theorem 2.4 above is formulated for a pair of convex and concave functions. In some cases it is convenient to reformulate it in terms of two convex functions. For convex functions f andg, the min-max formula (2.37) is rewritten as inf{f(x) +g(x)|x∈Rⁿ}=sup{−f^•(p)−g^•(−p)|p∈Rⁿ}. (2.39)

References

1. Barvinok, A.: A Course in Convexity. American Mathematical Society, Providence, R.I.

(2002)

2. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Ex- amples. Springer, Berlin (2000)

3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

1By this we mean thatfand−gare closed convex functions.

(9)

4. Ekeland, I., T´emam, R.: Convex Analysis and Variational Problems. North-Holland, Amster- dam (1976); Society for Industrial and Applied Mathematics, Philadelphia (1999)

5. Hiriart-Urruty, J.-B., Lemar´echal, C.: Convex Analysis and Minimization Algorithms I, II.

Springer, Berlin (1993)

6. Hiriart-Urruty, J.-B., Lemar´echal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)

7. Krantz, S.G.: Convex Analysis. CRC Press, Boca Raton (2015)

8. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

9. Rockafellar, R.T.: Conjugate Duality and Optimization. SIAM Regional Conference Series in Applied Mathematics, vol. 16, Society for Industrial and Applied Mathematics, Philadelphia (1974)

10. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

(10)