An example of the resulting Krylov subspace eigensolvers is the generalized Davidson method

(1)

SHARP RITZ VALUE ESTIMATES

FOR RESTARTED KRYLOV SUBSPACE ITERATIONS^∗

MING ZHOU^†ANDKLAUS NEYMEYR^†

Abstract.Gradient iterations for the Rayleigh quotient are elementary methods for computing the smallest eigenvalues of a pair of symmetric and positive definite matrices. A considerable convergence acceleration can be achieved by preconditioning and by computing Rayleigh-Ritz approximations from subspaces of increasing dimensions. An example of the resulting Krylov subspace eigensolvers is the generalized Davidson method. Krylov subspace iterations can be restarted in order to limit their computer storage requirements. For the restarted Krylov subspace eigensolvers, a Chebyshev-type convergence estimate was presented by Knyazev in [Soviet J. Numer. Anal. Math. Modelling, 2 (1987), pp. 371–396]. This estimate has been generalized to arbitrary eigenvalue intervals in [SIAM J. Matrix Anal. Appl., 37 (2016), pp. 955–975].

The generalized Ritz value estimate is not sharp as it depends only on three eigenvalues. In the present paper we extend the latter analysis by generalizing the geometric approach from [SIAM J. Matrix Anal. Appl., 32 (2011), pp. 443–456] in order to derive a sharp Ritz value estimate for restarted Krylov subspace iterations.

Key words.Krylov subspace, Rayleigh quotient, Rayleigh-Ritz procedure, polynomial interpolation, multigrid, elliptic eigenvalue problem.

AMS subject classifications.65F15, 65N25

1. Introduction. Gradient iterations for the Rayleigh quotient ρ:Rⁿ\{0} →R, ρ(x) = (x, Ax)/(x, M x) (1.1)

serve for the computation of a moderate number of the extreme eigenvalues and the associated eigenspaces of a pair of symmetric and positive definite matrices A, M ∈ R^n×n. In typical applicationsAandM are finite element discretization matrices of self-adjoint and elliptic partial differential operators. For a sufficiently fine discretization, these matrices are large and sparse.

Thus, the smallest eigenvalues of(A, M)and the associated eigenvectors should not be computed by (classical) matrix transformations [2,6,20,23]. Instead, one prefers a gradient iteration for (1.1) of the basic formx^(`+1)=x^(`)−ω∇ρ(x^(`))since the minimizers of the Rayleigh quotient are the eigenvectors associated with the smallest eigenvalue of(A, M).

Replacing the Euclidean gradient∇ρ(·)by theA-gradient∇Aρ(·) :=A⁻¹∇ρ(·)can result in a considerable convergence acceleration [4,12,16]. Practically, theA-gradient iteration can be implemented by the Rayleigh-Ritz procedure applied to the two-dimensional subspace spanned by xandA⁻¹∇ρ(x). A further acceleration can be achieved by extending this subspace with previous iterates. The generalized Davidson method belongs to this class ofA-gradient iterations if it uses the exact inverseA⁻¹for preconditioning. We call this case “exact-inverse preconditioning”. The iterative scheme, which starts with an iteratex^(`)and results in a Ritz vectorx^(`+1)from an at mostk-dimensional subspaceV, reads

(1.2)







v⁽¹⁾=x^(`), V= span{v⁽¹⁾},

V ← span{V, A⁻¹∇ρ(v⁽ⁱ⁾)}, v⁽ⁱ⁺¹⁾ ← RRmin(V), for i= 1, . . . , k−1, x^(`+1)=v^(k).

Therein, the Rayleigh-Ritz procedure RRmin(V)returns a Ritz vector associated with the smallest Ritz value of(A, M)inV. Knyazev [8] analyzed the convergence behavior of (1.2) by estimating ρ(x^(`+1)). A more general estimate, which improves a classical result for Krylov subspaces by Kaniel, Saad, and Parlett [7,20,21], is presented in the recent work [19]. Numerical examples

∗Received May 15, 2017. Accepted September 22, 2017. Published online on November 10, 2017. Recommended by D. Szyld.

†Universität Rostock, Institut für Mathematik, Ulmenstraße 69, 18055 Rostock, Germany ({ming.zhou,klaus.neymeyr}@uni-rostock.de).

424

(2)

show that even the improved estimate is only sharp fork = 2. In fact, the special form of the estimate for the two-dimensional case coincides with the estimate for theA-gradient iteration analyzed in [16]. There, the analytic theory is mainly based on geometric arguments. A partial generalization of this geometric analysis is presented in [19] in order to prove an estimate for the angle enclosed byx^(`)and the eigenspace associated with the smallest eigenvalue. This estimate is sharp for Krylov subspaces of arbitrary dimensionsk≥2. In the present paper we complete this generalization in order to derive a sharp Ritz value estimate for the general casesk >2. The new result is a first step toward a convergence analysis of (1.2) for the case thatA⁻¹is substituted by general preconditionersB⁻¹≈A⁻¹; cf. the analysis in [16] which has been generalized in [15] to preconditioned gradient iterations.

1.1. TheA-gradient and restarted Krylov subspace iterations. TheA-gradient of (1.1) reads

A⁻¹∇ρ(x) = 2/(x^TM x)

x−ρ(x)A⁻¹M x

∈span{x, A⁻¹M x}.

Hence the restarted iteration (1.2) works in subspacesVof the Krylov subspace K^k(x^(`)) = span{x^(`), A⁻¹M x^(`), . . . ,(A⁻¹M)^k−1x^(`)}.

We consider the nontrivial case that the dimension of V increases by 1 in each step so that dimK^k(x^(`)) =k. All this allows us to consider (1.2) as a restarted Krylov subspace iteration

(1.3) x^(`+1) ← RRmin K^k(x^(`))

.

Alternatively, the iteration can be interpreted as an Invert-Lanczos process [14]. The required matrix-vector productsA⁻¹ware computed by solving linear systems inA; see [9,10,11]. An approximate solution of these linear systems amounts to an inexact-inverse preconditioning. Block versions of (1.3) serve for the simultaneous computation of several eigenvalues; cf. the block Lanczos algorithm [3,5].

Classical convergence estimates for eigenvalue and eigenvector approximations in Krylov subspaces have been presented by Kaniel, Saad, and Parlett [7,20,21]. A generalization of Theorem 12.4.1 in [20] to the generalized eigenvalue problem for(A, M)provides an upper bound for the approximation error of the smallest Ritz value inK^k(x^(`))which depends on differences of eigenvalues and angles betweenx^(`)and certain invariant subspaces. This generalization also provides upper bounds for angles between eigenvectors andK^k(x^(`)). However, these estimates cannot be applied recursively in order to formulate a priori estimates for multiple steps of the restarted Krylov subspace iteration (1.3). For instance, in order to estimateρ(x^(`+2))in terms of x^(`), the classical results cannot provide suitable estimates ofx^(`+1)andx^(`); cf. [19, Section 2].

For the derivation of an a priori estimate for (1.3), we choose the convergence measure

(1.4) ∆i,i+1(θ) = θ−λi

/ λi+1−θ ,

which describes the relative position of an eigenvalue approximationθwith respect to two neighboring eigenvaluesλi< λi+1of(A, M). The ratio (1.4) has been used in various papers on sharp convergence estimates for gradient-type eigensolvers; see [1,8,15,16,17,18]. In particular, [8]

provides for (1.3) the Ritz value estimate

(1.5) ∆_1,2 ρ(x^(`+1))

≤[T_k−1(1 + 2γ₁) ]⁻²∆_1,2 ρ(x^(`)) .

The convergence factor contains a Chebyshev polynomial of degree k−1 and the gap ratio γ₁ := (λ⁻¹₁ −λ⁻¹₂ )/(λ⁻¹₂ −λ⁻¹_max) of reciprocal eigenvalues, and it does not depend on the iteratex^(`). Thus, the a priori estimate∆_1,2 ρ(x^(`))

≤[T_k−1(1 + 2γ₁) ]^−2`∆_1,2 ρ(x⁽⁰⁾) holds.

Furthermore, (1.5) gives a tighter bound in comparison to the classical Ritz value estimate from [20]. This fact has been illustrated in [19, Section 2.1] by a numerical example. A generalization

(3)

concerning arbitrary eigenvalue intervals is given by [19, Theorem 3.5]. There, the generalized estimate concerns the case thatρ(x^(`))is contained in an arbitrary interval(λi, λi+1)fori≥1and has the form

(1.6) ∆_i,i+1 ρ(x^(`+1))

≤[T_k−1(1 + 2γ_i) ]⁻²∆_i,i+1 ρ(x^(`))

withγi:= (λ⁻¹_i −λ⁻¹_i+1)/(λ⁻¹_i+1−λ⁻¹_max). The convergence factor[Tk−1(1 + 2γi) ]⁻²shows that a larger dimensionkor a larger distance betweenλ_iandλ_i+1yields faster convergence. However, the numerical examples in [19] show that the Ritz value estimate (1.6) is not sharp fork > 2.

The simple explanation is that the convergence factor[T_k−1(1 + 2γ_i) ]⁻²depends only on three eigenvaluesλ_i,λ_i+1, andλ_mfor arbitraryk >2. In this paper our focus is on deriving a sharp Ritz value estimate which depends for increasingkon an increasing number of eigenvalues of(A, M).

Our convergence analysis can be simplified considerably by using anA-geometry representation as in [19], i.e., the generalized eigenvalue problemAx =λM xis represented by the standard eigenvalue problemHy=λ⁻¹yby consideringy=A^1/2xandH =A^−1/2M A^−1/2. Thus, we can start with the setting of the following theorem which contains the main results of [19].

THEOREM 1.1. Letµ1 > µ2 > · · · > µmbe the distinct eigenvalues of the symmetric and positive definite matrixH ∈ R^n×n, and letµ(·)be the Rayleigh quotient with respect to H. Consider a Ritz vectory⁰associated with the largest Ritz value ofHin the Krylov subspace K :=K^k_H(y) = span{y, Hy, . . . , H^k−1y}withy ∈Rⁿ\{0}andk≥ 2. Ifµ(y)∈ (µi+1, µi), then it holds that

(1.7) µi−µ(y⁰)

µ(y⁰)−µ_i+1 ≤[T_k−1(1 + 2γi) ]⁻² µi−µ(y) µ(y)−µ_i+1

with the Chebyshev polynomialT_k−1and the gap ratioγ_i:= (µ_i−µ_i+1)/(µ_i+1−µ_m). Ifyis notH-orthogonal to the eigenspaceW₁associated withµ₁, then

(1.8) tan∠^H(y⁰,W1)≤

k−1

Y

j=1

µ2−µm+1−j

µ1−µ_m+1−j tan∠^H(y,W1).

Here∠Hdenotes an angle with respect to the inner product induced byH.

1.2. Relation to the analyses by Saad and Knyazev. This paper aims at improving the Ritz value estimate (1.7) for the restarted Krylov subspace iteration (1.3). We extend the ellipsoid analysis from [19] in order to generalize the two-dimensional ellipse analysis from [16]. The new estimate depends on an interpolating polynomial generated bykeigenvalues; see Lemma3.2. In comparison to the Chebyshev polynomials used in the classical theory on Krylov subspaces, this polynomial is not derived from an optimization with respect to the representation of a Krylov subspace by polynomials, but it is derived from geometry-based arguments. Therefore, our analysis can hardly be compared to the Chebyshev-type analysis by Saad [22, eq. (6.45), (6.53)].

However, a proof technique suggested by Knyazev for deriving (1.5) (cf. [8, eq. (1.9), (1.18)]) is used in the proof of Theorem3.6, but the argument is applied to an interpolating polynomial instead of a Chebyshev polynomial. Among theksupporting eigenvalues, the smallest and the largest eigenvalue are given explicitly, and the monotonicity of our polynomial is described by the interpolation conditions. Thus, the speed of convergence depends on the distance between two neighboring eigenvaluesµiandµi+1. The improved quality of the new estimate in comparison to the Chebyshev-type estimates (1.6) and (1.7) is demonstrated in Figure3.2for a model problem and in Tables4.2and4.3for a high-dimensional PDE eigenproblem.

1.3. Overview of the paper. The remaining part of the paper is structured as follows. The drawback of the dimension reduction technique from [16] applied to Krylov subspaces of dimension k > 2is discussed in Section2. Then a short review is given of the analytic arguments from [19] which are required here to extend the ellipsoid analysis. The main convergence analysis in

(4)

Section3provides first an implicit sharp Ritz value estimate in terms of the settings of Theorem1.1.

Monotonicity arguments show that the slowest convergence is attained in a(k+ 1)-dimensional invariant subspace. This allows us to derive explicit convergence estimates. Especially for k= 3an explicit sharp estimate is compared with the Chebyshev-type estimate (1.7) for some numerical examples. Finally, in Section4the iteration (1.3) is numerically tested by an adaptive multigrid eigensolver for elliptic operator eigenvalue problems. The improvement of the estimates is illustrated by numerical tests.

2. Auxiliary arguments on Krylov subspaces. In the casek = 2the estimates (1.6) and (1.7) are sharp and coincide with the estimates for the steepest descent/ascent iteration as presented in [16]. A natural way to gain a sharp estimate also fork >2is to use the same framework and to generalize the arguments to higher-dimensional subspaces. In [16] the analysis starts by deriving necessary conditions on the slowest convergence by differentiating smooth curves on a level set of the Rayleigh quotient. The generalization of this analysis to restarted Krylov subspace iterations shows that the slowest convergence is attained in invariant subspaces which are associated with 2k−1distinct eigenvalues. In contrast to this, the sharp Ritz vector estimate (1.8) in Theorem1.1 depends on onlyk+1eigenvalues and suggests to use a(k+1)-dimensional auxiliary subspace.

For this reason we prefer the analysis framework which has been used in [19] for proving the estimate (1.8).

We begin with the setting of Theorem1.1. IfKisH-invariant, thenµ(y⁰)is an eigenvalue of Hand fulfillsµ(y⁰)≥µ(y)> µi+1. This meansµ(y⁰)≥µiso that the left-hand side of (1.7) is non-positive; the estimate holds trivially. IfKis notH-invariant, then we first consider the case µ(y)∈(µ2, µ1)since the auxiliary arguments for (1.8) are related to the extremal eigenvalues. Let y=Pm

j=1wjbe the eigenspace expansion ofy, i.e.,wjis the orthogonal projection ofyonto the eigenspace associated withµ_jfor eachjwith respect to the Euclidean inner product and also with respect to the inner product induced byH. Herew₁is nonzero since otherwiseµ(y)≤µ₂. Then the auxiliary subspace

(2.1) U := span{w1,K}= span{w1, y, Hy, . . . , H^k−1y}

can be used for a mini-dimensional analysis. Next we collect some basic properties of U. LEMMA2.1.In the setting of Theorem1.1and withµ(y)∈(µ₂, µ₁), letUbe an orthonormal matrix whose column space equalsUdefined by(2.1). We further define theU-representations

Hb :=U^THU, yb:=U^Ty.

IfKis notH-invariant, then the following assertions hold.

(a) Left multiplication ofKwithU^T results in the Krylov subspace Kb := span{y,b Hby, . . . ,b Hb^k−1by}.

The pair(θ, v)is a Ritz pair ofH inKif and only if(θ,bv)withbv:=U^Tvis a Ritz pair ofHb inK.b

(b) Uhas the dimensionk+ 1. Thus, there arek+1Ritz values ofHin U. These are denoted byα1≥ · · · ≥αk+1. Theαiare the eigenvalues ofHb and are strictly interlaced by the Ritz valuesθ1≥ · · · ≥θkofHinK, i.e.,

(2.2) α1> θ1> α2>· · ·> αk> θk> αk+1.

Further, the eigenspace ofHb associated with the eigenvalueα1=µ1is the column space ofU^Tw1.

(c) The eigenvaluesα1, . . . , αk+1 ofHb are simple due to (2.2). Letbu1, . . . ,ubk+1be the associated orthonormal eigenvectors andµ(·)b be the Rayleigh quotient with respect toH.b Then the affine space

Ub:=ub₁+ span{ub₂, . . . ,bu_k+1}

(5)

contains a vector

ye:=ub1+

k+1

X

j=2

βjbuj

for whichUeyis collinear withy. TheH-invariance ofKguarantees that all coefficients β_jare nonzero. There exists a further vectorey⁰inUbfor which(θ₁, Uye⁰)is a Ritz pair of HinK. The level set{ub∈Ub; µ(b bu) =θ1}forms an ellipsoid. Namely, the coefficients βbjin the representationub=bu1+Pk+1

j=2βbjbujfulfill the ellipsoid equation

(2.3)

k+1

X

j=2

βb_j² α1−θ1

θ1−αj

= 1.

The proof of this lemma is given in [19, Lemma 3.3 and Lemma 3.4]. In particular, the ellipsoid defined by (2.3) can be derived by the definition of the Rayleigh quotientµ(·)b since

θ₁=µ(b bu) = bu^THbub

bu^Tbu =α1+Pk+1 j=2αjβb_j² 1 +Pk+1

j=2βb_j² ⇒

k+1

X

j=2

(θ₁−α_j)bβ_j²=α₁−θ₁.

The quotients(α1−θ1)/(θ1−αj)are positive due to (2.2) and are equal to the squares of the semi-axes of the ellipsoid; see Figure2.1for an illustration.

a=

qα₁−θ1

θ1−α2

b=q

α1−θ1

θ₁−α3

c=q

α₁−θ1

θ1−α4

FIG. 2.1.The ellipsoid defined by(2.3)in the casek= 3.

3. Sharp Ritz value estimates. Lemma2.1is the basis for the following advanced ellipsoid analysis, which is inspired by [19]. We use the properties of the auxiliary subspace U in order to derive a sharp estimate for restarted Krylov subspace iterations. First, Theorem3.1provides an estimate in terms of the Ritz values ofH in U. In Section3.2, Lemma3.2contains an improved estimate using eigenvalues ofH and monotonicity arguments. This result shows that the slowest convergence is attained in a(k+1)-dimensional invariant subspace. Additionally, similar estimates concerning arbitrary eigenvalue intervals, namelyµ(y)∈ (µi+1, µi), fori = 1, . . . , m−1, are formulated in Lemma3.3. In Section3.3we improve the representation of the sharp estimate especially for three-dimensional Krylov subspaces. Moreover, the drawback of the Chebyshev- type estimate (1.7) is discussed and illustrated by a numerical comparison. In Section3.4the main results are summarized and restated in explicit form for the generalized eigenvalue problem Ax=λM x.

(6)

3.1. An auxiliary estimate. We start with the nontrivial case that the Krylov subspaceK is notH-invariant. The analysis extends the geometry-based one in [19] and generalizes the derivation of a sharp bound for the steepest descent method in [16].

THEOREM3.1.Letµ1> µ2>· · ·> µmbe the distinct eigenvalues of the symmetric and positive definite matrixH ∈R^n×n. The corresponding Rayleigh quotient isµ(y) = (y^THy)/(y^Ty).

Consider for y ∈ Rⁿ\{0} the Krylov subspaceK := K^k_H(y) = span{y, Hy, . . . , H^k−1y}, k > 2, and the H-projection w1 of y to the eigenspace associated with µ1. If K is not H- invariant andµ(y)∈(µ₂, µ₁), then the subspace U := span{w1,K}is(k+1)-dimensional. Let α₁≥ · · · ≥α_k+1be the Ritz values ofH in U. Then the largest Ritz valueθ₁ofHinKsatisfies (3.1)

α1−θ1

θ1−α2

α1−µ(y) µ(y)−α2

−1

≤[p(α1) ]⁻². Herep(·)is a polynomial of degreek−1that interpolates thekpairs αj,(−1)^j

forj= 2, . . . , k+

1. Equality is attained in the limit caseµ(y)→α1.

Proof.Lemma2.1provesdimU =k+1. The proof of (3.1) is given in five steps.

(i) We represent y and a Ritz vector associated with θ1 by using the auxiliary vectors ey,ey⁰∈Ubas introduced in Lemma2.1. Then theUb-representation of the Krylov subspace Kis shown to be a hyperplane inR^k.

(ii) A representation of the hyperplane from (i) is derived depending on the Ritz values α₁, . . . , α_k+1and the coefficientsβ₂, . . . , β_k+1ofeyfrom Lemma2.1.

(iii) We representye⁰by using the geometric property of the ellipsoid (2.3).

(iv) An intermediate bound for the left-hand side of (3.1) is proved concerning the Ub- coefficients ofyeandye⁰.

(v) An upper bound is proved for the intermediate bound by (iv). This bound is shown to be equal to[p(α1) ]⁻²with the interpolating polynomialp(·).

We now go into detail.

(i) According to Lemma2.1, we use the Rayleigh quotientµ(·)b with respect to the matrix Hb :=U^THU. It holds that

bµ(ey) =µ(Ub ^Ty) = (U^Ty)^T(U^THU)(U^Ty)

(U^Ty)^T(U^Ty) =y^T(U U^T)H(U U^T)y y^T(U U^T)y .

Sincey, Hy ∈ U andU U^T is the projection matrix onto U, it holds that(U U^T)y = y and (U U^T)H(U U^T)y = Hy so thatµ(b y) = (ye ^THy)/(y^Ty) = µ(y). Further, (a) in Lemma2.1 shows that(θ₁,ye⁰)is a Ritz pair ofHb inKbsince(θ₁, Uey⁰)is a Ritz pair ofH inK, and since ey⁰=U^T(Uye⁰). Moreover,θ1=µ(b ey⁰)is the largest Ritz value ofHb inK; i.e., the maximum ofb µ(·)b inK. Thus, the analysis can be restricted to the Krylov subspaceb K.b

Due toey,ey⁰∈Ub, we define for eachub∈Ubthe coefficient vector Pub:= (bβ₂, . . . ,βb_k+1)^T ∈R^k with respect to the expansion bu = bu1 +Pk+1

j=2βbjbuj. For each subspace V ⊆b Ub we define PVb := {Pub; bu∈ V}. Then we consider the level setb S := {bu ∈ Ub ; µ(b u) =b θ₁} and the intersectionU ∩b K. Theirb U-representationsb PSandP(U ∩b K)b can be interpreted geometrically.

The set PS is the ellipsoid given by (2.3). The convexity of the ellipsoid and the fact thatey⁰ maximizes the Rayleigh quotientµ(·)b inU ∩b Kbshow thatP(U ∩b K)b is a tangential hyperplane of PS; see Figure3.1. The point of tangency isPey⁰.

(ii) The hyperplaneP(U ∩b K)b can be represented byPub+Wwith an arbitraryub∈U ∩b Kb and a(k−1)-dimensional subspaceW ⊂R^k. SuitableubandWcan be constructed by using the intersection pointsviof U ∩b Kband the axesub1+ span{ubi}with

v_i:= (U ∩b K)b ∩(ub₁+ span{ub_i}), i= 2, . . . , k+ 1.

(7)

FIG. 3.1. Geometry in the affine spaceUbin the casek= 3. Left: ellipsoidPSand its tangential hyperplane P(U ∩b K). Right: representation of the tangential hyperplane by its intersection points with the coordinate axes.b

Due toub1+ span{ubi} ⊂Ub, it holds thatvi=K ∩b (ub1+ span{ubi}). Sinceeyis collinear withby (asUyeandy=Uybare collinear) andvi∈Kb= span{by,Hbby, . . . ,Hb^k−1y}, the vectorsb vican be represented byvi=qi(Hb)yefor certain polynomialsqi(·)whose degrees are less than or equal to k−1. Next, we determine the explicit form ofqi(·). With the expansionye=bu1+Pk+1

j=2βjbuj

from Lemma2.1, we get

v_i =q_i(H)b ye=q_i(Hb)



bu₁+

k+1

X

j=2

β_jub_j



=q_i(α₁)ub₁+

k+1

X

j=2

β_jq_i(α_j)ub_j.

Together withv_i∈bu₁+ span{bu_i}andβ_j 6= 0, we conclude that

qi(α1) = 1 and qi(αj) = 0 for j = 2, . . . , k+ 1, j6=i.

This interpolation problem in the k pairwise different nodesα1, . . . , α_i−1, αi+1, . . . , αk+1 is uniquely solved by the Lagrange polynomial

q_i(α) =

k+1

Y

j=2, j6=i

α−αj

α₁−α_j. This yields

vi=bu1+βiqi(αi)ubi=bu1+βi k+1

Y

j=2, j6=i

αi−αj

α₁−α_j ubi. The correspondingU-representations areb

(3.2) P v_i :=β_iκ_ie_i−1 with κ_i:=

k+1

Y

j=2, j6=i

α_i−α_j α1−αj

, i= 2, . . . , k+ 1,

wheree₁, . . . , e_k ∈R^kare the standard basis vectors. By using thesekintersection points, we obtain a representationPbu+WofP(U ∩b K)b withW= span{P v₃−P v₂, . . . , P v_k+1−P v₂} andub=v2; see Figure3.1. HereP vi=βiκie_i−1andβiκi6= 0prove thatdimW=k−1.

(iii) The representationPu+Wb from (ii) allows us to describeey⁰componentwise: we interpret the ellipsoidPSas the unit sphere inR^kwith respect to the normk · kDinduced by the diagonal matrixD= diag(δ2, . . . , δk+1)∈R^k×kwith

(3.3) δj =

α1−θ1

θ₁−α_j −1

, j= 2, . . . , k+ 1.

(8)

SincePey⁰is the point of tangency associated withPSand the tangential hyperplanePub+W, the vectorPey⁰is orthogonal to the subspaceWwith respect to the inner product(·,·)Dinduced by D. Because ofdimW=k−1, theD-orthogonal complement ofWinR^k is one-dimensional so thatPye⁰is collinear with anyv∈R^k\{0}satisfyingv⊥DW. Such a vectorvcan easily be determined by using the basis vectorsP vi−P v2ofW. We set, e.g.,v= (γ1, . . . , γk)^T and solve the system of equations

(v, P vi−P v2)D= 0, i= 2, . . . , k+ 1.

SinceD= diag(δ₂, . . . , δ_k+1)andP v_i=β_iκ_ie_i−1, these equations have the detailed form γi−1δi(βiκi)−γ1δ2(β2κ2) = 0, i= 2, . . . , k+ 1.

A particular solution of this linear system with degenerate rank has, forj = 1, . . . , k, the compo- nentsγj= (δj+1βj+1κj+1)⁻¹so thatPye⁰is collinear to

(3.4) v=

1

δ₂β₂κ₂, . . . , 1 δ_k+1β_k+1κ_k+1

^T .

(iv) In order to apply the detailed representation (3.4), we derive for the left-hand side of (3.1) an intermediate bound that is related toPyeandPye⁰. First, (b) in Lemma2.1and the Courant- Fischer principles show thatµ₁=α₁> µ₂≥α₂. Thenµ(y)∈(µ₂, µ₁)andµ(b ey) =µ(y)yield µ(b ey)∈(α₂, α₁). Next, the expansionye=bu₁+Pk+1

j=2β_jbu_j gives the equation

bµ(y) =e α₁+Pk+1 j=2α_jβ_j² 1 +Pk+1

j=2β_j² so thatα1−µ(b y) =e Pk+1

j=2β_j² µ(b y)e −αj and (3.5) α1−µ(y)

µ(y)−α₂ = α1−µ(b ey) bµ(y)e −α₂ =

k+1

X

j=2

βj2

µ(b y)e −αj

µ(b y)e −α₂

≥

k+1

X

j=2

βj2θ1−αj

θ₁−α₂

by usingθ₁≥µ(b y)e and the monotonicity of the functionf(a) = (α−α_j)/(α−α₂). Thus, α1−θ1

θ₁−α₂

α1−µ(y) µ(y)−α₂

−1

≤

α1−θ1

θ₁−α₂





k+1

X

j=2

βj2θ1−αj

θ₁−α₂





−1

=





k+1

X

j=2

βj2

θ1−αj

α1−θ1





−1 (3.3)

=





k+1

X

j=2

δjβj2





−1

=kPyke ⁻²_D .

However, the intermediate boundkPeyk⁻²_D depends on the semi-axesδ2, . . . , δk+1, whereas the bound in (3.1) only depends on the Ritz values ofH in U. Hence, a further reformulation is necessary. SincePy, Pe ey⁰ ∈ P(U ∩b K) =b Pbu+W, it holds thatPye−Pye⁰ ∈ W. Then the orthogonalityPey⁰⊥DWimplies that

(Pye−Pey⁰, Pye⁰)D= 0 ⇒ (Py, Pe ye⁰)D=kPye⁰k²_D= 1.

Here the last equality holds sincePey⁰belongs to the unit sphere with respect tok · k_D; see (iii).

Furthermore, we get

cos∠^D(Py, Pe ye⁰) = (Pey, Pye⁰)D

kPyke DkPey⁰kD

= 1

kPyke D

.

(9)

Therefore, the intermediate boundkPeyk⁻²_D coincides withcos²∠^D(Pey, Pey⁰)and, forv from (3.4), withcos²∠^D(Py, v)e sincePye⁰is collinear withv.

(v) Finally,cos²∠D(Py, v)e needs to be compared with a further bound which only depends onα₁, . . . , α_k+1. We begin with the representation

cos²∠^D(Py, v) =e (Py, v)e ²_D kPyke ²_Dkvk²_D

(3.4)

=

Pk+1 j=2κ⁻¹_j ² Pk+1

j=2δ_jβ_j² Pk+1

j=2δ_j⁻¹β_j⁻²κ⁻²_j . The denominator can be bounded from below by using the Cauchy-Schwarz inequality,





k+1

X

j=2

δjβ_j²





1/2



k+1

X

j=2

δ_j⁻¹β_j⁻²κ⁻²_j





1/2

≥

k+1

X

j=2

δ_j^1/2βj

δ^−1/2_j β_j⁻¹|κj|⁻¹

=

k+1

X

j=2

|κj|⁻¹.

This yields

cos²∠D(Pey, v)≤

Pk+1 j=2κ⁻¹_j ² Pk+1

j=2|κj|⁻¹².

According to the definition ofκ2, . . . , κk+1in (3.2), we consider the polynomials l_j(α) =

k+1

Y

i=2, i6=j

α−α_i αj−αi

, j= 2, . . . , k+ 1,

which are the Lagrange basis polynomials with respect toα2, . . . , αk+1. Then the sumPk+1 j=2lj(·) is a constant function equal to1so thatPk+1

j=2κ⁻¹_j =Pk+1

j=2lj(α1) = 1. Furthermore, the relation α1< α2<· · ·< αk+1shows that|κj|⁻¹= (−1)^jκ⁻¹_j . Thus,

k+1

X

j=2

|κj|⁻¹=

k+1

X

j=2

(−1)^jκ⁻¹_j =

k+1

X

j=2

(−1)^jlj(α1) =p(α1)

with the interpolating polynomial p(·) for the pairs α_j,(−1)^j

, j = 2, . . . , k+ 1, so that cos²∠D(Py, v)e ≤[p(α₁) ]⁻². Combining this with (iv) results in the estimate (3.1).

In the limit caseµ(y)→α1, which means thatµ(b y)e →α1andθ1 →α1, the inequality in (3.5) for the intermediate bound in (iv) turns into an equality. Additionally, the Cauchy-Schwarz inequality applied in (v) turns into an equality by setting suitable coefficientsβjsuch that the two corresponding vectors are collinear. Thus, equality in (3.1) can be attained.

3.2. An improved estimate in terms of the eigenvalues ofH. In order to improve the usefulness and applicability of the estimate (3.1), we use the Courant-Fischer principles to bound the Ritz valuesα1, . . . , αk+1in terms of the eigenvalues ofH.

LEMMA 3.2. Letµ1 > µ2 >· · · > µmbe the distinct eigenvalues of the symmetric and positive definite matrixH ∈ R^n×n, and letµ(·) be the Rayleigh quotient with respect toH.

Consider a Ritz vectory⁰ associated with the largest Ritz value ofH in the Krylov subspace K:=K^k_H(y) = span{y, Hy, . . . , H^k−1y}withy∈Rⁿ\{0}andk >2. Ifµ(y)∈(µ2, µ1), then it holds that

(3.6) µ1−µ(y⁰)

µ(y⁰)−µ2

≤

minJ pJ(µ1)−2 µ1−µ(y) µ(y)−µ2

,

(10)

whereJruns through all (k−2)-element subsets of{3, . . . , m−1}. Further,pJ(·)is a polynomial of degreek−1through thekpoints µ2,1

, µm,(−1)^k+1

, and µ_σ(j),(−1)^j

,j = 3, . . . , k, with the indicesσ(j)∈J in increasing order.

Proof. IfKisH-invariant, then one can easily show thatµ(y⁰) = µ₁. Thus, (3.6) holds trivially. IfKis notH-invariant, then Theorem3.1provides the auxiliary estimate (3.1). Because ofµ1=α1> µ2≥α2andµ(y⁰) =θ1, we have

µ1−µ(y⁰) µ(y⁰)−µ₂

µ1−µ(y) µ(y)−µ₂

⁻¹

≤

α1−θ1

θ₁−α₂

α1−µ(y) µ(y)−α₂

⁻¹

≤[p(α1) ]⁻²= [p(µ1) ]⁻². The interpolation conditions p(αj) = (−1)^j, j = 2, . . . , k + 1, together with the relation µ1 = α1 > α2 > · · · > αk+1 imply thatp(·)has all its roots in (αk+1, α2). Then p(µ1) has the same sign asp(α2)so thatp(µ1)>0. Thus, we can maximize[p(µ1) ]⁻²by minimizing p(µ1). To prove (3.6) we consider the functionf(α2, αk+1) :=p(µ1)and compare the values f(α2, αk+1),f(µ2, αk+1), andf(µ2, µm)on condition thatµ1> µ2≥α2>· · ·> αk+1≥µm.

In order to showf(α2, αk+1)≥f(µ2, αk+1), we use the Newton form of the interpolating polynomialp(·)derived from the table

αk+1 (−1)^k+1

α_k (−1)^k δ_k,k+1⁽¹⁾ ... ... ... . .. α3 −1 δ_3,4⁽¹⁾ . .. . ..

α₂ 1 δ_2,3⁽¹⁾ δ_2,4⁽²⁾ · · · δ_2,k+1^(k−1) with the divided differences

(3.7) δ_i,i+1⁽¹⁾ := 2 (−1)ⁱ

α_i−α_i+1, δ_i,i+j+1^(j+1) := δ^(j)_i,i+j−δ^(j)_i+1,i+j+1 α_i−α_i+j+1 .

The firstk−1rows define a polynomialq(·)whose coefficients do not depend onα₂. Furthermore, the polynomialp(·)has the formp(α) =q(α) +δ_2,k+1^(k−1)Qk+1

i=3(α−αi)so that f(α₂, α_k+1) =p(µ₁) =q(µ₁) +δ_2,k+1^(k−1)

k+1

Y

i=3

(µ₁−α_i).

To representf(µ2, αk+1)in a similar way, we substituteα2withµ2in the Newton table. Then only the last row is changed. The new quotients are

eδ_2,3⁽¹⁾:= 2 µ2−α3

, δe_2,j+3^(j+1):=δe^(j)_2,j+2−δ^(j)_3,j+3 µ2−αj+3

so that

f(µ₂, α_k+1) =q(µ₁) +eδ^(k−1)_2,k+1

k+1

Y

i=3

(µ₁−α_i).

Because ofµ1> α2>· · ·> αk+1, the productQk+1

i=3(µ1−αi)is positive. Thus, we can verify f(α₂, α_k+1)≥f(µ₂, α_k+1)by means of the inequalityδ_2,k+1^(k−1)≥eδ_2,k+1^(k−1). This is shown below.

(11)

First, we get by induction thatδ_i,i+j^(j) >0for eveniandδ_i,i+j^(j) <0for oddi. In particular, it holds thatδ^(j)_3,j+3<0. Next, a successive comparison of the two versions of the last row leads to

µ2≥α2> α3 ⇒ δ⁽¹⁾_2,3 = 2 α2−α3

≥ 2

µ2−α3

=δe_2,3⁽¹⁾>0, µ2≥α2> αj+3

δ^(j)_2,j+2≥δe_2,j+2^(j) >0 δ_3,j+3^(j) <0











⇒ δ^(j+1)_2,j+3= δ^(j)_2,j₊₂−δ_3,j+3^(j) α2−αj+3

≥ eδ_2,j^(j)₊₂−δ_3,j+3^(j) µ2−αj+3

=eδ^(j+1)_2,j+3>0.

This proves thatδ^(k−1)_2,k+1≥eδ^(k−1)_2,k+1inductively.

In order to showf(µ2, αk+1)≥f(µ2, µm), we modify the Newton table in the form

µ₂ 1

α3 −1 δ_2,3⁽¹⁾

... ... ... . ..

αk (−1)^k δ_k−1,k⁽¹⁾ . .. . ..

αk+1 (−1)^k+1 δ_k,k+1⁽¹⁾ δ⁽²⁾_k−1,k+1 · · · δ_2,k+1^(k−1) .

Here we use again (3.7) to form the divided differences and substituteα2byµ2. As above, we compare the last row with its variant corresponding toµm. The variant has the new quotients

bδ⁽¹⁾_k,k+1:= 2 (−1)^k αk−µm

, bδ^(j+1)_k−j,k+1:=δ_k−j,k^(j) −bδ^(j)_k−j+1,k+1 α_k−j−µm

.

Thenf(µ2, αk+1)≥f(µ2, µm)is verified by showingδ^(k−1)_2,k+1≥bδ^(k−1)_2,k+1. Here it holds again that δ_i,i+j^(j) >0for eveniandδ_i,i+j^(j) <0for oddi. In the case of evenk, we have

α_k > α_k+1≥µ_m ⇒ δ⁽¹⁾_k,k+1= 2 αk−αk+1

≥ 2

αk−µm

=bδ⁽¹⁾_k,k+1>0,

δ^(j+1)_k−j,k+1=δ^(j)_k−j,k−δ_k−j+1,k+1^(j) α_k−j−αk+1

≤ δ_k−j,k^(j) −δb^(j)_k−j+1,k+1 α_k−j−µm

=δb^(j+1)_k−j,k+1<0for oddj, and

δ^(j+1)_k−j,k+1=δ^(j)_k−j,k−δ_k−j+1,k+1^(j)

α_k−j−α_k+1 ≥ δ_k−j,k^(j) −δb^(j)_k−j+1,k+1

α_k−j−µ_m =δb^(j+1)_k−j,k+1>0for evenj so thatδ^(k−1)_2,k+1≥bδ^(k−1)_2,k+1. For oddka similar induction yields the same result.

In summary, we have shown thatf(α2, αk+1) ≥ f(µ2, αk+1) ≥ f(µ2, µm). In order to improve the lower boundf(µ2, µm), we can consider it as a function ofα3, . . . , αkand determine its discrete minimum in the set{µ3, . . . , µ_m−1}^k−2⊂R^k−2under the constraintα3>· · ·> αk. This results in the interpolating polynomialpJ(·)in the assertion.

An optimal setJin minJ pJ(µ1)−2

can be determined by solving a discrete minimization problem. Then the analytical approach of Theorem3.1is applicable by considering an invariant subspace corresponding toJ instead of the subspace U. The resulting bound is sharp. This additionally shows that the slowest convergence can be attained in a(k+1)-dimensional invariant subspace. For the caseµ(y)∈(µ_i+1, µ_i), a similar analysis can be formulated with some additional assumptions onyand the Ritz valueα₂.

LEMMA 3.3. Letµ₁ > µ₂ >· · · > µ_mbe the distinct eigenvalues of the symmetric and positive definite matrixH ∈ R^n×n, and letµ(·) be the Rayleigh quotient with respect toH.

(12)

Consider the Krylov subspaceK:=K_H^k(y) = span{y, Hy, . . . , H^k−1y}withy∈Rⁿ\{0}and k >2, where theH-projectionwiofyonto the eigenspace associated withµiis nonzero. IfKis not H-invariant andµ(y)∈(µi+1, µi), then the subspaceU = span{wi,K}is(k+1)-dimensional and hask+1Ritz values ofH which are denoted byα1 ≥ · · · ≥αk+1. Lety⁰ be a Ritz vector associated with the largest Ritz valueθ1ofHinK. Ifµi+1 ≥α2, then the estimate(3.1)holds.

Furthermore,µ(y⁰)≥µiholds trivially fori > m−k. In the casei≤m−kit holds that µi−µ(y⁰)

µ(y⁰)−µ_i+1 ≤ min

J p_J(µ_i)−2 µi−µ(y) µ(y)−µ_i+1,

whereJ runs through all (k−2)-element subsets of{i+2, . . . , m−1}. Further,pJ(·)is a polynomial of degreek−1that interpolates the pairs µi+1,1

, µm,(−1)^k+1

, and µ_σ(j),(−1)^j , j= 3, . . . , k, with the indicesσ(j)∈Jin increasing order.

Proof. With the additional assumptionswi 6= 0and µi+1 ≥ α2, minor modifications of Lemma2.1and Theorem3.1result in (3.1). The new estimate can be proved by modifying the proof of Lemma3.2and usingµi=α1> µi+1≥α2.

3.3. Explicit andJ-minimization-free estimates. An obvious drawback of the estimates of Lemma3.2and Lemma3.3is that the bounds are non-explicit, i.e., they depend on a minimization with respect to the index setJ. We prefer a description that explicitly depends on the relevant eigenvalues. Such explicit estimates can be proved at least for three-dimensional Krylov subspaces.

In order to avoid the additional assumptions onyand the Ritz valueα2in Lemma3.3, we start with the caseµ(y)∈(µ2, µ1). First we adapt Lemma3.2.

LEMMA3.4.In the setting of Lemma3.2fork= 3, it holds that µ1−µ(y⁰)

µ(y⁰)−µ₂ ≤[q(µ1) ]⁻² µ1−µ(y) µ(y)−µ₂. Hereq(·)is a quadratic polynomial which interpolates the pairs µ₂,1

, µ_ξ,−1

, and µ_m,1 , andµξ is an eigenvalue which has the smallest distance to(µ2+µm)/2among the eigenvalues µ3, . . . , µ_m−1, where we select the larger one asµξ if there are two such eigenvalues nearest to (µ2+µm)/2.

Proof.We start with the estimate (3.6) from Lemma3.2. In the casek= 3, the polynomial p_J(·)in the convergence factor min_J p_J(µ₁)−2

is a quadratic polynomial interpolating the pairs µ2,1

, µm,1

, and µξ,−1

withµξ ∈ {µ3, . . . , µ_m−1}. In order to determine the optimal pJ(·), we define

f(µ_ξ) :=p_J(µ₁) = 2µ²₁−2(µ₂+µ_m)µ₁+ (µ₂µ_ξ+µ_ξµ_m+µ₂µ_m−µ²_ξ) (µ2−µξ)(µξ−µm) . The derivative

f⁰(µ_ξ) =−2(µ₁−µ₂)(µ₁−µ_m)(µ₂−2µ_ξ+µ_m) (µ2−µξ)²(µξ−µm)²

is a negative multiple of(µ2−2µξ+µm). Hence,f(·)is decreasing in(µm,µ)e and increasing in (µ, µe 2)witheµ:= (µ2+µm)/2. Moreover,f(·)is symmetric with respect toµewithin the interval (µm, µ2), i.e.,f(eµ−t) = f(µe+t) ∀t ∈ [0, µ2−eµ). The minimizer off(·)with respect to {µ3, . . . , µ_m−1} ⊂ (µ_m, µ₂)is therefore given by an element nearest toµ. Consequently, thee optimalp_J(·)coincides with the polynomialq(·)from the assertion.

We remark that the valueq(µ₁)is a discrete minimum of the functionf(·), whereas the continuous minimum off(·)in the interval(µ_m, µ₂)isT₂(1+2γ₁)withγ₁= (µ₁−µ₂)/(µ₂−µ_m).

Thus,q(µ₁)≥T₂(1 + 2γ₁)which means that the bound[q(µ₁) ]⁻²is smaller than the Chebyshev bound[T₂(1 + 2γ₁) ]⁻². This comparison inspires us to extend Lemma3.4to general eigenvalue

(13)

intervals(µi+1, µi)by using the proof technique in [19] since the latter analysis has adapted the Chebyshev-type estimates to general eigenvalue intervals. For this sake we use Lemma 3.2 in [19], which is restated next.

LEMMA3.5.In the setting of Lemma3.2, letey=Pm

j=1wejbe the expansion ofye∈Rⁿ\{0}

in terms of its orthogonal projectionswejto the eigenspaces ofH for themdistinct eigenvaluesµj. Ifµ(y)e ∈[µi+1, µi], then the reweighted vectorez=Pm

j=1αjwejsatisfies (a) µ(z)e ≥µ(ey)if |αj| ≥1∀j≤i and |αj| ≤1∀j > i, (b) µ(z)e ≤µ(ey)if |αj| ≤1∀j≤i and |αj| ≥1∀j > i.

The combination of Lemma3.4and Lemma3.5yields the following sharp Ritz value estimate for three-dimensional Krylov subspaces.

THEOREM 3.6. Letµ₁ > µ₂ > · · · > µ_mbe the distinct eigenvalues of the symmetric and positive definite matrixH ∈ R^n×n, and letµ(·)be the Rayleigh quotient with respect to H. Lety⁰ be a Ritz vector associated with the largest Ritz value ofH in the Krylov subspace K := K³_H(y) = span{y, Hy, H²y}withy ∈ Rⁿ\{0}. Ifµ(y) ∈(µi+1, µi), thenµ(y⁰) ≥µi

holds trivially fori > m−3. Note that this case is not relevant for the intended high-dimensional Hwhose largest eigenvalue is to be computed. In the casei≤m−3it holds that

(3.8) µi−µ(y⁰)

µ(y⁰)−µi+1

≤[q(µi) ]⁻² µi−µ(y) µ(y)−µi+1

. Hereq(·)is a quadratic polynomial which interpolates the pairs µi+1,1

, µξ,−1

, and µm,1 , andµξis an eigenvalue which has the smallest distance to(µi+1+µm)/2among the eigenvalues µi+2, . . . , µm−1, where we select the larger one asµξif there are two such eigenvalues nearest to (µ_i+1+µ_m)/2. Equality in(3.8)can be attained in the limit case thatybelongs to the invariant

subspace associated with the eigenvaluesµ_i, µ_i+1, µ_ξ, andµ_mtogether withµ(y)→µ_i. Proof.The three interpolation conditions

q(µi+1) = 1, q(µξ) =−1, and q(µm) = 1 withµ1>· · ·> µmimply thatq(·)is strictly increasing in[µi+1,∞)so that

(3.9) min

j=1,...,i|q(µj)|=q(µi)> q(µi+1) = 1.

Moreover,q(·)is symmetric with respect toµe:= (µi+1+µm)/2in the interval(µm, µi+1). Since µξis an eigenvalue closest toµ, no eigenvalues are contained in the intervale (µe−δ,µe+δ)with δ:=|µe−µξ|. Combining this with the symmetry and the monotonicity ofq(·)shows that

(3.10) max

j=i+1,...,m|q(µj)|= 1.

The caseµ(y⁰)≥µ_iis trivial since then the left-hand side of (3.8) is non-positive. In the other case µ(y⁰)< µi, we begin with the obvious relationµi> µ(y⁰)≥µ(y)> µi+1and select an auxiliary vectorzsatisfyingµ(y⁰)≥µ(z)≥µ(y). The rest of the proof is similar to the analysis for the Chebyshev-type estimate in [19, Section 3.2]. For convenience, we present here a concise version of the proof. Based on the eigenspace expansiony=Pm

j=1wjofysuggested in Lemma3.5, we definez:=q(µ_i)y₁+y₂withy₁=Pi

j=1w_jandy₂=Pm

j=i+1w_j. Then (3.11)

µ(y1)−µ(z) µ(z)−µ(y2)

µ(y1)−µ(y) µ(y)−µ(y2)

⁻¹

= [q(µi) ]⁻².

Note thatµ(y1)andµ(y2)are the Ritz values ofH in the subspacespan{y1, y2}. The properties (3.9) and (3.10) allow us to apply Lemma3.5(a) toey=yandez=z, which yieldsµ(z)≥µ(y).

Further inequalities can be derived using the vectorq(H)y = Pm

j=1q(µ_j)w_j ∈ K. Applying Lemma3.5(a) toey = yandez =q(H)yimplies thatµ q(H)y

≥µ(y). Therefore, we have

(14)

µ q(H)y

∈[µ(y), µ(y⁰)]⊆[µi+1, µi]. Then Lemma3.5(b) applied toye=q(H)yandez=z shows thatµ(z)≤µ q(H)y

. In summary, we have µ(y1)≥µi> µ(y⁰)≥µ q(H)y

≥µ(z)≥µ(y)> µi+1≥µ(y2) so that

µi−µ(y⁰) µ(y⁰)−µ_i+1

µi−µ(y) µ(y)−µ_i+1

−1

≤

µ(y1)−µ(z) µ(z)−µ(y₂)

µ(y1)−µ(y) µ(y)−µ(y₂)

−1

. This result together with (3.11) completes the proof of the estimate (3.8). The sharpness of the bound can be proved by applying the analysis in Theorem3.1to the invariant subspace spanned by the orthogonal projectionswi,wi+1,wξ, andwmofy.

The convergence factor in (3.8) can easily be shown to fulfill[q(µi) ]⁻²≤[T2(1 + 2γi) ]⁻² with the gap ratioγi := (µi−µi+1)/(µi+1−µm). To illustrate this inequality we consider the simple exampleH = diag(10,9,8,7,6.5,3,2,1) and select98equidistant values in each of the intervals(7,8),(8,9), and(9,10). By using random vectors in the corresponding level sets of the Rayleigh quotientµ(·)and additionally by using equiangular vectors (with respect to spherical coordinates) from the invariant subspace associated withµ_i, µ_i+1, µ_ξ, andµ_m, we compute numerical upper bounds of _µ

i−µ(y⁰) µ(y⁰)−µi+1

µ_i−µ(y) µ(y)−µi+1

⁻¹

and compare them with the new estimated bound[q(µi) ]⁻²and the known Chebyshev bound[T2(1 + 2γi) ]⁻²based on [8]

and [19]; see Figure3.2.

0.05 0.15 0.25

µ

4

µ

3

µ

2

µ

1

µ(y)

C on v . fa ct or s

FIG. 3.2.Comparison of the numerical upper bounds (curve) with the estimates bounds[q(µi) ]⁻²(bold line) and [T2(1 + 2γi) ]⁻²(dashed line) for the ratios

µ_i−µ(y⁰) µ(y⁰)−µ_i+1

µ_i−µ(y) µ(y)−µ_i+1

−1

.

We remark that the Chebyshev bound[T2(1 + 2γi) ]⁻² is sharp if (µi+1 +µm)/2is an eigenvalue ofH. The following conclusion is more general.

COROLLARY3.7.The Chebyshev bound in the estimate(1.7)from Theorem1.1is sharp for k >2if all extreme points of the polynomial

p(α) :=T_k−1 1 + 2(α−µ_i+1)/(µ_i+1−µ_m) in the interval(µ_m, µ_i+1)belong to the spectrum ofH.

Proof.Since the polynomialp(·)is given by the transformation of the Chebyshev polynomial T_k−1(·)from[−1,1]to[µ_m, µ_i+1], there arek−2extreme points in (µ_m, µ_i+1). If these are eigenvalues ofH, then they can be denoted byµ_σ(3) >· · ·> µ_σ(k). By applying the analysis