• 検索結果がありません。

An Explicit Example of Leave-One-Out Cross-Validation Parameter Estimation for a Univariate Radial Basis Function

N/A
N/A
Protected

Academic year: 2022

シェア "An Explicit Example of Leave-One-Out Cross-Validation Parameter Estimation for a Univariate Radial Basis Function"

Copied!
8
0
0

読み込み中.... (全文を見る)

全文

(1)

An Explicit Example of Leave-One-Out Cross-Validation Parameter Estimation for a Univariate Radial Basis Function

L. Bosa·F. Polatob

Communicated by S. De Marchi

Abstract

We give an explicit example for the selection of the shape parameter for a certain univariate radial basis function (RBF) interpolation problem.

1 Introduction

Radial Basis Function interpolation (RBF) is an important method of (multivariate) interpolation of typically scattered data, which has been much used in applications. Thebasicform of RBF is as follows. Given abasisfunctiong:R+→R, the associated RBF interpolant of a data set{(xj,yj)} ⊂Rd+1withn“sites”xj∈Rdand function values yj∈R, is the function of the form s(x) =

n

X

j=1

aig(|xxj|)such thats(xi) =yi, 1≤in(if it exists). Typically the basis functionghas a so-calledshapeparameter, the value of which has an important effect on both the quality of the resulting interpolant as well as the numerical conditioning of associated interpolation linear system to be solved. For example, for the Gaussian basis function

gλ(x):=exp(−λkxk22),λ >0

smallλ≈0 gives a basis function nearly constant in a neighbourhood of the origin, while largeλ≈ ∞gives a basis function, for all intents, a delta function supported at the origin.

A discussion of practical methods for choosing the shape parameter may be found, for example, in[5, Chapt. 17], where it may be verified that the problem of selecting the shape parameter is indeed important and, in general, rather difficult. One may also consult the monographs[4,7]for more on the theory of RBF.

Given this typical difficulty of analyzing multivariate interpolation procedures, it is often useful to look more carefully at the univariate case for some suggestion as to how the general case might behave. The goal of this paper is to give an explicit univariate example of one of the most commonly used procedures for selecting an “optimal” shape parameter, the so-called Leave-One-Out Cross-Validation procedure, in the hope that it sheds some light on what happens more generally.

2 Leave-One-Out Cross-Validation (LOOCV)

Consider RBF interpolation with a basis functiongλ:R+→Rdependent on some shape parameterλ. For a collection of sites X={x1, . . . ,xn} ⊂Rdwe let

Xj:=X\{xj}, j=1, 2, . . . ,n i.e.,Xjis the set of sites withxjleft out. Then, let

sj(x) =X

k6=j

a(j)k gλ(|x−xk|) such that

sj(xk) =yk,k6=j.

In other words,sjis the RBF interpolant for the sitesXj. We may think of the valuesj(xj)as the predicted value of the data atXj for the left out sitexj, andej:=yjsj(xj), 1≤jn, measures its discrepancy with the value in the full dataset.

LOOCV selects the parameterλto minimize the 2-norm of the vector of discrepanciesej, i.e., to minimize E(λ):=

n

X

j=1

|ej|2.

(2)

Our example will be the LOOCV procedure applied to the functions

gλ(x):=aeλx+be−λx,λ,a,b∈R (1)

and sites

0=x1<x2<x3<. . .<xn=1. (2) RBF interpolation by such gλwas considered in[1]where it is shown (Theorem 1) that the determinant of the associated interpolation matrix is

det [gλ(|xixj|)]1≤i,j≤n

= (ba)n−2e

Pn j=1xj

‚n1 Y

j=1

(e2λxj+1e2λxj)

Œ

b2e2λx1a2e2λxn .

Naturally, we must restrict the values ofa,b∈Rso that this determinant is non-zero and hence the interpolation problem has a unique solution.

Given this restriction, it is then shown in[1, Theorem 5]that the cardinal functions,uk(i.e., those linear combinations of the gλ(| · −xi|)with the property thatuk(xi) =δik) are given by

uk(x) =eλ(xk−x)





e2λxexk−1

e2λxk−e2λxk−1 ifx∈[xk1,xk]

e2λx−e2λxk+1

exkexk+1 ifx∈[xk,xk+1]

0 otherwise

, 2≤kn−1,

u1(x) =eλ(x1x)

¨e2λxe2λx2

e2λx1e2λx2 ifx∈[x1,x2]

0 otherwise ,

un(x) =eλ(xnx)

¨e2λx−e2λxn−1

e2λxn−e2λxn−1 ifx∈[xn−1,xn]

0 otherwise

independently of the values ofaandb!

This allows us, for convenience’s sake, to takeb=−awitha=1/(2λ)for which our basis function (1) becomes gλ(x) =sinh(λx)

λ .

Note that

limλ→0gλ(x) =x

and the interpolation becomes one by piecewise linear functions. We will therefore letg0(x):=x.

Remark1. It is worth noting at this point that asλincreases the cardinal functions become more and more like delta functions.

Indeed, it is easy to verify that

λ→∞limuk(x) =

1 forx=xk 0 forx6=xk.

We will actually setx1=0 andxn=1 and select the parameterλaccording to the LOOCV principle of minimizing E(λ):=

n1

X

j=2

|ej|2 where

ej:=sj(xj)−yj, 2≤jn−1 andsjis the interpolant of the sitesXj, explicitly

0=x0<x1<. . .<xj−1<xj+1<. . .<xn=1. (3) To calculate the valuessj(xj)we will use the formulas for the cardinal functionsukgiven above. Indeed, letu(kj)be thekth cardinal function for the sites (3). Then

sj(x) =

n

X

k=0,k6=j

yku(j)k (x) and, in particular, due to the compact support of the cardinal functions,

sj(xj) =yj−1u(jj)1(xj) +yj+1u(j+1j)(xj). It is easily verified that then

sj(xj) =yj1e−λhj−1 e2λhj−1

2λh 2λh +yj+1eλhj 1−e2λhj−1

2λh 2λh

(3)

where, as usual, we have sethj:=xj+1xj, 1≤jn−1. It follows that ej:=yj

yj1e−λhj−1 e2λhj−1

e2λhje2λhj−1 +yj+1eλhj 1−e−2λhj−1 e2λhje2λhj−1

(4) and

E(λ) =

n−1

X

j=2

yj1e−λhj−1 e2λhj−1

e2λhje2λhj−1+yj+1eλhj 1−e2λhj−1 e2λhje2λhj−1yj

2

. (5)

We note that, since the interchange ofaandbin the definition (1) ofgλis equivalent to replacingλby−λ, the formulas foruk are invariant under this replacement,λby−λ. Consequently,E(−λ) =E(λ)andE(λ)is anevenfunction.

Remark2. In case the data comes fromy(x) =sinh(αx)for someα∈R, we note that then y(x) =αgα(x) =αgα(|xx1|), x∈[0, 1],

i.e.,yis in the span of the translatesg(| · −xj|)and so its interpolant is itself. ConsequentlyE(α) =0 andλ=αis the optimal parameter. Further, as theuk(x)donotdepend on the constantsa,bin (1), we also have, for example, that the optimalλ=αfor

y(x) =exp(αx).

Remark3. The optimalλneed not be unique. For example, if we taken=3 withy1=y3=0 thenE(λ) = (0−y2)2is constant inλ.

Remark4. An optimalλmay not exist. For example, if we taken=3 withy1=y3= +1 and y2=−1 then E(λ) =

e−λh1 e2λh2−1

e2λh2e2λh1+eλh2 1−e2λh1 e2λh2e2λh1 +1

2

. As the terms are all positiveE(λ)>1 while, as is easily verified, limλ→∞E(λ) =1 (cf. Remark1).

2.1 The case of a positive concave function

Theorem 2.1. Suppose that y(x)≥0is concave ( y00(x)≤0) for x∈[0, 1].Thenλ=0is an optimal LOOCV parameter, for any set of sites (2).

Proof.The discrepancy atxjis given byej, (4). Forλ=0 this becomes e(0)j :=yj

yj−1 hj

hj1+hj +yj+1 hj−1

hj1+hj

. (6)

We claim that for 2≤j≤(n−1),

eje(0)j (≥0 sincey(x)is concave).

To see this, first note that

eje(0)j

⇐⇒yj

yj1e−λhj−1 e2λhj−1

e2λhje2λhj−1 +yj+1eλhj 1−e2λhj−1 e2λhje2λhj−1

yj

yj1 hj

hj−1+hj+yj+1 hj1 hj−1+hj

⇐⇒yj1 hj

hj−1+hj +yj+1 hj1 hj−1+hj

yj1e−λhj−1 e2λhj−1

e2λhje2λhj−1+yj+1eλhj 1−e2λhj−1 e2λhje2λhj−1. Since, by assumption, yj±1≥0, for this it suffices to show that

hj

hj1+hje−λhj−1 e2λhj−1

e2λhje2λhj−1 (7)

and hj−1

hj1+hjeλhj 1−e−2λhj−1

e2λhje2λhj−1. (8)

To see (7). setx:=λhj−1and y:=λhj. Then

hj

hj1+hj = y x+y

(4)

while

e−λhj−1 e2λhj−1

e2λhje2λhj−1 = ex+2yex e2(x+y)−1

y

x+y (by Lemma2.2below)

= hj hj−1+hj. Similarly, for (8), setx:=λhjandy:=λhj−1so that

eλhj 1−e2λhj−1

e2λhje2λhj−1 = ex+2yex e2(x+y)−1

y

x+y (by Lemma2.2below)

= hj1 hj1+hj.

ƒ

Lemma 2.2. For all x,y>0,

e2y+xex e2(x+y)−1≤ y

x+y. Proof.This holds iff

e2y+xex

ye2(x+y)−1 x+y

⇐⇒ exe2y−1

ye2(x+y)−1 x+y

⇐⇒ e2y−1

yexe2(x+y)−1 x+y

⇐⇒ h(y)≤exh(x+y) forh(t):= (e2t−1)/t.

Hence consider, for a fixedy>0,

f(x):=e−xh(x+y).

We need to show thatf(x)≥h(y) =f(0),x≥0, i.e., that the minimum off on[0,∞)isf(0). To see this we calculate f0(x) =−exh(x+y) +exh0(x+y)

=ex{h0(x+y)−h(x+y)}.

But by Lemma2.3,h0(x+y)≥h(x+y)and sof0(x)≥0 andf is increasing on[0,∞).ƒ Lemma 2.3. Let

h(t):=e2t−1 t (with h(0):=limt0h(t) =2). Then, for t≥0,h0(t)≥h(t).

Proof.We calculate

h0(t) =2t e2te2t+1

t2 .

Henceh0(t)≥h(t)

⇐⇒ (2t−1)e2t+1

t2e2t−1 t

⇐⇒ (2t−1)e2t+1≥t(e2t−1)

⇐⇒ (t−1)e2t+1≥ −t

⇐⇒ (t−1)e2t+1+t≥0.

Now, ift≥1,(t−1)≥0 and this latter inequality is clearly true. On the other hand, if 0≤t<1, then(t−1)<0, and (t−1)e2t+1+t≥0

⇐⇒ 1+t≥(1−t)e2t

⇐⇒ 1+t 1−te2t, which is true by Lemma2.4.ƒ

(5)

Lemma 2.4. For0≤t<1,

e2t≤1+t 1−t. Proof.The Taylor series fore2tis

e2t=1+ X k=1

2k k!tk. while the Taylor series for(1+t)/(1−t)is

1+t 1−t = 2

1−t −1

=2 X

k=0

tk−1

=1+ X k=1

2tk.

Now it is easy to confirm by induction that 2k/k!≤2,k=1, 2, . . . . Hence, comparing Taylor series, we are done.ƒ 2.2 The case of equally spaced sites

Here we consider the sitesxj:= (j−1)h, 1jnforh:=1/(n−1). In this theejsimplify to ej=yj− 1

eλh+e−λh

yj1+yj+1 =yj−1

2sech(λh)

yj1+yj+1 and

E(λ) =

n−1

X

j=2

 yj−1

2sech(λh)

yj1+yj+1

‹2

.

Since, as noted previously,E(−λ) =E(λ)we minimize over the interval[0,∞). We easily calculate E0(λ) =hsech(λh)tanh(λh)

× Xn−1

j=2

 yj−1

2sech(λh)

yj1+yj+1

‹

yj1+yj+1

=hsech(λh)tanh(λh)

×

¨n−1 X

j=2

yj(yj−1+yj+1)−1

2sech(λh) Xn−1

j=2

(yj−1+yj+1)2

«

=hsech(λh)tanh(λhA−1

2sech(λh)B

‹

where we have set

A:=

n1

X

j=2

yj(yj−1+yj+1)andB:=

n1

X

j=2

(yj−1+yj+1)2.

First note thathsech(λh)tanh(λh) =0 iffλ=0 which is already an endpoint of our interval. The case ofB=0 is a bit special.

For then, forλ >0, sgn(E0(λ)) =sgn(A). Hence, then,λ=0 is the minimum ifA>0, there is no minimum ifA<0 andE(λ)is constant ifA=0.

Suppose then thatB6=0. Then, asB>0 and sech(t)≤1, ifAB/2 thenE0(λ)>0 forλ >0 andλ=0 is the unique optimal parameter.

In caseA≤0 thenE0(λ)<0 forλ >0 and again there is no minimum on[0,∞). Otherwise, in case 0<A<B/2 then there is a critical point given by

sech(λh) =2A/B,λ=1

hsech1(2A/B) which must necessarily be the optimumλ.

(6)

If we letn→ ∞we may observe that fory(x)∈C2[0, 1], 2A

B =2Pn1

j=2yj(yj1+yj+1) Pn1

j=2(yj1+yj+1)2

=2hPn−1

j=2yj(yj1+yj+1) hPn1

j=2(yj1+yj+1)2

=4R1

0(y(x))2d x+2h2R1

0 y(x)y00(x)d x+O(h3) 4R1

0(y(x))2+4h2R1

0 y(x)y00(x)d x+O(h3)

=1+12h2R1

0 y(x)y00(x)d x/R1

0(y(x))2d x+O(h3) 1+h2R1

0 y(x)y00(x)d x/R1

0(y(x))2d x+O(h3)

=1−h2 2

R1

0 y(x)y00(x)d x R1

0(y(x))2d x +O(h3). In the case that

Z1 0

y(x)y00(x)d x≥0

this expression is at most 1 (forhsufficiently large). Assuming this to be the case and using the fact that sech1(t) =log

1+p 1−t2 t

, we obtain that, for largenthe optimal parameter is then

λ= v u u t

R1

0 y(x)y00(x)d x R1

0(y(x))2d x +O(h).

3 Comparison with a Maximum Likelihood Estimate

In the context of Kriging, in which RBF interpolation is embedded in a statistical context, it has been suggested to use a Maximum Likelihood Estimate (MLE) for the optimal parameter. Here, the interpolation matrixR∈Rn×ngiven byR= [gλ(|xixj|)]1≤i,j≤n, is interpreted as a covariance matrix for a certain family ofnrandom variables. As suchRis assumed to be positive definite, and then the MLE parameter turns out be theλ(see e.g.[6]) for which

m(λ):=|det(R)|1/n|ytR−1y| (9) is aminimum.

In the case of our example, the interpolation matrix isnotpositive definite (it is however conditionally definite on a(n−1)- dimensional subspace) and hence the statistical interpretation of Kriging does not directly apply. However, one may nevertheless attempt to minimize the expression (9) and compare with the LOOCV parameter. Indeed doing so reveals an interesting relation between the two approaches. We concentrate on the case of equally spaced sites,xj= (j−1)h, 1jn,h:=1/(n−1).

First note that from Propositions 3.1 and 3.2 of[1]we have then that

R1= λ 2 sinh(hλ)×

sinhsinh((1−h)λ)(λ) 1 0 · · · 0 sinhsinh(hλ)(λ)

1 −2 cosh(hλ) 1 0 · · · 0

0 1 −2 cosh(hλ) 1 · · · 0

· · ·

· · ·

0 · · · 0 1 −2 cosh(hλ) 1

sinh(hλ)

sinh(λ) 0 · · · 0 1 −sinh((1sinh(λ)h)λ)

 .

(7)

LetM:=2 sinh(hλ)

λ R1be the above matrix. We caluclate ytM y=y1

§

−sinh((1−h)λ)

sinh(λ) y1+y2+sinh(hλ) sinh(λ) yn

ª

+

n−1

X

i=2

yi{yi1−2 cosh(hλ)yi+yi+1} +yn

§sinh(hλ)

sinh(λ) y1+yn1−sinh((1−h)λ) sinh(λ) yn

ª

=h y1

§y2y1

h +1

h



1−sinh((1−h)λ) sinh(λ)

‹

y1+sinh(hλ) hsinh(λ)yn

ª

+h2

n1

X

i=2

§

yiyi1−2yi+yi+1 h2 +2yi2

1−cosh(hλ) h2

ܻ

+h yn

§sinh(hλ)

hsinh(λ)y1+ yn1yn

h +1

h



1−sinh((1−h)λ) sinh(λ)

‹ yn

ª . Then taking the limit ash→0+, we see that

hlim0+

1

hytM y=y(0)

y0(0) +λcoth(λ)y(0) + λ

sinh(λ)y(1)‹ +

Z1 0

y(x)y00(x)d xλ2 Z1

0

y2(x)d x +y(1)

y0(1) +λcoth(λ)y(1) + λ

sinh(λ)y(0)‹ .

Now, notice thatm(λ)being a positive value will be minimized if for someλ,ytR1y=0, or, equivalently,ytM y=0. If we were to ignore the boundary terms involving the values ofy(x)andy0(x)atx=0, 1 this would happen (approximately) when

Z1 0

y(x)y00(x)d xλ2 Z1

0

y2(x)d x=0, i.e., for

λ= v u u t

R1

0 y(x)y00(x)d x R1

0(y(x))2d x , i.e., for essentially thesamevalue as the LOOCV parameter in this circumstance!

However, the boundary terms do alter this optimal value of the parameter. In Figure 1 below we give the plots of the resulting interpolants for the LOOCV parameter (λ=1.000658) and the MLE estimate computed numerically (λ=2.563091) andn=13.

The MLE interpolant is noticably worse. However, we emphasize that asRis not positive definite the MLE approach is technically not applicable. It is nontheless interesting, that apart from the boundary effects the two approaches provide the same parameter estimate, in this circumstance.

4 Conclusions

We have given a univariate example where it is possible to give explicit values for the optimal LOOCV parameter for RBF interpolation. We do not claim that this is a practical example – we give it only in the hope that it may provide some small insight into the difficult general problem of RBF parameter selection.

Acknowledgments. Work supported by the ex-60% funds of the University of Verona.

References

[1] L.. Bos and S. De Marchi. Univariate Radial Basis Functions with Compact Support Cardinal Functions East J. Approx., Vol. 14 (1) (2008), 69 – 80.

[2] L. Bos and U. Maier. On the asymptotics of points which maximize determinants of the formd et[g(|xixj|)]in “Advances in Multivariate Approximation”, Mathematical Research, (W. Haussmannet al.Eds.), Vol. 107, pp. 107 – 128, Wiley-Vch, Berlin, 1999.

[3] L. Bos and U. Maier. On the Asymptotics of Fekete-Type Points for Univariate Radial Basis Interpolation J. of Approx. Theory119, 252 – 270 (2002).

[4] M. Buhmann. Radial Basis Functions Cambridge U. Press (2003).

[5] G. Fasshauer. Meshfree Approximation Methods with Matlab World Scientific Press (2007).

(8)

0 0.5 1 0.99

1 1.01 1.02 1.03 1.04 1.05 1.06 1.07

0 0.5 1

0.99 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07

Figure 1:Left: LOOCV interpolant; Right: MLE interpolant

[6] S. Lophaven, H. B. Nielsen, and J. Sondergaard. DACE, A Matlab Kriging Toolbox Technical Report IMM-TR-2002-12, Technical Univ. of Denmark, (2002).

[7] H. Wendland. Scattered Data Approximation Cambridge U. Press (2005).

参照

関連したドキュメント