Nonparametric Prediction via Mode

MOUNIR ARFI

58 Corrour Road, Aviemore PH22 1SS, Inverness-Shire, Scotland.

Email: mounir.ar…@hotmail.co.uk

Abstract

It is shown that the (empirically determined) mode of the kernel estimate is uniformly convergent to the conditional mode function under the ergodic condition over a sequence of compact sets which increases to $\mathbb{R}^d$.

Key words: Kernel density estimate; conditional mode; ergodicity, sequence of compact sets.

1. Introduction

Let $\{(X_i, Y_i)\}_{i \in \mathbb{N}}$ be a stationary process where $(X_i, Y_i)$ takes values in $\mathbb{R}^d \times \mathbb{R}$ and is distributed as $(X, Y)$. Suppose that a segment of data $\{(X_i, Y_i)\}_{i=1}^{n}$ has been observed. We are interested in predicting $Y$ from the data for a fixed value of $X$.

Such an approach has been investigated by several authors when the observed data are i.i.d. or when the process is mixing (see the surveys by Collomb [5] and Györfi et al. [7]).

However, we know that if the conditional distribution of Y given X has a dominant center peak and a smaller peak far from the center, then it is more reasonable to consider the conditional mode function.

The objective of this paper is to investigate the estimation of the conditional mode function, assuming that it is uniquely defined. We also establish the uniform almost sure convergence of the estimate of the conditional mode function, obtained from the conditional density, over a sequence of compact sets which increases to $\mathbb{R}^d$. This is done under the ergodic hypothesis, which is more general than the i.i.d. case or even mixing situations.

On the other hand, most of the existing results suppose that the data belong to a fixed compact set, which is rather cumbersome for applications. In our paper we deal with sequences belonging to a sequence of compact sets which increases to $\mathbb{R}^d$.

Such a subject has been studied by many authors. Among others, Parzen [9] studied the estimation of a probability density function and its mode, Collomb et al. [6] considered the case of the conditional mode function, Arfi [2] used the mode function to investigate prediction, and Hermann & Ziegler [8] proposed rates of consistency for a nonparametric estimation of the mode in the absence of smoothness assumptions.

The conditional mode is defined by means of the conditional density $f(y|x)$ of $Y$ given $X$, as follows:

$$\theta(x) = \arg\max_{y \in \mathbb{R}} f(y|x),$$

and the so-called empirical mode predictor is defined as the maximizer of $f_n(y|x)$ over $y \in \mathbb{R}$, where $f_n(y|x)$ is the kernel estimate of $f(y|x)$ defined by:

$$f_n(y|x) = \frac{f_n(x, y)}{g_n(x)};$$

here $g_n(x) > 0$ is the kernel estimate of the density function $g(x)$ of $X$, and $f_n(x, y)$ is the kernel estimate of the joint density $f(x, y)$ of the pair $(X, Y)$.

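These kernel estimates are defined, respectively, as follows:

$$f_n(x, y) = \frac{1}{n h_n^{d+1}} \sum_{i=1}^{n} K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{x - X_i}{h_n}\right)$$

and

$$g_n(x) = \frac{1}{n h_n^{d}} \sum_{i=1}^{n} K_1\!\left(\frac{x - X_i}{h_n}\right);$$

here $K_1$ ($K_2$) are two Parzen-Rosenblatt kernels on $\mathbb{R}^d$ ($\mathbb{R}$), with $K_1$ strictly positive and of bounded variation, and $K_2$ compactly supported; $h_n$ is a sequence of positive numbers such that $h_n \to 0$ and $n h_n^{d+1} \to \infty$ when $n \to \infty$.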

We show that the random function $\theta_n(x) = \arg\max_{y \in \mathbb{R}} f_n(y|x)$ converges uniformly over a sequence of compact sets $C_n$ (which increases to $\mathbb{R}^d$) to the mode function $\theta(x)$.
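As a reading aid, the empirical mode predictor can be sketched numerically. The block below is only an illustration under stated assumptions, not the author's implementation: the Gaussian choice for $K_1$, the Epanechnikov choice for $K_2$, the grid search over $y$, the simulated i.i.d. sample (a special case of the ergodic setting) and all function names are hypothetical choices made here.

```python
import numpy as np

def gaussian_kernel(u):
    """Assumed K1: product Gaussian kernel on R^d (strictly positive); u has shape (..., d)."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u * u, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)

def epanechnikov_kernel(t):
    """Assumed K2: Epanechnikov kernel on R, compactly supported on [-1, 1]."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t * t), 0.0)

def conditional_mode(x, X, Y, h, y_grid):
    """Grid approximation of the maximizer over y of f_n(y | x) = f_n(x, y) / g_n(x)."""
    n, d = X.shape
    w = gaussian_kernel((x - X) / h)                                # K1((x - X_i)/h), shape (n,)
    k2 = epanechnikov_kernel((y_grid[:, None] - Y[None, :]) / h)    # K2((y - Y_i)/h), shape (m, n)
    f_joint = k2 @ w / (n * h ** (d + 1))                           # f_n(x, y) on the grid
    g = w.sum() / (n * h ** d)                                      # g_n(x), positive since K1 > 0
    f_cond = f_joint / g                                            # f_n(y | x) on the grid
    return y_grid[np.argmax(f_cond)]

# Hypothetical data: Y = 2 X + small Gaussian noise, so the conditional mode at x is 2 x.
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(0.0, 1.0, size=(n, 1))
Y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(n)
y_grid = np.linspace(-1.0, 3.0, 401)
theta_hat = conditional_mode(np.array([0.5]), X, Y, h=0.1, y_grid=y_grid)
```

Since $g_n(x)$ does not depend on $y$, dividing by it does not move the maximizer; it is kept in the sketch only to mirror the definition of $f_n(y|x)$.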


2. Assumptions and Main Arguments

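We denote by $\mathcal{F}^{i-1}$ and $\mathcal{G}^{i-1}$ the $\sigma$-fields generated by $\{(X_{i-j}, Y_{i-j});\ 1 \le j < i\}$ and $\{X_{i-j};\ 1 \le j < i\}$, respectively.

We assume the existence of the conditional densities $f_{X,Y}^{\mathcal{F}^{i-1}}(\cdot,\cdot)$ and $g_X^{\mathcal{G}^{i-1}}(\cdot)$ of the variables $(X, Y)$ and $X$ with respect to $\mathcal{F}^{i-1}$ and $\mathcal{G}^{i-1}$.

It will be further assumed that $f(\cdot,\cdot)$ and $g(\cdot)$ belong to $C_0(\mathbb{R}^j)$, with $j = d+1$ for $f$ and $j = d$ for $g$, where $C_0(\mathbb{R}^j)$ denotes the space of real-valued continuous functions on $\mathbb{R}^j$ tending to zero at infinity. The same assumption will be made for the conditional densities $f_{X,Y}^{\mathcal{F}^{i-1}}$ and $g_X^{\mathcal{G}^{i-1}}$.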

Under the previous conditions, the Theorem in Beck [4] implies the following condition, named the (T) condition:

$$T_{1,n} = \sup_{(x,y) \in \mathbb{R}^d \times \mathbb{R}} \left| n^{-1} \sum_{i=1}^{n} f_{X,Y}^{\mathcal{F}^{i-1}}(x, y) - f(x, y) \right| \xrightarrow{a.s.} 0, \quad n \to \infty,$$

$$T_{2,n} = \sup_{x \in \mathbb{R}^d} \left| n^{-1} \sum_{i=1}^{n} g_X^{\mathcal{G}^{i-1}}(x) - g(x) \right| \xrightarrow{a.s.} 0, \quad n \to \infty,$$

for ergodic processes satisfying some further mild regularity conditions (Györfi et al. [7]).

In the sequel, we suppose that the (T) condition holds and we suppose that $T_{1,n} = o(n^{-a})$ and $T_{2,n} = o(n^{-a-1})$.

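Moreover, the conditional densities $f_{X,Y}^{\mathcal{F}^{i-1}}(\cdot,\cdot)$ and $g_X^{\mathcal{G}^{i-1}}(\cdot)$ are assumed to be Lipschitz, in the sense that:

$$\left| f_{X,Y}^{\mathcal{F}^{i-1}}(x, y) - f_{X,Y}^{\mathcal{F}^{i-1}}(x', y') \right| \le \|(x, y) - (x', y')\|_{\mathbb{R}^d \times \mathbb{R}}, \qquad \left| g_X^{\mathcal{G}^{i-1}}(x) - g_X^{\mathcal{G}^{i-1}}(x') \right| \le \|x - x'\|_{\mathbb{R}^d}.$$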

We will also make use of the following assumptions:

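A1. The process $(X_i, Y_i)_{i \in \mathbb{N}}$ is strictly stationary and ergodic.

A2. The joint distribution $P_{(X,Y)}$ of the pair $(X, Y)$ is absolutely continuous with regard to the Lebesgue measure on $\mathbb{R}^d \times \mathbb{R}$.

A3. There exists $a > 0$ such that $g(x) \ge n^{-a}$, $n \ge 1$, for all $x \in C_n$, where $C_n = \{x : \|x\| \le c_n\}$ with $c_n \to \infty$, $n \to \infty$.

A4. The kernels $K_j$, $j = 1, 2$, are Lipschitz of order $\gamma_1 > 0$, in the sense that:

$$\exists\, L_K < \infty: \quad |K_j(u) - K_j(v)| \le L_K |u - v|^{\gamma_1}, \quad j = 1, 2.$$

A5. $K_j$, $j = 1, 2$, are bounded and integrate to one.

A6. The mode function $\theta(\cdot)$ satisfies the following condition on a sequence of compact sets $C_n$: for any real-valued function $\psi$ on $C_n$,

$$\forall \varepsilon_n > 0,\ \exists\, \eta_n > 0,\ (\forall\, C_n \to \mathbb{R}^d): \quad \text{if } \sup_{x \in C_n} |\psi(x) - \theta(x)| \ge \varepsilon_n, \ \text{ then } \sup_{x \in C_n} |f(\psi(x)|x) - f(\theta(x)|x)| \ge \eta_n.$$

A7. There exist $\lambda > 2$ and $M < \infty$ such that $E|Y|^{\lambda} < M$.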
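Assumption A3 ties the growth of the sets $C_n$ to the decay of $g$. The following sketch checks it on one hypothetical example that is not taken from the paper: $d = 1$, $g$ the standard normal density, $c_n = \sqrt{\log n}$ and $a = 1$. These choices and the function names are assumptions made only for illustration.

```python
import math

def standard_normal_density(x):
    """Hypothetical marginal density g (standard normal, d = 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def a3_holds(n, a=1.0):
    """Check g(x) >= n^{-a} for all x in C_n = [-c_n, c_n], with the assumed c_n = sqrt(log n)."""
    c_n = math.sqrt(math.log(n))
    # g is even and decreasing in |x|, so its minimum over C_n is attained at |x| = c_n.
    return standard_normal_density(c_n) >= n ** (-a)

checks = {n: a3_holds(n) for n in (10, 100, 10_000, 1_000_000)}
```

In this example $g(c_n) = n^{-1/2} / \sqrt{2\pi}$, so the lower bound $n^{-1}$ holds for the sample sizes tried; a faster-growing $c_n$ would require a larger $a$.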

3. Main Result

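Our main result is stated in the theorem below.

Theorem

We suppose that the assumptions A1 to A7 hold. We further assume that the sequence $h_n$ satisfies:

$$\lim_{n \to \infty} \frac{n h_n^{2(d+1)}}{\log n} = \infty; \qquad n^{a+1} h_n^{k} \to 0 \ \text{ for } a > 0 \text{ and } k \ge 3 \tag{1}$$

and

$$\forall \varepsilon_n > 0, \quad \sum_{n} \frac{n^{d(a+1)/\gamma_1}\, c_n^{d}\, (\log n)^{1/\gamma_1}}{h_n^{d(d+1+\gamma_1)/\gamma_1}}\; h_n^{-1-\tau} \exp\{-\rho\, \varepsilon_n^{2}\, n h_n^{2(d+1)}\} < \infty$$

for $\tau > 1 + (d+2)/\gamma_1 + (d+1)/\gamma_1^{2}$, with $a > 0$ and $\rho$ a positive constant.

If the kernel $K_1$ is even with $\int z^{k} K_1(z)\,dz < \infty$ for $k \ge 1$, then

$$\sup_{x \in C_n} |\theta_n(x) - \theta(x)| \xrightarrow{a.s.} 0, \quad n \to \infty.$$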

Remarks

1) As sequences $h_n$ and $c_n$ we can choose $h_n = O(n^{-b})$ with $b < 1/(2(d+1))$ and $c_n = O((\log n)^{1/\gamma_1})$.

2) In the ergodic case, there is no general theoretical result to determine the precise rate of convergence. The convergence can be arbitrarily fast.
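The bandwidth choice in Remark 1 can be checked numerically against the first part of condition (1). The sketch below assumes the exact form $h_n = n^{-b}$ (the remark only states an order of magnitude) and $d = 1$; the function name and the sample sizes are illustrative choices.

```python
import math

def growth_ratio(n, d, b):
    """Return n * h_n^{2(d+1)} / log(n) for the assumed bandwidth h_n = n^{-b}."""
    h_n = n ** (-b)
    return n * h_n ** (2 * (d + 1)) / math.log(n)

d = 1                                    # threshold is 1 / (2 (d + 1)) = 0.25
sample_sizes = (10**2, 10**4, 10**6)
ok_ratios = [growth_ratio(n, d, b=0.10) for n in sample_sizes]   # b below the threshold
bad_ratios = [growth_ratio(n, d, b=0.30) for n in sample_sizes]  # b above the threshold
```

With $b = 0.10$ the ratio behaves like $n^{0.6} / \log n$ and grows with $n$, whereas with $b = 0.30$ it behaves like $n^{-0.2} / \log n$ and shrinks, which is why $b$ must stay below $1/(2(d+1))$.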

4. Preliminary Results

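$$\sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(y|x) - f(y|x)| \le \frac{1}{\inf_{x \in C_n} g(x)} \left\{ \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(x, y) - f(x, y)| + \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(y|x)|\, |g_n(x) - g(x)| \right\}$$

$$\le n^{a} \left\{ \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(x, y) - f(x, y)| + \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(y|x)|\, |g_n(x) - g(x)| \right\}$$

with

$$\sup_{y \in \mathbb{R}} |f_n(y|x)| \le \frac{\tilde{K}}{h_n \bar{K}_1}, \quad \text{then} \quad n^{-1} \sup_{y \in \mathbb{R}} |f_n(y|x)| \le \frac{\tilde{K}}{n h_n \bar{K}_1} < M_1 < \infty,$$

where $M_1$ is a positive constant, $\tilde{K} = \max\left\{ \sup_{x \in \mathbb{R}^d} K_1(x),\ \sup_{y \in \mathbb{R}} K_2(y),\ 1 \right\}$ and $\bar{K}_1$ is an upper bound of $K_1$, and we can write

$$\sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(y|x) - f(y|x)| \le n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(x, y) - f(x, y)| + M_1 n^{a+1} \sup_{x \in C_n} |g_n(x) - g(x)|.$$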
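The first inequality of this section rests on an elementary identity that is not displayed in the text; it is spelled out here as a reading aid. Using $f_n(y|x) = f_n(x,y)/g_n(x)$ and $f(y|x) = f(x,y)/g(x)$,

$$f_n(y|x) - f(y|x) = \frac{f_n(x, y) - f(x, y)}{g(x)} + f_n(y|x)\, \frac{g(x) - g_n(x)}{g(x)}.$$

Taking absolute values and suprema, and bounding $1/g(x)$ by $1/\inf_{x \in C_n} g(x) \le n^{a}$ (Assumption A3), gives the stated bound.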

Definition

A process $(X_i)_{i \in \mathbb{N}}$ is called a martingale difference if it is real-valued and satisfies:

$$E(X_i \,|\, \mathcal{A}^{i-1}) = 0 \quad \forall i \in \mathbb{N},$$

where $\mathcal{A}^{i-1}$ denotes the $\sigma$-field generated by the past of the process $(X_i)$.

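Lemma 1 (Azuma [3])

If $(X_i)_{i \in \mathbb{N}}$ is a martingale difference with $|X_i| \le B$ a.s., then for all $\varepsilon > 0$

$$P\left\{ \left| \sum_{i=1}^{n} X_i \right| > \varepsilon \right\} \le 2 \exp\left\{ -\frac{\varepsilon^{2}}{2 n B^{2}} \right\}.$$

If $(X_i)_{i \in \mathbb{N}}$ is real-valued with $|X_i| \le B$ a.s., then for all integers $m > 0$ such that $i - m > 0$ and all $\varepsilon > 0$,

$$P\left\{ \left| \sum_{i=1}^{n} \left[ X_i - E(X_i \,|\, \mathcal{F}^{i-m}) \right] \right| > \varepsilon \right\} < 2m \exp\left\{ -\frac{\varepsilon^{2}}{2 n m^{2} B^{2}} \right\}.$$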
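The first inequality of Lemma 1 can be illustrated by simulation. The sketch below uses i.i.d. Rademacher steps, which form a bounded martingale difference with $B = 1$; the sample size, threshold, number of trials and names are assumptions chosen only for illustration, and the simulation is a sanity check rather than a proof.

```python
import numpy as np

def azuma_bound(eps, n, B):
    """Right-hand side of the first inequality in Lemma 1: 2 exp(-eps^2 / (2 n B^2))."""
    return 2.0 * np.exp(-eps**2 / (2.0 * n * B**2))

rng = np.random.default_rng(1)
n, B, eps, trials = 100, 1.0, 25.0, 20_000

# Independent +/-1 steps have zero conditional mean given the past and satisfy |X_i| <= 1.
steps = rng.choice([-1.0, 1.0], size=(trials, n))
sums = steps.sum(axis=1)

empirical = np.mean(np.abs(sums) > eps)   # Monte Carlo estimate of P(|sum X_i| > eps)
bound = azuma_bound(eps, n, B)            # equals 2 exp(-3.125) for these values
```

The empirical frequency should fall well below the bound, which is expected since the inequality holds for every bounded martingale difference and is not tight for this particular one.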


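Lemma 2

Under assumptions A1 to A5, we have:

$$n^{a+1} \sup_{x \in C_n} |g_n(x) - g(x)| \xrightarrow{a.s.} 0, \quad n \to \infty.$$

Proof:

Consider the following decomposition:

$$g_n(x) - g(x) = \sum_{i=1}^{n} Z_i(x) + V_n(x)$$

with

$$Z_i(x) = \frac{1}{n h_n^{d}} \left[ K_1\!\left(\frac{x - X_i}{h_n}\right) - E^{\mathcal{G}^{i-1}} K_1\!\left(\frac{x - X_i}{h_n}\right) \right]$$

and

$$V_n(x) = \frac{1}{n h_n^{d}} \sum_{i=1}^{n} E^{\mathcal{G}^{i-1}} K_1\!\left(\frac{x - X_i}{h_n}\right) - g(x),$$

where $E^{\mathcal{G}^{i-1}}(\cdot) = E(\cdot \,|\, \mathcal{G}^{i-1})$ and $\mathcal{G}^{i-1} = \sigma(X_{i-j};\ 1 \le j < i)$.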

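For fixed $x$, $Z_i$ is a martingale difference with $|Z_i(x)| \le \frac{B}{n h_n^{d}}$ a.s., where $B$ is a positive constant. Then, by applying Lemma 1, we obtain

$$P\left\{ n^{a+1} \left| \sum_{i=1}^{n} Z_i(x) \right| > \varepsilon \right\} = P\left\{ \left| \sum_{i=1}^{n} Z_i(x) \right| > \varepsilon_n \right\} \le 2 \exp\left\{ -\frac{\varepsilon_n^{2}}{2 B^{2}}\, n h_n^{2d} \right\}, \tag{2}$$

for all $\varepsilon > 0$, with $\varepsilon_n = \varepsilon n^{-a-1}$. The choice of $h_n$ in the Theorem allows us to conclude that:

$$n^{a+1} \left| \sum_{i=1}^{n} Z_i(x) \right| \to 0 \quad \text{a.s. when } n \to \infty.$$

Next, we show that $n^{a+1} \sup_{x \in C_n} \left| \sum_{i=1}^{n} Z_i(x) \right| \xrightarrow{a.s.} 0$, $n \to \infty$. We cover $C_n$ by $\nu_n^{d}$ spheres of the form $\{x : \|x - x_{nj}\| \le c_n \nu_n^{-1}\}$ for $1 \le j \le \nu_n^{d}$, with $c_n \to \infty$ and $\nu_n \to \infty$ to be defined later, and we make the following decomposition:

$$\left| \sum_{i=1}^{n} Z_i(x) \right| \le \frac{1}{n h_n^{d}} \left| \sum_{i=1}^{n} \left[ K_1\!\left(\frac{x - X_i}{h_n}\right) - K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) \right] \right| + \frac{1}{n h_n^{d}} \left| \sum_{i=1}^{n} E^{\mathcal{G}^{i-1}} \left[ K_1\!\left(\frac{x - X_i}{h_n}\right) - K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) \right] \right|$$

$$+ \frac{1}{n h_n^{d}} \left| \sum_{i=1}^{n} \left[ K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) - E^{\mathcal{G}^{i-1}} K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) \right] \right|.$$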

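We have:

$$\frac{n^{a+1}}{n h_n^{d}} \left| \sum_{i=1}^{n} \left[ K_1\!\left(\frac{x - X_i}{h_n}\right) - K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) \right] \right| \le \frac{n^{a+1} L_K}{h_n^{d+\gamma_1}} \|x - x_{nj}\|^{\gamma_1} \le \frac{L_K n^{a+1}}{h_n^{d+\gamma_1}}\, c_n^{\gamma_1} \nu_n^{-\gamma_1} = \frac{1}{\log n},$$

where $\nu_n$ is chosen such that

$$\nu_n = \frac{L_K^{1/\gamma_1}\, c_n\, n^{(1+a)/\gamma_1}\, (\log n)^{1/\gamma_1}}{h_n^{d/\gamma_1 + 1}} \to \infty.$$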
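As a check on the choice of the covering parameter (written $\nu_n$ here, with $\gamma_1$ denoting the Lipschitz order of the kernels), raising its defining expression to the power $\gamma_1$ gives

$$\nu_n^{\gamma_1} = \frac{L_K\, c_n^{\gamma_1}\, n^{1+a}\, \log n}{h_n^{d+\gamma_1}}, \qquad \text{so that} \qquad \frac{L_K n^{a+1}}{h_n^{d+\gamma_1}}\, c_n^{\gamma_1}\, \nu_n^{-\gamma_1} = \frac{1}{\log n}.$$

In other words, $\nu_n$ is exactly the value that makes the Lipschitz remainder of the covering argument equal to $1/\log n$, which tends to zero.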

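Then:

$$n^{a+1} \sup_{x \in C_n} \left| \sum_{i=1}^{n} Z_i(x) \right| \le \sup_{1 \le j \le \nu_n^{d}} \frac{n^{a+1}}{n h_n^{d}} \left| \sum_{i=1}^{n} \left[ K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) - E^{\mathcal{G}^{i-1}} K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) \right] \right| + \frac{2}{\log n}.$$

For all $n \ge n_1(\varepsilon)$ and for all $\varepsilon > 0$,

$$P\left\{ n^{a+1} \sup_{x \in C_n} \left| \sum_{i=1}^{n} Z_i(x) \right| > 2\varepsilon \right\} = P\left\{ \sup_{x \in C_n} \left| \sum_{i=1}^{n} Z_i(x) \right| > 2\varepsilon_n \right\}$$

$$\le \sum_{j=1}^{\nu_n^{d}} P\left\{ \frac{1}{n h_n^{d}} \left| \sum_{i=1}^{n} \left[ K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) - E^{\mathcal{G}^{i-1}} K_1\!\left(\frac{x_{nj} - X_i}{h_n}\right) \right] \right| > \varepsilon_n \right\}.$$

Applying Azuma's Lemma $\nu_n^{d}$ times we obtain:

$$P\left\{ \sup_{x \in C_n} \left| \sum_{i=1}^{n} Z_i(x) \right| > 2\varepsilon_n \right\} \le 2 \nu_n^{d} \exp\left\{ -\frac{\varepsilon_n^{2}\, n h_n^{2d}}{8 \bar{K}_1^{2}} \right\} = 2 h_n^{-d(d/\gamma_1 + 1)} L_K^{d/\gamma_1} c_n^{d}\, n^{d(1+a)/\gamma_1} (\log n)^{d/\gamma_1} \exp\left\{ -\frac{\varepsilon_n^{2}\, n h_n^{2d}}{8 \bar{K}_1^{2}} \right\},$$

where $\bar{K}_1$ is an upper bound of $K_1$.

The hypotheses of the Theorem and the Borel-Cantelli lemma allow us to conclude.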

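Now, we show that $n^{a+1} \sup_{x \in C_n} |V_n(x)| \xrightarrow{a.s.} 0$, $n \to \infty$. Write

$$V_n(x) = \frac{1}{n h_n^{d}} \int_{\mathbb{R}^d} K_1\!\left(\frac{u - x}{h_n}\right) \sum_{i=1}^{n} g_X^{\mathcal{G}^{i-1}}(u)\,du - g(x),$$

and set $z = (u - x)/h_n$ to obtain:

$$\sup_{x \in C_n} |V_n(x)| \le \sup_{x \in C_n} \left| \int_{\mathbb{R}^d} K_1(z)\, n^{-1} \sum_{i=1}^{n} \left[ g_X^{\mathcal{G}^{i-1}}(z h_n + x) - g_X^{\mathcal{G}^{i-1}}(x) \right] dz \right| + \sup_{x \in C_n} \left| \int_{\mathbb{R}^d} K_1(z) \left[ n^{-1} \sum_{i=1}^{n} g_X^{\mathcal{G}^{i-1}}(x) - g(x) \right] dz \right|.$$

By the assumption that the conditional densities satisfy the Lipschitz condition, we obtain

$$n^{a+1} \sup_{x \in C_n} |V_n(x)| \le n^{a+1} h_n^{k} \int_{\mathbb{R}^d} z^{k} K_1(z)\,dz + n^{a+1} \sup_{x \in C_n} \left| n^{-1} \sum_{i=1}^{n} g_X^{\mathcal{G}^{i-1}}(x) - g(x) \right| \int_{\mathbb{R}^d} K_1(z)\,dz.$$

The condition (T), the choice $T_{2,n} = o(n^{-a-1})$ and the assumption about the kernel $K_1$ permit us to conclude that:

$$n^{a+1} \sup_{x \in C_n} |V_n(x)| \xrightarrow{a.s.} 0, \quad n \to \infty.$$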

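Lemma 3

Under the assumptions of the Theorem, we have:

$$n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(x, y) - f(x, y)| \xrightarrow{a.s.} 0, \quad n \to \infty.$$

Proof:

$$f_n(x, y) - f(x, y) = \sum_{i=1}^{n} Z_i(x, y) + T_n(x, y),$$

where

$$T_n(x, y) = \frac{1}{n h_n^{d+1}} \sum_{i=1}^{n} E^{\mathcal{F}^{i-1}} \left[ K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{x - X_i}{h_n}\right) \right] - f(x, y),$$

and

$$Z_i(x, y) = \frac{1}{n h_n^{d+1}} K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{x - X_i}{h_n}\right) - \frac{1}{n h_n^{d+1}} E^{\mathcal{F}^{i-1}} \left[ K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{x - X_i}{h_n}\right) \right].$$

$Z_i$ is a martingale difference with $|Z_i| \le \frac{2 \tilde{K}^{2}}{n h_n^{d+1}}$, where $\tilde{K} = \max\left\{ \sup_{x \in \mathbb{R}^d} K_1(x),\ \sup_{y \in \mathbb{R}} K_2(y),\ 1 \right\}$. Then, apply Lemma 1 to obtain: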

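$$\forall \varepsilon > 0, \quad P\left\{ \left| \sum_{i=1}^{n} Z_i \right| > \varepsilon n^{-a} \right\} = P\left\{ \left| \sum_{i=1}^{n} Z_i \right| > \varepsilon_n \right\} \le 2 \exp\left\{ -C_1\, \varepsilon_n^{2}\, n h_n^{2(d+1)} \right\}, \tag{3}$$

where $C_1$ is a positive constant.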
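The constant in (3) can be made explicit, as a reading aid and under the assumption that the bound on the martingale differences reads $|Z_i| \le B$ with $B = 2\tilde{K}^{2}/(n h_n^{d+1})$, where $\tilde{K}$ is the common upper bound of the kernels. Lemma 1 then gives the exponent

$$\frac{\varepsilon_n^{2}}{2 n B^{2}} = \frac{\varepsilon_n^{2}\, n^{2} h_n^{2(d+1)}}{2 n \cdot 4 \tilde{K}^{4}} = \frac{\varepsilon_n^{2}\, n h_n^{2(d+1)}}{8 \tilde{K}^{4}},$$

so one admissible value is $C_1 = 1/(8 \tilde{K}^{4})$. This parallels the factor $8 \bar{K}_1^{2}$ that appears in the proof of Lemma 2.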

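Condition (1) in the Theorem permits us to conclude:

$$\sum_{n} P\left\{ n^{a} \left| \sum_{i=1}^{n} Z_i \right| > \varepsilon \right\} < \infty.$$

Next, we show that $n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} Z_i(x, y) \right| \xrightarrow{a.s.} 0$, $n \to \infty$. We cover $C_n$ by $\nu_n^{d}$ spheres $\{x : \|x - x_{nj}\| \le c_n \nu_n^{-1}\}$, $1 \le j \le \nu_n^{d}$, where $c_n \to \infty$ and $\nu_n$ is chosen so that $\nu_n \to \infty$, to be defined precisely later.

Consider the following decomposition:

$$\sum_{i=1}^{n} Z_i(x, y) = \sum_{i=1}^{n} \left[ \Delta_i(x, y) - \Delta_i(x_{nj}, y) \right] - \sum_{i=1}^{n} E^{\mathcal{F}^{i-1}} \left[ \Delta_i(x, y) - \Delta_i(x_{nj}, y) \right] + \sum_{i=1}^{n} \left[ \Delta_i(x_{nj}, y) - E^{\mathcal{F}^{i-1}} \Delta_i(x_{nj}, y) \right],$$

where

$$\Delta_i(\cdot, y) = \frac{1}{n h_n^{d+1}} K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{\cdot - X_i}{h_n}\right).$$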

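By the fact that the kernel $K_1$ is Lipschitz, we obtain:

$$n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} \left[ \Delta_i(x, y) - \Delta_i(x_{nj}, y) \right] \right| \le \frac{L_K \tilde{K} n^{a}}{h_n^{d+1+\gamma_1}} \|x - x_{nj}\|^{\gamma_1} \le \frac{L_K \tilde{K} n^{a}}{h_n^{d+1+\gamma_1}}\, c_n^{\gamma_1} \nu_n^{-\gamma_1} = \frac{1}{\log n},$$

where $\nu_n$ is chosen so that:

$$\nu_n = \frac{L_K^{1/\gamma_1}\, \tilde{K}^{1/\gamma_1}\, c_n\, n^{a/\gamma_1}\, (\log n)^{1/\gamma_1}}{h_n^{(d+1+\gamma_1)/\gamma_1}} \to \infty.$$

Thus,

$$n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} Z_i(x, y) \right| \le n^{a} \sup_{1 \le j \le \nu_n^{d}} \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} \left[ \Delta_i(x_{nj}, y) - E^{\mathcal{F}^{i-1}} \Delta_i(x_{nj}, y) \right] \right| + \frac{2}{\log n},$$

and then, for all $n \ge n_1(\varepsilon)$ and all $\varepsilon > 0$, if we put $\varepsilon_n = \varepsilon n^{-a}$ we have:

$$P\left\{ \sup_{x \in C_n} \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} Z_i(x, y) \right| > 2\varepsilon_n \right\} \le \sum_{j=1}^{\nu_n^{d}} P\left\{ \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} \left[ \Delta_i(x_{nj}, y) - E^{\mathcal{F}^{i-1}} \Delta_i(x_{nj}, y) \right] \right| > \varepsilon_n \right\}. \tag{4}$$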

For fixed $j$, set:

$$\Psi_n(x_{nj}, y) = \sum_{i=1}^{n} \left[ \Delta_i(x_{nj}, y) - E^{\mathcal{F}^{i-1}} \Delta_i(x_{nj}, y) \right],$$

considered separately for $|y| \le v_n$ and for $|y| > v_n$,

where v_n is de…ned by v_n =h

1

n with being a positive constant.


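Then we have

$$\sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} \left[ \Delta_i(x_{nj}, y) - E^{\mathcal{F}^{i-1}} \Delta_i(x_{nj}, y) \right] \right| \le \sup_{|y| \le v_n} |\Psi_n(x_{nj}, y)| + \sup_{|y| > v_n} |\Psi_n(x_{nj}, y)|.$$

Cover $[-v_n, v_n]$ by $l_n$ spheres $B_s$ with centers $t_s$ and radii less than or equal to $h_n^{\tau}$, where $l_n \le v_n h_n^{-\tau}$ and $\tau$ is a fixed number. Then by arguments similar to those in the proof of Lemma 2, we obtain:

$$\sup_{|y| \le v_n} |\tilde{\Psi}_n(x_{nj}, y)| \le \beta' h_n^{\gamma_1(\tau - 1) - (d+1)} \quad \text{a.s.},$$

where $\tilde{\Psi}_n(x_{nj}, y) = \Psi_n(x_{nj}, y) - \Psi_n(x_{nj}, t_s)$ and $\beta'$ is a positive constant.

Furthermore,

$$\omega_n = P\left\{ \max_{s = 1, \dots, l_n} |\Psi_n(x_{nj}, t_s)| > \varepsilon_n / 2 \right\} \le \sum_{s=1}^{l_n} P\left\{ |\Psi_n(x_{nj}, t_s)| > \varepsilon_n / 2 \right\} \le l_n \sup_{|y| \le v_n} P\left\{ |\Psi_n(x_{nj}, y)| > \varepsilon_n / 2 \right\}.$$

Then inequality (3) implies: $\omega_n \le 2 v_n h_n^{-\tau} \exp\{-C_1 \varepsilon_n^{2} n h_n^{2(d+1)}\}$. Applying Lemma 1 $\nu_n^{d}$ times, we obtain:

$$P\left\{ \sup_{x \in C_n} \sup_{|y| \le v_n} \left| \sum_{i=1}^{n} Z_i(x, y) \right| > \varepsilon_n \right\} \le \frac{n^{ad/\gamma_1}\, c_n^{d}\, L_K^{d/\gamma_1}\, \tilde{K}^{d/\gamma_1}\, (\log n)^{d/\gamma_1}}{h_n^{d(d+1+\gamma_1)/\gamma_1}}\; v_n h_n^{-\tau} \exp\{-C_1 \varepsilon_n^{2} n h_n^{2(d+1)}\}.$$

The assumptions of the Theorem permit us to conclude that:

$$n^{a} \sup_{x \in C_n} \sup_{|y| \le v_n} \left| \sum_{i=1}^{n} Z_i(x, y) \right| \xrightarrow{a.s.} 0.$$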

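It remains to show that $n^{a} \sup_{|y| > v_n} |\Psi_n(x_{nj}, y)| \xrightarrow{a.s.} 0$. We have

$$\sup_{|y| > v_n} |\Psi_n(x_{nj}, y)| \le \sup_{|y| > v_n} \left| \sum_{i=1}^{n} \Delta_i(x_{nj}, y) \right| + \sup_{|y| > v_n} \left| \sum_{i=1}^{n} E^{\mathcal{F}^{i-1}} \Delta_i(x_{nj}, y) \right|,$$

and by the compactness of the support of $K_2$,

$$K_2\!\left(\frac{y - Y}{h_n}\right) \le \tilde{K}\, I_{[|Y| > v_n/2]}.$$

Therefore

$$\sup_{|y| > v_n} \left| \sum_{i=1}^{n} \Delta_i(x_{nj}, y) \right| \le \frac{1}{n h_n^{d+1}} \tilde{K}^{2} \sum_{i=1}^{n} I_{[|Y_i| > v_n/2]} \tag{5}$$

with

$$P(|Y| > v_n/2) \le (2 v_n^{-1})^{\lambda}\, E|Y|^{\lambda} \tag{6}$$

for a certain $\lambda > 0$ such that $\lambda > \gamma_1(\tau - 1)$.

For all $\varepsilon > 0$, we have

$$P\left\{ \sup_{|y| > v_n} \left| \sum_{i=1}^{n} \Delta_i(x_{nj}, y) \right| > \varepsilon_n \right\} \le \frac{1}{\varepsilon_n} E\left[ \sup_{|y| > v_n} \left| \sum_{i=1}^{n} \Delta_i(x_{nj}, y) \right| \right].$$

Then, using (5) and (6) we obtain:

$$P\left\{ \sup_{|y| > v_n} \left| \sum_{i=1}^{n} \Delta_i(x_{nj}, y) \right| > \varepsilon_n \right\} \le \frac{1}{\varepsilon_n} \tilde{K}^{2} h_n^{-d-1} (2 v_n^{-1})^{\lambda}\, E|Y|^{\lambda} = \varepsilon_n^{-1} \tilde{K}^{2} h_n^{-d-1+\lambda}\, 2^{\lambda}\, E|Y|^{\lambda}.$$

Inequality (4) implies:

$$P\left\{ \sup_{x \in C_n} \sup_{|y| > v_n} \left| \sum_{i=1}^{n} Z_i(x, y) \right| > 2\varepsilon_n \right\} \le A\, \nu_n^{d}\, h_n^{-d-1+\lambda}\, E|Y|^{\lambda},$$

where $A$ is a positive constant.

The choice of $\lambda$ and the assumptions of the Theorem permit us to conclude that:

$$n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} \left| \sum_{i=1}^{n} Z_i(x, y) \right| \xrightarrow{a.s.} 0.$$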

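To complete the proof of Lemma 3, we need to show that:

$$n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |T_n(x, y)| \xrightarrow{a.s.} 0, \quad n \to \infty.$$

To this end:

$$T_n(x, y) = \frac{1}{n h_n^{d+1}} \sum_{i=1}^{n} E^{\mathcal{F}^{i-1}} \left[ K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{x - X_i}{h_n}\right) \right] - f(x, y),$$

with

$$E^{\mathcal{F}^{i-1}} \left[ K_2\!\left(\frac{y - Y_i}{h_n}\right) K_1\!\left(\frac{x - X_i}{h_n}\right) \right] = \iint_{\mathbb{R}^d \times \mathbb{R}} K_2\!\left(\frac{y - v}{h_n}\right) K_1\!\left(\frac{x - u}{h_n}\right) f_{X,Y}^{\mathcal{F}^{i-1}}(u, v)\,du\,dv.$$

Properties of the Bochner integral allow us to write

$$T_n(x, y) = \frac{1}{h_n^{d+1}} \iint_{\mathbb{R}^d \times \mathbb{R}} K_2\!\left(\frac{y - v}{h_n}\right) K_1\!\left(\frac{x - u}{h_n}\right) n^{-1} \sum_{i=1}^{n} f_{X,Y}^{\mathcal{F}^{i-1}}(u, v)\,du\,dv - f(x, y).$$

Then if we set $z_1 = (x - u)/h_n$, $z_2 = (y - v)/h_n$, we obtain

$$T_n(x, y) = \iint_{\mathbb{R}^d \times \mathbb{R}} K_2(z_2) K_1(z_1)\, n^{-1} \sum_{i=1}^{n} f_{X,Y}^{\mathcal{F}^{i-1}}(x - z_1 h_n,\ y - z_2 h_n)\,dz_1\,dz_2 - f(x, y).$$

Condition (T), the fact that the conditional densities $f_{X,Y}^{\mathcal{F}^{i-1}}$ are Lipschitz, and arguments similar to those used before yield:

$$n^{a} \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |T_n(x, y)| \xrightarrow{a.s.} 0, \quad n \to \infty.$$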

5. Proof of the Main Result

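By the definitions of $\theta_n(x)$ and $\theta(x)$, we have

$$|f(\theta_n(x)|x) - f(\theta(x)|x)| \le |f_n(\theta_n(x)|x) - f(\theta_n(x)|x)| + |f_n(\theta_n(x)|x) - f(\theta(x)|x)|$$

$$\le \sup_{y \in \mathbb{R}} |f_n(y|x) - f(y|x)| + \left| \sup_{y \in \mathbb{R}} f_n(y|x) - \sup_{y \in \mathbb{R}} f(y|x) \right| \le 2 \sup_{y \in \mathbb{R}} |f_n(y|x) - f(y|x)|.$$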
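The last step of this chain uses a standard fact that is not written out in the text; it is recorded here as a reading aid. For every $y \in \mathbb{R}$,

$$f_n(y|x) \le f(y|x) + \sup_{t \in \mathbb{R}} |f_n(t|x) - f(t|x)| \le \sup_{t \in \mathbb{R}} f(t|x) + \sup_{t \in \mathbb{R}} |f_n(t|x) - f(t|x)|.$$

Taking the supremum over $y$ on the left, and then exchanging the roles of $f_n$ and $f$, gives

$$\left| \sup_{y \in \mathbb{R}} f_n(y|x) - \sup_{y \in \mathbb{R}} f(y|x) \right| \le \sup_{y \in \mathbb{R}} |f_n(y|x) - f(y|x)|.$$

The second term of the chain has this form because the two modes are maximizers, so the value of the estimated conditional density at the empirical mode is $\sup_{y} f_n(y|x)$ and the value of the true conditional density at the true mode is $\sup_{y} f(y|x)$.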

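Assumption A6 implies that for all $\varepsilon_n > 0$ there exists $\eta_n > 0$ such that:

$$P\left\{ \sup_{x \in C_n} |\theta_n(x) - \theta(x)| \ge \varepsilon_n \right\} \le P\left\{ \sup_{x \in C_n} \sup_{y \in \mathbb{R}} |f_n(y|x) - f(y|x)| \ge \eta_n \right\},$$

which completes the proof of the Theorem.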

The Open Problem

The rate of convergence remains very hard to control, because it could be arbitrarily fast. One could consider this study in the case where the process is ergodic on each compact set separately, and then find a function that allows one to conclude for the whole space.

Acknowledgements

The author is grateful to the referees for their comments and criticisms.

References

[1] Arfi, M. Sur la régression non paramétrique d'un processus stationnaire mélangeant ou ergodique. Thèse de Doctorat de l'Université Paris VI, 1996.

[2] Arfi, M. Nonparametric prediction from ergodic samples. J. Nonparametric Statistics, 9, 1998, 23-37.

[3] Azuma, K. Weighted sums of certain dependent random variables. Tôhoku Math. J., 19, 1967, 357-367.

[4] Beck, A. On the strong law of large numbers. Ergodic Theory, F. B. Wright Ed., Academic Press, 1963, 21-53.

[5] Collomb, G. Nonparametric regression: an up to date bibliography. Statistics, 16, 1985, 309-324.

[6] Collomb, G., Härdle, W., Hassani, S. A note on prediction via estimation of the conditional mode function. J. Statist. Plann. Inference, 15, 1987, 227-236.

[7] Györfi, L., Härdle, W., Sarda, P., Vieu, P. Nonparametric curve estimation from time series. Lecture Notes in Statistics, 60, Springer-Verlag, 1989.

[8] Hermann, E., Ziegler, K. Rates of consistency for a nonparametric estimation of the mode in absence of smoothness assumptions. Statistics & Probability Letters, 68, 2004, 359-368.

[9] Parzen, E. On the estimation of a probability density function and mode. Ann. Math. Statist., 33, 1962, 1065-1076.