New York Journal of Mathematics

New York J. Math. 4 (1998) 249–257.

Metric Diophantine Approximation and Probability

Doug Hensley

Abstract. Let $p_n/q_n = (p_n/q_n)(x)$ denote the $n$th simple continued fraction convergent to an arbitrary irrational number $x \in (0,1)$. Define the sequence of approximation constants $\theta_n(x) := q_n^2\,|x - p_n/q_n|$. It was conjectured by Lenstra that for almost all $x \in (0,1)$,

\[ \lim_{n\to\infty} \frac{1}{n}\,\bigl|\{j : 1 \le j \le n \text{ and } \theta_j(x) \le z\}\bigr| = F(z), \]

where $F(z) := z/\log 2$ if $0 \le z \le 1/2$, and $F(z) := \frac{1}{\log 2}(1 - z + \log(2z))$ if $1/2 \le z \le 1$. This was proved in [BJW83] and extended in [Nai98] to the same conclusion for $\theta_{k_j}(x)$, where $(k_j)$ is a sequence of positive integers satisfying a certain technical condition related to ergodic theory. Our main result is that this condition can be dispensed with: we only need that $(k_j)$ be strictly increasing.

Contents

1. Introduction

2. Probabilities and operators

3. Non-independent trials are good enough.

References

1. Introduction

Metric questions about Diophantine approximation can be approached by means of ergodic theory, dynamical systems, weak mixing and so on. At the heart of this approach lies the observation that not only is the map $T : [0,1]\setminus\mathbb{Q} \to [0,1]\setminus\mathbb{Q}$ given by $T : x \mapsto \langle 1/x\rangle := 1/x - [1/x]$ ergodic, but that $\bar{T} : \Omega \to \Omega := ([0,1]\setminus\mathbb{Q}) \times [0,1]$, given by $\bar{T} : (x,y) \mapsto (\langle 1/x\rangle,\ 1/([1/x] + y))$, is ergodic with better mixing properties. The associated measure, invariant under $\bar{T}$, assigns to a measurable set $A$ the mass

\[ \frac{1}{\log 2} \iint_A \frac{dx\,dy}{(1+xy)^2}. \]

Here we take a different tack. It goes back to the pioneering work that led to the Gauss-Kuzmin theorem, and the thread continues with the work of Wirsing and Babenko on the convergence rate in that theorem. Most recently, Vallée et al. have had signal success with this circle of ideas, analyzing for instance the lattice reduction algorithm in two dimensions [Val95].

Received July 14, 1998.

Mathematics Subject Classification. 11K50 primary, 11A55, 60G50 secondary.

Key words and phrases. continued fractions, distribution, random variable.

© 1998 State University of New York

ISSN 1076-9803/98

This approach uses functional analysis and classical probability. Discussion of linear operators, eigenvalues, and eigenvectors requires that a linear space be specified. We shall get around to this, but for now it suffices to note that the definition below of $L$ would make sense for any reasonable space of functions. At the heart of our approach lies the fact that if $X$ is a random variable on $U := [0,1]\setminus\mathbb{Q}$ with density $f$, then $T^n X$ has density $L^n f$, where

\[ Lf(t) = \sum_{k=1}^{\infty} (k+t)^{-2} f(1/(k+t)) \]

(for $t \in [0,1]$, else zero), and that $L$ has dominant eigenvalue 1 with corresponding eigenfunction $g(t) := 1/(\log 2\,(1+t))$. From this it follows that well-separated values of $T^n X$ are nearly independent random variables, so that the usual tools of classical probability can be brought into play.

These statements about random variables and density can be rephrased so as to avoid explicit mention of probability: $X$ is a measurable function from $U$ to $U$, and $\mu : [0,1] \to [0,1]$ is defined by $\mu(y) = m(\{x \in U : X(x) \le y\})$, where $m$ denotes Lebesgue measure. If $\mu$ is differentiable on $[0,1]$, then $f := \mu'$ is the density of the random variable $X$. Similarly, the density of $T^n X$ is

\[ (d/dy)\, m(\{x \in U : T^n X(x) \le y\}) = L^n f, \]

a fact which is used in the pioneering work mentioned above and in all subsequent developments along this line.
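The eigenfunction claim for $L$ can be checked numerically. The following is a minimal sketch, assuming the series for $Lf$ is simply truncated after a fixed number of terms; the names `apply_L`, `g` and the parameter `terms` are ours and not part of the paper. For $f = g$ the summands telescope, since $(k+t)^{-2} g(1/(k+t)) = \frac{1}{\log 2}\bigl(\frac{1}{k+t} - \frac{1}{k+t+1}\bigr)$, so the truncation error is exactly $1/(\log 2\,(K+1+t))$ after $K$ terms.

```python
from math import log

def apply_L(f, t, terms=50_000):
    """Truncated (L f)(t) = sum_{k>=1} (k+t)^(-2) f(1/(k+t)), for 0 <= t <= 1.

    `terms` is an illustrative truncation level, not something from the paper.
    """
    return sum(f(1.0 / (k + t)) / (k + t) ** 2 for k in range(1, terms + 1))

def g(t):
    """Gauss density g(t) = 1 / (log 2 * (1 + t))."""
    return 1.0 / (log(2) * (1.0 + t))

# Compare L g with g at a few sample points; they agree up to truncation error.
for t in (0.0, 0.3, 1.0):
    print(t, apply_L(g, t), g(t))
```

Replacing `g` by the uniform density `lambda s: 1.0` in the loop gives $L1$, which is already visibly closer to $g$ than the constant function is at mid-range values of $t$.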

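Recall (from the abstract) that $p_n/q_n = (p_n/q_n)(x)$ denotes the $n$th simple continued fraction convergent to an arbitrary irrational number $x \in (0,1)$, while $\theta_n(x) := q_n^2\,|x - p_n/q_n|$. Also,

\[ F(z) := \begin{cases} z/\log 2 & \text{if } 0 \le z \le 1/2, \\ \frac{1}{\log 2}\bigl(1 - z + \log(2z)\bigr) & \text{if } 1/2 \le z \le 1. \end{cases} \]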

Our main result is

Theorem 1.1.

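If $(k_j)$ is a strictly increasing sequence of positive integers, and $0 < z < 1$, then (with respect to Lebesgue measure)

\[ \lim_{n\to\infty} \frac{1}{n}\,\bigl|\{j : 1 \le j \le n \text{ and } \theta_{k_j}(x) \le z\}\bigr| = F(z) \quad \text{for almost all } x \in (0,1). \]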
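The statement of Theorem 1.1 can be illustrated experimentally. The sketch below is only an illustration under stated assumptions: a random rational with a 4000-bit denominator stands in for a "typical" irrational $x$ (its first 800 partial quotients are computed exactly), and the strictly increasing sequence is taken to be $k_j = 2j$. The function names and these parameter choices are ours.

```python
from fractions import Fraction
from math import log
import random

def F(z):
    """Limiting distribution F(z) of Theorem 1.1, for 0 <= z <= 1."""
    if z <= 0.5:
        return z / log(2)
    return (1 - z + log(2 * z)) / log(2)

def theta_sequence(x, n):
    """Return [theta_1(x), ..., theta_n(x)] for a Fraction x in (0, 1)."""
    p_prev, q_prev = 1, 0      # p_{-1}, q_{-1}
    p, q = 0, 1                # p_0, q_0 (a_0 = 0 because 0 < x < 1)
    t = x                      # t equals T^k x after k steps
    thetas = []
    for _ in range(n):
        if t == 0:             # the expansion of a rational x ended early
            break
        inv = 1 / t
        a = inv.numerator // inv.denominator   # next partial quotient
        t = inv - a
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        thetas.append(q * q * abs(x - Fraction(p, q)))
    return thetas

def frequency(thetas, indices, z):
    """Fraction of the (1-based) indices k with theta_k <= z."""
    indices = list(indices)
    return sum(thetas[k - 1] <= z for k in indices) / len(indices)

random.seed(1)                                  # fixed seed, for reproducibility
BITS = 4000                                     # assumed precision of the stand-in for x
x = Fraction(random.getrandbits(BITS) | 1, 1 << BITS)
thetas = theta_sequence(x, 800)
for z in (0.25, 0.5, 0.75):
    # empirical frequency along k_j = 2j, next to the predicted limit F(z)
    print(z, frequency(thetas, range(2, 801, 2), z), F(z))
```

With only 400 sample indices the empirical frequencies fluctuate by a few percent around $F(z)$; the theorem itself concerns the limit $n \to \infty$.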

2. Probabilities and operators

The probability $Q_{r,f}(z)$ that $\theta_r(X) \le z$, when the initial density for $X$ is $f$, is essentially $F(z)$, as we shall see. In the case $f \equiv 1$, this is due to Knuth [BJW83].

What is new here is that there is a kind of near-independence of these events for widely separated values of $r$, and uniformly over a certain class of initial probability distributions $f$.

Let $V_r$ denote the $r$-fold Cartesian product of the positive integers. For an arbitrary sequence $v \in V_r$ of $r$ positive integers, let $[v] := [0; v_1, v_2, \dots, v_r] = p_r/q_r$, let $\{v\} := [0; v_r, v_{r-1}, \dots, v_1] = q_{r-1}/q_r$, and let $|v| := q_r$. Let $v^- := (v_2, \dots, v_r)$ and let $v_- := (v_1, \dots, v_{r-1})$, so that $p_r = |v^-|$ and $q_{r-1} = |v_-|$. Then for $x \in [0,1]\setminus\mathbb{Q}$,

\[ x = \frac{p_r + (T^r x)\,p_{r-1}}{q_r + (T^r x)\,q_{r-1}}, \tag{1} \]

\[ \theta_r(x) = \frac{T^r x}{1 + \{v\}\,T^r x}, \]

\[ L^r\bigl[(1+ut)^{-2}\bigr] = \sum_{v \in V_r} |v|^{-2}\,(1 + u[v])^{-2}\,\bigl(1 + (\{v\} + u\{v^-\})t\bigr)^{-2}, \]

\[ (L^r f)(t) = \sum_{v \in V_r} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t]), \]

where $[v+t] := [0; v_1, v_2, \dots, v_{r-1}, v_r + t]$. (Thus if $f$ is a convex combination of probability density functions of the form $(1+u)(1+ut)^{-2}$, then so is $Lf$.) The claim above about the dominant eigenvalue of $L$ can now be given specific content. For an arbitrary function $f$ of bounded variation and zero except in $[0,1]$ (which we will call good), let $\|f\|$ be the total variation of $f$ on the real line. (Thus any probability density function has norm at least 2.) We have (specializing from Lemma 6 [Hen92, p. 346])

Lemma 2.1.

Uniformly over $0 \le t \le 1$ and good probability density functions $f$ of bounded variation,

\[ L^r f = \frac{1}{\log 2\,(1+t)} + O\bigl(((\sqrt{5}-1)/2)^r\,\|f\|\bigr). \]
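The notation $[v]$, $\{v\}$, $|v|$ and identity (1), together with the formula for $\theta_r$, can be checked in exact rational arithmetic. The following is a small sketch; the helper names `continuants` and `cf_value` and the sample values $v = (1,2,3)$, $t = 2/7$ are ours, chosen only for illustration.

```python
from fractions import Fraction

def continuants(v):
    """Return (p_r, q_r, p_{r-1}, q_{r-1}) for the convergents of [0; v_1, ..., v_r]."""
    p_prev, q_prev, p, q = 1, 0, 0, 1
    for a in v:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q, p_prev, q_prev

def cf_value(v, t=Fraction(0)):
    """Evaluate [v + t] = [0; v_1, ..., v_{r-1}, v_r + t], working from the inside out."""
    x = Fraction(v[-1]) + t
    for a in reversed(v[:-1]):
        x = a + 1 / x
    return 1 / x

v = (1, 2, 3)                         # sample partial quotients (illustrative)
t = Fraction(2, 7)                    # plays the role of T^r x
p, q, p1, q1 = continuants(v)
bracket = Fraction(p, q)              # [v] = p_r / q_r
brace = Fraction(q1, q)               # {v} = q_{r-1} / q_r
x = cf_value(v, t)                    # x = [v + t]
theta = q * q * abs(x - bracket)      # theta_r(x) = q_r^2 |x - p_r / q_r|

print(bracket, brace, x, theta)
```

Here $\{v\}$ computed from the continuants agrees with the reversed continued fraction $[0; 3, 2, 1]$, and $\theta_r$ computed from its definition agrees with $t/(1 + \{v\}t)$.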

Again let $f$ be a good probability density function. Let $I(S) := 1$ if $S$ is a true statement, else zero. Let $X$ be a random variable on $[0,1]$ with density $f$. Then the density of $T^r X$ is $L^r f$. Let $v = v(x,r)$ denote the sequence of the first $r$ partial quotients in the continued fraction expansion of $x$, so that $x = [v+t]$ where $t = T^r x$. We now consider the probability $P_{r,z,f}$ that $\{v(X,r)\} \le z$. In the case $r = 1$, this is

\[ P_{1,z,f} = \int_{x=0}^{1} f(x)\, I\bigl(1/\lfloor 1/x \rfloor \le z\bigr)\,dx = \sum_{\{k \,:\, 1/k \le z\}} \int_{t=0}^{1} (k+t)^{-2} f(1/(k+t))\,dt \]

on substituting $x = 1/(k+t)$. The similar substitution

\[ x = [v+t] = [0; v_1, v_2, \dots, v_{r-1}, v_r + t] \]

has $dx/dt = \pm|v|^{-2}(1 + \{v\}t)^{-2}$, so the probability $P_{r,z,f}$ that $\{v\} \le z$ is given by

\[ P_{r,z,f} = \int_{t=0}^{1} \sum_{v \in V_r,\ \{v\} \le z} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t])\,dt. \tag{2} \]

Much of what lies ahead has to do with conditional probabilities, conditional random variables, and their associated conditional densities. If $E_1$ is an event (measurable subset of $U$) with positive probability (that is, $m(E_1) > 0$), then the conditional probability of another event $E_2$ given $E_1$ is by definition

\[ \mathrm{prob}(E_1 \text{ and } E_2)/\mathrm{prob}(E_1) = m(E_1 \cap E_2)/m(E_1). \]

Most of our events have to do with some random variable $X$ on $U$ with density $f$, and some function $\tau : U \to U$. Suppose $D_1 \subseteq \mathbb{R}$ and let $E_1 = X^{-1}D_1 = \{x \in U : X(x) \in D_1\}$. The conditional probability that $\tau(X) \in D_2$ given $X \in D_1$ is then $m((\tau \circ X)^{-1}D_2 \cap X^{-1}D_1)/m(X^{-1}D_1)$, and the conditional density for $\tau(X)$ given that $X \in D_1$ is the function

\[ t \mapsto (d/dt)\, m(\{x : \tau(X(x)) \le t \text{ and } X(x) \in D_1\})/m(X^{-1}(D_1)). \]

The conditional density for $T^r X$, given that $\{v(X,r)\} \le z$, is

\[ g_{r,z,f} := \sum_{v \in V_r,\ \{v\} \le z} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t]) \,\big/\, P_{r,z,f}. \tag{3} \]

Let $h_{r,z,f}(t) := g_{r,z,f}(t)\,P_{r,z,f}$. By convention both $f$ and $g$ take the value 0 off $[0,1]$.

Because $g_{r,z,f}$ is the conditional density of $T^r X$ given $\{v(X,r)\} \le z$, where $X$ is a random variable on $(0,1)$ with density $f$, it follows that if $Y$ is a random variable on $(0,1)$ with density $g := g_{r,z,f}$ and $B$ is a measurable subset of $[0,1)$, then

\[ \mathrm{prob}[\{v(X,r)\} \le z \text{ and } T^r X \in B]\,/\,\mathrm{prob}[\{v(X,r)\} \le z] = \mathrm{prob}[Y \in B]. \tag{4} \]

We are now in a position to state and prove

Theorem 2.1.

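Uniformly over good probability density functions $f$, over $0 \le z \le 1$, and over $0 \le t \le 1$,

\[ h_{r,z,f} := \sum_{v \in V_r,\ \{v\} \le z} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t]) \]

is good and satisfies

\[ h_{r,z,f}(t) = \frac{z}{\log 2\,(1+tz)} + O\bigl(r z\,((\sqrt{5}-1)/2)^r\,\|f\|\bigr). \]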

Proof.

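It is clear that the resulting function again has bounded variation. The real issue is whether it satisfies the estimate. We begin by proving a weaker version of the theorem, in which the $z$ in the error term is replaced with 1. For this weaker version, the ground floor of induction is trivial. Let $\lambda = (\sqrt{5}-1)/2$. Let $V_r(z)$ denote the set of all $v \in V_r$ so that $\{v\} = q_{r-1}/q_r \le z$. Now assume that for some $r$,

\[ \Bigl|\, \sum_{v \in V_r(z)} |v|^{-2}\,(1 + t\{v\})^{-2}\, f([v+t]) - \frac{z}{\log 2\,(1+tz)} \Bigr| \le C_r \lambda^r. \]

Write an element of $V_{r+1}$ as $(v,n)$ with $v \in V_r$ and $n \ge 1$, so that $|(v,n)| = |v|(n + \{v\})$ and $\{(v,n)\} = 1/(n + \{v\})$, and write $[v, n+t] := [0; v_1, \dots, v_r, n+t]$. Then

\begin{align*}
\sum_{v \in V_{r+1}(z)} |v|^{-2}\,(1 + t\{v\})^{-2}\, f([v+t])
&= \sum_{n=[1/z]+1}^{\infty} n^{-2} \sum_{v \in V_r} |v|^{-2}\,(1 + \{v\}/n)^{-2}\,\bigl(1 + t/(n + \{v\})\bigr)^{-2}\, f([v, n+t]) \\
&\quad + \sum_{v \in V_r \setminus V_r(\langle 1/z\rangle)} |v|^{-2}\,([1/z] + t + \{v\})^{-2}\, f([v, [1/z]+t]) \\
&= \sum_{n=[1/z]}^{\infty} (n+t)^{-2} \sum_{v \in V_r} |v|^{-2}\,\bigl(1 + \{v\}/(n+t)\bigr)^{-2}\, f([v, n+t]) \\
&\quad - \sum_{v \in V_r(\langle 1/z\rangle)} |v|^{-2}\,([1/z]+t)^{-2}\,\bigl(1 + \{v\}/([1/z]+t)\bigr)^{-2}\, f([v, [1/z]+t]).
\end{align*}

The first term here is

\[ \sum_{n=[1/z]}^{\infty} (n+t)^{-2}\, g(1/(n+t)) + O(\lambda^r z\,\|f\|) \]

from Lemma 2.1, while the second term is

\[ -([1/z]+t)^{-2} \Bigl( \frac{1}{\log 2}\,\frac{\langle 1/z\rangle}{1 + \langle 1/z\rangle/([1/z]+t)} + \eta\, C_r\,(\lambda^r \|f\|) \Bigr), \]

where $|\eta| \le 1$, on the induction hypothesis. The total thus simplifies to

\[ \frac{z}{\log 2\,(1+tz)} + O(\lambda^r \|f\|) + \eta\, C_r\,[1/z]^{-2}\,\|f\|. \]

This leads to a recurrence $C_{r+1} = C_r + O(1)$, from which it follows that $(r^{-1}C_r)$ is bounded, and the weak version of the theorem follows. For the strong version, we just note that, given the weak version, the strong version follows immediately, since the two error terms in passing from $r$ to $r+1$ were $O(\lambda^r z + C_r[1/z]^{-2})$.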

Corollary 2.2.

Uniformly over good probability density functions $f$, over $0 \le z \le 1$, and over $0 \le t \le 1$,

\[ \sum_{v \in V_r,\ \{v\} > z} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t]) = \frac{1}{\log 2}\,\frac{1-z}{(1+t)(1+tz)} + O\bigl(r(1-z)\lambda^r\,\|f\|\bigr). \]

In view of Corollary 2.2, the conditional density for $T^r X$ given initial density $f$ and given that $q_{r-1}/q_r(X) \le z$ is, on the one hand, a good density, and on the other hand, $z/((1+tz)\log(1+z)) + O(r\lambda^r\|f\|)$, while the conditional density, given that $q_{r-1}/q_r(X) > z$, is again on the one hand a good density, and on the other, $(1-z)/((\log 2 - \log(1+z))(1+t)(1+tz)) + O(r\lambda^r\|f\|)$. Now consider a sequence $(k_j)$ of positive integers such that $k_1 \ge r$ and $k_{j+1} - k_j \ge r$ for all $j$.

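The probability $Q_{r,f}(z)$ that $\theta_r(X) \le z$, when the initial density for $X$ is $f$, is

\begin{align*}
Q_{r,f}(z) &= \sum_{v \in V_r} |v|^{-2} \int_{t=0}^{1} (1 + \{v\}t)^{-2}\, f([v+t])\, I\bigl(t/(1 + \{v\}t) \le z\bigr)\,dt \\
&= \int_{t=0}^{1} \sum_{v \in V_r} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t])\,\bigl(1 - I(\{v\} < 1/z - 1/t)\bigr)\,dt.
\end{align*}

Let $\tilde{u} := 0$ if $u < 0$, $u$ if $0 \le u \le 1$, and 1 if $u > 1$. Taking $\widetilde{(1/z - 1/t)}$ in place of $z$ in Theorem 2.1 breaks into two cases. If $z \le 1/2$ we get

\begin{align*}
Q_{r,f}(z) &= 1 - \frac{1}{\log 2} \Bigl( \int_{z}^{z/(1-z)} (1/t - z/t^2)\,dt + \int_{z/(1-z)}^{1} \frac{1}{1+t}\,dt \Bigr) + O(\lambda^r \|f\|) \\
&= \frac{z}{\log 2} + O(\lambda^r \|f\|).
\end{align*}

For $z > 1/2$ we get instead

\[ Q_{r,f}(z) = 1 - \frac{1}{\log 2} \int_{z}^{1} (1/t - z/t^2)\,dt + O(\lambda^r \|f\|). \]

In either case, this says

\[ Q_{r,f}(z) = F(z) + O(\lambda^r)\,\|f\| \tag{5} \]

as claimed at the outset of this section.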
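The two integral evaluations leading to (5) can be confirmed numerically. The sketch below integrates the main term of Theorem 2.1, with the clamped quantity $w = \widetilde{(1/z - 1/t)}$ substituted for $z$, over $0 < t < 1$ by the midpoint rule, and compares the result with $F(z)$. The function names and the step count are our own choices; the error terms of the theorem are ignored, so this checks only the limiting value.

```python
from math import log

def F(z):
    """F(z) as defined in the Introduction, for 0 < z <= 1."""
    if z <= 0.5:
        return z / log(2)
    return (1 - z + log(2 * z)) / log(2)

def clamp(u):
    """The truncation u -> 0 if u < 0, u if 0 <= u <= 1, 1 if u > 1."""
    return min(1.0, max(0.0, u))

def limiting_Q(z, steps=100_000):
    """Midpoint-rule value of 1 - int_0^1 w / (log 2 * (1 + t w)) dt, w = clamp(1/z - 1/t)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = clamp(1.0 / z - 1.0 / t)
        total += w / (1.0 + t * w)
    return 1.0 - total * h / log(2)

for z in (0.2, 0.5, 0.8):
    print(z, limiting_Q(z), F(z))
```

Note that for $0 \le w \le 1$ unclamped, $w/(1+tw) = 1/t - z/t^2$, which is the integrand appearing in both cases above; the clamp at 1 produces the extra integrand $1/(1+t)$ when $z \le 1/2$.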

Next we consider the conditional density $\rho_{r,z,\sigma}f(t)$ of $T^r X$ when the initial density for $X$ is $f$, given that $\theta_r(X) \le z$ or $\theta_r(X) > z$ according to whether $\sigma = 0$ or 1. This is again a good density, we claim, with norm not more than a constant multiple of $\|f\|$ (the multiple may and does depend on $z$).

We first note that the probability, on initial density $f$, that $\theta_r(X) \le z$ is at least $Cz + O(\lambda^r)$. So for fixed $z > 0$, and $r$ sufficiently large, it is at least $Kz$. Similarly, the probability that $\theta_r(X) \ge z$ is at least $K(1-z)$ for $r$ sufficiently large. Next, we need some estimate of the possible growth of $\|\rho_{r,z,\sigma}f\|/\|f\|$ with increasing $r$. The scaling factor that results from dividing

\[ \sum_{v \in V_r} |v|^{-2}\,(1 + \{v\}t)^{-2}\, f([v+t])\,\bigl(1 - I(\{v\} < 1/z - 1/t)\bigr) \]

by $Q_{r,f}(z)$ or by $(1 - Q_{r,f}(z))$, according to whether $\sigma = 0$ or 1, is at most $O(1/z)$ or $O(1/(1-z))$. Apart from this effect, we have a norm of at most

\[ \sum_{v \in V_r} |v|^{-2}\,\bigl\| (1 + \{v\}t)^{-2}\, f([v+t])\, I(\{v\} \gtrless z^{-1} - t^{-1}) \bigr\|. \]

A lemma on total variation is now needed.

Lemma 2.2.

If $g$ is a positive function of bounded variation on $\mathbb{R}$ and is zero outside $[0,1]$, and $h$ is a positive, increasing function on $\mathbb{R}$ with $h(1) = 1$, or positive and decreasing with $h(0) = 1$, then the total variation of $gh$ is no more than that of $g$.

Proof.

Write $g$ as $g_1 - g_2$ where both are zero on $(-\infty, 0)$, increasing, and constant on $(1, \infty)$, so that the total variation of $g$ is equal to $2g_1(1) = 2g_2(1)$. Then $gh = g_1 h - g_2 h$. This gives a representation of $gh$ as the difference of two increasing functions, both zero on $(-\infty, 0)$ and both equal and constant on $(1, \infty)$.

By reflecting $g$ we see that the same holds if $h$ is positive and decreasing with $h(0) = 1$.
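Lemma 2.2 can be sanity-checked on step functions. The sketch below is a discrete illustration only, not a substitute for the proof: $g$ is modelled by random positive values on a grid in $[0,1]$ and is zero elsewhere, and $h$ by an increasing (respectively decreasing) sequence with value 1 at the right (respectively left) endpoint. The name `total_variation` and the grid size are our assumptions.

```python
import random

def total_variation(values):
    """Total variation on R of a step function with these values on [0, 1] and 0 elsewhere."""
    padded = [0.0] + list(values) + [0.0]
    return sum(abs(b - a) for a, b in zip(padded, padded[1:]))

random.seed(0)                                     # fixed seed, for reproducibility
m = 50                                             # number of grid cells (illustrative)
g = [random.uniform(0.1, 1.0) for _ in range(m)]   # positive on [0, 1], zero outside

# h positive and increasing with h(1) = 1
h_inc = sorted(random.uniform(0.1, 1.0) for _ in range(m - 1)) + [1.0]
# h positive and decreasing with h(0) = 1
h_dec = list(reversed(h_inc))

gh_inc = [a * b for a, b in zip(g, h_inc)]
gh_dec = [a * b for a, b in zip(g, h_dec)]

print(total_variation(g), total_variation(gh_inc), total_variation(gh_dec))
```

In this discrete setting the inequality can be seen directly: each downward step of $gh$ is at most the corresponding downward step of $g$ because $0 < h \le 1$ on $[0,1]$, and the total variation of a function vanishing at both ends is twice the sum of its downward steps.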

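Now by repeated application of the lemma, and taking into account that the total variation of $f([v+t])$ is no more than that of $f$, we calculate that

\begin{align*}
\sum_{v \in V_r} |v|^{-2}\,\bigl\| (1 + \{v\}t)^{-2}\, f([v+t])\, I(\{v\} \gtrless z^{-1} - t^{-1}) \bigr\|
&\le \sum_{v \in V_r} |v|^{-2}\,\bigl\| (1 + \{v\}t)^{-2}\, f([v+t]) \bigr\| \tag{6} \\
&\le \sum_{v \in V_r} |v|^{-2}\,\bigl\| f([v+t]) \bigr\| \le \Bigl( \sum_{v \in V_r} |v|^{-2} \Bigr) \|f\| \le 2\|f\|.
\end{align*}

From this it follows that for all $\delta \in (0, 1/2)$, and whether we require $\theta_r(X) < z$ or $\theta_r(X) > z$, there exist $R(\delta) > 0$ and $K(\delta) > 0$ so that $\|\rho_{r,z,\sigma}f\| \le K(\delta)\|f\|$, uniformly in $\delta \le z \le 1 - \delta$ for $r \ge R(\delta)$.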

Now the probability, on initial density $f$, that $\theta_{k_1}(X) \le z$, is $F(z) + O(\lambda^r\|f\|)$. (Recall, all $k_j > r$.) The conditional density of $T^{k_1}X = Y$ (say) given this event is a normalized version of the sum of all terms (with $v \in V_{k_1}$) of the form $|v|^{-2}(1 + \{v\}t)^{-2}\, I[t \le z/(1 - z\{v\})]\, f([v+t])$, or given instead the complementary event, the same expression but with $I[t > z/(1 - z\{v\})]$ in place of $I[t \le z/(1 - z\{v\})]$.

In either case, this is a function of bounded variation. In either case, provided $z \in [\delta, 1-\delta]$, that variation is bounded by $K(\delta)\|f\|$.

Fix $n$ large, and consider the sequence $T^{k_j}X$, $1 \le j \le n$. Fix $\epsilon > 0$. To satisfy a technical condition at the end of the paper, we also require that $\epsilon/(F(z) + \epsilon) < 1/4$.

Consider the probability that more than $(F(z) + 2\epsilon)n$ of the $j$ have $\theta_{k_j} \le z$. For large $n$, we can break this finite sequence into $O(n^{3/4})$ subsequences, each of which has $k_{j+1} > k_j + n^{3/4}$, and each of which has at most, but on the order of, $n^{1/4}$ entries. This way, $r = n^{3/4}$ is comfortably larger than the number of trials $O(n^{1/4})$ in a subsequence. The initial density $f_0$ is the density of $T^{k}X$, where $k$ is the least of the $k_j$ in our subsequence. For each such subsequence of length $N$ say ($N$ depends on the subsequence), the event $E(\sigma_1, \dots, \sigma_N)$, where $(\sigma_1, \sigma_2, \dots, \sigma_N) \in \{0,1\}^N$, is the set of all $x \in [0,1)\setminus\mathbb{Q}$ for which $\theta_{k_j} < z$ if and only if $\sigma_j = 0$, $1 \le j \le N$.

Let $r_j := k_{j+1} - k_j$, with $r_0 := k_1$. A step consists of replacing $f_j$ with $f_{j+1} := \rho_{r_j, z, \sigma_{j+1}} f_j$, the conditional density, given that $\theta_{k_j}(Y) < z$ or $\theta_{k_j}(Y) > z$ as the case may be, of $T^{r_j}Y$, where $Y$ is a random variable on $U$ with density $f_j$. Thus the input to the next step is again a good density, with a certain norm. The norm of the `working' $f_j$ may increase, by at most a factor of $K(\delta)$ each time. If the number $N$ of steps is small compared to the minimum $r_j$, this is not much of a problem, because at each stage, the working `initial' probability density function $f_j$ has norm no greater than $K^j$. The probability, at each trial within the subsequence, and for any prior history of trials within that subsequence, that $\theta_{k_j} < z$, is less than $F(z) + \epsilon$. That is, the conditional probability that $\theta_{k_{j_m}} < z$, given that $\theta_{k_{j_l}} < z$ exactly when $\sigma_l = 0$ for $1 \le l < m$, is less than $F(z) + \epsilon$.

(We take $n$ large enough that $K(\delta)^{n^{1/4}} \lambda^{n^{3/4}} < \epsilon/2$.) The probability that a particular subsequence has more than its own length, multiplied by $F(z) + 2\epsilon$, cases of $\theta_{k_j} < z$, can be shown (see below) to be bounded above by $O(\exp(-C\epsilon^2 n^{1/4}))$.

Keeping in mind that this claim has yet to be established, we continue with the main line of the proof.

The probability that any one of the subsequences has such an atypical success ratio is $O(n^{3/4}\exp(-C\epsilon^2 n^{1/4}))$, which tends to zero. This shows that the probability of an atypically high success ratio is asymptotically zero. The same arguments apply to the case of an atypically low success ratio, simply by redefining success to mean the complementary event. This proves that Nair's theorem¹ holds for any strictly increasing sequence $(k_j)$ of positive integers, as claimed.
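The paper does not spell out how the finite sequence is broken into subsequences. One way that meets the stated requirements, offered here purely as an assumption, is to group the indices $j$ by residue modulo $M$, where $M$ is the least integer exceeding $n^{3/4}$: since the $k_j$ are strictly increasing integers, $k_{j+M} \ge k_j + M > k_j + n^{3/4}$, and each class has about $n/M \approx n^{1/4}$ members. The function name `split_by_gap` is ours.

```python
def split_by_gap(ks):
    """Split a strictly increasing list of integers into classes j mod M.

    M is the least integer with M > n^(3/4), where n = len(ks); it is found
    in exact integer arithmetic to avoid floating-point rounding at the boundary.
    This grouping is an illustrative assumption, not taken from the paper.
    """
    n = len(ks)
    M = 1
    while M ** 4 <= n ** 3:
        M += 1
    return [ks[s::M] for s in range(M)], M

n = 10_000
ks = list(range(1, n + 1))          # the tightest case: k_j = j
parts, M = split_by_gap(ks)
print(M, len(parts), max(len(p) for p in parts))
```

For $n = 10^4$ this gives $M = 1001$ subsequences of at most $10 = n^{1/4}$ entries each, with consecutive entries more than $n^{3/4} = 1000$ apart.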

3. Non-independent trials are good enough.

`The probability that a particular subsequence has more than its own length, multiplied by $F(z) + 2\epsilon$, cases of $\theta_{k_j} < z$, can be shown (see below) to be bounded above by $O(\exp(-C\epsilon^2 n^{1/4}))$.' We now make good on this claim. For the remainder of this section, we shall use $n$ to mean the number of trials. This new value of $n$ will be on the order of the old value of $n^{1/4}$.

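We have a kind of near-independence: If

\[ E := E(k_1, k_2, \dots, k_n, \sigma_1, \sigma_2, \dots, \sigma_n) = \{x \in [0,1]\setminus\mathbb{Q} : \theta_{k_j}(x) < z \text{ iff } \sigma_j = 0\} \tag{7} \]

then the conditional probability that $\theta_{k_{n+1}} < z$ given that $x \in E$ is $F(z) + O(\lambda^r)$ and so less than $F(z) + \epsilon$. Thus if $(a_0, a_1)$ is the sequence

\[ (a_0, a_1) := \bigl(\mathrm{prob}[\theta_{k_1}(x) \ge z],\ \mathrm{prob}[\theta_{k_1}(x) < z]\bigr) \]

and $(b_0, b_1)$ the sequence $(1 - \epsilon - F(z),\ F(z) + \epsilon)$, then $a_0 > b_0$ and $a_0 + a_1 = 1 = b_0 + b_1$.

¹ See http://nyjm.albany.edu:8000/j/1998/3A-9.html.

Given two sequences $(a_0, a_1, \dots, a_n)$ and $(b_0, b_1, \dots, b_n)$ of non-negative numbers summing to 1, we say that $a :> b$ if for all $k < n$, $\sum_{j=0}^{k} a_j > \sum_{j=0}^{k} b_j$.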

Lemma 3.1.

If $a :> b$, if $(u_k)$ and $(v_k)$ are sequences of numbers in $[0,1]$ with $u_k < v_k$ for all $k$, and if $a'$ is given by $a'_j = (1 - u_j)a_j + u_{j-1}a_{j-1}$ and $b'$ by $b'_j = (1 - v_j)b_j + v_{j-1}b_{j-1}$ (setting $a_{-1} = b_{-1} := 0$), then $a' :> b'$.

Proof.

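We have

\begin{align*}
\sum_{j=0}^{k} (a'_j - b'_j) &= \sum_{j=0}^{k} \bigl[(1 - u_j)a_j + u_{j-1}a_{j-1} - (1 - v_j)b_j - v_{j-1}b_{j-1}\bigr] \\
&= (1 - v_k)\sum_{j=0}^{k} (a_j - b_j) + v_k \sum_{j=0}^{k-1} (a_j - b_j) + a_k(v_k - u_k) > 0.
\end{align*}

We apply this lemma with $a = (a_0, a_1, \dots, a_n)[k_1, k_2, \dots, k_n, \sigma_1, \sigma_2, \dots, \sigma_n, z]$, defined by

\[ a_m := \mathrm{prob}\bigl[\#\{j : 1 \le j \le n \text{ and } \theta_{k_j} < z\} = m\bigr] \quad \text{for } 0 \le m \le n, \]

and $b := (b_0, b_1, \dots, b_n)$ where

\[ b_m := \binom{n}{m} (F(z) + \epsilon)^m (1 - F(z) - \epsilon)^{n-m}, \quad 0 \le m \le n. \]

The claim is that with this $a$ and $b$, $a :> b$. The proof is inductive, using Lemma 3.1 $n$ times.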
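Lemma 3.1 and its intended use can be tried out in exact arithmetic. In the sketch below, the sequences $a$ and $b$ are binomial distributions with success probabilities $1/4$ and $1/2$, which is a simplifying assumption of ours: in the paper the $u_j$ are conditional probabilities that vary with the history, whereas here they are constant or chosen at random below $v_j$. The names `dominates`, `step` and `binomial` are ours.

```python
from fractions import Fraction
from math import comb
import random

def dominates(a, b):
    """a :> b, i.e. every proper partial sum of a strictly exceeds that of b."""
    sa = sb = 0
    for aj, bj in zip(a[:-1], b[:-1]):
        sa += aj
        sb += bj
        if not sa > sb:
            return False
    return True

def step(a, u):
    """a'_j = (1 - u_j) a_j + u_{j-1} a_{j-1}, with a_{-1} = 0 and a_{n+1} = 0.

    u must have one more entry than a; the result is one entry longer than a.
    """
    ext = list(a) + [0]
    return [(1 - u[j]) * ext[j] + (u[j - 1] * ext[j - 1] if j > 0 else 0)
            for j in range(len(ext))]

def binomial(n, p):
    """Distribution of the number of successes in n independent trials."""
    return [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]

half, quarter = Fraction(1, 2), Fraction(1, 4)
a, b = binomial(3, quarter), binomial(3, half)
v = [half] * 5
b_next = step(b, v)                       # one more fair trial
a_next = step(a, [quarter] * 5)           # one more trial with u_j = 1/4 < v_j

random.seed(0)                            # fixed seed, for reproducibility
u_random = [Fraction(random.randint(0, 49), 100) for _ in range(5)]  # any u_j < 1/2
a_random = step(a, u_random)

print(dominates(a, b), dominates(a_next, b_next), dominates(a_random, b_next))
```

With constant $v_j$ the update of $b$ is exactly the passage from the binomial distribution on $n$ trials to that on $n+1$ trials, which is how the lemma is applied $n$ times in the text.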

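We shall be using a succession of $a$'s and $b$'s, which will be denoted by superscripts.

\[ a^1 := (a^1_0, a^1_1) = \bigl(\mathrm{prob}[\theta_{k_1} \ge z],\ \mathrm{prob}[\theta_{k_1} < z]\bigr) \quad \text{while} \quad b^1 := (1 - F(z) - \epsilon,\ F(z) + \epsilon). \]

We have $a^1 :> b^1$ because $\mathrm{prob}[\theta_{k_1} \ge z] > 1 - F(z) - \epsilon$. This so far uses only the definition of $:>$ and earlier material, but not Lemma 3.1.

Now let $a^2_0 := \mathrm{prob}[\theta_{k_1} \ge z \text{ and } \theta_{k_2} \ge z]$, $a^2_1 := \mathrm{prob}[\text{one small theta}]$, and $a^2_2 := \mathrm{prob}[\theta_{k_1} < z \text{ and } \theta_{k_2} < z]$,

\[ a^2 := (a^2_0, a^2_1, a^2_2) \]

and

\[ b^2 := \bigl((1 - F(z) - \epsilon)^2,\ 2(1 - F(z) - \epsilon)(F(z) + \epsilon),\ (F(z) + \epsilon)^2\bigr). \]

We take $a = a^1$, $b = b^1$, $a' = a^2$, and $b' = b^2$ in Lemma 3.1. Then we have

\[ a'_0 = (1 - u_0)a_0, \qquad a'_1 = (1 - u_1)a_1 + u_0 a_0, \qquad a'_2 = (1 - u_2)\cdot 0 + u_1 a_1, \]

where $u_0 = \mathrm{prob}[\theta_{k_2} < z \text{ given } \theta_{k_1} \ge z]$ and $u_1 = \mathrm{prob}[\theta_{k_2} < z \text{ given } \theta_{k_1} < z]$, while $v_0 = v_1 = F(z) + \epsilon$. Applying Lemma 3.1 gives $a^2 :> b^2$. Inductively, it gives $a^n :> b^n$, which says that $a :> b$.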

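Thus the probability that more than $n(F(z) + 2\epsilon)$ cases of $\theta_{k_j} < z$ occur among the first $n$ values of $k_j$ is less than the probability, with a Bernoulli process which gives `heads' with probability $F(z) + \epsilon$, that more than $n(F(z) + 2\epsilon)$ of the first $n$ trials come up heads.

By standard exponential centering, this probability is, we claim, less than $\exp(-(3/8)n\epsilon^2)$. Let $Y_j$ be independent Bernoulli trials with a coin taking heads with probability $\alpha = F(z) + \epsilon$. (If $F(z) + \epsilon > 1$ the probability in question is zero.) Now for $\beta > 0$,

\begin{align*}
\mathrm{Prob}\Bigl[\sum_{j=1}^{n} Y_j \ge n(\alpha + \epsilon)\Bigr]
&\le e^{-\beta n(\alpha + \epsilon)} \sum_{k \ge n(\alpha + \epsilon)} e^{\beta k}\, \mathrm{Prob}\Bigl[\sum_{j=1}^{n} Y_j = k\Bigr] \\
&\le e^{-\beta n(\alpha + \epsilon)} \sum_{k=0}^{n} e^{\beta k}\, \mathrm{Prob}\Bigl[\sum_{j=1}^{n} Y_j = k\Bigr] = e^{-\beta n(\alpha + \epsilon)} (1 - \alpha + \alpha e^{\beta})^n.
\end{align*}

We take $\beta = \log(1 + \epsilon/\alpha)$ and recall that $\epsilon/\alpha < 1/4$. Thus

\[ \mathrm{Prob}\Bigl[\sum_{j=1}^{n} Y_j \ge n(\alpha + \epsilon)\Bigr] \le (1 + \epsilon/\alpha)^{-n(\alpha + \epsilon)}\, e^{n\epsilon}. \]

Using the first two terms in the series expansion of $\log(1 + \epsilon/\alpha)$, this is less than $\exp(-(3/8)n\epsilon^2/\alpha) < \exp(-n\epsilon^2/4)$. This completes the proof of the theorem.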
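The chain of inequalities in the exponential centering step can be compared with the exact binomial tail. The following is a small numerical sketch with sample parameters of our choosing ($n = 200$, $\alpha = 0.6$, $\epsilon = 0.1$, so that $\epsilon/\alpha < 1/4$); the function names are ours.

```python
from math import comb, exp

def binomial_tail(n, alpha, threshold):
    """Exact Prob[sum of n independent Bernoulli(alpha) trials >= threshold]."""
    return sum(comb(n, k) * alpha**k * (1 - alpha)**(n - k)
               for k in range(n + 1) if k >= threshold)

def centering_bound(n, alpha, eps):
    """The bound (1 + eps/alpha)^(-n(alpha + eps)) * e^(n eps) from the text."""
    return (1 + eps / alpha) ** (-n * (alpha + eps)) * exp(n * eps)

n, alpha, eps = 200, 0.6, 0.1            # illustrative values with eps/alpha < 1/4
tail = binomial_tail(n, alpha, n * (alpha + eps))
bound = centering_bound(n, alpha, eps)
print(tail, bound, exp(-0.375 * n * eps**2 / alpha), exp(-n * eps**2 / 4))
```

The four printed numbers are increasing from left to right, in line with the chain of inequalities above; the exact tail is much smaller than the bounds, which is typical of this kind of estimate.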

References

[BJW83] W. Bosma, H. Jager, and F. Wiedijk, Some metrical observations on the approximation by continued fractions, Indag. Math. 45 (1983), 281–299, MR 85f:11059.

[Hen92] D. Hensley, Continued fraction Cantor sets, Hausdorff dimension, and functional analysis, Journal of Number Theory 40 (1992), no. 3, 336–358, MR 93c:11058.

[Nai98] R. Nair, On metric diophantine approximation theory and subsequence ergodic theory, New York Journal of Mathematics 3A (1998), 117–124.

[Val95] B. Vallée, Méthodes d'analyse fonctionnelle dans l'analyse en moyenne des algorithmes d'Euclide et de Gauss, C. R. Acad. Sci. Paris 321 (1995), 1133–1138, MR 97c:58088.

Department of Mathematics, Texas A&M University, College Station, TX 77843

[email protected] http://www.math.tamu.edu/~doug.hensley/

This paper is available via http://nyjm.albany.edu:8000/j/1998/4-16.html.