New York Journal of Mathematics
New York J. Math. 4(1998) 249{257.
Metric Diophantine Approximation and Probability
Doug Hensley
Abstract. Letpn=qn= (pn=qn)(x) denote thenth simple continued fraction convergent to an arbitrary irrational numberx2(01). Dene the sequence of approximation constantsn(x) := q2njx;pn=qnj. It was conjectured by Lenstra that for almost allx2(01),
lim
n!1
1
n
jfj: 1jnandj(x)zgj=F(z)
whereF(z) :=z=log 2 if 0z1=2, and log12(1;z+log(2z)) if 1=2z1.
This was proved in BJW83] and extended in Nai98] to the same conclusion for
kj(x) wherekjis a sequence of positive integers satisfying a certain technical condition related to ergodic theory. Our main result is that this condition can be dispensed with we only need thatkjbe strictly increasing.
Contents
1. Introduction 249
2. Probabilities and operators 250
3. Non-independent trials are good enough. 255
References 257
1. Introduction
Metric questions about Diophantine approximation can be approached by means of ergodic theory, dynamical systems, weak mixing and so on. At the heart of this approach lies the observation that not only is the mapT : 01]nQ!01]nQgiven byT :x!h1=xi=: 1=x;1=x] ergodic, but thatT : ! := (01]nQ)01]
given byT : (xy)!(h1=xi1=(1=x]+y)) is ergodic with better mixing properties.
The associated measure, invariant under T assigns to measurable A mass
R
Alog12
R
A dxdy
(1+xy)2. Here we take a dierent tack. It goes back to the pioneering work that led to the Gauss-Kuzmin theorem, and the thread continues with the work of Wirsing and Babenko on the convergence rate in that theorem. Most recently, Vallee et al have had signal success with this circle of ideas, analyzing for
Received July 14, 1998.
Mathematics Subject Classication. 11K50 primary, 11A55, 60G50 secondary.
Key words and phrases. continued fractions, distribution, random variable.
c
1998StateUniversityofNewYork
ISSN1076-9803/98
249
instance the lattice reduction algorithm in two dimensions Val95]. This approach uses functional analysis and classical probability. Discussion of linear operators, eigenvalues, and eigenvectors requires that a linear space be specied. We shall get around to this, but for now it suces to note that the denition below ofLwould make sense for any reasonable space of functions. At the heart of our approach lies the fact that if X is a random variable onU := 01]nQwith density f, then TnX has density Lnf where Lf(t) = P1k=1(k+t);2f(1=(k+t)) (for t 2 01], else zero) and that Lhas dominant eigenvalue 1 with corresponding eigenfunction g(t) := 1=(log2(1 +t)). From this it follows that well-separated values ofTnX are nearly independent random variables, so that the usual tools of classical probability can be brought into play.
These statements about random variables and density can be rephrased so as to avoid explicit mention of probability: X is a measurable function fromU toU, and : 01]! 01] is dened by (y) = m(fx 2 U : X(x) yg) where m denotes Lebesgue measure. Ifis dierentiable on 01], thenf :=0 is the density of the random variableX. Similarly, the density ofTnX is
(d=dy)m(fx2U :TnX(x)yg) =Lnf
a fact which is used in the pioneering work mentioned above and in all subsequent developments along this line.
Recall (from the abstract) that pn=qn = (pn=qn)(x) denotes the nth simple continued fraction convergent to an arbitrary irrational number x 2 (01), while n(x) :=qn2jx;pn=qnj. Also,
F(z) :=
(z=log2 if 0z1=2
1
log2(1;z+ log(2z)) if 1=2z1.
Our main result is
Theorem 1.1.
If (kj) is a strictly increasing sequence of positive integers, and 0< z <1, then (with respect to Lebesgue measure)nlim!11
njfj: 1jn andkj(x)zgj=F(z) for almost all x2(01).
2. Probabilities and operators
The probabilityQrf(z) thatr(X)z, when the initial density for X isf, is essentiallyF(z), as we shall see. In the casef 1, this is due to Knuth. BJW83].
What is new here is that there is a kind of near-independence of these events for widely separated values ofr, and uniformly over a certain class of initial probability distributionsf.
Let Vr denote the r-fold Cartesian product of the positive integers. For an arbitrary sequencev2Vr of rpositive integers, let v] := 0v1v2:::vr] =pr=qr, letfvg:= 0vrvr;1:::v1] =qr;1=qr, and letjvj:=qr. Letv;:= (v2:::vr) and
letv; := (v1:::vr;1), so thatpr=jv;jandqr;1=jv;j. Then forx201]nQ, x = pr+ (Trx)pr;1
qr+ (Trx)qr;1 (1)
r(x) = Trx 1 +fvgTrx Lr(1 +ut);2] = X
v2Vr
jvj;2(1 +uv]);2(1 + (fvg+ufv;g)t);2 (Lrf)(t) = X
v2Vr
jvj;2(1 +fvgt);2f(v+t])
where v+t] := 0v1v2:::vr;1vr+t]. (Thus if f is a convex combination of probability density functions of the form (1+u)(1+ut);2, then so isLf.) The claim above about the dominant eigenvalue ofLcan now be given specic content. For an arbitrary function f of bounded variation and zero except in 01], (which we will call good) letkfkbe the total variation off on the real line. (Thus any probability density function has norm at least 2). We have (specializing from Lemma 6 Hen92, p 346])
Lemma 2.1.
Uniformly over0t1 and good probability density functionsf of bounded variation, Lrf = log2(1+1 t)+O(((p5;1)=2)rkfk).Again let f be a good probability density function. LetI(S) := 1 ifS is a true statement, else zero. Let X be a random variable on 01] with density f. Then the density ofTrX isLrf. Letv=v(xr) denote the sequence of the rstrpartial quotients in the continued fraction expansion ofx, so thatx= v+t] wheret=Trx. We now consider the probabilityPrzf that fv(Xr)gz. In the case r= 1, this is
P1zf =
Z
1
x=0f(x)I(b1=xcz)dx= X
fk:1=kzg
Z
1
t=0(k+t);2f(1=(k+t))dt on substitutingx= 1=(k+t). The similar substitution
x= v+t] = v1v2:::vr;1vr+t]
hasdx=dt=jvj;2(1 +fvgt);2, so the probabilityPrzf thatfvgzis given by Przf =
Z
1
t=0
X
v2Vrfvgz
jvj;2(1 +fvgt);2f(v+t])dt:
(2)
Much of what lies ahead has to do with conditional probabilities, conditional random variables, and their associated conditional densities. If E1 is an event (measurable subset ofU) with positive probability (that is,m(E1)>0), then the conditional probability of another eventE2 givenE1is by denition
prob(E1 andE2)=prob(E1) =m(E1\E2)=m(E1):
Most of our events have to do with some random variableX onU with densityf, and some function :U !U. SupposeD1 R and letE1=X;1D1=fx2U : X(x)2 D1g. The conditional probability that (X)2 D2 given X 2D1 is then
m(( X);1D1\X;1D2)=m(X;1D1), and the conditional density for (X) given thatX 2D1is the function
t!(d=dt)m(fx: (X(x))tandX(x)2D1g)=m(X;1(D1)): The conditional density given thatfv(Xr)gz, forTrX is
grzf := X
v2Vrfvgz
jvj;2(1 +fvgt);2f(v+t])=Przf: (3)
Let hrzf(t) := gfrz(t)Przf. By convention both f and g take the value 0 o 01].
Becausegrzf is the conditional density ofTrX givenv(xr)z, whereX is a random variable on (01) with densityf, it follows that if Y is a random variable on (01) with densityg:=grzf andB is a measurable subset of 01) then
probfv(Xr)gzandTrX2B]=probfv(Xr)gz] = probY 2B] (4)We are now in a position to state and prove
Theorem 2.1.
Uniformly over good probability density functions f, over0z 1, and over 0t1,hrzf := X
v2Vrfvgz
jvj;2(1 +fvgt);2f(v+t]) is good and satises
hrzf(t) = z
log2(1 +tz) +O(rz((p5;1)=2)rkfk)
Proof.
It is clear that the resulting function again has bounded variation. The real issue is whether it satises the estimate. We begin by proving a weaker version of the theorem, in which thezin the error term is replaced with 1. For this weaker version, the ground oor of induction is trivial. Let = (p5;1)=2. LetVr(z) denote the set of allv2Vr so thatfvg=qr;1=qrz. Now assume that for some r,
X
v2Vr(z)
jvj;2(1 +tfvg);2f(v+t]); z log2(1 +tz)
Crr Then
X
v2Vr+1(z)
jvj;2(1 +tfvg);2=
= X1
n= 1=z]+1n;2 X
v2Vr
jvj;2(1 +fvg=n);2(1 +t=(n+fvg));2f(vn+t])
+ X
v2VrnVr(h1=zi)
jvj;2(1=z] +t+fvg);2
= X1
n= 1=z](n+t);2 X
v2Vr
jvj;2(1 +fvg=(n+t));2f(v+t])
; X
v2Vr(h1=zi)
jvj;2(1=z] +t);2(1 +fvg=(1=z] +t));2f(v1=z] +t)
The rst term here is
1
X
n= 1=z](n+t);2g(1=(n+t)) +O(rzkfk) from Lemma 2.1, while the second term is
;
1=z] +t);2( 1log2
h1=zi
1 +h1=zi=(1=z] +t) + Cr(rkfk)
wherejj1, on the induction hypothesis. The total thus simplies to z=(log2(1 +tz)) +O(rkfk) + Cr1=z];2kfk:
This leads to a recurrenceCr+1=Cr+O(1) from which it follows that (r;1Cr) is bounded and the weak version of the theorem follows. For the strong version, we just note that given the weak version, the strong version follows immediately since the two error terms in passing fromrtor+ 1 wereO(rz+Cr1=z];2).
Corollary 2.2.
Uniformly over good probability density functions f, over0z 1, and over 0t1,X
v2Vrfvgz
jvj;2(1 +fvgt);2f(v+t]) = 1log2 1;z
(1 +t)(1 +tz) +O(r(1;z)rkfk): In view of Corollary 2.2, the conditional density for TrX given initial density f and given that qr;1=qr(X)z is, on the one hand, a good density, and on the other hand,z=((1+tz)log(1+z))+O(rrkfk), while the conditional density, given that qr;1=qr(X)z, is again on the one hand a good density, and on the other, (1;z)=((log2;log(1 +z))(1 +t)(1 +tz)) +O(rrkfk). Now consider a sequence kj of positive integers such thatk1randkj+1;kjrfor allj.
The probabilityQrf(z) thatr(X)z, when the initial density forX isf, is Qrf(z) = X
v2Vr
jvj;2Zt1
=0
(1 +fvgt);2f(v+t])I(t=(1 +fvgt)z)dt
=
Z
1
t=0
X
v2Vr
jvj;2(1 +fvgt);2f(v+t])(1;I(fvg1=z;1=t))dt Letu:= 0 ifu <0,uif 0u1, and 1 ifu >1. Taking (1=z;1=t) in place of tin Theorem 2.1 breaks into two cases. Ifz1=2 we get
Qrf(z) = 1; 1 log2
Z z=(1;z)
z (1=t;z=t2)dt+
Z
1
z=(1;z) 1 1 +t dt
!
+O(rkfk)
= z
log2 +O(rkfk): Forz >1=2 we get instead
Qrf(z) = 1; 1 log2
Z
1
z (1=t;z=t2)dt+O(rkfk) In either case, this says
Qrf(z) =F(z) +O(r)kfk (5)as claimed at the outset of this section.
Next we consider the conditional densityrzf(t) ofTrX when the initial den- sity forX isf, given thatr(X)><zaccording to whether= 0 or 1. This is again a good density, with norm not more than a constant multiple ofkfk, (the multiple may and does depend onz) we claim.
We rst note that the probability, on initial densityf, thatr(X)zis at least Cz+O(r). So for xedz >0, andrsuciently large, it is at leastKz. Similarly, the probability that r(X)z is at leastK(1;z) for rsuciently large. Next, we need some estimate of the possible growth of krzfk=kfkwith increasing r. The scaling factor that results from dividing
X
v2Vr
jvj;2(1 +fvgt);2f(v+t])(1;I(fvg1=z;1=t))
byQrf(z) or by (1;Qrf(z)), according to whether= 0 or 1, is at mostO(1=z) orO(1=(1;z)). Apart from this eect, we have a norm of at most
X
v2Vr
jvj;2k(1 +fvgt);2f(v+t])I(fvg><z;1;t;1)k: A lemma on total variation is now needed.
Lemma 2.2.
If g is a positive function of bounded variation on R and is zero outside01] andhis a positive, increasing function onRwithh(1) = 1, or positive and decreasing with h(0) = 1, then the total variation of gh is no more than that of g.Proof.
Write g asg1;g2 where both are zero on (;10), increasing, and con- stant on (11), so that the total variation of g is equal to 2g1(1) = 2g2(1). Then gh = g1h;g2h. This gives a representation of gh as the dierence of two in- creasing functions, both zero on (;10) and both equal and constant on (11).By reecting g we see that the same holds if h is positive and decreasing with h(0) = 1.
Now by repeated application of the lemma, and taking into account that the total variation off(v+t]) is no more than that off, we calculate that
X
v2Vr
jvj;2k(1 +fvgt);2f(v+t])I(fvg><z;1;t;1)k (6)
X
v2Vr
jvj;2k(1 +fvgt);2f(v+t])k
X
v2Vr
jvj;2kf(v+t])k(X
v2Vr
jvj;2)kfk2kfk From this it follows that for all 2 (01=2), and whether we require r(X) < z or r(X) > z, there exists R() > 0 and K() > 0 so that kfrzk K()kfk, uniformly in z1;forrR().
Now the probability, on initial densityf, thatk1(X)z, isF(z) +O(rkfk).
(Recall, allkj > r). The conditional density ofTk1X =Y (say) given this event is a normalized version of the sum of all terms (withv2V(k1)) of the formjvj;2(1+
fvgt);2It z=(1;zfvg)]f(v+t]), or given instead the complementary event, the same expression but with Itz=(1;zfvg)] in place of Itz=(1;zfvg)].
In either case, this is a function of bounded variation. In either case, provided z21;], that variation is bounded byK()kfk.
Fixnlarge, and consider the sequenceTkjX1jn. Fix >0. To satisfy a technical condition at the end of the paper, we also require that=(F(z)+)<1=4.
Consider the probability that more than (F(z) + 2)nof the j havekj z. For largen, we can break this nite sequence intoO(n3=4) subsequences, each of which haskj+1 > kj+n3=4, and each of which has at most, but on the order of, (n1=4) entries. This way,r=n3=4is comfortably larger than the number of trialsO(n1=4) in a subsequence. The initial densityf0is the density ofTk1, wherekis the least of thekj in our subsequence. For each such subsequence of lengthN say (N depends on the subsequence), the event E(1:::N), where (12:::N)2f01gN, is the set of allx201)nQfor whichkj < z if and only ifj= 0, 1jN.
Letrj:=kj+1;kj, with r0 :=k1. A step consists of replacingfj withfj+1:=
rjzfjj+1, the conditional density, given that kj(Y) ><z, of TrjY where Y is a random variable onU with density fj. Thus the input to the next step is again a good density, with a certain norm. The norm of the `working'fj may increase, by at most a factor of K() each time. If the number N of steps is small compared to the minimum rj, this is not much of a problem, because at each stage, the working `initial' probability density functionfj has norm no greater thanKj. The probability, at each trial within the subsequence, and for any prior history of trials within that subsequence, thatkj > z, is less thanF(z)+. That is, the conditional probability that kjm > z, given thatkjl < z exactly whenl = 0 for 1lm, is less thanF(z) +.
(We take n large enough that K()n1=4n3=4 < =2). The probability that a particular subsequence has more than its own length, multiplied byF(z)+2, cases ofkj > z, can be shown (see below) to be bounded above by O(exp(;C2n1=4)).
Keeping in mind that this claim has yet to be established, we continue with the main line of the proof.
The probability that any one of the subsequences has such an atypical success ratio, isO(n3=4exp(;C2n1=4)) which tends to zero. This shows that the probabil- ity of an atypically high success ratio is asymptotically zero. The same arguments apply to the case of an atypically low success ratio, simply by redening success to meankj < z. This proves that Nair's theorem1 holds for any strictly increasing sequence (kj) of positive integers, as claimed.
3. Non-independent trials are good enough.
`The probability that a particular subsequence has more than its own length, multiplied byF(z) + 2, cases ofkj > z, can be shown (see below) to be bounded above byO(exp(;C2n1=4)).' We now make good on this claim. For the remainder of this section, we shall use n to mean the number of trials. This new value of n will be on the order of the old value of n1=4.
We have a kind of near-independence: If
E:=E(k1k2:::kn12:::n) =fx201]nQ:kj(x)< z ij= 0g (7)then the conditional probability that kn+1 < zgiven that x2E is F(z) +O(r) and so less thanF(z) +. Thus if (a0a1) is the sequence
(a0a1) := (prob(k1(x)< z)probk1(x)> z])
1See http://nyjm.albany.edu:8000/j/1998/3A-9.html.
and (b0b1) the sequence (1;;F(z)F(z) +), thena0> b0 anda0+a1= 1 = b0+b1.
Given two sequences (a0a1:::an) and (b0b1:::bn) of non-negative numbers summing to 1, we say thata:> bif for allk < n,Pk0aj>Pk0bj.
Lemma 3.1.
If a :> b, if (uk) and (vk) are sequences of numbers in 01] with uk < vk for all k, and if a0 is given by a0j = (1;uj)aj+uj;1aj;1 and b0 by b0j= (1;vj)bj+vj;1bj;1,(settinga;1=b;1:= 0), thena0:> b0.Proof.
We havek
X
j=0a0j;b0j =Xk
j=0(1;uj)aj+uj;1aj;1;(1;vj)bj;vj;1bj;1
= (1;vk)Xk
0
(aj;bj) +vkkX;1
0
(aj;bj) +ak(vk;uk)>0: We apply this lemma with a = (a0a1:::an)k1k2:::kn12:::nz], de- ned by
am:= prob#fj: 1jn and kj < zg=m] for 0mn and b:= (b0b1:::bn) where
bm:=
n m
(F(z) +)m(1;F(z);)n;m 0mn:
The claim is that with thisaandb,a:> b. The proof is inductive, using Lemma 3.1 ntimes.
We shall be using a succession ofa's and b's, which will be denoted by super- scripts.
a1:= (a10a11) = (probk1 z]probk1 < z]) while b1:= (1;F(z);F(z) +):
We havea1 :> b1 because probk1 z]>1;F(z);. This so far uses only the denition of :>and earlier material but not Lemma 3.1
Now let a20 := (probk1 z andk2 z], a21 := probone small theta], and a22:= probk1 zandk2z]),
a2:= (a20a21a22) and
b2:= ((1;F(z);)22(1;F(z);)(F(z) +)(F(z) +)2): We takea=a1,b=b1,a0 =a2, andb0=b2 in Lemma 3.1. Then we have
a00= (1;u0)a0 a01= (1;u1)a1+u0a0 a02= (1;u2)0 +u1a1
where u0 = probk2 < z givenk1 z] and u1 = probk2 < z givenk1 < z], whilev0=v1=F(z)+. Applying Lemma 3.1 givesa2:> b2. Inductively, it gives an:> bn which says thata:> b.
Thus the probability that more thann(F(z) + 2) cases ofkj < z out of the rstnvalues ofkj is less than the probability, with a Bernoulli process which gives
`heads' with probabilityF(z) +, that more thann(F(z) + 2) of the rstntrials come up heads.
By standard exponential centering, this probability is, we claim, less than exp(;(3=8)n2). LetYj be independent Bernoulli trials with a coin taking heads with probability=F(z)+. (IfF(z)+ >1 the probability in question is zero.) Now for >0,
ProbXn
j=1Yjn(+)]
e;n(+ ) X
kn(+ )ekProbXn
j=1Yj=k]
e;n(+ )Xn
k=0ekProbXn
j=1Yj=k] =e;n(+ )(1;+e)n We take = log(1 +=) and recall that= <1=4. Thus
ProbXn
j=1Yjn(+)](1 +=);n(+ )en :
Using the rst two terms in the series expansion of log(1 +=), this is less than exp(;(3=8)n2=)<exp(;n2=4). This completes the proof of the theorem.
References
BJW83] W. Bosma, H. Jager, and F. Wiedijk,Some metrical observations on the approximation by continued fractions, Indag. Math.45(1983), 281{299, MR 85f:11059.
Hen92] D. Hensley,Continued fraction Cantor sets, Hausdor dimension, and functional anal- ysis, Journal of Number Theory40(1992), no. 3, 336{358, MR 93c:11058.
Nai98] R. Nair,On metric diophantine approximation theory and subsequence ergodic theory, New York Journal of Mathematics3A(1998), 117{124.
Val95] B. Vallee,Methodes d'analyse fonctionelle dans l'analyse en moyenne des algorithmes d'Euclide et de Gauss, C.R. Acad. Sci. Paris321(1995), 1133{1138, MR 97c:58088.
Department of Mathematics, Texas A&M University, College Station, TX 77843
[email protected] http://www.math.tamu.edu/~doug.hensley/
This paper is available via http://nyjm.albany.edu:8000/j/1998/4-16.html.