Electronic Journal of Probability
Vol. 6 (2001) Paper no. 11, pages 1–22.

Journal URL

http://www.math.washington.edu/~ejpecp/

Paper URL

http://www.math.washington.edu/~ejpecp/EjpVol6/paper11.abs.html

MIXING TIMES FOR MARKOV CHAINS ON WREATH PRODUCTS AND RELATED HOMOGENEOUS SPACES

James Allen FILL

Department of Mathematical Sciences, The Johns Hopkins University 3400 N. Charles Street, Baltimore, MD 21218-2682

jimfill@jhu.edu

http://www.mts.jhu.edu/~fill/

Clyde H. SCHOOLFIELD, Jr.

Department of Statistics, Harvard University, One Oxford Street, Cambridge, MA 02138

clyde@stat.harvard.edu

http://www.fas.harvard.edu/~chschool/

Abstract. We develop a method for analyzing the mixing times for a quite general class of Markov chains on the complete monomial group $G \wr S_n$ and a quite general class of Markov chains on the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$. We derive an exact formula for the $L^2$ distance in terms of the $L^2$ distances to uniformity for closely related random walks on the symmetric groups $S_j$ for $1 \le j \le n$ or for closely related Markov chains on the homogeneous spaces $S_{i+j}/(S_i \times S_j)$ for various values of $i$ and $j$, respectively. Our results are consistent with those previously known, but our method is considerably simpler and more general.

Keywords: Markov chain, random walk, rate of convergence to stationarity, mixing time, wreath product, Bernoulli–Laplace diffusion, complete monomial group, hyperoctahedral group, homogeneous space, Möbius inversion.

AMS subject classification Primary 60J10, 60B10; secondary 20E22.

Submitted to EJP on April 9, 2001. Final version accepted on April 23, 2001.


1 Introduction and Summary.

In the proofs of many of the results of Schoolfield (2001a), the $L^2$ distance to uniformity for the random walk (on the so-called wreath product of a group $G$ with the symmetric group $S_n$) being analyzed is often found to be expressible in terms of the $L^2$ distance to uniformity for related random walks on the symmetric groups $S_j$ with $1 \le j \le n$. Similarly, in the proofs of many of the results of Schoolfield (2001b), the $L^2$ distance to stationarity for the Markov chain being analyzed is often found to be expressible in terms of the $L^2$ distance to stationarity of related Markov chains on the homogeneous spaces $S_{i+j}/(S_i \times S_j)$ for various values of $i$ and $j$. It is from this observation that the results of this paper have evolved. We develop a method, with broad applications, for bounding the rate of convergence to stationarity for a general class of random walks and Markov chains in terms of closely related chains on the symmetric groups and related homogeneous spaces. Certain specialized problems of this sort were previously analyzed with the use of group representation theory. Our analysis is more directly probabilistic and yields some insight into the basic structure of the random walks and Markov chains being analyzed.

1.1 Markov Chains on $G \wr S_n$.

We now describe one of the two basic set-ups we will be considering [namely, the one corresponding to the results in Schoolfield (2001a)]. Let $n$ be a positive integer and let $P$ be a probability measure defined on a finite set $G$ ($= \{1, \ldots, m\}$, say). Imagine $n$ cards, labeled 1 through $n$ on their fronts, arranged on a table in sequential order. Write the number 1 on the back of each card. Now repeatedly permute the cards and rewrite the numbers on their backs, as follows. For each independent repetition, begin by choosing integers $i$ and $j$ independently and uniformly at random from $\{1, \ldots, n\}$. If $i \ne j$, transpose the cards in positions $i$ and $j$. Then, (probabilistically) independently of the choice of $i$ and $j$, replace the numbers on the backs of the transposed cards with two numbers chosen independently from $G$ according to $P$.

If $i = j$ (which occurs with probability $1/n$), leave all cards in their current positions. Then, again independently of the choice of $j$, replace the number on the back of the card in position $j$ by a number chosen according to $P$.
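To make the dynamics concrete, here is a minimal simulation sketch (ours, not the paper's; the function and variable names are hypothetical) of one step of this chain, with $G = \{0, \ldots, m-1\}$, 0-based positions, and $P$ given as a list of $m$ probabilities:

```python
import random

def step(front, back, P, rng=random):
    """One step of the chain, performed in place.

    front[k] = label of the card in position k (a permutation of 0..n-1);
    back[k]  = G-number currently written on the card in position k.
    """
    n = len(front)
    i, j = rng.randrange(n), rng.randrange(n)   # i, j independent, uniform
    draw = lambda: rng.choices(range(len(P)), weights=P)[0]  # a sample from P
    if i != j:
        front[i], front[j] = front[j], front[i]  # transpose positions i and j
        back[i], back[j] = draw(), draw()        # refresh both back-numbers
    else:
        back[j] = draw()                         # refresh position j only

front, back = list(range(5)), [0] * 5            # n = 5 cards, all marked 0
for _ in range(100):
    step(front, back, [0.5, 0.5])                # uniform P on G = {0, 1}
```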

Our interest is in bounding the mixing time for Markov chains of the sort we have described.

More generally, consider any probability measure, say $\widehat{Q}$, on the set of ordered pairs $\hat\pi = (\pi, J)$, where $\pi$ is a permutation of $\{1, \ldots, n\}$ and $J$ is a subset of the set of fixed points of $\pi$. At each time step, we choose such a $\hat\pi$ according to $\widehat{Q}$ and then (a) permute the cards by multiplying the current permutation of front-labels by $\pi$; and (b) replace the back-numbers of all cards whose positions have changed, and also of every card whose (necessarily unchanged) position belongs to $J$, by numbers chosen independently according to $P$.

The specific transpositions example discussed above fits the more general description, taking $\widehat{Q}$ to be defined by
$$\begin{aligned}
\widehat{Q}(e, \{j\}) &:= \frac{1}{n^2} && \text{for any } j \in [n], \text{ with } e \text{ the identity permutation,} \\
\widehat{Q}(\tau, \emptyset) &:= \frac{2}{n^2} && \text{for any transposition } \tau, \\
\widehat{Q}(\hat\pi) &:= 0 && \text{otherwise.}
\end{aligned} \tag{1.1}$$
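As a check, (1.1) indeed defines a probability measure: the $n$ pairs $(e, \{j\})$ carry total mass $n \cdot \frac{1}{n^2} = \frac{1}{n}$, and the $\binom{n}{2}$ transpositions carry total mass $\binom{n}{2} \cdot \frac{2}{n^2} = \frac{n-1}{n}$; these sum to 1.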


When $m = 1$, i.e., when the aspect of back-number labeling is ignored, the state space of the chain can be identified with the symmetric group $S_n$, and the mixing time can be bounded as in the following classical result, which is Theorem 1 of Diaconis and Shahshahani (1981) and was later included in Diaconis (1988) as Theorem 5 in Section D of Chapter 3. The total variation norm ($\|\cdot\|_{\mathrm{TV}}$) and the $L^2$ norm ($\|\cdot\|_2$) will be reviewed in Section 1.3.

Theorem 1.2. Let $\nu_k$ denote the distribution at time $k$ for the random transpositions chain (1.1) when $m = 1$, and let $U$ be the uniform distribution on $S_n$. Let $k = \frac12 n \log n + cn$. Then there exists a universal constant $a > 0$ such that
$$\|\nu_k - U\|_{\mathrm{TV}} \le \tfrac12\, \|\nu_k - U\|_2 \le a e^{-2c} \quad \text{for all } c > 0.$$

Without reviewing the precise details, we remark that this bound is sharp, in that there is a matching lower bound for total variation (and hence also for $L^2$). Thus, roughly put, $\frac12 n \log n + cn$ steps are necessary and sufficient for approximate stationarity.

Now consider the chain (1.1) for general $m \ge 2$, but restrict attention to the case that $P$ is uniform on $G$. An elementary approach to bounding the mixing time is to combine the mixing time result of Theorem 1.2 (which measures how quickly the cards get mixed up) with a coupon collector's analysis (which measures how quickly their back-numbers become random).

This approach is carried out in Theorem 3.6.5 of Schoolfield (2001a), but gives an upper bound only on total variation distance. If we are to use the chain's mixing-time analysis in conjunction with the powerful comparison technique of Diaconis and Saloff-Coste (1993a, 1993b) to bound mixing times for other more complicated chains, as is done for example in Chapter 9 of Schoolfield (1998), we need an upper bound on $L^2$ distance.

Such a bound can be obtained using group representation theory. Indeed, the Markov chain we have described is a random walk on the complete monomial group $G \wr S_n$, which is the wreath product of the group $G$ with $S_n$; see Schoolfield (2001a) for further background and discussion.

The following result is Theorem 3.1.3 of Schoolfield (2001a).

Theorem 1.3. Let $\nu_k$ denote the distribution at time $k$ for the random transpositions chain (1.1) when $P$ is uniform on $G$ (with $|G| \ge 2$). Let $k = \frac12 n \log n + \frac14 n \log(|G| - 1) + cn$. Then there exists a universal constant $b > 0$ such that
$$\|\nu_k - U\|_{\mathrm{TV}} \le \tfrac12\, \|\nu_k - U\|_2 \le b e^{-2c} \quad \text{for all } c > 0.$$

For $L^2$ distance (but not for total variation distance), the presence of the additional term $\frac14 n \log(|G| - 1)$ in the mixing-time bound is "real," in that there is a matching lower bound: see the discussion following the proof of Theorem 3.1.3 of Schoolfield (2001a).

The group-representation approach becomes substantially more difficult to carry out when the card-rearrangement scheme is something other than random transpositions, and prohibitively so if the resulting step-distribution on $S_n$ is not constant on conjugacy classes. Moreover, there is no possibility whatsoever of using this approach when $P$ is non-uniform, since then we are no longer dealing with random walk on a group.

In Section 2 we provide an $L^2$-analysis of our chain for completely general shuffles $\widehat{Q}$ of the sort we have described. More specifically, in Theorem 2.3 we derive an exact formula for the $L^2$ distance to stationarity in terms of the $L^2$ distance for closely related random walks on the symmetric groups $S_j$ for $1 \le j \le n$. Subsequent corollaries establish more easily applied results in special cases. In particular, Corollary 2.8 extends Theorem 1.3 to handle non-uniform $P$.

Our new method does have its limitations. The back-number randomizations must not depend on the current back-numbers (but rather be chosen afresh from $P$), and they must be independent and identically distributed from card to card. So, for example, we do not know how to adapt our method to analyze the "paired-shuffles" random walk of Section 5.7 in Schoolfield (1998).

1.2 Markov Chains on $(G \wr S_n)/(S_r \times S_{n-r})$.

We now turn to our second basic set-up [namely, the one corresponding to the results in Schoolfield (2001b)]. Again, let $n$ be a positive integer and let $P$ be a probability measure defined on a finite set $G = \{1, \ldots, m\}$.

Imagine two racks, the first with positions labeled 1 through $r$ and the second with positions labeled $r+1$ through $n$. Without loss of generality, we assume that $1 \le r \le n/2$. Suppose that there are $n$ balls, labeled with serial numbers 1 through $n$, each initially placed at its corresponding rack position. On each ball is written the number 1, which we shall call its $G$-number. Now repeatedly rearrange the balls and rewrite their $G$-numbers, as follows.

Consider any $\widehat{Q}$ as in Section 1.1. At each time step, choose $\hat\pi$ according to $\widehat{Q}$ and then (a) permute the balls by multiplying the current permutation of serial numbers by $\pi$; (b) independently, replace the $G$-numbers of all balls whose positions have changed as a result of the permutation, and also of every ball whose (necessarily unchanged) position belongs to $J$, by numbers chosen independently from $P$; and (c) rearrange the balls on each of the two racks so that their serial numbers are in increasing order.

Notice that steps (a)–(b) are carried out in precisely the same way as steps (a)–(b) in Section 1.1.

The state of the system is completely determined, at each step, by the ordered $n$-tuple of $G$-numbers of the $n$ balls $1, 2, \ldots, n$ and the unordered set of serial numbers of balls on the first rack. We have thus described a Markov chain on the set of all $|G|^n \cdot \binom{n}{r}$ ordered pairs of $n$-tuples of elements of $G$ and $r$-element subsets of a set with $n$ elements.
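As an illustration, here is a minimal sketch (ours, not the paper's; names are hypothetical) of one step of this chain in its transpositions version: transpose two uniformly chosen positions (or hold at a single position), refresh the affected $G$-numbers, then re-sort each rack by serial number.

```python
import random

def step(serial, gnum, r, P, rng=random):
    """serial[k] = serial number of the ball at position k (0-based);
    gnum[s] = current G-number of the ball with serial number s."""
    n = len(serial)
    i, j = rng.randrange(n), rng.randrange(n)
    draw = lambda: rng.choices(range(len(P)), weights=P)[0]  # a sample from P
    if i != j:
        serial[i], serial[j] = serial[j], serial[i]          # step (a)
        gnum[serial[i]], gnum[serial[j]] = draw(), draw()    # step (b)
    else:
        gnum[serial[j]] = draw()
    serial[:r] = sorted(serial[:r])                          # step (c)
    serial[r:] = sorted(serial[r:])
    return tuple(gnum), frozenset(serial[:r])                # the state

n, r = 6, 2
serial, gnum = list(range(n)), [0] * n
for _ in range(100):
    state = step(serial, gnum, r, [0.5, 0.5])
```

Note that a same-rack transposition followed by the re-sorting of step (c) leaves the arrangement of serial numbers unchanged while still refreshing the two affected $G$-numbers, which is how the measure (1.4) below acts on this quotient state space.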

In our present setting, the transpositions example (1.1) fits the more general description, taking $\widehat{Q}$ to be defined by
$$\begin{aligned}
\widehat{Q}(\kappa, \{j\}) &:= \frac{1}{n^2\, r!\, (n-r)!} && \text{where } \kappa \in K \text{ and } j \in [n], \\
\widehat{Q}(\kappa, \{i, j\}) &:= \frac{2}{n^2\, r!\, (n-r)!} && \text{where } \kappa \in K \text{ and } i \ne j \text{ with } i, j \in [r] \text{ or } i, j \in [n] \setminus [r], \\
\widehat{Q}(\tau\kappa, \emptyset) &:= \frac{2}{n^2\, r!\, (n-r)!} && \text{where } \tau\kappa \in TK, \\
\widehat{Q}(\hat\pi) &:= 0 && \text{otherwise,}
\end{aligned} \tag{1.4}$$

where $K := S_r \times S_{n-r}$, $T$ is the set of all transpositions in $S_n \setminus K$, and $TK := \{\tau\kappa \in S_n : \tau \in T \text{ and } \kappa \in K\}$. When $m = 1$, the state space of the chain can be identified with the homogeneous space $S_n/(S_r \times S_{n-r})$. The chain is then a variant of the celebrated Bernoulli–Laplace diffusion model. For the classical model, Diaconis and Shahshahani (1987) determined the mixing time.
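As a check, (1.4) likewise defines a probability measure: summing over the $r!\,(n-r)!$ elements $\kappa \in K$, the pairs $(\kappa, \{j\})$ carry total mass $n/n^2 = 1/n$, the pairs $(\kappa, \{i, j\})$ carry $[r(r-1) + (n-r)(n-r-1)]/n^2$, and the pairs $(\tau\kappa, \emptyset)$ carry $2r(n-r)/n^2$; since $n + r(r-1) + (n-r)(n-r-1) + 2r(n-r) = n^2$, these masses sum to 1. The last mass, $2r(n-r)/n^2$, is exactly the probability that a step transposes balls on different racks; it is the slowdown factor appearing in the next paragraph.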


Similarly, Schoolfield (2001b) determined the mixing time of the present variant, which slows down the classical chain by a factor of $2r(n-r)/n^2$ by not forcing two balls to switch racks at each step. The following result is Theorem 2.3.3 of Schoolfield (2001b).

Theorem 1.5. Let $\tilde\nu_k$ denote the distribution at time $k$ for the variant (1.4) of the Bernoulli–Laplace model when $m = 1$, and let $\tilde U$ be the uniform distribution on $S_n/(S_r \times S_{n-r})$. Let $k = \frac14 n(\log n + c)$. Then there exists a universal constant $a > 0$ such that
$$\|\tilde\nu_k - \tilde U\|_{\mathrm{TV}} \le \tfrac12\, \|\tilde\nu_k - \tilde U\|_2 \le a e^{-2c} \quad \text{for all } c > 0.$$

Again there are matching lower bounds, for $r$ not too far from $n/2$, so this Markov chain is twice as fast to converge as the random walk of Theorem 1.2.

The following analogue, for the special case $m = 2$, of Theorem 1.3 in the present setting was obtained as Theorem 3.1.3 of Schoolfield (2001b).

Theorem 1.6. Let $\tilde\nu_k$ denote the distribution at time $k$ for the variant (1.4) of the Bernoulli–Laplace model when $P$ is uniform on $G$ with $|G| = 2$. Let $k = \frac14 n(\log n + c)$. Then there exists a universal constant $b > 0$ such that
$$\|\tilde\nu_k - \tilde U\|_{\mathrm{TV}} \le \tfrac12\, \|\tilde\nu_k - \tilde U\|_2 \le b e^{-c/2} \quad \text{for all } c > 0.$$

Notice that Theorem 1.6 provides (essentially) the same mixing time bound as that found in Theorem 1.5. Again there are matching lower bounds, for $r$ not too far from $n/2$, so this Markov chain is twice as fast to converge as the random walk of Theorem 1.3 in the special case $m = 2$.

In Section 3, we provide a general $L^2$-analysis of our chain, which has state space equal to the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$. More specifically, in Theorem 3.3 we derive an exact formula for the $L^2$ distance to stationarity in terms of the $L^2$ distance for closely related Markov chains on the homogeneous spaces $S_{i+j}/(S_i \times S_j)$ for various values of $i$ and $j$. Subsequent corollaries establish more easily applied results in special cases. In particular, Corollary 3.8 extends Theorem 1.6 to handle non-uniform $P$.

Again, our method does have its limitations. For example, we do not know how to adapt our method to analyze the “paired-flips” Markov chain of Section 7.4 in Schoolfield (1998).

1.3 Distances Between Probability Measures.

We now review several ways of measuring distances between probability measures on a finite set $G$. Let $R$ be a fixed reference probability measure on $G$ with $R(g) > 0$ for all $g \in G$. As discussed in Aldous and Fill (200x), for each $1 \le p < \infty$ define the $L^p$ norm $\|\nu\|_p$ of any signed measure $\nu$ on $G$ (with respect to $R$) by
$$\|\nu\|_p := \left[ E_R \left| \frac{\nu}{R} \right|^p \right]^{1/p} = \left[ \sum_{g \in G} \frac{|\nu(g)|^p}{R(g)^{p-1}} \right]^{1/p}.$$


Thus the $L^p$ distance between any two probability measures $P$ and $Q$ on $G$ (with respect to $R$) is
$$\|P - Q\|_p = \left[ E_R \left| \frac{P - Q}{R} \right|^p \right]^{1/p} = \left[ \sum_{g \in G} \frac{|P(g) - Q(g)|^p}{R(g)^{p-1}} \right]^{1/p}.$$
Notice that
$$\|P - Q\|_1 = \sum_{g \in G} |P(g) - Q(g)|.$$

In our applications we will always take $Q = R$ (and $R$ will always be the stationary distribution of the Markov chain under consideration at that time). In that case, when $U$ is the uniform distribution on $G$,
$$\|P - U\|_2 = \left[ |G| \sum_{g \in G} |P(g) - U(g)|^2 \right]^{1/2}.$$

The total variation distance between $P$ and $Q$ is defined by
$$\|P - Q\|_{\mathrm{TV}} := \max_{A \subseteq G} |P(A) - Q(A)|.$$
Notice that $\|P - Q\|_{\mathrm{TV}} = \frac12 \|P - Q\|_1$. It is a direct consequence of the Cauchy–Schwarz inequality that
$$\|P - U\|_{\mathrm{TV}} \le \tfrac12\, \|P - U\|_2.$$

If $P(\cdot,\cdot)$ is a reversible transition matrix on $G$ with stationary distribution $R = P^{\infty}(\cdot)$, then, for any $g_0 \in G$,
$$\|P^k(g_0, \cdot) - P^{\infty}(\cdot)\|_2^2 = \frac{P^{2k}(g_0, g_0)}{P^{\infty}(g_0)} - 1.$$

All of the distances we have discussed here are indeed metrics on the space of probability measures on G.
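The formulas above are easy to sanity-check numerically. The following sketch (ours, not the paper's; the three-state chain is an arbitrary choice) verifies the reversible-chain identity and the Cauchy–Schwarz consequence on a small example:

```python
import numpy as np

# A reversible birth-death chain on {0, 1, 2} with stationary distribution pi.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
assert np.allclose(pi @ P, pi)                             # stationarity
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)   # detailed balance

def l2_dist(p, q, ref):
    # ||p - q||_2 with respect to the reference measure ref (the p = 2 case)
    return np.sqrt(np.sum((p - q) ** 2 / ref))

def tv_dist(p, q):
    return 0.5 * np.sum(np.abs(p - q))                     # TV = (1/2) L1

k, g0 = 5, 0
Pk = np.linalg.matrix_power(P, k)
P2k = np.linalg.matrix_power(P, 2 * k)
lhs = l2_dist(Pk[g0], pi, pi) ** 2
rhs = P2k[g0, g0] / pi[g0] - 1.0                           # the identity above
assert np.isclose(lhs, rhs)
assert tv_dist(Pk[g0], pi) <= 0.5 * l2_dist(Pk[g0], pi, pi)
```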

2 Markov Chains on $G \wr S_n$.

We now analyze a very general Markov chain on the complete monomial group $G \wr S_n$. It should be noted that, in the results which follow, there is no essential use of the group structure of $G$. So the results of this section extend simply; in general, the Markov chain of interest is on the set $G^n \times S_n$.

2.1 A Class of Chains on $G \wr S_n$.

We introduce a generalization of permutations $\pi \in S_n$ which will provide an extra level of generality in the results that follow. Recall that any permutation $\pi \in S_n$ can be written as the product of disjoint cyclic factors, say
$$\pi = \big(i^{(1)}_1\, i^{(1)}_2 \cdots i^{(1)}_{k_1}\big)\, \big(i^{(2)}_1\, i^{(2)}_2 \cdots i^{(2)}_{k_2}\big) \cdots \big(i^{(\ell)}_1\, i^{(\ell)}_2 \cdots i^{(\ell)}_{k_\ell}\big),$$
where the $K := k_1 + \cdots + k_\ell$ numbers $i^{(a)}_b$ are distinct elements from $[n] := \{1, 2, \ldots, n\}$ and we may suppose $k_a \ge 2$ for $1 \le a \le \ell$. The $n - K$ elements of $[n]$ not included among the $i^{(a)}_b$ are each fixed by $\pi$; we denote this $(n-K)$-set by $F(\pi)$.

We refer to the ordered pair of a permutation $\pi \in S_n$ and a subset $J$ of $F(\pi)$ as an augmented permutation. We denote the set of all such ordered pairs $\hat\pi = (\pi, J)$, with $\pi \in S_n$ and $J \subseteq F(\pi)$, by $\widehat{S}_n$. For example, $\hat\pi \in \widehat{S}_{10}$ given by $\hat\pi = ((1\,2)(3\,4)(5\,6\,7), \{8, 10\})$ is the augmentation of the permutation $\pi = (1\,2)(3\,4)(5\,6\,7) \in S_{10}$ by the subset $\{8, 10\}$ of $F(\pi) = \{8, 9, 10\}$. Notice that any given $\hat\pi \in \widehat{S}_n$ corresponds to a unique permutation $\pi \in S_n$; denote the mapping $\hat\pi \mapsto \pi$ by $T$. For $\hat\pi = (\pi, J) \in \widehat{S}_n$, define $I(\hat\pi)$ to be the set of indices $i$ included in $\hat\pi$, in the sense that either $i$ is not a fixed point of $\pi$ or $i \in J$; for our example, $I(\hat\pi) = \{1, 2, 3, 4, 5, 6, 7, 8, 10\}$. Let $\widehat{Q}$ be a probability measure on $\widehat{S}_n$ such that
$$\widehat{Q}(\pi, J) = \widehat{Q}(\pi^{-1}, J) \quad \text{for all } \pi \in S_n \text{ and } J \subseteq F(\pi) = F(\pi^{-1}). \tag{2.0}$$
We refer to this property as augmented symmetry. This terminology is (in part) justified by the fact that if $\widehat{Q}$ is augmented symmetric, then the measure $Q$ on $S_n$ induced by $T$ is given by
$$Q(\pi) = \sum_{J \subseteq F(\pi)} \widehat{Q}((\pi, J)) = Q(\pi^{-1}) \quad \text{for each } \pi \in S_n$$
and so is symmetric in the usual sense. We assume that $Q$ is not concentrated on a subgroup of $S_n$ or a coset thereof. Thus $Q^k$ approaches the uniform distribution $U$ on $S_n$ for large $k$.
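Augmented symmetry is easy to verify mechanically for the transpositions measure (1.1); here is a small check (ours, with hypothetical names), using $n = 4$ and 0-based positions:

```python
from itertools import combinations

n = 4
identity = tuple(range(n))     # a permutation p maps position i to p[i]

def transposition(i, j):
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def inverse(p):
    q = [0] * n
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

# Build Q_hat of (1.1) as a dict {(pi, J): mass}.
Q_hat = {(identity, frozenset({j})): 1 / n**2 for j in range(n)}
for i, j in combinations(range(n), 2):
    Q_hat[(transposition(i, j), frozenset())] = 2 / n**2

assert abs(sum(Q_hat.values()) - 1) < 1e-12    # (1.1) is a probability measure
for (p, J), mass in Q_hat.items():
    assert Q_hat[(inverse(p), J)] == mass      # augmented symmetry (2.0)
```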

Suppose that $G$ is a finite group. Label the elements of $G$ as $g_1, g_2, \ldots, g_{|G|}$. Let $P$ be a probability measure defined on $G$. Define $p_i := P(g_i)$ for $1 \le i \le |G|$. To avoid trivialities, we suppose $p_{\min} := \min\{p_i : 1 \le i \le |G|\} > 0$.

Let $\hat\xi_1, \hat\xi_2, \ldots$ be a sequence of independent augmented permutations, each distributed according to $\widehat{Q}$. These correspond uniquely to a sequence $\xi_1, \xi_2, \ldots$ of permutations, each distributed according to $Q$. Define $Y := (Y_0, Y_1, Y_2, \ldots)$ to be the random walk on $S_n$ with $Y_0 := e$ and $Y_k := \xi_k \xi_{k-1} \cdots \xi_1$ for all $k \ge 1$. (There is no loss of generality in defining $Y_0 := e$, as any other $\pi \in S_n$ can be transformed to the identity by a permutation of the labels.)

Define $X := (X_0, X_1, X_2, \ldots)$ to be the Markov chain on $G^n$ such that $X_0 := \vec{x}_0 = (\chi_1, \ldots, \chi_n)$ with $\chi_i \in G$ for $1 \le i \le n$ and, at each step $k$ for $k \ge 1$, the entries of $X_{k-1}$ whose positions are included in $I(\hat\xi_k)$ are independently changed to an element of $G$ distributed according to $P$. Define $W := (W_0, W_1, W_2, \ldots)$ to be the Markov chain on $G \wr S_n$ such that $W_k := (X_k; Y_k)$ for all $k \ge 0$. Notice that the random walk on $G \wr S_n$ analyzed in Theorem 1.3 is a special case of $W$, with $P$ being the uniform distribution and $\widehat{Q}$ being defined as at (1.1). Let $P(\cdot,\cdot)$ be the transition matrix for $W$ and let $P^{\infty}(\cdot)$ be the stationary distribution for $W$.

Notice that
$$P^{\infty}(\vec{x}; \pi) = \frac{1}{n!} \prod_{i=1}^n p_{x_i}$$
for any $(\vec{x}; \pi) \in G \wr S_n$ and that
$$P((\vec{x};\pi), (\vec{y};\sigma)) = \sum_{\hat\rho \in \widehat{S}_n :\, T(\hat\rho) = \sigma\pi^{-1}} \widehat{Q}(\hat\rho) \Big[\prod_{j \in I(\hat\rho)} p_{y_j}\Big] \cdot \Big[\prod_{\ell \notin I(\hat\rho)} \mathbb{1}\{x_\ell = y_\ell\}\Big]$$

for any $(\vec{x};\pi), (\vec{y};\sigma) \in G \wr S_n$. Thus, using the augmented symmetry of $\widehat{Q}$,
$$\begin{aligned}
P^{\infty}(&\vec{x};\pi)\, P((\vec{x};\pi),(\vec{y};\sigma)) \\
&= \Big[\frac{1}{n!} \prod_{i=1}^n p_{x_i}\Big] \sum_{\hat\rho \in \widehat{S}_n :\, T(\hat\rho) = \sigma\pi^{-1}} \widehat{Q}(\hat\rho) \Big[\prod_{j \in I(\hat\rho)} p_{y_j}\Big] \cdot \Big[\prod_{\ell \notin I(\hat\rho)} \mathbb{1}\{x_\ell = y_\ell\}\Big] \\
&= \sum_{\hat\rho \in \widehat{S}_n :\, T(\hat\rho) = \sigma\pi^{-1}} \widehat{Q}(\hat\rho) \Big[\frac{1}{n!} \prod_{i \in I(\hat\rho)} p_{x_i} \prod_{i \notin I(\hat\rho)} p_{x_i}\Big] \cdot \Big[\prod_{j \in I(\hat\rho)} p_{y_j}\Big] \cdot \Big[\prod_{\ell \notin I(\hat\rho)} \mathbb{1}\{x_\ell = y_\ell\}\Big] \\
&= \sum_{\hat\rho \in \widehat{S}_n :\, T(\hat\rho) = \pi\sigma^{-1}} \widehat{Q}(\hat\rho) \Big[\frac{1}{n!} \prod_{i \in I(\hat\rho)} p_{x_i} \prod_{j \notin I(\hat\rho)} p_{y_j}\Big] \cdot \Big[\prod_{j \in I(\hat\rho)} p_{y_j}\Big] \cdot \Big[\prod_{\ell \notin I(\hat\rho)} \mathbb{1}\{y_\ell = x_\ell\}\Big] \\
&= \Big[\frac{1}{n!} \prod_{j=1}^n p_{y_j}\Big] \sum_{\hat\rho \in \widehat{S}_n :\, T(\hat\rho) = \pi\sigma^{-1}} \widehat{Q}(\hat\rho) \Big[\prod_{i \in I(\hat\rho)} p_{x_i}\Big] \cdot \Big[\prod_{\ell \notin I(\hat\rho)} \mathbb{1}\{y_\ell = x_\ell\}\Big] \\
&= P^{\infty}(\vec{y};\sigma)\, P((\vec{y};\sigma),(\vec{x};\pi)).
\end{aligned}$$
(In the third equality we used the augmented symmetry (2.0) to replace each $\hat\rho$ by $\hat\rho^{-1}$, noting that $I(\hat\rho) = I(\hat\rho^{-1})$, together with the fact that $x_\ell = y_\ell$ for $\ell \notin I(\hat\rho)$.)

Therefore, P is reversible, which is a necessary condition in order to apply the comparison technique of Diaconis and Saloff-Coste (1993a).
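The detailed-balance computation above can be spot-checked numerically. Here is a small sketch (ours, not the paper's; names are hypothetical) for $n = 3$, $G = \{0, 1\}$, a non-uniform $P$, and the measure (1.1), checking $P^{\infty}(w)\,P(w, w') = P^{\infty}(w')\,P(w', w)$ over all pairs of states:

```python
from itertools import permutations, product, combinations
import math

n, p = 3, [0.3, 0.7]                      # P(0) = 0.3, P(1) = 0.7

def compose(a, b):                        # (a o b)[i] = a[b[i]]
    return tuple(a[b[i]] for i in range(n))

identity = tuple(range(n))

def transposition(i, j):
    t = list(range(n)); t[i], t[j] = t[j], t[i]
    return tuple(t)

# Support of Q_hat of (1.1), as triples (rho, J, mass).
Q_hat = [(identity, frozenset({j}), 1 / n**2) for j in range(n)]
Q_hat += [(transposition(i, j), frozenset(), 2 / n**2)
          for i, j in combinations(range(n), 2)]

def I_hat(rho, J):                        # indices included in rho_hat
    return {i for i in range(n) if rho[i] != i} | J

def trans(x, pi, y, sigma):               # P((x;pi),(y;sigma))
    total = 0.0
    for rho, J, mass in Q_hat:
        if compose(rho, pi) != sigma:     # need T(rho_hat) = sigma pi^{-1}
            continue
        inc = I_hat(rho, J)
        if any(x[l] != y[l] for l in range(n) if l not in inc):
            continue
        total += mass * math.prod(p[y[j]] for j in inc)
    return total

def pinf(x):                              # stationary probability of (x; pi)
    return math.prod(p[xi] for xi in x) / math.factorial(n)

states = list(product(product(range(2), repeat=n), permutations(range(n))))
for (x, pi), (y, sigma) in product(states, states):
    assert abs(pinf(x) * trans(x, pi, y, sigma)
               - pinf(y) * trans(y, sigma, x, pi)) < 1e-12
```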

2.2 Convergence to Stationarity: Main Result.

For notational purposes, let
$$\mu_n(J) := \widehat{Q}\{\hat\sigma \in \widehat{S}_n : I(\hat\sigma) \subseteq J\}. \tag{2.1}$$
For any $J \subseteq [n]$, let $S^{(J)}$ be the subgroup of $S_n$ consisting of those $\sigma \in S_n$ with $[n] \setminus F(\sigma) \subseteq J$. If $\hat\pi \in \widehat{S}_n$ is random with distribution $\widehat{Q}$, then, when the conditioning event
$$E := \{I(\hat\pi) \subseteq J\} = \{[n] \setminus F(T(\hat\pi)) \subseteq J\}$$
has positive probability, the probability measure induced by $T$ from the conditional distribution (call it $\widehat{Q}_{S^{(J)}}$) of $\hat\pi$ given $E$ is concentrated on $S^{(J)}$. Call this induced measure $Q_{S^{(J)}}$. Notice that $\widehat{Q}_{S^{(J)}}$, like $\widehat{Q}$, is augmented symmetric and hence that $Q_{S^{(J)}}$ is symmetric on $S^{(J)}$. Let $U_{S^{(J)}}$ be the uniform measure on $S^{(J)}$. For notational purposes, let
$$d_k(J) := \|Q_{S^{(J)}}^k - U_{S^{(J)}}\|_2^2 = |J|! \sum_{\sigma \in S^{(J)}} \big[Q_{S^{(J)}}^k(\sigma) - U_{S^{(J)}}(\sigma)\big]^2, \tag{2.2}$$
where the $L^2$ norm is taken with respect to $U_{S^{(J)}}$, as in Section 1.3.

Example. Let $\widehat{Q}$ be defined as at (1.1). Then $\widehat{Q}$ satisfies the augmented symmetry property (2.0). In Corollary 2.8 we will be using $\widehat{Q}$ to define a random walk on $G \wr S_n$ which is precisely the random walk analyzed in Theorem 1.3.

For now, however, we will be satisfied to determine $\widehat{Q}_{S^{(J)}}$ and $Q_{S^{(J)}}$, where $J \subseteq [n]$. It is easy to verify that
$$\begin{aligned}
\widehat{Q}_{S^{(J)}}(e, \{j\}) &:= \frac{1}{|J|^2} && \text{for each } j \in J, \\
\widehat{Q}_{S^{(J)}}((p\,q), \emptyset) &:= \frac{2}{|J|^2} && \text{for each transposition } (p\,q) \in S_n \text{ with } \{p, q\} \subseteq J, \\
\widehat{Q}_{S^{(J)}}(\hat\pi) &:= 0 && \text{otherwise,}
\end{aligned}$$
and hence that $\widehat{Q}_{S^{(J)}}$ is the probability measure defined at (1.1), but with $[n]$ changed to $J$. Thus, roughly put, the random walk analyzed in Theorem 1.3, conditionally restricted to the indices in $J$, gives a random walk "as if $J$ were the only indices."

The following result establishes an upper bound on the total variation distance by deriving an exact formula for $\|P^k((\vec{x}_0; e), \cdot) - P^{\infty}(\cdot)\|_2^2$.

Theorem 2.3. Let $W$ be the Markov chain on the complete monomial group $G \wr S_n$ defined in Section 2.1. Then
$$\begin{aligned}
\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}}^2
&\le \tfrac14\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2 \\
&= \tfrac14 \sum_{J :\, J \subseteq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k}\, d_k(J) \\
&\quad + \tfrac14 \sum_{J :\, J \subsetneq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k},
\end{aligned}$$
where $\mu_n(J)$ and $d_k(J)$ are defined at (2.1) and (2.2), respectively.

Before proceeding to the proof, we note the following. In the present setting, the argument used to prove Theorem 3.6.5 of Schoolfield (2001a) gives the upper bound
$$\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}} \le \|Q^k - U_{S_n}\|_{\mathrm{TV}} + P(T > k),$$
where $T := \inf\{k \ge 1 : H_k = [n]\}$ and $H_k$ is defined as at the outset of that theorem's proof.

Theorem 2.3 provides a similar type of upper bound, but (a) we work with $L^2$ distance instead of total variation distance and (b) the analysis is more intricate, involving the need to consider how many steps are needed to escape sets $J$ of positions and also the need to know $L^2$ distances for random walks on subsets of $[n]$. However, Theorem 2.3 does derive an exact formula for $L^2$.

Proof. For each $k \ge 1$, let $H_k := \bigcup_{\ell=1}^k I(\hat\xi_\ell) \subseteq [n]$; so $H_k$ is the (random) set of indices included in at least one of the augmented permutations $\hat\xi_1, \ldots, \hat\xi_k$. For any given $w = (\vec{x};\pi) \in G \wr S_n$, let $A \subseteq [n]$ be the set of indices $i$ such that $x_i \ne \chi_i$, where $x_i$ is the $i$th entry of $\vec{x}$ and $\chi_i$ is the $i$th entry of $\vec{x}_0$, and let $B = [n] \setminus F(\pi)$ be the set of indices deranged by $\pi$. Notice that $H_k \supseteq A \cup B$. Then

$$\begin{aligned}
P(W_k = (\vec{x};\pi)) &= \sum_{C :\, A \cup B \subseteq C \subseteq [n]} P(H_k = C,\, W_k = (\vec{x};\pi)) \\
&= \sum_{C :\, A \cup B \subseteq C \subseteq [n]} P(H_k = C,\, Y_k = \pi) \cdot P(X_k = \vec{x} \mid H_k = C) \\
&= \sum_{C :\, A \cup B \subseteq C \subseteq [n]} P(H_k = C,\, Y_k = \pi) \prod_{i \in C} p_{x_i}.
\end{aligned}$$

For any $J \subseteq [n]$, we have $P(H_k \subseteq J,\, Y_k = \pi) = 0$ unless $B \subseteq J \subseteq [n]$, in which case
$$\begin{aligned}
P(H_k \subseteq J,\, Y_k = \pi) &= P(H_k \subseteq J)\, P(Y_k = \pi \mid H_k \subseteq J) \\
&= \widehat{Q}\{\hat\sigma \in \widehat{S}_n : I(\hat\sigma) \subseteq J\}^k\, P(Y_k = \pi \mid H_k \subseteq J) \\
&= \mu_n(J)^k\, P(Y_k = \pi \mid H_k \subseteq J).
\end{aligned}$$

Then, by Möbius inversion [see, e.g., Stanley (1986), Section 3.7], for any $C \subseteq [n]$ we have
$$\begin{aligned}
P(H_k = C,\, Y_k = \pi) &= \sum_{J :\, J \subseteq C} (-1)^{|C| - |J|}\, P(H_k \subseteq J,\, Y_k = \pi) \\
&= \sum_{J :\, B \subseteq J \subseteq C} (-1)^{|C| - |J|}\, \mu_n(J)^k\, P(Y_k = \pi \mid H_k \subseteq J).
\end{aligned}$$
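(Here Möbius inversion over the Boolean lattice is being applied to $f(J) := P(H_k \subseteq J,\, Y_k = \pi) = \sum_{C \subseteq J} P(H_k = C,\, Y_k = \pi)$: if $f(J) = \sum_{C \subseteq J} g(C)$ for all $J$, then $g(C) = \sum_{J \subseteq C} (-1)^{|C| - |J|} f(J)$.)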

Combining these results gives
$$\begin{aligned}
P(W_k = (\vec{x};\pi)) &= \sum_{C :\, A \cup B \subseteq C \subseteq [n]}\ \sum_{J :\, B \subseteq J \subseteq C} (-1)^{|C| - |J|}\, \mu_n(J)^k\, P(Y_k = \pi \mid H_k \subseteq J) \prod_{i \in C} p_{x_i} \\
&= \sum_{J :\, B \subseteq J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, P(Y_k = \pi \mid H_k \subseteq J) \sum_{C :\, A \cup J \subseteq C \subseteq [n]}\ \prod_{i \in C} (-p_{x_i}).
\end{aligned}$$

But for any $D \subseteq [n]$, we have
$$\begin{aligned}
\sum_{C :\, D \subseteq C \subseteq [n]}\ \prod_{i \in C} (-p_{x_i}) &= \Big[\prod_{i \in D} (-p_{x_i})\Big] \sum_{E :\, E \subseteq [n] \setminus D}\ \prod_{i \in E} (-p_{x_i}) \\
&= \Big[\prod_{i \in D} (-p_{x_i})\Big] \prod_{i \in [n] \setminus D} (1 - p_{x_i}) \\
&= \prod_{i \in [n]} \big[\mathbb{1}_{[n] \setminus D}(i) - p_{x_i}\big],
\end{aligned}$$
where (as usual) $\mathbb{1}_S(i) = 1$ if $i \in S$ and $\mathbb{1}_S(i) = 0$ if $i \notin S$. Therefore

$$P(W_k = (\vec{x};\pi)) = \sum_{J :\, B \subseteq J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, P(Y_k = \pi \mid H_k \subseteq J) \prod_{i=1}^n \big[\mathbb{1}_{[n] \setminus (A \cup J)}(i) - p_{x_i}\big].$$
In particular, when $(\vec{x};\pi) = (\vec{x}_0; e)$, we have $A = \emptyset = B$ and
$$\begin{aligned}
P(W_k = (\vec{x}_0; e)) &= \sum_{J :\, J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, P(Y_k = e \mid H_k \subseteq J) \prod_{i=1}^n \big[\mathbb{1}_{[n] \setminus J}(i) - p_{\chi_i}\big] \\
&= \Big[\prod_{i=1}^n p_{\chi_i}\Big] \sum_{J :\, J \subseteq [n]} \mu_n(J)^k\, P(Y_k = e \mid H_k \subseteq J) \prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big).
\end{aligned}$$

Notice that $\{H_k \subseteq J\} = \bigcap_{\ell=1}^k \{I(\hat\xi_\ell) \subseteq J\}$ for any $k$ and $J$. So $\mathcal{L}((Y_0, Y_1, \ldots, Y_k) \mid H_k \subseteq J)$ is the law of a random walk on $S_n$ (through step $k$) with step distribution $Q_{S^{(J)}}$. Thus, using the reversibility of $P$ and the symmetry of $Q_{S^{(J)}}$,
$$\begin{aligned}
\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2
&= \frac{n!}{\prod_{i=1}^n p_{\chi_i}}\, P^{2k}((\vec{x}_0;e),(\vec{x}_0;e)) - 1 \\
&= n! \sum_{J :\, J \subseteq [n]} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k}\, P(Y_{2k} = e \mid H_{2k} \subseteq J) - 1 \\
&= n! \sum_{J :\, J \subseteq [n]} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k} \left[\frac{\|Q_{S^{(J)}}^k - U_{S^{(J)}}\|_2^2 + 1}{|J|!}\right] - 1 \\
&= n! \sum_{J :\, J \subseteq [n]} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k}\, \frac{1}{|J|!}\, (d_k(J) + 1) - 1 \\
&= \sum_{J :\, J \subseteq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k}\, d_k(J) + \sum_{J :\, J \subsetneq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big]\, \mu_n(J)^{2k},
\end{aligned}$$
from which the desired result follows. (The third equality applies the reversible-chain identity from Section 1.3 to the conditioned walk on $S^{(J)}$, whose uniform distribution assigns mass $1/|J|!$ to each point; in the last equality, the term with $J = [n]$, for which $\mu_n([n]) = 1$ and the product over $i \notin J$ is empty, supplies the $+1$ that cancels the $-1$.)

2.3 Corollaries.

We now establish several corollaries to our main result.


Corollary 2.4. Let $W$ be the Markov chain on the complete monomial group $G \wr S_n$ as in Theorem 2.3. For $0 \le j \le n$, let
$$M_n(j) := \max\{\mu_n(J) : |J| = j\} \quad \text{and} \quad D_k(j) := \max\{d_k(J) : |J| = j\}.$$
Also let
$$B(n, k) := \max\{D_k(j) : 0 \le j \le n\} = \max\{d_k(J) : J \subseteq [n]\}.$$
Then
$$\begin{aligned}
\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}}^2
&\le \tfrac14\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2 \\
&\le \tfrac14\, B(n, k) \sum_{j=0}^n \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} M_n(j)^{2k} + \tfrac14 \sum_{j=0}^{n-1} \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} M_n(j)^{2k}.
\end{aligned}$$

Proof. Notice that
$$\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big) \le \Big(\frac{1}{p_{\min}} - 1\Big)^{n - |J|}.$$
The result then follows readily from Theorem 2.3.

Corollary 2.5. In addition to the assumptions of Theorem 2.3 and Corollary 2.4, suppose that there exists $m > 0$ such that $M_n(j) \le (j/n)^m$ for all $0 \le j \le n$. Let $k \ge \frac{1}{m} n \log n + \frac{1}{2m} n \log\Big(\frac{1}{p_{\min}} - 1\Big) + \frac{1}{m} cn$. Then
$$\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}} \le \tfrac12\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2 \le \big[B(n, k) + e^{-2c}\big]^{1/2}.$$

Proof. It follows from Corollary 2.4 that
$$\begin{aligned}
\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}}^2
&\le \tfrac14\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2 \\
&\le \tfrac14\, B(n, k) \sum_{j=0}^n \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} \Big(\frac{j}{n}\Big)^{2km} + \tfrac14 \sum_{j=0}^{n-1} \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} \Big(\frac{j}{n}\Big)^{2km}.
\end{aligned} \tag{2.6}$$

If we let $i = n - j$, then the upper bound becomes
$$\begin{aligned}
\|P^k((\vec{x}_0;e),\cdot) &- P^{\infty}(\cdot)\|_{\mathrm{TV}}^2 \le \tfrac14\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2 \\
&\le \tfrac14\, B(n, k) \sum_{i=0}^n \binom{n}{i} \frac{n!}{(n-i)!} \Big(\frac{1}{p_{\min}} - 1\Big)^i \Big(1 - \frac{i}{n}\Big)^{2km} + \tfrac14 \sum_{i=1}^n \binom{n}{i} \frac{n!}{(n-i)!} \Big(\frac{1}{p_{\min}} - 1\Big)^i \Big(1 - \frac{i}{n}\Big)^{2km} \\
&\le \tfrac14\, B(n, k) \sum_{i=0}^n \frac{1}{i!}\, n^{2i} \Big(\frac{1}{p_{\min}} - 1\Big)^i e^{-2ikm/n} + \tfrac14 \sum_{i=1}^n \frac{1}{i!}\, n^{2i} \Big(\frac{1}{p_{\min}} - 1\Big)^i e^{-2ikm/n},
\end{aligned}$$
using $\binom{n}{i} \frac{n!}{(n-i)!} \le \frac{n^{2i}}{i!}$ and $1 - \frac{i}{n} \le e^{-i/n}$.

Notice that if $k \ge \frac{1}{m} n \log n + \frac{1}{2m} n \log\Big(\frac{1}{p_{\min}} - 1\Big) + \frac{1}{m} cn$, then
$$e^{-2ikm/n} \le \left[\frac{e^{-2c}}{\big(\frac{1}{p_{\min}} - 1\big)\, n^2}\right]^i,$$

from which it follows that
$$\begin{aligned}
\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}}^2
&\le \tfrac14\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2 \\
&\le \tfrac14\, B(n, k) \sum_{i=0}^n \frac{1}{i!}\, e^{-2ci} + \tfrac14 \sum_{i=1}^n \frac{1}{i!}\, e^{-2ci} \\
&\le \tfrac14\, B(n, k) \exp\big(e^{-2c}\big) + \tfrac14\, e^{-2c} \exp\big(e^{-2c}\big).
\end{aligned}$$
Since $c > 0$, we have $\exp(e^{-2c}) < e$. Therefore
$$\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}}^2 \le \tfrac14\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2^2 \le B(n, k) + e^{-2c},$$
from which the desired result follows.

Corollary 2.7. In addition to the assumptions of Theorem 2.3 and Corollary 2.4, suppose that a set with the distribution of $I(\hat\sigma)$ when $\hat\sigma$ has distribution $\widehat{Q}$ can be constructed by first choosing a set size $0 < \ell \le n$ according to a probability mass function $f_n(\cdot)$ and then choosing a set $L$ with $|L| = \ell$ uniformly among all such choices. Let $k \ge n \log n + \frac12 n \log\Big(\frac{1}{p_{\min}} - 1\Big) + cn$. Then
$$\|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_{\mathrm{TV}} \le \tfrac12\, \|P^k((\vec{x}_0;e),\cdot) - P^{\infty}(\cdot)\|_2 \le \big[B(n, k) + e^{-2c}\big]^{1/2}.$$

Proof. We apply Corollary 2.5. Notice that
$$\widehat{Q}\{\hat\sigma \in \widehat{S}_n : I(\hat\sigma) = L\} = \begin{cases} f_n(\ell)\big/\binom{n}{\ell} & \text{if } |L| = \ell, \\ 0 & \text{otherwise.} \end{cases}$$
