• 検索結果がありません。

2 The discrete Benamou-Brenier condition

N/A
N/A
Protected

Academic year: 2022

シェア "2 The discrete Benamou-Brenier condition"

Copied!
30
0
0

読み込み中.... (全文を見る)

全文

(1)

El e c t ro nic J

o f

Pr

ob a bi l i t y

Electron. J. Probab.19(2014), no. 92, 1–29.

ISSN:1083-6489 DOI:10.1214/EJP.v19-3336

W

1,+

-interpolation of probability measures on graphs

Erwan Hillion

Abstract

We generalize an equation introduced by Benamou and Brenier in [BB00] and char- acterizing WassersteinWp-geodesics forp >1, from the continuous setting of prob- ability distributions on a Riemannian manifold to the discrete setting of probability distributions on a general graph.

Given an initial and a final distributions(f0(x))x∈G, (f1(x))x∈G, we prove the exis- tence of a curve(ft(x))t∈[0,1],x∈Gsatisfying this Benamou-Brenier equation. We also show that such a curve can be described as a mixture of binomial distributions with respect to a coupling that is solution of a certain optimization problem.

Keywords:Optimal transport ; Wasserstein distances ; Discrete probability.

AMS MSC 2010:60J10 ; 60Dxx.

Submitted to EJP on February 21, 2014, final version accepted on September 15, 2014.

1 Introduction

Given some p≥1, we consider the space Pp(X)of probability distributions over a metric space(X, d)having a finitep-th moment. On this space we define the Wasserstein distanceWpby

Wp0, µ1)p:= inf

π∈Π(µ01)

Z

X×X

d(x0, x1)pdπ(x0, x1), (1.1) where the setΠ(µ0, µ1)is the set of couplings ofµ0 andµ1, i.e. the set of probability distributionsπonX×X havingµ0andµ1as marginals.

A comprehensive study of the Problem (1.1), called Monge-Kantorovitch problem, can be found in Villani’s textbooks [Vil03] and [Vil08]. What is important for our pur- poses is that it is possible to prove, under very general assumptions, the existence of a minimizerπ ∈ Π(µ0, µ1) for problem (1.1), called optimal coupling, and that Wp is indeed a metric onPp(X). The couple(Pp(X), Wp)will be called the Wp-Wasserstein space over the metric space(X, d).

University of Luxembourg. E-mail:erwan.hillion@uni.lu

(2)

Given a metric space(X, d)and a continuous curve(γt)t∈[0,1]on the space(X, d), i.e.

a continuous mappingγ: [0,1]→(X, d), we define the length of(γt)t∈[0,1] by L(γ) := sup

0=t0≤···≤tN=1 N−1

X

i=0

d(γti, γti+1). (1.2)

The inequality L(γ)≥d(γ0, γ1)always holds. A curve(γt)t∈[0,1] withγ0 =x, γ1 =y andL(γ) =d(x, y)is called a geodesic joiningxtoy. The metric space(X, d)is called geodesic space if each couple of points x, y ∈ X is joined by at least one continuous geodesic. A large class of geodesic spaces is given by compact Riemannian manifolds.

It is important to remark that a discrete metric space(X, d)(like a graph with its usual distance) cannot be a geodesic space, because by definition of the discrete topology there does not exist continuous curves from[0,1]to(X, d).

A natural question about Wasserstein spaces is the following: given two prescribed probability distributions µ0, µ1 ∈ Pp(X), does there exist a geodesic joining µ0 toµ1? In other terms, is the metric space(Pp(X), Wp)a geodesic space? A partial answer is given by the following:

Proposition 1.1. If (X, d) is a geodesic space, then the metric space(Pp(X), Wp) is also a geodesic space.

In particular, each coupleµ0, µ1∈ P2(X)can be joined by a curve(µt)t∈[0,1] of mini- mal length forW2, calledW2-Wasserstein geodesic.

In their seminal papers [Stu06a], [Stu06b] and [LV09], Sturm, and independently Lott and Villani studied the links between the geometry of a measured geodesic space (X, d, ν)and the behaviour of the entropy functional along theW2-Wasserstein geodesics onP2(X). For instance,(X, d, ν)is said to satisfy the curvature conditionCD(K,∞)for someK∈Rif for each couple of probability distributionsµ0, µ1∈ P2(X)there exists a W2-geodesic(µt)t∈[0,1]such that

∀t∈[0,1], Hνt)≤(1−t)Hν0) +tHν1)−Kt(1−t)

2 W20, µ1)2, (1.3) where the relative entropy functionalHν(·)is defined by

Hν(ρν) :=

Z

X

ρ(x) log(ρ(x))dν(x) (1.4)

ifµ=ρν for some densityρ, and byHν(µ) =∞otherwise.

If the measured geodesic space(X, d, ν)is a compact Riemannian manifold with its usual distance an its normalized volume measure, the curvature conditionCD(K,∞)is shown to be equivalent to the boundRic≥ K on the Ricci curvature tensor. Another important property is the stability of the conditionCD(K,∞)under measured Gromov- Hausdorff convergence.

Moreover, ifCD(K,∞)is satisfied for someK >0, one can prove functional inequal- ities on(X, d, ν)such as the logarithmic Sobolev inequality, which asserts that

Hν(f dν)≤ 1 2K

Z

X

|∇f|2

f dν, (1.5)

for any Lipschitz probability density f and where |∇f| is to be seen as a particular form of the norm of a gradient. As a corollary, it can be shown that under the condition

(3)

CD(K,∞)forK >0a Poincaré inequality holds: for any Lipschitz functionh:X →R such thatR

Xhdν= 0, we have Z

X

h2dν≤ 1 2K

Z

X

|∇h|2dν. (1.6)

Since the works of Sturm and Lott-Villani, the theory of measured geodesic spaces satisfyingCD(K,∞)has been thoroughly studied in a large number of papers, among which the most impressive are the works by Ambrosio, Gigli and Savaré (see for in- stance [AGS12]) and by Erbar, Kuwada and Sturm ([EKS13]).

Several obstacles prevent us from a direct generalization of Sturm-Lott-Villani the- ory to the framework of discrete metric spaces. Indeed, if (X, d) is a graph with its usual distance, equation (1.1) still defines a metric on the spacePp(X), but ifp >1then the length of non-trivial curves in(P(X), Wp)is+∞. (The reader can be convinced of this fact by considering the two-point spaceX :={0,1}and the curve(µt)t∈[0,1]defined onPp(X)by µt(0) := 1−t and µt(1) := t. We then haveWpt, µt0 = |t−t0|1/p, from which we deduce by equation (1.2) that the length ofµin the metric space(Pp(X), Wp) is+∞.) In particular, WassersteinW2-geodesics do not exist in general.

Several solutions have been proposed to overcome this difficulty, and there are now many different definitions of Ricci curvature bounds on discrete spaces. The most no- table of them are the coarse Ricci curvature, defined by Ollivier in [Oll09], and the Erbar-Maas curvature, defined in [EM12]. The latter is based on the study of the gradi- ent flow of the entropy and present some similarities with our own approach. Another approach by Gozlan, Roberto, Samson and Tetali, see [GRST12], is based on the con- struction of interpolating curves between probability distributions on a graph. These interpolating curves are defined as mixtures of binomial distributions, which is reminis- cent of the interpolating curves we introduce in this paper.

In this paper, we place ourselves in the framework of a connected and locally finite graphG, endowed with its usual graph distance and the counting measure as the ref- erence measure. In this framework, a probability distribution will be denoted by its density, i.e. by a functionf :G→R+ such thatP

x∈Gf(x) = 1. Given two probability measuresf0andf1onG, we investigate the question of the generalization of the notion ofWp-geodesic joiningf0tof1in a setting where such a curve does not exist. Our goal is to provide a way to chose, among the set of allW1-geodesics joiningf0tof1, a curve which shares some properties satisfied byWp-geodesics forp >1. Such curves will be calledW1,+ geodesics on the graphG.

This article is to be seen as the first of a two-paper research work. A following article will investigate the convexity properties of the entropy functional along those particularW1 geodesics, in the view of obtaining a discrete version of equation (1.3) strong enough to imply discrete versions of log-Sobolev or Poincaré inequalities. This ultimate goal has to be kept in mind even in this present paper because it will motivate the definition of aW1,+-geodesic betweenf0andf1: along such a curve, some technical tools will allow us to give bounds on the second derivative of the entropy.

Our starting point is the article [BB99], by Benamou and Brenier. In this paper, the authors reformulate the Monge-Kantorovitch problem in terms of velocity fields and prove the following:

Theorem 1.2. Let µ0, µ1 be two probability distributions on a Riemannian manifold (M, g)andp >1. Then

Wp0, µ1)p= inf Z

M

Z 1 0

|vt(x)|pt(x)dt, (1.7)

(4)

the infimum being taken over the families of probability distributions(µt) := (ftdvol) joiningµ0toµ1and all velocity fields(vt(x))satisfying

∂tft(x) =−∇ ·(ft(x)vt(x)), (1.8) where∇·is the divergence operator onM. Moreover the minimizing curve(µt)t∈[0,1]is theWpgeodesic joiningµ0toµ1.

This theorem has been extended to the framework of separable Hilbert spaces by Ambrosio, Gigli and Savré in [AGS].

The strategy used by Erbar and Maas in [EM12] is based on a generalization of the minimization problem (1.7) in the framework of discrete Markov chains. Our approach will consist in defining a discrete version of a characterization of its solutions. More precisely, as pointed in [BB99], the formal optimality condition for the optimization problem (1.7) can be written:

∂tvt(x) =−vt(x)∇ ·vt(x). (1.9) Another point of view on the formal optimality condition (1.9) is provided by writing the velocity field(vt(x))as the gradient of a family of convex functionsvt:= grad Φt. As explained for instance in [OV00], it can be proven that such a functionΦsatisfies the Hamilton-Jacobi equation

∂tΦt+1

2|∇Φt|2= 0. (1.10)

It suffices to consider the gradient of equation (1.10) to recover equation (1.9).

The links between the convexity of the entropy H(t) ofµt and the Ricci curvature tensor on the manifoldM are seen on the following heuristic formula, established by Otto and Villani in [OV00]:

H00(t) = Z

M

[Tr((D2Φt)TD2Φt) +∇Φt·Ric∇Φt]dµt. (1.11) In particular, the non-negativity of the tensorRiceasily implies thatH00(t)≥0.

The formal optimality condition (1.9) on velocity fields makes sense only whenv is regular enough. The question of the regularity of optimal couplings is a difficult topic, see for instance [AGS]. However, what is important for our purposes is that (1.9) can be used to construct W2-geodesics: if(ft(x)) is a smooth family of probability densi- ties satisfying the transport equation (1.8) for a smooth velocity field(vt(x))t∈[0,1],x∈M

satisfying the condition (1.9), then the curve(ft(x))is aW2-geodesic.

In the simpler framework of the real lineRwith its usual distance and the Lebesgue reference measure, it is possible to give an equivalent statement of this result without introducing explicitly the velocity field:

Proposition 1.3. Let (ft(x)) be a family of smooth probability densitites on R. We define the families of functions

gt(x) :=− Z x

−∞

∂tft(z)dz , ht(x) :=− Z x

−∞

∂tgt(z)dz. (1.12) We suppose thatgt>0and that the following one-dimensional Benamou-Brenier condi- tion holds:

ft(x)ht(x) =gt(x)2. (1.13)

Then(ft(x))is aWp-geodesic for anyp >1.

(5)

Proof. To prove Proposition 1.3, it suffices to realize that equation (1.13) easily implies equation (1.9): the conditiongt>0implies that we also haveft>0andht>0, we can thus introduce the velocity functionvt(x) :=gt(x)/ft(x), which is positive and smooth.

The transport equation (1.8) is then satisfied, the divergence operator ∇·being here the derivative ∂/∂x. Moreover, the Benamou-Brenier condition (1.13) is equivalent to the conditionvt(x) =ht(x)/gt(x). We then write

∂tlog(vt(x)) = 1 gt(x)

∂tgt(x)− 1 ft(x)

∂tft(x)

= −vt(x) 1

ht(x)

∂xht(x)− 1 gt(x)

∂xgt(x)

= −vt(x) ∂

∂xlog(vt(x)), which implies equation (1.9) as desired.

Apart from regularity issues, which will not play an important role in a discrete framework, the main restriction made in the statement of Proposition 1.3 is the non- degeneracy conditiongt(z)>0. It is quite easy to prove that such a condition implies thatf0is stochastically dominated byf1. In the setting of graphs, we will introduce the notion of W1-orientation (see Paragraph 2.2) in order to force the function gt to stay positive.

The main purpose of this article is to study curves in the space of probability dis- tributions on a graph which satisfy a discrete version of the Benamou-Brenier condi- tion (1.13).

• The goal of Section 2 is to provide a generalization of equations (1.12) and (1.13) to this discrete setting. We will first show that these equations can be recovered in a particular form in the case of contraction of measures. Given a couple of probability measures f0, f1 defined on G, we then endow Gwith an orientation which will allow us to give a general definition of W1,+-geodesics onG. The ter- minology "W1,+-geodesic" will be explained by considering a discrete version of problem (1.7) whenp >1is close to1.

• In Section 3, we are looking for necessary conditions satisfied byW1,+-geodesics on G. In particular, we will prove in Theorem 3.19 that if f0 and f1 are finitely supported, then any W1,+-geodesic(ft) can be written as a mixture of binomial measures supported on geodesics ofG.

• In Section 4 we prove the existence ofW1,+-geodesics(ft)with prescribed initial and final measures f0 and f1. The construction of such curves suggests strong links with the "Entropic Interpolations" studied in a recent series of papers by Léonard.

2 The discrete Benamou-Brenier condition

In this paper, we consider a locally finite and connected graph G. A path γ onG of length n = L(γ) is a collection of vertices γ0, . . . , γn ∈ G such that γi ∼ γi+1 for everyi= 0, . . . n−1, where the relationx∼ymeans that(xy)is in the edge set of the graphG. To any pathγ :{0, . . . n} →Gare associated its endpointse0(γ) :=γ(0)and e1(γ) :=γ(n).

We will use the usual graph distance onG:d(x, y)is the length of the shortest path joiningxtoy. The set of geodesics joiningxtoy, denoted byΓx,y, is the set of pathsγ joiningxtoysuch thatL(γ) =d(x, y). The set of all geodesics onGis denoted byΓ(G).

(6)

Remark 2.1. In this paper, the word ’geodesic’ is used for two slightly different, al- though closely related, objects. If(X, d)is a geodesic space, a geodesic is a continuous curve γ : [0,1] → (X, d) of minimal length. In the setting of a graph G, a geodesic is a sequence γ : {0, . . . n} → G which is also length-minimizing in the sense that d(γ(0), γ(n)) =n. As there is no continuous curveγ: [0,1]→G, there is no ambiguity.

A couplingπ∈Π(f0, f1)is said to be aWp-optimal coupling for somep≥1if it is a minimizer for the functional

Ip:π→ X

x,y∈G×G

d(x, y)pπ(x, y). (2.1)

We denote byΠp(f0, f1)the set ofWp-optimal couplings.

Remark 2.2. The equalityIp(απ1+ (1−α)π2) =αIp1) + (1−α)Ip2)proves that the setΠp(f0, f1)is a convex subset ofΠ(f0, f1).

2.1 Contraction of measures and the Benamou-Brenier equation

Among early attempts to generalize particular Wasserstein geodesics to the discrete case, one important example is given by the thinning operation:

Definition 2.3. Letf be a probability distribution finitely supported onZ+. The thin- ning off is the family(Ttf)of probability distributions defined by

Ttf(k) :=X

l≥0

binl,t(k)f(l) =X

l≥0

l k

tk(1−t)l−kf(l), (2.2)

where by convention kl

= 0ifl <0or ifk /∈ {0, . . . l}.

In particular,(Ttf)is an interpolation between the Dirac measureT0f(k) =δ(k= 0) andT1f =f. The operationf 7→Ttf is often seen as a discrete version of the operation

f(x)7→ft(x) :=1 tfx

t

, (2.3)

and is for instance used to state a weak law of small numbers (see [HJK10]) about the limit in distribution ofT1/n(f?n)whenn→ ∞.

We know that, given a smooth probability densityf onR, the family(ft)defined by equation (2.3) is aWp-geodesic for anyp≥ 1. According to Sturm-Lott-Villani theory, the metric space (R,| · |) satisfies the condition CD(0,∞), so the entropy H(t) of ft

with respect to the Lebesgue measure is a convex function oft. On the other hand, a theorem by Johnson and Yu (see [YJ09]) asserts that the entropy of the thinningTtf is also a convex function oft. The proof of this fact given in [Hil14] relies on the following:

Proposition 2.4. Let (ft) := (Ttf)be the thinning family associated to a probability distributionf :=f1supported onZ+. We define the families of functions (gt)and (ht) by

gt(k) :=−X

l≤k

∂tft(k), ht(k) :=−X

l≤k

∂tgt(k). (2.4)

The tripleft, gt, htthen satisfies the discrete Benamou-Brenier equation:

∀k∈Z, ft(k)ht(k−1) =gt(k)gt(k−1). (2.5) Moreover,gt(k)≥0, and ifgt(k) = 0then eitherft(k+ 1) = 0orht(k) = 0.

(7)

Remark 2.5. Denoting by ∇1 (resp. ∇2) the left derivative operator (resp. the left second derivative operator) defined by ∇1u(k) := u(k)−u(k −1) (resp. ∇2u(k) = u(k)−2u(k−1) +u(k−2)), we thus have

∂ft(k)

∂t =−∇1gt(k), ∂2ft(k)

∂t2 =∇2ht(k).

The proof of the convexity of the entropy along thinning families relies so importantly on Proposition 2.4 that this proof can be usedverbatim to prove a stronger statement:

Proposition 2.6. Let(ft)be a family of finitely supported probability distributions on Z. We suppose that the families of functions (gt)and (ht), defined by equation (2.4), satisfy the discrete Benamou-Brenier equation (2.5) and the non-negativity condition gt(k)≥0. Then the entropyH(t)offtis a convex function oft.

Because the similarities with equation (1.13), it seems legitimate to consider a family of measures satisfying equation (2.5) and the non-negativity conditiongt(k) ≥ 0 as a pseudoWp-geodesic, forp >1, along which the entropy functional is convex, which is reminiscent of Sturm-Lott-Villani theory.

The notion of thinning has been extended in [Hil14] to the setting of general graphs in the following way: we consider a probability distributionf1defined onGand another probability measuref0 which is a Dirac mass at a given pointo ∈ G. In this case, an interpolating curve(ft)t∈[0,1], called contraction of f1 on o, is defined as a mixture of binomial distributions by

ft:=X

z∈G

1

o,z| X

γ∈Γ(o,z)

binγ,t, (2.6)

where the binomial distribution onγis related to the classical binomial distribution by

∀p∈ {0, . . . L(γ)}, binγ,t(γ(p)) := binL(γ),t(p) (2.7) and where|Γo,z|denotes the cardinality of the setΓo,zof geodesics joiningotoz.

There is a canonical way to associate to each vertexo∈Gan orientation onG: Definition 2.7. Let us fix a vertexo∈G.

• We define a partial order on the set of vertices ofGby writingx1≤x2if the vertex x1belongs to a geodesicγ∈Γ0,x2.

• Ifx1∼x2andx1≤x2, we say that(x1, x2)is an oriented edge and we write(x1x2) orx1→x2. We denote by(G,→)the graphGendowed with this orientation.

• Given a vertexx1∈G, we define the (possibly empty) sets of verticesE(x1),F(x1) by

E(x1) :={x0∈G : x0→x1}, F(x1) :={x2∈G : x1→x2}.

Remark 2.8. Ifx, y∈Gare two vertices such thatd(o, x) =d(o, y)andx∼y then the edge(x, y)∈E(G)is not oriented. To be more rigorous, in the definition of(G,→)one should discard such non-oriented edges.

To the oriented graph(G,→)are associated two other oriented graphs:

Definition 2.9. The oriented edge graph (E(G),→) is the graph of oriented couples x1 → x2 ∈ G, oriented itself by the relation (x1x2) → (x2x3). In particular, for any (x1x2)∈(E(G),→)we have

E((x1x2)) ={(x0x1) :x0∈ E(x1)}, F((x1x2)) ={(x2x3) :x3∈ F(x2)}.

Similarly, we define the graph of oriented triples (T(G),→) := (E(E(G)),→), having as vertices the triples (x1x2x3) with x1 → x2 → x3 and edges between each couple (x0x1x2)and(x1x2x3).

(8)

Remark 2.10. The graphGbeing now oriented, the notationE(G)andT(G)stand for (E(G),→)and(T(G),→), which is a slight abuse of notation. For instance,(xy)∈E(G) imply thatx→y. This remark will still be valid once introduced theW1-orientation on G.

Orienting the graphGallows us to define a divergence operator:

Definition 2.11. The divergence of a functiong:E(G)→Ris the function∇·g:G→R defined by

∀x1∈G , ∇ ·g(x1) := X

x2∈F(x)

g(x1x2)− X

x0∈E(x)

g(x0x1).

Similarly, the divergence of a functionh: T(G)→ Ris the function∇ ·h:E(G)→R defined by

∇ ·h(x1x2) := X

x3∈F(x2)

h(x1x2x3)− X

x0∈E(x1)

h(x0x1x2).

We use this orientation to express the function ft as a product of two functions satisfying interesting differential equations:

Proposition 2.12. There exists a couple(Pt),(Qt)of families of non-negative functions onGsuch that:

1. We haveft(x) =Pt(x)Qt(x).

2. The functionsPandQsatisfy the equations

∂tPt(x1) = X

x0∈E(x1)

Pt(x0), ∂

∂tQt(x1) =− X

x2∈F(x1)

Qt(x2). (2.8)

This proposition is proven in [Hil14]. We can now use Definition 2.11 and Proposi- tion 2.12 to state a generalized version of Proposition 2.4:

Proposition 2.13. We define the families of functions(gt)and(ht), defined respectively onE(G)andT(G), by

gt(x1x2) :=Pt(x1)Qt(x2), ht(x0x1x2) :=Pt(x0)Qt(x2). (2.9) 1. The functionsf,gandhsatisfy the differential equations

∂tft(x1) =−∇ ·gt(x1), ∂

∂tgt(x1x2) =−∇ ·ht(x1x2). (2.10) 2. For every oriented triple(x0x1x2)∈T(G)we have

ht(x0x1x2)ft(x1) =gt(x0x1)gt(x1x2). (2.11)

Remark 2.14. As in the thinning case, Proposition 2.13, and in particular equation (2.11) are used to study the convexity properties of the entropy functional along con- traction families on graphs (see [Hil14, Section 5]).

(9)

2.2 TheW1-orientation

It is not possible to use directly Proposition 2.13 to propose a general Benamou- Brenier condition because such a definition relies on an orientation of the graph G which has been constructed by using the fact thatf0 is Dirac. As a first necessary step in the construction of generalW1,+-geodesics, we thus need to find a nice orientation onG, depending on the initial and final measuresf0andf1.

The term "nice orientation" is vague, but the study of the thinning and of the contrac- tion families suggests that, in order to have interesting consequences on the convexity of the entropy, we should at least require thatgt(x1x2)≥0for everyx1 →x2 ∈ E(G). As we will see at the end of this paragraph, this requirement can be interpreted in the framework of optimal transportation theory.

We first recall some properties of supports ofW1-optimal couplings:

Definition 2.15. Given a couplef0, f1of finitely supported measures, we associate the set

C(f0, f1) :={(x, y)∈G×G : ∃π∈Π1(f0, f1), π(x, y)>0}. (2.12) Equivalently,C(f0, f1)is the smallest subset ofG×Gcontaining the supports of all the W1-optimal couplings betweenf0andf1.

Proposition 2.16. There existsπ∈Π1(f0, f1)such thatSupp(π) =C(f0, f1).

Proof. For every(x, y) ∈ C(f0, f1), there exists a couplingπ(x,y) ∈ Π1(f0, f1)satisfying π(x,y)(x, y)>0. Asf0andf1are finitely supported, we can consider the barycenter

π:= 1

|C(f0, f1)|

X

(x,y)∈C(f0,f1)

π(x,y), (2.13)

which by convexity is inΠ1(f0, f1)and which is clearly fully supported inC(f0, f1). A tool often used when studying the support of optimal couplings is the cyclic mono- tonicity property:

Lemma 2.17. If(x0, y0), . . .(xp, yp)are inC(f0, f1)then

p

X

i=0

d(xi, yi)≤d(x0, yp) +

p−1

X

i=0

d(xi+1, yi). (2.14)

Proof. We consider a coupling π ∈ Π1(f0, f1) as constructed in Proposition 2.16 and a number 0 < a < infi(π(xi, yi)). We introduce the function h on G×G defined by h(x0, yp) := a, h(x, y) := aif (x, y) = (xi+1, yi)for some i ∈ {0, . . . p−1},h(x, y) := −a if (x, y) = (xi, yi)for some i ∈ {0, . . . p}, andh(x, y) := 0 elsewhere. Thenπ+his a coupling inΠ(f0, f1)and

I1(π+h)−I1(π) =a d(x0, yp) +

p−1

X

i=0

d(xi+1, yi)−

p

X

i=0

d(xi, yi)

!

, (2.15)

butπ∈Π1(f0, f1)impliesI1(π+h)≥I1(π), which shows equation (2.14).

Lemma 2.17 is used to define unambiguously an orientation on some edges ofG:

(10)

Theorem 2.18. Letf0, f1 be two fixed probability distributions onG, andγ(1), γ(2) be two geodesics in G such that (e0(i)), e1(i))) ∈ C(f0, f1) fori = 1,2. Then for any (k1, k2)withki ≤L(γ(i))−1 we cannot have both identitiesγ(1)(k1) =γ(2)(k2+ 1)and γ(1)(k1+ 1) =γ(2)(k2).

Proof. We suppose that both identitiesγ(1)(k1) =γ(2)(k2+ 1)andγ(1)(k1+ 1) =γ(2)(k2) hold. By considering the pathe0(1)),· · ·γ(1)(k1), γ(2)(k2+ 2),· · ·e1(2)), we see that

d(e0(1)), e1(2)))≤k1+L(γ(2))−k2−1.

Similarly we have

d(e0(2)), e1(1)))≤k2+L(γ(1))−k1−1.

SinceL(γ(i)) =d(xi, yi)fori= 1,2, we have

d(x1, y1) +d(x2, y2)≥d(x1, y2) +d(x2, y1) + 2, (2.16) which by Lemma 2.17 is a contradiction.

Definition 2.19. Letf0, f1be two finitely supported probability measures onG.

• The W1 orientation associated to f0, f1 is defined by orienting the edge (x, y) ∈ E(G)byx→yif there exists a geodesicγonGsuch that

1. (e0(γ), e1(γ))∈ C(f0, f1).

2. γ(k) =x,γ(k+ 1) =yfor somek∈ {0, . . . L(γ)−1}.

• An oriented path on the oriented graph(G,→)is an applicationγ :{0, . . . L} →G such thatγ(i)→γ(i+ 1)fori= 0, . . . L−1.

• We define a partial order relation on the vertices of Gby writingx ≤ y if there exists an oriented path joiningxtoy.

Remark 2.20. The process described in the first item of Definition 2.19 may not orient every edge(x, y)∈ E(G). For instance, ifG=Z/3Zis the complete graph with three vertices, f0 := δ(0) is the Dirac measure at0 and f1 is any probability distribution on G, then the edge(1,2)is not oriented for theW1-orientation associated to(f0, f1). This issue has already been encountered in the thinning case, see Remark 2.8.

If such edges exist, it is convenient to consider the subgraphG0 ⊂G, which depends on(f0, f1), such that the edge setE(G0)is exactly the set of edges(x, y)∈E(G)which can be oriented and whose vertices are the vertices ofGthat are endpoints of at least one oriented edge in E(G0). By an abuse of notation, we will denote (G0, E(G0)) by (G, E(G)).

An important property of theW1-orientation is the following:

Theorem 2.21. Every oriented path on(G,→)is a geodesic.

Proof. Let γ be an oriented path on(G,→)of lengthn. To show thatγ is a geodesic, it suffices to prove thatd(γ(0), γ(n))≥n. By definition of theW1 orientation, for each i∈ {0, . . . n−1}there exists a geodesicγ(i)of lengthLi≥1andki∈ {0, . . . Li−1}such that

• (e0(i)), e1(i)))∈ C(f0, f1),

• γ(i)(ki) =γ(i),

• γ(i)(ki+ 1) =γ(i+ 1).

(11)

By Lemma 2.17, settingxi:=e0(i))andyi:=e1(i)), we have

n−1

X

i=0

d(xi, yi)≤d(x0, yn−1) +

n−2

X

i=0

d(xi+1, yi). (2.17) But, γ(i) being a geodesic ofG, we have d(xi, yi) =d(e0(i)), e1(i))) = Li. Further- more, fori∈ {0, . . . n−2}we have

d(xi+1, yi) ≤ d(xi+1, γ(i+ 1)) +d(γ(i+ 1), yi)

= d(γ(i+1)(0), γ(i+1)(ki+1)) +d(γ(i)(ki+ 1), γ(i)(Li))

= ki+1+Li−ki−1.

We also have the estimation

d(x0, yn−1) ≤ d(x0, γ(0)) +d(γ(0), γ(n)) +d(γ(n), yn−1)

= d(γ(0)(0), γ(0)(k0)) +d(γ(0), γ(n)) +d(γ(n−1)(Ln−1), γ(n−1)(kn−1+ 1))

= k0+Ln−1−kn−1+ 1 +d(γ(0), γ(n)).

We finally have

n−1

X

i=0

Li≤k0+Ln−1−kn−1+ 1 +d(γ(0), γ(n)) +

n−2

X

i=0

ki+1+Li−ki−1, (2.18) which gives0≤d(γ(0), γ(n))−nand proves the theorem.

The following shows that theW1-orientation is in some sense stable by restriction:

Proposition 2.22. Let(ft)t∈[0,1]be aW1-geodesic onG. For0≤s≤t≤1, let(x, y)in E(G)such thatx→y for theW1-orientation with respect tofs, ft. Thenx→y for the W1-orientation with respect tof0, f1.

Proof. It suffices to show that, ifπ˜ ∈Π1(fs, ft)andπ(b, c)˜ >0thenb≤cfor the partial order coming from theW1-orientation associated tof0, f1.

The proof this fact is inspired by the ’gluing lemma’ stated and explained in [LV09]:

let π(1) ∈ Π1(f0, fs), π(2) ∈ Π1(fs, ft), π(3) ∈ Π1(ft, f1). We consider the ’gluing’ π of these three couplings, defined by:

π(a, d) := X

b,c∈G

π(1)(a, b)π(2)(b, c)π(3)(c, d) fs(b)ft(c) ,

where the quotient is zero when fs(b) = 0 orft(c) = 0. It is easily shown that π ∈ Π(f0, f1). Moreover,

W1(f0, f1) ≤ X

a,d∈G

d(a, d)π(a, d)

≤ X

a,b,c,d∈G

(d(a, b) +d(b, c) +d(c, d))π(a, d)

= X

a,b,c,d∈G

(d(a, b) +d(b, c) +d(c, d))π(1)(a, b)π(2)(b, c)π(3)(c, d) fs(b)ft(c)

= X

a,b∈G

d(a, b)π(1)(a, b) + X

b,c∈G

d(b, c)π(2)(b, c) + X

c,d∈G

d(c, d)π(3)(c, d)

= W1(f0, fs) +W1(fs, ft) +W1(ft, f1)

= W1(f0, f1).

(12)

This shows theW1-optimality ofπand the equality

d(a, d)π(a, d) = (d(a, b) +d(b, c) +d(c, d))π(a, d).

Theorem 2.21 shows that, wheneverπ(a, d) >0, we havea≤b ≤c≤d. On the other hand, ifπ(2)(b, c)>0then there existsa∈Supp(f0)andd∈Supp(f1)withπ(1)(a, b)>0 andπ(3)(c, d)>0, soπ(a, b) = 0and sob≤c.

We now prove:

Theorem 2.23. Let(ft)be a smoothW1-geodesic onG. We endow this graph with the W1-orientation associated tof0, f1. There exists a family of functions(gt) : E(G)→R such that

∂tft(x) =−∇ ·gt(x).

• ∀(xy)∈E(G), gt(xy)≥0.

Moreover, there exists a family(ht) :T(G)→Rsuch that

∀(xy)∈E(G), ∂

∂tgt(xy) =−∇ ·ht(xy).

We first prove a general result implying the existence of a family (gt) such that

∂tft(x) =−∇ ·gt(x):

Lemma 2.24. Let(G,→)be an oriented graph andu:G→Rfinitely supported such thatP

x∈Gu(x) = 0. Then there existsg:E(G)→Rwith∇ ·g=u.

Proof. We consider two scalar products, for functions respectively defined on G and E(G), defined by

< u, v >G:=X

x∈G

u(x)v(x), < a, b >E:= X

x→y

a(xy)b(xy).

The adjoint of the divergence operator∇·is−∂, where∂is the linear operator defined by(∂u)(xy) :=u(y)−u(x), in the sense that

<∇ ·a, u >G=< a, ∂u >E (2.19) for any couple u, a of functions respectively defined on G and E(G). The kernel of

∂ is the one-dimensional space generated by the constant function v = 1. The con- dition P

x∈Gu(x) = 0 is thus equivalent to < u, v >G= 0 or u ∈ (ker(∂))G. We thus want to prove that (ker(∂))G ⊂ range(∇·). As the linear spaces we are con- sidering are finite-dimensional, this inclusion is equivalent to(range(∇·))G ⊂ker(∂). Let u ∈ (range(∇·))G. Then for any b : E(G) → R we have < ∇ ·b, u >G= 0, so

< b, ∂u >E= 0, which proves thatu∈ker(∂).

As we have ∂tft(x) = 0, Lemma 2.24 gives the existence of a family (gt) with

∂tft(x) = −∇ ·gt(x). However, this result does not provide an explicit construction ofgand in general nothing can be said about its sign.

Proof of Theorem 2.23. Let G0 be a spanning tree of G, i.e. a tree having the same vertices as G, but with possibly fewer edges. We endow G0 with the restriction of the orientation on G. According to Lemma 2.24, there exists a family of functions (gt) :E(G0)→R+satisfying ∂tft(x) =−∇·gt(x). AsG0is a tree, we know that removing an edge(x0y0)from the graphG0will cut it into two disjoint subgraphsG01 :=G01(x0y0)

(13)

andG02 :=G02(x0y0). Letu(x0y0) be the indicator function ofG01. This function satisfies (∂u(x0y0))(xy) =−1if(xy) = (x0y0)and(∂u(x0y0))(xy) = 0otherwise, which implies:

gt(x0y0) = − X

(xy)∈(E(G0),→)

gt(xy)(∂u(x0y0))(xy)

= −< gt, ∂u(x0y0)>E=<∇ ·gt, u(x0y0)>G

= −< ∂

∂tft, u(x0y0)>=− X

z∈G01

∂tft(z)

We want to prove that gt(x0y0) ≥ 0. Actually we will prove that the function t 7→

P

z∈G01ft(z) is strictly decreasing, so we have gt(x0y0) > 0. For 0 ≤ s ≤ t ≤ 1, let π∈Π1(fs, ft). We have

X

z∈G01

fs(z) = X

x≤y∈G:x∈G01

π(x, y), X

z∈G01

ft(z) = X

x≤y∈G:y∈G01

π(x, y)

By Proposition 2.22, we know that ifπ(x, y)> 0 thenx≤ y. In particular, we cannot havex ∈ G02 andy ∈ G01. Equivalently, ifx≤ y,π(x, y) > 0 andy ∈ G1 thenx∈ G1. Consequently, we have:

X

z∈G01

fs(z)− X

z∈G01

ft(z) = − X

x≤y∈G:x∈G01,y∈G02

π(x, y)≤0. (2.20) Furthermore, as (x0y0) is an oriented edge, we know by the definition of the W1

orientation that there exists(x, y)∈ C(f0, f1)such thatx≤x0 ≤y0 ≤y. In particular, x∈G01,y∈G02andπ(x, y)>0. This proves that the inequality (2.20) is actually strict, which shows the positivity of the family of functions(gt)onE(G0). The first point of Theorem 2.23 is proven by extendinggttoE(G), settinggt(xy) := 0if(xy)∈/ E(G0).

The existence of a family of functions (ht) such that ∂t gt = −∇ ·ht is proven by Lemma 2.24. We only need to check that P

(xy)∈E(G)

∂tgt(xy) = 0. We are actually going to prove the stronger statement:

X

(xy)∈(E(G),→)

gt(xy) =W1(f0, f1).

To prove this fact, we consider the functionu := P

(x0y0)∈E(G0)u(x0y0). The function u satisfies(∂u)(xy) = 1for everyx→y ∈E(G0). We then have:

Z t 0

X

(xy)∈(E(G),→)

gs(xy)ds = Z t

0

X

(xy)∈(E(G),→)

gs(xy)(∂u)(x, y)ds

= −

Z t 0

X

x∈G

(∇ ·gs)(x)u(x)dt

= Z t

0

X

x∈G

∂sfs(x)u(x)ds

= X

y∈G

ft(y)u(y)−X

x∈G

f0(x)u(x).

Let π ∈ Π1(f0, ft). We know by Proposition 2.22 that if π(x, y) > 0 then x ≤ y. On the other hand, ifx≤y then there exists a pathx=γ0 → · · · → γn =y and we have u(y)−u(x) = (u(γn)−u(γn−1)) +· · ·+ (u(γ1)−u(γ0)) =n=d(x, y), so we have

Z t 0

X

(xy)∈(E(G),→)

gs(xy)ds=X

x≤y

π(x, y)d(x, y) =W1(f0, ft) =tW1(f0, f1).

(14)

Differentiating with respect tot shows that the sumP

(xy)∈E(G)gt(xy)is constant and equal to W1(f0, f1). To finish the proof of the theorem, we extend (ht) to T(G) by defininght(x0x1x2) := 0if(x0x1x2)∈/T(G0).

Actually, Theorem 2.23 can be strengthened in the following way:

Proposition 2.25. In Theorem 2.23, we can replace the assertion ∀(xy) ∈ E(G) , gt(xy)≥0by∀(xy)∈E(G),gt(xy)>0.

Proof. The proof of Theorem 2.23 allowed us to construct, given a spanning treeG0⊂G, a family of functions(gtG0)such that gGt0(xy) > 0 when(xy) ∈ E(G0) and gGt0(xy) = 0 when (xy) ∈/ E(G0). But for each edge (x0y0) ∈ E(G) there exists a spanning tree G0⊂Gwith(x0y0)∈E(G0). We define a family(gt) :E(G)→Ras the barycenter

∀(xy)∈E(G), gt(xy) := 1

|T | X

G0∈T

gtG0(xy),

where T is the (finite) set of spanning trees for G. Then gt > 0 and satisfies the conditions of Theorem 2.23. We finally construct a suitable family (ht) by defining ht := |T |1 P

G0∈T hGt0, where (h(G

0)

t ) is constructed from (gtG0)as in the proof of Theo- rem 2.23.

2.3 Definition ofW1,+-geodesics

Having now constructed an orientation of G associated to each couple of finitely supported probability distributions f0, f1 ∈ P1(G), we propose a definition of W1,+- geodesics inspired by Proposition 2.13:

Definition 2.26. Let G be a graph, W1-oriented with respect to a couple of finitely supported probability measuresf0, f1. A family(ft)t∈[0,1] ∈ P(G)is said to be aW1,+- geodesic if:

1. The curve(ft)t∈[0,1] is aW1-geodesic.

2. There exists two families(gt)and(ht)defined respectively onE(G)andT(G)such that

∂tft=−∇ ·gt, ∂

∂tgt=−∇ ·ht. 3. For every(xy)∈E(G)we havegt(xy)>0.

4. The triple(ft, gt, ht)satisfies the Benamou-Brenier equation

∀(x0x1x2)∈T(G), ft(x1)ht(x0x1x2) =gt(x0x1)gt(x1x2). (2.21) Remark 2.27. In the sequel,the assertion ’let (ft) be a W1,+-geodesic’ means ’let ((ft),(gt),(ht)) be a triple of families of functions satisfying the conditions of Defini- tion 2.26’. This is an abuse because nothing is known about the uniqueness of the families(gt)and(ht)associated to aW1,+-geodesic.

Remark 2.28. We can check that any contraction of measure is also aW1,+-geodesic:

iff0ois a Dirac measure, then the setΠ1(f0, f1)has only one element, and it easy to prove that theW1-orientation with respect to f0, f1coincide with the orientation used for contraction of measures. Proposition 2.13 shows that the other points of Defini- tion 2.26 are satisfied by contraction families.

It is possible to state (2.21) in terms of two different velocity fields:

(15)

Proposition 2.29. Let(ft)t∈[0,1]be aW1,+-geodesic onG. We define the velocity fields v+,tandv−,t by

v+,t(x0x1) :=gt(x0x1)

ft(x0) , v−,t(x0x1) :=gt(x0x1)

ft(x1) (2.22)

and the velocity functionsV+,tandV−,tby V+,t(x1) := X

x2∈F(x1)

v+,t(x1x2), V−,t(x1) := X

x0∈E(x1)

v−,t(x0x1). (2.23)

The following differential equations then hold:

∂tv+,t(x0x1) =−v+,t(x0x1) [V+,t(x1)−V+,t(x0)], (2.24)

∂tv−,t(x0x1) =−v−,t(x0x1) [V−,t(x1)−V−,t(x0)]. (2.25) Proof. We use the definitions ofgtand ht, and then apply the Benamou-Brenier equa- tion (2.21) to write:

∂tv+,t(x0x1) = gt(x0x1) ft(x0)2

 X

˜ x1∈F(x0)

gt(x01)− X

x−1∈E(x0)

gt(x−1x0)

+ 1

ft(x0)

− X

x2∈F(x1)

gt(x0x1)gt(x1x2)

ft(x1) + X

x−1∈E(x0)

gt(x−1x0)gt(x0x1) ft(x0)

= v+,t(x0x1)

 X

˜ x1∈F(x0)

gt(x01)

ft(x0) − X

x2∈F(x1)

gt(x1x2) ft(x1)

= v+,t(x0x1) [V+,t(x0)−V+,t(x1)]. The second formula is proven by similar methods.

Remark 2.30. Proposition 2.29 is reminiscent of the continuous case described in the introduction. In particular, equation (2.22) is similar to the transport equation (1.8) (see also the proof of Proposition 1.3) and equations(2.24)and (2.25)are similar to the formal optimality condition(1.9).

We now give some heuristic arguments explaining the terminology ’W1,+-geodesic’.

Let us consider the minimization problem described by equation (1.7) of Theorem 1.2, when the paramaterp= 1+εis close to1. We use the expansiona1+ε=aexp(εlog(a)) = a+ε alog(a) +O(ε2), valid fora >0, to write

Z

M

Z 1 0

|vt(x)|pt(x)dt= Z

M

Z 1 0

|vt(x)|dµt(x)dt+ε Z

M

Z 1 0

|vt(x)|log(|vt(x)|)dµt(x)dt+O(ε2).

The integralR

M

R1

0 |vt(x)|dµt(x)dtis exactly equation (1.7) forp= 1. We thus know, by Theorem 1.2 that the minimizers of this integral over the set of families(ft)of proba- bility measures withf0, f1prescribed and ∂tft(x) +∇ ·(vt(x)ft(x)) = 0are exaclty the W1-geodesics joiningf0tof1. This suggests the following:

Definition 2.31. We say that a curve (ft) of probability measures on a Riemannian manifoldM is aW1,+-geodesic onM if it is solution to the minimization problem

inf Z

M

Z 1 0

|vt(x)|log(|vt(x)|)dµt(x)dt,

(16)

where the infimum is taken over the set of all W1-geodesics between f0 and f1 and where the velocity field(vt)is defined by the continuity equation

∂tft(x) +∇ ·(vt(x)ft(x)) = 0.

The formal optimality condition on (vt)obtained by applying Euler-Lagrange equa- tions is the same as forWp-geodesics:

∂tvt(x) =−vt(x)∇vt(x).

The next proposition shows thatW1,+-geodesics on a graph can be related to a min- imization problem similar to the continuous one described in Definition 2.31:

Proposition 2.32. Let Gbe aW1-orientated with respect tof0, f1 finitely supported.

We consider the problem

infI+(f, g) := inf Z 1

0

X

x→y

v+,t(xy) log(v+,t(xy))ft(x), (2.26) where the infimum is taken over the set ofW1-geodesics (ft) betweenf0 and f1 such that the velocityv+,t(xy)is defined by equation(2.22)from a family of non negative(gt) with ∂tft(x) =−∇ ·gt(x).

We suppose that there exists a W1,+-geodesic(ft) joining f0 tof1. Then (ft) is a critical point forI+ in the following sense: if(ut) is a family of functions defined on E(G)satisfying the boundary conditionsu0(xy) =u1(xy) = 0, then

I+

f +η∇ ·u, g−η∂u

∂t

=I+(f, g) +O(η2). (2.27) Remark 2.33. Recall that, given aW1-geodesic(ft), the continuity equation ∂tft(x) =

−∇ ·gt(x)may be solved by a family(gt)which is not necessarily always positive. We restrict ourselves to the families of non-negative (gt), which always exist by Proposi- tion 2.25, in order to write|v+,t(xy)|=v+,t(xy).

Proof of Proposition 2.32. Whenηis small, we have the expansion I+

f +η∇ ·u, g−η∂u

∂t

− I+(f, g)

=−η Z 1

0

X

x→y

∂tut(x, y) (1 + log(v+,t(xy))) + (∇ ·ut)(x)vt(xy)dt+O(η2) On the other hand, we use the boundary conditionsu0=u1= 0to write:

Z 1 0

X

x→y

∂tut(x, y) (1 + log(v+,t(xy))) = − Z 1

0

X

x→y

ut(x, y)∂

∂t(1 + log(v+,t(xy)))dt

= −

Z 1 0

X

x→y

ut(xy) 1 vt(xy)

∂tvt(xy)dt

= Z 1

0

X

x→y

ut(xy)[V+,t(y)−V+,t(x)]dt

= −

Z 1 0

X

x∈G

(∇ ·ut)(x)V+,t(x)dt

= Z 1

0

X

x→y

(∇ ·ut)(x)vt(xy)dt,

参照

関連したドキュメント

The approach based on the strangeness index includes un- determined solution components but requires a number of constant rank conditions, whereas the approach based on

Many interesting graphs are obtained from combining pairs (or more) of graphs or operating on a single graph in some way. We now discuss a number of operations which are used

In Section 3 the extended Rapcs´ ak system with curvature condition is considered in the n-dimensional generic case, when the eigenvalues of the Jacobi curvature tensor Φ are

We show that a discrete fixed point theorem of Eilenberg is equivalent to the restriction of the contraction principle to the class of non-Archimedean bounded metric spaces.. We

Reynolds, “Sharp conditions for boundedness in linear discrete Volterra equations,” Journal of Difference Equations and Applications, vol.. Kolmanovskii, “Asymptotic properties of

Then it follows immediately from a suitable version of “Hensel’s Lemma” [cf., e.g., the argument of [4], Lemma 2.1] that S may be obtained, as the notation suggests, as the m A

As a consequence of this characterization, we get a characterization of the convex ideal hyperbolic polyhedra associated to a compact surface with genus greater than one (Corollary

[Mag3] , Painlev´ e-type differential equations for the recurrence coefficients of semi- classical orthogonal polynomials, J. Zaslavsky , Asymptotic expansions of ratios of