(1)

Value on Concurrent Büchi Games

Mathematical Institute Tohoku University Sendai Logic Seminar

October 19, 2012

(2)

Introduction

In this work, we present some recent results on infinite games played on finite graphs.

Roughly speaking, we show that Büchi games have a value (i.e., they are determined) by constructing generalized reachability games. Specifically, we show how to compute ε-optimal strategies and how to approximate the value of the game.

In particular, we show that the value of a Büchi game can be computed by a special payoff function (which we call a weighted reachability payoff function) defined on generalized reachability games.

Finally, we show that the value of a Büchi game can be written as the value of a generalized reachability game.

(3)

Definitions of Concurrent games

Definition

A two-player concurrent game is given by G = (S, X, Y, δ), where

S is a nonempty finite set of states,

X is a nonempty set of actions for Player I,

Y is a nonempty set of actions for Player II,

δ : S × X × Y → S is a transition function that assigns to every state s ∈ S and every pair of actions x ∈ X, y ∈ Y a successor state δ(s, x, y).
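As an illustration of the definition, the tuple (S, X, Y, δ) can be written down directly as a small data structure. This is a minimal sketch, not part of the talk: the class name `ConcurrentGame`, the two-state "matching" game, and the action names "a" and "b" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class ConcurrentGame:
    """Hypothetical container for G = (S, X, Y, delta)."""
    states: FrozenSet[int]                  # S
    actions_i: FrozenSet[str]               # X, actions of Player I
    actions_ii: FrozenSet[str]              # Y, actions of Player II
    delta: Callable[[int, str, str], int]   # delta : S x X x Y -> S

    def successor(self, s: int, x: str, y: str) -> int:
        """Return delta(s, x, y) after checking that all arguments are legal."""
        if s not in self.states or x not in self.actions_i or y not in self.actions_ii:
            raise ValueError("state or action outside the game")
        t = self.delta(s, x, y)
        if t not in self.states:
            raise ValueError("delta left the state space")
        return t


def matching_delta(s: int, x: str, y: str) -> int:
    """Illustrative transition: from state 0, move to state 1 iff both actions agree."""
    if s == 0:
        return 1 if x == y else 0
    return s  # state 1 is absorbing


game = ConcurrentGame(
    states=frozenset({0, 1}),
    actions_i=frozenset({"a", "b"}),
    actions_ii=frozenset({"a", "b"}),
    delta=matching_delta,
)
```

The successor state depends on both choices at once, which is what distinguishes a concurrent game from a turn-based one.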

(4)

Definitions of Concurrent games

Intuitively, a concurrent game is played as follows.

At the current state s ∈ S, Player I selects an action x₁ ∈ X and Player II selects y₁ ∈ Y, each unaware of the other's choice, and the game proceeds to the successor state s′ = δ(s, x₁, y₁). Next, Player I selects x₂ ∈ X and, simultaneously, Player II selects y₂ ∈ Y, and the next state is δ(s′, x₂, y₂). Continuing in this way, the players produce an infinite sequence of rounds.
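The round-by-round description can be sketched as a short simulation that produces a finite prefix of a play. This is only an illustration: it assumes that strategies are functions from the history of states to a probability distribution over actions (consistent with the notation σ(ρ)(x) used later in the talk), and the names `play_prefix`, `sample`, `uniform`, and `always_a`, as well as the example transition function, are hypothetical.

```python
import random


def sample(dist, rng):
    """Draw one action from a distribution given as {action: probability}."""
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def play_prefix(delta, s0, sigma, tau, rounds, rng):
    """Play `rounds` rounds from s0 and return the visited states."""
    history = [s0]
    for _ in range(rounds):
        # Both players choose from the same history, unaware of each other's pick.
        x = sample(sigma(history), rng)
        y = sample(tau(history), rng)
        history.append(delta(history[-1], x, y))
    return history


def delta(s, x, y):
    """Illustrative transition: from state 0, move to state 1 iff both actions agree."""
    if s == 0:
        return 1 if x == y else 0
    return s  # state 1 is absorbing


uniform = lambda history: {"a": 0.5, "b": 0.5}   # mixed strategy
always_a = lambda history: {"a": 1.0}            # pure strategy

prefix = play_prefix(delta, 0, uniform, always_a, rounds=5, rng=random.Random(0))
```

An infinite play is the limit of such prefixes as the number of rounds grows.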

(5)

Notations

A path of G is a finite or infinite sequence s₀, s₁, s₂, ... of states in S such that for all i ≥ 0 there exist x_i ∈ X and y_i ∈ Y with δ(s_i, x_i, y_i) = s_{i+1}.

We refer to an infinite path as an infinite play (or run) w = ⟨s₀, s₁, s₂, ...⟩.

We write Ω(S) for the set of all infinite plays, or simply Ω if S is clear from context. The individual states of a play w are denoted by w(0), w(1), w(2), ...; that is, for every infinite play w and every i ≥ 0, w(i) denotes the i-th state of w.

(6)

Winning Objectives

Reachability objectives

Let T ⊆ S be a set of states, called target states. The reachability objective is defined as follows.

R_T = {w = ⟨s₀, s₁, s₂, ...⟩ ∈ Ω : ∃k ≥ 0, w(k) ∈ T}.

Büchi objectives

Let C ⊆ S be a set of states, called Büchi states. For a run w = ⟨s₀, s₁, s₂, ...⟩, we denote by

In(w) := {s ∈ S : ∃^∞ k such that w(k) = s}

the set of states that occur infinitely often in w. The Büchi objective is then given by

B_C = {w ∈ Ω : In(w) ∩ C ≠ ∅}.

We denote by G(R_T) and G(B_C) the games with reachability and Büchi objectives, respectively.
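To make the two objectives concrete, here is a small sketch that checks them on ultimately periodic plays, i.e. plays of the form prefix · cycle · cycle · ... Representing a play this way is an assumption made for the example (general infinite plays have no finite description), and the function names are hypothetical.

```python
def infinitely_often(prefix, cycle):
    """In(w) for the play w = prefix . cycle . cycle . ...: exactly the cycle states."""
    return set(cycle)


def satisfies_reachability(prefix, cycle, targets):
    """w is in R_T iff some position of w holds a target state."""
    return bool((set(prefix) | set(cycle)) & set(targets))


def satisfies_buchi(prefix, cycle, buchi_states):
    """w is in B_C iff In(w) intersects C."""
    return bool(infinitely_often(prefix, cycle) & set(buchi_states))


# The play 0, 1, 2, 3, 2, 3, 2, 3, ...
prefix, cycle = [0, 1], [2, 3]
reaches_1 = satisfies_reachability(prefix, cycle, {1})   # state 1 is visited once
buchi_1 = satisfies_buchi(prefix, cycle, {1})            # but not infinitely often
buchi_3 = satisfies_buchi(prefix, cycle, {3})            # state 3 recurs forever
```

The example shows that visiting a state once is enough for reachability but not for the Büchi objective.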

(7)

Game Values

Given G(R_T), we define the value function for Player I as follows.

$\mathrm{val}^{\sigma}_s(G(R_T)) = \sup_{\sigma} \inf_{\tau} P^{\sigma,\tau}_s(R_T).$

Similarly, the value function for Player II is given by

$\mathrm{val}^{\tau}_s(G(\neg R_T)) = \sup_{\tau} \inf_{\sigma} P^{\sigma,\tau}_s(\neg R_T).$

The game G with objectives R_T and ¬R_T = Ω \ R_T is determined if

$\mathrm{val}^{\sigma}_s(G(R_T)) + \mathrm{val}^{\tau}_s(G(\Omega \setminus R_T)) = 1.$

Determinacy implies the following equality. For every s ∈ S,

$\sup_{\sigma} \inf_{\tau} P^{\sigma,\tau}_s(R_T) = \inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(R_T).$

When these two quantities are equal, we call their common value the value of the game, denoted by $\mathrm{val}_s(G(R_T))$.

We define these values in the same way for the Büchi game G(B_C).

(8)

Optimal Strategies

A strategy that achieves the value is an optimal strategy. A strategy σ for Player I is optimal if

$\mathrm{val}_s(G(R_T)) = \inf_{\tau} P^{\sigma,\tau}_s(R_T).$

For ε > 0, a strategy σ for Player I is ε-optimal if

$\mathrm{val}_s(G(R_T)) \le \inf_{\tau} P^{\sigma,\tau}_s(R_T) + \varepsilon.$

Optimal and ε-optimal strategies for Player II are defined analogously.

(9)

Determinacy of B¨uchi games

Idea:

We introduce generalized reachability games and first show that these games are determined.

Then, we define a function l* from the states S to the non-negative real numbers (called a valuation) with respect to generalized reachability games.

Finally, we present the determinacy theorem for Büchi games by showing $\mathrm{val}_s(G(R_{T_{l^*}})) = \mathrm{val}_s(G(B_C))$.

(10)

Generalized Reachability Games

We first introduce a simple function from the set of states S to the non-negative real numbers (called a labelling):

$l : S \to \mathbb{R}_{\ge 0}.$

Define

$T_l = \{ s \in S : l(s) > 0 \},$

the set of states with a strictly positive label.

(11)

weighted reachability payoff

Consider the generalized reachability game (denoted by $G(R_{T_l})$) with reachability objective

$R_{T_l} := \{ w \in \Omega : \exists n > 0,\ w(n) \in T_l \},$

whose payoff on a play $w = \langle s_0, s_1, s_2, \dots \rangle \in \Omega$ is described as follows.

$\mathrm{val}_w(G(R_{T_l})) = \begin{cases} l(w(\mu n > 0 : w(n) \in T_l)) & \text{if } \exists n > 0 \text{ s.t. } w(n) \in T_l, \\ 0 & \text{otherwise.} \end{cases}$

Here μn > 0 : w(n) ∈ T_l denotes the least n > 0 such that w(n) ∈ T_l.

We call this value a weighted reachability payoff: it assigns to every run w either 0 (if w does not visit a target state) or the reward of the first target state visited by w.
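The payoff can be computed directly for ultimately periodic plays. The sketch below assumes a play is given as prefix · cycle · cycle · ... and that the labelling is a dictionary from states to non-negative reals; both representations and the function name are assumptions for illustration. Position 0 is skipped because the definition requires n > 0.

```python
def weighted_reachability_payoff(prefix, cycle, label):
    """Label of the first state w(n), n > 0, with positive label; 0.0 if none exists."""
    if not cycle:
        raise ValueError("an infinite play needs a nonempty cycle")
    # Every state occurring at some position n > 0 already occurs among
    # positions 1 .. len(prefix) + 2 * len(cycle) - 1 of the unrolled play.
    unrolled = list(prefix) + list(cycle) + list(cycle)
    for s in unrolled[1:]:
        if label.get(s, 0.0) > 0:
            return label[s]
    return 0.0


label = {1: 0.5, 2: 1.0}   # T_l = {1, 2}

# Play 1, 0, 2, 2, 2, ...: the start state 1 does not count, so the payoff is l(2).
payoff_a = weighted_reachability_payoff([1, 0], [2], label)
# Play 0, 1, 2, 2, 2, ...: state 1 is the first target visited at n > 0.
payoff_b = weighted_reachability_payoff([0, 1], [2], label)
# Play 0, 3, 3, 3, ...: no target is ever visited.
payoff_c = weighted_reachability_payoff([0], [3], label)
```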

(12)

Generalized Reachability Games

It is not hard to check that this game is determined.

To do so, we define a new notion of value for generalized reachability games, called the limit value and denoted by V(s). We then show that V(s) coincides with the value of the game, $\mathrm{val}_s(G(R_{T_l}))$. The definition of V(s) is as follows.

(13)

limit value

Definition

For a given generalized reachability game $G(R_{T_l})$, every state s ∈ S, and every n ≥ 0, the value $V_n(s)$ is defined by

$V_0(s) := 0,$

$V_{n+1}(s) :=$ the value of the one-step game at s that assigns $V_n(s')$ to each successor $s' \notin T_l$ and $l(s'')$ to each successor $s'' \in T_l$.

Note that the sequence $\{V_n(s)\}_{n \in \mathbb{N}}$ is non-decreasing and bounded. Thus, we define the limit value as follows.

$V(s) := \lim_{n \to \infty} V_n(s).$
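The iteration V_0, V_1, V_2, ... can be sketched numerically. To keep the example self-contained it assumes that each player has exactly two actions (0 and 1), so that every one-step game is a 2x2 zero-sum matrix game whose value has a closed form; larger action sets would need a linear-programming solver instead. The function names and the example game are hypothetical.

```python
def matrix_game_value_2x2(m):
    """Value, for the row maximizer, of a 2x2 zero-sum matrix game."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:            # a pure saddle point exists
        return maximin
    # No saddle point: closed form for the mixed-strategy value.
    return (a * d - b * c) / (a + d - b - c)


def limit_value(states, delta, label, steps):
    """Return V_steps as a dict, an approximation of the limit value V."""
    v = {s: 0.0 for s in states}      # V_0(s) := 0
    for _ in range(steps):
        prev = v

        def reward(t):
            # Successors in T_l are worth their label, the others V_n.
            return label[t] if label[t] > 0 else prev[t]

        v = {
            s: matrix_game_value_2x2(
                [[reward(delta(s, x, y)) for y in (0, 1)] for x in (0, 1)]
            )
            for s in states
        }
    return v


def delta(s, x, y):
    """From state 0, matching actions reach the target 1; otherwise stay at 0."""
    if s == 0:
        return 1 if x == y else 0
    return s


states = [0, 1]
label = {0: 0.0, 1: 1.0}              # T_l = {1}
approximations = [limit_value(states, delta, label, n)[0] for n in range(4)]
# approximations == [0.0, 0.5, 0.75, 0.875]
```

In this example V_n(0) = 1 - 2^(-n), so the sequence is non-decreasing and converges to V(0) = 1, although no finite stage attains it.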

(14)

generalized reachability game is determined

We can now prove the following theorem, whose proof essentially follows that of the determinacy theorem for reachability games.

Theorem

$V(s) = \sup_{\sigma} \inf_{\tau} P^{\sigma,\tau}_s(R_{T_l}) = \inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(R_{T_l}) = \mathrm{val}_s(G(R_{T_l})).$

(15)

function $l^*(s)$

Definition

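For every state s ∈ S and every n ≥ 0, a function $l_n : S \to \mathbb{R}_{\ge 0}$ is defined by

$l_0(s) := \begin{cases} 1 & \text{if } s \in C, \\ 0 & \text{otherwise,} \end{cases}$

$l_{n+1}(s) := \begin{cases} \mathrm{val}_s(G(R_{l_n})) & \text{if } s \in C, \\ 0 & \text{otherwise.} \end{cases}$

Note that $\{l_n(s)\}_{n \in \mathbb{N}}$ is a non-increasing bounded sequence, so its limit exists. We therefore set

$l^*(s) := \lim_{n \to \infty} l_n(s).$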
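The sequence l_0, l_1, l_2, ... can be explored numerically by nesting two iterations: the outer one updates the labelling, and the inner one approximates the value of the generalized reachability game by the limit-value iteration. This is a rough sketch under the same simplifying assumption of two actions (0 and 1) per player; the inner loop is truncated after a fixed number of steps, so the results are approximations in general. All names and the example game are hypothetical.

```python
def matrix_game_value_2x2(m):
    """Value, for the row maximizer, of a 2x2 zero-sum matrix game."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin
    return (a * d - b * c) / (a + d - b - c)


def limit_value(states, delta, label, steps):
    """Approximate the value of the generalized reachability game by V_steps."""
    v = {s: 0.0 for s in states}
    for _ in range(steps):
        prev = v
        reward = lambda t: label[t] if label[t] > 0 else prev[t]
        v = {
            s: matrix_game_value_2x2(
                [[reward(delta(s, x, y)) for y in (0, 1)] for x in (0, 1)]
            )
            for s in states
        }
    return v


def buchi_labels(states, delta, buchi_states, rounds, steps):
    """Return [l_0, l_1, ..., l_rounds], each as a dict from states to reals."""
    label = {s: (1.0 if s in buchi_states else 0.0) for s in states}   # l_0
    history = [label]
    for _ in range(rounds):
        v = limit_value(states, delta, label, steps)
        # l_{n+1}(s) is the game value on Buechi states and 0 elsewhere.
        label = {s: (v[s] if s in buchi_states else 0.0) for s in states}
        history.append(label)
    return history


def delta(s, x, y):
    """Stay in the Buechi state 0 iff the actions match; state 1 is a sink."""
    if s == 0:
        return 0 if x == y else 1
    return s


labels = buchi_labels([0, 1], delta, buchi_states={0}, rounds=4, steps=20)
at_state_0 = [l[0] for l in labels]
# at_state_0 == [1.0, 0.5, 0.25, 0.125, 0.0625]
```

In this example l_n(0) = 2^(-n), so l*(0) = 0. This matches the intuition that Player I cannot return to state 0 infinitely often with positive probability when each return succeeds with probability at most 1/2.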

(16)

The theorem

Now, we are ready to prove the main theorem of this section.

Theorem

For a given Büchi game G(B_C) and for all states s ∈ S, the following holds:

$\mathrm{val}_s(G(R_{T_{l^*}})) = \sup_{\sigma} \inf_{\tau} P^{\sigma,\tau}_s(B_C) = \inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B_C) = \mathrm{val}_s(G(B_C)).$

Proof

The last equality holds once the first two equalities are proved. Thus, it is enough to show the following inequalities.

$\mathrm{val}_s(G(R_{T_{l^*}})) \le \sup_{\sigma} \inf_{\tau} P^{\sigma,\tau}_s(B_C) \le \inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B_C) \le \mathrm{val}_s(G(R_{T_{l^*}})).$

(17)

Proof

The proof is divided into two cases.

(1) special case (assuming there exists an optimal strategy σ)

(2) general case (the existence of $\sigma^*_{\varepsilon}$, an HR ε-optimal strategy)

(18)

Special case

Proof

Suppose that there exists an optimal strategy σ for Player I in the game $G(R_{T_{l^*}})$.

We define $\sigma^*(\rho) = \sigma(\rho')$, where ρ′ is the finite play such that $\rho = (\rho \upharpoonright M)\rho'$ and $M = \max(\{0\} \cup \{ i : \rho(i) \in T_{l^*} \})$.

Fix s₀ ∈ S as the starting point of the game.
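The construction of σ* from σ amounts to forgetting everything before the most recent visit to the target set and then playing σ on what remains. A minimal sketch, assuming histories are Python lists of states and strategies are functions of the history (the names `restart_suffix` and `restarting_strategy` are hypothetical):

```python
def restart_suffix(history, targets):
    """Return rho' = history[M:], where M is the last index holding a target (0 if none)."""
    m = 0
    for i, s in enumerate(history):
        if s in targets:
            m = i
    return history[m:]


def restarting_strategy(sigma, targets):
    """Build sigma*(rho) = sigma(rho') from a strategy sigma on histories."""
    return lambda history: sigma(restart_suffix(history, targets))


# Example: sigma only looks at the length of the history it is given.
sigma = lambda history: {"a": 1.0} if len(history) % 2 else {"b": 1.0}
sigma_star = restarting_strategy(sigma, targets={2})

suffix = restart_suffix([0, 1, 2, 0, 2, 1, 1], {2})   # last target at index 4
choice = sigma_star([0, 1, 2, 0, 2, 1, 1])            # sigma applied to [2, 1, 1]
```

Each visit to the target set thus starts a fresh copy of σ, which is what the induction on the number of visits exploits.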

(19)

Proof

First, we show that there exists an optimal strategy σ* of Player I such that, for every n ≥ 1,

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B^n_T) \ge \mathrm{val}_{s_0}(G(R_{T_{l^*}})),$

where

$B^n_T := \{ w \in \Omega : \exists^{\ge n} m \text{ such that } w(m) \in T \}$

and

$T := \{ t \in C : l^*(t) > 0 \}.$

Intuitively, we show that there is an optimal strategy σ* under which the probability of $B^n_T$, the event that there are at least n natural numbers m such that the m-th state of w is in T, achieves the value at state s₀ of the game $G(R_{T_{l^*}})$.

*We may call $G(B^n_T)$ an n-approximation of the Büchi game.

(20)

Proof

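For the case n = 1, the probability of reaching a state s₁ ∈ T from s₀ achieves the value at state s₀. Specifically,

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B^1_T) = \mathrm{val}_{s_0}(G(R_{T_{l^*}})).$

Therefore, the probability of entering T at least n + 1 times satisfies

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B^{n+1}_T) \ge \inf_{\tau,\tau'} \sum_{s_0 \xrightarrow{\rho} s_1 \in T} \mathrm{prob}^{\sigma^*,\tau}_{s_0}(\rho)\, P^{\sigma^*,\tau'}_{s_1}(B^n_T),$

where

$\mathrm{prob}^{\sigma^*,\tau}_{s_0}(\rho) := \prod_{n < |\rho|} \ \sum_{(x,y) \in X \times Y :\ \delta(\rho \upharpoonright n-1,\, x,\, y) = \rho \upharpoonright n} \sigma^*(\rho \upharpoonright n)(x)\, \tau(\rho \upharpoonright n)(y).$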

(21)

Proof

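By the induction hypothesis, the following holds:

$\ge \inf_{\tau} \sum_{s_0 \xrightarrow{\rho} s_1 \in T} \mathrm{prob}^{\sigma^*,\tau}_{s_0}(\rho) \cdot l^*(s_1)$

$= \inf_{\tau} P^{\sigma^*,\tau}_{s_0}(R_{T_{l^*}})$

$= \mathrm{val}_{s_0}(G(R_{T_{l^*}}))$

by the choice of σ*.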

(22)

Proof

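Note that $B^1_T \supseteq B^2_T \supseteq B^3_T \supseteq \cdots$. Thus, we have

$P^{\sigma^*,\tau}_{s_0}\Bigl(\bigcap_{n \in \mathbb{N}} B^n_T\Bigr) = \lim_{n \to \infty} P^{\sigma^*,\tau}_{s_0}(B^n_T),$

which implies

$P^{\sigma^*,\tau}_{s_0}(B_C) \ge \lim_{n \to \infty} P^{\sigma^*,\tau}_{s_0}(B^n_T).$

Since $\mathrm{val}_{s_0}(G(R_{T_{l^*}})) \le \lim_{n \to \infty} P^{\sigma^*,\tau}_{s_0}(B^n_T)$, the following inequality holds.

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B_C) \ge \mathrm{val}_{s_0}(G(R_{T_{l^*}})).$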

(23)

general case

Proof

To show: $\sigma^*_{\varepsilon}$ is an ε-optimal strategy in the game $G(R_{T_{l^*}})$.

Consider the case where there is no optimal strategy for Player I in the game $G(R_{T_{l^*}})$.

The argument is similar to the previous case, but more complicated because of the approximation.

It is enough to show that for any starting point s₀ ∈ S and any ε > 0, there exists σ* such that

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B_C) \ge \mathrm{val}_{s_0}(G(R_{T_{l^*}})) - \varepsilon.$

(24)

Proof

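Fix s₀ ∈ S and ε > 0. Let $\{\varepsilon_n\}_{n \in \mathbb{N}}$ be a sequence of reals such that $\varepsilon_n > 0$ for all n and

$\mathrm{val}_{s_0}(G(R_T)) \prod_{n} (1 - \varepsilon_n) \ge \mathrm{val}_{s_0}(G(R_T)) - \varepsilon.$

Choose a sequence $\{\sigma_n\}_{n \in \mathbb{N}}$ of strategies of Player I such that, for all n, $\sigma_n$ is a $\beta_n$-optimal strategy of Player I in G(R_T), where $\beta_n > 0$ satisfies

$\mathrm{val}_s(G(R_T)) - \beta_n > \mathrm{val}_s(G(R_T))(1 - \varepsilon_n)$ for any s ∈ S.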
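One concrete sequence satisfying the first condition is ε_n = ε / 2^(n+1). This particular choice is an assumption made for illustration and is not taken from the talk. Since ∏(1 − ε_n) ≥ 1 − Σ ε_n = 1 − ε whenever all ε_n lie in [0, 1], and the value is at most 1, multiplying by the value gives the required bound. The sketch below checks the inequality numerically on partial products, which can only overestimate the infinite product by a negligible amount here.

```python
import math


def epsilon_sequence(eps, length):
    """First `length` terms of the illustrative sequence eps_n = eps / 2**(n + 1)."""
    return [eps / 2 ** (n + 1) for n in range(length)]


def partial_product(seq):
    """Product of (1 - eps_n) over the given terms."""
    return math.prod(1.0 - e for e in seq)


def satisfies_bound(value, eps, length=60):
    """Check value * prod(1 - eps_n) >= value - eps for a game value in [0, 1]."""
    product = partial_product(epsilon_sequence(eps, length))
    return value * product >= value - eps


checks = [satisfies_bound(v, e) for v in (0.0, 0.3, 1.0) for e in (0.5, 0.1, 0.01)]
```

Any summable positive sequence with total at most ε would serve equally well.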

(25)

Proof

For a finite play ρ, define $\sigma^*(\rho) = \sigma_n(\rho')$, where ρ′ is the suffix of ρ such that, writing ρ = ρ′′ρ′,

[∀i with 0 < i < |ρ′|, ρ′(i) ∉ T & (ρ ≠ ρ′ ⇒ ρ′(0) ∈ T)]

and

n = ♯{m | ρ′′(m) ∈ T} + 1.

(26)

Proof

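Thus, we have the following inequalities.

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B^{n+1}_T) \ge \inf_{\tau,\tau'} \sum_{s_0 \xrightarrow{\rho} s_1 \in T} \mathrm{prob}^{\sigma^*,\tau}_{s_0}(\rho)\, P^{\sigma^*,\tau'}_{s_1}(B^n_T)$

$\ge \inf_{\tau} \sum_{s_0 \xrightarrow{\rho} s_1 \in T} \Bigl[ \mathrm{prob}^{\sigma^*,\tau}_{s_0}(\rho)\, \mathrm{val}_{s_1}(G(R_{T_{l^*}})) \prod_{0 < i \le n+1} (1 - \varepsilon_i) \Bigr]$

$= \Bigl( \prod_{0 < i \le n+1} (1 - \varepsilon_i) \Bigr) \inf_{\tau} \sum_{s_0 \xrightarrow{\rho} s_1 \in T} \mathrm{prob}^{\sigma^*,\tau}_{s_0}(\rho)\, \mathrm{val}_{s_1}(G(R_{T_{l^*}}))$

$\ge \Bigl( \prod_{0 < i \le n+1} (1 - \varepsilon_i) \Bigr) \inf_{\tau} P^{\sigma^*,\tau}_{s_0}(R_{T_{l^*}})$

$\ge \Bigl( \prod_{0 < i \le n+1} (1 - \varepsilon_i) \Bigr) (1 - \varepsilon_0) \cdot \mathrm{val}_{s_0}(G(R_{T_{l^*}}))$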

(27)

Proof

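$= \mathrm{val}_{s_0}(G(R_{T_{l^*}})) \prod_{i \le n+1} (1 - \varepsilon_i).$

Since

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B_C) \ge \inf_{\tau} P^{\sigma^*,\tau}_{s_0}\Bigl(\bigcap_{n \in \mathbb{N}} B^n_T\Bigr)$

and

$\mathrm{val}_{s_0}(G(R_{T_{l^*}})) - \varepsilon \le \mathrm{val}_{s_0}(G(R_{T_{l^*}})) \prod_{n} (1 - \varepsilon_n),$

we have

$\inf_{\tau} P^{\sigma^*,\tau}_{s_0}(B_C) \ge \mathrm{val}_{s_0}(G(R_{T_{l^*}})) - \varepsilon$

for any ε > 0, which proves the first inequality.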

(28)

Proof

Finally, we have to prove the second inequality, namely

$\inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B_C) \le \mathrm{val}_s(G(R_{T_{l^*}})).$

It is enough to show that

$\inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B_C) \le \mathrm{val}_s(G(R_{T_{l^*}})) + \varepsilon$

for any ε > 0. Note that

$\mathrm{val}_s(G(R_{T_{l^*}})) = \lim_{n \to \infty} \mathrm{val}_s(G(R_{l_n}))$

and

$\mathrm{val}_s(G(R_{l_n})) \ge \mathrm{val}_s(G(R_{l_{n+1}})).$

(29)

Proof.

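Fix ε > 0. There is an n such that

$\mathrm{val}_s(G(R_{l_n})) \le \mathrm{val}_s(G(R_{T_{l^*}})) + \varepsilon.$

Clearly,

$\inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B_C) \le \inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B^n_T) = \mathrm{val}_s(G(R_{l_n})).$

Therefore,

$\inf_{\tau} \sup_{\sigma} P^{\sigma,\tau}_s(B_C) \le \mathrm{val}_s(G(R_{T_{l^*}})) + \varepsilon,$

which proves the theorem.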

(30)

Thank you.