Value on Concurrent B¨uchi games
Mathematical Institute Tohoku University Sendai Logic Seminar
October 19, 2012
Introduction
In this work, we present some recent results in the infinite games played on a finite graphs.
Roughly speaking, we show the existence of value on B¨uchi game (i.e., determined) by constructing generalized reachability games. Specifically, we provide how to compute an ǫ-optimal strategies and approximate a value of game in some way.
In particular, we show the value of B¨uchi game can be computed by special payoff function (we may call a weighted reachability payoff function) defined on generalized reachability games.
Finally, we show that the value of B¨uchi game can be written as a value of generalized reachability games.
Definitions of Concurrent games
Definition
A two-player concurrent game is given by G = (S, X , Y , δ), where S is a nonempty finite set of states
X is a nonempty set of strategies of Player I (actions) Y is a nonempty set of strategies of Player II (actions) a transition function δ : S × X × Y → S that specifies for every state s ∈ S, for each x ∈ X and for all y ∈ Y determine a successor state δ(s, x, y ).
Definitions of Concurrent games
Intuitively, a concurrent game is play as follows.
For every state s in S, Player I selects an element x1∈ X and Player II chooses y1 ∈ Y , each unaware of the choice of the other and then the game proceeds to the successor state δ(s, x1, y1). Next, Player I selects x2 ∈ X , and simultaneously the Player II selects y2∈ Y and the next state is δ(s′, x2, y2). By this way, they produce an infinite number of rounds.
Notations
A path of G is a finite or infinite sequence s0, s1, s2, ... of states in S such that for all i ≥ 0, there exist x1i ∈ X and there exist y1i ∈ Y where δ(si, x1i, y1i) = si+1.
We refer to infinite sequence of states as an infinite play (or run) w =< s0, s1, s2, ... >.
We denote Ω(S) be the set of all infinite play, or Ω if S is clear. The individual states in w are denoted by w (0), w (1), w (2), ..., and for every infinite play w and every i ≥ 0 we use w (i) to denote the i-th occurence of run w .
Winning Objectives
Reachability objectives
Let T ⊆ S be the set of states called target states. The reachability objective is defined as follows.
RT = {w =< s0, s1, s2, ... >∈ Ω : ∃k ≥ 0, w (k) ∈ T }. B¨uchi objectives
Let C ⊆ S be set of states called B¨uchi states. For a run w =< s0, s1, s2, ... >, we denote
In(w ) := {s ∈ S : ∃∞k such that w (k) = s} set of states that occur infinitely often in w . Thus, B¨uchi objective is given by
BC = {w ∈ Ω : In(w ) ∩ C 6= ∅}.
We denoted G(RT) and G(BC) the game with reachability and B¨uchi objectives, respectively.
Game Values
Given G(RT), we define the value function for Player I as follows. valσs(G(RT)) = sup
σ
infτ P σ,τ s (RT).
Similarly, the value function for Player II is given by valτs(G(¬RT)) = sup
τ
infσ P σ,τ
s (¬RT).
The game G with objective RT and ¬RT is determinate if valσs(G(RT)) + valτs(G(Ω \ RT)) = 1.
Determinacy implies the following equalities. For every s ∈ S, sup
σ
infτ P σ,τ
s (RT) = inf τ supσ P
σ,τ s (RT).
If these two quantities are equal, we call them the value of the game, denoted by vals(G(RT)).
We define these value in similar way for B¨uchi game (G(BC)).
Optimal Strategies
A strategy that achieves the value is an optimal strategy. Player I strategy σ is optimal if
vals(G(RT)) = inf
τ P σ,τ s (RT).
For ǫ > 0, a strategy σ for Player I is ǫ−optimal if vals(G(RT)) ≤ inf
τ P
sσ,τ(RT) + ǫ. An optimal and ǫ-optimal strategy for Player II is defined analogously.
Determinacy of B¨uchi games
Idea:
We introduce generalizing reachability game. We first show this game is determined.
Then, we define a function l∗ on the states S over the non-negative real numbers (called a valuation) w.r.t. generalizing reachability game.
Finally, we present the determinacy theorem for B¨uchi games (by showing vals(G(RTl∗)) = vals(G(BC))).
Generalized Reachability Games
We first introduce a simple function on the set of states S over the non-negative real numbers (called a labelling) as follows.
l : S → R≥0 Define
Tl = {s ∈ S : l(s) > 0} set of states with strictly positive labelled.
weighted reachability payoff
Consider the generalized reachability games (denoted by G(RTl)) with reachability objectives RTl := {w ∈ Ω : ∃n > 0, w (n) ∈ Tl} and whose values can be describes as follows.
valw(G(RTl)) =
(l(w (µn > 0 : w (n) ∈ Tl)) if ∃n > 0 s.t. w (n) ∈ Tl
0 if otherwise
for a w =< s0, s1, s2, ... >∈ Ω.
We may call this values as a weighted reachability payoff which assigns to every run w either 0 (if w does does not visit a target state) or the reward of the first target state visited by w .
Generalized Reachability Games
It is not hard to check this game is determined or not.
We need to define a new interpretation of value (called a limit value) of generalized reachability games (denoted by V (s)). We then show that this value can be written as a value of the game vals(G(RTl)), where the definition of V (s) is given as follows.
limit value
Definition
For a given generalized reachability game G(RTl), and for every state s ∈ S, for any n ≥ 0, the value Vn(s) is defined by
V0(s) := 0
Vn+1(s) :=
the value of one step game
assigning Vn(s′) on s′ if s′ ∈ T/ l
and
l(s′′) on s′′ if s′′∈ Tl
Note that the sequence of {Vn(s)}n∈N is non-decreasing and bounded. Thus, we write the limit value as follows.
V (s) := lim
n→∞Vn(s).
generalized reachability game is determined
We can now prove the following theorem whose proof basically follows from the theorem for Det. of Reachability games. Theorem
V (s) = supσinfτPsσ,τ(RTl) = infτsupσPsσ,τ(RTl) = vals(G(RTl))
function l
∗(s)
Definition
For every state s ∈ S and for any n ≥ 0, a function ln : S → R≥0 is defined by
l0(s) :=
(1 if s ∈ C 0 if otherwise
ln+1(s) :=
(vals(G(Rln)) if s ∈ C
0 if otherwise
Note that {ln(s)}n∈N is a non-increasing bounded sequence and the limit exists. Therefore we set
l∗(s) := lim
n→∞ln(s).
The theorem
Now, we are ready to prove our main theorem of this section. Theorem
For a given B¨uchi game G(BC), and for all states s ∈ S, the followings are hold
vals(G(RTl∗)) = sup
σ
infτ P σ,τ
s (BC) = infτ sup σ
Psσ,τ(BC) = vals(G(BC))
Proof
The last equality is holds if we prove the first two equalities. Thus, it is enough by showing the following inequalities.
vals(G(RT∗
l )) ≤ sup
σ
infτ P σ,τ
s (BC) ≤ inf τ supσ P
σ,τ
s (BC) ≤ vals(G(RT∗
l )).
Proof
The proof is devided by two phases.
(1) special case (assuming there exist an optimal strategy σ) (2) general case (the existence of σǫ∗: HR ǫ-optimal strategy)
Special case
Proof
Suppose that there exist an optimal strategy σ for Player I in game G(RT∗
l ).
We define σ∗(ρ) = σ(ρ′) where ρ′ is the finite play such that ρ = (ρ ↾ M)ρ′ and M = max{0, i : ρ(i) ∈ Tl∗}.
Fix s0 ∈ S as a start point of the game.
Proof
Firstly, we want to show that there exist an optimal strategy σ∗ of Player I such that
infτ P σ∗,τ
s0 (BTn) ≥ vals0(G(RTl∗)) where
BnT := {w ∈ Ω : ∃≥nm such that w (m) ∈ T } and
T := {t ∈ C : l∗(t) > 0}.
Intuitively, we will show there is an optimal strategy σ∗ where the probability of reaching BTn, there exist n many natural numbers m such that m-th element of w is in T achieve the value at state s0
of G(RT∗
l ) games.
*We may call G(BTn) an n-approximation of B¨uchi games.
Proof
For the case n = 1, the probability of reaching state s1 from s0 where s1 ∈ T is achieve the value at state s0. Specifically,
infτ P σ∗,τ
s0 (BT1) = vals0(G(RTl∗)).
Therefore, the probability of entering T in n + 1 times is given by infτ P
σ∗,τ
s0 (BTn+1) ≥ infτ,τ′
X
s0−→ρ s1∈T
(probsσ0∗,τ(ρ)Psσ1∗,τ′(BTn))
where
probσs0∗,τ(ρ) := Y
n<|ρ|
X
x,y∈X ×Y : δ(ρ↾n−1,x,y)=ρ↾n
σ∗(ρ↾n)(x)τ (ρ↾n)(y ).
Proof
By induction hypothesis, the followings are holds
≥ inf
τ
X
s0
−ρ
→s1∈T
(probsσ0∗,τ(ρ) · l∗(s1)
= inf
τ P σ∗,τ s0 (RTl∗)
= vals0(G(RT∗
l ))
by the choice of σ∗.
Proof
Note that BT1 ⊇ B2T ⊇ BT3 ⊇ .... Thus, we have Psσ0∗,τ(\
n∈N
BnT) = lim
n→∞P σ∗,τ s0 (BTn)
which implies
Psσ0∗,τ(BC) ≥ lim
n→∞P σ∗,τ s0 (BTn).
Since vals0(G(RT∗
l )) ≤ limn→∞P
σ∗,τ s0 (B
n
T), the following equation is holds.
infτ P σ∗,τ
s0 (BC) ≥ vals0(G(RTl∗)).
general case
Proof
To show σǫ∗: ǫ-optimal strategy in game G(RTl ∗).
Let us consider the case there is no optimal strategy for Player I in game G(RTl ∗).
The argument will be similar to the previous case, but more complicated because of approximation.
It is enough to show that for any s0 ∈ S, a starting point and for any ǫ > 0, there exist σ∗ such that
infτ P σ∗,τ
s0 (BC) ≥ vals0(G(RTl∗)) − ǫ.
Proof
Fix s0 ∈ S and ǫ > 0. Let {εn}n∈N be a positive sequence of reals s.t. ∀n, εn> 0,
vals0(G(RT))Y
n
(1 − εn) ≥ vals0(G(RT)) − ǫ.
Choose {σn}n∈N be a sequence of strategies of Player I s.t. ∀n, σn is a βn-optimal strategy of Player I in G(RT), where βn> 0 satisfies
vals(G(RT)) − βn> vals(G(RT))(1 − εn) for any s ∈ S.
Proof
For a finite play ρ, define σ∗(ρ) = σn(ρ′) where ρ′⊇ ρ such that for any ρ′′,
ρ = ρ′′ρ′ satisfies
[∀i < |ρ′|, i > 0, ρ(i) /∈ T & (ρ 6= ρ′ ⇒ ρ′(0) ∈ T )] and
n = ♯{m|ρ′′(m) ∈ T } + 1.
Proof
Thus, we have a following equalities. infτ P
σ∗,τ s0 (B
n+1 T ) ≥ infτ,τ′
X
s0−→ρ s1∈T
(probsσ0∗,τ(ρ)Psσ1∗,τ′(BnT))
≥ inf
τ
X
s0
−ρ
→s1∈T
[(probσs0∗,τ(ρ)vals1(G(RTl∗)) Y
i≤n+1 i>0
(1 − εi)]
= ( Y
i≤n+1 i>0
(1 − εi)) inf
τ
X
s0−→ρ s1∈T
(probsσ0∗,τ(ρ)vals1(G(RT∗
l )))
≥ ( Y
i≤n+1 i>0
(1 − εi)) inf
τ P σ∗,τ s0 (RTl ∗)
≥ ( Y
i≤n+1 i>0
(1 − εi))(1 − ε0) · vals0(G(RT∗
l ))
Proof
= vals0(G(RTl∗)) Y
i≤n+1
(1 − εi).
Since
infτ P σ∗,τ
s0 (BC) ≥ infτ P
σ∗,τ s0 (
\
n∈N
BnT) and
vals0(G(RTl∗)) − ǫ ≤ vals0(G(RTl∗))Y
n
(1 − εn), then we have
infτ P σ∗,τ
s0 (BC) ≥ vals0(G(RTl∗)) − ǫ,
for any ǫ > 0, which proved the first equality.
Proof
Finally, we have to prove the second inequality such that infτ supσ P
σ,τ
s (BC) ≤ vals(G(RT∗
l )).
It is enough to show that infτ supσ P
σ,τ
s (BC) ≤ vals(G(RTl∗)) + ǫ
for any ǫ > 0. Note that vals(G(RT∗
l )) = limn→∞vals(G(Rln))
and
vals(G(Rln)) ≥ vals(G(Rln+1)).
Proof.
Fix ǫ > 0. There is an n such that
vals(G(Rln)) ≤ vals(G(RTl∗)) + ǫ. Clearly,
infτ supσ P σ,τ
s (BC) ≤ inf τ supσ P
σ,τ
s (BTn) = vals(G(Rln)).
Therefore,
infτ supσ P σ,τ
s (BC) ≤ vals(G(RTl∗)) + ǫ.
which proved the theorem.
Thank you.