Lecture 9: Repeated Games
Advanced Microeconomics II
Yosuke YASUDA
Osaka University, Department of Economics yasuda@econ.osaka-u.ac.jp
January 6, 2015
1 / 12
Finitely Repeated Games (1)
A repeated game, a specific class of dynamic game, is a suitable framework for studying the interaction between immediate gains and long-term incentives, and for understanding how a reputation mechanism can support cooperation.
Let G = {A1, ..., An; u1, ..., un} denote a static game in which players 1 through n simultaneously choose actions a1 through an
from the action spaces A1 through An, and payoffs are u1(a1, ..., an) through un(a1, ..., an).
Definition 1
The game G is called the stage game of the repeated game. Given a stage game G, let G(T ) denote the finitely repeated gamein which G is played T times, with the outcomes of all preceding plays observed before the next play begins.
Assume that the payoff for G(T ) is simply the sum of the
payoffs from the T stage games. 2 / 12
Finitely Repeated Games (2)
Theorem 2
If the stage game G has a unique Nash equilibrium, then, for any finite T , the repeated game G(T ) has a unique subgame perfect Nash equilibrium: the Nash equilibrium of G is played in every stage irrespective of the past history of the play.
Proof.
We can solve the game by backward, that is, starting from the smallest subgame and going backward through the game.
In stage T , players choose a unique Nash equilibrium of G. Given that, in stage T − 1, players again choose the same Nash equilibrium outcome, since no matter what they play the last stage game outcome will be unchanged.
This argument carries over backwards through stage 1, which concludes that the unique Nash equilibrium outcome is played in every stage.
3 / 12
Finitely Repeated Games (3)
When there are more than one Nash equilibrium in a stage game, multiple subgame perfect Nash equilibria may exist.
Furthermore, an action profile which does not constitute a stage game Nash equilibrium may be sustained (for any period t < T) in a subgame perfect Nash equilibrium.
✞
✝
☎
Q The following stage game will be played twice. Can players✆ sustain non-equilibrium outcome (M1, M2) in the first period?
1 2 L2 M2 R2
L1 1, 1 5, 0 0, 0 M1 0, 5 4, 4 0, 0 R1 0, 0 0, 0 3, 3
✞
✝
☎
Rm Note that there are two Nash equilibria in the stage game,✆ i.e., (L1, L2), (R1, R2): future behavior can influence current
behavior in the first period. 4 / 12
Infinitely Repeated Games (1)
Even if the stage game has a unique Nash equilibrium, there may be subgame perfect outcomes of the infinitely repeated game in which no stage game’s outcome is a Nash equilibrium of G. Let G(∞, δ) denote the infinitely repeated game in which G is repeated forever and the players share the discount factor δ.
For each t, the outcomes of the t − 1 preceding plays of the stage game are observed before the t-th stage begins. Each player’s payoff in G(∞, δ) is the average payoff defined as follows.
Definition 3
Given the discount factor δ, the average payoff of the infinite sequence of payoffs π1, π2, ...is
(1 − δ)(π1+ δπ2+ δ2π3+ · · ·) = (1 − δ) X∞ t=1
δt−1πt.
5 / 12
Infinitely Repeated Games (2)
There are a few important remarks:
The history of play through stage t is the record of the players’ choices in stages 1 through t.
The players might have chosen (as1, ..., asn) in stage s, where for each player i the action asi belongs to Ai.
In the finitely repeated game G(T ) or the infinitely repeated game G(∞, δ), a player’s strategy specifies the action the player will take in each stage, for every possible history of play. In the finitely repeated game G(T ), a subgame beginning at stage t + 1 is the repeated game in which G is played T − t times, denoted G(T − t).
In the infinitely repeated game G(∞, δ), each subgame beginning at any stage is identical to the original game. In a repeated game, a Nash equilibrium is subgame perfect if the players’ strategies constitute a Nash equilibrium in every subgame, i.e., after every possible history of the play.
6 / 12
Unimprovability (1)
Definition 4
A strategy σi is called a perfect best response to the other players’ strategies, when player i has no incentive to deviate following any history.
Consider the following requirement that, at first glance, looks much weaker than the perfect best response condition. Definition 5
A strategy for i is unimprovable against a vector of strategies of her opponents if there is no t − 1 period history (for any t) such that i could profit by deviating from her strategy in period t only (conforming thereafter).
To verify the unimprovability of a strategy, one checks only
“one-shot” deviations from the strategy, rather than arbitrarily complex deviations (such as defecting in every period).
7 / 12
Unimprovability (2)
Theorem 6
Let the payoffs of G be bounded. In the repeated game G(S) or G(∞, δ), strategy σi is a perfect best response to a profile of strategies σ if and only if σi is unimprovable against that profile. Proof of ⇒ (⇐ is trivial).
Consider the contrapositive.
1 If σi is not a perfect best response, there must be a history after which it is profitable to deviate to some other strategy.
2 Then, because of discounting and boundedness of payoffs, there must exist a profitable deviation involves defection for finitely many periods (and conforms to σi thereafter).
3 Consider a profitable deviation involving defection at the smallest possible number of period, denoted by T .
4 In such a profitable deviation, the player must be improvable after deviating for T − 1 period.
8 / 12
Repeated Prisoner’s Dilemma (1)
✞
✝
☎
Q The following prisoner’s dilemma will be played infinitely many✆ times. Under what conditions of δ, we can sustain cooperation (C1, C2) as a SPNE?
1 2 D2 C2
D1 1, 1 5, 0
C1 0, 5 4, 4
Suppose that player i plays Ci in the first stage. In the t-th stage, if the outcome of all t − 1 preceding stages has been (C1, C2) then
play Ci; otherwise, play Di.
This strategy is called trigger strategy, because player i cooperates until someone fails to cooperate, which triggers a switch to noncooperation forever after.
If both players adopt this trigger strategy then the outcome of the infinitely repeated game will be (C1, C2) in every stage.
9 / 12
Repeated Prisoner’s Dilemma (2)
To show that the trigger strategy is SPNE, we must verify that the trigger strategies constitute a Nash equilibrium on every subgame of that infinitely repeated game.
✞
✝
☎
Rm Since every subgame of an infinitely repeated game is✆ identical to the game as a whole, we have to consider only two types of subgames: (i) subgame in which all the outcomes of earlier stages have been (C1, C2), and (ii) subgames in which the outcome of at least one earlier stage differs from (C1, C2).
Thanks to the previous theorem, it is sufficient to show that there is no one-shot profitable deviation in every possible history that can realize under the trigger strategy.
Players have no incentive to deviate in (ii) since trigger strategy involves repeated play of one shot NE, (D1, D2).
10 / 12
Repeated Prisoner’s Dilemma (3)
The following condition guarantees that there will be no (one-shot) profitable deviation in (i).
4 + 4δ + 4δ2+ · · · ≥ 5 + δ + δ2+ · · ·
⇐⇒ 3(δ + δ2+ · · · ) ≥ 1
⇐⇒ 3δ
1 − δ ≥ 1 ⇐⇒ δ ≥ 1 4.
✞
✝
☎
Rm Mutual cooperation (C✆ 1, C2) can be sustained as an SPNE outcome by using the trigger strategy when a discount factor is sufficiently large.
The next theorem, called falk theorem, states that large subsets of feasible payoffs are sustained in an SPNE.
11 / 12
Falk Theorem
Definition 7
Payoffs (x1, ..., xn) are called feasible in the stage game G if it is a convex combination of the pure strategy payoffs of G.
Theorem 8 (Falk Theorem)
Let G be a finite, static game. Let(e1, ..., en) denote the payoffs from a Nash equilibrium of G, and let(x1, ..., xn) denote any other feasible payoffs from G. If xi> ei for every player i and if δ is sufficiently close to one, then there exists a subgame perfect Nash equilibrium of the infinitely repeated game G(∞, δ) that achieves (x1, ..., xn) as the average payoff.
Proof See Appendix 2.3.B (Gibbons, pp.100)
✞
✝
☎
Rm The name comes from the fact that the statement (relying✆ on NE rather than SPNE) was widely known among game theorists in the 1950s, even though no one had published it.
12 / 12