3. Repeated Games
(and Long-term Relationships)
• The same game is repeatedly played by the same players
• A theory of long-term relationships
• The past outcome (history) is
• Completely observable: Perfect monitoring
• Not completely observable: Imperfect monitoring
• Players commonly observe something: Imperfect public monitoring
• Players privately observe something: Imperfect private monitoring
• Can people cooperate with each other when they are in a
long-term relationship?
Stage game
3.1. Finitely Repeated Games
E.g., Prisoners’ Dilemma
• Stage game actions: $A_i = \{C, D\}$, $A = A_1 \times A_2$
• Stage payoffs $g_1, g_2$
• The PD is played $T$ times ($2 \le T < \infty$)
• History at period $t$: $h^t = (a^1, a^2, \dots, a^{t-1}) \in H^t = A^{t-1}$ ($H^1 = \{\emptyset\}$)
• (Pure) strategy of player $i$: for each $t$,
$s_i^t : H^t \to A_i$
• $a_i^t = s_i^t(h^t)$ is the action taken at period $t$ under history $h^t$
• Mixed and behavior strategies are well defined, but we do not consider them here
• Total payoff: $u_i(s) = \sum_{t=1}^{T} g_i(a^t)$
1 / 2 C D
C 2, 2 -1, 3
D 3, -1 0, 0
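The stage game and the total-payoff definition above can be sketched in code (a minimal illustration; the dictionary encoding and function name are my own choices, not part of the lecture):

```python
# Stage payoffs g_i(a) of the PD above, keyed by action profiles (a_1, a_2).
G = {
    ('C', 'C'): (2, 2),
    ('C', 'D'): (-1, 3),
    ('D', 'C'): (3, -1),
    ('D', 'D'): (0, 0),
}

def total_payoff(path):
    """Total payoff u_i(s) = sum_t g_i(a^t) along a finite play path."""
    return tuple(sum(G[a][i] for a in path) for i in (0, 1))

# T = 3 periods: cooperate twice, then player 1 defects once.
print(total_payoff([('C', 'C'), ('C', 'C'), ('D', 'C')]))  # (7, 3)
```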
Pure Strategy in Repeated Games
• At (the beginning of) each period $t$ of the repeated game, each player has observed what happened up to period $t-1$
• A history $h^t$ at period $t$ is a record of what happened up to period $t-1$:
$h^t = (a^1, a^2, \dots, a^{t-1})$
• $a^s = (a_i^s)_{i \in I}$ denotes the action profile played at period $s$
• The set of all possible histories at the beginning of period $t$: $H^t$
• $H^t = A^{t-1}$ (since $a^s \in A$ for $s = 1, \dots, t-1$, $H^t$ is the set of sequences of $t-1$ action profiles $a \in A$)
• Player $i$ decides, for every possible history $h^t \in H^t$, the action $a_i^t \in A_i$ to play at period $t$: $a_i^t = s_i^t(h^t) \in A_i$
• Player i's strategy: for every period $t = 1, 2, \dots$ and every possible history $h^t \in H^t$, it specifies the action $a_i^t$ to play at period $t$
• Player i's action plan at period $t$: $s_i^t : H^t \to A_i$
• Player i's strategy $s_i$:
$s_i = (s_i^t)_{t=1}^{\infty} \ (= (s_i^1, s_i^2, s_i^3, \dots))$
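As a concrete illustration of a strategy as a map from histories to actions, here is a small sketch (the particular rule — cooperate in period 1, then copy the opponent — is just an example I chose; the encoding of histories as lists of profiles is also my own):

```python
# A strategy of player 1 as a function from histories to actions.
# A history is a list of past action profiles (a_1, a_2); h^1 is the empty list.
def s_1(history):
    if not history:           # period 1: no history yet
        return 'C'
    return history[-1][1]     # copy the opponent's action from the previous period

print(s_1([]))                        # C
print(s_1([('C', 'C'), ('C', 'D')]))  # D
```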
SPNE
• Backward induction
• Period T: regardless of $h^T \in H^T$, (D,D) is the unique Nash eqm
  • The subgame is simply the one-shot PD
• Period T-1: regardless of $h^{T-1} \in H^{T-1}$ and the current outcome, (D,D) will be played at period T
  • Same as the case where T-1 is the last period
  • (D,D) is played as the unique Nash eqm
• Period T-2: repeat the same consideration
  • Regardless of $h^{T-2} \in H^{T-2}$, (D,D) is the unique Nash eqm
• And so on
• Therefore, (D,D) is played in all T periods in the unique SPNE
• Cooperation (C,C) is not achieved in the finitely repeated PD…
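The final-period step of the backward induction can be checked mechanically: D is player 1's best reply to either action of the opponent, so (D,D) is played regardless of history (a sketch with my own encoding; `best_response_1` is not a name from the lecture):

```python
# PD stage payoffs; the first entry of each pair is player 1's payoff.
G = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3), ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def best_response_1(a2):
    """Player 1's best reply to the opponent's action in the one-shot PD."""
    return max(['C', 'D'], key=lambda a1: G[(a1, a2)][0])

# D is a best reply to both C and D, i.e., D is strictly dominant.
print(best_response_1('C'), best_response_1('D'))  # D D
```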
Proposition 3.1.1. In finitely repeated games, it is a subgame perfect Nash equilibrium to play a stage Nash equilibrium in every period. In addition, if the stage game has a unique Nash equilibrium, the repetition of the stage Nash equilibrium is the unique subgame perfect Nash equilibrium.
• Repetition of stage NE is a trivial equilibrium
• In infinitely repeated games, cooperation can be sustained even if stage Nash is unique!
• When there are multiple NE in the stage game, we can construct a non-trivial SPNE in which the stage NE is not played in every period
• “Carrot and stick”
• We study this in the homework
3.2. Infinitely Repeated Games
• Infinitely repeated PD
• Same definition regarding stage game
• Same definition for history ℎ� ∈ �� (� = 1,2, …) and strategy
• Discount factor $\delta \in (0,1)$
• Payoff is defined as the discounted sum:
$u_i(s) = \sum_{t=1}^{\infty} \delta^{t-1} g_i(a^t)$
• Sometimes the average payoff is used:
$v_i(s) = (1-\delta)\, u_i(s)$
• When a player earns a constant stage payoff $g_i$ for all $t$, the associated payoff is $(1 + \delta + \delta^2 + \cdots)\, g_i = \frac{g_i}{1-\delta}$
• Hence, when the discounted sum is $u_i(s)$, the average payoff $v_i$ per period satisfies $\frac{v_i}{1-\delta} = u_i(s)$
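The geometric-series identity behind the average payoff can be verified numerically (a sketch; the truncation horizon is an arbitrary choice of mine):

```python
# A constant stage payoff g earned every period has discounted sum g/(1-delta),
# so multiplying by (1-delta) recovers the per-period payoff g itself.
delta, g = 0.9, 2.0
u = sum(delta ** (t - 1) * g for t in range(1, 2001))  # truncation of the infinite sum
print(round(u, 6), round((1 - delta) * u, 6))  # 20.0 2.0
```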
• Discount factor
• Represents time preference
• Probability that the game continues
• With probability $1-\delta$, either player dies
• When the interest rate is $r > 0$, the present value of a future payoff is discounted by $\delta = \frac{1}{1+r}$
• Large $\delta$: more patient
• In infinitely repeated games, we cannot use backward induction, because the “end of the game” does not exist
• However, we can solve due to the recursive structure of repeated games
• The subgame that starts after one round of the stage game has the same structure as the original game
3.2.1. Cooperation
• In infinitely repeated PD, (C,C) can be achieved by an SPNE!
Definition 3.2.1. In the grim-trigger strategy $s_i^{GT}$ in the repeated PD,
(1) $s_i^{GT}(h^1) = C$, and
(2) $s_i^{GT}(h^t) = C$ if $h^t = ((C,C), (C,C), \dots, (C,C))$, and $s_i^{GT}(h^t) = D$ otherwise.
• In the grim-trigger strategy, a player starts with cooperation.
• Keep choosing C as long as no one (including the player himself) has chosen D
• Once someone chooses D, the player switches to D and plays D forever
• When both players take the grim-trigger strategy, (C,C) is played in each period
Proposition 3.2.1. Suppose $\delta \ge 1/3$. In the (infinitely) repeated prisoners’ dilemma, the profile of grim-trigger strategies is a subgame perfect Nash equilibrium.
Proof.
Suppose that player 2 takes the grim-trigger strategy.
If player 1 also takes the grim-trigger strategy, his payoff is $2/(1-\delta)$. Suppose that P1 deviates and takes D at period $t$. Then P1 earns stage payoff $g_1 = 3$ at $t$.
Because P2 chooses D for all $t' \ge t+1$, it is optimal for P1 to choose D for all $t' \ge t+1$ too.
Because the realized payoff is the same up to $t-1$, the deviation is not profitable if
$\delta^{t-1}(3 + \delta \cdot 0 + \delta^2 \cdot 0 + \cdots) \le \delta^{t-1}(2 + \delta \cdot 2 + \delta^2 \cdot 2 + \cdots)$
$\Leftrightarrow \delta \ge 1/3$. ∎
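The inequality in the proof can be checked numerically (a sketch; the function name and truncation horizon are mine). From the deviation period on, deviating yields 3 and then 0 forever, while conforming yields 2 every period:

```python
def deviation_gain(delta, horizon=2000):
    """Deviator's payoff minus conformer's payoff, from the deviation period on."""
    conform = sum(2 * delta ** k for k in range(horizon))  # ≈ 2/(1-delta)
    deviate = 3.0                                          # 3 now, 0 forever after
    return deviate - conform

print(deviation_gain(0.2) > 0)  # True: below delta = 1/3, deviating pays
print(deviation_gain(0.5) > 0)  # False: grim trigger deters the deviation
```

At the threshold δ = 1/3, conforming is worth exactly 2/(1 − 1/3) = 3 and the gain is zero, matching the proof.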
3.2.2. One Shot Deviation Principle
• In repeated games, it looks hard to check whether a strategy profile is an SPNE, because there are infinitely many strategies (even if the stage game is very simple)
• However, in fact, it is not so hard: we can check it without considering all the possible strategies, using the one-shot deviation principle
Definition 3.2.2. For every period $t$, the continuation payoff $U_i(s \mid h^t)$ is the total payoff under a specified strategy profile $s$ in the subgame starting at $t$:
$U_i(s \mid h^t) = \sum_{\tau=t}^{\infty} \delta^{\tau-t} g_i(a^\tau)$
• Given a strategy profile $s \in S$, the action profile played at period $t$ is
$a^t = s(h^t) = (s_i(h^t))_{i \in I}$
• Continuation payoff:
$u_i(s) = g_i(a^1) + \delta g_i(a^2) + \cdots + \delta^{t-1} g_i(a^t) + \delta^t g_i(a^{t+1}) + \delta^{t+1} g_i(a^{t+2}) + \cdots$
$= g_i(a^1) + \cdots + \delta^{t-2} g_i(a^{t-1}) + \delta^{t-1} \left[ g_i(a^t) + \delta g_i(a^{t+1}) + \delta^2 g_i(a^{t+2}) + \cdots \right]$
• The bracketed term is the continuation payoff $U_i(s \mid h^t)$ after history $h^t$
Definition 3.2.3. In repeated games, a strategy profile $s^*$ is a subgame perfect Nash equilibrium if for all $s_i'$, all $t$, and all $h^t$,
$U_i(s^* \mid h^t) \ge U_i((s_i', s_{-i}^*) \mid h^t)$.
• Fix the strategies of all players other than $i$, $s_{-i}^*$
• Given $s_{-i}^*$, we want to verify whether $s_i^*$ is optimal or not
• To check this, we do NOT have to consider all possible strategies
Proposition 3.2.2 [one-shot deviation principle]. Fix any $s_{-i}^*$. A strategy $s_i^*$ is optimal if player $i$ is not better off by deviating from $s_i^*$ only at period $t$ and going back to $s_i^*$ from period $t+1$.
• To verify whether $s_i^*$ is optimal, it suffices to check that player $i$ does not gain from a single one-period deviation from $s_i^*$
Proof (sketch).
• Fix $s_{-i}^*$ and consider any $h^t$
• Let $U_i^* = U_i(s^* \mid h^t)$
• Let $U_i' = U_i((s_i', s_{-i}^*) \mid h^t)$, where $s_i'$ is any strategy
• Let $U_i^k = U_i((s_i^k, s_{-i}^*) \mid h^t)$, where $s_i^k$ is any “k-shot deviation” from $s_i^*$
  • Deviations at periods $t, t+1, \dots, t+k-1$ only
• We want to show: [$U_i^* \ge U_i^1$ ⇒ $U_i^* \ge U_i'$]
• Suppose $U_i^* \ge U_i^1$. Then we have
$U_i^* \ge U_i^1 \ge U_i^2 \ge U_i^3 \ge \cdots$
• The first inequality is by assumption
• The second inequality follows by applying the assumption after the deviation at period $t$ (“$h^{t+1}$”)
• The third inequality follows by applying the assumption after the deviations at periods $t$ and $t+1$ (“$h^{t+2}$”), and so on
• Thus, no finite k-shot deviation is profitable
• To complete the proof, we need to consider the case of “infinite k”
• An “infinite-period deviation” is almost the same as a “k-shot deviation” with large k, because the payoff from period $t+k$ on is discounted by $\delta^{k-1} \approx 0$ and negligible
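As an application of the one-shot deviation principle, the grim-trigger profile can be checked state by state: under grim trigger only two classes of histories matter. This is a sketch with my own encoding and closed-form continuation values (derived from the PD payoffs above, not taken from the lecture):

```python
# One-shot deviation check for grim trigger (player 1's viewpoint).
# "cooperative": every past profile was (C,C); "punishment": some D has occurred.
def continuation_1(delta, state, first_action):
    """Player 1 plays first_action once, then conforms to grim trigger,
    while player 2 follows grim trigger throughout."""
    if state == 'cooperative':
        if first_action == 'C':
            return 2 / (1 - delta)   # (C,C) forever
        return 3.0                   # 3 today, then (D,D) forever (worth 0)
    # punishment state: conforming (D) gives 0 forever; playing C once costs 1 today
    return 0.0 if first_action == 'D' else -1.0

delta = 0.5
for state, conform, deviate in [('cooperative', 'C', 'D'), ('punishment', 'D', 'C')]:
    ok = continuation_1(delta, state, conform) >= continuation_1(delta, state, deviate)
    print(state, ok)  # True in both states: no one-shot deviation is profitable
```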
3.2.3. The Folk Theorem
• Infinitely repeated games with the general stage game:
• Players $i = 1, \dots, n$
• Player i's action $a_i \in A_i$; action profile $a \in A$
• Stage payoff function $g_i(a)$
• Stage Nash equilibrium $a^*$
Proposition 3.2.3 [Folk Theorem]. Suppose that an action profile $\hat{a} \in A$ satisfies $g_i(\hat{a}) > g_i(a^*)$ for all $i \in I$. If the discount factor $\delta$ is sufficiently close to 1, there exists a subgame perfect Nash equilibrium such that the profile of average payoffs is $v = (g_1(\hat{a}), \dots, g_n(\hat{a}))$.
• If players are sufficiently patient (� is large), any stage outcome better than stage NE can be sustainable in an SPNE
• In infinitely repeated games, there are so many SPNE
• Basically, almost everything can be supported in an SPNE
• For any $\alpha \in [0,1]$, we can construct an SPNE that induces the average payoffs
$v_i = \alpha\, g_i(\hat{a}) + (1-\alpha)\, g_i(a^*)$
• E.g. (repeated PD): by taking (C,C) in every odd period and (D,D) in every even period, players achieve (almost) the “half cooperation” ($\alpha = 1/2$)
• In fact, even average payoffs such that $v_i < g_i(a^*)$ can be sustained
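The “half cooperation” example can be checked numerically: alternating (C,C) and (D,D) yields average payoff $2/(1+\delta)$, which tends to $(2+0)/2 = 1$ as $\delta \to 1$ (a sketch; the function name and truncation horizon are mine):

```python
# Average payoff of the alternating path (C,C),(D,D),(C,C),... for either player.
def avg_payoff_alternating(delta, horizon=4000):
    u = sum(delta ** (t - 1) * (2 if t % 2 == 1 else 0) for t in range(1, horizon + 1))
    return (1 - delta) * u   # average payoff v_i = (1 - delta) * u_i

print(round(avg_payoff_alternating(0.99), 3))  # 1.005 (= 2/(1+0.99), close to 1)
```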
3.2.4. Tit-for-Tat Strategy
• Axelrod (1984)
• Repeated PD
Definition 3.2.4. In the tit-for-tat strategy $s_i^{TFT}$ in the repeated PD,
(1) $s_i^{TFT}(h^1) = C$
(2) $s_i^{TFT}(h^t) = a_j^{t-1}$ for $t \ge 2$ (the action taken by the other player $j$ in the previous period)
• In Axelrod’s computer tournaments, various submitted strategies were run against one another. Tit-for-tat finally won.
Proposition 3.2.4. In (generic) repeated PD games, the profile of tit-for-tat strategies is NOT a subgame perfect Nash equilibrium.
• Under the specification of Sec. 3.1, the profile of tit-for-tat strategies is a subgame perfect Nash equilibrium if and only if $\delta = 1/3$
• The proof is homework
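The “if and only if δ = 1/3” claim can be explored numerically (a homework hint, not the proof; the closed-form continuation values below are my own derivation under the stated payoffs). After a unilateral D, the TFT-vs-TFT path alternates (D,C),(C,D),…; subgame perfection requires that neither the “keep punishing” nor the “stop retaliating” one-shot deviation be profitable:

```python
from fractions import Fraction  # exact arithmetic avoids rounding issues at delta = 1/3

def tft_is_spne(delta):
    """Check the two off-path one-shot deviation conditions for TFT vs. TFT."""
    # After a history ending in (D,C): conforming means the alternating path
    # starting with (C,D), worth (-1 + 3*delta)/(1 - delta^2) to player 1;
    # a one-shot deviation to D leads to (D,D) forever, worth 0.
    punish_ok = (-1 + 3 * delta) / (1 - delta**2) >= 0
    # After a history ending in (C,D): conforming means the alternating path
    # starting with (D,C), worth (3 - delta)/(1 - delta^2); a one-shot
    # deviation to C restores (C,C) forever, worth 2/(1 - delta).
    retaliate_ok = (3 - delta) / (1 - delta**2) >= 2 / (1 - delta)
    return punish_ok and retaliate_ok

print(tft_is_spne(Fraction(1, 5)), tft_is_spne(Fraction(1, 2)), tft_is_spne(Fraction(1, 3)))
# False False True — both conditions hold simultaneously only at delta = 1/3
```

Intuitively, a patient player prefers to forgive rather than punish, and an impatient one prefers not to punish at all; the two constraints pull in opposite directions and meet only at δ = 1/3.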