3. Repeated Games
(and Long-term Relationships)
• The same game is repeatedly played by the same players
• A theory of long-term relationships
• The past outcome (history) is
• Completely observable: Perfect monitoring
• Not completely observable: Imperfect monitoring
• Players commonly observe something: Imperfect public monitoring
• Players privately observe something: Imperfect private monitoring
• Can people cooperate with each other when they are in a
long-term relationship?
Stage game
3.1. Finitely Repeated Games
E.g., Prisoners’ Dilemma
• Stage game actions: $A_i = \{C, D\}$, $A = A_1 \times A_2$
• Stage payoffs $g_1, g_2$
• The PD is played $T$ times ($2 \le T < \infty$)
• History at period $t$: $h^t = (a^1, a^2, \dots, a^{t-1}) \in H^t = A^{t-1}$ ($H^1 = \{\emptyset\}$)
• (Pure) strategy of player $i$: for each $t$,
$s_i^t : H^t \to A_i$
• $a_i^t = s_i^t(h^t)$ is the action taken at period $t$ under history $h^t$
• Mixed and behavior strategies are well defined, but we do not consider them here
• Total payoff: $u_i(s) = \sum_{t=1}^{T} g_i(a^t)$
1 / 2 C D
C 2, 2 -1, 3
D 3, -1 0, 0
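The stage game and the total-payoff definition above can be sketched in code (a minimal illustration; the dictionary encoding and function name are my own choices, not part of the lecture):

```python
# Stage payoffs g_i(a) of the PD above, keyed by action profiles (a_1, a_2).
G = {
    ('C', 'C'): (2, 2),
    ('C', 'D'): (-1, 3),
    ('D', 'C'): (3, -1),
    ('D', 'D'): (0, 0),
}

def total_payoff(path):
    """Total payoff u_i(s) = sum_t g_i(a^t) along a finite play path."""
    return tuple(sum(G[a][i] for a in path) for i in (0, 1))

# T = 3 periods: cooperate twice, then player 1 defects once.
print(total_payoff([('C', 'C'), ('C', 'C'), ('D', 'C')]))  # (7, 3)
```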
Pure Strategy in Repeated Games
• At (the beginning of) each period $t$ of the repeated game, each player has observed what happened up to period $t-1$
• A history $h^t$ at period $t$ is a record of what happened up to period $t-1$:
$h^t = (a^1, a^2, \dots, a^{t-1})$
• $a^s = (a_i^s)_{i \in I}$ denotes the action profile played at period $s$
• The set of all possible histories at the beginning of period $t$: $H^t$
• $H^t = A^{t-1}$ (since $a^s \in A$ for $s = 1, \dots, t-1$, $H^t$ is the set of sequences of $t-1$ action profiles $a \in A$)
• Player $i$ decides, for every possible history $h^t \in H^t$, the action $a_i^t \in A_i$ to play at period $t$: $a_i^t = s_i^t(h^t) \in A_i$
• Player i's strategy: for every period $t = 1, 2, \dots$ and every possible history $h^t \in H^t$, it specifies the action $a_i^t$ to play at period $t$
• Player i's action plan at period $t$: $s_i^t : H^t \to A_i$
• Player i's strategy $s_i$:
$s_i = (s_i^t)_{t=1}^{\infty} \ (= (s_i^1, s_i^2, s_i^3, \dots))$
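As a concrete illustration of a strategy as a map from histories to actions, here is a small sketch (the particular rule — cooperate in period 1, then copy the opponent — is just an example I chose; the encoding of histories as lists of profiles is also my own):

```python
# A strategy of player 1 as a function from histories to actions.
# A history is a list of past action profiles (a_1, a_2); h^1 is the empty list.
def s_1(history):
    if not history:           # period 1: no history yet
        return 'C'
    return history[-1][1]     # copy the opponent's action from the previous period

print(s_1([]))                        # C
print(s_1([('C', 'C'), ('C', 'D')]))  # D
```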
SPNE
• Backward induction
• Period T: regardless of $h^T \in H^T$, (D,D) is the unique Nash eqm
  • The subgame is simply the one-shot PD
• Period T-1: regardless of $h^{T-1} \in H^{T-1}$ and the current outcome, (D,D) will be played at period T
  • Same as the case where T-1 is the last period
  • (D,D) is played as the unique Nash eqm
• Period T-2: repeat the same consideration
  • Regardless of $h^{T-2} \in H^{T-2}$, (D,D) is the unique Nash eqm
• And so on
• Therefore, (D,D) is played in all T periods in the unique SPNE
• Cooperation (C,C) is not achieved in the finitely repeated PD…
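The final-period step of the backward induction can be checked mechanically: D is player 1's best reply to either action of the opponent, so (D,D) is played regardless of history (a sketch with my own encoding; `best_response_1` is not a name from the lecture):

```python
# PD stage payoffs; the first entry of each pair is player 1's payoff.
G = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3), ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def best_response_1(a2):
    """Player 1's best reply to the opponent's action in the one-shot PD."""
    return max(['C', 'D'], key=lambda a1: G[(a1, a2)][0])

# D is a best reply to both C and D, i.e., D is strictly dominant.
print(best_response_1('C'), best_response_1('D'))  # D D
```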
Proposition 3.1.1. In finitely repeated games, it is a subgame perfect Nash equilibrium to play a stage Nash equilibrium in every period. In addition, if the stage game has a unique Nash equilibrium, the repetition of the stage Nash equilibrium is the unique subgame perfect Nash equilibrium.
• Repetition of stage NE is a trivial equilibrium
• In infinitely repeated games, cooperation can be sustained even if stage Nash is unique!
• When there are multiple NE in the stage game, we can construct a non-trivial SPNE in which the stage NE is not played in every period
• “Carrot and stick”
• We study this in the homework
3.2. Infinitely Repeated Games
• Infinitely repeated PD
• Same definition regarding stage game
• Same definition for history ℎ� ∈ �� (� = 1,2, …) and strategy
• Discount factor $\delta \in (0,1)$
• Payoff is defined as the discounted sum:
$u_i(s) = \sum_{t=1}^{\infty} \delta^{t-1} g_i(a^t)$
• Sometimes the average payoff is used:
$v_i(s) = (1-\delta)\, u_i(s)$
• When a player earns a constant stage payoff $g_i$ for all $t$, the associated payoff is $(1 + \delta + \delta^2 + \cdots)\, g_i = \frac{g_i}{1-\delta}$
• Hence, when the discounted sum is $u_i(s)$, the average payoff $v_i$ per period satisfies $\frac{v_i}{1-\delta} = u_i(s)$
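The geometric-series identity behind the average payoff can be verified numerically (a sketch; the truncation horizon is an arbitrary choice of mine):

```python
# A constant stage payoff g earned every period has discounted sum g/(1-delta),
# so multiplying by (1-delta) recovers the per-period payoff g itself.
delta, g = 0.9, 2.0
u = sum(delta ** (t - 1) * g for t in range(1, 2001))  # truncation of the infinite sum
print(round(u, 6), round((1 - delta) * u, 6))  # 20.0 2.0
```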
• Discount factor
• Represents time preference
• Probability that the game continues
• With probability $1-\delta$, either player dies
• When the interest rate is $r > 0$, the present value of a future payoff is discounted by $\delta = \frac{1}{1+r}$
• Large $\delta$: more patient
• In infinitely repeated games, we cannot use backward induction, because the “end of the game” does not exist
• However, we can solve due to the recursive structure of repeated games
• The subgame that starts after one round of the stage game has the same structure as the original game
3.2.1. Cooperation
• In infinitely repeated PD, (C,C) can be achieved by an SPNE!
Definition 3.2.1. In the grim-trigger strategy $s_i^{GT}$ in the repeated PD,
(1) $s_i^{GT}(h^1) = C$, and
(2) $s_i^{GT}(h^t) = C$ if $h^t = ((C,C), (C,C), \dots, (C,C))$, and $s_i^{GT}(h^t) = D$ otherwise.
• In the grim-trigger strategy, a player starts with cooperation.
• Keep choosing C as long as no one (including the player himself) has chosen D
• Once someone chooses D, the player switches to D and plays D forever
• When both players take the grim-trigger strategy, (C,C) is played in each period
Proposition 3.2.1. Suppose $\delta \ge 1/3$. In the (infinitely) repeated prisoners’ dilemma, the profile of grim-trigger strategies is a subgame perfect Nash equilibrium.
Proof.
Suppose that player 2 takes the grim-trigger strategy.
If player 1 also takes the grim-trigger strategy, his payoff is $2/(1-\delta)$. Suppose that P1 deviates and takes D at period $t$. Then P1 earns stage payoff $g_1 = 3$ at $t$.
Because P2 chooses D for all $t' \ge t+1$, it is optimal for P1 to choose D for all $t' \ge t+1$ too.
Because the realized payoff is the same up to $t-1$, the deviation is not profitable if
$\delta^{t-1}(3 + \delta \cdot 0 + \delta^2 \cdot 0 + \cdots) \le \delta^{t-1}(2 + \delta \cdot 2 + \delta^2 \cdot 2 + \cdots)$
$\Leftrightarrow \delta \ge 1/3$. ∎
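The inequality in the proof can be checked numerically (a sketch; the function name and truncation horizon are mine). From the deviation period on, deviating yields 3 and then 0 forever, while conforming yields 2 every period:

```python
def deviation_gain(delta, horizon=2000):
    """Deviator's payoff minus conformer's payoff, from the deviation period on."""
    conform = sum(2 * delta ** k for k in range(horizon))  # ≈ 2/(1-delta)
    deviate = 3.0                                          # 3 now, 0 forever after
    return deviate - conform

print(deviation_gain(0.2) > 0)  # True: below delta = 1/3, deviating pays
print(deviation_gain(0.5) > 0)  # False: grim trigger deters the deviation
```

At the threshold δ = 1/3, conforming is worth exactly 2/(1 − 1/3) = 3 and the gain is zero, matching the proof.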
3.2.2. One Shot Deviation Principle
• In repeated games, it looks hard to check whether a strategy profile is an SPNE, because there are infinitely many strategies (even if the stage game is very simple)
• However, in fact, it is not so hard: we can check it without considering all the possible strategies, using the one-shot deviation principle
Definition 3.2.2. For every period $t$, the continuation payoff $U_i(s \mid h^t)$ is the total payoff under a specified strategy profile $s$ in the subgame starting at $t$:
$U_i(s \mid h^t) = \sum_{\tau=t}^{\infty} \delta^{\tau-t} g_i(a^\tau)$
• Given a strategy profile $s \in S$, the action profile played at period $t$ is
$a^t = s(h^t) = (s_i(h^t))_{i \in I}$
• Continuation payoff:
$u_i(s) = g_i(a^1) + \delta g_i(a^2) + \cdots + \delta^{t-1} g_i(a^t) + \delta^t g_i(a^{t+1}) + \delta^{t+1} g_i(a^{t+2}) + \cdots$
$= g_i(a^1) + \cdots + \delta^{t-2} g_i(a^{t-1}) + \delta^{t-1} \left[ g_i(a^t) + \delta g_i(a^{t+1}) + \delta^2 g_i(a^{t+2}) + \cdots \right]$
• The bracketed term is the continuation payoff $U_i(s \mid h^t)$ after history $h^t$
Definition 3.2.3. In repeated games, a strategy profile $s^*$ is a subgame perfect Nash equilibrium if for all $s_i'$, all $t$, and all $h^t$,
$U_i(s^* \mid h^t) \ge U_i((s_i', s_{-i}^*) \mid h^t)$.
• Fix the strategies of all players other than $i$, $s_{-i}^*$
• Given $s_{-i}^*$, we want to verify whether $s_i^*$ is optimal or not
• To check this, we do NOT have to consider all possible strategies
Proposition 3.2.2 [one-shot deviation principle]. Fix any $s_{-i}^*$. A strategy $s_i^*$ is optimal if player $i$ is not better off by deviating from $s_i^*$ only at period $t$ and going back to $s_i^*$ from period $t+1$.
• To verify whether $s_i^*$ is optimal, it suffices to check that player $i$ does not gain from a single one-period deviation from $s_i^*$
Proof (sketch).
• Fix $s_{-i}^*$ and consider any $h^t$
• Let $U_i^* = U_i(s^* \mid h^t)$
• Let $U_i' = U_i((s_i', s_{-i}^*) \mid h^t)$, where $s_i'$ is any strategy
• Let $U_i^k = U_i((s_i^k, s_{-i}^*) \mid h^t)$, where $s_i^k$ is any “k-shot deviation” from $s_i^*$
  • Deviations at periods $t, t+1, \dots, t+k-1$ only
• We want to show: [$U_i^* \ge U_i^1$ ⇒ $U_i^* \ge U_i'$]
• Suppose $U_i^* \ge U_i^1$. Then we have
$U_i^* \ge U_i^1 \ge U_i^2 \ge U_i^3 \ge \cdots$
• The first inequality is by assumption
• The second inequality follows by applying the assumption after the deviation at period $t$ (“$h^{t+1}$”)
• The third inequality follows by applying the assumption after the deviations at periods $t$ and $t+1$ (“$h^{t+2}$”), and so on
• Thus, no finite k-shot deviation is profitable
• To complete the proof, we need to consider the case of “infinite k”
• An “infinite-period deviation” is almost the same as a “k-shot deviation” with large k, because the payoff from period $t+k$ on is discounted by $\delta^{k-1} \approx 0$ and negligible
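As an application of the one-shot deviation principle, the grim-trigger profile can be checked state by state: under grim trigger only two classes of histories matter. This is a sketch with my own encoding and closed-form continuation values (derived from the PD payoffs above, not taken from the lecture):

```python
# One-shot deviation check for grim trigger (player 1's viewpoint).
# "cooperative": every past profile was (C,C); "punishment": some D has occurred.
def continuation_1(delta, state, first_action):
    """Player 1 plays first_action once, then conforms to grim trigger,
    while player 2 follows grim trigger throughout."""
    if state == 'cooperative':
        if first_action == 'C':
            return 2 / (1 - delta)   # (C,C) forever
        return 3.0                   # 3 today, then (D,D) forever (worth 0)
    # punishment state: conforming (D) gives 0 forever; playing C once costs 1 today
    return 0.0 if first_action == 'D' else -1.0

delta = 0.5
for state, conform, deviate in [('cooperative', 'C', 'D'), ('punishment', 'D', 'C')]:
    ok = continuation_1(delta, state, conform) >= continuation_1(delta, state, deviate)
    print(state, ok)  # True in both states: no one-shot deviation is profitable
```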
3.2.3. The Folk Theorem
• Infinitely repeated games with the general stage game:
• Players $i = 1, \dots, n$
• Player i's action $a_i \in A_i$; action profile $a \in A$
• Stage payoff function $g_i(a)$
• Stage Nash equilibrium $a^*$
Proposition 3.2.3 [Folk Theorem]. Suppose that an action profile $\hat{a} \in A$ satisfies $g_i(\hat{a}) > g_i(a^*)$ for all $i \in I$. If the discount factor $\delta$ is sufficiently close to 1, there exists a subgame perfect Nash equilibrium such that the profile of average payoffs is $v = (g_1(\hat{a}), \dots, g_n(\hat{a}))$.
• If players are sufficiently patient (� is large), any stage outcome better than stage NE can be sustainable in an SPNE
• In infinitely repeated games, there are so many SPNE
• Basically, almost everything can be supported in an SPNE
• For any $\alpha \in [0,1]$, we can construct an SPNE that induces the average payoffs
$v_i = \alpha\, g_i(\hat{a}) + (1-\alpha)\, g_i(a^*)$
• E.g. (repeated PD): by taking (C,C) in every odd period and (D,D) in every even period, players achieve (almost) the “half cooperation” ($\alpha = 1/2$)
• In fact, even average payoffs such that $v_i < g_i(a^*)$ can be sustained
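The “half cooperation” example can be checked numerically: alternating (C,C) and (D,D) yields average payoff $2/(1+\delta)$, which tends to $(2+0)/2 = 1$ as $\delta \to 1$ (a sketch; the function name and truncation horizon are mine):

```python
# Average payoff of the alternating path (C,C),(D,D),(C,C),... for either player.
def avg_payoff_alternating(delta, horizon=4000):
    u = sum(delta ** (t - 1) * (2 if t % 2 == 1 else 0) for t in range(1, horizon + 1))
    return (1 - delta) * u   # average payoff v_i = (1 - delta) * u_i

print(round(avg_payoff_alternating(0.99), 3))  # 1.005 (= 2/(1+0.99), close to 1)
```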
3.2.4. Tit-for-Tat Strategy
• Axelrod (1984)
• Repeated PD
Definition 3.2.4. In the tit-for-tat strategy $s_i^{TFT}$ in the repeated PD,
(1) $s_i^{TFT}(h^1) = C$
(2) $s_i^{TFT}(h^t) = a_j^{t-1}$ for $t \ge 2$ (the action taken by the other player $j$ in the previous period)
• In Axelrod’s computer tournaments, various submitted strategies were run against one another. Tit-for-tat finally won.
Proposition 3.2.4. In (generic) repeated PD games, the profile of tit-for-tat strategies is NOT a subgame perfect Nash equilibrium.
• Under the specification of Sec. 3.1, the profile of tit-for-tat strategies is a subgame perfect Nash equilibrium if and only if $\delta = 1/3$
• The proof is homework
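The “if and only if δ = 1/3” claim can be explored numerically (a homework hint, not the proof; the closed-form continuation values below are my own derivation under the stated payoffs). After a unilateral D, the TFT-vs-TFT path alternates (D,C),(C,D),…; subgame perfection requires that neither the “keep punishing” nor the “stop retaliating” one-shot deviation be profitable:

```python
from fractions import Fraction  # exact arithmetic avoids rounding issues at delta = 1/3

def tft_is_spne(delta):
    """Check the two off-path one-shot deviation conditions for TFT vs. TFT."""
    # After a history ending in (D,C): conforming means the alternating path
    # starting with (C,D), worth (-1 + 3*delta)/(1 - delta^2) to player 1;
    # a one-shot deviation to D leads to (D,D) forever, worth 0.
    punish_ok = (-1 + 3 * delta) / (1 - delta**2) >= 0
    # After a history ending in (C,D): conforming means the alternating path
    # starting with (D,C), worth (3 - delta)/(1 - delta^2); a one-shot
    # deviation to C restores (C,C) forever, worth 2/(1 - delta).
    retaliate_ok = (3 - delta) / (1 - delta**2) >= 2 / (1 - delta)
    return punish_ok and retaliate_ok

print(tft_is_spne(Fraction(1, 5)), tft_is_spne(Fraction(1, 2)), tft_is_spne(Fraction(1, 3)))
# False False True — both conditions hold simultaneously only at delta = 1/3
```

Intuitively, a patient player prefers to forgive rather than punish, and an impatient one prefers not to punish at all; the two constraints pull in opposite directions and meet only at δ = 1/3.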