we should not apply the method of removing trend components by Del Negro et al. (2007) or SW (2007). Instead, we extract the cyclical component of each series by removing the time-varying trend component with the Hodrick-Prescott filter, without handling the trend break. Finally, all series were demeaned so that their means are zero. The solid line in Figure 2.1 displays the data $X_t$ used in our estimation.
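2.5. RESULTS 63
where $M$ is the number of samples, $\hat{\gamma}(i)$ is the $i$-th order autocovariance, $K(z)$ is the kernel of the Parzen window shown below, and $B_M$ is the bandwidth, set to $B_M = 0.01M$.
$$\hat{\gamma}(i) = \frac{1}{M} \sum_{k=i+1}^{M} (\theta^{(k)} - \bar{\theta})(\theta^{(k-i)} - \bar{\theta}),$$
$$K(z) = \begin{cases} 1 - 6z^2 + 6z^3, & z \in [0, \tfrac{1}{2}] \\ 2(1 - z)^3, & z \in [\tfrac{1}{2}, 1] \end{cases}$$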
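The autocovariance and Parzen kernel defined above can be sketched in code as follows. The SE formula itself is not reproduced in this passage, so the sketch assumes the commonly used form $SE(\bar{\theta}) = \sqrt{\{\hat{\gamma}(0) + 2\sum_{i=1}^{B_M} K(i/B_M)\hat{\gamma}(i)\}/M}$; the function names and the simulated AR(1) chain are hypothetical and serve only as illustration.

```python
import math
import random


def parzen_kernel(z):
    """Parzen window K(z) on [0, 1], as defined in the text."""
    if 0.0 <= z <= 0.5:
        return 1.0 - 6.0 * z**2 + 6.0 * z**3
    if 0.5 < z <= 1.0:
        return 2.0 * (1.0 - z) ** 3
    return 0.0


def autocovariance(draws, lag):
    """Sample autocovariance of order `lag`, normalised by M."""
    m = len(draws)
    mean = sum(draws) / m
    return sum((draws[k] - mean) * (draws[k - lag] - mean)
               for k in range(lag, m)) / m


def parzen_se(draws, bandwidth=None):
    """Numerical SE of the sample mean (assumed form, see the lead-in)."""
    m = len(draws)
    if bandwidth is None:
        bandwidth = max(1, int(0.01 * m))  # B_M = 0.01 M, as in the text
    long_run_var = autocovariance(draws, 0)
    for i in range(1, bandwidth + 1):
        long_run_var += 2.0 * parzen_kernel(i / bandwidth) * autocovariance(draws, i)
    return math.sqrt(max(long_run_var, 0.0) / m)


# Hypothetical autocorrelated chain: AR(1) with coefficient 0.9.
random.seed(0)
chain = [0.0]
for _ in range(4999):
    chain.append(0.9 * chain[-1] + random.gauss(0.0, 1.0))

naive_se = math.sqrt(autocovariance(chain, 0) / len(chain))
print(parzen_se(chain), naive_se)
```

For a positively autocorrelated chain such as this one, the Parzen-window SE should come out larger than the naive SE that ignores autocorrelation.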
Using the SE described above, we judge convergence of the MCMC draws to the invariant distribution by referring to the convergence diagnostic (hereinafter, CD) statistic proposed by Geweke (1992).
$$CD = \frac{\bar{\theta}_A - \bar{\theta}_B}{\sqrt{\{SE(\bar{\theta}_A)\}^2 + \{SE(\bar{\theta}_B)\}^2}} \qquad (2.27)$$
where $\bar{\theta}_A$ is the sample mean calculated using the first 20,000 replicates (the first 10% of total samples after the initial draws are discarded), and $\bar{\theta}_B$ is the sample mean calculated using the last 100,000 replicates (50% of total samples). $SE(\bar{\theta}_A)$ and $SE(\bar{\theta}_B)$ are calculated by the Parzen window method, with bandwidths of 200 and 1000, respectively.
The CD statistic is asymptotically standard Normal under the null hypothesis that the posterior has converged. According to the CD statistic, in both cases the null hypothesis is not rejected at the 5% significance level, with some exceptions such as the parameters related to labor supply.
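As a rough illustration of (2.27), the sketch below computes the CD statistic from the first 10% and last 50% of a chain and a two-sided Normal p-value. The names `geweke_cd` and `iid_se` are hypothetical. In particular, `iid_se` is only a placeholder that ignores autocorrelation; for actual MCMC output it would be replaced by the Parzen-window SE used in the text.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def iid_se(draws):
    """Placeholder SE that ignores autocorrelation (valid only for i.i.d. draws)."""
    return math.sqrt(pvariance(draws) / len(draws))


def geweke_cd(chain, se_fn, first=0.1, last=0.5):
    """Return the CD statistic of (2.27) and its two-sided Normal p-value."""
    m = len(chain)
    part_a = chain[: int(first * m)]     # early segment (first 10%)
    part_b = chain[m - int(last * m):]   # late segment (last 50%)
    diff = fmean(part_a) - fmean(part_b)
    cd = diff / math.sqrt(se_fn(part_a) ** 2 + se_fn(part_b) ** 2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(cd)))
    return cd, p_value


# Hypothetical chains: one stationary, one whose early segment is shifted.
random.seed(1)
stable = [random.gauss(0.0, 1.0) for _ in range(2000)]
drifting = [x + 5.0 for x in stable[:200]] + stable[200:]

cd_stable, p_stable = geweke_cd(stable, iid_se)
cd_drift, p_drift = geweke_cd(drifting, iid_se)
print(cd_stable, p_stable)
print(cd_drift, p_drift)
```

A p-value above 0.05 means the null of convergence is not rejected at the 5% level; the shifted chain is constructed so that the two segment means clearly disagree.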
Also, following Fujiwara and Watanabe (2011), to assess how inefficient the sampling method is compared with random sampling, we calculate the inefficiency factor (hereinafter, IF) for each parameter.
$$IF = 1 + 2 \sum_{i=1}^{\infty} \hat{\rho}(i) \qquad (2.28)$$
where $\hat{\rho}(i)$ is the $i$-th order sample autocorrelation of the parameter draws from the MCMC. Let $\sigma^2$ and $\rho(i)$ denote the variance and the $i$-th order autocorrelation of the draws. Then the variance of the sample mean is $\sigma^2 (1 + 2\sum_{i=1}^{\infty} \rho(i))/M$. By contrast, since there is no autocorrelation under random sampling, the variance of the sample mean is $\sigma^2/M$. The IF, the ratio of the two, measures the inefficiency of MCMC sampling relative to random sampling; a higher IF indicates less efficient sampling. Since sample autocorrelations up to infinite order cannot be calculated, we use the Parzen window method when calculating the IF.
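A minimal sketch of a Parzen-window version of (2.28) is given below. The text does not spell out the truncation rule for the IF, so the sketch assumes the same bandwidth rule $B_M = 0.01M$ and weights $K(i/B_M)$ as for the SE; `inefficiency_factor` and the simulated chains are hypothetical.

```python
import random


def parzen_kernel(z):
    """Parzen window K(z) on [0, 1]."""
    if 0.0 <= z <= 0.5:
        return 1.0 - 6.0 * z**2 + 6.0 * z**3
    if 0.5 < z <= 1.0:
        return 2.0 * (1.0 - z) ** 3
    return 0.0


def inefficiency_factor(draws, bandwidth=None):
    """IF = 1 + 2 * sum_i K(i / B_M) * rho_hat(i), truncated at lag B_M."""
    m = len(draws)
    if bandwidth is None:
        bandwidth = max(1, int(0.01 * m))  # assumed: same rule as for the SE
    mean = sum(draws) / m
    centred = [x - mean for x in draws]
    gamma0 = sum(c * c for c in centred) / m
    total = 0.0
    for i in range(1, bandwidth + 1):
        gamma_i = sum(centred[k] * centred[k - i] for k in range(i, m)) / m
        total += parzen_kernel(i / bandwidth) * gamma_i / gamma0
    return 1.0 + 2.0 * total


# Hypothetical comparison: independent draws versus an AR(1) chain.
random.seed(0)
iid = [random.gauss(0.0, 1.0) for _ in range(5000)]
ar1 = [0.0]
for _ in range(4999):
    ar1.append(0.9 * ar1[-1] + random.gauss(0.0, 1.0))

if_iid = inefficiency_factor(iid)
if_ar1 = inefficiency_factor(ar1)
print(if_iid, if_ar1)
```

Independent draws should give an IF near one, while the persistent AR(1) chain should give a much larger value, which is the sense in which a high IF signals inefficient sampling.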
The IF is high in both cases, particularly for the parameters related to labor, $\sigma_L$ and $\varepsilon_L$. According to the estimation results of the SW model using the random walk MH algorithm by Chib and Ramamurthy (2010), the sampled values of $\sigma_L$ showed high autocorrelation and the IF exceeded 2500.
Turning to the nominal rigidities represented by the Calvo parameters $\xi_p$ and $\xi_w$, the IFs are 712 and 791, respectively, in the case w/o ME, and 579 and 522, respectively, in the case with ME. Thus, for these parameters, the case with ME is sampled more efficiently.
2.5.3 State Variables
Next, we consider the smoothed state variables in the case with ME. Note that in the case w/o ME, the observation variables and the state variables are linked deterministically, so the state variables cannot be smoothed.
Figure 2.1 shows the smoothed state variables of output, consumption, investment, labor, real wage and inflation.14
As can be seen from Figure 2.1, for output, consumption, investment and labor there is no noticeable difference between the observed data and the smoothed state variables, whereas for real wage and inflation the difference is substantial. This implies that the influence of ME is relatively small in output, consumption, investment and labor, and relatively large in real wage and inflation.
In the case w/o ME, as in SW (2003, 2007) and Sugo and Ueda (2008), the high-frequency movements of the real wage and inflation data are captured by price and wage markup shocks, i.e., structural shocks. When ME is taken into account, most of these violent fluctuations are captured by MEs rather than by structural shocks, since the state variables of wage and inflation are remarkably smoothed. This result contrasts with the standard DSGE model, which has regarded real wage and inflation as volatile state variables.
Furthermore, whether wage and inflation are regarded as persistent state variables or as volatile state variables will naturally affect the estimated structural parameters related to prices and wages. As pointed out earlier, this seems to appear as differences between the two cases in the estimated posteriors of the price and wage indexation parameters and the Calvo parameter on wage.
In fact, the posterior mean of price indexation is 0.304 in the case w/o ME, while it is 0.817 in the case with ME. Thus, inflation is regarded as a more persistent state variable in the case with ME. For wage indexation, the posterior mean is 0.395 in the case w/o ME and 0.678 in the case with ME. For the Calvo parameter on wage, the posterior mean in the case with ME is 0.685, higher than 0.355 in the case w/o ME, so the wage is captured as a stickier state variable in the case with ME. Therefore, the smoothed real wage and inflation in the case with ME are consistent with the estimated parameters reported in Table 2.2 (b).
2.5.4 Historical Decompositions
Let us turn to historical decompositions. The historical decomposition in the case w/o ME is an attempt to evaluate the contribution of each structural shock and to explain all historical movements in the data by structural shocks. In particular, we focus on the inflation and real wage data, where the influence of ME seems to be relatively large.
The upper part (a) of Figure 2.2 shows the historical decomposition of inflation in the case w/o ME, and the lower part (b) corresponds to the case with ME. Here, in the historical decomposition for the case w/o ME (the upper part), the contributions of markup shocks have been removed for convenience. In the case with ME (the lower part), the contribution of ME is also removed. This makes it easy to compare the contributions of the seven structural shocks common to both cases.15 The dashed line in Figure 2.2 shows the sum of the contributions of the seven structural shocks.
Focusing on the dashed line in both cases, the basic movement of inflation is roughly the same in the two cases. But in the case w/o ME, even if the influences of markup shocks are removed, noisy fluctuations still remain. By contrast, in the case with ME, it is markedly smoothed. Looking at the contributions of each structural shock, we find large differences between the two cases. In the case w/o ME, the contributions come mainly from the preference shock, monetary policy shock, and labor supply shock, whereas in the case with ME they come mainly from the investment adjustment cost shock and TFP shock. In sum, the two cases generally agree on the underlying movement of inflation, but differ greatly in which structural shocks have explanatory power over inflation variations.
Historical decompositions of the real wage are depicted in Figure 2.3. Focusing on the dashed line in both cases, in the case w/o ME the movement is relatively close to the actual wage data, whereas in the case with ME it is markedly smoothed and almost all of the volatile fluctuations in the actual wage are captured by ME. Looking at the contributions of each structural shock, the preference shock and labor supply shock are the main sources in the case w/o ME, and contributions of the monetary policy shock and investment adjustment cost shock can also be recognized. In particular, the labor supply shock is highly volatile, which might make the baseline movement of the actual wage volatile in the case w/o ME. By contrast, in the case with ME, the smoothed wage movements are mainly explained by the investment adjustment cost shock and TFP shock, and contributions of the labor supply shock are hardly recognized. In summary, the underlying movements of the wage differ between the two cases, and the historical decomposition (especially the contribution of the labor supply shock) also changes significantly.
14 The nominal interest rate is also an observation variable, but it is not shown in Figure 2.1, since the nominal interest rate data are assumed to have no ME.
15 The seven common structural shocks are preference shock, productivity shock, investment adjustment cost shock, equity premium shock, labor supply shock, TFP shocks, government expenditure shock, and monetary policy shock.
2.5.5 Model Selection
Is the model with ME better than the model w/o ME? To examine this, we conduct model selection using the Bayes factor, $B_{10}$, defined by the following formula.
$$B_{10} = \frac{p(Y \mid M_1)}{p(Y \mid M_0)}$$
where $p(Y \mid M_i)$ is the marginal likelihood of the model $M_i$. To calculate the marginal likelihood from the MCMC output, we adopt the modified harmonic mean method of Geweke (1999).16
Table 2.3 shows the log marginal likelihood, its SE and the log Bayes factor $\log(B_{10})$. The log Bayes factor for the model with ME against the model w/o ME is extremely large, about 80. Kass and Raftery (1995) suggest that if the log Bayes factor exceeds five, the model is very strongly supported against the alternative in terms of fit to the data.17 Therefore, adding ME to the data greatly contributes to increasing the estimation and prediction accuracy of the SW model.
Why is the model with ME superior to the model w/o ME in terms of the marginal likelihood? The reason lies in whether the noisy data are captured by structural shocks or by MEs. In the case w/o ME, the data fluctuations are attributed to markup shocks, but because markup shocks are structural shocks, their variations affect not only wage and inflation but also other state variables. As markup shocks absorb the volatile movements of the real wage and inflation data, they force volatile fluctuations onto other state variables as well, sacrificing the prediction accuracy of the model as a whole. On the other hand, in the case with ME, since the volatile data movements are attributed to MEs, other state variables are not affected. For this reason, it is possible to smooth series with high inertia such as output, consumption, investment and labor without forcing volatile movements on them, so the prediction accuracy of the model as a whole is not sacrificed.
16 The marginal likelihood $p(Y \mid M)$ of the model $M$ in the modified harmonic mean method (Geweke, 1999) can be calculated from parameters $(\theta_1, \theta_2, \cdots, \theta_N)$ sampled from the posterior $p(\theta \mid Y, M)$ as follows:
$$p(Y \mid M) \approx \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{C_g \, g(\theta_i)}{p(Y \mid \theta_i, M) \times C_p \, p(\theta_i \mid M)} \right]^{-1}$$
where $N$ is the number of samples, $p(Y \mid \theta_i, M)$ is the likelihood, $p(\theta_i \mid M)$ is the prior, and $g(\theta_i)$ is an arbitrary probability density. $C_p$ and $C_g$ are scaling parameters for $p(\theta_i \mid M)$ and $g(\theta_i)$. Although $g(\theta)$ may be any density, the modified harmonic mean method by Geweke (1999) employs the following truncated Normal distribution.
$$g(\theta) = \tau^{-1} (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} (\theta - \mu)' \Sigma^{-1} (\theta - \mu) \right] \times I\left[ (\theta - \mu)' \Sigma^{-1} (\theta - \mu) \le F^{-1}_{\chi^2_k}(\tau) \right]$$
where $k$ is the number of parameters $\theta$, and $I$ is the indicator function that equals unity if the condition in the brackets is true and zero otherwise. $F^{-1}_{\chi^2_k}(\tau)$ is the inverse of the cumulative $\chi^2$ distribution with $k$ degrees of freedom. The truncation probability is set to $\tau = 0.95$.
It should be noted, however, that in calculating the marginal likelihood of the DSGE model, we must pay attention to the prior $p(\theta_i \mid M)$ and the scaling parameters $C_p$, $C_g$ of the multivariate Normal $g(\theta)$. In other words, the parameter region of priors and posteriors is truncated if the model solution is not uniquely determined (indeterminacy or no solution). So the probability densities of the prior $p(\theta_i \mid M)$ and the multivariate Normal $g(\theta)$ need to be adjusted by the scaling parameters so that each integrates to unity over the non-truncated region. The procedure for calculating these scaling parameters is as follows. First, generating 10000 replicates of $\theta$ from the prior $p(\theta_i \mid M)$, we solve the model by the method of Sims (2002) and count the number of replicates that lead to a unique solution. Denoting this number by $N_p$, we set the scaling parameter to $C_p = 10000/N_p$. Similarly, for the multivariate Normal $g(\theta)$, we generate 10000 replicates of $\theta$ and set the scaling parameter to $C_g = 10000/N_g$.
Also, the delta method is used for the numerical calculation of the SE of the log marginal likelihood. From the right-hand side of the expression above, we have the log marginal likelihood $f = -\log_e \left[ \frac{g(\theta)}{p(Y \mid \theta, M) \, p(\theta \mid M)} \right]$. Then, we numerically calculate the Jacobian $\partial f / \partial \theta$ evaluated at the posterior mean of the parameters $\bar{\theta}$. Using the Jacobian and the variance-covariance matrix $\Sigma_\theta$ of the posterior mean $\bar{\theta}$, we calculate $\sqrt{(\partial f / \partial \theta) \, \Sigma_\theta \, (\partial f / \partial \theta)'}$ as the SE of the log marginal likelihood.
As for the method of calculating the marginal likelihood, there is a detailed description in Appendix C of Fujiwara and Watanabe (2011). For the delta method, see, for example, Hayashi (2000, p. 93).
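The modified harmonic mean estimator of footnote 16 can be sketched on a toy problem where the exact marginal likelihood is known. Everything in the example is an assumption made for illustration: a single parameter ($k = 1$), a Normal likelihood with a Normal prior, posterior draws generated directly instead of by MCMC, $\mu$ and $\Sigma$ set to the sample mean and variance of the draws, and $C_p = C_g = 1$ because no parameter region is truncated. It is not the DSGE computation itself.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_mhm(draws, log_lik, log_prior, tau=0.95, c_p=1.0, c_g=1.0):
    """Log marginal likelihood by the modified harmonic mean, for k = 1."""
    mu, var = fmean(draws), pvariance(draws)
    # For k = 1 the chi-square quantile is the squared Normal quantile.
    cutoff = NormalDist().inv_cdf(0.5 * (1.0 + tau)) ** 2
    log_terms = []
    for theta in draws:
        if (theta - mu) ** 2 / var <= cutoff:  # indicator of the truncation
            log_g = log_normal_pdf(theta, mu, var) - math.log(tau)
            log_terms.append(math.log(c_g) + log_g - log_lik(theta)
                             - math.log(c_p) - log_prior(theta))
    # log of (1/N) * sum(exp(term)); draws outside the region contribute zero.
    top = max(log_terms)
    log_mean = top + math.log(sum(math.exp(t - top) for t in log_terms) / len(draws))
    return -log_mean


# Toy model: y_j ~ N(theta, 1), prior theta ~ N(0, 1).
random.seed(0)
y = [random.gauss(1.0, 1.0) for _ in range(20)]
n, s = len(y), sum(y)


def log_lik(theta):
    return sum(log_normal_pdf(v, theta, 1.0) for v in y)


def log_prior(theta):
    return log_normal_pdf(theta, 0.0, 1.0)


# The posterior is N(s / (n + 1), 1 / (n + 1)); draw from it directly.
draws = [random.gauss(s / (n + 1), math.sqrt(1.0 / (n + 1))) for _ in range(20000)]
estimate = log_mhm(draws, log_lik, log_prior)
exact = (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(n + 1)
         - 0.5 * (sum(v * v for v in y) - s * s / (n + 1)))
print(estimate, exact)
```

Working in logs with a log-sum-exp step avoids overflow when the likelihood values are very small, which is the usual situation for a DSGE likelihood.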
17 In Jeffreys (1961, Appendix B, p. 432), the criteria for model selection by the Bayes factor are shown as follows.
log10(B10)    B10          Evidence against H0
0 to 1/2      1 to 3.2     Not worth more than a bare mention
1/2 to 1      3.2 to 10    Substantial
1 to 2        10 to 100    Strong
>2            >100         Decisive
However, his criteria are based on $\log_{10}$, which is unsuitable for model selection using calibration such as MCMC. So Kass and Raftery (1995) propose criteria based on $2\log_e(B_{10})$, the same scale as the information criteria and the likelihood ratio test. In these criteria, the cutoff of 10 used by Jeffreys (1961) is replaced by 20.
2 loge(B10)   B10          Evidence against H0
0 to 2        1 to 3       Not worth more than a bare mention
2 to 6        3 to 20      Positive
6 to 10       20 to 150    Strong
>10           >150         Very strong
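The Kass and Raftery (1995) scale above can be applied mechanically to a log Bayes factor, as in the short sketch below. The function names are hypothetical, and the treatment of the interval boundaries and of negative values (which the table does not cover) is an assumption.

```python
def log_bayes_factor(log_ml_1, log_ml_0):
    """log(B10) as the difference of the two log marginal likelihoods."""
    return log_ml_1 - log_ml_0


def kass_raftery_category(log_bf):
    """Map a natural-log Bayes factor log(B10) to the Kass-Raftery label."""
    scale = 2.0 * log_bf  # the criteria are stated in terms of 2 log_e(B10)
    if scale < 0.0:
        return "Supports M0"  # assumption: not a row of the table
    if scale <= 2.0:
        return "Not worth more than a bare mention"
    if scale <= 6.0:
        return "Positive"
    if scale <= 10.0:
        return "Strong"
    return "Very strong"


# The text reports a log Bayes factor of about 80.
print(kass_raftery_category(80.0))  # 2 * 80 = 160 > 10, so "Very strong"
```

This also makes the threshold quoted in the main text explicit: a log Bayes factor above five corresponds to $2\log_e(B_{10}) > 10$.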