where the level of output Y depends on inputs of capital (K), labour (L), and multifactor productivity (A). One can see that the Cobb-Douglas production function (4.2) includes another variable (A) apart from capital and labour in explaining the output. The variable A is usually referred to as total factor productivity (TFP), in which technological progress plays an important role. Hence, an improvement in A is commonly construed as a result of technological advancement (Quah, 2001). The parameters α and β indicate the returns to scale of production. If α + β > 1, there are increasing returns to scale, meaning that a one per cent increase in the usage level of all inputs will result in a greater than one per cent increase in the output level. If α + β < 1, it is a case of decreasing returns to scale, meaning that the output will increase less than the proportional increase in the inputs. The last case is called constant returns to scale, which takes place when α + β = 1. Under the constant returns to scale condition, the output will increase in the same proportion as the increase in the inputs.
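The returns-to-scale cases above can be illustrated with a short sketch. This is not part of the thesis's estimation; the productivity and input values are invented for illustration, and the function follows the standard Cobb-Douglas form Y = A·K^α·L^β.

```python
# Illustrative sketch (invented numbers): Cobb-Douglas output
# Y = A * K^alpha * L^beta, used to check returns to scale by
# doubling both inputs and comparing the resulting output.

def cobb_douglas(A, K, L, alpha, beta):
    """Output Y given productivity A, capital K, and labour L."""
    return A * (K ** alpha) * (L ** beta)

# Constant returns to scale: alpha + beta = 1
y1 = cobb_douglas(A=1.0, K=100.0, L=50.0, alpha=0.3, beta=0.7)
y2 = cobb_douglas(A=1.0, K=200.0, L=100.0, alpha=0.3, beta=0.7)
print(y2 / y1)  # doubling all inputs exactly doubles output (ratio 2.0)

# Increasing returns to scale: alpha + beta = 1.2 > 1
y3 = cobb_douglas(A=1.0, K=100.0, L=50.0, alpha=0.5, beta=0.7)
y4 = cobb_douglas(A=1.0, K=200.0, L=100.0, alpha=0.5, beta=0.7)
print(y4 / y3)  # output more than doubles (ratio 2^1.2)
```

Doubling the inputs scales output by 2^(α+β), which makes the three cases in the text immediate: the ratio equals 2 when α + β = 1, exceeds 2 when α + β > 1, and falls short of 2 when α + β < 1.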
Within the Cobb-Douglas production function derived from Solow’s growth accounting theory, output growth can be easily understood. This output growth may occur as a result of more input. More workers in a factory or more employees in a company will increase the level of output. Similarly, a higher level of capital input – more production machines and equipment – is associated with a higher level of output given a level of labour input. However, the main interest of this thesis is the impact of a change in technology (A) on output growth, which is likely to occur according to the theory. Such an impact will be analysed in further case studies of this thesis. The output under the macroeconomic case studies should be construed as national output, which is measured as the aggregate output produced in the country. In other words, it is the aggregated Y that is focused on in the case studies of the macroeconomic impact of ICT. This thesis will analyse the macroeconomic impact from both the improvement of current technology as well as the introduction of more advanced ICT services that have yet to mature in Thailand.
Section 2. MICROECONOMIC PERSPECTIVE: DISCRETE CHOICE THEORY
Apart from the macroeconomic analysis, this thesis also provides a microeconomic analysis of businesses and individuals in terms of decision making
towards a particular alternative. Thus, it is wise to understand the theoretical background of the case studies involving choice analysis. As the name suggests, the discrete choice theory explains the process and underlying reason of choice selection by economic agents. This theory is based on McFadden’s random utility theory (1973). To begin, it is assumed that any choice is selected in order to maximise the utility obtained from that selection. In other words, an economic agent will select an alternative that gives the highest level of happiness – or utility in the language of economics. Within this setting, the utility of an individual i facing choice j can be expressed as below.
U_ij = V_ij + ε_ij     4.3
In 4.3, one can see that the utility U_ij that the individual i receives from selecting choice j consists of two parts. The first one is the observable part represented by V_ij, while the second one (ε_ij) is the unobserved part that includes tastes. Again, although the components of utility can be portrayed by this theory, it is difficult to measure them.
The focus here is directed towards the comparison of utilities when a choice is made.
Choice j will be selected if it gives a higher level of utility over another alternative – choice k, for example.
Suppose y_i is a variable that represents the choice made by individual i. The probability that choice j is selected over choice k can be expressed as
Pr(y_i = j | V_ij, V_ik) = Pr(U_ij > U_ik)     4.4
where j ≠ k. From 4.4, it is obvious that the probability that the individual i will select choice j over k occurs if the utility obtained from selecting choice j, represented by U_ij, is higher than that obtained from choice k, known as U_ik.
A common approach used in the research of decision making of economic agents is to estimate the probability that the choice under study will be selected over other alternatives. In order to estimate such probability, a logit model is often employed. This model provides a linearised estimation of a non-linear relationship of variables. More details will be elaborated on later in this chapter.
In the case studies following the random utility theory, it is assumed that ε_ij is independently and identically distributed, as suggested by McFadden (1973). The main focus is the probability estimation based on the observed part (V_ij) that consists of attributes of the choice and the individual.
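Under McFadden's i.i.d. extreme-value assumption on ε_ij, the choice probabilities of the logit model take a closed form: the probability of choosing alternative j is exp(V_j) divided by the sum of exp(V_k) over all alternatives. The sketch below illustrates this with invented utility values; it is not thesis data.

```python
import math

# Sketch of logit choice probabilities (illustrative numbers, not thesis data).
# Under the i.i.d. extreme-value assumption on the unobserved part of utility,
# Pr(choice = j) = exp(V_j) / sum_k exp(V_k), where V_j is the observable
# part of utility for alternative j.

def logit_probabilities(V):
    """Return choice probabilities for a list of observable utilities V."""
    exps = [math.exp(v) for v in V]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical observable utilities for three alternatives
probs = logit_probabilities([1.0, 0.5, -0.2])
print(probs)       # the highest-utility alternative gets the highest probability
print(sum(probs))  # probabilities sum to one
```

Note that only differences in utility matter: adding a constant to every V_j leaves the probabilities unchanged, which is why the unobservable scale of "happiness" poses no problem for estimation.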
Section 3. ESTIMATION TECHNIQUE
This thesis employs the quantitative approach as the main research methodology for all case studies. However, there are also some analyses of qualitative data presented in order to endorse, or compare with, the results from the quantitative analyses. There are five different quantitative approaches used in the case studies and most of them are mathematical models derived from econometric concepts based on sound economic theories. For the macroeconomic analysis, the statistical models are based on ordinary least squares (OLS) regression, the bivariate autoregressive model (Granger causality test), and the vector autoregressive (VAR) model. For the microeconomic analysis, apart from the OLS regression, binary logistic regression – the binary logit – and confirmatory factor analysis (CFA) are used.
4.3.1. Ordinary Least Squares Regression
Before moving on to the explanation of the ordinary least squares (OLS) regression, it should be noted that detailed information regarding statistical derivations as well as interpretations shall be put aside as they are beyond the scope of this thesis.
This chapter provides an explanation of the underlying research methodology – regression analysis – with the assumption that the readers possess sufficient knowledge of fundamental statistics.
The OLS regression is one of the most commonly used approaches for research in various fields, and one encounters this statistical approach very often in the economics literature. It is one of the basic, but powerful, types of regression that can be used to test hypotheses based on economic theories. The OLS regression is associated with an estimation of parameters that can be used to explain the direction and magnitude of the impact of the independent variables – explanatory variables – on the dependent variable. The practical flow of OLS regression analysis is as follows. First, a model based on economic theories is generated. Then, the explanatory variables are added according to the hypothesis. After the regression has been performed, the estimated coefficient is verified to see if the underlying hypothesis can be accepted.
To understand the OLS regression, it is wise to start with normal linear regression. The author follows the explanation of Gujarati (2004). Let the simple regression equation begin as follows,
Y_t = β1 + β2 X_t + u_t     4.5
where Y_t is the dependent variable, X_t is the independent variable, or explanatory variable, β1 and β2 are called estimators or coefficients, and u_t is the residual, or error term. The task is to estimate 4.5 with a sample set of observations of Y and X that varies across time t. Then the new equations are
Y_t = β̂1 + β̂2 X_t + û_t     4.6

Ŷ_t = β̂1 + β̂2 X_t     4.7

where β̂1, β̂2, û_t, and Ŷ_t in 4.6 and 4.7 represent the estimated values of β1, β2, u_t, and Y_t, respectively. There is another expression of 4.7 as follows,

û_t = Y_t − Ŷ_t     4.8
which states that the estimated residual, or error term, is equal to the difference between the actual and estimated value of Y_t. In order to produce a good estimation model, the value of û_t should be minimised, implying a value of Ŷ_t close to Y_t. In other words, a sound estimation model based on regression analysis can be achieved when there is a very low difference between the actual and estimated value of the dependent variable, reflected in the minimised estimated error.
However, the straightforward minimisation of the sum of û_t in 4.8 is not appropriate in general because it gives the same weight to each error term (û_t) for each observation.
As a consequence, a model with high errors may be accepted. To avoid this problem, it is recommended to minimise the value of û_t² – the squared error. Up to this point, the readers are highly advised to refer to Gujarati (2004) for detailed explanations on this matter. For a given number of observations, the desired estimation model is the one with the minimum value of the sum of û_t². This is shown below.
Σ û_t² = Σ (Y_t − Ŷ_t)²     4.9
The equation 4.9 is based on the least-squares criterion; hence, it is considered a fundamental of OLS regression. Recalling that Y_t = Ŷ_t + û_t from 4.6 and 4.7, one can see that the sum of squared errors is a function of β̂1 and β̂2. This relationship can be captured in the following equation:

Σ û_t² = f(β̂1, β̂2)     4.10
By using 4.10 as a condition, the principle of OLS regression is to achieve the values of β̂1 and β̂2 with the smallest Σ û_t² for a given set of data. In other words, this method of regression provides the estimated values of β̂1 and β̂2 that minimise the sum of the squared errors (Σ û_t²), which then gives the outcome model robustness in terms of explanatory power as well as prediction.
In fact, for an OLS regression, the main concern is the value of β̂1 and β̂2. Thanks to statistical packages such as STATA and EViews being available in the market, the values of β̂1 and β̂2 such that Σ û_t² is minimised can be produced easily.
This thesis utilises both of the aforementioned statistical programmes to perform OLS regression where required.
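The least-squares criterion in 4.9 and 4.10 can be sketched in a few lines. The data below are synthetic (generated with known true coefficients, not thesis data), so the estimates can be checked against the values that produced them; the computation itself is what STATA or EViews performs internally for a simple regression.

```python
import numpy as np

# Minimal sketch of the least-squares criterion (synthetic data, not from
# the thesis): choose b1_hat and b2_hat to minimise the sum of squared
# residuals sum((Y_t - b1 - b2*X_t)^2).

rng = np.random.default_rng(0)
X = np.arange(30, dtype=float)
Y = 2.0 + 0.5 * X + rng.normal(scale=0.1, size=X.size)  # true b1 = 2.0, b2 = 0.5

# Design matrix with a constant column; lstsq solves the OLS problem directly
Z = np.column_stack([np.ones_like(X), X])
(b1_hat, b2_hat), *_ = np.linalg.lstsq(Z, Y, rcond=None)

print(b1_hat, b2_hat)  # estimates close to the true values 2.0 and 0.5
```

Because the noise is small relative to the variation in X, the recovered intercept and slope sit very close to the true 2.0 and 0.5, which is exactly the sense in which minimising Σ û_t² yields a model with explanatory and predictive power.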
Another important issue to be discussed here is the interpretation of β̂1 and β̂2. First, β̂1 is the estimated intercept of the regression line. This term is usually regarded as a constant term in the literature. It receives less attention compared to the other term because no intuitive explanation of the relationship between the explanatory variables and the dependent variable can be made from it. Hence, the case studies using OLS regression in this thesis focus mainly on discussions of the other term. This second term, β̂2, receives a lot of attention since it reveals the direction and magnitude of the relationship between the explanatory variable and the dependent variable. If X_t increases by one unit, Y_t will increase by β̂2 unit(s) provided that the sign of β̂2 is positive. However, if β̂2 has a negative sign, then Y_t will decrease by β̂2 unit(s) for the same unit increase in X_t.
In practice, there is usually more than one explanatory variable. All the case studies in this thesis that use OLS regression include a set of explanatory variables in one estimation model in order to find the effect of each one at the same time. In fact, a more practical model that includes more than one explanatory variable can be seen below:

Y_t = β̂1 + Σ β̂n X_n,t + û_t     4.11

One can see that in 4.11 the second term on the right-hand side represents a set of up to m explanatory variables, where each variable X_n,t has a corresponding coefficient β̂n, with n a positive number indicating the different variables. The interpretation of the estimated value of β̂n is still the same. It is the direction and magnitude of change in Y_t at time t given a unit change in X_n,t.
Up to this point, it is to be understood that the OLS regression produces the estimated value of β̂n for each explanatory variable such that the sum of squared errors (Σ û_t²) is minimised for a given set of observations that varies through time t. Nevertheless, in a microeconomic case study, cross-sectional variation is considered instead of time variation. According to the principle of regression, such variation can also be analysed with OLS regression, and similar results and interpretations can be made. An example model that can handle the variation of cross-sectional data is shown below.

Y_i = β̂1 + Σ β̂n X_n,i + û_i     4.12

In 4.12, instead of t, the cross-sectional variation i is considered. In other words, the value of β̂n can also be estimated with the OLS criterion by detecting a pattern of relationship between Y and X in cross-sectional variation regardless of time. Such a variation (i) can be reflected in, for example, different types of industries, different individuals, and different countries. The interpretation is the same as the aforementioned one. The reader is advised to recall 4.12 when discussing the practical model presented in the case study under the microeconomic analysis section.
In order to support the interpretation of the estimated β̂n, there are some relevant statistics that are considered acceptable for similar kinds of research. First, all the case studies in this thesis accept the estimated β̂n – as well as the underlying hypothesis – at statistical significance levels of one, five, and ten per cent. This statistical significance level can be verified by using the p-value, t-statistic, and standard error of each estimate.
For the overall robustness of the outcome model, the value of R2 is considered. The higher the value of R2, the higher the explanatory power that can be expected of the model.
Hence, the model with high R2 is selected26. These two concepts of statistical significance will be discussed in the case studies using OLS regression. Besides those,
26 For more details of statistical significance and hypothesis testing, refer to Gujarati (2004).
there may be some other statistics brought up in order to cope with specific issues of the estimation.
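The two diagnostics just discussed can be computed directly from an OLS fit. The sketch below uses synthetic data (not thesis data) and the classical formulas: the t-statistic of each coefficient is the estimate divided by its standard error, and R2 is one minus the ratio of residual to total variation.

```python
import numpy as np

# Sketch (synthetic data) of the significance statistics discussed above:
# t-statistics (estimate / standard error) and the R^2 of the fitted model.

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)  # true intercept 1.0, slope 2.0

Z = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
resid = Y - Z @ beta_hat

# Classical OLS covariance estimate: s^2 * (Z'Z)^-1
s2 = (resid @ resid) / (n - 2)
se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))
t_stats = beta_hat / se

# R^2: share of the variation in Y explained by the model
r2 = 1.0 - (resid @ resid) / ((Y - Y.mean()) @ (Y - Y.mean()))
print(t_stats)  # a large |t| implies significance at conventional levels
print(r2)
```

With a true slope of 2 and unit-variance noise, the slope's t-statistic is far above the critical values for the one, five, and ten per cent levels, and R2 lands near 0.8, the population share of explained variance in this design.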
One last point to be made for the OLS regression is the problem of multicollinearity. According to Frisch (1934), the term “multicollinearity” refers to the situation in which there exists a perfect, or exact, linear relationship among some, or all, explanatory variables of a regression model. Mathematical expressions of the consequence of multicollinearity will be put aside to provide room for textual explanations. In sum, a regression with multicollinearity will result in an imprecise and unreliable estimation model in which most – or all – of the explanatory variables will become statistically insignificant, owing to inflated standard errors, despite the seemingly high explanatory power of the model as a whole. Of course, this will affect the result of hypothesis testing in the sense that there will be a high rejection rate. One way to quickly detect the presence of multicollinearity is to see a high R2 despite insignificant estimated coefficients (β̂) as pointed out by the t-statistic and standard error. However, this thesis utilises another indicator called the variance inflation factor (VIF) apart from this basic method of detection. A high value of the VIF for an explanatory variable means there exists multicollinearity; hence, it is undesirable. More details of the VIF and the decision-making criteria will be presented in the case studies that use OLS regression. A common solution to handle multicollinearity problems is to discard the variable(s) with high correlation. Again, this solution will be witnessed in the case study that requires it.
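The VIF check can be sketched as follows. The data are synthetic (one variable is deliberately built to be nearly collinear with another), and the computation uses the standard definition VIF_k = 1/(1 − R_k²), where R_k² is the R² from regressing explanatory variable k on the remaining explanatory variables.

```python
import numpy as np

# Sketch of the variance inflation factor (VIF) check (synthetic data):
# VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing explanatory
# variable k on the other explanatory variables.

def vif(X):
    """Return the VIF of each column of the design matrix X (no constant)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, y, rcond=None)[0]
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent of the others

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # x1 and x2 show very high VIFs; x3 stays close to 1
```

A common rule of thumb flags VIF values above 10; here the two nearly collinear variables produce VIFs in the hundreds, while the independent variable sits near the minimum of 1, matching the discard-one-of-the-pair remedy described above.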
The OLS regression is utilised in the first case study of the macroeconomic analysis and the first case study at the business level of the microeconomic analysis. For the former case study, the OLS regression is considered appropriate because the study attempts to identify an economic relationship in terms of the impact of ICT (explanatory variable) on labour productivity (dependent variable) by using time-series data of Thailand. The OLS regression can capture the variation in time and reveal the relationship. For the latter case study, the focus is on the potential adoption of cloud computing-based services and its impact on the ICT spending of an industry27. Cloud services are treated as the explanatory variables while ICT spending is the dependent variable. This case study
27 More information of cloud computing and its enabled services will be discussed in later texts.
uses a set of cross-sectional data derived from industry surveys in Thailand. Since the survey was done at one point in time, the variation is across industries, not time. The OLS regression is a good method of estimation because it can also capture a relationship based on cross-sectional variation.
4.3.2. Bivariate Autoregressive Model
Now that the basic concept of OLS regression has been explained, it is obvious that a relationship between the explanatory variable and dependent variable can be examined. However, such a relationship is most of the time regarded as correlation rather than causation. This thesis then employs another extended model based on linear regression in order to verify the causality between pairs of variables where the case study requires it. The model is called the bivariate autoregressive model, or the Granger causality test. It was developed by Professor Clive W.J. Granger, who received the Nobel Prize in Economics in 2003 for his work.
Once again, to understand the process of the bivariate autoregressive model, a basic understanding of regression is required, as this model is based on regression estimation. Under this section, fundamental concepts of the bivariate autoregressive model are elaborated on. This model is normally used to verify causality in terms of time of occurrence. A basic concept is that a cause must occur before an effect. Based on this concept, the bivariate autoregressive model can be used to analyse causality in time-series data. Hence, the variation in time is a requirement. Consider the two equations below,
Y_t = α0 + Σ(n=1..N) αn Y_t−n + Σ(n=1..N) βn X_t−n + ε1,t     4.13

X_t = γ0 + Σ(m=1..M) γm X_t−m + Σ(m=1..M) δm Y_t−m + ε2,t     4.14

where X and Y are the two variables that can hypothetically be conjectured to exhibit a causal relationship. The constant terms are represented by α0 and γ0, while the sets of coefficients are represented by αn, βn, γm, and δm, where N and M are positive numbers indicating the number of lags in 4.13 and 4.14 respectively, with an initial value of one (n = 1 and m = 1). Finally, the estimation errors are represented by ε1,t and ε2,t for the first and second model.
Unlike the OLS regression model, one can see that the variable Y appears on both sides of equation 4.13 and X appears on both sides of 4.14 in the bivariate autoregressive model. This is to check whether there is a one-way causality rather than to verify a basic correlation between the dependent and explanatory variable as in the OLS regression. Both 4.13 and 4.14 together are referred to as the bivariate autoregressive model, or the Granger causality test. According to Granger (1969), X is believed to cause a change in Y if the inclusion of the n-lagged values of X, or X_t−n, reduces the variance of the estimation error (ε1,t) in 4.13. The explanation is the same for 4.14, in which the main focus is the lower estimation error (ε2,t) with the inclusion of Y_t−m for the m lagged periods. This concept is similar to the OLS in the sense that a good estimation model should produce a lower estimated error.
In practice, a common approach to examine a causal relationship between a pair of variables is to test the statistical significance of their coefficients. For example, in 4.13, the coefficients βn for all n lags should be statistically significant in order to state that X is the cause of change in Y. Likewise, the coefficients δm for all m lags should be statistically significant to conclude a causality flowing from Y to X in 4.14. Apart from the approach mentioned earlier, another way to test the statistical significance is to use the F test with the null hypothesis that the underlying coefficients are equal to zero.
In terms of the number of lag(s), the Akaike Information Criterion (AIC)28 is used as the main method to determine the appropriate lag in the bivariate autoregressive model constructed in the case study of this thesis. A robust estimation model is the one with a number of lags that minimises the value of AIC.
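The F-test version of the Granger test can be sketched directly. The data below are synthetic, generated so that X genuinely drives Y with one lag (not thesis data), and the test compares a restricted model (Y on its own lag only) against an unrestricted model (Y on lags of both Y and X), as described in 4.13.

```python
import numpy as np

# Hedged sketch of the Granger test of 4.13 with one lag (synthetic data):
# an F test on whether adding the lag of X significantly reduces the sum
# of squared residuals of the equation for Y.

rng = np.random.default_rng(3)
T = 300
X = rng.normal(size=T)
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = 0.3 * Y[t - 1] + 0.8 * X[t - 1] + rng.normal()  # X drives Y

def ssr(design, target):
    """Sum of squared residuals of an OLS fit of target on design."""
    beta = np.linalg.lstsq(design, target, rcond=None)[0]
    resid = target - design @ beta
    return resid @ resid

y = Y[1:]
ones = np.ones(T - 1)
restricted = np.column_stack([ones, Y[:-1]])            # lag of Y only
unrestricted = np.column_stack([ones, Y[:-1], X[:-1]])  # add the lag of X

q = 1                                    # one restriction (the lag of X)
df = (T - 1) - unrestricted.shape[1]     # residual degrees of freedom
F = ((ssr(restricted, y) - ssr(unrestricted, y)) / q) / (ssr(unrestricted, y) / df)
print(F)  # a large F rejects "X does not Granger-cause Y"
```

In a full application the same test is run in the reverse direction (equation 4.14) to check for bi-directional causality, and the lag length would be chosen by minimising the AIC rather than fixed at one as in this sketch.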
The bivariate autoregressive model is used in the first and second case study of macroeconomic analysis. In the first case study, after the OLS regression has been conducted to verify a relationship between ICT and labour productivity, the Granger causality test is performed to witness the causality. An estimation model incorporating 4.13 and 4.14 is generated to examine whether there is a one-way or bi-directional causality between ICT and labour productivity. The ICT and labour productivity variable is derived from reliable macroeconomic time-series data; hence, the bivariate
28 For more information about the Akaike Information Criterion, refer to Akaike (1974).
autoregressive model can be considered a suitable means to point out the causality. In the second case study, this model is used as a preliminary analysis in order to examine bi-directional causality among the potential adoption of cloud computing-based services, national output, employment, and labour productivity. After confirming the existence of the two-way relationship, a suitable estimation model is constructed. This model will be discussed in the next section. The statistical software used to perform the Granger causality test was EViews version 6.1. With the aid of the software, a number of important issues such as selection of number of lag(s) and test of statistical significance of coefficients are automatically handled before the estimation outcome is delivered.
Indeed, the outcome presented in the case study has the number of lags with minimum AIC value and statistically significant coefficients.
4.3.3. Vector Autoregressive Model
The third model is the vector autoregressive model, or VAR model for short. This model is quite similar to the bivariate autoregressive model in the sense that it also considers past values of variables. However, the VAR model is usually conducted on a set of variables rather than a pair. More than that, instead of examining causality, the VAR model has a reputation for being valuable in forecasting. The advantage of the VAR model is that it treats all variables in the same manner.
According to Sims (1980), there is no need for identification of exogenous and endogenous variables. In other words, the separation of dependent and explanatory variables is not required, for VAR models treat all variables on equal footing. Most of the time, it is not clear, or there is no sound theoretical support to the treatment of variables as dependent or independent. Indeed, there may be a case of loop of causality so that one cannot explicitly distinguish between dependent and explanatory variables.
Such an issue is called endogeneity, or the existence of a two-way relationship among the variables involved. The existence of endogeneity is problematic for an estimation using OLS regression as the outcome will be biased; hence, unreliable. The endogeneity problem can be handled using the VAR model because under this model all variables are treated equally. The estimation can be performed without concerns of endogeneity.
The term autoregressive signals the inclusion of lagged values of the dependent variable in the right hand side, while the term vector means a vector of variables is to be