Chapter 4 Methodology
4.1 Data Collection – Internet Survey
The diffusion of the Internet has substantially reduced the cost and time of survey data collection and has made it more accessible. In particular, compared with traditional methods such as random digit dialing (RDD) telephone surveys, mail surveys and face-to-face surveys, Internet surveys can be self-administered and interactive, can increase coverage and can reduce turnaround time, which has made large-scale surveys much more accessible to researchers and scholars. In addition, as Internet access is becoming almost universal and a wide selection of both conventional and innovative survey methods is available across different platforms (e.g., webpage, email and social media), the Internet survey has become one of the mainstream data collection methods for both individual and institutional researchers.
4.1.1 Error of Internet Survey
Despite its popularity, there are several underlying problems that require attention when an Internet survey is employed as a data collection instrument. In particular, four main types of error may limit the reliability, applicability and viability of an Internet survey, namely non-coverage error, sampling error, non-response error and measurement error (Couper, 2000; Sills & Song, 2002; Hill et al., 2007).
Non-coverage error – this is one of the major shortcomings of Internet surveys. Generally speaking, non-coverage error is ‘a function of the mismatch between the target population and the frame population’ (Couper, 2000, p. 467). The target population is the population of interest (in this case, people from the three prefectures that were directly hit by the disaster), whereas the frame population is the population from which the samples are drawn (in this case, people from these three prefectures who have access to the Internet, since the survey is carried out over the Internet). In other words, the non-coverage error in an Internet survey comes from the fact that it can literally only cover people who have access to the Internet. The impact of the non-coverage error depends on the purpose of the survey and the Internet penetration of the total population (Couper, 2000). In the case of Japan, 79.5% of the total population had access to the Internet (including both wireline and wireless Internet) by the end of 2012 (MIC, 2013); therefore, one can argue that the non-coverage error of an Internet survey is not too severe.
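The dependence of non-coverage error on Internet penetration can be illustrated with a standard algebraic identity: if the target population is split into a covered and a non-covered group, the mean of the covered group differs from the overall mean by the non-covered share multiplied by the difference between the two group means. The following is a minimal sketch only; the function name and the outcome values (0.60 and 0.50) are hypothetical and are not taken from the survey data, and only the 79.5% coverage rate comes from the text above.

```python
def coverage_bias(coverage_rate, mean_covered, mean_not_covered):
    """Bias of the covered-group mean relative to the full target-population mean.

    Follows from: overall mean = c * mean_covered + (1 - c) * mean_not_covered,
    so mean_covered - overall mean = (1 - c) * (mean_covered - mean_not_covered).
    """
    if not 0.0 <= coverage_rate <= 1.0:
        raise ValueError("coverage_rate must be between 0 and 1")
    return (1.0 - coverage_rate) * (mean_covered - mean_not_covered)


# Coverage rate from the text (79.5%); the two group means are assumed values
# chosen purely for illustration (e.g. share of people holding some opinion).
bias = coverage_bias(0.795, mean_covered=0.60, mean_not_covered=0.50)
print(f"non-coverage bias: {bias:.4f}")
```

Under these assumed values the bias is about two percentage points. It shrinks towards zero either as coverage approaches 100% or as Internet users and non-users become more alike on the measured variable.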
Sampling error – while non-coverage error is caused by those who are not covered by the frame population, sampling error occurs in the process of selecting the sample from the frame population. As Couper (2000) defines it, ‘sampling error arises from the fact that not all members of the frame population are measured. If the selection process were repeated, a slightly different set of sample persons would be obtained’ (Couper, 2000, p. 467). In other words, sampling error is related to the probability of a member of the frame population being selected. Sampling error varies between different Internet survey methods.
Non-response error – this is the discrepancy between the observed samples (those that have responded) and the entire frame population (those that have responded plus those that have not responded) (Sills & Song, 2002). It ‘arises through the fact that not all people included in the sample are willing or able to complete the survey’ (Couper, 2000, p. 473). As with sampling error, non-response error varies between different Internet survey methods; for example, it will be very different between surveys that are voluntary and surveys that are compulsory or that offer an incentive.
Measurement error – since Internet surveys are mostly self-administered, measurement errors could ‘arise from the respondent (lack of motivation, comprehension problems, deliberate distortion, etc.) or from the instrument (poor wording or design, technical flaws, etc.)’ (Couper, 2000, p. 475). Another source of measurement error is self-report bias, as respondents tend to overstate or understate their answers: ‘In general, research participants want to respond in a way that makes them look as good as possible. Thus, they tend to under-report behaviors deemed inappropriate by researchers or other observers, and they tend to over-report behaviors viewed as appropriate’ (Donaldson & Grant-Vallone, 2002, p. 247). Besides the survey method, measurement error also varies with the target population and the subject of the questions being asked.
4.1.2 Types of Internet Survey
Fundamentally, Internet surveys can be classified into two main types: non-probability methods and probability-based methods (Couper, 2000).
Non-probability methods – these methods recruit respondents via specific channels, and the respondents decide voluntarily whether or not to participate in the survey. Examples of non-probability methods are ‘entertainment-typed survey’ (surveys that are designed to attract respondents with entertainment purposes), ‘self-selected web survey’ or ‘volunteer panel surveys’ (surveys that recruit respondents through pop-up advertisements on websites and social media). The advantage of non-probability methods is the relatively low operational cost required to recruit a large number of respondents. However, as the frame population is in most cases unknown and the samples are selected entirely of their own will, the risks of non-coverage error and sampling error are quite high, and the non-response error and measurement error are also uncontrolled. Hence, non-probability methods are usually not suitable for surveys that aim for generalisation.
Probability-based methods – in contrast, probability-based methods attempt to define the frame population first, and then select the respondents from the frame population based on certain sampling criteria. For example, ‘intercept survey’ and ‘list based samples survey’ recruit respondents systematically, either by intercepting the traffic to popular websites or from a specific user list, and ‘pre-recruited panels survey’ recruits people to participate as a panel via various online and/or offline channels. In probability-based methods, the basic demographic data of the respondents are usually available, and hence the profile of the frame population can be constructed.
The advantage of probability-based methods is that, with a known frame population, the non-coverage error and the sampling error can be controlled. In addition, depending on the way the survey is carried out, the non-response and measurement errors can also be controlled. In particular, as Internet surveys are becoming more and more popular, many commercial survey companies are offering professional Internet survey services. These companies usually have a fairly large survey panel (frame population) to reduce the non-coverage error and sampling error. In addition, these companies usually offer incentives to the respondents to increase the response rate, and some of them also employ quality control policies to control the measurement error. The professional Internet panel survey has become one of the mainstream Internet survey methods for both marketing and academic research (Couper, 2000; Sills & Song, 2002; Tsuboi et al., 2012). However, Internet surveys by nature inherit some non-coverage error, because they can only cover Internet users (unless the target population consists only of Internet users), and most Internet surveys are self-administered. Accordingly, some studies in Japan, the US and the UK (e.g., Tsuboi et al., 2012; Dennis et al., 2005; Grandcolas et al., 2003) have found that Internet panel surveys, like other methods such as face-to-face or telephone surveys, do have a certain level of bias. That being said, from a practical point of view, by comparing the data of large-scale Internet surveys and national surveys, Hill et al. (2007) argue that a mildly biased but large-scale survey can produce more reliable estimates than a small but unbiased survey. Therefore, considering the cost, time and reliability factors, the professional Internet panel survey offers the most balanced solution, and hence it is deemed suitable as the data collection method for the empirical analysis in this thesis.
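The trade-off between bias and sample size mentioned above can be made concrete with the usual decomposition of mean squared error into squared bias plus variance. The sketch below is an illustration under stated assumptions and is not the analysis performed by Hill et al. (2007): it assumes simple random sampling, an estimated proportion, and hypothetical values for the true proportion, the bias and the two sample sizes.

```python
import math


def rmse_of_proportion(true_p, bias, n):
    """Root mean squared error of an estimated proportion.

    Assumes simple random sampling of n respondents from a source whose
    expected value is true_p + bias, so that MSE = bias^2 + variance.
    """
    expected = true_p + bias
    variance = expected * (1.0 - expected) / n
    return math.sqrt(bias ** 2 + variance)


# Hypothetical scenario: the true proportion is 0.50.
small_unbiased = rmse_of_proportion(0.50, bias=0.00, n=200)
large_mildly_biased = rmse_of_proportion(0.50, bias=0.02, n=5000)

print(f"small unbiased survey (n=200):        RMSE = {small_unbiased:.4f}")
print(f"large mildly biased survey (n=5000):  RMSE = {large_mildly_biased:.4f}")
```

With these assumed numbers the large, mildly biased survey has the smaller error. The advantage is not unconditional: if the bias in the same sketch is raised to 0.05, the squared bias alone exceeds the total error of the small unbiased survey, however large the sample.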
4.1.3 Post-Stratification Weighting
There are many methods to reduce or mitigate the coverage and sampling errors statistically; one of the commonly used methods is post-stratification weighting (Little, 1993; Barboza & Williams, 2005). In short, post-stratification is a ‘method of data analysis which involves forming units into homogeneous groups after observation of the sample’ (Little, 1993, p. 315). It often involves comparing the observed sample with an expected (reference) sample, and in many cases large-scale surveys such as government census surveys are used as the expected sample. A common technique is to compare the basic socio-demographic data (e.g., age, gender, geographic location) of the observed sample with those of the expected sample in order to calculate a ratio-based weighting factor. The weighting factor is then applied to the observed sample to create the weighted sample, which is used for further statistical analysis. In some cases, it has been found that post-stratification weighting can effectively reduce the coverage and sampling errors and increase the representativeness of the sample (Barboza & Williams, 2005). However, it is important to note that post-stratification cannot reduce non-response and measurement errors, nor can it increase the precision of the survey (Little, 1993). In fact, in other cases, it has been found that post-stratification weighting adjustment has only a minor effect on the results (Loosveldt & Sonck, 2008).
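The ratio-based weighting factor described above can be sketched as follows: for each stratum, the weight is the share of that stratum in the reference population divided by its share in the observed sample. This is a minimal sketch under stated assumptions, not the exact procedure used in this thesis; the function names, the two age strata, the reference shares and the answers are all hypothetical.

```python
from collections import Counter


def post_stratification_weights(sample_strata, reference_shares):
    """Weight per stratum = reference share / observed sample share."""
    counts = Counter(sample_strata)
    unknown = set(counts) - set(reference_shares)
    if unknown:
        raise ValueError(f"strata missing from reference: {sorted(unknown)}")
    n = len(sample_strata)
    weights = {}
    for stratum, ref_share in reference_shares.items():
        if counts[stratum] == 0:
            # A stratum with no respondents cannot be re-weighted.
            raise ValueError(f"no respondents in stratum {stratum!r}")
        weights[stratum] = ref_share / (counts[stratum] / n)
    return weights


def weighted_mean(values, strata, weights):
    """Mean of values after applying each respondent's stratum weight."""
    total_weight = sum(weights[s] for s in strata)
    return sum(weights[s] * v for v, s in zip(values, strata)) / total_weight


# Hypothetical sample: younger respondents are over-represented (60% of the
# sample against an assumed 40% in the reference population).
strata = ["young"] * 6 + ["old"] * 4
answers = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]   # 1 = yes, 0 = no
reference = {"young": 0.4, "old": 0.6}

weights = post_stratification_weights(strata, reference)
print("weights:", weights)
print("unweighted mean:", sum(answers) / len(answers))
print("weighted mean:  ", weighted_mean(answers, strata, weights))
```

In this example the over-represented stratum receives a weight below one and the under-represented stratum a weight above one, so the weighted estimate moves towards the answers of the under-represented group. The adjustment only corrects the stratum composition; it does not change who responded within each stratum or how they answered, which is consistent with the limitation noted by Little (1993).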
4.1.4 Data Collection Procedure
A probability-based Internet panel survey conducted by a professional survey company is chosen as the data collection method for this thesis because of its advantages described in section 4.1.2. In this thesis, two separate Internet panel surveys⁹ were carried out: the first survey was conducted in March 2013 to collect the data for the core analysis, and the second survey was conducted in March 2014 to collect the data for the extended analysis. Both surveys were conducted through Macromill Inc., one of the largest online research companies in Japan. The company has an online survey panel of approximately 1 million active registered respondents across Japan. It recruits the respondents online and compensates them with points that can be exchanged for cash or coupons upon completing a survey. It selects the samples randomly from this relatively large panel base to minimise the sampling error. On top of that, Macromill Inc. also employs a set of quality assurance policies to ensure the quality of the survey data. For example, it regularly compares its panel data with other large-scale surveys carried out by the government and other survey companies to check the representativeness of its panel and reduce the potential non-coverage error. In addition, it monitors the survey panel to check for non-responses and for false and repeated registrations in order to reduce the non-response error. Furthermore, the privacy of the personal data of the respondents is protected by the company’s privacy policy (Macromill, 2014). For the survey procedure, the questionnaire is first sent to the research company, which double-checks the wording to ensure that it can be easily understood by the respondents, in order to reduce the potential measurement error. The company then converts the questionnaire to an online format, and the hyperlink to the questionnaire is sent to the respondents via email. The respondents are randomly drawn by the company from its survey panel according to the research-specific requirements. Finally, noting that Internet surveys do have a certain level of inherent bias, post-stratification weighting will be employed if necessary to minimise the possible error.
⁹ Both surveys in this study are sponsored by the Japan Commercial Broadcasters Association.