Sampling
A Workbook by Alison Galloway
How to use this workbook
In class time we will work through some of the theory of sampling. At regular intervals, however, you will be asked to consider some questions that relate to sampling issues. Please give some thought to these questions and jot down answers to them. Feel free to 'submit' them to your lecturer if you want. These questions are there to help you to test your own understanding of the subject matter. Some possible answers will then be suggested by your lecturer, but remember that for many of the questions there is not one correct "solution", but a variety of possible answers.
It is hoped that the workbook will help you have a better understanding of this subject. It can also be used as a tool for revision and as a point of reference for much of the work that will be covered in other areas of your course.
Why sampling theory is important
When undertaking any survey, it is essential that you obtain data from people that are as representative as possible of the group that you are studying. Even with the perfect questionnaire (if such a thing exists), your survey data will only be regarded as useful if it is considered that your respondents are typical of the population as a whole. For this reason, an awareness of the principles of sampling is essential to the implementation of most methods of research, both quantitative and qualitative.
Some definitions first of all:
- Population The group of people, items or units under investigation
- Census Obtained by collecting information about each member of a population
- Sample Obtained by collecting information only about some members of a "population"
- Sampling Frame The list of people from which the sample is taken. It should be comprehensive, complete and up-to-date. Examples of sampling frame: Electoral Register; Postcode Address File; telephone book
Probability and Non-Probability Sampling
A probability sample is one in which each member of the population has a known, non-zero chance of being selected; in the simplest designs that chance is the same for everyone.
In a non-probability sample, some people have a greater, but unknown, chance than others of selection.
There are five main types of probability sample. The choice between them depends on the nature of the research problem, the availability of a good sampling frame, money, time, the desired level of accuracy in the sample and the data collection methods. Each has its advantages, each its disadvantages. They are:
- Simple Random Sample
- Systematic Sampling
- Random Route Sampling
- Stratified Sampling
- Multi Stage Cluster Sampling
1. Simple random sample
This is perhaps an unfortunate term, because it isn't that simple and it isn't done at random, in the sense of "haphazardly".
Characteristics:
- Each person has same chance as any other of being selected
- Standard against which other methods are sometimes evaluated
- Suitable where population is relatively small and where sampling frame is complete and up-to-date
Procedure:
- Obtain a complete sampling frame
- Give each case a unique number, starting at one
- Decide on the required sample size
- Select that many numbers from a table of random numbers or using computer
Table of random numbers (usually found at back of statistics textbooks) e.g.
92941 04999 77422 25992 27372
94157 43252 83266 47196 94045
48135 34237 46293 46178 50110
78907 37586 50940 88094 28209
82843 43383 32561 62108 46076
Decide on a pattern of movement through the table and stick to it, e.g. numbers from every second column and every row. If a number comes up twice, or a number is selected which is larger than the size of the population, discard it.
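The procedure above can be sketched in Python. This is only an illustration under stated assumptions: the function names are invented for the example, and the pattern of movement chosen here (read the first two digits of each five-digit group, left to right, row by row) is just one of many patterns you could fix in advance. The second function shows the "using a computer" route with the standard library's random.sample.

```python
import random

# The table of random numbers shown above.
TABLE = """
92941 04999 77422 25992 27372
94157 43252 83266 47196 94045
48135 34237 46293 46178 50110
78907 37586 50940 88094 28209
82843 43383 32561 62108 46076
"""


def select_from_table(table_text, population_size, sample_size, digits=2):
    """Pick case numbers by moving through the table in a fixed pattern.

    Assumed pattern (hypothetical): take the first `digits` digits of each
    group, left to right, row by row.
    """
    selected = []
    for group in table_text.split():
        number = int(group[:digits])
        # Discard numbers outside 1..population_size (cases are numbered from one).
        if number < 1 or number > population_size:
            continue
        # Discard numbers that have already come up.
        if number in selected:
            continue
        selected.append(number)
        if len(selected) == sample_size:
            return selected
    raise ValueError("table used up before the sample was complete")


def simple_random_sample(population_size, sample_size, seed=None):
    """Computer alternative: draw distinct case numbers without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)


# A sample of 10 from a population of 80 numbered cases.
print(select_from_table(TABLE, population_size=80, sample_size=10))
# -> [4, 77, 25, 27, 43, 47, 48, 34, 46, 50]
```

In the table walk, 92, 94 and 83 are discarded because they exceed 80, and the second 46 is discarded because it has already been selected.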
2. Systematic sampling
Similar to simple random sampling, but instead of selecting random numbers from tables, you move through the list (sampling frame) picking every nth name.
You must first work out the SAMPLING FRACTION by dividing the required sample size by the population size. E.g. for a population of 500 and a sample of 100, the sampling fraction is 100/500 = 1/5, i.e. you will select one person out of every five in the population. A random number is needed only to decide on the starting point. With a sampling fraction of 1/5, the starting point must be within the first 5 people in your list.
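A minimal sketch of this selection in Python, using the population of 500 and sample of 100 from the example. The function name and the use of a seeded random.Random are assumptions made for the illustration; the interval between picks is the population size divided by the sample size (5 here).

```python
import random


def systematic_sample(frame, sample_size, rng=None):
    """Select every nth entry of `frame` after a random starting point."""
    if not 1 <= sample_size <= len(frame):
        raise ValueError("sample size must be between 1 and the size of the frame")
    rng = rng or random.Random()
    interval = len(frame) // sample_size   # e.g. 500 // 100 = 5
    start = rng.randrange(interval)        # a position within the first `interval` entries
    return frame[start::interval][:sample_size]


# Hypothetical frame: people numbered 1 to 500.
frame = list(range(1, 501))
sample = systematic_sample(frame, 100, random.Random(42))
print(len(sample), sample[:3])
```

Only the starting point is random; once it is fixed, every other selection follows from it.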
Disadvantage: Effect of periodicity (bias caused by particular characteristics arising in the sampling frame at regular intervals). An example of this would occur if you used a sampling frame of adult residents in an area composed predominantly of couples or young families. If this list was arranged Husband / Wife / Husband / Wife etc. and every tenth person was to be interviewed, everyone selected would occupy the same position in the pattern, so the sample would consist entirely of husbands or entirely of wives, depending on the starting point.
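The periodicity effect can be checked with a short sketch. The frame of 100 couples is hypothetical, and the function name is invented for the example. Because the list alternates with a period of two and the interval of ten is a multiple of two, every pick lands on the same position in the pair.

```python
# Hypothetical frame of 100 couples, listed Husband, Wife, Husband, Wife, ...
frame = ["Husband", "Wife"] * 100


def every_nth(frame, interval, start):
    """Systematic selection: the entry at `start`, then every `interval`-th after it."""
    return frame[start::interval]


# Try each possible starting point within the first ten entries.
for start in range(10):
    picked = every_nth(frame, 10, start)
    print(start, set(picked))
```

Each line of output shows a set with a single member: all husbands or all wives. An odd interval such as 5 would not line up with the pattern and would give a mixture.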
3. Random Route Sampling
Used in market research surveys - mainly for sampling households, shops, garages and other premises in urban areas
Address is selected at random from sampling frame (usually electoral register) as a starting point. Interviewer then given instructions to identify further addresses by taking alternate left- and right-hand turns at road junctions and calling at every nth address (shop, garage etc.)
Advantages:
- May be saving in time
- Bias may be reduced because interviewer has to call at clearly defined addresses - not able to choose
Problems:
- Characteristics of particular areas (e.g. poor / rich) may mean that sample is not representative
- Open to abuse by interviewer because difficult to check that instructions fully carried out
4. Stratified Sampling
All people in sampling frame are divided into "strata" (groups or categories). Within each stratum, a simple random sample or systematic sample is selected.
Example of stratified sampling - If we want to ensure that a sample of 5 students from a group of 50 contains both male and female students in the same proportions as in the full population (i.e. the group of 50), we first divide that population into male and female. In this case, there are 22 male students and 28 female students. To work out the number of males and females in the sample:
No. of males in sample = (5 / 50) x 22 = 2.2
No. of females in sample = (5 / 50) x 28 = 2.8
We obviously can't interview .2 of a person or .8 of a person, and have to "round" the numbers. Therefore we choose 2 males and 3 females for the sample. These would be selected using simple random or systematic sampling methods.
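The allocation in this example can be sketched as follows. The function names are invented for the illustration, and the rounding rule is an assumption: each stratum's share is rounded down, then any places left over go to the strata with the largest remainders, so that the total always equals the required sample size. For 22 males and 28 females this gives the same 2 and 3 as above.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Share `sample_size` across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    allocation, remainders = {}, {}
    for name, size in strata_sizes.items():
        # Exact share is sample_size * size / total; keep whole part and remainder.
        allocation[name], remainders[name] = divmod(sample_size * size, total)
    # Hand out the places lost to rounding down, largest remainder first.
    left_over = sample_size - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:left_over]:
        allocation[name] += 1
    return allocation


def stratified_sample(strata, sample_size, rng=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = rng or random.Random()
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}


print(proportional_allocation({"male": 22, "female": 28}, 5))
# -> {'male': 2, 'female': 3}
```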
5. Multi-stage cluster sampling
As the name implies, this involves drawing several different samples. It does so in such a way that cost of final interviewing is minimised.
Basic procedure: First draw a sample of areas. Initially large areas are selected, then progressively smaller areas within each larger area are sampled. Eventually you end up with a sample of households and use a method of selecting individuals from these selected households.
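A two-stage version of this procedure might look like the sketch below. The areas, household identifiers and function name are all hypothetical, and a real design would usually have more stages and a further step to select individuals within each household.

```python
import random


def multi_stage_sample(areas, n_areas, n_households, rng=None):
    """Two-stage sketch: sample areas first, then households within each area."""
    rng = rng or random.Random()
    chosen_areas = rng.sample(sorted(areas), n_areas)
    return {area: rng.sample(areas[area], n_households) for area in chosen_areas}


# Hypothetical frame: 6 areas, each listing 20 household identifiers.
areas = {f"Area {a}": [f"{a}-{h:02d}" for h in range(1, 21)] for a in "ABCDEF"}

sample = multi_stage_sample(areas, n_areas=2, n_households=5, rng=random.Random(7))
for area, households in sample.items():
    print(area, households)
```

Interviewers only need to travel to the two selected areas, which is where the saving in cost comes from.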
Non-Probability Samples
- Purposive Sampling
- Quota Sampling
- Convenience Sampling
- Snowball Sampling
- Self Selection
It isn't always possible to use a probability method of sampling, such as simple random sampling. For example, no complete sampling frame is available for certain groups of the population, e.g. the elderly; people who are attending a football match; people who shop in a particular part of town. Another factor to bear in mind is that many of the probability sampling methods described above may require researchers to carry out a postal or telephone survey, or to go from house to house. We will discuss some of the problems of low response rate later on in this workbook, but you might find that a probability sample with a poor response rate doesn't in the end give you a particularly good representation of the population being examined.
Advantages of non-probability methods:
- Cheaper
- Used when sampling frame is not available
- Useful when population is so widely dispersed that cluster sampling would not be efficient
- Often used in exploratory studies, e.g. for hypothesis generation
- Some research is not interested in working out what proportion of the population gives a particular response, but rather in obtaining an idea of the range of responses or ideas that people have.
1. Purposive Sampling
A purposive sample is one which is selected by the researcher subjectively. The researcher attempts to obtain a sample that appears to him/her to be representative of the population and will usually try to ensure that a range from one extreme to the other is included.
Often used in political polling - districts chosen because their pattern has in the past provided good idea of outcomes for whole electorate.
2. Quota Sampling
Have you ever been ambling along your local High Street, noticed a Market Researcher with a clipboard and thought "I don't mind being asked some questions - it might be interesting", only to find that the researcher looks straight through you? No? Well, for those people who have had that happen, there is no need to take it personally. It is all due to quota sampling.
Quota sampling is often used in market research. Interviewers are required to find cases with particular characteristics. They are given quotas of particular types of people to interview, and the quotas are organised so that the final sample should be representative of the population.
Stages:
- Decide on characteristic of which sample is to be representative, e.g. age
- Find out distribution of this variable in population and set quota accordingly. E.g. if 20% of population is between 20 and 30, and sample is to be 1,000 then 200 of sample (20%) will be in this age group
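The two stages can be sketched in a few lines of Python. The function names and the "other ages" group are invented for the illustration; only the 20% share and the sample of 1,000 come from the example above. Percentages are kept as whole numbers and the quota is rounded down, which is an assumption of this sketch.

```python
def set_quotas(population_percentages, sample_size):
    """Turn each group's percentage share of the population into a quota."""
    return {group: sample_size * pct // 100
            for group, pct in population_percentages.items()}


def try_interview(group, quotas, filled):
    """Accept a respondent only if the quota for their group is not yet full."""
    if filled.get(group, 0) >= quotas.get(group, 0):
        return False   # quota full: the interviewer passes this person by
    filled[group] = filled.get(group, 0) + 1
    return True


# Only the 20-30 share comes from the text; the other group is made up.
quotas = set_quotas({"20-30": 20, "other ages": 80}, sample_size=1000)
print(quotas["20-30"])   # -> 200
```

The second function is why a researcher may look straight through you: once the quota for your group is full, you are no longer needed.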
Complex quotas can be developed so that several characteristics (e.g. age, sex, marital status) are used simultaneously. By the end of the day, the researcher may be looking for a widowed man in his nineties who looks as though he might buy a particular brand of detergent.
Disadvantage of quota sampling - Interviewers choose whom they like (within the above criteria) and may therefore select those who are easiest to interview, so bias can result. It is also impossible to estimate the accuracy of the results, because the sample is not random.
3. Convenience Sampling
A convenience sample is used when you simply stop anybody in the street who is prepared to stop, or when you wander round a business, a shop, a restaurant, a theatre or whatever, asking people you meet whether they will answer your questions. In other words, the sample comprises subjects who are simply available in a convenient way to the researcher. There is no randomness and the likelihood of bias is high. You can't safely generalise from the results you obtain to the population as a whole.
However, this method is often the only feasible one, particularly for students or others with restricted time and resources, and can legitimately be used provided its limitations are clearly understood and stated.
Because it is an extremely haphazard approach, students are often tempted to use the word "random" when describing their sample where they have stopped people in the street, as they see it "at random". You should avoid using the word "random" when describing anything to do with sampling unless you are absolutely certain that you selected respondents from a sampling frame using truly random methods.
4. Snowball Sampling
With this approach, you initially contact a few potential respondents and then ask them whether they know of anybody with the same characteristics that you are looking for in your research. For example, if you wanted to interview a sample of vegetarians / cyclists / people with a particular disability / people who support a particular political party etc., your initial contacts may well have knowledge (e.g. through a support group) of others.
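One way to picture this is as following a chain of referrals outwards from the initial contacts. The sketch below is only an illustration: the names, the referral network and the function are hypothetical, and real referrals are of course gathered by asking people rather than looked up in a table.

```python
from collections import deque


def snowball_sample(referrals, initial_contacts, target_size):
    """Follow referrals outwards from the initial contacts, breadth first."""
    sample = []
    seen = set(initial_contacts)
    queue = deque(initial_contacts)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Ask this respondent who else they know with the same characteristics.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network among cyclists.
referrals = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Cat", "Dev"],
    "Cat": ["Eve"],
    "Dev": [],
}
print(snowball_sample(referrals, ["Ann"], target_size=4))
# -> ['Ann', 'Bob', 'Cat', 'Dev']
```

Everyone in the sample is reached through Ann, which shows how strongly the result depends on the initial contacts.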
5. Self Selection
Self-selection is perhaps self-explanatory. Respondents themselves decide that they would like to take part in your survey.
Non-Response
Some people selected in a sample may not be included:
- Some will refuse
- Some will be uncontactable
- Some will be uninterviewable
Non-response can create 2 major problems:
- Unacceptable reduction in sample size
- Bias
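The first of these problems is a matter of arithmetic: the achieved sample is roughly the number of people approached multiplied by the response rate. The sketch below works backwards from a target. The function name and the figures are hypothetical, and the response rate is an estimate you would have to supply yourself.

```python
def names_to_issue(target_sample, response_rate_percent):
    """Number of people to approach so that about `target_sample` respond."""
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be above 0 and at most 100")
    # Ceiling division on whole numbers avoids floating-point rounding.
    return -(-target_sample * 100 // response_rate_percent)


# Hypothetical figures: 100 completed interviews wanted, 40% expected to respond.
print(names_to_issue(100, 40))   # -> 250
```

Approaching more people restores the sample size but does nothing about bias: if those who respond differ from those who do not, a larger sample simply repeats the same distortion.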
The sample size required depends on:
- Methodology selected
- Degree of accuracy required for the study (how much error can be tolerated)
- Extent to which there is variation in the population with regard to key characteristics of the study
- Likely response rate (which itself will depend on sampling method selected)
- Time and money available
THE LAWS OF SAMPLING
The Law of Statistical Regularity
A reasonably large sample selected at random from a large population will be, on average, representative of the characteristics of that population.
The Law of the Inertia of Large Numbers
Large groups of data show a higher degree of stability than smaller ones; there is a tendency for variations in the data to be cancelled out by each other.
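Both laws can be illustrated with a small simulation. This is a sketch under invented assumptions: a population of 10,000 people of whom 30% have some characteristic, 200 repeated simple random samples at each sample size, and a hypothetical function name. It measures how much the sample proportion varies from one sample to the next.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, repeats, rng):
    """Standard deviation of the means of repeated simple random samples."""
    means = [sum(rng.sample(population, sample_size)) / sample_size
             for _ in range(repeats)]
    return statistics.pstdev(means)


rng = random.Random(0)
# Hypothetical population of 10,000 in which 30% have the characteristic (coded 1).
population = [1] * 3000 + [0] * 7000

for n in (10, 100, 1000):
    print(n, round(spread_of_sample_means(population, n, 200, rng), 3))
```

The printed spread shrinks as the sample size grows: individual variations increasingly cancel one another out, so larger random samples sit closer to the population figure of 0.3.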