Stratified Sampling for Oversampling Small Sub-Populations

by Ritu Narayan

Sampling is a critical issue in any research study design. Most of us have grappled with balancing costs, time and, of course, statistical power when deciding on our sampling strategies.

How do we know when to use a simple random sample, stratification, or clustering? Let’s talk about stratified sampling here and one research scenario in which it is useful.

One Scenario for Stratified Sampling

Suppose you are studying minority groups and their behavior, say Yiddish speakers in the U.S. and their voting patterns. Yiddish speakers are a small subset of the U.S. population, just 0.6%. Even a large simple random sample of 1000 residents would, on average, include only about 6 Yiddish speakers.

Obviously, it is difficult to draw any meaningful inferences about the group’s behavior based on 6 respondents.

Further, suppose you need at least 60 respondents from this group for sufficient statistical power. That would entail collecting data from 10,000 respondents – a number for which you might neither have the financial resources nor the time.
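The arithmetic behind these numbers can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the 0.6% share is stored as an exact fraction so that floating-point rounding does not affect the result.

```python
import math
from fractions import Fraction

# Share of Yiddish speakers used in the article (0.6%), kept as an exact
# fraction so the arithmetic is not affected by floating-point rounding.
SUBGROUP_SHARE = Fraction(6, 1000)

def expected_subgroup_count(sample_size, share):
    """Average number of sub-group members in a simple random sample."""
    return sample_size * share

def required_total_sample(target_count, share):
    """Smallest simple random sample expected to contain target_count members."""
    return math.ceil(target_count / share)

print(expected_subgroup_count(1000, SUBGROUP_SHARE))  # 6
print(required_total_sample(60, SUBGROUP_SHARE))      # 10000
```

These are expected values only. Any single random sample of 1000 could contain somewhat more or fewer than 6 Yiddish speakers.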

What if you were able to get meaningful results from a total sample of just 200 respondents of which 100 are Yiddish speakers? Now, that sounds more manageable, doesn’t it? And this is where stratified sampling comes in.

How to do it

In stratified sampling, the population is divided into different sub-groups or strata, and then the subjects are randomly selected from each of the strata. So, in the above example, you would divide the population into different linguistic sub-groups (one of which is Yiddish speakers). Here are two simple steps you should follow:

Step 1: Divide the population into sub-groups (strata)

Strata are commonly defined by variables such as age, gender, ethnicity, socio-economic class and religion. You should ensure that the strata meet the following criteria:

  1. The sub-groups should be exhaustive, i.e. the entire population should be covered within the strata. For example, the different strata for age could be child (<=12 years), teenager (13-19 years), adult (20-59 years), and senior (>=60 years).
  2. There should be no overlaps between sub-groups, i.e. each subject or element should fall in only one sub-group. This is evident in the example above.
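Both criteria can be checked mechanically before sampling. The sketch below uses the age strata from the example above. The names `AGE_STRATA`, `matching_strata` and `check_partition` are hypothetical, and the upper age limit of 130 is an assumption made only so that the last stratum has a finite range.

```python
# Age strata from the example above, written as ranges of whole years.
# The upper limit of 130 is an assumption for illustration.
AGE_STRATA = {
    "child": range(0, 13),      # <= 12 years
    "teenager": range(13, 20),  # 13-19 years
    "adult": range(20, 60),     # 20-59 years
    "senior": range(60, 131),   # >= 60 years
}

def matching_strata(age, strata=AGE_STRATA):
    """Return the names of all strata that contain the given age."""
    return [name for name, ages in strata.items() if age in ages]

def check_partition(ages, strata=AGE_STRATA):
    """True if every age falls in exactly one stratum.

    Zero matches means the strata are not exhaustive;
    more than one match means they overlap.
    """
    return all(len(matching_strata(age, strata)) == 1 for age in ages)

print(check_partition(range(0, 131)))  # True
```

The same check would flag a definition such as "youth (0-19 years)" alongside "teenager (13-19 years)", because a 15-year-old would then match two strata.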

Step 2: Sample the strata using proportionate or disproportionate allocation

1. In proportionate allocation, each stratum is sampled in proportion to its share of the population. In a sample of 1000, you would draw 6 Yiddish speakers.

2. Alternatively, in disproportionate allocation, you could draw 100 Yiddish speakers in a total sample size of 200. In other words, you sample disproportionately more subjects from the stratum of interest: Yiddish speakers make up 50% of the sample, far more than their 0.6% share of the population. With such a sample you can draw meaningful inferences about Yiddish speakers and how they compare with the rest of the population.
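The two allocation schemes can be sketched in Python using only the standard library. Everything here is illustrative: the sampling frame of 100,000 residents (600 of them Yiddish speakers, i.e. 0.6%) is made up, and the function names are hypothetical.

```python
import random

# Hypothetical sampling frame: 100,000 residents, 0.6% of them Yiddish speakers.
STRATUM_SIZES = {"yiddish": 600, "other": 99_400}

def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate the sample in proportion to each stratum's population share.

    In general, rounding can make the parts sum to slightly more or less
    than total_sample; the numbers used here divide evenly.
    """
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

def draw_stratified_sample(frame, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(frame[name], k) for name, k in allocation.items()}

# Build the frame as lists of made-up unit IDs, one list per stratum.
frame = {name: [f"{name}-{i}" for i in range(size)]
         for name, size in STRATUM_SIZES.items()}

proportionate = proportionate_allocation(STRATUM_SIZES, 1000)
disproportionate = {"yiddish": 100, "other": 100}  # sizes chosen by the researcher

sample = draw_stratified_sample(frame, disproportionate, seed=42)
print(proportionate)  # {'yiddish': 6, 'other': 994}
print({name: len(units) for name, units in sample.items()})  # {'yiddish': 100, 'other': 100}
```

Within each stratum the selection is still random; only the number drawn per stratum differs between the two schemes. Estimates for the population as a whole from a disproportionate sample would normally require weighting, which this sketch does not cover.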

In this scenario, disproportionate allocation would make the most sense, since the point is to ensure an adequate number of Yiddish speakers in the sample.

Proportionate allocation makes more sense in other scenarios. One example is when the strata themselves are not of interest in the research question, but they improve access to potential research participants.

To summarize, one good reason to use stratified sampling is that the sub-group you want to study is a small proportion of the population. In that case, you sample a disproportionately high number of subjects from this sub-group. This will enable you to compare your sub-group with the rest of the population with greater accuracy, and at lower cost than a simple random sample large enough to include the same number of sub-group members.

Ritu Narayan, M.S., M.B.A., provides services to clients in conceptualization, design and implementation of research studies. Her expertise in quantitative data analysis methods and background in business consulting and information systems help her provide insights during all stages of research, from questionnaire design to report writing. She has published in peer-reviewed journals on what motivates people to post online reviews, and is interested in research focused on social media.

 


Reader Interactions

Comments

  1. Lyna says

    Is the 50% allocated to the disproportionate sample the result of a formula, or is it just based on the researcher’s judgment?

  2. Vichu says

    Respected ma’am,
    In my study the population is 34,48,000. How should I define the sample for my research? The population is divided into 6 taluks.
    Please clarify.

  3. peter hadjis says

    An otherwise useful explanation is muddled by omitted text or suspended editing.

    Fourth paragraph down:
    Obviously, it is difficult to draw any meaningful inferences about the group’s behavior based on 6 respondents.
    only makes sense if you take into consideration:

    Step 2: Sample the strata using proportionate or disproportionate allocation
    1. In proportionate allocation, in a sample of 1000, you would draw 6 Yiddish speakers.

    which is further down in the text.

