Author: Trent Buskirk, PhD.
What do you do when you hear the word error? Do you think you made a mistake?
Well, in survey statistics, error doesn't necessarily mean a mistake. That might be the best news yet: error could mean that things are exactly as they should be.
Let’s break this down a bit more before you decide this must be a typo or, even worse, an error. (more…)
by Karen Grace-Martin and Trent Buskirk
Sampling is such a fundamental concept in statistics that it’s easy to overlook. You know, like fish ignore water.
It’s just there.
But how you sample is actually very important.
There are many different ways of taking probability samples, but they come down to two basic types. Most of the statistics we’re trained on use only one of those types—the simple random sample.
If you don’t have a simple random sample, you need to incorporate that into the way you calculate your statistics in order for the statistics to accurately reflect the population.
Why all this is important
Remember, the objective of a sample is to represent the population of interest.
Simple random samples do that in a very straightforward way.
Because they’re simple, after all.
Complex samples do it as well, but in a more roundabout way. That roundabout nature has many other advantages, but you do need to adjust any statistics you calculate from them.
What is simple random sampling?
Simple Random Samples (SRS) have a few important features.
1. Each element in the population has an equal probability of being selected to the sample.
That’s pretty self-explanatory, but it has important consequences and requirements.
First, it requires that the list of all individuals in the population is available to the researcher.
Practically, this is never entirely true. But we can often get close, or at least have reason to believe that the individuals who are available are not systematically different from the ones who aren’t.
That belief may or may not be reasonable, and it’s a good thing to question in your own research.
One consequence is that all observations are independent and identically distributed (i.i.d.). You’re probably familiar with this term because it’s so important in statistics, and in modeling in particular.
2. The sample is a tiny proportion of an infinite population, but…
Now, we know that most populations aren’t really infinite. But once they get to a certain size, that part doesn’t matter mathematically.
The overall samples tend to represent a very small fraction of this very, very large population. But don’t let the small sample size fool you.
That’s where the beauty of simple random sampling comes in. Your sample doesn’t have to be that large to adequately represent the population from which it is drawn if the selection was done through simple random sampling.
In fact, many polls of Americans are conducted using simple random samples of telephone numbers, with roughly 1,200 adult respondents in the U.S.
This relatively small sample is enough to represent the full population of approximately 300 million people and to estimate a binary outcome, like “will you vote in the 2014 general election?”, to within about 3 percentage points.
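That “3 percentage points” figure follows from the usual 95% margin-of-error formula for a proportion, z·√(p(1−p)/n), evaluated at the most conservative value p = 0.5. Here is a quick arithmetic check (my own sketch, not from the post):

```python
import math

n = 1200   # sample size cited in the post
p = 0.5    # most conservative assumed proportion
z = 1.96   # z-value for 95% confidence

moe = z * math.sqrt(p * (1 - p) / n)
print(round(moe * 100, 1))  # about 2.8 percentage points
```

Pollsters typically round this up and report it as "plus or minus 3 points."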
In the next post in this series, we’ll talk about the other kind of probability sample: Complex Samples.
by Lucy Fike
We know that using SPSS syntax is an easy way to organize analyses so that you can rerun them in the future without having to go through the menu commands.
Using Python with SPSS makes it much easier to do complicated programming, or even basic programming, that would be difficult to do using SPSS syntax alone. You can use Python scripting to create programs that execute automatically. (more…)
by Maike Rahn, PhD
In previous posts in this series, we discussed factors, factor loadings, and rotations. In this post, I would like to address another important detail for a successful factor analysis: the type of variables that you include in your analysis.
What type of variable?
Ideally, factor analysis is conducted with continuous variables that are normally distributed since factor analysis is based on a correlation matrix.
However, you will undoubtedly find many factor analyses that include ordinal variables, particularly Likert scale items.
While Likert items technically don’t meet the assumptions of factor analysis, in at least some situations the results have been found to be quite reasonable. For example, Lubke & Muthén (2004) found that confirmatory factor analysis worked for a single homogeneous group, as long as the items had at least seven values.
Some researchers include variables with fewer than seven values in their factor analysis. Sometimes this cannot be avoided, for example if you are using an already published scale.
Last, there is an interesting discussion about including binary variables in a factor analysis in the Sage Publications booklet “Factor analysis. Statistical methods and practical issues” (Kim and Mueller, 1978; page 75).
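Since factor analysis starts from the correlation matrix, here is a minimal sketch with NumPy (the data are simulated for illustration, and this is only the first step of a factor analysis, not the whole procedure): compute the item correlation matrix and its eigenvalues, which inform how many factors to retain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 respondents on 4 continuous items driven by one latent factor.
latent = rng.normal(size=200)
items = np.column_stack([latent + rng.normal(scale=0.5, size=200)
                         for _ in range(4)])

corr = np.corrcoef(items, rowvar=False)   # 4 x 4 item correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, sorted descending

# A single dominant eigenvalue suggests a one-factor solution.
print(np.round(eigvals, 2))
```

With one strong latent factor, the first eigenvalue dwarfs the rest, which is exactly the pattern a scree plot would show.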
Correct coding of variables
It is important to prepare your variables in advance. For example, if you anticipate finding a socioeconomic factor, create your ordinal variable occupation with levels from lowest to highest to make sure that you have a positive factor loading with your factor.
| Occupational category | Level of occupation variable |
| --- | --- |
| Nurse's Aide | 1 |
| Administrative assistant | 2 |
| Nurse | 3 |
| Nurse manager | 4 |
| Physician | 5 |
| Department chair | 6 |
| Director | 7 |
The reason for this preparation is that you will wind up with factor solutions that are easily interpretable, because variables that are coded in the same direction as the factor will always have a positive factor loading. On the other hand, variables that have an inverse association with the factor will always have a negative factor loading.
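A hypothetical recoding along these lines might look like the following; the category names and levels mirror the occupation table above, while the mapping and function are my own illustration:

```python
# Map occupational categories to ordered levels, lowest to highest,
# so the variable loads positively on a socioeconomic factor.
OCCUPATION_LEVELS = {
    "Nurse's Aide": 1,
    "Administrative assistant": 2,
    "Nurse": 3,
    "Nurse manager": 4,
    "Physician": 5,
    "Department chair": 6,
    "Director": 7,
}

def recode_occupation(category):
    """Return the ordinal level for an occupational category."""
    return OCCUPATION_LEVELS[category]

print(recode_occupation("Physician"))  # 5
```

If a variable is coded in the opposite direction, reversing the level order before the analysis saves you from interpreting a sign flip later.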
Kim, Jae-On and Mueller, Charles W (1978) Factor analysis. Statistical methods and practical issues. Series: Quantitative Applications in the Social Sciences. Sage Publications: Beverly Hills, CA.
Complex Surveys use a sampling technique other than a simple random sample. Terms you may have heard in this area include cluster sampling, stratified sampling, oversampling, two-stage sampling, and primary sampling unit.
Complex Samples require statistical methods that take the exact sampling design into account to ensure accurate results.
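One standard way those adjustments show up is through the design effect. For a cluster sample, a common approximation is deff = 1 + (m − 1)ρ, where m is the average cluster size and ρ the intracluster correlation; the effective sample size is then n / deff. A rough sketch, with all numbers invented for illustration:

```python
# Approximate design effect for a cluster sample:
# deff = 1 + (m - 1) * rho, where m is the average cluster size
# and rho the intracluster correlation.
def design_effect(m, rho):
    return 1 + (m - 1) * rho

n = 2000      # nominal sample size (invented)
m = 20        # average cluster size (invented)
rho = 0.05    # intracluster correlation (invented)

deff = design_effect(m, rho)
n_eff = n / deff  # effective sample size under the complex design
print(round(deff, 2), round(n_eff))
```

Here clustering nearly doubles the variance, so the 2,000 interviews carry roughly the information of about 1,000 independent ones.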
In this webinar, guest instructor Dr. Trent Buskirk will give you an overview of the common sampling techniques and their effects on data analysis.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
About the Instructor
Trent D. Buskirk, Ph.D. is the Vice President of Statistics and Methodology at Marketing Systems Group.
Dr. Buskirk has more than 15 years of professional and academic experience in survey research and statistics, as well as in SPSS, SAS, and R.
Dr. Buskirk has taught for more than a decade at the University of Nebraska and Saint Louis University where he was an Associate Professor of Biostatistics in the School of Public Health.
Not a Member Yet?
It’s never too early to set yourself up for successful analysis with support and training from expert statisticians.
Just head over and sign up for Statistically Speaking.
You'll get access to this training webinar, 130+ other stats trainings, and a pathway to work through the trainings you need, plus the expert guidance to build statistical skill with live Q&A sessions and an ask-a-mentor forum.
In Part 6, let’s look at basic plotting in R. Try entering the following three commands together (the semi-colon allows you to place several commands on the same line).
x <- seq(-4, 4, 0.2) ; y <- 2*x^2 + 4*x - 7
plot(x, y)
(more…)