In choosing an approach to missing data, there are a number of things to consider. But you need to keep in mind what you’re aiming for before you can even consider which approach to take.
There are three criteria we’re aiming for with any missing data technique:
1. Unbiased parameter estimates: Whether you’re estimating means, regressions, or odds ratios, you want your parameter estimates to be accurate representations of the actual population parameters. In statistical terms, that means the estimates should be unbiased. If all the (more…)
There are two ways to run a repeated measures analysis. The traditional way is to treat it as a multivariate test, in which each response is considered a separate variable. The other way is to run it as a mixed model. While the multivariate approach is easy to run and quite intuitive, there are a number of advantages to running a repeated measures analysis as a mixed model.
First I will explain the difference between the approaches, then briefly describe some of the advantages of using the mixed models approach. (more…)
I’m sure I don’t need to explain to you all the problems that occur as a result of missing data. Anyone who has dealt with missing data—that means everyone who has ever worked with real data—knows about the loss of power and sample size, and the potential bias in your data that comes with listwise deletion.
Listwise deletion is the default method for dealing with missing data in most statistical software packages. It simply means excluding from the analysis any cases with data missing on any variables involved in the analysis.
A very simple, and in many ways appealing, method devised to (more…)
There are many ways to approach missing data. The most common, I believe, is to ignore it. But making no choice means that your statistical software is choosing for you.
Most of the time, your software is choosing listwise deletion. Listwise deletion may or may not be a bad choice, depending on why the data are missing and how much is missing.
Another common approach among those who are paying attention is imputation. Imputation simply means replacing the missing values with an estimate, then analyzing the full data set as if the imputed values were actual observed values.
How do you choose that estimate? The following are common methods: (more…)
Two excellent resources about multiple imputation and missing data:
Joe Schafer’s Multiple Imputation FAQ Page gives more detail about multiple imputation, including a list of references.
Paul Allison’s 2001 book Missing Data is the most readable book on the topic. It gives in-depth information on many good approaches to missing data, including multiple imputation. It is aimed at social science researchers, and best of all, it is very affordable (about $15).
SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.
It allows you to add or average variables, while specifying how many are allowed to be missing.
For example, a very common situation is that a researcher needs to average the values of the 5 variables on a scale, each of which is measured on the same Likert scale.
There are two ways to do this in SPSS syntax:
COMPUTE Newvar = (X1 + X2 + X3 + X4 + X5)/5.
or
COMPUTE Newvar = MEAN(X1, X2, X3, X4, X5).
In the first method, if any of the variables is missing, Newvar will also be missing, because an SPSS arithmetic expression returns a missing value whenever one of its inputs is missing. The effect is the same as listwise deletion.
In the second method, if any of the variables is missing, SPSS will still calculate the mean of the values that are observed. While this seems great at first, the researcher may wish to set a minimum number of the 5 variables that must be observed before the mean is calculated. If only one or two variables are present, their mean may not be a reasonable estimate of the mean of all 5 variables.
SPSS has an option for dealing with this situation. Running it the following way will calculate the mean only if at least 4 of the 5 variables are observed. If fewer than 4 of the variables are observed, Newvar will be system missing.
COMPUTE Newvar = MEAN.4(X1, X2, X3, X4, X5).
The number after the period is the minimum number of variables that must be observed, and you can set it to any value up to the number of variables listed.
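For readers who want to check the logic outside SPSS, here is a minimal sketch in plain Python of the three behaviors described above. It is an illustration, not SPSS code: the function names arithmetic_mean and mean_n are made up for this example, None stands in for a system-missing value, and the data rows are invented.

```python
from statistics import fmean

def arithmetic_mean(values):
    """Mimics (X1 + ... + X5)/5: any missing input makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def mean_n(values, min_valid=1):
    """Mimics MEAN.n: mean of the observed values, or None (missing)
    unless at least min_valid values are observed."""
    observed = [v for v in values if v is not None]
    if len(observed) < max(min_valid, 1):
        return None
    return fmean(observed)

# Invented example rows; None marks a missing response.
rows = [
    [4, 5, 3, 4, 4],              # all five items observed
    [4, 5, 3, 4, None],           # one item missing
    [4, None, None, None, None],  # only one item observed
]

for row in rows:
    # Columns: arithmetic version, MEAN-style, MEAN.4-style
    print(arithmetic_mean(row), mean_n(row), mean_n(row, min_valid=4))
```

The second row shows the case where the minimum-count version helps (a mean is still returned with one item missing), and the third row shows where it protects you (a single observed item no longer produces a scale score).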
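(The same distinction between requiring every variable and requiring a minimum number holds for the SUM function in SPSS, but the scale of a sum changes depending on how many variables are observed, since a sum of 4 items cannot reach as high as a sum of 5. A better approach is to calculate the mean, then multiply by 5.)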
These functions work the same way whether you type them in syntax or enter them in the Transform > Compute Variable menu dialog.
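To see why the note on SUM recommends multiplying the mean by 5, here is a small sketch in plain Python. It is an illustration only, not SPSS code: sum_n and mean_times_k are hypothetical helper names, None stands in for a missing value, and the example responses are invented.

```python
def observed_values(values):
    """Return only the non-missing entries (None marks a missing value)."""
    return [v for v in values if v is not None]

def sum_n(values, min_valid=1):
    """Mimics SUM.n: sum of the observed values, or None (missing)
    unless at least min_valid values are observed."""
    observed = observed_values(values)
    if len(observed) < max(min_valid, 1):
        return None
    return sum(observed)

def mean_times_k(values, min_valid=1):
    """Mean of the observed values times the total number of items,
    which keeps the result on the full-scale range."""
    observed = observed_values(values)
    if len(observed) < max(min_valid, 1):
        return None
    return len(values) * sum(observed) / len(observed)

complete = [4, 5, 3, 4, 4]        # all five items observed
one_missing = [4, 5, 3, 4, None]  # same answers, last item missing

print(sum_n(complete, 4), mean_times_k(complete, 4))        # 20 20.0
print(sum_n(one_missing, 4), mean_times_k(one_missing, 4))  # 16 20.0
```

With one item missing, the plain sum drops from 20 to 16 even though the observed answers are identical, while the rescaled mean stays at 20 and remains comparable with scores from complete cases.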
First Published 12/1/2016;
Updated 7/20/21 to give more detail.