Some approaches to missing data work well in some situations but perform very poorly in others. So it's important to get a clear picture of the type and pattern of missingness in your data. You may even take different missing data approaches for different variables.
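One quick way to see the pattern of missingness is to code each case as a string of 1s (observed) and 0s (missing), count how often each pattern occurs, and report the percentage missing for each variable. Here is a minimal Python sketch. It assumes the data are a list of dictionaries with None marking a missing value; the function names and the toy data are hypothetical.

```python
from collections import Counter


def missingness_patterns(rows, variables):
    """Count how many cases share each pattern of observed (1) / missing (0) values."""
    patterns = Counter()
    for row in rows:
        # Build a pattern string such as "101": 1 = observed, 0 = missing (None).
        pattern = "".join("0" if row.get(v) is None else "1" for v in variables)
        patterns[pattern] += 1
    return patterns


def percent_missing(rows, variables):
    """Percentage of cases with a missing value on each variable."""
    n = len(rows)
    return {v: 100.0 * sum(row.get(v) is None for row in rows) / n for v in variables}


# Hypothetical toy data set; None marks a missing value.
data = [
    {"age": 34, "income": 52000, "score": 7},
    {"age": 41, "income": None, "score": 5},
    {"age": None, "income": None, "score": 6},
    {"age": 29, "income": 48000, "score": None},
    {"age": 50, "income": None, "score": 8},
]
variables = ["age", "income", "score"]

for pattern, count in missingness_patterns(data, variables).most_common():
    print(pattern, count)
print(percent_missing(data, variables))
```

Because cases with identical patterns collapse into a single line, this table stays short even when the number of cases is large.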
Matt Blackwell of the Harvard Social Science Statistics blog has come up with a nice way to visualize the missingness patterns in a data set. (I'm a big fan of graphing data to understand it.) He calls it a Missingness Map.
The only drawback seems to be that it will be cumbersome for large data sets.
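The map itself is a graphic, but the basic idea can be imitated in plain text: one line per case, one column per variable, with missing cells marked. This rough Python sketch is for illustration only and is not Blackwell's implementation; `missingness_map` and the toy data are hypothetical.

```python
def missingness_map(rows, variables, missing_mark="X", observed_mark="."):
    """Plain-text missingness map: one line per case, one column per variable."""
    width = max(len(v) for v in variables)
    # Header line with the variable names, right-aligned to a common width.
    lines = [" ".join(v.rjust(width) for v in variables)]
    for row in rows:
        # Mark each cell as missing (None) or observed.
        cells = [missing_mark if row.get(v) is None else observed_mark for v in variables]
        lines.append(" ".join(c.rjust(width) for c in cells))
    return "\n".join(lines)


# Hypothetical toy data; None marks a missing value.
data = [
    {"age": 34, "income": 52000, "score": 7},
    {"age": 41, "income": None, "score": 5},
    {"age": None, "income": None, "score": 6},
]
print(missingness_map(data, ["age", "income", "score"]))
```

Since every case gets its own line, the sketch also shows why this kind of display gets unwieldy as the number of cases grows.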
In choosing an approach to missing data, there are a number of things to consider. But you need to keep in mind what you’re aiming for before you can even consider which approach to take.
There are three criteria we’re aiming for with any missing data technique:
1. Unbiased parameter estimates: Whether you're estimating means, regressions, or odds ratios, you want your parameter estimates to be accurate representations of the actual population parameters. In statistical terms, that means the estimates should be unbiased. If all the …
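To see what a biased estimate looks like in practice, here is a small Python simulation. It assumes a hypothetical mechanism in which the chance of a value being missing depends on the value itself (high values are usually missing), then compares the mean of all generated values with the mean of only the observed ones. The numbers (mean 50, SD 10, cutoff 55, 80% missing above the cutoff) are arbitrary choices for illustration.

```python
import random
import statistics


def complete_case_bias(n=10_000, seed=1):
    """Return (mean of all values, mean of the observed values only)."""
    rng = random.Random(seed)
    values = [rng.gauss(50, 10) for _ in range(n)]
    # Hypothetical mechanism: values above 55 are missing 80% of the time.
    observed = [x for x in values if not (x > 55 and rng.random() < 0.8)]
    return statistics.mean(values), statistics.mean(observed)


full_mean, cc_mean = complete_case_bias()
print(f"mean of all values:     {full_mean:.2f}")
print(f"mean of observed cases: {cc_mean:.2f}")
```

Under this mechanism the observed-case mean lands several points below the full-data mean, and collecting more cases does not fix it: the estimate is biased, not just noisy.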
There are two ways to run a repeated measures analysis. The traditional way is to treat it as a multivariate test, in which each response is considered a separate variable. The other way is to treat it as a mixed model. While the multivariate approach is easy to run and quite intuitive, there are a number of advantages to running a repeated measures analysis as a mixed model.
First I will explain the difference between the approaches, then briefly describe some of the advantages of using the mixed models approach.
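Much of the practical difference shows up in how the data are laid out. The multivariate approach uses "wide" data, with one row per subject and one column per occasion. A mixed model uses "long" data, with one row per measurement. A commonly cited advantage of the long layout is that a subject missing one occasion still contributes the occasions that were observed, instead of being dropped entirely. Here is a minimal Python sketch of the reshaping; `wide_to_long`, the column names, and the data are hypothetical.

```python
def wide_to_long(rows, id_var, time_vars):
    """Reshape one-row-per-subject data into one-row-per-measurement data."""
    long_rows = []
    for row in rows:
        for t, var in enumerate(time_vars, start=1):
            value = row.get(var)
            # A missing occasion drops only that measurement, not the subject.
            if value is not None:
                long_rows.append({id_var: row[id_var], "time": t, "response": value})
    return long_rows


# Hypothetical wide data: subject 2 is missing the second occasion.
wide = [
    {"subject": 1, "t1": 10.0, "t2": 12.5, "t3": 13.0},
    {"subject": 2, "t1": 9.0, "t2": None, "t3": 11.0},
]
long_rows = wide_to_long(wide, "subject", ["t1", "t2", "t3"])
for r in long_rows:
    print(r)
```

In the wide layout, listwise deletion would discard subject 2 completely; in the long layout, subject 2 keeps two of its three measurements.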
I'm sure I don't need to explain to you all the problems that occur as a result of missing data. Anyone who has dealt with missing data (that means everyone who has ever worked with real data) knows about the loss of power and sample size, and the potential bias in your estimates, that come with listwise deletion.
Listwise deletion is the default method for dealing with missing data in most statistical software packages. It simply means excluding from the analysis any cases with data missing on any variables involved in the analysis.
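As a minimal Python sketch of what listwise deletion does, assuming cases are dictionaries with None for missing values (`listwise_delete` and the data are hypothetical), note that only the variables in the analysis matter. A missing value on an unused variable does not remove the case.

```python
def listwise_delete(rows, analysis_vars):
    """Keep only cases with no missing value (None) on any analysis variable."""
    return [row for row in rows if all(row.get(v) is not None for v in analysis_vars)]


# Hypothetical data; "notes" is not part of the analysis.
data = [
    {"age": 34, "income": 52000, "score": 7, "notes": None},
    {"age": 41, "income": None, "score": 5, "notes": "x"},
    {"age": 29, "income": 48000, "score": None, "notes": None},
    {"age": 50, "income": 61000, "score": 8, "notes": None},
]
complete = listwise_delete(data, ["age", "income", "score"])
print(len(data), "cases in,", len(complete), "cases analyzed")
```

Half of the cases are lost here, even though each dropped case is missing only one value out of three.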
A very simple, and in many ways appealing, method devised to …
There are many ways to approach missing data. The most common, I believe, is to ignore it. But making no choice means that your statistical software is choosing for you.
Most of the time, your software is choosing listwise deletion. Listwise deletion may or may not be a bad choice, depending on why and how much data are missing.
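How much data listwise deletion throws away depends on how many variables are in the model, not just on how much each one is missing. Under the simplifying assumption that each variable is missing independently with the same probability p, the expected share of complete cases is (1 - p) raised to the number of variables. A quick Python check (the function name is hypothetical):

```python
def expected_complete_fraction(p_missing, n_vars):
    """Expected share of complete cases if each variable is independently
    missing with probability p_missing (a simplifying assumption)."""
    return (1 - p_missing) ** n_vars


for k in (1, 5, 10, 20):
    print(k, "variables:", round(expected_complete_fraction(0.10, k), 3))
```

With only 10% missing on each of 10 variables, roughly 35% of cases would remain; with 20 variables, about 12%.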
Another common approach among those who are paying attention is imputation. Imputation simply means replacing the missing values with an estimate, then analyzing the full data set as if the imputed values were actual observed values.
How do you choose that estimate? The following are common methods: …
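One of the simplest ways to choose the estimate is to use the mean of the observed values of that variable. Here is a minimal Python sketch of this mean imputation (`mean_impute` and the data are hypothetical). It also shows one known drawback: because every imputed value sits exactly at the center, the imputed variable looks less variable than the observed data, and analyzing it as if it were fully observed overstates precision.

```python
import statistics


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


scores = [4.0, None, 6.0, 8.0, None, 2.0]
imputed = mean_impute(scores)
print(imputed)  # [4.0, 5.0, 6.0, 8.0, 5.0, 2.0]

# Standard deviation before and after imputation: the spread shrinks.
print(statistics.stdev([v for v in scores if v is not None]))
print(statistics.stdev(imputed))
```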
Two excellent resources about multiple imputation and missing data:
Joe Schafer’s Multiple Imputation FAQ Page gives more detail about multiple imputation, including a list of references.
Paul Allison’s 2001 book Missing Data is the most readable book on the topic. It gives in-depth information on many good approaches to missing data, including multiple imputation. It is aimed at social science researchers, and best of all, it is very affordable (about $15).