The Analysis Factor

A Note from Karen

Featured Article: Is Mean Imputation Really so Terrible?

Resource of the Month

What's New

About Us

Please forward this to anyone you know who might benefit. If you received this from a friend, sign up for this ezine now!

Dear reader,

Hope you're enjoying spring! I find myself having a hard time enjoying it for fear that winter just isn't done yet. A few years ago we got a foot of snow in mid-April. So we're not taking the snow tires off yet.

Meanwhile, we're wrapping up the first workshop. It's been going really well, and I've really enjoyed it. I've been hesitant to announce the next workshops before getting this one underway, to make sure the technology worked out okay. It has been great, though, and we've all been learning a lot. I learn things every time I teach a workshop. Laying out the information in logical steps always clarifies something for me, too. Some questions have also come up that have given me ideas for new programs. One idea that seems especially popular is about the ins and outs of how to run regressions and ANOVAs in SPSS.

Please feel free to drop me a line to let me know if that would benefit you. I was thinking of doing a series of one-time SPSS workshops over the summer. I know that scheduling often becomes an issue as we near graduation and summer, so look for an announcement of a quick survey on scheduling our next programs.

In the meantime, our article this month is about missing data and what to do about it. If you want more information, I've linked to a great article about modern approaches in the Resource of the Month, and missing data is also the topic of this month's teleseminar. I also have quite a few articles about missing data in StatChat, my blog, so if anything is unclear, you can check out the Missing Data section there. I'll also soon be announcing a new Missing Data Teleworkshop, coming up in the next month or two.

Happy analyzing,
Karen

I’m sure I don’t need to explain to you all the problems that result from missing data. Anyone who has dealt with missing data—that means anyone who has ever worked with real data—knows about the loss of power and sample size, and the potential bias in your data that comes with listwise deletion.

Listwise deletion is the default method for dealing with missing data in most statistical software packages. It simply means excluding from the analysis any cases with data missing on any variables involved in the analysis.
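To make that concrete, here is a minimal Python sketch of listwise deletion. The six cases and the variable names (age, income, score) are made up for illustration; no stats package works exactly like this, but the logic is the same: any case with a missing value on any variable in the analysis is dropped.

```python
# Hypothetical toy data: each dict is one case; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "score": 7.1},
    {"age": 29, "income": None, "score": 6.4},
    {"age": 41, "income": 61000, "score": None},
    {"age": 38, "income": 58000, "score": 8.0},
    {"age": None, "income": 47000, "score": 5.9},
    {"age": 45, "income": 70000, "score": 7.7},
]

def listwise_delete(cases, variables):
    """Keep only the cases with no missing value on any listed variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

analysis_vars = ["age", "income", "score"]
complete = listwise_delete(rows, analysis_vars)

# Only three values are missing in the whole table, but they sit in three
# different cases, so half of the sample is excluded from the analysis.
print(f"cases before: {len(rows)}, cases after: {len(complete)}")
```

Notice that only 3 of the 18 values are missing, yet the analysis loses 3 of its 6 cases. That is the loss of sample size and power described above.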

A very simple, and in many ways appealing, method devised to overcome these problems is mean imputation. Once again, I'm sure you've heard of it--just plug in the mean for that variable for all the missing values. The nice part is the mean isn't affected, and you don't lose that case from the analysis. And it's so easy! SPSS even has a little button to click to just impute all those means.

But there are new problems. True, the mean doesn't change, but the relationships with other variables do. And that's usually what you're interested in, right? Well, now they're biased. And while the sample size remains at its full value, the standard error of that variable will be vastly underestimated--and this underestimation gets bigger the more missing data there are. Too-small standard errors lead to too-small p-values, so now you're reporting results that should not be there.
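If you'd like to see these effects for yourself, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration: the sample size, the 30% missing rate, the simulated relationship between x and y, and the helper names pearson and se_mean. It removes values of y completely at random, imputes the observed mean, and compares the results.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulate 200 cases in which y depends on x (made-up numbers).
n = 200
x = [random.gauss(50, 10) for _ in range(n)]
y = [0.8 * xi + random.gauss(0, 6) for xi in x]

# Knock out roughly 30% of the y values completely at random.
missing = [random.random() < 0.30 for _ in range(n)]
x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]

# Mean imputation: every missing y is replaced by the observed mean of y.
y_mean = statistics.mean(y_obs)
y_imp = [y_mean if m else yi for yi, m in zip(y, missing)]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def se_mean(values):
    """Naive standard error of the mean: SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

print("mean of y   observed:", round(y_mean, 3),
      " imputed:", round(statistics.mean(y_imp), 3))
print("SD of y     observed:", round(statistics.stdev(y_obs), 3),
      " imputed:", round(statistics.stdev(y_imp), 3))
print("SE of mean  observed:", round(se_mean(y_obs), 3),
      " imputed:", round(se_mean(y_imp), 3))
print("corr(x, y)  observed:", round(pearson(x_obs, y_obs), 3),
      " imputed:", round(pearson(x, y_imp), 3))
```

The mean is identical before and after imputation, but the standard deviation and standard error shrink, and the correlation with x is pulled toward zero. The imputed cases add sample size without adding any real information about y.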

There are other options. Multiple Imputation and Maximum Likelihood both solve these problems. But Multiple Imputation is not available in all the major stats packages, and it is very labor-intensive to do well. And Maximum Likelihood isn't hard or labor-intensive, but it requires using structural equation modeling software, such as AMOS or MPlus.

The good news is there are other imputation techniques that are still quite simple, and don't cause bias in some situations. And sometimes (although rarely) it really is okay to use mean imputation. When?

If your rate of missing data is very, very small, it honestly doesn't matter what technique you use. I'm talking very, very, very small (2-3%).

There is another, better method for imputing single values, however, that is only slightly more difficult than mean imputation. It uses the EM algorithm, which stands for Expectation-Maximization. It is an iterative procedure that uses other variables to impute a value (Expectation), then checks whether that is the most likely value (Maximization). If not, it re-imputes a more likely value. This goes on until it reaches the most likely value.
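For the curious, here is a stripped-down Python sketch of the idea for the simplest possible case: two variables, where x is fully observed and y has missing values. It is not what any particular package does. The data are simulated, the two variables are assumed to be bivariate normal, and all names are made up. In this textbook form, the loop alternates between filling in the expected contribution of each missing y (Expectation) and re-estimating the mean, variance, and covariance from the filled-in totals (Maximization). The imputed values are read off at the end.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Simulated data: x is complete, about 30% of y is missing (None).
n = 300
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 1.5 * xi + random.gauss(0, 1) for xi in x]
y = [None if random.random() < 0.30 else yi for yi in y]

# x has no missing values, so its mean and variance are estimated once.
mu_x = sum(x) / n
var_x = sum((xi - mu_x) ** 2 for xi in x) / n

# Starting values for the y parameters, taken from the complete cases.
obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
m = len(obs)
obs_mean_x = sum(xi for xi, _ in obs) / m
mu_y = sum(yi for _, yi in obs) / m
var_y = sum((yi - mu_y) ** 2 for _, yi in obs) / m
cov_xy = sum((xi - obs_mean_x) * (yi - mu_y) for xi, yi in obs) / m

for iteration in range(500):
    beta = cov_xy / var_x              # current slope of y on x
    resid_var = var_y - beta * cov_xy  # current variance of y given x

    # Expectation: expected y and y squared for each case, given x.
    sum_y = sum_yy = sum_xy = 0.0
    for xi, yi in zip(x, y):
        if yi is None:
            ey = mu_y + beta * (xi - mu_x)
            eyy = ey * ey + resid_var
        else:
            ey, eyy = yi, yi * yi
        sum_y += ey
        sum_yy += eyy
        sum_xy += xi * ey

    # Maximization: re-estimate the parameters from the expected totals.
    new_mu_y = sum_y / n
    new_var_y = sum_yy / n - new_mu_y ** 2
    new_cov = sum_xy / n - mu_x * new_mu_y

    change = max(abs(new_mu_y - mu_y), abs(new_var_y - var_y),
                 abs(new_cov - cov_xy))
    mu_y, var_y, cov_xy = new_mu_y, new_var_y, new_cov
    if change < 1e-10:                 # stop once the estimates settle
        break

# Impute each missing y with its expected value given x.
beta = cov_xy / var_x
y_filled = [mu_y + beta * (xi - mu_x) if yi is None else yi
            for xi, yi in zip(x, y)]
print("iterations used:", iteration + 1)
print("estimated mean of y:", round(mu_y, 3), " slope on x:", round(beta, 3))
```

Because each imputed value depends on that case's x, the filled-in data keep the relationship between x and y instead of flattening it the way a single mean does.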

EM imputations are better than mean imputations because they preserve the relationship with other variables, which is vital if you go on to use something like Factor Analysis or Regression. They still underestimate standard errors, however, so once again, this approach is only reasonable if the percentage of missing data is very small (under 5%) and if the standard error of individual items is not vital (as when combining individual items into an index).

EM Imputations are available in all the major software packages.

The heavy hitters like Multiple Imputation and Maximum Likelihood are still superior methods of dealing with missing data and are in most situations the only viable approach. But you need to fit the right tool to the size of the problem. It may be true that backhoes are better at digging holes than trowels, but trowels are just right for digging small holes. It's better to use a small tool like EM when it fits than to ignore the problem altogether.

Schafer, J.L. & Graham, J.W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177.

This is a very well-written overview of the new approaches to dealing with missing data. Joe Schafer is one of the top statisticians doing research on missing data techniques, and John Graham runs the statistical consulting center at Penn State. Together they explain these new techniques in understandable ways.

1. Free Teleseminar on April 29th: Approaches to Missing Data: The Good, the Bad, and the Unthinkable

You’ve probably heard about many different approaches to dealing with missing data, and you’ve probably gotten different opinions about which one you should use. In this teleseminar, you’ll get an overview of:

the three types of missing data, and how they affect the approach to take
the common approach that is generally worse than any other
the easy, common, seemingly bad approach that often isn’t so bad, and the situations when it doesn’t work
the two approaches that give unbiased results: one that is very easy to implement but only works in limited situations, and one that is harder to implement well but works with any statistical analysis

For more information and to register: http://www.analysisfactor.com/learning/teletraining6.html

What is The Analysis Factor? The Analysis Factor is the difference between knowing about statistics and knowing how to use statistics. It acknowledges that statistical analysis is an applied skill. It requires learning how to use statistical tools within the context of a researcher’s own data, and supports that learning.

The Analysis Factor, the organization, offers statistical consulting, projects, resources, and learning programs that empower social science researchers to become confident, able, and skilled statistical practitioners. Our aim is to make your journey acquiring the applied skills of statistical analysis easier and more pleasant.

Karen Grace-Martin, the founder, spent seven years as a statistical consultant at Cornell University. While there, she learned that being a great statistical advisor is not only about having excellent statistical skills, but about understanding the pressures and issues researchers face, about fabulous customer service, and about communicating technical ideas at a level each client understands.

You can learn more about Karen Grace-Martin and The Analysis Factor at analysisfactor.com.

Please forward this newsletter to colleagues who you think would find it useful. Your recommendation is how we grow.
