Maybe I’ve noticed it more because I’m getting ready for next week’s SPSS GLM workshop. Just this week, I’ve had a number of experiences with people’s struggles with SPSS, and GLM in particular.
Number 1: I read this in a technical report by Patrick Burns comparing SPSS to R:
“SPSS is notorious for its attitude of ‘You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.’”
And while I still prefer SPSS, I had to laugh because the anonymous person Burns (more…)
I know you know it–those assumptions in your regression or ANOVA model really are important. If they’re not met adequately, all your p-values are inaccurate, wrong, useless.
But, and this is a big one, linear models are robust to departures from those assumptions. Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.
You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy. Right?
I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?
No, they really don’t. (I promise!) And learning how far you can push those robust assumptions isn’t so hard, with some training and a little practice. Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:
1. They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met. And it’s so nice having a p-value, right? Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.
The only problem is that many of these tests ignore that robustness. They find that every distribution is non-normal and heteroskedastic. They’re good tools, but to these hammers, every data set looks like a nail. Use the hammer when needed, but don’t hammer everything.
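The over-testing problem in point 1 is easy to demonstrate with a quick simulation. This is a minimal sketch in Python (the post doesn’t name any particular software or test; Shapiro-Wilk via scipy is my choice here): with a large sample, a formal normality test flags a distribution that is only mildly heavy-tailed, one a linear model would typically handle just fine.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A large sample from a t-distribution with 5 df: visually close to
# normal, and linear models are generally robust to this departure.
x = rng.standard_t(df=5, size=5000)

# The Shapiro-Wilk test nonetheless rejects normality decisively,
# because with n = 5000 it has power to detect even trivial departures.
stat, p = stats.shapiro(x)
print(f"Shapiro-Wilk p-value: {p:.2e}")
```

The point is not that the test is wrong; it is answering a different question (“is this exactly normal?”) than the one that matters (“is this close enough to normal for my p-values to be trustworthy?”).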
2. They assume everything is robust anyway, so they don’t test anything. It’s easy to do. And once again, it probably works out much of the time. Except when it doesn’t.
Yes, the GLM is robust to deviations from some of the assumptions. But not all the way, and not all the assumptions. You do have to check them.
3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.
This is partially because many of these “assumptions” need to be checked, but they’re not really model assumptions, they’re data issues. And it’s also partially because sometimes the assumptions have been taken to their logical conclusions. That textbook author is trying to make it more logical for you. But sometimes that just leads you to testing the related, but wrong thing. It works out most of the time, but not always.
In many research fields, a common practice is to categorize continuous predictor variables so they work in an ANOVA. This is often done with median splits. This is a way of splitting the sample into two categories: the “high” values above the median and the “low” values below the median.
Reasons Not to Categorize a Continuous Predictor
There are many reasons why this isn’t such a good idea: (more…)
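One of those reasons, the information thrown away, is easy to see in a small simulation. This is a hypothetical sketch (the variable names and effect sizes are my own, not from the post): dichotomizing a continuous predictor at its median noticeably weakens its observed relationship with the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous predictor x and outcome y with a linear relationship
n = 10_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Median split: "high" group above the median, "low" group below
high = (x > np.median(x)).astype(float)

r_continuous = np.corrcoef(x, y)[0, 1]
r_split = np.corrcoef(high, y)[0, 1]
print(f"correlation with continuous x: {r_continuous:.3f}")
print(f"correlation after median split: {r_split:.3f}")
```

Theory predicts the dichotomized predictor retains only about 80% of the original correlation; in variance-explained terms, that’s a loss of roughly a third.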
A new version of Amelia II, a free package for multiple imputation, was released today. Amelia II is available in two versions. One is an R package, and the other, AmeliaView, is a GUI that does not require any knowledge of the R programming language. They both use the same underlying algorithms and both require having R installed.
At the Amelia II website, you can download Amelia II (did I mention it’s free?!), download R, get the very useful User’s Guide, join the Amelia listserv, and get information about multiple imputation.
If you want to learn more about multiple imputation:
I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.
But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily. Essentially, this means you need to be (more…)
There are two ways to run a repeated measures analysis. The traditional way is to treat it as a multivariate test: each response is considered a separate variable. The other way is to run it as a mixed model. While the multivariate approach is easy to run and quite intuitive, there are a number of advantages to running a repeated measures analysis as a mixed model.
First I will explain the difference between the approaches, then briefly describe some of the advantages of using the mixed models approach. (more…)