Need to dummy code in a Cox regression model?
Interpret interactions in a logistic regression?
Add a quadratic term to a multilevel model?
This is where statistical analysis starts to feel really hard. You’re combining two difficult issues into one.
You’re dealing with both a complicated modeling technique at Stage 3 (survival analysis, logistic regression, multilevel modeling) and tricky effects in the model (dummy coding, interactions, and quadratic terms).
The only way to figure it all out in a situation like that is to break it down into parts. (more…)
In my last blog post, I wrote about a mistake I once made when I didn’t realize the defaults for dummy coding were different in two SPSS procedures (Binary Logistic and GEE).
Ironically, about the same time I wrote it, I was having a conversation with Ann Maria de Mars on Twitter. She was trying to figure out why her logistic regression model fit results were identical in SAS Proc Logistic and SPSS Binary Logistic, but the coefficients in SAS were half those of SPSS.
It was ironic because I, of course, didn’t recognize it as the same issue and wasn’t much help.
But Ann Maria investigated and discovered that it came down to differences in the defaults for coding categorical predictors in SAS and SPSS. Her detailed and humorous explanation is here.
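To see how a coding default alone can halve a coefficient without touching model fit, here is a minimal sketch. It assumes the difference is effect coding (-1/+1) versus indicator coding (0/1) for a two-category predictor. The group proportions are made-up numbers for illustration, not anyone's real data. With a single binary predictor, the maximum likelihood fit reproduces the observed group proportions, so the coefficients can be written down directly.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Probability from log-odds."""
    return 1 / (1 + math.exp(-z))

# Hypothetical proportions with the outcome in each of two groups.
p_group0 = 0.20
p_group1 = 0.50

# Indicator (dummy) coding: x = 0 for the reference group, 1 for the other.
b0_dummy = logit(p_group0)
b1_dummy = logit(p_group1) - logit(p_group0)

# Effect coding: x = -1 for one group, +1 for the other.
b0_effect = (logit(p_group1) + logit(p_group0)) / 2
b1_effect = (logit(p_group1) - logit(p_group0)) / 2

# Predicted probabilities for each group under each coding.
fitted_dummy = [inv_logit(b0_dummy + b1_dummy * x) for x in (0, 1)]
fitted_effect = [inv_logit(b0_effect + b1_effect * x) for x in (-1, 1)]

print("dummy-coded slope: ", round(b1_dummy, 4))
print("effect-coded slope:", round(b1_effect, 4))
print("same predictions:  ",
      all(math.isclose(a, b) for a, b in zip(fitted_dummy, fitted_effect)))
```

The two codings give the same predicted probabilities, so every fit statistic matches. The slope differs only because a one-unit change in x means something different: the codes are two units apart under effect coding and one unit apart under indicator coding.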
Some takeaways for you, the researcher and data analyst:
1. Give yourself a break if you hit a snag. Even very experienced data analysts, statisticians who understand what they’re doing, get stumped sometimes. Don’t ever think that performing data analysis is an IQ test. You’re bringing together many skills and complex tools.
2. Learn thy software. In my last post, I phrased it “Know thy software”, but this is where you get to know it. Snags are good opportunities to investigate the details of your software, just like Ann Maria did. If you can think of it as a challenge to figure out, a puzzle, it can actually be fun.
Make friends with your syntax manuals.
3. Get help when you need it. Statistical software packages *are* complex tools. You don’t have to know everything to use them.
Ask colleagues. Call customer support. Call a stat consultant. That’s what they’re there for.
4. A great way to check your work is to run your test two different ways. It’s another reason to be able to use at least two stat software packages. I’m not suggesting you have to run every analysis twice. But when a result looks strange, or you want to double-check a specific important model, this can be a good strategy for testing things out.
It may be that your results aren’t telling you what you think they are.
Can I use SPSS MIXED models for (a) ordinal logistic regression, and (b) multinomial logistic regression?
Every once in a while I get emailed a question that I think others will find helpful. This is definitely one of them.
My answer:
No.
(And by the way, this is all true in SAS as well. I’ll include the SAS versions in parentheses.) (more…)
Missing data, and multiple imputation specifically, is one area of statistics that is changing rapidly. Research is still ongoing, and each year new findings on best practices and new techniques in software appear.
The downside for researchers is that some of the recommendations missing data statisticians were making even five years ago have changed.
Remember that there are three goals of multiple imputation, or any missing data technique: Unbiased parameter estimates in the final analysis (more…)
Here’s a little tip.
When you construct dummy variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results.
Make the codes inherent in the dummy variable name.
So instead of a variable named Gender with values of 1=Female and 0=Male, call the variable Female.
Instead of a set of dummy variables named MaritalStatus1 with values of 1=Married and 0=Single, along with MaritalStatus2 with values 1=Divorced and 0=Single, name the same variables Married and Divorced.
And if you’re new to dummy coding, this has the extra bonus of making the dummy coding intuitive. It’s just a set of yes/no variables about all but one of your categories.
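Here is a minimal sketch of that naming tip in Python. The records, the column names, and the `add_named_dummies` helper are all made up for illustration. The idea is just that each dummy variable takes the name of the category it flags, with one reference category left out.

```python
# Hypothetical records with two categorical variables.
records = [
    {"gender": "Female", "marital_status": "Married"},
    {"gender": "Male", "marital_status": "Single"},
    {"gender": "Female", "marital_status": "Divorced"},
]

def add_named_dummies(rows, column, reference):
    """Add one 0/1 variable per non-reference category, named after that category."""
    categories = sorted({row[column] for row in rows} - {reference})
    coded_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the original records are left untouched
        for category in categories:
            new_row[category] = 1 if row[column] == category else 0
        coded_rows.append(new_row)
    return coded_rows

# Male and Single are the reference categories, so they get no variable.
coded = add_named_dummies(records, "gender", reference="Male")
coded = add_named_dummies(coded, "marital_status", reference="Single")

for row in coded:
    print(row["Female"], row["Married"], row["Divorced"])
```

Anyone reading the output knows that Female = 1 means female and Married = 1 means married, with no codebook needed. A row of all zeros on Married and Divorced is the reference category, Single.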
Someone who registered for my upcoming Interpreting (Even Tricky) Regression Models workshop asked if the content applies to logistic regression as well.
The short answer: Yes.
The long-winded detailed explanation of why this is true and the one caveat:
One of the greatest things about regression models is that they all have the same set up: (more…)