In my last blog post, I wrote about a mistake I once made when I didn’t realize the defaults for dummy coding were different in two SPSS procedures (Binary Logistic and GEE).
Ironically, about the same time I wrote it, I was having a conversation with Ann Maria de Mars on Twitter. She was trying to figure out why her logistic regression model fit results were identical in SAS Proc Logistic and SPSS Binary Logistic, but the coefficients in SAS were half those of SPSS.
It was ironic because I, of course, didn’t recognize it as the same issue and wasn’t much help.
But Ann Maria investigated and discovered that it came down to differences in the defaults for coding categorical predictors in SAS and SPSS. Her detailed and humorous explanation is here.
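If you'd like to see how a coding default alone can halve a coefficient without changing model fit, here is a minimal sketch. It assumes the two codings in play are indicator (0/1) coding and effect (-1/+1) coding, and it uses made-up counts and a hypothetical helper named `fit_two_group_logit`. With a single two-level predictor the model is saturated, so the estimates can be written in closed form and no statistics package is needed.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def fit_two_group_logit(counts, codes):
    """Closed-form ML fit of logit(p) = b0 + b1 * x for one two-level predictor.

    counts: {level: (events, total)}; codes: {level: numeric code used for x}.
    Two groups and two parameters make the model saturated, so the
    estimates reproduce each group's observed log-odds exactly.
    """
    first, second = list(counts)
    logit_first = logit(counts[first][0] / counts[first][1])
    logit_second = logit(counts[second][0] / counts[second][1])
    b1 = (logit_second - logit_first) / (codes[second] - codes[first])
    b0 = logit_first - b1 * codes[first]
    return b0, b1

def predict(b0, b1, x):
    """Fitted probability for a given code x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical data (events, total); not taken from the original discussion.
counts = {"control": (20, 100), "treated": (40, 100)}

indicator_codes = {"control": 0, "treated": 1}   # assumed 0/1 dummy coding
effect_codes = {"control": -1, "treated": 1}     # assumed -1/+1 effect coding

d0, d1 = fit_two_group_logit(counts, indicator_codes)
e0, e1 = fit_two_group_logit(counts, effect_codes)

print("indicator coding slope:", round(d1, 4))
print("effect coding slope:   ", round(e1, 4))
for level in counts:
    p_indicator = predict(d0, d1, indicator_codes[level])
    p_effect = predict(e0, e1, effect_codes[level])
    print(level, round(p_indicator, 4), round(p_effect, 4))
```

The fitted probabilities match for every group, so any fit statistic built from them matches too. Only the slope differs: the two effect codes are two units apart instead of one, so the same log-odds difference is spread over twice the distance.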
Some takeaways for you, the researcher and data analyst:
1. Give yourself a break if you hit a snag. Even very experienced data analysts, statisticians who understand what they’re doing, get stumped sometimes. Don’t ever think that performing data analysis is an IQ test. You’re bringing together many skills and complex tools.
2. Learn thy software. In my last post, I phrased it “Know thy software”, but this is where you get to know it. Snags are good opportunities to investigate the details of your software, just like Ann Maria did. If you can think of it as a challenge to figure out–a puzzle–it can actually be fun.
Make friends with your syntax manuals.
3. Get help when you need it. Statistical software packages *are* complex tools. You don’t have to know everything to use them.
Ask colleagues. Call customer support. Call a stat consultant. That’s what they’re there for.
4. A great way to check your work is to run your test two different ways. It’s another reason to be able to use at least two stat software packages. I’m not suggesting you have to run every analysis twice. But when a result looks strange, or you want to double-check a specific important model, this can be a good strategy for testing things out.
It may be that your results aren’t telling you what you think they are.
Can I use SPSS MIXED models for (a) ordinal logistic regression, and (b) multinomial logistic regression?
Every once in a while I get emailed a question that I think others will find helpful. This is definitely one of them.
My answer:
No.
(And by the way, this is all true in SAS as well. I’ll include the SAS versions in parentheses). (more…)
Missing data, and multiple imputation specifically, is one area of statistics that is changing rapidly. Research is still ongoing, and each year new findings on best practices and new techniques in software appear.
The downside for researchers is that some of the recommendations missing data statisticians were making even five years ago have changed.
Remember that there are three goals of multiple imputation, or any missing data technique: Unbiased parameter estimates in the final analysis (more…)
Here’s a little tip.
When you construct dummy variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results.
Make the codes inherent in the dummy variable name.
So instead of a variable named Gender with values of 1=Female and 0=Male, call the variable Female.
Instead of a set of dummy variables named MaritalStatus1 with values of 1=Married and 0=Single, along with MaritalStatus2 with values 1=Divorced and 0=Single, name the same variables Married and Divorced.
And if you’re new to dummy coding, this has the extra bonus of making the dummy coding intuitive. It’s just a set of yes/no variables about all but one of your categories.
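Here is a small sketch of that naming habit in code. The helper `make_dummies` and the sample data are hypothetical, written only to illustrate the idea: each dummy variable is named after the category it flags, and the reference category gets no variable at all.

```python
def make_dummies(values, reference):
    """Build one 0/1 variable per category, named after the category itself.

    The reference category is left out, so every row that belongs to it
    is 0 on all of the returned variables.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical sample data
marital_status = ["Married", "Single", "Divorced", "Married", "Single"]

dummies = make_dummies(marital_status, reference="Single")
for name, column in dummies.items():
    print(name, column)
```

Each variable now answers a yes/no question that is readable straight from its name (Married? Divorced?), and Single is the category that is "no" on both.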
Someone who registered for my upcoming Interpreting (Even Tricky) Regression Models workshop asked if the content applies to logistic regression as well.
The short answer: Yes.
The long-winded detailed explanation of why this is true and the one caveat:
One of the greatest things about regression models is that they all have the same set up: (more…)
I recently received this email, which I thought was a great question, and one of wider interest…
Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of associations in applied statistical analysis. Why is using regression, or logistic regression “better” than doing bivariate analysis such as Chi-square?
I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted-for-controlled-by model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi-Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.
My professors don’t seem to be able to give me a simple justified answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.
Thank you!
(more…)