One of the most anxiety-laden questions I get from researchers is whether their analysis is “right.”
I’m always slightly uncomfortable with that word. Often there is no one right analysis.
It’s like finding Mr. or Ms. Right. Most of the time, there is not just one Right. But there are many that are clearly Wrong.
What Makes an Analysis Right?
Luckily, what makes an analysis right is easier to define than what makes a person right for you. It pretty much comes down to two things: whether the assumptions of the statistical method are being met and whether the analysis answers the research question.
Assumptions are very important. A test needs to reflect the measurement scale of the variables, the study design, and issues in the data. A repeated measures study design requires a repeated measures analysis. A binary dependent variable requires a categorical analysis method.
But within those general categories, there are often many analyses that meet assumptions. A logistic regression or a chi-square test both handle a binary dependent variable with a single categorical predictor. But a logistic regression can answer more research questions. It can incorporate covariates, directly test interactions, and calculate predicted probabilities. A chi-square test can do none of these.
So you get different information from different tests. They answer different research questions.
An analysis that is correct from an assumptions point of view is useless if it doesn’t answer the research question. A data set can spawn an endless number of statistical tests that don’t answer the research question. And you can spend an endless number of days running them.
When to Think about the Analysis
The real bummer is it’s not always clear that the analyses aren’t relevant until you write up the research paper.
That’s why writing out the research questions in theoretical and operational terms is the first step of any statistical analysis. It’s absolutely fundamental. And I mean writing them in minute detail. Issues of mediation, interaction, subsetting, control variables, et cetera, should all be blatantly obvious in the research questions.
Thinking about how to analyze the data before collecting the data can help you from hitting a dead end. It can be very obvious, once you think through the details, that the analysis available to you based on the data won’t answer the research question.
Whether the answer is what you expected or not is a different issue.
So when you are concerned about getting an analysis “right,” clearly define the design, variables, and data issues, but most importantly, get explicitly clear about what you want to learn from this analysis.
Once you’ve done this, it’s much easier to find the statistical method that answers the research questions and meets assumptions. Even if you don’t know the right method, you can narrow your search with clear guidance.
I recently received this email, which I thought was a great question, and one of wider interest…
Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of associations in applied statistical analysis. Why is using regression, or logistic regression “better” than doing bivariate analysis such as Chi-square?
I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted for-controlled by- model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi- Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.
My professors don’t seem to be able to give me a simple justified
answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.
Thank you!
(more…)
The assumptions of normality and constant variance in a linear model (both OLS regression and ANOVA) are quite robust to departures. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates.
But you need to check the assumptions anyway, because some departures are so far off that the p-values become inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.
But sometimes you can’t.
Sometimes it’s because the dependent variable just isn’t appropriate for a linear model. The (more…)
I was recently asked this question about Chi-square tests. This question comes up a lot, so I thought I’d share my answer.
I have to compare two sets of categorical data in a 2×4 table. I cannot run the chi-square test because most of the cells contain values less than five and a couple of them contain values of 0. Is there any other test that I could use that overcomes the limitations of chi-square?
And here is my answer: (more…)
There are 4 questions you must answer to choose an appropriate statistical analysis.
1. What is your Research Question?
2. What is the scale of measurement of the variables used to answer the research question?
3. What is the Design? (between subjects, within subjects, etc.)
4. Are there any data issues? (missing, censored, truncated, etc.)
If you have not already, read about these in more detail.
(more…)
One of the most common situations in which researchers get stuck with statistics is choosing which statistical methodology is appropriate to analyze their data. If you start by asking the following four questions, you will be able to narrow things down considerably.
Even if you don’t know the implications of your answers, answering these questions will clarify issues for you. It will help you decide what information to seek, and it will make any conversations you have with statistical advisors more efficient and useful.
1. What is your research question? (more…)