One of the most anxiety-laden questions I get from researchers is whether their analysis is “right.”
I’m always slightly uncomfortable with that word. Often there is no one right analysis.
It’s like finding Mr. or Ms. Right. Most of the time, there is not just one Right. But there are many that are clearly Wrong.
What Makes an Analysis Right?
Luckily, what makes an analysis right is easier to define than what makes a person right for you. It pretty much comes down to two things: whether the assumptions of the statistical method are being met and whether the analysis answers the research question.
Assumptions are very important. A test needs to reflect the measurement scale of the variables, the study design, and issues in the data. A repeated measures study design requires a repeated measures analysis. A binary dependent variable requires a categorical analysis method.
But within those general categories, there are often many analyses that meet assumptions. Both a logistic regression and a chi-square test handle a binary dependent variable with a single categorical predictor. But a logistic regression can answer more research questions. It can incorporate covariates, directly test interactions, and calculate predicted probabilities. A chi-square test can do none of these.
So you get different information from different tests. They answer different research questions.
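To make the contrast concrete, here is a minimal sketch in plain Python. The counts in the 2x2 table are made up for illustration, and the function names are hypothetical helpers, not part of any library. It assumes a Pearson chi-square test without a continuity correction, and it uses the fact that with a single two-level predictor the logistic regression estimates have a closed form (the intercept is the log odds in the reference group, the slope is the log odds ratio).

```python
import math

# Hypothetical 2x2 table (made-up counts): each row is a level of the
# categorical predictor, holding (events, non-events) for the binary outcome.
table = {"control": (20, 80), "treated": (35, 65)}


def chi_square_2x2(table):
    """Pearson chi-square test of independence, no continuity correction."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def logistic_one_binary_predictor(table, reference, other):
    """Closed-form fit of logit(p) = b0 + b1 * x, where x is 0 or 1."""
    ref_events, ref_non_events = table[reference]
    oth_events, oth_non_events = table[other]
    b0 = math.log(ref_events / ref_non_events)       # log odds, reference group
    b1 = math.log(oth_events / oth_non_events) - b0  # log odds ratio
    return b0, b1


def predicted_probability(b0, b1, x):
    """Inverse logit of the linear predictor."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


stat, p_value = chi_square_2x2(table)
b0, b1 = logistic_one_binary_predictor(table, "control", "treated")
odds_ratio = math.exp(b1)

# The chi-square test answers: is the outcome associated with the group?
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
# The logistic regression also answers: how large is the effect, and what
# is the predicted probability of the outcome in each group?
print(f"odds ratio = {odds_ratio:.3f}")
print(f"P(event | control) = {predicted_probability(b0, b1, 0):.2f}")
print(f"P(event | treated) = {predicted_probability(b0, b1, 1):.2f}")
```

Both analyses are valid for these data, but only the second one yields an effect size and predicted probabilities. Adding covariates or interactions has no closed form like this and would need an iterative fit from a statistics package.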
An analysis that is correct from an assumptions point of view is useless if it doesn’t answer the research question. A data set can spawn an endless number of statistical tests that don’t answer the research question. And you can spend an endless number of days running them.
When to Think about the Analysis
The real bummer is that it often isn’t clear the analyses are irrelevant until you write up the research paper.
That’s why writing out the research questions in theoretical and operational terms is the first step of any statistical analysis. It’s absolutely fundamental. And I mean writing them in minute detail. Issues of mediation, interaction, subsetting, control variables, et cetera, should all be blatantly obvious in the research questions.
Thinking about how to analyze the data before collecting the data can keep you from hitting a dead end. It can be very obvious, once you think through the details, that the analysis available to you based on the data won’t answer the research question.
Whether the answer is what you expected or not is a different issue.
So when you are concerned about getting an analysis “right,” clearly define the design, variables, and data issues, but most importantly, get explicitly clear about what you want to learn from this analysis.
Once you’ve done this, it’s much easier to find the statistical method that answers the research questions and meets assumptions. Even if you don’t know the right method, you can narrow your search with clear guidance.
Ruy Tchao says
Karen, Your post is right on point, especially about p-values.
I urge everyone to read two great articles on misinterpretations of p-values:
A Dirty Dozen: Twelve p-value Misconceptions By Steven Goodman in Seminars in Hematology 45, 135-140, 2008
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.
By Sander Greenland et al. in Eur J Epidemiol (2016) 31:337–350. DOI 10.1007/s10654-016-0149-3
I use these two articles in my course on research conduct and ethics.
Ruy
Joseph Green says
Your point about answering research questions is exactly what many people need to hear.
Thank you!
It seems to me that this is also relevant to discussions about p values, because every p value answers a specific question.
For example, in a comparison between the means of two groups the question that’s answered by a p value is something like this:
“What is the probability of obtaining the measured difference between the two means, or a greater difference, if the null hypothesis is correct?”
(Karen, if I’m wrong about that please correct me.)
Do researchers often ask questions like that?
No.
So, despite the ubiquity of p values in the research literature of medicine, psychology, sociology, etc., they give the answer to a question that almost nobody ever asks.
richard lengo says
This is great. Indeed, I am learning a lot. Please keep on …
Michele says
Karen,
This is so apropos for any researcher. You’ve done a great job spelling these two issues out clearly. It isn’t fun when an experienced reviewer looks at you with a raised eyebrow and asks you to rationalize an analysis in which your data do not quite meet statistical assumptions because you rushed through these crucial design steps. Trust me—it can’t be done. A hard way to learn this particular lesson, so take Karen’s word for it. Thanks, Karen!
Nina says
Great post. Thank you!
Statistical analyses are, imho, decision-aid tools.
If this is not clear at the beginning, we can easily come to an absolutely correct and totally useless answer, like 42. Good luck!
Luis G. Morales says
Dear Karen,
Your point(s) can’t be more relevant for any research project. I have reviewed projects, theses, and papers where the research questions were not well defined, which in turn led to collecting data that do not say much by themselves. Unfortunately, many journals allow this research to be published by requiring authors to run dozens of statistical tests (the less known, the better). I hope universities and funding sources will require the submission of not only the scientific hypotheses to be tested and the sampling/experimental design, but also the probable statistical outcomes of the data analyses and the meaning of these results for answering the research questions of each proposal. Thanks for bringing up this key but often overlooked issue.
Happy and safe analyzing!
Cheers,
Luis