What Makes a Statistical Analysis Wrong?

One of the most anxiety-laden questions I get from researchers is whether their analysis is “right.”

I’m always slightly uncomfortable with that word. Often there is no one right analysis.

It’s like finding Mr. or Ms. Right. Most of the time, there is not just one Right. But there are many that are clearly Wrong.

What Makes an Analysis Right?

Luckily, what makes an analysis right is easier to define than what makes a person right for you. It pretty much comes down to two things: whether the assumptions of the statistical method are being met and whether the analysis answers the research question.

Assumptions are very important. A test needs to reflect the measurement scale of the variables, the study design, and issues in the data. A repeated measures study design requires a repeated measures analysis. A binary dependent variable requires a categorical analysis method.

But within those general categories, there are often many analyses that meet assumptions. Both a logistic regression and a chi-square test can handle a binary dependent variable with a single categorical predictor. But a logistic regression can answer more research questions. It can incorporate covariates, directly test interactions, and calculate predicted probabilities. A chi-square test can do none of these.
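
To make the contrast concrete, here is a minimal Python sketch using only the standard library. The counts are made up for illustration and the function names are hypothetical, not from any statistics package. It relies on one assumption worth stating loudly: with a single binary predictor, the logistic regression estimates have a closed form, so no iterative fitting is needed. With covariates or interactions you would fit the model in statistical software instead.

```python
import math

# Hypothetical counts, invented for illustration only:
# group -> (number with outcome = 1, number with outcome = 0)
table = {"control": (30, 70), "treated": (45, 55)}


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def logistic_one_binary_predictor(table, reference, other):
    """Closed-form ML estimates for logit(p) = b0 + b1 * x, where x = 1 for `other`.

    Valid only for a single binary predictor; anything richer needs
    iterative fitting.
    """
    yes_ref, no_ref = table[reference]
    yes_other, no_other = table[other]
    b0 = math.log(yes_ref / no_ref)          # log odds in the reference group
    b1 = math.log(yes_other / no_other) - b0  # log odds ratio
    return b0, b1


def predicted_probability(b0, b1, x):
    """Invert the logit to get a predicted probability."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


# The chi-square test answers: is the outcome associated with group?
chi2_stat, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2_stat:.2f}, p = {p_value:.4f}")

# The logistic regression also answers: how large is the effect,
# and what is the predicted probability in each group?
b0, b1 = logistic_one_binary_predictor(table, "control", "treated")
print(f"odds ratio = {math.exp(b1):.2f}")
print(f"P(outcome | control) = {predicted_probability(b0, b1, 0):.2f}")
print(f"P(outcome | treated) = {predicted_probability(b0, b1, 1):.2f}")
```

Both approaches are run on the same table, but only the second yields an effect size and predicted probabilities, which is the sense in which the two tests answer different questions.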

So you get different information from different tests. They answer different research questions.

An analysis that is correct from an assumptions point of view is useless if it doesn’t answer the research question. A data set can spawn an endless number of statistical tests that don’t answer the research question. And you can spend an endless number of days running them.

When to Think about the Analysis

The real bummer is that it’s not always clear the analyses aren’t relevant until you write up the research paper.

That’s why writing out the research questions in theoretical and operational terms is the first step of any statistical analysis. It’s absolutely fundamental. And I mean writing them in minute detail. Issues of mediation, interaction, subsetting, control variables, et cetera, should all be blatantly obvious in the research questions.

Thinking about how to analyze the data before collecting the data can keep you from hitting a dead end. It can be very obvious, once you think through the details, that the analysis available to you based on the data won’t answer the research question.

Whether the answer is what you expected or not is a different issue.

So when you are concerned about getting an analysis “right,” clearly define the design, variables, and data issues, but most importantly, get explicitly clear about what you want to learn from this analysis.

Once you’ve done this, it’s much easier to find the statistical method that answers the research questions and meets assumptions. Even if you don’t know the right method, you can narrow your search with clear guidance.

 


Comments

  1. Ruy Tchao says

    Karen, Your post is right on point, especially about p-values.
    I urge everyone to read two great articles on misinterpretations of p-values:
    A Dirty Dozen: Twelve p-value Misconceptions By Steven Goodman in Seminars in Hematology 45, 135-140, 2008

    Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.
    By Sander Greenland et al. in Eur J Epidemiol (2016) 31:337–350. DOI 10.1007/s10654-016-0149-3

    I use these two articles in my course on research conduct and ethics.

    Ruy

  2. Joseph Green says

    Your point about answering research questions is exactly what many people need to hear.
    Thank you!

    It seems to me that this is also relevant to discussions about p values, because every p value answers a specific question.
    For example, in a comparison between the means of two groups the question that’s answered by a p value is something like this:
    “What is the probability of obtaining the measured difference between the two means, or a greater difference, if the null hypothesis is correct?”
    (Karen, if I’m wrong about that please correct me.)
    Do researchers often ask questions like that?
    No.
    So, despite the ubiquity of p values in the research literature of medicine, psychology, sociology, etc., they give the answer to a question that almost nobody ever asks.

  3. Michele says

    Karen,

    This is so apropos for any researcher. You’ve done a great job spelling these two issues out clearly. It isn’t fun when an experienced reviewer looks at you with a raised eyebrow and asks you to rationalize an analysis in which your data do not quite meet statistical assumptions because you rushed through these crucial design steps. Trust me—it can’t be done. A hard way to learn this particular lesson, so take Karen’s word for it. Thanks, Karen!

  4. Nina says

    Great post. Thank you!
    Statistical analyses are, imho, decision-aid tools.
    If this is not clear at the beginning, we can easily come to an absolutely correct and totally useless answer, like 42. Good luck!

  5. Luis G. Morales says

    Dear Karen,
    Your point(s) can’t be more relevant for any research project. I have reviewed projects, theses, and papers where the research questions were not well defined, which in turn led to collecting data that do not say much by themselves. Unfortunately, many journals allow this research to be published by requiring authors to run dozens of statistical tests (the less known, the better). I hope universities and funding sources will require the submission of not only the scientific hypotheses to be tested and the sampling/experimental design, but also the probable statistical outcomes of the data analyses and the meaning of these results for answering the research questions of each proposal. Thanks for bringing up this key but often overlooked issue.

    Happy and safe analyzing!

    Cheers,

    Luis

