
Multicollinearity is one of those terms in statistics that is often defined in one of two ways:
1. In very mathematical terms that make no sense. I mean, what is a linear combination anyway?
2. Completely oversimplified in order to avoid the mathematical terms: it’s a high correlation, right?
So what is it really? In English?
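To make "linear combination" a little more concrete, here is a minimal sketch in Python with NumPy using simulated, hypothetical data. The predictor x4 is built as x1 + x2 + x3 plus a little noise. No single pairwise correlation looks alarming (each of x1, x2, x3 correlates only around 0.6 with x4), yet every predictor is almost perfectly predicted by the others taken together. That shows up as very large variance inflation factors (VIFs). The `vif` helper is written by hand for illustration; it is not a library function.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Three independent predictors (simulated, hypothetical data)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
# x4 is (almost exactly) a linear combination of the other three
x4 = x1 + x2 + x3 + rng.normal(scale=0.05, size=n)

X = np.column_stack([x1, x2, x3, x4])

# Pairwise correlations: no pair is especially high on its own
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))


def vif(X, j):
    """Variance inflation factor for column j: 1 / (1 - R^2), where R^2 comes
    from regressing column j on all the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)


# Every predictor is nearly redundant given the others, so all VIFs are huge
vifs = [vif(X, j) for j in range(X.shape[1])]
print(np.round(vifs, 1))
```

Looking only at the correlation matrix would miss this; the redundancy lives in the combination of variables, not in any one pair.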
I’ve written about this before: there is just something about statistics that makes people feel… well, not so smart.

This makes people v-e-r-y reluctant to ask questions.
This fact really struck me years and years ago. Hit me hard.
It’s easy to think that if you just knew statistics better, data analysis wouldn’t be so hard.

It’s true that more statistical knowledge is always helpful. But I’ve found that statistical knowledge is only part of the story.
Another key part is developing data analysis skills. These skills apply to all analyses. It doesn’t matter which statistical method or software you’re using. So even if you never need any statistical analysis harder than a t-test, developing these skills will make your job easier.
Multilevel models and mixed models are generally the same thing. In our recent webinar on the basics of mixed models, Random Intercept and Random Slope Models, we had a number of questions about terminology that I’m going to answer here.
If you want to see the full recording of the webinar, get it here. It’s free.
Q: Is this different from multi-level modeling?
A: No. I don’t really know the history of why we have the different names, but the difference in multilevel modeling …
What does it mean for two variables to be correlated?
Is that the same as, or different from, saying they’re associated or related?
This is the kind of question that can feel silly, but shouldn’t. It’s just a reflection of the confusing terminology used in statistics. In this case, the technical statistical term looks like, but is not exactly the same as, the way we mean it in everyday English.
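One quick way to see the gap between everyday "related" and technical "correlated": Pearson's correlation only measures linear association. In this minimal Python sketch with simulated data, y is completely determined by x (y = x²), yet because x is symmetric around zero, the correlation comes out close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is symmetric around zero; y depends on x perfectly, but not linearly
x = rng.uniform(-1, 1, size=10_000)
y = x ** 2

# Pearson correlation between x and y
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```

So the two variables are as strongly related as variables can be, but they are essentially uncorrelated in the technical sense.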
The following statement might surprise you, but it’s true.
To run a linear model, you don’t need an outcome variable Y that’s normally distributed. Instead, you need a dependent variable that is:
- Continuous
- Unbounded
- Measured on an interval or ratio scale
The normality assumption is about the errors in the model, which have the same distribution as Y|X, just shifted to be centered at zero. It’s absolutely possible to have a skewed distribution of Y and a normal distribution of errors because of the effect of X.
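Here is a minimal simulation sketch of that point in Python with NumPy. The data are made up for illustration: X is drawn from a right-skewed exponential distribution, and Y = 3 + 2X plus normal errors. Y inherits the skew from X, but the residuals from an ordinary least-squares fit are roughly symmetric. The `skewness` helper is a simple hand-written moment estimator, not a library function.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: a right-skewed predictor and normally distributed errors
x = rng.exponential(scale=2.0, size=n)
errors = rng.normal(scale=1.0, size=n)
y = 3.0 + 2.0 * x + errors


def skewness(v):
    """Moment-based sample skewness: mean cubed deviation / sd^3."""
    v = v - v.mean()
    return (v ** 3).mean() / (v ** 2).mean() ** 1.5


# Fit Y = b0 + b1 * X by ordinary least squares
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

print("skewness of Y:        ", round(skewness(y), 2))      # clearly positive
print("skewness of residuals:", round(skewness(resid), 2))  # near zero
```

This is why it's the residuals, not the raw distribution of Y, that you'd examine (for example with a Q-Q plot) when checking the normality assumption.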