Multicollinearity is one of those terms in statistics that is often defined in one of two ways:
1. In very mathematical terms that make no sense. I mean, what is a linear combination anyway?
2. In completely oversimplified terms that avoid the math. It's a high correlation, right?
So what is it really? In English?
How do you know your variables are measuring what you think they are? And how do you know they’re doing it well?
A key part of answering these questions is establishing reliability and validity of the measurements that you use in your research study. But the process of establishing reliability and validity is confusing. There are a dizzying number of choices available to you.
Whether or not you run experiments, there are elements of experimental design that affect how you need to analyze many types of studies.
The most fundamental of these are replication, randomization, and blocking. These key design elements come up in studies under all sorts of names: trials, replicates, multi-level nesting, repeated measures. Any data set that requires mixed or multilevel models has some of these design elements.
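To make the three elements concrete, here is a rough sketch of a randomized block design in which every treatment is replicated within every block. It illustrates the ideas and is not a procedure taken from the full post. The block names, treatment labels, and the helper `randomize_within_blocks` are all hypothetical.

```python
import random


def randomize_within_blocks(blocks, treatments, replicates, seed=None):
    """Hypothetical helper: assign treatments to units separately within each block.

    - Replication: each treatment appears `replicates` times per block.
    - Randomization: the order of treatments is shuffled independently per block.
    - Blocking: shuffling never mixes units across block boundaries.
    """
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        units = [t for t in treatments for _ in range(replicates)]
        rng.shuffle(units)
        plan[block] = units
    return plan


plan = randomize_within_blocks(
    ["field_A", "field_B", "field_C"],  # hypothetical blocks
    ["control", "fertilizer"],          # hypothetical treatments
    replicates=2,
    seed=42,
)
for block, assignment in plan.items():
    print(block, assignment)
```

Because the same treatments are repeated inside each block, an analysis of data from a design like this has to account for the block (for example, as a random effect in a mixed model).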
Sample size estimates are one of those data analysis tasks that look straightforward but, once you try one, make you want to bang your head against the computer in frustration. Or maybe that's just me.
Regardless of how they make you feel, they are super important to do for your study before you collect the data.
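For the simplest case, comparing two group means, the calculation can be sketched with the normal approximation n = 2(z₁₋α/₂ + z₁₋β)² / d² per group, where d is the standardized effect size (difference in means divided by the standard deviation). This is one textbook formula, assumed here for illustration. It is not necessarily the approach taken in the full post, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size. Rounds up to a whole participant.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.5))  # medium effect, alpha = 0.05, 80% power -> 63
```

An exact calculation based on the t distribution gives a slightly larger number, and real studies often need further adjustments (unequal groups, expected dropout), which is where the head-banging usually starts.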
A research study rarely involves just a single statistical test. And running multiple tests can produce more statistically significant findings purely by chance.
After all, with the typical Type I error rate of 5% used in most tests, we allow ourselves to “get lucky” 1 in 20 times for each test whose null hypothesis is true. When you work out the probability of at least one Type I error across all the tests, that probability climbs fast: with 10 independent tests, it is 1 − 0.95^10, or about 40%.
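A quick sketch of how fast the familywise error rate grows, assuming independent tests that are each run at α = 0.05 and that every null hypothesis is true. The Bonferroni line is included only as one familiar correction and is not necessarily the one recommended in the full post. The function name is hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one Type I error across m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 10, 20):
    print(f"{m:>2} tests: {familywise_error_rate(0.05, m):.3f}")

# Bonferroni: run each of the 10 tests at alpha / 10 instead
print(f"Bonferroni, 10 tests: {familywise_error_rate(0.05 / 10, 10):.3f}")
```

With 20 tests, the chance of at least one false positive is already close to two in three, while testing each at α/m keeps the overall rate at or below 5%.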
Transformations don’t always help, but when they do, they can improve your linear regression model in several ways simultaneously.
They can help you better meet the linear regression assumptions of normality and homoscedasticity (i.e., equal variances) of the residuals. They can also help avoid some of the artifacts caused by boundary limits in your dependent variable, and sometimes even remove a difficult-to-interpret interaction.
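Here is a small simulated example of the equal-variance point. It assumes a hypothetical outcome with multiplicative error, y = exp(0.5 + 0.3x + e) with e ~ Normal(0, 0.3), and compares the spread of y at a low and a high value of x before and after a log transformation. This is a sketch of one common case, not a general rule: a log transform helps here because the errors are multiplicative by construction.

```python
import math
import random
import statistics

random.seed(7)


def simulate(x, n=2000):
    # Hypothetical data-generating process with multiplicative error:
    # y = exp(0.5 + 0.3*x + e), e ~ Normal(0, 0.3)
    return [math.exp(0.5 + 0.3 * x + random.gauss(0, 0.3)) for _ in range(n)]


low, high = simulate(1), simulate(5)

# Ratio of standard deviations at high x versus low x
raw_ratio = statistics.stdev(high) / statistics.stdev(low)
log_ratio = (statistics.stdev([math.log(y) for y in high])
             / statistics.stdev([math.log(y) for y in low]))

print(f"spread ratio (high x / low x), raw scale: {raw_ratio:.2f}")
print(f"spread ratio (high x / low x), log scale: {log_ratio:.2f}")
```

On the raw scale, the spread at high x is roughly three times the spread at low x. On the log scale, the ratio is close to 1, and log(y) is also a straight-line function of x.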