How do you know your variables are measuring what you think they are? And how do you know they’re doing it well?
The following statement might surprise you, but it’s true.
To run a linear model, you don’t need an outcome variable Y that’s normally distributed. The dependent variable does need to meet a few other conditions, but normality isn’t one of them.
The normality assumption is about the errors in the model, which have the same distribution as Y|X. It’s absolutely possible to have a skewed distribution of Y and a normal distribution of errors because of the effect of X. (more…)
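Here is a minimal sketch of that point, using hypothetical simulated data of my own (not from the post): when X itself is skewed, Y ends up skewed even though the errors are perfectly normal.

```python
# Illustration (my own simulation, not the article's example): the marginal
# distribution of Y can be skewed while the model errors are normal, because
# the skew comes from X, not from the error term.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical skewed predictor (income-like, lognormal)
x = rng.lognormal(mean=0, sigma=1, size=n)

# True model: Y = 2 + 3*X + normal error
errors = rng.normal(loc=0, scale=1, size=n)
y = 2 + 3 * x + errors

print("Skewness of Y:     ", round(stats.skew(y), 2))       # clearly skewed
print("Skewness of errors:", round(stats.skew(errors), 2))  # close to 0
```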
What is a Confounder?
A confounder (also called a confounding variable) is one of those statistical terms that confuses a lot of people. Not because it represents a confusing concept, but because of how it’s used.
(Well, it’s a bit of a confusing concept, but that’s not the worst part).
It has slightly different meanings to different types of researchers. The definition is essentially the same, but the research context can have specific implications for how that definition plays out.
If the person you’re talking to has a different understanding of what it means, you’re going to have a confusing conversation.
Let’s take a look at some examples to unpack this.
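As one quick illustration of the shared core idea, here is a small simulation of my own (the variable names and numbers are hypothetical, not from the article): a confounder Z drives both the predictor X and the outcome Y, so a model that ignores Z makes X look much more strongly related to Y than it really is.

```python
# Hypothetical simulation of classic confounding: Z affects both X and Y.
# Omitting Z biases the coefficient on X; adjusting for Z recovers it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

z = rng.normal(size=n)                        # confounder
x = 0.8 * z + rng.normal(size=n)              # X depends on Z
y = 0.5 * x + 1.5 * z + rng.normal(size=n)    # Y depends on X and Z

# Model that omits the confounder: slope on X is biased upward
naive = sm.OLS(y, sm.add_constant(x)).fit()

# Model that adjusts for Z: slope on X is close to the true 0.5
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print("Unadjusted slope for X:", round(naive.params[1], 2))
print("Adjusted slope for X:  ", round(adjusted.params[1], 2))
```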
At times it is necessary to convert a continuous predictor into a categorical predictor. Take income per household as an example.
This data is censored: all family incomes above $155,000 are recorded as $155,000. A further explanation of censored and truncated data can be found here. Because of the censoring, it would be incorrect to use this variable as a continuous predictor.
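One simple way to handle it is to bin the variable so the top category absorbs the censored value. The sketch below is my own illustration with hypothetical cut points and made-up incomes, not the article's data or code.

```python
# Hypothetical example: bin a censored income variable into categories so the
# censored ceiling ($155,000) falls inside the top category.
import pandas as pd

income = pd.Series([12_000, 48_500, 73_000, 101_000, 155_000, 155_000])

income_cat = pd.cut(
    income,
    bins=[0, 50_000, 100_000, 150_000, float("inf")],
    labels=["<50k", "50k-100k", "100k-150k", "150k+ (incl. censored)"],
)
print(income_cat)
```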
In a recent article, we reviewed the impact of removing the intercept from a regression model when the predictor variable is categorical. This month we’re going to talk about removing the intercept when the predictor variable is continuous.
Spoiler alert: You should never remove the intercept when a predictor variable is continuous.
Here’s why. (more…)
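The short version can be seen in a quick simulation. This is my own sketch with made-up numbers, not the article's example: if the true intercept isn't zero, forcing the fitted line through the origin badly distorts the slope.

```python
# Hypothetical illustration: dropping the intercept with a continuous predictor
# forces the line through the origin and distorts the slope estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(10, 20, size=500)
y = 50 + 2 * x + rng.normal(scale=3, size=500)   # true intercept = 50, slope = 2

with_intercept = sm.OLS(y, sm.add_constant(x)).fit()
no_intercept = sm.OLS(y, x).fit()                # intercept removed

print("Slope with intercept:   ", round(with_intercept.params[1], 2))  # ~2
print("Slope without intercept:", round(no_intercept.params[0], 2))    # badly inflated
```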
Last week I had the pleasure of teaching a webinar on Interpreting Regression Coefficients. We walked through the output of a somewhat tricky regression model—it included two dummy-coded categorical variables, a covariate, and a few interactions.
As always seems to happen, our audience asked an amazing number of great questions. (Seriously, I’ve had multiple guest instructors compliment me on our audience and their thoughtful questions.)
We had so many that although I spent about 40 minutes answering (more…)