by Jeff Meyer, MBA, MPA
One of the most important concepts in data analysis is that the analysis needs to be appropriate for the scale of measurement of the variable. Decisions about scale tend to focus on levels of measurement: nominal, ordinal, interval, and ratio.
These levels of measurement tell you about the amount of information in the variable. But there are other ways of distinguishing the scales that are also important and often overlooked.
(more…)
Suppose you are asked to create a model that will predict who will drop out of a program your organization offers. You decide to use a binary logistic regression because your outcome has two values: “0” for not dropping out and “1” for dropping out.
Most of us were trained in building models for the purpose of understanding and explaining the relationships between an outcome and a set of predictors. But model building works differently for purely predictive models. Where do we go from here? (more…)
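To make the dropout setup concrete, here is a minimal sketch of how a fitted binary logistic regression turns a predictor into a predicted probability and then into a 0/1 prediction. The predictor (number of absences), both coefficients, and the 0.5 cutoff are hypothetical values chosen for illustration; none of them comes from a real model.

```python
import math

# Hypothetical coefficients for illustration only; not estimated from data.
INTERCEPT = -2.0
SLOPE_ABSENCES = 0.5


def dropout_probability(absences):
    """Predicted probability of dropping out (outcome = 1)."""
    linear_predictor = INTERCEPT + SLOPE_ABSENCES * absences
    # The logistic function maps the linear predictor onto the 0-1 scale.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def predict_dropout(absences, cutoff=0.5):
    """Classify as 1 (drops out) when the probability reaches the cutoff."""
    return 1 if dropout_probability(absences) >= cutoff else 0


for absences in (0, 4, 8):
    print(absences, round(dropout_probability(absences), 3), predict_dropout(absences))
```

The cutoff is a choice, not something the model estimates. Raising or lowering it trades one kind of misclassification for the other.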
Even with a few years of experience, interpreting the coefficients of interactions in a regression table can take some time to figure out. Trying to explain these coefficients to a group of non-statistically inclined people is a daunting task.
For example, say you are going to speak to a group of dieticians. They are interested (more…)
We’ve looked at the interaction effect between two categorical variables. Now let’s make things a little more interesting, shall we?
What if our predictors of interest, say, are a categorical and a continuous variable? How do we interpret the interaction between the two? (more…)
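One common way to read such an interaction is as a difference in slopes. The sketch below assumes a model with only a two-level group, a continuous x, and their product. In that case the least-squares fit is the same as fitting a separate line to each group, so the coefficients can be recovered from the two lines. The data and the group names are made up for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: the same x values observed in two hypothetical groups.
x = [1.0, 2.0, 3.0, 4.0]
y_control = [3.0, 4.0, 6.0, 7.0]
y_treated = [5.0, 8.0, 10.0, 13.0]

int_control, slope_control = fit_line(x, y_control)
int_treated, slope_treated = fit_line(x, y_treated)

# Coefficients of y = b0 + b1*treated + b2*x + b3*treated*x
b0 = int_control                    # intercept of the reference group
b1 = int_treated - int_control      # group difference when x = 0
b2 = slope_control                  # slope of x in the reference group
b3 = slope_treated - slope_control  # interaction: difference in slopes

print(b0, b1, b2, b3)
```

Read this way, the coefficient on x is the slope for the reference group only, and the interaction coefficient says how much steeper or flatter the slope is in the other group.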
When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.
And I couldn’t figure it out.
The model notation is different.
The output looks different.
The vocabulary is different.
The focus of what we’re testing is completely different. How can they be the same model?
(more…)
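A small numerical check can make the professors' claim less mysterious. The sketch below computes the one-way ANOVA F statistic from sums of squares, then fits an ordinary least-squares regression on an intercept plus dummy variables and computes its overall F test. The data are made up, and the helper names (`anova_f`, `solve`, `regression_f`) are hypothetical, written here only for this illustration.

```python
from statistics import mean

# Made-up outcome values for three hypothetical groups.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 8.0, 9.0],
    "c": [10.0, 12.0, 14.0],
}


def anova_f(groups):
    """One-way ANOVA F statistic from between and within sums of squares."""
    values = [y for ys in groups.values() for y in ys]
    grand = mean(values)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_f(groups):
    """Coefficients and overall F test from OLS on dummy-coded groups."""
    names = list(groups)
    X, y = [], []
    for name, ys in groups.items():
        for value in ys:
            # Intercept, then one 0/1 dummy per non-reference group.
            X.append([1.0] + [1.0 if name == other else 0.0 for other in names[1:]])
            y.append(value)
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    grand = mean(y)
    ss_model = sum((f - grand) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return beta, (ss_model / (k - 1)) / (ss_resid / (len(y) - k))


beta, f_reg = regression_f(groups)
print(anova_f(groups), f_reg, beta)
```

Both routes give the same F statistic. The regression intercept is the mean of the reference group, and each dummy coefficient is the difference between that group's mean and the reference mean, which is why the two outputs look so different while describing the same model.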
Transformations don’t always help, but when they do, they can improve your linear regression model in several ways simultaneously.
They can help you better meet the linear regression assumptions of normality and homoscedasticity (i.e., equal variances). They can also help avoid some of the artifacts caused by boundary limits in your dependent variable, and sometimes even remove a difficult-to-interpret interaction.
(more…)
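As a rough illustration of the equal-variances point, the sketch below uses made-up data in which one group's values are exactly ten times the other's, so the spread grows with the mean. A log transformation is assumed here only as one common choice; whether it suits a given data set depends on the data.

```python
import math
from statistics import stdev

# Made-up data: the second group is the first scaled by 10,
# so its standard deviation is 10 times larger on the raw scale.
low_group = [8.0, 10.0, 12.5]
high_group = [10 * y for y in low_group]

raw_ratio = stdev(high_group) / stdev(low_group)

# On the log scale a multiplicative shift becomes an additive one,
# which leaves the spread unchanged.
log_low = [math.log(y) for y in low_group]
log_high = [math.log(y) for y in high_group]
log_ratio = stdev(log_high) / stdev(log_low)

print(raw_ratio, log_ratio)
```

On the raw scale the standard deviations differ by a factor of 10. After taking logs they are essentially equal, which is the kind of variance stabilization a transformation can provide when spread is proportional to the mean.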