It was Yogi Berra who offered the sage advice, “When you come to a fork in the road, take it.”
When you need to fit a regression model to survival data, you have to take a fork in the road. One road asks you to make a distributional assumption about your data and the other does not.
When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.
And I couldn’t figure it out.
The model notation is different.
The output looks different.
The vocabulary is different.
The focus of what we’re testing is completely different. How can they be the same model?
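One way to see the connection is to fit both versions to the same numbers. The sketch below is only an illustration: the data are made up, and dummy coding with the first group as the reference is an assumed (though common) choice. It computes a one-way ANOVA F statistic by hand, then fits a regression with two dummy variables and computes the overall F from that fit.

```python
import numpy as np

# Hypothetical data: three groups with four observations each.
groups = np.array([0] * 4 + [1] * 4 + [2] * 4)
y = np.array([4.0, 5.0, 6.0, 5.0,
              7.0, 8.0, 6.0, 7.0,
              9.0, 10.0, 11.0, 10.0])
n, k = len(y), 3
grand_mean = y.mean()

# ANOVA view: partition variation into between-group and within-group parts.
means = np.array([y[groups == g].mean() for g in range(k)])
ss_between = sum((groups == g).sum() * (means[g] - grand_mean) ** 2 for g in range(k))
ss_within = sum(((y[groups == g] - means[g]) ** 2).sum() for g in range(k))
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Regression view: intercept plus dummy variables for groups 1 and 2
# (group 0 is the reference category).
X = np.column_stack([np.ones(n), groups == 1, groups == 2]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
ss_model = ((fitted - grand_mean) ** 2).sum()
ss_resid = ((y - fitted) ** 2).sum()
f_reg = (ss_model / (k - 1)) / (ss_resid / (n - k))

print("group means:", means)
print("regression coefficients:", beta)
print("F (ANOVA):", f_anova, " F (regression):", f_reg)
```

The intercept is the reference group's mean, each dummy coefficient is the difference between that group's mean and the reference mean, and the two F statistics agree.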
I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.
If you missed it, you can get the webinar recording here.
Question: Can we use PCA for reducing both predictors and response variables?
In fact, there were a few related but separate questions about using and interpreting the resulting component scores, so I’ll answer them together here.
How could you use the component scores?
A lot of times PCAs are used for further analysis — say, regression. How can we interpret the results of regression?
Let’s say I would like to interpret my regression results in terms of original data, but they are hiding under PCAs. What is the best interpretation that we can do in this case?
Answer:
So yes, the point of PCA is to reduce variables: to create an index score variable that is an optimally weighted combination of a group of correlated variables.
And yes, you can use this index variable as either a predictor or response variable.
It is often used as a solution for multicollinearity among predictor variables in a regression model. Rather than include multiple correlated predictors, none of which is significant on its own, you can combine them using PCA and use the resulting component as the predictor.
It’s also used as a solution to avoid inflated familywise Type I error caused by running the same analysis on multiple correlated outcome variables. Combine the correlated outcomes using PCA, then use that as the single outcome variable. (MANOVA, incidentally, does something similar: it also analyzes a weighted combination of the outcomes, though its weights are chosen to maximize group separation rather than variance.)
In both cases, you can no longer interpret the individual variables.
You may want to, but you can’t.
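Here is a minimal sketch of what that looks like in practice. Everything in it is assumed for illustration: the data are simulated, the three predictors are built to be highly correlated, and PCA is done with a plain SVD on standardized variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated data: three predictors that all reflect one underlying construct.
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.3 * rng.normal(size=n) for _ in range(3)])
y = 2.0 * latent + rng.normal(size=n)

# Standardize so each predictor contributes on the same scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: each row of Vt holds the weights for one component.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
weights = Vt[0]                      # weights for the first component
score = Z @ weights                  # the index (component score) variable
explained = s[0] ** 2 / (s ** 2).sum()

# Use the single component score as the predictor in place of all three.
A = np.column_stack([np.ones(n), score])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("component weights:", weights)
print("share of variance explained:", explained)
print("slope for the component score:", coef[1])
```

The regression returns one slope, and it belongs to the index. The weights describe how the index was built, not how each original variable relates to the outcome, which is why the individual variables can't be interpreted separately. Note also that the sign of a component is arbitrary, so the sign of the slope depends on the direction the SVD happens to return.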
The LASSO model (Least Absolute Shrinkage and Selection Operator) is a relatively recent development that helps you find a well-fitting model in the regression context. It avoids many of the overfitting problems that plague other model-building approaches.
In this Statistically Speaking Training, guest instructor Steve Simon, PhD, explains what overfitting is — and why it’s a problem.
Then he illustrates the geometry of the LASSO model in comparison to two other regression approaches: ridge regression and stepwise variable selection.
Finally, he shows you how LASSO regression works with a real data set.
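For a rough feel of what LASSO does, here is a small sketch on simulated data. It is not taken from the training: the data, the penalty value `lam = 0.5`, and the use of coordinate descent (one common way to fit a LASSO) are all assumptions made for illustration. The helper names `soft_threshold` and `lasso_cd` are made up for this example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, setting it to exactly zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with predictor j's current contribution added back in.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_ss[j]
    return b

# Simulated data: only predictors 0 and 3 truly matter.
rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_beta + rng.normal(size=n)

# Standardize predictors and center the outcome so no intercept is needed.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_lasso = lasso_cd(X, y, lam=0.5)

print("OLS:  ", np.round(b_ols, 3))
print("LASSO:", np.round(b_lasso, 3))
```

Ordinary least squares gives every predictor a nonzero coefficient, even the ones that have no real effect. The LASSO penalty shrinks all coefficients and sets the small ones to exactly zero, which is how it does selection and shrinkage at the same time. In real use the penalty is usually chosen by cross-validation rather than fixed in advance.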
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
Multicollinearity isn’t an assumption of regression models; it’s a data issue.
And while it can be seriously problematic, more often it’s just a nuisance.
In this webinar, we’ll discuss:
- What multicollinearity is and isn’t
- What it does to your model and estimates
- How to detect it
- What to do about it, depending on how serious it is
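As one illustration of detection, here is a sketch of the variance inflation factor (VIF), a widely used diagnostic. The excerpt above doesn't name a specific method, so treating VIF as the detection tool is an assumption, as are the simulated data and the helper name `vif`.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print("VIFs:", np.round(vifs, 2))
```

A VIF near 1 means a predictor shares almost no variance with the others. Large values, as for `x1` and `x2` here, mean the standard errors of those coefficients are being inflated. Cutoffs such as 5 or 10 are rules of thumb, not tests.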
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
Model building, that is, choosing predictors, is one of those skills in statistics that is difficult to teach. It’s hard to lay out the steps, because at each step you have to evaluate the situation and decide what to do next.
If you’re running purely predictive models, and the relationships among the variables aren’t the focus, it’s much easier. Go ahead and run a stepwise regression model. Let the data give you the best prediction.
But if the point is to answer a research question that describes relationships, you’re going to have to get your hands dirty.
It’s easy to say “use theory” or “test your research question,” but that ignores a lot of practical issues. For example, you may have 10 different variables that all measure the same theoretical construct, and it’s not clear which one to use.