Multicollinearity can affect any regression model with more than one predictor. It occurs when two or more predictor variables overlap so much in what they measure that their effects are indistinguishable.
When the model tries to estimate their unique effects, it goes wonky (yes, that’s a technical term).
So for example, you may be interested in understanding the separate effects of altitude and temperature on the growth of a certain species of mountain tree.
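To see what that overlap looks like in numbers, here is a minimal sketch using simulated data. Everything in it is an assumption made for illustration: the altitude range, the rate at which temperature falls with altitude, the unrelated `rainfall` predictor added for contrast, and the small `vif` helper. It computes a variance inflation factor (VIF) for each predictor by regressing it on the other predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical simulated data: temperature falls as altitude rises,
# so the two predictors carry nearly the same information.
altitude = rng.uniform(500, 3000, n)                          # metres
temperature = 25 - 0.0065 * altitude + rng.normal(0, 0.5, n)  # degrees C
rainfall = rng.normal(100, 20, n)                             # unrelated predictor, for contrast


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2),
    where R^2 comes from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)


X = np.column_stack([altitude, temperature, rainfall])
vifs = [vif(X, j) for j in range(X.shape[1])]
corr = np.corrcoef(altitude, temperature)[0, 1]

print("correlation(altitude, temperature):", round(corr, 3))
for name, v in zip(["altitude", "temperature", "rainfall"], vifs):
    print(f"VIF {name}: {v:.1f}")
```

In this simulation the VIFs for altitude and temperature come out far larger than the one for rainfall, which is the numerical signature of two predictors the model cannot tell apart.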
(more…)
At times it is necessary to convert a continuous predictor into a categorical predictor. For example, income per household is shown below.

This data is censored: every family income above $155,000 is recorded as $155,000. A further explanation about censored and truncated data can be found here. Because of the censoring, it would be incorrect to use this variable as a continuous predictor.
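One way to make the conversion is to bin the recorded incomes so that the censored values get a category of their own. The sketch below is only an illustration: the income values, the cut points, and the category labels are all made up, and only the $155,000 cap comes from the text above.

```python
import numpy as np

CAP = 155_000  # censoring point stated in the text

# Hypothetical household incomes; anything above the cap is recorded as the cap.
true_income = np.array([18_000, 42_500, 67_000, 120_000, 155_000, 210_000, 480_000])
recorded = np.minimum(true_income, CAP)

# Illustrative cut points (an assumption, not taken from the original post).
edges = np.array([25_000, 50_000, 100_000, CAP])
labels = np.array(["<25k", "25k-<50k", "50k-<100k", "100k-<155k", "155k+"])

# np.digitize returns the index i such that edges[i-1] <= x < edges[i],
# so every value recorded at the cap falls in the last category.
codes = np.digitize(recorded, edges)
income_cat = labels[codes]

for value, cat in zip(recorded, income_cat):
    print(f"{value:>7} -> {cat}")
```

The top category is open-ended on purpose. It says only "at or above the cap", which is all the recorded data can tell us about those households.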
(more…)
What’s a good method for interpreting the results of a model with two continuous predictors and their interaction?
Let’s start by looking at a model without an interaction. In the model below, we regress a subject’s hip size on their weight and height. Height and weight are centered at their means.
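As a rough sketch of what centering does in a model like this, the code below fits hip size on centered weight and height by ordinary least squares. The data are simulated and the coefficients used to generate them are invented, so none of the numbers correspond to the model in the post.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical simulated measurements (not the data from the post).
height = rng.normal(170, 10, n)  # cm
weight = rng.normal(70, 12, n)   # kg
hip = 60 + 0.1 * height + 0.4 * weight + rng.normal(0, 3, n)  # cm

# Center each predictor at its mean.
height_c = height - height.mean()
weight_c = weight - weight.mean()

# Model with centered predictors and no interaction.
X = np.column_stack([np.ones(n), weight_c, height_c])
coef, *_ = np.linalg.lstsq(X, hip, rcond=None)
intercept, b_weight, b_height = coef

# Same model with the raw (uncentered) predictors, for comparison.
X_raw = np.column_stack([np.ones(n), weight, height])
coef_raw, *_ = np.linalg.lstsq(X_raw, hip, rcond=None)

print("intercept (centered):", round(intercept, 3))
print("mean hip size:       ", round(hip.mean(), 3))
print("slopes (centered):   ", np.round(coef[1:], 4))
print("slopes (raw):        ", np.round(coef_raw[1:], 4))
```

Centering leaves the slopes unchanged and moves the intercept to the mean hip size, that is, the predicted hip size for a subject of average weight and average height. Without an interaction, each slope is the expected change in hip size per unit of that predictor, holding the other fixed.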
(more…)
One approach to model building is to use all predictors that make theoretical sense in the first model. For example, a first model for determining birth weight could include mother’s age, education, marital status, race, weight gain during pregnancy and gestation period.
The main effects of this model show that a mother’s education level and marital status are not statistically significant.
(more…)
There is a bit of art and experience to model building. You need to build a model to answer your research question, but how do you build a statistical model when there are no instructions in the box?
Should you start with all your predictors or look at each one separately? Do you always take out non-significant variables and do you always leave in significant ones?
(more…)