Multicollinearity occurs when two or more predictor variables in a regression model are redundant. It is a real problem, and it can do terrible things to your results. However, the dangers of multicollinearity seem to have been so drummed into students' minds that they create unnecessary panic.
True multicollinearity (the kind that messes things up) is pretty uncommon. High correlations among predictor variables may indicate multicollinearity, but they are NOT a reliable sign that it exists, and they do not necessarily indicate a problem. How high is too high depends on the sample size: as the sample gets bigger, higher correlations are tolerable.
Likewise, multicollinearity can exist without a high correlation among predictors. Two common examples are the redundant information in summed variables, and the overlap between multiplicative terms (interactions, quadratics, and other polynomial terms) and the variables that make them up.
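To make that concrete, here is a minimal simulation sketch, in Python with numpy rather than the SPSS workflow discussed here, with invented variables x1, x2, and a "total" sum score. The sum is almost perfectly predictable from its two components, yet no pairwise correlation rises much above .7:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# A "total score" built from two other predictors: near-perfect linear
# redundancy, yet no single pairwise correlation looks alarming.
total = x1 + x2 + rng.normal(scale=0.05, size=n)

X = np.column_stack([x1, x2, total])
print(np.round(np.corrcoef(X, rowvar=False), 2))   # pairwise r's near .7
print(f"condition number: {np.linalg.cond(X):.0f}")  # well above the ~30 cutoff
```

Scanning the correlation matrix alone, nothing screams "problem," but the near-singular design matrix does.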
The real problem with multicollinearity, and the real issue to check for, is that it hugely inflates the variance of parameter estimates (regression coefficients, group means, etc.). This means that standard errors become enormous and t values shrink toward 0, making everything insignificant. The best way to check for severe multicollinearity is to use condition indices, which are easily obtained with the 'collinearity diagnostics' option in SPSS's regression procedure.
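Outside SPSS, the same diagnostic is easy to compute by hand. The sketch below shows one common way to get Belsley-style condition indices with numpy, assuming a design matrix that includes the intercept column; it is an illustration of the idea, not a reproduction of SPSS's exact output:

```python
import numpy as np

def condition_indices(X):
    # Belsley-style diagnostics: scale each column of the design matrix
    # (intercept included) to unit length, then take the ratio of the
    # largest singular value to each singular value.
    Xs = X / np.linalg.norm(X, axis=0)
    sv = np.linalg.svd(Xs, compute_uv=False)
    return sv.max() / sv

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
total = x1 + x2 + rng.normal(scale=0.05, size=n)  # redundant sum score

X = np.column_stack([np.ones(n), x1, x2, total])  # design matrix with intercept
print(np.round(condition_indices(X), 1))
# Indices above roughly 30 are the conventional flag for severe collinearity.
```

The largest condition index for this data blows past the rule-of-thumb cutoff of 30, exactly the severe case described above.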
A more common issue is moderate multicollinearity, which occurs when predictor variables are moderately correlated. Although it does not cause the mathematical problems of severe multicollinearity, it does affect the interpretation of model parameter estimates: each coefficient describes the effect of a predictor holding the others constant, so researchers need to keep the associations among predictors in mind as they interpret regression coefficients.
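Variance inflation factors (VIFs) are one handy way to quantify this moderate case. Here is a small sketch using statsmodels' variance_inflation_factor (this assumes you have statsmodels installed, and the simulated predictors are invented for illustration):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)  # r(x1, x2) around .6
X = np.column_stack([np.ones(n), x1, x2])      # intercept + predictors

# A VIF near 1 means no inflation; it grows as predictors overlap.
# With r around .6, the VIF is about 1 / (1 - .36), roughly 1.6:
# noticeable, but not catastrophic.
for idx, name in [(1, "x1"), (2, "x2")]:
    print(name, round(variance_inflation_factor(X, idx), 2))
```

VIFs in this range will not wreck the math, but they are a reminder that the two coefficients are partly speaking for each other.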