In the last article, we saw how to create a simple Generalized Linear Model on binary data using the glm() command. We continue with the same glm on the mtcars data set (modeling the vs variable on weight and engine displacement).
model <- glm(formula = vs ~ wt + disp, data = mtcars, family = binomial)
summary(model)
Call:
glm(formula = vs ~ wt + disp, family = binomial, data = mtcars)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.67506  -0.28444  -0.08401   0.57281   2.08234

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.60859    2.43903   0.660    0.510
wt           1.62635    1.49068   1.091    0.275
disp        -0.03443    0.01536  -2.241    0.025 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.86  on 31  degrees of freedom
Residual deviance: 21.40  on 29  degrees of freedom
AIC: 27.4

Number of Fisher Scoring iterations: 6
We see that weight influences vs positively, while displacement has a slightly negative effect. We also see that the coefficient of weight is non-significant (p > 0.05), while the coefficient of displacement is significant. Later we will see how to investigate ways of improving our model.
In fact, the estimates (the coefficients of the predictors weight and displacement) are in units called logits, i.e. they are on the log-odds scale. We will define the logit in a later blog.
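If you prefer to think in odds rather than log-odds, a quick sketch (using base R's exp() and coef(); this snippet is not part of the original output) is to exponentiate the coefficients:

# Convert the logit (log-odds) estimates into odds ratios
exp(coef(model))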
Deviance
We see the word Deviance twice in the model output. Deviance is a measure of goodness of fit of a generalized linear model. Or rather, it's a measure of badness of fit: higher numbers indicate a worse fit.
R reports two forms of deviance – the null deviance and the residual deviance. The null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean).
For our example, we have a value of 43.9 on 31 degrees of freedom. Including the independent variables (weight and displacement) decreased the deviance to 21.4 points on 29 degrees of freedom, a significant reduction in deviance.
The Residual Deviance has reduced by 22.46 with a loss of two degrees of freedom.
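If you want to check formally whether a drop of 22.46 on 2 degrees of freedom is significant, one sketch (not part of the original post; it refits the intercept-only model under the hypothetical name null_model) is a likelihood-ratio test:

# Intercept-only (null) model for comparison
null_model <- glm(vs ~ 1, data = mtcars, family = binomial)

# Likelihood-ratio (chi-squared) test of the deviance reduction
anova(null_model, model, test = "Chisq")

# The same p-value by hand: a drop of 22.46 on 2 degrees of freedom
pchisq(43.86 - 21.40, df = 2, lower.tail = FALSE)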
Fisher Scoring
What about the Fisher scoring algorithm? Fisher’s scoring algorithm is a derivative of Newton’s method for solving maximum likelihood problems numerically.
For our model we see that Fisher's scoring algorithm needed six iterations to perform the fit.
This doesn’t really tell you a lot that you need to know, other than the fact that the model did indeed converge, and had no trouble doing it.
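If you want to confirm this from the fitted object itself, the iteration count and a convergence flag are stored as components of the glm object:

model$iter        # number of Fisher scoring iterations (6 here)
model$converged   # TRUE if the algorithm converged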
Information Criteria
The Akaike Information Criterion (AIC) provides a method for assessing the quality of your model through comparison of related models. It's based on the Deviance, but penalizes you for making the model more complicated. Much like adjusted R-squared, its intent is to prevent you from including irrelevant predictors.
However, unlike adjusted R-squared, the number itself is not meaningful. If you have two or more similar candidate models (where all of the variables of the simpler model occur in the more complex models), then you should select the model that has the smallest AIC.
So it’s useful for comparing models, but isn’t interpretable on its own.
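As a rough sketch of how such a comparison works (the weight-only model below is hypothetical and not part of the original post), you can fit a simpler nested model and let AIC() line the two up:

# A simpler candidate model using weight only
model_wt <- glm(vs ~ wt, data = mtcars, family = binomial)

# The model with the smaller AIC offers the better fit/complexity trade-off
AIC(model_wt, model)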
Hosmer-Lemeshow Goodness of Fit
How well our model fits depends on the difference between the model and the observed data. One approach for binary data is to implement a Hosmer-Lemeshow goodness of fit test.
To implement this test, first install the ResourceSelection package, as follows.
install.packages("ResourceSelection")
Then load the package using the library() function. The test is available through the hoslem.test() function.
library(ResourceSelection)
hoslem.test(mtcars$vs, fitted(model))

        Hosmer and Lemeshow goodness of fit (GOF) test

data:  mtcars$vs, fitted(model)
X-squared = 6.4717, df = 8, p-value = 0.5945
Our model appears to fit well because we have no significant difference between the model and the observed data (i.e. the p-value is above 0.05).
As with all measures of model fit, we’ll use this as just one piece of information in deciding how well this model fits. It doesn’t work well in very large or very small data sets, but is often useful nonetheless.
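One simple robustness check, sketched below, is to rerun the test with a different number of groups (the g argument of hoslem.test(), which defaults to 10) and see whether the conclusion changes:

# Rerun the Hosmer-Lemeshow test with 5 groups instead of the default 10
hoslem.test(mtcars$vs, fitted(model), g = 5)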
That wasn’t so hard! In our next article, we will plot our model.
About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.
See our full R Tutorial Series and other blog posts regarding R programming.
Nivedan I.B says
Hello!
You said, deviance is a measure of goodness of fit of a generalized linear model. What do you exactly mean by “fit”?
Looking forward to hearing from you!
Regards
Nivedan I.B
Matt says
Hi Mr. Lillis,
your description of “deviance” helped me understand it a bit better, but one question still comes up: how can I interpret the decrease from the null deviance when adding independent variables (the residual deviance)?
Does it mean the model with independent variables fits better than the null model because of the lower value?
Thank you for your response in advance
chch says
The difference between the null deviance and the residual deviance should be chi-squared distributed with 2 degrees of freedom in this case (df of the null model minus df of the model with more parameters).
In this case it's even highly significant (see the chi-squared table linked below).
https://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm