It was Yogi Berra who offered the sage advice, “If you come to a fork in the road, take it.”
When you need to fit a regression model to survival data, you have to take a fork in the road. One road asks you to make a distributional assumption about your data and the other does not.
Parametric Survival Analysis Models
Parametric models for survival data don’t work well with the normal distribution, because a normally distributed variable can take any value, including negative ones.
Parametric survival analysis models typically require a distribution defined only for non-negative values, because if you have negative survival times in your study, it is a sign that the zombie apocalypse has started (Wheatley-Price 2012).
The distributions that work well for survival data include the exponential, Weibull, gamma, and lognormal distributions among others.
Which distribution you choose will affect the shape of the model’s hazard function. You can choose the one that best matches your a priori beliefs about the hazard function or you can compare different parametric models and choose among them using a criterion like AIC.
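To make the link between distribution and hazard shape concrete, here is a minimal sketch that evaluates the hazard function for three of the distributions named above. The function names and parameter values are illustrative choices, not taken from any particular library.

```python
import math

def exponential_hazard(t, rate):
    """Exponential hazard: constant, regardless of t."""
    return rate

def weibull_hazard(t, shape, scale):
    """Weibull hazard: rises if shape > 1, falls if shape < 1, flat if shape == 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def lognormal_hazard(t, mu, sigma):
    """Lognormal hazard: density divided by the survival function."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2))
    return density / survival

# Illustrative time points and parameter values (assumptions for this sketch).
for t in [0.1, 0.5, 1.0, 2.0, 8.0]:
    print(
        f"t={t:4.1f}",
        f"exponential={exponential_hazard(t, rate=0.5):.3f}",
        f"weibull(shape=1.5)={weibull_hazard(t, shape=1.5, scale=2.0):.3f}",
        f"weibull(shape=0.5)={weibull_hazard(t, shape=0.5, scale=2.0):.3f}",
        f"lognormal={lognormal_hazard(t, mu=0.0, sigma=1.0):.3f}",
    )
```

With these values, the Weibull hazard is monotone in one direction or the other, while the lognormal hazard rises and then falls. That difference is the kind of a priori belief the choice of distribution encodes.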
Having to choose a reasonable distribution is the biggest challenge in running parametric models.
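One way to approach that choice is the AIC comparison mentioned above. The sketch below fits an exponential and a Weibull model to right-censored data by maximum likelihood and computes AIC for each. The data are made up for illustration, the helper names are hypothetical, and the Weibull shape is found by simple bisection on the profile score equation rather than by any particular package's optimizer.

```python
import math

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
times = [1.2, 2.5, 2.9, 3.4, 4.1, 4.8, 5.5, 6.3, 7.0, 8.2]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]

def fit_exponential(times, events):
    """Closed-form MLE: rate = number of events / total follow-up time."""
    d = sum(events)
    rate = d / sum(times)
    loglik = d * math.log(rate) - rate * sum(times)
    return rate, loglik

def weibull_loglik(times, events, shape, scale):
    """Events contribute the log density; censored times the log survival."""
    total = 0.0
    for t, e in zip(times, events):
        if e:
            total += math.log(shape) - shape * math.log(scale) + (shape - 1) * math.log(t)
        total -= (t / scale) ** shape
    return total

def fit_weibull(times, events):
    """Solve the profile score equation for the shape by bisection."""
    d = sum(events)
    mean_log_event = sum(math.log(t) for t, e in zip(times, events) if e) / d

    def score(k):
        s0 = sum(t ** k for t in times)
        s1 = sum(t ** k * math.log(t) for t in times)
        return s1 / s0 - 1.0 / k - mean_log_event  # increasing in k

    lo, hi = 0.01, 50.0  # assumed search bracket for the shape
    for _ in range(200):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2
    scale = (sum(t ** shape for t in times) / d) ** (1 / shape)
    return shape, scale, weibull_loglik(times, events, shape, scale)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

rate, ll_exp = fit_exponential(times, events)
shape, scale, ll_weibull = fit_weibull(times, events)
aic_exp = aic(ll_exp, 1)
aic_weibull = aic(ll_weibull, 2)
print(f"exponential: rate={rate:.3f}, AIC={aic_exp:.2f}")
print(f"Weibull: shape={shape:.3f}, scale={scale:.3f}, AIC={aic_weibull:.2f}")
```

The model with the lower AIC is preferred. Because the exponential is a Weibull with shape 1, the Weibull log-likelihood can never be lower; AIC charges it for the extra parameter.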
Semi-Parametric Survival Analysis Model: Cox Regression
The alternative fork leaves the baseline hazard function unspecified and lets the data determine it. This is referred to as a semi-parametric approach because, while the baseline hazard is estimated non-parametrically, the effect of the covariates on the hazard takes a parametric form.
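In symbols, the semi-parametric model is usually written as follows. The notation is an assumption introduced here for illustration: h_0(t) is the unspecified baseline hazard, x_1, ..., x_p are the covariates, and the betas are the regression coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_p x_p)

\frac{h(t \mid x_a)}{h(t \mid x_b)}
  = \exp\!\left(\beta_1 (x_{a1} - x_{b1}) + \cdots + \beta_p (x_{ap} - x_{bp})\right)
```

The first line shows the two pieces: a baseline hazard with no assumed shape and a parametric term for the covariates. The second line shows that the ratio of hazards for two subjects does not depend on t, because the baseline hazard cancels.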
The semi-parametric model relies on some very clever partial likelihood calculations by Sir David Cox in 1972 and the method is often called Cox regression in his honor. It is also often referred to as proportional hazards regression to highlight a major assumption of this model.
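A minimal sketch of the partial likelihood idea, assuming a single covariate and no tied event times: at each event time, the subject who had the event is compared with everyone still at risk, and the baseline hazard cancels out of that comparison. The data and function names below are hypothetical, and the coefficient is found with a plain Newton-Raphson iteration rather than any package's fitting routine.

```python
import math

# Hypothetical data: times, event indicators (1 = event, 0 = censored),
# and one binary covariate (for example, 1 = treatment group).
times = [2, 3, 5, 6, 8, 9, 12, 15]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]

def partial_loglik_derivs(beta, times, events, x):
    """Partial log-likelihood with its first and second derivatives in beta."""
    loglik = score = hess = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects only appear in risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(w_j * x[j] for w_j, j in zip(w, risk))
        s2 = sum(w_j * x[j] ** 2 for w_j, j in zip(w, risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return loglik, score, hess

def fit_cox(times, events, x, iterations=25):
    """Newton-Raphson on the partial log-likelihood, starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        _, score, hess = partial_loglik_derivs(beta, times, events, x)
        beta -= score / hess
    return beta

beta_hat = fit_cox(times, events, x)
print(f"log hazard ratio: {beta_hat:.3f}, hazard ratio: {math.exp(beta_hat):.3f}")
```

Notice that the baseline hazard never appears in the code. That is what lets the method estimate the covariate effect without a distributional assumption.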
Cox regression is a much more popular choice than parametric regression, because the non-parametric estimate of the baseline hazard function offers you much greater flexibility than most parametric approaches.
Nevertheless, a parametric model, if it is the correct parametric model, does offer some advantages.
A parametric model will provide somewhat greater efficiency, because you are estimating fewer parameters. It also provides you with the ability to extrapolate beyond the range of the data. Finally, if the parametric model matches some underlying mechanism associated with your data, you end up with more relevant interpretations of your model.
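As a small illustration of the extrapolation point, the sketch below fits an exponential model to hypothetical right-censored data and evaluates the fitted survival function at twice the longest observed follow-up time. The data and the choice of horizon are assumptions for this example, and the extrapolated value is only as trustworthy as the distributional assumption behind it.

```python
import math

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
times = [1.2, 2.5, 2.9, 3.4, 4.1, 4.8, 5.5, 6.3, 7.0, 8.2]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]

# Exponential MLE: number of events divided by total follow-up time.
rate = sum(events) / sum(times)

def survival(t):
    """Fitted exponential survival function S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)

last_observed = max(times)
horizon = 2 * last_observed  # beyond the range of the data

print(f"S({last_observed}) = {survival(last_observed):.3f}")
print(f"S({horizon}) = {survival(horizon):.3f}  (extrapolated)")
```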
References: Wheatley-Price P, Hutton B, Clemons M. The Mayan Doomsday’s effect on survival outcomes in clinical trials. CMAJ. 2012 Dec 11; 184(18): 2021–2022. doi: 10.1503/cmaj.121616.