Regression models

Differences in Model Building Between Explanatory and Predictive Models

October 8th, 2018 by Jeff Meyer

Suppose you are asked to create a model that will predict who will drop out of a program your organization offers. You decide to use a binary logistic regression because your outcome has two values: “0” for not dropping out and “1” for dropping out.

Most of us were trained in building models for the purpose of understanding and explaining the relationships between an outcome and a set of predictors. But model building works differently for purely predictive models. Where do we go from here?
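To make the setup concrete, here is a minimal sketch of a binary logistic regression fitted by plain gradient ascent on the mean log-likelihood. Statistical software normally uses Newton-Raphson (IRLS) instead. The data, the single predictor ("missed sessions"), and the function names `fit_logistic` and `predict_proba` are hypothetical and only for illustration. A purely predictive model would be judged on the predicted probabilities it produces, not on how interpretable its coefficients are.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # The gradient is the residual (y - p) times each regressor, averaged over observations
        resid = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * xi for r, xi in zip(resid, x)) / n
    return b0, b1

def predict_proba(b0, b1, x_new):
    return [sigmoid(b0 + b1 * xi) for xi in x_new]

# Hypothetical data: x = missed sessions, y = 1 if the participant dropped out
x = [0, 1, 1, 2, 3, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
probs = predict_proba(b0, b1, [0, 7])
print(f"b1 = {b1:.2f}; P(drop) at 0 missed = {probs[0]:.2f}, at 7 missed = {probs[1]:.2f}")
```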


What is the Purpose of a Generalized Linear Mixed Model?

September 10th, 2018 by Kim Love

If you are new to using generalized linear mixed effects models, or if you have heard of them but never used them, you might be wondering about the purpose of a GLMM.

Mixed effects models are useful when we have data with more than one source of random variability. For example, an outcome may be measured more than once on the same person (repeated measures taken over time).

When we do that, we have to account for both within-person and across-person variability. A single measure of residual variance can't account for both.
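A minimal sketch of how the two sources of variability separate, using hypothetical balanced data (three measurements on each of four people). It uses the classical one-way random-effects ANOVA (method-of-moments) estimators, not the likelihood-based fitting a mixed-model package would do, and it assumes a continuous, normally distributed outcome (the linear mixed model case that a GLMM generalizes). The function name `variance_components` is hypothetical.

```python
from statistics import mean

def variance_components(groups):
    """Method-of-moments (ANOVA) estimates for balanced one-way random-effects data.

    groups: list of lists, one inner list of repeated measures per person.
    Returns (between_person_var, within_person_var).
    """
    k = len(groups)            # number of people
    n = len(groups[0])         # measures per person (balanced design assumed)
    assert all(len(g) == n for g in groups), "balanced data assumed"
    grand = mean(v for g in groups for v in g)
    person_means = [mean(g) for g in groups]
    # Mean squares between people and within people
    msb = n * sum((m - grand) ** 2 for m in person_means) / (k - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, person_means) for v in g) / (k * (n - 1))
    between = max((msb - msw) / n, 0.0)  # truncate a negative estimate at zero
    return between, msw

# Hypothetical repeated measures: 4 people, 3 occasions each
data = [[10, 12, 11], [20, 19, 21], [15, 14, 16], [25, 27, 26]]
between, within = variance_components(data)
icc = between / (between + within)
print(f"between = {between:.2f}, within = {within:.2f}, ICC = {icc:.2f}")
```

Here almost all of the variability is between people, so lumping everything into one residual variance would badly misdescribe how much a person's repeated measures actually vary.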



The Proportional Hazard Assumption in Cox Regression

August 20th, 2018 by guest contributor

by Steve Simon, PhD

The Cox regression model has a fairly minimal set of assumptions, but how do you check those assumptions and what happens if those assumptions are not satisfied?

Non-proportional hazards

The proportional hazards assumption is so important to Cox regression that we often include it in the name (the Cox proportional hazards model). What it essentially means is that the ratio of the hazards for any two individuals is constant over time: the hazards are proportional. The concept involves logarithms and can feel strange, so in this article, we're going to show you how to tell when you don't have it.
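A small numerical sketch of what "proportional" means, assuming Weibull hazards of the form h(t) = λ k t^(k−1). The Weibull form and all parameter values are illustrative choices, not part of the Cox model itself. When two groups share the shape k, their hazard ratio is the same at every time, so their log hazards differ by a constant. When the shapes differ, the ratio drifts with time and the assumption fails.

```python
import math

def weibull_hazard(t, rate, shape):
    """Weibull hazard h(t) = rate * shape * t**(shape - 1), i.e. S(t) = exp(-rate * t**shape)."""
    return rate * shape * t ** (shape - 1)

times = [0.5, 1.0, 2.0, 4.0]

# Proportional: same shape, different rate, so the ratio is the same at every t
ph = [weibull_hazard(t, 0.2, 1.5) / weibull_hazard(t, 0.1, 1.5) for t in times]

# Non-proportional: different shapes, so the ratio changes over time
non_ph = [weibull_hazard(t, 0.2, 0.8) / weibull_hazard(t, 0.1, 1.5) for t in times]

print("PH ratios:     ", [round(r, 3) for r in ph])
print("non-PH ratios: ", [round(r, 3) for r in non_ph])
print("log hazard differences (PH):", [round(math.log(r), 3) for r in ph])
```

In the non-proportional case the ratio starts above 1 and ends below 1, meaning the hazards themselves cross.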

There are several graphical methods for spotting this violation, but the simplest is an examination of the Kaplan-Meier curves.

If the curves cross, as shown below, then you have a problem.

Likewise, if one curve levels off while the other drops to zero, you have a problem.

Figure 2. Kaplan-Meier curve with only one curve leveling off
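For readers who want to see what goes into those curves, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python, plus a crude check for whether two groups' curves cross at the pooled event times. The data and the function names (`kaplan_meier`, `surv_at`, `curves_cross`) are hypothetical. Looking at the plotted curves is still the check described above; this only automates the "do they cross?" question.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. events[i] is 1 for an observed event, 0 for censored.

    Returns a list of (event_time, survival) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        n_at_t = sum(1 for tt, _ in data if tt == t)
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= n_at_t  # events and censorings at t leave the risk set
        i += n_at_t
    return steps

def surv_at(steps, t):
    """Evaluate the right-continuous KM step function at time t."""
    s = 1.0
    for st, sv in steps:
        if st <= t:
            s = sv
        else:
            break
    return s

def curves_cross(steps_a, steps_b):
    """True if curve A is strictly above B at some event time and strictly below it at another."""
    grid = sorted({t for t, _ in steps_a} | {t for t, _ in steps_b})
    diffs = [surv_at(steps_a, t) - surv_at(steps_b, t) for t in grid]
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)

# Hypothetical data: group A has early events and then levels off; group B keeps dropping
km_a = kaplan_meier([1, 2, 6, 7, 8, 9], [1, 1, 0, 0, 0, 0])
km_b = kaplan_meier([3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 1, 0])
print(km_a)
print(km_b)
print("curves cross:", curves_cross(km_a, km_b))
```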

You can think of non-proportional hazards as an interaction of your independent variable with time. It means that you have to do more work in interpreting your model. If you ignore this problem, you may also experience a serious loss in power.
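One standard way to write this interaction down is sketched below. The time function g(t) is a modeling choice; g(t) = log t is a common one. For a binary covariate x:

```latex
h(t \mid x) = h_0(t)\,\exp\{\beta x + \gamma\, x\, g(t)\}
\quad\Longrightarrow\quad
\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp\{\beta + \gamma\, g(t)\}
```

If γ = 0, the ratio reduces to exp(β) at every t, which is the proportional hazards case. Any γ ≠ 0 makes the hazard ratio a function of time, which is why interpretation takes more work.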

If you have evidence of non-proportional hazards, don’t despair. There are several fairly simple modifications to the Cox regression model that will work for you.

Nonlinear covariate relationships

The Cox model assumes that each variable makes a linear contribution to the log hazard, but sometimes the relationship may be more complex.

You can diagnose this problem graphically using residual plots. The residual in a Cox regression model is not as simple to compute as the residual in linear regression, but you look for the same sort of pattern as in linear regression.
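As one concrete example, martingale residuals are a common choice for checking functional form. A minimal sketch follows, assuming a null model (no covariates, so β = 0 and the cumulative baseline hazard reduces to the Nelson-Aalen estimate). Each subject's residual is its event indicator minus its estimated cumulative hazard. You would then plot the residuals against a candidate covariate, ideally with a smoother, and look for curvature, much as in a linear regression residual plot. The data, the covariate, and the function name are hypothetical; statistical packages also compute these residuals for fitted models with covariates.

```python
import math

def martingale_residuals(times, events, x=None, beta=0.0):
    """Martingale residuals r_i = delta_i - exp(beta * x_i) * H0(t_i), using the
    Breslow estimate of the cumulative baseline hazard H0 at a given (fixed) beta.
    With beta = 0 (the null model), H0 is the Nelson-Aalen estimate."""
    n = len(times)
    x = x if x is not None else [0.0] * n
    risk = [math.exp(beta * xi) for xi in x]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    # Breslow increment at each distinct event time: d_j / sum of risk scores in the risk set
    increments = {}
    for tj in event_times:
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        denom = sum(r for t, r in zip(times, risk) if t >= tj)
        increments[tj] = d_j / denom

    def cum_hazard(t):
        return sum(inc for tj, inc in increments.items() if tj <= t)

    return [e - r * cum_hazard(t) for t, e, r in zip(times, events, risk)]

# Hypothetical data: follow-up time, event indicator, and a candidate covariate (age)
times  = [2, 3, 3, 5, 7, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1, 0, 1]
age    = [71, 66, 50, 62, 45, 58, 40, 55]
resid = martingale_residuals(times, events)
for a, r in sorted(zip(age, resid)):
    print(f"age {a}: residual {r:+.3f}")
```

Martingale residuals are at most 1 and, with the Breslow estimate, sum to zero, which is a handy sanity check on any implementation.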

If you have a nonlinear relationship, you have several options that parallel your choices in a linear regression model.

Lack of independence

Lack of independence is not something that you have to wait to diagnose until your data is collected. Often it is something you are aware of from the start, because certain features of the design, such as centers in a multi-center study, are likely to produce correlated outcomes. These are the same issues that hound you with a linear regression model in a multi-center study.

There are several ways to account for lack of independence, but this is one problem you don't want to ignore. A model that ignores correlated outcomes is invalid, and it will undermine all your confidence intervals and p-values.
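A rough way to gauge the damage, borrowed from survey sampling rather than from Cox regression specifically: with m subjects per center and within-center correlation ρ, Kish's design effect 1 + (m − 1)ρ approximates how much the variance of a mean is understated when clustering is ignored. This is an approximation for equal-sized clusters, and the numbers below are hypothetical. It illustrates the size of the problem, not how a frailty or robust-variance Cox model fixes it.

```python
import math

def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters: variance inflation from within-cluster correlation."""
    return 1 + (cluster_size - 1) * icc

def naive_ci_halfwidth_ratio(cluster_size: int, icc: float) -> float:
    """Approximate factor by which an honest CI is wider than one that ignores clustering."""
    return math.sqrt(design_effect(cluster_size, icc))

# Hypothetical multi-center study: 40 patients per center, varying within-center correlation
for icc in (0.0, 0.02, 0.05, 0.10):
    deff = design_effect(40, icc)
    ratio = naive_ci_halfwidth_ratio(40, icc)
    print(f"ICC={icc:.2f}: design effect {deff:.2f}, honest CI about {ratio:.2f}x as wide")
```

Even a modest correlation inflates the variance substantially when clusters are large, which is why ignoring it makes intervals too narrow and p-values too small.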


Parametric or Semi-Parametric Models in Survival Analysis?

August 13th, 2018 by guest contributor

It was Yogi Berra who offered the sage advice, “When you come to a fork in the road, take it.”

When you need to fit a regression model to survival data, you have to take a fork in the road. One road asks you to make a distributional assumption about your data and the other does not.
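To make the parametric road concrete: if you assume event times follow an exponential distribution (the simplest parametric choice, used here only for illustration), the maximum likelihood estimate of the constant hazard rate under right censoring has a closed form: the number of events divided by the total follow-up time. The semi-parametric Cox model makes no such assumption about the baseline hazard. The data and the function name below are hypothetical.

```python
import math

def exponential_mle(times, events):
    """Rate MLE for right-censored exponential data: events / total time at risk.

    Log-likelihood: sum(events) * log(rate) - rate * sum(times),
    maximized at rate = sum(events) / sum(times).
    """
    d = sum(events)
    total_time = sum(times)
    rate = d / total_time
    se_log_rate = 1 / math.sqrt(d)  # large-sample standard error of log(rate)
    return rate, se_log_rate

times  = [2, 3, 3, 5, 7, 8, 10, 12]  # follow-up time, e.g. in months
events = [1, 1, 0, 1, 0, 1, 0, 1]    # 1 = event observed, 0 = censored
rate, se = exponential_mle(times, events)
print(f"hazard rate = {rate:.4f} per month, median survival = {math.log(2) / rate:.1f} months")
```

The median follows from solving S(t) = exp(−λt) = 0.5. That convenience is the payoff of the distributional assumption, and it is only trustworthy if the assumption holds.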


Six Types of Survival Analysis and Challenges in Learning Them

August 6th, 2018 by Karen Grace-Martin

Survival analysis isn’t just a single model.

It’s a whole set of tests, graphs, and models that are all used in slightly different data and study design situations. Choosing the most appropriate model can be challenging.

In this article I will describe the most common types of tests and models in survival analysis, how they differ, and some challenges to learning them.



What is Survival Analysis and When Can It Be Used?

July 17th, 2018 by guest contributor

by Steve Simon, PhD

Survival models have two defining features.

First is the process of measuring the time in a sample of people, animals, or machines until a specific event occurs. In fact, many people use the term “time to event analysis” or “event history analysis” instead of “survival analysis” to emphasize the broad range of areas where you can apply these techniques.

Second is the recognition that not everyone/everything in your sample will experience the event. Those not experiencing the event, either because the study ended before they had the event or because they were lost to follow-up, are classified as censored observations.
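A minimal sketch of how censored observations are usually recorded, as a (time, event) pair, and why they still carry information. A censored subject counts in the risk set at every time up to its censoring time. Discarding censored records, or treating their censoring times as event times, would lose that information and bias the estimates. The records, the `Record` type, and the `number_at_risk` function are hypothetical.

```python
from typing import NamedTuple

class Record(NamedTuple):
    time: float  # time to the event or to censoring
    event: int   # 1 = event observed, 0 = censored (study ended or lost to follow-up)

def number_at_risk(records, t):
    """Subjects still under observation just before time t (event or censoring at or after t)."""
    return sum(1 for r in records if r.time >= t)

records = [Record(2, 1), Record(3, 0), Record(5, 1), Record(6, 0), Record(9, 1)]

for t in sorted({r.time for r in records if r.event == 1}):
    print(f"t={t}: {number_at_risk(records, t)} at risk")

observed = [r for r in records if r.event == 1]
print(f"{len(records) - len(observed)} of {len(records)} subjects are censored")
```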

