Regression models

Understanding Interaction Between Dummy Coded Categorical Variables in Linear Regression

September 2nd, 2016 by Jeff Meyer

The concept of a statistical interaction is one of those things that seems very abstract. Obtuse definitions, like this one from Wikipedia, don’t help:

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of regression analyses.

First, we know this is true because we read it on the internet! Second, are you more confused now about interactions than you were before you read that definition?
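To make the non-additive idea concrete, here is a minimal Python sketch using made-up data for two dummy-coded predictors, A and B (the variable names and values are hypothetical). A model with both dummies and their product is saturated for a 2×2 design, so its least-squares coefficients can be read directly off the four cell means. The interaction coefficient is the amount by which the effect of A changes when B goes from 0 to 1.

```python
from statistics import mean

# Hypothetical data: outcome y for each combination of two dummy-coded
# predictors A and B (0 = reference category, 1 = other category).
cells = {
    (0, 0): [10.0, 12.0],
    (1, 0): [14.0, 16.0],
    (0, 1): [13.0, 15.0],
    (1, 1): [22.0, 24.0],
}

def interaction_coefficients(cells):
    """Coefficients of y = b0 + b1*A + b2*B + b3*A*B.

    With both dummies and their product the model is saturated for a
    2x2 design, so the OLS fit reproduces the four cell means exactly.
    """
    m = {cell: mean(ys) for cell, ys in cells.items()}
    b0 = m[(0, 0)]                  # mean of the reference cell
    b1 = m[(1, 0)] - m[(0, 0)]      # effect of A when B = 0
    b2 = m[(0, 1)] - m[(0, 0)]      # effect of B when A = 0
    # Change in the effect of A when B switches from 0 to 1
    b3 = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])
    return b0, b1, b2, b3

b0, b1, b2, b3 = interaction_coefficients(cells)
print(b0, b1, b2, b3)  # 11.0 4.0 3.0 5.0
print("reference cell plus both main effects:", b0 + b1 + b2)  # 18.0
print("observed mean for A=1, B=1:", mean(cells[(1, 1)]))      # 23.0
```

Adding just the two main effects to the reference cell (11 + 4 + 3) gives 18 for the A = 1, B = 1 cell; the interaction coefficient of 5 is the gap between that sum and the observed mean of 23.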

Member Training: Cox Regression

September 1st, 2016 by guest contributor

When you have data measuring the time to an event, you can examine the relationship between various predictor variables and the time to the event using a Cox proportional hazards model.

In this webinar, you will learn what a hazard function is and how to interpret increasing, decreasing, and constant hazards. You will then examine the log-rank test, a simple test closely tied to the Kaplan-Meier curve, and the Cox proportional hazards model.
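A hazard function h(t) is the instantaneous event rate at time t among subjects who have not yet had the event, h(t) = f(t)/S(t). The Python sketch below uses the Weibull distribution purely as an illustrative choice (it is not named in the webinar description) because its shape parameter switches the hazard between decreasing (shape < 1), constant (shape = 1), and increasing (shape > 1). The last loop shows the proportional-hazards assumption behind the Cox model: multiplying a baseline hazard by exp(beta * x) keeps the hazard ratio the same at every time point. The value beta = 0.7 is hypothetical.

```python
import math

def weibull_hazard(t, shape, scale=1.0):
    """Closed-form Weibull hazard: (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_survival(t, shape, scale=1.0):
    """S(t) = P(T > t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def numeric_hazard(t, shape, scale=1.0, dt=1e-6):
    """Hazard from its definition: P(event in [t, t+dt) | survived to t) / dt."""
    s_t = weibull_survival(t, shape, scale)
    s_next = weibull_survival(t + dt, shape, scale)
    return (s_t - s_next) / (s_t * dt)

times = [0.5, 1.0, 2.0]
for shape, label in [(0.5, "decreasing"), (1.0, "constant"), (2.0, "increasing")]:
    print(label, [round(weibull_hazard(t, shape), 3) for t in times])

# Proportional hazards: h(t | x) = h0(t) * exp(beta * x)
beta = 0.7  # hypothetical log hazard ratio for a binary predictor x
for t in times:
    h0 = weibull_hazard(t, shape=2.0)   # baseline hazard (x = 0)
    h1 = h0 * math.exp(beta * 1)        # hazard when x = 1
    print(f"t={t}: hazard ratio = {h1 / h0:.3f}")  # same value at every t
```

The Cox model leaves the baseline hazard h0(t) unspecified and estimates only the coefficients, which is why the constant ratio, not the shape of h0(t), is the key assumption.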

Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.

The Difference Between Relative Risk and Odds Ratios

July 11th, 2016 by Audrey Schnell

Relative risks and odds ratios are often confused despite being distinct concepts. Why?

Well, both measure the association between a binary outcome variable and a continuous or binary predictor variable.
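For a binary predictor, both measures come from the same 2×2 table but compare different quantities: relative risk compares probabilities (risks), while the odds ratio compares odds. The minimal Python sketch below uses hypothetical counts to show a well-known consequence: the two are close when the outcome is rare and diverge when it is common. The cell labels a, b, c, d are an assumed convention, not taken from the article.

```python
def relative_risk(a, b, c, d):
    """a, b = events / non-events in the exposed group;
    c, d = events / non-events in the unexposed group."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Same table layout as relative_risk."""
    odds_exposed = a / b
    odds_unexposed = c / d
    return odds_exposed / odds_unexposed

# Hypothetical counts in the order (a, b, c, d)
rare_outcome = (10, 990, 5, 995)   # risks of 1% vs 0.5%
common_outcome = (40, 60, 20, 80)  # risks of 40% vs 20%

for name, table in [("rare", rare_outcome), ("common", common_outcome)]:
    rr = relative_risk(*table)
    or_ = odds_ratio(*table)
    print(f"{name}: RR = {rr:.2f}, OR = {or_:.2f}")
```

In the rare-outcome table both measures are about 2; in the common-outcome table the relative risk is still 2 but the odds ratio is about 2.67.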

Pros and Cons of Treating Ordinal Variables as Nominal or Continuous

July 1st, 2016 by Karen Grace-Martin

There are not a lot of statistical methods designed just for ordinal variables.

But that doesn’t mean that you’re stuck with few options. There are more than you’d think.

Member Training: Working with Truncated and Censored Data

July 1st, 2016 by Jeff Meyer

Statistically speaking, when we see a continuous outcome variable we often worry about outliers and how these extreme observations can impact our model.

But have you ever had an outcome variable with no outliers because there was a boundary value at which accurate measurements couldn’t be or weren’t recorded?

Examples include:

Income data where all values above $100,000 are recorded as $100k or greater
Soil toxicity ratings where the device cannot measure values below 1 ppm
Number of arrests where there are no zeros because the data set came from police records where all participants had at least one arrest

These are all examples of data that are truncated or censored. Failing to account for the truncation or censoring will produce biased estimates.
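To see why ignoring censoring biases results, the hypothetical Python simulation below mimics the income example: "true" incomes are drawn from a lognormal distribution (an assumed distribution with made-up parameters), then every value above $100,000 is recorded as $100,000. The naive mean of the recorded values understates the mean of the true values because each censored observation is pulled down to the cap. Censored-data models (the Tobit model is one well-known example) instead treat a capped value as "at least $100,000" in the likelihood rather than as an exact measurement.

```python
import random
from statistics import mean

random.seed(2016)
CAP = 100_000  # incomes above this are recorded as the cap (top-coding)

# Hypothetical "true" incomes; lognormal parameters are made up for illustration
true_incomes = [random.lognormvariate(11.0, 0.6) for _ in range(10_000)]

# What the data set actually contains after censoring at the cap
recorded = [min(x, CAP) for x in true_incomes]
n_censored = sum(x > CAP for x in true_incomes)

print(f"share censored:          {n_censored / len(true_incomes):.1%}")
print(f"mean of true incomes:    {mean(true_incomes):,.0f}")
print(f"naive mean of recorded:  {mean(recorded):,.0f}")
```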

This webinar will discuss what truncated and censored data are and how to identify them.

There are several different models that are used with this type of data. We will go over each model and discuss the type of data for which it is appropriate.

We will then compare the results of models that account for truncated or censored data to those that do not. From this comparison you will see how choosing the wrong model can affect the results.

Member Training: Zero Inflated Models

June 1st, 2016 by Karen Grace-Martin

A common situation with count outcome variables is that there are a lot of zero values. The Poisson distribution used for modeling count variables takes into account that zeros are often the most common value, but sometimes there are even more zeros than the Poisson distribution can account for.

This can happen in continuous variables as well: most of the distribution follows a beautiful normal distribution, except for the big stack of zeros.

This webinar will explore two ways of modeling zero-inflated data: the Zero Inflated model and the Hurdle model. Both assume there are two different processes: one that affects the probability of a zero and one that affects the actual values, and both allow different sets of predictors for each process.
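The difference between the two models shows up directly in their probability mass functions. The Python sketch below assumes a Poisson count process (one common choice; the parameter names lam, pi, and p_zero are illustrative). In the zero-inflated model, zeros come from two sources: an extra "always zero" process and the Poisson process itself. In the hurdle model, one process decides zero versus positive, and positive counts follow a zero-truncated Poisson. In a fitted model each process would get its own predictors through its own link function; here the parameters are fixed numbers for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the observation is an
    extra zero; otherwise it comes from Poisson(lam), which can also be 0."""
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(k, lam)

def hurdle_pmf(k, lam, p_zero):
    """Poisson hurdle: one process gives P(zero) = p_zero; positive
    counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))

lam = 2.0
print(f"P(0) plain Poisson:          {poisson_pmf(0, lam):.3f}")
print(f"P(0) zero-inflated (pi=0.3): {zip_pmf(0, lam, pi=0.3):.3f}")
print(f"P(0) hurdle (p_zero=0.4):    {hurdle_pmf(0, lam, p_zero=0.4):.3f}")

# Both are proper distributions: probabilities over k sum to 1
print(round(sum(zip_pmf(k, lam, 0.3) for k in range(60)), 6))
print(round(sum(hurdle_pmf(k, lam, 0.4) for k in range(60)), 6))
```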

We’ll explore these models as well as some related models, like Zero-One Inflated Beta models for proportion data.