Regression models

How to Reduce the Number of Variables to Analyze

July 10th, 2019 by Christos Giannoulis

Many data sets contain well over a thousand variables. Such complexity, the speed of contemporary desktop computers, and the ease of use of statistical analysis packages can encourage ill-directed analysis.

It is easy to generate a vast array of poor ‘results’ by throwing everything into your software and waiting to see what turns up. (more…)


Confusing Statistical Terms #11: Confounder

June 26th, 2019 by

What is a Confounder?

Confounder (also called confounding variable) is one of those statistical terms that confuse a lot of people. Not because it represents a confusing concept, but because of how it’s used.

(Well, it’s a bit of a confusing concept, but that’s not the worst part.)

It has slightly different meanings to different types of researchers. The definition is essentially the same, but the research context can have specific implications for how that definition plays out.

If the person you’re talking to has a different understanding of what it means, you’re going to have a confusing conversation.

Let’s take a look at some examples to unpack this.

(more…)


What Is a Hazard Function in Survival Analysis?

April 29th, 2019 by

One of the key concepts in Survival Analysis is the Hazard Function.

But as with a lot of concepts in Survival Analysis, the statistical meaning of “hazard” is similar to, but not exactly the same as, its meaning in everyday English. Since it’s so important, though, let’s take a look. (more…)


Regression Diagnostics in Generalized Linear Mixed Models

March 25th, 2019 by

What are the best methods for checking a generalized linear mixed model (GLMM) for proper fit?

This question comes up frequently.

Unfortunately, it isn’t as straightforward as it is for a general linear model.

In linear models the requirements are easy to outline: linear in the parameters, normally distributed and independent residuals, and homogeneity of variance (that is, similar variance at all values of all predictors).
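As a rough illustration of those linear model requirements, here is a minimal sketch in Python using only NumPy. The data are simulated, and the variable names and the particular checks (a spread ratio, a lag-1 autocorrelation, and sample skewness and kurtosis) are assumptions chosen for illustration, not a prescribed diagnostic routine.

```python
import numpy as np

# Simulated data (hypothetical): one predictor, a linear signal,
# and independent normal noise with constant variance.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

# Ordinary least squares fit using the design matrix [1, x].
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Homogeneity of variance (rough check): compare residual spread in the
# lower and upper halves of the fitted values. A ratio near 1 is reassuring.
order = np.argsort(fitted)
low, high = resid[order[: n // 2]], resid[order[n // 2:]]
spread_ratio = high.std(ddof=1) / low.std(ddof=1)

# Independence (rough check): lag-1 autocorrelation of the residuals
# in data order. Values near 0 are reassuring.
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# Normality (rough check): sample skewness and excess kurtosis
# of the standardized residuals. Both should be near 0.
z = (resid - resid.mean()) / resid.std()
skew = np.mean(z ** 3)
excess_kurt = np.mean(z ** 4) - 3.0

print(f"slope={beta[1]:.3f}  spread_ratio={spread_ratio:.3f}")
print(f"lag1={lag1:.3f}  skew={skew:.3f}  excess_kurt={excess_kurt:.3f}")
```

In practice these numeric summaries are usually paired with plots (residuals against fitted values, a normal Q-Q plot), which tend to reveal patterns a single number can hide.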

(more…)


Recoding a Variable from a Survey Question to Use in a Statistical Model

March 18th, 2019 by

Survey questions are often structured without regard for ease of use within a statistical model.

Take, for example, a survey done by the Centers for Disease Control and Prevention (CDC) regarding births in the U.S. One of the variables in the data set is “interval since last pregnancy”. Here is a histogram of the results.

(more…)


How to Decide Between Multinomial and Ordinal Logistic Regression Models

March 11th, 2019 by

A great tool to have in your statistical tool belt is logistic regression.

It comes in many varieties and many of us are familiar with the variety for binary outcomes.

But multinomial and ordinal varieties of logistic regression are also incredibly useful and worth knowing.

They can be tricky to decide between in practice, however. In some, but not all, situations you (more…)