Regression models

What is Multicollinearity? A Visual Description

November 20th, 2019

Multicollinearity is one of those terms in statistics that is often defined in one of two ways:

1. In very mathematical terms that make no sense. I mean, what is a linear combination anyway?

2. Completely oversimplified in order to avoid the mathematical terms — it’s a high correlation, right?

So what is it really? In English?

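The teaser hints that multicollinearity is more than "a high correlation." A minimal sketch on simulated data (an assumption, not from the post): four unrelated predictors plus a fifth that is almost exactly their sum, i.e. a linear combination of them. It uses the variance inflation factor, VIF = 1 / (1 - R²) from regressing one predictor on all the others, a standard diagnostic that the excerpt itself does not mention. All names are hypothetical.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(y, predictors):
    """R^2 from an OLS fit (normal equations) of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(1)
n = 1000
# Four unrelated (independently simulated) predictors...
xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
# ...and a fifth that is nearly their exact sum: a linear combination plus tiny noise.
x5 = [sum(col[i] for col in xs) + rng.gauss(0, 0.1) for i in range(n)]

max_r = max(abs(pearson_r(x5, col)) for col in xs)
vif_x5 = 1 / (1 - r_squared(x5, xs))
print(f"largest |r| between x5 and another predictor: {max_r:.2f}")
print(f"VIF for x5: {vif_x5:.0f}")
```

No single pairwise correlation with x5 is large (each is roughly 0.5 in theory), yet the VIF runs into the hundreds because the other four predictors jointly reproduce x5 almost perfectly.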


Member Training: Practical Advice for Establishing Reliability and Validity

October 30th, 2019

How do you know your variables are measuring what you think they are? And how do you know they’re doing it well?

A key part of answering these questions is establishing reliability and validity of the measurements that you use in your research study. But the process of establishing reliability and validity is confusing. There are a dizzying number of choices available to you.



R-Squared for Mixed Effects Models

August 21st, 2019

When learning about linear models (that is, regression, ANOVA, and similar techniques), we are taught to calculate an R². The R² has the following useful properties:

  • The range is limited to [0,1], so we can easily judge how relatively large it is.
  • It is standardized, meaning its value does not depend on the scale of the variables involved in the analysis.
  • The interpretation is pretty clear: It is the proportion of variability in the outcome that can be explained by the independent variables in the model.

The calculation of the R² is also intuitive, once you understand the concepts of variance and prediction.
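To make the variance-and-prediction idea concrete, here is a minimal sketch of the familiar R² for an ordinary one-predictor linear regression (not any of the mixed-model versions the article goes on to discuss). R² is 1 minus the share of the outcome's variability left unexplained by the fitted line. The data values are made up for illustration, and rescaling the outcome shows the "standardized" property from the list.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Proportion of the variability in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    # Total variability: squared distances from the mean (no predictor at all).
    ss_total = sum((b - my) ** 2 for b in y)
    # Unexplained variability: squared distances from the model's predictions.
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_resid / ss_total

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
r2 = r_squared(x, y)
# Same outcome expressed in different units: R^2 does not change.
r2_rescaled = r_squared(x, [v * 1000 for v in y])
print(round(r2, 4), round(r2_rescaled, 4))
```

With an intercept in the model, the residual sum of squares can never exceed the total, which is why this R² stays within [0, 1].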


Member Training: Elements of Experimental Design

August 1st, 2019

Whether or not you run experiments, there are elements of experimental design that affect how you need to analyze many types of studies.

The most fundamental of these are replication, randomization, and blocking. These key design elements come up in studies under all sorts of names: trials, replicates, multi-level nesting, repeated measures. Any data set that requires mixed or multilevel models has some of these design elements.
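One common way replication, randomization, and blocking fit together is a randomized complete block design: every treatment appears in every block (repeated as many times as needed), and the order is shuffled separately within each block. A minimal sketch; the function name and the block and treatment labels are hypothetical, not taken from the training.

```python
import random

def randomized_block_design(blocks, treatments, replicates=1, seed=None):
    """Hypothetical helper: give each block every treatment `replicates` times, in random order."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        units = list(treatments) * replicates  # replication: each treatment repeated
        rng.shuffle(units)                     # randomization: order shuffled...
        plan[block] = units                    # blocking: ...only within this block
    return plan

plan = randomized_block_design(
    ["field A", "field B", "field C"],
    ["control", "low", "high"],
    replicates=2,
    seed=7,
)
for block, units in plan.items():
    print(block, units)
```

Because the shuffle never crosses block boundaries, differences between blocks can later be modeled (for example, as a random effect) rather than left to inflate the error term.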


Linear Regression for an Outcome Variable with Boundaries

July 22nd, 2019

The following statement might surprise you, but it’s true.

To run a linear model, you don’t need an outcome variable Y that’s normally distributed. Instead, you need a dependent variable that is:

  • Continuous
  • Unbounded
  • Measured on an interval or ratio scale

The normality assumption is about the errors in the model, which have the same distribution as Y|X, just centered at zero. It's absolutely possible to have a skewed distribution of Y and a normal distribution of errors because of the effect of X.
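A quick simulation illustrates the point. This is a minimal sketch under assumed conditions: X is drawn from a right-skewed exponential distribution, and Y = 2 + 3X plus normal error, so Y|X is normal even though Y itself is not. The coefficients and sample size are arbitrary choices for illustration.

```python
import random

def skewness(v):
    """Sample skewness: the average cubed z-score."""
    n = len(v)
    m = sum(v) / n
    sd = (sum((a - m) ** 2 for a in v) / n) ** 0.5
    return sum(((a - m) / sd) ** 3 for a in v) / n

rng = random.Random(42)
n = 2000
# A right-skewed predictor drives most of the variation in Y.
x = [rng.expovariate(1.0) for _ in range(n)]
# Errors are normal, so Y|X is normal; Y alone inherits X's skew.
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]

# Fit the line by ordinary least squares.
mx, my = sum(x) / n, sum(y) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print(f"skewness of Y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

Y comes out clearly right-skewed, while the residuals, which estimate the errors, are close to symmetric. Checking the residuals, not Y, is what tests the normality assumption.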


How to Reduce the Number of Variables to Analyze

July 10th, 2019

by Christos Giannoulis

Many data sets contain well over a thousand variables. Such complexity, the speed of contemporary desktop computers, and the ease of use of statistical analysis packages can encourage ill-directed analysis.

It is easy to generate a vast array of poor ‘results’ by throwing everything into your software and waiting to see what turns up.