Four Critical Steps in Building Linear Regression Models

Overfitting in Regression Models

August 9th, 2021

The practice of choosing predictors for a regression model, called model building, is an area of real craft.

There are many possible strategies and approaches, and they all work well in some situations. Every one of them requires making a lot of decisions along the way. As you make decisions, one danger to look out for is overfitting: creating a model that is too complex for the data. (more…)
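To make "too complex for the data" concrete, here is a minimal sketch in plain Python. The five data points are made up for illustration, and the two "models" (a straight line and a degree-4 polynomial that passes through every point) are just convenient extremes, not a recommended workflow.

```python
# Hypothetical toy data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def interpolate(xs, ys, x_new):
    """Lagrange polynomial through every point: one parameter per observation."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x_new - xj) / (xi - xj)
        total += term
    return total

intercept, slope = fit_line(xs, ys)

# Error on the data used for fitting: the polynomial looks perfect.
line_sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
poly_sse = sum((y - interpolate(xs, ys, x)) ** 2 for x, y in zip(xs, ys))

# Prediction one step beyond the data: the polynomial swings far outside
# the range of the observed y values, while the line stays close to it.
line_pred = intercept + slope * 6.0
poly_pred = interpolate(xs, ys, 6.0)

print(f"SSE on fitted data: line={line_sse:.2f}, polynomial={poly_sse:.2f}")
print(f"Prediction at x=6:  line={line_pred:.2f}, polynomial={poly_pred:.2f}")
```

The polynomial reproduces the observed points exactly, yet its prediction at a new x value is far from anything seen in the data. That gap between fit and prediction is the symptom to watch for.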


What are Sums of Squares?

January 9th, 2021

A key part of the output in any linear model is the ANOVA table. It has many names in different software procedures, but every regression or ANOVA model has a table with Sums of Squares, degrees of freedom, mean squares, and F tests. Many of us were trained to skip over this table, but

(more…)
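As a rough illustration of what that table contains, the sketch below builds its pieces by hand for a simple one-predictor regression. The data are hypothetical and the function name `anova_table` is ours, not from any statistical package.

```python
# Hypothetical toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def anova_table(x, y):
    """Sums of squares, df, mean squares, and F for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]

    # Total variation splits into a model part and an error part.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_model = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    df_model = 1          # one slope estimated
    df_error = n - 2      # n observations minus intercept and slope
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "df_model": df_model,
        "df_error": df_error,
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f": ms_model / ms_error,
    }

table = anova_table(x, y)
for name, value in table.items():
    print(f"{name:>9}: {value:.3f}")
```

The model and error sums of squares add up to the total, and the F statistic is simply the ratio of the two mean squares.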


When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA

December 18th, 2020

In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.

1. Because she was making you calculate everything by hand.  Sums of squares require a different formula* if sample sizes are unequal, but statistical software will automatically use the right formula. So we’re not too concerned. We’re definitely using software.

2. Nice properties in ANOVA such as the Grand Mean being the intercept in an effect-coded regression model don’t hold when data are unbalanced.  Instead of the grand mean, you need to use a weighted mean.  That’s not a big deal if you’re aware of it. (more…)
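The second point can be checked with a few lines of plain Python. This is only a sketch with made-up groups: it compares the simple average of the group means with the grand mean of all observations, first with equal group sizes and then with unequal ones.

```python
# Hypothetical groups, invented purely for illustration.
balanced = {
    "A": [4.0, 6.0],
    "B": [8.0, 10.0],
    "C": [12.0, 14.0],
}
unbalanced = {
    "A": [4.0, 6.0],
    "B": [8.0, 10.0],
    "C": [12.0, 14.0, 13.0, 13.0, 13.0, 13.0],  # same mean as before, more data
}

def grand_mean(groups):
    """Mean of every observation, ignoring group membership."""
    values = [v for obs in groups.values() for v in obs]
    return sum(values) / len(values)

def mean_of_group_means(groups):
    """Simple (unweighted) average of the group means."""
    means = [sum(obs) / len(obs) for obs in groups.values()]
    return sum(means) / len(means)

def weighted_mean_of_group_means(groups):
    """Average of the group means, weighted by group size."""
    total_n = sum(len(obs) for obs in groups.values())
    return sum(len(obs) * (sum(obs) / len(obs)) for obs in groups.values()) / total_n

for label, groups in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(
        f"{label:>10}: grand mean={grand_mean(groups):.2f}, "
        f"mean of group means={mean_of_group_means(groups):.2f}, "
        f"weighted mean={weighted_mean_of_group_means(groups):.2f}"
    )
```

With equal group sizes all three numbers agree. With unequal sizes the grand mean matches only the size-weighted average, which is why the weighting matters once the data are unbalanced.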


What It Really Means to Remove an Interaction From a Model

September 17th, 2020

When you’re model building, a key decision is which interaction terms to include. And which interactions to remove.

As a general rule, the default in regression is to leave them out. Add interactions only with a solid reason. It would seem like data fishing to simply add in all possible interactions.

And yet, that’s a common practice in most ANOVA models: put in all possible interactions and only take them out if there’s a solid reason. Even many software procedures default to creating interactions among categorical predictors.

(more…)
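One way to see what dropping an interaction does is to look at cell means in a 2x2 design. The sketch below assumes a balanced design and uses made-up cell means. In that balanced case, the fitted values from a main-effects-only model are the row mean plus the column mean minus the grand mean.

```python
# Hypothetical cell means for a balanced 2x2 design, invented for illustration.
cell_means = {
    ("a1", "b1"): 10.0,
    ("a1", "b2"): 14.0,
    ("a2", "b1"): 12.0,
    ("a2", "b2"): 22.0,
}

def effect_of_b(means, a_level):
    """Difference between b2 and b1 at one level of factor A."""
    return means[(a_level, "b2")] - means[(a_level, "b1")]

def additive_fit(means):
    """Fitted cell means from a main-effects-only model (balanced data assumed)."""
    a_levels = sorted({a for a, _ in means})
    b_levels = sorted({b for _, b in means})
    grand = sum(means.values()) / len(means)
    row = {a: sum(means[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    col = {b: sum(means[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}
    return {(a, b): row[a] + col[b] - grand for a in a_levels for b in b_levels}

fitted = additive_fit(cell_means)

# With the interaction in the model, the effect of B can differ across levels of A.
observed_effects = [effect_of_b(cell_means, a) for a in ("a1", "a2")]
# Without it, the model forces the effect of B to be the same at every level of A.
additive_effects = [effect_of_b(fitted, a) for a in ("a1", "a2")]

print("Effect of B at a1, a2 (observed cell means):", observed_effects)
print("Effect of B at a1, a2 (no interaction term):", additive_effects)
```

In the observed cell means the effect of B is 4 at one level of A and 10 at the other. The model without the interaction reports a single shared effect of 7 for both.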


Simplifying a Categorical Predictor in Regression Models

January 14th, 2020

One of the many decisions you have to make when model building is which form each predictor variable should take. One specific version of this decision is whether to combine categories of a categorical predictor.

The greater the number of parameter estimates in a model, the greater the number of observations needed to keep power constant. The parameter estimates in a linear (more…)


What is Multicollinearity? A Visual Description

November 20th, 2019

Multicollinearity is one of those terms in statistics that is often defined in one of two ways:

1. Very mathematical terms that make no sense — I mean, what is a linear combination anyway?

2. Completely oversimplified in order to avoid the mathematical terms — it’s a high correlation, right?

So what is it really? In English?

(more…)
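For a concrete feel for both definitions, here is a small sketch in plain Python with made-up predictors. The third predictor is built as an exact linear combination of the first two (here simply their sum), and the pairwise correlations are computed alongside it.

```python
import math

# Hypothetical predictors, invented purely for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [5, 1, 4, 2, 6, 3]
# x3 is a linear combination of x1 and x2: 1 * x1 + 1 * x2.
x3 = [a + b for a, b in zip(x1, x2)]

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

correlations = {
    "x1,x2": pearson(x1, x2),
    "x1,x3": pearson(x1, x3),
    "x2,x3": pearson(x2, x3),
}

# What is left of x3 once x1 and x2 are accounted for: nothing at all.
leftover = [c - (a + b) for a, b, c in zip(x1, x2, x3)]

for pair, r in correlations.items():
    print(f"r({pair}) = {r:.3f}")
print("x3 minus (x1 + x2):", leftover)
```

None of the pairwise correlations is close to 1, yet x3 carries no information beyond what x1 and x2 already provide. In this toy case, looking only at pairwise correlations would miss the redundancy.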