independent variable

Member Training: Confusing Statistical Terms

February 28th, 2020 by guest contributor

Learning statistics is difficult enough; throw in some especially confusing terminology and it can feel impossible! There are many ways that statistical language can be confusing.

Some terms mean one thing in the English language, but have another (usually more specific) meaning in statistics.


How to Reduce the Number of Variables to Analyze

July 10th, 2019 by Christos Giannoulis


Many data sets contain well over a thousand variables. Such complexity, the speed of contemporary desktop computers, and the ease of use of statistical analysis packages can encourage ill-directed analysis.

It is easy to generate a vast array of poor ‘results’ by throwing everything into your software and waiting to see what turns up.


To Moderate or to Mediate?

May 21st, 2018 by Christos Giannoulis

We get many questions from clients who use the terms mediator and moderator interchangeably.

They are easy to confuse, yet mediation and moderation are two distinct terms that require distinct statistical approaches.

The key difference between the two: a moderator tells you when an association will occur, while a mediator tells you how or why it occurs.
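To make the distinction concrete, here is a minimal simulated sketch, not a full analysis. It assumes a linear setup and uses only NumPy; the variable names (`x`, `w`, `m`), the helper `ols`, and every effect size are hypothetical choices made for illustration. Moderation shows up as an interaction term in a single regression, while mediation is traced through two regressions whose path coefficients multiply into an indirect effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

x = rng.normal(size=n)

# Moderation: the size of the x -> y association depends on the level of w.
w = rng.integers(0, 2, size=n)                    # hypothetical binary moderator
y_mod = 0.2 * x + 0.8 * x * w + rng.normal(size=n)
interaction = ols(y_mod, x, w, x * w)[3]          # coefficient on the x*w term

# Mediation: x affects m, and m in turn affects y.
m = 0.5 * x + rng.normal(size=n)                  # path a (simulated as 0.5)
y_med = 0.7 * m + 0.1 * x + rng.normal(size=n)    # path b (0.7) plus a direct effect (0.1)
a = ols(m, x)[1]
b = ols(y_med, m, x)[1]
direct = ols(y_med, m, x)[2]
total = ols(y_med, x)[1]
indirect = a * b                                  # the "how or why" part of the effect

print("interaction (moderation):", interaction)
print("indirect effect (mediation):", indirect)
print("total vs direct + indirect:", total, direct + indirect)
```

The interaction coefficient answers the "when" question (the slope of `x` differs across levels of `w`), while the product `a * b` answers the "how" question. In a real analysis you would also want standard errors or a bootstrap for the indirect effect, which this sketch leaves out.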


The Distribution of Independent Variables in Regression Models

January 19th, 2010 by Karen Grace-Martin

Regression models carry a number of distributional assumptions, but none of them concern the distribution of the predictor (i.e. independent) variables.

It’s because regression models are directional. In a correlation, there is no direction–Y and X are interchangeable. If you switched them, you’d get the same correlation coefficient.
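A quick way to see this is with simulated data. The sketch below assumes NumPy and made-up values for the slope and noise; it is only an illustration. Swapping X and Y leaves the correlation unchanged, but the two regression slopes are different numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, size=1000)
y = 3.0 * x + rng.normal(0, 2, size=1000)   # hypothetical relationship

# Correlation is symmetric: the order of the variables does not matter.
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression is directional: the slope depends on which variable is the outcome.
slope_y_on_x = np.polyfit(x, y, 1)[0]
slope_x_on_y = np.polyfit(y, x, 1)[0]

print("r(x, y):", r_xy, " r(y, x):", r_yx)
print("slope of y on x:", slope_y_on_x)
print("slope of x on y:", slope_x_on_y)
```

The two slopes are still tied together: their product equals the squared correlation. But each one answers a different question about a different outcome variable.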

But regression is inherently a model about the outcome variable. What predicts its value and how well? The nature of how predictors relate to it (more…)


Series on Confusing Statistical Terms

December 3rd, 2009 by Karen Grace-Martin

One of the biggest challenges in learning statistics and data analysis is learning the lingo. It doesn’t help that half of the notation is in Greek (literally).

The terminology in statistics is particularly confusing because often the same word or symbol is used to mean completely different concepts.

I know it feels that way, but it really isn’t a master plot by statisticians to keep researchers feeling ignorant.

Really.

It’s just that a lot of the methods in statistics were created by statisticians working in different fields–economics, psychology, medicine, and yes, straight statistics. Certain fields often have specific types of data that come up a lot and that require specific statistical methodologies to analyze.

Economics needs time series, psychology needs factor analysis. Et cetera, et cetera.

But separate fields developing statistics in isolation has some ugly effects.

Sometimes different fields develop the same technique, but use different names or notation.

Other times different fields use the same name or notation for different techniques they developed.

And of course, there are those terms with slightly different names, often used in similar contexts, but with different meanings. These are never used interchangeably, but they’re easy to confuse if you don’t use this stuff every day.

And sometimes, there are different terms for subtly different concepts, but people use them interchangeably. (I am guilty of this myself.) It’s not a big deal if you understand those subtle differences. But if you don’t, it’s a mess.

And it’s not just fields–it’s software, too.

SPSS uses different names for the exact same thing in different procedures. In GLM, a continuous independent variable is called a Covariate. In Regression, it’s called an Independent Variable.

Likewise, SAS has a Repeated statement in its GLM, Genmod, and Mixed procedures. They all get at the same concept there (repeated measures), but they deal with it in drastically different ways.

So by the time the fields come together and realize they’re all doing the same thing, people in different fields, or using different software procedures, are already used to their own terminology. So we’re stuck with different versions of the same word or method.

So anyway, I am beginning a series of blog posts to help clear this up. Hopefully it will be a good reference you can come back to when you get stuck.

We’ve expanded on this list with a member training, if you’re interested.

If you have good examples, please post them in the comments. I’ll do my best to clear things up.

Why Statistics Terminology is Especially Confusing

Confusing Statistical Term #1: Independent Variable

Confusing Statistical Terms #2: Alpha and Beta

Confusing Statistical Term #3: Levels

Confusing Statistical Terms #4: Hierarchical Regression vs. Hierarchical Model

Confusing Statistical Term #5: Covariate

Confusing Statistical Term #6: Factor

Same Statistical Models, Different (and Confusing) Output Terms

Confusing Statistical Term #7: GLM

Confusing Statistical Term #8: Odds

Confusing Statistical Term #9: Multiple Regression Model and Multivariate Regression Model

Confusing Statistical Term #10: Mixed and Multilevel Models

Confusing Statistical Terms #11: Confounder

Six terms that mean something different statistically and colloquially

Confusing Statistical Term #13: MAR and MCAR Missing Data


The Distribution of Independent Variables in Regression Models

April 9th, 2009 by Karen Grace-Martin

I often hear concern about the non-normal distributions of independent variables in regression models, and I am here to ease your mind.

There are NO assumptions in any linear model about the distribution of the independent variables. Yes, you only get meaningful parameter estimates from nominal (unordered categories) or numerical (continuous or discrete) independent variables. But no, the model makes no assumptions about them. They do not need to be normally distributed or continuous.
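If you want to convince yourself, a small simulation helps. This is a sketch under stated assumptions: the data are simulated, the coefficients (0.5 and 2.0) and variable names are invented for illustration, and the errors are drawn as normal on purpose. One predictor is strongly right-skewed and the other is a 0/1 dummy, yet ordinary least squares recovers the slopes and the residuals look symmetric.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

x_skewed = rng.exponential(scale=2.0, size=n)   # strongly right-skewed predictor
group = rng.integers(0, 2, size=n)              # binary (dummy-coded) predictor
errors = rng.normal(0, 1, size=n)               # the assumptions concern the errors
y = 1.0 + 0.5 * x_skewed + 2.0 * group + errors

# Fit y on an intercept and the two predictors by least squares.
X = np.column_stack([np.ones(n), x_skewed, group])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

def skewness(v):
    """Sample skewness: mean of the cubed standardized values."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

print("estimated coefficients:", coef)
print("skewness of predictor:", skewness(x_skewed))
print("skewness of residuals:", skewness(residuals))
```

The point of the comparison is that the skewness lives in the predictor, not in the residuals, and it is the residuals that the model's distributional assumptions are about.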

It is useful, however, to understand the distribution of predictor variables to find influential outliers or concentrated values. A highly skewed independent variable may be made more symmetric with a transformation.
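As a small illustration of that last point, the sketch below simulates a right-skewed predictor and applies a log transformation. The variable `income` and its lognormal distribution are hypothetical, chosen because a log transform suits that shape especially well; real data will rarely be this tidy.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical right-skewed predictor (simulated, strictly positive).
income = rng.lognormal(mean=10.0, sigma=1.0, size=5000)
log_income = np.log(income)

def skewness(v):
    """Sample skewness: mean of the cubed standardized values."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

print("skewness before transformation:", skewness(income))
print("skewness after log transformation:", skewness(log_income))
```

A log transformation only works for strictly positive values, and it changes the interpretation of the coefficient, so it is a modeling choice rather than a requirement.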
