by Kim Love and Karen Grace-Martin
Statistics terminology is confusing.
Sometimes different terms are used to mean the same thing, often in different fields of application. Sometimes the same term is used to mean different things. And sometimes very similar terms are used to describe related but distinct statistical concepts.
But the terms that cause the most trouble when communicating with non-researchers are those whose colloquial meaning in English differs from their technical definition in statistics. This is particularly difficult because the two definitions are often similar, but not identical.
Let’s take a look at six of these.
1. Significance
This is, for sure, the big one. You’re probably familiar with the difference between statistical significance, generally indicating a p-value that is below a threshold, and the colloquial meaning of large or important.
A significant other is important. A significant raise is large. A statistically significant difference may be neither. The term has been so widely misunderstood that many statisticians are calling for its demise.
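To see how that can happen, here’s a quick Python sketch (the group sizes and means are made up purely for illustration): with a huge sample, even a trivially small difference between two groups can produce a very small p-value.

```python
# Minimal sketch (illustrative numbers only): with a large enough sample,
# even a tiny difference between two group means yields a small p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups whose true means differ by only 0.02 standard deviations
group_a = rng.normal(loc=100.00, scale=1.0, size=200_000)
group_b = rng.normal(loc=100.02, scale=1.0, size=200_000)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"mean difference: {group_b.mean() - group_a.mean():.3f}")  # tiny
print(f"p-value: {p_value:.2g}")  # typically far below 0.05
```

The difference is statistically significant, but it is neither large nor (in most contexts) important.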
2. Odds
In everyday English, people use the terms Odds and Probability interchangeably. In statistics, they measure the same general construct – how likely an event is to occur – but on different scales. That difference in scale has a huge impact on how you interpret the value.
Odds measure the probability of an outcome relative to the probability that the outcome doesn’t occur: p/(1-p). Odds range from zero to infinity, and a value of 1 means the outcome is as likely to occur as not.
Probability is just the numerator, p. Probabilities range from zero to one, and a value of 0.5 means the outcome is as likely to occur as not.
So while you can easily convert back and forth, an odds of .8 means something very different from a probability of .8.
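Here’s a small sketch of the conversion, using the p/(1-p) formula above (the specific values are just for illustration):

```python
# Minimal sketch of converting between probability and odds,
# using odds = p / (1 - p) and p = odds / (1 + odds).

def probability_to_odds(p: float) -> float:
    return p / (1 - p)

def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)

print(probability_to_odds(0.8))   # 4.0   -> a probability of .8 is an odds of 4
print(odds_to_probability(0.8))   # ~0.444 -> an odds of .8 is a probability of about .44
print(probability_to_odds(0.5))   # 1.0   -> equal chances: odds of 1, probability of .5
```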
3. Bias
In colloquial English, bias means prejudice. It’s bad.
Bias usually isn’t a good thing in statistics either, but the term doesn’t carry that inherent value judgment.
Bias is a measure of the difference between the value of a population parameter and the theoretical mean value of a statistic that estimates that parameter.
For example, in a simple linear model, the parameter β1 is the slope of the regression line in the population. Since we don’t know its value, we estimate it by calculating b1, the slope of the regression line in a representative sample. Although b1 won’t have exactly the same value as β1, we expect the average value of b1, across the hundreds of samples we could have taken, to equal β1. Any difference between that theoretical average value of b1 and the true population value is the bias of the estimator.
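Here’s a rough simulation sketch of that idea (the model and numbers are made up for illustration): the sample slope bounces around from sample to sample, but its average across many samples lands very close to the population slope.

```python
# Minimal simulation sketch: b1 varies across samples, but its average
# over many samples is very close to the population slope beta1.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5          # "population" intercept and slope
n_samples, n_obs = 5_000, 50

slopes = []
for _ in range(n_samples):
    x = rng.uniform(0, 10, size=n_obs)
    y = beta0 + beta1 * x + rng.normal(scale=2.0, size=n_obs)
    b1 = np.polyfit(x, y, deg=1)[0]   # least-squares slope for this sample
    slopes.append(b1)

print(f"true beta1:       {beta1}")
print(f"average of b1:    {np.mean(slopes):.3f}")          # close to 0.5
print(f"estimated bias:   {np.mean(slopes) - beta1:+.3f}")  # near zero
```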
Statistical bias can come from using an estimator with a known bias. But since we know what those biases are, in practice it more often comes from an unrepresentative sample.
4. Correlation
In statistics, a correlation is a specific measurement. Yes, there are different correlation coefficients, like Spearman, Pearson, and polychoric, but all share a few characteristics, illustrated in the short sketch after this list:
– They measure the direction and strength of association between two variables
– They range from -1 to 1, with 0 indicating no association
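Here’s a small sketch (with made-up data) computing two of those coefficients on the same variables; both fall between -1 and 1 and agree on the direction of the association.

```python
# Minimal sketch: Pearson and Spearman correlations on the same made-up data.
# Both coefficients fall between -1 and 1; 0 would indicate no association.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)   # positively associated with x

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)
print(f"Pearson r:    {pearson_r:.2f}")
print(f"Spearman rho: {spearman_rho:.2f}")
```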
The colloquial definition is much broader. It can mean any connection, match, or co-occurrence between individual events. For example: “The correlation between the machine’s failure and a loose connection in the joint coupling.”
5. Error
Colloquially, an error is a mistake.
In regression models, an error is the difference between the value of an outcome variable for one individual and the value predicted by the model. There’s no mistake involved here. Just variation.
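Here’s a quick sketch (with made-up data) that fits a simple regression and pulls out one individual’s error: just the gap between the observed and predicted values.

```python
# Minimal sketch: in a fitted regression, each "error" (residual) is simply the
# observed outcome minus the model's predicted value -- variation, not a mistake.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(scale=2.0, size=100)

slope, intercept = np.polyfit(x, y, deg=1)
predicted = intercept + slope * x
residuals = y - predicted            # one "error" per individual

print(f"first individual: observed={y[0]:.2f}, predicted={predicted[0]:.2f}, "
      f"error={residuals[0]:+.2f}")
print(f"average error: {residuals.mean():+.2e}")  # essentially zero by construction
```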
There are also other specific uses of error, such as “standard error,” “sampling error” and “measurement error,” all of which are about variation, not mistakes.
6. Random
The technical definition: “a phenomenon is random if individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions” – Moore and McCabe
And while this is one usage of random in everyday English, it also often means strange or unexpected. For example: “There is a random pineapple in my yard.”
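A quick coin-flip simulation (made up for illustration) shows both halves of the technical definition: any single flip is uncertain, but the proportion of heads settles down over many repetitions.

```python
# Minimal sketch: individual coin flips are unpredictable, but the proportion
# of heads settles near 0.5 as the number of repetitions grows.
import numpy as np

rng = np.random.default_rng(3)
flips = rng.integers(0, 2, size=100_000)   # 0 = tails, 1 = heads

for n in (10, 100, 10_000, 100_000):
    print(f"proportion of heads after {n:>6} flips: {flips[:n].mean():.3f}")
```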