The Analysis Factor

Volume 6, Issue 10	May 2014

A Note From Karen

Karen Grace-Martin I want to start by thanking Dr. Trent Buskirk of the Marketing Systems Group for his webinar earlier this month on Analyzing Complex Samples. If you missed the webinar, the recording is available for download on our website.

That was part of our regular The Craft of Statistical Analysis Webinar series. That program will be taking a hiatus this summer, and will resume in September.

But not to worry, we still have lots going on. Next up is our workshop on Logistic Regression. As I mentioned last time, we'll be adding Stata examples to the workshop this time through. We're working on Stata examples for a few more workshops as well, and we'll let you know once those are ready.

I'd like to welcome all the new members to our Data Analysis Brown Bag program. The May topic was Cluster Analysis. Although you may have missed the live webinar, remember, you can join anytime during a month to have access to the recording and the open Q&A sessions. There is another one next week.

Happy analyzing!
Karen

Feature Article: Effect Size Statistics in Logistic Regression

Effect size statistics are expected by many journal editors these days.

And if you're running an ANOVA, t-test, or linear regression model, it's pretty straightforward which ones to report.

Things get trickier, though, once you venture into other types of models. Logistic regression, for example.

Many of the common effect size statistics, like eta-squared and Cohen's d, can't be calculated in a logistic regression model. So now what do you use?

Types of Effect Size Statistics

First, it's important to understand what effect size statistics are for and why they're worth reporting.

Joseph Durlak gives a nice description of what effect size statistics generally do:

“…provide information about the magnitude and direction of the difference between two groups or the relationship between two variables.”

There are two types of effect size statistics: standardized and unstandardized.

Standardized statistics have been stripped of all units of measurement. Correlation is a nice example. People like correlation because the strength and direction of any two correlations can be compared, regardless of the units of the variables on which the correlation was measured.

Unstandardized statistics are still measured in the original units of the variables. So a difference between two means and a regression coefficient are both effect size statistics, and both are useful to report.

Most people mean standardized when they say "effect size statistic." But both describe the magnitude and direction of the research findings.

Odds Ratios as Effect Size Statistics

If you're at all familiar with logistic regression, you're also familiar with odds ratios. An odds ratio measures how many times larger the odds of an outcome are at one value of an independent variable (IV) than at another value.

For example, let's say you're doing a logistic regression for an ecology study on whether or not a wetland in a certain area has been infested with a specific invasive plant. Predictors include water temperature in degrees Celsius, altitude, and whether the wetland is a fen or a marsh.

If the odds ratio for water temperature is 1.12, that means that for each one-degree Celsius increase in water temperature, the odds of the wetland having the invasive plant species are multiplied by 1.12 (that is, they are 12% higher), after controlling for the other predictors.
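To make the arithmetic concrete, here is a minimal Python sketch. The coefficient is simply back-calculated from the 1.12 odds ratio in the example; the 20% baseline probability and the 5-degree change are made-up values used only for illustration.

```python
import math

# Logistic regression coefficients are on the log-odds scale;
# exponentiating one gives the odds ratio per one-unit increase.
coef_temp = math.log(1.12)  # assumed coefficient, back-calculated from OR = 1.12
or_per_degree = math.exp(coef_temp)


def odds_multiplier(or_per_unit: float, delta: float) -> float:
    """Factor by which the odds change when the predictor rises by `delta` units."""
    return or_per_unit ** delta


def shift_probability(p: float, multiplier: float) -> float:
    """Apply an odds multiplier to a baseline probability; return the new probability."""
    odds = p / (1 - p)
    new_odds = odds * multiplier
    return new_odds / (1 + new_odds)


# Odds ratios compound multiplicatively: 5 degrees means 1.12 ** 5, not 5 * 1.12.
or_5_degrees = odds_multiplier(or_per_degree, 5)

baseline_p = 0.20  # hypothetical probability of infestation at some temperature
p_plus_one = shift_probability(baseline_p, or_per_degree)

print(f"Odds ratio for a 5-degree increase: {or_5_degrees:.3f}")
print(f"Probability after a 1-degree increase: {p_plus_one:.4f}")
```

Note that the odds ratio multiplies the odds, not the probability. Under these assumed values, the probability moves from 0.20 to about 0.219, not to 0.224.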

That odds ratio is an unstandardized effect size statistic. It tells you the direction and the strength of the relationship between water temperature and the odds that the plant is present.

It's unstandardized because it's based on the units of temperature. I realize no ecologist would do so, but if the water temperature were measured in degrees Fahrenheit, that odds ratio would have a different value. The direction and the strength of the relationship would be the same, but the statistic would be expressed on a different scale.
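As a rough sketch of how the value would change, the conversion can be worked out directly. One degree Fahrenheit is 5/9 of a degree Celsius, so the per-degree odds ratio is raised to the 5/9 power. The 32-degree offset only shifts the intercept and does not affect the slope. The function name below is illustrative.

```python
OR_PER_DEGREE_C = 1.12  # odds ratio per 1 degree Celsius, from the example


def rescale_odds_ratio(or_per_unit: float, new_unit_in_old_units: float) -> float:
    """Odds ratio per one new unit, where one new unit equals
    `new_unit_in_old_units` old units (here, 1 degree F = 5/9 degree C)."""
    return or_per_unit ** new_unit_in_old_units


or_per_degree_f = rescale_odds_ratio(OR_PER_DEGREE_C, 5 / 9)
print(f"Odds ratio per degree Fahrenheit: {or_per_degree_f:.3f}")  # roughly 1.065

# Same relationship, different scale: 1.8 degrees F is 1 degree C,
# so compounding the Fahrenheit odds ratio over 1.8 degrees recovers 1.12.
recovered = or_per_degree_f ** 1.8
print(f"Compounded over 1.8 degrees F: {recovered:.3f}")
```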

Odds Ratios as Standardized Effect Size Statistics

Surprisingly, I've seen odds ratios listed as standardized effect size statistics. A little digging showed those authors were referring to one of two situations.

The first situation is when the predictor is also binary. The odds ratio for whether the wetland was a fen or a marsh can be considered standardized, not because we've removed any units, but because there never were any. So if the odds ratio for fen vs. marsh is 2.3, we know the odds of the invasive plant being in a fen are 2.3 times the odds of it being in a marsh.
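With a binary predictor and a binary outcome, the odds ratio can be read straight off a 2x2 table. The sketch below uses invented counts chosen to land near 2.3. It gives the unadjusted odds ratio, which would match the logistic regression result only if fen vs. marsh were the sole predictor in the model.

```python
# Hypothetical counts of wetlands, for illustration only.
fen_present, fen_absent = 46, 54      # fens with / without the invasive plant
marsh_present, marsh_absent = 27, 73  # marshes with / without the invasive plant


def odds(present: int, absent: int) -> float:
    """Odds of the plant being present: count present divided by count absent."""
    return present / absent


odds_fen = odds(fen_present, fen_absent)
odds_marsh = odds(marsh_present, marsh_absent)

# No units are involved anywhere: the odds ratio compares two categories.
or_fen_vs_marsh = odds_fen / odds_marsh
print(f"Odds in fens:    {odds_fen:.3f}")
print(f"Odds in marshes: {odds_marsh:.3f}")
print(f"Odds ratio, fen vs. marsh: {or_fen_vs_marsh:.2f}")
```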

The second is when the numerical predictor is standardized. If we standardize the temperature (that is, convert the scale to Z scores), we've removed the units. We'll get the same Z scores from degrees Fahrenheit as from degrees Celsius, and the new odds ratio will be in terms of a one-standard-deviation increase in temperature, rather than a one-degree increase.
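A small sketch can confirm both claims, using only Python's standard library. The five temperatures are made up, and 1.12 is the per-degree-Celsius odds ratio from the example. The odds ratio for a one-standard-deviation increase is the per-unit odds ratio raised to the power of the standard deviation.

```python
import statistics

temps_c = [12.0, 15.5, 18.0, 21.0, 24.5]     # hypothetical water temperatures
temps_f = [c * 9 / 5 + 32 for c in temps_c]  # the same readings in Fahrenheit


def z_scores(values: list[float]) -> list[float]:
    """Standardize values: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


z_c = z_scores(temps_c)
z_f = z_scores(temps_f)
print([round(z, 3) for z in z_c])
print([round(z, 3) for z in z_f])  # same values: the units are gone

# Odds ratio per one-SD increase, starting from either scale.
or_per_c = 1.12
or_per_f = or_per_c ** (5 / 9)  # 1 degree F = 5/9 degree C
or_per_sd_from_c = or_per_c ** statistics.stdev(temps_c)
or_per_sd_from_f = or_per_f ** statistics.stdev(temps_f)
print(f"OR per SD (from Celsius):    {or_per_sd_from_c:.4f}")
print(f"OR per SD (from Fahrenheit): {or_per_sd_from_f:.4f}")
```

Both routes give the same odds ratio per standard deviation, which is what makes it comparable across predictors measured in different units.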