Karen Grace-Martin

Measures of Predictive Models: Sensitivity and Specificity

June 5th, 2015 by Karen Grace-Martin

A few years ago, I was in Syracuse for a family trip to the zoo. Syracuse is about 50 miles from where I live and it has a very nice little zoo.

One year was particularly exciting because a Trader Joe’s had just opened in Syracuse. We don’t have one where we live* (sadly!) so we always stock up on our favorite specialty groceries when we’re near a Trader Joe’s.

On this particular trip, though, we had an unwelcome surprise. My credit card company believed my Trader Joe’s spree was fraudulent and declined the transaction. I got a notice on my phone and was able to fix it right away, so it wasn’t the big inconvenience it could have been.

But this led us to wonder what it was about the transaction that led the bank to believe it was fraudulent. Do credit card thieves often skip town and go grocery shopping?

The bank was clearly betting so. It must have a statistical model that flags a transaction when its characteristics make fraud likely enough to justify shutting it down. (more…)


Effect Size Statistics in Logistic Regression

May 18th, 2015 by Karen Grace-Martin

Effect size statistics are expected by many journal editors these days.

If you’re running an ANOVA, t-test, or linear regression model, it’s pretty straightforward which ones to report.

Things get trickier, though, once you venture into other types of models. (more…)


What is a Logit Function and Why Use Logistic Regression?

May 11th, 2015 by Karen Grace-Martin

One of the big assumptions of linear models is that the residuals are normally distributed. This doesn’t mean that Y, the response variable, has to also be normally distributed, but it does have to be continuous, unbounded and measured on an interval or ratio scale.

Unfortunately, categorical response variables are none of these. (more…)


3 Tips for Keeping Track of Data Files in a Large Data Analysis

March 23rd, 2015 by Karen Grace-Martin

If you’ve ever worked on a large data analysis project, you know that just keeping track of everything is a battle in itself.

Every data analysis project is unique and there are always many good ways to keep your data organized.

Here are a few strategies I used in a recent project that you may find helpful. They didn’t make the project easy, but they helped keep it from spiraling into overwhelm.

1. Use file directory structures to keep relevant files together

In our data set, it was clear which analyses were needed for each outcome. Therefore, all files and corresponding file directories were organized by outcomes.

Organizing everything by outcome variable also allowed us to keep the unique raw and cleaned data, programs, and output in a single directory.

This always made it easy to find the final data set, program, or output for any particular analysis.

You may not want to organize your directories by outcome. Pick a directory structure that makes it easy to find each set of analyses with corresponding data and output files.
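As a rough illustration of the layout described above, here is a minimal Python sketch that creates one directory per outcome, each with the same subfolders. The subfolder names and the outcome names are assumptions made up for the example, not the ones used in the actual project.

```python
from pathlib import Path

# Hypothetical subfolders; rename them to match your own project.
SUBDIRS = ("raw", "cleaned", "programs", "output")


def make_outcome_dirs(root, outcomes):
    """Create one directory per outcome, each with the same subfolders."""
    created = []
    for outcome in outcomes:
        for sub in SUBDIRS:
            path = Path(root) / outcome / sub
            # exist_ok=True makes the function safe to rerun.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling `make_outcome_dirs("project", ["anxiety_scale", "sleep_scale"])` would give each of those two hypothetical outcomes its own raw, cleaned, programs, and output folders.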

2. Split large data sets into smaller relevant ones

In this particular analysis, there were about a dozen outcomes, each of which was a scale. In other words, each one had many, many variables.

Rather than create one enormous and unmanageable data set, we gave each outcome scale its own data set. Variables that were common to all analyses–demographics, controls, and condition variables–were in their own data set.

For each analysis, we merged the common variables data set with the relevant unique variable data set.

This allowed us to run each analysis without the clutter of irrelevant variables.

This strategy can be particularly helpful when you are running secondary data analysis on a large data set.

Spend some time thinking about which variables are common to all analyses and which are unique to a single model.
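The merge step can be sketched in a few lines of Python. This is only an illustration under assumed names: the `id` key, the variable names, and the tiny example records are all hypothetical, and in practice you would do the same thing with your statistical package's merge command.

```python
def merge_on_id(common_rows, outcome_rows, key="id"):
    """Attach the common variables to each outcome record by a shared ID."""
    common_by_id = {row[key]: row for row in common_rows}
    merged = []
    for row in outcome_rows:
        if row[key] not in common_by_id:
            # Fail loudly rather than silently dropping a participant.
            raise KeyError(f"No common variables for {key}={row[key]!r}")
        merged.append({**common_by_id[row[key]], **row})
    return merged


# Hypothetical data: one common data set, one outcome-specific data set.
common = [
    {"id": 1, "age": 34, "condition": "treatment"},
    {"id": 2, "age": 41, "condition": "control"},
]
outcome_a = [
    {"id": 1, "a_item1": 3, "a_item2": 4},
    {"id": 2, "a_item1": 2, "a_item2": 5},
]

# The analysis file holds only the variables this one analysis needs.
analysis_a = merge_on_id(common, outcome_a)
```

Each record in `analysis_a` carries the demographics and condition plus the items for that one outcome, with none of the variables from the other scales.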

3. Do all data manipulation in syntax

I can’t emphasize this one enough.

As you’re cleaning data, it’s tempting to make changes in menus without documenting them, then save the changes in a separate data file.

It may be quicker in the short term, but it will ultimately cost you time and a whole lot of frustration.

Above and beyond the inability to find your mistakes (we all make mistakes) and document changes, the problem is this: you won’t be able to clean a large data set in one sitting.

So at each sitting, you have to save the data to keep changes. You don’t feel comfortable overwriting the data, so instead you create a new version.

Do this each time you clean data and you end up with dozens of versions of the same data.

A few strategic versions can make sense if each is used for specific analyses. But if you have too many, it gets incredibly confusing which version of each variable is where.

Picture this instead.

Start with one raw data set.

Write a syntax file that opens that raw data set, cleans, recodes, and computes new variables, then saves a finished one, ready for analysis.

If you don’t get the syntax file done in one sitting, no problem. You can add to it later and rerun everything from your previous sitting with one click.

If you love using menus instead of writing syntax, still no problem.

Paste the commands as you go along. The goal is not to create a new version of the data set, but to create a clean syntax file that creates the new version of the data set. Edit it as you go.

If you made a mistake in recoding something, edit the syntax, not the data file.

Need to make small changes? If it’s set up well, rerunning it only takes seconds.

There is no problem with overwriting the finished data set because all the changes are documented in the syntax file.
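The same workflow carries over to any scripted tool. Below is a minimal sketch in Python rather than SPSS syntax; the column names (`item1`, `item2`), the missing-value code `-99`, and the computed `total` are assumptions invented for the example. The point is the shape: the raw file is only ever read, and the finished file is rebuilt from scratch on every run.

```python
import csv


def clean_row(row):
    """Apply every cleaning step to one raw record (all names hypothetical)."""
    cleaned = dict(row)
    # Recode: treat the assumed missing-value code -99 as blank.
    for name in ("item1", "item2"):
        if cleaned[name] == "-99":
            cleaned[name] = ""
    # Compute: a scale total, only when both items are present.
    if cleaned["item1"] and cleaned["item2"]:
        cleaned["total"] = str(int(cleaned["item1"]) + int(cleaned["item2"]))
    else:
        cleaned["total"] = ""
    return cleaned


def build_clean_file(raw_path, clean_path):
    """Read the raw file (never modified) and overwrite the finished file."""
    with open(raw_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["total"]
        rows = [clean_row(record) for record in reader]
    with open(clean_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

If a recode turns out to be wrong, you edit `clean_row` and rerun `build_clean_file` on the same raw file. Overwriting the finished file is harmless because every change lives in the script.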


Why Mixed Models are Harder in Repeated Measures Designs: G-Side and R-Side Modeling

February 25th, 2015 by Karen Grace-Martin

I have recently worked with two clients who were running generalized linear mixed models in SPSS.

Both had repeated measures experiments with a binary outcome.

The details of the designs were quite different, of course. But both had pretty complicated combinations of within-subjects factors.

Fortunately, both clients are intelligent, have a good background in statistical modeling, and are willing to do the work to learn how to do this. So in both cases, we made a lot of progress in just a couple of meetings.

I found it interesting, though, that both were getting stuck on the same subtle point. It’s the same point I was missing for a long time in my own learning of mixed models.

Once I finally got it, a huge light bulb turned on. (more…)


When Main Effects are Not Significant, But the Interaction Is

January 21st, 2015 by Karen Grace-Martin

If you have a significant interaction effect and non-significant main effects, would you interpret the interaction effect?

It’s a question I get pretty often, and it has a more straightforward answer than most.

(more…)
