Getting Started with SPSS

5 Reasons to use SPSS Syntax

October 7th, 2009

You don’t rely on only SPSS menus to run your analysis, right?  (Please, please tell me you don’t).

There’s really nothing wrong with using the menus.  It’s a great way to get started using SPSS and it saves you the hassle of remembering all that code.

But there are some really, really good reasons to use the syntax as well.

 

1. Efficiency

If you’re figuring out the best model and have to refine which predictors to include, running the same descriptive statistics on a bunch of variables, or defining the missing values for all 286 variables in the data set, you’re essentially running the same analysis over and over.

Picking your way through the menus gets old fast.  In syntax, you just copy and paste and change or add variable names.

A trick I use is to run through the menus for one variable, paste the code, then add the other 285. You can even copy the names out of the Variable View and paste them into the code. Very easy.
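As a rough sketch of what that looks like, here is a missing-value definition applied to a whole run of variables. The names q1 through q286 and the code 99 are made up for illustration; substitute your own.

```spss
* Hypothetical variables q1 to q286, with 99 used as the missing code.
* The TO keyword covers every variable from q1 through q286 in file order.
MISSING VALUES q1 TO q286 (99).

* If the variables are not adjacent in the file, list the names instead.
* The names can be copied out of Variable View and pasted in here.
DESCRIPTIVES VARIABLES=q1 q2 q3
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Note that TO relies on the order of the variables in the data set, so it only works as a shortcut when the variables sit next to each other.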

2. Memory

I know that while you’re immersed in your data analysis, you can’t imagine ever forgetting a single step you did.

But you will.  And sooner than you think.

Syntax gives you a “paper” trail of what you did, so you don’t have to remember. If you’re in a regulated industry, you know why you need this trail. But anyone who needs to defend their research needs it.

3. Communication

When your advisor, coauthor, colleague, statistical consultant, or Reviewer #2 asks you which options you used in your analysis or exactly how you recoded that variable, you can clearly communicate it by showing the syntax.  Much harder to explain with menu options.

When I hold a workshop or run an analysis for a client, I always use syntax.  I  send it to them to peruse, tweak, adapt, or admire.  It’s really the only way for me to show them exactly what I did and how to do it.

If your client, advisor, or colleague doesn’t know how to read the syntax, that’s okay. Because you have a clear answer of what you did, you can explain it.
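For instance, a recode that would take a paragraph to describe fits in a few lines of syntax. This is only a sketch: the variable names age and agegroup and the cut points are invented, and it assumes age is recorded in whole years.

```spss
* Hypothetical example: collapse age in whole years into three groups.
* Ages under 18 match no range, so agegroup stays system-missing for them.
RECODE age (18 THRU 29=1) (30 THRU 49=2) (50 THRU HIGHEST=3) INTO agegroup.
VARIABLE LABELS agegroup 'Age group (recoded from age)'.
VALUE LABELS agegroup 1 '18 to 29' 2 '30 to 49' 3 '50 and over'.
EXECUTE.
```

Anyone reading this can see exactly where the cut points are and that the original age variable was left untouched.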

4. Efficiency again

When the data set gets updated, or a reviewer (or your advisor, coauthor, colleague, or statistical consultant) asks you to add another predictor to a model, it’s a simple matter to edit and rerun a syntax program.

In menus, you have to start all over. Hopefully you’ll remember exactly which options you chose last time and/or exactly how you made every small decision in your data analysis (see #2: Memory).
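Here is a small sketch of that kind of edit, using a linear regression with invented variable names (income, age, educ, tenure). Adding the predictor is a one-word change.

```spss
* Original model, with hypothetical variables.
REGRESSION
  /DEPENDENT income
  /METHOD=ENTER age educ.

* Same model with one more predictor, as a reviewer might request.
REGRESSION
  /DEPENDENT income
  /METHOD=ENTER age educ tenure.
```

If the data set gets updated instead, nothing needs to change at all: open the new file and rerun the same syntax.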

5. Control

There are some SPSS options that are available in syntax, but not in the menus.

And others that just aren’t what they seem in the menus.

The menus for the Mixed procedure are about the most unintuitive I’ve ever seen.  But the syntax for Mixed is really logical and straightforward.  And it’s very much like the GLM syntax (UNIANOVA), so if you’re familiar with GLM, learning Mixed is a simple extension.
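To show the family resemblance, here is a sketch of the same fixed effects written both ways. The variables score, treatment, age, and school are hypothetical, and the random intercept for school is just one simple example of what MIXED adds.

```spss
* GLM (UNIANOVA) with one factor and one covariate, all names hypothetical.
UNIANOVA score BY treatment WITH age
  /PRINT=PARAMETER
  /DESIGN=treatment age.

* The same fixed effects in MIXED, plus a random intercept for each school.
MIXED score BY treatment WITH age
  /FIXED=treatment age
  /RANDOM=INTERCEPT | SUBJECT(school)
  /PRINT=SOLUTION.
```

The first line is identical apart from the command name, and /FIXED plays the role that /DESIGN plays in UNIANOVA.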

Bonus Reason to use SPSS Syntax: Cleanliness

Luckily, SPSS makes it exceedingly easy to create syntax.  If you’re more comfortable with menus, run it in menus the first time, then hit PASTE instead of OK.  SPSS will automatically create the syntax for you, which you can alter at will.  So you don’t have to remember every programming convention.
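As an illustration, pasting from the Frequencies dialog with a single variable selected produces something like the first command below. The variable names are made up, and the exact subcommands SPSS pastes can vary with the options you tick and the version you run.

```spss
* Roughly what PASTE writes for one hypothetical variable.
FREQUENCIES VARIABLES=gender
  /ORDER=ANALYSIS.

* The same command after editing by hand to add a variable and a statistic.
FREQUENCIES VARIABLES=gender agegroup
  /STATISTICS=MODE
  /ORDER=ANALYSIS.
```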

When refining a model, I often run through menus and paste it.  Then I alter the syntax to find the best-fitting model.

At this point, the output is a mess, filled with so many models I can barely keep them straight.  Once I’ve figured out the model that fits best, I delete the entire output, then rerun the syntax for only the best model.  Nice, clean output.

The Take-away: Reproducibility

What this all really comes down to is your ability to confidently, easily, and accurately reproduce your analysis. When you rely on menus, you are relying on your own memory to reproduce it. There are too many decisions and judgment calls, and too many places to make an easy mistake without noticing, to ever rely totally on your memory.

The tools are there to make this easy. Use them.

 


PSPP – the free, open source version of SPSS

March 24th, 2009

I just heard recently about PSPP, which is a free, open source version of SPSS.

I have not tried it yet, but it does look promising. This is the description from its website:

It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.

The most important of these exceptions are, that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package.

PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.

Sounds pretty good, huh?

The only downside I can see, though, is with the statement “no additional packages to purchase in order to get ‘advanced’ functions.”  That appears to be because there aren’t any advanced functions.  PSPP seems to correspond only to SPSS base.  No Advanced Models, no Missing Values Analysis, no Complex Surveys.  That means you can do one-way ANOVA and regression, but not GLM, logistic regression, or factor analysis.

So if you are only using SPSS for basic statistics, or for teaching an intro class, this may be just what you need.  And perhaps if it takes off, as R has, we’ll see more advanced features soon.

If you’ve had any experience using PSPP, please tell me about it in a comment.  I’d love to hear how well it works.

 


SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two

March 16th, 2009

In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use?

The default is to use whatever software they used in your statistics class–at least you know the basics.

And this might turn out pretty well, but chances are it will fail you at some point. Many times the stat package used in a class is chosen for its shallow learning curve, (more…)


Variable Labels and Value Labels in SPSS

January 2nd, 2009

Variable Labels and Value Labels are two great SPSS features that let you build a code book right into the data set.  Using these every time is good data analysis practice.

SPSS doesn’t limit variable names to 8 characters like it used to, but you still can’t use spaces, and it will make coding easier if you keep the variable names short.  You then use Variable Labels to give a nice, long description of each variable.  On questionnaires, I often use the actual question.

There are good reasons for using Variable Labels right in the data set.  I know you want to get right to your data analysis, but using Variable Labels will save so much time later.

1. If your paper code sheet ever gets lost, you still have the variable descriptions.

2. Anyone else who uses your data–lab assistants, graduate students, statisticians–will immediately know what each variable means.

3. As entrenched as you are with your data right now, you will forget what those variable names refer to within months.  When a committee member or reviewer wants you to redo an analysis, it will save tons of time to have those variable labels right there.

4.  It’s just more efficient–you don’t have to look up what those variable names mean when you read your output.

Variable Labels

The really nice part is SPSS makes Variable Labels easy to use:

1. Mouse over the variable name in the Data View spreadsheet to see the Variable Label.

2. In dialog boxes, lists of variables can be shown with either Variable Names or Variable Labels.  Just go to Edit–>Options.  In the General tab, choose Display Labels.

3. On the output, SPSS allows you to print out Variable Names or Variable Labels or both.  I usually like to have both.  Just go to Edit–>Options.  In the ‘Output Labels’ tab (called just Output in newer versions), choose ‘Names and Labels’ in the first and third boxes.
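Both the labels themselves and the output setting can also be written in syntax. The sketch below uses invented variable names and question wording. As far as I know, SET OVARS and SET TVARS are the syntax counterparts of the first and third boxes (variables in the outline and in pivot tables).

```spss
* Hypothetical questionnaire items, labeled with the actual question text.
VARIABLE LABELS
  q1 'I enjoy analyzing data'
  q2 'Statistics makes me anxious'.

* Show both names and labels for variables in the outline and in tables.
SET OVARS=BOTH TVARS=BOTH.
```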

Value Labels

Value Labels are similar, but they describe the values a variable can take.  Labeling values right in SPSS means you don’t have to remember if 1=Strongly Agree and 5=Strongly Disagree or vice-versa.  And it makes data entry much more efficient: you can type in 0 and 1 for Male and Female much faster than you can type out those whole words, or even M and F.  But by having Value Labels, your data and output still give you the meaningful values.

Once again, SPSS makes it easy for you.

1. If you’d rather see Male and Female in the data set than 0 and 1, go to View–>Value Labels.

2. Like Variable Labels, you can get Value Labels on output, along with the actual values.  Just go to Edit–>Options.  In the ‘Output Labels’ tab, choose ‘Values and Labels’ in the second and fourth boxes.
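In syntax, the same things look roughly like this. The coding (0 = Male, 1 = Female, and a 1 to 5 agreement scale on q1 through q5) is assumed purely for illustration. To my knowledge, SET ONUMBERS and SET TNUMBERS correspond to the second and fourth boxes.

```spss
* Assumed coding for illustration only, so adjust to your own code book.
VALUE LABELS gender 0 'Male' 1 'Female'
  /q1 TO q5 1 'Strongly Disagree' 5 'Strongly Agree'.

* Show both the numeric values and their labels in the outline and tables.
SET ONUMBERS=BOTH TNUMBERS=BOTH.
```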

 


Averaging and Adding Variables with Missing Data in SPSS

August 29th, 2008

SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.

It allows you to add or average variables, while specifying how many are allowed to be missing.

For example, a very common situation is that a researcher needs to average the values of the 5 variables on a scale, each of which is measured on the same Likert scale.

There are two ways to do this in SPSS syntax.

COMPUTE Newvar = (X1 + X2 + X3 + X4 + X5)/5.

or

COMPUTE Newvar = MEAN(X1, X2, X3, X4, X5).

In the first method, if any of the variables is missing, Newvar will also be missing, because an arithmetic expression in SPSS returns system missing whenever any of its inputs is missing.

In the second method, if any of the variables is missing, SPSS will still calculate the mean of whichever values are observed.  While this seems great at first, the researcher may wish to limit how many of the 5 variables need to be observed in order to calculate the mean.  If only one or two variables are present, the mean may not be a reasonable estimate of the mean of all 5 variables.

SPSS has an option for dealing with this situation.  Running it the following way will only calculate the mean if at least 4 of the 5 variables are observed.  If fewer than 4 of the variables are observed, Newvar will be system missing.

COMPUTE Newvar = MEAN.4(X1, X2, X3, X4, X5).

You can specify any number of variables that need to be observed.
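To see the three versions side by side, here is a small self-contained sketch. The data are invented: three respondents, five items, with 9 declared as the missing code. The new variable names are placeholders too.

```spss
* Made-up data: three respondents, five Likert items, 9 means no answer.
DATA LIST FREE / X1 X2 X3 X4 X5.
BEGIN DATA
4 5 3 4 5
4 9 3 4 5
4 9 9 4 5
END DATA.
MISSING VALUES X1 TO X5 (9).

* Arithmetic version: missing as soon as any item is missing.
COMPUTE MeanArith = (X1 + X2 + X3 + X4 + X5)/5.
* MEAN function: averages whatever items are observed.
COMPUTE MeanAny = MEAN(X1, X2, X3, X4, X5).
* MEAN.4 function: requires at least 4 observed items.
COMPUTE MeanMin4 = MEAN.4(X1, X2, X3, X4, X5).
LIST.
```

The first respondent gets 4.2 on all three. The second, with one item missing, is system missing on MeanArith but gets 4 on both MeanAny and MeanMin4. The third, with two items missing, gets about 4.33 on MeanAny and is system missing on the other two.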

(The same distinction holds for the SUM function in SPSS, but the scale of a sum changes with how many values are actually being added.  A better approach is to calculate the mean, then multiply by 5.)
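A quick sketch of why the scale matters, again with invented data in which 9 is the missing code:

```spss
* Made-up data: the second respondent skipped one item, coded 9.
DATA LIST FREE / X1 X2 X3 X4 X5.
BEGIN DATA
4 5 3 4 5
4 9 3 4 5
END DATA.
MISSING VALUES X1 TO X5 (9).

* SUM.4 adds only the observed items, so the total shrinks when one is missing.
COMPUTE RawSum = SUM.4(X1, X2, X3, X4, X5).
* Multiplying the mean by 5 keeps every total on the same 5-item scale.
COMPUTE ScaledSum = 5 * MEAN.4(X1, X2, X3, X4, X5).
LIST.
```

Both totals are 21 for the first respondent. For the second, RawSum is 16 while ScaledSum is 20, which is the value comparable to a complete 5-item total.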

This works the same way in the syntax or in the Transform–>Compute menu dialog.

First Published  12/1/2016;
Updated  7/20/21 to give more detail.