by Ursula Saqui, Ph.D.
This post is the first of a two-part series on the overall process of doing a literature review. Part two covers where to find your resources.
Would you build your house without a foundation? Of course not! However, many people skip the first step of any empirically based project: conducting a literature review. Like the foundation of your house, the literature review is the foundation of your project.
Having a strong literature review gives structure to your research method and informs your statistical analysis. If your literature review is weak or non-existent, (more…)
The steps you take to analyze data are just as important as the statistics you use. Mistakes and frustration in statistical analysis come as much from poor process as from using the wrong statistical method, if not more.
Benjamin Earnhart of the University of Iowa has written a short (and humorous) article entitled “Respect Your Data” (requires LinkedIn account) that describes 23 practical steps that data analysts must take. This article was published in the newsletter of the American Statistical Association and has since been expanded and annotated.
SPSS Variable Labels and Value Labels are two great features that let you create a code book right in the data set. Using them every time is good data analysis practice.
SPSS no longer limits variable names to 8 characters the way it used to, but you still can’t use spaces, and coding will be easier if you keep the variable names short. You then use Variable Labels to give a nice, long description of each variable. On questionnaires, I often use the actual question.
There are good reasons for putting Variable Labels right in the data set. I know you want to get straight to your data analysis, but using Variable Labels will save you so much time later.
1. If your paper code sheet ever gets lost, the variable descriptions are still stored with your data.
2. Anyone else who uses your data–lab assistants, graduate students, statisticians–will immediately know what each variable means.
3. As immersed as you are in your data right now, you will forget what those variable names refer to within months. When a committee member or reviewer wants you to redo an analysis, having those variable labels right there will save tons of time.
4. It’s just more efficient–you don’t have to look up what those variable names mean when you read your output.
Variable Labels
The really nice part is SPSS makes Variable Labels easy to use:
1. Mouse over the variable name in the Data View spreadsheet to see the Variable Label.
2. In dialog boxes, lists of variables can be shown with either Variable Names or Variable Labels. Just go to Edit–>Options. In the General tab, choose Display Labels.
3. On the output, SPSS allows you to print out Variable Names or Variable Labels or both. I usually like to have both. Just go to Edit–>Options. In the Output tab, choose ‘Names and Labels’ in the first and third boxes.
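If you prefer syntax, you can also assign Variable Labels with the VARIABLE LABELS command. Here is a minimal sketch; the variable names and question wording are hypothetical examples for a questionnaire:

* Label each item with the full question text (names and wording are made up).
VARIABLE LABELS satisf1 'How satisfied are you with your current job?'
  /satisf2 'How satisfied are you with your pay?'.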
Value Labels
Value Labels are similar, but they are descriptions of the values a variable can take. Labeling values right in SPSS means you don’t have to remember whether 1=Strongly Agree and 5=Strongly Disagree or vice versa. It also makes data entry much more efficient: you can type in 1 and 0 for Male and Female much faster than you can type out the whole words, or even M and F. But with Value Labels, your data and output still show you the meaningful values.
Once again, SPSS makes it easy for you.
1. If you’d rather see Male and Female in the data set than 0 and 1, go to View–>Value Labels.
2. Like Variable Labels, you can get Value Labels on output, along with the actual values. Just go to Edit–>Options. In the ‘Output Labels’ tab, choose ‘Values and Labels’ in the second and fourth boxes.
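Value Labels can likewise be assigned with the VALUE LABELS command. A minimal sketch, with hypothetical variable names and codings:

* Attach labels to the codes (variables and codings are made up).
VALUE LABELS gender 1 'Male' 0 'Female'
  /agree1 1 'Strongly Disagree' 2 'Disagree' 3 'Neutral' 4 'Agree' 5 'Strongly Agree'.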
Should you drop outliers? Outliers are one of those statistical issues that everyone knows about, but that most people aren’t sure how to deal with. Most parametric statistics, like means, standard deviations, and correlations, and every statistic based on these, are highly sensitive to outliers.
And since the assumptions of common statistical procedures, like linear regression and ANOVA, are also based on these statistics, outliers can really mess up your analysis.
Despite all this, as much as you’d like to, it is NOT acceptable to
(more…)
SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.
It allows you to add or average variables, while specifying how many are allowed to be missing.
For example, a very common situation is that a researcher needs to average the 5 variables that make up a scale, each measured on the same Likert scale.
There are two ways to do this in SPSS syntax.
COMPUTE Newvar = (X1 + X2 + X3 + X4 + X5)/5.
or
COMPUTE Newvar = MEAN(X1, X2, X3, X4, X5).
In the first method, if any of the variables is missing, Newvar will also be missing, because SPSS returns a missing value for any arithmetic expression that includes a missing value.
In the second method, if any of the variables are missing, SPSS will still calculate the mean of whichever variables are present. While this seems great at first, the researcher may wish to require that a minimum number of the 5 variables be observed before the mean is calculated. If only one or two variables are present, their mean may not be a reasonable estimate of the mean of all 5 variables.
SPSS has an option for dealing with this situation. Written the following way, the mean is calculated only if at least 4 of the 5 variables are observed. If fewer than 4 are observed, Newvar will be system-missing.
COMPUTE Newvar = MEAN.4(X1, X2, X3, X4, X5).
You can specify any minimum number of variables that must be observed.
(The same distinction holds for the SUM function in SPSS, but the scale of the sum changes depending on how many variables are actually added. A better approach is to calculate the mean and then multiply by 5.)
This works the same way in the syntax or in the Transform–>Compute menu dialog.
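Putting it together, here is a short syntax sketch using the hypothetical items X1 through X5, requiring at least 4 of the 5 to be observed (the Total variable just illustrates the multiply-by-5 approach):

* Mean of the 5 items, calculated only when at least 4 are observed.
COMPUTE Newvar = MEAN.4(X1, X2, X3, X4, X5).
* Scale total on the summed metric: the mean times the number of items.
COMPUTE Total = 5 * MEAN.4(X1, X2, X3, X4, X5).
EXECUTE.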
First Published 12/1/2016;
Updated 7/20/21 to give more detail.