There’s no mincing words here: missing values cause problems for every statistician. There are many reasons for that, but the trouble can start with something as simple as how missing values are coded in a data set. Here are a few examples.
Example 1: The Null License Plate
Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)
Survey questions are often structured without regard for ease of use within a statistical model.
Take, for example, a survey done by the Centers for Disease Control and Prevention (CDC) on births in the U.S. One of the variables in the data set is “interval since last pregnancy”. Here is a histogram of the results.
(more…)
I recently opened a very large data set titled “1998 California Work and Health Survey” compiled by the Institute for Health Policy Studies at the University of California, San Francisco. There are 1,771 observations and 345 variables. (more…)
One data manipulation task that you need to do in pretty much any data analysis is recode data. It’s almost never the case that the data are set up exactly the way you need them for your analysis.
In R, you can recode an entire vector or array at once. To illustrate, let’s set up a vector that has missing values.
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
A
 [1]  3  2 NA  5  3  7 NA NA  5  2  6
We can replace all missing values with another number (such as zero) as follows: (more…)
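The excerpt is cut off before the code appears, so the following is not necessarily the author's exact command. It is a minimal sketch of one common R idiom for this task: index the vector with is.na() and assign the replacement value. The copy named B is an assumption added here so that the original vector A stays available for comparison.

```r
# Same example vector as in the text above.
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)

# Work on a copy (B is a name chosen for this sketch) so A is preserved.
B <- A

# is.na(B) is a logical vector marking the missing positions;
# using it as an index assigns 0 to exactly those positions.
B[is.na(B)] <- 0

# Print the recoded vector.
B
```

Whether zero is a sensible replacement depends on the analysis; the same indexing pattern works with any other value.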
Before you run a Cronbach’s alpha or factor analysis on scale items, it’s generally a good idea to reverse code items that are negatively worded so that a high value indicates the same type of response on every item.
For example, say you have 20 items, each on a 1 to 7 scale. For most items, a 7 may indicate a positive attitude toward some issue, but for a few items, a 1 indicates a positive attitude.
I want to show you a very quick and easy way to reverse code them using a single line of code. This works in any software. (more…)
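The excerpt ends before the command is shown, so what follows may differ from the author's version. One widely used approach is to subtract each response from the sum of the scale's minimum and maximum. The sketch below does this in R; the vector item and its values are made up for illustration.

```r
# Hypothetical responses to one negatively worded item on a 1 to 7 scale.
item <- c(1, 4, 7, 2, NA, 6)

scale_min <- 1
scale_max <- 7

# Subtracting from (min + max) flips the scale: 1 becomes 7, 7 becomes 1,
# and the midpoint 4 stays 4. Missing values stay missing.
item_rev <- (scale_min + scale_max) - item

# Print the reversed item.
item_rev
```

For a 1 to 7 scale this reduces to 8 minus the response. Storing the result in a new variable (item_rev here) keeps the original responses available for checking.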
SPSS offers two choices under the recode command: Into Same Variable and Into Different Variables.
Into Same Variable replaces the existing data with the new values, while Into Different Variables adds a new variable to the data set and leaves the original untouched.
In almost every situation, you want to use Into Different Variables, because recoding Into Same Variable overwrites the original values.
So if you notice a mistake after you’ve recoded, you can’t fix it.
Worse, you may not notice the mistake at all, because there are no original values left to check the recode against.
And that’s just dangerous. (more…)
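The same principle carries over to other software. As a rough analogue in R (not SPSS syntax), the sketch below recodes into a new column and then cross-tabulates old against new values to check the result. The data frame df, the variable age, and the cut points are all hypothetical.

```r
# Hypothetical data; `age` and the cut points are made up for illustration.
df <- data.frame(age = c(23, 37, 45, 61, NA, 18))

# Recode into a NEW column, leaving `age` untouched.
df$age_group <- cut(df$age,
                    breaks = c(0, 29, 49, Inf),
                    labels = c("under 30", "30 to 49", "50 plus"))

# Because the original survives, the recode can be checked against it.
table(df$age_group, df$age, useNA = "ifany")
```

If the cross-tabulation shows a value in the wrong group, the recode can simply be rerun from the untouched original column.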