Stata makes it a breeze to edit or clean your data. If you’re unfamiliar with using data sets in Stata, check out these blog posts to get a good grasp on importing and browsing data in Stata.
For this tutorial we will be using Stata’s “auto” data set. If you haven’t loaded it in yet, type
(more…)
There’s no mincing words here. Missing values can cause problems for every statistician. That’s true for a lot of reasons, but it can start with simple issues of choices
made when coding missing values in a data set. Here are a few examples.
Example 1: The Null License Plate
Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)
Survey questions are often structured without regard for ease of use within a statistical model.
Take for example a survey done by the Centers for Disease Control (CDC) regarding child births in the U.S. One of the variables in the data set is “interval since last pregnancy”. Here is a histogram of the results.
(more…)
I recently opened a very large data set titled “1998 California Work and Health Survey” compiled by the Institute for Health Policy Studies at the University of California, San Francisco. There are 1,771 observations and 345 variables. (more…)

One data manipulation task that you need to do in pretty much any data analysis is recode data. It’s almost never the case that the data are set up exactly the way you need them for your analysis.
In R, you can re-code an entire vector or array at once. To illustrate, let’s set up a vector that has missing values.
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
A
[1] 3 2 NA 5 3 7 NA NA 5 2 6
We can re-code all missing values by another number (such as zero) as follows: (more…)
Before you run a Cronbach’s alpha or factor analysis on scale items, it’s generally a good idea to reverse code items that are negatively worded so that a high value indicates the same type of response on every item.
So for example let’s say you have 20 items each on a 1 to 7 scale. For most items, a 7 may indicate a positive attitude toward some issue, but for a few items, a 1 indicates a positive attitude.
I want to show you a very quick and easy way to reverse code them using a single command line. This works in any software. (more…)