Once you’ve imported your data into Stata, the next step is usually to examine it.
Before you work on building a model or running any tests, you need to understand your data. Ask yourself these questions:
- Is every variable marked as the appropriate type?
- Are missing values coded consistently and marked as missing?
- Do I want to exclude any variables or data points?
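
A minimal Stata sketch of how one might start answering these questions. The variable names (`age`, `income`, `notes`) and the `-99` missing code are hypothetical and chosen only for illustration; the commands themselves are standard Stata.

```stata
* Hypothetical variables: age, income, notes. Hypothetical missing code: -99.

* Storage type, display format, and label of every variable
describe

* Compact per-variable report: observations, unique values, mean, min, max
codebook, compact

* Count missing values in each variable that has any
misstable summarize

* Recode a numeric placeholder (here -99) to Stata's system missing value (.)
mvdecode age income, mv(-99)

* Exclude a variable, then keep only a subset of observations.
* Stata treats missing as larger than any number, so rule it out explicitly.
drop notes
keep if age >= 18 & !missing(age)
```

Running `describe` and `misstable summarize` before any recoding gives a baseline to compare against once the data have been cleaned.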
(more…)
SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.
It allows you to add or average variables that have some missing data, while specifying how many are allowed to be missing. (more…)
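
The excerpt does not name the SPSS feature, so the sketch below is not that feature. It is a rough Stata analogue, offered to match the Stata posts on this page, and it assumes four hypothetical item variables `q1` to `q4` with at most one allowed to be missing.

```stata
* q1-q4 are hypothetical item variables; the threshold of 1 is an assumption.

* Number of missing items per observation
egen n_missing = rowmiss(q1 q2 q3 q4)

* Mean of the non-missing items (missing only if all four are missing)
egen score_mean = rowmean(q1 q2 q3 q4)

* Discard the mean when more than one item is missing
replace score_mean = . if n_missing > 1
```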
In our previous posts, we’ve relied on Stata’s pre-loaded datasets to perform analyses. But when you’re working with your own data, you’ll need to know how to import it into Stata.
To demonstrate how this process works, we will use the Iris dataset from the UCI Machine Learning Repository.
Download the dataset, then move it to whichever directory you intend to use for Stata files.
There are three main ways of importing data in Stata: use the menus to import the data, refer to the dataset by its full file path, or change your working directory to the one containing your data and then refer to the dataset by name. (more…)
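
The three approaches might look roughly like this. This is a sketch under assumptions: the folder `C:/mydata` is a placeholder, the downloaded file is assumed to be a comma-delimited file named `iris.data` with no header row, and `import delimited` is one reasonable command for such a file, not necessarily the one the full post uses.

```stata
* Placeholder folder and assumed file name; adjust both to your own setup.

* 1. Menus: open the File menu, choose Import, and pick the text-data option.

* 2. Refer to the file by its full path
import delimited using "C:/mydata/iris.data", varnames(nonames) clear

* 3. Change the working directory, then refer to the file by name
cd "C:/mydata"
import delimited using "iris.data", varnames(nonames) clear

* Confirm what was read in (variables are named v1, v2, ... without a header)
describe
```

Changing the working directory once at the top of a do-file keeps later commands short, while full paths make each command independent of where Stata was started.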
Ever consider skipping the important step of cleaning your data? It’s tempting but not a good idea. Why? It’s a bit like baking.
I like to bake. There’s nothing nicer than a rainy Sunday with no plans, and a pantry full of supplies. I have done my shopping, and now it’s time to make the cake. Ah, but the kitchen is a mess. I don’t have things in order. This is no way to start.
First, I need to clear the counter, wash the breakfast dishes, and set out my tools. I need to take stock, read the recipe, and measure out my ingredients. Then it’s time for the fun part. I’ll admit, in my rush to get started I have at times skipped this step.
(more…)
There’s no mincing words here. Missing values can cause problems for every statistician. That’s true for a lot of reasons, but it can start with something as simple as the choices made when coding missing values in a data set. Here are a few examples.
Example 1: The Null License Plate
Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)
Formatting date variables seems like it should be straightforward, but sadly, it’s not.
If you are given data that includes dates, expect confusion. Dates can be represented in many different ways. (more…)