Data Preparation

Getting Started with Stata Tutorial #8: Examining Data in Stata

January 14th, 2025 by

Once you’ve imported your data into Stata, the next step is usually examining it.

Before you work on building a model or running any tests, you need to understand your data. Ask yourself these questions:

  • Is every variable marked as the appropriate type?
  • Are missing observations coded consistently and marked as missing?
  • Do I want to exclude any variables or data points?
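As a rough illustration of the first two questions, here is a minimal Python sketch (not Stata, and not taken from the tutorial). The rows, the column names, and the set of strings treated as "missing" are all made-up assumptions; the point is only to show what checking for inconsistent missing codes and for variable type might look like.

```python
from collections import Counter

# Hypothetical rows, as they might look right after reading a CSV
# (every value is still a string at this point).
rows = [
    {"age": "34", "income": "52000"},
    {"age": "NA", "income": "-99"},
    {"age": "41", "income": ""},
    {"age": ".", "income": "61000"},
]

# Assumed set of strings that might stand for "missing"; adjust for your data.
MISSING_CODES = {"", "NA", ".", "-99"}


def missing_code_counts(rows, column):
    """Count how often each assumed missing code appears in one column."""
    return Counter(r[column] for r in rows if r[column] in MISSING_CODES)


def looks_numeric(rows, column):
    """Return True if every non-missing value in the column parses as a number."""
    values = [r[column] for r in rows if r[column] not in MISSING_CODES]
    try:
        for v in values:
            float(v)
    except ValueError:
        return False
    return True


for col in ("age", "income"):
    print(col, dict(missing_code_counts(rows, col)), looks_numeric(rows, col))
```

If a column shows more than one missing code, as both do here, that is a hint the coding is not consistent and should be standardized before analysis.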

(more…)


Averaging and Adding Variables with Missing Data in SPSS

December 17th, 2024 by

SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.

It allows you to add or average variables that have some missing data, while specifying how many are allowed to be missing. (more…)
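To make the idea concrete, here is a small Python sketch of the same logic. It is not SPSS syntax; the function names and the `min_valid` parameter are hypothetical, and `None` stands in for a missing value. The result is computed from the non-missing values only, and it is itself missing when too few values are present.

```python
def mean_if_enough(values, min_valid):
    """Average the non-missing values, or return None if fewer than
    min_valid of them are present. None marks a missing value."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


def sum_if_enough(values, min_valid):
    """Add the non-missing values, or return None if fewer than
    min_valid of them are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid)


# Three items, one missing, at least two required: the mean uses the two valid items.
print(mean_if_enough([4, None, 5], min_valid=2))     # 4.5
# Only one valid item, two required: the result is missing.
print(mean_if_enough([4, None, None], min_valid=2))  # None
print(sum_if_enough([4, None, 5], min_valid=2))      # 9
```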


Getting Started with Stata Tutorial #7: Importing Data into Stata

December 10th, 2024 by

In our previous posts, we’ve relied on Stata’s pre-loaded datasets to perform analyses. But when you’re working with your own data, you’ll need to know how to import it into Stata.

To demonstrate how this process works, we will use the Iris dataset from UCI.

Download the dataset, then move it to whichever directory you intend to use for Stata files.

There are three main ways of importing data in Stata: use the menus to import the data, call the dataset by its full file path, or change your directory to the one with your data and then refer to the dataset by name. (more…)


Seven Steps for Data Cleaning

June 20th, 2024 by

Ever consider skipping the important step of cleaning your data? It’s tempting but not a good idea. Why? It’s a bit like baking.

I like to bake. There’s nothing nicer than a rainy Sunday with no plans, and a pantry full of supplies. I have done my shopping, and now it’s time to make the cake. Ah, but the kitchen is a mess. I don’t have things in order. This is no way to start.

First, I need to clear the counter, wash the breakfast dishes, and set out my tools. I need to take stock, read the recipe, and measure out my ingredients. Then it’s time for the fun part. I’ll admit, in my rush to get started I have at times skipped this step.

(more…)


Issues in Coding Missing Values

October 11th, 2023 by

There’s no mincing words here. Missing values can cause problems for every statistician. That’s true for a lot of reasons, but it can start with simple issues of choices made when coding missing values in a data set. Here are a few examples.

Example 1: The Null License Plate

Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)


Best Practices for Formatting Date Variables

March 9th, 2023 by

Formatting date variables seems like it should be straightforward, but sadly, it’s not.

If you are given data that includes dates, expect confusion. Dates can be represented in many different ways. (more…)
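A quick Python sketch shows one common source of that confusion. The date string below is a made-up example, and the two formats are assumptions about how it might have been recorded: the same text yields two different dates depending on whether the month or the day comes first.

```python
from datetime import datetime

raw = "03/09/2023"  # hypothetical value as it might appear in a data file

# Read as month/day/year (common in the US).
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

# Read as day/month/year (common in much of the rest of the world).
day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(month_first.isoformat())  # 2023-03-09
print(day_first.isoformat())    # 2023-09-03
```

Nothing in the string itself says which reading is right, so the format has to come from documentation or from whoever supplied the data.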