Missing Data

Getting Started with Stata Tutorial #8: Examining Data in Stata

January 14th, 2025 by

Once you’ve imported your data into Stata, the next step is usually examining it.

Before you work on building a model or running any tests, you need to understand your data. Ask yourself these questions:

  • Is every variable marked as the appropriate type?
  • Are missing observations coded consistently and marked as missing?
  • Do I want to exclude any variables or data points?

(more…)


Averaging and Adding Variables with Missing Data in SPSS

December 17th, 2024 by

SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.

It allows you to add or average variables that have some missing data, while specifying how many are allowed to be missing. (more…)
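The idea can be sketched outside SPSS as well. The following is a minimal Python illustration, not SPSS syntax: the function names and the `min_valid` parameter are hypothetical, and missing values are assumed to be coded as `None`. Each function returns a result only when at least `min_valid` of the values are observed.

```python
from typing import Optional, Sequence


def mean_with_min_valid(
    values: Sequence[Optional[float]], min_valid: int
) -> Optional[float]:
    """Average the observed values, or return None (missing) if fewer
    than `min_valid` values are observed."""
    observed = [v for v in values if v is not None]
    if len(observed) < min_valid:
        return None
    return sum(observed) / len(observed)


def sum_with_min_valid(
    values: Sequence[Optional[float]], min_valid: int
) -> Optional[float]:
    """Add the observed values, or return None (missing) if fewer
    than `min_valid` values are observed."""
    observed = [v for v in values if v is not None]
    if len(observed) < min_valid:
        return None
    return sum(observed)


# One respondent's answers to four items; None marks a missing answer.
one_missing = [4, None, 5, 3]
two_missing = [4, None, None, 3]

print(mean_with_min_valid(one_missing, min_valid=3))  # three items observed
print(mean_with_min_valid(two_missing, min_valid=3))  # only two observed
```

With a threshold of three, the first respondent gets a scale score from the three observed items, while the second is left missing because too few items were answered.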


Seven Steps for Data Cleaning

June 20th, 2024 by

Ever consider skipping the important step of cleaning your data? It’s tempting but not a good idea. Why? It’s a bit like baking.

I like to bake. There’s nothing nicer than a rainy Sunday with no plans, and a pantry full of supplies. I have done my shopping, and now it’s time to make the cake. Ah, but the kitchen is a mess. I don’t have things in order. This is no way to start.

First, I need to clear the counter, wash the breakfast dishes, and set out my tools. I need to take stock, read the recipe, and measure out my ingredients. Then it’s time for the fun part. I’ll admit, in my rush to get started I have at times skipped this step.

(more…)


Issues in Coding Missing Values

October 11th, 2023 by

There’s no mincing words here. Missing values can cause problems for every statistician. That’s true for a lot of reasons, but it can start with simple issues of choices made when coding missing values in a data set. Here are a few examples.

Example 1: The Null License Plate

Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)


Confusing Statistical Term #13: Missing at Random and Missing Completely at Random

November 22nd, 2022 by

One of the important issues with missing data is the missing data mechanism. You may have heard of these: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR).

The mechanism is important because it affects how much the missing data bias your results, and that in turn determines which approaches to dealing with the missing data are reasonable. So you have to take it into account in choosing an approach.

The concepts of these mechanisms can be a bit abstract.

And to top it off, two of these mechanisms have really confusing names: Missing Completely at Random and Missing at Random.

Missing Completely at Random (MCAR)

Missing Completely at Random is pretty straightforward.  What it means is what is (more…)


Multiple Imputation in a Nutshell

September 20th, 2021 by

Imputation as an approach to missing data has been around for decades.

You probably learned about mean imputation in methods classes, only to be told to never do it for a variety of very good reasons. Mean imputation, in which each missing value is replaced, or imputed, with the mean of observed values of that variable, is not the only type of imputation, however. (more…)
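To make the definition concrete, here is a minimal Python sketch of mean imputation. It is an illustration only: the function name `mean_impute` and the toy data are made up, and missing values are assumed to be coded as `None`. It also shows one commonly cited drawback, namely that filling in the mean shrinks the variable's variance.

```python
from statistics import mean, variance
from typing import List, Optional, Sequence


def mean_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy variable with two missing values (hypothetical data).
raw = [2.0, None, 4.0, None, 6.0]
observed = [v for v in raw if v is not None]
imputed = mean_impute(raw)

print(imputed)
# Sample variance of the observed values versus the imputed variable.
print(variance(observed), variance(imputed))
```

Every imputed case sits exactly at the mean, so the imputed variable has less spread than the observed values do.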