A data set can contain indicator (dummy) variables, categorical variables, or both. Which type a given variable is depends, at least initially, on how the data are coded.
For example, a categorical variable like marital status could be coded in the data set as a single variable with 5 values: (more…)
In the last post, we examined how to use the same sample when running a set of regression models with different predictors.
Adding a predictor with missing data causes cases that had been included in previous models to be dropped from the new model.
Using different samples in different models can lead to very different conclusions when interpreting results.
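To make the dropped-case problem concrete, here is a minimal sketch in plain Python (not Stata). The records, the variable names, and the `complete_cases` helper are all hypothetical, made up for illustration. It compares the cases each model can use and finds the common sample that both models could be fit on.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "attend": 1, "educ": 12, "income": 40},
    {"id": 2, "attend": 0, "educ": 16, "income": None},
    {"id": 3, "attend": 1, "educ": 14, "income": 55},
    {"id": 4, "attend": 0, "educ": None, "income": 30},
    {"id": 5, "attend": 1, "educ": 10, "income": None},
]

def complete_cases(rows, predictors):
    """Return the ids of rows with no missing value on any listed predictor."""
    return {r["id"] for r in rows if all(r[p] is not None for p in predictors)}

# Two nested models: the second adds income (hypothetical predictor sets).
smaller_model = ["attend", "educ"]
larger_model = ["attend", "educ", "income"]

sample_small = complete_cases(records, smaller_model)
sample_large = complete_cases(records, larger_model)

# Cases the smaller model used that the larger model silently drops.
dropped = sorted(sample_small - sample_large)

# Cases available to both models: the sample to use for a fair comparison.
common = sorted(sample_small & sample_large)

print("dropped when income is added:", dropped)
print("common sample:", common)
```

Fitting every model on `common` keeps the sample fixed, so a change in a coefficient reflects the added predictor rather than a change in who is in the data.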
Let’s look at how to investigate the effect of the missing data on the regression models in Stata.
The coefficient for the variable “frequent religious attendance” was -58 in model 3 but rose to +6 in model 4 once income was included. Results (more…)
You may have never heard of listwise deletion for missing data, but you’ve probably used it.
Listwise deletion means that any individual in a data set is deleted from an analysis if they’re missing data on any variable in the analysis.
It’s the default in most software packages.
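As a rough illustration, here is a minimal sketch of listwise deletion in plain Python. The rows and variable names are hypothetical, and real packages do this internally rather than through a helper like `listwise_delete`. Each variable is missing in only one row, yet half of the rows are lost.

```python
# Hypothetical data; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 41, "income": None},
    {"id": 3, "age": None, "income": 38000},
    {"id": 4, "age": 29, "income": 61000},
]
variables = ["age", "income"]

def listwise_delete(records, analysis_vars):
    """Keep only records with no missing value on any analysis variable."""
    return [r for r in records if all(r[v] is not None for v in analysis_vars)]

# Count missing values per variable.
missing_per_variable = {
    v: sum(1 for r in rows if r[v] is None) for v in variables
}

complete = listwise_delete(rows, variables)
kept_ids = [r["id"] for r in complete]

print("missing per variable:", missing_per_variable)
print("rows kept:", kept_ids, "of", len(rows))
```

The per-variable counts look small, but the losses add up across variables because a row is dropped if it is missing on any one of them.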
Although its simplicity is a major advantage, listwise deletion causes big problems in many missing data situations.
But not always. If you happen to have one of the uncommon missing data situations in which (more…)
One important consideration in choosing a missing data approach is the missing data mechanism—different approaches have different assumptions about the mechanism.
Each of the three mechanisms describes one possible relationship between the propensity of data to be missing and values of the data, both missing and observed.
The Missing Data Mechanisms
Missing Completely at Random, MCAR, means there is no relationship between (more…)
Two methods for dealing with missing data, vast improvements over traditional approaches, have become available in mainstream statistical software in the last few years.
Both of the methods discussed here require that the data be missing at random, meaning that, given the observed data, missingness is not related to the missing values themselves. If this assumption holds, the resulting estimates (i.e., regression coefficients and standard errors) will be unbiased with no loss of power.
The first method is Multiple Imputation (MI). Just like the old-fashioned imputation (more…)
Q: Do most high impact journals require authors to state which method has been used on missing data?
I don’t usually get far enough in the publishing process to read journal requirements.
But based on my conversations with researchers who both review articles for journals and deal with reviewers’ comments, I can offer this response.
I would be shocked if editors at top journals didn’t want information about the missing data technique. If you leave it out, they’ll assume either that you didn’t have missing data or that you used a default like listwise deletion. (more…)