Most statistical software packages use a spreadsheet format for viewing the data. This helps you get a feeling for what you will be working with, especially if the data set is small.
But what if your data set contains numerous variables and hundreds or thousands of observations? There is no way you can get warm and fuzzy by browsing through a large data set.
To get a good feel for your data, you will need to use your software’s command or syntax editor to write some code for reviewing it. Sounds complicated.
(more…)
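As a rough illustration of what such review code can look like, here is a minimal sketch in R. The data frame `dat` and its variables are hypothetical and simulated only for the example; other packages have their own equivalents of these commands.

```r
# Hypothetical data set: 1,000 observations of three variables
set.seed(1)
dat <- data.frame(
  id     = 1:1000,
  age    = sample(18:90, 1000, replace = TRUE),
  income = round(rnorm(1000, mean = 50000, sd = 12000))
)

dim(dat)       # number of rows and columns
str(dat)       # variable names, types, and the first few values
summary(dat)   # minimum, quartiles, mean, and maximum of each variable
head(dat, 5)   # first five observations
```

A handful of one-line commands like these summarize every variable at once, which is usually more informative than scrolling through thousands of rows.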
I love working with my clients.
I love working with my clients for many reasons, but one of them is I learn so much from them.
Just this week, one of my clients showed me how to get SPSS GENLINMIXED results without the Model Viewer.
She’s my new hero.
If you’ve ever used GENLINMIXED, the procedure for Generalized Linear Mixed Models, you know that the results automatically appear in this new Model Viewer. (more…)
Like many people with graduate degrees, I have used a number of statistical software packages over the years.
Through work and school I have used Eviews, SAS, SPSS, R, and Stata.
Some were more difficult to use than others, but if you used them often enough you would become proficient enough to take on the task at hand (though some packages required greater usage of George Carlin’s 7 dirty words).
There was always one caveat which determined which package I used. (more…)
One data manipulation task that you need to do in pretty much any data analysis is recode data. It’s almost never the case that the data are set up exactly the way you need them for your analysis.
In R, you can re-code an entire vector or array at once. To illustrate, let’s set up a vector that has missing values.
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
A
[1] 3 2 NA 5 3 7 NA NA 5 2 6
We can replace all missing values with another number (such as zero) as follows: (more…)
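One common way to do this, shown here as a minimal sketch and not necessarily the approach the full post takes, is logical indexing with `is.na()`. The name `A_recoded` is introduced only for this illustration.

```r
# Same example vector as above, with three missing values
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)

# is.na(A_recoded) is a logical vector marking the missing positions;
# using it as an index assigns 0 to just those elements
A_recoded <- A
A_recoded[is.na(A_recoded)] <- 0

A_recoded   # the three NAs are now 0; all other values are unchanged
```

Working on a copy keeps the original vector intact, so you can still check how many values were missing to begin with.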
Sometimes you need to know if your data set contains elements that meet some criterion or a particular set of criteria.
For example, a common data cleaning task is to check if you have missing data (NAs) lurking somewhere in a large data set.
Or you may need to check if you have zeroes or negative numbers, or numbers outside a given range.
In such cases, the any() and all() commands are very helpful. You can use them to interrogate R about the values in your data. (more…)
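The checks described above can be sketched as follows. The vector `x` and the range 0 to 10 are hypothetical values chosen for the example.

```r
# Hypothetical data with a zero, a negative number, an NA,
# and a value above 10
x <- c(4, 0, -2, NA, 9, 15)

# Is there at least one missing value?
has_missing <- any(is.na(x))

# Is there at least one negative number? na.rm = TRUE ignores the NA.
has_negative <- any(x < 0, na.rm = TRUE)

# Do all non-missing values lie between 0 and 10?
all_in_range <- all(x >= 0 & x <= 10, na.rm = TRUE)

has_missing    # TRUE
has_negative   # TRUE
all_in_range   # FALSE
```

Without `na.rm = TRUE`, `any()` and `all()` return `NA` whenever the missing values leave the answer undetermined, so it is worth deciding explicitly how missing data should be treated.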
SPSS has the Count Values within Cases option, but R does not have an equivalent function. Here are two functions that you might find helpful, each of which counts values within cases inside a rectangular array. (more…)
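As a minimal sketch of the idea, and not necessarily either of the functions the full post presents, a row-wise count can be built from `rowSums()`. The helper name `count_in_rows` and the matrix `m` are hypothetical.

```r
# Hypothetical helper: for each row (case) of x, count how many
# entries equal `value`; missing entries are not counted
count_in_rows <- function(x, value) {
  rowSums(x == value, na.rm = TRUE)
}

# Small example matrix: three cases, three variables
m <- matrix(c(1, 2, 1,
              3, 1, NA,
              2, 2, 2),
            nrow = 3, byrow = TRUE)

count_in_rows(m, 1)   # counts of the value 1 per row: 2, 1, 0
```

The comparison `x == value` produces a logical array, and summing it row by row treats each `TRUE` as 1, which gives the count for each case.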