Choosing statistical software is part of The Fundamentals of Statistical Skill and is necessary for learning a second software package (something we recommend to anyone progressing from Stage 2 to Stage 3 and beyond).
Sometimes what is most tricky about understanding your regression output is knowing exactly what your software is presenting to you.
Here’s a great example of what looks like two completely different model results from SPSS and Stata that, in reality, agree.
I ran a linear model regressing “physical composite score” on education and “mental composite score”.
The outcome variable, physical composite score, is a measurement of one’s physical well-being. The predictor “education” is categorical with four categories. The other predictor, mental composite score, is continuous and measures one’s mental well-being.
I am interested in determining whether the association between physical composite score and mental composite score is different among the four levels of education. To determine this, I included an interaction between mental composite score and education.
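To make the interaction concrete, here is a minimal Python sketch with made-up numbers (not the results from this analysis). It assumes hypothetical slopes of physical composite score on mental composite score within each of four education levels, and rewrites them as a reference-group slope plus interaction terms. One common reason two packages can appear to disagree is that they choose a different reference category; that is an assumption made here purely for illustration. The level names `edu1` to `edu4` and both helper functions are hypothetical.

```python
# Hypothetical slopes of physical score on mental score within each
# education level (illustrative values only, not output from any model).
group_slopes = {"edu1": 0.10, "edu2": 0.25, "edu3": 0.40, "edu4": 0.55}


def parameterize(slopes, reference):
    """Return (main-effect slope, interaction terms) for a chosen reference level."""
    main = slopes[reference]
    interactions = {g: s - main for g, s in slopes.items() if g != reference}
    return main, interactions


def implied_slopes(main, interactions, reference):
    """Recover each group's slope from the main effect plus its interaction term."""
    slopes = {reference: main}
    for g, delta in interactions.items():
        slopes[g] = main + delta
    return slopes


# Same hypothetical model, written with two different reference categories.
first_ref = parameterize(group_slopes, "edu1")
last_ref = parameterize(group_slopes, "edu4")

recovered_first = implied_slopes(*first_ref, "edu1")
recovered_last = implied_slopes(*last_ref, "edu4")

print("main effect, first level as reference:", first_ref[0])
print("main effect, last level as reference: ", last_ref[0])
```

The two "main effect" coefficients differ because each one is the slope for a different reference group, yet the slopes recovered for every education level match across the two parameterizations (up to floating-point rounding).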
Here is the result of the regression using SPSS:
We’ve talked a lot around here about the reasons to use syntax — not only menus — in your statistical analyses.
Regardless of which software you use, the syntax file is pretty much always a text file. This is true for R, SPSS, SAS, Stata — just about all of them.
This is important because it means you can use an unlikely tool to help you code: Microsoft Word.
I know what you’re thinking. Word? Really?
Yep, it’s true. Essentially it’s because Word has much better Search-and-Replace options than your stat software’s editor.
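Because a syntax file is plain text, the kind of pattern-based replacement this relies on can be sketched in any tool that handles text. The short Python snippet below only illustrates the general idea and is not a description of Word's own features; the sample syntax lines and variable names are hypothetical.

```python
import re

# Hypothetical lines of syntax; the variable names are made up.
syntax = """\
regress pcs_2019 mcs_2019 i.educ
summarize pcs_2019 mcs_2019
"""

# Replace the year suffix on every variable in one pass.
# The capture group keeps the variable stem; \b limits matches to whole tokens.
updated = re.sub(r"\b(\w+)_2019\b", r"\1_2020", syntax)
print(updated)
```

A pattern with a capture group changes every matching variable name at once while leaving the rest of each command untouched.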
Here are a couple of features of Word’s search-and-replace that I use to help me code faster:
In a previous post we discussed the difficulties of spotting meaningful information when we work with a large panel data set.
Observing the data collapsed into groups, such as quartiles or deciles, is one approach to tackling this challenging task. We showed how this can be easily done in Stata using just 10 lines of code.
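The same collapsing idea can be sketched outside Stata. The Python example below is an illustration under stated assumptions, not the code from the earlier post: it simulates a small hypothetical panel, assigns each subject to a quartile based on the first-period value, and then averages the outcome within each quartile and period.

```python
import random
from collections import defaultdict
from statistics import mean

random.seed(1)

# Simulated panel: 200 hypothetical subjects observed over 5 periods.
n_subjects, n_periods = 200, 5
panel = {
    subject: [random.gauss(50 + subject * 0.1, 5) for _ in range(n_periods)]
    for subject in range(n_subjects)
}

# Rank subjects by their first-period value and split into four equal groups.
ranked = sorted(panel, key=lambda s: panel[s][0])
group_size = n_subjects // 4
quartile = {s: min(i // group_size, 3) + 1 for i, s in enumerate(ranked)}

# Collapse: mean outcome for each (quartile, period) cell.
cells = defaultdict(list)
for subject, values in panel.items():
    for period, value in enumerate(values):
        cells[(quartile[subject], period)].append(value)

collapsed = {key: mean(vals) for key, vals in cells.items()}

for q in range(1, 5):
    row = [round(collapsed[(q, t)], 1) for t in range(n_periods)]
    print(f"quartile {q}: {row}")
```

Here 1,000 subject-period observations reduce to 20 quartile-by-period means, which is a far easier table to scan for trends.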
As promised, we will now show you how to graph the collapsed data. (more…)
Panel data provides us with observations over several time periods per subject. In this first of two blog posts, I’ll walk you through the process of collapsing such data into groups. (Stick with me here. In Part 2, I’ll show you the graph, I promise.)
The challenge is that some of these data sets are massive. For example, if we’ve collected data on 100,000 individuals over 15 time periods, then that means we have 1.5 million cells of information.
So how can we look through this massive amount of data and observe trends over the time periods that we have tracked? (more…)
The concept of a statistical interaction is one of those things that seems very abstract. Abstruse definitions, like this one from Wikipedia, don’t help:
In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of regression analyses.
First, we know this is true because we read it on the internet! Second, are you more confused now about interactions than you were before you read that definition? (more…)