In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.
1. Because she was making you calculate everything by hand. Sums of squares require a different formula if sample sizes are unequal, but statistical software will automatically use the right formula. So we’re not too concerned. We’re definitely using software.
2. Nice properties in ANOVA, such as the grand mean being the intercept in an effect-coded regression model, don’t hold when data are unbalanced. With unequal group sizes, the intercept becomes the unweighted mean of the group means rather than the grand mean, so you need to weight by group size to recover it. That’s not a big deal if you’re aware of it. (more…)
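To make that concrete, here’s a minimal sketch in Python (statsmodels, with made-up group labels and values) comparing the effect-coded intercept to the grand mean when the group sizes are unequal:

```python
# A minimal sketch (made-up data) of how the intercept under effect
# (sum-to-zero) coding compares to the grand mean with unequal group sizes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical unbalanced data: three groups with different sample sizes
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": ["a"] * 10 + ["b"] * 20 + ["c"] * 5,
    "y": np.concatenate([
        rng.normal(10, 2, 10),
        rng.normal(12, 2, 20),
        rng.normal(15, 2, 5),
    ]),
})

# Effect-coded (sum-to-zero) one-way ANOVA run as a regression
fit = smf.ols("y ~ C(group, Sum)", data=df).fit()

grand_mean = df["y"].mean()                               # weighted by group size
unweighted_mean = df.groupby("group")["y"].mean().mean()  # mean of the group means

print("intercept:      ", fit.params["Intercept"])
print("grand mean:     ", grand_mean)
print("unweighted mean:", unweighted_mean)
# With unbalanced groups, the intercept matches the unweighted mean of the
# group means, not the grand mean.
```

With balanced groups the two means coincide, which is why the intercept “works” as the grand mean in textbook examples.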
Learning how to analyze data can be frustrating at times. Why do statistical software companies have to add to our confusion?
I do not have a good answer to that question. What I will do is show examples. In upcoming blog posts, I will explain what each piece of output means and how it is used in a model.
We will focus on ANOVA and linear regression models using SPSS and Stata software. As you will see, the biggest differences are not across software, but across procedures in the same software.
(more…)
Our analysis of linear regression focuses on parameter estimates, t-statistics, p-values, and confidence intervals. Rarely in regression do we see a discussion of the sums of squares and F statistics given in the ANOVA table above the coefficients and p-values.
And yet, they tell you a lot about your model and your data. Understanding the parts of the table and what they tell you is important for anyone running any regression or ANOVA model.
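As a quick illustration, here’s a minimal sketch in Python (statsmodels, with made-up data; the variable names are just placeholders) that prints the ANOVA table for a regression model alongside the usual coefficient table:

```python
# A minimal sketch (made-up data) of pulling up the ANOVA table that sits
# above the coefficient table in most regression output.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x": rng.normal(size=50),
    "g": rng.choice(["a", "b"], size=50),
})
df["y"] = 2 + 1.5 * df["x"] + (df["g"] == "b") * 3 + rng.normal(size=50)

fit = smf.ols("y ~ x + g", data=df).fit()

# Sums of squares, degrees of freedom, F statistics, and p-values per term
print(sm.stats.anova_lm(fit, typ=2))

# The coefficient table: estimates, standard errors, t, p, confidence intervals
print(fit.summary())
```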
(more…)
Whether or not you run experiments, there are elements of experimental design that affect how you need to analyze many types of studies.
The most fundamental of these are replication, randomization, and blocking. These key design elements come up in studies under all sorts of names: trials, replicates, multi-level nesting, repeated measures. Any data set that requires mixed or multilevel models has some of these design elements. (more…)
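To picture what randomization and blocking look like in practice, here’s a minimal sketch in Python (the treatments and blocks are hypothetical) of a randomized complete block design, where every treatment appears once in every block in a random order:

```python
# A minimal sketch (made-up treatments and blocks) of randomization within
# blocks: each block gets every treatment once, in a random order.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
treatments = ["control", "drug_a", "drug_b"]   # hypothetical treatments
blocks = [f"block_{i}" for i in range(1, 5)]   # hypothetical blocks (e.g., sites)

rows = []
for block in blocks:
    # Randomize the order of treatments independently within each block
    for plot, treatment in enumerate(rng.permutation(treatments), start=1):
        rows.append({"block": block, "plot": plot, "treatment": treatment})

design = pd.DataFrame(rows)
print(design)
# Analyzing data from a design like this typically means treating block
# as a random effect in a mixed model.
```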
Oops—you ran the analysis you planned to run on your data, carefully chosen to answer your research question, but your residuals aren’t normally distributed.
Maybe you’ve tried transforming the outcome variable, or playing around with the independent variables, but still no dice. That’s ok, because you can always turn to a non-parametric analysis, right?
Well, sometimes.
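Here’s a minimal sketch in Python (scipy and statsmodels, with made-up skewed data) of checking residual normality, trying a transformation, and falling back on a rank-based test:

```python
# A minimal sketch (made-up data) of the situation described above:
# non-normal residuals, a transformation, and a rank-based alternative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 20),
    # A skewed outcome, so the residuals won't look normal
    "y": rng.exponential(scale=[1.0, 2.0, 3.0], size=(20, 3)).T.ravel(),
})

# One-way ANOVA run as a regression; check the residuals
fit = smf.ols("y ~ group", data=df).fit()
print(stats.shapiro(fit.resid))        # small p-value -> residuals look non-normal

# Option 1: transform the outcome and re-check
fit_log = smf.ols("np.log(y) ~ group", data=df).fit()
print(stats.shapiro(fit_log.resid))

# Option 2: a non-parametric (rank-based) alternative to the one-way ANOVA F test
groups = [g["y"].to_numpy() for _, g in df.groupby("group")]
print(stats.kruskal(*groups))
```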
(more…)
Mixed models are hard.
They’re abstract, they’re a little weird, and there is not a common vocabulary or notation for them.
But they’re also extremely important to understand because many data sets require their use.
Repeated measures ANOVA has too many limitations. It just doesn’t cut it anymore.
One of the most difficult parts of fitting mixed models is figuring out which random effects to include in a model. And that’s hard to do if you don’t really understand what a random effect is or how it differs from a fixed effect. (more…)
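To make the fixed/random distinction concrete, here’s a minimal sketch in Python (statsmodels MixedLM, with made-up repeated-measures data; the variable names are just illustrations) of a model with a fixed effect for treatment and a random intercept for each subject:

```python
# A minimal sketch (made-up repeated-measures data) of a mixed model with a
# fixed effect for treatment and a random intercept per subject.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_subjects, n_reps = 30, 4
subjects = np.repeat(np.arange(n_subjects), n_reps)
treatment = rng.choice(["control", "treated"], size=n_subjects)[subjects]
subject_effect = rng.normal(0, 2, size=n_subjects)[subjects]  # subject-to-subject variation
y = 10 + 3 * (treatment == "treated") + subject_effect + rng.normal(0, 1, size=len(subjects))

df = pd.DataFrame({"subject": subjects, "treatment": treatment, "y": y})

# Fixed effect: treatment (the specific levels we care about).
# Random effect: subject (a sample of subjects whose variation we model).
fit = smf.mixedlm("y ~ treatment", data=df, groups=df["subject"]).fit()
print(fit.summary())
```

Treatment is fixed because we care about those particular levels; subject is random because these subjects stand in for a larger population whose variation we want to model.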