No matter what statistical model you’re running, you need to go through the same steps. The order and the specifics of how you do each step will differ depending on the data and the type of model you use.
These steps are in 4 phases. Most people think of only the third as modeling. But the phases before this one are fundamental to making the modeling go well. It will be much, much easier, more accurate, and more efficient if you don’t skip them.
And there is no point in running the model if you skip phase 4.
If you think of them all as part of the analysis, the modeling process will be faster, easier, and make more sense.
Phase 1: Define and Design
In the first 5 steps of running the model, the object is clarity. You want to make everything as clear as possible to yourself. The more clear things are at this point, the smoother everything will be. (more…)
There are many designs that could be considered Repeated Measures design, and they all have one key feature: you measure the outcome variable for each subject on several occasions, treatments, or locations.
Understanding this design is important for avoiding analysis mistakes. For example, you can’t treat multiple observations on the same subject as independent observations.
Example
Suppose that you recruit 10 subjects (more…)
I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the errors, εi, or the response variable, Yi.
The asker had a situation where Y, the response, was not normally distributed, but the residuals were.
Quick Answer: It’s just the errors.
In fact, if you look at any (good) statistics textbook on linear models, you’ll see below the model, stating the assumptions: (more…)
One of those tricky, but necessary, concepts in statistics is the difference between crossed and nested factors.
As a reminder, a factor is any categorical independent variable. In experiments, or any randomized designs, these factors are often manipulated. Experimental manipulations (like Treatment vs. Control) are factors.
Observational categorical predictors, such as gender, time point, poverty status, etc., are also factors. Whether the factor is observational or manipulated won’t affect the analysis, but it will affect the conclusions you draw from the results.
(more…)
One issue in data analysis that feels like it should be obvious, but often isn’t, is setting up your data.
The kinds of issues involved include:
- What is a variable?
- What is a unit of observation?
- Which data should go in each row of the data matrix?
Answering these practical questions is one of those skills that comes with experience, especially in complicated data sets.
Even so, it’s extremely important. If the data isn’t set up right, the software won’t be able to run any of your analyses.
And in many data situations, you will need to set up the data different ways for different parts of the analyses. (more…)
Are you learning Multilevel Models? Do you feel ready? Or in over your head?
It’s a very common analysis to need to use. I have to say, learning it is not so easy on your own. The concepts of random effects are hard to wrap your head around and there is a ton of new vocabulary and notation. Sadly, this vocabulary and notation is not consistent across articles, books, and software, so you end up having to do a lot of translating.
(more…)