online workshops
Linear Models
Most of the statistical analyses you need to do as a researcher are most likely linear regressions or ANOVAs, or some extension of them. But how well do you really understand linear models? learn more
Interpreting (Even Tricky) Regression Coefficients
The statistics classes you’ve had so far have probably focused on straightforward models with only continuous predictors. But real research, unfortunately, isn’t usually that simple. At some point, you’re going to need to build a more complex model. learn more
the craft of statistical analysis free webinars
Interpreting Linear Regression Coefficients
There are many coefficients in linear regression models that are difficult to interpret — interactions, categorical predictors, centered predictors. Put them together into one model and it’s even harder! learn more
Four Critical Steps in Building Linear Regression Models
A primary consideration in model building is which variables to include in the model. A secondary one is deciding which predictors to retain. But the decisions don’t stop there. A number of other considerations can make model building either very straightforward or extremely frustrating. learn more
statistically speaking member trainings
Multicollinearity
Multicollinearity isn’t an assumption of regression models; it’s a data issue. And while it can be seriously problematic, more often it’s just a nuisance. learn more
Hierarchical Regression
Hierarchical regression is a very common approach to model building that allows you to see the incremental contribution to a model of sets of predictor variables. Popular for linear regression in many fields, the approach can be used in any type of regression model — logistic regression, linear mixed models, or even ANOVA. learn more
Using Excel to Graph Predicted Values from Regression Models
Graphing predicted values from a regression model or means from an ANOVA makes interpretation of results much easier. Every statistical software will graph predicted values for you. But the more complicated your model, the harder it can be to get the graph you want in the format you want. learn more
ANCOVA (Analysis of Covariance)
Analysis of Covariance (ANCOVA) is a type of linear model that combines the best abilities of linear regression with the best of Analysis of Variance. It allows you to test differences in group means and interactions, just like ANOVA, while covarying out the effect of a continuous covariate. learn more
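For a flavor of what this looks like in practice, here is a minimal R sketch (simulated data; variable names are illustrative, not from the training):

    # ANCOVA: group differences adjusted for a continuous covariate
    set.seed(1)
    group <- factor(rep(c("control", "treatment"), each = 30))
    covariate <- rnorm(60, mean = 50, sd = 10)
    outcome <- 5 + 3 * (group == "treatment") + 0.4 * covariate + rnorm(60)

    fit <- lm(outcome ~ group + covariate)
    summary(fit)  # the group coefficient is the covariate-adjusted difference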
Dummy and Effect Coding
Why does ANOVA give main effects in the presence of interactions, but Regression gives marginal effects? What are the advantages and disadvantages of dummy coding and effect coding? When does it make sense to use one or the other? How does each one work, really? learn more
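A minimal R sketch of the two coding schemes (simulated data, purely illustrative):

    set.seed(2)
    g <- factor(rep(c("a", "b", "c"), each = 10))
    y <- rnorm(30, mean = rep(c(10, 12, 15), each = 10))

    # Dummy (treatment) coding: each coefficient compares a level to the reference
    summary(lm(y ~ g, contrasts = list(g = contr.treatment)))

    # Effect (sum-to-zero) coding: each coefficient compares a level to the grand mean
    summary(lm(y ~ g, contrasts = list(g = contr.sum)))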
Transformations & Nonlinear Effects in Linear Models
Why is it we can model non-linear effects in linear regression? What the heck does it mean for a model to be “linear in the parameters?” We explore a number of ways of using a linear regression to model a non-linear effect between X and Y. learn more
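“Linear in the parameters” means each coefficient enters the model additively, even if the predictor itself is squared or otherwise transformed. A minimal R sketch (simulated data):

    set.seed(3)
    x <- runif(100, 0, 10)
    y <- 2 + 1.5 * x - 0.1 * x^2 + rnorm(100)

    # Nonlinear in x, but linear in the parameters, so lm() still applies
    fit <- lm(y ~ x + I(x^2))
    summary(fit)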
The Multi-Faceted World of Residuals
For most analysts, the primary focus is checking the distributional assumptions of the residuals: they must be independent and identically distributed (i.i.d.) with a mean of zero and constant variance. But residuals can also give us insight into the quality of our models. learn more
Using Transformations to Improve Your Linear Regression Model
Transformations don’t always help, but when they do, they can improve your linear regression model in several ways simultaneously. They can help you better meet the linear regression assumptions of normality and homoscedasticity (i.e., equal variances). They can also help avoid some of the artifacts caused by boundary limits in your dependent variable, and sometimes even remove a difficult-to-interpret interaction. learn more
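As one illustration, a log transform of a right-skewed outcome often stabilizes the residual variance; a minimal R sketch (simulated data):

    set.seed(4)
    x <- runif(100, 1, 10)
    y <- exp(0.5 + 0.3 * x + rnorm(100, sd = 0.2))  # right-skewed outcome

    fit_raw <- lm(y ~ x)       # residuals tend to fan out as x grows
    fit_log <- lm(log(y) ~ x)  # variance is far more constant on the log scale
    par(mfrow = c(1, 2))
    plot(fitted(fit_raw), resid(fit_raw), main = "Raw y")
    plot(fitted(fit_log), resid(fit_log), main = "log(y)")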
Marginal Means, Your New Best Friend
Interpreting regression coefficients can be tricky, especially when the model has interactions or categorical predictors (or worse – both). But there is a secret weapon that can help you make sense of your regression results: marginal means. learn more
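A minimal R sketch using the emmeans package (assumed installed; data simulated):

    library(emmeans)

    set.seed(5)
    d <- data.frame(group = factor(rep(c("a", "b"), each = 50)), x = rnorm(100))
    d$y <- 2 + (d$group == "b") + 0.5 * d$x +
           0.8 * d$x * (d$group == "b") + rnorm(100)

    fit <- lm(y ~ group * x, data = d)
    emmeans(fit, ~ group)  # group means averaged over the covariate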
Segmented Regression
Linear regression with a continuous predictor is set up to measure the constant relationship between that predictor and a continuous outcome. This relationship is measured in the expected change in the outcome for each one-unit change in the predictor. learn more
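Segmented (piecewise) regression lets that slope change at a breakpoint. A minimal R sketch with an assumed known knot (in practice the breakpoint is usually estimated, for example with the segmented package):

    set.seed(6)
    x <- runif(120, 0, 10)
    knot <- 5  # breakpoint assumed known here, purely for illustration
    y <- 1 + 0.5 * x + 2 * pmax(x - knot, 0) + rnorm(120)

    # Slope is 0.5 below the knot and 0.5 + 2 above it
    fit <- lm(y ~ x + pmax(x - knot, 0))
    summary(fit)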
Quantile Regression: Going Beyond the Mean
In your typical statistical work, chances are you have already used quantiles such as the median, 25th or 75th percentiles as descriptive statistics. But did you know quantiles are also valuable in regression, where they can answer a broader set of research questions than standard linear regression? learn more
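A minimal R sketch using the quantreg package (assumed installed; data simulated so the spread grows with x):

    library(quantreg)

    set.seed(7)
    x <- runif(200, 0, 10)
    y <- 3 + x + rnorm(200, sd = 0.5 + 0.3 * x)  # spread grows with x

    # Median (tau = 0.5) and 90th-percentile regressions give different slopes
    rq(y ~ x, tau = c(0.5, 0.9))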
articles at the analysis factor
Should I Specify a Model Predictor as Categorical or Continuous?
Predictor variables in statistical models can be treated as either continuous or categorical. Usually, this is a very straightforward decision. But there are numerical predictors that aren’t continuous. And these can sometimes make sense to treat as continuous and sometimes make sense as categorical. learn more
What Is Specification Error in Statistical Models?
When we think about model assumptions, we tend to focus on assumptions like independence, normality, and constant variance. The other big assumption, which is harder to see or test, is that there is no specification error. The assumption of linearity is part of this, but it’s actually a bigger assumption. learn more
Steps to Take When Your Regression (or Other) Results Just Look… Wrong
You’ve probably experienced this before. You’ve done a statistical analysis, you’ve figured out all the steps, you finally get results and are able to interpret them. But they just look…wrong. Backwards, or even impossible—theoretically or logically. learn more
Understanding Interactions Between Categorical and Continuous Variables in Linear Regression
We’ve looked at the interaction effect between two categorical variables. But what if our predictors of interest, say, are a categorical and a continuous variable? How do we interpret the interaction between the two? learn more
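A minimal R sketch (simulated data): the interaction coefficient is the difference in slopes between the groups.

    set.seed(8)
    group <- factor(rep(c("a", "b"), each = 50))
    x <- rnorm(100)
    y <- 1 + 2 * (group == "b") + 0.5 * x + 1.5 * x * (group == "b") + rnorm(100)

    # x is the slope for reference group "a"; groupb:x is how much
    # steeper (or flatter) the slope is for group "b"
    fit <- lm(y ~ group * x)
    summary(fit)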
The Distribution of Independent Variables in Regression Models
While there are a number of distributional assumptions in regression models, one distribution carries no assumptions at all: that of the predictor (i.e., independent) variables. That’s because regression models are directional. In a correlation, there is no direction: Y and X are interchangeable, and if you switched them, you’d get the same correlation coefficient. learn more
Differences in Model Building Between Explanatory and Predictive Models
Most of us were trained in building models for the purpose of understanding and explaining the relationships between an outcome and a set of predictors. But model building works differently for purely predictive models. learn more
Why ANOVA is Really a Linear Regression, Despite the Difference in Notation
When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why. And I couldn’t figure it out. The model notation is different. The output looks different. The vocabulary is different. The focus of what we’re testing is completely different. How can they be the same model? learn more
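You can see the equivalence directly in R (simulated data): lm() and aov() fit the same model and produce the same F test.

    set.seed(9)
    g <- factor(rep(c("a", "b", "c"), each = 20))
    y <- rnorm(60, mean = rep(c(10, 12, 15), each = 20))

    anova(lm(y ~ g))     # regression with dummy-coded groups...
    summary(aov(y ~ g))  # ...matches the classical one-way ANOVA table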
The Impact of Removing the Constant from a Regression Model: The Categorical Case
In a simple linear regression model, how the constant (a.k.a., intercept) is interpreted depends upon the type of predictor (independent) variable. If the predictor is categorical and dummy-coded, the constant is the mean value of the outcome variable for the reference category only. If the predictor variable is continuous, the constant equals the predicted value of the outcome variable when the predictor variable equals zero. learn more
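A minimal R sketch of the dummy-coded case (simulated data):

    set.seed(10)
    g <- factor(rep(c("a", "b"), each = 25))
    y <- rnorm(50, mean = rep(c(10, 14), each = 25))

    coef(lm(y ~ g))      # intercept = mean of reference group "a"; gb = difference
    coef(lm(y ~ g - 1))  # constant removed: each coefficient is now a group mean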
When to Leave Insignificant Effects in a Model
You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify it. One effect of leaving in insignificant predictors is on p-values: they use up precious degrees of freedom in small samples. But if your sample isn’t small, the effect is negligible. learn more
Model Building Strategies: Step Up and Top Down
How should I build my model? I get this question a lot, and it’s difficult to answer at first glance–it depends too much on your particular situation. There are really three parts to the approach to building a model: the strategy, the technique to implement that strategy, and the decision criteria used within the technique. learn more
Five Common Relationships Among Three Variables in a Statistical Model
In a statistical model–any statistical model–there is generally one way that a predictor X and a response Y can relate. This relationship can take on different forms, of course, like a line or a curve, but there’s really only one relationship here to measure. Usually the point is to model the predictive ability, the effect, of X on Y. learn more
Can a Regression Model with a Small R-squared Be Useful?
R² is such a lovely statistic, isn’t it? Unlike so many of the others, it makes sense–the percentage of variance in Y accounted for by a model. I mean, you can actually understand that. So can your grandmother. And the clinical audience you’re writing the report for. A big R² is always good and a small one is always bad, right? Well, maybe. learn more
Confusing Statistical Terms #5: Covariate
Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts. Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways. And these different ways of using the term have BIG implications for what your model means. learn more
Making Dummy Codes Easy to Keep Track of
Here’s a little tip. When you construct Dummy Variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results. learn more
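One way to follow the tip in R (hypothetical variable names): name each dummy variable after the category coded 1, so the coefficient is self-documenting.

    d <- data.frame(sex = c("male", "female", "female", "male"))
    d$female <- ifelse(d$sex == "female", 1, 0)  # "female" = 1, "male" = 0
    d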
3 Situations When it Makes Sense to Categorize a Continuous Predictor in a Regression Model
In many research fields, particularly those that mostly use ANOVA, a common practice is to categorize continuous predictor variables so they work in an ANOVA. This is often done with median splits—splitting the sample into two categories—the “high” values above the median and the “low” values below the median. There are many reasons why this isn’t such a good idea. learn more
Likert Scale Items as Predictor Variables in Regression
I was recently asked whether it’s okay to treat a Likert scale item as continuous when it’s a predictor in a regression model. Here’s my reply. In the question, the researcher asked about logistic regression, but the same answer applies to all regression models. learn more
Why ANOVA and Linear Regression Are the Same Analysis
If your graduate statistical training was anything like mine, you learned ANOVA in one class and Linear Regression in another. My professors would often say things like “ANOVA is just a special case of Regression,” but give vague answers when pressed. It was not until I started consulting that I realized how closely related ANOVA and regression are. They’re not only related; they’re the same thing: not a quarter and a nickel, but two sides of the same coin. learn more
Measures of Model Fit for Linear Regression Models
A well-fitting regression model results in predicted values close to the observed data values. The mean model, which uses the mean for every predicted value, generally would be used if there were no useful predictor variables. The fit of a proposed regression model should therefore be better than the fit of the mean model. learn more
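That comparison against the mean model is exactly what R-squared measures; a minimal R sketch (simulated data):

    set.seed(11)
    x <- rnorm(100)
    y <- 2 + 0.8 * x + rnorm(100)

    fit <- lm(y ~ x)
    sse_model <- sum(resid(fit)^2)     # error around the regression line
    sse_mean  <- sum((y - mean(y))^2)  # error around the mean model
    1 - sse_model / sse_mean           # equals R-squared from summary(fit)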
Understanding Interactions Between Dummy-Coded Categorical Variables in Linear Regression
The concept of a statistical interaction is one of those things that seems very abstract. Obtuse definitions don’t help. But statistical interaction isn’t so bad once you really get it. learn more
stata
Incorporating Graphs in Regression Diagnostics with Stata
You put a lot of work into preparing and cleaning your data. Running the model is the moment of excitement. You look at your tables and interpret the results. But first you remember that one or more variables had a few outliers. Did these outliers impact your results? learn more
Linear Regression in Stata: Missing Data and the Stories they Might Tell
In a previous post, we examined how to use the same sample when comparing regression models. Using different samples in our models could lead to erroneous conclusions when interpreting results. But excluding observations can also result in inaccurate results. learn more
Using the Same Sample for Different Models in Stata
In a recent article, I presented a table which examined the impact several predictors have on one’s mental health. At the bottom of the table is the number of observations (N) contained within each sample. The sample sizes are quite large. Does it really matter that they are different? The answer is absolutely yes. Fortunately, in Stata it is not a difficult process to use the same sample for all four models shown. learn more
Hierarchical Regression in Stata: An Easy Method to Compare Model Results
An “estimation command” in Stata is a generic term for a command that runs a statistical model. Examples are regress, anova, poisson, logit, and mixed. Stata has more than 100 estimation commands. Creating the “best” model requires trying alternative models. There are a number of different model building approaches, but regardless of the strategy you take, you’re going to need to compare them. learn more
r
Linear Models in R: Diagnosing Our Regression Model
Last time we created two variables and added a best-fit regression line to our plot of the variables. Today we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. learn more
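A minimal sketch of the kind of diagnostics involved (simulated data; the tutorial’s own example may differ):

    set.seed(12)
    x <- rnorm(50)
    y <- 1 + 2 * x + rnorm(50)
    fit <- lm(y ~ x)

    plot(fit, which = 1)  # residuals vs. fitted values
    plot(x, resid(fit))   # residuals against the predictor
    abline(h = 0)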
Linear Models in R: Plotting Regression Lines
Today let’s re-create two variables and see how to plot them and include a regression line. We take height to be a variable that describes the heights (in cm) of ten people. learn more
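A sketch of the idea (made-up heights and weights, not the tutorial’s data):

    height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)  # cm
    weight <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)           # kg

    plot(height, weight)
    abline(lm(weight ~ height))  # add the best-fit regression line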
R Is Not So Hard! A Tutorial, Part 4: Fitting a Quadratic Model
In Part 4 we will look at more advanced aspects of regression models and see what R has to offer. One way of checking for non-linearity in your data is to fit a polynomial model and check whether the polynomial model fits the data better than a linear model. However, you may also wish to fit a quadratic or higher-order model because you have reason to believe that the relationship between the variables is inherently polynomial in nature. learn more
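A minimal sketch of the comparison (simulated data; the tutorial’s own data set may differ):

    set.seed(13)
    x <- seq(1, 10, length.out = 50)
    y <- 4 + 2 * x - 0.3 * x^2 + rnorm(50)

    fit_lin  <- lm(y ~ x)
    fit_quad <- lm(y ~ x + I(x^2))
    anova(fit_lin, fit_quad)  # does the quadratic term significantly improve fit?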
R Is Not So Hard! A Tutorial, Part 5: Fitting an Exponential Model
In Parts 3 and 4, we saw how to check for non-linearity in our data by fitting polynomial models and checking whether they fit the data better than a linear model. Now let’s see how to fit an exponential model in R. As before, we will use a data set of counts (atomic disintegration events that take place within a radiation source), taken with a Geiger counter at a nuclear plant. learn more
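One common approach, sketched here with simulated counts (the tutorial’s Geiger-counter data and exact method may differ), is to fit the model on the log scale:

    set.seed(14)
    time <- 1:30
    counts <- rpois(30, lambda = 500 * exp(-0.1 * time))  # simulated decay

    # Exponential decay is linear on the log scale
    fit <- lm(log(counts) ~ time)
    coef(fit)  # the slope estimates the decay rate (about -0.1)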
spss
Order Affects Regression Parameter Estimates in SPSS GLM
When you have an interaction in the model, the order you put terms into the Model statement affects which parameters SPSS gives you. The default in SPSS is to automatically create interaction terms among all the categorical predictors. But if you want fewer than all those interactions, or if you want to put in an interaction involving a continuous variable, you need to choose Model > Custom Model. learn more