In a previous post, Interpreting Interactions in Regression, I said the following:
In our example, once we add the interaction term, our model looks like:
Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
Adding the interaction term changed the values of B1 and B2. The effect of Bacteria on Height is now 4.2 + 3.2*Sun. For plants in partial sun, Sun = 0, so the effect of Bacteria is 4.2 + 3.2*0 = 4.2. So for two plants in partial sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 4.2 cm taller than a plant with less bacteria.
For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4. So for two plants in full sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 7.4 cm taller than a plant with less bacteria.
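The arithmetic in the passage above can be checked with a short Python sketch. The function name `bacteria_effect` and the constant names are illustrative assumptions, not part of the original post; the coefficients are the ones from the example model.

```python
# Coefficients from the example model:
# Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
B_BACTERIA = 4.2      # coefficient of Bacteria
B_INTERACTION = 3.2   # coefficient of Bacteria*Sun


def bacteria_effect(sun):
    """Change in predicted Height (cm) per one-unit difference in Bacteria.

    sun is the dummy code: 0 = partial sun, 1 = full sun.
    """
    return B_BACTERIA + B_INTERACTION * sun


for sun, label in [(0, "partial sun"), (1, "full sun")]:
    print(f"{label}: {bacteria_effect(sun):.1f} cm per unit of Bacteria")
```

Note that the Sun coefficient (9) never enters this function: it shifts the predicted height, but it does not change how height responds to Bacteria.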
But I just received the following question about this explanation. I thought I’d respond here, in case I’m confusing other people as well.
The question was:
I was confused on how to interpret the interaction results. According to the post “For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4.” I do not understand why the “sun” coefficient is not included, such that the effect of bacteria in full sun would be 9 + 4.2 + 3.2*1. Thanks for your help.
And here’s my answer:
Excellent question. First of all, you would need to include the 9 (the coefficient for full sun) to calculate the predicted, or mean, height for plants in full sun at any specific value of Bacteria that you decided to plug in.
Because Sun is dummy-coded, that 9 (Sun’s coefficient) represents the difference in mean plant heights for plants in full sun compared to those in partial sun ONLY when Bacteria=0.
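To see why the 9 describes the full-sun versus partial-sun difference only at Bacteria=0, here is a minimal Python sketch that plugs a few values into the example model. The function names `predicted_height` and `sun_gap` and the chosen Bacteria values are assumptions made for illustration.

```python
def predicted_height(bacteria, sun):
    """Predicted Height (cm) from the example model.

    bacteria: soil bacteria count, in the model's units
    sun: dummy code, 0 = partial sun, 1 = full sun
    """
    return 35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun


def sun_gap(bacteria):
    """Full-sun minus partial-sun predicted Height at a given Bacteria value."""
    return predicted_height(bacteria, 1) - predicted_height(bacteria, 0)


# The gap equals 9 only when Bacteria is 0; it grows by 3.2 per unit after that.
for bacteria in (0, 1, 2):
    print(f"Bacteria={bacteria}: full sun minus partial sun = {sun_gap(bacteria):.1f} cm")
```

The gap works out to 9 + 3.2*Bacteria, so the Sun coefficient is needed for predicted heights, but it is a difference in means, not part of the Bacteria slope.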
But to know the effect of Bacteria levels on plant height, you don’t need to know the differences in means. The effect of a predictor variable, X, in a regression model is how much Y differs, on average, for a one-unit difference in X.
In this example, it’s the increase (or decrease) in plant height for each incremental difference in soil bacteria count.
That’s the slope in a simple linear regression.
The interaction is telling you that this increase is not the same for plants in full and partial sun.
So the coefficient of Bacteria on its own is not enough to tell you the effect of Bacteria on plant height. It is not an overall slope for Bacteria, because the effect is not constant.
There are two different slopes (effects of Bacteria on Height): one for full sun and one for partial sun.
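Substituting each value of Sun into the example model gives one regression line per sun condition. This is only the algebra worked out from the coefficients quoted above, not output from any fitted software:

```latex
\begin{aligned}
\text{Partial sun } (\text{Sun}=0):\quad & \text{Height} = 35 + 4.2\,\text{Bacteria} \\
\text{Full sun } (\text{Sun}=1):\quad & \text{Height} = (35 + 9) + (4.2 + 3.2)\,\text{Bacteria} = 44 + 7.4\,\text{Bacteria}
\end{aligned}
```

The 9 ends up in the full-sun intercept (44), while the 3.2 ends up in the full-sun slope (7.4). That is why the effect of Bacteria in full sun is 7.4 and not 9 + 4.2 + 3.2.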
Bul says
Hi all, I need some clarification about the significance of the coefficients. Do we require that all coefficients be significant? I mean the coefficient of Bacteria and the coefficient of the interaction of Bacteria with Sun. If only the coefficient of the interaction is significant, do we consider only its value for interpretation (not summing it with the coefficient of Bacteria)? Thank you for clarifying.
Seren says
If I am unsure whether the model requires an interaction term, and I add it in anyway, could this cause the model to be incorrect? In other words, is it a safe bet to always add in an interaction term? Thank you
NEnduru says
Hi Seren,
We add an interaction term if there is any relationship between two variables.
E.g., predicting children’s food nutrition based on family size and income. Here the interaction between family size and income can make a significant difference.
Karen Grace-Martin says
NEnduru,
I would just clarify: an interaction isn’t about the relationship between two variables. It’s about how a third variable affects the relationship between two others. It’s a subtle difference. See The Difference Between Interaction and Association.
A says
The effect of Bacteria on Height can be interpreted as “how much Height differs for a one-unit difference in Bacteria”
Rewrite original formula:
Height_1 = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
= 35 + 9*Sun + (4.2 + 3.2*Sun)*Bacteria
Assume Bacteria increases by 1:
Height_2 = 35 + 9*Sun + (4.2 + 3.2*Sun)* (Bacteria + 1)
= 35 + 9*Sun + (4.2 + 3.2*Sun)*Bacteria + (4.2 + 3.2*Sun)
= Height_1 + (4.2 + 3.2*Sun)
Therefore, Height_2 – Height_1 = 4.2 + 3.2*Sun
So the effect is 4.2 + 3.2*Sun
Karen says
Yes, exactly. That’s the whole idea of the interaction: the effect of bacteria on height is not the same in partial sun as it is in full sun.
ben says
Would it be correct to say that the effect of bacteria AND sun is 9+4.2+3.2*1? Or what else would that sum mean?
thanks for any input,
ben
Hendrik says
What if, instead of Bacteria, we had another dummy coded variable that interacted with Sun: Would this argument still hold?