Why report estimated marginal means?

Updated 8/18/2021

I was recently asked whether to report means from descriptive statistics or from the Estimated Marginal Means in SPSS GLM.

The short answer: Report the Estimated Marginal Means (almost always).

To understand why, and to see the rare cases where it doesn’t matter, let’s dig in a bit with a longer answer.

First, a marginal mean is the mean response for each category of a factor, adjusted for any other variables in the model (more on this later).

Just about any time you include a factor in a linear model, you’ll want to report the mean for each group. The F test of the model in the ANOVA table will give you a p-value for the null hypothesis that those means are equal. And that’s important.

But you need to see the means and their standard errors to interpret the results. The difference in those means is what measures the effect of the factor. While that difference can also appear in the regression coefficients, looking at the means themselves gives you context and makes interpretation more straightforward. This is especially true if you have interactions in the model.

Some basic info about marginal means

  • In the SPSS menus, they are under the Options button; in SPSS syntax, they are requested with the EMMEANS subcommand.
  • These are called LSMeans in SAS, margins in Stata, and emmeans in R’s emmeans package.
  • Although I’m talking about them in the context of linear models, all the software has them in other types of models, including linear mixed models, generalized linear models, and generalized linear mixed models.
  • They are also called predicted means, and model-based means. There are probably a few other names for them, because that’s what happens in statistics.

When marginal means are the same as observed means

Let’s consider a few different models. In all of these, our factor of interest, X, is a categorical predictor for which we’re calculating Estimated Marginal Means. We’ll call it the Independent Variable (IV).

Model 1: No other predictors

If you have just a single factor in the model (a one-way anova), marginal means and observed means will be the same.

Observed means are what you would get if you simply calculated the mean of Y for each group of X.

Model 2: Other categorical predictors, and all are balanced

Likewise, if you have other factors in the model and all of those factors are balanced, the estimated marginal means will be the same as the observed means you got from descriptive statistics.

Model 3: Other categorical predictors, unbalanced

Now things change. The marginal mean for our IV is different from the observed mean. It’s the mean for each group of the IV, averaged across the groups for the other factor.

When you’re observing the category an individual is in, you will pretty much never get balanced data. Even when you’re doing random assignment, balanced groups can be hard to achieve.

In this situation, the observed means will be different than the marginal means. So report the marginal means. They better reflect the main effect of your IV—the effect of that IV, averaged across the groups of the other factor.
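
To make the difference concrete, here is a minimal NumPy sketch, not SPSS output. The data, the factor names X and G, and the helper functions are all hypothetical. It assumes a full-factorial model (both factors plus their interaction), in which case the marginal mean for a group of the IV is the unweighted average of its cell means across the levels of the other factor.

```python
import numpy as np

# Hypothetical unbalanced data: X is the IV (levels "a", "b"),
# G is the other factor (levels "g1", "g2"). Values are Y scores.
cells = {
    ("a", "g1"): [10.0, 12.0, 11.0, 13.0],  # n = 4
    ("a", "g2"): [20.0],                     # n = 1
    ("b", "g1"): [12.0],                     # n = 1
    ("b", "g2"): [22.0, 24.0, 23.0, 21.0],  # n = 4
}
g_levels = ["g1", "g2"]


def observed_mean(x):
    """Plain mean of Y over every observation in group x of the IV."""
    values = [y for (xi, _), ys in cells.items() if xi == x for y in ys]
    return float(np.mean(values))


def marginal_mean(x):
    """Average of the cell means for group x, weighting each level of G equally.

    Assumption: full-factorial model, so each cell's predicted value
    equals its cell mean.
    """
    cell_means = [np.mean(cells[(x, g)]) for g in g_levels]
    return float(np.mean(cell_means))


for x in ["a", "b"]:
    print(x, "observed:", observed_mean(x), "marginal:", marginal_mean(x))
```

In this made-up data set the observed means are 13.2 and 20.4, a difference of 7.2, while the marginal means are 15.75 and 17.25, a difference of 1.5. Most of the observed gap comes from group "a" sitting mainly in g1 and group "b" mainly in g2, not from the IV itself.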

Model 4: A continuous covariate

When you have a covariate in the model, the estimated marginal means will be adjusted for the covariate. Again, they’ll differ from observed means.

It works a little bit differently than it does with a factor. For a covariate, the estimated marginal mean is the mean of Y for each group of the IV at one specific value of the covariate.

By default in most software, this one specific value is the mean of the covariate. Therefore, you interpret the estimated marginal means of your IV as the mean of each group at the mean of the covariate.

This, of course, is the reason for including the covariate in the model: you want to see if your factor still has an effect, beyond the effect of the covariate. You are interested in the adjusted effects in both the overall F-test and in the means.

If you just use observed means and there was any association between the covariate and your IV, some of that mean difference would be driven by the covariate.

For example, say your IV is the type of math curriculum taught to first graders. There are two types. And say your covariate is child’s age, which is related to the outcome: math score.

It turns out that curriculum A has slightly older kids and a higher mean math score than curriculum B. Observed means for each curriculum will not account for the fact that the kids who received that curriculum were a little older. Marginal means will give you the mean math score for each group at the same age. In essence, it sets Age at a constant value before calculating the mean for each curriculum. This gives you a fairer comparison between the two curricula.
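
The curriculum example can be sketched numerically. This is an illustration in NumPy with invented data, not the output of any SPSS procedure. It fits the model score = b0 + b1*(curriculum A) + b2*age by ordinary least squares and then predicts each group's mean at the overall mean age. The variable names and the helper `adjusted_means` are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical data: four children per curriculum; curriculum A has older kids.
is_a = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)      # 1 = curriculum A, 0 = B
age = np.array([78, 80, 82, 84, 72, 74, 76, 78], dtype=float)  # age in months
score = np.array([51, 50, 51, 54, 47, 46, 47, 50], dtype=float)

# Design matrix for: score = b0 + b1 * is_a + b2 * age (common slope for age)
design = np.column_stack([np.ones_like(age), is_a, age])
beta, *_ = np.linalg.lstsq(design, score, rcond=None)
b0, b1, b2 = beta


def adjusted_means(at_age=None):
    """Predicted mean score for each curriculum at one common age.

    By default the age is fixed at the overall sample mean.
    """
    if at_age is None:
        at_age = age.mean()
    return {"A": float(b0 + b1 + b2 * at_age), "B": float(b0 + b2 * at_age)}


observed = {"A": float(score[is_a == 1].mean()), "B": float(score[is_a == 0].mean())}
print("observed means:", observed)
print("adjusted means at mean age:", adjusted_means())
```

With these made-up numbers the observed means differ by 4 points (51.5 versus 47.5), but once both groups are evaluated at the same age of 78 months the gap shrinks to about 1 point (50 versus 49). The rest of the observed difference came from the age difference between the groups.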

But there is another advantage here. Although the default value of the covariate is its mean, you can change this default.  This is especially helpful for interpreting interactions, where you can see the means for each group of the IV at both high and low values of the covariate.

In SPSS, you can change this default using syntax, but not through the menus.

For example, in this syntax, the EMMEANS statement reports the marginal means of Y at each level of the categorical variable X at the mean of the Covariate V.

UNIANOVA Y BY X WITH V
/INTERCEPT=INCLUDE
/EMMEANS=TABLES(X) WITH(V=MEAN)
/DESIGN=X V.

If, instead, you wanted to evaluate the effect of X at a specific value of V, say 50, you can just change the EMMEANS statement to:

/EMMEANS=TABLES(X) WITH(V=50)
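
To see why the chosen covariate value matters when there is an interaction, here is a rough NumPy sketch with invented data. Unlike the SPSS design above, it assumes the model also includes an X*V interaction term, so each group has its own slope. The function `emm` and the covariate values 30 and 70 are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical data: two groups of X (0 and 1), covariate V, response Y.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
v = np.array([20, 40, 60, 80, 20, 40, 60, 80], dtype=float)
y = np.array([15, 17, 21, 27, 14, 20, 28, 38], dtype=float)

# Design matrix for: Y = b0 + b1*X + b2*V + b3*X*V (separate slopes per group)
design = np.column_stack([np.ones_like(v), x, v, x * v])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)


def emm(group, at_v):
    """Predicted mean of Y for one group of X at a chosen value of V."""
    row = np.array([1.0, group, at_v, group * at_v])
    return float(row @ beta)


# Compare the groups at a low value, the mean, and a high value of V.
for at_v in (30.0, v.mean(), 70.0):
    print(f"V = {at_v:g}: group 0 = {emm(0, at_v):.2f}, group 1 = {emm(1, at_v):.2f}")
```

In this made-up example the two groups are about 1 point apart at V = 30, 5 points apart at the mean of V (50), and 9 points apart at V = 70. A single table at the covariate mean would hide that pattern.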

Another good reason to use syntax.

Comments

  1. Philip Campbell says

    Hi Karen,

    Very useful article. I am using the GENLIN function in SPSS to compare a continuous DV across 3 groups (with group 1 acting as the reference control group for comparisons). I am controlling for a continuous covariate, age, and also a dichotomous covariate (smoking status). I have also added an interaction term of age*group, as I know that the effect of age on the DV differs across the groups (a violation of homogeneity of regression slopes, so I am not calling it an ANCOVA). The EM means produced are for the average age (which is interpretable), but for smoking status it reports a value of 4.2. It is not clear what this represents, as you either don’t smoke (coded 0) or do (coded 1). Is there likely something I have done wrong in the setup?

    • Karen Grace-Martin says

      The 4.2 would be averaged across the two smoking statuses.

      If you want an EMMean for each smoking status, you need to include a group*smoking status interaction and then ask for the EMMEans for the interaction.

      • Philip Campbell says

        Thank you very much, that makes complete sense now. Related to this, is there any way to ask SPSS to output an adjusted DV value for each individual study participant that is corrected for the covariates, i.e. as if each participant were the same age and didn’t smoke? I know this is what the EMMeans are producing, but I would like to plot the individual points rather than a bar graph.

        • Karen Grace-Martin says

          Yes. If it’s for each individual, it would be their predicted value. These are under the SAVE subcommand or button.

          But it does take into account the covariates for that person. If you want the same age and didn’t smoke, that’s just the overall mean.

          If you want to come up with predicted values for a hypothetical person with certain values of the covariates, then specify those in the EMMEANS.

  2. Aditja says

    Getting estimated marginal means at different values of a covariate is shared here. Is there an argument for the same in R’s emmeans package or any other package in R?
    Thank you!

  3. Ainhoa says

    Hello, Thanks for the post. It was very helpful.

    I have a question. I saw that my adjusted means and the mean of the adjusted values are different. I don’t know why is that happening and also what values should I present in my research.

    Thanks in advance!

  4. Aleem Ashraf says

    Hi Karen,
    This article is really useful. In the output of ANCOVA, the SPSS produces estimated marginal means adjusted for the continuous covariate at its mean level. I’d like to learn how exactly SPSS adjusts those means. Can you teach us the procedure to manually do that for the sake of understanding what’s going on?

    • Karen Grace-Martin says

      Hi Aleem,

      The short answer is: if Y is the response, X the categorical IV, and Z the continuous covariate, the EMM is the predicted value of Y for each group of X at the point on the regression line between Y and Z where Z is at its mean.

      I realize that may not be super helpful, but it’s really hard to explain without a drawing. I’ll try to add one to the post.

  5. ANdre says

    Hi there Karen,

    Is there anything wrong with reporting the effect size when calculating the difference between two EMMs?

    Thanks

  6. Meenu says

    Hi Karen,

    This is the same model I have been working on with your help having continuous time and categorical group treatment/control and their interaction in mixed model. I want to compare mean outcome of two groups at specific values of continuous covariate (similar to spotlight analysis blog post).

    To understand the means and the p-values obtained from t-tests and from marginal means after running a mixed model, I observed:

    At time point 1, the p-value obtained from the t-test (p=0.1462) is totally opposite from margins (p=0.031), and
    at time point 120, the p-value obtained from the t-test (p=0.0782) is totally opposite from margins (p=0.114).

    However, the mean differences are somewhat similar; only the p-values differ. The trend is totally opposite. Please help me understand why the p-values are opposite.
    How can I explain to a non-statistician that using margins is a better option than the t-test to answer my research question?

    Thanks
    meenu

  7. Leah says

    Hi,
    I have run the mixed linear model to investigate the effect size for two interventions. I also want to report the Estimated marginal means for the within group changes. I am conducting this in spss and have obtained this from the estimates table output. However, this only provided the 95% CI. I was wondering if there is a way to generate p-values in spss for this?
    Thanks
    Leah

  8. Amber van der Wal says

    Dear Karen,
    I have learned that another reason to use the marginal means is when you have unequal cell sizes. I have conducted a two-way anova for example in which the groups aren’t exactly the same size. Do I understand correctly that I should report the marginal means and standard error instead of the mean and standard deviation?
    Thank you in advance.

    Kind regards,
    Amber van der Wal

  9. Lora says

    In addition to a covariate which serves as a control variable (covariate A) for my ANCOVA (model 1), I also want to know whether another covariate (covariate B) is a significant predictor of the DV. Do I need to run Covariate B as a straight IV in a separate ANCOVA model (model 2), or can I just get an EMM for covariate B from the original ANCOVA (model 1)?

  10. David says

    Any advice for getting estimated marginal means with a within-subject variable? I am looking at the dependent variable SIR over three time points (pre, mid, and posttreatment).

    My syntax is:

    MIXED sir with time
    /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0,
    ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
    /FIXED= time | SSTYPE(3)
    /METHOD=REML
    /PRINT=SOLUTION TESTCOV
    /RANDOM=INTERCEPT time | SUBJECT(id) COVTYPE(UN)
    /REPEATED=time | SUBJECT(id) COVTYPE(AR1)
    /SAVE=PRED RESID
    /EMMEANS=TABLES(time).

    And this gives me the error message:
    Only TABLES (OVERALL) is valid in the EMMEANS subcommand when no factors are stated. Execution of this command stops.

    Any advice would be much appreciated!

    Thanks,
    David

    • Karen says

      Hi David,

      The problem is not that it’s within subjects, it’s that you’ve made Time continuous. SPSS will only do EMMeans for each value of a categorical variable. So in your mixed statement, change the WITH to BY. Based on your description, it sounds like time ought to be categorical anyway, as the three time points have qualitative meanings.

      You may find this helpful: SPSS GLM: Choosing Fixed Factors and Covariates

  11. putri says

    dear karen,

    I have an experiment with more than 50 treatments, each of the treatment has very different sample sizes. is it better to present EMM rather than actual mean? would you please explain why?

  12. Desiree says

    Dear Karen,

    Many thanks for this information. I actually have the same questions as Annicka. I am running a LMM on reaction times in spss, with condition as a fixed factor and subject as random factor. (In one of the conditions there are a few missing values). Now the descriptives (only slightly) differ from the EMmeans. Why is this the case? Because there are missing values? Or because of the random factor?
    As I would like to report the means with standard deviations, I am inclined to report the outcomes from the Descriptives. But I now doubt if I perhaps should report the EMmeans. Is there any way to get standard deviations in that case?

    kind regards,
    Desiree

  13. Annicka says

    Hi Karen,
    I have used a univariate linear mixed-effects model in SPSS to investigate the time effect on my outcome variable. That is, follow-up time is added as the only factor. I’m a bit confused about why the estimated marginal means differ from the descriptive ones, as I have not entered any covariates that the model would adjust for. It would be great if you could explain why the means differ.
    Thank you!

  14. Mary says

    Hello,

    I have run a two-way MANCOVA and am reporting the results. Since I should report the estimated means for my descriptives, I am trying to make bar graphs with the estimated means but cannot figure out how, short of making them in Excel. Is there a way to do this in SPSS?

    Feeling lost

    • Karen says

      Hi Mary,

      I don’t know a way off the top of my head other than exporting the EMMeans table as an SPSS data set, then using that as the data set for your graph.

  15. Omar says

    Hello Karen,

    At the moment, if I want to know the EMMs evaluated at multiple values of the covariate, I create separate EMM tables. e.g. extending your example:

    UNIANOVA Y BY X WITH V
    /METHOD=SSTYPE(3)
    /INTERCEPT=INCLUDE
    /EMMEANS=TABLES(X) WITH(V=50)
    /EMMEANS=TABLES(X) WITH(V=100)
    /EMMEANS=TABLES(X) WITH(V=150)
    /CRITERIA=ALPHA(.05)
    /DESIGN=X V.

    Is it possible to do this within a single table?

    Thanks

  16. Andreea says

    Hello ,
    I have a related problem: I want to obtain predicted means of outcome adjusted for various other factors (use the model to predict the outcome at mean values of the co-variates). However I have both categorical and continuous confounders, so I cannot do mean for categorical ones, maybe mode. Is there an easier way in GLM to do this taking into account that some of my predictors are categorical? Initially I was planning to do it in a linear regression, do dummies for my categorical variables, and then work out the modal value of the categorical predictors and add them to the first free row for the corresponding variable at the bottom of the file. Do the same – with the mean – for any continuous predictors. Then in the Linear regression dialogue box select the ‘Save’ and check the ‘Unstandardized predicted values’ and ‘Mean prediction intervals’ boxes. It will save the predicted value plus confidence intervals in that row in the datasheet.
    However I was hoping estimated marginal means will help me work around all those steps, but how does it account for the categorical predictors? thank you!

  17. Claudia says

    I am a little confused:
    – First, you write about “FACTORS (aka categorical predictors)” that were manipulated and not measured.
    – Then, you write about “a COVARIATE in the model that was measured, not manipulated… The estimated marginal means will now be adjusted for the covariate.”
    – But what if I have a measured factor, which I do not treat as a covariate, but as an independent factor?
    – Would this sentence also be right:? “If however, you have an IV in the model that was measured, not manipulated, things are a little different. The estimated marginal means will now be adjusted for the IV.”

    • Karen says

      Hi Claudia,

      Good question. From the model’s mathematical point of view, there is no difference between variables that are manipulated or observed. Observed variables are more likely to be correlated, whereas manipulated ones are more likely to be independent. Beyond that, there is no difference in how SPSS estimates a manipulated or observed variable.

      The model only cares if it’s categorical or continuous.

      So yes, you would still treat a measured factor as a factor. The only thing that differs is how you will interpret the results. The estimated marginal means will be adjusted for any other predictors, factors or covariates, in the model.

  18. Ian says

    Lots of good advice on this subject, thanks! One issue however: isn’t the rote calculation of EMMs for groups, after adjustment for covariates, equivalent to doing ANCOVA without first testing for heterogeneity of slopes by the significance of the covariate X categorical interaction term?

  19. Mariska says

    Hi,

    For a meta-analysis, we need a mean and standard deviation (sd) to calculate effect sizes. We have estimated standardized means and standard errors (se) from SPSS, but no standard deviations. Is it correct to apply the formula sd = se * sqrt(n) on our se from our adjusted analysis to calculate the standard deviation? Thank you for your help!

    Mariska

    • Karen says

      It depends on exactly which procedure you’re using. Your means are standardized? Hmm.

      If you’re using, say the estimated marginal means, realize that those are based on the assumption that all groups have the same variance. So those std errors aren’t unique. I’m not sure if you need unique sd’s for meta-analysis.

  20. Maya says

    It’s great to have a plot of marginal means, but how can I add SD or SE to that plot. Can anyone help.
    Maybe there is a syntax or something that can help?

    Thanks.

    • Karen says

      Hi Maya,

      I don’t know that you can do it within the GLM plot. But you can export the EMMeans table, with standard errors, and plot those.

      Karen

  21. Alessio Toraldo says

    Dear all
    thank you for the useful posts. I have a related problem.
    I have to run a GLM analysis with factors A, B and a covariate C.
    I wished to know what the EMM of AxB are when C=0, and you already solved my problem, by suggesting the syntax to obtain such information.
    However, I also wish to have the significance values for the main effect of A, the main effect of B, and for the interaction AxB, *all computed at C=0.*
    SPSS, by default, gives you the ANOVA output table (with all F, df, p-values, etc) with effects of factors and interactions computed for the *average* values of the covariate. Instead, I would need to have the table referring to a specific covariate value (C=0, see above). Do you know how to do it?
    Thank you for any suggestions.
    Alessio

  22. Kathy says

    Hi Karen,
    I need to report the standard deviation with my marginal means instead of standard error. Is there anyway to calculate that via spss?

    Thanks

    • Karen says

      Hi Kathy,

      I believe the easiest way is to get the descriptives. They won’t be adjusted means, but the standard deviations will be there too.

      Either check the descriptives box under the Options button or use /Print Descriptives in syntax.

  23. alex says

    Thanks for the content.

    I have a related question: I want to know how using SPSS to generate a scatter plot of my data taking corrected for the covariates.

    I have a single predictor variable (X) that I am interested in its effect on a single response variable (Y). But I have several covariates and one factor variable.

    Can I plot the effect of X on Y taking into account 4 covariates and 1 factor?

    • Karen says

      Hi Alex,

      Yes, but you’ll have to do it in two steps.

      The first step is to run a regression model regressing Y on the 4 covariates and 1 factor (without X). Save the residuals, which is easy to do in GLM with a /SAVE Resid subcommand.

      Those residuals are literally the distribution of Y after controlling for all those covariates. It’s what’s still not explained by those covariates.

      Now plot X vs. Residuals.

  24. Sigrid says

    Hi Karen,
    thank you for your answer.
    I do not think that my situation is comparable to the one you mention. The problem that I suppose they want me to address is that they would like to be able to apply my results to all possible populations and not just mine, which is representative of my country only. So they (and I) are wondering whether there is a way to make general comments on the results of my calculations.
    If you have any idea on how to do it, it would be a great help to me.
    Thanks, Sigrid

  25. Karen says

    Hi Sigrid,

    I can really only guess what they’re asking for, but it sounds like it isn’t about the standard errors.

    The EMMeans adjust for other terms in the model, but that won’t make them interpretable for a similar population.

    One thing I just saw in consulting, which I’ve never seen before, is the researcher added a weight command before running her glm. It seemed strange to me because none of the reasons for weighting applied (missing data, complex sample, nonconstant variance).

    It turns out that she weighted so that the results would be adjusted to be representative of the population. She had equal n’s in her three samples (it was an experiment), but these samples come from populations that aren’t equally observed in the population.

    This seemed strange to me, since she wasn’t estimating the overall population mean, just the mean for each group, but it might be very important in her field in ways I’m not familiar with. Could it be something like that?

  26. Sigrid says

    Hi Karen,
    I performed a Gamma GLM and was asked to produce adjusted estimates for my dependent variable because the results should be interpretable for a similar population. I am confused. What am I actually being asked for?
    I computed model-based estimates as well as robust ones and they hardly differed. Hence I chose robust estimates, since they would allow for errors from an incorrectly specified covariance structure. Somehow I have the feeling that this does not address the question. Could you please tell me what I actually have to do?
    Thanks, Sigrid

    /EMMEANS SCALE=ORIGINAL
    /EMMEANS TABLES=vdichotom1 SCALE=ORIGINAL COMPARE=vdichotom1 CONTRAST=PAIRWISE
    PADJUST=SEQBONFERRONI
    /EMMEANS TABLES=vdichotom2 SCALE=ORIGINAL COMPARE=vdichotom2 CONTRAST=PAIRWISE
    PADJUST=SEQBONFERRONI

  27. Otto says

    It seems, that SPSS 18 doesn’t adjust the Estimated Marginal Means for a Repeated Measures ( Within-Subject)-Variable.

    • Karen says

      Otto, that’s not surprising if you ran it in GLM Repeated Measures. In that approach, the within subject variable is actually made up of multiple variables–one response for each level of the variable (the wide format).

      If you ran it in Mixed, it would adjust for the within subject variable, since it is able to account for the within-subject variable as a single variable. It requires setting up the data differently (the long format).

  28. orna says

    Hi Karen, thank you for the informative post!

    I recently ran a repeated measures analysis, and I’m not sure which means I should report. I have 2 independent variables (1 within subject, and one between), and the cells are similar in size.

    Should I report the estimated marginal means, or should I report the means and SD’s from the descriptive tables? (From some reason, the descriptive do not include overall means and SD’s for the between-subject variable).

    Thank you,
    orna

    • Karen says

      Hi Orna,

      Report the Estimated Marginal Means. If your independent variables are independent of each other, they shouldn’t differ from the descriptives anyway. And if they do, the EMMeans are the ones you’re interested in.

      Karen

  29. Guido says

    Hi Karen,

    Thanks very much for you very quick response! This is exactly what I did; I asked for the means at high and low levels of the continuous predictors. And the data makes sense (theoretically and replicating earlier finding in which I used different paradigm). Thanks again!

    Best, Guido

  30. Guido says

    Hi,

    Nice article! I have an additional question. Is it o.k. if the estimated marginal means have a negative value (on a measure that can’t be negative). Or should my alarm bells be ringing? The data, however, does make sense to me (a ran a GLM with a categorical between factor, one repeated factor, and two continuous predictors). Thanks for any hints!

    • Karen says

      Hi Guido,

      It is possible to get a negative EMMean even if the DV can’t be negative, for example if you asked for the EMMean at a value of a continuous predictor that doesn’t actually exist in the data.

      But unless you specifically did something like that, my alarm bells would be ringing. I would just check into it and make sure it’s estimating what you think it is.

      Karen

  31. Janelle says

    I need to somehow obtain a SD from the Marginal Means SE because I have a problem where I have overlapping samples (I have three types of a disease where people may have more than 1 type of this disease) and I’m testing differences between these 3 disease types. I have a way to compute a variance of the differences between overlapping samples but I need to be able to obtain SD rather than a SE. Can anyone help?

  32. Bill says

    Hi,

    Very useful article, thank you.
    I have an additional question.
    Why are the ‘estimated marginal means’ adjusted for a measured covariate not the same as the means of the new variable NewY, which is obtained after saving the unstandardized predicted values?

  33. Katy says

    Through some trial and error today I discovered that SPSS doesn’t seem to give the standard error of the mean in the EMM. They are reporting a standard error, but it seems to be based only on sample size and not on standard deviation. Is there any way to get the SEM or the actual standard deviation for estimated marginal means with a covariate? When I try to calculate the stdev from the standard error provided in EMM, I get the same stdev for each group, which seems doubtful. I’m now worried about the legitimacy of using standard error from SPSS EMM in post-hoc t-tests if it is not really the standard error of the mean–anyone have insight on this?

    • Karen says

      Hi Katy,

      That’s a great question. It threw me for a loop when I first discovered it too, but it’s actually not a problem.

      The standard errors in the estimated marginal means are all based on the Mean Squared Error (MSE) in the overall ANOVA table. It reports them this way based on the ANOVA assumption that all groups have equal variance.

      If that assumption is true, it’s inefficient to report separate estimates of the same population variance. So rather than report the variance separately for each group mean, it uses the average variance of all the groups.

      • Far says

        Fixed factorD Mean Std. Error
        Level1 88.742 .751
        Level2 88.872 .832
        Level3 89.664 .738
        Hi there
        My design is a factorial design. Factor C with 4 levels and Factor D with 3 levels.
        In my final results table I would like to report just one SEM for each fixed factor. The table you see above is the estimated marginal means table from a GLM univariate analysis in SPSS. May I please know which one of these std errors is my SEM for FactorD?
        In advance thank you so much for your help and consideration.
        Cheers.
        the table should be like this

        level1 level2 level3 SE
        dependent variable 98.11 98.44 97.65 0.265

        • Karen says

          Hi Far,

          Hmm, usually the estimated marginal means give just one std error across a factor, but the descriptives give multiple values.

          I’m not sure what’s going on there.

    • admin says

      Vandana,

      Most statistical software should give you the standard errors along with the EMM. I know both SPSS and SAS do (SAS calls them LSMeans) in GLM.

      Karen

      • Jonathan says

        I have the same issue. How can I get SPSS to tell me the standard deviations of adjusted means?

