Volume 5, Issue 12
July 2013

A Note From Karen

Have you ever noticed that some of the most useful statistics are completely non-intuitive when you first encounter them?

Sometimes it takes a little bit of breaking the statistic apart to see what it’s doing in a particular context.

This month’s newsletter is about one of those, Intraclass Correlation (ICC). It’s a bit of a mouthful to say and a little strange at first, but very useful in a few statistical contexts, including mixed models.

I hope you find it useful.

In the meantime, we’ve got a few things in the works.

One is some great workshops for the fall, including two brand new ones by two different guest instructors. I will reveal more once those plans are firmed up.

Over the next few weeks, we hope to unveil (as it were) on-demand versions of some of our favorite workshops. We have added to our team a new techy user experience person who is currently putting them together in a way that makes them easy to navigate. Again, we’ll let you know once they are available.

And, if you’re currently working with me one-on-one, or just contemplating it, I will be away (and without internet) for the middle two weeks of August. So please get in touch in the next few weeks so that we can get you on the calendar.

Happy analyzing!
Karen


Feature Article: The Intraclass Correlation Coefficient in Mixed Models

The ICC, or Intraclass Correlation Coefficient, can be very useful in many statistical situations, but especially so in Linear Mixed Models.

Linear Mixed Models are used when there is some sort of clustering in the data.

Two common examples of clustered data include:

  • individuals were sampled within sites (hospitals, companies, community centers, schools, etc.).  The site is the cluster.
  • repeated measures or longitudinal data where multiple observations are collected from the same individual.  The individual is the cluster in which multiple observations are grouped.

Observations from the same cluster are usually more similar to each other than observations from different clusters.  If they are, you can’t use statistical methods that assume independence on these data, because estimates of variance, and therefore p-values, will be incorrect.

Mixed models not only account for the correlations among observations in the same cluster, they also give you an estimate of that correlation.

Consider the equation of a very simple linear mixed model.  It has a single fixed independent variable, X, and a single random effect, u.  For simplicity, I’m going to assume that X is centered on its mean.  This is also known as a random intercept model.

The subscripts i and j on the Y indicate that each observation j is nested within cluster i.
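Written out from the description above, a random intercept model of this kind would look roughly like the following.  The symbols β1 (the slope for X) and ε (the residual) are assumed notation; the text names only Y, X, u, and β0.

```latex
Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_i + \varepsilon_{ij}
```

Here u_i is shared by every observation in cluster i, while ε_ij is specific to observation j within that cluster.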

The u represents the random intercept for each cluster.  It’s really a residual term that measures the distance of each cluster’s intercept from the overall intercept β0.  Rather than calculate an estimate for every one of those distances, the model is able to just estimate a single variance, σ0².

That variance parameter estimate is the between-cluster variance.  The variance of the residuals is the within-cluster variance.  Their sum is the total variance in Y that is not explained by X.
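To see that the two variances really do add up, here is a small simulation sketch.  It is only an illustration under stated assumptions: X is left out entirely, the intercept of 10 and the two standard deviations are made-up values, and both u and the residuals are drawn from normal distributions.

```python
import random
import statistics


def simulate_random_intercepts(n_clusters, n_per_cluster, sd_between, sd_within, seed=0):
    """Simulate Y = beta0 + u_i + e_ij, with X left out for simplicity."""
    rng = random.Random(seed)
    beta0 = 10.0  # arbitrary overall intercept, chosen only for illustration
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sd_between)  # one random intercept per cluster
        cluster = [beta0 + u + rng.gauss(0.0, sd_within) for _ in range(n_per_cluster)]
        data.append(cluster)
    return data


# Assumed values: between-cluster variance 2**2 = 4, within-cluster variance 1**2 = 1.
clusters = simulate_random_intercepts(5000, 4, sd_between=2.0, sd_within=1.0)
all_values = [y for cluster in clusters for y in cluster]

# Total variance of Y, ignoring which cluster each value came from.
total_var = statistics.pvariance(all_values)
print(round(total_var, 2))  # should land close to 4 + 1 = 5
```

The exact number depends on the random draws, but with this many clusters it should sit near the sum of the two variances.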

If there is no real correlation among observations within a cluster, the cluster means won’t differ.  It’s only when some clusters have generally high values and others have relatively low values that the values within a cluster are correlated.

In the graph on the right, each cluster has its own trajectory of a different color.  The thick black line represents the overall trajectory, averaged across all clusters.

Some clusters, like the magenta one, have all three values above the overall (black) mean.  Those values will be correlated, because they’re all relatively high.  Simultaneously, those three points have a high mean.

Likewise, the turquoise cluster has all three values below the overall (black) mean.  Again, those values will be correlated, because they’re all relatively low.  And the turquoise mean is quite low.

And so it goes.  When some clusters have generally high values and others have generally low values (in other words, when there is consistency among a cluster’s responses), there is variation among the clusters’ means.  This is the between-cluster variance.

The within-cluster variance represents how far each point is from its cluster-specific mean.  In other words, what is the variation of the magenta points around the magenta trajectory?

In this graph, it’s pretty small.  Because those magenta points are all pretty high, they are quite close to their trajectory, and there is not a lot of within-cluster variation.
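The same idea can be sketched with a few made-up numbers.  The values below are hypothetical (they are not read off the graph), and the sketch measures spread around each cluster’s mean rather than around a fitted trajectory.

```python
import statistics

# Hypothetical values for three clusters, loosely echoing the colors in the graph.
clusters = {
    "magenta": [8.0, 9.0, 10.0],    # all relatively high
    "green": [5.0, 6.0, 7.0],       # in the middle
    "turquoise": [2.0, 3.0, 4.0],   # all relatively low
}

cluster_means = {name: statistics.mean(ys) for name, ys in clusters.items()}

# How much the cluster means differ from one another.
variance_of_means = statistics.variance(list(cluster_means.values()))

# How far points sit from their own cluster mean, pooled over all clusters.
squared_devs = [
    (y - cluster_means[name]) ** 2 for name, ys in clusters.items() for y in ys
]
pooled_within = sum(squared_devs) / (len(squared_devs) - len(clusters))

print(cluster_means)
print(variance_of_means, pooled_within)
```

Here the means are far apart while the points hug their own means, which is the pattern described above.  Note that the raw variance of the observed means is only a descriptive quantity; it also carries a little within-cluster noise, so it is not the same thing as the model’s between-cluster variance estimate.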

The ratio of the between-cluster variance to the total variance is called the Intraclass Correlation.  It tells you the proportion of the total variance in Y that is accounted for by the clustering.

It can also be interpreted as the correlation among observations within the same cluster.
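Both interpretations come from the same ratio.  As a sketch, write the between-cluster variance as σ0² and the within-cluster (residual) variance as σ² (the second symbol is assumed notation), hold X fixed, and assume that u and the residuals are independent of each other:

```latex
\begin{aligned}
\mathrm{ICC} &= \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \\[4pt]
\mathrm{Cov}(Y_{ij}, Y_{ij'}) &= \mathrm{Var}(u_i) = \sigma_0^2 \qquad (j \neq j') \\[4pt]
\mathrm{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} = \mathrm{ICC}
\end{aligned}
```

Two observations from the same cluster share only u_i, so their covariance is the between-cluster variance.  Dividing by the total variance of each observation gives the ICC.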

Why ICC is useful

1. It can help you determine whether or not a linear mixed model is even necessary. If you find that the correlation is zero, that means the observations within clusters are no more similar than observations from different clusters.  Go ahead and use a simpler analysis technique.

2. It can be theoretically meaningful to understand how much of the overall variation in the response is explained simply by clustering.  For example, in a repeated measures psychological study you can tell to what extent mood is a trait (varies among people, but not within a person on different occasions) or state (varies little on average among people, but varies a lot across occasions).

3. It can also be meaningful to see how the ICC (as well as the between and within cluster variances) changes as variables are added to the model.
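The three uses above can be sketched in a few lines.  This is an illustration only: all of the data and variance components are made up, and `anova_icc` uses the classic one-way ANOVA estimator for equally sized clusters rather than fitting a mixed model.  In practice you would read the two variance estimates off your mixed model output and plug them into the ratio.

```python
from statistics import mean


def icc_from_components(between_var, within_var):
    """ICC: between-cluster variance as a share of the total unexplained variance."""
    return between_var / (between_var + within_var)


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC; assumes equally sized clusters."""
    k = len(clusters)       # number of clusters
    n = len(clusters[0])    # observations per cluster
    if any(len(c) != n for c in clusters):
        raise ValueError("clusters must all be the same size")
    cluster_means = [mean(c) for c in clusters]
    grand_mean = mean([y for c in clusters for y in c])
    ms_between = n * sum((m - grand_mean) ** 2 for m in cluster_means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# 1. Is clustering present at all? (made-up data)
clustered = [[8.0, 9.0, 10.0], [5.0, 6.0, 7.0], [2.0, 3.0, 4.0]]
unclustered = [[2.0, 6.0, 10.0], [2.0, 6.0, 10.0], [2.0, 6.0, 10.0]]
icc_clustered = anova_icc(clustered)      # high: cluster means differ a lot
icc_unclustered = anova_icc(unclustered)  # not positive: cluster means are identical

# 2. Trait versus state (made-up variance components)
trait_like = icc_from_components(between_var=9.0, within_var=1.0)
state_like = icc_from_components(between_var=1.0, within_var=9.0)

# 3. ICC before and after adding a predictor that explains some of the
#    between-cluster variance (made-up variance components)
icc_before = icc_from_components(between_var=4.0, within_var=4.0)
icc_after = icc_from_components(between_var=1.0, within_var=4.0)

print(round(icc_clustered, 3), icc_unclustered)
print(trait_like, state_like)
print(icc_before, icc_after)
```

The ANOVA estimator can come out negative when the cluster means are more alike than chance would suggest, as in the second data set; that is usually read as no evidence of clustering.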


Further Reading and Resources

Snijders & Bosker (2011). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling.  Sage.

Five Extensions of the General Linear Model

The Difference Between Clustered, Longitudinal, and Repeated Measures Data

 

 
This Month's Data Analysis Brown Bag Webinar

Measures of Association: A review of types of correlations, crosstabs, and covariances for continuous, discrete, and categorical variables


Upcoming Workshop:

Analyzing Repeated Measures Data Online


Upcoming Webinar:

Random Intercept and Random Slope Models




About Us

What is The Analysis Factor? The Analysis Factor is the difference between knowing about statistics and knowing how to use statistics in data analysis. It acknowledges that statistical analysis is an applied skill. It requires learning how to use statistical tools within the context of a researcher's own data, and supports that learning.

The Analysis Factor, the organization, offers statistical consulting, resources, and learning programs that empower researchers to become confident, able, and skilled statistical practitioners. Our aim is to make your journey acquiring the applied skills of statistical analysis easier and more pleasant.

Karen Grace-Martin, the founder, spent seven years as a statistical consultant at Cornell University. While there, she learned that being a great statistical advisor is not only about having excellent statistical skills, but about understanding the pressures and issues researchers face, about fabulous customer service, and about communicating technical ideas at a level each client understands. 

You can learn more about Karen Grace-Martin and The Analysis Factor at theanalysisfactor.com.

Please forward this newsletter to colleagues who you think would find it useful. Your recommendation is how we grow.
