The Analysis Factor Statwise Newsletter
Volume 5, Issue 8
March 2013
In this Issue

A Note from Karen

Featured Article: Anatomy of a Normal Probability Plot

Further Reading and Resources

What's New

About Us

 
A Note From Karen

On April 6th, I'll be in Rochester for the day at the Upstate ASA Conference. I'm already looking forward to meeting one of my workshop participants there. If you're in Rochester or going to be at the conference and would like to join a meetup, please let me know. It's always fun to connect in person.

As I mentioned in the last newsletter, we have opened our new Data Analysis Brown Bag program on a pilot basis to a small group. It's a low-cost program of ongoing support and one month in, we're getting great feedback on it. Once we get all the administrative details sorted out with the first group, we'll open it again to more participants.

We are offering a brand new workshop in April, just for those of you who are getting started or want a review of using statistics with SPSS. This one covers all the basics of the menus and syntax -- defining and working with variables, descriptive statistics and tests, and graphs. Get all the information here: Introduction to Data Analysis with SPSS.

We are also pleased to announce that in June, we are going to offer a brand new workshop: Linear Models in R, given by Dr. David Lillis, of Sigma Statistics. You may remember David as our guest webinar speaker in January. We'll be announcing the workshop once we have registration set up, but it looks like it will be starting on June 7th.

And I hope you enjoy today's article on an extremely useful little plot: The Normal Probability Plot. It seems to have been skipped over in many statistics classes, but it's worth learning for any data analyst.

Happy analyzing!
Karen

Featured Article

Anatomy of a Normal Probability Plot

A normal probability plot is extremely useful for testing normality assumptions. It's more precise than a histogram, which can't pick up subtle deviations, and doesn't suffer from too much or too little power, as do tests of normality.

There are two versions of normal probability plots: Q-Q and P-P. I'll start with the Q-Q. The Q-Q plot plots every observed value against the value that a standard normal distribution with the same number of points would have at the same rank. We have 111 observations in this data set, and you can see a histogram of the distribution on the right, and the corresponding Q-Q plot on the left.

Across the bottom are the observed data values, sorted lowest to highest. You can see that just like on the histogram, the values range from about -2.2 to 2.2. (Note: these are standardized residuals, so they already have a mean of 0 and a standard deviation of 1. If they didn't, the plot would standardize them before plotting.)
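If you'd like to see that standardizing step spelled out, here is a minimal Python sketch. The residual values and the function name `standardize` are made up for illustration, and it assumes the sample standard deviation (n - 1 in the denominator); your software may use a slightly different convention.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale values so they have mean 0 and standard deviation 1."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - center) / spread for v in values]

# Hypothetical raw residuals, not the data behind the plots in this article.
raw_residuals = [-4.1, -2.3, -0.8, 0.2, 0.9, 1.7, 4.4]
z_scores = standardize(raw_residuals)

# After standardizing, the mean is 0 and the standard deviation is 1.
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))
```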

On the Y axis are the values you would have gotten if the data came from a standard normal distribution with the same number of data points -- 111.

This concept is a little strange if you're not used to it, so think through it a bit. Remember back to your intro stats class, when you learned these rules about a standard normal distribution:

  • The mean is 0 and the standard deviation is 1
  • 34% of points are between the mean and one standard deviation below the mean.
  • Another 13.5% are between one and two standard deviations below the mean.
  • The final 2.5% are more than two standard deviations below the mean.
  • Because the distribution is symmetric, these same percentages apply above the mean.
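These rules of thumb are rounded. As a quick check, the sketch below uses Python's built-in `statistics.NormalDist` to compute the area under the standard normal curve for each band below the mean. The variable names are mine. The exact areas come out to roughly 34.1%, 13.6%, and 2.3%, and together they make up the lower half of the distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the standard normal curve for each band below the mean.
mean_to_one_sd = std_normal.cdf(0.0) - std_normal.cdf(-1.0)
one_to_two_sd = std_normal.cdf(-1.0) - std_normal.cdf(-2.0)
beyond_two_sd = std_normal.cdf(-2.0)

bands = [
    ("mean to 1 SD below", mean_to_one_sd),
    ("1 to 2 SDs below", one_to_two_sd),
    ("more than 2 SDs below", beyond_two_sd),
]
for label, area in bands:
    print(f"{label}: {area:.1%}")

# The three bands together cover the lower half of the distribution.
lower_half = mean_to_one_sd + one_to_two_sd + beyond_two_sd
print(f"total below the mean: {lower_half:.1%}")
```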

So in a distribution of 111 points with a mean of 0 and standard deviation of 1, we know that the 56th point must have a value of 0 -- in a normal distribution the median equals the mean.

Only 2.5% of the 111 points -- about 3 of them -- should have values at or below -2. So the 3rd value should be right around -2.

Likewise, there is an exact value that the 17th, the 42nd, and every other one of the 111 ordered values would have under a standard normal distribution. We don't have easy rules for remembering them, but those are the values plotted on the graph.
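Here is one way those 111 values could be computed. This is a sketch, not what any particular package does: it assumes each rank k is assigned the cumulative proportion (k - 0.5) / n, and statistical software uses several slightly different formulas for that step. The function name `theoretical_quantiles` is my own.

```python
from statistics import NormalDist

def theoretical_quantiles(n):
    """Standard normal value expected at each rank 1..n.

    Assumes the plotting position (k - 0.5) / n; packages differ on this.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

quantiles = theoretical_quantiles(111)

# Python lists start at 0, so the 56th ordered value is quantiles[55].
print("56th:", round(quantiles[55], 3))  # the median, at the mean of 0
print(" 3rd:", round(quantiles[2], 3))   # right around -2
print("17th:", round(quantiles[16], 3))
print("42nd:", round(quantiles[41], 3))
```

Under this assumption the 56th value is 0 and the 3rd is right around -2, which matches the reasoning above.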

So if every value is exactly where it should be -- that is, if the distribution is perfectly normal -- every single point falls right on the line. The further a point is from the line, the further it is from where a normal distribution would put it.
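To make "distance from the line" concrete, the sketch below subtracts the expected value from the observed value at each rank. It assumes the data are already standardized and uses the plotting position (k - 0.5) / n, which is only one of several conventions. The sample is made up: it sits exactly where a normal distribution would put it, except for one value that I've pushed out to 4.0.

```python
from statistics import NormalDist

def theoretical_quantiles(n):
    """Standard normal value expected at each rank, assuming (k - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

def distances_from_line(standardized_values):
    """Observed minus expected value at each rank; 0 means the point is on the line."""
    observed = sorted(standardized_values)
    expected = theoretical_quantiles(len(observed))
    return [obs - exp for obs, exp in zip(observed, expected)]

# Made-up sample: a perfect match to the normal distribution...
sample = theoretical_quantiles(111)
# ...except that the largest value is moved out to 4.0.
sample[-1] = 4.0

gaps = distances_from_line(sample)
print(max(abs(g) for g in gaps[:-1]))  # all the other points are on the line
print(round(gaps[-1], 2))              # the moved point is well off the line
```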

The following Q-Q plot shows a positively skewed distribution. You can see that it's not matching a normal distribution well at all -- there are too many low values and too few high values.
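If you want to reproduce this kind of pattern yourself, here is a small sketch. The data are not the data in the plot: I've built a made-up positively skewed sample from evenly spaced quantiles of an exponential distribution, standardized it, and paired it with the values a normal distribution would expect at each rank (again assuming the (k - 0.5) / n plotting position).

```python
from math import log
from statistics import NormalDist, mean, stdev

n = 111
positions = [(k - 0.5) / n for k in range(1, n + 1)]

# Made-up positively skewed data: evenly spaced exponential quantiles.
skewed = [-log(1.0 - p) for p in positions]

# Standardize, then sort lowest to highest.
center, spread = mean(skewed), stdev(skewed)
observed = sorted((x - center) / spread for x in skewed)

# What a standard normal distribution expects at each rank.
expected = [NormalDist().inv_cdf(p) for p in positions]

print("lowest: ", round(observed[0], 2), "vs expected", round(expected[0], 2))
print("highest:", round(observed[-1], 2), "vs expected", round(expected[-1], 2))
```

In this made-up sample the lowest standardized value lands near -1 where a normal distribution expects about -2.6, and the highest lands above 4 where about 2.6 is expected. Both ends of the plot pull away from the line.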

I find it helpful to always plot a histogram along with the Q-Q plot, to aid interpretation. As you do more of these, you'll get better at reading them without the histogram.

Q-Q vs. P-P

The Q-Q is plotting the quantiles -- the actual values of X against the theoretical values of X under the normal distribution. That's what I've described above.

A P-P plot, on the other hand, plots the corresponding areas under the curve (cumulative distribution function) for those values.
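As a rough sketch of the P-P calculation: each sorted value gets an observed cumulative proportion (here assumed to be (k - 0.5) / n) paired with the area under the standard normal curve to the left of that value. The function name `pp_points` and the residual values are hypothetical, and the data are assumed to be standardized already.

```python
from statistics import NormalDist

def pp_points(standardized_values):
    """Pair each value's observed cumulative proportion with the normal CDF there."""
    ordered = sorted(standardized_values)
    n = len(ordered)
    std_normal = NormalDist()
    return [
        ((k - 0.5) / n, std_normal.cdf(value))
        for k, value in enumerate(ordered, start=1)
    ]

# Hypothetical standardized residuals, for illustration only.
residuals = [-1.6, -0.7, -0.2, 0.1, 0.4, 0.8, 1.5]
points = pp_points(residuals)

for empirical, theoretical in points:
    print(f"{empirical:.3f}  {theoretical:.3f}")
```

Both columns run from 0 to 1, which is why a P-P plot always fits in a unit square while a Q-Q plot is on the scale of the data.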

In both, the points fall right on the line when normality has been met. For the most part, the normal P-P plot is better at finding deviations from normality in the center of the distribution, and the normal Q-Q plot is better at finding deviations in the tails. Q-Q plots tend to be preferred in research situations.

Both Q-Q and P-P plots can be used for distributions other than normal.
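To illustrate, the sketch below builds Q-Q pairs against any distribution you can supply an inverse CDF for. Everything here is an assumption for the sake of the example: the helper names, the (k - 0.5) / n plotting position, the made-up waiting times, and the use of a plain Pearson correlation as a rough measure of how straight the plotted points are.

```python
from math import log, sqrt
from statistics import NormalDist

def qq_pairs(values, inverse_cdf):
    """Pair each sorted value with the reference distribution's quantile at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, inverse_cdf((k - 0.5) / n)) for k, x in enumerate(ordered, start=1)]

def straightness(pairs):
    """Pearson correlation of the plotted pairs; 1.0 is a perfectly straight line."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

def exponential_inverse_cdf(p):
    """Quantile function of an exponential distribution with mean 1."""
    return -log(1.0 - p)

# Made-up waiting times shaped like an exponential distribution with mean 2.
waiting_times = [2.0 * exponential_inverse_cdf((k - 0.5) / 111) for k in range(1, 112)]

against_exponential = straightness(qq_pairs(waiting_times, exponential_inverse_cdf))
against_normal = straightness(qq_pairs(waiting_times, NormalDist().inv_cdf))
print(round(against_exponential, 3), round(against_normal, 3))
```

Against the exponential reference the points form a straight line, while against the normal reference they curve, so the correlation drops.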

Further Reading and Resources

Comparison of P-P Plots and Q-Q Plots

Checking the Normality Assumption for an ANOVA Model

Assumptions of Linear Models are about Residuals, not the Response Variable

What's New

The next FREE Craft of Statistical Analysis Webinar:

Approaches to Missing Data: The Good, the Bad, and the Unthinkable

You've probably heard about many different approaches to dealing with missing data, and you've probably gotten different opinions about which one you should use. In this webinar, you'll get an overview of:

  • the three types of missing data, and how they affect the approach to take
  • the common approach that is generally worse than any other
  • the easy, common, seemingly bad approach that often isn't so bad, and the situations when it doesn't work
  • the two approaches that give unbiased results, one that is very easy to implement, but only works in limited situations and one that is harder to implement well, but works with any statistical analysis

Get more information and register here.

Upcoming Workshops:

Introduction to Data Analysis with SPSS

This workshop will give you the strong foundation you need to get started doing statistical analysis with SPSS. You will learn how to work with data sets, define and recode variables, and run univariate and bivariate statistics and graphs.

Everything is shown in both menus and syntax. This is a good introduction for anyone learning statistics in SPSS for the first time or for SPSS users who are comfortable with menus who want to learn the syntax.

Begins April 10, 2013

Get more information and register here.

About Us

What is The Analysis Factor? The Analysis Factor is the difference between knowing about statistics and knowing how to use statistics in data analysis. It acknowledges that statistical analysis is an applied skill. It requires learning how to use statistical tools within the context of a researcher's own data, and supports that learning.

The Analysis Factor, the organization, offers statistical consulting, resources, and learning programs that empower researchers to become confident, able, and skilled statistical practitioners. Our aim is to make your journey acquiring the applied skills of statistical analysis easier and more pleasant.

Karen Grace-Martin, the founder, spent seven years as a statistical consultant at Cornell University. While there, she learned that being a great statistical advisor is not only about having excellent statistical skills, but about understanding the pressures and issues researchers face, about fabulous customer service, and about communicating technical ideas at a level each client understands. 

You can learn more about Karen Grace-Martin and The Analysis Factor at theanalysisfactor.com.

Please forward this newsletter to colleagues who you think would find it useful. Your recommendation is how we grow.
