July 2018 Newsletter
StatWise Newsletter
July 2018 | Issue 120
A Note from Karen

Please join us in welcoming a new team member to The Analysis Factor: Jeremiah Cotman. Jeremiah is our new Client Engagement and Communications coordinator. He started a few weeks ago, so if you're a member of Statistically Speaking or a consulting client, you may have already been introduced. His background in member support and engagement at a small business incubator and graduate school will make him fit right in.

Upcoming this month is a brand new installment in our very popular The Craft of Statistical Analysis free webinar program. Steve Simon will introduce us to some fundamentals of Survival Analysis. You can join him on Wednesday, July 25th at 12 Noon (EDT) - reserve your spot here. This is a great resource for those of you interested in his Survival Analysis workshop this fall.

And next week is your last chance for a really useful workshop: Principal Component and Factor Analysis. Enrollment ends next Thursday, July 12th. Workshop instructor Christos Giannoulis has provided us a warm-up with the article below on data reduction.

Happy Analyzing!
Karen


How to Reduce the Number of Variables to Analyze

By Christos Giannoulis

Many data sets contain well over a thousand variables. Such complexity, the speed of contemporary desktop computers, and the ease of use of statistical analysis packages can encourage ill-directed analysis.

It is easy to generate a vast array of poor 'results' by throwing everything into your software and waiting to see what turns up.



Why do variables need to be selected before analyzing the data?


The thoughtless analysis of data is a problem for a number of reasons. It is easy to plunge into a data analysis without even thinking about what the intended endpoints are.

Analysis without thinking will almost certainly produce biased results.

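Powerful multi-variable techniques, such as multiple regression, make it easy to include a very large number of predictor variables in the hope of maximizing the explanatory power of the model. The fit statistics may improve, but a model built this way says little about which predictors actually matter.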
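To see why piling on predictors is tempting, here is a minimal sketch in plain Python. It is an illustration only: the data are simulated, the sample size and the 30 noise predictors are arbitrary assumptions, and the small least-squares helper is a stand-in for what your statistical package does. Because R-squared cannot go down when predictors are added to an ordinary least squares model, even pure noise makes the model look better.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(predictors, y):
    """R-squared of an ordinary least squares fit of y on the predictors plus an intercept."""
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(0)
n = 60
x = [rng.gauss(0, 1) for _ in range(n)]            # the one real predictor
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]       # outcome depends on x only
noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(30)]  # unrelated to y

r2_one = r_squared([x], y)
r2_many = r_squared([x] + noise, y)
print(f"R-squared with the one real predictor: {r2_one:.3f}")
print(f"R-squared after adding 30 noise predictors: {r2_many:.3f}")
```

The second number is higher even though none of the added variables has anything to do with the outcome, which is why a high R-squared alone is not a reason to keep a predictor.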

A similar problem occurs with factor analysis. There is nothing stopping us from factor-analyzing a random set of variables.

Factor analysis will nearly always produce a 'solution'. However, it may well be a nonsense solution.
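A small sketch of what a nonsense solution can look like, in plain Python. The assumptions are mine: the eight variables are simulated independent noise, and the first principal component of the correlation matrix (found by power iteration) stands in for the factor extraction your software would run.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def first_component(corr, iterations=500):
    """Largest eigenvalue of corr and the matching loadings, via power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    loadings = [math.sqrt(eigenvalue) * vi for vi in v]
    return eigenvalue, loadings

rng = random.Random(1)
# Eight variables of pure, independent noise: there is no factor to find.
data = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(8)]
corr = [[correlation(a, b) for b in data] for a in data]

eigenvalue, loadings = first_component(corr)
print(f"Largest eigenvalue: {eigenvalue:.2f}")
print("Loadings:", [round(l, 2) for l in loadings])
```

The largest eigenvalue of a sample correlation matrix is at least 1 by construction, so the familiar eigenvalue-greater-than-one rule would keep this "factor" even though the variables share nothing.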


Factor analysis is designed to identify sets of variables that are tapping the same underlying phenomenon. It does this by examining the patterns of correlations among a set of variables.
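To make "patterns of correlations" concrete, here is a minimal simulated sketch in plain Python. Everything in it is an assumption for illustration: two unrelated underlying factors, three measured items per factor, and equal amounts of noise in each item.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(2)
n = 4000
factor_1 = [rng.gauss(0, 1) for _ in range(n)]
factor_2 = [rng.gauss(0, 1) for _ in range(n)]

def item(factor):
    """A measured variable: the underlying factor plus its own noise."""
    return [f + rng.gauss(0, 1) for f in factor]

# Items 0-2 tap factor_1, items 3-5 tap factor_2.
items = [item(factor_1) for _ in range(3)] + [item(factor_2) for _ in range(3)]
corr = [[correlation(a, b) for b in items] for a in items]

for row in corr:
    print(" ".join(f"{r:5.2f}" for r in row))

# Correlations between items sharing a factor, and between items that do not.
within = [corr[i][j] for i in range(6) for j in range(i + 1, 6) if (i < 3) == (j < 3)]
between = [corr[i][j] for i in range(3) for j in range(3, 6)]
```

The printed matrix shows two blocks: items that share a factor correlate noticeably with each other, while items from different factors barely correlate. That block pattern is what factor analysis picks up on.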

The assumption of factor analysis is that the variables that are identified as belonging to a factor are really measuring the same thing. The factor itself is driving the responses on the individual variables. Therefore, they should not be causally related to each other.


Unfortunately, factor analysis cannot distinguish between variables that are causally related and those that are non-causally related.

This can result in variables being grouped together when they should not be. So it’s up to you, the data analyst, to think about the possible types of relationships among the variables and not just let the software make the decisions.
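A short simulated sketch, in plain Python, of why the software cannot tell these situations apart. The two scenarios and their coefficients are assumptions chosen so that both produce the same expected correlation of 0.5.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(3)
n = 5000

# Scenario A: a common factor drives both variables; neither causes the other.
factor = [rng.gauss(0, 1) for _ in range(n)]
a1 = [f + rng.gauss(0, 1) for f in factor]
a2 = [f + rng.gauss(0, 1) for f in factor]

# Scenario B: no common factor; the first variable directly causes the second.
b1 = [rng.gauss(0, 1) for _ in range(n)]
b2 = [0.5 * x + math.sqrt(0.75) * rng.gauss(0, 1) for x in b1]

r_common = correlation(a1, a2)
r_causal = correlation(b1, b2)
print(f"Correlation under a common factor: {r_common:.2f}")
print(f"Correlation under direct causation: {r_causal:.2f}")
```

Both correlations come out close to 0.5. Since factor analysis works from the correlations alone, it has no way of knowing which story produced them. Only your knowledge of the variables can settle that.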


How to narrow down the choice of variables?


The selection of independent and dependent variables should be a function of the research question to which the data analysis is directed.

Unless a clear research question is formulated, you will find no answers. It’s as simple as that.


One approach I usually follow is to draw diagrams of the model I plan to evaluate before I begin to analyze the data.

First, I state what my dependent variable is. Then I specify the independent variable and the likely mechanisms by which the independent and dependent variables might be related.

As simple as it sounds, this step is of paramount importance: it helps me make sense of the data and guides the selection of variables for further analysis.
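If you prefer to keep the diagram next to your data, the same idea can be written down as a small structure. This is only a sketch in plain Python: the research question, the variable names, and the `variables_to_keep` helper are all hypothetical, made up for illustration.

```python
# Hypothetical research question: does weekly exercise affect blood pressure?
model = {
    "dependent": "blood_pressure",
    "independent": ["exercise_hours"],
    # Each likely mechanism names the variables needed to examine it.
    "mechanisms": {
        "weight loss": ["bmi"],
        "stress reduction": ["stress_score"],
    },
}

def variables_to_keep(model):
    """Collect every variable named in the diagram, in order, without duplicates."""
    names = [model["dependent"], *model["independent"]]
    for needed in model["mechanisms"].values():
        names.extend(needed)
    return list(dict.fromkeys(names))

# Hypothetical list of columns available in the data set.
available = ["blood_pressure", "exercise_hours", "bmi", "stress_score",
             "shoe_size", "favorite_color"]

keep = variables_to_keep(model)
set_aside = [name for name in available if name not in keep]
print("Keep:", keep)
print("Set aside:", set_aside)
```

Anything that does not appear in the diagram is set aside before the analysis starts, not after the software has already had a go at it.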


When undertaking factor analysis, think about the variables involved. Before subjecting a set of variables to factor analysis, you should have some idea of what they might have in common.

You should make some attempt to include variables that make sense together.

You should also avoid including variables where any correlation is more likely due to causal relationships than to the variables having something in common at the conceptual level.

Want to learn more from Christos?

Join his upcoming workshop on Principal Component and Factor Analysis:



