Statistical Software

Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups, Part 2

March 31st, 2009 by

Part 1 outlined one issue in deciding whether to put a categorical predictor variable into Fixed Factors or Covariates in SPSS GLM.  That issue dealt with how SPSS automatically creates dummy variables from any variable in Fixed Factors.

There is another key default to keep in mind. SPSS GLM will automatically create interactions between any and all variables you specify as Fixed Factors.

If you put 5 variables in Fixed Factors, you’ll get a lot of interactions. SPSS will automatically create all 2-way, 3-way, 4-way, and even a 5-way interaction among those 5 variables. (more…)
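To get a feel for how many terms "a lot" is, here is a quick Python sketch that counts every possible interaction among a set of factors. It is only arithmetic, not anything SPSS runs, and the function name `interaction_counts` is made up for this illustration.

```python
from math import comb

def interaction_counts(n_factors):
    """Return {order: count} of possible k-way interactions among n_factors factors."""
    # A k-way interaction is one unordered choice of k factors, so count combinations.
    return {k: comb(n_factors, k) for k in range(2, n_factors + 1)}

counts = interaction_counts(5)
total = sum(counts.values())

print(counts)  # {2: 10, 3: 10, 4: 5, 5: 1}
print(total)   # 26
```

So with 5 Fixed Factors, a full factorial model carries 26 interaction terms on top of the 5 main effects.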


PSPP – the free, open source version of SPSS

March 24th, 2009 by

I recently heard about PSPP, a free, open source version of SPSS.

I have not tried it yet, but it does look promising. This is the description from its website:

It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.

The most important of these exceptions are, that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package.

PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.

Sounds pretty good, huh?

The only downside I can see, though, is with the statement “no additional packages to purchase in order to get ‘advanced’ functions.”  That appears to be because there aren’t any advanced functions.  PSPP seems to correspond only to SPSS base.  No Advanced Models, no Missing Values Analysis, no Complex Surveys.  That means you can do one-way ANOVA and regression, but not GLM, logistic regression, or factor analysis.

So if you are only using SPSS for basic statistics, or for teaching an intro class, this may be just what you need.  And perhaps if it takes off, as R has, we’ll see more advanced features soon.

If you’ve had any experience using PSPP, please tell me about it in a comment.  I’d love to hear how well it works.

 


Logistic Regression Models: Reversed odds ratios in SAS Proc Logistic–Use ‘Descending’

March 18th, 2009 by

If you’ve ever been puzzled by odds ratios in a logistic regression that seem backward, stop banging your head on the desk.

Odds are (pun intended) you ran your analysis in SAS Proc Logistic.

Proc Logistic has a strange (I couldn’t say odd again) little default.  If your dependent variable Y is coded 0 and 1, SAS will model the probability of Y=0.  Most of us are trying to model the probability that Y=1.  So, yes, your results ARE backward, but only because SAS is testing a hypothesis opposite yours.

Luckily, SAS made the solution easy.  Simply add the ‘Descending’ option right in the Proc Logistic statement.  For example:

PROC LOGISTIC DESCENDING;
MODEL Y = X1 X2;
RUN;

All of your parameter estimates (B) will reverse signs, although p-values will not be affected.
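If you want to see why only the sign changes, here is a small Python sketch (not SAS) using a made-up 2x2 table. It relies on the fact that with a single binary predictor, the logistic regression slope equals the log odds ratio from the table. The counts and the function name `log_odds_ratio` are hypothetical, chosen only for illustration.

```python
from math import log, exp, isclose

# Hypothetical counts, invented for this example: X is a binary predictor, Y the outcome.
x1_y1, x1_y0 = 30, 10   # X = 1: 30 cases with Y = 1, 10 with Y = 0
x0_y1, x0_y0 = 15, 45   # X = 0: 15 cases with Y = 1, 45 with Y = 0

def log_odds_ratio(event_x1, nonevent_x1, event_x0, nonevent_x0):
    """Log odds ratio for X; equals the logistic slope when X is the only binary predictor."""
    return log((event_x1 / nonevent_x1) / (event_x0 / nonevent_x0))

# Modeling P(Y = 1): the "event" is Y = 1.
b_modeling_y1 = log_odds_ratio(x1_y1, x1_y0, x0_y1, x0_y0)
# Modeling P(Y = 0): the "event" is Y = 0, so events and non-events swap.
b_modeling_y0 = log_odds_ratio(x1_y0, x1_y1, x0_y0, x0_y1)

print(b_modeling_y1, b_modeling_y0)            # same size, opposite signs
print(exp(b_modeling_y1), exp(b_modeling_y0))  # odds ratios are reciprocals of each other
```

In this example the odds ratio is 9 when modeling Y=1 and 1/9 when modeling Y=0. The two models describe the same relationship from opposite directions, which is why the p-values don't move.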

 



SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two

March 16th, 2009 by

In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use?

The default is to use whatever software they used in your statistics class–at least you know the basics.

And this might turn out pretty well, but chances are it will fail you at some point. Many times the stat package used in a class is chosen for its shallow learning curve, (more…)


The Exposure Variable in Poisson Regression Models

January 23rd, 2009 by

Poisson regression models and their extensions (Zero-Inflated Poisson, Negative Binomial Regression, etc.) are used to model counts and rates. A few examples of count variables include:

– Number of words an eighteen-month-old can say

– Number of aggressive incidents performed by patients in an inpatient rehab center

Most count variables follow one of these distributions in the Poisson family. Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables.

Using these regression models gives much more accurate parameter (more…)


Logistic Regression Models for Multinomial and Ordinal Variables

January 14th, 2009 by

Multinomial Logistic Regression

The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model.  It is used when the dependent variable has more than two nominal (unordered) categories.

Dummy coding of independent variables is quite common.  In multinomial logistic regression the dependent variable is dummy coded into multiple 1/0 variables.  There is a dummy variable for every category but one, so if there are M categories, there will be M-1 dummy variables.  Each category’s dummy variable has a value of 1 for its category and 0 for all others.  One category, the reference category, doesn’t need its own dummy variable, as it is uniquely identified by all the other variables being 0.
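Here is a minimal Python sketch of that coding scheme, just to make the M-1 idea concrete. The function name `dummy_code`, the example categories, and the choice of reference category are all assumptions for illustration; statistical packages do this step for you internally.

```python
def dummy_code(values, reference):
    """Dummy code a categorical variable into M-1 indicator columns.

    Returns the non-reference category names and one {category: 0/1} row per value.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not one of the categories")
    # Every category except the reference gets its own 1/0 column.
    non_reference = [c for c in categories if c != reference]
    rows = [{c: int(v == c) for c in non_reference} for v in values]
    return non_reference, rows

# Hypothetical outcome with M = 3 categories; "bus" is picked as the reference.
outcome = ["car", "bus", "train", "bus"]
columns, coded = dummy_code(outcome, reference="bus")

print(columns)  # ['car', 'train']
for value, row in zip(outcome, coded):
    print(value, row)
```

Each "bus" row comes out as all zeros, which is exactly how the reference category is identified without a column of its own.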

The multinomial logistic regression then estimates a separate binary logistic regression model for each of those dummy variables.  The result is (more…)