Someone asked me this recently.
Many R advocates would absolutely say yes to everyone who asks.
I don’t.
(I actually gave her a pretty long answer, summarized here.)
It depends on what kind of work you do and the context in which you’re working.
I can say that R is a very handy tool to have in your pocket.
It’s powerful. It’s flexible. It’s cost-effective.
And it’s a lot easier to approach than it used to be, with options like RStudio and R Commander.
But not everyone needs it.
If you have access to another software program (or two) that handles all the statistics you need, if you’re well versed in it, and if you have secure legal access to a site license, you probably don’t need to learn R. This is particularly true if none of the colleagues in your field use R to any great extent.
And yet…
It’s always good to have options, such as a second software package in which you can check your work.
That may or may not be R. Still, there are times when R is the best tool for the job. Or the only tool.
More common, I suspect, is this situation: there are other tools that could work just as well, if you had access to them. And how many packages can you justify purchasing?
This exact situation came up for me just last month.
My Recent Experience with Embracing R
I was working with a client who needed a pretty specific statistic calculated: a sample size estimate for a kappa statistic (for inter-rater reliability). Naturally, they had a tight deadline.
I checked all my sample size software, and none had that statistic. (Of course.)
I didn’t look, but it’s quite possible I could have purchased a third sample size software package that had this specific statistic. There are others out there that I don’t have.
A year ago, I would have been a bit stuck and might have had to do just that.
But I’ve been taking some of our own R workshops. And I’ve started to see the options opening up. So I thought I’d check to see if there’s an R package for sample size estimates on kappa.
Lo and behold, there is.
It wasn’t hard to download or use, and I got the answer I needed pretty quickly.
My day was saved.
Why Learn a Second Statistical Software
So really it comes down to what I’ve been saying for years: it’s always good to have more statistical software options when you need them.
R is becoming an obvious choice, not just because of the $0 price tag, but because so many people are creating packages that perform ridiculously specific tests, like a sample size calculation for a kappa statistic.
I realize that R looks extremely intimidating from the outside. I was required to use its close relative S-PLUS in one of my grad programs, and I didn’t like it one bit.
But once you get started, you realize it’s pretty logical. I don’t know if R makes more sense than S-PLUS (I don’t think so) or if our instructor Kim Love is just a better teacher than my grad school TA (very likely).
The other thing I’ve been saying for years is that you don’t have to learn every difficult or ridiculously specific analysis up front. You don’t need to master every option on a tool to be able to use it.
Build yourself a solid foundation, from the basics up through linear models, and build it well. Once you have those skills, you can add new ones as you need them.
Brandy Sinco says
I work for a major research university. We often analyze linear mixed models and generalized linear mixed models. I do graphics, categorical analysis, and complex survey analysis also. SAS meets my needs the best. It’s user-friendly, state-of-the-art, and has excellent tech support.
Because many of my colleagues use Stata and R, I am working on learning those also. I am fine with using a variety of tools, but I definitely do not want to stop using SAS. If I had to choose between using SAS 100% of the time and also experimenting with R and Stata, I would continue using SAS primarily but also stay aware of what Stata and R have to offer. It’s best not to be pigeonholed with a single statistical software package.
Jerry says
Hi Karen,
I learned SPSS in graduate school because, well, I had time to learn it. I found out what the various menu choices meant, slogged through using data from my classes, and even went into the syntax code (when absolutely necessary). I also benefited from short 20-minute tutorials from a TA in the computer lab, but I mostly learned on my own. It was easy to do when I was fiddling around in the computer lab until 2:00 am and grad school was my whole life.
In contrast, I took a class on SAS and it was horrible. It was like going back to the 1970s with laborious syntax, telling the computer the field length with the number of decimals, etc. I could literally do statistics in two minutes using SPSS that would take an hour or more in SAS. I’m not sure if the instructor was backward or just what. It turned me off to SAS.
Now that I’m in the real world, I look for statistics jobs occasionally, and I notice that almost all require SAS, R, or “SQL” (and I’m not sure what that even is!). Very, very few jobs require SPSS, and I know that for the exact same job description, I could make roughly twice the income if I knew SAS. R is a close second.
I honestly don’t know why SAS is so popular, given my earlier experience. It must be that there’s a large base of entrenched users, a critical mass, that keep it going even when people change jobs (which, I’ve come to learn, statisticians have to do with regularity).
I should have learned SAS, and maybe it’s not too late, but the learning curve would be steep to bring me up to speed with what I can already do in SPSS. The same goes for R.
It is intriguing that some people think SAS is on its way out now. We will see. R is free, of course, although rather cumbersome. I’ve seen jamovi, which is very SPSS-like and is also free. I’m not sure if it does everything SPSS does, but I understand that programmers can create modules in R (its underlying code) and add them to jamovi. It also does really great graphs. Another one is JASP, also free. It’s nice to have free options, especially if you might have to work from home.
And for the sample size for the kappa statistic, have you tried G*Power? It’s also free, and one of my staples for sample size estimates. It has a bit of a learning curve because some of the terms and concepts it uses were slightly foreign to me.
But as you say, it’s really good to know more than one statistical software package. I like the idea of learning one well up through linear regression; then you can expand into other areas with more confidence.
Carl Hollins says
R is a great tool, as are Stata and SAS. What I particularly like about R is the ease of working with entire pipelines, from data prep through final analysis, and the simple ability to create reproducible data products. Not that the same can’t be done with other tools.
R is my preferred tool for a host of reasons (some mentioned above), but I work with a statistics shop that leans heavily toward SAS. So many times I am forced to use SAS for other reasons that I won’t get into. Right off the bat, I will say that being stuck using SAS can be frustrating, especially when I can get the same tasks done with a LOT less code in R.
Granted, I still consider myself a SAS novice. But from what I see, working with lists in R is far easier than in SAS, which makes a big difference when working with datasets that have nearly the same structure.
To be fair, in recent weeks I have been taking a deeper dive into SAS as well as Stata and hope to gain more fluency by the year’s end.
I feel that the choice goes beyond the free cost of entry. If one has zero coding skills in any language, then all of the choices will pose challenges once you leave the GUI.
On the technical side, I find it easier (e.g., when creating models or descriptive stats) to access and manipulate objects in R. This ease, as I mentioned at the start, makes casting data a breeze. Working with vectors, lists, etc. just makes life easy. I vote for everyone trying to pick up R, not least because of the depth of statistical modules and packages across so many domains.
Robert Twining says
I agree! I let my students use whatever program they want, but I warn them that when they graduate they may wish I had forced them to use R. It is always nice to have a free option for statistics. You are also right about another thing: Kim is a superb instructor! The R workshops were a great foundation for getting past the intimidating programming.
John Thomson says
I discovered R about 4 or 5 years ago. My company had a SAS license but only a few people had access and, frankly, most of them only knew how to plug their data into a script someone else had written. I learned R on my own to have control over my data, increase my knowledge and confidence in using stats and to gain flexibility in experimental design. That journey has not been a straight line! I think the efforts of R contributors like Hadley Wickham to add a “grammar” to R through the Tidyverse are remarkable and have made the software much more approachable. When you leave the Tidyverse, things can get scary. R offers tremendous power and flexibility but like everything else in life, flexibility is directly proportional to complexity.
Joe Todd says
I learned S-PLUS in graduate school, then of course switched to R, and it is what I know. I’m in a biomedical workgroup using a statistical graphics program and some bespoke specialty software with our own (copious) data. No one else knows R. I’m not hands-on with data analysis these days (I do produce much of it). If a statistical problem comes up that I know R can resolve and I suggest using it, the suggestion usually goes nowhere. The latest had to do with some data grooming, which R does well. It’ll probably get done instead by cutting and pasting in a spreadsheet.
Joe Trubisz says
First off, assume for a second that you are not a student and don’t work for a company that purchases software, but you are sitting at your desk and want to learn _____ (fill in the blank with your favorite stats package). Now, find out the cost of a license. SAS? Thousands, given that you need 6 modules for it to become useful. Stata or Minitab? At least $1,000. SPSS? Expensive, last time I looked. Want to do data mining? Well, Statistica Data Miner is probably the best. It will only set you back $15,000. R? Basically free.
Customer works on Linux? Too bad if you are an SPSS or Statistica user, since they run on Windows. R? Works everywhere. In fact, I do all my R work on an Apple iMac, and every model I have delivered ran, unchanged, on systems other than Apple.
Superb community support that costs zero, add to that the many user supplied functions, and you have quite the robust environment.
Last but not least, just look at job boards, especially jobs with “big data” in the description. A vast majority want R. In fact, at a data mining conference in New York City a few months back, the speaker from Columbia University said he expects R to be the de facto statistics standard within three years.
Personally, I have not renewed my SAS license (way too expensive), and I’m not sure if I’m going to upgrade Stata. The reason: since 2007, the only thing clients have requested is R.
So, the question should really be: why DON’T you know R?
Karen says
Hi Joe,
Well, there are definitely people who don’t need to worry about the cost of a license for other software packages. (And fwiw, Stata is a relative bargain compared to many others.) And there are fields where other software is entrenched. People working in those fields, especially those who don’t use stats often, may not need R. At least not yet.
I still believe it’s useful to be able to use the field-entrenched software if you do any collaborative work, and this includes all grad students. You’re going to get stuck if you’re the only R user.
But unless you are a very seldom user of stats, knowing at least two packages is extremely helpful. So I agree–anyone who does a lot of stats ought to learn R.
I think R is becoming the statistics software equivalent of speaking English. Not everyone in the world needs it, and it’s not the only important language, but many, many people benefit immensely from it. In some situations the benefit is huge, in others there’s none at all, and in some it’s absolutely crucial.