If you are like I was for a long time, you have avoided learning R.
You’ve probably heard that there’s a steep learning curve. Or noticed that the available documentation is not necessarily user-friendly.
Frankly, both things are true, to some extent.
R is Open-Source
The best and worst thing about R is that it is open-source. So there is no single company that is responsible for R or your ability to use it.
While there is a developer community that maintains a set of standards and regulated documentation, anyone can add new functionality to R through user-created “packages.”
This gives R users a large, flexible range of options (once you know how to install the packages, of course!), which can be a major advantage.
On the other hand, these packages are as diverse as the users who create them, and they may emphasize different model features, output displays, and even basic methodological principles.
How R Thinks
Underlying all of this, though, is what I feel is the truly intimidating part of R: how R thinks. For those of us who are used to SAS, SPSS, and most other commercial statistical software, the way we interact with R feels dauntingly unfamiliar.
Consider running a linear model in SAS or SPSS.
We write some code, or click some buttons and follow some menus, and there’s our output. The software might give us slightly different output, depending on what options we include or check off. But that’s the basic story every time. We run a model, and our results appear.
Not so with R.
Let’s take a look at the syntax you might use to run a basic one-way ANOVA in R. We will use a dataset called data1. (Notice I say might, because there is more than one way to do this!)
model1 <- lm(yvar ~ factorvar, data=data1)
We run the syntax, and…
> model1 <- lm(yvar ~ factorvar, data=data1)
>
…
Nothing.
Did it work? And if it did work, where are the results?
It turns out that R stored the results in an object called model1. If we want to see the results, we have to ask for them, and we have to know how.
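To make "stored as an object" concrete, here is a minimal sketch you can paste into R. The post never shows what data1 contains, so the code simulates a hypothetical stand-in with three groups; the column names yvar and factorvar are the ones used above, but the values are made up for illustration.

```r
# Hypothetical stand-in for data1: the post does not show its contents,
# so we simulate three groups (A, B, C) of 10 observations each.
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)

# Fitting the model prints nothing; the fit is stored in model1
model1 <- lm(yvar ~ factorvar, data = data1)

# What kind of object did lm() create?
class(model1)    # "lm"

# Which components does it hold? (includes "coefficients", "residuals",
# "fitted.values", "call", and "model", among others)
names(model1)

# Typing the object's name calls print(), which shows only the call and
# the estimated coefficients, not a full table of results
model1
```

Nothing is lost by the silence: everything lm() computed is sitting inside model1, waiting for a function to ask for it.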
If we want to see the ANOVA table, for example, one option is to run a function called anova on that object:
anova(model1)
If we want to see the actual solution to the model, along with some other basic statistics, we might run a different function on that object:
summary(model1)
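Here is a self-contained sketch of both calls, again on a simulated, hypothetical version of data1 (not the post's data). It also shows that the results of anova() and summary() are themselves objects, so individual numbers can be pulled out of them rather than copied off the screen.

```r
# Hypothetical, simulated stand-in for data1 (not the post's data)
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)
model1 <- lm(yvar ~ factorvar, data = data1)

# anova() returns an ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
aov_table <- anova(model1)
print(aov_table)

# summary() returns coefficient estimates, standard errors, t tests,
# R-squared, and the overall F test
fit_summary <- summary(model1)
print(fit_summary)

# Both results are objects too, so individual numbers can be extracted
p_factor   <- aov_table[["Pr(>F)"]][1]  # p-value for factorvar
r_squared  <- fit_summary$r.squared     # proportion of variance explained
coef_table <- coef(fit_summary)         # matrix: Estimate, Std. Error, t value, Pr(>|t|)
```

For a model with a single factor like this one, the F statistic in the anova() table is the same as the overall F test reported at the bottom of the summary() output; the two functions simply present the same fit in different ways.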
Yes, this might seem burdensome and unnecessary at first. But the more you program in R, the more the advantages of this system become clear. It is exactly what gives R the wonderful flexibility and range that experienced R programmers always seem to be talking about.
Growing your understanding of this “object-based” programming opens many doors.
Most importantly, a deeper understanding of R objects and the functions we use on them is the key to being able to understand the documentation that seems so out of reach when we first start trying to learn R.
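One way to see why objects matter for reading the documentation is R's generic functions (the S3 system): a function such as summary() checks the class of the object you give it and hands the work to a class-specific method. The sketch below uses a simulated, hypothetical data1 again; it is an illustration of dispatch, not a full account of R's object systems.

```r
# Hypothetical, simulated stand-in for data1 (not the post's data)
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)
model1 <- lm(yvar ~ factorvar, data = data1)

# summary() is a generic function: R checks the class of its argument
# and calls the matching method.
class(data1$factorvar)     # "factor"
summary(data1$factorvar)   # summary.factor(): count per level (10 each)

class(data1$yvar)          # "numeric"
summary(data1$yvar)        # summary.default(): Min., 1st Qu., Median, Mean, 3rd Qu., Max.

class(model1)              # "lm"
summary(model1)            # summary.lm(): the regression output

# List the methods R has for "lm" objects; many have their own help
# pages, e.g. ?summary.lm or ?anova.lm
methods(class = "lm")
```

This is why the help page you actually need is often the method's page (?summary.lm) rather than the generic one (?summary): the generic page describes what summary() does in general, while the method page documents what it returns for your particular kind of object.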
Mengesha Assefa Ahunie says
it is so interesting
Edzard van Santen says
The flip side of readily available packages/functions in R is that it perpetuates some bad habits, because manuscript authors find a given approach, grind it through an R function and submit the manuscript. A particular example is the ubiquitous PCA for multivariate data when there are better MV techniques. A stumbling block is the cult-like behavior of many hard-core R users, who are set in their ways and won’t tolerate any deviation from the doctrine of how things should be done. This is particularly egregious with native English (American) speakers who don’t know a second language. Our foreign students intuitively get the advantage that a second language offers. Also, as Ralph O’Brien (think test for normality) once pointed out to me, many cutting-edge analytical techniques are not, and may never be, readily available in commercial packages. The next frontier will be Python, which will eclipse, or already has eclipsed, R in the AI realm.
William Peck says
this is good.
Would you say “R” is foundational to Data Science?
Karen Grace-Martin says
Probably. That’s not really my field, but my impression is that R and Python are used extensively in data science.
Alan Mainwaring says
When I heard about “R” and downloaded and installed it, I was very disappointed. In retrospect, all I got was, well, basically a command line. Strangely, this command line was called a “GUI,” that is, a Graphical User Interface. I was then expecting this GUI to be a point-and-click statistical application like SPSS. Of course, this is incorrect.
What really switched me on to R was the concept that one could install packages. Here was the next hurdle: which ones would really get you started? This is where the package R Commander, known by its abbreviation “Rcmdr,” turned out to be so helpful. Even here you must be aware that R is case-sensitive.
R Commander is called training wheels for R, and I reckon it’s a very good start for first-time users. At my university we now have R Commander installed in all our computer labs. We now have all our students using it and liking it.
Sure, R Commander has its issues, but it is a really good start; later on, students move on to RStudio. I love R, but it has taken me quite a few years to get used to it. Of course, I am still learning (permutation ANOVA, for example); you never stop learning R.