In Part 1 we installed R and used it to create a variable and summarize it using a few simple commands. Today let’s re-create that variable and also create a second variable, and see what we can do with them.
As before, we take height to be a variable that describes the heights (in cm) of ten people. Type the following code at the R command line to create this variable.
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
Now let’s take bodymass to be a variable that describes the body mass (in kg) of the same ten people. Copy and paste the following code at the R command line to create the bodymass variable.
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
Both variables are now stored in the R workspace. To view them, enter:
height
bodymass
We can now create a simple plot of the two variables as follows:
plot(bodymass, height)
However, this is a rather plain plot and we can embellish it a little. Type the following code at the R command line:
plot(bodymass, height, pch = 16, cex = 1.3, col = "red", main = "MY FIRST PLOT USING R", xlab = "Body Mass (kg)", ylab = "HEIGHT (cm)")
[Note: R is very picky about the quotation marks you use. If the font that is displaying this post shows the opening and closing quotation marks curling in different directions, the code won’t work in R. Both must be plain straight quotes. You may have to retype them within R rather than copying and pasting.]
In the above code, the syntax pch = 16 creates solid dots, while cex = 1.3 creates dots that are 1.3 times bigger than the default (where cex = 1). More about these commands later.
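To see what pch and cex actually change, it can help to draw the same data twice. The following is a small sketch rather than part of the original walkthrough: it places a plot with the default symbol (pch = 1, an open circle, at cex = 1) next to one with the settings used above. The panel titles are just labels chosen for this illustration.

```r
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Split the plotting window into one row and two columns,
# saving the previous settings so they can be restored afterwards
op = par(mfrow = c(1, 2))

# Left panel: default open circles at the default size
plot(bodymass, height, pch = 1, cex = 1, main = "pch = 1, cex = 1")

# Right panel: solid dots, 1.3 times the default size
plot(bodymass, height, pch = 16, cex = 1.3, main = "pch = 16, cex = 1.3")

# Restore the original single-panel layout
par(op)
```

Trying other values, such as pch = 17 for solid triangles or cex = 2 for larger symbols, is an easy way to get a feel for these settings.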
Now let’s perform a linear regression on the two variables by adding the following text at the command line:
lm(height~bodymass)
We see that the intercept is 98.0054 and the slope is 0.9528. The formula height ~ bodymass tells R to model height as a function of bodymass. By the way, lm stands for “linear model”.
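If you would like to see where these two numbers come from, here is a short sketch, assuming the same height and bodymass values as above. It stores the model in an object (the name fit is just a choice for this example), pulls out the coefficients with coef(), and then recomputes them with the usual least-squares formulas for a single predictor.

```r
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Store the fitted model so its parts can be reused
fit = lm(height ~ bodymass)

# Named vector holding the intercept and the slope
coef(fit)

# The same values by hand:
# slope = covariance(x, y) / variance(x)
# intercept = mean(y) - slope * mean(x)
slope = cov(bodymass, height) / var(bodymass)
intercept = mean(height) - slope * mean(bodymass)

# Rounded to four decimal places: 98.0054 and 0.9528
round(c(intercept, slope), 4)
```

The hand calculation and lm agree, which is a useful sanity check when you are getting used to reading regression output.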
Finally, we can add a best fit line to our plot by entering the following at the command line:
abline(98.0054, 0.9528)
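Typing the coefficients by hand works, but it is easy to mistype a digit. As an alternative sketch (again assuming the variables defined earlier, with fit as an illustrative object name), abline also accepts a fitted lm object directly and reads the intercept and slope from it.

```r
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Fit the model once and keep it
fit = lm(height ~ bodymass)

# Draw the scatterplot first; abline adds to an existing plot
plot(bodymass, height, pch = 16, cex = 1.3, col = "red",
     xlab = "Body Mass (kg)", ylab = "Height (cm)")

# Add the fitted line using the coefficients stored in fit
abline(fit)
```

Note that abline only adds a line to a plot that is already open, so plot must be called first. The variables must also appear in the same roles in both commands: the response on the left of ~ in lm goes on the vertical axis of the plot.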
None of this was so difficult!
In Part 3 we will look again at regression and create more sophisticated plots.
About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.
See our full R Tutorial Series and other blog posts regarding R programming.
Ceri says
My data gives me lm(weight~wing) = -20.2080 0.4871, which, when given the command abline(-20.2080, 0.4871), does not plot the best fit line. Can you tell me why, please?
Dikeledi Moremi says
Dear David
I’m very glad to have found your website and am hopeful about my struggles with the data I am trying to analyze. I want to do factor analysis for ordinal data, which I find very challenging because to do this I need to familiarise myself with the R programme.
I have been trying to read up on R, hoping that I will figure it out, but it is not happening. Please, I need help.
I have seen that you will be conducting a workshop in December, but I’m concerned that December is a bit late for me since I have a deadline in January. Plus, the allocated time slots may be problematic for me as I am based in South Africa.
I will appreciate any help from you.
Kind Regards
Dikeledi
Karen says
Hi Dikeledi,
I’ll make sure David gets this message, but I wanted to point out that you don’t NEED to use R to get polychoric correlations. SAS can do them as well. I assume Stata can too, although I haven’t checked that out.
If you use SPSS, there is a way to integrate the R command for polychoric correlations using REssentials, which is free to download from SPSS. Exactly how you do it depends on your version of SPSS, and honestly, you may need help from IT to get it to work.
David Lillis says
Hello,
Thank you for pointing out this typo. In Blog 2 the syntax should not be lm(height, weight). Instead it should be:
lm(height~weight)
Sorry about that typo.
Regards,
David
Glenys says
Hi David. I am brand new to R and use a Mac (OSX 10.7.5). In following the instructions above, the lm(height, weight) command did not work. The correct syntax seems to be lm(height~weight). Is this specific to R on a Mac? If so, are there many syntax differences between the PC & Mac versions of R? Many thanks.