R Programming: Get Started in Statistical Programming
R is a programming language and development environment used for statistical analysis and the creation of publication-quality data visualizations. R is free and open source, is part of the GNU Project, and is supported by the R Foundation for Statistical Computing.
Where Did R Come From?
R was first conceived in 1992 by two professors at the University of Auckland in New Zealand: Ross Ihaka and Robert Gentleman. The very first version of the language was released in 1994. However, it would be another six years before the first stable version, R 1.0.0, was made available to the public in February 2000.
R is an implementation of the S programming language originally developed in the 1970s by John Chambers and others at the venerable Bell Laboratories. R and S-PLUS, a proprietary language owned by TIBCO, are the two modern implementations of the S programming language.
Today, R is the most popular statistical analysis programming language and is used by industry giants such as Facebook and Google. Interestingly, the original creator of S, John Chambers, is now part of the R Core Team, which is tasked with the ongoing development of R. In that sense, R is the spiritual continuation of the S programming language even if it isn’t a direct descendant.
What is R Used For Today?
Roughly half of all data scientists use R for data mining and statistical analysis — it is the programming language of choice within the rather nebulous “big data” industry you keep hearing about. R includes built-in functions and variables designed to make statistical analysis easier, and it also provides graphic-generation tools that produce publication-quality data visualizations.
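To make that concrete, here is a minimal sketch of the built-in statistical and plotting functions in action. It assumes only a standard R installation and uses mtcars, a small sample data set that ships with R; the variable names are our own and purely illustrative.

```r
# mtcars is a sample data set included with every standard R installation.
mpg_values <- mtcars$mpg

# Descriptive statistics using built-in functions.
mpg_mean <- mean(mpg_values)
mpg_sd <- sd(mpg_values)
print(summary(mpg_values))

# Fit a simple linear regression: fuel economy as a function of weight.
fit <- lm(mpg ~ wt, data = mtcars)
wt_slope <- coef(fit)[["wt"]]

# Scatter plot with the fitted regression line drawn on top.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")
abline(fit, col = "red")
```

When run as a script rather than in an interactive session, R typically writes the plot to a file instead of opening a window.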
R is highly extensible, and many packages exist to address specific data analysis tasks and problems. It owes a part of its popularity to its open-source status, which means that anyone can use R and have access to world-class statistical analysis tools.
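As a rough illustration of how packages are used, the sketch below loads a package if it is already installed and otherwise prints the command that would install it. The package name "MASS" is only an example we picked; substitute whichever package your analysis needs.

```r
# Example package name (an assumption for illustration; use any package you need).
pkg <- "MASS"

# requireNamespace() checks whether the package is installed without attaching it.
pkg_installed <- requireNamespace(pkg, quietly = TRUE)

if (pkg_installed) {
    # Attach the package so its functions can be called directly.
    library(pkg, character.only = TRUE)
    message("Loaded package: ", pkg)
} else {
    # install.packages() downloads from CRAN, so it is only suggested here.
    message("Not installed. Try: install.packages(\"", pkg, "\")")
}
```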
R is designed to work on virtually any platform and can be run on systems with a Unix, Linux, Windows, or Mac OS operating system.
GUIs for R
Standard R is accessed over the command line. However, users who prefer an easy-to-use graphical user interface (GUI) are in luck. There are many GUIs available for R, some of which are free and open source.
If you’d like to learn more about R GUIs, here are three of the most popular options:
- RStudio Open Source Version
- JGR (pronounced “Jaguar” and standing for “Java GUI for R”)
- R Commander
Programming with Style
Programming is a fairly free-form medium. In the case of most programming languages, line breaks and indentation are completely optional and ignored by the machine that interprets the code, and there are few naming conventions that must be followed.
However, just because you can write code using any style you like doesn’t mean you should. How you style code matters a great deal for at least three reasons:
- Poorly styled code is hard to read and understand.
- Because it’s hard to read and understand, poorly styled code can be frustrating to extend.
- In addition, if code is hard to read and isn’t styled for clarity, then it will be harder than necessary to debug.
For these reasons, how you style code in R is second in importance only to whether or not the code actually works. To help you get started on the right foot, here are the top three stylistic recommendations you should follow when writing code in R:
- Indent your code: Nothing helps code clarity more than proper indentation. In R, the convention is to indent with spaces rather than tabs. We suggest four spaces for each level of indentation, though some style guides, including Google’s, call for two. Whichever width you choose, use it consistently.
- Use clear and unique variable and function names: Never name a variable or function by reusing a name that is already in use, and do your best to avoid confusing names. When debugging code months from now, or when someone else reads your code, it should be easy to pick out the variables and functions that you created.
- Use <-, not =: The equals sign should not be used to assign a value to a function or variable. Instead, combine a less-than symbol and a dash (<-) for this purpose. While modern R will accept an equals sign for assignment, most R style guides discourage it, and <- remains the convention among experienced R programmers. Do it right. Use <-.
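The three recommendations above can be seen together in a short sketch. The function and variable names here (standard_error, sample_heights, height_se) are hypothetical, chosen only to illustrate the style.

```r
# A descriptive, unique name that does not reuse an existing name such as mean or sd.
standard_error <- function(values) {
    # Each nested level is indented with four spaces, never tabs.
    if (length(values) < 2) {
        # The standard error is undefined for fewer than two observations.
        return(NA_real_)
    }
    sd(values) / sqrt(length(values))
}

# Assignment uses <- rather than =.
sample_heights <- c(1.62, 1.75, 1.80, 1.68)
height_se <- standard_error(sample_heights)
```

Inside a function call, = is still the right choice for naming arguments, as in round(height_se, digits = 3); the recommendation applies only to assignment.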
R style is a contentious topic and we can hardly do it justice in a couple hundred words. While we’ve tried to hit three high points, there’s a lot more to learn about this topic. If you want to master R programming style, and look like you know what you’re doing when you write R, check out R Style: An Rchaeological Commentary (PDF) by Paul E. Johnson as well as Google’s R Style Guide.
While we’ve told you a lot about R programming, we haven’t taught you how to program in R. We can’t do that in this forum, but we can direct you to the finest R programming education you’ll find on the web. If you’ve learned enough about R and are ready to start writing some code and crunching some numbers, here are the best R programming resources the web has to offer.
The web offers lots of R tutorials you can use to learn R programming. However, we think this one is among the very best:
- An Introduction to R: This introduction is anything but light and basic. Don’t jump into this tutorial without first steeling your nerves and setting up a long-term study schedule. This in-depth and thorough introduction to R is managed by the R Core Team, which means you’re being educated by the experts as you work your way through more than 30,000 words of content. If you already know a bit of R and just want to jump to specific topics, this guide does also include a helpful index of functions and variables as well as a concept index.
If you’d rather learn by doing, an interactive tutorial might be just what you’re looking for. Here are three options to consider:
- Try R From Code School: A short interactive introduction to R syntax and basic programming with R. Think of it as the interactive version of R for Cats, but with fewer feline references.
- DataCamp Introduction to R: A basic interactive introduction to R programming that covers how to perform arithmetic and work with variables, and introduces basic data types.
- Swirl: this is actually an R package. That means that you’ll be walking through interactive tutorials with R installed right on your system. There are swirl courses available to walk you through everything, starting with installation. In addition, there are quite a few additional courses beyond the introductory course with titles such as “R Programming”, “Data Analysis”, and “Regression Models.”
If you really want to learn how to perform statistical analysis with R, there’s no substitute for formal textbooks. Considering the use of R in industry and academia, there is no shortage of quality R texts. However, we’ve taken the time to sort through the clutter and pinpoint five of the most recommended and most highly rated R programming texts available today:
- R in Action (2015) by Robert Kabacoff: one of the most widely studied R texts on the market, it presents the R programming language and demonstrates the use of R to solve business problems.
- Practical Data Science with R (2014) by Zumel and Mount: just as the name suggests, this text teaches the theory of statistical analysis with R but focuses on the practical application of theory to real-world problems. It is written by a pair of impressively qualified private-sector data scientists. If you read only one text on R, make it this one.
- Discovering Statistics Using R (2012) by Field, et al.: this irreverent text is widely considered to be one of the most entertaining introductions to statistical analysis with R. In addition, the text provides a solid technical foundation. If you hate textbooks but understand the need to read one, this is the textbook you’re looking for.
- The Art of R Programming (2011) by Norman Matloff: this tour of applied R programming walks the reader through real-world scenarios where R is used every day. Suitable for beginners and experienced developers alike, this text is designed to stretch your perception of what data analysis can do while simultaneously teaching fundamental R programming.
- R Cookbook (2011) by Paul Teetor: if you’re more concerned with solving specific problems than learning the theory behind the R programming language and statistical analysis, this cookbook from O’Reilly will help you solve problems and produce results quickly.
R is free and open-source, making it possible for anyone to have access to world-class statistical analysis tools. It is used widely in academia and the private sector and is the most popular statistical analysis programming language today. Learning R isn’t easy. If it were, data scientists wouldn’t be in such high demand. However, there is no shortage of quality resources you can use to learn R if you’re willing to put in the time and effort.
Further Reading and Resources
We have more guides, tutorials, and infographics related to programming and statistics:
- S-PLUS Programming Resources: S-PLUS is the commercial implementation of the S language.
- SAS Programming Introduction and Resources: SAS is the market leader in data analysis.
- Stata Programming: Stata is a whole development environment for doing data analysis.