The name of the language, R, comes both from its being a successor to the S language and from the first letter shared by its authors' first names, Ross and Robert.[11] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib, a data archive website.[12] At the same time, they announced the posting on the s-news mailing list.[13] On 5 December 1997, R became a GNU project when version 0.60 was released.[14] On 29 February 2000, version 1.0 was released.[15]
Base packages are immediately available when starting R and provide the necessary syntax and commands for programming, computing, graphics production, basic arithmetic, and statistical functionality.[19]
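As a brief illustration, the snippet below lists the packages attached in the current session and then calls two functions from base packages without any library() call. The exact output of search() depends on the R version and session, so it is not shown.

```r
# Packages attached to the search path in the current session;
# in a fresh session these are mostly base packages such as stats and graphics
search()

# Functions from attached base packages work without calling library() first
mean(c(2, 4, 9))  # mean() comes from the base package
sd(c(2, 4, 9))    # sd() comes from the stats package
```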
The tidyverse package bundles several subsidiary packages that provide a common interface for tasks related to accessing and processing "tidy data",[27] data contained in a two-dimensional table with a single row for each observation and a single column for each variable.[28]
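A minimal sketch of the difference, using a small hypothetical table: the wide layout spreads one variable (the year) across several columns, and base R's reshape() turns it into the tidy layout of one row per observation and one column per variable. Base R is used here so that the sketch runs without tidyverse installed; within the tidyverse, tidyr's pivot_longer() plays the same role.

```r
# Hypothetical wide table: one row per country, with case counts spread
# across one column per year, so "year" is not a column of its own
wide <- data.frame(
  country = c("A", "B"),
  y1999   = c(10, 20),
  y2000   = c(15, 25)
)

# Reshape to tidy (long) form: one row per country-year observation,
# with separate columns for country, year, and cases
tidy <- reshape(
  wide,
  direction = "long",
  varying   = c("y1999", "y2000"),
  v.names   = "cases",
  timevar   = "year",
  times     = c(1999, 2000),
  idvar     = "country"
)
rownames(tidy) <- NULL  # drop the "A.1999"-style row names reshape() generates
tidy
```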
A package needs to be installed only once. For example, to install the tidyverse package:[28]
> install.packages("tidyverse")
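Because installation is a one-time step while loading happens in every session, scripts sometimes guard the installation step. The helper below is a hypothetical sketch, not a function provided by R: it calls install.packages() only when requireNamespace() reports that the package is missing.

```r
# Hypothetical helper (not part of base R): install a package only if it is missing
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE, rather than raising an error, when pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

For example, install_if_missing("tidyverse") would install tidyverse on the first run and do nothing on later runs.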
To load the functions, data, and documentation of a package, one executes the library() function. To load tidyverse:[a]
> # The package name can be enclosed in quotes
> library("tidyverse")
> # but it can also be given without quotes
> library(tidyverse)
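A function from an installed package can also be called without attaching the whole package, using the package::function notation. The sketch below uses the tools package, which ships with R but is not attached by default; its file_ext() function returns the extension of a file name. The file name is an arbitrary example.

```r
# Call file_ext() from the tools package without attaching tools via library()
ext <- tools::file_ext("results.csv")
ext
```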
The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.
The R Journal is an open access, academic journal which features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.
The R community hosts many conferences and in-person meetups, which are tracked in a community-maintained list on GitHub. These include:
UseR!: an annual international R user conference
Directions in Statistical Computing (DSC)
The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R".[34])
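In R, the generally preferred assignment operator is an arrow made of the two characters <-, although = can be used in some cases.[35] Inside the argument list of a function call, however, = names an argument rather than assigning a variable.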
> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Create a vector based on the values in x.
> print(y) # Print the vector’s contents.
[1]  1  4  9 16 25 36
> z <- x + y # Create a new vector that is the sum of x and y
> z # Print the contents of z (values are auto-printed at the prompt).
[1]  2  6 12 20 30 42
> z_matrix <- matrix(z, nrow = 3) # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42
> 2 * t(z_matrix) - 2 # Transpose the matrix, multiply every element by 2, subtract 2 from each element, and return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82
> new_df <- data.frame(t(z_matrix), row.names = c("A", "B")) # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c("X", "Y", "Z") # Set the column names of new_df as X, Y, and Z.
> print(new_df) # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42
> new_df$Z # Output the Z column
[1] 12 42
> identical(new_df$Z, new_df[["Z"]]) && identical(new_df[[3]], new_df$Z) # Column Z can be extracted as a vector with $Z, [["Z"]], or [[3]]; all give the same values (new_df["Z"] and new_df[3] instead return one-column data frames).
[1] TRUE
> attributes(new_df) # Print attribute information about the new_df object
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c("one", "two") # Access and then change the row.names attribute; can also be done using rownames()
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42
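The difference between the two assignment forms shows up inside function calls. In this sketch, half() is a hypothetical function defined only for illustration: passing input_number = 8 matches the argument by name and creates no variable, while input_number <- 8 also assigns input_number in the calling environment as a side effect.

```r
# At the top level, `<-` and `=` both assign a value to a name
a <- 5
b = 5
identical(a, b)

# Hypothetical one-argument function used only for this illustration
half <- function(input_number) input_number / 2

# `=` inside a call matches the argument by name; no variable is created
r1 <- half(input_number = 8)
before <- exists("input_number")

# `<-` inside a call assigns input_number in the calling environment,
# then passes the assigned value to half()
r2 <- half(input_number <- 8)
after <- exists("input_number")
```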
# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # An explicit return() statement is optional; it could be replaced with simply `z`.
  return(z)
}

# Alternatively, the last statement executed is implicitly returned.
f <- function(x, y) 3 * x + 4 * y
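Calling the function defined above illustrates positional and named argument matching, as well as the fact that R's arithmetic operators work element-wise on vectors. The snippet redefines f so that it runs on its own.

```r
# Same linear combination as above, redefined so this snippet is self-contained
f <- function(x, y) 3 * x + 4 * y

f(1, 2)          # positional matching: 3*1 + 4*2 = 11
f(y = 2, x = 1)  # named matching, order does not matter: also 11
f(1:3, 4:6)      # vectorized arithmetic: 19 26 33
```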
Since version 4.1.0, functions can be written in a shorthand notation, which is useful for passing anonymous functions to higher-order functions:[38]
> sapply(1:5, \(i) i^2) # here \(i) is the same as function(i)
[1]  1  4  9 16 25
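The same shorthand works with the other higher-order functions in base R, such as Filter(), Map(), and Reduce(). A short sketch (the \(...) syntax requires R 4.1.0 or later):

```r
# Keep only the elements for which the anonymous function returns TRUE
evens <- Filter(\(n) n %% 2 == 0, 1:10)
evens

# Apply a two-argument anonymous function element-wise over two vectors
products <- Map(\(a, b) a * b, 1:3, 4:6)
unlist(products)

# Fold a vector into a single value by repeated application
total <- Reduce(\(acc, n) acc + n, 1:5)
total
```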
Native pipe operator
In R version 4.1.0, a native pipe operator, |>, was introduced.[39] This operator lets users chain function calls one after another instead of nesting them: an expression such as x |> f(y) is interpreted as f(x, y).
> nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
[1] 11
> mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
[1] 11
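Because the pipe is handled when the code is parsed, a piped expression and the corresponding nested call produce the same expression. The left-hand side is always inserted as the first argument; when the data belongs in a later argument, R 4.2.0 and later accept the _ placeholder, which must be supplied to a named argument. A minimal sketch using the built-in mtcars data set:

```r
# The pipe is a syntax transformation: both forms produce the same call
identical(quote(mtcars |> head(3)), quote(head(mtcars, 3)))

# lm() takes the formula first, so the data is passed via the `_`
# placeholder to the named `data` argument (R 4.2.0 or later)
fit <- mtcars |> lm(mpg ~ wt, data = _)
coef(fit)
```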
Another alternative to nested function calls, in contrast to using the pipe operator, is to store each intermediate result in a named object.
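Applied to the mtcars count above, this style might look as follows; the object names are illustrative choices.

```r
# Each step's result is stored under a descriptive name
four_cyl_cars <- subset(mtcars, cyl == 4)
n_four_cyl <- nrow(four_cyl_cars)
n_four_cyl
```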
While the pipe operator can produce code that is easier to read, it has been recommended to pipe together at most 10 to 15 lines and to break longer code into sub-tasks whose results are saved in objects with meaningful names.[40]
Here is an example with fewer than 10 lines that some readers may still struggle to grasp without intermediate named steps:
In the following example, summary() is a generic function that dispatches to different methods depending on whether its argument is a character vector or a "factor":
> data <- c("a", "b", "c", "a", NA)
> summary(data)
   Length     Class      Mode
        5 character character
> summary(as.factor(data))
   a    b    c NA's
   2    1    1    1
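A new class can take part in this dispatch by defining a function named after the generic and the class. The sketch below invents a hypothetical "temperature" class: because summary.temperature() exists, summary() dispatches to it, while unclass() strips the class attribute so that the default numeric summary is used instead.

```r
# Hypothetical S3 class "temperature": numeric readings in degrees Celsius
readings <- structure(c(21.5, 23.0, 19.8), class = "temperature")

# summary() dispatches to this method for objects of class "temperature"
summary.temperature <- function(object, ...) {
  values <- unclass(object)
  sprintf("%d readings, from %.1f to %.1f degrees C",
          length(values), min(values), max(values))
}

summary(readings)           # uses summary.temperature()
summary(unclass(readings))  # plain numeric vector: summary.default() is used
```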
Modeling and plotting
Diagnostic plots from plotting "model" (see the plot.lm() function). Note the mathematical notation allowed in labels (lower-left plot).
The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.
# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)
Output:
Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.3333     2.8441  -3.282 0.030453 *
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583,	Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
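Quantities from the fitted model can also be extracted programmatically rather than read off the printed summary. A short sketch, repeating the model from above so that it runs on its own; the new x value of 7 is an arbitrary example.

```r
x <- 1:6
y <- x^2
model <- lm(y ~ x)

# Estimated intercept and slope as a named numeric vector
coef(model)

# Proportion of variance explained, as reported by summary()
summary(model)$r.squared

# Predicted y for a new x value just outside the fitted range
predict(model, newdata = data.frame(x = 7))
```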
Mandelbrot set
"Mandelbrot.gif" graphic created in R. (Note: Colors differ from actual output.)
Install the package that provides the write.gif() function beforehand:
install.packages("caTools")
R source code:
library(caTools)

jet.colors <- colorRampPalette(c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
                                 "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

C <- complex(real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
             imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx))

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {

  # the Mandelbrot iteration z <- z^2 + c
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(X, "Mandelbrot.gif", col = jet.colors, delay = 100)
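The program above iterates every point 20 times, so the values of diverging orbits can overflow to Inf or NaN along the way. A common alternative is the escape-time method, which records the iteration at which |z| first exceeds 2 and stops updating that point. The following is a minimal base-R sketch of that approach; the function name, grid size, iteration cap, and palette are arbitrary choices, not part of the original example.

```r
# Escape-time rendering of the Mandelbrot set in base R (no caTools needed)
mandelbrot_counts <- function(nx = 300, ny = 240, max_iter = 30) {
  re <- seq(-2.2, 1.0, length.out = nx)
  im <- seq(-1.2, 1.2, length.out = ny)
  C <- outer(re, im * 1i, `+`)        # nx-by-ny matrix of complex points c
  Z <- matrix(0 + 0i, nx, ny)
  counts <- matrix(max_iter, nx, ny)  # points that never escape keep max_iter
  active <- matrix(TRUE, nx, ny)      # points whose orbit has not yet escaped
  for (k in seq_len(max_iter)) {
    # Same iteration as above, applied only to points still being tracked
    Z[active] <- Z[active]^2 + C[active]
    escaped <- active & Mod(Z) > 2    # once |z| > 2, the orbit is known to diverge
    counts[escaped] <- k
    active <- active & !escaped
  }
  counts
}

counts <- mandelbrot_counts()
image(counts, col = hcl.colors(30), axes = FALSE, useRaster = TRUE)
```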
Version names
CD of R Version 1.0.0, autographed by the core team of R, photographed in Quebec City in 2019
All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films.[42][43][44]
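The codename of the running version can be read from R itself, since the R.version list has a nickname element. Its value depends on the installed release, so no output is shown.

```r
# Codename of the running R release
R.version$nickname

# Full version string, including the release date
R.version.string
```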
In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997.[45] Some notable early releases before the named releases include:
Version 1.0.0 released on 29 February 2000, a leap day
Version 2.0.0 released on 4 October 2004, "which at least had a nice ring to it"[45]
The idea of naming R version releases was inspired by the Debian and Ubuntu version naming system. Dalgaard also noted that another reason for using Peanuts references for R codenames is that "everyone in statistics is a P-nut".[45]
This displays, on standard error, a list of all the packages that tidyverse depends on. It may also display warnings about namespace conflicts, which can typically be ignored.
Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859. The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
Davies, Tilman M. (2016). "Installing R and Contributed Packages". The Book of R: A First Course in Programming and Statistics. San Francisco, California: No Starch Press. p. 739. ISBN 9781593276515.
Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.
Schulz, Charles M. (2019). Happiness is a warm puppy. New York: Penguin Workshop. ISBN 978-1-5247-8995-4.
Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for data science: import, tidy, transform, visualize, and model data (2nd ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. ISBN 978-1-4920-9740-2.