23/02/2020

R System Basics

Package system

  • R is based on a powerful packaging system

  • Almost everything is packaged as a separate library of functions

  • Some packages are delivered by default at installation

    • base, boot, compiler, datasets, …
  • Others, you have to install

    • dplyr, ggplot2, tidyr, …
  • To install a package (e.g. dplyr):

    install.packages("dplyr")

  • To load an installed package:

    library(dplyr)

Installed packages

  • To get a list of all installed packages on the system

    installed.packages()

    or

    library()

  • Check if a package can be updated

    old.packages()

  • Update old packages

    update.packages()

  • All of these can also be done from the RStudio interface
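
Scripts often need a package that may not be installed yet. A minimal sketch, assuming a hypothetical helper named ensure_package (not part of base R) that installs only when the package is missing:

```r
# ensure_package() is a hypothetical helper, not a base R function.
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs a configured CRAN mirror and network access
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so nothing gets installed here
ok <- ensure_package("stats")
ok
## [1] TRUE
```

After a successful check, library(pkg) can be called as usual.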

RStudio Package Management

Setting a seed

You can set a seed to ensure reproducibility of results.

set.seed(1024)
runif(5, 0, 1)
## [1] 0.21808916 0.98763424 0.34846189 0.38104699 0.02098596
set.seed(1023)
runif(5, 0, 1)
## [1] 0.2493580 0.2529563 0.9496766 0.4891691 0.4063322
set.seed(1024)
runif(5, 0, 1)
## [1] 0.21808916 0.98763424 0.34846189 0.38104699 0.02098596

The same seed always produces the same random numbers.
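
One way to check this in code rather than by eye is to compare two runs with identical(). A small sketch; the seed value 2020 and the variable names are arbitrary choices:

```r
set.seed(2020)
first <- runif(3)

set.seed(2020)            # reset the generator to the same state
second <- runif(3)

identical(first, second)
## [1] TRUE

third <- runif(3)         # no reset: the generator continues its stream
```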

Interaction with the R Console

4 + 3 / 5^2
## [1] 4.12
  • The first line is the input command
  • The second line is the output
  • Notice the [1] at the beginning of the output
  • In R, almost everything is a vector (array) of something
  • Here, the output is a vector of numbers and the first item is 4.12
  • Beware: R vectors are indexed from 1, not 0

Interaction with the R Console

rnorm(10, mean = 4, sd = 2)
##  [1] 5.347261 3.059237 4.444793 7.473732 3.772683 2.175777 6.118450
##  [8] 3.959706 3.423982 3.934415
mean(sample(1:10, 5))
## [1] 6.6

Interaction with the R Console

Plots are drawn in a separate window (or in a tab in RStudio).

plot(x = sample(1:10,5), y = sample(1:10,5), main = "Five random points", 
     xlab = "X values", ylab = "Y values")

Primitive Data Types

  • numeric, character, logical

    • also, integer and complex
  • Assign a value to a variable with <-

    age <- 10

    name <- "Ali"

    registered <- FALSE

  • You can re-assign on the fly

    age <- 10

    age <- "ten"

Printing an Object’s Value

  • From the console just type the name of the object
age <- 10
age
## [1] 10
  • Within a script, print the value explicitly with print() or cat()
print(age)
## [1] 10
cat(age)
## 10

Printing an Object’s Value

  • print and cat differ at EOL (end of line): print ends its output with a newline, cat does not
x <- 5
y <- 10
print(x)
print(y)
## [1] 5
## [1] 10
cat(x)
cat(y)
## 510
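
To get line breaks with cat(), the newline has to be supplied explicitly. A short sketch using the same x and y; the sep argument is a standard cat() parameter:

```r
x <- 5
y <- 10

cat(x, "\n")            # prints 5, then a line break
cat(y, "\n")            # prints 10 on its own line

cat(x, y, sep = ", ")   # prints 5, 10 on one line
cat("\n")               # finish the line
```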

List and remove variables and objects

x <- 5
y <- 10
ls()
## [1] "x" "y"
rm(x)   # ... or, rm("x")
ls()
## [1] "y"

Functions

R Functions

  • Functions are also objects
foo <- function()
{
   print("Hello World!")
}

foo()
## [1] "Hello World!"

R Functions

  • Parameters can be provided
foo <- function(name)
{
   cat("Hello", name, "\n")
}

foo("Ali")
## Hello Ali
  • Why not use print()?
    • print() takes a single object; it does not join comma-separated arguments
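
If print() is still preferred, the pieces can be joined into one string first. A minimal sketch with paste(); the function name greet is a made-up example:

```r
# greet() is a hypothetical example function
greet <- function(name) {
  msg <- paste("Hello", name)   # paste() joins its arguments with a space by default
  print(msg)
  invisible(msg)                # return the string without printing it twice
}

greet("Ali")
## [1] "Hello Ali"
```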

R Functions

  • Parameters can have default values
foo <- function(name = "Hasan")
{
   cat("Hello", name, "\n")
}

foo()
## Hello Hasan
foo("Ali")
## Hello Ali

R Functions

  • Arguments can be passed by name
foo <- function(name = "Hasan", surname = "Kaya")
{
   cat("Hello", name, surname, "\n")
}

foo(surname = "Topal")
## Hello Hasan Topal

R Functions

  • If called by name, order is not important
foo <- function(name = "Hasan", gender = "M")
{
   if (gender == "M")
      cat("Hello", "Mr.", name, "\n")
   else
      cat("Hello", "Ms.", name, "\n")
}

foo(name = "Ali", gender = "M")
## Hello Mr. Ali
foo(gender = "F", name = "Ayşe")
## Hello Ms. Ayşe

R Functions

  • Some useful functions
max(1:10)
## [1] 10
min(1:10)
## [1] 1
mean(1:10)
## [1] 5.5

R Functions

  • Some useful functions
sd(1:10)
## [1] 3.02765
length(1:10)
## [1] 10
range(1:10)
## [1]  1 10

R Functions

  • Is a function name already used?
exists("max")
## [1] TRUE
exists("burkay")
## [1] FALSE

Vectors

Vectors

  • A vector is an ordered collection of elements of the same type
  • Create one with the c() function
names <- c("Ali", "Ayşe", "Burak")
names
## [1] "Ali"   "Ayşe"  "Burak"
ages <- c(12, 24, 36)
ages
## [1] 12 24 36
married <- c(F, T, T)
married
## [1] FALSE  TRUE  TRUE

Vectors - Useful functions

length(names)
## [1] 3
mode(names)
## [1] "character"

Vectors - Coercion

ages <- c(12, 24, 36, "forty")
ages
## [1] "12"    "24"    "36"    "forty"
mode(ages)
## [1] "character"
mixed <- c(T, 12, "test")
mixed
## [1] "TRUE" "12"   "test"
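
Coercion can also be requested explicitly. A sketch of converting back to numbers with as.numeric(); the variable names are illustrative, and suppressWarnings() is used only to hide the "NAs introduced by coercion" warning:

```r
ages_chr <- c("12", "24", "36", "forty")
ages_num <- suppressWarnings(as.numeric(ages_chr))   # "forty" cannot be parsed
ages_num
## [1] 12 24 36 NA

# Logical values mixed with numbers become 1 and 0
flags <- c(TRUE, FALSE, 12)
flags
## [1]  1  0 12
```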

Missing Values

  • A missing value is represented by an NA
ages <- c(12, 24, 36, NA)
ages
## [1] 12 24 36 NA
  • Be careful with computations when NA values are present
mean(ages)
## [1] NA
mean(ages, na.rm = T)
## [1] 24
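
Missing values cannot be found with ==, because any comparison with NA is itself NA. A short sketch of locating and dropping them with is.na(); the name known is an arbitrary choice:

```r
ages <- c(12, 24, 36, NA)

ages == NA                 # does not work: every result is NA
## [1] NA NA NA NA
is.na(ages)
## [1] FALSE FALSE FALSE  TRUE
sum(is.na(ages))           # number of missing values
## [1] 1

known <- ages[!is.na(ages)]
known
## [1] 12 24 36
```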

Vector indexing

  • Use brackets vec[i] to get element i of vec
ages
## [1] 12 24 36 NA
ages[1]
## [1] 12
ages[4]
## [1] NA

Vector indexing

ages[4] <- 20
ages
## [1] 12 24 36 20

Subsetting

ages
## [1] 12 24 36 20
ages[1:3]
## [1] 12 24 36
ages[-1]
## [1] 24 36 20
ages[-1:-3]
## [1] 20

Subsetting

ages
## [1] 12 24 36 20
ages[c(1,3)]
## [1] 12 36
ages[c(-2,-4)]
## [1] 12 36

Batch assignments

ages <- 1:10
ages
##  [1]  1  2  3  4  5  6  7  8  9 10
ages[1:5] <- 20
ages
##  [1] 20 20 20 20 20  6  7  8  9 10

Dynamic resizing

ages <- c(12, 24, 36)
ages
## [1] 12 24 36
ages[4] <- 48
ages
## [1] 12 24 36 48
ages[6] <- 72
ages
## [1] 12 24 36 48 NA 72

Empty vectors

ages <- vector()
ages
## logical(0)
ages[1] <- 24
ages
## [1] 24

Empty vectors

You can create vectors of any size; their elements are initialized with default values:

ages <- numeric()
ages
## numeric(0)
ages <- numeric(10)
ages
##  [1] 0 0 0 0 0 0 0 0 0 0
gender <- character(10)
gender
##  [1] "" "" "" "" "" "" "" "" "" ""
married <- logical(10)
married
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
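
When the final length is known in advance, a common pattern is to create the vector at full size and then fill it, instead of growing it element by element. A minimal sketch; n and squares are illustrative names:

```r
n <- 5
squares <- numeric(n)          # five zeros, allocated once

for (i in seq_len(n)) {
  squares[i] <- i^2            # overwrite the default value
}

squares
## [1]  1  4  9 16 25
```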

Vector concatenation

You can concatenate vectors with c():

x <- c(1, 2, 3)
x <- c(x, 4)
x
## [1] 1 2 3 4
x <- c(x, x)
x
## [1] 1 2 3 4 1 2 3 4

Vectorization

  • R supports many operations and functions to work on vectors
  • Scalars are vectors of length one; they are recycled to match the longer operand
x <- 1:10
y <- x + 1
y
##  [1]  2  3  4  5  6  7  8  9 10 11
y <- x^2
y
##  [1]   1   4   9  16  25  36  49  64  81 100

Vectorization

x <- -10:10
y <- 2*x^2 - 4*x + 1
plot(x, y)

Vectorization - Recycling

x <- 1:10
y <- 10:1
x + y
##  [1] 11 11 11 11 11 11 11 11 11 11
y <- 1:5
x
x + y
##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  2  4  6  8 10  7  9 11 13 15
y <- 1:3
x
x + y
## Warning in x + y: longer object length is not a multiple of shorter object
## length
##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  2  4  6  5  7  9  8 10 12 11
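
Recycling can be made visible by repeating the shorter vector by hand. A sketch using rep_len(), which repeats a vector up to a given length; y_full is an illustrative name:

```r
x <- 1:10
y <- 1:5

y_full <- rep_len(y, length(x))   # what R does implicitly
y_full
##  [1] 1 2 3 4 5 1 2 3 4 5

identical(x + y, x + y_full)
## [1] TRUE
```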

Factors

Factors

gender <- c("m", "f", "f", "f", "m")
gender
## [1] "m" "f" "f" "f" "m"
gender <- factor(gender)
gender
## [1] m f f f m
## Levels: f m
gender[3]
## [1] f
## Levels: f m
gender[4] == "f"
## [1] TRUE

Factors

gender
## [1] m f f f m
## Levels: f m
gender[4] <- "m"
gender
## [1] m f f m m
## Levels: f m
gender[4] <- "o"
## Warning in `[<-.factor`(`*tmp*`, 4, value = "o"): invalid factor level, NA
## generated
gender
## [1] m    f    f    <NA> m   
## Levels: f m

Factors - Forced levels

gender <- factor(c("m", "f", "f", "f", "m"), levels = c("f", "m", "o"))
gender
## [1] m f f f m
## Levels: f m o
gender[4] <- "o"
gender
## [1] m f f o m
## Levels: f m o
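
If the extra level is only discovered later, it can be appended to an existing factor before assigning it. A minimal sketch using the levels<- replacement function:

```r
gender <- factor(c("m", "f", "f", "f", "m"))

levels(gender) <- c(levels(gender), "o")   # append a new level
gender[4] <- "o"                           # now a valid assignment
gender
## [1] m f f o m
## Levels: f m o
```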

Factors - Useful functions

levels(gender)
## [1] "f" "m" "o"
length(gender)
## [1] 5
table(gender)
## gender
## f m o 
## 2 2 1

Cross tabulation

hair <- factor(c("Red", "Brown", "Brown", "Black", "Yellow", "Yellow", "Black", "Brown"))
eye <- factor(c("Brown", "Black", "Brown", "Green", "Black", "Brown", "Green", "Black"))
table(hair, eye)
##         eye
## hair     Black Brown Green
##   Black      0     0     2
##   Brown      2     1     0
##   Red        0     1     0
##   Yellow     1     1     0

Cross tabulation

t <- table(hair, eye)
margin.table(t, 1)
## hair
##  Black  Brown    Red Yellow 
##      2      3      1      2
margin.table(t, 2)
## eye
## Black Brown Green 
##     3     3     2

Cross tabulation

t <- table(hair, eye)
prop.table(t, 1)
##         eye
## hair         Black     Brown     Green
##   Black  0.0000000 0.0000000 1.0000000
##   Brown  0.6666667 0.3333333 0.0000000
##   Red    0.0000000 1.0000000 0.0000000
##   Yellow 0.5000000 0.5000000 0.0000000
prop.table(t, 2)
##         eye
## hair         Black     Brown     Green
##   Black  0.0000000 0.0000000 1.0000000
##   Brown  0.6666667 0.3333333 0.0000000
##   Red    0.0000000 0.3333333 0.0000000
##   Yellow 0.3333333 0.3333333 0.0000000
prop.table(t)
##         eye
## hair     Black Brown Green
##   Black  0.000 0.000 0.250
##   Brown  0.250 0.125 0.000
##   Red    0.000 0.125 0.000
##   Yellow 0.125 0.125 0.000
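
A quick sanity check on these tables: row proportions sum to 1 per row, column proportions sum to 1 per column. The sketch below also uses addmargins() from the stats package to append totals. The name tab is used here instead of t only to avoid shadowing the transpose function t():

```r
hair <- factor(c("Red", "Brown", "Brown", "Black", "Yellow", "Yellow", "Black", "Brown"))
eye <- factor(c("Brown", "Black", "Brown", "Green", "Black", "Brown", "Green", "Black"))
tab <- table(hair, eye)

row_sums <- rowSums(prop.table(tab, 1))   # each row sums to 1
col_sums <- colSums(prop.table(tab, 2))   # each column sums to 1

with_totals <- addmargins(tab)            # adds a "Sum" row and column
with_totals["Sum", "Sum"]                 # total number of observations
## [1] 8
```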

Sequences

Sequences

seq(1, 30, 1)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30
seq(1, 30, 5)
## [1]  1  6 11 16 21 26
seq(-1, 1, 0.1)
##  [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3
## [15]  0.4  0.5  0.6  0.7  0.8  0.9  1.0
seq(1, -1, -0.2)
##  [1]  1.0  0.8  0.6  0.4  0.2  0.0 -0.2 -0.4 -0.6 -0.8 -1.0

Sequences

seq(1, 30, length=6)
## [1]  1.0  6.8 12.6 18.4 24.2 30.0
seq(from = 10, length = 5, by = 3)
## [1] 10 13 16 19 22
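
When a sequence serves as a loop index, the base functions seq_len() and seq_along() are often safer than 1:n, because 1:n counts down when n is 0. A short sketch:

```r
n <- 0

1:n                 # counts down from 1 to 0
## [1] 1 0
seq_len(n)          # empty sequence, so a loop over it runs zero times
## integer(0)

seq_along(c("a", "b", "c"))   # one index per element
## [1] 1 2 3
```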

Repetitive sequences

rep(1, 10)
##  [1] 1 1 1 1 1 1 1 1 1 1
rep(1:2, 5)
##  [1] 1 2 1 2 1 2 1 2 1 2
rep(1:2, each = 5)
##  [1] 1 1 1 1 1 2 2 2 2 2

Useful Tricks

Generating factors

gl(3, 5)
##  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
## Levels: 1 2 3
gl(3, 5, labels = c("Red", "Green", "Blue"))
##  [1] Red   Red   Red   Red   Red   Green Green Green Green Green Blue 
## [12] Blue  Blue  Blue  Blue 
## Levels: Red Green Blue

Other generators

rnorm(10, 1, 2)
##  [1]  4.21060332  3.89519151  0.74339555 -0.07785289  1.78317210
##  [6]  2.75843405 -0.64946418  2.46575285 -0.32982902  1.72177110
rpois(10, 3)
##  [1] 5 4 2 3 5 3 2 3 5 2
rbinom(10, 3, 0.3)
##  [1] 1 2 0 2 0 0 1 0 2 0
runif(10, -10, 10)
##  [1] -4.968684 -3.225994 -7.735908  5.628258  6.942104  3.860872  4.541266
##  [8] -2.912779 -4.304087 -6.685231

More subsetting

x <- rpois(10, 3)
x
##  [1] 1 3 0 2 4 2 2 3 2 2
x > 3
##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
x[x > 3]
## [1] 4
which(x > 3)
## [1] 5

More subsetting

x <- rpois(10, 3)
x
##  [1] 6 3 4 5 2 2 2 2 2 3
x > 3
##  [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
x[x > 3]
## [1] 6 4 5
which(x > 3)
## [1] 1 3 4

More subsetting

x
##  [1] 6 3 4 5 2 2 2 2 2 3
x[x > 3] <- 3
x
##  [1] 3 3 3 3 2 2 2 2 2 3
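
Conditions can be combined with & and |, and the capping shown above can also be written without assignment using pmin(). A sketch with fixed values in place of random ones; capped is an illustrative name:

```r
x <- c(6, 3, 4, 5, 2, 2, 2, 2, 2, 3)

x[x > 2 & x < 6]       # elements strictly between 2 and 6
## [1] 3 4 5 3

capped <- pmin(x, 3)   # element-wise minimum of x and 3; x itself is unchanged
capped
##  [1] 3 3 3 3 2 2 2 2 2 3
```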

Logical operators

x <- c(F, F, T, T)
y <- c(F, T, T, F)
x & y
## [1] FALSE FALSE  TRUE FALSE
x | y
## [1] FALSE  TRUE  TRUE  TRUE
x[1] && y[1]   # && and || expect length-one operands (longer vectors are an error since R 4.3.0)
## [1] FALSE
x[1] || y[1]
## [1] FALSE
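
To reduce a whole logical vector to a single TRUE or FALSE, any() and all() are the usual tools, while && and || are meant for single conditions and stop evaluating as soon as the result is known. A short sketch:

```r
x <- c(FALSE, FALSE, TRUE, TRUE)
y <- c(FALSE, TRUE, TRUE, FALSE)

any(x & y)      # is at least one pair TRUE on both sides?
## [1] TRUE
all(x | y)      # is every pair TRUE on at least one side?
## [1] FALSE

# Short-circuit: sqrt("a") would be an error, but it is never evaluated
is.numeric("a") && sqrt("a") > 1
## [1] FALSE
```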

Named indexes

ages <- c(12, 24, 36)
names(ages) <- c("Ali", "Ayşe", "Burak")
ages
##   Ali  Ayşe Burak 
##    12    24    36
ages["Burak"]
## Burak 
##    36
ages[c("Ali", "Burak")]
##   Ali Burak 
##    12    36

Named indexes

ages <- c(Ali = 12, Ayşe = 24, Burak = 36)
ages
##   Ali  Ayşe Burak 
##    12    24    36

Matrices and Arrays

Matrices

m <- c(1, 2, 3, 4, 5, 6, 7, 8)
dim(m)
## NULL
dim(m) <- c(2, 4)
m
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), 2, 4)
m
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

Matrices - By row

m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), 2, 4, byrow = T)
m
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8

Matrices - Indexing

m
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
m[2, 3]
## [1] 7
m[2,]
## [1] 5 6 7 8
m[,3]
## [1] 3 7

Matrices - Indexing

m
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
m[2, -2]
## [1] 5 7 8
m[1:2, 2:3]
##      [,1] [,2]
## [1,]    2    3
## [2,]    6    7

Matrices - Forced matrix output

m[2,]
## [1] 5 6 7 8
m[2, , drop = F]
##      [,1] [,2] [,3] [,4]
## [1,]    5    6    7    8
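
The difference is easy to check with dim(): a dropped result is a plain vector with no dimensions, while drop = FALSE keeps a 1 x 4 matrix. A minimal sketch:

```r
m <- matrix(1:8, 2, 4, byrow = TRUE)

dim(m[2, ])                  # plain vector: no dimensions
## NULL
dim(m[2, , drop = FALSE])    # still a matrix with 1 row and 4 columns
## [1] 1 4
```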

Matrices - Binding

m <- matrix(1:9, 3, 3)
v <- c(10, 11, 12)
rbind(m, v)
##   [,1] [,2] [,3]
##      1    4    7
##      2    5    8
##      3    6    9
## v   10   11   12
cbind(m, v)
##             v
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12

Matrices - Naming

m <- matrix(1:9, 3, 3)
colnames(m) <- c("A", "B", "C")
rownames(m) <- c("x", "y", "z")
m
##   A B C
## x 1 4 7
## y 2 5 8
## z 3 6 9
m["y", "C"]
## [1] 8

Arrays

  • Arrays generalize matrices to more than two dimensions
    • Don’t confuse them with C or Java arrays
a <- array(1:24, dim = c(4, 3, 2))
a
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   13   17   21
## [2,]   14   18   22
## [3,]   15   19   23
## [4,]   16   20   24
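
Array indexing follows the matrix rules, with one index per dimension. A short sketch on the same array:

```r
a <- array(1:24, dim = c(4, 3, 2))

a[2, 3, 1]        # row 2, column 3, first slice
## [1] 10
a[4, 1, 2]        # row 4, column 1, second slice
## [1] 16
dim(a[, , 2])     # a whole slice is an ordinary 4 x 3 matrix
## [1] 4 3
```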

Recycling in Matrices and Arrays

m <- matrix(1, 3, 3)
m
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1
m <- matrix(1:3, 3, 3)
m
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3

Matrix operations

m1 <- matrix(1:9, 3, 3)
m2 <- matrix(2, 3, 3)
m1 + m2
##      [,1] [,2] [,3]
## [1,]    3    6    9
## [2,]    4    7   10
## [3,]    5    8   11
m1 * m2
##      [,1] [,2] [,3]
## [1,]    2    8   14
## [2,]    4   10   16
## [3,]    6   12   18
m1 %*% m2  # Matrix multiplication
##      [,1] [,2] [,3]
## [1,]   24   24   24
## [2,]   30   30   30
## [3,]   36   36   36
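
To see what %*% computes, one entry can be reproduced by hand: entry [1, 1] is the sum of the products of row 1 of m1 and column 1 of m2. A small sketch, which also shows the transpose function t():

```r
m1 <- matrix(1:9, 3, 3)
m2 <- matrix(2, 3, 3)

sum(m1[1, ] * m2[, 1])   # row 1 of m1 times column 1 of m2
## [1] 24
(m1 %*% m2)[1, 1]
## [1] 24

t(m1)[1, ]               # first row of the transpose = first column of m1
## [1] 1 2 3
```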

Lists

Lists

  • Lists allow you to package different types of objects
my.lst <- list(stud.id=34453,
               stud.name="Ali",
               stud.marks=c(14.3,12,15,19))
my.lst
## $stud.id
## [1] 34453
## 
## $stud.name
## [1] "Ali"
## 
## $stud.marks
## [1] 14.3 12.0 15.0 19.0

Lists

my.lst
## $stud.id
## [1] 34453
## 
## $stud.name
## [1] "Ali"
## 
## $stud.marks
## [1] 14.3 12.0 15.0 19.0
my.lst[1]
## $stud.id
## [1] 34453
my.lst[[1]]
## [1] 34453
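
The difference between [ ] and [[ ]] shows up in the class of the result: single brackets return a shorter list, double brackets return the element itself. A minimal sketch on the same list:

```r
my.lst <- list(stud.id = 34453,
               stud.name = "Ali",
               stud.marks = c(14.3, 12, 15, 19))

class(my.lst[3])      # still a list, holding one element
## [1] "list"
class(my.lst[[3]])    # the numeric vector itself
## [1] "numeric"
my.lst[[3]][2]        # so it can be indexed further
## [1] 12
```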

Lists

my.lst$stud.id          # same as my.lst[[1]]
## [1] 34453
my.lst["stud.id"]       # same as my.lst[1]
## $stud.id
## [1] 34453
my.lst[["stud.id"]]     # same as my.lst[[1]]
## [1] 34453

Lists - tricks with names

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l$names
## [1] "Ali"   "Ayşe"  "Burak"
l$nam
## [1] "Ali"   "Ayşe"  "Burak"
l$r
## [1]  TRUE FALSE FALSE

Lists - tricks with names

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          agents = c(2, 1, 2),
          registered = c(T, F, F))
l$age
## NULL
l$agen
## [1] 2 1 2
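
Partial matching is specific to $. Double brackets match names exactly unless exact = FALSE is given, which makes them the safer choice in scripts. A short sketch on a reduced version of the list:

```r
l <- list(ages = c(12, 24, 36),
          agents = c(2, 1, 2),
          registered = c(TRUE, FALSE, FALSE))

l[["agen"]]                  # exact matching: no element with this name
## NULL
l[["agen", exact = FALSE]]   # partial matching must be requested
## [1] 2 1 2
l[["ages"]]
## [1] 12 24 36
```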

Lists - tricks with names

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
names(l)
## [1] "names"      "ages"       "registered"
names(l)[2] <- "age"
l
## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $age
## [1] 12 24 36
## 
## $registered
## [1]  TRUE FALSE FALSE

Lists - adding to a list

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l$agents <- c(2, 1, 2)
l
## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $ages
## [1] 12 24 36
## 
## $registered
## [1]  TRUE FALSE FALSE
## 
## $agents
## [1] 2 1 2

Lists - removing from a list

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l$ages <- NULL
l
## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $registered
## [1]  TRUE FALSE FALSE

Lists - removing from a list

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l <- l[-2]
l
## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $registered
## [1]  TRUE FALSE FALSE

Lists - concatenation

l1 <- list(1, "a", T)
l2 <- list("red", 1.72)
l <- c(l1, l2)
l
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] "red"
## 
## [[5]]
## [1] 1.72
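
Note that c() merges the elements of both lists into one flat list, whereas list() would nest them, and unlist() collapses a list into a vector with the usual coercion. A short sketch:

```r
l1 <- list(1, "a", TRUE)
l2 <- list("red", 1.72)

length(c(l1, l2))      # five elements in one flat list
## [1] 5
length(list(l1, l2))   # two elements, each of them a list
## [1] 2

unlist(l1)             # flattened to a character vector
## [1] "1"    "a"    "TRUE"
```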