Lecture 2

26 Feb, 2024

R System Basics

Package system

R is based on a powerful packaging system
Almost everything is packaged as a separate library of functions
Some packages are delivered by default at installation
- base, boot, compiler, datasets, …
Others, you have to install
- dplyr, ggplot2, tidyr, …
To install a package (i.e. dplyr):

install.packages("dplyr")
To load an installed package:

library(dplyr)

Installed packages

To get a list of all installed packages on the system

installed.packages()

or

library()
Check if a package can be updated

old.packages()
Update old packages

update.packages()
All of these can also be done from RStudio interface

RStudio Package Management

Setting a seed

You can set a seed to ensure reproducebility of results.

set.seed(1024)
runif(5, 0, 1)

## [1] 0.21808916 0.98763424 0.34846189 0.38104699 0.02098596

set.seed(1023)
runif(5, 0, 1)

## [1] 0.2493580 0.2529563 0.9496766 0.4891691 0.4063322

set.seed(1024)
runif(5, 0, 1)

## [1] 0.21808916 0.98763424 0.34846189 0.38104699 0.02098596

The same seed always produces the same random numbers.

This is quite essential in data studies as your results must be reproducible.

About random generator seeds

Note that these slides are generated with R.

Code scripts on each slide are being executed to produce these slides.

Whenever a seed is used on a slide, this will ensure that all results on a lecture are reproducible by you on your own PCs.

The seed setting in the previous slide also guarantees that the following slides are all reproducible.

Interaction with the R Console

4 + 3 / 5^2

## [1] 4.12

The first line is the input command
The second line is the output
Notice the [1] at the beginning of the output
In R, almost everything is a vector (array) of something
Here, the output is a vector of numbers and the first item is 4.12
Beware!!! R vectors start with index 1

Interaction with the R Console

Producing 10 random normal values with a mean of \(4\) and standard deviation of \(2\):

rnorm(10, mean = 4, sd = 2)

##  [1] 5.347261 3.059237 4.444793 7.473732 3.772683 2.175777 6.118450 3.959706
##  [9] 3.423982 3.934415

Computing the mean of 5 randomly selected values between \(1\) and \(10\), both inclusive:

mean(sample(1:10, 5, replace = FALSE))

## [1] 6.6

Interaction with the R Console

Plots are drawn in a different window, (or tab in RStudio)

plot(x = sample(1:10,5), y = sample(1:10,5), main = "Five random points", 
     xlab = "X values", ylab = "Y values")

Primitive Data Types

numeric, character, logical
- also, integer and complex
Assign a value to a variable with <-

age <- 10

name <- "Ali"

registered <- FALSE
You can re-assign on the fly

age <- 10

age <- "ten"
Note that these are actually vectors of size \(1\).

Printing an Object’s Value

From the console just type the name of the object

age <- 10
age

## [1] 10

But, be careful that this won’t work in a script

age <- 10
age2 <- 20
age
age2

Printing an Object’s Value

Within a script, preferred method is

print(age)

## [1] 10

You can also use cat to get rid of vector indices

cat(age)

## 10

Printing an Object’s Value

print and cat are different also at EOL (end of line)

x <- 5
y <- 10
print(x)
print(y)

## [1] 5
## [1] 10

cat(x)
cat(y)

## 510

So you must deliberately add an EOL when using cat

cat(x, "\n")

## 5

cat(y)

## 10

List and remove variables and objects

x <- 5
y <- 10
ls()

## [1] "x" "y"

rm(x)   # ... or, rm("x")
ls()

## [1] "y"

Functions

R Functions

Functions are also objects

foo <- function()
{
   print("Hello World!")
}

foo()

## [1] "Hello World!"

R Functions

Parameters can be provided

foo <- function(name)
{
   cat("Hello", name, "\n")
}

foo("Ali")

## Hello Ali

Why not use print()?
- print does not allow comma concatenation

R Functions

Parameters can be initiated

foo <- function(name = "Hasan")
{
   cat("Hello", name, "\n")
}

foo()

## Hello Hasan

foo("Ali")

## Hello Ali

R Functions

Parameters can be called by name

foo <- function(name = "Hasan", surname = "Kaya")
{
   cat("Hello", name, surname, "\n")
}

foo(surname = "Topal")

## Hello Hasan Topal

R Functions

If called by name, order is not important

foo <- function(name = "Hasan", gender = "M")
{
   if (gender == "M")
      cat("Hello", "Mr.", name, "\n")
   else
      cat("Hello", "Ms.", name, "\n")
}

foo(name = "Ali", gender = "M")

## Hello Mr. Ali

foo(gender = "F", name = "Ayşe")

## Hello Ms. Ayşe

R Functions

Some useful functions

max(1:10)

## [1] 10

min(1:10)

## [1] 1

mean(1:10)

## [1] 5.5

R Functions

Some useful functions

sd(1:10)

## [1] 3.02765

length(1:10)

## [1] 10

range(1:10)

## [1]  1 10

R Functions

Is a function name already used?

exists("max")

## [1] TRUE

exists("burkay")

## [1] FALSE

Vectors

A vector is a list of things of same type
Create one with the c() function

names <- c("Ali", "Ayşe", "Burak")
names

## [1] "Ali"   "Ayşe"  "Burak"

ages <- c(12, 24, 36)
ages

## [1] 12 24 36

married <- c(F, T, T)
married

## [1] FALSE  TRUE  TRUE

Vectors - Useful functions

length(names)

## [1] 3

mode(names)

## [1] "character"

Vectors - Coercion

ages <- c(12, 24, 36, "forty")
ages

## [1] "12"    "24"    "36"    "forty"

mode(ages)

## [1] "character"

mixed <- c(T, 12, "test")
mixed

## [1] "TRUE" "12"   "test"

Missing Values

A missing value is represented by an NA

ages <- c(12, 24, 36, NA)
ages

## [1] 12 24 36 NA

Be careful with computations if NA exists

mean(ages)

## [1] NA

mean(ages, na.rm = T)

## [1] 24

Vector indexing

Use brackets vec[i] to get element i of vec

ages

## [1] 12 24 36 NA

ages[1]

## [1] 12

ages[4]

## [1] NA

Vector indexing

ages[4] <- 20
ages

## [1] 12 24 36 20

Subsetting

ages

## [1] 12 24 36 20

ages[1:3]

## [1] 12 24 36

ages[-1]

## [1] 24 36 20

ages[-1:-3]

## [1] 20

ages[-(1:3)]

## [1] 20

Subsetting

ages

## [1] 12 24 36 20

ages[c(1,3)]

## [1] 12 36

ages[c(-2,-4)]

## [1] 12 36

Boolean subsetting

A very frequently used technique is boolean subsetting. Items with TRUE indices are selected, while items with FALSE indices are discarded.

x <- 1:5
y <- c(F,T,T,F,T)
x[y]

## [1] 2 3 5

This can be quite useful in many scenarios.

x <- 1:10
x[x > mean(x)]

## [1]  6  7  8  9 10

This works, because:

x > mean(x)

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Batch assignments

ages <- 1:10
ages

##  [1]  1  2  3  4  5  6  7  8  9 10

ages[1:5] <- 20
ages

##  [1] 20 20 20 20 20  6  7  8  9 10

ages[3:5] <- c(99,98,97)
ages

##  [1] 20 20 99 98 97  6  7  8  9 10

ages[ages > 50] <- -1
ages

##  [1] 20 20 -1 -1 -1  6  7  8  9 10

Dynamic resizing

ages <- c(12, 24, 36)
ages

## [1] 12 24 36

ages[4] <- 48
ages

## [1] 12 24 36 48

ages[6] <- 72
ages

## [1] 12 24 36 48 NA 72

ages[is.na(ages)] <- mean(ages, na.rm = TRUE)
ages

## [1] 12.0 24.0 36.0 48.0 38.4 72.0

An automatic coercion to real numbers have occured.

Empty vectors

ages <- vector()
ages

## logical(0)

ages[1] <- 24
ages

## [1] 24

Empty vectors

You can create vectors of any size, but they will be initialized by default values:

ages <- numeric()
ages

## numeric(0)

ages <- numeric(10)
ages

##  [1] 0 0 0 0 0 0 0 0 0 0

gender <- character(10)
gender

##  [1] "" "" "" "" "" "" "" "" "" ""

married <- logical(10)
married

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Vector concatenation

You can concatenate vectors with c():

x <- c(1, 2, 3)
x <- c(x, 4)
x

## [1] 1 2 3 4

x <- c(x, x)
x

## [1] 1 2 3 4 1 2 3 4

Vectorization

R supports many operations and functions to work on vectors
Scalars will be automatically converted to vectors
- Below 1 is treated as [1,1,1,1,1,1,1,1,1,1]

x <- 1:10
y <- x + 1
y

##  [1]  2  3  4  5  6  7  8  9 10 11

y <- x^2
y

##  [1]   1   4   9  16  25  36  49  64  81 100

Vectorization

x <- -10:10
y <- 2*x^2 - 4*x + 1
plot(x, y)

Vectorization - Recycling

x <- 1:10
y <- 10:1
x + y

##  [1] 11 11 11 11 11 11 11 11 11 11

y <- 1:5
x
x + y

##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  2  4  6  8 10  7  9 11 13 15

y <- 1:3
x
x + y

## Warning in x + y: longer object length is not a multiple of shorter object
## length

##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  2  4  6  5  7  9  8 10 12 11

Factors

color <- c("red", "blue", "blue", "blue", "red")
color

## [1] "red"  "blue" "blue" "blue" "red"

color <- factor(color)
color

## [1] red  blue blue blue red 
## Levels: blue red

color[3]

## [1] blue
## Levels: blue red

color[4] == "blue"

## [1] TRUE

Factors

color

## [1] red  blue blue blue red 
## Levels: blue red

color[4] <- "red"
color

## [1] red  blue blue red  red 
## Levels: blue red

color[4] <- "green"

## Warning in `[<-.factor`(`*tmp*`, 4, value = "green"): invalid factor level, NA
## generated

color

## [1] red  blue blue <NA> red 
## Levels: blue red

Factors - Forced levels

color <- factor(c("red", "blue", "blue", "blue", "red"), 
                levels = c("blue", "red", "green"))
color

## [1] red  blue blue blue red 
## Levels: blue red green

color[4] <- "green"
color

## [1] red   blue  blue  green red  
## Levels: blue red green

Factors - Useful functions

levels(color)

## [1] "blue"  "red"   "green"

length(color)

## [1] 5

table(color)

## color
##  blue   red green 
##     2     2     1

Cross tabulation

hair <- factor(c("Red", "Brown", "Brown", "Black", "Yellow", "Yellow", "Black", "Brown"))
eye <- factor(c("Brown", "Black", "Brown", "Green", "Black", "Brown", "Green", "Black"))
table(hair, eye)

##         eye
## hair     Black Brown Green
##   Black      0     0     2
##   Brown      2     1     0
##   Red        0     1     0
##   Yellow     1     1     0

Cross tabulation

t <- table(hair, eye)
margin.table(t, 1)

## hair
##  Black  Brown    Red Yellow 
##      2      3      1      2

margin.table(t, 2)

## eye
## Black Brown Green 
##     3     3     2

Cross tabulation

t <- table(hair, eye)
prop.table(t, 1)

##         eye
## hair         Black     Brown     Green
##   Black  0.0000000 0.0000000 1.0000000
##   Brown  0.6666667 0.3333333 0.0000000
##   Red    0.0000000 1.0000000 0.0000000
##   Yellow 0.5000000 0.5000000 0.0000000

prop.table(t, 2)

##         eye
## hair         Black     Brown     Green
##   Black  0.0000000 0.0000000 1.0000000
##   Brown  0.6666667 0.3333333 0.0000000
##   Red    0.0000000 0.3333333 0.0000000
##   Yellow 0.3333333 0.3333333 0.0000000

prop.table(t)

##         eye
## hair     Black Brown Green
##   Black  0.000 0.000 0.250
##   Brown  0.250 0.125 0.000
##   Red    0.000 0.125 0.000
##   Yellow 0.125 0.125 0.000

Sequences

seq(1, 30, 1)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

seq(1, 30, 5)

## [1]  1  6 11 16 21 26

seq(-1, 1, 0.1)

##  [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3  0.4
## [16]  0.5  0.6  0.7  0.8  0.9  1.0

seq(1, -1, -0.2)

##  [1]  1.0  0.8  0.6  0.4  0.2  0.0 -0.2 -0.4 -0.6 -0.8 -1.0

Sequences

seq(1, 30, length=6)

## [1]  1.0  6.8 12.6 18.4 24.2 30.0

seq(from = 10, length = 5, by = 3)

## [1] 10 13 16 19 22

Repetitive sequences

rep(1, 10)

##  [1] 1 1 1 1 1 1 1 1 1 1

rep(1:2, 5)

##  [1] 1 2 1 2 1 2 1 2 1 2

rep(1:2, each = 5)

##  [1] 1 1 1 1 1 2 2 2 2 2

Useful Tricks

Generating factors

gl(3, 5)

##  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
## Levels: 1 2 3

gl(3, 5, labels = c("Red", "Green", "Blue"))

##  [1] Red   Red   Red   Red   Red   Green Green Green Green Green Blue  Blue 
## [13] Blue  Blue  Blue 
## Levels: Red Green Blue

Other generators

rnorm(10, 1, 2)

##  [1]  4.21060332  3.89519151  0.74339555 -0.07785289  1.78317210  2.75843405
##  [7] -0.64946418  2.46575285 -0.32982902  1.72177110

rpois(10, 3)

##  [1] 5 4 2 3 5 3 2 3 5 2

rbinom(10, 3, 0.3)

##  [1] 1 2 0 2 0 0 1 0 2 0

runif(10, -10, 10)

##  [1] -4.968684 -3.225994 -7.735908  5.628258  6.942104  3.860872  4.541266
##  [8] -2.912779 -4.304087 -6.685231

More subsetting

x <- rpois(10, 3)
x

##  [1] 1 3 2 3 1 5 4 4 3 3

x > 3

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE

x[x > 3]

## [1] 5 4 4

which(x > 3)

## [1] 6 7 8

More subsetting

x <- rpois(10, 3)
x

##  [1] 2 1 4 3 5 1 3 5 1 4

x > 3

##  [1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE

x[x > 3]

## [1] 4 5 5 4

which(x > 3)

## [1]  3  5  8 10

More subsetting

##  [1] 2 1 4 3 5 1 3 5 1 4

x[x > 3] <- 3
x

##  [1] 2 1 3 3 3 1 3 3 1 3

Binary comparison

x <- c(F, F, T, T)
y <- c(F, T, T, F)
x & y

## [1] FALSE FALSE  TRUE FALSE

x | y

## [1] FALSE  TRUE  TRUE  TRUE

Binary comparison

As of 4.3.0, && and ||only works with vectors of length 1

x <- T
y <- F
x && y

## [1] FALSE

x || y

## [1] TRUE

x & y

## [1] FALSE

x | y

## [1] TRUE

So, simply use & or |.

Named indexes

ages <- c(12, 24, 36)
names(ages) <- c("Ali", "Ayşe", "Burak")
ages

##   Ali  Ayşe Burak 
##    12    24    36

ages["Burak"]

## Burak 
##    36

ages[c("Ali", "Burak")]

##   Ali Burak 
##    12    36

Named indexes

ages <- c(Ali = 12, Ayşe = 24, Burak = 36)
ages

##   Ali  Ayşe Burak 
##    12    24    36

Matrices and Arrays

Matrices

m <- c(1, 2, 3, 4, 5, 6, 7, 8)
dim(m)

## NULL

dim(m) <- c(2, 4)
m

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), 2, 4)
m

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

Matrices - By row

m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), 2, 4, byrow = T)
m

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8

Matrices - Indexing

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8

m[2, 3]

## [1] 7

m[2,]

## [1] 5 6 7 8

m[,3]

## [1] 3 7

Matrices - Indexing

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8

m[2, -2]

## [1] 5 7 8

m[1:2, 2:3]

##      [,1] [,2]
## [1,]    2    3
## [2,]    6    7

Matrices - Forced matrix output

m[2,]

## [1] 5 6 7 8

m[2, , drop = F]

##      [,1] [,2] [,3] [,4]
## [1,]    5    6    7    8

Matrices - Binding

m <- matrix(1:9, 3, 3)
v <- c(10, 11, 12)
rbind(m, v)

##   [,1] [,2] [,3]
##      1    4    7
##      2    5    8
##      3    6    9
## v   10   11   12

cbind(m, v)

##             v
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12

Matrices - Naming

m <- matrix(1:9, 3, 3)
colnames(m) <- c("A", "B", "C")
rownames(m) <- c("x", "y", "z")
m

##   A B C
## x 1 4 7
## y 2 5 8
## z 3 6 9

m["y", "C"]

## [1] 8

Arrays

Arrays are matrices with higher dimensions
- Don’t confuse with C and Java type arrays

a <- array(1:24, dim = c(4, 3, 2))
a

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   13   17   21
## [2,]   14   18   22
## [3,]   15   19   23
## [4,]   16   20   24

Recycling in Matrices and Arrays

m <- matrix(1, 3, 3)
m

##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1

m <- matrix(1:3, 3, 3)
m

##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3

Matrix operations

m1 <- matrix(1:9, 3, 3)
m2 <- matrix(2, 3, 3)
m1

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

m2

##      [,1] [,2] [,3]
## [1,]    2    2    2
## [2,]    2    2    2
## [3,]    2    2    2

m1 + m2

##      [,1] [,2] [,3]
## [1,]    3    6    9
## [2,]    4    7   10
## [3,]    5    8   11

m1 * m2

##      [,1] [,2] [,3]
## [1,]    2    8   14
## [2,]    4   10   16
## [3,]    6   12   18

m1 %*% m2  # Matrix multiplication

##      [,1] [,2] [,3]
## [1,]   24   24   24
## [2,]   30   30   30
## [3,]   36   36   36

Lists

Lists allow you to package different types of objects

my.lst <- list(stud.id=34453,
               stud.name="Ali",
               stud.marks=c(14.3,12,15,19))
my.lst

## $stud.id
## [1] 34453
## 
## $stud.name
## [1] "Ali"
## 
## $stud.marks
## [1] 14.3 12.0 15.0 19.0

Lists

my.lst

## $stud.id
## [1] 34453
## 
## $stud.name
## [1] "Ali"
## 
## $stud.marks
## [1] 14.3 12.0 15.0 19.0

my.lst[1]

## $stud.id
## [1] 34453

my.lst[[1]]

## [1] 34453

Lists

my.lst$stud.id          # same as my.lst[[1]]

## [1] 34453

my.lst["stud.id"]       # same as my.lst[1]

## $stud.id
## [1] 34453

my.lst[["stud.id"]]     # same as my.lst[[1]]

## [1] 34453

Lists - tricks with names

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l$names

## [1] "Ali"   "Ayşe"  "Burak"

l$nam

## [1] "Ali"   "Ayşe"  "Burak"

l$r

## [1]  TRUE FALSE FALSE

Lists - tricks with names

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          agents = c(2, 1, 2),
          registered = c(T, F, F))
l$age

## NULL

l$agen

## [1] 2 1 2

Lists - tricks with names

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
names(l)

## [1] "names"      "ages"       "registered"

names(l)[2] <- "age"
l

## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $age
## [1] 12 24 36
## 
## $registered
## [1]  TRUE FALSE FALSE

Lists - adding to a list

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l$agents <- c(2, 1, 2)
l

## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $ages
## [1] 12 24 36
## 
## $registered
## [1]  TRUE FALSE FALSE
## 
## $agents
## [1] 2 1 2

Lists - removing from a list

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l$ages <- NULL
l

## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $registered
## [1]  TRUE FALSE FALSE

Lists - removing from a list

l <- list(names = c("Ali", "Ayşe", "Burak"), 
          ages = c(12, 24, 36),
          registered = c(T, F, F))
l <- l[-2]
l

## $names
## [1] "Ali"   "Ayşe"  "Burak"
## 
## $registered
## [1]  TRUE FALSE FALSE

Lists - concatenation

l1 <- list(1, "a", T)
l2 <- list("red", 1.72)
l <- c(l1, l2)
l

## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] "red"
## 
## [[5]]
## [1] 1.72