
18/05/2020

set.seed(1024)

library(adabag)
library(mlbench)
library(randomForest)

- Philosophy:
  - No matter how wise, **one person cannot know everything**!
  - **Ensembles of less wise people** produce better wisdom.
- Why?

One superwise person’s decisions vs. one hundred barely wise persons’ decisions

SuperWise <- 0.9
BarelyWise <- 0.6
# 100 decisions by a single super-wise person, each correct with probability 0.9
x <- rbinom(100, 1, SuperWise)
# 100 decisions, each taken by a majority vote of 100 barely wise persons
y <- rbinom(100, 100, BarelyWise) / 100
y <- ifelse(y > 0.5, 1, 0)
table(x)

## x
##  0  1
## 10 90

table(y)

## y
##  0  1
##  4 96

cat(sum(x), sum(y))

## 90 96
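
Why the crowd tends to win can also be checked directly from the binomial distribution: the probability that a strict majority of the 100 barely wise voters is correct exceeds the single expert's 0.9. A quick check, reusing `BarelyWise` from the chunk above (the value in the comment is approximate):

# probability that more than 50 of 100 voters (each correct with prob. 0.6) are right
pbinom(50, 100, BarelyWise, lower.tail = FALSE)  # roughly 0.97, vs. 0.9 for the single expert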

Idea:

- Sample the original dataset uniformly and with replacement to obtain **k training sets**
  - These are called **bootstraps**
- Train a model with each
  - Usually a decision tree is used
- Take the **average for regression**, or **vote for classification**
- Implemented in package `adabag` (a from-scratch sketch of the idea follows below, before the `adabag` example)
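
For illustration only, here is a minimal from-scratch sketch of the bagging idea, using `rpart` trees and a majority vote on the built-in `iris` data; the number of bootstraps `B` and all object names are arbitrary choices for this example.

library(rpart)

B <- 25                                    # number of bootstrap samples
trees <- vector("list", B)
for (b in 1:B) {
  # sample uniformly and with replacement to obtain one bootstrap
  idx <- sample(1:nrow(iris), nrow(iris), replace = TRUE)
  trees[[b]] <- rpart(Species ~ ., iris[idx, ])
}
# each tree votes; the majority class is the ensemble prediction
votes <- sapply(trees, function(t) as.character(predict(t, iris, type = "class")))
pred <- apply(votes, 1, function(v) names(which.max(table(v))))
mean(pred == iris$Species)                 # resubstitution accuracy of the ensemble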

# BreastCancer data from mlbench package
data(BreastCancer, package = "mlbench")
# use only the complete cases and remove the ID column
bc <- BreastCancer[complete.cases(BreastCancer), -1]
# Obtain a 70-30 split for training and testing
rndSample <- sample(1:nrow(bc), nrow(bc) * 0.70)
tr <- bc[rndSample, ]
ts <- bc[-rndSample, ]
# Build the model (mfinal = number of trees)
m <- bagging(Class ~ ., tr, mfinal = 20, control = rpart.control(maxdepth = 1))
ps <- predict(m, ts)
names(ps)

## [1] "formula" "votes" "prob" "class" "confusion" "error"

ps$confusion

##                Observed Class
## Predicted Class benign malignant
##       benign       124          7
##       malignant     14         60

- Why use trees with `maxdepth = 1`?

# Build the model (mfinal = number of trees)
m <- bagging(Class ~ ., tr, mfinal = 20, control = rpart.control(maxdepth = 3))
ps <- predict(m, ts)
names(ps)

## [1] "formula" "votes" "prob" "class" "confusion" "error"

ps$confusion

##                Observed Class
## Predicted Class benign malignant
##       benign       131          7
##       malignant      7         60

- They are faster to build
- Complex trees would be more similar to one another, so the ensemble would be less diverse

- Improved version of bagging
- Each tree is grown with a **subset of variables**
  - Actually, the subset is randomly selected for each split
- Very diverse set of trees obtained
- Each tree is built **very quickly**
  - Each split considers only a few variables
- Implemented in package `randomForest`

m <- randomForest(Class ~ ., tr, ntree = 100, mtry = 3)
ps <- predict(m, ts)
(cm <- table(ps, ts$Class))

##
## ps          benign malignant
##   benign       132         0
##   malignant      6        67

- Parameter `mtry` controls the size of the feature subset
  - If not provided, it is calculated automatically (see the quick check below):
    - for classification: square root of the number of variables
    - for regression: one third of the number of variables
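
As a quick check of those defaults on the BreastCancer data, a sketch reusing the `bc` data frame from above (the exact rounding used by `randomForest` may differ slightly):

p <- ncol(bc) - 1        # number of predictor variables (Class excluded)
floor(sqrt(p))           # default mtry for classification: here 3, as used above
max(floor(p / 3), 1)     # default mtry for regression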

- Much faster than bagging
  - Due to the simpler split decision: only a few candidate variables are evaluated at each split (a rough timing check follows below)
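
A rough, machine-dependent way to check that claim, assuming the `tr` data from above (50 models each, an arbitrary choice):

# wall-clock time for 50 bagged trees vs. a 50-tree random forest
system.time(bagging(Class ~ ., tr, mfinal = 50))
system.time(randomForest(Class ~ ., tr, ntree = 50))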

- How many trees is optimal?

error <- numeric()
nmodels <- 20
for (i in 1:nmodels) {
  m <- randomForest(Class ~ ., tr, ntree = i, mtry = 3)
  ps <- predict(m, ts)
  cm <- table(ps, ts$Class)
  # test-set error rate: off-diagonal counts over the number of test cases
  error[i] <- (cm[1, 2] + cm[2, 1]) / nrow(ts)
}
par(mar = c(2, 4, 1, 2))
plot(1:nmodels, error, type = "l")

- Both bagging and random forest are ensembles of **independent models**
  - Individual models are completely independent and unaware of each other
- There are also **coordinated models**
  - Where each member depends on the others
  - Each model improves on the previous models
  - **Boosting** is a famous example

- Can **many weak learners** improve each other to **form a strong learner**?
- At each iteration we add a new model to the ensemble
  - The new model is trained to predict observations which were **hard to predict by the previous models**
  - This is done by assigning **weights** to the observations

- Most well-known boosting algorithm
- An additive system of models

\[H(x_i) = \sum_k w_k h_k(x_i)\]
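
For illustration only, a minimal from-scratch sketch of the AdaBoost.M1 weight-update idea behind this formula, using `rpart` stumps on the `tr`/`ts` split created above (the number of rounds `K` and all helper names are arbitrary; the `adabag` call further below is what is actually used here):

library(rpart)

n <- nrow(tr)
D <- rep(1 / n, n)                 # observation weights, start uniform
K <- 20                            # number of weak learners (stumps)
alpha <- numeric(K)                # the w_k in the formula above
models <- vector("list", K)
for (k in 1:K) {
  # weak learner trained on the re-weighted observations
  models[[k]] <- rpart(Class ~ ., tr, weights = D,
                       control = rpart.control(maxdepth = 1))
  pred <- predict(models[[k]], tr, type = "class")
  miss <- as.numeric(pred != tr$Class)
  err <- sum(D * miss)             # weighted training error
  # a real implementation would stop if err were 0 or >= 0.5
  alpha[k] <- log((1 - err) / err) # weight of this model in the ensemble
  D <- D * exp(alpha[k] * miss)    # increase the weight of hard observations
  D <- D / sum(D)
}
# ensemble prediction: weighted vote of the K stumps on the test set
votes <- sapply(models, function(m) as.character(predict(m, ts, type = "class")))
H <- apply(votes, 1, function(v) names(which.max(tapply(alpha, v, sum))))
table(H, ts$Class)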

- Implemented in `adabag` as `boosting()`
- AdaBoost.M1 algorithm

m <- boosting(Class ~ ., tr, mfinal = 20)
ps <- predict(m, ts)
ps$confusion

##                Observed Class
## Predicted Class benign malignant
##       benign       131          0
##       malignant      7         67

- Add parameter `coeflearn = "Zhu"` to run the SAMME algorithm
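
For example, with the same training set and number of trees as above:

m <- boosting(Class ~ ., tr, mfinal = 20, coeflearn = "Zhu")
predict(m, ts)$confusion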

- How many trees is optimal?

error <- numeric()
nmodels <- 20
for (i in 1:nmodels) {
  m <- boosting(Class ~ ., tr, mfinal = i)
  ps <- predict(m, ts)
  error[i] <- ps$error
}
par(mar = c(2, 4, 1, 2))
plot(1:nmodels, error, type = "l")

- Yet another boosting implementation
- This time using gradient descent optimization
- Implemented in package `gbm` (a minimal sketch follows below)
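
A minimal sketch of what a `gbm` model could look like on the same data, assuming the package is installed; the `"bernoulli"` distribution needs a numeric 0/1 response, and `n.trees`, `interaction.depth`, and the 0.5 cut-off are arbitrary choices for this example.

library(gbm)

# recode the response to 0/1 as required by distribution = "bernoulli"
trg <- tr
trg$Class <- as.numeric(trg$Class == "malignant")
m <- gbm(Class ~ ., data = trg, distribution = "bernoulli",
         n.trees = 100, interaction.depth = 3)
# predicted probability of "malignant" on the test set, turned into class labels
p <- predict(m, ts, n.trees = 100, type = "response")
table(ifelse(p > 0.5, "malignant", "benign"), ts$Class)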