01/06/2020

Seed used in these slides

set.seed(1024)

Libraries used in these slides

library(ggplot2)
library(nnet)
library(bit64)
library(h2o)

Artificial Neural Networks

Artificial Neural Networks

  • Non-linear models
  • Can solve both classification and regression tasks
  • Composed of neurons
    • Connected together
    • Each connection has a weight
    • Idea: find best weights that produce the correct output
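
In practice, those weights are learned by gradient descent on an error function \(E\): each weight is updated as \[w_{i,j} \leftarrow w_{i,j} - \eta \frac{\partial E}{\partial w_{i,j}}\] where \(\eta\) is the learning rate.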

Artificial Neural Networks

  1. a linear combination of the input values \[in_j = \sum_{i=1}^{k}{w_{i,j}a_i}\]

  2. a non-linear (activation) function applied to that sum

  3. the output is sent to the connected neurons
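
A minimal sketch of this computation in R (the function name and example values are illustrative):

# a single neuron: linear combination of the inputs, then a sigmoid activation
neuron_output <- function(w, a) {
  in_j <- sum(w * a)      # step 1: weighted sum of the inputs
  1 / (1 + exp(-in_j))    # step 2: non-linear (sigmoid) activation
}
neuron_output(w = c(0.2, -0.5, 0.1), a = c(1, 0, 1))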

Activation Function

  • The Step Function \[step(x) = \begin{cases} 1, & \text{if}\ x \geq t \\ 0, & \text{otherwise} \end{cases}\]
  • The Sign Function \[sign(x) = \begin{cases} 1, & \text{if}\ x \geq 0 \\ -1, & \text{otherwise} \end{cases}\]
  • The Sigmoid Function \[sigmoid(x) = \frac{1}{1+e^{-x}}\]
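
These are one-liners in R (a sketch; the names step_fun and sign_fun avoid masking base R's step() and sign()):

step_fun <- function(x, t = 0) ifelse(x >= t, 1, 0)  # step with threshold t
sign_fun <- function(x) ifelse(x >= 0, 1, -1)
sigmoid  <- function(x) 1 / (1 + exp(-x))
curve(sigmoid, from = -6, to = 6)                    # the characteristic S-shape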

Types

  • Perceptron
    • Single unit
    • Linearity assumption
    • Cannot learn functions that are not linearly separable, such as XOR (see the sketch after this list)
  • Multi-layer
    • Feed-forward (acyclic)
    • Recurrent (cyclic)
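
The XOR problem illustrates the perceptron's limitation. A sketch with nnet (size = 0 with skip = TRUE yields a network with no hidden layer; the hidden-layer fit may need a re-run depending on the random initial weights):

xor_df <- data.frame(x1 = c(0, 0, 1, 1),
                     x2 = c(0, 1, 0, 1),
                     y  = c(0, 1, 1, 0))
lin <- nnet(y ~ ., xor_df, size = 0, skip = TRUE, trace = FALSE)  # perceptron-like
mlp <- nnet(y ~ ., xor_df, size = 2, maxit = 1000, trace = FALSE) # one hidden layer
round(cbind(linear = as.vector(predict(lin)),
            hidden = as.vector(predict(mlp))), 2)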

Feed-forward

Feed-forward Artificial Neural Network

What to Know?

  • ANNs are universal function approximators
    • They can approximate any continuous function to arbitrary precision, given enough hidden units
  • Two major handicaps
    • You have to guess the architecture (see the sketch below)
      • How many layers?
      • How many nodes in each layer?
      • Initial weights
    • Training is computationally expensive
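
A common workaround is to try several architectures and keep the one with the best holdout error. A sketch with nnet on iris (the candidate sizes are arbitrary):

data(iris)
idx <- sample(1:nrow(iris), 100)
for (s in c(2, 4, 8)) {
  m   <- nnet(Species ~ ., iris[idx, ], size = s, maxit = 1000, trace = FALSE)
  acc <- mean(predict(m, iris[-idx, ], type = "class") == iris$Species[-idx])
  cat("size =", s, "accuracy =", round(acc, 3), "\n")
}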

Artificial Neural Networks - Implementations

  • nnet ships with the standard R distribution (a recommended package)
    • single hidden layer
  • RSNNS (Bergmeir and Benitez, 2012)
  • FCNN4R (Klima, 2016)
  • neuralnet (Fritsch et al., 2012)

Classification Example

data(iris)
rndSample <- sample(1:nrow(iris), 100)   # random 100-case training split
tr <- iris[rndSample, ]
ts <- iris[-rndSample, ]                 # remaining 50 cases for testing
n <- nnet(Species ~ ., tr, size = 6, trace = FALSE, maxit = 1000)  # 6 hidden units
ps <- predict(n, ts, type = "class")
(cm <- table(ps, ts$Species))            # confusion matrix
##             
## ps           setosa versicolor virginica
##   setosa         12          0         0
##   versicolor      1         20         0
##   virginica       0          1        16
  • parameter decay controls weight decay (a regularization penalty on the weights), not the learning rate
  • initial weights are randomized uniformly in \([-rang, rang]\) (parameter rang, 0.7 by default)
  • two consecutive runs can result in different outputs
    • unless seed is fixed
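
The overall error rate follows directly from the confusion matrix:

1 - sum(diag(cm)) / sum(cm)  # misclassified proportion on the test set
## [1] 0.04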

Regression Example

data(Boston, package = "MASS")
sp <- sample(1:nrow(Boston), 354)   # ~70% training split
tr <- Boston[sp, ]
ts <- Boston[-sp, ]
nr <- nnet(medv ~ ., tr,
           linout = TRUE,   # linear output unit, needed for regression
           trace = FALSE,
           size = 6,
           decay = 0.01,    # weight decay (regularization)
           maxit = 2000)
psnr <- predict(nr, ts)
mean(abs(psnr - ts$medv))   # mean absolute error
## [1] 3.198552
plot(ts$medv, psnr)
abline(0, 1)

Deep Learning

Deep Learning

  • Each consecutive layer defines a more complex set of features
  • Use an unsupervised learning method at each layer to learn the features
  • Apply ANN to the obtained structure
  • DLNNs are ANNs with many hidden layers
    • They are very popular now because of
      • Hardware improvements
      • Methodological improvements (Hinton, 2006)

Deep Learning

  • Many implementations exist
    • h2o (Aiello et al., 2016)
    • mxnet
    • darch (Drees, 2013)
    • deepnet (Rong, 2014)

Deep Learning

h2oInstance <- h2o.init(ip = "localhost") # start H2O instance locally
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     /tmp/RtmphH8qFr/filecd5c7a5a11e5/h2o_bgenc_started_from_r.out
##     /tmp/RtmphH8qFr/filecd5c2dc88072/h2o_bgenc_started_from_r.err
## 
## 
## Starting H2O JVM and connecting: . Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 106 milliseconds 
##     H2O cluster timezone:       Europe/Istanbul 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.30.0.1 
##     H2O cluster version age:    1 month and 28 days  
##     H2O cluster name:           H2O_started_from_R_bgenc_qbt439 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.88 GB 
##     H2O cluster total cores:    12 
##     H2O cluster allowed cores:  12 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4 
##     R Version:                  R version 4.0.0 (2020-04-24)

Deep Learning

rndSample <- sample(1:nrow(iris), 100)
trH <- as.h2o(iris[rndSample, ], "trH")   # upload the training split to H2O
tsH <- as.h2o(iris[-rndSample, ], "tsH")  # upload the test split to H2O
mdl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = trH)  # predictors 1:4, target 5
preds <- h2o.predict(mdl, tsH)[, "predict"]
(cm <- table(as.vector(preds), as.vector(tsH$Species)))
##             
##              setosa versicolor virginica
##   setosa         16          0         0
##   versicolor      0         15         0
##   virginica       0          5        14
  • Analyze the model's outputs in the console
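
For example, h2o.performance() computes detailed metrics on a given frame:

h2o.performance(mdl, newdata = tsH)  # confusion matrix, hit ratios, logloss, ...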

Deep Learning

data(Boston, package = "MASS")
trH <- as.h2o(Boston[sp, ], "trH")
tsH <- as.h2o(Boston[-sp, ], "tsH")
mdl <- h2o.deeplearning(x = 1:13, y = 14, training_frame = trH,
                        hidden = c(100, 100, 100, 100),  # four hidden layers, 100 units each
                        epochs = 500)
preds <- as.vector(h2o.predict(mdl, tsH))
mean(abs(preds - as.vector(tsH$medv)))   # mean absolute error
## [1] 2.339476

Deep Learning

plot(as.vector(tsH$medv), preds)
abline(0, 1)

plot(as.vector(tsH$medv), preds)                # deep learning predictions
points(as.vector(tsH$medv), psnr, col = "red")  # earlier nnet predictions, in red
abline(0, 1)

Deep Learning

  • Don’t forget to shut down H2O :)
h2o.shutdown(prompt = FALSE)