2024-04-30
set.seed(1024)
library(rpart)
library(rpart.plot)
library(mlbench)
library(DMwR2)
library(e1071)
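The column names printed below belong to the BreastCancer data from the mlbench package; a minimal sketch of the chunk that likely produced this output (the loading code itself is not shown above):
data(BreastCancer)
names(BreastCancer)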
## [1] "Id" "Cl.thickness" "Cell.size" "Cell.shape" ## [5] "Marg.adhesion" "Epith.c.size" "Bare.nuclei" "Bl.cromatin" ## [9] "Normal.nucleoli" "Mitoses" "Class"
The Gini index of a dataset D, where each example belongs to one of C classes:
\(\displaystyle Gini(D) = 1 - \sum_{i=1}^{C}{p_i^2}\)
If D is split by a logical test s, then
\(\displaystyle Gini_s(D) = \frac{|D_s|}{|D|}Gini(D_s) + \frac{|D_{\neg s}|}{|D|}Gini(D_{\neg s})\)
Then, the reduction in impurity is given by
\(\Delta Gini_s(D) = Gini(D) - Gini_s(D)\)
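As a minimal sketch of these formulas in base R (gini() and gini_split() are illustrative helpers written here, not part of rpart):
# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}
# weighted Gini after splitting by a logical test s
gini_split <- function(y, s) {
  (sum(s) * gini(y[s]) + sum(!s) * gini(y[!s])) / length(y)
}
# reduction in impurity for the test Petal.Length < 2.5 on iris
test <- iris$Petal.Length < 2.5
gini(iris$Species) - gini_split(iris$Species, test)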
For regression trees, the impurity of a dataset D is measured by the average squared error
\(\displaystyle Err(D) = \frac{1}{|D|} \sum_{ \langle x_i,y_i \rangle \in D}{(y_i - k_D)^2}\)
where \(k_D\) is a constant chosen to represent the target values of D.
It can be shown that the mean of the \(y_i\) in D minimizes this least-squares (LS) criterion.
If D is split by a logical test s, then
\(\displaystyle Err_s(D) = \frac{|D_s|}{|D|}Err(D_s) + \frac{|D_{\neg s}|}{|D|}Err(D_{\neg s})\)
Then, the reduction in impurity is given by
\(\Delta Err_s(D) = Err(D) - Err_s(D)\)
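The same idea, sketched in base R with illustrative helpers (err() uses the mean as \(k_D\), following the LS result above):
# LS error of a node: mean squared deviation from the node mean
err <- function(y) mean((y - mean(y))^2)
# weighted error after splitting by a logical test s
err_split <- function(y, s) {
  (sum(s) * err(y[s]) + sum(!s) * err(y[!s])) / length(y)
}
# reduction in error for a candidate split on the mtcars data
test <- mtcars$wt < 3
err(mtcars$mpg) - err_split(mtcars$mpg, test)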
In R, decision trees are implemented mainly in the packages rpart and party; here we use rpart. A tree is grown with rpart() and post-pruned with prune.rpart(). The package DMwR2 provides rpartXse(), which combines rpart() and prune.rpart() in a single step, applying the x-SE rule for pruning.
Models are specified with the usual R formula notation, e.g.
\(Y \sim X_1 + X_2 + X_3 + X_4 + \dots\)
or, to use all remaining columns of the data as predictors,
\(Y \sim .\)
Due to randomized parts of the algorithm (e.g., the internal cross-validation used to choose the pruning level), different runs may produce slightly different trees. Hence, always set a seed with set.seed() to make the results reproducible.
The rpart.plot package allows nice drawings of DTs through the function prp().
data(iris)
ct1 <- rpartXse(Species ~ ., iris, model = TRUE)
ct2 <- rpartXse(Species ~ ., iris, se = 0, model = TRUE)
se = 0 results in less aggressive pruning: the tree with the lowest cross-validated error is kept, instead of a smaller tree within one standard error of it.
par(mfrow = c(1, 2))
prp(ct1, type = 0, extra = 101)
prp(ct2, type = 0, extra = 101)
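In prp(), type = 0 gives the basic layout with split labels at the branches, and extra = 101 annotates each node with the per-class counts and the percentage of observations it covers.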
samp <- sample(1:nrow(iris), 120)
tr_set <- iris[samp, ]
tst_set <- iris[-samp, ]
model <- rpartXse(Species ~ ., tr_set, se = 0.5)
predicted <- predict(model, tst_set, type = "class")
head(predicted)
##     12     15     35     37     40     43
## setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
table(tst_set$Species, predicted)
##             predicted
##              setosa versicolor virginica
##   setosa          8          0         0
##   versicolor      0         10         1
##   virginica       0          0        11
errorRate <- sum(predicted != tst_set$Species) / nrow(tst_set)
errorRate
## [1] 0.03333333
Kernel trick
For two-dimensional points \(x = \langle x_1, x_2 \rangle\) and \(z = \langle z_1, z_2 \rangle\), consider the squared dot product as a kernel:
\(K(x,z)=(\langle x_1, x_2 \rangle \cdot \langle z_1, z_2 \rangle)^2\)
\(=(x_1z_1+x_2z_2)^2 = x_1^2z_1^2 + x_2^2z_2^2 + 2x_1x_2z_1z_2\)
\(=\langle x_1^2, x_2^2, \sqrt{2}x_1x_2 \rangle \cdot \langle z_1^2, z_2^2, \sqrt{2}z_1z_2 \rangle\)
In other words, \(K(x,z)=\phi(x)\cdot \phi(z)\) for the feature map \(\phi(u)=\langle u_1^2, u_2^2, \sqrt{2}u_1u_2 \rangle\).
This means that if we find such kernel functions, we can compute dot products in the higher-dimensional space directly from the original data, without ever constructing the mapping \(\phi\) explicitly, which is much faster.
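A quick numeric check of this identity in R (x, z, and phi below are illustrative values and helpers, not part of any package):
x <- c(1, 2)
z <- c(3, 4)
phi <- function(v) c(v[1]^2, v[2]^2, sqrt(2) * v[1] * v[2])
(sum(x * z))^2        # K(x, z) computed in the original 2-D space
sum(phi(x) * phi(z))  # the same value via the explicit mapping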
Indeed, there are many such families of kernel functions, for example:
Gaussian: \(K(x_i, x_j) = e^{-\frac{||x_i-x_j||^2}{2\sigma^2}}\)
Polynomial of degree \(d\): \(K(x_i, x_j) = (x_i\cdot x_j)^d\)
Radial basis function (RBF): \(K(x_i,x_j) = e^{-\gamma||x_i - x_j||^2}\)
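In e1071, these map onto the kernel argument of svm(); for example, an RBF kernel with an explicit gamma (the value 0.5 here is purely illustrative; the defaults are usually a reasonable start):
s_rbf <- svm(Species ~ ., iris, kernel = "radial", gamma = 0.5)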
In R, SVMs are available through the packages e1071 and kernlab. kernlab may be more flexible, while e1071 is simpler.
data(iris)
rndSample <- sample(1:nrow(iris), 100)
tr <- iris[rndSample, ]
ts <- iris[-rndSample, ]
s <- svm(Species ~ ., tr)
ps <- predict(s, ts)
(cm <- table(ps, ts$Species))
##
## ps           setosa versicolor virginica
##   setosa         24          0         0
##   versicolor      0         14         0
##   virginica       0          1        11
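By default, svm() uses the radial basis kernel. Below we instead fit a degree-3 polynomial kernel with a higher cost, to see how the results change.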
s2 <- svm(Species ~ ., tr, cost = 10, kernel = "polynomial", degree = 3)
ps2 <- predict(s2, ts)
(cm2 <- table(ps2, ts$Species))
##
## ps2          setosa versicolor virginica
##   setosa         24          0         0
##   versicolor      0         15         3
##   virginica       0          0         8
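To compare the two fits with a single number, the error rates can be read off the confusion matrices (a small sketch using the cm and cm2 tables computed above):
1 - sum(diag(cm)) / sum(cm)    # error rate of the RBF model
1 - sum(diag(cm2)) / sum(cm2)  # error rate of the polynomial model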