Multivariate Statistics and Machine Learning in R For Beginners

Welcome to the companion website for the book Multivariate Statistics and Machine Learning in R For Beginners, where you will find the associated videos and R code. At the bottom of this page, you will also find corrections for errors identified since the book was published.

Chapter 1 A brief introduction to machine learning and multivariate statistics

Chapter 2 Matrix Algebra

R scripts: Chapter 2

Chapter 3 Managing data in R

R scripts: Chapter 3

Chapter 4 Graphical illustration of multivariate data

R scripts: Chapter 4

Chapter 5 Multivariate Relationships

R scripts: Chapter 5

Chapter 6 PCA and PCoA

R script: Chapter 6

Chapter 7 Linear discriminant analysis

R scripts: Chapter 7

Chapter 8 Distances in space

R scripts: Chapter 8

Chapter 9 Multivariate statistical tests

R scripts: Chapter 9

Chapter 10 Classification and performance metrics

R scripts: Chapter 10

Chapter 11 Supervised Machine Learning

R scripts: Chapter 11

Chapter 12 Clustering

R scripts: Chapter 12

Chapter 13 PCR, PLS and Lasso regression

R scripts: Chapter 13

Chapter 14 Case studies

R scripts: Chapter 14

Paper 1

Dataset: Cytokines

Paper 2

Chapter 15 Answers to the exercises

R scripts: Exercises

 

Errors identified in the book since it was published

LOOCV in the caret package

Page 195

fit=train(Species ~ ., data=iris, method="lda", 
       trControl = trainControl(method = "LOOCV"))
pred= predict(fit,dimen=1)
Tab=table(pred, iris$Species)

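Correct (predict() uses the final model refitted on all observations, so the table above is not a leave-one-out result; the held-out predictions have to be saved with savePredictions and read from fit$pred)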

fit = train(Species ~ ., data = iris, method = "lda",
            trControl = trainControl(method = "LOOCV", savePredictions = "final"))
Tab = table(fit$pred$pred, fit$pred$obs)
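
As a cross-check that does not depend on caret, the same leave-one-out idea can be sketched with MASS::lda directly, which is the function behind caret's "lda" method. This is only a minimal sketch, assuming the MASS package is installed; the names fit_all, resub, loo, Tab_resub, Tab_loo, acc_resub and acc_loo are illustrative and do not come from the book.

```r
library(MASS)  # provides lda(); assumed to be installed

# Resubstitution: fit on all 150 flowers, then predict the same flowers.
fit_all <- lda(Species ~ ., data = iris)
resub   <- predict(fit_all, iris)$class

# Leave-one-out: with CV = TRUE, lda() returns for each flower the class
# predicted by a model fitted without that flower.
loo <- lda(Species ~ ., data = iris, CV = TRUE)$class

# Confusion tables: predicted class in rows, observed class in columns.
Tab_resub <- table(resub, iris$Species)
Tab_loo   <- table(loo, iris$Species)

# Accuracy = correctly classified / total
acc_resub <- sum(diag(Tab_resub)) / sum(Tab_resub)
acc_loo   <- sum(diag(Tab_loo)) / sum(Tab_loo)
```

Comparing Tab_loo with the table built from fit$pred is one way to confirm that the caret call is reporting held-out predictions.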

Page 210

fit = train(Species ~ ., data=df_iris, method="glm",
family="binomial",
      trControl = trainControl(method = "LOOCV"))
pred= predict(fit)
Tab=table(pred, df_iris$Species)
Tab
pred         versicolor virginica
  versicolor         49         1
  virginica           1        49
sum(diag(Tab))/sum(Tab)
0.98

Correct

fit = train(Species ~ ., data = df_iris, method = "glm",
            family = "binomial",
            trControl = trainControl(method = "LOOCV", savePredictions = "final"))
Tab = table(fit$pred$pred, fit$pred$obs)
Tab
             versicolor virginica
  versicolor         48         1
  virginica           2        49
sum(diag(Tab))/sum(Tab)
0.97
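
The leave-one-out procedure behind this corrected table can also be written out by hand in base R, which makes explicit that each flower is predicted by a model that never saw it. This is a minimal sketch under stated assumptions: df_iris is taken here to be iris with the setosa rows removed (inferred from the two classes in the tables, not confirmed by this page), the 0.5 probability cut-off is assumed, and the names loo_pred, fit_i, p_i, Tab_loo and acc_loo are illustrative.

```r
# Assumption: df_iris is iris restricted to the two species shown in the tables.
df_iris <- droplevels(subset(iris, Species != "setosa"))

n <- nrow(df_iris)
loo_pred <- character(n)

for (i in seq_len(n)) {
  # Fit the logistic regression on every row except row i.
  # glm() may warn about fitted probabilities close to 0 or 1; silenced here.
  fit_i <- suppressWarnings(
    glm(Species ~ ., data = df_iris[-i, ], family = binomial)
  )
  # Predicted probability of the second factor level ("virginica")
  # for the single held-out row.
  p_i <- predict(fit_i, newdata = df_iris[i, ], type = "response")
  loo_pred[i] <- if (p_i > 0.5) "virginica" else "versicolor"
}

# Confusion table: predicted class in rows, observed class in columns.
Tab_loo <- table(factor(loo_pred, levels = levels(df_iris$Species)),
                 df_iris$Species)
acc_loo <- sum(diag(Tab_loo)) / sum(Tab_loo)
```

Replacing df_iris[-i, ] with df_iris inside the loop would refit on all rows and reproduce the resubstitution approach of the printed version, which is why that version tends to look slightly more optimistic.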