# Basic statistics

- Population vs Sample
- Mean, median and mode
- The geometric mean
- Variables and scales
- Range, interquartile range and box plots
- Standard deviations and error bars
- Why we divide by n-1 and not n when we calculate the SD
- The normal distribution
- The central limit theorem
- The standard error of the mean (SEM)
- Confidence intervals
- The t-distribution – why we need it
- The one-sample t-test and p-values
- t-test vs confidence intervals
- The degrees of freedom
- The basic steps of hypothesis testing
- The unpaired t-test (independent samples t-test)
- One-way ANOVA: the basics
- One-way ANOVA: the calculations
- The paired t-test
- Paired vs unpaired t-test
- The repeated-measures ANOVA
- One-proportion Z-test and the corresponding confidence interval
- The Chi-square goodness of fit test and how it differs from the one-proportion Z-test
- The two-proportion Z-test and the Chi-square test of homogeneity
- The Chi-square test of independence vs homogeneity and goodness of fit
- The McNemar test
- A deeper understanding of p-values
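Several items in the list above (the sample SD with its n − 1 denominator, and the SEM) can be made concrete in a few lines of Python. This is an illustrative sketch only; the data values are made up.

```python
import math
import statistics

data = [4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0]  # made-up sample
n = len(data)
mean = sum(data) / n

# The sample variance divides by n - 1 (Bessel's correction), which makes
# it an unbiased estimator of the population variance.
var = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(var)

# statistics.stdev uses the same n - 1 denominator.
assert abs(sd - statistics.stdev(data)) < 1e-12

# The standard error of the mean: SEM = SD / sqrt(n).
sem = sd / math.sqrt(n)
print(mean, sd, sem)
```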

### Non-parametric tests

- The Mann Whitney U test (Wilcoxon Mann Whitney test) part 1/2
- The Mann Whitney U test (Wilcoxon Mann Whitney test) part 2/2 | exact p-value
- The Wilcoxon signed-rank test & the sign test

### Type 1 and 2 errors and power

- The basics of type 1 and 2 errors
- The probability of making a type 1 error
- The probability of making a type 2 error (part 1/2)
- The probability of making a type 2 error (part 2/2)
- Statistical power and sample size calculations
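As a sketch of the last item above, the power of a two-sided one-sample z-test can be computed with the standard library alone. The function name and the numbers are my own, for illustration; they are not from the videos.

```python
from statistics import NormalDist

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test, where
    effect_size is Cohen's d = (mu1 - mu0) / sigma."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Under the alternative, the test statistic is N(d * sqrt(n), 1),
    # so the power is the probability mass beyond the critical values.
    return nd.cdf(-z_crit + shift) + nd.cdf(-z_crit - shift)

# Power grows with sample size for a fixed effect size.
print(z_test_power(0.5, 10), z_test_power(0.5, 50))
```

Note that when the effect size is zero, the "power" reduces to alpha, the type 1 error rate.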

### Correlation

- Correlation – the basics | Pearson correlation
- Correlation | hypothesis testing | assumptions
- Spearman’s rank correlation | Pearson vs Spearman
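The Pearson vs Spearman comparison can be sketched in plain Python: Spearman's rank correlation is simply Pearson's correlation computed on ranks. The helper functions and data below are illustrative, not from any library or from the videos.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(v):
    """Ranks (1-based), with tied values given their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone but nonlinear
print(pearson(x, y), spearman(x, y))  # Spearman is exactly 1 here
```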

### Linear regression

- Linear regression – the basics
- Linear regression – least squares
- Linear regression – hypothesis testing
- Linear regression – R²
- Multiple linear regression
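The least-squares and R² items above boil down to a few sums. Here is a minimal sketch with made-up data; the function is my own, not from the videos.

```python
def least_squares(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, R^2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx       # slope
    a = my - b * mx     # intercept
    # R^2 = 1 - SS_residual / SS_total
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]  # made-up, roughly linear
a, b, r2 = least_squares(x, y)
print(a, b, r2)
```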

## How to select an appropriate statistical test

# Two-way ANOVA

# Post-hoc tests

- The familywise error rate (FWER)
- Fisher’s LSD method
- Bonferroni
- Holm
- Tukey’s test and Dunnett’s test
- How to select an appropriate post-hoc test

## Tests based on the false discovery rate (FDR)

# Generalized linear models (GLMs)

**Logistic regression**

- Logistic regression 1: the basics
- Logistic regression 2: classification
- Logistic regression 3: Likelihood and deviance
- Logistic regression 4: Likelihood ratio test and AIC

**Poisson regression**

- The Poisson distribution vs the normal distribution
- Poisson regression 1 – the basics
- Poisson regression 2 – rates and the offset
- Poisson regression 3 – categorical variables
- Poisson regression 4 – how to calculate the likelihood and the deviance
- Poisson regression 5 – compare models with the likelihood ratio test and AIC
- Quasi-Poisson and negative binomial regression models
- Zero-inflated Poisson (ZIP) regression

# Linear mixed-effect models

# Introduction to multivariate statistics

To understand the methods used in multivariate statistics, you need some basic linear algebra, such as matrix operations, eigenvectors and eigenvalues. You also need to understand covariance and different distance measures in space.

**Linear algebra**

- Matrices and matrix operations – part 1
- Matrices and matrix operations – part 2
- Eigenvectors and eigenvalues – the basics
- Eigenvectors and eigenvalues – the math
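As a small illustration of how covariance and eigenvalues connect, the 2x2 sample covariance matrix of a toy data set can be eigendecomposed by hand via its characteristic equation. The data values are made up for illustration.

```python
import math

# Toy 2-D data (made-up values).
x = [1.2, 2.4, 1.9, 3.3, 2.8, 0.7]
y = [2.0, 3.1, 2.5, 4.2, 3.6, 1.4]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance matrix [[sxx, sxy], [sxy, syy]] (n - 1 denominator).
sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
syy = sum((b - my) ** 2 for b in y) / (n - 1)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix solve the characteristic
# equation lambda^2 - trace*lambda + det = 0.
trace = sxx + syy
det = sxx * syy - sxy ** 2
disc = math.sqrt(trace ** 2 - 4 * det)
lam1, lam2 = (trace + disc) / 2, (trace - disc) / 2

# The eigenvalues sum to the trace, i.e. to the total variance.
assert abs((lam1 + lam2) - trace) < 1e-12
print(lam1, lam2)
```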

**Covariance and distances**

Once you know the basics from the videos above, you can start learning about multivariate statistical methods. I recommend starting with PCA.

**PCA**

- PCA 1: the basics
- PCA 2: the math
- PCA 3: standardization and extraction
- PCA 4: interpret the weights and Varimax rotation
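A minimal sketch of PCA on two variables, tying together the covariance and eigenvector material above: the first principal component is the top eigenvector of the covariance matrix, and the variance of the scores equals its eigenvalue. The data values are made up; this is an illustration, not the method as presented in the videos.

```python
import math

# Toy 2-D data (made-up values), mean-centered before computing covariance.
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
xc = [a - mx for a in x]
yc = [b - my for b in y]

# Sample covariance matrix entries (n - 1 denominator).
sxx = sum(a * a for a in xc) / (n - 1)
syy = sum(b * b for b in yc) / (n - 1)
sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)

# Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
trace, det = sxx + syy, sxx * syy - sxy ** 2
lam1 = (trace + math.sqrt(trace ** 2 - 4 * det)) / 2

# Its eigenvector: (sxx - lam1) * v1 + sxy * v2 = 0 gives the direction
# (sxy, lam1 - sxx); normalize to unit length (the PC1 loadings).
norm = math.hypot(sxy, lam1 - sxx)
v = (sxy / norm, (lam1 - sxx) / norm)

# PC1 scores: project each centered point onto the loading vector.
scores = [a * v[0] + b * v[1] for a, b in zip(xc, yc)]

# lam1 / trace is the fraction of total variance explained by PC1.
print(v, lam1 / trace)
```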

**LDA**

**Multivariate statistical methods**

**Metrics used for binary classification and validation**

- Sensitivity and specificity
- The positive and negative predictive values
- The ROC curve
- Validation (cross-validation, hold-out, LOOCV)
- Likelihood ratio
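Sensitivity, specificity, the predictive values and the positive likelihood ratio all follow directly from a 2x2 confusion matrix. A short sketch with hypothetical counts:

```python
def binary_metrics(tp, fp, fn, tn):
    """Basic metrics from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
    return sensitivity, specificity, ppv, npv, lr_pos

# Hypothetical counts from a diagnostic test.
print(binary_metrics(tp=80, fp=10, fn=20, tn=90))
```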

**Classification methods**

*Logistic regression*

*Linear discriminant analysis*

*k-nearest neighbors and Mahalanobis distance*

*Decision trees and random forest*

*Support vector machines*

**Clustering methods**

**Partial least squares regression**

**Canonical correlation analysis**

# Gene set analysis

In gene set analysis, Fisher’s exact test is usually used to identify overrepresented gene sets. To understand Fisher’s exact test, we first need to know a few things about permutations, combinations and the hypergeometric distribution.

- Permutations, combinations and the hypergeometric distribution
- Fisher’s test and how to calculate the exact p-value
- Gene set analysis
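A sketch of the connection stated above: the one-sided Fisher’s exact p-value for over-representation is a hypergeometric tail sum, computable with binomial coefficients alone. The gene counts below are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): draw n genes from N in total, of which K are in the set."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_overrep_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided Fisher's exact p-value P(X >= k) under the
    hypergeometric null (over-representation of the gene set)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

# Hypothetical numbers: 20000 genes in total, 200 in the pathway,
# 100 differentially expressed, of which 8 are in the pathway.
print(fisher_overrep_p(8, 20000, 200, 100))
```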

# Model selection

- Model selection with AIC and AICc
- Forward and backward selection, best subset selection
- Lasso regression

# Modeling and simulations

**Cellular automata**