Advanced Statistics for the Life Sciences

An introduction to advanced statistical methods for analyzing complex data in the life sciences, with a focus on genomics.

About The Course

This course covers advanced techniques used by data analysis experts in the life sciences. These methods are needed to analyze some of the more complex datasets, such as those found in genomics. Topics include statistical modeling, multiple testing correction, clustering, prediction methods, factor analysis, and empirical Bayes methods. We also show how to use R Markdown to conduct reproducible research.


  • Statistical modeling
  • Multiple testing
  • Distance and clustering
  • Prediction
  • Factor analysis (batch effects)
  • Empirical Bayes (hierarchical modeling)

This class was supported in part by NIH grant R25GM114818.

This course is part of a series of eight courses:

PH525.1x: Statistics and R for the Life Sciences

PH525.2x: Introduction to Linear Models and Matrix Algebra

PH525.3x: Advanced Statistics for the Life Sciences

PH525.4x: Introduction to Bioconductor

PH525.5x: Case study: RNA-seq data analysis

PH525.6x: Case study: Variant Discovery and Genotyping

PH525.7x: Case study: ChIP-seq data analysis

PH525.8x: Case study: DNA methylation data analysis

Recommended Background

PH525.1x and PH525.2x, or equivalent background in basic programming, introductory statistics, and introductory linear algebra.