6th Summer Institute in Statistics for Big Data (SISBID)

This module is currently full. Registrations are closed at this time.

Module 3: Unsupervised Methods for Statistical Machine Learning

Mon, July 20 to Wed, July 22

Module dates/times: Live sessions will start no earlier than 8 a.m. Pacific and end no later than 2:30 p.m. Pacific, except on Wednesdays. Modules that end on a Wednesday will finish their live sessions by 11 a.m. Pacific, and modules that start on a Wednesday will begin no earlier than 11:30 a.m. Pacific.

In this module, we will present a number of unsupervised learning techniques for finding patterns and associations in Biomedical Big Data. These include dimension reduction techniques such as principal components analysis and non-negative matrix factorization, cluster analysis, and network analysis with graphical models.

We will also discuss large-scale inference issues, such as multiple testing, that arise when mining for associations in Biomedical Big Data. As in Module 4 on supervised learning, the main emphasis will be on the analysis of real high-dimensional data sets from various scientific fields, including genomics and biomedical imaging. The techniques discussed will be demonstrated in R.

This course assumes some previous exposure to linear regression and statistical hypothesis testing, as well as some familiarity with R or another programming language (see the previous year’s materials for reference).

Recommended Reading: James et al. (2013). An Introduction to Statistical Learning: with Applications in R. Springer Texts in Statistics. Available for free download at www.statlearning.com.