2nd Summer Institute in Statistics for Big Data

Module 1: Big Data Wrangling with R

Mon, July 11 to Wed, July 13
Instructor(s):

In this module, we will introduce some of the most popular public data repositories (e.g., GEO, SRA) and demonstrate how to access them using tools in R and Bioconductor (e.g., the GEOquery package). We will focus on data retrieval, manipulation, and formatting. Participants will learn how to get datasets and biological annotations ready for visualization and statistical analysis. Our approach will center on the concept of “tidy data”: data that is organized into readable and distributable files. We will work through hands-on examples from published studies, illustrating the principles with data from microarray and next-generation sequencing technologies. We will assume some familiarity with R.
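
As a rough illustration of the kind of reshaping the module refers to, the sketch below turns a small gene-by-sample expression matrix into a long table with one row per gene and sample, then attaches sample annotations. It uses base R only. The matrix `expr`, the annotation table `pheno`, and all values in them are made up for illustration; in practice such a matrix might come from a dataset downloaded with GEOquery, which is not shown here. Reading "tidy" as one row per observation and one column per variable is an assumption about the intended meaning.

```r
# Hypothetical toy expression matrix: rows are genes, columns are samples.
expr <- matrix(
  c(5.1, 6.3, 5.8,
    2.2, 2.9, 2.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# Hypothetical sample annotations (one row per sample).
pheno <- data.frame(
  sample = c("s1", "s2", "s3"),
  group  = c("control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Reshape to long format: one row per gene-sample pair.
# as.vector() reads the matrix column by column, so gene names cycle
# fastest and each sample name is repeated once per gene.
tidy <- data.frame(
  gene       = rep(rownames(expr), times = ncol(expr)),
  sample     = rep(colnames(expr), each = nrow(expr)),
  expression = as.vector(expr),
  stringsAsFactors = FALSE
)

# Attach the sample annotations by matching on the sample identifier.
tidy <- merge(tidy, pheno, by = "sample")

print(tidy)
```

A table in this shape can be written to a plain-text file with write.csv() and passed directly to most plotting and modeling functions in R.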

Recommended Reading: Cookbook for R, by Winston Chang, available at www.cookbook-r.com.