SISBID 2016 Module 1: Big Data Wrangling with R

Module info

  • Location: In Person
  • Room: FSH 102
  • Meeting Times: Mon, Jul 11, 8:30am - Wed, Jul 13, 12pm PST
  • Instructors: Jeffrey Leek, Andrew Jaffe

In this module, we will introduce some of the most popular public data repositories (e.g., GEO, SRA) and demonstrate how to access them using tools in R and Bioconductor (e.g., the GEOquery package). We will focus on data retrieval, manipulation, and formatting. Participants will learn how to get datasets and biological annotations ready for visualization and statistical analysis. Our approach centers on the concept of “tidy data”: data organized into readable and distributable files. We will use hands-on examples from published studies and illustrate the principles with data from microarray and next-generation sequencing technologies. We will assume some familiarity with R.

Recommended Reading: Cookbook for R, by Winston Chang, available at www.cookbook-r.com.