3rd Summer Institute in Statistics for Big Data

Module 1: Data Wrangling with R

Mon, July 10 to Wed, July 12
Instructor(s):

Participants will learn how to obtain data and process it for visualization and statistical analysis. Our approach focuses on the concept of creating “tidy data”, i.e. data that is organized into readable and distributable files. In this module, we will:

  • Use hands-on examples from published studies and cover concepts on data retrieval, manipulation, and formatting.
  • Touch on reproducible research using R Markdown and collaborative code sharing using GitHub.
  • Briefly introduce some of the most popular public data repositories in genomics (e.g. GEO, SRA), and demonstrate how to access these repositories using tools in R and Bioconductor (e.g. using the recount and GEOquery packages).

Principles will be illustrated using data from microarray and next-generation sequencing technologies.

This module assumes some familiarity with R.

Recommended Reading: Cookbook for R, by Winston Chang, available at www.cookbook-r.com.

2017 Module Materials: https://github.com/sisbid/module1

Video Links