7th Summer Institute in Statistics for Big Data (SISBID)


This module is currently full. Registrations are closed at this time.

Module 1: Data Wrangling with R

Mon, July 12 to Wed, July 14
Instructor(s):
Registration for this module closes July 5. 

Live session timeframe (the exact schedule of live sessions will be posted by the module instructors prior to the start of the module):

  • Monday: 8 a.m. – 2:30 p.m. Pacific (11 a.m. – 5:30 p.m. Eastern)
  • Tuesday: 8 a.m. – 2:30 p.m. Pacific (11 a.m. – 5:30 p.m. Eastern)
  • Wednesday: 8 a.m. – 11 a.m. Pacific (11 a.m. – 2 p.m. Eastern)

Participants will learn how to obtain data and process it for visualization and statistical analysis. Our approach focuses on creating “tidy data”, i.e. data that is organized into readable and distributable files. In this module, we will:

  • Use hands-on examples from published studies and cover concepts on data retrieval, manipulation, and formatting.
  • Touch on reproducible research using R Markdown and collaborative code sharing using GitHub.
  • Briefly introduce some of the most popular public data repositories in genomics (e.g. GEO, SRA), and demonstrate how to access these repositories using tools in R and Bioconductor (e.g. using the recount and GEOquery packages).

Principles will be illustrated using data from microarray and next-generation sequencing technologies.

This module assumes some familiarity with R (see the previous year’s course materials for reference).

Recommended Reading: Cookbook for R, by Winston Chang, available at www.cookbook-r.com.