The 9th Summer Institute in Statistics for Big Data

Module 1: Data Wrangling with R

Mon, July 24 to Wed, July 26
Instructor(s):

Participants will learn how to prepare and process data, a key step before visualization and statistical analysis. Our approach focuses on creating “tidy data”, i.e., data organized so that each variable is a column, each observation is a row, and each value is a cell, which makes the resulting files easier to read and share. In this module, we will:

  • Work through hands-on examples covering data retrieval, cleaning, manipulation, and formatting.
  • Touch on reproducible research with R Markdown and collaborative code sharing with GitHub.
  • Learn the basics of using an RStudio environment on the cloud computing platform AnVIL (https://anvilproject.org/).

This module assumes some familiarity with R (see the previous year’s course materials for reference).

Recommended Reading: Tidyverse Skills for Data Science in R (https://leanpub.com/tidyverseskillsdatascience)