Participants will learn how to prepare and process data, a key step prior to visualization and statistical analysis. Our approach focuses on the concept of creating “tidy data”, i.e., data that is organized into readable and distributable files. In this module, we will:

  • Work through hands-on examples covering data retrieval, cleaning, manipulation, and formatting.
  • Touch on reproducible research using R Markdown and collaborative code sharing using GitHub.
  • Learn the basics of using an RStudio environment on the cloud computing platform AnVIL (https://anvilproject.org/).

This module assumes some familiarity with R (see the previous year’s course materials for reference).

Recommended Reading: Tidyverse Skills for Data Science in R (https://leanpub.com/tidyverseskillsdatascience)

Ava Hoffman

Senior Staff Scientist
Fred Hutchinson Cancer Center

Carrie Wright

Senior Staff Scientist
Fred Hutchinson Cancer Center
Affiliated Faculty Member
Johns Hopkins Bloomberg School of Public Health (JHSPH)