23rd Summer Institute in Statistical Genetics (SISG)

Module 12: Computational Pipeline for WGS Data

Wed, July 18 to Fri, July 20

Module dates/times: Wednesday, July 18, 1:30-5 p.m.; Thursday, July 19, 8:30 a.m.-5 p.m.; and Friday, July 20, 8:30 a.m.-5 p.m.

This module provides an introduction to analysis of whole-genome sequence data, with an application to the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Topics include population structure and relatedness, phenotype harmonization, aggregating and filtering variants using annotation, and association testing using single- and multi-marker tests. Concepts will be illustrated with hands-on exercises in R. Computational pipelines to link multi-step analyses will be presented, along with considerations for deploying these pipelines on a local compute cluster or in the cloud.

Stephanie Gogarten is a Research Scientist in the Genetics Analysis Center at the University of Washington. She develops computational pipelines for GWAS and WGS data. She was lead author on “GWASTools: an R/Bioconductor package for quality control and analysis of Genome-Wide Association Studies.” Bioinformatics 28:3329-3331, 2012. She recently published “Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology.” Nature Genetics 49:1560-1563, 2017.

Ken Rice is Professor of Biostatistics at the University of Washington. His research focuses primarily on developing and applying statistical methods for complex disease epidemiology, notably cardiovascular disease. He leads the Analysis Committee for the CHARGE consortium, a large group of investigators studying genetic determinants of heart and aging outcomes. He recently published “Large-scale genome-wide analysis identifies genetic variants associated with cardiac structure and function.” J. Clinical Investigation 127:1798-1812.