The 27th Summer Institute in Statistical Genetics

Module 16: Pathway & Network Analysis for Omics Data

Mon, July 25 to Wed, July 27

Networks represent the interactions among components of biological systems. In the context of high-dimensional omics data, relevant networks include gene regulatory networks, protein-protein interaction networks, and metabolic networks. These networks provide a window into biological systems as well as complex diseases, and can be used to understand how biological functions are implemented and how homeostasis is maintained. Pathway-based analyses complement this view: they leverage biological knowledge available from the literature, gene ontologies, or previous experiments to identify the pathways associated with a disease or another outcome of interest.

In this module, various statistical learning methods for reconstruction and analysis of networks from omics data are discussed, as well as methods of pathway enrichment analysis. Particular attention is paid to omics datasets with a large number of variables, e.g. genes, and a small number of samples, e.g. patients. The techniques discussed will be demonstrated in R. This course assumes familiarity with R or other command-line programming languages.

Suggested pairing: Module 6: Gene Expression Profiling

Learning Objectives: After attending this module, participants will be able to:

  1. Evaluate the relative strengths and weaknesses of publicly available knowledge bases for gene set analysis.
  2. Choose appropriate null hypotheses in gene-set analysis methods for specific biological questions.
  3. Using publicly available tools, test for over-representation of gene sets/pathways from individual gene association results.
  4. Estimate (partially) directed and undirected networks from high-dimensional omics data, using publicly available software appropriate for the data at hand.
  5. Perform network-based pathway enrichment analysis using publicly available software tools.
  6. Perform version control for metadata (e.g. pathway and network data) and analysis (code, hyper-parameters) to ensure reproducibility of results.