25th Summer Institute in Statistical Genetics (SISG)


Module 18: Multivariate Analysis for Genetic Data

Wed, July 29 to Fri, July 31
Instructor(s): Jan Graffelman

Module dates/times: Wednesday, July 29; Thursday, July 30; and Friday, July 31. Live sessions will start no earlier than 8 a.m. Pacific and end no later than 2:30 p.m. Pacific, except on Wednesdays. For modules that end on Wednesday, live sessions will end by 11 a.m. Pacific. For modules that start on Wednesday, live sessions will begin no earlier than 11:30 a.m. Pacific.

This module provides an introduction to multivariate analysis, with a strong emphasis on data visualization using the multivariate graphics known as biplots. The course covers principal component analysis (PCA), multidimensional scaling (MDS), correspondence analysis (CA), canonical analysis, cluster analysis, discriminant analysis (DA), and some multivariate inference, illustrating these methods with genetic data. Some genetic datasets are compositional in nature, so basic principles of compositional data analysis, such as log-ratio transformations, are also covered. The module also addresses the use of multivariate methods for uncovering population substructure and cryptic relatedness.

Suggested pairing: Modules 7 and 13.

Course materials can be accessed through the Summer Institutes archives.

Jan Graffelman is an associate professor of Statistics at the Technical University of Catalonia in Barcelona, and a visiting associate professor at UW Biostatistics. He conducts methodological and applied research in multivariate analysis, statistical genetics and compositional data analysis. He has contributed to biplot theory, to methodology for compositional data, and to statistical methods for testing genetic variants for Hardy-Weinberg equilibrium, in particular X-chromosomal variants. He is the author of the R package HardyWeinberg, and recently published “A log-ratio biplot approach for exploring genetic relatedness based on identity by state,” Frontiers in Genetics, volume 10, article 341, 2019.

Learning Objectives: After attending this module, participants will be able to:

  1. Describe the purpose of basic multivariate statistical methods.
  2. Select an appropriate multivariate method for a given data set.
  3. Apply appropriate transformations to a given data set.
  4. Perform multivariate statistical analysis on a computer in the R environment.
  5. Visualize multivariate data by means of biplot construction.
  6. Interpret biplots correctly and assess their goodness of fit.
  7. Carry out basic multivariate hypothesis tests.
  8. State the special nature of compositional data, and account for it in the analysis.