Introduction to Basic XGR Use

  • eXploring Genomic Relations (XGR)
  • Used for:
    • Enrichment Analysis
    • Similarity Analysis
    • Identifying Gene Subnetworks

Set Up

Load Data

Load Packages used

  • First, install the packages (once per computer); XGR is installed from GitHub through the BiocManager package
  • Then load the packages used in this lesson:
    • XGR - Pathway Analysis
    • tidyverse - Data Cleaning and Plotting Tools
    • kableExtra - Nicely formatted tables for R Markdown
    • RColorBrewer - Color Palettes for Plots
# Run once per computer: uncomment the lines below to install
# if(!("BiocManager" %in% rownames(installed.packages()))) install.packages("BiocManager")
# BiocManager::install("remotes", dependencies=TRUE)
# BiocManager::install("hfang-bristol/XGR", dependencies=TRUE)
#
# install.packages('tidyverse')
# install.packages('kableExtra')
# install.packages('RColorBrewer')

library(XGR)
library(tidyverse)
library(kableExtra)
library(RColorBrewer)
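Before running the library() calls, it can help to confirm that every package is actually installed. The following is a minimal sketch in base R; the variable names `needed`, `is_installed` and `missing_pkgs` are made up for illustration.

```r
# Packages this lesson loads (taken from the library() calls above)
needed <- c("XGR", "tidyverse", "kableExtra", "RColorBrewer")

# requireNamespace() returns FALSE instead of raising an error when a package is absent
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still has to be installed
missing_pkgs <- needed[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
}
```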
  • Data is loaded from the web address below using XGR's built-in xRDataLoader function
  • Next, the data is subset for each treatment to genes with a negative log fold change and an FDR below 0.01
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"


res <- xRDataLoader(RData.customised='JKscience_TS1A', RData.location=RData.location)
## Start at 2021-07-20 10:08:08
## 
## 'JKscience_TS1A' (from http://galahad.well.ox.ac.uk/bigdata/JKscience_TS1A.RData) has been loaded into the working environment (at 2021-07-20 10:08:11)
## 
## End at 2021-07-20 10:08:11
## Runtime in total is: 3 secs
background <- res$Symbol
# Create a data frame for genes significantly induced by IFN24
flag <- res$logFC_INF24_Naive<0 & res$fdr_INF24_Naive<0.01
df_IFN24 <- res[flag, c('Symbol','logFC_INF24_Naive','fdr_INF24_Naive')]
# Create a data frame for genes significantly induced by LPS24
flag <- res$logFC_LPS24_Naive<0 & res$fdr_LPS24_Naive<0.01
df_LPS24 <- res[flag, c('Symbol','logFC_LPS24_Naive','fdr_LPS24_Naive')]
# Create a data frame for genes significantly induced by LPS2
flag <- res$logFC_LPS2_Naive<0 & res$fdr_LPS2_Naive<0.01
df_LPS2 <- res[flag, c('Symbol','logFC_LPS2_Naive','fdr_LPS2_Naive')]
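The three subsetting blocks above follow one pattern, so they could be wrapped in a small helper. This is only a sketch on a toy table: `subset_treatment` and `toy_res` are hypothetical names, and the real column names come from `res` (note that the IFN columns are spelled `INF24` in the data).

```r
# Toy stand-in for `res`; the real table comes from xRDataLoader()
toy_res <- data.frame(
  Symbol           = c("A", "B", "C", "D"),
  logFC_LPS2_Naive = c(-2.0, 1.5, -0.4, -3.1),
  fdr_LPS2_Naive   = c(0.001, 0.0001, 0.5, 0.02),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the column names from the treatment tag,
# then apply the same filter as above (negative logFC and FDR below the cutoff)
subset_treatment <- function(res, tag, fdr_cutoff = 0.01) {
  lfc_col <- paste0("logFC_", tag, "_Naive")
  fdr_col <- paste0("fdr_", tag, "_Naive")
  flag <- res[[lfc_col]] < 0 & res[[fdr_col]] < fdr_cutoff
  res[flag, c("Symbol", lfc_col, fdr_col)]
}

toy_LPS2 <- subset_treatment(toy_res, "LPS2")
toy_LPS2$Symbol
```

Only gene A passes both conditions in the toy table: B has a positive fold change, and C and D miss the FDR cutoff.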
  • Next, the data columns are renamed to shared names (L2FC and FDR), a Treatment label is added, and the three treatments are stacked into one data set that feeds the XGR steps below
df_IFN24 <- df_IFN24 %>% dplyr::mutate(Treatment = 'IFN24 vs. Control') %>% rename(L2FC = logFC_INF24_Naive, FDR = fdr_INF24_Naive)
df_LPS24 <- df_LPS24 %>% dplyr::mutate(Treatment = 'LPS24 vs. Control') %>% rename(L2FC = logFC_LPS24_Naive, FDR = fdr_LPS24_Naive)
df_LPS2 <- df_LPS2 %>% dplyr::mutate(Treatment = 'LPS2 vs. Control') %>% rename(L2FC = logFC_LPS2_Naive, FDR = fdr_LPS2_Naive)
contrasts <- bind_rows(df_IFN24,df_LPS24,df_LPS2)
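A quick way to see what stacking produces is to try it on toy tables. The sketch below uses base R `rbind()` as a stand-in for `bind_rows()` (a swapped-in function, chosen so the example needs no packages); all table and gene names are made up.

```r
# Toy per-treatment tables that already share the columns Symbol, L2FC, FDR, Treatment
toy_a <- data.frame(Symbol = c("g1", "g2"), L2FC = c(-1.2, -0.8), FDR = c(0.001, 0.004),
                    Treatment = "A vs. Control", stringsAsFactors = FALSE)
toy_b <- data.frame(Symbol = "g2", L2FC = -2.5, FDR = 0.002,
                    Treatment = "B vs. Control", stringsAsFactors = FALSE)

# rbind() stacks rows when the column names match
toy_long <- rbind(toy_a, toy_b)

# One row per gene per treatment; counting rows per treatment is a quick sanity check
rows_per_treatment <- table(toy_long$Treatment)
rows_per_treatment
```

The same gene can appear once per treatment, which is why the Treatment column is needed to tell the rows apart.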
  • Sample Data set structure
head(contrasts %>% 
  arrange(FDR)) %>% 
  kable(booktabs = T, caption = "All Contrasts Dataset") %>%
  kable_styling(full_width = T,bootstrap_options = c("striped",'hover'), font_size = 9) 
All Contrasts Dataset
Symbol    L2FC    FDR   Treatment
STX11     -1.81   0     IFN24 vs. Control
CUL1      -1.69   0     IFN24 vs. Control
ANKRD22   -5.92   0     IFN24 vs. Control
PSME1     -1.43   0     IFN24 vs. Control
TRANK1    -1.55   0     IFN24 vs. Control
GCH1      -4.06   0     IFN24 vs. Control

Generate XGR Lists

Step 1: Identify Contrasts and Change Background

  • Find all unique contrasts by listing the unique treatments in our experiment

  • The default background is all annotated genes. We instead limit the background to the genes in our data, so that we can properly see which pathways are enriched in our data set.

  • Using a background gene set makes this a test of the competitive null hypothesis: we ask whether the genes in a pathway are more often differentially expressed than the rest of the background genes, as opposed to a self-contained analysis, which looks only at the genes in the pathway

mycontrasts <- unique(contrasts$Treatment)

mycontrasts
## [1] "IFN24 vs. Control" "LPS24 vs. Control" "LPS2 vs. Control"
background <- unique(toupper(contrasts$Symbol))

tail(background)
## [1] "AK057196" "NEUROG2"  "BX106374" "VN1R5"    "CPLX2"    "AW296529"
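To see why the choice of background matters, here is a generic over-representation test worked through on toy numbers. This is a sketch of the competitive test in base R, not XGR's own code, and XGR's exact test and options may differ; all four counts are hypothetical.

```r
# Toy numbers, all hypothetical
N <- 1000  # genes in the background
K <- 50    # background genes annotated to the pathway
n <- 100   # genes in the input (significant) list
k <- 15    # input genes that are also in the pathway

# Fold enrichment: observed overlap divided by the overlap expected by chance
expected <- n * K / N
fold <- k / expected

# One-sided hypergeometric p-value: probability of seeing k or more pathway genes
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(k, n - k, K - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(fold = fold, p_hyper = p_hyper, p_fisher = p_fisher)
```

Because N and K both depend on the background, shrinking or enlarging the background changes the expected overlap and therefore the p-value.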

Step 2: Create Helper Function

  • This function turns our contrast table into the vector of significant genes that XGR needs

  • If you already have a vector of significant genes and a background, this step can be skipped

  • It simplifies the analysis of many contrasts by wrapping xEnricherGenes and xEnrichConciser

  • The function takes our input contrast data and keeps the rows for one contrast with an FDR below 0.1 and an absolute log2 fold change above the l2fc threshold you choose; the resulting symbols are the input genes for xEnricherGenes

  • xEnricherGenes then compares this vector of symbols with our selected background for the chosen ontology and returns a list containing the information we need to visualize the enriched pathways

  • Setting tree = TRUE switches ontology.algorithm from "none" to "lea", which can remove redundant terms but significantly increases run time

  • When tree is FALSE, xEnrichConciser is used instead to clarify the results by removing redundant terms

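Generate_XGR_list <- function(data, contrast, mygo, tree = FALSE, l2fc, background){

  # Significant genes for one contrast, upper-cased to match the background
  mysymbol <- data %>% dplyr::filter(FDR < 0.1, abs(L2FC) > l2fc, Treatment == contrast) %>%
    dplyr::mutate(symbol = toupper(Symbol)) %>%
    dplyr::pull(symbol)

  # Use the "lea" algorithm when tree is TRUE, otherwise no ontology algorithm
  myxgr <- xEnricherGenes(data = mysymbol, ontology = mygo, background = background,
                          ontology.algorithm = if (tree) "lea" else "none")

  # Without the tree algorithm, remove redundant terms afterwards
  if (!tree) {
    concise <- try(xEnrichConciser(myxgr))
    # Keep the unconcised result if xEnrichConciser fails
    if (!inherits(concise, "try-error")) myxgr <- concise
  }

  # Return the enrichment result for both settings of tree
  myxgr
}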
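The gene-filtering step inside the helper can be checked in isolation. The sketch below repeats the same three conditions in base R on a toy table; the table, the 0.5 fold-change threshold and the contrast label are made up for illustration.

```r
# Toy contrast table with the same columns as `contrasts`
toy_contrasts <- data.frame(
  Symbol    = c("stat1", "Irf1", "gch1", "cul1"),
  L2FC      = c(-2.1, -0.3, -4.0, -1.7),
  FDR       = c(0.001, 0.02, 0.2, 0.005),
  Treatment = c("A vs. Control", "A vs. Control", "A vs. Control", "B vs. Control"),
  stringsAsFactors = FALSE
)

# Same three conditions as the helper: FDR cutoff, absolute fold-change cutoff, one contrast
keep <- toy_contrasts$FDR < 0.1 &
  abs(toy_contrasts$L2FC) > 0.5 &
  toy_contrasts$Treatment == "A vs. Control"

# Upper-case the symbols so they match the upper-cased background
toy_symbols <- toupper(toy_contrasts$Symbol[keep])
toy_symbols
```

Only the first gene survives: the second fails the fold-change threshold, the third fails the FDR cutoff, and the fourth belongs to a different contrast.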

Step 3: Create xEnricherGenes list for all Contrasts

  • This lets us compare enriched pathways by contrasts

  • Repeats the function for all of our contrasts

xgr_list <- lapply(mycontrasts, function(x) {
  Generate_XGR_list(contrast = x, mygo = "MsigdbH", data = contrasts, l2fc = 0, background = background)
})

names(xgr_list) <- mycontrasts
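Before plotting, it may be worth dropping list elements that are not usable enrichment results, for example NULL or a "try-error" object left by a failed try() call. Whether XGR returns NULL for a contrast without enrichments is an assumption here. A minimal sketch with toy stand-ins:

```r
# Toy stand-ins: a plain list() plays the role of a successful enrichment result
toy_list <- list(
  "A vs. Control" = list(term = "ok"),
  "B vs. Control" = NULL,
  "C vs. Control" = try(stop("no enrichment"), silent = TRUE)
)

# Keep only elements that are neither NULL nor a captured error
is_usable <- vapply(
  toy_list,
  function(x) !is.null(x) && !inherits(x, "try-error"),
  logical(1)
)
toy_clean <- toy_list[is_usable]
names(toy_clean)
```

The names are kept by the subsetting, so the remaining contrasts stay labelled.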

Pathway Comparison Plot

  • We now plot these contrasts with the function xEnrichCompare which returns a ggplot bar chart that we can customize at will
  • The Plot is set to display by FDR
  • We can also set the FDR cutoff for enrichment to whatever value we need
p <- xEnrichCompare(xgr_list, displayBy="fdr", FDR.cutoff = 0.1, wrap.width = 45) + 
  scale_fill_brewer(palette='Set2') +
  ggtitle('Hallmark Pathway Enrichments (FDR < 0.1)')

p
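To make the FDR cutoff concrete, the sketch below adjusts a few made-up p-values with the Benjamini-Hochberg method and applies a cutoff of 0.1. It assumes the adjusted p-values are Benjamini-Hochberg values, which is an assumption here; check how your XGR call adjusts p-values.

```r
# Hypothetical raw p-values for four pathways
p_raw <- c(pathway_a = 0.001, pathway_b = 0.01, pathway_c = 0.03, pathway_d = 0.2)

# Benjamini-Hochberg adjustment turns p-values into FDR estimates
fdr <- p.adjust(p_raw, method = "BH")

# Pathways that would pass a cutoff of 0.1
passing <- names(fdr)[fdr < 0.1]
passing
```

Raising or lowering the cutoff changes how many pathways appear in the plot, not the adjusted values themselves.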

Single Comparison Plot

  • To see the most enriched pathways for only one contrast, you can use xEnrichBarplot
one_contrast_plot <- xEnrichBarplot(xgr_list[[1]], top_num=10, displayBy="fc")
one_contrast_plot
## Warning: Position guide is perpendicular to the intended axis. Did you mean to
## specify a different guide `position`?

Step 4: Create Enrichment dataset

  • Using the function xEnrichViewer we can view the enrichment results as a table
  • It returns a data frame that lets us look through the genes in these enriched pathways to form hypotheses
  • Important Columns:
    • name = Pathway Name
    • nAnno = Number of Genes in Pathway
    • nOverlap = Number of Input (Significant) Genes in Pathway
    • members_Overlap = Input (Significant) Genes in Pathway
    • members_Anno = All Genes in Pathway
head(xEnrichViewer(xgr_list[[1]], top_num = 250, sortBy = "adjp", details = T) %>% 
  mutate(Contrast = names(xgr_list)[1]),1) %>% 
  kable(booktabs = T, caption = "xEnrichViewer Output") %>%
  kable_styling(full_width = T,bootstrap_options = c("striped",'hover'), font_size = 9)
xEnrichViewer Output (first row, one field per line)
(row name)       HALLMARK_INTERFERON_GAMMA_RESPONSE
name             Genes up-regulated in response to IFNG [GeneID=3458].
nAnno            177
nOverlap         164
fc               1.65
zscore           9.94
pvalue           0
adjp             0
or               10.3
CIl              5.83
CIu              19.8
distance         (empty)
namespace        H
members_Overlap  ADAR, APOL6, ARID5B, AUTS2, BANK1, BATF2, BPGM, BST2, C1R, C1S, CASP1, CASP3, CASP4, CASP7, CCL2, CCL7, CD274, CD38, CD40, CD69, CD74, CD86, CDKN1A, CFB, CFH, CIITA, CMKLR1, CMPK2, CSF2RB, CXCL10, CXCL9, DDX58, DDX60, DHX58, EIF2AK2, EIF4E3, EPSTI1, FAS, FGL2, FPR1, GBP4, GBP6, GCH1, GPR18, GZMA, HERC6, HLA-B, HLA-DRB1, ICAM1, IDO1, IFI27, IFI30, IFI35, IFI44, IFI44L, IFIH1, IFIT1, IFIT3, IFITM2, IFNAR2, IL10RA, IL15, IL15RA, IL18BP, IL2RB, IL4R, IL7, IRF1, IRF2, IRF4, IRF7, IRF8, IRF9, ISG15, ISG20, ISOC1, ITGB7, JAK2, LAP3, LATS2, LGALS3BP, LY6E, LYSMD2, MARCH1, METTL7B, MT2A, MTHFD2, MVP, MX1, MX2, MYD88, NAMPT, NFKB1, NLRC5, NMI, NOD1, NUP93, OAS2, OAS3, OASL, OGFR, P2RY14, PARP12, PARP14, PDE4B, PIM1, PLA2G4A, PLSCR1, PML, PSMA2, PSMA3, PSMB10, PSMB2, PSMB8, PSMB9, PSME1, PTGS2, PTPN1, PTPN2, PTPN6, RBCK1, RIPK1, RIPK2, RNF213, RNF31, RSAD2, SAMD9L, SAMHD1, SECTM1, SERPING1, SLAMF7, SLC25A28, SOCS1, SOCS3, SOD2, SP110, SPPL2A, SRI, SSPN, ST3GAL5, STAT1, STAT2, STAT3, STAT4, TAP1, TAPBP, TDRD7, TNFAIP3, TNFAIP6, TNFSF10, TOR1B, TRAFD1, TRIM14, TRIM21, TRIM26, UBE2L6, UPP1, VAMP5, VAMP8, VCAM1, WARS, XAF1, ZBP1, ZNFX1
members_Anno     ADAR, APOL6, ARID5B, ARL4A, AUTS2, BANK1, BATF2, BPGM, BST2, BTG1, C1R, C1S, CASP1, CASP3, CASP4, CASP7, CASP8, CCL2, CCL5, CCL7, CD274, CD38, CD40, CD69, CD74, CD86, CDKN1A, CFB, CFH, CIITA, CMKLR1, CMPK2, CSF2RB, CXCL10, CXCL9, DDX58, DDX60, DHX58, EIF2AK2, EIF4E3, EPSTI1, FAS, FGL2, FPR1, GBP4, GBP6, GCH1, GPR18, GZMA, HERC6, HIF1A, HLA-B, HLA-DRB1, ICAM1, IDO1, IFI27, IFI30, IFI35, IFI44, IFI44L, IFIH1, IFIT1, IFIT3, IFITM2, IFNAR2, IL10RA, IL15, IL15RA, IL18BP, IL2RB, IL4R, IL6, IL7, IRF1, IRF2, IRF4, IRF7, IRF8, IRF9, ISG15, ISG20, ISOC1, ITGB7, JAK2, LAP3, LATS2, LGALS3BP, LY6E, LYSMD2, MARCH1, METTL7B, MT2A, MTHFD2, MVP, MX1, MX2, MYD88, NAMPT, NCOA3, NFKB1, NFKBIA, NLRC5, NMI, NOD1, NUP93, OAS2, OAS3, OASL, OGFR, P2RY14, PARP12, PARP14, PDE4B, PELI1, PIM1, PLA2G4A, PLSCR1, PML, PNP, PSMA2, PSMA3, PSMB10, PSMB2, PSMB8, PSMB9, PSME1, PTGS2, PTPN1, PTPN2, PTPN6, RAPGEF6, RBCK1, RIPK1, RIPK2, RNF213, RNF31, RSAD2, SAMD9L, SAMHD1, SECTM1, SERPING1, SLAMF7, SLC25A28, SOCS1, SOCS3, SOD2, SP110, SPPL2A, SRI, SSPN, ST3GAL5, ST8SIA4, STAT1, STAT2, STAT3, STAT4, TAP1, TAPBP, TDRD7, TNFAIP3, TNFAIP6, TNFSF10, TOR1B, TRAFD1, TRIM14, TRIM21, TRIM25, TRIM26, UBE2L6, UPP1, VAMP5, VAMP8, VCAM1, WARS, XAF1, ZBP1, ZNFX1
Contrast         IFN24 vs. Control
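The member columns are comma-separated strings, so they need to be split before the genes can be compared. The sketch below uses shortened, hypothetical versions of the two strings; `split_genes` is a made-up helper name.

```r
# Shortened, hypothetical versions of the two member columns
members_overlap <- "ADAR, APOL6, STAT1"
members_anno    <- "ADAR, APOL6, CASP8, IL6, STAT1"

# Split on the comma separator and trim the surrounding whitespace
split_genes <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

overlap_genes <- split_genes(members_overlap)
anno_genes    <- split_genes(members_anno)

# Pathway genes that did not overlap the input list
not_in_input <- setdiff(anno_genes, overlap_genes)
not_in_input
```

Applied to a real row, the same steps list the pathway genes that were not picked up for that contrast.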

XGR Demo Series