1 Topics

Expected Time to Completion: 30 minutes

  • Trees (in R)
  • Tree graphics (ape, phyloseq, ggtree)
  • Tree-based distances
  • Network Manipulation
  • Network Graphics

 


2 Questions

  • Load the phyloseq, ape, and ggplot2 packages, and the example data.

2.1 Trees in R

  • Skim the ape documentation of the “phylo” class.
  • Access the tree in closedps using the phy_tree function, save the accessed tree as a new variable, named tree. Explore the components of tree using standard list semantics, especially using $.
  • What are the names of each component of the tree?
  • What are the dimensions of the edge table?
  • What do the first and second columns of the edge table mean?
  • What is a node, and what is an edge? What are tips?
  • The node labels are missing/empty. Assign random numbers between 0 and 1 to the node labels of tree. Use runif to define these, and set.seed to make it reproducible.
  • Check if tree is rooted, using is.rooted.
  • If it is, unroot tree using the unroot function.
  • Assign a new root to tree using the root function. Hint: use the argument resolve.root=TRUE. The root function also needs an outgroup (or node) to root on.
  • Replace the original tree in closedps with this new, node-labeled, re-rooted tree. Hint: Use the phy_tree<- assignment method.
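
The tree steps above can be sketched as follows. This is only a sketch under stated assumptions: it uses the esophagus dataset that ships with phyloseq as a stand-in for closedps, the seed value is arbitrary, and the first tip is picked as the outgroup purely for illustration.

```r
library(ape)
library(phyloseq)

# Stand-in data (assumption): `esophagus` ships with phyloseq and has a tree.
# In the lab, use closedps instead.
data(esophagus)
tree <- phy_tree(esophagus)

# A "phylo" object is a list, so list semantics apply
names(tree)       # component names
dim(tree$edge)    # one row per edge, two columns
head(tree$edge)   # column 1: parent node, column 2: child node

# Reproducible random node labels, one per internal node
set.seed(711)     # arbitrary seed
tree$node.label <- runif(Nnode(tree))

# Unroot if the tree is rooted, then re-root on an arbitrarily chosen tip
if (is.rooted(tree)) {
  tree <- unroot(tree)
}
tree <- root(tree, outgroup = tree$tip.label[1], resolve.root = TRUE)

# Put the modified tree back into the phyloseq object
phy_tree(esophagus) <- tree
```

The choice of outgroup changes where the root sits, so in a real analysis it should be a biologically motivated tip or clade rather than simply the first label.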

2.2 Tree Plotting

  • Plot the tree using ape’s tree plotting function, plot.phylo.
  • Now plot using phyloseq’s plot_tree function.
  • Explore options in plot_tree, but make sure to leave time to complete the remaining questions. In particular, explore the node- and tip-labelling features. There are special functions for plotting bootstrap values at nodes, for example.
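
A minimal sketch of the two plotting routes, again assuming the esophagus dataset from phyloseq as a stand-in for closedps. The particular options shown are just one possible starting point.

```r
library(ape)
library(phyloseq)
library(ggplot2)

# Stand-in data (assumption): use closedps in the lab instead
data(esophagus)
tree <- phy_tree(esophagus)

# Base-graphics plot from ape; plot() dispatches to plot.phylo for "phylo" objects
plot(tree, cex = 0.6)

# ggplot2-based plot from phyloseq, with tips labelled by taxa name,
# points colored by sample, and node labels suppressed
p <- plot_tree(esophagus,
               color = "samples",
               label.tips = "taxa_names",
               nodelabf = nodeplotblank,
               ladderize = "left")
print(p)
```

Swapping nodeplotblank for nodeplotboot() or nodeplotdefault() changes how node labels are drawn; nodeplotboot() is the helper intended for bootstrap-style values.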

2.3 Distances that Use the Tree

  • Calculate and store the weighted-UniFrac distance matrix (use the distance function).
  • Decompose this distance matrix with multi-dimensional scaling (also called PCoA) using the ordinate function. Store the wUF/PCoA result. This is an exceedingly common method for exploring sample-wise features in microbiome data.
  • Also use the ordinate function to calculate Double Principal Coordinates Analysis (DPCoA), which also uses the tree.
  • Compare these two results using the type="split" argument in plot_ordination, along with other plotting options that you find useful.
  • Which of the two ordination methods seems most helpful for these data? Do you think this will always be the case? Which species/OTUs differ most between the Fast/Control sample classes?
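
The distance and ordination steps can be sketched like this. It is a sketch under assumptions: the GlobalPatterns dataset from phyloseq, trimmed to its 200 most abundant taxa to keep it fast, stands in for closedps, and the color variables (SampleType, Phylum) belong to that stand-in dataset, not to the lab data.

```r
library(phyloseq)
library(ggplot2)

# Stand-in data (assumption): a small subset of GlobalPatterns
data(GlobalPatterns)
keep_taxa <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE)[1:200])
GP <- prune_taxa(keep_taxa, GlobalPatterns)

# Weighted UniFrac distance between samples (uses the tree)
wuf <- distance(GP, method = "wunifrac")

# PCoA (MDS) of the stored distance matrix
ord_pcoa <- ordinate(GP, method = "PCoA", distance = wuf)

# DPCoA, which also uses the tree
ord_dpcoa <- ordinate(GP, method = "DPCoA")

# PCoA of a sample-wise distance matrix carries sample coordinates only
p_pcoa <- plot_ordination(GP, ord_pcoa, type = "samples", color = "SampleType")

# DPCoA places both samples and taxa, so a split plot shows the two side by side
p_split <- plot_ordination(GP, ord_dpcoa, type = "split", color = "Phylum")

print(p_pcoa)
print(p_split)
```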

2.4 Distance-based Networks

  • Load the enterotype dataset that comes with phyloseq, using the data function.
  • Use a distance/dissimilarity of your choice to create a distance-threshold network plot using the enterotype dataset and plot_net.
  • Try various distances until you find one that seems most useful. Use other plot aesthetic options in plot_net (e.g. color, shape) to overlay other known information about the samples.
  • What are the main trends in the sample relationships, from this point of view?
  • What can you learn from an ordination method instead? Are these complementary or overlapping methods?
  • What’s different about these two forms of exploratory graphics?
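
One possible starting point for the network plots is sketched below. The distance methods, maxdist thresholds, and seed are illustrative assumptions to be tuned, not recommended values.

```r
library(phyloseq)
library(ggplot2)

data(enterotype)

# The network layout has a random component, so fix a seed (arbitrary value)
set.seed(711)

# Samples are connected when their Bray-Curtis dissimilarity is below maxdist
p_bray <- plot_net(enterotype,
                   distance = "bray",
                   maxdist = 0.4,
                   color = "SeqTech",
                   shape = "Enterotype")

# Same idea with a different dissimilarity and threshold
p_jaccard <- plot_net(enterotype,
                      distance = "jaccard",
                      maxdist = 0.8,
                      color = "SeqTech")

print(p_bray)
print(p_jaccard)
```

Lowering maxdist removes edges and tends to break the network into smaller clusters; raising it connects more samples, so the threshold is worth exploring alongside the choice of distance.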

3 Hints

  • Most hints are in-line. The order of the steps should help you move through this quickly.