Topics
Expected Time to Completion: 30 minutes
- Trees (in R)
- Tree graphics (ape, phyloseq, ggtree)
- Tree-based distances
- Network Manipulation
- Network Graphics
Questions
- Load phyloseq, ape, and ggplot2 packages, and the example data.
Trees in R
- Skim the ape documentation of the “phylo” class.
- Access the tree in
closedps
using the phy_tree
function, save the accessed tree as a new variable, named tree
. Explore the components of tree
using standard list semantics, especially using $
.
- What are the names of each component of the tree?
- What are the dimensions of the
edge
table?
- What do the first and second columns of the edge table mean?
- What is a node, and what is an edge? What are tips?
- The node labels are missing/empty. Assign random numbers between
0
and 1
to the node labels of tree
. Use runif
to define these, and set.seed
to make it reproducible.
- Check if
tree
is rooted, using is.rooted
.
- If it is, unroot
tree
using unroot
function.
- Assign a new root to
tree
using the root
function. Hint: use argument resolve.root=TRUE
- Replace the original tree in
closedps
with this new, node-labeled, re-rooted tree. Hint: Use the phy_tree<-
assignment method.
Tree Plotting
- Plot the tree using ape’s tree plotting function,
plot.phylo
- Now plot using phyloseq’s
plot_tree
function.
- Explore options in
plot_tree
, but make sure to leave time to complete the remaining questions. In particular, explore the node- and tip-labelling features. There are special functions for plotting bootstrap values at nodes, for example.
Distances that Use the Tree
- Calculate and store the weighted-UniFrac distance matrix (use the
distance
function).
- Decompose this distance matrix with multi-dimensional scaling (also called PCoA) using the
ordinate
function. Store the wUF/PCoA result. This is an exceedingly common method for exploring sample-wise features in microbiome data.
- Also use the
ordinate
function to calculate Double Principle Coordinate Analysis (DPCoA), which also uses the tree.
- Compare these two results using the
method="split"
argument in plot_ordination
, along with other plotting options that you find useful.
- Which of the two ordination methods seems most helpful on this data? Do you think this will always be the case? What species/OTUs are most different between the Fast/Control sample classes?
Distance based Networks
- Load the
enterotype
dataset that comes with phyloseq, using the data
function.
- Use a distance/dissimilarity of your choice to create a distance-threshold network plot using
enterotype
dataset and plot_net
.
- Try various distances until you find one that seems most useful. Use other plot aesthetic options in
plot_net
(e.g. color
, shape
) to overlay other known information about the samples.
- What are the main trends in the sample relationships, from this point of view?
- What can you learn from an ordination method instead? Are these complementary or overlapping methods?
- What’s different about these two forms of exploratory graphics?
Hints
- Most hints are in-line. The order of the steps should help you move through this quickly.