The goal of this laboratory is to experiment with hierarchical and discrete clustering and classification error estimation.

  1. Load the image for Nature 488, pp. 621-626 dataset.
  2. Use distance() function to compute JSD distance of the normalized microbiome data.
  3. Compute hierarchical clustering using Single, Average, and Complete linkage.
  4. Plot the dendrograms from 3.
  5. Use rect.hclust function to produce best discrete clusters from the dendrograms.
  6. Compute and plot cophenetic distance against the JSD distances computed in 2 for each of the hierarchical clusterings in 3.
  7. How well do the cophenetic distances represent the original distances? Compute correlations.
  8. Use Partitioning around Medoids, pam, to cluster the JSD distances into two clusters.
  9. Test for association of the clusters with the Treatment and Location and variables.
  10. Compute the gap statistic plot to determine the optimal number of clusters for these data.
  11. Install and load the following packages: “randomForest”, “kernlab”, “ROCR”.
  12. Use the provided 6-fold cross validation functions (svm.kfoldAUC and rf.kfoldAUC) to estimate prediction accuracy in 100 repetitions using Phylum level data for:

A. Location; B. Antibiotic vs. Control within fecal samples; C. Antibiotic vs. Control within cecal samples.

Alternatively, use the “caret” package to accomplish the same.

  1. Report mean, and the upper 95% confidence interval for AUC.