The goal of this laboratory is to experiment with hierarchical and discrete clustering, and with estimating classification error.
- Load the image file containing the dataset from Nature 488, pp. 621-626.
- Use the distance() function to compute Jensen-Shannon divergence (JSD) distances between samples in the normalized microbiome data.
- Compute hierarchical clusterings of the JSD distances using single, average, and complete linkage.
- Plot the dendrograms for the three linkage methods.
- Use the rect.hclust() function to identify and outline the best discrete clusters on each dendrogram.
- For each of the three hierarchical clusterings, compute the cophenetic distances and plot them against the JSD distances computed earlier.
- How well do the cophenetic distances represent the original distances? Compute the correlation between the two for each linkage method.
- Use Partitioning Around Medoids (pam) to cluster the JSD distances into two clusters.
- Test for association of the clusters with the Treatment and Location variables.
- Compute and plot the gap statistic to determine the optimal number of clusters for these data.
- Install and load the following packages: “randomForest”, “kernlab”, “ROCR”.
- Use the provided 6-fold cross-validation functions (svm.kfoldAUC and rf.kfoldAUC) to estimate prediction accuracy (AUC) over 100 repetitions, using phylum-level data, for:
A. Location; B. Antibiotic vs. Control within fecal samples; C. Antibiotic vs. Control within cecal samples.
Alternatively, use the “caret” package to accomplish the same.
- Report the mean AUC and the upper limit of its 95% confidence interval.
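For the step that computes JSD distances, the sketch below shows what a Jensen-Shannon divergence matrix contains. The lab uses R's distance(); this Python version is only an illustration, and the helper names (jsd, jsd_matrix) are hypothetical. It uses natural logarithms and returns the divergence itself; some tools report its square root (the Jensen-Shannon distance), so check which one distance() returns before comparing numbers.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence (natural log) between two probability vectors."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0 by convention (0 * log 0 = 0).
        # If a_i > 0 then m_i > 0, so there is no division by zero.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def normalize(counts):
    """Turn raw counts for one sample into relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def jsd_matrix(samples):
    """Symmetric pairwise JSD matrix for a list of per-sample count vectors."""
    probs = [normalize(s) for s in samples]
    n = len(probs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jsd(probs[i], probs[j])
    return d


if __name__ == "__main__":
    toy_counts = [[10, 0, 5], [8, 2, 5], [0, 10, 0]]  # toy data, not the lab dataset
    for row in jsd_matrix(toy_counts):
        print([round(x, 4) for x in row])
```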
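For the hierarchical clustering step, this naive sketch shows how single, average, and complete linkage differ: at each step the two closest clusters merge, where "closest" is the minimum, mean, or maximum pairwise distance between their members. It is a Python illustration, not R's hclust(); hclust uses faster update formulas and may break ties differently. The function name agglomerate and its output format (members_a, members_b, height) are hypothetical.

```python
def agglomerate(d, method="average"):
    """Naive agglomerative clustering on a symmetric distance matrix d.

    Returns merges as (members_a, members_b, height) in merge order.
    method is "single", "complete" or "average" (UPGMA).
    """
    n = len(d)
    clusters = {i: frozenset([i]) for i in range(n)}  # active clusters by id

    def linkage(a, b):
        pair = [d[i][j] for i in a for j in b]
        if method == "single":
            return min(pair)
        if method == "complete":
            return max(pair)
        if method == "average":
            return sum(pair) / len(pair)
        raise ValueError(f"unknown method: {method}")

    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Scan every pair of active clusters for the smallest linkage distance.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                h = linkage(clusters[ids[x]], clusters[ids[y]])
                if best is None or h < best[0]:
                    best = (h, ids[x], ids[y])
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        clusters[next_id] = clusters.pop(a) | clusters.pop(b)
        next_id += 1
    return merges


if __name__ == "__main__":
    pts = [0, 1, 3, 7]  # toy 1-D points
    dist = [[abs(p - q) for q in pts] for p in pts]
    for m in ("single", "average", "complete"):
        print(m, [round(h, 3) for _, _, h in agglomerate(dist, m)])
```

On the same data, single linkage tends to produce the lowest merge heights and complete linkage the highest, which is why the three dendrograms can look quite different.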
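For the rect.hclust() step: rect.hclust() outlines the groups obtained by cutting a dendrogram into k clusters (or at a height h); choosing k is still up to you. The sketch below, in Python rather than R, shows what such a cut computes: replay the first n - k merges with union-find and label the resulting groups. The function name cut_tree and the merge format are hypothetical, and replaying merges in order is only valid when merge heights are non-decreasing, which holds for single, average, and complete linkage.

```python
def cut_tree(merges, n, k):
    """Cut a merge list (members_a, members_b, height) into k flat clusters.

    Returns one label per observation, numbered 1..k by first appearance.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Applying the first n - k merges leaves exactly k groups.
    for a, b, _height in merges[: n - k]:
        parent[find(min(a))] = find(min(b))

    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        if root not in seen:
            seen[root] = len(seen) + 1
        labels.append(seen[root])
    return labels


if __name__ == "__main__":
    # Toy single-linkage merges for the 1-D points 0, 1, 3, 7.
    toy = [
        (frozenset({0}), frozenset({1}), 1),
        (frozenset({2}), frozenset({0, 1}), 2),
        (frozenset({0, 1, 2}), frozenset({3}), 4),
    ]
    for k in (1, 2, 3, 4):
        print(k, cut_tree(toy, 4, k))
```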
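For the cophenetic distance and correlation steps: the cophenetic distance between two samples is the height at which they first join the same cluster, and the cophenetic correlation is the Pearson correlation between those heights and the original distances over all sample pairs. In R this is typically computed as cor(d, cophenetic(hc)); the Python sketch below only illustrates the calculation, and its helper names are hypothetical.

```python
import math


def cophenetic(merges, n):
    """Cophenetic matrix from merges given as (members_a, members_b, height)."""
    c = [[0.0] * n for _ in range(n)]
    for a, b, h in merges:
        # Every pair split across the two merged clusters first joins at height h.
        for i in a:
            for j in b:
                c[i][j] = c[j][i] = h
    return c


def upper_triangle(m):
    """Flatten the strict upper triangle (each unordered pair once)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pts = [0, 1, 3, 7]  # toy 1-D points
    d = [[abs(p - q) for q in pts] for p in pts]
    single = [
        (frozenset({0}), frozenset({1}), 1),
        (frozenset({2}), frozenset({0, 1}), 2),
        (frozenset({0, 1, 2}), frozenset({3}), 4),
    ]
    coph = cophenetic(single, len(pts))
    print(round(pearson(upper_triangle(d), upper_triangle(coph)), 3))
```

A correlation close to 1 means the dendrogram preserves the original distances well; comparing it across the three linkage methods answers the question in the lab.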
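For the pam step, the sketch below is a simplified Partitioning Around Medoids on a precomputed distance matrix: a greedy BUILD phase picks initial medoids, then a SWAP phase replaces a medoid with a non-medoid whenever that lowers the total distance of points to their nearest medoid. It is written in Python for illustration only; it takes the first improving swap rather than the best one, so it is not a reimplementation of R's cluster::pam, and the names used are hypothetical.

```python
def total_cost(d, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d, k):
    """Simplified PAM on a distance matrix d. Returns (medoids, labels 1..k)."""
    n = len(d)
    # BUILD: greedily add the point that most reduces the total cost.
    medoids = []
    for _ in range(k):
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(d, medoids + [i]),
        )
        medoids.append(best)

    # SWAP: accept any medoid/non-medoid exchange that lowers the cost,
    # until no exchange helps. Cost strictly decreases, so this terminates.
    current = total_cost(d, medoids)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                cost = total_cost(d, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True

    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) + 1 for i in range(n)]
    return medoids, labels


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11, 12]  # toy 1-D points with two obvious groups
    dist = [[abs(p - q) for q in pts] for p in pts]
    print(pam(dist, 2))
```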
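For the association test, one common choice is Fisher's exact test on the contingency table of cluster membership against a grouping variable. The Python sketch below handles only a 2 x 2 table (for example two PAM clusters against two Location levels); it sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. In R, fisher.test() or chisq.test() on table(clusters, variable) covers tables with more levels. The function name is hypothetical and the toy table is illustrative, not lab data.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c             # first column total
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance so tables as likely as the observed one count.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Rows: cluster 1 / cluster 2; columns: e.g. fecal / cecal (toy counts).
    print(fisher_exact_2x2([[5, 0], [0, 5]]))
```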
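For the gap statistic step, the sketch below covers two of its pieces, following Tibshirani et al.: the within-cluster dispersion W_k computed from a distance matrix and cluster labels, and the rule that picks the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}. It assumes you have already clustered B reference datasets (for example, data simulated uniformly over the range of the observed features) and recorded their log W_k values; that simulation is not shown. R's cluster::clusGap() does the whole procedure and offers several rules for choosing k, not necessarily this one. All names here are hypothetical.

```python
import math


def within_dispersion(d, labels):
    """W_k: sum over clusters of (within-cluster pairwise distance sum) / cluster size.

    Each unordered pair is counted once, which equals D_r / (2 n_r) when D_r
    sums over ordered pairs as in the original definition.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    w = 0.0
    for members in clusters.values():
        pair_sum = sum(d[i][j] for x, i in enumerate(members) for j in members[x + 1:])
        w += pair_sum / len(members)
    return w


def gap_and_se(log_w_obs, log_w_ref):
    """Gap(k) and s_k, given observed log W_k and a list of B reference log W_k per k."""
    gaps, ses = [], []
    for obs, ref in zip(log_w_obs, log_w_ref):
        b = len(ref)
        ref_mean = sum(ref) / b
        gaps.append(ref_mean - obs)
        # Standard deviation with divisor B, as in the paper (some code uses B - 1).
        sd = math.sqrt(sum((r - ref_mean) ** 2 for r in ref) / b)
        ses.append(sd * math.sqrt(1 + 1 / b))
    return gaps, ses


def choose_k(gaps, ses):
    """Smallest k (1-based) with Gap(k) >= Gap(k+1) - s_{k+1}."""
    for k in range(len(gaps) - 1):
        if gaps[k] >= gaps[k + 1] - ses[k + 1]:
            return k + 1
    return len(gaps)


if __name__ == "__main__":
    pts = [0, 1, 10, 11]  # toy 1-D points
    dist = [[abs(p - q) for q in pts] for p in pts]
    print(within_dispersion(dist, [1, 1, 2, 2]))
```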
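For the cross-validation step, the internals of the provided svm.kfoldAUC and rf.kfoldAUC are not described here, so the Python sketch below only illustrates the general idea of repeated stratified k-fold AUC estimation. Assumptions: each repetition uses a fresh random 6-fold split, AUC is computed once on the pooled out-of-fold scores, and a simple nearest-centroid score stands in for the SVM or random forest (a swapped-in classifier, not the lab's). Each class needs enough samples that every training set still contains both classes. All function names are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Assign each sample to fold 0..k-1, spreading each class evenly."""
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


def centroid_score(train_x, train_y, x):
    """Stand-in classifier: distance to class-0 centroid minus distance to class-1 centroid."""
    def centroid(cls):
        rows = [r for r, y in zip(train_x, train_y) if y == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    return dist(x, centroid(0)) - dist(x, centroid(1))


def kfold_auc(x, y, k=6, seed=0):
    """Pooled out-of-fold AUC for one random stratified k-fold split."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    scores = [0.0] * len(y)
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        for i in range(len(y)):
            if folds[i] == f:
                scores[i] = centroid_score(tx, ty, x[i])
    return auc(scores, y)


def repeated_kfold_auc(x, y, k=6, repeats=100):
    """One AUC per repetition, each with a different random split."""
    return [kfold_auc(x, y, k, seed=r) for r in range(repeats)]


if __name__ == "__main__":
    # Toy, well-separated 1-feature data (not phylum-level lab data).
    xs = [[i * 0.1] for i in range(6)] + [[5 + i * 0.1] for i in range(6)]
    ys = [0] * 6 + [1] * 6
    print(repeated_kfold_auc(xs, ys, k=6, repeats=5))
```

For the three comparisons in the lab, the same loop would be run on the relevant subset of samples each time (all samples for Location, fecal-only or cecal-only samples for antibiotic vs. control).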
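For the reporting step, "upper 95% confidence interval" can be read in two ways, so the Python sketch below computes both from the 100 repetition AUCs: a normal-approximation 95% confidence interval for the mean AUC (z = 1.96; a t quantile would be slightly wider), and the empirical 2.5th and 97.5th percentiles of the repetition AUCs. Which one the lab expects is an assumption to confirm; the function name is hypothetical.

```python
import math
from statistics import mean, quantiles, stdev


def summarize_auc(aucs):
    """Return (mean, 95% CI for the mean, 2.5th/97.5th percentiles of the AUCs)."""
    m = mean(aucs)
    se = stdev(aucs) / math.sqrt(len(aucs))
    # Normal-approximation interval for the mean AUC.
    ci_mean = (m - 1.96 * se, m + 1.96 * se)
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two.
    cuts = quantiles(aucs, n=40, method="inclusive")
    pct = (cuts[0], cuts[-1])
    return m, ci_mean, pct


if __name__ == "__main__":
    toy_aucs = [0.8] * 50 + [0.9] * 50  # toy values, not lab results
    print(summarize_auc(toy_aucs))
```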