DESeq2 with phyloseq¶
Note
Required OS: | OS x or Linux. |
---|---|
Software: | R, phyloseq R library |
Purpose: | This document provides instructions about how to find differentially abundant OTUs for microbiome data. |
More: | Read more about phyloseq DEseq2 here and here. |
Author: | This document was created by Saranga Wijeratne. |
Software installation¶
Note
If you are runing this on MCBL mcic-sel019-d, please skip the installation. The following command in R-Studio will load the software module to your environment.
1 2 | > library("phyloseq"); packageVersion("phyloseq")
[1] ‘1.16.2’ # version
|
Install the phyloseq
package as follows:
1 2 | > source('http://bioconductor.org/biocLite.R')
> biocLite('phyloseq')
|
Import data with phyloseq¶
For this step, you need Biom and mapping files generated by the Qiime pipeline.
Input biom file: | |
---|---|
otu_table_mc10_w_tax.biom | |
Qiime mapping file: | |
mapping.txt | |
Output file: | DESeq2_Out |
Copy all the input files to your “Working Directory” before you execute the following commands.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | # Filenames:
biom_file <- "otu_table_mc10_w_tax.biom"
mapping_file <- "mapping.txt"
# Import the biom file with phyloseq:
biom_otu_tax <- import_biom(biom_file)
# Import the mapping file with phyloseq:
mapping_file <- import_qiime_sample_data(mapping_file)
# Merge biom and mapping files with phyloseq:
merged_mapping_biom <- merge_phyloseq(biom_otu_tax,mapping_file)
# Set column names in the taxa table:
colnames(tax_table(merged_mapping_biom)) <- c("kingdom", "Phylum", "Class", "Order", "Family", "Genus", "species")
|
Now, your merged mapping and Biom output should look as follows:
1 2 3 4 5 6 | merged_mapping_biom
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 315 taxa and 9 samples ]
# sample_data() Sample Data: [ 9 samples by 8 sample variables ]
# tax_table() Taxonomy Table: [ 315 taxa by 7 taxonomic ranks ]
|
The mapping file should look like this:
1 2 3 4 5 6 7 8 9 | head(mapping_file)
# Sample Data: [40 samples by 7 sample variables]:
# X.SampleID BarcodeSequence LinkerPrimerSequence InputFileName IncubationDate Treatment Description
# S1 S1 NA NA S1.fasta 0 CO CO1
# S2 S2 NA NA S2.fasta 0 CO CO2
# S3 S3 NA NA S3.fasta 0 CO CO3
# S4 S4 NA NA S4.fasta 15 CO CO4
# S5 S5 NA NA S5.fasta 15 CO CO5
|
To remove taxonomy level tags assigned to each level (k__, p__, etc..), issue the following commands:
1 2 3 4 5 6 7 8 9 10 11 12 13 | tax_table(merged_mapping_biom) <- gsub("k__([[:alpha:]])", "\\1", tax_table( merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("p__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("c__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("o__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("f__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("g__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("s__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("p__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("c__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("o__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("f__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("g__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("s__(\\[)","\\1", tax_table(merged_mapping_biom))
|
Testing for differential abundance among OTUs¶
Input file: | merged_mapping_biom |
---|---|
Output files: | DESeq2_Out.txt |
1. Load the DESeq2 package into your R environment
1 2 3 library("DESeq2") packageVersion("DESeq2") # [1] ‘1.12.4’
2. Assign DESeq2 output name and padj-cutoff
1 2 filename_out <- "DESeq2_Out.txt" alpha <- 0.01
- 3. Convert to DESeqDataSet format
The
phyloseq_to_deseq2()
function converts the phyloseq-format microbiome data (i.e merged_mapping_biom) to aDESeqDataSet
with dispersion estimated, using the experimental design formula (i.e ~ Treatment):1
diagdds <- phyloseq_to_deseq2(merged_mapping_biom, ~ Treatment)
4. Run DESeq
1 2 3 4 5 6 7 8 diagdds <- DESeq(diagdds, test="Wald", fitType="parametric") ## estimating size factors ## estimating dispersions ## gene-wise dispersion estimates ## mean-dispersion relationship ## final dispersion estimates ## fitting model and testingWarning
If you are getting the following error:
Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) : every gene contains at least one zero, cannot compute log geometric means Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrixThen please execute the following code (see here for more info):
1 2 3 4 gm_mean <- function(x, na.rm=TRUE){ exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))} geoMeans <- apply(counts(diagdds), 1, gm_mean) diagdds <- estimateSizeFactors(diagdds, geoMeans = geoMeans) diagdds <- DESeq(diagdds, test="Wald", fitType="parametric")
- 5. Process the results
- The
results
function creates a table of results. Then theres
table is filtered bypadj < alpha
.1 2 3 4
res <- results(diagdds, cooksCutoff = FALSE) sigtab <- res[which(res$padj < alpha), ] sigtab <- cbind(as(sigtab, "data.frame"), as(tax_table(merged_mapping_biom)[rownames(sigtab), ], "matrix")) # Bind taxonomic info to final results table write.csv(sigtab, as.character(filename_out)) # Write `sigtab` to file