DESeq2 with phyloseq¶

Note

Required OS:	OS x or Linux.
Software:	R, phyloseq R library
Purpose:	This document provides instructions about how to find differentially abundant OTUs for microbiome data.
More:	Read more about phyloseq DEseq2 here and here.
Author:	This document was created by Saranga Wijeratne.

Software installation¶

Note

If you are runing this on MCBL mcic-sel019-d, please skip the installation. The following command in R-Studio will load the software module to your environment.

> library("phyloseq"); packageVersion("phyloseq")
[1] ‘1.16.2’ # version

Install the phyloseq package as follows:

> source('http://bioconductor.org/biocLite.R')
> biocLite('phyloseq')

Import data with phyloseq¶

For this step, you need Biom and mapping files generated by the Qiime pipeline.

Input biom file:
	otu_table_mc10_w_tax.biom
Qiime mapping file:
	mapping.txt
Output file:	DESeq2_Out

Copy all the input files to your “Working Directory” before you execute the following commands.

# Filenames:
biom_file <- "otu_table_mc10_w_tax.biom"
mapping_file <- "mapping.txt"

# Import the biom file with phyloseq:
biom_otu_tax <- import_biom(biom_file)

# Import the mapping file with phyloseq:
mapping_file <- import_qiime_sample_data(mapping_file)

# Merge biom and mapping files with phyloseq:
merged_mapping_biom <- merge_phyloseq(biom_otu_tax,mapping_file)

# Set column names in the taxa table:
colnames(tax_table(merged_mapping_biom)) <- c("kingdom", "Phylum", "Class", "Order", "Family", "Genus", "species")

Now, your merged mapping and Biom output should look as follows:

merged_mapping_biom

# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 315 taxa and 9 samples ]
# sample_data() Sample Data:       [ 9 samples by 8 sample variables ]
# tax_table()   Taxonomy Table:    [ 315 taxa by 7 taxonomic ranks ]

The mapping file should look like this:

head(mapping_file)

# Sample Data:        [40 samples by 7 sample variables]:
# X.SampleID BarcodeSequence LinkerPrimerSequence InputFileName IncubationDate Treatment Description
# S1          S1              NA                   NA      S1.fasta              0        CO         CO1
# S2          S2              NA                   NA      S2.fasta              0        CO         CO2
# S3          S3              NA                   NA      S3.fasta              0        CO         CO3
# S4          S4              NA                   NA      S4.fasta             15        CO         CO4
# S5          S5              NA                   NA      S5.fasta             15        CO         CO5

To remove taxonomy level tags assigned to each level (k__, p__, etc..), issue the following commands:

tax_table(merged_mapping_biom) <- gsub("k__([[:alpha:]])", "\\1", tax_table( merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("p__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("c__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("o__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("f__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("g__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("s__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("p__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("c__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("o__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("f__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("g__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("s__(\\[)","\\1", tax_table(merged_mapping_biom))

Testing for differential abundance among OTUs¶

Input file:	merged_mapping_biom
Output files:	DESeq2_Out.txt

1. Load the DESeq2 package into your R environment

library("DESeq2")
packageVersion("DESeq2")
    # [1] ‘1.12.4’

2. Assign DESeq2 output name and padj-cutoff

filename_out <- "DESeq2_Out.txt"
alpha <- 0.01

3. Convert to DESeqDataSet format

The phyloseq_to_deseq2() function converts the phyloseq-format microbiome data (i.e merged_mapping_biom) to a DESeqDataSet with dispersion estimated, using the experimental design formula (i.e ~ Treatment):

diagdds <- phyloseq_to_deseq2(merged_mapping_biom, ~ Treatment)

4. Run DESeq

 diagdds <- DESeq(diagdds, test="Wald", fitType="parametric")

 ## estimating size factors
 ## estimating dispersions
 ## gene-wise dispersion estimates
 ## mean-dispersion relationship
 ## final dispersion estimates
 ## fitting model and testing

Warning

If you are getting the following error:

Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) : every gene contains at least one zero, cannot compute log geometric means
Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix

Then please execute the following code (see here for more info):

gm_mean <- function(x, na.rm=TRUE){ exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))}
geoMeans <- apply(counts(diagdds), 1, gm_mean)
diagdds <- estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds <- DESeq(diagdds, test="Wald", fitType="parametric")

5. Process the results

The results function creates a table of results. Then the res table is filtered by padj < alpha.

res <- results(diagdds, cooksCutoff = FALSE)
sigtab <- res[which(res$padj < alpha), ]
sigtab <- cbind(as(sigtab, "data.frame"), as(tax_table(merged_mapping_biom)[rownames(sigtab), ], "matrix")) # Bind taxonomic info to final results table
write.csv(sigtab, as.character(filename_out)) # Write `sigtab` to file