_images/Logo.png

DESeq2 with phyloseq

Note

Required OS:OS x or Linux.
Software:R, phyloseq R library
Purpose:This document provides instructions about how to find differentially abundant OTUs for microbiome data.
More:Read more about phyloseq DEseq2 here and here.
Author:This document was created by Saranga Wijeratne.

Software installation

Note

If you are runing this on MCBL mcic-sel019-d, please skip the installation. The following command in R-Studio will load the software module to your environment.

1
2
> library("phyloseq"); packageVersion("phyloseq")
[1]1.16.2# version

Install the phyloseq package as follows:

1
2
> source('http://bioconductor.org/biocLite.R')
> biocLite('phyloseq')

Import data with phyloseq

For this step, you need Biom and mapping files generated by the Qiime pipeline.

Input biom file:
 otu_table_mc10_w_tax.biom
Qiime mapping file:
 mapping.txt
Output file:DESeq2_Out

Copy all the input files to your “Working Directory” before you execute the following commands.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# Filenames:
biom_file <- "otu_table_mc10_w_tax.biom"
mapping_file <- "mapping.txt"

# Import the biom file with phyloseq:
biom_otu_tax <- import_biom(biom_file)

# Import the mapping file with phyloseq:
mapping_file <- import_qiime_sample_data(mapping_file)

# Merge biom and mapping files with phyloseq:
merged_mapping_biom <- merge_phyloseq(biom_otu_tax,mapping_file)

# Set column names in the taxa table:
colnames(tax_table(merged_mapping_biom)) <- c("kingdom", "Phylum", "Class", "Order", "Family", "Genus", "species")

Now, your merged mapping and Biom output should look as follows:

1
2
3
4
5
6
merged_mapping_biom

# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 315 taxa and 9 samples ]
# sample_data() Sample Data:       [ 9 samples by 8 sample variables ]
# tax_table()   Taxonomy Table:    [ 315 taxa by 7 taxonomic ranks ]

The mapping file should look like this:

1
2
3
4
5
6
7
8
9
head(mapping_file)

# Sample Data:        [40 samples by 7 sample variables]:
# X.SampleID BarcodeSequence LinkerPrimerSequence InputFileName IncubationDate Treatment Description
# S1          S1              NA                   NA      S1.fasta              0        CO         CO1
# S2          S2              NA                   NA      S2.fasta              0        CO         CO2
# S3          S3              NA                   NA      S3.fasta              0        CO         CO3
# S4          S4              NA                   NA      S4.fasta             15        CO         CO4
# S5          S5              NA                   NA      S5.fasta             15        CO         CO5

To remove taxonomy level tags assigned to each level (k__, p__, etc..), issue the following commands:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
tax_table(merged_mapping_biom) <- gsub("k__([[:alpha:]])", "\\1", tax_table( merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("p__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("c__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("o__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("f__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("g__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("s__([[:alpha:]])", "\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("p__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("c__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("o__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("f__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("g__(\\[)","\\1", tax_table(merged_mapping_biom))
tax_table(merged_mapping_biom) <- gsub("s__(\\[)","\\1", tax_table(merged_mapping_biom))

Testing for differential abundance among OTUs

Input file:merged_mapping_biom
Output files:DESeq2_Out.txt

1. Load the DESeq2 package into your R environment

1
2
3
library("DESeq2")
packageVersion("DESeq2")
    # [1] ‘1.12.4’

2. Assign DESeq2 output name and padj-cutoff

1
2
filename_out <- "DESeq2_Out.txt"
alpha <- 0.01
3. Convert to DESeqDataSet format

The phyloseq_to_deseq2() function converts the phyloseq-format microbiome data (i.e merged_mapping_biom) to a DESeqDataSet with dispersion estimated, using the experimental design formula (i.e ~ Treatment):

1
diagdds <- phyloseq_to_deseq2(merged_mapping_biom, ~ Treatment)

4. Run DESeq

1
2
3
4
5
6
7
8
 diagdds <- DESeq(diagdds, test="Wald", fitType="parametric")

 ## estimating size factors
 ## estimating dispersions
 ## gene-wise dispersion estimates
 ## mean-dispersion relationship
 ## final dispersion estimates
 ## fitting model and testing

Warning

If you are getting the following error:

Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) : every gene contains at least one zero, cannot compute log geometric means
Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix

Then please execute the following code (see here for more info):

1
2
3
4
gm_mean <- function(x, na.rm=TRUE){ exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))}
geoMeans <- apply(counts(diagdds), 1, gm_mean)
diagdds <- estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds <- DESeq(diagdds, test="Wald", fitType="parametric")
5. Process the results
The results function creates a table of results. Then the res table is filtered by padj < alpha.
1
2
3
4
res <- results(diagdds, cooksCutoff = FALSE)
sigtab <- res[which(res$padj < alpha), ]
sigtab <- cbind(as(sigtab, "data.frame"), as(tax_table(merged_mapping_biom)[rownames(sigtab), ], "matrix")) # Bind taxonomic info to final results table
write.csv(sigtab, as.character(filename_out)) # Write `sigtab` to file