GBS

This documents a pipeline for the analysis of GBS (Genotyping-By-Sequencing) data.

Note

Required OS:	OS X or Linux.
Software:	Tassel 5
Documentation:	Tassel 5.0 Wiki
Author:	This document was created by Saranga Wijeratne

File formats

File formats that will be used in this analysis:

Files you need to have

The following files need to be present before you start the pipeline:

Sequencing data files (.fastq or .fastq.gz)

Note

Fastq files should follow one of these naming conventions (more on page 7 here):

- FLOWCELL_LANE_fastq.gz (e.g. AL2P1XXX_2_fastq.gz)
- FLOWCELL_s_LANE_fastq.gz (e.g. AL2P1XXX_s_2_fastq.gz)
- code_FLOWCELL_s_LANE_fastq.gz (e.g. 00000000_AL2P1XXX_s_2_fastq.gz)

# To rename a .fastq.gz file:
$ mv AE_S1_L001_R1_001.fastq.gz AL2P1XXX_1_fastq.gz
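When there are many files, the renaming can be scripted. The following is a minimal sketch, assuming Illumina-style input names in which the lane appears as `_L001_`, `_L002_`, and so on, and a flowcell ID that you supply yourself. The helper name `gbs_name` is hypothetical and not part of Tassel. The loop only prints the `mv` commands, so the result can be reviewed before anything is renamed.

```shell
# Hypothetical helper (not part of Tassel): build a FLOWCELL_LANE_fastq.gz name
# from an Illumina-style file name. Assumes the lane is encoded as _L<digits>_.
gbs_name() {
    flowcell=$1
    file=$(basename "$2")
    # Extract the lane field, e.g. "001" from "_L001_"
    lane=$(printf '%s\n' "$file" | sed -n 's/.*_L\([0-9][0-9]*\)_.*/\1/p')
    [ -n "$lane" ] || return 1
    # Drop leading zeros so that "001" becomes "1"
    lane=$(printf '%s' "$lane" | sed 's/^0*//')
    printf '%s_%s_fastq.gz\n' "$flowcell" "$lane"
}

# Dry run: print the mv commands instead of executing them.
# AL2P1XXX is the example flowcell ID used in this document.
for f in fastq/*_R1_001.fastq.gz; do
    [ -e "$f" ] || continue
    echo mv "$f" "fastq/$(gbs_name AL2P1XXX "$f")"
done
```

Remove the `echo` once the printed commands look right.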

GBSv2 key file (example key file, more information).
A reference genome.
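A malformed key file is a common reason for the first plugin to fail, so it can help to check the header before starting. The sketch below assumes a tab-delimited key file whose header contains the columns Flowcell, Lane, Barcode and FullSampleName; that column list is an assumption taken from the Tassel GBSv2 key file description and should be confirmed against the documentation linked above. The function name `check_keyfile` is hypothetical.

```shell
# Hypothetical check (not part of Tassel): report columns missing from the
# header line of a tab-delimited key file.
# ASSUMPTION: the required columns are Flowcell, Lane, Barcode, FullSampleName.
check_keyfile() {
    tab=$(printf '\t')
    # Read the header line and drop a Windows carriage return if present
    header=$(head -n 1 "$1" | tr -d '\r')
    missing=""
    for col in Flowcell Lane Barcode FullSampleName; do
        # Surround with tabs so only whole column names match
        case "$tab$header$tab" in
            *"$tab$col$tab"*) ;;
            *) missing="$missing $col" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing columns:$missing"
        return 1
    fi
    echo "key file header OK"
}
```

For example, `check_keyfile key/Tomato_key.txt` prints either a confirmation or the names of the columns it could not find.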

GBSv2 pipeline plugins

Plugin	Description
GBSSeqToTagDBPlugin	Takes fastq files as input, identifies tags and the taxa in which they appear, and stores this data in a local database.
TagExportToFastqPlugin	Retrieves distinct tags stored in the database and reformats them to a FASTQ file.
SAMToGBSdbPlugin	Reads a SAM file to determine the potential positions of tags against the reference genome and updates the database with the tag cut positions.
DiscoverySNPCallerPluginV2	Takes a GBSv2 database file as input and identifies SNPs from the aligned tags.
SNPQualityProfilerPlugin	Scores all discovered SNPs for various coverage depth and genotypic statistics for a given set of taxa.
UpdateSNPPositionQualityPlugin	Reads a quality score file to obtain quality score data for positions stored in the snpposition table.
SNPCutPosTagVerificationPlugin	Allows a user to specify a cut or SNP position for which they would like data printed.
GetTagSequenceFromDBPlugin	Takes an existing GBSv2 SQLite database file as input and returns a tab-delimited file containing a list of tag sequences stored in the specified database file.
ProductionSNPCallerPluginV2	Converts data from fastq and key file to genotypes, then adds these to a genotype file in VCF or HDF5 format.

GBSv2 pipeline

1. Load Tassel 5.0 module

$ module load Tassel/5.0

2. Useful commands

To check all the plugins available, type:

$ run_pipeline.pl -Xmx200g -ListPlugins

To check all the parameters for a given plugin, e.g. GBSSeqToTagDBPlugin, type:

$ run_pipeline.pl -fork1 -GBSSeqToTagDBPlugin -endPlugin -runfork1
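The same pattern works for any plugin in the table above. As a small sketch, the hypothetical helper `plugin_help_cmd` below builds the usage command for a plugin name. It only prints the command lines, so it can be tried without Tassel loaded.

```shell
# Hypothetical helper: print (not run) the command that lists a plugin's parameters.
plugin_help_cmd() {
    printf 'run_pipeline.pl -fork1 -%s -endPlugin -runfork1\n' "$1"
}

# Print the usage command for the first three plugins used in this pipeline
for p in GBSSeqToTagDBPlugin TagExportToFastqPlugin SAMToGBSdbPlugin; do
    plugin_help_cmd "$p"
done
```

Piping the output to `sh` would run the printed commands one after another.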

Tip

Users are encouraged to read more about GBS command-line options here (pages 1-2).

3. File preparation

Create the necessary folders and copy your raw data (fastq files), reference file and key file to the appropriate folders:

$ mkdir fastq ref key db tagsForAlign hd5
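Before starting the pipeline it can be worth confirming that the layout is complete. The sketch below uses a hypothetical function `preflight` that takes the project directory as its argument, checks for the six folders created above, and checks for at least one file in `fastq/` matching `*_fastq.gz`. Checking only that pattern is an assumption based on the naming convention described earlier.

```shell
# Hypothetical pre-flight check (not part of Tassel) for the folder layout above.
# Usage: preflight <project_dir>
preflight() {
    rc=0
    for d in fastq ref key db tagsForAlign hd5; do
        if [ ! -d "$1/$d" ]; then
            echo "missing directory: $d"
            rc=1
        fi
    done
    # ASSUMPTION: sequencing files follow the *_fastq.gz naming convention
    found=0
    for f in "$1"/fastq/*_fastq.gz; do
        if [ -e "$f" ]; then
            found=1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no *_fastq.gz files in fastq/"
        rc=1
    fi
    return "$rc"
}
```

Running `preflight .` from the project directory prints nothing and returns 0 when everything is in place.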

4. Execute the pipeline

# Identify tags in the fastq files and store them in the database
$ run_pipeline.pl -Xmx200g -fork1 -GBSSeqToTagDBPlugin -i fastq -k key/Tomato_key.txt -e ApeKI -db db/Tomato.db -kmerLength 85 -mnQS 20 -endPlugin -runfork1
# Export the distinct tags for alignment
$ run_pipeline.pl -fork1 -TagExportToFastqPlugin -db db/Tomato.db -o tagsForAlign/tagsForAlign.fa.gz -c 5 -endPlugin -runfork1
# Index the reference genome, then align the tags with BWA
$ cd ref
$ bwa index -a is S_lycopersicum_chromosomes.2.50.fa
$ cd ../
# bwa samse needs the .sai file produced by bwa aln
$ bwa aln ref/S_lycopersicum_chromosomes.2.50.fa tagsForAlign/tagsForAlign.fa.gz > tagsForAlign/tagsForAlign.sai
$ bwa samse ref/S_lycopersicum_chromosomes.2.50.fa tagsForAlign/tagsForAlign.sai tagsForAlign/tagsForAlign.fa.gz > tagsForAlign/tagsForAlign.sam
# Store the tag alignment positions in the database
$ run_pipeline.pl -fork1 -SAMToGBSdbPlugin -i tagsForAlign/tagsForAlign.sam -db db/Tomato.db -aProp 0.0 -aLen 0 -endPlugin -runfork1
# Identify SNPs from the aligned tags
$ run_pipeline.pl -fork1 -DiscoverySNPCallerPluginV2 -db db/Tomato.db -sC "chr00" -eC "chr12" -mnLCov 0.1 -mnMAF 0.01 -endPlugin -runfork1
# Call genotypes and write them to an HDF5 file
$ run_pipeline.pl -fork1 -ProductionSNPCallerPluginV2 -db db/Tomato.db -e ApeKI -i fastq -k key/Tomato_key2.txt -kmerLength 85 -mnQS 20 -o hd5/HapMap_tomato.h5 -endPlugin -runfork1
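Each command depends on the output of the one before it, so when the steps are run from a script it is useful to stop at the first failure. The following is a minimal sketch of one way to do that. The wrapper `run_step` and the step labels are hypothetical and not part of Tassel or BWA.

```shell
# Hypothetical wrapper: run one pipeline step, report its status, and pass
# the exit code of a failing step back to the caller.
# Usage: run_step <label> <command> [args...]
run_step() {
    name=$1
    shift
    echo "[$name] starting"
    if "$@"; then
        echo "[$name] done"
    else
        rc=$?
        echo "[$name] failed with exit code $rc" >&2
        return "$rc"
    fi
}

# Example chaining (commented out so the sketch can be loaded without Tassel installed):
# run_step tags run_pipeline.pl -Xmx200g -fork1 -GBSSeqToTagDBPlugin -i fastq -k key/Tomato_key.txt -e ApeKI -db db/Tomato.db -kmerLength 85 -mnQS 20 -endPlugin -runfork1 &&
# run_step export run_pipeline.pl -fork1 -TagExportToFastqPlugin -db db/Tomato.db -o tagsForAlign/tagsForAlign.fa.gz -c 5 -endPlugin -runfork1
```

Chaining the calls with `&&` means a later step starts only if the previous one succeeded. Commands that redirect their output, such as the `bwa` steps, would need to be wrapped in `sh -c '...'` or a small function so the redirection applies to the command and not to the wrapper's messages.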