GBS
This documents a pipeline for the analysis of GBS (Genotyping-By-Sequencing) data.
Note
Field | Value |
---|---|
Required OS: | OS X or Linux |
Software: | Tassel 5 |
Documentation: | Tassel 5.0 Wiki |
Author: | Saranga Wijeratne |
File formats
- File formats that will be used in this analysis:
Files you need to have
The following files need to be present before you start the pipeline:
- Sequencing data files (.fastq or .fastq.gz)
Note
Fastq files should follow one of these naming conventions (more on page 7 here):
  - FLOWCELL_LANE_fastq.gz (e.g. AL2P1XXX_2_fastq.gz)
  - FLOWCELL_s_LANE_fastq.gz (e.g. AL2P1XXX_s_2_fastq.gz)
  - code_FLOWCELL_s_LANE_fastq.gz (e.g. 00000000_AL2P1XXX_s_2_fastq.gz)
    # To rename a .fastq.gz file:
    $ mv AE_S1_L001_R1_001.fastq.gz AL2P1XXX_1_fastq.gz
- GBSv2 key file (example key file, more information).
- A reference genome.
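The naming conventions in the note above can be checked before anything is run. The following is a minimal sketch and not part of Tassel: the function name `is_gbs_fastq_name` is hypothetical, it only handles the `_fastq.gz` suffix shown in the note, and it assumes the lane is a plain number.

```shell
# Return 0 if a file name matches one of the three conventions from the note,
# 1 otherwise. Assumption: LANE consists of digits only.
is_gbs_fastq_name() {
    name=$(basename "$1")
    case "$name" in
        *_fastq.gz) stem=${name%_fastq.gz} ;;
        *) return 1 ;;
    esac
    # Reject empty fields (leading, trailing or doubled underscores).
    case "$stem" in
        ''|_*|*_|*__*) return 1 ;;
    esac
    # Split the stem on underscores into positional parameters.
    old_ifs=$IFS
    IFS=_
    set -f
    set -- $stem
    set +f
    IFS=$old_ifs
    case $# in
        2) lane=$2 ;;                                # FLOWCELL_LANE
        3) [ "$2" = "s" ] || return 1; lane=$3 ;;    # FLOWCELL_s_LANE
        4) [ "$3" = "s" ] || return 1; lane=$4 ;;    # code_FLOWCELL_s_LANE
        *) return 1 ;;
    esac
    # The lane must be a non-empty string of digits.
    case "$lane" in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}
```

One possible use is a loop such as `for f in fastq/*; do is_gbs_fastq_name "$f" || echo "rename: $f"; done`, which lists the files that still need renaming.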
GBSv2 pipeline plugins
Plugin | Description |
---|---|
GBSSeqToTagDBPlugin | Takes fastq files as input, identifies tags and the taxa in which they appear, and stores this data in a local database. More |
TagExportToFastqPlugin | Retrieves distinct tags stored in the database and reformats them to a FASTQ file. More |
SAMToGBSdbPlugin | Reads a SAM file to determine the potential positions of tags against the reference genome and updates the GBS database with them. More |
DiscoverySNPCallerPluginV2 | Takes a GBSv2 database file as input and identifies SNPs from the aligned tags. More |
SNPQualityProfilerPlugin | Scores all discovered SNPs for various coverage depth and genotypic statistics for a given set of taxa. More |
UpdateSNPPositionQualityPlugin | Reads a quality score file to obtain quality score data for positions stored in the snpposition table. More |
SNPCutPosTagVerificationPlugin | Allows a user to specify a Cut or SNP position for which they would like data printed. More |
GetTagSequenceFromDBPlugin | Takes an existing GBSv2 SQLite database file as input and returns a tab-delimited file containing a list of Tag Sequences stored in the specified database file. More |
ProductionSNPCallerPluginV2 | Converts data from fastq and keyfile to genotypes then adds these to a genotype file in VCF or HDF5 format. More |
GBSv2 pipeline
1. Load Tassel 5.0 module
    $ module load Tassel/5.0
2. Useful commands
To list all available plugins, type:
    $ run_pipeline.pl -Xmx200g -ListPlugins
To list all parameters of a given plugin (e.g. GBSSeqToTagDBPlugin), type:
    $ run_pipeline.pl -fork1 -GBSSeqToTagDBPlugin -endPlugin -runfork1
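The parameter-listing command above follows the same pattern for every plugin. As a small sketch, the helper names `plugin_help_cmd` and `all_plugin_help_cmds` below are hypothetical; they only print the commands for the plugins named in the table above, so nothing is executed until the output is passed to a shell.

```shell
# Plugin names as listed in the "GBSv2 pipeline plugins" table.
GBS_PLUGINS="GBSSeqToTagDBPlugin TagExportToFastqPlugin SAMToGBSdbPlugin
DiscoverySNPCallerPluginV2 SNPQualityProfilerPlugin
UpdateSNPPositionQualityPlugin SNPCutPosTagVerificationPlugin
GetTagSequenceFromDBPlugin ProductionSNPCallerPluginV2"

# Print (do not run) the parameter-listing command for one plugin.
plugin_help_cmd() {
    printf 'run_pipeline.pl -fork1 -%s -endPlugin -runfork1\n' "$1"
}

# Print the parameter-listing command for every plugin in the list.
all_plugin_help_cmds() {
    for plugin in $GBS_PLUGINS; do
        plugin_help_cmd "$plugin"
    done
}
```

With the Tassel module loaded, `all_plugin_help_cmds | sh` would run the commands one after another.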
Tip
Users are encouraged to read more about the GBS command line options here (pages 1-2).
3. File preparation
Create the necessary folders and copy your raw data (fastq files), reference file and key file into the appropriate folders:
    $ mkdir fastq ref key db tagsForAlign hd5
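Plain `mkdir` fails if any of the folders already exists, which can get in the way when the preparation step is repeated. A minimal sketch of a rerunnable variant follows; the function name `prepare_gbs_dirs` and its optional base-directory argument are assumptions made for illustration.

```shell
# Create the working folders used by the pipeline under a base directory
# (default: the current directory). mkdir -p leaves existing folders alone.
prepare_gbs_dirs() {
    base=${1:-.}
    for dir in fastq ref key db tagsForAlign hd5; do
        mkdir -p "$base/$dir" || return 1
    done
}
```

After calling `prepare_gbs_dirs`, the inputs would be copied in with commands along the lines of `cp /path/to/raw/*_fastq.gz fastq/`, where `/path/to/raw` is a placeholder for wherever the sequencing files are stored.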
4. Execute the pipeline
    # Identify tags in the fastq files and store them in the database
    $ run_pipeline.pl -Xmx200g -fork1 -GBSSeqToTagDBPlugin -i fastq -k key/Tomato_key.txt -e ApeKI -db db/Tomato.db -kmerLength 85 -mnQS 20 -endPlugin -runfork1
    # Export the distinct tags for alignment
    $ run_pipeline.pl -fork1 -TagExportToFastqPlugin -db db/Tomato.db -o tagsForAlign/tagsForAlign.fa.gz -c 5 -endPlugin -runfork1
    # Index the reference and align the tags; bwa samse reads the .sai file produced by bwa aln
    $ cd ref
    $ bwa index -a is S_lycopersicum_chromosomes.2.50.fa
    $ cd ../
    $ bwa aln ref/S_lycopersicum_chromosomes.2.50.fa tagsForAlign/tagsForAlign.fa.gz > tagsForAlign/tagsForAlign.sai
    $ bwa samse ref/S_lycopersicum_chromosomes.2.50.fa tagsForAlign/tagsForAlign.sai tagsForAlign/tagsForAlign.fa.gz > tagsForAlign/tagsForAlign.sam
    # Store the aligned tag positions in the database
    $ run_pipeline.pl -fork1 -SAMToGBSdbPlugin -i tagsForAlign/tagsForAlign.sam -db db/Tomato.db -aProp 0.0 -aLen 0 -endPlugin -runfork1
    # Identify SNPs from the aligned tags
    $ run_pipeline.pl -fork1 -DiscoverySNPCallerPluginV2 -db db/Tomato.db -sC "chr00" -eC "chr12" -mnLCov 0.1 -mnMAF 0.01 -endPlugin -runfork1
    # Convert fastq and key file data to genotypes
    $ run_pipeline.pl -fork1 -ProductionSNPCallerPluginV2 -db db/Tomato.db -e ApeKI -i fastq -k key/Tomato_key2.txt -kmerLength 85 -mnQS 20 -o hd5/HapMap_tomato.h5 -endPlugin -runfork1
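Each command above depends on the output of the previous one, so it can help to check the inputs first and to stop at the first failing step. The following is a minimal sketch under that assumption; `check_gbs_inputs` and `run_step` are hypothetical helper names and not part of Tassel or bwa.

```shell
# Report every missing input on stderr; return 0 only if all are present.
check_gbs_inputs() {
    fastq_dir=$1
    key_file=$2
    ref_fasta=$3
    status=0
    if [ ! -d "$fastq_dir" ]; then
        echo "missing fastq folder: $fastq_dir" >&2
        status=1
    elif [ -z "$(ls -A "$fastq_dir")" ]; then
        echo "fastq folder is empty: $fastq_dir" >&2
        status=1
    fi
    [ -s "$key_file" ] || { echo "missing or empty key file: $key_file" >&2; status=1; }
    [ -s "$ref_fasta" ] || { echo "missing or empty reference: $ref_fasta" >&2; status=1; }
    return "$status"
}

# Log a label, run one command, and return non-zero if it fails
# so that the caller can stop the pipeline.
run_step() {
    label=$1
    shift
    echo "[$(date '+%H:%M:%S')] $label" >&2
    "$@" || { echo "step failed: $label" >&2; return 1; }
}
```

A script built on these helpers might start with `check_gbs_inputs fastq key/Tomato_key.txt ref/S_lycopersicum_chromosomes.2.50.fa || exit 1` and then wrap each command, for example `run_step "index reference" bwa index -a is ref/S_lycopersicum_chromosomes.2.50.fa || exit 1`.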