This repository was archived by the owner on Jul 4, 2019. It is now read-only.

An automated pipeline for parallel analysis of whole Exome and Genome data in an HPC environment


pushkardakle/ExGen


ExGen [Archived]

ExGen is a bpipe- and Perl-based pipeline for whole exome and whole genome analysis. It was designed primarily to take advantage of multi-node HPC environments, though it also supports single-node environments. The pipeline can be used for

Major features of the pipeline are:-

  • Single-command launch for any number of input samples
  • Customizable mail notifications on completion of the pipeline or of individual stages
  • Run monitoring with per-sample and per-stage status
  • Out-of-the-box support for HPC environments, with multiple job managers (e.g. LSF, Torque) supported
  • Built-in validation of tools, parameters and input files
  • Customizable tool flow and easy resumption from any intermediate step in case of failure
  • Integrated collection of relevant statistics from every stage, collated into a single Excel file for easy comparison and summarization
  • All FastQC images collected into a single directory for easy comparison of pre- and post-QC results
  • Analysis of the time taken per module/tool
  • Logging of all executed commands for easy traceback of parameters
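The validation of tools and input files listed above is not described in more detail in this README. As a rough illustration only, a pre-flight check of that kind might look like the Python sketch below. It is not ExGen's implementation (which is written in Perl and bpipe); the function name `validate_run` and the example tool and file names are hypothetical, and parameter validation is left out.

```python
import os
import shutil
from typing import Iterable, List


def validate_run(tools: Iterable[str], input_files: Iterable[str]) -> List[str]:
    """Return a list of problems found before a run; an empty list means it may start.

    Hypothetical pre-flight check: every tool must be on PATH and every
    input file must exist and be non-empty.
    """
    problems: List[str] = []
    for tool in tools:
        # shutil.which returns None when the executable cannot be found on PATH
        if shutil.which(tool) is None:
            problems.append(f"tool not found on PATH: {tool}")
    for path in input_files:
        if not os.path.isfile(path):
            problems.append(f"missing input file: {path}")
        elif os.path.getsize(path) == 0:
            problems.append(f"empty input file: {path}")
    return problems


if __name__ == "__main__":
    # Placeholder tool and file names, not ExGen's real dependency list.
    issues = validate_run(["bwa", "samtools", "fastqc"],
                          ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"])
    print("\n".join(issues) if issues else "all checks passed")
```

Collecting every problem before failing, rather than stopping at the first one, lets a user fix all missing tools and inputs in one pass before resubmitting to the cluster.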

Limitations:-

  • Currently supports only paired-end Illumina data out of the box, though the pipeline can easily be edited to run in single-end mode, use a different set of tools, etc.
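Paired-end input means each sample normally arrives as two FASTQ files, one per read direction. The Python sketch below shows one way such files could be grouped into R1/R2 pairs before a run. It assumes an Illumina-style `_R1`/`_R2` naming convention (e.g. `sampleA_S1_L001_R1_001.fastq.gz`); ExGen's actual input convention is not documented here, and `pair_fastqs` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed naming convention: <prefix>_R1<suffix> and <prefix>_R2<suffix>,
# where the suffix is an optional "_001"-style chunk plus .fq/.fastq(.gz).
READ_PATTERN = re.compile(
    r"^(?P<prefix>.+)_R(?P<read>[12])(?P<suffix>(?:_\d+)?\.f(?:ast)?q(?:\.gz)?)$"
)


def pair_fastqs(filenames: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Group FASTQ names into {prefix: (R1, R2)}; raise ValueError on bad input."""
    reads: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"not a recognised paired-end FASTQ name: {name}")
        prefix, read = match.group("prefix"), match.group("read")
        slot = reads.setdefault(prefix, {})
        if read in slot:
            raise ValueError(f"duplicate R{read} file for {prefix}: {name}")
        slot[read] = name
    pairs: Dict[str, Tuple[str, str]] = {}
    # Sort by prefix so samples are processed in a stable order.
    for prefix, slot in sorted(reads.items()):
        if set(slot) != {"1", "2"}:
            raise ValueError(f"unpaired input for {prefix}: {sorted(slot.values())}")
        pairs[prefix] = (slot["1"], slot["2"])
    return pairs


if __name__ == "__main__":
    files = ["sampleA_S1_L001_R2_001.fastq.gz", "sampleA_S1_L001_R1_001.fastq.gz",
             "tumor_R1.fq", "tumor_R2.fq"]
    for prefix, (r1, r2) in pair_fastqs(files).items():
        print(prefix, r1, r2)
```

Adapting a pipeline to single-end mode would, under this sketch, mean accepting a lone R1 file per prefix instead of raising an error.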

TODO:-

  • Reduction of disk usage by compressing BAM files to CRAM or with an alternative toolkit
  • Relevant plots for the collected statistics
  • Creation of a Docker container with all the dependencies
  • Addition of test cases for multi-node and single-node scenarios
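BAM-to-CRAM conversion, as mentioned in the first TODO item, is commonly done with `samtools view -C`. The Python sketch below wraps that call; it is an illustration under the assumption that samtools is installed and that the reference FASTA used for alignment is available, not part of ExGen. The helper names and file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def bam_to_cram_command(bam: str, reference: str, cram: str) -> List[str]:
    """Build a samtools command that re-encodes a BAM file as CRAM.

    CRAM uses reference-based compression, so the same reference FASTA
    must be available whenever the CRAM file is read back.
    """
    # -C: write CRAM output; -T: reference FASTA; -o: output path
    return ["samtools", "view", "-C", "-T", reference, "-o", cram, bam]


def bam_to_cram(bam: str, reference: str) -> Path:
    """Convert a BAM to a sibling .cram file and return its path (needs samtools on PATH)."""
    cram = Path(bam).with_suffix(".cram")
    # check=True raises CalledProcessError if samtools exits with an error
    subprocess.run(bam_to_cram_command(bam, reference, str(cram)), check=True)
    return cram


if __name__ == "__main__":
    # Print the command rather than running it; file names are placeholders.
    print(" ".join(bam_to_cram_command("sample1.bam", "reference.fa", "sample1.cram")))
```

Keeping command construction separate from execution makes it easy to log the exact command, in line with the pipeline's practice of logging every executed command.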
