
faircloth-lab/phyluce-workflows


phyluce-workflows

Purpose

These are additional workflows that we use with phyluce to handle a variety of repetitive tasks (e.g., read mapping, contig correction).

Installing

  1. Check out the code somewhere

    git clone https://github.com/faircloth-lab/phyluce-workflows
  2. Navigate to that directory and create the conda environment, which installs the dependencies:

    cd <path to wherever you cloned>
    conda env create -f environment.yml
  3. Activate the conda environment:

    conda activate phyluce-workflows
  4. Navigate to the directory that contains the workflow you want to run, edit its config file as needed, and run snakemake:

    cd mapping
    # <edit config file or run using example>
    # change cores to suit your system
    snakemake --cores 1
  5. Currently, the other workflows build on the Mapping workflow, so you need to run Mapping first, regardless of which other workflows you run.

Workflows

  • Mapping: Maps raw reads to species-specific contigs. Uses bwa, samtools, pandas, and a custom script to map reads, detect and mark duplicates, and compute several coverage metrics for each contig.

  • Contig-correction: Uses pre-existing BAM files (e.g., from Mapping, optionally also processed with mapDamage) together with depth-of-coverage information to call SNPs in contigs with bcftools, remove poor calls, and output a consensus sequence. Calls are filtered with --IndelGap 5 --SnpGap 5 --exclude 'QUAL<20 | DP<5 | AN>2', and sequences reduced to fewer than 50 bp by filtering are removed.

  • Phasing: Uses pre-existing BAM files (e.g., from Mapping) and phases SNPs with samtools, producing one BAM file per haplotype (0.BAM and 1.BAM). These are then converted with pilon into FASTA sequences representing each haplotype; pilon also writes a changes file and a VCF file for each haplotype. This workflow may still need some work to handle the low-coverage FASTA files it can produce.
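
The coverage step of the Mapping workflow can be illustrated with a small shell function. This is a minimal sketch, not the workflow's custom script: it assumes input in the tab-separated `samtools depth -a` format (contig, position, depth) and reports mean depth plus the fraction of bases at or above a minimum depth. Those two metrics, the function name `coverage_summary`, and the default threshold of 5 are hypothetical choices for illustration.

```sh
# Hypothetical helper: summarise per-contig coverage from `samtools depth -a`
# output (tab-separated: contig, 1-based position, depth).
# The metrics (mean depth, fraction of bases >= min_depth) are illustrative
# assumptions, not necessarily what the Mapping workflow reports.
coverage_summary() {
  local min_depth="${1:-5}"
  awk -v min="$min_depth" '
    BEGIN { OFS = "\t"; print "contig", "length", "mean_depth", "frac_ge_min" }
    {
      if (!($1 in len)) order[++n] = $1   # keep contigs in first-seen order
      len[$1]++                           # positions reported for this contig
      sum[$1] += $3                       # total depth over those positions
      if ($3 >= min) covered[$1]++        # positions at or above min depth
    }
    END {
      for (i = 1; i <= n; i++) {
        c = order[i]
        printf "%s\t%d\t%.2f\t%.2f\n", c, len[c], sum[c] / len[c], covered[c] / len[c]
      }
    }'
}
```

For example, `samtools depth -a sample.bam | coverage_summary 5 > sample.coverage.tsv`, where the file names are placeholders. Because awk only keeps one entry per contig, memory use grows with the number of contigs rather than the number of positions.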
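
One way the Contig-correction steps could fit together for a single sample is sketched below, reusing the bcftools filter values from the Contig-correction description. The `bcftools mpileup`/`call`/`consensus` sequence, the function names `consensus_from_bam` and `drop_short_fasta`, and the output file naming are assumptions rather than the workflow's actual rules; the depth-of-coverage and mapDamage inputs are not modelled.

```sh
# Hypothetical sketch of contig correction for one sample. Assumes a
# coordinate-sorted BAM and the contigs it was mapped against.
consensus_from_bam() {
  local contigs="$1" bam="$2" out="$3"
  # Pile up reads and call variant sites (multiallelic caller, variants only).
  bcftools mpileup -f "$contigs" "$bam" \
    | bcftools call -mv -Oz -o "$out.calls.vcf.gz"
  # Drop calls near indels/other variants and low-quality or low-depth sites
  # (filter values taken from the README).
  bcftools filter --IndelGap 5 --SnpGap 5 \
    --exclude 'QUAL<20 | DP<5 | AN>2' \
    -Oz -o "$out.filtered.vcf.gz" "$out.calls.vcf.gz"
  bcftools index "$out.filtered.vcf.gz"
  # Apply the surviving calls to the contigs, then remove short sequences.
  bcftools consensus -f "$contigs" "$out.filtered.vcf.gz" \
    | drop_short_fasta 50 > "$out.consensus.fasta"
}

# Remove FASTA records whose sequence is shorter than min_len bases.
# Wrapped sequence lines are joined onto a single line in the output.
drop_short_fasta() {
  local min_len="${1:-50}"
  awk -v min="$min_len" '
    function flush() { if (name != "" && length(seq) >= min) print name "\n" seq }
    /^>/ { flush(); name = $0; seq = ""; next }
    { seq = seq $0 }
    END { flush() }'
}
```

Calling `consensus_from_bam contigs.fasta sample.bam sample` would write `sample.consensus.fasta` alongside the intermediate VCFs. Keeping the length filter as a separate stream filter means it can also be applied to consensus FASTA produced some other way.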
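
The Phasing steps can be outlined as a shell function that runs `samtools phase` once and then `pilon` once per haplotype. This is a sketch under assumptions: the function names `run` and `phase_sample` and their arguments are hypothetical, the input BAM is assumed to be coordinate-sorted, and `pilon` is assumed to be on `PATH` (as with the bioconda wrapper; otherwise call the pilon jar with `java -jar`). Setting `DRY_RUN=1` prints the commands instead of running them.

```sh
# Hypothetical helper: print the command instead of running it when DRY_RUN
# is set (a convenience for this sketch, not part of the workflow).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical sketch of phasing one sample.
phase_sample() {
  local contigs="$1" bam="$2" prefix="$3" hap
  # Writes $prefix.0.bam, $prefix.1.bam and $prefix.chimera.bam;
  # the phasing report itself goes to stdout.
  run samtools phase -b "$prefix" "$bam"
  for hap in 0 1; do
    # pilon expects indexed BAM files
    run samtools index "$prefix.$hap.bam"
    # --changes and --vcf request the per-haplotype changes file and VCF
    run pilon --genome "$contigs" --bam "$prefix.$hap.bam" \
      --output "$prefix.$hap" --changes --vcf
  done
}
```

For example, `DRY_RUN=1 phase_sample contigs.fasta sample.bam sample` prints the five commands; without `DRY_RUN`, pilon writes `sample.0.fasta` and `sample.1.fasta` plus matching `.changes` and `.vcf` files.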

About

Snakemake workflows for phyluce tasks
