GitHub - faircloth-lab/splitaake: demultiplex massively parallel sequencing data

Purpose

splitaake is a reasonably easy method for demultiplexing Illumina reads using sequence tags designed around either Hamming distance or Levenshtein (edit) distance. At the moment, Hamming distance is the default, which is similar to the approach Illumina uses in their own software. splitaake differs from the Illumina software in that it can demultiplex sequence tags of many lengths, rather than just the "standard" TruSeq index length of 6 nucleotides.
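
To make the two distance measures concrete, here is a minimal pure-Python sketch. It is not splitaake's own code (splitaake relies on jellyfish for this); the function names and the `max_distance` parameter are assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def closest_tag(observed: str, tags: dict, max_distance: int = 1):
    """Return the name of the single tag within max_distance, else None.

    Hypothetical helper: ambiguous matches (more than one tag in range)
    are treated as unassigned.
    """
    hits = [name for name, seq in tags.items()
            if hamming(observed, seq) <= max_distance]
    return hits[0] if len(hits) == 1 else None
```

Hamming distance only counts substitutions, so it requires tags of equal length; edit distance also tolerates insertions and deletions in the index read.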

Notes

splitaake is under development. This means that it may break, be a pain to get running, etc. I'll improve it as I have time. Please feel free to suggest contributions/changes/additions.

Design

splitaake is currently designed as a single-core application, meaning that it does not parallelize the process of demultiplexing. After a number of tests, I've found that for most files a single-core approach is as fast as (and sometimes faster than) multi-core options, particularly when you wish to work with gzipped fastq files.
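
As a rough picture of what a single-core pass looks like, the sketch below streams a read file and an index file in lockstep and tallies reads per tag. It is an illustration only, not splitaake's code: it assumes index reads live in their own fastq file in the same order as the sequence reads, it uses exact matching where splitaake uses a distance measure, and `read_fastq` and `count_reads_per_tag` are hypothetical names.

```python
import gzip
from collections import Counter
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, quality) from a plain or gzipped fastq file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            # A fastq record is four lines: header, sequence, "+", quality.
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break
            yield record[0], record[1], record[3]


def count_reads_per_tag(reads_path, index_path, tags):
    """Stream both files together and tally reads by exact tag match."""
    by_sequence = {seq: name for name, seq in tags.items()}
    counts = Counter()
    for _read, index in zip(read_fastq(reads_path), read_fastq(index_path)):
        counts[by_sequence.get(index[1], "unassigned")] += 1
    return counts
```

Because the records are consumed as a stream, memory use stays flat regardless of file size, and the gzipped input is decompressed on the fly.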

I'm still testing additional ways of demultiplexing data in parallel. Hopefully more on this front soon...

Dependencies

seqtools ("working" branch):

pip install git+https://github.com/faircloth-lab/seqtools.git@working

jellyfish (used at the moment for its quite fast Hamming distance implementation):
```
pip install git+https://github.com/sunlightlabs/jellyfish
```

Running

Generate a config file mapping indexes to filenames. In the examples below, this file is named map.conf:

TruSeq1:ATCACGATCT
TruSeq2:CGATGTATCT
TruSeq3:TTAGGCATCT
TruSeq4:TGACCAATCT
TruSeq5:ACAGTGATCT
TruSeq6:GCCAATATCT
TruSeq7:CAGATCATCT
TruSeq8:ACTTGAATCT
TruSeq9:GATCAGATCT
TruSeq10:TAGCTTATCT
TruSeq11:GGCTACATCT
TruSeq12:CTTGTAATCT
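
Before a run, it can be useful to check how far apart the tags in the map are, since that determines how many sequencing errors can be tolerated. The sketch below is an assumption-laden illustration rather than splitaake's own parser: it reads only the `name:sequence` lines shown above, ignores anything else, and `parse_tag_map` and `min_pairwise_hamming` are hypothetical names.

```python
from itertools import combinations


def parse_tag_map(lines):
    """Build {name: sequence} from lines of the form name:sequence."""
    tags = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and any line without a name:sequence pair.
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, sequence = line.partition(":")
        tags[name.strip()] = sequence.strip().upper()
    return tags


def min_pairwise_hamming(tags):
    """Smallest number of mismatches between any two equal-length tags."""
    distances = [
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(tags.values(), 2)
        if len(a) == len(b)
    ]
    return min(distances) if distances else None


if __name__ == "__main__":
    with open("map.conf") as handle:
        tags = parse_tag_map(handle)
    print(len(tags), "tags, minimum pairwise distance:", min_pairwise_hamming(tags))
```

A set of tags with minimum pairwise distance d can in general correct up to (d - 1) // 2 substitution errors without ambiguity.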

Run splitaake:

python splitaake.py L007_R1.fastq.gz L007_R2.fastq.gz L007_R3.fastq.gz map.conf --section taxa

This identifies your reads and creates a directory, dmux, containing them as interleaved, gzipped fastq files, like so:
```
dmux/
    TruSeq1.fastq.gz
    TruSeq2.fastq.gz
    TruSeq3.fastq.gz
    ...
```
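
To consume one of these output files downstream, the pairs need to be split apart again. The following is a minimal sketch, assuming "interleaved" has its usual meaning (each read 1 record is immediately followed by its read 2 mate); `read_interleaved_pairs` is a hypothetical helper, not part of splitaake.

```python
import gzip
from itertools import islice


def read_interleaved_pairs(path):
    """Yield (read1, read2) tuples, each a 4-line fastq record, from a gzipped file."""
    with gzip.open(path, "rt") as handle:
        while True:
            # Two consecutive 4-line records make up one read pair.
            lines = [line.rstrip("\n") for line in islice(handle, 8)]
            if len(lines) < 8:
                break
            yield tuple(lines[:4]), tuple(lines[4:])


if __name__ == "__main__":
    pairs = sum(1 for _ in read_interleaved_pairs("dmux/TruSeq1.fastq.gz"))
    print("read pairs:", pairs)
```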
