Merge pull request #56 from phac-nml/allele
Allele
chadlaing authored Jul 4, 2018
2 parents 7eed1fc + 078d7d8 commit 38069b9
Showing 40 changed files with 930 additions and 96,591 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -84,6 +84,9 @@ ENV/

# IDE
.vscode/
.idea/
*.iml


# Spyder project settings
.spyderproject
@@ -110,3 +113,4 @@ output/
validation/enterobase_90_50_with_blacklist.csv
.coveragerc
coverage_html_report/
/.pytest_cache/
25 changes: 10 additions & 15 deletions .travis.yml
@@ -1,27 +1,22 @@
language: python
python:
# We don't actually use the Travis Python, but this keeps it organized.
- "3.6"
install:
before_install:
- sudo apt-get update
# We do this conditionally because it saves us some downloading if the
# version is the same.
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh;
else
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
fi
install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
- bash miniconda.sh -b -p $HOME/miniconda
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.bashrc
- source ~/.bashrc
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda update -y conda
# Useful for debugging any issues with conda
- conda info -a
- conda config --add channels bioconda
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION samtools bowtie2 mash bcftools biopython nose blast pandas seqtk
- source activate test-environment
- conda config --add channels conda-forge
- conda create -q -n test-environment python=3.6 samtools>=1.7 pandas>=0.23 bowtie2>=2.3 mash>=2 bcftools>=1.7 biopython>=1.69 blast>=2.2.31 seqtk>=1.2 pytest>=3.6
- conda activate test-environment
- python setup.py install

script:
- nosetests
- pytest
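
A note on the new `conda create` line: in bash an unquoted `>` starts an output redirection, so a spec written as `samtools>=1.7` is likely parsed as the argument `samtools` plus a redirect to a file named `=1.7`, and the version pin may never reach conda. Quoting each spec avoids this. The Python sketch below builds the same command with `shlex.quote`; the helper name `conda_create_command` is hypothetical and is not part of this commit.

```python
import shlex

# Package specs as they appear on the new `conda create` line in .travis.yml.
SPECS = [
    "python=3.6", "samtools>=1.7", "pandas>=0.23", "bowtie2>=2.3",
    "mash>=2", "bcftools>=1.7", "biopython>=1.69", "blast>=2.2.31",
    "seqtk>=1.2", "pytest>=3.6",
]


def conda_create_command(env_name, specs):
    """Return a `conda create` command line with every argument shell-quoted.

    Hypothetical helper for illustration only. shlex.quote leaves safe
    words such as `python=3.6` untouched and wraps anything containing
    shell metacharacters (here `>`) in single quotes.
    """
    args = ["conda", "create", "-q", "-n", env_name, *specs]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(conda_create_command("test-environment", SPECS))
```

The printed line can be pasted into the `install:` section in place of the unquoted one; only the specs containing `>` end up wrapped in single quotes.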
50 changes: 31 additions & 19 deletions README.md
@@ -1,14 +1,16 @@
[![Master branch build status](https://api.travis-ci.org/phac-nml/ecoli_serotyping.svg?branch=master "Master Build Status")](https://travis-ci.org/phac-nml/ecoli_serotyping)

# ECTyper (an easy typer)
**ecyper** wraps a standalone serotyping module for _Escherichia coli_.
Supports _fasta_ and _fastq_ file formats.
**ecyper** is a standalone serotyping module for _Escherichia coli_. It supports _fasta_ and _fastq_ file formats.

# Dependencies:
- python 3.6.3.*
- pytest 3.6.*
- pandas 0.21.0.*
- samtools 1.5.*
- bowtie2 2.3.0.*
- mash 1.1.*
- bcftools 1.6.*
- mash 2.0.*
- bcftools 1.8.*
- biopython 1.69.*
- blast 2.2.31 .*
- seqtk 1.2.*
@@ -17,48 +19,58 @@ Supports _fasta_ and _fastq_ file formats.
1. Get `miniconda` if you do not already have `miniconda` or `anaconda`:
1. `wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh`
1. `bash miniconda.sh -b -p $HOME/miniconda`
1. `export PATH="$HOME/miniconda/bin:$PATH"`
1. `echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.bashrc`
1. `source ~/.bashrc`
2. Install ectyper
* Directly via `conda`
1. `conda install -c bioconda ectyper`
* Through `github`
1. Install dependencies
`conda install pandas samtools bowtie2 mash bcftools biopython nose blast seqtk tqdm python=3.6`
`pandas samtools bowtie2 mash bcftools biopython pytest blast seqtk tqdm python=3.6`
1. Download git repository then unzip
`wget https://github.com/phac-nml/ecoli_serotyping/archive/master.zip`
1. Install ectyper inside unzipped directory
`python setup.py install`

# Basic Usage
1. Put all of your fasta/fastq files in one folder (concatenate paired files if you want the result to be considered a single entity)
1. Put the fasta/fastq files for serotyping analyses in one folder (concatenate paired files if you would like them to be considered a single entity)
1. `ectyper -i [file path]`
1. View the results on the console or in `output/[datetime]/output.csv`
1. View the results on the console or in `ectyper_[datetime]/output.csv`

# Example Usage
* `ectyper -i ecoliA.fasta` for a single file
* `ectyper -i ecoliA.fasta -o output_dir` for a single file, results stored in `output_dir`
* `ectyper -i ecoliA.fasta,ecoliB.fastq,ecoliC.fna` for multiple files
* `ectyper -i ecoli_folder` for a folder

# Advanced Usage
```
usage: ectyper [-h] -i INPUT [-d PERCENTIDENTITY] [-l PERCENTLENGTH]
[--verify] [-s] [--detailed] [-o OUTPUT]
usage: ectyper [-h] [-V] -i INPUT [-d PERCENTIDENTITY] [-l PERCENTLENGTH]
[--verify] [-o OUTPUT] [-r REFSEQ]
ectyper v0.4.0 Prediction of Escherichia coli serotype from raw reads or
assembled genome sequences
optional arguments:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-i INPUT, --input INPUT
Location of E. coli genome file(s). Can be a single
file or a directory
file, a comma-separated list of files, or a directory
-d PERCENTIDENTITY, --percentIdentity PERCENTIDENTITY
Percent identity required for an allele match [default 90]
Percent identity required for an allele match [default
90]
-l PERCENTLENGTH, --percentLength PERCENTLENGTH
Percent length required for an allele match [default 50]
Percent length required for an allele match [default
50]
--verify Enable E. coli species verification
-s, --species Enable species identification when a non-E. coli
genome is found Note: refseq downloading is required
when running this option for the first time
--detailed Enable detailed program output
-o OUTPUT, --output OUTPUT
Directory location of output files.
Directory location of output files
-r REFSEQ, --refseq REFSEQ
Location of pre-computed MASH RefSeq sketch. If
provided, genomes identified as non-E. coli will have
their species identified using MASH. For best results
the pre-sketched RefSeq archive https://gembox.cbcb.um
d.edu/mash/refseq.genomes.k21s1000.msh is recommended
```
* The first time species identification is enabled you will need to wait for **ectyper** to download the reference sequences.
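
For scripted runs, the options listed in the updated help text can be assembled programmatically. The following is a minimal Python sketch, assuming only the flags shown above (`-i`, `-d`, `-l`, `--verify`, `-o`, `-r`); the function name `ectyper_command` and its keyword arguments are hypothetical and are not part of ectyper itself.

```python
import subprocess


def ectyper_command(inputs, output=None, percent_identity=None,
                    percent_length=None, verify=False, refseq=None):
    """Build an argument list for the `ectyper` command-line tool.

    Hypothetical helper: it only mirrors the flags documented in the
    README help text and does not call any ectyper Python API.
    """
    # Multiple input files are passed as one comma-separated value.
    args = ["ectyper", "-i", ",".join(inputs)]
    if percent_identity is not None:
        args += ["-d", str(percent_identity)]
    if percent_length is not None:
        args += ["-l", str(percent_length)]
    if verify:
        args.append("--verify")
    if output is not None:
        args += ["-o", output]
    if refseq is not None:
        args += ["-r", refseq]
    return args


if __name__ == "__main__":
    cmd = ectyper_command(["ecoliA.fasta", "ecoliB.fastq"], output="output_dir")
    print(" ".join(cmd))
    # To actually run it (requires ectyper to be installed):
    # subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be handed to `subprocess.run` without going through a shell, so file names containing spaces need no extra quoting.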

45 changes: 0 additions & 45 deletions ectyper.puml

This file was deleted.

Binary file removed ectyper/Data/bowtie_index/combined.1.bt2
Binary file not shown.
Binary file removed ectyper/Data/bowtie_index/combined.2.bt2
Binary file not shown.
Binary file removed ectyper/Data/bowtie_index/combined.3.bt2
Binary file not shown.
Binary file removed ectyper/Data/bowtie_index/combined.4.bt2
Binary file not shown.
Binary file removed ectyper/Data/bowtie_index/combined.rev.1.bt2
Binary file not shown.
Binary file removed ectyper/Data/bowtie_index/combined.rev.2.bt2
Binary file not shown.