Merge pull request #225 from BinPro/develop
Version 1.0.0
Showing 49 changed files with 2,584 additions and 2,841 deletions.
@@ -0,0 +1,34 @@ (new file)
```dockerfile
# Docker for CONCOCT (http://github.com/BinPro/CONCOCT) v1.0.0
# VERSION 1.0.0
#
# This docker creates and sets up an Ubuntu environment with all
# dependencies for CONCOCT v1.0.0 installed.
#
# To login to the docker with a shared directory from the host do:
#
# docker run -v /my/host/shared/directory:/my/docker/location -i -t alneberg/concoct_1.0.0 /bin/bash
#

FROM ubuntu:18.04
COPY . /opt/CONCOCT

# Get basic ubuntu packages needed
RUN apt-get update -qq
RUN apt-get install -qq wget build-essential libgsl0-dev git zip unzip bedtools python-pip

RUN pip install --upgrade pip

# Install python dependencies and install CONCOCT 1.0.0
# (note: no trailing line continuation here, so the commented-out
# alternative below is not pulled into this RUN command)
RUN cd /opt/CONCOCT;\
    pip install -r requirements.txt

# Alternative: fetch the release tarball instead of using the local copy
# wget --no-check-certificate https://github.com/BinPro/CONCOCT/archive/1.0.0.tar.gz
# tar xf 1.0.0.tar.gz
# cd CONCOCT-1.0.0
# python setup.py install

RUN cd /opt/CONCOCT/;\
    python setup.py install

RUN cd /opt/CONCOCT/;\
    nosetests
```
```diff
@@ -1,12 +1,8 @@
-## CONCOCT 0.4.2 [![Build Status](https://travis-ci.org/BinPro/CONCOCT.png?branch=master)](https://travis-ci.org/BinPro/CONCOCT)
+## CONCOCT 1.0.0 [![Build Status](https://travis-ci.org/BinPro/CONCOCT.png?branch=master)](https://travis-ci.org/BinPro/CONCOCT)
 
 A program for unsupervised binning of metagenomic contigs by using nucleotide composition,
 coverage data in multiple samples and linkage data from paired end reads.
 
-Warning! This software is to be considered under development. Functionality and the user interface may still change significantly from one version to another.
-If you want to use this software, please stay up to date with the list of known issues:
-https://github.com/BinPro/CONCOCT/issues
-
 ## Please Cite ##
 If you use CONCOCT in your publication, please cite:
 
@@ -15,8 +11,37 @@ Johannes Alneberg, Brynjar Smári Bjarnason, Ino de Bruijn, Melanie Schirmer, Jo
 ## Documentation ##
 A comprehensive documentation for concoct is hosted on [readthedocs](https://concoct.readthedocs.org).
 
+## Basic Usage ##
+Cut contigs into smaller parts
+```bash
+cut_up_fasta.py original_contigs.fa -c 10000 -o 0 --merge_last -b contigs_10K.bed > contigs_10K.fa
+```
+
+Generate a table with coverage depth information per sample and subcontig.
+This step assumes the directory 'mapping' contains sorted and indexed bam files where each sample has been mapped against the original contigs.
+```bash
+concoct_coverage_table.py contigs_10K.bed mapping/Sample*.sorted.bam > coverage_table.tsv
+```
+
+Run concoct
+```bash
+concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b concoct_output/
+```
+
+Merge subcontig clustering into original contig clustering
+```bash
+merge_cutup_clustering.py concoct_output/clustering_gt1000.csv > concoct_output/clustering_merged.csv
+```
+
+Extract bins as individual FASTA
+```bash
+mkdir concoct_output/fasta_bins
+extract_fasta_bins.py original_contigs.fa concoct_output/clustering_merged.csv --output_path concoct_output/fasta_bins
+```
+
 ## Support ##
 If you are having issues, please let us know. We have a mailing list located at: [email protected] which you can also subscribe to [here](https://lists.sourceforge.net/lists/listinfo/concoct-support).
 [![Gitter](https://img.shields.io/badge/gitter-%20join%20chat%20%E2%86%92-4fb99a.svg?style=flat-square)](https://gitter.im/BinPro/CONCOCT)
 If you are having trouble running CONCOCT or interpreting any results, please don't hesitate to write a question in our gitter channel.
 
 ## Contribute ##
```
@@ -0,0 +1,82 @@ (new file)
```python
#!/usr/bin/env python
from __future__ import division

DESC = """A script that iterates over concoct results and reruns the concoct algorithm
for clusters where the median SCG presence is at least 2."""

import sys
import argparse

import numpy as np
import pandas as pd
import vbgmm  # CONCOCT's compiled variational Bayesian GMM module

from sklearn.decomposition import PCA


def main(argv):
    parser = argparse.ArgumentParser(description=DESC)
    parser.add_argument("cluster_file", help="CSV file with the concoct cluster assignments")
    parser.add_argument("original_data", help="CSV file with the original, already transformed, data")
    parser.add_argument("scg_file", help="CSV file with the SCG frequencies")
    parser.add_argument('-e', '--expansion_factor', default=2, type=int,
                        help="number of clusters to expand by")
    parser.add_argument('-t', '--threads', default=1, type=int,
                        help="number of threads to use, defaults to one")
    args = parser.parse_args(argv)

    clusters = pd.read_csv(args.cluster_file, header=None, index_col=0)
    original_data = pd.read_csv(args.original_data, header=0, index_col=0)
    original_data_matrix = original_data.values

    scg_freq = pd.read_csv(args.scg_file, header=0, index_col=0)
    scg_freq_matrix = scg_freq.values
    med_scgs = np.median(scg_freq_matrix, axis=1)

    clusters_matrix = clusters.values
    cluster_freq = np.bincount(clusters_matrix[:, 0])
    K = cluster_freq.shape[0]

    new_clusters_matrix = np.copy(clusters_matrix, order='C')
    nNewK = K - 1
    for k in range(K):
        if med_scgs[k] > 1:
            # Select the contigs assigned to cluster k
            select = clusters_matrix == k
            slice_k = original_data_matrix[select[:, 0], :]
            index_k = np.where(select[:, 0])[0]

            # Rerun PCA on this cluster alone, keeping 90% of the variance
            pca_object = PCA(n_components=0.90).fit(slice_k)
            transform_k = pca_object.transform(slice_k)

            # Rerun the clustering with an expanded number of clusters
            NK = med_scgs[k] * args.expansion_factor
            print("Run CONCOCT for {0} with {1} clusters using {2} threads".format(
                k, NK, args.threads))
            assigns = vbgmm.fit(np.copy(transform_k, order='C'), int(NK), int(args.threads))
            kK = np.max(assigns) + 1

            # Subcluster 0 keeps the original cluster id; the remaining
            # subclusters are given fresh, so far unused, cluster ids
            for a in range(1, kK):
                index_a = index_k[assigns == a]
                new_clusters_matrix[index_a] = nNewK + a

            nNewK = nNewK + kK - 1

    new_assign_df = pd.DataFrame(new_clusters_matrix, index=original_data.index)
    new_assign_df.to_csv("clustering_refine.csv")


if __name__ == "__main__":
    main(sys.argv[1:])
```
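The reassignment logic inside the loop above (subcluster 0 keeps the original cluster id, subclusters 1..kK-1 get fresh ids after the current maximum) can be isolated and checked on toy data. Since `vbgmm` is a compiled extension, the `assigns` array here is a hypothetical clustering result, and `renumber` is our own helper name:

```python
import numpy as np


def renumber(clusters, index_k, assigns, n_new_k):
    """Mirror of the reassignment loop in the refine script above.

    clusters : 1-D array of current cluster ids for all contigs
    index_k  : indices (into clusters) of the members of the split cluster
    assigns  : subcluster label per member, as vbgmm.fit would return
    n_new_k  : highest cluster id handed out so far
    """
    new_clusters = clusters.copy()
    kK = int(np.max(assigns)) + 1   # number of subclusters found
    for a in range(1, kK):          # subcluster 0 keeps its old id
        new_clusters[index_k[assigns == a]] = n_new_k + a
    return new_clusters, n_new_k + kK - 1
```

For example, splitting cluster 0 of `[0, 0, 0, 1, 1]` into two subclusters with `n_new_k = 1` (that is, K - 1 for K = 2) relabels the second subcluster as 2 and leaves cluster 1 untouched.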
@@ -0,0 +1,16 @@ (new file)
```makefile
CC = gcc
CFLAGS = -std=c99 -g -I/usr/local/include/
EFLAGS =
EFILE = test_vbgmmfit
LIBS = -lgomp -lpthread -lm -lgsl -lgslcblas -L/usr/local/lib
OBJS = c_vbgmm_fit.o test_vbgmm_fit.o

$(EFILE) : $(OBJS)
	@echo "linking..."
	$(CC) $(EFLAGS) -o $(EFILE) $(OBJS) $(LIBS)

$(OBJS) : c_vbgmm_fit.c c_vbgmm_fit.h
	$(CC) $(CFLAGS) -c $*.c

clean:
	rm -rf *.o test_vbgmmfit
```