Skip to content

Commit

Permalink
Merge branch 'release/1.0.1'
Browse files Browse the repository at this point in the history
  • Loading branch information
Ubuntu committed May 9, 2017
2 parents 804f702 + 2fb764f commit 9125d85
Show file tree
Hide file tree
Showing 16 changed files with 2,600 additions and 0 deletions.
141 changes: 141 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,143 @@
# dcc-action-service

## Introduction

A place for our business logic written in Luigi that triggers workflow launches.

## Install

### Ubuntu 14.04

You need to make sure you have system level dependencies installed in the appropriate way for your OS. For Ubuntu 14.04 you do:

sudo apt-get install python-dev libxml2-dev libxslt-dev lib32z1-dev

### Elasticsearch

Download and install [elasticsearch](https://www.elastic.co). I found the debian package install to be easiest on Ubuntu. Start it using the /etc/init.d/elasticsearch script. Or, if you're on a mac, you can download a tarball and just execute the ./bin/elasticsearch script. I'm using version 5.0.0.

NOTE: Elasticsearch is not secure. Do not run it on a web server open to the outside world.

### Python

Use python 2.7.x.

See [here](https://www.dabapps.com/blog/introduction-to-pip-and-virtualenv-python/) for information on setting
up a virtual environment for Python.

If you haven't already installed pip and virtualenv, depending on your system you may
(or may not) need to use `sudo` for these:

sudo easy_install pip
sudo pip install virtualenv

Now to setup:

virtualenv env
source env/bin/activate
pip install jsonschema jsonmerge openpyxl sets json-spec elasticsearch semver luigi python-dateutil cwl-runner cwltool==1.0.20160316150250 schema-salad==1.7.20160316150109 avro==1.7.7 typing

Alternatively, you may want to use Conda, see [here](http://conda.pydata.org/docs/_downloads/conda-pip-virtualenv-translator.html)
[here](http://conda.pydata.org/docs/test-drive.html), and [here](http://kylepurdon.com/blog/using-continuum-analytics-conda-as-a-replacement-for-virtualenv-pyenv-and-more.html)
for more information.

conda create -n schemas-project python=2.7.11
source activate schemas-project
pip install jsonschema jsonmerge openpyxl sets json-spec elasticsearch semver luigi python-dateutil cwl-runner cwltool==1.0.20160316150250 schema-salad==1.7.20160316150109 avro==1.7.7 typing

### Consonance Command Line

TODO: need to document the install of the Consonance command line.
, instead the user will provide a directory that contains json files.

## Manually Calling

```
# example run (local mode)
rm -rf /tmp/AlignmentQCTask* /tmp/afb54dff-41ad-50e5-9c66-8671c53a278b; PYTHONPATH='' luigi --module AlignmentQCTask AlignmentQCCoordinator --local-scheduler --es-index-host localhost --es-index-port 9200
# example run (local mode) of next gen version
rm -rf /tmp/consonance-jobs/AlignmentQCTask*; PYTHONPATH='.' luigi --module AlignmentQCTaskV2 AlignmentQCCoordinatorV2 --local-scheduler --es-index-host localhost --es-index-port 9200
# run with central scheduler
mkdir -p /mnt/AlignmentQCTask
rm -rf /tmp/AlignmentQCTask* /tmp/9c09bca7-8ffa-54fe-a1c9-3f8a71df515b; PYTHONPATH='' luigi --module AlignmentQCTask AlignmentQCCoordinator --es-index-host localhost --es-index-port 9200 --ucsc-storage-client-path ../ucsc-storage2-client --ucsc-storage-host https://storage2.ucsc-cgl.org --tmp-dir `pwd`/luigi_state --data-dir /mnt/AlignmentQCTask
# another test with AlignmentQC
git hf update; git hf pull; PYTHONPATH='' luigi --module AlignmentQCTask AlignmentQCCoordinator --es-index-host localhost --es-index-port 9200 --ucsc-storage-client-path ../ucsc-storage2-client --ucsc-storage-host https://storage2.ucsc-cgl.org --tmp-dir `pwd`/luigi_state --data-dir /mnt/AlignmentQCTask --max-jobs 1
# now test sequence QC runner (can use --local-scheduler for local schedule)
cd luigi_task_executor
rm -rf /tmp/consonance-jobs; PYTHONPATH='' luigi --module SequenceQCTask SequenceQCCoordinator --es-index-host localhost --es-index-port 9200 --redwood-token `cat ../accessToken` --redwood-client-path ../ucsc-storage-client --redwood-host storage.ucsc-cgl.org --tmp-dir /tmp --data-dir /mnt/SequenceQCTask --max-jobs 1
# local scheduler
PYTHONPATH='.' luigi --module SequenceQCTask SequenceQCCoordinator --es-index-host 172.31.25.227 --es-index-port 9200 --redwood-token `cat ../accessToken` --redwood-client-path ../ucsc-storage-client --redwood-host storage.ucsc-cgl.org --local-scheduler --tmp-dir /tmp --data-dir /mnt/SequenceQCTask --max-jobs 1 --image-descriptor /home/ubuntu/luigi_consonance_rnaseq_testing/test_here/Dockstore.cwl
```

Monitor on the web GUI:
http://localhost:8082/static/visualiser/index.html#

## Demo

Goal: create sample single donor documents and perform queries on them.

1. Install the needed packages as described above.
1. Generate metadata for multiple donors using `generate_metadata.py`, see command above
1. Create single donor documents using `merge_gen_meta.py`, see command above
1. Load into ES index, see `curl -XPUT` command above
1. Run the queries using `esquery.py`, see command above
1. Optionally, deleted the index using the `curl -XDELETE` command above

The query script, `esquery.py`, produces output whose first line prints the number of documents searched upon.
The next few lines are center, program and project.
Following those lines, are the queries, which give information on:
* specifications of the query
* number of documents that fit the query
* number of documents that fit this query for a particular program
* project name

### services

Make sure both luigi and elasticsearch are running in screen sessions.

# screen session 1
~/elasticsearch-2.3.5/bin/elasticsearch &
# or if you use service (as you should)
sudo service elasticsearch restart
# screen session 2
source env/bin/activate
luigid

### simulate_upload.py

This script runs an unlimited number of BAM file uploads at random intervals. The script will run until killed.

cd luigi_task_executor
python simulate_upload.py --bam-url https://s3.amazonaws.com/oconnor-test-bucket/sample-data/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam \
--input-metadata-schema ../input_metadata.json --metadata-schema ../metadata_schema.json --output-dir output_metadata --receipt-file receipt.tsv \
--storage-access-token `cat ../ucsc-storage2-client/accessToken` --metadata-server-url https://storage2.ucsc-cgl.org:8444 \
--storage-server-url https://storage2.ucsc-cgl.org:5431 --ucsc-storage-client-path ../ucsc-storage2-client

Another script, this time it simulates the upload of fastq files:

cd luigi_task_executor
python simulate_upload_rnaseq_fastq.py --fastq-r1-path \
https://s3.amazonaws.com/oconnor-test-bucket/sample-data/ERR030886_1.fastq.gz \
--fastq-r2-path https://s3.amazonaws.com/oconnor-test-bucket/sample-data/ERR030886_2.fastq.gz \
--input-metadata-schema ../input_metadata.json --metadata-schema ../metadata_schema.json \
--output-dir output_metadata --receipt-file receipt.tsv \
--storage-access-token `cat ../ucsc-storage2-client/accessToken` --metadata-server-url https://storage2.ucsc-cgl.org:8444 \
--storage-server-url https://storage2.ucsc-cgl.org:5431 --ucsc-storage-client-path ../ucsc-storage2-client

### simulate_indexing.py

cd luigi_task_executor
python simulate_indexing.py --storage-access-token `cat ../ucsc-storage2-client/accessToken` --client-path ../ucsc-storage2-client --metadata-schema ../metadata_schema.json --server-host storage2.ucsc-cgl.org

### simulate_analysis.py

cd luigi_task_executor
python simulate_analysis.py --es-index-host localhost --es-index-port 9200 --ucsc-storage-client-path ../ucsc-storage2-client --ucsc-storage-host https://storage2.ucsc-cgl.org

# temp
git hf update; git hf pull; PYTHONPATH='' luigi --module AlignmentQCTask AlignmentQCCoordinator --es-index-host localhost --es-index-port 9200 --ucsc-storage-client-path ../ucsc-storage2-client --ucsc-storage-host https://storage2.ucsc-cgl.org --tmp-dir `pwd`/luigi_state --data-dir /mnt/AlignmentQCTask --max-jobs 1
Loading

0 comments on commit 9125d85

Please sign in to comment.