Merge pull request #3 from BD2KGenomics/feature#1/ManifestDownload
Feature#1/manifest download
briandoconnor authored Dec 21, 2016
2 parents e730a07 + fdcfb66 commit 2c1cb71
Showing 6 changed files with 110 additions and 5 deletions.
26 changes: 24 additions & 2 deletions README.md
@@ -22,5 +22,27 @@ Look in _/dcc/ucsc-storage-client/conf/_ for further configuration options.
## Development
Build docker image with:
```
mvn && tar xf target/redwood-client-1.0.1-SNAPSHOT-dist.tar.gz && docker build -t benjaminran/redwood-client redwood-client-1.0.1-SNAPSHOT; rm -r redwood-client-1.0.1-SNAPSHOT
```
mvn && tar xf target/redwood-client-1.0.1-SNAPSHOT-dist.tar.gz && docker build -t quay.io/ucsc_cgl/core-client:1.0.0 redwood-client-1.0.1-SNAPSHOT; rm -r redwood-client-1.0.1-SNAPSHOT
```

## Upload via Spinnaker

Create a manifest that links your metadata and data. Your `manifest.tsv` should be a TSV based on this [template](https://docs.google.com/spreadsheets/d/13fqil92C-Evi-4cy_GTnzNMmrD0ssuSCx3-cveZ4k70/edit?usp=sharing).

File paths in the manifest must start with `/dcc/data`, since that is where the `docker run` command below mounts your upload directory inside the container.

You should create a directory where you want to have your files for upload (assumed to be `pwd`), place your `manifest.tsv` in this directory along with all your files for upload, and then execute the following:

docker run --rm -it -e ACCESS_TOKEN=<access_token> -e REDWOOD_ENDPOINT=storage.ucsc-cgl.org -v `pwd`:/dcc/data quay.io/ucsc_cgl/core-client:1.0.0 spinnaker-upload /dcc/data/manifest.tsv
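
Because `pwd` is mounted at `/dcc/data`, each file in the upload directory has a predictable path inside the container. The sketch below shows that mapping; `to_container_path` is a hypothetical helper (not part of the client), and the example host paths are made up for illustration.

```shell
# Hypothetical helper (not shipped with the client): map a file inside the
# upload directory to the path it will have in the container, where the
# upload directory is mounted at /dcc/data.
to_container_path() {
  local upload_dir=${1%/} file=$2
  case $file in
    "$upload_dir"/*)
      printf '/dcc/data/%s\n' "${file#"$upload_dir"/}" ;;
    *)
      echo "error: $file is not under $upload_dir" >&2
      return 1 ;;
  esac
}

# Example with made-up paths:
to_container_path /home/me/upload /home/me/upload/sample1.bam   # prints /dcc/data/sample1.bam
```

The printed value is what would go in the manifest's file path entry for that file.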

Once the upload completes, you will find a receipt file (`spinnaker/output_metadata/receipt.tsv`), which you should save. It lists the IDs assigned to your donor, specimen, sample, and file, which make the upload much easier to find and audit later.

NOTE: Uploads can take a long time, and the command-line progress feedback still needs to be improved. I suggest using a tool like `dstat` to monitor network usage and confirm that the upload is in progress.
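
If `dstat` is not installed, a rough alternative on Linux is to sample the kernel's per-interface byte counters in `/proc/net/dev`. This is a swapped-in technique, not something the client provides: the helper names `tx_bytes` and `upload_rate` are hypothetical, and the interface name `eth0` is an assumption that will differ between hosts.

```shell
# Hypothetical helper: print the cumulative transmitted-byte counter for one
# network interface. Reads /proc/net/dev (Linux only) unless another file in
# the same format is given as the second argument.
tx_bytes() {
  local iface=$1 file=${2:-/proc/net/dev}
  # Turn "eth0:123" into "eth0 123" so the interface name is field 1;
  # the transmit byte count is then field 10.
  awk -v i="$iface" '{ sub(/:/, " ") } $1 == i { print $10 }' "$file"
}

# Hypothetical helper: print the average upload rate in bytes per second
# over an interval (default 5 seconds) for the given interface.
upload_rate() {
  local iface=${1:-eth0} interval=${2:-5} before after
  before=$(tx_bytes "$iface")
  sleep "$interval"
  after=$(tx_bytes "$iface")
  echo $(( (after - before) / interval ))
}
```

Running `upload_rate eth0 5` in a second terminal while an upload is in progress should print a clearly non-zero number if data is being sent.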

## Download via Manifest

This assumes the current working directory (`pwd`) has a manifest, like the ones you can download from http://ucsc-cgl.org/file_browser/. The command below will then download the files to the current working directory.

NOTE: Make sure `pwd` has enough free disk space for the downloaded files.
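
One way to check free space before starting is sketched below. `has_free_space` is a hypothetical helper, and the required size is something you have to estimate yourself (for example from the file sizes shown in the file browser); the client does not report it.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEEDED_KB kilobytes available. Uses POSIX-format df output, where the
# fourth column of the second line is the available space in 1024-byte blocks.
has_free_space() {
  local dir=$1 needed_kb=$2 avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: warn unless the current directory has about 50 GB free
# (the 50 GB figure is an arbitrary placeholder).
has_free_space . $((50 * 1024 * 1024)) || echo "warning: less than 50 GB free in $(pwd)" >&2
```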

docker run --rm -e ACCESS_TOKEN=<access_token> -e REDWOOD_ENDPOINT=storage.ucsc-cgl.org -v `pwd`:/dcc/data quay.io/ucsc_cgl/core-client:1.0.0 redwood-download /dcc/data/manifest.tsv /dcc/data/
2 changes: 1 addition & 1 deletion src/main/bin/download
@@ -13,7 +13,7 @@
accessToken=${ACCESS_TOKEN}
metadata_server_url=https://${REDWOOD_ENDPOINT}:8444
storage_server_url=https://${REDWOOD_ENDPOINT}:5431
-trust_store_path=${DCC_HOME}/cert/devcacerts
+trust_store_path=${DCC_HOME}/cert/clientcacerts
trust_store_pass=${UCSC_STORAGE_TRUSTSTORE_PASSWORD}

# setup
24 changes: 24 additions & 0 deletions src/main/bin/redwood-download
@@ -0,0 +1,24 @@
#!/bin/bash

#
# Usage: redwood-download object-manifest output-dir
#

: "${DCC_HOME:?Need to set environment variable DCC_HOME}"
: "${ACCESS_TOKEN:?Need to set environment variable ACCESS_TOKEN}"
: "${REDWOOD_ENDPOINT:?Set environment variable REDWOOD_ENDPOINT to e.g. storage.ucsc-cgl.org}"
: "${UCSC_STORAGE_TRUSTSTORE_PASSWORD:?Need to set environment variable UCSC_STORAGE_TRUSTSTORE_PASSWORD}"

# config
accessToken=${ACCESS_TOKEN}
metadata_server_url=https://${REDWOOD_ENDPOINT}:8444
storage_server_url=https://${REDWOOD_ENDPOINT}:5431
trust_store_path=${DCC_HOME}/cert/clientcacerts
trust_store_pass=${UCSC_STORAGE_TRUSTSTORE_PASSWORD}

# setup
object=$1
download=$2

#download files in a manifest
java -Djavax.net.ssl.trustStore=${trust_store_path} -Djavax.net.ssl.trustStorePassword=${trust_store_pass} -Dmetadata.url=${metadata_server_url} -Dmetadata.ssl.enabled=true -Dclient.ssl.custom=false -Dstorage.url=${storage_server_url} -DaccessToken=${accessToken} -jar ${DCC_HOME}/lib/icgc-storage-client-1.0.14-SNAPSHOT/lib/icgc-storage-client.jar download --output-dir ${download} --output-layout bundle --manifest ${object} --force
28 changes: 28 additions & 0 deletions src/main/bin/spinnaker-upload
@@ -0,0 +1,28 @@
#!/bin/bash
set -x
#
# Usage: spinnaker-upload manifest.tsv
#

: "${DCC_HOME:?Set DCC_HOME to directory containing ucsc-storage-client}"
: "${ACCESS_TOKEN:?Need to set environment variable ACCESS_TOKEN}"
: "${REDWOOD_ENDPOINT:?Need to set environment variable REDWOOD_ENDPOINT to e.g. storage.ucsc-cgl.org}"
: "${UCSC_STORAGE_TRUSTSTORE_PASSWORD:?Need to set environment variable UCSC_STORAGE_TRUSTSTORE_PASSWORD}"

#config
accessToken=${ACCESS_TOKEN}
metadata_server_url=https://${REDWOOD_ENDPOINT}:8444
storage_server_url=https://${REDWOOD_ENDPOINT}:5431
tsv_file=$1

#Go to the right directory and execute the script
cd ${DCC_HOME}/dcc-spinnaker-client
python spinnaker.py \
--input-metadata-schema schemas/input_metadata.json \
--metadata-schema schemas/metadata_schema.json \
--output-dir ../data/spinnaker/output_metadata \
--receipt-file receipt.tsv \
--storage-client-path ${DCC_HOME}/dcc-spinnaker-client/ucsc-storage-client \
--force-upload \
--storage-access-token ${ACCESS_TOKEN} \
${tsv_file}
2 changes: 1 addition & 1 deletion src/main/bin/upload
@@ -22,7 +22,7 @@ cp $* ${upload}
accessToken=${ACCESS_TOKEN}
metadata_server_url=https://${REDWOOD_ENDPOINT}:8444
storage_server_url=https://${REDWOOD_ENDPOINT}:5431
-trust_store_path=${DCC_HOME}/cert/devcacerts
+trust_store_path=${DCC_HOME}/cert/clientcacerts
trust_store_pass=${UCSC_STORAGE_TRUSTSTORE_PASSWORD}

# register upload
33 changes: 32 additions & 1 deletion src/main/docker/Dockerfile
@@ -1,4 +1,5 @@
FROM ubuntu:latest
MAINTAINER Carlos Espinosa <[email protected]>

# install oracle java
RUN apt-get update
@@ -8,7 +9,29 @@ RUN apt-get update
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections
RUN apt-get install -y oracle-java8-installer

-RUN apt-get update && apt-get install -y uuid-runtime
#Setting up the spinnaker client
RUN apt-get update && apt-get install -y --force-yes \
python-dev \
python-pip \
build-essential \
libxml2-dev \
libxslt-dev \
lib32z1-dev \
python-setuptools \
git \
uuid-runtime

RUN easy_install pip

RUN pip install \
jsonschema \
jsonmerge \
openpyxl \
sets \
json-spec \
elasticsearch \
semver \
luigi

# add dcc files
ENV DCC_HOME /dcc
@@ -18,6 +41,14 @@ ADD bin ${DCC_HOME}/bin
ADD cert ${DCC_HOME}/cert
ADD lib ${DCC_HOME}/lib

# install spinnaker
RUN git clone -b release/1.0.0 --single-branch https://github.com/BD2KGenomics/dcc-spinnaker-client.git ${DCC_HOME}/dcc-spinnaker-client/
RUN cd ${DCC_HOME}/dcc-spinnaker-client/ && wget https://s3-us-west-2.amazonaws.com/beni-dcc-storage-dev/20161216_ucsc-storage-client.tar.gz && tar zxf 20161216_ucsc-storage-client.tar.gz && rm -f 20161216_ucsc-storage-client.tar.gz

# make everything runnable
RUN chmod a+x ${DCC_HOME}/bin/*

#Set environment variables
ENV PATH ${DCC_HOME}/bin:$PATH
ENV REDWOOD_ENDPOINT storage.ucsc-cgl.org
ENV UCSC_STORAGE_TRUSTSTORE_PASSWORD=changeit
