Feature request: Allow using an existing instance template on gcp #236

Open
infalmo opened this issue Feb 23, 2022 · 4 comments

Comments

@infalmo

infalmo commented Feb 23, 2022

No description provided.

@mbookman
Contributor

Thanks for the feature request, @infinitepr0!

Can you describe more of the motivation behind the request?

The Cloud Life Sciences API that dsub uses to run tasks does not allow for the specification of an instance template:

https://cloud.google.com/life-sciences/docs/reference/rest/v2beta/projects.locations.pipelines/run#VirtualMachine

So we'd need to bubble up the request for the feature in dsub to the Google team supporting the API. The more you are able to articulate the value of the feature and what capabilities you are currently missing, the better the chance that they can resource an update to the API.

Thanks!

@slagelwa

Might make it easier/simpler to submit jobs?

E.g. one might be able to replace this:

dsub \
    --provider google-cls-v2 \
    --network projects/XXXXX/global/networks/XXXXX-shared \
    --subnetwork projects/YYYYY/regions/us-west1/subnetworks/YYYYY-west1 \
    --service-account [email protected] \
    --region us-west1 \
    --use-private-address \
    --min-ram 32 \
    --min-cores 8 \
    --boot-disk-size 10 \
    --disk-size 1500 \
    --project myproject \
    --image us.gcr.io/myproject/bcl2fastq2:2.20.0 \
    --logging gs://mybucket/logging/ \
    --input-recursive INPUT_PATH=gs://mybucket/run \
    --output-recursive OUTPUT_PATH=gs://mybucket/fastq \
    --command 'bcl2fastq \
         --runfolder-dir /mnt/data/input/run \
         --output-dir /mnt/data/output/fastq \
         --sample-sheet /mnt/data/input/SampleSheet.csv' \
    --wait

with this?

dsub \
    --provider google-cls-v2 \
    --template convert \
    --project myproject \
    --image us.gcr.io/myproject/bcl2fastq2:2.20.0 \
    --logging gs://mybucket/logging/ \
    --input-recursive INPUT_PATH=gs://mybucket/run \
    --output-recursive OUTPUT_PATH=gs://mybucket/fastq \
    --command 'bcl2fastq \
         --runfolder-dir /mnt/data/input/run \
         --output-dir /mnt/data/output/fastq \
         --sample-sheet /mnt/data/input/SampleSheet.csv' \
    --wait

Granted, if it were something you ran frequently, most people would probably just put it in a script with a few parameters and use that. But with a template, if any of your machine or networking parameters need to change, you update the template once instead of hunting down all your scripts.
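
For comparison, the script approach could look roughly like the sketch below. It is only an illustration: the function name `run_dsub` and the `DSUB_BIN` override are made up here, and the flag values are copied from the first example (the `--service-account` flag is left out because the address is not readable in this thread).

```shell
# Hypothetical wrapper: shared machine and networking flags live in one place,
# and per-job flags are passed through unchanged via "$@".
# DSUB_BIN can be set to another command (e.g. echo) for a dry run.
run_dsub() {
  "${DSUB_BIN:-dsub}" \
    --provider google-cls-v2 \
    --network projects/XXXXX/global/networks/XXXXX-shared \
    --subnetwork projects/YYYYY/regions/us-west1/subnetworks/YYYYY-west1 \
    --region us-west1 \
    --use-private-address \
    --min-ram 32 \
    --min-cores 8 \
    --boot-disk-size 10 \
    --disk-size 1500 \
    "$@"
}
```

A job would then be submitted with only its own flags, e.g. `run_dsub --project myproject --image us.gcr.io/myproject/bcl2fastq2:2.20.0 ... --wait`. This still leaves one copy of the settings per script file that defines the function, which is the maintenance cost a template would avoid.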

@mbookman
Contributor

FWIW, we are in the process of adding support for the new Google Batch API.

One feature of the API is the ability to create a job from a Compute Engine instance template.

Once we have feature parity and stability of dsub with the Batch provider, we'll explore some of the new capabilities that the new API enables.
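
As a rough illustration of that Batch capability (used directly through `gcloud`, not through dsub, which does not expose it as of this thread), a job config can name an instance template in its allocation policy. In the sketch below, the template name `convert` is the hypothetical one from the earlier comment, and the job name, location, busybox container, and `SUBMIT_JOB` guard are placeholders chosen for the example.

```shell
# Write a minimal Batch job config whose VM settings come from an existing
# instance template. "convert" is a hypothetical template name.
cat > job.json <<'EOF'
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "container": {
              "imageUri": "gcr.io/google-containers/busybox",
              "entrypoint": "/bin/sh",
              "commands": ["-c", "echo Hello from Batch"]
            }
          }
        ]
      },
      "taskCount": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      { "instanceTemplate": "convert" }
    ]
  },
  "logsPolicy": { "destination": "CLOUD_LOGGING" }
}
EOF

# Submit only when explicitly requested, so the file can be inspected first.
if [ "${SUBMIT_JOB:-0}" = "1" ]; then
  gcloud batch jobs submit example-job --location us-west1 --config job.json
fi
```

With this layout, machine type, disks, and networking would be changed by editing the instance template rather than the job config.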

@slagelwa

I hadn't heard of the Google Batch API. It looks like a replacement for Life Sciences? Do you think there will be any limitations when using Batch instead of Life Sciences?
