[WIP] Add Nextflow example #39

Draft
wants to merge 6 commits into
base: main

Conversation

jluethi
Collaborator

@jluethi jluethi commented Mar 31, 2023

Closes #38

This is still a work in progress and needs to be generalized a bit further. But example 01 runs successfully with this workflow, and it already uses the parameters from our parameter JSON files in the examples.
I'll update it a bit more next week before merging :)

Major parts to explore:

  • Currently, the components passed to each task are hard-coded for this example.
  • Currently, everything is tested for 1 well only.
  • Currently, Nextflow and the tasks run in the same environment, and all tasks share one conda environment. This can be generalized further.
  • Currently, every task is defined as its own process, but the processes are very similar. This could potentially be generalized (but maybe separate processes are what we'd want anyway, since they make for an easy overview of the Nextflow workflow).
  • I haven't figured out yet where the task logs go.

Collaborator Author

jluethi commented Apr 4, 2023

Quick update on the Nextflow tests: I've now got the Nextflow example to a state where example 01 runs successfully through Nextflow with minimal hard-coding. A few parameters still need to be set, but nothing is hard-coded in the individual processes (each process is 1 task). I also wrote some helper functions to parse metadata and components; they run inside the processes as part of the script (a bit of pre- and postprocessing).

This makes the processes more task-specific now: they no longer all have the same content (even apart from the different task names being called). Some processes generate metadata, while others don't actually add useful updates to it.
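
To illustrate the kind of pre-processing helper mentioned above, here is a minimal sketch of how a process script might look up the components a task should run on. It is not the code from this PR: the function name `components_for_level`, the metadata keys (`plate`, `well`, `image`) and the example paths are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def components_for_level(metadata: dict[str, Any], level: str) -> list[str]:
    """Return the components a task should run on for the given level.

    Assumption (hypothetical layout): the metadata dict stores one list of
    component paths per level, under the keys "plate", "well" and "image".
    """
    if level not in ("plate", "well", "image"):
        raise ValueError(f"unknown level: {level!r}")
    # A missing key is treated as "no components at this level yet".
    return list(metadata.get(level, []))


# Hypothetical metadata for a single-well plate (paths are made up).
example_metadata = {
    "plate": ["plate.zarr"],
    "well": ["plate.zarr/B/03/"],
    "image": ["plate.zarr/B/03/0/"],
}

for component in components_for_level(example_metadata, "image"):
    print(component)
```

A helper like this keeps the component list out of the process definition itself, so the same process could later be fed more than one well without changes.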

I've added 2 versions now:
run_nextflow.nf updates the metadata after each task. That way, each task depends on the prior task and only executes once that one has finished.
run_nextflow_better_metadata_handling.nf only updates the metadata in the tasks that run at the plate level. By default, though, processes then don't wait for prior processes to finish, so I added dummy outputs that are passed to the next process as an input (clearly not the intended Nextflow behavior, but I couldn't get anything else to work so far).

Next steps:
Parallelize over multiple wells: the component logic should help prepare for this.

Hurdles:

  1. How do I make some processes wait (e.g. copy-ome-zarr should wait for all yoko-to-ome-zarr processes to finish)?
  2. Handling metadata: Would approach 1 still work, i.e. every process writes consolidated metadata and passes it on to the next process? I don't understand Nextflow well enough to judge this yet.
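
For the second hurdle, the "consolidated metadata" idea can be sketched independently of Nextflow. The snippet below is only an illustration under assumptions: `apply_update` is a hypothetical helper, the metadata keys and paths are made up, and a task that adds nothing is modelled as returning `None`.

```python
from __future__ import annotations

from functools import reduce
from typing import Any, Optional


def apply_update(
    metadata: dict[str, Any], update: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Return a new metadata dict with one task's update merged in.

    The input dict is not modified; keys in the update overwrite existing keys.
    """
    merged = dict(metadata)
    if update:
        merged.update(update)
    return merged


# Hypothetical starting metadata and per-task updates, in execution order.
initial = {"well": ["plate.zarr/B/03/"]}
task_updates = [
    {"image": ["plate.zarr/B/03/0/"]},  # a task that adds image components
    None,                               # a task with nothing useful to add
    {"plate": ["plate_mip.zarr"]},      # a plate-level task
]

# Chain the updates: each step consumes the previous consolidated metadata.
final = reduce(apply_update, task_updates, initial)
print(final)
```

Chaining like this works as long as tasks run one after another. Once several wells run in parallel, each process would produce its own update, so those updates would presumably have to be gathered and merged in one place before the next plate-level task starts.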

@jluethi jluethi marked this pull request as draft April 17, 2023 14:42

Successfully merging this pull request may close these issues.

Add nextflow examples
1 participant