Help modifying startTasksFrom to support sub-WorkflowTasks #6

hyjkim opened this issue Jul 18, 2015 · 1 comment

hyjkim commented Jul 18, 2015

Pyflow has a nice option for starting from a particular task ID when a workflow is run, e.g.:

wf = SomeWorkflow()
wf.run(startFromTasks='task_id')

This works great as long as your workflows are just one level deep. In some instances, though, I'd like to start from a specific sub-workflow task. Take the following example:

from pyflow import WorkflowRunner
import sys

class ChildA(WorkflowRunner):
    def workflow(self):
        self.flowLog('ChildA called')


class ChildB(WorkflowRunner):
    def workflow(self):
        self.flowLog('ChildB called')
        gcc_wc = GrandchildC()
        gcc_task = self.addWorkflowTask('grandchild_c', gcc_wc)

        gcd_wc = GrandchildD()
        self.addWorkflowTask('grandchild_d', gcd_wc, dependencies=gcc_task)

class GrandchildC(WorkflowRunner):
    def workflow(self):
        self.flowLog('GrandchildC called')


class GrandchildD(WorkflowRunner):
    def workflow(self):
        self.flowLog('GrandchildD called')

class Master(WorkflowRunner):
    def workflow(self):
        a_wf = ChildA()
        a_task = self.addWorkflowTask('child_a', a_wf)
        b_wf = ChildB()
        self.addWorkflowTask('child_b', b_wf, dependencies=a_task)

if __name__ == "__main__":
    startFromTasks = None
    if len(sys.argv) > 1:
        startFromTasks = sys.argv[1]
    wf = Master()
    wf.run(startFromTasks=startFromTasks, isContinue='Auto')

The workflow is launched as:

python workflow.py

Running the whole workflow generates a task graph like this:

[Image: example state]

As in my initial example, it's simple enough to start from a child workflow:

python workflow.py child_b

But trying to start from a specific grandchild workflow results in no tasks being run at all:

python workflow.py child_b+grandchild_d

I'm guessing this is due to the way that pyflow builds its DAG. A grandchild task will not be added to the DAG if the child task is already marked as complete.
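
To make that guess concrete, here is a toy model in plain Python. It is not pyflow's code: `ToyTask`, `run_toy`, and the `expand_ancestors` switch are hypothetical names. The rule that tasks outside the start set are treated as complete and never expanded is an assumption based on the guess above. The only detail taken from the real run is the `+` separator in nested labels such as `child_b+grandchild_d`.

```python
# Toy model only: ToyTask and run_toy are hypothetical, not pyflow APIs.

class ToyTask:
    def __init__(self, label, children=()):
        self.label = label
        # Tasks of the sub-workflow, listed in dependency order.
        self.children = list(children)


def run_toy(tasks, start_from=None, expand_ancestors=False, namespace=""):
    """Return the full labels of the tasks that would run.

    Assumed rule: tasks ahead of the start task are treated as complete,
    and a complete task is never expanded, so its sub-workflow tasks are
    never added to the graph.
    """
    ran = []
    started = start_from is None
    for task in tasks:
        full = namespace + task.label
        if started or full == start_from:
            # This task runs, and so does everything in its sub-workflow.
            started = True
            ran.append(full)
            ran.extend(run_toy(task.children, None, expand_ancestors, full + "+"))
        elif expand_ancestors and start_from.startswith(full + "+"):
            # Possible extension: an ancestor of the start task is expanded
            # so the search can continue inside its sub-workflow.
            started = True
            ran.extend(run_toy(task.children, start_from, expand_ancestors, full + "+"))
        # Otherwise the task is assumed complete and is never expanded.
    return ran


master = [
    ToyTask("child_a"),
    ToyTask("child_b", [ToyTask("grandchild_c"), ToyTask("grandchild_d")]),
]

print(run_toy(master, "child_b"))
print(run_toy(master, "child_b+grandchild_d"))
print(run_toy(master, "child_b+grandchild_d", expand_ancestors=True))
```

In this model the second call runs nothing, because `child_b` never matches the nested label and is skipped before its grandchildren exist. The third call sketches one direction an extension might take: keep ancestors of the requested task incomplete so their sub-workflows still get expanded.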

Any ideas on how I could extend pyflow to support this feature?

ctsa commented Jul 21, 2015

Thanks for finding this issue and providing all the details. I won't be able to take a deep dive into this one anytime soon, but it seems probable that child_b is being erroneously marked as complete, which would produce the behavior you describe. The problem is likely somewhere in here:

https://github.com/Illumina/pyflow/blob/master/pyflow/src/pyflow.py#L2351-2370

Note that you might want to disable auto-continuation when you test this. It wasn't the problem in this case, but otherwise you could build up a completion history that complicates the test.
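
One way to keep completion history out of the picture is to make continuation an explicit switch in the test script. This is a minimal sketch that reuses only the `startFromTasks` and `isContinue` keywords already present in the example script. The helper name `build_run_kwargs` and the `--continue` flag are made up for illustration.

```python
def build_run_kwargs(argv):
    """Translate command-line arguments into keyword arguments for wf.run().

    Hypothetical helper: continuation stays off unless --continue is given,
    so repeated test runs do not reuse an earlier completion history.
    """
    flags = argv[1:]
    positional = [arg for arg in flags if arg != "--continue"]
    return {
        "startFromTasks": positional[0] if positional else None,
        "isContinue": "Auto" if "--continue" in flags else False,
    }


# In the example script, the last two lines would then become:
#     wf = Master()
#     wf.run(**build_run_kwargs(sys.argv))
print(build_run_kwargs(["workflow.py", "child_b+grandchild_d"]))
```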

ctsa added the bug label Jul 22, 2015