Plans to better integrate the 'in' and 'out' requirements? #31
Comments
Hi @BioComSoftware and thanks again for reaching out (and yes, issues are great, so that others can make use of the information too)!

Q1

No, we don't plan to do this. The reason for using fields and functions for the in-ports and out-ports is that we get auto-completion on the available in-ports and out-ports in Python editors that support it. This is really important in our use cases, where we want to develop a library of re-usable components. Using auto-completion to see the available ports lessens the need to constantly look up the documentation while using component libraries. Do you have some specific reasons why you find that would be better? :)

Q2

I'm not sure I follow completely. But to start with, the following line: fooreplacer.in_foo = foowriter.out_foo ... does not execute the [...] This method is actually executed as part of the [...] Please correct me if I misunderstood you!

Best Regards // Samuel
|
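On the Q2 point, a minimal plain-Python sketch may help: connecting an in-port to an out-port only stores a reference to the bound method, and nothing is called at that moment. The `Writer` and `Replacer` classes and the `calls` counter are hypothetical stand-ins for illustration, not SciLuigi classes.

```python
class Writer(object):
    """Hypothetical stand-in for a task with one out-port."""

    def __init__(self):
        self.calls = 0

    def out_foo(self):
        # Count how often the out-port is actually evaluated.
        self.calls += 1
        return "foo.txt"


class Replacer(object):
    """Hypothetical stand-in for a task with one in-port."""

    in_foo = None  # in-port: a plain field, filled in by the workflow


writer = Writer()
replacer = Replacer()

# Connecting the ports only stores a reference to the bound method.
replacer.in_foo = writer.out_foo
calls_after_connect = writer.calls  # nothing has been called yet

# The method runs only when something calls the in-port later on.
path = replacer.in_foo()
calls_after_use = writer.calls
```

Because the in-port is an ordinary attribute and the out-port an ordinary method, both also show up in editor auto-completion, which is the reason given under Q1.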
Hi Samuel,
Thanks for touching base.
On Q1...I don't necessarily see a better reason for it. It was just a
way to reduce code from two lines to one - if there wasn't a specific
reason for doing the current way. It sounds like there is a specific
reason, so I'll go with it as I learn more about the code.
On Q2...maybe I'm doing something wrong. I used a number of print lines
to track what was happening, and I noticed that while the MyFooWriter()
object does get created by the workflow if I remove the request to the
variable, the MyFooWriter.run() never executes and the "foo.txt" never gets
created, even though the second file "foo.txt.bar.txt" does.
Notice in ~~the second script~~ **both scripts**, I removed the references to
MyFooWriter.out_foo from the MyFooReplacer.run()...so that there's no
hardcoded references in MyFooReplacer to MyFooWriter
Notice also that in MyFooReplacer.out_bar(), MyFooWriter.out_foo does
not exist.
Basically, unless MyFooWriter.out_foo is referenced either by the
workflow line "fooreplacer.in_foo = foowriter.out_foo", or
MyFooReplacer.run() ... the MyFooWriter.run() never fires.
Here's the code and output. Maybe I'm simply missing something obvious
on my part :D ...
=====================================
=== With the request to foowriter.out_foo ===
import luigi
import logging
import sciluigi

class MyFooWriter(sciluigi.Task):
    print "Entered 'MyFooWriter' ..."

    def out_foo(self):
        print "Entered 'MyFooWriter.out_foo()' ..."
        return sciluigi.TargetInfo(self, "foo.txt")

    def run(self):
        print "Entered 'MyFooWriter.run()' ..."
        with self.out_foo().open('w') as foofile:
            foofile.write('foo\n')

class MyFooReplacer(sciluigi.Task):
    print "Entered 'MyFooReplacer' ..."
    replacement = sciluigi.Parameter()  # Here, we take as a parameter what to replace foo with.
    in_foo = None

    def out_bar(self):
        print "Entered 'MyFooReplacer.out_bar()' ..."
        try:
            # If in_foo() object exists
            print "self.in_foo().path =", self.in_foo().path
            return sciluigi.TargetInfo(self, self.in_foo().path + '.bar.txt')
        except Exception as e:
            print e.message
            print 'Using harcoded one'
            return sciluigi.TargetInfo(self, 'foo.txt.bar.txt')

    def run(self):
        print "Entered 'MyFooReplacer.run()' ..."
        # ================================================
        # with self.in_foo().open() as in_f:
        #     with self.out_bar().open('w') as out_f:
        #         # Here we see that we use the parameter self.replacement:
        #         out_f.write(in_f.read().replace('foo', self.replacement))
        # ================================================
        with self.out_bar().open('w') as out_f:
            out_f.write('bar\n')

class MyWorkflow(sciluigi.WorkflowTask):
    def workflow(self):
        print 'Starting workflow...'
        foowriter = self.new_task('foowriter', MyFooWriter)
        fooreplacer = self.new_task('fooreplacer', MyFooReplacer,
            replacement='bar')#, in_foo = foowriter.out_foo)
        print 'setting fooreplacer.in_foo = foowriter.out_foo'
        fooreplacer.in_foo = foowriter.out_foo
        return fooreplacer

if __name__ == '__main__':
    sciluigi.run_local(main_task_cls=MyWorkflow)
=== Files Created ===
drwxr-xr-x 4 mikes staff 136 Feb 22 20:28 audit
-rw-r--r-- 1 mikes staff 4 Feb 22 20:28 foo.txt
-rw-r--r-- 1 mikes staff 4 Feb 22 20:28 foo.txt.bar.txt
drwxr-xr-x 4 mikes staff 136 Feb 22 20:28 log
=== OUTPUT ===
Entered 'MyFooWriter' ...
Entered 'MyFooReplacer' ...
2017-02-22 20:28:54 | INFO |
--------------------------------------------------------------------------------
Starting workflow...
2017-02-22 20:28:54 | INFO | SciLuigi: MyWorkflow Workflow Started
(logging to log/workflow_myworkflow_started_20170222_192854_286582.log)
2017-02-22 20:28:54 | INFO |
--------------------------------------------------------------------------------
/Library/Python/2.7/site-packages/luigi/parameter.py:259: UserWarning:
Parameter MyWorkflow(instance_name=sciluigi_workflow) is not of type string.
warnings.warn("Parameter {0} is not of type string.".format(str(x)))
setting fooreplacer.in_foo = foowriter.out_foo
Entered 'MyFooReplacer.out_bar()' ...
self.in_foo().path = Entered 'MyFooWriter.out_foo()' ...
foo.txt
Entered 'MyFooWriter.out_foo()' ...
Entered 'MyFooWriter.out_foo()' ...
Entered 'MyFooWriter.out_foo()' ...
2017-02-22 20:28:54 | INFO | Task foowriter started
Entered 'MyFooWriter.run()' ...
Entered 'MyFooWriter.out_foo()' ...
2017-02-22 20:28:54 | INFO | Task foowriter finished after 0.000s
Entered 'MyFooWriter.out_foo()' ...
Entered 'MyFooWriter.out_foo()' ...
2017-02-22 20:28:55 | INFO | Task fooreplacer started
Entered 'MyFooReplacer.run()' ...
Entered 'MyFooReplacer.out_bar()' ...
self.in_foo().path = Entered 'MyFooWriter.out_foo()' ...
foo.txt
Entered 'MyFooWriter.out_foo()' ...
2017-02-22 20:28:55 | INFO | Task fooreplacer finished after 0.001s
Starting workflow...
setting fooreplacer.in_foo = foowriter.out_foo
Entered 'MyFooReplacer.out_bar()' ...
self.in_foo().path = Entered 'MyFooWriter.out_foo()' ...
foo.txt
Entered 'MyFooWriter.out_foo()' ...
2017-02-22 20:28:55 | INFO |
--------------------------------------------------------------------------------
2017-02-22 20:28:55 | INFO | SciLuigi: MyWorkflow Workflow Finished
(workflow log at log/workflow_myworkflow_started_20170222_192854_286582.log)
2017-02-22 20:28:55 | INFO |
--------------------------------------------------------------------------------
=====================================
=== Without the request to foowriter.out_foo ===
import luigi
import logging
import sciluigi

class MyFooWriter(sciluigi.Task):
    print "Entered 'MyFooWriter' ..."

    def out_foo(self):
        print "Entered 'MyFooWriter.out_foo()' ..."
        return sciluigi.TargetInfo(self, "foo.txt")

    def run(self):
        print "Entered 'MyFooWriter.run()' ..."
        with self.out_foo().open('w') as foofile:
            foofile.write('foo\n')

class MyFooReplacer(sciluigi.Task):
    print "Entered 'MyFooReplacer' ..."
    replacement = sciluigi.Parameter()  # Here, we take as a parameter what to replace foo with.
    in_foo = None

    def out_bar(self):
        print "Entered 'MyFooReplacer.out_bar()' ..."
        try:
            print "self.in_foo().path =", self.in_foo().path
            return sciluigi.TargetInfo(self, self.in_foo().path + '.bar.txt')
        except Exception as e:
            print e.message
            print 'Using harcoded one'
            return sciluigi.TargetInfo(self, 'foo.txt.bar.txt')

    def run(self):
        print "Entered 'MyFooReplacer.run()' ..."
        # ================================================
        # with self.in_foo().open() as in_f:
        #     with self.out_bar().open('w') as out_f:
        #         # Here we see that we use the parameter self.replacement:
        #         out_f.write(in_f.read().replace('foo', self.replacement))
        # ================================================
        with self.out_bar().open('w') as out_f:
            out_f.write('bar\n')

class MyWorkflow(sciluigi.WorkflowTask):
    def workflow(self):
        print 'Starting workflow...'
        foowriter = self.new_task('foowriter_ThisIsJustAName', MyFooWriter)
        fooreplacer = self.new_task('fooreplacer_ThisIsJustAName',
            MyFooReplacer, replacement='bar')#, in_foo = foowriter.out_foo)
        #=== REMOVED ====
        # print 'setting fooreplacer.in_foo = foowriter.out_foo'
        # fooreplacer.in_foo = foowriter.out_foo
        #=== REMOVED ====
        return fooreplacer

if __name__ == '__main__':
    sciluigi.run_local(main_task_cls=MyWorkflow)
=== Files Created ===
drwxr-xr-x 4 mikes staff 136 Feb 22 20:34 audit
-rw-r--r-- 1 mikes staff 4 Feb 22 20:34 foo.txt.bar.txt
drwxr-xr-x 4 mikes staff 136 Feb 22 20:34 log
=== OUTPUT ===
Entered 'MyFooWriter' ...
Entered 'MyFooReplacer' ...
Starting workflow...
2017-02-22 20:34:47 | INFO |
--------------------------------------------------------------------------------
2017-02-22 20:34:47 | INFO | SciLuigi: MyWorkflow Workflow Started
(logging to log/workflow_myworkflow_started_20170222_193447_810115.log)
2017-02-22 20:34:47 | INFO |
--------------------------------------------------------------------------------
/Library/Python/2.7/site-packages/luigi/parameter.py:259: UserWarning:
Parameter MyWorkflow(instance_name=sciluigi_workflow) is not of type string.
warnings.warn("Parameter {0} is not of type string.".format(str(x)))
Entered 'MyFooReplacer.out_bar()' ...
self.in_foo().path = 'NoneType' object is not callable
Using harcoded one
2017-02-22 20:34:47 | INFO | Task fooreplacer_ThisIsJustAName started
Entered 'MyFooReplacer.run()' ...
Entered 'MyFooReplacer.out_bar()' ...
self.in_foo().path = 'NoneType' object is not callable
Using harcoded one
2017-02-22 20:34:47 | INFO | Task fooreplacer_ThisIsJustAName
finished after 0.000s
Starting workflow...
Entered 'MyFooReplacer.out_bar()' ...
self.in_foo().path = 'NoneType' object is not callable
Using harcoded one
2017-02-22 20:34:47 | INFO |
--------------------------------------------------------------------------------
2017-02-22 20:34:47 | INFO | SciLuigi: MyWorkflow Workflow Finished
(workflow log at log/workflow_myworkflow_started_20170222_193447_810115.log)
2017-02-22 20:34:47 | INFO |
--------------------------------------------------------------------------------
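The "'NoneType' object is not callable" lines in this output come from calling `in_foo` while it is still `None`; the broad `except Exception` then hides the `TypeError` and falls back to the hardcoded path. A small sketch of that mechanism follows. The `Replacer` class, `out_bar_path_strict`, and the `last_error` attribute are hypothetical illustration names, not part of SciLuigi.

```python
class Replacer(object):
    """Hypothetical stand-in mirroring the try/except in out_bar()."""

    in_foo = None  # never connected to an upstream out-port

    def out_bar_path(self):
        try:
            return self.in_foo().path + ".bar.txt"
        except Exception as exc:
            # Calling None raises TypeError; the broad except swallows it.
            self.last_error = exc
            return "foo.txt.bar.txt"

    def out_bar_path_strict(self):
        # Alternative: fail loudly when the in-port was never connected.
        if self.in_foo is None:
            raise ValueError("in_foo is not connected to an upstream out-port")
        return self.in_foo().path + ".bar.txt"


replacer = Replacer()
fallback_path = replacer.out_bar_path()
error_type = type(replacer.last_error).__name__

try:
    replacer.out_bar_path_strict()
    strict_message = None
except ValueError as exc:
    strict_message = str(exc)
```

With the fallback in place the replacer can still write its output file, which matches "foo.txt.bar.txt" being created even though "foo.txt" is not.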
|
P.s.
Re: "Notice in the second script, I removed the references to
MyFooWriter.out_foo from the MyFooReplacer.run()...so that there's no
hardcoded references in MyFooReplacer to MyFooWriter "
Just to make sure I was clear - the above has been removed from both
scripts. So the only difference between them (as far as I can tell) is
the "fooreplacer.in_foo = foowriter.out_foo" line in the
MyWorkflow.workflow(). When that line is commented out, the
MyFooWriter.run() never fires.
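One way to read this behaviour is that only tasks reachable from the task returned by workflow(), by following connected in-ports, get scheduled. The sketch below is a toy model under that assumption, not SciLuigi's implementation; `ToyTask`, `upstream_tasks`, `out_data`, and `run_order` are hypothetical names.

```python
class ToyTask(object):
    """Hypothetical stand-in for a task; not the sciluigi.Task API."""

    def __init__(self, name):
        self.name = name

    def out_data(self):
        return self.name + ".txt"

    def upstream_tasks(self):
        # Treat every connected attribute named in_* as a dependency edge.
        tasks = []
        for attr, value in sorted(vars(self).items()):
            if attr.startswith("in_") and callable(value):
                tasks.append(value.__self__)  # task that owns the out-port
        return tasks


def run_order(final_task, seen=None):
    """Return task names in the order they would run, dependencies first."""
    seen = [] if seen is None else seen
    for upstream in final_task.upstream_tasks():
        run_order(upstream, seen)
    if final_task.name not in seen:
        seen.append(final_task.name)
    return seen


# Connected: the writer is reachable from the returned task, so it runs.
writer, replacer = ToyTask("foowriter"), ToyTask("fooreplacer")
replacer.in_foo = writer.out_data
connected = run_order(replacer)

# Not connected: the writer object exists, but nothing points at it.
writer2, replacer2 = ToyTask("foowriter"), ToyTask("fooreplacer")
unconnected = run_order(replacer2)
```

In this model, creating a task with new_task is not enough to schedule it; the connection line is the only thing that makes the writer a dependency of the replacer.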
|
Thanks for the additional info @BioComSoftware! Will have to have a closer look at this when head is clear and fresh :) perhaps tomorrow or so. |
Again, I'm not sure if 'issues' is the correct place to ask this. If not, I apologize.
I'm asking this question because, if there are no plans to implement the following, I may tackle it myself, unless you feel there's a specific reason not to.
So, there are two questions:
Q1.
In the tutorial workflow...
...I notice that we:
QUESTION: Are there any plans to make this sharing of in and out targets more integrated? For example...
I realize this requires significant overhauling of the luigi.parameter class. Which is why I wanted to ask if you were working on it before I dug in :D
Q2:
I also notice that it is the line...
...that actually triggers the foowriter.run(). If that line is removed (and foowriter.out_foo replaced manually inside of fooreplacer)...foowriter.run() never occurs.
Are there any plans to have the sciluigi.WorkflowTask class (in this case MyWorkflow) assume that every new_task line should run, similar to the 'requires' method from luigi core, but not hardcoded in the class?
I.e. Something like an inline 'requires' statement (see below)...
...OR...
Just to simply assume they come in order.
...which results in a backwards running of each new_task, in the reverse order it is listed within the workflow.