-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't toposort nodes in non-user-facing operations #3146
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make def nodes
a cached_property
instead? Are pipelines supposed to be mutable?
Edit: I've changed my mind; accessing We could still make it a cached property (depending on the answer to your other question), but I think whether it's a cached property or not doesn't actually make a significant difference for user behaviors, and just using the unsorted version is more efficient regardless for the motivating case (and a lot of the update operations in general).
I don't know; I suppose they could be? But I don't think anybody would suffer much if they weren't. |
6a5370e
to
06145cf
Compare
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
06145cf
to
26ba1e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested it some time ago as part of #3167 - it contributed to a small (~5%) improvement.
Add more context, since the original slack thread is in a private channel I cannot get a linen copy.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to understand this PR more. Is the description of this PR updated? AFAIK, nodes
does not do any additional toposorts so I am not sure what's the fix for.
Does this actually helps performance or is just a clean up?
An archive of the discussion (terribly formatted) marrrcin plot.png 49 replies marrrcin
marrrcin Ivan Danov marrrcin Ivan Danov juanlu
Ivan Danov datajoely datajoely Ivan Danov Ivan Danov marrrcin Ivan Danov Ivan Danov marrrcin datajoely Ivan Danov datajoely marrrcin Ivan Danov Ivan Danov marrrcin juanlu image.png Ivan Danov marrrcin marrrcin Deepyaman Datta Deepyaman Datta Deepyaman Datta datajoely Ivan Danov juanlu juanlu datajoely datajoely juanlu juanlu datajoely datajoely Ivan Danov 🥳 datajoely Deepyaman Datta marrrcin There is a small improvement, but it’s less than 5% Deepyaman Datta marrrcin
marrrcin
marrrcin
👍:skin-tone-3:👍 Deepyaman Datta Click to Expand |
@deepyaman Are you still interested to continue with this PR? |
Signed-off-by: Ivan Danov <[email protected]>
Signed-off-by: Ivan Danov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still an improvement over the original, since every time we call self.nodes
, we copy the whole list. We can save up on making new copies of that list for internal non-mutating tasks. Not sure whether the stated goal of the PR is met though, since the toposorting happens only once anyways. Should we refocus this PR and make it about avoiding unnecessary copies through using list
, set
and copy
in different places internally?
@@ -2,6 +2,7 @@ | |||
|
|||
## Major features and improvements | |||
* Improved error message when passing wrong value to node. | |||
* Improved performance of pipeline operations by using non-toposorted nodes in internal code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Improved performance of pipeline operations by using non-toposorted nodes in internal code. | |
* Improved performance of pipeline operations by avoiding unnecessary copies of the nodes. |
I'm guessing this PR can be closed now #3730 is merged? |
Seems like that is a superset of my changes, so sure. |
Description
https://kedro-org.slack.com/archives/C054U0LJLAC/p1696862244884179
Development notes
Docs failure fixed by #3152
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file