feat[next][dace]: Added DistributedGlobalSelfCopyElimination
#1890
… but this time in multiple states. Note that this transformation does not look at the subsets when it removes it. This is in accordance with ADR-18.
…) -> (T) -> (G)` which is handled by the previous self copy elimination transformation. However, it is something that we should do. The main problem is that deleting the (unnecessary) writes to `G` is a lot harder.
…fication pipeline.
LGTM
src/gt4py/next/program_processors/runners/dace/transformations/simplify.py
@@ -58,7 +58,11 @@ def gt_simplify(

 Further, the function will run the following passes in addition to DaCe simplify:
 - `GT4PyGlobalSelfCopyElimination`: Special copy pattern that in the context
-  of GT4Py based SDFG behaves as a no op.
+  of GT4Py based SDFG behaves as a no op, i.e. `(G) -> (T) -> (G)`.
+- `DistributedGlobalSelfCopyElimination`: Very similar to
I propose different names: `SingleStateGlobalSelfCopyElimination` and `MultiStateGlobalSelfCopyElimination`, but this is just a matter of taste. Please do as you prefer.
Also, I wonder why only one of them includes "GT4Py" in the name.
I agree the names are very bad and your names are very good.
They are actually so good that I now want to use `SingleStateGlobalSelfCopyElimination`, so that I can also use `MultiStateGlobalSelfCopyElimination` and do not have to rename it to `GlobalSelfCopyElimination`.
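The single-state pattern `(G) -> (T) -> (G)` named in the `gt_simplify` docstring can be illustrated with a toy model. This is a minimal sketch, not the GT4Py or DaCe implementation: `AccessNode`, `State`, and `find_single_state_self_copies` are hypothetical stand-ins, and a state is assumed to be nothing more than a list of copy edges between access nodes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessNode:
    # Hypothetical stand-in for an SDFG access node.
    data: str  # name of the data container the node refers to
    uid: int  # distinguishes several access nodes of the same data in one state


@dataclass
class State:
    # Hypothetical toy state: only the (src, dst) copy edges are modelled.
    edges: list[tuple[AccessNode, AccessNode]] = field(default_factory=list)


def find_single_state_self_copies(
    state: State, is_transient: dict[str, bool]
) -> list[tuple[AccessNode, AccessNode, AccessNode]]:
    """Return every (G_read, T, G_write) chain inside one state where a global
    G is copied into a transient T that is copied straight back into G."""
    matches = []
    for g_read, t in state.edges:
        # The first hop must go from a global into a transient.
        if is_transient[g_read.data] or not is_transient[t.data]:
            continue
        # The second hop must write that transient back into the same global.
        for t_src, g_write in state.edges:
            if t_src == t and g_write.data == g_read.data:
                matches.append((g_read, t, g_write))
    return matches
```

The nested loop is quadratic in the number of edges, which is acceptable for a toy; the point is only that both hops must live in the same state.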
src/gt4py/next/program_processors/runners/dace/transformations/redundant_array_removers.py
# This is because the local `G` node could become isolated.
# We do not need to consider the outgoing edges, because
# they are reads which we have handled above.
neigbourhood.update((state, iedge.src) for iedge in state.in_edges(dnode))
Am I wrong, or could we assert `isinstance(iedge.src, dace_nodes.AccessNode) and iedge.src.data == gname` here?
You are right.
The assert is there, but further down on line 405, inside the loop that actually removes the isolated nodes.
My problem with putting the assert at the location you suggested was that I would either have to use a `for` loop or go through the set of edges twice.
The `for` loop adds more indentation, and going through the neighbours twice is potentially expensive, so I originally put the assert later.
However, I agree that having the assert where the problem originates is better, so I settled on the `for` loop.
I see, please do as you prefer.
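The trade-off discussed in this thread (a single `set.update()` over a generator versus a `for` loop that can check each source where it is collected) can be sketched on a toy model. This is a hypothetical illustration, not the PR's code: `AccessNode` and `collect_write_sources` are made-up names, and a single state is modelled as a plain list of `(src, dst)` edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessNode:
    # Hypothetical stand-in for a DaCe access node.
    data: str
    uid: int


def collect_write_sources(
    edges: list[tuple[AccessNode, AccessNode]],
    dnode: AccessNode,
    gname: str,
) -> set[AccessNode]:
    """Collect the sources of all edges writing into `dnode`.

    A `for` loop is used instead of `set.update(<generator>)` so that every
    source is validated exactly where it is collected, i.e. where a violation
    would originate, without iterating over the edges a second time.
    """
    neighbourhood: set[AccessNode] = set()
    for src, dst in edges:
        if dst != dnode:
            continue
        # Every writer of the node being removed is expected to be an access node of `gname`.
        assert isinstance(src, AccessNode) and src.data == gname, f"unexpected writer {src}"
        neighbourhood.add(src)
    return neighbourhood
```

The cost of this style is one extra indentation level, which is the readability concern raised above.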
removed_isolated_nodes: set[dace_nodes.Node] = set()
for state, nh_node in neigbourhood:
    assert isinstance(nh_node, dace_nodes.AccessNode) and nh_node.data == gname
    if (nh_node not in removed_isolated_nodes) and (state.degree(nh_node) == 0):
And we could assert `state.in_degree(nh_node) == 0` and only check `state.out_degree(nh_node) == 0`.
I do not really understand that comment.
I am not sure if I understand you correctly, but `nh_node` is not necessarily isolated, as some other nodes might read from it.
However, your question made me question the current implementation, so I updated it and made it a bit cleaner.
Because Python does not have curly braces and we do not use 8-space indentation, I am not sure whether it has become better. What do you think?
I like this one a little bit more.
My suggestion was to assert on input degree, so the node can have successors reading from it.
I mean: there should never be a write to the global, since this is a candidate. There might be a read, however, and that is why we should check the output degree.
I wrote input degree above, but maybe it is too restrictive:
if (nh_node not in removed_isolated_nodes) and (state.out_degree(nh_node) == 0):
    assert state.in_degree(nh_node) == 0
Not fully sure.
I do not have a counterexample, and in most cases you might be right.
However, I would keep the `degree == 0` test, since we are interested in removing isolated nodes.
Agreed.
Co-authored-by: edopao <[email protected]>
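The isolated-node cleanup debated in the thread above (`degree == 0` versus `out_degree == 0` plus an `in_degree` assertion) can be sketched with a toy graph. This is a hypothetical model, not DaCe: `State` only mimics the `in_degree`/`out_degree`/`degree` queries, and `remove_isolated_global_nodes` is an illustrative name. It keeps the `degree == 0` test the thread settled on, so a node of `G` that is still read elsewhere in the state survives.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessNode:
    # Hypothetical stand-in for a DaCe access node.
    data: str
    uid: int


@dataclass
class State:
    # Hypothetical toy state: a node set plus directed (src, dst) edges.
    nodes: set[AccessNode] = field(default_factory=set)
    edges: list[tuple[AccessNode, AccessNode]] = field(default_factory=list)

    def in_degree(self, node: AccessNode) -> int:
        return sum(1 for _, dst in self.edges if dst == node)

    def out_degree(self, node: AccessNode) -> int:
        return sum(1 for src, _ in self.edges if src == node)

    def degree(self, node: AccessNode) -> int:
        return self.in_degree(node) + self.out_degree(node)


def remove_isolated_global_nodes(
    neighbourhood: list[tuple[State, AccessNode]], gname: str
) -> set[AccessNode]:
    """Remove the nodes of `gname` that have no edges left in their state.

    Only truly isolated nodes (degree == 0) are removed; a node that is still
    read (out_degree > 0) is kept. The `removed` set skips nodes that were
    already deleted if they appear more than once in `neighbourhood`.
    """
    removed: set[AccessNode] = set()
    for state, nh_node in neighbourhood:
        assert nh_node.data == gname
        if nh_node not in removed and state.degree(nh_node) == 0:
            state.nodes.discard(nh_node)
            removed.add(nh_node)
    return removed
```

Testing the full degree states the intent ("remove isolated nodes") directly, whereas the `out_degree` variant relies on the additional invariant that candidates are never written to.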
The transformation `DistributedGlobalSelfCopyElimination` is very similar to `GT4PyGlobalSelfCopyElimination`, but they target slightly different cases.

The transformation `GT4PyGlobalSelfCopyElimination`, which already exists, handles the pattern `(G) -> (T) -> (G)`, i.e. the global data `G` is copied into the transient `T`, which is then immediately copied back into `G`. Because of ADR-18 we know that this has no effect: since `G` is used as both input and output, its accesses must be pointwise, so `G[i, j]` in the output can only be the value `G[i, j]` had at the beginning.

The new transformation `DistributedGlobalSelfCopyElimination` handles a different case: it looks for the patterns `(G) -> (T)` and `(T) -> (G)`, which is essentially the same, but this time the definition of `T` and the write-back of `T` into `G` do not need to be in the same state.

In the long run, the two transformations should be combined.
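The multi-state case can be illustrated with a toy model that only tracks, per state, which data container is copied into which. This is a hedged sketch, not DaCe or GT4Py code: the dict-of-edge-lists representation and `find_multi_state_self_copies` are hypothetical, subsets are ignored (in the spirit of the ADR-18 argument above), and the requirement that `T` is written from `G` alone is an assumption made for illustration; the exact matching conditions of the real transformation are not spelled out here.

```python
from collections import defaultdict


def find_multi_state_self_copies(
    states: dict[str, list[tuple[str, str]]],
    transients: set[str],
) -> set[tuple[str, str]]:
    """Return (G, T) pairs where transient T is defined by copying the global G
    and T is written back into G, possibly in a different state.

    Each state maps to a list of (src, dst) copies between data names; subsets
    are deliberately ignored.
    """
    # For every container, collect the containers it is written from (across all states).
    writers: dict[str, set[str]] = defaultdict(set)
    for edges in states.values():
        for src, dst in edges:
            writers[dst].add(src)

    candidates: set[tuple[str, str]] = set()
    for t in transients:
        sources = writers.get(t, set())
        # Assumption for this sketch: T must be defined exclusively from one container...
        if len(sources) != 1:
            continue
        (g,) = sources
        # ...and that container must be a global (non-transient) one.
        if g in transients:
            continue
        # The write-back (T) -> (G) may live in any state, not only the defining one.
        if any((t, g) in edges for edges in states.values()):
            candidates.add((g, t))
    return candidates
```

In this model, `GT4PyGlobalSelfCopyElimination` would only see a match when both copies sit in the same state's edge list, while the multi-state search above also matches when the read into `T` and the write-back into `G` are split across states.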