-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add single exit node (TerminalNode) to FlowGraph #114
Conversation
Previously, any ReturnNode could be the exit of the FlowGraph, which complicated some of the interval/structural analysis. Adding a single common exit node helps reduce edge cases. Also refactored: Postdominators & loops can now be calculated on the FlowGraph at construction. Previously it was "postdominator within this interval" because there was not the common exit. This moved a lot of code out of `if_statements.py` and into `flow_graph.py`. `Node.emit_goto` is now only used for very special cases (certain asm instructions, or the user manually annotates the source). Rather than pre-compute which nodes need goto, we just use a goto if the node was already emitted while building the Body. The `emit_node()` function now contains the logic for checking if a node has been already been emitted and a goto should be used instead. Added `SwitchStatement` to `if_statements.py`. Indenting is now done at formatting time, instead of being determined while constructing `Body`s. This makes it possible to shift `Body` segments around after construction without recalculating the indents. Similiarly, empty return elision is done after building the `Body`s, instead of during the `build_*` functions. Like indenting, this is to accomidate re-writes.
Not sure if this is appropriate, but I tested this out on a kind of nasty function in paper mario (func_802B62A4_E25174) the results seem to be an improvement, but there are some new sections that seem not great:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random comments. Can't say I have a good understanding of the big picture, still...
- Add documentation/comments - Remove stray debugging/comments - Make `BaseNode` an ABC, `children()` abstract & return a list
- Make `indent` arg to `Formatter.indent()` optional - Add more documentation, TODOs, and ext links
@matt-kempster I'll wait to merge this until you have a chance to re-review it (if you want to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks absolutely great, thanks for working on this.
src/flow_graph.py
Outdated
for n in nodes: | ||
if n == entry: | ||
continue | ||
nset = dominators(n) | ||
for p in parents(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (just noticed this was in the original code): this would be difficult to debug in pdb due to the use of n
and p
. Maybe add a TODO to change these variable names to node
and parent
respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nbd to just fix it now - I'l rename them to node
& parent
(and c
-> child
for consistency)
@@ -167,6 +234,25 @@ def get_lone_if_statement(self) -> Optional[IfElseStatement]: | |||
ret = statement | |||
return ret | |||
|
|||
def elide_empty_returns(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we expand this to also elide duplicate returns and/or warn about unreachable post-return statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think this is more appropriate for the emit_node
discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this refactor addressed those issues we were having before where we would sometimes output return; return; return;
(ex func_80B8E84C
in MM) -- that stopped happening even before I added elide_empty_returns
.
if_body=if_body, | ||
else_body=None, | ||
) | ||
) | ||
emit_successor(node.fallthrough_edge, i) | ||
else: | ||
assert isinstance(node, TerminalNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
src/if_statements.py
Outdated
- {context.flow_graph.terminal_node()} | ||
) | ||
for node in unemitted_nodes: | ||
body.add_comment(f"bug: did not emit code for node #{node.name()}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should expand this to also include the contents of said block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - not too hard.
# TODO: Treating ReturnNode as a special case and emitting it repeatedly | ||
# hides the fact that we failed to fold the control flow. Maybe remove? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly we should emit a comment at the very least indicating duplicate emission?
- Change locals from n/p/c to node/parent/child - Emit comment about duplicated return - Emit node content when skipped
This PR is (mostly? entirely?) orthogonal to my other PR, so I don't think there's any nontrivial merge conflicts between the two and it doesn't matter which order they get merged in.
As discussed on Discord, any ReturnNode could be the exit of the FlowGraph, which complicated some of the interval/structural analysis. Adding a single common exit node helps reduce edge cases.
While making this change, I also made several other refactors. I tried to only change things when I could delete chunks of code (e.g.
immediate_postdominator()
inif_statements.py
) or if the change helped preserve previous behavior/test output (e.g. the indenting changes).I don't love how big this PR got -- but I think even though I technically added lines, the code is a bit more straightforward? I also don't think my changes are incompatible with the direction that #76 is headed.
Other refactors:
Postdominators & loops can now be calculated on the FlowGraph at construction. Previously it was "postdominator within this interval" because there was not the common exit. This moved a lot of code out of
if_statements.py
and intoflow_graph.py
.Added
Node.children()
- it's very easy for a node to say what it's children are, rather than manually doing the.add_parent()
bookkeeping. (The.add_parent()
stuff could be removed in a future PR, andparents
could be calculated incompute_relations
, but the ordering change would introduce output changes, so I kept that out of this PR for clarity)Node.emit_goto
is now only used for very special cases (certain asm instructions, or the user manually annotates the source). Rather than pre-compute which nodes need goto, we just use a goto if the node was already emitted while building the Body.The
emit_node()
function now contains the logic for checking if a node has been already been emitted and a goto should be used instead.Added
SwitchStatement
toif_statements.py
.Indenting is now done at formatting time, instead of being determined while constructing
Body
s. This makes it possible to shiftBody
segments around after construction without recalculating the indents.Similiarly, empty return elision is done after building the
Body
s, instead of during thebuild_*
functions. Like indenting, this is to accomidate re-writes.Notes on test diffs:
func_809E80D8
. The429dbfb
commit has the output of the test on master, where you can see it failed to combine the conditions.switch
test to be improved: it largely just moves thecase
blocks out of the ends ofif
statements.multi-switch
test is a doozy. afaict it's semantically equivalent output, but it's still not great looking. (There is slightly less nesting though?).loop-selfassign-phi
has an infinite loop, so the flow graph is not reducible, and now mips2c automatically will fall back to the "no ifs"/"goto" mode. (There are only 12 functions across OOT/MM/Papermario that are non-reducible, and they also appear to be infinite loops.)