-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Add control flow for try
/except
blocks (v2)
#13633
Conversation
Codspeed reports a 2% regression in the |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't fully reviewed yet, just one kind of fundamental thing that jumped out at me on first look, would like to get your thoughts on that.
// These definitions were erased by `self.flow_restore`ing to the post-`else` state. | ||
// We can't simply `self.flow_merge()` with any snapshots taken during the `finally` block, however, | ||
// as there are more potential definition states inside the `finally` block than there are | ||
// from a point after the `finally` block's completion. | ||
// Instead, we must manually re-add these definitions to the `use-def` map | ||
if let Some(finally_definitions) = self.try_node_context_stack().pop_context() { | ||
for DefinitionRecord { | ||
symbol, | ||
definition, | ||
category, | ||
} in finally_definitions | ||
{ | ||
self.current_use_def_map_mut() | ||
.record_definition(symbol, definition, category); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is tricky indeed. I hadn't fully understood the awkward consequences of the way finally blocks work for our CFG. Thanks for taking the time to think this through!
Unfortunately I don't think this approach (of storing and then re-applying Definitions in the finally block) is going to give us the right results. Consider a case like this:
x = 1
try:
x = could_raise_returns_str()
finally:
y = x
reveal_type(y)
The correct revealed type for y
is str
, because in any case where code flow continues after the finally
, that means the try
block actually completed without an exception. But this PR currently gives the revealed type as Literal[1] | str
. By storing and reapplying the Definition
for y
, we get the type of the RHS from the scenario where we might have an exception.
I think the only way to handle this correctly is to, in some form, duplicate or double visit the finally
block. We effectively need to type it twice, once under the assumption that any code it protects might have raised, and again under the assumption that it didn't.
This will be a significant bit of work, as it troubles some core assumptions we have about visiting every expression exactly once. I don't think we should do it in this PR.
But I also don't think we should do this store-and-reapply-definitions thing, either, for two reasons. One is that I think it's just generally important for correctness that we maintain the control-flow-graph abstraction and don't work around it with tricks like this. The other is just about the tradeoff in semantics for Python code. Until/unless we get to a correct double-visit fix, I think the best tradeoff is to accept some false negatives while checking the finally
block itself, but ensure we get the types correct after the finally
block. In other words, for now I think we should just visit the finally block under the no-exceptions assumption.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great example that shows the flaws in this approach! Ugh, I really thought I'd covered everything this time 🫠 This was, as I'm sure you guessed, the bit of this PR that I was least sure about.
But I also don't think we should do this store-and-reapply-definitions thing, either, for two reasons. One is that I think it's just generally important for correctness that we maintain the control-flow-graph abstraction and don't work around it with tricks like this. The other is just about the tradeoff in semantics for Python code. Until/unless we get to a correct double-visit fix, I think the best tradeoff is to accept some false negatives while checking the
finally
block itself, but ensure we get the types correct after thefinally
block. In other words, for now I think we should just visit the finally block under the no-exceptions assumption.What do you think?
I think this makes me a little sad after I spent so much time thinking about finally
blocks 😆
I think it is pretty important that we fix this eventually. In the long run, this will lead to false positives as well as false negatives. For example, when we start emitting diagnostics for unreachable code, we will emit spurious errors on the if
branch inside the finally
block in this snippet, as we will incorrectly infer it as being unreachable:
x = 42
try:
x = could_raise_returns_int()
except:
could_raise()
x = "foo"
else:
could_raise()
x = "foo"
finally:
if isinstance(x, int):
... # we'd probably detect this as unreachable
# unless we consider the fact that we might have jumped to the `finally`
# branch from halfway through an `except` or `else` branch
else:
...
Another way I thought of trying to fix this "awkwardness" was to utilise the fact that we know that try
/except
blocks with finally
branches desugar to nested try
/except
blocks. We could attempt to "synthesize" a nested StmtTry
node if we see that a StmtTry
node has a non-empty finally
suite. (Not actually create a synthetic StmtTry
node, but visit the StmtTry
node exactly as if it were a nested StmtTry
inside another StmtTry
.) I actually started off trying to do that, but quickly stopped as this PR's current approach seemed like a simpler solution. (And I was also not sure how this would work with the assertions we have that you mentioned above, about only ever visiting every expression once.) Given the issue you just pointed out in your example, it seems like that probably is the only good way of doing it, though; there doesn't seem to be any way of taking shortcuts while respecting Python's semantics properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're right that we will want to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've written up your edge case in my document describing control-flow semantics for exception handlers. It's very specific! I believe it only applies to StmtTry
nodes that:
- have
finally
blocks, and either:- do not have any
except
branches, or - all the
except
branches of theStmtTry
node lead to immediate termination of the scope following thefinally
block, through either araise
,return
or similar.
- do not have any
The specificity of the edge case doesn't mean that it's unimportant to consider, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it also applies to try blocks with except handlers, it's just that the issue shifts to considering the possibility of an exception in the exception handler, rather than an exception in the try block?
And try/finally without except handlers is not an uncommon case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it also applies to try blocks with except handlers, it's just that the issue shifts to considering the possibility of an exception in the exception handler, rather than an exception in the try block?
Ah, great point.
And try/finally without except handlers is not an uncommon case.
I said specific, not uncommon! I agree that try
/finally
without except
is pretty common, so I definitely agree this is an important case to consider.
Would it be possible and would you feel comfortable to make the internal document public and mention it in the pr summary? I hope I get to review this on Monday or no later than Tuesday |
Done! |
ccd271b
to
aac8753
Compare
aac8753
to
8219a07
Compare
8219a07
to
e51ff89
Compare
(I still have yet to address @carljm's feedback, which is going to require a somewhat significant rewrite. I just pushed some changes to rewrite the tests using |
We may or may not want to add a lot of the TL;DR: I'm closing this for now; #13729 is the new version that addresses @carljm's comments! |
Another thing I realized that we'll need to take into account when we do smarter handling for |
Summary
This PR adds control flow for
try
/except
/else
/finally
blocks to red-knot. It's a replacement PR for #13338, which had some fundamental issues in its approach, in particular with regards tofinally
blocks.The semantics of
try
/except
blocks are very complicated! I've written up a long document outlining all the various jumps control flow could take, which can be found here. I won't try to summarise that document in this PR description. But I will give a brief description of some of the ways I've attempted to model these semantics in this PR:Abstractions for handling
try
/except
blocks have been added to a newbuilder
submodule,builder/exception_handlers.rs
:TryNodeContext
keeps track of state for a singletry
/except
/else
/finally
block. Exactly what state we need to keep track of varies according to whether the node has afinally
branch, and according to which branch of theStmtTry
node we're currently visiting.TryNodeContextStack
is a stack ofTryNodeContext
instances. For any given scope,try
blocks can be arbitrarily nested; this means that we must keep a stack ofTryNodeContext
s for each scope we visit.TryNodeContextStackManager
is a stack ofTryNodeContextStack
s. Whenever we enter a nested scope, a newTryNodeContextStack
is initialised by theTryNodeContextStackManager
and appended to the stack of stacks. Whenever we exit that scope, theTryNodeContextStack
is popped off the stack of stacks.The diff for this PR is quite large, but this is mostly tests. There aren't actually that many tests, but they unfortunately need to be quite verbose. This is because we may add a more sophisticated understanding of exception handlers in the future (where we would understand that e.g.
x = 1
can never raise an exception), and I wanted the tests to be robust to this so that they wouldn't have to be rewritten when that happens. (This also helps readability of the tests, since we obviously know thatx = 1
can never raise exceptions.) To address this, I made sure to use assignments to function calls for testing places where a raised exception could cause a jump in control flow. This will be robust to future improvements, since it will always be the case that we will consider a function call capable of raising arbitrary exceptions.Test Plan
All tests have been added to
infer.rs
. They all usereveal_type
to assert that the type of a variable changes as we move through the varioustry
/except
/else
/finally
branches.