Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.12 throws UndefVarError: not defined in local scope for variable renamed in function before being captured by a closure #57141

Closed
nickrobinson251 opened this issue Jan 23, 2025 · 13 comments · Fixed by #57201
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior compiler:lowering Syntax lowering (compiler front end, 2nd stage) regression 1.12 Regression in the 1.12 release

Comments

@nickrobinson251
Copy link
Contributor

nickrobinson251 commented Jan 23, 2025

I've not yet been able to create a minimal reproduction of this that doesn't depend on a private package (sorry), but i can at least start a bisect to try to find the offending commit.

[Edit: better repro in message below]

In a private package, which has a function like

function assign_node_ids(node, starting_id::Int)
    ii = starting_id

    function _traverse::BackIRType)
        new_φ = copy_and_assign_id(φ, ii)
        if new_φ !== φ
            ii += 1
        end
        return new_φ
    end

    _traverse(n) = n

    return (_traverse(node), ii)
end
function assign_node_ids(node)
    return assign_node_ids(node, 0)[1]
end

Calling the function like

node = make_node(...)
assign_node_ids(node)

throws

ERROR: UndefVarError: `ii` not defined in local scope
Suggestion: check for an assignment to a local variable that shadows a global of the same name.
Stacktrace:
 [1] (::RAI_BackIR.Utilities.var"#_traverse#assign_node_ids##0")(φ::RAI_BackIR.Atom)
   @ RAI_BackIR.Utilities ~/repos/raicode/packages/RAI_BackIR/src/Utilities/traversal.jl:104
 [2] assign_node_ids
   @ ~/repos/raicode/packages/RAI_BackIR/src/Utilities/traversal.jl:113 [inlined]
 [3] assign_node_ids(node::RAI_BackIR.Atom)
   @ RAI_BackIR.Utilities ~/repos/raicode/packages/RAI_BackIR/src/Utilities/traversal.jl:117
 [4] top-level scope
   @ REPL[1]:1

The error disappears if the function is marked @noinline or the variable isn't renamed in the function, i.e.

@noinline function assign_node_ids(node, starting_id::Int)
    ii = starting_id
    ...
end

or

function assign_node_ids(node, ii::Int) 
    ...    
@nsajko nsajko added the bug Indicates an unexpected problem or unintended behavior label Jan 23, 2025
@nickrobinson251
Copy link
Contributor Author

Reproduction:

julia> module Repro

       struct BackIRType
           value::Int
           id::Int
           BackIRType(value::Int) = new(value)
       end

       function assign_node_ids(node)
           return assign_node_ids(node, 0)[1]
       end
       function assign_node_ids(node, starting_id::Int)
           ii = starting_id

           function _traverse::BackIRType)
               new_φ = _copy_and_assign_id(φ, ii)
               if new_φ !== φ
                   ii += 1
               end
               return new_φ
           end

           _traverse(n) = n

           return (_traverse(node), ii)
       end
       function _copy_and_assign_id::BackIRType, id::Int)
           isdefined(φ, :id) && return φ
           return BackIRType.value, id)
       end

       end
Main.Repro

julia> Repro.assign_node_ids(Repro.BackIRType(1))
ERROR: UndefVarError: `ii` not defined in local scope
Suggestion: check for an assignment to a local variable that shadows a global of the same name.
Stacktrace:
 [1] (::Main.Repro.var"#_traverse#assign_node_ids##0")(φ::Main.Repro.BackIRType)
   @ Main.Repro ./REPL[1]:16
 [2] assign_node_ids
   @ ./REPL[1]:25 [inlined]
 [3] assign_node_ids(node::Main.Repro.BackIRType)
   @ Main.Repro ./REPL[1]:10
 [4] top-level scope
   @ REPL[2]:1

julia> VERSION
v"1.12.0-DEV.1903"

@oscardssmith
Copy link
Member

tagging @Keno since this seems probably related to variable world age changes

@nsajko nsajko added the regression 1.12 Regression in the 1.12 release label Jan 23, 2025
@Keno Keno added the compiler:lowering Syntax lowering (compiler front end, 2nd stage) label Jan 23, 2025
@nickrobinson251
Copy link
Contributor Author

Bisect pointed to 9650510 (#55297)

@serenity4
Copy link
Contributor

I was able to shrink the reproducer to this:

julia> repro() = f()[1]
repro (generic function with 1 method)

julia> function f()
           i = 0
       
           function increment_i()
               i += 1
           end
       
           return (increment_i(), i)
       end
f (generic function with 1 method)

julia> repro()
ERROR: UndefVarError: `i` not defined in local scope

For reference, here is the emitted code for Julia 1.11.3, where things work as intended:

julia> @code_typed repro()
CodeInfo(
1%1 = %new(Core.Box)::Core.Box
│        Core.setfield!(%1, :contents, 0)::Int64%3 = %new(Main.:(var"#increment_i#12"), %1)::var"#increment_i#12"%4 = invoke %3()::Any%5 = Core.isdefined(%1, :contents)::Bool
└──      goto #3 if not %5
2 ─      Core.getfield(%1, :contents)::Any
└──      goto #4
3$(Expr(:throw_undef_if_not, :i, false))::Any
└──      unreachable
4return %4
) => Any

The UndefVarError comes from accessing i within increment_i. It only happens when the result of f() is partially used, taking the first element; when the return of i is optimized away (being the unused second tuple element), the dependency of increment_i() on i = 0 (the expression which defines the contents of the Core.Box) seems to be lost, resulting in this error.

This bug is not specific to boxed variables, it also happens with a manually constructed Ref:

julia> repro2() = f2()[1]
repro2 (generic function with 1 method)

julia> function f2()
           ref = Ref{Any}()
           ref[] = 0
       
           inner() = ref[] + 1
       
           @assert isdefined(ref, :x)
           return (inner(), ref[])
       end
f2 (generic function with 1 method)

julia> repro2()
ERROR: UndefRefError: access to undefined reference

With the same behavior, inner() throwing an error. If we remove @assert isdefined(ref, :x), the code executes fine. Therefore it seems to me that optimization is too aggressive in discarding ref[] = 0 via the information gained with isdefined(ref, :x).

@serenity4
Copy link
Contributor

serenity4 commented Jan 28, 2025

Further troubleshooting reveals that it is the SRoA pass which discards the call to setfield!:

infil> ir
29 1%1  = %new(Base.RefValue{Any})::Base.RefValue{Any}     │╻╷╷ f2
   │           builtin Base.setfield!(%1, :x, 0)::Int64
   

infil> sroa_mutables!(ir, defuses, used_ssas, lazydomtree, inlining)

infil> ir
29 1%1  = %new(Base.RefValue{Any})::Base.RefValue{Any}     │╻╷╷ f2
   │         nothing::Int64                                   
   

whereas I believe it should conservatively abort in the case of the mutable %1 escaping local scope via an invoke call to %3 which was constructed using %1.

@aviatesk Any thoughts on how/where to proceed with this, if this sounds correct to you? I'm thinking conceptually we might want to walk through definitions for arguments to any encountered invoke expression and define a list of SSA values for mutables that we know may be used through these calls (so we may give up on SRoA in those cases), but I have the feeling we may prefer to reuse other analysis tools instead of performing that one manually.

@aviatesk
Copy link
Member

That’s a great observation.

As you pointed out, this is likely an issue with the SROA pass, specifically within sroa_mutables! (although based on the bisected commit, the issue might lie in the PartialStruct derived through abstract interpretation)

More concretely, %1 should be considered as "escaping" in the :new statement for %3. Specifically the setfield! should not be eliminated at this line:

all_eliminated || continue
# all "usages" (i.e. `getfield` and `isdefined` calls) are eliminated,
# now eliminate "definitions" (i.e. `setfield!`) calls
# (NOTE the allocation itself will be eliminated by DCE pass later)
for didx in du.defs
didx == defidx && continue # this is allocation
# verify this statement won't throw, otherwise it can't be eliminated safely
setfield_ssa = SSAValue(didx)
if is_nothrow(ir, setfield_ssa)
ir[setfield_ssa][:stmt] = nothing

@aviatesk
Copy link
Member

As you said there might be an issue with the calculation of used_ssas for the ir. I suspect there’s something off with the way this computation is being handled.

# Check if there are any uses we did not account for. If so, the variable
# escapes and we cannot eliminate the allocation. This works, because we're guaranteed
# not to include any intermediaries that have dead uses. As a result, missing uses will only ever
# show up in the nuses_total count.
nleaves = length(defuse.uses) + length(defuse.defs)
nuses = 0
for iidx in intermediaries
nuses += used_ssas[iidx]
end
nuses_total = used_ssas[defidx] + nuses - length(intermediaries)
all_eliminated = all_forwarded = true

@serenity4
Copy link
Contributor

serenity4 commented Jan 28, 2025

I inspected the PartialStruct that is inferred in this context, and AFAIK it correctly respects its semantics (at least pre-optimization). I didn't see anything that may indicate an illegal construction, and don't see clearly yet how these observations are linked to the bisected PR. I'll further look into that, my current guess would be that either the PartialStruct construction just enables the attempt at SRoA optimization or maybe the update to the IntermediaryCollector made in #55297 was incomplete:

function (walker_callback::IntermediaryCollector)(@nospecialize(def), @nospecialize(defssa::AnySSAValue))
if !(def isa Expr)
push!(walker_callback.intermediaries, defssa.id)
if def isa PiNode
return LiftedValue(def.val)
end
end
return nothing
end

In particular, walking with this collector does skip the :new statement you mention.

@aviatesk
Copy link
Member

Yeah, I also think it's not that the PR caused the issue, but more like it just exposed a problem that was already there. We probably need to further tweak the walker callback, or maybe we could also change the condition that sets all_eliminated = true in sroa_mutables!.

@serenity4
Copy link
Contributor

Would a PreserveUse be suitable to express that another statement opaquely uses the value?

struct SSAUse
kind::Symbol
idx::Int
end
GetfieldUse(idx::Int) = SSAUse(:getfield, idx)
PreserveUse(idx::Int) = SSAUse(:preserve, idx)
NoPreserve() = SSAUse(:nopreserve, 0)
IsdefinedUse(idx::Int) = SSAUse(:isdefined, idx)
FinalizerUse(idx::Int) = SSAUse(:finalizer, idx)

Currently it seems like no use other than setfield!/isdefined/getfield/finalizer can be expressed:

if ismutabletypename(struct_typ_name)
isa(val, SSAValue) || continue
let intermediaries = SPCSet()
def = simple_walk(compact, val, IntermediaryCollector(intermediaries))
# Mutable stuff here
isa(def, SSAValue) || continue
if defuses === nothing
defuses = IdDict{Int, Tuple{SPCSet, SSADefUse}}()
end
mid, defuse = get!(()->(SPCSet(),SSADefUse()), defuses, def.id)
if is_setfield
push!(defuse.defs, idx)
elseif is_isdefined
push!(defuse.uses, IsdefinedUse(idx))
elseif is_finalizer
push!(defuse.uses, FinalizerUse(idx))
else
push!(defuse.uses, GetfieldUse(idx))
end
union!(mid, intermediaries)

But if SSAUse is not meant to express escapes I'll focus my efforts on used_ssas/all_eliminated as suggested.

@Keno
Copy link
Member

Keno commented Jan 29, 2025

If the use is opaque, it doesn't need to be handled, because it's supposed to be caught by the count comparison

@serenity4
Copy link
Contributor

Indeed, so the count comparison takes care of it correctly but the defuses info was wrong after DCE. #57201

@nickrobinson251
Copy link
Contributor Author

thanks for the quick response on this! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior compiler:lowering Syntax lowering (compiler front end, 2nd stage) regression 1.12 Regression in the 1.12 release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants