codegen: crash / incorrect answer with 128 threads #36

Open
jzxia opened this issue Sep 23, 2021 · 2 comments

Comments

@jzxia
Contributor

jzxia commented Sep 23, 2021

The code generated by src/codegen/broutine.jl crashes or produces a wrong answer with 128 threads. These errors have not occurred so far with <= 64 threads.

In particular, I tested on a computer running Ubuntu 20.04.2 LTS whose hardware topology is as follows:

julia> using Hwloc

julia> topology_info()
Machine: 1 (503.78 GB)
 Package: 2 (251.81 GB)
  Group: 8 (62.87 GB)
   NUMANode: 8 (62.87 GB)
    L3Cache: 32 (16.0 MB)
     L2Cache: 128 (512.0 kB)
      L1Cache: 128 (32.0 kB)
       Core: 128
        PU: 256

The following code (or a slight variation of it) is used to perform the test:

using Test
using BenchmarkTools
using LinearAlgebra
using BQCESubroutine
using YaoLocations

Threads.nthreads()

@testset "N=$N" for N in [15, 20]
    st = rand(Float64, 1<<N);
    loc = 1
    locs = BQCESubroutine.Locations(loc);
    st0 = broutine!(copy(st), Val(:X), locs);
    st1 = broutine!(copy(st), [0 1; 1 0], locs);
    println("|err| = ", norm(st0-st1))
    @test st0 ≈ st1
end;
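
For cross-checking either result against something that does not go through the codegen at all, a serial reference could look like the sketch below. The name `reference_x!` is hypothetical, and the mapping of location `loc` to bit `loc - 1` of the zero-based index is an assumption; the convention used by BQCESubroutine may differ.

```julia
# Serial reference for a Pauli-X on one qubit of a state vector.
# Assumption: location `loc` (1-based) corresponds to bit `loc - 1` of the
# zero-based basis index. This is not code from BQCESubroutine.
function reference_x!(st::AbstractVector, loc::Integer)
    mask = 1 << (loc - 1)
    for i in 0:length(st)-1
        # Visit each pair once, starting from the index whose target bit is 0.
        if i & mask == 0
            j = i | mask
            st[i+1], st[j+1] = st[j+1], st[i+1]
        end
    end
    return st
end

st = collect(1.0:8.0)
reference_x!(copy(st), 1)  # swaps neighbouring entries
```

Comparing both `st0` and `st1` against such a reference would show which of the two calls returns the wrong vector when the test fails.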

I did the test for the following cases:

  • loc=1, old codegen using Threads.@threads
  • loc=N, old codegen using Threads.@threads
  • loc=N, new codegen using @batch from Polyester
  • loc=N, new codegen using Threads.@threads

Here "old codegen" means that the following lines of src/codegen/broutine.jl are commented out (so that bsubspace is used), while "new codegen" means that they are retained (so that threaded_subspace_loop_2x2_nontrivial is called).

if n == 1
    push!(ret.args, threaded_subspace_loop_2x2_nontrivial(f_kernel, ctx, brt))
    return ret
end

The test results are as follows. Errors occur in about 1/3 of all trials, and I have not seen any so far with <= 64 threads.

  • (crash) loc=1, old codegen using Threads.@threads
(base) visitor@delta106:~/julia_xjz/BQCESubroutine.jl$ julia --project=@.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Test

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> using BQCESubroutine

julia> using YaoLocations

julia> using BQCESubroutine: threaded_basic_broutine!

julia> @testset "N=$N" for N in [15, 20]
           #@testset "i=$i" for i in 1:N
           #for i in 1:N
           #for j in 1:1000
           for i in 1:1
               st = rand(Float64, 1<<N);
               locs = BQCESubroutine.Locations(i);
               st0 = broutine!(copy(st), Val(:X), locs);
               st1 = broutine!(copy(st), [0 1; 1 0], locs);
               println("|err| = ", norm(st0-st1))
               @test st0 ≈ st1
           end
           #end
       end;

signal (11): Segmentation fault
in expression starting at REPL[7]:1
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
macro expansion at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
pload at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
getindex at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:331 [inlined]
macro expansion at /home/visitor/julia_xjz/BQCESubroutine.jl/src/codegen/broutine.jl:315 [inlined]
#90 at /home/visitor/.julia/packages/Polyester/7cr0U/src/closure.jl:223 [inlined]
BatchClosure at /home/visitor/.julia/packages/Polyester/7cr0U/src/batch.jl:8
unknown function (ip: 0x7f0c580868f0)
_call at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:11 [inlined]
ThreadTask at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:29
unknown function (ip: 0x7f0c5808d9cc)

signal (11): Segmentation fault
in expression starting at REPL[7]:1
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
macro expansion at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
pload at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
getindex at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:331 [inlined]
macro expansion at /home/visitor/julia_xjz/BQCESubroutine.jl/src/codegen/broutine.jl:315 [inlined]
#90 at /home/visitor/.julia/packages/Polyester/7cr0U/src/closure.jl:223 [inlined]
BatchClosure at /home/visitor/.julia/packages/Polyester/7cr0U/src/batch.jl:8
unknown function (ip: 0x7f0c580868f0)
_call at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:11 [inlined]
ThreadTask at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:29
unknown function (ip: 0x7f0c5808d9cc)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 10465430 (Pool: 10461946; Big: 3484); GC: 10
unknown function (ip: (nil))
Allocations: 10465430 (Pool: 10461946; Big: 3484); GC: 10
Segmentation fault (core dumped)
  • (incorrect answer) loc=1, old codegen using Threads.@threads
(base) visitor@delta106:~/julia_xjz/BQCESubroutine.jl$ julia --project=@.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Test

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> using BQCESubroutine

julia> using YaoLocations

julia> using BQCESubroutine: threaded_basic_broutine!

julia> Threads.nthreads()
128

julia> @testset "N=$N" for N in [15, 20]
           #@testset "i=$i" for i in 1:N
           #for i in 1:N
           #for j in 1:1000
           for i in 1:1
               st = rand(Float64, 1<<N);
               locs = BQCESubroutine.Locations(i);
               st0 = broutine!(copy(st), Val(:X), locs);
               st1 = broutine!(copy(st), [0 1; 1 0], locs);
               println("|err| = ", norm(st0-st1))
               @test st0 ≈ st1
           end
           #end
       end;
|err| = 9.591304821338616
N=15: Test Failed at REPL[8]:11
  Expression: st0 ≈ st1
   Evaluated: [0.9892420967764597, 0.26037900123707414, 0.614994982713237, 0.20759717205479533, 0.3126703177619974, 0.18078785290089638, 0.7422001386059047, 0.7726755538057188, 0.3277775066108153, 0.5181144753668747  …  0.5338442075110978, 0.8575211492346384, 0.9954840790239925, 0.6424407507078336, 0.7940770595462205, 0.053890792175115054, 0.9595014083141846, 0.8423338613101816, 0.5532812445454995, 0.42973496521957366] ≈ [0.9892420967764597, 0.26037900123707414, 0.614994982713237, 0.20759717205479533, 0.3126703177619974, 0.18078785290089638, 0.7422001386059047, 0.7726755538057188, 0.3277775066108153, 0.5181144753668747  …  0.5338442075110978, 0.8575211492346384, 0.9954840790239925, 0.6424407507078336, 0.7940770595462205, 0.053890792175115054, 0.9595014083141846, 0.8423338613101816, 0.5532812445454995, 0.42973496521957366]
Stacktrace:
  [1] macro expansion
    @ ./REPL[8]:11 [inlined]
  [2] top-level scope
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1226 [inlined]
  [3] top-level scope
    @ ./REPL[8]:0
  [4] eval
    @ ./boot.jl:360 [inlined]
  [5] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
 [10] (::Base.var"#874#876"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:387
 [11] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [12] invokelatest
    @ ./essentials.jl:706 [inlined]
 [13] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:372
 [14] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:302
 [15] _start()
    @ Base ./client.jl:485
Test Summary: | Fail  Total
N=15          |    1      1
Test Summary: | Fail  Total
N=15          |    1      1
ERROR: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

caused by: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

julia>
  • (crash) loc=N, old codegen using Threads.@threads
...
signal (11): Segmentation fault
in expression starting at REPL[8]:1
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
...
  • (crash) loc=N, new codegen using Threads.@threads
    ditto

  • (crash) loc=N, new codegen using @batch from Polyester
    ditto

  • (incorrect answer) loc=N, new codegen using @batch from Polyester

(base) visitor@delta106:~/julia_xjz/BQCESubroutine.jl$ julia --project=@.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using BQCESubroutine
[ Info: Precompiling BQCESubroutine [29e2bfda-5ba7-471c-9125-afac425f1f80]

julia> using Test

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> using BQCESubroutine

julia> using YaoLocations

julia> Threads.nthreads()
128

julia> @testset "N=$N" for N in [15, 20]
               st = rand(Float64, 1<<N);
               locs = BQCESubroutine.Locations(N);
               st0 = broutine!(copy(st), Val(:X), locs);
               st1 = broutine!(copy(st), [0 1; 1 0], locs);
               println("|err| = ", norm(st0-st1))
               @test st0 ≈ st1
       end;
threaded_subspace_loop_2x2_nontrivial
|err| = 16.047140931650585
N=15: Test Failed at REPL[8]:7
  Expression: st0 ≈ st1
   Evaluated: [0.5331964447622937, 0.6840490894483715, 0.2992315961195635, 0.2788357425851684, 0.8245955857174441, 0.34661593647558275, 0.13788131297975648, 0.4132599933839103, 0.10438664295039812, 0.6052680657151797  …  0.01005720357114237, 0.40938335588275665, 0.13120408445874276, 0.21412778340666128, 0.23683502279509216, 0.4887433118091513, 0.43142024877557206, 0.4821280787877209, 0.5761057194395589, 0.7531886577130373] ≈ [0.5331964447622937, 0.6840490894483715, 0.2992315961195635, 0.2788357425851684, 0.8245955857174441, 0.34661593647558275, 0.13788131297975648, 0.4132599933839103, 0.10438664295039812, 0.6052680657151797  …  0.01005720357114237, 0.40938335588275665, 0.13120408445874276, 0.21412778340666128, 0.23683502279509216, 0.4887433118091513, 0.43142024877557206, 0.4821280787877209, 0.5761057194395589, 0.7531886577130373]
Stacktrace:
  [1] macro expansion
    @ ./REPL[8]:7 [inlined]
  [2] top-level scope
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1226 [inlined]
  [3] top-level scope
    @ ./REPL[8]:0
  [4] eval
    @ ./boot.jl:360 [inlined]
  [5] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
 [10] (::Base.var"#874#876"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:387
 [11] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [12] invokelatest
    @ ./essentials.jl:706 [inlined]
 [13] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:372
 [14] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:302
 [15] _start()
    @ Base ./client.jl:485
Test Summary: | Fail  Total
N=15          |    1      1
Test Summary: | Fail  Total
N=15          |    1      1
ERROR: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

caused by: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

julia>
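
The roughly one-in-three failure rate mentioned earlier could be measured more systematically with a small harness like the one below. This is only a sketch: `count_mismatches` is a hypothetical helper, and `f!` and `g!` stand in for the two calls being compared, for example `st -> broutine!(st, Val(:X), locs)` and `st -> broutine!(st, [0 1; 1 0], locs)`.

```julia
# Count how often two in-place implementations disagree over repeated trials.
# `f!` and `g!` are placeholders for the two calls under comparison.
function count_mismatches(f!, g!, n::Integer; trials::Integer = 100)
    failures = 0
    for _ in 1:trials
        st = rand(Float64, n)
        a = f!(copy(st))   # each implementation gets its own copy
        b = g!(copy(st))
        failures += !(a ≈ b)
    end
    return failures
end
```

A segmentation fault still ends the process, so this only counts the wrong-answer cases; the crash rate would have to be estimated by launching separate Julia processes.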
@jzxia
Contributor Author

jzxia commented Sep 23, 2021

What I've tried so far:

  • disable fastmath and inbounds (change both options to false), but the same error occurs:

const FASTMATH = Ref(true)
const INBOUNDS = Ref(true)

  • add @assert 0 <= $m < (1 << $(ctx.hoisted_vars.nqubits)) before calling kernel, but the same error occurs without triggering the assert.

    for $m in $k : $k | $m_max
        $(kernel(m))
    end

  • test with nthreads=1 to 64 (929304e)
    all tests passed
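
To separate the codegen from the threading runtime, the same pairwise swap can also be written by hand with only Threads.@threads and run at 128 threads. The sketch below is a hypothetical stand-in (the name `threaded_x!` and the bit convention are assumptions), not the code emitted by src/codegen/broutine.jl.

```julia
using Base.Threads: @threads

# Hand-written threaded Pauli-X on bit `loc - 1` of the zero-based index.
# Hypothetical stand-in for the generated kernel; each iteration touches a
# distinct pair of entries, so iterations do not race with each other.
function threaded_x!(st::Vector{Float64}, loc::Integer)
    mask = 1 << (loc - 1)
    npairs = length(st) >> 1
    @threads for p in 0:npairs-1
        # Insert a 0 bit at position loc - 1 of p to get the lower index.
        low = p & (mask - 1)
        i = ((p - low) << 1) | low
        j = i | mask
        st[i+1], st[j+1] = st[j+1], st[i+1]
    end
    return st
end
```

If this version stays correct at 128 threads while the generated code does not, that would point at the generated indexing or at Polyester rather than at Julia's own scheduler.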

Possibly related issue:
JuliaLang/julia#14857

@Roger-luo
Member

Can we try to reduce the MWE by copying out one piece of segfaulting code generated by codegen? The current code is too complicated to address this issue.
