benchmark results oscillate #79

Closed
matthias314 opened this issue Mar 9, 2024 · 12 comments

@matthias314

In this example, the timings returned by @b oscillate between two values, one of which agrees with @btime:

julia> using StaticArrays, BenchmarkTools, Chairmarks

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.522 ns (0 allocations: 0 bytes)
16.110 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.510 ns (0 allocations: 0 bytes)
9.763 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.515 ns (0 allocations: 0 bytes)
16.536 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.517 ns (0 allocations: 0 bytes)
9.764 ns

I only get this on my laptop, not on another machine. versioninfo() for my laptop:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

Packages:

Status `/tmp/jl_AUb4oc/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
  [0ca39b1e] Chairmarks v1.1.2
  [90137ffa] StaticArrays v1.9.3

Addition: For @btime I once got the higher number when I ran the benchmark for the first time:

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  16.658 ns (0 allocations: 0 bytes)
17.701 ns

but I cannot reproduce this.

@LilithHafner
Owner

Very interesting. Thank you for reporting even though it is hard to reproduce.

My initial guess is that this is due to a random slowdown causing the autotuner to pick a significantly too low number of evaluations, which results in a longer reported runtime. This is something that both Chairmarks and BenchmarkTools are susceptible to, but Chairmarks more so due to less time spent tuning.

One possible solution is to verify that the tuning was reasonable using the final estimated runtime, and retry if not.
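Roughly, that retry might look like this (just a sketch, not actual Chairmarks code; the function names, the 30 µs sample target, and the 2x tolerance are made up for illustration):

# Sketch of "verify tuning, retry if unreasonable": tune once, measure, and if the
# measured time per evaluation disagrees with the tuning estimate by more than
# `tolerance`x in either direction, assume tuning was misled by a transient
# slowdown and tune again.
function time_per_eval(workload, evals)
    t = time_ns()
    for _ in 1:evals
        workload()
    end
    (time_ns() - t) / evals
end

function measure_with_retry(workload; target_sample_ns=30_000, tolerance=2.0)
    est = time_per_eval(workload, 100)                 # tuning pass
    evals = max(1, round(Int, target_sample_ns / est))
    measured = time_per_eval(workload, evals)          # real measurement
    if !(est / tolerance <= measured <= est * tolerance)
        est = time_per_eval(workload, 100)             # retune once and re-measure
        evals = max(1, round(Int, target_sample_ns / est))
        measured = time_per_eval(workload, evals)
    end
    measured, evals
end

measure_with_retry(() -> v == v)   # e.g. with the v from above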


However, after a fair bit of experimentation, the closest I could get to reproducing this is

julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
  1.916 ns (0 allocations: 0 bytes)
Benchmark: 56 samples with 8260 evaluations
min    1.993 ns
median 2.003 ns
mean   2.007 ns
max    2.406 ns

julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
  2.041 ns (0 allocations: 0 bytes)
Benchmark: 4 samples with 7769 evaluations
       2.183 ns
       2.183 ns
       2.188 ns
       2.188 ns

julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
  1.958 ns (0 allocations: 0 bytes)
Benchmark: 54 samples with 8256 evaluations
min    1.993 ns
median 1.999 ns
mean   2.006 ns
max    2.387 ns

julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
  1.958 ns (0 allocations: 0 bytes)
Benchmark: 33 samples with 7773 evaluations
min    2.182 ns
median 2.187 ns
mean   2.197 ns
max    2.546 ns

This is totally inconsistent with that hypothesis. What do you get when you change @b to @be?

@LilithHafner
Owner

This seems suspicious and related:

julia> for _ in 1:10
           v = zero(SVector{1,Int16});
           x = @be $v == $v seconds=.003
           println(sum(s -> s.time*s.evals, x.samples) / .003) # efficiency: fraction of the 0.003 s budget spent in measured evaluations
       end
0.005333333333333334
0.6315463333333339
0.9134653333333339
0.8869929999999999
0.8970749999999997
0.8211156666666665
0.8458399999999997
0.9160763333333332
0.7899360000000001
0.959712333333333

Perhaps it is because compile time is counted against the runtime budget, resulting in significantly fewer samples being taken.
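(The 0.0053 efficiency in the first iteration means only about 16 µs of the 3 ms budget went to measured evaluations.) A quick way to check would be to compare the wall-clock time of a first and a second @be call against the budget (untested sketch); any large excess on the first call is plausibly compilation eating into it:

v = zero(SVector{1,Int16})
t1 = @elapsed (@be $v == $v seconds=.003)   # first call: includes compiling the benchmark loop
t2 = @elapsed (@be $v == $v seconds=.003)   # second call: mostly measurement
(t1, t2)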

@matthias314
Author

What do you get when you change @b to @be?

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
  17.638 ns (0 allocations: 0 bytes)
Benchmark: 2699 samples with 1725 evaluations
min    15.839 ns
median 16.171 ns
mean   16.095 ns
max    27.762 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
  9.525 ns (0 allocations: 0 bytes)
Benchmark: 3421 samples with 2721 evaluations
min    9.763 ns
median 9.771 ns
mean   9.879 ns
max    22.681 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
  9.514 ns (0 allocations: 0 bytes)
Benchmark: 3201 samples with 1564 evaluations
min    16.551 ns
median 18.191 ns
mean   18.113 ns
max    32.007 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
  9.521 ns (0 allocations: 0 bytes)
Benchmark: 3410 samples with 2709 evaluations
min    9.766 ns
median 9.778 ns
mean   9.930 ns
max    23.148 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
  9.515 ns (0 allocations: 0 bytes)
Benchmark: 3267 samples with 1561 evaluations
min    17.669 ns
median 17.701 ns
mean   17.861 ns
max    29.844 ns

Note that for @btime I again get the larger value once at the beginning.

@LilithHafner
Owner

Very strange. This is inconsistent with both the compile-time idea and the tuning idea. To confirm that it is not tuning-related, what if you add keywords:

v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421

@matthias314
Author

what if you add keywords

v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421

Now I almost always get the larger value for @be -- and an oscillating behaviour for @btime!

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
  9.518 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    9.766 ns
median 10.036 ns
mean   10.170 ns
max    43.306 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
  10.576 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    15.828 ns
median 16.102 ns
mean   16.322 ns
max    45.574 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
  17.644 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    15.829 ns
median 16.258 ns
mean   16.347 ns
max    49.786 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
  10.573 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    15.829 ns
median 16.262 ns
mean   16.319 ns
max    49.806 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
  17.570 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    15.828 ns
median 16.266 ns
mean   16.718 ns
max    43.635 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
  10.572 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    15.829 ns
median 16.019 ns
mean   16.339 ns
max    29.027 ns

@LilithHafner
Owner

wtf. Benchmarking is hard. I have no idea.

What if you put it in a function?

f() = begin v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421 end

and then call f() repeatedly.

@matthias314
Author

What if you put it in a function?

julia> f() = begin v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421 end
f (generic function with 1 method)

julia> f()
  9.516 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    17.675 ns
median 17.692 ns
mean   18.746 ns
max    53.658 ns

julia> f()
  9.778 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    17.666 ns
median 17.688 ns
mean   18.915 ns
max    52.516 ns

julia> f()
  10.578 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    17.663 ns
median 17.688 ns
mean   18.602 ns
max    46.182 ns

julia> f()
  9.515 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    17.674 ns
median 17.689 ns
mean   18.355 ns
max    46.739 ns

julia> f()
  16.661 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    17.674 ns
median 17.688 ns
mean   18.145 ns
max    41.606 ns

julia> f()
  9.516 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min    17.674 ns
median 17.691 ns
mean   18.206 ns
max    38.266 ns

Maybe this cannot be debugged ...

@LilithHafner
Owner

I wonder if @jrevels has any ideas, seeing how this affects both BenchmarkTools and Chairmarks.

@LilithHafner
Owner

If you take out the call to BenchmarkTools, are the Chairmarks results still slow?

g() = begin v = zero(SVector{14,Int16}); @be $v == $v evals=2721 samples=3421 end

and then call g() repeatedly.

@matthias314
Author

If you take out the call to BenchmarkTools, are the Chairmarks results still slow?

Yes, still slow with g(), both within the same Julia session as before and a fresh one.

@LilithHafner
Owner

LilithHafner commented Mar 9, 2024

I wish I could reproduce this on my machine; that would make debugging easier. I've tried unsuccessfully on:

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: 8 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, generic)
  Threads: 1 on 8 virtual cores

julia> versioninfo()
Julia Version 1.11.0-alpha1
Commit 671de9f5793 (2024-03-01 08:30 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: 8 × unknown
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, nehalem)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores
Environment:
  JULIA_EDITOR = code

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 4 × Apple Silicon
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, k8-sse3)
  Threads: 1 on 4 virtual cores

@LilithHafner
Owner

I'm guessing this is due to hardware quirks. I'm going to close this as not actionable because:

  • I can't reproduce it
  • This affects both BenchmarkTools and Chairmarks, so it's plausibly not due to benchmarking methodology
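
If anyone wants to dig further on the affected laptop, one quick check would be whether the core clock itself is flipping between two frequencies (just a suggestion; this assumes Linux's cpufreq sysfs files are present):

# Print cpu0's current frequency (in kHz) a few times; a clock that jumps between
# two values would line up with timings that oscillate between two levels.
for _ in 1:10
    println(strip(read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", String)), " kHz")
    sleep(0.5)
end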

@LilithHafner closed this as not planned on Nov 27, 2024