+
+
+ 404 | Chairmarks.jl
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/previews/PR157/assets/app.Dp5Nz74x.js b/previews/PR157/assets/app.Dp5Nz74x.js
new file mode 100644
index 00000000..3029428f
--- /dev/null
+++ b/previews/PR157/assets/app.Dp5Nz74x.js
@@ -0,0 +1 @@
+import{R as p}from"./chunks/theme.Dr7bQEWL.js";import{R as o,a6 as u,a7 as c,a8 as l,a9 as f,aa as d,ab as m,ac as h,ad as g,ae as A,af as v,d as P,u as R,v as w,s as y,ag as C,ah as b,ai as E,a4 as S}from"./chunks/framework.rx6Iergl.js";function i(e){if(e.extends){const a=i(e.extends);return{...a,...e,async enhanceApp(t){a.enhanceApp&&await a.enhanceApp(t),e.enhanceApp&&await e.enhanceApp(t)}}}return e}const s=i(p),T=P({name:"VitePressApp",setup(){const{site:e,lang:a,dir:t}=R();return w(()=>{y(()=>{document.documentElement.lang=a.value,document.documentElement.dir=t.value})}),e.value.router.prefetchLinks&&C(),b(),E(),s.setup&&s.setup(),()=>S(s.Layout)}});async function D(){globalThis.__VITEPRESS__=!0;const e=j(),a=_();a.provide(c,e);const t=l(e.route);return a.provide(f,t),a.component("Content",d),a.component("ClientOnly",m),Object.defineProperties(a.config.globalProperties,{$frontmatter:{get(){return t.frontmatter.value}},$params:{get(){return t.page.value.params}}}),s.enhanceApp&&await s.enhanceApp({app:a,router:e,siteData:h}),{app:a,router:e,data:t}}function _(){return g(T)}function j(){let e=o,a;return A(t=>{let n=v(t),r=null;return n&&(e&&(a=n),(e||a===n)&&(n=n.replace(/\.js$/,".lean.js")),r=import(n)),o&&(e=!1),r},s.NotFound)}o&&D().then(({app:e,router:a,data:t})=>{a.go().then(()=>{u(a.route,t.site),e.mount("#app")})});export{D as createApp};
diff --git a/previews/PR157/assets/autoload.md.CTdFRUqF.js b/previews/PR157/assets/autoload.md.CTdFRUqF.js
new file mode 100644
index 00000000..6a31c2c8
--- /dev/null
+++ b/previews/PR157/assets/autoload.md.CTdFRUqF.js
@@ -0,0 +1,6 @@
+import{_ as s,c as a,a5 as t,o as e}from"./chunks/framework.rx6Iergl.js";const c=JSON.parse('{"title":"How to integrate Chairmarks into your workflow","description":"","frontmatter":{},"headers":[],"relativePath":"autoload.md","filePath":"autoload.md","lastUpdated":null}'),n={name:"autoload.md"};function l(o,i,r,h,p,d){return e(),a("div",null,i[0]||(i[0]=[t(`
There are several ways to use Chairmarks in your interactive sessions, ordered from the simplest to install (first) to the most streamlined user experience (last).
Add Chairmarks to your default environment with import Pkg; Pkg.activate(); Pkg.add("Chairmarks"). Chairmarks has no non-stdlib dependencies and precompiles in less than one second, so this should not have any adverse impact on your environments, load times, or package installation times.
Add Chairmarks to your default environment and put isinteractive() && using Chairmarks in your startup.jl file. This will make Chairmarks available in all your REPL sessions while still requiring an explicit load in scripts and packages. This will slow down launching a new Julia session by a few milliseconds (for comparison, this is about 20x faster than loading Revise in your startup.jl file).
[Recommended] Add Chairmarks and BasicAutoloads to your default environment and put the following script in your startup.jl file to automatically load it when you type @b or @be in the REPL:
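A sketch of such a startup.jl snippet, assuming the register_autoloads API described in the BasicAutoloads.jl README (check that README for the current form):
julia
if isinteractive()
    import BasicAutoloads
    BasicAutoloads.register_autoloads([
        ["@b", "@be"] => :(using Chairmarks),
    ])
end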
`,4)]))}const u=s(n,[["render",l]]);export{c as __pageData,u as default};
diff --git a/previews/PR157/assets/autoload.md.CTdFRUqF.lean.js b/previews/PR157/assets/autoload.md.CTdFRUqF.lean.js
new file mode 100644
index 00000000..6a31c2c8
--- /dev/null
+++ b/previews/PR157/assets/autoload.md.CTdFRUqF.lean.js
@@ -0,0 +1,6 @@
+import{_ as s,c as a,a5 as t,o as e}from"./chunks/framework.rx6Iergl.js";const c=JSON.parse('{"title":"How to integrate Chairmarks into your workflow","description":"","frontmatter":{},"headers":[],"relativePath":"autoload.md","filePath":"autoload.md","lastUpdated":null}'),n={name:"autoload.md"};function l(o,i,r,h,p,d){return e(),a("div",null,i[0]||(i[0]=[t(`
There are several ways to use Chairmarks in your interactive sessions, ordered from the simplest to install (first) to the most streamlined user experience (last).
Add Chairmarks to your default environment with import Pkg; Pkg.activate(); Pkg.add("Chairmarks"). Chairmarks has no non-stdlib dependencies and precompiles in less than one second, so this should not have any adverse impact on your environments, load times, or package installation times.
Add Chairmarks to your default environment and put isinteractive() && using Chairmarks in your startup.jl file. This will make Chairmarks available in all your REPL sessions while still requiring an explicit load in scripts and packages. This will slow down launching a new Julia session by a few milliseconds (for comparison, this is about 20x faster than loading Revise in your startup.jl file).
[Recommended] Add Chairmarks and BasicAutoloads to your default environment and put the following script in your startup.jl file to automatically load it when you type @b or @be in the REPL:
This page of the documentation is not targeted at teaching folks how to use this package. Instead, it is designed to offer insight into how the internals work and why I made certain design decisions. That said, it certainly won't hurt your user experience to read this!
This is not part of the API
The things listed on this page are true (or should be fixed) but are not guarantees. They may change in future 1.x releases.
The obvious and formulaic choice, Benchmarks.jl, was taken. This package is very similar to Benchmarks.jl and BenchmarkTools.jl, but has a significantly different implementation and a distinct API. When differentiating multiple similar things, I prefer distinctive names over synonyms or different parts of speech. The difference between the names should, if possible, reflect the difference in the concepts. If that's not possible, it should be clear that the difference between the names does not reflect the difference between concepts. This rules out most names like "Benchmarker.jl", "Benchmarking.jl", "BenchmarkSystem.jl", etc. I could have chosen "EfficientBenchmarks.jl", but that is pretty pretentious and also would become misleading if "BenchmarkTools.jl" becomes more efficient in the future.
Chairmarks doesn't run garbage collection at the start of every benchmark by default
Chairmarks has faster and more efficient auto-tuning
Chairmarks runs its arguments as functions in the scope that the benchmark was invoked from, rather than evaling them at global scope. This makes it possible to get significant performance speedups for fast benchmarks by putting the benchmarking itself into a function. It also avoids leaking memory on repeated invocations of a benchmark, which is unavoidable with BenchmarkTools.jl's design. (discourse, github)
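For example, a small illustrative sketch of that technique (the function name is arbitrary): wrapping the @b call in a function avoids repeated global-scope overhead when the same benchmark is run many times:
julia
using Chairmarks
bench_sort(n) = @b rand(n) sort   # the benchmark call itself lives inside a function
bench_sort(1000)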
Because Chairmarks does not use top-level eval, it can run arbitrarily quickly, limited only by the user's noise tolerance. Consequently, the auto-tuning algorithm is tuned for low runtime budgets in addition to high budgets so that its precision doesn't degrade too much when the budget is small.
Chairmarks tries very hard not to discard data. For example, if your function takes longer to evaluate than the runtime budget, Chairmarks will simply report the warmup runtime (with a disclaimer that there was no warmup). This makes Chairmarks a viable complete substitute for the trivial @time macro and friends. @b sleep(10) takes 10.05 seconds (just like @time sleep(10)), whereas @benchmark sleep(10) takes 30.6 seconds despite only reporting one sample.
When comparing @b to @btime with seconds=.5 or more, yes: result stability should be comparable. Any deficiency in precision or reliability compared to BenchmarkTools is a problem and should be reported. When seconds is less than about 0.5, BenchmarkTools stops respecting the requested runtime budget and so it could very well perform much more precisely than Chairmarks (it's hard to compete with a 500ms benchmark when you only have 1ms). In practice, however, Chairmarks stays pretty reliable even for fairly low runtimes.
When comparing different implementations of the same function, @b rand f,g can be more reliable than judge(minimum(@benchmark(f(x) setup=(x=rand()))), minimum(@benchmark(g(x) setup=(x=rand())))) because the former randomly interleaves calls to f and g in the same context and scope with the same inputs, while the latter runs all evaluations of f before all evaluations of g and—typically less importantly—uses different random inputs.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
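For concreteness, the two approaches contrasted above look roughly like this, using hash and objectid as stand-ins for f and g (the BenchmarkTools half is the usual @benchmark/judge workflow; the Chairmarks half is the experimental comparative form flagged above):
julia
using Chairmarks
@b rand hash,objectid   # interleaved samples, same inputs, same scope

using BenchmarkTools
judge(minimum(@benchmark hash(x) setup=(x=rand())),
      minimum(@benchmark objectid(x) setup=(x=rand())))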
First of all, what is "tuning" for? It's for tuning the number of evaluations per sample. We want the total runtime of a sample to be 30μs, which makes the noise of the instrumentation itself (clock precision, the time it takes to record performance counters, etc.) negligible. If the user specifies evals manually, then there is nothing to tune, so we do a single warmup and then jump straight to the benchmark. In the benchmark, we run samples until the time budget or sample budget is exhausted.
If evals is not provided and seconds is (by default we have seconds=0.1), then we target spending 5% of the time budget on calibration. We have a multi-phase approach where we start by running the function just once and use that to decide the order of the benchmark and how much additional calibration is needed. See https://github.com/LilithHafner/Chairmarks.jl/blob/main/src/benchmarking.jl for details.
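A minimal sketch of the evals calculation only (not the actual implementation; choose_evals and its argument are hypothetical names for illustration):
julia
# Pick evals so that each sample runs for roughly 30μs, given one
# calibration measurement of the function's runtime in seconds.
choose_evals(t_single_seconds) = max(1, round(Int, 30e-6 / t_single_seconds))

choose_evals(5e-9)   # a ~5ns function gets about 6000 evaluations per sample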
We prioritize human experience (both user and developer) over formal guarantees. Where formal guarantees improve the experience of folks using this package, we will try to make and adhere to them. Under both soft and traditional semantic versioning, the version number is primarily used to communicate to users whether a release is breaking. If Chairmarks had an infinite number of users, all of whom respected the formal API by only depending on formally documented behavior, then soft semantic versioning would be equivalent to traditional semantic versioning. However, as the user base differs from that theoretical ideal, so too does the most effective way of communicating which releases are breaking. For example, if version 1.1.0 documents that "the default runtime is 0.1 seconds" and a new version allows users to control this with a global variable, then that change does break the guarantee that the default runtime is 0.1 seconds. However, it still makes sense to release as 1.2.0 rather than 2.0.0 because it is less disruptive to users to have that technical breakage than to have to review the changelog for breakage and decide whether to update their compatibility statements or not.
When there are conflicts between compatibility/alignment with BenchmarkTools and producing the best experience I can for folks who are not coming from BenchmarkTools or using BenchmarkTools simultaneously, I put much more weight on the latter. One reason for this is that folks who want something like BenchmarkTools should use BenchmarkTools. It's a great package that is reliable, mature, and has been stable for a long time. A diversity of design choices lets users pick packages based on their own preferences. Another reason is that I aim to work toward the best long-term benchmarking solution possible (perhaps in some years there will come a time when another package makes both BenchmarkTools.jl and Chairmarks.jl obsolete). To this end, carrying forward design choices I disagree with is not beneficial. All that said, I do not want to break compatibility or change style just to stand out. Almost all of BenchmarkTools' design decisions are solid and worth copying. Things like automatic tuning, the ability to bypass that automatic tuning, a split evals/samples structure, the ability to run untimed setup code before each sample, and many more mundane details we take for granted were once clever design decisions made in BenchmarkTools or its predecessors.
Below, I'll list some specific design departures and why I made them.
Chairmarks uses the abbreviated macros @b and @be. Descriptive names are almost always better than terse one-letter names. However I maintain that macros defined in packages and designed to be typed repeatedly at the REPL are one of the few exceptions to this "almost always". At the REPL, these macros are often typed once and never read. In this case, concision does matter and readability does not. When naming these macros I anticipated that REPL usage would be much more common than usage in packages or reused scripts. However, if and as this changes it may be worth adding longer names for them and possibly restricting the shorter names to interactive use only.
@be, like BenchmarkTools.@benchmark, returns a Benchmark object. @b, unlike BenchmarkTools.@btime, returns a composite sample formed by computing the minimum statistic over the benchmark, rather than returning the expression result and printing runtime statistics. The reason I originally considered making this decision is that I have typed @btime sort!(x) setup=(x=rand(1000)) evals=1 into the REPL and seen the whole screen fill with random numbers too many times. Let's also consider the etymology of @time to justify this decision further. @time is a lovely macro that can be placed around an arbitrary long-running chunk of code or expression to report its runtime to stdout. @time is the print statement of profiling. @btime and @b can very much not fill that role, for three major reasons: first, most long-running code has side effects, and those macros run the code repeatedly, which could break things that rely on those side effects; second, @btime, and to a lesser extent @b, take ages to run; and third, applying only to @btime, @btime runs its body in global scope, not the scope of the caller. @btime and @b are not noninvasive tools to measure the runtime of a portion of an algorithm; they are top-level macros to measure the runtime of an expression or function call. Their primary result is the runtime statistics of the expression under benchmark, and the conventional way to report the primary result of a macro or function call to the calling context is with a return value. Consequently, @b returns an aggregated benchmark result rather than following the pattern of @btime.
If you are writing a script that computes some values and want to display those values to the user, you generally have to call display. Chairmarks is not an exception. If it were possible, I would consider special-casing @show @b blah.
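For example, a short sketch of working with the returned value in a script (the time field is documented in the Sample docstring):
julia
s = @b rand(1000) sort   # in a script nothing is printed; s is the returned sample
display(s)               # show it explicitly, like any other value
s.time                   # or use its fields programmatically (seconds per evaluation)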
Chairmarks's display format differs slightly from BenchmarkTools'. The indentation differences are there to make sure Chairmarks is internally consistent, and the choice of information displayed differs because Chairmarks has more types of information to display than BenchmarkTools.
@btime displays with a leading space while @b does not. No Julia object that I know of displays with a leading space on the first line. Sample (returned by @b) is no different. See above for why @b returns a Sample instead of displaying in the style of @time.
BenchmarkTools.jl's short display mode (@btime) displays runtime and allocations. Chairmarks's short display mode (displaying a sample, or simply @b at the REPL) follows Base.@time instead and captures a wide variety of information, displaying only nonzero values. Here's a selection of the diversity of information Chairmarks makes available to users, paired with how BenchmarkTools treats the same expressions:
It would be a loss to restrict ourselves to only runtime and allocations; it would be distracting to include "0% compilation time" in outputs that have zero compile time; and it would be inconsistent to make some fields (e.g. allocation count and amount) always display while others are only displayed when nonzero. Sparse display is the compromise I've chosen to get the best of both worlds.
`,33)]))}const m=i(n,[["render",h]]);export{c as __pageData,m as default};
diff --git a/previews/PR157/assets/explanations.md.BIDZQFXY.lean.js b/previews/PR157/assets/explanations.md.BIDZQFXY.lean.js
new file mode 100644
index 00000000..a0220cc7
--- /dev/null
+++ b/previews/PR157/assets/explanations.md.BIDZQFXY.lean.js
@@ -0,0 +1,29 @@
+import{_ as i,c as e,a5 as a,o as t}from"./chunks/framework.rx6Iergl.js";const c=JSON.parse('{"title":"Explanation of design decisions","description":"","frontmatter":{},"headers":[],"relativePath":"explanations.md","filePath":"explanations.md","lastUpdated":null}'),n={name:"explanations.md"};function h(o,s,l,r,p,k){return t(),e("div",null,s[0]||(s[0]=[a(`
This page of the documentation is not targeted at teaching folks how to use this package. Instead, it is designed to offer insight into how the internals work and why I made certain design decisions. That said, it certainly won't hurt your user experience to read this!
This is not part of the API
The things listed on this page are true (or should be fixed) but are not guarantees. They may change in future 1.x releases.
The obvious and formulaic choice, Benchmarks.jl, was taken. This package is very similar to Benchmarks.jl and BenchmarkTools.jl, but has a significantly different implementation and a distinct API. When differentiating multiple similar things, I prefer distinctive names over synonyms or different parts of speech. The difference between the names should, if possible, reflect the difference in the concepts. If that's not possible, it should be clear that the difference between the names does not reflect the difference between concepts. This rules out most names like "Benchmarker.jl", "Benchmarking.jl", "BenchmarkSystem.jl", etc. I could have chosen "EfficientBenchmarks.jl", but that is pretty pretentious and also would become misleading if "BenchmarkTools.jl" becomes more efficient in the future.
Chairmarks doesn't run garbage collection at the start of every benchmark by default
Chairmarks has faster and more efficient auto-tuning
Chairmarks runs its arguments as functions in the scope that the benchmark was invoked from, rather than evaling them at global scope. This makes it possible to get significant performance speedups for fast benchmarks by putting the benchmarking itself into a function. It also avoids leaking memory on repeated invocations of a benchmark, which is unavoidable with BenchmarkTools.jl's design. (discourse, github)
Because Chairmarks does not use top-level eval, it can run arbitrarily quickly, limited only by the user's noise tolerance. Consequently, the auto-tuning algorithm is tuned for low runtime budgets in addition to high budgets so that its precision doesn't degrade too much when the budget is small.
Chairmarks tries very hard not to discard data. For example, if your function takes longer to evaluate than the runtime budget, Chairmarks will simply report the warmup runtime (with a disclaimer that there was no warmup). This makes Chairmarks a viable complete substitute for the trivial @time macro and friends. @b sleep(10) takes 10.05 seconds (just like @time sleep(10)), whereas @benchmark sleep(10) takes 30.6 seconds despite only reporting one sample.
When comparing @b to @btime with seconds=.5 or more, yes: result stability should be comparable. Any deficiency in precision or reliability compared to BenchmarkTools is a problem and should be reported. When seconds is less than about 0.5, BenchmarkTools stops respecting the requested runtime budget and so it could very well perform much more precisely than Chairmarks (it's hard to compete with a 500ms benchmark when you only have 1ms). In practice, however, Chairmarks stays pretty reliable even for fairly low runtimes.
When comparing different implementations of the same function, @b rand f,g can be more reliable than judge(minimum(@benchmark(f(x) setup=(x=rand()))), minimum(@benchmark(g(x) setup=(x=rand())))) because the former randomly interleaves calls to f and g in the same context and scope with the same inputs, while the latter runs all evaluations of f before all evaluations of g and—typically less importantly—uses different random inputs.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
First of all, what is "tuning" for? It's for tuning the number of evaluations per sample. We want the total runtime of a sample to be 30μs, which makes the noise of the instrumentation itself (clock precision, the time it takes to record performance counters, etc.) negligible. If the user specifies evals manually, then there is nothing to tune, so we do a single warmup and then jump straight to the benchmark. In the benchmark, we run samples until the time budget or sample budget is exhausted.
If evals is not provided and seconds is (by default we have seconds=0.1), then we target spending 5% of the time budget on calibration. We have a multi-phase approach where we start by running the function just once and use that to decide the order of the benchmark and how much additional calibration is needed. See https://github.com/LilithHafner/Chairmarks.jl/blob/main/src/benchmarking.jl for details.
We prioritize human experience (both user and developer) over formal guarantees. Where formal guarantees improve the experience of folks using this package, we will try to make and adhere to them. Under both soft and traditional semantic versioning, the version number is primarily used to communicate to users whether a release is breaking. If Chairmarks had an infinite number of users, all of whom respected the formal API by only depending on formally documented behavior, then soft semantic versioning would be equivalent to traditional semantic versioning. However, as the user base differs from that theoretical ideal, so too does the most effective way of communicating which releases are breaking. For example, if version 1.1.0 documents that "the default runtime is 0.1 seconds" and a new version allows users to control this with a global variable, then that change does break the guarantee that the default runtime is 0.1 seconds. However, it still makes sense to release as 1.2.0 rather than 2.0.0 because it is less disruptive to users to have that technical breakage than to have to review the changelog for breakage and decide whether to update their compatibility statements or not.
When there are conflicts between compatibility/alignment with BenchmarkTools and producing the best experience I can for folks who are not coming from BenchmarkTools or using BenchmarkTools simultaneously, I put much more weight on the latter. One reason for this is that folks who want something like BenchmarkTools should use BenchmarkTools. It's a great package that is reliable, mature, and has been stable for a long time. A diversity of design choices lets users pick packages based on their own preferences. Another reason is that I aim to work toward the best long-term benchmarking solution possible (perhaps in some years there will come a time when another package makes both BenchmarkTools.jl and Chairmarks.jl obsolete). To this end, carrying forward design choices I disagree with is not beneficial. All that said, I do not want to break compatibility or change style just to stand out. Almost all of BenchmarkTools' design decisions are solid and worth copying. Things like automatic tuning, the ability to bypass that automatic tuning, a split evals/samples structure, the ability to run untimed setup code before each sample, and many more mundane details we take for granted were once clever design decisions made in BenchmarkTools or its predecessors.
Below, I'll list some specific design departures and why I made them.
Chairmarks uses the abbreviated macros @b and @be. Descriptive names are almost always better than terse one-letter names. However I maintain that macros defined in packages and designed to be typed repeatedly at the REPL are one of the few exceptions to this "almost always". At the REPL, these macros are often typed once and never read. In this case, concision does matter and readability does not. When naming these macros I anticipated that REPL usage would be much more common than usage in packages or reused scripts. However, if and as this changes it may be worth adding longer names for them and possibly restricting the shorter names to interactive use only.
@be, like BenchmarkTools.@benchmark, returns a Benchmark object. @b, unlike BenchmarkTools.@btime, returns a composite sample formed by computing the minimum statistic over the benchmark, rather than returning the expression result and printing runtime statistics. The reason I originally considered making this decision is that I have typed @btime sort!(x) setup=(x=rand(1000)) evals=1 into the REPL and seen the whole screen fill with random numbers too many times. Let's also consider the etymology of @time to justify this decision further. @time is a lovely macro that can be placed around an arbitrary long-running chunk of code or expression to report its runtime to stdout. @time is the print statement of profiling. @btime and @b can very much not fill that role, for three major reasons: first, most long-running code has side effects, and those macros run the code repeatedly, which could break things that rely on those side effects; second, @btime, and to a lesser extent @b, take ages to run; and third, applying only to @btime, @btime runs its body in global scope, not the scope of the caller. @btime and @b are not noninvasive tools to measure the runtime of a portion of an algorithm; they are top-level macros to measure the runtime of an expression or function call. Their primary result is the runtime statistics of the expression under benchmark, and the conventional way to report the primary result of a macro or function call to the calling context is with a return value. Consequently, @b returns an aggregated benchmark result rather than following the pattern of @btime.
If you are writing a script that computes some values and want to display those values to the user, you generally have to call display. Chairmarks is not an exception. If it were possible, I would consider special-casing @show @b blah.
Chairmarks's display format differs slightly from BenchmarkTools'. The indentation differences are there to make sure Chairmarks is internally consistent, and the choice of information displayed differs because Chairmarks has more types of information to display than BenchmarkTools.
@btime displays with a leading space while @b does not. No Julia object that I know of displays with a leading space on the first line. Sample (returned by @b) is no different. See above for why @b returns a Sample instead of displaying in the style of @time.
BenchmarkTools.jl's short display mode (@btime) displays runtime and allocations. Chairmarks's short display mode (displaying a sample, or simply @b at the REPL) follows Base.@time instead and captures a wide variety of information, displaying only nonzero values. Here's a selection of the diversity of information Chairmarks makes available to users, paired with how BenchmarkTools treats the same expressions:
It would be a loss to restrict ourselves to only runtime and allocations; it would be distracting to include "0% compilation time" in outputs that have zero compile time; and it would be inconsistent to make some fields (e.g. allocation count and amount) always display while others are only displayed when nonzero. Sparse display is the compromise I've chosen to get the best of both worlds.
`,33)]))}const m=i(n,[["render",h]]);export{c as __pageData,m as default};
diff --git a/previews/PR157/assets/index.md.DrFDggC8.js b/previews/PR157/assets/index.md.DrFDggC8.js
new file mode 100644
index 00000000..a7267611
--- /dev/null
+++ b/previews/PR157/assets/index.md.DrFDggC8.js
@@ -0,0 +1,13 @@
+import{_ as i,c as a,a5 as h,o as n}from"./chunks/framework.rx6Iergl.js";const g=JSON.parse('{"title":"Chairmarks","description":"","frontmatter":{},"headers":[],"relativePath":"index.md","filePath":"index.md","lastUpdated":null}'),k={name:"index.md"};function t(l,s,p,e,r,E){return n(),a("div",null,s[0]||(s[0]=[h(`
julia> using Chairmarks
+
+julia> @b rand(1000) # How long does it take to generate a random array of length 1000?
+720.214 ns (3 allocs: 7.875 KiB)
+
+julia> @b rand(1000) hash # How long does it take to hash that array?
+1.689 μs
+
+julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
+172.970 ns (3 allocs: 7.875 KiB)
+
+julia> @b rand(100,100) inv,_^2,sum # Is it faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL]
+(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs)
`,6)]))}const y=i(k,[["render",t]]);export{g as __pageData,y as default};
diff --git a/previews/PR157/assets/index.md.DrFDggC8.lean.js b/previews/PR157/assets/index.md.DrFDggC8.lean.js
new file mode 100644
index 00000000..a7267611
--- /dev/null
+++ b/previews/PR157/assets/index.md.DrFDggC8.lean.js
@@ -0,0 +1,13 @@
+import{_ as i,c as a,a5 as h,o as n}from"./chunks/framework.rx6Iergl.js";const g=JSON.parse('{"title":"Chairmarks","description":"","frontmatter":{},"headers":[],"relativePath":"index.md","filePath":"index.md","lastUpdated":null}'),k={name:"index.md"};function t(l,s,p,e,r,E){return n(),a("div",null,s[0]||(s[0]=[h(`
julia> using Chairmarks
+
+julia> @b rand(1000) # How long does it take to generate a random array of length 1000?
+720.214 ns (3 allocs: 7.875 KiB)
+
+julia> @b rand(1000) hash # How long does it take to hash that array?
+1.689 μs
+
+julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
+172.970 ns (3 allocs: 7.875 KiB)
+
+julia> @b rand(100,100) inv,_^2,sum # Is it faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL]
+(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs)
`,6)]))}const y=i(k,[["render",t]]);export{g as __pageData,y as default};
diff --git a/previews/PR157/assets/inter-italic-cyrillic-ext.r48I6akx.woff2 b/previews/PR157/assets/inter-italic-cyrillic-ext.r48I6akx.woff2
new file mode 100644
index 00000000..b6b603d5
Binary files /dev/null and b/previews/PR157/assets/inter-italic-cyrillic-ext.r48I6akx.woff2 differ
diff --git a/previews/PR157/assets/inter-italic-cyrillic.By2_1cv3.woff2 b/previews/PR157/assets/inter-italic-cyrillic.By2_1cv3.woff2
new file mode 100644
index 00000000..def40a4f
Binary files /dev/null and b/previews/PR157/assets/inter-italic-cyrillic.By2_1cv3.woff2 differ
diff --git a/previews/PR157/assets/inter-italic-greek-ext.1u6EdAuj.woff2 b/previews/PR157/assets/inter-italic-greek-ext.1u6EdAuj.woff2
new file mode 100644
index 00000000..e070c3d3
Binary files /dev/null and b/previews/PR157/assets/inter-italic-greek-ext.1u6EdAuj.woff2 differ
diff --git a/previews/PR157/assets/inter-italic-greek.DJ8dCoTZ.woff2 b/previews/PR157/assets/inter-italic-greek.DJ8dCoTZ.woff2
new file mode 100644
index 00000000..a3c16ca4
Binary files /dev/null and b/previews/PR157/assets/inter-italic-greek.DJ8dCoTZ.woff2 differ
diff --git a/previews/PR157/assets/inter-italic-latin-ext.CN1xVJS-.woff2 b/previews/PR157/assets/inter-italic-latin-ext.CN1xVJS-.woff2
new file mode 100644
index 00000000..2210a899
Binary files /dev/null and b/previews/PR157/assets/inter-italic-latin-ext.CN1xVJS-.woff2 differ
diff --git a/previews/PR157/assets/inter-italic-latin.C2AdPX0b.woff2 b/previews/PR157/assets/inter-italic-latin.C2AdPX0b.woff2
new file mode 100644
index 00000000..790d62dc
Binary files /dev/null and b/previews/PR157/assets/inter-italic-latin.C2AdPX0b.woff2 differ
diff --git a/previews/PR157/assets/inter-italic-vietnamese.BSbpV94h.woff2 b/previews/PR157/assets/inter-italic-vietnamese.BSbpV94h.woff2
new file mode 100644
index 00000000..1eec0775
Binary files /dev/null and b/previews/PR157/assets/inter-italic-vietnamese.BSbpV94h.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-cyrillic-ext.BBPuwvHQ.woff2 b/previews/PR157/assets/inter-roman-cyrillic-ext.BBPuwvHQ.woff2
new file mode 100644
index 00000000..2cfe6153
Binary files /dev/null and b/previews/PR157/assets/inter-roman-cyrillic-ext.BBPuwvHQ.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-cyrillic.C5lxZ8CY.woff2 b/previews/PR157/assets/inter-roman-cyrillic.C5lxZ8CY.woff2
new file mode 100644
index 00000000..e3886dd1
Binary files /dev/null and b/previews/PR157/assets/inter-roman-cyrillic.C5lxZ8CY.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-greek-ext.CqjqNYQ-.woff2 b/previews/PR157/assets/inter-roman-greek-ext.CqjqNYQ-.woff2
new file mode 100644
index 00000000..36d67487
Binary files /dev/null and b/previews/PR157/assets/inter-roman-greek-ext.CqjqNYQ-.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-greek.BBVDIX6e.woff2 b/previews/PR157/assets/inter-roman-greek.BBVDIX6e.woff2
new file mode 100644
index 00000000..2bed1e85
Binary files /dev/null and b/previews/PR157/assets/inter-roman-greek.BBVDIX6e.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-latin-ext.4ZJIpNVo.woff2 b/previews/PR157/assets/inter-roman-latin-ext.4ZJIpNVo.woff2
new file mode 100644
index 00000000..9a8d1e2b
Binary files /dev/null and b/previews/PR157/assets/inter-roman-latin-ext.4ZJIpNVo.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-latin.Di8DUHzh.woff2 b/previews/PR157/assets/inter-roman-latin.Di8DUHzh.woff2
new file mode 100644
index 00000000..07d3c53a
Binary files /dev/null and b/previews/PR157/assets/inter-roman-latin.Di8DUHzh.woff2 differ
diff --git a/previews/PR157/assets/inter-roman-vietnamese.BjW4sHH5.woff2 b/previews/PR157/assets/inter-roman-vietnamese.BjW4sHH5.woff2
new file mode 100644
index 00000000..57bdc22a
Binary files /dev/null and b/previews/PR157/assets/inter-roman-vietnamese.BjW4sHH5.woff2 differ
diff --git a/previews/PR157/assets/migration.md.Dj8Qh8mB.js b/previews/PR157/assets/migration.md.Dj8Qh8mB.js
new file mode 100644
index 00000000..72c1caee
--- /dev/null
+++ b/previews/PR157/assets/migration.md.Dj8Qh8mB.js
@@ -0,0 +1,76 @@
+import{_ as i,c as a,a5 as t,o as n}from"./chunks/framework.rx6Iergl.js";const g=JSON.parse('{"title":"How to migrate from BenchmarkTools to Chairmarks","description":"","frontmatter":{},"headers":[],"relativePath":"migration.md","filePath":"migration.md","lastUpdated":null}'),e={name:"migration.md"};function h(l,s,k,p,r,d){return n(),a("div",null,s[0]||(s[0]=[t(`
How to migrate from BenchmarkTools to Chairmarks
Chairmarks has a similar samples/evals model to BenchmarkTools. It preserves the keyword arguments samples, evals, and seconds. Unlike BenchmarkTools, the seconds argument is honored even as it drops down to the order of 30μs (@b @b hash(rand()) seconds=.00003). While accuracy does decay as the total number of evaluations and samples decreases, it remains quite reasonable (e.g. I see noise of about 30% when benchmarking @b hash(rand()) seconds=.00003). This makes it much more reasonable to perform meta-analysis, such as computing the time it takes to hash a thousand arrays of different lengths with [@b hash(rand(n)) seconds=.001 for n in 1:1000].
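For instance, a sketch of that kind of meta-analysis, collecting the reported time (seconds per evaluation) for each length:
julia
using Chairmarks
times = [(@b hash(rand(n)) seconds=.001).time for n in 1:1000]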
Both BenchmarkTools and Chairmarks use an evaluation model structured like this:
julia
init()
+results = []
+for _ in 1:samples
+ setup()
+ t0 = time()
+ for _ in 1:evals
+ f()
+ end
+ t1 = time()
+ push!(results, t1 - t0)
+ teardown()
+end
+return results
In BenchmarkTools, you specify f and setup with the invocation @benchmark f setup=(setup). In Chairmarks, you specify f and setup with the invocation @be setup f. In BenchmarkTools, setup and f communicate via shared local variables in code generated by BenchmarkTools. In Chairmarks, the function f is passed the return value of the function setup as an argument. Chairmarks also lets you specify teardown, which is not possible with BenchmarkTools, and an init which can be emulated with interpolation using BenchmarkTools.
Here are some examples of corresponding invocations in BenchmarkTools and Chairmarks:
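For example, here is one corresponding pair (both invocations also appear elsewhere in these docs):
julia
using BenchmarkTools
@btime sort!(x) setup=(x=rand(1000)) evals=1

using Chairmarks
@b rand(1000) sort! evals=1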
For automated regression tests, RegressionTests.jl is a work-in-progress replacement for the BenchmarkGroup and @benchmarkable system. Because Chairmarks is efficiently and stably autotuned and RegressionTests.jl is inherently robust to noise, there is no need for parameter caching.
Chairmarks does not provide a judge function to decide if two benchmarks are significantly different. However, you can get accurate data to inform that judgement by passing a comma-separated list of functions to @b or @be.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
As with BenchmarkTools, benchmarks that access nonconstant globals incur a performance overhead for that access, and you can avoid this via interpolation.
However, Chairmarks's arguments are functions evaluated in the scope of the macro call, not quoted expressions evaled at global scope. This makes nonconstant global access much less of an issue in Chairmarks than in BenchmarkTools, which, in turn, eliminates much of the need to interpolate variables. For example, the following invocations are all equally fast:
julia
julia> x = 6 # nonconstant global
+6
+
+julia> f(len) = @b rand(len) # put the \`@b\` call in a function (highest performance for repeated benchmarks)
+f (generic function with 1 method)
+
+julia> f(x)
+15.318 ns (2 allocs: 112 bytes)
+
+julia> @b rand($x) # interpolate (most familiar to BenchmarkTools users)
+15.620 ns (2 allocs: 112 bytes)
+
+julia> @b x rand # put the access in the setup phase (most concise in simple cases)
+15.507 ns (2 allocs: 112 bytes)
It is possible to use BenchmarkTools.BenchmarkGroup with Chairmarks. Replacing @benchmarkable invocations with @be invocations and wrapping the group in a function suffices. You don't have to run tune! and instead of calling run, call the function. Even running Statistics.median(suite) works—although any custom plotting might need a couple of tweaks.
julia
using BenchmarkTools, Statistics
+
+function create_benchmarks()
+ functions = Function[sqrt, inv, cbrt, sin, cos]
+ group = BenchmarkGroup()
+ for (index, func) in enumerate(functions)
+ group[index] = @benchmarkable $func(x) setup=(x=rand())
+ end
+ group
+end
+
+suite = create_benchmarks()
+
+tune!(suite)
+
+median(run(suite))
+# edit code
+median(run(suite))
julia
using Chairmarks, Statistics
+
+function run_benchmarks()
+ functions = Function[sqrt, inv, cbrt, sin, cos]
+ group = BenchmarkGroup()
+ for (index, func) in enumerate(functions)
+ group[nameof(func)] = @be rand func
+ end
+ group
+end
+
+median(run_benchmarks())
+# edit code
+median(run_benchmarks())
`,29)]))}const E=i(e,[["render",h]]);export{g as __pageData,E as default};
diff --git a/previews/PR157/assets/migration.md.Dj8Qh8mB.lean.js b/previews/PR157/assets/migration.md.Dj8Qh8mB.lean.js
new file mode 100644
index 00000000..72c1caee
--- /dev/null
+++ b/previews/PR157/assets/migration.md.Dj8Qh8mB.lean.js
@@ -0,0 +1,76 @@
+import{_ as i,c as a,a5 as t,o as n}from"./chunks/framework.rx6Iergl.js";const g=JSON.parse('{"title":"How to migrate from BenchmarkTools to Chairmarks","description":"","frontmatter":{},"headers":[],"relativePath":"migration.md","filePath":"migration.md","lastUpdated":null}'),e={name:"migration.md"};function h(l,s,k,p,r,d){return n(),a("div",null,s[0]||(s[0]=[t(`
How to migrate from BenchmarkTools to Chairmarks
Chairmarks has a similar samples/evals model to BenchmarkTools. It preserves the keyword arguments samples, evals, and seconds. Unlike BenchmarkTools, the seconds argument is honored even as it drops down to the order of 30μs (@b @b hash(rand()) seconds=.00003). While accuracy does decay as the total number of evaluations and samples decreases, it remains quite reasonable (e.g. I see noise of about 30% when benchmarking @b hash(rand()) seconds=.00003). This makes it much more reasonable to perform meta-analysis, such as computing the time it takes to hash a thousand arrays of different lengths with [@b hash(rand(n)) seconds=.001 for n in 1:1000].
Both BenchmarkTools and Chairmarks use an evaluation model structured like this:
julia
init()
+results = []
+for _ in 1:samples
+ setup()
+ t0 = time()
+ for _ in 1:evals
+ f()
+ end
+ t1 = time()
+ push!(results, t1 - t0)
+ teardown()
+end
+return results
In BenchmarkTools, you specify f and setup with the invocation @benchmark f setup=(setup). In Chairmarks, you specify f and setup with the invocation @be setup f. In BenchmarkTools, setup and f communicate via shared local variables in code generated by BenchmarkTools. In Chairmarks, the function f is passed the return value of the function setup as an argument. Chairmarks also lets you specify teardown, which is not possible with BenchmarkTools, and an init which can be emulated with interpolation using BenchmarkTools.
Here are some examples of corresponding invocations in BenchmarkTools and Chairmarks:
For automated regression tests, RegressionTests.jl is a work-in-progress replacement for the BenchmarkGroup and @benchmarkable system. Because Chairmarks is efficiently and stably autotuned and RegressionTests.jl is inherently robust to noise, there is no need for parameter caching.
Chairmarks does not provide a judge function to decide if two benchmarks are significantly different. However, you can get accurate data to inform that judgement by passing a comma-separated list of functions to @b or @be.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
As with BenchmarkTools, benchmarks that access nonconstant globals incur a performance overhead for that access, and you can avoid this via interpolation.
However, Chairmarks's arguments are functions evaluated in the scope of the macro call, not quoted expressions evaled at global scope. This makes nonconstant global access much less of an issue in Chairmarks than in BenchmarkTools, which, in turn, eliminates much of the need to interpolate variables. For example, the following invocations are all equally fast:
julia
julia> x = 6 # nonconstant global
+6
+
+julia> f(len) = @b rand(len) # put the \`@b\` call in a function (highest performance for repeated benchmarks)
+f (generic function with 1 method)
+
+julia> f(x)
+15.318 ns (2 allocs: 112 bytes)
+
+julia> @b rand($x) # interpolate (most familiar to BenchmarkTools users)
+15.620 ns (2 allocs: 112 bytes)
+
+julia> @b x rand # put the access in the setup phase (most concise in simple cases)
+15.507 ns (2 allocs: 112 bytes)
It is possible to use BenchmarkTools.BenchmarkGroup with Chairmarks. Replacing @benchmarkable invocations with @be invocations and wrapping the group in a function suffices. You don't have to run tune! and instead of calling run, call the function. Even running Statistics.median(suite) works—although any custom plotting might need a couple of tweaks.
julia
using BenchmarkTools, Statistics
+
+function create_benchmarks()
+ functions = Function[sqrt, inv, cbrt, sin, cos]
+ group = BenchmarkGroup()
+ for (index, func) in enumerate(functions)
+ group[index] = @benchmarkable $func(x) setup=(x=rand())
+ end
+ group
+end
+
+suite = create_benchmarks()
+
+tune!(suite)
+
+median(run(suite))
+# edit code
+median(run(suite))
julia
using Chairmarks, Statistics
+
+function run_benchmarks()
+ functions = Function[sqrt, inv, cbrt, sin, cos]
+ group = BenchmarkGroup()
+ for (index, func) in enumerate(functions)
+ group[nameof(func)] = @be rand func
+ end
+ group
+end
+
+median(run_benchmarks())
+# edit code
+median(run_benchmarks())
`,29)]))}const E=i(e,[["render",h]]);export{g as __pageData,E as default};
diff --git a/previews/PR157/assets/reference.md.iPg_0Q3h.js b/previews/PR157/assets/reference.md.iPg_0Q3h.js
new file mode 100644
index 00000000..1b1a6b95
--- /dev/null
+++ b/previews/PR157/assets/reference.md.iPg_0Q3h.js
@@ -0,0 +1,129 @@
+import{_ as e,c as t,a5 as a,j as i,a as h,G as l,B as k,o as p}from"./chunks/framework.rx6Iergl.js";const A=JSON.parse('{"title":"Formal API","description":"","frontmatter":{},"headers":[],"relativePath":"reference.md","filePath":"reference.md","lastUpdated":null}'),r={name:"reference.md"},d={class:"jldocstring custom-block",open:""},E={class:"jldocstring custom-block",open:""},g={class:"jldocstring custom-block",open:""},o={class:"jldocstring custom-block",open:""},y={class:"jldocstring custom-block",open:""},c={class:"jldocstring custom-block",open:""};function F(C,s,m,u,b,B){const n=k("Badge");return p(),t("div",null,[s[18]||(s[18]=a('
The formal API of Chairmarks is defined by the docstrings of public symbols. Any behavior promised by these docstrings should typically remain in all future non-breaking releases. Specific display behavior is not part of the API.
However, as a package designed primarily for interactive usage, Chairmarks follows soft semantic versioning. A technically breaking change may be released with a non-breaking version number if the change is not expected to cause significant disruptions.
struct Sample
+ evals ::Float64 # The number of times the benchmark was evaluated for this sample.
+ time ::Float64 # The average time taken to run the sample, in seconds per evaluation.
+ allocs ::Float64 # The average number of allocations made per evaluation
+ bytes ::Float64 # The average number of bytes allocated per evaluation
+ gc_fraction ::Float64 # The fraction of time spent in garbage collection (0.0 to 1.0)
+ compile_fraction ::Float64 # The fraction of time spent compiling (0.0 to 1.0)
+ recompile_fraction ::Float64 # The fraction of compile time which was, itself, recompilation (0.0 to 1.0)
+ warmup ::Float64 # Whether this sample had a warmup run before it (1.0 = yes. 0.0 = no).
+ ...more fields may be added...
+end
A struct representing a single sample of a benchmark.
@b returns a composite sample formed by taking the field-wise minimum of the measured samples. More fields may be added in the future as more information becomes available.
struct Benchmark
+ samples::Vector{Sample}
+ ...more fields may be added...
+end
A struct representing a complete benchmark result. Returned by @be.
More fields may be added in the future to represent non-sample-specific information.
The functions minimum and maximum are defined field-wise on Benchmark objects and return Samples. On Julia 1.9 and above, the functions Statistics.median, Statistics.mean, and Statistics.quantile are also defined field-wise on Benchmark objects and return Samples.
julia
julia> @be eval(:(for _ in 1:10; sqrt(rand()); end))
+Benchmark: 15 samples with 1 evaluation
+ min 4.307 ms (3608 allocs: 173.453 KiB, 92.21% compile time)
+ median 4.778 ms (3608 allocs: 173.453 KiB, 94.65% compile time)
+ mean 6.494 ms (3608 allocs: 173.453 KiB, 94.15% compile time)
+ max 12.021 ms (3608 allocs: 173.453 KiB, 95.03% compile time)
+
+julia> minimum(ans)
+4.307 ms (3608 allocs: 173.453 KiB, 92.21% compile time)
@b args... is equivalent to Chairmarks.summarize(@be args...). See the docstring of @be for more information.
Examples
julia
julia> @b rand(10000) # Benchmark a function
+5.833 μs (2 allocs: 78.172 KiB)
+
+julia> @b rand hash # How long does it take to hash a random Float64?
+1.757 ns
+
+julia> @b rand(1000) sort issorted(_) || error() # Simultaneously benchmark and test
+11.291 μs (3 allocs: 18.062 KiB)
+
+julia> @b rand(1000) sort! issorted(_) || error() # BAD! This repeatedly resorts the same array!
+1.309 μs (0.08 allocs: 398.769 bytes)
+
+julia> @b rand(1000) sort! issorted(_) || error() evals=1 # Specify evals=1 to ensure the function is only run once between setup and teardown
+10.041 μs (2 allocs: 10.125 KiB)
+
+julia> @b rand(10) _ sort!∘rand! issorted(_) || error() # Or, include randomization in the benchmarked function and only allocate once
+120.536 ns
+
+julia> @b (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary expressions in any position in the pipeline, not just simple functions.
+183.871 ns
+
+julia> @b (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
+2.447 s (without a warmup)
+
+julia> @b rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
+(17.256 ns, 4.246 ns)
The four positional arguments form a pipeline with the return value of each passed as an argument to the next. Consequently, the first expression in the pipeline must be a nullary function. If you use a symbol like rand, it will be interpreted as a function and called normally. If you use any other expression, it will be interpreted as the body of a nullary function. For example in @be rand(10) the function being benchmarked is () -> rand(10).
Later positions in the pipeline must be unary functions. As with the first function, you may provide either a function or an expression. However, the rules are slightly different. If the expression you provide contains an _ as an rvalue (which would otherwise error), it is interpreted as a unary function and any such occurrences of _ are replaced with the result from the previous function in the pipeline. For example, in @be rand(10) sort(_, rev=true), the setup function is () -> rand(10) and the primary function is x -> sort(x, rev=true). If the expression you provide does not contain an _ as an rvalue, it is assumed to evaluate to a function, which is called with the result from the previous function in the pipeline. For example, in @be rand(10) sort!∘shuffle!, the primary function is simply sort!∘shuffle! and receives no preprocessing. @macroexpand can help elucidate what is going on in specific cases.
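The two cases side by side (the same examples as in the paragraph above; Random is needed for shuffle!):
julia
using Chairmarks, Random
@be rand(10) sort(_, rev=true)   # contains `_` as an rvalue: becomes x -> sort(x, rev=true)
@be rand(10) sort!∘shuffle!      # no `_`: the expression itself is the unary function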
Positional argument disambiguation
setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:
@be f
+@be setup f
+@be setup f teardown
+@be init setup f teardown
You may use an underscore _ to provide other combinations of arguments. For example, you may provide a teardown and no setup with
@be _ f teardown
Keyword arguments
Provide keyword arguments using name=value syntax similar to how you provide keyword arguments to ordinary functions. Keyword arguments to control executions are
evals::Integer How many function evaluations to perform in each sample. Defaults to automatic calibration.
samples::Integer Maximum number of samples to take. Defaults to unlimited and cannot be specified without also specifying evals. Specifying samples = 0 will cause @be to run the warmup sample only and return that sample.
seconds::Real Maximum amount of time to spend benchmarking. Defaults to Chairmarks.DEFAULTS.seconds (which is 0.1 by default) unless samples is specified, in which case it defaults to 10 times as long (1 second, by default). Users are free to modify Chairmarks.DEFAULTS.seconds for their own interactive usage and its default value may change in the future. Set to Inf to disable the time limit. Compile time is typically not counted against this limit. A reasonable effort is made to respect the time limit, but if samples is unspecified it is always exceeded by a small amount (less than 1%) and can be significantly exceeded when benchmarking long-running functions.
gc::Bool An experimental option to disable garbage collection during benchmarking. Defaults to Chairmarks.DEFAULTS.gc, which is true by default. Set to false to disable garbage collection during benchmarking. Disabling garbage collection may cause out of memory errors during a benchmark that requires garbage collection, but should not result in memory leaks that survive past the end of the benchmark. As an experimental option, this may be removed in the future or its semantics may change. This option also depends on Julia internals and so it may break in future versions of Julia.
Interpolation
You may use standard interpolation syntax within any of the positional arguments. This will cause the interpolated values to be evaluated only once upon execution of the benchmark, and the runtime of that evaluation will not be included in reported results. For example,
x = [1,2,3]
+@b length($x)
is equivalent to
@b [1,2,3] _ length _
Evaluation model
At a high level, the implementation of this function looks like this
x = init()
+results = []
+for sample in 1:samples
+ y = setup(x)
+
+ t0 = time()
+
+ z = f(y)
+ for _ in 2:evals
+ f(y)
+ end
+
+ push!(results, time()-t0)
+
+ teardown(z)
+end
So init will be called once, setup and teardown will be called once per sample, and f will be called evals times per sample.
Experimental Features
You can pass a comma separated list of functions or expressions to @be and they will all be benchmarked at the same time with interleaved samples, returning a tuple of Benchmarks.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
Examples
julia
julia> @be rand(10000) # Benchmark a function
+Benchmark: 267 samples with 2 evaluations
+ min 8.500 μs (2 allocs: 78.172 KiB)
+ median 10.354 μs (2 allocs: 78.172 KiB)
+ mean 159.639 μs (2 allocs: 78.172 KiB, 0.37% gc time)
+ max 39.579 ms (2 allocs: 78.172 KiB, 99.93% gc time)
+
+julia> @be rand hash # How long does it take to hash a random Float64?
+Benchmark: 4967 samples with 10805 evaluations
+ min 1.758 ns
+ median 1.774 ns
+ mean 1.820 ns
+ max 5.279 ns
+
+julia> @be rand(1000) sort issorted(_) || error() # Simultaneously benchmark and test
+Benchmark: 2689 samples with 2 evaluations
+ min 9.771 μs (3 allocs: 18.062 KiB)
+ median 11.562 μs (3 allocs: 18.062 KiB)
+ mean 14.933 μs (3 allocs: 18.097 KiB, 0.04% gc time)
+ max 4.916 ms (3 allocs: 20.062 KiB, 99.52% gc time)
+
+julia> @be rand(1000) sort! issorted(_) || error() # BAD! This repeatedly resorts the same array!
+Benchmark: 2850 samples with 13 evaluations
+ min 1.647 μs (0.15 allocs: 797.538 bytes)
+ median 1.971 μs (0.15 allocs: 797.538 bytes)
+ mean 2.212 μs (0.15 allocs: 800.745 bytes, 0.03% gc time)
+ max 262.163 μs (0.15 allocs: 955.077 bytes, 98.95% gc time)
+
+julia> @be rand(1000) sort! issorted(_) || error() evals=1 # Specify evals=1 to ensure the function is only run once between setup and teardown
+Benchmark: 6015 samples with 1 evaluation
+ min 9.666 μs (2 allocs: 10.125 KiB)
+ median 10.916 μs (2 allocs: 10.125 KiB)
+ mean 12.330 μs (2 allocs: 10.159 KiB, 0.02% gc time)
+ max 6.883 ms (2 allocs: 12.125 KiB, 99.56% gc time)
+
+julia> @be rand(10) _ sort!∘rand! issorted(_) || error() # Or, include randomization in the benchmarked function and only allocate once
+Benchmark: 3093 samples with 237 evaluations
+ min 121.308 ns
+ median 126.055 ns
+ mean 128.108 ns
+ max 303.447 ns
+
+julia> @be (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary expressions in any position in the pipeline, not just simple functions.
+Benchmark: 3387 samples with 144 evaluations
+ min 183.160 ns
+ median 184.611 ns
+ mean 188.869 ns
+ max 541.667 ns
+
+julia> @be (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
+Benchmark: 1 sample with 1 evaluation
+ 2.488 s (without a warmup)
+
+julia> @be rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
+Benchmark: 14887 samples with 436 evaluations
+ min 17.106 ns
+ median 18.922 ns
+ mean 20.974 ns
+ max 234.998 ns
+Benchmark: 14887 samples with 436 evaluations
+ min 4.110 ns
+ median 4.683 ns
+ mean 4.979 ns
+ max 42.911 ns
A global constant that holds default benchmarking parameters.
When a parameter is unspecified it defaults to the value stored in Chairmarks.DEFAULTS.
Currently there is one stable default: Chairmarks.DEFAULTS.seconds::Float64 which defaults to 0.1; and one experimental default: Chairmarks.DEFAULTS.gc::Bool which defaults to true.
All default values may be changed in the future and the gc default may be removed entirely.
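For example, adjusting these defaults for an interactive session (the seconds default is documented as user-adjustable; gc is the experimental option described above):
julia
using Chairmarks
Chairmarks.DEFAULTS.seconds = 1.0   # spend up to ~1 second per benchmark by default
Chairmarks.DEFAULTS.gc = false      # experimental: run benchmarks with GC disabled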
',6))])])}const D=e(r,[["render",F]]);export{A as __pageData,D as default};
diff --git a/previews/PR157/assets/reference.md.iPg_0Q3h.lean.js b/previews/PR157/assets/reference.md.iPg_0Q3h.lean.js
new file mode 100644
index 00000000..1b1a6b95
--- /dev/null
+++ b/previews/PR157/assets/reference.md.iPg_0Q3h.lean.js
@@ -0,0 +1,129 @@
+import{_ as e,c as t,a5 as a,j as i,a as h,G as l,B as k,o as p}from"./chunks/framework.rx6Iergl.js";const A=JSON.parse('{"title":"Formal API","description":"","frontmatter":{},"headers":[],"relativePath":"reference.md","filePath":"reference.md","lastUpdated":null}'),r={name:"reference.md"},d={class:"jldocstring custom-block",open:""},E={class:"jldocstring custom-block",open:""},g={class:"jldocstring custom-block",open:""},o={class:"jldocstring custom-block",open:""},y={class:"jldocstring custom-block",open:""},c={class:"jldocstring custom-block",open:""};function F(C,s,m,u,b,B){const n=k("Badge");return p(),t("div",null,[s[18]||(s[18]=a('
The formal API of Chairmarks is defined by the docstrings of public symbols. Any behavior promised by these docstrings should typically remain in all future non-breaking releases. Specific display behavior is not part of the API.
However, as a package designed primarily for interactive usage, Chairmarks follows soft semantic versioning. A technically breaking change may be released with a non-breaking version number if the change is not expected to cause significant disruptions.
struct Sample
+ evals ::Float64 # The number of times the benchmark was evaluated for this sample.
+ time ::Float64 # The average time taken to run the sample, in seconds per evaluation.
+ allocs ::Float64 # The average number of allocations made per evaluation
+ bytes ::Float64 # The average number of bytes allocated per evaluation
+ gc_fraction ::Float64 # The fraction of time spent in garbage collection (0.0 to 1.0)
+ compile_fraction ::Float64 # The fraction of time spent compiling (0.0 to 1.0)
+ recompile_fraction ::Float64 # The fraction of compile time which was, itself, recompilation (0.0 to 1.0)
+ warmup ::Float64 # Whether this sample had a warmup run before it (1.0 = yes, 0.0 = no).
+ ...more fields may be added...
+end
A struct representing a single sample of a benchmark.
@b returns a composite sample formed by taking the field-wise minimum of the measured samples. More fields may be added in the future as more information becomes available.
struct Benchmark
+ samples::Vector{Sample}
+ ...more fields may be added...
+end
A struct representing a complete benchmark result. Returned by @be.
More fields may be added in the future to represent non-sample-specific information.
The functions minimum and maximum are defined field wise on Benchmark objects and return Samples. On Julia 1.9 and above, the functions Statistics.median, Statistics.mean, and Statistics.quantile are also defined field wise on Benchmark objects and return Samples.
julia
julia> @be eval(:(for _ in 1:10; sqrt(rand()); end))
+Benchmark: 15 samples with 1 evaluation
+ min 4.307 ms (3608 allocs: 173.453 KiB, 92.21% compile time)
+ median 4.778 ms (3608 allocs: 173.453 KiB, 94.65% compile time)
+ mean 6.494 ms (3608 allocs: 173.453 KiB, 94.15% compile time)
+ max 12.021 ms (3608 allocs: 173.453 KiB, 95.03% compile time)
+
+julia> minimum(ans)
+4.307 ms (3608 allocs: 173.453 KiB, 92.21% compile time)
@b args... is equivalent to Chairmarks.summarize(@be args...). See the docstring of @be for more information.
Examples
julia
julia> @b rand(10000) # Benchmark a function
+5.833 μs (2 allocs: 78.172 KiB)
+
+julia> @b rand hash # How long does it take to hash a random Float64?
+1.757 ns
+
+julia> @b rand(1000) sort issorted(_) || error() # Simultaneously benchmark and test
+11.291 μs (3 allocs: 18.062 KiB)
+
+julia> @b rand(1000) sort! issorted(_) || error() # BAD! This repeatedly resorts the same array!
+1.309 μs (0.08 allocs: 398.769 bytes)
+
+julia> @b rand(1000) sort! issorted(_) || error() evals=1 # Specify evals=1 to ensure the function is only run once between setup and teardown
+10.041 μs (2 allocs: 10.125 KiB)
+
+julia> @b rand(10) _ sort!∘rand! issorted(_) || error() # Or, include randomization in the benchmarked function and only allocate once
+120.536 ns
+
+julia> @b (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary expressions in any position in the pipeline, not just simple functions.
+183.871 ns
+
+julia> @b (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
+2.447 s (without a warmup)
+
+julia> @b rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
+(17.256 ns, 4.246 ns)
The four positional arguments form a pipeline with the return value of each passed as an argument to the next. Consequently, the first expression in the pipeline must be a nullary function. If you use a symbol like rand, it will be interpreted as a function and called normally. If you use any other expression, it will be interpreted as the body of a nullary function. For example in @be rand(10) the function being benchmarked is () -> rand(10).
Later positions in the pipeline must be unary functions. As with the first function, you may provide either a function or an expression. However, the rules are slightly different. If the expression you provide contains an _ as an rvalue (which would otherwise error), it is interpreted as a unary function and any such occurrences of _ are replaced with the result from the previous function in the pipeline. For example, in @be rand(10) sort(_, rev=true) the setup function is () -> rand(10) and the primary function is x -> sort(x, rev=true). If the expression you provide does not contain an _ as an rvalue, it is assumed to produce a function and is called with the result from the previous function in the pipeline. For example, in @be rand(10) sort!∘shuffle!, the primary function is simply sort!∘shuffle! and receives no preprocessing. @macroexpand can help elucidate what is going on in specific cases.
Positional argument disambiguation
setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:
@be f
+@be setup f
+@be setup f teardown
+@be init setup f teardown
You may use an underscore _ to provide other combinations of arguments. For example, you may provide a teardown and no setup with
@be _ f teardown
Keyword arguments
Provide keyword arguments using name=value syntax similar to how you provide keyword arguments to ordinary functions. Keyword arguments to control executions are
evals::Integer How many function evaluations to perform in each sample. Defaults to automatic calibration.
samples::Integer Maximum number of samples to take. Defaults to unlimited and cannot be specified without also specifying evals. Specifying samples = 0 will cause @be to run the warmup sample only and return that sample.
seconds::Real Maximum amount of time to spend benchmarking. Defaults to Chairmarks.DEFAULTS.seconds (which is 0.1 by default) unless samples is specified, in which case it defaults to 10 times as long (1 second, by default). Users are free to modify Chairmarks.DEFAULTS.seconds for their own interactive usage and its default value may change in the future. Set to Inf to disable the time limit. Compile time is typically not counted against this limit. A reasonable effort is made to respect the time limit, but if samples is unspecified it is always exceeded by a small amount (less than 1%) and can be significantly exceeded when benchmarking long-running functions.
gc::Bool An experimental option to disable garbage collection during benchmarking. Defaults to Chairmarks.DEFAULTS.gc, which is true by default. Set to false to disable garbage collection during benchmarking. Disabling garbage collection may cause out-of-memory errors during a benchmark that requires garbage collection, but should not result in memory leaks that survive past the end of the benchmark. As an experimental option, this may be removed in the future or its semantics may change. This option also depends on Julia internals and so it may break in future versions of Julia.
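For instance, these keywords compose with the pipeline syntax. The following invocations are illustrative sketches (output omitted) that raise the time budget and pin the evaluation and sample counts, using only the keywords documented above:
julia
julia> @b rand(1000) sort seconds=1
+
+julia> @be rand(1000) sort evals=10 samples=500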
Interpolation
You may use standard interpolation syntax within any of the positional arguments. This will cause the interpolated values to be evaluated only once upon execution of the benchmark and the runtime of that evaluation will not be included in reported results. For example,
x = [1,2,3]
+@b length($x)
is equivalent to
@b [1,2,3] _ length _
Evaluation model
At a high level, the implementation of this function looks like this
x = init()
+results = []
+for sample in 1:samples
+ y = setup(x)
+
+ t0 = time()
+
+ z = f(y)
+ for _ in 2:evals
+ f(y)
+ end
+
+ push!(results, time()-t0)
+
+ teardown(z)
+end
So init will be called once, setup and teardown will be called once per sample, and f will be called evals times per sample.
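As a sketch of all four stages together (an illustrative invocation, not one of the docstring examples above): init generates one array, setup copies it for each sample, f sorts the copy, and teardown verifies the result; evals=1 keeps each evaluation on a fresh copy.
julia
julia> @be rand(10_000) copy sort! issorted(_) || error() evals=1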
Experimental Features
You can pass a comma separated list of functions or expressions to @be and they will all be benchmarked at the same time with interleaved samples, returning a tuple of Benchmarks.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
Examples
julia
julia> @be rand(10000) # Benchmark a function
+Benchmark: 267 samples with 2 evaluations
+ min 8.500 μs (2 allocs: 78.172 KiB)
+ median 10.354 μs (2 allocs: 78.172 KiB)
+ mean 159.639 μs (2 allocs: 78.172 KiB, 0.37% gc time)
+ max 39.579 ms (2 allocs: 78.172 KiB, 99.93% gc time)
+
+julia> @be rand hash # How long does it take to hash a random Float64?
+Benchmark: 4967 samples with 10805 evaluations
+ min 1.758 ns
+ median 1.774 ns
+ mean 1.820 ns
+ max 5.279 ns
+
+julia> @be rand(1000) sort issorted(_) || error() # Simultaneously benchmark and test
+Benchmark: 2689 samples with 2 evaluations
+ min 9.771 μs (3 allocs: 18.062 KiB)
+ median 11.562 μs (3 allocs: 18.062 KiB)
+ mean 14.933 μs (3 allocs: 18.097 KiB, 0.04% gc time)
+ max 4.916 ms (3 allocs: 20.062 KiB, 99.52% gc time)
+
+julia> @be rand(1000) sort! issorted(_) || error() # BAD! This repeatedly resorts the same array!
+Benchmark: 2850 samples with 13 evaluations
+ min 1.647 μs (0.15 allocs: 797.538 bytes)
+ median 1.971 μs (0.15 allocs: 797.538 bytes)
+ mean 2.212 μs (0.15 allocs: 800.745 bytes, 0.03% gc time)
+ max 262.163 μs (0.15 allocs: 955.077 bytes, 98.95% gc time)
+
+julia> @be rand(1000) sort! issorted(_) || error() evals=1 # Specify evals=1 to ensure the function is only run once between setup and teardown
+Benchmark: 6015 samples with 1 evaluation
+ min 9.666 μs (2 allocs: 10.125 KiB)
+ median 10.916 μs (2 allocs: 10.125 KiB)
+ mean 12.330 μs (2 allocs: 10.159 KiB, 0.02% gc time)
+ max 6.883 ms (2 allocs: 12.125 KiB, 99.56% gc time)
+
+julia> @be rand(10) _ sort!∘rand! issorted(_) || error() # Or, include randomization in the benchmarked function and only allocate once
+Benchmark: 3093 samples with 237 evaluations
+ min 121.308 ns
+ median 126.055 ns
+ mean 128.108 ns
+ max 303.447 ns
+
+julia> @be (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary expressions in any position in the pipeline, not just simple functions.
+Benchmark: 3387 samples with 144 evaluations
+ min 183.160 ns
+ median 184.611 ns
+ mean 188.869 ns
+ max 541.667 ns
+
+julia> @be (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
+Benchmark: 1 sample with 1 evaluation
+ 2.488 s (without a warmup)
+
+julia> @be rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
+Benchmark: 14887 samples with 436 evaluations
+ min 17.106 ns
+ median 18.922 ns
+ mean 20.974 ns
+ max 234.998 ns
+Benchmark: 14887 samples with 436 evaluations
+ min 4.110 ns
+ median 4.683 ns
+ mean 4.979 ns
+ max 42.911 ns
A global constant that holds default benchmarking parameters.
When a parameter is unspecified it defaults to the value stored in Chairmarks.DEFAULTS.
Currently there is one stable default: Chairmarks.DEFAULTS.seconds::Float64 which defaults to 0.1; and one experimental default: Chairmarks.DEFAULTS.gc::Bool which defaults to true.
All default values may be changed in the future and the gc default may be removed entirely.
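For example, to raise the default time budget for the rest of an interactive session (the particular value chosen here is arbitrary):
julia
julia> Chairmarks.DEFAULTS.seconds = 1.0
+1.0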
',6))])])}const D=e(r,[["render",F]]);export{A as __pageData,D as default};
diff --git a/previews/PR157/assets/regressions.md.COviHpic.js b/previews/PR157/assets/regressions.md.COviHpic.js
new file mode 100644
index 00000000..65deeb26
--- /dev/null
+++ b/previews/PR157/assets/regressions.md.COviHpic.js
@@ -0,0 +1,9 @@
+import{_ as i,c as a,a5 as e,o as t}from"./chunks/framework.rx6Iergl.js";const d=JSON.parse('{"title":"How to use Chairmarks for regression testing","description":"","frontmatter":{},"headers":[],"relativePath":"regressions.md","filePath":"regressions.md","lastUpdated":null}'),n={name:"regressions.md"};function r(h,s,l,p,k,o){return t(),a("div",null,s[0]||(s[0]=[e(`
Regression testing is a difficult task. RegressionTests.jl has ambitious goals and is already state of the art within the Julia ecosystem, but it is very much a work in progress. Proceed at your own risk, or wait for that package to reach maturity.
Use RegressionTests.jl! Make a file bench/runbenchmarks.jl with the following content:
`,7)]))}const c=i(n,[["render",r]]);export{d as __pageData,c as default};
diff --git a/previews/PR157/assets/regressions.md.COviHpic.lean.js b/previews/PR157/assets/regressions.md.COviHpic.lean.js
new file mode 100644
index 00000000..65deeb26
--- /dev/null
+++ b/previews/PR157/assets/regressions.md.COviHpic.lean.js
@@ -0,0 +1,9 @@
+import{_ as i,c as a,a5 as e,o as t}from"./chunks/framework.rx6Iergl.js";const d=JSON.parse('{"title":"How to use Chairmarks for regression testing","description":"","frontmatter":{},"headers":[],"relativePath":"regressions.md","filePath":"regressions.md","lastUpdated":null}'),n={name:"regressions.md"};function r(h,s,l,p,k,o){return t(),a("div",null,s[0]||(s[0]=[e(`
Regression testing is a difficult task. RegressionTests.jl has ambitious goals and is already state of the art within the Julia ecosystem, but it is very much a work in progress. Proceed at your own risk, or wait for that package to reach maturity.
Use RegressionTests.jl! Make a file bench/runbenchmarks.jl with the following content:
Welcome! This tutorial assumes very little prior knowledge and walks you through how to become a competent user of Chairmarks. If you are already an experienced user of BenchmarkTools, you may want to read about how to migrate from BenchmarkTools to Chairmarks instead.
Now, launch a Julia REPL by typing julia at the command line.
To install Chairmarks, type ] to enter the package manager, and then type
julia
(@v1.xx) pkg> add Chairmarks
This will install Chairmarks into your default environment. Unlike most packages, installing Chairmarks into your default environment is recommended because it is a very lightweight package and a development tool.
Now, you can use Chairmarks by typing using Chairmarks in the REPL. Press backspace to exit the package manager and return to the REPL and run
Congratulations! This is your first result from Chairmarks. Let's look a little closer at the invocation and results. @b is a macro exported from Chairmarks. It takes the expression rand(100) and runs it a bunch of times, measuring how long it takes to run.
The result, 95.500 ns (2 allocs: 928 bytes), tells us that the expression takes 95.5 nanoseconds to run and allocates 928 bytes of memory spread across two distinct allocation events. The exact results you get will likely differ based on your hardware and the Julia version you are using. These results are from Julia 1.11.
Chairmarks reports results in seconds (s), milliseconds (ms), microseconds (μs), or nanoseconds (ns) depending on the magnitude of the runtime. Each of these units is 1000 times smaller than the last according to the standard SI unit system.
By default, Chairmarks reports the fastest runtime of the expression. This is typically the best choice for reducing noise in microbenchmarks as things like garbage collection and other background tasks can cause inconsistent slowdowns but not speedups. If you want to get the full results, use the @be macro. (@be is longer than @b and gives a longer output)
julia
julia> @be rand(100)
+Benchmark: 19442 samples with 25 evaluations
+ min 95.000 ns (2 allocs: 928 bytes)
+ median 103.320 ns (2 allocs: 928 bytes)
+ mean 140.096 ns (2 allocs: 928 bytes, 0.36% gc time)
+ max 19.748 μs (2 allocs: 928 bytes, 96.95% gc time)
This invocation runs the same experiment as @b, but reports more results. It ran 19442 samples, each of which involved recording some performance counters, running rand(100) 25 times, and then recording the performance counters again and computing the difference. The reported runtimes and allocations are those differences divided by the number of evaluations. We can see here that the runtime of rand(100) is pretty stable. 50% of the time it ranges between 95 and 103.3 nanoseconds. However, the maximum time is two orders of magnitude slower than the mean time. This is because the maximum time includes a garbage collection event that took 96.95% of the time.[1]
Sometimes, we wish to measure the runtime of a function that requires some data to operate on, but don't want to measure the runtime of the function that generates the data. For example, we may want to compare how long it takes to hash an array of numbers, but we don't want to include the time it takes to generate the input in our measurements. We can do this using Chairmarks' pipeline syntax:
julia
julia> @b rand(100) hash
+166.665 ns
The first argument is called once per sample, and the second argument is called once per evaluation, each time passing the result of the first argument. We can also use the special _ variable to refer to the output of the previous step. Here, we benchmark computing the norm of a vector:
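For example (an illustrative invocation computing the Euclidean norm via the _ placeholder; output omitted since timings vary by machine):
julia
julia> @b rand(100) sqrt(sum(abs2, _))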
The _ refers to the array whose norm is to be computed.
We can perform a comparison of two different implementations of the same specification by providing a comma-separated list of functions to benchmark. Here, we compare two ways of computing the norm of a vector:
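A sketch of such a comparison (the two implementations chosen here, LinearAlgebra.norm and an explicit sum of squares, are illustrative; output omitted):
julia
julia> using LinearAlgebra
+
+julia> @b rand(100) norm,sqrt(sum(abs2, _))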
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
This invocation pattern runs the setup function once per sample and randomly selects which implementation to run first for each sample. This makes comparative benchmarks robust to fluctuations in system load.
When benchmarking a function which mutates its arguments, be aware that the same input is passed to the function for each evaluation in a sample. This can cause problems if the function does not expect to repeatedly operate on the same input.
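Consider the sort! example reproduced from the @be docstring, which re-sorts the same array within each sample:
julia
julia> @be rand(1000) sort! issorted(_) || error()
+Benchmark: 2850 samples with 13 evaluations
+ min 1.647 μs (0.15 allocs: 797.538 bytes)
+ median 1.971 μs (0.15 allocs: 797.538 bytes)
+ mean 2.212 μs (0.15 allocs: 800.745 bytes, 0.03% gc time)
+ max 262.163 μs (0.15 allocs: 955.077 bytes, 98.95% gc time)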
We can see immediately that something suspicious is going on here: the reported number of allocations (which we expect to be an integer) is a floating point number. This is because for each sample, the array is sorted once, which involves allocating a scratch space, and then that same array is re-sorted repeatedly. It turns out sort! operates very quickly and does not allocate at all when it is passed a sorted array. To benchmark this more accurately, we may specify the number of evaluations:
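Two ways to do this, sketched below: pass evals=1 so each sample sorts a fresh array, or fold the re-randomization into the benchmarked function itself (both follow the patterns documented in the @be docstring):
julia
@be rand(1000) sort! issorted(_) || error() evals=1
+@be rand(1000) _ sort!∘rand! issorted(_) || error()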
Notice that each of these invocations produces a different output. Setting evals to 1 can cause strange effects whenever the runtime of the expression is less than about 30 μs both due to the overhead of starting and stopping the timers and due to the imprecision of timer results on most machines. Any form of pre-processing included in the primary function will be included in the reported runtime, so each of the latter options also introduces artifacts.
In general, it is important to use the same methodology when comparing two different functions. Chairmarks is optimized to produce reliable results for answering questions of the form "which of these two implementations of the same specification is faster", more so than providing absolute measurements of the runtime of fast-running functions.
That said, for functions which take more than about 30 μs to run, Chairmarks can reliably provide accurate absolute timings. In general, the faster the runtime of the expression being measured, the more strange behavior and artifacts you will see, and the more careful you have to be.
Longer runtimes and macrobenchmarks are much more trustworthy than microbenchmarks, though microbenchmarks are often a great tool for identifying performance bottlenecks and optimizing macrobenchmarks.
It's pretty straightforward to benchmark a whole parameter sweep to check performance figures. Just invoke @b or @be repeatedly. For example, if you want to know how allocation times vary with input size, you could run this list comprehension which runs @b fill(0, n) for each power of 4 from 4 to 4^10:
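Concretely, that sweep is the comprehension below (the trailing semicolon suppresses the vector of results):
julia
julia> [@b fill(0, n) for n in 4 .^ (1:10)];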
The default runtime of a benchmark is 0.1 seconds, so this invocation should take just over 1 second to run. Let's verify:
julia
julia> @time [@b fill(0, n) for n in 4 .^ (1:10)];
+ 1.038502 seconds (27.16 M allocations: 22.065 GiB, 27.03% gc time, 3.59% compilation time)
If we want a wider parameter sweep, we can use the seconds parameter to configure how long benchmarking will take. However, once we start setting seconds to a value below 0.1, the benchmarking itself becomes performance sensitive and, from the performance tips, performance critical code should be inside a function. So we should put the call to @b or @be into a function.
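A minimal sketch of that pattern (the helper name bench_fill and the 0.01-second budget are illustrative):
julia
julia> bench_fill(n) = @b fill(0, n) seconds=0.01
+bench_fill (generic function with 1 method)
+
+julia> [bench_fill(n) for n in 4 .^ (1:10)];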
Setting the seconds parameter too low can cause benchmarks to be noisy. It's good practice to run a benchmark at least a couple of times no matter what the configuration is to make sure it's reasonably stable.
It is possible to manually specify the number of evaluations, samples, and/or seconds to run benchmarking for and configure the default benchmarking runtime. It is also possible to pass a teardown function or an initialization function that runs only once. See the docstring of @be for more information on these additional arguments.
Note that the samples are aggregated element-wise, so the max field reports the maximum runtime and the maximum proportion of runtime spent in garbage collection (gc). Thus it is possible that the trial which had a 19.748 μs runtime was not the same trial that spent 96.95% of its time in garbage collection. This is in order to make the results more consistent. If half the trials spend 10% of their time in gc and runtime varies based on other factors, it would be unfortunate to report the maximum gc time as either 10% or 0% at random depending on whether the longest-running trial happened to trigger gc. ↩︎
`,52)]))}const g=i(h,[["render",e]]);export{o as __pageData,g as default};
diff --git a/previews/PR157/assets/tutorial.md.DSdCipZ_.lean.js b/previews/PR157/assets/tutorial.md.DSdCipZ_.lean.js
new file mode 100644
index 00000000..0e00fd90
--- /dev/null
+++ b/previews/PR157/assets/tutorial.md.DSdCipZ_.lean.js
@@ -0,0 +1,59 @@
+import{_ as i,c as a,a5 as n,o as t}from"./chunks/framework.rx6Iergl.js";const o=JSON.parse('{"title":"Tutorial","description":"","frontmatter":{},"headers":[],"relativePath":"tutorial.md","filePath":"tutorial.md","lastUpdated":null}'),h={name:"tutorial.md"};function e(l,s,k,p,r,d){return t(),a("div",null,s[0]||(s[0]=[n(`
Welcome! This tutorial assumes very little prior knowledge and walks you through how to become a competent user of Chairmarks. If you are already an experienced user of BenchmarkTools, you may want to read about how to migrate from BenchmarkTools to Chairmarks instead.
Now, launch a Julia REPL by typing julia at the command line.
To install Chairmarks, type ] to enter the package manager, and then type
julia
(@v1.xx) pkg> add Chairmarks
This will install Chairmarks into your default environment. Unlike most packages, installing Chairmarks into your default environment is recommended because it is a very lightweight package and a development tool.
Now, you can use Chairmarks by typing using Chairmarks in the REPL. Press backspace to exit the package manager and return to the REPL and run
Congratulations! This is your first result from Chairmarks. Let's look a little closer at the invocation and results. @b is a macro exported from Chairmarks. It takes the expression rand(100) and runs it a bunch of times, measuring how long it takes to run.
The result, 95.500 ns (2 allocs: 928 bytes), tells us that the expression takes 95.5 nanoseconds to run and allocates 928 bytes of memory spread across two distinct allocation events. The exact results you get will likely differ based on your hardware and the Julia version you are using. These results are from Julia 1.11.
Chairmarks reports results in seconds (s), milliseconds (ms), microseconds (μs), or nanoseconds (ns) depending on the magnitude of the runtime. Each of these units is 1000 times smaller than the last according to the standard SI unit system.
By default, Chairmarks reports the fastest runtime of the expression. This is typically the best choice for reducing noise in microbenchmarks as things like garbage collection and other background tasks can cause inconsistent slowdowns but not speedups. If you want to get the full results, use the @be macro. (@be is longer than @b and gives a longer output)
julia
julia> @be rand(100)
+Benchmark: 19442 samples with 25 evaluations
+ min 95.000 ns (2 allocs: 928 bytes)
+ median 103.320 ns (2 allocs: 928 bytes)
+ mean 140.096 ns (2 allocs: 928 bytes, 0.36% gc time)
+ max 19.748 μs (2 allocs: 928 bytes, 96.95% gc time)
This invocation runs the same experiment as @b, but reports more results. It ran 19442 samples, each of which involved recording some performance counters, running rand(100) 25 times, and then recording the performance counters again and computing the difference. The reported runtimes and allocations are those differences divided by the number of evaluations. We can see here that the runtime of rand(100) is pretty stable. 50% of the time it ranges between 95 and 103.3 nanoseconds. However, the maximum time is two orders of magnitude slower than the mean time. This is because the maximum time includes a garbage collection event that took 96.95% of the time.[1]
Sometimes, we wish to measure the runtime of a function that requires some data to operate on, but don't want to measure the runtime of the function that generates the data. For example, we may want to compare how long it takes to hash an array of numbers, but we don't want to include the time it takes to generate the input in our measurements. We can do this using Chairmarks' pipeline syntax:
julia
julia> @b rand(100) hash
+166.665 ns
The first argument is called once per sample, and the second argument is called once per evaluation, each time passing the result of the first argument. We can also use the special _ variable to refer to the output of the previous step. Here, we benchmark computing the norm of a vector:
The _ refers to the array whose norm is to be computed.
We can perform a comparison of two different implementations of the same specification by providing a comma-separated list of functions to benchmark. Here, we compare two ways of computing the norm of a vector:
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
This invocation pattern runs the setup function once per sample and randomly selects which implementation to run first for each sample. This makes comparative benchmarks robust to fluctuations in system load.
When benchmarking a function which mutates its arguments, be aware that the same input is passed to the function for each evaluation in a sample. This can cause problems if the function does not expect to repeatedly operate on the same input.
We can see immediately that something suspicious is going on here: the reported number of allocations (which we expect to be an integer) is a floating point number. This is because for each sample, the array is sorted once, which involves allocating a scratch space, and then that same array is re-sorted repeatedly. It turns out sort! operates very quickly and does not allocate at all when it is passed a sorted array. To benchmark this more accurately, we may specify the number of evaluations:
Notice that each of these invocations produces a different output. Setting evals to 1 can cause strange effects whenever the runtime of the expression is less than about 30 μs both due to the overhead of starting and stopping the timers and due to the imprecision of timer results on most machines. Any form of pre-processing included in the primary function will be included in the reported runtime, so each of the latter options also introduces artifacts.
In general, it is important to use the same methodology when comparing two different functions. Chairmarks is optimized to produce reliable results for answering questions of the form "which of these two implementations of the same specification is faster", more so than providing absolute measurements of the runtime of fast-running functions.
That said, for functions which take more than about 30 μs to run, Chairmarks can reliably provide accurate absolute timings. In general, the faster the runtime of the expression being measured, the more strange behavior and artifacts you will see, and the more careful you have to be.
Longer runtimes and macrobenchmarks are much more trustworthy than microbenchmarks, though microbenchmarks are often a great tool for identifying performance bottlenecks and optimizing macrobenchmarks.
It's pretty straightforward to benchmark a whole parameter sweep to check performance figures. Just invoke @b or @be repeatedly. For example, if you want to know how allocation times vary with input size, you could run this list comprehension which runs @b fill(0, n) for each power of 4 from 4 to 4^10:
The default runtime of a benchmark is 0.1 seconds, so this invocation should take just over 1 second to run. Let's verify:
julia
julia> @time [@b fill(0, n) for n in 4 .^ (1:10)];
+ 1.038502 seconds (27.16 M allocations: 22.065 GiB, 27.03% gc time, 3.59% compilation time)
If we want a wider parameter sweep, we can use the seconds parameter to configure how long benchmarking will take. However, once we start setting seconds to a value below 0.1, the benchmarking itself becomes performance sensitive and, from the performance tips, performance critical code should be inside a function. So we should put the call to @b or @be into a function.
Setting the seconds parameter too low can cause benchmarks to be noisy. It's good practice to run a benchmark at least a couple of times no matter what the configuration is to make sure it's reasonably stable.
It is possible to manually specify the number of evaluations, samples, and/or seconds to run benchmarking for and configure the default benchmarking runtime. It is also possible to pass a teardown function or an initialization function that runs only once. See the docstring of @be for more information on these additional arguments.
Note that the samples are aggregated element-wise, so the max field reports the maximum runtime and the maximum proportion of runtime spent in garbage collection (gc). Thus it is possible that the trial which had a 19.748 μs runtime was not the same trial that spent 96.95% of its time in garbage collection. This is in order to make the results more consistent. If half the trials spend 10% of their time in gc and runtime varies based on other factors, it would be unfortunate to report the maximum gc time as either 10% or 0% at random depending on whether the longest-running trial happened to trigger gc. ↩︎
`,52)]))}const g=i(h,[["render",e]]);export{o as __pageData,g as default};
diff --git a/previews/PR157/assets/why.md.D-0pwC7u.js b/previews/PR157/assets/why.md.D-0pwC7u.js
new file mode 100644
index 00000000..5a34bb7e
--- /dev/null
+++ b/previews/PR157/assets/why.md.D-0pwC7u.js
@@ -0,0 +1,22 @@
+import{_ as i,c as a,a5 as t,o as h}from"./chunks/framework.rx6Iergl.js";const g=JSON.parse('{"title":"","description":"","frontmatter":{},"headers":[],"relativePath":"why.md","filePath":"why.md","lastUpdated":null}'),n={name:"why.md"};function l(e,s,k,p,r,d){return h(),a("div",null,s[0]||(s[0]=[t(`
Chairmarks uses a concise pipeline syntax to define benchmarks. When providing a single argument, that argument is automatically wrapped in a function for higher performance and executed
On versions of Julia prior to 1.8, Chairmarks automatically computes a checksum based on the results of the provided computations and stores the checksum in Chairmarks.CHECKSUM. This makes it impossible for the compiler to elide any part of the computation that has an impact on its return value.
While the checksums are reasonably fast, one negative side effect of this is that they add a bit of overhead to the measured runtime, and that overhead can vary depending on the return value of the function being benchmarked. In versions of Julia 1.8 and later, these checksums are emulated using the function Base.donotdelete which is designed and documented to ensure that necessary computation is not elided without adding extra overhead.
Chairmarks is inherently narrower than BenchmarkTools by construction. It also has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support.
`,25)]))}const y=i(n,[["render",l]]);export{g as __pageData,y as default};
diff --git a/previews/PR157/assets/why.md.D-0pwC7u.lean.js b/previews/PR157/assets/why.md.D-0pwC7u.lean.js
new file mode 100644
index 00000000..5a34bb7e
--- /dev/null
+++ b/previews/PR157/assets/why.md.D-0pwC7u.lean.js
@@ -0,0 +1,22 @@
+import{_ as i,c as a,a5 as t,o as h}from"./chunks/framework.rx6Iergl.js";const g=JSON.parse('{"title":"","description":"","frontmatter":{},"headers":[],"relativePath":"why.md","filePath":"why.md","lastUpdated":null}'),n={name:"why.md"};function l(e,s,k,p,r,d){return h(),a("div",null,s[0]||(s[0]=[t(`
Chairmarks uses a concise pipeline syntax to define benchmarks. When providing a single argument, that argument is automatically wrapped in a function for higher performance and executed
On versions of Julia prior to 1.8, Chairmarks automatically computes a checksum based on the results of the provided computations and stores the checksum in Chairmarks.CHECKSUM. This makes it impossible for the compiler to elide any part of the computation that has an impact on its return value.
While the checksums are reasonably fast, one negative side effect of this is that they add a bit of overhead to the measured runtime, and that overhead can vary depending on the return value of the function being benchmarked. In versions of Julia 1.8 and later, these checksums are emulated using the function Base.donotdelete which is designed and documented to ensure that necessary computation is not elided without adding extra overhead.
Chairmarks is inherently narrower than BenchmarkTools by construction. It also has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support.
`,25)]))}const y=i(n,[["render",l]]);export{g as __pageData,y as default};
diff --git a/previews/PR157/autoload.html b/previews/PR157/autoload.html
new file mode 100644
index 00000000..d4681cc3
--- /dev/null
+++ b/previews/PR157/autoload.html
@@ -0,0 +1,30 @@
+
+
+
+
+
+ How to integrate Chairmarks into your workflow | Chairmarks.jl
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
There are several ways to use Chairmarks in your interactive sessions, ordered from the simplest to install (first) to the most streamlined user experience (last).
Add Chairmarks to your default environment with import Pkg; Pkg.activate(); Pkg.add("Chairmarks"). Chairmarks has no non-stdlib dependencies and precompiles in less than one second, so this should not have any adverse impact on your environments, nor slow your load times or package installation times.
Add Chairmarks to your default environment and put isinteractive() && using Chairmarks in your startup.jl file. This will make Chairmarks available in all your REPL sessions while still requiring an explicit load in scripts and packages. This will slow down launching a new Julia session by a few milliseconds (for comparison, this is about 20x faster than loading Revise in your startup.jl file).
[Recommended] Add Chairmarks and BasicAutoloads to your default environment and put the following script in your startup.jl file to automatically load it when you type @b or @be in the REPL:
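A sketch of such a snippet, assuming BasicAutoloads' register_autoloads interface (consult the BasicAutoloads documentation for the exact current form):
julia
if isinteractive()
    import BasicAutoloads
    BasicAutoloads.register_autoloads([
        ["@b", "@be"] => :(using Chairmarks),
    ])
end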
This page of the documentation is not targeted at teaching folks how to use this package. Instead, it is designed to offer insight into how the internals work and why I made certain design decisions. That said, it certainly won't hurt your user experience to read this!
This is not part of the API
The things listed on this page are true (or should be fixed) but are not guarantees. They may change in future 1.x releases.
The obvious and formulaic choice, Benchmarks.jl, was taken. This package is very similar to Benchmarks.jl and BenchmarkTools.jl, but has a significantly different implementation and a distinct API. When differentiating multiple similar things, I prefer distinctive names over synonyms or different parts of speech. The difference between the names should, if possible, reflect the difference in the concepts. If that's not possible, it should be clear that the difference between the names does not reflect the difference between concepts. This rules out most names like "Benchmarker.jl", "Benchmarking.jl", "BenchmarkSystem.jl", etc. I could have chosen "EfficientBenchmarks.jl", but that is pretty pretentious and also would become misleading if "BenchmarkTools.jl" becomes more efficient in the future.
Chairmarks doesn't run garbage collection at the start of every benchmark by default
Chairmarks has faster and more efficient auto-tuning
Chairmarks runs its arguments as functions in the scope that the benchmark was invoked from, rather than evaling them at global scope. This makes it possible to get significant performance speedups for fast benchmarks by putting the benchmarking itself into a function. It also avoids leaking memory on repeated invocations of a benchmark, which is unavoidable with BenchmarkTools.jl's design. (discourse, github)
Because Chairmarks does not use top-level eval, it can run arbitrarily quickly, as limited by a user's noise tolerance. Consequently, the auto-tuning algorithm is tuned for low runtime budgets in addition to high budgets so its precision doesn't degrade too much at low runtime budgets.
Chairmarks tries very hard not to discard data. For example, if your function takes longer to evaluate than the runtime budget, Chairmarks will simply report the warmup runtime (with a disclaimer that there was no warmup). This makes Chairmarks a viable complete substitute for the trivial @time macro and friends. @b sleep(10) takes 10.05 seconds (just like @time sleep(10)), whereas @benchmark sleep(10) takes 30.6 seconds despite only reporting one sample.
When comparing @b to @btime with seconds=.5 or more, yes: result stability should be comparable. Any deficiency in precision or reliability compared to BenchmarkTools is a problem and should be reported. When seconds is less than about 0.5, BenchmarkTools stops respecting the requested runtime budget and so it could very well perform much more precisely than Chairmarks (it's hard to compete with a 500ms benchmark when you only have 1ms). In practice, however, Chairmarks stays pretty reliable even for fairly low runtimes.
When comparing different implementations of the same function, @b rand f,g can be more reliable than judge(minimum(@benchmark(f(x) setup=(x=rand()))), minimum(@benchmark(g(x) setup=(x=rand()))))) because the former randomly interleaves calls to f and g in the same context and scope with the same inputs while the latter runs all evaluations of f before all evaluations of g and—typically less importantly—uses different random inputs.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
First of all, what is "tuning" for? It's for tuning the number of evaluations per sample. We want the total runtime of a sample to be 30μs, which makes the noise of instrumentation itself (clock precision, the time it takes to record performance counters, etc.) negligible. If the user specifies evals manually, then there is nothing to tune, so we do a single warmup and then jump straight to the benchmark. In the benchmark, we run samples until the time budget or sample budget is exhausted.
If evals is not provided and seconds is (by default we have seconds=0.1), then we target spending 5% of the time budget on calibration. We have a multi-phase approach where we start by running the function just once, use that to decide the order of the benchmark and how much additional calibration is needed. See https://github.com/LilithHafner/Chairmarks.jl/blob/main/src/benchmarking.jl for details.
We prioritize human experience (both user and developer) over formal guarantees. Where formal guarantees improve the experience of folks using this package, we will try to make and adhere to them. Under both soft and traditional semantic versioning, the version number is primarily used to communicate to users whether a release is breaking. If Chairmarks had an infinite number of users, all of whom respected the formal API by only depending on formally documented behavior, then soft semantic versioning would be equivalent to traditional semantic versioning. However, as the user base differs from that theoretical ideal, so too does the most effective way of communicating which releases are breaking. For example, if version 1.1.0 documents that "the default runtime is 0.1 seconds" and a new version allows users to control this with a global variable, then that change does break the guarantee that the default runtime is 0.1 seconds. However, it still makes sense to release as 1.2.0 rather than 2.0.0 because it is less disruptive to users to have that technical breakage than to have to review the changelog for breakage and decide whether to update their compatibility statements or not.
When there are conflicts between compatibility/alignment with BenchmarkTools and producing the best experience I can for folks who are not coming for BenchmarkTools or using BenchmarkTools simultaneously, I put much more weight on the latter. One reason for this is folks who want something like BenchmarkTools should use BenchmarkTools. It's a great package that is reliable, mature, and has been stable for a long time. A diversity of design choices lets users pick packages based on their own preferences. Another reason for this is that I aim to work toward the best long term benchmarking solution possible (perhaps in some years there will come a time where another package makes both BenchmarkTools.jl and Chairmarks.jl obsolete). To this end, carrying forward design choices I disagree with is not beneficial. All that said, I do not want to break compatibility or change style just to stand out. Almost all of BenchmarkTools' design decisions are solid and worth copying. Things like automatic tuning, the ability to bypass that automatic tuning, a split evals/samples structure, the ability to run untimed setup code before each sample, and many more mundane details we take for granted were once clever design decisions made in BenchmarkTools or its predecessors.
Below, I'll list some specific design departures and why I made them
Chairmarks uses the abbreviated macros @b and @be. Descriptive names are almost always better than terse one-letter names. However I maintain that macros defined in packages and designed to be typed repeatedly at the REPL are one of the few exceptions to this "almost always". At the REPL, these macros are often typed once and never read. In this case, concision does matter and readability does not. When naming these macros I anticipated that REPL usage would be much more common than usage in packages or reused scripts. However, if and as this changes it may be worth adding longer names for them and possibly restricting the shorter names to interactive use only.
@be, like BenchmarkTools.@benchmark, returns a Benchmark object. @b, unlike BenchmarkTools.@btime, returns a composite sample formed by computing the minimum statistic over the benchmark, rather than returning the expression result and printing runtime statistics. The reason I originally considered making this decision is that I have typed @btime sort!(x) setup=(x=rand(1000)) evals=1 into the REPL and seen the whole screen fill with random numbers too many times. Let's also consider the etymology of @time to justify this decision further. @time is a lovely macro that can be placed around an arbitrary long-running chunk of code or expression to report its runtime to stdout. @time is the print statement of profiling. @btime and @b can very much not fill that role for three major reasons: first, most long-running code has side effects, and those macros run the code repeatedly, which could break things that rely on their side effects; second, @btime, and to a lesser extent @b, take ages to run; and third, only applying to @btime, @btime runs its body in global scope, not the scope of the caller. @btime and @b are not noninvasive tools to measure the runtime of a portion of an algorithm; they are top-level macros to measure the runtime of an expression or function call. Their primary result is the runtime statistics of the expression under benchmark, and the conventional way to report the primary result of a macro or function call to the calling context is with a return value. Consequently, @b returns an aggregated benchmark result rather than following the pattern of @btime.
If you are writing a script that computes some values and want to display those values to the user, you generally have to call display. Chairmarks is not an exception. If it were possible, I would consider special-casing @show @b blah.
Chairmarks' display format differs slightly from BenchmarkTools' display format. The indentation differences are there to keep Chairmarks internally consistent, and the choice of information displayed differs because Chairmarks has more types of information to display than BenchmarkTools.
@btime displays with a leading space while @b does not. No Julia object that I know of displays with a leading space on the first line. Sample (returned by @b) is no different. See above for why @b returns a Sample instead of displaying in the style of @time.
BenchmarkTools.jl's short display mode (@btime) displays runtime and allocations. Chairmarks' short display mode (displaying a sample, or simply @b at the REPL) follows Base.@time instead and captures a wide variety of information, displaying only nonzero values. Here's a selection of the diversity of information Chairmarks makes available to users, paired with how BenchmarkTools treats the same expressions:
It would be a loss to restrict ourselves to only runtime and allocations, it would be distracting to include "0% compilation time" in outputs which have zero compile time, and it would be inconsistent to make some fields (e.g. allocation count and amount) always display while others are only displayed when non-zero. Sparse display is the compromise I've chosen to get the best of both worlds.
+
+
+
+
\ No newline at end of file
diff --git a/previews/PR157/hashmap.json b/previews/PR157/hashmap.json
new file mode 100644
index 00000000..0f679a19
--- /dev/null
+++ b/previews/PR157/hashmap.json
@@ -0,0 +1 @@
+{"autoload.md":"CTdFRUqF","explanations.md":"BIDZQFXY","index.md":"DrFDggC8","migration.md":"Dj8Qh8mB","reference.md":"iPg_0Q3h","regressions.md":"COviHpic","tutorial.md":"DSdCipZ_","why.md":"D-0pwC7u"}
diff --git a/previews/PR157/index.html b/previews/PR157/index.html
new file mode 100644
index 00000000..58bc9bc9
--- /dev/null
+++ b/previews/PR157/index.html
@@ -0,0 +1,37 @@
+
+
+
+
+
+ Chairmarks | Chairmarks.jl
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
julia> using Chairmarks
+
+julia> @b rand(1000) # How long does it take to generate a random array of length 1000?
+720.214 ns (3 allocs: 7.875 KiB)
+
+julia> @b rand(1000) hash # How long does it take to hash that array?
+1.689 μs
+
+julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
+172.970 ns (3 allocs: 7.875 KiB)
+
+julia> @b rand(100,100) inv,_^2,sum # Is it faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL]
+(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs)
+
+
+
+
\ No newline at end of file
diff --git a/previews/PR157/migration.html b/previews/PR157/migration.html
new file mode 100644
index 00000000..cd590872
--- /dev/null
+++ b/previews/PR157/migration.html
@@ -0,0 +1,100 @@
+
+
+
+
+
+ How to migrate from BenchmarkTools to Chairmarks | Chairmarks.jl
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
How to migrate from BenchmarkTools to Chairmarks
Chairmarks has a similar samples/evals model to BenchmarkTools. It preserves the keyword arguments samples, evals, and seconds. Unlike BenchmarkTools, the seconds argument is honored even as it drops down to the order of 30μs (@b @b hash(rand()) seconds=.00003). While accuracy does decay as the total number of evaluations and samples decreases, it remains quite reasonable (e.g. I see a noise of about 30% when benchmarking @b hash(rand()) seconds=.00003). This makes it much more reasonable to perform meta-analysis such as computing the time it takes to hash a thousand arrays of different lengths with [@b hash(rand(n)) seconds=.001 for n in 1:1000].
Both BenchmarkTools and Chairmarks use an evaluation model structured like this:
julia
init()
+samples = []
+for _ in 1:samples
+ setup()
+ t0 = time()
+ for _ in 1:evals
+ f()
+ end
+ t1 = time()
+ push!(samples, t1 - t0)
+ teardown()
+end
+return samples
In BenchmarkTools, you specify f and setup with the invocation @benchmark f setup=(setup). In Chairmarks, you specify f and setup with the invocation @be setup f. In BenchmarkTools, setup and f communicate via shared local variables in code generated by BenchmarkTools. In Chairmarks, the function f is passed the return value of the function setup as an argument. Chairmarks also lets you specify teardown, which is not possible with BenchmarkTools, and an init which can be emulated with interpolation using BenchmarkTools.
Here are some examples of corresponding invocations in BenchmarkTools and Chairmarks:
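A couple of illustrative correspondences (not an exhaustive mapping; both columns of invocations generate a fresh array per sample):
julia
using BenchmarkTools, Chairmarks
+
+# BenchmarkTools
+@btime sort(x) setup=(x=rand(1000));
+@benchmark sort!(x) setup=(x=rand(1000)) evals=1
+
+# Chairmarks equivalents
+@b rand(1000) sort
+@be rand(1000) sort! evals=1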
For automated regression tests, RegressionTests.jl is a work in progress replacement for the BenchmarkGroup and @benchmarkable system. Because Chairmarks is efficiently and stably autotuned and RegressionTests.jl is inherently robust to noise, there is no need for parameter caching.
Chairmarks does not provide a judge function to decide if two benchmarks are significantly different. However, you can get accurate data to inform that judgement by passing a comma-separated list of functions to @b or @be.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
Like BenchmarkTools, benchmarks that include access to nonconstant globals will receive a performance overhead for that access and you can avoid this via interpolation.
However, Chairmarks's arguments are functions evaluated in the scope of the macro call, not quoted expressions evaled at global scope. This makes nonconstant global access much less of an issue in Chairmarks than BenchmarkTools which, in turn, eliminates much of the need to interpolate variables. For example, the following invocations are all equally fast:
julia
julia> x = 6 # nonconstant global
+6
+
+julia> f(len) = @b rand(len) # put the `@b` call in a function (highest performance for repeated benchmarks)
+f (generic function with 1 method)
+
+julia> f(x)
+15.318 ns (2 allocs: 112 bytes)
+
+julia> @b rand($x) # interpolate (most familiar to BenchmarkTools users)
+15.620 ns (2 allocs: 112 bytes)
+
+julia> @b x rand # put the access in the setup phase (most concise in simple cases)
+15.507 ns (2 allocs: 112 bytes)
It is possible to use BenchmarkTools.BenchmarkGroup with Chairmarks. Replacing @benchmarkable invocations with @be invocations and wrapping the group in a function suffices. You don't have to run tune! and instead of calling run, call the function. Even running Statistics.median(suite) works—although any custom plotting might need a couple of tweaks.
julia
using BenchmarkTools, Statistics
+
+function create_benchmarks()
+ functions = Function[sqrt, inv, cbrt, sin, cos]
+ group = BenchmarkGroup()
+ for (index, func) in enumerate(functions)
+ group[index] = @benchmarkable $func(x) setup=(x=rand())
+ end
+ group
+end
+
+suite = create_benchmarks()
+
+tune!(suite)
+
+median(run(suite))
+# edit code
+median(run(suite))
julia
using Chairmarks, Statistics
+
+function run_benchmarks()
+ functions = Function[sqrt, inv, cbrt, sin, cos]
+ group = BenchmarkGroup()
+ for (index, func) in enumerate(functions)
+ group[nameof(func)] = @be rand func
+ end
+ group
+end
+
+median(run_benchmarks())
+# edit code
+median(run_benchmarks())
The formal API of Chairmarks is defined by the docstrings of public symbols. Any behavior promised by these docstrings should typically remain in all future non-breaking releases. Specific display behavior is not part of the API.
However, as a package designed primarily for interactive usage, Chairmarks follows soft semantic versioning. A technically breaking change may be released with a non-breaking version number if the change is not expected to cause significant disruptions.
struct Sample
+ evals ::Float64 # The number of times the benchmark was evaluated for this sample.
+ time ::Float64 # The average time taken to run the sample, in seconds per evaluation.
+ allocs ::Float64 # The average number of allocations made per evaluation
+ bytes ::Float64 # The average number of bytes allocated per evaluation
+ gc_fraction ::Float64 # The fraction of time spent in garbage collection (0.0 to 1.0)
+ compile_fraction ::Float64 # The fraction of time spent compiling (0.0 to 1.0)
+ recompile_fraction ::Float64 # The fraction of compile time which was, itself, recompilation (0.0 to 1.0)
+ warmup ::Float64 # Whether this sample had a warmup run before it (1.0 = yes, 0.0 = no).
+ ...more fields may be added...
+end
A struct representing a single sample of a benchmark.
@b returns a composite sample formed by taking the field-wise minimum of the measured samples. More fields may be added in the future as more information becomes available.
struct Benchmark
+ samples::Vector{Sample}
+ ...more fields may be added...
+end
A struct representing a complete benchmark result. Returned by @be.
More fields may be added in the future to represent non-sample-specific information.
The functions minimum and maximum are defined field wise on Benchmark objects and return Samples. On Julia 1.9 and above, the functions Statistics.median, Statistics.mean, and Statistics.quantile are also defined field wise on Benchmark objects and return Samples.
julia
julia> @be eval(:(for _ in 1:10; sqrt(rand()); end))
+Benchmark: 15 samples with 1 evaluation
+ min 4.307 ms (3608 allocs: 173.453 KiB, 92.21% compile time)
+ median 4.778 ms (3608 allocs: 173.453 KiB, 94.65% compile time)
+ mean 6.494 ms (3608 allocs: 173.453 KiB, 94.15% compile time)
+ max 12.021 ms (3608 allocs: 173.453 KiB, 95.03% compile time)
+
+julia> minimum(ans)
+4.307 ms (3608 allocs: 173.453 KiB, 92.21% compile time)
@b args... is equivalent to Chairmarks.summarize(@be args...). See the docstring of @be for more information.
Examples
julia
julia> @b rand(10000) # Benchmark a function
+5.833 μs (2 allocs: 78.172 KiB)
+
+julia> @b rand hash # How long does it take to hash a random Float64?
+1.757 ns
+
+julia> @b rand(1000) sort issorted(_) || error() # Simultaneously benchmark and test
+11.291 μs (3 allocs: 18.062 KiB)
+
+julia> @b rand(1000) sort! issorted(_) || error() # BAD! This repeatedly resorts the same array!
+1.309 μs (0.08 allocs: 398.769 bytes)
+
+julia> @b rand(1000) sort! issorted(_) || error() evals=1 # Specify evals=1 to ensure the function is only run once between setup and teardown
+10.041 μs (2 allocs: 10.125 KiB)
+
+julia> @b rand(10) _ sort!∘rand! issorted(_) || error() # Or, include randomization in the benchmarked function and only allocate once
+120.536 ns
+
+julia> @b (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary expressions in any position in the pipeline, not just simple functions.
+183.871 ns
+
+julia> @b (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
+2.447 s (without a warmup)
+
+julia> @b rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
+(17.256 ns, 4.246 ns)
The four positional arguments form a pipeline with the return value of each passed as an argument to the next. Consequently, the first expression in the pipeline must be a nullary function. If you use a symbol like rand, it will be interpreted as a function and called normally. If you use any other expression, it will be interpreted as the body of a nullary function. For example in @be rand(10) the function being benchmarked is () -> rand(10).
Later positions in the pipeline must be unary functions. As with the first function, you may provide either a function or an expression. However, the rules are slightly different. If the expression you provide contains an _ as an rvalue (which would otherwise error), it is interpreted as a unary function and any such occurrences of _ are replaced with the result from the previous function in the pipeline. For example, in @be rand(10) sort(_, rev=true) the setup function is () -> rand(10) and the primary function is x -> sort(x, rev=true). If the expression you provide does not contain an _ as an rvalue, it is assumed to produce a function and is called with the result from the previous function in the pipeline. For example, in @be rand(10) sort!∘shuffle!, the primary function is simply sort!∘shuffle! and receives no preprocessing. @macroexpand can help elucidate what is going on in specific cases.
Positional argument disambiguation
setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:
@be f
+@be setup f
+@be setup f teardown
+@be init setup f teardown
You may use an underscore _ to provide other combinations of arguments. For example, you may provide a teardown and no setup with
@be _ f teardown
Keyword arguments
Provide keyword arguments using name=value syntax, similar to how you provide keyword arguments to ordinary functions. The keyword arguments that control execution are listed below; a short usage sketch follows the list.
evals::Integer How many function evaluations to perform in each sample. Defaults to automatic calibration.
samples::Integer Maximum number of samples to take. Defaults to unlimited and cannot be specified without also specifying evals. Specifying samples = 0 will cause @be to run the warmup sample only and return that sample.
seconds::Real Maximum amount of time to spend benchmarking. Defaults to Chairmarks.DEFAULTS.seconds (which is 0.1 by default) unless samples is specified, in which case it defaults to 10 times as long (1 second, by default). Users are free to modify Chairmarks.DEFAULTS.seconds for their own interactive usage and its default value may change in the future. Set to Inf to disable the time limit. Compile time is typically not counted against this limit. A reasonable effort is made to respect the time limit, but if samples is unspecified it is always exceeded by a small amount (less than 1%) and can be significantly exceeded when benchmarking long-running functions.
gc::Bool An experimental option to disable garbage collection during benchmarking. Defaults to Chairmarks.DEFAULTS.gc, which is true by default. Set to false to disable garbage collection during benchmarking. Disabling garbage collection may cause out-of-memory errors during a benchmark that requires garbage collection, but should not result in memory leaks that survive past the end of the benchmark. As an experimental option, this may be removed in the future or its semantics may change. This option also depends on Julia internals and so it may break in future versions of Julia.
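For example (a sketch; the particular values are arbitrary):
julia
@be rand(1000) sort! evals=1 seconds=1     # one evaluation per sample, up to ~1 second of benchmarking
@be rand(1000) sort! evals=1 samples=500   # at most 500 samples, one evaluation each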
Interpolation
You may use standard interpolation syntax within any of the positional arguments. This will cause the interpolated values to be evaluated only once upon execution of the benchmark, and the runtime of that evaluation will not be included in the reported results. For example,
x = [1,2,3]
+@b length($x)
is equivalent to
@b [1,2,3] _ length _
Evaluation model
At a high level, the implementation of this function looks like this
x = init()
+results = []
+for sample in 1:samples
+ y = setup(x)
+
+ t0 = time()
+
+ z = f(y)
+ for _ in 2:evals
+ f(y)
+ end
+
+ push!(results, time()-t0)
+
+ teardown(z)
+end
So init will be called once, setup and teardown will be called once per sample, and f will be called evals times per sample.
Experimental Features
You can pass a comma-separated list of functions or expressions to @be and they will all be benchmarked at the same time with interleaved samples, returning a tuple of Benchmarks.
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
Examples
julia
julia> @be rand(10000) # Benchmark a function
+Benchmark: 267 samples with 2 evaluations
+ min 8.500 μs (2 allocs: 78.172 KiB)
+ median 10.354 μs (2 allocs: 78.172 KiB)
+ mean 159.639 μs (2 allocs: 78.172 KiB, 0.37% gc time)
+ max 39.579 ms (2 allocs: 78.172 KiB, 99.93% gc time)
+
+julia> @be rand hash # How long does it take to hash a random Float64?
+Benchmark: 4967 samples with 10805 evaluations
+ min 1.758 ns
+ median 1.774 ns
+ mean 1.820 ns
+ max 5.279 ns
+
+julia> @be rand(1000) sort issorted(_) || error() # Simultaneously benchmark and test
+Benchmark: 2689 samples with 2 evaluations
+ min 9.771 μs (3 allocs: 18.062 KiB)
+ median 11.562 μs (3 allocs: 18.062 KiB)
+ mean 14.933 μs (3 allocs: 18.097 KiB, 0.04% gc time)
+ max 4.916 ms (3 allocs: 20.062 KiB, 99.52% gc time)
+
+julia> @be rand(1000) sort! issorted(_) || error() # BAD! This repeatedly resorts the same array!
+Benchmark: 2850 samples with 13 evaluations
+ min 1.647 μs (0.15 allocs: 797.538 bytes)
+ median 1.971 μs (0.15 allocs: 797.538 bytes)
+ mean 2.212 μs (0.15 allocs: 800.745 bytes, 0.03% gc time)
+ max 262.163 μs (0.15 allocs: 955.077 bytes, 98.95% gc time)
+
+julia> @be rand(1000) sort! issorted(_) || error() evals=1 # Specify evals=1 to ensure the function is only run once between setup and teardown
+Benchmark: 6015 samples with 1 evaluation
+ min 9.666 μs (2 allocs: 10.125 KiB)
+ median 10.916 μs (2 allocs: 10.125 KiB)
+ mean 12.330 μs (2 allocs: 10.159 KiB, 0.02% gc time)
+ max 6.883 ms (2 allocs: 12.125 KiB, 99.56% gc time)
+
+julia> @be rand(10) _ sort!∘rand! issorted(_) || error() # Or, include randomization in the benchmarked function and only allocate once
+Benchmark: 3093 samples with 237 evaluations
+ min 121.308 ns
+ median 126.055 ns
+ mean 128.108 ns
+ max 303.447 ns
+
+julia> @be (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary expressions in any position in the pipeline, not just simple functions.
+Benchmark: 3387 samples with 144 evaluations
+ min 183.160 ns
+ median 184.611 ns
+ mean 188.869 ns
+ max 541.667 ns
+
+julia> @be (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
+Benchmark: 1 sample with 1 evaluation
+ 2.488 s (without a warmup)
+
+julia> @be rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
+Benchmark: 14887 samples with 436 evaluations
+ min 17.106 ns
+ median 18.922 ns
+ mean 20.974 ns
+ max 234.998 ns
+Benchmark: 14887 samples with 436 evaluations
+ min 4.110 ns
+ median 4.683 ns
+ mean 4.979 ns
+ max 42.911 ns
A global constant that holds default benchmarking parameters.
When a parameter is unspecified it defaults to the value stored in Chairmarks.DEFAULTS.
Currently there is one stable default, Chairmarks.DEFAULTS.seconds::Float64, which defaults to 0.1, and one experimental default, Chairmarks.DEFAULTS.gc::Bool, which defaults to true.
All default values may be changed in the future and the gc default may be removed entirely.
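For example, to lengthen the default time budget in an interactive session (a sketch; the values shown are arbitrary):
julia
Chairmarks.DEFAULTS.seconds = 0.5   # spend up to 0.5 s per benchmark by default
@b rand(100)                        # subsequent benchmarks use the new budget
Chairmarks.DEFAULTS.seconds = 0.1   # restore the documented default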
Regression testing is a difficult task. RegressionTests.jl has ambitious goals and is already state of the art within the Julia ecosystem, but it is very much a work in progress. Proceed at your own risk, or wait for that package to reach maturity.
Use RegressionTests.jl! Make a file bench/runbenchmarks.jl with the following content:
Welcome! This tutorial assumes very little prior knowledge and walks you through how to become a competent user of Chairmarks. If you are already an experienced user of BenchmarkTools, you may want to read about how to migrate from BenchmarkTools to Chairmarks instead.
Now, launch a Julia REPL by typing julia at the command line.
To install Chairmarks, type ] to enter the package manager, and then type
julia
(@v1.xx) pkg> add Chairmarks
This will install Chairmarks into your default environment. Unlike most packages, installing Chairmarks into your default environment is recommended because it is a very lightweight package and a development tool.
Now you can use Chairmarks by typing using Chairmarks in the REPL. Press backspace to exit the package manager and return to the REPL, then run:
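julia
julia> @b rand(100)
95.500 ns (2 allocs: 928 bytes)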
Congratulations! This is your first result from Chairmarks. Let's look a little closer at the invocation and results. @b is a macro exported from Chairmarks. It takes the expression rand(100) and runs it a bunch of times, measuring how long it takes to run.
The result, 95.500 ns (2 allocs: 928 bytes), tells us that the expression takes 95.5 nanoseconds to run and allocates 928 bytes of memory spread across two distinct allocation events. The exact results you get will likely differ based on your hardware and the Julia version you are using. These results are from Julia 1.11.
Chairmarks reports results in seconds (s), milliseconds (ms), microseconds (μs), or nanoseconds (ns) depending on the magnitude of the runtime. Each of these units is 1000 times smaller than the last according to the standard SI unit system.
By default, Chairmarks reports the fastest runtime of the expression. This is typically the best choice for reducing noise in microbenchmarks as things like garbage collection and other background tasks can cause inconsistent slowdowns but not speedups. If you want to get the full results, use the @be macro. (@be is longer than @b and gives a longer output)
julia
julia> @be rand(100)
+Benchmark: 19442 samples with 25 evaluations
+ min 95.000 ns (2 allocs: 928 bytes)
+ median 103.320 ns (2 allocs: 928 bytes)
+ mean 140.096 ns (2 allocs: 928 bytes, 0.36% gc time)
+ max 19.748 μs (2 allocs: 928 bytes, 96.95% gc time)
This invocation runs the same experiment as @b, but reports more results. It ran 19442 samples, each of which involved recording some performance counters, running rand(100) 25 times, and then recording the performance counters again and computing the difference. The reported runtimes and allocations are those differences divided by the number of evaluations. We can see here that the runtime of rand(100) is pretty stable. 50% of the time it ranges between 95 and 103.3 nanoseconds. However, the maximum time is two orders of magnitude slower than the mean time. This is because the maximum time includes a garbage collection event that took 96.95% of the time.[1]
Sometimes, we wish to measure the runtime of a function that requires some data to operate on, but don't want to measure the runtime of the function that generates the data. For example, we may want to compare how long it takes to hash an array of numbers, but we don't want to include the time it takes to generate the input in our measurements. We can do this using Chairmarks' pipeline syntax:
julia
julia> @b rand(100) hash
+166.665 ns
The first argument is called once per sample, and the second argument is called once per evaluation, each time passing the result of the first argument. We can also use the special _ variable to refer to the output of the previous step. Here, we benchmark computing the norm of a vector:
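For example (a minimal sketch; the use of LinearAlgebra.norm here is an assumption):
julia
using LinearAlgebra

@b rand(100) norm(_)   # `_` is replaced with the vector produced by rand(100)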
The _ refers to the array whose norm is to be computed.
We can perform a comparison of two different implementations of the same specification by providing a comma-separated list of functions to benchmark. Here, we compare two ways of computing the norm of a vector:
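One possible invocation (a sketch; the two implementations compared here are assumptions):
julia
using LinearAlgebra

@b rand(100) norm(_),sqrt(sum(abs2,_))   # comma-separated implementations share the same setup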
Warning
Comparative benchmarking is experimental and may be removed or changed in future versions
This invocation pattern runs the setup function once per sample and randomly selects which implementation to run first for each sample. This makes comparative benchmarks robust to fluctuations in system load.
When benchmarking a function which mutates its arguments, be aware that the same input is passed to the function for each evaluation in a sample. This can cause problems if the function does not expect to repeatedly operate on the same input.
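As a sketch of the problem, here is the sort! example from the docstring above (the exact invocation and figures will differ from run to run):
julia
julia> @b rand(1000) sort! issorted(_) || error()
1.309 μs (0.08 allocs: 398.769 bytes)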
We can see immediately that something suspicious is going on here: the reported number of allocations (which we expect to be an integer) is a floating-point number. This is because, for each sample, the array is sorted once, which involves allocating a scratch space, and then that same array is re-sorted repeatedly. It turns out sort! operates very quickly and does not allocate at all when it is passed an already-sorted array. To benchmark this more accurately, we may specify the number of evaluations:
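Two ways to avoid the pitfall, mirroring the docstring examples above (timings are from that earlier run and will vary):
julia
julia> @b rand(1000) sort! issorted(_) || error() evals=1   # fresh input for every evaluation
10.041 μs (2 allocs: 10.125 KiB)

julia> @b rand(10) _ sort!∘rand! issorted(_) || error()     # or re-randomize inside the benchmarked function
120.536 ns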
Notice that each of these invocations produces a different output. Setting evals to 1 can cause strange effects whenever the runtime of the expression is less than about 30 μs, both because of the overhead of starting and stopping the timers and because of the imprecision of timer results on most machines. Any pre-processing included in the primary function is counted in the reported runtime, so options which fold pre-processing into the benchmarked function also introduce artifacts.
In general, it is important to use the same methodology when comparing two different functions. Chairmarks is optimized to produce reliable results for answering questions of the form "which of these two implementations of the same specification is faster", more so than providing absolute measurements of the runtime of fast-running functions.
That said, for functions which take more than about 30 μs to run, Chairmarks can reliably provide accurate absolute timings. In general, the faster the runtime of the expression being measured, the more strange behavior and artifacts you will see, and the more careful you have to be.
Longer runtimes and macrobenchmarks are much more trustworthy than microbenchmarks, though microbenchmarks are often a great tool for identifying performance bottlenecks and optimizing macrobenchmarks.
It's pretty straightforward to benchmark a whole parameter sweep to check performance figures. Just invoke @b or @be repeatedly. For example, if you want to know how allocation times vary with input size, you could run this list comprehension which runs @b fill(0, n) for each power of 4 from 4 to 4^10:
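julia
[@b fill(0, n) for n in 4 .^ (1:10)]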
The default runtime of a benchmark is 0.1 seconds, so this invocation should take just over 1 second to run. Let's verify:
julia
julia> @time [@b fill(0, n) for n in 4 .^ (1:10)];
+ 1.038502 seconds (27.16 M allocations: 22.065 GiB, 27.03% gc time, 3.59% compilation time)
If we want a wider parameter sweep, we can use the seconds parameter to configure how long benchmarking will take. However, once we start setting seconds to a value below 0.1, the benchmarking itself becomes performance sensitive and, as the performance tips note, performance-critical code should be inside a function. So we should put the call to @b or @be into a function:
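For example (a sketch; the helper name bench_fill and the 0.01-second budget are illustrative):
julia
bench_fill(n) = @b fill(0, n) seconds=0.01   # performance-critical benchmarking code lives in a function
[bench_fill(n) for n in 4 .^ (1:10)]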
Setting the seconds parameter too low can cause benchmarks to be noisy. It's good practice to run a benchmark at least a couple of times no matter what the configuration is to make sure it's reasonably stable.
It is possible to manually specify the number of evaluations, samples, and/or seconds to run benchmarking for and configure the default benchmarking runtime. It is also possible to pass a teardown function or an initialization function that runs only once. See the docstring of @be for more information on these additional arguments.
Note that the samples are aggregated element-wise, so the max field reports the maximum runtime and the maximum proportion of runtime spent in garbage collection (gc). Thus it is possible that the trial which had a 19.748 μs runtime was not the same trial that spent 96.95% of its time in garbage collection. This is done to make the results more consistent: if half the trials spend 10% of their time in gc and runtime varies based on other factors, it would be unfortunate to report the maximum gc time as either 10% or 0% at random, depending on whether the longest-running trial happened to trigger gc.
Chairmarks uses a concise pipeline syntax to define benchmarks. When providing a single argument, that argument is automatically wrapped in a function for higher performance and executed
On versions of Julia prior to 1.8, Chairmarks automatically computes a checksum based on the results of the provided computations and stores the checksum in Chairmarks.CHECKSUM. This makes it impossible for the compiler to elide any part of the computation that has an impact on its return value.
While the checksums are reasonably fast, one negative side effect of this is that they add a bit of overhead to the measured runtime, and that overhead can vary depending on the return value of the function being benchmarked. In versions of Julia 1.8 and later, these checksums are emulated using the function Base.donotdelete, which is designed and documented to ensure that necessary computation is not elided, without adding extra overhead.
Chairmarks is inherently narrower than BenchmarkTools by construction. It also has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support.