Your test duration measurement is inaccurate #143
That's why the API:
There's no one-size-fits-all benchmarking design in the JS ecosystem, but rather APIs that allow refining the benchmark accordingly while offering sane defaults. What is actually missing from the tinybench API to adapt it to browsers?
@jerome-benoit I didn't realize you also used confidence intervals, which means the jitter isn't of much concern. But for the rest, in short:

```js
let start = now()
fn()
let end = now()
let duration = end - start
```

The concrete fix is this:
Changing the
On a related note, maybe they should've called it
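The per-call measurement pattern quoted above can be sketched in Node, where `process.hrtime.bigint()` provides nanosecond-resolution timestamps. The helper names and sample count below are my own illustrative assumptions, not tinybench's API:

```javascript
// Time a single invocation of `fn` with nanosecond-resolution timestamps.
function measureOnce(fn) {
  const start = process.hrtime.bigint()
  fn()
  const end = process.hrtime.bigint()
  return Number(end - start) // duration in nanoseconds
}

// Collect a batch of independent per-call samples for later analysis.
function collectSamples(fn, count = 100) {
  const samples = []
  for (let i = 0; i < count; i++) samples.push(measureOnce(fn))
  return samples
}
```

Each sample times one complete invocation, which matches the "measure the whole run, not an averaged fraction of it" methodology argued for in this thread.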
Benchmark warmup at
Tinybench measures only the latency of the benchmarked function's execution, JIT deoptimization included.
A correct benchmarking methodology means not modifying the experiment being timed. Timing a runner over 500 m is not done by repeatedly measuring the time and distance of a single step and using the average of those measurements to estimate the 500 m course; that is wrong in so many ways. I've seen benchmarking tools such as mitata use a similarly flawed methodology. Tinybench will never go that path, as we care about using an unbiased measurement methodology. That's why I forked mitata as tatami-ng: the maintainer was not open to external contributions about it. I am now pushing the relevant bits of that fork to tinybench, and they will show up in version 3.x.x.
Tinybench is meant to be a lean library using state-of-the-art benchmarking methods and advanced statistics. The analysis of the results, such as determining whether the margin of error is acceptable, whether the median absolute deviation is acceptable, and more generally the statistical significance of the result, will not be part of tinybench. It's up to the user to analyze them and eventually automate the detection of anomalies in the measurements.
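As an illustration of the kind of analysis left to the user, a margin of error at a 95% confidence level can be derived from latency samples roughly as follows. This is a sketch using the normal approximation (z = 1.96); the helper is mine, not part of tinybench:

```javascript
// Compute mean, sample standard deviation, absolute margin of error (moe),
// and relative margin of error (rme, in %) for a set of latency samples.
function stats(samples) {
  const n = samples.length
  const mean = samples.reduce((a, b) => a + b, 0) / n
  const variance =
    samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1)
  const sd = Math.sqrt(variance)
  const moe = 1.96 * (sd / Math.sqrt(n)) // 95% confidence, normal approximation
  const rme = (moe / mean) * 100
  return { mean, sd, moe, rme }
}
```

A user could, for example, reject a run whose `rme` exceeds some threshold, which is exactly the kind of anomaly detection the comment says belongs outside the library.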
The analysis of the results is meant to tell whether a measurement is correct or not: for example, a lot of zero measurements will make the margin of error for latency go high, so the results cannot be trusted. Using a totally flawed benchmarking methodology (and opening a wide door to the premature optimization disease) as a workaround for too coarse a timer resolution in the JS runtime is not an acceptable solution. The root cause must be fixed: not offering an optional high-resolution timer mode in a JS runtime is considered a bug nowadays, and browsers can be started with high-resolution timers for benchmarking purposes. So I repeat: what is actually missing in tinybench to run accurate benchmarks using state-of-the-art methodology in browsers?
Performance measurement is unfortunately not as simple as calling `performance.now()` in browsers. Further, operating systems can and do sometimes have resolution limits of their own. Here are some considerations that need to be addressed in general:

- On some platforms, `performance.now()` timestamps tick during sleep. On other platforms they don't, in violation of the spec: Suggestions: ticking during sleep, comparison across contexts, time origin + now semantics, and skew definition w3c/hr-time#115 (comment)

The benchmarks currently naively use start and end `performance.now()`/`process.hrtime.bigint()` calls. The precision issues can give you bad data, but it's not insurmountable. It doesn't appear your benchmark execution code currently takes any of this into account, hence this issue.
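Before trusting individual start/end deltas, the effective tick granularity of the clock in use can be probed empirically. A rough sketch (the function name is mine; in Node, `performance` is a global):

```javascript
// Estimate the effective resolution of a clock by spinning until its
// reported value changes, several times, and keeping the smallest step.
function estimateResolution(now = () => performance.now(), probes = 10) {
  let smallest = Infinity
  for (let i = 0; i < probes; i++) {
    const t0 = now()
    let t1 = t0
    while (t1 === t0) t1 = now() // busy-wait until the clock ticks
    smallest = Math.min(smallest, t1 - t0)
  }
  return smallest // in the clock's own units (ms for performance.now)
}
```

If the benchmarked operation's duration is close to (or below) the value this returns, single-call deltas will be dominated by quantization noise, which is the precision issue described above.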