Double-check harness and warm up heuristics #316
Update: doing some quick empirical testing on our headline benchmarks, it looks like an adaptive heuristic along the lines of "we are done warming up after a minimum of 8 iterations, and after we've compiled zero blocks for 4 iterations" would probably work decently well in practice, though it wouldn't be 100% perfect. I'm actually seeing that a number of benchmarks have a spike in compiled blocks after 30 iterations, which makes sense given our call threshold heuristic.

An alternative way to do adaptive warm-up could be to benchmark for a minimum of 30 iterations, and then check in hindsight whether we stopped compiling after the 10th iteration. If not, count the first 30 iterations as warm-up, run for another 30 iterations, check how much we compiled during those 30 new iterations, and keep going until we get a minimum number of benchmarking iterations with zero or near-zero new blocks compiled.

A third way to go would be to just increase the number of warm-up iterations to 40 and call it good enough (though that could obviously slow down some benchmarks).
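The batch-based alternative described above can be sketched roughly as follows. This is a hypothetical illustration, not code from the repo: `deltas` stands for the number of newly compiled blocks observed after each benchmarking iteration, and the batch size and near-zero threshold are the illustrative values from the comment.

```ruby
# Run in 30-iteration batches and, in hindsight, treat every batch before
# the first "quiet" one (near-zero new blocks) as warm-up.
def warmup_batches(deltas, batch_size: 30, max_new_blocks: 0)
  deltas.each_slice(batch_size).with_index do |batch, i|
    # The first batch that compiles (near-)zero new blocks ends warm-up;
    # everything before it counts as warm-up iterations.
    return i * batch_size if batch.sum <= max_new_blocks
  end
  deltas.size # never settled: all observed iterations count as warm-up
end
```

For example, a benchmark that compiles blocks throughout its first 30 iterations but is quiet for the next 30 would get exactly 30 warm-up iterations under this scheme.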
Off topic: I often need to manually modify yjit-bench locally to achieve things like this. Now that you're doing it too, I filed a PR to add that feature: Shopify/yjit-bench#330. This helped with the following investigation:
At first, I thought the two-step call threshold might be causing this. However, forcing it to [...]. Given that it's coming from an eval, I suspect that the warmup of lobsters may never end (until it hits the [...]). Hexapdf does stop compiling after a reasonably small number of iterations, so it makes sense to do something to increase the number of warmup iterations for hexapdf if necessary.
I guess I'd prefer not to define the number of warmup iterations by hand for each benchmark, as that seems failure-prone: we could change how YJIT decides to compile code and break our manually set iteration counts without noticing. I could try experimenting with different heuristics for adaptive warm-up.
Hmm, yeah. If there were a way to get rid of that eval, it would be ideal. If not, then we would have to settle for "still compiling, but not compiling much proportionally" as good enough.
@rwstauner can you provide some context on how warm-up is handled in yjit-metrics?
I think this is what Jean fixed: |
The current default that yjit-metrics is using is a static [...]
Thank you, Randy. Still not sure what to do here. I feel like ideally we should figure out better criteria and make things more uniform between yjit-bench and yjit-metrics (use the same warm-up criteria). I may experiment with some other warm-up approaches tomorrow.
Back when we were using the "variable warmup report" (prior to May 2024), it would default to 30 warmups, which made the whole thing take several more hours.
Ok, maybe there is a way that we can do something like 30 warm-up iterations, or a maximum of 1 minute, or a maximum of 30 seconds, something like that.
I think it's worth looking at the differences between the two repos. |
Looking at the 2024-09-18 results for arm, these are the results where warmup (10 iterations) is already taking more than 30s:
Only 4 of those are currently taking more than 1 minute to do 10 warmup iterations. Most warmup iterations currently look to be under 1s, so doing up to 30 warmups or 60 seconds would probably get us pretty far.
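The "up to 30 warm-up iterations or 60 seconds, whichever comes first" idea can be sketched as a small harness loop. This is a minimal illustration, not yjit-bench's actual harness; the method name and limits are hypothetical.

```ruby
# Warm up until either the iteration cap or the wall-clock budget is hit,
# whichever comes first. Returns the number of warm-up iterations performed.
def run_warmup(max_iters: 30, max_seconds: 60)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iters = 0
  while iters < max_iters
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    break if elapsed >= max_seconds
    yield # run one iteration of the benchmark body
    iters += 1
  end
  iters
end
```

Fast benchmarks would complete all 30 warm-ups well under the budget, while a slow one like rubykon would be cut off at the time limit.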
With respect to Rubykon, that benchmark seems to take especially long to run with CRuby, but it is 3x faster with YJIT. Maybe one simple thing that we can do is cap the warmup iterations for CRuby to 4 or 5 iterations, and use a variable warm-up for YJIT? I'm also going to go look if I can edit Rubykon to have a faster per-iteration time, because it is super duper slow on CRuby. |
Kokubun's compilation time investigation made me want to check whether we're including compilation time in benchmarking iterations on yjit-bench. The answer is that for the lobsters benchmark, which is the largest of our headline benchmarks, yes we are. We're using a fixed number of warm-up iterations (15), and this is enough for smaller benchmarks such as railsbench, but not for lobsters.
Just by printing the delta in `compiled_block_count` after each iteration for railsbench, we can see that it warms up in just 3 iterations. The situation is the same for many small benchmarks.
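Observing that per-iteration delta can be done by sampling the running total around each iteration. On a YJIT build with stats enabled, the total comes from `RubyVM::YJIT.runtime_stats[:compiled_block_count]`; in this sketch the counter is injected as a callable so the delta logic itself is plain Ruby and the helper name is illustrative.

```ruby
# Run `iterations` iterations of the given block and return the number of
# newly compiled blocks observed after each one. `counter` is any callable
# returning the cumulative compiled-block count, e.g. on a stats-enabled YJIT:
#   counter = -> { RubyVM::YJIT.runtime_stats[:compiled_block_count] }
def per_iteration_deltas(iterations, counter)
  prev = counter.call
  iterations.times.map do |i|
    yield i # one benchmark iteration
    current = counter.call
    delta = current - prev
    prev = current
    delta
  end
end
```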
However, for lobsters, this value doesn't reach a stable zero for a long while. It slowly trends towards zero, but there's a looong tail with a few blocks being compiled even after 300 iterations. My guess is that there is a long tail of low-probability branch stubs that keep slowly being hit.
[Chart: Lobsters blocks compiled per iteration]
@rwstauner My first question would be: what are we currently doing for warm-up in yjit-metrics? Are we using the same fixed 15 warm-up iterations?
@k0kubun The second question is what do we want to do about this? We may want to implement the same solution in both yjit-metrics and yjit-bench, even though they don't use the same harness.
In theory we can just bump the number of warm-up iterations to a bigger value, but this doesn't seem great, because it will slow down smaller benchmarks. Another solution is to have some kind of adaptive mechanism. We could decide that a benchmark is warmed up when it doesn't compile any new blocks for 5 iterations, for example, and also make sure that benchmarks run for a minimum of 8-10 warm-up iterations. That may actually speed things up for smaller benchmarks. I guess as a first step I will check against all of the headline benchmarks that this heuristic would actually work.
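The adaptive criterion above can be sketched as a small predicate over the per-iteration deltas. This is a minimal sketch, not an implementation from either repo: `deltas` holds the number of newly compiled blocks seen after each iteration, and the thresholds are the example values from the comment (floor of 8 iterations, 5 consecutive quiet iterations).

```ruby
# Return the iteration count at which warm-up can be declared finished:
# no new blocks compiled for `quiet_iters` consecutive iterations, with a
# minimum of `min_iters` total warm-up iterations. Returns nil if the
# benchmark was still compiling at the end of the observed deltas.
def warmed_up_after(deltas, min_iters: 8, quiet_iters: 5)
  quiet = 0
  deltas.each_with_index do |delta, i|
    quiet = delta.zero? ? quiet + 1 : 0
    done = i + 1 # iterations completed so far
    return done if done >= min_iters && quiet >= quiet_iters
  end
  nil
end
```

Under this rule, a railsbench-like benchmark that stops compiling after 3 iterations would finish warm-up at the 8-iteration floor, while a lobsters-like long tail of compilation would keep returning nil.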
@nirvdrum a third question is what are we actually compiling in lobsters after 300+ iterations? Am I correct in assuming that it's branch stubs being hit? This is a question to be answered using the compilation log, and once we have that answer, we may want to think of heuristics to stop compiling new code.