Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: New torin benchmark #318

Merged
merged 19 commits into from
Oct 22, 2023
Merged

feat: New torin benchmark #318

merged 19 commits into from
Oct 22, 2023

Conversation

marc2332
Copy link
Owner

Depends on #314

New torin benchmark: deep + wide + cached

@github-actions
Copy link

Benchmark for bf072ce

Click to view benchmark
Test Base PR %
benchmarks/big trees (deep + branches + cached) + invalidated node in the middle 2.1±0.02µs N/A N/A
benchmarks/big trees (deep + cached) + invalidated node in the bottom 81.6±0.28µs 79.3±0.36µs -2.82%
benchmarks/big trees (deep + cached) + invalidated node in the middle 96.1±0.26µs 94.9±4.34µs -1.25%
benchmarks/big trees (deep + cached) + invalidated node in the top 169.1±0.47µs 166.5±0.32µs -1.54%
benchmarks/big trees (deep) nodes=10000, depth=14 128.8±1.77µs 128.5±0.34µs -0.23%
benchmarks/big trees (deep) nodes=100000, depth=17 1299.5±32.80µs 1293.3±36.59µs -0.48%
benchmarks/big trees (deep) nodes=4000, depth=12 69.1±0.70µs 68.9±0.78µs -0.29%
benchmarks/big trees (wide) nodes=1000, depth=1 161.5±16.60µs 159.8±0.37µs -1.05%
benchmarks/big trees (wide) nodes=10000, depth=1 1915.9±59.72µs 1925.0±52.03µs +0.47%
benchmarks/big trees (wide) nodes=100000, depth=1 34.2±0.76ms 39.1±0.99ms +14.33%

@github-actions
Copy link

Benchmark for 5546e08

Click to view benchmark
Test Base PR %
benchmarks/big trees (deep + branches + cached) + invalidated node in the middle 2.1±0.03µs N/A N/A
benchmarks/big trees (deep + cached) + invalidated node in the bottom 80.1±0.26µs 79.8±0.99µs -0.37%
benchmarks/big trees (deep + cached) + invalidated node in the middle 95.5±0.56µs 94.3±0.21µs -1.26%
benchmarks/big trees (deep + cached) + invalidated node in the top 169.1±0.39µs 167.2±0.39µs -1.12%
benchmarks/big trees (deep) nodes=10000, depth=14 130.6±0.26µs 138.3±0.27µs +5.90%
benchmarks/big trees (deep) nodes=100000, depth=17 1294.6±31.19µs 1366.6±30.76µs +5.56%
benchmarks/big trees (deep) nodes=4000, depth=12 70.3±0.71µs 73.1±0.73µs +3.98%
benchmarks/big trees (wide) nodes=1000, depth=1 161.3±0.44µs 172.3±1.14µs +6.82%
benchmarks/big trees (wide) nodes=10000, depth=1 1951.5±59.95µs 2.1±0.05ms +7.61%
benchmarks/big trees (wide) nodes=100000, depth=1 38.4±0.58ms 40.6±1.40ms +5.73%

@github-actions
Copy link

Benchmark for c4dd640

Click to view benchmark
Test Base PR %
benchmarks/big trees (deep + branches + cached) + invalidated node in the middle 2.7±0.54µs N/A N/A
benchmarks/big trees (deep + cached) + invalidated node in the bottom 100.5±7.51µs 100.1±11.57µs -0.40%
benchmarks/big trees (deep + cached) + invalidated node in the middle 120.6±7.70µs 118.2±15.75µs -1.99%
benchmarks/big trees (deep + cached) + invalidated node in the top 213.3±14.17µs 214.1±21.70µs +0.38%
benchmarks/big trees (deep) nodes=10000, depth=14 184.9±23.81µs 171.0±12.93µs -7.52%
benchmarks/big trees (deep) nodes=100000, depth=17 1507.5±125.83µs 1480.8±187.61µs -1.77%
benchmarks/big trees (deep) nodes=4000, depth=12 93.7±13.38µs 87.8±7.84µs -6.30%
benchmarks/big trees (wide) nodes=1000, depth=1 224.9±10.03µs 226.7±23.56µs +0.80%
benchmarks/big trees (wide) nodes=10000, depth=1 2.7±0.47ms 2.3±0.20ms -14.81%
benchmarks/big trees (wide) nodes=100000, depth=1 40.9±2.59ms 40.3±3.08ms -1.47%

* feat: Skip measuring of cached siblings in layout

* clean up

* remove fxhash

* fix test

* fixes

* .

* clean up

* new benchmark

* add_with_depth

* fix

* fix

* fix

* fixes

* fixes

* fixes

* fixes

* Update bench.rs
Base automatically changed from feat/improved-measuring-of-cached-layout to main October 1, 2023 12:54
@codecov
Copy link

codecov bot commented Oct 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6dda57e) 51.66% compared to head (ee144d4) 51.67%.

❗ Current head ee144d4 differs from pull request most recent head 7aa3731. Consider uploading reports for the commit 7aa3731 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
+ Coverage   51.66%   51.67%   +0.01%     
==========================================
  Files         131      131              
  Lines       11463    11466       +3     
==========================================
+ Hits         5922     5925       +3     
  Misses       5541     5541              
Files Coverage Δ
crates/torin/src/torin.rs 89.75% <100.00%> (+0.06%) ⬆️
crates/torin/tests/test.rs 99.80% <ø> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nicoburns
Copy link

@marc2332 These benchmark results looks suspiciously good to me. Specifically the benchmarks/big trees (wide) and benchmarks/big trees (deep) results where I'm assuming you're attempting to measure a fresh uncached layout? I think you might be making the same mistake that the initial version of Taffy's benchmarks did: reusing the same node tree across all iterations of the benchmarks, which means that if the benchmark framework does 100 iterations and averages the results, then only the first iteration computes layout and subsequent iteration just access the cached layout. So you're actually testing (time_to_layout + (99 * time_to_access_cached_value)) / 100 which is probably ~= time_to_layout / 100

To fix this in Taffy, we had to switch from using group.iter to using group.iter_batched (see https://bheisler.github.io/criterion.rs/book/user_guide/timing_loops.html#iter_batchediter_batched_ref), which allowed us to construct a new tree for each iteration.

@marc2332
Copy link
Owner Author

marc2332 commented Oct 1, 2023

@marc2332 These benchmark results looks suspiciously good to me. Specifically the benchmarks/big trees (wide) and benchmarks/big trees (deep) results where I'm assuming you're attempting to measure a fresh uncached layout? I think you might be making the same mistake that the initial version of Taffy's benchmarks did: reusing the same node tree across all iterations of the benchmarks, which means that if the benchmark framework does 100 iterations and averages the results, then only the first iteration computes layout and subsequent iteration just access the cached layout. So you're actually testing (time_to_layout + (99 * time_to_access_cached_value)) / 100 which is probably ~= time_to_layout / 100

To fix this in Taffy, we had to switch from using group.iter to using group.iter_batched (see https://bheisler.github.io/criterion.rs/book/user_guide/timing_loops.html#iter_batchediter_batched_ref), which allowed us to construct a new tree for each iteration

Hey Nico!

I have two types of benchmarks, the ones where I create the layout on each iteration, so, nothing is cached. And the other one is where I create and cache the layout before the benchmark, and then I mark as dirty a certain element on each iteration. But... I agree that this benchmark from this PR in particular (and maybe the others) are too good, hence why I haven't merged it yet, so don't worry about it

I still have to improve them, it's my first time doing benchmarks! I'll take a look at iter_batched asap 😄 Thank you 🙏

@github-actions
Copy link

Benchmark for b4d8f9c

Click to view benchmark
Test Base PR %
benchmarks/big trees (deep + branches + cached) + invalidated node in the middle 1113.0±67.41µs N/A N/A
benchmarks/big trees (deep + cached) + invalidated node in the bottom 77.9±0.72µs 109.0±43.07µs +39.92%
benchmarks/big trees (deep + cached) + invalidated node in the middle 93.6±0.15µs 124.8±43.38µs +33.33%
benchmarks/big trees (deep + cached) + invalidated node in the top 164.4±0.28µs 190.9±0.74µs +16.12%
benchmarks/big trees (deep) nodes=10000, depth=14 125.1±0.18µs 253.1±1.79µs +102.32%
benchmarks/big trees (deep) nodes=100000, depth=17 1208.1±10.23µs 3.9±0.16ms +222.82%
benchmarks/big trees (deep) nodes=4000, depth=12 67.6±0.75µs 117.1±0.71µs +73.22%
benchmarks/big trees (wide) nodes=1000, depth=1 157.6±0.25µs 182.0±1.91µs +15.48%
benchmarks/big trees (wide) nodes=10000, depth=1 1850.9±3.48µs 1980.2±10.45µs +6.99%
benchmarks/big trees (wide) nodes=100000, depth=1 34.6±0.91ms 38.1±0.77ms +10.12%

@marc2332
Copy link
Owner Author

marc2332 commented Oct 21, 2023

Looks like it is more accurate with iter_batched now @nicoburn, right?! Thanks for the suggestion! 😄

@marc2332
Copy link
Owner Author

marc2332 commented Oct 21, 2023

I still have to improve the benchmarks though! Some of them are not very realistic yet, specially the ones with depth 12, 14 and 17

@github-actions
Copy link

Benchmark for 747c2b7

Click to view benchmark
Test Base PR %
benchmarks/big trees (deep + branches + cached) + invalidated node in the middle 1167.6±100.96µs N/A N/A
benchmarks/big trees (deep + cached) + invalidated node in the bottom 77.6±0.39µs 107.2±1.40µs +38.14%
benchmarks/big trees (deep + cached) + invalidated node in the middle 92.3±0.50µs 116.1±0.53µs +25.79%
benchmarks/big trees (deep + cached) + invalidated node in the top 163.0±0.87µs 191.6±0.70µs +17.55%
benchmarks/big trees (deep) nodes=10000, depth=14 128.9±0.26µs 257.5±3.15µs +99.77%
benchmarks/big trees (deep) nodes=100000, depth=17 1244.1±16.23µs 4.1±0.28ms +229.56%
benchmarks/big trees (deep) nodes=4000, depth=12 69.9±2.13µs 119.2±2.09µs +70.53%
benchmarks/big trees (wide) nodes=1000, depth=1 164.8±13.74µs 185.4±0.49µs +12.50%
benchmarks/big trees (wide) nodes=10000, depth=1 1951.2±23.36µs 2.0±0.02ms +2.50%
benchmarks/big trees (wide) nodes=100000, depth=1 33.1±0.55ms 38.0±0.76ms +14.80%

@nicoburns
Copy link

@marc2332 Benchmarking code looks reasonable to me now. Results are still very fast! (roughly 10x faster than Taffy (and Morphorm last time I saw numbers for that)). But Torin is quite a bit simpler than either (no stretch/flex functionality at all yet, right?), so that seems plausible to me.

Taffy's benchmarks evenly distribute the children in each layer rather than only giving one child in each layer children. But I don't think there's anything wrong with your approach. It's just different.

@marc2332
Copy link
Owner Author

marc2332 commented Oct 21, 2023

(no stretch/flex functionality at all yet, right?)

Yeah, that's correct! Torin is way simpler

Taffy's benchmarks evenly distribute the children in each layer rather than only giving one child in each layer children. But I don't think there's anything wrong with your approach. It's just different.

Exactly! I want to do it just like Taffy because I think it's more realistic, perhaps I could leave it there but change the name so it becomes more obvious that there is only one child with children per layer

@marc2332 marc2332 merged commit 4ac85ef into main Oct 22, 2023
5 checks passed
@marc2332 marc2332 deleted the feat/new-torin-benchmark branch October 22, 2023 10:30
@github-actions
Copy link

Benchmark for c27e834

Click to view benchmark
Test Base PR %
benchmarks/big trees (deep + branches + cached) + invalidated node in the middle 1242.3±150.85µs 1162.4±133.31µs -6.43%
benchmarks/big trees (deep + cached) + invalidated node in the bottom 104.0±55.33µs 100.4±0.47µs -3.46%
benchmarks/big trees (deep + cached) + invalidated node in the middle 125.0±40.35µs 124.6±35.23µs -0.32%
benchmarks/big trees (deep + cached) + invalidated node in the top 194.5±60.71µs 191.7±2.13µs -1.44%
benchmarks/big trees (deep) nodes=10000, depth=14 258.0±3.45µs 252.8±2.38µs -2.02%
benchmarks/big trees (deep) nodes=100000, depth=17 4.1±0.30ms 4.0±0.32ms -2.44%
benchmarks/big trees (deep) nodes=4000, depth=12 118.9±1.01µs 116.8±1.53µs -1.77%
benchmarks/big trees (wide) nodes=1000, depth=1 184.6±0.71µs 184.4±0.66µs -0.11%
benchmarks/big trees (wide) nodes=10000, depth=1 1987.3±18.20µs 1998.2±12.86µs +0.55%
benchmarks/big trees (wide) nodes=100000, depth=1 38.1±0.84ms 37.2±0.64ms -2.36%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-cd 🤖 CI/CD enhancement 🔥 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants