[Feature] Subclass conservation in td ops #1186

Merged (2 commits) on Jan 20, 2025

@vmoens (Contributor) commented Jan 16, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 16, 2025
ghstack-source-id: 2990d5539699d53cf5e6c23950d6fcfbbe30dea8
Pull Request resolved: #1186
@facebook-github-bot added the "CLA Signed" label Jan 16, 2025
@vmoens added the "enhancement" label Jan 16, 2025

github-actions bot commented Jan 16, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}41$. Worsened: $\large\color{#d91a1a}10$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 75.8620μs 19.2901μs 51.8402 KOps/s 48.7280 KOps/s $\textbf{\color{#35bf28}+6.39\%}$
test_plain_set_stack_nested 53.9110μs 19.4405μs 51.4389 KOps/s 48.1336 KOps/s $\textbf{\color{#35bf28}+6.87\%}$
test_plain_set_nested_inplace 83.9870μs 21.2443μs 47.0714 KOps/s 44.5286 KOps/s $\textbf{\color{#35bf28}+5.71\%}$
test_plain_set_stack_nested_inplace 53.0090μs 21.1288μs 47.3288 KOps/s 44.3081 KOps/s $\textbf{\color{#35bf28}+6.82\%}$
test_items 44.5730μs 4.1749μs 239.5271 KOps/s 243.4163 KOps/s $\color{#d91a1a}-1.60\%$
test_items_nested 0.6868ms 0.3944ms 2.5358 KOps/s 2.5407 KOps/s $\color{#d91a1a}-0.19\%$
test_items_nested_locked 0.8076ms 0.3955ms 2.5287 KOps/s 2.4945 KOps/s $\color{#35bf28}+1.37\%$
test_items_nested_leaf 0.1808ms 76.3298μs 13.1010 KOps/s 12.9339 KOps/s $\color{#35bf28}+1.29\%$
test_items_stack_nested 0.6842ms 0.3968ms 2.5203 KOps/s 2.4985 KOps/s $\color{#35bf28}+0.87\%$
test_items_stack_nested_leaf 0.1476ms 80.5904μs 12.4084 KOps/s 12.5911 KOps/s $\color{#d91a1a}-1.45\%$
test_items_stack_nested_locked 0.5746ms 0.3973ms 2.5168 KOps/s 2.5080 KOps/s $\color{#35bf28}+0.35\%$
test_keys 41.7480μs 3.6148μs 276.6369 KOps/s 281.9563 KOps/s $\color{#d91a1a}-1.89\%$
test_keys_nested 0.3431ms 0.1652ms 6.0520 KOps/s 6.0619 KOps/s $\color{#d91a1a}-0.16\%$
test_keys_nested_locked 1.5809ms 0.1690ms 5.9166 KOps/s 5.8109 KOps/s $\color{#35bf28}+1.82\%$
test_keys_nested_leaf 0.2125ms 0.1423ms 7.0285 KOps/s 6.8054 KOps/s $\color{#35bf28}+3.28\%$
test_keys_stack_nested 0.2460ms 0.1621ms 6.1680 KOps/s 5.9053 KOps/s $\color{#35bf28}+4.45\%$
test_keys_stack_nested_leaf 0.2577ms 0.1381ms 7.2421 KOps/s 6.8006 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_keys_stack_nested_locked 0.3070ms 0.1669ms 5.9923 KOps/s 5.7725 KOps/s $\color{#35bf28}+3.81\%$
test_values 8.2012μs 1.0339μs 967.2092 KOps/s 935.1485 KOps/s $\color{#35bf28}+3.43\%$
test_values_nested 0.1055ms 62.1699μs 16.0850 KOps/s 15.9309 KOps/s $\color{#35bf28}+0.97\%$
test_values_nested_locked 0.1234ms 61.6143μs 16.2300 KOps/s 16.1425 KOps/s $\color{#35bf28}+0.54\%$
test_values_nested_leaf 0.1376ms 71.6461μs 13.9575 KOps/s 13.9544 KOps/s $\color{#35bf28}+0.02\%$
test_values_stack_nested 0.1104ms 63.7615μs 15.6834 KOps/s 15.5741 KOps/s $\color{#35bf28}+0.70\%$
test_values_stack_nested_leaf 0.1273ms 70.4078μs 14.2030 KOps/s 14.0395 KOps/s $\color{#35bf28}+1.16\%$
test_values_stack_nested_locked 0.1143ms 63.8120μs 15.6710 KOps/s 16.0617 KOps/s $\color{#d91a1a}-2.43\%$
test_membership 39.7640μs 0.8512μs 1.1749 MOps/s 1.1701 MOps/s $\color{#35bf28}+0.41\%$
test_membership_nested 22.0710μs 2.9121μs 343.3908 KOps/s 345.1291 KOps/s $\color{#d91a1a}-0.50\%$
test_membership_nested_leaf 25.5180μs 2.9135μs 343.2312 KOps/s 346.5557 KOps/s $\color{#d91a1a}-0.96\%$
test_membership_stacked_nested 18.0240μs 2.8925μs 345.7206 KOps/s 352.1376 KOps/s $\color{#d91a1a}-1.82\%$
test_membership_stacked_nested_leaf 42.0780μs 2.8887μs 346.1737 KOps/s 346.9598 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_nested_last 27.2510μs 4.3208μs 231.4369 KOps/s 230.9357 KOps/s $\color{#35bf28}+0.22\%$
test_membership_nested_leaf_last 50.4240μs 4.3017μs 232.4658 KOps/s 227.4073 KOps/s $\color{#35bf28}+2.22\%$
test_membership_stacked_nested_last 28.2620μs 5.4755μs 182.6309 KOps/s 231.7716 KOps/s $\textbf{\color{#d91a1a}-21.20\%}$
test_membership_stacked_nested_leaf_last 38.1710μs 5.5448μs 180.3494 KOps/s 231.0055 KOps/s $\textbf{\color{#d91a1a}-21.93\%}$
test_nested_getleaf 31.4680μs 10.4641μs 95.5645 KOps/s 93.5325 KOps/s $\color{#35bf28}+2.17\%$
test_nested_get 32.0390μs 10.0610μs 99.3936 KOps/s 97.6411 KOps/s $\color{#35bf28}+1.79\%$
test_stacked_getleaf 47.9700μs 10.5339μs 94.9320 KOps/s 94.6736 KOps/s $\color{#35bf28}+0.27\%$
test_stacked_get 32.0200μs 9.9799μs 100.2019 KOps/s 97.9327 KOps/s $\color{#35bf28}+2.32\%$
test_nested_getitemleaf 36.8290μs 11.1610μs 89.5977 KOps/s 89.1769 KOps/s $\color{#35bf28}+0.47\%$
test_nested_getitem 61.7350μs 10.6077μs 94.2711 KOps/s 93.4859 KOps/s $\color{#35bf28}+0.84\%$
test_stacked_getitemleaf 40.5760μs 11.2011μs 89.2768 KOps/s 88.3459 KOps/s $\color{#35bf28}+1.05\%$
test_stacked_getitem 33.0120μs 10.4946μs 95.2869 KOps/s 93.5732 KOps/s $\color{#35bf28}+1.83\%$
test_lock_nested 2.8349ms 0.4451ms 2.2465 KOps/s 1.7956 KOps/s $\textbf{\color{#35bf28}+25.11\%}$
test_lock_stack_nested 0.5510ms 0.4146ms 2.4119 KOps/s 2.3685 KOps/s $\color{#35bf28}+1.84\%$
test_unlock_nested 0.7380ms 0.3644ms 2.7443 KOps/s 2.6430 KOps/s $\color{#35bf28}+3.83\%$
test_unlock_stack_nested 0.5265ms 0.3354ms 2.9812 KOps/s 2.8671 KOps/s $\color{#35bf28}+3.98\%$
test_flatten_speed 0.1863ms 0.1005ms 9.9489 KOps/s 10.0295 KOps/s $\color{#d91a1a}-0.80\%$
test_unflatten_speed 0.9152ms 0.5209ms 1.9198 KOps/s 1.9178 KOps/s $\color{#35bf28}+0.10\%$
test_common_ops 4.1285ms 0.7250ms 1.3792 KOps/s 1.2448 KOps/s $\textbf{\color{#35bf28}+10.79\%}$
test_creation 68.1070μs 2.4244μs 412.4793 KOps/s 407.1937 KOps/s $\color{#35bf28}+1.30\%$
test_creation_empty 31.5390μs 9.2298μs 108.3446 KOps/s 82.9872 KOps/s $\textbf{\color{#35bf28}+30.56\%}$
test_creation_nested_1 44.2620μs 12.0716μs 82.8393 KOps/s 68.0355 KOps/s $\textbf{\color{#35bf28}+21.76\%}$
test_creation_nested_2 50.8140μs 16.4035μs 60.9627 KOps/s 51.5717 KOps/s $\textbf{\color{#35bf28}+18.21\%}$
test_clone 62.6570μs 13.2463μs 75.4927 KOps/s 75.5107 KOps/s $\color{#d91a1a}-0.02\%$
test_getitem[int] 1.1971ms 12.6741μs 78.9012 KOps/s 78.7566 KOps/s $\color{#35bf28}+0.18\%$
test_getitem[slice_int] 0.1442ms 25.7148μs 38.8882 KOps/s 40.9765 KOps/s $\textbf{\color{#d91a1a}-5.10\%}$
test_getitem[range] 0.1710ms 47.4933μs 21.0556 KOps/s 19.9796 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_getitem[tuple] 0.1247ms 19.9779μs 50.0552 KOps/s 50.3482 KOps/s $\color{#d91a1a}-0.58\%$
test_getitem[list] 0.1647ms 42.2788μs 23.6525 KOps/s 22.0492 KOps/s $\textbf{\color{#35bf28}+7.27\%}$
test_setitem_dim[int] 0.1061ms 29.4615μs 33.9426 KOps/s 39.8461 KOps/s $\textbf{\color{#d91a1a}-14.82\%}$
test_setitem_dim[slice_int] 0.1285ms 51.2792μs 19.5011 KOps/s 19.6614 KOps/s $\color{#d91a1a}-0.82\%$
test_setitem_dim[range] 0.1557ms 76.6388μs 13.0482 KOps/s 13.0616 KOps/s $\color{#d91a1a}-0.10\%$
test_setitem_dim[tuple] 85.0490μs 41.0388μs 24.3672 KOps/s 24.7473 KOps/s $\color{#d91a1a}-1.54\%$
test_setitem 0.2295ms 18.7805μs 53.2466 KOps/s 49.3131 KOps/s $\textbf{\color{#35bf28}+7.98\%}$
test_set 0.2686ms 17.6882μs 56.5348 KOps/s 49.8732 KOps/s $\textbf{\color{#35bf28}+13.36\%}$
test_set_shared 3.4013ms 0.1731ms 5.7759 KOps/s 5.8465 KOps/s $\color{#d91a1a}-1.21\%$
test_update 0.2558ms 19.4309μs 51.4644 KOps/s 42.5604 KOps/s $\textbf{\color{#35bf28}+20.92\%}$
test_update_nested 0.1779ms 29.6803μs 33.6924 KOps/s 30.2615 KOps/s $\textbf{\color{#35bf28}+11.34\%}$
test_update__nested 0.5273ms 32.6023μs 30.6727 KOps/s 29.2223 KOps/s $\color{#35bf28}+4.96\%$
test_set_nested 0.1131ms 20.0701μs 49.8255 KOps/s 44.7273 KOps/s $\textbf{\color{#35bf28}+11.40\%}$
test_set_nested_new 0.1455ms 24.6838μs 40.5123 KOps/s 37.1776 KOps/s $\textbf{\color{#35bf28}+8.97\%}$
test_select 0.1978ms 39.9437μs 25.0352 KOps/s 23.1433 KOps/s $\textbf{\color{#35bf28}+8.17\%}$
test_select_nested 0.1282ms 61.8317μs 16.1729 KOps/s 15.9955 KOps/s $\color{#35bf28}+1.11\%$
test_exclude_nested 0.1644ms 80.3432μs 12.4466 KOps/s 12.2263 KOps/s $\color{#35bf28}+1.80\%$
test_empty[True] 0.7583ms 0.4033ms 2.4798 KOps/s 2.4697 KOps/s $\color{#35bf28}+0.41\%$
test_empty[False] 15.6223μs 1.4503μs 689.5329 KOps/s 727.4528 KOps/s $\textbf{\color{#d91a1a}-5.21\%}$
test_unbind_speed 0.4233ms 0.2654ms 3.7679 KOps/s 3.7069 KOps/s $\color{#35bf28}+1.65\%$
test_unbind_speed_stack0 0.4125ms 0.2622ms 3.8144 KOps/s 3.7890 KOps/s $\color{#35bf28}+0.67\%$
test_unbind_speed_stack1 0.1004s 0.7849ms 1.2741 KOps/s 1.3421 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_split 1.7050ms 1.5911ms 628.4814 Ops/s 562.5200 Ops/s $\textbf{\color{#35bf28}+11.73\%}$
test_chunk 0.1046s 1.7683ms 565.5289 Ops/s 553.5126 Ops/s $\color{#35bf28}+2.17\%$
test_consolidate_njt[False-None] 0.1100s 8.9467ms 111.7726 Ops/s 116.8953 Ops/s $\color{#d91a1a}-4.38\%$
test_creation[device0] 1.1356ms 98.7338μs 10.1282 KOps/s 10.7902 KOps/s $\textbf{\color{#d91a1a}-6.14\%}$
test_creation_from_tensor 0.2626ms 94.0790μs 10.6294 KOps/s 10.4616 KOps/s $\color{#35bf28}+1.60\%$
test_add_one[memmap_tensor0] 0.2423ms 4.9030μs 203.9551 KOps/s 204.3312 KOps/s $\color{#d91a1a}-0.18\%$
test_contiguous[memmap_tensor0] 18.0340μs 0.5052μs 1.9794 MOps/s 1.9552 MOps/s $\color{#35bf28}+1.24\%$
test_stack[memmap_tensor0] 62.2360μs 3.3721μs 296.5516 KOps/s 284.4937 KOps/s $\color{#35bf28}+4.24\%$
test_memmaptd_index 0.9257ms 0.2407ms 4.1550 KOps/s 4.2151 KOps/s $\color{#d91a1a}-1.43\%$
test_memmaptd_index_astensor 0.5776ms 0.3283ms 3.0457 KOps/s 3.1023 KOps/s $\color{#d91a1a}-1.82\%$
test_memmaptd_index_op 0.9359ms 0.5497ms 1.8190 KOps/s 1.6794 KOps/s $\textbf{\color{#35bf28}+8.31\%}$
test_serialize_model 0.1255s 0.1172s 8.5354 Ops/s 8.4530 Ops/s $\color{#35bf28}+0.97\%$
test_serialize_model_pickle 0.4958s 0.4062s 2.4619 Ops/s 2.5283 Ops/s $\color{#d91a1a}-2.63\%$
test_serialize_weights 0.1280s 0.1147s 8.7185 Ops/s 8.2904 Ops/s $\textbf{\color{#35bf28}+5.16\%}$
test_serialize_weights_returnearly 0.2786s 0.1733s 5.7716 Ops/s 6.0828 Ops/s $\textbf{\color{#d91a1a}-5.12\%}$
test_serialize_weights_pickle 0.9466s 0.6547s 1.5275 Ops/s 2.4645 Ops/s $\textbf{\color{#d91a1a}-38.02\%}$
test_serialize_weights_filesystem 0.1494s 0.1417s 7.0586 Ops/s 6.7828 Ops/s $\color{#35bf28}+4.07\%$
test_serialize_model_filesystem 0.1509s 0.1422s 7.0322 Ops/s 6.3100 Ops/s $\textbf{\color{#35bf28}+11.44\%}$
test_reshape_pytree 71.4430μs 26.6937μs 37.4621 KOps/s 38.1798 KOps/s $\color{#d91a1a}-1.88\%$
test_reshape_td 80.7410μs 32.3427μs 30.9189 KOps/s 29.6118 KOps/s $\color{#35bf28}+4.41\%$
test_view_pytree 68.3680μs 26.5107μs 37.7207 KOps/s 38.3352 KOps/s $\color{#d91a1a}-1.60\%$
test_view_td 76.0320μs 36.7952μs 27.1775 KOps/s 25.9760 KOps/s $\color{#35bf28}+4.63\%$
test_unbind_pytree 72.7650μs 29.4928μs 33.9066 KOps/s 34.2937 KOps/s $\color{#d91a1a}-1.13\%$
test_unbind_td 0.3240ms 39.2208μs 25.4967 KOps/s 24.9025 KOps/s $\color{#35bf28}+2.39\%$
test_split_pytree 66.2140μs 29.1843μs 34.2650 KOps/s 34.7427 KOps/s $\color{#d91a1a}-1.38\%$
test_split_td 0.2199ms 44.9353μs 22.2542 KOps/s 22.0428 KOps/s $\color{#35bf28}+0.96\%$
test_add_pytree 75.4310μs 35.2184μs 28.3942 KOps/s 28.1077 KOps/s $\color{#35bf28}+1.02\%$
test_add_td 0.2315ms 48.6175μs 20.5687 KOps/s 17.0883 KOps/s $\textbf{\color{#35bf28}+20.37\%}$
test_compile_add_one_nested[tensordict-compile] 0.1403ms 63.9582μs 15.6352 KOps/s 15.5931 KOps/s $\color{#35bf28}+0.27\%$
test_compile_add_one_nested[tensordict-eager] 0.3651ms 0.1734ms 5.7685 KOps/s 5.6747 KOps/s $\color{#35bf28}+1.65\%$
test_compile_add_one_nested[pytree-compile] 0.1109ms 45.2270μs 22.1107 KOps/s 21.5888 KOps/s $\color{#35bf28}+2.42\%$
test_compile_add_one_nested[pytree-eager] 0.2264ms 0.1180ms 8.4748 KOps/s 8.4129 KOps/s $\color{#35bf28}+0.74\%$
test_compile_copy_nested[tensordict-compile] 65.9120μs 27.7026μs 36.0977 KOps/s 37.7600 KOps/s $\color{#d91a1a}-4.40\%$
test_compile_copy_nested[tensordict-eager] 0.1140ms 57.9468μs 17.2572 KOps/s 17.2060 KOps/s $\color{#35bf28}+0.30\%$
test_compile_copy_nested[pytree-compile] 0.1422ms 78.3828μs 12.7579 KOps/s 12.7352 KOps/s $\color{#35bf28}+0.18\%$
test_compile_copy_nested[pytree-eager] 0.1447ms 66.8692μs 14.9546 KOps/s 15.0632 KOps/s $\color{#d91a1a}-0.72\%$
test_compile_add_one_flat[tensordict-compile] 0.2039ms 0.1030ms 9.7114 KOps/s 9.5063 KOps/s $\color{#35bf28}+2.16\%$
test_compile_add_one_flat[tensordict-eager] 0.3933ms 0.2130ms 4.6944 KOps/s 4.6113 KOps/s $\color{#35bf28}+1.80\%$
test_compile_add_one_flat[tensorclass-compile] 93.7740μs 44.6216μs 22.4107 KOps/s 21.2872 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_compile_add_one_flat[tensorclass-eager] 0.5298ms 67.1349μs 14.8954 KOps/s 14.9340 KOps/s $\color{#d91a1a}-0.26\%$
test_compile_add_one_flat[pytree-compile] 0.1858ms 0.1017ms 9.8370 KOps/s 9.7981 KOps/s $\color{#35bf28}+0.40\%$
test_compile_add_one_flat[pytree-eager] 0.5344ms 0.2057ms 4.8626 KOps/s 4.9617 KOps/s $\color{#d91a1a}-2.00\%$
test_compile_add_self_flat[tensordict-eager] 0.3461ms 0.2281ms 4.3836 KOps/s 4.3055 KOps/s $\color{#35bf28}+1.81\%$
test_compile_add_self_flat[tensordict-compile] 0.2022ms 0.1038ms 9.6318 KOps/s 9.4643 KOps/s $\color{#35bf28}+1.77\%$
test_compile_add_self_flat[tensorclass-eager] 0.1393ms 62.6789μs 15.9543 KOps/s 15.6304 KOps/s $\color{#35bf28}+2.07\%$
test_compile_add_self_flat[tensorclass-compile] 0.1862ms 46.3100μs 21.5936 KOps/s 21.0279 KOps/s $\color{#35bf28}+2.69\%$
test_compile_add_self_flat[pytree-eager] 0.3310ms 0.1607ms 6.2237 KOps/s 6.3481 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_add_self_flat[pytree-compile] 0.2380ms 0.1016ms 9.8459 KOps/s 9.2365 KOps/s $\textbf{\color{#35bf28}+6.60\%}$
test_compile_copy_flat[tensordict-compile] 56.0840μs 20.8316μs 48.0041 KOps/s 46.4140 KOps/s $\color{#35bf28}+3.43\%$
test_compile_copy_flat[tensordict-eager] 0.1471ms 66.4664μs 15.0452 KOps/s 15.1233 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_copy_flat[pytree-compile] 0.1921ms 78.3207μs 12.7680 KOps/s 12.8555 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_copy_flat[pytree-eager] 0.1507ms 70.0174μs 14.2822 KOps/s 15.0500 KOps/s $\textbf{\color{#d91a1a}-5.10\%}$
test_compile_assign_and_add[tensordict-compile] 0.4258ms 0.2038ms 4.9076 KOps/s 4.7862 KOps/s $\color{#35bf28}+2.54\%$
test_compile_assign_and_add[tensordict-eager] 1.4978ms 1.3228ms 755.9814 Ops/s 761.2707 Ops/s $\color{#d91a1a}-0.69\%$
test_compile_assign_and_add[pytree-compile] 0.3550ms 0.2025ms 4.9380 KOps/s 4.9297 KOps/s $\color{#35bf28}+0.17\%$
test_compile_assign_and_add[pytree-eager] 1.3675ms 0.7824ms 1.2781 KOps/s 1.2762 KOps/s $\color{#35bf28}+0.15\%$
test_compile_assign_and_add_stack[compile] 0.9117ms 0.4538ms 2.2035 KOps/s 2.2219 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_assign_and_add_stack[eager] 2.8119ms 2.5317ms 394.9970 Ops/s 375.0284 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1026ms 35.5789μs 28.1065 KOps/s 27.8206 KOps/s $\color{#35bf28}+1.03\%$
test_compile_indexing[tensor-tensordict-eager] 0.7355ms 32.1890μs 31.0665 KOps/s 30.5857 KOps/s $\color{#35bf28}+1.57\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1133ms 28.8902μs 34.6138 KOps/s 34.4499 KOps/s $\color{#35bf28}+0.48\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1051ms 23.7953μs 42.0251 KOps/s 42.8636 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_indexing[tensor-pytree-compile] 0.1110ms 30.0826μs 33.2419 KOps/s 33.8588 KOps/s $\color{#d91a1a}-1.82\%$
test_compile_indexing[tensor-pytree-eager] 88.8160μs 23.8272μs 41.9688 KOps/s 43.2496 KOps/s $\color{#d91a1a}-2.96\%$
test_compile_indexing[slice-tensordict-compile] 0.1167ms 51.3922μs 19.4582 KOps/s 19.5767 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_indexing[slice-tensordict-eager] 0.6112ms 20.0957μs 49.7618 KOps/s 50.8750 KOps/s $\color{#d91a1a}-2.19\%$
test_compile_indexing[slice-tensorclass-compile] 0.1523ms 43.4660μs 23.0065 KOps/s 22.8379 KOps/s $\color{#35bf28}+0.74\%$
test_compile_indexing[slice-tensorclass-eager] 68.2670μs 18.6602μs 53.5901 KOps/s 54.5147 KOps/s $\color{#d91a1a}-1.70\%$
test_compile_indexing[slice-pytree-compile] 0.1206ms 44.2922μs 22.5774 KOps/s 22.4712 KOps/s $\color{#35bf28}+0.47\%$
test_compile_indexing[slice-pytree-eager] 77.4850μs 18.7374μs 53.3692 KOps/s 55.2233 KOps/s $\color{#d91a1a}-3.36\%$
test_compile_indexing[int-tensordict-compile] 0.1204ms 51.5452μs 19.4004 KOps/s 19.3766 KOps/s $\color{#35bf28}+0.12\%$
test_compile_indexing[int-tensordict-eager] 1.0121ms 19.8254μs 50.4403 KOps/s 51.4784 KOps/s $\color{#d91a1a}-2.02\%$
test_compile_indexing[int-tensorclass-compile] 0.1251ms 44.5121μs 22.4658 KOps/s 22.4557 KOps/s $\color{#35bf28}+0.05\%$
test_compile_indexing[int-tensorclass-eager] 70.3810μs 18.6548μs 53.6055 KOps/s 55.0752 KOps/s $\color{#d91a1a}-2.67\%$
test_compile_indexing[int-pytree-compile] 0.1460ms 44.6778μs 22.3825 KOps/s 22.3916 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_indexing[int-pytree-eager] 91.2000μs 18.7183μs 53.4237 KOps/s 55.2872 KOps/s $\color{#d91a1a}-3.37\%$
test_mod_add[eager] 0.1046ms 32.7530μs 30.5315 KOps/s 28.9307 KOps/s $\textbf{\color{#35bf28}+5.53\%}$
test_mod_add[compile] 0.1713ms 48.3622μs 20.6773 KOps/s 20.8661 KOps/s $\color{#d91a1a}-0.90\%$
test_mod_add[compile-overhead] 0.1071ms 47.5685μs 21.0223 KOps/s 20.8312 KOps/s $\color{#35bf28}+0.92\%$
test_mod_wrap[eager] 0.4578ms 0.2235ms 4.4746 KOps/s 4.5807 KOps/s $\color{#d91a1a}-2.32\%$
test_mod_wrap[compile] 0.4107ms 0.2054ms 4.8679 KOps/s 4.8043 KOps/s $\color{#35bf28}+1.32\%$
test_mod_wrap[compile-overhead] 0.3845ms 0.2057ms 4.8607 KOps/s 4.9197 KOps/s $\color{#d91a1a}-1.20\%$
test_mod_wrap_and_backward[eager] 14.5899ms 12.4329ms 80.4320 Ops/s 75.3865 Ops/s $\textbf{\color{#35bf28}+6.69\%}$
test_mod_wrap_and_backward[compile] 19.2336ms 13.4903ms 74.1275 Ops/s 74.3224 Ops/s $\color{#d91a1a}-0.26\%$
test_mod_wrap_and_backward[compile-overhead] 18.2843ms 12.9396ms 77.2821 Ops/s 74.6303 Ops/s $\color{#35bf28}+3.55\%$
test_seq_add[eager] 0.2544ms 0.1136ms 8.8006 KOps/s 8.6231 KOps/s $\color{#35bf28}+2.06\%$
test_seq_add[compile] 0.1491ms 63.0168μs 15.8688 KOps/s 15.5775 KOps/s $\color{#35bf28}+1.87\%$
test_seq_add[compile-overhead] 0.1380ms 59.9933μs 16.6685 KOps/s 16.1987 KOps/s $\color{#35bf28}+2.90\%$
test_seq_wrap[eager] 0.6997ms 0.4289ms 2.3314 KOps/s 2.2154 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_seq_wrap[compile] 0.6885ms 0.2270ms 4.4046 KOps/s 4.2894 KOps/s $\color{#35bf28}+2.68\%$
test_seq_wrap[compile-overhead] 0.3584ms 0.2231ms 4.4821 KOps/s 4.3375 KOps/s $\color{#35bf28}+3.33\%$
test_func_call_runtime[False-eager] 0.9674ms 0.5433ms 1.8405 KOps/s 1.8516 KOps/s $\color{#d91a1a}-0.60\%$
test_func_call_runtime[False-compile] 0.7493ms 0.4207ms 2.3772 KOps/s 2.3098 KOps/s $\color{#35bf28}+2.92\%$
test_func_call_runtime[False-compile-overhead] 0.5243ms 0.4202ms 2.3801 KOps/s 2.3776 KOps/s $\color{#35bf28}+0.10\%$
test_func_call_runtime[True-eager] 0.9001ms 0.7526ms 1.3287 KOps/s 1.3114 KOps/s $\color{#35bf28}+1.32\%$
test_func_call_runtime[True-compile] 0.6662ms 0.4582ms 2.1824 KOps/s 2.1514 KOps/s $\color{#35bf28}+1.44\%$
test_func_call_runtime[True-compile-overhead] 0.6915ms 0.4621ms 2.1639 KOps/s 2.1551 KOps/s $\color{#35bf28}+0.41\%$
test_func_call_cm_runtime[False-eager] 0.8352ms 0.5448ms 1.8356 KOps/s 1.8714 KOps/s $\color{#d91a1a}-1.91\%$
test_func_call_cm_runtime[False-compile] 1.1342ms 0.4268ms 2.3430 KOps/s 2.3689 KOps/s $\color{#d91a1a}-1.09\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5704ms 0.4199ms 2.3817 KOps/s 2.3810 KOps/s $\color{#35bf28}+0.03\%$
test_func_call_cm_runtime[True-eager] 1.4407ms 0.9011ms 1.1097 KOps/s 1.1180 KOps/s $\color{#d91a1a}-0.74\%$
test_func_call_cm_runtime[True-compile] 0.5941ms 0.4835ms 2.0682 KOps/s 2.0697 KOps/s $\color{#d91a1a}-0.07\%$
test_func_call_cm_runtime[True-compile-overhead] 0.6282ms 0.4826ms 2.0723 KOps/s 2.0830 KOps/s $\color{#d91a1a}-0.51\%$
test_vmap_func_call_cm_runtime[eager] 2.6072ms 1.8896ms 529.2084 Ops/s 525.2757 Ops/s $\color{#35bf28}+0.75\%$
test_vmap_func_call_cm_runtime[compile] 0.6862ms 0.5187ms 1.9279 KOps/s 1.9381 KOps/s $\color{#d91a1a}-0.53\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.7174ms 0.5209ms 1.9198 KOps/s 1.9292 KOps/s $\color{#d91a1a}-0.49\%$
test_distributed 0.3668ms 0.1254ms 7.9755 KOps/s 7.8023 KOps/s $\color{#35bf28}+2.22\%$
test_tdmodule 75.8910μs 24.3432μs 41.0792 KOps/s 37.7461 KOps/s $\textbf{\color{#35bf28}+8.83\%}$
test_tdmodule_dispatch 72.5150μs 43.5911μs 22.9404 KOps/s 20.9871 KOps/s $\textbf{\color{#35bf28}+9.31\%}$
test_tdseq 47.0670μs 26.2869μs 38.0418 KOps/s 35.0551 KOps/s $\textbf{\color{#35bf28}+8.52\%}$
test_tdseq_dispatch 95.4770μs 49.4920μs 20.2053 KOps/s 18.5958 KOps/s $\textbf{\color{#35bf28}+8.66\%}$
test_instantiation_functorch 3.0774ms 1.5121ms 661.3475 Ops/s 655.8504 Ops/s $\color{#35bf28}+0.84\%$
test_exec_functorch 0.3247ms 0.1810ms 5.5254 KOps/s 5.6397 KOps/s $\color{#d91a1a}-2.03\%$
test_exec_functional_call 0.2956ms 0.1693ms 5.9078 KOps/s 5.8658 KOps/s $\color{#35bf28}+0.72\%$
test_exec_td_decorator 0.5058ms 0.2276ms 4.3936 KOps/s 4.3938 KOps/s $-0.00\%$
test_vmap_mlp_speed_decorator[True-True] 0.9553ms 0.6473ms 1.5449 KOps/s 1.4877 KOps/s $\color{#35bf28}+3.85\%$
test_vmap_mlp_speed_decorator[True-False] 1.3060ms 0.6493ms 1.5402 KOps/s 1.5430 KOps/s $\color{#d91a1a}-0.18\%$
test_vmap_mlp_speed_decorator[False-True] 0.8832ms 0.5303ms 1.8857 KOps/s 1.8986 KOps/s $\color{#d91a1a}-0.68\%$
test_vmap_mlp_speed_decorator[False-False] 1.1113ms 0.5342ms 1.8721 KOps/s 1.9072 KOps/s $\color{#d91a1a}-1.84\%$
test_to_module_speed[True] 1.7597ms 1.3357ms 748.6711 Ops/s 736.1793 Ops/s $\color{#35bf28}+1.70\%$
test_to_module_speed[False] 1.9305ms 1.3032ms 767.3161 Ops/s 743.6382 Ops/s $\color{#35bf28}+3.18\%$
test_tc_init 73.6770μs 41.5814μs 24.0492 KOps/s 21.4711 KOps/s $\textbf{\color{#35bf28}+12.01\%}$
test_tc_init_nested 0.1656ms 81.6932μs 12.2409 KOps/s 10.2505 KOps/s $\textbf{\color{#35bf28}+19.42\%}$
test_tc_first_layer_tensor 15.7790μs 1.5236μs 656.3321 KOps/s 661.4717 KOps/s $\color{#d91a1a}-0.78\%$
test_tc_first_layer_nontensor 26.7690μs 4.6510μs 215.0060 KOps/s 217.0874 KOps/s $\color{#d91a1a}-0.96\%$
test_tc_second_layer_tensor 43.4740μs 2.7661μs 361.5186 KOps/s 355.1725 KOps/s $\color{#35bf28}+1.79\%$
test_tc_second_layer_nontensor 32.0900μs 5.9967μs 166.7587 KOps/s 168.2494 KOps/s $\color{#d91a1a}-0.89\%$
test_unbind 0.2169s 13.3926ms 74.6682 Ops/s 78.2489 Ops/s $\color{#d91a1a}-4.58\%$
test_full_like 8.4709ms 7.1128ms 140.5907 Ops/s 81.4874 Ops/s $\textbf{\color{#35bf28}+72.53\%}$
test_zeros_like 3.2627ms 2.7222ms 367.3523 Ops/s 138.3856 Ops/s $\textbf{\color{#35bf28}+165.46\%}$
test_ones_like 3.7025ms 3.2233ms 310.2372 Ops/s 133.2428 Ops/s $\textbf{\color{#35bf28}+132.84\%}$
test_clone 8.5705ms 5.2735ms 189.6256 Ops/s 110.1729 Ops/s $\textbf{\color{#35bf28}+72.12\%}$
test_squeeze 81.8930μs 11.9466μs 83.7061 KOps/s 83.7175 KOps/s $\color{#d91a1a}-0.01\%$
test_unsqueeze 0.2361ms 88.2136μs 11.3361 KOps/s 10.7024 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_split 0.3615ms 0.1909ms 5.2392 KOps/s 5.1914 KOps/s $\color{#35bf28}+0.92\%$
test_permute 0.3863ms 0.2003ms 4.9936 KOps/s 4.9049 KOps/s $\color{#35bf28}+1.81\%$
test_stack 29.2729ms 25.3166ms 39.4997 Ops/s 39.8566 Ops/s $\color{#d91a1a}-0.90\%$
test_cat 28.5507ms 24.7456ms 40.4112 Ops/s 40.0578 Ops/s $\color{#35bf28}+0.88\%$
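
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column, and the bolded rows appear to be those whose change is at least 5% in either direction (the ten bolded regressions match "Worsened: 10"). The sketch below reproduces that arithmetic for a few rows copied from the table. The function names and the exact 5% cutoff are assumptions inferred from the table, not taken from the benchmark workflow itself.

```python
# Assumed cutoff, inferred from which rows are bolded in the table above.
BOLD_THRESHOLD_PCT = 5.0


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of this PR relative to repo HEAD.

    Both values must use the same unit (e.g. both KOps/s).
    """
    return (ops / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = BOLD_THRESHOLD_PCT) -> str:
    """Label a row the way the summary line seems to count it (hypothetical helper)."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unflagged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table.
rows = {
    "test_plain_set_nested": (51.8402, 48.7280),
    "test_update__nested": (30.6727, 29.2223),
    "test_serialize_weights_pickle": (1.5275, 2.4645),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because a single run on shared CI hardware is noisy, a cutoff of this kind mainly serves to separate large swings (such as `test_serialize_weights_pickle`) from run-to-run jitter.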


github-actions bot commented Jan 16, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}33$. Worsened: $\large\color{#d91a1a}14$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 34.7220μs 11.2547μs 88.8519 KOps/s 79.5299 KOps/s $\textbf{\color{#35bf28}+11.72\%}$
test_plain_set_stack_nested 35.5320μs 11.4421μs 87.3966 KOps/s 78.4291 KOps/s $\textbf{\color{#35bf28}+11.43\%}$
test_plain_set_nested_inplace 40.2020μs 12.2643μs 81.5378 KOps/s 72.4528 KOps/s $\textbf{\color{#35bf28}+12.54\%}$
test_plain_set_stack_nested_inplace 0.1095ms 12.3334μs 81.0804 KOps/s 73.2179 KOps/s $\textbf{\color{#35bf28}+10.74\%}$
test_items 30.1120μs 2.8884μs 346.2111 KOps/s 344.0520 KOps/s $\color{#35bf28}+0.63\%$
test_items_nested 0.4092ms 0.3599ms 2.7788 KOps/s 2.7740 KOps/s $\color{#35bf28}+0.17\%$
test_items_nested_locked 0.4159ms 0.3599ms 2.7787 KOps/s 2.7717 KOps/s $\color{#35bf28}+0.25\%$
test_items_nested_leaf 79.9440μs 57.8218μs 17.2945 KOps/s 17.2955 KOps/s $-0.01\%$
test_items_stack_nested 0.4285ms 0.3617ms 2.7644 KOps/s 2.7583 KOps/s $\color{#35bf28}+0.22\%$
test_items_stack_nested_leaf 86.8750μs 58.6112μs 17.0616 KOps/s 16.8370 KOps/s $\color{#35bf28}+1.33\%$
test_items_stack_nested_locked 0.4368ms 0.3639ms 2.7483 KOps/s 2.7455 KOps/s $\color{#35bf28}+0.10\%$
test_keys 32.3810μs 3.6075μs 277.2020 KOps/s 292.3105 KOps/s $\textbf{\color{#d91a1a}-5.17\%}$
test_keys_nested 0.1185ms 88.1540μs 11.3438 KOps/s 11.4063 KOps/s $\color{#d91a1a}-0.55\%$
test_keys_nested_locked 0.8121ms 93.7005μs 10.6723 KOps/s 10.7166 KOps/s $\color{#d91a1a}-0.41\%$
test_keys_nested_leaf 0.1156ms 78.4276μs 12.7506 KOps/s 12.8466 KOps/s $\color{#d91a1a}-0.75\%$
test_keys_stack_nested 0.1146ms 90.5310μs 11.0459 KOps/s 11.2482 KOps/s $\color{#d91a1a}-1.80\%$
test_keys_stack_nested_leaf 0.1295ms 80.2567μs 12.4600 KOps/s 12.3777 KOps/s $\color{#35bf28}+0.66\%$
test_keys_stack_nested_locked 0.1217ms 95.1842μs 10.5059 KOps/s 10.4785 KOps/s $\color{#35bf28}+0.26\%$
test_values 7.9320μs 0.8489μs 1.1780 MOps/s 1.1798 MOps/s $\color{#d91a1a}-0.16\%$
test_values_nested 71.9240μs 37.5900μs 26.6028 KOps/s 26.8613 KOps/s $\color{#d91a1a}-0.96\%$
test_values_nested_locked 0.3354ms 39.3178μs 25.4338 KOps/s 25.6168 KOps/s $\color{#d91a1a}-0.71\%$
test_values_nested_leaf 78.7540μs 41.9900μs 23.8152 KOps/s 24.0621 KOps/s $\color{#d91a1a}-1.03\%$
test_values_stack_nested 73.0640μs 38.1301μs 26.2260 KOps/s 26.3485 KOps/s $\color{#d91a1a}-0.46\%$
test_values_stack_nested_leaf 77.7640μs 42.3154μs 23.6321 KOps/s 23.7699 KOps/s $\color{#d91a1a}-0.58\%$
test_values_stack_nested_locked 74.6140μs 40.0397μs 24.9752 KOps/s 25.0839 KOps/s $\color{#d91a1a}-0.43\%$
test_membership 1.7366μs 0.5012μs 1.9951 MOps/s 1.9530 MOps/s $\color{#35bf28}+2.16\%$
test_membership_nested 33.3320μs 2.0376μs 490.7641 KOps/s 505.7721 KOps/s $\color{#d91a1a}-2.97\%$
test_membership_nested_leaf 15.2955μs 2.0040μs 499.0048 KOps/s 504.7272 KOps/s $\color{#d91a1a}-1.13\%$
test_membership_stacked_nested 32.5120μs 2.0303μs 492.5351 KOps/s 485.0938 KOps/s $\color{#35bf28}+1.53\%$
test_membership_stacked_nested_leaf 40.1220μs 2.0480μs 488.2922 KOps/s 488.7442 KOps/s $\color{#d91a1a}-0.09\%$
test_membership_nested_last 35.7520μs 2.9939μs 334.0149 KOps/s 330.8566 KOps/s $\color{#35bf28}+0.95\%$
test_membership_nested_leaf_last 27.6020μs 2.9744μs 336.2056 KOps/s 332.4787 KOps/s $\color{#35bf28}+1.12\%$
test_membership_stacked_nested_last 28.8820μs 2.9894μs 334.5109 KOps/s 281.4104 KOps/s $\textbf{\color{#35bf28}+18.87\%}$
test_membership_stacked_nested_leaf_last 39.5520μs 2.9966μs 333.7097 KOps/s 286.5069 KOps/s $\textbf{\color{#35bf28}+16.48\%}$
test_nested_getleaf 42.1320μs 6.0478μs 165.3481 KOps/s 164.7907 KOps/s $\color{#35bf28}+0.34\%$
test_nested_get 34.4620μs 5.7491μs 173.9409 KOps/s 174.0153 KOps/s $\color{#d91a1a}-0.04\%$
test_stacked_getleaf 33.4820μs 6.0815μs 164.4337 KOps/s 163.6704 KOps/s $\color{#35bf28}+0.47\%$
test_stacked_get 41.9020μs 5.7690μs 173.3393 KOps/s 172.5653 KOps/s $\color{#35bf28}+0.45\%$
test_nested_getitemleaf 36.5120μs 6.4218μs 155.7201 KOps/s 155.6618 KOps/s $\color{#35bf28}+0.04\%$
test_nested_getitem 31.7720μs 6.0743μs 164.6270 KOps/s 166.0551 KOps/s $\color{#d91a1a}-0.86\%$
test_stacked_getitemleaf 45.7520μs 6.3758μs 156.8427 KOps/s 156.1297 KOps/s $\color{#35bf28}+0.46\%$
test_stacked_getitem 38.5220μs 6.1000μs 163.9347 KOps/s 165.2739 KOps/s $\color{#d91a1a}-0.81\%$
test_lock_nested 1.1250ms 0.3735ms 2.6772 KOps/s 2.6635 KOps/s $\color{#35bf28}+0.52\%$
test_lock_stack_nested 0.4705ms 0.3474ms 2.8788 KOps/s 2.9346 KOps/s $\color{#d91a1a}-1.90\%$
test_unlock_nested 0.6944ms 0.3183ms 3.1416 KOps/s 3.2518 KOps/s $\color{#d91a1a}-3.39\%$
test_unlock_stack_nested 0.3262ms 0.2846ms 3.5140 KOps/s 3.5886 KOps/s $\color{#d91a1a}-2.08\%$
test_flatten_speed 0.1157ms 74.7613μs 13.3759 KOps/s 13.4021 KOps/s $\color{#d91a1a}-0.20\%$
test_unflatten_speed 0.3687ms 0.3195ms 3.1295 KOps/s 3.1639 KOps/s $\color{#d91a1a}-1.09\%$
test_common_ops 1.6446ms 0.5585ms 1.7904 KOps/s 1.6355 KOps/s $\textbf{\color{#35bf28}+9.47\%}$
test_creation 0.1715ms 1.7096μs 584.9290 KOps/s 590.7755 KOps/s $\color{#d91a1a}-0.99\%$
test_creation_empty 38.5320μs 6.2687μs 159.5231 KOps/s 114.3177 KOps/s $\textbf{\color{#35bf28}+39.54\%}$
test_creation_nested_1 35.2020μs 7.9421μs 125.9106 KOps/s 95.3676 KOps/s $\textbf{\color{#35bf28}+32.03\%}$
test_creation_nested_2 46.3720μs 10.7329μs 93.1714 KOps/s 76.7714 KOps/s $\textbf{\color{#35bf28}+21.36\%}$
test_clone 0.1243ms 10.2347μs 97.7065 KOps/s 100.3034 KOps/s $\color{#d91a1a}-2.59\%$
test_getitem[int] 1.9867ms 11.0920μs 90.1549 KOps/s 93.2923 KOps/s $\color{#d91a1a}-3.36\%$
test_getitem[slice_int] 0.1155ms 21.8072μs 45.8564 KOps/s 49.1642 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_getitem[range] 0.1327ms 37.4109μs 26.7302 KOps/s 28.2197 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_getitem[tuple] 0.1112ms 18.6891μs 53.5071 KOps/s 55.1818 KOps/s $\color{#d91a1a}-3.03\%$
test_getitem[list] 0.1313ms 32.5971μs 30.6776 KOps/s 31.5977 KOps/s $\color{#d91a1a}-2.91\%$
test_setitem_dim[int] 40.8120μs 19.1747μs 52.1520 KOps/s 54.4386 KOps/s $\color{#d91a1a}-4.20\%$
test_setitem_dim[slice_int] 61.5030μs 38.0382μs 26.2893 KOps/s 27.2815 KOps/s $\color{#d91a1a}-3.64\%$
test_setitem_dim[range] 77.2740μs 51.9584μs 19.2462 KOps/s 19.7108 KOps/s $\color{#d91a1a}-2.36\%$
test_setitem_dim[tuple] 65.8040μs 32.4320μs 30.8338 KOps/s 32.2009 KOps/s $\color{#d91a1a}-4.25\%$
test_setitem 45.3520μs 13.5758μs 73.6605 KOps/s 67.7661 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_set 0.1326ms 12.9779μs 77.0540 KOps/s 70.0116 KOps/s $\textbf{\color{#35bf28}+10.06\%}$
test_set_shared 1.4527ms 0.1489ms 6.7160 KOps/s 6.6982 KOps/s $\color{#35bf28}+0.27\%$
test_update 0.8622ms 15.0018μs 66.6586 KOps/s 57.0089 KOps/s $\textbf{\color{#35bf28}+16.93\%}$
test_update_nested 0.1321ms 20.3521μs 49.1350 KOps/s 43.3763 KOps/s $\textbf{\color{#35bf28}+13.28\%}$
test_update__nested 1.3175ms 24.9415μs 40.0939 KOps/s 41.2102 KOps/s $\color{#d91a1a}-2.71\%$
test_set_nested 0.1255ms 14.5429μs 68.7622 KOps/s 65.2803 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_set_nested_new 0.1246ms 16.9654μs 58.9436 KOps/s 56.5187 KOps/s $\color{#35bf28}+4.29\%$
test_select 0.1413ms 28.6981μs 34.8455 KOps/s 33.5196 KOps/s $\color{#35bf28}+3.96\%$
test_select_nested 75.3140μs 43.6877μs 22.8897 KOps/s 22.7982 KOps/s $\color{#35bf28}+0.40\%$
test_exclude_nested 92.6050μs 61.5834μs 16.2381 KOps/s 16.0411 KOps/s $\color{#35bf28}+1.23\%$
test_empty[True] 0.3491ms 0.2932ms 3.4107 KOps/s 3.4376 KOps/s $\color{#d91a1a}-0.78\%$
test_empty[False] 4.5782μs 0.8221μs 1.2164 MOps/s 1.2179 MOps/s $\color{#d91a1a}-0.12\%$
test_to 84.7050μs 56.2844μs 17.7669 KOps/s 17.1846 KOps/s $\color{#35bf28}+3.39\%$
test_to_nonblocking 0.1200ms 46.2809μs 21.6072 KOps/s 21.1275 KOps/s $\color{#35bf28}+2.27\%$
test_unbind_speed 0.8247ms 0.2412ms 4.1468 KOps/s 4.2411 KOps/s $\color{#d91a1a}-2.22\%$
test_unbind_speed_stack0 0.2915ms 0.2393ms 4.1780 KOps/s 4.2398 KOps/s $\color{#d91a1a}-1.46\%$
test_unbind_speed_stack1 0.6728ms 0.6182ms 1.6176 KOps/s 1.5049 KOps/s $\textbf{\color{#35bf28}+7.49\%}$
test_split 95.1560ms 1.6468ms 607.2350 Ops/s 622.4164 Ops/s $\color{#d91a1a}-2.44\%$
test_chunk 97.0090ms 1.6411ms 609.3517 Ops/s 620.3660 Ops/s $\color{#d91a1a}-1.78\%$
test_consolidate[False-None] 97.0805ms 2.8843ms 346.7095 Ops/s 341.4397 Ops/s $\color{#35bf28}+1.54\%$
test_consolidate[default-None] 1.8007ms 1.7126ms 583.8972 Ops/s 601.5385 Ops/s $\color{#d91a1a}-2.93\%$
test_consolidate[reduce-overhead-None] 1.8650ms 1.7555ms 569.6533 Ops/s 590.4789 Ops/s $\color{#d91a1a}-3.53\%$
test_consolidate_njt[False-None] 6.7159ms 6.4852ms 154.1962 Ops/s 156.0760 Ops/s $\color{#d91a1a}-1.20\%$
test_to[False-False-None] 1.8143ms 1.7010ms 587.8758 Ops/s 589.0270 Ops/s $\color{#d91a1a}-0.20\%$
test_to[True-False-None] 1.4956ms 1.2986ms 770.0760 Ops/s 796.2186 Ops/s $\color{#d91a1a}-3.28\%$
test_to[within-False-None] 4.3647ms 4.1011ms 243.8373 Ops/s 178.4825 Ops/s $\textbf{\color{#35bf28}+36.62\%}$
test_to[True-default-None] 5.5659ms 5.3393ms 187.2896 Ops/s 193.9373 Ops/s $\color{#d91a1a}-3.43\%$
test_to_njt[False-False-None] 7.0973ms 6.7955ms 147.1556 Ops/s 143.1369 Ops/s $\color{#35bf28}+2.81\%$
test_to_njt[True-False-None] 5.6958ms 5.4001ms 185.1833 Ops/s 181.9411 Ops/s $\color{#35bf28}+1.78\%$
test_to_njt[within-False-None] 12.4818ms 12.1266ms 82.4631 Ops/s 82.8275 Ops/s $\color{#d91a1a}-0.44\%$
test_creation[device0] 0.6185ms 80.3841μs 12.4403 KOps/s 12.1822 KOps/s $\color{#35bf28}+2.12\%$
test_creation_from_tensor 0.5955ms 84.5659μs 11.8251 KOps/s 11.6993 KOps/s $\color{#35bf28}+1.07\%$
test_add_one[memmap_tensor0] 0.4075ms 6.3254μs 158.0937 KOps/s 160.5520 KOps/s $\color{#d91a1a}-1.53\%$
test_contiguous[memmap_tensor0] 2.1076μs 0.3992μs 2.5049 MOps/s 2.4666 MOps/s $\color{#35bf28}+1.55\%$
test_stack[memmap_tensor0] 52.9120μs 4.6050μs 217.1559 KOps/s 219.7225 KOps/s $\color{#d91a1a}-1.17\%$
test_memmaptd_index 1.5123ms 0.2597ms 3.8507 KOps/s 4.0186 KOps/s $\color{#d91a1a}-4.18\%$
test_memmaptd_index_astensor 0.5880ms 0.3237ms 3.0889 KOps/s 3.2148 KOps/s $\color{#d91a1a}-3.92\%$
test_memmaptd_index_op 1.0530ms 0.5554ms 1.8004 KOps/s 1.7232 KOps/s $\color{#35bf28}+4.48\%$
test_serialize_model 0.1318s 0.1312s 7.6243 Ops/s 7.6426 Ops/s $\color{#d91a1a}-0.24\%$
test_serialize_model_pickle 1.3776s 1.2190s 0.8204 Ops/s 0.8443 Ops/s $\color{#d91a1a}-2.83\%$
test_serialize_weights 0.1310s 0.1304s 7.6716 Ops/s 7.6867 Ops/s $\color{#d91a1a}-0.20\%$
test_serialize_weights_returnearly 0.5131s 72.7543ms 13.7449 Ops/s 15.3446 Ops/s $\textbf{\color{#d91a1a}-10.43\%}$
test_serialize_weights_pickle 1.3772s 1.2161s 0.8223 Ops/s 0.8222 Ops/s $+0.00\%$
test_reshape_pytree 55.1830μs 21.8910μs 45.6808 KOps/s 45.2079 KOps/s $\color{#35bf28}+1.05\%$
test_reshape_td 59.2630μs 26.4187μs 37.8520 KOps/s 37.4825 KOps/s $\color{#35bf28}+0.99\%$
test_view_pytree 47.1220μs 21.8535μs 45.7592 KOps/s 45.8195 KOps/s $\color{#d91a1a}-0.13\%$
test_view_td 74.5140μs 31.6835μs 31.5622 KOps/s 33.9184 KOps/s $\textbf{\color{#d91a1a}-6.95\%}$
test_unbind_pytree 56.4530μs 27.9367μs 35.7952 KOps/s 35.8382 KOps/s $\color{#d91a1a}-0.12\%$
test_unbind_td 1.0009ms 36.3647μs 27.4992 KOps/s 27.8305 KOps/s $\color{#d91a1a}-1.19\%$
test_split_pytree 64.1740μs 29.4431μs 33.9638 KOps/s 32.7569 KOps/s $\color{#35bf28}+3.68\%$
test_split_td 0.1818ms 39.3721μs 25.3987 KOps/s 25.9971 KOps/s $\color{#d91a1a}-2.30\%$
test_add_pytree 69.1840μs 32.9244μs 30.3726 KOps/s 30.7799 KOps/s $\color{#d91a1a}-1.32\%$
test_add_td 84.2440μs 43.2222μs 23.1362 KOps/s 20.8011 KOps/s $\textbf{\color{#35bf28}+11.23\%}$
test_compile_add_one_nested[tensordict-compile] 0.1765ms 0.1214ms 8.2367 KOps/s 7.7906 KOps/s $\textbf{\color{#35bf28}+5.73\%}$
test_compile_add_one_nested[tensordict-eager] 0.2271ms 0.1307ms 7.6524 KOps/s 7.6110 KOps/s $\color{#35bf28}+0.54\%$
test_compile_add_one_nested[pytree-compile] 0.1327ms 93.9959μs 10.6388 KOps/s 10.7203 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_add_one_nested[pytree-eager] 0.2029ms 0.1483ms 6.7420 KOps/s 6.7362 KOps/s $\color{#35bf28}+0.09\%$
test_compile_copy_nested[tensordict-compile] 59.5930μs 25.2350μs 39.6275 KOps/s 44.3595 KOps/s $\textbf{\color{#d91a1a}-10.67\%}$
test_compile_copy_nested[tensordict-eager] 57.4630μs 28.7693μs 34.7593 KOps/s 34.0518 KOps/s $\color{#35bf28}+2.08\%$
test_compile_copy_nested[pytree-compile] 0.3343ms 64.6168μs 15.4759 KOps/s 15.6510 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_copy_nested[pytree-eager] 73.7740μs 48.8258μs 20.4810 KOps/s 20.1801 KOps/s $\color{#35bf28}+1.49\%$
test_compile_add_one_flat[tensordict-compile] 0.1831ms 0.1407ms 7.1058 KOps/s 7.1557 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_add_one_flat[tensordict-eager] 0.3188ms 0.2176ms 4.5952 KOps/s 4.6463 KOps/s $\color{#d91a1a}-1.10\%$
test_compile_add_one_flat[tensorclass-compile] 0.1580ms 0.1011ms 9.8878 KOps/s 10.3629 KOps/s $\color{#d91a1a}-4.58\%$
test_compile_add_one_flat[tensorclass-eager] 0.1213ms 57.6444μs 17.3477 KOps/s 18.0375 KOps/s $\color{#d91a1a}-3.82\%$
test_compile_add_one_flat[pytree-compile] 0.1750ms 0.1346ms 7.4294 KOps/s 7.2501 KOps/s $\color{#35bf28}+2.47\%$
test_compile_add_one_flat[pytree-eager] 0.5871ms 0.4809ms 2.0795 KOps/s 2.0813 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_add_self_flat[tensordict-eager] 0.4004ms 0.2610ms 3.8318 KOps/s 3.8235 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_self_flat[tensordict-compile] 0.1824ms 0.1425ms 7.0158 KOps/s 6.8394 KOps/s $\color{#35bf28}+2.58\%$
test_compile_add_self_flat[tensorclass-eager] 0.1644ms 66.0051μs 15.1503 KOps/s 15.0853 KOps/s $\color{#35bf28}+0.43\%$
test_compile_add_self_flat[tensorclass-compile] 0.1371ms 98.9696μs 10.1041 KOps/s 10.2648 KOps/s $\color{#d91a1a}-1.57\%$
test_compile_add_self_flat[pytree-eager] 0.4570ms 0.4099ms 2.4397 KOps/s 2.4690 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_add_self_flat[pytree-compile] 0.1762ms 0.1343ms 7.4479 KOps/s 7.5053 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_copy_flat[tensordict-compile] 50.7020μs 19.2374μs 51.9821 KOps/s 56.2192 KOps/s $\textbf{\color{#d91a1a}-7.54\%}$
test_compile_copy_flat[tensordict-eager] 67.4140μs 30.7192μs 32.5530 KOps/s 31.9528 KOps/s $\color{#35bf28}+1.88\%$
test_compile_copy_flat[pytree-compile] 0.1071ms 69.5336μs 14.3815 KOps/s 14.3566 KOps/s $\color{#35bf28}+0.17\%$
test_compile_copy_flat[pytree-eager] 0.1908ms 50.6273μs 19.7522 KOps/s 19.7803 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_assign_and_add[tensordict-compile] 1.6005ms 0.3855ms 2.5943 KOps/s 2.2404 KOps/s $\textbf{\color{#35bf28}+15.80\%}$
test_compile_assign_and_add[tensordict-eager] 2.7965ms 2.5825ms 387.2158 Ops/s 379.6917 Ops/s $\color{#35bf28}+1.98\%$
test_compile_assign_and_add[pytree-compile] 1.5750ms 0.4291ms 2.3302 KOps/s 2.3499 KOps/s $\color{#d91a1a}-0.84\%$
test_compile_assign_and_add[pytree-eager] 2.7035ms 2.6046ms 383.9422 Ops/s 386.8415 Ops/s $\color{#d91a1a}-0.75\%$
test_compile_indexing[tensor-tensordict-compile] 0.1664ms 0.1159ms 8.6292 KOps/s 9.1939 KOps/s $\textbf{\color{#d91a1a}-6.14\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5551ms 80.4090μs 12.4364 KOps/s 13.1836 KOps/s $\textbf{\color{#d91a1a}-5.67\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1633ms 0.1067ms 9.3694 KOps/s 9.8955 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_compile_indexing[tensor-tensorclass-eager] 0.1296ms 69.4867μs 14.3912 KOps/s 15.2716 KOps/s $\textbf{\color{#d91a1a}-5.76\%}$
test_compile_indexing[tensor-pytree-compile] 0.1636ms 0.1084ms 9.2267 KOps/s 9.7992 KOps/s $\textbf{\color{#d91a1a}-5.84\%}$
test_compile_indexing[tensor-pytree-eager] 0.1847ms 67.1319μs 14.8961 KOps/s 15.2948 KOps/s $\color{#d91a1a}-2.61\%$
test_compile_indexing[slice-tensordict-compile] 0.1597ms 0.1001ms 9.9881 KOps/s 10.1860 KOps/s $\color{#d91a1a}-1.94\%$
test_compile_indexing[slice-tensordict-eager] 0.1418ms 17.3407μs 57.6679 KOps/s 58.3530 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_indexing[slice-tensorclass-compile] 0.1454ms 95.9372μs 10.4235 KOps/s 10.5737 KOps/s $\color{#d91a1a}-1.42\%$
test_compile_indexing[slice-tensorclass-eager] 56.3730μs 15.8989μs 62.8974 KOps/s 63.7575 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_indexing[slice-pytree-compile] 0.1352ms 95.5633μs 10.4643 KOps/s 10.5376 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_indexing[slice-pytree-eager] 49.2830μs 15.7897μs 63.3322 KOps/s 64.6620 KOps/s $\color{#d91a1a}-2.06\%$
test_compile_indexing[int-tensordict-compile] 0.1416ms 0.1010ms 9.9002 KOps/s 10.0684 KOps/s $\color{#d91a1a}-1.67\%$
test_compile_indexing[int-tensordict-eager] 0.5765ms 16.7003μs 59.8792 KOps/s 58.9535 KOps/s $\color{#35bf28}+1.57\%$
test_compile_indexing[int-tensorclass-compile] 0.1571ms 95.5840μs 10.4620 KOps/s 10.5284 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_indexing[int-tensorclass-eager] 63.2030μs 15.8193μs 63.2141 KOps/s 64.2480 KOps/s $\color{#d91a1a}-1.61\%$
test_compile_indexing[int-pytree-compile] 0.1351ms 93.8926μs 10.6505 KOps/s 10.5200 KOps/s $\color{#35bf28}+1.24\%$
test_compile_indexing[int-pytree-eager] 49.0530μs 15.7693μs 63.4145 KOps/s 63.9852 KOps/s $\color{#d91a1a}-0.89\%$
test_mod_add[eager] 0.1453ms 35.5214μs 28.1520 KOps/s 25.7210 KOps/s $\textbf{\color{#35bf28}+9.45\%}$
test_mod_add[compile] 0.1321ms 79.4973μs 12.5790 KOps/s 12.6481 KOps/s $\color{#d91a1a}-0.55\%$
test_mod_add[compile-overhead] 0.3257ms 0.1656ms 6.0399 KOps/s 5.6686 KOps/s $\textbf{\color{#35bf28}+6.55\%}$
test_mod_wrap[eager] 0.3126ms 0.2360ms 4.2375 KOps/s 4.0491 KOps/s $\color{#35bf28}+4.65\%$
test_mod_wrap[compile] 0.6765ms 0.2752ms 3.6342 KOps/s 3.6157 KOps/s $\color{#35bf28}+0.51\%$
test_mod_wrap[compile-overhead] 6.9577ms 3.6533ms 273.7225 Ops/s 263.4296 Ops/s $\color{#35bf28}+3.91\%$
test_mod_wrap_and_backward[eager] 1.7785ms 1.3383ms 747.2186 Ops/s 711.8402 Ops/s $\color{#35bf28}+4.97\%$
test_mod_wrap_and_backward[compile] 1.3172ms 1.2297ms 813.2384 Ops/s 789.2293 Ops/s $\color{#35bf28}+3.04\%$
test_mod_wrap_and_backward[compile-overhead] 1.3366ms 0.8989ms 1.1125 KOps/s 1.1009 KOps/s $\color{#35bf28}+1.05\%$
test_seq_add[eager] 0.1460ms 0.1098ms 9.1108 KOps/s 8.7351 KOps/s $\color{#35bf28}+4.30\%$
test_seq_add[compile] 0.1409ms 86.6636μs 11.5389 KOps/s 11.6046 KOps/s $\color{#d91a1a}-0.57\%$
test_seq_add[compile-overhead] 0.1820ms 0.1276ms 7.8392 KOps/s 7.7668 KOps/s $\color{#35bf28}+0.93\%$
test_seq_wrap[eager] 0.4791ms 0.3983ms 2.5107 KOps/s 2.4342 KOps/s $\color{#35bf28}+3.14\%$
test_seq_wrap[compile] 0.3584ms 0.2902ms 3.4453 KOps/s 3.4004 KOps/s $\color{#35bf28}+1.32\%$
test_seq_wrap[compile-overhead] 0.2687ms 0.2197ms 4.5515 KOps/s 4.5381 KOps/s $\color{#35bf28}+0.29\%$
test_func_call_runtime[False-eager] 0.7675ms 0.7020ms 1.4246 KOps/s 1.3456 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_func_call_runtime[False-compile] 0.9005ms 0.7183ms 1.3921 KOps/s 1.3920 KOps/s $+0.01\%$
test_func_call_runtime[False-compile-overhead] 0.3995ms 0.3503ms 2.8546 KOps/s 2.8633 KOps/s $\color{#d91a1a}-0.30\%$
test_func_call_runtime[True-eager] 0.9612ms 0.8586ms 1.1647 KOps/s 1.1434 KOps/s $\color{#35bf28}+1.87\%$
test_func_call_runtime[True-compile] 0.7887ms 0.7354ms 1.3598 KOps/s 1.3603 KOps/s $\color{#d91a1a}-0.04\%$
test_func_call_runtime[True-compile-overhead] 0.4307ms 0.3709ms 2.6964 KOps/s 2.7079 KOps/s $\color{#d91a1a}-0.42\%$
test_func_call_cm_runtime[False-eager] 0.7517ms 0.7010ms 1.4265 KOps/s 1.4252 KOps/s $\color{#35bf28}+0.09\%$
test_func_call_cm_runtime[False-compile] 0.7813ms 0.7209ms 1.3871 KOps/s 1.3861 KOps/s $\color{#35bf28}+0.07\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4019ms 0.3544ms 2.8216 KOps/s 2.8409 KOps/s $\color{#d91a1a}-0.68\%$
test_func_call_cm_runtime[True-eager] 1.0639ms 0.9702ms 1.0307 KOps/s 1.0169 KOps/s $\color{#35bf28}+1.36\%$
test_func_call_cm_runtime[True-compile] 0.8184ms 0.7685ms 1.3012 KOps/s 1.3016 KOps/s $\color{#d91a1a}-0.03\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4791ms 0.3956ms 2.5279 KOps/s 2.5188 KOps/s $\color{#35bf28}+0.36\%$
test_vmap_func_call_cm_runtime[eager] 2.4242ms 1.9867ms 503.3438 Ops/s 498.5650 Ops/s $\color{#35bf28}+0.96\%$
test_vmap_func_call_cm_runtime[compile] 0.8716ms 0.7982ms 1.2528 KOps/s 1.2748 KOps/s $\color{#d91a1a}-1.72\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4621ms 0.4011ms 2.4931 KOps/s 2.4959 KOps/s $\color{#d91a1a}-0.11\%$
test_distributed 2.8675ms 0.2493ms 4.0116 KOps/s 7.9003 KOps/s $\textbf{\color{#d91a1a}-49.22\%}$
test_tdmodule 0.2731ms 18.9941μs 52.6479 KOps/s 49.4999 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_tdmodule_dispatch 62.8130μs 33.2606μs 30.0656 KOps/s 27.5137 KOps/s $\textbf{\color{#35bf28}+9.28\%}$
test_tdseq 26.8510μs 19.3759μs 51.6104 KOps/s 46.9755 KOps/s $\textbf{\color{#35bf28}+9.87\%}$
test_tdseq_dispatch 64.9330μs 35.5195μs 28.1536 KOps/s 25.1208 KOps/s $\textbf{\color{#35bf28}+12.07\%}$
test_instantiation_functorch 1.6917ms 1.5088ms 662.7574 Ops/s 658.6560 Ops/s $\color{#35bf28}+0.62\%$
test_exec_functorch 0.2027ms 0.1427ms 7.0080 KOps/s 7.1140 KOps/s $\color{#d91a1a}-1.49\%$
test_exec_functional_call 0.1766ms 0.1320ms 7.5782 KOps/s 7.6413 KOps/s $\color{#d91a1a}-0.83\%$
test_exec_td_decorator 0.3721ms 0.1800ms 5.5565 KOps/s 5.5423 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed_decorator[True-True] 0.7982ms 0.6566ms 1.5229 KOps/s 1.4978 KOps/s $\color{#35bf28}+1.67\%$
test_vmap_mlp_speed_decorator[True-False] 0.7733ms 0.6596ms 1.5160 KOps/s 1.4884 KOps/s $\color{#35bf28}+1.85\%$
test_vmap_mlp_speed_decorator[False-True] 0.7653ms 0.5714ms 1.7501 KOps/s 1.6478 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7021ms 0.5759ms 1.7364 KOps/s 1.6488 KOps/s $\textbf{\color{#35bf28}+5.32\%}$
test_vmap_transformer_speed_decorator[True-True] 18.5657ms 18.4701ms 54.1415 Ops/s 53.8538 Ops/s $\color{#35bf28}+0.53\%$
test_vmap_transformer_speed_decorator[True-False] 18.9438ms 18.4736ms 54.1313 Ops/s 53.1730 Ops/s $\color{#35bf28}+1.80\%$
test_vmap_transformer_speed_decorator[False-True] 19.5586ms 18.3743ms 54.4237 Ops/s 54.1865 Ops/s $\color{#35bf28}+0.44\%$
test_vmap_transformer_speed_decorator[False-False] 19.2815ms 18.4060ms 54.3300 Ops/s 53.9413 Ops/s $\color{#35bf28}+0.72\%$
test_to_module_speed[True] 1.3219ms 0.9630ms 1.0384 KOps/s 1.0168 KOps/s $\color{#35bf28}+2.12\%$
test_to_module_speed[False] 1.4717ms 0.9498ms 1.0529 KOps/s 1.0251 KOps/s $\color{#35bf28}+2.71\%$
test_tc_init 84.6540μs 33.4077μs 29.9332 KOps/s 26.9109 KOps/s $\textbf{\color{#35bf28}+11.23\%}$
test_tc_init_nested 99.3760μs 65.3180μs 15.3097 KOps/s 13.3784 KOps/s $\textbf{\color{#35bf28}+14.44\%}$
test_tc_first_layer_tensor 6.3161μs 0.6870μs 1.4556 MOps/s 1.2415 MOps/s $\textbf{\color{#35bf28}+17.24\%}$
test_tc_first_layer_nontensor 23.6810μs 2.2373μs 446.9742 KOps/s 443.8622 KOps/s $\color{#35bf28}+0.70\%$
test_tc_second_layer_tensor 63.2467μs 1.3990μs 714.8042 KOps/s 689.0311 KOps/s $\color{#35bf28}+3.74\%$
test_tc_second_layer_nontensor 47.4930μs 2.9217μs 342.2635 KOps/s 330.2787 KOps/s $\color{#35bf28}+3.63\%$
test_unbind 0.2338s 10.1977ms 98.0615 Ops/s 142.8387 Ops/s $\textbf{\color{#d91a1a}-31.35\%}$
test_full_like 10.1944ms 9.1677ms 109.0790 Ops/s 109.1917 Ops/s $\color{#d91a1a}-0.10\%$
test_zeros_like 5.2727ms 4.3256ms 231.1814 Ops/s 231.7920 Ops/s $\color{#d91a1a}-0.26\%$
test_ones_like 4.9001ms 4.3278ms 231.0653 Ops/s 231.7467 Ops/s $\color{#d91a1a}-0.29\%$
test_clone 6.7279ms 6.3067ms 158.5625 Ops/s 158.1899 Ops/s $\color{#35bf28}+0.24\%$
test_squeeze 57.3030μs 9.1812μs 108.9182 KOps/s 108.1743 KOps/s $\color{#35bf28}+0.69\%$
test_unsqueeze 0.1462ms 72.8202μs 13.7325 KOps/s 14.1147 KOps/s $\color{#d91a1a}-2.71\%$
test_split 0.5813ms 0.1664ms 6.0086 KOps/s 6.2868 KOps/s $\color{#d91a1a}-4.42\%$
test_permute 0.2464ms 0.1682ms 5.9440 KOps/s 5.5992 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_stack 50.9293ms 50.2353ms 19.9063 Ops/s 19.9300 Ops/s $\color{#d91a1a}-0.12\%$
test_cat 50.6232ms 50.1617ms 19.9355 Ops/s 19.9800 Ops/s $\color{#d91a1a}-0.22\%$
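
The Change column is not defined on this page. A minimal sketch, assuming it is the relative difference between "Ops" and "Ops on Repo HEAD" when both are in the same unit: the helper `relative_change` below is hypothetical (it is not part of the benchmark tooling), and the sample values are copied from three rows of the table above. Under that assumption, the computed percentages agree with the reported ones for these rows.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of `ops` relative to `ops_head` (assumed formula)."""
    return (ops / ops_head - 1.0) * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the table; units match within each row.
rows = {
    "test_update": (66.6586, 57.0089),        # KOps/s, reported +16.93%
    "test_distributed": (4.0116, 7.9003),     # KOps/s, reported -49.22%
    "test_unbind": (98.0615, 142.8387),       # Ops/s,  reported -31.35%
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops, ops_head):+.2f}%")
```

Because the figure is a ratio of throughputs, a negative value means fewer operations per second than on repo HEAD, that is, a slowdown.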

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 16, 2025
ghstack-source-id: 83e79abda6a4bb6839d99240052323380981855c
Pull Request resolved: #1186
@vmoens vmoens merged commit 8a7c4bc into gh/vmoens/45/base Jan 20, 2025
45 of 52 checks passed
vmoens added a commit that referenced this pull request Jan 20, 2025
ghstack-source-id: 83e79abda6a4bb6839d99240052323380981855c
Pull Request resolved: #1186
@vmoens vmoens deleted the gh/vmoens/45/head branch January 20, 2025 13:38