[Refactor] Improve functional call efficiency #567

Merged
merged 8 commits into from
Nov 23, 2023

@vmoens (Contributor) commented Nov 22, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Nov 22, 2023
@vmoens marked this pull request as ready for review November 22, 2023 12:54
@vmoens added the Refactor label Nov 22, 2023

github-actions bot commented Nov 22, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}9$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 49.3920μs | 16.3499μs | 61.1625 KOps/s | 60.8033 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_plain_set_stack_nested | 0.2911ms | 0.1485ms | 6.7335 KOps/s | 6.7729 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_plain_set_nested_inplace | 69.2480μs | 19.3836μs | 51.5900 KOps/s | 51.4667 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_plain_set_stack_nested_inplace | 0.4063ms | 0.1757ms | 5.6900 KOps/s | 5.7172 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_items | 0.1627ms | 2.6808μs | 373.0212 KOps/s | 405.0031 KOps/s | $\textbf{\color{#d91a1a}-7.90\%}$ |
| test_items_nested | 1.3161ms | 0.2731ms | 3.6617 KOps/s | 3.6793 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_items_nested_locked | 0.3928ms | 0.2697ms | 3.7077 KOps/s | 3.7163 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_items_nested_leaf | 0.8663ms | 0.1736ms | 5.7593 KOps/s | 6.0094 KOps/s | $\color{#d91a1a}-4.16\%$ |
| test_items_stack_nested | 2.6182ms | 1.5238ms | 656.2486 Ops/s | 684.7777 Ops/s | $\color{#d91a1a}-4.17\%$ |
| test_items_stack_nested_leaf | 2.5099ms | 1.4151ms | 706.6753 Ops/s | 754.9729 Ops/s | $\textbf{\color{#d91a1a}-6.40\%}$ |
| test_items_stack_nested_locked | 1.1996ms | 0.7626ms | 1.3113 KOps/s | 1.2988 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_keys | 41.6180μs | 4.8424μs | 206.5080 KOps/s | 255.3977 KOps/s | $\textbf{\color{#d91a1a}-19.14\%}$ |
| test_keys_nested | 1.5238ms | 0.1509ms | 6.6252 KOps/s | 6.7517 KOps/s | $\color{#d91a1a}-1.87\%$ |
| test_keys_nested_locked | 0.2670ms | 0.1411ms | 7.0877 KOps/s | 7.0900 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_keys_nested_leaf | 0.3867ms | 0.1407ms | 7.1048 KOps/s | 7.2393 KOps/s | $\color{#d91a1a}-1.86\%$ |
| test_keys_stack_nested | 1.8532ms | 1.4088ms | 709.8094 Ops/s | 727.3771 Ops/s | $\color{#d91a1a}-2.42\%$ |
| test_keys_stack_nested_leaf | 1.5072ms | 1.4035ms | 712.4796 Ops/s | 723.6928 Ops/s | $\color{#d91a1a}-1.55\%$ |
| test_keys_stack_nested_locked | 1.3501ms | 0.6610ms | 1.5128 KOps/s | 1.4748 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_values | 21.9283μs | 1.1628μs | 859.9575 KOps/s | 859.4670 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_values_nested | 95.8180μs | 50.2259μs | 19.9100 KOps/s | 20.2167 KOps/s | $\color{#d91a1a}-1.52\%$ |
| test_values_nested_locked | 97.2900μs | 49.9500μs | 20.0200 KOps/s | 20.0410 KOps/s | $\color{#d91a1a}-0.10\%$ |
| test_values_nested_leaf | 75.4600μs | 44.4162μs | 22.5143 KOps/s | 22.3275 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_values_stack_nested | 1.3409ms | 1.2076ms | 828.0931 Ops/s | 851.6675 Ops/s | $\color{#d91a1a}-2.77\%$ |
| test_values_stack_nested_leaf | 1.4246ms | 1.1900ms | 840.3581 Ops/s | 859.6559 Ops/s | $\color{#d91a1a}-2.24\%$ |
| test_values_stack_nested_locked | 0.6945ms | 0.5060ms | 1.9765 KOps/s | 1.9328 KOps/s | $\color{#35bf28}+2.26\%$ |
| test_membership | 22.2610μs | 1.3507μs | 740.3711 KOps/s | 726.3895 KOps/s | $\color{#35bf28}+1.92\%$ |
| test_membership_nested | 42.8900μs | 2.8028μs | 356.7881 KOps/s | 355.1947 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_membership_nested_leaf | 20.4080μs | 2.7785μs | 359.9057 KOps/s | 353.0000 KOps/s | $\color{#35bf28}+1.96\%$ |
| test_membership_stacked_nested | 56.5750μs | 12.1121μs | 82.5619 KOps/s | 84.1272 KOps/s | $\color{#d91a1a}-1.86\%$ |
| test_membership_stacked_nested_leaf | 0.1187ms | 12.9604μs | 77.1581 KOps/s | 84.8124 KOps/s | $\textbf{\color{#d91a1a}-9.02\%}$ |
| test_membership_nested_last | 49.1010μs | 5.9223μs | 168.8527 KOps/s | 168.9552 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_membership_nested_leaf_last | 32.5100μs | 5.8923μs | 169.7128 KOps/s | 167.6926 KOps/s | $\color{#35bf28}+1.20\%$ |
| test_membership_stacked_nested_last | 0.2853ms | 0.1695ms | 5.9002 KOps/s | 5.8767 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_membership_stacked_nested_leaf_last | 65.5690μs | 14.1912μs | 70.4660 KOps/s | 71.9069 KOps/s | $\color{#d91a1a}-2.00\%$ |
| test_nested_getleaf | 30.8170μs | 10.8293μs | 92.3425 KOps/s | 93.9048 KOps/s | $\color{#d91a1a}-1.66\%$ |
| test_nested_get | 34.1130μs | 10.2198μs | 97.8494 KOps/s | 98.7061 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_stacked_getleaf | 1.1213ms | 0.6495ms | 1.5396 KOps/s | 1.6099 KOps/s | $\color{#d91a1a}-4.37\%$ |
| test_stacked_get | 1.2159ms | 0.6131ms | 1.6310 KOps/s | 1.6942 KOps/s | $\color{#d91a1a}-3.73\%$ |
| test_nested_getitemleaf | 0.1270ms | 11.2172μs | 89.1491 KOps/s | 93.1218 KOps/s | $\color{#d91a1a}-4.27\%$ |
| test_nested_getitem | 42.8290μs | 10.0364μs | 99.6374 KOps/s | 98.1196 KOps/s | $\color{#35bf28}+1.55\%$ |
| test_stacked_getitemleaf | 1.2130ms | 0.6500ms | 1.5385 KOps/s | 1.5472 KOps/s | $\color{#d91a1a}-0.56\%$ |
| test_stacked_getitem | 1.0322ms | 0.6122ms | 1.6333 KOps/s | 1.6941 KOps/s | $\color{#d91a1a}-3.59\%$ |
| test_lock_nested | 61.6082ms | 0.5543ms | 1.8041 KOps/s | 2.0121 KOps/s | $\textbf{\color{#d91a1a}-10.34\%}$ |
| test_lock_stack_nested | 91.3765ms | 8.9422ms | 111.8289 Ops/s | 117.1341 Ops/s | $\color{#d91a1a}-4.53\%$ |
| test_unlock_nested | 71.0665ms | 0.5176ms | 1.9321 KOps/s | 1.9421 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_unlock_stack_nested | 79.0098ms | 8.4626ms | 118.1669 Ops/s | 202.9720 Ops/s | $\textbf{\color{#d91a1a}-41.78\%}$ |
| test_flatten_speed | 0.7222ms | 0.2705ms | 3.6973 KOps/s | 3.6769 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_unflatten_speed | 1.2667ms | 0.4728ms | 2.1149 KOps/s | 2.1077 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_common_ops | 1.4755ms | 0.6937ms | 1.4416 KOps/s | 1.4068 KOps/s | $\color{#35bf28}+2.47\%$ |
| test_creation | 26.2480μs | 2.4419μs | 409.5184 KOps/s | 405.5477 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_creation_empty | 38.2920μs | 8.5579μs | 116.8509 KOps/s | 112.0649 KOps/s | $\color{#35bf28}+4.27\%$ |
| test_creation_nested_1 | 25.9380μs | 12.1243μs | 82.4792 KOps/s | 77.3122 KOps/s | $\textbf{\color{#35bf28}+6.68\%}$ |
| test_creation_nested_2 | 65.8220μs | 15.3840μs | 65.0024 KOps/s | 61.4068 KOps/s | $\textbf{\color{#35bf28}+5.86\%}$ |
| test_clone | 0.2143ms | 13.4427μs | 74.3899 KOps/s | 76.0741 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_getitem[int] | 58.8100μs | 12.6932μs | 78.7822 KOps/s | 77.9121 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_getitem[slice_int] | 69.6500μs | 24.6366μs | 40.5901 KOps/s | 40.6617 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_getitem[range] | 96.6090μs | 43.1414μs | 23.1796 KOps/s | 22.5144 KOps/s | $\color{#35bf28}+2.95\%$ |
| test_getitem[tuple] | 53.2680μs | 20.4100μs | 48.9957 KOps/s | 49.8519 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_getitem[list] | 80.3980μs | 38.7185μs | 25.8275 KOps/s | 24.9051 KOps/s | $\color{#35bf28}+3.70\%$ |
| test_setitem_dim[int] | 47.3470μs | 27.4519μs | 36.4274 KOps/s | 36.0508 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_setitem_dim[slice_int] | 84.3760μs | 51.8672μs | 19.2800 KOps/s | 19.3474 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_setitem_dim[range] | 0.1212ms | 72.0116μs | 13.8867 KOps/s | 13.7889 KOps/s | $\color{#35bf28}+0.71\%$ |
| test_setitem_dim[tuple] | 0.1135ms | 40.9736μs | 24.4060 KOps/s | 24.3050 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_setitem | 0.2502ms | 18.7232μs | 53.4097 KOps/s | 53.4992 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_set | 0.2319ms | 18.1232μs | 55.1778 KOps/s | 55.1898 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_set_shared | 3.3573ms | 0.1408ms | 7.1033 KOps/s | 7.2263 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_update | 0.2095ms | 24.4040μs | 40.9769 KOps/s | 41.4602 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_update_nested | 0.3807ms | 36.6963μs | 27.2507 KOps/s | 27.8001 KOps/s | $\color{#d91a1a}-1.98\%$ |
| test_set_nested | 0.2146ms | 19.8791μs | 50.3041 KOps/s | 50.0880 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_set_nested_new | 0.2211ms | 25.3880μs | 39.3887 KOps/s | 37.9763 KOps/s | $\color{#35bf28}+3.72\%$ |
| test_select | 98.5830μs | 49.7152μs | 20.1146 KOps/s | 19.2303 KOps/s | $\color{#35bf28}+4.60\%$ |
| test_unbind_speed | 0.4633ms | 0.3685ms | 2.7138 KOps/s | 2.6979 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_unbind_speed_stack0 | 69.7567ms | 5.6275ms | 177.6997 Ops/s | 175.6044 Ops/s | $\color{#35bf28}+1.19\%$ |
| test_unbind_speed_stack1 | 1.6030μs | 0.6445μs | 1.5517 MOps/s | 1.5998 MOps/s | $\color{#d91a1a}-3.01\%$ |
| test_split | 1.7025ms | 1.6314ms | 612.9627 Ops/s | 611.4281 Ops/s | $\color{#35bf28}+0.25\%$ |
| test_chunk | 62.3735ms | 1.7439ms | 573.4190 Ops/s | 569.1813 Ops/s | $\color{#35bf28}+0.74\%$ |
| test_creation[device0] | 3.6752ms | 0.3027ms | 3.3034 KOps/s | 3.1244 KOps/s | $\textbf{\color{#35bf28}+5.73\%}$ |
| test_creation_from_tensor | 60.0502ms | 0.3677ms | 2.7198 KOps/s | 2.6764 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_add_one[memmap_tensor0] | 70.4610μs | 24.4884μs | 40.8357 KOps/s | 39.8896 KOps/s | $\color{#35bf28}+2.37\%$ |
| test_contiguous[memmap_tensor0] | 33.1810μs | 5.6485μs | 177.0374 KOps/s | 175.4126 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_stack[memmap_tensor0] | 67.1740μs | 18.6293μs | 53.6789 KOps/s | 51.6743 KOps/s | $\color{#35bf28}+3.88\%$ |
| test_memmaptd_index | 0.4305ms | 0.1879ms | 5.3210 KOps/s | 5.2488 KOps/s | $\color{#35bf28}+1.38\%$ |
| test_memmaptd_index_astensor | 0.4081ms | 0.2499ms | 4.0015 KOps/s | 3.9370 KOps/s | $\color{#35bf28}+1.64\%$ |
| test_memmaptd_index_op | 0.9349ms | 0.4864ms | 2.0560 KOps/s | 2.0120 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_reshape_pytree | 52.0570μs | 23.2417μs | 43.0260 KOps/s | 42.9671 KOps/s | $\color{#35bf28}+0.14\%$ |
| test_reshape_td | 73.2360μs | 31.2186μs | 32.0322 KOps/s | 30.9588 KOps/s | $\color{#35bf28}+3.47\%$ |
| test_view_pytree | 52.5280μs | 23.3549μs | 42.8176 KOps/s | 43.4072 KOps/s | $\color{#d91a1a}-1.36\%$ |
| test_view_td | 22.6920μs | 4.8195μs | 207.4895 KOps/s | 206.8551 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_unbind_pytree | 0.2067ms | 28.6962μs | 34.8478 KOps/s | 38.0687 KOps/s | $\textbf{\color{#d91a1a}-8.46\%}$ |
| test_unbind_td | 0.1092ms | 58.5145μs | 17.0898 KOps/s | 16.8906 KOps/s | $\color{#35bf28}+1.18\%$ |
| test_split_pytree | 56.4650μs | 26.3504μs | 37.9501 KOps/s | 37.6899 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_split_td | 0.1323ms | 45.5219μs | 21.9674 KOps/s | 21.5815 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_add_pytree | 79.1470μs | 31.7549μs | 31.4912 KOps/s | 31.4766 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_add_td | 0.1422ms | 44.1417μs | 22.6543 KOps/s | 21.8671 KOps/s | $\color{#35bf28}+3.60\%$ |
| test_distributed | 17.8530μs | 6.3633μs | 157.1500 KOps/s | 168.0954 KOps/s | $\textbf{\color{#d91a1a}-6.51\%}$ |
| test_tdmodule | 0.1066ms | 21.1744μs | 47.2267 KOps/s | 46.4830 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_tdmodule_dispatch | 0.1800ms | 39.0708μs | 25.5946 KOps/s | 25.2814 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_tdseq | 50.1830μs | 24.5409μs | 40.7483 KOps/s | 40.9599 KOps/s | $\color{#d91a1a}-0.52\%$ |
| test_tdseq_dispatch | 0.1347ms | 43.4939μs | 22.9917 KOps/s | 22.7276 KOps/s | $\color{#35bf28}+1.16\%$ |
| test_instantiation_functorch | 1.3813ms | 1.2806ms | 780.9079 Ops/s | 777.5176 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_instantiation_td | 1.4325ms | 0.9992ms | 1.0008 KOps/s | 995.5540 Ops/s | $\color{#35bf28}+0.53\%$ |
| test_exec_functorch | 0.3485ms | 0.1617ms | 6.1855 KOps/s | 6.3716 KOps/s | $\color{#d91a1a}-2.92\%$ |
| test_exec_functional_call | 0.2255ms | 0.1439ms | 6.9490 KOps/s | 6.8448 KOps/s | $\color{#35bf28}+1.52\%$ |
| test_exec_td | 0.4636ms | 0.1435ms | 6.9669 KOps/s | 7.0820 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_exec_td_decorator | 1.0311ms | 0.2171ms | 4.6054 KOps/s | 4.0619 KOps/s | $\textbf{\color{#35bf28}+13.38\%}$ |
| test_vmap_mlp_speed[True-True] | 1.2534ms | 0.8935ms | 1.1192 KOps/s | 1.1242 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_vmap_mlp_speed[True-False] | 1.3172ms | 0.4848ms | 2.0628 KOps/s | 2.1430 KOps/s | $\color{#d91a1a}-3.75\%$ |
| test_vmap_mlp_speed[False-True] | 1.2179ms | 0.7777ms | 1.2859 KOps/s | 1.2859 KOps/s | $+0.00\%$ |
| test_vmap_mlp_speed[False-False] | 1.0457ms | 0.4056ms | 2.4656 KOps/s | 2.5991 KOps/s | $\textbf{\color{#d91a1a}-5.14\%}$ |
| test_vmap_mlp_speed_decorator[True-True] | 2.2594ms | 1.5668ms | 638.2491 Ops/s | 540.0427 Ops/s | $\textbf{\color{#35bf28}+18.18\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.9945ms | 0.5492ms | 1.8208 KOps/s | 1.7247 KOps/s | $\textbf{\color{#35bf28}+5.57\%}$ |
| test_vmap_mlp_speed_decorator[False-True] | 1.8058ms | 1.3554ms | 737.7642 Ops/s | 619.8881 Ops/s | $\textbf{\color{#35bf28}+19.02\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.9009ms | 0.4251ms | 2.3526 KOps/s | 2.1978 KOps/s | $\textbf{\color{#35bf28}+7.05\%}$ |
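The comment does not say how the derived columns are computed, but the numbers are consistent with Ops being the reciprocal of Mean and Change being the relative difference between Ops and Ops on Repo HEAD. For test_plain_set_nested, 1 / 16.3499μs ≈ 61.1625 KOps/s and (61.1625 - 60.8033) / 60.8033 ≈ +0.59%. The Python sketch below recomputes both columns under that inferred assumption; the function names are hypothetical and not part of the benchmark tooling.

```python
# Sketch: recompute the derived columns of one benchmark row.
# Assumptions (inferred from the numbers, not stated in the PR):
#   Ops    = 1 / Mean
#   Change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, in percent

TIME_UNITS = {"μs": 1e-6, "ms": 1e-3, "s": 1.0}
RATE_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_time(text: str) -> float:
    """Convert a duration such as '16.3499μs' to seconds."""
    # Check the longer suffixes first, because "ms" and "μs" also end in "s".
    for suffix in ("μs", "ms", "s"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * TIME_UNITS[suffix]
    raise ValueError(f"unknown time unit in {text!r}")


def parse_rate(text: str) -> float:
    """Convert a throughput such as '61.1625 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * RATE_UNITS[unit]


def ops_from_mean(mean: str) -> float:
    """Throughput in operations per second implied by a mean iteration time."""
    return 1.0 / parse_time(mean)


def percent_change(ops: str, ops_head: str) -> float:
    """Relative throughput change of this PR versus repo HEAD, in percent."""
    new, old = parse_rate(ops), parse_rate(ops_head)
    return (new - old) / old * 100.0


# Row copied from the CPU table: test_plain_set_nested
mean, ops, ops_head = "16.3499μs", "61.1625 KOps/s", "60.8033 KOps/s"
print(f"Ops ~ {ops_from_mean(mean) / 1e3:.4f} KOps/s")
print(f"Change ~ {percent_change(ops, ops_head):+.2f}%")
```

Because both throughputs are normalized to operations per second before being compared, rows whose two Ops values are shown in different units (for example test_instantiation_td, 1.0008 KOps/s vs 995.5540 Ops/s) also come out right (+0.53%). Small mismatches in the last digit of Ops are expected, since the displayed Mean is itself rounded.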


github-actions bot commented Nov 22, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 0.5600ms | 12.4435μs | 80.3634 KOps/s | 78.8103 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_plain_set_stack_nested | 0.1368ms | 0.1140ms | 8.7732 KOps/s | 8.2374 KOps/s | $\textbf{\color{#35bf28}+6.51\%}$ |
| test_plain_set_nested_inplace | 31.9910μs | 14.9392μs | 66.9382 KOps/s | 66.7472 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_plain_set_stack_nested_inplace | 0.1737ms | 0.1398ms | 7.1525 KOps/s | 7.1543 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_items | 27.1900μs | 4.7468μs | 210.6672 KOps/s | 211.1098 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_items_nested | 0.3738ms | 0.3383ms | 2.9557 KOps/s | 2.9742 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_items_nested_locked | 0.3587ms | 0.3368ms | 2.9691 KOps/s | 2.9601 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_items_nested_leaf | 0.2221ms | 0.1988ms | 5.0293 KOps/s | 5.0367 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_items_stack_nested | 1.5322ms | 1.4929ms | 669.8484 Ops/s | 687.0648 Ops/s | $\color{#d91a1a}-2.51\%$ |
| test_items_stack_nested_leaf | 1.3610ms | 1.3096ms | 763.5654 Ops/s | 779.9794 Ops/s | $\color{#d91a1a}-2.10\%$ |
| test_items_stack_nested_locked | 0.8597ms | 0.8185ms | 1.2217 KOps/s | 1.2188 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_keys | 25.4400μs | 4.7912μs | 208.7149 KOps/s | 216.1124 KOps/s | $\color{#d91a1a}-3.42\%$ |
| test_keys_nested | 1.2051ms | 89.8260μs | 11.1326 KOps/s | 11.0851 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_keys_nested_locked | 0.1201ms | 89.2809μs | 11.2006 KOps/s | 11.1176 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_keys_nested_leaf | 42.3150ms | 85.9087μs | 11.6403 KOps/s | 12.2449 KOps/s | $\color{#d91a1a}-4.94\%$ |
| test_keys_stack_nested | 1.3565ms | 1.3065ms | 765.4025 Ops/s | 786.6514 Ops/s | $\color{#d91a1a}-2.70\%$ |
| test_keys_stack_nested_leaf | 1.3327ms | 1.3013ms | 768.4577 Ops/s | 783.5534 Ops/s | $\color{#d91a1a}-1.93\%$ |
| test_keys_stack_nested_locked | 0.6776ms | 0.6233ms | 1.6043 KOps/s | 1.6142 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_values | 9.2600μs | 1.9002μs | 526.2615 KOps/s | 524.0092 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_values_nested | 73.3410μs | 43.0276μs | 23.2409 KOps/s | 23.1851 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_values_nested_locked | 66.2310μs | 43.1296μs | 23.1859 KOps/s | 22.9951 KOps/s | $\color{#35bf28}+0.83\%$ |
| test_values_nested_leaf | 64.2710μs | 37.2400μs | 26.8528 KOps/s | 26.6395 KOps/s | $\color{#35bf28}+0.80\%$ |
| test_values_stack_nested | 1.1891ms | 1.1482ms | 870.9081 Ops/s | 903.7586 Ops/s | $\color{#d91a1a}-3.63\%$ |
| test_values_stack_nested_leaf | 1.1615ms | 1.1255ms | 888.4834 Ops/s | 906.3536 Ops/s | $\color{#d91a1a}-1.97\%$ |
| test_values_stack_nested_locked | 0.5307ms | 0.4962ms | 2.0153 KOps/s | 2.0166 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_membership | 5.4380μs | 0.9341μs | 1.0705 MOps/s | 1.0505 MOps/s | $\color{#35bf28}+1.90\%$ |
| test_membership_nested | 13.4455μs | 2.1409μs | 467.0866 KOps/s | 472.6314 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_membership_nested_leaf | 16.8500μs | 2.1365μs | 468.0498 KOps/s | 472.9976 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_membership_stacked_nested | 45.8510μs | 10.8318μs | 92.3208 KOps/s | 92.2587 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_membership_stacked_nested_leaf | 33.1900μs | 10.9361μs | 91.4400 KOps/s | 92.1949 KOps/s | $\color{#d91a1a}-0.82\%$ |
| test_membership_nested_last | 23.4800μs | 4.6464μs | 215.2195 KOps/s | 215.9088 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_membership_nested_leaf_last | 43.5610μs | 4.6473μs | 215.1765 KOps/s | 216.8417 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_membership_stacked_nested_last | 0.1699ms | 0.1348ms | 7.4157 KOps/s | 7.5515 KOps/s | $\color{#d91a1a}-1.80\%$ |
| test_membership_stacked_nested_leaf_last | 43.6600μs | 12.6286μs | 79.1851 KOps/s | 78.0133 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_nested_getleaf | 28.9900μs | 8.4037μs | 118.9951 KOps/s | 119.2803 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_nested_get | 33.1700μs | 7.9474μs | 125.8273 KOps/s | 125.7483 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_stacked_getleaf | 0.6280ms | 0.5681ms | 1.7602 KOps/s | 1.8373 KOps/s | $\color{#d91a1a}-4.20\%$ |
| test_stacked_get | 0.5863ms | 0.5419ms | 1.8453 KOps/s | 1.9479 KOps/s | $\textbf{\color{#d91a1a}-5.27\%}$ |
| test_nested_getitemleaf | 30.0010μs | 8.4310μs | 118.6103 KOps/s | 117.9235 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_nested_getitem | 32.2210μs | 7.9841μs | 125.2487 KOps/s | 125.0657 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_stacked_getitemleaf | 0.6516ms | 0.5762ms | 1.7354 KOps/s | 1.8302 KOps/s | $\textbf{\color{#d91a1a}-5.18\%}$ |
| test_stacked_getitem | 0.5665ms | 0.5385ms | 1.8569 KOps/s | 1.9454 KOps/s | $\color{#d91a1a}-4.55\%$ |
| test_lock_nested | 4.3898ms | 0.4591ms | 2.1780 KOps/s | 2.1683 KOps/s | $\color{#35bf28}+0.44\%$ |
| test_lock_stack_nested | 70.2399ms | 6.6020ms | 151.4682 Ops/s | 149.0229 Ops/s | $\color{#35bf28}+1.64\%$ |
| test_unlock_nested | 1.2830ms | 0.4360ms | 2.2937 KOps/s | 2.0005 KOps/s | $\textbf{\color{#35bf28}+14.66\%}$ |
| test_unlock_stack_nested | 67.9921ms | 7.3462ms | 136.1256 Ops/s | 135.6508 Ops/s | $\color{#35bf28}+0.35\%$ |
| test_flatten_speed | 0.5160ms | 0.1860ms | 5.3753 KOps/s | 5.2410 KOps/s | $\color{#35bf28}+2.56\%$ |
| test_unflatten_speed | 0.4136ms | 0.3617ms | 2.7649 KOps/s | 2.6809 KOps/s | $\color{#35bf28}+3.13\%$ |
| test_common_ops | 1.0144ms | 0.6037ms | 1.6563 KOps/s | 1.5813 KOps/s | $\color{#35bf28}+4.75\%$ |
| test_creation | 18.8900μs | 1.9256μs | 519.3206 KOps/s | 526.2617 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_creation_empty | 21.3010μs | 6.5755μs | 152.0803 KOps/s | 144.4321 KOps/s | $\textbf{\color{#35bf28}+5.30\%}$ |
| test_creation_nested_1 | 24.0000μs | 8.9291μs | 111.9935 KOps/s | 104.8845 KOps/s | $\textbf{\color{#35bf28}+6.78\%}$ |
| test_creation_nested_2 | 30.4200μs | 11.4306μs | 87.4844 KOps/s | 82.6993 KOps/s | $\textbf{\color{#35bf28}+5.79\%}$ |
| test_clone | 31.0900μs | 14.3546μs | 69.6643 KOps/s | 70.2266 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_getitem[int] | 56.3310μs | 12.1110μs | 82.5697 KOps/s | 82.0592 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_getitem[slice_int] | 49.1710μs | 23.7430μs | 42.1177 KOps/s | 43.2061 KOps/s | $\color{#d91a1a}-2.52\%$ |
| test_getitem[range] | 63.8910μs | 40.2647μs | 24.8356 KOps/s | 25.3155 KOps/s | $\color{#d91a1a}-1.90\%$ |
| test_getitem[tuple] | 97.9810μs | 20.1114μs | 49.7231 KOps/s | 49.8081 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_getitem[list] | 0.3321ms | 37.2574μs | 26.8403 KOps/s | 27.0289 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_setitem_dim[int] | 51.1710μs | 26.5841μs | 37.6165 KOps/s | 38.2038 KOps/s | $\color{#d91a1a}-1.54\%$ |
| test_setitem_dim[slice_int] | 64.4110μs | 46.3562μs | 21.5721 KOps/s | 21.8220 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_setitem_dim[range] | 82.9820μs | 63.9276μs | 15.6427 KOps/s | 15.9622 KOps/s | $\color{#d91a1a}-2.00\%$ |
| test_setitem_dim[tuple] | 57.0300μs | 40.0798μs | 24.9502 KOps/s | 25.4376 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_setitem | 0.1348ms | 18.2273μs | 54.8627 KOps/s | 55.9126 KOps/s | $\color{#d91a1a}-1.88\%$ |
| test_set | 0.1130ms | 17.7122μs | 56.4584 KOps/s | 57.5634 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_set_shared | 3.2710ms | 0.1024ms | 9.7676 KOps/s | 9.2968 KOps/s | $\textbf{\color{#35bf28}+5.06\%}$ |
| test_update | 79.4810μs | 21.4295μs | 46.6646 KOps/s | 43.1235 KOps/s | $\textbf{\color{#35bf28}+8.21\%}$ |
| test_update_nested | 0.1344ms | 31.1642μs | 32.0881 KOps/s | 31.8821 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_set_nested | 0.1117ms | 18.7856μs | 53.2323 KOps/s | 52.8417 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_set_nested_new | 0.1297ms | 23.0662μs | 43.3536 KOps/s | 41.9758 KOps/s | $\color{#35bf28}+3.28\%$ |
| test_select | 69.4810μs | 45.4799μs | 21.9877 KOps/s | 20.8797 KOps/s | $\textbf{\color{#35bf28}+5.31\%}$ |
| test_to | 74.5910μs | 53.2820μs | 18.7681 KOps/s | 18.7561 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_to_nonblocking | 64.8010μs | 34.8789μs | 28.6706 KOps/s | 28.5955 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_unbind_speed | 0.3724ms | 0.3568ms | 2.8027 KOps/s | 2.8332 KOps/s | $\color{#d91a1a}-1.08\%$ |
| test_unbind_speed_stack0 | 66.4712ms | 5.2004ms | 192.2934 Ops/s | 192.3873 Ops/s | $\color{#d91a1a}-0.05\%$ |
| test_unbind_speed_stack1 | 3.4940μs | 0.5216μs | 1.9170 MOps/s | 1.8938 MOps/s | $\color{#35bf28}+1.23\%$ |
| test_split | 53.9143ms | 1.8135ms | 551.4224 Ops/s | 552.2271 Ops/s | $\color{#d91a1a}-0.15\%$ |
| test_chunk | 53.4272ms | 1.7995ms | 555.6988 Ops/s | 556.8926 Ops/s | $\color{#d91a1a}-0.21\%$ |
| test_creation[device0] | 0.4964ms | 0.3079ms | 3.2480 KOps/s | 3.2562 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_creation[device1] | 0.7881ms | 0.3114ms | 3.2111 KOps/s | 3.2239 KOps/s | $\color{#d91a1a}-0.40\%$ |
| test_creation_from_tensor | 0.6377ms | 0.3372ms | 2.9655 KOps/s | 2.7469 KOps/s | $\textbf{\color{#35bf28}+7.96\%}$ |
| test_add_one[memmap_tensor0] | 60.3610μs | 24.7806μs | 40.3541 KOps/s | 40.8849 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_add_one[memmap_tensor1] | 0.2126ms | 74.9794μs | 13.3370 KOps/s | 13.4492 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_contiguous[memmap_tensor0] | 22.7800μs | 6.1070μs | 163.7462 KOps/s | 169.7714 KOps/s | $\color{#d91a1a}-3.55\%$ |
| test_contiguous[memmap_tensor1] | 47.8400μs | 22.7212μs | 44.0118 KOps/s | 44.3417 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_stack[memmap_tensor0] | 50.0310μs | 20.2392μs | 49.4090 KOps/s | 49.4225 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_stack[memmap_tensor1] | 0.1516ms | 74.8658μs | 13.3572 KOps/s | 13.5224 KOps/s | $\color{#d91a1a}-1.22\%$ |
| test_memmaptd_index | 0.2687ms | 0.2243ms | 4.4578 KOps/s | 4.5213 KOps/s | $\color{#d91a1a}-1.40\%$ |
| test_memmaptd_index_astensor | 0.3170ms | 0.2781ms | 3.5953 KOps/s | 3.4951 KOps/s | $\color{#35bf28}+2.87\%$ |
| test_memmaptd_index_op | 0.5958ms | 0.5401ms | 1.8515 KOps/s | 1.8346 KOps/s | $\color{#35bf28}+0.92\%$ |
| test_reshape_pytree | 37.5410μs | 20.9622μs | 47.7050 KOps/s | 47.9799 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_reshape_td | 56.3910μs | 30.3328μs | 32.9676 KOps/s | 32.8760 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_view_pytree | 40.1910μs | 20.6539μs | 48.4170 KOps/s | 48.6524 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_view_td | 21.2210μs | 4.0757μs | 245.3572 KOps/s | 244.2347 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_unbind_pytree | 44.9410μs | 25.9160μs | 38.5862 KOps/s | 38.0345 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_unbind_td | 75.3020μs | 55.5101μs | 18.0147 KOps/s | 17.8492 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_split_pytree | 51.6300μs | 24.7577μs | 40.3915 KOps/s | 40.6124 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_split_td | 67.0910μs | 45.2003μs | 22.1238 KOps/s | 22.4545 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_add_pytree | 60.2510μs | 32.6394μs | 30.6378 KOps/s | 30.8746 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_add_td | 66.0310μs | 43.1110μs | 23.1960 KOps/s | 22.7016 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_distributed | 23.8800μs | 5.4491μs | 183.5165 KOps/s | 181.1962 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_tdmodule | 31.8910μs | 16.5140μs | 60.5546 KOps/s | 59.1035 KOps/s | $\color{#35bf28}+2.46\%$ |
| test_tdmodule_dispatch | 0.1308ms | 32.0093μs | 31.2409 KOps/s | 30.6072 KOps/s | $\color{#35bf28}+2.07\%$ |
| test_tdseq | 36.3200μs | 19.9486μs | 50.1288 KOps/s | 49.4903 KOps/s | $\color{#35bf28}+1.29\%$ |
| test_tdseq_dispatch | 0.1318ms | 35.1524μs | 28.4475 KOps/s | 27.7594 KOps/s | $\color{#35bf28}+2.48\%$ |
| test_instantiation_functorch | 2.0295ms | 1.6961ms | 589.5974 Ops/s | 597.7524 Ops/s | $\color{#d91a1a}-1.36\%$ |
| test_instantiation_td | 1.6707ms | 1.1903ms | 840.0944 Ops/s | 852.0547 Ops/s | $\color{#d91a1a}-1.40\%$ |
| test_exec_functorch | 0.2032ms | 0.1628ms | 6.1434 KOps/s | 6.2042 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_exec_functional_call | 0.2278ms | 0.1639ms | 6.0996 KOps/s | 6.2171 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_exec_td | 0.1965ms | 0.1545ms | 6.4714 KOps/s | 6.5071 KOps/s | $\color{#d91a1a}-0.55\%$ |
| test_exec_td_decorator | 0.7872ms | 0.2264ms | 4.4165 KOps/s | 3.9988 KOps/s | $\textbf{\color{#35bf28}+10.44\%}$ |
| test_vmap_mlp_speed[True-True] | 1.1511ms | 1.0792ms | 926.6249 Ops/s | 939.5392 Ops/s | $\color{#d91a1a}-1.37\%$ |
| test_vmap_mlp_speed[True-False] | 0.8070ms | 0.6171ms | 1.6204 KOps/s | 1.6241 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_vmap_mlp_speed[False-True] | 1.1154ms | 1.0402ms | 961.3480 Ops/s | 1.0270 KOps/s | $\textbf{\color{#d91a1a}-6.39\%}$ |
| test_vmap_mlp_speed[False-False] | 0.6150ms | 0.5460ms | 1.8316 KOps/s | 1.8320 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 2.6243ms | 1.7956ms | 556.9136 Ops/s | 482.6189 Ops/s | $\textbf{\color{#35bf28}+15.39\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.1706ms | 0.6905ms | 1.4482 KOps/s | 1.4070 KOps/s | $\color{#35bf28}+2.93\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.1119ms | 1.6157ms | 618.9403 Ops/s | 534.6432 Ops/s | $\textbf{\color{#35bf28}+15.77\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 1.0170ms | 0.5879ms | 1.7008 KOps/s | 1.6464 KOps/s | $\color{#35bf28}+3.30\%$ |
| test_vmap_transformer_speed[True-True] | 12.7199ms | 12.6224ms | 79.2242 Ops/s | 79.1637 Ops/s | $\color{#35bf28}+0.08\%$ |
| test_vmap_transformer_speed[True-False] | 8.7169ms | 8.2975ms | 120.5176 Ops/s | 120.7391 Ops/s | $\color{#d91a1a}-0.18\%$ |
| test_vmap_transformer_speed[False-True] | 12.6758ms | 12.5791ms | 79.4968 Ops/s | 80.1661 Ops/s | $\color{#d91a1a}-0.83\%$ |
| test_vmap_transformer_speed[False-False] | 8.3318ms | 8.2351ms | 121.4308 Ops/s | 121.7175 Ops/s | $\color{#d91a1a}-0.24\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 44.5421ms | 43.0429ms | 23.2326 Ops/s | 19.0166 Ops/s | $\textbf{\color{#35bf28}+22.17\%}$ |
| test_vmap_transformer_speed_decorator[True-False] | 0.1010s | 22.0845ms | 45.2805 Ops/s | 47.8972 Ops/s | $\textbf{\color{#d91a1a}-5.46\%}$ |
| test_vmap_transformer_speed_decorator[False-True] | 43.7874ms | 42.6218ms | 23.4622 Ops/s | 19.0503 Ops/s | $\textbf{\color{#35bf28}+23.16\%}$ |
| test_vmap_transformer_speed_decorator[False-False] | 0.1002s | 21.6044ms | 46.2869 Ops/s | 48.7028 Ops/s | $\color{#d91a1a}-4.96\%$ |
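Neither comment states how a benchmark is counted as improved or worsened. Counting the bold rows reproduces both totals (8 improved and 9 worsened on CPU, 14 and 4 on GPU), and every bold change is larger than 5% in magnitude while every non-bold change is smaller, so a ±5% cut-off is a plausible reading. The Python sketch below applies that inferred rule to a few GPU rows; THRESHOLD_PCT, classify and summarize are hypothetical names, not part of the benchmark bot.

```python
# Sketch: how the "Improved"/"Worsened" totals appear to be derived.
# Assumption (inferred from the tables, not documented in this PR):
# a benchmark counts as improved or worsened only when its change
# exceeds 5% in magnitude; those are the rows rendered in bold.

THRESHOLD_PCT = 5.0  # hypothetical name for the inferred cut-off


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a percent change relative to repo HEAD."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


def summarize(changes: dict[str, float]) -> dict[str, int]:
    """Count how many benchmarks fall into each category."""
    counts = {"improved": 0, "worsened": 0, "unchanged": 0}
    for change in changes.values():
        counts[classify(change)] += 1
    return counts


# A few GPU rows copied from the table above (change in percent).
gpu_rows = {
    "test_vmap_transformer_speed_decorator[True-True]": 22.17,
    "test_vmap_transformer_speed_decorator[True-False]": -5.46,
    "test_vmap_transformer_speed_decorator[False-True]": 23.16,
    "test_vmap_transformer_speed_decorator[False-False]": -4.96,
    "test_set_shared": 5.06,
    "test_keys_nested_leaf": -4.94,
}

print(summarize(gpu_rows))
```

Under this rule, test_vmap_transformer_speed_decorator[False-False] (-4.96%) and test_keys_nested_leaf (-4.94%) stay out of the worsened count, which matches their non-bold rendering in the table.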

@vmoens merged commit 57fc236 into main Nov 23, 2023
17 of 29 checks passed
@vmoens deleted the improve-func-td branch November 23, 2023 08:57
Labels: CLA Signed, Refactor
2 participants