[BugFix] Fix mem leak when locking #1188

Merged on Jan 21, 2025 (2 commits)

@vmoens vmoens (Contributor) commented Jan 21, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 21, 2025
ghstack-source-id: 7fb551a371fbd44a695005a9c8b0976dd061bcb4
Pull Request resolved: #1188
@facebook-github-bot facebook-github-bot added the CLA Signed label Jan 21, 2025
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 21, 2025
ghstack-source-id: d6e44e1d9b9afc9903a0f45945c10a94dcf5a0ca
Pull Request resolved: #1188
@vmoens vmoens added the bug label Jan 21, 2025
@vmoens vmoens linked an issue Jan 21, 2025 that may be closed by this pull request
3 tasks
@vmoens vmoens merged commit 734bcac into gh/vmoens/45/base Jan 21, 2025
21 of 26 checks passed
@vmoens vmoens deleted the gh/vmoens/45/head branch January 21, 2025 09:21

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}41$. Worsened: $\large\color{#d91a1a}23$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4007ms 11.6004μs 86.2043 KOps/s 75.5907 KOps/s $\textbf{\color{#35bf28}+14.04\%}$
test_plain_set_stack_nested 47.3710μs 11.6794μs 85.6211 KOps/s 74.7623 KOps/s $\textbf{\color{#35bf28}+14.52\%}$
test_plain_set_nested_inplace 0.4051ms 12.5074μs 79.9529 KOps/s 69.8456 KOps/s $\textbf{\color{#35bf28}+14.47\%}$
test_plain_set_stack_nested_inplace 34.3610μs 12.6287μs 79.1845 KOps/s 69.4316 KOps/s $\textbf{\color{#35bf28}+14.05\%}$
test_items 0.3871ms 3.0380μs 329.1686 KOps/s 341.1411 KOps/s $\color{#d91a1a}-3.51\%$
test_items_nested 0.7512ms 0.3582ms 2.7920 KOps/s 2.7623 KOps/s $\color{#35bf28}+1.07\%$
test_items_nested_locked 0.7547ms 0.3629ms 2.7556 KOps/s 2.7799 KOps/s $\color{#d91a1a}-0.87\%$
test_items_nested_leaf 0.1110ms 59.8400μs 16.7112 KOps/s 17.1530 KOps/s $\color{#d91a1a}-2.58\%$
test_items_stack_nested 0.7762ms 0.3612ms 2.7689 KOps/s 2.7748 KOps/s $\color{#d91a1a}-0.21\%$
test_items_stack_nested_leaf 0.4596ms 61.3879μs 16.2899 KOps/s 16.4970 KOps/s $\color{#d91a1a}-1.26\%$
test_items_stack_nested_locked 0.7531ms 0.3603ms 2.7757 KOps/s 2.7367 KOps/s $\color{#35bf28}+1.43\%$
test_keys 0.3981ms 3.5027μs 285.4909 KOps/s 285.7347 KOps/s $\color{#d91a1a}-0.09\%$
test_keys_nested 0.4844ms 90.2888μs 11.0756 KOps/s 11.1242 KOps/s $\color{#d91a1a}-0.44\%$
test_keys_nested_locked 0.7236ms 95.7704μs 10.4416 KOps/s 10.3903 KOps/s $\color{#35bf28}+0.49\%$
test_keys_nested_leaf 0.1091ms 80.2680μs 12.4583 KOps/s 12.4306 KOps/s $\color{#35bf28}+0.22\%$
test_keys_stack_nested 0.4889ms 91.8484μs 10.8875 KOps/s 11.0673 KOps/s $\color{#d91a1a}-1.62\%$
test_keys_stack_nested_leaf 0.4873ms 82.7098μs 12.0905 KOps/s 12.2298 KOps/s $\color{#d91a1a}-1.14\%$
test_keys_stack_nested_locked 0.5027ms 97.1946μs 10.2886 KOps/s 10.3802 KOps/s $\color{#d91a1a}-0.88\%$
test_values 67.3212μs 0.8670μs 1.1534 MOps/s 1.1495 MOps/s $\color{#35bf28}+0.34\%$
test_values_nested 0.4413ms 38.5244μs 25.9576 KOps/s 26.3171 KOps/s $\color{#d91a1a}-1.37\%$
test_values_nested_locked 0.4383ms 39.9293μs 25.0443 KOps/s 25.0943 KOps/s $\color{#d91a1a}-0.20\%$
test_values_nested_leaf 71.0810μs 42.5072μs 23.5254 KOps/s 23.6508 KOps/s $\color{#d91a1a}-0.53\%$
test_values_stack_nested 0.4409ms 39.2632μs 25.4692 KOps/s 25.7649 KOps/s $\color{#d91a1a}-1.15\%$
test_values_stack_nested_leaf 0.4631ms 43.1626μs 23.1682 KOps/s 23.2910 KOps/s $\color{#d91a1a}-0.53\%$
test_values_stack_nested_locked 0.4436ms 40.9115μs 24.4430 KOps/s 24.6781 KOps/s $\color{#d91a1a}-0.95\%$
test_membership 20.1368μs 0.5101μs 1.9604 MOps/s 1.9667 MOps/s $\color{#d91a1a}-0.32\%$
test_membership_nested 0.2026ms 2.0371μs 490.9030 KOps/s 486.0788 KOps/s $\color{#35bf28}+0.99\%$
test_membership_nested_leaf 0.2103ms 2.0458μs 488.8125 KOps/s 498.3231 KOps/s $\color{#d91a1a}-1.91\%$
test_membership_stacked_nested 40.5510μs 2.1199μs 471.7123 KOps/s 481.6687 KOps/s $\color{#d91a1a}-2.07\%$
test_membership_stacked_nested_leaf 24.9610μs 2.0909μs 478.2598 KOps/s 480.8882 KOps/s $\color{#d91a1a}-0.55\%$
test_membership_nested_last 0.4127ms 3.0797μs 324.7094 KOps/s 325.7793 KOps/s $\color{#d91a1a}-0.33\%$
test_membership_nested_leaf_last 25.7300μs 3.0868μs 323.9608 KOps/s 326.7797 KOps/s $\color{#d91a1a}-0.86\%$
test_membership_stacked_nested_last 0.4136ms 3.0979μs 322.8014 KOps/s 133.6521 KOps/s $\textbf{\color{#35bf28}+141.52\%}$
test_membership_stacked_nested_leaf_last 27.6900μs 3.0630μs 326.4759 KOps/s 134.5825 KOps/s $\textbf{\color{#35bf28}+142.58\%}$
test_nested_getleaf 0.4117ms 6.1747μs 161.9514 KOps/s 164.1918 KOps/s $\color{#d91a1a}-1.36\%$
test_nested_get 25.7500μs 5.7847μs 172.8702 KOps/s 172.3890 KOps/s $\color{#35bf28}+0.28\%$
test_stacked_getleaf 0.4109ms 6.1201μs 163.3948 KOps/s 161.5493 KOps/s $\color{#35bf28}+1.14\%$
test_stacked_get 40.1610μs 5.7802μs 173.0031 KOps/s 172.6725 KOps/s $\color{#35bf28}+0.19\%$
test_nested_getitemleaf 0.4075ms 6.4226μs 155.7013 KOps/s 156.6237 KOps/s $\color{#d91a1a}-0.59\%$
test_nested_getitem 28.9710μs 6.1246μs 163.2758 KOps/s 163.1197 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_getitemleaf 0.4207ms 6.3723μs 156.9300 KOps/s 156.0931 KOps/s $\color{#35bf28}+0.54\%$
test_stacked_getitem 28.9100μs 6.0679μs 164.8019 KOps/s 162.9010 KOps/s $\color{#35bf28}+1.17\%$
test_lock_nested 0.4267ms 0.3450ms 2.8988 KOps/s 2.6767 KOps/s $\textbf{\color{#35bf28}+8.30\%}$
test_lock_stack_nested 0.3962ms 0.3515ms 2.8451 KOps/s 2.9267 KOps/s $\color{#d91a1a}-2.79\%$
test_unlock_nested 0.4071ms 0.2894ms 3.4555 KOps/s 3.1890 KOps/s $\textbf{\color{#35bf28}+8.36\%}$
test_unlock_stack_nested 0.6904ms 0.2904ms 3.4438 KOps/s 3.5664 KOps/s $\color{#d91a1a}-3.44\%$
test_flatten_speed 0.4609ms 77.0830μs 12.9730 KOps/s 13.2657 KOps/s $\color{#d91a1a}-2.21\%$
test_unflatten_speed 0.7207ms 0.3206ms 3.1193 KOps/s 3.1196 KOps/s $-0.01\%$
test_common_ops 1.0045ms 0.5991ms 1.6692 KOps/s 1.5331 KOps/s $\textbf{\color{#35bf28}+8.88\%}$
test_creation 0.1147ms 1.7814μs 561.3499 KOps/s 563.6669 KOps/s $\color{#d91a1a}-0.41\%$
test_creation_empty 37.4300μs 6.9713μs 143.4446 KOps/s 96.9585 KOps/s $\textbf{\color{#35bf28}+47.94\%}$
test_creation_nested_1 37.8110μs 8.6147μs 116.0809 KOps/s 82.5322 KOps/s $\textbf{\color{#35bf28}+40.65\%}$
test_creation_nested_2 0.4163ms 11.3797μs 87.8757 KOps/s 67.7719 KOps/s $\textbf{\color{#35bf28}+29.66\%}$
test_clone 65.4320μs 11.0464μs 90.5270 KOps/s 95.0749 KOps/s $\color{#d91a1a}-4.78\%$
test_getitem[int] 1.2572ms 10.9214μs 91.5633 KOps/s 91.4793 KOps/s $\color{#35bf28}+0.09\%$
test_getitem[slice_int] 0.4304ms 21.6899μs 46.1044 KOps/s 47.5980 KOps/s $\color{#d91a1a}-3.14\%$
test_getitem[range] 0.1293ms 38.0332μs 26.2928 KOps/s 26.8218 KOps/s $\color{#d91a1a}-1.97\%$
test_getitem[tuple] 0.1111ms 18.4844μs 54.0997 KOps/s 54.2345 KOps/s $\color{#d91a1a}-0.25\%$
test_getitem[list] 0.4543ms 35.7527μs 27.9699 KOps/s 30.3281 KOps/s $\textbf{\color{#d91a1a}-7.78\%}$
test_setitem_dim[int] 44.6800μs 22.1920μs 45.0613 KOps/s 50.4938 KOps/s $\textbf{\color{#d91a1a}-10.76\%}$
test_setitem_dim[slice_int] 63.3110μs 41.8032μs 23.9216 KOps/s 25.8223 KOps/s $\textbf{\color{#d91a1a}-7.36\%}$
test_setitem_dim[range] 94.5110μs 57.9189μs 17.2655 KOps/s 19.3175 KOps/s $\textbf{\color{#d91a1a}-10.62\%}$
test_setitem_dim[tuple] 58.7310μs 35.6326μs 28.0642 KOps/s 30.0027 KOps/s $\textbf{\color{#d91a1a}-6.46\%}$
test_setitem 0.4253ms 15.0684μs 66.3641 KOps/s 59.9753 KOps/s $\textbf{\color{#35bf28}+10.65\%}$
test_set 66.9710μs 14.2530μs 70.1607 KOps/s 61.5582 KOps/s $\textbf{\color{#35bf28}+13.97\%}$
test_set_shared 0.7207ms 0.1619ms 6.1763 KOps/s 6.6097 KOps/s $\textbf{\color{#d91a1a}-6.56\%}$
test_update 0.3946ms 16.5097μs 60.5705 KOps/s 50.1657 KOps/s $\textbf{\color{#35bf28}+20.74\%}$
test_update_nested 0.4309ms 22.5284μs 44.3885 KOps/s 39.2502 KOps/s $\textbf{\color{#35bf28}+13.09\%}$
test_update__nested 0.5359ms 26.7570μs 37.3735 KOps/s 38.9710 KOps/s $\color{#d91a1a}-4.10\%$
test_set_nested 76.6810μs 16.0393μs 62.3471 KOps/s 58.9091 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_set_nested_new 0.4246ms 18.3624μs 54.4591 KOps/s 50.4716 KOps/s $\textbf{\color{#35bf28}+7.90\%}$
test_select 58.0810μs 30.0417μs 33.2871 KOps/s 30.5046 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_select_nested 81.5020μs 44.9123μs 22.2656 KOps/s 22.2075 KOps/s $\color{#35bf28}+0.26\%$
test_exclude_nested 0.4555ms 63.8708μs 15.6566 KOps/s 15.5901 KOps/s $\color{#35bf28}+0.43\%$
test_empty[True] 0.6992ms 0.2999ms 3.3339 KOps/s 3.3280 KOps/s $\color{#35bf28}+0.18\%$
test_empty[False] 4.3221μs 0.8333μs 1.2001 MOps/s 1.2010 MOps/s $\color{#d91a1a}-0.08\%$
test_to 92.1810μs 58.6080μs 17.0625 KOps/s 17.2986 KOps/s $\color{#d91a1a}-1.37\%$
test_to_nonblocking 94.9820μs 50.5814μs 19.7701 KOps/s 20.8906 KOps/s $\textbf{\color{#d91a1a}-5.36\%}$
test_unbind_speed 0.2908ms 0.2538ms 3.9402 KOps/s 4.1835 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_unbind_speed_stack0 0.6567ms 0.2420ms 4.1318 KOps/s 4.2177 KOps/s $\color{#d91a1a}-2.04\%$
test_unbind_speed_stack1 94.2833ms 0.7292ms 1.3714 KOps/s 1.5136 KOps/s $\textbf{\color{#d91a1a}-9.39\%}$
test_split 96.1732ms 1.6160ms 618.8166 Ops/s 628.9577 Ops/s $\color{#d91a1a}-1.61\%$
test_chunk 95.6180ms 1.6210ms 616.8947 Ops/s 622.0507 Ops/s $\color{#d91a1a}-0.83\%$
test_consolidate[False-None] 3.4888ms 2.6817ms 372.8921 Ops/s 330.2746 Ops/s $\textbf{\color{#35bf28}+12.90\%}$
test_consolidate[default-None] 1.7735ms 1.6889ms 592.1160 Ops/s 591.4928 Ops/s $\color{#35bf28}+0.11\%$
test_consolidate[reduce-overhead-None] 1.7950ms 1.7322ms 577.3135 Ops/s 574.3208 Ops/s $\color{#35bf28}+0.52\%$
test_consolidate_njt[False-None] 6.8572ms 6.6128ms 151.2213 Ops/s 108.9083 Ops/s $\textbf{\color{#35bf28}+38.85\%}$
test_to[False-False-None] 1.8462ms 1.7648ms 566.6245 Ops/s 576.7769 Ops/s $\color{#d91a1a}-1.76\%$
test_to[True-False-None] 1.6077ms 1.3247ms 754.8976 Ops/s 726.8508 Ops/s $\color{#35bf28}+3.86\%$
test_to[within-False-None] 4.2650ms 4.1616ms 240.2903 Ops/s 240.3570 Ops/s $\color{#d91a1a}-0.03\%$
test_to[True-default-None] 5.6490ms 5.4220ms 184.4343 Ops/s 189.7206 Ops/s $\color{#d91a1a}-2.79\%$
test_to_njt[False-False-None] 7.4056ms 7.0378ms 142.0890 Ops/s 143.2606 Ops/s $\color{#d91a1a}-0.82\%$
test_to_njt[True-False-None] 5.7697ms 5.5114ms 181.4437 Ops/s 177.1123 Ops/s $\color{#35bf28}+2.45\%$
test_to_njt[within-False-None] 12.7666ms 12.2051ms 81.9332 Ops/s 80.6033 Ops/s $\color{#35bf28}+1.65\%$
test_creation[device0] 0.4500ms 81.0671μs 12.3355 KOps/s 12.4394 KOps/s $\color{#d91a1a}-0.84\%$
test_creation_from_tensor 0.4561ms 85.1282μs 11.7470 KOps/s 11.9203 KOps/s $\color{#d91a1a}-1.45\%$
test_add_one[memmap_tensor0] 0.4119ms 6.9548μs 143.7846 KOps/s 148.5809 KOps/s $\color{#d91a1a}-3.23\%$
test_contiguous[memmap_tensor0] 1.8896μs 0.4250μs 2.3529 MOps/s 2.4358 MOps/s $\color{#d91a1a}-3.41\%$
test_stack[memmap_tensor0] 39.2800μs 4.6634μs 214.4369 KOps/s 225.5850 KOps/s $\color{#d91a1a}-4.94\%$
test_memmaptd_index 1.4354ms 0.2453ms 4.0770 KOps/s 3.8637 KOps/s $\textbf{\color{#35bf28}+5.52\%}$
test_memmaptd_index_astensor 0.4503ms 0.3080ms 3.2471 KOps/s 3.0948 KOps/s $\color{#35bf28}+4.92\%$
test_memmaptd_index_op 0.7306ms 0.5693ms 1.7565 KOps/s 1.6007 KOps/s $\textbf{\color{#35bf28}+9.74\%}$
test_serialize_model 0.4188s 0.1725s 5.7961 Ops/s 7.6174 Ops/s $\textbf{\color{#d91a1a}-23.91\%}$
test_serialize_model_pickle 1.3480s 1.2145s 0.8234 Ops/s 0.8423 Ops/s $\color{#d91a1a}-2.24\%$
test_serialize_weights 0.1314s 0.1306s 7.6560 Ops/s 7.6311 Ops/s $\color{#35bf28}+0.33\%$
test_serialize_weights_returnearly 0.3182s 54.7309ms 18.2712 Ops/s 11.8312 Ops/s $\textbf{\color{#35bf28}+54.43\%}$
test_serialize_weights_pickle 1.3808s 1.2160s 0.8224 Ops/s 0.8216 Ops/s $\color{#35bf28}+0.09\%$
test_reshape_pytree 64.5010μs 22.0321μs 45.3884 KOps/s 45.6382 KOps/s $\color{#d91a1a}-0.55\%$
test_reshape_td 61.8110μs 26.8533μs 37.2394 KOps/s 36.7171 KOps/s $\color{#35bf28}+1.42\%$
test_view_pytree 51.8410μs 22.0873μs 45.2750 KOps/s 46.0049 KOps/s $\color{#d91a1a}-1.59\%$
test_view_td 65.3710μs 32.4242μs 30.8411 KOps/s 31.3263 KOps/s $\color{#d91a1a}-1.55\%$
test_unbind_pytree 62.8110μs 27.9719μs 35.7502 KOps/s 35.7781 KOps/s $\color{#d91a1a}-0.08\%$
test_unbind_td 0.7608ms 38.7021μs 25.8384 KOps/s 27.0514 KOps/s $\color{#d91a1a}-4.48\%$
test_split_pytree 67.5610μs 32.0571μs 31.1943 KOps/s 33.7170 KOps/s $\textbf{\color{#d91a1a}-7.48\%}$
test_split_td 0.9712ms 39.5031μs 25.3144 KOps/s 25.1146 KOps/s $\color{#35bf28}+0.80\%$
test_add_pytree 71.2810μs 37.8686μs 26.4071 KOps/s 29.4389 KOps/s $\textbf{\color{#d91a1a}-10.30\%}$
test_add_td 0.1032ms 51.4527μs 19.4353 KOps/s 19.4935 KOps/s $\color{#d91a1a}-0.30\%$
test_compile_add_one_nested[tensordict-compile] 0.1779ms 0.1303ms 7.6768 KOps/s 7.8349 KOps/s $\color{#d91a1a}-2.02\%$
test_compile_add_one_nested[tensordict-eager] 0.2363ms 0.1358ms 7.3658 KOps/s 7.4893 KOps/s $\color{#d91a1a}-1.65\%$
test_compile_add_one_nested[pytree-compile] 0.1413ms 97.5170μs 10.2546 KOps/s 10.0368 KOps/s $\color{#35bf28}+2.17\%$
test_compile_add_one_nested[pytree-eager] 1.3879ms 0.1637ms 6.1103 KOps/s 6.8094 KOps/s $\textbf{\color{#d91a1a}-10.27\%}$
test_compile_copy_nested[tensordict-compile] 62.2410μs 26.0079μs 38.4499 KOps/s 35.7494 KOps/s $\textbf{\color{#35bf28}+7.55\%}$
test_compile_copy_nested[tensordict-eager] 61.8100μs 30.7547μs 32.5154 KOps/s 33.1277 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_copy_nested[pytree-compile] 0.2831ms 64.9643μs 15.3931 KOps/s 15.3138 KOps/s $\color{#35bf28}+0.52\%$
test_compile_copy_nested[pytree-eager] 88.4010μs 49.1738μs 20.3360 KOps/s 20.3309 KOps/s $\color{#35bf28}+0.03\%$
test_compile_add_one_flat[tensordict-compile] 0.1919ms 0.1436ms 6.9615 KOps/s 7.1393 KOps/s $\color{#d91a1a}-2.49\%$
test_compile_add_one_flat[tensordict-eager] 0.3124ms 0.2214ms 4.5162 KOps/s 4.6255 KOps/s $\color{#d91a1a}-2.36\%$
test_compile_add_one_flat[tensorclass-compile] 0.1576ms 99.0117μs 10.0998 KOps/s 10.2914 KOps/s $\color{#d91a1a}-1.86\%$
test_compile_add_one_flat[tensorclass-eager] 0.1310ms 56.7684μs 17.6154 KOps/s 17.9399 KOps/s $\color{#d91a1a}-1.81\%$
test_compile_add_one_flat[pytree-compile] 0.1820ms 0.1361ms 7.3495 KOps/s 7.3920 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_add_one_flat[pytree-eager] 0.5607ms 0.5053ms 1.9792 KOps/s 2.1352 KOps/s $\textbf{\color{#d91a1a}-7.31\%}$
test_compile_add_self_flat[tensordict-eager] 0.4187ms 0.2650ms 3.7741 KOps/s 3.8262 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_add_self_flat[tensordict-compile] 0.1874ms 0.1444ms 6.9262 KOps/s 7.1049 KOps/s $\color{#d91a1a}-2.52\%$
test_compile_add_self_flat[tensorclass-eager] 0.1885ms 69.4620μs 14.3964 KOps/s 14.8338 KOps/s $\color{#d91a1a}-2.95\%$
test_compile_add_self_flat[tensorclass-compile] 0.1455ms 0.1005ms 9.9539 KOps/s 10.1220 KOps/s $\color{#d91a1a}-1.66\%$
test_compile_add_self_flat[pytree-eager] 0.5217ms 0.4197ms 2.3828 KOps/s 2.5154 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_compile_add_self_flat[pytree-compile] 0.1913ms 0.1356ms 7.3766 KOps/s 7.4939 KOps/s $\color{#d91a1a}-1.57\%$
test_compile_copy_flat[tensordict-compile] 0.1317ms 19.1304μs 52.2727 KOps/s 37.9328 KOps/s $\textbf{\color{#35bf28}+37.80\%}$
test_compile_copy_flat[tensordict-eager] 75.2010μs 31.1225μs 32.1311 KOps/s 31.8256 KOps/s $\color{#35bf28}+0.96\%$
test_compile_copy_flat[pytree-compile] 0.1109ms 70.2846μs 14.2279 KOps/s 14.2351 KOps/s $\color{#d91a1a}-0.05\%$
test_compile_copy_flat[pytree-eager] 76.9110μs 51.3646μs 19.4687 KOps/s 19.4899 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_assign_and_add[tensordict-compile] 1.6135ms 0.3881ms 2.5769 KOps/s 2.2254 KOps/s $\textbf{\color{#35bf28}+15.80\%}$
test_compile_assign_and_add[tensordict-eager] 2.9011ms 2.7446ms 364.3544 Ops/s 378.0414 Ops/s $\color{#d91a1a}-3.62\%$
test_compile_assign_and_add[pytree-compile] 1.5841ms 0.3794ms 2.6354 KOps/s 2.2877 KOps/s $\textbf{\color{#35bf28}+15.20\%}$
test_compile_assign_and_add[pytree-eager] 2.8308ms 2.7339ms 365.7833 Ops/s 386.5164 Ops/s $\textbf{\color{#d91a1a}-5.36\%}$
test_compile_indexing[tensor-tensordict-compile] 0.5344ms 0.1147ms 8.7203 KOps/s 8.7173 KOps/s $\color{#35bf28}+0.03\%$
test_compile_indexing[tensor-tensordict-eager] 0.5583ms 80.7067μs 12.3905 KOps/s 12.1110 KOps/s $\color{#35bf28}+2.31\%$
test_compile_indexing[tensor-tensorclass-compile] 0.6214ms 0.1071ms 9.3347 KOps/s 9.3818 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1275ms 70.5856μs 14.1672 KOps/s 13.9604 KOps/s $\color{#35bf28}+1.48\%$
test_compile_indexing[tensor-pytree-compile] 0.1639ms 0.1140ms 8.7682 KOps/s 9.0201 KOps/s $\color{#d91a1a}-2.79\%$
test_compile_indexing[tensor-pytree-eager] 0.1185ms 73.3274μs 13.6375 KOps/s 13.7171 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_indexing[slice-tensordict-compile] 0.1551ms 0.1035ms 9.6634 KOps/s 9.9233 KOps/s $\color{#d91a1a}-2.62\%$
test_compile_indexing[slice-tensordict-eager] 0.1490ms 17.7522μs 56.3309 KOps/s 56.2640 KOps/s $\color{#35bf28}+0.12\%$
test_compile_indexing[slice-tensorclass-compile] 0.1509ms 98.0780μs 10.1960 KOps/s 10.4693 KOps/s $\color{#d91a1a}-2.61\%$
test_compile_indexing[slice-tensorclass-eager] 51.8110μs 16.3016μs 61.3436 KOps/s 62.9009 KOps/s $\color{#d91a1a}-2.48\%$
test_compile_indexing[slice-pytree-compile] 0.1470ms 99.5338μs 10.0468 KOps/s 10.2215 KOps/s $\color{#d91a1a}-1.71\%$
test_compile_indexing[slice-pytree-eager] 46.3010μs 16.1761μs 61.8194 KOps/s 62.7765 KOps/s $\color{#d91a1a}-1.52\%$
test_compile_indexing[int-tensordict-compile] 0.1507ms 0.1030ms 9.7118 KOps/s 9.9711 KOps/s $\color{#d91a1a}-2.60\%$
test_compile_indexing[int-tensordict-eager] 0.5400ms 17.4452μs 57.3223 KOps/s 57.3518 KOps/s $\color{#d91a1a}-0.05\%$
test_compile_indexing[int-tensorclass-compile] 0.1456ms 96.4258μs 10.3707 KOps/s 10.0365 KOps/s $\color{#35bf28}+3.33\%$
test_compile_indexing[int-tensorclass-eager] 0.1232ms 17.6364μs 56.7008 KOps/s 63.0791 KOps/s $\textbf{\color{#d91a1a}-10.11\%}$
test_compile_indexing[int-pytree-compile] 0.1480ms 96.3818μs 10.3754 KOps/s 10.1852 KOps/s $\color{#35bf28}+1.87\%$
test_compile_indexing[int-pytree-eager] 60.0610μs 16.1562μs 61.8956 KOps/s 62.5160 KOps/s $\color{#d91a1a}-0.99\%$
test_mod_add[eager] 82.5410μs 40.4614μs 24.7149 KOps/s 23.6019 KOps/s $\color{#35bf28}+4.72\%$
test_mod_add[compile] 0.3338ms 84.2467μs 11.8699 KOps/s 11.4883 KOps/s $\color{#35bf28}+3.32\%$
test_mod_add[compile-overhead] 0.3281ms 0.1684ms 5.9378 KOps/s 5.7013 KOps/s $\color{#35bf28}+4.15\%$
test_mod_wrap[eager] 0.3363ms 0.2639ms 3.7896 KOps/s 3.7398 KOps/s $\color{#35bf28}+1.33\%$
test_mod_wrap[compile] 0.3476ms 0.2842ms 3.5189 KOps/s 3.4760 KOps/s $\color{#35bf28}+1.23\%$
test_mod_wrap[compile-overhead] 6.3913ms 3.5487ms 281.7968 Ops/s 267.3155 Ops/s $\textbf{\color{#35bf28}+5.42\%}$
test_mod_wrap_and_backward[eager] 1.5417ms 1.3855ms 721.7727 Ops/s 686.1055 Ops/s $\textbf{\color{#35bf28}+5.20\%}$
test_mod_wrap_and_backward[compile] 1.3755ms 1.2903ms 775.0047 Ops/s 725.6111 Ops/s $\textbf{\color{#35bf28}+6.81\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3753ms 0.9294ms 1.0759 KOps/s 913.1650 Ops/s $\textbf{\color{#35bf28}+17.82\%}$
test_seq_add[eager] 0.1668ms 0.1148ms 8.7145 KOps/s 7.9629 KOps/s $\textbf{\color{#35bf28}+9.44\%}$
test_seq_add[compile] 0.1375ms 87.7992μs 11.3896 KOps/s 11.2810 KOps/s $\color{#35bf28}+0.96\%$
test_seq_add[compile-overhead] 0.1987ms 0.1306ms 7.6584 KOps/s 7.7831 KOps/s $\color{#d91a1a}-1.60\%$
test_seq_wrap[eager] 0.5073ms 0.4165ms 2.4012 KOps/s 2.3053 KOps/s $\color{#35bf28}+4.16\%$
test_seq_wrap[compile] 0.3524ms 0.3009ms 3.3232 KOps/s 3.2372 KOps/s $\color{#35bf28}+2.66\%$
test_seq_wrap[compile-overhead] 0.2782ms 0.2300ms 4.3485 KOps/s 4.4078 KOps/s $\color{#d91a1a}-1.35\%$
test_func_call_runtime[False-eager] 0.8585ms 0.7777ms 1.2859 KOps/s 1.2760 KOps/s $\color{#35bf28}+0.77\%$
test_func_call_runtime[False-compile] 0.9930ms 0.7567ms 1.3216 KOps/s 1.3354 KOps/s $\color{#d91a1a}-1.03\%$
test_func_call_runtime[False-compile-overhead] 0.4345ms 0.3632ms 2.7535 KOps/s 2.7541 KOps/s $\color{#d91a1a}-0.02\%$
test_func_call_runtime[True-eager] 1.0292ms 0.9184ms 1.0889 KOps/s 1.0972 KOps/s $\color{#d91a1a}-0.76\%$
test_func_call_runtime[True-compile] 0.8363ms 0.7746ms 1.2910 KOps/s 1.3035 KOps/s $\color{#d91a1a}-0.96\%$
test_func_call_runtime[True-compile-overhead] 0.4370ms 0.3899ms 2.5648 KOps/s 2.6089 KOps/s $\color{#d91a1a}-1.69\%$
test_func_call_cm_runtime[False-eager] 0.9359ms 0.7816ms 1.2794 KOps/s 1.3468 KOps/s $\color{#d91a1a}-5.00\%$
test_func_call_cm_runtime[False-compile] 1.1619ms 0.7588ms 1.3179 KOps/s 1.3275 KOps/s $\color{#d91a1a}-0.72\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4389ms 0.3658ms 2.7337 KOps/s 2.7430 KOps/s $\color{#d91a1a}-0.34\%$
test_func_call_cm_runtime[True-eager] 1.2399ms 1.0202ms 980.2473 Ops/s 985.3750 Ops/s $\color{#d91a1a}-0.52\%$
test_func_call_cm_runtime[True-compile] 1.1129ms 0.9978ms 1.0022 KOps/s 1.2528 KOps/s $\textbf{\color{#d91a1a}-20.01\%}$
test_func_call_cm_runtime[True-compile-overhead] 1.0639ms 1.0029ms 997.0991 Ops/s 2.4310 KOps/s $\textbf{\color{#d91a1a}-58.98\%}$
test_vmap_func_call_cm_runtime[eager] 2.6592ms 2.1511ms 464.8731 Ops/s 473.5371 Ops/s $\color{#d91a1a}-1.83\%$
test_vmap_func_call_cm_runtime[compile] 0.8812ms 0.8166ms 1.2246 KOps/s 1.1856 KOps/s $\color{#35bf28}+3.29\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4627ms 0.4152ms 2.4086 KOps/s 2.3929 KOps/s $\color{#35bf28}+0.66\%$
test_distributed 7.5318ms 0.1660ms 6.0230 KOps/s 8.4273 KOps/s $\textbf{\color{#d91a1a}-28.53\%}$
test_tdmodule 0.1726ms 19.3394μs 51.7080 KOps/s 47.0643 KOps/s $\textbf{\color{#35bf28}+9.87\%}$
test_tdmodule_dispatch 54.6810μs 33.9641μs 29.4429 KOps/s 25.4988 KOps/s $\textbf{\color{#35bf28}+15.47\%}$
test_tdseq 40.1510μs 19.7950μs 50.5178 KOps/s 44.6915 KOps/s $\textbf{\color{#35bf28}+13.04\%}$
test_tdseq_dispatch 57.2110μs 36.8694μs 27.1228 KOps/s 23.5634 KOps/s $\textbf{\color{#35bf28}+15.11\%}$
test_instantiation_functorch 2.1362ms 1.5758ms 634.6160 Ops/s 632.1773 Ops/s $\color{#35bf28}+0.39\%$
test_exec_functorch 0.1950ms 0.1506ms 6.6406 KOps/s 6.9297 KOps/s $\color{#d91a1a}-4.17\%$
test_exec_functional_call 0.1777ms 0.1425ms 7.0162 KOps/s 7.3122 KOps/s $\color{#d91a1a}-4.05\%$
test_exec_td_decorator 0.3884ms 0.1931ms 5.1793 KOps/s 5.3782 KOps/s $\color{#d91a1a}-3.70\%$
test_vmap_mlp_speed_decorator[True-True] 0.8326ms 0.6904ms 1.4484 KOps/s 1.4312 KOps/s $\color{#35bf28}+1.21\%$
test_vmap_mlp_speed_decorator[True-False] 0.7943ms 0.6899ms 1.4494 KOps/s 1.3797 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_vmap_mlp_speed_decorator[False-True] 0.7118ms 0.6042ms 1.6550 KOps/s 1.5912 KOps/s $\color{#35bf28}+4.01\%$
test_vmap_mlp_speed_decorator[False-False] 0.7193ms 0.6034ms 1.6573 KOps/s 1.5941 KOps/s $\color{#35bf28}+3.97\%$
test_vmap_transformer_speed_decorator[True-True] 20.1095ms 19.4760ms 51.3451 Ops/s 50.6223 Ops/s $\color{#35bf28}+1.43\%$
test_vmap_transformer_speed_decorator[True-False] 19.6442ms 19.4804ms 51.3337 Ops/s 51.9325 Ops/s $\color{#d91a1a}-1.15\%$
test_vmap_transformer_speed_decorator[False-True] 19.4887ms 19.3196ms 51.7609 Ops/s 52.4214 Ops/s $\color{#d91a1a}-1.26\%$
test_vmap_transformer_speed_decorator[False-False] 20.2889ms 19.5398ms 51.1776 Ops/s 52.3325 Ops/s $\color{#d91a1a}-2.21\%$
test_to_module_speed[True] 1.4855ms 0.9872ms 1.0130 KOps/s 990.2280 Ops/s $\color{#35bf28}+2.30\%$
test_to_module_speed[False] 1.0803ms 0.9743ms 1.0264 KOps/s 1.0195 KOps/s $\color{#35bf28}+0.68\%$
test_tc_init 76.0920μs 35.7488μs 27.9730 KOps/s 25.0235 KOps/s $\textbf{\color{#35bf28}+11.79\%}$
test_tc_init_nested 0.1679ms 71.4137μs 14.0029 KOps/s 12.5426 KOps/s $\textbf{\color{#35bf28}+11.64\%}$
test_tc_first_layer_tensor 28.1400μs 0.8280μs 1.2078 MOps/s 1.1541 MOps/s $\color{#35bf28}+4.65\%$
test_tc_first_layer_nontensor 36.7300μs 2.2900μs 436.6721 KOps/s 421.8379 KOps/s $\color{#35bf28}+3.52\%$
test_tc_second_layer_tensor 26.5303μs 1.4588μs 685.4915 KOps/s 678.5825 KOps/s $\color{#35bf28}+1.02\%$
test_tc_second_layer_nontensor 61.9210μs 3.0080μs 332.4509 KOps/s 320.1565 KOps/s $\color{#35bf28}+3.84\%$
test_unbind 7.2559ms 7.0157ms 142.5379 Ops/s 141.8157 Ops/s $\color{#35bf28}+0.51\%$
test_full_like 12.9426ms 9.1721ms 109.0261 Ops/s 109.4730 Ops/s $\color{#d91a1a}-0.41\%$
test_zeros_like 5.9419ms 4.2705ms 234.1623 Ops/s 114.5373 Ops/s $\textbf{\color{#35bf28}+104.44\%}$
test_ones_like 4.4646ms 4.2364ms 236.0516 Ops/s 241.4820 Ops/s $\color{#d91a1a}-2.25\%$
test_clone 11.2332ms 9.0777ms 110.1596 Ops/s 157.9296 Ops/s $\textbf{\color{#d91a1a}-30.25\%}$
test_squeeze 47.6200μs 9.5143μs 105.1045 KOps/s 103.6631 KOps/s $\color{#35bf28}+1.39\%$
test_unsqueeze 0.1234ms 74.3705μs 13.4462 KOps/s 13.9544 KOps/s $\color{#d91a1a}-3.64\%$
test_split 0.2084s 0.2217ms 4.5099 KOps/s 6.0607 KOps/s $\textbf{\color{#d91a1a}-25.59\%}$
test_permute 0.2369ms 0.1878ms 5.3252 KOps/s 5.6699 KOps/s $\textbf{\color{#d91a1a}-6.08\%}$
test_stack 53.1031ms 50.2750ms 19.8906 Ops/s 19.6310 Ops/s $\color{#35bf28}+1.32\%$
test_cat 50.4501ms 50.0801ms 19.9680 Ops/s 19.8950 Ops/s $\color{#35bf28}+0.37\%$
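
For anyone reading the table: the Change column looks like the relative difference between the Ops column (this branch) and the Ops on Repo HEAD column, and the bold entries match the Improved/Worsened counts in the summary line. Below is a minimal sketch of that arithmetic. The helper names are hypothetical, and the 5% cutoff is an assumption inferred from which rows are bold; the bot's actual rule is not stated in this thread.

```python
# Hypothetical helpers for reading the benchmark table; not part of tensordict
# or of the benchmark bot.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '86.2043 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percentage change of this branch's throughput relative to repo HEAD."""
    new, old = parse_ops(ops), parse_ops(ops_head)
    return (new - old) / old * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not the bot's rule."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested row from the table above
change = relative_change("86.2043 KOps/s", "75.5907 KOps/s")
print(f"{change:+.2f}%", classify(change))  # +14.04% improved
```

The unit parsing matters for rows such as test_mod_wrap_and_backward[compile-overhead], where the two throughput columns are reported in different units (KOps/s against Ops/s).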

Labels: bug, CLA Signed
Development

Successfully merging this pull request may close these issues.

[BUG] @tensorclass(frozen=True) results in tensor cycles
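
As background on why a reference cycle shows up as a memory leak, here is a generic, standard-library-only sketch. It is not the patch from this PR and does not use tensordict; the class and function names are hypothetical. It assumes CPython, where an object caught in a cycle is only reclaimed by the cyclic garbage collector, while a weak back-reference lets reference counting free it immediately.

```python
import gc
import weakref


class StrongChild:
    """Hypothetical child holding a strong back-reference (creates a cycle)."""

    def __init__(self, parent):
        self.parent = parent


class WeakChild:
    """Hypothetical child holding a weak back-reference (no cycle)."""

    def __init__(self, parent):
        self._parent = weakref.ref(parent)


class Container:
    """Hypothetical parent that owns one child."""

    def __init__(self, child_cls):
        self.child = child_cls(self)


def is_freed_without_gc(child_cls) -> bool:
    """Return True if dropping the last name frees the container right away."""
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting only
    try:
        obj = Container(child_cls)
        probe = weakref.ref(obj)
        del obj
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle left behind by the check


print(is_freed_without_gc(WeakChild))
print(is_freed_without_gc(StrongChild))
```

With tensors in the cycle, the delay until the collector runs is what makes this look like a leak, since the underlying storage stays alive in the meantime.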
2 participants