[Feature] flexible return type when indexing prob sequences #1189

Merged: 1 commit, Jan 21, 2025

@vmoens (Contributor) commented Jan 21, 2025

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 21, 2025
ghstack-source-id: 74d28ee84d965c11c527c60b20d9123ef30007f6
Pull Request resolved: #1189
@facebook-github-bot added the "CLA Signed" label Jan 21, 2025
@vmoens merged commit 19b8812 into gh/vmoens/45/base Jan 21, 2025
19 of 25 checks passed
@vmoens deleted the gh/vmoens/45/head branch January 21, 2025 09:44
@vmoens added the "enhancement" label Jan 21, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}48$. Worsened: $\large\color{#d91a1a}11$.

Columns: Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 34.2710μs 11.4245μs 87.5314 KOps/s 74.0584 KOps/s $\textbf{\color{#35bf28}+18.19\%}$
test_plain_set_stack_nested 34.1510μs 11.6590μs 85.7703 KOps/s 72.6472 KOps/s $\textbf{\color{#35bf28}+18.06\%}$
test_plain_set_nested_inplace 52.6800μs 12.5686μs 79.5632 KOps/s 69.0652 KOps/s $\textbf{\color{#35bf28}+15.20\%}$
test_plain_set_stack_nested_inplace 43.5010μs 12.5399μs 79.7454 KOps/s 68.9896 KOps/s $\textbf{\color{#35bf28}+15.59\%}$
test_items 21.1400μs 2.9149μs 343.0705 KOps/s 340.7535 KOps/s $\color{#35bf28}+0.68\%$
test_items_nested 0.4158ms 0.3618ms 2.7641 KOps/s 2.7452 KOps/s $\color{#35bf28}+0.69\%$
test_items_nested_locked 0.5574ms 0.3688ms 2.7116 KOps/s 2.7397 KOps/s $\color{#d91a1a}-1.03\%$
test_items_nested_leaf 84.3020μs 58.5174μs 17.0889 KOps/s 17.0415 KOps/s $\color{#35bf28}+0.28\%$
test_items_stack_nested 0.4154ms 0.3646ms 2.7425 KOps/s 2.7356 KOps/s $\color{#35bf28}+0.25\%$
test_items_stack_nested_leaf 90.6520μs 59.9455μs 16.6818 KOps/s 16.6472 KOps/s $\color{#35bf28}+0.21\%$
test_items_stack_nested_locked 0.3920ms 0.3659ms 2.7333 KOps/s 2.7265 KOps/s $\color{#35bf28}+0.25\%$
test_keys 30.0910μs 3.4713μs 288.0767 KOps/s 288.4604 KOps/s $\color{#d91a1a}-0.13\%$
test_keys_nested 0.1269ms 87.8884μs 11.3781 KOps/s 11.4683 KOps/s $\color{#d91a1a}-0.79\%$
test_keys_nested_locked 0.7249ms 93.9821μs 10.6403 KOps/s 10.7784 KOps/s $\color{#d91a1a}-1.28\%$
test_keys_nested_leaf 0.1212ms 78.4432μs 12.7481 KOps/s 12.8161 KOps/s $\color{#d91a1a}-0.53\%$
test_keys_stack_nested 0.1513ms 87.8316μs 11.3854 KOps/s 11.3594 KOps/s $\color{#35bf28}+0.23\%$
test_keys_stack_nested_leaf 0.1176ms 78.8555μs 12.6814 KOps/s 12.6611 KOps/s $\color{#35bf28}+0.16\%$
test_keys_stack_nested_locked 0.1291ms 93.1290μs 10.7378 KOps/s 10.6756 KOps/s $\color{#35bf28}+0.58\%$
test_values 5.8633μs 0.8541μs 1.1708 MOps/s 1.1784 MOps/s $\color{#d91a1a}-0.64\%$
test_values_nested 67.2910μs 37.9280μs 26.3658 KOps/s 26.9034 KOps/s $\color{#d91a1a}-2.00\%$
test_values_nested_locked 65.8210μs 39.1954μs 25.5132 KOps/s 25.8360 KOps/s $\color{#d91a1a}-1.25\%$
test_values_nested_leaf 88.8920μs 41.7198μs 23.9694 KOps/s 23.9851 KOps/s $\color{#d91a1a}-0.07\%$
test_values_stack_nested 0.1086ms 38.5644μs 25.9307 KOps/s 26.4775 KOps/s $\color{#d91a1a}-2.07\%$
test_values_stack_nested_leaf 75.6710μs 42.3412μs 23.6176 KOps/s 23.5387 KOps/s $\color{#35bf28}+0.34\%$
test_values_stack_nested_locked 87.4610μs 39.8939μs 25.0665 KOps/s 25.4017 KOps/s $\color{#d91a1a}-1.32\%$
test_membership 2.6170μs 0.5002μs 1.9993 MOps/s 1.9568 MOps/s $\color{#35bf28}+2.17\%$
test_membership_nested 16.1300μs 1.9998μs 500.0458 KOps/s 493.2413 KOps/s $\color{#35bf28}+1.38\%$
test_membership_nested_leaf 17.4300μs 2.0234μs 494.2097 KOps/s 493.2794 KOps/s $\color{#35bf28}+0.19\%$
test_membership_stacked_nested 29.6400μs 2.1269μs 470.1635 KOps/s 481.1856 KOps/s $\color{#d91a1a}-2.29\%$
test_membership_stacked_nested_leaf 32.3010μs 2.0879μs 478.9574 KOps/s 481.1007 KOps/s $\color{#d91a1a}-0.45\%$
test_membership_nested_last 36.4510μs 3.0844μs 324.2097 KOps/s 317.5868 KOps/s $\color{#35bf28}+2.09\%$
test_membership_nested_leaf_last 39.7400μs 3.0835μs 324.3050 KOps/s 315.8241 KOps/s $\color{#35bf28}+2.69\%$
test_membership_stacked_nested_last 54.7310μs 8.1967μs 122.0005 KOps/s 243.0257 KOps/s $\textbf{\color{#d91a1a}-49.80\%}$
test_membership_stacked_nested_leaf_last 35.0000μs 8.2478μs 121.2447 KOps/s 242.1743 KOps/s $\textbf{\color{#d91a1a}-49.93\%}$
test_nested_getleaf 37.0610μs 6.0865μs 164.2979 KOps/s 162.2235 KOps/s $\color{#35bf28}+1.28\%$
test_nested_get 34.6510μs 5.8415μs 171.1877 KOps/s 171.2673 KOps/s $\color{#d91a1a}-0.05\%$
test_stacked_getleaf 38.9610μs 6.1337μs 163.0344 KOps/s 163.8438 KOps/s $\color{#d91a1a}-0.49\%$
test_stacked_get 33.1410μs 5.8095μs 172.1327 KOps/s 173.0299 KOps/s $\color{#d91a1a}-0.52\%$
test_nested_getitemleaf 32.4410μs 6.4002μs 156.2440 KOps/s 155.5115 KOps/s $\color{#35bf28}+0.47\%$
test_nested_getitem 32.9410μs 6.1759μs 161.9202 KOps/s 162.9436 KOps/s $\color{#d91a1a}-0.63\%$
test_stacked_getitemleaf 37.5010μs 6.4437μs 155.1910 KOps/s 155.5388 KOps/s $\color{#d91a1a}-0.22\%$
test_stacked_getitem 27.5900μs 6.0949μs 164.0707 KOps/s 163.9280 KOps/s $\color{#35bf28}+0.09\%$
test_lock_nested 8.8157ms 0.3514ms 2.8455 KOps/s 2.8012 KOps/s $\color{#35bf28}+1.58\%$
test_lock_stack_nested 0.3911ms 0.3377ms 2.9616 KOps/s 2.8578 KOps/s $\color{#35bf28}+3.63\%$
test_unlock_nested 0.3503ms 0.2818ms 3.5485 KOps/s 3.4481 KOps/s $\color{#35bf28}+2.91\%$
test_unlock_stack_nested 0.3184ms 0.2753ms 3.6321 KOps/s 3.4994 KOps/s $\color{#35bf28}+3.79\%$
test_flatten_speed 0.1247ms 75.5420μs 13.2377 KOps/s 13.1493 KOps/s $\color{#35bf28}+0.67\%$
test_unflatten_speed 0.4502ms 0.3225ms 3.1012 KOps/s 3.0708 KOps/s $\color{#35bf28}+0.99\%$
test_common_ops 0.7388ms 0.5941ms 1.6832 KOps/s 1.4615 KOps/s $\textbf{\color{#35bf28}+15.17\%}$
test_creation 0.1003ms 1.7405μs 574.5598 KOps/s 573.9256 KOps/s $\color{#35bf28}+0.11\%$
test_creation_empty 32.3810μs 6.9871μs 143.1211 KOps/s 92.6450 KOps/s $\textbf{\color{#35bf28}+54.48\%}$
test_creation_nested_1 42.1000μs 8.7205μs 114.6720 KOps/s 79.6560 KOps/s $\textbf{\color{#35bf28}+43.96\%}$
test_creation_nested_2 33.5000μs 11.5260μs 86.7602 KOps/s 65.6566 KOps/s $\textbf{\color{#35bf28}+32.14\%}$
test_clone 45.3510μs 10.7415μs 93.0971 KOps/s 87.7606 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_getitem[int] 1.1478ms 10.8619μs 92.0647 KOps/s 89.6719 KOps/s $\color{#35bf28}+2.67\%$
test_getitem[slice_int] 0.1064ms 21.0626μs 47.4775 KOps/s 45.6785 KOps/s $\color{#35bf28}+3.94\%$
test_getitem[range] 0.1245ms 38.8388μs 25.7474 KOps/s 25.0841 KOps/s $\color{#35bf28}+2.64\%$
test_getitem[tuple] 0.1067ms 18.6403μs 53.6473 KOps/s 52.5165 KOps/s $\color{#35bf28}+2.15\%$
test_getitem[list] 0.1315ms 33.2375μs 30.0865 KOps/s 28.2871 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_setitem_dim[int] 41.2010μs 19.8500μs 50.3779 KOps/s 47.5882 KOps/s $\textbf{\color{#35bf28}+5.86\%}$
test_setitem_dim[slice_int] 60.5610μs 39.3330μs 25.4239 KOps/s 25.0482 KOps/s $\color{#35bf28}+1.50\%$
test_setitem_dim[range] 88.6320μs 54.8144μs 18.2434 KOps/s 18.0270 KOps/s $\color{#35bf28}+1.20\%$
test_setitem_dim[tuple] 61.3010μs 33.2495μs 30.0756 KOps/s 29.2012 KOps/s $\color{#35bf28}+2.99\%$
test_setitem 49.0810μs 14.7614μs 67.7442 KOps/s 58.2152 KOps/s $\textbf{\color{#35bf28}+16.37\%}$
test_set 55.5510μs 14.2883μs 69.9874 KOps/s 58.8965 KOps/s $\textbf{\color{#35bf28}+18.83\%}$
test_set_shared 0.5076ms 0.1621ms 6.1709 KOps/s 6.1777 KOps/s $\color{#d91a1a}-0.11\%$
test_update 0.3693ms 16.3153μs 61.2921 KOps/s 47.6036 KOps/s $\textbf{\color{#35bf28}+28.76\%}$
test_update_nested 49.5710μs 21.4561μs 46.6068 KOps/s 37.5994 KOps/s $\textbf{\color{#35bf28}+23.96\%}$
test_update__nested 0.4961ms 25.8945μs 38.6183 KOps/s 36.6641 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_set_nested 64.3110μs 15.3908μs 64.9738 KOps/s 55.0176 KOps/s $\textbf{\color{#35bf28}+18.10\%}$
test_set_nested_new 52.2500μs 17.6452μs 56.6728 KOps/s 48.5175 KOps/s $\textbf{\color{#35bf28}+16.81\%}$
test_select 65.5910μs 29.3658μs 34.0532 KOps/s 30.6089 KOps/s $\textbf{\color{#35bf28}+11.25\%}$
test_select_nested 88.5810μs 44.5962μs 22.4234 KOps/s 22.6758 KOps/s $\color{#d91a1a}-1.11\%$
test_exclude_nested 0.1064ms 64.3675μs 15.5358 KOps/s 15.6699 KOps/s $\color{#d91a1a}-0.86\%$
test_empty[True] 0.3448ms 0.2949ms 3.3905 KOps/s 3.3825 KOps/s $\color{#35bf28}+0.24\%$
test_empty[False] 4.3211μs 0.8316μs 1.2024 MOps/s 1.2063 MOps/s $\color{#d91a1a}-0.32\%$
test_to 84.9410μs 56.8937μs 17.5766 KOps/s 17.5322 KOps/s $\color{#35bf28}+0.25\%$
test_to_nonblocking 88.6210μs 49.0766μs 20.3763 KOps/s 20.4592 KOps/s $\color{#d91a1a}-0.41\%$
test_unbind_speed 0.2666ms 0.2413ms 4.1448 KOps/s 4.0604 KOps/s $\color{#35bf28}+2.08\%$
test_unbind_speed_stack0 0.2771ms 0.2352ms 4.2509 KOps/s 4.0693 KOps/s $\color{#35bf28}+4.46\%$
test_unbind_speed_stack1 92.3105ms 0.7264ms 1.3766 KOps/s 1.3700 KOps/s $\color{#35bf28}+0.48\%$
test_split 93.4549ms 1.5934ms 627.6045 Ops/s 611.5255 Ops/s $\color{#35bf28}+2.63\%$
test_chunk 95.8648ms 1.6076ms 622.0548 Ops/s 606.7582 Ops/s $\color{#35bf28}+2.52\%$
test_consolidate[False-None] 3.4107ms 2.7565ms 362.7823 Ops/s 363.8128 Ops/s $\color{#d91a1a}-0.28\%$
test_consolidate[default-None] 1.8169ms 1.7404ms 574.5760 Ops/s 579.7707 Ops/s $\color{#d91a1a}-0.90\%$
test_consolidate[reduce-overhead-None] 1.8297ms 1.7605ms 568.0225 Ops/s 560.1837 Ops/s $\color{#35bf28}+1.40\%$
test_consolidate_njt[False-None] 7.1986ms 6.9121ms 144.6743 Ops/s 145.6552 Ops/s $\color{#d91a1a}-0.67\%$
test_to[False-False-None] 1.8789ms 1.7674ms 565.7933 Ops/s 559.1862 Ops/s $\color{#35bf28}+1.18\%$
test_to[True-False-None] 1.7290ms 1.4065ms 710.9699 Ops/s 705.5864 Ops/s $\color{#35bf28}+0.76\%$
test_to[within-False-None] 4.3574ms 4.1823ms 239.1013 Ops/s 229.9538 Ops/s $\color{#35bf28}+3.98\%$
test_to[True-default-None] 5.9665ms 5.5067ms 181.5984 Ops/s 185.4693 Ops/s $\color{#d91a1a}-2.09\%$
test_to_njt[False-False-None] 7.1762ms 7.0643ms 141.5559 Ops/s 139.1086 Ops/s $\color{#35bf28}+1.76\%$
test_to_njt[True-False-None] 6.0786ms 5.6110ms 178.2222 Ops/s 170.5579 Ops/s $\color{#35bf28}+4.49\%$
test_to_njt[within-False-None] 12.4929ms 12.3914ms 80.7011 Ops/s 78.3343 Ops/s $\color{#35bf28}+3.02\%$
test_creation[device0] 0.2837ms 85.5903μs 11.6836 KOps/s 12.1382 KOps/s $\color{#d91a1a}-3.75\%$
test_creation_from_tensor 0.5462ms 89.4112μs 11.1843 KOps/s 11.2308 KOps/s $\color{#d91a1a}-0.41\%$
test_add_one[memmap_tensor0] 0.5776ms 6.9077μs 144.7661 KOps/s 136.3970 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_contiguous[memmap_tensor0] 2.4685μs 0.4203μs 2.3792 MOps/s 2.3489 MOps/s $\color{#35bf28}+1.29\%$
test_stack[memmap_tensor0] 36.3310μs 4.4449μs 224.9782 KOps/s 215.1177 KOps/s $\color{#35bf28}+4.58\%$
test_memmaptd_index 1.4640ms 0.2446ms 4.0883 KOps/s 3.9479 KOps/s $\color{#35bf28}+3.55\%$
test_memmaptd_index_astensor 0.4545ms 0.3081ms 3.2458 KOps/s 3.1869 KOps/s $\color{#35bf28}+1.85\%$
test_memmaptd_index_op 0.7049ms 0.5610ms 1.7826 KOps/s 1.5404 KOps/s $\textbf{\color{#35bf28}+15.72\%}$
test_serialize_model 0.4291s 0.1731s 5.7778 Ops/s 7.6324 Ops/s $\textbf{\color{#d91a1a}-24.30\%}$
test_serialize_model_pickle 1.3843s 1.2157s 0.8226 Ops/s 0.8248 Ops/s $\color{#d91a1a}-0.26\%$
test_serialize_weights 0.1321s 0.1304s 7.6684 Ops/s 7.6814 Ops/s $\color{#d91a1a}-0.17\%$
test_serialize_weights_returnearly 0.3295s 54.9902ms 18.1851 Ops/s 14.9987 Ops/s $\textbf{\color{#35bf28}+21.24\%}$
test_serialize_weights_pickle 2.9377s 1.7565s 0.5693 Ops/s 0.8254 Ops/s $\textbf{\color{#d91a1a}-31.02\%}$
test_reshape_pytree 52.5400μs 22.4919μs 44.4604 KOps/s 44.0636 KOps/s $\color{#35bf28}+0.90\%$
test_reshape_td 69.6310μs 27.8680μs 35.8835 KOps/s 35.4466 KOps/s $\color{#35bf28}+1.23\%$
test_view_pytree 61.4110μs 22.8158μs 43.8294 KOps/s 44.8365 KOps/s $\color{#d91a1a}-2.25\%$
test_view_td 70.4810μs 32.9571μs 30.3425 KOps/s 28.7999 KOps/s $\textbf{\color{#35bf28}+5.36\%}$
test_unbind_pytree 54.3700μs 28.3846μs 35.2304 KOps/s 34.7553 KOps/s $\color{#35bf28}+1.37\%$
test_unbind_td 0.8051ms 37.2263μs 26.8627 KOps/s 26.5785 KOps/s $\color{#35bf28}+1.07\%$
test_split_pytree 58.2610μs 30.8017μs 32.4658 KOps/s 31.8726 KOps/s $\color{#35bf28}+1.86\%$
test_split_td 0.6490ms 39.1240μs 25.5598 KOps/s 25.0535 KOps/s $\color{#35bf28}+2.02\%$
test_add_pytree 76.9410μs 35.3173μs 28.3147 KOps/s 27.0926 KOps/s $\color{#35bf28}+4.51\%$
test_add_td 87.8320μs 47.5066μs 21.0497 KOps/s 18.4998 KOps/s $\textbf{\color{#35bf28}+13.78\%}$
test_compile_add_one_nested[tensordict-compile] 0.1787ms 0.1253ms 7.9838 KOps/s 7.5456 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_compile_add_one_nested[tensordict-eager] 0.2222ms 0.1324ms 7.5555 KOps/s 7.4313 KOps/s $\color{#35bf28}+1.67\%$
test_compile_add_one_nested[pytree-compile] 0.1449ms 97.5140μs 10.2549 KOps/s 10.0130 KOps/s $\color{#35bf28}+2.42\%$
test_compile_add_one_nested[pytree-eager] 1.3615ms 0.1485ms 6.7329 KOps/s 6.3580 KOps/s $\textbf{\color{#35bf28}+5.90\%}$
test_compile_copy_nested[tensordict-compile] 65.3110μs 25.0195μs 39.9688 KOps/s 42.6738 KOps/s $\textbf{\color{#d91a1a}-6.34\%}$
test_compile_copy_nested[tensordict-eager] 61.2510μs 29.6712μs 33.7027 KOps/s 33.8460 KOps/s $\color{#d91a1a}-0.42\%$
test_compile_copy_nested[pytree-compile] 0.3915ms 66.5363μs 15.0294 KOps/s 15.2793 KOps/s $\color{#d91a1a}-1.64\%$
test_compile_copy_nested[pytree-eager] 78.4210μs 49.2607μs 20.3002 KOps/s 20.6826 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_add_one_flat[tensordict-compile] 0.1839ms 0.1431ms 6.9857 KOps/s 7.0047 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_add_one_flat[tensordict-eager] 0.3195ms 0.2163ms 4.6229 KOps/s 4.6126 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_one_flat[tensorclass-compile] 0.1610ms 98.9461μs 10.1065 KOps/s 10.0614 KOps/s $\color{#35bf28}+0.45\%$
test_compile_add_one_flat[tensorclass-eager] 0.1175ms 55.8030μs 17.9202 KOps/s 16.5397 KOps/s $\textbf{\color{#35bf28}+8.35\%}$
test_compile_add_one_flat[pytree-compile] 0.2075ms 0.1358ms 7.3646 KOps/s 7.3332 KOps/s $\color{#35bf28}+0.43\%$
test_compile_add_one_flat[pytree-eager] 0.5297ms 0.4787ms 2.0891 KOps/s 1.9659 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_compile_add_self_flat[tensordict-eager] 0.3935ms 0.2606ms 3.8368 KOps/s 3.8060 KOps/s $\color{#35bf28}+0.81\%$
test_compile_add_self_flat[tensordict-compile] 0.1829ms 0.1450ms 6.8960 KOps/s 6.8498 KOps/s $\color{#35bf28}+0.67\%$
test_compile_add_self_flat[tensorclass-eager] 0.1570ms 70.6810μs 14.1481 KOps/s 13.6654 KOps/s $\color{#35bf28}+3.53\%$
test_compile_add_self_flat[tensorclass-compile] 0.1487ms 0.1012ms 9.8856 KOps/s 9.6094 KOps/s $\color{#35bf28}+2.87\%$
test_compile_add_self_flat[pytree-eager] 0.4548ms 0.4090ms 2.4448 KOps/s 2.3568 KOps/s $\color{#35bf28}+3.73\%$
test_compile_add_self_flat[pytree-compile] 0.1745ms 0.1368ms 7.3086 KOps/s 7.2717 KOps/s $\color{#35bf28}+0.51\%$
test_compile_copy_flat[tensordict-compile] 58.4810μs 19.5170μs 51.2375 KOps/s 55.2720 KOps/s $\textbf{\color{#d91a1a}-7.30\%}$
test_compile_copy_flat[tensordict-eager] 69.5110μs 30.6016μs 32.6781 KOps/s 32.1756 KOps/s $\color{#35bf28}+1.56\%$
test_compile_copy_flat[pytree-compile] 96.8810μs 70.3945μs 14.2056 KOps/s 14.2803 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_copy_flat[pytree-eager] 87.9210μs 51.3182μs 19.4863 KOps/s 19.3500 KOps/s $\color{#35bf28}+0.70\%$
test_compile_assign_and_add[tensordict-compile] 1.6100ms 0.3899ms 2.5647 KOps/s 2.1873 KOps/s $\textbf{\color{#35bf28}+17.26\%}$
test_compile_assign_and_add[tensordict-eager] 2.8665ms 2.6705ms 374.4641 Ops/s 364.4989 Ops/s $\color{#35bf28}+2.73\%$
test_compile_assign_and_add[pytree-compile] 1.5841ms 0.4323ms 2.3131 KOps/s 2.2295 KOps/s $\color{#35bf28}+3.75\%$
test_compile_assign_and_add[pytree-eager] 2.7823ms 2.6738ms 374.0030 Ops/s 359.1933 Ops/s $\color{#35bf28}+4.12\%$
test_compile_indexing[tensor-tensordict-compile] 0.1733ms 0.1153ms 8.6756 KOps/s 8.6053 KOps/s $\color{#35bf28}+0.82\%$
test_compile_indexing[tensor-tensordict-eager] 0.5594ms 79.8443μs 12.5244 KOps/s 11.7084 KOps/s $\textbf{\color{#35bf28}+6.97\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.2200ms 0.1128ms 8.8624 KOps/s 8.5911 KOps/s $\color{#35bf28}+3.16\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1403ms 72.4982μs 13.7935 KOps/s 14.0748 KOps/s $\color{#d91a1a}-2.00\%$
test_compile_indexing[tensor-pytree-compile] 0.1661ms 0.1136ms 8.8046 KOps/s 8.8039 KOps/s $+0.01\%$
test_compile_indexing[tensor-pytree-eager] 0.1438ms 72.3465μs 13.8224 KOps/s 14.2002 KOps/s $\color{#d91a1a}-2.66\%$
test_compile_indexing[slice-tensordict-compile] 0.1561ms 0.1063ms 9.4094 KOps/s 9.7431 KOps/s $\color{#d91a1a}-3.42\%$
test_compile_indexing[slice-tensordict-eager] 0.1473ms 19.5044μs 51.2705 KOps/s 53.2010 KOps/s $\color{#d91a1a}-3.63\%$
test_compile_indexing[slice-tensorclass-compile] 0.1504ms 0.1022ms 9.7856 KOps/s 10.1775 KOps/s $\color{#d91a1a}-3.85\%$
test_compile_indexing[slice-tensorclass-eager] 70.9010μs 17.0202μs 58.7536 KOps/s 59.6066 KOps/s $\color{#d91a1a}-1.43\%$
test_compile_indexing[slice-pytree-compile] 0.1779ms 0.1026ms 9.7508 KOps/s 10.1511 KOps/s $\color{#d91a1a}-3.94\%$
test_compile_indexing[slice-pytree-eager] 93.6710μs 17.0603μs 58.6156 KOps/s 59.9884 KOps/s $\color{#d91a1a}-2.29\%$
test_compile_indexing[int-tensordict-compile] 0.1794ms 0.1072ms 9.3297 KOps/s 9.7285 KOps/s $\color{#d91a1a}-4.10\%$
test_compile_indexing[int-tensordict-eager] 0.5578ms 19.0980μs 52.3615 KOps/s 53.8502 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_indexing[int-tensorclass-compile] 0.1942ms 99.3713μs 10.0633 KOps/s 10.1648 KOps/s $\color{#d91a1a}-1.00\%$
test_compile_indexing[int-tensorclass-eager] 59.8210μs 17.1065μs 58.4575 KOps/s 60.3485 KOps/s $\color{#d91a1a}-3.13\%$
test_compile_indexing[int-pytree-compile] 0.1543ms 0.1027ms 9.7403 KOps/s 10.1657 KOps/s $\color{#d91a1a}-4.18\%$
test_compile_indexing[int-pytree-eager] 90.3710μs 16.6330μs 60.1215 KOps/s 60.0080 KOps/s $\color{#35bf28}+0.19\%$
test_mod_add[eager] 92.6510μs 37.7048μs 26.5218 KOps/s 23.0083 KOps/s $\textbf{\color{#35bf28}+15.27\%}$
test_mod_add[compile] 0.1140ms 82.0765μs 12.1838 KOps/s 12.0710 KOps/s $\color{#35bf28}+0.93\%$
test_mod_add[compile-overhead] 0.3270ms 0.1680ms 5.9529 KOps/s 5.6156 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_mod_wrap[eager] 0.3288ms 0.2578ms 3.8785 KOps/s 3.7979 KOps/s $\color{#35bf28}+2.12\%$
test_mod_wrap[compile] 0.3590ms 0.2877ms 3.4755 KOps/s 3.3833 KOps/s $\color{#35bf28}+2.73\%$
test_mod_wrap[compile-overhead] 6.4095ms 3.5546ms 281.3288 Ops/s 272.9331 Ops/s $\color{#35bf28}+3.08\%$
test_mod_wrap_and_backward[eager] 1.4979ms 1.3573ms 736.7450 Ops/s 661.1178 Ops/s $\textbf{\color{#35bf28}+11.44\%}$
test_mod_wrap_and_backward[compile] 1.3621ms 1.2759ms 783.7872 Ops/s 711.4294 Ops/s $\textbf{\color{#35bf28}+10.17\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3861ms 0.9246ms 1.0815 KOps/s 955.2525 Ops/s $\textbf{\color{#35bf28}+13.22\%}$
test_seq_add[eager] 0.1605ms 0.1149ms 8.7036 KOps/s 8.1337 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_seq_add[compile] 0.1230ms 89.0996μs 11.2234 KOps/s 11.0938 KOps/s $\color{#35bf28}+1.17\%$
test_seq_add[compile-overhead] 0.1820ms 0.1287ms 7.7673 KOps/s 7.5578 KOps/s $\color{#35bf28}+2.77\%$
test_seq_wrap[eager] 0.4919ms 0.4180ms 2.3923 KOps/s 2.2376 KOps/s $\textbf{\color{#35bf28}+6.91\%}$
test_seq_wrap[compile] 0.3453ms 0.3041ms 3.2887 KOps/s 3.0459 KOps/s $\textbf{\color{#35bf28}+7.97\%}$
test_seq_wrap[compile-overhead] 0.2735ms 0.2262ms 4.4216 KOps/s 4.3617 KOps/s $\color{#35bf28}+1.37\%$
test_func_call_runtime[False-eager] 0.8537ms 0.7445ms 1.3432 KOps/s 1.2971 KOps/s $\color{#35bf28}+3.56\%$
test_func_call_runtime[False-compile] 0.8185ms 0.7536ms 1.3270 KOps/s 1.2962 KOps/s $\color{#35bf28}+2.38\%$
test_func_call_runtime[False-compile-overhead] 0.4110ms 0.3669ms 2.7256 KOps/s 2.7034 KOps/s $\color{#35bf28}+0.82\%$
test_func_call_runtime[True-eager] 1.0553ms 0.9103ms 1.0985 KOps/s 1.0683 KOps/s $\color{#35bf28}+2.83\%$
test_func_call_runtime[True-compile] 0.8752ms 0.7917ms 1.2631 KOps/s 1.2742 KOps/s $\color{#d91a1a}-0.87\%$
test_func_call_runtime[True-compile-overhead] 0.4365ms 0.3854ms 2.5945 KOps/s 2.5570 KOps/s $\color{#35bf28}+1.47\%$
test_func_call_cm_runtime[False-eager] 1.1580ms 0.7418ms 1.3481 KOps/s 1.3048 KOps/s $\color{#35bf28}+3.32\%$
test_func_call_cm_runtime[False-compile] 1.1744ms 0.7562ms 1.3224 KOps/s 1.2963 KOps/s $\color{#35bf28}+2.01\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4353ms 0.3705ms 2.6991 KOps/s 2.6738 KOps/s $\color{#35bf28}+0.94\%$
test_func_call_cm_runtime[True-eager] 1.4199ms 1.0179ms 982.3745 Ops/s 959.6950 Ops/s $\color{#35bf28}+2.36\%$
test_func_call_cm_runtime[True-compile] 1.4144ms 1.0267ms 973.9678 Ops/s 919.1223 Ops/s $\textbf{\color{#35bf28}+5.97\%}$
test_func_call_cm_runtime[True-compile-overhead] 1.1355ms 0.9974ms 1.0026 KOps/s 976.0718 Ops/s $\color{#35bf28}+2.71\%$
test_vmap_func_call_cm_runtime[eager] 2.5527ms 2.1151ms 472.7818 Ops/s 464.7163 Ops/s $\color{#35bf28}+1.74\%$
test_vmap_func_call_cm_runtime[compile] 0.9353ms 0.8241ms 1.2135 KOps/s 1.1900 KOps/s $\color{#35bf28}+1.97\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.8349ms 0.4172ms 2.3972 KOps/s 2.3501 KOps/s $\color{#35bf28}+2.00\%$
test_distributed 3.0278ms 0.2085ms 4.7958 KOps/s 8.4933 KOps/s $\textbf{\color{#d91a1a}-43.53\%}$
test_tdmodule 79.4310μs 19.4327μs 51.4597 KOps/s 46.8217 KOps/s $\textbf{\color{#35bf28}+9.91\%}$
test_tdmodule_dispatch 0.3019ms 34.9475μs 28.6143 KOps/s 25.8064 KOps/s $\textbf{\color{#35bf28}+10.88\%}$
test_tdseq 41.9310μs 19.9860μs 50.0349 KOps/s 43.9210 KOps/s $\textbf{\color{#35bf28}+13.92\%}$
test_tdseq_dispatch 61.1110μs 36.8402μs 27.1442 KOps/s 23.8873 KOps/s $\textbf{\color{#35bf28}+13.63\%}$
test_instantiation_functorch 1.6883ms 1.5761ms 634.4784 Ops/s 623.3938 Ops/s $\color{#35bf28}+1.78\%$
test_exec_functorch 0.1806ms 0.1460ms 6.8476 KOps/s 6.6980 KOps/s $\color{#35bf28}+2.23\%$
test_exec_functional_call 0.1840ms 0.1400ms 7.1426 KOps/s 6.9520 KOps/s $\color{#35bf28}+2.74\%$
test_exec_td_decorator 0.3888ms 0.1899ms 5.2660 KOps/s 5.2187 KOps/s $\color{#35bf28}+0.91\%$
test_vmap_mlp_speed_decorator[True-True] 0.8112ms 0.6874ms 1.4548 KOps/s 1.4223 KOps/s $\color{#35bf28}+2.28\%$
test_vmap_mlp_speed_decorator[True-False] 0.8274ms 0.6893ms 1.4508 KOps/s 1.3903 KOps/s $\color{#35bf28}+4.35\%$
test_vmap_mlp_speed_decorator[False-True] 0.7236ms 0.6025ms 1.6598 KOps/s 1.5701 KOps/s $\textbf{\color{#35bf28}+5.71\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7337ms 0.6044ms 1.6546 KOps/s 1.6046 KOps/s $\color{#35bf28}+3.11\%$
test_vmap_transformer_speed_decorator[True-True] 19.9721ms 19.3445ms 51.6944 Ops/s 51.1123 Ops/s $\color{#35bf28}+1.14\%$
test_vmap_transformer_speed_decorator[True-False] 19.4172ms 19.3114ms 51.7828 Ops/s 51.0983 Ops/s $\color{#35bf28}+1.34\%$
test_vmap_transformer_speed_decorator[False-True] 19.9622ms 19.6400ms 50.9164 Ops/s 51.1447 Ops/s $\color{#d91a1a}-0.45\%$
test_vmap_transformer_speed_decorator[False-False] 19.9242ms 19.6367ms 50.9250 Ops/s 51.6029 Ops/s $\color{#d91a1a}-1.31\%$
test_to_module_speed[True] 1.4901ms 0.9831ms 1.0172 KOps/s 1.0390 KOps/s $\color{#d91a1a}-2.09\%$
test_to_module_speed[False] 1.0384ms 0.9650ms 1.0362 KOps/s 1.0561 KOps/s $\color{#d91a1a}-1.88\%$
test_tc_init 72.6510μs 35.2020μs 28.4075 KOps/s 24.8831 KOps/s $\textbf{\color{#35bf28}+14.16\%}$
test_tc_init_nested 0.1033ms 68.8254μs 14.5295 KOps/s 12.4574 KOps/s $\textbf{\color{#35bf28}+16.63\%}$
test_tc_first_layer_tensor 30.5610μs 0.8179μs 1.2227 MOps/s 1.3976 MOps/s $\textbf{\color{#d91a1a}-12.52\%}$
test_tc_first_layer_nontensor 27.9510μs 2.2619μs 442.1042 KOps/s 445.0371 KOps/s $\color{#d91a1a}-0.66\%$
test_tc_second_layer_tensor 8.5953μs 1.4425μs 693.2215 KOps/s 694.6681 KOps/s $\color{#d91a1a}-0.21\%$
test_tc_second_layer_nontensor 33.5510μs 2.9688μs 336.8318 KOps/s 329.7965 KOps/s $\color{#35bf28}+2.13\%$
test_unbind 0.2215s 12.2836ms 81.4091 Ops/s 140.7518 Ops/s $\textbf{\color{#d91a1a}-42.16\%}$
test_full_like 9.5256ms 9.2295ms 108.3482 Ops/s 108.1443 Ops/s $\color{#35bf28}+0.19\%$
test_zeros_like 9.1963ms 7.2763ms 137.4317 Ops/s 231.2600 Ops/s $\textbf{\color{#d91a1a}-40.57\%}$
test_ones_like 4.9693ms 4.3335ms 230.7620 Ops/s 230.4945 Ops/s $\color{#35bf28}+0.12\%$
test_clone 11.6548ms 9.2550ms 108.0494 Ops/s 153.7866 Ops/s $\textbf{\color{#d91a1a}-29.74\%}$
test_squeeze 67.1310μs 9.9210μs 100.7967 KOps/s 89.8484 KOps/s $\textbf{\color{#35bf28}+12.19\%}$
test_unsqueeze 0.1237ms 74.3121μs 13.4568 KOps/s 12.8219 KOps/s $\color{#35bf28}+4.95\%$
test_split 0.3825ms 0.1628ms 6.1424 KOps/s 6.0134 KOps/s $\color{#35bf28}+2.15\%$
test_permute 0.2290ms 0.1780ms 5.6188 KOps/s 5.2783 KOps/s $\textbf{\color{#35bf28}+6.45\%}$
test_stack 51.3315ms 50.7410ms 19.7079 Ops/s 19.5761 Ops/s $\color{#35bf28}+0.67\%$
test_cat 51.3484ms 50.7894ms 19.6892 Ops/s 19.7650 Ops/s $\color{#d91a1a}-0.38\%$
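
For readers checking the table by hand, the figures appear consistent with two simple relations: Ops is roughly the reciprocal of Mean, and Change is the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces one row under that assumption. The helper names `relative_change` and `ops_from_mean` are hypothetical and not part of the benchmark tooling, and the exact formula the bot uses is not stated in this thread.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD (assumed formula)."""
    return (ops_pr / ops_head - 1.0) * 100.0


def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second implied by a mean runtime (assumed relation)."""
    return 1.0 / mean_seconds


# Values copied from the test_plain_set_nested row of the table.
change = relative_change(87.5314, 74.0584)  # both in KOps/s, so the units cancel
ops = ops_from_mean(11.4245e-6)             # mean of 11.4245 microseconds

print(f"Change: {change:+.2f}%")
print(f"Ops: {ops / 1e3:.2f} KOps/s")
```

The same relation matches the flagged regressions, for example 122.0005 KOps/s against 243.0257 KOps/s for test_membership_stacked_nested_last.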
