[Refactor] composite_lp_aggregate to handle log-probs aggregates globally #1181
Open
vmoens wants to merge 6 commits into main from composite_lp_aggregate
+762 −467
facebook-github-bot added the CLA Signed label Jan 13, 2025
vmoens added the Refactor and Deprecation labels Jan 13, 2025

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 49.8230μs | 21.2783μs | 46.9963 KOps/s | 51.9469 KOps/s | |
test_plain_set_stack_nested | 55.7540μs | 21.5773μs | 46.3451 KOps/s | 52.2627 KOps/s | |
test_plain_set_nested_inplace | 59.9320μs | 22.8812μs | 43.7040 KOps/s | 47.3951 KOps/s | |
test_plain_set_stack_nested_inplace | 72.8360μs | 22.9579μs | 43.5579 KOps/s | 47.8776 KOps/s | |
test_items | 27.1710μs | 4.2007μs | 238.0575 KOps/s | 237.3068 KOps/s | |
test_items_nested | 0.7366ms | 0.4132ms | 2.4202 KOps/s | 2.5275 KOps/s | |
test_items_nested_locked | 0.7156ms | 0.4093ms | 2.4431 KOps/s | 2.4716 KOps/s | |
test_items_nested_leaf | 0.1516ms | 78.7568μs | 12.6973 KOps/s | 12.8632 KOps/s | |
test_items_stack_nested | 0.5882ms | 0.4100ms | 2.4391 KOps/s | 2.5033 KOps/s | |
test_items_stack_nested_leaf | 0.1521ms | 78.8587μs | 12.6809 KOps/s | 12.7487 KOps/s | |
test_items_stack_nested_locked | 0.6345ms | 0.4116ms | 2.4294 KOps/s | 2.4874 KOps/s | |
test_keys | 22.6020μs | 3.4875μs | 286.7374 KOps/s | 288.6082 KOps/s | |
test_keys_nested | 0.3129ms | 0.1625ms | 6.1550 KOps/s | 6.0063 KOps/s | |
test_keys_nested_locked | 0.6861ms | 0.1682ms | 5.9470 KOps/s | 5.8588 KOps/s | |
test_keys_nested_leaf | 0.2305ms | 0.1423ms | 7.0288 KOps/s | 6.9533 KOps/s | |
test_keys_stack_nested | 0.2879ms | 0.1634ms | 6.1214 KOps/s | 6.0092 KOps/s | |
test_keys_stack_nested_leaf | 0.2216ms | 0.1418ms | 7.0526 KOps/s | 6.9686 KOps/s | |
test_keys_stack_nested_locked | 0.3064ms | 0.1681ms | 5.9490 KOps/s | 5.8710 KOps/s | |
test_values | 5.5084μs | 1.0291μs | 971.7639 KOps/s | 965.5300 KOps/s | |
test_values_nested | 0.1142ms | 61.5524μs | 16.2463 KOps/s | 15.9085 KOps/s | |
test_values_nested_locked | 0.1125ms | 61.5663μs | 16.2426 KOps/s | 15.9646 KOps/s | |
test_values_nested_leaf | 0.1242ms | 70.3801μs | 14.2086 KOps/s | 12.8642 KOps/s | |
test_values_stack_nested | 0.1481ms | 61.2655μs | 16.3224 KOps/s | 15.6874 KOps/s | |
test_values_stack_nested_leaf | 0.1023ms | 69.9535μs | 14.2952 KOps/s | 14.0240 KOps/s | |
test_values_stack_nested_locked | 0.1317ms | 61.4307μs | 16.2785 KOps/s | 15.8429 KOps/s | |
test_membership | 3.4636μs | 0.7184μs | 1.3920 MOps/s | 1.3906 MOps/s | |
test_membership_nested | 23.6140μs | 2.9320μs | 341.0641 KOps/s | 342.5849 KOps/s | |
test_membership_nested_leaf | 21.5500μs | 2.9650μs | 337.2717 KOps/s | 339.7894 KOps/s | |
test_membership_stacked_nested | 21.1900μs | 2.9376μs | 340.4130 KOps/s | 345.8762 KOps/s | |
test_membership_stacked_nested_leaf | 29.1240μs | 2.9708μs | 336.6122 KOps/s | 344.7126 KOps/s | |
test_membership_nested_last | 23.7850μs | 4.4172μs | 226.3861 KOps/s | 229.2153 KOps/s | |
test_membership_nested_leaf_last | 27.5810μs | 4.4555μs | 224.4439 KOps/s | 227.9879 KOps/s | |
test_membership_stacked_nested_last | 27.7320μs | 4.4207μs | 226.2066 KOps/s | 228.3774 KOps/s | |
test_membership_stacked_nested_leaf_last | 33.0810μs | 4.4126μs | 226.6244 KOps/s | 227.2498 KOps/s | |
test_nested_getleaf | 0.1096ms | 10.7878μs | 92.6970 KOps/s | 94.1166 KOps/s | |
test_nested_get | 32.1900μs | 10.1279μs | 98.7368 KOps/s | 99.0653 KOps/s | |
test_stacked_getleaf | 37.0790μs | 10.5626μs | 94.6740 KOps/s | 94.2444 KOps/s | |
test_stacked_get | 33.0520μs | 10.0679μs | 99.3254 KOps/s | 99.4643 KOps/s | |
test_nested_getitemleaf | 38.2720μs | 11.1284μs | 89.8604 KOps/s | 89.7692 KOps/s | |
test_nested_getitem | 35.7070μs | 10.5291μs | 94.9744 KOps/s | 94.8203 KOps/s | |
test_stacked_getitemleaf | 97.0810μs | 11.0149μs | 90.7865 KOps/s | 89.9089 KOps/s | |
test_stacked_getitem | 34.1230μs | 10.4366μs | 95.8165 KOps/s | 94.6037 KOps/s | |
test_lock_nested | 8.0101ms | 0.4624ms | 2.1626 KOps/s | 1.7969 KOps/s | |
test_lock_stack_nested | 0.7875ms | 0.4316ms | 2.3170 KOps/s | 2.3658 KOps/s | |
test_unlock_nested | 0.9373ms | 0.3775ms | 2.6493 KOps/s | 2.6828 KOps/s | |
test_unlock_stack_nested | 0.5989ms | 0.3484ms | 2.8702 KOps/s | 2.9258 KOps/s | |
test_flatten_speed | 0.4773ms | 0.1038ms | 9.6312 KOps/s | 10.1345 KOps/s | |
test_unflatten_speed | 0.6273ms | 0.5329ms | 1.8764 KOps/s | 1.9348 KOps/s | |
test_common_ops | 1.9100ms | 0.8077ms | 1.2381 KOps/s | 1.3264 KOps/s | |
test_creation | 44.6940μs | 2.5720μs | 388.8034 KOps/s | 404.7572 KOps/s | |
test_creation_empty | 36.8180μs | 13.4418μs | 74.3948 KOps/s | 105.3372 KOps/s | |
test_creation_nested_1 | 49.2720μs | 16.5613μs | 60.3816 KOps/s | 81.7070 KOps/s | |
test_creation_nested_2 | 74.5690μs | 20.8806μs | 47.8913 KOps/s | 58.5738 KOps/s | |
test_clone | 54.9830μs | 13.8258μs | 72.3288 KOps/s | 73.1026 KOps/s | |
test_getitem[int] | 1.2290ms | 13.1004μs | 76.3338 KOps/s | 77.3476 KOps/s | |
test_getitem[slice_int] | 0.1392ms | 25.0700μs | 39.8882 KOps/s | 39.9212 KOps/s | |
test_getitem[range] | 0.1657ms | 48.2914μs | 20.7076 KOps/s | 19.8904 KOps/s | |
test_getitem[tuple] | 0.1325ms | 20.6038μs | 48.5347 KOps/s | 49.0389 KOps/s | |
test_getitem[list] | 0.1693ms | 44.0739μs | 22.6892 KOps/s | 21.9134 KOps/s | |
test_setitem_dim[int] | 56.2450μs | 25.5227μs | 39.1808 KOps/s | 37.6849 KOps/s | |
test_setitem_dim[slice_int] | 92.1010μs | 52.1629μs | 19.1707 KOps/s | 19.0070 KOps/s | |
test_setitem_dim[range] | 0.1159ms | 72.7516μs | 13.7454 KOps/s | 13.3502 KOps/s | |
test_setitem_dim[tuple] | 0.1134ms | 40.9603μs | 24.4139 KOps/s | 23.8646 KOps/s | |
test_setitem | 61.7450μs | 21.9697μs | 45.5173 KOps/s | 51.1170 KOps/s | |
test_set | 65.1910μs | 21.3380μs | 46.8648 KOps/s | 52.6455 KOps/s | |
test_set_shared | 4.2512ms | 0.1711ms | 5.8444 KOps/s | 5.8271 KOps/s | |
test_update | 0.3706ms | 24.4082μs | 40.9698 KOps/s | 48.8780 KOps/s | |
test_update_nested | 80.8500μs | 35.1263μs | 28.4687 KOps/s | 33.0236 KOps/s | |
test_update__nested | 1.0082ms | 34.2226μs | 29.2204 KOps/s | 29.7034 KOps/s | |
test_set_nested | 78.7870μs | 23.1802μs | 43.1402 KOps/s | 48.0020 KOps/s | |
test_set_nested_new | 85.4990μs | 27.9400μs | 35.7910 KOps/s | 39.1881 KOps/s | |
test_select | 0.1176ms | 44.5983μs | 22.4224 KOps/s | 24.4016 KOps/s | |
test_select_nested | 0.1224ms | 64.6981μs | 15.4564 KOps/s | 16.0853 KOps/s | |
test_exclude_nested | 0.1605ms | 84.0326μs | 11.9001 KOps/s | 12.4220 KOps/s | |
test_empty[True] | 0.5622ms | 0.4087ms | 2.4467 KOps/s | 2.4764 KOps/s | |
test_empty[False] | 34.3040μs | 1.5038μs | 664.9603 KOps/s | 726.7559 KOps/s | |
test_unbind_speed | 0.4290ms | 0.2755ms | 3.6295 KOps/s | 3.7090 KOps/s | |
test_unbind_speed_stack0 | 0.4082ms | 0.2716ms | 3.6821 KOps/s | 3.7823 KOps/s | |
test_unbind_speed_stack1 | 0.1129s | 0.8294ms | 1.2057 KOps/s | 1.3565 KOps/s | |
test_split | 0.1044s | 1.7957ms | 556.9003 Ops/s | 562.5401 Ops/s | |
test_chunk | 3.0621ms | 1.6285ms | 614.0454 Ops/s | 554.7686 Ops/s | |
test_consolidate_njt[False-None] | 8.3733ms | 8.0165ms | 124.7428 Ops/s | 118.6417 Ops/s | |
test_creation[device0] | 4.2875ms | 92.7868μs | 10.7774 KOps/s | 10.6282 KOps/s | |
test_creation_from_tensor | 0.2721ms | 95.8106μs | 10.4373 KOps/s | 10.3061 KOps/s | |
test_add_one[memmap_tensor0] | 0.1440ms | 5.0565μs | 197.7633 KOps/s | 201.7513 KOps/s | |
test_contiguous[memmap_tensor0] | 20.3980μs | 0.5183μs | 1.9295 MOps/s | 1.9465 MOps/s | |
test_stack[memmap_tensor0] | 53.8410μs | 3.4221μs | 292.2158 KOps/s | 293.3690 KOps/s | |
test_memmaptd_index | 0.9493ms | 0.2317ms | 4.3166 KOps/s | 4.1133 KOps/s | |
test_memmaptd_index_astensor | 0.5901ms | 0.3169ms | 3.1559 KOps/s | 3.0102 KOps/s | |
test_memmaptd_index_op | 0.9711ms | 0.6113ms | 1.6359 KOps/s | 1.7127 KOps/s | |
test_serialize_model | 0.1256s | 0.1162s | 8.6022 Ops/s | 8.3469 Ops/s | |
test_serialize_model_pickle | 0.4937s | 0.4047s | 2.4707 Ops/s | 2.5328 Ops/s | |
test_serialize_weights | 0.1291s | 0.1139s | 8.7765 Ops/s | 8.3739 Ops/s | |
test_serialize_weights_returnearly | 0.2663s | 0.1768s | 5.6558 Ops/s | 6.3680 Ops/s | |
test_serialize_weights_pickle | 0.5564s | 0.4561s | 2.1927 Ops/s | 2.4815 Ops/s | |
test_serialize_weights_filesystem | 0.1508s | 0.1444s | 6.9245 Ops/s | 6.8490 Ops/s | |
test_serialize_model_filesystem | 0.1532s | 0.1451s | 6.8921 Ops/s | 6.0346 Ops/s | |
test_reshape_pytree | 82.1730μs | 26.4879μs | 37.7531 KOps/s | 38.0296 KOps/s | |
test_reshape_td | 98.5540μs | 32.6995μs | 30.5815 KOps/s | 29.3637 KOps/s | |
test_view_pytree | 85.5680μs | 26.6994μs | 37.4541 KOps/s | 37.7465 KOps/s | |
test_view_td | 0.1108ms | 38.2904μs | 26.1162 KOps/s | 25.1195 KOps/s | |
test_unbind_pytree | 99.2760μs | 29.4235μs | 33.9865 KOps/s | 33.8925 KOps/s | |
test_unbind_td | 0.2903ms | 39.9702μs | 25.0186 KOps/s | 25.1026 KOps/s | |
test_split_pytree | 62.9170μs | 29.0833μs | 34.3840 KOps/s | 34.1638 KOps/s | |
test_split_td | 0.4574ms | 46.5805μs | 21.4682 KOps/s | 22.1353 KOps/s | |
test_add_pytree | 71.1730μs | 35.6355μs | 28.0619 KOps/s | 27.6990 KOps/s | |
test_add_td | 0.1336ms | 59.3027μs | 16.8626 KOps/s | 18.8713 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.2356ms | 64.0096μs | 15.6227 KOps/s | 15.9695 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 1.2393ms | 0.1746ms | 5.7268 KOps/s | 5.7161 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1740ms | 45.9232μs | 21.7755 KOps/s | 21.9187 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2848ms | 0.1189ms | 8.4083 KOps/s | 8.3939 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 80.7210μs | 26.6745μs | 37.4890 KOps/s | 38.3240 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1730ms | 59.0825μs | 16.9255 KOps/s | 17.1276 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.3633ms | 77.6924μs | 12.8713 KOps/s | 12.7929 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1266ms | 66.6663μs | 15.0001 KOps/s | 14.9432 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.2584ms | 0.1048ms | 9.5408 KOps/s | 9.4630 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3932ms | 0.2115ms | 4.7280 KOps/s | 4.6274 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.2390ms | 45.4254μs | 22.0141 KOps/s | 21.4143 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.5023ms | 66.4343μs | 15.0525 KOps/s | 14.6494 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2503ms | 0.1025ms | 9.7571 KOps/s | 9.6369 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.4621ms | 0.1977ms | 5.0582 KOps/s | 4.9444 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.5003ms | 0.2278ms | 4.3890 KOps/s | 4.2975 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2106ms | 0.1032ms | 9.6932 KOps/s | 9.0870 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1423ms | 62.1489μs | 16.0904 KOps/s | 15.7295 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1409ms | 46.7859μs | 21.3740 KOps/s | 21.2906 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.1943ms | 0.1567ms | 6.3834 KOps/s | 6.3975 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2185ms | 0.1016ms | 9.8383 KOps/s | 9.6278 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 77.4750μs | 21.3447μs | 46.8501 KOps/s | 46.6089 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1268ms | 65.7317μs | 15.2134 KOps/s | 14.7783 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1516ms | 77.8115μs | 12.8516 KOps/s | 12.4167 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1358ms | 66.8626μs | 14.9560 KOps/s | 14.8524 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.5603ms | 0.2077ms | 4.8140 KOps/s | 4.8130 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.3384ms | 1.3218ms | 756.5484 Ops/s | 764.1802 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.7701ms | 0.2062ms | 4.8499 KOps/s | 4.8343 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 0.9175ms | 0.7703ms | 1.2981 KOps/s | 1.2174 KOps/s | |
test_compile_assign_and_add_stack[compile] | 0.8202ms | 0.4550ms | 2.1976 KOps/s | 2.1689 KOps/s | |
test_compile_assign_and_add_stack[eager] | 3.0584ms | 2.7401ms | 364.9447 Ops/s | 393.7284 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.2345ms | 37.7343μs | 26.5011 KOps/s | 27.6066 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.6436ms | 33.6577μs | 29.7109 KOps/s | 30.1967 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 98.3730μs | 29.9385μs | 33.4018 KOps/s | 34.2140 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.2096ms | 23.2320μs | 43.0440 KOps/s | 43.3744 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 96.0100μs | 30.8566μs | 32.4080 KOps/s | 33.6282 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.1091ms | 23.0572μs | 43.3703 KOps/s | 43.8603 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1392ms | 52.3701μs | 19.0949 KOps/s | 19.0326 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.6733ms | 20.4640μs | 48.8664 KOps/s | 48.5273 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1653ms | 44.7785μs | 22.3322 KOps/s | 22.7484 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 0.1077ms | 18.8925μs | 52.9311 KOps/s | 54.2203 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1129ms | 45.0179μs | 22.2134 KOps/s | 22.3369 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 68.1770μs | 18.7976μs | 53.1983 KOps/s | 54.7352 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.2699ms | 53.0842μs | 18.8380 KOps/s | 18.2959 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.0086ms | 20.2937μs | 49.2764 KOps/s | 50.3167 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1168ms | 45.2870μs | 22.0814 KOps/s | 22.1497 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 83.8600μs | 18.7685μs | 53.2809 KOps/s | 54.1608 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1391ms | 45.5874μs | 21.9359 KOps/s | 22.2767 KOps/s | |
test_compile_indexing[int-pytree-eager] | 88.1240μs | 19.0935μs | 52.3739 KOps/s | 54.2129 KOps/s | |
test_mod_add[eager] | 0.1007ms | 35.0369μs | 28.5413 KOps/s | 30.2198 KOps/s | |
test_mod_add[compile] | 0.1736ms | 49.4849μs | 20.2082 KOps/s | 20.5096 KOps/s | |
test_mod_add[compile-overhead] | 0.1560ms | 48.9675μs | 20.4217 KOps/s | 20.3653 KOps/s | |
test_mod_wrap[eager] | 0.5347ms | 0.2195ms | 4.5561 KOps/s | 4.4726 KOps/s | |
test_mod_wrap[compile] | 0.5324ms | 0.2082ms | 4.8029 KOps/s | 4.7935 KOps/s | |
test_mod_wrap[compile-overhead] | 0.4206ms | 0.2049ms | 4.8802 KOps/s | 4.8319 KOps/s | |
test_mod_wrap_and_backward[eager] | 12.3741ms | 10.9780ms | 91.0914 Ops/s | 87.3708 Ops/s | |
test_mod_wrap_and_backward[compile] | 12.3040ms | 10.9432ms | 91.3808 Ops/s | 72.8930 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 12.7825ms | 11.0645ms | 90.3789 Ops/s | 77.2105 Ops/s | |
test_seq_add[eager] | 0.4073ms | 0.1200ms | 8.3305 KOps/s | 8.6779 KOps/s | |
test_seq_add[compile] | 0.1437ms | 65.6847μs | 15.2242 KOps/s | 16.0620 KOps/s | |
test_seq_add[compile-overhead] | 0.1954ms | 63.8782μs | 15.6548 KOps/s | 16.0536 KOps/s | |
test_seq_wrap[eager] | 0.6229ms | 0.4460ms | 2.2421 KOps/s | 2.2555 KOps/s | |
test_seq_wrap[compile] | 0.4305ms | 0.2338ms | 4.2779 KOps/s | 4.3815 KOps/s | |
test_seq_wrap[compile-overhead] | 0.4514ms | 0.2298ms | 4.3520 KOps/s | 4.4140 KOps/s | |
test_func_call_runtime[False-eager] | 0.7527ms | 0.5366ms | 1.8636 KOps/s | 1.8333 KOps/s | |
test_func_call_runtime[False-compile] | 0.5471ms | 0.4287ms | 2.3327 KOps/s | 2.3219 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.5294ms | 0.4272ms | 2.3411 KOps/s | 2.3354 KOps/s | |
test_func_call_runtime[True-eager] | 1.0641ms | 0.7428ms | 1.3462 KOps/s | 1.3357 KOps/s | |
test_func_call_runtime[True-compile] | 0.9286ms | 0.4720ms | 2.1184 KOps/s | 2.1266 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6170ms | 0.4729ms | 2.1145 KOps/s | 2.1454 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.9017ms | 0.5363ms | 1.8647 KOps/s | 1.8590 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.7954ms | 0.4316ms | 2.3168 KOps/s | 2.3620 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.9542ms | 0.4297ms | 2.3273 KOps/s | 2.3343 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.2443ms | 0.8888ms | 1.1251 KOps/s | 1.1110 KOps/s | |
test_func_call_cm_runtime[True-compile] | 0.8576ms | 0.4955ms | 2.0183 KOps/s | 2.0236 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.8277ms | 0.4981ms | 2.0078 KOps/s | 2.0226 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.6871ms | 1.9067ms | 524.4755 Ops/s | 521.4270 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.7842ms | 0.5170ms | 1.9344 KOps/s | 1.9137 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.9201ms | 0.5216ms | 1.9171 KOps/s | 1.9022 KOps/s | |
test_distributed | 0.2653ms | 0.1259ms | 7.9407 KOps/s | 7.5138 KOps/s | |
test_tdmodule | 85.9200μs | 26.4485μs | 37.8093 KOps/s | 39.5222 KOps/s | |
test_tdmodule_dispatch | 92.7530μs | 50.2864μs | 19.8861 KOps/s | 22.1510 KOps/s | |
test_tdseq | 59.5510μs | 30.5913μs | 32.6891 KOps/s | 37.0589 KOps/s | |
test_tdseq_dispatch | 88.8460μs | 56.8898μs | 17.5778 KOps/s | 19.7436 KOps/s | |
test_instantiation_functorch | 2.1888ms | 1.5079ms | 663.1709 Ops/s | 647.3114 Ops/s | |
test_exec_functorch | 0.3175ms | 0.1816ms | 5.5053 KOps/s | 5.5630 KOps/s | |
test_exec_functional_call | 0.3180ms | 0.1723ms | 5.8034 KOps/s | 5.7565 KOps/s | |
test_exec_td_decorator | 0.4848ms | 0.2322ms | 4.3062 KOps/s | 4.2154 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8368ms | 0.6453ms | 1.5496 KOps/s | 1.5512 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9173ms | 0.6416ms | 1.5585 KOps/s | 1.5488 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7814ms | 0.5217ms | 1.9167 KOps/s | 1.8756 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.9348ms | 0.5253ms | 1.9038 KOps/s | 1.8893 KOps/s | |
test_to_module_speed[True] | 2.2186ms | 1.3209ms | 757.0715 Ops/s | 743.4904 Ops/s | |
test_to_module_speed[False] | 1.9408ms | 1.2860ms | 777.5877 Ops/s | 769.1419 Ops/s | |
test_tc_init | 89.0460μs | 47.4220μs | 21.0872 KOps/s | 21.8572 KOps/s | |
test_tc_init_nested | 0.2160ms | 95.6714μs | 10.4524 KOps/s | 10.8073 KOps/s | |
test_tc_first_layer_tensor | 17.1020μs | 1.5335μs | 652.1198 KOps/s | 643.0244 KOps/s | |
test_tc_first_layer_nontensor | 23.7950μs | 4.6349μs | 215.7541 KOps/s | 212.6060 KOps/s | |
test_tc_second_layer_tensor | 41.2370μs | 2.8009μs | 357.0219 KOps/s | 348.7563 KOps/s | |
test_tc_second_layer_nontensor | 54.4620μs | 5.9851μs | 167.0813 KOps/s | 164.5807 KOps/s | |
test_unbind | 0.2401s | 14.1453ms | 70.6950 Ops/s | 77.6404 Ops/s | |
test_full_like | 12.8849ms | 9.1157ms | 109.7006 Ops/s | 80.5217 Ops/s | |
test_zeros_like | 3.6105ms | 3.0884ms | 323.7962 Ops/s | 127.2016 Ops/s | |
test_ones_like | 4.0997ms | 3.4906ms | 286.4838 Ops/s | 124.5687 Ops/s | |
test_clone | 6.1469ms | 5.4104ms | 184.8289 Ops/s | 105.2500 Ops/s | |
test_squeeze | 65.3420μs | 12.1372μs | 82.3914 KOps/s | 82.3183 KOps/s | |
test_unsqueeze | 0.1714ms | 90.3071μs | 11.0733 KOps/s | 11.0308 KOps/s | |
test_split | 0.4723ms | 0.1947ms | 5.1362 KOps/s | 5.1149 KOps/s | |
test_permute | 0.5393ms | 0.1944ms | 5.1430 KOps/s | 4.8504 KOps/s | |
test_stack | 31.4204ms | 25.9278ms | 38.5686 Ops/s | 39.6751 Ops/s | |
test_cat | 31.6721ms | 25.6890ms | 38.9271 Ops/s | 39.5518 Ops/s |
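
The Change column in the table above is empty, so regressions have to be read off by comparing the Ops and Ops on Repo HEAD columns by hand. As a rough sketch only (the helper names `parse_ops` and `relative_change` are hypothetical and are not part of the project's benchmark tooling), the relative change for a row could be computed like this, assuming the row layout and the `Ops/s`, `KOps/s`, `MOps/s` units shown in the table:

```python
import re

# Multipliers for the throughput units that appear in the "Ops" columns.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell):
    """Convert a cell such as '46.9963 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([KM]?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(row):
    """Return (test name, (ops - head_ops) / head_ops) for one table row."""
    cells = [cell.strip() for cell in row.split("|")]
    name = cells[0]
    ops = parse_ops(cells[3])       # "Ops" column (this PR)
    head_ops = parse_ops(cells[4])  # "Ops on Repo HEAD" column
    return name, (ops - head_ops) / head_ops


# Two rows copied verbatim from the table above.
rows = [
    "test_creation_empty | 36.8180μs | 13.4418μs | 74.3948 KOps/s | 105.3372 KOps/s | |",
    "test_zeros_like | 3.6105ms | 3.0884ms | 323.7962 Ops/s | 127.2016 Ops/s | |",
]
for row in rows:
    name, change = relative_change(row)
    print(f"{name}: {change:+.1%}")
```

A negative value means the PR branch is slower than Repo HEAD for that test; a positive value means it is faster. Single benchmark runs are noisy, so small differences should not be over-interpreted.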

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 28.8010μs | 12.6360μs | 79.1390 KOps/s | 79.4643 KOps/s | |
test_plain_set_stack_nested | 40.5000μs | 12.8580μs | 77.7728 KOps/s | 78.6394 KOps/s | |
test_plain_set_nested_inplace | 0.4025ms | 13.7000μs | 72.9926 KOps/s | 73.2416 KOps/s | |
test_plain_set_stack_nested_inplace | 0.2016ms | 13.7940μs | 72.4955 KOps/s | 73.1424 KOps/s | |
test_items | 45.5410μs | 2.9214μs | 342.3045 KOps/s | 339.9020 KOps/s | |
test_items_nested | 0.7671ms | 0.3610ms | 2.7700 KOps/s | 2.7465 KOps/s | |
test_items_nested_locked | 0.7598ms | 0.3644ms | 2.7439 KOps/s | 2.7350 KOps/s | |
test_items_nested_leaf | 0.4767ms | 58.3399μs | 17.1409 KOps/s | 17.2987 KOps/s | |
test_items_stack_nested | 0.7748ms | 0.3655ms | 2.7358 KOps/s | 2.7238 KOps/s | |
test_items_stack_nested_leaf | 0.1031ms | 60.1812μs | 16.6165 KOps/s | 16.7964 KOps/s | |
test_items_stack_nested_locked | 0.7714ms | 0.3657ms | 2.7341 KOps/s | 2.7283 KOps/s | |
test_keys | 0.4084ms | 3.4814μs | 287.2423 KOps/s | 289.3488 KOps/s | |
test_keys_nested | 0.4827ms | 87.7129μs | 11.4008 KOps/s | 11.3857 KOps/s | |
test_keys_nested_locked | 0.7921ms | 93.7056μs | 10.6717 KOps/s | 10.6989 KOps/s | |
test_keys_nested_leaf | 0.4738ms | 79.2086μs | 12.6249 KOps/s | 12.9069 KOps/s | |
test_keys_stack_nested | 0.1221ms | 89.1559μs | 11.2163 KOps/s | 11.2851 KOps/s | |
test_keys_stack_nested_leaf | 0.4974ms | 80.6762μs | 12.3952 KOps/s | 12.6455 KOps/s | |
test_keys_stack_nested_locked | 0.5134ms | 95.4641μs | 10.4751 KOps/s | 10.6555 KOps/s | |
test_values | 67.3895μs | 0.8615μs | 1.1607 MOps/s | 1.1658 MOps/s | |
test_values_nested | 0.4364ms | 37.6510μs | 26.5597 KOps/s | 26.9254 KOps/s | |
test_values_nested_locked | 0.4364ms | 39.1383μs | 25.5504 KOps/s | 25.9511 KOps/s | |
test_values_nested_leaf | 0.4380ms | 41.9690μs | 23.8271 KOps/s | 24.0856 KOps/s | |
test_values_stack_nested | 0.4498ms | 38.2031μs | 26.1759 KOps/s | 26.2754 KOps/s | |
test_values_stack_nested_leaf | 81.9120μs | 42.7058μs | 23.4160 KOps/s | 23.7881 KOps/s | |
test_values_stack_nested_locked | 0.4378ms | 39.9701μs | 25.0187 KOps/s | 25.3387 KOps/s | |
test_membership | 20.7304μs | 0.5117μs | 1.9542 MOps/s | 1.9668 MOps/s | |
test_membership_nested | 0.2119ms | 1.9704μs | 507.5000 KOps/s | 492.8007 KOps/s | |
test_membership_nested_leaf | 0.2154ms | 1.9876μs | 503.1275 KOps/s | 509.0535 KOps/s | |
test_membership_stacked_nested | 20.9100μs | 2.0272μs | 493.2829 KOps/s | 486.6987 KOps/s | |
test_membership_stacked_nested_leaf | 0.4079ms | 2.0196μs | 495.1362 KOps/s | 491.5500 KOps/s | |
test_membership_nested_last | 41.5600μs | 3.0468μs | 328.2143 KOps/s | 323.9584 KOps/s | |
test_membership_nested_leaf_last | 0.3996ms | 3.0843μs | 324.2175 KOps/s | 327.2971 KOps/s | |
test_membership_stacked_nested_last | 37.4110μs | 3.7814μs | 264.4519 KOps/s | 260.5690 KOps/s | |
test_membership_stacked_nested_leaf_last | 44.8510μs | 3.8310μs | 261.0315 KOps/s | 265.5028 KOps/s | |
test_nested_getleaf | 0.4277ms | 6.1580μs | 162.3916 KOps/s | 162.8336 KOps/s | |
test_nested_get | 0.4083ms | 5.8630μs | 170.5600 KOps/s | 171.6812 KOps/s | |
test_stacked_getleaf | 49.1610μs | 6.1631μs | 162.2555 KOps/s | 163.8031 KOps/s | |
test_stacked_get | 33.0210μs | 5.8258μs | 171.6494 KOps/s | 173.6668 KOps/s | |
test_nested_getitemleaf | 53.5010μs | 6.3853μs | 156.6103 KOps/s | 156.9453 KOps/s | |
test_nested_getitem | 0.4268ms | 6.0931μs | 164.1214 KOps/s | 164.9916 KOps/s | |
test_stacked_getitemleaf | 33.6510μs | 6.4240μs | 155.6661 KOps/s | 156.7016 KOps/s | |
test_stacked_getitem | 0.4305ms | 6.1217μs | 163.3527 KOps/s | 163.1128 KOps/s | |
test_lock_nested | 0.7787ms | 0.3695ms | 2.7061 KOps/s | 2.6529 KOps/s | |
test_lock_stack_nested | 0.4156ms | 0.3387ms | 2.9528 KOps/s | 2.9711 KOps/s | |
test_unlock_nested | 0.7001ms | 0.3120ms | 3.2054 KOps/s | 3.2222 KOps/s | |
test_unlock_stack_nested | 0.3961ms | 0.2791ms | 3.5826 KOps/s | 3.6163 KOps/s | |
test_flatten_speed | 0.4957ms | 74.8266μs | 13.3642 KOps/s | 13.4923 KOps/s | |
test_unflatten_speed | 0.7295ms | 0.3171ms | 3.1535 KOps/s | 3.1560 KOps/s | |
test_common_ops | 1.6485ms | 0.6364ms | 1.5713 KOps/s | 1.6045 KOps/s | |
test_creation | 0.1054ms | 1.7302μs | 577.9818 KOps/s | 578.7302 KOps/s | |
test_creation_empty | 43.9700μs | 9.0000μs | 111.1116 KOps/s | 111.7880 KOps/s | |
test_creation_nested_1 | 35.9500μs | 10.6815μs | 93.6201 KOps/s | 93.7699 KOps/s | |
test_creation_nested_2 | 55.6410μs | 13.4337μs | 74.4396 KOps/s | 74.7594 KOps/s | |
test_clone | 95.7110μs | 10.8261μs | 92.3696 KOps/s | 93.6198 KOps/s | |
test_getitem[int] | 1.4978ms | 10.5823μs | 94.4972 KOps/s | 94.3335 KOps/s | |
test_getitem[slice_int] | 0.1058ms | 20.8162μs | 48.0394 KOps/s | 48.8339 KOps/s | |
test_getitem[range] | 0.1526ms | 37.6808μs | 26.5387 KOps/s | 27.1785 KOps/s | |
test_getitem[tuple] | 0.1120ms | 18.0116μs | 55.5197 KOps/s | 55.7979 KOps/s | |
test_getitem[list] | 0.2467ms | 34.0303μs | 29.3856 KOps/s | 30.6461 KOps/s | |
test_setitem_dim[int] | 39.0510μs | 19.8906μs | 50.2749 KOps/s | 52.9328 KOps/s | |
test_setitem_dim[slice_int] | 86.1810μs | 38.9423μs | 25.6790 KOps/s | 26.3875 KOps/s | |
test_setitem_dim[range] | 79.8720μs | 53.2165μs | 18.7912 KOps/s | 19.0993 KOps/s | |
test_setitem_dim[tuple] | 54.7810μs | 32.6934μs | 30.5872 KOps/s | 31.3968 KOps/s | |
test_setitem | 0.2072ms | 15.6575μs | 63.8673 KOps/s | 63.9478 KOps/s | |
test_set | 0.1008ms | 15.3735μs | 65.0470 KOps/s | 65.7743 KOps/s | |
test_set_shared | 1.5221ms | 0.1515ms | 6.6026 KOps/s | 6.6049 KOps/s | |
test_update | 1.0424ms | 18.4463μs | 54.2114 KOps/s | 50.9721 KOps/s | |
test_update_nested | 96.9620μs | 23.8896μs | 41.8592 KOps/s | 41.2168 KOps/s | |
test_update__nested | 0.1296ms | 25.7101μs | 38.8951 KOps/s | 39.5396 KOps/s | |
test_set_nested | 97.6210μs | 16.0920μs | 62.1427 KOps/s | 60.5397 KOps/s | |
test_set_nested_new | 93.9920μs | 19.1217μs | 52.2966 KOps/s | 52.5143 KOps/s | |
test_select | 0.1183ms | 31.0543μs | 32.2016 KOps/s | 32.3276 KOps/s | |
test_select_nested | 88.4320μs | 43.6341μs | 22.9178 KOps/s | 22.9291 KOps/s | |
test_exclude_nested | 0.1065ms | 63.0735μs | 15.8545 KOps/s | 16.3064 KOps/s | |
test_empty[True] | 0.4329ms | 0.3006ms | 3.3265 KOps/s | 3.3690 KOps/s | |
test_empty[False] | 39.9987μs | 0.8108μs | 1.2334 MOps/s | 1.2122 MOps/s | |
test_to | 86.4720μs | 54.6660μs | 18.2929 KOps/s | 16.9875 KOps/s | |
test_to_nonblocking | 0.1991ms | 47.3760μs | 21.1077 KOps/s | 20.0496 KOps/s | |
test_unbind_speed | 1.7436ms | 0.2325ms | 4.3004 KOps/s | 4.3092 KOps/s | |
test_unbind_speed_stack0 | 0.3706ms | 0.2338ms | 4.2779 KOps/s | 4.3236 KOps/s | |
test_unbind_speed_stack1 | 93.1921ms | 0.6576ms | 1.5207 KOps/s | 1.5175 KOps/s | |
test_split | 95.6654ms | 1.5887ms | 629.4518 Ops/s | 583.1948 Ops/s | |
test_chunk | 95.2452ms | 1.5826ms | 631.8824 Ops/s | 695.6165 Ops/s | |
test_consolidate[False-None] | 97.2070ms | 2.9156ms | 342.9864 Ops/s | 340.8040 Ops/s | |
test_consolidate[default-None] | 1.8280ms | 1.6674ms | 599.7328 Ops/s | 601.2249 Ops/s | |
test_consolidate[reduce-overhead-None] | 1.8853ms | 1.7064ms | 586.0240 Ops/s | 588.6689 Ops/s | |
test_consolidate_njt[False-None] | 6.5827ms | 6.4777ms | 154.3758 Ops/s | 154.0551 Ops/s | |
test_to[False-False-None] | 1.8404ms | 1.6891ms | 592.0373 Ops/s | 588.5170 Ops/s | |
test_to[True-False-None] | 1.7702ms | 1.3313ms | 751.1506 Ops/s | 772.8905 Ops/s | |
test_to[within-False-None] | 4.2128ms | 4.0807ms | 245.0550 Ops/s | 243.3478 Ops/s | |
test_to[True-default-None] | 5.4571ms | 5.3122ms | 188.2474 Ops/s | 189.7531 Ops/s | |
test_to_njt[False-False-None] | 6.9885ms | 6.8366ms | 146.2717 Ops/s | 144.2820 Ops/s | |
test_to_njt[True-False-None] | 5.6663ms | 5.4499ms | 183.4908 Ops/s | 184.7188 Ops/s | |
test_to_njt[within-False-None] | 12.2051ms | 12.0071ms | 83.2844 Ops/s | 82.8790 Ops/s | |
test_creation[device0] | 0.3745ms | 79.7022μs | 12.5467 KOps/s | 11.9475 KOps/s | |
test_creation_from_tensor | 0.5421ms | 83.4786μs | 11.9791 KOps/s | 11.5037 KOps/s | |
test_add_one[memmap_tensor0] | 0.4343ms | 6.8663μs | 145.6382 KOps/s | 148.7211 KOps/s | |
test_contiguous[memmap_tensor0] | 1.7840μs | 0.4043μs | 2.4733 MOps/s | 2.4972 MOps/s | |
test_stack[memmap_tensor0] | 37.0510μs | 4.3499μs | 229.8908 KOps/s | 237.8936 KOps/s | |
test_memmaptd_index | 1.6517ms | 0.2542ms | 3.9341 KOps/s | 4.1236 KOps/s | |
test_memmaptd_index_astensor | 0.9365ms | 0.3157ms | 3.1676 KOps/s | 3.2712 KOps/s | |
test_memmaptd_index_op | 1.0024ms | 0.6031ms | 1.6580 KOps/s | 1.7271 KOps/s | |
test_serialize_model | 0.1312s | 0.1302s | 7.6816 Ops/s | 7.6507 Ops/s | |
test_serialize_model_pickle | 1.3504s | 1.1924s | 0.8386 Ops/s | 0.8225 Ops/s | |
test_serialize_weights | 0.1322s | 0.1303s | 7.6763 Ops/s | 7.6865 Ops/s | |
test_serialize_weights_returnearly | 0.5040s | 71.9870ms | 13.8914 Ops/s | 14.6701 Ops/s | |
test_serialize_weights_pickle | 1.3771s | 1.1946s | 0.8371 Ops/s | 0.8199 Ops/s | |
test_reshape_pytree | 0.1127ms | 21.8656μs | 45.7339 KOps/s | 45.1356 KOps/s | |
test_reshape_td | 48.5900μs | 25.8960μs | 38.6160 KOps/s | 36.4544 KOps/s | |
test_view_pytree | 0.1719ms | 21.8560μs | 45.7540 KOps/s | 46.2352 KOps/s | |
test_view_td | 0.1576ms | 30.2989μs | 33.0045 KOps/s | 30.9353 KOps/s | |
test_unbind_pytree | 0.1339ms | 27.6445μs | 36.1736 KOps/s | 36.4197 KOps/s | |
test_unbind_td | 0.7664ms | 36.1885μs | 27.6331 KOps/s | 27.6710 KOps/s | |
test_split_pytree | 0.1495ms | 29.7568μs | 33.6058 KOps/s | 34.0359 KOps/s | |
test_split_td | 0.9804ms | 38.1604μs | 26.2052 KOps/s | 25.7727 KOps/s | |
test_add_pytree | 0.1079ms | 34.6243μs | 28.8815 KOps/s | 28.9885 KOps/s | |
test_add_td | 88.2720μs | 49.9785μs | 20.0086 KOps/s | 19.8795 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1705ms | 0.1193ms | 8.3839 KOps/s | 8.0859 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.2739ms | 0.1312ms | 7.6206 KOps/s | 7.5067 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.2377ms | 94.9048μs | 10.5369 KOps/s | 10.4138 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.3348ms | 0.1474ms | 6.7853 KOps/s | 6.6847 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.1859ms | 24.8034μs | 40.3170 KOps/s | 41.0853 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 65.9910μs | 29.1124μs | 34.3496 KOps/s | 34.0429 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.4405ms | 63.6862μs | 15.7020 KOps/s | 15.3392 KOps/s | |
test_compile_copy_nested[pytree-eager] | 87.1720μs | 48.7590μs | 20.5090 KOps/s | 20.1237 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1806ms | 0.1402ms | 7.1333 KOps/s | 6.9994 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3496ms | 0.2160ms | 4.6289 KOps/s | 4.6042 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.2413ms | 96.9231μs | 10.3175 KOps/s | 10.3134 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1973ms | 54.5533μs | 18.3307 KOps/s | 18.3919 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.1737ms | 0.1344ms | 7.4402 KOps/s | 7.4327 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6278ms | 0.4731ms | 2.1137 KOps/s | 2.0922 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.4086ms | 0.2603ms | 3.8421 KOps/s | 3.8423 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2586ms | 0.1413ms | 7.0788 KOps/s | 7.0932 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2249ms | 66.2841μs | 15.0866 KOps/s | 14.9813 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1520ms | 99.7629μs | 10.0238 KOps/s | 10.1902 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5449ms | 0.4038ms | 2.4762 KOps/s | 2.4854 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.1733ms | 0.1345ms | 7.4327 KOps/s | 7.5231 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 87.5010μs | 19.4273μs | 51.4740 KOps/s | 54.8891 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 60.4210μs | 31.2913μs | 31.9578 KOps/s | 31.4302 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1071ms | 70.0123μs | 14.2832 KOps/s | 14.3504 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1551ms | 50.5605μs | 19.7783 KOps/s | 19.4746 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 1.5853ms | 0.3896ms | 2.5670 KOps/s | 2.2511 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.7948ms | 2.5729ms | 388.6630 Ops/s | 382.9095 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 1.5779ms | 0.4376ms | 2.2851 KOps/s | 2.2970 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 2.8266ms | 2.5815ms | 387.3773 Ops/s | 386.6520 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.4861ms | 0.1179ms | 8.4805 KOps/s | 8.7881 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.6014ms | 80.7422μs | 12.3851 KOps/s | 12.4136 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.5126ms | 0.1102ms | 9.0715 KOps/s | 9.6022 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 2.7094ms | 71.8635μs | 13.9153 KOps/s | 15.0180 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1597ms | 0.1101ms | 9.0827 KOps/s | 9.5506 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.2410ms | 71.7396μs | 13.9393 KOps/s | 14.9660 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.2434ms | 0.1004ms | 9.9632 KOps/s | 10.1621 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1408ms | 16.9706μs | 58.9255 KOps/s | 58.6836 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.2640ms | 98.9054μs | 10.1107 KOps/s | 10.5874 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 0.1647ms | 16.4262μs | 60.8784 KOps/s | 64.7708 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.2850ms | 98.6066μs | 10.1413 KOps/s | 10.5658 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.1126ms | 15.8216μs | 63.2046 KOps/s | 64.9604 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.2804ms | 0.1026ms | 9.7506 KOps/s | 10.0463 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5518ms | 16.7304μs | 59.7715 KOps/s | 59.1362 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2520ms | 96.1915μs | 10.3959 KOps/s | 10.4995 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 96.3020μs | 15.7474μs | 63.5027 KOps/s | 65.5357 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.2233ms | 95.9462μs | 10.4225 KOps/s | 10.5175 KOps/s | |
test_compile_indexing[int-pytree-eager] | 92.8520μs | 15.9271μs | 62.7862 KOps/s | 65.4563 KOps/s | |
test_mod_add[eager] | 0.1873ms | 37.5566μs | 26.6265 KOps/s | 26.0395 KOps/s | |
test_mod_add[compile] | 0.3390ms | 78.4650μs | 12.7445 KOps/s | 12.5689 KOps/s | |
test_mod_add[compile-overhead] | 0.3216ms | 0.1634ms | 6.1183 KOps/s | 5.7544 KOps/s | |
test_mod_wrap[eager] | 0.3778ms | 0.2453ms | 4.0772 KOps/s | 3.9936 KOps/s | |
test_mod_wrap[compile] | 0.3974ms | 0.2787ms | 3.5879 KOps/s | 3.5583 KOps/s | |
test_mod_wrap[compile-overhead] | 6.9500ms | 3.7215ms | 268.7102 Ops/s | 265.9146 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.5921ms | 1.3996ms | 714.4700 Ops/s | 675.9173 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.4128ms | 1.2497ms | 800.1712 Ops/s | 730.0133 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.4289ms | 0.9250ms | 1.0811 KOps/s | 954.4387 Ops/s | |
test_seq_add[eager] | 0.3274ms | 0.1185ms | 8.4361 KOps/s | 8.3344 KOps/s | |
test_seq_add[compile] | 0.2448ms | 87.9137μs | 11.3748 KOps/s | 11.0080 KOps/s | |
test_seq_add[compile-overhead] | 0.3290ms | 0.1273ms | 7.8580 KOps/s | 7.8081 KOps/s | |
test_seq_wrap[eager] | 0.6148ms | 0.4167ms | 2.4000 KOps/s | 2.3664 KOps/s | |
test_seq_wrap[compile] | 0.4658ms | 0.2959ms | 3.3801 KOps/s | 3.3650 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3636ms | 0.2222ms | 4.5005 KOps/s | 4.3119 KOps/s | |
test_func_call_runtime[False-eager] | 0.9297ms | 0.7559ms | 1.3229 KOps/s | 1.2526 KOps/s | |
test_func_call_runtime[False-compile] | 0.9456ms | 0.7267ms | 1.3761 KOps/s | 1.3612 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4165ms | 0.3586ms | 2.7889 KOps/s | 2.7698 KOps/s | |
test_func_call_runtime[True-eager] | 1.0391ms | 0.9003ms | 1.1107 KOps/s | 1.1193 KOps/s | |
test_func_call_runtime[True-compile] | 0.9677ms | 0.7552ms | 1.3242 KOps/s | 1.3310 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.5184ms | 0.3778ms | 2.6469 KOps/s | 2.6199 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8441ms | 0.7175ms | 1.3937 KOps/s | 1.2641 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9000ms | 0.7343ms | 1.3619 KOps/s | 1.2523 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.5090ms | 0.3622ms | 2.7607 KOps/s | 2.7358 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.0861ms | 0.9852ms | 1.0150 KOps/s | 934.9338 Ops/s | |
test_func_call_cm_runtime[True-compile] | 0.9376ms | 0.7807ms | 1.2810 KOps/s | 1.2645 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.5052ms | 0.4084ms | 2.4488 KOps/s | 2.4433 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.6843ms | 2.1460ms | 465.9733 Ops/s | 474.7593 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 1.0389ms | 0.8502ms | 1.1762 KOps/s | 1.1508 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5343ms | 0.4069ms | 2.4573 KOps/s | 2.4490 KOps/s | |
test_distributed | 4.6210ms | 0.2414ms | 4.1428 KOps/s | 8.1015 KOps/s | |
test_tdmodule | 57.5410μs | 20.4778μs | 48.8334 KOps/s | 50.1350 KOps/s | |
test_tdmodule_dispatch | 78.1110μs | 36.4521μs | 27.4332 KOps/s | 27.4947 KOps/s | |
test_tdseq | 58.5310μs | 21.0201μs | 47.5734 KOps/s | 46.7700 KOps/s | |
test_tdseq_dispatch | 69.6010μs | 39.1402μs | 25.5492 KOps/s | 25.3246 KOps/s | |
test_instantiation_functorch | 1.6474ms | 1.5321ms | 652.7131 Ops/s | 657.8127 Ops/s | |
test_exec_functorch | 0.1996ms | 0.1464ms | 6.8286 KOps/s | 7.0815 KOps/s | |
test_exec_functional_call | 0.2803ms | 0.1403ms | 7.1297 KOps/s | 7.3007 KOps/s | |
test_exec_td_decorator | 0.3859ms | 0.1869ms | 5.3515 KOps/s | 5.4101 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8357ms | 0.6811ms | 1.4682 KOps/s | 1.4658 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8126ms | 0.6826ms | 1.4651 KOps/s | 1.4634 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7365ms | 0.6180ms | 1.6181 KOps/s | 1.5969 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7747ms | 0.6206ms | 1.6114 KOps/s | 1.5894 KOps/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.1584ms | 19.0321ms | 52.5428 Ops/s | 52.2756 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.1677ms | 19.0325ms | 52.5416 Ops/s | 51.8572 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.7835ms | 18.9602ms | 52.7421 Ops/s | 52.6647 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.3346ms | 18.9270ms | 52.8345 Ops/s | 52.6136 Ops/s | |
test_to_module_speed[True] | 1.6296ms | 0.9638ms | 1.0376 KOps/s | 1.0408 KOps/s | |
test_to_module_speed[False] | 1.4790ms | 0.9439ms | 1.0594 KOps/s | 1.0500 KOps/s | |
test_tc_init | 74.2220μs | 36.6187μs | 27.3084 KOps/s | 27.0592 KOps/s | |
test_tc_init_nested | 0.2272ms | 75.4294μs | 13.2574 KOps/s | 13.6972 KOps/s | |
test_tc_first_layer_tensor | 4.8900μs | 0.6945μs | 1.4400 MOps/s | 1.2373 MOps/s | |
test_tc_first_layer_nontensor | 37.2200μs | 2.2313μs | 448.1684 KOps/s | 444.6312 KOps/s | |
test_tc_second_layer_tensor | 7.6550μs | 1.4085μs | 709.9756 KOps/s | 700.6774 KOps/s | |
test_tc_second_layer_nontensor | 40.6800μs | 2.9624μs | 337.5594 KOps/s | 334.0619 KOps/s | |
test_unbind | 0.2292s | 9.9001ms | 101.0086 Ops/s | 145.2471 Ops/s | |
test_full_like | 10.5238ms | 9.4334ms | 106.0061 Ops/s | 105.5727 Ops/s | |
test_zeros_like | 5.2722ms | 4.3482ms | 229.9820 Ops/s | 233.6216 Ops/s | |
test_ones_like | 5.3862ms | 4.2632ms | 234.5671 Ops/s | 232.0524 Ops/s | |
test_clone | 7.0348ms | 6.5907ms | 151.7279 Ops/s | 150.9094 Ops/s | |
test_squeeze | 0.1546ms | 9.8031μs | 102.0086 KOps/s | 104.6865 KOps/s | |
test_unsqueeze | 0.2628ms | 73.5846μs | 13.5898 KOps/s | 13.3166 KOps/s | |
test_split | 0.4078ms | 0.1693ms | 5.9060 KOps/s | 6.1728 KOps/s | |
test_permute | 0.3642ms | 0.1858ms | 5.3823 KOps/s | 5.4279 KOps/s | |
test_stack | 51.6608ms | 50.9426ms | 19.6299 Ops/s | 19.3348 Ops/s | |
test_cat | 51.5917ms | 50.7316ms | 19.7116 Ops/s | 19.5790 Ops/s |
vmoens force-pushed the composite_lp_aggregate branch from 8c1335e to 04e1c1d (January 14, 2025 15:24)
I propose a global flag, `composite_lp_aggregate`, to handle the aggregation of log-probs.

So far, we have dealt with this using kwargs everywhere (in `ProbabilisticTDModule`, `ProbabilisticTDSequential`, `CompositeDistribution` and subclasses). The hierarchy of these classes, and what to do when args conflict, isn't easy to handle. It's also confusing for users, who I suspect will usually want to work with either collapsed or non-collapsed log-probs.

A global flag (set to `True` for now and `False` in the future) will make things easier to handle.

Globally, the v0.6.2 behaviour will not be changed. All users who rely on it will be informed about the upcoming change through a warning that will tell them to set the global var to `False` to accommodate upcoming changes. If they set it to `True`, nothing will change for them, but that also means that bugs will not be solved (we won't maintain the `True` behaviour).

When `composite_lp_aggregate() == True`, we'll have `aggregate_probabilities=True`, `include_sum=True` and `inplace=True` by default everywhere. When `composite_lp_aggregate() == False`, all of these will be set to `False`, meaning that any call to `whatever.log_prob(tensordict)` will return another tensordict containing the leaf log-probs.
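
To make the default resolution described above concrete, here is a minimal pure-Python sketch. It is not the code in this PR: `set_aggregate`, `resolve_defaults`, the module-level `_AGGREGATE` variable and the toy `log_prob` over a plain dict are all hypothetical names invented for illustration, and the sketch assumes that an explicitly passed kwarg still takes precedence over the global flag.

```python
# Hypothetical module-level state standing in for the proposed global flag.
_AGGREGATE = True  # the description says the flag is True for now


def composite_lp_aggregate() -> bool:
    """Return the current value of the (sketched) global flag."""
    return _AGGREGATE


def set_aggregate(value: bool) -> None:
    """Hypothetical setter; the name is an assumption for this sketch."""
    global _AGGREGATE
    _AGGREGATE = bool(value)


def resolve_defaults(aggregate_probabilities=None, include_sum=None, inplace=None):
    """Fill every kwarg left as None with the value of the global flag."""
    flag = composite_lp_aggregate()
    given = {
        "aggregate_probabilities": aggregate_probabilities,
        "include_sum": include_sum,
        "inplace": inplace,
    }
    return {name: flag if value is None else value for name, value in given.items()}


def log_prob(leaf_log_probs, aggregate_probabilities=None):
    """Return one summed value when aggregating, else a copy of the per-leaf mapping."""
    opts = resolve_defaults(aggregate_probabilities=aggregate_probabilities)
    if opts["aggregate_probabilities"]:
        return sum(leaf_log_probs.values())
    return dict(leaf_log_probs)


leaves = {"action_a": -1.0, "action_b": -2.5}  # made-up log-prob values

aggregated = log_prob(leaves)   # flag is True: a single summed value
set_aggregate(False)
per_leaf = log_prob(leaves)     # flag is False: the leaf log-probs
set_aggregate(True)             # restore the sketched default
```

The point of the sketch is that call sites no longer need to agree on three separate kwargs: each one falls back to a single source of truth unless the caller overrides it.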