[Feature] Type casting for tensorclass #735

Merged
merged 6 commits into from
Apr 24, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Apr 19, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 19, 2024

github-actions bot commented Apr 19, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: 14. Worsened: 6.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 39.5240μs | 16.4264μs | 60.8778 KOps/s | 57.6248 KOps/s | **+5.65%** |
| test_plain_set_stack_nested | 51.2060μs | 16.2940μs | 61.3721 KOps/s | 55.0578 KOps/s | **+11.47%** |
| test_plain_set_nested_inplace | 62.0870μs | 18.6214μs | 53.7017 KOps/s | 50.0012 KOps/s | **+7.40%** |
| test_plain_set_stack_nested_inplace | 76.0910μs | 18.4448μs | 54.2160 KOps/s | 50.1608 KOps/s | **+8.08%** |
| test_items | 16.8920μs | 2.5115μs | 398.1621 KOps/s | 401.1558 KOps/s | -0.75% |
| test_items_nested | 0.4215ms | 0.2736ms | 3.6550 KOps/s | 3.6086 KOps/s | +1.29% |
| test_items_nested_locked | 1.3434ms | 0.2777ms | 3.6015 KOps/s | 3.5866 KOps/s | +0.41% |
| test_items_nested_leaf | 0.1711ms | 75.3972μs | 13.2631 KOps/s | 12.9669 KOps/s | +2.28% |
| test_items_stack_nested | 0.4712ms | 0.2773ms | 3.6067 KOps/s | 3.5424 KOps/s | +1.81% |
| test_items_stack_nested_leaf | 0.1791ms | 78.5482μs | 12.7310 KOps/s | 12.6158 KOps/s | +0.91% |
| test_items_stack_nested_locked | 0.3979ms | 0.2798ms | 3.5736 KOps/s | 3.5836 KOps/s | -0.28% |
| test_keys | 22.5420μs | 3.9500μs | 253.1659 KOps/s | 257.2939 KOps/s | -1.60% |
| test_keys_nested | 0.2335ms | 0.1358ms | 7.3640 KOps/s | 7.1329 KOps/s | +3.24% |
| test_keys_nested_locked | 0.7738ms | 0.1403ms | 7.1254 KOps/s | 6.9272 KOps/s | +2.86% |
| test_keys_nested_leaf | 0.2820ms | 0.1142ms | 8.7564 KOps/s | 8.4079 KOps/s | +4.14% |
| test_keys_stack_nested | 0.2333ms | 0.1356ms | 7.3738 KOps/s | 7.2234 KOps/s | +2.08% |
| test_keys_stack_nested_leaf | 0.1966ms | 0.1137ms | 8.7945 KOps/s | 8.5124 KOps/s | +3.31% |
| test_keys_stack_nested_locked | 0.2656ms | 0.1402ms | 7.1319 KOps/s | 6.9659 KOps/s | +2.38% |
| test_values | 4.2138μs | 1.1495μs | 869.9109 KOps/s | 865.4515 KOps/s | +0.52% |
| test_values_nested | 0.1016ms | 50.7535μs | 19.7031 KOps/s | 19.6765 KOps/s | +0.13% |
| test_values_nested_locked | 0.1167ms | 50.8797μs | 19.6542 KOps/s | 19.5746 KOps/s | +0.41% |
| test_values_nested_leaf | 0.1022ms | 46.0612μs | 21.7102 KOps/s | 21.5835 KOps/s | +0.59% |
| test_values_stack_nested | 91.5120μs | 51.7659μs | 19.3177 KOps/s | 19.2761 KOps/s | +0.22% |
| test_values_stack_nested_leaf | 94.4580μs | 45.6934μs | 21.8850 KOps/s | 21.7435 KOps/s | +0.65% |
| test_values_stack_nested_locked | 0.1021ms | 51.5108μs | 19.4134 KOps/s | 19.3339 KOps/s | +0.41% |
| test_membership | 15.6990μs | 1.3519μs | 739.6953 KOps/s | 742.2275 KOps/s | -0.34% |
| test_membership_nested | 21.2600μs | 3.4262μs | 291.8721 KOps/s | 282.0566 KOps/s | +3.48% |
| test_membership_nested_leaf | 22.1920μs | 3.4803μs | 287.3341 KOps/s | 279.3527 KOps/s | +2.86% |
| test_membership_stacked_nested | 29.8760μs | 3.4213μs | 292.2826 KOps/s | 284.5989 KOps/s | +2.70% |
| test_membership_stacked_nested_leaf | 22.8520μs | 3.4227μs | 292.1676 KOps/s | 280.6026 KOps/s | +4.12% |
| test_membership_nested_last | 28.2140μs | 4.2457μs | 235.5345 KOps/s | 230.5678 KOps/s | +2.15% |
| test_membership_nested_leaf_last | 24.6660μs | 4.3074μs | 232.1595 KOps/s | 226.9169 KOps/s | +2.31% |
| test_membership_stacked_nested_last | 27.3510μs | 4.2671μs | 234.3497 KOps/s | 229.8824 KOps/s | +1.94% |
| test_membership_stacked_nested_leaf_last | 23.5350μs | 4.2580μs | 234.8529 KOps/s | 229.3107 KOps/s | +2.42% |
| test_nested_getleaf | 49.1120μs | 10.6362μs | 94.0185 KOps/s | 94.9897 KOps/s | -1.02% |
| test_nested_get | 35.8680μs | 9.8916μs | 101.0958 KOps/s | 98.7512 KOps/s | +2.37% |
| test_stacked_getleaf | 28.0030μs | 10.4839μs | 95.3840 KOps/s | 94.4797 KOps/s | +0.96% |
| test_stacked_get | 46.6480μs | 9.9675μs | 100.3259 KOps/s | 99.8346 KOps/s | +0.49% |
| test_nested_getitemleaf | 33.9840μs | 11.0987μs | 90.1004 KOps/s | 89.7836 KOps/s | +0.35% |
| test_nested_getitem | 47.7700μs | 10.2075μs | 97.9670 KOps/s | 96.3169 KOps/s | +1.71% |
| test_stacked_getitemleaf | 31.5490μs | 10.9318μs | 91.4760 KOps/s | 90.4240 KOps/s | +1.16% |
| test_stacked_getitem | 33.8130μs | 10.2044μs | 97.9968 KOps/s | 95.4603 KOps/s | +2.66% |
| test_lock_nested | 47.0381ms | 0.3889ms | 2.5712 KOps/s | 2.8448 KOps/s | **-9.62%** |
| test_lock_stack_nested | 0.5043ms | 0.3106ms | 3.2192 KOps/s | 3.2393 KOps/s | -0.62% |
| test_unlock_nested | 85.5961ms | 0.4304ms | 2.3233 KOps/s | 2.2408 KOps/s | +3.68% |
| test_unlock_stack_nested | 0.4482ms | 0.3196ms | 3.1285 KOps/s | 3.1523 KOps/s | -0.76% |
| test_flatten_speed | 0.4308ms | 93.2351μs | 10.7256 KOps/s | 10.4294 KOps/s | +2.84% |
| test_unflatten_speed | 0.7963ms | 0.4007ms | 2.4956 KOps/s | 2.4187 KOps/s | +3.18% |
| test_common_ops | 4.9507ms | 0.6950ms | 1.4389 KOps/s | 1.3905 KOps/s | +3.48% |
| test_creation | 60.1530μs | 1.8399μs | 543.5072 KOps/s | 533.3889 KOps/s | +1.90% |
| test_creation_empty | 44.3830μs | 9.5362μs | 104.8638 KOps/s | 85.3488 KOps/s | **+22.87%** |
| test_creation_nested_1 | 34.2040μs | 12.1631μs | 82.2159 KOps/s | 68.8644 KOps/s | **+19.39%** |
| test_creation_nested_2 | 43.3720μs | 15.5106μs | 64.4721 KOps/s | 56.1614 KOps/s | **+14.80%** |
| test_clone | 80.8620μs | 13.9505μs | 71.6822 KOps/s | 73.7176 KOps/s | -2.76% |
| test_getitem[int] | 32.8820μs | 11.5928μs | 86.2605 KOps/s | 87.3923 KOps/s | -1.30% |
| test_getitem[slice_int] | 49.9340μs | 23.5379μs | 42.4848 KOps/s | 43.6940 KOps/s | -2.77% |
| test_getitem[range] | 0.1487ms | 45.9620μs | 21.7571 KOps/s | 24.2416 KOps/s | **-10.25%** |
| test_getitem[tuple] | 56.4660μs | 19.3377μs | 51.7125 KOps/s | 53.7209 KOps/s | -3.74% |
| test_getitem[list] | 0.1708ms | 40.2612μs | 24.8378 KOps/s | 26.4996 KOps/s | **-6.27%** |
| test_setitem_dim[int] | 84.6100μs | 35.7975μs | 27.9349 KOps/s | 27.2537 KOps/s | +2.50% |
| test_setitem_dim[slice_int] | 0.1077ms | 61.5345μs | 16.2510 KOps/s | 15.9448 KOps/s | +1.92% |
| test_setitem_dim[range] | 0.1403ms | 80.5485μs | 12.4149 KOps/s | 12.4059 KOps/s | +0.07% |
| test_setitem_dim[tuple] | 80.9420μs | 50.3436μs | 19.8635 KOps/s | 19.5090 KOps/s | +1.82% |
| test_setitem | 76.4540μs | 20.2320μs | 49.4268 KOps/s | 48.2805 KOps/s | +2.37% |
| test_set | 68.3280μs | 19.3727μs | 51.6191 KOps/s | 49.7613 KOps/s | +3.73% |
| test_set_shared | 1.5290ms | 0.1405ms | 7.1155 KOps/s | 6.8008 KOps/s | +4.63% |
| test_update | 0.1208ms | 20.6559μs | 48.4123 KOps/s | 43.9798 KOps/s | **+10.08%** |
| test_update_nested | 0.1054ms | 29.5652μs | 33.8236 KOps/s | 32.0352 KOps/s | **+5.58%** |
| test_update__nested | 63.8900μs | 25.4121μs | 39.3513 KOps/s | 39.1256 KOps/s | +0.58% |
| test_set_nested | 79.1990μs | 21.4455μs | 46.6298 KOps/s | 45.2508 KOps/s | +3.05% |
| test_set_nested_new | 98.0140μs | 25.6789μs | 38.9425 KOps/s | 38.5005 KOps/s | +1.15% |
| test_select | 0.1012ms | 39.8723μs | 25.0801 KOps/s | 23.2148 KOps/s | **+8.03%** |
| test_select_nested | 0.1209ms | 58.6753μs | 17.0430 KOps/s | 16.7286 KOps/s | +1.88% |
| test_exclude_nested | 0.2227ms | 0.1191ms | 8.3938 KOps/s | 8.3306 KOps/s | +0.76% |
| test_empty[True] | 0.9618ms | 0.3896ms | 2.5665 KOps/s | 2.5244 KOps/s | +1.67% |
| test_empty[False] | 5.6266μs | 1.0459μs | 956.1383 KOps/s | 944.6686 KOps/s | +1.21% |
| test_unbind_speed | 1.7057ms | 0.2568ms | 3.8945 KOps/s | 3.8954 KOps/s | -0.02% |
| test_unbind_speed_stack0 | 0.3680ms | 0.2519ms | 3.9706 KOps/s | 3.9871 KOps/s | -0.42% |
| test_unbind_speed_stack1 | 0.1179s | 0.6944ms | 1.4400 KOps/s | 1.4419 KOps/s | -0.13% |
| test_split | 1.7398ms | 1.5182ms | 658.6861 Ops/s | 601.9216 Ops/s | **+9.43%** |
| test_chunk | 0.1088s | 1.6923ms | 590.9019 Ops/s | 674.8935 Ops/s | **-12.45%** |
| test_creation[device0] | 0.1899ms | 0.1028ms | 9.7265 KOps/s | 9.7275 KOps/s | -0.01% |
| test_creation_from_tensor | 3.6569ms | 82.7662μs | 12.0822 KOps/s | 12.0262 KOps/s | +0.47% |
| test_add_one[memmap_tensor0] | 0.1329ms | 5.5270μs | 180.9306 KOps/s | 177.1571 KOps/s | +2.13% |
| test_contiguous[memmap_tensor0] | 15.0890μs | 0.6392μs | 1.5645 MOps/s | 1.6032 MOps/s | -2.41% |
| test_stack[memmap_tensor0] | 41.1370μs | 3.6576μs | 273.4046 KOps/s | 287.7383 KOps/s | -4.98% |
| test_memmaptd_index | 0.9303ms | 0.2464ms | 4.0590 KOps/s | 4.1017 KOps/s | -1.04% |
| test_memmaptd_index_astensor | 0.5582ms | 0.3104ms | 3.2215 KOps/s | 3.2342 KOps/s | -0.39% |
| test_memmaptd_index_op | 0.9872ms | 0.6015ms | 1.6626 KOps/s | 1.5946 KOps/s | +4.26% |
| test_serialize_model | 0.2221s | 0.1140s | 8.7735 Ops/s | 8.7814 Ops/s | -0.09% |
| test_serialize_model_pickle | 0.4476s | 0.3747s | 2.6689 Ops/s | 2.6092 Ops/s | +2.29% |
| test_serialize_weights | 0.1008s | 96.1760ms | 10.3976 Ops/s | 10.0041 Ops/s | +3.93% |
| test_serialize_weights_returnearly | 0.2480s | 0.1365s | 7.3239 Ops/s | 7.1835 Ops/s | +1.95% |
| test_serialize_weights_pickle | 0.7051s | 0.4963s | 2.0149 Ops/s | 2.4942 Ops/s | **-19.22%** |
| test_serialize_weights_filesystem | 96.7477ms | 91.7259ms | 10.9020 Ops/s | 10.8695 Ops/s | +0.30% |
| test_serialize_model_filesystem | 0.1010s | 89.9391ms | 11.1186 Ops/s | 10.6806 Ops/s | +4.10% |
| test_reshape_pytree | 60.3540μs | 21.1625μs | 47.2534 KOps/s | 48.1524 KOps/s | -1.87% |
| test_reshape_td | 69.0500μs | 32.2799μs | 30.9790 KOps/s | 30.9518 KOps/s | +0.09% |
| test_view_pytree | 68.5090μs | 21.0883μs | 47.4198 KOps/s | 48.1996 KOps/s | -1.62% |
| test_view_td | 0.1140s | 60.9439μs | 16.4085 KOps/s | 16.0961 KOps/s | +1.94% |
| test_unbind_pytree | 72.6660μs | 25.0333μs | 39.9467 KOps/s | 41.1408 KOps/s | -2.90% |
| test_unbind_td | 95.8800μs | 37.7758μs | 26.4719 KOps/s | 27.1689 KOps/s | -2.57% |
| test_split_pytree | 54.5530μs | 24.4933μs | 40.8274 KOps/s | 42.3006 KOps/s | -3.48% |
| test_split_td | 0.1307ms | 41.5438μs | 24.0710 KOps/s | 24.7379 KOps/s | -2.70% |
| test_add_pytree | 70.6730μs | 30.2314μs | 33.0782 KOps/s | 33.0952 KOps/s | -0.05% |
| test_add_td | 0.1120ms | 54.1182μs | 18.4781 KOps/s | 16.6967 KOps/s | **+10.67%** |
| test_distributed | 0.1906ms | 99.9035μs | 10.0097 KOps/s | 9.7841 KOps/s | +2.31% |
| test_tdmodule | 31.3700μs | 17.0836μs | 58.5358 KOps/s | 56.2138 KOps/s | +4.13% |
| test_tdmodule_dispatch | 59.8120μs | 34.1968μs | 29.2425 KOps/s | 26.6553 KOps/s | **+9.71%** |
| test_tdseq | 34.8160μs | 19.6446μs | 50.9046 KOps/s | 48.5778 KOps/s | +4.79% |
| test_tdseq_dispatch | 62.1470μs | 38.3350μs | 26.0859 KOps/s | 24.8083 KOps/s | **+5.15%** |
| test_instantiation_functorch | 2.2105ms | 1.3857ms | 721.6506 Ops/s | 761.1774 Ops/s | **-5.19%** |
| test_instantiation_td | 1.6776ms | 1.0207ms | 979.7517 Ops/s | 997.8633 Ops/s | -1.82% |
| test_exec_functorch | 0.3081ms | 0.1592ms | 6.2817 KOps/s | 6.3433 KOps/s | -0.97% |
| test_exec_functional_call | 0.2897ms | 0.1480ms | 6.7580 KOps/s | 6.7827 KOps/s | -0.36% |
| test_exec_td | 0.2242ms | 0.1433ms | 6.9776 KOps/s | 6.8661 KOps/s | +1.62% |
| test_exec_td_decorator | 0.5676ms | 0.1995ms | 5.0130 KOps/s | 5.0406 KOps/s | -0.55% |
| test_vmap_mlp_speed[True-True] | 0.8761ms | 0.4789ms | 2.0881 KOps/s | 2.0748 KOps/s | +0.64% |
| test_vmap_mlp_speed[True-False] | 0.6913ms | 0.4734ms | 2.1122 KOps/s | 2.0798 KOps/s | +1.56% |
| test_vmap_mlp_speed[False-True] | 0.4831ms | 0.3893ms | 2.5686 KOps/s | 2.4658 KOps/s | +4.17% |
| test_vmap_mlp_speed[False-False] | 0.6334ms | 0.3860ms | 2.5907 KOps/s | 2.5203 KOps/s | +2.79% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0624ms | 0.4973ms | 2.0109 KOps/s | 2.0150 KOps/s | -0.20% |
| test_vmap_mlp_speed_decorator[True-False] | 0.6683ms | 0.4979ms | 2.0086 KOps/s | 2.0001 KOps/s | +0.43% |
| test_vmap_mlp_speed_decorator[False-True] | 0.6118ms | 0.4084ms | 2.4483 KOps/s | 2.4685 KOps/s | -0.82% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7344ms | 0.4094ms | 2.4425 KOps/s | 2.4556 KOps/s | -0.53% |
| test_to_module_speed[True] | 2.1149ms | 1.4304ms | 699.0860 Ops/s | 707.3422 Ops/s | -1.17% |
| test_to_module_speed[False] | 1.8812ms | 1.4049ms | 711.8111 Ops/s | 717.0471 Ops/s | -0.73% |
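
For readers checking the columns above: the figures look consistent with Ops being the reciprocal of Mean, and Change being the relative difference between Ops and Ops on Repo HEAD. That reading is inferred from the numbers, not stated by the benchmark bot, and the helper names below are hypothetical. A minimal sketch under those assumptions:

```python
# Assumption: Ops = 1 / Mean and Change = (Ops - HEAD) / HEAD * 100.
# Both formulas are inferred from the reported figures, not from the bot's source.

def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime given in seconds."""
    return 1.0 / mean_seconds


def relative_change_percent(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


# test_plain_set_nested: Mean 16.4264 μs, Ops 60.8778 KOps/s, HEAD 57.6248 KOps/s
kops_plain_set_nested = ops_per_second(16.4264e-6) / 1e3
change_plain_set_nested = relative_change_percent(60.8778, 57.6248)

# test_chunk: Ops 590.9019 Ops/s, HEAD 674.8935 Ops/s
change_chunk = relative_change_percent(590.9019, 674.8935)

print(f"{kops_plain_set_nested:.4f} KOps/s")
print(f"{change_plain_set_nested:+.2f}%")
print(f"{change_chunk:+.2f}%")
```

Because the reported values are themselves rounded, recomputed percentages can differ from the Change column in the last digit.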

@vmoens vmoens changed the title [WIP][Feature] Type casting for tensorclass [Feature] Type casting for tensorclass Apr 22, 2024
@vmoens vmoens added the enhancement New feature or request label Apr 22, 2024

@AlexandreBrown AlexandreBrown left a comment

Nice

@vmoens vmoens merged commit 2a73516 into main Apr 24, 2024
44 of 48 checks passed
@vmoens vmoens deleted the tensorclass-type-cast branch April 24, 2024 09:21
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request
3 participants