Add TagUnquantizedNodesPass #7427
Conversation
For models that contain unquantized operations, this pass tags those operators so they stay on the CPU instead of being delegated to the NPU. For example, the deit-tiny-patch16-224 network has an unquantized scaled_dot_product_attention operation; when compiling to Vela, offloading that unquantized operation to the NPU causes invalid argument errors. This pass is designed to solve this problem.
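To make the idea concrete, here is a minimal sketch of what such a tagging pass can look like. It is not the implementation in this PR: the q/dq-sandwich check in `is_quantized_node` and the `keep_on_cpu` meta key that a partitioner would consult are both hypothetical names chosen for illustration.

```python
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  registers the quantized_decomposed ops
from executorch.exir.pass_base import ExportPass, PassResult

# The q/dq ops that surround a quantized operator after PT2E annotation.
DQ_OPS = {torch.ops.quantized_decomposed.dequantize_per_tensor.default}
Q_OPS = {torch.ops.quantized_decomposed.quantize_per_tensor.default}


def is_quantized_node(node: torch.fx.Node) -> bool:
    """Treat a node as quantized when its tensor inputs come from dequantize
    ops and all of its users are quantize ops (the usual q/dq sandwich)."""
    inputs_dq = all(
        arg.target in DQ_OPS for arg in node.args if isinstance(arg, torch.fx.Node)
    )
    outputs_q = all(user.target in Q_OPS for user in node.users)
    return inputs_dq and outputs_q


class TagUnquantizedNodesPass(ExportPass):
    """Tag operators that were not quantized so the partitioner can leave
    them on the CPU instead of delegating them to the NPU."""

    def call(self, graph_module: torch.fx.GraphModule) -> PassResult:
        for node in graph_module.graph.nodes:
            if node.op == "call_function" and not is_quantized_node(node):
                node.meta["keep_on_cpu"] = True  # hypothetical meta key
        return PassResult(graph_module, True)
```

A partitioner could then skip any node carrying the tag when deciding what to lower to the NPU backend.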
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7427
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit a5db1ed with merge base 82763a9. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@pytorchbot label "partner: arm"
Thanks for your contribution @Aleksei-grovety! Just for clarification: do you prefer ScaledDotProductAttention (SDPA) to run on the CPU in FP32 for precision reasons, or would you prefer to run it on the NPU? If you prefer the NPU, a different approach would be to decompose SDPA before quantization. An SDPA decomposition exists in https://github.com/pytorch/executorch/blob/main/backends/transforms/decompose_sdpa.py. You can add pre-quantization decompositions to
Note: doing this might require some additional changes to https://github.com/pytorch/executorch/blob/main/backends/arm/_passes/decompose_softmaxes_pass.py
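For reference, here is a rough sketch of the suggested alternative flow on a toy module. It assumes the pass in that file is named `DecomposeScaledDotProductAttention`, and it leaves out where exactly the decomposition would be registered in the Arm quantization pipeline; `TinyAttention` is just a stand-in for the real model.

```python
import torch
from executorch.backends.transforms.decompose_sdpa import (
    DecomposeScaledDotProductAttention,
)


class TinyAttention(torch.nn.Module):
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)


example_inputs = (
    torch.randn(1, 4, 16, 8),
    torch.randn(1, 4, 16, 8),
    torch.randn(1, 4, 16, 8),
)

# Pre-autograd capture keeps SDPA as a single node, as in the PT2E
# quantization flow.
graph_module = torch.export.export_for_training(
    TinyAttention(), example_inputs
).module()

# Decompose SDPA into matmul/softmax/etc. *before* quantization, so the
# quantizer annotates the constituent ops and no monolithic unquantized
# SDPA node is left to fall back to the CPU.
graph_module = DecomposeScaledDotProductAttention()(graph_module).graph_module
# ...then annotate and quantize graph_module (e.g. with the Arm quantizer).
```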