Subclass API (#966) #995

metascroy · 2024-10-02T22:09:59Z

Summary:

Adds new int8_dynamic_activation_intx_weight quantization with subclass API

Differential Revision: D62464487


pytorch-bot · 2024-10-02T22:10:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/995

✅ You can merge normally! (1 Unrelated Failure)

As of commit 41a40cb with merge base 09b8b3c:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run Regression Tests / test (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://download.pytorc... / linux-job (gh) (trunk failure)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-10-02T22:10:07Z

This pull request was exported from Phabricator. Differential Revision: D62464487

metascroy · 2024-10-02T22:14:33Z

torchao/quantization/quant_primitives.py

@@ -300,7 +300,7 @@ def _quantize_affine_no_dtype_cast(
    elif zero_point_domain is None:
        # This case handles quantization for float8 we expect no zero point and no zero point domain
        assert zero_point is None, "zero_point should be None when zero_point_domain is None"
-        quant = torch.clamp(input * scale.reciprocal(), quant_min, quant_max)
+        quant = torch.clamp(torch.round(input * (1.0 / scale)), quant_min, quant_max)


@jerryzh168, please confirm whether this is OK. It was needed to match the behavior of the other quantizer.

Hmm, it might be fine as long as all the tests pass, I think.
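
For readers following the thread, here is a minimal pure-Python sketch of what the one-line change does numerically. It is not the torchao implementation: `quantize_scale_only` and its `use_round` flag are hypothetical names made up for illustration, and plain lists stand in for tensors. Python's built-in `round` rounds halves to even, as `torch.round` does, so the rounding behavior should be comparable.

```python
def quantize_scale_only(values, scale, quant_min, quant_max, use_round=True):
    """Scale-only quantization of a list of floats (no zero point).

    Hypothetical helper for illustration; not part of torchao.
    """
    inv_scale = 1.0 / scale
    quantized = []
    for value in values:
        q = value * inv_scale
        if use_round:
            # Built-in round() rounds halves to even, like torch.round.
            q = round(q)
        # Clamp into the representable integer range.
        q = max(quant_min, min(quant_max, q))
        quantized.append(q)
    return quantized


values = [0.26, -0.74, 10.0]
# Behavior before the change: clamp only, so fractional values survive.
old_behavior = quantize_scale_only(values, 0.1, -8, 7, use_round=False)
# Behavior after the change: round first, then clamp.
new_behavior = quantize_scale_only(values, 0.1, -8, 7, use_round=True)
print(old_behavior)
print(new_behavior)
```

Without the rounding step, in-range inputs keep their fractional part after the clamp, which would explain a mismatch against a quantizer that always produces integer values.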

metascroy · 2024-10-02T22:15:11Z

torchao/quantization/quant_primitives.py

-        if preserve_zero:
-            zero_point = quant_min - torch.round(min_val_neg / scale)
-            zero_point = torch.clamp(zero_point, quant_min, quant_max)
+        if zero_point_domain is None:


@jerryzh168, please confirm whether this is OK. It was needed to get scale-only quantization in affine_quantized_tensor.

OK, should zero_point be None here?
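
As background for the question above, the removed `preserve_zero` lines can be sketched in plain Python as follows. This is an illustration under assumptions, not the torchao code: `choose_zero_point` is a hypothetical name, the scale formula is assumed (the diff does not show it), and the body of the new `zero_point_domain is None` branch is not shown in the diff, so it is not reproduced here.

```python
def choose_zero_point(min_val, max_val, quant_min, quant_max):
    """Return (scale, zero_point) for asymmetric quantization that keeps 0.0 exact.

    Hypothetical helper for illustration; not part of torchao.
    """
    # Extend the observed range so that 0.0 always lies inside it.
    min_val_neg = min(min_val, 0.0)
    max_val_pos = max(max_val, 0.0)
    # Assumption: the scale maps the float range onto the full integer range.
    # A real implementation would also guard against a zero scale.
    scale = (max_val_pos - min_val_neg) / float(quant_max - quant_min)
    # The two removed lines from the diff: compute the zero point, then clamp it.
    zero_point = quant_min - round(min_val_neg / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


scale, zero_point = choose_zero_point(-1.0, 3.0, 0, 15)
print(scale, zero_point)
```

In a scale-only scheme there is no such offset to compute, which is presumably why the reviewer asks whether `zero_point` should be `None` in that branch.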

metascroy · 2024-10-02T22:18:17Z

torchao/experimental/tests/test_linear_8bit_act_xbit_weight_subclass_quantizer.py

+        exported = torch.export.export(model, (activations,))
+
+        print("Compiling quantized model")
+        compiled = torch.compile(unwrapped_model)


@jerryzh168 do you see unification for compile and export coming soon? The fact that one requires an unwrapped tensor subclass and the other requires a wrapped one makes using this API inconvenient in torchchat.

yes, it's blocked by pytorch/pytorch#129682 and I heard @tugsbayasgalan is working on this

metascroy · 2024-10-02T22:28:18Z

@kimishpatel @jerryzh168 moving review over to GH. I hope I've addressed most of your concerns.

@jerryzh168, the fact that compile and export cannot handle the same model (export requires an unwrapped tensor subclass, compile requires a wrapped one, and eager can handle both) makes using this API inconvenient in torchchat. Do you know if there is planned unification there?

kimishpatel · 2024-10-07T17:22:02Z