[Inductor] [Doc] Add tutorial for Max-autotune Support on CPU as a prototype feature for PyTorch 2.5 #3063
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3063
Note: Links to docs will display an error until the doc builds have completed. ✅ No Failures as of commit ad0c00c with merge base 1217b4c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -0,0 +1,198 @@
Max-autotune Support on CPU with GEMM Template Tutorial
The GEMM template is an implementation detail. Suggest naming the title from the user's perspective: "Use max-autotune compilation on CPU to gain further performance boost". Then, in the content of the tutorial, we can talk about the perf boost in GEMMs with the template implementation, etc.
Renamed to "Use max-autotune compilation on CPU to gain further performance boost"
There are docs with names prefixed ``torch.compiler_`` in ``docs/source``. Shall we move the doc there instead?
Example code
------------
The code below is an example of using the ``max-autotune`` mode on a simple neural network with a linear layer followed by a ReLU activation.
You can run the example code by setting the environment variable ``export TORCHINDUCTOR_FREEZING=1``.
Since we only support frozen models with ``torch.no_grad`` or inference mode, shall we highlight this fact here?
The requirement on freezing and ``no_grad`` has been added.
----------------

- `torch.compile and TorchInductor concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__

Introduction
Perhaps add the RFC link in the intro as a reference too?
The RFC link has been added.
in the generated code anymore. Instead, we'll find a kernel based on the CPP GEMM template, ``cpp_fused__to_copy_relu_1``
(only part of the code is shown below for simplicity), with the bias and ReLU epilogues fused inside the CPP GEMM template kernel.

.. code:: python
The code below is generated on CPUs with AMX support. You may consider adding a note here that the generated code differs across CPU architectures and is implementation-specific, subject to change.
The note has been added.
Hi @svekars, could you help review this tutorial targeting PyTorch 2.5?
Co-authored-by: Svetlana Karslioglu <[email protected]>
Hi @svekars, thanks for the suggestions! I've updated the PR. Could you take another look?
Hi @svekars, may I know if there's any action required on my side for this tutorial targeting PyTorch 2.5?
Description
Max-autotune Support on CPU with GEMM Template is proposed as a prototype feature for PyTorch 2.5. This PR adds the tutorial for this feature.
Rendered version: link.