
[Inductor] [Doc] Add tutorial for Max-autotune Support on CPU as a prototype feature for PyTorch 2.5 #3063

Merged
merged 22 commits into pytorch:main on Oct 9, 2024

Conversation

@chunyuan-w (Contributor) commented Sep 27, 2024

Description

Max-autotune Support on CPU with GEMM Template is proposed as a prototype feature for PyTorch 2.5. This PR adds the tutorial for this feature.

Rendered version: link.


pytorch-bot bot commented Sep 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3063

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ad0c00c with merge base 1217b4c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -0,0 +1,198 @@
Max-autotune Support on CPU with GEMM Template Tutorial
Contributor:
The GEMM template is an implementation detail. I suggest naming the title from the user's perspective: "Use max-autotune compilation on CPU to gain further performance boost". Then, in the content of the tutorial, we can talk about the performance boost in GEMMs with the template implementation, etc.

Contributor Author:

Renamed to "Use max-autotune compilation on CPU to gain further performance boost"

Contributor:

There are docs prefixed with torch.compiler_ in docs/source. Shall we move the doc there instead?

Example code
------------
The code below is an example of using the ``max-autotune`` mode on a simple neural network consisting of a linear layer followed by a ReLU activation.
To run the example, set the environment variable ``export TORCHINDUCTOR_FREEZING=1``.
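The example itself is not reproduced in this diff view; a minimal sketch of what such a script might look like is shown below. The module name and tensor shapes are illustrative assumptions, not taken from the PR.

```python
import os
import torch

# The tutorial asks readers to export TORCHINDUCTOR_FREEZING=1;
# setting it here before compilation has the same effect.
os.environ["TORCHINDUCTOR_FREEZING"] = "1"

class M(torch.nn.Module):
    """A linear layer followed by a ReLU activation."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(32, 16)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

if __name__ == "__main__":
    model = M().eval()
    x = torch.randn(4, 32)
    # mode="max-autotune" lets Inductor benchmark candidate implementations
    # (e.g. the CPP GEMM template vs. ATen kernels) and pick the fastest.
    compiled = torch.compile(model, mode="max-autotune")
    # Freezing only applies under no_grad / inference mode.
    with torch.no_grad():
        y = compiled(x)
    print(y.shape)  # torch.Size([4, 16])
```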
Contributor:

Since we only support frozen models with torch.no_grad or inference mode, shall we highlight this fact here?

Contributor Author:

The requested note about frozen models and no_grad has been added.
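For context, the constraint under discussion can be sketched as follows (a minimal illustration, not the tutorial's actual code): freezing, and hence the CPP GEMM template path, only applies when inference runs without autograd tracking.

```python
import torch

# An illustrative eval-mode model; the specific layers are assumptions.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
).eval()

x = torch.randn(2, 8)

# Freezing (TORCHINDUCTOR_FREEZING=1) only takes effect when no autograd
# state is recorded, i.e. inside torch.no_grad() or torch.inference_mode().
with torch.no_grad():
    y = model(x)

assert not y.requires_grad
```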

----------------
- `torch.compile and TorchInductor concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__

Introduction
Contributor:

Perhaps add the RFC link in the intro as a reference too?

Contributor Author:

The RFC link has been added.

in the generated code anymore; instead, we'll find a kernel based on the CPP GEMM template, ``cpp_fused__to_copy_relu_1``
(only part of the code is shown below for simplicity), with the bias and ReLU epilogues fused inside the CPP GEMM template kernel.

.. code:: python
Contributor:

The code below is generated on CPUs with AMX support. You may consider adding a note here that the generated code differs per CPU architecture and is implementation-specific, subject to change.

Contributor Author:

The note has been added.

@chunyuan-w chunyuan-w changed the title [Inductor] [Doc] Add tutorial for Max-autotune Support on CPU with GEMM Template as a prototype feature for PyTorch 2.5 [Inductor] [Doc] Add tutorial for Max-autotune Support on CPU as a prototype feature for PyTorch 2.5 Sep 27, 2024
@chunyuan-w chunyuan-w marked this pull request as ready for review September 27, 2024 07:50
@chunyuan-w (Contributor Author):

Hi @svekars, could you help review this tutorial targeting PyTorch 2.5?

prototype_source/max_autotune_on_CPU_tutorial.rst (outdated review threads resolved)
@chunyuan-w (Contributor Author):

Hi @svekars, thanks for the suggestions! I've updated the PR. Could you take another look?

@chunyuan-w (Contributor Author):

Hi @svekars, may I know if any action is required on my side for this tutorial targeting PyTorch 2.5?

@svekars svekars added the 2.5 PR related to version 2.5 label Oct 9, 2024
@svekars svekars merged commit 19fffda into pytorch:main Oct 9, 2024
17 checks passed
Labels
2.5 PR related to version 2.5 cla signed
5 participants