
[Inductor] [Doc] Add tutorial for Max-autotune Support on CPU as a prototype feature for PyTorch 2.5 #3063

Merged
merged 22 commits into pytorch:main on Oct 9, 2024

Conversation

@chunyuan-w (Contributor) commented Sep 27, 2024

Description

Max-autotune Support on CPU with GEMM Template is proposed as a prototype feature for PyTorch 2.5. This PR adds the tutorial for this feature.

Rendered version: link.


pytorch-bot bot commented Sep 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3063

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ad0c00c with merge base 1217b4c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -0,0 +1,198 @@
Max-autotune Support on CPU with GEMM Template Tutorial
Contributor:
The GEMM template is an implementation detail. I suggest naming the title from the user's perspective: "Use max-autotune compilation on CPU to gain further performance boost". Then, in the content of the tutorial, we can talk about the performance boost in GEMMs with the template implementation, etc.

Contributor Author:

Renamed to "Use max-autotune compilation on CPU to gain further performance boost"

Contributor:

There are docs prefixed with torch.compiler_ in docs/source. Shall we move the doc there instead?

Example code
------------
The code below is an example of using the ``max-autotune`` mode on a simple neural network consisting of a linear layer followed by a ReLU activation.
To run the example, set the environment variable ``export TORCHINDUCTOR_FREEZING=1``.
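The example itself is not reproduced in this diff view; a minimal sketch of what such a script might look like is shown below. The module name and tensor shapes are illustrative assumptions, not taken from the PR.

```python
import os
import torch

# The tutorial asks readers to export TORCHINDUCTOR_FREEZING=1;
# setting it here before compilation has the same effect.
os.environ["TORCHINDUCTOR_FREEZING"] = "1"

class M(torch.nn.Module):
    """A linear layer followed by a ReLU activation."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(32, 16)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

if __name__ == "__main__":
    model = M().eval()
    x = torch.randn(4, 32)
    # mode="max-autotune" lets Inductor benchmark candidate implementations
    # (e.g. the CPP GEMM template vs. ATen kernels) and pick the fastest.
    compiled = torch.compile(model, mode="max-autotune")
    # Freezing only applies under no_grad / inference mode.
    with torch.no_grad():
        y = compiled(x)
    print(y.shape)  # torch.Size([4, 16])
```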
Contributor:

Since we only support frozen models with torch.no_grad or inference mode, shall we highlight this fact here?

Contributor Author:

The requested note about frozen models and no_grad has been added.
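For context, the constraint under discussion can be sketched as follows (a minimal illustration, not the tutorial's actual code): freezing, and hence the CPP GEMM template path, only applies when inference runs without autograd tracking.

```python
import torch

# An illustrative eval-mode model; the specific layers are assumptions.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
).eval()

x = torch.randn(2, 8)

# Freezing (TORCHINDUCTOR_FREEZING=1) only takes effect when no autograd
# state is recorded, i.e. inside torch.no_grad() or torch.inference_mode().
with torch.no_grad():
    y = model(x)

assert not y.requires_grad
```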

----------------
- `torch.compile and TorchInductor concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__

Introduction
Contributor:

Perhaps add the RFC link in the intro as a reference too?

Contributor Author:

The RFC link has been added.

in the generated code anymore; instead, we'll find a kernel based on the CPP GEMM template, ``cpp_fused__to_copy_relu_1``
(only part of the code is shown below for simplicity), with the bias and ReLU epilogues fused inside the CPP GEMM template kernel.

.. code:: python
Contributor:

The code below is generated on CPUs with AMX support. You may consider adding a note here that the generated code differs per CPU architecture and is implementation-specific, subject to change.

Contributor Author:

The note has been added.

@chunyuan-w chunyuan-w changed the title [Inductor] [Doc] Add tutorial for Max-autotune Support on CPU with GEMM Template as a prototype feature for PyTorch 2.5 [Inductor] [Doc] Add tutorial for Max-autotune Support on CPU as a prototype feature for PyTorch 2.5 Sep 27, 2024
@chunyuan-w chunyuan-w marked this pull request as ready for review September 27, 2024 07:50
@chunyuan-w (Contributor Author):

Hi @svekars, could you help review this tutorial targeting PyTorch 2.5?

prototype_source/max_autotune_on_CPU_tutorial.rst (outdated review threads resolved)
@chunyuan-w (Contributor Author):

Hi @svekars, thanks for the suggestions! I've updated the PR. Could you take another look?

@chunyuan-w (Contributor Author):

Hi @svekars, may I know if any action is required on my side for this tutorial targeting PyTorch 2.5?

@svekars svekars added the 2.5 PR related to version 2.5 label Oct 9, 2024
@svekars svekars merged commit 19fffda into pytorch:main Oct 9, 2024
17 checks passed
Labels
2.5 PR related to version 2.5 cla signed
5 participants