Fixed streamk kernel bug #602

Open: ravil-mobile wants to merge 1 commit into base: streamk-no-atomic

ravil-mobile

No description provided.

@ravil-mobile
Author

Hi, @xiaohuguo2023,

Can you check whether the numerical results are correct?

@xiaohuguo2023
Member

Thanks @ravil-mobile, this does fix the compilation issue, but not the incorrect numerical results, which are caused by the spin locks.

@xiaohuguo2023
Member

The interesting part is that we don't really need another allocation at line 80. However, if I delete line 80, we end up with the same error, shown below:

root@banff-cyxtera-s79-2:/home/work/tritontmp/python/perf-kernels/streamk# python 03-matrix-multiplication-stream-k.py
total SMs: 304
M,N,K=8192,8192,8192 ; BLK_M,N,K=256,128,64
total_blocks_M=32 x total_blocks_N=64 = total_tiles=2048
total_tiles_streamk=224 + total_blocking_tiles=1824 = total_tiles=2048
total_programs_streamk=304
total_blocking_tiles=1824
iters_per_tile=128
total_iters_streamk=28672
python: /home/openai/triton/lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp:517: mlir::Value mlir::triton::gpu::{anonymous}::LayoutPropagation::getValueAs(mlir::Value, mlir::Attribute): Assertion `rewrittenValue' failed.
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libtriton.so       0x00007f788e9d3e67
1  libtriton.so       0x00007f788e9d18ec
2  libtriton.so       0x00007f788e9d453f
3  libpthread.so.0    0x00007f8069ef4420
4  libc.so.6          0x00007f8069bd700b gsignal + 203
5  libc.so.6          0x00007f8069bb6859 abort + 299
6  libc.so.6          0x00007f8069bb6729
7  libc.so.6          0x00007f8069bc7fd6
8  libtriton.so       0x00007f788ac27d22
9  libtriton.so       0x00007f788ac2ee04
10 libtriton.so       0x00007f788ac2fbbc
11 libtriton.so       0x00007f788aa01a6f
12 libtriton.so       0x00007f788ac26186
13 libtriton.so       0x00007f788b1a6b16
14 libtriton.so       0x00007f788b1a7461
15 libtriton.so       0x00007f788b1a9c2b
16 libtriton.so       0x00007f788aec5941
17 libtriton.so       0x00007f788aee4e98
18 libtriton.so       0x00007f788aeb1cb4
19 python             0x0000000000506ee7
20 python             0x00000000004f029c _PyObject_MakeTpCall + 748
21 python             0x0000000000504e10
22 python             0x00000000004ec2b4 _PyEval_EvalFrameDefault + 19780
23 python             0x00000000004f7c83
24 python             0x00000000004ec2b4 _PyEval_EvalFrameDefault + 19780
25 python             0x00000000004e666a
26 python             0x00000000004f79a5 _PyFunction_Vectorcall + 213
27 python             0x00000000004e7939 _PyEval_EvalFrameDefault + 969
28 python             0x00000000004e666a
29 python             0x00000000004f79a5 _PyFunction_Vectorcall + 213
30 python             0x00000000004e87a1 _PyEval_EvalFrameDefault + 4657
31 python             0x00000000004e666a
32 python             0x0000000000504bfd
33 python             0x0000000000505344 PyObject_Call + 180
34 python             0x00000000004eacd4 _PyEval_EvalFrameDefault + 14180
35 python             0x00000000004e666a
36 python             0x00000000004f79a5 _PyFunction_Vectorcall + 213
37 python             0x00000000004e87a1 _PyEval_EvalFrameDefault + 4657
38 python             0x00000000004e666a
39 python             0x00000000004f79a5 _PyFunction_Vectorcall + 213
40 python             0x00000000004e87a1 _PyEval_EvalFrameDefault + 4657
41 python             0x00000000004e666a
42 python             0x00000000004f79a5 _PyFunction_Vectorcall + 213
43 libtorch_python.so 0x00007f8061fd3011 THPFunction_apply(_object*, _object*) + 4017
44 python             0x0000000000506f10
45 python             0x00000000005053e8 PyObject_Call + 344
46 python             0x00000000004ed114 _PyEval_EvalFrameDefault + 23460
47 python             0x00000000004e666a
48 python             0x0000000000504b8c
49 python             0x00000000004ec2b4 _PyEval_EvalFrameDefault + 19780
50 python             0x00000000004e666a
51 python             0x00000000004e62f7 _PyEval_EvalCodeWithName + 71
52 python             0x00000000004e62a9 PyEval_EvalCodeEx + 57
53 python             0x00000000005930eb PyEval_EvalCode + 27
54 python             0x00000000005c0907
55 python             0x00000000005bc920
56 python             0x00000000004562ce
57 python             0x00000000005b6602 PyRun_SimpleFileExFlags + 418
58 python             0x00000000005b3b7e Py_RunMain + 894
59 python             0x0000000000587199 Py_BytesMain + 57
60 libc.so.6          0x00007f8069bb8083 __libc_start_main + 243
61 python             0x000000000058704e
Aborted (core dumped)
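
For reference, the partition numbers printed at the top of the log above are consistent with the usual stream-K split, where the tiles left over after full waves across all SMs go to the stream-K portion. This is a minimal sketch that assumes the script computes them this way. The function name `streamk_partition` and its argument names are hypothetical and are not taken from `03-matrix-multiplication-stream-k.py`.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return (a + b - 1) // b


def streamk_partition(M, N, K, blk_m, blk_n, blk_k, total_sms):
    """Split output tiles into a data-parallel ("blocking") part and a stream-K part.

    Assumption: the stream-K part is the remainder of tiles modulo the SM count.
    """
    total_blocks_m = ceil_div(M, blk_m)
    total_blocks_n = ceil_div(N, blk_n)
    total_tiles = total_blocks_m * total_blocks_n

    # Tiles that do not fill a complete wave across all SMs go to stream-K.
    total_tiles_streamk = total_tiles % total_sms
    total_blocking_tiles = total_tiles - total_tiles_streamk

    # Each tile needs this many steps along the K dimension.
    iters_per_tile = ceil_div(K, blk_k)
    total_iters_streamk = total_tiles_streamk * iters_per_tile

    return {
        "total_blocks_M": total_blocks_m,
        "total_blocks_N": total_blocks_n,
        "total_tiles": total_tiles,
        "total_tiles_streamk": total_tiles_streamk,
        "total_blocking_tiles": total_blocking_tiles,
        "iters_per_tile": iters_per_tile,
        "total_iters_streamk": total_iters_streamk,
    }


# Values from the log: M,N,K=8192,8192,8192 ; BLK_M,N,K=256,128,64 ; 304 SMs.
parts = streamk_partition(8192, 8192, 8192, 256, 128, 64, 304)
for name, value in parts.items():
    print(f"{name}={value}")
```

With the values from the log, this gives 2048 tiles in total, of which 224 go to stream-K and 1824 are blocking tiles, with 128 iterations per tile and 28672 stream-K iterations.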

@@ -76,6 +77,7 @@ def persistent_streamk_gemm(
mask = (rm < M)[:, None] & (rn < N)[None, :]
tl.store(C_, acc, mask=mask)

acc = tl.zeros((BLOCK_SIZE_M, BLOCK_SIZE_N), dtype=acc_dtype)
Member

If we could remove this line, we would get better performance.
