Merge recent changes from ROCm xformers #1196

Open
wants to merge 884 commits into base: main
Conversation

qianfengz (Contributor) commented Jan 17, 2025

This PR provides the following changes:

  1. Adds support for hdim-512 in fmha-fwd (a usage sketch follows this list)
  2. Prevents PagedBlockDiagonal attn_bias types from being used with forward-training (since the paged KV cache is only implemented by the splitkv kernel)
  3. Fixes xformers/benchmarks/benchmark_attn_decoding.py so that it works correctly with ck.FwOp
  4. Improves decoder fmha-fwd performance with MQA/GQA
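
As a rough illustration of item 1, here is a minimal sketch of exercising an hdim-512 forward pass through the public API. The shapes and dtype are illustrative assumptions, and ck.FwOp is only available in ROCm builds of xformers:

```python
import torch
import xformers.ops as xops
from xformers.ops import fmha

# Illustrative shapes only: batch=2, seqlen=1024, 8 heads, head dim 512.
q = torch.randn(2, 1024, 8, 512, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 1024, 8, 512, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 1024, 8, 512, device="cuda", dtype=torch.bfloat16)

# Force the composable-kernel forward op so the new hdim-512 path is exercised.
out = xops.memory_efficient_attention_forward(q, k, v, op=fmha.ck.FwOp)
```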

The following scripts were used to test/verify the changes:

#> pytest tests/test_mem_eff_attention.py::test_forward
#> pytest tests/test_mem_eff_attention.py::test_backward
#> pytest tests/test_mem_eff_attention.py::test_dropout_ck
#> pytest tests/test_mem_eff_attention.py::test_dropout_backward_ck
#> pytest tests/test_mem_eff_attention.py::test_logsumexp
#> pytest tests/test_mem_eff_attention.py::test_paged_attention_ck

The following script was used to benchmark/verify decoder performance with MQA/GQA using ck.FwOp:

#> python xformers/benchmarks/benchmark_attn_decoding.py
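
For item 4, a minimal sketch of the MQA/GQA decode pattern the benchmark exercises, using xformers' 5D grouped layout; the group counts, sequence lengths, and the expand()-based KV sharing are assumptions for illustration, not taken from the benchmark script itself:

```python
import torch
import xformers.ops as xops
from xformers.ops import fmha

# Decode-style GQA: one query token per batch, G KV groups, Hq query heads per group.
B, Mkv, G, Hq, K = 4, 4096, 8, 4, 128
q = torch.randn(B, 1, G, Hq, K, device="cuda", dtype=torch.bfloat16)
# One KV head per group, broadcast across the Hq query heads via expand().
k = torch.randn(B, Mkv, G, 1, K, device="cuda", dtype=torch.bfloat16).expand(B, Mkv, G, Hq, K)
v = torch.randn(B, Mkv, G, 1, K, device="cuda", dtype=torch.bfloat16).expand(B, Mkv, G, Hq, K)

out = xops.memory_efficient_attention_forward(q, k, v, op=fmha.ck.FwOp)
```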

qianfengz and others added 30 commits July 9, 2024 18:22
    Avoid unused-const-variable warning
qianfengz and others added 23 commits January 13, 2025 04:52
    [CK] Memory-efficient attention (Head Dimension = 512)
    Remove using splitkv kernel from fmha fwd training path
    Disable PagedAttn bias types and hdim-512 for test_logsumexp
    Enable hdim=512 by default
    Further update to build hdim-512 by default
    Merge upstream into ROCM develop
facebook-github-bot added the CLA Signed and module: rocm labels on Jan 17, 2025
xw285cornell (Contributor) commented

Let's hold off a bit, I'm still working on merging the prior PR. Can we make sure all mem efficient tests are passing?

qianfengz (Contributor, Author) commented Jan 23, 2025

> Let's hold off a bit, I'm still working on merging the prior PR. Can we make sure all mem efficient tests are passing?

I just pushed commit f858c, which lets the forward training path still use the splitkv kernel; without this, the benchmark scripts that use memory_efficient_attention_partial would not benefit from our recent splitkv-kernel optimization for small q sizes. But with this enabled, the unit test

#> pytest tests/test_mem_eff_attention.py::test_forward 

will have 14 bfloat16 cases failing, even when export ENABLE_HIP_FMHA_RTN_CONVERT16=1 is used to enable the RTN method for fp32-to-bfloat16 conversion (which is much more accurate than the default RTZ method). So currently we are not able to judge whether the failing cases are due to a bug, or to the lower accuracy of the lse output from the fmha-forward kernel propagating into larger errors in the final outputs (dQuery, dKey, dValue).
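
For context on why the splitkv path matters here, a minimal sketch of the chunked-attention pattern that relies on memory_efficient_attention_partial, assuming the memory_efficient_attention_partial / merge_attentions interface in xformers.ops.fmha; the shapes and the two-way KV split are illustrative assumptions:

```python
import torch
from xformers.ops import fmha

# A single decode query attending to a KV cache processed in two chunks.
B, H, K = 2, 8, 128
q = torch.randn(B, 1, H, K, device="cuda", dtype=torch.bfloat16)

def kv_chunk():
    # Illustrative KV-cache chunk of length 2048.
    return torch.randn(B, 2048, H, K, device="cuda", dtype=torch.bfloat16)

k1, v1, k2, v2 = kv_chunk(), kv_chunk(), kv_chunk(), kv_chunk()

# Each partial call returns the chunk's attention output and its log-sum-exp.
out1, lse1 = fmha.memory_efficient_attention_partial(q, k1, v1, op=fmha.ck.FwOp)
out2, lse2 = fmha.memory_efficient_attention_partial(q, k2, v2, op=fmha.ck.FwOp)

# Merge the per-chunk partial results into the final attention output using the LSEs.
out, _ = fmha.merge_attentions([out1, out2], [lse1, lse2])
```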

danthe3rd (Contributor) commented Jan 30, 2025

Hi, is this ready to merge? We would like to do a new release soon (PT 2.6 is just out)
Linters are still failing

Labels: CLA Signed, module: rocm
7 participants