
Bump xformers from 0.0.18 to 0.0.27.post2 #39

Open · dependabot[bot] wants to merge 1 commit into main
Conversation


@dependabot commented on behalf of GitHub on Jul 29, 2024

Bumps xformers from 0.0.18 to 0.0.27.post2.
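
As a quick post-upgrade check (not part of this PR), the installed versions can be verified from Python; the expected values in the comments assume this bump is applied and the pre-built wheels described in the release notes below are used.

```python
# Hedged sanity check: confirm the upgraded xformers version and the PyTorch it runs on.
import torch
import xformers

print(xformers.__version__)  # expected after this bump: 0.0.27.post2
print(torch.__version__)     # pre-built 0.0.27.post2 wheels target PyTorch 2.4.0
```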

Release notes

Sourced from xformers's releases.

torch.compile support, bug fixes & more

Pre-built binary wheels require PyTorch 2.4.0

Added

  • fMHA: PagedBlockDiagonalGappyKeysMask
  • fMHA: heterogeneous queries in triton_splitk
  • fMHA: support for paged attention in flash
  • fMHA: Added backwards pass for merge_attentions
  • fMHA: Added torch.compile support for 3 biases (LowerTriangularMask, LowerTriangularMaskWithTensorBias and BlockDiagonalMask) - some might require PyTorch 2.4
  • fMHA: Added torch.compile support in memory_efficient_attention when passing the flash operator explicitly (e.g. memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp))); see the sketch after this list
  • fMHA: memory_efficient_attention now expects its attn_bias argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
  • fMHA: AttentionBias subclasses are now constructed by default on the CUDA device if available; they used to be created on the CPU device
  • 2:4 sparsity: Added xformers.ops.sp24.sparsify24_ste for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
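
The torch.compile item above can be illustrated with a minimal sketch (not taken from this PR); it assumes xformers 0.0.27.post2, PyTorch 2.4, and a CUDA machine where the flash kernels are available.

```python
import torch
from xformers.ops import memory_efficient_attention
from xformers.ops.fmha import flash

# Flash kernels expect half precision; tensor layout is [batch, seq_len, num_heads, head_dim].
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

def attention(q, k, v):
    # Selecting the flash operators explicitly is the compile-supported path described above.
    return memory_efficient_attention(q, k, v, op=(flash.FwOp, flash.BwOp))

compiled_attention = torch.compile(attention)
out = compiled_attention(q, k, v)  # forward through flash.FwOp
out.sum().backward()               # backward through flash.BwOp
```

Relying on automatic operator selection instead of the explicit op tuple may still fall outside the compile-supported path, per the release note.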

Improved

  • fMHA: Fixed out-of-bounds reading for Split-K triton implementation
  • Profiler: Fixed a bug with modules that take a single tuple as an argument
  • Profiler: Added manual trigger for a profiling step, by creating a trigger file in the profiling directory

Removed

  • Removed support for PyTorch version older than 2.2.0

... (truncated)

Changelog

Sourced from xformers's changelog.

[0.0.27.post2] - 2024-07-26

Pre-built binary wheels require PyTorch 2.4.0

[0.0.27.post1] - 2024-07-25

Pre-built binary wheels require PyTorch 2.4.0

[0.0.27] - 2024-07-10

Pre-built binary wheels require PyTorch 2.3.1

Added

  • fMHA: PagedBlockDiagonalGappyKeysMask
  • fMHA: heterogeneous queries in triton_splitk
  • fMHA: support for paged attention in flash
  • fMHA: Added backwards pass for merge_attentions
  • fMHA: Added torch.compile support for 3 biases (LowerTriangularMask, LowerTriangularMaskWithTensorBias and BlockDiagonalMask) - some might require PyTorch 2.4
  • fMHA: Added torch.compile support in memory_efficient_attention when passing the flash operator explicitly (e.g. memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp)))
  • fMHA: memory_efficient_attention now expects its attn_bias argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
  • fMHA: AttentionBias subclasses are now constructed by default on the CUDA device if available; they used to be created on the CPU device (see the sketch after this list)
  • 2:4 sparsity: Added xformers.ops.sp24.sparsify24_ste for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
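
A minimal sketch (not from this PR) of the two attn_bias behaviour changes listed above, assuming xformers >= 0.0.27 on a machine with a CUDA device:

```python
import torch
from xformers.ops import LowerTriangularMask, memory_efficient_attention

q = torch.randn(1, 256, 4, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# AttentionBias subclasses such as LowerTriangularMask now default to the CUDA
# device when one is available, so the mask matches q/k/v without extra handling.
causal = LowerTriangularMask()
out_causal = memory_efficient_attention(q, k, v, attn_bias=causal)

# A plain tensor bias is no longer moved automatically: it must already live on
# the same device (and have the same dtype) as the other inputs.
bias = torch.zeros(1, 4, 256, 256, device=q.device, dtype=q.dtype)
out_biased = memory_efficient_attention(q, k, v, attn_bias=bias)
```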

Improved

  • fMHA: Fixed out-of-bounds reading for Split-K triton implementation
  • Profiler: Fixed a bug with modules that take a single tuple as an argument
  • Profiler: Added manual trigger for a profiling step, by creating a trigger file in the profiling directory

Removed

  • Removed support for PyTorch version older than 2.2

[0.0.26] - 2024-04-29

Pre-built binary wheels require PyTorch 2.3.0

Added

  • [2:4 sparsity] Added support for Straight-Through Estimator for sparsify24 gradient (GRADIENT_STE)
  • [2:4 sparsity] sparsify24_like now supports the cuSparseLt backend, and the STE gradient
  • Basic support for torch.compile for the memory_efficient_attention operator. Currently only supports Flash-Attention, and without any bias provided. We want to expand this coverage progressively.

Improved

  • merge_attentions no longer needs inputs to be stacked.
  • fMHA: triton_splitk now supports additive bias
  • fMHA: benchmark cleanup

[0.0.25.post1] - 2024-03-29

Pre-built binary wheels require PyTorch 2.2.2

[0.0.25] - 2024-03-14

Pre-built binary wheels require PyTorch 2.2.1

Added

  • New merge_attentions function
  • fMHA: New gappy attention biases.

Improved

  • fMHA: Updated Flash-Attention to v2.5.6: this has a performance improvement for multiquery.
  • fMHA: triton_splitk changed and expanded. Now amalgamates using LSE. Can autotune, supports causal with a small number of queries - not just 1. Experimental support for paged attention.
  • rope_padded: Fixed CUDA error with many queries (more than 65k)
  • rmsnorm: Fixed CUDA error with large inputs (enables 512k+ sequence length on Llama2 70B)

Removed

... (truncated)

Commits
  • 1fc661f Release v0.0.27.post2
  • 8d8463c Re-enable build of Flash on Windows
  • 3610a54 LowerTriangularMask.to inference_mode fix
  • 0b9cb70 Fix fused seqpar after change in torch._scaled_mm
  • 2b8f5fc Remove unused declarations
  • a635e0c Replace deprecated packed_accessor with packed_accessor64
  • b03b9ba Avoid using deprecated AMP functions
  • b3831ea Build for PyTorch 2.4.0
  • 71308aa Have a single _FusedSequenceParallel class handle all dtypes (fairinternal/xf...
  • 2456ea3 Remove _check_large_shapes checking in fmha/ck.py (#1067)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [xformers](https://github.com/facebookresearch/xformers) from 0.0.18 to 0.0.27.post2.
- [Release notes](https://github.com/facebookresearch/xformers/releases)
- [Changelog](https://github.com/facebookresearch/xformers/blob/main/CHANGELOG.md)
- [Commits](facebookresearch/xformers@v0.0.18...v0.0.27.post2)

---
updated-dependencies:
- dependency-name: xformers
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot added the dependencies label (Pull requests that update a dependency file) on Jul 29, 2024