
Bump xformers from 0.0.18 to 0.0.27.post2 #39

Open · dependabot[bot] wants to merge 1 commit into main
Conversation


@dependabot commented on behalf of GitHub on Jul 29, 2024

Bumps xformers from 0.0.18 to 0.0.27.post2.
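
As a quick post-upgrade check (not part of this PR), the installed versions can be verified from Python; the expected values in the comments assume this bump is applied and the pre-built wheels described in the release notes below are used.

```python
# Hedged sanity check: confirm the upgraded xformers version and the PyTorch it runs on.
import torch
import xformers

print(xformers.__version__)  # expected after this bump: 0.0.27.post2
print(torch.__version__)     # pre-built 0.0.27.post2 wheels target PyTorch 2.4.0
```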

Release notes

Sourced from xformers's releases.

torch.compile support, bug fixes & more

Pre-built binary wheels require PyTorch 2.4.0

Added

  • fMHA: PagedBlockDiagonalGappyKeysMask
  • fMHA: heterogeneous queries in triton_splitk
  • fMHA: support for paged attention in flash
  • fMHA: Added backwards pass for merge_attentions
  • fMHA: Added torch.compile support for 3 biases (LowerTriangularMask, LowerTriangularMaskWithTensorBias and BlockDiagonalMask) - some might require PyTorch 2.4
  • fMHA: Added torch.compile support in memory_efficient_attention when passing the flash operator explicitly (e.g. memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp))); see the sketch after this list
  • fMHA: memory_efficient_attention now expects its attn_bias argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
  • fMHA: AttentionBias subclasses are now constructed by default on the CUDA device if available; they used to be created on the CPU device
  • 2:4 sparsity: Added xformers.ops.sp24.sparsify24_ste for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
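
The torch.compile item above can be illustrated with a minimal sketch (not taken from this PR); it assumes xformers 0.0.27.post2, PyTorch 2.4, and a CUDA machine where the flash kernels are available.

```python
import torch
from xformers.ops import memory_efficient_attention
from xformers.ops.fmha import flash

# Flash kernels expect half precision; tensor layout is [batch, seq_len, num_heads, head_dim].
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

def attention(q, k, v):
    # Selecting the flash operators explicitly is the compile-supported path described above.
    return memory_efficient_attention(q, k, v, op=(flash.FwOp, flash.BwOp))

compiled_attention = torch.compile(attention)
out = compiled_attention(q, k, v)  # forward through flash.FwOp
out.sum().backward()               # backward through flash.BwOp
```

Relying on automatic operator selection instead of the explicit op tuple may still fall outside the compile-supported path, per the release note.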

Improved

  • fMHA: Fixed out-of-bounds reading for Split-K triton implementation
  • Profiler: Fixed a bug with modules that take a single tuple as an argument
  • Profiler: Added manual trigger for a profiling step, by creating a trigger file in the profiling directory

Removed

  • Removed support for PyTorch version older than 2.2.0

... (truncated)

Changelog

Sourced from xformers's changelog.

[0.0.27.post2] - 2024-07-26

Pre-built binary wheels require PyTorch 2.4.0

[0.0.27.post1] - 2024-07-25

Pre-built binary wheels require PyTorch 2.4.0

[0.0.27] - 2024-07-10

Pre-built binary wheels require PyTorch 2.3.1

Added

  • fMHA: PagedBlockDiagonalGappyKeysMask
  • fMHA: heterogeneous queries in triton_splitk
  • fMHA: support for paged attention in flash
  • fMHA: Added backwards pass for merge_attentions
  • fMHA: Added torch.compile support for 3 biases (LowerTriangularMask, LowerTriangularMaskWithTensorBias and BlockDiagonalMask) - some might require PyTorch 2.4
  • fMHA: Added torch.compile support in memory_efficient_attention when passing the flash operator explicitly (e.g. memory_efficient_attention(..., op=(flash.FwOp, flash.BwOp)))
  • fMHA: memory_efficient_attention now expects its attn_bias argument to be on the same device as the other input tensors. Previously, it would convert the bias to the right device.
  • fMHA: AttentionBias subclasses are now constructed by default on the CUDA device if available; they used to be created on the CPU device (see the sketch after this list)
  • 2:4 sparsity: Added xformers.ops.sp24.sparsify24_ste for Straight Through Estimator (STE) with options to rescale the gradient differently for masked out/kept values
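
A minimal sketch (not from this PR) of the two attn_bias behaviour changes listed above, assuming xformers >= 0.0.27 on a machine with a CUDA device:

```python
import torch
from xformers.ops import LowerTriangularMask, memory_efficient_attention

q = torch.randn(1, 256, 4, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# AttentionBias subclasses such as LowerTriangularMask now default to the CUDA
# device when one is available, so the mask matches q/k/v without extra handling.
causal = LowerTriangularMask()
out_causal = memory_efficient_attention(q, k, v, attn_bias=causal)

# A plain tensor bias is no longer moved automatically: it must already live on
# the same device (and have the same dtype) as the other inputs.
bias = torch.zeros(1, 4, 256, 256, device=q.device, dtype=q.dtype)
out_biased = memory_efficient_attention(q, k, v, attn_bias=bias)
```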

Improved

  • fMHA: Fixed out-of-bounds reading for Split-K triton implementation
  • Profiler: Fixed a bug with modules that take a single tuple as an argument
  • Profiler: Added manual trigger for a profiling step, by creating a trigger file in the profiling directory

Removed

  • Removed support for PyTorch version older than 2.2

[0.0.26] - 2024-04-29

Pre-built binary wheels require PyTorch 2.3.0

Added

  • [2:4 sparsity] Added support for Straight-Through Estimator for sparsify24 gradient (GRADIENT_STE)
  • [2:4 sparsity] sparsify24_like now supports the cuSparseLt backend, and the STE gradient
  • Basic support for torch.compile for the memory_efficient_attention operator. Currently only supports Flash-Attention, and without any bias provided. We want to expand this coverage progressively.

Improved

  • merge_attentions no longer needs inputs to be stacked.
  • fMHA: triton_splitk now supports additive bias
  • fMHA: benchmark cleanup

[0.0.25.post1] - 2024-03-29

Pre-built binary wheels require PyTorch 2.2.2

[0.0.25] - 2024-03-14

Pre-built binary wheels require PyTorch 2.2.1

Added

  • New merge_attentions function
  • fMHA: New gappy attention biases.

Improved

  • fMHA: Updated Flash-Attention to v2.5.6: this has a performance improvement for multiquery.
  • fMHA: triton_splitk changed and expanded. Now amalgamates using LSE. Can autotune, supports causal with a small number of queries - not just 1. Experimental support for paged attention.
  • rope_padded: Fixed CUDA error with many queries (more than 65k)
  • rmsnorm: Fixed CUDA error with large inputs (enables 512k+ sequence length on Llama2 70B)

Removed

... (truncated)

Commits
  • 1fc661f Release v0.0.27.post2
  • 8d8463c Re-enable build of Flash on Windows
  • 3610a54 LowerTriangularMask.to inference_mode fix
  • 0b9cb70 Fix fused seqpar after change in torch._scaled_mm
  • 2b8f5fc Remove unused declarations
  • a635e0c Replace deprecated packed_accessor with packed_accessor64
  • b03b9ba Avoid using deprecated AMP functions
  • b3831ea Build for PyTorch 2.4.0
  • 71308aa Have a single _FusedSequenceParallel class handle all dtypes (fairinternal/xf...
  • 2456ea3 Remove _check_large_shapes checking in fmha/ck.py (#1067)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [xformers](https://github.com/facebookresearch/xformers) from 0.0.18 to 0.0.27.post2.
- [Release notes](https://github.com/facebookresearch/xformers/releases)
- [Changelog](https://github.com/facebookresearch/xformers/blob/main/CHANGELOG.md)
- [Commits](facebookresearch/xformers@v0.0.18...v0.0.27.post2)

---
updated-dependencies:
- dependency-name: xformers
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot added the dependencies label (Pull requests that update a dependency file) on Jul 29, 2024