flash-attn v2.6.3 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX #10

Merged · 32 commits · Oct 16, 2024
Changes from all commits

Commits
4179071  updated v2.6.2 (regro-cf-autotick-bot, Jul 24, 2024)
b58d8e1  MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.37.1, and co… (regro-cf-autotick-bot, Jul 24, 2024)
89c5d54  Update setup.py for flash-attention v2.6.2 (weiji14, Jul 24, 2024)
7a7e321  updated v2.6.3 (regro-cf-autotick-bot, Jul 26, 2024)
d57121f  MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.37.1, and co… (regro-cf-autotick-bot, Jul 26, 2024)
7dda024  Enable cirun-openstack-gpu-large, cirun-openstack-cpu-large using Cirun (Jul 29, 2024)
49756ac  MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.37.1, and co… (conda-forge-curator[bot], Jul 29, 2024)
58a39f9  Merge branch 'cirun-1722272355' into 2.6.2_h62eaee (carterbox, Jul 29, 2024)
738fe60  Merge branch '2.6.3_h7bf43b' into 2.6.2_h62eaee (carterbox, Jul 29, 2024)
6be2fd7  BLD: Enable all compatible CUDA arch targets (carterbox, Jul 29, 2024)
0419cf9  BLD: Limit MAX_JOBS to prevent runner crash (carterbox, Jul 29, 2024)
7b3a38a  Limit MAX_JOBS to 1 (carterbox, Jul 29, 2024)
494edc3  CI: Increase timeout to 36 hours (carterbox, Jul 30, 2024)
248c4eb  MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.37.1, and co… (Jul 30, 2024)
686b454  Small non-bot commit to trigger runners (weiji14, Jul 30, 2024)
82fbb57  MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.37.1, and co… (weiji14, Jul 30, 2024)
680495a  CI: Reduce build matrix for debugging (carterbox, Jul 31, 2024)
0dfb395  CI: Increase jobs and use larger runner (carterbox, Aug 28, 2024)
5878e8b  MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.41.1, and co… (carterbox, Oct 9, 2024)
ff9184f  Merge branch 'main' into 2.6.2_h62eaee (weiji14, Oct 13, 2024)
91bc5d0  Rebuild for python 3.13 (regro-cf-autotick-bot, Oct 9, 2024)
96ed564  Add setuptools as runtime dependency (weiji14, Oct 13, 2024)
7391de7  Test build on CUDA 12.0 and Python 3.13 (weiji14, Oct 13, 2024)
e4e877b  MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.42.1, and co… (weiji14, Oct 13, 2024)
12182c1  BLD: Revert debugging settings (carterbox, Oct 15, 2024)
68a3446  MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.42.2, and co… (Oct 15, 2024)
ca12eba  Skip 313 until pytorch unbroken (carterbox, Oct 15, 2024)
5c8dd0e  MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.42.2, and co… (Oct 15, 2024)
9ec6e0b  Update .github/workflows/conda-build.yml (carterbox, Oct 15, 2024)
3051209  MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.42.2, and co… (Oct 15, 2024)
b0e6598  Set timeout to 9 hours again (weiji14, Oct 15, 2024)
38c4700  MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.42.2, and co… (weiji14, Oct 15, 2024)
46 changes: 46 additions & 0 deletions .ci_support/migrations/python313.yaml

@@ -0,0 +1,46 @@
migrator_ts: 1724712607
__migrator:
  commit_message: Rebuild for python 3.13
  migration_number: 1
  operation: key_add
  primary_key: python
  ordering:
    python:
      - 3.6.* *_cpython
      - 3.7.* *_cpython
      - 3.8.* *_cpython
      - 3.9.* *_cpython
      - 3.10.* *_cpython
      - 3.11.* *_cpython
      - 3.12.* *_cpython
      - 3.13.* *_cp313  # new entry
      - 3.6.* *_73_pypy
      - 3.7.* *_73_pypy
      - 3.8.* *_73_pypy
      - 3.9.* *_73_pypy
  paused: false
  longterm: true
  pr_limit: 20
  max_solver_attempts: 3  # this will make the bot retry "not solvable" stuff 12 times
  exclude:
    # this shouldn't attempt to modify the python feedstocks
    - python
    - pypy3.6
    - pypy-meta
    - cross-python
    - python_abi
    # see https://github.com/conda-forge/scipy-feedstock/pull/283
    - scipy
  exclude_pinned_pkgs: false
  additional_zip_keys:
    - channel_sources

python:
  - 3.13.* *_cp313
channel_sources:
  - conda-forge/label/python_rc,conda-forge
# additional entries to add for zip_keys
numpy:
  - 2
python_impl:
  - cpython
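
The `key_add` operation above appends one new zipped row (Python 3.13 built from the `python_rc` channel against numpy 2) to each feedstock's existing build matrix rather than regenerating it. A toy Python illustration of that effect — this is not conda-smithy's actual code, and the pre-existing matrix values below are made up:

    # Toy model of a feedstock build matrix whose zipped keys are
    # python / channel_sources / numpy (values here are illustrative).
    variants = {
        "python": ["3.11.* *_cpython", "3.12.* *_cpython"],
        "channel_sources": ["conda-forge", "conda-forge"],
        "numpy": ["1.23", "1.26"],
    }

    # "operation: key_add" with "additional_zip_keys: [channel_sources]"
    # appends one aligned entry to every zipped key.
    variants["python"].append("3.13.* *_cp313")
    variants["channel_sources"].append("conda-forge/label/python_rc,conda-forge")
    variants["numpy"].append("2")

    for py, chan, np in zip(variants["python"], variants["channel_sources"], variants["numpy"]):
        print(f"python={py:<18} numpy={np:<6} channels={chan}")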
4 changes: 3 additions & 1 deletion .github/workflows/conda-build.yml

@@ -6,6 +6,8 @@ name: Build conda package
 on:
   push:
 
+  pull_request:
+
 concurrency:
   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}
   cancel-in-progress: true
@@ -14,7 +16,7 @@ jobs:
   build:
     name: ${{ matrix.CONFIG }}
     runs-on: ${{ matrix.runs_on }}
-    timeout-minutes: 360
+    timeout-minutes: 540
     strategy:
       fail-fast: false
       matrix:
4 changes: 2 additions & 2 deletions .scripts/build_steps.sh

(Generated file; diff not rendered by default.)
9 changes: 4 additions & 5 deletions conda-forge.yml

@@ -1,15 +1,14 @@
azure:
  free_disk_space: true
  timeout_minutes: 360
  settings_linux:
    swapfile_size: 10GiB
github:
  branch_name: main
  tooling_branch_name: main
conda_build:
  error_overlinking: true
conda_forge_output_validation: true
github_actions:
  timeout_minutes: 540
  self_hosted: true
  triggers:
    - push
    - pull_request
provider:
  linux_64: github_actions
14 changes: 9 additions & 5 deletions recipe/meta.yaml

@@ -1,13 +1,13 @@
 {% set name = "flash-attn" %}
-{% set version = "2.6.1" %}
+{% set version = "2.6.3" %}
 
 package:
   name: {{ name|lower }}
   version: {{ version }}
 
 source:
   - url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/flash_attn-{{ version }}.tar.gz
-    sha256: c18d22d27031a761e68ffeb770be17b2b865b04fb9b401aa35372957965d01a3
+    sha256: 5bfae9500ad8e7d2937ebccb4906f3bc464d1bf66eedd0e4adabd520811c7b52
   # Overwrite with a simpler build script that doesn't try to revend pre-compiled binaries
   - path: pyproject.toml
   - path: setup.py
@@ -16,11 +16,15 @@ build:
   number: 1
   script: {{ PYTHON }} -m pip install . -vvv --no-deps --no-build-isolation
   script_env:
-    - MAX_JOBS=$CPU_COUNT
-    # Not compiling for 8.0;8.6;8.9;9.0+PTX to keep builds under 6 hours
-    - TORCH_CUDA_ARCH_LIST=8.0+PTX
+    # Limit MAX_JOBS in order to prevent runners from crashing
+    - MAX_JOBS=4
+    - TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX
   skip: true  # [cuda_compiler_version in (undefined, "None")]
   skip: true  # [not linux]
+  skip: true  # [py==313]  # Skip until pytorch dependency on setuptools is fixed
weiji14 (Member) commented on Oct 16, 2024:

Remove after conda-forge/pytorch-cpu-feedstock#276 is merged (remember to re-render manually afterwards).

Suggested change (delete this line):
-  skip: true  # [py==313]  # Skip until pytorch dependency on setuptools is fixed

A maintainer (Member) replied:

We have to rebuild to add those additional modules requested in #18. We can see if pytorch is ready for 3.13 then.

+  # debugging skips below
+  # skip: true  # [py!=313]
+  # skip: true  # [cuda_compiler_version != "12.0"]
   ignore_run_exports_from:
     - libcublas-dev    # [(cuda_compiler_version or "").startswith("12")]
     - libcusolver-dev  # [(cuda_compiler_version or "").startswith("12")]
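
For reference, the TORCH_CUDA_ARCH_LIST value set in script_env above is what PyTorch's extension builder turns into per-architecture nvcc flags: a plain entry emits native SASS for that GPU generation, and a "+PTX" suffix also embeds PTX so newer GPUs can JIT-compile the kernels. A rough sketch of that mapping — not PyTorch's actual implementation, which lives in torch.utils.cpp_extension:

    # Sketch: expand "8.0;8.6;8.9;9.0+PTX" into nvcc -gencode flags.
    def arch_list_to_gencode(arch_list: str) -> list[str]:
        flags = []
        for spec in arch_list.split(";"):
            emit_ptx = spec.endswith("+PTX")
            arch = spec.removesuffix("+PTX").replace(".", "")  # "8.6" -> "86"
            # Native machine code (SASS) for this exact architecture:
            flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
            if emit_ptx:
                # Also embed PTX so future GPUs can JIT-compile:
                flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
        return flags

    print(arch_list_to_gencode("8.0;8.6;8.9;9.0+PTX"))
    # ['-gencode=arch=compute_80,code=sm_80',
    #  '-gencode=arch=compute_86,code=sm_86',
    #  '-gencode=arch=compute_89,code=sm_89',
    #  '-gencode=arch=compute_90,code=sm_90',
    #  '-gencode=arch=compute_90,code=compute_90']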
24 changes: 14 additions & 10 deletions recipe/setup.py

@@ -37,8 +37,6 @@
         "csrc/flash_attn/src/flash_fwd_hdim160_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim192_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim192_bf16_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_hdim224_fp16_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_hdim224_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim256_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim256_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim32_fp16_causal_sm80.cu",
@@ -53,8 +51,6 @@
         "csrc/flash_attn/src/flash_fwd_hdim160_bf16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim192_fp16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim192_bf16_causal_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_hdim224_fp16_causal_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_hdim224_bf16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim256_fp16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_hdim256_bf16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_bwd_hdim32_fp16_sm80.cu",
@@ -69,10 +65,22 @@
         "csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.cu",
-        "csrc/flash_attn/src/flash_bwd_hdim224_fp16_sm80.cu",
-        "csrc/flash_attn/src/flash_bwd_hdim224_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_bwd_hdim256_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_bwd_hdim256_bf16_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim32_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim32_bf16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim64_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim64_bf16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim96_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim96_bf16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim160_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim160_bf16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim192_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim192_bf16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim256_fp16_causal_sm80.cu",
+        "csrc/flash_attn/src/flash_bwd_hdim256_bf16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu",
@@ -85,8 +93,6 @@
         "csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_split_hdim224_fp16_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_split_hdim224_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_causal_sm80.cu",
@@ -101,8 +107,6 @@
         "csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_causal_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_split_hdim224_fp16_causal_sm80.cu",
-        "csrc/flash_attn/src/flash_fwd_split_hdim224_bf16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_causal_sm80.cu",
         "csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_causal_sm80.cu",
     ],
Expand Down