
flash-attn v2.6.3 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX #10

Merged · 32 commits · Oct 16, 2024

Conversation

@regro-cf-autotick-bot (Contributor) commented Jul 24, 2024

It is very likely that the current package version for this feedstock is out of date.

Checklist before merging this PR:

  • Dependencies have been updated if changed: see upstream
  • Tests have passed
  • Updated license if changed and license_file is packaged

Information about this PR:

  1. Feel free to push to the bot's branch to update this PR if needed.
  2. The bot will almost always only open one PR per version.
  3. The bot will stop issuing PRs if more than 3 version bump PRs generated by the bot are open. If you don't want to package a particular version please close the PR.
  4. If you want these PRs to be merged automatically, make an issue with `@conda-forge-admin, please add bot automerge` in the title and merge the resulting PR. This command will add our bot automerge feature to your feedstock.
  5. If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

Pending Dependency Version Updates

Here is a list of all the pending dependency version updates for this repo. Please double check all dependencies before merging.

Name: pytorch-cpu (upstream and current versions were shown as Anaconda badges that did not render here)

Dependency Analysis

We couldn't run dependency analysis due to an internal error in the bot, depfinder, or grayskull. :/ Help is very welcome!

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by - please use this URL for debugging.

Closes #11
Closes #17

@conda-forge-webservices (bot, Contributor) commented Jul 24, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@carterbox (Member)

Looks like we are exceeding the 6-hour limit even when building for only a single arch, so we need to finish the process of getting onto the Quansight GPU server.

@weiji14, could you please agree to the terms of use for the Quansight GPU servers by opening a pull request here: https://github.com/Quansight/open-gpu-server

Then we can open a request to have the bot enable the Quansight GPU server for this feedstock using this template: https://github.com/conda-forge/admin-requests/blob/main/examples/example-open-gpu-server.yml

Ref: https://conda-forge.org/docs/maintainer/knowledge_base/#packages-that-require-a-gpu-or-long-running-builds

@weiji14 (Member) commented Jul 25, 2024

Sure, let me work on that later. Recent builds have been too close to the 6-hour mark and I've had to restart CI for the past two releases already; it would also be good to handle more CUDA archs (xref #4). I'm not sure we actually need a GPU-enabled server though, maybe just a longer timeout (plus more CPU/RAM)?
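For context, the arch list in the PR title is the mechanism for covering more CUDA archs: PyTorch's extension build machinery reads TORCH_CUDA_ARCH_LIST to decide which SM targets to compile. A minimal sketch of how a build script might set it (the commented pip invocation is illustrative, not this feedstock's exact build.sh):

```shell
# Compile flash-attn kernels for Ampere (8.0/8.6), Ada (8.9), and Hopper (9.0);
# "+PTX" on the last entry also embeds PTX so newer GPUs can JIT at load time.
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0+PTX"
echo "building for: ${TORCH_CUDA_ARCH_LIST}"
# python -m pip install . --no-deps --no-build-isolation -vv  # actual build step
```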

@carterbox (Member)

> Not sure if we actually need a GPU-enabled server though, maybe just a longer timeout (+more CPU/RAM)?

The only way to run builds longer than 6 hours is to use Quansight's servers. I'm not sure what options Quansight provides for servers with/without GPUs, but this is a CUDA-enabled package, so we should just use the GPU server list.

@carterbox (Member) commented Jul 29, 2024

TODO:

@carterbox (Member)

On the openstack runners, the jobs are crashing during the first compilation phase. Trying to reduce the number of concurrent compilations to 2 in order to prevent this crash (from a max of 4).
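MAX_JOBS is the knob being tuned here: PyTorch's cpp_extension/ninja build honors it as the number of parallel compiler jobs, and each flash-attn nvcc job can peak at several GB of RAM. A hypothetical sketch of sizing it from available memory instead of hard-coding a value (the 8 GB-per-job figure is an illustrative assumption, not a measurement from this feedstock):

```shell
# Derive MAX_JOBS from total system memory rather than CPU count, since
# flash-attn compiles tend to be memory-bound; clamp to at least one job.
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
export MAX_JOBS=$(( mem_gb / 8 ))   # assume ~8 GB peak RSS per nvcc job
[ "${MAX_JOBS}" -lt 1 ] && MAX_JOBS=1
echo "MAX_JOBS=${MAX_JOBS}"
```

On a 32 GB runner this heuristic would land near the MAX_JOBS=4 used later in the thread.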

@carterbox (Member)

Maybe they're not crashing? Maybe the jobs are just being killed at 30mins?

@carterbox carterbox added the automerge Merge the PR when CI passes label Jul 29, 2024
@carterbox (Member)

@conda-forge-admin, please rerender

@carterbox carterbox added automerge Merge the PR when CI passes and removed automerge Merge the PR when CI passes labels Jul 30, 2024
@weiji14 weiji14 changed the title flash-attn v2.6.3 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX flash-attn v2.6.3 + python 3.13 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX Oct 13, 2024
@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@weiji14 (Member) commented Oct 13, 2024

Thanks @jakirkham for the merge tips 🙏

The previous test build with CUDA 11.8 / Python 3.12 seems to have completed successfully on cirun-openstack-cpu-xlarge (32GB RAM) with MAX_JOBS=4 at commit 5878e8b in 8h6m. See logs at https://github.com/conda-forge/flash-attn-feedstock/actions/runs/11246603140/job/31367673505

I'm now testing CUDA 12.0 / Python 3.13 (cherry-picking from #17) at commit e4e877b, and if this build completes within the time limit, I think we should have enough info on what resources will be needed for the full matrix builds.

@weiji14 (Member) left a comment

Ok, are we ready for the full matrix builds?

Review threads on recipe/meta.yaml and .github/workflows/conda-build.yml (outdated, resolved)
@carterbox (Member) commented Oct 15, 2024

Sounds good to me! 8 hours per build seems reasonable.

@carterbox (Member)

@conda-forge-admin, please rerender

@carterbox (Member) left a comment

Wait. Setuptools is not a runtime dependency of this package. PyTorch needs to fix its missing/extra dependency, not us.

conda-forge-webservices[bot] and others added 2 commits October 15, 2024 16:19
@carterbox (Member)

@conda-forge-admin, please rerender

conda-forge-webservices[bot] and others added 2 commits October 15, 2024 16:25
@carterbox (Member)

@conda-forge-admin, please rerender

@carterbox carterbox changed the title flash-attn v2.6.3 + python 3.13 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX flash-attn v2.6.3 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX Oct 15, 2024
@jakirkham (Member)

Thanks Daniel! 🙏

Looks like cirun doesn't like the bot commit. So may need to push another commit

Unsure why rerender at 3051209 reverted to 2160 mins.
@weiji14 (Member) commented Oct 15, 2024

Ok, re-rendered manually using conda smithy rerender -c auto. I've set the timeout back to 9 hours.
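One possible explanation for the earlier revert: conda smithy rerender regenerates .github/workflows/ from the feedstock's conda-forge.yml, so a timeout edited directly in the rendered workflow gets overwritten on the next rerender. The setting has to live in conda-forge.yml itself; a sketch of the relevant fragment (the key name is taken from conda-smithy's conda-forge.yml schema and should be treated as an assumption to verify against the docs):

```yaml
# conda-forge.yml (feedstock root): sketch, not the actual file contents.
github_actions:
  timeout_minutes: 540  # 9 hours; values set here survive rerendering
```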

@weiji14 (Member) left a comment

Builds all pass now 🎉 Longest one was CUDA 11.8 / Python 3.12 that took 8h 26m 41s (under the 9h time limit).

Shall we wait for conda-forge/pytorch-cpu-feedstock#276, and then we can include the Python 3.13 builds here too?

skip: true # [cuda_compiler_version in (undefined, "None")]
skip: true # [not linux]
skip: true # [py==313] # Skip until pytorch dependency on setuptools is fixed
@weiji14 (Member) commented Oct 16, 2024

Remove after conda-forge/pytorch-cpu-feedstock#276 is merged (remember to re-render manually afterwards).

Suggested change (remove this line):
skip: true # [py==313] # Skip until pytorch dependency on setuptools is fixed

Reply from a Member:

We have to rebuild anyway to add those additional modules requested in #18; we can see whether PyTorch is ready for 3.13 then.

@carterbox carterbox merged commit bc32f15 into conda-forge:main Oct 16, 2024
10 checks passed
@regro-cf-autotick-bot regro-cf-autotick-bot deleted the 2.6.2_h62eaee branch October 16, 2024 04:27
@jakirkham (Member)

Hooray! 🥳

Excellent work 😄 Thank you both! 🙏
