
Commit

Auto enable mem efficient attention on gfx1100 on pytorch nightly 2.7
I'm not sure which arches are supported yet. If you see improvements in
memory usage while using --use-pytorch-cross-attention on your AMD GPU, let
me know and I will add it to the list.
comfyanonymous committed Feb 14, 2025
1 parent 042a905 commit d7b4bf2
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions comfy/model_management.py
@@ -236,6 +236,19 @@ def is_amd():
except:
    pass


try:
    if is_amd():
        arch = torch.cuda.get_device_properties(get_torch_device()).gcnArchName
        logging.info("AMD arch: {}".format(arch))
        if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
            if int(torch_version[0]) >= 2 and int(torch_version[2]) >= 7: # works on 2.6 but doesn't actually seem to improve much
                if arch in ["gfx1100"]: #TODO: more arches
                    ENABLE_PYTORCH_ATTENTION = True
except:
    pass


if ENABLE_PYTORCH_ATTENTION:
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_flash_sdp(True)
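
For reference, here is a minimal standalone sketch of the check this commit adds, written as a self-contained function. The name should_enable_pytorch_attention is illustrative and not part of ComfyUI, and the mem-efficient toggle at the end is assumed from the commit title rather than shown in the diff above:

import torch

def should_enable_pytorch_attention(min_version=(2, 7), allowed_arches=("gfx1100",)):
    # Only applies to ROCm (AMD) builds of PyTorch.
    if not torch.cuda.is_available() or torch.version.hip is None:
        return False
    try:
        arch = torch.cuda.get_device_properties(0).gcnArchName  # e.g. "gfx1100"
    except AttributeError:
        return False
    major, minor = (int(x) for x in torch.__version__.split(".")[:2])
    return (major, minor) >= min_version and arch in allowed_arches

if should_enable_pytorch_attention():
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_flash_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(True)  # assumed from the commit title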

5 comments on commit d7b4bf2

@sleppyrobot commented on d7b4bf2 Feb 15, 2025

Hey, after updating Comfy yesterday, my PC hard locks on the 5/12 sampler step in my workflow. It has happened 3 times in the exact same way when using Hunyuan Video at 320x240x61.
I tried LTXV, SDXL and SD1 and had no issues.
It has never happened like that before, so I tested a bit and determined this commit was the cause; after commenting out this code it works again.

Also, xformers works with the 7900 XTX as of a few months ago. There are still some incompatibilities with custom nodes, but not with core nodes. For example, it does not work with depth_anythingV2 but it does work with ml-depth-pro.

ComfyUI startup log below:

/ComfyUI$ MIOPEN_FIND_MODE=2 PYTORCH_TUNABLEOP_ENABLED=1 python main.py --use-pytorch-cross-attention --bf16-vae --reserve-vram 0.9
[START] Security scan
[DONE] Security scan

ComfyUI-Manager: installing dependencies done.

** ComfyUI startup time: 2025-02-15 12:29:17.545
** Platform: Linux
** Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]
** Python executable: /home/adminl/anaconda3/envs/ComfyUI_310_s/bin/python
** ComfyUI Path: /home/adminl/Comfy/minimal/ComfyUI
** ComfyUI Base Folder Path: /home/adminl/Comfy/minimal/ComfyUI
** User directory: /home/adminl/Comfy/minimal/ComfyUI/user
** ComfyUI-Manager config path: /home/adminl/Comfy/minimal/ComfyUI/user/default/ComfyUI-Manager/config.ini
** Log path: /home/adminl/Comfy/minimal/ComfyUI/user/comfyui.log

Prestartup times for custom nodes:
0.0 seconds: /home/adminl/Comfy/minimal/ComfyUI/custom_nodes/rgthree-comfy
0.0 seconds: /home/adminl/Comfy/minimal/ComfyUI/custom_nodes/ComfyUI-Easy-Use
1.5 seconds: /home/adminl/Comfy/minimal/ComfyUI/custom_nodes/ComfyUI-Manager

Checkpoint files will always be loaded safely.
Total VRAM 24560 MB, total RAM 128733 MB
pytorch version: 2.6.0+rocm6.2.4
xformers version: 0.0.29.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 Radeon RX 7900 XTX : native
Using pytorch attention
ComfyUI version: 0.3.14
[Prompt Server] web root: /home/adminl/Comfy/minimal/ComfyUI/web
Total VRAM 24560 MB, total RAM 128733 MB
pytorch version: 2.6.0+rocm6.2.4
xformers version: 0.0.29.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 Radeon RX 7900 XTX : native

@sleppyrobot

Never mind, it seems my files got corrupted somehow.

@LuXuxue

Tested on a notebook: SDXL, 1024x1536, euler_ancestral, simple scheduler.
Hardware: Radeon 680M (Ryzen 7 6800H)
Using Arch Linux, PyTorch 2.6, up to date.
I use --fp8_e5m2 to make Triton work and to save 2 GB of RAM.
I use --novram to make --use-pytorch-cross-attention work; without it I get OOM.
--use-pytorch-cross-attention does not save RAM, but it can give a significant speedup:
without torch.compile and without --use-pytorch-cross-attention: 11.8 s/it
with --use-pytorch-cross-attention and without torch.compile: 11.5 s/it
with torch.compile and without --use-pytorch-cross-attention: 11.5 s/it
with torch.compile and --use-pytorch-cross-attention: 8 s/it, about a 30% speedup
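
For context, here is a rough, self-contained sketch of what the two options exercise together; the shapes, the toy cross_attention function, and the names are illustrative, not ComfyUI's actual code. --use-pytorch-cross-attention routes attention through torch's fused scaled_dot_product_attention kernel, which torch.compile can then fuse with the surrounding ops:

import torch
import torch.nn.functional as F

def cross_attention(q, k, v):
    # Fused SDPA path; PyTorch picks a math/flash/mem-efficient backend itself.
    return F.scaled_dot_product_attention(q, k, v)

compiled_cross_attention = torch.compile(cross_attention)

q = torch.randn(1, 8, 4096, 64)  # (batch, heads, query tokens, head dim)
k = torch.randn(1, 8, 77, 64)    # 77 text tokens, as in SD-style conditioning
v = torch.randn(1, 8, 77, 64)
out = compiled_cross_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 4096, 64])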

Checkpoint files will always be loaded safely.
Total VRAM 7584 MB, total RAM 15169 MB
pytorch version: 2.6.0
AMD arch: gfx1030
Set vram state to: NO_VRAM
Device: cuda:0 AMD Radeon Graphics : native
Using pytorch attention
ComfyUI version: 0.3.14
[Prompt Server] web root: /home/kane/ComfyUI/web

Import times for custom nodes:
0.0 seconds: /home/kane/ComfyUI/custom_nodes/websocket_image_save.py
0.0 seconds: /home/kane/ComfyUI/custom_nodes/ComfyUI-Custom-Scripts

Starting server

@wxe551 commented on d7b4bf2 Feb 23, 2025

> Also, xformers works with the 7900 XTX as of a few months ago. There are still some incompatibilities with custom nodes, but not with core nodes. For example, it does not work with depth_anythingV2 but it does work with ml-depth-pro.

@sleppyrobot
Though it is not related to the commit, this might help if you are talking about "Controlnet AIO Aux Preprocessor":

kijai/ComfyUI-DepthAnythingV2@003d7b4

@sleppyrobot

Yeah, I was referring to that.
