
No operator found for memory_efficient_attention_forward with inputs: #1109

Open
brcisna opened this issue Sep 20, 2024 · 1 comment

brcisna commented Sep 20, 2024

🐛 Bug

Command

start wunjo V2

To Reproduce

Steps to reproduce the behavior:

briefcase dev # starts wunjo AI V2

1. Go to the generation tab.
2. Start image generation.
3. After a few seconds of image generation, the following error appears in the console:

ERROR
No operator found for memory_efficient_attention_forward with inputs:
query : shape=(1, 2, 1, 40) (torch.float32)
key : shape=(1, 2, 1, 40) (torch.float32)
value : shape=(1, 2, 1, 40) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
ckF is not supported because:
dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
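
For reference, a minimal standalone sketch that reproduces the same dispatch failure outside of wunjo (assumptions: a ROCm build of PyTorch/xFormers and the AMD GPU exposed through the usual "cuda" device alias):

```python
import torch
import xformers.ops as xops

device = "cuda"  # ROCm maps the AMD GPU onto the CUDA device API

# Same shapes and dtype as in the error above: (batch, seq_len, heads, head_dim)
q = torch.randn(1, 2, 1, 40, device=device, dtype=torch.float32)
k = torch.randn(1, 2, 1, 40, device=device, dtype=torch.float32)
v = torch.randn(1, 2, 1, 40, device=device, dtype=torch.float32)

try:
    # float32 inputs: the ck* backends reject them, so dispatch fails
    xops.memory_efficient_attention(q, k, v)
except NotImplementedError as err:
    print(err)  # "No operator found for memory_efficient_attention_forward ..."

# Half precision is in the supported set {torch.bfloat16, torch.float16}
out = xops.memory_efficient_attention(q.half(), k.half(), v.half())
print(out.shape, out.dtype)
```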

Expected behavior

image is created

Environment

Debian 13
Python 3.10.12
PyTorch 2.4.1+rocm6.1
ROCm HIPCC
AMD Radeon Pro W6600 GPU

Please copy and paste the output from the
environment collection script from PyTorch
(or fill out the checklist below manually).

You can run the script with:

# For security purposes, please check the contents of collect_env.py before running it.
python -m torch.utils.collect_env
/home/superuser/.pyenv/versions/3.10.12/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.4.1+rocm6.1
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.1.40091-a8dbc0c19

OS: Debian GNU/Linux trixie/sid (x86_64)
GCC version: (Debian 14.2.0-3) 14.2.0
Clang version: Could not collect
CMake version: version 3.30.3
Libc version: glibc-2.40

Python version: 3.10.12 (main, Sep 17 2024, 03:58:18) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-6.10.9-amd64-x86_64-with-glibc2.40
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Radeon Pro W6600 (gfx1032)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.1.40091
MIOpen runtime version: 3.1.0
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               40
On-line CPU(s) list:                  0-39
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
CPU family:                           6
Model:                                63
Thread(s) per core:                   2
Core(s) per socket:                   10
Socket(s):                            2
Stepping:                             2
CPU(s) scaling MHz:                   57%
CPU max MHz:                          3000.0000
CPU min MHz:                          1200.0000
BogoMIPS:                             4588.99
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization:                       VT-x
L1d cache:                            640 KiB (20 instances)
L1i cache:                            640 KiB (20 instances)
L2 cache:                             5 MiB (20 instances)
L3 cache:                             50 MiB (2 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-9,20-29
NUMA node1 CPU(s):                    10-19,30-39
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          KVM: Mitigation: VMX disabled
Vulnerability L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:               Mitigation; PTI
Vulnerability Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.5
[pip3] onnx==1.16.2
[pip3] onnxruntime==1.19.2
[pip3] onnxruntime-gpu==1.19.2
[pip3] open_clip_torch==2.26.1
[pip3] pytorch-lightning==2.3.3
[pip3] pytorch-ranger==0.1.1
[pip3] pytorch-triton-rocm==3.0.0
[pip3] torch==2.4.1+rocm6.1
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.4.1+rocm6.1
[pip3] torchlibrosa==0.1.0
[pip3] torchmetrics==1.2.0
[pip3] torchvision==0.19.1+rocm6.1
[pip3] triton==3.0.0
[conda] Could not collect

  • PyTorch Version (e.g., 1.0): 2.4.1
  • OS (e.g., Linux): Debian 13
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.10.12
  • CUDA/cuDNN version: 11.8
  • GPU models and configuration: AMD Radeon Pro W6600
  • Any other relevant information:

Additional context

python -m xformers.info
xFormers 0.0.28.post1
memory_efficient_attention.ckF: available
memory_efficient_attention.ckB: available
memory_efficient_attention.ck_decoderF: available
memory_efficient_attention.ck_splitKF: available
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
[email protected]: unavailable
[email protected]: unavailable
[email protected]: unavailable
[email protected]: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
[email protected]: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.4.1+rocm6.1
pytorch.cuda: available
gpu.compute_capability: 10.3
gpu.name: AMD Radeon Pro W6600
dcgm_profiler: unavailable
build.info: available
build.cuda_version: None
build.hip_version: 6.1.40093-bd86f1708
build.python_version: 3.10.15
build.torch_version: 2.4.1+rocm6.1
build.env.TORCH_CUDA_ARCH_LIST:
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: -allow-unsupported-compiler
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.28.post1
source.privacy: open source

lw (Contributor) commented Sep 23, 2024

What exactly are you asking for help with?

The error message seems quite clear: you cannot pass float32 tensors to that operator on AMD GPUs.

If you're invoking xFormers through wunjo (no idea what that is), you should check with them to get them to fix their invocation.
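
A minimal sketch of the kind of change the calling code (wunjo, in this case) would need, assuming it currently hands float32 tensors to xFormers; the wrapper name and placement are hypothetical:

```python
import torch
import xformers.ops as xops

def memory_efficient_attention_rocm(q, k, v, attn_bias=None, p=0.0):
    # Hypothetical wrapper: cast float32 inputs to a dtype the ROCm ck*
    # kernels support, run the attention, then restore the caller's dtype.
    # (If attn_bias were a tensor it would need the same cast; in this
    # issue it is None.)
    orig_dtype = q.dtype
    if orig_dtype == torch.float32:
        q, k, v = q.half(), k.half(), v.half()
    out = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias, p=p)
    return out.to(orig_dtype)
```

If wunjo builds on diffusers, loading the whole pipeline with torch_dtype=torch.float16 would avoid the per-call cast, but that is a decision for the wunjo maintainers.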
