[Issue]: Flash Attention Failure on AMD Mi50 #1140

Closed
ThePerfectComputer opened this issue Jan 23, 2024 · 4 comments

@ThePerfectComputer

Problem Description

I was able to build flash-attention for ROCm for both my Mi100 and Mi50 cards, but only got flash attention working on the Mi100 (very impressive performance, I might add).

Running flash attention on the Mi50 produced the following error:
RuntimeError: DeviceGroupedMultiheadAttentionForward_Xdl_CShuffle_V2<256, 128, 128, 32, 8, 8, 128, 128, 32, 2, Default, ASpecDefault, B0SpecDefault, B1SpecDefault, CSpecDefault, MaskUpperTriangleFromTopLeft> does not support this problem

How hard would it be to get CK working on the Mi50? I am happy to pay or hire for support on this, as I have a rather large stockpile of Mi50s.

Note - I was directed to CK from here: ROCm/flash-attention#29
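
For anyone hitting the same RuntimeError, one stopgap is to catch that specific failure and route the call to a slower reference implementation. The sketch below is only an illustration: `run_with_fallback`, `fused_attention`, and `reference_attention` are hypothetical names that do not come from flash-attention or CK, and the stand-in functions exist only so the example runs without a GPU.

```python
def run_with_fallback(primary, fallback, *args, **kwargs):
    """Call `primary`; if it rejects the problem, call `fallback` instead."""
    try:
        return primary(*args, **kwargs)
    except RuntimeError as err:
        # Only handle the "unsupported problem" failure quoted in this issue;
        # any other RuntimeError is re-raised unchanged.
        if "does not support this problem" not in str(err):
            raise
        return fallback(*args, **kwargs)


# Hypothetical stand-in for the fused kernel: always rejects the problem.
def fused_attention(x):
    raise RuntimeError(
        "DeviceGroupedMultiheadAttentionForward_Xdl_CShuffle_V2<...> "
        "does not support this problem"
    )


# Hypothetical stand-in for a slower reference implementation.
def reference_attention(x):
    return ("reference", x)


result = run_with_fallback(fused_attention, reference_attention, 42)
print(result)
```

In a real setup, `primary` would presumably be the flash-attention entry point and `fallback` something like PyTorch's `torch.nn.functional.scaled_dot_product_attention` with `is_causal=True`. Note that the two expect different tensor layouts, so the fallback would need to transpose its inputs and output.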

Operating System

Ubuntu 22.04

CPU

AMD EPYC 7351 16-Core Processor

GPU

AMD Instinct MI50

Other

No response

ROCm Version

ROCm 5.7.1

ROCm Component

No response

Steps to Reproduce

Follow the steps in this Dockerfile, but change the AMD GPU targets in the various makefiles to gfx906 for the AMD Mi50.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents


Agent 1


Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 8068804(0x7b1ec4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 8068804(0x7b1ec4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8068804(0x7b1ec4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 1
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 1
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16512396(0xfbf58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16512396(0xfbf58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16512396(0xfbf58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 3


Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 2
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 2
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 8254860(0x7df58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 8254860(0x7df58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8254860(0x7df58c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 4


Name: AMD EPYC 7351 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7351 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 3
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2400
BDFID: 0
Internal Node ID: 3
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16458272(0xfb2220) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16458272(0xfb2220) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16458272(0xfb2220) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 5


Name: gfx906
Uuid: GPU-a20c80e172fd5d44
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 4
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 1792
Internal Node ID: 4
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 6


Name: gfx906
Uuid: GPU-68da30417337ecda
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 5
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 2560
Internal Node ID: 5
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 7


Name: gfx906
Uuid: GPU-2492786172fc1a88
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 6
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 10752
Internal Node ID: 6
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 8


Name: gfx906
Uuid: GPU-bc5e39417337ecd7
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 7
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 11520
Internal Node ID: 7
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 9


Name: gfx906
Uuid: GPU-65d6404172e6212c
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 8
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 17664
Internal Node ID: 8
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 10


Name: gfx906
Uuid: GPU-1cf2390172dc76bd
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 9
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 18432
Internal Node ID: 9
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 11


Name: gfx906
Uuid: GPU-aae6492172e6212c
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 10
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 25856
Internal Node ID: 10
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


Agent 12


Name: gfx906
Uuid: GPU-4c24504172e6268f
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 11
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 26624
Internal Node ID: 11
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 469
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Additional Information

No response

@zjing14
Contributor

zjing14 commented Jan 30, 2024

@ThePerfectComputer Thanks for your interest in our Flash Attention. It is implemented for the MI100 and later data center GPUs. The MI50 lacks AMD matrix cores (MFMA instructions), so it cannot deliver good enough performance.
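
The hardware requirement described above can be checked before attempting a build by scanning `rocminfo` output for GPU agents and comparing their gfx names against the architectures that have MFMA instructions. This is a rough sketch, not part of CK or flash-attention: the `MFMA_ARCHS` set is an assumption based on the MI100 (gfx908), MI200 (gfx90a), and MI300 (gfx940/gfx941/gfx942) families and may be incomplete, and the function names are made up for illustration.

```python
import re

# Assumption: gfx targets with MFMA (matrix core) instructions. May be incomplete.
MFMA_ARCHS = {"gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"}


def gpu_archs(rocminfo_text):
    """Return the gfx name of every GPU agent found in rocminfo output."""
    archs = []
    last_name = None
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        # Matches "Name: ..." but not "Marketing Name: ..." or "Vendor Name: ...".
        match = re.match(r"Name:\s*(\S.*)", line)
        if match:
            last_name = match.group(1).strip()
        elif re.match(r"Device Type:\s*GPU\b", line) and last_name:
            archs.append(last_name)
    return archs


def archs_without_mfma(archs):
    """Return the architectures that are not in the assumed MFMA set."""
    return [arch for arch in archs if arch not in MFMA_ARCHS]


# Abbreviated sample in the same shape as the log posted in this issue.
sample = """\
Agent 1
Name: AMD EPYC 7351 16-Core Processor
Marketing Name: AMD EPYC 7351 16-Core Processor
Device Type: CPU
Agent 5
Name: gfx906
Marketing Name: AMD Radeon VII
Device Type: GPU
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
"""

found = gpu_archs(sample)
print(found, archs_without_mfma(found))
```

On a live system the text could come from running `/opt/rocm/bin/rocminfo` and capturing its standard output. For the log in this issue, every GPU agent reports gfx906, which is not in the assumed set.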

@zjing14
Contributor

zjing14 commented Jan 30, 2024

You may be interested in our Flash Attention work on Navi3x: #1032

@radna0

radna0 commented May 5, 2024

@ThePerfectComputer Have you been able to find a solution for this?

@linchen111

(Reposted the original issue description and rocminfo output from above, with a machine-generated Chinese translation interleaved.)

Agent 8

Name: gfx906 Uuid: GPU-bc5e39417337ecd7 Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 7 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 11520 Internal Node ID: 7 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32

Agent 9

Name: gfx906 Uuid: GPU-65d6404172e6212c Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 8 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 17664 Internal Node ID: 8 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32

Agent 10

Name: gfx906 Uuid: GPU-1cf2390172dc76bd Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 9 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 18432 Internal Node ID: 9 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32

Agent 11

Name: gfx906 Uuid: GPU-aae6492172e6212c Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 10 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 25856 Internal Node ID: 10 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32

Agent 12

Name: gfx906 Uuid: GPU-4c24504172e6268f Marketing Name: AMD Radeon VII Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 11 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 26287(0x66af) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1801 BDFID: 26624 Internal Node ID: 11 Compute Unit: 60 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 469 SDMA engine uCode:: 145 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 16760832(0xffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 *** Done ***
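A dump this long is easier to check with a script than by eye. The sketch below pulls the `gfx` target out of each `amdgcn-amd-amdhsa--...` ISA name and compares it with a list of targets assumed to support the matrix (MFMA) instructions that CK's `Xdl` kernels are generally understood to rely on. The contents of `ASSUMED_XDL_TARGETS` and both helper names are assumptions made for illustration; they are not taken from this issue or from the CK sources.

```python
import re

# Assumption for illustration only: targets believed to have MFMA
# ("XDL") instructions. This list does not come from the issue or from CK.
ASSUMED_XDL_TARGETS = {"gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"}

# Matches ISA names such as "amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-"
# and captures only the target part ("gfx906").
ISA_RE = re.compile(r"amdgcn-amd-amdhsa--(gfx[0-9a-f]+)")


def gpu_targets(rocminfo_text):
    """Return the sorted, de-duplicated gfx targets named in ISA lines."""
    return sorted(set(ISA_RE.findall(rocminfo_text)))


def targets_outside(rocminfo_text, supported=ASSUMED_XDL_TARGETS):
    """Return the targets in the dump that are not in `supported`."""
    return [t for t in gpu_targets(rocminfo_text) if t not in supported]


if __name__ == "__main__":
    # One ISA line copied from the dump above.
    sample = "ISA 1 Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-"
    print("targets found:", gpu_targets(sample))
    print("outside assumed XDL list:", targets_outside(sample))
```

In practice the text would come from saving the output of `/opt/rocm/bin/rocminfo` to a file and passing its contents to `gpu_targets`.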

Additional Information

No response

Hi, did you solve this problem? Either way, the Mi50 is super cheap. If you did solve it, I can send you some cards for free, haha.

Or, more directly, can you help with building vLLM on the Mi50?

@ppanchad-amd closed this as not planned on Aug 21, 2024