Some interesting notes (and yes, I know, userspace should never be able to crash the driver):
This pretty much reproduces on a 7900 XT (not the XTX, so without the trailing X).
I don't get into that unrecoverable state, though (or maybe I have not run it for long enough).
ROCm 6.1 was released a few days ago (12th April), and it behaves much better with Ubuntu 20.04 and the default 5.15 kernel.
I modified driver.py: indented the print, added a counter and a one-second sleep, and destroyed the queue:
print(nq, ii)
ii += 1
time.sleep(1)
qq = kio.destroy_queue(fd)
With ROCm 6.0.3 and kernel 5.15 it soon starts writing these to dmesg (before 100 loops):
[ 337.541867] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
and you notice that it starts to run a bit slower. It will still run for a long time, over 1000 loops, while spewing the errors.
With ROCm 6.1 and kernel 5.15 it runs for over 1000 loops
and writes almost nothing but these warnings: [drm] Skip scheduling IBs!
They have not just silenced the errors; it really seems to work better, otherwise it would slow down like it did on 6.0.3.
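To compare runs with numbers rather than by eye, the two kinds of dmesg lines quoted above can be counted with a small script. This is only a rough sketch: the function name `count_amdgpu_messages` and the labels `mes_failed` and `skip_ibs` are made up for illustration, and the patterns match only the exact message texts shown in this comment.

```python
import re
import sys
from collections import Counter

# Patterns taken from the two dmesg messages quoted in this comment.
# The keys are illustrative labels, not names used by the driver.
PATTERNS = {
    "mes_failed": re.compile(r"MES failed to response msg=\d+"),
    "skip_ibs": re.compile(r"Skip scheduling IBs!"),
}


def count_amdgpu_messages(log_text: str) -> Counter:
    """Count how many log lines match each known amdgpu message."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Read kernel log text from stdin and print the per-message counts.
    print(dict(count_amdgpu_messages(sys.stdin.read())))
```

Assuming it is saved as count_msgs.py (a hypothetical file name), it could be run as `sudo dmesg | python3 count_msgs.py` after a fixed number of loops on each ROCm and kernel combination.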
bob@melee:~/dev/7900xtx/crash$ uname -a
Linux melee 5.15.0-105-generic #115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
I then updated to kernel 6.5, and with both ROCm 6.0.3 and 6.1... ewww. It spews errors quickly.