7900XT and rocm 6.1 #12

Open
smurfd opened this issue Apr 20, 2024 · 0 comments

smurfd commented Apr 20, 2024

Interesting notes (and yes, I know, userspace should never be able to crash the driver):
This pretty much reproduces on a 7900 XT (without the last X, so not the XTX).
I don't get into that unrecoverable state, though (or maybe I have not run it for long enough).

v6.1 of rocm was released a few days ago (12th April) and it behaves way better, tested with Ubuntu 20.04 and the default 5.15 kernel.

Modified driver.py: indented the print, added a counter and a 1 second sleep, and destroyed the queue.

    print(nq, ii)
    ii += 1
    time.sleep(1)
    qq = kio.destroy_queue(fd)

With rocm 6.0.3 and kernel 5.15 it will shortly start to write these to dmesg (before 100 loops):

[  337.541867] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14

and you notice that it starts to go a bit slower. Yes, it will run a long time, over 1000 loops... though spewing the errors.
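The slowdown mentioned above is only an impression from watching the loop. A minimal sketch of how it could be measured is below. It assumes nothing about driver.py itself: `run_timed`, `slowdown_ratio` and the `step` callback are hypothetical names made up for illustration, and the placeholder step would have to be swapped for the real queue create/destroy body.

```python
import time
from statistics import mean


def run_timed(step, loops, pause=0.0):
    """Call step(i) `loops` times and return each call's duration in seconds."""
    durations = []
    for i in range(loops):
        start = time.perf_counter()
        step(i)
        durations.append(time.perf_counter() - start)
        if pause:
            # Optional sleep between iterations; not included in the timing.
            time.sleep(pause)
    return durations


def slowdown_ratio(durations, window=10):
    """Mean of the last `window` durations divided by the mean of the first `window`."""
    if len(durations) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    return mean(durations[-window:]) / mean(durations[:window])


if __name__ == "__main__":
    # Placeholder step (hypothetical); replace with the loop body from driver.py.
    durations = run_timed(lambda i: time.sleep(0.001), loops=40)
    print(f"slowdown ratio: {slowdown_ratio(durations):.2f}")
```

A ratio well above 1 would back up the "goes a bit slower" observation with a number, which makes the 6.0.3 and 6.1 runs easier to compare.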

With rocm 6.1 and kernel 5.15 it will run for over 1000 loops
and writes almost nothing but these warnings:
[drm] Skip scheduling IBs!
They have not just silenced the errors; it really seems to work better, otherwise it would slow down like it did on 6.0.3.
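To compare the runs with numbers instead of eyeballing dmesg, something like the sketch below could count the two messages quoted above. The regexes are taken only from those two log lines; `count_messages` and the `PATTERNS` keys are hypothetical names, and other kernel versions may word the messages differently.

```python
import re
from collections import Counter

# Patterns based only on the two dmesg messages quoted in this issue.
PATTERNS = {
    "mes_failed": re.compile(r"MES failed to response msg=\d+"),
    "skip_ibs": re.compile(r"\[drm\] Skip scheduling IBs!"),
}


def count_messages(lines):
    """Count how many lines match each known message pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from this issue; real input would be the dmesg output.
    sample = [
        "[  337.541867] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 "
        "[amdgpu]] *ERROR* MES failed to response msg=14",
        "[drm] Skip scheduling IBs!",
    ]
    for name, n in count_messages(sample).items():
        print(name, n)
```

Running it on the dmesg output saved after, say, 1000 loops on each rocm/kernel combination would give one pair of counts per setup.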

bob@melee:~/dev/7900xtx/crash$ uname -a
Linux melee 5.15.0-105-generic #115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Updated to kernel 6.5, and with both rocm 6.0.3 and 6.1... ewww, it is spewing errors quickly.
