Video corruption on resume (Skylake) #149
I have had this issue for a few months now. It worked at some point in the past, but I haven't gone through building old kernels to find out which commit broke it. I recall it broke sometime between late January and early March. Is this a regression of #68? The visual symptoms are identical, but the cause may not be the same, since that bug seemed to be fixed. /var/log/messages contains many repeats of:
The source seems to have been refactored since https://bugs.freedesktop.org/show_bug.cgi?id=91697#c21, but I monkey-merged it with e0c82f4 of drm-next. Upon resuming I now see the desktop instead of a screen full of lines, but the system remains graphically unresponsive and can be rebooted by pressing the power button (log attached: messages.txt). I haven't tried to understand the driver code, so I have very little idea what the change is doing or should be doing, but symptomatically it appears to be a step in the right direction.
@therontarigo Upon resume, are you able to ssh into the system? If so, could you please collect the output of the following command?
$ top -SHza -d 1 | cat
I managed to get the
I notice that my swap partition is full, which is perhaps contributing to the loss of responsiveness. Anyhow, I'll try to get the
Hm, no, the swap partition isn't used at all here. I don't see anything particularly odd in the top output; I think the dmesg output would be more helpful.
I also get screen corruption on Kaby Lake with the intel/SNA DDX, though it is not as bad as the original reporter's. I am also not swapping, and the procstat command does not show any processes:
Also, there are no interesting events in my dmesg buffer since the corruption started. What may be interesting, though, is that I have two displays connected to this laptop: the corruption always happens on the laptop's eDP display, not on the HDMI-connected display. I've also got debugfs mounted, but am not seeing anything of interest there. Let me know if there are other data points that may help. On the off chance that the i915_context_status output from debugfs is helpful, here it is:
Oops, that was 4096M free, not 4096M used! :) The output of
The output of
Hopefully that's a bit more illuminating.
Just FYI, when the machine (running da5f901) is plugged into an external display (via either DP or HDMI), I see the same pattern on both displays:
Even more interestingly, when the computer is in this state it starts making an unusual sound a few seconds after resuming:
Okay, I found that the exact commit that broke this was 41b97ee, which in fact only updated the firmware.
Is this an Intel firmware bug, or did some other part of the kernel code fail to make a change needed for compatibility with the newer firmware?
@therontarigo: Thank you! Applying this workaround also fixes my resume corruption.
@therontarigo Nice find!
@therontarigo @markjdb: Reverting the installed firmware version will break other setups, because it basically means the default firmware will be used instead of the shipped one. That is, applying this patch will prevent the firmware from being loaded, because the kernel always asks for ver6_1? Can you check this in dmesg?
@hselasky Indeed,
It seems likely, then, that the workaround for some reason depends on the absence of the firmware, and that the problem would come back if the kernel were to load the ver4 GuC and ver1 DMC. Why is the kernel still trying to load the newer file? There are absolutely no instances of the string "ver6_1" in my source tree!
@therontarigo: Look for this:
They define the firmware versions to be loaded. Can you check if a newer firmware revision is available?
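A plausible reason a search for "ver6_1" finds nothing is that the firmware file name is assembled at compile time from separate version constants. The Python sketch below only mimics that idea: the constant names (SKL_FW_MAJOR, SKL_FW_MINOR) and the "i915/<platform>_guc_ver<major>_<minor>.bin" pattern are modeled on the Linux i915 GuC loader of that era and are assumptions here, as is the rule mapping a firmware path to a FreeBSD module name by turning "/" and "." into "_" (inferred from module names seen in this thread, such as i915_skl_guc_ver6_1_bin.ko).

```python
# Hypothetical illustration: the real driver builds the path with C
# preprocessor macros; these constants mirror that idea, not actual code.
SKL_FW_MAJOR = 6
SKL_FW_MINOR = 1


def guc_fw_path(platform: str, major: int, minor: int) -> str:
    """Assemble a GuC firmware path from its parts (assumed pattern)."""
    return f"i915/{platform}_guc_ver{major}_{minor}.bin"


def kmod_name(fw_path: str) -> str:
    """Map a firmware path to a FreeBSD module file name (assumed rule:
    '/' and '.' become '_', then '.ko' is appended)."""
    return fw_path.replace("/", "_").replace(".", "_") + ".ko"


if __name__ == "__main__":
    path = guc_fw_path("skl", SKL_FW_MAJOR, SKL_FW_MINOR)
    print(path)             # i915/skl_guc_ver6_1.bin
    print(kmod_name(path))  # i915_skl_guc_ver6_1_bin.ko
```

Under this assumption the literal "ver6_1" only exists once the pieces are joined, so grepping for the version constants (or the macro that joins them) is more likely to turn up the definition.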
6.1 is the latest according to Intel's firmware site (https://01.org/linuxgraphics/downloads/firmware). I also wanted to mention that I enabled GuC loading on my Skylake system, and the firmware loads as expected:
@hselasky Thanks, I figured it might be something like that. Now I see that the Makefile that was modified only controls what is copied on kernel install. Without these firmware images, are scheduling and certain sleep tasks falling back on kernel code, as the documentation at https://01.org/linuxgraphics/downloads/firmware seems to imply, or is there a "default firmware" somewhere in ROM or elsewhere in the source tree?
@therontarigo: Did you search for similar issues? Also, did you try the very latest drm-next branch? There is also an ongoing effort to upgrade to Linux 4.10. Maybe this issue is already known and fixed.
Just FYI, I still see this corruption on resume with ad07fe7.
@pewright-tronc: when you say that you "enabled loading GuC"... how do you do that? I've tried |
From what I've found, loading i915-related firmware has been hit or miss. On my Skylake box, when I load the i915kms driver, it automatically finds the SKL GuC and DMC firmware and loads it. Yet on a Kaby Lake box, when I build the appropriate firmware modules, the DMC does load, but I get the same error you see for the GuC bits. My guess is that the "fetch NONE, load NONE" error is a red herring, although unfortunately I don't have anything to back that up at the moment :( Do you happen to see the DMC firmware getting loaded on your system? It's named "i915_skl_dmc_ver1_26_bin.ko".
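One way to answer that question is to look for the firmware modules in the output of kldstat(8). The Python sketch below assumes kldstat's default five-column layout (Id, Refs, Address, Size, Name) and the i915_*_bin.ko naming seen in this thread; the function names are hypothetical.

```python
import subprocess
from typing import List


def kldstat_output() -> str:
    """Run kldstat(8) and return its text output (FreeBSD only)."""
    return subprocess.run(["kldstat"], capture_output=True, text=True,
                          check=True).stdout


def loaded_firmware_modules(kldstat_text: str) -> List[str]:
    """Return the names of loaded i915 firmware modules.

    Assumes a header row followed by one row per loaded file, whose last
    column is the file name, e.g. 'i915_skl_dmc_ver1_26_bin.ko'.
    """
    found = []
    for line in kldstat_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or unexpected lines
        name = fields[-1]
        # i915kms.ko itself does not match: firmware kmods end in _bin.ko
        if name.startswith("i915_") and name.endswith("_bin.ko"):
            found.append(name)
    return found
```

On a FreeBSD machine, loaded_firmware_modules(kldstat_output()) would list whichever of the DMC and GuC modules are resident. Parsing the text rather than calling kldstat inside the parser keeps the logic testable on saved output.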
Ok, so it sounds like perhaps I shouldn't worry about the GuC thing... Also, yes, the DMC firmware gets loaded automatically when I |
It looks like @cperciva has now gotten to the point of seeing the same coloured bars as me... progress of a kind? :)
It is GuC that always fails to load. DMC loads without issue, but the video corruption occurs when it is loaded. Deleting the module containing the firmware works around the issue, though likely at the cost of higher power consumption.
I have heard that Skylake video resume is not an issue on OpenBSD. Could it be as simple as OpenBSD not loading the firmware in the first place, since it cannot be audited?
Thanks to everyone for the digging and discussion. I just wanted to add a note about my positive experience with the workaround. Running top-of-tree drm-next GENERIC (rev 300ce5d) plus current FreeBSD.org generic packages on a Dell Latitude E7470 with an i7-6600U, I've never had problems suspending, and the following gets me a functional resume without the graphical stripe corruption:
On this machine, only i915_skl_dmc_ver1_26_bin.ko is loaded automatically by i915kms.ko; i915_skl_guc_ver6_1_bin.ko is not. I tried explicitly loading both the DMC and GuC modules from rc.conf like so:
@lastewart: Thanks very much! This workaround works for me too, and lets me run with
Just a quick update: my (Skylake) notebook is still experiencing this issue as of f304e52, though the previously proposed workaround continues to help. I wonder: are there examples of Skylake machines that do not experience this video corruption on resume? If not, might it be worth disabling the
Hi, can you try this patch:
--HPS
…er(). This ensures that firmware loading works properly without any signature errors. Issue: #149 Issue: #151 Signed-off-by: Hans Petter Selasky <[email protected]>
Have you tried setting "sysctl hw.acpi.reset_video=1" before suspending?
It seems that the system doesn't resume with that sysctl enabled: I get a black screen and an unresponsive machine (pressing the power button again doesn't shut the system down, for example).
I have the exact same issue on a Lenovo X1. Setting "sysctl hw.acpi.reset_video=1" prevents the system from resuming altogether. Otherwise I get the same pinstripe-like video corruption as shown in the screenshot. This is stock CURRENT r330606 with 4.11.g20180224, installed fresh today. Let me know if I can provide any additional debugging information to help resolve this.
To load the GuC you need to force it, since it's disabled by default in the Intel driver. For me it fails to load about half of the time. I think my Skylake GPU hangs with the new firmware, which might be why Intel has not enabled it yet (after an automatic GPU reset, the system continues to boot as normal with GuC disabled). Add to /boot/loader.conf:
scatterlist.h is identical to what is in the drm-next branch of freebsd-base-graphics. It has my fixes from August: commit a7dcabc
Yes, I realized that the comment was very old :)
Dumb question here, but why do we want the GuC? Does it do anything aside from giving us mangled video output on resume? (I mean, I assume it's supposed to do something, but...)
@cperciva |
What's the plan to fix this issue? Remove the DMC firmware from /boot/modules?
@abishai If that solves your problem, yes, do that for now until we can locate the source of the problem.
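For anyone applying that workaround, here is a minimal Python sketch that moves the i915 DMC firmware modules out of /boot/modules into a backup directory instead of deleting them, so they are easy to restore later. The backup directory name, the glob pattern and the function name are hypothetical; it has to run as root, and the change presumably only takes effect the next time i915kms loads (for example, after a reboot).

```python
import shutil
from pathlib import Path
from typing import List

# /boot/modules is where the firmware kmods live in this thread; the
# backup directory is a hypothetical choice, not a FreeBSD convention.
MODULES_DIR = Path("/boot/modules")
BACKUP_DIR = Path("/boot/modules.disabled")


def disable_firmware(pattern: str = "i915_*_dmc_*_bin.ko",
                     modules_dir: Path = MODULES_DIR,
                     backup_dir: Path = BACKUP_DIR) -> List[Path]:
    """Move firmware modules matching `pattern` into `backup_dir`.

    Returns the new paths so the move can be undone later by moving
    the files back into `modules_dir`.
    """
    matches = sorted(modules_dir.glob(pattern))
    if not matches:
        return []  # nothing to do; don't create the backup directory
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for module in matches:
        target = backup_dir / module.name
        shutil.move(str(module), str(target))
        moved.append(target)
    return moved
```

Passing a different pattern (for example "i915_*_guc_*_bin.ko") would move the GuC module instead; moving rather than deleting keeps the experiment reversible.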
@johalun Yes, it does. During the boot sequence, the driver warns me that runtime power management is disabled, but I couldn't measure any difference, at least with acpiconf -i 0. The numbers seem to be the same.
@johalun On an HP EliteBook 1040 G3 (Skylake i7-6600U), moving the
The issue persists (unless the firmware modules are deleted) on FreeBSD 12.0-CURRENT r335560 with drm-stable-kmod-g20180606 (from ports). Linux had this problem, but it was fixed: https://bugs.freedesktop.org/show_bug.cgi?id=91697
Kernel output during the problem remains the same as always:
For what it is worth, the symptoms have been encountered even on Windows:
Why is the DMC firmware needed, and why is it packaged if it causes resume issues?
@pkgdemon Newer firmware brings updates and fixes to the GPU microcode, but the GPU should be OK without it for basic functionality. The resume issue is most likely related not to the firmware itself but to how the firmware is loaded in FreeBSD / LinuxKPI. Investigating this more closely is on the to-do list.
@pkgdemon I have a strong suspicion that without DMC, the Plasma 5 compositor becomes unstable (producing artifacts from time to time) with the OpenGL backend when doing vsync.
@abishai Does the combination of absent DMC and drm-stable-kmod produce these artifacts?
@therontarigo Yes, I run drm-stable-kmod. Visually, they are screen redraw glitches when vsync is used. They go away if I switch the compositor backend in the compositor settings (OpenGL 2.0 to 3.1 or back, or to XRender), and they occur very infrequently. However, I can't be sure they are directly caused by DMC. Maybe they are a result of suspend/resume. For obvious reasons, I don't use suspend/resume with DMC present.
This seems resolved:
Suspend and resume with no (graphics-related) problems...
Confirmed, the latest version works fine. IIRC someone at BSDCam said that it was an issue with reloading the firmware after resume. |
I have no problems on the HP 1040 G3 either. However, I don't suspend and resume other than to test the laptop; I turn it off completely. It's an old habit, from the days when the best CPU my laptop had was a Celeron M, that I refuse to break (my current laptop has an i7).
Looks good for me too!
When I resume from suspend, I see the following corruption. This has been going on for a couple of months; @markjdb asked me to retest this week, but alas, the corruption continues.