Possible issue with TME #836
Comments
Hi @renehoj, thank you for the report. What is the state of ME? If it is disabled, can you try with it enabled?
I'm currently using the MSI firmware with ME enabled, and I believe I tried Dasharo both with and without ME enabled.
@renehoj, thanks; at least we know this is not caused by some odd ME dependency. The key question is whether we can reproduce this issue.
Can you provide more info about the DDR5 modules and XMP settings (which profile)? Does it happen with the RAM running at default JEDEC speeds?
DDR5 4x16GB G.Skill Ripjaws S5 6GHz (XMP1). With Dasharo, both kits failed to overclock with XMP1, and I experienced software crashes with both running at JEDEC speeds. I didn't test the Ripjaws kit with the MSI firmware. The Vengeance kit runs at 5.6GHz with XMP1 on the MSI firmware without any issues, and it is stable unless I enable TME in the firmware. You can enable memory encryption in the MSI firmware by enabling the advanced settings and going into the CPU configuration; it's disabled by default.
Last night I compiled Dasharo with CONFIG_CPU_SUPPORTS_INTEL_TME=n to see if that would make a difference.
I didn't know this; I thought MSI never implemented it and said so many times in the past. Well, at least it would make benchmarking easier... The XMP implementation has been a failure, considering how few builds it works on. From what I remember, if you have serial output on DDR5 builds, the FSP complains that XMP is only valid with 2 DIMMs installed and stops; it doesn't even try to train with 4 DIMMs. Has this happened since you first started using Dasharo, or only recently? Given the current Intel issues with stable builds that suddenly begin to crash, I would suspect processor degradation, which would actually be a major problem since Dasharo uses Intel default settings. Perhaps the TME AES-XTS units are the weakest link in the chain? That is impossible to know, since few people use them.
The Z690 didn't have this issue, that I know for a fact; it started after I switched to the Z790. I don't believe I had this issue with Z790 v0.9.0, but I'm not 100% sure. I checked the update history, and there was a microcode update for Qubes OS in January; I don't know if that could affect TME. I think the MSI firmware defaults to 4800, but I could be wrong. Dmidecode also reports the memory as 4800-speed, configured to run at 3600 under Dasharo.
Adding CONFIG_CPU_SUPPORTS_INTEL_TME=n to the config file didn't seem to do anything; it gets ignored by the build script. I compiled a new ROM with CONFIG_INTEL_TME=n instead.
@zirblazer I tested the MSI firmware again; the memory defaults to 4000MHz. I don't think this is directly related to TME; I think it's this issue: I read elsewhere that Chromium applications are directly affected by it. Maybe enabling TME makes it worse, but disabling TME doesn't solve the issue. I think the difference between the MSI and Dasharo firmware is just the default settings used, and the root cause is the 13900K CPU. I'm currently testing the beta version of the latest MSI firmware to see if that improves the situation.
That is directly related to this, which was heavily covered by hardware news sites one to two months ago: https://www.tomshardware.com/pc-components/cpus/intel-issues-official-statement-on-core-k-series-crashes-stick-to-intels-official-power-profiles I'm just curious that you had the same CPU on two different motherboards and it is unstable on the Z790-P but not on the Z690-A. The Z690-A uses the Alder Lake FSP, which had preliminary Raptor Lake support, whereas the Z790-P uses the Raptor Lake FSP, which is backwards compatible with Alder Lake (plus, Intel recently discontinued the Alder Lake FSP and said to use the Raptor Lake FSP for everything instead). That is literally the biggest difference between the two.
I didn't have the same CPU in two motherboards; I have one system with Z690 DDR4 + 12900K and one with Z790 DDR5 + 13900K. What I meant was that I didn't have the issue before upgrading to the Z790.
You may be interested in this: https://videocardz.com/newz/intel-finds-root-cause-of-raptor-lake-cpu-stability-issues-bios-with-new-microcode-underway
That sounds like it could solve the issue. I tried building Dasharo with the same settings MSI uses in its firmware, but I didn't know how to mimic the feature they call CPU Lite Load, which controls the load line voltage and impedance/resistance. In the MSI firmware, it seems like my CPU isn't stable without changing that value.
I need @miczyg1 to confirm that this is correct, since I don't want your hardware to go up in magic white smoke, but since this was asked recently I didn't have to dig too deep in the chat logs. First, if you already have the MSI BIOS, get MSI's default CPU Lite Load Control values. When you select CPU Lite Load Control in Advanced mode, it should show you the default values that MSI is currently using. You can check photos here: https://docs.dasharo.com/guides/dasharo-reviewers-guide/#configure-msi-firmware-with-intel-default-parameters Dasharo defaults depend on the SKU, and for a 13900K they are supposed to be 110/110, which is the maximum value according to the datasheet (ironically, it should be the most stable, since you are providing the most voltage under load, at the cost of higher temperatures and power consumption). MSI tends to use lower values than that, but it also depends on the SKU. Then, you're supposed to add SOMEWHERE around here (this is what I'm not 100% sure about): The following two lines: That is equivalent to 80/80 on MSI (basically, you need to add two zeros). Replace with whatever MSI uses. The only tool I'm aware of that can read these values from software, to confirm the change kicked in as intended, is HWiNFO on Windows. Alternatively, I'm also considering whether to leave TVB enabled...
I've tried the stable version and the two beta versions of the MSI firmware. With Auto, the stable version defaults to the value 8 and the beta versions default to the value 12. This post goes into detail about how to map the Lite Load values: 8 => 80/80 and 12 => 110/110. The beta version's changelog says it implements the Intel default values, so I presume the value should be 12. I'm currently running the stable firmware with PLL 150/265, Lite Load at 12, and the MSI config for Intel thermal/boost/voltage/etc. Everything seems to be working as it should, and the CPU isn't getting hot. Tomorrow I'll try to build Dasharo with Dc/Ac LoadLine set to 110000. I did try different TVB settings; they didn't really seem to do much.
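Since this discussion switches between MSI's Lite Load "modes", MSI's displayed AC/DC numbers, and mOhms, a small lookup can help keep them straight. This is a hypothetical sketch: `LITE_LOAD_MODES` and `lite_load_to_mohm` are invented names, only the two mappings quoted above (8 => 80/80, 12 => 110/110) are included, and reading MSI's numbers as units of 0.01 mOhm (so 110 means 1.10 mOhm) is an assumption, not something confirmed in this thread. It does not produce the value Dasharo expects in its build configuration, whose units are not stated here.

```python
# Hypothetical helper: convert MSI "CPU Lite Load" modes into AC/DC load line
# values in mOhm. Only the two mode mappings quoted in this thread are
# included; any other mode is rejected rather than guessed.

# MSI UI value pairs (AC, DC) reported in this thread for each Lite Load mode.
LITE_LOAD_MODES: dict[int, tuple[int, int]] = {
    8: (80, 80),     # stable MSI firmware default (Auto)
    12: (110, 110),  # beta MSI firmware default (Intel default values)
}

# ASSUMPTION: MSI's displayed numbers are in units of 0.01 mOhm,
# so 110 corresponds to 1.10 mOhm.
MSI_UNITS_PER_MOHM = 100


def lite_load_to_mohm(mode: int) -> tuple[float, float]:
    """Return the (AC, DC) load line in mOhm for a known Lite Load mode."""
    if mode not in LITE_LOAD_MODES:
        raise ValueError(f"no known mapping for Lite Load mode {mode}")
    ac, dc = LITE_LOAD_MODES[mode]
    return ac / MSI_UNITS_PER_MOHM, dc / MSI_UNITS_PER_MOHM


if __name__ == "__main__":
    for mode in sorted(LITE_LOAD_MODES):
        ac, dc = lite_load_to_mohm(mode)
        print(f"Mode {mode}: AC {ac:.2f} mOhm / DC {dc:.2f} mOhm")
```

Rejecting unknown modes is deliberate: the mapping for other modes is SKU- and firmware-dependent, so it should be read from the MSI firmware's CPU Lite Load Control screen rather than interpolated.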
It is better to talk in actual mOhms instead of whatever metric MSI uses for its "Modes" thing; that is why I told you to select CPU Lite Load Control in Advanced mode to see what you're currently using, as per the photos from the guide.
Dasharo should already use 110/110 for the 13900K, as I stated above, so you are effectively achieving nothing. It could make more sense to use 80/80 if that is what has been working fine. You can only verify it on Windows with HWiNFO, because that is the only tool I know of that can read the current values.
@renehoj |
I would need to wait 2-3 weeks to get a new unit, which is not an option for me. If getting a new unit would solve the problem, I would buy a new CPU, but it seems like there is no guarantee it will change anything. I'll use the MSI firmware until the 15th gen is released, and will probably upgrade if the next generation doesn't have the same issue.
As a 14900K on Z690-A DDR4 user, I can share my experience here. I was also experiencing crashes in browser tabs, especially tabs with video players (YouTube most often crashed simply by fast-forwarding the video 10s with the right arrow key). Browser crashes were a small thing compared to slightly higher workloads like games or compilation. Whenever I tried to compile something on Linux, I often got segmentation faults/core dumps. On Windows, some games launched from Steam could run for 1h, maybe 2 or even 3h, but other games sometimes crashed in under 15 minutes, or even sooner. The best was League of Legends with its Vanguard kernel module. It constantly caused BSODs, first every ~30 min, then every ~5 min, and finally every 1-2 min. Unplayable... So I contacted support and they came up with this: https://support-leagueoflegends.riotgames.com/hc/en-us/articles/30677122946195-vgk-sys-Error-Troubleshooting-13900k-14900k-processors-only Dasharo was already running on Intel defaults. However, the turbo limits and TVB could be tuned better. I quickly compiled a ROM with a turbo multiplier of x53 on all P-cores, TVB disabled and IA CEP disabled, and the problems were gone. So these problems are real for the 13900K and 14900K. What still bothers me is that the browser tab crashes started in April/May, but the BSODs started for me in June (right when I came back from Xen Summit in Lisbon). I had no issues before that since I bought the CPU in November (?). Not sure what to think of it. CPU degradation? When I watched the CPU workload in ThrottleStop on Windows, I saw a new limit reason that did not appear before: the V-MAX limiter. So apparently the CPU core voltage went outside the allowed range and caused problems. If I were to bet, it would be TVB (the TVB microcode issue + TVB's "intelligent" frequency/voltage adaptation). Anyway, I'm reopening this, since it is valid. @renehoj, don't RMA; I don't think it will help. It can be worked around with a few firmware tweaks.
Still, I don't like limiting my performance (turbo multipliers) just because some CPU features don't do their job properly, even when running on Intel's blessed defaults...
I noticed the crashes sometime around February; I initially thought it was an issue with a Firefox update. Running Qubes OS, what I find the oddest is that it doesn't crash Xen, dom0, or the domUs. It rarely crashes Firefox itself; it crashes a single tab in the browser, and it can happen while the browser doesn't seem to be doing anything. I guess the browser could be running some JS code in the background, or Xen could be using the same core for something else, but I don't understand why the system overall seems stable while only the applications crash. With both the original MSI and Dasharo settings, there is a 10-20% chance a browser tab will crash if I open a GitHub link, but for some reason the hypervisor and guest OS are completely unaffected.
@miczyg1 This article seems to confirm the problem gets progressively worse over time.
I just double-checked; I started noticing the crashes 4–5 months after I bought the CPU.
I bought mine in November, so that would be 6-7 months in my case.
Intel has found the root cause and will fix it with a microcode update in August; already affected CPUs seem to be permanently damaged.
Component
Dasharo firmware
Device
MSI Pro Z790-P
Dasharo version
v0.9.1
Dasharo Tools Suite version
No response
Brief summary
The system seems to become unstable with TME enabled.
How reproducible
100%
How to reproduce
Run a browser with a moderate workload, e.g. streaming video, and the browser or individual browser tabs will randomly crash.
It doesn't happen often, maybe 1-3 times a day.
Expected behavior
There are no software crashes with TME enabled.
Actual behavior
Applications crash.
Screenshots
No response
Additional context
For some time, I had problems with both Firefox and Brave crashing, and switching to the MSI firmware solved the problem.
This week, I enabled memory encryption and the issue returned; after disabling TME, the system is stable again.
I found a post on the MSI forums that mentions a similar problem; I'm running the MSI firmware with XMP enabled.
https://forum-en.msi.com/index.php?threads/total-memory-encryption-disabilized-oc.395049/
I think my Dasharo problem could have the same cause, just without the memory being overclocked.
CPU: 13900K - Memory: DDR5
Solutions you've tried
I tried two different types of memory, 64 GB and 128 GB, both had the same issue.
Installing the MSI firmware and disabling TME seems to solve the issue.