Replies: 45 comments 146 replies
-
Found something interesting in a proposed patch from a discussion titled "[PATCH] nvme-pci: fix host memory buffer allocation size", dated May 10th, 2022. The discussion starts here => https://www.spinics.net/lists/kernel/msg4339024.html At some point (https://www.spinics.net/lists/kernel/msg4352567.html), it is mentioned that:
In a subsequent message ( https://www.spinics.net/lists/kernel/msg4372632.html ), it is also mentioned that the situation improved drastically with the patch. Another point of the discussion is about the Host Memory Buffer being just 32 MB. According to my logs, I have the same allocation:
For the record, here are excerpts of some messages:
Current parameters for the nvme kernel modules on my system are at their defaults:
Going through the code of
The patch in question is mentioned at the very beginning of the discussion and is this one:
Another related thread is here => https://lore.kernel.org/linux-nvme/[email protected]/
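For anyone who wants to compare their own nvme module parameters (such as max_host_mem_size_mb) with the ones discussed here, the following is a minimal sketch and not something taken from the thread. It assumes the module exposes its parameters under /sys/module/<name>/parameters, which is the usual sysfs layout. The function name dump_module_params and its optional second argument are made up for illustration.

```sh
# Print every readable parameter of a loaded kernel module as name=value.
# $1: module name (for example nvme, nvme_core or zfs)
# $2: sysfs module root, defaults to /sys/module (hypothetical override,
#     only there so the function can be tried against a scratch directory)
dump_module_params() {
    mod=$1
    root=${2:-/sys/module}
    dir="$root/$mod/parameters"
    if [ ! -d "$dir" ]; then
        echo "module $mod: no parameters directory at $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Some parameters are not readable by unprivileged users; skip them.
        val=$(cat "$f" 2>/dev/null) || continue
        printf '%s=%s\n' "${f##*/}" "$val"
    done
}

# Typical use on a live system:
#   dump_module_params nvme
#   dump_module_params nvme_core
```

The same function should work for the zfs module, which makes it easy to record the values in effect before and after a tuning attempt.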
-
I tried the above patch but, in my case, it worsens the issue :( The crash happens much earlier than before.
-
Basically, at this point, I am out of options with those sticks. They are a replacement for a trio of ADATA Gammix S70 Blade which were also problematic because their namespaces had a bad EUI64 value: all of them were set to eui64=0000000000000000, which left the system totally confused about who was who. So my only option at this point is to get another model :/ Perhaps I will keep them for a much less intensive use.
Reality is: not all NVMe hardware plays nicely with ZFS. It seems that investing in higher-end hardware is not optional, especially with ZFS. I won't consider switching them back to 512b sectors: I don't think this would solve the issue and, even if it did, there would be a significant performance penalty. I hope my hours of investigation will save someone from wasting money on junk hardware. It is a bit disappointing that this junk is coming from a well-known brand.
PS: Feel free to elaborate further. I will post if I get something new on this.
-
I would try replacing the PSU with another one, probably a 1000W one.
Mysterious problems often end up being solved by replacing a faulty PSU.
…On Wed, Apr 26, 2023 at 9:23 AM admnd ***@***.***> wrote:
Above patch tried, but in my case, worsens the issue :( The crash happens
much more early than before.
Fiddling around with parameters of nvme.ko, I managed to have a higher
allocation of 200 MB with nvme.max_host_mem_size_mb=512 + the above patch
applied.
-
This might be a long shot, but where have you connected your NVMe drives? Did you use the onboard slots or a riser card with bifurcation? And if you used the onboard slots, which ones did you use? From the manual you can see that one of the slots shares bandwidth with the SATA ports; if there is anything connected there, it could cause a problem. Furthermore, X670 daisy-chains two chipset chips to provide more connectivity. My guess is that this issue could be caused by limited bandwidth between the chipsets and the CPU, which might make the controller look like it is dropping. My suggestion to troubleshoot this is to get a bifurcating riser card, put it in the x16 slot and have all the NVMe drives directly connected to the CPU. This would eliminate going through the chipsets. Unfortunately, ASUS provides no block diagram of the board showing which PCIe lanes go where and at which speed. I would also check whether limiting the link speed of the drives makes a difference, as link speed switching could be causing this issue too. PCIe link speed switching caused me a lot of headaches with my RX 5700 XT GPU: it caused weird issues such as the card disconnecting, the drivers crashing, etc. So pretty similar to what you are experiencing. Those two would be my guesses for this issue.
-
It's interesting you're having issues with the SN770. I was having issues with mine (2TB as well) in my laptop. With ZFS, Btrfs on LVM/LUKS, even ext4, the drive would reset just like yours, whether during boot, when sitting there doing nothing, or when doing something. Seemingly random. I took it to my computer store to get it replaced. The drive passed all of their tests, so they did not replace it. I believe they were testing with Windows. I am going to RMA it with WD; hopefully my replacement performs better. I have the exact same drive in my desktop (X570, 5950X), using a single ZFS vdev as root, and I have not experienced these issues there. I would try putting the desktop drive in my laptop (XPS 9560) to see if it has issues, but that would be quite an inconvenience to me, so I am just going to RMA it. The previous drive in my laptop did not have these issues. I believe this occurred with both 512b and 4kb sectors.
-
Other pointers (FreeBSD):
At this point, I have opened a case with WD; perhaps something can be done at their level. As I should have some free time tomorrow, I will try to swap modules between my two machines.
-
SN770 swapped out for 3x WD SN 850 configured in 4K. Day & night! My 7950X is literally breathing again! Over 100K IOPS while emerging GCC 13, and zpool scrubs easily reach 5-6 GB/s.
Earlier this afternoon, I tried to swap one module at a time. Guess what? One SN 770 quit the pool seconds after the resilvering started, and the second reset in the middle of it. I had thousands of checksum errors reported. Fortunately I have daily snapshots stored on a TrueNAS box, so not an issue. This junk is not even able to sustain a pool resilvering.
So, gentlemen, moral of the story: don't use DRAM-less NVMe stuff with ZFS.
I will report back on what happens with my now famous SN 770s when I have news :) Perhaps they will do better in my secondary machine or in the junk box. Thank you, again, for jumping in and taking some of your time to post suggestions here. This is greatly appreciated.
-
Stumbled upon this while searching for the consequences of my pool crash.
-
Hello @admnd, I'm experiencing the same problems on my server infrastructure. I recently added this WD NVMe (SN850X) just for some low-spec VMs that I preferred not to run on my main NVMe storage, which is made up of several PM9A3 drives.
-
I don't know if it's related somehow, but here's my 2 cents. I had an SN570 500GB (DRAM-less) NVMe, which was actually quite new (less than 1 year old). I never had any issues initially with ZFS and Gentoo on it; I have been using ZFS for the last 5 months. Recently, however, I started noticing random kernel crashes and ZFS status reporting permanent errors while scrubbing. My RAM was perfectly fine, judging from the fact that memtest86+ passed twice in a row. To my surprise, upon rebooting into Windows, WD Dashboard reported that "NVM subsystem reliability has degraded" with 99% lifetime remaining. Even SMART tests started failing. Unfortunately, the drive had to be replaced.
-
Would be cool for a "ZFS NVMe Recommendations List" to come out of this discussion. I imagine SLC and MLC NVMes would be above the rest. What are the other criteria of which ZFS users should be aware when identifying the best SSD hardware?
-
I think I'm suffering from this on an 8TB Corsair MP600 PRO NH used as additional storage for a Proxmox 8 machine. rsync in particular seems to trigger it. The sledgehammer solution:
brings back the device for me, but the ZFS pool doesn't come back. I think it is because Proxmox creates the pool with a /dev/nvme0nX name and the X changes with every "resurrection". I'm going to try ext4 next on that device and see how it goes. I wanted to post here in case there are more people with the same device and similar problems.
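On the /dev/nvme0nX renaming point, a possible workaround is to refer to the drive through a persistent name rather than the kernel name. The sketch below is only an illustration, assuming udev populates /dev/disk/by-id with nvme-* symlinks as it does on most distributions. The function name list_nvme_by_id and the pool name tank are hypothetical.

```sh
# List persistent NVMe names next to the kernel name they currently point at.
# $1: directory holding the persistent symlinks, defaults to /dev/disk/by-id
list_nvme_by_id() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/nvme-*; do
        [ -L "$link" ] || continue      # also skips the unexpanded glob
        target=$(readlink "$link")      # typically something like ../../nvme0n1
        printf '%s -> %s\n' "${link##*/}" "${target##*/}"
    done
}

# Typical use on a live system:
#   list_nvme_by_id
# A pool can then be re-imported by persistent name ("tank" is a placeholder):
#   zpool export tank
#   zpool import -d /dev/disk/by-id tank
```

This does not address the controller reset itself; it only keeps the pool from depending on a name that changes each time the device comes back.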
-
Just FYI, I had the exact same issue with a brand new WD BLACK SN770, and swapping my PSU solved the issue (while my previous one seemed perfectly fine)...
-
The last time I saw this, it was due to either a firmware or a hardware issue. An RMA sometimes solves it, if they send back a unit with a newer firmware version or with a known internal defect fixed. I would suggest not buying the same brand and model from the same batch for all vdevs in a pool; that might put you at risk of faulting all disks if there is ever a hardware, firmware or manufacturing issue.
-
Just to possibly add to the list of potentially problematic devices: Verbatim Vi3000. I got them relatively cheap and did not worry too much about them being DRAM-less, as they were supposed to be the base for some VMs and light Docker containers. I am using an Asus Pro WS W680M-ACE SE with both slots populated with these drives. Issues appeared right away during the resilvering of the mirror: one drive completely dropped out with (as far as I remember) the same error message as OP. The drives have a small green LED that indicates access (not sure if for both reads and writes, but I suspect so). After the dropout but before the reboot, this LED stayed lit (not blinking). To document: I'm on Unraid 6.12.8 with ZFS: Loaded module v2.1.14-1, ZFS pool version 5000, ZFS filesystem version 5
-
Stumbled upon this list of SSDs with Power Loss Protection: https://www.techpowerup.com/ssd-specs/filter/?plp=1 Looks pretty comprehensive. How did I not see this before? 😄
-
I’ve had similar issues after an RMA with back-to-back issues on SSDs running Linux 6.7 & 6.8 with bcachefs on a 4096 sector size WD SN740 NVMe (2242 size) with firmware 73110101 for a Lenovo laptop with AMD Ryzen 7 CPU. Drive completely shits the bed under heavy IO like compiling the kernel & Lenovo support is acting like Linux is the problem instead of the vendors they partner with. No kernel parameters helped.
-
Update 2024-04-07: Update 2024-04-03: Just to confirm... (modern HW noob here)
-
I can also report ZFS troubles with 4x
Mainboard: ASRock Rack EP2C602
I have 4 of them in a riser to use the bifurcation feature of my motherboard (PCIe slot 7, directly connected to the 2nd CPU, so no chipset in the data path). I saw controller crashes in a
What I unsuccessfully tried:
Observations:
@admnd I'm super grateful for this discussion. Thanks for the initial write-up and debugging :) Luckily I can still return my NVMe drives to Amazon.
-
I see the NVMe errors, but Linux seems to have a long-running issue causing
-
Can I assume SN850x is working fine? Does anyone have a recommendation for a 2230 M.2 (Framework 16 does not have 2x2280 😢 )
-
It seems there could be a way to use the SN770 if it is formatted with 512-byte sectors. I have not tried it myself.
-
Hi!
I am on Debian 11 (Bullseye) and installed ZFS 2.1.11 from backports (which is stuck at this version and will not be upgraded ... this may happen if the "top animals" say something like "ZFS is not necessary", and actions follow thoughts ...), so I stay on this version and kernel 6.1.0-0.deb11.21-amd64.
BTW, I don't remember which kernel I was using when it happened (2023-05-19), about 1 hour after resuming from hibernation (this is why I added that I am using a separate swap SSD). I have been using ZFS since 2012(!) and have never seen anything bad, especially not like this. It was NOT a high load problem, that is for sure.
BTW, at the beginning of this thread, there was a note that the kernel nvme driver does not give the amount of buffer space the NVMe expects, but I lost track of it and have not read the specs.
I continue working on the same hardware (Supermicro H12SSL) and have never had a problem with high load.
The reason I am using two different NVMe drives in the mirror is that I started with two FireCuda 510, but one died in the first weeks and I was afraid it was a systematic error and the next one would follow soon. Both (FireCuda and WD) are nearly the same with regard to their specs.
I even plan to remove the remaining FireCuda, because I am getting this error on each boot:
>smartd[5015]: Device: /dev/nvme0, number of Error Log entries increased from 1049 to 1052<
This has never been the case for the WD drive (WD_BLACK SN770 2TB).
Regards,
Manfred
…----- Original Message -----
From: Justin Clift ***@***.***
To: "openzfs/zfs" ***@***.***>
Cc: ***@***.***
Sent: Fri, 02 Aug 2024 16:10:10 -0700
Subject: Re: [openzfs/zfs] Unsuitable SSD/NVMe hardware for ZFS - WD BLACK SN770 and others (Discussion #14793)
@mabra Which version of ZFS was this with?
-
I don't see any really hard evidence of a HW error, at least in my case.
It has run for more than 15 months after the crash without any problem.
When I see what happens with the kernel (every version, another crash), I see other possibilities, and (hence my note in the last answer) why should someone worry about "a product which taints the kernel" ....
Regards,
Manfred
…----- Original Message -----
From: Mohammed Sameer ***@***.***
To: "openzfs/zfs" ***@***.***>
Cc: ***@***.***
Sent: Fri, 02 Aug 2024 16:18:12 -0700
Subject: Re: [openzfs/zfs] Unsuitable SSD/NVMe hardware for ZFS - WD BLACK SN770 and others (Discussion #14793)
This is really a mess. But this really points to either zfs itself or the hw. Guess the only safe bet is 850x
-
There has been another set of issues with the HMB buffer size causing crashes on Windows 24H2 with SN770 and SN580 SSDs (and all others based on the same controller) https://community.wd.com/t/windows-24h2-wd-blue-screens/297867 Might this issue be related to something similar?
-
An update on this issue: could this strictly be a firmware bug related to 4096-byte sector sizes (on the SN770) and nothing else? I recently gave my problematic SN770 to someone else, who suddenly started having similar-looking dropout issues in Windows. After reformatting to 512-byte mode, the issues went away. Someone else here reports issues with just the 4096-byte mode: https://community.wd.com/t/sn770-nvme-controller-reset-when-formatted-with-4096-byte-sectors/282532
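Since several reports in this thread hinge on whether a drive runs in 512-byte or 4096-byte mode, it may help to state the active sector size when reporting. The following is a minimal sketch, assuming the standard logical_block_size and physical_block_size attributes under /sys/block/<dev>/queue. The function name show_nvme_block_sizes and its optional argument are made up for illustration.

```sh
# Print the logical and physical block size of every NVMe namespace.
# $1: sysfs block root, defaults to /sys/block (hypothetical override,
#     only there so the function can be tried against a scratch directory)
show_nvme_block_sizes() {
    root=${1:-/sys/block}
    for dev in "$root"/nvme*n*; do
        q="$dev/queue"
        [ -r "$q/logical_block_size" ] || continue   # no such device or glob
        printf '%s logical=%s physical=%s\n' \
            "${dev##*/}" \
            "$(cat "$q/logical_block_size" 2>/dev/null)" \
            "$(cat "$q/physical_block_size" 2>/dev/null)"
    done
}

# Typical use on a live system:
#   show_nvme_block_sizes
```

This only reads the current setting. Switching between formats is a destructive operation done with the vendor tools or nvme-cli and is deliberately not shown here.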
-
Also ran into this after ~10 weeks of having a pair of 2T SN770s in a raidz1. Kernel is 6.1.118. Both plugged into the chipset m.2 slots on my Supermicro X13SAE-F (W680 chipset). Saw this thread quickly and didn't bother trying to fix through other means. I swapped them with SN850X. Will report back if I see any issues with the SN850X.
-
Rick Branson ***@***.***> writes:
Also ran into this after ~10 weeks of having a pair of 2T SN770s in a
raidz1. Kernel is 6.1.118. Both plugged into the chipset m.2 slots on
my Supermicro X13SAE-F (W680 chipset). Saw this thread quickly and
didn't bother trying to fix through other means. I swapped them with
SN850X. Will report back if I see any issues with the SN850X.
I have been using the SN850X (WD_BLACK SN850X 2000GB) in 4k LBA mode with BTRFS
(and LUKS) for about a year or longer now. No issues with it so far.
-
I noticed the same issue while running fio benchmarks on two SN770 2TB drives configured with 4K. A sequential read test with multiple jobs is enough to trigger this and can be reproduced each time, even on Windows with similar fio options. I could not trigger a crash with 512B yet.
Disks
Filesystems (default settings)
fio
Logs
-
This originally started as a bug report but, after investigation and comments, it is definitely more a hardware issue related to ZFS than a ZFS bug, so I am opening a general discussion here. Feel free to post constructive observations/ideas/workarounds/suggestions.
TL;DR: Some NVMe sticks just crash with ZFS, probably because they are unable to sustain I/O bursts. It is not clear why this happens: the controller might just crash, or a combination of firmware/BIOS/hardware makes it unstable or crash when used in a ZFS pool.
Hardware
Issue observed
My system zpool is composed of a single RAID-Z1 VDEV made of 3x WD Black SN770 2TB, themselves configured with 4K logical sectors (I have not tested with 512b sectors to see if the issue still happens... yet). The VDEV uses LZ4 compression; neither it nor the underlying modules are encrypted (they do not support that); standard 128K stripes are used. No L2ARC cache is used. The system has plenty of free RAM, so there is no memory pressure.
Under "normal" daily usage I did not experience anything: the zpool is regularly scrubbed and there is nothing to report. No checksum errors, no frozen tasks, no crashes, nothing; the pool completes all scrubs wonderfully well. The machine also experiences no freezes or kernel crashes/"oopses" and no stuck tasks (I reported an issue with auditd here a couple of weeks ago but this guy is now inactive, see bug #14697). Even "emerging" big stuff like dev-qt/qtwebengine with 32 CMake jobs in parallel, or re-emerging the whole system from scratch with 32 parallel tasks and heavy packages rebuilt at the same time, succeeds. No crashes.
However, if I use zfs send to make a backup of the system datasets to a local TrueNAS box over a 10GbE link, it is another story: most of the time one of the NVMe modules randomly crashes. The issue also happens at different points in the data transfer: sometimes after 12 GB, sometimes after 78 GB, sometimes after 93 GB and so on. If I am lucky, the operation sometimes completes successfully (less than a quarter of the time). Itchy and annoying. I have also managed to reproduce it by rsync-ing a dataset to a new empty one in the same pool, although this happens more rarely. The TrueNAS box and the network are not at fault: they run smoothly, and I can reproduce the issue locally by sending the ZFS stream to /dev/null (zfs send .... | cat > /dev/null).
When the crash happens, the following trace appears in the kernel logs:
At this point, if I am lucky enough, I can manage to bring it back to life using a sledgehammer:
If the faulted device reappears, the zpool becomes ONLINE again and completes its resilvering (a couple of KB or MB). In the worst case, another NVMe also drops off the pool, which becomes suspended, so I have to power-cycle the machine or push its reset button. Of course, doing a nvme list at this point either completely freezes or lists the two remaining NVMe modules, depending on what is alive.
My best guess so far is that the controller of the Western Digital SN 770 modules is not beefy enough to handle a burst of I/O requests (knowing they have no DRAM cache), so it is brought to its knees and becomes so unresponsive that it is unable to complete a reset request on its own (no AER reported in the logs, BTW). As it is not always the same module that crashes, they do not all seem to be defective, or else I am extremely unlucky. Pool scrubbing might be a bit lighter for the controller, so the scrubs/resilvers work without any issue (the maximum observed speed is around 4.5~5 GB/s when scrubbing the pool, according to zpool status).
What has been tried so far
Several things! Without any improvement, unfortunately:
- nvme_core.default_ps_max_latency_us=0 pcie_aspm=off on the kernel command-line;
- zfs kernel module parameters: lowering the values of zfs_vdev_sync_read_min_active, zfs_vdev_sync_read_max_active and their async counterparts (I used the same values as the defaults for zfs_vdev_scrub_min_active and zfs_vdev_scrub_max_active);
- throttle: zfs send ... | throttle -M 300 | ...
- blkio cgroup
- zfs send from a FreeBSD live media: FreeBSD allocates a 200MB host buffer for each module but, unfortunately, no more success: a zfs send also hangs :/
Some thoughts / ideas of tests to try
Is there a "ZFS native" way to throttle I/O operations when doing a zfs send?
Has anybody here experienced something like this? If so, what are the other brands/models subject to a similar issue?
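For anyone repeating the kernel command-line test listed under "What has been tried so far", it is worth confirming that the options actually reached the running kernel before drawing conclusions. The snippet below is a rough sketch rather than part of the original investigation. The function name check_cmdline_opts is made up, and it only does whole-word matching on the text it is given.

```sh
# Report which of the given options are present on a kernel command line.
# $1: the command line text (normally the content of /proc/cmdline)
# remaining arguments: options to look for, matched as whole words
check_cmdline_opts() {
    cmdline=$1
    shift
    missing=0
    for opt in "$@"; do
        case " $cmdline " in
            *" $opt "*) printf 'present: %s\n' "$opt" ;;
            *)          printf 'MISSING: %s\n' "$opt"; missing=1 ;;
        esac
    done
    # Non-zero exit status if at least one option was not found.
    return "$missing"
}

# Typical use on a live system:
#   check_cmdline_opts "$(cat /proc/cmdline)" \
#       nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
```

A "present" result only shows that the bootloader passed the option. Whether the firmware honours it is a separate question.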