Potential trim issue on 0.8.0 #8835
Replies: 32 comments 6 replies
-
To add to this, some 9 hours later, still running. Manually running:
does nothing, while the other devices report
After running the above:
-
@kroy-the-rabbit that would be a heck of a coincidence. According to the console messages posted, an unexpected timeout was encountered (hostbyte=DID_TIME_OUT). After aborting the command, it was converted to an IO error by the SCSI layer and then reported to ZFS. This doesn't necessarily mean there's anything wrong with the device, but something in the path resulted in the timeout. A couple of quick questions.
-
Yep. The rest of the pool other than the failed device says:
The other mirror device in that vdev says:
The "failed" device says (and doesn't seem to be moving):
That did bring the status to "untrimmed". Restarting it for the specific device resulted in a positive result in less than 3 minutes:
-
It sounds as if ZFS may have overwhelmed the controller with outstanding TRIM commands, resulting in the unexpected timeout and subsequently faulted device. You could try setting the module option
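Later replies in this thread point at `zfs_vdev_trim_max_active`, so presumably that is the option meant here. A minimal sketch of lowering it at runtime (the modprobe.d path is illustrative and may vary by distro):

```shell
# Check the current limit on concurrent TRIM commands queued per vdev
cat /sys/module/zfs/parameters/zfs_vdev_trim_max_active

# Lower it so the controller sees fewer outstanding TRIMs at once
echo 1 | sudo tee /sys/module/zfs/parameters/zfs_vdev_trim_max_active

# Make the setting persistent across reboots
echo "options zfs zfs_vdev_trim_max_active=1" | sudo tee /etc/modprobe.d/zfs.conf
```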
-
So I didn't want to open another ticket, but I'm not sure this is the same thing. root@io:/z/vm_backup# lsb_release -a; uname -a filename: /lib/modules/4.15.0-70-generic/updates/dkms/zfs.ko
Maybe it's my disk not dealing well with trim? Maybe not. This is my backup system, so I don't really want to use it as a testbed to run more tests, but if there's something non-destructive you want me to do, I can try it. Oh, and there were no disk errors of any kind in kern.log.
-
I have issues with trim too; trimming the pool results in zvol corruption ALWAYS.
Status says it was trimmed OK: (100% trimmed, completed at Fri 13 Dec 2019 03:42:04 PM MSK)
-
echo 1 > /sys/module/zfs/parameters/zfs_vdev_trim_max_active didn't help
-
@Temtaime if it's not too much to ask, would it be possible for you to rerun your test case with sudo sh -c "echo 2048 >/sys/module/zfs/parameters/zfs_flags"? This should help us determine whether it's the software or the hardware causing the problem.
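For anyone following along, a sketch of how the debug output from that flag can be collected; the flag value 2048 is the one suggested above (presumably a TRIM-related debug flag), and "tank" is a placeholder pool name:

```shell
# Enable the debug flag behlendorf suggested
echo 2048 | sudo tee /sys/module/zfs/parameters/zfs_flags

# Ensure the in-kernel debug log is being recorded
echo 1 | sudo tee /sys/module/zfs/parameters/zfs_dbgmsg_enable

# Rerun the failing trim, then inspect the debug ring buffer
sudo zpool trim tank
tail -n 50 /proc/spl/kstat/zfs/dbgmsg
```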
-
@behlendorf Hello. Thanks for the reply.
-
@behlendorf Long story short, I replaced the SSD and the issue is gone.
-
Sorry I didn't get back to you before. *-disk and *-disk: I have a single pool with them mirrored together, and if I run trim, it all goes boom.
-
@behlendorf Hi from stu. Happy Friday. So those drives I had a problem with are going to be freed up in the next few days, I hope, for a little while anyway. I asked tom, and he said to tag you and ask if you'd want me to do any testing with these drives. I'm inclined to agree with you that the firmware on these drives is not capable of dealing with whatever trims ZFS is throwing at it. But if you think this is a problem worth chasing and there's something I can test on these drives, lemme know; I can do whatever. A short version of what happened so you don't have to read through everything again:
-
Thanks for the update. If you're able to do a little testing with the drives, it would be helpful to create a new pool and see if you can reproduce the issue. If so, are there any IO errors logged to dmesg? What does the
-
Didn't have a lot of time; I'll get you more information, but the first set of stuff is interesting... I ran zpool trim z, went away, came back after a while... These two SSDs are mirrored together:
lrwxrwxrwx 1 root root 10 May 2 20:30 ata-SanDisk_SSD_PLUS_1000GB_191177464112-part1 -> ../../sdf1
So kern.log has IO errors on sdd, which is 190919447101:
May 3 07:45:06 io kernel: [207209.921245] ata4.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x0
But the other disk seems fine (from a kern.log point of view), yet the data is corrupt; the mirror couldn't get the data from either drive? There are zero errors in kern.log for the sdf drive. Reading the files zpool status -v says are bad, I get IO errors...
root@io:/dev/disk/by-id# file /z/backup/sp/spford/ford/git/o/objects/e8/6955a1e45bf3d4c3e3d7836d8c10a14389711b
-
A few more notes: I checked and there's no firmware update for these drives, so it is what it is.
Also, I guess this makes sense, just wanted to check: when doing a zpool status, I noticed that every time I did a du on my zpool, the CKSUM errors went up. I guess that's because ZFS was unable to repair the blocks, so it kept finding them bad over and over every time I tried to read that data, and the CKSUM value is just a tally of errors it has come across?
Anyway, I'll let you know if I find anything else worth mentioning, but it's starting to look like either a bad cable or something not seated right. I blew away the pool, made a new one, and am generating more data and deletes; it seems to be holding up for now. Also, I'm realizing these appear to be pretty low-grade consumer-level drives, so one can only ask for so much. Thanks for the feedback anyway.
-
I have a similar issue occurring with autotrim=on on an IBM M1015 HBA (LSI 9211-8i). I can isolate it to the HBA, as the raidz1 consists of disks on both the HBA and the MoBo SATA controller. With autotrim=on I receive error messages like this on all 4 HBA-connected devices (none on MoBo-connected disks): With autotrim=off I have so far not seen the issue appearing. A manual trim also did not invoke any issues.
Update: I ran a scrub without autotrim and did receive I/O errors (though not zio pool errors but typical blk_update_request IO errors). I ordered a new 8087 cable to check if the errors persist with a new cable. If I do not report back, consider my issue solved ;-)
@nixomose what HBA are you using?
-
All my drives are plugged directly into the motherboard, so whatever controller's on the board, which lshw says is: product: 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] [8086:1E02]
-
I can't imagine my problem is the controller, though, because other SSDs attached the same way are fine with trim. I'm sure it's the drives in my case. Never buying those again. :-)
-
Hello, it seems like I've faced a similar issue. I hope some more info could help to figure out the root cause of the issue. I'm running NixOS 19.09 with ZFS 0.8.3 (there was the same issue with 0.8.2). So, before enabling the autotrim feature, I decided to test on one device first using the
The trim would eventually finish successfully. A scrub passes successfully after that as well, and the pool seems to continue running just fine. I faced that issue more than 3 months ago, and after a successful scrub I cleaned up the errors and they never occurred again. So, that's quite interesting, and I'm wondering whether it's even a real error or some issue that leads ZFS to report one... Here is what I see in the logs at the time of running the trim command:
-
So I installed the new cable without any significant change; I still get errors like these regularly:
It does not seem to be a trim issue but another ZFS-related issue. Before converting to ZFS, I ran those disks as mdadm raid5 without any issues. Nevertheless, this seems to be isolated to the controller. 4 out of 5 disks are connected via the HBA (LSI 9211-8i) and all of them are throwing these errors. It is very unlikely that all 4 SSDs all of a sudden have hardware issues. I hope someone has an idea how to fix this, although it is not related to trim...
-
Someone on reddit mentioned this bug in response to a post I submitted inquiring whether the Linux mpt3sas driver may have been responsible for a reproducible bug with zpool trim, where using zpool trim with a single-vdev mirror pool comprised of Samsung 860 EVO SSDs was causing controller resets with an LSI 9305-16i SAS HBA. Running Gentoo Linux, with 16+ years of experience using it.
Here's the server kernel config running Linux 5.4.38 and the kernel log result of attempting to use zpool trim with zfs_vdev_max_active=2. After recovering from the first round of disk resets and having set zfs_vdev_trim_max_active=1, zpool trim degraded the mirror pool due to more disk resets and faulted one disk in the pool. Currently running ZFS 0.8.4, but this was reproducible on previous ZFS versions when I was willing to attempt testing it.
I'm not certain what to attempt from here, but I know I'm not running zpool trim on my root disk pool set until further notice, which this mirror pool certainly is not. I'm likely going to try a pool rebuild and attempt to trim with a fresh pool, and will report the results. Here's the kernel logs post zfs_vdev_trim_max_active=1
Follow-up zpool commands and general system info
-
Update to my previous comment. I migrated datasets off of the 250GB mirror pool to prepare for further testing.
Pool creation command string used:
Results... zpool trim succeeded with no device reset errors, same SSD disks and same server. Bizarre... I also tested rate-limiting trim using zpool trim -r, initially to attempt to further limit the trim intensity, expecting dmesg to fill with mpt3sas device reset errors; however, there were none after testing zpool trim -r with several different trim rate limits, or just allowing trim to proceed at full I/O speed.
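For reference, the rate-limited form mentioned here follows `zpool trim [-r rate] pool [device...]`; a sketch with placeholder pool and rate values:

```shell
# Ask ZFS to trim at roughly 100 MiB/s per device instead of full speed
sudo zpool trim -r 100M tank

# Watch progress; -t adds per-vdev trim status in recent ZFS releases
zpool status -t tank

# Suspend or cancel the trim if the controller starts resetting
sudo zpool trim -s tank
sudo zpool trim -c tank
```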
-
I believe I just ran into this issue as well. Running Ubuntu Server 20.04.01 with a fully upgraded zpool: 4 Samsung 860 EVOs in a mirror-striped setup, running a stock
Currently, running trim on each drive individually is working fine.
-
How did you run trim individually on each drive? I've set the kernel parameter zfs_vdev_trim_max_active=1 and tested this again across several ZFS versions up to 0.8.5. The issue has been more frequently reproducible for disks containing partitions, where ZFS resides only within the boundary of a disk partition; however, my "deadpool" mirror pool, where ZFS was not restricted to the boundaries of disk partitions, recently experienced the same complication, so theories about why this occurs are uncertain. It may be worth mentioning my rpool was created by CLI parted with a GPT disk label.
My 500GB mirror pool set, I believe, has Samsung firmware updates available, and the LSI 9305-16i HBA had firmware updates applied but does have a more recent firmware update available; however, when I previously had maintenance windows available, Samsung Magician was unable to update firmware for disks connected to the LSI HBA controller. I have another maintenance window planned within the coming weeks to decommission deadpool, replacing that mirror set, then expanding rpool by adding another 500GB mirror, along with a planned strategy for applying firmware updates to the SATA SSDs and HBA controller. After the updates and maintenance are complete, I'll be able to attempt to reproduce or isolate this bug further.
I have been suspicious that SATA SSDs run by a SAS HBA may have been part of the cause of the complications experienced, because of the interface capability differences between SAS and SATA; while similar, IIRC SATA is not capable of the command or bandwidth parallelism of SAS. I've not been able to (or have not attempted to) reproduce this bug using the direct SATA 6Gb/s connections provided by a consumer PC motherboard; however, soon I will have the 250GB SSD set available for installing an rpool on another PC using an ASUS Prime Z270-A motherboard.
-
Interesting. Two of the drives in my setup are partitioned to contain EFI, so I fall within both of those groupings. Unfortunately, I don't have a good solution for trimming drives individually; I just issued the trim command by hand.
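One way to trim drives one at a time is to pass an explicit device to zpool trim; the pool and device names below are placeholders:

```shell
# Trim a single member disk of the pool rather than every vdev at once
sudo zpool trim tank ata-Samsung_SSD_860_EVO_1TB_XXXXXXXX

# Block until that trim finishes (zpool wait is available in ZFS 2.0+),
# then repeat for the next device
sudo zpool wait -t trim tank
```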
-
Hi,
-
Did you upgrade the kernel by any chance during the replacement?
It looks like 5.4.152 has some changes to trimming on Samsung 860 SSDs.
There is a way to restore NCQ for the drive using kernel command line
arguments.
========================
commit caff281e2073e4e71fd0bced2385b7771b512264
Author: Kate Hsuan ***@***.***>
Date: Fri Sep 3 17:44:11 2021 +0800
libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
commit 7a8526a5cd51cf5f070310c6c37dd7293334ac49 upstream.
Many users are reporting that the Samsung 860 and 870 SSD are having
various issues when combined with AMD/ATI (vendor ID 0x1002) SATA
controllers and only completely disabling NCQ helps to avoid these
issues.
Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
SATA adapter vendor will cause I/O performance degradation with well
behaved adapters. To limit the performance impact to ATI adapters,
introduce the ATA_HORKAGE_NO_NCQ_ON_ATI flag to force disable NCQ
only for these adapters.
.....
========
…On Tue, Nov 30, 2021 at 12:26 PM QBANIN ***@***.***> wrote:
Hi,
Until a week ago I had ***@***.*** with 2x Samsung 860 Evo as ZFS
RAID1 + 2x WD RED 2,5" 1TB HDD as RAID0 and 4x WD RED 3,5" 3TB as RAID10.
Recently I replaced all the 3,5" WD REDs with Seagate SAS drives, and since then I'm
getting these errors at least once a day. It happens at hour xx and minute 01 (a few
times at 2:01 and 16:01), as in the attached log. sda and sdd are Samsung 860
Evo drives. The weird thing is that for almost 2 years I had no problems
until the recent 3,5" drive replacement. Any clue why this error appears
always at minute 01 and how to fix it? SMART for all drives is clear. After
these read errors the RAID1 pool becomes degraded but works fine after a scrub,
till next time.
sudo zfs version
zfs-2.0.3-9bpo10+1
zfs-kmod-2.0.3-9bpo10+1
uname -a
Linux NAS 5.4.161-qba1 #3 SMP
Sun Nov 21 18:11:35 CET 2021 x86_64 GNU/Linux
sudo dmesg -T |grep DID
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8439 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8401 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8424 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8418 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8437 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8390 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8408 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8438 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8405 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:06 2021] sd 0:0:3:0: [sdd] tag#8386 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8429 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8427 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8407 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8433 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8397 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8428 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8426 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8425 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8423 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:01:38 2021] sd 0:0:3:0: [sdd] tag#8422 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8447 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8442 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8439 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8438 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8436 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8427 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[nie lis 28 16:02:09 2021] sd 0:0:0:0: [sda] tag#8423 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9392 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9387 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9360 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9362 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9369 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9401 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9391 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9382 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9388 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:04 2021] sd 0:0:0:0: [sda] tag#9370 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9348 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9347 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9345 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9344 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9360 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9359 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9358 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9357 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9356 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:01:35 2021] sd 0:0:0:0: [sda] tag#9346 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:02:05 2021] sd 0:0:0:0: [sda] tag#9396 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:02:05 2021] sd 0:0:0:0: [sda] tag#9389 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:02:05 2021] sd 0:0:0:0: [sda] tag#9388 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:02:05 2021] sd 0:0:0:0: [sda] tag#9366 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:02:05 2021] sd 0:0:0:0: [sda] tag#9371 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:02:05 2021] sd 0:0:0:0: [sda] tag#9349 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:02:06 2021] sd 0:0:3:0: [sdd] tag#9393 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[wto lis 30 02:02:06 2021] sd 0:0:3:0: [sdd] tag#9370 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:02:06 2021] sd 0:0:3:0: [sdd] tag#9369 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[wto lis 30 02:02:06 2021] sd 0:0:3:0: [sdd] tag#9368 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
-
@xartin This error appears with autotrim both on and off. @IvanVolosyuk I'm almost 100% certain that I upgraded to >5.4.152 a few weeks before the disk replacement. Besides, I don't use an ATI controller. I tried to trigger this issue by running trim manually, but no luck.
-
Yesterday I disabled NCQ for the 2 SSD drives with the echo 1 > /sys/block/sdX/device/queue_depth command. Today the same error was triggered by the 2 remaining SATA WD RED HDDs. Guess when? At 11:01 :( What kind of operation is Linux performing at xx:01? There's nothing scheduled in cron at this time. Could there be some kind of interference between the SAS and SATA drives?
BTW, drives attached to the LSI controller don't respect the libata.force=noncq kernel parameter (queue depth is 32 after reboot); it works only for drives attached to the mobo SATA controller.
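That behavior would make sense: libata.force only affects devices driven by the kernel's libata (ATA/AHCI) layer, while disks behind an LSI SAS HBA go through the SCSI/mpt stack instead. A sketch of both approaches, with placeholder port and device names:

```shell
# Kernel command line, AHCI/libata ports only; "3.00" is a placeholder
# port ID -- check dmesg for the ataN.NN numbering of your drive:
#   libata.force=3.00:noncq

# Runtime alternative that also works per-device: shrink the queue depth
# so NCQ is effectively unused (sdX is a placeholder device name)
echo 1 | sudo tee /sys/block/sdX/device/queue_depth
cat /sys/block/sdX/device/queue_depth   # should now read 1
```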
-
I'm hitting this issue on NixOS 21.05/21.11 with SanDisk Plus drives.
-
System information
Describe the problem you're observing
Upgraded a 0.7.13 pool to 0.8.0. Started a trim on the pool. Within minutes a disk was marked as faulted. SMART data is clean, regular scrubs are occurring, and the disk has never once thrown an error. It has less than 2000 hours on it.
I'm not saying the disk isn't bad, but it would be a heck of a coincidence.
Cleared the error on the disk, now the disk seems to be in a permanent "trimming" state.
The zpool layout is eight S3610s in four mirrors.
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs
Permanent trimming (now 5 hours later):
Disk errors:
After clearing the faulted device: