PANIC at zfs_znode.c with zfs 2.1 and 2.2 #16607
Comments
@snajpa it looks like you deleted your comment, but I'll post the properties of my pool anyway; maybe it will help. It looks like project_quota is active.
yeah sorry, I was reading through the code and my fingers were faster than my brain - I thought I saw something related to project quota, but I invalidated it after a bit :-D
I think this could fit what I'm trying to fix in #16625; but that fix won't help in this case, where there's a znode without its SA initialized but somehow still linked to a directory. I tried figuring out a way to clean such a znode up directly, but haven't found one yet.

For now, please take a snapshot of the affected dataset, and then let's try the patch below, to see if it lets you move forward without a panic - it should evict the inode and return EIO instead.

If I'm right, the file was created with O_TMPFILE and perhaps power was lost, or the kernel crashed in the wrong section; otherwise I can't explain how we end up with a znode without any SA at all.

The patch (should be applied to current master - 75dda92):

diff --git a/module/os/linux/zfs/zfs_znode_os.c b/module/os/linux/zfs/zfs_znode_os.c
index f13edf95b..025f2482a 100644
--- a/module/os/linux/zfs/zfs_znode_os.c
+++ b/module/os/linux/zfs/zfs_znode_os.c
@@ -323,7 +323,7 @@ zfs_cmpldev(uint64_t dev)
 	return (dev);
 }
 
-static void
+static int
 zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
     dmu_buf_t *db, dmu_object_type_t obj_type, sa_handle_t *sa_hdl)
 {
@@ -334,8 +334,11 @@ zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
 	ASSERT(zp->z_sa_hdl == NULL);
 	ASSERT(zp->z_acl_cached == NULL);
 	if (sa_hdl == NULL) {
-		VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp,
-		    SA_HDL_SHARED, &zp->z_sa_hdl));
+		if (0 != sa_handle_get_from_db(zfsvfs->z_os, db, zp,
+		    SA_HDL_SHARED, &zp->z_sa_hdl)) {
+			zfs_dbgmsg("sa_handle_get_from_db failed");
+			return (1);
+		}
 	} else {
 		zp->z_sa_hdl = sa_hdl;
 		sa_set_userp(sa_hdl, zp);
@@ -344,6 +347,7 @@ zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
 	zp->z_is_sa = (obj_type == DMU_OT_SA) ? B_TRUE : B_FALSE;
 	mutex_exit(&zp->z_lock);
+	return (0);
 }
 
 void
@@ -538,7 +542,11 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
 	zp->z_sync_writes_cnt = 0;
 	zp->z_async_writes_cnt = 0;
 
-	zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
+	int fail = zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
+	if (fail) {
+		iput(ip);
+		return (SET_ERROR(EIO));
+	}
 
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GEN(zfsvfs), NULL, &tmp_gen, 8);
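
To make the O_TMPFILE scenario above concrete: a file opened with O_TMPFILE exists only as an unnamed inode until it is given a name with linkat(). Below is a minimal userspace sketch of that flow (the /tank/build path and output.o name are hypothetical, purely for illustration); the idea being hypothesized here, not verified, is that a crash between the link becoming visible and the znode's SA reaching disk could leave exactly a directory entry pointing at a znode with no SA.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Create an unnamed file inside the directory; it has no name yet. */
	int fd = open("/tank/build", O_TMPFILE | O_WRONLY, 0600);
	if (fd < 0) {
		perror("open(O_TMPFILE)");
		return (1);
	}
	if (write(fd, "data", 4) != 4)
		perror("write");

	/*
	 * Give the anonymous file a name. Linking via /proc/self/fd
	 * works without CAP_DAC_READ_SEARCH, unlike AT_EMPTY_PATH
	 * (see open(2)). After this call the file is visible in the
	 * directory; its metadata still has to make it to disk.
	 */
	char path[64];
	(void) snprintf(path, sizeof (path), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, path, AT_FDCWD, "/tank/build/output.o",
	    AT_SYMLINK_FOLLOW) != 0) {
		perror("linkat");
		(void) close(fd);
		return (1);
	}
	(void) close(fd);
	return (0);
}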
I initially tried deleting the corrupted file and directory; when that failed, I destroyed and recreated the two affected ZFS datasets, which resolved my issue. The datasets weren't critical (mostly build directories), but I needed them to continue working. The first time I encountered the error, it was on a directory, which can't have been created with O_TMPFILE.

At the time, I didn't realize I should have taken a snapshot; I assumed a corrupted file or directory wouldn't be included in one. I am relatively new to ZFS.

The problem first appeared when deleting the directory: a specific file caused ZFS to panic. I opened a second SSH connection to my workstation and retried, but I encountered an EIO error instead of the panic. This resulted in a CPU hang.

At this point, I'm not sure if the patch will make a difference, as I've already recreated the datasets. I don't know whether the files I was trying to delete were created with O_TMPFILE; they had been there for a while, and they are part of a build system that typically doesn't create temporary files of the type I was trying to delete. The second dataset could have used O_TMPFILE, as it held output from a C++ build with -pipe, maybe.

The drive they're on has 68,000 power-on hours (SSD). I need help analyzing the smartctl output; you likely have more expertise interpreting that data than I do.
I actually got the panic again on another, more critical dataset. It's already in my snapshots, so I'll try the patch. This happened during a scan with ...
Curious, is your ...?
@snajpa so I had a bad memory stick; I swapped in new sticks and now there are no more problems. So it's not a ZFS bug per se, I guess?
if you can't reproduce after a HW change (and no SW upgrade), then it must have been it :) |
System information
Describe the problem you're observing
I am experiencing a kernel panic when navigating to a ZFS dataset or copying data to it. The issue persists even after upgrading: initially I was using ZFS 2.1 with kernel 6.1 when the problem occurred, and I have since upgraded both ZFS and the kernel to the latest versions available in the bookworm backports, but the issue remains.
Describe how to reproduce the problem
The kernel panic started when I used the "z" plugin in oh-my-zsh to navigate directories within a ZFS dataset. The panic also occurs when trying to rsync a directory from an ext4 filesystem to a ZFS dataset.
Include any warning/errors/backtraces from the system logs