
After reload-incremental, mounting the image causes an error #194

Open

jamesruic opened this issue Nov 16, 2022 · 12 comments

@jamesruic

Hi, I'm testing the elioctl reload-incremental and reload-snapshot commands with the latest code and found a problem: after elioctl reload-incremental or reload-snapshot, applying the changed blocks to the image copy produces an image that fails to mount.

Here are my test steps and virtual machine information.
My VM is CentOS 7.9 on VMware, with the following info:

[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 908M     0  908M   0% /dev
tmpfs                    919M     0  919M   0% /dev/shm
tmpfs                    919M  8.9M  910M   1% /run
tmpfs                    919M     0  919M   0% /sys/fs/cgroup
/dev/mapper/centos-root   14G  1.8G   12G  13% /
/dev/sdb1                2.0G   33M  2.0G   2% /data
/dev/sda2               1014M  143M  872M  15% /boot
/dev/sda1                200M   12M  189M   6% /boot/efi
tmpfs                    184M     0  184M   0% /run/user/0
/dev/sdc1                 50G   33M   50G   1% /mnt

[root@localhost ~]# lsblk -fp
NAME                        FSTYPE      LABEL           UUID                                   MOUNTPOINT
/dev/sda
├─/dev/sda1                 vfat                        4363-9EE4                              /boot/efi
├─/dev/sda2                 xfs                         3b2b2be2-5144-4004-a677-b637cd956f3c   /boot
└─/dev/sda3                 LVM2_member                 4hODIp-EdZc-ehjb-bM2X-liUE-9BaH-LoO5Tk
  ├─/dev/mapper/centos-root xfs                         ccbc5ed4-d7ad-4fd2-8716-d6d65da8ca6b   /
  └─/dev/mapper/centos-swap swap                        447468cd-858d-4139-90e0-0cae99e27322   [SWAP]
/dev/sdb
└─/dev/sdb1                 xfs                         f7eb5852-c43e-497c-b02c-aeed0c79d570   /data
/dev/sdc
└─/dev/sdc1                 xfs                         ea385966-6f1e-433f-90ab-0833c661da90   /mnt
/dev/sr0                    iso9660     CentOS 7 x86_64 2020-11-04-11-36-43-00

Here are my test steps:

  1. Modify the reload scripts, then build the RPM packages:
[root@localhost ~]# cat elastio-snap/dist/initramfs/dracut/elastio-snap.sh
#!/bin/sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

modprobe elastio-snap

[ -z "$root" ] && root=$(getarg root=)
[ -z "$rootfstype" ] && rootfstype=$(getarg rootfstype=)

rbd="${root#block:}"
if [ -n "$rbd" ]; then
    case "$rbd" in
        LABEL=*)
            rbd="$(echo $rbd | sed 's,/,\\x2f,g')"
            rbd="/dev/disk/by-label/${rbd#LABEL=}"
            ;;
        UUID=*)
            rbd="/dev/disk/by-uuid/${rbd#UUID=}"
            ;;
        PARTLABEL=*)
            rbd="/dev/disk/by-partlabel/${rbd#PARTLABEL=}"
            ;;
        PARTUUID=*)
            rbd="/dev/disk/by-partuuid/${rbd#PARTUUID=}"
            ;;
    esac

    echo "elastio-snap: root block device = $rbd" > /dev/kmsg

    # Device might not be ready
    if [ ! -b "$rbd" ]; then
        udevadm settle
    fi

    # Kernel cmdline might not specify rootfstype
    [ -z "$rootfstype" ] && rootfstype=$(blkid -s TYPE "$rbd" -o value)

    echo "elastio-snap: mounting $rbd as $rootfstype" > /dev/kmsg
    blockdev --setro $rbd
    mount -t $rootfstype -o ro "$rbd" /etc/elastio/dla/mnt
    udevadm settle

    if [ -x /etc/elastio/dla/mnt/elastio-reload ]; then
        /etc/elastio/dla/mnt/elastio-reload
    else
        echo "elastio-snap: error: cannot reload tracking data: missing /sbin/elastio_reload" > /dev/kmsg
    fi

    umount -f /etc/elastio/dla/mnt
    blockdev --setrw $rbd
fi

[root@localhost ~]# cat /elastio-reload
#!/bin/sh
modprobe elastio-snap -d /etc/elastio/dla/mnt
elioctl reload-incremental /dev/sdb1 /.snapshot0 0
  2. Create a snapshot of /dev/sdb1 and copy the image to another disk (/dev/sdc):
[root@localhost ~]# elioctl setup-snapshot /dev/sdb1 /data/.snapshot0 0
[root@localhost ~]# cat /proc/elastio-snap-info
{
        "version": "0.11.0",
        "devices": [
                {
                        "minor": 0,
                        "cow_file": "/.snapshot0",
                        "block_device": "/dev/sdb1",
                        "max_cache": 314572800,
                        "fallocate": 213909504,
                        "seq_id": 1,
                        "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
                        "version": 1,
                        "nr_changed_blocks": 0,
                        "state": 3
                }
        ]
}

[root@localhost ~]# dd if=/dev/elastio-snap0 of=/mnt/mydisk bs=4M
511+1 records in
511+1 records out
2145386496 bytes (2.1 GB) copied, 4.23293 s, 507 MB/s
  3. Put the snapshot into incremental mode, then reboot to trigger reload-incremental:
[root@localhost ~]# elioctl transition-to-incremental 0
[root@localhost ~]# cat /proc/elastio-snap-info
{
        "version": "0.11.0",
        "devices": [
                {
                        "minor": 0,
                        "cow_file": "/.snapshot0",
                        "block_device": "/dev/sdb1",
                        "max_cache": 314572800,
                        "fallocate": 213909504,
                        "seq_id": 1,
                        "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
                        "version": 1,
                        "nr_changed_blocks": 9,
                        "state": 2
                }
        ]
}

[root@localhost ~]# reboot
  4. After the reboot, check /proc/elastio-snap-info and add a new file:
[root@localhost ~]# cat /proc/elastio-snap-info
{
        "version": "0.11.0",
        "devices": [
                {
                        "minor": 0,
                        "cow_file": "/.snapshot0",
                        "block_device": "/dev/sdb1",
                        "max_cache": 314572800,
                        "fallocate": 213909504,
                        "seq_id": 1,
                        "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
                        "version": 1,
                        "nr_changed_blocks": 9,
                        "state": 2
                }
        ]
}

[root@localhost ~]# touch /data/tempfile2
[root@localhost ~]# ls -la /data/
total 4104
drwxr-xr-x.  2 root root      57 Nov 16 14:40 .
dr-xr-xr-x. 18 root root     277 Nov 16 14:34 ..
----------.  1 root root 4198400 Nov 16 14:38 .snapshot0
-rw-r--r--.  1 root root       6 Nov 16 14:25 tempfile
-rw-r--r--.  1 root root       0 Nov 16 14:40 tempfile2
  5. Switch back into snapshot mode and apply the changed blocks to the image:
[root@localhost ~]# elioctl transition-to-snapshot /.snapshot1 0
[root@localhost ~]# update-img /dev/elastio-snap0 /data/.snapshot0 /mnt/mydisk
snapshot is 523776 blocks large
copying blocks
copying complete: 13 blocks changed, 0 errors
  6. Move the image to another VM and try to mount it, which fails:
[root@localhost2 ~]# mount /mnt/mydisk /test/
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

[root@localhost2 ~]# dmesg
[   32.873553] XFS (loop0): Mounting V5 Filesystem
[   32.884892] XFS (loop0): Corruption warning: Metadata has LSN (1:2303) ahead of current LSN (1:2271). Please unmount and run xfs_repair (>= v4.3) to resolve.
[   32.884894] XFS (loop0): log mount/recovery failed: error -22
[   32.884918] XFS (loop0): log mount failed

The same problem occurs when using elioctl reload-snapshot in this test.
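For what it's worth, the dmesg output points at XFS log recovery rather than wholesale data damage. A minimal way to probe the copied image (a sketch, not part of the session above; it assumes the image at /mnt/mydisk and an empty /test mount point):

# Skip log recovery entirely; if this mounts, the data blocks are readable
# and only the log is inconsistent.
mount -o ro,norecovery,loop /mnt/mydisk /test && umount /test

# Dry-run filesystem check that modifies nothing.
xfs_repair -n /mnt/mydisk

# As the kernel message suggests, zeroing the log lets the image mount,
# at the cost of discarding in-flight transactions.
xfs_repair -L /mnt/mydisk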

@kgermanov

kgermanov commented Nov 16, 2022

@105590023 This looks like the same as #63. Is it reproducible without a reboot?
Another direction: could you test rebooting with a shutdown script?

root@user-vm:/home/kgermanov# cat  /lib/systemd/system-shutdown/umount_rootfs.shutdown
#!/bin/sh

sync
mount -o remount,ro /
umount /

@e-kov
Collaborator

e-kov commented Nov 16, 2022

@105590023
There is another interesting case to check: does this Corruption warning in dmesg also happen with an ext4 FS?
If not, @kgermanov is right and it looks like #63.
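For reference, rerunning step 2 on ext4 instead of XFS might look like this (a sketch; it assumes /dev/sdb1 can be reformatted and the remaining steps stay the same):

umount /data
mkfs.ext4 /dev/sdb1     # recreate the test FS as ext4
mount /dev/sdb1 /data
elioctl setup-snapshot /dev/sdb1 /data/.snapshot0 0
dd if=/dev/elastio-snap0 of=/mnt/mydisk bs=4M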

@jamesruic
Author

@kgermanov Thank you for your reply.
I used the shutdown script and still get the error.

[root@localhost ~]# cat /lib/systemd/system-shutdown/umount_rootfs.shutdown
#!/bin/sh

sync
mount -o remount,ro /
umount /

[root@localhost ~]# dmesg
[   45.309376] loop: module loaded
[   45.325215] XFS (loop0): Mounting V5 Filesystem
[   45.336186] XFS (loop0): Corruption warning: Metadata has LSN (1:2704) ahead of current LSN (1:2679). Please unmount and run xfs_repair (>= v4.3) to resolve.
[   45.336188] XFS (loop0): log mount/recovery failed: error -22
[   45.336208] XFS (loop0): log mount failed

@jamesruic
Author

jamesruic commented Nov 18, 2022

@e-kov Thank you for your reply.
It works fine with an ext4 FS.

@kgermanov

kgermanov commented Nov 21, 2022

@jamesruic Could you retest with these steps?

[root@localhost ~]# cat /elastio-reload
#!/bin/sh
elioctl reload-snapshot /dev/sdb1 /.snapshot0 0

[root@localhost ~]# xfs_freeze -f /data
[root@localhost ~]# sync
[root@localhost ~]# elioctl setup-snapshot -c 10 -f 200 /dev/sdb1 /data/.snapshot0 0
[root@localhost ~]# xfs_freeze -u /data
[root@localhost ~]# mount /dev/elastio-snap0 /test/ && sleep 1 && umount /test
[root@localhost ~]# dmesg | grep elastio
[root@localhost ~]# systemctl start reboot.target
<after reboot>
[root@localhost ~]# mount /dev/elastio-snap0 /test/
[root@localhost ~]# dmesg

@e-kov
Collaborator

e-kov commented Nov 21, 2022

@kgermanov I'm afraid elioctl setup-snapshot will hang after xfs_freeze, because it can't allocate the CoW file on the frozen FS.
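That blocking behavior is easy to demonstrate without elastio-snap at all; a quick sketch (assumes /data is the mounted XFS volume; the file name is just an example):

xfs_freeze -f /data
# Any write to the frozen FS now blocks, including preallocating a CoW file:
fallocate -l 200M /data/testfile &   # hangs until the FS is thawed
xfs_freeze -u /data                  # unfreeze; the fallocate then completes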

@kgermanov

@e-kov yes, you are right.

@anelson

anelson commented Dec 12, 2022

@e-kov is incremental after reboot broken in general?

@e-kov
Collaborator

e-kov commented Dec 12, 2022

@anelson No, it's not broken in general. The issue is with XFS only; it's a manifestation of the mount/XFS-log problem in #63.

@anelson

anelson commented Dec 12, 2022

Discussed in planning; the scope is clear now.

@anelson

anelson commented Dec 12, 2022

This is technically a duplicate of #63; however, @e-kov has asked to keep this issue open separately, as it contains another useful scenario with which to validate a future fix for #63.

@jamesruic
Author

jamesruic commented Jan 30, 2023

Is it possible to use register_reboot_notifier to do some processing on the block device before the system shuts down?
Maybe register a notifier via the register_reboot_notifier() function at module init and do something like transitioning to snapshot mode or freezing the device.
I'm not sure whether it would help.
https://elixir.bootlin.com/linux/v3.10/source/kernel/sys.c#L344

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/reboot.h>

/* Invoked on reboot/halt/power-off, before the system goes down. */
static int shutdown_notification(struct notifier_block *nb, unsigned long action, void *unused) {
    // do something

    return NOTIFY_DONE;
}

/* High priority so this runs before other reboot notifiers. */
static struct notifier_block reboot_notifier = {
    .notifier_call = shutdown_notification,
    .priority = INT_MAX
};

int __init init_module(void) {
    ...
    register_reboot_notifier(&reboot_notifier);
    ...
}

void __exit exit_module(void) {
    ...
    unregister_reboot_notifier(&reboot_notifier);
    ...
}
