
After reload-incremental, mounting the image causes an error #194

Open

jamesruic opened this issue Nov 16, 2022 · 12 comments

@jamesruic

Hi, I'm testing the elioctl reload-incremental and reload-snapshot commands with the latest code and found a problem: after elioctl reload-incremental or reload-snapshot, applying the changed blocks to the image copy produces an image that fails to mount.

Here are my test steps and virtual machine information.
My VM is CentOS 7.9 on VMware, with the following info:

[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 908M     0  908M   0% /dev
tmpfs                    919M     0  919M   0% /dev/shm
tmpfs                    919M  8.9M  910M   1% /run
tmpfs                    919M     0  919M   0% /sys/fs/cgroup
/dev/mapper/centos-root   14G  1.8G   12G  13% /
/dev/sdb1                2.0G   33M  2.0G   2% /data
/dev/sda2               1014M  143M  872M  15% /boot
/dev/sda1                200M   12M  189M   6% /boot/efi
tmpfs                    184M     0  184M   0% /run/user/0
/dev/sdc1                 50G   33M   50G   1% /mnt

[root@localhost ~]# lsblk -fp
NAME                        FSTYPE      LABEL           UUID                                   MOUNTPOINT
/dev/sda
├─/dev/sda1                 vfat                        4363-9EE4                              /boot/efi
├─/dev/sda2                 xfs                         3b2b2be2-5144-4004-a677-b637cd956f3c   /boot
└─/dev/sda3                 LVM2_member                 4hODIp-EdZc-ehjb-bM2X-liUE-9BaH-LoO5Tk
  ├─/dev/mapper/centos-root xfs                         ccbc5ed4-d7ad-4fd2-8716-d6d65da8ca6b   /
  └─/dev/mapper/centos-swap swap                        447468cd-858d-4139-90e0-0cae99e27322   [SWAP]
/dev/sdb
└─/dev/sdb1                 xfs                         f7eb5852-c43e-497c-b02c-aeed0c79d570   /data
/dev/sdc
└─/dev/sdc1                 xfs                         ea385966-6f1e-433f-90ab-0833c661da90   /mnt
/dev/sr0                    iso9660     CentOS 7 x86_64 2020-11-04-11-36-43-00

Here are my test steps:

  1. Modify the reload scripts, then build the RPM packages:
[root@localhost ~]# cat elastio-snap/dist/initramfs/dracut/elastio-snap.sh
#!/bin/sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

modprobe elastio-snap

[ -z "$root" ] && root=$(getarg root=)
[ -z "$rootfstype" ] && rootfstype=$(getarg rootfstype=)

rbd="${root#block:}"
if [ -n "$rbd" ]; then
    case "$rbd" in
        LABEL=*)
            rbd="$(echo $rbd | sed 's,/,\\x2f,g')"
            rbd="/dev/disk/by-label/${rbd#LABEL=}"
            ;;
        UUID=*)
            rbd="/dev/disk/by-uuid/${rbd#UUID=}"
            ;;
        PARTLABEL=*)
            rbd="/dev/disk/by-partlabel/${rbd#PARTLABEL=}"
            ;;
        PARTUUID=*)
            rbd="/dev/disk/by-partuuid/${rbd#PARTUUID=}"
            ;;
    esac

    echo "elastio-snap: root block device = $rbd" > /dev/kmsg

    # Device might not be ready
    if [ ! -b "$rbd" ]; then
        udevadm settle
    fi

    # Kernel cmdline might not specify rootfstype
    [ -z "$rootfstype" ] && rootfstype=$(blkid -s TYPE "$rbd" -o value)

    echo "elastio-snap: mounting $rbd as $rootfstype" > /dev/kmsg
    blockdev --setro $rbd
    mount -t $rootfstype -o ro "$rbd" /etc/elastio/dla/mnt
    udevadm settle

    if [ -x /etc/elastio/dla/mnt/elastio-reload ]; then
        /etc/elastio/dla/mnt/elastio-reload
    else
        echo "elastio-snap: error: cannot reload tracking data: missing /sbin/elastio_reload" > /dev/kmsg
    fi

    umount -f /etc/elastio/dla/mnt
    blockdev --setrw $rbd
fi

[root@localhost ~]# cat /elastio-reload
#!/bin/sh
modprobe elastio-snap -d /etc/elastio/dla/mnt
elioctl reload-incremental /dev/sdb1 /.snapshot0 0
  2. Create a snapshot of /dev/sdb1 and copy the image to another disk (/dev/sdc):
[root@localhost ~]# elioctl setup-snapshot /dev/sdb1 /data/.snapshot0 0
[root@localhost ~]# cat /proc/elastio-snap-info
{
        "version": "0.11.0",
        "devices": [
                {
                        "minor": 0,
                        "cow_file": "/.snapshot0",
                        "block_device": "/dev/sdb1",
                        "max_cache": 314572800,
                        "fallocate": 213909504,
                        "seq_id": 1,
                        "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
                        "version": 1,
                        "nr_changed_blocks": 0,
                        "state": 3
                }
        ]
}

[root@localhost ~]# dd if=/dev/elastio-snap0 of=/mnt/mydisk bs=4M
511+1 records in
511+1 records out
2145386496 bytes (2.1 GB) copied, 4.23293 s, 507 MB/s
  3. Put the snapshot into incremental mode, then reboot to trigger reload-incremental:
[root@localhost ~]# elioctl transition-to-incremental 0
[root@localhost ~]# cat /proc/elastio-snap-info
{
        "version": "0.11.0",
        "devices": [
                {
                        "minor": 0,
                        "cow_file": "/.snapshot0",
                        "block_device": "/dev/sdb1",
                        "max_cache": 314572800,
                        "fallocate": 213909504,
                        "seq_id": 1,
                        "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
                        "version": 1,
                        "nr_changed_blocks": 9,
                        "state": 2
                }
        ]
}

[root@localhost ~]# reboot
  4. After the reboot, check /proc/elastio-snap-info and add a new file:
[root@localhost ~]# cat /proc/elastio-snap-info
{
        "version": "0.11.0",
        "devices": [
                {
                        "minor": 0,
                        "cow_file": "/.snapshot0",
                        "block_device": "/dev/sdb1",
                        "max_cache": 314572800,
                        "fallocate": 213909504,
                        "seq_id": 1,
                        "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
                        "version": 1,
                        "nr_changed_blocks": 9,
                        "state": 2
                }
        ]
}

[root@localhost ~]# touch /data/tempfile2
[root@localhost ~]# ls -la /data/
total 4104
drwxr-xr-x.  2 root root      57 Nov 16 14:40 .
dr-xr-xr-x. 18 root root     277 Nov 16 14:34 ..
----------.  1 root root 4198400 Nov 16 14:38 .snapshot0
-rw-r--r--.  1 root root       6 Nov 16 14:25 tempfile
-rw-r--r--.  1 root root       0 Nov 16 14:40 tempfile2
  5. Switch back into snapshot mode and apply the changed blocks to the image:
[root@localhost ~]# elioctl transition-to-snapshot /.snapshot1 0
[root@localhost ~]# update-img /dev/elastio-snap0 /data/.snapshot0 /mnt/mydisk
snapshot is 523776 blocks large
copying blocks
copying complete: 13 blocks changed, 0 errors
  6. Move the image to another VM and try to mount it, which fails:
[root@localhost2 ~]# mount /mnt/mydisk /test/
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

[root@localhost2 ~]# dmesg
[   32.873553] XFS (loop0): Mounting V5 Filesystem
[   32.884892] XFS (loop0): Corruption warning: Metadata has LSN (1:2303) ahead of current LSN (1:2271). Please unmount and run xfs_repair (>= v4.3) to resolve.
[   32.884894] XFS (loop0): log mount/recovery failed: error -22
[   32.884918] XFS (loop0): log mount failed

The same problem occurs when using elioctl reload-snapshot in this test.
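For what it's worth, the dmesg output points at XFS log recovery rather than wholesale data damage. A minimal way to probe the copied image (a sketch, not part of the session above; it assumes the image at /mnt/mydisk and an empty /test mount point):

# Skip log recovery entirely; if this mounts, the data blocks are readable
# and only the log is inconsistent.
mount -o ro,norecovery,loop /mnt/mydisk /test && umount /test

# Dry-run filesystem check that modifies nothing.
xfs_repair -n /mnt/mydisk

# As the kernel message suggests, zeroing the log lets the image mount,
# at the cost of discarding in-flight transactions.
xfs_repair -L /mnt/mydisk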

@kgermanov

kgermanov commented Nov 16, 2022

@105590023 This looks like the same as #63. Is it reproducible without a reboot?
Another direction: could you test rebooting with a shutdown script?

root@user-vm:/home/kgermanov# cat  /lib/systemd/system-shutdown/umount_rootfs.shutdown
#!/bin/sh

sync
mount -o remount,ro /
umount /

@e-kov
Collaborator

e-kov commented Nov 16, 2022

@105590023
There is another interesting case to check: does this Corruption warning in dmesg also happen with an ext4 FS?
If not, @kgermanov is right and it looks like #63.
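For reference, rerunning step 2 on ext4 instead of XFS might look like this (a sketch; it assumes /dev/sdb1 can be reformatted and the remaining steps stay the same):

umount /data
mkfs.ext4 /dev/sdb1     # recreate the test FS as ext4
mount /dev/sdb1 /data
elioctl setup-snapshot /dev/sdb1 /data/.snapshot0 0
dd if=/dev/elastio-snap0 of=/mnt/mydisk bs=4M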

@jamesruic
Author

@kgermanov Thank you for your reply.
I used the shutdown script and still get the error.

[root@localhost ~]# cat /lib/systemd/system-shutdown/umount_rootfs.shutdown
#!/bin/sh

sync
mount -o remount,ro /
umount /

[root@localhost ~]# dmesg
[   45.309376] loop: module loaded
[   45.325215] XFS (loop0): Mounting V5 Filesystem
[   45.336186] XFS (loop0): Corruption warning: Metadata has LSN (1:2704) ahead of current LSN (1:2679). Please unmount and run xfs_repair (>= v4.3) to resolve.
[   45.336188] XFS (loop0): log mount/recovery failed: error -22
[   45.336208] XFS (loop0): log mount failed

@jamesruic
Author

jamesruic commented Nov 18, 2022

@e-kov Thank you for your reply.
It works fine with an ext4 FS.

@kgermanov

kgermanov commented Nov 21, 2022

@jamesruic Could you retest with these steps?

[root@localhost ~]# cat /elastio-reload
#!/bin/sh
elioctl reload-snapshot /dev/sdb1 /.snapshot0 0

[root@localhost ~]# xfs_freeze -f /data
[root@localhost ~]# sync
[root@localhost ~]# elioctl setup-snapshot -c 10 -f 200 /dev/sdb1 /data/.snapshot0 0
[root@localhost ~]# xfs_freeze -u /data
[root@localhost ~]# mount /dev/elastio-snap0 /test/ && sleep 1 && umount /test
[root@localhost ~]# dmesg | grep elastio
[root@localhost ~]# systemctl start reboot.target
<after reboot>
[root@localhost ~]# mount /dev/elastio-snap0 /test/
[root@localhost ~]# dmesg

@e-kov
Collaborator

e-kov commented Nov 21, 2022

@kgermanov I'm afraid elioctl setup-snapshot will hang after xfs_freeze, because it can't allocate the CoW file on the frozen FS.
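That blocking behavior is easy to demonstrate without elastio-snap at all; a quick sketch (assumes /data is the mounted XFS volume; the file name is just an example):

xfs_freeze -f /data
# Any write to the frozen FS now blocks, including preallocating a CoW file:
fallocate -l 200M /data/testfile &   # hangs until the FS is thawed
xfs_freeze -u /data                  # unfreeze; the fallocate then completes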

@kgermanov

@e-kov yes, you are right.

@anelson

anelson commented Dec 12, 2022

@e-kov is incremental after reboot broken in general?

@e-kov
Collaborator

e-kov commented Dec 12, 2022

@anelson No, it's not broken in general. The issue is with XFS only; it's a manifestation of the mount/XFS-log problem in #63.

@anelson

anelson commented Dec 12, 2022

Discussed in planning; the scope is clear now.

@anelson

anelson commented Dec 12, 2022

This is technically a duplicate of #63; however, @e-kov has asked to keep this issue open separately, as it contains another useful scenario with which to validate a future fix for #63.

@jamesruic
Author

jamesruic commented Jan 30, 2023

Is it possible to use register_reboot_notifier to do some processing on the block device before the system shuts down?
Maybe register a notifier via the register_reboot_notifier() function at module init and do something like transitioning to snapshot mode or freezing the device.
I'm not sure whether it would help.
https://elixir.bootlin.com/linux/v3.10/source/kernel/sys.c#L344

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/reboot.h>

/* Invoked on reboot/halt/power-off, before the system goes down. */
static int shutdown_notification(struct notifier_block *nb, unsigned long action, void *unused) {
    // do something

    return NOTIFY_DONE;
}

/* High priority so this runs before other reboot notifiers. */
static struct notifier_block reboot_notifier = {
    .notifier_call = shutdown_notification,
    .priority = INT_MAX
};

int __init init_module(void) {
    ...
    register_reboot_notifier(&reboot_notifier);
    ...
}

void __exit exit_module(void) {
    ...
    unregister_reboot_notifier(&reboot_notifier);
    ...
}
