Adding support for performing updates from a container image #128

Open
wants to merge 1 commit into
base: master

Conversation

rdoxenham
Contributor

In this PR I'm introducing support for transactional-update to be able to consume a container image (or OCI artefact) as the source for the next boot snapshot. This is very important functionality that allows customers to build, distribute, and validate their operating system images via a standardised container image workflow; they can build OS images via Dockerfiles, store and retrieve them via a standard image registry, and put them through standard SBOM and vulnerability checkers.

This PR specifically addresses the requirement to let customers upgrade an existing machine, regardless of how it was deployed, to a new image-based snapshot. For the initial (day 1) deployment, tooling such as kiwi-ng system stackbuild could be used to build bootable raw images and SelfInstall ISOs based on the same container image, which is already a supported feature.

The approach taken in this PR:

  • Enables an easy on-ramp (and off-ramp) for users wanting to use the image approach
  • Leverages existing standard tooling to achieve the workflow; this would just use the standard OS layout/tools.
  • Enables users to switch between standard package or image based upgrades (no need to build a new image every time)
  • Doesn't break existing workflow for patching or migrating between products
  • Retains full snapper (i.e. transactional-update rollback) functionality
  • Retains full customisation options through ignition/combustion and jeos-firstboot (provided they're in the image!)
  • Preserves existing user configuration between updates/rollbacks [1]

I could use some help with ensuring that I'm making the correct calls, as I'm mixing tukit callext with using the {SNAPSHOT_DIR} directly. I suspect there's a more elegant/safer/better way of achieving this, hence why it's labelled as a work in progress for now.

Further, we likely need to run some validations to make sure that the target image is actually bootable. This code forces a dracut and grub2-mkconfig run, but I can see this being an area where it may be trivial to make the system difficult to boot and rescue. I'm not an expert here, so I'm wondering whether there are checks that we could execute, and if they fail, we can abort the snapshot. Documentation on how to build an image will be very important, especially as it relates to partitions (or btrfs subvolumes that are ignored) but we should likely do some additional sanity checks to verify the state of the new snapshot rather than blindly closing it and enabling the user to reboot, where we're not able to provide a decent level of confidence in a successful reboot.
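As a rough illustration of the kind of check discussed above, a snapshot could be inspected for a few essential paths before it is closed. This is only a minimal sketch under assumptions: the function name check_snapshot_sanity and the list of required paths are hypothetical, and passing these checks would not by itself guarantee a successful boot.

```shell
#!/bin/bash
# Hypothetical pre-close sanity check for a snapshot populated from an image.
# The list of required paths is an assumption for illustration, not a
# complete definition of "bootable".

# Print each missing item; return non-zero if anything is missing.
check_snapshot_sanity() {
    local root="$1" missing=0 path

    # Files assumed to be present in any usable root file system.
    for path in usr/lib/os-release usr/bin/dracut; do
        if [ ! -e "${root}/${path}" ]; then
            echo "missing: /${path}"
            missing=1
        fi
    done

    # At least one kernel module tree should exist in either location.
    if ! compgen -G "${root}/usr/lib/modules/*" > /dev/null &&
       ! compgen -G "${root}/lib/modules/*" > /dev/null; then
        echo "missing: kernel modules directory"
        missing=1
    fi

    return "${missing}"
}
```

A caller could run check_snapshot_sanity "${SNAPSHOT_DIR}" and refuse to close the snapshot when it returns non-zero.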

For testing, I applied this on top of SLE Micro 5.5 and used a test container image available at registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240726. This is a very simple image that aims to mirror the standard SLE Micro 5.5 packages as defined in the original Kiwi image definition file, i.e. it's not cut down, but has the bare set of packages typically installed, or as defined in the "Default" profile. Of course, it's perfectly possible to modify this image to suit, e.g. adding a package to it is as simple as building and pushing a new image (note that suseconnect is only required for commercially registered images; it wouldn't be required for Leap Micro or MicroOS):

% cat Dockerfile
FROM registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240726
RUN suseconnect -r <regcode> && zypper --gpg-auto-import-keys ref \
     && zypper in -y nvidia-open-driver-G06-signed-kmp-default \
     && suseconnect -d && zypper clean -a

ARG IMAGE_REPO=unknown
ARG IMAGE=unknown
ARG IMAGE_TAG=unknown

RUN sed -i '/IMAGE/d' /usr/lib/os-release && \
    sed -i '/TIMESTAMP/d' /usr/lib/os-release && \
    echo IMAGE_REPO=\"$IMAGE_REPO\"              >> /usr/lib/os-release && \
    echo IMAGE_TAG=\"$IMAGE_TAG\"                >> /usr/lib/os-release && \
    echo IMAGE=\"$IMAGE_REPO:$IMAGE_TAG\"        >> /usr/lib/os-release && \
    echo TIMESTAMP="`date +'%Y%m%d%H%M%S'`"      >> /usr/lib/os-release

% podman build -t harbor.rancher.rdoxenham.com/slemicro/5.5:$(date +'%Y%m%d') \
    --build-arg "IMAGE=harbor.rancher.rdoxenham.com/slemicro/5.5:$(date +'%Y%m%d')" \
    --build-arg "IMAGE_TAG=$(date +'%Y%m%d')" \
    --build-arg "IMAGE_REPO=harbor.rancher.rdoxenham.com/slemicro/5.5" .

% podman push harbor.rancher.rdoxenham.com/slemicro/5.5:20240727
(...)

Then, this image can be used as an input to code provided as part of this PR:

# transactional-update apply-oci --image harbor.rancher.rdoxenham.com/slemicro/5.5:20240727
(...)

[after reboot]

# cat /etc/os-release
NAME="SLE Micro"
VERSION="5.5"
VERSION_ID="5.5"
PRETTY_NAME="SUSE Linux Enterprise Micro 5.5"
ID="sle-micro"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sle-micro:5.5"
GRUB_ENTRY_NAME="SLE Micro"
IMAGE_REPO="harbor.rancher.rdoxenham.com/slemicro/5.5"
IMAGE_TAG="20240727"
IMAGE="harbor.rancher.rdoxenham.com/slemicro/5.5:20240727"
TIMESTAMP=20240727173117

# rpm -qa | grep nvidia-open-driver
nvidia-open-driver-G06-signed-kmp-default-550.90.07_k5.14.21_150500.55.65-150500.3.47.1.x86_64

Some further guidance from the TU community would be appreciated. Thanks a lot!

[1] This approach aligns well with the bootc project (https://containers.github.io/bootc/filesystem.html#etc) in persisting /etc configuration across image updates. This allows configuration such as OS registration, package repositories, static network configuration, and various other components to persist rather than be overwritten by the OS upgrade. However, one open question is whether we want to support /usr/etc, so that specific files in /etc can be forcibly overwritten by the contents of /usr/etc, giving users a release valve for changing certain persistent configuration over time as part of a 3-way merge (this PR doesn't do this yet).

@rdoxenham changed the title from "[WIP] Adding support for performing updates from a container image" to "Adding support for performing updates from a container image" on Aug 19, 2024
@lz-coder

any plans for this to be merged?

@laenion
Collaborator

laenion commented Aug 30, 2024

> any plans for this to be merged?

Yes, I just have to find the time to finally review it! Sorry for the delay!

@joostwestra

registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240726 does not exist anymore?

@rdoxenham
Contributor Author

Hi @joostwestra - yes, that system rebuilds every day and discards already created tags, so you can either select the latest tag, or just build off latest:

  • registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240912
  • registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:latest

@agracey

agracey commented Sep 27, 2024

Any update on timeline for a review of this?

@puneetlws

puneetlws commented Sep 29, 2024

Since SLE Micro 6 has now been released, is it possible to add migration functionality from 5.5 to 6 through the above mechanism? Right now, migrating to 6 requires the 'transactional-update migrate' command, which internally calls zypper migrate and requires connectivity to SUSEConnect.

@joostwestra

We tested this feature by manually patching it in. We think it is a valuable feature for a wide audience.
Is there anything we can do to help move this feature forward?

@laenion
Collaborator

laenion commented Sep 30, 2024

I'll include it in the next major release (hopefully to be released soon); I'm currently just prioritizing finalizing the work on moving the /etc overlays to btrfs subvolumes.

@agracey

agracey commented Sep 30, 2024

@laenion Thank you!

Collaborator

@laenion laenion left a comment


Thanks a lot for the implementation! I really love the approach, though I'm not sure how bullet-proof it has to become to merge it. So let me just add a few comments.

It seems that, so far, one has to specify a specific image, i.e. it won't detect whether there's an update. On the other hand, it's also possible to install any image (even one from another distribution), which will obviously break.
I guess it would make sense to predefine the repository (e.g. in tukit.conf or transactional-update.conf) and, by default, just let the user specify the version.
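One possible shape for this, sketched under assumptions: the repository comes from a configuration value (called OCI_IMAGE_REPO here purely for illustration), the user supplies only a tag, and the IMAGE key written to os-release by the Dockerfile in the PR description is used to tell whether the requested image differs from the running one. None of these helpers or the variable exist in transactional-update today.

```shell
#!/bin/bash
# Hypothetical helpers; the names and the OCI_IMAGE_REPO setting are
# assumptions for illustration only.

# Print the value of KEY from an os-release style file, without quotes.
os_release_value() {
    local file="$1" key="$2"
    sed -n "s/^${key}=//p" "${file}" | tail -n 1 | tr -d '"'
}

# Build the full image reference from the configured repository and a tag.
resolve_image() {
    local repo="$1" tag="${2:-latest}"
    echo "${repo}:${tag}"
}

# Succeed if the requested image differs from the one recorded in os-release.
image_update_needed() {
    local os_release="$1" wanted="$2"
    [ "$(os_release_value "${os_release}" IMAGE)" != "${wanted}" ]
}
```

Comparing tags only works for images that record IMAGE in os-release and are referenced by a versioned tag; a floating tag such as latest would need a digest comparison instead.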

In the longer term I think we also want to integrate the functionality into tukit directly as an alternative implementation to snapper. Then the updates could also be triggered by D-Bus and the API.

Comment on lines +1517 to +1520
OCI_RSYNC_EXCLUDES_LIST=()
for i in ${OCI_RSYNC_EXCLUDES}; do
OCI_RSYNC_EXCLUDES_LIST+=("--exclude $i ")
done
Collaborator

On the one hand, it's good that the directories are excluded explicitly, because they will be overmounted anyway. On the other hand, transactional-update already warns about those files at the end of the update, and the created files would still be available as a reference.

Both ways are valid approaches though, I just wanted to mention this.
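For reference, the quoted loop could also build one self-contained argument per path, so that the list stays intact under quoted array expansion. This is a sketch only: it reuses the variable names from the diff, and whether the original form works as-is depends on how the array is later expanded, which is not shown here.

```shell
#!/bin/bash
# Illustrative only: same variable names as the quoted diff, but each path
# becomes a single "--exclude=PATH" argument.
OCI_RSYNC_EXCLUDES="/etc /var /usr/local /tmp /root /home /srv /opt /sys /dev /proc /run"

OCI_RSYNC_EXCLUDES_LIST=()
for i in ${OCI_RSYNC_EXCLUDES}; do
    OCI_RSYNC_EXCLUDES_LIST+=("--exclude=${i}")
done

# Quoted expansion would pass every element to rsync as a separate argument:
#   rsync ${OCI_RSYNC_ARGS} "${OCI_RSYNC_EXCLUDES_LIST[@]}" "${OCI_MOUNT}/" "${SNAPSHOT_DIR}/"
printf '%s\n' "${OCI_RSYNC_EXCLUDES_LIST[@]}"
```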


# Merge contents of /etc from container image but preserve existing configuration
log_info "INFO: Merging /etc from container image into existing snapshot, preserving existing configuration..."
tukit ${TUKIT_OPTS} callext ${SNAPSHOT_ID} rsync --ignore-existing ${OCI_RSYNC_ARGS} ${OCI_MOUNT}/etc/ ${SNAPSHOT_DIR}/etc/ |& tee -a ${LOGFILE} 1>&${origstdout}
Collaborator

I'm not sure how this rsync call is different from the one above syncing the root file system. Isn't the result exactly the same when removing /etc from the exclude list?

I guess this rsync call is supposed to synchronize the changed files in /etc from the currently running system into the new snapshot?

Comment on lines +1119 to +1122
if [ "$1" == "--help" ]; then
usage 1
break
fi
Collaborator

I don't think a dedicated --help option for this specific command is useful because it would be the only command with this behavior - I'd include the information about the required parameters in the general help.

@@ -74,6 +75,9 @@ SYSTEM_MANIFEST_FILE="@libdir@/sysimage/tu/system.manifest"
ZYPPER_AUTO_IMPORT_KEYS=0
ETC_OVERLAY_PATTERN='^[^[:space:]]\+[[:space:]]\+\/etc[[:space:]]\+overlay[[:space:]]\+\([^[:space:]]*,\|\)workdir=\/sysroot\/var\/lib\/overlay\/work-etc[,[:space:]]'
NON_ROOTFS_WHITELIST=("/var/lib/YaST2/cookies" "/var/lib/rpm" "/var/lib/systemd/migrated" "/var/run/zypp.pid")
OCI_RSYNC_ARGS="-a --hard-links --xattrs --acls --inplace"
OCI_RSYNC_EXCLUDES="/etc /var /usr/local /tmp /root /home /srv /opt /sys /dev /proc /run"
Collaborator

Instead of hardcoding this list, I'd suggest parsing it dynamically using findmnt, but also see the comment where the exclude list is processed below.
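A minimal sketch of what that could look like, assuming the exclude list should simply mirror the current mount targets. The function name build_excludes_from_mounts is hypothetical, and which findmnt view to use (current mounts or fstab) is left open by the comment above.

```shell
#!/bin/bash
# Sketch: turn a list of mount targets (one per line on stdin) into rsync
# exclude arguments. The function name is hypothetical.
build_excludes_from_mounts() {
    local target
    OCI_RSYNC_EXCLUDES_LIST=()
    while IFS= read -r target; do
        # Skip the root file system itself and anything that is not an
        # absolute path (empty lines, "swap", "none").
        if [ "${target}" = "/" ] || [[ "${target}" != /* ]]; then
            continue
        fi
        OCI_RSYNC_EXCLUDES_LIST+=("--exclude=${target}")
    done
}

# Assumed invocation, listing the targets of all current mounts:
#   build_excludes_from_mounts < <(findmnt --raw --noheadings --output TARGET)
```

Note that findmnt --raw escapes special characters such as spaces in its output, so targets containing them would need to be decoded before being passed to rsync.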

usage 1
break
fi
shift
Collaborator

If you didn't consume all arguments, it would even be possible to install additional packages in one go (e.g. transactional-update apply-oci --image <path> pkg in <package>).
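A sketch of how such parsing might look, assuming apply-oci only needs --image and everything after it should be handed back to the main command loop. The function and variable names are hypothetical and are not taken from the PR.

```shell
#!/bin/bash
# Hypothetical parser for "apply-oci": consumes only its own options and
# leaves everything after them in REMAINING_ARGS for further commands.
parse_apply_oci_args() {
    OCI_IMAGE=""
    REMAINING_ARGS=()
    while [ $# -gt 0 ]; do
        case "$1" in
            --image)
                # "--image" needs a value; fail if none is given.
                [ $# -ge 2 ] || return 1
                OCI_IMAGE="$2"
                shift 2
                ;;
            *)
                # First word that is not ours: stop and hand the rest back.
                REMAINING_ARGS=("$@")
                break
                ;;
        esac
    done
    # An image reference is mandatory.
    [ -n "${OCI_IMAGE}" ]
}
```

With this shape, parse_apply_oci_args --image <path> pkg in <package> would leave "pkg in <package>" in REMAINING_ARGS for the regular command handling.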

@laenion
Collaborator

laenion commented Oct 10, 2024

> I could use some help with ensuring that I'm making the correct calls, as I'm mixing tukit callext with using the {SNAPSHOT_DIR} directly. I suspect there's a more elegant/safer/better way of achieving this, hence why it's labelled as a work in progress for now.

These calls are perfectly fine. You could use tukit call podman image pull ... and also call rsync with tukit call for syncing the initial root file system, but the result will (with the current exclusion list) be the same.

> Further, we likely need to run some validations to make sure that the target image is actually bootable. This code forces a dracut and grub2-mkconfig run, but I can see this being an area where it may be trivial to make the system difficult to boot and rescue. I'm not an expert here, so I'm wondering whether there are checks that we could execute, and if they fail, we can abort the snapshot.

Indeed, also see my comment in #128 (review).

> Documentation on how to build an image will be very important, especially as it relates to partitions (or btrfs subvolumes that are ignored) but we should likely do some additional sanity checks to verify the state of the new snapshot rather than blindly closing it and enabling the user to reboot, where we're not able to provide a decent level of confidence in a successful reboot.

👍


6 participants