systemd-bootchart fails with ENOENT for "/proc/schedstat" when run from initial ramdisk #60

Open
jamuir opened this issue Jan 21, 2025 · 3 comments

jamuir commented Jan 21, 2025

Executing systemd-bootchart from the initial ramdisk fails when systemd does its switch-root procedure.

This can be reproduced on Fedora 41 with an initial ramdisk updated to include systemd-bootchart.

The systemd-bootchart documentation does not say whether running from the initial ramdisk is supported. Internally, however, systemd-bootchart sets argv[0][0] = '@', which suggests this was supported at one point: setting argv[0][0] = '@' is one way to survive the killing spree that systemd performs during switch-root.
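
For readers unfamiliar with the convention, here is a minimal sketch of the argv[0][0] = '@' marking. The helper name mark_argv_for_switch_root is hypothetical and is not taken from the systemd-bootchart source. The write has to go into the original argv memory, because that is what ends up in /proc/<pid>/cmdline.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from systemd-bootchart): overwrite the first
 * character of argv[0] with '@', in place, so that the process is marked
 * as one that should be spared when systemd kills remaining processes
 * during switch-root.
 */
void mark_argv_for_switch_root(char **argv)
{
    assert(argv != NULL && argv[0] != NULL);

    /* Nothing to overwrite if argv[0] is an empty string. */
    if (argv[0][0] != '\0')
        argv[0][0] = '@';
}
```

A program would typically call this early in main(), before any point at which switch-root could occur.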

The failure happens here:

https://github.com/systemd/systemd-bootchart/blob/a15bcafb60b9a24d866024953e9965316ba73eaf/src/store.c#L191C1-L194C71

I will provide an strace log and more detailed steps to reproduce below.

Author

jamuir commented Jan 22, 2025

strace log is attached.

strace-proc-schedstat.log

To prepare an initial ramdisk with systemd-bootchart (and strace), you can do this:

sudo -i
mkdir -p initrd/root
cd initrd/root
gunzip --stdout /boot/initramfs-6.11.4-301.fc41.aarch64.img | cpio --extract 
cd usr/lib/systemd
cp /usr/lib/systemd/systemd-bootchart .
# you can check that all required libs are already present
#   ldd /usr/lib/systemd/systemd-bootchart
cd ../..
cd bin
cp /usr/bin/strace .
# you will need to copy a few libs to support strace
#   ldd /usr/bin/strace
cd ../..
find . | cpio -o -H newc --file=../initramfs-xx.cpio
cd ..
gzip --stdout initramfs-xx.cpio > initramfs-xx.img
cp initramfs-xx.img /boot/initramfs-xx.img

Reboot, then edit the GRUB boot entry so that it boots using the new initial ramdisk:

initrd ($root)/initramfs-xx.img

Also, add a kernel param to boot into the rd.emergency target (I also added enforcing=0):

$ xargs -n1 < /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.11.4-301.fc41.aarch64
root=/dev/mapper/fedora_vbox-root
ro
rd.lvm.lv=fedora_vbox/root
rhgb
enforcing=0
rd.emergency

In the ramdisk emergency shell, run bootchart and then exit to continue booting:

# strace -o /run/log/strace.log /usr/lib/systemd/systemd-bootchart &
# exit

When you log in as usual, systemd-bootchart is no longer running.

The strace log shows that systemd-bootchart failed attempting to read /proc/schedstat and then exited:

clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=32804124}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
restart_syscall(<... resuming interrupted clock_nanosleep ...>) = 0
lseek(4, 0, SEEK_SET)                   = 0
pread64(5, "nr_free_pages 478087\nnr_zone_ina"..., 4095, 0) = 3531
openat(AT_FDCWD, "/proc/schedstat", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
writev(2, [{iov_base="Unable to read schedstat: No suc"..., iov_len=51}, {iov_base="\n", iov_len=1}], 2) = -1 EIO (Input/output error)
getpid()                                = 241
close(3)                                = 0
close(4)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

You can also reproduce the defect by setting the kernel param rdinit=:

rdinit=/usr/lib/systemd/systemd-bootchart

Contributor

sofar commented Feb 9, 2025

This certainly wasn't supported.

I think we can support it, though. We would probably have to rewrite all the proc-opening code to open the "correct" proc directory, somehow detect and fall back to the "new" location of proc, and, instead of opening files by full path, use openat on the existing proc directory fd. It's likely going to be a little messy because, for each process, we will be opening files relative to the proc directory.

That's assuming that it actually works and the fd for /proc remains accessible after the switch-root.
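
A minimal sketch of that idea, assuming the directory fd does stay usable (which is exactly the open question above). The function names proc_dir_open and proc_file_open are hypothetical and do not come from systemd-bootchart; only open() and openat() are real calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical: open the proc directory once, early, while the path
 * (e.g. "/proc") still resolves. Returns a directory fd, or -1 on error.
 */
int proc_dir_open(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/*
 * Hypothetical: open a file such as "schedstat" relative to the held
 * directory fd instead of by absolute path. Returns -1 on error.
 */
int proc_file_open(int proc_fd, const char *name)
{
    /* The name must be relative, otherwise openat() ignores proc_fd. */
    assert(name != NULL && name[0] != '/');
    return openat(proc_fd, name, O_RDONLY | O_CLOEXEC);
}
```

With this shape, the failing call from the strace log would become something like proc_file_open(proc_fd, "schedstat"), where proc_fd was obtained from proc_dir_open("/proc") before the switch-root.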

Author

jamuir commented Feb 9, 2025

We have a patch that works the way you suggest; i.e. rather than use an absolute path, it holds a file descriptor to the original /proc (pre-switch-root) and then opens relative to that fd.
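
This is not the patch itself, only a rough sketch of what per-process access could look like under that scheme. The helper name proc_pid_open and the 64-byte buffer size are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open "<pid>/<name>" (for example "241/stat")
 * relative to a directory fd that was opened on /proc before
 * switch-root. Returns a file descriptor, or -1 on error.
 */
int proc_pid_open(int proc_fd, pid_t pid, const char *name)
{
    char rel[64];
    int n;

    assert(name != NULL);

    /* Build a relative path; an absolute one would bypass proc_fd. */
    n = snprintf(rel, sizeof(rel), "%ld/%s", (long) pid, name);
    if (n < 0 || (size_t) n >= sizeof(rel))
        return -1; /* formatting failed or the path did not fit */

    return openat(proc_fd, rel, O_RDONLY | O_CLOEXEC);
}
```

Because the path stays relative, the lookup never consults the process's current root, so it does not matter where, or whether, /proc is mounted there.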

It seems to work.

I will test it a bit more and then open a PR.
