Fix bbmain to correctly handle #! files
This is needed for Plan 9.

Signed-off-by: Ronald G. Minnich <[email protected]>
rminnich committed Aug 24, 2021
1 parent adc854e commit 95851a7
Showing 2 changed files with 88 additions and 1 deletion.
87 changes: 87 additions & 0 deletions src/pkg/bb/bbmain/cmd/main.go
@@ -9,6 +9,7 @@ import (
	"log"
	"os"
	"path/filepath"
	"strings"

	"github.com/u-root/gobusybox/src/pkg/bb/bbmain"
	// There MUST NOT be any other dependencies here.
@@ -79,7 +80,93 @@ func main() {
	run()
}

// u-root was originally built around the use of symlinks, but not all systems
// have symlinks. This only recently became an issue with the Plan 9 port.
//
// One way to get around this lack, inefficiently, is to make each of the symlinks
// a small shell script, e.g., on Plan 9, one might have, in /bbin/ls,
// #!/bin/rc
// bb ls
// This leaves a lot to be desired: it puts the execution of a shell in front
// of each u-root command, and it requires the existence of that shell on the
// system.
//
// The goal is that a single u-root file lead to running the u-root busybox
// with no intermediate programs running.
//
// It is worth taking a look at what a symlink is, how it works in operation,
// and how we might achieve the same goal some other way.
//
// A symlink is a plain file containing 0 or more bytes of text (or UTF-8, depending)
// with an attribute that causes the kernel to give it special treatment.
// It is not available on all file systems.
//
// [Note: they were invented in 1965 for Multics].
// The symlink is itself still controversial, though widely used.
//
// Consider the process of traversing a symlink: it involves the equivalent
// of stat, open, read, evaluate contents, use that as a file name, repeat as needed.
//
// It is possible to get that same effect, with the same overheads, by using #!
// files but specifying bb as the interpreter.
//
// ls would then be:
// #!/bin/bb ls
//
// Note that the absolute path is required, else Linux will throw an error as bb
// is not in the list of allowed interpreters.
// The /bin/bb path is not an issue on Plan 9, since users construct their name space
// on startup and binding /bbin into /bin is no problem.
//
// In this case the kernel will stat, open, and read the file, find the executable name,
// and start it. This approach has as low overhead as the symlink approach.
//
// One problem remains: Unix and Plan 9 evaluate arguments in a #! file differently,
// and, further, invoke the argument in a different way.
// Given the file shown above, bb on Plan 9 gets the arguments:
// [ls ls /tmp/ls]
// With the same file, bb on Linux gets this:
// [/bbin/bb ls /tmp/ls]
// But wait! There's more!
// On Plan 9, the arguments following the interpreter are tokenized (split on space)
// and on Linux, they are not.
//
// This leads to a few conclusions:
// - We can get around lack of symlinks by using #! (sh-bang) files with an absolute path to
// bb as the interpreter, e.g. #!/abs/path/to/bb argument.
// This achieves the "exec once" goal.
// - We can specify which u-root tool to use via arguments to bb in the #! file.
// - The argument to the interpreter (/bbin/bb) should be one token (e.g. ls) because of different
// behavior in different systems (some tokenize, some do not).
// - Because of the differences in how arguments are presented to #! on different kernels,
// there should be a reasonably unique marker so that bb can have confidence that
// it is running as an interpreter.
//
// The conclusions lead to the following design:
// #! files for bb specify their argument with #!. E.g., the file for ls looks like this:
// #!/bbin/bb #!ls
// On Linux, the args to bb then look like:
// [/bbin/bb #!ls /tmp/ls ...]
// on Plan 9:
// [ls #!ls /tmp/ls ...]
// The code needs to change the arguments to look like an exec:
// [/tmp/ls ...]
// In each case, the second arg begins with a #!, which is extremely unlikely to appear
// in any other context (save testing #! files, of course).
// The result is that the kernel, given a path to a u-root #! file, will read that file,
// then exec bb with the argument from the #! and any additional arguments from the exec.
// The overhead in this case is no more than the symlink overhead.
// A final advantage is that we can now install u-root on file systems that don't have
// symbolic links, e.g. VFAT, and it will have low overhead.
//
// So, dear reader, if you are wondering why the little bit of code below is the way
// it is, now you know.
func init() {
	// If this has been run from a #! file, it will have at least
	// 3 args, and os.Args needs to be reconstructed.
	if len(os.Args) > 2 && strings.HasPrefix(os.Args[1], "#!") {
		os.Args = os.Args[2:]
	}
	m := func() {
		if len(os.Args) == 1 {
			log.Fatalf("Invalid busybox command: %q", os.Args)
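
Note: the argument rewriting that init performs can be tried out in isolation. The following is a minimal sketch, not code from this commit. It assumes a hypothetical helper, normalizeArgs, that applies the same length and "#!" prefix check to an arbitrary slice instead of os.Args, and feeds it the Linux and Plan 9 argument vectors described in the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeArgs is a hypothetical helper (not part of the commit). It mirrors
// the check in init: when the second argument starts with "#!", the first two
// arguments are dropped so the slice looks like a direct exec of the #! file.
func normalizeArgs(args []string) []string {
	if len(args) > 2 && strings.HasPrefix(args[1], "#!") {
		return args[2:]
	}
	return args
}

func main() {
	// Argument vectors as described in the commit's comment for the
	// file "#!/bbin/bb #!ls" stored at /tmp/ls.
	linux := []string{"/bbin/bb", "#!ls", "/tmp/ls", "-l"}
	plan9 := []string{"ls", "#!ls", "/tmp/ls", "-l"}
	// A direct invocation with no #! marker is left untouched.
	direct := []string{"/bbin/bb", "ls", "-l"}

	fmt.Println(normalizeArgs(linux))
	fmt.Println(normalizeArgs(plan9))
	fmt.Println(normalizeArgs(direct))
}
```

Both #! cases reduce to the same slice, whose first element is the path of the #! file, so the existing code that takes the command name from the base of os.Args[0] needs no platform-specific branch.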
2 changes: 1 addition & 1 deletion src/pkg/bb/bbmain_src.go
@@ -1,3 +1,3 @@
package bb

var bbMainSource = []byte("// Copyright 2018 the u-root Authors. All rights reserved\n// Use of this source code is governed by a BSD-style\n// license that can be found in the LICENSE file.\n\n// Package main is the busybox main.go template.\npackage main\n\nimport (\n\t\"log\"\n\t\"os\"\n\t\"path/filepath\"\n\n\t\"github.com/u-root/gobusybox/src/pkg/bb/bbmain\"\n\t// There MUST NOT be any other dependencies here.\n\t//\n\t// It is preferred to copy minimal code necessary into this file, as\n\t// dependency management for this main file is... hard.\n)\n\n// AbsSymlink returns an absolute path for the link from a file to a target.\nfunc AbsSymlink(originalFile, target string) string {\n\tif !filepath.IsAbs(originalFile) {\n\t\tvar err error\n\t\toriginalFile, err = filepath.Abs(originalFile)\n\t\tif err != nil {\n\t\t\t// This should not happen on Unix systems, or you're\n\t\t\t// already royally screwed.\n\t\t\tlog.Fatalf(\"could not determine absolute path for %v: %v\", originalFile, err)\n\t\t}\n\t}\n\t// Relative symlinks are resolved relative to the original file's\n\t// parent directory.\n\t//\n\t// E.g. 
/bin/defaultsh -> ../bbin/elvish\n\tif !filepath.IsAbs(target) {\n\t\treturn filepath.Join(filepath.Dir(originalFile), target)\n\t}\n\treturn target\n}\n\n// IsTargetSymlink returns true if a target of a symlink is also a symlink.\nfunc IsTargetSymlink(originalFile, target string) bool {\n\ts, err := os.Lstat(AbsSymlink(originalFile, target))\n\tif err != nil {\n\t\treturn false\n\t}\n\treturn (s.Mode() & os.ModeSymlink) == os.ModeSymlink\n}\n\n// ResolveUntilLastSymlink resolves until the last symlink.\n//\n// This is needed when we have a chain of symlinks and want the last\n// symlink, not the file pointed to (which is why we don't use\n// filepath.EvalSymlinks)\n//\n// I.e.\n//\n// /foo/bar -> ../baz/foo\n// /baz/foo -> bla\n//\n// ResolveUntilLastSymlink(/foo/bar) returns /baz/foo.\nfunc ResolveUntilLastSymlink(p string) string {\n\tfor target, err := os.Readlink(p); err == nil && IsTargetSymlink(p, target); target, err = os.Readlink(p) {\n\t\tp = AbsSymlink(p, target)\n\t}\n\treturn p\n}\n\nfunc run() {\n\tname := filepath.Base(os.Args[0])\n\tif err := bbmain.Run(name); err != nil {\n\t\tlog.Fatalf(\"%s: %v\", name, err)\n\t}\n}\n\nfunc main() {\n\tos.Args[0] = ResolveUntilLastSymlink(os.Args[0])\n\n\trun()\n}\n\nfunc init() {\n\tm := func() {\n\t\tif len(os.Args) == 1 {\n\t\t\tlog.Fatalf(\"Invalid busybox command: %q\", os.Args)\n\t\t}\n\t\t// Use argv[1] as the name.\n\t\tos.Args = os.Args[1:]\n\t\trun()\n\t}\n\tbbmain.Register(\"bbdiagnose\", bbmain.Noop, bbmain.ListCmds)\n\tbbmain.RegisterDefault(bbmain.Noop, m)\n}\n")
var bbMainSource = []byte("// Copyright 2018 the u-root Authors. All rights reserved\n// Use of this source code is governed by a BSD-style\n// license that can be found in the LICENSE file.\n\n// Package main is the busybox main.go template.\npackage main\n\nimport (\n\t\"log\"\n\t\"os\"\n\t\"path/filepath\"\n\t\"strings\"\n\n\t\"github.com/u-root/gobusybox/src/pkg/bb/bbmain\"\n\t// There MUST NOT be any other dependencies here.\n\t//\n\t// It is preferred to copy minimal code necessary into this file, as\n\t// dependency management for this main file is... hard.\n)\n\n// AbsSymlink returns an absolute path for the link from a file to a target.\nfunc AbsSymlink(originalFile, target string) string {\n\tif !filepath.IsAbs(originalFile) {\n\t\tvar err error\n\t\toriginalFile, err = filepath.Abs(originalFile)\n\t\tif err != nil {\n\t\t\t// This should not happen on Unix systems, or you're\n\t\t\t// already royally screwed.\n\t\t\tlog.Fatalf(\"could not determine absolute path for %v: %v\", originalFile, err)\n\t\t}\n\t}\n\t// Relative symlinks are resolved relative to the original file's\n\t// parent directory.\n\t//\n\t// E.g. 
/bin/defaultsh -> ../bbin/elvish\n\tif !filepath.IsAbs(target) {\n\t\treturn filepath.Join(filepath.Dir(originalFile), target)\n\t}\n\treturn target\n}\n\n// IsTargetSymlink returns true if a target of a symlink is also a symlink.\nfunc IsTargetSymlink(originalFile, target string) bool {\n\ts, err := os.Lstat(AbsSymlink(originalFile, target))\n\tif err != nil {\n\t\treturn false\n\t}\n\treturn (s.Mode() & os.ModeSymlink) == os.ModeSymlink\n}\n\n// ResolveUntilLastSymlink resolves until the last symlink.\n//\n// This is needed when we have a chain of symlinks and want the last\n// symlink, not the file pointed to (which is why we don't use\n// filepath.EvalSymlinks)\n//\n// I.e.\n//\n// /foo/bar -> ../baz/foo\n// /baz/foo -> bla\n//\n// ResolveUntilLastSymlink(/foo/bar) returns /baz/foo.\nfunc ResolveUntilLastSymlink(p string) string {\n\tfor target, err := os.Readlink(p); err == nil && IsTargetSymlink(p, target); target, err = os.Readlink(p) {\n\t\tp = AbsSymlink(p, target)\n\t}\n\treturn p\n}\n\nfunc run() {\n\tname := filepath.Base(os.Args[0])\n\tif err := bbmain.Run(name); err != nil {\n\t\tlog.Fatalf(\"%s: %v\", name, err)\n\t}\n}\n\nfunc main() {\n\tos.Args[0] = ResolveUntilLastSymlink(os.Args[0])\n\n\trun()\n}\n\n// u-root was originally built around the use of symlinks, but not all systems\n// have symlinks. 
This only recently became an issue with the Plan 9 port.\n//\n// One way to get around this lack, inefficiently, is to make each of the symlinks\n// a small shell script, e.g., on Plan 9, one might have, in /bbin/ls,\n// #!/bin/rc\n// bb ls\n// This leaves a lot to be desired: it puts the execution of a shell in front\n// of each u-root command, and it requires the existence of that shell on the\n// system.\n//\n// The goal is that a single u-root file lead to running the u-root busybox\n// with no intermediate programs running.\n//\n// It is worth taking a look at what a symlink is, how it works in operation,\n// and how we might achieve the same goal some other way.\n//\n// A symlink is plain file, containing 0 or more bytes of text (or utf-8, depending)\n// with an attribute that causes the kernel to give it special treatment.\n// It is not available on all file systems.\n//\n// [Note: they were invented in 1965 for Multics].\n// The symlink is itself still controversial, though widely used.\n//\n// Consider the process of traversing a symlink: it involves the equivalent\n// of stat, open, read, evaluate contents, use that as a file name, repeat as needed.\n//\n// It is possible to get that same effect, with the same overheads, by using #!\n// files but specifying bb as the interpreter.\n//\n// ls would then be:\n// #!/bin/bb ls\n//\n// Note that the absolute path is required, else Linux will throw an error as bb\n// is not in the list of allowed interpreters.\n// The /bin/bb path is not an issue on Plan 9, since users construct their name space\n// on startup and binding /bbin into /bin is no problem.\n//\n// In this case the kernel will stat, open, and read the file, find the executable name,\n// and start it. This approach has as low overhead as the symlink approach.\n//\n// One problem remains: Unix and Plan 9 evaluate arguments in a #! 
file differently,\n// and, further, invoke the argument in a different way.\n// Given the file shown above, bb on Plan9 gets the arguments:\n// [ls ls /tmp/ls]\n// With the same file, bb on Linux gets this:\n// [/bbin/bb ls /tmp/ls]\n// But wait! There's more!\n// On Plan 9, the arguments following the interpreter are tokenized (split on space)\n// and on Linux, they are not.\n//\n// This leads to a few conclusions:\n// - We can get around lack of symlinks by using #! (sh-bang) files with an absolute path to\n// bb as the interpreter, e.g. #!/abs/path/to/bb argument.\n// This achieves the \"exec once\" goal.\n// - We can specify which u-root tool to use via arguments to bb in the #! file.\n// - The argument to the interpreter (/bbin/bb) should be one token (e.g. ls) because of different\n// behavior in different systems (some tokenize, some do not).\n// - Because of the differences in how arguments are presented to #! on different kernels,\n// there should be a reasonably unique marker so that bb can have confidence that\n// it is running as an interpreter.\n//\n// The conclusions lead to the following design:\n// #! files for bb specify their argument with #!. E.g., the file for ls looks like this:\n// #!/bbin/bb #!ls\n// On Linux, the args to bb then look like:\n// [/bbin/bb #!ls /tmp/ls ...]\n// on Plan 9:\n// [ls #!ls /tmp/ls ...]\n// The code needs to change the arguments to look like an exec:\n// [/tmp/ls ...]\n// In each case, the second arg begins with a #!, which is extremely unlikely to appear\n// in any other context (save testing #! files, of course).\n// The result is that the kernel, given a path to a u-root #! file, will read that file,\n// then exec bbin with the argument from the #! and any additional arguments from the exec.\n// The overhead in this case is no more than the symlink overhead.\n// A final advantage is that we can now install u-root on file systems that don't have\n// symbolic links, e.g. 
VFAT, and it will have low overhead.\n//\n// So, dear reader, if you are wondering why the little bit of code below is the way\n// it is, now you know.\nfunc init() {\n\t// If this has been run from a #! file, it will have at least\n\t// 3 args, and os.Args needs to be reconstructed.\n\tif len(os.Args) > 2 && strings.HasPrefix(os.Args[1], \"#!\") {\n\t\tos.Args = os.Args[2:]\n\t}\n\tm := func() {\n\t\tif len(os.Args) == 1 {\n\t\t\tlog.Fatalf(\"Invalid busybox command: %q\", os.Args)\n\t\t}\n\t\t// Use argv[1] as the name.\n\t\tos.Args = os.Args[1:]\n\t\trun()\n\t}\n\tbbmain.Register(\"bbdiagnose\", bbmain.Noop, bbmain.ListCmds)\n\tbbmain.RegisterDefault(bbmain.Noop, m)\n}\n")
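
Note: the commit changes how bb interprets its arguments but does not show how the #! files themselves are created. The following is a minimal sketch of one way an installer might write them, under stated assumptions: shebangLine and writeShebang are hypothetical names, the /bbin/bb interpreter path is taken from the example in the commit's comment, and the files are written to a temporary directory rather than a real /bbin.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// shebangLine returns the one-line contents of a #! file for cmd, in the
// "#!<bb path> #!<cmd>" form described in the commit's comment.
// Hypothetical helper, not part of the commit.
func shebangLine(bbPath, cmd string) string {
	return fmt.Sprintf("#!%s #!%s\n", bbPath, cmd)
}

// writeShebang writes dir/cmd as an executable #! file.
// Hypothetical helper, not part of the commit.
func writeShebang(dir, bbPath, cmd string) error {
	path := filepath.Join(dir, cmd)
	return os.WriteFile(path, []byte(shebangLine(bbPath, cmd)), 0o755)
}

// demo writes #! files for two commands into a temporary directory and
// prints the contents of one of them.
func demo() error {
	dir, err := os.MkdirTemp("", "bbin")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	for _, cmd := range []string{"ls", "cat"} {
		if err := writeShebang(dir, "/bbin/bb", cmd); err != nil {
			return err
		}
	}
	b, err := os.ReadFile(filepath.Join(dir, "ls"))
	if err != nil {
		return err
	}
	fmt.Print(string(b))
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the argument to a single token (#!ls) follows the commit's conclusion that some kernels tokenize the text after the interpreter path and others do not.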
