Detecting errors in upstream processes via Stream Poisoning

jb can detect errors in upstream jb calls that are pulled into a downstream jb process, such as when several jb calls are fed into each other using process substitution.

$ # The jb call 3 levels deep reading the missing file ./not-found fails
$ jb club:json@<(
>   jb name="jb Users" members:json[]@<(
>     jb name=h4l; jb name@./not-found
>   )
> )
/.../bin/jb: line ...: ./not-found: No such file or directory
json(): Could not open the file './not-found' referenced as the value of argument 'name@./not-found'.
json.encode_json(): not all inputs are valid JSON: '{"name":"h4l"}' $'\030'
json(): Could not encode the value of argument 'members:json[]@/dev/fd/...' as an array with 'json' values. Read from file /dev/fd/..., split into chunks on $'\n', interpreted chunks with 'raw' format.
json.encode_json(): not all inputs are valid JSON: $'\030'
json(): Could not encode the value of argument 'club:json@/dev/fd/...' as a 'json' value. Read from file /dev/fd/..., up to the first 0x00 byte or end-of-file.
␘

jb detects the error in the external process using Stream Poisoning — a simple in-band protocol that propagates an error signal from upstream processes down to processes consuming their output.

The conventional pattern for a command-line program that needs to fail is to exit with a non-zero exit status, emit nothing on stdout, and print an error message on stderr.

The problem with this pattern is that a downstream program consuming the output of a program that fails upstream only has access to that program's stdout stream. The exit status of a process that fails in a subshell is not easily available. (There are caveats: bash, for example, provides the pipefail option, but it only lets you react to an upstream error after the pipeline has completed, and it doesn't help with nested process substitution.)
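The limitation described above can be reproduced in plain bash, without jb. This is a minimal sketch: the producer inside the process substitution simply exits with status 3, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# The producer inside <( ... ) fails with status 3 and writes nothing to stdout.
# The consumer (cat) just sees an empty stream and succeeds.
consumed=$(cat <(exit 3))
status=$?   # status of cat, not of the failed producer

echo "consumer status: $status, consumed: '${consumed}'"
```

The consumer has no way to tell an upstream failure apart from an upstream process that legitimately produced no output.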

jb takes the view that errors are part of its normal behaviour, so when it fails it communicates the error in its normal output on stdout. This lets jb propagate an error from its source, down through several intermediate programs, to the ultimate jb (or other JSON-processing) program, even though the programs are not aware of each other.

We call this pattern Stream Poisoning because it works by intentionally making the JSON output of jb invalid by injecting a Cancel control character (\x18 / \030 / ^X). A raw (unescaped) Cancel character cannot occur anywhere in a valid JSON document, so its presence poisons the JSON output by rendering it invalid. The Cancel character propagates from the failed jb program, down through any intermediate programs (even JSON-unaware programs handling text), until its presence causes the most-downstream JSON-consuming program to fail.
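The propagation can be sketched in plain bash. The example below does not use jb: `failing_producer` and `is_poisoned` are hypothetical helpers that stand in for a failing upstream program and a downstream validity check, and `cat` and `tr` play the role of JSON-unaware intermediate programs.

```shell
#!/usr/bin/env bash
CANCEL=$'\030'   # the Cancel control character (0x18)

# Hypothetical upstream program: reports its failure on stderr AND in-band on stdout.
failing_producer() {
  echo "failing_producer: something went wrong" >&2
  printf '%s\n' "$CANCEL"
  return 1
}

# JSON-unaware text tools pass the Cancel character through unchanged.
received=$(
  { echo '{"name":"h4l"}'; failing_producer 2>/dev/null; } | cat | tr 'a-z' 'A-Z'
)

# Hypothetical downstream check: any Cancel character means the stream is poisoned.
is_poisoned() { [[ $1 == *"$CANCEL"* ]]; }

if is_poisoned "$received"; then poisoned=yes; else poisoned=no; fi
echo "poisoned: $poisoned"   # prints "poisoned: yes"
```

A real JSON parser needs no such explicit check: it rejects the input because the Cancel character makes it invalid JSON.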

This is why you'll see the Unicode visual symbol for Cancel, ␘, after the error message when jb fails. Terminal programs typically don't display a Cancel character, so when jb outputs an error to an interactive terminal it prints a ␘ as well as the actual Cancel character, to hint that the output is not empty. When jb outputs to a non-interactive destination, like the input stream of another process or a file, it only emits the actual Cancel character.
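One way a program could make this distinction is to test whether stdout is a terminal. The sketch below is an assumption about the general technique, not jb's actual implementation: `emit_poison` is a hypothetical function, and the trailing newline and the order of the two characters are guesses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the poison marker on stdout.
emit_poison() {
  if [[ -t 1 ]]; then
    # Interactive terminal: add the visible ␘ symbol as a hint to the user.
    printf '\030␘\n'
  else
    # Pipe or file: emit only the real Cancel character.
    printf '\030\n'
  fi
}

# Command substitution is a non-interactive destination, so only Cancel is captured.
captured=$(emit_poison)
printf 'captured: %q\n' "$captured"
```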

$ # \030 is the octal escape for Cancel (0x18 / decimal 24)
$ jb @error | { read jbout; echo "jb stdout: ${jbout@Q}"; }
json(): Could not process argument '@error'. Its value references unbound variable $error. (Use the '~' flag after the :type to treat a missing value as empty.)
jb stdout: $'\030'

You might wonder why this is necessary, considering an empty string is not valid JSON either. The trouble with that is that when we combine the outputs of multiple programs in the shell, it's easy for the absence of output from a failed program to go unnoticed. For example:

$ # Everything OK here — 3 things
$ jb important_things:json[]@<(
>   echo '{"name":"Thing #1"}';
>   echo '{"name":"Thing #2"}';
>   echo '{"name":"Thing #3"}';
> )
{"important_things":[{"name":"Thing #1"},{"name":"Thing #2"},{"name":"Thing #3"}]}

$ # What if the process creating Thing #2 fails with an error? We silently lose it.
$ jb important_things:json[]@<(
>   echo '{"name":"Thing #1"}';
>   false && echo '{"name":"Thing #2"}';  # ← 💥
>   echo '{"name":"Thing #3"}';
> )
{"important_things":[{"name":"Thing #1"},{"name":"Thing #3"}]}
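For contrast, here is a minimal sketch of the same situation where the failing producer emits a Cancel character instead of nothing. It uses plain bash rather than jb; `make_thing_2` is a hypothetical stand-in for the failing process, and the loop stands in for a downstream consumer that validates each chunk.

```shell
#!/usr/bin/env bash
CANCEL=$'\030'

# Hypothetical producer for Thing #2: on failure it poisons its output
# rather than staying silent.
make_thing_2() {
  false && echo '{"name":"Thing #2"}' || printf '%s\n' "$CANCEL"
}

total=0
poisoned=0
while IFS= read -r chunk; do
  total=$((total + 1))
  if [[ $chunk == *"$CANCEL"* ]]; then
    poisoned=$((poisoned + 1))
  fi
done < <(
  echo '{"name":"Thing #1"}'
  make_thing_2
  echo '{"name":"Thing #3"}'
)

echo "chunks: $total, poisoned: $poisoned"   # prints "chunks: 3, poisoned: 1"
```

The failed producer still occupies a slot in the stream, so the consumer sees three chunks and can reject the invalid one instead of quietly building a two-element array.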