
OSH Word Evaluation Algorithm

andychu edited this page Jul 8, 2019 · 30 revisions

This page documents a portion of the OSH implementation. OSH differs significantly from other shells in how it represents words:

  • They tend to use a homogeneous tree with various flags (e.g. nosplit, assignment, etc.).
  • OSH uses a typed, heterogeneous tree (now statically checked with MyPy).

For example, word_part = LiteralPart(...) | BracedVarSub(...) | CommandSub(...) | ...

https://github.com/oilshell/oil/blob/master/frontend/syntax.asdl#L107

(Specifying ML-like data structures with ASDL was an implementation style borrowed from CPython itself: see posts tagged #ASDL)
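To illustrate what a typed, heterogeneous tree looks like in practice, here is a minimal sketch in Python. It is not the OSH code: the class names follow the word_part example above, but the fields (text, name, command) and the eval_part / eval_word helpers are assumptions made up for this illustration. The real definitions live in the ASDL schema linked above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical, simplified variants; the field names are assumptions.
@dataclass
class LiteralPart:
    text: str


@dataclass
class BracedVarSub:
    name: str


@dataclass
class CommandSub:
    command: str


# A sum type: each word part is exactly one of these variants.
WordPart = Union[LiteralPart, BracedVarSub, CommandSub]


def eval_part(part: WordPart, env: Dict[str, str]) -> str:
    """Dispatch on the variant type rather than on flags."""
    if isinstance(part, LiteralPart):
        return part.text
    if isinstance(part, BracedVarSub):
        # Unset variables evaluate to the empty string in this sketch.
        return env.get(part.name, "")
    if isinstance(part, CommandSub):
        raise NotImplementedError("command substitution is not modeled here")
    raise AssertionError("unknown word part: %r" % (part,))


def eval_word(parts: List[WordPart], env: Dict[str, str]) -> str:
    """Evaluate each part and concatenate (no splitting or globbing)."""
    return "".join(eval_part(p, env) for p in parts)
```

Because each variant is its own type, a checker like MyPy can flag code that reads a field a variant does not have, which is harder to get from a homogeneous node with flags.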

Preliminaries

  1. As much parsing as possible is done in a single pass, with lexer modes.
  2. There are some subsequent tweaks for detecting assignments, tildes, etc.
  3. There is a "metaprogramming" pass for brace expansion: i=0; {$((i++)),x,y}
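One way to read "metaprogramming" in step 3 is that brace expansion rewrites the syntax of a word into several words before anything is evaluated. The sketch below is a toy that works on strings rather than on the OSH tree, and it handles only a single, non-nested level of braces. The brace_expand name and the regex are assumptions for illustration.

```python
import re
from typing import List

# Matches one brace group containing at least one comma and no nested braces.
_BRACE = re.compile(r"\{([^{}]*,[^{}]*)\}")


def brace_expand(word: str) -> List[str]:
    """Expand non-nested {a,b,c} groups; leave everything else untouched."""
    m = _BRACE.search(word)
    if m is None:
        return [word]
    prefix, suffix = word[:m.start()], word[m.end():]
    out: List[str] = []
    for alt in m.group(1).split(","):
        # Recurse to expand any later groups in the suffix.
        out.extend(brace_expand(prefix + alt + suffix))
    return out
```

Under this model, {$((i++)),x,y} becomes three words, and the first one still contains the unevaluated text $((i++)). Arithmetic happens later, during word evaluation.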

Word Evaluation Algorithm

There are three stages (not four as in POSIX):

  1. Evaluation of the typed tree. (using osh/word_eval.py)
    • There is a restricted variant of word evaluation for completion, e.g. so arbitrary processes aren't run when you hit TAB.
  2. Splitting with IFS. This is specified with a state machine in osh/split.py. (I think OSH is unique in this regard too.)
    • Splitting involves the concept of "frames", to handle things like x='a b'; y='c d'; echo $x"${@}"$y. The last part of $x has to be joined with argv[0], and argv[n-1] has to be joined with $y.
  3. Globbing.
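The joining rule in the "frames" example can be sketched as follows. This is a simplified model, not the data structure in OSH: each word part is assumed to have already been evaluated and split into a list of fragments, and join_frames is a hypothetical helper that glues the last fragment of one part to the first fragment of the next.

```python
from typing import List


def join_frames(parts: List[List[str]]) -> List[str]:
    """Join adjacent parts of one word at their boundaries.

    Each inner list holds the fragments a single word part produced,
    e.g. $x with x='a b' gives ['a', 'b'], and "${@}" gives one
    fragment per argument.
    """
    result: List[str] = []
    for frags in parts:
        if not frags:
            # A part that produced nothing (e.g. "$@" with no args).
            continue
        if result:
            # Glue the boundary: last fragment so far + first new fragment.
            result[-1] += frags[0]
            result.extend(frags[1:])
        else:
            result.extend(frags)
    return result
```

For x='a b', y='c d', and two arguments 1 and 2, join_frames([['a', 'b'], ['1', '2'], ['c', 'd']]) gives ['a', 'b1', '2c', 'd'] in this model: the last part of $x is joined with the first argument, and the last argument is joined with the first part of $y.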

There is no such thing as "quote removal" in OSH, any more than a Python or JavaScript interpreter has "quote removal". It's just evaluation.

Bug: Internally, splitting and globbing both use \ to inhibit expansion. That is, \* is an escaped glob. And "\ " (a backslash followed by a space) is an escaped space (IFS character).

This causes problems when IFS='\'. I think I could choose a different character for OSH, maybe even the NUL byte.
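The collision can be reproduced with a toy splitter. This is a sketch under simplifying assumptions, not the state machine in osh/split.py: split_with_escapes is a hypothetical function, every IFS character is treated as a hard delimiter (no whitespace collapsing), and the escape character is a parameter so the proposed NUL alternative can be tried.

```python
from typing import List


def split_with_escapes(s: str, ifs: str, escape: str = "\\") -> List[str]:
    """Split s on any character in ifs, honoring a one-character escape."""
    fields: List[str] = []
    cur: List[str] = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == escape and i + 1 < len(s):
            # The escaped character is taken literally, never as a delimiter.
            cur.append(s[i + 1])
            i += 2
        elif ch in ifs:
            fields.append("".join(cur))
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields
```

With IFS set to a backslash, the string a\b should split into two fields, but the toy splitter treats the backslash as an escape and returns a single field 'ab'. Passing escape="\0" makes the backslash an ordinary delimiter again and yields ['a', 'b'], which is the effect the NUL byte idea is after.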
