OSH Word Evaluation Algorithm
This page documents a portion of the OSH implementation: word evaluation, where OSH differs significantly from other shells.
- Other shells tend to use a homogeneous tree with various flags (e.g. `nosplit`, `assignment`, etc.).
- OSH uses a typed, heterogeneous tree (now statically checked with MyPy).
For example, `word_part = LiteralPart(...) | BracedVarSub(...) | CommandSub(...) | ...`
https://github.com/oilshell/oil/blob/master/frontend/syntax.asdl#L107
(Specifying ML-like data structures with ASDL was an implementation style borrowed from CPython itself: see posts tagged #ASDL)
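To make the contrast with flag-based trees concrete, here is a minimal Python sketch of evaluating a heterogeneous word tree by dispatching on node type. The classes reuse the names `LiteralPart`, `BracedVarSub`, and `CommandSub`, but they are simplified, hypothetical stand-ins for the ASDL-generated types, and the `completion` flag is an assumed illustration of how an evaluator could refuse to start processes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical, simplified stand-ins for the ASDL word_part variants.
@dataclass
class LiteralPart:
    text: str

@dataclass
class BracedVarSub:
    name: str
    default: Optional[Word] = None  # models ${name:-default}

@dataclass
class CommandSub:
    code: str  # a real tree would hold a parsed command node

WordPart = Union[LiteralPart, BracedVarSub, CommandSub]

@dataclass
class Word:
    parts: list[WordPart]

class CompletionError(Exception):
    """Raised when evaluation would start a process during completion."""

def eval_part(part: WordPart, env: dict[str, str], completion: bool = False) -> str:
    # Dispatch on the node's type instead of inspecting flags on a generic node.
    if isinstance(part, LiteralPart):
        return part.text
    if isinstance(part, BracedVarSub):
        value = env.get(part.name, "")
        # ${name:-default} uses the default when name is unset or empty.
        if value == "" and part.default is not None:
            return eval_word(part.default, env, completion)
        return value
    if isinstance(part, CommandSub):
        if completion:
            # Hypothetical restricted mode: never run processes on TAB.
            raise CompletionError(f"refusing to run $({part.code})")
        raise NotImplementedError("running commands is outside this sketch")
    raise TypeError(f"unknown word part: {part!r}")

def eval_word(word: Word, env: dict[str, str], completion: bool = False) -> str:
    return "".join(eval_part(p, env, completion) for p in word.parts)

w = Word([LiteralPart("x="), BracedVarSub("x", Word([LiteralPart("d")]))])
print(eval_word(w, {}))          # x=d
print(eval_word(w, {"x": "1"}))  # x=1
```

Because each variant carries only the fields it needs, a static checker such as MyPy can narrow `part` inside each `isinstance` branch; there is no `nosplit`-style flag that every consumer has to remember to check.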
- As much parsing as possible is done in a single pass, with lexer modes.
- There are some subsequent tweaks for detecting assignments, tildes, etc.
- There is a "metaprogramming" pass for brace expansion: `i=0; {$((i++)),x,y}`
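Brace expansion as a "metaprogramming" pass can be pictured as a transformation on syntax that runs before any evaluation. In this hypothetical sketch, `BracedAlt` and `ArithIncr` are invented node types (the latter stands in for `$((i++))`), and plain strings are literals. Expansion copies nodes into each resulting word, so a side effect runs once per copy that survives expansion.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class ArithIncr:
    """Hypothetical node standing in for $((name++))."""
    name: str

@dataclass
class BracedAlt:
    """Hypothetical node for {alt1,alt2,...}; each alternative is a list of parts."""
    alts: list

def brace_expand(parts: list) -> list:
    """Expand one word (a list of parts) into several words, before evaluation."""
    choices = []
    for part in parts:
        if isinstance(part, BracedAlt):
            # Alternatives may contain nested braces, so expand them recursively.
            choices.append([w for alt in part.alts for w in brace_expand(alt)])
        else:
            choices.append([[part]])  # exactly one choice: the part itself
    # Every combination of choices becomes a separate word.
    return [[p for seg in combo for p in seg] for combo in product(*choices)]

def evaluate(parts: list, env: dict) -> str:
    """Evaluate one expanded word; ArithIncr has a post-increment side effect."""
    out = []
    for part in parts:
        if isinstance(part, ArithIncr):
            value = env.get(part.name, 0)
            env[part.name] = value + 1
            out.append(str(value))
        else:
            out.append(part)  # plain strings are literals
    return "".join(out)

# {$((i++)),x,y}: the arithmetic node ends up in only one word.
env = {"i": 0}
words = brace_expand([BracedAlt([[ArithIncr("i")], ["x"], ["y"]])])
print([evaluate(w, env) for w in words])  # ['0', 'x', 'y']

# {a,b}$((i++)): the arithmetic node is copied into both words.
env = {"i": 0}
words = brace_expand([BracedAlt([["a"], ["b"]]), ArithIncr("i")])
print([evaluate(w, env) for w in words])  # ['a0', 'b1']
```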
There are three stages (not four as in POSIX):
- Evaluation of the typed tree (using `osh/word_eval.py`).
  - There is a restricted variant of word evaluation for completion, e.g. so that arbitrary processes aren't run when you hit TAB.
- Splitting with IFS. This is specified with a state machine in `osh/split.py`. (I think OSH is unique in this regard too.)
  - Splitting involves the concept of "frames", to handle things like `x='a b'; y='c d'; echo $x"${@}"$y`. The last part of `$x` has to be joined with `argv[0]`, and `argv[n-1]` has to be joined with `$y`.
- Globbing.
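The IFS splitting stage can be expressed as a small state machine. The sketch below is an independent, simplified model of POSIX field splitting, not a port of `osh/split.py`, and its state names are made up: runs of IFS whitespace separate fields, each non-whitespace IFS character ends exactly one field (absorbing adjacent whitespace), and backslash escapes are ignored.

```python
from __future__ import annotations

IFS_WHITESPACE = " \t\n"

def ifs_split(s: str, ifs: str = " \t\n") -> list[str]:
    """Split s into fields with a small state machine (simplified POSIX rules)."""
    if ifs == "":
        return [s] if s else []  # an empty IFS disables splitting
    fields: list[str] = []
    cur: list[str] = []
    # States: "start", "word" (inside a field), "after_ws" (a field was just
    # ended by whitespace), "after_sep" (a non-whitespace IFS char was seen).
    state = "start"
    for c in s:
        if c in ifs and c in IFS_WHITESPACE:
            if state == "word":
                fields.append("".join(cur))
                cur = []
                state = "after_ws"
            # in every other state, IFS whitespace is skipped
        elif c in ifs:
            if state == "word":
                fields.append("".join(cur))
                cur = []
            elif state in ("start", "after_sep"):
                fields.append("")  # nothing between separators: empty field
            # from "after_ws", whitespace plus this char form one delimiter
            state = "after_sep"
        else:
            cur.append(c)
            state = "word"
    if state == "word":
        fields.append("".join(cur))
    return fields

print(ifs_split("  a  b "))      # ['a', 'b']
print(ifs_split("a,,b", ", "))   # ['a', '', 'b']
print(ifs_split("a , b", ", "))  # ['a', 'b']
```

Modeling the rules as explicit states makes the awkward cases, such as whitespace adjacent to a non-whitespace separator, visible as individual transitions rather than buried in ad hoc conditionals.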
There is no such thing as "quote removal" in OSH, any more than a Python or JavaScript interpreter has "quote removal". It's just evaluation.
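Evaluation with frames can be sketched without any quote-removal step: quoting only decides, at evaluation time, whether a part's value is split. In this hypothetical model, loosely corresponding to the "frames" described earlier, `Str` is an evaluated scalar part with a `quoted` flag and `Array` is the value of a quoted `"${@}"`. IFS is assumed to be the default whitespace, and globbing is left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Union

@dataclass
class Str:
    """Hypothetical evaluated scalar part; quoted parts are never split."""
    text: str
    quoted: bool

@dataclass
class Array:
    """Hypothetical value of a quoted "${@}": one field per item."""
    items: list[str]

Value = Union[Str, Array]
WS = " \t\n"  # assumes the default IFS

def eval_fields(values: list[Value]) -> list[str]:
    fields: list[str] = []
    can_join = False  # is the last field still open on its right edge?

    def emit(piece: str, first: bool) -> None:
        # Only the first piece of a part may be glued onto an open field.
        if first and can_join:
            fields[-1] += piece
        else:
            fields.append(piece)

    for v in values:
        if isinstance(v, Array):
            for i, item in enumerate(v.items):
                emit(item, i == 0)  # argv[0] joins the previous field
            if v.items:
                can_join = True  # argv[n-1] stays open for the next part
        elif v.quoted:
            emit(v.text, True)
            can_join = True
        else:
            pieces = [p for p in re.split(f"[{WS}]+", v.text) if p]
            if v.text and v.text[0] in WS:
                can_join = False  # leading IFS whitespace closes the field
            for i, p in enumerate(pieces):
                emit(p, i == 0)
            if pieces:
                can_join = v.text[-1] not in WS
    return fields

# x='a b'; y='c d'; set -- 1 2; echo $x"${@}"$y
print(eval_fields([Str("a b", False), Array(["1", "2"]), Str("c d", False)]))
# ['a', 'b1', '2c', 'd']
```

The quotes never survive into the values; they only selected which branch of `eval_fields` handles each part.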
Bug: Internally, splitting and globbing both use `\` to inhibit expansion. That is, `\*` is an escaped glob, and `\ ` is an escaped space (IFS character).

This causes problems when `IFS='\'`. I think I could choose a different character for OSH, maybe even the `NUL` byte.
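The escaping problem can be reproduced with a toy model in which the escape character is a parameter. `to_escaped`, `split_escaped`, and `split_parts` are hypothetical helpers, not OSH functions. Splitting is simplified to whitespace-style behavior (empty fields are dropped), and the NUL variant assumes values never contain a NUL byte.

```python
from __future__ import annotations

def to_escaped(parts: list[tuple[str, bool]], ifs: str, esc: str) -> str:
    """Concatenate evaluated (text, quoted) parts, marking literal chars with esc.

    Quoted text: IFS chars and esc itself are escaped, so they never split.
    Unquoted text: only esc itself is escaped, so IFS chars stay live.
    """
    out = []
    for text, quoted in parts:
        special = ifs + esc if quoted else esc
        out.append("".join(esc + c if c in special else c for c in text))
    return "".join(out)

def split_escaped(s: str, ifs: str, esc: str) -> list[str]:
    """Split on unescaped IFS chars and drop the escape markers."""
    fields: list[str] = []
    cur: list[str] = []
    i = 0
    while i < len(s):
        c = s[i]
        if c == esc and i + 1 < len(s):
            cur.append(s[i + 1])  # the escaped char is kept literally
            i += 2
            continue
        if c in ifs:
            if cur:
                fields.append("".join(cur))
                cur = []
        else:
            cur.append(c)
        i += 1
    if cur:
        fields.append("".join(cur))
    return fields

def split_parts(parts: list[tuple[str, bool]], ifs: str, esc: str) -> list[str]:
    return split_escaped(to_escaped(parts, ifs, esc), ifs, esc)

# Works when the escape char is not in IFS:
print(split_parts([("a b", False), ("c d", True)], " ", "\\"))  # ['a', 'bc d']
# Breaks when IFS='\': the backslash can't be both separator and escape.
print(split_parts([("a\\b", False)], "\\", "\\"))    # ['a\\b']
# Using NUL as the escape char avoids the clash (assumes no NUL in values).
print(split_parts([("a\\b", False)], "\\", "\x00"))  # ['a', 'b']
```

With `\` as both marker and separator, the backslash in the unquoted value `a\b` is either escaped (and so literal) or read as a marker; in neither case can it act as a separator, so the value is never split.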
OSH wants to treat all sublanguages uniformly. (Command, Word, Arith, and the non-POSIX bool `[[` are the main sublanguages.)
For some "dynamic" sublanguages like builtin flag syntax, we fall a bit short, but that could change in the future.
This matters for interactive completion, which wants to understand every sublanguage statically.
For example, note that you can have variable references in several sublanguages:
Static:

- `x=1` -- assignments are in the command language
- `[[ $x -`…
- `echo ${x:-${x:-$y}}` -- word language
- `echo $(( x + 1 ))` -- arithmetic language
Dynamic:

- `code='x=1'; readonly $code` -- the dynamic builtin language
- Other builtins that manage variables: `getopts`, `read`, `unset`
- `printf -v` in bash
- Variable references in `${!x}` in bash/ksh
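A static walk over a typed tree shows why the static cases are tractable for completion and the dynamic ones are not. The node types below are hypothetical, with a few per sublanguage, and `$y` is modeled as a plain `VarSub`. `var_refs` collects variable names without evaluating anything.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical node types: a few per sublanguage.

# Arithmetic language
@dataclass
class ArithVar:
    name: str

@dataclass
class ArithConst:
    value: int

@dataclass
class ArithBinary:
    left: Arith
    op: str
    right: Arith

Arith = Union[ArithVar, ArithConst, ArithBinary]

# Word language (a word is a list of parts)
@dataclass
class Literal:
    text: str

@dataclass
class VarSub:
    name: str
    default: list = field(default_factory=list)  # parts of ${name:-...}

@dataclass
class ArithSub:
    expr: Arith  # $(( ... )) embeds the arithmetic language in a word

# Command language
@dataclass
class Assign:
    name: str
    value: list  # word parts

@dataclass
class SimpleCommand:
    words: list  # each word is a list of parts

def var_refs(node) -> list[str]:
    """Collect variable names by walking the tree; nothing is evaluated."""
    if isinstance(node, list):
        return [name for child in node for name in var_refs(child)]
    if isinstance(node, ArithVar):
        return [node.name]
    if isinstance(node, VarSub):
        return [node.name] + var_refs(node.default)
    if isinstance(node, ArithBinary):
        return var_refs(node.left) + var_refs(node.right)
    if isinstance(node, ArithSub):
        return var_refs(node.expr)
    if isinstance(node, Assign):
        return [node.name] + var_refs(node.value)
    if isinstance(node, SimpleCommand):
        return var_refs(node.words)
    return []  # Literal and ArithConst reference no variables

# x=1
print(var_refs(Assign("x", [Literal("1")])))  # ['x']
# echo ${x:-${x:-$y}}
nested = VarSub("x", [VarSub("x", [VarSub("y")])])
print(var_refs(SimpleCommand([[Literal("echo")], [nested]])))  # ['x', 'x', 'y']
# echo $(( x + 1 ))
arith = ArithSub(ArithBinary(ArithVar("x"), "+", ArithConst(1)))
print(var_refs(SimpleCommand([[Literal("echo")], [arith]])))  # ['x']
# readonly $code: only `code` is visible statically
print(var_refs(SimpleCommand([[Literal("readonly")], [VarSub("code")]])))  # ['code']
```

For `readonly $code`, the name `x` exists only inside a runtime string, so a walk like this cannot find it without running code.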