Rework handling of assignment words #176
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request reworks the way assignment words are handled in Morbig. It is to be noted that Morbig was correctly promoting parsing tokens from WORD to ASSIGNMENT_WORD and thus yielding reasonable CSTs. What this PR fixes is only the processing of assignment words in word CSTs only.
The most important change is the disappearance of the
WordAssignmentWord
inword_cst
. I think this constructor made very little sense in itself because assignment words were already handled in the CST asCmdPrefix_AssignmentWord
constructors carrying a name and a word. There were probably two reasons why this word CST constructor existed:Because it might be practical for client libraries (eg. static analysers) to already have access to assignment words in other places (eg. as arguments of
alias
ormake
). I think this is a bad argument: in the semantics of Shell, those words are not assignment words, it is only the utilities that decide to see them as such, and they should do so in the way that is practical for them. We can offer a helper to do that nicely in word CSTs.As an intermediary state to store words that could be assignment words before reaching the proper assignment words recognition. I think that makes sense but (a) it clutters the output type with an extra useless constructor, (b) Morbig actually leaves quite a lot of those in its output, and (c) it breaks tilde prefixes recognition that was based on the presence of such constructors.
I think it is better to get rid of it altogether. This PR therefore removes the
WordAssignmentWord
constructor ofword_cst
, rewritesAssignment.recognize_assignment
to not rely on it, and gets rid ofPrelexerState.recognize_assignment
as well.Additionally, I took this opportunity to fix the tilde prefixes recognition mechanism. The main problem was that tilde prefixes recognition changes depending on whether we are in an assignment word, in which case it splits the word on colon characters and recognises tilde prefixes in all the components. This was being overzealous. This recognition is now made in two steps: during prelexing, only standard tilde prefixes (at the beginning of words) are recognised. During parsing, when yielding a
CmdPrefix_AssignmentWords
, tilde prefixes are recognised again, this time fully.I updated the expectation files and checked them one by one. I am happy with the result. A lot of them actually have to do with tests on alias and are not very interesting when it comes to parsing assignment words. There are two main interesting test files:
good/2.6-word-expansions/2.6.1-tilde-prefix/login.t
is the one discussed in Strip tilde prefixes fromWordTildePrefix
#164 (comment) which led to writing Should assignment words be recognised on the right-hand side of a command name? #165. After this change, the last lines:are correctly handled as containing no tilde prefixes.
tests/golden/good/2.9-shell-commands/2.9.1-simple-commands/assignment-words.t/input.sh
, funnily enough, explains the subtelty handled in this PR:However, Morbig, until now, was precisely recognising
CC=cc
as an assignment words. Not anymore.@yurug WDYT?