Terminal::ANSIParser - ANSI/VT stream parser
use Terminal::ANSIParser;
my @parsed;
# Default: Assume UTF-8 decoded inputs, thus working with codepoints
my &parse-codepoint := make-ansi-parser(emit-item => { @parsed.push: $_ });
parse-codepoint($_) for $input-buffer.list;
# Raw bytes: Assume raw byte stream input, for pre-Unicode terminal emulation
my &parse-byte := make-ansi-parser(emit-item => { @parsed.push: $_ },
:raw-bytes);
parse-byte($_) for $input-buffer.list;
# DEC Pedantic: Ignore xterm extensions, forcing pure DEC VT compatibility
# (implies :raw-bytes as well)
my &parse-pedantic := make-ansi-parser(emit-item => { @parsed.push: $_ },
:dec-pedantic);
parse-pedantic($_) for $input-buffer.list;
Terminal::ANSIParser
is a general parser for ANSI/VT escape codes, as defined by the related specs ANSI X3.64, ECMA-48, ISO/IEC 6429, and of course the actual physical DEC VT terminals generally considered the standard for escape code behavior.
The basic make-ansi-parser()
routine builds and returns a codepoint-by-codepoint (or byte-by-byte, if the :raw-bytes
option is True) table-based binary parsing state machine, based on (and extended from) the error-recovering state machine built from observed DEC VT behavior described at https://vt100.net/emu/dec_ansi_parser.
Each time the parser determines that it has parsed enough input, it emits a token representing the parsed data, which can take one of the following forms:
-
A plain codepoint (or byte if
:raw-bytes
is True), for passed through data when no escape sequence is active -
A
Terminal::ANSIParser::Sequence
object, if an escape sequence is parsed -
A
Terminal::ANSIParser::String
object, if a control string is parsed
A few Sequence
subclasses exist for separate cases:
-
Ignored
: invalid sequences that the parser decides should be ignored -
Incomplete
: sequences that were cut off by the start of another sequence or the end of the input data (see [End of Input](End of Input) below) -
SimpleEscape
: simple escape sequences such as function key codes -
CSI
: parameterized sequences beginning withCSI
(ESC [
)
Likewise, String
has its own subclasses:
-
DCS
: Device Control Strings -
OSC
: Operating System Commands -
SOS
: Strings beginning with a general Start Of String indicator -
PM
: Privacy Message (NOTE: NOT A SECURE FUNCTION) -
APC
: Application Program Command
End of input can be signaled by parsing an undefined "codepoint"/"byte"; any partial sequence in progress will be flushed as Incomplete
, and the undefined marker will be emitted as well, so that downstream consumers are also notified that input is complete.
Geoffrey Broadwell [email protected]
Copyright 2021-2024 Geoffrey Broadwell
This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.