LexicalConversion
In R7RS and R5RS modes, Larceny normally recognizes R7RS lexical syntax together with most of the lexical syntax specified by the older R6RS, R5RS, and IEEE standards for Scheme. In R6RS mode, Larceny normally recognizes only R6RS lexical syntax, because the R6RS standard explicitly forbids most lexical extensions:
An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax #!<identifier>, for any <identifier> (see section 4.2.4) that is not r6rs, as a syntax violation, and it may use specific #!-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax.
The lexical syntax allowed on a textual input port can be altered by reading a #!r7rs, #!r6rs, #!r5rs, #!err5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. These flags and their effects are described below.
The lexical syntax allowed on a newly opened textual input port is determined by a set of parameters described below. The initial values of those parameters are determined by the mode option (-r7rs, -r6rs, or -r5rs) specified on Larceny's command line.
Those parameters also determine the lexical syntax associated with a newly opened textual output port, which influences some of the lexical conventions used when writing to that port.
Larceny is case-sensitive by default. This can be changed on Larceny's command line by using the -foldcase or -nofoldcase options, and can also be changed at runtime.
Case-sensitivity is a property of individual ports. The case-sensitivity of a newly created textual port is determined by the case-sensitive? parameter, and can be changed by reading a #!r7rs, #!r6rs, #!r5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. These flags and their effects are described in the next section.
These flags may be placed anywhere within a file that contains Scheme code or data. Their effect is limited to that file, and to the text following the flag.
These flags may also be typed at an interactive top level, in which case their effect is limited to the (current-input-port) from which they are read.
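The port-local effect described above can be sketched with two string ports. This is a minimal illustration, assuming an R7RS-mode session in which newly opened ports are case-sensitive (Larceny's default); the names p1, p2, from-p1, and from-p2 are made up for the example.

```scheme
(import (scheme base) (scheme read))

;; Two independent textual input ports; only the first contains a flag.
(define p1 (open-input-string "#!fold-case HELLO"))
(define p2 (open-input-string "HELLO"))

;; The flag is skipped like a comment, so the first datum read from p1
;; is the symbol that follows it, read with case folding enabled.
(define from-p1 (read p1))

;; p2 never saw the flag, so its symbol keeps the case it was written in
;; (assuming the port started out case-sensitive).
(define from-p2 (read p2))

;; Compare the two symbols; they differ if folding applied only to p1.
(eq? from-p1 from-p2)
```

Because the flag changes only the port it was read from, opening a second port on the same text is a simple way to check what a flag actually did.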
Apart from their side effects, which are limited to the port from which they are read, all of these flags are read as comments. (That's always been true of the r6rs flag, but the other flags evaluated to unspecified values in Larceny through v0.97 and v0.98b1.)
#!r7rs
Tells the read and get-datum procedures to read from the port in a mode that recognizes all R7RS lexical syntax. Does not disable any other lexical syntax, so it can be used in combination with other flags. Implies case-sensitivity.
#!r6rs
Tells the read and get-datum procedures to read from the port in a mode that enforces all lexical restrictions imposed by the R6RS. Disables all other lexical syntax, so it cancels the effect of any previous #!r7rs, #!r5rs, or #!larceny flag. Implies case-sensitivity.
#!r5rs
Tells the read and get-datum procedures to read from the port in an R5RS-compatible mode that allows R5RS and R7RS lexical syntax along with Larceny's usual lexical extensions. Implies case-insensitivity.
#!err5rs
Tells the read and get-datum procedures to read from the port in a mode that allows R5RS, R6RS, and R7RS lexical syntax along with Larceny's usual lexical extensions. Does not affect case-sensitivity.
Note: Starting with Larceny v0.98, this flag is deprecated.
#!larceny
Tells the read and get-datum procedures to read from the port in a mode that allows R5RS, R6RS, and R7RS lexical syntax along with Larceny's usual lexical extensions. Implies case-sensitivity.
#!fold-case
Tells the read and get-datum procedures to use Unicode's locale-independent case-folding algorithm on the names of symbols that are not written with R7RS-compliant vertical bars or an R6RS-compliant hexadecimal escape. (If R7RS lexical syntax is enabled, then surrounding the symbol with vertical bars will disable folding on that symbol. If R6RS lexical syntax is enabled, then using a hexadecimal escape within the symbol will disable folding on the entire symbol. If Larceny's traditional extensions are enabled, then any backslash escapes within a symbol will disable case-folding on the entire symbol. This behavior is compatible with but not mandated by the R7RS/R6RS/R5RS/IEEE standards.)
#!no-fold-case
Tells the read and get-datum procedures to leave the names of symbols as written, without case folding.
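As a hedged illustration of how the two case-folding flags interact within one port, the sketch below reads three symbols from a string. It assumes an R7RS-mode session; read-all-from-string is a hypothetical helper written for this example, not part of Larceny.

```scheme
(import (scheme base) (scheme read))

;; Hypothetical helper: read every datum from a string into a list.
(define (read-all-from-string text)
  (let ((port (open-input-string text)))
    (let loop ((data '()))
      (let ((datum (read port)))
        (if (eof-object? datum)
            (reverse data)
            (loop (cons datum data)))))))

(define symbols
  (read-all-from-string "#!fold-case Alpha |Beta| #!no-fold-case Gamma"))

;; Alpha is read while folding is on, so its name should be folded.
;; |Beta| is written with vertical bars, which the description above
;; says disables folding for that symbol when R7RS syntax is enabled.
;; Gamma follows #!no-fold-case, so it keeps the case it was written in.
(map symbol->string symbols)
```

The flags themselves produce no data, so the list contains exactly the three symbols.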
These parameters determine the lexical syntax associated with newly created textual ports. The lexical syntax associated with textual input ports can be altered by reading one of the flags described above from the port.
- if true: symbols are case-sensitive
- if false: symbols are not case-sensitive (with exceptions listed in the description of #!fold-case)
- if true: allow flags other than #!r6rs
- if false: treat flags other than #!r6rs as errors
- if true: allow all R7RS lexical syntax
- if false: do not allow R7RS-specific extensions to the R5RS lexical syntax
- if true: allow all R6RS lexical syntax
- if false: do not allow R6RS-specific extensions to the R5RS lexical syntax
- allow # as insignificant digit in numerals (required by R5RS)
- allow some nonstandard peculiar identifiers (-- -1+ 1+ 1-)
- allow leading . or @ or +: or -: in symbols
- allow backslashes in strings before characters that don't have to be escaped
- allow vertical bar as a <subsequent> in symbols (used in FASL files)
- allow #^B #^C #^F #^P #^G randomness (used in FASL files)
- Note: all of these extensions are deprecated
- allow vertical bars surrounding symbol (even if other R7RS extensions are disallowed)
- allow backslash escaping within symbols
- allow unconditional downcasing of the character following #
- allow #!...!# comments (but these are not implemented in v0.94; see lib/Standard/exec-comment.sch)
- allow #.(...) read-time evaluation (see lib/Standard/sharp-dot.sch)
- allow #&... (but this doesn't work in v0.94; see lib/Standard/box.sch)
- Note: all but the first of these extensions are deprecated
- allow MzScheme #\uXX character extension
- allow MzScheme #% randomness
- allow #"..." randomness
- Note: all of these extensions are deprecated
- recognize JavaDot symbols (for the subset of JavaDot symbols that are allowed by the lexical mode in effect when the symbol is read)
- if true: allow square brackets even if R6RS lexical extensions are not otherwise allowed
- if false: allow square brackets only if R6RS lexical extensions are allowed
- if true: treat colon keywords specially (e.g. :foo)
- Note: The reader sets this parameter but does not consult it. The macro expander consults it.
- Note: This parameter is strongly deprecated.
The R7RS describes a language that extends the R5RS in several important ways. Some of those extensions are taken from or compatible with the R6RS, while others are not.
- case sensitivity
- R7RS does not mandate support for the # digit in numeric literals
- R7RS does not mandate support for exponent markers other than e
- the #!fold-case and #!no-fold-case flags offer control over case-sensitivity
- identifiers can include the @ character
- identifiers can begin and end with vertical bars, allowing unusual characters to be specified by mnemonic or hexadecimal escape sequences
- those mnemonic and hexadecimal escape sequences can also be used within string literals
- external representations for inexact infinities and NaNs (e.g. -inf.0, +nan.0)
- external representations for bytevectors (e.g. #u8(105 226 153 165 206 187 97 114 99 101 110 121))
- external representations for shared or circular structures
- # is a delimiter in R6RS but not in R7RS
- the R6RS does not allow identifiers to be surrounded by vertical bars
- the R7RS does not allow hexadecimal escapes within identifiers that are not surrounded by vertical bars (but allowing them would be a legitimate extension)
- the R6RS forbids mnemonic escapes within identifiers
- the list of mnemonic escapes allowed within strings (e.g. \v) is different
- the list of mnemonic characters (e.g. #\vtab) is different
- the external syntax of bytevectors is different
- the R6RS forbids the R7RS and SRFI-38 notation for shared/circular structures
The R7RS allows implementations to support extensions to the R7RS syntax. In modes other than R6RS, Larceny normally recognizes both R7RS and R6RS lexical syntax.
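Several of the R7RS notations listed above can be exercised in a few lines. The following is a minimal sketch, assuming an R7RS-mode session; all variable names are illustrative. The bytevector is the one used as an example above, and its bytes happen to be the UTF-8 encoding of a nine-character string.

```scheme
(import (scheme base) (scheme read) (scheme inexact))

;; R7RS bytevector literal (#u8 prefix), decoded as UTF-8.
(define bv #u8(105 226 153 165 206 187 97 114 99 101 110 121))
(define decoded (utf8->string bv))

;; An identifier surrounded by vertical bars, using hexadecimal escapes
;; for the two non-ASCII characters (U+2665 and U+03BB).
(define sym '|i\x2665;\x3bb;arceny|)
(define same-name? (string=? (symbol->string sym) decoded))

;; Datum labels describe a circular list; it is read from a port here
;; rather than quoted in the program text.
(define ring (read (open-input-string "#0=(a b . #0#)")))
(define circular? (eq? ring (cddr ring)))

;; External representations for an inexact infinity and a NaN.
(define inf-ok? (infinite? -inf.0))
(define nan-ok? (nan? +nan.0))
```

Reading the circular structure through a string port avoids relying on how a particular implementation treats circular literals in program source.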
When writing to a textual output port:
- if the port allows R7RS lexical syntax, then Larceny uses R7RS lexical syntax
- if the port does not allow R7RS syntax but allows R6RS syntax, then Larceny uses R6RS syntax
- if the port allows neither R7RS nor R6RS syntax, then Larceny uses R5RS syntax when possible and R7RS syntax with Larceny's traditional extensions when necessary (e.g. for bytevectors, circular structures, and identifiers that contain non-R5RS characters)
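One way to observe these output conventions is to write to a string port and read the text back. This is a sketch under the assumption that the session is in R7RS mode, so that a newly opened string port allows R7RS syntax; write-to-string is a hypothetical helper defined for the example.

```scheme
(import (scheme base) (scheme read) (scheme write))

;; Hypothetical helper: capture what `write` emits for an object.
(define (write-to-string obj)
  (let ((out (open-output-string)))
    (write obj out)
    (get-output-string out)))

;; A bytevector, whose external syntax differs between R7RS and R6RS.
(define bv-text (write-to-string (bytevector 1 2 3)))

;; A two-element circular list; R7RS `write` must use datum labels here.
(define ring (list 'a 'b))
(set-cdr! (cdr ring) ring)
(define ring-text (write-to-string ring))

;; Round trip: the written text should be readable by a port that
;; allows the same lexical syntax.
(define bv-again (read (open-input-string bv-text)))
(define ring-again (read (open-input-string ring-text)))
```

Inspecting bv-text and ring-text shows which notation the port selected; the round trip checks that the reader accepts it.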
The R6RS describes a language that extends the R5RS lexical syntax in several important ways, while forbidding most other extensions to the lexical syntax.
- case sensitivity
- identifiers, numbers, characters, booleans, and dot must be followed by a delimiter
- hexadecimal escape sequences allow identifiers to contain any Unicode characters (e.g. i\x2665;\x3bb;arceny)
- hexadecimal escape sequences allow strings to contain any Unicode characters (e.g. "Kurt G\xf6;del")
- hexadecimal escape sequence for any Unicode character (e.g. #\x2192)
- names for selected ASCII characters (e.g. #\vtab)
- single letter escapes for selected ASCII characters within strings (e.g. "Posterity shall ne'er survey\nA nobler grave than this...")
- external representations for bytevectors (e.g. #vu8(105 226 153 165 206 187 97 114 99 101 110 121))
Larceny supports all lexical syntax of the R6RS. Mantissa widths are ignored, however, because all of Larceny's inexact reals are represented in IEEE double precision.
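The R6RS examples from the list above can be collected into a small top-level program. This is a sketch, assuming it is run in R6RS mode; the variable names are illustrative, and the last definition relies on the statement above that mantissa widths are ignored.

```scheme
#!r6rs
(import (rnrs))

;; R6RS bytevector literal (#vu8 prefix), the UTF-8 bytes of a short string.
(define bv #vu8(105 226 153 165 206 187 97 114 99 101 110 121))

;; Hexadecimal escapes inside an identifier, without vertical bars.
(define sym 'i\x2665;\x3bb;arceny)
(define same-name? (string=? (symbol->string sym) (utf8->string bv)))

;; Hexadecimal escape inside a string: \xf6; is a single character.
(define name "Kurt G\xf6;del")
(define name-length (string-length name))

;; Hexadecimal character syntax and a named ASCII character.
(define arrow-code (char->integer #\x2192))
(define vtab-code (char->integer #\vtab))

;; A numeral with a mantissa width; 1.5 is exactly representable, so
;; ignoring the width does not change its value.
(define width-ignored? (= 1.5|10 1.5))
```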
As of Larceny v0.94, Larceny's state machine and parser were generated by Will's LexGen and ParseGen tools, so we can regenerate the reader from a declarative specification.
Larceny's read procedure is not programmable. Future versions of Larceny might provide tools for constructing and installing custom readers, but the read procedure is one of Larceny's most complex components, so customization will not be easy.