LexicalConversion

Default Syntax

In R7RS and R5RS modes, Larceny normally recognizes R7RS lexical syntax together with most of the lexical syntax specified by the older R6RS, R5RS, and IEEE standards for Scheme. In R6RS mode, Larceny normally recognizes only R6RS lexical syntax, because the R6RS standard explicitly forbids most lexical extensions:

An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax #!<identifier>, for any <identifier> (see section 4.2.4) that is not r6rs, as a syntax violation, and it may use specific #!-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax.

The lexical syntax allowed on a textual input port can be altered by reading a #!r7rs, #!r6rs, #!r5rs, #!err5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. These flags and their effects are described below.

The lexical syntax allowed on a newly opened textual input port is determined by a set of parameters described below. The initial values of those parameters are determined by the mode option (-r7rs, -r6rs, or -r5rs) specified on Larceny's command line.

Those parameters also determine the lexical syntax associated with a newly opened textual output port, which influences some of the lexical conventions used when writing to that port.

Case Sensitivity

Larceny is case-sensitive by default. This can be changed on Larceny's command line by using the -foldcase or -nofoldcase options, and can also be changed at runtime.

Case-sensitivity is a property of individual ports. The case-sensitivity of a newly created textual port is determined by the case-sensitive? parameter, and can be changed by reading a #!r7rs, #!r6rs, #!r5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. These flags and their effects are described in the next section.

Flags

These flags may be placed anywhere within a file that contains Scheme code or data. Their effect is limited to that file, and to the text following the flag.

These flags may also be typed at an interactive top level, in which case their effect is limited to the (current-input-port) from which they are read.

Apart from their side effects, which are limited to the port from which they are read, all of these flags are read as comments. (That has always been true of the #!r6rs flag, but the other flags evaluated to unspecified values in Larceny through v0.97 and in v0.98b1.)
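
As a minimal sketch of the per-port behavior described above, the following reads the same identifier from two string ports, only one of which starts with a flag. It assumes R7RS mode, Larceny's default case-sensitivity, and that string ports behave like any other textual input port; the names flagged, plain, from-flagged, and from-plain are made up for illustration.

```scheme
(import (scheme base) (scheme read) (scheme write))

;; Two independent string ports that contain the same identifier.
(define flagged (open-input-string "#!fold-case Hello"))
(define plain   (open-input-string "Hello"))

;; The flag is skipped like a comment, so the first datum read from
;; `flagged` is the symbol that follows the flag.
(define from-flagged (read flagged))
(define from-plain   (read plain))

(write (symbol->string from-flagged)) (newline)
(write (symbol->string from-plain))   (newline)
```

If the flag's effect is limited to its own port, the first line printed should be "hello" and the second "Hello".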

`#!r7rs`

Tells the read and get-datum procedures to read from the port in a mode that recognizes all R7RS lexical syntax. Does not disable any other lexical syntax, so it can be used in combination with other flags. Implies case-sensitivity.

`#!r6rs`

Tells the read and get-datum procedures to read from the port in a mode that enforces all lexical restrictions imposed by the R6RS. Disables all other lexical syntax, so it cancels the effect of any previous #!r7rs, #!r5rs, or #!larceny flag. Implies case-sensitivity.

`#!r5rs`

Tells the read and get-datum procedures to read from the port in an R5RS-compatible mode that allows R5RS and R7RS lexical syntax along with Larceny's usual lexical extensions. Implies case-insensitivity.

`#!err5rs`

Tells the read and get-datum procedures to read from the port in a mode that allows R5RS, R6RS, and R7RS lexical syntax along with Larceny's usual lexical extensions. Does not affect case-sensitivity. Note: Starting with Larceny v0.98, this flag is deprecated.

`#!larceny`

Tells the read and get-datum procedures to read from the port in a mode that allows R5RS, R6RS, and R7RS lexical syntax along with Larceny's usual lexical extensions. Implies case-sensitivity.

`#!fold-case`

Tells the read and get-datum procedures to use Unicode's locale-independent case-folding algorithm on the names of symbols that are not written with R7RS-compliant vertical bars or an R6RS-compliant hexadecimal escape. (If R7RS lexical syntax is enabled, then surrounding the symbol with vertical bars will disable folding on that symbol. If R6RS lexical syntax is enabled, then using a hexadecimal escape within the symbol will disable folding on the entire symbol. If Larceny's traditional extensions are enabled, then any backslash escapes within a symbol will disable case-folding on the entire symbol. This behavior is compatible with but not mandated by the R7RS/R6RS/R5RS/IEEE standards.)

`#!no-fold-case`

Tells the read and get-datum procedures not to fold the case of symbol names, so symbols are read exactly as written.
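
The interaction between #!fold-case, vertical bars, and #!no-fold-case can be sketched with a single string port. This assumes a mode in which R7RS lexical syntax is enabled; the port name p and the variables folded, barred, and unfolded are illustrative only.

```scheme
(import (scheme base) (scheme read) (scheme write))

;; One port, three symbols: the flags take effect on the text after them.
(define p
  (open-input-string "#!fold-case ABC |ABC| #!no-fold-case ABC"))

(define folded   (read p))   ; read while folding is on
(define barred   (read p))   ; vertical bars, folding still on
(define unfolded (read p))   ; read after #!no-fold-case

(write (map symbol->string (list folded barred unfolded)))
(newline)
```

Given the behavior described for #!fold-case above, this should print ("abc" "ABC" "ABC"): only the unbarred symbol read while folding was on is changed.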

Port-specific Parameters

These parameters determine the lexical syntax associated with newly created textual ports. The lexical syntax associated with textual input ports can be altered by reading one of the flags described above from the port.

`case-sensitive?`

if true: symbols are case-sensitive
if false: symbols are not case-sensitive (with exceptions listed in the description of #!fold-case)

`read-r6rs-flags?`

if true: allow flags other than #!r6rs
if false: treat flags other than #!r6rs as errors

`read-r7rs-weirdness?`

if true: allow all R7RS lexical syntax
if false: do not allow R7RS-specific extensions to the R5RS lexical syntax

`read-r6rs-weirdness?`

if true: allow all R6RS lexical syntax
if false: do not allow R6RS-specific extensions to the R5RS lexical syntax

`read-larceny-weirdness?`

allow # as insignificant digit in numerals (required by R5RS)
allow some nonstandard peculiar identifiers (-- -1+ 1+ 1-)
allow leading . or @ or +: or -: in symbols
allow backslashes in strings before characters that don't have to be escaped
allow vertical bar as a <subsequent> in symbols (used in FASL files)
allow #^B #^C #^F #^P #^G randomness (used in FASL files)
Note: all of these extensions are deprecated

`read-traditional-weirdness?`

allow vertical bars surrounding symbol (even if other R7RS extensions are disallowed)
allow backslash escaping within symbols
allow unconditional downcasing of the character following #
allow #!...!# comments (but these are not implemented in v0.94; see lib/Standard/exec-comment.sch)
allow #.(...) read-time evaluation (see lib/Standard/sharp-dot.sch)
allow #&... (but this doesn't work in v0.94; see lib/Standard/box.sch)
Note: all but the first of these extensions are deprecated

`read-mzscheme-weirdness?`

allow MzScheme #\uXX character extension
allow MzScheme #% randomness
allow #"..." randomness
Note: all of these extensions are deprecated

`recognize-javadot-symbols?`

recognize JavaDot symbols (for the subset of JavaDot symbols that are allowed by the lexical mode in effect when the symbol is read)
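
As a hedged sketch of how a port-specific parameter might be used, the following opens a string port while case-sensitive? is temporarily false. It assumes that case-sensitive? is an ordinary parameter object that works with parameterize, that it is bound at Larceny's interactive top level, and that a string port counts as a newly created textual port; none of that is confirmed by the text above. The names folding-port and sym are illustrative.

```scheme
;; Intended to be typed at Larceny's interactive top level, where this
;; sketch assumes the Larceny-specific parameter case-sensitive? is bound.
(define folding-port
  (parameterize ((case-sensitive? #f))
    (open-input-string "Hello")))

;; The parameter has reverted by now; the port is expected to keep the
;; lexical syntax it was created with.
(define sym (read folding-port))

(write (symbol->string sym))
(newline)
```

If the port captures the parameter's value when it is created, as described above, this should print "hello".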

Global Parameters

`read-square-bracket-as-paren`

if true: allow square brackets even if R6RS lexical extensions are not otherwise allowed
if false: allow square brackets only if R6RS lexical extensions are allowed

`recognize-keywords?`

if true: treat colon keywords specially (e.g. :foo)
Note: The reader sets this parameter but does not consult it. The macro expander consults it.
Note: This parameter is strongly deprecated.

R7RS Lexical Syntax

The R7RS describes a language that extends the R5RS in several important ways. Some of those extensions are taken from or compatible with the R6RS, while others are not.

Known incompatibilities between R5RS and R7RS lexical syntax

case sensitivity
R7RS does not mandate support for the # digit in numeric literals
R7RS does not mandate support for exponent markers other than e

Important extensions provided by the R7RS lexical syntax

the #!fold-case and #!no-fold-case flags offer control over case-sensitivity
identifiers can include the @ character
identifiers can begin and end with vertical bars, allowing unusual characters to be specified by mnemonic or hexadecimal escape sequences
those mnemonic and hexadecimal escape sequences can also be used within string literals
external representations for inexact infinities and NaNs (e.g. -inf.0, +nan.0)
external representations for bytevectors (e.g. #u8(105 226 153 165 206 187 97 114 99 101 110 121))
external representations for shared or circular structures
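
A few of these extensions can be tried by reading them from string ports. This is a minimal sketch that assumes R7RS lexical syntax is enabled on newly created ports; read-from-string is a hypothetical helper defined here, not a Larceny procedure.

```scheme
(import (scheme base) (scheme read) (scheme write))

;; Hypothetical helper: read one datum from a string.
(define (read-from-string s)
  (read (open-input-string s)))

(define spaced  (read-from-string "|hello world|"))        ; symbol with a space
(define bytes   (read-from-string "#u8(105 226 153 165)")) ; bytevector
(define neg-inf (read-from-string "-inf.0"))               ; inexact infinity
(define loop    (read-from-string "#0=(a b . #0#)"))       ; circular list

(write (symbol->string spaced))   (newline)   ; "hello world"
(write (bytevector-length bytes)) (newline)   ; 4
(write (< neg-inf 0))             (newline)   ; #t
(write (eq? loop (cddr loop)))    (newline)   ; #t
```

The last line checks that the datum label produced a genuinely circular structure: the tail after the second element is the list itself.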

Known incompatibilities between R6RS and R7RS lexical syntax

# is a delimiter in R6RS but not in R7RS
the R6RS does not allow identifiers to be surrounded by vertical bars
the R7RS does not allow hexadecimal escapes within identifiers that are not surrounded by vertical bars (but allowing them would be a legitimate extension)
the R6RS forbids mnemonic escapes within identifiers
the list of mnemonic escapes allowed within strings (e.g. \v) is different
the list of mnemonic characters (e.g. #\vtab) is different
the external syntax of bytevectors is different
the R6RS forbids the R7RS and SRFI-38 notation for shared/circular structures

The R7RS allows implementations to support extensions to the R7RS syntax. In modes other than R6RS, Larceny normally recognizes both R7RS and R6RS lexical syntax.
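
As an illustration of reading both syntaxes in one session, the sketch below reads the same bytevector in its R7RS and R6RS notations. It assumes a mode other than R6RS, so that both notations are accepted on newly created ports; r7rs-bytes and r6rs-bytes are illustrative names.

```scheme
(import (scheme base) (scheme read) (scheme write))

;; The same three bytes, written in R7RS and in R6RS notation.
(define r7rs-bytes (read (open-input-string "#u8(1 2 3)")))
(define r6rs-bytes (read (open-input-string "#vu8(1 2 3)")))

(write (equal? r7rs-bytes r6rs-bytes))
(newline)
```

If both notations are recognized, the two data are equal bytevectors and this prints #t.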

When writing to a textual output port:

if the port allows R7RS lexical syntax, then Larceny uses R7RS lexical syntax
if the port does not allow R7RS syntax but allows R6RS syntax, then Larceny uses R6RS syntax
if the port allows neither R7RS nor R6RS syntax, then Larceny uses R5RS syntax when possible and R7RS syntax with Larceny's traditional extensions when necessary (e.g. for bytevectors, circular structures, and identifiers that contain non-R5RS characters)
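
The choice of output syntax can be observed by writing to a string port. This is a sketch only: it assumes that a string output port created in R7RS mode allows R7RS lexical syntax, and the names out and text are illustrative.

```scheme
(import (scheme base) (scheme write))

;; Write a bytevector to a fresh string port and inspect the result.
(define out (open-output-string))
(write (bytevector 1 2 3) out)
(define text (get-output-string out))

(write text)
(newline)
```

Under the rules above, text should be "#u8(1 2 3)" when the port allows R7RS syntax, and "#vu8(1 2 3)" when it allows only R6RS syntax.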

R6RS Lexical Syntax

The R6RS describes a language that extends the R5RS lexical syntax in several important ways, while forbidding most other extensions to the lexical syntax.

Known incompatibilities between R5RS and R6RS lexical syntax

case sensitivity
identifiers, numbers, characters, booleans, and dot must be followed by a delimiter

Important extensions provided by the R6RS lexical syntax

hexadecimal escape sequences allow identifiers to contain any Unicode characters (e.g. i\x2665;\x3bb;arceny)
hexadecimal escape sequences allow strings to contain any Unicode characters (e.g. "Kurt G\xf6;del")
hexadecimal escape sequence for any Unicode character (e.g. #\x2192)
names for selected ASCII characters (e.g. #\vtab)
single letter escapes for selected ASCII characters within strings (e.g. "Posterity shall ne'er survey\nA nobler grave than this...")
external representations for bytevectors (e.g. #vu8(105 226 153 165 206 187 97 114 99 101 110 121))

Larceny supports all lexical syntax of the R6RS. Mantissa widths are ignored, however, because all of Larceny's inexact reals are represented in IEEE double precision.
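
The hexadecimal escapes listed above can be checked by reading them from string ports. This sketch assumes R6RS lexical syntax is enabled on newly created ports, which the text says is the normal case; read-from-string is a hypothetical helper, and the backslashes are doubled only because the text to be read is itself inside a Scheme string.

```scheme
(import (scheme base) (scheme read) (scheme write))

;; Hypothetical helper: read one datum from a string.
(define (read-from-string s)
  (read (open-input-string s)))

(define name  (read-from-string "\"Kurt G\\xf6;del\""))      ; string with \xf6;
(define arrow (read-from-string "#\\x2192"))                 ; character #\x2192
(define sym   (read-from-string "i\\x2665;\\x3bb;arceny"))   ; identifier escapes

(write (string-length name))                 (newline)   ; 10
(write (char->integer arrow))                (newline)   ; 8594
(write (string-length (symbol->string sym))) (newline)   ; 9
```

Each escape contributes exactly one character, which is why the string has 10 characters and the symbol's name has 9.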

Past and Future

As of Larceny v0.94, the reader's state machine and parser are generated by Will's LexGen and ParseGen tools, so we can regenerate the reader from a declarative specification.

Larceny's read procedure is not programmable. Future versions of Larceny might provide tools for constructing and installing custom readers, but the read procedure is one of Larceny's most complex components, so customization will not be easy.
