Will Clinger edited this page Jan 14, 2015 · 7 revisions

Default Syntax

In R7RS and R5RS modes, Larceny normally recognizes R7RS lexical syntax together with most of the lexical syntax specified by the older R6RS, R5RS, and IEEE standards for Scheme. In R6RS mode, Larceny normally recognizes only R6RS lexical syntax, because the R6RS standard explicitly forbids most lexical extensions:

An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax #!<identifier>, for any <identifier> (see section 4.2.4) that is not r6rs, as a syntax violation, and it may use specific #!-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax.

The lexical syntax allowed on a textual input port can be altered by reading a #!r7rs, #!r6rs, #!r5rs, #!err5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. These flags and their effects are described below.

The lexical syntax allowed on a newly opened textual input port is determined by a set of parameters described below. The initial values of those parameters are determined by the mode option (-r7rs, -r6rs, or -r5rs) specified on Larceny's command line.

Those parameters also determine the lexical syntax associated with a newly opened textual output port, which influences some of the lexical conventions used when writing to that port.

Case Sensitivity

Larceny is case-sensitive by default. This can be changed on Larceny's command line by using the -foldcase or -nofoldcase options, and can also be changed at runtime.

Case-sensitivity is a property of individual ports. The case-sensitivity of a newly created textual port is determined by the case-sensitive? parameter, and can be changed by reading a #!r7rs, #!r6rs, #!r5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. These flags and their effects are described in the next section.


Flags

These flags may be placed anywhere within a file that contains Scheme code or data. Their effect is limited to that file and, within it, to the text that follows the flag.

These flags may also be typed at an interactive top level, in which case their effect is limited to the (current-input-port) from which they are read.

Apart from their side effects, which are limited to the port from which they are read, all of these flags are read as comments. (That has always been true of the #!r6rs flag, but in Larceny v0.97 and earlier, and in v0.98b1, the other flags evaluated to unspecified values.)

#!r7rs

Tells the read and get-datum procedures to read from the port in a mode that recognizes all R7RS lexical syntax. Does not disable any other lexical syntax, so it can be used in combination with other flags. Implies case-sensitivity.

#!r6rs

Tells the read and get-datum procedures to read from the port in a mode that enforces all lexical restrictions imposed by the R6RS. Disables all other lexical syntax, so it cancels the effect of any previous #!r7rs, #!r5rs, or #!larceny flag. Implies case-sensitivity.

#!r5rs

Tells the read and get-datum procedures to read from the port in an R5RS-compatible mode that allows R5RS and R7RS lexical syntax along with Larceny's usual lexical extensions. Implies case-insensitivity.

#!err5rs

Tells the read and get-datum procedures to read from the port in a mode that allows R5RS, R6RS, and R7RS lexical syntax along with Larceny's usual lexical extensions. Does not affect case-sensitivity. Note: Starting with Larceny v0.98, this flag is deprecated.

#!larceny

Tells the read and get-datum procedures to read from the port in a mode that allows R5RS, R6RS, and R7RS lexical syntax along with Larceny's usual lexical extensions. Implies case-sensitivity.

#!fold-case

Tells the read and get-datum procedures to use Unicode's locale-independent case-folding algorithm on the names of symbols that are not written with R7RS-compliant vertical bars or an R6RS-compliant hexadecimal escape. (If R7RS lexical syntax is enabled, then surrounding the symbol with vertical bars will disable folding on that symbol. If R6RS lexical syntax is enabled, then using a hexadecimal escape within the symbol will disable folding on the entire symbol. If Larceny's traditional extensions are enabled, then any backslash escapes within a symbol will disable case-folding on the entire symbol. This behavior is compatible with but not mandated by the R7RS/R6RS/R5RS/IEEE standards.)
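The folding rule above amounts to a small decision procedure. The Python sketch below is not Larceny's reader; the function name `fold_symbol_spelling` and its keyword arguments are hypothetical stand-ins for whether R7RS syntax, R6RS syntax, or Larceny's traditional extensions are enabled on the port. It works on a symbol's raw spelling and, for brevity, neither strips vertical bars nor decodes escapes. Python's `str.casefold` applies Unicode's locale-independent full case folding.

```python
def fold_symbol_spelling(token: str, r7rs: bool = False, r6rs: bool = False,
                         traditional: bool = False) -> str:
    """Return a symbol's spelling as #!fold-case would leave it.

    Hypothetical helper: the flags say whether R7RS syntax, R6RS syntax,
    or Larceny's traditional extensions are enabled on the port.
    """
    # R7RS: vertical bars around the whole symbol disable folding.
    if r7rs and len(token) >= 2 and token.startswith("|") and token.endswith("|"):
        return token
    # R6RS: any \x...; hexadecimal escape disables folding for the entire symbol.
    if r6rs and "\\x" in token:
        return token
    # Traditional extensions: any backslash escape disables folding.
    if traditional and "\\" in token:
        return token
    # Otherwise apply Unicode's locale-independent full case folding.
    return token.casefold()


print(fold_symbol_spelling("Hello", r7rs=True))    # hello
print(fold_symbol_spelling("|Hello|", r7rs=True))  # |Hello|
print(fold_symbol_spelling("Stra\u00dfe"))         # strasse
```

Note that the same `|Hello|` spelling is folded when R7RS syntax is not enabled, which is why the rule is stated per lexical mode.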

#!no-fold-case

Tells the read and get-datum procedures not to case-fold the names of symbols, so symbols read from the port are case-sensitive.
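To make the interactions among these flags easier to see, here is a Python model of the per-port reader state they update. It is a simplification, not Larceny's implementation: `ReaderState` and `apply_flag` are hypothetical names, and the single `extensions` field lumps together all of Larceny's lexical extensions beyond R6RS and R7RS. The descriptions above do not say whether #!r5rs turns R6RS syntax off, so the model leaves that setting unchanged (an assumption).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReaderState:
    """Hypothetical model of the lexical state attached to one input port."""
    r7rs: bool        # R7RS lexical syntax allowed
    r6rs: bool        # R6RS lexical syntax allowed
    extensions: bool  # Larceny's usual lexical extensions allowed
    fold_case: bool   # symbol names are case-folded


def apply_flag(state: ReaderState, flag: str) -> ReaderState:
    """Return the port's state after reading `flag` (the flag itself reads as a comment)."""
    if flag == "#!r7rs":
        # Adds R7RS syntax without disabling anything; case-sensitive.
        return replace(state, r7rs=True, fold_case=False)
    if flag == "#!r6rs":
        # Disables everything except R6RS syntax; case-sensitive.
        return ReaderState(r7rs=False, r6rs=True, extensions=False, fold_case=False)
    if flag == "#!r5rs":
        # R5RS + R7RS + extensions; case-insensitive.
        # Assumption: the R6RS setting is left as it was.
        return replace(state, r7rs=True, extensions=True, fold_case=True)
    if flag == "#!err5rs":
        # Deprecated: R5RS + R6RS + R7RS + extensions; case folding unchanged.
        return replace(state, r7rs=True, r6rs=True, extensions=True)
    if flag == "#!larceny":
        # Same syntax as #!err5rs, but case-sensitive.
        return ReaderState(r7rs=True, r6rs=True, extensions=True, fold_case=False)
    if flag == "#!fold-case":
        return replace(state, fold_case=True)
    if flag == "#!no-fold-case":
        return replace(state, fold_case=False)
    raise ValueError(f"unrecognized flag: {flag}")


state = ReaderState(r7rs=False, r6rs=True, extensions=False, fold_case=False)
for flag in ("#!fold-case", "#!err5rs", "#!r6rs"):
    state = apply_flag(state, flag)
print(state)
# ReaderState(r7rs=False, r6rs=True, extensions=False, fold_case=False)
```

The usage shows why #!r6rs is described as cancelling earlier flags: it replaces the whole state rather than adjusting one field, while #!r7rs, #!fold-case, and #!no-fold-case only adjust what they name.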


Port-specific Parameters

These parameters determine the lexical syntax associated with newly created textual ports. The lexical syntax associated with textual input ports can be altered by reading one of the flags described above from the port.

case-sensitive?

  • if true: symbols are case-sensitive
  • if false: symbols are not case-sensitive (with exceptions listed in the description of #!fold-case)

read-r6rs-flags?

  • if true: allow flags other than #!r6rs
  • if false: treat flags other than #!r6rs as errors

read-r7rs-weirdness?

  • if true: allow all R7RS lexical syntax
  • if false: do not allow R7RS-specific extensions to the R5RS lexical syntax

read-r6rs-weirdness?

  • if true: allow all R6RS lexical syntax
  • if false: do not allow R6RS-specific extensions to the R5RS lexical syntax

read-larceny-weirdness?

  • allow # as insignificant digit in numerals (required by R5RS)
  • allow some nonstandard peculiar identifiers (-- -1+ 1+ 1-)
  • allow leading . or @ or +: or -: in symbols
  • allow backslashes in strings before characters that don't have to be escaped
  • allow vertical bar as a <subsequent> in symbols (used in FASL files)
  • allow #^B #^C #^F #^P #^G randomness (used in FASL files)
  • Note: all of these extensions are deprecated

read-traditional-weirdness?

  • allow vertical bars surrounding symbol (even if other R7RS extensions are disallowed)
  • allow backslash escaping within symbols
  • allow unconditional downcasing of the character following #
  • allow #!...!# comments (but these are not implemented in v0.94; see lib/Standard/exec-comment.sch)
  • allow #.(...) read-time evaluation (see lib/Standard/sharp-dot.sch)
  • allow #&... (but this doesn't work in v0.94; see lib/Standard/box.sch)
  • Note: all but the first of these extensions are deprecated

read-mzscheme-weirdness?

  • allow MzScheme #\uXX character extension
  • allow MzScheme #% randomness
  • allow #"..." randomness
  • Note: all of these extensions are deprecated

recognize-javadot-symbols?

  • recognize JavaDot symbols (for the subset of JavaDot symbols that are allowed by the lexical mode in effect when the symbol is read)

Global Parameters

read-square-bracket-as-paren

  • if true: allow square brackets even if R6RS lexical extensions are not otherwise allowed
  • if false: allow square brackets only if R6RS lexical extensions are allowed

recognize-keywords?

  • if true: treat colon keywords specially (e.g. :foo)
  • Note: The reader sets this parameter but does not consult it. The macro expander consults it.
  • Note: This parameter is strongly deprecated.

R7RS Lexical Syntax

The R7RS describes a language that extends the R5RS in several important ways. Some of those extensions are taken from or compatible with the R6RS, while others are not.

Known incompatibilities between R5RS and R7RS lexical syntax

  • case sensitivity
  • R7RS does not mandate support for the # digit in numeric literals
  • R7RS does not mandate support for exponent markers other than e
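To illustrate the two numeric features listed above, the following Python sketch converts a decimal R5RS numeral that may contain # digits and the s, f, d, or l exponent markers into a double. It is a hypothetical helper (`r5rs_decimal_to_float`), not Larceny code. It covers only unprefixed decimal reals and ignores exactness (in R5RS a # digit makes the number inexact). Because all of Larceny's inexact reals are IEEE doubles, the sketch treats every precision marker like e.

```python
import re

# sign, integer digits, integer #s, optional fraction (digits, #s),
# optional exponent marker and exponent
_R5RS_DECIMAL = re.compile(
    r"([+-]?)([0-9]*)(#*)(?:\.([0-9]*)(#*))?(?:([esfdl])([+-]?[0-9]+))?"
)


def r5rs_decimal_to_float(text: str) -> float:
    """Hypothetical helper: value of an R5RS decimal numeral as a double."""
    m = _R5RS_DECIMAL.fullmatch(text.lower())
    if m is None:
        raise ValueError(f"not a decimal numeral: {text!r}")
    sign, int_digits, int_hashes, frac_digits, frac_hashes, marker, exponent = m.groups()
    if not int_digits and not frac_digits:
        raise ValueError(f"no digits in {text!r}")
    # A # may only follow digits, and no digit may follow a #.
    if int_hashes and (not int_digits or frac_digits):
        raise ValueError(f"misplaced # in {text!r}")
    # Each # stands for an insignificant digit, read here as 0.
    body = int_digits + "0" * len(int_hashes)
    if frac_digits is not None:
        body += "." + frac_digits + "0" * len(frac_hashes)
    # With only double precision available, s/f/d/l all behave like e.
    if marker is not None:
        body += "e" + exponent
    return float(sign + body)


print(r5rs_decimal_to_float("12##"))   # 1200.0
print(r5rs_decimal_to_float("1.5d3"))  # 1500.0
```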

Important extensions provided by the R7RS lexical syntax

  • the #!fold-case and #!no-fold-case flags offer control over case-sensitivity
  • identifiers can include the @ character
  • identifiers can begin and end with vertical bars, allowing unusual characters to be specified by mnemonic or hexadecimal escape sequences
  • those mnemonic and hexadecimal escape sequences can also be used within string literals
  • external representations for inexact infinities and NaNs (e.g. -inf.0, +nan.0)
  • external representations for bytevectors (e.g. #u8(105 226 153 165 206 187 97 114 99 101 110 121))
  • external representations for shared or circular structures

Known incompatibilities between R6RS and R7RS lexical syntax

  • # is a delimiter in R6RS but not in R7RS
  • the R6RS does not allow identifiers to be surrounded by vertical bars
  • the R7RS does not allow hexadecimal escapes within identifiers that are not surrounded by vertical bars (but allowing them would be a legitimate extension)
  • the R6RS forbids mnemonic escapes within identifiers
  • the list of mnemonic escapes allowed within strings (e.g. \v) is different
  • the list of mnemonic characters (e.g. #\vtab) is different
  • the external syntax of bytevectors is different
  • the R6RS forbids the R7RS and SRFI-38 notation for shared/circular structures

The R7RS allows implementations to support extensions to the R7RS syntax. In modes other than R6RS, Larceny normally recognizes both R7RS and R6RS lexical syntax.

When writing to a textual output port:

  • if the port allows R7RS lexical syntax, then Larceny uses R7RS lexical syntax
  • if the port does not allow R7RS syntax but allows R6RS syntax, then Larceny uses R6RS syntax
  • if the port allows neither R7RS nor R6RS syntax, then Larceny uses R5RS syntax when possible and R7RS syntax with Larceny's traditional extensions when necessary (e.g. for bytevectors, circular structures, and identifiers that contain non-R5RS characters)
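As a small illustration of that choice, the Python sketch below renders a bytevector's external representation according to which syntaxes a port allows. The function `bytevector_external` is hypothetical and handles only bytevectors; the rules above cover other cases as well. The sample bytes are the UTF-8 encoding of "i♥λarceny", which is the bytevector used in this page's R7RS and R6RS examples.

```python
def bytevector_external(data: bytes, allows_r7rs: bool, allows_r6rs: bool) -> str:
    """Hypothetical helper: external representation of a bytevector."""
    body = " ".join(str(b) for b in data)
    if allows_r7rs:
        return f"#u8({body})"   # R7RS syntax
    if allows_r6rs:
        return f"#vu8({body})"  # R6RS syntax
    # R5RS has no bytevectors, so fall back to R7RS syntax
    # (with Larceny's traditional extensions).
    return f"#u8({body})"


sample = "i\u2665\u03bbarceny".encode("utf-8")
print(bytevector_external(sample, allows_r7rs=False, allows_r6rs=True))
# #vu8(105 226 153 165 206 187 97 114 99 101 110 121)
```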

R6RS Lexical Syntax

The R6RS describes a language that extends the R5RS lexical syntax in several important ways, while forbidding most other extensions to the lexical syntax.

Known incompatibilities between R5RS and R6RS lexical syntax

  • case sensitivity
  • identifiers, numbers, characters, booleans, and dot must be followed by a delimiter

Important extensions provided by the R6RS lexical syntax

  • hexadecimal escape sequences allow identifiers to contain any Unicode characters (e.g. i\x2665;\x3bb;arceny)
  • hexadecimal escape sequences allow strings to contain any Unicode characters (e.g. "Kurt G\xf6;del")
  • hexadecimal escape sequence for any Unicode character (e.g. #\x2192)
  • names for selected ASCII characters (e.g. #\vtab)
  • single-letter escapes for selected ASCII characters within strings (e.g. "Posterity shall ne'er survey\nA nobler grave than this...")
  • external representations for bytevectors (e.g. #vu8(105 226 153 165 206 187 97 114 99 101 110 121))
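The R6RS hexadecimal escape \x<hex>; names a Unicode scalar value by its code point. As a sketch (not Larceny's reader), the Python function below decodes such escapes in the raw text of an identifier or string and reproduces two of the examples above; the name `decode_hex_escapes` is hypothetical. It rejects surrogate code points and values above #x10FFFF, which are not Unicode scalar values, and it does not handle other backslash escapes.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]+);")


def decode_hex_escapes(text: str) -> str:
    """Hypothetical helper: replace each R6RS \\x<hex>; escape with its character."""
    def to_char(m: re.Match) -> str:
        code = int(m.group(1), 16)
        # Only Unicode scalar values are allowed: no surrogates, nothing above 10FFFF.
        if 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
            raise ValueError(f"not a Unicode scalar value: #x{m.group(1)}")
        return chr(code)
    return _HEX_ESCAPE.sub(to_char, text)


print(decode_hex_escapes(r"i\x2665;\x3bb;arceny"))  # i♥λarceny
print(decode_hex_escapes(r"Kurt G\xf6;del"))        # Kurt Gödel
```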

Larceny supports all lexical syntax of the R6RS. Mantissa widths are ignored, however, because all of Larceny's inexact reals are represented in IEEE double precision.
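An R6RS mantissa width is written as a vertical bar followed by decimal digits after a real, as in 1.1|53. The Python sketch below mimics the behavior described above by discarding the width and reading the rest as a double; `read_real_ignoring_width` is a hypothetical name, and the sketch defers all other numeral parsing to Python's `float`.

```python
import re

# <real>|<digits>: the digits after the bar are the mantissa width.
_MANTISSA_WIDTH = re.compile(r"(.+)\|([0-9]+)")


def read_real_ignoring_width(text: str) -> float:
    """Hypothetical helper: read a decimal real, ignoring any mantissa width."""
    m = _MANTISSA_WIDTH.fullmatch(text)
    body = m.group(1) if m else text
    # Every inexact real becomes an IEEE double, whatever width was requested.
    return float(body)


print(read_real_ignoring_width("1.1|53") == read_real_ignoring_width("1.1|24"))  # True
```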


Past and Future

As of Larceny v0.94, Larceny's state machine and parser are generated by Will Clinger's LexGen and ParseGen tools, so the reader can be regenerated from a declarative specification.

Larceny's read procedure is not programmable. Future versions of Larceny might provide tools for constructing and installing custom readers, but the read procedure is one of Larceny's most complex components, so customizing it will not be easy.

