Felix S. Klock II edited this page Jul 28, 2013 · 4 revisions

Larceny now uses Unicode strings with R6RS semantics.
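
As a small illustration of what R6RS string semantics means in practice, the sketch below uses only standard (rnrs) procedures. The example string is arbitrary, and nothing here depends on how Larceny represents strings internally.

```scheme
(import (rnrs))

;; An R6RS string is a sequence of characters, and every character
;; is a Unicode scalar value.
(define s (string #\x3bb #\x1D11E))   ; Greek small lambda, musical G clef

;; Length and indexing are in characters, not bytes or UTF-16 code units.
(define len (string-length s))
(define second-char (char->integer (string-ref s 1)))

;; Encoding to bytes is a separate, explicit step.
(define utf8-size (bytevector-length (string->utf8 s)))

(display len) (newline)          ; 2
(display second-char) (newline)  ; 119070, that is #x1D11E
(display utf8-size) (newline)    ; 6 (2 bytes for lambda, 4 for the clef)
```

Code that assumes one byte per character, as Latin-1 strings allowed, gets different answers from string-length and from the size of the encoded bytevector.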

Native Larceny (both Sparc and x86-32) can still be built with Latin-1 strings, and we expect to offer another representation of Unicode strings as a build-time option in a future version of Larceny. Common Larceny and Petit Larceny no longer support Latin-1 strings at all.

For the rationale behind Larceny's current and projected representations of strings, see StringRepresentations.

What follows is mostly out of date, and should be revised.


Will is responsible for the portable reference implementation of the (r6rs unicode) library, which was based on Mike Sperber's implementation for Scheme 48. That reference implementation was dropped into Larceny's code base without change, except for the elimination of library and #vu8(...) syntax that would have made cross-compilation more difficult.


Will's notes on what remains to be done for the SPARC.

BTW, we really need a more complete set of operations on bytevectors.

Asm/Sparc/peepopt.sch

Add peephole optimizations here. (Optional, not yet done.)

Compiler/sparc.imp.sch

string-ref should be added to $immediate-primops$.

Compiler/sparc.imp2.sch

The rep:ustring representation should be added, and new entries should be added to several of the tables. This is all optional, however. (It appears that the regular string operations haven't been added to some of those tables!)

Compiler/common.imp.sch

Done. All of the operations except ustring? have been added to the syntax definition for name:CALL.


Felix's notes on IA32

See changeset:4138, changeset:4142, and changeset:4143. (Note also the broader changes of changeset:4141.)

  • I did not bother allocating new primcodes for these operations; therefore the additions to the case expression in changeset:4138 are not prefixed with numbers as the other entries are.
    • I think we should get rid of the primcodes entirely if we can. They are ignored on Sparc and I think they are now ignored on IasnLarceny, so they are only there for PetitLarceny and maybe CommonLarceny.

Felix's notes on PetitLarceny

See changeset:4139 and changeset:4144. Note also the broader changes of changeset:4141.

  • I had to allocate some new numbers in the primcode space in standard-C.imp.sch (source:/trunk/larceny_src/src/Compiler/standard-C.imp.sch). It seems like this is a bit of a mess.
  • My implementation is not equivalent to the string.sch implementations that Will commented out in changeset:4137, because of endianness issues. (See changeset:4141.) But that should not matter as long as clients don't rely on a particular endianness when viewing strings as bytevectors, right?

Will's notes

Right.
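
To make the endianness caveat concrete, here is a minimal sketch using the standard R6RS bytevector operations. It assumes, purely for illustration, that a character is stored as one 32-bit unit; that layout and the variable names are hypothetical and are not taken from Larceny's sources.

```scheme
(import (rnrs))

;; Scalar value of Greek small lambda, U+03BB.
(define cp (char->integer #\x3bb))

;; Store the same value as a 32-bit unit in each byte order.
(define be-bytes (make-bytevector 4 0))
(define le-bytes (make-bytevector 4 0))
(bytevector-u32-set! be-bytes 0 cp (endianness big))
(bytevector-u32-set! le-bytes 0 cp (endianness little))

;; The raw bytes differ ...
(display (bytevector->u8-list be-bytes)) (newline)   ; (0 0 3 187)
(display (bytevector->u8-list le-bytes)) (newline)   ; (187 3 0 0)

;; ... but reading back with the matching byte order recovers the
;; same character in both cases.
(define be-char (integer->char (bytevector-u32-ref be-bytes 0 (endianness big))))
(define le-char (integer->char (bytevector-u32-ref le-bytes 0 (endianness little))))
(display (char=? be-char le-char)) (newline)         ; #t
```

A client that only goes through the character-level operations sees no difference; only code that inspects the underlying bytes directly would notice which byte order was used.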

We need to perform a large-scale cleanup of the primops on all four target architectures. Originally, each primop was responsible for performing its own safety checks. To make it possible for Twobit to optimize these checks away in safe code, the safety checks were removed from common primops and added to common.imp.sch. Many of those primops are still doing their own safety checking, however, which duplicates the check when Twobit cannot eliminate it, and continues to perform the check even when Twobit has proved it is unnecessary. In some cases, the safety-checking primops are never used, so the duplication does not occur, but the safety-checking primops are dead code and should be removed from our code base.

I volunteer to do this cleanup, starting with the ustring primops.
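
The intended division of labor can be sketched in portable R6RS Scheme. The names unchecked-string-ref, checked-string-ref, and count-char are hypothetical and exist only for this example; the real primops and the code in common.imp.sch look different, and Twobit performs the check elimination itself rather than relying on hand-written loops.

```scheme
(import (rnrs))

;; Hypothetical stand-in for a primop that does no checking of its own.
;; Portable code has no truly unchecked access, so string-ref plays the role.
(define (unchecked-string-ref s i)
  (string-ref s i))

;; The safety check lives in exactly one place, outside the primop.
(define (checked-string-ref s i)
  (if (and (string? s)
           (fixnum? i)
           (fx<=? 0 i)
           (fx<? i (string-length s)))
      (unchecked-string-ref s i)
      (assertion-violation 'checked-string-ref "bad arguments" s i)))

;; When the surrounding code already guarantees 0 <= i < n, the check
;; is provably redundant and the unchecked operation can be used directly.
(define (count-char s c)
  (let ((n (string-length s)))
    (let loop ((i 0) (count 0))
      (if (fx=? i n)
          count
          (loop (fx+ i 1)
                (if (char=? c (unchecked-string-ref s i))
                    (fx+ count 1)
                    count))))))

(display (checked-string-ref "abc" 1)) (newline)   ; b
(display (count-char "banana" #\a)) (newline)      ; 3
```

If unchecked-string-ref repeated the range test internally, checked-string-ref would test twice and count-char would pay for a test on every iteration even though the loop bounds make it unnecessary, which is the duplication described above.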
