UniCodeTodo
Larceny now uses Unicode strings with R6RS semantics.
Native Larceny (both Sparc and x86-32) can still be built with Latin-1 strings, and we expect to offer another representation of Unicode strings as a build-time option in a future version of Larceny. Common Larceny and Petit Larceny no longer support Latin-1 strings at all.
For the rationale behind Larceny's current and projected representations of strings, see StringRepresentations.
What follows is mostly out of date, and should be revised.
Will is responsible for the portable reference implementation of the (r6rs unicode) library, which was based on Mike Sperber's implementation for Scheme 48. That reference implementation was dropped into Larceny's code base without change, except for the elimination of library and #vu8(...) syntax that would have made cross-compilation more difficult.
BTW, we really need a more complete set of operations on bytevectors.
Add peephole optimizations here. (Optional, not yet done.)
string-ref should be added to $immediate-primops$.
The rep:ustring representation should be added, and new entries should be added to several of the tables. This is all optional, however. (It appears that the regular string operations haven't been added to some of those tables!)
Done. All but ustring? have been added to the syntax definition for name:CALL.
Felix's notes on IA32
See changeset:4138, changeset:4142, and changeset:4143. (Note also the broader changes of changeset:4141.)
- I did not bother allocating new primcodes for these operations; therefore the additions to the case expression in changeset:4138 do not have numbers in the front like the others.
- I think we should get rid of the primcodes entirely if we can. They are ignored on Sparc and I think they are now ignored on IasnLarceny, so they are only there for PetitLarceny and maybe CommonLarceny.
Felix's notes on PetitLarceny
See changeset:4139 and changeset:4144. Note also the broader changes of changeset:4141.
- I had to allocate some new numbers in the primcode space in standard-C.imp.sch (source:/trunk/larceny_src/src/Compiler/standard-C.imp.sch). It seems like this is a bit of a mess.
- My implementation is not equivalent to the string.sch ones that Will commented out in changeset:4137, because of endianness issues. (See changeset:4141.) But that should not matter as long as clients don't rely on a particular endianness when viewing strings as bytevectors, right?
Right.
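The endianness point above can be illustrated with a small sketch. It is written in Python rather than Scheme and assumes, purely for illustration, that each character is stored as one 32-bit code point; the helper names `pack_chars` and `char_ref` are hypothetical and do not correspond to anything in Larceny's sources.

```python
import struct

def pack_chars(text, byte_order):
    """Store each character as one 32-bit code point (an assumed layout)."""
    fmt = byte_order + "I"  # "<" is little-endian, ">" is big-endian
    return b"".join(struct.pack(fmt, ord(ch)) for ch in text)

def char_ref(raw, index, byte_order):
    """Read character `index` using the byte order it was stored with."""
    (code_point,) = struct.unpack_from(byte_order + "I", raw, 4 * index)
    return chr(code_point)

little = pack_chars("λx", "<")
big = pack_chars("λx", ">")

# Viewed as raw bytes, the two encodings differ...
bytes_differ = little != big

# ...but clients that go through the character-level accessor see the same string.
same_chars = all(
    char_ref(little, i, "<") == char_ref(big, i, ">") for i in range(2)
)
```

Only a client that inspects the underlying bytes directly could tell the two implementations apart, which is the reliance the question above rules out.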
We need to perform a large-scale cleanup of the primops on all four target architectures. Originally, each primop was responsible for performing its own safety checks. To make it possible for Twobit to optimize these checks away in safe code, the safety checks were removed from common primops and added to common.imp.sch. Many of those primops are still doing their own safety checking, however, which duplicates the check when Twobit cannot eliminate it, and continues to perform the check even when Twobit has proved it is unnecessary. In some cases, the safety-checking primops are never used, so the duplication does not occur, but the safety-checking primops are dead code and should be removed from our code base.
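The cost of the duplicated checks described above can be modelled with a toy sketch. This is a Python model, not Twobit's or Larceny's actual code; every name in it (`check_index`, `unchecked_ref`, `self_checking_ref`, `compiled_ref`, `count_checks`) is hypothetical.

```python
checks_performed = 0  # counts bounds checks, for illustration only

def check_index(s, i):
    """The explicit safety check that the compiler may or may not emit."""
    global checks_performed
    checks_performed += 1
    if not 0 <= i < len(s):
        raise IndexError(i)

def unchecked_ref(s, i):
    """Primop with no safety check of its own."""
    return s[i]

def self_checking_ref(s, i):
    """Older-style primop that still performs its own check."""
    check_index(s, i)
    return s[i]

def compiled_ref(s, i, primop, check_proved_unnecessary):
    """Model of compiled code: an explicit check unless it was proved away."""
    if not check_proved_unnecessary:
        check_index(s, i)
    return primop(s, i)

def count_checks(primop, check_proved_unnecessary):
    """Run one reference and report how many checks were executed."""
    global checks_performed
    checks_performed = 0
    compiled_ref("abc", 1, primop, check_proved_unnecessary)
    return checks_performed
```

In this model, `count_checks(unchecked_ref, True)` performs no checks, while `count_checks(self_checking_ref, False)` performs two and `count_checks(self_checking_ref, True)` still performs one, matching the two problems noted above.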
I volunteer to do this cleanup, starting with the ustring primops.