A REPL #138
Interesting model. Will code ever be unloaded again? I suppose resetting the REPL could be done simply by restarting the subprocess.
I cannot imagine good alternatives to the suggested model. The plan would be to let code grow in the child process, with more and more dynamic libraries being loaded. One could of course imagine a kind of GC for discovering unreachable dynamically loaded libraries... Also, MLB-file loading could be cached if there are no dependency breaks...
I also think I need to use named pipes so that the child process can read from and write to the standard file descriptors...
Do you have other (model) suggestions?
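To make the named-pipe remark concrete, here is a minimal sketch of a control channel that leaves the child's standard file descriptors alone. It assumes a plain `fork` without `exec` and uses two anonymous pipes on spare descriptors; in a real fork/exec setup, named pipes created with `mkfifo` would play the same role, since the exec'ed child can open them by path. The function names and the `DONE`/`FAIL` handling of a single command are illustrative only.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side (hypothetical): read one command from the control pipe and
   answer on the reply pipe. Fds 0/1/2 are never touched, so they remain
   available to the user's ML program. */
static void child_serve_one(int cmd_fd, int reply_fd) {
    char buf[64];
    const char *reply = "FAIL";
    ssize_t n = read(cmd_fd, buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        if (strcmp(buf, "TERMINATE") == 0) reply = "DONE";
    }
    if (write(reply_fd, reply, strlen(reply)) < 0) _exit(1);
    _exit(0);
}

/* Parent side: fork a child, send `cmd` over one pipe and read the reply
   from another. Returns 0 on success, -1 if a system call fails. */
int send_command(const char *cmd, char *reply, size_t reply_size) {
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0) return -1;
    if (pipe(from_child) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                       /* child: keep its two pipe ends */
        close(to_child[1]);
        close(from_child[0]);
        child_serve_one(to_child[0], from_child[1]);
    }
    close(to_child[0]);                   /* parent: keep the other two */
    close(from_child[1]);
    ssize_t w = write(to_child[1], cmd, strlen(cmd));
    close(to_child[1]);                   /* no more commands: child sees EOF */
    ssize_t n = w < 0 ? -1 : read(from_child[0], reply, reply_size - 1);
    close(from_child[0]);
    waitpid(pid, NULL, 0);
    if (n < 0) return -1;
    reply[n] = '\0';
    return 0;
}
```

Keeping the protocol on its own descriptors means `print` in user code can still go straight to the terminal.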
I think this is a decent model. Here are the disadvantages I can think of:
I don't think (1) is a big problem. Just make resetting the state (by restarting the process) an expected part of the workflow, e.g. by doing so whenever a file is freshly loaded. This is like... (2) is probably also not a problem given the speed of modern computers and of MLKit. The delay will likely be noticeable (maybe 0.5 seconds?), but I don't think it will significantly impede productivity. I think (3) is a bigger problem. My model for working with REPLs often involves doing a few value bindings and then poking at them. I suppose I could put them in files instead.

I suspect this goal is too optimistic:
Which Standard ML compilers could possibly be used this way? Except for MLKit, they all depend on heavy runtime systems, and e.g. MLton only supports whole-program compilation. Also, most of them already have perfectly serviceable REPLs. While there is nothing wrong with a portable REPL per se, I don't think it is worth complicating the design in order to support portability.
The way MLKit generates code for toplevel value bindings is that it generates a fresh labeled 64-bit slot in the data segment, binds the variable name to the label, and executes code that fills the slot. All this fits very well with the `LOADRUN` model, which will usually be followed by a `PRINT`...
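A plain-C illustration of the slot model, written as a hypothetical stand-in for the generated assembly (not actual MLKit output): each toplevel binding becomes a zero-initialised 64-bit global, the compiled declaration is an init function that fills it, and later declarations simply read the earlier label. The slot names and the `toplevel_env` table are invented for the sketch.

```c
#include <stdint.h>

/* Plain-C stand-ins for labeled 64-bit data-segment slots, one per
   toplevel binding (hypothetical names; zero until filled). */
int64_t slot_a;                     /* val a = 5     */
int64_t slot_b;                     /* val b = a + 1 */

/* Init code for `val a = 5`: computes the value and fills the slot. */
void init_a(void) { slot_a = 5; }

/* Init code for `val b = a + 1`: reads the earlier binding via its label. */
void init_b(void) { slot_b = slot_a + 1; }

/* Compile-time environment entry: source name bound to a slot address,
   roughly what a later PRINT would be given as its location. */
struct binding { const char *name; int64_t *slot; };
struct binding toplevel_env[] = { { "a", &slot_a }, { "b", &slot_b } };
```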
My plan was to make everything work like MosML and SML/NJ with the property that the dynamic basis (value environment) is extended only if the topdec does not raise an exception, which is also why it is not sound to group-compile multiple toplevel declarations like
```
val a = 5; val _ = raise Div;
```
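The commit-on-success rule can be sketched in C, assuming (hypothetically) that an ML exception escaping to the toplevel is modelled as a `longjmp` to a handler installed around each declaration's init code. The value environment is only extended when the init code returns normally, so running `val a = 5` and `val _ = raise Div` as two separate toplevel declarations keeps `a`. All names here are illustrative.

```c
#include <setjmp.h>
#include <stdint.h>
#include <string.h>

#define MAX_BINDINGS 16

static jmp_buf toplevel_handler;     /* stands in for the toplevel ML handler */
static const char *env_names[MAX_BINDINGS];
static int64_t env_values[MAX_BINDINGS];
static int env_size = 0;             /* committed part of the dynamic basis */

/* Hypothetical stand-in for an ML exception escaping to the toplevel. */
static void raise_to_toplevel(void) { longjmp(toplevel_handler, 1); }

/* A compiled toplevel declaration binding one name. */
typedef struct { const char *name; int64_t (*init)(void); } topdec;

/* Run each declaration in order; a binding is committed only if its init
   code returns normally. Returns the number of declarations that raised. */
int run_topdecs(const topdec *decs, int n) {
    volatile int raised = 0;
    for (int i = 0; i < n; i++) {
        if (setjmp(toplevel_handler) != 0) {  /* "Uncaught exception" */
            raised++;
            continue;
        }
        int64_t v = decs[i].init();
        if (env_size < MAX_BINDINGS) {
            env_names[env_size] = decs[i].name;
            env_values[env_size] = v;
            env_size++;
        }
    }
    return raised;
}

/* `val a = 5` and `val _ = raise Div`, compiled as separate topdecs. */
static int64_t init_a(void) { return 5; }
static int64_t init_raise_div(void) { raise_to_toplevel(); return 0; }
```

If both declarations were group-compiled into a single init function, the `longjmp` would also skip the commit of `a`, which is exactly the difference from MosML and SML/NJ behaviour.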
With respect to portability, what I meant was that the MLKit REPL could be compiled with both MLton and MLKit, as the solution does not require special MLKit features at the ML level (e.g., `Dynlib` : `DYNLIB`). Instead, all dynamic loading happens at the C level...
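Loading at the C level can be quite small. The sketch below uses the standard POSIX `dlopen`/`dlsym` calls; the `entry` argument stands for the label `L` of a `LOADRUN` command, while the name `load_and_run` and the convention that init code is a `void (void)` function are assumptions. `RTLD_GLOBAL` makes the symbols of each loaded library available when resolving libraries loaded later, which is what lets one toplevel declaration refer to labels defined by a previous one.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical type of the init code for one toplevel declaration. */
typedef void (*init_fn)(void);

/* Load `so_file`, look up `entry`, and run it.
   Returns 0 on success, -1 if loading or symbol lookup fails. */
int load_and_run(const char *so_file, const char *entry) {
    /* RTLD_GLOBAL: later libraries may resolve symbols defined here. */
    void *handle = dlopen(so_file, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    dlerror();                        /* clear any stale error */
    void *sym = dlsym(handle, entry);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "null symbol");
        return -1;
    }
    init_fn init;
    *(void **)&init = sym;            /* object-to-function pointer idiom */
    init();
    /* The handle is deliberately never closed: the code stays loaded. */
    return 0;
}
```

When `so_file` contains a slash, `dlopen` treats it as a pathname and does not search the library path, so the per-declaration libraries themselves need no `LD_LIBRARY_PATH` entry (their own dependencies still do, unless an rpath is set).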
I have now updated the description with a mini proof of concept, which demonstrates how value declarations and linking will work...
The diff is a bit hard for me to grasp, but I'm very much looking forward to trying this out! When do you think a runnable prototype will be ready?
It already works:
```
bash-3.2$ rm -rf MLB && SML_LIB=. ./bin/mlkit
MLKit v4.7.4 (v4.7.4-8-gd0770db9-dirty - 2023-10-06T13:44:36+02:00) [X64 Backend]
Disabling garbage collection - it is not supported with the REPL
. infix + - <; fun sum (n:int) : int = if n < 0 then 0 else n + sum(n-1);
infix 0 +
infix 0 -
infix 0 <
val sum = fn : int->int
. exception MyExn
. structure A = struct val a = 4.3 val b = sum 23 type t = bool end;
structure A =
struct
type t = bool
val a = 4.300000 : real
val b = 276 : int
end
exception MyExn
.
```
The next step is to load already compiled MLB-files... Not too hard, I think...
Is this expected to work under Linux yet? I get a very long list of dynamic linker errors when starting
It seems that only my local
There seem to be some issues with the dynamic linking under Linux (Ubuntu) that I cannot fix before having access to a physical machine, or at least some development machine I can log in to... @athas: any good suggestions for a machine I can log in to at DIKU? The old gpu0x servers mentioned at http://github.com/diku-dk/howto don't seem to work anymore, and I have had limited success with the hendrix cluster (I'm awaiting some approval on identity.ku.dk, I think)...
You should have access to these.
It now works under Linux to some extent:
Dynamic linking and position-independent code work annoyingly differently with ELF and Mach-O... I still need to get rid of the ld warnings and to do something about having to set `LD_LIBRARY_PATH`...
The sign bit in a NaN is not meaningful. Does SML really require that you print it like that?
According to the Standard ML Basis Library manual, `IEEEReal.toString` should prepend the sign, whereas `Real.toString` should collapse all NaNs to `nan`, which apparently leads to differences on different architectures...
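The difference can be illustrated in C (a sketch, not MLKit's implementation): a NaN's sign bit survives `copysign` and is visible through `signbit`, so a formatter that prints the sign reports `~nan` for some NaNs, whereas collapsing every NaN to `nan` hides it. Which sign a NaN produced by, e.g., `0.0/0.0` carries is hardware-dependent (x86 and ARM differ here), which is a plausible source of the architecture-dependent output. The function names are hypothetical.

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Collapse every NaN to "nan", ignoring its sign bit. */
void fmt_collapse(double x, char *buf, size_t n) {
    if (isnan(x)) snprintf(buf, n, "nan");
    else snprintf(buf, n, "%g", x);
}

/* Keep the sign bit, using SML's "~" for negative signs. */
void fmt_signed(double x, char *buf, size_t n) {
    if (isnan(x)) snprintf(buf, n, "%snan", signbit(x) ? "~" : "");
    else if (signbit(x)) snprintf(buf, n, "~%g", -x);
    else snprintf(buf, n, "%g", x);
}
```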
Also notice that the REPL pretty-printer is not finalised yet and therefore still prints some values incorrectly...
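To make the idea of a type-directed printer concrete, here is a sketch in which a runtime type descriptor drives printing, in the spirit of the `PRINT ty L` command from the PR description. The descriptor layout and the value representation (unboxed 64-bit integers, `bool` as an unboxed constructor tag with 0 = false and 1 = true, tuples as pointers to consecutive words) are assumptions made for the sketch, not MLKit's actual representation.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical runtime type descriptors for a type-directed printer. */
typedef enum { TY_INT, TY_BOOL, TY_TUPLE } ty_kind;

typedef struct ty {
    ty_kind kind;
    int arity;                  /* TY_TUPLE: number of components */
    const struct ty **elems;    /* TY_TUPLE: component types */
} ty;

/* Append `s` to buf at *pos, truncating rather than overflowing. */
static void emit(char *buf, size_t size, size_t *pos, const char *s) {
    while (*s != '\0' && *pos + 1 < size) buf[(*pos)++] = *s++;
    buf[*pos] = '\0';
}

static void print_val(const ty *t, uintptr_t v, char *buf, size_t size, size_t *pos) {
    switch (t->kind) {
    case TY_INT: {              /* unboxed machine word */
        char tmp[32];
        snprintf(tmp, sizeof tmp, "%lld", (long long)(intptr_t)v);
        if (tmp[0] == '-') tmp[0] = '~';   /* SML writes negation as ~ */
        emit(buf, size, pos, tmp);
        break;
    }
    case TY_BOOL:               /* nullary constructors as unboxed tags */
        emit(buf, size, pos, v ? "true" : "false");
        break;
    case TY_TUPLE: {            /* boxed: v points to `arity` words */
        const uintptr_t *fields = (const uintptr_t *)v;
        emit(buf, size, pos, "(");
        for (int i = 0; i < t->arity; i++) {
            if (i > 0) emit(buf, size, pos, ", ");
            print_val(t->elems[i], fields[i], buf, size, pos);
        }
        emit(buf, size, pos, ")");
        break;
    }
    }
}

/* Print the value `v` of type `t` into buf (size must be at least 1). */
void print_value(const ty *t, uintptr_t v, char *buf, size_t size) {
    size_t pos = 0;
    buf[0] = '\0';
    print_val(t, v, buf, size, &pos);
}
```

Because the descriptor encodes boxity per type, the same recursion could stop at abstract types for shallow printing or descend into their representation for deep printing.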
Cool, it works! I suggest terminating parsing at every newline. I also supported multi-line input in the Futhark REPL for a while, but I ditched it because it is ultimately too confusing for users. It is also annoying to have to type a semicolon after every input.
If you enter EOF (Ctrl-D), the REPL will go into an infinite loop. This is because the
I kind of like the multi-line input support, which allows pasting arbitrary source code containing newlines...
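A sketch of the input side, assuming input is accumulated line by line until a line contains a semicolon outside string literals and (possibly nested) comments, and that end of file is reported to the caller instead of being retried; repeatedly retrying a read that already hit EOF is one easy way to end up spinning. The helper name `read_topdec` and the simplified lexing are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Read one toplevel declaration from `in` into `buf`: whole lines are
   accumulated until a line contains a ';' outside string literals and
   (possibly nested) comments. Returns 1 when a declaration was read and
   0 on end of file or if `buf` fills up, so the caller can exit instead
   of looping. Lexing is deliberately simplified. */
int read_topdec(FILE *in, char *buf, size_t size) {
    size_t len = 0;
    int depth = 0, in_string = 0;          /* comment depth, string state */
    char line[1024];
    buf[0] = '\0';
    while (fgets(line, sizeof line, in) != NULL) {
        size_t n = strlen(line);
        if (len + n + 1 > size) return 0;  /* out of space: give up */
        memcpy(buf + len, line, n + 1);
        len += n;
        int done = 0;
        for (size_t i = 0; i < n; i++) {
            char c = line[i], d = line[i + 1];
            if (in_string) {
                if (c == '\\' && d != '\0') i++;   /* skip escaped char */
                else if (c == '"') in_string = 0;
            } else if (c == '(' && d == '*') { depth++; i++; }
            else if (depth > 0 && c == '*' && d == ')') { depth--; i++; }
            else if (depth == 0 && c == '"') in_string = 1;
            else if (depth == 0 && c == ';') done = 1;
        }
        if (done) return 1;
    }
    return 0;  /* EOF: report it rather than retrying the read */
}
```

Terminating at every newline instead would only change the `done` condition, so the choice between the two input styles stays a policy decision.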
It's your call, but you can look forward to spending the rest of your life answering questions from users as to why nothing happens when they enter an expression.
I'll just ask them to try out the example with MosML... ;)
I'm sure that the well-oiled Moscow ML support team can handle the torrent of MLKit REPL users. Besides, SML users seem to be rather proficient at using a REPL. It has been at least a decade since I got a question like the one @athas is worried about.
That is because there have been no new SML users since then.
Hooray, the day of interactive MLKit is upon us!
The purpose of this PR is to build a REPL for MLKit. Whereas the static aspects are mostly working, we need to decide on a mechanism for the dynamic aspects.
A Proposal
- `PRINT ty L`: On success, reply with `STR N s`, where `s` is the result of pretty-printing the value of type `ty` located at location `L`, and where `N` is the size of `s`.
- `LOADRUN so-file L`: Load the shared library file specified by `so-file` into the process and run its code specified by `L`. Before running the code, set up an exception handler that will print "Uncaught exception" in case of an uncaught exception. Reply with `DONE` on success and with `EXN` if an exception was raised.
- `TERMINATE`: Terminate the child process. Reply with `DONE` on success and with `FAIL` on failure.

An important design benefit of this approach is that the implementation is portable across Standard ML compilers, meaning that the Standard ML code involved does not depend on implementation details and dynamic linking support of a particular Standard ML compiler (the POSIX API will suffice). More specifically, the ML parent process, which hosts the compiler, will not need to perform any dynamic linking...
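The child's side of this protocol can be sketched as a dispatch on the command word. This is a minimal sketch under stated assumptions: one command per line, the type `ty` passed as a single token (the real type descriptions will be more complex), and `PRINT`/`LOADRUN` backed by the stub hooks `do_print` and `do_loadrun`, which are hypothetical. The `ERR` reply for unknown commands is not part of the proposal; it is invented for the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical hooks into the runtime, stubbed out here. */
static const char *do_print(const char *ty, const char *loc) {
    (void)loc;
    return strcmp(ty, "int") == 0 ? "276" : "-";
}
static int do_loadrun(const char *so_file, const char *label) {
    (void)so_file; (void)label;
    return 0;   /* 0 = ran normally, nonzero = uncaught exception */
}

/* Handle one command line and write the reply into `reply`.
   Returns 0 after TERMINATE (the loop should stop), 1 otherwise. */
int handle_command(const char *line, char *reply, size_t size) {
    char cmd[16], a[256], b[256];
    int n = sscanf(line, "%15s %255s %255s", cmd, a, b);
    if (n == 3 && strcmp(cmd, "PRINT") == 0) {
        const char *s = do_print(a, b);
        /* STR N s: the byte count tells the parent how much to read */
        snprintf(reply, size, "STR %zu %s", strlen(s), s);
    } else if (n == 3 && strcmp(cmd, "LOADRUN") == 0) {
        snprintf(reply, size, "%s", do_loadrun(a, b) == 0 ? "DONE" : "EXN");
    } else if (n == 1 && strcmp(cmd, "TERMINATE") == 0) {
        snprintf(reply, size, "DONE");
        return 0;
    } else {
        snprintf(reply, size, "ERR");   /* not part of the proposed protocol */
    }
    return 1;
}
```

Sending the length `N` before the payload lets the parent read exactly one reply even when the pretty-printed value itself contains newlines.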
Progress
- ... `LOADRUN` command after compilation.
- ... `PRINT` command(s).
- ... `:cmd;`, where `cmd` is a command: `cmd ::= set flag | set flag N | set flag S | unset flag | quit | help | help flag | flags`
- ... `:menu;` and `:menu N` for displaying groupings of flags.
- ... `:load mlb-file`. We'll first compile the MLB-file (if it hasn't been compiled yet) and then create an so-file from the o-files. The difficult part is to assemble the basis in a clever way. We can avoid loading all the deep bases for each modcode leaf and instead only load the elaboration bases. We can then compute which deep bases need to be loaded (similar to how the Manager does it for MLB compilation...)
- ... `PRINT` command. The design then makes use of a compiled type-indexed ML function, which has been exported to C (`basis/repl.sml`) and loaded together with the basis library at initialisation time. The difficult part is to make this design work with all representations of values in MLKit, including unboxed data types.
- ... `-no_basislib` is passed).

Open Questions
- ... `gcc -shared -o file.so file.o -l...`.
- ... `ty` given to `PRINT` may be somewhat complex, as the type needs to describe all datatypes used and the implementation details of each value constructor (i.e., boxity and tag). A nice aspect is that we can provide the "elaboration type" for shallow printing (without proceeding below abstract types) and the "implementation type" for deep printing (revealing the details of values with abstract types).
- ... `runtimeSystem.a`). The entry point of the runtime system is the function `main`, which parses command-line arguments and calls the function `code`. For non-REPL code generation, link-code is generated, which defines the function `code`. For REPL code generation, we link in initialisation code generated with `CodeGenX64.generate_repl_init_code()`, which declares the `code` function.

The Child Process
The child process is a program (started with a fork/exec pattern) that consists of the relevant runtime system (in fact, `runtimeSystemX.a` contains the `main` function), generated assembler code for allocating global regions, and a message loop that waits for messages from the parent (see above) and reacts accordingly.

A Mini Proof of Concept (https://gist.github.com/melsman/f3e8f587140fc1b8c82f630255b24169)
Below is a mini proof of concept in terms of a `Makefile` and a few C files illustrating the idea. The file `main.c` acts as the loading driver and `runtime.c` is the runtime system (region management, C-level wrappers for ML code, etc.). The files `a.c`, `b.c`, and `c.c` reflect toplevel declarations. Whereas `a.c` can refer to identifiers (labels) in `runtime.c`, `b.c` can also refer to identifiers (labels) declared in `a.c`. Notice that each toplevel declaration is compiled into a shared library that is loaded by `main.c`. Whereas the proof of concept just loads these shared libraries explicitly with calls to `dlopen` and looks up and runs the initialisation code for each toplevel declaration using calls to `dlsym`, the real implementation will include an interpreter that loads and initialises shared libraries as a result of `LOADRUN` calls.
Running this program yields the following output: