Serializing the state of the Starlark program/REPL #557
Comments
I'll share here my responses to another user who contacted me privately asking for a very similar feature. I've paraphrased their questions and edited my answers a little. To be clear, this feature is complex and invasive and I am not convinced it is worth supporting (or even fully feasible) in this repo; my advice below should be thought of as how to prepare a fork of the interpreter that supports it. ...
Does the state in your case involve running threads, or just the state of the heap after all threads have finished? The latter is a strictly simpler problem because it doesn't require you to get into the guts of the interpreter to the same degree; you just need to implement a GC-like marking phase over the heap and, for each object, serialize it. Of course it requires that you know how to serialize every type of object you encounter, so it needs the "closed world" assumption: it can be implemented if you control the entire application, but not as a library linked against unknown new types of starlark.Value. Given the closed-world assumption, it seems like it should be relatively easy for you to fork starlark-go and add the hooks you need; you shouldn't need to change the original code very much to do this, which should make it easy for you to keep up with patches (which are in any case infrequent). Do you expose any Go APIs that mention starlark-go? If so, this would of course make the problem harder.
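To illustrate the "marking phase plus per-object serialization" idea, here is a minimal sketch in Go. It does not use the starlark-go API: `Value`, `Int`, `List`, `Record`, `Snapshot`, and `Restore` are all hypothetical names invented for this example, and the closed world is reduced to two kinds of value. The point is only the shape of the algorithm: assign each reachable object an ID the first time it is marked, emit flat records that refer to each other by ID, and fail loudly on any type outside the closed world.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Value is a hypothetical stand-in for starlark.Value. Int and List are the
// entire closed world: the only kinds of value this serializer understands.
type (
	Value interface{}
	Int   int
	List  struct{ Elems []Value }
)

// Record is the flat, serialized form of one object; Refs index other records.
type Record struct {
	Kind string `json:"kind"`
	Int  int    `json:"int,omitempty"`
	Refs []int  `json:"refs,omitempty"`
}

type marker struct {
	ids  map[*List]int // mark table: list identity -> record index
	recs []Record
}

// visit marks v and everything reachable from it, returning v's record index.
func (m *marker) visit(v Value) (int, error) {
	id := len(m.recs)
	switch v := v.(type) {
	case Int:
		m.recs = append(m.recs, Record{Kind: "int", Int: int(v)})
	case *List:
		if seen, ok := m.ids[v]; ok {
			return seen, nil // already marked: preserves sharing and cycles
		}
		m.ids[v] = id
		m.recs = append(m.recs, Record{Kind: "list"})
		for _, e := range v.Elems {
			ref, err := m.visit(e)
			if err != nil {
				return 0, err
			}
			m.recs[id].Refs = append(m.recs[id].Refs, ref)
		}
	default:
		return 0, fmt.Errorf("cannot serialize %T: outside the closed world", v)
	}
	return id, nil
}

// Snapshot serializes the graph reachable from root; the root becomes record 0.
func Snapshot(root Value) ([]byte, error) {
	m := &marker{ids: map[*List]int{}}
	if _, err := m.visit(root); err != nil {
		return nil, err
	}
	return json.Marshal(m.recs)
}

// Restore allocates every object first, then wires up the references.
func Restore(data []byte) (Value, error) {
	var recs []Record
	if err := json.Unmarshal(data, &recs); err != nil || len(recs) == 0 {
		return nil, fmt.Errorf("invalid or empty snapshot (%v)", err)
	}
	vals := make([]Value, len(recs))
	for i, r := range recs {
		switch r.Kind {
		case "int":
			vals[i] = Int(r.Int)
		case "list":
			vals[i] = &List{}
		default:
			return nil, fmt.Errorf("unknown kind %q", r.Kind)
		}
	}
	for i, r := range recs {
		l, ok := vals[i].(*List)
		for _, ref := range r.Refs {
			if !ok || ref < 0 || ref >= len(vals) {
				return nil, fmt.Errorf("bad reference in record %d", i)
			}
			l.Elems = append(l.Elems, vals[ref])
		}
	}
	return vals[0], nil
}

func main() {
	shared := &List{Elems: []Value{Int(1), Int(2)}}
	root := &List{Elems: []Value{shared, shared}}
	root.Elems = append(root.Elems, root) // a cycle back to the root
	data, err := Snapshot(root)
	fmt.Println(string(data), err)
}
```

Because objects are referred to by ID rather than nested, a list that appears twice is restored as one shared object, and a cycle does not cause infinite recursion. The `default` case is where the closed-world assumption bites: any value type the application did not anticipate is a hard error rather than a silent omission.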
You would need to serialize the state of every thread. That means every Starlark frame (the operand stack, and all values reachable from it; the iterator stack, and all iterators) and every Go frame, including the local state of functions like sorted, min, and max, which all make Go->Starlark calls. Min and max make more than one, so you would need to remember the logical program counter too; and you'd need to do this for every built-in you've defined that can make callbacks (and for every future one that you add). If any of them hold locks, you'd need to record that. And then you need to arrange for both Starlark and Go frames to be resumable at a given logical program counter that makes a starlark.Call. And of course you'll also need to make sure that both ends of the channel agree on the Starlark version: you can't suspend in one version and resume in another.
If your task is to save the stack of a Starlark thread so that it can be resumed later, then you need to save the state of any Starlark function implemented in Go that happens to be active too. Functions like sorted, min, and max make callbacks, so can appear in the middle of the stack, not just as a leaf. Therefore you will need to redesign those functions so that they can be suspended and resumed. Of course, it's highly unlikely that any of these three particular functions will make a callback that does more than compare two values, far less trigger a thread suspension. But that's the discipline required by the model; and perhaps in future you will need to add more important Go functions that make Starlark callbacks.
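As a rough illustration of what "redesigning a built-in so that it can be suspended and resumed" might look like, here is a sketch of a `min`-like function whose local state lives in a serializable frame. Nothing here is part of starlark-go: `ErrSuspend`, `MinFrame`, `Less`, `Save`, and `LoadMinFrame` are hypothetical names, values are plain `int`s, and the callback is simply retried from its start on resume, whereas a real design would also have to resume the callee's own frame.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrSuspend is a hypothetical sentinel that a callback returns to request
// suspension of the whole thread.
var ErrSuspend = errors.New("suspend")

// Less stands in for a Go->Starlark callback such as a comparison.
type Less func(a, b int) (bool, error)

// MinFrame is the serializable local state of a suspendable min.
// Next acts as the logical program counter: the index of the next comparison.
type MinFrame struct {
	Items []int `json:"items"`
	Best  int   `json:"best"` // index of the smallest element seen so far
	Next  int   `json:"next"`
}

func NewMinFrame(items []int) (*MinFrame, error) {
	if len(items) == 0 {
		return nil, errors.New("min: empty sequence")
	}
	return &MinFrame{Items: items, Best: 0, Next: 1}, nil
}

// Run starts or resumes the computation. If the callback fails, Next is left
// pointing at the interrupted comparison, so a later Run retries it.
func (f *MinFrame) Run(less Less) (int, error) {
	for f.Next < len(f.Items) {
		lt, err := less(f.Items[f.Next], f.Items[f.Best])
		if err != nil {
			return 0, err
		}
		if lt {
			f.Best = f.Next
		}
		f.Next++
	}
	return f.Items[f.Best], nil
}

// Save and LoadMinFrame move the frame across a suspend/resume boundary.
func (f *MinFrame) Save() ([]byte, error) { return json.Marshal(f) }

func LoadMinFrame(data []byte) (*MinFrame, error) {
	f := new(MinFrame)
	if err := json.Unmarshal(data, f); err != nil {
		return nil, err
	}
	if len(f.Items) == 0 || f.Best < 0 || f.Best >= len(f.Items) || f.Next < 1 {
		return nil, errors.New("min: corrupt frame")
	}
	return f, nil
}

func main() {
	calls := 0
	less := func(a, b int) (bool, error) {
		if calls++; calls == 2 {
			return false, ErrSuspend // pretend the callback suspended the thread
		}
		return a < b, nil
	}
	f, _ := NewMinFrame([]int{5, 3, 8, 1})
	_, err := f.Run(less)
	data, _ := f.Save()
	fmt.Println(err, string(data)) // suspend {"items":[5,3,8,1],"best":1,"next":2}

	g, _ := LoadMinFrame(data) // later, possibly in another process
	fmt.Println(g.Run(less))   // 1 <nil>
}
```

The key design point is that no state is kept in Go local variables across the callback: everything the function needs to continue is in the frame, so suspending costs nothing extra beyond serializing it. Every built-in that can call back into Starlark would need the same treatment.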
It does make it simpler. You would need the act of executing the suspend operation to cause every active frame on the stack to record its state into a suspension (serializable continuation). You could do that by handling ErrSuspend after each CALL operation at each frame, or you could record the necessary information beforehand, similar to the way we update frame.pc for each program counter increment. Either way, you would need to ensure the operand stack and iter stack were saved in the frame or continuation. That's the serialization part. For deserialization, you would need to change Call and |
Thanks @adonovan for the detailed answer. In my case, it must be a lot simpler.
So the solution is still to manually walk all the objects in the heap and construct the object graph, like a GC's mark phase.
I agree, those two assumptions do make things a great deal simpler as they mean the only types of Value you need to deal with are your Protean "Role" type, plus those defined by the interpreter itself. The main types--string, list, dict, and so on--are all API-complete, so you can make a perfect clone of a dict by making queries on the public API of the original one. But that's not the case for a handful of opaque values such as that returned by the |
Actually, In any case, I will implement it the way you suggested. |
Is there a way to serialize the state of a Starlark execution at a certain point of execution, and sometime later load from the serialized state to continue execution?
Context: I am implementing a formal methods system that uses a Python-like language. To simplify the implementation, I am using Starlark (with Go) to execute individual statements. It is a model checker, so whenever there is a choice between two possible transitions, the model checker explores both possibilities. For example, a random choice between heads and tails is model checked by cloning the state of the system: in one world the coin comes up heads and we explore that path; in the other world the coin comes up tails and we explore that path.
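To make the branching concrete, here is a minimal sketch in Go of the exploration loop, with the interpreter left out entirely. `State`, `Transition`, and `Explore` are hypothetical names for this example, and the state is just a map of integer variables; in the real system the state would be the Starlark heap, which is exactly the part that is hard to clone.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// State is a hypothetical stand-in for the model's variables.
type State map[string]int

// Clone returns an independent copy, so each alternative gets its own world.
func (s State) Clone() State {
	c := make(State, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// Key returns a canonical encoding of the state, used to detect revisits.
func (s State) Key() string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%d;", k, s[k])
	}
	return b.String()
}

// Transition is one alternative of a nondeterministic choice; it mutates
// the state it is given.
type Transition func(State)

// Explore performs a breadth-first search in which every step offers the
// same alternatives, and returns the number of distinct states reached
// within depth steps (including the initial state).
func Explore(init State, choices []Transition, depth int) int {
	seen := map[string]bool{init.Key(): true}
	frontier := []State{init}
	for d := 0; d < depth; d++ {
		var next []State
		for _, s := range frontier {
			for _, choose := range choices {
				world := s.Clone() // fork: one world per alternative
				choose(world)
				if k := world.Key(); !seen[k] {
					seen[k] = true
					next = append(next, world)
				}
			}
		}
		frontier = next
	}
	return len(seen)
}

func main() {
	heads := func(s State) { s["heads"]++ }
	tails := func(s State) { s["tails"]++ }
	init := State{"heads": 0, "tails": 0}
	fmt.Println(Explore(init, []Transition{heads, tails}, 2)) // 6
}
```

Two coin flips reach six distinct states: the initial one, two after the first flip, and three after the second, because heads-then-tails and tails-then-heads land in the same state. The cost of `Clone` is paid once per alternative per state, which is why its performance dominates as models grow.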
To implement this, I see only a few options.
For the initial launch and small models, this is okay, but performance issues show up badly as users model larger systems.
I am looking for any alternate options available to clone or serialize/load the state of the execution.
If it is possible to serialize the REPL state and be able to restore, this would simplify the implementation significantly. Is this possible?