Using Bigarray as message storage #49

Open
andreas opened this issue Jan 14, 2018 · 5 comments

Comments

@andreas

andreas commented Jan 14, 2018

The README mentions using Bigarray as message storage, but I haven't been able to find any examples in this repo or elsewhere. I've implemented a module using Bigstring that satisfies Capnp.MessageSig.S, but it's still not clear to me how to serialize/deserialize messages in a zero-copy fashion, e.g. using Writer and Reader from Async. If you can point me to any examples of using Capnp with Bigarray, I would appreciate it.

Thanks 🙏

@pelzlpj
Contributor

pelzlpj commented Jan 15, 2018

I'm not aware of any examples using Bigarray or Bigstring. But if you've implemented a module that satisfies Capnp.MessageSig.S, you're just about done. Examples from the benchmark might be helpful:
https://github.com/capnproto/capnp-ocaml/blob/master/src/benchmark/capnpCarsales.ml
The first line instantiates the Carsales.Make functor on BytesMessage; if you instantiate on your Bigstring-based module instead, in theory that should give you zero-copy semantics for most of the struct field accessors. ("Most" because string fields require a copy for reasons of API practicality.)

Of course, if you want to send your message across some channel, the I/O is going to look different because you're not using Bytes-backed storage. The benchmark is based on Unix read and write (https://github.com/capnproto/capnp-ocaml/blob/master/src/benchmark/methods.ml) and I guess you would need to replace that with something that knows about Bigstring.

@talex5
Collaborator

talex5 commented Jan 15, 2018

I had a brief look at a Cstruct-backed version once. As I recall, the main problem was that https://github.com/capnproto/capnp-ocaml/blob/master/src/runtime/codecs.mli only works on BytesMessage (but that wouldn't be too hard to fix).

@pelzlpj
Contributor

pelzlpj commented Jan 15, 2018

Note that if you are actually trying to do message passing via mapped memory, you'll have some extra work to do.

When sending messages across a channel, Cap'n Proto specifies a standardized message framing format as well as a compression scheme. Messages get a small header prepended so that the receiver knows what's coming (how many segments in the message, and how long the segments are). This logic is captured in codecs.mli, and it's not generalized beyond BytesMessage because it wasn't clear whether it makes sense for other message storage formats.
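As a rough illustration of the header described above, here is a minimal standalone sketch in plain OCaml. `framing_header` is a hypothetical helper, not part of codecs.mli; it assumes the segment sizes are already known in 8-byte words and ignores the packed (compressed) encoding.

```ocaml
(* Hypothetical sketch of the standard stream framing header.
   Layout: (segment count - 1) as u32 LE, one u32 LE size (in words)
   per segment, then zero padding up to the next 8-byte boundary. *)
let framing_header (segment_words : int list) : Bytes.t =
  let n = List.length segment_words in
  if n = 0 then invalid_arg "framing_header: need at least one segment";
  (* 4 bytes for the count plus 4 bytes per segment size *)
  let unpadded = 4 + (4 * n) in
  (* round up to a whole word; this adds either 0 or 4 bytes *)
  let padded = (unpadded + 7) land lnot 7 in
  let buf = Bytes.make padded '\000' in
  Bytes.set_int32_le buf 0 (Int32.of_int (n - 1));
  List.iteri
    (fun i words -> Bytes.set_int32_le buf (4 + (4 * i)) (Int32.of_int words))
    segment_words;
  buf
```

For a single one-segment message the header is exactly one word; with two segments it grows to 12 bytes and is padded to 16.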

If you're using a shared memory transport, Cap'n Proto does not (yet) specify a format for the message framing information. The process which builds the message has to somehow communicate to the reader process some of the metadata about the message: where are the message segments located within your mapped buffer, and how big are they? You would have to decide on a convention for passing this information, and you would also have to ensure that the builder and reader appropriately synchronize their accesses to the buffer (e.g. with semaphores).

@andreas
Author

andreas commented Jan 15, 2018

Thanks for the input! My use case is efficiently folding over a large file containing many small messages (the current implementation uses bin_prot, and there seems to be time to be saved on deserialization).

If I understand correctly, I'll have to handle framing myself as described in the spec:

- (4 bytes) The number of segments, minus one (since there is always at least one segment).
- (N * 4 bytes) The size of each segment, in words.
- (0 or 4 bytes) Padding up to the next word boundary.
- The content of each segment, in order.

That seems fairly simple. Feel free to close the issue -- I'll report back if anything meaningful comes out of it.
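If the whole file is mapped into a Bigarray (for example with `Unix.map_file`), the framing layout quoted above can be walked without copying any segment data. The following is a minimal sketch under those assumptions: `bigstring`, `get_u32_le` and `fold_messages` are hypothetical names, the stream is assumed to be unpacked (no compression), and turning each list of (offset, length) pairs into a message depends on your own `Capnp.MessageSig.S` implementation, which isn't shown.

```ocaml
(* Hypothetical sketch: walk back-to-back framed (unpacked) Cap'n Proto
   messages stored in a Bigarray without copying segment data.
   Assumes a 64-bit OCaml (native ints wider than 32 bits). *)
open Bigarray

type bigstring = (char, int8_unsigned_elt, c_layout) Array1.t

(* Unsigned 32-bit little-endian integer at byte offset [off]. *)
let get_u32_le (buf : bigstring) off =
  let b i = Char.code (Array1.get buf (off + i)) in
  b 0 lor (b 1 lsl 8) lor (b 2 lsl 16) lor (b 3 lsl 24)

(* Fold [f] over every message in [buf]. Each message is passed to [f] as a
   list of (byte_offset, byte_length) pairs, one per segment, pointing into
   [buf]; the segment contents themselves are never copied. *)
let fold_messages (buf : bigstring) ~init ~f =
  let len = Array1.dim buf in
  let rec loop pos acc =
    if pos >= len then acc
    else begin
      if pos + 4 > len then failwith "truncated framing header";
      (* first field: number of segments minus one *)
      let n = get_u32_le buf pos + 1 in
      (* count + n sizes, rounded up to the next 8-byte word boundary *)
      let header_len = (4 + (4 * n) + 7) land lnot 7 in
      if pos + header_len > len then failwith "truncated framing header";
      (* walk the size table, tracking where each segment starts *)
      let rec segments i seg_pos acc_segs =
        if i = n then (List.rev acc_segs, seg_pos)
        else
          let seg_len = 8 * get_u32_le buf (pos + 4 + (4 * i)) in
          segments (i + 1) (seg_pos + seg_len) ((seg_pos, seg_len) :: acc_segs)
      in
      let segs, next = segments 0 (pos + header_len) [] in
      if next > len then failwith "truncated message body";
      loop next (f acc segs)
    end
  in
  loop 0 init
```

For instance, `fold_messages buf ~init:0 ~f:(fun n _ -> n + 1)` counts the messages in `buf`.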

@pelzlpj
Contributor

pelzlpj commented Jan 15, 2018

Hard to know without trying it, but I suspect that Bigstring storage isn't going to help much for that use case. mmap() tricks generally won't outperform read() if you're just walking through a file sequentially. Under that assumption, you might find that IO.create_read_context_for_channel is close to optimal for decoding messages.
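For comparison, a read()-style decoder needs nothing more than buffered channel reads. This is a minimal sketch of that approach in plain OCaml, not the implementation of `IO.create_read_context_for_channel`: `read_message` and `fold_channel` are hypothetical names, the stream is assumed to be unpacked (no compression), and each segment is copied into a fresh `Bytes.t`, i.e. Bytes-backed storage rather than Bigstring.

```ocaml
(* Hypothetical sketch: read unpacked, stream-framed Cap'n Proto messages
   sequentially from an in_channel using plain buffered reads.
   Assumes a 64-bit OCaml. *)

(* Read an unsigned 32-bit little-endian integer; raises End_of_file. *)
let read_u32_le ic =
  let b = Bytes.create 4 in
  really_input ic b 0 4;
  Int32.to_int (Bytes.get_int32_le b 0) land 0xFFFF_FFFF

(* Read one message as a list of segments, or None at end of input
   (a partial count field at EOF is also treated as end of input). *)
let read_message ic : Bytes.t list option =
  match read_u32_le ic with
  | exception End_of_file -> None
  | count_minus_one ->
    let n = count_minus_one + 1 in
    (* read the n segment sizes (in words), in order *)
    let rec read_sizes k acc =
      if k = 0 then List.rev acc else read_sizes (k - 1) (read_u32_le ic :: acc)
    in
    let sizes = read_sizes n [] in
    (* the header is 4 + 4n bytes, so 4 bytes of padding follow when n is even *)
    if n mod 2 = 0 then ignore (read_u32_le ic);
    let segs =
      List.fold_left
        (fun acc words ->
          let seg = Bytes.create (8 * words) in
          really_input ic seg 0 (8 * words);
          seg :: acc)
        [] sizes
    in
    Some (List.rev segs)

(* Fold [f] over every message until end of input. *)
let fold_channel ic ~init ~f =
  let rec loop acc =
    match read_message ic with
    | None -> acc
    | Some segs -> loop (f acc segs)
  in
  loop init
```

Because the file is consumed strictly front to back, the channel's internal buffering already gives the large sequential reads that make mmap() unnecessary here.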
