
Feature Request: Add ability to serialize and deserialize Batches and Collections #60

Open
jonbonazza opened this issue May 15, 2018 · 6 comments

Comments

@jonbonazza

jonbonazza commented May 15, 2018

I am working on a distributed cache that spreads moss Collections across a cluster of nodes. While I have it working for basic Get, Set, and Delete operations, without the ability to serialize Batches there isn't a good way to replicate batch operations. One solution would be to create my own batch implementation that can be serialized, then "replay" it on the receiving node to build a moss.Batch, but it would be more convenient if a Batch could simply be serialized directly and deserialized on the receiving end.

Similarly, I am using Raft for my replication, and it would be nice if I could serialize an entire Collection so that I can create a Raft snapshot periodically. Currently I iterate through all of the KVPs in the Collection and serialize them individually with my own serialization format, but this requires me to implement compaction and so on myself. Since moss already has its own persistence format, as well as its own compaction algorithm, it would be nice to reuse them.

I'm willing to implement both of these myself and submit PRs, but I was wondering if you had any pointers on doing this in a way that is backwards compatible and fits the overall vision and design goals of Moss.

@steveyen
Member

Hi @jonbonazza

https://github.com/huton-io/huton looks like a VERY interesting project! Neat!

One thought is a moss Batch/batch is a pretty thin wrapper around a Segment/segment, so adding a public method on a batch that allows access to the underlying Segment should be easy, BUT....

But, there's a BIG caveat: the existing Segment persistence approach only works if all your participating machines have the same endianness for ints (that's a main assumption that allows moss to gain higher performance)... https://github.com/couchbase/moss/blob/master/api.go#L69

If that's an acceptable caveat, then a Segment ought to be serializable with the existing loader/persister routines... https://github.com/couchbase/moss/blob/master/segment.go#L27

If that's not an acceptable caveat, then it'd be a lot more work (which I haven't thought through deeply -- but, your approach of iterating through the mutations and serializing them yourself is about as good as you might be able to achieve).
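For what it's worth, that mutation-replay approach can sidestep the endianness caveat by encoding with a fixed byte order on the wire. A minimal sketch of such a framing (a hypothetical wire format for shipping batch mutations between nodes, not anything in moss):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Mutation is one batch operation to replicate; on the receiving
// node each one would be replayed onto a fresh moss Batch.
type Mutation struct {
	Key, Val []byte
}

// encodeMutations emits a length-prefixed, little-endian stream,
// independent of the host machine's native endianness.
func encodeMutations(muts []Mutation) []byte {
	var buf bytes.Buffer // writes to bytes.Buffer never fail
	binary.Write(&buf, binary.LittleEndian, uint32(len(muts)))
	for _, m := range muts {
		binary.Write(&buf, binary.LittleEndian, uint32(len(m.Key)))
		buf.Write(m.Key)
		binary.Write(&buf, binary.LittleEndian, uint32(len(m.Val)))
		buf.Write(m.Val)
	}
	return buf.Bytes()
}

// decodeMutations reverses encodeMutations on the receiving node.
func decodeMutations(data []byte) ([]Mutation, error) {
	r := bytes.NewReader(data)
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	muts := make([]Mutation, 0, n)
	for i := uint32(0); i < n; i++ {
		var m Mutation
		for _, field := range []*[]byte{&m.Key, &m.Val} {
			var length uint32
			if err := binary.Read(r, binary.LittleEndian, &length); err != nil {
				return nil, err
			}
			*field = make([]byte, length)
			if _, err := io.ReadFull(r, *field); err != nil {
				return nil, err
			}
		}
		muts = append(muts, m)
	}
	return muts, nil
}

func main() {
	wire := encodeMutations([]Mutation{{Key: []byte("k"), Val: []byte("v")}})
	out, _ := decodeMutations(wire)
	fmt.Printf("%s=%s\n", out[0].Key, out[0].Val) // k=v
}
```

A real format would also need to tag deletes versus sets; this only shows the fixed-endian framing.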

> ... it would be nice if I could serialize an entire Collection so that I can create a Raft snapshot periodically

Since the moss persistence file format is append-only, a network copy or transfer of the entire moss subdirectory as-is from one machine to another ought to work without any new features or improvements to moss -- even while the subdirectory is concurrently receiving new mutations or undergoing compaction.

Again, that comes with the big caveat that the participating machines need to have the same architecture, or at least the same endianness.

And, of course, I might be missing some issue or other complication that I don't see yet.

@jonbonazza
Author

Hi,

I was looking at the SegmentPersister interface, and it seems to only support persisting to a File. In my use case, I need to send the data over the Raft network via gRPC.

In my particular use case, I don't have the luxury of streaming the data and instead need the entire byte slice in memory, but maybe it's better to have a function that serializes a Segment to an io.Writer; then I could just use a bytes.Buffer to get a byte slice.

@jonbonazza
Author

@steveyen Hey Steve, any updates on this?

@steveyen
Member

> @steveyen Hey Steve, any updates on this?

yikes -- sorry for the lack of response!

One quick thought is that the SegmentPersister interface does indeed persist to a File...

`Persist(file File, options *StoreOptions) (SegmentLoc, error)`
https://github.com/couchbase/moss/blob/master/segment.go#L95

...but, that File type is actually an interface, defined here...

https://github.com/couchbase/moss/blob/master/file.go#L29

It was designed that way so that users could pass in their own File implementations -- in your case, you can implement the interface so it's backed by memory buffers instead of a real file.

@jonbonazza
Author

Hey Steve, no worries at all!

That definitely seems like an interesting approach. I'll prototype it and let you know how it goes!

P.S. I spoke to one of your colleagues, Tron, at GopherCon in Denver today. He mentioned that Couchbase has since moved away from moss in favor of a more project-specific indexing implementation. I was curious what that means for the future of moss. Is it here to stay?

@tleyden
Contributor

tleyden commented Aug 29, 2018

@steveyen Hopefully that wasn’t disinformation, but I was under the impression that scorch was the new storage engine for FTS
