
Add stream chunking in MongoDB event store #172

Open
oskardudycz opened this issue Jan 11, 2025 · 0 comments
Labels
enhancement New feature or request mongodb

Comments


oskardudycz commented Jan 11, 2025

A valid concern was raised by @rkaw92 about the maximum size of a MongoDB document.

The maximum document size is 16MB, which in practice fits more data than the equivalent raw JSON, as BSON used in MongoDB is a binary format. If we keep our streams short, that should be sufficient for most cases.

Streams that outgrow a single document will need to be chunked into multiple documents. The current structure could be expanded to include a chunk number, with a unique index on stream name and chunk number. Once the size limit is reached, new events would go to a new document. Reaching the limit can be detected with:

  • an error on update,
  • a hard limit on the number of events,
  • chunking after a certain event type,
  • proactively calculating the size of the document in the background.
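The chunked document shape and the proactive checks above could be sketched as follows. Everything here is an assumption, not the final design: the `StreamChunk` field names, the event-count cap, and the byte margin below the 16MB limit (JSON length is only a rough proxy for BSON size, so staying well under the limit seems safer):

```typescript
// Hypothetical shape of one stream-chunk document (field names are assumptions).
type Event = { type: string; data: Record<string, unknown> };

interface StreamChunk {
  streamName: string;
  chunkNumber: number; // 0-based; unique together with streamName
  events: Event[];
}

// Unique compound index keeping one document per (stream, chunk) pair, e.g.
// collection.createIndex({ streamName: 1, chunkNumber: 1 }, { unique: true }).
const chunkIndexSpec = { streamName: 1, chunkNumber: 1 } as const;

// Assumed limits: a hard event-count cap plus a byte threshold with a wide
// margin below MongoDB's 16MB document limit.
const MAX_CHUNK_BYTES = 12 * 1024 * 1024;
const MAX_CHUNK_EVENTS = 1000;

function shouldStartNewChunk(chunk: StreamChunk, next: Event): boolean {
  if (chunk.events.length + 1 > MAX_CHUNK_EVENTS) return true;
  // JSON length as a cheap approximation of the resulting BSON size.
  const projectedBytes =
    JSON.stringify(chunk).length + JSON.stringify(next).length;
  return projectedBytes > MAX_CHUNK_BYTES;
}
```

A check like this covers the "hard limit" and "proactive size calculation" triggers; the error-on-update and event-type triggers would hook in elsewhere.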

We can make the decision about which chunk to store to on append (so after an error, or after reaching another condition). It is worth checking whether the $merge operator could be helpful here.
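The append-time decision could look roughly like this. `appendToChunk` is a stand-in for the real update (which would carry an optimistic-concurrency check, and might be an aggregation pipeline using $merge); the `"size-limit"` failure reason is likewise an assumption:

```typescript
// Outcome of attempting an append into a specific chunk (assumed shape).
type AppendResult = { success: true } | { success: false; reason: "size-limit" };

// On a size-limit failure, roll the events over into the next chunk and retry.
// Returns the chunk number the events actually landed in.
function appendEvents(
  appendToChunk: (chunkNumber: number) => AppendResult,
  currentChunk: number
): number {
  const result = appendToChunk(currentChunk);
  if (result.success) return currentChunk;
  const retried = appendToChunk(currentChunk + 1);
  if (!retried.success) throw new Error("append failed even on a fresh chunk");
  return currentChunk + 1;
}
```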

When reading, we'd target the last chunk (which can be found by filtering on the stream name and ordering by chunk number).
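In MongoDB that read would be roughly `collection.find({ streamName }).sort({ chunkNumber: -1 }).limit(1)`; the same selection done in memory, for illustration only:

```typescript
interface ChunkHeader {
  streamName: string;
  chunkNumber: number;
}

// Pick the chunk with the highest chunk number for a given stream —
// the in-memory equivalent of sorting by chunkNumber descending, limit 1.
function lastChunk<T extends ChunkHeader>(
  chunks: T[],
  streamName: string
): T | undefined {
  return chunks
    .filter((c) => c.streamName === streamName)
    .reduce<T | undefined>(
      (last, c) =>
        last === undefined || c.chunkNumber > last.chunkNumber ? c : last,
      undefined
    );
}
```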

We may also need to use snapshots. Snapshots or summary events should allow targeting a single document. We'd first read the snapshot from the last chunk and then read the slice of events starting from the stream position at which the snapshot was made (potentially no events at all, if we take a snapshot on each change).
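A sketch of that read path, under an assumed document shape where the last chunk carries an optional snapshot tagged with the stream position it was taken at:

```typescript
// Assumed shape: the chunk records the global stream position of its first
// event, and optionally a snapshot taken at some position within the stream.
interface ChunkWithSnapshot<State, E> {
  snapshot?: { state: State; streamPosition: number };
  chunkStartPosition: number; // stream position of events[0]
  events: E[];
}

// Events still to replay on top of the snapshot: everything appended after
// the snapshot's position. If a snapshot is taken on each change, this slice
// is empty and the snapshot alone rebuilds the state.
function eventsAfterSnapshot<State, E>(chunk: ChunkWithSnapshot<State, E>): E[] {
  if (!chunk.snapshot) return chunk.events;
  const offset = chunk.snapshot.streamPosition - chunk.chunkStartPosition + 1;
  return chunk.events.slice(Math.max(0, offset));
}
```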

If the snapshot schema gets a breaking change, we'd need to recalculate it based on all chunks.

The current inline projections are good enough for handling snapshots; just some syntactic sugar (similar to the default projections) would be needed. We also already have event slicing upon read.

It'd also be good to eventually run benchmarks measuring how performance degrades as the stream size increases.

@oskardudycz oskardudycz added enhancement New feature or request mongodb labels Jan 11, 2025