Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate temporary buffer allocations during bson serialization #1628

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Edarke
Copy link
Contributor

@Edarke Edarke commented Feb 10, 2025

Fixes: JAVA-5786

  • Override Outputbuffer.toByteArray() to avoid extraneous array copy
  • Modify BsonBinaryWriter to call OutputBuffers.writeObjectId so implementations can opt to use putToByteBuffer rather than allocating a temporary byte[]
  • Wrap array in ByteBuffer so we can write numeric values 8 bytes at a time

Motivation

Atlas Search uses the Java driver to communicate with the Mongodb Server. We serialize up to ~300k documents per batch, primarily consisting of ObjectID and numbers. For these large batches, serialization has measurable overhead. This PR proposes some simple optimizations for BasicOutputBuffer without altering the API.

JMH Benchmarks

Benchmark Main (ops/ms) GC (B/op) This PR (ops/ms) GC (B/op)
toByteArray_large 0.907 26,777,857 2.269 13,388,928
toByteArray_small 110,515 160 184,816 80
encodeId 23,774 134 32,696 104
write_id 28,255 168 36,467 80
write_numbers 16,327 80 18,125 80

@jyemin
Copy link
Collaborator

jyemin commented Feb 10, 2025

@Edarke please open a Jira issue in the JAVA project for this PR, as per our contribution guide.

Thanks!

@@ -35,6 +38,12 @@ public class BasicOutputBuffer extends OutputBuffer {
private byte[] buffer;
private int position;

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems error-prone (and confusing) to add this field without completely replacing use of the two existing fields with the new one. For instance, you have to always remember to use absolute position when writing to buf, so that its own position is ignored in favor of position field.

What do you think about getting rid of buffer and position fields entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Updated in 0385da6

@Edarke Edarke requested a review from jyemin February 11, 2025 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants