JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters #1629

franz1981 · 2025-02-11T09:41:52Z

While running few benchmarks vs https://github.com/quarkusio/quarkus-super-heroes/blob/main/README.md using native image (graal CE) we found

writeCharacters heap-based code path to be bloated by:

many not inlined calls: which Graal CE should be blamed since it has a very simple and naive inliner
an excessive usage of logic to handle the current buffer to be used to write chars into
zero loop unrolling due to the complexity of the main loop; since native image cannot profile branches cannot "shape" the assembly to grant latin to be a fast path

This patch is manually inlining the fast paths for both latin chars and heap buffers, assuming them to have enough available space.
This should help immensely JVM mode as well (i.e. JDK Hotspot) - so if there's any JMH bench I can re-use or run to verify it would be great.

NOTE: i could change the assumption which make the optimization to be triggered to be more "elastic" and use the heap buffers (byte[]) remaining capacity and perform checkpoints to refill it, too, but it would make the logic way more complex - still more elastic.
Let me know what you prefer 🙏

franz1981 · 2025-02-11T22:20:04Z

@jyemin I see I have to open an issue there but I need a bit of help to find if there is already a JMH benchmark which already cover this; if there is not I can provides a new one myself

franz1981 · 2025-02-12T10:30:04Z

This is a flamegraph coming from the same test running in JVM mode (JDK 21)

which reports a similar story.

franz1981 · 2025-02-15T16:30:13Z

@jyemin I have just verified that since we are using the Netty driver as well, the core path there will uses (direct) Netty buffers too, so I think I will expand this optimization to include them, checking which optimization path is more appropriate there e g.

branchless ASCII check with 8 chars (using 2 longs storing 4 UTF-16 chars each)
branchless zero check leveraging ASCII bytes
batch buffer writes

franz1981 · 2025-02-16T07:53:29Z

I have added an extra commit to improve the case for the Netty buffers as well.
I can't modify the original JIRA, to explain and include such, how I can do it?

franz1981 · 2025-02-16T21:07:42Z

driver-core/src/main/com/mongodb/internal/connection/ByteBufferBsonOutput.java

+        }
+    }
+
+    private int writeCharactersOnNettyByteBuf(String str, boolean checkForNullCharacters, ByteBuf buf) {


I have tried hard to reduce the burden of Netty accessibility checks and the CPU cost of writing byte per byte with this, to not mention amortize the cost of branch mispredict for less common cases (non ASCII String).

franz1981 · 2025-02-17T07:38:51Z

I have performed some JMH benchmarks and the Netty approach, despite is way better than the original code, is still much slower than the plain array case, byte per byte; I will experiment a bit more by using some Netty internalNioBuffer's API - although, since we create a Netty swapped wrapper (not needed actually) it could not work

franz1981 · 2025-02-17T22:09:32Z

I've added ff360c0 to show what means using the internal NIO buffer from Netty
I'll post some JMH bench results soon - although native image CE won't benefit a lot from this since NIO ByteBuffer(s) have some "wide" enough OOP hierarchy which can still make single byte access to be too costly - which is something the "batchy" version instead was saving.

franz1981 force-pushed the write_chars_heap_ascii branch from fc1c1f8 to 82782ee Compare February 12, 2025 10:01

franz1981 changed the title ~~Latin heap array fast path~~ JAVA-5788 Latin heap array fast path Feb 12, 2025

jyemin added the external label Feb 13, 2025

franz1981 marked this pull request as draft February 15, 2025 16:30

franz1981 force-pushed the write_chars_heap_ascii branch from bc4af89 to 7b80312 Compare February 16, 2025 07:09

JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters

b2cd2b8

franz1981 force-pushed the write_chars_heap_ascii branch from 7b80312 to b2cd2b8 Compare February 16, 2025 14:59

franz1981 marked this pull request as ready for review February 16, 2025 14:59

franz1981 changed the title ~~JAVA-5788 Latin heap array fast path~~ JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters Feb 16, 2025

franz1981 commented Feb 16, 2025

View reviewed changes

Use internal NIO ByteBuffer from Netty

ff360c0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters #1629

JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters #1629

franz1981 commented Feb 11, 2025 •

edited

Loading

franz1981 commented Feb 11, 2025

franz1981 commented Feb 12, 2025

franz1981 commented Feb 15, 2025

franz1981 commented Feb 16, 2025

franz1981 Feb 16, 2025

franz1981 commented Feb 17, 2025

franz1981 commented Feb 17, 2025

JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters #1629

Are you sure you want to change the base?

JAVA-5788 Improve ByteBufferBsonOutput::writeCharacters #1629

Conversation

franz1981 commented Feb 11, 2025 • edited Loading

franz1981 commented Feb 11, 2025

franz1981 commented Feb 12, 2025

franz1981 commented Feb 15, 2025

franz1981 commented Feb 16, 2025

franz1981 Feb 16, 2025

Choose a reason for hiding this comment

franz1981 commented Feb 17, 2025

franz1981 commented Feb 17, 2025

franz1981 commented Feb 11, 2025 •

edited

Loading