From 6b1a3697d35719b2e381200cd7570a92ae18c615 Mon Sep 17 00:00:00 2001
From: chaokunyang
Date: Mon, 15 Apr 2024 11:26:47 +0000
Subject: [PATCH] =?UTF-8?q?=F0=9F=94=84=20synced=20local=20'docs/specifica?=
 =?UTF-8?q?tion/'=20with=20remote=20'docs/specification/'?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/specification/xlang_serialization_spec.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/docs/specification/xlang_serialization_spec.md b/docs/specification/xlang_serialization_spec.md
index 15094c524..4641d2ba5 100644
--- a/docs/specification/xlang_serialization_spec.md
+++ b/docs/specification/xlang_serialization_spec.md
@@ -559,7 +559,7 @@ Map iteration is too expensive, Fury won't compute the header like for list sinc
 Users can use `MapFieldInfo` annotation to provide the header in advance. Otherwise Fury will use first key-value pair to
 predict header optimistically, and update the chunk header if the prediction failed at some pair.
 
-Fury will serialize the map chunk by chunk, every chunk has 127 pairs at most.
+Fury will serialize the map chunk by chunk, every chunk has 255 pairs at most.
 
 ```
 | 1 byte | 1 byte | variable bytes |
@@ -592,6 +592,21 @@ format will be:
 `KV header` will be a header marked by `MapFieldInfo` in java. For languages such as golang, this can be computed in
 advance for non-interface types most times.
 
+#### Why serialize chunk by chunk?
+
+When Fury uses the first key-value pair to predict the header optimistically, it can't know how many pairs share the
+same meta (tracking key ref, key has null, and so on). If we don't write chunk by chunk with a max chunk size, we must
+reserve at least `X` bytes as a placeholder and later fill in the number of pairs that share the same meta, where `X`
+is the number of bytes needed for the varint encoding of the map size.
+
+Most maps have fewer than 255 pairs, so if all pairs have the same meta, there will be only one chunk. This is common
+in golang/rust, where objects are not references by default.
+
+Also, if only one or two keys have different meta, we can put them into a different chunk, so that most pairs can
+share meta.
+
+The implementation can compare the accumulated read count with the map size to decide whether to read more chunks.
+
 ### enum
 
 Enums are serialized as an unsigned var int. If the order of enum values change, the deserialized enum value may not be
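
Reviewer note: the chunking rationale added by this patch can be illustrated with a minimal Python sketch. It is not Fury's implementation or wire format. The helper names (`pair_meta`, `to_chunks`, `read_pairs`) and the two-bit meta layout are hypothetical, and the sketch buffers chunks in memory instead of predicting a header from the first pair and patching it later. It only shows the two properties the new section argues for: pairs with the same meta share one chunk of at most 255 pairs, and the reader stops once the accumulated pair count reaches the map size.

```python
from typing import Any, Iterable, List, Tuple

MAX_CHUNK_SIZE = 255  # from the patched spec text: at most 255 pairs per chunk

Pair = Tuple[Any, Any]
Chunk = Tuple[int, List[Pair]]  # (meta, pairs that share this meta)


def pair_meta(key: Any, value: Any) -> int:
    """Hypothetical meta byte: bit 0 = key is None, bit 1 = value is None."""
    return (1 if key is None else 0) | (2 if value is None else 0)


def to_chunks(pairs: Iterable[Pair]) -> List[Chunk]:
    """Group consecutive pairs with equal meta into chunks of at most 255 pairs."""
    chunks: List[Chunk] = []
    for key, value in pairs:
        meta = pair_meta(key, value)
        if chunks and chunks[-1][0] == meta and len(chunks[-1][1]) < MAX_CHUNK_SIZE:
            # Same meta and room left: the pair joins the current chunk.
            chunks[-1][1].append((key, value))
        else:
            # Meta changed or the chunk is full: start a new chunk.
            chunks.append((meta, [(key, value)]))
    return chunks


def read_pairs(map_size: int, chunks: Iterable[Chunk]) -> List[Pair]:
    """Read chunks until the accumulated pair count reaches the map size."""
    result: List[Pair] = []
    chunk_iter = iter(chunks)
    while len(result) < map_size:
        _meta, chunk_pairs = next(chunk_iter)
        result.extend(chunk_pairs)
    return result
```

With this grouping, a map of 300 pairs that all share one meta yields two chunks (255 and 45 pairs), while a single pair with a null value in the middle of a map is isolated in its own chunk and the surrounding pairs keep sharing their meta.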