Set Buf size before encoding value #1189

Open
samuelorji opened this issue Sep 5, 2024 · 4 comments
@samuelorji

samuelorji commented Sep 5, 2024

Hi,

Thanks for your contribution.

I have a question about how the internal buffer, which is part of the thread-local writer pool, relates to the writer config.

Unless I'm mistaken, the buffer only gets resized after the value has been encoded. When writing to a stream, by contrast, the buffer is resized based on the supplied config before the value is encoded.

As a practical example, say I have to writeToArray a Map[String, String], and the size of each entry in the map is < 2 KB.

I define my writer config in terms of the size of the Map, so that the buffer is the right size for each response, like this:

val map: Map[String, String] = ??? // placeholder for the actual response data
writeToArray(map, WriterConfig.withPreferredBufSize(AVERAGE_RESPONSE_SIZE * (map.size + 1))) // +1 in case of an empty map

where AVERAGE_RESPONSE_SIZE = 2 KB.

Now, if the size of the first map is 2, my buffer size should be 4 KB; if the next map has a size of 200, then my buffer size should be 400 KB.

Going by the logic in the code, the first invocation uses the buffer at its previous size, which is the default 32 KB, encodes my value, and then shrinks the buffer to 4 KB. Doesn't this mean that the second Map, for which I want a 400 KB buffer, will start encoding with a 4 KB buffer, because reallocateBufToPreferredSize is only called after the encoding has happened?

Is this the way it's expected to behave?

Would a better approach be to resize the buffer before encoding?

I don't mind submitting a PR if you think that's the preferred solution.

Thanks

Repository owner deleted a comment Sep 5, 2024
@plokhotnyuk
Owner

plokhotnyuk commented Sep 5, 2024

@samuelorji Hi, Samuel!

Thanks for the question! The topic of preferred sizes for internal buffers should definitely be covered in the docs.

Yes, it is expected behavior: internal buffers are reallocated to their preferred sizes to avoid holding on to big chunks of memory that are no longer needed.

So, when writing to an array, if you want to avoid redundant allocations, just set the preferred size to the maximum expected size of your JSON representation.

Use writeToSubArray as the ultimate way to avoid internal reallocations, but that requires preallocating an array that is large enough for all inputs.
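To make the trade-off concrete, here is a toy model of the lifecycle described in this thread: a pooled buffer grows on demand while encoding and is shrunk to the preferred size only afterwards. It is a sketch under stated assumptions, not jsoniter-scala's implementation: `PooledWriterModel`, `BufferStrategies` and `replay` are hypothetical names, and the "double until it fits" growth strategy is assumed purely for illustration.

```scala
// Toy model only: mimics the grow-while-encoding / shrink-afterwards lifecycle
// discussed in this thread. It is NOT jsoniter-scala's implementation.
final class PooledWriterModel(initialSize: Int) {
  require(initialSize > 0, "initial size must be positive")

  private var capacity: Int = initialSize
  private var reallocations: Int = 0

  def currentCapacity: Int = capacity
  def reallocationCount: Int = reallocations

  // Simulates one writeToArray-style call that produces `payloadSize` bytes.
  def write(payloadSize: Int, preferredBufSize: Int): Unit = {
    require(preferredBufSize > 0, "preferred size must be positive")
    // During encoding: grow on demand (assumed strategy: double until it fits,
    // counted here as a single reallocation).
    if (payloadSize > capacity) {
      while (capacity < payloadSize) capacity *= 2
      reallocations += 1
    }
    // After encoding: shrink back only if the buffer outgrew the preferred size.
    if (capacity > preferredBufSize) {
      capacity = preferredBufSize
      reallocations += 1
    }
  }
}

object BufferStrategies {
  val DefaultBufSize: Int = 32 * 1024 // default size mentioned in the thread

  // Replays (payloadSize, preferredBufSize) calls on one pooled buffer and
  // returns (final capacity, number of internal reallocations).
  def replay(calls: Seq[(Int, Int)]): (Int, Int) = {
    val writer = new PooledWriterModel(DefaultBufSize)
    calls.foreach { case (payload, preferred) => writer.write(payload, preferred) }
    (writer.currentCapacity, writer.reallocationCount)
  }
}
```

Replaying four alternating writes of 3,000 and 300,000 bytes, the model counts 6 internal reallocations when the preferred size follows each call (4 KB, 400 KB, 4 KB, 400 KB) and 2 when it is fixed at 400 KB for every call. In this model, once the buffer has settled at the fixed preferred size, further writes that fit cause no reallocations at all.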

@samuelorji
Author

samuelorji commented Sep 5, 2024

@plokhotnyuk So what you mean is that it's preferable to set it once to the maximum expected size of the JSON representation, as opposed to constantly setting it based on the expected size of the input.

Using the previous example, if I call writeToArray twice, where the first call needs 4 KB and the second needs 400 KB, you're saying it's not advisable to set the buffer to 4 KB for the first call and 400 KB for the second, but instead to set it to the maximum of 400 KB, as that covers both.

Would setting the buffer to the required size before encoding have any performance consequences, apart from increased GC activity, since instead of potentially reusing one buffer we would always be recreating buffers (both small and big)?

Is this a fair pros and cons list:

| Method | Pros | Cons |
| --- | --- | --- |
| Reusable buffer (resize after encoding) | Less GC activity, as the same buffer is reused | Probably not memory efficient for multiple small writes. E.g., if each write takes 4 KB, then we're potentially consistently using 8x more memory for the same task |
| Resize buffer before encoding | Somewhat more memory efficient, as you create a buffer that's just big enough for your writes | More GC activity, as buffers are often recreated |

Looking at it from an HTTP response angle where you don't know the size of the response to the user, could be 1kB, could be 400KB,

@samuelorji
Author

@plokhotnyuk, this also means that if there's an abnormally large outlier, say only 1% of the responses being about 400 KB when the average is 4 KB, then the buffer remains 400 KB forever. But I guess that's fine, considering there's just one instance of it per thread: there's some overhead, but it's not so bad.

@plokhotnyuk
Owner

@samuelorji A good option would be to have server-side metrics for your response sizes.
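As one possible way to act on such metrics, the sketch below picks a candidate preferred buffer size from observed response sizes using a nearest-rank percentile. `ResponseSizeStats` and `percentile` are hypothetical helpers, not part of jsoniter-scala or any metrics library, and the sample data is made up to mirror the "1% outlier" case from this thread.

```scala
object ResponseSizeStats {
  // Nearest-rank percentile: the smallest sample such that at least p percent
  // of all samples are less than or equal to it.
  def percentile(sizes: Seq[Int], p: Double): Int = {
    require(sizes.nonEmpty, "need at least one sample")
    require(p > 0.0 && p <= 100.0, "p must be in (0, 100]")
    val sorted = sizes.sorted
    val rank = math.ceil(p * sorted.length / 100.0).toInt
    sorted(math.max(rank, 1) - 1)
  }

  // Made-up sample: 99 responses of 4 KB and one 400 KB outlier.
  val sample: Seq[Int] = Seq.fill(99)(4 * 1024) :+ (400 * 1024)
}
```

For this sample, `ResponseSizeStats.percentile(ResponseSizeStats.sample, 99.0)` is 4096 and the 100th percentile is 409600. Whether to size the buffer for the 99th percentile (and accept growth on rare outliers) or for the maximum (and keep a larger buffer per thread) is the trade-off discussed above.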

Also, if you can run async-profiler in production (or in a performance environment with reasonable simulations), then you can measure CPU cycles and allocations more precisely.

Which HTTP server do you currently use?

Please see here how to measure scalability, CPU usage and allocations for the most popular Scala HTTP servers.
