Huge RAM usage on big file uploads #2856
Can you provide a minimal code sample that reproduces this? What partSize are you using?
Here is my minimal example (sketched below).
You can note that the callback is also not called, but that is filed as a separate issue...
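The original snippet wasn't preserved in this thread; below is a minimal sketch of this kind of reproduction, with hypothetical bucket, key, and file path, using the synchronous PutObject (the original may have used an async variant with a completion callback):

```cpp
#include <aws/core/Aws.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/PutObjectRequest.h>
#include <iostream>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::S3Crt::ClientConfiguration config;
        config.partSize = 50 * 1024 * 1024; // 50MB parts, as in the report

        Aws::S3Crt::S3CrtClient client(config);

        Aws::S3Crt::Model::PutObjectRequest request;
        request.SetBucket("my-bucket");     // hypothetical bucket
        request.SetKey("big-file.bin");     // hypothetical key

        auto body = Aws::MakeShared<Aws::FStream>(
            "repro", "/path/to/big-file.bin", // hypothetical multi-GB file
            std::ios_base::in | std::ios_base::binary);
        request.SetBody(body);

        // Watch the process RSS while this runs; it climbs to roughly 20x partSize.
        auto outcome = client.PutObject(request);
        if (!outcome.IsSuccess()) {
            std::cerr << outcome.GetError().GetMessage() << std::endl;
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```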
The CRT S3 client will automatically split big uploads into multiple parts and upload them in parallel. So during an upload, CRT will hold several part-sized buffers in memory, depending on the overall parallelism settings. Depending on how big the file is and how many parts you are uploading at the same time, 1GB might be a reasonable number. On top of that, CRT will pool buffers to avoid reallocating them over and over again, so you might see CRT holding on to a larger chunk of memory than you would expect. Buffer pools are cleared after some period of inactivity.
Well, to be frank, 1GB to upload a file, whatever its size, is a huge price to pay. In a restricted cloud environment, this is a ridiculous amount of RAM, not to mention that we could have multiple uploads running simultaneously.
S3 has a fairly low per-connection throughput, so to reach a decent overall throughput, CRT needs to run several connections in parallel and buffer a considerable portion of the data being uploaded. The amount of parallelism used by CRT can be controlled by the target throughput setting (https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/include/aws/s3-crt/ClientConfiguration.h#L58). Unfortunately, that setting already defaults to the lowest possible value in the C++ SDK, and setting it lower will not have an impact on memory usage.

Note: the overall max memory usage for the client has an upper bound that is derived from the part size and the number of connections (which in turn is derived from max throughput). So memory usage does not scale directly with the number of S3 requests queued up on the client, and once that upper bound is reached, memory usage will stay there.

We've made several improvements to the underlying C CRT libs with regards to memory usage in the past couple of months that haven't made their way into the C++ SDK yet, so I would be interested in learning about your use cases. What kind of instances are you running the code on? What is the overall RAM on the system and the NIC bandwidth? What are the typical file sizes you are trying to upload?
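For reference, a sketch of where those two knobs live on the CRT client configuration (values are illustrative only, and as noted above, the target throughput already defaults low, so lowering it further won't reduce memory usage):

```cpp
#include <aws/core/Aws.h>
#include <aws/s3-crt/S3CrtClient.h>

void ConfigureCrtClient()
{
    Aws::S3Crt::ClientConfiguration config;
    // throughputTargetGbps controls how many parallel connections CRT opens;
    // together with partSize it bounds the buffer memory the client can hold.
    config.throughputTargetGbps = 1.0;     // illustrative value
    config.partSize = 8 * 1024 * 1024;     // illustrative 8MB parts (S3 multipart minimum is 5MB)

    Aws::S3Crt::S3CrtClient client(config);
    // ... use client ...
}
```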
Hi @DmitriyMusatkin and thank you for the reply. I was actually wondering if setting the throughput to a lower value would help. Heh, too bad for me. I suppose that if the changes to memory usage do not directly affect those buffers, they will not help me much. In essence, we are a SaaS provider and there are times when we need to push data, most likely in files of a few GB, but it can go to tens of GB (there is no actual limit), hence my questions. That being said, for now we have switched to using TransferManager, which allows us to better control memory usage (rough sketch below).
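A sketch of that kind of TransferManager setup, with a hypothetical path, bucket, and key, and illustrative buffer sizes (assumes Aws::InitAPI has already been called):

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/PooledThreadExecutor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>

void UploadWithTransferManager()
{
    auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>(
        "executor", 4); // worker thread count is illustrative

    Aws::Transfer::TransferManagerConfiguration tmConfig(executor.get());
    tmConfig.s3Client = Aws::MakeShared<Aws::S3::S3Client>("s3client");
    tmConfig.bufferSize = 16 * 1024 * 1024;                // per-part buffer (illustrative)
    tmConfig.transferBufferMaxHeapSize = 64 * 1024 * 1024; // cap on total buffer memory

    auto transferManager = Aws::Transfer::TransferManager::Create(tmConfig);

    auto handle = transferManager->UploadFile(
        "/path/to/big-file.bin",          // hypothetical local path
        "my-bucket", "big-file.bin",      // hypothetical bucket/key
        "application/octet-stream", {});
    handle->WaitUntilFinished();
}
```

The transferBufferMaxHeapSize cap is what makes the memory ceiling explicit here, which is the kind of knob the CRT client doesn't currently expose.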
Thanks for bringing your use case to our attention. I'm sorry that S3CrtClient doesn't currently fit your needs. I'm changing this issue to a feature request: add additional options for configuring the S3CrtClient. If you have any ideas for which settings you would like to configure, please let us know, but I can't guarantee that we will be able to implement them.
Describe the bug
I want to upload a big file, so I wanted to increase the partSize of my S3CrtClient configuration.
But it seems the RAM consumption of my process is a direct multiple (around 20x) of that value: when I tried a 50MB part size, my process was using 1GB of RAM.
Expected Behavior
Uploading a file, whatever its size, should consume far less RAM.
Current Behavior
It uses 20x the part size in RAM. For a huge upload that is too much, and it means I cannot run more than one upload in parallel.
Reproduction Steps
see description
Possible Solution
No response
Additional Information/Context
No response
AWS CPP SDK version used
1.11.258
Compiler and Version used
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Operating System and version
macOS Sonoma 14.3