-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vertical scaling and SoftReference for snapshot repository data cache #16489
Conversation
9061419
to
e746f68
Compare
❌ Gradle check result for e746f68: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for a607693: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
2a3cbc5
to
46b46be
Compare
Oh, wow, thank you so much. I hadn’t caught this error when running with JUnit. In my future contributions, I’ll be sure to verify tests with Gradle before pushing! |
…data cache - Applies `SoftReference` to cached repository data for efficient memory management under heap pressure. - Enables cache size configuration in `opensearch.yml`, adjustable within a range of 500KB to 1% of heap memory. - Sets the default cache size to `Math.max(ByteSizeUnit.KB.toBytes(500), CACHE_MAX_THRESHOLD / 2)` so it’s generally proportional to heap size. In cases where 1% of the heap is less than 1000KB, indicating a low-memory environment, the default reverts to 500KB as before. - Since `BytesReference` internally uses `byte[]`, the compressed array size is capped at `Integer.MAX_VALUE - 8` to ensure compatibility with JDK limitations on array sizes. Therefore, the maximum cache size cannot exceed this limit. Signed-off-by: inpink <[email protected]>
46b46be
to
eeb0be0
Compare
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16489-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 53d41d3fac28c9f72d5883467d5a6211ad09feac
# Push it to GitHub
git push --set-upstream origin backport/backport-16489-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x Then, create a pull request where the |
…data cache (opensearch-project#16489) - Applies `SoftReference` to cached repository data for efficient memory management under heap pressure. - Enables cache size configuration in `opensearch.yml`, adjustable within a range of 500KB to 1% of heap memory. - Sets the default cache size to `Math.max(ByteSizeUnit.KB.toBytes(500), CACHE_MAX_THRESHOLD / 2)` so it’s generally proportional to heap size. In cases where 1% of the heap is less than 1000KB, indicating a low-memory environment, the default reverts to 500KB as before. - Since `BytesReference` internally uses `byte[]`, the compressed array size is capped at `Integer.MAX_VALUE - 8` to ensure compatibility with JDK limitations on array sizes. Therefore, the maximum cache size cannot exceed this limit. Signed-off-by: inpink <[email protected]> (cherry picked from commit 53d41d3)
…data cache (opensearch-project#16489) - Applies `SoftReference` to cached repository data for efficient memory management under heap pressure. - Enables cache size configuration in `opensearch.yml`, adjustable within a range of 500KB to 1% of heap memory. - Sets the default cache size to `Math.max(ByteSizeUnit.KB.toBytes(500), CACHE_MAX_THRESHOLD / 2)` so it’s generally proportional to heap size. In cases where 1% of the heap is less than 1000KB, indicating a low-memory environment, the default reverts to 500KB as before. - Since `BytesReference` internally uses `byte[]`, the compressed array size is capped at `Integer.MAX_VALUE - 8` to ensure compatibility with JDK limitations on array sizes. Therefore, the maximum cache size cannot exceed this limit. Signed-off-by: inpink <[email protected]> (cherry picked from commit 53d41d3) Signed-off-by: Andriy Redko <[email protected]>
Nice work @inpink !! 👍👍👍 |
Thank you so much, @injae-kim! 🎉 |
…data cache (#16489) (#16624) - Applies `SoftReference` to cached repository data for efficient memory management under heap pressure. - Enables cache size configuration in `opensearch.yml`, adjustable within a range of 500KB to 1% of heap memory. - Sets the default cache size to `Math.max(ByteSizeUnit.KB.toBytes(500), CACHE_MAX_THRESHOLD / 2)` so it’s generally proportional to heap size. In cases where 1% of the heap is less than 1000KB, indicating a low-memory environment, the default reverts to 500KB as before. - Since `BytesReference` internally uses `byte[]`, the compressed array size is capped at `Integer.MAX_VALUE - 8` to ensure compatibility with JDK limitations on array sizes. Therefore, the maximum cache size cannot exceed this limit. (cherry picked from commit 53d41d3) Signed-off-by: inpink <[email protected]> Signed-off-by: Andriy Redko <[email protected]> Co-authored-by: inpink <[email protected]>
Description
Background
Currently, snapshot repository data is not cached if its compressed size exceeds 500KB.
This static limit causes repeated downloads in operations like clone, restore, and status checks, increasing latency.
The limit has not been adjusted for larger heap sizes, impacting repositories with numerous snapshots.
It doesn’t adjust with vertical or horizontal scaling. This restricts caching efficiency, even with enhanced system resources.
Changes
The new setting,
opensearch.snapshot.cache.threshold
, allows users to adjust the cache limit as a percentage of heap size. To maintain consistency in how cache sizes are set across OpenSearch, I referenced existing configurations like indices.requests.cache.size and indices.fielddata.cache.size.snapshot.repository_data.cache.threshold
(String): Defines the maximum threshold of the snapshot repository data cache. This value can be specified either as an absolute value (e.g., 2GB) or as a percentage of the node’s heap memory (e.g., 3%). It is a static setting, meaning it must be defined in theopensearch.yml
file.Based on the discussions, I set the user-selectable cache size range within
500kb ... min(max(500kb, 1% of heap memory), Integer.MAX - 8))
, with a default ofmin(max(500kb, 1% of heap memory), Integer.MAX - 8))
.For implementing the maximum limit, I referred to the knn plugin and its knn.model.cache.size.limit setting.
(It would be helpful if the Setting class could directly support maximum memory size limits in the future.)
I applied a
[SoftReference](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html)
tolatestKnownRepositoryData
, which is already managed with anAtomicReference
for thread safety. The approach of combining both references was inspired by Gradle’s implementation.The previous warning threshold for cached data was set at 5MB, but this has now been changed to issue a warning at
min(configured limit * 10, Integer.MAX - 8)
.I believed that a
static
setting, rather thandynamic
, was more suitable here to avoid further amplifying the unpredictability ofSoftReference
.(I plan to contribute documentation for this feature to the OpenSearch project. I have already opened an issue for documentation contribution.)
Testing
I conducted tests in MemorySizeSettingsTests to verify that the snapshot.repository_data.cache.threshold option allocates memory as intended. I also confirmed that exceptions are triggered if the specified percentage exceeds the maximum limit or falls below the minimum limit.
I assembled the code I modified and ran it in Docker to conduct E2E testing to ensure everything functions properly overall.
(If any additional data is needed for these E2E tests, I am happy to provide it at any time.)
(Click the image to view it in full size.)
I wrote test code to confirm that the
SoftReference
works as expected. However, sincelatestKnownRepositoryData
is private, I temporarily changed its access level to protected in my local environment, and confirmed that the tests passed.testCacheRepositoryData
: Verifies that caching works as expected inrepository.latestKnownRepositoryData
.testSoftReferenceRepositoryDataCacheCleared
: Confirms that the cache is cleared when there is memory pressure and no objects are referencinglatestKnownRepositoryData
.Closing Thoughts
I’m very pleased to have contributed to OpenSearchㅡit was an exciting feature to add! Participating in discussions with excellent maintainers to provide a robust caching feature for users has been an incredibly valuable experience for me. If there’s anything that needs adjustment, I’ll make the changes as quickly as possible.
Related Issues
Resolves #16298
Check List
[ ] API changes companion pull request created, if applicable.[ ]Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.