[BUG] Unexpected Cache Behavior on ASTRA_MANAGER_REPLICA_LIFESPAN_MINS Update #1206

Open
3 tasks done
autata opened this issue Jan 17, 2025 · 0 comments
Labels
bug Something isn't working

Comments

Contributor

autata commented Jan 17, 2025

Describe the bug

When replicaCreationServiceConfig.replicaLifespanMins (set via ASTRA_MANAGER_REPLICA_LIFESPAN_MINS) is updated, existing replicas do not seem to reflect the new value; they keep the lifespan that was configured when they were created. I expected the cache window to follow the updated setting.

Requirements (place an x in each of the [ ])

  • I've read and understood the Contributing guidelines and have done my best effort to follow them.
  • I've read and agree to the Code of Conduct.
  • I've searched for any related issues and avoided creating a duplicate issue.

To Reproduce

  1. Set ASTRA_MANAGER_REPLICA_LIFESPAN_MINS to a high value (e.g., 10080, which is 7 days).
  2. Keep the cluster running for 7 or more days.
  3. Reduce ASTRA_MANAGER_REPLICA_LIFESPAN_MINS to a lower value (e.g., 1440, which is 24 hours).
  4. Observe that query nodes still serve data from the original 7-day window and still need enough cache capacity to hold the older data.

Observations

  • When a snapshot is created by the index node, the associated record in ZooKeeper reflects the value of ASTRA_MANAGER_REPLICA_LIFESPAN_MINS at the time of creation.
  • Subsequent updates to this configuration do not appear to impact existing replicas.
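
The observations above can be modeled with a small sketch. This is not Astra's code: the `ReplicaRecord` type, its fields, and `create_replica` are hypothetical names, used only to show how stamping an expiry at creation time would make later configuration changes invisible to existing replicas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReplicaRecord:
    """Hypothetical stand-in for the replica record stored in ZooKeeper."""
    snapshot_id: str
    created_at: datetime
    expires_at: datetime  # fixed when the record is written


def create_replica(snapshot_id: str, now: datetime, lifespan_mins: int) -> ReplicaRecord:
    # The lifespan is read once, at creation, and baked into the record.
    return ReplicaRecord(snapshot_id, now, now + timedelta(minutes=lifespan_mins))


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
lifespan_mins = 7 * 24 * 60  # 10080 minutes = 7 days
replica = create_replica("snapshot-a", t0, lifespan_mins)

lifespan_mins = 24 * 60  # configuration lowered to 1440 minutes = 24 hours
# The existing record is untouched, so it still expires 7 days after creation.
remaining = replica.expires_at - t0
```

Under this model, only replicas created after the configuration change would pick up the 24-hour lifespan, which matches what I am seeing.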

Expected behavior

If the lifespan is increased, I would expect the system to pull additional data from S3. Conversely, if it is decreased, I would expect the system to limit the data served to align with the reduced window.
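
As a minimal sketch of this expectation, the cache contents could be derived from the currently configured lifespan rather than from a value stored at creation. The `Snapshot` type and `plan_cache` function are hypothetical and not part of Astra; the sketch also assumes the window is measured from each snapshot's creation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot metadata; only the fields this sketch needs."""
    snapshot_id: str
    created_at: datetime


def plan_cache(snapshots, cached_ids, now, lifespan_mins):
    """Return (to_load, to_evict) for the lifespan currently configured."""
    cutoff = now - timedelta(minutes=lifespan_mins)
    wanted = {s.snapshot_id for s in snapshots if s.created_at >= cutoff}
    to_load = wanted - set(cached_ids)    # would be pulled from S3
    to_evict = set(cached_ids) - wanted   # no longer inside the window
    return to_load, to_evict


now = datetime(2025, 1, 17, tzinfo=timezone.utc)
snapshots = [
    Snapshot("recent", now - timedelta(hours=12)),
    Snapshot("older", now - timedelta(days=5)),
]

# Lifespan decreased to 24 hours: the 5-day-old snapshot should be dropped.
decrease_plan = plan_cache(snapshots, {"recent", "older"}, now, 24 * 60)

# Lifespan increased to 7 days: the 5-day-old snapshot should be fetched.
increase_plan = plan_cache(snapshots, {"recent"}, now, 7 * 24 * 60)
```

Here `decrease_plan` asks for the older snapshot to be evicted and `increase_plan` asks for it to be loaded, which are the two directions described above.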

Questions and Suggestions

I understand that the caching logic is undergoing changes. Will the new implementation allow the cache window to adapt promptly after a configuration update? This could be particularly useful for occasional scenarios where serving older data is necessary. For example:

  • Normally, you might only require 3-7 days of data, but you keep segments in S3 for longer.
  • By temporarily increasing ASTRA_MANAGER_REPLICA_LIFESPAN_MINS and scaling up cache capacity, you could serve older data as needed.
  • Afterward, scaling down the cache and resetting the configuration would return the system to its usual state.

Currently, this flexibility does not seem possible due to the described behavior. Let me know if I can provide any additional details or run further tests to assist in diagnosing this issue.

Thank you!

Reproducible in:

Astra version: We are running a slightly older version of Astra, built off https://github.com/airbnb/kaldb, but I don't see any PRs since then that change this behavior.

autata added the bug label Jan 17, 2025