storage_min_free_bytes config option for topic-level #25074
Comments
Redpanda is certainly designed to handle mixed workloads (high and low latency) on the same broker, with many configuration knobs to affect their relative prioritization, but not with respect to disk space in on-premise environments without access to cloud storage, as there isn't a good way to do this. In systems with a single storage tier (local storage only), the disk is a shared resource, and when it is globally full we need to protect the broker from disk fullness regardless of who is writing messages and to which topic, with no obvious way to make tradeoffs. It's not a question of latency or performance but of keeping the system available overall, and since we can't force-delete a user's topic data (a force-delete would violate the retention.ms of a low-priority topic), there isn't really room for prioritizing disk space as I see it. In cloud environments using Tiered Storage, we do have such a mechanism for prioritizing low-latency topics with respect to their disk resources and automatically managing disk space accordingly ('Space Management'). This is accomplished by tuning the 'local retention target' and by offloading data to cloud storage automatically when under disk pressure, taking into account these hints on topics that want more local disk retention in order to keep end-to-end latency low further back in the log. These tools are described here: https://docs.redpanda.com/current/manage/cluster-maintenance/disk-utilization/#space-management
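For illustration only, here is a minimal sketch of the kind of reclaim decision Space Management makes under disk pressure: trim only data that has already been offloaded to cloud storage, and trim each topic back toward (never below) its local retention target, largest overshoot first, so topics that hinted at wanting more local history keep it longest. All names here (`TopicState`, `plan_reclaim`) are hypothetical, not Redpanda's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class TopicState:
    name: str
    local_bytes: int         # bytes currently held on local disk
    local_target_bytes: int  # configured local retention target (a hint)
    offloaded: bool = False  # data already safe in cloud storage?

def plan_reclaim(topics: list[TopicState], bytes_needed: int) -> dict[str, int]:
    """Decide how many local bytes to trim per topic under disk pressure.

    Only offloaded data is eligible, and each topic is trimmed toward
    (not below) its local retention target.
    """
    plan: dict[str, int] = {}
    # Trim topics with the largest overshoot past their target first.
    ordered = sorted(topics,
                     key=lambda t: t.local_bytes - t.local_target_bytes,
                     reverse=True)
    for t in ordered:
        if bytes_needed <= 0:
            break
        if not t.offloaded:
            continue  # never delete the only copy of the data
        reclaimable = max(0, t.local_bytes - t.local_target_bytes)
        take = min(reclaimable, bytes_needed)
        if take:
            plan[t.name] = take
            bytes_needed -= take
    return plan
```

Note how a low-latency topic with a large local retention target contributes little reclaimable space, which is exactly the "hint" behavior the docs describe.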
I understand the need to protect system availability. This is not a request to remove
You can't limit production completely without dropping data upstream of producers or forcibly deleting user data, as described above, which breaks the basic contract of the system. When you have a two-tiered elastic storage system as noted above, you can make these kinds of tradeoffs, as detailed in the docs above. When you have a single storage tier, you need to manage the storage yourself, as has always been the case with Kafka, and the suggestion here of not accepting any more data at all when disk space is low is a harsh, blunt instrument that is hard to predict with a mix of workloads sharing the system. High-volume workloads are not 'noisy' as such; they can in fact be the highest-priority workload, and maintaining their throughput is equally critical. However, it's true that we could perhaps provide backpressure to Kafka producers on 'lower priority' topics under high disk pressure when there is no tiered storage available to offload to. Curious what you think of this second approach @dotnwat
Thanks @mattschumpert
I like the idea of priority, but I'm not sure it is enough. For example, if I have two topics, A (high priority) and B (low priority), then at some point when disk pressure gets hot I need to block traffic. Instead of applying the block globally to all topics, I instead choose B. That choice, however, doesn't seem to change the fact that the disk is still full. What about quotas? Topic A has no quota, but topics B, C, and D collectively have, say, a 10 GB quota. At least to me this is a bit easier to reason about than min_free_space.
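To make the quota idea above concrete, here is a rough sketch of how a shared byte quota could gate produce requests. This is purely illustrative: `admit_produce` and the quota-group shape are invented for this example and are not an existing Redpanda API.

```python
def admit_produce(topic: str,
                  topic_bytes: dict[str, int],
                  quota_groups: dict[str, set[str]],
                  quota_limits: dict[str, int]) -> bool:
    """Accept or reject a produce request based on shared byte quotas.

    Topics in no quota group (e.g. high-priority topic A) are always
    admitted; topics sharing a group are rejected once the group's
    combined on-disk footprint reaches the group's limit.
    """
    for group, members in quota_groups.items():
        if topic in members:
            used = sum(topic_bytes.get(m, 0) for m in members)
            return used < quota_limits[group]
    return True  # no quota applies to this topic
```

With this shape, the reasoning the comment appeals to is explicit: topic A's admission never depends on how much disk B, C, and D have consumed, only the group's shared limit does.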
Who is this for and what problem do they have today?
There are cases where some topics should have minimal message latency between production and consumption, while others can tolerate higher latency.
For example, live-streaming vs. download. Let's call them `topic S` and `topic D`.
When these topics share a single broker and disk, and disk pressure starts to build up due to many messages in `topic D`, producers to all topics are rejected based on the value of `storage_min_free_bytes`. Meaning, `topic D` being full affects the performance of `topic S`.
My request is to have a topic-level configuration option that rejects producers to that topic based on a minimum number of free bytes on disk (the value of which would be higher than `storage_min_free_bytes`).
What are the success criteria?
Production to other topics is unaffected by a higher disk-pressure threshold configured on a specific topic.
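The requested behavior could be sketched roughly as follows. The helper `should_reject` and the per-topic override map are hypothetical, assuming topics without their own setting fall back to the cluster-wide `storage_min_free_bytes`.

```python
def should_reject(topic: str,
                  free_bytes: int,
                  global_min_free: int,
                  topic_min_free: dict[str, int]) -> bool:
    """Reject produce requests to a topic when free disk space drops
    below that topic's threshold; topics with no per-topic setting use
    the cluster-wide storage_min_free_bytes."""
    threshold = topic_min_free.get(topic, global_min_free)
    return free_bytes < threshold
```

For example, with a global minimum of 5 GiB and `topic D` configured at 20 GiB, a disk with 10 GiB free would reject producers to D while still accepting producers to S.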
Why is solving this problem impactful?
Enables hosting low-latency topics on brokers that also handle high-production topics.
Additional notes
JIRA Link: CORE-9066