storage_min_free_bytes config option for topic-level #25074
Comments
Redpanda is certainly designed to handle mixed workloads (high and low latency) on the same broker, with many configuration knobs to affect their relative prioritization, but not with respect to disk space in on-premise environments without access to cloud storage, as there isn't a good way to do this. In systems with a single storage tier (local storage only), the disk is a shared resource, and when it is globally full we need to protect the broker from disk fullness regardless of who is writing messages and to which topic, with no obvious way to make tradeoffs. It's not a question of latency or performance but of keeping the system available overall, and since we can't force-delete a user's topic data (a force-delete would violate the retention.ms of a low-priority topic), there isn't really room for prioritizing disk space as I see it. In cloud environments using Tiered Storage, we do have such a mechanism for prioritizing low-latency topics with respect to their disk resources and automatically managing disk space accordingly ('Space Management'). This is accomplished by tuning the 'local retention target' and by offloading data to cloud storage automatically when under disk pressure, taking into account these hints on topics that want more local disk retention in order to keep end-to-end latency low further back in the log. These tools are described here: https://docs.redpanda.com/current/manage/cluster-maintenance/disk-utilization/#space-management
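For illustration only, here is a minimal sketch of the kind of reclaim decision Space Management makes under disk pressure: trim only data that has already been offloaded to cloud storage, and trim each topic back toward (never below) its local retention target, largest overshoot first, so topics that hinted at wanting more local history keep it longest. All names here (`TopicState`, `plan_reclaim`) are hypothetical, not Redpanda's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class TopicState:
    name: str
    local_bytes: int         # bytes currently held on local disk
    local_target_bytes: int  # configured local retention target (a hint)
    offloaded: bool = False  # data already safe in cloud storage?

def plan_reclaim(topics: list[TopicState], bytes_needed: int) -> dict[str, int]:
    """Decide how many local bytes to trim per topic under disk pressure.

    Only offloaded data is eligible, and each topic is trimmed toward
    (not below) its local retention target.
    """
    plan: dict[str, int] = {}
    # Trim topics with the largest overshoot past their target first.
    ordered = sorted(topics,
                     key=lambda t: t.local_bytes - t.local_target_bytes,
                     reverse=True)
    for t in ordered:
        if bytes_needed <= 0:
            break
        if not t.offloaded:
            continue  # never delete the only copy of the data
        reclaimable = max(0, t.local_bytes - t.local_target_bytes)
        take = min(reclaimable, bytes_needed)
        if take:
            plan[t.name] = take
            bytes_needed -= take
    return plan
```

Note how a low-latency topic with a large local retention target contributes little reclaimable space, which is exactly the "hint" behavior the docs describe.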
I understand the need to protect system availability. This is not a request to remove
You can't limit production completely without dropping data upstream of producers or forcibly deleting user data, as described above, which breaks the basic contract of the system. When you have a two-tiered elastic storage system as noted above, you can make these kinds of tradeoffs, as detailed in the docs above. When you have a single storage tier, you need to manage the storage yourself, as has always been the case with Kafka, and the suggestion here of not accepting any more data at all when disk space is low is a harsh, blunt instrument that is hard to predict with a mix of workloads sharing the system. High-volume workloads are not 'noisy' as such; they can in fact be the highest-priority workload, and maintaining their throughput is equally critical. However, it's true that we could perhaps provide backpressure to Kafka producers on 'lower priority' topics under high disk pressure when there is no tiered storage available to offload to. Curious what you think of this second approach @dotnwat
Thanks @mattschumpert
I like the idea of priority, but I'm not sure it is enough. For example, if I have two topics, A (high priority) and B (low priority), then at some point when disk pressure gets hot I need to block traffic. Instead of applying the block globally to all topics, I instead choose B. That choice, however, doesn't seem to change the fact that the disk is still full. What about quotas? Topic A has no quota, but topics B, C, and D collectively have, say, a 10 GB quota. At least to me this is a bit easier to reason about than min_free_space.
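To make the quota idea above concrete, here is a rough sketch of how a shared byte quota could gate produce requests. This is purely illustrative: `admit_produce` and the quota-group shape are invented for this example and are not an existing Redpanda API.

```python
def admit_produce(topic: str,
                  topic_bytes: dict[str, int],
                  quota_groups: dict[str, set[str]],
                  quota_limits: dict[str, int]) -> bool:
    """Accept or reject a produce request based on shared byte quotas.

    Topics in no quota group (e.g. high-priority topic A) are always
    admitted; topics sharing a group are rejected once the group's
    combined on-disk footprint reaches the group's limit.
    """
    for group, members in quota_groups.items():
        if topic in members:
            used = sum(topic_bytes.get(m, 0) for m in members)
            return used < quota_limits[group]
    return True  # no quota applies to this topic
```

With this shape, the reasoning the comment appeals to is explicit: topic A's admission never depends on how much disk B, C, and D have consumed, only the group's shared limit does.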
Who is this for and what problem do they have today?
There are cases where some topics should have minimal message latency between production and consumption, while others can tolerate higher latency.
For example, live-streaming vs. download. Let's call them `topic S` and `topic D`.
When these topics share a single broker and disk, and disk pressure starts to build up due to many messages in `topic D`, producers to all topics are rejected based on the value of `storage_min_free_bytes`. Meaning, `topic D` being full affects the performance of `topic S`.
My request is to have a topic-level configuration option that rejects producers to that topic based on a minimum number of free bytes on disk (the value of which would be higher than `storage_min_free_bytes`).
What are the success criteria?
Production to other topics is unaffected by a higher disk-pressure threshold configured on a specific topic.
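The requested behavior could be sketched roughly as follows. The helper `should_reject` and the per-topic override map are hypothetical, assuming topics without their own setting fall back to the cluster-wide `storage_min_free_bytes`.

```python
def should_reject(topic: str,
                  free_bytes: int,
                  global_min_free: int,
                  topic_min_free: dict[str, int]) -> bool:
    """Reject produce requests to a topic when free disk space drops
    below that topic's threshold; topics with no per-topic setting use
    the cluster-wide storage_min_free_bytes."""
    threshold = topic_min_free.get(topic, global_min_free)
    return free_bytes < threshold
```

For example, with a global minimum of 5 GiB and `topic D` configured at 20 GiB, a disk with 10 GiB free would reject producers to D while still accepting producers to S.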
Why is solving this problem impactful?
Enables hosting low-latency topics on brokers that also handle high-production topics.
Additional notes
JIRA Link: CORE-9066