-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature requirement: new option expiration for recycle the share dict #164
Comments
I believe that's pretty standard for the Prometheus ecosystem. Once created, metrics exist until the process ends. It's totally reasonable for a metric to not get updated for a long time. For example, a counter that tracks the number of errors is hopefully not incremented very often. But in most cases it would be unhelpful to delete it after it has been incremented.
Does it need to be part of the nginx-lua-prometheus library? We provide del and reset methods for counters and gauges that I believe should make it possible to implement time-based deletion of metrics if it is necessary in your environment. |
In the openresty ecosystem, metrics are usually recorded in the log phase. However, there is no appropriate phase to manually destroy metric records. If you use TTL to destroy, it is more in line with openresty's practice, instead of using more complex logic to maintain the share dict required by prometheus. I can implement this feature if you agree with this requirement. |
There are a lot of moving pieces we have which were mostly added for performance reasons (per-worker counters that get regularly flushed to the main dict, separate keys in the dict for metric names), and integrating time-based expiry correctly will not be trivial and will likely further increase complexity of the library. Also, I believe as a use case this is sufficiently unusual and niche that I would be hesitant to support it natively. I am not sure many other Prometheus metric libraries have functionality like this, and a quick search for "TTL" in Prometheus mailing list suggest that every time this has been discussed, there was a more idiomatic way to achieve whatever was needed. I am not trying to tell you that what you are trying to do is wrong, but I would suggest trying to find a solution without making changes to the library. I've not though too much about it, but perhaps you might be able to do it by keeping track of metric usage in another dict and calling |
I think since this library is in the nginx ecosystem, it is more practical to better adapt to the nginx ecosystem. Simply using timer implementation may be more complicated than implementing ttl in the library.
|
I am not sure I understand how it is different from any other environment?
This approach will result in only metric values being removed, with all other data staying behind in an inconsistent state. There is a reason why |
In openresty, this library needs to be called at a specified phase. In most cases, it is the
It is indeed possible, it may depend on the specific data type. Let me evaluate it first. I think since share dict is used for storage, no matter what data type it is in Prometheus, it will happen together during recycling and the possibility of finding data inconsistency is very small. |
I think I figured it out, and it’s completely ok to give up on deleting it. The incr and set methods of nginx share dict have return values of whether to force setting. |
Hi @knyar, many of our users have encountered this problem, for example no memory and high cpu usage What you suggest is to use a timer at upper level to remove metrics, but IMHO, we have several problems:
So it's the easiest way to clean stale data in the library, just a few minor modifications, and I'm glad to use this feature in our open source project, i am sure it will make the library more and more powerful. |
Having 150k timeseries per application (as mentioned in the linked issue) is definitely well outside of the best practices for prometheus. See for example the warning given here. Using URL, IDs or other unbounded high-cardinality data as labels is always bad idea and there is not much that the client can help with. There will always be a limit where things just get overloaded. Also, even if the client library deletes unused metrics, you might just pushed the problems further downstream, to your Thanos, Mimir or whatever is used to store the metrics data. |
@dolik-rce Yeah, i agree with you, but the main dict will be full of metrics eventually, it will cause the problem i mentioned above. we need to provide a simple way to prevent this from happening. i think the ttl feature is the best way, and i'm ready for this. |
That's not something that is expected to happen, and it never happened in practice in any of the deployments where I've used this library. As @dolik-rce mentioned above, Prometheus is designed to work well with a relatively small and static set of unique combinations of metric label values. I noticed that metric libraries for other languages (Ruby, Java) explicitly document this, even though it's not something that's specific to a particular metric library. Perhaps we should add a similar note to the README file as well. |
I found that the implementation of the prometheus library requires manual recycling of the metric if the metric is no longer updated.
I think if nginx-lua-prometheus supports the new option expiration, it means that this metric will be recycled if it is not updated within the specified time.
and exptime default is 0
I think it is enough to simply implement it based on share dict.
The text was updated successfully, but these errors were encountered: