Add a traitlet to disable recording HTTP request metrics #1472
Conversation
Since this records a series of metrics for each HTTP handler class, it quickly leads to an explosion of cardinality and makes storing metrics quite difficult. For example, just accessing the metrics endpoint creates the following 17 metrics:

```
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.005",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.01",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.025",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.05",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.075",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.1",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.25",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.5",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="0.75",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="1.0",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="2.5",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="5.0",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="7.5",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="10.0",method="GET",status_code="200"} 9.0
http_request_duration_seconds_bucket{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",le="+Inf",method="GET",status_code="200"} 9.0
http_request_duration_seconds_count{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",method="GET",status_code="200"} 9.0
http_request_duration_seconds_sum{handler="jupyter_server.base.handlers.PrometheusMetricsHandler",method="GET",status_code="200"} 0.009019851684570312
```

This is what has stalled prior attempts at collecting metrics from jupyter_server usefully in multitenant deployments (see berkeley-dsep-infra/datahub#1977).

This PR adds a traitlet that allows hub admins to turn these metrics off.
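For context, a rough sketch of how an operator might use this once it lands. The exact trait name below is illustrative (my reading of the PR title), not a confirmed API:

```python
# jupyter_server_config.py -- illustrative only; the trait name shown here is an
# assumption about what this PR exposes, not a confirmed part of the API.
c = get_config()  # noqa -- injected by the traitlets config loader

c.ServerApp.record_http_request_metrics = False
```

The same flag could presumably also be set on the command line, e.g. `jupyter server --ServerApp.record_http_request_metrics=False`, assuming that trait name.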
```diff
@@ -41,13 +41,16 @@ def _scrub_uri(uri: str) -> str:
     return uri


-def log_request(handler):
+def log_request(handler, record_prometheus_metrics=True):
```
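For readers skimming the diff: one plausible way the new keyword gets connected to a server-level traitlet is by pre-binding it into tornado's `log_function` setting, e.g. with `functools.partial`. This is a sketch of the general pattern, not necessarily the exact wiring in this PR:

```python
# Sketch of the general pattern (assumed, not copied from this PR). Tornado calls
# log_function(handler) at the end of every request, so the extra keyword has to be
# pre-bound when the application settings are built.
from functools import partial

from jupyter_server.log import log_request

record_metrics = False  # would come from the new ServerApp traitlet

settings = {
    "log_function": partial(log_request, record_prometheus_metrics=record_metrics),
}
```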
Expanding the signature here makes me a little more cautious to merge. This technically expands a public API, which would normally be saved for a major release. However, we rarely make a major release of Jupyter Server (because it requires a lot of work to coordinate with our subprojects).
On the other hand, this change seems small enough that it likely shouldn't trigger a major release. We could also argue that this API is likely unused by anyone outside Jupyter Server, since it's really specific to Jupyter Server. The only people this might affect are folks that monkeypatch this method, which is discouraged.
I think we can proceed as long as we communicate clearly that there's a "possible breaking change" when this is released.
fwiw, that's why I made it a default arg and set it to the value that causes no behavior change when omitted! So a `log_request(handler)` call from any other project would see no difference from before.
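A minimal, self-contained illustration of that point (toy code, not jupyter_server itself): because the new parameter has a default, existing call sites keep the old behavior.

```python
# Toy example of the compatibility argument above -- not jupyter_server code.
def log_request(handler, record_prometheus_metrics=True):
    if record_prometheus_metrics:
        print(f"recording metrics for {handler}")
    print(f"logging request for {handler}")

log_request("SomeHandler")                                   # old call site: unchanged
log_request("SomeHandler", record_prometheus_metrics=False)  # new opt-out path
```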
Totally, and this is great!
The specific case I was thinking about was where someone patches `log_request` to intercept our logger. Something like this:
```python
import jupyter_server.log

# Create a custom function to monkeypatch jupyter server's log_request
def log_request(handler):
    ...
    # custom logic

jupyter_server.log.log_request = log_request
```
A Jupyter Server using this monkeypatch would fail after this PR is released, right?
This is definitely a discouraged thing to do 😅, but because this function is public, having it fail feels like a breach of contract.
ah interesting. that would fail, but I always assume that if you monkeypatch and something fails, that's on you :D the risk and reward of monkeypatching...
But regardless, I agree this one is ok here. If people do want to override `log_request`, IMO the way to do that is to override the tornado setting `log_function` instead - and people doing that will not be affected by this change.
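For anyone who does want that, a hedged sketch of the `log_function` route. The custom function name is made up, and `tornado_settings` is assumed to be the usual ServerApp escape hatch for passing settings through to tornado; whether it takes precedence here may depend on how the server builds its settings:

```python
# jupyter_server_config.py -- illustrative sketch; my_log_request is a made-up name.
def my_log_request(handler):
    # tornado calls this once per finished request with the RequestHandler instance.
    print(handler.get_status(), handler.request.method, handler.request.uri)

c = get_config()  # noqa -- injected by the traitlets config loader

c.ServerApp.tornado_settings = {"log_function": my_log_request}
```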
> I think we can proceed as long as we communicate clearly that there's a "possible breaking change" when this is released.
Is there anything you'd like me to do to make this possible?
> if you monkeypatch and something fails, that's on you :D
Agreed 👍
Nope, I think this is good to go.
I mostly raised/noted this here in the thread so we can cross-link it if someone reports a "bug" after release. It's not a bug, but a consequence of a monkeypatch 😃 Documenting this here is enough for future reference.
Thanks @yuvipanda!
yay, ty @Zsailer!