test: Test and document histogram latency metrics #7694
Conversation
Force-pushed the … DLIS-7383-yinggeh-metrics-standardization-TTFT branch from d663da5 to e084067
Make sure to rebase on main, which has the compute capability fix merged.
qa/L0_metrics/ensemble_decoupled/async_execute_decouple/1/model.py
Force-pushed the … DLIS-7383-yinggeh-metrics-standardization-TTFT branch from 9f5b2ab to aa9ddc7
Nice work adding a more comprehensive test!
Only two things left (one way to check the bucket half is sketched just below this list):
- Verify that the number of responses returned by each inference matches the expected count.
- Verify the histogram bucket key/value pairs after each inference.
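A minimal sketch of the bucket check, based on parsing the Prometheus exposition text. The metric name comes from this PR, but the sample text, labels, and bucket boundaries are illustrative assumptions rather than the PR's actual fixtures:

```python
import re

# Sample exposition text as it might look after two inferences; labels
# and bucket boundaries here are assumptions for illustration.
SAMPLE = """
nv_inference_first_response_histogram_ms_bucket{model="m",version="1",le="100"} 1
nv_inference_first_response_histogram_ms_bucket{model="m",version="1",le="500"} 2
nv_inference_first_response_histogram_ms_bucket{model="m",version="1",le="+Inf"} 2
nv_inference_first_response_histogram_ms_count{model="m",version="1"} 2
nv_inference_first_response_histogram_ms_sum{model="m",version="1"} 250
"""

def histogram_buckets(text, metric):
    """Return {le: cumulative_count} parsed from Prometheus exposition text."""
    pattern = rf'{metric}_bucket{{[^}}]*le="([^"]+)"}} (\d+)'
    return {le: int(v) for le, v in re.findall(pattern, text)}

buckets = histogram_buckets(SAMPLE, "nv_inference_first_response_histogram_ms")
counts = list(buckets.values())
assert counts == sorted(counts)  # cumulative bucket counts never decrease
assert counts[-1] == 2           # the "+Inf" bucket must equal _count
```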
```python
# Histograms
def test_inf_histograms_decoupled_exist(self):
    metrics = self._get_metrics()
    for metric in INF_HISTOGRAM_DECOUPLED_PATTERNS:
        for suffix in ["_count", "_sum", ""]:
            self.assertIn(metric + suffix, metrics)
```
Nice work adding checks that the count, sum, and bucket metrics exist!
This does not check whether the value of each bucket is correct, but as you mentioned, an existing test already covers that: it verifies that the Prometheus histogram metrics function correctly, while the tests in histogram_metrics_test.py verify, via count and sum, that the numbers fed to Prometheus are correct. So the values are intentionally not re-verified here.
Since this test only checks that specific histograms exist, without updating the metrics, all values should be 0 (see the exposition example below). For tests of the Prometheus metrics APIs and their functionality, please refer to
https://github.com/triton-inference-server/core/blob/main/src/test/metrics_api_test.cc
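For reference, a never-observed Prometheus histogram still exports every series, each at zero. A small self-contained sketch; the bucket boundaries and labels are illustrative assumptions, not Triton's actual defaults:

```python
# Exposition text of an untouched histogram: every series exists and
# every value is 0. Boundaries and labels are assumptions for illustration.
EXPECTED_ZERO = """\
nv_inference_first_response_histogram_ms_bucket{model="m",version="1",le="100"} 0
nv_inference_first_response_histogram_ms_bucket{model="m",version="1",le="+Inf"} 0
nv_inference_first_response_histogram_ms_count{model="m",version="1"} 0
nv_inference_first_response_histogram_ms_sum{model="m",version="1"} 0
"""

# An existence-plus-zero check in the spirit of the test above.
for line in EXPECTED_ZERO.strip().splitlines():
    series, value = line.rsplit(" ", 1)
    assert float(value) == 0.0, f"{series} has been updated unexpectedly"
```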
# Prometheus histogram buckets are tested in metrics_api_test.cc::HistogramAPIHelper

Just a nit for clarification: the Metrics API tests (e.g. TRITONSERVER_MetricObserve) don't go through the same code path as the built-in latency metrics. So although there are custom-metrics tests exercising histograms, they aren't necessarily testing the built-in histogram latency metrics added here.
The built-in metrics today just use Prometheus APIs directly, plus some C++ helper functions around them. Ideally, we would unify these so that built-in metrics and custom metrics both use the TRITONSERVER_Metric layer in the same way, for easier test coverage and maintenance in the future.
All that being said, I think the current tests included here, with the checks around the _sum value, are fine for this PR. When adding other histogram latency metrics, it would be a good idea to either (a) add some bucket-related tests to these Python unit tests (one possible shape is sketched below) or (b) unify the internal metrics to use the same custom metrics APIs, or the internal C++ classes around them, for better re-use of the metrics_api_test.cc tests.
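As a concrete illustration of option (a), a hedged sketch of a bucket-level assertion that could be added to the Python unit tests. The latency value, labels, and sample text are assumptions; only the metric name comes from this PR:

```python
import re

def check_observation_landed(metrics_text, metric, latency_ms):
    """After a single observation of latency_ms, every bucket whose upper
    bound is >= latency_ms should report a cumulative count of 1."""
    pattern = rf'{metric}_bucket{{[^}}]*le="([^"]+)"}} (\d+)'
    for le, count in re.findall(pattern, metrics_text):
        bound = float("inf") if le == "+Inf" else float(le)
        expected = 1 if bound >= latency_ms else 0
        assert int(count) == expected, f"bucket le={le}: {count} != {expected}"

# Illustrative exposition text after one request whose first response
# arrived in ~250 ms (model name and boundaries are assumptions).
sample = (
    'nv_inference_first_response_histogram_ms_bucket{model="m",le="100"} 0\n'
    'nv_inference_first_response_histogram_ms_bucket{model="m",le="500"} 1\n'
    'nv_inference_first_response_histogram_ms_bucket{model="m",le="+Inf"} 1\n'
)
check_observation_landed(sample, "nv_inference_first_response_histogram_ms", 250)
```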
```bash
cp ../python_models/${decoupled_model_name}/model.py ${MODELDIR}/${decoupled_model_name}/1/
cp ../python_models/${decoupled_model_name}/config.pbtxt ${MODELDIR}/${decoupled_model_name}/

SERVER_ARGS="${BASE_SERVER_ARGS} --load-model=${decoupled_model_name}"
```
Nit for a future follow-up (not this PR): a lot of this could probably be condensed with a for loop.
Nice work! Only a minor comment on metrics.md for this PR, but LGTM otherwise.
What does the PR do?
The PR adds tests for the histogram metrics and documents the new nv_inference_first_response_histogram_ms metric, which can be enabled or disabled with --metrics-config histogram_latencies=<bool>. A quick way to confirm the metric is exposed is sketched below.
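A hedged sketch of such a check; port 8002 is Triton's default metrics port, and everything else is an illustrative assumption:

```python
# Confirm the new histogram is exported when the server was started with
# --metrics-config histogram_latencies=true (sketch; the endpoint and the
# flag value are assumptions beyond what this PR states).
import urllib.request

text = urllib.request.urlopen("http://localhost:8002/metrics").read().decode()
metric = "nv_inference_first_response_histogram_ms"
for suffix in ("_count", "_sum", "_bucket"):
    assert metric + suffix in text, f"{metric}{suffix} not exported"
```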
Checklist
- PR title follows the <commit_type>: <Title> format
Commit Type: test
Related PRs:
triton-inference-server/core#396
Where should the reviewer start?
Check the implementation PR first.
Test plan:
L0_metrics--base
L0_response_cache--base
CI Pipeline ID: 19614087
Background
Standardizing Large Model Server Metrics in Kubernetes