feat: Report histogram metrics to Triton metrics server #58

Merged: 36 commits, Aug 16, 2024


@yinggeh yinggeh commented Aug 16, 2024

Sample histogram output

# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_count{model="vllm_opt",version="1"} 3
vllm:time_to_first_token_seconds_sum{model="vllm_opt",version="1"} 0.0002238750457763672
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.001"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.005"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.01"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.02"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.04"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.06"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.08"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.1"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.25"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.75"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="1"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="2.5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="7.5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="10"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="+Inf"} 3
# HELP vllm:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE vllm:time_per_output_token_seconds histogram
vllm:time_per_output_token_seconds_count{model="vllm_opt",version="1"} 45
vllm:time_per_output_token_seconds_sum{model="vllm_opt",version="1"} 0.002027750015258789
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.01"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.025"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.05"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.075"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.1"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.15"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.2"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.3"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.4"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.5"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.75"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="1"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="2.5"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="+Inf"} 45
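
One way to read the sample above: the mean observation of a histogram is its `_sum` divided by its `_count`. The sketch below parses Prometheus text-format lines using only the Python standard library and derives that mean. The helper names (`parse_samples`, `histogram_mean`) and the abbreviated inline `SAMPLE` string are hypothetical and not part of this PR, and the regular expressions handle only the simple label syntax shown here.

```python
import re

# Abbreviated copy of the sample output above (only two bucket lines kept).
SAMPLE = """\
# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_count{model="vllm_opt",version="1"} 3
vllm:time_to_first_token_seconds_sum{model="vllm_opt",version="1"} 0.0002238750457763672
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.001"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="+Inf"} 3
"""

# Matches lines of the form: name{label="value",...} number
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def histogram_mean(samples, base_name):
    """Mean observation of a histogram: <base_name>_sum / <base_name>_count."""
    total = next(v for n, _, v in samples if n == base_name + "_sum")
    count = next(v for n, _, v in samples if n == base_name + "_count")
    return total / count if count else float("nan")


samples = parse_samples(SAMPLE)
mean_ttft = histogram_mean(samples, "vllm:time_to_first_token_seconds")
print(f"mean time to first token: {mean_ttft:.3e} s")
```

For real deployments a maintained Prometheus client or parser is likely a better choice than hand-written regular expressions; this is only meant to show how the three series relate.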

What does the PR do?

Support histogram metric type and add tests.
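
For readers unfamiliar with the histogram metric type, here is a minimal sketch of its cumulative bucket semantics. It is not the implementation in this PR: the `CumulativeHistogram` class and the three observed values are hypothetical, and only the bucket upper bounds are taken from the `time_to_first_token_seconds` sample output.

```python
from bisect import bisect_left

# Upper bounds copied from the time_to_first_token_seconds sample output.
TTFT_BUCKETS = [0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1,
                0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]


class CumulativeHistogram:
    """Hypothetical illustration of Prometheus-style histogram bookkeeping."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One slot per finite bound plus a final slot for the +Inf bucket.
        self.per_bucket = [0] * (len(self.bounds) + 1)
        self.count = 0
        self.total = 0.0

    def observe(self, value):
        # "le" is inclusive, so pick the first bound that is >= value.
        idx = bisect_left(self.bounds, value)
        self.per_bucket[idx] += 1
        self.count += 1
        self.total += value

    def cumulative(self):
        """Return (le_label, cumulative_count) pairs, ending with +Inf."""
        # Labels use Python's float formatting, which may differ from the
        # server's (for example "1.0" here versus "1" in the sample).
        labels = [str(b) for b in self.bounds] + ["+Inf"]
        running = 0
        result = []
        for label, n in zip(labels, self.per_bucket):
            running += n
            result.append((label, running))
        return result


hist = CumulativeHistogram(TTFT_BUCKETS)
for seconds in (5e-5, 7e-5, 1e-4):  # made-up sub-millisecond observations
    hist.observe(seconds)

for le, n in hist.cumulative():
    print(f'le="{le}" {n}')
```

Because every bucket counts all observations at or below its bound, three sub-millisecond observations appear in every bucket, which is why each `le` line in the sample reports the same value as `_count`.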

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field.
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • feat

Related PRs:

triton-inference-server/python_backend#374
triton-inference-server/core#386
triton-inference-server/server#7525

Where should the reviewer start?

n/a

Test plan:

n/a

  • CI Pipeline ID: 17487728

Caveats:

n/a

Background

Customer requested histogram metrics from vLLM.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

n/a

@yinggeh yinggeh self-assigned this Aug 16, 2024

yinggeh commented Aug 16, 2024

I merged #56 into the wrong branch, so this PR needs approval. My apologies.

@yinggeh yinggeh merged commit 507e4dc into main Aug 16, 2024
3 checks passed
mc-nv added a commit that referenced this pull request Aug 19, 2024