Add a Python script that runs LLM benchmark #406

nv-hwoo · 2023-09-28T23:27:45Z

Running benchmark 1: Prefill phase

python profile.py --prompt-size-range 100 1000 200 --max-tokens 1

# Sample output
# [ Benchmark Summary ]
#   Prompt size: 100, Average first-token latency: 0.0421 sec
#   Prompt size: 300, Average first-token latency: 0.0312 sec
#   Prompt size: 500, Average first-token latency: 0.0289 sec
#   Prompt size: 700, Average first-token latency: 0.0358 sec
#   Prompt size: 900, Average first-token latency: 0.0327 sec
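The `--prompt-size-range 100 1000 200` argument appears to be read as start, stop, and step. That matches the prompt sizes 100 through 900 in the summary above, which is exactly what Python's `range(100, 1000, 200)` produces. Below is a minimal sketch of how these options could be parsed with `argparse`. The flag names come from the commands in this PR. The parser itself, the `build_parser`/`prompt_sizes` helpers, and the `--max-tokens` default are hypothetical and not taken from `profile.py`.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags used in the benchmark commands.
    parser = argparse.ArgumentParser(description="LLM benchmark sketch")
    parser.add_argument(
        "--prompt-size-range",
        type=int,
        nargs=3,
        metavar=("START", "STOP", "STEP"),
        required=True,
        help="Prompt sizes to sweep, interpreted like Python's range()",
    )
    # Assumed default; the real script's default may differ.
    parser.add_argument("--max-tokens", type=int, default=256)
    parser.add_argument("--ignore-eos", action="store_true")
    return parser


def prompt_sizes(args: argparse.Namespace) -> list[int]:
    # Expand START STOP STEP into the list of prompt sizes to benchmark.
    start, stop, step = args.prompt_size_range
    return list(range(start, stop, step))


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--prompt-size-range", "100", "1000", "200", "--max-tokens", "1"]
    )
    print(prompt_sizes(args))  # [100, 300, 500, 700, 900]
    print(args.max_tokens, args.ignore_eos)  # 1 False
```

With `--max-tokens 1` each request produces a single output token. Under this reading, the measured first-token latency mostly reflects the prefill step.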

Running benchmark 2: Generation phase

python profile.py --prompt-size-range 100 1000 200 --max-tokens 256 --ignore-eos

# Sample output
# [ Benchmark Summary ]
#   Prompt size: 100, Average first-token latency: 0.0419 sec, Average token-token latency: 0.0068 sec
#   Prompt size: 300, Average first-token latency: 0.0513 sec, Average token-token latency: 0.0070 sec
#   Prompt size: 500, Average first-token latency: 0.0325 sec, Average token-token latency: 0.0069 sec
#   Prompt size: 700, Average first-token latency: 0.0325 sec, Average token-token latency: 0.0071 sec
#   Prompt size: 900, Average first-token latency: 0.0368 sec, Average token-token latency: 0.0071 sec
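The generation benchmark reports two metrics. First-token latency is the time from sending a request to receiving its first response. Token-to-token latency is the gap between consecutive streamed responses. Below is a minimal sketch of how both could be averaged with `statistics.mean`. It skips null (empty) responses when computing token-to-token gaps, in the spirit of commit c24edda ("Excluded null response from T2T latency calculation"). The `RequestRecord` layout is a hypothetical assumption for illustration, not the format `profile.py` actually reads: a request time plus a list of `(time, payload)` response pairs, in seconds.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical record: when the request was sent and when each
    # streamed response arrived (seconds), paired with its text payload.
    request_time: float
    responses: list[tuple[float, str | None]]


def first_token_latency(rec: RequestRecord) -> float:
    # Time from sending the request to receiving the first response.
    first_time, _ = rec.responses[0]
    return first_time - rec.request_time


def token_to_token_latencies(rec: RequestRecord) -> list[float]:
    # Drop null/empty responses (e.g. a final end-of-stream message)
    # so they do not add spurious gaps.
    times = [t for t, payload in rec.responses if payload]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def summarize(records: list[RequestRecord]) -> tuple[float, float | None]:
    avg_first = statistics.mean(first_token_latency(r) for r in records)
    gaps = [g for r in records for g in token_to_token_latencies(r)]
    # With a single output token there are no gaps, so T2T is undefined.
    avg_t2t = statistics.mean(gaps) if gaps else None
    return avg_first, avg_t2t
```

Returning `None` for the token-to-token average when no gaps exist mirrors why the prefill benchmark (`--max-tokens 1`) reports only first-token latency.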

src/c++/perf_analyzer/docs/examples/profile.py

src/c++/perf_analyzer/docs/llm.md

github-advanced-security bot found potential problems Sep 28, 2023

src/c++/perf_analyzer/docs/examples/profile.py: Fixed

nv-hwoo requested review from nv-braf and matthewkotila September 28, 2023 23:39

nv-braf reviewed Sep 29, 2023

nv-hwoo requested a review from nv-braf October 3, 2023 21:15

matthewkotila reviewed Oct 3, 2023

src/c++/perf_analyzer/docs/examples/profile.py: Outdated, resolved

nv-braf reviewed Oct 4, 2023

src/c++/perf_analyzer/docs/examples/profile.py: Resolved

nv-hwoo changed the title ~~[WIP] Add a Python script that runs LLM benchmark~~ Add a Python script that runs LLM benchmark Oct 4, 2023

nv-hwoo marked this pull request as ready for review October 4, 2023 21:01

nv-hwoo requested review from matthewkotila and nv-braf October 4, 2023 21:01

nv-braf approved these changes Oct 5, 2023

nv-hwoo added 17 commits October 5, 2023 08:45

Run PA using Python

e944803

Formatting and clean up

8339172

Run multiple inferences with different prompt lengths

043e628

Address feedback

f6568a6

Add prompt length command line option

4f02034

Add max_tokens and ignore_eos parameters

3ce918a

Add token-token latency calculation

8dcb1b0

Allow user to provide input data file

c8e18ab

Update llm.md

bcf8dce

Minor changes to llm.md

6560780

Remove note about 23.09 release

7d6aa59

Update sample output

00e9484

Use Python statistics.mean to compute avg

88d8aab

Address feedback

b6d267d

Address feedback

02ba02d

Excluded null response from T2T latency calculation

c24edda

Remove avg T2T latency sample output from prefill benchmark

f3a69fc
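Several of these commits add `max_tokens` and `ignore_eos` parameters and let the user provide an input data file. perf_analyzer accepts request inputs as a JSON file passed with `--input-data`, where the top-level `"data"` list holds one entry per request. Below is a minimal sketch of generating such a file for a given prompt size. It assumes a model whose inputs are named `PROMPT`, `STREAM`, and `SAMPLING_PARAMETERS`, with the last one a JSON string. These tensor names, the helper functions, and the word-repetition prompt are all hypothetical, so check them against the actual model configuration.

```python
from __future__ import annotations

import json
from pathlib import Path


def make_input_data(prompt_size: int, max_tokens: int, ignore_eos: bool) -> dict:
    # Hypothetical prompt: repeat a word so the prompt has `prompt_size`
    # whitespace-separated words (not necessarily that many tokenizer tokens).
    prompt = " ".join(["hi"] * prompt_size)
    sampling_parameters = {"max_tokens": max_tokens, "ignore_eos": ignore_eos}
    return {
        "data": [
            {
                # Assumed input tensor names; check the model's config.pbtxt.
                "PROMPT": [prompt],
                "STREAM": [True],
                "SAMPLING_PARAMETERS": [json.dumps(sampling_parameters)],
            }
        ]
    }


def write_input_data(
    path: Path, prompt_size: int, max_tokens: int, ignore_eos: bool
) -> None:
    # Serialize one request's inputs into a perf_analyzer-style JSON file.
    data = make_input_data(prompt_size, max_tokens, ignore_eos)
    path.write_text(json.dumps(data, indent=2))
```

For example, `write_input_data(Path("input_data.json"), 100, 256, True)` would write a file that could then be passed to perf_analyzer with `--input-data input_data.json`. Encoding the sampling parameters as one JSON string keeps new options such as `ignore_eos` from requiring new input tensors.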

nv-hwoo force-pushed the hwoo-llm-py-guide branch from b9cb7fc to f3a69fc October 5, 2023 15:46

nv-hwoo merged commit eca94a8 into main Oct 5, 2023
3 checks passed

nv-hwoo deleted the hwoo-llm-py-guide branch October 5, 2023 15:52
