Support profile data parsing for tensorrtllm engine service kind #33

nv-hwoo · 2024-08-08T17:00:41Z

Support parsing profile export JSON from the tensorrtllm engine through the C API
Add tests
Small refactor for tests
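As a rough illustration of what parsing a C API profile export involves, here is a minimal sketch. The JSON layout (`experiments`, `requests`, `timestamp`, `response_timestamps`, `response_outputs`, and an `output_ids` array per response) is an assumption modeled loosely on Perf Analyzer's profile export, not the exact schema this PR handles, and `summarize_request` is a hypothetical helper.

```python
import json

# Hypothetical profile export snippet; field names are assumptions,
# not the exact schema produced by Perf Analyzer in Triton C API mode.
PROFILE_EXPORT = """
{
  "service_kind": "triton_c_api",
  "endpoint": "",
  "experiments": [
    {
      "experiment": {"mode": "concurrency", "value": 1},
      "requests": [
        {
          "timestamp": 1000,
          "request_inputs": {"input_ids": [101, 2023, 2003]},
          "response_timestamps": [1500, 1700, 1900],
          "response_outputs": [
            {"output_ids": [7592]},
            {"output_ids": [2088]},
            {"output_ids": [999, 102]}
          ]
        }
      ]
    }
  ]
}
"""


def summarize_request(request: dict) -> dict:
    """Compute a few per-request metrics from one exported request record."""
    start_ns = request["timestamp"]
    response_ts = request["response_timestamps"]
    # The engine already returns token IDs, so output tokens are counted
    # directly instead of re-tokenizing decoded text.
    output_tokens = sum(len(r["output_ids"]) for r in request["response_outputs"])
    return {
        "input_tokens": len(request["request_inputs"]["input_ids"]),
        "output_tokens": output_tokens,
        "time_to_first_token_ns": response_ts[0] - start_ns,
        "request_latency_ns": response_ts[-1] - start_ns,
    }


data = json.loads(PROFILE_EXPORT)
for experiment in data["experiments"]:
    for req in experiment["requests"]:
        print(summarize_request(req))
```

Counting `output_ids` directly is what makes this path skip the tokenizer; the trade-off is that it only works when the responses really are engine token IDs.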

dyastremsky

Great testing architecture!

Left a question. Since we may be removing service-kind, it may matter less, but it feels like the C API service-kind is being used exclusively with the direct TRT-LLM engine in mind (e.g. skipping the tokenization, getting engine token counts in a TRT-LLM-specific way). That feels like it could cause issues in the future. Should we rename the service-kind to make it clear that it's not just the C API but specifically for TRT-LLM? Customers may think this is for the general Triton C API, like what PA supports.
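The concern is easiest to see in a sketch of a parser that branches on the service kind. In this hypothetical version, the `triton_c_api` branch assumes the responses carry TRT-LLM engine token IDs and skips the tokenizer entirely, while other service kinds join the text and re-tokenize it. The function `count_output_tokens`, the `text_output` field, and the toy whitespace tokenizer are illustrative assumptions, not genai-perf code.

```python
from typing import Callable, List


def count_output_tokens(
    service_kind: str,
    response_outputs: List[dict],
    tokenize: Callable[[str], List[int]],
) -> int:
    """Count output tokens for one request, branching on the service kind."""
    if service_kind == "triton_c_api":
        # Assumes every C API response holds TRT-LLM engine token IDs.
        # A generic Triton C API model returning text would break here,
        # which is why the name "triton_c_api" can be misleading.
        return sum(len(r["output_ids"]) for r in response_outputs)
    # Other service kinds return text, so join it and re-tokenize.
    text = "".join(r["text_output"] for r in response_outputs)
    return len(tokenize(text))


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer: one 'token' per whitespace-separated word."""
    return [hash(word) for word in text.split()]


c_api_outputs = [{"output_ids": [1, 2]}, {"output_ids": [3]}]
text_outputs = [{"text_output": "hello "}, {"text_output": "world"}]
print(count_output_tokens("triton_c_api", c_api_outputs, toy_tokenize))
print(count_output_tokens("triton", text_outputs, toy_tokenize))
```

Here the first call prints 3 and the second prints 2. The branch keyed on `triton_c_api` silently encodes a TRT-LLM-specific assumption, which is the kind of coupling a more specific name would make explicit.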

genai-perf/genai_perf/profile_data_parser/llm_profile_data_parser.py

genai-perf/tests/test_llm_profile_data_parser.py

nv-hwoo · 2024-08-08T19:20:08Z

@dyastremsky Note that the service kind used in the data parser is different from args.service_kind. From the customer's perspective, they run the TRT-LLM engine using --service-kind tensorrtllm_engine, so we don't actually expose to customers that the Triton C API is being used. The data parser reads the service kind directly from the profile export JSON because the class is also used by the compare subcommand, where no profile-related args are passed. Unfortunately, I don't think there's an easy way to change the service kind that PA writes to that JSON.
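A minimal sketch of the design described in this reply: the parser takes only the path to a profile export file and reads `service_kind` from the JSON itself, so callers such as the compare subcommand need no profiling arguments. The class name `ProfileExportReader` and the top-level `service_kind` and `experiments` fields are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


class ProfileExportReader:
    """Hypothetical reader that infers the service kind from the export file."""

    def __init__(self, filename: Path) -> None:
        with open(filename, encoding="utf-8") as f:
            self._data = json.load(f)
        # No CLI args are consulted: the export file is the single source
        # of truth, so this works even when no profile args are available.
        self.service_kind: str = self._data["service_kind"]

    def num_requests(self) -> int:
        return sum(len(e["requests"]) for e in self._data["experiments"])


# Usage: write a tiny export file and load it without any CLI arguments.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "profile_export.json"
    path.write_text(json.dumps({
        "service_kind": "triton_c_api",
        "experiments": [{"requests": [{}, {}]}],
    }))
    reader = ProfileExportReader(path)
    print(reader.service_kind, reader.num_requests())
```

The cost of this choice is the one raised in review: the value the parser sees is whatever PA wrote (`triton_c_api`), not the user-facing `tensorrtllm_engine`.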

genai-perf/tests/test_llm_profile_data_parser.py

dyastremsky · 2024-08-08T19:27:25Z

Ah, I see. Thanks! That's a bit confusing. It might be worth spending a little time to see if there's a workaround or a way to improve the code (maybe Matt or Elias have ideas). If not, we can keep this as is, but there's a code smell here IMO.

genai-perf/tests/test_llm_profile_data_parser.py

dyastremsky

Spectacular work, Hyunjae! 🚀

* support parsing tensorrtllm engine profile response
* add test
* refactor the test
* update types and names
* fix pre-commit
* run PA with triton c api
* more clean up on the tests
* fix codeql
* address feedback

* Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)
* Add tensorrtllm_engine option to service-kind and update testing
* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez <[email protected]>

* Support input payload generation for tensorrtllm engine (#767)
* Add functionality for async requests and output retrieval with Triton C API (#25)
* Support 1-d array data in profile exporter (#28)
* support array of data in profile exporter
* add some tests
* run formatting
* fix pre-commit
* remove duplicate argparser arguments
* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila <[email protected]>

* Support profile data parsing for tensorrtllm engine service kind (#33)
* support parsing tensorrtllm engine profile response
* add test
* refactor the test
* update types and names
* fix pre-commit
* run PA with triton c api
* more clean up on the tests
* fix codeql
* address feedback
* Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

---------

Co-authored-by: Hyunjae Woo <[email protected]>
Co-authored-by: Elias Bermudez <[email protected]>

nv-hwoo added 4 commits August 8, 2024 09:41

support parsing tensorrtllm engine profile response

8c73416

add test

eb518a2

refactor the test

04a4608

update types and names

fef8c89

nv-hwoo requested review from debermudez, matthewkotila and dyastremsky August 8, 2024 17:00

fix pre-commit

580d668

dyastremsky reviewed Aug 8, 2024

genai-perf/genai_perf/profile_data_parser/llm_profile_data_parser.py

run PA with triton c api

639172d

nv-hwoo temporarily deployed to GITLAB August 8, 2024 18:38 — with GitHub Actions Inactive