Support profile data parsing for tensorrtllm engine service kind #33

Merged: nv-hwoo merged 9 commits into tensorrtllm-engine from hwoo-tensorrtllm-output on Aug 9, 2024

Conversation


@nv-hwoo nv-hwoo commented Aug 8, 2024

  • Support parsing the profile export JSON from the tensorrtllm engine through the Triton C API
  • Add tests
  • Small refactor of the tests


@dyastremsky dyastremsky left a comment

Great testing architecture!

Left a question. Since we may be removing service-kind, it may matter less, but it feels like the C API service-kind is being used exclusively with the direct TRT-LLM engine in mind (e.g. skipping the tokenization, getting engine token counts in a TRT-LLM-specific way). That feels like it could cause issues in the future. Should we rename the service-kind to make it clear that it's not just the C API but specifically for TRT-LLM? Customers may think this is for the general Triton C API, like what PA supports.


nv-hwoo commented Aug 8, 2024

@dyastremsky Note that the service kind used in the data parser is different from args.service_kind. From the customer's perspective, they run the TRT-LLM engine using --service-kind tensorrtllm_engine, so we don't actually expose to customers that the Triton C API is being used. The data parser reads the service kind directly from the profile export JSON, and unfortunately I don't think there's an easy way to change that in PA. It has to read it from the JSON because the class is also used by the compare subcommand, where no profile-related args are passed.
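
For readers following the thread, the dispatch described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the JSON layout (`service_kind`, `responses`, `output_ids`, `text_output`), the value `triton_c_api`, and all function names are assumptions made for the example, and the whitespace split merely stands in for a real tokenizer.

```python
import json
from typing import Any, Callable, Dict, List


def count_tokens_from_text(response: Dict[str, Any]) -> int:
    # Hypothetical stand-in for tokenizer-based counting: a real parser
    # would run a tokenizer over the text instead of splitting on spaces.
    return len(response.get("text_output", "").split())


def count_tokens_from_engine(response: Dict[str, Any]) -> int:
    # Assumed engine path: the response already carries token IDs,
    # so tokenization is skipped and the IDs are counted directly.
    return len(response.get("output_ids", []))


# Map from the service kind recorded in the export file (an assumed field)
# to the counting strategy used for that kind.
TOKEN_COUNTERS: Dict[str, Callable[[Dict[str, Any]], int]] = {
    "triton": count_tokens_from_text,
    "openai": count_tokens_from_text,
    "triton_c_api": count_tokens_from_engine,
}


def load_token_counts(profile_export: str) -> List[int]:
    """Return per-response token counts from a profile export JSON string.

    The service kind is read from the JSON itself rather than from CLI
    arguments, so the function also works when no profile args exist.
    """
    data = json.loads(profile_export)
    service_kind = data["service_kind"]
    try:
        counter = TOKEN_COUNTERS[service_kind]
    except KeyError:
        raise ValueError(f"Unsupported service kind: {service_kind!r}") from None
    return [counter(response) for response in data.get("responses", [])]
```

Because the service kind travels with the file, a caller such as a compare-style command can parse several exports without knowing how each one was produced.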

@dyastremsky

Ah, I see. Thanks! That's a bit confusing. It might be worth spending a little bit of time to see if there's a workaround or way to improve the code (maybe Matt or Elias have ideas). If not, we can keep this as is, but there's a code smell here IMO.


@dyastremsky dyastremsky left a comment

Spectacular work, Hyunjae! 🚀

@nv-hwoo nv-hwoo merged commit 5f288a0 into tensorrtllm-engine Aug 9, 2024
7 checks passed
@nv-hwoo nv-hwoo deleted the hwoo-tensorrtllm-output branch August 9, 2024 17:41
matthewkotila pushed a commit that referenced this pull request Aug 9, 2024
* support parsing tensorrtllm engine profile response

* add test

* refactor the test

* update types and names

* fix pre-commit

* run PA with triton c api

* more clean up on the tests

* fix codeql

* address feedback
matthewkotila added a commit that referenced this pull request Aug 9, 2024
* Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)

* Add tensorrtllm_engine option to service-kind and update testing

* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez

* Support input payload generation for tensorrtllm engine (#767)

* Add functionality for async requests and output retrieval with Triton C API (#25)

* Support 1-d array data in profile exporter (#28)

* support array of data in profile exporter

* add some tests

* run formatting

* fix pre-commit

* remove duplicate argparser arguments

* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila

* Support profile data parsing for tensorrtllm engine service kind (#33)

* support parsing tensorrtllm engine profile response

* add test

* refactor the test

* update types and names

* fix pre-commit

* run PA with triton c api

* more clean up on the tests

* fix codeql

* address feedback

* Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

---------

Co-authored-by: Hyunjae Woo
Co-authored-by: Elias Bermudez
lkomali pushed a commit that referenced this pull request Aug 15, 2024
3 participants