Feature request: Output only generated text #89

Open
jiangshining opened this issue Nov 3, 2023 · 3 comments
Assignees
Labels
feature request New feature or request triaged Issue has been triaged by maintainers

Comments

@jiangshining

The returned output begins with the original input text. This wastes network bandwidth, especially when the input is very long. Could a flag be provided so that only the generated text is returned? Thanks.

@harrydrippin

+1 on this. I'm also trying to work around this issue, because my use case also involves very long inputs.

@BasicCoder

Maybe you can refer to #95.

@byshiue byshiue added the feature request New feature or request label Nov 5, 2023
@harrydrippin

@BasicCoder Your approach is also good, but it would be better if the tensorrt_llm backend could strip the input tokens from the result by itself. I'm using only the tensorrt_llm backend (not the ensemble), because I moved my tokenizer to a separate server for business-logic reasons. We would not need an additional Python backend if this feature were supported by the tensorrt_llm backend itself.
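
For illustration only, here is a minimal client-side sketch of the stripping being requested. It assumes the returned token IDs begin with the prompt tokens, as described in this issue, and that the caller knows the prompt length. The function name `strip_prompt_tokens` and the sample token IDs are hypothetical; this is not part of the tensorrt_llm backend API.

```python
from typing import List, Sequence


def strip_prompt_tokens(output_ids: Sequence[int], input_length: int) -> List[int]:
    """Return only the generated tokens.

    Assumption (not confirmed by the backend): output_ids starts with the
    input_length prompt tokens, followed by the generated tokens.
    """
    if input_length < 0 or input_length > len(output_ids):
        raise ValueError("input_length must be between 0 and len(output_ids)")
    # Drop the echoed prompt and keep everything after it.
    return list(output_ids[input_length:])


# Hypothetical token IDs, used only to show the slicing.
prompt_ids = [101, 2023, 2003]
output_ids = prompt_ids + [1037, 3231, 102]
generated = strip_prompt_tokens(output_ids, len(prompt_ids))
```

This only trims the result after it has already crossed the network, so it does not save the bandwidth the original request is about. That saving would need the same slicing to happen inside the backend before the response is sent.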

@ncomly-nvidia ncomly-nvidia added the triaged Issue has been triaged by maintainers label Nov 27, 2023
pvijayakrish pushed a commit that referenced this issue Oct 8, 2024
…t as well (#88) (#89)

* Replace binding index-based methods with name-based alternatives

* Remove unused variables

* Remove unused variables

* Remove allInput*Specified()

* Delete TRTV1Interface

* Replace getProfileShapeValues() with getProfileTensorValues()

* Remove buffer_bindings_

* Enhancements

* Replace isExecutionBinding()

* Add INT64 support

* Remove hasImplicitBatchDimension()

* Update Copyright

* Remove unused variables

* Undo copyright

* Undo Copyright

* Undo copyright

* Fix the handling in INT64 shape tensors output

* Fix data dependent output shapes

* Fix pre commit errors

* Update copyright

* Resolve review comments

* Include source for building on TRT 8 (#86) (#87)

* Include source for building on TRT 8

* Apply suggestions from code review



---------



* Fix envvar access in CMake

---------

Co-authored-by: Sai Kiran Polisetty <[email protected]>
Co-authored-by: Misha Chornyi <[email protected]>
5 participants