Update on the development branch #670
kaiyux announced in Announcements
-
how to use speculative decoding?
-
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (which also includes the Triton backend) on December 15th, 2023.
This update includes:
multi_block_mode when certain conditions are met (large TP & 32K sequence length)

Thanks,
The TensorRT-LLM Engineering Team