Update on the development branch #2298
DanBlanaru announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on Oct 08, 2024.
This update (#2297) includes:
- [...] examples/run.py, and documentation is in examples/draft_target_model/README.md.
- [...] ModelRunnerCpp class.
- [...] isParticipant method to the C++ Executor API to check if the current process is a participant in the executor instance.
- [...] trtllm-build command.
- [...] strongly_typed=False to build the fp16 vision engine for the multimodal example. TensorRT 10 made the default strongly_typed=True, so fp32 vision engines are built, even if input ONNX files are fp16. This issue is now fixed.
- [...] trtllm-build --fast-build with fake or random weights. Thanks to @ZJLi2013 for flagging it in "trtllm-build with --fast-build ignore transformer layers" (#2135).
- [...] assistant_model.
- [...] customAllReduce performance by using Lamport-style AllReduce + Norm fusion.
- [...] memcpy over MPI to the target model's process in orchestrator mode. This reduces the latency between the end of the draft model generation and the beginning of target inference.

Thanks,
The TensorRT-LLM Engineering Team
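
For readers unfamiliar with the draft-target flow referenced above (draft model generation followed by target inference), here is a minimal Python sketch of the general idea in its greedy form. It is not the TensorRT-LLM implementation: `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are toy functions over integer tokens, and a real target model would verify all drafted positions in a single forward pass rather than one call per position.

```python
from typing import Callable, List

# Hypothetical model interface: maps a token prefix to the next (greedy) token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding with a single model, used as the reference."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, draft_len: int = 4) -> List[int]:
    """Greedy draft-target decoding: the draft proposes, the target verifies."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1) Draft phase: the cheap model proposes draft_len tokens.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))

        # 2) Verify phase: keep drafted tokens while they match the target's choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the target contributes one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:limit]


def toy_target(prefix: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10 (assumes a non-empty prefix)."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy 'small' model: agrees with the target except at every third position."""
    nxt = (prefix[-1] + 1) % 10
    return (nxt + 1) % 10 if len(prefix) % 3 == 0 else nxt


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(speculative_decode(toy_target, toy_draft, prompt, max_new_tokens=8))
    print(greedy_decode(toy_target, prompt, max_new_tokens=8))
```

In this greedy variant every emitted token is one the target itself would have chosen, so the result matches target-only decoding. The draft model only affects how many tokens are accepted per verification round, which is why the hand-off latency between draft generation and target inference matters.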