
Added documentation of using warmups to initialize LoRA weights #515

Open · wants to merge 1 commit into base: main
Conversation

TheCodeWrangler

This PR adds documentation for converting LoRA adapters from a Hugging Face checkpoint into a warmup request that can be used with the triton-inference-server TensorRT-LLM backend.

With this approach, the client of the Triton Inference Server backend never needs to supply the LoRA weights, and the weights do not have to be loaded or passed through any of the Python backend models (e.g. preprocessing). That avoids the NumPy datatype conversion step entirely, which matters because NumPy does not support bfloat16.
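To illustrate the datatype point: NumPy has no bfloat16 dtype, but a warmup request's raw data file only needs the raw bytes, which can be produced from float32 by bit truncation. The sketch below is a minimal, illustrative example under stated assumptions; the `pack_lora_row` layout and the `warmup/lora_weights` path are hypothetical stand-ins, not the backend's documented format, so check the TensorRT-LLM backend docs for the exact tensor layout.

```python
import numpy as np

def to_bfloat16_bytes(arr):
    """Produce raw bfloat16 bytes from a float32 array by truncating the
    low 16 mantissa bits (round-toward-zero). NumPy has no bfloat16 dtype,
    so we work at the bit level via a uint32 view."""
    u32 = np.ascontiguousarray(arr, dtype=np.float32).view(np.uint32)
    return (u32 >> 16).astype(np.uint16).tobytes()

def pack_lora_row(lora_a, lora_b, row_width):
    """Concatenate the flattened A and B adapter matrices into one row of
    a fixed-width weights tensor, zero-padded on the right.
    (Illustrative layout only; not the backend's documented format.)"""
    flat = np.concatenate([lora_a.ravel(), lora_b.ravel()]).astype(np.float32)
    assert flat.size <= row_width, "adapter too large for row_width"
    row = np.zeros(row_width, dtype=np.float32)
    row[: flat.size] = flat
    return row

# Toy example: a rank-8 adapter on a 16->16 projection.
rng = np.random.default_rng(0)
lora_a = rng.standard_normal((8, 16))    # [rank, in_dim]
lora_b = rng.standard_normal((16, 8))    # [out_dim, rank]
row = pack_lora_row(lora_a, lora_b, row_width=300)
raw = to_bfloat16_bytes(row)             # raw bytes for a warmup data file
# e.g. open("warmup/lora_weights", "wb").write(raw)  # hypothetical path
```

Because the warmup file carries opaque bytes, the bfloat16 weights never have to round-trip through a NumPy dtype on the client or in a Python backend model.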

@smehta2000

Tagging @kaiyux @byshiue to help triage and/or add to review board, thanks!

@TheCodeWrangler
Author

Curious to get any feedback here.

This update is also related to a performance issue I am seeing.
NVIDIA/TensorRT-LLM#1957

This PR gets results much closer to the expected outputs, but not fully in line with the Hugging Face / pre-compiled results. I would love feedback on the process for preparing the adapter weights.
