When using multi-LoRA inference, I am curious how the back-end GPU utilization works, and what the caching behavior and overhead look like.
For example, suppose that at startup no LoRA path has been loaded via the curl command and only enable_dynamic_loading is set. How does caching work in that case? Is there a pre-allocated region of GPU memory already reserved for loading LoRA modules, and how many LoRA modules are allowed?
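To make this concrete, here is the kind of request I mean; this is just a sketch assuming vLLM-style dynamic-loading endpoints, and the adapter name and path below are placeholders:

```bash
# Load an adapter at runtime via the dynamic-loading endpoint
# (endpoint name assumes vLLM's /v1/load_lora_adapter; adjust if this
#  project exposes dynamic loading differently)
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "sql_adapter", "lora_path": "/path/to/sql_adapter"}'
```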
Also, I am curious about the other case: suppose I have already loaded 2 LoRA modules via the initial curl command. Are they stored in GPU memory or in host RAM, and when switching between them for inference, how much overhead and computational cost does the switch incur?
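By "switching" I mean selecting a different adapter per request, e.g. as below (again a sketch with placeholder names, assuming the OpenAI-compatible API where the adapter is selected via the model field):

```bash
# Route this request through one of the two preloaded adapters;
# switching adapters between requests is just changing the "model" field
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "sql_adapter", "prompt": "Hello", "max_tokens": 32}'
```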