Given that the server only requests one expert at a time, it might be possible to keep many experts in CPU memory and, as a result, process larger batches in a single step. Since we know the upcoming request queue, we can also prefetch the experts that will be required next to minimize latency. A sketch of this idea follows below.
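A minimal sketch of what this could look like, assuming PyTorch experts. The class name (`ExpertCache`), the request-queue format (a list of expert names), and the LRU eviction policy are illustrative assumptions, not the project's actual API: all experts stay resident in CPU memory, a small working set lives on the GPU, and upcoming experts are copied over on a side CUDA stream so the transfer overlaps with computation on the current batch.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


class ExpertCache:
    """Keeps all experts in CPU memory and a small LRU working set on the GPU."""

    def __init__(self, experts: dict[str, nn.Module], gpu_capacity: int = 2):
        # Every expert is stored on the CPU; only `gpu_capacity` of them are
        # resident on the GPU at any given time.
        self.experts = {name: module.cpu() for name, module in experts.items()}
        self.on_gpu: OrderedDict[str, None] = OrderedDict()  # LRU order
        self.gpu_capacity = gpu_capacity
        self.prefetch_stream = torch.cuda.Stream()

    def prefetch(self, upcoming: list[str]) -> None:
        # Copy soon-to-be-needed experts on a side stream so the host-to-device
        # transfer overlaps with computation on the current batch.
        with torch.cuda.stream(self.prefetch_stream):
            for name in upcoming[: self.gpu_capacity]:
                self._ensure_on_gpu(name)

    def get(self, name: str) -> nn.Module:
        # Wait for any in-flight prefetch before using the expert.
        torch.cuda.current_stream().wait_stream(self.prefetch_stream)
        self._ensure_on_gpu(name)
        self.on_gpu.move_to_end(name)
        return self.experts[name]

    def _ensure_on_gpu(self, name: str) -> None:
        if name in self.on_gpu:
            return
        # Evict the least-recently-used expert back to CPU if the GPU is full.
        if len(self.on_gpu) >= self.gpu_capacity:
            evicted, _ = self.on_gpu.popitem(last=False)
            self.experts[evicted].cpu()
        self.experts[name].to("cuda", non_blocking=True)
        self.on_gpu[name] = None
```

With a cache like this, the serving loop could call `cache.prefetch(next_expert_names)` as soon as the next requests are dequeued and `cache.get(current_name)` when the batch is actually executed; since each step touches a single expert, the remaining GPU memory is free for a larger batch.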