
Draft : API Server #18

Open
wants to merge 27 commits into base: v1.1

Conversation

@fearnworks (Contributor)

  • Implements an OpenAI-compatible mock API server
  • Implements a basic API server
  • Adds examples for consumption through a Jupyter notebook and Gradio
  • Includes a streaming response example (a rough sketch follows below)
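A minimal sketch of what an OpenAI-style mock endpoint with streaming could look like (the route, the `ChatRequest` model, and the `fake_token_stream` helper are illustrative placeholders, not the exact code in this PR):

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    stream: bool = False


async def fake_token_stream(prompt: str):
    # Stream the echoed prompt back word by word, in the chunk shape the OpenAI client expects.
    for token in f"Echo: {prompt}".split(" "):
        chunk = {"choices": [{"index": 0, "delta": {"content": token + " "}}]}
        yield f"data: {json.dumps(chunk)}\n\n"
        await asyncio.sleep(0.05)
    yield "data: [DONE]\n\n"


@app.post("/v1/chat/completions")
async def chat_completions(req: ChatRequest):
    prompt = req.messages[-1]["content"] if req.messages else ""
    if req.stream:
        # Server-sent events carrying chat.completion.chunk-style deltas.
        return StreamingResponse(fake_token_stream(prompt), media_type="text/event-stream")
    return {"choices": [{"index": 0, "message": {"role": "assistant", "content": f"Echo: {prompt}"}}]}
```

Streaming server-sent events in this chunk shape is what standard OpenAI-style clients expect, so a notebook or Gradio consumer can point at the mock first and swap in a real backend later.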

@nivibilla

Hey, quick question: since vLLM doesn't support LoRA, how are you planning to have different experts loaded at the same time? I'm asking because I've been trying to figure out the same thing but haven't gotten anywhere.

@fearnworks (Contributor, Author)

We are currently exploring adding a custom mistral_moe model to vLLM to handle loading the LoRA weights and any changes we need to apply to the forward passes.
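For anyone following along, the general mechanism is adding a low-rank delta on top of a frozen linear layer inside the forward pass. A rough, self-contained illustration in plain PyTorch (not the actual mistral_moe integration) might look like:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank delta to its output."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base  # frozen base weights
        self.lora_a = nn.Parameter(torch.zeros(rank, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scaling * (x @ A^T) @ B^T
        return self.base(x) + (x @ self.lora_a.T) @ self.lora_b.T * self.scaling
```

Swapping experts would then roughly amount to choosing which (lora_a, lora_b) pair gets applied for a given request, which is the part that needs custom handling inside vLLM.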

@nivibilla

Ah I see. Makes sense. Would that work with tensor parallel too?

@fearnworks (Contributor, Author)

Pivoting this to a general FastAPI server.
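For a sense of how consumption could look, here is a hypothetical client that streams from such a server over plain HTTP; the URL, payload, and chunk shape are assumptions matching the mock sketch above, not the notebook code in this PR:

```python
import json

import requests

# Stream a chat completion from a locally running server (placeholder URL and model name).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line:
        continue
    payload = line.decode().removeprefix("data: ")
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    # Print each streamed delta as it arrives.
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```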

fearnworks changed the title from "Draft : vllm Integration" to "Draft : API Server" on Oct 25, 2023
pharaouk added 2 commits on November 2, 2023 at 22:41
This reverts commit e0dd9dc.
fearnworks marked this pull request as ready for review on November 7, 2023 at 14:58