Draft: API Server #18
base: v1.1
Conversation
fearnworks commented Oct 24, 2023
- Implements OpenAI mock API server (a sketch of such an endpoint appears after this list)
- Implements basic API server
- Examples for consumption through Jupyter notebook and Gradio
- Includes streaming response example
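For context, a mock OpenAI-compatible chat-completions endpoint with optional streaming might look like the following. This is a minimal sketch assuming FastAPI and Pydantic, with illustrative names (`ChatRequest`, `token_stream`) rather than the PR's actual code:

```python
# Hypothetical sketch of an OpenAI-compatible mock endpoint; not the PR's code.
import json
import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    stream: bool = False


@app.post("/v1/chat/completions")
async def chat_completions(req: ChatRequest):
    reply = "This is a canned mock response."
    if not req.stream:
        # Non-streaming: return a single OpenAI-shaped JSON body.
        return {
            "id": "chatcmpl-mock",
            "object": "chat.completion",
            "created": int(time.time()),
            "model": req.model,
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }],
        }

    async def token_stream():
        # Streaming: emit server-sent events, one chunk per token.
        for token in reply.split():
            chunk = {
                "object": "chat.completion.chunk",
                "choices": [{"index": 0, "delta": {"content": token + " "}}],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(token_stream(), media_type="text/event-stream")
```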
Hey, quick question. Since vLLM doesn't support LoRA, how are you planning to have different experts loaded at the same time? I'm asking because I've been trying to figure out the same thing, but I haven't gotten anywhere.
We are currently exploring adding a custom mistral_moe model to vLLM to handle loading the LoRA weights and any changes we need to apply to the forward passes.
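One plausible shape for that custom model path is to fold each expert's LoRA delta into the corresponding base weights at load time, so the forward pass stays a plain matmul. Below is a minimal sketch of the standard LoRA merge, W' = W + (alpha / r) * B @ A, using hypothetical names (`merge_lora`, `lora_A`, `lora_B`) rather than the repo's actual mistral_moe code:

```python
# Hedged sketch of the standard LoRA weight merge; not the repo's code.
# Each expert would carry its own (lora_A, lora_B) pair applied against
# a shared base weight.
import torch


def merge_lora(base_weight: torch.Tensor,  # (out_features, in_features)
               lora_A: torch.Tensor,       # (rank, in_features)
               lora_B: torch.Tensor,       # (out_features, rank)
               alpha: float,
               rank: int) -> torch.Tensor:
    """Return the merged weight W' = W + (alpha / rank) * B @ A."""
    scaling = alpha / rank
    return base_weight + scaling * (lora_B @ lora_A)


# Usage: pick an expert's adapter at load time and merge it once.
# merged = merge_lora(w, a_expert0, b_expert0, alpha=16.0, rank=8)
```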
Ah, I see. Makes sense. Would that work with tensor parallelism too?
Pivoting this to a general FastAPI server.
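A general FastAPI server like this could then be consumed from a notebook with a streaming HTTP client. A minimal sketch, assuming the mock endpoint above is running at localhost:8000 and using httpx (the URL and payload shape are assumptions, not the repo's actual examples):

```python
# Hypothetical client-side sketch for consuming the streaming endpoint;
# assumes the server above is running at localhost:8000.
import httpx

payload = {
    "model": "mock",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

with httpx.stream("POST", "http://localhost:8000/v1/chat/completions",
                  json=payload, timeout=30.0) as response:
    for line in response.iter_lines():
        # Each SSE line carries one JSON chunk until the [DONE] sentinel.
        if line.startswith("data: ") and line != "data: [DONE]":
            print(line.removeprefix("data: "))
```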
This reverts commit e0dd9dc.