Proxies API calls to Llama.
API requests should include `history`, a list of tuples of strings, and the current `query`.
Example Request:
```json
{
  "history": [["user", "hello"], ["llm", "hi"]],
  "query": "how are you?"
}
```
Responses will be returned as dictionaries and should contain the following:
- `response` - String LLM response to the query
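For the example request above, a response might look like:
```json
{ "response": "I'm doing well, thank you for asking!" }
```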
When running this as a docker container, the `XDG_CONFIG_HOME` envvar is set to `/config`.

A configuration file at `/config/neon/diana.yaml` is required and should look like:
```yaml
MQ:
  port: <MQ Port>
  server: <MQ Hostname or IP>
  users:
    neon_llm_llama:
      password: <neon_llama user's password>
      user: neon_llama
LLM_LLAMA:
  context_depth: 3
  max_tokens: 256
  num_parallel_processes: 2
  num_threads_per_process: 4
```
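As a quick sanity check, the file can be loaded with PyYAML to confirm it parses and contains the sections shown above. This is an illustrative sketch, not part of the service itself:
```python
# Sketch: confirm diana.yaml parses and has the expected top-level sections.
import os
import yaml  # pip install pyyaml

config_home = os.environ.get("XDG_CONFIG_HOME", "/config")
with open(os.path.join(config_home, "neon", "diana.yaml")) as f:
    config = yaml.safe_load(f)

for key in ("MQ", "LLM_LLAMA"):
    assert key in config, f"Missing '{key}' section"
print(f"MQ broker: {config['MQ']['server']}:{config['MQ']['port']}")
```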
For example, if your configuration resides in `~/.config`:
```shell
export CONFIG_PATH="/home/${USER}/.config"
docker run -v ${CONFIG_PATH}:/config neon_llm_llama
```
> Note: If connecting to a local MQ server, you may need to specify `--network host`.
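For example, when the MQ broker runs directly on the Docker host:
```shell
docker run --network host -v ${CONFIG_PATH}:/config neon_llm_llama
```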
## System setup
```shell
# Nvidia Docker
sudo apt install curl
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
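To confirm the toolkit works, you can run `nvidia-smi` inside any CUDA-enabled image; the image tag below is just one example:
```shell
# Should print the host's GPU table if the NVIDIA runtime is set up correctly
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```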
## Run docker
```shell
export CONFIG_PATH="/home/${USER}/.config"
docker run --gpus all -v ${CONFIG_PATH}:/config neon_llm_llama
```
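Docker's `--gpus` flag accepts `all`, a device count, or a `device=` specifier; note that a bare `--gpus 0` requests zero GPUs rather than GPU index 0. To pin the container to a single GPU:
```shell
# Expose only the first GPU (index 0) to the container
docker run --gpus device=0 -v ${CONFIG_PATH}:/config neon_llm_llama
```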