Add deepseek-r1 examples #2234

Open · wants to merge 3 commits into base: master
15 changes: 15 additions & 0 deletions examples/.dstack-task.yml
@@ -0,0 +1,15 @@
type: dev-environment
# The name is optional; if not specified, it is generated randomly
name: vscode

python: "3.11"
# Uncomment to use a custom Docker image
#image: dstackai/base:py3.13-0.6-cuda-12.1
ide: vscode


# Uncomment to leverage spot instances
#spot_policy: auto

resources:
gpu: 24GB
36 changes: 28 additions & 8 deletions examples/.dstack.yml
@@ -1,15 +1,35 @@
type: dev-environment
# The name is optional, if not specified, generated randomly
name: vscode
type: service
name: llama31

python: "3.11"
# Uncomment to use a custom Docker image
#image: dstackai/base:py3.13-0.6-cuda-12.1
env:
- HF_TOKEN
- MODEL_ID=meta-llama/Llama-3.2-1B
- MAX_MODEL_LEN=4096
commands:
- pip install vllm
- curl -o simple_chat_template.jinja https://raw.githubusercontent.com/Bihan/vllm/main/examples/simple_chat_template.jinja
- vllm serve $MODEL_ID
--max-model-len $MAX_MODEL_LEN
--chat-template simple_chat_template.jinja

ide: vscode
auth: false

# Use either spot or on-demand instances
spot_policy: auto
port: 8000
# Register the model
model:
name: meta-llama/Llama-3.2-1B
type: chat
format: openai

# Uncomment to leverage spot instances
#spot_policy: auto

# Uncomment to cache downloaded models
#volumes:
# - /root/.cache/huggingface/hub:/root/.cache/huggingface/hub

resources:
gpu: 24GB
# Uncomment if using multiple GPUs
#shm_size: 24GB
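
Once this service is deployed behind a dstack gateway, the registered model can be queried through the gateway's OpenAI-compatible endpoint. A minimal Python sketch, assuming the openai SDK and placeholder values for the gateway URL and dstack user token:

from openai import OpenAI

# Placeholders: substitute your gateway URL and dstack user token.
client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>",
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)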
21 changes: 21 additions & 0 deletions examples/llms/deepseek/sglang/amd/.dstack.yml
@@ -0,0 +1,21 @@
type: service
name: deepseek-r1-amd

image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code

port: 8000
model:
Contributor comment: we could write model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B (shorter syntax)

name: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
type: chat
format: openai

resources:
gpu: mi300x
disk: 300GB
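
For reference, sglang.launch_server exposes an OpenAI-compatible HTTP API on port 8000. A sketch of a raw request with Python requests, assuming a hypothetical service URL and placeholder token (drop the Authorization header if auth is disabled):

import requests

# Hypothetical endpoint; substitute the URL dstack assigns to the service.
url = "https://deepseek-r1-amd.example.com/v1/chat/completions"
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "messages": [{"role": "user", "content": "Solve 12 * 13 step by step."}],
    "max_tokens": 256,
}
headers = {"Authorization": "Bearer <dstack token>"}  # placeholder token

r = requests.post(url, json=payload, headers=headers, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])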
22 changes: 22 additions & 0 deletions examples/llms/deepseek/sglang/nvidia/.dstack.yml
@@ -0,0 +1,22 @@
type: service
name: deepseek-r1-nvidia

image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code

port: 8000

model:
name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
type: chat
format: openai


resources:
gpu: 24GB
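
The R1 distill models typically emit their chain-of-thought inside <think>...</think> tags before the final answer. A small sketch for separating the two, assuming the OpenAI-compatible endpoint above with placeholder URL and token; exact tag handling depends on the chat template in use:

import re
from openai import OpenAI

client = OpenAI(base_url="https://gateway.example.com", api_key="<dstack token>")  # placeholders

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "Is 97 prime? Answer briefly."}],
    max_tokens=512,
)
text = resp.choices[0].message.content

# Some templates inject the opening <think> tag themselves, so treat it as optional.
m = re.search(r"(?:<think>)?(.*?)</think>\s*(.*)", text, re.DOTALL)
reasoning, answer = (m.group(1).strip(), m.group(2).strip()) if m else ("", text.strip())
print("Reasoning:", reasoning)
print("Answer:", answer)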
24 changes: 24 additions & 0 deletions examples/llms/deepseek/vllm/amd/.dstack.yml
@@ -0,0 +1,24 @@
type: service
name: deepseek-r1-amd

image: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- MAX_MODEL_LEN=4096

commands:
- pip install vllm
- vllm serve $MODEL_ID
--max-model-len $MAX_MODEL_LEN

port: 8000

model:
name: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
type: chat
format: openai


resources:
gpu: mi300x
disk: 300GB
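
A quick way to verify the deployment is to list the models the server reports via vLLM's OpenAI-compatible /v1/models route. A sketch with a hypothetical service URL and placeholder token (omit the header if auth is not enabled):

import requests

# Hypothetical endpoint and placeholder token.
r = requests.get(
    "https://deepseek-r1-amd.example.com/v1/models",
    headers={"Authorization": "Bearer <dstack token>"},
    timeout=30,
)
r.raise_for_status()
print([m["id"] for m in r.json()["data"]])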
21 changes: 21 additions & 0 deletions examples/llms/deepseek/vllm/nvidia/.dstack.yml
@@ -0,0 +1,21 @@
type: service
name: deepseek-r1-nvidia

image: vllm/vllm-openai:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- MAX_MODEL_LEN=4096
commands:
- pip install vllm
- vllm serve $MODEL_ID
--max-model-len $MAX_MODEL_LEN

port: 8000

model:
name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
type: chat
format: openai

resources:
gpu: 24GB
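
Responses can also be streamed token by token through the same OpenAI-compatible API served by vLLM. A minimal sketch, again with placeholder gateway URL and token:

from openai import OpenAI

client = OpenAI(base_url="https://gateway.example.com", api_key="<dstack token>")  # placeholders

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "Summarize the Pythagorean theorem."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()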