Commit

add deepseek-r1-distill-qwen-1.5b-rtx4070 model (#376)
Fixes #374.

Note that this requires the new vLLM 0.7.1 image.
samos123 authored Feb 2, 2025
1 parent a963f8b commit dd5a548
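The vLLM 0.7.1 image mentioned in the message is set outside this commit, by pointing the main kubeai chart at that image. A minimal sketch of such an override, assuming the modelServers.VLLM.images map exposed by recent kubeai chart releases (the key names and exact image tag should be verified against your chart version):

# Hypothetical values override for the main kubeai chart, not part of this commit.
# Assumes modelServers.VLLM.images exists in your chart version.
modelServers:
  VLLM:
    images:
      default: "vllm/vllm-openai:v0.7.1"  # assumed tag matching the 0.7.1 note above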
Showing 2 changed files with 30 additions and 0 deletions.
13 changes: 13 additions & 0 deletions charts/models/values.yaml
@@ -307,6 +307,19 @@ catalog:
      - --disable-log-requests
    resourceProfile: nvidia-gpu-gh200:1
    targetRequests: 200
  deepseek-r1-distill-qwen-1.5b-rtx4070:
    enabled: false
    features: ["TextGeneration"]
    url: "hf://deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    engine: VLLM
    env:
      VLLM_USE_V1: "1"
    args:
      - --max-model-len=2048
      - --max-num-batched-tokens=2048
      - --max-num-seqs=8
      - --kv-cache-dtype=fp8
    resourceProfile: nvidia-gpu-rtx4070-8gb:1
  deepseek-r1-mi300x:
    enabled: false
    features: [TextGeneration]
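The new catalog entry ships with enabled: false. A minimal sketch of a Helm values override that switches it on when installing or upgrading the models chart, following the catalog pattern this file already uses (the file name is a placeholder):

# models-override.yaml (hypothetical file name)
# Enable only the new catalog entry; all other fields come from the chart defaults.
catalog:
  deepseek-r1-distill-qwen-1.5b-rtx4070:
    enabled: true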
17 changes: 17 additions & 0 deletions manifests/models/deepseek-r1-distill-qwen-1.5b-rtx4070.yaml
@@ -0,0 +1,17 @@
# Source: models/templates/models.yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: deepseek-r1-distill-qwen-1.5b-rtx4070
spec:
  features: [TextGeneration]
  url: hf://deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  engine: VLLM
  args:
    - --max-model-len=2048
    - --max-num-batched-tokens=2048
    - --max-num-seqs=8
    - --kv-cache-dtype=fp8
  env:
    VLLM_USE_V1: "1"
  resourceProfile: nvidia-gpu-rtx4070-8gb:1
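The rendered manifest can also be applied on its own. If a replica should be kept warm instead of relying on scale-from-zero, recent KubeAI releases accept replica bounds on the Model spec; a hedged sketch (minReplicas and maxReplicas are assumptions here, so confirm them against the kubeai.org/v1 Model CRD you are running):

# Hypothetical variant of the manifest above with assumed replica bounds added.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: deepseek-r1-distill-qwen-1.5b-rtx4070
spec:
  features: [TextGeneration]
  url: hf://deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  engine: VLLM
  args:
    - --max-model-len=2048
    - --max-num-batched-tokens=2048
    - --max-num-seqs=8
    - --kv-cache-dtype=fp8
  env:
    VLLM_USE_V1: "1"
  resourceProfile: nvidia-gpu-rtx4070-8gb:1
  minReplicas: 1   # assumed field: keep one replica warm
  maxReplicas: 1   # assumed field: cap scale-out on a single 8 GB GPU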
