Lastly, and most importantly, we've added a guide on deploying Mixtral 8x7B as a service. This guide allows you to
effortlessly deploy a Mixtral endpoint on any cloud platform of your preference.
Deploying Mixtral 8x7B is easy, especially when using vLLM:
Note: After you update to 0.14.0, it's important to delete your existing gateways (if any)
using dstack gateway delete and create them again with dstack gateway create.
Create an access key by following this guide.
Once you've downloaded the .csv file with your IAM user's Access key ID and Secret access key, proceed to
configure the backend.
Log into your DataCrunch account, click Account Settings in the sidebar, find the REST API Credentials area, and then click the Generate Credentials button.
To configure a Kubernetes backend, specify the path to the kubeconfig file,
and the port that dstack can use for proxying SSH traffic.
Before running a service, ensure that you have configured a gateway.
If you're using dstack Sky, the default gateway is configured automatically for you.
Any embedding model served by Infinity automatically comes with an OpenAI-compatible Embeddings API, so we can use the openai package directly to interact with the deployed Infinity.
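For example, a minimal sketch of querying such an endpoint with the openai client (the endpoint URL, token, and model name below are placeholders; substitute the service URL and the embedding model you actually serve with Infinity):

from openai import OpenAI

# Placeholders: replace with your deployed Infinity endpoint, dstack token, and model
client = OpenAI(
    base_url="https://<run name>.<gateway domain>",
    api_key="<dstack token>",
)

response = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # whichever embedding model Infinity serves
    input=["What is Deep Learning?"],
)
print(response.data[0].embedding[:8])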
Llama Index loads data from local files, structures it into chunks, and ingests it into Weaviate (an open-source vector database).
We set up Llama Index to use local embeddings through the SentenceTransformers library.
dstack allows us to deploy LLMs to any cloud provider, e.g. via Services using TGI or vLLM.
Llama Index allows us to prompt the LLM, automatically incorporating the context from Weaviate.
Next, prepare the Llama Index classes: llama_index.ServiceContext (for indexing and querying) and llama_index.StorageContext (for loading and storing).
Embeddings
Note that we're using langchain.embeddings.huggingface.HuggingFaceEmbeddings for local embeddings instead of OpenAI.
Once the utility classes are configured, we can load the data from local files and pass it to llama_index.VectorStoreIndex. Using its from_documents method will then store the data in the vector database.
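As a rough sketch of the ingestion step (assuming the pre-0.10 llama_index API, a Weaviate instance reachable at a local URL, and documents stored in a ./data folder; the LLM part of the setup is omitted here):

import weaviate
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.vector_stores import WeaviateVectorStore

# Assumption: Weaviate is running locally
client = weaviate.Client("http://localhost:8080")

# Local embeddings via SentenceTransformers; the LLM is configured separately
service_context = ServiceContext.from_defaults(
    embed_model=HuggingFaceEmbeddings(), llm=None
)
storage_context = StorageContext.from_defaults(
    vector_store=WeaviateVectorStore(weaviate_client=client)
)

# Load local files, chunk them, and ingest them into Weaviate
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context, storage_context=storage_context
)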
This example assumes we're using an LLM deployed using TGI.
Once you've deployed the model, make sure to set the TGI_ENDPOINT_URL environment variable to its URL, e.g. https://<run name>.<gateway domain> (or http://localhost:<port> if it's deployed as a task). We'll use this environment variable below.
Once llama_index.VectorStoreIndex is ready, we can proceed with querying it.
Prompt format
If we're deploying Llama 2, we have to ensure that the prompt format is correct.
from llama_index import (QuestionAnswerPrompt, RefinePrompt)

text_qa_template = QuestionAnswerPrompt(
    """<s>[INST] <<SYS>>
We have provided context information below.

{context_str}

Given this information, please answer the question.
<</SYS>>

{query_str} [/INST]"""
)

refine_template = RefinePrompt(
    """<s>[INST] <<SYS>>
The original query is as follows:

{query_str}

We have provided an existing answer:

{existing_answer}

We have the opportunity to refine the existing answer (only if needed) with some more context below.

{context_msg}
<</SYS>>

Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer. [/INST]"""
)

query_engine = index.as_query_engine(
    text_qa_template=text_qa_template,
    refine_template=refine_template,
    streaming=True,
)

response = query_engine.query("Make a bullet-point timeline of the author's biography?")
response.print_response_stream()
That's it! This basic example shows how straightforward it is to use Llama Index and Weaviate with the LLMs deployed
using dstack. For more in-depth information, we encourage you to explore the documentation for each tool.
To deploy Mixtral as a service, you have to define the corresponding configuration file.
Below are multiple variants: via vLLM (fp16), TGI (fp16), or TGI (int4).
TGI fp16 / TGI int4 / vLLM fp16
type: service

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=mistralai/Mixtral-8x7B-Instruct-v0.1
commands:
  - text-generation-launcher
    --port 80
    --trust-remote-code
    --num-shard 2 # Should match the number of GPUs
port: 80

resources:
  gpu: 80GB:2
  disk: 200GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
  format: tgi
In case the service has the model mapping configured, you will also be able
to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI

client = OpenAI(base_url="https://gateway.<gateway domain>", api_key="<dstack token>")

completion = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta.content, end="")
print()
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
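For example, assuming your configuration file is named service.dstack.yml, the token can be passed via the CLI:

$ dstack run . -f service.dstack.yml --env HUGGING_FACE_HUB_TOKEN=<your token>

Alternatively, declare HUGGING_FACE_HUB_TOKEN (without a value) under env in the configuration file, and dstack will pick the value up from the CLI or the current process.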
Before running a service, ensure that you have configured a gateway.
If you're using dstack Sky, the default gateway is configured automatically for you.
Because we've configured the model mapping, it will also be possible
to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>",
    api_key="<dstack token>",
)

completion = client.chat.completions.create(
    model="mixtral",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta.content, end="")
print()
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
The most notable libraries that we'll use are peft (required for using the QLoRA
technique), bitsandbytes (required for using the quantization technique), and trl (required for supervised fine-tuning).
dstack will provision the cloud instance corresponding to the configured project and profile, run the training, and
tear down the cloud instance once the training is complete.
Tensorboard
Since we've executed tensorboard within our task and configured its port using ports, you can access it using the URL provided in the output. dstack automatically forwards the configured port to your local machine.
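A minimal sketch of such a task configuration (the dependencies, script name, log directory, and port below are assumptions):

type: task

python: "3.11"

ports:
  - 6006

commands:
  - pip install tensorboard
  - tensorboard --port 6006 --logdir results/runs &
  - python train.py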
If you want to serve an application for development purposes only, you can use
tasks. In this scenario, while the application runs in the cloud, it is accessible from your local machine only.

For production purposes, the optimal approach to serve an application is by using services. In this case, the application can be accessed through a public endpoint.
Before running a service, ensure that you have configured a gateway.
If you're using dstack Sky, the default gateway is configured automatically for you.

After the gateway is configured, go ahead and run the service.
Once the service is up, you can query it at
https://<run name>.<gateway domain> (using the domain set up for the gateway):
Authorization
By default, the service endpoint requires the Authorization header with "Bearer <dstack token>".
$ curl https://yellow-cat-1.example.com \
    -X POST \
    -H 'Content-Type: application/json' \
    -H 'Authorization: "Bearer <dstack token>"' \
    -d '{"inputs":"What is Deep Learning?"}'

[[0.010704354,-0.033910684,0.004793657,-0.0042832214,0.07551489,0.028702762,0.03985837,0.021956133,...]]
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
Note that the model property is optional and is only required if you're running a chat model and want to access it via an OpenAI-compatible endpoint. For more details on how to use this feature, check the documentation on services.
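For reference, a model mapping in the service configuration looks roughly like this (the model name below is an example):

model:
  type: chat
  name: mistralai/Mistral-7B-Instruct-v0.1
  format: tgi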
Before running a service, ensure that you have configured a gateway.
If you're using dstack Sky, the default gateway is configured automatically for you.
Once the service is up, you'll be able to access it at https://<run name>.<gateway domain>.
Authorization
By default, the service endpoint requires the Authorization header with "Bearer <dstack token>".
$ curl https://yellow-cat-1.example.com/generate \
    -X POST \
    -d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: "Bearer <dstack token>"'
Because we've configured the model mapping, it will also be possible
to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
Before running a service, ensure that you have configured a gateway.
If you're using dstack Sky, the default gateway is configured automatically for you.
Because we've configured the model mapping, it will also be possible
to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mixtral",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta.content, end="")
print()
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
Data science and ML tools have made significant advancements in recent years. This blog post aims to examine the advantages of cloud dev environments (CDE) for ML engineers and compare them with web-based managed notebooks.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#notebooks-are-here-to-stay","title":"Notebooks are here to stay","text":"
Jupyter notebooks are instrumental for interactive work with data. They provide numerous advantages such as high interactivity, visualization support, remote accessibility, and effortless sharing.
Managed notebook platforms, like Google Colab and AWS SageMaker have become popular thanks to their easy integration with clouds. With pre-configured environments, managed notebooks remove the need to worry about infrastructure.
As the code evolves, it needs to be converted into Python scripts and stored in Git for improved organization and version control. Notebooks alone cannot handle this task, which is why they must be a part of a developer environment that also supports Python scripts and Git.
The JupyterLab project attempts to address this by turning notebooks into an IDE by adding a file browser, terminal, and Git support.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#ides-get-equipped-for-ml","title":"IDEs get equipped for ML","text":"
Recently, IDEs have improved in their ability to support machine learning. They have started to combine the benefits of traditional IDEs and managed notebooks.
IDEs have upgraded their remote capabilities, with better SSH support. Additionally, they now offer built-in support for editing notebooks.
Two popular IDEs, VS Code and PyCharm, have both integrated remote capabilities and seamless notebook editing features.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#the-rise-of-app-ecosystem","title":"The rise of app ecosystem","text":"
Notebooks have been beneficial for their interactivity and sharing features. However, there are new alternatives like Streamlit and Gradio that allow developers to build data apps using Python code. These frameworks not only simplify app-building but also enhance reproducibility by integrating with Git.
Hugging Face Spaces, for example, is a popular tool today for sharing Streamlit and Gradio apps with others.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#say-hello-to-cloud-dev-environments","title":"Say hello to cloud dev environments!","text":"
Remote development within IDEs is becoming increasingly popular, and as a result, cloud dev environments have emerged as a new concept. Various managed services, such as Codespaces and GitPod, offer scalable infrastructure while maintaining the familiar IDE experience.
One such open-source tool is dstack, which enables you to define your dev environment declaratively as code and run it on any cloud.
With this tool, provisioning the required hardware, setting up the pre-built environment (no Docker is needed), and fetching your local code are all automated.
$ dstack run .

 RUN                 CONFIGURATION  USER   PROJECT  INSTANCE       SPOT POLICY
 honest-jellyfish-1  .dstack.yml    peter  gcp      a2-highgpu-1g  on-demand

Starting SSH tunnel...

To open in VS Code Desktop, use one of these link:
  vscode://vscode-remote/ssh-remote+honest-jellyfish-1/workflow

To exit, press Ctrl+C.
You can securely access the cloud development environment with the desktop IDE of your choice.
Learn more
Check out our guide for running dev environments in your cloud.
dstack is an open-source tool designed for managing AI infrastructure across various cloud platforms. It's lighter and more specifically geared towards AI tasks compared to Kubernetes.
Due to its support for multiple cloud providers, dstack is frequently used to access on-demand and spot GPUs across multiple clouds. From our users, we've learned that managing various cloud accounts, quotas, and billing can be cumbersome.
To streamline this process, we introduce dstack Sky, a managed service that enables users to access GPUs from multiple providers through dstack – without needing an account in each cloud provider.
"},{"location":"blog/dstack-sky/#what-is-dstack-sky","title":"What is dstack Sky?","text":"
Instead of running dstack server yourself, you point dstack config to a project set up with dstack Sky.
Now, you can use dstack's CLI or API – just like you would with your own cloud accounts.
$ dstack run . -b tensordock -b vastai

 #  BACKEND     REGION  RESOURCES                    SPOT  PRICE
 1  vastai      canada  16xCPU/64GB/1xRTX4090/1TB    no    $0.35
 2  vastai      canada  16xCPU/64GB/1xRTX4090/400GB  no    $0.34
 3  tensordock  us      8xCPU/48GB/1xRTX4090/480GB   no    $0.74
    ...
 Shown 3 of 50 offers, $0.7424 max

Continue? [y/n]:
Backends
dstack Sky supports the same backends as the open-source version, except that you don't need to set them up. By default, it uses all supported backends.
You can use both on-demand and spot instances without needing to manage quotas, as they are automatically handled for you.
With dstack Sky you can use all of dstack's features, incl. dev environments, tasks, services, and pools.
To use services, the open-source version requires setting up a gateway with your own domain. dstack Sky comes with a pre-configured gateway.
$ dstack gateway list
 BACKEND  REGION     NAME    ADDRESS       DOMAIN                             DEFAULT
 aws      eu-west-1  dstack  3.252.79.143  my-awesome-project.sky.dstack.ai   ✓
If you run it with dstack Sky, the service's endpoint will be available at https://<run name>.<project name>.sky.dstack.ai.
If it has a model mapping, the model will be accessible at https://gateway.<project name>.sky.dstack.ai via the OpenAI compatible interface.
from openai import OpenAI


client = OpenAI(
    base_url="https://gateway.<project name>.sky.dstack.ai",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mixtral",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
Now, you can choose — either use dstack via the open-source version or via dstack Sky, or even use them side by side.
Credits
Are you an active contributor to the AI community? Request free dstack Sky credits.
dstack Sky is live on Product Hunt. Support it by giving it your vote!
Join Discord
"},{"location":"changelog/","title":"Changelog","text":""},{"location":"changelog/0.10.5/","title":"dstack 0.10.5: Lambda integration, Docker support, and more","text":"
In the previous update, we added initial integration with Lambda Cloud. With today's release, this integration has significantly improved and finally goes generally available. Additionally, the latest release adds support for custom Docker images.
By default, dstack uses its own base Docker images to run dev environments and tasks. These base images come pre-configured with Python, Conda, and essential CUDA drivers. However, there may be times when you need additional dependencies that you don't want to install every time you run your dev environment or task.
To address this, dstack now allows specifying custom Docker images. Here's an example:
Dev environments require the Docker image to have openssh-server pre-installed. If you want to use a custom Docker image with a dev environment and it does not include openssh-server, you can install it using the following method:
Until now, dstack has supported dev-environment and task as configuration types. Even though task may be used for basic serving use cases, it lacks crucial serving features. With the new update, we introduce service, a dedicated configuration type for serving.
As you see, there are two differences compared to task.
The gateway property: the address of a special cloud instance that wraps the running service with a public endpoint. Currently, you must specify it manually. In the future, dstack will assign it automatically.
The port property: A service must always configure one port on which it's running.
When running, dstack forwards the traffic to the gateway, providing you with a public endpoint that you can use to access the running service.
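A minimal sketch of such a configuration (the command and port are illustrative; the gateway address is the one you created beforehand):

type: service

gateway: <the address of the gateway>
port: 8000

commands:
  - python -m http.server 8000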
Existing limitations
Currently, you must create a gateway manually using the dstack gateway command and specify its address via YAML (e.g. using secrets). In the future, dstack will assign it automatically.
Gateways do not support HTTPS yet. When you run a service, its endpoint URL is <the address of the gateway>:80. The port can be overridden via the port property: instead of 8000, specify <gateway port>:8000.
Gateways do not provide authorization and auto-scaling. In the future, dstack will support them as well.
This initial support for services is the first step towards providing multi-cloud and cost-effective inference.
Give it a try and share feedback
Even though the current support is limited in many ways, we encourage you to give it a try and share your feedback with us!
More details on how to use services can be found in a dedicated guide in our docs. Questions and requests for help are very much welcome in our Discord server.
"},{"location":"changelog/0.11.0/","title":"dstack 0.11.0: Multi-cloud and multi-region projects","text":"
The latest release of dstack enables the automatic discovery of the best GPU price and availability across multiple configured cloud providers and regions.
"},{"location":"changelog/0.11.0/#multiple-backends-per-project","title":"Multiple backends per project","text":"
Now, dstack leverages price data from multiple configured cloud providers and regions to automatically suggest the most cost-effective options.
The default behavior of dstack is to first attempt the most cost-effective options, provided they are available. You have the option to set a maximum price limit either through max_price in .dstack/profiles.yml or by using --max-price in the dstack run command.
To implement this change, we have modified the way projects are configured. You can now configure multiple clouds and regions within a single project.
Why does this matter?
The ability to run LLM workloads across multiple cloud GPU providers allows for a significant reduction in costs and an increase in availability, while also remaining independent of any particular cloud vendor.
We hope that the value of dstack will continue to grow as we expand our support for additional cloud GPU providers. If you're interested in a specific provider, please message us on Discord.
"},{"location":"changelog/0.11.0/#custom-domains-and-https","title":"Custom domains and HTTPS","text":"
In other news, it is now possible to deploy services using HTTPS. All you need to do is configure a wildcard domain (e.g., *.mydomain.com), point it to the gateway IP address, and then pass the subdomain you want to use (e.g., myservice.mydomain.com) to the gateway property in YAML (instead of the gateway IP address).
Using the dstack run command, you are now able to utilize options such as --gpu, --memory, --env, --max-price, and several other arguments to override the profile settings.
Lastly, the local backend is no longer supported. Now, you can run everything using only a cloud backend.
The documentation is updated to reflect the changes in the release.
Migration to 0.11
The dstack version 0.11 update brings significant changes that break backward compatibility. If you used prior dstack versions, after updating to dstack==0.11, you'll need to log in to the UI and reconfigure clouds.
We apologize for any inconvenience and aim to ensure future updates maintain backward compatibility.
"},{"location":"changelog/0.11.0/#give-it-a-try","title":"Give it a try","text":"
Getting started with dstack takes less than a minute. Go ahead and give it a try.
"},{"location":"changelog/0.12.0/","title":"dstack 0.12.0: Simplified cloud setup, and refined API","text":"
For the past six weeks, we've been diligently overhauling dstack with the aim of significantly simplifying the process of configuring clouds and enhancing the functionality of the API. Please take note of the breaking changes, as they necessitate careful migration.
Previously, the only way to configure clouds for a project was through the UI. Additionally, you had to specify not only the credentials but also set up a storage bucket for each cloud to store metadata.
Now, you can configure clouds for a project via ~/.dstack/server/config.yml. Example:
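(The snippet below is an illustrative sketch for an AWS backend; the exact fields depend on the backend type you configure.)

projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: access_key
          access_key: <access key>
          secret_key: <secret key>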
The dstack.api.Run instance provides methods for various operations including attaching to the run, forwarding ports to localhost, retrieving status, stopping, and accessing logs. For more details, refer to the reference.
Because we've prioritized CLI and API UX over the UI, the UI is no longer bundled. Please inform us if you experience any significant inconvenience related to this.
Gateways should now be configured using the dstack gateway command, and their usage requires you to specify a domain. Learn more about how to set up a gateway.
The dstack start command is now dstack server.
The Python API classes were moved from the dstack package to dstack.api.
Unfortunately, when upgrading to 0.12.0, there is no automatic migration for data. This means you'll need to delete ~/.dstack and configure dstack from scratch.
pip install \"dstack[all]==0.12.0\"
Delete ~/.dstack
Configure clouds via ~/.dstack/server/config.yml (see the new guide)
Run dstack server
The documentation and examples are updated.
"},{"location":"changelog/0.12.0/#give-it-a-try","title":"Give it a try","text":"
Getting started with dstack takes less than a minute. Go ahead and give it a try.
At dstack, we remain committed to our mission of building the most convenient tool for orchestrating generative AI workloads in the cloud. In today's release, we have added support for TensorDock, making it easier for you to leverage cloud GPUs at highly competitive prices.
Configuring your TensorDock account with dstack is very easy. Simply generate an authorization key in your TensorDock API settings and set it up in ~/.dstack/server/config.yml:
Now you can restart the server and proceed to using the CLI or API for running development environments, tasks, and services.
$ dstack run . -f .dstack.yml --gpu 40GB

 Min resources  1xGPU (40GB)
 Max price      -
 Max duration   6h
 Retry policy   no

 #  REGION        INSTANCE  RESOURCES                     SPOT  PRICE
 1  unitedstates  ef483076  10xCPU, 80GB, 1xA6000 (48GB)  no    $0.6235
 2  canada        0ca177e7  10xCPU, 80GB, 1xA6000 (48GB)  no    $0.6435
 3  canada        45d0cabd  10xCPU, 80GB, 1xA6000 (48GB)  no    $0.6435
    ...

Continue? [y/n]:
TensorDock offers cloud GPUs on top of servers from dozens of independent hosts, providing some of the most affordable GPU pricing you can find on the internet.
With dstack, you can now utilize TensorDock's GPUs through a highly convenient interface, which includes the developer-friendly CLI and API.
Feedback and support
Feel free to ask questions or seek help in our Discord server.
dstack simplifies gen AI model development and deployment through its developer-friendly CLI and API. It eliminates cloud infrastructure hassles while supporting top cloud providers (such as AWS, GCP, Azure, among others).
While dstack streamlines infrastructure challenges, GPU costs can still hinder development. To address this, we've integrated dstack with Vast.ai, a marketplace providing GPUs from independent hosts at notably lower prices compared to other providers.
With the dstack 0.12.3 release, it's now possible to use Vast.ai alongside other cloud providers.
Now you can restart the server and proceed to using dstack's CLI and API.
If you want an easy way to develop, train and deploy gen AI models using affordable cloud GPUs, give dstack with Vast.ai a try.
Feedback and support
Feel free to ask questions or seek help in our Discord server.
"},{"location":"changelog/0.13.0/","title":"dstack 0.13.0: Disk size, CUDA 12.1, Mixtral, and more","text":"
As we wrap up this year, we're releasing a new update and publishing a guide on deploying Mixtral 8x7B with dstack.
"},{"location":"changelog/0.13.0/#configurable-disk-size","title":"Configurable disk size","text":"
Previously, dstack set the disk size to 100GB regardless of the cloud provider. Now, to accommodate larger language models and datasets, dstack enables setting a custom disk size using --disk in dstack run or via the disk property in .dstack/profiles.yml.
With dstack, whether you're using dev environments, tasks, or services, you can opt for a custom Docker image (for self-installed dependencies) or stick with the default Docker image (dstack pre-installs CUDA drivers, Conda, Python, etc.).
We've upgraded the default Docker image's CUDA drivers to 12.1 (for better compatibility with modern libraries).
nvcc
If you're using the default Docker image and need the CUDA compiler (nvcc), you'll have to install it manually using conda install cuda. The image comes pre-configured with the nvidia/label/cuda-12.1.0 Conda channel.
Lastly, and most importantly, we've added a guide on deploying Mixtral 8x7B as a service. This guide allows you to effortlessly deploy a Mixtral endpoint on any cloud platform of your preference.
Deploying Mixtral 8x7B is easy, especially when using vLLM:
type: service

python: "3.11"

commands:
  - conda install cuda # (required by megablocks)
  - pip install torch # (required by megablocks)
  - pip install vllm megablocks
  - python -m vllm.entrypoints.openai.api_server
    --model mistralai/Mixtral-8X7B-Instruct-v0.1
    --host 0.0.0.0
    --tensor-parallel-size 2 # should match the number of GPUs

port: 8000
Once the configuration is defined, go ahead and run it:
$ dstack run . -f llms/mixtral.dstack.yml --gpu "80GB:2" --disk 200GB
It will deploy the endpoint at https://<run-name>.<gateway-domain>.
Because vLLM provides an OpenAI-compatible endpoint, feel free to access it using various OpenAI-compatible tools like Chat UI, LangChain, Llama Index, etc.
Check the complete guide for more details.
Don't forget, with dstack, you can use spot instances across different clouds and regions.
"},{"location":"changelog/0.13.0/#feedback-and-support","title":"Feedback and support","text":"
That's all! Feel free to try out the update and the new guide, and share your feedback with us.
The service configuration deploys any application as a public endpoint. For instance, you can use HuggingFace's TGI or other frameworks to deploy custom LLMs. While this is simple and customizable, using different frameworks and LLMs complicates the integration of LLMs.
With dstack 0.14.0, we are extending the service configuration in dstack to enable you to optionally map your custom LLM to an OpenAI-compatible endpoint.
Here's how it works: you define a service (as before) and include the model property with the model's type, name, format, and other settings.
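For example, a TGI-based service with a model mapping might look roughly like this (the image, model, and port are illustrative):

type: service

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
commands:
  - text-generation-launcher --port 80 --trust-remote-code
port: 80

# Map the custom LLM to an OpenAI-compatible endpoint
model:
  type: chat
  name: mistralai/Mistral-7B-Instruct-v0.1
  format: tgi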
When you deploy the service using dstack run, dstack will automatically publish the OpenAI-compatible endpoint, converting the prompt and response format between your LLM and OpenAI interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<your gateway domain>",
    api_key="none"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
Here's a live demo of how it works:
For more details on how to use the new feature, be sure to check the updated documentation on services, and the TGI example.
Note: After you update to 0.14.0, it's important to delete your existing gateways (if any) using dstack gateway delete and create them again with dstack gateway create.
In case you have any questions, experience bugs, or need help, drop us a message on our Discord server or submit it as a GitHub issue.
"},{"location":"changelog/0.15.0/","title":"dstack 0.15.0: Resources, authorization, and more","text":"
The latest update brings many improvements, enabling the configuration of resources in YAML files, requiring authorization in services, supporting OpenAI-compatible endpoints for vLLM, and more.
Previously, if you wanted to request hardware resources, you had to either use the corresponding arguments with dstack run (e.g. --gpu GPU_SPEC) or use .dstack/profiles.yml.
With 0.15.0, it is now possible to configure resources in the YAML configuration file:
Supported properties include: gpu, cpu, memory, disk, and shm_size.
If you specify memory size, you can either specify an explicit size (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), etc.
It's also possible to configure gpu as an object:
type: dev-environment

python: 3.11
ide: vscode

# Require 2 GPUs of at least 40GB with CUDA compute compatibility of 7.5
resources:
  gpu:
    count: 2
    memory: 40GB..
    compute_capability: 7.5
"},{"location":"changelog/0.15.0/#authorization-in-services","title":"Authorization in services","text":"
Previously, when deploying a service, the public endpoint didn't support authorization, meaning anyone with access to the gateway could call it.
With 0.15.0, by default, service endpoints require the Authorization header with "Bearer <dstack token>".
$ curl https://yellow-cat-1.example.com/generate \
    -X POST \
    -d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: "Bearer <dstack token>"'
Authorization can be disabled by setting auth to false in the service configuration file.
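For example, a minimal fragment of a service configuration with authorization disabled:

type: service

# Disable authorization for the public endpoint
auth: false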
In case the service has model mapping configured, the OpenAI-compatible endpoint requires authorization.
from openai import OpenAI


client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
"},{"location":"changelog/0.15.0/#model-mapping-for-vllm","title":"Model mapping for vLLM","text":"
Last but not least, we've added one more format for model mapping: openai.
For example, if you run vLLM using the OpenAI mode, it's possible to configure model mapping for it.
When we run such a service, it will be possible to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface and using your dstack user token.
In addition to a few bug fixes, the latest update brings initial integration with Kubernetes (experimental) and adds the possibility to configure a custom VPC for AWS. Read below for more details.
"},{"location":"changelog/0.15.1/#configuring-a-kubernetes-backend","title":"Configuring a Kubernetes backend","text":"
With the latest update, it's now possible to configure a Kubernetes backend. In this case, if you run a workload, dstack will provision infrastructure within your Kubernetes cluster. This may work with both self-managed and managed clusters.
Prerequisite
To use GPUs with Kubernetes, the cluster must be installed with the NVIDIA GPU Operator.
To configure a Kubernetes backend, you need to specify the path to the kubeconfig file, and the port that dstack can use for proxying SSH traffic. In case of a self-managed cluster, also specify the IP address of any node in the cluster.
Self-managed / Managed
Here's how to configure the backend to use a self-managed cluster.
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_host: localhost # The external IP address of any node
      ssh_port: 32000 # Any port accessible outside of the cluster
The port specified to ssh_port must be accessible outside of the cluster.
For example, if you are using Kind, make sure to add it via extraPortMappings:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 32000 # Must be same as `ssh_port`
    hostPort: 32000 # Must be same as `ssh_port`
Here's how to configure the backend to use a managed cluster (AWS, GCP, Azure).
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_port: 32000 # Any port accessible outside of the cluster
The port specified to ssh_port must be accessible outside of the cluster.
For example, if you are using EKS, make sure to add it via an ingress rule of the corresponding security group:
While dstack supports both self-managed and managed clusters, if you're using AWS, GCP, or Azure, it's generally recommended to use the corresponding backends directly for greater efficiency and ease of use.
"},{"location":"changelog/0.15.1/#specifying-a-custom-vpc-for-aws","title":"Specifying a custom VPC for AWS","text":"
If you're using dstack with AWS, it's now possible to configure a custom VPC via ~/.dstack/server/config.yml:
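A sketch of what this could look like (the VPC name is illustrative, and the property name should be verified against the server/config.yml reference):

projects:
  - name: main
    backends:
      - type: aws
        vpc_name: my-custom-vpc
        creds:
          type: default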
In this case, dstack will attempt to utilize the VPC with the configured name in each region. If any region lacks a VPC with that name, it will be skipped.
NOTE:
All subnets of the configured VPC should be public; otherwise, dstack won't be able to manage workloads.
Previously, when running a dev environment, task, or service, dstack provisioned an instance in a configured backend, and upon completion of the run, deleted the instance.
In the latest update, we introduce pools, a significantly more efficient way to manage instance lifecycles and reuse instances across runs.
Now, when using the dstack run command, it tries to reuse an instance from a pool. If no ready instance meets the requirements, dstack automatically provisions a new one and adds it to the pool.
Once the workload finishes, the instance is marked as ready (to run other workloads). If the instance remains idle for the configured duration, dstack tears it down.
Idle duration
By default, if dstack run provisions a new instance, its idle duration is set to 5m. This means the instance waits for a new workload for only five minutes before getting torn down. To override it, use the --idle-duration DURATION argument.
The dstack pool command allows for managing instances within pools.
To manually add an instance to a pool, use dstack pool add:
$ dstack pool add --gpu 80GB --idle-duration 1d

 BACKEND     REGION         RESOURCES                     SPOT  PRICE
 tensordock  unitedkingdom  10xCPU, 80GB, 1xA100 (80GB)   no    $1.595
 azure       westus3        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673
 azure       westus2        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673

Continue? [y/n]: y
The dstack pool add command allows specifying resource requirements, along with the spot policy, idle duration, max price, retry policy, and other policies.
If no idle duration is configured, by default, dstack sets it to 72h. To override it, use the --idle-duration DURATION argument.
Limitations
The dstack pool add command is not yet supported for Lambda, Azure, TensorDock, Kubernetes, and VastAI backends. Support for them is coming in version 0.16.1.
Refer to pools for more details on the new feature and how to use it.
"},{"location":"changelog/0.16.0/#why-does-this-matter","title":"Why does this matter?","text":"
With this new feature, using the cloud can be a lot more predictable and convenient:
Now, you can provision instances in advance and ensure they are available for the entire duration of the project. This saves you from the risk of not having a GPU when you need it most.
If you reuse an instance from a pool, dstack run starts much faster. For example, you can provision an instance and reuse it for running a dev environment, task, or service.
Have questions or need help? Drop us a message on our Discord server. See a bug? Report it to GitHub issues.
"},{"location":"changelog/0.16.1/","title":"dstack 0.16.1: Improvements to dstack pool and bug-fixes","text":"
The latest update enhances the dstack pool command introduced earlier, and it fixes a number of important bugs.
"},{"location":"changelog/0.16.1/#improvements-to-dstack-pool","title":"Improvements to dstack pool","text":"
The dstack pool command, that allows you to manually add instances to the pool, has received several improvements:
The dstack pool add command now works with all VM-based backends (which means all backends except vastai and kubernetes).
The dstack pool add command now accepts the arguments to configure the spot policy (via --spot-auto, --spot, --on-demand) and idle duration (via --idle-duration DURATION). By default, the spot policy is set to on-demand, while the idle duration is set to 72h.
Didn't try dstack pool yet? Give it a try now. It significantly improves the predictability and convenience of using cloud GPUs.
The 0.16.0 update broke the vastai backend (the dstack run command didn't show offers).
If you submitted runs via the API, the default idle duration was not applied, leading to instances staying in the pool and not being automatically removed.
dstack couldn't connect to the instance via SSH due to a number of issues related to not properly handling the user's default SSH config.
When connecting to a run via ssh <run name> (while using the default Docker image), python, pip, and conda couldn't be found due to the broken PATH.
On our journey to provide an open-source, cloud-agnostic platform for orchestrating GPU workloads, we are proud to announce another step forward – the integration with CUDO Compute.
CUDO Compute is a GPU marketplace that offers cloud resources at an affordable cost in a number of locations. Currently, the available GPUs include A40, RTX A6000, RTX A4000, RTX A5000, and RTX 3080.
To use it with dstack, you only need to configure the cudo backend with your CUDO Compute project ID and API key:
Once it's done, you can restart the dstack server and use the dstack CLI or API to run workloads.
$ dstack run . -b cudo

 #  BACKEND  REGION       RESOURCES                                    SPOT  PRICE
 1  cudo     no-luster-1  25xCPU, 96GB, 1xA6000 (48GB), 100GB (disk)   no    $1.17267
 2  cudo     no-luster-1  26xCPU, 100GB, 1xA6000 (48GB), 100GB (disk)  no    $1.17477
 3  cudo     no-luster-1  27xCPU, 100GB, 1xA6000 (48GB), 100GB (disk)  no    $1.17687
    ...
 Shown 3 of 8 offers, $1.18737 max

 Continue? [y/n]:
Just like with other backends, the cudo backend allows you to launch dev environments, run tasks, and deploy services with dstack run, and manage your pool of instances via dstack pool.
Limitations
The dstack gateway feature is not yet compatible with cudo, but it is expected to be supported in version 0.17.0, planned for release within a week.
The cudo backend cannot yet be used with dstack Sky, but it will also be enabled within a week.
Haven't tried dstack yet? You're very welcome to do so now. With dstack, orchestrating GPU workloads over any cloud is very easy!
Previously, dstack always served services as single replicas. While this is suitable for development, in production, the service must automatically scale based on the load.
That's why in 0.17.0, we extended dstack with the capability to configure the number of replicas as well as the auto-scaling policy.
The replicas property can be set either to a number or to a range. In the case of a range, the scaling property is required to configure the auto-scaling policy. The auto-scaling policy requires specifying metric (such as rps, i.e. "requests per second") and its target (the metric value).
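A sketch of what this could look like in a service configuration (the range and target value are illustrative):

type: service

# Scale between 1 and 4 replicas based on requests per second
replicas: 1..4
scaling:
  metric: rps
  target: 10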
"},{"location":"changelog/0.17.0/#regions-and-instance-types","title":"Regions and instance types","text":"
Also, the update brings a simpler way to configure regions and instance types.
For example, if you'd like to use only a subset of specific regions or instance types, you can now configure them via .dstack/profiles.yml.
Previously, environment variables had to be hardcoded in the configuration file or passed via the CLI. The update brings two major improvements.
Firstly, it's now possible to configure an environment variable in the configuration without hardcoding its value. Secondly, dstack run now inherits environment variables from the current process.
Together, these features allow users to define environment variables separately from the configuration and pass them to dstack run conveniently, such as by using a .env file.
Now, if you run this configuration, dstack will ensure that you've set HUGGING_FACE_HUB_TOKEN either via HUGGING_FACE_HUB_TOKEN=<value> dstack run ..., dstack run -e HUGGING_FACE_HUB_TOKEN=<value> ..., or by using other tools such as direnv or similar.
Currently supported providers for this feature include AWS, GCP, and Azure. For other providers or on-premises servers, file the corresponding feature requests or ping on Discord.
One more small improvement is that the commands property is now not required for tasks and services if you use an image that has a default entrypoint configured.
With the release of version 0.2 of dstack, it is now possible to configure GCP as a remote. All features that were previously available for AWS, except real-time artifacts, are now available for GCP as well.
This means that you can define your ML workflows in code and easily run them locally or remotely in your GCP account.
dstack automatically creates and deletes cloud instances as needed, and assists in setting up the environment, including pipeline dependencies, and saving/loading artifacts.
No code changes are required since ML workflows are described in YAML. You won't need to deal with Docker, Kubernetes, or stateful UI.
This article will explain how to use dstack to run remote ML workflows on GCP.
Ensure that you have installed the latest version of dstack before proceeding.
$ pip install dstack --upgrade
By default, workflows run locally. To run workflows remotely (e.g. in a GCP account), you must configure a remote using the dstack config command. Follow the steps below to do so.
"},{"location":"changelog/0.2/#1-create-a-project","title":"1. Create a project","text":"
First, you have to create a project in your GCP account, link a billing account to it, and make sure that the required APIs are enabled for it.
"},{"location":"changelog/0.2/#2-create-a-storage-bucket","title":"2. Create a storage bucket","text":"
Once the project is set up, you can proceed and create a storage bucket. This bucket will be used to store workflow artifacts and metadata.
NOTE:
Make sure to create the bucket in the same location where you'd like to run your workflows.
"},{"location":"changelog/0.2/#3-create-a-service-account","title":"3. Create a service account","text":"
The next step is to create a service account in the created project and configure the following roles for it: Service Account User, Compute Admin, Storage Admin, Secret Manager Admin, and Logging Admin.
Once the service account is set up, create a key for it and download the corresponding JSON file to your local machine (e.g. to ~/Downloads/my-awesome-project-d7735ca1dd53.json).
"},{"location":"changelog/0.2/#4-configure-the-cli","title":"4. Configure the CLI","text":"
Once the service account key JSON file is on your machine, you can configure the CLI using the dstack config command.
The command will ask you for a path to the key, GCP region and zone, and storage bucket name.
$ dstack config

? Choose backend: gcp
? Enter path to credentials file: ~/Downloads/dstack-d7735ca1dd53.json
? Choose GCP geographic area: North America
? Choose GCP region: us-west1
? Choose GCP zone: us-west1-b
? Choose storage bucket: dstack-dstack-us-west1
? Choose VPC subnet: no preference
That's it! Now you can run remote workflows on GCP.
Last October, we open-sourced the dstack CLI for defining ML workflows as code and running them easily on any cloud or locally. The tool abstracts ML engineers from vendor APIs and infrastructure, making it convenient to run scripts, development environments, and applications.
Today, we are excited to announce a preview of Hub, a new way to use dstack for teams to manage their model development workflows effectively on any cloud platform.
"},{"location":"changelog/0.7.0/#how-does-it-work","title":"How does it work?","text":"
Previously, the dstack CLI configured a cloud account as a remote to use local cloud credentials for direct requests to the cloud. Now, the CLI allows configuration of Hub as a remote, enabling requests to the cloud using user credentials stored in Hub.
sequenceDiagram
  autonumber
  participant CLI
  participant Hub
  participant Cloud
  % Note right of Cloud: AWS, GCP, etc
  CLI->>Hub: Run a workflow
  activate Hub
  Hub-->>Hub: User authentication
  loop Workflow provider
    Hub-->>Cloud: Submit workflow jobs
  end
  Hub-->>CLI: Return the workflow status
  deactivate Hub
  loop Workflow scheduler
    Hub-->>Cloud: Re-submit workflow jobs
  end
The Hub not only provides basic features such as authentication and credential storage, but it also has built-in workflow scheduling capabilities. For instance, it can monitor the availability of spot instances and automatically resubmit jobs.
"},{"location":"changelog/0.7.0/#why-does-it-matter","title":"Why does it matter?","text":"
As you start developing models more regularly, you'll encounter the challenge of automating your ML workflows to reduce time spent on infrastructure and manual work.
While many cloud vendors offer tools to automate ML workflows, they do so through opinionated UIs and APIs, leading to a suboptimal developer experience and vendor lock-in.
In contrast, dstack aims to provide a non-opinionated and developer-friendly interface that can work across any vendor.
"},{"location":"changelog/0.7.0/#try-the-preview","title":"Try the preview","text":"
Here's a quick guide to get started with Hub:
Start the Hub application
Visit the URL provided in the output to log in as an administrator
Create a project and configure its backend (AWS or GCP)
Currently, the only way to run or manage workflows is through the dstack CLI. There are scenarios where you'd prefer to run workflows in other ways, e.g. from Python code or programmatically via an API. To support these scenarios, we plan to release a Python SDK and a REST API soon.
The built-in scheduler currently monitors spot instance availability and automatically resubmits jobs. Our plan is to enhance this feature and include additional capabilities. Users will be able to track cloud compute usage and manage quotas per team via the user interface.
Lastly, and of utmost importance, we plan to extend support to other cloud platforms, not limiting ourselves to AWS, GCP, and Azure.
At dstack, our goal is to create a simple and unified interface for ML engineers to run dev environments, pipelines, and apps on any cloud. With the latest update, we take another significant step in this direction.
We are thrilled to announce that the latest update introduces Azure support, among other things, making it incredibly easy to run dev environments, pipelines, and apps in Azure. Read on for more details.
Using Azure with dstack is very straightforward. All you need to do is create the corresponding project via the UI and provide your Azure credentials.
NOTE:
For detailed instructions on setting up dstack for Azure, refer to the documentation.
Once the project is set up, you can define dev environments, pipelines, and apps as code, and easily run them with just a single command. dstack will automatically provision the infrastructure for you.
"},{"location":"changelog/0.9.1/#logs-and-artifacts-in-ui","title":"Logs and artifacts in UI","text":"
Secondly, with the new update, you now have the ability to browse the logs and artifacts of any run through the user interface.
Last but not least, with the update, we have reworked the documentation to provide a greater emphasis on specific use cases: dev environments, tasks, and services.
"},{"location":"changelog/0.9.1/#try-it-out","title":"Try it out","text":"
Please note that when installing dstack via pip, you now need to specify the exact list of cloud providers you intend to use:
$ pip install "dstack[aws,gcp,azure]" -U
This requirement applies only when you start the server locally. If you connect to a server hosted elsewhere, you can use the shorter syntax: pip install dstack.
Feedback
If you have any feedback, including issues or questions, please share them in our Discord community or file it as a GitHub issue.
"},{"location":"docs/","title":"What is dstack?","text":"
dstack is an open-source orchestration engine for running AI workloads. It supports a wide range of cloud providers (such as AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, CUDO, RunPod, etc.) as well as on-premises infrastructure.
"},{"location":"docs/#why-use-dstack","title":"Why use dstack?","text":"
Designed for development, training, and deployment of gen AI models.
Efficiently utilizes compute across cloud providers and on-prem servers.
Compatible with any training, fine-tuning, and serving frameworks, as well as other third-party tools.
100% open-source.
"},{"location":"docs/#how-does-it-work","title":"How does it work?","text":"
Install the open-source version of dstack and configure your own cloud accounts, or sign up with dstack Sky
Define configurations such as dev environments, tasks, and services.
Run configurations via dstack's CLI or API.
Use pools to manage instances and on-prem servers.
"},{"location":"docs/#where-do-i-start","title":"Where do I start?","text":"
To use the open-source version, make sure to install the server and configure backends.
If you're using dstack Sky, install the CLI and run the dstack config command:
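For example (a sketch; the exact flags should be checked against the CLI reference, and the project name and token below are placeholders):

$ pip install dstack
$ dstack config --url https://sky.dstack.ai --project <project name> --token <your token>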
Once the CLI is set up, follow the quickstart.
"},{"location":"docs/quickstart/#initialize-a-repo","title":"Initialize a repo","text":"
To use dstack's CLI in a folder, first run dstack init within that folder.
$ mkdir quickstart && cd quickstart
$ dstack init
Your folder can be a regular local folder or a Git repo.
"},{"location":"docs/quickstart/#define-a-configuration","title":"Define a configuration","text":"
Define what you want to run as a YAML file. The filename must end with .dstack.yml (e.g., .dstack.yml or train.dstack.yml are both acceptable).
Dev environment / Task / Service
Dev environments allow you to quickly provision a machine with a pre-configured environment, resources, IDE, code, etc.
type: dev-environment

# Use either `python` or `image` to configure environment
python: "3.11"
# image: ghcr.io/huggingface/text-generation-inference:latest

ide: vscode

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB
Tasks make it very easy to run any scripts, be it for training, data processing, or web apps. They allow you to pre-configure the environment, resources, code, etc.
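A task configuration could look roughly like this (the dependencies, script, and resources below are illustrative):

type: task

python: "3.11"

commands:
  - pip install -r requirements.txt
  - python train.py

resources:
  gpu: 80GB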
Run a configuration using the dstack run command, followed by the working directory path (e.g., .), the path to the configuration file, and run options (e.g., configuring hardware resources, spot policy, etc.)
Before submitting a task or deploying a model, you may want to run code interactively. Dev environments allow you to do exactly that.
You specify the required environment and resources, then run it. dstack provisions the dev environment in the configured backend and enables access via your desktop IDE.
"},{"location":"docs/concepts/dev-environments/#define-a-configuration","title":"Define a configuration","text":"
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g. .dstack.yml or dev.dstack.yml are both acceptable).
The YAML file allows you to specify your own Docker image, environment variables, resource requirements, etc. If image is not specified, dstack uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
.dstack.yml
For more details on the file syntax, refer to the .dstack.yml reference.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/concepts/dev-environments/#run-the-configuration","title":"Run the configuration","text":"
To run a configuration, use the dstack run command followed by the working directory path, configuration file path, and other options.
$ dstack run . -f .dstack.yml

 BACKEND     REGION         RESOURCES                     SPOT  PRICE
 tensordock  unitedkingdom  10xCPU, 80GB, 1xA100 (80GB)   no    $1.595
 azure       westus3        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673
 azure       westus2        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673

Continue? [y/n]: y

Provisioning `fast-moth-1`...
---> 100%

To open in VS Code Desktop, use this link:
  vscode://vscode-remote/ssh-remote+fast-moth-1/workflow
When dstack provisions the dev environment, it uses the current folder contents.
Exclude files
If there are large files or folders you'd like to avoid uploading, you can list them in either .gitignore or .dstackignore.
The dstack run command allows specifying many things, including spot policy, retry and max duration, max price, regions, instance types, and much more.
In case you'd like to reuse certain parameters (such as spot policy, retry and max duration, max price, regions, instance types, etc.) across runs, you can define them via .dstack/profiles.yml.
"},{"location":"docs/concepts/dev-environments/#manage-runs","title":"Manage runs","text":""},{"location":"docs/concepts/dev-environments/#stop-a-run","title":"Stop a run","text":"
Once the run exceeds the max duration, or when you use dstack stop, the dev environment and its cloud resources are deleted.
Pools simplify managing the lifecycle of cloud instances and enable their efficient reuse across runs.
You can have instances provisioned in the configured backend automatically when you run a workload, or add them manually, configuring the required resources, idle duration, etc.
By default, when using the dstack run command, it tries to reuse an instance from a pool. If no idle instance meets the requirements, dstack automatically provisions a new one and adds it to the pool.
To avoid provisioning new instances with dstack run, use --reuse. Your run will be assigned to an idle instance in the pool.
Idle duration
By default, dstack run sets the idle duration of a newly provisioned instance to 5m. This means that if the run is finished and the instance remains idle for longer than five minutes, it is automatically removed from the pool. To override the default idle duration, use --idle-duration DURATION with dstack run.
"},{"location":"docs/concepts/pools/#dstack-pool-add","title":"dstack pool add","text":"
To manually add an instance to a pool, use dstack pool add:
$ dstack pool add --gpu 80GB

 BACKEND     REGION         RESOURCES                     SPOT  PRICE
 tensordock  unitedkingdom  10xCPU, 80GB, 1xA100 (80GB)   no    $1.595
 azure       westus3        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673
 azure       westus2        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673

Continue? [y/n]: y
The dstack pool add command allows specifying resource requirements, along with the spot policy, idle duration, max price, retry policy, and other policies.
The default idle duration if you're using dstack pool add is 72h. To override it, use the --idle-duration DURATION argument.
You can also specify the policies via .dstack/profiles.yml instead of passing them as arguments. For more details on policies and their defaults, refer to .dstack/profiles.yml.
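As a sketch, a profile that sets pool-related defaults for every run might look like this (the profile name and values are illustrative; the parameters are described in the profiles.yml reference):

profiles:
  - name: pool-defaults
    spot_policy: auto            # Consider both spot and on-demand instances
    termination_idle_time: 30m   # Destroy instances idle for longer than 30 minutes
    default: true                # Use this profile unless --profile overrides it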
Limitations
The dstack pool add command is not yet supported for the Kubernetes and VastAI backends.
Services make it very easy to deploy any kind of model or web application as public endpoints.
Use any serving framework and specify the required resources. dstack deploys the service in the configured backend, handles authorization and auto-scaling, and provides an OpenAI-compatible interface if needed.
Prerequisites
If you're using the open-source server, you first have to set up a gateway.
"},{"location":"docs/concepts/services/#set-up-a-gateway","title":"Set up a gateway","text":"
For example, if your domain is example.com, run the dstack gateway create command with the backend, region, and that domain.
Afterward, in your domain's DNS settings, add an A DNS record for *.example.com pointing to the IP address of the gateway.
Now, if you run a service, dstack will make its endpoint available at https://<run name>.<gateway domain>.
In case your service has the model mapping configured, dstack will automatically make your model available at https://gateway.<gateway domain> via the OpenAI-compatible interface.
If you're using dstack Sky, the gateway is set up for you.
"},{"location":"docs/concepts/services/#define-a-configuration","title":"Define a configuration","text":"
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g. .dstack.yml or train.dstack.yml are both acceptable).
The YAML file allows you to specify your own Docker image, environment variables, resource requirements, etc. If image is not specified, dstack uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
.dstack.yml
For more details on the file syntax, refer to the .dstack.yml reference.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/concepts/services/#configure-model-mapping","title":"Configure model mapping","text":"
By default, if you run a service, its endpoint is accessible at https://<run name>.<gateway domain>.
If you run a model, you can optionally configure the mapping to make it accessible via the OpenAI-compatible interface.
With this configuration, once the service is up, you'll be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
The format property currently supports only tgi (Text Generation Inference) and openai (for Text Generation Inference or vLLM running in OpenAI-compatible mode).
Chat template
By default, dstack loads the chat template from the model's repository. If it is not present there, manual configuration is required.
type: service

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ
commands:
  - text-generation-launcher --port 8000 --trust-remote-code --quantize gptq
port: 8000

resources:
  gpu: 80GB

# Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: TheBloke/Llama-2-13B-chat-GPTQ
  format: tgi
  chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' </s>' }}{% endif %}{% endfor %}"
  eos_token: "</s>"
Please note that model mapping is an experimental feature with the following limitations:
Doesn't work if your chat_template uses bos_token. As a workaround, replace bos_token inside chat_template with the token content itself.
Doesn't work if eos_token is defined in the model repository as a dictionary. As a workaround, set eos_token manually, as shown in the example above (see Chat template).
If you encounter any other issues, please make sure to file a GitHub issue.
"},{"location":"docs/concepts/services/#configure-replicas-and-auto-scaling","title":"Configure replicas and auto-scaling","text":"
By default, dstack runs a single replica of the service. You can configure the number of replicas as well as the auto-scaling policy.
If you specify the minimum number of replicas as 0, the service will scale down to zero when there are no requests.
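For illustration, here is a sketch of the relevant part of a service configuration; the metric name rps (requests per second) and the numbers are assumptions, so check the scaling reference below for the exact values supported:

type: service
# ... image, commands, port, resources, etc.

replicas: 0..4     # Scale between zero and four replicas
scaling:
  metric: rps      # The target metric to track (assumed here to be requests per second)
  target: 10       # The target value of the metric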
"},{"location":"docs/concepts/services/#run-the-configuration","title":"Run the configuration","text":"
To run a configuration, use the dstack run command followed by the working directory path, configuration file path, and any other options.
$ dstack run . -f serve.dstack.yml

 BACKEND     REGION         RESOURCES                     SPOT  PRICE
 tensordock  unitedkingdom  10xCPU, 80GB, 1xA100 (80GB)   no    $1.595
 azure       westus3        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673
 azure       westus2        24xCPU, 220GB, 1xA100 (80GB)  no    $3.673

Continue? [y/n]: y

Provisioning...
---> 100%

Service is published at https://yellow-cat-1.example.com
When dstack submits the service, it uses the current folder contents.
Exclude files
If there are large files or folders you'd like to avoid uploading, you can list them in either .gitignore or .dstackignore.
The dstack run command allows specifying many things, including spot policy, retry and max duration, max price, regions, instance types, and much more.
In case the service has the model mapping configured, you will also be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI


client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
In case you'd like to reuse certain parameters (such as spot policy, retry and max duration, max price, regions, instance types, etc.) across runs, you can define them via .dstack/profiles.yml.
"},{"location":"docs/concepts/services/#manage-runs","title":"Manage runs","text":""},{"location":"docs/concepts/services/#stop-a-run","title":"Stop a run","text":"
When you use dstack stop, the service and its cloud resources are deleted.
Tasks allow for convenient scheduling of any kind of batch jobs, such as training, fine-tuning, or data processing, as well as running web applications.
You simply specify the commands, required environment, and resources, and then submit it. dstack provisions the required resources in a configured backend and runs the task.
"},{"location":"docs/concepts/tasks/#define-a-configuration","title":"Define a configuration","text":"
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g. .dstack.yml or train.dstack.yml are both acceptable).
The YAML file allows you to specify your own Docker image, environment variables, resource requirements, etc. If image is not specified, dstack uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
.dstack.yml
For more details on the file syntax, refer to the .dstack.yml reference.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
A task can configure ports. In this case, if the task is running an application on a port, dstack run will securely allow you to access this port from your local machine through port forwarding.
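For example, a minimal task sketch (the TensorBoard command and port number are illustrative) that exposes a port for forwarding:

type: task

commands:
  - tensorboard --logdir ./logs --port 6006

ports:
  - 6006  # Forwarded to localhost while the task is running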
When dstack submits the task, it uses the current folder contents.
Exclude files
If there are large files or folders you'd like to avoid uploading, you can list them in either .gitignore or .dstackignore.
The dstack run command allows specifying many things, including spot policy, retry and max duration, max price, regions, instance types, and much more.
In case you'd like to reuse certain parameters (such as spot policy, retry and max duration, max price, regions, instance types, etc.) across runs, you can define them via .dstack/profiles.yml.
"},{"location":"docs/concepts/tasks/#manage-runs","title":"Manage runs","text":""},{"location":"docs/concepts/tasks/#stop-a-run","title":"Stop a run","text":"
Once the run exceeds the max duration, or when you use dstack stop, the task and its cloud resources are deleted.
There are two ways to configure AWS: using an access key or using the default credentials.
Create an access key by following this guide. Once you've downloaded the .csv file with your IAM user's Access key ID and Secret access key, proceed to configure the backend.
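With the keys at hand, the AWS backend entry in ~/.dstack/server/config.yml looks roughly like this (the key values are placeholders):

projects:
- name: main
  backends:
  - type: aws
    creds:
      type: access_key
      access_key: <access key ID from the .csv file>
      secret_key: <secret access key from the .csv file>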
Log into your DataCrunch account, click Account Settings in the sidebar, find the REST API Credentials area, and then click the Generate Credentials button.
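The generated credentials then go into ~/.dstack/server/config.yml. A sketch with placeholder values (the creds type shown here is an assumption; check the config.yml reference for the exact field names):

projects:
- name: main
  backends:
  - type: datacrunch
    creds:
      type: api_key            # Assumed credential type for DataCrunch
      client_id: <client ID>
      client_secret: <client secret>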
dstack supports both self-managed and managed Kubernetes clusters.
Prerequisite
To use GPUs with Kubernetes, the cluster must be installed with the NVIDIA GPU Operator.
To configure a Kubernetes backend, specify the path to the kubeconfig file, and the port that dstack can use for proxying SSH traffic. In case of a self-managed cluster, also specify the IP address of any node in the cluster.
Here's how to configure the backend to use a self-managed cluster.
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_host: localhost  # The external IP address of any node
      ssh_port: 32000      # Any port accessible outside of the cluster
The port specified in ssh_port must be accessible outside of the cluster.
For example, if you are using Kind, make sure to add it via extraPortMappings:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 32000  # Must be same as `ssh_port`
    hostPort: 32000       # Must be same as `ssh_port`
Here's how to configure the backend to use a managed cluster (AWS, GCP, Azure).
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_port: 32000  # Any port accessible outside of the cluster
The port specified in ssh_port must be accessible outside of the cluster.
For example, if you are using EKS, make sure to add it via an ingress rule of the corresponding security group:
"},{"location":"docs/installation/#start-the-server","title":"Start the server","text":"
Once the ~/.dstack/server/config.yml file is configured, proceed to start the server:
$ dstack server

Applying ~/.dstack/server/config.yml...

The admin token is "bbae0f28-d3dd-4820-bf61-8f4bb40815da"
The server is running at http://127.0.0.1:3000/
$ docker run -p 3000:3000 -v $HOME/.dstack/server/:/root/.dstack/server dstackai/dstack

Applying ~/.dstack/server/config.yml...

The admin token is "bbae0f28-d3dd-4820-bf61-8f4bb40815da"
The server is running at http://127.0.0.1:3000/
"},{"location":"docs/installation/#configure-the-cli","title":"Configure the CLI","text":"
To point the CLI to the dstack server, you need to configure ~/.dstack/config.yml with the server address, user token and project name.
$ dstack config --url http://127.0.0.1:3000 \
    --project main \
    --token bbae0f28-d3dd-4820-bf61-8f4bb40815da

Configuration is updated at ~/.dstack/config.yml
Instead of configuring run options as dstack run arguments or .dstack.yml parameters, you can define those options in profiles.yml and reuse them across different run configurations. dstack supports repository-level profiles defined in $REPO_ROOT/.dstack/profiles.yml and global profiles defined in ~/.dstack/profiles.yml.
Profile parameters are resolved with the following priority:
dstack run arguments
.dstack.yml parameters
Repository-level profiles from $REPO_ROOT/.dstack/profiles.yml
profiles:
  - name: large

    spot_policy: auto  # (Optional) The spot policy. Supports `spot`, `on-demand`, and `auto`.

    max_price: 1.5  # (Optional) The maximum price per instance per hour

    max_duration: 1d  # (Optional) The maximum duration of the run.

    retry:
      retry-duration: 3h  # (Optional) To wait for capacity

    backends: [azure, lambda]  # (Optional) Use only listed backends

    default: true  # (Optional) Activate the profile by default
You can mark any profile as default or pass its name via --profile to dstack run.
"},{"location":"docs/reference/profiles.yml/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/profiles.yml/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/profiles.yml/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/profiles.yml/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/profiles.yml/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/profiles.yml/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/profiles.yml/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/profiles.yml/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/profiles.yml/#pool_name","title":"pool_name - (Optional) The name of the pool. If not set, dstack will use the default name.","text":""},{"location":"docs/reference/profiles.yml/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/profiles.yml/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/profiles.yml/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/profiles.yml/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/profiles.yml/#name","title":"name - The name of the profile that can be passed as --profile to dstack run.","text":""},{"location":"docs/reference/profiles.yml/#default","title":"default - (Optional) If set to true, dstack run will use this profile by default..","text":""},{"location":"docs/reference/profiles.yml/#retry_policy","title":"retry_policy","text":""},{"location":"docs/reference/profiles.yml/#retry","title":"retry - (Optional) Whether to retry the run on failure or not.","text":""},{"location":"docs/reference/profiles.yml/#duration","title":"duration - (Optional) The maximum period of retrying the run, e.g., 4h or 1d.","text":""},{"location":"docs/reference/api/python/","title":"Python API","text":"
The Python API enables running tasks, services, and managing runs programmatically.
Below is a quick example of submitting a task for running and displaying its logs.
import sys

from dstack.api import Task, GPU, Client, Resources

client = Client.from_config()

task = Task(
    image="ghcr.io/huggingface/text-generation-inference:latest",
    env={"MODEL_ID": "TheBloke/Llama-2-13B-chat-GPTQ"},
    commands=[
        "text-generation-launcher --trust-remote-code --quantize gptq",
    ],
    ports=["80"],
    resources=Resources(gpu=GPU(memory="24GB")),
)

run = client.runs.submit(
    run_name="my-awesome-run",  # If not specified, a random name is assigned
    configuration=task,
    repo=None,  # Specify to mount additional files
)

run.attach()

try:
    for log in run.logs():
        sys.stdout.buffer.write(log)
        sys.stdout.buffer.flush()
except KeyboardInterrupt:
    run.stop(abort=True)
finally:
    run.detach()
NOTE:
The configuration argument in the submit method can be either dstack.api.Task or dstack.api.Service.
If you create dstack.api.Task or dstack.api.Service, you may specify the image argument. If image isn't specified, the default image will be used. For a private Docker registry, ensure you also pass the registry_auth argument.
The repo argument in the submit method allows the mounting of a local folder, a remote repo, or a programmatically created repo. In this case, the commands argument can refer to the files within this repo.
The attach method waits for the run to start and, for dstack.api.Task, sets up an SSH tunnel and forwards the configured ports to localhost.
By default, it uses the default Git credentials configured on the machine. You can override these credentials via the git_identity_file or oauth_token arguments of the init method.
Once the repo is initialized, you can pass the repo object to the run:
run = client.runs.submit(
    configuration=...,
    repo=repo,
)
Parameters:

- repo (Repo) - The repo to initialize. Required.
- git_identity_file (Optional[PathLike]) - The private SSH key path for accessing the remote repo.

Resources parameters:

- cpu (Optional[Range[int]]) - The number of CPUs. Defaults to DEFAULT_CPU_COUNT.
- memory (Optional[Range[Memory]]) - The size of RAM memory (e.g., "16GB"). Defaults to DEFAULT_MEMORY_SIZE.
- gpu (Optional[GPUSpec]) - The GPU spec. Defaults to None.
- shm_size (Optional[Range[Memory]]) - The size of shared memory (e.g., "8GB"). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.
By default, it uses the default Git credentials configured on the machine. You can override these credentials via the git_identity_file or oauth_token arguments of the init method.
Finally, you can pass the repo object to the run:
run = client.runs.submit(
    configuration=...,
    repo=repo,
)
$ dstack server --help\nUsage: dstack server [-h] [--host HOST] [-p PORT] [-l LOG_LEVEL] [--default]\n [--no-default] [--token TOKEN]\n\nOptions:\n -h, --help Show this help message and exit\n --host HOST Bind socket to this host. Defaults to 127.0.0.1\n -p, --port PORT Bind socket to this port. Defaults to 3000.\n -l, --log-level LOG_LEVEL\n Server logging level. Defaults to WARNING.\n --default Update the default project configuration\n --no-default Do not update the default project configuration\n --token TOKEN The admin user token\n
This command initializes the current folder as a repo.
$ dstack init --help\nUsage: dstack init [-h] [--project PROJECT] [-t OAUTH_TOKEN]\n [--git-identity SSH_PRIVATE_KEY]\n [--ssh-identity SSH_PRIVATE_KEY] [--local]\n\nOptions:\n -h, --help Show this help message and exit\n --project PROJECT The name of the project\n -t, --token OAUTH_TOKEN\n An authentication token for Git\n --git-identity SSH_PRIVATE_KEY\n The private SSH key path to access the remote repo\n --ssh-identity SSH_PRIVATE_KEY\n The private SSH key path for SSH tunneling\n --local Do not use git\n
Git credentials
If the current folder is a Git repo, the command authorizes dstack to access it. By default, the command uses the default Git credentials configured for the repo. You can override these credentials via --token (OAuth token) or --git-identity.
Custom SSH key
By default, this command generates an SSH key that will be used for port forwarding and SSH access to running workloads. You can override this key via --ssh-identity.
$ dstack run . --help\nUsage: dstack run [--project NAME] [-h [TYPE]] [-f FILE] [-n RUN_NAME] [-d]\n [-y] [--max-offers MAX_OFFERS] [--profile NAME]\n [--max-price PRICE] [--max-duration DURATION] [-b NAME]\n [-r NAME] [--instance-type NAME]\n [--pool POOL_NAME | --reuse | --dont-destroy | --idle-duration IDLE_DURATION | --instance NAME]\n [--spot | --on-demand | --spot-auto | --spot-policy POLICY]\n [--retry | --no-retry | --retry-duration DURATION]\n [-e KEY=VALUE] [--gpu SPEC] [--disk RANGE]\n working_dir\n\nPositional Arguments:\n working_dir\n\nOptions:\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -h, --help [TYPE] Show this help message and exit. TYPE is one of task,\n dev-environment, service\n -f, --file FILE The path to the run configuration file. Defaults to\n WORKING_DIR/.dstack.yml\n -n, --name RUN_NAME The name of the run. If not specified, a random name\n is assigned\n -d, --detach Do not poll logs and run status\n -y, --yes Do not ask for plan confirmation\n --max-offers MAX_OFFERS\n Number of offers to show in the run plan\n -e, --env KEY=VALUE Environment variables\n --gpu SPEC Request GPU for the run. The format is\n NAME:COUNT:MEMORY (all parts are optional)\n --disk RANGE Request the size range of disk for the run. Example\n --disk 100GB...\n\nProfile:\n --profile NAME The name of the profile. Defaults to $DSTACK_PROFILE\n --max-price PRICE The maximum price per hour, in dollars\n --max-duration DURATION\n The maximum duration of the run\n -b, --backend NAME The backends that will be tried for provisioning\n -r, --region NAME The regions that will be tried for provisioning\n --instance-type NAME The cloud-specific instance types that will be tried\n for provisioning\n\nPools:\n --pool POOL_NAME The name of the pool. If not set, the default pool\n will be used\n --reuse Reuse instance from pool\n --dont-destroy Do not destroy instance after the run is finished\n --idle-duration IDLE_DURATION\n Time to wait before destroying the idle instance\n --instance NAME Reuse instance from pool with name NAME\n\nSpot Policy:\n --spot Consider only spot instances\n --on-demand Consider only on-demand instances\n --spot-auto Consider both spot and on-demand instances\n --spot-policy POLICY One of spot, on-demand, auto\n\nRetry Policy:\n --retry\n --no-retry\n --retry-duration DURATION\n
.gitignore
When running anything via CLI, dstack uses the exact version of code from your project directory.
If there are large files, consider creating a .gitignore file to exclude them for better performance.
$ dstack ps --help\nUsage: dstack ps [-h] [--project NAME] [-a] [-v] [-w]\n\nOptions:\n -h, --help Show this help message and exit\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -a, --all Show all runs. By default, it only shows unfinished runs or\n the last finished.\n -v, --verbose Show more information about runs\n -w, --watch Watch statuses of runs in realtime\n
This command stops run(s) within the current repository.
$ dstack stop --help\nUsage: dstack stop [-h] [--project NAME] [-x] [-y] run_name\n\nPositional Arguments:\n run_name\n\nOptions:\n -h, --help Show this help message and exit\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -x, --abort\n -y, --yes\n
This command shows the output of a given run within the current repository.
$ dstack logs --help\nUsage: dstack logs [-h] [--project NAME] [-d] [-a]\n [--ssh-identity SSH_PRIVATE_KEY] [--replica REPLICA]\n [--job JOB]\n run_name\n\nPositional Arguments:\n run_name\n\nOptions:\n -h, --help Show this help message and exit\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -d, --diagnose\n -a, --attach Set up an SSH tunnel, and print logs as they follow.\n --ssh-identity SSH_PRIVATE_KEY\n The private SSH key path for SSH tunneling\n --replica REPLICA The relica number. Defaults to 0.\n --job JOB The job number inside the replica. Defaults to 0.\n
Both the CLI and API need to be configured with the server address, user token, and project name via ~/.dstack/config.yml.
At startup, the server automatically configures the CLI and API with the server address, user token, and the default project name (main). This configuration is stored in ~/.dstack/config.yml.
To use the CLI and API on a different machine, or with a different project, use the dstack config command.
$ dstack config --help\nUsage: dstack config [-h] [--project PROJECT] [--url URL] [--token TOKEN]\n [--default] [--remove] [--no-default]\n\nOptions:\n -h, --help Show this help message and exit\n --project PROJECT The name of the project to configure\n --url URL Server url\n --token TOKEN User token\n --default Set the project as default. It will be used when\n --project is omitted in commands.\n --remove Delete project configuration\n --no-default Do not prompt to set the project as default\n
Pools allow for managing the lifecycle of instances and reusing them across runs. The default pool is created automatically.
"},{"location":"docs/reference/cli/#dstack-pool-add","title":"dstack pool add","text":"
The dstack pool add command adds an instance to a pool. If no pool name is specified, the instance goes to the default pool.
$ dstack pool add --help\nUsage: dstack pool add [-h] [-y] [--remote] [--remote-host REMOTE_HOST]\n [--remote-port REMOTE_PORT] [--name INSTANCE_NAME]\n [--profile NAME] [--max-price PRICE] [-b NAME]\n [-r NAME] [--instance-type NAME] [--pool POOL_NAME]\n [--reuse] [--dont-destroy]\n [--idle-duration IDLE_DURATION]\n [--spot | --on-demand | --spot-auto | --spot-policy POLICY]\n [--retry | --no-retry | --retry-duration DURATION]\n [--cpu SPEC] [--memory SIZE] [--shared-memory SIZE]\n [--gpu SPEC] [--disk SIZE]\n\nOptions:\n -h, --help show this help message and exit\n -y, --yes Don't ask for confirmation\n --remote Add remote runner as an instance\n --remote-host REMOTE_HOST\n Remote runner host\n --remote-port REMOTE_PORT\n Remote runner port\n --name INSTANCE_NAME Set the name of the instance\n --pool POOL_NAME The name of the pool. If not set, the default pool\n will be used\n --reuse Reuse instance from pool\n --dont-destroy Do not destroy instance after the run is finished\n --idle-duration IDLE_DURATION\n Time to wait before destroying the idle instance\n\nProfile:\n --profile NAME The name of the profile. Defaults to $DSTACK_PROFILE\n --max-price PRICE The maximum price per hour, in dollars\n -b, --backend NAME The backends that will be tried for provisioning\n -r, --region NAME The regions that will be tried for provisioning\n --instance-type NAME The cloud-specific instance types that will be tried\n for provisioning\n\nSpot Policy:\n --spot Consider only spot instances\n --on-demand Consider only on-demand instances\n --spot-auto Consider both spot and on-demand instances\n --spot-policy POLICY One of spot, on-demand, auto\n\nRetry Policy:\n --retry\n --no-retry\n --retry-duration DURATION\n\nResources:\n --cpu SPEC Request the CPU count. Default: 2..\n --memory SIZE Request the size of RAM. The format is SIZE:MB|GB|TB.\n Default: 8GB..\n --shared-memory SIZE Request the size of Shared Memory. The format is\n SIZE:MB|GB|TB.\n --gpu SPEC Request GPU for the run. The format is\n NAME:COUNT:MEMORY (all parts are optional)\n --disk SIZE Request the size of disk for the run. Example --disk\n 100GB...\n
"},{"location":"docs/reference/cli/#dstack-pool-ps","title":"dstack pool ps","text":"
The dstack pool ps command lists all active instances of a pool. If no pool name is specified, default pool instances are displayed.
$ dstack pool ps --help\nUsage: dstack pool ps [-h] [--pool POOL_NAME] [-w]\n\nShow instances in the pool\n\nOptions:\n -h, --help show this help message and exit\n --pool POOL_NAME The name of the pool. If not set, the default pool will be\n used\n -w, --watch Watch instances in realtime\n
"},{"location":"docs/reference/cli/#dstack-pool-create","title":"dstack pool create","text":"
The dstack pool create command creates a new pool.
$ dstack pool create --help\nUsage: dstack pool create [-h] -n POOL_NAME\n\nOptions:\n -h, --help show this help message and exit\n -n, --name POOL_NAME The name of the pool\n
"},{"location":"docs/reference/cli/#dstack-pool-list","title":"dstack pool list","text":"
The dstack pool list command lists all existing pools.
"},{"location":"docs/reference/cli/#dstack-pool-delete","title":"dstack pool delete","text":"
The dstack pool delete command deletes a specified pool.
$ dstack pool delete --help\nUsage: dstack pool delete [-h] -n POOL_NAME\n\nOptions:\n -h, --help show this help message and exit\n -n, --name POOL_NAME The name of the pool\n
A gateway is required for running services. It handles ingress traffic, authorization, domain mapping, model mapping for the OpenAI-compatible endpoint, and so on.
The dstack gateway list command displays the names and addresses of the gateways configured in the project.
$ dstack gateway list --help\nUsage: dstack gateway list [-h] [-v]\n\nOptions:\n -h, --help show this help message and exit\n -v, --verbose Show more information\n
The dstack gateway create command creates a new gateway instance in the project.
$ dstack gateway create --help\nUsage: dstack gateway create [-h] --backend {aws,azure,gcp,kubernetes}\n --region REGION [--set-default] [--name NAME]\n --domain DOMAIN\n\nOptions:\n -h, --help show this help message and exit\n --backend {aws,azure,gcp,kubernetes}\n --region REGION\n --set-default Set as default gateway for the project\n --name NAME Set a custom name for the gateway\n --domain DOMAIN Set the domain for the gateway\n
The dstack gateway delete command deletes the specified gateway.
$ dstack gateway delete --help\nUsage: dstack gateway delete [-h] [-y] name\n\nPositional Arguments:\n name The name of the gateway\n\nOptions:\n -h, --help show this help message and exit\n -y, --yes Don't ask for confirmation\n
The dstack gateway update command updates the specified gateway.
$ dstack gateway update --help\nUsage: dstack gateway update [-h] [--set-default] [--domain DOMAIN] name\n\nPositional Arguments:\n name The name of the gateway\n\nOptions:\n -h, --help show this help message and exit\n --set-default Set it the default gateway for the project\n --domain DOMAIN Set the domain for the gateway\n
"},{"location":"docs/reference/cli/#environment-variables","title":"Environment variables","text":"Name Description Default DSTACK_CLI_LOG_LEVEL Configures CLI logging level CRITICALDSTACK_PROFILE Has the same effect as --profileNoneDSTACK_PROJECT Has the same effect as --projectNoneDSTACK_DEFAULT_CREDS_DISABLED Disables default credentials detection if set NoneDSTACK_LOCAL_BACKEND_ENABLED Enables local backend for debug if set NoneDSTACK_RUNNER_VERSION Sets exact runner version for debug latestDSTACK_SERVER_ADMIN_TOKEN Has the same effect as --tokenNoneDSTACK_SERVER_DIR Sets path to store data and server configs ~/.dstack/serverDSTACK_SERVER_HOST Has the same effect as --host127.0.0.1DSTACK_SERVER_LOG_LEVEL Has the same effect as --log-levelWARNINGDSTACK_SERVER_LOG_FORMAT Sets format of log output standardDSTACK_SERVER_PORT Has the same effect as --port3000DSTACK_SERVER_ROOT_LOG_LEVEL Sets root logger log level ERRORDSTACK_SERVER_UVICORN_LOG_LEVEL Sets uvicorn logger log level ERROR"},{"location":"docs/reference/dstack.yml/dev-environment/","title":"dev-environment","text":"
The dev-environment configuration type allows running dev environments.
Filename
Configuration files must have a name ending with .dstack.yml (e.g., .dstack.yml or dev.dstack.yml are both acceptable) and can be located in the project's root directory or any nested folder. Any configuration can be run via dstack run.
If you don't specify image, dstack uses the default Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
If you specify memory size, you can either specify an explicit size (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).
type: dev-environment

ide: vscode

resources:
  cpu: 16..          # 16 or more CPUs
  memory: 200GB..    # 200GB or more RAM
  gpu: 40GB..80GB:4  # 4 GPUs from 40GB to 80GB
  shm_size: 16GB     # 16GB of shared memory
  disk: 500GB
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/reference/dstack.yml/dev-environment/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#ide","title":"ide - The IDE to run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#version","title":"version - (Optional) The version of the IDE.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#init","title":"init - (Optional) The bash commands to run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#image","title":"image - (Optional) The name of the Docker image to run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#entrypoint","title":"entrypoint - (Optional) The Docker entrypoint.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#home_dir","title":"home_dir - (Optional) The absolute path to the home directory inside the container. Defaults to /root.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_registry_auth","title":"registry_auth - (Optional) Credentials for pulling a private Docker image.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#python","title":"python - (Optional) The major version of Python. Mutually exclusive with image.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#env","title":"env - (Optional) The mapping or the list of environment variables.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#setup","title":"setup - (Optional) The bash commands to run on the boot.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_resources","title":"resources - (Optional) The resources requirements to run the configuration.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#ports","title":"ports - (Optional) Port numbers/mapping to expose.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#pool_name","title":"pool_name - (Optional) The name of the pool. 
If not set, dstack will use the default name.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#resources","title":"resources","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#cpu","title":"cpu - (Optional) The number of CPU cores. Defaults to 2...","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#memory","title":"memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB...","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#shm_size","title":"shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_gpu","title":"gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_disk","title":"disk - (Optional) The disk resources.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#resources-gpu","title":"resources.gpu","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#name","title":"name - (Optional) The GPU name or list of names.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#count","title":"count - (Optional) The number of GPUs. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#memory","title":"memory - (Optional) The VRAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#total_memory","title":"total_memory - (Optional) The total VRAM size (e.g., 32GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#compute_capability","title":"compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#resources-disk","title":"resources.disk","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#size","title":"size - The disk size. Can be a string (e.g., 100GB or 100GB..) or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#registry_auth","title":"registry_auth","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#username","title":"username - The username.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#password","title":"password - The password or access token.","text":""},{"location":"docs/reference/dstack.yml/service/","title":"service","text":"
The service configuration type allows running services.
Filename
Configuration files must have a name ending with .dstack.yml (e.g., .dstack.yml or serve.dstack.yml are both acceptable) and can be located in the project's root directory or any nested folder. Any configuration can be run via dstack run.
If you don't specify image, dstack uses the default Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
With this configuration, once the service is up, you'll be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface. See services for more details.
"},{"location":"docs/reference/dstack.yml/service/#replicas-and-auto-scaling","title":"Replicas and auto-scaling","text":"
By default, dstack runs a single replica of the service. You can configure the number of replicas as well as the auto-scaling policy.
If you specify memory size, you can either specify an explicit size (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).
type: service

python: "3.11"
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server
    --model mistralai/Mixtral-8X7B-Instruct-v0.1
    --host 0.0.0.0
    --tensor-parallel-size 2  # Match the number of GPUs
port: 8000

resources:
  gpu: 80GB:2  # 2 GPUs of 80GB
  disk: 200GB

# Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
  format: openai
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
By default, the service endpoint requires the Authorization header with "Bearer <dstack token>". Authorization can be disabled by setting auth to false.
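For instance, a sketch of the relevant line for exposing a service without the Authorization header:

type: service
# ... image, commands, port, resources, etc.
auth: false  # Disable Bearer-token authorization for this service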
"},{"location":"docs/reference/dstack.yml/service/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/dstack.yml/service/#port","title":"port - The port, that application listens on or the mapping.","text":""},{"location":"docs/reference/dstack.yml/service/#model","title":"model - (Optional) Mapping of the model for the OpenAI-compatible endpoint.","text":""},{"location":"docs/reference/dstack.yml/service/#auth","title":"auth - (Optional) Enable the authorization. Defaults to True.","text":""},{"location":"docs/reference/dstack.yml/service/#replicas","title":"replicas - (Optional) The range . Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/service/#_scaling","title":"scaling - (Optional) The auto-scaling configuration.","text":""},{"location":"docs/reference/dstack.yml/service/#image","title":"image - (Optional) The name of the Docker image to run.","text":""},{"location":"docs/reference/dstack.yml/service/#entrypoint","title":"entrypoint - (Optional) The Docker entrypoint.","text":""},{"location":"docs/reference/dstack.yml/service/#home_dir","title":"home_dir - (Optional) The absolute path to the home directory inside the container. Defaults to /root.","text":""},{"location":"docs/reference/dstack.yml/service/#_registry_auth","title":"registry_auth - (Optional) Credentials for pulling a private Docker image.","text":""},{"location":"docs/reference/dstack.yml/service/#python","title":"python - (Optional) The major version of Python. Mutually exclusive with image.","text":""},{"location":"docs/reference/dstack.yml/service/#env","title":"env - (Optional) The mapping or the list of environment variables.","text":""},{"location":"docs/reference/dstack.yml/service/#setup","title":"setup - (Optional) The bash commands to run on the boot.","text":""},{"location":"docs/reference/dstack.yml/service/#_resources","title":"resources - (Optional) The resources requirements to run the configuration.","text":""},{"location":"docs/reference/dstack.yml/service/#commands","title":"commands - (Optional) The bash commands to run.","text":""},{"location":"docs/reference/dstack.yml/service/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/dstack.yml/service/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/dstack.yml/service/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/dstack.yml/service/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/dstack.yml/service/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/dstack.yml/service/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/dstack.yml/service/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/dstack.yml/service/#pool_name","title":"pool_name - (Optional) The name of the pool. 
If not set, dstack will use the default name.","text":""},{"location":"docs/reference/dstack.yml/service/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/dstack.yml/service/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/dstack.yml/service/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/dstack.yml/service/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/dstack.yml/service/#model_1","title":"model","text":""},{"location":"docs/reference/dstack.yml/service/#type","title":"type - The type of the model.","text":""},{"location":"docs/reference/dstack.yml/service/#name","title":"name - The name of the model.","text":""},{"location":"docs/reference/dstack.yml/service/#format","title":"format - The serving format.","text":""},{"location":"docs/reference/dstack.yml/service/#scaling","title":"scaling","text":""},{"location":"docs/reference/dstack.yml/service/#metric","title":"metric - The target metric to track.","text":""},{"location":"docs/reference/dstack.yml/service/#target","title":"target - The target value of the metric.","text":""},{"location":"docs/reference/dstack.yml/service/#scale_up_delay","title":"scale_up_delay - (Optional) The delay in seconds before scaling up. Defaults to 300.","text":""},{"location":"docs/reference/dstack.yml/service/#scale_down_delay","title":"scale_down_delay - (Optional) The delay in seconds before scaling down. Defaults to 600.","text":""},{"location":"docs/reference/dstack.yml/service/#resources","title":"resources","text":""},{"location":"docs/reference/dstack.yml/service/#cpu","title":"cpu - (Optional) The number of CPU cores. Defaults to 2...","text":""},{"location":"docs/reference/dstack.yml/service/#memory","title":"memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB...","text":""},{"location":"docs/reference/dstack.yml/service/#shm_size","title":"shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.","text":""},{"location":"docs/reference/dstack.yml/service/#_gpu","title":"gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/service/#_disk","title":"disk - (Optional) The disk resources.","text":""},{"location":"docs/reference/dstack.yml/service/#resources-gpu","title":"resouces.gpu","text":""},{"location":"docs/reference/dstack.yml/service/#name","title":"name - (Optional) The GPU name or list of names.","text":""},{"location":"docs/reference/dstack.yml/service/#count","title":"count - (Optional) The number of GPUs. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/service/#memory","title":"memory - (Optional) The VRAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/service/#total_memory","title":"total_memory - (Optional) The total VRAM size (e.g., 32GB). Can be set to a range (e.g. 
16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/service/#compute_capability","title":"compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).","text":""},{"location":"docs/reference/dstack.yml/service/#resources-disk","title":"resouces.disk","text":""},{"location":"docs/reference/dstack.yml/service/#size","title":"size - The disk size. Can be a string (e.g., 100GB or 100GB..) or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/service/#registry_auth","title":"registry_auth","text":""},{"location":"docs/reference/dstack.yml/service/#username","title":"username - The username.","text":""},{"location":"docs/reference/dstack.yml/service/#password","title":"password - The password or access token.","text":""},{"location":"docs/reference/dstack.yml/task/","title":"task","text":"
The task configuration type allows running tasks.
Filename
Configuration files must have a name ending with .dstack.yml (e.g., .dstack.yml or train.dstack.yml are both acceptable) and can be located in the project's root directory or any nested folder. Any configuration can be run via dstack run.
If you don't specify image, dstack uses the default Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
A task can configure ports. In this case, if the task is running an application on a port, dstack run will securely allow you to access this port from your local machine through port forwarding.
If you specify memory size, you can either specify an explicit size (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).
type: task

commands:
  - pip install -r fine-tuning/qlora/requirements.txt
  - python fine-tuning/qlora/train.py

resources:
  cpu: 16..          # 16 or more CPUs
  memory: 200GB..    # 200GB or more RAM
  gpu: 40GB..80GB:4  # 4 GPUs from 40GB to 80GB
  shm_size: 16GB     # 16GB of shared memory
  disk: 500GB
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/reference/dstack.yml/task/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/dstack.yml/task/#nodes","title":"nodes - (Optional) Number of nodes. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/task/#image","title":"image - (Optional) The name of the Docker image to run.","text":""},{"location":"docs/reference/dstack.yml/task/#entrypoint","title":"entrypoint - (Optional) The Docker entrypoint.","text":""},{"location":"docs/reference/dstack.yml/task/#home_dir","title":"home_dir - (Optional) The absolute path to the home directory inside the container. Defaults to /root.","text":""},{"location":"docs/reference/dstack.yml/task/#_registry_auth","title":"registry_auth - (Optional) Credentials for pulling a private Docker image.","text":""},{"location":"docs/reference/dstack.yml/task/#python","title":"python - (Optional) The major version of Python. Mutually exclusive with image.","text":""},{"location":"docs/reference/dstack.yml/task/#env","title":"env - (Optional) The mapping or the list of environment variables.","text":""},{"location":"docs/reference/dstack.yml/task/#setup","title":"setup - (Optional) The bash commands to run on the boot.","text":""},{"location":"docs/reference/dstack.yml/task/#_resources","title":"resources - (Optional) The resources requirements to run the configuration.","text":""},{"location":"docs/reference/dstack.yml/task/#ports","title":"ports - (Optional) Port numbers/mapping to expose.","text":""},{"location":"docs/reference/dstack.yml/task/#commands","title":"commands - (Optional) The bash commands to run.","text":""},{"location":"docs/reference/dstack.yml/task/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/dstack.yml/task/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/dstack.yml/task/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/dstack.yml/task/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/dstack.yml/task/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/dstack.yml/task/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/dstack.yml/task/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/dstack.yml/task/#pool_name","title":"pool_name - (Optional) The name of the pool. If not set, dstack will use the default name.","text":""},{"location":"docs/reference/dstack.yml/task/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/dstack.yml/task/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/dstack.yml/task/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. 
Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/dstack.yml/task/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/dstack.yml/task/#resources","title":"resources","text":""},{"location":"docs/reference/dstack.yml/task/#cpu","title":"cpu - (Optional) The number of CPU cores. Defaults to 2...","text":""},{"location":"docs/reference/dstack.yml/task/#memory","title":"memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB...","text":""},{"location":"docs/reference/dstack.yml/task/#shm_size","title":"shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.","text":""},{"location":"docs/reference/dstack.yml/task/#_gpu","title":"gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/task/#_disk","title":"disk - (Optional) The disk resources.","text":""},{"location":"docs/reference/dstack.yml/task/#resources-gpu","title":"resources.gpu","text":""},{"location":"docs/reference/dstack.yml/task/#name","title":"name - (Optional) The GPU name or list of names.","text":""},{"location":"docs/reference/dstack.yml/task/#count","title":"count - (Optional) The number of GPUs. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/task/#memory","title":"memory - (Optional) The VRAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/task/#total_memory","title":"total_memory - (Optional) The total VRAM size (e.g., 32GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/task/#compute_capability","title":"compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).","text":""},{"location":"docs/reference/dstack.yml/task/#resources-disk","title":"resources.disk","text":""},{"location":"docs/reference/dstack.yml/task/#size","title":"size - The disk size. Can be a string (e.g., 100GB or 100GB..) or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/task/#registry_auth","title":"registry_auth","text":""},{"location":"docs/reference/dstack.yml/task/#username","title":"username - The username.","text":""},{"location":"docs/reference/dstack.yml/task/#password","title":"password - The password or access token.","text":""},{"location":"docs/reference/server/config.yml/","title":"~/.dstack/server/config.yml","text":"
The ~/.dstack/server/config.yml file is used by the dstack server to configure cloud accounts.
Projects
For flexibility, dstack server permits you to configure backends for multiple projects. If you intend to use only one project, name it main.
projects:\n- name: main\n backends:\n - type: kubernetes\n kubeconfig:\n filename: ~/.kube/config\n networking:\n ssh_host: localhost # The external IP address of any node\n ssh_port: 32000 # Any port accessible outside of the cluster\n
projects:\n- name: main\n backends:\n - type: kubernetes\n kubeconfig:\n filename: ~/.kube/config\n networking:\n ssh_port: 32000 # Any port accessible outside of the cluster\n
For more details on configuring clouds, please refer to Installation.
"},{"location":"docs/reference/server/config.yml/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/server/config.yml/#_projects","title":"projects - The list of projects.","text":""},{"location":"docs/reference/server/config.yml/#projects","title":"projects[n]","text":""},{"location":"docs/reference/server/config.yml/#name","title":"name - The name of the project.","text":""},{"location":"docs/reference/server/config.yml/#backends","title":"backends - The list of backends.","text":""},{"location":"docs/reference/server/config.yml/#aws","title":"projects[n].backends[type=aws]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of the backend. Must be aws.","text":""},{"location":"docs/reference/server/config.yml/#vpc_name","title":"vpc_name - (Optional) The VPC name.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#aws-creds","title":"projects[n].backends[type=aws].creds","text":"Access keyDefault"},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be access_key.","text":""},{"location":"docs/reference/server/config.yml/#access_key","title":"access_key - The access key.","text":""},{"location":"docs/reference/server/config.yml/#secret_key","title":"secret_key - The secret key.","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be default.","text":""},{"location":"docs/reference/server/config.yml/#azure","title":"projects[n].backends[type=azure]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of the backend. Must be azure.","text":""},{"location":"docs/reference/server/config.yml/#tenant_id","title":"tenant_id - The tenant ID.","text":""},{"location":"docs/reference/server/config.yml/#subscription_id","title":"subscription_id - The subscription ID.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#azure-creds","title":"projects[n].backends[type=azure].creds","text":"ClientDefault"},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be client.","text":""},{"location":"docs/reference/server/config.yml/#client_id","title":"client_id - The client ID.","text":""},{"location":"docs/reference/server/config.yml/#client_secret","title":"client_secret - The client secret.","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be default.","text":""},{"location":"docs/reference/server/config.yml/#datacrunch","title":"projects[n].backends[type=datacrunch]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be datacrunch.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#datacrunch-creds","title":"projects[n].backends[type=datacrunch].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. 
Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#client_id","title":"client_id - The client ID.","text":""},{"location":"docs/reference/server/config.yml/#client_secret","title":"client_secret - The client secret.","text":""},{"location":"docs/reference/server/config.yml/#gcp","title":"projects[n].backends[type=gcp]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be gcp.","text":""},{"location":"docs/reference/server/config.yml/#project_id","title":"project_id - The project ID.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#gcp-creds","title":"projects[n].backends[type=gcp].creds","text":"Service accountDefault"},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be service_account.","text":""},{"location":"docs/reference/server/config.yml/#filename","title":"filename - The path to the service account file.","text":""},{"location":"docs/reference/server/config.yml/#data","title":"data - (Optional) The contents of the service account file.","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be default.","text":""},{"location":"docs/reference/server/config.yml/#lambda","title":"projects[n].backends[type=lambda]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be lambda.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#lambda-creds","title":"projects[n].backends[type=lambda].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#api_key","title":"api_key - The API key.","text":""},{"location":"docs/reference/server/config.yml/#tensordock","title":"projects[n].backends[type=tensordock]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be tensordock.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#tensordock-creds","title":"projects[n].backends[type=tensordock].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#api_key","title":"api_key - The API key.","text":""},{"location":"docs/reference/server/config.yml/#api_token","title":"api_token - The API token.","text":""},{"location":"docs/reference/server/config.yml/#vastai","title":"projects[n].backends[type=vastai]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be vastai.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#vastai-creds","title":"projects[n].backends[type=vastai].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. 
Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#api_key","title":"api_key - The API key.","text":""},{"location":"docs/reference/server/config.yml/#kubernetes","title":"projects[n].backends[type=kubernetes]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be kubernetes.","text":""},{"location":"docs/reference/server/config.yml/#_kubeconfig","title":"kubeconfig - The kubeconfig configuration.","text":""},{"location":"docs/reference/server/config.yml/#_networking","title":"networking - (Optional) The networking configuration.","text":""},{"location":"docs/reference/server/config.yml/#kubeconfig","title":"projects[n].backends[type=kubernetes].kubeconfig","text":""},{"location":"docs/reference/server/config.yml/#filename","title":"filename - The path to the kubeconfig file.","text":""},{"location":"docs/reference/server/config.yml/#data","title":"data - (Optional) The contents of the kubeconfig file.","text":""},{"location":"docs/reference/server/config.yml/#networking","title":"projects[n].backends[type=kubernetes].networking","text":""},{"location":"docs/reference/server/config.yml/#ssh_host","title":"ssh_host - (Optional) The external IP address of any node.","text":""},{"location":"docs/reference/server/config.yml/#ssh_port","title":"ssh_port - (Optional) Any port accessible outside of the cluster.","text":""},{"location":"examples/infinity/","title":"Infinity","text":"
This example demonstrates how to use Infinity with dstack's services to deploy any SentenceTransformers-based embedding model.
"},{"location":"examples/infinity/#define-the-configuration","title":"Define the configuration","text":"
To deploy a SentenceTransformers-based embedding model using Infinity, you need to define, at minimum, the following configuration file:
"},{"location":"examples/infinity/#run-the-configuration","title":"Run the configuration","text":"
Gateway
Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.
$ dstack run . -f infinity/serve.dstack.yml\n
"},{"location":"examples/infinity/#access-the-endpoint","title":"Access the endpoint","text":"
Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway):
Authorization
By default, the service endpoint requires the Authorization header with \"Bearer <dstack token>\".
Any embedding model served by Infinity automatically exposes an API compatible with OpenAI's Embeddings API, so we can directly use the openai package to interact with the deployed Infinity service.
from openai import OpenAI\nfrom functools import partial\n\nclient = OpenAI(base_url=\"https://<run name>.<gateway domain>\", api_key=\"<dstack token>\")\n\nclient.embeddings.create = partial(\n client.embeddings.create, model=\"bge-small-en-v1.5\"\n)\n\nprint(client.embeddings.create(input=[\"A sentence to encode.\"]))\n
RAG, or retrieval-augmented generation, empowers LLMs by providing them with access to your data.
Here's an example of how to apply this technique using the Llama Index framework and the Weaviate vector database.
"},{"location":"examples/llama-index/#how-does-it-work","title":"How does it work?","text":"
Llama Index loads data from local files, structures it into chunks, and ingests it into Weaviate (an open-source vector database). We set up Llama Index to use local embeddings through the SentenceTransformers library.
dstack allows us to deploy LLMs to any cloud provider, e.g. via Services using TGI or vLLM.
Llama Index allows us to prompt the LLM automatically incorporating the context from Weaviate.
Next, prepare the Llama Index classes: llama_index.ServiceContext (for indexing and querying) and llama_index.StorageContext (for loading and storing).
Embeddings
Note that we're using langchain.embeddings.huggingface.HuggingFaceEmbeddings for local embeddings instead of OpenAI.
Once the utility classes are configured, we can load the data from local files and pass it to llama_index.VectorStoreIndex. Using its from_documents method will then store the data in the vector database.
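Here's a minimal sketch of that ingestion flow. It assumes an older (pre-0.10) llama_index API consistent with the classes mentioned above; the Weaviate URL, index name, data directory, and embedding model name are placeholders to adapt to your setup:

import weaviate
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings import LangchainEmbedding
from llama_index.vector_stores import WeaviateVectorStore

# Connect to Weaviate (the URL and index name are assumptions for this sketch)
client = weaviate.Client("http://localhost:8080")
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Documents")

# Local embeddings via SentenceTransformers (through LangChain), no OpenAI involved
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load local files, chunk and embed them, and store the result in Weaviate
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context, storage_context=storage_context
)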
The data is in the vector database! Now we can proceed with the part where we invoke an LLM using this data as context.
"},{"location":"examples/llama-index/#deploy-an-llm","title":"Deploy an LLM","text":"
This example assumes we're using an LLM deployed using TGI.
Once you've deployed the model, make sure to set the TGI_ENDPOINT_URL environment variable to its URL, e.g. https://<run name>.<gateway domain> (or http://localhost:<port> if it's deployed as a task). We'll use this environment variable below.
$ curl -X POST --location $TGI_ENDPOINT_URL/generate \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"inputs\": \"What is Deep Learning?\",\n \"parameters\": {\n \"max_new_tokens\": 20\n }\n }'\n
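To let Llama Index call the deployed LLM at query time, one option is to wrap the TGI endpoint in LangChain's HuggingFaceTextGenInference class and pass it into the ServiceContext. A rough sketch (the generation parameters are illustrative, and embed_model is the one from the ingestion sketch above):

import os

from langchain.llms import HuggingFaceTextGenInference
from llama_index import ServiceContext
from llama_index.llms import LangChainLLM

# Point the LLM wrapper at the TGI endpoint deployed with dstack
tgi_llm = HuggingFaceTextGenInference(
    inference_server_url=os.environ["TGI_ENDPOINT_URL"],
    max_new_tokens=512,
    temperature=0.7,
)
# Note: if the service endpoint requires authorization, the Bearer token header
# has to be supplied to the client as well (omitted in this sketch).

service_context = ServiceContext.from_defaults(
    llm=LangChainLLM(llm=tgi_llm),
    embed_model=embed_model,
)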
Once llama_index.VectorStoreIndex is ready, we can proceed with querying it.
Prompt format
If we're deploying Llama 2, we have to ensure that the prompt format is correct.
from llama_index import (QuestionAnswerPrompt, RefinePrompt)\n\ntext_qa_template = QuestionAnswerPrompt(\n \"\"\"<s>[INST] <<SYS>>\nWe have provided context information below. \n\n{context_str}\n\nGiven this information, please answer the question.\n<</SYS>>\n\n{query_str} [/INST]\"\"\"\n )\n\nrefine_template = RefinePrompt(\n \"\"\"<s>[INST] <<SYS>>\nThe original query is as follows: \n\n{query_str}\n\nWe have provided an existing answer:\n\n{existing_answer}\n\nWe have the opportunity to refine the existing answer (only if needed) with some more context below.\n\n{context_msg}\n<</SYS>>\n\nGiven the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer. [/INST]\"\"\"\n)\n\nquery_engine = index.as_query_engine(\n text_qa_template=text_qa_template,\n refine_template=refine_template,\n streaming=True,\n)\n\nresponse = query_engine.query(\"Make a bullet-point timeline of the authors biography?\")\nresponse.print_response_stream()\n
That's it! This basic example shows how straightforward it is to use Llama Index and Weaviate with the LLMs deployed using dstack. For more in-depth information, we encourage you to explore the documentation for each tool.
This example demonstrates how to deploy Mixtral with dstack's services.
"},{"location":"examples/mixtral/#define-the-configuration","title":"Define the configuration","text":"
To deploy Mixtral as a service, you have to define the corresponding configuration file. Below are multiple variants: via vLLM (fp16), TGI (fp16), or TGI (int4).
TGI (fp16) / TGI (int4) / vLLM (fp16)
type: service\n\nimage: ghcr.io/huggingface/text-generation-inference:latest\nenv:\n - MODEL_ID=mistralai/Mixtral-8x7B-Instruct-v0.1\ncommands:\n - text-generation-launcher \n --port 80\n --trust-remote-code\n --num-shard 2 # Should match the number of GPUs \nport: 80\n\nresources:\n gpu: 80GB:2\n disk: 200GB\n\n# (Optional) Enable the OpenAI-compatible endpoint\nmodel:\n type: chat\n name: mistralai/Mixtral-8x7B-Instruct-v0.1\n format: tgi\n
If the service has the model mapping configured, you will also be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI\n\nclient = OpenAI(base_url=\"https://gateway.<gateway domain>\", api_key=\"<dstack token>\")\n\ncompletion = client.chat.completions.create(\n model=\"mistralai/Mixtral-8x7B-Instruct-v0.1\",\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"Compose a poem that explains the concept of recursion in programming.\",\n }\n ],\n stream=True,\n)\n\nfor chunk in completion:\n print(chunk.choices[0].delta.content, end=\"\")\nprint()\n
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
"},{"location":"examples/ollama/#run-the-configuration","title":"Run the configuration","text":"
Gateway
Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.
$ dstack run . -f deployment/ollama/serve.dstack.yml\n
"},{"location":"examples/ollama/#access-the-endpoint","title":"Access the endpoint","text":"
Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway):
Authorization
By default, the service endpoint requires the Authorization header with \"Bearer <dstack token>\".
Because we've configured the model mapping, it will also be possible to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI\n\nclient = OpenAI(\n base_url=\"https://gateway.<gateway domain>\", \n api_key=\"<dstack token>\",\n)\n\ncompletion = client.chat.completions.create(\n model=\"mixtral\",\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"Compose a poem that explains the concept of recursion in programming.\",\n }\n ],\n stream=True,\n)\n\nfor chunk in completion:\n print(chunk.choices[0].delta.content, end=\"\")\nprint()\n
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
This example demonstrates how to fine-tune llama-2-7b-chat-hf, with QLoRA and your own script, using Tasks.
"},{"location":"examples/qlora/#prepare-a-dataset","title":"Prepare a dataset","text":"
When selecting a dataset, make sure that it is pre-processed to match the prompt format of Llama 2:
<s>[INST] <<SYS>>\nSystem prompt\n<</SYS>>\n\nUser prompt [/INST] Model answer </s>\n
In our example, we'll use the mlabonne/guanaco-llama2-1k dataset. It is a 1K sample from the timdettmers/openassistant-guanaco dataset converted to Llama 2's format.
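If your own data isn't in this format yet, a small helper along these lines (a sketch, not part of the original script) can wrap each sample accordingly:

def to_llama2_prompt(user_prompt: str, model_answer: str, system_prompt: str = "") -> str:
    # Wrap a single (instruction, answer) pair in Llama 2's chat prompt format
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_block}{user_prompt} [/INST] {model_answer} </s>"

print(to_llama2_prompt("What is QLoRA?", "QLoRA fine-tunes a 4-bit quantized model with LoRA adapters."))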
"},{"location":"examples/qlora/#define-the-training-script","title":"Define the training script","text":""},{"location":"examples/qlora/#requirements","title":"Requirements","text":"
The most notable libraries that we'll use are peft (for the LoRA adapters required by the QLoRA technique), bitsandbytes (for 4-bit quantization), and trl (for supervised fine-tuning).
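To make the roles of these libraries concrete, here's a condensed sketch of how the pieces typically fit together; the hyperparameters and output directory are illustrative rather than the exact values used in the script:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# bitsandbytes: load the base model with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# peft: LoRA adapters trained on top of the frozen, quantized weights
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM")

# trl: supervised fine-tuning on the dataset's text field
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="./results", per_device_train_batch_size=4, num_train_epochs=1),
)
trainer.train()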
"},{"location":"examples/qlora/#publish-the-fine-tuned-model","title":"Publish the fine-tuned model","text":"
In the third part of the script, we merge the base model with the fine-tuned model and push it to the Hugging Face Hub.
from peft import PeftModel\nimport torch\nfrom transformers import (\n AutoModelForCausalLM,\n AutoTokenizer\n)\n\ndef merge_and_push(args):\n # Reload model in FP16 and merge it with LoRA weights\n base_model = AutoModelForCausalLM.from_pretrained(\n args.model_name,\n low_cpu_mem_usage=True,\n return_dict=True,\n torch_dtype=torch.float16,\n device_map=\"auto\",\n )\n model = PeftModel.from_pretrained(base_model, args.new_model_name)\n model = model.merge_and_unload()\n\n # Reload the new tokenizer\n tokenizer = AutoTokenizer.from_pretrained(\n args.model_name, trust_remote_code=True\n )\n tokenizer.pad_token = tokenizer.eos_token\n tokenizer.padding_side = \"right\"\n\n # Publish the new model to Hugging Face Hub\n model.push_to_hub(args.new_model_name, use_temp_dir=False)\n tokenizer.push_to_hub(args.new_model_name, use_temp_dir=False)\n
"},{"location":"examples/qlora/#put-it-all-together","title":"Put it all together","text":"
Finally, in the main part of the script, we put it all together.
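Schematically, and assuming a train(args) function defined alongside merge_and_push above (the argument names here are illustrative), the entry point might look like this:

import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", default="meta-llama/Llama-2-7b-chat-hf")
    parser.add_argument("--new_model_name", default="llama-2-7b-guanaco")
    parser.add_argument("--dataset_name", default="mlabonne/guanaco-llama2-1k")
    parser.add_argument("--merge_and_push", action="store_true")
    args = parser.parse_args()

    train(args)                # fine-tune with QLoRA (see the sections above)
    if args.merge_and_push:
        merge_and_push(args)   # merge the adapters and publish to the Hugging Face Hub

if __name__ == "__main__":
    main()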
dstack will provision the cloud instance corresponding to the configured project and profile, run the training, and tear down the cloud instance once the training is complete.
Tensorboard
Since we've executed tensorboard within our task and configured its port using ports, you can access it using the URL provided in the output. dstack automatically forwards the configured port to your local machine.
The code for the endpoints is ready. Now, let's explore how to use dstack to serve it on a cloud account of your choice.
"},{"location":"examples/sdxl/#define-the-configuration","title":"Define the configuration","text":"Tasks
If you want to serve an application for development purposes only, you can use tasks. In this scenario, while the application runs in the cloud, it is accessible from your local machine only.
For production purposes, the optimal approach to serve an application is by using services. In this case, the application can be accessed through a public endpoint.
"},{"location":"examples/sdxl/#run-the-configuration","title":"Run the configuration","text":"
NOTE:
Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.
After the gateway is configured, go ahead and run the service.
$ dstack run . -f deployment/sdxl/serve.dstack.yml\n
"},{"location":"examples/sdxl/#access-the-endpoint","title":"Access the endpoint","text":"
Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway):
Authorization
By default, the service endpoint requires the Authorization header with \"Bearer <dstack token>\".
$ curl -X POST --location https://yellow-cat-1.mydomain.com/generate \\\n -H 'Content-Type: application/json' \\\n -H 'Authorization: \"Bearer <dstack token>\"' \\\n -d '{ \"prompt\": \"A cat in a hat\" }'\n
"},{"location":"examples/tei/#run-the-configuration","title":"Run the configuration","text":"
Gateway
Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.
$ dstack run . -f deployment/tei/serve.dstack.yml\n
"},{"location":"examples/tei/#access-the-endpoint","title":"Access the endpoint","text":"
Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway):
Authorization
By default, the service endpoint requires the Authorization header with \"Bearer <dstack token>\".
$ curl https://yellow-cat-1.example.com \\\n -X POST \\\n -H 'Content-Type: application/json' \\\n -H 'Authorization: \"Bearer <dstack token>\"' \\\n -d '{\"inputs\":\"What is Deep Learning?\"}'\n\n[[0.010704354,-0.033910684,0.004793657,-0.0042832214,0.07551489,0.028702762,0.03985837,0.021956133,...]]\n
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
Note that the model property is optional; it's only required if you're running a chat model and want to access it via an OpenAI-compatible endpoint. For more details on how to use this feature, check the documentation on services.
"},{"location":"examples/tgi/#run-the-configuration","title":"Run the configuration","text":"
Gateway
Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.
$ dstack run . -f deployment/tgi/serve.dstack.yml\n
"},{"location":"examples/tgi/#access-the-endpoint","title":"Access the endpoint","text":"
Once the service is up, you'll be able to access it at https://<run name>.<gateway domain>.
Authorization
By default, the service endpoint requires the Authorization header with \"Bearer <dstack token>\".
$ curl https://yellow-cat-1.example.com/generate \\\n -X POST \\\n -d '{\"inputs\":\"<s>[INST] What is your favourite condiment?[/INST]\"}' \\\n -H 'Content-Type: application/json' \\\n -H 'Authorization: \"Bearer <dstack token>\"'\n
Because we've configured the model mapping, it will also be possible to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI\n\n\nclient = OpenAI(\n base_url=\"https://gateway.<gateway domain>\",\n api_key=\"<dstack token>\"\n)\n\ncompletion = client.chat.completions.create(\n model=\"mistralai/Mistral-7B-Instruct-v0.1\",\n messages=[\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message)\n
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
"},{"location":"examples/vllm/#run-the-configuration","title":"Run the configuration","text":"
Gateway
Before running a service, ensure that you have configured a gateway. If you're using dstack Sky, the default gateway is configured automatically for you.
$ dstack run . -f deployment/vllm/serve.dstack.yml\n
"},{"location":"examples/vllm/#access-the-endpoint","title":"Access the endpoint","text":"
Once the service is up, you can query it at https://<run name>.<gateway domain> (using the domain set up for the gateway):
Authorization
By default, the service endpoint requires the Authorization header with \"Bearer <dstack token>\".
Because we've configured the model mapping, it will also be possible to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI\n\nclient = OpenAI(\n base_url=\"https://gateway.<gateway domain>\", \n api_key=\"<dstack token>\"\n)\n\ncompletion = client.chat.completions.create(\n model=\"mixtral\",\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"Compose a poem that explains the concept of recursion in programming.\",\n }\n ],\n stream=True,\n)\n\nfor chunk in completion:\n print(chunk.choices[0].delta.content, end=\"\")\nprint()\n
Hugging Face Hub token
To use a model with gated access, make sure to configure the HUGGING_FACE_HUB_TOKEN environment variable (with --env in dstack run or using env in the configuration file).
The complete, ready-to-run code is available in dstackai/dstack-examples.
What's next?
Check the Text Generation Inference example
Read about services
Browse examples
Join the Discord server
"},{"location":"changelog/archive/2024/","title":"2024","text":""},{"location":"changelog/archive/2023/","title":"2023","text":""},{"location":"changelog/page/2/","title":"Changelog","text":""},{"location":"blog/archive/2024/","title":"2024","text":""}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"],"fields":{"title":{"boost":1000.0},"text":{"boost":1.0},"tags":{"boost":1000000.0}}},"docs":[{"location":"blog/","title":"Blog","text":""},{"location":"blog/archive/say-goodbye-to-managed-notebooks/","title":"Say goodbye to managed notebooks","text":"
Data science and ML tools have made significant advancements in recent years. This blog post aims to examine the advantages of cloud dev environments (CDE) for ML engineers and compare them with web-based managed notebooks.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#notebooks-are-here-to-stay","title":"Notebooks are here to stay","text":"
Jupyter notebooks are instrumental for interactive work with data. They provide numerous advantages such as high interactivity, visualization support, remote accessibility, and effortless sharing.
Managed notebook platforms, like Google Colab and AWS SageMaker, have become popular thanks to their easy integration with clouds. With pre-configured environments, managed notebooks remove the need to worry about infrastructure.
As the code evolves, it needs to be converted into Python scripts and stored in Git for improved organization and version control. Notebooks alone cannot handle this task, which is why they must be a part of a developer environment that also supports Python scripts and Git.
The JupyterLab project attempts to address this by turning notebooks into an IDE, adding a file browser, a terminal, and Git support.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#ides-get-equipped-for-ml","title":"IDEs get equipped for ML","text":"
Recently, IDEs have improved in their ability to support machine learning. They have started to combine the benefits of traditional IDEs and managed notebooks.
IDEs have upgraded their remote capabilities, with better SSH support. Additionally, they now offer built-in support for editing notebooks.
Two popular IDEs, VS Code and PyCharm, have both integrated remote capabilities and seamless notebook editing features.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#the-rise-of-app-ecosystem","title":"The rise of app ecosystem","text":"
Notebooks have been beneficial for their interactivity and sharing features. However, there are new alternatives like Streamlit and Gradio that allow developers to build data apps using Python code. These frameworks not only simplify app-building but also enhance reproducibility by integrating with Git.
Hugging Face Spaces, for example, is a popular tool today for sharing Streamlit and Gradio apps with others.
"},{"location":"blog/archive/say-goodbye-to-managed-notebooks/#say-hello-to-cloud-dev-environments","title":"Say hello to cloud dev environments!","text":"
Remote development within IDEs is becoming increasingly popular, and as a result, cloud dev environments have emerged as a new concept. Various managed services, such as Codespaces and GitPod, offer scalable infrastructure while maintaining the familiar IDE experience.
One such open-source tool is dstack, which enables you to define your dev environment declaratively as code and run it on any cloud.
With this tool, provisioning the required hardware, setting up the pre-built environment (no Docker is needed), and fetching your local code are all automated.
$ dstack run .\n\n RUN CONFIGURATION USER PROJECT INSTANCE SPOT POLICY\n honest-jellyfish-1 .dstack.yml peter gcp a2-highgpu-1g on-demand\n\nStarting SSH tunnel...\n\nTo open in VS Code Desktop, use one of these link:\n vscode://vscode-remote/ssh-remote+honest-jellyfish-1/workflow\n\nTo exit, press Ctrl+C.\n
You can securely access the cloud development environment with the desktop IDE of your choice.
Learn more
Check out our guide for running dev environments in your cloud.
dstack is an open-source tool designed for managing AI infrastructure across various cloud platforms. It's lighter and more specifically geared towards AI tasks compared to Kubernetes.
Due to its support for multiple cloud providers, dstack is frequently used to access on-demand and spot GPUs across multiple clouds. From our users, we've learned that managing various cloud accounts, quotas, and billing can be cumbersome.
To streamline this process, we introduce dstack Sky, a managed service that enables users to access GPUs from multiple providers through dstack \u2013 without needing an account in each cloud provider.
"},{"location":"blog/dstack-sky/#what-is-dstack-sky","title":"What is dstack Sky?","text":"
Instead of running dstack server yourself, you point dstack config to a project set up with dstack Sky.
Now, you can use dstack's CLI or API \u2013 just like you would with your own cloud accounts.
$ dstack run . -b tensordock -b vastai\n\n # BACKEND REGION RESOURCES SPOT PRICE \n 1 vastai canada 16xCPU/64GB/1xRTX4090/1TB no $0.35\n 2 vastai canada 16xCPU/64GB/1xRTX4090/400GB no $0.34\n 3 tensordock us 8xCPU/48GB/1xRTX4090/480GB no $0.74\n ...\n Shown 3 of 50 offers, $0.7424 max\n\nContinue? [y/n]:\n
Backends
dstack Sky supports the same backends as the open-source version, except that you don't need to set them up. By default, it uses all supported backends.
You can use both on-demand and spot instances without needing to manage quotas, as they are automatically handled for you.
With dstack Sky you can use all of dstack's features, including dev environments, tasks, services, and pools.
To use services, the open-source version requires setting up a gateway with your own domain. dstack Sky comes with a pre-configured gateway.
$ dstack gateway list\n BACKEND REGION NAME ADDRESS DOMAIN DEFAULT\n aws eu-west-1 dstack 3.252.79.143 my-awesome-project.sky.dstack.ai \u2713\n
If you run it with dstack Sky, the service's endpoint will be available at https://<run name>.<project name>.sky.dstack.ai.
If it has a model mapping, the model will be accessible at https://gateway.<project name>.sky.dstack.ai via the OpenAI-compatible interface.
from openai import OpenAI\n\n\nclient = OpenAI(\n base_url=\"https://gateway.<project name>.sky.dstack.ai\",\n api_key=\"<dstack token>\"\n)\n\ncompletion = client.chat.completions.create(\n model=\"mixtral\",\n messages=[\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message)\n
Now, you can choose \u2014 either use dstack via the open-source version or via dstack Sky, or even use them side by side.
Credits
Are you an active contributor to the AI community? Request free dstack Sky credits.
dstack Sky is live on Product Hunt. Support it by giving it your vote!
Join Discord
"},{"location":"changelog/","title":"Changelog","text":""},{"location":"changelog/0.10.5/","title":"dstack 0.10.5: Lambda integration, Docker support, and more","text":"
In the previous update, we added initial integration with Lambda Cloud. With today's release, this integration has significantly improved and is now generally available. Additionally, the latest release adds support for custom Docker images.
By default, dstack uses its own base Docker images to run dev environments and tasks. These base images come pre-configured with Python, Conda, and essential CUDA drivers. However, there may be times when you need additional dependencies that you don't want to install every time you run your dev environment or task.
To address this, dstack now allows specifying custom Docker images. Here's an example:
Dev environments require the Docker image to have openssh-server pre-installed. If you want to use a custom Docker image with a dev environment and it does not include openssh-server, you can install it using the following method:
Until now, dstack has supported dev-environment and task as configuration types. Even though task may be used for basic serving use cases, it lacks crucial serving features. With the new update, we introduce service, a dedicated configuration type for serving.
As you see, there are two differences compared to task.
The gateway property: the address of a special cloud instance that wraps the running service with a public endpoint. Currently, you must specify it manually. In the future, dstack will assign it automatically.
The port property: A service must always configure one port on which it's running.
When running, dstack forwards the traffic to the gateway, providing you with a public endpoint that you can use to access the running service.
Existing limitations
Currently, you must create a gateway manually using the dstack gateway command and specify its address via YAML (e.g. using secrets). In the future, dstack will assign it automatically.
Gateways do not support HTTPS yet. When you run a service, its endpoint URL is <the address of the gateway>:80. The port can be overridden via the port property: instead of 8000, specify <gateway port>:8000.
Gateways do not provide authorization and auto-scaling. In the future, dstack will support them as well.
This initial support for services is the first step towards providing multi-cloud and cost-effective inference.
Give it a try and share feedback
Even though the current support is limited in many ways, we encourage you to give it a try and share your feedback with us!
More details on how to use services can be found in a dedicated guide in our docs. Questions and requests for help are very much welcome in our Discord server.
"},{"location":"changelog/0.11.0/","title":"dstack 0.11.0: Multi-cloud and multi-region projects","text":"
The latest release of dstack enables the automatic discovery of the best GPU price and availability across multiple configured cloud providers and regions.
"},{"location":"changelog/0.11.0/#multiple-backends-per-project","title":"Multiple backends per project","text":"
Now, dstack leverages price data from multiple configured cloud providers and regions to automatically suggest the most cost-effective options.
The default behavior of dstack is to first attempt the most cost-effective options, provided they are available. You have the option to set a maximum price limit either through max_price in .dstack/profiles.yml or by using --max-price in the dstack run command.
To implement this change, we have modified the way projects are configured. You can now configure multiple clouds and regions within a single project.
Why does this matter?
The ability to run LLM workloads across multiple cloud GPU providers allows for a significant reduction in costs and an increase in availability, while also remaining independent of any particular cloud vendor.
We hope that the value of dstack will continue to grow as we expand our support for additional cloud GPU providers. If you're interested in a specific provider, please message us on Discord.
"},{"location":"changelog/0.11.0/#custom-domains-and-https","title":"Custom domains and HTTPS","text":"
In other news, it is now possible to deploy services using HTTPS. All you need to do is configure a wildcard domain (e.g., *.mydomain.com), point it to the gateway IP address, and then pass the subdomain you want to use (e.g., myservice.mydomain.com) to the gateway property in YAML (instead of the gateway IP address).
Using the dstack run command, you are now able to utilize options such as --gpu, --memory, --env, --max-price, and several other arguments to override the profile settings.
Lastly, the local backend is no longer supported. Now, you can run everything using only a cloud backend.
The documentation is updated to reflect the changes in the release.
Migration to 0.11
The dstack version 0.11 update brings significant changes that break backward compatibility. If you used prior dstack versions, after updating to dstack==0.11, you'll need to log in to the UI and reconfigure clouds.
We apologize for any inconvenience and aim to ensure future updates maintain backward compatibility.
"},{"location":"changelog/0.11.0/#give-it-a-try","title":"Give it a try","text":"
Getting started with dstack takes less than a minute. Go ahead and give it a try.
"},{"location":"changelog/0.12.0/","title":"dstack 0.12.0: Simplified cloud setup, and refined API","text":"
For the past six weeks, we've been diligently overhauling dstack with the aim of significantly simplifying the process of configuring clouds and enhancing the functionality of the API. Please take note of the breaking changes, as they necessitate careful migration.
Previously, the only way to configure clouds for a project was through the UI. Additionally, you had to specify not only the credentials but also set up a storage bucket for each cloud to store metadata.
Now, you can configure clouds for a project via ~/.dstack/server/config.yml. Example:
The dstack.api.Run instance provides methods for various operations including attaching to the run, forwarding ports to localhost, retrieving status, stopping, and accessing logs. For more details, refer to the reference.
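For illustration only, a typical flow might look roughly like the sketch below; the exact class, method, and parameter names should be taken from the reference rather than from this assumption-laden outline:

from dstack.api import Client, Task

client = Client.from_config()  # picks up the server address, token, and project from the local config

task = Task(commands=["echo 'Hello, world!'"], ports=["8000"])
run = client.runs.submit(configuration=task)

run.attach()            # wait for provisioning and forward the configured ports to localhost
for log in run.logs():  # stream the run's logs
    print(log)
run.stop()              # stop the run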
Because we've prioritized CLI and API UX over the UI, the UI is no longer bundled. Please inform us if you experience any significant inconvenience related to this.
Gateways should now be configured using the dstack gateway command, and their usage requires you to specify a domain. Learn more about how to set up a gateway.
The dstack start command is now dstack server.
The Python API classes were moved from the dstack package to dstack.api.
Unfortunately, when upgrading to 0.12.0, there is no automatic migration for data. This means you'll need to delete ~/.dstack and configure dstack from scratch.
pip install \"dstack[all]==0.12.0\"
Delete ~/.dstack
Configure clouds via ~/.dstack/server/config.yml (see the new guide)
Run dstack server
The documentation and examples are updated.
"},{"location":"changelog/0.12.0/#give-it-a-try","title":"Give it a try","text":"
Getting started with dstack takes less than a minute. Go ahead and give it a try.
At dstack, we remain committed to our mission of building the most convenient tool for orchestrating generative AI workloads in the cloud. In today's release, we have added support for TensorDock, making it easier for you to leverage cloud GPUs at highly competitive prices.
Configuring your TensorDock account with dstack is very easy. Simply generate an authorization key in your TensorDock API settings and set it up in ~/.dstack/server/config.yml:
Now you can restart the server and proceed to using the CLI or API for running development environments, tasks, and services.
$ dstack run . -f .dstack.yml --gpu 40GB\n\n Min resources 1xGPU (40GB)\n Max price -\n Max duration 6h\n Retry policy no\n\n # REGION INSTANCE RESOURCES SPOT PRICE\n 1 unitedstates ef483076 10xCPU, 80GB, 1xA6000 (48GB) no $0.6235\n 2 canada 0ca177e7 10xCPU, 80GB, 1xA6000 (48GB) no $0.6435\n 3 canada 45d0cabd 10xCPU, 80GB, 1xA6000 (48GB) no $0.6435\n ...\n\nContinue? [y/n]:\n
TensorDock offers cloud GPUs on top of servers from dozens of independent hosts, providing some of the most affordable GPU pricing you can find on the internet.
With dstack, you can now utilize TensorDock's GPUs through a highly convenient interface, which includes the developer-friendly CLI and API.
Feedback and support
Feel free to ask questions or seek help in our Discord server.
dstack simplifies gen AI model development and deployment through its developer-friendly CLI and API. It eliminates cloud infrastructure hassles while supporting top cloud providers (such as AWS, GCP, Azure, among others).
While dstack streamlines infrastructure challenges, GPU costs can still hinder development. To address this, we've integrated dstack with Vast.ai, a marketplace providing GPUs from independent hosts at notably lower prices compared to other providers.
With the dstack 0.12.3 release, it's now possible to use Vast.ai alongside other cloud providers.
Now you can restart the server and proceed to using dstack's CLI and API.
If you want an easy way to develop, train and deploy gen AI models using affordable cloud GPUs, give dstack with Vast.ai a try.
Feedback and support
Feel free to ask questions or seek help in our Discord server.
"},{"location":"changelog/0.13.0/","title":"dstack 0.13.0: Disk size, CUDA 12.1, Mixtral, and more","text":"
As we wrap up this year, we're releasing a new update and publishing a guide on deploying Mixtral 8x7B with dstack.
"},{"location":"changelog/0.13.0/#configurable-disk-size","title":"Configurable disk size","text":"
Previously, dstack set the disk size to 100GB regardless of the cloud provider. Now, to accommodate larger language models and datasets, dstack enables setting a custom disk size using --disk in dstack run or via the disk property in .dstack/profiles.yml.
With dstack, whether you're using dev environments, tasks, or services, you can opt for a custom Docker image (for self-installed dependencies) or stick with the default Docker image (dstack pre-installs CUDA drivers, Conda, Python, etc.).
We've upgraded the default Docker image's CUDA drivers to 12.1 (for better compatibility with modern libraries).
nvcc
If you're using the default Docker image and need the CUDA compiler (nvcc), you'll have to install it manually using conda install cuda. The image comes pre-configured with the nvidia/label/cuda-12.1.0 Conda channel.
Lastly, and most importantly, we've added an example on deploying Mixtral 8x7B as a service. This guide allows you to effortlessly deploy a Mixtral endpoint on any cloud platform of your preference.
Deploying Mixtral 8x7B is easy, especially when using vLLM:
type: service\n\npython: \"3.11\"\n\ncommands:\n - conda install cuda # (required by megablocks)\n - pip install torch # (required by megablocks)\n - pip install vllm megablocks\n - python -m vllm.entrypoints.openai.api_server\n --model mistralai/Mixtral-8X7B-Instruct-v0.1\n --host 0.0.0.0\n --tensor-parallel-size 2 # should match the number of GPUs\n\nport: 8000\n
Once the configuration is defined, go ahead and run it:
$ dstack run . -f llms/mixtral.dstack.yml --gpu \"80GB:2\" --disk 200GB\n
It will deploy the endpoint at https://<run-name>.<gateway-domain>.
Because vLLM provides an OpenAI-compatible endpoint, feel free to access it using various OpenAI-compatible tools like Chat UI, LangChain, Llama Index, etc.
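For instance, querying it from Python with the openai package might look like this (the run name, gateway domain, and token below are placeholders; vLLM serves its OpenAI-compatible API under the /v1 path):

from openai import OpenAI

client = OpenAI(
    base_url="https://<run name>.<gateway domain>/v1",
    api_key="<dstack token>",  # any non-empty string if the endpoint doesn't enforce authorization
)

completion = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(completion.choices[0].message.content)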
Check the complete example for more details.
Don't forget, with dstack, you can use spot instances across different clouds and regions.
"},{"location":"changelog/0.13.0/#feedback-and-support","title":"Feedback and support","text":"
That's all! Feel free to try out the update and the new guide, and share your feedback with us.
The service configuration deploys any application as a public endpoint. For instance, you can use HuggingFace's TGI or other frameworks to deploy custom LLMs. While this is simple and customizable, using different frameworks and LLMs complicates the integration of LLMs.
With dstack 0.14.0, we are extending the service configuration in dstack to enable you to optionally map your custom LLM to an OpenAI-compatible endpoint.
Here's how it works: you define a service (as before) and include the model property with the model's type, name, format, and other settings.
When you deploy the service using dstack run, dstack will automatically publish the OpenAI-compatible endpoint, converting the prompt and response format between your LLM and OpenAI interface.
from openai import OpenAI\n\nclient = OpenAI(\n base_url=\"https://gateway.<your gateway domain>\",\n api_key=\"none\"\n)\n\ncompletion = client.chat.completions.create(\n model=\"mistralai/Mistral-7B-Instruct-v0.1\",\n messages=[\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message)\n
Here's a live demo of how it works:
For more details on how to use the new feature, be sure to check the updated documentation on services, and the TGI example.
Note: After you update to 0.14.0, it's important to delete your existing gateways (if any) using dstack gateway delete and create them again with dstack gateway create.
In case you have any questions, experience bugs, or need help, drop us a message on our Discord server or submit it as a GitHub issue.
"},{"location":"changelog/0.15.0/","title":"dstack 0.15.0: Resources, authorization, and more","text":"
The latest update brings many improvements, enabling the configuration of resources in YAML files, requiring authorization in services, supporting OpenAI-compatible endpoints for vLLM, and more.
Previously, if you wanted to request hardware resources, you had to either use the corresponding arguments with dstack run (e.g. --gpu GPU_SPEC) or use .dstack/profiles.yml.
With 0.15.0, it is now possible to configure resources in the YAML configuration file:
Supported properties include: gpu, cpu, memory, disk, and shm_size.
If you specify memory size, you can either specify an explicit size (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), etc.
It's also possible to configure gpu as an object:
type: dev-environment\n\npython: 3.11\nide: vscode\n\n# Require 2 GPUs of at least 40GB with CUDA compute compatibility of 7.5\nresources:\n gpu:\n count: 2\n memory: 40GB..\n compute_capability: 7.5\n
"},{"location":"changelog/0.15.0/#authorization-in-services","title":"Authorization in services","text":"
Previously, when deploying a service, the public endpoint didn't support authorization, meaning anyone with access to the gateway could call it.
With 0.15.0, by default, service endpoints require the Authorization header with \"Bearer <dstack token>\".
$ curl https://yellow-cat-1.example.com/generate \\\n -X POST \\\n -d '{\"inputs\":\"<s>[INST] What is your favourite condiment?[/INST]\"}' \\\n -H 'Content-Type: application/json' \\\n -H 'Authorization: \"Bearer <dstack token>\"'\n
Authorization can be disabled by setting auth to false in the service configuration file.
If the service has model mapping configured, the OpenAI-compatible endpoint also requires authorization.
from openai import OpenAI\n\n\nclient = OpenAI(\n base_url=\"https://gateway.example.com\",\n api_key=\"<dstack token>\"\n)\n\ncompletion = client.chat.completions.create(\n model=\"mistralai/Mistral-7B-Instruct-v0.1\",\n messages=[\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message)\n
"},{"location":"changelog/0.15.0/#model-mapping-for-vllm","title":"Model mapping for vLLM","text":"
Last but not least, we've added one more format for model mapping: openai.
For example, if you run vLLM using the OpenAI mode, it's possible to configure model mapping for it.
When we run such a service, it will be possible to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface and using your dstack user token.
In addition to a few bug fixes, the latest update brings initial integration with Kubernetes (experimental) and adds the possibility to configure a custom VPC for AWS. Read below for more details.
"},{"location":"changelog/0.15.1/#configuring-a-kubernetes-backend","title":"Configuring a Kubernetes backend","text":"
With the latest update, it's now possible to configure a Kubernetes backend. In this case, if you run a workload, dstack will provision infrastructure within your Kubernetes cluster. This may work with both self-managed and managed clusters.
Prerequisite
To use GPUs with Kubernetes, the cluster must be installed with the NVIDIA GPU Operator.
To configure a Kubernetes backend, you need to specify the path to the kubeconfig file, and the port that dstack can use for proxying SSH traffic. In case of a self-managed cluster, also specify the IP address of any node in the cluster.
Self-managed / Managed
Here's how to configure the backend to use a self-managed cluster.
projects:\n- name: main\n backends:\n - type: kubernetes\n kubeconfig:\n filename: ~/.kube/config\n networking:\n ssh_host: localhost # The external IP address of any node\n ssh_port: 32000 # Any port accessible outside of the cluster\n
The port specified to ssh_port must be accessible outside of the cluster.
For example, if you are using Kind, make sure to add it via extraPortMappings:
kind: Cluster\napiVersion: kind.x-k8s.io/v1alpha4\nnodes:\n- role: control-plane\n extraPortMappings:\n - containerPort: 32000 # Must be same as `ssh_port`\n hostPort: 32000 # Must be same as `ssh_port`\n
Here's how to configure the backend to use a managed cluster (AWS, GCP, Azure).
projects:\n- name: main\n backends:\n - type: kubernetes\n kubeconfig:\n filename: ~/.kube/config\n networking:\n ssh_port: 32000 # Any port accessible outside of the cluster\n
The port specified to ssh_port must be accessible outside of the cluster.
For example, if you are using EKS, make sure to add it via an ingress rule of the corresponding security group:
While dstack supports both self-managed and managed clusters, if you're using AWS, GCP, or Azure, it's generally recommended to use the corresponding backends directly for greater efficiency and ease of use.
"},{"location":"changelog/0.15.1/#specifying-a-custom-vpc-for-aws","title":"Specifying a custom VPC for AWS","text":"
If you're using dstack with AWS, it's now possible to configure a custom VPC via ~/.dstack/server/config.yml:
In this case, dstack will attempt to utilize the VPC with the configured name in each region. If any region lacks a VPC with that name, it will be skipped.
NOTE:
All subnets of the configured VPC should be public; otherwise, dstack won't be able to manage workloads.
Previously, when running a dev environment, task, or service, dstack provisioned an instance in a configured backend, and upon completion of the run, deleted the instance.
In the latest update, we introduce pools, a significantly more efficient way to manage instance lifecycles and reuse instances across runs.
Now, when using the dstack run command, it tries to reuse an instance from a pool. If no ready instance meets the requirements, dstack automatically provisions a new one and adds it to the pool.
Once the workload finishes, the instance is marked as ready (to run other workloads). If the instance remains idle for the configured duration, dstack tears it down.
Idle duration
By default, if dstack run provisions a new instance, its idle duration is set to 5m. This means the instance waits for a new workload for only five minutes before getting torn down. To override it, use the --idle-duration DURATION argument.
The dstack pool command allows for managing instances within pools.
To manually add an instance to a pool, use dstack pool add:
$ dstack pool add --gpu 80GB --idle-duration 1d\n\n BACKEND REGION RESOURCES SPOT PRICE\n tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595\n azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n\nContinue? [y/n]: y\n
The dstack pool add command allows specifying resource requirements, along with the spot policy, idle duration, max price, retry policy, and other policies.
If no idle duration is configured, by default, dstack sets it to 72h. To override it, use the --idle-duration DURATION argument.
Limitations
The dstack pool add command is not yet supported for Lambda, Azure, TensorDock, Kubernetes, and VastAI backends. Support for them is coming in version 0.16.1.
Refer to pools for more details on the new feature and how to use it.
"},{"location":"changelog/0.16.0/#why-does-this-matter","title":"Why does this matter?","text":"
With this new feature, using the cloud can be a lot more predictable and convenient:
Now, you can provision instances in advance and ensure they are available for the entire duration of the project. This saves you from the risk of not having a GPU when you need it most.
If you reuse an instance from a pool, dstack run starts much faster. For example, you can provision an instance and reuse it for running a dev environment, task, or service.
Have questions or need help? Drop us a message on our Discord server. See a bug? Report it to GitHub issues.
"},{"location":"changelog/0.16.1/","title":"dstack 0.16.1: Improvements to dstack pool and bug-fixes","text":"
The latest update enhances the dstack pool command introduced earlier, and it fixes a number of important bugs.
"},{"location":"changelog/0.16.1/#improvements-to-dstack-pool","title":"Improvements to dstack pool","text":"
The dstack pool command, which allows you to manually add instances to the pool, has received several improvements:
The dstack pool add command now works with all VM-based backends (which means all backends except vastai and kubernetes).
The dstack pool add command now accepts the arguments to configure the spot policy (via --spot-auto, --spot, --on-demand) and idle duration (via --idle-duration DURATION). By default, the spot policy is set to on-demand, while the idle duration is set to 72h.
Didn't try dstack pool yet? Give it a try now. It significantly improves the predictability and convenience of using cloud GPUs.
The 0.16.0 update broke the vastai backend (the dstack run command didn't show offers).
If you submitted runs via the API, the default idle duration was not applied, leading to instances staying in the pool and not being automatically removed.
dstack couldn't connect to the instance via SSH due to a number of issues related to not properly handling the user's default SSH config.
When connecting to a run via ssh <run name> (while using the default Docker image), python, pip, and conda couldn't be found due to the broken PATH.
On our journey to provide an open-source, cloud-agnostic platform for orchestrating GPU workloads, we are proud to announce another step forward: the integration with CUDO Compute.
CUDO Compute is a GPU marketplace that offers cloud resources at an affordable cost in a number of locations. Currently, the available GPUs include A40, RTX A6000, RTX A4000, RTX A5000, and RTX 3080.
To use it with dstack, you only need to configure the cudo backend with your CUDO Compute project ID and API key:
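Here is a minimal sketch of the corresponding entry in ~/.dstack/server/config.yml (the exact property names, such as project_id and api_key, are assumptions):

projects:
- name: main
  backends:
  - type: cudo
    project_id: my-cudo-project  # Your CUDO Compute project ID
    creds:
      type: api_key
      api_key: <your CUDO API key>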
Once it's done, you can restart the dstack server and use the dstack CLI or API to run workloads.
$ dstack run . -b cudo \n # BACKEND REGION RESOURCES SPOT PRICE\n 1 cudo no-luster-1 25xCPU, 96GB, 1xA6000 no $1.17267\n (48GB), 100GB (disk)\n 2 cudo no-luster-1 26xCPU, 100GB, 1xA6000 no $1.17477\n (48GB), 100GB (disk)\n 3 cudo no-luster-1 27xCPU, 100GB, 1xA6000 no $1.17687\n (48GB), 100GB (disk)\n ...\n Shown 3 of 8 offers, $1.18737 max\n\n Continue? [y/n]:\n
Just like with other backends, the cudo backend allows you to launch dev environments, run tasks, and deploy services with dstack run, and manage your pool of instances via dstack pool.
Limitations
The dstack gateway feature is not yet compatible with cudo, but it is expected to be supported in version 0.17.0, planned for release within a week.
The cudo backend cannot yet be used with dstack Sky, but it will also be enabled within a week.
Haven't tried dstack yet? You're very welcome to do so now. With dstack, orchestrating GPU workloads over any cloud is very easy!
Previously, dstack always served services as single replicas. While this is suitable for development, in production, the service must automatically scale based on the load.
That's why in 0.17.0, we extended dstack with the capability to configure the number of replicas as well as the auto-scaling policy.
The replicas property can be set either to a number or to a range. In the case of a range, the scaling property is required to configure the auto-scaling policy. The auto-scaling policy requires specifying metric (such as rps, i.e. \"requests per second\") and its target (the metric value).
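For example, a service that keeps between one and four replicas and scales based on the request rate could be configured as in the sketch below (the command is an illustrative placeholder):

type: service

python: "3.11"
commands:
  - python -m http.server 8000  # Placeholder application
port: 8000

replicas: 1..4
scaling:
  metric: rps  # Scale based on requests per second
  target: 10   # The target value of the metric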
"},{"location":"changelog/0.17.0/#regions-and-instance-types","title":"Regions and instance types","text":"
Also, the update brings a simpler way to configure regions and instance types.
For example, if you'd like to use only a subset of specific regions or instance types, you can now configure them via .dstack/profiles.yml.
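For instance, the following .dstack/profiles.yml sketch restricts provisioning to a subset of regions and instance types (the values are illustrative):

profiles:
  - name: restricted
    regions: [us-east-1, us-west-2]              # Consider only these regions
    instance_types: [p3.8xlarge, p4d.24xlarge]   # Consider only these instance types
    default: true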
Previously, environment variables had to be hardcoded in the configuration file or passed via the CLI. The update brings two major improvements.
Firstly, it's now possible to configure an environment variable in the configuration without hardcoding its value. Secondly, dstack run now inherits environment variables from the current process.
Together, these features allow users to define environment variables separately from the configuration and pass them to dstack run conveniently, such as by using a .env file.
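As a sketch, such a configuration declares the variable without assigning a value (the script name is an illustrative placeholder):

type: task

python: "3.11"
env:
  - HUGGING_FACE_HUB_TOKEN  # No value assigned; dstack expects it from the CLI or the current process
commands:
  - python train.py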
Now, if you run this configuration, dstack will ensure that you've set HUGGING_FACE_HUB_TOKEN either via HUGGING_FACE_HUB_TOKEN=<value> dstack run ..., via dstack run -e HUGGING_FACE_HUB_TOKEN=<value> ..., or by using other tools such as direnv.
Currently supported providers for this feature include AWS, GCP, and Azure. For other providers or on-premises servers, file the corresponding feature requests or ping us on Discord.
One more small improvement is that the commands property is now not required for tasks and services if you use an image that has a default entrypoint configured.
With the release of version 0.2 of dstack, it is now possible to configure GCP as a remote. All features that were previously available for AWS, except real-time artifacts, are now available for GCP as well.
This means that you can define your ML workflows in code and easily run them locally or remotely in your GCP account.
dstack automatically creates and deletes cloud instances as needed, and assists in setting up the environment, including pipeline dependencies, and saving/loading artifacts.
No code changes are required since ML workflows are described in YAML. You won't need to deal with Docker, Kubernetes, or stateful UI.
This article will explain how to use dstack to run remote ML workflows on GCP.
Ensure that you have installed the latest version of dstack before proceeding.
$ pip install dstack --upgrade\n
By default, workflows run locally. To run workflows remotely (e.g. in a GCP account), you must configure a remote using the dstack config command. Follow the steps below to do so.
"},{"location":"changelog/0.2/#1-create-a-project","title":"1. Create a project","text":"
First, create a project in your GCP account, link a billing account to it, and make sure that the required APIs are enabled for it.
"},{"location":"changelog/0.2/#2-create-a-storage-bucket","title":"2. Create a storage bucket","text":"
Once the project is set up, you can proceed and create a storage bucket. This bucket will be used to store workflow artifacts and metadata.
NOTE:
Make sure to create the bucket in the same location where you'd like to run your workflows.
"},{"location":"changelog/0.2/#3-create-a-service-account","title":"3. Create a service account","text":"
The next step is to create a service account in the created project and configure the following roles for it: Service Account User, Compute Admin, Storage Admin, Secret Manager Admin, and Logging Admin.
Once the service account is set up, create a key for it and download the corresponding JSON file to your local machine (e.g. to ~/Downloads/my-awesome-project-d7735ca1dd53.json).
"},{"location":"changelog/0.2/#4-configure-the-cli","title":"4. Configure the CLI","text":"
Once the service account key JSON file is on your machine, you can configure the CLI using the dstack config command.
The command will ask you for a path to the key, GCP region and zone, and storage bucket name.
$ dstack config\n\n? Choose backend: gcp\n? Enter path to credentials file: ~/Downloads/dstack-d7735ca1dd53.json\n? Choose GCP geographic area: North America\n? Choose GCP region: us-west1\n? Choose GCP zone: us-west1-b\n? Choose storage bucket: dstack-dstack-us-west1\n? Choose VPC subnet: no preference\n
That's it! Now you can run remote workflows on GCP.
Last October, we open-sourced the dstack CLI for defining ML workflows as code and running them easily on any cloud or locally. The tool abstracts ML engineers from vendor APIs and infrastructure, making it convenient to run scripts, development environments, and applications.
Today, we are excited to announce a preview of Hub, a new way to use dstack for teams to manage their model development workflows effectively on any cloud platform.
"},{"location":"changelog/0.7.0/#how-does-it-work","title":"How does it work?","text":"
Previously, the dstack CLI configured a cloud account as a remote to use local cloud credentials for direct requests to the cloud. Now, the CLI allows configuration of Hub as a remote, enabling requests to the cloud using user credentials stored in Hub.
sequenceDiagram\n autonumber\n participant CLI\n participant Hub\n participant Cloud\n % Note right of Cloud: AWS, GCP, etc\n CLI->>Hub: Run a workflow\n activate Hub\n Hub-->>Hub: User authentication\n loop Workflow provider\n Hub-->>Cloud: Submit workflow jobs\n end\n Hub-->>CLI: Return the workflow status\n deactivate Hub\n loop Workflow scheduler\n Hub-->>Cloud: Re-submit workflow jobs\n end
The Hub not only provides basic features such as authentication and credential storage, but it also has built-in workflow scheduling capabilities. For instance, it can monitor the availability of spot instances and automatically resubmit jobs.
"},{"location":"changelog/0.7.0/#why-does-it-matter","title":"Why does it matter?","text":"
As you start developing models more regularly, you'll encounter the challenge of automating your ML workflows to reduce time spent on infrastructure and manual work.
While many cloud vendors offer tools to automate ML workflows, they do so through opinionated UIs and APIs, leading to a suboptimal developer experience and vendor lock-in.
In contrast, dstack aims to provide a non-opinionated and developer-friendly interface that can work across any vendor.
"},{"location":"changelog/0.7.0/#try-the-preview","title":"Try the preview","text":"
Here's a quick guide to get started with Hub:
Start the Hub application
Visit the URL provided in the output to log in as an administrator
Create a project and configure its backend (AWS or GCP)
Currently, the only way to run or manage workflows is through the dstack CLI. There are scenarios where you'd prefer to run workflows in other ways, e.g. from Python code or programmatically via an API. To support these scenarios, we plan to release a Python SDK and a REST API soon.
The built-in scheduler currently monitors spot instance availability and automatically resubmits jobs. Our plan is to enhance this feature and include additional capabilities. Users will be able to track cloud compute usage and manage quotas per team via the user interface.
Lastly, and of utmost importance, we plan to extend support to other cloud platforms, not limiting ourselves to AWS, GCP, and Azure.
At dstack, our goal is to create a simple and unified interface for ML engineers to run dev environments, pipelines, and apps on any cloud. With the latest update, we take another significant step in this direction.
We are thrilled to announce that the latest update introduces Azure support, among other things, making it incredibly easy to run dev environments, pipelines, and apps in Azure. Read on for more details.
Using Azure with dstack is very straightforward. All you need to do is create the corresponding project via the UI and provide your Azure credentials.
NOTE:
For detailed instructions on setting up dstack for Azure, refer to the documentation.
Once the project is set up, you can define dev environments, pipelines, and apps as code, and easily run them with just a single command. dstack will automatically provision the infrastructure for you.
"},{"location":"changelog/0.9.1/#logs-and-artifacts-in-ui","title":"Logs and artifacts in UI","text":"
Secondly, with the new update, you now have the ability to browse the logs and artifacts of any run through the user interface.
Last but not least, with the update, we have reworked the documentation to provide a greater emphasis on specific use cases: dev environments, tasks, and services.
"},{"location":"changelog/0.9.1/#try-it-out","title":"Try it out","text":"
Please note that when installing dstack via pip, you now need to specify the exact list of cloud providers you intend to use:
$ pip install \"dstack[aws,gcp,azure]\" -U\n
This requirement applies only when you start the server locally. If you connect to a server hosted elsewhere, you can use the shorter syntax: pip install dstack.
Feedback
If you have any feedback, including issues or questions, please share it in our Discord community or file it as a GitHub issue.
"},{"location":"docs/","title":"What is dstack?","text":"
dstack is an open-source orchestration engine for running AI workloads. It supports a wide range of cloud providers (such as AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, CUDO, RunPod, etc.) as well as on-premises infrastructure.
"},{"location":"docs/#why-use-dstack","title":"Why use dstack?","text":"
Designed for development, training, and deployment of gen AI models.
Efficiently utilizes compute across cloud providers and on-prem servers.
Compatible with any training, fine-tuning, and serving frameworks, as well as other third-party tools.
100% open-source.
"},{"location":"docs/#how-does-it-work","title":"How does it work?","text":"
Install the open-source version of dstack and configure your own cloud accounts, or sign up with dstack Sky
Define configurations such as dev environments, tasks, and services.
Run configurations via dstack's CLI or API.
Use pools to manage instances and on-prem servers.
"},{"location":"docs/#where-do-i-start","title":"Where do I start?","text":"
To use the open-source version, make sure to install the server and configure backends.
If you're using dstack Sky, install the CLI and run the dstack config command:
Once the CLI is set up, follow the quickstart.
"},{"location":"docs/quickstart/#initialize-a-repo","title":"Initialize a repo","text":"
To use dstack's CLI in a folder, first run dstack init within that folder.
$ mkdir quickstart && cd quickstart\n$ dstack init\n
Your folder can be a regular local folder or a Git repo.
"},{"location":"docs/quickstart/#define-a-configuration","title":"Define a configuration","text":"
Define what you want to run as a YAML file. The filename must end with .dstack.yml (e.g., .dstack.yml or train.dstack.yml are both acceptable).
Dev environmentTaskService
Dev environments allow you to quickly provision a machine with a pre-configured environment, resources, IDE, code, etc.
type: dev-environment\n\n# Use either `python` or `image` to configure environment\npython: \"3.11\"\n# image: ghcr.io/huggingface/text-generation-inference:latest\n\nide: vscode\n\n# (Optional) Configure `gpu`, `memory`, `disk`, etc\nresources:\n gpu: 80GB\n
Tasks make it very easy to run any scripts, be it for training, data processing, or web apps. They allow you to pre-configure the environment, resources, code, etc.
Run a configuration using the dstack run command, followed by the working directory path (e.g., .), the path to the configuration file, and run options (e.g., configuring hardware resources, spot policy, etc.)
Before submitting a task or deploying a model, you may want to run code interactively. Dev environments allow you to do exactly that.
You specify the required environment and resources, then run it. dstack provisions the dev environment in the configured backend and enables access via your desktop IDE.
"},{"location":"docs/concepts/dev-environments/#define-a-configuration","title":"Define a configuration","text":"
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g. .dstack.yml or dev.dstack.yml are both acceptable).
The YAML file allows you to specify your own Docker image, environment variables, resource requirements, etc. If image is not specified, dstack uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
.dstack.yml
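A minimal example of such a file (a sketch; the GPU size is illustrative, and it declares the HUGGING_FACE_HUB_TOKEN variable referenced in the note below):

type: dev-environment

python: "3.11"
env:
  - HUGGING_FACE_HUB_TOKEN
ide: vscode

resources:
  gpu: 80GB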
For more details on the file syntax, refer to the .dstack.yml reference.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/concepts/dev-environments/#run-the-configuration","title":"Run the configuration","text":"
To run a configuration, use the dstack run command followed by the working directory path, configuration file path, and other options.
$ dstack run . -f .dstack.yml\n\n BACKEND REGION RESOURCES SPOT PRICE\n tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595\n azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n\nContinue? [y/n]: y\n\nProvisioning `fast-moth-1`...\n---> 100%\n\nTo open in VS Code Desktop, use this link:\n vscode://vscode-remote/ssh-remote+fast-moth-1/workflow\n
When dstack provisions the dev environment, it uses the current folder contents.
Exclude files
If there are large files or folders you'd like to avoid uploading, you can list them in either .gitignore or .dstackignore.
The dstack run command allows specifying many things, including spot policy, retry and max duration, max price, regions, instance types, and much more.
In case you'd like to reuse certain parameters (such as spot policy, retry and max duration, max price, regions, instance types, etc.) across runs, you can define them via .dstack/profiles.yml.
"},{"location":"docs/concepts/dev-environments/#manage-runs","title":"Manage runs","text":""},{"location":"docs/concepts/dev-environments/#stop-a-run","title":"Stop a run","text":"
Once the run exceeds the max duration, or when you use dstack stop, the dev environment and its cloud resources are deleted.
Pools simplify managing the lifecycle of cloud instances and enable their efficient reuse across runs.
You can have instances provisioned in the configured backend automatically when you run a workload, or add them manually, configuring the required resources, idle duration, etc.
By default, when using the dstack run command, it tries to reuse an instance from a pool. If no idle instance meets the requirements, dstack automatically provisions a new one and adds it to the pool.
To avoid provisioning new instances with dstack run, use --reuse. Your run will be assigned to an idle instance in the pool.
Idle duration
By default, dstack run sets the idle duration of a newly provisioned instance to 5m. This means that if the run is finished and the instance remains idle for longer than five minutes, it is automatically removed from the pool. To override the default idle duration, use --idle-duration DURATION with dstack run.
"},{"location":"docs/concepts/pools/#dstack-pool-add","title":"dstack pool add","text":"
To manually add an instance to a pool, use dstack pool add:
$ dstack pool add --gpu 80GB\n\n BACKEND REGION RESOURCES SPOT PRICE\n tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595\n azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n\nContinue? [y/n]: y\n
The dstack pool add command allows specifying resource requirements, along with the spot policy, idle duration, max price, retry policy, and other policies.
The default idle duration if you're using dstack pool add is 72h. To override it, use the --idle-duration DURATION argument.
You can also specify the policies via .dstack/profiles.yml instead of passing them as arguments. For more details on policies and their defaults, refer to .dstack/profiles.yml.
Limitations
The dstack pool add command is not yet supported for the Kubernetes and VastAI backends.
Services make it very easy to deploy any kind of model or web application as public endpoints.
Use any serving frameworks and specify required resources. dstack deploys it in the configured backend, handles authorization, auto-scaling, and provides an OpenAI-compatible interface if needed.
Prerequisites
If you're using the open-source server, you first have to set up a gateway.
"},{"location":"docs/concepts/services/#set-up-a-gateway","title":"Set up a gateway","text":"
For example, if your domain is example.com, go ahead and run the dstack gateway create command:
Afterward, in your domain's DNS settings, add an A DNS record for *.example.com pointing to the IP address of the gateway.
Now, if you run a service, dstack will make its endpoint available at https://<run name>.<gateway domain>.
In case your service has the model mapping configured, dstack will automatically make your model available at https://gateway.<gateway domain> via the OpenAI-compatible interface.
If you're using dstack Sky, the gateway is set up for you.
"},{"location":"docs/concepts/services/#define-a-configuration","title":"Define a configuration","text":"
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g. .dstack.yml or train.dstack.yml are both acceptable).
The YAML file allows you to specify your own Docker image, environment variables, resource requirements, etc. If image is not specified, dstack uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
.dstack.yml
For more details on the file syntax, refer to the .dstack.yml reference.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/concepts/services/#configure-model-mapping","title":"Configure model mapping","text":"
By default, if you run a service, its endpoint is accessible at https://<run name>.<gateway domain>.
If you run a model, you can optionally configure the mapping to make it accessible via the OpenAI-compatible interface.
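As a sketch, a vLLM-based service with model mapping might look like this (the model name and resources are illustrative; the openai format is one of the supported options described below):

type: service

python: "3.11"
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server
    --model mistralai/Mistral-7B-Instruct-v0.1
    --port 8000
port: 8000

resources:
  gpu: 24GB

# Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: mistralai/Mistral-7B-Instruct-v0.1
  format: openai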
With such a configuration, once the service is up, you'll be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
Only the tgi format (Text Generation Inference) and the openai format (for Text Generation Inference or vLLM running in OpenAI-compatible mode) are supported.
Chat template
By default, dstack loads the chat template from the model's repository. If it is not present there, manual configuration is required.
type: service\n\nimage: ghcr.io/huggingface/text-generation-inference:latest\nenv:\n - MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ\ncommands:\n - text-generation-launcher --port 8000 --trust-remote-code --quantize gptq\nport: 8000\n\nresources:\n gpu: 80GB\n\n# Enable the OpenAI-compatible endpoint\nmodel:\n type: chat\n name: TheBloke/Llama-2-13B-chat-GPTQ\n format: tgi\n chat_template: \"{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\\\n' + system_message + '\\\\n<</SYS>>\\\\n\\\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' </s>' }}{% endif %}{% endfor %}\"\n eos_token: \"</s>\"\n
Please note that model mapping is an experimental feature with the following limitations:
Doesn't work if your chat_template uses bos_token. As a workaround, replace bos_token inside chat_template with the token content itself.
Doesn't work if eos_token is defined in the model repository as a dictionary. As a workaround, set eos_token manually, as shown in the example above (see Chat template).
If you encounter any other issues, please make sure to file a GitHub issue.
"},{"location":"docs/concepts/services/#configure-replicas-and-auto-scaling","title":"Configure replicas and auto-scaling","text":"
By default, dstack runs a single replica of the service. You can configure the number of replicas as well as the auto-scaling policy.
If you specify the minimum number of replicas as 0, the service will scale down to zero when there are no requests.
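For example, the following configuration fragment (a sketch) allows the service to scale between zero and two replicas based on the request rate:

replicas: 0..2
scaling:
  metric: rps
  target: 10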
"},{"location":"docs/concepts/services/#run-the-configuration","title":"Run the configuration","text":"
To run a configuration, use the dstack run command followed by the working directory path, configuration file path, and any other options.
$ dstack run . -f serve.dstack.yml\n\n BACKEND REGION RESOURCES SPOT PRICE\n tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595\n azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673\n\nContinue? [y/n]: y\n\nProvisioning...\n---> 100%\n\nService is published at https://yellow-cat-1.example.com\n
When dstack submits the task, it uses the current folder contents.
Exclude files
If there are large files or folders you'd like to avoid uploading, you can list them in either .gitignore or .dstackignore.
The dstack run command allows specifying many things, including spot policy, retry and max duration, max price, regions, instance types, and much more.
In case the service has the model mapping configured, you will also be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface.
from openai import OpenAI\n\n\nclient = OpenAI(\n base_url=\"https://gateway.example.com\",\n api_key=\"<dstack token>\"\n)\n\ncompletion = client.chat.completions.create(\n model=\"mistralai/Mistral-7B-Instruct-v0.1\",\n messages=[\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message)\n
In case you'd like to reuse certain parameters (such as spot policy, retry and max duration, max price, regions, instance types, etc.) across runs, you can define them via .dstack/profiles.yml.
"},{"location":"docs/concepts/services/#manage-runs","title":"Manage runs","text":""},{"location":"docs/concepts/services/#stop-a-run","title":"Stop a run","text":"
When you use dstack stop, the service and its cloud resources are deleted.
Tasks allow for convenient scheduling of any kind of batch jobs, such as training, fine-tuning, or data processing, as well as running web applications.
You simply specify the commands, required environment, and resources, and then submit it. dstack provisions the required resources in a configured backend and runs the task.
"},{"location":"docs/concepts/tasks/#define-a-configuration","title":"Define a configuration","text":"
First, create a YAML file in your project folder. Its name must end with .dstack.yml (e.g. .dstack.yml or train.dstack.yml are both acceptable).
The YAML file allows you to specify your own Docker image, environment variables, resource requirements, etc. If image is not specified, dstack uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
.dstack.yml
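A minimal example of such a file (a sketch; the script, requirements file, and GPU size are illustrative placeholders):

type: task

python: "3.11"
env:
  - HUGGING_FACE_HUB_TOKEN
commands:
  - pip install -r requirements.txt
  - python train.py

resources:
  gpu: 80GB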
For more details on the file syntax, refer to the .dstack.yml reference.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
A task can configure ports. In this case, if the task is running an application on a port, dstack run will securely allow you to access this port from your local machine through port forwarding.
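For example, the sketch below runs TensorBoard on port 6006 and lets dstack run forward it to your local machine (the log directory is an illustrative placeholder):

type: task

python: "3.11"
ports:
  - 6006
commands:
  - pip install tensorboard
  - tensorboard --logdir ./logs --port 6006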
When dstack submits the task, it uses the current folder contents.
Exclude files
If there are large files or folders you'd like to avoid uploading, you can list them in either .gitignore or .dstackignore.
The dstack run command allows specifying many things, including spot policy, retry and max duration, max price, regions, instance types, and much more.
In case you'd like to reuse certain parameters (such as spot policy, retry and max duration, max price, regions, instance types, etc.) across runs, you can define them via .dstack/profiles.yml.
"},{"location":"docs/concepts/tasks/#manage-runs","title":"Manage runs","text":""},{"location":"docs/concepts/tasks/#stop-a-run","title":"Stop a run","text":"
Once the run exceeds the max duration, or when you use dstack stop, the task and its cloud resources are deleted.
There are two ways to configure AWS: using an access key or using the default credentials.
Access keyDefault credentials
Create an access key by following this guide. Once you've downloaded the .csv file with your IAM user's Access key ID and Secret access key, proceed to configure the backend.
Log into your DataCrunch account, click Account Settings in the sidebar, find the REST API Credentials area, and then click the Generate Credentials button.
dstack supports both self-managed and managed Kubernetes clusters.
Prerequisite
To use GPUs with Kubernetes, the cluster must have the NVIDIA GPU Operator installed.
To configure a Kubernetes backend, specify the path to the kubeconfig file, and the port that dstack can use for proxying SSH traffic. In case of a self-managed cluster, also specify the IP address of any node in the cluster.
Self-managedManaged
Here's how to configure the backend to use a self-managed cluster.
projects:\n- name: main\n backends:\n - type: kubernetes\n kubeconfig:\n filename: ~/.kube/config\n networking:\n ssh_host: localhost # The external IP address of any node\n ssh_port: 32000 # Any port accessible outside of the cluster\n
The port specified to ssh_port must be accessible outside of the cluster.
For example, if you are using Kind, make sure to add it via extraPortMappings:
kind: Cluster\napiVersion: kind.x-k8s.io/v1alpha4\nnodes:\n- role: control-plane\n extraPortMappings:\n - containerPort: 32000 # Must be same as `ssh_port`\n hostPort: 32000 # Must be same as `ssh_port`\n
Here's how to configure the backend to use a managed cluster (AWS, GCP, Azure).
projects:\n- name: main\n backends:\n - type: kubernetes\n kubeconfig:\n filename: ~/.kube/config\n networking:\n ssh_port: 32000 # Any port accessible outside of the cluster\n
The port specified to ssh_port must be accessible outside of the cluster.
For example, if you are using EKS, make sure to add it via an ingress rule of the corresponding security group:
"},{"location":"docs/installation/#start-the-server","title":"Start the server","text":"
Once the ~/.dstack/server/config.yml file is configured, proceed to start the server:
pipDocker
$ dstack server\n\nApplying ~/.dstack/server/config.yml...\n\nThe admin token is \"bbae0f28-d3dd-4820-bf61-8f4bb40815da\"\nThe server is running at http://127.0.0.1:3000/\n
$ docker run -p 3000:3000 -v $HOME/.dstack/server/:/root/.dstack/server dstackai/dstack\n\nApplying ~/.dstack/server/config.yml...\n\nThe admin token is \"bbae0f28-d3dd-4820-bf61-8f4bb40815da\"\nThe server is running at http://127.0.0.1:3000/\n
"},{"location":"docs/installation/#configure-the-cli","title":"Configure the CLI","text":"
To point the CLI to the dstack server, you need to configure ~/.dstack/config.yml with the server address, user token and project name.
$ dstack config --url http://127.0.0.1:3000 \\\n --project main \\\n --token bbae0f28-d3dd-4820-bf61-8f4bb40815da\n\nConfiguration is updated at ~/.dstack/config.yml\n
Instead of configuring run options as dstack run arguments or .dstack.yml parameters, you can define those options in profiles.yml and reuse them across different run configurations. dstack supports repository-level profiles defined in $REPO_ROOT/.dstack/profiles.yml and global profiles defined in ~/.dstack/profiles.yml.
Profile parameters are resolved with the following priority:
dstack run arguments
.dstack.yml parameters
Repository-level profiles from $REPO_ROOT/.dstack/profiles.yml
profiles:\n - name: large\n\n spot_policy: auto # (Optional) The spot policy. Supports `spot`, `on-demand, and `auto`.\n\n max_price: 1.5 # (Optional) The maximum price per instance per hour\n\n max_duration: 1d # (Optional) The maximum duration of the run.\n\n retry:\n retry-duration: 3h # (Optional) To wait for capacity\n\n backends: [azure, lambda] # (Optional) Use only listed backends \n\n default: true # (Optional) Activate the profile by default\n
You can mark any profile as default or pass its name via --profile to dstack run.
"},{"location":"docs/reference/profiles.yml/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/profiles.yml/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/profiles.yml/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/profiles.yml/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/profiles.yml/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/profiles.yml/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/profiles.yml/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/profiles.yml/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/profiles.yml/#pool_name","title":"pool_name - (Optional) The name of the pool. If not set, dstack will use the default name.","text":""},{"location":"docs/reference/profiles.yml/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/profiles.yml/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/profiles.yml/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/profiles.yml/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/profiles.yml/#name","title":"name - The name of the profile that can be passed as --profile to dstack run.","text":""},{"location":"docs/reference/profiles.yml/#default","title":"default - (Optional) If set to true, dstack run will use this profile by default..","text":""},{"location":"docs/reference/profiles.yml/#retry_policy","title":"retry_policy","text":""},{"location":"docs/reference/profiles.yml/#retry","title":"retry - (Optional) Whether to retry the run on failure or not.","text":""},{"location":"docs/reference/profiles.yml/#duration","title":"duration - (Optional) The maximum period of retrying the run, e.g., 4h or 1d.","text":""},{"location":"docs/reference/api/python/","title":"Python API","text":"
The Python API enables running tasks, services, and managing runs programmatically.
Below is a quick example of submitting a task for running and displaying its logs.
import sys\n\nfrom dstack.api import Task, GPU, Client, Resources\n\nclient = Client.from_config()\n\ntask = Task(\n image=\"ghcr.io/huggingface/text-generation-inference:latest\",\n env={\"MODEL_ID\": \"TheBloke/Llama-2-13B-chat-GPTQ\"},\n commands=[\n \"text-generation-launcher --trust-remote-code --quantize gptq\",\n ],\n ports=[\"80\"],\n resources=Resources(gpu=GPU(memory=\"24GB\")),\n)\n\nrun = client.runs.submit(\n run_name=\"my-awesome-run\", # If not specified, a random name is assigned \n configuration=task,\n repo=None, # Specify to mount additional files\n)\n\nrun.attach()\n\ntry:\n for log in run.logs():\n sys.stdout.buffer.write(log)\n sys.stdout.buffer.flush()\nexcept KeyboardInterrupt:\n run.stop(abort=True)\nfinally:\n run.detach()\n
NOTE:
The configuration argument in the submit method can be either dstack.api.Task or dstack.api.Service.
If you create dstack.api.Task or dstack.api.Service, you may specify the image argument. If image isn't specified, the default image will be used. For a private Docker registry, ensure you also pass the registry_auth argument.
The repo argument in the submit method allows the mounting of a local folder, a remote repo, or a programmatically created repo. In this case, the commands argument can refer to the files within this repo.
The attach method waits for the run to start and, for dstack.api.Task, sets up an SSH tunnel and forwards configured ports to localhost.
By default, it uses the default Git credentials configured on the machine. You can override these credentials via the git_identity_file or oauth_token arguments of the init method.
Once the repo is initialized, you can pass the repo object to the run:
run = client.runs.submit(\n configuration=...,\n repo=repo,\n)\n
Parameters:
repo (Repo) - The repo to initialize. Required.
git_identity_file (Optional[PathLike]) - The private SSH key path for accessing the remote repo.
cpu (Optional[Range[int]]) - The number of CPUs. Defaults to DEFAULT_CPU_COUNT.
memory (Optional[Range[Memory]]) - The size of RAM (e.g., "16GB"). Defaults to DEFAULT_MEMORY_SIZE.
gpu (Optional[GPUSpec]) - The GPU spec. Defaults to None.
shm_size (Optional[Range[Memory]]) - The size of shared memory (e.g., "8GB"). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.
By default, it uses the default Git credentials configured on the machine. You can override these credentials via the git_identity_file or oauth_token arguments of the init method.
Finally, you can pass the repo object to the run:
run = client.runs.submit(\n configuration=...,\n repo=repo,\n)\n
$ dstack server --help\nUsage: dstack server [-h] [--host HOST] [-p PORT] [-l LOG_LEVEL] [--default]\n [--no-default] [--token TOKEN]\n\nOptions:\n -h, --help Show this help message and exit\n --host HOST Bind socket to this host. Defaults to 127.0.0.1\n -p, --port PORT Bind socket to this port. Defaults to 3000.\n -l, --log-level LOG_LEVEL\n Server logging level. Defaults to INFO.\n --default Update the default project configuration\n --no-default Do not update the default project configuration\n --token TOKEN The admin user token\n
This command initializes the current folder as a repo.
$ dstack init --help\nUsage: dstack init [-h] [--project PROJECT] [-t OAUTH_TOKEN]\n [--git-identity SSH_PRIVATE_KEY]\n [--ssh-identity SSH_PRIVATE_KEY] [--local]\n\nOptions:\n -h, --help Show this help message and exit\n --project PROJECT The name of the project\n -t, --token OAUTH_TOKEN\n An authentication token for Git\n --git-identity SSH_PRIVATE_KEY\n The private SSH key path to access the remote repo\n --ssh-identity SSH_PRIVATE_KEY\n The private SSH key path for SSH tunneling\n --local Do not use git\n
Git credentials
If the current folder is a Git repo, the command authorizes dstack to access it. By default, the command uses the default Git credentials configured for the repo. You can override these credentials via --token (OAuth token) or --git-identity.
Custom SSH key
By default, this command generates an SSH key that will be used for port forwarding and SSH access to running workloads. You can override this key via --ssh-identity.
$ dstack run . --help\nUsage: dstack run [--project NAME] [-h [TYPE]] [-f FILE] [-n RUN_NAME] [-d]\n [-y] [--max-offers MAX_OFFERS] [--profile NAME]\n [--max-price PRICE] [--max-duration DURATION] [-b NAME]\n [-r NAME] [--instance-type NAME]\n [--pool POOL_NAME | --reuse | --dont-destroy | --idle-duration IDLE_DURATION | --instance NAME]\n [--spot | --on-demand | --spot-auto | --spot-policy POLICY]\n [--retry | --no-retry | --retry-duration DURATION]\n [-e KEY=VALUE] [--gpu SPEC] [--disk RANGE]\n working_dir\n\nPositional Arguments:\n working_dir\n\nOptions:\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -h, --help [TYPE] Show this help message and exit. TYPE is one of task,\n dev-environment, service\n -f, --file FILE The path to the run configuration file. Defaults to\n WORKING_DIR/.dstack.yml\n -n, --name RUN_NAME The name of the run. If not specified, a random name\n is assigned\n -d, --detach Do not poll logs and run status\n -y, --yes Do not ask for plan confirmation\n --max-offers MAX_OFFERS\n Number of offers to show in the run plan\n -e, --env KEY=VALUE Environment variables\n --gpu SPEC Request GPU for the run. The format is\n NAME:COUNT:MEMORY (all parts are optional)\n --disk RANGE Request the size range of disk for the run. Example\n --disk 100GB...\n\nProfile:\n --profile NAME The name of the profile. Defaults to $DSTACK_PROFILE\n --max-price PRICE The maximum price per hour, in dollars\n --max-duration DURATION\n The maximum duration of the run\n -b, --backend NAME The backends that will be tried for provisioning\n -r, --region NAME The regions that will be tried for provisioning\n --instance-type NAME The cloud-specific instance types that will be tried\n for provisioning\n\nPools:\n --pool POOL_NAME The name of the pool. If not set, the default pool\n will be used\n --reuse Reuse instance from pool\n --dont-destroy Do not destroy instance after the run is finished\n --idle-duration IDLE_DURATION\n Time to wait before destroying the idle instance\n --instance NAME Reuse instance from pool with name NAME\n\nSpot Policy:\n --spot Consider only spot instances\n --on-demand Consider only on-demand instances\n --spot-auto Consider both spot and on-demand instances\n --spot-policy POLICY One of spot, on-demand, auto\n\nRetry Policy:\n --retry\n --no-retry\n --retry-duration DURATION\n
.gitignore
When running anything via CLI, dstack uses the exact version of code from your project directory.
If there are large files, consider creating a .gitignore file to exclude them for better performance.
$ dstack ps --help\nUsage: dstack ps [-h] [--project NAME] [-a] [-v] [-w]\n\nOptions:\n -h, --help Show this help message and exit\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -a, --all Show all runs. By default, it only shows unfinished runs or\n the last finished.\n -v, --verbose Show more information about runs\n -w, --watch Watch statuses of runs in realtime\n
This command stops run(s) within the current repository.
$ dstack stop --help\nUsage: dstack stop [-h] [--project NAME] [-x] [-y] run_name\n\nPositional Arguments:\n run_name\n\nOptions:\n -h, --help Show this help message and exit\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -x, --abort\n -y, --yes\n
This command shows the output of a given run within the current repository.
$ dstack logs --help\nUsage: dstack logs [-h] [--project NAME] [-d] [-a]\n [--ssh-identity SSH_PRIVATE_KEY] [--replica REPLICA]\n [--job JOB]\n run_name\n\nPositional Arguments:\n run_name\n\nOptions:\n -h, --help Show this help message and exit\n --project NAME The name of the project. Defaults to $DSTACK_PROJECT\n -d, --diagnose\n -a, --attach Set up an SSH tunnel, and print logs as they follow.\n --ssh-identity SSH_PRIVATE_KEY\n The private SSH key path for SSH tunneling\n --replica REPLICA The relica number. Defaults to 0.\n --job JOB The job number inside the replica. Defaults to 0.\n
Both the CLI and API need to be configured with the server address, user token, and project name via ~/.dstack/config.yml.
At startup, the server automatically configures CLI and API with the server address, user token, and the default project name (main). This configuration is stored via ~/.dstack/config.yml.
To use CLI and API on different machines or projects, use the dstack config command.
$ dstack config --help\nUsage: dstack config [-h] [--project PROJECT] [--url URL] [--token TOKEN]\n [--default] [--remove] [--no-default]\n\nOptions:\n -h, --help Show this help message and exit\n --project PROJECT The name of the project to configure\n --url URL Server url\n --token TOKEN User token\n --default Set the project as default. It will be used when\n --project is omitted in commands.\n --remove Delete project configuration\n --no-default Do not prompt to set the project as default\n
Pools allow for managing the lifecycle of instances and reusing them across runs. The default pool is created automatically.
"},{"location":"docs/reference/cli/#dstack-pool-add","title":"dstack pool add","text":"
The dstack pool add command adds an instance to a pool. If no pool name is specified, the instance goes to the default pool.
$ dstack pool add --help\nUsage: dstack pool add [-h] [-y] [--remote] [--remote-host REMOTE_HOST]\n [--remote-port REMOTE_PORT] [--name INSTANCE_NAME]\n [--profile NAME] [--max-price PRICE] [-b NAME]\n [-r NAME] [--instance-type NAME] [--pool POOL_NAME]\n [--reuse] [--dont-destroy]\n [--idle-duration IDLE_DURATION]\n [--spot | --on-demand | --spot-auto | --spot-policy POLICY]\n [--retry | --no-retry | --retry-duration DURATION]\n [--cpu SPEC] [--memory SIZE] [--shared-memory SIZE]\n [--gpu SPEC] [--disk SIZE]\n\nOptions:\n -h, --help show this help message and exit\n -y, --yes Don't ask for confirmation\n --remote Add remote runner as an instance\n --remote-host REMOTE_HOST\n Remote runner host\n --remote-port REMOTE_PORT\n Remote runner port\n --name INSTANCE_NAME Set the name of the instance\n --pool POOL_NAME The name of the pool. If not set, the default pool\n will be used\n --reuse Reuse instance from pool\n --dont-destroy Do not destroy instance after the run is finished\n --idle-duration IDLE_DURATION\n Time to wait before destroying the idle instance\n\nProfile:\n --profile NAME The name of the profile. Defaults to $DSTACK_PROFILE\n --max-price PRICE The maximum price per hour, in dollars\n -b, --backend NAME The backends that will be tried for provisioning\n -r, --region NAME The regions that will be tried for provisioning\n --instance-type NAME The cloud-specific instance types that will be tried\n for provisioning\n\nSpot Policy:\n --spot Consider only spot instances\n --on-demand Consider only on-demand instances\n --spot-auto Consider both spot and on-demand instances\n --spot-policy POLICY One of spot, on-demand, auto\n\nRetry Policy:\n --retry\n --no-retry\n --retry-duration DURATION\n\nResources:\n --cpu SPEC Request the CPU count. Default: 2..\n --memory SIZE Request the size of RAM. The format is SIZE:MB|GB|TB.\n Default: 8GB..\n --shared-memory SIZE Request the size of Shared Memory. The format is\n SIZE:MB|GB|TB.\n --gpu SPEC Request GPU for the run. The format is\n NAME:COUNT:MEMORY (all parts are optional)\n --disk SIZE Request the size of disk for the run. Example --disk\n 100GB...\n
"},{"location":"docs/reference/cli/#dstack-pool-ps","title":"dstack pool ps","text":"
The dstack pool ps command lists all active instances of a pool. If no pool name is specified, default pool instances are displayed.
$ dstack pool ps --help\nUsage: dstack pool ps [-h] [--pool POOL_NAME] [-w]\n\nShow instances in the pool\n\nOptions:\n -h, --help show this help message and exit\n --pool POOL_NAME The name of the pool. If not set, the default pool will be\n used\n -w, --watch Watch instances in realtime\n
"},{"location":"docs/reference/cli/#dstack-pool-create","title":"dstack pool create","text":"
The dstack pool create command creates a new pool.
$ dstack pool create --help\nUsage: dstack pool create [-h] -n POOL_NAME\n\nOptions:\n -h, --help show this help message and exit\n -n, --name POOL_NAME The name of the pool\n
"},{"location":"docs/reference/cli/#dstack-pool-list","title":"dstack pool list","text":"
The dstack pool list command lists all existing pools.
"},{"location":"docs/reference/cli/#dstack-pool-delete","title":"dstack pool delete","text":"
The dstack pool delete command deletes a specified pool.
$ dstack pool delete --help\nUsage: dstack pool delete [-h] -n POOL_NAME\n\nOptions:\n -h, --help show this help message and exit\n -n, --name POOL_NAME The name of the pool\n
A gateway is required for running services. It handles ingress traffic, authorization, domain mapping, model mapping for the OpenAI-compatible endpoint, and so on.
The dstack gateway list command displays the names and addresses of the gateways configured in the project.
$ dstack gateway list --help\nUsage: dstack gateway list [-h] [-v]\n\nOptions:\n -h, --help show this help message and exit\n -v, --verbose Show more information\n
The dstack gateway create command creates a new gateway instance in the project.
$ dstack gateway create --help\nUsage: dstack gateway create [-h] --backend {aws,azure,gcp,kubernetes}\n --region REGION [--set-default] [--name NAME]\n --domain DOMAIN\n\nOptions:\n -h, --help show this help message and exit\n --backend {aws,azure,gcp,kubernetes}\n --region REGION\n --set-default Set as default gateway for the project\n --name NAME Set a custom name for the gateway\n --domain DOMAIN Set the domain for the gateway\n
The dstack gateway delete command deletes the specified gateway.
$ dstack gateway delete --help\nUsage: dstack gateway delete [-h] [-y] name\n\nPositional Arguments:\n name The name of the gateway\n\nOptions:\n -h, --help show this help message and exit\n -y, --yes Don't ask for confirmation\n
The dstack gateway update command updates the specified gateway.
$ dstack gateway update --help\nUsage: dstack gateway update [-h] [--set-default] [--domain DOMAIN] name\n\nPositional Arguments:\n name The name of the gateway\n\nOptions:\n -h, --help show this help message and exit\n --set-default Set it the default gateway for the project\n --domain DOMAIN Set the domain for the gateway\n
"},{"location":"docs/reference/cli/#environment-variables","title":"Environment variables","text":"Name Description Default DSTACK_CLI_LOG_LEVEL Configures CLI logging level INFODSTACK_PROFILE Has the same effect as --profileNoneDSTACK_PROJECT Has the same effect as --projectNoneDSTACK_DEFAULT_CREDS_DISABLED Disables default credentials detection if set NoneDSTACK_LOCAL_BACKEND_ENABLED Enables local backend for debug if set NoneDSTACK_RUNNER_VERSION Sets exact runner version for debug latestDSTACK_SERVER_ADMIN_TOKEN Has the same effect as --tokenNoneDSTACK_SERVER_DIR Sets path to store data and server configs ~/.dstack/serverDSTACK_SERVER_HOST Has the same effect as --host127.0.0.1DSTACK_SERVER_LOG_LEVEL Has the same effect as --log-levelINFODSTACK_SERVER_LOG_FORMAT Sets format of log output. Can be rich, standard, json. richDSTACK_SERVER_PORT Has the same effect as --port3000DSTACK_SERVER_ROOT_LOG_LEVEL Sets root logger log level ERRORDSTACK_SERVER_UVICORN_LOG_LEVEL Sets uvicorn logger log level ERROR"},{"location":"docs/reference/dstack.yml/dev-environment/","title":"dev-environment","text":"
The dev-environment configuration type allows running dev environments.
Filename
Configuration files must have a name ending with .dstack.yml (e.g., .dstack.yml or dev.dstack.yml are both acceptable) and can be located in the project's root directory or any nested folder. Any configuration can be run via dstack run.
If you don't specify image, dstack uses the default Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
If you specify memory size, you can either specify an explicit size (e.g. 24GB) or a range (e.g. 24GB.., or 24GB..80GB, or ..80GB).
type: dev-environment\n\nide: vscode\n\nresources:\n cpu: 16.. # 16 or more CPUs\n memory: 200GB.. # 200GB or more RAM\n gpu: 40GB..80GB:4 # 4 GPUs from 40GB to 80GB\n shm_size: 16GB # 16GB of shared memory\n disk: 500GB\n
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
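For example, requesting two A100 GPUs with 40GB of VRAM each (a sketch):

type: dev-environment

ide: vscode

resources:
  gpu: A100:40GB:2  # Two A100 GPUs, 40GB of VRAM each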
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
"},{"location":"docs/reference/dstack.yml/dev-environment/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#ide","title":"ide - The IDE to run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#version","title":"version - (Optional) The version of the IDE.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#init","title":"init - (Optional) The bash commands to run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#image","title":"image - (Optional) The name of the Docker image to run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#entrypoint","title":"entrypoint - (Optional) The Docker entrypoint.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#home_dir","title":"home_dir - (Optional) The absolute path to the home directory inside the container. Defaults to /root.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_registry_auth","title":"registry_auth - (Optional) Credentials for pulling a private Docker image.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#python","title":"python - (Optional) The major version of Python. Mutually exclusive with image.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#env","title":"env - (Optional) The mapping or the list of environment variables.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#setup","title":"setup - (Optional) The bash commands to run on the boot.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_resources","title":"resources - (Optional) The resources requirements to run the configuration.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#ports","title":"ports - (Optional) Port numbers/mapping to expose.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#pool_name","title":"pool_name - (Optional) The name of the pool. 
If not set, dstack will use the default name.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#resources","title":"resources","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#cpu","title":"cpu - (Optional) The number of CPU cores. Defaults to 2...","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#memory","title":"memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB...","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#shm_size","title":"shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_gpu","title":"gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#_disk","title":"disk - (Optional) The disk resources.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#resources-gpu","title":"resources.gpu","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#name","title":"name - (Optional) The GPU name or list of names.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#count","title":"count - (Optional) The number of GPUs. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#memory","title":"memory - (Optional) The VRAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#total_memory","title":"total_memory - (Optional) The total VRAM size (e.g., 32GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#compute_capability","title":"compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#resources-disk","title":"resources.disk","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#size","title":"size - The disk size. Can be a string (e.g., 100GB or 100GB..) or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#registry_auth","title":"registry_auth","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#username","title":"username - The username.","text":""},{"location":"docs/reference/dstack.yml/dev-environment/#password","title":"password - The password or access token.","text":""},{"location":"docs/reference/dstack.yml/service/","title":"service","text":"
The service configuration type allows running services.
Filename
Configuration files must have a name ending with .dstack.yml (e.g., .dstack.yml or serve.dstack.yml are both acceptable) and can be located in the project's root directory or any nested folder. Any configuration can be run via dstack run.
If you don't specify image, dstack uses the default Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
With such a configuration, once the service is up, you'll be able to access the model at https://gateway.<gateway domain> via the OpenAI-compatible interface. See services for more details.
"},{"location":"docs/reference/dstack.yml/service/#replicas-and-auto-scaling","title":"Replicas and auto-scaling","text":"
By default, dstack runs a single replica of the service. You can configure the number of replicas as well as the auto-scaling policy.
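For example, a minimal sketch of a service that keeps between one and four replicas and scales on request rate could look like this (the rps metric and the target value of 10 are illustrative assumptions, not requirements):

replicas: 1..4
scaling:
  metric: rps # Requests per second per replica
  target: 10  # Add a replica once the average exceeds the target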
When specifying memory size, you can set either an explicit size (e.g., 24GB) or a range (e.g., 24GB.., 24GB..80GB, or ..80GB).
type: service

python: "3.11"
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server
    --model mistralai/Mixtral-8X7B-Instruct-v0.1
    --host 0.0.0.0
    --tensor-parallel-size 2 # Match the number of GPUs
port: 8000

resources:
  gpu: 80GB:2 # 2 GPUs of 80GB
  disk: 200GB

# Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
  format: openai
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
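Besides the string form, gpu can also be set as an object using the fields listed in the reference below; a short sketch (the particular GPU names and range are illustrative):

resources:
  gpu:
    name: [A100, A10G] # Either GPU model is acceptable
    count: 2           # Two GPUs
    memory: 40GB..     # At least 40GB of VRAM each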
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
By default, the service endpoint requires the Authorization header with "Bearer <dstack token>". Authorization can be disabled by setting auth to false.
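For example, adding the following property to the service configuration turns authorization off (only advisable for non-sensitive endpoints):

auth: false # The endpoint no longer requires the Authorization header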
"},{"location":"docs/reference/dstack.yml/service/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/dstack.yml/service/#port","title":"port - The port, that application listens on or the mapping.","text":""},{"location":"docs/reference/dstack.yml/service/#model","title":"model - (Optional) Mapping of the model for the OpenAI-compatible endpoint.","text":""},{"location":"docs/reference/dstack.yml/service/#auth","title":"auth - (Optional) Enable the authorization. Defaults to True.","text":""},{"location":"docs/reference/dstack.yml/service/#replicas","title":"replicas - (Optional) The range . Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/service/#_scaling","title":"scaling - (Optional) The auto-scaling configuration.","text":""},{"location":"docs/reference/dstack.yml/service/#image","title":"image - (Optional) The name of the Docker image to run.","text":""},{"location":"docs/reference/dstack.yml/service/#entrypoint","title":"entrypoint - (Optional) The Docker entrypoint.","text":""},{"location":"docs/reference/dstack.yml/service/#home_dir","title":"home_dir - (Optional) The absolute path to the home directory inside the container. Defaults to /root.","text":""},{"location":"docs/reference/dstack.yml/service/#_registry_auth","title":"registry_auth - (Optional) Credentials for pulling a private Docker image.","text":""},{"location":"docs/reference/dstack.yml/service/#python","title":"python - (Optional) The major version of Python. Mutually exclusive with image.","text":""},{"location":"docs/reference/dstack.yml/service/#env","title":"env - (Optional) The mapping or the list of environment variables.","text":""},{"location":"docs/reference/dstack.yml/service/#setup","title":"setup - (Optional) The bash commands to run on the boot.","text":""},{"location":"docs/reference/dstack.yml/service/#_resources","title":"resources - (Optional) The resources requirements to run the configuration.","text":""},{"location":"docs/reference/dstack.yml/service/#commands","title":"commands - (Optional) The bash commands to run.","text":""},{"location":"docs/reference/dstack.yml/service/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/dstack.yml/service/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/dstack.yml/service/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/dstack.yml/service/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/dstack.yml/service/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/dstack.yml/service/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/dstack.yml/service/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/dstack.yml/service/#pool_name","title":"pool_name - (Optional) The name of the pool. 
If not set, dstack will use the default name.","text":""},{"location":"docs/reference/dstack.yml/service/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/dstack.yml/service/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/dstack.yml/service/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/dstack.yml/service/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/dstack.yml/service/#model_1","title":"model","text":""},{"location":"docs/reference/dstack.yml/service/#type","title":"type - The type of the model.","text":""},{"location":"docs/reference/dstack.yml/service/#name","title":"name - The name of the model.","text":""},{"location":"docs/reference/dstack.yml/service/#format","title":"format - The serving format.","text":""},{"location":"docs/reference/dstack.yml/service/#scaling","title":"scaling","text":""},{"location":"docs/reference/dstack.yml/service/#metric","title":"metric - The target metric to track.","text":""},{"location":"docs/reference/dstack.yml/service/#target","title":"target - The target value of the metric.","text":""},{"location":"docs/reference/dstack.yml/service/#scale_up_delay","title":"scale_up_delay - (Optional) The delay in seconds before scaling up. Defaults to 300.","text":""},{"location":"docs/reference/dstack.yml/service/#scale_down_delay","title":"scale_down_delay - (Optional) The delay in seconds before scaling down. Defaults to 600.","text":""},{"location":"docs/reference/dstack.yml/service/#resources","title":"resources","text":""},{"location":"docs/reference/dstack.yml/service/#cpu","title":"cpu - (Optional) The number of CPU cores. Defaults to 2...","text":""},{"location":"docs/reference/dstack.yml/service/#memory","title":"memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB...","text":""},{"location":"docs/reference/dstack.yml/service/#shm_size","title":"shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.","text":""},{"location":"docs/reference/dstack.yml/service/#_gpu","title":"gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/service/#_disk","title":"disk - (Optional) The disk resources.","text":""},{"location":"docs/reference/dstack.yml/service/#resources-gpu","title":"resouces.gpu","text":""},{"location":"docs/reference/dstack.yml/service/#name","title":"name - (Optional) The GPU name or list of names.","text":""},{"location":"docs/reference/dstack.yml/service/#count","title":"count - (Optional) The number of GPUs. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/service/#memory","title":"memory - (Optional) The VRAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/service/#total_memory","title":"total_memory - (Optional) The total VRAM size (e.g., 32GB). Can be set to a range (e.g. 
16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/service/#compute_capability","title":"compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).","text":""},{"location":"docs/reference/dstack.yml/service/#resources-disk","title":"resouces.disk","text":""},{"location":"docs/reference/dstack.yml/service/#size","title":"size - The disk size. Can be a string (e.g., 100GB or 100GB..) or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/service/#registry_auth","title":"registry_auth","text":""},{"location":"docs/reference/dstack.yml/service/#username","title":"username - The username.","text":""},{"location":"docs/reference/dstack.yml/service/#password","title":"password - The password or access token.","text":""},{"location":"docs/reference/dstack.yml/task/","title":"task","text":"
The task configuration type allows running tasks.
Filename
Configuration files must have a name ending with .dstack.yml (e.g., .dstack.yml or train.dstack.yml are both acceptable) and can be located in the project's root directory or any nested folder. Any configuration can be run via dstack run.
If you don't specify image, dstack uses the default Docker image pre-configured with python, pip, conda (Miniforge), and essential CUDA drivers. The python property determines which default Docker image is used.
A task can configure ports. In this case, if the task is running an application on a port, dstack run will securely allow you to access this port from your local machine through port forwarding.
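As a sketch, a task that serves an application on a port and relies on dstack run to forward it might look like this (TensorBoard and port 6006 are illustrative assumptions):

type: task

commands:
  - pip install tensorboard
  - tensorboard --logdir results/ --port 6006
ports:
  - 6006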
When specifying memory size, you can set either an explicit size (e.g., 24GB) or a range (e.g., 24GB.., 24GB..80GB, or ..80GB).
type: task

commands:
  - pip install -r fine-tuning/qlora/requirements.txt
  - python fine-tuning/qlora/train.py

resources:
  cpu: 16..         # 16 or more CPUs
  memory: 200GB..   # 200GB or more RAM
  gpu: 40GB..80GB:4 # 4 GPUs from 40GB to 80GB
  shm_size: 16GB    # 16GB of shared memory
  disk: 500GB
The gpu property allows specifying not only memory size but also GPU names and their quantity. Examples: A100 (one A100), A10G,A100 (either A10G or A100), A100:80GB (one A100 of 80GB), A100:2 (two A100), 24GB..40GB:2 (two GPUs between 24GB and 40GB), A100:40GB:2 (two A100 GPUs of 40GB).
Shared memory
If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure shm_size, e.g. set it to 16GB.
If you don't assign a value to an environment variable (see HUGGING_FACE_HUB_TOKEN above), dstack will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a .env file and utilize tools like direnv.
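A sketch of an env section that mixes both styles; HUGGING_FACE_HUB_TOKEN is deliberately left without a value so that dstack asks for it at run time, and the second variable is an illustrative assumption:

env:
  - HUGGING_FACE_HUB_TOKEN      # Value must come from the CLI or the current process
  - HF_HUB_ENABLE_HF_TRANSFER=1 # Example of an explicitly assigned variable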
"},{"location":"docs/reference/dstack.yml/task/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/dstack.yml/task/#nodes","title":"nodes - (Optional) Number of nodes. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/task/#image","title":"image - (Optional) The name of the Docker image to run.","text":""},{"location":"docs/reference/dstack.yml/task/#entrypoint","title":"entrypoint - (Optional) The Docker entrypoint.","text":""},{"location":"docs/reference/dstack.yml/task/#home_dir","title":"home_dir - (Optional) The absolute path to the home directory inside the container. Defaults to /root.","text":""},{"location":"docs/reference/dstack.yml/task/#_registry_auth","title":"registry_auth - (Optional) Credentials for pulling a private Docker image.","text":""},{"location":"docs/reference/dstack.yml/task/#python","title":"python - (Optional) The major version of Python. Mutually exclusive with image.","text":""},{"location":"docs/reference/dstack.yml/task/#env","title":"env - (Optional) The mapping or the list of environment variables.","text":""},{"location":"docs/reference/dstack.yml/task/#setup","title":"setup - (Optional) The bash commands to run on the boot.","text":""},{"location":"docs/reference/dstack.yml/task/#_resources","title":"resources - (Optional) The resources requirements to run the configuration.","text":""},{"location":"docs/reference/dstack.yml/task/#ports","title":"ports - (Optional) Port numbers/mapping to expose.","text":""},{"location":"docs/reference/dstack.yml/task/#commands","title":"commands - (Optional) The bash commands to run.","text":""},{"location":"docs/reference/dstack.yml/task/#backends","title":"backends - (Optional) The backends to consider for provisionig (e.g., [aws, gcp]).","text":""},{"location":"docs/reference/dstack.yml/task/#regions","title":"regions - (Optional) The regions to consider for provisionig (e.g., [eu-west-1, us-west4, westeurope]).","text":""},{"location":"docs/reference/dstack.yml/task/#instance_types","title":"instance_types - (Optional) The cloud-specific instance types to consider for provisionig (e.g., [p3.8xlarge, n1-standard-4]).","text":""},{"location":"docs/reference/dstack.yml/task/#spot_policy","title":"spot_policy - (Optional) The policy for provisioning spot or on-demand instances: spot, on-demand, or auto.","text":""},{"location":"docs/reference/dstack.yml/task/#_retry_policy","title":"retry_policy - (Optional) The policy for re-submitting the run.","text":""},{"location":"docs/reference/dstack.yml/task/#max_duration","title":"max_duration - (Optional) The maximum duration of a run (e.g., 2h, 1d, etc). After it elapses, the run is forced to stop. Defaults to off.","text":""},{"location":"docs/reference/dstack.yml/task/#max_price","title":"max_price - (Optional) The maximum price per hour, in dollars.","text":""},{"location":"docs/reference/dstack.yml/task/#pool_name","title":"pool_name - (Optional) The name of the pool. If not set, dstack will use the default name.","text":""},{"location":"docs/reference/dstack.yml/task/#instance_name","title":"instance_name - (Optional) The name of the instance.","text":""},{"location":"docs/reference/dstack.yml/task/#creation_policy","title":"creation_policy - (Optional) The policy for using instances from the pool. Defaults to reuse-or-create.","text":""},{"location":"docs/reference/dstack.yml/task/#termination_policy","title":"termination_policy - (Optional) The policy for termination instances. 
Defaults to destroy-after-idle.","text":""},{"location":"docs/reference/dstack.yml/task/#termination_idle_time","title":"termination_idle_time - (Optional) Time to wait before destroying the idle instance. Defaults to 5m for dstack run and to 3d for dstack pool add.","text":""},{"location":"docs/reference/dstack.yml/task/#resources","title":"resources","text":""},{"location":"docs/reference/dstack.yml/task/#cpu","title":"cpu - (Optional) The number of CPU cores. Defaults to 2...","text":""},{"location":"docs/reference/dstack.yml/task/#memory","title":"memory - (Optional) The RAM size (e.g., 8GB). Defaults to 8GB...","text":""},{"location":"docs/reference/dstack.yml/task/#shm_size","title":"shm_size - (Optional) The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this.","text":""},{"location":"docs/reference/dstack.yml/task/#_gpu","title":"gpu - (Optional) The GPU requirements. Can be set to a number, a string (e.g. A100, 80GB:2, etc.), or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/task/#_disk","title":"disk - (Optional) The disk resources.","text":""},{"location":"docs/reference/dstack.yml/task/#resources-gpu","title":"resouces.gpu","text":""},{"location":"docs/reference/dstack.yml/task/#name","title":"name - (Optional) The GPU name or list of names.","text":""},{"location":"docs/reference/dstack.yml/task/#count","title":"count - (Optional) The number of GPUs. Defaults to 1.","text":""},{"location":"docs/reference/dstack.yml/task/#memory","title":"memory - (Optional) The VRAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/task/#total_memory","title":"total_memory - (Optional) The total VRAM size (e.g., 32GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB).","text":""},{"location":"docs/reference/dstack.yml/task/#compute_capability","title":"compute_capability - (Optional) The minimum compute capability of the GPU (e.g., 7.5).","text":""},{"location":"docs/reference/dstack.yml/task/#resources-disk","title":"resouces.disk","text":""},{"location":"docs/reference/dstack.yml/task/#size","title":"size - The disk size. Can be a string (e.g., 100GB or 100GB..) or an object; see examples.","text":""},{"location":"docs/reference/dstack.yml/task/#registry_auth","title":"registry_auth","text":""},{"location":"docs/reference/dstack.yml/task/#username","title":"username - The username.","text":""},{"location":"docs/reference/dstack.yml/task/#password","title":"password - The password or access token.","text":""},{"location":"docs/reference/server/config.yml/","title":"~/.dstack/server/config.yml","text":"
The ~/.dstack/server/config.yml file is used by the dstack server to configure cloud accounts.
Projects
For flexibility, the dstack server lets you configure backends for multiple projects. If you intend to use only one project, name it main.
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_host: localhost # The external IP address of any node
      ssh_port: 32000     # Any port accessible outside of the cluster
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_port: 32000 # Any port accessible outside of the cluster
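A backend entry for a cloud account follows the same shape; for instance, a sketch of an AWS backend with access-key credentials (the key values below are placeholders, not real credentials):

projects:
- name: main
  backends:
  - type: aws
    creds:
      type: access_key
      access_key: AKIA...placeholder # The IAM user's access key ID
      secret_key: abcd...placeholder # The IAM user's secret access key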
For more details on configuring clouds, please refer to Installation.
"},{"location":"docs/reference/server/config.yml/#root-reference","title":"Root reference","text":""},{"location":"docs/reference/server/config.yml/#_projects","title":"projects - The list of projects.","text":""},{"location":"docs/reference/server/config.yml/#projects","title":"projects[n]","text":""},{"location":"docs/reference/server/config.yml/#name","title":"name - The name of the project.","text":""},{"location":"docs/reference/server/config.yml/#backends","title":"backends - The list of backends.","text":""},{"location":"docs/reference/server/config.yml/#aws","title":"projects[n].backends[type=aws]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of the backend. Must be aws.","text":""},{"location":"docs/reference/server/config.yml/#vpc_name","title":"vpc_name - (Optional) The VPC name.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#aws-creds","title":"projects[n].backends[type=aws].creds","text":"Access keyDefault"},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be access_key.","text":""},{"location":"docs/reference/server/config.yml/#access_key","title":"access_key - The access key.","text":""},{"location":"docs/reference/server/config.yml/#secret_key","title":"secret_key - The secret key.","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be default.","text":""},{"location":"docs/reference/server/config.yml/#azure","title":"projects[n].backends[type=azure]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of the backend. Must be azure.","text":""},{"location":"docs/reference/server/config.yml/#tenant_id","title":"tenant_id - The tenant ID.","text":""},{"location":"docs/reference/server/config.yml/#subscription_id","title":"subscription_id - The subscription ID.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#azure-creds","title":"projects[n].backends[type=azure].creds","text":"ClientDefault"},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be client.","text":""},{"location":"docs/reference/server/config.yml/#client_id","title":"client_id - The client ID.","text":""},{"location":"docs/reference/server/config.yml/#client_secret","title":"client_secret - The client secret.","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be default.","text":""},{"location":"docs/reference/server/config.yml/#datacrunch","title":"projects[n].backends[type=datacrunch]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be datacrunch.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#datacrunch-creds","title":"projects[n].backends[type=datacrunch].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. 
Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#client_id","title":"client_id - The client ID.","text":""},{"location":"docs/reference/server/config.yml/#client_secret","title":"client_secret - The client secret.","text":""},{"location":"docs/reference/server/config.yml/#gcp","title":"projects[n].backends[type=gcp]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be gcp.","text":""},{"location":"docs/reference/server/config.yml/#project_id","title":"project_id - The project ID.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#gcp-creds","title":"projects[n].backends[type=gcp].creds","text":"Service accountDefault"},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be service_account.","text":""},{"location":"docs/reference/server/config.yml/#filename","title":"filename - The path to the service account file.","text":""},{"location":"docs/reference/server/config.yml/#data","title":"data - (Optional) The contents of the service account file.","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be default.","text":""},{"location":"docs/reference/server/config.yml/#lambda","title":"projects[n].backends[type=lambda]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be lambda.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#lambda-creds","title":"projects[n].backends[type=lambda].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#api_key","title":"api_key - The API key.","text":""},{"location":"docs/reference/server/config.yml/#tensordock","title":"projects[n].backends[type=tensordock]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be tensordock.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#tensordock-creds","title":"projects[n].backends[type=tensordock].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#api_key","title":"api_key - The API key.","text":""},{"location":"docs/reference/server/config.yml/#api_token","title":"api_token - The API token.","text":""},{"location":"docs/reference/server/config.yml/#vastai","title":"projects[n].backends[type=vastai]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be vastai.","text":""},{"location":"docs/reference/server/config.yml/#_creds","title":"creds - The credentials.","text":""},{"location":"docs/reference/server/config.yml/#vastai-creds","title":"projects[n].backends[type=vastai].creds","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of credentials. 
Must be api_key.","text":""},{"location":"docs/reference/server/config.yml/#api_key","title":"api_key - The API key.","text":""},{"location":"docs/reference/server/config.yml/#kubernetes","title":"projects[n].backends[type=kubernetes]","text":""},{"location":"docs/reference/server/config.yml/#type","title":"type - The type of backend. Must be kubernetes.","text":""},{"location":"docs/reference/server/config.yml/#_kubeconfig","title":"kubeconfig - The kubeconfig configuration.","text":""},{"location":"docs/reference/server/config.yml/#_networking","title":"networking - (Optional) The networking configuration.","text":""},{"location":"docs/reference/server/config.yml/#kubeconfig","title":"projects[n].backends[type=kubernetes].kubeconfig","text":""},{"location":"docs/reference/server/config.yml/#filename","title":"filename - The path to the kubeconfig file.","text":""},{"location":"docs/reference/server/config.yml/#data","title":"data - (Optional) The contents of the kubeconfig file.","text":""},{"location":"docs/reference/server/config.yml/#networking","title":"projects[n].backends[type=kubernetes].networking","text":""},{"location":"docs/reference/server/config.yml/#ssh_host","title":"ssh_host - (Optional) The external IP address of any node.","text":""},{"location":"docs/reference/server/config.yml/#ssh_port","title":"ssh_port - (Optional) Any port accessible outside of the cluster.","text":""},{"location":"changelog/archive/2024/","title":"2024","text":""},{"location":"changelog/archive/2023/","title":"2023","text":""},{"location":"changelog/page/2/","title":"Changelog","text":""},{"location":"blog/archive/2024/","title":"2024","text":""}]}
\ No newline at end of file