From bacd676e4e7f779e2b8d0209aec0118cf3679c71 Mon Sep 17 00:00:00 2001
From: peterschmidt85
Date: Wed, 3 Jul 2024 20:53:04 +0200
Subject: [PATCH] - [Docs] Minor updates to the volumes documentation

---
 docs/docs/concepts/volumes.md                 | 139 +++++++-----------
 .../reference/dstack.yml/dev-environment.md   |  27 ++++
 docs/docs/reference/dstack.yml/service.md     |  25 ++++
 docs/docs/reference/dstack.yml/task.md        |  31 ++++
 docs/docs/reference/dstack.yml/volume.md      |  24 ++-
 5 files changed, 145 insertions(+), 101 deletions(-)

diff --git a/docs/docs/concepts/volumes.md b/docs/docs/concepts/volumes.md
index bfe52c79e..dabb8d057 100644
--- a/docs/docs/concepts/volumes.md
+++ b/docs/docs/concepts/volumes.md
@@ -1,26 +1,17 @@
 # Volumes

-Volumes allow persisting data between runs. When you add a volume,
-`dstack` provisions a network disk in the cloud, such as an AWS EBS.
-Then you can mount the volume as a directory in a run and store data there.
-After the run is terminated, the volume can be mounted again and the stored data will persist.
+Volumes allow you to persist data between runs. `dstack` simplifies managing volumes and lets you mount them to a specific
+directory when working with dev environments, tasks, and services.

-`dstack` supports creating new volumes (a.k.a. `dstack`-managed volumes)
-and also registering existing volumes (a.k.a. external volumes).
-The latter allows accessing data that is already stored on some volume, such as pre-processed training data.
+!!! info "Experimental"
+    Volumes are currently experimental and only work with the `aws` backend. Support for other backends is coming soon.

-!!! info "Backends"
-    Currently, volumes are supported only for `aws`. Support for other backends is coming soon!
+## Configuration

-!!! info "File system"
-    `dstack` creates an ext4 file system on `dstack`-managed volumes automatically.
-    If you register an external volume, you must ensure it already has a file system.
+First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `vol.dstack.yml`
+are both acceptable).

-## Creating new volumes
-
-First create a `volume` configuration file and specify `size` of the volume you'd like to provision:
-
-
+

 ```yaml
 type: volume
@@ -32,70 +23,43 @@ size: 100GB
-Then apply the configuration to create the volume:
-
-
-
-```shell
-$ dstack apply -f new-volume.dstack.yml
-Volume my-new-volume does not exist yet. Create the volume? [y/n]: y
- NAME BACKEND REGION STATUS CREATED
- my-new-volume aws eu-central-1 submitted now
-
-```
-
-
-
-The volume is created and can be mounted in runs!
-
-!!! info "Volume parameters"
-    `dstack` has default volume parameters for every backend so you can specify only `size`.
-    On AWS, `dstack` provisions EBS gp3 volumes.
-
+If you use this configuration, `dstack` will create a new volume based on the specified options.

-## Register existing volumes
+!!! info "Registering existing volumes"
+    If you prefer not to create a new volume but to reuse an existing one (e.g., created manually), you can
+    [specify its ID via `volume_id`](../reference/dstack.yml/volume.md#register-volume). In this case, `dstack` will register the specified volume so that you can use it with development
+    environments, tasks, and services.

-If you already have a volume in your cloud account that you'd like to use with `dstack`,
-create a `volume` configuration file with `volume_id` specified:
+!!! info "Reference"
+    See the [.dstack.yml reference](../reference/dstack.yml/volume.md)
+    for all supported configuration options and multiple examples.

-
+## Creating and registering volumes

-```yaml
-type: volume
-name: my-external-volume
-backend: aws
-region: eu-central-1
-volume_id: vol1235
-```
-
-
-Then apply the configuration to register the volume:
+To create or register the volume, simply call the `dstack apply` command:

 ```shell
-$ dstack apply -f external-volume.dstack.yml
-Volume my-external-volume does not exist yet. Create the volume? [y/n]: y
- NAME BACKEND REGION STATUS CREATED
- my-external-volume aws eu-central-1 submitted now
+$ dstack apply -f volume.dstack.yml
+Volume my-new-volume does not exist yet. Create the volume? [y/n]: y
+ NAME BACKEND REGION STATUS CREATED
+ my-new-volume aws eu-central-1 submitted now

 ```
-The volume is registered and can be mounted in runs!
+> When creating the volume, `dstack` automatically creates an `ext4` file system on it.
+Once created, the volume can be attached to dev environments, tasks, and services.

-## Mount volumes in runs
+## Attaching volumes

-Suppose we need to run a dev environment.
-We could mount a volume and store our work there so it's not lost between run restarts or instance interruptions.
-We do it by specifying a list of `volumes`.
-Each item in `volumes` should have `name` of the volume and `path` where the volume should be mounted in the run.
-Here's what it looks like:
+Dev environments, tasks, and services let you attach any number of volumes.
+To attach a volume, specify its `name` under the `volumes` property, along with the `path` where it should be mounted:

-
+

 ```yaml
 type: dev-environment
@@ -107,39 +71,38 @@ volumes:
-Then we can run this `dev-environment` configuration, ssh into the run, and see `/volume_data`:
+Once you run this configuration, the volume will be mounted at `/volume_data` inside the dev environment,
+and its contents will persist across runs.

-```shell
--(workflow) root@ip-10-0-10-73:/workflow# ls -l /
-total 92
-drwxr-xr-x 2 root root 4096 Apr 15 2020 home
-...
-drwxr-xr-x 3 root root 4096 Jun 28 07:02 volume_data
-drwxr-xr-x 5 root root 4096 Jun 28 07:13 workflow
-```
+!!! info "Limitations"
+    When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
+    to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
+    attach volumes to `/workflow` or any of its subdirectories.
+
+## Managing volumes

-## Deleting volumes
+**Deleting volumes**

-After the run is stopped, a volume can be deleted with `dstack delete`:
+When the volume isn't attached to any active dev environment, task, or service, you can delete it using `dstack delete`:

 ```shell
-$ dstack delete -f .dstack/confs/volume.yaml
-Delete the volume my-new-volume? [y/n]: y
-Volume my-new-volume deleted
+$ dstack delete -f vol.dstack.yml
 ```

-Note that deleting `dstack`-managed volumes destroys all the volumes data!
-Deleting external volumes makes `dstack` "forget" about the volumes, but they remain in the cloud.
+If the volume was created using `dstack`, it will be physically destroyed along with the data.
+If you've registered an existing volume, it will be de-registered from `dstack`, but its data will be kept.

-## FAQ
+**Listing volumes**

-1. Can I mount volumes from one cloud on instances from other clouds?
+The [`dstack volume list`](../reference/cli/index.md#dstack-volume-list) command lists created and registered volumes.

-    No. Since volumes are backed up by cloud network disks, they can only be used with instances in the same cloud.
-    If you need to access data from different clouds, consider uploading it to an object storage.
+## FAQ

-2. Can I mount volumes from one region/zone on instances from other regions/zones?
-    It depends on the cloud and volume type. Generally, network volumes are tied to regions so they cannot be
-    used in other regions. Volumes are also often tied to availability zones but
-    some clouds support volumes that can be used across availability zones within a region.
+??? info "Using volumes across backends"
+    Since volumes are backed by cloud network disks, you can only use them within the same cloud. If you need to access
+    data across different backends, you should either use object storage or replicate the data across multiple volumes.
+??? info "Using volumes across regions"
+    Typically, network volumes are associated with specific regions, so you can't use them in other regions. Sometimes,
+    volumes are also linked to availability zones, but some clouds allow volumes that can be used across different
+    availability zones within the same region.
\ No newline at end of file
diff --git a/docs/docs/reference/dstack.yml/dev-environment.md b/docs/docs/reference/dstack.yml/dev-environment.md
index a0d14fedf..1c129ad53 100644
--- a/docs/docs/reference/dstack.yml/dev-environment.md
+++ b/docs/docs/reference/dstack.yml/dev-environment.md
@@ -190,6 +190,33 @@ regions: [eu-west-1, eu-west-2]
+### Volumes
+
+Volumes allow you to persist data between runs.
+To attach a volume, specify its `name` under the `volumes` property, along with the `path` where it should be mounted:
+
+
+
+```yaml
+type: dev-environment
+
+ide: vscode
+
+volumes:
+  - name: my-new-volume
+    path: /volume_data
+```
+
+
+
+Once you run this configuration, the volume will be mounted at `/volume_data` inside the development
+environment, and its contents will persist across runs.
+
+!!! info "Limitations"
+    When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
+    to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
+    attach volumes to `/workflow` or any of its subdirectories.
+
 The `dev-environment` configuration type supports many other options. See below.

 ## Root reference
diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md
index a7efec2ea..87f192fa5 100644
--- a/docs/docs/reference/dstack.yml/service.md
+++ b/docs/docs/reference/dstack.yml/service.md
@@ -349,6 +349,31 @@ regions: [eu-west-1, eu-west-2]
+### Volumes
+
+Volumes allow you to persist data between runs.
+To attach a volume, specify its `name` under the `volumes` property, along with the `path` where it should be mounted:
+
+
+
+```yaml
+type: service
+
+commands:
+  - python3 -m http.server
+
+port: 8000
+
+volumes:
+  - name: my-new-volume
+    path: /volume_data
+```
+
+
+
+Once you run this configuration, the volume will be mounted at `/volume_data` inside the service,
+and its contents will persist across runs.
+
 The `service` configuration type supports many other options. See below.

 ## Root reference
diff --git a/docs/docs/reference/dstack.yml/task.md b/docs/docs/reference/dstack.yml/task.md
index 0821137e4..8a94e2dd5 100644
--- a/docs/docs/reference/dstack.yml/task.md
+++ b/docs/docs/reference/dstack.yml/task.md
@@ -333,6 +333,37 @@ regions: [eu-west-1, eu-west-2]
+### Volumes
+
+Volumes allow you to persist data between runs.
+To attach a volume, specify its `name` under the `volumes` property, along with the `path` where it should be mounted:
+
+
+
+```yaml
+type: task
+
+python: "3.11"
+
+commands:
+  - pip install -r fine-tuning/qlora/requirements.txt
+  - python fine-tuning/qlora/train.py
+
+volumes:
+  - name: my-new-volume
+    path: /volume_data
+```
+
+
+
+Once you run this configuration, the volume will be mounted at `/volume_data` inside the task,
+and its contents will persist across runs.
+
+!!! info "Limitations"
+    When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
+    to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
+    attach volumes to `/workflow` or any of its subdirectories.
+
 The `task` configuration type supports many other options. See below.

 ## Root reference
diff --git a/docs/docs/reference/dstack.yml/volume.md b/docs/docs/reference/dstack.yml/volume.md
index c4caa2c5c..03351fb6e 100644
--- a/docs/docs/reference/dstack.yml/volume.md
+++ b/docs/docs/reference/dstack.yml/volume.md
@@ -1,17 +1,16 @@
 # volume

-The `volume` configuration type allows creating and updating volumes.
+The `volume` configuration type allows creating, registering, and updating volumes.

-!!! info "Filename"
-    Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `volume.dstack.yml` are both acceptable)
-    and can be located in the project's root directory or any nested folder.
-    Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).
+> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `vol.dstack.yml` are both acceptable)
+> and can be located in the project's root directory or any nested folder.
+> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).

 ## Examples

-### AWS EBS volume
+### Creating a new volume { #create-volume }

-
+

 ```yaml
 type: volume
@@ -23,17 +22,16 @@ size: 100GB
+### Registering an existing volume { #register-volume }

-### AWS EBS external volume
-
-
+

 ```yaml
 type: volume
-name: my-ext-aws-volume
+name: my-external-volume
 backend: aws
 region: eu-central-1
-volume_id: vol-123456
+volume_id: vol1235
 ```
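The updated docs state two rules that can be checked locally before calling `dstack apply`: configuration files must be named `.dstack.yml` or end with `.dstack.yml`, and volumes cannot be mounted at `/workflow` or any of its subdirectories. The following is a minimal Python sketch of both checks, assuming POSIX-style, already-normalized mount paths. The helpers `is_valid_config_name` and `is_allowed_mount_path`, and the list-of-dicts shape used for `volumes`, are hypothetical illustrations, not part of `dstack`'s API.

```python
from pathlib import PurePosixPath

# Hypothetical pre-flight checks (not part of dstack's API) mirroring two
# rules stated in the updated docs.

# dstack mounts the project folder here; volumes may not be attached at or below it.
WORKFLOW_DIR = PurePosixPath("/workflow")


def is_valid_config_name(filename: str) -> bool:
    """Config files must be named `.dstack.yml` or end with `.dstack.yml`."""
    # Only the final path component matters, so nested folders are fine.
    return PurePosixPath(filename).name.endswith(".dstack.yml")


def is_allowed_mount_path(path: str) -> bool:
    """Return False if `path` is /workflow itself or lies underneath it."""
    mount = PurePosixPath(path)
    # PurePosixPath does not resolve "..", so this assumes normalized paths.
    return mount != WORKFLOW_DIR and WORKFLOW_DIR not in mount.parents


if __name__ == "__main__":
    # Assumed shape of the `volumes` entries, taken from the YAML examples.
    volumes = [
        {"name": "my-new-volume", "path": "/volume_data"},
        {"name": "my-new-volume", "path": "/workflow/data"},
    ]
    print("vol.dstack.yml valid:", is_valid_config_name("vol.dstack.yml"))
    for volume in volumes:
        print(volume["path"], "allowed:", is_allowed_mount_path(volume["path"]))
```

Comparing `PurePosixPath` objects instead of string prefixes means a path such as `/workflowdata` is not mistaken for a subdirectory of `/workflow`, while `/workflow/data` is rejected.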