- [Docs] Minor updates to the volumes documentation
peterschmidt85 committed Jul 3, 2024
1 parent 3152f5f commit bacd676
Showing 5 changed files with 145 additions and 101 deletions.
139 changes: 51 additions & 88 deletions docs/docs/concepts/volumes.md
@@ -1,26 +1,17 @@
# Volumes

Volumes allow persisting data between runs. When you add a volume,
`dstack` provisions a network disk in the cloud, such as an AWS EBS.
Then you can mount the volume as a directory in a run and store data there.
After the run is terminated, the volume can be mounted again and the stored data will persist.
Volumes allow you to persist data between runs. `dstack` simplifies managing volumes and lets you mount them to a specific
directory when working with dev environments, tasks, and services.

`dstack` supports creating new volumes (a.k.a. `dstack`-managed volumes)
and also registering existing volumes (a.k.a. external volumes).
The latter allows accessing data that is already stored on some volume, such as pre-processed training data.
!!! info "Experimental"
    Volumes are currently experimental and only work with the `aws` backend. Support for other backends is coming soon.

!!! info "Backends"
    Currently, volumes are supported only for `aws`. Support for other backends is coming soon!
## Configuration

!!! info "File system"
    `dstack` creates an ext4 file system on `dstack`-managed volumes automatically.
    If you register an external volume, you must ensure it already has a file system.
First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `vol.dstack.yml`
are both acceptable).

## Creating new volumes

First create a `volume` configuration file and specify `size` of the volume you'd like to provision:

<div editor-title="new-volume.dstack.yml">
<div editor-title="vol.dstack.yml">

```yaml
type: volume
@@ -32,70 +23,43 @@ size: 100GB
</div>
Then apply the configuration to create the volume:
<div class="termy">
```shell
$ dstack apply -f new-volume.dstack.yml
Volume my-new-volume does not exist yet. Create the volume? [y/n]: y
NAME BACKEND REGION STATUS CREATED
my-new-volume aws eu-central-1 submitted now

```
</div>
The volume is created and can be mounted in runs!
!!! info "Volume parameters"
    `dstack` has default volume parameters for every backend so you can specify only `size`.
    On AWS, `dstack` provisions EBS gp3 volumes.

If you use this configuration, `dstack` will create a new volume based on the specified options.

## Register existing volumes
!!! info "Registering existing volumes"
    If you prefer to reuse an existing volume (e.g., one created manually) rather than create a new one, you can
    [specify its ID via `volume_id`](../reference/dstack.yml/volume.md#register-volume). In this case, `dstack` will register the specified volume so that you can use it with dev
    environments, tasks, and services.

If you already have a volume in your cloud account that you'd like to use with `dstack`,
create a `volume` configuration file with `volume_id` specified:
!!! info "Reference"
    See the [.dstack.yml reference](../reference/dstack.yml/volume.md)
    for all supported configuration options and multiple examples.

<div editor-title="external-volume.dstack.yml">
## Creating and registering volumes

```yaml
type: volume
name: my-external-volume
backend: aws
region: eu-central-1
volume_id: vol1235
```

</div>

Then apply the configuration to register the volume:
To create or register the volume, simply call the `dstack apply` command:

<div class="termy">

```shell
$ dstack apply -f external-volume.dstack.yml
Volume my-external-volume does not exist yet. Create the volume? [y/n]: y
NAME BACKEND REGION STATUS CREATED
my-external-volume aws eu-central-1 submitted now
$ dstack apply -f vol.dstack.yml
Volume my-new-volume does not exist yet. Create the volume? [y/n]: y
 NAME           BACKEND  REGION        STATUS     CREATED
 my-new-volume  aws      eu-central-1  submitted  now
```

</div>

The volume is registered and can be mounted in runs!
> When creating the volume, `dstack` automatically creates an `ext4` file system on it.

Once created, the volume can be attached to dev environments, tasks, and services.

## Mount volumes in runs
## Attaching volumes

Suppose we need to run a dev environment.
We could mount a volume and store our work there so it's not lost between run restarts or instance interruptions.
We do it by specifying a list of `volumes`.
Each item in `volumes` should have `name` of the volume and `path` where the volume should be mounted in the run.
Here's what it looks like:
Dev environments, tasks, and services let you attach any number of volumes.
To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents:

<div editor-title="dev.dstack.yml">
<div editor-title=".dstack.yml">

```yaml
type: dev-environment
@@ -107,39 +71,38 @@ volumes:

</div>

Then we can run this `dev-environment` configuration, ssh into the run, and see `/volume_data`:
Once you run this configuration, the volume will be mounted at `/volume_data` inside the dev environment,
and its contents will persist across runs.
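
Since any number of volumes can be attached, a configuration can list several entries under `volumes`. The following is a minimal sketch: `my-datasets-volume` and `/datasets` are hypothetical names used only for illustration, and both volumes are assumed to have been created or registered beforehand.

```yaml
type: dev-environment
ide: vscode
volumes:
  # Volume created earlier via `dstack apply`
  - name: my-new-volume
    path: /volume_data
  # Hypothetical second volume; the name and path are placeholders
  - name: my-datasets-volume
    path: /datasets
```

Each entry pairs a volume name with its own mount path, so the two volumes appear as separate directories inside the run.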

```shell
-(workflow) root@ip-10-0-10-73:/workflow# ls -l /
total 92
drwxr-xr-x 2 root root 4096 Apr 15 2020 home
...
drwxr-xr-x 3 root root 4096 Jun 28 07:02 volume_data
drwxr-xr-x 5 root root 4096 Jun 28 07:13 workflow
```
!!! info "Limitations"
    When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
    to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
    attach volumes to `/workflow` or any of its subdirectories.
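
To make the limitation concrete, here is a small sketch of a mount path that stays outside `/workflow`. The rejected paths are shown only as comments, and `/workflow/data` is a hypothetical subdirectory chosen for illustration.

```yaml
type: dev-environment
ide: vscode
volumes:
  - name: my-new-volume
    # OK: the mount path is outside /workflow
    path: /volume_data
    # Not supported at the moment, per the limitation above:
    #   path: /workflow
    #   path: /workflow/data
```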

## Managing volumes

## Deleting volumes
**Deleting volumes**

After the run is stopped, a volume can be deleted with `dstack delete`:
When the volume isn't attached to any active dev environment, task, or service, you can delete it using `dstack delete`:

```shell
$ dstack delete -f .dstack/confs/volume.yaml
Delete the volume my-new-volume? [y/n]: y
Volume my-new-volume deleted
$ dstack delete -f vol.dstack.yml
```

Note that deleting `dstack`-managed volumes destroys all the volumes data!
Deleting external volumes makes `dstack` "forget" about the volumes, but they remain in the cloud.
If the volume was created using `dstack`, it will be physically destroyed along with its data.
If you've registered an existing volume, it will be de-registered from `dstack`, but its data will be kept.

## FAQ
**Listing volumes**

1. Can I mount volumes from one cloud on instances from other clouds?
The [`dstack volume list`](../reference/cli/index.md#dstack-gateway-list) command lists created and registered volumes.

No. Since volumes are backed up by cloud network disks, they can only be used with instances in the same cloud.
If you need to access data from different clouds, consider uploading it to an object storage.
## FAQ

2. Can I mount volumes from one region/zone on instances from other regions/zones?
??? info "Using volumes across backends"
    Since volumes are backed by cloud network disks, you can only use them within the same cloud. If you need to access
    data across different backends, you should either use object storage or replicate the data across multiple volumes.

It depends on the cloud and volume type. Generally, network volumes are tied to regions so they cannot be
used in other regions. Volumes are also often tied to availability zones but
some clouds support volumes that can be used across availability zones within a region.
??? info "Using volumes across regions"
    Typically, network volumes are tied to a specific region, so you can't use them in other regions. Volumes are
    often also tied to an availability zone, but some clouds support volumes that can be used across
    availability zones within the same region.
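
Because a volume generally can't be used outside its region, one possible approach is to define a separate volume per region. The sketch below assumes a new volume accepts the same `name`, `backend`, and `region` properties shown for registered volumes, alongside `size`. The volume name is hypothetical.

```yaml
type: volume
# Hypothetical name for a second volume in another region
name: my-new-volume-eu-west
backend: aws
region: eu-west-1
size: 100GB
```

A volume defined this way starts out empty, so any data would still have to be copied to it separately.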
27 changes: 27 additions & 0 deletions docs/docs/reference/dstack.yml/dev-environment.md
@@ -190,6 +190,33 @@ regions: [eu-west-1, eu-west-2]

</div>

### Volumes

Volumes allow you to persist data between runs.
To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment
ide: vscode
volumes:
  - name: my-new-volume
    path: /volume_data
```

</div>

Once you run this configuration, the volume will be mounted at `/volume_data` inside the dev
environment, and its contents will persist across runs.

!!! info "Limitations"
    When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
    to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
    attach volumes to `/workflow` or any of its subdirectories.

The `dev-environment` configuration type supports many other options. See below.

## Root reference
25 changes: 25 additions & 0 deletions docs/docs/reference/dstack.yml/service.md
@@ -349,6 +349,31 @@ regions: [eu-west-1, eu-west-2]

</div>

### Volumes

Volumes allow you to persist data between runs.
To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents:

<div editor-title="serve.dstack.yml">

```yaml
type: service
commands:
  - python3 -m http.server
port: 8000
volumes:
  - name: my-new-volume
    path: /volume_data
```

</div>

Once you run this configuration, the volume will be mounted at `/volume_data` inside the service,
and its contents will persist across runs.

The `service` configuration type supports many other options. See below.

## Root reference
31 changes: 31 additions & 0 deletions docs/docs/reference/dstack.yml/task.md
@@ -333,6 +333,37 @@ regions: [eu-west-1, eu-west-2]

</div>

### Volumes

Volumes allow you to persist data between runs.
To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents:

<div editor-title="train.dstack.yml">

```yaml
type: task
python: "3.11"
commands:
  - pip install -r fine-tuning/qlora/requirements.txt
  - python fine-tuning/qlora/train.py
volumes:
  - name: my-new-volume
    path: /volume_data
```

</div>

Once you run this configuration, the volume will be mounted at `/volume_data` inside the task,
and its contents will persist across runs.

!!! info "Limitations"
    When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
    to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
    attach volumes to `/workflow` or any of its subdirectories.

The `task` configuration type supports many other options. See below.

## Root reference
24 changes: 11 additions & 13 deletions docs/docs/reference/dstack.yml/volume.md
@@ -1,17 +1,16 @@
# volume

The `volume` configuration type allows creating and updating volumes.
The `volume` configuration type allows creating, registering, and updating volumes.

!!! info "Filename"
    Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `volume.dstack.yml` are both acceptable)
    and can be located in the project's root directory or any nested folder.
    Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).
> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `vol.dstack.yml` are both acceptable)
> and can be located in the project's root directory or any nested folder.
> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples

### AWS EBS volume
### Creating a new volume { #create-volume }

<div editor-title="aws-volume.dstack.yml">
<div editor-title="vol.dstack.yml">

```yaml
type: volume
@@ -23,17 +22,16 @@ size: 100GB
</div>
### Registering an existing volume { #register-volume }
### AWS EBS external volume
<div editor-title="aws-ext-volume.dstack.yml">
<div editor-title="ext-vol.dstack.yml">
```yaml
type: volume
name: my-ext-aws-volume
name: my-external-volume
backend: aws
region: eu-central-1
volume_id: vol-123456
volume_id: vol1235
```
</div>