Releases: dstackai/dstack

0.18.19

17 Oct 07:25
a171ba6

This release contains CLI hotfixes for 0.18.18, including a fix for client backward compatibility and a fix for reported memory usage in dstack stats. It's recommended to update the CLI from 0.18.18 to 0.18.19. The server update from 0.18.18 is not necessary.

What's Changed

Full Changelog: 0.18.18...0.18.19

0.18.18

16 Oct 15:04
10e587d

Hardware metrics

The CLI introduces a new command, dstack stats, which displays real-time hardware metrics for runs, including CPU,
memory, and GPU usage per replica and job.

$ dstack stats hot-frog-1
 NAME        CPU  MEMORY           GPU                        
 hot-frog-1  2%   15307MB/49152MB  #0 22764MB/24576MB 0% Util 

Use the -w option to refresh the stats every few seconds in a loop.
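For scripting around this output, the MEMORY and GPU cells in the sample above can be turned into a usage fraction. The following is a minimal sketch that assumes the cell format shown in the example (`<used>MB/<total>MB`); the helper name `usage_fraction` is hypothetical and is not part of dstack.

```python
import re

# Matches a "used/total" pair in megabytes, e.g. "15307MB/49152MB".
# Assumption: cells follow the format shown in the sample output above.
_USAGE_RE = re.compile(r"(\d+)MB/(\d+)MB")


def usage_fraction(cell: str) -> float:
    """Return used/total for a "<used>MB/<total>MB" cell (hypothetical helper)."""
    match = _USAGE_RE.search(cell)
    if match is None:
        raise ValueError(f"unrecognized usage cell: {cell!r}")
    used, total = (int(group) for group in match.groups())
    if total == 0:
        raise ValueError("total must be non-zero")
    return used / total


memory = usage_fraction("15307MB/49152MB")
gpu_memory = usage_fraction("#0 22764MB/24576MB 0% Util")
print(f"memory: {memory:.1%}, GPU #0 memory: {gpu_memory:.1%}")
```

The regular expression searches within the cell rather than matching the whole string, so the same helper works for the GPU column, where the pair is surrounded by the device index and utilization.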

You can also retrieve the metrics using the REST API.

Docker inside dstack

Run configurations now have a new optional privileged property (equivalent to the
--privileged Docker CLI flag).
When it is set to true, the run container gets extended privileges, making it possible to use Docker inside
dstack.

To use Docker and Docker Compose within dstack, set the image property
to dstackai/dind. Additionally, you must invoke start-dockerd as the first command to start the Docker daemon.

type: task
name: misc-task-dind

image: dstackai/dind
privileged: true

commands:
  - start-dockerd
  - docker compose up

Dev environment example

type: dev-environment
name: vscode-dind

image: dstackai/dind
privileged: true
ide: vscode

init:
  - start-dockerd

See more examples in examples/misc/docker-compose.

Note

The privileged property is only supported by VM backends (all backends except runpod, vastai, and kubernetes).
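The note above can be expressed as a small pre-flight check before submitting a configuration with privileged: true. This is a sketch only: the function name and the idea of validating on the client side are assumptions, and the backend list is taken directly from the note.

```python
# Backends the note lists as NOT supporting the `privileged` property.
NON_VM_BACKENDS = frozenset({"runpod", "vastai", "kubernetes"})


def supports_privileged(backend: str) -> bool:
    """Return True if the backend is a VM backend per the note (hypothetical helper)."""
    return backend.strip().lower() not in NON_VM_BACKENDS


candidates = ["aws", "runpod", "gcp", "kubernetes"]
usable = [name for name in candidates if supports_privileged(name)]
print(usable)  # ['aws', 'gcp']
```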

What's changed

New contributors

Full changelog: 0.18.17...0.18.18

0.18.17

09 Oct 10:20

On-prem AMD GPU support

dstack now supports SSH fleets with AMD GPUs. Hosts must have Docker and the AMDGPU-DKMS kernel driver pre-installed (e.g. via the native package manager or the AMDGPU installer).

Elastic Fabric Adapter support

dstack now automatically enables AWS EFA if the instance type supports it; no extra configuration is needed. The following EFA-enabled instance types are supported: p5.48xlarge, p4d.24xlarge, g4dn.12xlarge, g4dn.16xlarge, g4dn.8xlarge, g4dn.metal, g5.12xlarge, g5.16xlarge, g5.24xlarge, g5.48xlarge, g5.8xlarge, g6.12xlarge, g6.16xlarge, g6.24xlarge, g6.48xlarge, g6.8xlarge, gr6.8xlarge.

Improved apply plan

Previously, dstack apply showed a plan only for run configurations. Now it shows a plan for all configuration types including fleets, volumes, and gateways. Here's a fleet plan showing configuration parameters and the offers that will be tried for provisioning:

$ dstack apply -f .dstack/confs/fleet.yaml
 Project        main                           
 User           admin                          
 Configuration  .dstack/confs/fleet.yaml       
 Type           fleet                          
 Fleet type     cloud                          
 Nodes          2                              
 Placement      cluster                        
 Backends       aws                            
 Resources      2..xCPU, 8GB.., 100GB.. (disk) 
 Spot policy    on-demand                      

 #  BACKEND  REGION        INSTANCE   RESOURCES                   SPOT  PRICE    
 1  aws      eu-west-1     m5.large   2xCPU, 8GB, 100.0GB (disk)  no    $0.107   
 2  aws      eu-central-1  m5.large   2xCPU, 8GB, 100.0GB (disk)  no    $0.115   
 3  aws      eu-west-1     c5.xlarge  4xCPU, 8GB, 100.0GB (disk)  no    $0.192   
    ...                                                                          
 Shown 3 of 82 offers, $40.9447 max

Fleet my-cluster-fleet does not exist yet.
Create the fleet? [y/n]: 

Volumes UI

Server administrators and regular users can now see volumes in the UI.

What's Changed

New Contributors

Full Changelog: 0.18.16...0.18.17

0.18.16

30 Sep 10:29
fccc8dd

New versioning policy

Starting with this release, dstack adopts a new versioning policy to provide better server and client backward compatibility and improve the upgrading experience. dstack continues to follow the semver versioning scheme ({major}.{minor}.{patch}) with the following principles:

  • Server backward compatibility is maintained across all minor and patch releases. Specific features can be removed, but removal is preceded by deprecation warnings for several minor releases. This means you can use older client versions with newer server versions.
  • Client backward compatibility is maintained across patch releases. A new minor release indicates that the release breaks client backward compatibility. This means you don't need to update the server when you update the client to a new patch release. Upgrading the client to a new minor version, however, requires upgrading the server too.

Previously, dstack never guaranteed client backward compatibility, so you always had to update the server when updating the client. The new versioning policy makes client and server upgrades more flexible.

Note: The new policy only takes effect after both the clients and the server are upgraded to 0.18.16. The 0.18.15 server still won't work with newer clients.
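The two principles and the note above can be read as a compatibility rule between a client version and a server version. The sketch below is one interpretation, not dstack's own check: the function names are hypothetical, pre-release tags such as rc1 are not handled, and two points are assumptions that the policy text does not spell out (no guarantee across major versions, and only identical versions treated as safe when either side is older than 0.18.16).

```python
from typing import Tuple

Version = Tuple[int, int, int]

# First release covered by the new policy, per the note above.
POLICY_START: Version = (0, 18, 16)


def parse_version(text: str) -> Version:
    """Parse "major.minor.patch"; pre-release suffixes are not supported here."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def client_works_with_server(client: str, server: str) -> bool:
    """Interpret the versioning policy as a predicate (hypothetical helper)."""
    c, s = parse_version(client), parse_version(server)
    if c == s:
        return True
    if c < POLICY_START or s < POLICY_START:
        # Assumption: before the policy applies, only matching versions are safe.
        return False
    if c[0] != s[0]:
        # Assumption: nothing is guaranteed across major versions.
        return False
    if c < s:
        # Older client, newer server: covered by server backward compatibility.
        return True
    # Newer client, older server: only patch-level differences are covered.
    return c[1] == s[1]


print(client_works_with_server("0.18.19", "0.18.18"))  # True
print(client_works_with_server("0.18.16", "0.18.15"))  # False
```

Under this reading, a newer patch-level client can talk to an older server of the same minor version, while a client from a newer minor version cannot.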

dstack attach

The CLI gets a new dstack attach command that allows attaching to a run. It establishes the SSH tunnel, forwards ports, and streams run logs in real time:

$ dstack attach silent-panther-1
Attached to run silent-panther-1 (replica=0 job=0)
Forwarded ports (local -> remote):
  - localhost:7860 -> 7860
To connect to the run via SSH, use `ssh silent-panther-1`.
Press Ctrl+C to detach...

This command is a replacement for dstack logs --attach with major improvements and bugfixes.

CloudWatch-related bugfixes

The release includes several important bugfixes for CloudWatchLogStorage. We strongly recommend upgrading the dstack server if it's configured to store logs in CloudWatch.

Deprecations

  • dstack logs --attach is deprecated in favor of dstack attach and may be removed in a future minor release.

What's Changed

Full Changelog: 0.18.15...0.18.16

0.18.15

25 Sep 10:56
c187166

Cluster placement groups

Instances of AWS cluster fleets are now provisioned into cluster placement groups for better connectivity. For example, when you create this fleet:

type: fleet
name: my-cluster-fleet
nodes: 4
placement: cluster
backends: [aws]

dstack will automatically create a cluster placement group and use it to provision the instances.

On-prem and VM-based fleets improvements

  • All available Nvidia driver capabilities are now requested by default, which makes it possible to run GPU workloads requiring OpenGL/Vulkan/RT/Video Codec SDK libraries. (#1714)
  • Automatic container cleanup. Previously, when the run completed, either successfully or due to an error, its container was not deleted, which led to ever-increasing storage consumption. Now, only the last stopped container is preserved and is available until the next run is completed. (#1706)

Major bug fixes

  • Fixed a bug where under some conditions logs wouldn't be uploaded to CloudWatch Logs due to size limits. (#1712)
  • Fixed a bug that prevented running services on on-prem instances. (#1716)
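To illustrate the CloudWatch size-limit fix listed above, the sketch below splits log messages into batches that respect a byte budget and an event count. It is not dstack's implementation. The default limits are assumptions based on the AWS PutLogEvents documentation as commonly cited (1,048,576 bytes per batch, counted as the UTF-8 message size plus 26 bytes per event, and 10,000 events per batch); check the current AWS documentation before relying on them.

```python
from typing import Iterable, Iterator, List

# Assumed PutLogEvents limits; verify against the current AWS documentation.
MAX_BATCH_BYTES = 1_048_576
MAX_BATCH_EVENTS = 10_000
EVENT_OVERHEAD_BYTES = 26


def batch_events(
    messages: Iterable[str],
    max_bytes: int = MAX_BATCH_BYTES,
    max_events: int = MAX_BATCH_EVENTS,
) -> Iterator[List[str]]:
    """Yield lists of messages that each fit within the given limits."""
    batch: List[str] = []
    batch_bytes = 0
    for message in messages:
        size = len(message.encode("utf-8")) + EVENT_OVERHEAD_BYTES
        if size > max_bytes:
            raise ValueError("a single event exceeds the batch size limit")
        # Start a new batch when adding this event would break either limit.
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_events):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(message)
        batch_bytes += size
    if batch:
        yield batch


# Small limits make the splitting visible: each event costs 10 + 26 = 36 bytes.
batches = list(batch_events(["a" * 10] * 5, max_bytes=80))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each yielded batch could then be sent in its own upload request, so no single request exceeds the assumed limits.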

Changelog

  • Fix cli connection issue with TPU by @Bihan in #1705
  • Rename --default to --yes and --no-default to --no in dstack config and dstack server by @peterschmidt85 in #1709
  • [CI] Fix shim/runner release versions by @un-def in #1704
  • Document run diagnostic logs by @r4victor in #1710
  • [shim] Add old container cleanup routine by @un-def in #1706
  • Write events to CloudWatch in batches by @un-def in #1712
  • [shim] Request all Nvidia driver capabilities by @un-def in #1714
  • Added showing dstack version on the UI by @olgenn in #1717
  • Add missing project SSH key to on-prem instances by @un-def in #1716
  • Simplify handling missing GatewayConfiguration by @jvstme in #1724
  • [shim] Fix container logs processing by @un-def in #1721
  • Support AWS placement groups for cluster fleets by @r4victor in #1725

Full Changelog: 0.18.14...0.18.15

0.18.15rc1

24 Sep 06:53
1ce1243
Pre-release

On-prem and VM-based fleets improvements

  • All available Nvidia driver capabilities are now requested by default, which makes it possible to run GPU workloads requiring OpenGL/Vulkan/RT/Video Codec SDK libraries.
  • Automatic container cleanup. Previously, when the run completed, either successfully or due to an error, its container was not deleted, which led to ever-increasing storage consumption. Now, only the last stopped container is preserved and is available until the next run is completed.

Major bug fixes

  • Fixed a bug where under some conditions logs wouldn't be uploaded to CloudWatch Logs due to size limits.

Changelog

  • [UX] Rename --default to --yes and --no-default to --no in dstack config and dstack server by @peterschmidt85 in #1709
  • Fix cli connection issue with TPU by @Bihan in #1705
  • Fix dstack-shim and dstack-runner release versions by @un-def in #1704
  • Request all Nvidia driver capabilities by @un-def in #1714
  • Add old container cleanup routine by @un-def in #1706
  • Write events to CloudWatch in batches by @un-def in #1712
  • [Docs] Document run diagnostic logs by @r4victor in #1710
  • [Docs] Added the server deployment guide, updated the README.md for the Docker Hub, fixed the scrolling issue by @peterschmidt85

Full changelog: 0.18.14...0.18.15rc1

0.18.14

18 Sep 09:40
854c812

Multi-replica server deployment

Previously, the dstack server only supported deploying a single instance (replica). With 0.18.14, you can now deploy multiple replicas, enabling high availability and zero-downtime updates.

Note

Multi-replica server deployment requires using Postgres instead of the default SQLite. To configure Postgres, set the DSTACK_DATABASE_URL environment variable.

Make sure to update to version 0.18.14 before configuring multiple replicas.

Major bug-fixes

  • [Bugfix] dstack init --git-identity doesn't accept backslashes in path on Windows by @un-def in #1686
  • [Bugfix] Use --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=X instead of --shm-size=X by @un-def in #1690
  • [Bugfix] dstack-shim is not updated when fleet is recreated by @un-def in #1698

Other

  • [Bugfix] Fix SSHAttach.reuse_ports_lock() when no grep matches by @un-def in #1700
  • [Bugfix] Fix logger exception on instance provisioning timeout by @un-def in #1697
  • [Internal] Add JobProvisioningData.base_backend by @r4victor in #1682
  • [Internal] Add Run.error by @r4victor in #1684
  • [Internal] Return server_version in /api/server/get_info by @r4victor in #1685
  • [Internal] Allow gateway to connect to replicated server by @jvstme in #1688
  • [Internal] Adjust gateway management for multiple server replicas by @r4victor in #1691
  • [Internal] Skip gateway update if gateway was updated recently by @r4victor in #1695
  • [Internal] Remove redundant logger.error by @r4victor in #1702

Full changelog: 0.18.13...0.18.14

0.18.13

11 Sep 14:24
f277126

Windows

You can now use the CLI on Windows (WSL 2 is not required).

Ensure that Git and OpenSSH are installed via Git for Windows.

During installation, select the Git from the command line and also from 3rd-party software option
(or Use Git and optional Unix tools from the Command Prompt), and the Use bundled OpenSSH option.

Spot policy

Previously, dev environments used the on-demand spot policy, while tasks and services used auto. With this update, we've changed the default spot policy to always be on-demand for all configurations. Users will now need to explicitly specify the spot policy if they want to use spot instances.
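The change in defaults described above can be summarized as a small lookup. This is an illustrative sketch only: the function name is hypothetical, the version tuple is a simplification, and the configuration type names (dev-environment, task, service) follow the type values used in the configuration examples in these notes.

```python
from typing import Tuple

# Release in which the default became on-demand for all configurations.
ON_DEMAND_DEFAULT_SINCE: Tuple[int, int, int] = (0, 18, 13)


def default_spot_policy(config_type: str, version: Tuple[int, int, int]) -> str:
    """Return the default spot policy described in the notes (hypothetical helper)."""
    if version >= ON_DEMAND_DEFAULT_SINCE:
        return "on-demand"
    # Before 0.18.13: dev environments used on-demand, tasks and services used auto.
    return "on-demand" if config_type == "dev-environment" else "auto"


print(default_spot_policy("task", (0, 18, 12)))  # auto
print(default_spot_policy("task", (0, 18, 13)))  # on-demand
```

A task or service that relied on the old auto default needs the spot policy set explicitly in its configuration to keep using spot instances after the upgrade.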

Troubleshooting

The documentation now includes a Troubleshooting guide with instructions on how to report issues.

Changelog

All commits: 0.18.12...0.18.13

0.18.12

04 Sep 12:15
1537163

Features

  • Added support for ECDSA and Ed25519 keys for on-prem fleets by @swsvc in #1641

Major bugfixes

  • Fixed the order of CloudWatch log events in the web interface by @un-def in #1613
  • Fixed a bug where CloudWatch log events might not be displayed in the web interface for old runs by @un-def in #1652
  • Prevent possible server freeze on SSH connections by @jvstme in #1627

Other changes

Full changelog: 0.18.11...0.18.12

0.18.12rc1

04 Sep 11:28
1537163
Pre-release

Features

  • Added support for ECDSA and Ed25519 keys for on-prem fleets by @swsvc in #1641

Major bugfixes

  • Fixed the order of CloudWatch log events in the web interface by @un-def in #1613
  • Fixed a bug where CloudWatch log events might not be displayed in the web interface for old runs by @un-def in #1652
  • Prevent possible server freeze on SSH connections by @jvstme in #1627

Other changes

Full changelog: 0.18.11...0.18.12rc1