Commit

- [Docs] Added python-api example (broken link)

peterschmidt85 committed Sep 12, 2023
1 parent be70782 commit 90b4191
Showing 2 changed files with 88 additions and 4 deletions.
84 changes: 84 additions & 0 deletions docs/blog/posts/python-api-preview.md
---
title: "dstack 0.10.7: An early preview of services"
date: 2023-09-12
description: "The 0.10.7 update introduces a new configuration type specifically for serving purposes."
slug: "python-api-preview"
categories:
- Releases
---

# An early preview of services

__The 0.10.7 update introduces a new configuration type for serving.__

Until now, `dstack` has supported `dev-environment` and `task` as configuration types. Even though `task`
may be used for basic serving use cases, it lacks crucial serving features. With the new update, we introduce
`service`, a dedicated configuration type for serving.

<!-- more -->

Consider the following example:

<div editor-title="text-generation-inference/serve.dstack.yml">

```yaml
type: task

image: ghcr.io/huggingface/text-generation-inference:0.9.3

ports:
- 8000

commands:
- text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code
```
</div>

When running it, the `dstack` CLI forwards traffic to `127.0.0.1:8000`.
This is convenient for development but unsuitable for production.
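Once the task is up, you can confirm that the CLI's local forwarding is active with a generic TCP probe (plain Python, not part of dstack; the host and port match the example above):

```python
import socket


def is_forwarded(host: str = "127.0.0.1", port: int = 8000) -> bool:
    """Return True if something is listening on the given local port."""
    with socket.socket() as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising
        return s.connect_ex((host, port)) == 0


print(is_forwarded())  # True only while the forwarded task is running
```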

In production, you need your endpoint available on the external network, preferably behind authentication
and a load balancer.

This is why we introduce the `service` configuration type.

<div editor-title="text-generation-inference/serve.dstack.yml">

```yaml
type: service

# The gateway address must currently be set manually (e.g. via secrets);
# this value is a placeholder
gateway: <your gateway address>

image: ghcr.io/huggingface/text-generation-inference:0.9.3

port: 8000

commands:
- text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code
```

</div>

As you can see, there are two differences compared to `task`:

1. The `gateway` property: the address of a special cloud instance that wraps the running service with a public
endpoint. Currently, you must specify it manually. In the future, `dstack` will assign it automatically.
2. The `port` property: A service must always configure one port on which it's running.
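The two requirements above can be sketched as a toy validation check. This is illustrative only, not dstack's actual validation logic; the dict mirrors the example configuration, and the gateway value is a hypothetical placeholder:

```python
def validate_service(config: dict) -> list[str]:
    """Toy check mirroring the two differences described above."""
    errors = []
    if config.get("type") != "service":
        errors.append("type must be 'service'")
    if "port" not in config:
        errors.append("a service must configure exactly one port")
    if "gateway" not in config:
        errors.append("the gateway address must currently be set manually")
    return errors


service_config = {
    "type": "service",
    "gateway": "<your gateway address>",  # hypothetical placeholder
    "image": "ghcr.io/huggingface/text-generation-inference:0.9.3",
    "port": 8000,
    "commands": [
        "text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code"
    ],
}

print(validate_service(service_config))  # → []
```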

When running, `dstack` forwards the traffic to the gateway, providing you with a public endpoint that you can use to
access the running service.
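The shape of the resulting public endpoint can be sketched as follows, assuming plain HTTP, the default gateway port 80, and a hypothetical gateway hostname:

```python
def endpoint_url(gateway_address: str, gateway_port: int = 80) -> str:
    """Build the public endpoint for a running service.

    Gateways currently serve plain HTTP on port 80 (no HTTPS yet)."""
    return f"http://{gateway_address}:{gateway_port}"


print(endpoint_url("my-gateway.example.com"))  # → http://my-gateway.example.com:80
```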

??? info "Existing limitations"
1. Currently, you must create a gateway manually using the `dstack gateway` command
and specify its address via YAML (e.g. using secrets). In the future, `dstack` will assign it automatically.
2. Gateways do not support HTTPS yet. When you run a service, its endpoint URL is `<the address of the gateway>:80`.
The port can be overridden via the port property: instead of `8000`, specify `<gateway port>:8000`.
3. Gateways do not provide authorization and auto-scaling. In the future, `dstack` will support them as well.
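The port-override syntax from limitation 2 could be parsed like this (a sketch, not dstack code):

```python
def parse_port(spec) -> tuple[int, int]:
    """Split a port spec into (gateway_port, service_port).

    Accepts either a bare service port (8000) or the override
    form "<gateway port>:8000" described above."""
    if isinstance(spec, int):
        return 80, spec  # the gateway defaults to port 80
    gateway_port, service_port = str(spec).split(":")
    return int(gateway_port), int(service_port)


print(parse_port(8000))        # → (80, 8000)
print(parse_port("443:8000"))  # → (443, 8000)
```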

This initial support for services is the first step towards providing multi-cloud and cost-effective inference.

!!! info "Give it a try and share feedback"
Even though the current support is limited in many ways, we encourage you to give it a try and share your feedback with us!

More details on how to use services can be found in a [dedicated guide](../../docs/guides/services.md) in our docs.
Questions and requests for help are
very much welcome in our [Slack chat](https://join.slack.com/t/dstackai/shared_invite/zt-xdnsytie-D4qU9BvJP8vkbkHXdi6clQ).
8 changes: 4 additions & 4 deletions docs/examples/python-api.md
```diff
@@ -4,8 +4,8 @@ title: Deploying LLMs with Python API
 
 # Deploying LLMs with Python API
 
-The [Python API](../../docs/docs/reference/api/python/index.md) of `dstack` can be used to run tasks
-and services programmatically.
+The [Python API](../docs/reference/api/python/index.md) of `dstack` can be used to run
+[tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md) programmatically.
 
 Below is an example of a Streamlit app that uses `dstack`'s API to deploy a quantized version of Llama 2 to your cloud
 with a simple click of a button.
@@ -18,10 +18,10 @@ with a simple click of a button.
 
 To get started, create an instance of `dstack.Client` and use its methods to submit and manage runs.
 
-With `dstack.Client`, you can run [tasks](../../docs/guides/tasks.md) and [services](../../docs/guides/services.md). Running a task allows you to programmatically access its ports and
+With `dstack.Client`, you can run [tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md). Running a task allows you to programmatically access its ports and
 forward traffic to your local machine. For example, if you run an LLM as a task, you can access it on localhost.
 
-For more details on the Python API, please refer to its [reference](../../docs/docs/reference/api/python/index.md).
+For more details on the Python API, please refer to its [reference](../docs/docs/reference/api/python/index.md).
 
 ## Prerequisites
```
