Commit bf51ec9

Added docs on the new completion service (reflecting the changes in `0.12.3rc1`)

peterschmidt85 committed Nov 14, 2023
1 parent 288efb0 commit bf51ec9
Showing 5 changed files with 121 additions and 30 deletions.
45 changes: 22 additions & 23 deletions docs/docs/guides/fine-tuning.md
@@ -1,11 +1,7 @@
# Fine-tuning

-For fine-tuning an LLM with dstack's API, specify a model name, HuggingFace dataset, and training parameters.
-
-You specify a model name, dataset on HuggingFace, and training parameters.
-`dstack` takes care of the training and pushes it to the HuggingFace hub upon completion.
-
-You can use any cloud GPU provider(s) and experiment tracker of your choice.
+For fine-tuning an LLM with `dstack`'s API, specify a model, dataset, training parameters,
+and required compute resources. `dstack` takes care of everything else.

??? info "Prerequisites"
To use the fine-tuning API, ensure you have the latest version:
@@ -39,22 +35,24 @@ and various [training parameters](../../docs/reference/api/python/index.md#dstac
```python
from dstack.api import FineTuningTask

-task = FineTuningTask(model_name="NousResearch/Llama-2-13b-hf",
-                      dataset_name="peterschmidt85/samsum",
-                      env={
-                          "HUGGING_FACE_HUB_TOKEN": "...",
-                      },
-                      num_train_epochs=2)
+task = FineTuningTask(
+    model_name="NousResearch/Llama-2-13b-hf",
+    dataset_name="peterschmidt85/samsum",
+    env={
+        "HUGGING_FACE_HUB_TOKEN": "...",
+    },
+    num_train_epochs=2
+)
```

!!! info "Dataset format"
For the SFT fine-tuning method, the dataset should contain a `"text"` column with completions following the prompt format
of the corresponding model.
Check the [peterschmidt85/samsum](https://huggingface.co/datasets/peterschmidt85/samsum) example.

-## Submit the task
+## Run the task

-When submitting a task, you can configure resources, and many [other options](../../docs/reference/api/python/index.md#dstack.api.RunCollection.submit).
+When running a task, you can configure resources, and many [other options](../../docs/reference/api/python/index.md#dstack.api.RunCollection.submit).

```python
from dstack.api import Resources, GPU
@@ -83,15 +81,16 @@ including getting a list of runs, stopping a given run, etc.
To track experiment metrics, specify `report_to` and related authentication environment variables.

```python
-task = FineTuningTask(model_name="NousResearch/Llama-2-13b-hf",
-                      dataset_name="peterschmidt85/samsum",
-                      report_to="wandb",
-                      env={
-                          "HUGGING_FACE_HUB_TOKEN": "...",
-                          "WANDB_API_KEY": "...",
-                      },
-                      num_train_epochs=2
-                      )
+task = FineTuningTask(
+    model_name="NousResearch/Llama-2-13b-hf",
+    dataset_name="peterschmidt85/samsum",
+    report_to="wandb",
+    env={
+        "HUGGING_FACE_HUB_TOKEN": "...",
+        "WANDB_API_KEY": "...",
+    },
+    num_train_epochs=2
+)
```

Currently, the API supports `"tensorboard"` and `"wandb"`.
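
The dataset-format note earlier in this file is terse. Below is a minimal sketch of building the `"text"` column it asks for, assuming the Llama-2 `[INST]` prompt template and a samsum-style dialogue/summary schema; the exact template your model expects may differ.

```python
# A minimal sketch, assuming the Llama-2 [INST] prompt template and a
# samsum-style schema; verify the prompt format of the model you fine-tune.
from datasets import Dataset

rows = [
    {
        "dialogue": "Amanda: I baked cookies. Do you want some?\nJerry: Sure!",
        "summary": "Amanda baked cookies and will bring Jerry some.",
    },
]

def to_text(row):
    # Fold prompt and completion into the single "text" column the guide expects.
    return {
        "text": f"<s>[INST] Summarize this dialog:\n{row['dialogue']} [/INST] "
                f"{row['summary']} </s>"
    }

dataset = Dataset.from_list(rows).map(to_text)
print(dataset[0]["text"])
```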
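The `client.runs.submit` call for the fine-tuning task is collapsed in this diff. A minimal sketch, assuming the same submit pattern as the text-generation example later in this commit; the run name and GPU size are illustrative, not taken from the collapsed lines.

```python
# A sketch only: mirrors the submit pattern shown in text-generation.md below.
# The run name and GPU size are illustrative.
from dstack.api import Client, ClientError, FineTuningTask, GPU, Resources

task = FineTuningTask(
    model_name="NousResearch/Llama-2-13b-hf",
    dataset_name="peterschmidt85/samsum",
    env={"HUGGING_FACE_HUB_TOKEN": "..."},
    num_train_epochs=2,
)

try:
    client = Client.from_config()
except ClientError:
    raise SystemExit("Can't connect to the server")

run = client.runs.submit(
    run_name="llama-2-13b-samsum",  # (Optional) If unset, it's chosen randomly
    configuration=task,
    resources=Resources(gpu=GPU(memory="24GB")),
)
```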
10 changes: 7 additions & 3 deletions docs/docs/guides/services.md
@@ -5,11 +5,12 @@ Provide the commands, port, and choose the Python version or a Docker image.

`dstack` handles the deployment on configured cloud GPU provider(s) with the necessary resources.

-## Prerequisites
+??? info "Prerequisites"

-If you're using the open-source server, you first have to set up a gateway.
+    If you're using the open-source server, you first have to set up a gateway.

-### Set up a gateway
+??? info "Set up a gateway"
    For example, if your domain is `example.com`, go ahead and run the
    `dstack gateway create` command:

@@ -93,6 +94,9 @@ Serving HTTP on https://yellow-cat-1.example.com ...

</div>

+Once the service is deployed, its endpoint will be available at
+`https://<run-name>.<domain-name>` (using the domain set up for the gateway).

!!! info "Run options"
The `dstack run` command allows you to use `--gpu` to request GPUs (e.g. `--gpu A100` or `--gpu 80GB` or `--gpu A100:4`, etc.),
and many other options (incl. spot instances, max price, max duration, retry policy, etc.).
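Given the endpoint pattern above, here is a hedged sketch of calling a deployed service from Python. The `/generate` path and JSON payload assume the text-generation service added in this commit, and `requests` is a third-party dependency.

```python
# A sketch, assuming the text-generation service from this commit;
# the hostname matches the yellow-cat-1 example above.
import requests

resp = requests.post(
    "https://yellow-cat-1.example.com/generate",  # https://<run-name>.<domain-name>
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```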
87 changes: 87 additions & 0 deletions docs/docs/guides/text-generation.md
@@ -0,0 +1,87 @@
# Text generation

For deploying an LLM with `dstack`'s API, specify a model, quantization parameters,
and required compute resources. `dstack` takes care of everything else.

??? info "Prerequisites"
If you're using the open-source server, before using the model serving API, make sure to
[set up a gateway](services.md#set-up-a-gateway).

If you're using the cloud version of `dstack`, it's set up automatically for you.

Also, to use the model serving API, ensure you have the latest version:

<div class="termy">

```shell
$ pip install "dstack[all]==0.12.3rc1"
```

</div>

## Create a client

First, you connect to `dstack`:

```python
from dstack.api import Client, ClientError

try:
client = Client.from_config()
except ClientError:
print("Can't connect to the server")
```

## Create a service

Then, you create a completion service, specifying the model and quantization parameters.

```python
from dstack.api import CompletionService

service = CompletionService(
model_name="TheBloke/CodeLlama-34B-GPTQ",
quantize="gptq"
)
```

## Run the service

When running a service, you can configure resources, and many [other options](../../docs/reference/api/python/index.md#dstack.api.RunCollection.submit).

```python
from dstack.api import Resources, GPU

run = client.runs.submit(
run_name="codellama-34b-gptq", # (Optional) If unset, its chosen randomly
configuration=service,
resources=Resources(gpu=GPU(memory="24GB")),
)
```

## Access the endpoint

Once the model is deployed, its endpoint will be available at
`https://<run-name>.<domain-name>` (using the domain set up for the gateway).

<div class="termy">

```shell
$ curl https://<run-name>.<domain-name>/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens": 20}}' \
-H 'Content-Type: application/json'
```

</div>

> The endpoint supports streaming, continuous batching, tensor parallelism, etc.
> The OpenAPI documentation for the endpoint can be found at `https://<run-name>.<domain-name>/docs`.

[//]: # (TODO: LangChain, own client)
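
Since the note above mentions streaming, here is a hedged sketch of consuming the stream. The `/generate_stream` path and the server-sent-event format are assumptions based on common text-generation-inference conventions; they are not documented in this guide.

```python
# An assumption-laden sketch: /generate_stream and the "data: {...}"
# server-sent-event format follow text-generation-inference conventions.
import json
import requests

with requests.post(
    "https://yellow-cat-1.example.com/generate_stream",
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data:"):
            event = json.loads(line[len(b"data:"):])
            print(event["token"]["text"], end="", flush=True)
```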

## Manage runs

You can use the instance of [`dstack.api.Client`](../../docs/reference/api/python/index.md#dstack.api.Client) to manage your runs,
including getting a list of runs, stopping a given run, etc.
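
A sketch of what that management might look like; `runs.list()`, `runs.get()`, and `run.stop()` are assumed from the `dstack.api` reference linked above rather than shown in this guide.

```python
# A sketch only: method names are assumed from the dstack.api reference,
# not demonstrated in this guide's text.
from dstack.api import Client

client = Client.from_config()

for run in client.runs.list():  # enumerate runs known to the server
    print(run.name, run.status)

run = client.runs.get("codellama-34b-gptq")  # look up the service run by name
if run is not None:
    run.stop()  # stop the run and release its resources
```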
8 changes: 4 additions & 4 deletions docs/docs/index.md
@@ -120,7 +120,7 @@ client = Client.from_config()
model_name="NousResearch/Llama-2-13b-hf",
dataset_name="peterschmidt85/samsum",
env={
"WANDB_API_KEY": "..."
"HUGGING_FACE_HUB_TOKEN": "..."
},
num_train_epochs=2
)
@@ -135,7 +135,7 @@ client = Client.from_config()

> Go to [Fine-tuning](guides/fine-tuning.md) to learn more.

=== "Model serving"
=== "Text generation"

```python
from dstack.api import Client, GPU, CompletionService, Resources
@@ -150,13 +150,13 @@ client = Client.from_config()
# Deploy the model as a public endpoint

run = client.runs.submit(
run_name = "llama-2-13b-hf", # If not set, assigned randomly
run_name = "codellama-34b-gptq", # If not set, assigned randomly
configuration=service,
resources=Resources(gpu=GPU(memory="24GB"))
)
```

-[//]: # ( > Go to [Text generation]&#40;guides/text-generation.md&#41; to learn more.)
+> Go to [Text generation](guides/text-generation.md) to learn more.

## Using the CLI

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -163,6 +163,7 @@ nav:
- Server configuration: docs/configuration/server.md
- Guides:
- Fine-tuning: docs/guides/fine-tuning.md
+- Text generation: docs/guides/text-generation.md
- Dev environments: docs/guides/dev-environments.md
- Tasks: docs/guides/tasks.md
- Services: docs/guides/services.md
