- Updated the interface of the fine-tuning API
- Added the interface for completion service
peterschmidt85 committed Nov 14, 2023
1 parent fae7937 commit 288efb0
Showing 11 changed files with 283 additions and 75 deletions.
4 changes: 2 additions & 2 deletions docs/assets/stylesheets/landing.css
@@ -269,7 +269,7 @@
align-items: center;
justify-content: center;
gap: 14px;
- padding: 18px 5px 3px;
+ padding: 19px 5px 1px;
}

.tx-landing__integrations .logo-xlarge {
@@ -341,7 +341,7 @@
}

.tx-landing__major_feature {
- font-size: 1.1em;
+ font-size: 1.05em;
margin-top: 5em;
}

14 changes: 12 additions & 2 deletions docs/docs/configuration/server.md
@@ -299,11 +299,21 @@ projects:
</div>

!!! info "NOTE:"
-     The `vastai` backend supports on-demand instances only. Spot instance support coming soon.
+     To use Vast AI, ensure you have the latest version:
+
+     <div class="termy">
+
+     ```shell
+     $ pip install "dstack[all]==0.12.3rc1"
+     ```
+
+     </div>
+
+     Also, the `vastai` backend supports on-demand instances only. Spot instance support coming soon.

## Cloud regions

- In addition to credentials, each cloud (except TensorDock) optionally allows for region configuration.
+ In addition to credentials, each cloud (except TensorDock and Vast AI) optionally allows for region configuration.

Example:

49 changes: 30 additions & 19 deletions docs/docs/guides/fine-tuning.md
@@ -1,13 +1,23 @@
# Fine-tuning

- If you want to fine-tune an LLM based on a given dataset, consider using
- `dstack`'s finetuning API.
+ For fine-tuning an LLM with dstack's API, specify a model name, HuggingFace dataset, and training parameters.

- You specify a model name, dataset on HuggingFace, and training parameters.
`dstack` takes care of the training and pushes it to the HuggingFace hub upon completion.

You can use any cloud GPU provider(s) and experiment tracker of your choice.

+ ??? info "Prerequisites"
+     To use the fine-tuning API, ensure you have the latest version:
+
+     <div class="termy">
+
+     ```shell
+     $ pip install "dstack[all]==0.12.3rc1"
+     ```
+
+     </div>

## Create a client

First, you connect to `dstack`:
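The connection snippet itself is collapsed in this diff view. Based on the `except ClientError:` context visible in the next hunk header, it presumably looks roughly like this (a sketch, not the file's exact contents; `ClientError` being importable from `dstack.api` is an assumption):

```python
from dstack.api import Client, ClientError  # ClientError location is an assumption

try:
    # Read server address, user token, and project from the local dstack config
    client = Client.from_config()
except ClientError:
    print("Can't connect to the dstack server")
```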
@@ -24,15 +34,17 @@ except ClientError:
## Create a task

Then, you create a fine-tuning task, specifying the model and dataset,
- and various [training parameters](../../docs/reference/api/python/index.md#dstack.api.finetuning.SFTFineTuningTask).
+ and various [training parameters](../../docs/reference/api/python/index.md#dstack.api.FineTuningTask).

```python
- from dstack.api.finetuning import SFTFineTuningTask
-
- task = SFTFineTuningTask(hf_model_name="NousResearch/Llama-2-13b-hf",
-                          hf_dataset_name="peterschmidt85/samsum",
-                          hf_token="...",
-                          num_train_epochs=2)
+ from dstack.api import FineTuningTask
+
+ task = FineTuningTask(model_name="NousResearch/Llama-2-13b-hf",
+                       dataset_name="peterschmidt85/samsum",
+                       env={
+                           "HUGGING_FACE_HUB_TOKEN": "...",
+                       },
+                       num_train_epochs=2)
```
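For context, the updated `docs/docs/index.md` in this same commit shows how such a task is submitted through the client's runs API; a minimal end-to-end sketch along those lines (the resource request is illustrative, not a recommendation):

```python
from dstack.api import Client, FineTuningTask, GPU, Resources

client = Client.from_config()

task = FineTuningTask(model_name="NousResearch/Llama-2-13b-hf",
                      dataset_name="peterschmidt85/samsum",
                      env={"HUGGING_FACE_HUB_TOKEN": "..."},
                      num_train_epochs=2)

# dstack provisions a GPU, runs the training, and pushes the result to the HF hub
run = client.runs.submit(
    configuration=task,
    resources=Resources(gpu=GPU(memory="24GB")),  # illustrative GPU requirement
)
```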

!!! info "Dataset format"
@@ -71,16 +83,15 @@ including getting a list of runs, stopping a given run, etc.
To track experiment metrics, specify `report_to` and related authentication environment variables.

```python
- task = SFTFineTuningTask(hf_model_name="NousResearch/Llama-2-13b-hf",
-                          hf_dataset_name="peterschmidt85/samsum",
-                          hf_token="...",
-                          report_to="wandb",
-                          env={
-                              "WANDB_API_KEY": "...",
-                              "WANDB_PROJECT": "...",
-                          },
-                          num_train_epochs=2
-                          )
+ task = FineTuningTask(model_name="NousResearch/Llama-2-13b-hf",
+                       dataset_name="peterschmidt85/samsum",
+                       report_to="wandb",
+                       env={
+                           "HUGGING_FACE_HUB_TOKEN": "...",
+                           "WANDB_API_KEY": "...",
+                       },
+                       num_train_epochs=2
+                       )
```

Currently, the API supports `"tensorboard"` and `"wandb"`.
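By analogy with the `wandb` example above, the `"tensorboard"` variant would presumably differ only in the tracker-specific settings (a hedged sketch; TensorBoard needs no API key, so only the HF token remains in `env`):

```python
from dstack.api import FineTuningTask

# Same task, reporting metrics to TensorBoard instead of Weights & Biases
task = FineTuningTask(model_name="NousResearch/Llama-2-13b-hf",
                      dataset_name="peterschmidt85/samsum",
                      report_to="tensorboard",
                      env={
                          "HUGGING_FACE_HUB_TOKEN": "...",
                      },
                      num_train_epochs=2)
```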
41 changes: 33 additions & 8 deletions docs/docs/index.md
@@ -11,7 +11,8 @@ or use the cloud version (which provides GPU out of the box).

??? info "Open-source"

-     If you wish to use `dstack` with your own cloud accounts, you can set up the open-source server.
+     If you wish to use `dstack` with your own cloud accounts, you can do it
+     by using the open-source server.

### Install the server

@@ -74,7 +75,7 @@

??? info "GPU cloud"

-     If you want `dstack` to provide cloud GPU,
+     If you want to use the cloud version of `dstack`,
<a href="#" data-tally-open="w7K17R">sign up</a>, and configure the client
with server address, user token, and project name using `dstack config`.

@@ -111,15 +112,16 @@ client = Client.from_config()
=== "Fine-tuning"

```python
- from dstack.api import Resources, GPU
- from dstack.api.finetuning import SFTFineTuningTask
+ from dstack.api import FineTuningTask, Resources, GPU

  # Pass a model, dataset, and training params

- task = SFTFineTuningTask(
-     hf_model_name="NousResearch/Llama-2-13b-hf",
-     hf_dataset_name="peterschmidt85/samsum",
-     hf_token="...",
+ task = FineTuningTask(
+     model_name="NousResearch/Llama-2-13b-hf",
+     dataset_name="peterschmidt85/samsum",
+     env={
+         "WANDB_API_KEY": "..."
+     },
      num_train_epochs=2
  )

@@ -133,6 +135,29 @@

> Go to [Fine-tuning](guides/fine-tuning.md) to learn more.

+ === "Model serving"
+
+     ```python
+     from dstack.api import Client, GPU, CompletionService, Resources
+
+     # Pass a model and quantization params
+
+     service = CompletionService(
+         model_name="TheBloke/CodeLlama-34B-GPTQ",
+         quantize="gptq"
+     )
+
+     # Deploy the model as a public endpoint
+
+     run = client.runs.submit(
+         run_name="llama-2-13b-hf",  # If not set, assigned randomly
+         configuration=service,
+         resources=Resources(gpu=GPU(memory="24GB"))
+     )
+     ```
+
+ [//]: # ( > Go to [Text generation]&#40;guides/text-generation.md&#41; to learn more.)
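The fine-tuning guide changed in this commit mentions that the runs API also covers getting a list of runs and stopping a given run. By analogy, a deployed service could presumably be managed through the same client (a sketch; method names such as `runs.list`, `runs.get`, and `run.stop` are assumptions and may differ in the actual API):

```python
from dstack.api import Client

client = Client.from_config()

# Inspect the project's runs (assumed: runs.list returns run handles)
for run in client.runs.list():
    print(run.name, run.status)

# Stop the service submitted above (assumed: runs.get / run.stop exist)
run = client.runs.get("llama-2-13b-hf")
if run is not None:
    run.stop()
```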

## Using the CLI

The CLI allows you to define configurations (what you want to run) as YAML files and run them using the `dstack run`
20 changes: 9 additions & 11 deletions docs/docs/reference/api/python/index.md
@@ -30,6 +30,15 @@ The Python API allows for running tasks, services, and managing runs programmatically
show_root_toc_entry: false
heading_level: 4

+ ### `dstack.api.FineTuningTask` { #dstack.api.FineTuningTask data-toc-label="FineTuningTask" }
+
+ ::: dstack.api.FineTuningTask
+     options:
+       show_bases: false
+       show_root_heading: false
+       show_root_toc_entry: false
+       heading_level: 4

### `dstack.api.Run` { #dstack.api.Run data-toc-label="Run" }

::: dstack.api.Run
@@ -63,17 +72,6 @@ The Python API allows for running tasks, services, and managing runs programmatically
show_root_toc_entry: false
heading_level: 4

- ## `dstack.api.finetuning` { #dstack.api.finetuning data-toc-label="dstack.api.finetuning" }
-
- ### `dstack.api.finetuning.SFTFineTuningTask` { #dstack.api.finetuning.SFTFineTuningTask data-toc-label="SFTFineTuningTask" }
-
- ::: dstack.api.finetuning.SFTFineTuningTask
-     options:
-       show_bases: false
-       show_root_heading: false
-       show_root_toc_entry: false
-       heading_level: 4
-
<style>
.doc-heading .highlight {
/* TODO pick color */
