- Added terms and privacy pages

- Updated the `fine-tuning` guide - Minor changes to the landing page
dstackai · Nov 17, 2023 · 34eedc9 · 34eedc9
1 parent 6f9ddc9
commit 34eedc9
Show file tree

Hide file tree

Showing 12 changed files with 667 additions and 81 deletions.
diff --git a/docs/assets/stylesheets/extra.css b/docs/assets/stylesheets/extra.css
@@ -796,14 +796,14 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
 }
 
 .md-typeset :where(ul) > li:before {
-    background-color: #d1d5db;
+    background-color: rgba(0,0,0,87);
     border-radius: 50%;
     content: "";
-    height: 0.375em;
+    height: 0.48em;
+    width: 0.48em;
     left: 0.25em;
     position: absolute;
     top: 0.6875em;
-    width: 0.375em;
 }
 
 .md-typeset :where(ol) > li:before {

diff --git a/docs/assets/stylesheets/landing.css b/docs/assets/stylesheets/landing.css
@@ -187,6 +187,28 @@
     font-size: 20px;
 }
 
+.tx-container .md-button {
+    vertical-align: middle;
+}
+
+.tx-container .md-button--primary:hover {
+    transform: translateY(-2px);
+    transition: opacity .2s ease,transform .2s ease;
+}
+
+.tx-container .md-button .icon {
+    display: inline-block;
+    position: relative;
+    width: 15px;
+    height: 15px;
+    margin-left: 7px;
+    transition: opacity .2s ease,transform .2s ease;
+}
+
+.tx-container .md-button--primary:hover .icon {
+    transform: translateX(3px)
+}
+
 .md-header__buttons .md-button--primary,
 .tx-container .md-button--primary {
     background: -webkit-linear-gradient(45deg, #002aff, #002aff, #e165fe);
@@ -548,11 +570,18 @@
 
 .plans_card__subtitle {
     margin-bottom: 1.4rem;
-    font-size: 1.0em;
+    font-size: 0.98em;
     line-height: 1.44;
     color: #696F86;
 }
 
+.plans_card__buttons_subtitle {
+    margin-top: 10px;
+    margin-left: 10px;
+    color: #202128;
+    font-size: 0.7rem;
+}
+
 .plans_card__services {
     display: flex;
     flex-wrap: wrap;

diff --git a/docs/docs/guides/fine-tuning.md b/docs/docs/guides/fine-tuning.md
@@ -1,7 +1,7 @@
 # Fine-tuning
 
 For fine-tuning an LLM with `dstack`'s API, specify a model, dataset, training parameters,
-and required compute resources. `dstack` takes care of everything else.
+and required compute resources. The API takes care of everything else.
 
 ??? info "Prerequisites"
     To use the fine-tuning API, ensure you have the latest version:
@@ -14,17 +14,36 @@ and required compute resources. `dstack` takes care of everything else.
 
     </div>
 
+> The API currently supports only supervised fine-tuning (SFT). Support for DPO and RLHF is coming soon.
+
+## Prepare a dataset
+
+The dataset should contain a `"text"` column with completions following the prompt format
+of the corresponding model. Check the [example](https://huggingface.co/datasets/peterschmidt85/samsum)
+(for fine-tuning Llama 2).
+
+> Once the dataset is prepared, it must be [uploaded](https://huggingface.co/docs/datasets/upload_dataset) to Hugging Face.
+
+??? info "Uploading a dataset"
+    Here's an example of how to upload a dataset programmatically:
+
+    ```python
+    import pandas as pd
+    from datasets import Dataset
+
+    df = pd.read_json("samsum.jsonl", lines=True)
+    dataset = Dataset.from_pandas(df)
+    dataset.push_to_hub("peterschmidt85/samsum")
+    ```
+
 ## Create a client
 
 First, you connect to `dstack`:
 
 ```python
-from dstack.api import Client, ClientError
+from dstack.api import Client
 
-try:
-    client = Client.from_config()
-except ClientError:
-    print("Can't connect to the server")
+client = Client.from_config()
 ```
 
 ## Create a task
@@ -41,15 +60,12 @@ task = FineTuningTask(
     env={
         "HUGGING_FACE_HUB_TOKEN": "...",
     },
-    num_train_epochs=2
+    num_train_epochs=2,
+    max_seq_length=1024,
+    per_device_train_batch_size=2,
 )
 ```
 
-!!! info "Dataset format"
-    For the SFT fine-tuning method, the dataset should contain a `"text"` column with completions following the prompt format
-    of the corresponding model.
-    Check the [peterschmidt85/samsum](https://huggingface.co/datasets/peterschmidt85/samsum) example. 
-
 ## Run the task
 
 When running a task, you can configure resources, and many [other options](../../docs/reference/api/python/index.md#dstack.api.RunCollection.submit).
@@ -64,19 +80,22 @@ run = client.runs.submit(
 )
 ```
 
-!!! info "Fine-tuning methods"
-    The API currently supports only SFT, with support for DPO and other methods coming soon.
+!!! info "GPU memory"
+    The API defaults to using QLoRA based on the provided 
+    [training parameters](../../docs/reference/api/python/index.md#dstack.api.FineTuningTask).
+    When specifying GPU memory, consider both the model size and the specified batch size.
+    After a few attempts, you'll discover the best configuration.
 
-When the training is done, `dstack` pushes the final model to the Hugging Face hub.
+When the training is done, the API pushes the final model to the Hugging Face hub.
 
 ![](../../assets/images/dstack-finetuning-hf.png){ width=800 }
 
 ## Manage runs
 
-You can use the instance of [`dstack.api.Client`](../../docs/reference/api/python/index.md#dstack.api.Client) to manage your runs, 
-including getting a list of runs, stopping a given run, etc.
+You can manage runs using [API](../../docs/reference/api/python/index.md#dstack.api.Client),
+the [CLI](../../docs/reference/cli/index.md), or the user interface.
 
-## Track experiments
+## Track metrics
 
 To track experiment metrics, specify `report_to` and related authentication environment variables.
 
@@ -97,5 +116,13 @@ Currently, the API supports `"tensorboard"` and `"wandb"`.
 
 ![](../../assets/images/dstack-finetuning-wandb.png){ width=800 }
 
-[//]: # (TODO: Example)
-[//]: # (TODO: Next steps)
+[//]: # (TODO: Examples - Llama 2, Mistral, etc)
+
+## What's next?
+
+- Once the model is trained, proceed to [deploy](text-generation.md) it as an endpoint.
+  The deployed endpoint can be used from your apps directly or via LangChain.
+- The source code of the fine-tuning task is available
+  at [GitHub](https://github.com/dstackai/dstack/tree/master/src/dstack/api/_public/huggingface/finetuning/sft).
+  If you prefer using a custom script, feel free to do so using [dev environments](dev-environments.md) and 
+  [tasks](tasks.md).
diff --git a/docs/docs/index.md b/docs/docs/index.md
@@ -73,7 +73,7 @@ or use the cloud version (which provides GPU out of the box).
 
     The client configuration is stored via `~/.dstack/config.yml`.
 
-??? info "GPU cloud"
+??? info "dstack Cloud"
 
     If you want to use the cloud version of `dstack`, 
     <a href="#" data-tally-open="w7K17R">sign up</a>, and configure the client 

diff --git a/docs/docs/reference/cli/index.md b/docs/docs/reference/cli/index.md
@@ -1,6 +1,8 @@
 # CLI
 
-## dstack server
+## Commands
+
+### dstack server
 
 This command starts the `dstack` server.
 
@@ -13,24 +15,9 @@ $ dstack server --help
 
 </div>
 
-### Environment variables
-
-| Name                              | Description                                   | Default            |
-|-----------------------------------|-----------------------------------------------|--------------------|
-| `DSTACK_DEFAULT_CREDS_DISABLED`   | Disables default credentials detection if set | `None`             |
-| `DSTACK_LOCAL_BACKEND_ENABLED`    | Enables local backend for debug if set        | `None`             |
-| `DSTACK_RUNNER_VERSION`           | Sets exact runner version for debug           | `latest`           |
-| `DSTACK_SERVER_ADMIN_TOKEN`       | Has the same effect as `--token`              | `None`             |
-| `DSTACK_SERVER_DIR`               | Sets path to store data and server configs    | `~/.dstack/server` |
-| `DSTACK_SERVER_HOST`              | Has the same effect as `--host`               | `127.0.0.1`        |
-| `DSTACK_SERVER_LOG_LEVEL`         | Has the same effect as `--log-level`          | `WARNING`          |
-| `DSTACK_SERVER_PORT`              | Has the same effect as `--port`               | `3000`             |
-| `DSTACK_SERVER_ROOT_LOG_LEVEL`    | Sets root logger log level                    | `ERROR`            |
-| `DSTACK_SERVER_UVICORN_LOG_LEVEL` | Sets uvicorn logger log level                 | `ERROR`            |
-
 [//]: # (DSTACK_SERVER_ENVIRONMENT, DSTACK_SERVER_CONFIG_DISABLED, DSTACK_SENTRY_DSN, DSTACK_SENTRY_TRACES_SAMPLE_RATE, DSTACK_SERVER_BUCKET_REGION, DSTACK_SERVER_BUCKET, DSTACK_ALEMBIC_MIGRATIONS_LOCATION)
 
-## dstack init
+### dstack init
 
 This command initializes the current folder as a repo.
 
@@ -54,7 +41,7 @@ $ dstack init --help
     By default, this command generates an SSH key that will be used for port forwarding and SSH access to running workloads. 
     You can override this key via `--ssh-identity`.
 
-## dstack run
+### dstack run
 
 This command runs a given configuration.
 
@@ -68,11 +55,11 @@ $ dstack run . --help
 </div>
 
 ??? info ".gitignore"
-    When running anything via CLI, `dstack` uses the exact version of code from your project directory. 
+When running anything via CLI, `dstack` uses the exact version of code from your project directory.
 
     If there are large files, consider creating a `.gitignore` file to exclude them for better performance.
 
-## dstack ps
+### dstack ps
 
 This command shows the status of runs.
 
@@ -85,7 +72,7 @@ $ dstack ps --help
 
 </div>
 
-## dstack stop
+### dstack stop
 
 This command stops run(s) within the current repository.
 
@@ -98,7 +85,7 @@ $ dstack stop --help
 
 </div>
 
-## dstack logs
+### dstack logs
 
 This command shows the output of a given run within the current repository.
 
@@ -111,12 +98,12 @@ $ dstack logs --help
 
 </div>
 
-## dstack config
+### dstack config
 
-Both the CLI and API need to be configured with the server address, user token, and project name 
-via `~/.dstack/config.yml`. 
+Both the CLI and API need to be configured with the server address, user token, and project name
+via `~/.dstack/config.yml`.
 
-At startup, the server automatically configures CLI and API with the server address, user token, and 
+At startup, the server automatically configures CLI and API with the server address, user token, and
 the default project name (`main`). This configuration is stored via `~/.dstack/config.yml`.
 
 To use CLI and API on different machines or projects, use the `dstack config` command.
@@ -130,7 +117,7 @@ $ dstack config --help
 
 </div>
 
-## dstack gateway
+### dstack gateway
 
 A gateway is required for running services.
 
@@ -187,8 +174,19 @@ $ dstack gateway update --help
 </div>
 
 ## Environment variables
-| Name                   | Description                        | Default    |
-|------------------------|------------------------------------|------------|
-| `DSTACK_CLI_LOG_LEVEL` | Configures CLI logging level       | `CRITICAL` |
-| `DSTACK_PROFILE`       | Has the same effect as `--profile` | `None`     |
-| `DSTACK_PROJECT`       | Has the same effect as `--project` | `None`     |
+
+| Name                              | Description                                   | Default            |
+|-----------------------------------|-----------------------------------------------|--------------------|
+| `DSTACK_CLI_LOG_LEVEL`            | Configures CLI logging level                  | `CRITICAL`         |
+| `DSTACK_PROFILE`                  | Has the same effect as `--profile`            | `None`             |
+| `DSTACK_PROJECT`                  | Has the same effect as `--project`            | `None`             |
+| `DSTACK_DEFAULT_CREDS_DISABLED`   | Disables default credentials detection if set | `None`             |
+| `DSTACK_LOCAL_BACKEND_ENABLED`    | Enables local backend for debug if set        | `None`             |
+| `DSTACK_RUNNER_VERSION`           | Sets exact runner version for debug           | `latest`           |
+| `DSTACK_SERVER_ADMIN_TOKEN`       | Has the same effect as `--token`              | `None`             |
+| `DSTACK_SERVER_DIR`               | Sets path to store data and server configs    | `~/.dstack/server` |
+| `DSTACK_SERVER_HOST`              | Has the same effect as `--host`               | `127.0.0.1`        |
+| `DSTACK_SERVER_LOG_LEVEL`         | Has the same effect as `--log-level`          | `WARNING`          |
+| `DSTACK_SERVER_PORT`              | Has the same effect as `--port`               | `3000`             |
+| `DSTACK_SERVER_ROOT_LOG_LEVEL`    | Sets root logger log level                    | `ERROR`            |
+| `DSTACK_SERVER_UVICORN_LOG_LEVEL` | Sets uvicorn logger log level                 | `ERROR`            |
diff --git a/docs/overrides/header.html b/docs/overrides/header.html
@@ -55,6 +55,12 @@
       </div>
     {% endif %}
     <div class="md-header__buttons">
+      <!--<script>
+        function sign_in_on_click() {
+          window.location.href = "https://cloud.dstack.ai";
+        }
+      </script>
+      <a href="javascript:void(0)" class="md-button md-button-secondary" onclick="sign_in_on_click()">Sign in</a>-->
       <a href="#" data-tally-open="w7K17R" class="md-button md-button--primary">Sign up</a>
     </div>
   </nav>