diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
index 7f5318dea..34c3f9f24 100644
--- a/.github/workflows/docs.yaml
+++ b/.github/workflows/docs.yaml
@@ -21,7 +21,7 @@ jobs:
       - run: |
           pip install pillow cairosvg
           sudo apt-get install -y libcairo2-dev libfreetype6-dev libffi-dev libjpeg-dev libpng-dev libz-dev
-          pip install mkdocs-material mkdocs-material-extensions mkdocs-redirects --upgrade
+          pip install "mkdocs-material[imaging]" mkdocs-material-extensions mkdocs-redirects --upgrade
           pip install git+https://${{ secrets.GH_TOKEN }}@github.com/squidfunk/mkdocs-material-insiders.git
           mkdocs gh-deploy --config-file ../dstack/mkdocs.yml --force
         working-directory: ./dstackai.github.io
\ No newline at end of file
diff --git a/docs/assets/stylesheets/extra.css b/docs/assets/stylesheets/extra.css
index cea8ac637..490b75bcc 100644
--- a/docs/assets/stylesheets/extra.css
+++ b/docs/assets/stylesheets/extra.css
@@ -616,6 +616,10 @@ body {
     width: 26px;
     text-align: center;
   }
+
+  .md-nav--lifted > .md-nav__list > .md-nav__item > [for] {
+    display: none;
+  }
 }
 
 .md-sidebar--primary .md-nav__link {
diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
index 854e6d39d..3c6513c07 100644
--- a/docs/blog/.authors.yml
+++ b/docs/blog/.authors.yml
@@ -1,3 +1,5 @@
-peterschmidt85:
-  name: Andrey Cheptsov
-  avatar: https://github.com/peterschmidt85.png
\ No newline at end of file
+authors:
+  peterschmidt85:
+    name: Andrey Cheptsov
+    avatar: https://github.com/peterschmidt85.png
+    description: Creator
\ No newline at end of file
diff --git a/docs/blog/posts/lambda-cloud-support-preview.md b/docs/blog/posts/lambda-cloud-support-preview.md
index 6d4f19520..7dfd77dda 100644
--- a/docs/blog/posts/lambda-cloud-support-preview.md
+++ b/docs/blog/posts/lambda-cloud-support-preview.md
@@ -70,7 +70,7 @@ ide: vscode
 You only need to install `cuda` if you intend to build a custom CUDA kernel. Otherwise, it is not
 necessary as the essential CUDA drivers are already pre-installed.
 
-The [documentation](../../docs) and [examples](https://github.com/dstackai/dstack-examples/blob/main/README.md)
+The [documentation](../../docs/index.md) and [examples](../../examples/index.md)
 are updated to reflect the changes.
 
 !!! info "Give it a try and share feedback"
diff --git a/docs/blog/posts/new-configuration-format-and-cli-experience.md b/docs/blog/posts/new-configuration-format-and-cli-experience.md
index f5d057af5..4e2cdd6a3 100644
--- a/docs/blog/posts/new-configuration-format-and-cli-experience.md
+++ b/docs/blog/posts/new-configuration-format-and-cli-experience.md
@@ -160,7 +160,7 @@ Last but not least, here's a brief list of other improvements:
   configured on the machine where the server is running).
 * We've introduced a new Python API and UI for working with artifacts (more details to come later this week).
 
-The [documentation](../../docs) and [examples](https://github.com/dstackai/dstack-examples/blob/main/README.md)
+The [documentation](../../docs/index.md) and [examples](../../examples/index.md)
 are updated to reflect the changes.
 
 !!! info "Try the update and share feedback"
diff --git a/docs/blog/posts/prebuilt-environments-spot-and-retry-policies.md b/docs/blog/posts/prebuilt-environments-spot-and-retry-policies.md
index 052486674..844d251cf 100644
--- a/docs/blog/posts/prebuilt-environments-spot-and-retry-policies.md
+++ b/docs/blog/posts/prebuilt-environments-spot-and-retry-policies.md
@@ -96,9 +96,9 @@ Other improvements in this release:
 - Deletion of repositories is now possible through the UI.
 - When running a dev environment from a Git repo, you can now pull and push changes directly from the dev environment,
   with `dstack` correctly configuring your Git credentials.
-- The newly added Python API for working with artifacts is now documented [here](../../docs/reference/api/python.md).
+- The newly added Python API for working with artifacts is now documented [here](../../docs/reference/api/python/index.md).
 
-The [documentation](../../docs) and [examples](https://github.com/dstackai/dstack-examples/blob/main/README.md)
+The [documentation](../../docs/index.md) and [examples](../../examples/index.md)
 are updated to reflect the changes.
 
 !!! info "Give it a try and share feedback"
diff --git a/docs/docs/guides/clouds.md b/docs/docs/guides/clouds.md
index 44c838d13..6e4369411 100644
--- a/docs/docs/guides/clouds.md
+++ b/docs/docs/guides/clouds.md
@@ -49,16 +49,16 @@ Configuring backends involves providing cloud credentials, and specifying storag
 - [**AWS**
   Learn how to set up an Amazon Web Services backend.
-  ](../../reference/backends/aws/)
+  ](../reference/backends/aws.md)
 
 - [**GCP**
-  Learn how to set up a Google Cloud backend.
-  ](../../reference/backends/gcp/)
+Learn how to set up a Google Cloud backend.
+  ](../reference/backends/gcp.md)
 
 - [**Azure**
   Learn how to set up a Microsoft Azure backend.
-  ](../../reference/backends/azure/)
+  ](../reference/backends/azure.md)
 
 - [**Lambda**
   Learn how to set up a Lambda Cloud backend.
-  ](../../reference/backends/lambda/)
+  ](../reference/backends/lambda.md)
diff --git a/docs/docs/reference/api/python/index.md b/docs/docs/reference/api/python/index.md
index 8811ed29e..c22af2422 100644
--- a/docs/docs/reference/api/python/index.md
+++ b/docs/docs/reference/api/python/index.md
@@ -5,7 +5,7 @@ The Python API allows for programmatically running tasks and services across con
 #### Installation
 
 Before you can use `dstack` Python API, ensure you have installed the `dstack` package,
-started a `dstack` server with [configured clouds](../../docs/docs/guides/clouds.md).
+started a `dstack` server with [configured clouds](../../../guides/clouds.md).
 
 ```shell
 pip install "dstack[all]==0.11.3rc1"
diff --git a/docs/examples/deploy-python.md b/docs/examples/deploy-python.md
new file mode 100644
index 000000000..4a213cc30
--- /dev/null
+++ b/docs/examples/deploy-python.md
@@ -0,0 +1,150 @@
+# Deploying LLMs using Python API
+
+The [Python API](../docs/reference/api/python/index.md) of `dstack` can be used to run [tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md) programmatically.
+
+To demonstrate how it works, we've created a simple Streamlit app that uses `dstack`'s API to deploy a quantized
+version of Llama 2 to your cloud with a click of a button.
+
+![](images/python-api/dstack-python-api-streamlit-example.png){ width=800 }
+
+## Prerequisites
+
+Before you can use the `dstack` Python API, ensure you have installed the `dstack` package
+and started a `dstack` server with [configured clouds](../docs/guides/clouds.md).
+
+```shell
+pip install "dstack[all]==0.11.3rc1"
+dstack start
+```
+
+## How does it work?
+
+### Create a client
+
+If you're familiar with Docker's Python SDK, you'll find `dstack`'s [Python API](../docs/reference/api/python/index.md)
+quite similar, except that it runs your workload in the cloud.
+
+To get started, create an instance of `dstack.Client` and use its methods to submit and manage runs.
+
+```python
+import dstack
+
+try:
+    client = dstack.Client.from_config(repo_dir=".")
+except dstack.api.hub.errors.HubClientError as e:
+    print(e)
+```
+
+### Create a task
+
+!!! info "NOTE:"
+    With `dstack.Client`, you can run [tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md).
+    Running a task allows you to programmatically access its ports and
+    forward traffic to your local machine. For example, if you run an LLM as a task, you can access it on `localhost`.
+    Services, on the other hand, allow deploying applications as public endpoints.
+
+In our example, we'll deploy an LLM as a task. To do this, we'll create a `dstack.Task` instance that configures how the
+LLM should be run.
+
+```python
+model_id = "TheBloke/Llama-2-13B-chat-GPTQ"  # in the app, this is chosen by the user
+
+configuration = dstack.Task(
+    image="ghcr.io/huggingface/text-generation-inference:latest",
+    env={"MODEL_ID": model_id},
+    commands=[
+        "text-generation-launcher --trust-remote-code --quantize gptq",
+    ],
+    ports=["8080:80"],  # LLM runs on port 80, forwarded to localhost:8080
+)
+```
+
+### Create resources
+
+Then, we'll need to specify the resources our LLM will require. To do this, we'll create a `dstack.Resources` instance:
+
+```python
+if model_id == "TheBloke/Llama-2-13B-chat-GPTQ":
+    gpu_memory = "20GB"
+elif model_id == "TheBloke/Llama-2-70B-chat-GPTQ":
+    gpu_memory = "40GB"
+
+resources = dstack.Resources(gpu=dstack.GPU(memory=gpu_memory))
+```
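+Since the `if`/`elif` chain leaves `gpu_memory` undefined for any other model, a dictionary
+lookup is a slightly safer equivalent. The refactoring below is our sketch, not part of the
+original app, and uses only the `dstack.Resources` and `dstack.GPU` calls shown above:
+
+```python
+# GPU memory required per quantized model (values from the example above)
+GPU_MEMORY_BY_MODEL = {
+    "TheBloke/Llama-2-13B-chat-GPTQ": "20GB",
+    "TheBloke/Llama-2-70B-chat-GPTQ": "40GB",
+}
+
+# A KeyError on an unsupported model is clearer than an undefined variable later
+resources = dstack.Resources(gpu=dstack.GPU(memory=GPU_MEMORY_BY_MODEL[model_id]))
+```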
+
+### Submit the run
+
+To deploy the LLM, we submit the task using `runs.submit()` on `dstack.Client`.
+
+```python
+run_name = "deploy-python"
+
+run = client.runs.submit(configuration=configuration, run_name=run_name, resources=resources)
+```
+
+### Attach to the run
+
+Then, we use the `attach()` method on `dstack.Run`. This method waits for the task to start
+and forwards the configured ports to `localhost`.
+
+```python
+run.attach()
+```
+
+### Wait for the endpoint to start
+
+Finally, we wait until `http://localhost:8080/health` returns `200`, which indicates that the LLM is deployed and ready to
+handle requests.
+
+```python
+import time
+import requests
+
+while True:
+    time.sleep(0.5)
+    try:
+        r = requests.get("http://localhost:8080/health")
+        if r.status_code == 200:
+            break
+    except Exception:
+        pass
+```
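+Once the health check passes, you can talk to the model through the forwarded port. Here's a
+quick sanity check, assuming the standard `/generate` endpoint of Text Generation Inference
+(the request shape is TGI's, not part of `dstack`):
+
+```python
+# Send a test prompt to the TGI container behind localhost:8080
+response = requests.post(
+    "http://localhost:8080/generate",
+    json={"inputs": "What is Llama 2?", "parameters": {"max_new_tokens": 50}},
+)
+print(response.json()["generated_text"])
+```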
+
+### Stop the run
+
+To undeploy the model, we can use the `stop()` method on `dstack.Run`.
+
+```python
+run.stop()
+```
+
+### Retrieve the status of a run
+
+If you'd like to retrieve the `dstack.Run` instance by the name of the run,
+you can use the `runs.get()` method on `dstack.Client`.
+
+```python
+run = client.runs.get(run_name)
+```
+
+The `status()` method on `dstack.Run` returns the current status of the run.
+
+```python
+if run:
+    print(run.status())
+```
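+Putting it all together, the whole lifecycle takes just the handful of calls shown above. A
+condensed sketch, reusing `configuration` and `resources` from the earlier sections:
+
+```python
+client = dstack.Client.from_config(repo_dir=".")
+
+run = client.runs.submit(configuration=configuration, run_name="deploy-python",
+                         resources=resources)
+run.attach()  # wait for the task to start; forward its ports to localhost
+
+# ... interact with the LLM on http://localhost:8080 ...
+
+run.stop()  # undeploy the model
+print(client.runs.get("deploy-python").status())
+```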
+
+## Source code
+
+The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
+
+```shell
+git clone https://github.com/dstackai/dstack-examples
+cd dstack-examples
+```
+
+Once the repository is cloned, install the requirements and run the app:
+
+```shell
+pip install -r deploy-python/requirements.txt
+streamlit run deploy-python/app.py
+```
\ No newline at end of file
diff --git a/docs/examples/python-api.md b/docs/examples/python-api.md
deleted file mode 100644
index 08fd0aee4..000000000
--- a/docs/examples/python-api.md
+++ /dev/null
@@ -1,154 +0,0 @@
-# Deploying LLMs using Python API
-
-The [Python API](../docs/reference/api/python/index.md) of `dstack` can be used to run
-[tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md) programmatically.
-
-Below is an example of a Streamlit app that uses `dstack`'s API to deploy a quantized version of Llama 2 to your cloud
-with a simple click of a button.
-
-![](images/python-api/dstack-python-api-streamlit-example.png){ width=800 }
-
-!!! info "How does the API work?"
-    If you're familiar with Docker's Python SDK, you'll find dstack's Python API quite similar, except that it runs your
-    workload in the cloud.
-
-    To get started, create an instance of `dstack.Client` and use its methods to submit and manage runs.
-
-    With `dstack.Client`, you can run [tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md). Running a task allows you to programmatically access its ports and
-    forward traffic to your local machine. For example, if you run an LLM as a task, you can access it on localhost.
-    Services on the other hand allow deploying applications as public endpoints.
-
-## Prerequisites
-
-Before you can use `dstack` Python API, ensure you have installed the `dstack` package,
-started a `dstack` server with [configured clouds](../../docs/docs/guides/clouds.md).
-
-```shell
-pip install "dstack[all]==0.11.3rc1"
-dstack start
-```
-
-## Run the app
-
-First, clone the repository with `dstack-examples`.
-
-```shell
-git clone https://github.com/dstackai/dstack-examples
-cd dstack-examples
-```
-
-Second, install the requirements, and run the app:
-
-```
-pip install -r streamlit-llama/requirements.txt
-streamlit run streamlit-llama/app.py
-```
-
-That's it! Now you can choose a model (e.g., 13B or 70B) and click the `Deploy` button.
-Once the LLM is up, you can access it at `localhost`.
-
-## Code walkthrough
-
-For the complete code,
-refer to the [full version](https://github.com/dstackai/dstack-examples/blob/main/streamlit-llama/app.py).
-
-First, we initialize the `dstack.Client`:
-
-```python
-if len(st.session_state) == 0:
-    st.session_state.client = dstack.Client.from_config(".")
-```
-
-Then, we prompt the user to choose an LLM for deployment.
-
-```python
-def trigger_llm_deployment():
-    st.session_state.deploying = True
-    st.session_state.error = None
-
-with st.sidebar:
-    model_id = st.selectbox("Choose an LLM to deploy",
-                            ("TheBloke/Llama-2-13B-chat-GPTQ",
-                             "TheBloke/Llama-2-70B-chat-GPTQ",),
-                            disabled=st.session_state.deploying or st.session_state.deployed)
-    if not st.session_state.deploying:
-        st.button("Deploy", on_click=trigger_llm_deployment, type="primary")
-```
-
-Prepare a `dstack` task and resource requirements based on the selected model.
-
-```python
-def get_configuration():
-    return dstack.Task(
-        image="ghcr.io/huggingface/text-generation-inference:latest",
-        env={"MODEL_ID": model_id},
-        commands=[
-            "text-generation-launcher --trust-remote-code --quantize gptq",
-        ],
-        ports=["8080:80"],
-    )
-
-def get_resources():
-    if model_id == "TheBloke/Llama-2-13B-chat-GPTQ":
-        gpu_memory = "20GB"
-    elif model_id == "TheBloke/Llama-2-70B-chat-GPTQ":
-        gpu_memory = "40GB"
-    return dstack.Resources(gpu=dstack.GPU(memory=gpu_memory))
-```
-
-If the user clicks `Deploy`, we submit the task using `runs.submit()` on `dstack.Client`. Then, we use the `attach()`
-method on `dstack.Run`. This method waits for the task to start, forwarding the port to `localhost`.
-
-Finally, we wait until `http://localhost:8080/health` returns `200`.
-
-```python
-def wait_for_ok_status(url):
-    while True:
-        time.sleep(0.5)
-        try:
-            r = requests.get(url)
-            if r.status_code == 200:
-                break
-        except Exception:
-            pass
-
-if st.session_state.deploying:
-    with st.sidebar:
-        with st.status("Deploying the LLM...", expanded=True) as status:
-            st.write("Provisioning...")
-            try:
-                run = st.session_state.client.runs.submit(configuration=get_configuration(), run_name=run_name,
-                                                          resources=get_resources())
-                st.session_state.run = run
-                st.write("Attaching to the LLM...")
-                st.session_state.run.attach()
-                wait_for_ok_status("http://localhost:8080/health")
-                status.update(label="The LLM is ready!", state="complete", expanded=False)
-                st.session_state.deploying = False
-                st.session_state.deployed = True
-            except Exception as e:
-                st.session_state.error = str(e)
-                st.session_state.deploying = False
-                st.experimental_rerun()
-```
-
-If an error occurs, we display it. Additionally, we provide a button to undeploy the model using the `stop()` method on `dstack.Run`.
-
-```python
-def trigger_llm_undeployment():
-    st.session_state.run.stop()
-    st.session_state.deploying = False
-    st.session_state.deployed = False
-    st.session_state.run = None
-
-with st.sidebar:
-    if st.session_state.error:
-        st.error(st.session_state)
-
-    if st.session_state.deployed:
-        st.button("Undeploy", type="primary", key="stop", on_click=trigger_llm_undeployment)
-```
-
-!!! info "Source code"
-    The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
\ No newline at end of file
diff --git a/docs/overrides/examples.html b/docs/overrides/examples.html
index cd8eb69c5..4e0b28b93 100644
--- a/docs/overrides/examples.html
+++ b/docs/overrides/examples.html
@@ -1,4 +1,4 @@
-{% extends "main.html" %}
+{% extends "landing.html" %}
 
 {% block content %}
@@ -27,7 +27,7 @@

- +
diff --git a/docs/overrides/header.html b/docs/overrides/header.html
new file mode 100644
index 000000000..29c2e998f
--- /dev/null
+++ b/docs/overrides/header.html
@@ -0,0 +1,66 @@
+{#-
+  This file was automatically generated - do not edit
+-#}
+{% set class = "md-header" %}
+{% if "navigation.tabs.sticky" in features %}
+  {% set class = class ~ " md-header--shadow md-header--lifted" %}
+{% elif "navigation.tabs" not in features %}
+  {% set class = class ~ " md-header--shadow" %}
+{% endif %}
+
+
+
+
+
+      {{ config.site_name }}
+
+
+
+
+      {% if page.meta and page.meta.title %}
+        {{ page.meta.title }}
+      {% else %}
+        {{ page.title }}
+      {% endif %}
+
+
+
+
+    {% if config.theme.palette %}
+      {% if not config.theme.palette is mapping %}
+        {% include "partials/palette.html" %}
+      {% endif %}
+    {% endif %}
+    {% if not config.theme.palette is mapping %}
+      {% include "partials/javascripts/palette.html" %}
+    {% endif %}
+    {% if config.extra.alternate %}
+      {% include "partials/alternate.html" %}
+    {% endif %}
+    {% if "material/search" in config.plugins %}
+
+      {% include "partials/search.html" %}
+    {% endif %}
+    {% if config.repo_url %}
+
+      {% include "partials/source.html" %}
+
+    {% endif %}
+
+  {% if "navigation.tabs.sticky" in features %}
+    {% if "navigation.tabs" in features %}
+      {% include "partials/tabs.html" %}
+    {% endif %}
+  {% endif %}
+
diff --git a/docs/overrides/home.html b/docs/overrides/home.html
index 6011537b5..a76dbbb19 100644
--- a/docs/overrides/home.html
+++ b/docs/overrides/home.html
@@ -1,4 +1,4 @@
-{% extends "main.html" %}
+{% extends "landing.html" %}
 
 {% block scripts %}
   {{ super() }}
@@ -244,7 +244,7 @@

- +
diff --git a/docs/overrides/landing.html b/docs/overrides/landing.html
new file mode 100644
index 000000000..753fc72da
--- /dev/null
+++ b/docs/overrides/landing.html
@@ -0,0 +1,18 @@
+{% extends "base.html" %}
+
+{% block header %}
+  {% include "header.html" %}
+{% endblock %}
+
+{% block scripts %}
+
+{{ super() }}
+{% endblock %}
+
+{% block announce %}
+Like dstack? Give us a ⭐
+on
+  GitHub!
+{% endblock %}
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index d05cf57b3..2a1609c35 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -73,7 +73,8 @@ plugins:
       'docs/quick-start.md': 'docs/index.md'
       'docs/installation/index.md': 'docs/index.md'
       'tutorials/stable-diffusion.md': 'examples/stable-diffusion-xl.md'
-      'docs/guides/projects.md': 'docs/projects.md'
+      'docs/guides/projects.md': 'docs/guides/clouds.md'
+      'examples/python-api.md': 'examples/deploy-python.md'
   - typeset
 
 # Extensions
@@ -123,13 +124,13 @@ extra:
   social:
     - icon: /fontawesome/brands/github
      link: https://github.com/dstackai/dstack
-    - icon: /fontawesome/brands/python
-      link: https://pypi.org/project/dstack
+#    - icon: /fontawesome/brands/python
+#      link: https://pypi.org/project/dstack
     - icon: /fontawesome/brands/docker
      link: https://hub.docker.com/r/dstackai/dstack
     - icon: /fontawesome/brands/discord
      link: https://discord.gg/u8SmfwPpMd
-    - icon: /fontawesome/brands/twitter
+    - icon: /simple/x
      link: https://twitter.com/dstackai
   status:
     new: Recently added
@@ -180,7 +181,7 @@ nav:
   - examples/index.md
   - Examples:
     - RAG with Llama Index and Weaviate: examples/llama-index-weaviate.md
-    - Deploying LLMs using Python API: examples/python-api.md
+    - Deploying LLMs using Python API: examples/deploy-python.md
     - Fine-tuning Llama 2 using QLoRA: examples/finetuning-llama-2.md
     - Deploying LLMs using TGI: examples/text-generation-inference.md
     - Deploying SDXL using FastAPI: examples/stable-diffusion-xl.md