Commit 7784a8a

- [Docs] Added the RAG with Llama Index and Weaviate example
- [Docs] Minor edits

peterschmidt85 committed Sep 15, 2023
1 parent daa7263 commit 7784a8a
Showing 17 changed files with 355 additions and 173 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -46,7 +46,7 @@ dstack start
## Configure clouds

Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](https://dstack.ai/docs/guides/clouds#configuring-backends).
Prior to using `dstack`, make sure to [configure clouds](https://dstack.ai/docs/guides/clouds#configure-backends).

Once the server is up, you can orchestrate GPU workloads using
either the CLI or Python API.
@@ -110,7 +110,7 @@ commands:
```

Once the service is up, `dstack` makes it accessible from the Internet through
the [gateway](https://dstack.ai/docs/guides/clouds#configuring-gateways).
the [gateway](https://dstack.ai/docs/guides/clouds#configure-gateways).

### Run a configuration

4 changes: 4 additions & 0 deletions docs/assets/stylesheets/extra.css
@@ -516,6 +516,10 @@ code .md-code__nav:hover .md-code__button {
/*color: var(--md-code-fg-color) !important;*/
}

.highlight .sd {
color: var(--md-code-hl-string-color);
}

.highlight .na, .highlight .nv, .highlight .vc, .highlight .vg, .highlight .vi {
color: #c6c052;
}
10 changes: 5 additions & 5 deletions docs/docs/guides/clouds.md
@@ -13,7 +13,7 @@ their credentials to `dstack`.
3. <span>**No vendor lock-in**
An open-source and cloud-agnostic interface enables easy switching between cloud providers.</span>

## Creating cloud accounts
## Create cloud accounts

To use clouds with `dstack`, you need to first create an account with each cloud provider.

@@ -27,7 +27,7 @@ relevant issues in [our tracker](https://github.com/dstackai/dstack/issues).
Startups can apply for extra credits, usually by reaching out directly to the provider in the case of smaller providers,
or through a partner program (such as [NVIDIA Inception](https://www.nvidia.com/en-us/startups/)) for larger providers.

??? info "Requesting GPU quotas"
??? info "Request GPU quotas"

Larger providers require you to request GPU quotas, essentially obtaining permission from their support
team, prior to utilizing GPUs with your account. If planning to use GPU through credits, approval for the request might
@@ -38,7 +38,7 @@ relevant issues in [our tracker](https://github.com/dstackai/dstack/issues).
To use spot instances with certain cloud providers (e.g. AWS), you should request quotas
for such instances separately.

## Configuring backends
## Configure backends

To use your cloud accounts with `dstack`, open the project settings and configure a backend for each cloud.

@@ -62,7 +62,7 @@ Configuring backends involves providing cloud credentials, and specifying storag

</div>

## Configuring gateways
## Configure gateways

If you intend to use [services](services.md) (e.g. to deploy public endpoints), you must also configure a gateway.
Configuring a gateway involves selecting a backend and a region.
@@ -76,7 +76,7 @@ After the gateway is created (and assigned an external IP), set up an A record a

Then, in the gateway's settings, specify the wildcard domain.

## Requesting resources
## Request resources

You can request resources using the [`--gpu`](../reference/cli/run.md#GPU)
and [`--memory`](../reference/cli/run.md#MEMORY) arguments with `dstack run`,
10 changes: 5 additions & 5 deletions docs/docs/guides/services.md
@@ -93,8 +93,8 @@ For more details on the file syntax, refer to [`.dstack.yml`](../reference/dstac

## Run the configuration

!!! info "NOTE:"
Before running a service, ensure that you have configured a [gateway](clouds.md#configuring-gateways).
!!! info "Gateway"
Before running a service, ensure that you have configured a [gateway](clouds.md#configure-gateways).

To run a service, use the `dstack run` command followed by the path to the directory you want to use as the
working directory.
@@ -119,14 +119,14 @@ Serving HTTP on https://yellow-cat-1.mydomain.com ...

This command deploys the service, and forwards the traffic to the gateway's endpoint.

!!! info "Endoint URL"
If you've configured a [wildcard domain](clouds.md#configuring-gateways) for the gateway,
!!! info "Wildcard domain"
If you've configured a [wildcard domain](clouds.md#configure-gateways) for the gateway,
`dstack` enables HTTPS automatically and serves the service at
`https://<run name>.<your domain name>`.

If you wish to customize the run name, you can use the `-n` argument with the `dstack run` command.

### Requesting resources
### Request resources

You can request resources using the [`--gpu`](../reference/cli/run.md#GPU)
and [`--memory`](../reference/cli/run.md#MEMORY) arguments with `dstack run`,
4 changes: 2 additions & 2 deletions docs/docs/index.md
@@ -20,7 +20,7 @@ The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b3

!!! info "Configure clouds"
Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](guides/clouds.md#configuring-backends).
Prior to using `dstack`, make sure to [configure clouds](guides/clouds.md#configure-backends).

Once the server is up, you can orchestrate LLM workloads using either the CLI or the Python API.

@@ -111,7 +111,7 @@ commands:
</div>

Once the service is up, `dstack` makes it accessible from the Internet through
the [gateway](guides/clouds.md#configuring-gateways).
the [gateway](guides/clouds.md#configure-gateways).

For more details on the file syntax, refer to [`.dstack.yml`](../docs/reference/dstack.yml/index.md).

2 changes: 1 addition & 1 deletion docs/docs/installation/docker.md
@@ -14,7 +14,7 @@ $ docker run --name dstack -p <port-on-host>:3000 \

!!! info "Configure clouds"
Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configuring-clouds-with-dstack).
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configure-backends).

## Environment variables

2 changes: 1 addition & 1 deletion docs/docs/installation/hf-spaces.md
@@ -60,4 +60,4 @@ If you don't do that, `dstack` will generate it randomly and print it to the `Lo

## Configure clouds

Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configuring-clouds-with-dstack).
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configure-backends).
2 changes: 1 addition & 1 deletion docs/docs/installation/pip.md
@@ -15,4 +15,4 @@ The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b3

!!! info "Configure clouds"
Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configuring-clouds-with-dstack).
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configure-backends).
2 changes: 1 addition & 1 deletion docs/examples/finetuning-llama-2.md
@@ -1,4 +1,4 @@
# Fine-tuning Llama 2
# Fine-tuning Llama 2 using QLoRA

The release of Llama 2 by Meta has caused quite a stir due to its impressive performance and its license that permits commercial use.
Along with other advancements in the LLM toolchain, such as LangChain and vector databases, Llama 2 has vast potential
205 changes: 205 additions & 0 deletions docs/examples/llama-index-weaviate.md
@@ -0,0 +1,205 @@
# RAG with Llama Index and Weaviate

RAG, or retrieval-augmented generation, empowers LLMs by providing them with access to your data.

Here's an example of how to apply this technique using the [Llama Index](https://www.llamaindex.ai/) framework
and the [Weaviate](https://weaviate.io/) vector database.

??? info "About Llama Index"
Llama Index is an open-source framework that makes it easy to extract data from different sources and connect
it to LLMs. It provides a variety of tools and APIs to help you ingest, structure, and access your data in a way that is
easy for LLMs to consume.

??? info "About Weaviate"
Weaviate is an open-source vector database that allows you to store and query objects
using their vector representations, also known as embeddings.
A vector representation is a mathematical way of representing an object as a point in a high-dimensional space.
This allows Weaviate to perform semantic search, which is the ability to search for objects based on their meaning,
rather than just their exact match to a query.

## How does it work?

1. Llama Index loads data from local files, structures it into chunks, and ingests it into Weaviate. It uses local
embeddings through the [SentenceTransformers](https://www.sbert.net/) library.
2. `dstack` allows us to configure cloud accounts (e.g. AWS, GCP, Azure, or Lambda Cloud)
and deploy LLMs (e.g. Llama 2) there. In this example, it employs [Text Generation Inference](https://github.com/huggingface/text-generation-inference)
to serve the LLM. Refer to [Deploying LLMs using TGI](text-generation-inference.md) and [Deploying LLMs using Python API](python-api.md).
3. Llama Index allows us to prompt the LLM and automatically incorporates context from Weaviate.

## Requirements

Here's the list of Python libraries that we'll use:

```
weaviate-client
llama-index
sentence-transformers
text_generation
```

## Load data to Weaviate

The first thing we do is load the data from local files and ingest it into Weaviate.

!!! info "NOTE:"
To use Weaviate, you need to either [install](https://weaviate.io/developers/weaviate/installation)
it on-premises or sign up for their managed service.

Since we're going to load data into or from Weaviate, we'll need a `weaviate.Client`:

```python
import os

import weaviate

# Connect to the Weaviate instance; WEAVIATE_URL and WEAVIATE_API_TOKEN
# are expected to be set as environment variables.
auth_config = weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_TOKEN"))

client = weaviate.Client(url=os.getenv("WEAVIATE_URL"), auth_client_secret=auth_config)

# Remove any previously ingested data for this index so the example starts from a clean slate
client.schema.delete_class("llama-index-weaviate")
```

Next, prepare the Llama Index classes: `llama_index.ServiceContext` (for indexing and querying) and
`llama_index.StorageContext` (for loading and storing). Note that we're using
`langchain.embeddings.huggingface.HuggingFaceEmbeddings` for local embeddings instead of OpenAI.

```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

from llama_index import (
LangchainEmbedding,
ServiceContext,
StorageContext,
)
from llama_index.vector_stores import WeaviateVectorStore

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)

vector_store = WeaviateVectorStore(weaviate_client=client, index_name="llama-index-weaviate")

storage_context = StorageContext.from_defaults(vector_store=vector_store)
```

Once the utility classes are configured, we can load the data from local files and pass it to
`llama_index.VectorStoreIndex`. Using its `from_documents` method will then store the data in the vector database.

```python
from pathlib import Path

from llama_index import (
SimpleDirectoryReader,
VectorStoreIndex,
)

documents = SimpleDirectoryReader(Path(__file__).parent / "data").load_data()

index = VectorStoreIndex.from_documents(
documents,
service_context=service_context,
storage_context=storage_context,
)
```

The data is now in the vector database! Next, we can invoke an LLM using this data as context.
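
As a quick, optional sanity check (this snippet isn't part of the original example), you can list the classes Weaviate now stores and confirm that the ingested index is among them, reusing the `client` created earlier:

```python
# List the classes currently stored in Weaviate (weaviate-client v3 API)
# and verify the ingested index appears among them.
schema = client.schema.get()
print([c["class"] for c in schema.get("classes", [])])
```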

## Deploy an LLM

This example assumes we're using an LLM deployed with `dstack`, which can be set up either as a task (for development) or
as a service (for production).

For a detailed example on how to deploy an LLM as a service, check out
[Deploying LLMs using TGI](text-generation-inference.md).

Alternatively, for development purposes, check out [Deploying LLMs using Python API](python-api.md). That example comes with a simple
Streamlit app that lets you deploy an LLM as a task with a single click.

![Streamlit app for deploying an LLM via the dstack Python API](images/python-api/dstack-python-api-streamlit-example.png)
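
Before wiring the endpoint into Llama Index, you can optionally smoke-test it with the same `langchain` wrapper we'll use below. This is only a sketch (not part of the original example), and it assumes `TGI_ENDPOINT_URL` points at the deployed TGI endpoint:

```python
import os

from langchain import HuggingFaceTextGenInference

# Send a single prompt to the TGI endpoint to verify the deployment responds
llm = HuggingFaceTextGenInference(
    inference_server_url=os.getenv("TGI_ENDPOINT_URL"),
    max_new_tokens=64,
)
print(llm("What is retrieval-augmented generation?"))
```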

## Generate response

Once the LLM is up, we can prompt it through Llama Index to automatically incorporate context from Weaviate.

Since we'll now invoke the actual LLM, we must include the LLM configuration when configuring `llama_index.ServiceContext`.

Since our LLM is deployed with `dstack` via TGI, we'll use the `langchain.HuggingFaceTextGenInference` wrapper,
which requires the LLM's endpoint URL and other generation parameters.

!!! info "NOTE:"
If you've deployed the LLM using [Deploying LLMs using Python API](python-api.md),
make sure to set the `TGI_ENDPOINT_URL` to `http://localhost:8080`.

```python
import os

from llama_index import (
LangchainEmbedding,
ServiceContext,
VectorStoreIndex,
)

from langchain import HuggingFaceTextGenInference
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

from llama_index.llm_predictor import LLMPredictor
from llama_index.vector_stores import WeaviateVectorStore

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

llm_predictor = LLMPredictor(
llm=HuggingFaceTextGenInference(
inference_server_url=os.getenv("TGI_ENDPOINT_URL"),
max_new_tokens=512,
streaming=True,
),
)

service_context = ServiceContext.from_defaults(
embed_model=embed_model,
llm_predictor=llm_predictor,
)

vector_store = WeaviateVectorStore(weaviate_client=client, index_name="llama-index-weaviate")

index = VectorStoreIndex.from_vector_store(
vector_store, service_context=service_context
)
```

Once `llama_index.VectorStoreIndex` is ready, we can proceed with querying it.

!!! info "NOTE:"
If we're deploying Llama 2, we have to ensure that the prompt format is correct.

```python
from llama_index import QuestionAnswerPrompt

prompt = QuestionAnswerPrompt(
"""<s>[INST] <<SYS>>
We have provided context information below.
{context_str}
Given this information, please answer the question.
<</SYS>>
[/INST]</s>
<s>[INST]{query_str}[/INST]"""
)
query_engine = index.as_query_engine(
text_qa_template=prompt,
streaming=True,
)

response = query_engine.query("What did the author do growing up?")
response.print_response_stream()
```
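
If you'd like to see which chunks Weaviate supplied as context for the answer, you can also inspect the response's source nodes. This optional addition isn't part of the original example, and attribute names may differ slightly across Llama Index versions:

```python
# Print the similarity score and a preview of each retrieved chunk
for source in response.source_nodes:
    print(source.score, source.node.text[:200])
```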

That's it! This basic example shows how straightforward it is to use Llama Index and Weaviate with LLMs deployed
using `dstack`. For more in-depth information, we encourage you to explore the documentation for each tool.

## Source code

!!! info "Source code"
The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
9 changes: 2 additions & 7 deletions docs/examples/python-api.md
@@ -1,8 +1,4 @@
---
title: Deploying LLMs with Python API
---

# Deploying LLMs with Python API
# Deploying LLMs using Python API

The [Python API](../docs/reference/api/python/index.md) of `dstack` can be used to run
[tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md) programmatically.
@@ -20,8 +16,7 @@ with a simple click of a button.

With `dstack.Client`, you can run [tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md). Running a task allows you to programmatically access its ports and
forward traffic to your local machine. For example, if you run an LLM as a task, you can access it on localhost.

For more details on the Python API, please refer to its [reference](../docs/reference/api/python/index.md).
Services, on the other hand, allow deploying applications as public endpoints.

## Prerequisites
