Commit 7784a8a

- [Docs] Added the RAG with Llama Index and Weaviate example
- [Docs] Minor edits

peterschmidt85 committed Sep 15, 2023
1 parent daa7263 commit 7784a8a
Showing 17 changed files with 355 additions and 173 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -46,7 +46,7 @@ dstack start
## Configure clouds

Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](https://dstack.ai/docs/guides/clouds#configuring-backends).
Prior to using `dstack`, make sure to [configure clouds](https://dstack.ai/docs/guides/clouds#configure-backends).

Once the server is up, you can orchestrate GPU workloads using
either the CLI or Python API.
@@ -110,7 +110,7 @@ commands:
```

Once the service is up, `dstack` makes it accessible from the Internet through
the [gateway](https://dstack.ai/docs/guides/clouds#configuring-gateways).
the [gateway](https://dstack.ai/docs/guides/clouds#configure-gateways).

### Run a configuration

4 changes: 4 additions & 0 deletions docs/assets/stylesheets/extra.css
@@ -516,6 +516,10 @@ code .md-code__nav:hover .md-code__button {
/*color: var(--md-code-fg-color) !important;*/
}

.highlight .sd {
color: var(--md-code-hl-string-color);
}

.highlight .na, .highlight .nv, .highlight .vc, .highlight .vg, .highlight .vi {
color: #c6c052;
}
10 changes: 5 additions & 5 deletions docs/docs/guides/clouds.md
@@ -13,7 +13,7 @@ their credentials to `dstack`.
3. <span>**No vendor lock-in**
An open-source and cloud-agnostic interface enables easy switching between cloud providers.</span>

## Creating cloud accounts
## Create cloud accounts

To use clouds with `dstack`, you need to first create an account with each cloud provider.

@@ -27,7 +27,7 @@ relevant issues in [our tracker](https://github.com/dstackai/dstack/issues).
Startups can apply for extra credits, usually by reaching out directly to the provider in the case of smaller providers,
or through a partner program (such as [NVIDIA Inception](https://www.nvidia.com/en-us/startups/)) for larger providers.

??? info "Requesting GPU quotas"
??? info "Request GPU quotas"

Larger providers require you to request GPU quotas, essentially obtaining permission from their support
team, prior to utilizing GPUs with your account. If planning to use GPU through credits, approval for the request might
@@ -38,7 +38,7 @@ relevant issues in [our tracker](https://github.com/dstackai/dstack/issues).
To use spot instances with certain cloud providers (e.g. AWS), you should request quotas
for such instances separately.

## Configuring backends
## Configure backends

To use your cloud accounts with `dstack`, open the project settings and configure a backend for each cloud.

@@ -62,7 +62,7 @@ Configuring backends involves providing cloud credentials, and specifying storag

</div>

## Configuring gateways
## Configure gateways

If you intend to use [services](services.md) (e.g. to deploy public endpoints), you must also configure a gateway.
Configuring a gateway involves selecting a backend and a region.
@@ -76,7 +76,7 @@ After the gateway is created (and assigned an external IP), set up an A record a

Then, in the gateway's settings, specify the wildcard domain.

## Requesting resources
## Request resources

You can request resources using the [`--gpu`](../reference/cli/run.md#GPU)
and [`--memory`](../reference/cli/run.md#MEMORY) arguments with `dstack run`,
10 changes: 5 additions & 5 deletions docs/docs/guides/services.md
@@ -93,8 +93,8 @@ For more details on the file syntax, refer to [`.dstack.yml`](../reference/dstac

## Run the configuration

!!! info "NOTE:"
Before running a service, ensure that you have configured a [gateway](clouds.md#configuring-gateways).
!!! info "Gateway"
Before running a service, ensure that you have configured a [gateway](clouds.md#configure-gateways).

To run a service, use the `dstack run` command followed by the path to the directory you want to use as the
working directory.
@@ -119,14 +119,14 @@ Serving HTTP on https://yellow-cat-1.mydomain.com ...

This command deploys the service, and forwards the traffic to the gateway's endpoint.

!!! info "Endoint URL"
If you've configured a [wildcard domain](clouds.md#configuring-gateways) for the gateway,
!!! info "Wildcard domain"
If you've configured a [wildcard domain](clouds.md#configure-gateways) for the gateway,
`dstack` enables HTTPS automatically and serves the service at
`https://<run name>.<your domain name>`.

If you wish to customize the run name, you can use the `-n` argument with the `dstack run` command.

### Requesting resources
### Request resources

You can request resources using the [`--gpu`](../reference/cli/run.md#GPU)
and [`--memory`](../reference/cli/run.md#MEMORY) arguments with `dstack run`,
4 changes: 2 additions & 2 deletions docs/docs/index.md
@@ -20,7 +20,7 @@ The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b3

!!! info "Configure clouds"
Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](guides/clouds.md#configuring-backends).
Prior to using `dstack`, make sure to [configure clouds](guides/clouds.md#configure-backends).

Once the server is up, you can orchestrate LLM workloads using either the CLI or the Python API.

@@ -111,7 +111,7 @@ commands:
</div>

Once the service is up, `dstack` makes it accessible from the Internet through
the [gateway](guides/clouds.md#configuring-gateways).
the [gateway](guides/clouds.md#configure-gateways).

For more details on the file syntax, refer to [`.dstack.yml`](../docs/reference/dstack.yml/index.md).

2 changes: 1 addition & 1 deletion docs/docs/installation/docker.md
@@ -14,7 +14,7 @@ $ docker run --name dstack -p <port-on-host>:3000 \

!!! info "Configure clouds"
Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configuring-clouds-with-dstack).
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configure-backends).

## Environment variables

2 changes: 1 addition & 1 deletion docs/docs/installation/hf-spaces.md
@@ -60,4 +60,4 @@ If you don't do that, `dstack` will generate it randomly and print it to the `Lo

## Configure clouds

Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configuring-clouds-with-dstack).
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configure-backends).
2 changes: 1 addition & 1 deletion docs/docs/installation/pip.md
@@ -15,4 +15,4 @@ The server is available at http://127.0.0.1:3000?token=b934d226-e24a-4eab-eb92b3

!!! info "Configure clouds"
Upon startup, the server sets up the default project called `main`.
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configuring-clouds-with-dstack).
Prior to using `dstack`, make sure to [configure clouds](../guides/clouds.md#configure-backends).
2 changes: 1 addition & 1 deletion docs/examples/finetuning-llama-2.md
@@ -1,4 +1,4 @@
# Fine-tuning Llama 2
# Fine-tuning Llama 2 using QLoRA

The release of Llama 2 by Meta has caused quite a stir due to its impressive performance and its license that permits commercial use.
Along with other advancements in the LLM toolchain, such as LangChain and vector databases, Llama 2 has vast potential
205 changes: 205 additions & 0 deletions docs/examples/llama-index-weaviate.md
@@ -0,0 +1,205 @@
# RAG with Llama Index and Weaviate

RAG, or retrieval-augmented generation, empowers LLMs by providing them with access to your data.

Here's an example of how to apply this technique using the [Llama Index](https://www.llamaindex.ai/) framework
and the [Weaviate](https://weaviate.io/) vector database.

??? info "About Llama Index"
Llama Index is an open-source framework that makes it easy to extract data from different sources and connect
it to LLMs. It provides a variety of tools and APIs to help you ingest, structure, and access your data in a way that is
easy for LLMs to consume.

??? info "About Weaviate"
Weaviate is an open-source vector database that allows you to store and query objects
using their vector representations, also known as embeddings.
A vector representation is a mathematical way of representing an object as a point in a high-dimensional space.
This allows Weaviate to perform semantic search, which is the ability to search for objects based on their meaning,
rather than just their exact match to a query.

## How does it work?

1. Llama Index loads data from local files, structures it into chunks, and ingests it into Weaviate. It uses local
embeddings through the [SentenceTransformers](https://www.sbert.net/) library.
2. `dstack` allows us to configure cloud accounts (e.g. AWS, GCP, Azure, or Lambda Cloud)
and deploy LLMs (e.g. Llama 2) there. In this example, it employs [Text Generation Inference](https://github.com/huggingface/text-generation-inference)
to serve the LLM. Refer to [Deploying LLMs using TGI](text-generation-inference.md) and [Deploying LLMs using Python API](python-api.md).
3. Llama Index allows us to prompt the LLM and automatically incorporates context from Weaviate.

## Requirements

Here's the list of Python libraries that we'll use:

```
weaviate-client
llama-index
sentence-transformers
text_generation
```

## Load data to Weaviate

The first thing we do is load the data from local files and ingest it into Weaviate.

!!! info "NOTE:"
To use Weaviate, you need to either [install](https://weaviate.io/developers/weaviate/installation)
it on-premises or sign up for their managed service.

Since we're going to load data into or from Weaviate, we'll need a `weaviate.Client`:

```python
import os

import weaviate

# Connect to the Weaviate instance; WEAVIATE_URL and WEAVIATE_API_TOKEN
# are expected to be set as environment variables.
auth_config = weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_TOKEN"))

client = weaviate.Client(url=os.getenv("WEAVIATE_URL"), auth_client_secret=auth_config)

# Remove any previously ingested data for this index so the example starts from a clean slate
client.schema.delete_class("llama-index-weaviate")
```

Next, prepare the Llama Index classes: `llama_index.ServiceContext` (for indexing and querying) and
`llama_index.StorageContext` (for loading and storing). Note that we're using
`langchain.embeddings.huggingface.HuggingFaceEmbeddings` for local embeddings instead of OpenAI.

```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

from llama_index import (
LangchainEmbedding,
ServiceContext,
StorageContext,
)
from llama_index.vector_stores import WeaviateVectorStore

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)

vector_store = WeaviateVectorStore(weaviate_client=client, index_name="llama-index-weaviate")

storage_context = StorageContext.from_defaults(vector_store=vector_store)
```

Once the utility classes are configured, we can load the data from local files and pass it to
`llama_index.VectorStoreIndex`. Using its `from_documents` method will then store the data in the vector database.

```python
from pathlib import Path

from llama_index import (
SimpleDirectoryReader,
VectorStoreIndex,
)

documents = SimpleDirectoryReader(Path(__file__).parent / "data").load_data()

index = VectorStoreIndex.from_documents(
documents,
service_context=service_context,
storage_context=storage_context,
)
```

The data is now in the vector database! Next, we can invoke an LLM using this data as context.
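
As a quick, optional sanity check (this snippet isn't part of the original example), you can list the classes Weaviate now stores and confirm that the ingested index is among them, reusing the `client` created earlier:

```python
# List the classes currently stored in Weaviate (weaviate-client v3 API)
# and verify the ingested index appears among them.
schema = client.schema.get()
print([c["class"] for c in schema.get("classes", [])])
```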

## Deploy an LLM

This example assumes we're using an LLM deployed with `dstack`, which can be set up either as a task (for development) or
as a service (for production).

For a detailed example on how to deploy an LLM as a service, check out
[Deploying LLMs using TGI](text-generation-inference.md).

Alternatively, for development purposes, check out [Deploying LLMs using Python API](python-api.md). That example comes with a simple
Streamlit app that lets you deploy an LLM as a task with a single click.

![Streamlit app for deploying an LLM via the dstack Python API](images/python-api/dstack-python-api-streamlit-example.png)
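
Before wiring the endpoint into Llama Index, you can optionally smoke-test it with the same `langchain` wrapper we'll use below. This is only a sketch (not part of the original example), and it assumes `TGI_ENDPOINT_URL` points at the deployed TGI endpoint:

```python
import os

from langchain import HuggingFaceTextGenInference

# Send a single prompt to the TGI endpoint to verify the deployment responds
llm = HuggingFaceTextGenInference(
    inference_server_url=os.getenv("TGI_ENDPOINT_URL"),
    max_new_tokens=64,
)
print(llm("What is retrieval-augmented generation?"))
```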

## Generate response

Once the LLM is up, we can prompt it through Llama Index to automatically incorporate context from Weaviate.

Since we'll now invoke the actual LLM, we must include the LLM configuration when configuring `llama_index.ServiceContext`.

Since our LLM is deployed with `dstack` via TGI, we'll use the `langchain.HuggingFaceTextGenInference` wrapper,
which requires the LLM's endpoint URL and other generation parameters.

!!! info "NOTE:"
If you've deployed the LLM using [Deploying LLMs using Python API](python-api.md),
make sure to set the `TGI_ENDPOINT_URL` to `http://localhost:8080`.

```python
import os

from llama_index import (
LangchainEmbedding,
ServiceContext,
VectorStoreIndex,
)

from langchain import HuggingFaceTextGenInference
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

from llama_index.llm_predictor import LLMPredictor
from llama_index.vector_stores import WeaviateVectorStore

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

llm_predictor = LLMPredictor(
llm=HuggingFaceTextGenInference(
inference_server_url=os.getenv("TGI_ENDPOINT_URL"),
max_new_tokens=512,
streaming=True,
),
)

service_context = ServiceContext.from_defaults(
embed_model=embed_model,
llm_predictor=llm_predictor,
)

vector_store = WeaviateVectorStore(weaviate_client=client, index_name="llama-index-weaviate")

index = VectorStoreIndex.from_vector_store(
vector_store, service_context=service_context
)
```

Once `llama_index.VectorStoreIndex` is ready, we can proceed with querying it.

!!! info "NOTE:"
If we're deploying Llama 2, we have to ensure that the prompt format is correct.

```python
from llama_index import QuestionAnswerPrompt

prompt = QuestionAnswerPrompt(
"""<s>[INST] <<SYS>>
We have provided context information below.
{context_str}
Given this information, please answer the question.
<</SYS>>
[/INST]</s>
<s>[INST]{query_str}[/INST]"""
)
query_engine = index.as_query_engine(
text_qa_template=prompt,
streaming=True,
)

response = query_engine.query("What did the author do growing up?")
response.print_response_stream()
```
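
If you'd like to see which chunks Weaviate supplied as context for the answer, you can also inspect the response's source nodes. This optional addition isn't part of the original example, and attribute names may differ slightly across Llama Index versions:

```python
# Print the similarity score and a preview of each retrieved chunk
for source in response.source_nodes:
    print(source.score, source.node.text[:200])
```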

That's it! This basic example shows how straightforward it is to use Llama Index and Weaviate with LLMs deployed
using `dstack`. For more in-depth information, we encourage you to explore the documentation for each tool.

## Source code

!!! info "Source code"
The complete, ready-to-run code is available in [dstackai/dstack-examples](https://github.com/dstackai/dstack-examples).
9 changes: 2 additions & 7 deletions docs/examples/python-api.md
@@ -1,8 +1,4 @@
---
title: Deploying LLMs with Python API
---

# Deploying LLMs with Python API
# Deploying LLMs using Python API

The [Python API](../docs/reference/api/python/index.md) of `dstack` can be used to run
[tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md) programmatically.
@@ -20,8 +16,7 @@ with a simple click of a button.

With `dstack.Client`, you can run [tasks](../docs/guides/tasks.md) and [services](../docs/guides/services.md). Running a task allows you to programmatically access its ports and
forward traffic to your local machine. For example, if you run an LLM as a task, you can access it on localhost.

For more details on the Python API, please refer to its [reference](../docs/reference/api/python/index.md).
Services, on the other hand, allow deploying applications as public endpoints.

## Prerequisites
