Commit

Merge branch 'jerryjliu:main' into falkor-visual
gkorland authored Aug 30, 2023
2 parents c003c82 + bbb911c commit 3a6114c
Showing 95 changed files with 5,986 additions and 93 deletions.
27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,32 @@
# ChangeLog

## Unreleased

### Bug Fixes / Nits
- Improve openai fine-tuned model parsing (#7474)

## [0.8.13] - 2023-08-29

### New Features
- Add embedding finetuning (#7452)
- Added support for RunGPT LLM (#7401)
- Integration guide and notebook with DeepEval (#7425)
- Added `VectaraIndex` and `VectaraRetriever` as a managed index (#7440)
- Added support for `to_tool_list` to detect and use async functions (#7282)

## [0.8.12] - 2023-08-28

### New Features

- add openai finetuning class (#7442)
- Service Context to/from dict (#7395)
- add finetuning guide (#7429)

### Smaller Features / Nits / Bug Fixes
- Add example showing how to run FalkorDB with Docker (#7441)
- Update root.md to use the expected type from `get_response_synthesizer` (#7437)
- Bugfix: MonsterAPI Pydantic v2/v1 support; docs updated (#7432)

## [0.8.11.post3] - 2023-08-27

### New Features
8 changes: 8 additions & 0 deletions docs/api_reference/finetuning.rst
@@ -0,0 +1,8 @@
.. _Ref-Finetuning:

Finetuning
=============

.. automodule:: llama_index.finetuning
:members:
:inherited-members:
1 change: 1 addition & 0 deletions docs/api_reference/index.rst
@@ -24,5 +24,6 @@ API Reference for the ``llama-index`` package.
struct_store.rst
response.rst
playground.rst
finetuning.rst
example_notebooks.rst
langchain_integrations/base.rst
7 changes: 4 additions & 3 deletions docs/community/integrations.md
@@ -13,7 +13,7 @@ The full set of agent tools is found on [LlamaHub](https://llamahub.ai/)
The full set of supported LLMs is found [here](/core_modules/model_modules/llms/modules.md).


## Observability/Tracing
## Observability/Tracing/Evaluation

Check out our [one-click observability](/end_to_end_tutorials/one_click_observability.md) page
for full tracing integrations.
@@ -25,6 +25,7 @@ maxdepth: 1
/end_to_end_tutorials/one_click_observability.md
integrations/graphsignal.md
integrations/trulens.md
integrations/deepeval.md
```

@@ -38,16 +39,16 @@ Guardrails </examples/output_parsing/GuardrailsDemo.ipynb>
OpenAI Function Calling </examples/output_parsing/openai_pydantic_program.ipynb>
```

## Storage
## Storage and Managed Indexes
```{toctree}
---
maxdepth: 1
---
integrations/vector_stores.md
integrations/graph_stores.md
integrations/managed_indices.md
```


## Application Frameworks
```{toctree}
---
146 changes: 146 additions & 0 deletions docs/community/integrations/deepeval.md
@@ -0,0 +1,146 @@
# Unit Testing LLMs With DeepEval

[DeepEval](https://github.com/confident-ai/deepeval) provides unit testing for AI agents and LLM-powered applications. It offers a simple interface for LlamaIndex developers to write tests and helps ensure that AI applications run as expected.

DeepEval provides an opinionated framework to measure responses and is completely open-source.

## Installation and Setup

Adding [DeepEval](https://github.com/confident-ai/deepeval) is simple. Just install and configure it:

```sh
pip install -q llama-index
pip install -U deepeval
```

Once installed, you can set up and start writing tests.

```sh
# Optional step: Login to get a nice dashboard for your tests later!
# During this step - make sure to save your project as llama
deepeval login
deepeval test generate test_sample.py
```
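
The generated `test_sample.py` gives you a starting point. As a rough sketch of what such a test can look like (the test body here is hypothetical; the `FactualConsistencyMetric.measure(output=..., context=...)` call mirrors the example later on this page):

```python
# Hypothetical test file contents -- adapt to your own application.
from deepeval.metrics.factual_consistency import FactualConsistencyMetric


def test_factual_consistency():
    output = "Tokyo was formerly known as Edo."  # an LLM answer to check
    context = "Tokyo, known as Edo until 1868, is the capital of Japan."
    metric = FactualConsistencyMetric()
    metric.measure(output=output, context=context)
    assert metric.is_successful()
```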

You can then run tests as follows:

```bash
deepeval test run test_sample.py
```

After running this, you will get a dashboard like the one below:

![Sample dashboard](https://raw.githubusercontent.com/confident-ai/deepeval/main/docs/assets/dashboard-screenshot.png)

## Types of Tests

DeepEval presents an opinionated framework for the types of tests to run. It breaks down evaluation of LLM outputs into:
- Answer Relevancy - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/answer_relevancy)
- Factual Consistency (to measure the extent of hallucinations) - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/factual_consistency)
- Conceptual Similarity (to know if answers are in line with expectations) - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/conceptual_similarity)
- Toxicity - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/non_toxic)
- Bias (which can emerge from fine-tuning) - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/debias)

You can read more about the [DeepEval Framework](https://docs.confident-ai.com/docs/framework) here.

## Use With LlamaIndex

DeepEval integrates nicely with LlamaIndex's `ResponseEvaluator` class. Below is an example of a factual consistency check.

```python
from typing import List

from deepeval.metrics.factual_consistency import FactualConsistencyMetric

from llama_index import (
    TreeIndex,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.response.schema import Response
from llama_index.schema import Document
from llama_index.llms import OpenAI
from llama_index.evaluation import ResponseEvaluator

import openai

api_key = "sk-XXX"  # replace with your OpenAI API key
openai.api_key = api_key

# Use GPT-4 as the evaluation model
gpt4 = OpenAI(temperature=0, model="gpt-4", api_key=api_key)
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)
evaluator_gpt4 = ResponseEvaluator(service_context=service_context_gpt4)
```

### Getting a LlamaHub Loader

```python
from llama_index import download_loader

# Load Wikipedia data and build two indexes over it
WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
documents = loader.load_data(pages=["Tokyo"])
tree_index = TreeIndex.from_documents(documents=documents)
vector_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context_gpt4
)
```

We then build an evaluator that conforms to the `BaseEvaluator` interface, which requires an `evaluate` method.

In this example, we show you how to write a factual consistency check.

```python
class FactualConsistencyResponseEvaluator:
    def get_context(self, response: Response) -> List[Document]:
        """Get context information from the given Response object using its source nodes.

        Args:
            response (Response): Response object from an index query.

        Returns:
            List of Documents built from the source nodes' content.
        """
        context = []
        for context_info in response.source_nodes:
            context.append(Document(text=context_info.node.get_content()))
        return context

    def evaluate(self, response: Response) -> str:
        """Evaluate factual consistency of a response against its retrieved context."""
        answer = str(response)
        context = self.get_context(response)
        metric = FactualConsistencyMetric()
        context = " ".join([d.text for d in context])
        score = metric.measure(output=answer, context=context)
        if metric.is_successful():
            return "YES"
        else:
            return "NO"


evaluator = FactualConsistencyResponseEvaluator()
```

You can then run an evaluation as follows:

```python
query_engine = tree_index.as_query_engine()
response = query_engine.query("How did Tokyo get its name?")
eval_result = evaluator.evaluate(response)
```
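
Here, `eval_result` is `"YES"` when the factual consistency check passes and `"NO"` otherwise, per the `evaluate` method defined above.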

### Useful Links

* [Read About The DeepEval Framework](https://docs.confident-ai.com/docs/framework)
* [Answer Relevancy](https://docs.confident-ai.com/docs/measuring_llm_performance/answer_relevancy)
* [Conceptual Similarity](https://docs.confident-ai.com/docs/measuring_llm_performance/conceptual_similarity)
* [Bias](https://docs.confident-ai.com/docs/measuring_llm_performance/debias)
62 changes: 62 additions & 0 deletions docs/community/integrations/managed_indices.md
@@ -0,0 +1,62 @@
# Using Managed Indices

LlamaIndex offers multiple integration points with Managed Indices. A managed index is a special type of index that is not managed locally as part of LlamaIndex but instead is managed via an API, such as [Vectara](https://vectara.com).

## Using a Managed Index

Similar to any other index within LlamaIndex (tree, keyword table, list), any `ManagedIndex` can be constructed with a collection
of documents. Once constructed, the index can be used for querying.

If the index has already been populated with documents, it can also be used directly for querying.

`VectaraIndex` is currently the only supported managed index, although we expect more to be available soon.
Below we show how to use it.

**Vectara Index Construction/Querying**

Use the [Vectara Console](https://console.vectara.com/login) to create a corpus (aka index), and add an API key for access.
Then put the customer ID, corpus ID, and API key in your environment as shown below.
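
For example, in a shell (the values are placeholders for your own credentials):

```sh
export VECTARA_CUSTOMER_ID="<your-customer-id>"
export VECTARA_CORPUS_ID="<your-corpus-id>"
export VECTARA_API_KEY="<your-api-key>"
```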

Then construct the Vectara Index and query it as follows:

```python
import os

from llama_index import SimpleDirectoryReader
from llama_index.managed import VectaraIndex

# Load documents and build the index
vectara_customer_id = os.environ.get("VECTARA_CUSTOMER_ID")
vectara_corpus_id = os.environ.get("VECTARA_CORPUS_ID")
vectara_api_key = os.environ.get("VECTARA_API_KEY")
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectaraIndex.from_documents(
    documents,
    vectara_customer_id=vectara_customer_id,
    vectara_corpus_id=vectara_corpus_id,
    vectara_api_key=vectara_api_key,
)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

Note that if the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY` are already set, you do not have to specify them explicitly in your call; the `VectaraIndex` class will read them from the environment. For example, the following is equivalent to the above when these variables are set:

```python
from llama_index import SimpleDirectoryReader
from llama_index.managed import VectaraIndex

# Load documents and build the index
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectaraIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

```{toctree}
---
caption: Examples
maxdepth: 1
---
../../examples/vector_stores/VectaraDemo.ipynb
```
9 changes: 8 additions & 1 deletion docs/core_modules/model_modules/llms/modules.md
@@ -107,5 +107,12 @@ maxdepth: 1
maxdepth: 1
---
/examples/llm/monsterapi.ipynb
```

## RunGPT
```{toctree}
---
maxdepth: 1
---
/examples/llm/rungpt.ipynb
```
1 change: 1 addition & 0 deletions docs/core_modules/supporting_modules/evaluation/modules.md
@@ -10,4 +10,5 @@ maxdepth: 1
../../../examples/evaluation/TestNYC-Evaluation.ipynb
../../../examples/evaluation/TestNYC-Evaluation-Query.ipynb
../../../examples/evaluation/QuestionGeneration.ipynb
../../../examples/evaluation/Deepeval.ipynb
```
7 changes: 7 additions & 0 deletions docs/core_modules/supporting_modules/evaluation/root.md
@@ -42,6 +42,13 @@ if it matches the query.

In addition to evaluating queries, LlamaIndex can also use your data to generate questions to evaluate on. This means that you can automatically generate questions, and then run an evaluation pipeline to test if the LLM can actually answer questions accurately using your data.
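
A minimal sketch (the data path is a placeholder, and the import path is assumed; the `DatasetGenerator` calls mirror the usage pattern shown later on this page):

```python
from llama_index import SimpleDirectoryReader
from llama_index.evaluation import DatasetGenerator

# Generate evaluation questions directly from your own documents
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
data_generator = DatasetGenerator.from_documents(documents)
eval_questions = data_generator.generate_questions_from_nodes()
```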

## Integrations

We also integrate with community evaluation tools.

- [DeepEval](../../../community/integrations/deepeval.md)
- [Ragas](https://github.com/explodinggradients/ragas/blob/main/docs/integrations/llamaindex.ipynb)

## Usage Pattern

For full usage details, see the usage pattern below.
@@ -144,3 +144,10 @@ data_generator = DatasetGenerator.from_documents(documents)

eval_questions = data_generator.generate_questions_from_nodes()
```

## Integrations

We also integrate with community evaluation tools.

- [DeepEval](../../../community/integrations/deepeval.md)
- [Ragas](https://github.com/explodinggradients/ragas/blob/main/docs/integrations/llamaindex.ipynb)
18 changes: 18 additions & 0 deletions docs/end_to_end_tutorials/finetuning.md
@@ -36,6 +36,14 @@ We created a comprehensive repo/guide showing you how to finetune an open-source

Finetuning gives you a 5-10% increase in retrieval evaluation metrics. You can then plug this fine-tuned model into your RAG application with LlamaIndex.

```{toctree}
---
maxdepth: 1
---
Embedding Fine-tuning Guide </examples/finetuning/embeddings/finetune_embedding.ipynb>
```
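
Once the guide above produces a fine-tuned embedding model, plugging it into LlamaIndex can look like the following sketch (the `exp_finetune` output directory is a placeholder, and the `local:<path>` string form for `embed_model` is an assumption about the 0.8.x API):

```python
# A sketch only: "exp_finetune" stands in for the output directory produced
# by the embedding fine-tuning guide; "local:<path>" is how a local
# embedding model is assumed to be resolved here.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

service_context = ServiceContext.from_defaults(
    embed_model="local:exp_finetune"  # assumed fine-tuned model path
)
documents = SimpleDirectoryReader("./data").load_data()  # placeholder data dir
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```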

**Old**
```{toctree}
---
maxdepth: 1
@@ -52,6 +60,16 @@ We use GPT-4 to automatically generate questions from any unstructured context,

We then launch a finetuning job, and get back a distilled model. We can evaluate this model with [Ragas](https://github.com/explodinggradients/ragas) to benchmark against a naive GPT-3.5 pipeline.

```{toctree}
---
maxdepth: 1
---
GPT-3.5 Fine-tuning Notebook (Colab) <https://colab.research.google.com/drive/1NgyCJVyrC2xcZ5lxt2frTU862v6eJHlc?usp=sharing>
GPT-3.5 Fine-tuning Notebook </examples/finetuning/openai_fine_tuning.ipynb>
```
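
For orientation, a minimal sketch of the flow these notebooks walk through (assuming the `OpenAIFinetuneEngine` class added in 0.8.12; the events file name is a placeholder for your own logged dataset):

```python
# Assumed API from the `llama_index.finetuning` module; the events file is a
# placeholder for training data logged from your own pipeline.
from llama_index.finetuning import OpenAIFinetuneEngine

finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",
    "finetuning_events.jsonl",  # placeholder training events file
)
finetune_engine.finetune()  # launch the OpenAI fine-tuning job
ft_llm = finetune_engine.get_finetuned_model(temperature=0)
```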

**Old**

```{toctree}
---
maxdepth: 1