Commit

Merge branch 'jerryjliu:main' into falkor-visual
gkorland authored Aug 30, 2023
2 parents c003c82 + bbb911c commit 3a6114c
Showing 95 changed files with 5,986 additions and 93 deletions.
27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,32 @@
# ChangeLog

## Unreleased

### Bug Fixes / Nits
- Improve openai fine-tuned model parsing (#7474)

## [0.8.13] - 2023-08-29

### New Features
- Add embedding finetuning (#7452)
- Added support for RunGPT LLM (#7401)
- Integration guide and notebook with DeepEval (#7425)
- Added `VectaraIndex` and `VectaraRetriever` as a managed index (#7440)
- Added support for `to_tool_list` to detect and use async functions (#7282)

## [0.8.12] - 2023-08-28

### New Features

- add openai finetuning class (#7442)
- Service Context to/from dict (#7395)
- add finetuning guide (#7429)

### Smaller Features / Nits / Bug Fixes
- Add example showing how to run FalkorDB with Docker (#7441)
- Update root.md to use the expected type from `get_response_synthesizer` (#7437)
- Bugfix: MonsterAPI Pydantic v2/v1 support; docs updated (#7432)

## [0.8.11.post3] - 2023-08-27

### New Features
8 changes: 8 additions & 0 deletions docs/api_reference/finetuning.rst
@@ -0,0 +1,8 @@
.. _Ref-Finetuning:

Finetuning
=============

.. automodule:: llama_index.finetuning
:members:
:inherited-members:
1 change: 1 addition & 0 deletions docs/api_reference/index.rst
@@ -24,5 +24,6 @@ API Reference for the ``llama-index`` package.
struct_store.rst
response.rst
playground.rst
finetuning.rst
example_notebooks.rst
langchain_integrations/base.rst
7 changes: 4 additions & 3 deletions docs/community/integrations.md
@@ -13,7 +13,7 @@ The full set of agent tools is found on [LlamaHub](https://llamahub.ai/)
The full set of supported LLMs is found [here](/core_modules/model_modules/llms/modules.md).


## Observability/Tracing
## Observability/Tracing/Evaluation

Check out our [one-click observability](/end_to_end_tutorials/one_click_observability.md) page
for full tracing integrations.
@@ -25,6 +25,7 @@ maxdepth: 1
/end_to_end_tutorials/one_click_observability.md
integrations/graphsignal.md
integrations/trulens.md
integrations/deepeval.md
```

@@ -38,16 +39,16 @@ Guardrails </examples/output_parsing/GuardrailsDemo.ipynb>
OpenAI Function Calling </examples/output_parsing/openai_pydantic_program.ipynb>
```

## Storage
## Storage and Managed Indexes
```{toctree}
---
maxdepth: 1
---
integrations/vector_stores.md
integrations/graph_stores.md
integrations/managed_indices.md
```


## Application Frameworks
```{toctree}
---
146 changes: 146 additions & 0 deletions docs/community/integrations/deepeval.md
@@ -0,0 +1,146 @@
# Unit Testing LLMs With DeepEval

[DeepEval](https://github.com/confident-ai/deepeval) provides unit testing for AI agents and LLM-powered applications. It offers a simple interface for LlamaIndex developers to write tests and helps ensure that AI applications run as expected.

DeepEval provides an opinionated framework to measure responses and is completely open-source.

## Installation and Setup

Adding [DeepEval](https://github.com/confident-ai/deepeval) is simple. Just install and configure it:

```sh
pip install -q llama-index
pip install -U deepeval
```

Once installed, you can set up and start writing tests.

```sh
# Optional step: Login to get a nice dashboard for your tests later!
# During this step - make sure to save your project as llama
deepeval login
deepeval test generate test_sample.py
```
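
The generated `test_sample.py` gives you a starting point. As a rough sketch of what such a test can look like (the test body here is hypothetical; the `FactualConsistencyMetric.measure(output=..., context=...)` call mirrors the example later on this page):

```python
# Hypothetical test file contents -- adapt to your own application.
from deepeval.metrics.factual_consistency import FactualConsistencyMetric


def test_factual_consistency():
    output = "Tokyo was formerly known as Edo."  # an LLM answer to check
    context = "Tokyo, known as Edo until 1868, is the capital of Japan."
    metric = FactualConsistencyMetric()
    metric.measure(output=output, context=context)
    assert metric.is_successful()
```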

You can then run tests as follows:

```bash
deepeval test run test_sample.py
```

After running this, you will get a dashboard like the one below:

![Sample dashboard](https://raw.githubusercontent.com/confident-ai/deepeval/main/docs/assets/dashboard-screenshot.png)

## Types of Tests

DeepEval presents an opinionated framework for the types of tests to run. It breaks down evaluation of LLM outputs into:
- Answer Relevancy - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/answer_relevancy)
- Factual Consistency (to measure the extent of hallucinations) - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/factual_consistency)
- Conceptual Similarity (to know if answers are in line with expectations) - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/conceptual_similarity)
- Toxicity - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/non_toxic)
- Bias (which can emerge from fine-tuning) - [Read more here](https://docs.confident-ai.com/docs/measuring_llm_performance/debias)

You can read more about the [DeepEval Framework](https://docs.confident-ai.com/docs/framework) here.

## Use With LlamaIndex

DeepEval integrates nicely with LlamaIndex's `ResponseEvaluator` class. Below is an example of a factual consistency check.

```python
from typing import List

from deepeval.metrics.factual_consistency import FactualConsistencyMetric

from llama_index import (
    TreeIndex,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.response.schema import Response
from llama_index.schema import Document
from llama_index.llms import OpenAI
from llama_index.evaluation import ResponseEvaluator

import openai

api_key = "sk-XXX"  # replace with your OpenAI API key
openai.api_key = api_key

# Use GPT-4 as the evaluation model
gpt4 = OpenAI(temperature=0, model="gpt-4", api_key=api_key)
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)
evaluator_gpt4 = ResponseEvaluator(service_context=service_context_gpt4)
```

### Getting a LlamaHub Loader

```python
from llama_index import download_loader

# Load Wikipedia data and build two indexes over it
WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
documents = loader.load_data(pages=["Tokyo"])
tree_index = TreeIndex.from_documents(documents=documents)
vector_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context_gpt4
)
```

We then build an evaluator that conforms to the `BaseEvaluator` interface, which requires an `evaluate` method.

In this example, we show you how to write a factual consistency check.

```python
class FactualConsistencyResponseEvaluator:
    def get_context(self, response: Response) -> List[Document]:
        """Get context information from the given Response object using its source nodes.

        Args:
            response (Response): Response object from an index query.

        Returns:
            List of Documents built from the source nodes' content.
        """
        context = []
        for context_info in response.source_nodes:
            context.append(Document(text=context_info.node.get_content()))
        return context

    def evaluate(self, response: Response) -> str:
        """Evaluate factual consistency of a response against its retrieved context."""
        answer = str(response)
        context = self.get_context(response)
        metric = FactualConsistencyMetric()
        context = " ".join([d.text for d in context])
        score = metric.measure(output=answer, context=context)
        if metric.is_successful():
            return "YES"
        else:
            return "NO"


evaluator = FactualConsistencyResponseEvaluator()
```

You can then run an evaluation as follows:

```python
query_engine = tree_index.as_query_engine()
response = query_engine.query("How did Tokyo get its name?")
eval_result = evaluator.evaluate(response)
```
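
Here, `eval_result` is `"YES"` when the factual consistency check passes and `"NO"` otherwise, per the `evaluate` method defined above.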

### Useful Links

* [Read About The DeepEval Framework](https://docs.confident-ai.com/docs/framework)
* [Answer Relevancy](https://docs.confident-ai.com/docs/measuring_llm_performance/answer_relevancy)
* [Conceptual Similarity](https://docs.confident-ai.com/docs/measuring_llm_performance/conceptual_similarity)
* [Bias](https://docs.confident-ai.com/docs/measuring_llm_performance/debias)
62 changes: 62 additions & 0 deletions docs/community/integrations/managed_indices.md
@@ -0,0 +1,62 @@
# Using Managed Indices

LlamaIndex offers multiple integration points with Managed Indices. A managed index is a special type of index that is not managed locally as part of LlamaIndex but instead is managed via an API, such as [Vectara](https://vectara.com).

## Using a Managed Index

Similar to any other index within LlamaIndex (tree, keyword table, list), any `ManagedIndex` can be constructed with a collection
of documents. Once constructed, the index can be used for querying.

If the index has already been populated with documents, it can also be used directly for querying.

`VectaraIndex` is currently the only supported managed index, although we expect more to be available soon.
Below we show how to use it.

**Vectara Index Construction/Querying**

Use the [Vectara Console](https://console.vectara.com/login) to create a corpus (aka index), and add an API key for access.
Then put the customer ID, corpus ID, and API key in your environment as shown below.
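
For example, in a shell (the values are placeholders for your own credentials):

```sh
export VECTARA_CUSTOMER_ID="<your-customer-id>"
export VECTARA_CORPUS_ID="<your-corpus-id>"
export VECTARA_API_KEY="<your-api-key>"
```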

Then construct the Vectara Index and query it as follows:

```python
import os

from llama_index import SimpleDirectoryReader
from llama_index.managed import VectaraIndex

# Load documents and build the index
vectara_customer_id = os.environ.get("VECTARA_CUSTOMER_ID")
vectara_corpus_id = os.environ.get("VECTARA_CORPUS_ID")
vectara_api_key = os.environ.get("VECTARA_API_KEY")
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectaraIndex.from_documents(
    documents,
    vectara_customer_id=vectara_customer_id,
    vectara_corpus_id=vectara_corpus_id,
    vectara_api_key=vectara_api_key,
)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

Note that if the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY` are already set, you do not have to specify them explicitly in your call; the `VectaraIndex` class will read them from the environment. For example, the following is equivalent to the above when these variables are set:

```python
from llama_index import SimpleDirectoryReader
from llama_index.managed import VectaraIndex

# Load documents and build the index
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectaraIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

```{toctree}
---
caption: Examples
maxdepth: 1
---
../../examples/vector_stores/VectaraDemo.ipynb
```
9 changes: 8 additions & 1 deletion docs/core_modules/model_modules/llms/modules.md
@@ -107,5 +107,12 @@ maxdepth: 1
maxdepth: 1
---
/examples/llm/monsterapi.ipynb
```

## RunGPT
```{toctree}
---
maxdepth: 1
---
/examples/llm/rungpt.ipynb
```
1 change: 1 addition & 0 deletions docs/core_modules/supporting_modules/evaluation/modules.md
@@ -10,4 +10,5 @@ maxdepth: 1
../../../examples/evaluation/TestNYC-Evaluation.ipynb
../../../examples/evaluation/TestNYC-Evaluation-Query.ipynb
../../../examples/evaluation/QuestionGeneration.ipynb
../../../examples/evaluation/Deepeval.ipynb
```
7 changes: 7 additions & 0 deletions docs/core_modules/supporting_modules/evaluation/root.md
@@ -42,6 +42,13 @@ if it matches the query.

In addition to evaluating queries, LlamaIndex can also use your data to generate questions to evaluate on. This means that you can automatically generate questions, and then run an evaluation pipeline to test if the LLM can actually answer questions accurately using your data.
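
A minimal sketch (the data path is a placeholder, and the import path is assumed; the `DatasetGenerator` calls mirror the usage pattern shown later on this page):

```python
from llama_index import SimpleDirectoryReader
from llama_index.evaluation import DatasetGenerator

# Generate evaluation questions directly from your own documents
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
data_generator = DatasetGenerator.from_documents(documents)
eval_questions = data_generator.generate_questions_from_nodes()
```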

## Integrations

We also integrate with community evaluation tools.

- [DeepEval](../../../community/integrations/deepeval.md)
- [Ragas](https://github.com/explodinggradients/ragas/blob/main/docs/integrations/llamaindex.ipynb)

## Usage Pattern

For full usage details, see the usage pattern below.
@@ -144,3 +144,10 @@ data_generator = DatasetGenerator.from_documents(documents)

eval_questions = data_generator.generate_questions_from_nodes()
```

## Integrations

We also integrate with community evaluation tools.

- [DeepEval](../../../community/integrations/deepeval.md)
- [Ragas](https://github.com/explodinggradients/ragas/blob/main/docs/integrations/llamaindex.ipynb)
18 changes: 18 additions & 0 deletions docs/end_to_end_tutorials/finetuning.md
@@ -36,6 +36,14 @@ We created a comprehensive repo/guide showing you how to finetune an open-source

Finetuning gives you a 5-10% increase in retrieval evaluation metrics. You can then plug this fine-tuned model into your RAG application with LlamaIndex.

```{toctree}
---
maxdepth: 1
---
Embedding Fine-tuning Guide </examples/finetuning/embeddings/finetune_embedding.ipynb>
```
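
Once the guide above produces a fine-tuned embedding model, plugging it into LlamaIndex can look like the following sketch (the `exp_finetune` output directory is a placeholder, and the `local:<path>` string form for `embed_model` is an assumption about the 0.8.x API):

```python
# A sketch only: "exp_finetune" stands in for the output directory produced
# by the embedding fine-tuning guide; "local:<path>" is how a local
# embedding model is assumed to be resolved here.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

service_context = ServiceContext.from_defaults(
    embed_model="local:exp_finetune"  # assumed fine-tuned model path
)
documents = SimpleDirectoryReader("./data").load_data()  # placeholder data dir
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```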

**Old**
```{toctree}
---
maxdepth: 1
@@ -52,6 +60,16 @@ We use GPT-4 to automatically generate questions from any unstructured context,

We then launch a finetuning job, and get back a distilled model. We can evaluate this model with [Ragas](https://github.com/explodinggradients/ragas) to benchmark against a naive GPT-3.5 pipeline.

```{toctree}
---
maxdepth: 1
---
GPT-3.5 Fine-tuning Notebook (Colab) <https://colab.research.google.com/drive/1NgyCJVyrC2xcZ5lxt2frTU862v6eJHlc?usp=sharing>
GPT-3.5 Fine-tuning Notebook </examples/finetuning/openai_fine_tuning.ipynb>
```
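
For orientation, a minimal sketch of the flow these notebooks walk through (assuming the `OpenAIFinetuneEngine` class added in 0.8.12; the events file name is a placeholder for your own logged dataset):

```python
# Assumed API from the `llama_index.finetuning` module; the events file is a
# placeholder for training data logged from your own pipeline.
from llama_index.finetuning import OpenAIFinetuneEngine

finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",
    "finetuning_events.jsonl",  # placeholder training events file
)
finetune_engine.finetune()  # launch the OpenAI fine-tuning job
ft_llm = finetune_engine.get_finetuned_model(temperature=0)
```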

**Old**

```{toctree}
---
maxdepth: 1