Merge branch 'master' into on_chain_start_fix

langchain-ai · Sep 19, 2024 · 1c4db33
2 parents 7bba1ea + c453b76

Showing 264 changed files with 5,003 additions and 4,830 deletions.
diff --git a/.github/DISCUSSION_TEMPLATE/q-a.yml b/.github/DISCUSSION_TEMPLATE/q-a.yml
@@ -96,22 +96,27 @@ body:
   - type: textarea
     id: system-info
     attributes:
+      label: System Info
       description: |
-        Please share your system info with us. Do NOT skip this step and please don't trim
-        the output. Most users don't include enough information here and it makes it harder
-        for us to help you.
+        Please share your system info with us. 
         
-        Run the following command in your terminal and paste the output here:
+        "pip freeze | grep langchain" 
+        platform (windows / linux / mac)
+        python version
         
-        python -m langchain_core.sys_info
+        OR if you're on a recent version of langchain-core you can paste the output of:
         
-        or if you have an existing python interpreter running:
+        python -m langchain_core.sys_info
+      placeholder: |
+        "pip freeze | grep langchain"
+        platform
+        python version
         
-        from langchain_core import sys_info
-        sys_info.print_sys_info()
+        Alternatively, if you're on a recent version of langchain-core you can paste the output of:
         
-        alternatively, put the entire output of `pip freeze` here.
-      placeholder: |
         python -m langchain_core.sys_info
+        
+        These will only surface LangChain packages, don't forget to include any other relevant
+        packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
     validations:
       required: true
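
The updated template asks for three things: the installed LangChain packages, the platform, and the Python version. For reporters who cannot run `python -m langchain_core.sys_info`, the following is a rough sketch that collects the same details using only the standard library. `collect_system_info` is a hypothetical helper and is not part of LangChain. It approximates `pip freeze | grep langchain` with a case-insensitive substring match on installed distribution names.

```python
import platform
from importlib import metadata


def collect_system_info(pattern: str = "langchain") -> dict:
    """Gather roughly what the Q&A template asks for (hypothetical helper).

    Approximates `pip freeze | grep langchain` by keeping installed
    distributions whose name contains `pattern` (case-insensitive).
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and pattern.lower() in name.lower():
            packages[name] = dist.version
    return {
        "platform": platform.system(),  # e.g. "Linux", "Darwin", "Windows"
        "python_version": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    info = collect_system_info()
    print(f"platform: {info['platform']}")
    print(f"python version: {info['python_version']}")
    for name, version in info["packages"].items():
        print(f"{name}=={version}")
```

As the template notes, this only surfaces LangChain packages. Other relevant dependencies still need to be listed by hand.
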
diff --git a/.github/workflows/_release.yml b/.github/workflows/_release.yml
@@ -85,7 +85,7 @@ jobs:
           path: langchain
           sparse-checkout: | # this only grabs files for relevant dir
             ${{ inputs.working-directory }}
-          ref: master # this scopes to just master branch
+          ref: ${{ github.ref }} # this scopes to just ref'd branch
           fetch-depth: 0 # this fetches entire commit history
       - name: Check Tags
         id: check-tags

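With `ref: ${{ github.ref }}`, the checkout follows the ref that triggered the workflow instead of always checking out `master`. That ref is fully qualified, for example `refs/heads/<branch>` or `refs/tags/<tag>`. The illustrative sketch below shows how such a ref maps to a short branch or tag name, reading the `GITHUB_REF` environment variable that GitHub Actions sets for each step. `short_ref_name` is a hypothetical helper and is not used by this workflow.

```python
import os


def short_ref_name(ref: str) -> str:
    """Strip the `refs/heads/` or `refs/tags/` prefix from a full Git ref.

    Other refs (e.g. `refs/pull/123/merge`) are returned unchanged.
    """
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


if __name__ == "__main__":
    # Outside GitHub Actions GITHUB_REF is unset, so fall back to the old default.
    ref = os.environ.get("GITHUB_REF", "refs/heads/master")
    print(f"checking out {short_ref_name(ref)}")
```
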
diff --git a/docs/docs/integrations/document_loaders/amazon_textract.ipynb b/docs/docs/integrations/document_loaders/amazon_textract.ipynb
@@ -13,7 +13,7 @@
     "\n",
     "This sample demonstrates the use of `Amazon Textract` in combination with LangChain as a DocumentLoader.\n",
     "\n",
-    "`Textract` supports`PDF`, `TIF`F, `PNG` and `JPEG` format.\n",
+    "`Textract` supports`PDF`, `TIFF`, `PNG` and `JPEG` format.\n",
     "\n",
     "`Textract` supports these [document sizes, languages and characters](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html)."
    ]

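The formats listed in the notebook (PDF, TIFF, PNG, and JPEG) can be screened before a file is handed to the loader. The sketch below is minimal and assumes the file extension reflects the real format. `TEXTRACT_SUFFIXES` and `is_textract_format` are hypothetical names, and the extension list is derived from the notebook text rather than taken from the AWS documentation.

```python
from pathlib import Path

# Extensions for the formats named in the notebook: PDF, TIFF, PNG, JPEG.
# Assumption: the extension matches the actual file contents.
TEXTRACT_SUFFIXES = {".pdf", ".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def is_textract_format(path: str) -> bool:
    """Return True if the path's extension matches a listed Textract format.

    Works for local paths and for URIs such as `s3://bucket/doc.pdf`,
    since only the final suffix is inspected.
    """
    return Path(path).suffix.lower() in TEXTRACT_SUFFIXES
```
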
diff --git a/docs/docs/integrations/document_loaders/google_speech_to_text.ipynb b/docs/docs/integrations/document_loaders/google_speech_to_text.ipynb
@@ -6,7 +6,7 @@
    "source": [
     "# Google Speech-to-Text Audio Transcripts\n",
     "\n",
-    "The `GoogleSpeechToTextLoader` allows to transcribe audio files with the [Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text) and loads the transcribed text into documents.\n",
+    "The `SpeechToTextLoader` allows to transcribe audio files with the [Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text) and loads the transcribed text into documents.\n",
     "\n",
     "To use it, you should have the `google-cloud-speech` python package installed, and a Google Cloud project with the [Speech-to-Text API enabled](https://cloud.google.com/speech-to-text/v2/docs/transcribe-client-libraries#before_you_begin).\n",
     "\n",
@@ -41,7 +41,7 @@
    "source": [
     "## Example\n",
     "\n",
-    "The `GoogleSpeechToTextLoader` must include the `project_id` and `file_path` arguments. Audio files can be specified as a Google Cloud Storage URI (`gs://...`) or a local file path.\n",
+    "The `SpeechToTextLoader` must include the `project_id` and `file_path` arguments. Audio files can be specified as a Google Cloud Storage URI (`gs://...`) or a local file path.\n",
     "\n",
     "Only synchronous requests are supported by the loader, which has a [limit of 60 seconds or 10MB](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize#:~:text=60%20seconds%20and/or%2010%20MB) per audio file."
    ]
@@ -52,13 +52,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from langchain_google_community import GoogleSpeechToTextLoader\n",
+    "from langchain_google_community import SpeechToTextLoader\n",
     "\n",
     "project_id = \"<PROJECT_ID>\"\n",
     "file_path = \"gs://cloud-samples-data/speech/audio.flac\"\n",
     "# or a local file path: file_path = \"./audio.wav\"\n",
     "\n",
-    "loader = GoogleSpeechToTextLoader(project_id=project_id, file_path=file_path)\n",
+    "loader = SpeechToTextLoader(project_id=project_id, file_path=file_path)\n",
     "\n",
     "docs = loader.load()"
    ]
@@ -152,7 +152,7 @@
     "    RecognitionConfig,\n",
     "    RecognitionFeatures,\n",
     ")\n",
-    "from langchain_google_community import GoogleSpeechToTextLoader\n",
+    "from langchain_google_community import SpeechToTextLoader\n",
     "\n",
     "project_id = \"<PROJECT_ID>\"\n",
     "location = \"global\"\n",
@@ -171,7 +171,7 @@
     "    ),\n",
     ")\n",
     "\n",
-    "loader = GoogleSpeechToTextLoader(\n",
+    "loader = SpeechToTextLoader(\n",
     "    project_id=project_id,\n",
     "    location=location,\n",
     "    recognizer_id=recognizer_id,\n",

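The loader only issues synchronous requests, which the notebook says are limited to 60 seconds or 10 MB per audio file. Because of that, it can help to check a local file before calling `load()`. The sketch below uses only the standard library. `check_sync_limits` is a hypothetical helper, and it handles only uncompressed local WAV files, not FLAC or `gs://` URIs. The limits are copied from the notebook text, and "10 MB" is read conservatively as 10,000,000 bytes.

```python
import os
import wave

MAX_SECONDS = 60
MAX_BYTES = 10_000_000  # conservative (decimal) reading of "10 MB"


def check_sync_limits(path: str) -> tuple[float, int]:
    """Return (duration_seconds, size_bytes) for a local WAV file.

    Raises ValueError if the file exceeds either synchronous-request limit.
    """
    size = os.path.getsize(path)
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_SECONDS:
        raise ValueError(f"{path}: {duration:.1f}s exceeds {MAX_SECONDS}s limit")
    if size > MAX_BYTES:
        raise ValueError(f"{path}: {size} bytes exceeds {MAX_BYTES} byte limit")
    return duration, size
```

For example, calling `check_sync_limits("./audio.wav")` before `SpeechToTextLoader(project_id=project_id, file_path="./audio.wav")` surfaces an over-long file locally instead of as an API error.
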
diff --git a/docs/docs/integrations/document_loaders/unstructured_file.ipynb b/docs/docs/integrations/document_loaders/unstructured_file.ipynb
@@ -16,7 +16,7 @@
     "\n",
     "| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/document_loaders/file_loaders/unstructured/)|\n",
     "| :--- | :--- | :---: | :---: |  :---: |\n",
-    "| [UnstructuredLoader](https://python.langchain.com/api_reference/unstructured/document_loaders/langchain_unstructured.document_loaders.UnstructuredLoader.html) | [langchain_community](https://python.langchain.com/api_reference/unstructured/index.html) | ✅ | ❌ | ✅ | \n",
+    "| [UnstructuredLoader](https://python.langchain.com/api_reference/unstructured/document_loaders/langchain_unstructured.document_loaders.UnstructuredLoader.html) | [langchain_unstructured](https://python.langchain.com/api_reference/unstructured/index.html) | ✅ | ❌ | ✅ | \n",
     "### Loader features\n",
     "| Source | Document Lazy Loading | Native Async Support\n",
     "| :---: | :---: | :---: | \n",
@@ -519,6 +519,47 @@
     "print(\"Length of text in the document:\", len(docs[0].page_content))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3ec3c22d-02cd-498b-921f-b839d1404f32",
+   "metadata": {},
+   "source": [
+    "## Loading web pages\n",
+    "\n",
+    "`UnstructuredLoader` accepts a `web_url` kwarg when run locally that populates the `url` parameter of the underlying Unstructured [partition](https://docs.unstructured.io/open-source/core-functionality/partitioning). This allows for the parsing of remotely hosted documents, such as HTML web pages.\n",
+    "\n",
+    "Example usage:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "bf9a8546-659d-4861-bff2-fdf1ad93ac65",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "page_content='Example Domain' metadata={'category_depth': 0, 'languages': ['eng'], 'filetype': 'text/html', 'url': 'https://www.example.com', 'category': 'Title', 'element_id': 'fdaa78d856f9d143aeeed85bf23f58f8'}\n",
+      "\n",
+      "page_content='This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.' metadata={'languages': ['eng'], 'parent_id': 'fdaa78d856f9d143aeeed85bf23f58f8', 'filetype': 'text/html', 'url': 'https://www.example.com', 'category': 'NarrativeText', 'element_id': '3652b8458b0688639f973fe36253c992'}\n",
+      "\n",
+      "page_content='More information...' metadata={'category_depth': 0, 'link_texts': ['More information...'], 'link_urls': ['https://www.iana.org/domains/example'], 'languages': ['eng'], 'filetype': 'text/html', 'url': 'https://www.example.com', 'category': 'Title', 'element_id': '793ab98565d6f6d6f3a6d614e3ace2a9'}\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_unstructured import UnstructuredLoader\n",
+    "\n",
+    "loader = UnstructuredLoader(web_url=\"https://www.example.com\")\n",
+    "docs = loader.load()\n",
+    "\n",
+    "for doc in docs:\n",
+    "    print(f\"{doc}\\n\")"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "ce01aa40",
@@ -546,7 +587,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.13"
+   "version": "3.10.4"
   }
  },
  "nbformat": 4,

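In the web-page output above, every element carries an `element_id`, and the `NarrativeText` element points back to its `Title` through `parent_id`. The rough sketch below regroups loaded elements under their top-level elements using only that metadata. `group_by_parent` is hypothetical. It relies only on the `page_content` and `metadata` attributes of the returned documents and handles a single level of nesting: children whose parent is not a top-level element are skipped.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_by_parent(docs: Iterable) -> list[tuple[str, list[str]]]:
    """Pair each top-level element's text with its direct children's text.

    Top-level elements are those without a `parent_id`; document order is kept.
    """
    order = []  # element_ids of top-level elements, in document order
    texts = {}  # element_id -> page_content for top-level elements
    children = defaultdict(list)  # parent_id -> child page_content values
    for doc in docs:
        meta = doc.metadata
        parent = meta.get("parent_id")
        if parent is None:
            order.append(meta.get("element_id"))
            texts[meta.get("element_id")] = doc.page_content
        else:
            children[parent].append(doc.page_content)
    return [(texts[eid], children.get(eid, [])) for eid in order]
```

Applied to the `example.com` output, this pairs `'Example Domain'` with its narrative paragraph and leaves `'More information...'` with no children.
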
diff --git a/docs/docs/integrations/llms/sambanova.ipynb b/docs/docs/integrations/llms/sambanova.ipynb
@@ -6,129 +6,11 @@
    "source": [
     "# SambaNova\n",
     "\n",
-    "**[SambaNova](https://sambanova.ai/)'s** [Sambaverse](https://sambaverse.sambanova.ai/) and [Sambastudio](https://sambanova.ai/technology/full-stack-ai-platform) are platforms for running your own open-source models\n",
+    "**[SambaNova](https://sambanova.ai/)'s** [Sambastudio](https://sambanova.ai/technology/full-stack-ai-platform) is a platform for running your own open-source models\n",
     "\n",
     "This example goes over how to use LangChain to interact with SambaNova models"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Sambaverse"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "**Sambaverse** allows you to interact with multiple open-source models. You can view the list of available models and interact with them in the [playground](https://sambaverse.sambanova.ai/playground).\n",
-    " **Please note that Sambaverse's free offering is performance-limited.** Companies that are ready to evaluate the production tokens-per-second performance, volume throughput, and 10x lower total cost of ownership (TCO) of SambaNova should [contact us](https://sambaverse.sambanova.ai/contact-us) for a non-limited evaluation instance."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "An API key is required to access Sambaverse models. To get a key, create an account at [sambaverse.sambanova.ai](https://sambaverse.sambanova.ai/)\n",
-    "\n",
-    "The [sseclient-py](https://pypi.org/project/sseclient-py/) package is required to run streaming predictions "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%pip install --quiet sseclient-py==1.8.0"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Register your API key as an environment variable:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "\n",
-    "sambaverse_api_key = \"<Your sambaverse API key>\"\n",
-    "\n",
-    "# Set the environment variables\n",
-    "os.environ[\"SAMBAVERSE_API_KEY\"] = sambaverse_api_key"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Call Sambaverse models directly from LangChain!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from langchain_community.llms.sambanova import Sambaverse\n",
-    "\n",
-    "llm = Sambaverse(\n",
-    "    sambaverse_model_name=\"Meta/llama-2-7b-chat-hf\",\n",
-    "    streaming=False,\n",
-    "    model_kwargs={\n",
-    "        \"do_sample\": True,\n",
-    "        \"max_tokens_to_generate\": 1000,\n",
-    "        \"temperature\": 0.01,\n",
-    "        \"select_expert\": \"llama-2-7b-chat-hf\",\n",
-    "        \"process_prompt\": False,\n",
-    "        # \"stop_sequences\": '\\\"sequence1\\\",\\\"sequence2\\\"',\n",
-    "        # \"repetition_penalty\":  1.0,\n",
-    "        # \"top_k\": 50,\n",
-    "        # \"top_p\": 1.0\n",
-    "    },\n",
-    ")\n",
-    "\n",
-    "print(llm.invoke(\"Why should I use open source models?\"))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Streaming response\n",
-    "\n",
-    "from langchain_community.llms.sambanova import Sambaverse\n",
-    "\n",
-    "llm = Sambaverse(\n",
-    "    sambaverse_model_name=\"Meta/llama-2-7b-chat-hf\",\n",
-    "    streaming=True,\n",
-    "    model_kwargs={\n",
-    "        \"do_sample\": True,\n",
-    "        \"max_tokens_to_generate\": 1000,\n",
-    "        \"temperature\": 0.01,\n",
-    "        \"select_expert\": \"llama-2-7b-chat-hf\",\n",
-    "        \"process_prompt\": False,\n",
-    "        # \"stop_sequences\": '\\\"sequence1\\\",\\\"sequence2\\\"',\n",
-    "        # \"repetition_penalty\":  1.0,\n",
-    "        # \"top_k\": 50,\n",
-    "        # \"top_p\": 1.0\n",
-    "    },\n",
-    ")\n",
-    "\n",
-    "for chunk in llm.stream(\"Why should I use open source models?\"):\n",
-    "    print(chunk, end=\"\", flush=True)"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},

diff --git a/docs/docs/integrations/providers/mlflow.mdx b/docs/docs/integrations/providers/mlflow.mdx
@@ -1,12 +1,12 @@
-# MLflow Deployments for LLMs
+# MLflow AI Gateway for LLMs
 
->[The MLflow Deployments for LLMs](https://www.mlflow.org/docs/latest/llms/deployments/index.html) is a powerful tool designed to streamline the usage and management of various large
+>[The MLflow AI Gateway for LLMs](https://www.mlflow.org/docs/latest/llms/deployments/index.html) is a powerful tool designed to streamline the usage and management of various large
 > language model (LLM) providers, such as OpenAI and Anthropic, within an organization. It offers a high-level interface
 > that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM related requests.
 
 ## Installation and Setup
 
-Install `mlflow` with MLflow Deployments dependencies:
+Install `mlflow` with MLflow GenAI dependencies:
 
 ```sh
 pip install 'mlflow[genai]'
@@ -39,10 +39,10 @@ endpoints:
         openai_api_key: $OPENAI_API_KEY
 ```
 
-Start the deployments server:
+Start the gateway server:
 
 ```sh
-mlflow deployments start-server --config-path /path/to/config.yaml
+mlflow gateway start --config-path /path/to/config.yaml
 ```
 
 ## Example provided by `MLflow`