Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unstructured[major]: Unstructured partner package #6227

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -30,18 +30,18 @@
"\n",
"```\n",
"\n",
"This notebook provides a quick overview for getting started with `UnstructuredLoader` [document loaders](/docs/concepts/#document-loaders). For detailed documentation of all `UnstructuredLoader` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html).\n",
"This notebook provides a quick overview for getting started with `UnstructuredLoader` [document loaders](/docs/concepts/#document-loaders). For detailed documentation of all `UnstructuredLoader` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_unstructured.UnstructuredLoader.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Compatibility | Local | [PY support](https://python.langchain.com/docs/integrations/document_loaders/unstructured_file) | \n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [UnstructuredLoader](https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html) | [@langchain/community](https://api.js.langchain.com/modules/langchain_community_document_loaders_fs_unstructured.html) | Node-only | ✅ | ✅ |\n",
"| [UnstructuredLoader](https://api.js.langchain.com/classes/langchain_unstructured.UnstructuredLoader.html) | [@langchain/community](https://api.js.langchain.com/modules/langchain_unstructured.html) | Node-only | ✅ | ✅ |\n",
"\n",
"## Setup\n",
"\n",
"To access `UnstructuredLoader` document loader you'll need to install the `@langchain/community` integration package, and create an Unstructured account and get an API key.\n",
"To access `UnstructuredLoader` document loader you'll need to install the `@langchain/unstructured` integration package, and create an Unstructured account and get an API key.\n",
"\n",
"### Local\n",
"\n",
Expand All @@ -61,7 +61,7 @@
"\n",
"### Installation\n",
"\n",
"The LangChain UnstructuredLoader integration lives in the `@langchain/community` package:\n",
"The LangChain UnstructuredLoader integration lives in the `@langchain/unstructured` package:\n",
"\n",
"```{=mdx}\n",
"import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
Expand All @@ -70,7 +70,7 @@
"<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
"\n",
"<Npm2Yarn>\n",
" @langchain/community\n",
" @langchain/unstructured\n",
"</Npm2Yarn>\n",
"\n",
"```"
Expand All @@ -91,9 +91,11 @@
"metadata": {},
"outputs": [],
"source": [
"import { UnstructuredLoader } from \"@langchain/community/document_loaders/fs/unstructured\"\n",
"import { UnstructuredLoader } from \"@langchain/unstructured\"\n",
"\n",
"const loader = new UnstructuredLoader(\"../../../../../../examples/src/document_loaders/example_data/notion.md\")"
"const loader = new UnstructuredLoader({\n",
" filePath: \"../../../../../../examples/src/document_loaders/example_data/notion.md\",\n",
"})"
]
},
{
Expand Down Expand Up @@ -211,13 +213,8 @@
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all UnstructuredLoader features and configurations head to the API reference: https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html"
"For detailed documentation of all UnstructuredLoader features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_unstructured.UnstructuredLoader.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
Expand Down
1 change: 1 addition & 0 deletions examples/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@
"@langchain/redis": "workspace:*",
"@langchain/scripts": "~0.0.20",
"@langchain/textsplitters": "workspace:*",
"@langchain/unstructured": "workspace:*",
"@langchain/weaviate": "workspace:*",
"@langchain/yandex": "workspace:*",
"@layerup/layerup-security": "^1.5.12",
Expand Down
6 changes: 4 additions & 2 deletions examples/src/document_loaders/unstructured.ts
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
import { UnstructuredLoader } from "@langchain/community/document_loaders/fs/unstructured";
import { UnstructuredLoader } from "@langchain/unstructured";

const options = {
apiKey: "MY_API_KEY",
};

const loader = new UnstructuredLoader(
"src/document_loaders/example_data/notion.md",
{
filePath: "src/document_loaders/example_data/notion.md",
},
options
);
const docs = await loader.load();
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,9 @@ export type SkipInferTableTypes =
*/
export type ChunkingStrategy = "None" | "by_title";

/**
* @deprecated Unstructured now has a standalone partner package with LangChain. Please download `@langchain/unstructured` instead.
*/
export type UnstructuredLoaderOptions = {
apiKey?: string;
apiUrl?: string;
Expand Down Expand Up @@ -127,6 +130,8 @@ export type UnstructuredMemoryLoaderOptions = {
};

/**
* @deprecated Unstructured now has a standalone partner package with LangChain. Please download `@langchain/unstructured` instead.
*
* A document loader that uses the Unstructured API to load unstructured
* documents. It supports both the new syntax with options object and the
* legacy syntax for backward compatibility. The load() method sends a
Expand Down
74 changes: 74 additions & 0 deletions libs/langchain-unstructured/.eslintrc.cjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
module.exports = {
extends: [
"airbnb-base",
"eslint:recommended",
"prettier",
"plugin:@typescript-eslint/recommended",
],
parserOptions: {
ecmaVersion: 12,
parser: "@typescript-eslint/parser",
project: "./tsconfig.json",
sourceType: "module",
},
plugins: ["@typescript-eslint", "no-instanceof"],
ignorePatterns: [
".eslintrc.cjs",
"scripts",
"node_modules",
"dist",
"dist-cjs",
"*.js",
"*.cjs",
"*.d.ts",
],
rules: {
"no-process-env": 2,
"no-instanceof/no-instanceof": 2,
"@typescript-eslint/explicit-module-boundary-types": 0,
"@typescript-eslint/no-empty-function": 0,
"@typescript-eslint/no-shadow": 0,
"@typescript-eslint/no-empty-interface": 0,
"@typescript-eslint/no-use-before-define": ["error", "nofunc"],
"@typescript-eslint/no-unused-vars": ["warn", { args: "none" }],
"@typescript-eslint/no-floating-promises": "error",
"@typescript-eslint/no-misused-promises": "error",
camelcase: 0,
"class-methods-use-this": 0,
"import/extensions": [2, "ignorePackages"],
"import/no-extraneous-dependencies": [
"error",
{ devDependencies: ["**/*.test.ts"] },
],
"import/no-unresolved": 0,
"import/prefer-default-export": 0,
"keyword-spacing": "error",
"max-classes-per-file": 0,
"max-len": 0,
"no-await-in-loop": 0,
"no-bitwise": 0,
"no-console": 0,
"no-restricted-syntax": 0,
"no-shadow": 0,
"no-continue": 0,
"no-void": 0,
"no-underscore-dangle": 0,
"no-use-before-define": 0,
"no-useless-constructor": 0,
"no-return-await": 0,
"consistent-return": 0,
"no-else-return": 0,
"func-names": 0,
"no-lonely-if": 0,
"prefer-rest-params": 0,
"new-cap": ["error", { properties: false, capIsNew: false }],
},
overrides: [
{
files: ["**/*.test.ts"],
rules: {
"@typescript-eslint/no-unused-vars": "off",
},
},
],
};
7 changes: 7 additions & 0 deletions libs/langchain-unstructured/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
index.cjs
index.js
index.d.ts
index.d.cts
node_modules
dist
.yarn
19 changes: 19 additions & 0 deletions libs/langchain-unstructured/.prettierrc
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
{
"$schema": "https://json.schemastore.org/prettierrc",
"printWidth": 80,
"tabWidth": 2,
"useTabs": false,
"semi": true,
"singleQuote": false,
"quoteProps": "as-needed",
"jsxSingleQuote": false,
"trailingComma": "es5",
"bracketSpacing": true,
"arrowParens": "always",
"requirePragma": false,
"insertPragma": false,
"proseWrap": "preserve",
"htmlWhitespaceSensitivity": "css",
"vueIndentScriptAndStyle": false,
"endOfLine": "lf"
}
10 changes: 10 additions & 0 deletions libs/langchain-unstructured/.release-it.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"github": {
"release": true,
"autoGenerate": true,
"tokenRef": "GITHUB_TOKEN_RELEASE"
},
"npm": {
"versionArgs": ["--workspaces-update=false"]
}
}
21 changes: 21 additions & 0 deletions libs/langchain-unstructured/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2023 LangChain

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
21 changes: 21 additions & 0 deletions libs/langchain-unstructured/jest.config.cjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
/** @type {import('ts-jest').JestConfigWithTsJest} */
module.exports = {
preset: "ts-jest/presets/default-esm",
testEnvironment: "./jest.env.cjs",
modulePathIgnorePatterns: ["dist/", "docs/"],
moduleNameMapper: {
"^(\\.{1,2}/.*)\\.js$": "$1",
},
transform: {
"^.+\\.tsx?$": ["@swc/jest"],
},
transformIgnorePatterns: [
"/node_modules/",
"\\.pnp\\.[^\\/]+$",
"./scripts/jest-setup-after-env.js",
],
setupFiles: ["dotenv/config"],
testTimeout: 20_000,
passWithNoTests: true,
collectCoverageFrom: ["src/**/*.ts"],
};
12 changes: 12 additions & 0 deletions libs/langchain-unstructured/jest.env.cjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
const { TestEnvironment } = require("jest-environment-node");

class AdjustedTestEnvironmentToSupportFloat32Array extends TestEnvironment {
constructor(config, context) {
// Make `instanceof Float32Array` return true in tests
// to avoid https://github.com/xenova/transformers.js/issues/57 and https://github.com/jestjs/jest/issues/2549
super(config, context);
this.global.Float32Array = Float32Array;
}
}

module.exports = AdjustedTestEnvironmentToSupportFloat32Array;
22 changes: 22 additions & 0 deletions libs/langchain-unstructured/langchain.config.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import { resolve, dirname } from "node:path";
import { fileURLToPath } from "node:url";

/**
* @param {string} relativePath
* @returns {string}
*/
function abs(relativePath) {
return resolve(dirname(fileURLToPath(import.meta.url)), relativePath);
}

export const config = {
internals: [/node\:/, /@langchain\/core\//, "unstructured-client/sdk/models/shared"],
entrypoints: {
index: "index",
},
requiresOptionalDependency: [],
tsConfigPath: resolve("./tsconfig.json"),
cjsSource: "./dist-cjs",
cjsDestination: "./dist",
abs,
};
84 changes: 84 additions & 0 deletions libs/langchain-unstructured/package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
{
"name": "@langchain/unstructured",
"version": "0.0.0",
"description": "Sample integration for LangChain.js",
"type": "module",
"engines": {
"node": ">=18"
},
"main": "./index.js",
"types": "./index.d.ts",
"repository": {
"type": "git",
"url": "[email protected]:langchain-ai/langchainjs.git"
},
"homepage": "https://github.com/langchain-ai/langchainjs/tree/main/libs/langchain-unstructured/",
"scripts": {
"build": "yarn turbo:command build:internal --filter=@langchain/unstructured",
"build:internal": "yarn lc_build_v2 --create-entrypoints --pre --tree-shaking",
"lint:eslint": "NODE_OPTIONS=--max-old-space-size=4096 eslint --cache --ext .ts,.js src/",
"lint:dpdm": "dpdm --exit-code circular:1 --no-warning --no-tree src/*.ts src/**/*.ts",
"lint": "yarn lint:eslint && yarn lint:dpdm",
"lint:fix": "yarn lint:eslint --fix && yarn lint:dpdm",
"clean": "rm -rf .turbo dist/",
"prepack": "yarn build",
"test": "NODE_OPTIONS=--experimental-vm-modules jest --testPathIgnorePatterns=\\.int\\.test.ts --testTimeout 30000 --maxWorkers=50%",
"test:watch": "NODE_OPTIONS=--experimental-vm-modules jest --watch --testPathIgnorePatterns=\\.int\\.test.ts",
"test:single": "NODE_OPTIONS=--experimental-vm-modules yarn run jest --config jest.config.cjs --testTimeout 100000",
"test:int": "NODE_OPTIONS=--experimental-vm-modules jest --testPathPattern=\\.int\\.test.ts --testTimeout 100000 --maxWorkers=50%",
"format": "prettier --config .prettierrc --write \"src\"",
"format:check": "prettier --config .prettierrc --check \"src\""
},
"author": "LangChain",
"license": "MIT",
"dependencies": {
"@langchain/core": ">0.1.0 <0.3.0",
"unstructured-client": "^0.13.0"
},
"devDependencies": {
"@jest/globals": "^29.5.0",
"@langchain/scripts": "~0.0.14",
"@swc/core": "^1.3.90",
"@swc/jest": "^0.2.29",
"@tsconfig/recommended": "^1.0.3",
"@typescript-eslint/eslint-plugin": "^6.12.0",
"@typescript-eslint/parser": "^6.12.0",
"dotenv": "^16.3.1",
"dpdm": "^3.12.0",
"eslint": "^8.33.0",
"eslint-config-airbnb-base": "^15.0.0",
"eslint-config-prettier": "^8.6.0",
"eslint-plugin-import": "^2.27.5",
"eslint-plugin-no-instanceof": "^1.0.1",
"eslint-plugin-prettier": "^4.2.1",
"jest": "^29.5.0",
"jest-environment-node": "^29.6.4",
"prettier": "^2.8.3",
"release-it": "^15.10.1",
"rollup": "^4.5.2",
"ts-jest": "^29.1.0",
"typescript": "<5.2.0"
},
"publishConfig": {
"access": "public"
},
"exports": {
".": {
"types": {
"import": "./index.d.ts",
"require": "./index.d.cts",
"default": "./index.d.ts"
},
"import": "./index.js",
"require": "./index.cjs"
},
"./package.json": "./package.json"
},
"files": [
"dist/",
"index.cjs",
"index.js",
"index.d.ts",
"index.d.cts"
]
}
Loading
Loading