From 1d8318a4079f7b3392ab9b6f7718d99f9e23bf46 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Wed, 20 Mar 2024 18:29:30 +0100 Subject: [PATCH 01/18] doc(docs): add contributing page --- docs/contributing.md | 89 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) create mode 100644 docs/contributing.md diff --git a/docs/contributing.md b/docs/contributing.md new file mode 100644 index 00000000..53a49b5e --- /dev/null +++ b/docs/contributing.md @@ -0,0 +1,89 @@ +--- +layout: default +title: Contributing +nav_order: 3 +description: llmware contributions. +permalink: /contributing +--- +# Contributing to llmware +Contributions to `llmware` are welcome from everyone. +Our goal is to make the process simple, transparent, and straightforward. + +As with everything in the project, the contributions to `llmware` are governed by our [Code of Conduct](https://github.com/llmware-ai/llmware/blob/main/CODE_OF_CONDUCT.md). + +## How can you contribute? + +{: .note} +>If you have never contributed before look for issues with the tag [``good first issue``](https://github.com/llmware-ai/llmware/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). + +The most usual ways to contribute is to add new features, fix bugs, add tests, or add documentation. +You can visit the [issues](https://github.com/llmware-ai/llmware/issues) site of the project and search for tags such as +``bug``, ``enhancement``, ``documentation``, or ``test``. + + +Here is a non exhaustive list of contributions you can make. + +1. Code refactoring +2. Add new text data bases +3. Add new vector data bases +4. Fix bugs +5. Add usage examples (see for example the issues [jupyter notebook - more examples and better support](https://github.com/llmware-ai/llmware/issues/508) and [google colab examples and start up scripts](https://github.com/llmware-ai/llmware/issues/507)) +6. Add experimental features +7. Improve code quality +8. 
Improve documentation in the docs (what you are reading right now) +9. Improve documentation by adding or updating docstrings in modules, classes, methods, or functions (see for example [Add docstrings](https://github.com/llmware-ai/llmware/issues/219)) +10. Improve test coverage +11. Answer questions in our [Discord channel](https://discord.gg/MhZn5Nc39h), especially in the [technical support forum](https://discord.com/channels/1179245642770559067/1218498778915672194) +12. Post projects in which you use ``llmware`` in our Discord forum [made with llmware](https://discord.com/channels/1179245642770559067/1218567269471486012), ideially with a link to a public GitHub repository + +## Code contributions + +### Bugs +If you encounter a bug, you can + +- File an issue about the bug. +- Provide a self-contained example that reproduces the bug, which is extremely important. +- Provide possible solutions for the bug. +- Submit a pull a request to fix the bug. + +### Open Issues +If you're interested in existing issues, you can + +- Look for issues with the `good first issue` label as a good place to get started. +- Provide answers for questions in our [github discussions](https://github.com/llmware-ai/llmware/discussions) +- Provide help for bug or enhancement issues. + - Ask questions, reproduce the issues, or provide solutions. + - Pull a request to fix the issue. + +### New or Enhancement to existing Features +If you'd like to contribute a new feature or significantly change an existing one, you can + +- Start a discussion with us in our [github discussions](https://github.com/llmware-ai/llmware/discussions). 
+ +# Security Vulnerabilities +**If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.** +Please follow the process at [Reporting a Vulnerability](https://github.com/llmware-ai/llmware/blob/main/Security.md) + + + +# GitHub workflow + +Generally, we follow the [``fork-and-pull``](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) Git workflow. + +1. [Fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) the repository on GitHub. +2. Clone your fork to your local machine with `git clone git@github.com:/llmware.git`. +3. Create a branch with `git checkout -b my-topic-branch`. +4. Run the test suite by navigating to the `tests/` folder and running `./run-tests.py -s` to ensure there are no failures. +5. [Commit](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/committing-changes-to-a-pull-request-branch-created-from-a-fork) changes to your own branch, then push to GitHub with `git push origin my-topic-branch`. +6. Submit a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) so that we can review your changes. + +Remember to [synchronize your forked repository](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo#keep-your-fork-synced) _before_ submitting proposed changes upstream. If you have an existing local repository, please update it before you start, to minimize the chance of merge conflicts. + +```shell +git remote add upstream git@github.com:llmware-ai/llmware.git +git fetch upstream +git checkout upstream/main -b my-topic-branch +``` + +# Do you have questions or just want to bounce around an idea?
+Questions and discussions are welcome in our [github discussions](https://github.com/llmware-ai/llmware/discussions)! From 9fbb0e128a10ff277fbecd5e155dfc2eecad37c5 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Thu, 21 Mar 2024 12:08:26 +0100 Subject: [PATCH 02/18] doc(docs): update contributing page with new structure and content --- docs/contributing.md | 41 +++++++++++++++++++++++++++++++++-------- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/docs/contributing.md b/docs/contributing.md index 53a49b5e..d0928115 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -6,15 +6,24 @@ description: llmware contributions. permalink: /contributing --- # Contributing to llmware + +{: .note} +> The contributions to `llmware` are governed by our [Code of Conduct](https://github.com/llmware-ai/llmware/blob/main/CODE_OF_CONDUCT.md). + +{: .warning} +> Have you found a security issue? Then please jump to [Security Vulnerabilities](#security-vulnerabilities). + Contributions to `llmware` are welcome from everyone. Our goal is to make the process simple, transparent, and straightforward. -As with everything in the project, the contributions to `llmware` are governed by our [Code of Conduct](https://github.com/llmware-ai/llmware/blob/main/CODE_OF_CONDUCT.md). +On this page, we provide information for people interested in contributing to ``llmware``. +This includes information on contribution areas and the contribution process. + ## How can you contribute? {: .note} ->If you have never contributed before look for issues with the tag [``good first issue``](https://github.com/llmware-ai/llmware/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). +> If you have never contributed before look for issues with the tag [``good first issue``](https://github.com/llmware-ai/llmware/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). The most usual ways to contribute is to add new features, fix bugs, add tests, or add documentation. 
You can visit the [issues](https://github.com/llmware-ai/llmware/issues) site of the project and search for tags such as @@ -38,15 +47,34 @@ Here is a non exhaustive list of contributions you can make. ## Code contributions +### New or Enhancement to existing Features +You want to submit a code contribute that adds a new feature or enhances an existing one? +Then the best way to start is by opening a discussion in our [github discussions](https://github.com/llmware-ai/llmware/discussions). +Please do this before you work on it, so you do not put effort into it just to realise after submission that +it will not be merged. + ### Bugs If you encounter a bug, you can - File an issue about the bug. -- Provide a self-contained example that reproduces the bug, which is extremely important. +- Provide a self-contained minimal example that reproduces the bug, which is extremely important. - Provide possible solutions for the bug. - Submit a pull a request to fix the bug. -### Open Issues +We encourage you to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) from the Stackoverflow helpcenter, and the tag description of [self-container](https://stackoverflow.com/tags/self-contained/info), also from Stackoverflow. + +## Documentation contributions +There are two ways to contribute to the ``llmware`` documentation. +The first is via docstrings in the code, and the second is what you are currently reading. +In both areas, you can contribute in a lot of ways. +Here is a non exhaustive list of these ways. + +1. Add documentation (e.g., adding a docstring to a function) +2. Update documentation (e.g., update a docstring that is not in sync with the code) +3. Simplify documentation (e.g., formulate a docstring more clearly) +4. 
Enhance documentation (e.g., add more examples to a docstring or fix typos) + +## Open Issues If you're interested in existing issues, you can - Look for issues with the `good first issue` label as a good place to get started. @@ -55,10 +83,7 @@ If you're interested in existing issues, you can - Ask questions, reproduce the issues, or provide solutions. - Pull a request to fix the issue. -### New or Enhancement to existing Features -If you'd like to contribute a new feature or significantly change an existing one, you can - -- Start a discussion with us in our [github discussions](https://github.com/llmware-ai/llmware/discussions). + # Security Vulnerabilities **If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.** From 4b185ea11be6bdfc8dca05d8ed75ce626f39421f Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sat, 23 Mar 2024 14:34:05 +0100 Subject: [PATCH 03/18] doc(docs): change github to GitHub --- docs/contributing.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/contributing.md b/docs/contributing.md index d0928115..ba5366ad 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -49,7 +49,7 @@ Here is a non exhaustive list of contributions you can make. ### New or Enhancement to existing Features You want to submit a code contribute that adds a new feature or enhances an existing one? -Then the best way to start is by opening a discussion in our [github discussions](https://github.com/llmware-ai/llmware/discussions). +Then the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). Please do this before you work on it, so you do not put effort into it just to realise after submission that it will not be merged. @@ -78,7 +78,7 @@ Here is a non exhaustive list of these ways. 
If you're interested in existing issues, you can - Look for issues with the `good first issue` label as a good place to get started. -- Provide answers for questions in our [github discussions](https://github.com/llmware-ai/llmware/discussions) +- Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) - Provide help for bug or enhancement issues. - Ask questions, reproduce the issues, or provide solutions. - Pull a request to fix the issue. @@ -111,4 +111,4 @@ git checkout upstream/main -b my-topic-branch ``` # Do you have questions or just want to bounce around an idea? -Questions and discussions are welcome in our [github discussions](https://github.com/llmware-ai/llmware/discussions)! +Questions and discussions are welcome in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions)! From c7e4ad3080a5d09fb065bf54a85a7355f8e9b0e2 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sat, 23 Mar 2024 15:00:08 +0100 Subject: [PATCH 04/18] doc(docs): fix typo --- docs/contributing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing.md b/docs/contributing.md index ba5366ad..659a7697 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -48,7 +48,7 @@ Here is a non exhaustive list of contributions you can make. ## Code contributions ### New or Enhancement to existing Features -You want to submit a code contribute that adds a new feature or enhances an existing one? +You want to submit a code contribution that adds a new feature or enhances an existing one? Then the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). Please do this before you work on it, so you do not put effort into it just to realise after submission that it will not be merged. 
From b16bbb446051858bb0eed88f5abad7ec1194bdf1 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sat, 23 Mar 2024 15:08:55 +0100 Subject: [PATCH 05/18] doc(docs): add one level to the last three headings --- docs/contributing.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/contributing.md b/docs/contributing.md index 659a7697..a1f84aa5 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -85,13 +85,13 @@ If you're interested in existing issues, you can -# Security Vulnerabilities +## Security Vulnerabilities **If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.** Please follow the process at [Reporting a Vulnerability](https://github.com/llmware-ai/llmware/blob/main/Security.md) -# GitHub workflow +## GitHub workflow Generally, we follow the [``fork-and-pull``](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) Git workflow. @@ -110,5 +110,5 @@ git fetch upstream git checkout upstream/main -b my-topic-branch ``` -# Do you have questions or just want to bounce around an idea? +## Do you have questions or just want to bounce around an idea? Questions and discussions are welcome in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions)! From a83136ee3e35ea9ebec79e4776ba8198b999e1a5 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Mon, 25 Mar 2024 08:06:34 +0100 Subject: [PATCH 06/18] doc(docs): add blueprint of module descriptions --- docs/contributing.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/docs/contributing.md b/docs/contributing.md index a1f84aa5..c9f0cd4f 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -45,6 +45,29 @@ Here is a non exhaustive list of contributions you can make. 11. 
Answer questions in our [Discord channel](https://discord.gg/MhZn5Nc39h), especially in the [technical support forum](https://discord.com/channels/1179245642770559067/1218498778915672194) 12. Post projects in which you use ``llmware`` in our Discord forum [made with llmware](https://discord.com/channels/1179245642770559067/1218567269471486012), ideially with a link to a public GitHub repository +We briefly describe some of the important modules of ``llmware`` next, so you can more easily navigate the code base. +For newcomers, we embed links to our [fast start series from YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB). + +### Core modules + +#### Library + +In ``llmware``, a *library* is a collection of documents. +A library is responsible for parsing, text chunking, and indexing. + +#### Embeddings + +In ``llmware``, an *embedding* is a vector store and an embedding model. +An embedding is responsible for applying an embedding model to a library, storing the embeddings in a vector store, and providing access to the embeddings with natural language queries. + +#### Prompts + +In ``llmware``, a *prompt* is an input to model. + +#### Model Catalog +In ``llmware``, a *model catalog* is a collection of models. 
+ + ## Code contributions ### New or Enhancement to existing Features From c642e2191811e915c84d06952380feaa915667c7 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sat, 6 Apr 2024 20:52:14 +0200 Subject: [PATCH 07/18] doc(docs): re-organise contributing such that code and documentation contributions have their own sites --- docs/contributing/code.md | 68 +++++++++++++++++++++++++ docs/{ => contributing}/contributing.md | 0 docs/contributing/documentation.md | 31 +++++++++++ 3 files changed, 99 insertions(+) create mode 100644 docs/contributing/code.md rename docs/{ => contributing}/contributing.md (100%) create mode 100644 docs/contributing/documentation.md diff --git a/docs/contributing/code.md b/docs/contributing/code.md new file mode 100644 index 00000000..321f3804 --- /dev/null +++ b/docs/contributing/code.md @@ -0,0 +1,68 @@ +--- +layout: default +title: Code contributions +parent: Contributing +nav_order: 1 +permalink: /contributing/code +--- +# Contributing code +One way to contribute to ``llmware`` is by contributing to the code base. + +We briefly describe some of the important modules of ``llmware`` next, so you can more easily navigate the code base. +For newcomers, we embed links to our [fast start series from YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB). + +# Core modules + +## Library + +In ``llmware``, a *library* is a collection of documents. +A library is responsible for parsing, text chunking, and indexing. + +## Embeddings + +In ``llmware``, an *embedding* is a vector store and an embedding model. +An embedding is responsible for applying an embedding model to a library, storing the embeddings in a vector store, and providing access to the embeddings with natural language queries. + +## Prompts + +In ``llmware``, a *prompt* is an input to model. + +## Model Catalog +In ``llmware``, a *model catalog* is a collection of models.
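To make the division of labour between the *library* and *embedding* modules concrete, here is a deliberately tiny sketch. It does not call ``llmware`` and is not its implementation: `chunk_text`, `embed`, `cosine`, and `search` are hypothetical helpers, a fixed-size word window stands in for the library's parsing and chunking, a bag-of-words count vector stands in for a real embedding model, and a plain Python list stands in for a vector store.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Library step (toy): split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Embedding step (toy): a bag-of-words count vector instead of a trained model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, top_k=2):
    """Vector-store step (toy): return the `top_k` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vector, entry[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


documents = [
    "The invoice total is 500 dollars and is due in March.",
    "Our office is located in Berlin near the main station.",
]
# "Index" every chunk of every document together with its vector.
index = [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]
print(search(index, "invoice total", top_k=1))
# ['The invoice total is 500 dollars and is']
```

In the modules described above, each toy step corresponds to real functionality: the library parses and chunks, and the embedding applies a model, stores the vectors, and answers natural language queries.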
+ + +## Categories of code contributions + +### New or Enhancement to existing Features +You want to submit a code contribution that adds a new feature or enhances an existing one? +Then the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). +Please do this before you work on it, so you do not put effort into it just to realise after submission that +it will not be merged. + +### Bugs +If you encounter a bug, you can + +- File an issue about the bug. +- Provide a self-contained minimal example that reproduces the bug, which is extremely important. +- Provide possible solutions for the bug. +- Submit a pull request to fix the bug. + +We encourage you to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) from the Stack Overflow help center, and the tag description of [self-contained](https://stackoverflow.com/tags/self-contained/info), also from Stack Overflow. + +## Open Issues +If you're interested in existing issues, you can + +- Look for issues with the `good first issue` label as a good place to get started. +- Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) +- Provide help for bug or enhancement issues. + - Ask questions, reproduce the issues, or provide solutions. + - Open a pull request to fix the issue. + + + +## Security Vulnerabilities +**If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.** +Please follow the process at [Reporting a Vulnerability](https://github.com/llmware-ai/llmware/blob/main/Security.md) + +# Do you have questions or just want to bounce around an idea? +Questions and discussions are welcome in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions)!
diff --git a/docs/contributing.md b/docs/contributing/contributing.md similarity index 100% rename from docs/contributing.md rename to docs/contributing/contributing.md diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md new file mode 100644 index 00000000..2e09df11 --- /dev/null +++ b/docs/contributing/documentation.md @@ -0,0 +1,31 @@ +--- +layout: default +title: Documentation contributions +parent: Contributing +nav_order: 2 +--- +# Contributing documentation +One way to contribute to ``llmware`` is by contributing documentation. + +There are two ways to contribute to the ``llmware`` documentation. +The first is via docstrings in the code, and the second is what you are currently reading. +In both areas, you can contribute in a lot of ways. +Here is a non exhaustive list of these ways. + +1. Add documentation (e.g., adding a docstring to a function) +2. Update documentation (e.g., update a docstring that is not in sync with the code) +3. Simplify documentation (e.g., formulate a docstring more clearly) +4. Enhance documentation (e.g., add more examples to a docstring or fix typos) + +## Code + +## Docs + +## Open Issues +If you're interested in existing issues, you can + +- Look for issues with the `good first issue` and `documentation` labels as a good place to get started. +- Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) +- Provide help for bug or enhancement issues. + - Ask questions, reproduce the issues, or provide solutions. + - Open a pull request to fix the issue.
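As an illustration of adding a docstring (item 1) and adding an example to it (item 4), here is a hypothetical helper documented in Google docstring style, with an example that Python's `doctest` module can execute. Neither the function nor the docstring style is prescribed by ``llmware``; when editing an existing module, follow whatever convention it already uses.

```python
import doctest


def normalize_whitespace(text):
    r"""Collapse runs of whitespace into single spaces and trim both ends.

    Args:
        text: The string to clean up.

    Returns:
        The input with every run of spaces, tabs, and newlines replaced by a
        single space, and no leading or trailing whitespace.

    Example:
        >>> normalize_whitespace("  Hello\n\t world  ")
        'Hello world'
    """
    return " ".join(text.split())


if __name__ == "__main__":
    # Runs the docstring example; prints nothing when it still matches the code.
    doctest.run_docstring_examples(normalize_whitespace, globals(), name="normalize_whitespace")
```

Executable examples like this have a side benefit: if the code changes and the docstring does not, the example fails, which flags exactly the "not in sync with the code" case from item 2.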
From 09a6156cadba3583a2a48b52f223df8776057d98 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sat, 13 Apr 2024 19:53:15 +0200 Subject: [PATCH 08/18] doc(docs): add that contributing has children --- docs/contributing/contributing.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/contributing/contributing.md b/docs/contributing/contributing.md index c9f0cd4f..99ad69fa 100644 --- a/docs/contributing/contributing.md +++ b/docs/contributing/contributing.md @@ -2,6 +2,7 @@ layout: default title: Contributing nav_order: 3 +has_children: true description: llmware contributions. permalink: /contributing --- From 9228b9544911029d5311b7f24629f8e6aa650b23 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sat, 13 Apr 2024 19:56:37 +0200 Subject: [PATCH 09/18] doc(docs): add section on how to build docs locally --- docs/contributing/documentation.md | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md index 2e09df11..48bb3266 100644 --- a/docs/contributing/documentation.md +++ b/docs/contributing/documentation.md @@ -17,10 +17,36 @@ Here is a non exhaustive list of these ways. 3. Simplify documentation (e.g., formulate a docstring more clearly) 4. Enhance documentation (e.g., add more examples to a docstring or fix typos) -## Code +## Docstrings ## Docs +{: .note} +> All commands are executed from the `docs` sub-directory. + +Contributing to this documentation is extremely important, as many users will refer to it. + +If you plan to contribute to the docs, we recommend that you install `jekyll` so you can test your changes locally. +We also recommend that you install `jekyll` into an isolated Ruby environment so it does not interfere with any other installations you might have. + +`rbenv` is a tool that manages different Ruby versions, similar to what `conda` does for `python`.
+Please [install rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) following their instructions. +We recommend that you install a Ruby version `>=3.0`. +After installing an isolated Ruby version, install the dependencies needed to build the docs locally. +The `docs` directory has a `Gemfile` which specifies the dependencies. +You can therefore navigate to it and run the `bundle install` command. + +```bash +bundle install +``` + +You should now be able to build and serve the documentation locally. +To do this, run the following command. +```bash +bundle exec jekyll serve --livereload +``` +In the browser of your choice, you can then go to `http://127.0.0.1:4000/` and you will be served the documentation, which is rebuilt and reloaded after any change to the `docs`. + ## Open Issues If you're interested in existing issues, you can From 94f15dbf83b1c3df6b70aaedcfca6812fdd93c67 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:38:41 +0200 Subject: [PATCH 10/18] doc(docs): add module descriptions --- docs/contributing/code.md | 305 +++++++++++++++++++++++++++++++++----- 1 file changed, 272 insertions(+), 33 deletions(-) diff --git a/docs/contributing/code.md b/docs/contributing/code.md index 321f3804..a5d885e7 100644 --- a/docs/contributing/code.md +++ b/docs/contributing/code.md @@ -9,37 +9,294 @@ permalink: /contributing/code One way to contribute to ``llmware`` is by contributing to the code base. We briefly describe some of the important modules of ``llmware`` next, so you can more easily navigate the code base. -For newcomers, we embed links to our [fast start series from YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB). +You may also take a look at our [fast start series from YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB). -# Core modules +## Core modules -## Library +### Library -In ``llmware``, a *library* is a collection of documents.
-A library is responsible for parsing, text chunking, and indexing. +The *library* module implements the classes **Library** and **LibraryCatalog**. +The **Library** class implements the *library* concept. +A *library* is a collection of documents, where a document can be a PDF, an image, or an office document. +It is responsible for parsing, text chunking, and indexing. +In other words, it does the heavy lifting of adding content. +In the following, we briefly describe the methods for adding documents to the library. -## Embeddings +```python +def add_file( + self, + file_path): +``` +This method adds one document of any supported type to the library. + +```python +def add_files( + self, + input_folder_path=None, + encoding="utf-8", + chunk_size=400, + get_images=True, get_tables=True, + smart_chunking=2, + max_chunk_size=600, + table_grid=True, + get_header_text=True, + table_strategy=1, + strip_header=False, + verbose_level=2, + copy_files_to_library=True): +``` +This method adds the documents of one folder to the library. + +```python +def add_website( + self, + url, + get_links=True, + max_links=5): +``` +This method adds a website, and links from the website, to the library. + +```python +def add_wiki( + self, + topic_list, + target_results=10): +``` +This method adds Wikipedia articles to the library. + +```python +def add_dialogs( + self, + input_folder=None): +``` +This method adds AWS dialog transcripts to the library. + +```python +def add_image( + self, + input_folder=None): +``` +This method adds images to the library. + +```python +def add_pdf_by_ocr( + self, + input_folder=None): +``` +This method adds scanned PDFs to the library. + +```python +def add_pdf( + self, + input_folder=None): +``` +This method adds PDFs to the library. + +```python +def add_office( + self, + input_folder=None): +``` +This method adds office documents to the library. + +### Embeddings -In ``llmware``, an *embedding* is a vector store and an embedding model.
-An embedding is responsible for applying an embedding model to a library, storing the embeddings in a vector store, and providing access to the embeddings with natural language queries. +An *embedding* is a vector store and an embedding model. +It is responsible for applying an embedding model to a library, storing the embeddings in a vector store, and providing access to the embeddings with natural language queries. +We briefly describe the common methods offered by all vector stores below. + +```python +def create_new_embedding( + self, + doc_ids=None, + batch_size=500): +``` +This method creates the embeddings and adds them to the vector store. + +```python +def search_index( + self, + query_embedding_vector, + sample_count=10): +``` +This method searches the vector store given the query vector. + +```python +def delete_index(self): +``` +This method deletes the created vector store index. -## Prompts + +### Prompts -In ``llmware``, a *prompt* is an input to model. +A *prompt* is an input to a model. +The prompt is used by the model to generate the response. +One important use case is that users want to augment a prompt, or a series of prompts, with additional information. +Next, we describe methods for augmenting a prompt with additional information. + +```python +def add_source_new_query( + self, + library, + query=None, + query_type="semantic", + result_count=10): +``` +This method adds the results of the ``query`` to the prompt. + +```python +def add_source_query_results( + self, + query_results): +``` +This method adds previous results from a query as a source to the prompt. + +```python +def add_source_library( + self, + library_name): +``` +This method adds an entire library to the prompt. +We recommend that you only use this when the library is sufficiently small. -## Model Catalog -In ``llmware``, a *model catalog* is a collection of models.
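One likely reason for keeping `add_source_library` to small libraries, which we assume here rather than take from the code, is that every source added to a prompt must fit into the model's context window together with the question and the answer. The sketch below makes that budget explicit. It is not ``llmware`` code: `estimate_tokens`, `fit_sources`, and `build_prompt` are hypothetical, the four-characters-per-token estimate is a crude assumption, and 2048 is simply one of the `context_window` defaults listed under Model Catalog.

```python
def estimate_tokens(text):
    """Very rough token estimate (assumption: about four characters per token)."""
    return max(1, len(text) // 4)


def fit_sources(sources, question, context_window=2048, reserved_for_answer=256):
    """Keep sources in order until adding the next one would exceed the token budget."""
    budget = context_window - reserved_for_answer - estimate_tokens(question)
    kept, used = [], 0
    for source in sources:
        cost = estimate_tokens(source)
        if used + cost > budget:
            break  # everything from here on would not fit and is dropped
        kept.append(source)
        used += cost
    return kept


def build_prompt(sources, question):
    """Join the kept sources and the question (this template is an assumption)."""
    return "\n\n".join(sources + [f"Question: {question}"])


question = "When is payment due?"
small_library = ["Clause 1: payment is due in 30 days."] * 3
large_library = ["x" * 4000] * 10  # about 1000 estimated tokens per source
print(len(fit_sources(small_library, question)))  # 3
print(len(fit_sources(large_library, question)))  # 1
```

With the small library every source fits; with the large one, nine of ten sources would have to be dropped, which is why a targeted query (`add_source_new_query`) is usually the better fit for large libraries.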
+```python +def add_source_wikipedia( + self, + topic, + article_count=3, + query=None): +``` +This method adds Wikipedia articles to the prompt based on the provided ``topic``. +```python +def add_source_yahoo_finance( + self, + ticker=None, + key_list=None): +``` +This method adds a Yahoo Finance ticker to the prompt. -## Categories of code contributions +```python +def add_source_knowledge_graph( + self, + library, + query): +``` +This method adds the summary output elements from a knowledge graph based on the provided ``query``. +Please note that this method is experimental, i.e., unstable, and may change significantly in any new version. -### New or Enhancement to existing Features +```python +def add_source_website( + self, + url, + query=None): +``` +This method adds the website pointed to by the ``url`` to the prompt. + +```python +def add_source_document( + self, + input_fp, + input_fn, + query=None): +``` +This method adds a document, or documents, of any supported type to the prompt. +If documents are added, then the ``query`` parameter can be used to filter the documents. + +```python +def add_source_last_interaction_step( + self): +``` +This method adds the last interaction to the prompt. +The use case for this is to enable interactive dialog, i.e., chatting. + +### Model Catalog +A *model catalog* is a collection of models. +In the following, we briefly describe the methods for adding new models to the catalog. + +```python +def register_new_hf_generative_model( + self, + hf_model_name=None, + context_window=2048, + prompt_wrapper="", + display_name=None, + temperature=0.3, + trailing_space="", + link=""): +``` +This method adds a new generative model from Hugging Face. +Users can therefore add Hugging Face models that are not currently supported. + +```python +def register_sentence_transformer_model( + self, + model_name, + embedding_dims, + context_window, + display_name=None, + link=""): +``` +This method adds a new sentence transformer.
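To illustrate what *a collection of models* means in practice, here is a toy catalog in plain Python. `ModelCard`, `ToyModelCatalog`, and the `kind` field are hypothetical and not part of ``llmware``; only the field names `context_window`, `temperature`, and `display_name` are borrowed from the registration signatures in this section, and the defaults (2048 and 0.3) mirror those of `register_new_hf_generative_model`. Rejecting duplicate names is a design choice of this sketch, not documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical model metadata; some field names mirror the register_* parameters."""
    model_name: str
    kind: str  # hypothetical label, e.g. "generative" or "embedding"
    context_window: int = 2048
    temperature: float = 0.3
    display_name: str = ""

    def __post_init__(self):
        # Fall back to the model name when no display name is given.
        if not self.display_name:
            self.display_name = self.model_name


@dataclass
class ToyModelCatalog:
    """A catalog as a plain collection of model cards, keyed by model name."""
    cards: dict = field(default_factory=dict)

    def register(self, card):
        # Refuse to silently overwrite an existing entry.
        if card.model_name in self.cards:
            raise ValueError(f"model already registered: {card.model_name}")
        self.cards[card.model_name] = card
        return card

    def lookup(self, model_name):
        if model_name not in self.cards:
            raise KeyError(f"unknown model: {model_name}")
        return self.cards[model_name]


catalog = ToyModelCatalog()
catalog.register(ModelCard("my-org/my-generative-model", "generative"))
catalog.register(ModelCard("my-org/my-embedder", "embedding", context_window=512))
print(catalog.lookup("my-org/my-embedder").context_window)  # 512
print(catalog.lookup("my-org/my-generative-model").display_name)  # my-org/my-generative-model
```

The model names above are placeholders; the point is only that registering a model amounts to adding a metadata record that later lookups can rely on.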
+ +```python +def register_gguf_model( + self, + model_name, + gguf_model_repo, + gguf_model_file_name, + prompt_wrapper=None, + eos_token_id=0, + display_name=None, + trailing_space="", + temperature=0.3, + context_window=2048, + instruction_following=True): +``` +This method adds a new GGUF model. + +```python +def register_open_chat_model( + cls, + model_name, + api_base=None, + model_type="chat", + display_name=None, + context_window=4096, + instruction_following=True, + prompt_wrapper="", + temperature=0.5): +``` +This method adds any chat model that is available through a web API, e.g., a chat model that is available locally +via localhost. + +```python +def register_ollama_model( + cls, + model_name, + host="localhost", + port=11434, + model_type="chat", + raw=False, + stream=False, + display_name=None, + context_window=4096, + instruction_following=True, + prompt_wrapper="", + temperature=0.5): +``` +This method adds an Ollama model that is available through a web API. +The method is similar to the ``register_open_chat_model`` method above. + +### Categories of code contributions + +#### New or Enhancement to existing Features You want to submit a code contribution that adds a new feature or enhances an existing one? Then the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). Please do this before you work on it, so you do not put effort into it just to realise after submission that it will not be merged. -### Bugs +#### Bugs If you encounter a bug, you can - File an issue about the bug. @@ -48,21 +305,3 @@ If you encounter a bug, you can - Submit a pull request to fix the bug. We encourage you to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) from the Stack Overflow help center, and the tag description of [self-contained](https://stackoverflow.com/tags/self-contained/info), also from Stack Overflow.
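Alongside the minimal example, a bug report is often easier to act on when it states the environment it was produced in. The helper below is a sketch of one way to collect that with the standard library only; the report format is our own choice rather than a project requirement, and we assume the installed distribution is named `llmware` (if it is not found, the helper reports "not installed" instead of failing).

```python
import importlib.metadata
import platform
import sys


def environment_report(packages=("llmware",)):
    """Collect Python, OS, and package versions to paste into a bug report."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pass any additional packages your example depends on via the `packages` argument so their versions end up in the report as well.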
- -## Open Issues -If you're interested in existing issues, you can - -- Look for issues with the `good first issue` label as a good place to get started. -- Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) -- Provide help for bug or enhancement issues. - - Ask questions, reproduce the issues, or provide solutions. - - Pull a request to fix the issue. - - - -## Security Vulnerabilities -**If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.** -Please follow the process at [Reporting a Vulnerability](https://github.com/llmware-ai/llmware/blob/main/Security.md) - -# Do you have questions or just want to bounce around an idea? -Questions and discussions are welcome in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions)! From 0d35f879e1bbe48fad12d09caaa3d86b51c06c7f Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:40:01 +0200 Subject: [PATCH 11/18] doc(docs): add main contributing page --- docs/contributing/contributing.md | 75 +++++++------------------------- 1 file changed, 15 insertions(+), 60 deletions(-) diff --git a/docs/contributing/contributing.md b/docs/contributing/contributing.md index 99ad69fa..290a3b4f 100644 --- a/docs/contributing/contributing.md +++ b/docs/contributing/contributing.md @@ -14,12 +14,14 @@ permalink: /contributing {: .warning} > Have you found a security issue? Then please jump to [Security Vulnerabilities](#security-vulnerabilities). +On this page, we provide information about ``llmware`` contributions. +There are **two ways** in which you can contribute. +The first is by making **code contributions**, and the second by making contributions to the **documentation**. +Please look at our [contribution suggestions](#how-can-you-contribute) if you need inspiration, or take a look at [open issues](#open-issues).
+ Contributions to `llmware` are welcome from everyone. Our goal is to make the process simple, transparent, and straightforward. - -On this page, we provide information for people interested in contributing to ``llmware``. -This includes information on contribution areas and the contribution process. - +We are happy to receive suggestions on how the process can be improved. ## How can you contribute? @@ -46,62 +48,10 @@ Here is a non exhaustive list of contributions you can make. 11. Answer questions in our [Discord channel](https://discord.gg/MhZn5Nc39h), especially in the [technical support forum](https://discord.com/channels/1179245642770559067/1218498778915672194) 12. Post projects in which you use ``llmware`` in our Discord forum [made with llmware](https://discord.com/channels/1179245642770559067/1218567269471486012), ideially with a link to a public GitHub repository -We briefly describe some of the important modules of ``llmware`` next, so you can more easily navigate the code base. -For newcomers, we embed links to our [fast start series from YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB). - -### Core modules - -#### Library - -In ``llmware``, a *library* is a collection of documents. -A library is responsible for parsing, text chunking, and indexing. - -#### Embeddings - -In ``llmware``, an *embedding* is a vector store and an embedding model. -An embedding is responsible for applying an embedding model to a library, storing the embeddings in a vector store, and providing access to the embeddings with natural language queries. - -#### Prompts - -In ``llmware``, a *prompt* is an input to model. - -#### Model Catalog -In ``llmware``, a *model catalog* is a collection of models. - - -## Code contributions - -### New or Enhancement to existing Features -You want to submit a code contribution that adds a new feature or enhances an existing one? 
-Then the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). -Please do this before you work on it, so you do not put effort into it just to realise after submission that -it will not be merged. - -### Bugs -If you encounter a bug, you can - -- File an issue about the bug. -- Provide a self-contained minimal example that reproduces the bug, which is extremely important. -- Provide possible solutions for the bug. -- Submit a pull a request to fix the bug. - -We encourage you to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) from the Stackoverflow helpcenter, and the tag description of [self-container](https://stackoverflow.com/tags/self-contained/info), also from Stackoverflow. - -## Documentation contributions -There are two ways to contribute to the ``llmware`` documentation. -The first is via docstrings in the code, and the second is what you are currently reading. -In both areas, you can contribute in a lot of ways. -Here is a non exhaustive list of these ways. - -1. Add documentation (e.g., adding a docstring to a function) -2. Update documentation (e.g., update a docstring that is not in sync with the code) -3. Simplify documentation (e.g., formulate a docstring more clearly) -4. Enhance documentation (e.g., add more examples to a docstring or fix typos) - ## Open Issues If you're interested in existing issues, you can -- Look for issues with the `good first issue` label as a good place to get started. +- Look for open issues; if you are new to the project, look for issues with the `good first issue` label. - Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) - Provide help for bug or enhancement issues. - Ask questions, reproduce the issues, or provide solutions.
@@ -117,7 +67,7 @@ Please follow the process at [Reporting a Vulnerability](https://github.com/llmw ## GitHub workflow -Generally, we follow the [``fork-and-pull``](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) Git workflow. +We follow the [``fork-and-pull``](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) Git workflow. 1. [Fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) the repository on GitHub. 2. Clone your fork to your local machine with `git clone git@github.com:/llmware.git`. @@ -134,5 +84,10 @@ git fetch upstream git checkout upstream/main -b my-topic-branch ``` -## Do you have questions or just want to bounce around an idea? -Questions and discussions are welcome in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions)! +## Community +Questions and discussions are welcome in any shape or form. +Please feel free to join our community on our Discord channel, where we are active daily. +You are also welcome if you just want to post an idea!
+ +- [Discord Channel](https://discord.gg/MhZn5Nc39h) +- [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) From 903a432699dcf6833613fa5bd9352993b14fefb7 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:41:25 +0200 Subject: [PATCH 12/18] doc(docs): add documentation contributing page --- docs/contributing/documentation.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md index 48bb3266..95da45ac 100644 --- a/docs/contributing/documentation.md +++ b/docs/contributing/documentation.md @@ -3,14 +3,15 @@ layout: default title: Documentation contributions parent: Contributing nav_order: 2 +permalink: contributing/documentation --- # Contributing documentation One way to contribute to ``llmware`` is by contributing documentation. -There are two ways to contribute to the ``llmware`` documentation. -The first is via docstrings in the code, and the second is what you are currently reading. +There are **two ways** to contribute to the ``llmware`` documentation. +The first is via **docstrings in the code**, and the second is **the docs**, which is what you are *currently reading*. In both areas, you can contribute in a lot of ways. -Here is a non exhaustive list of these ways. +Here is a non-exhaustive list of these ways for docstrings, which also applies to the docs. 1. Add documentation (e.g., adding a docstring to a function) 2. Update documentation (e.g., update a docstring that is not in sync with the code) @@ -18,6 +19,12 @@ Here is a non exhaustive list of these ways. 4. Enhance documentation (e.g., add more examples to a docstring or fix typos) ## Docstrings +**Docstrings** document the code within the code itself, so programmers can easily consult them while they are programming.
+For an example, have a look at [this docstring](https://github.com/llmware-ai/llmware/blob/c9e12a7a150162986622738e127c37ac70f31cd6/llmware/agents.py#L27-L66), which documents the ``LLMfx`` class. + +We follow the **numpy** docstring style, for which you can find examples [here](https://github.com/numpy/numpydoc/blob/main/doc/example.py) and [here](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html). +Please follow these conventions and review your pull request before you submit it. + ## Docs From fb05c61abb086eabc521119cb4fc1679020bb89b Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:43:32 +0200 Subject: [PATCH 13/18] doc(docs): add verbose flag for jekyll server --- docs/contributing/documentation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md index 95da45ac..1b254757 100644 --- a/docs/contributing/documentation.md +++ b/docs/contributing/documentation.md @@ -50,7 +50,7 @@ bundle install You should now be able to build and serve the documentation locally. To do this, simply to the following. ```bash -bundle exec jekyll serve --livereload +bundle exec jekyll server --livereload --verbose ``` In the browser of your choice, you can then go to `http://127.0.0.1:4000/` and you will be served the documentation, which is re-build and re-loaded after any change to the `docs`.
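To illustrate the numpy docstring convention referenced in the documentation patch above, here is a minimal sketch of a function documented with the `Parameters`, `Returns`, `Raises`, and `Examples` sections from the numpydoc guide. The function `chunk_text` is hypothetical and purely illustrative; it is not part of ``llmware`` and does not describe how ``llmware`` libraries actually chunk text.

```python
def chunk_text(text, chunk_size=400, overlap=0):
    """Split a string into fixed-size character chunks.

    Parameters
    ----------
    text : str
        The input text to split.
    chunk_size : int, optional
        Maximum number of characters per chunk. Default is 400.
    overlap : int, optional
        Number of characters shared by consecutive chunks. Default is 0.

    Returns
    -------
    list of str
        The chunks, in the order in which they appear in ``text``.

    Raises
    ------
    ValueError
        If ``chunk_size`` is not positive, or if ``overlap`` is negative
        or not smaller than ``chunk_size``.

    Examples
    --------
    >>> chunk_text("abcdef", chunk_size=4)
    ['abcd', 'ef']
    >>> chunk_text("abcdef", chunk_size=4, overlap=2)
    ['abcd', 'cdef']
    """
    # Validate arguments so the loop below always advances.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so no trailing
        # chunk consists only of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Because the `Examples` section is written as doctests, the examples can be checked with `python -m doctest -v your_module.py` (the module name is a placeholder), and `help(chunk_text)` prints the docstring.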
From 4c0ba49a5e6b6aa29698999579b078f44afe8baa Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:46:36 +0200 Subject: [PATCH 14/18] fix(docs): typo --- docs/contributing/documentation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md index 1b254757..fdcea367 100644 --- a/docs/contributing/documentation.md +++ b/docs/contributing/documentation.md @@ -37,7 +37,7 @@ If you plan to contribute to the docs, we recommend that you locally install `je We also recommend, that you install `jekyll` into a a ruby enviroment so it does not interfere with any other installations you might have. `rbenv` is a tool that mangages different ruby versions, similar to what `conda` does for `python`. -Please [install rvm](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) following their instructions. +Please [install rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) following their instructions. We recommend that you install a ruby version `>=3.0`. After having installed an isolated ruby version, you have to install the dependencies to build the docs locally. The `docs` directory has a `Gemfile` which specifies the dependencies. 
From 802e8c16629497ceafe06308bd06e49f14fad46e Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:56:07 +0200 Subject: [PATCH 15/18] doc(docs): add installation pointer for rvm and add that no one should ever commit files from the _site directory --- docs/contributing/documentation.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md index fdcea367..f41c7527 100644 --- a/docs/contributing/documentation.md +++ b/docs/contributing/documentation.md @@ -36,8 +36,9 @@ Contributing to this documentation is extremely important as many users will ref If you plan to contribute to the docs, we recommend that you locally install `jekyll` so you can test your changes locally. We also recommend, that you install `jekyll` into a a ruby enviroment so it does not interfere with any other installations you might have. +We recommend that you install `rbenv` and `rvm` to manage your Ruby installation. `rbenv` is a tool that mangages different ruby versions, similar to what `conda` does for `python`. -Please [install rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) following their instructions. +Please follow the official instructions to [install rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) and to [install rvm](https://github.com/rvm/rvm?tab=readme-ov-file#installing-rvm). We recommend that you install a ruby version `>=3.0`. After having installed an isolated ruby version, you have to install the dependencies to build the docs locally. The `docs` directory has a `Gemfile` which specifies the dependencies. @@ -53,6 +54,7 @@ To do this, simply to the following. bundle exec jekyll server --livereload --verbose ``` In the browser of your choice, you can then go to `http://127.0.0.1:4000/` and you will be served the documentation, which is re-build and re-loaded after any change to the `docs`.
+``jekyll`` creates a ``_site`` directory in which it saves the generated files. Please **never commit any files from the \_site directory**! ## Open Issues If you're interested in existing issues, you can From c99eda0dfef097ecb197c24d2efbdf3c1ea263a3 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 16:58:46 +0200 Subject: [PATCH 16/18] doc(docs): add updated Gemfile.lock --- docs/Gemfile.lock | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock index 0cf5eeff..78cb4d0f 100644 --- a/docs/Gemfile.lock +++ b/docs/Gemfile.lock @@ -79,9 +79,10 @@ GEM rouge (4.1.3) ruby2_keywords (0.0.5) safe_yaml (1.0.5) - sass-embedded (1.67.0-arm64-darwin) + sass-embedded (1.69.5) google-protobuf (~> 3.23) - sass-embedded (1.67.0-x86_64-linux-gnu) + rake (>= 13.0.0) + sass-embedded (1.69.5-arm64-darwin) google-protobuf (~> 3.23) sawyer (0.9.2) addressable (>= 2.3.5) @@ -104,4 +105,4 @@ DEPENDENCIES just-the-docs (= 0.7.0) BUNDLED WITH - 2.3.26 + 2.5.6 From 992e2de152f35d2df2d20b4a30c19ce5a69fbdc4 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 18:54:59 +0200 Subject: [PATCH 17/18] fix(docs): add hugging face logo path programmatically to ensure it is correct from each subdirectory --- docs/_includes/footer_custom.html | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/_includes/footer_custom.html b/docs/_includes/footer_custom.html index 2d56178d..0299a1f0 100644 --- a/docs/_includes/footer_custom.html +++ b/docs/_includes/footer_custom.html @@ -12,7 +12,17 @@ |
  • - + + + {% for static_file in site.static_files %} + {% if static_file.basename == "hf-logo"%} + {% assign hf_logo = static_file %} + {% endif %} + {% endfor %} + + + +
  • From 482605c8efe47e95ba5a9255937c6a785254b6e5 Mon Sep 17 00:00:00 2001 From: Stefan Bachhofner Date: Sun, 5 May 2024 19:10:12 +0200 Subject: [PATCH 18/18] fix(docs): add site url before path to hf logo --- docs/_includes/footer_custom.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_includes/footer_custom.html b/docs/_includes/footer_custom.html index 0299a1f0..a1202b7a 100644 --- a/docs/_includes/footer_custom.html +++ b/docs/_includes/footer_custom.html @@ -20,7 +20,7 @@ {% endif %} {% endfor %} - +