[Feature]: Resource ingestion pipeline to the vector database #43

Open
jurmy24 opened this issue Nov 1, 2024 · 1 comment
Labels
dev (Anything related to internal tooling/tests/CICD), enhancement (New feature or request)

Comments

jurmy24 commented Nov 1, 2024

Is your feature request related to a problem? Please describe.
Converting PDFs (e.g. textbooks) into well-divided chunks takes a lot of manual work. So does finding the relevant metadata (page numbers, associated chapter, subsection, etc.) and putting it into the vector database chunks (plus the associated resources and sections tables).

Describe the solution you'd like
I want a pipeline under the scripts/database folder that takes a PDF as input and automatically uploads the chunks and their metadata to the database. Discuss the best way to do this with me.
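
As a starting point for that discussion, here is a minimal sketch of the chunking step only. It assumes the page texts have already been extracted from the PDF, that chapter headings look like "Chapter 3: Forces" on their own line, and that chunks never cross a page boundary. The names `Chunk`, `chunk_pages`, and `CHAPTER_RE` are hypothetical and not part of the repo; PDF extraction and the actual upload to the chunks, resources, and sections tables are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed heading format: a standalone line such as "Chapter 3: Forces".
CHAPTER_RE = re.compile(r"Chapter\s+\d+[^\n]*")


@dataclass
class Chunk:
    text: str
    page_number: int          # 1-based page the chunk was taken from
    chapter: Optional[str]    # most recent chapter heading seen, if any


def chunk_pages(pages: list[str], max_chars: int = 500) -> list[Chunk]:
    """Split already-extracted page texts into chunks with basic metadata."""
    chunks: list[Chunk] = []
    chapter: Optional[str] = None
    for page_number, page in enumerate(pages, start=1):
        buffer = ""
        for paragraph in page.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            if CHAPTER_RE.fullmatch(paragraph):
                # Flush pending text so it keeps the previous chapter label.
                if buffer:
                    chunks.append(Chunk(buffer, page_number, chapter))
                    buffer = ""
                chapter = paragraph
                continue
            # Start a new chunk when adding this paragraph would exceed the limit.
            if buffer and len(buffer) + len(paragraph) + 2 > max_chars:
                chunks.append(Chunk(buffer, page_number, chapter))
                buffer = ""
            buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        # Chunks stay within one page, so the page number is exact.
        if buffer:
            chunks.append(Chunk(buffer, page_number, chapter))
    return chunks
```

Each `Chunk` could then be embedded and written to the database together with its page number and chapter. Subsection detection would need a similar, textbook-specific heading rule.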

jurmy24 added the enhancement and dev labels Nov 1, 2024
jurmy24 linked a pull request Nov 30, 2024 that will close this issue
jurmy24 removed a link to a pull request Nov 30, 2024

jurmy24 commented Dec 9, 2024

This is quite a big feature request. Will likely split it up.
