[Feature]: Resource ingestion pipeline to the vector database #43

jurmy24 · 2024-11-01T09:35:38Z

Is your feature request related to a problem? Please describe.
It takes a lot of work to convert PDFs (e.g. textbooks) into well-divided chunks, to find all the relevant metadata (page numbers, associated chapter, subsection, etc.), and to put this into the vector database chunks (plus the associated resources and sections tables).

Describe the solution you'd like
I want a pipeline under the scripts/database folder that takes a PDF as input and automatically uploads the chunks and their metadata to the database. Discuss the best way to do this with me.
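
As a starting point for that discussion, here is a minimal sketch of the chunking-plus-metadata step only. It assumes the page text has already been extracted from the PDF into a list of strings (one per page), and that chapter headings look like "Chapter 3: ..." at the start of a line. The `Chunk` fields, the `chunk_pages` helper, and the heading pattern are all hypothetical and do not reflect the actual database schema; embedding and uploading are left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    """One candidate row for the vector database (hypothetical schema)."""
    content: str
    page_number: int          # 1-based page the chunk was taken from
    chapter: Optional[str]    # most recent chapter heading seen, if any


# Assumption: chapter headings start a line and look like "Chapter 3: Forces".
CHAPTER_RE = re.compile(r"^Chapter\s+\d+.*$", re.MULTILINE)


def chunk_pages(pages: List[str], max_words: int = 200) -> List[Chunk]:
    """Split each page into fixed-size word windows and attach metadata.

    Simplification: a heading found anywhere on a page is applied to
    every chunk from that page, including text that precedes it.
    """
    chunks: List[Chunk] = []
    chapter: Optional[str] = None
    for page_number, text in enumerate(pages, start=1):
        match = CHAPTER_RE.search(text)
        if match:
            chapter = match.group(0).strip()
        words = text.split()
        for start in range(0, len(words), max_words):
            content = " ".join(words[start:start + max_words])
            chunks.append(Chunk(content, page_number, chapter))
    return chunks
```

Fixed-size word windows are only a placeholder; splitting on subsection boundaries would probably give better chunks, but depends on how reliably headings can be detected in the source PDFs.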

jurmy24 · 2024-12-09T20:32:11Z

This is quite a big feature request. Will likely split it up.

jurmy24 added the enhancement (New feature or request) and dev (Anything related to internal tooling/tests/CICD) labels Nov 1, 2024

jurmy24 linked a pull request Nov 30, 2024 that will close this issue

Jurmy24/development/flow review #61

Merged

jurmy24 removed a link to a pull request Nov 30, 2024

Jurmy24/development/flow review #61

Merged

jurmy24 added this to the Resource Ingestion Pipeline milestone Dec 10, 2024

jurmy24 assigned alvaro-mazcu, iamrobzy and louhelhir Dec 10, 2024
