fix: [PDF document with 81 pages being indexed into 1 node in Qdrant and Postgres, missing 99% of the document after "successfully indexed"] #1043

Seth-Peters · 2024-12-30T09:08:20Z

Describe the bug

When trying the community version, after successfully connecting an Azure LLM, Qdrant, and llamaparse, I tested by uploading a single document and clicking "index". It reports that the document was indexed successfully, but with only "1 node". On further investigation, the Qdrant vector DB contains a single indexed node holding only the text of the document's title page. No other part of the document is indexed.

To reproduce

Connect an Azure LLM, llamaparse, and Qdrant, upload a PDF with chunk_size = 1024 and overlap = 128, then press "index".

Expected behavior

I would expect to see thousands of nodes in my Qdrant vector DB, covering the whole parsed and split document.
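
For a rough sense of scale, here is a minimal sketch of a sliding-window split using the settings above. It assumes chunk_size and overlap are measured in tokens and that a page holds about 500 tokens; both are assumptions for illustration, and `split_with_overlap` and `expected_chunk_count` are hypothetical helpers, not Unstract's actual splitter.

```python
from math import ceil


def split_with_overlap(tokens, chunk_size=1024, overlap=128):
    """Split a token list into fixed-size windows that overlap by `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
        start += stride
    return chunks


def expected_chunk_count(total_tokens, chunk_size=1024, overlap=128):
    """Closed-form count that matches split_with_overlap."""
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    return ceil((total_tokens - overlap) / (chunk_size - overlap))


# Hypothetical figure: 81 pages at roughly 500 tokens per page.
ASSUMED_TOKENS_PER_PAGE = 500
total = 81 * ASSUMED_TOKENS_PER_PAGE
tokens = ["tok"] * total
chunks = split_with_overlap(tokens)
print(len(chunks), expected_chunk_count(total))
```

The exact count depends on the real splitter and on how much text the parser extracts, but any document longer than one chunk yields more than one node, so a single node is not explained by the chunking settings alone.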

Environment details

Version: v0.101.6

Screenshots

Full log:

Parsing nodes: 100% 1/1:

Qdrant collection with 1 point:

EDIT:
Signed up for the free version of Unstract Cloud; same issue there. It only indexes the first few characters of my document. I have checked that the llamaparse API works fine with my document.

Screenshot of the unstract cloud:

chunks used button:

ritwik-g · 2024-12-31T06:45:22Z

@Seth-Peters could you try once with the free version of LLMWhisperer to confirm whether this issue happens only with llamaparse?

Seth-Peters · 2024-12-31T06:55:44Z

@ritwik-g - it works with LLMWhisperer. Not sure what is happening, since I did check that the document itself works in my llamaparse playground (with my account/API key there).

Seth-Peters added the bug Something isn't working label Dec 30, 2024
