
fix: [PDF document with 81 pages being indexed into 1 node in Qdrant and Postgres, missing 99% of the document after "successfully indexed"] #1043

Open
Seth-Peters opened this issue Dec 30, 2024 · 3 comments
Labels
bug Something isn't working


Seth-Peters commented Dec 30, 2024

Describe the bug

While trying the community version, I successfully connected an Azure LLM, Qdrant, and LlamaParse, then tested by uploading a single document and clicking "index". The UI reports that the document was indexed successfully, but with only "1 node". On further investigation, the Qdrant vector DB contains a single indexed node holding only the text of the document's title page. No other part of the document is indexed.

To reproduce

Configure an Azure LLM, LlamaParse, and Qdrant, upload a PDF with chunk_size = 1024 and overlap = 128, then press "index".

Expected behavior

I would expect the Qdrant vector DB to contain thousands of nodes covering the whole parsed and split document.
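As a rough sanity check on the expected node count, here is a minimal sketch that estimates how many chunks a sliding-window splitter would produce with chunk_size = 1024 and overlap = 128. It is not Unstract's actual splitter: the function name `estimate_chunk_count`, the token-based counting, and the figure of about 500 tokens per page are all assumptions made for illustration. The exact count depends on how dense the text is, but under these assumptions an 81-page document should yield far more than one node, while text the length of a single title page yields exactly one.

```python
import math


def estimate_chunk_count(total_tokens: int, chunk_size: int = 1024, overlap: int = 128) -> int:
    """Estimate how many overlapping chunks a sliding-window splitter yields.

    Hypothetical helper for illustration; real splitters may also respect
    sentence or paragraph boundaries, which changes the count slightly.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1  # everything fits in a single chunk
    stride = chunk_size - overlap  # each new chunk advances by this many tokens
    return 1 + math.ceil((total_tokens - chunk_size) / stride)


# Assumption: roughly 500 tokens per page. This is a guess, not a measurement.
PAGES = 81
ASSUMED_TOKENS_PER_PAGE = 500

full_document = estimate_chunk_count(PAGES * ASSUMED_TOKENS_PER_PAGE)
title_page_only = estimate_chunk_count(ASSUMED_TOKENS_PER_PAGE)
print(full_document, title_page_only)
```

A result of one node is what this estimate gives when the splitter receives no more than one chunk's worth of text, which would be consistent with only the title page reaching the indexing step.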

Environment details

  • Version: v0.101.6

Screenshots

Full log:

image

Parsing nodes: 100% 1/1:

image

Qdrant collection with 1 point:

image

EDIT:
I signed up for the free version of Unstract Cloud and see the same issue there: it indexes only the first few characters of my document. I have verified that the LlamaParse API works fine with my document.

Screenshot of the unstract cloud:

chunks used button:

image

@Seth-Peters Seth-Peters added the bug Something isn't working label Dec 30, 2024
ritwik-g (Contributor) commented

@Seth-Peters could you try once with the free version of LLMWhisperer to confirm whether this issue occurs only with LlamaParse?

Seth-Peters (Author) commented

@ritwik-g - it works with LLMWhisperer. I'm not sure what is happening, since I verified that the document itself parses correctly in my LlamaParse playground (with my own account and API key).
