Writing picture enrichment annotations to Markdown file #625

Open
theobgbd opened this issue Dec 18, 2024 · 3 comments
Labels
question Further information is requested


@theobgbd

Image annotations to MD

Following the Figure Enrichment tutorial, it is easy to add classification metadata to an image with `element.annotations.append(data)`.

However, this data is not included in the export to Markdown. Is there a way to write it alongside the image path in the final MD document? That would be great for our RAG application.
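A minimal, self-contained sketch of the requested behavior follows. It assumes a simplified stand-in for Docling's picture element: the `Picture` and `Annotation` classes and the `picture_to_markdown` helper are hypothetical and not part of docling-core. Annotations are appended the same way as in the tutorial (`element.annotations.append(data)`), and each annotation's text is then written as its own paragraph directly under the image reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    # Hypothetical stand-in for an enrichment result (e.g. a class label).
    kind: str
    text: str


@dataclass
class Picture:
    # Hypothetical stand-in for a picture element that knows its image path.
    image_path: str
    annotations: List[Annotation] = field(default_factory=list)


def picture_to_markdown(picture: Picture) -> str:
    """Render the image reference, then one paragraph per annotation."""
    lines = [f"![Image]({picture.image_path})"]
    for ann in picture.annotations:
        # Prefix with the kind so a RAG chunker can tell annotations apart.
        lines.append(f"{ann.kind}: {ann.text}")
    # Blank lines between entries keep them as separate Markdown paragraphs.
    return "\n\n".join(lines)


element = Picture(image_path="images/figure_1.png")
element.annotations.append(Annotation(kind="classification", text="bar_chart"))
print(picture_to_markdown(element))
```

Keeping the annotation text next to the image link means a text splitter that chunks the Markdown will usually keep the two together.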

@theobgbd theobgbd added the question Further information is requested label Dec 18, 2024
@gauravmindzk

Hi @theobgbd,
I'm currently exploring PDF parsers for my PDF RAG app.

What does picture enrichment mean in Docling?

I'm searching for a way to add context to the images in my PDFs, which will eventually help with image summarization.

Is picture enrichment the answer?

@dolfim-ibm
Contributor

Picture enrichment currently has specific typing for classification, description (like model captioning), charts, chemical structures, and a generic one, `PictureMiscData`: https://github.com/DS4SD/docling-core/blob/main/docling_core/types/doc/document.py#L261.

We have some ideas on how to enable serialization of that data when exporting to Markdown or other formats.
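To illustrate how typed annotations could be flattened to text for an export, here is a sketch that dispatches on the annotation type and prefixes each snippet with its `kind`. The classes and field names (`ClassificationData`, `DescriptionData`, `MiscData`, `predicted_classes`, `content`) are simplified assumptions loosely modeled on the categories listed above. They are not the docling-core definitions, which live in the linked `document.py`, and this is not the maintainers' planned serialization. Chart and chemical-structure types are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union


@dataclass
class ClassificationData:
    # (class_name, confidence) pairs produced by a picture classifier.
    predicted_classes: List[Tuple[str, float]]
    kind: str = "classification"


@dataclass
class DescriptionData:
    # Free-text caption, e.g. from a vision-language model.
    text: str
    provenance: str = ""
    kind: str = "description"


@dataclass
class MiscData:
    # Generic key/value payload for anything without a dedicated type.
    content: Dict[str, Any] = field(default_factory=dict)
    kind: str = "misc"


Annotation = Union[ClassificationData, DescriptionData, MiscData]


def annotation_to_text(ann: Annotation) -> str:
    """Flatten one typed annotation into a 'kind: value' text snippet."""
    if isinstance(ann, ClassificationData):
        if not ann.predicted_classes:
            return f"{ann.kind}: unknown"
        # Keep only the most confident prediction.
        name, conf = max(ann.predicted_classes, key=lambda p: p[1])
        return f"{ann.kind}: {name} ({conf:.2f})"
    if isinstance(ann, DescriptionData):
        return f"{ann.kind}: {ann.text}"
    if isinstance(ann, MiscData):
        # Sort keys so the exported text is deterministic.
        pairs = ", ".join(f"{k}={v}" for k, v in sorted(ann.content.items()))
        return f"{ann.kind}: {pairs}"
    raise TypeError(f"unsupported annotation type: {type(ann).__name__}")


print(annotation_to_text(ClassificationData([("bar_chart", 0.93), ("line_chart", 0.05)])))
```

Keeping the kind as a prefix lets downstream chunking in a RAG pipeline distinguish a model caption from a classifier label.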

@theobgbd
Author

theobgbd commented Jan 8, 2025

> I'm searching for a way to add context to the images in my PDFs, which will eventually help with image summarization.
>
> Is picture enrichment the answer?

@gauravmindzk What I meant by "picture enrichment" was model captioning with a vision LLM, with added context from the document. For my use case, I used [this template](https://ds4sd.github.io/docling/examples/develop_picture_enrichment/) as a base to set up a call to an on-premise Pixtral instance that captions the images, and then stored the answer in the `annotations` field.
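The workflow described above (caption each picture with a vision LLM, pass surrounding document text as context, then store the answer as an annotation) can be sketched as follows. The `Picture` class, the prompt wording, the `caption_picture` helper, the `"vision-llm"` provenance label, and the `call_model` callable are all hypothetical. In a real setup, `call_model` would wrap the request to the Pixtral endpoint, whose API is not shown in this thread. An offline stub is used so the sketch runs as-is.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Picture:
    # Hypothetical stand-in for a picture element holding raw image bytes.
    image_bytes: bytes
    annotations: List[Dict[str, str]] = field(default_factory=list)


def build_prompt(context: str, max_chars: int = 1000) -> str:
    """Ask for a caption grounded in nearby document text (truncated)."""
    return (
        "Describe this figure for a search index. "
        "Surrounding text from the document:\n" + context[:max_chars]
    )


def caption_picture(
    picture: Picture,
    context: str,
    call_model: Callable[[bytes, str], str],
    provenance: str = "vision-llm",
) -> None:
    """Caption one picture and append the answer to its annotations."""
    answer = call_model(picture.image_bytes, build_prompt(context))
    picture.annotations.append(
        {"kind": "description", "text": answer.strip(), "provenance": provenance}
    )


def fake_model(image: bytes, prompt: str) -> str:
    # Offline stub standing in for the real vision-LLM request.
    return "  A bar chart of revenue by region.  "


pic = Picture(image_bytes=b"fake-image")
caption_picture(pic, "Table 3 compares revenue by region.", fake_model)
print(pic.annotations[0]["text"])
```

Injecting the model call as a parameter keeps the enrichment logic testable without a running model server. Truncating the context bounds the prompt size when pictures sit in long sections.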

> Picture enrichment currently has specific typing for classification, description (like model captioning), charts, chemical structures, and a generic one, `PictureMiscData`: https://github.com/DS4SD/docling-core/blob/main/docling_core/types/doc/document.py#L261.
>
> We have some ideas on how to enable serialization of that data when exporting to Markdown or other formats.

@dolfim-ibm Thanks for your feedback; I didn't know about the `PictureMiscData` field. I forked the docling_core project and wrote a crude "ANNOTATION" Markdown export format that writes the content of the annotations as the "alt" description of the image in the MD file. That solution fits my needs for now, but it's nice to know you plan to serialize this data in the future.
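A rough sketch of the workaround described above, which writes annotation text into the image's alt text, could look like the following. This is not the fork's actual code. The function names, the `"; "` separator, the `"Image"` fallback, and the escaping rules are assumptions. The escaping collapses whitespace onto one line and backslash-escapes backslashes and square brackets so the text cannot close the `![...]` early.

```python
import re
from typing import Iterable


def escape_alt_text(text: str) -> str:
    """Make arbitrary text safe to place inside Markdown ![...] alt text."""
    text = re.sub(r"\s+", " ", text).strip()  # keep alt text on one line
    text = text.replace("\\", "\\\\")  # escape backslashes before brackets
    return text.replace("[", "\\[").replace("]", "\\]")


def image_with_annotation_alt(
    image_path: str, annotations: Iterable[str], fallback: str = "Image"
) -> str:
    """Render ![alt](path), using the joined annotation texts as alt text."""
    parts = [escape_alt_text(a) for a in annotations if a.strip()]
    alt = "; ".join(parts) if parts else fallback
    return f"![{alt}]({image_path})"


print(image_with_annotation_alt("images/fig1.png", ["A bar chart [Q3]\nrevenue"]))
```

Image paths containing spaces or parentheses would need their own escaping or angle-bracket destinations, which this sketch does not handle.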
