Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add document-retrieval to Hub as a task #1097

Merged
merged 5 commits into from
Feb 5, 2025
Merged

Conversation

merveenoyan
Copy link
Contributor

@merveenoyan merveenoyan commented Jan 13, 2025

This PR adds document retrieval to Hub for following models that are used heavily and now there's a lot of them:

Icon looks like this:
Screenshot 2025-01-13 at 11 10 29

Other names I thought of:

  • multimodal-feature-extraction (ambiguous, also overlaps with zero shot image classification)
  • document-feature-extraction (sounds like plain document embeddings for a document backbone, best if this doesn't happen and those models are covered under image-feature-extraction instead)
  • zero-shot-document-classification (too long, sounds odd but would be accurate, but we use these models often for retrieval and not this purpose)
    this name is very to the point so best if it stays like it

@merveenoyan
Copy link
Contributor Author

@NielsRogge

@merveenoyan
Copy link
Contributor Author

@pcuenca can you leave a review? 👀💗

Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it not already very close to image-feature-extraction? How many models would be covered by this? (i'm wondering if it isn't too specific)

@julien-c
Copy link
Member

(no strong opinion though)

@@ -676,6 +676,11 @@ export const PIPELINE_DATA = {
color: "red",
hideInDatasets: true,
},
"document-retrieval": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a bit NLP-y to me. Could we use something like "visual-document-retrieval", for symmetry with "visual-question-answering"?

@pcuenca
Copy link
Member

pcuenca commented Jan 17, 2025

How many models would be covered by this? (i'm wondering if it isn't too specific)

I have the same question. Would, for example, all the ColPali models be included here?

@pcuenca
Copy link
Member

pcuenca commented Jan 17, 2025

We'd also have to tag multiple models before merging, as usual.

@merveenoyan
Copy link
Contributor Author

merveenoyan commented Jan 18, 2025

@pcuenca I wanted to wait for the naming consensus before opening PRs to them. we can do visual-document-retrieval yes.

@julien-c it's actually not. those are singular image backbones used to train traditional vision models. these models on the contrast are zero shot models built on VLMs to do document retrieval on multimodal RAG pipelines. they're a bit like CLIP, but for documents, and they're not used for classification (they have long context length and have fine grained image understanding). the number of models keep increasing as number of VLMs increase and they're all wrongly tagged hence this PR. (there's ColPali, ColQwen, ColSmolVLM, DSE models and more now)

here's an tldr explainer on how they're used https://x.com/mervenoyann/status/1831409380040044762?s=46

@julien-c
Copy link
Member

ok, sounds good

Copy link
Member

@Vaibhavs10 Vaibhavs10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe image-document-retrieval so that it's more in sync with the current task vision/ audio related tasks names?

packages/tasks/src/pipelines.ts Show resolved Hide resolved
@pcuenca
Copy link
Member

pcuenca commented Jan 31, 2025

Ok so the items in discussion were:

  • Name: @merveenoyan thinks image-document-retrieval is not ideal, so can we move forward with visual-document-retrieval?
  • hideInDatasets: if there are not many, I'd hide and we can "unhide" later on, but deferring to @merveenoyan to open PRs and decide once the name has been confirmed.

@merveenoyan
Copy link
Contributor Author

@pcuenca sorry I was busy with smolagents sprint. I'll edit with this visual-document-retrieval, and hide in datasets for now

@merveenoyan
Copy link
Contributor Author

locally lint passes with format and check too, idk why this is happening (I also did pnpm i just in case something wasn't matching the CI's linting setup)

Copy link
Member

@Vaibhavs10 Vaibhavs10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes! the failing CI is unrelated! (other PRs suffer from the same thing)

Copy link
Member

@Vaibhavs10 Vaibhavs10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

green CI yay 🙏

@merveenoyan merveenoyan merged commit 8b50438 into main Feb 5, 2025
5 checks passed
@merveenoyan merveenoyan deleted the add-doc-feature-ext branch February 5, 2025 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants