Set artifacts_path for picture classifier #870

Open
simonschoe opened this issue Feb 3, 2025 · 0 comments
Labels
bug Something isn't working

Comments

simonschoe commented Feb 3, 2025

Bug

When downloading and pointing to models locally, we commonly use pipeline_options.artifacts_path. For example:
pipeline_options.artifacts_path = "models"

The layout and tableformer model are then imported from the following paths:

  • Layout model: models/model_artifacts/layout/...
  • Tableformer: models/model_artifacts/tableformer/...

However, the new picture classifier, when loaded from a local folder, requires all of its model files to sit directly in the root folder, i.e., models:

  • Picture classifier: models/config.json, models/model.safetensors
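
To make the mismatch concrete, here is a minimal sketch that reports which of the locations listed above exist under a given root. The relative paths come from this issue; the `describe_artifacts` helper and the `EXPECTED` mapping are hypothetical and not part of docling.

```python
from pathlib import Path

# Relative paths as described in this issue. The helper is hypothetical
# and only inspects the filesystem; it does not call docling.
EXPECTED = {
    "layout": "model_artifacts/layout",
    "tableformer": "model_artifacts/tableformer",
    "picture_classifier_config": "config.json",
    "picture_classifier_weights": "model.safetensors",
}


def describe_artifacts(root):
    """Return a dict mapping each expected entry to whether it exists under root."""
    root = Path(root)
    return {name: (root / rel).exists() for name, rel in EXPECTED.items()}


if __name__ == "__main__":
    for name, present in describe_artifacts("models").items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Run against a local models folder, this shows the picture classifier files sitting next to model_artifacts/ rather than inside it.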

I find this counterintuitive: the picture classifier does not follow the model_artifacts/<model_type> layout that the other models use.

EDIT: This is even more annoying when loading both the picture classifier and the CodeFormula model from local folders, since both accept the same artifacts_path.

Proposal

Two options that come to mind:
a. When initializing the DocumentFigureClassifierPredictor, simply point artifacts_path to models and allow the same folder structure as on Hugging Face. For example:

DocumentFigureClassifierPredictor(
    artifacts_path="models", # model files are in models/ds4sd/DocumentFigureClassifier
    device="cpu",
    num_threads=4,
) 

b. On Hugging Face, give the DocumentFigureClassifier repo the same model_artifacts/<model_type> structure as the main docling repository, and allow import by simply stating the root folder. For example:

DocumentFigureClassifierPredictor(
    artifacts_path="models", # model files are in models/model_artifacts/DocumentFigureClassifier/...
    device="cpu",
    num_threads=4,
) 
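
Either option boils down to resolving a subfolder below artifacts_path before loading the model. A rough sketch of how that lookup might work, assuming the classifier folder can be recognised by its config.json; `resolve_classifier_dir` is a hypothetical helper, not an existing docling function, and the fallback to the root folder is only there to keep the current behaviour working:

```python
from pathlib import Path


def resolve_classifier_dir(artifacts_path):
    """Return the first candidate folder under artifacts_path holding config.json.

    Hypothetical helper: the candidate subfolders mirror options a and b
    from this issue, followed by the root folder (current behaviour).
    """
    root = Path(artifacts_path)
    candidates = [
        root / "ds4sd" / "DocumentFigureClassifier",            # option a
        root / "model_artifacts" / "DocumentFigureClassifier",  # option b
        root,                                                   # current layout
    ]
    for candidate in candidates:
        if (candidate / "config.json").is_file():
            return candidate
    raise FileNotFoundError(
        f"No picture classifier config.json found under {root}"
    )
```

The resolved folder could then be handed to DocumentFigureClassifierPredictor as artifacts_path, so users would only ever state the root folder.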

Docling version

2.16.0

@simonschoe simonschoe added the bug Something isn't working label Feb 3, 2025