[models] Add VIPTR recognition model #1867

felixdittrich92 · 2025-02-05T12:32:55Z

🚀 The feature

Paper: VIPTR
Implementation: https://github.com/cxfyxl/VIPTR

PyTorch implementation
TensorFlow implementation

Hi, 

I would like to suggest adding another state-of-the-art text recognition architecture to docTR:
[SVIPTR](https://paperswithcode.com/paper/viptr-a-vision-permutable-extractor-for-fast)
It promises accurate results at low latency.

Notably, the SVIPTR-T (Tiny) variant delivers highly competitive accuracy on par with other lightweight models and achieves SOTA inference speeds. Meanwhile, the SVIPTR-L (Large) attains SOTA accuracy in single-encoder-type models, while maintaining a low parameter count and favorable inference speed.

Thanks for your consideration.

Inference latency should be comparable to crnn_mobilenet_v3_large, and the results are hopefully comparable to parseq.
The addition has been agreed on.


felixdittrich92 · 2025-02-05T12:35:43Z

If someone wants to work on this, feel free to ping here. Otherwise, I plan to start working on it once we have a strategy in place to make docTR multilingual.

felixdittrich92 added this to the 1.0.0 milestone Feb 5, 2025

felixdittrich92 self-assigned this Feb 5, 2025
