Hello guys,
The guide to fine-tuning both text detection and recognition with the default algorithms (https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/finetune_en.md) is very well written, and I was able to follow it successfully. DB and SVTR_LCNet are the default models for fine-tuning detection and recognition, respectively.
Brief Overview of models employed:
Text Detection: en_PP-OCRv3_det
Text Recognition: en_PP-OCRv4_rec
The corresponding models and YAML files can be found here: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/models_list.md
I would like to know how to fine-tune the following:
1. Text detection with SAST. Could I simply download the pretrained model with wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams (from https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/detection_en.md) and use it with the config file https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/det/det_r50_vd_sast_icdar15.yml? I want to fine-tune, not train from scratch.
2. Text recognition with VisionLAN. I have been exploring this for weeks, but I am still confused about how to proceed.
2.1 As per https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/recognition_en.md, we can download the pretrained model with wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv4/english/en_PP-OCRv4_rec_train.tar, which is the en_PP-OCRv4_rec model. This step is pretty clear.
2.2 But when I looked at https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/algorithm_rec_visionlan_en.md, it says that the pretrained and trained models can be downloaded from this:
The VisionLAN page also mentions that PaddleOCR modularizes the code, so training different recognition models only requires changing the configuration file.
The information in 2.1 and 2.2 is a bit confusing to me. Which model should I use for fine-tuning with VisionLAN?
Thanks in advance for guidance!
@GreatV @WenmuZhou @LDOUBLEV @MissPenguin @tink2123 @UserWangZz ..... can someone please help me?