Fine-tuning recognition model en_PP-OCRv4_rec #13897

aiden890 · 2024-09-23T07:36:03Z

Hi!

I am trying to fine-tune the recognition model to recognize spaced special characters.

I trained the model on 30,000 generated images.

It appears that the pre-trained parameters are not being loaded successfully. There are no warnings such as "pretrained parameter not in the model," yet the accuracy starts at 0 and the loss exceeds 150. In addition, words that were previously recognized well are now recognized less accurately.
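One way to check whether a checkpoint really matches the model is to compare parameter names and shapes on both sides. The sketch below is a generic helper, not part of PaddleOCR; the function name `diff_param_shapes` and the parameter names in the toy example are hypothetical.

```python
def diff_param_shapes(checkpoint_shapes, model_shapes):
    """Compare two {param_name: shape} mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names the model has but the checkpoint lacks
      unexpected - names the checkpoint has but the model lacks
      mismatched - names present in both but with different shapes
    """
    ckpt_keys = set(checkpoint_shapes)
    model_keys = set(model_shapes)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    mismatched = {
        name: (tuple(checkpoint_shapes[name]), tuple(model_shapes[name]))
        for name in sorted(ckpt_keys & model_keys)
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    return missing, unexpected, mismatched


# Toy example with hypothetical parameter names: the head was resized
# because the dictionary changed, so its weight no longer matches.
ckpt = {"backbone.conv.weight": (16, 3, 3, 3), "head.fc.weight": (120, 97)}
model = {
    "backbone.conv.weight": (16, 3, 3, 3),
    "head.fc.weight": (120, 130),
    "head.fc.bias": (130,),
}
missing, unexpected, mismatched = diff_param_shapes(ckpt, model)
print("missing:", missing)
print("unexpected:", unexpected)
print("mismatched:", mismatched)
```

With Paddle, the two mappings could presumably be built from `paddle.load(path)` and `model.state_dict()` by reading each tensor's `.shape`. Any entry in `mismatched` would point to a layer (typically the output head) that cannot take its pre-trained weights.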

Here are my config file and training log:

Global:
  debug: false
  use_gpu: true
  epoch_num: 50
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 10
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: ./pretrained_models/en_PP-OCRv4_rec_train.pdparams
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
  distributed: false
  save_res_path: ./output/rec/predicts_ppocrv3_en.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 120
          depth: 2
          hidden_dims: 120
          kernel_size:
          - 1
          - 3
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - NRTRHead:
        nrtr_dim: 384
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - NRTRLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./out_img/
    ext_op_transform_idx: 1
    label_file_list:
    - ./long_dict_gt.txt
    - ./block_dict_gt.txt
    - ./spe_dict_gt.txt
    - ./longlong_dict_gt.txt
    ratio_list:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 48
        - 320
        - 3
        max_text_length: 25
    - RecAug: null
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - - 320
      - 32
    - - 320
      - 48
    - - 320
      - 64
    first_bs: 96
    fix_bs: false
    divided_factor:
    - 8
    - 16
    is_training: true
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./test_img
    label_file_list:
    - ./test_gt.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
profiler_options: null
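Because the size of the CTC output layer follows from `character_dict_path` and `use_space_char`, it can help to compute how many classes a given dictionary implies before reusing pre-trained weights. The following is a minimal sketch, assuming one character per dictionary line, an optional appended space character, and a single extra blank class for CTC; the helper name `ctc_class_count` is hypothetical.

```python
def ctc_class_count(dict_lines, use_space_char=True):
    """Number of CTC output classes implied by a character dictionary.

    Assumptions (labelled, not taken from the thread): each line holds
    one character, a space is appended when use_space_char is true, and
    CTC adds exactly one blank class.
    """
    chars = [line.rstrip("\r\n") for line in dict_lines]
    if use_space_char:
        chars.append(" ")
    return len(chars) + 1  # +1 for the CTC blank


# Toy dictionary with three characters.
toy_dict = ["a\n", "b\n", "c\n"]
print(ctc_class_count(toy_dict, use_space_char=True))
print(ctc_class_count(toy_dict, use_space_char=False))
```

For a real run, the lines could be read with `open(path, encoding="utf-8")`. If the count for a new dictionary differs from the count for the original one, the final layer has a different shape and its pre-trained weights cannot be reused as-is.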

train.log

My questions are:

1. Is it normal for the accuracy to start at zero? If the parameters didn't load successfully, what could be causing the problem?
2. If I change the dictionary, will using a model pre-trained on the original dictionary still perform well, provided the dataset is sufficiently large? Alternatively, is there a way (like setting an allowlist in EasyOCR) to prevent the recognition of unwanted characters?
3. According to finetune_en.md, it is recommended to delete the strategy. However, when I modify the config file (not the one I uploaded), an error occurs. To remove the GTC strategy from my model, do I need to remove NRTRLoss and RecConAug, and change the encoding method?

Thank you for your support!

VishyAnand28 · 2024-09-23T11:21:57Z

Hi @aiden890, I will try my best to answer the questions I can.

1. Yes, it is normal for the accuracy to start at 0 and increase over the epochs. However, it is not normal for it to stay at 0 for many epochs. Accuracy will stay at 0 for a long time if you use a different config file that prevents the parameters from being loaded successfully.
2. If you change the dictionary, you will need significantly more training data. Changing the dictionary is generally not recommended, but a larger dataset can make up for it. The recommended values for the detector and recognizer are listed here: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/finetune_en.md
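On the allowlist idea raised in the question: one generic option is to restrict the character set at decoding time instead of retraining. The sketch below applies an allowlist during greedy CTC decoding on plain Python lists. It is an illustration of the technique only, it is not a PaddleOCR option, and the function name, the toy character set, and the scores are all hypothetical.

```python
def greedy_ctc_decode_with_allowlist(probs, charset, allowlist, blank=0):
    """Greedy CTC decoding restricted to allowed characters.

    probs     - list of per-timestep score lists, one score per class
    charset   - charset[i] is the character for class i
    allowlist - characters that may appear in the output
    blank     - index of the CTC blank class (always allowed)
    """
    allowed = sorted(
        i for i, ch in enumerate(charset) if i == blank or ch in allowlist
    )
    out = []
    prev = blank
    for step in probs:
        # Pick the best class among the allowed ones only.
        best = max(allowed, key=lambda i: step[i])
        # Standard CTC collapse: drop blanks and repeated classes.
        if best != blank and best != prev:
            out.append(charset[best])
        prev = best
    return "".join(out)


charset = ["<blank>", "a", "b", "1"]
probs = [
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.1, 0.1, 0.7],
]
print(greedy_ctc_decode_with_allowlist(probs, charset, allowlist="ab1"))  # b1
print(greedy_ctc_decode_with_allowlist(probs, charset, allowlist="a1"))   # a1
```

Masking at decode time leaves the model untouched, so it only suppresses unwanted characters; it does not teach the model any new ones.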

Potential issues for poor results:

The dataset may be too simple, which can cause overfitting.
Some of the hyperparameters, such as the learning rate, may need adjusting.
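To get a feel for what adjusting the learning rate does, the schedule can be plotted or printed before training. The sketch below is a simplified linear-warmup plus cosine-decay schedule using the `learning_rate: 0.0005` value from the config above. It is an approximation for illustration, not Paddle's exact implementation, and the step counts are made up.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr=0.0005):
    """Simplified schedule: linear warmup from 0, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Hypothetical run: 100 steps in total, the first 10 used for warmup.
for step in (0, 5, 10, 55, 100):
    print(step, warmup_cosine_lr(step, total_steps=100, warmup_steps=10))
```

When fine-tuning from pre-trained weights, a smaller `base_lr` is a common first thing to try, since a large step size can overwrite what the pre-trained model already learned.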

aiden890 Sep 25, 2024

It seems I need to prepare a new dataset.

Thank you so much for your response!

VishyAnand28 Sep 25, 2024

@aiden890 you are welcome!

I have a new query here: #13904. Could you take a look at it and suggest a solution if possible?
