This repository hosts the TextSeg dataset and the code for TexRNet from the following paper:
Xingqian Xu, Zhifei Zhang, Zhaowen Wang, Brian Price, Zhonghao Wang and Humphrey Shi, Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach, ArXiv Link
Note: Our dataset and code are released. To download the dataset, please send in a request.
Text in the real world is extremely diverse, yet current text datasets do not reflect such diversity very well. To bridge this gap, we propose TextSeg, a large-scale, finely annotated, multi-purpose text dataset that collects scene and design text with six types of annotations: word- and character-wise bounding polygons, masks, and transcriptions. We also introduce the Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text, e.g. non-convex boundaries and diverse textures, which often burden traditional segmentation models. TexRNet refines the results of common segmentation approaches via key-feature pooling and attention, so that wrongly activated text regions can be corrected. We also introduce trimap and discriminator losses that show significant improvement on text segmentation.
Our dataset (TextSeg) is for academic use only and cannot be used in any commercial project or research. To download the data, please send a request email to [email protected] and tell us which school you are affiliated with.
In this table, we report the performance of TexRNet on five text segmentation datasets, including ours.
Method | TextSeg (Ours) fgIoU | TextSeg (Ours) F-score | ICDAR13 FST fgIoU | ICDAR13 FST F-score | COCO_TS fgIoU | COCO_TS F-score | MLT_S fgIoU | MLT_S F-score | Total-Text fgIoU | Total-Text F-score |
DeeplabV3+ | 84.07 | 0.914 | 69.27 | 0.802 | 72.07 | 0.641 | 84.63 | 0.837 | 74.44 | 0.824 |
HRNetV2-W48 | 85.03 | 0.914 | 70.98 | 0.822 | 68.93 | 0.629 | 83.26 | 0.836 | 75.29 | 0.825 |
HRNetV2-W48 + OCR | 85.98 | 0.918 | 72.45 | 0.830 | 69.54 | 0.627 | 83.49 | 0.838 | 76.23 | 0.832 |
Ours: TexRNet + DeeplabV3+ | 86.06 | 0.921 | 72.16 | 0.835 | 73.98 | 0.722 | 86.31 | 0.830 | 76.53 | 0.844 |
Ours: TexRNet + HRNetV2-W48 | 86.84 | 0.924 | 73.38 | 0.850 | 72.39 | 0.720 | 86.09 | 0.865 | 78.47 | 0.848 |
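For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how foreground IoU (fgIoU) and F-score can be computed for a single pair of binary masks. The function name `fg_iou_and_fscore` and the nested-list mask format are assumptions made for illustration; this is not the evaluation code shipped in this repo, which may aggregate over a whole dataset differently. The sketch returns fgIoU as a fraction, whereas the table appears to list it as a percentage.

```python
def fg_iou_and_fscore(pred, target):
    """Return (fgIoU, F-score) for two same-sized binary masks.

    Hypothetical helper: masks are nested lists of 0/1 values,
    where 1 marks a text (foreground) pixel.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # text pixel correctly predicted as text
            elif p:
                fp += 1  # background pixel predicted as text
            elif t:
                fn += 1  # text pixel that was missed
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: no text in either mask counts as a perfect match.
        return 1.0, 1.0
    fg_iou = tp / union
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return fg_iou, f_score


pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
fg_iou, f_score = fg_iou_and_fscore(pred, target)
print(f"fgIoU = {100 * fg_iou:.2f}%, F-score = {f_score:.3f}")
```

In this toy case one pixel is a true positive, one a false positive, and one a false negative, so fgIoU is 1/3 and precision and recall are both 0.5.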
conda create -n texrnet python=3.7
conda activate texrnet
pip install -r requirement.txt
First, make the following directories to hold pre-trained models, dataset, and running logs:
mkdir ./pretrained
mkdir ./data
mkdir ./log
Second, download the models from this link. Move the downloaded models to ./pretrained.
Third, make sure that ./data contains the data. A sample root directory for TextSeg would be ./data/TextSeg.
Lastly, evaluate the model and compute fgIoU/F-score with the following command:
python main.py --eval --pth [model path] [--hrnet] [--gpu 0 1 ...] --dsname [dataset name]
Here is a sample command to evaluate a TexRNet_HRNet on TextSeg with 4 GPUs:
python main.py --eval --pth pretrained/texrnet_hrnet.pth --hrnet --gpu 0 1 2 3 --dsname textseg
The program will store results and the execution log in ./log/eval.
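When evaluating several checkpoints or datasets in a row, it can help to assemble the command programmatically. The sketch below uses only the flags documented above (`--eval`, `--pth`, `--hrnet`, `--gpu`, `--dsname`); the helper name `build_eval_command` is hypothetical and not part of this repo.

```python
def build_eval_command(pth, dsname, hrnet=False, gpus=()):
    """Assemble the documented evaluation command as an argument list.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    cmd = ["python", "main.py", "--eval", "--pth", pth]
    if hrnet:
        cmd.append("--hrnet")
    if gpus:
        cmd.append("--gpu")
        cmd.extend(str(g) for g in gpus)
    cmd.extend(["--dsname", dsname])
    return cmd


cmd = build_eval_command(
    "pretrained/texrnet_hrnet.pth", "textseg", hrnet=True, gpus=[0, 1, 2, 3]
)
print(" ".join(cmd))
# To launch it from Python instead of printing it:
#   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.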
Similarly, for training, first create the following directories:
mkdir ./pretrained
mkdir ./pretrained/init
mkdir ./data
mkdir ./log
Second, we use multiple pre-trained models for training. Download these initial models from this link. Move those models to ./pretrained/init. Also, make sure that ./data contains the data.
Lastly, execute the training code with the following command:
python main.py [--hrnet] [--gpu 0 1 ...] --dsname [dataset name] [--trainwithcls]
Here is a sample command to train a TexRNet_HRNet on TextSeg with classifier and discriminator losses using 4 GPUs:
python main.py --hrnet --gpu 0 1 2 3 --dsname textseg --trainwithcls
The training configs, logs, and models will be stored in ./log/texrnet_[dsname]/[exid]_[signature].
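Because each run gets its own [exid]_[signature] folder, finding the most recent one by hand can be tedious. The following is a small sketch, assuming only the directory pattern described above; the helper name `latest_run_dir` is hypothetical, and it picks the newest folder by modification time rather than by parsing [exid], whose exact format is not specified here.

```python
from pathlib import Path


def latest_run_dir(log_root, dsname):
    """Return the most recently modified run folder, or None if there is none.

    Hypothetical helper: assumes runs live under <log_root>/texrnet_<dsname>/.
    """
    exp_root = Path(log_root) / f"texrnet_{dsname}"
    if not exp_root.is_dir():
        return None
    runs = [p for p in exp_root.iterdir() if p.is_dir()]
    # Newest by modification time; the [exid] format itself is not parsed.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


print(latest_run_dir("./log", "textseg"))
```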
@article{xu2020rethinking,
title={Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach},
author={Xu, Xingqian and Zhang, Zhifei and Wang, Zhaowen and Price, Brian and Wang, Zhonghao and Shi, Humphrey},
journal={arXiv preprint arXiv:2011.14021},
year={2020}
}
The directory ./hrnet_code is copied directly from the official HRNet GitHub repository (link). Ownership of the HRNet code should be credited to the HRNet authors, and users should follow their terms of use.