upload readme and figs
Hxyz-123 committed May 2, 2023
1 parent 7b610a8 commit 86d269f
Showing 7 changed files with 179 additions and 2 deletions.
181 changes: 179 additions & 2 deletions README.md
@@ -1,2 +1,179 @@
# SAMText
The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"
<h1 align="center">[arXiv 2023] Scalable Mask Annotation for Video Text Spotting</h1>
<p align="center">
<h4 align="center">This is the official repository of the paper <a href="https://xxxx.com">Scalable Mask Annotation for Video Text Spotting</a>.</h4>
<h5 align="center"><em>Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, Dacheng Tao</em></h5>
<p align="center">
<a href="#news">News</a> |
<a href="#abstract">Abstract</a> |
<a href="#method">Method</a> |
<a href="#usage">Usage</a> |
<a href="#results">Results</a> |
<a href="#statement">Statement</a>
</p>




# News

***02/05/2023***

- The paper is posted on arXiv! The code will be made publicly available once it has been cleaned up.

- Relevant projects:

> [**DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer** ](https://arxiv.org/abs/2207.04491) | [Code](https://github.com/ymy-k/DPText-DETR)
>
> Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao
>
> [**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting** ](https://arxiv.org/pdf/2211.10772v3) | [Code](https://github.com/ViTAE-Transformer/DeepSolo)

Other applications of [ViTAE](https://github.com/ViTAE-Transformer/ViTAE-Transformer) include: [ViTPose](https://github.com/ViTAE-Transformer/ViTPose) | [Remote Sensing](https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing) | [Matting](https://github.com/ViTAE-Transformer/ViTAE-Transformer-Matting) | [VSA](https://github.com/ViTAE-Transformer/ViTAE-VSA) | [Video Object Segmentation](https://github.com/ViTAE-Transformer/VOS-LLB)

# Abstract

<p align="left">Video text spotting refers to localizing, recognizing, and tracking textual elements
such as captions, logos, license plates, signs, and other forms of text within consecutive
video frames. However, current datasets available for this task rely on
quadrilateral ground truth annotations, which may result in including excessive
background content and inaccurate text boundaries. Furthermore, methods trained
on these datasets often produce prediction results in the form of quadrilateral boxes,
which limits their ability to handle complex scenarios such as dense or curved text.
To address these issues, we propose a scalable mask annotation pipeline called
SAMText for video text spotting. SAMText leverages the <a href="https://arxiv.org/abs/2304.02643">SAM</a> model to
generate mask annotations for scene text images or video frames at scale. Using
SAMText, we have created a large-scale dataset, SAMText-9M, that contains over
2,400 video clips sourced from existing datasets and over 9 million mask annotations.
We have also conducted a thorough statistical analysis of the generated
masks and their quality, identifying several research topics that could be further
explored based on this dataset.</p>




# Method
<figure>
<img src="figs/opening.png">
<figcaption align = "center"><b>Figure 1: Overview of the SAMText pipeline that builds upon the <a href="https://arxiv.org/abs/2304.02643">SAM</a> approach to generate
mask annotations for scene text images or video frames at scale. The input bounding box may be
sourced from existing annotations or derived from a scene text detection model.</b></figcaption>
</figure>
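
The figure states that a bounding box is the input prompt. As a rough sketch of that preprocessing step only, the snippet below converts a quadrilateral annotation into an axis-aligned XYXY box, which is the box format accepted by the `box` argument of `SamPredictor.predict` in the `segment_anything` package. The function name `quad_to_box_prompt` and the sample coordinates are hypothetical, and whether SAMText performs exactly this conversion is an assumption until the official code is released. The SAM call itself is left out because it requires a model checkpoint.

```python
import numpy as np


def quad_to_box_prompt(quad, image_size):
    """Convert a quadrilateral (four x, y corners) to an XYXY box clipped to the image.

    Hypothetical helper for illustration; not taken from the SAMText code.
    """
    pts = np.asarray(quad, dtype=np.float32).reshape(-1, 2)
    height, width = image_size
    # The tightest axis-aligned box around the four corners.
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    box = np.array([x_min, y_min, x_max, y_max], dtype=np.float32)
    # Keep the prompt inside the image bounds.
    box[[0, 2]] = np.clip(box[[0, 2]], 0, width - 1)
    box[[1, 3]] = np.clip(box[[1, 3]], 0, height - 1)
    return box


# Made-up quadrilateral annotation in a 640x480 frame.
quad = [(12, 40), (118, 32), (121, 61), (15, 70)]
box = quad_to_box_prompt(quad, image_size=(480, 640))
print(box)
```

The resulting array could then be passed as the box prompt for the corresponding frame.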





# Usage
The code and models will be released soon.



# Results
## The Quality of Generated Masks

<figure>
<img src="figs/figure3.png">
<figcaption align = "center"><b>Figure 3: The distribution of IoU between the generated
masks and ground truth masks in the
<a href="https://arxiv.org/abs/1601.07140">COCO-Text V2</a> training dataset.
</b></figcaption>
</figure>

To evaluate the quality of the masks generated by SAMText, we use the COCO-Text
training dataset [25], since it provides ground truth mask annotations for text
instances. Specifically, we randomly sample 10% of the training data and calculate
the IoU between the masks generated by SAMText and their corresponding ground truth
masks. Our findings show that SAMText has high accuracy, with an average IoU of 0.70.
Figure 3 presents the histogram of IoU scores. Notably, the majority of IoU scores
are centered around 0.75, suggesting that SAMText performs well.
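
For reference, the IoU between a generated mask and a ground truth mask can be computed as in the minimal sketch below. It assumes both masks are binary arrays of the same shape; the function name `mask_iou` and the toy masks are illustrative and not part of the SAMText code.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection over union of two binary masks with identical shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; report 0.0 rather than dividing by zero.
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


# Toy example: two 2x4 rectangles that overlap in a 2x3 region.
pred = np.zeros((4, 6), dtype=bool)
pred[1:3, 1:5] = True
gt = np.zeros((4, 6), dtype=bool)
gt[1:3, 2:6] = True
iou = mask_iou(pred, gt)
print(f"IoU = {iou:.2f}")
```

Averaging this score over the sampled instances gives a dataset-level figure comparable to the average reported above.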





## Visualization of Generated Masks



<figure>
<img src="figs/figure2.jpg">
<figcaption align = "center"><b>Figure 2: Some visualization results of the generated masks in five datasets using the SAMText
pipeline. The top row shows the scene text frames while the bottom row shows the generated masks.
</b></figcaption>
</figure>

In Figure 2, we show some visualization results of the generated masks in five datasets using the
SAMText pipeline. The top row shows the scene text frames while the bottom row shows the
generated masks. As can be seen, the generated masks possess fewer background components and
align more precisely with the text boundaries than the bounding boxes. As a result, the generated
mask annotations facilitate conducting more comprehensive research on this dataset, e.g., video text
segmentation and video text spotting using mask annotations.






## Dataset Statistics and Analysis
### The size distribution.

<figure>
<img src="figs/figure4.png">
<figcaption align = "center"><b>Figure 4: (a) The mask size distributions of the ICDAR15, RoadText-1k, LSVDT, and DSText datasets.
Masks exceeding 10,000 pixels are excluded from the statistics. (b) The mask size distribution of
the BOVText dataset. Masks exceeding 80,000 pixels are excluded from the statistics.
</b></figcaption>
</figure>
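
A size distribution with an upper cutoff, as described in the caption of Figure 4, could be computed roughly as follows. This sketch assumes that mask size means the number of foreground pixels; the helper name `mask_size_histogram`, the bin count, and the toy masks are hypothetical.

```python
import numpy as np


def mask_size_histogram(masks, max_size, num_bins=10):
    """Histogram of mask areas (in pixels), ignoring masks larger than max_size."""
    sizes = np.array([int(np.count_nonzero(m)) for m in masks])
    kept = sizes[sizes <= max_size]
    counts, edges = np.histogram(kept, bins=num_bins, range=(0, max_size))
    return kept, counts, edges


# Three toy masks with areas of 4, 9, and 400 pixels.
masks = []
for side in (2, 3, 20):
    m = np.zeros((30, 30), dtype=bool)
    m[:side, :side] = True
    masks.append(m)

# With a cutoff of 100 pixels, the 400-pixel mask is excluded.
kept, counts, edges = mask_size_histogram(masks, max_size=100)
print(kept, counts)
```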



### The IoU and CoV distribution.

<figure>
<img src="figs/figure5.png">
<figcaption align = "center"><b>Figure 5: (a) The distribution of IoU between the generated masks and ground truth bounding boxes
in each dataset. (b) The distribution of the coefficient of variation (CoV) of mask size for the same text instance across consecutive
frames in all five datasets. CoV scores exceeding 1.0 are excluded from the statistics.
</b></figcaption>
</figure>
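
The two statistics in Figure 5 can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: CoV is taken to be the standard deviation of a track's mask sizes divided by their mean (population standard deviation), and the box is treated as an axis-aligned, half-open pixel range `(x0, y0, x1, y1)` rather than a general quadrilateral. The function names are hypothetical.

```python
import numpy as np


def coefficient_of_variation(sizes):
    """Std / mean of the mask sizes of one text instance across frames."""
    sizes = np.asarray(sizes, dtype=np.float64)
    if sizes.size == 0:
        return 0.0
    mean = sizes.mean()
    if mean == 0:
        return 0.0
    return float(sizes.std() / mean)


def mask_box_iou(mask, box):
    """IoU between a binary mask and the filled axis-aligned box (x0, y0, x1, y1)."""
    mask = np.asarray(mask, dtype=bool)
    x0, y0, x1, y1 = box
    box_mask = np.zeros_like(mask)
    box_mask[y0:y1, x0:x1] = True
    union = np.logical_or(mask, box_mask).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask, box_mask).sum() / union)


# Mask sizes (in pixels) of one tracked instance over three frames.
track_sizes = [100, 110, 90]
cov = coefficient_of_variation(track_sizes)

# A 4x6 mask inside a 6x6 box.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:8] = True
iou = mask_box_iou(mask, (2, 2, 8, 8))
print(f"CoV = {cov:.4f}, mask-box IoU = {iou:.4f}")
```

A low CoV indicates that the mask size of an instance is stable across frames, while a low mask-box IoU indicates that the box contains a large share of background.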



### The spatial distribution.

<figure>
<img src="figs/figure6.png">
<figcaption align = "center"><b>Figure 6: Visualization of the heatmaps that depict the spatial distribution of the generated masks in
the five video text spotting datasets employed to establish SAMText-9M.
</b></figcaption>
</figure>
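
One way to build such a spatial heatmap is to accumulate mask pixels on a fixed, resolution-independent grid, as in the sketch below. The grid size, the per-frame area weighting, and the name `spatial_heatmap` are assumptions made for illustration; the README does not describe how the published heatmaps were computed.

```python
import numpy as np


def spatial_heatmap(masks, grid=(4, 4)):
    """Accumulate binary masks of any resolution into a normalized grid heatmap."""
    grid_h, grid_w = grid
    heat = np.zeros(grid, dtype=np.float64)
    for m in masks:
        m = np.asarray(m, dtype=bool)
        h, w = m.shape
        ys, xs = np.nonzero(m)
        # Map each foreground pixel to its grid cell and weight it by the
        # inverse frame area so frames of different sizes are comparable.
        np.add.at(heat, (ys * grid_h // h, xs * grid_w // w), 1.0 / (h * w))
    total = heat.sum()
    return heat / total if total > 0 else heat


# Two toy frames of different resolution with text near the bottom-left.
a = np.zeros((8, 8), dtype=bool)
a[6:8, 0:4] = True
b = np.zeros((4, 8), dtype=bool)
b[3, 0:2] = True
heat = spatial_heatmap([a, b])
print(np.round(heat, 3))
```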



# Statement

This project is for research purposes only. For any other questions please contact [[email protected]](mailto:[email protected]).



## Citation

If you find SAMText helpful, please consider giving this repo a star :star: and citing:

```
@inproceedings{SAMText,
  title={Scalable Mask Annotation for Video Text Spotting},
  author={Haibin He and Jing Zhang and Mengyang Xu and Juhua Liu and Bo Du and Dacheng Tao},
  booktitle={arXiv},
  year={2023}
}
```



Binary file added figs/figure2.jpg
Binary file added figs/figure3.png
Binary file added figs/figure4.png
Binary file added figs/figure5.png
Binary file added figs/figure6.png
Binary file added figs/opening.png
