Overview

This script was written to convert the PRImA Layout Analysis Dataset into RGB masked PNG images suitable for training image segmentation models. The dataset comprises 478 individual page images in TIF format, each accompanied by a PAGE XML file that describes its layout regions.
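To make the conversion concrete, here is a minimal sketch of reading region polygons out of a PAGE XML file. converter.py itself uses BeautifulSoup; this sketch uses the standard library's `xml.etree.ElementTree` instead so it runs without extra packages. It treats every element whose name ends in `Region` as a region and accepts both ways PAGE XML can encode a polygon: a `points="x,y x,y ..."` attribute on `Coords` (newer schema versions) or `<Point x="" y=""/>` children (older versions). Which form the PRImA files use, and the function name `parse_regions`, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_regions(xml_text):
    """Hypothetical helper: return [(region_type, [(x, y), ...]), ...]
    for every *Region element in a PAGE XML document, in document order."""
    root = ET.fromstring(xml_text)
    regions = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if not name.endswith("Region"):
            continue
        coords = next((c for c in elem if local_name(c.tag) == "Coords"), None)
        if coords is None:
            continue
        if coords.get("points"):
            # Newer PAGE schema: points="x1,y1 x2,y2 ..."
            pts = [tuple(int(v) for v in pair.split(","))
                   for pair in coords.get("points").split()]
        else:
            # Older PAGE schema: <Point x="..." y="..."/> children
            pts = [(int(p.get("x")), int(p.get("y")))
                   for p in coords if local_name(p.tag) == "Point"]
        regions.append((name, pts))
    return regions
```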

System Requirements

This script has only been tested on Ubuntu 18.04. I would expect it to work on other *nix platforms (and, who knows, maybe even Windows), but that has not been verified.

Python 3.6
BeautifulSoup 4 (with libxml2 support)
Pillow
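If you are unsure whether these requirements are installed, a quick check like the sketch below can help. The import names `bs4`, `lxml` and `PIL` are the usual module names for BeautifulSoup 4, the libxml2-backed lxml parser, and Pillow; treating lxml as what "libxml2 support" means here is an assumption.

```python
import importlib.util
import sys

# Assumed mapping of import names to the requirements listed above.
REQUIRED_MODULES = {
    "bs4": "BeautifulSoup 4",
    "lxml": "lxml (libxml2 support)",
    "PIL": "Pillow",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the labels of modules that cannot be found on this system."""
    return [label for name, label in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6 or newer is expected")
    missing = missing_requirements()
    print("Missing: " + ", ".join(missing) if missing else "All requirements found")
```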

Mapping of colors to region types

The basic map is defined at lines 16-27 of converter.py and can be modified to suit. Colors are given in RGB notation. For example, if you want to combine multiple region types into a single color, simply assign them the same RGB value in the map and the script should do the right thing (hopefully).

The default color to region type mapping

Region Type	Region Color Code
ChartRegion	255,0,0
DrawingRegion	0,255,0
FrameRegion	0,0,255
GraphicRegion	255,255,0
ImageRegion	0,255,255
MathsRegion	255,138,0
NoiseRegion	255,0,255
SeparatorRegion	150,0,255
TableRegion	0,100,25
TextRegion	128,128,128
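For reference, the default table above can be written as a Python dictionary. The variable name `REGION_COLORS` and the helper `colors_shared` are hypothetical; the layout of the real map in converter.py may differ. The helper groups region types by color, which is a handy way to confirm which types you have merged into a single class.

```python
from collections import defaultdict

# Hypothetical Python form of the default mapping table above.
REGION_COLORS = {
    "ChartRegion": (255, 0, 0),
    "DrawingRegion": (0, 255, 0),
    "FrameRegion": (0, 0, 255),
    "GraphicRegion": (255, 255, 0),
    "ImageRegion": (0, 255, 255),
    "MathsRegion": (255, 138, 0),
    "NoiseRegion": (255, 0, 255),
    "SeparatorRegion": (150, 0, 255),
    "TableRegion": (0, 100, 25),
    "TextRegion": (128, 128, 128),
}


def colors_shared(color_map):
    """Return {color: [region types]} for colors used by more than one type."""
    groups = defaultdict(list)
    for region, color in color_map.items():
        groups[color].append(region)
    return {color: sorted(regions)
            for color, regions in groups.items() if len(regions) > 1}


# Merge charts and graphics into one class by giving them the same color.
merged = dict(REGION_COLORS, GraphicRegion=REGION_COLORS["ChartRegion"])
print(colors_shared(merged))  # {(255, 0, 0): ['ChartRegion', 'GraphicRegion']}
```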

Region outlines

If you would like an outline drawn around the RGB masked regions, set the region_outline variable in converter.py to a three-element RGB tuple giving the color to use for the outline.
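One plausible way to fill a region and trace its outline is Pillow's `ImageDraw.polygon`, which accepts both a `fill` and an `outline` color. The sketch below is not taken from converter.py: the helper `draw_region`, the constant `REGION_OUTLINE` (standing in for `region_outline`) and the black background are assumptions for illustration.

```python
from PIL import Image, ImageDraw

# Hypothetical stand-in for region_outline; set to None to disable outlines.
REGION_OUTLINE = (255, 255, 255)


def draw_region(draw, points, fill, outline=None):
    """Fill one region polygon with its class color and optionally outline it."""
    draw.polygon(points, fill=fill, outline=outline)


# Assumed black background; one square region filled red with a white outline.
mask = Image.new("RGB", (40, 40), (0, 0, 0))
draw = ImageDraw.Draw(mask)
draw_region(draw, [(10, 10), (30, 10), (30, 30), (10, 30)],
            fill=(255, 0, 0), outline=REGION_OUTLINE)
```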

How to run the converter.py script

The converter.py script expects three directories as input arguments. It also assumes that each ground truth image and its associated PAGE XML metadata file share the same base name. For example, if you have a ground truth image called coolpic.jpg, converter.py will expect the associated PAGE XML metadata to be called coolpic.xml.
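The base-name pairing rule can be sketched with `pathlib` as below. The function name `pair_by_basename` is hypothetical, and how converter.py itself handles images without a matching XML file is not documented here; this sketch simply collects them in a separate list.

```python
from pathlib import Path


def pair_by_basename(image_dir, xml_dir):
    """Match each image to the PAGE XML file that shares its base name.

    Returns (pairs, unmatched): pairs is a list of (image_path, xml_path),
    unmatched lists images that have no XML file with the same stem.
    """
    xml_by_stem = {p.stem: p for p in Path(xml_dir).glob("*.xml")}
    pairs, unmatched = [], []
    for image in sorted(Path(image_dir).iterdir()):
        if not image.is_file():
            continue
        xml = xml_by_stem.get(image.stem)  # coolpic.jpg -> coolpic.xml
        if xml is None:
            unmatched.append(image)
        else:
            pairs.append((image, xml))
    return pairs, unmatched
```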

converter.py args (positional, in this order)

1. Directory containing ground truth images
2. Directory containing ground truth metadata in PAGE XML format
3. Directory where output images with RGB masks applied should be stored
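If you want to extend the script, the three positional arguments map naturally onto `argparse`. This is a sketch only: converter.py may read `sys.argv` directly, and the destination names `image_dir`, `xml_dir` and `output_dir` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the three directory arguments in the order converter.py expects."""
    parser = argparse.ArgumentParser(
        description="Convert PAGE XML ground truth into RGB masked PNG images.")
    parser.add_argument("image_dir", help="directory containing ground truth images")
    parser.add_argument("xml_dir", help="directory containing PAGE XML metadata")
    parser.add_argument("output_dir", help="directory for RGB masked output images")
    return parser.parse_args(argv)
```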

converter.py usage example

./converter.py /path/to/ground_truth_images /path/to/ground_truth_pagexml /path/to/output_rgb_masked_images

Repository: benetech/PagexmlToRgbConverter

Folders and files

Latest commit

History

Repository files navigation

Overview

System Requirements

Mapping of colors to region types

The default color to region type mapping

Region outlines

How to run the converter.py script

converter.py args

converter.py usage example

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages