Introduction to parameters in OCR-D

(As of writing this article, OCR-D/core is at 2.12.6 and OCR-D/spec at 3.9.0.)
The actual functionality of OCR-D is implemented in the form of processors, command line tools that adhere to the OCR-D CLI spec. For an overview of which processors are available and how to combine them into workflows, see the OCR-D workflow guide.
All OCR-D processors have the same command line interface, meaning they all support the same set of flags and options when invoked. However, processors can define processor-specific settings in their `ocrd-tool.json`, called parameters. When running a processor, users can specify these parameters with the `-p` and `-P` command line options.
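For example, here is a minimal sketch of such an invocation (the file group names follow the default wiring shown below and are illustrative; it assumes a workspace whose `OCR-D-SEG-LINE` file group already contains line segmentation):

```sh
# run Tesseract recognition with a German Fraktur model,
# setting two processor-specific parameters individually with -P
ocrd-tesserocr-recognize \
  -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS \
  -P model deu-frak \
  -P padding 4
```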
To find out which parameters are supported by a processor, use the `--help` flag. For example, for `ocrd-tesserocr-recognize`, this is the help output:
```
Usage: ocrd-tesserocr-recognize [OPTIONS]
Segment and/or recognize text with Tesseract (using annotated derived images, or masking and cropping images from coordinate polygons) on any level of the PAGE hierarchy.
> Perform layout segmentation and/or text recognition with Tesseract
> on the workspace.
> Open and deserialise PAGE input files and their respective images,
> then iterate over the element hierarchy down to the requested
> ``textequiv_level`` if it exists and if ``segmentation_level`` is
> lower (i.e. more granular) or ``none``.
> Otherwise stop before (i.e. above) ``segmentation_level``. If any
> segmentation exist at that level already, and ``overwrite_segments``
> is false, then descend into these segments, else remove them.
> Set up Tesseract to recognise each segment's image (either from
> AlternativeImage or cropping the bounding box rectangle and masking
> it from the polygon outline) with the appropriate mode and
> ``model``.
> Next, if there still is a gap between the current level in the PAGE
> hierarchy and the requested ``textequiv_level``, then iterate down
> the result hierarchy, adding new segments at each level (as well as
> reading order references, text line order, reading direction and
> orientation at the region/table level).
> Then, at ``textequiv_level``, remove any existing TextEquiv, unless
> ``overwrite_text`` is false, and add text and confidence results.
> The special value ``textequiv_level=none`` behaves like ``glyph``,
> except that no actual text recognition will be performed, only
> layout analysis (so no ``model`` is needed, and new segmentation is
> created down to the glyph level).
> The special value ``segmentation_level=none`` likewise is lowest,
> i.e. no actual layout analysis will be performed, only text
> recognition (so existing segmentation is needed down to
> ``textequiv_level``).
> Finally, make all higher levels consistent with these text results
> by concatenation, ordering according to each level's respective
> readingDirection, textLineOrder, and ReadingOrder, and joining by
> whitespace as appropriate for each level and according to its
> Relation/join status.
> In other words:
> - If ``segmentation_level=region``, then segment the page into regions
>   (unless ``overwrite_segments=false``), else iterate existing regions.
> - If ``textequiv_level=region``, then recognize text in the region,
>   annotate it, and continue with the next region. Otherwise...
> - If ``segmentation_level=cell`` or higher, then segment table regions
>   into text regions (i.e. cells) (unless ``overwrite_segments=false``),
>   else iterate existing cells.
> - If ``textequiv_level=cell``, then recognize text in the cell,
>   annotate it, and continue with the next cell. Otherwise...
> - If ``segmentation_level=line`` or higher, then segment text regions
>   into text lines (unless ``overwrite_segments=false``), else iterate
>   existing text lines.
> - If ``textequiv_level=line``, then recognize text in the text lines,
>   annotate it, and continue with the next line. Otherwise...
> - If ``segmentation_level=word`` or higher, then segment text lines
>   into words (unless ``overwrite_segments=false``), else iterate
>   existing words.
> - If ``textequiv_level=word``, then recognize text in the words,
>   annotate it, and continue with the next word. Otherwise...
> - If ``segmentation_level=glyph`` or higher, then segment words into
>   glyphs (unless ``overwrite_segments=false``), else iterate existing
>   glyphs.
> - If ``textequiv_level=glyph``, then recognize text in the glyphs and
>   continue with the next glyph. Otherwise...
> - (i.e. ``none``) annotate no text and be done.
> Note that ``cell`` is an _optional_ level that is only relevant for
> table regions, not text or other regions. Also, when segmenting
> tables in the same run that detects them (via
> ``segmentation_level=region`` and ``find_tables``), cells will just
> be 'paragraphs'. In contrast, when segmenting tables that already
> exist (via ``segmentation_level=cell``), cells will be detected in
> ``sparse_text`` mode, i.e. as single-line text regions.
> Thus, ``segmentation_level`` is the entry point level for layout
> analysis, and setting it to ``none`` makes this processor behave as
> recognition-only. Whereas ``textequiv_level`` selects the exit point
> level for segmentation, and setting it to ``none`` makes this
> processor behave as segmentation-only.
> All segments above ``segmentation_level`` must already exist, and no
> segments below ``textequiv_level`` will be newly created.
> If ``find_tables``, then during region segmentation, also try to
> detect table blocks and add them as TableRegion, then query the page
> iterator for paragraphs and add them as TextRegion cells.
> If ``block_polygons``, then during region segmentation, query
> Tesseract for polygon outlines instead of bounding boxes for each
> region. (This is more precise, but due to some path representation
> errors does not always yield accurate/valid polygons.)
> If ``sparse_text``, then during region segmentation, attempt to find
> single-line text blocks in no particular order (Tesseract's page
> segmentation mode ``SPARSE_TEXT``).
> Finally, produce new output files by serialising the resulting
> hierarchy.
Options:
-I, --input-file-grp USE File group(s) used as input
-O, --output-file-grp USE File group(s) used as output
-g, --page-id ID Physical page ID(s) to process
--overwrite Remove existing output pages/images
(with --page-id, remove only those)
-p, --parameter JSON-PATH Parameters, either verbatim JSON string
or JSON file path
-P, --param-override KEY VAL Override a single JSON object key-value pair,
taking precedence over --parameter
-m, --mets URL-PATH URL or file path of METS to process
-w, --working-dir PATH Working directory of local workspace
-l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
Log level
-J, --dump-json Dump tool description as JSON and exit
-h, --help This help message
-V, --version Show version
Parameters:
"dpi" [number - 0]
pixel density in dots per inch (overrides any meta-data in the
images); disabled when negative
"padding" [number - 0]
Extend detected region/cell/line/word rectangles by this many (true)
pixels, or extend existing region/line/word images (i.e. the
annotated AlternativeImage if it exists or the higher-level image
cropped to the bounding box and masked by the polygon otherwise) by
this many (background/white) pixels on each side before recognition.
"segmentation_level" [string - "word"]
Highest PAGE XML hierarchy level to remove existing annotation from
and detect segments for (before iterating downwards); if ``none``,
does not attempt any new segmentation; if ``cell``, starts at table
regions, detecting text regions (cells). Ineffective when lower than
``textequiv_level``.
Possible values: ["region", "cell", "line", "word", "glyph", "none"]
"textequiv_level" [string - "word"]
Lowest PAGE XML hierarchy level to re-use or detect segments for and
add the TextEquiv results to (before projecting upwards); if
``none``, adds segmentation down to the glyph level, but does not
attempt recognition at all; if ``cell``, stops short before text
lines, adding text of text regions inside tables (cells) or on page
level only.
Possible values: ["region", "cell", "line", "word", "glyph", "none"]
"overwrite_segments" [boolean - false]
If ``segmentation_level`` is not none, but an element already
contains segments, remove them and segment again. Otherwise use the
existing segments of that element.
"overwrite_text" [boolean - true]
If ``textequiv_level`` is not none, but a segment already contains
TextEquivs, remove them and replace with recognised text. Otherwise
add new text as alternative. (Only the first entry is projected
upwards.)
"block_polygons" [boolean - false]
When detecting regions, annotate polygon coordinates instead of
bounding box rectangles.
"find_tables" [boolean - true]
When detecting regions, recognise tables as table regions
(Tesseract's ``textord_tabfind_find_tables=1``).
"sparse_text" [boolean - false]
When detecting regions, use 'sparse text' page segmentation mode
(finding as much text as possible in no particular order): only text
regions, single lines without vertical or horizontal space.
"raw_lines" [boolean - false]
When detecting lines, do not attempt additional segmentation
(baseline+xheight+ascenders/descenders prediction) on line images.
Can increase accuracy for certain workflows. Disable when line
segments/images may contain components of more than 1 line, or
larger gaps/white-spaces.
"char_whitelist" [string - ""]
When recognizing text, enumeration of character hypotheses (from the
model) to allow exclusively; overruled by blacklist if set.
"char_blacklist" [string - ""]
When recognizing text, enumeration of character hypotheses (from the
model) to suppress; overruled by unblacklist if set.
"char_unblacklist" [string - ""]
When recognizing text, enumeration of character hypotheses (from the
model) to allow inclusively.
"model" [string]
The tessdata text recognition model to apply (an ISO 639-3 language
specification or some other basename, e.g. deu-frak or Fraktur).
Default Wiring:
['OCR-D-SEG-PAGE', 'OCR-D-SEG-REGION', 'OCR-D-SEG-TABLE', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD'] -> ['OCR-D-SEG-REGION', 'OCR-D-SEG-TABLE', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD', 'OCR-D-SEG-GLYPH', 'OCR-D-OCR-TESS']
```
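If you prefer machine-readable output, the `-J`/`--dump-json` flag listed above dumps the processor's complete `ocrd-tool.json` description. As a quick sketch, the parameter names can be extracted from it like this (assuming `jq` is installed):

```sh
# dump the tool description and list just the parameter names
ocrd-tesserocr-recognize -J | jq '.parameters | keys'
```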
You can find a description of the parameters in the section Parameters of the help output. Every parameter (e.g. `overwrite_segments`) is listed with its name (`overwrite_segments`), its datatype (`boolean`, so either `true` or `false`), its default value (`false`) and a description of what the parameter does ("If ``segmentation_level`` is not none, but an element already contains segments, remove them and segment again. Otherwise use the existing segments of that element.").
There are three ways to pass parameters to a processor:

1. `-P KEY VALUE`: set parameters individually
2. `-p JSON_FILE`: as a JSON file
3. `-p JSON_STRING`: as literal JSON
Option 1 was introduced in OCR-D/core v2.11.0 and is currently the recommended way to specify parameters.
Option 2 allows you to define the parameters in a JSON file, including `#`-prefixed comments. This is most useful for processor developers to define and describe sets of parameters.
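A small sketch of that workflow (the file name is arbitrary; the file groups are again illustrative):

```sh
# write a commented parameter file ...
cat > recognize-params.json <<'EOF'
{
  # use the German Fraktur model
  "model": "deu-frak"
}
EOF
# ... and pass it to the processor with -p
ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -p recognize-params.json
```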
Option 3 was the preferred way to pass parameters until the introduction of `-P KEY VALUE`. Its advantage over `-p JSON_FILE` is that the parameters can be defined ad hoc on the command line. A major disadvantage is that quoting can become tricky when there is another level of indirection, such as when running a processor within a Docker container.
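To illustrate, here is a sketch of that Docker scenario (using the official `ocrd/all` image; the volume mount and file groups are illustrative):

```sh
# with literal JSON, the extra shell level forces nested quoting and escaping:
docker run --rm -v $PWD:/data -w /data ocrd/all \
  bash -c "ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -p '{\"model\": \"deu-frak\"}'"
# with -P, no JSON quoting is needed at all:
docker run --rm -v $PWD:/data -w /data ocrd/all \
  bash -c "ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model deu-frak"
```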
You can combine all variants of parameter passing, and both `-p` and `-P` are repeatable. This allows for composition, i.e. the following invocation

```sh
ocrd-foo -p defaults.json -P this-param 42
```

will first read the file `defaults.json` and parse it as JSON, then override the parameter `this-param` with the value `42` (a number).
The following three invocations are functionally equivalent:

```sh
echo '{"foo": "bar"}' > param.json
ocrd-foo -p param.json
ocrd-foo -p '{"foo": "bar"}'
ocrd-foo -P foo bar
```

This illustrates that `-P` is the most intuitive and therefore recommended way to pass parameters.
The `-p` variants of passing parameters require a well-formed JSON object, that is:

- Enclosed in `{}`
- Keys (parameter name) and values (parameter value) separated with `:`
- Keys must be double-quoted (`"param-name"`)
- Values must be valid JSON data types:
  - string: double-quoted (e.g. `"some string value"`)
  - number: the digits of the number, with `.` as decimal separator (e.g. `42`, `3.1415`)
  - boolean: `true` or `false`
  - array: a list of strings, numbers or booleans, separated by `,` and enclosed in `[]`
  - object: the same syntax as for the whole parameter JSON
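To make these data types concrete, here is a sketch with the hypothetical processor `ocrd-foo` from above and made-up parameter names (no real processor defines these; they only illustrate each type):

```sh
ocrd-foo -p '{
  "a-string": "some string value",
  "a-number": 3.1415,
  "a-boolean": true,
  "an-array": ["region", "line", "word"],
  "an-object": {"nested-key": 42}
}'
```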
One extension of JSON we support in OCR-D is `#`-prefixed comments, i.e. you can describe the parameter JSON with comments like this:

```
{
  # This is set to true because we're augmenting existing OCR results
  # which may have words already
  "overwrite_segments": true
}
```
For the `-P KEY VALUE` variant, these rules apply:

- `KEY` must not be quoted
- `VALUE` can be any of the JSON data types described above
- If `VALUE` is not a valid JSON data type, it is interpreted as a string. This has the advantage that you can write `-P param-name string-value` instead of `-P param-name '"string-value"'`.
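A few examples with the hypothetical `ocrd-foo`, to show how values are interpreted:

```sh
# bare and JSON-quoted strings are equivalent:
ocrd-foo -P model deu-frak
ocrd-foo -P model '"deu-frak"'
# values that parse as JSON keep their type (here: number and boolean):
ocrd-foo -P dpi 300 -P overwrite_text false
```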