amenezes/aiopytesseract

aiopytesseract

A Python asyncio wrapper for Tesseract-OCR.

Installation

Install and update using pip:

pip install aiopytesseract

Usage

List all languages available in the Tesseract installation

import aiopytesseract

await aiopytesseract.languages()
await aiopytesseract.get_languages()

Tesseract version

import aiopytesseract

await aiopytesseract.tesseract_version()
await aiopytesseract.get_tesseract_version()
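The snippets in this README use a bare `await`, which only works inside a coroutine or an asyncio-aware REPL. As a minimal sketch of running the calls above from a regular script, the helper below gathers the version and the language list concurrently. `collect_info` is a hypothetical name, not part of aiopytesseract; it only assumes that the object passed in exposes the `tesseract_version()` and `languages()` coroutines shown above.

```python
import asyncio


async def collect_info(tess):
    """Fetch the Tesseract version and language list concurrently.

    `tess` is any object exposing `tesseract_version()` and `languages()`
    coroutines (for example, the aiopytesseract module itself).
    """
    version, languages = await asyncio.gather(
        tess.tesseract_version(),
        tess.languages(),
    )
    return {"version": version, "languages": languages}
```

From a script, this could be called as `asyncio.run(collect_info(aiopytesseract))` after `import aiopytesseract`.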

Tesseract parameters

import aiopytesseract

await aiopytesseract.tesseract_parameters()

Confidence only info

import aiopytesseract

await aiopytesseract.confidence("tests/samples/file-sample_150kB.png")

Deskew info

import aiopytesseract

await aiopytesseract.deskew("tests/samples/file-sample_150kB.png")

Extract text from an image: from a local file path or from bytes

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_string("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_string(
	Path("tests/samples/file-sample_150kB.png").read_bytes(), dpi=220, lang='eng+por'
)
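Because every call is a coroutine, several images can be processed concurrently. The sketch below is one possible way to do that with a concurrency cap, so that only a few Tesseract runs are in flight at once. `ocr_many` and its `limit` parameter are hypothetical names introduced for illustration; the OCR coroutine is passed in as an argument rather than assumed.

```python
import asyncio


async def ocr_many(ocr, paths, limit=4):
    """Run the `ocr` coroutine function over `paths`, at most `limit` at a time.

    Returns a dict mapping each path to its OCR result.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(path):
        # The semaphore caps how many OCR calls run simultaneously.
        async with semaphore:
            return await ocr(path)

    # gather() preserves input order, so results line up with paths.
    results = await asyncio.gather(*(run_one(p) for p in paths))
    return dict(zip(paths, results))
```

For example, `asyncio.run(ocr_many(aiopytesseract.image_to_string, ["a.png", "b.png"]))`, where the file names are placeholders.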

Box estimates

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_boxes("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_boxes(Path("tests/samples/file-sample_150kB.png").read_bytes())

Boxes, confidence and page numbers

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_data("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_data(Path("tests/samples/file-sample_150kB.png").read_bytes())

Information about orientation and script detection

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_osd("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_osd(Path("tests/samples/file-sample_150kB.png").read_bytes())

Generate a searchable PDF

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_pdf("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_pdf(Path("tests/samples/file-sample_150kB.png").read_bytes())

Generate HOCR output

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_hocr("tests/samples/file-sample_150kB.png")
await aiopytesseract.image_to_hocr(Path("tests/samples/file-sample_150kB.png").read_bytes())

Multi output

from pathlib import Path

import aiopytesseract

async with aiopytesseract.run(
	Path('tests/samples/file-sample_150kB.png').read_bytes(),
	'output',
	'alto tsv txt'
) as resp:
	# will generate (output.xml, output.tsv and output.txt)
	print(resp)
	alto_file, tsv_file, txt_file = resp
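As a rough sketch of what can be done with the TSV output, the helper below extracts recognized words and their confidence from Tesseract's TSV text. `words_from_tsv` is a hypothetical helper, not part of aiopytesseract. It assumes the standard Tesseract TSV header with `conf` and `text` columns, and it takes the file contents as a string, so how `tsv_file` is read is left to the caller.

```python
import csv
import io


def words_from_tsv(tsv_text, min_conf=0.0):
    """Return (text, confidence) pairs for rows with text and conf >= min_conf."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # recognized text may contain quote characters
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        # Structural rows (page, block, line) carry no text and conf == -1.
        if text and conf >= min_conf:
            words.append((text, conf))
    return words
```

If `tsv_file` is a file path (an assumption here), `words_from_tsv(Path(tsv_file).read_text(), min_conf=60)` could be called inside the `async with` block, in case the generated files do not outlive it.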

Config variables

from pathlib import Path

import aiopytesseract

async with aiopytesseract.run(
	Path('tests/samples/text-with-chars-and-numbers.png').read_bytes(),
	'output',
	'alto tsv txt',
	config=[("tessedit_char_whitelist", "0123456789")]
) as resp:
	# will generate (output.xml, output.tsv and output.txt)
	print(resp)
	alto_file, tsv_file, txt_file = resp

from pathlib import Path

import aiopytesseract

await aiopytesseract.image_to_string(
	"tests/samples/text-with-chars-and-numbers.png",
	config=[("tessedit_char_whitelist", "0123456789")]
)

await aiopytesseract.image_to_string(
	Path("tests/samples/text-with-chars-and-numbers.png").read_bytes(),
	dpi=220,
	lang='eng+por',
	config=[("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")]
)
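For orientation, the Tesseract command line accepts config variables as repeated `-c NAME=VALUE` options. The sketch below shows how a list of `(name, value)` pairs like the ones above would look as CLI arguments. `config_to_cli_args` is a hypothetical helper for illustration only; it is not a description of how aiopytesseract builds its command internally.

```python
def config_to_cli_args(config):
    """Turn [(name, value), ...] into ['-c', 'name=value', ...] for the tesseract CLI."""
    args = []
    for name, value in config:
        # Each variable gets its own -c flag.
        args.extend(["-c", f"{name}={value}"])
    return args


print(config_to_cli_args([("tessedit_char_whitelist", "0123456789")]))
```

This can help when checking a config value by hand with the `tesseract` binary before passing it through `config=`.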

For more details on Tesseract best practices and on aiopytesseract itself, see the docs folder.

Examples

If you want to try aiopytesseract quickly, you can use one of the options below:

Docker / docker-compose

After cloning this repo, run the command below:

docker-compose up -d

streamlit app

For this option, first install aiopytesseract and streamlit, then run:

# remote option:
streamlit run https://github.com/amenezes/aiopytesseract/blob/master/examples/streamlit/app.py
# local option:
streamlit run examples/streamlit/app.py

Note: the streamlit example needs Python >= 3.10.

Links