Commit
Showing 24 changed files with 283 additions and 118 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
exclude: > | ||
(?x)( | ||
^alembic.ini$| | ||
^migrations/ | ||
) | ||
fail_fast: false | ||
repos: | ||
- repo: local | ||
hooks: | ||
- id: black | ||
name: black | ||
entry: black | ||
language: system | ||
types: [python] | ||
- id: isort | ||
name: isort | ||
entry: isort | ||
language: system | ||
types: [python] | ||
args: ["--profile", "black"] | ||
- id: flake8 | ||
name: flake8 | ||
entry: flake8 | ||
language: system | ||
types: [ python ] | ||
- id: mypy | ||
name: mypy | ||
entry: mypy | ||
language: system | ||
types: [ python ] | ||
- repo: https://github.com/Lucas-C/pre-commit-hooks | ||
rev: v1.3.0 | ||
hooks: | ||
- id: forbid-crlf | ||
- repo: https://github.com/pre-commit/pre-commit-hooks | ||
rev: v4.3.0 | ||
hooks: | ||
- id: check-case-conflict | ||
- id: check-merge-conflict | ||
- id: end-of-file-fixer | ||
- id: check-yaml | ||
- id: check-added-large-files |
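The `exclude` key above uses Python's verbose regex flag `(?x)`, so the whitespace and line breaks inside the group are ignored. A minimal sketch of how such a pattern filters paths, assuming pre-commit applies it with `re.search` to repo-relative paths; the helper name `is_excluded` and the sample paths are hypothetical:

```python
import re

# Same pattern as the `exclude` key; (?x) makes the regex ignore whitespace.
EXCLUDE = re.compile(
    r"""(?x)(
        ^alembic.ini$|
        ^migrations/
    )"""
)


def is_excluded(path: str) -> bool:
    """Return True if a repo-relative path matches the exclude pattern."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, used only to illustrate the matching behaviour.
for candidate in ("alembic.ini", "migrations/env.py", "src/migrations/env.py"):
    print(candidate, is_excluded(candidate))
```

Note that the `.` in `alembic.ini` is unescaped, so it matches any character; writing `alembic\.ini` would be stricter.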
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
## Tips to improve recognition | ||
|
||
### Reference | ||
|
||
- [https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) | ||
- [https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/](https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/) | ||
|
||
|
||
## Tesseract Command line | ||
|
||
### Reference | ||
|
||
- [https://tesseract-ocr.github.io/tessdoc/](https://tesseract-ocr.github.io/tessdoc/) | ||
- [https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html) | ||
- [https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc](https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc) |
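The references above cover the `tesseract` command line and the page segmentation mode (PSM) and DPI settings that affect recognition quality. As a rough illustration, the sketch below assembles such a command as an argument list. The helper `build_tesseract_args` and the file name `page.png` are hypothetical and not part of aiopytesseract; only the `-l`, `--dpi` and `--psm` options of the Tesseract CLI are assumed.

```python
from typing import List, Optional


def build_tesseract_args(
    image: str,
    output_base: str = "stdout",
    lang: Optional[str] = None,
    psm: Optional[int] = None,
    dpi: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for the `tesseract` command-line tool."""
    args = ["tesseract", image, output_base]
    if lang is not None:
        args += ["-l", lang]  # one or more languages, e.g. "eng+por"
    if dpi is not None:
        args += ["--dpi", str(dpi)]  # resolution hint for the input image
    if psm is not None:
        args += ["--psm", str(psm)]  # page segmentation mode
    return args


# Hypothetical input file; PSM 6 assumes a single uniform block of text.
print(build_tesseract_args("page.png", lang="eng+por", psm=6, dpi=300))
```

The list form can be passed directly to `asyncio.create_subprocess_exec` or `subprocess.run` without shell quoting.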
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Development | ||
|
||
## Install development dependencies | ||
|
||
```bash | ||
make install-deps | ||
``` | ||
|
||
> OR: pip install -r requirements-dev.txt | ||
## Execute tests | ||
|
||
```bash | ||
make tests | ||
``` | ||
|
||
> OR: pytest | ||
## Generate documentation locally | ||
|
||
```bash | ||
pip install 'aiopytesseract[docs]' | ||
make docs | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
[![ci](https://github.com/amenezes/aiopytesseract/actions/workflows/ci.yml/badge.svg)](https://github.com/amenezes/aiopytesseract/actions/workflows/ci.yml) | ||
[![codecov](https://codecov.io/gh/amenezes/aiopytesseract/branch/master/graph/badge.svg)](https://codecov.io/gh/amenezes/aiopytesseract) | ||
[![PyPI version](https://badge.fury.io/py/aiopytesseract.svg)](https://badge.fury.io/py/aiopytesseract) | ||
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/aiopytesseract) | ||
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) | ||
|
||
# aiopytesseract | ||
|
||
A Python [asyncio](https://docs.python.org/3/library/asyncio.html) wrapper for [Tesseract-OCR](https://tesseract-ocr.github.io/tessdoc/). | ||
|
||
## Installation | ||
|
||
Install and update using pip: | ||
|
||
````bash | ||
pip install aiopytesseract | ||
```` | ||
|
||
## Usage | ||
|
||
```python | ||
from pathlib import Path | ||
|
||
import aiopytesseract | ||
|
||
|
||
# list all languages available in the tesseract installation | ||
await aiopytesseract.languages() | ||
await aiopytesseract.get_languages() | ||
|
||
|
||
# tesseract version | ||
await aiopytesseract.tesseract_version() | ||
await aiopytesseract.get_tesseract_version() | ||
|
||
|
||
# tesseract parameters | ||
await aiopytesseract.tesseract_parameters() | ||
|
||
|
||
# confidence only info | ||
await aiopytesseract.confidence("tests/samples/file-sample_150kB.png") | ||
|
||
|
||
# deskew info | ||
await aiopytesseract.deskew("tests/samples/file-sample_150kB.png") | ||
|
||
|
||
# extract text from an image: from a local file path or from bytes | ||
await aiopytesseract.image_to_string("tests/samples/file-sample_150kB.png") | ||
await aiopytesseract.image_to_string( | ||
    Path("tests/samples/file-sample_150kB.png").read_bytes(), dpi=220, lang='eng+por' | ||
) | ||
|
||
|
||
# box estimates | ||
await aiopytesseract.image_to_boxes("tests/samples/file-sample_150kB.png") | ||
await aiopytesseract.image_to_boxes(Path("tests/samples/file-sample_150kB.png")) | ||
|
||
|
||
# boxes, confidence and page numbers | ||
await aiopytesseract.image_to_data("tests/samples/file-sample_150kB.png") | ||
await aiopytesseract.image_to_data(Path("tests/samples/file-sample_150kB.png")) | ||
|
||
|
||
# information about orientation and script detection | ||
await aiopytesseract.image_to_osd("tests/samples/file-sample_150kB.png") | ||
await aiopytesseract.image_to_osd(Path("tests/samples/file-sample_150kB.png")) | ||
|
||
|
||
# generate a searchable PDF | ||
await aiopytesseract.image_to_pdf("tests/samples/file-sample_150kB.png") | ||
await aiopytesseract.image_to_pdf(Path("tests/samples/file-sample_150kB.png")) | ||
|
||
|
||
# generate HOCR output | ||
await aiopytesseract.image_to_hocr("tests/samples/file-sample_150kB.png") | ||
await aiopytesseract.image_to_hocr(Path("tests/samples/file-sample_150kB.png")) | ||
|
||
|
||
# multiple outputs | ||
async with aiopytesseract.run( | ||
Path('tests/samples/file-sample_150kB.png').read_bytes(), | ||
'output', | ||
'alto tsv txt' | ||
) as resp: | ||
# will generate (output.xml, output.tsv and output.txt) | ||
print(resp) | ||
alto_file, tsv_file, txt_file = resp | ||
``` | ||
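The calls in the usage block are coroutines, so in a script they need to run inside an event loop. A minimal sketch of one way to do that and to process several images concurrently with `asyncio.gather`. The helper `ocr_many`, the stand-in `fake_ocr`, and the file names are hypothetical and not part of aiopytesseract; the stand-in only exists so the sketch runs without tesseract installed.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def ocr_many(
    ocr: Callable[[str], Awaitable[str]], paths: Iterable[str]
) -> Dict[str, str]:
    """Run one OCR coroutine per path concurrently and map path -> text."""
    path_list = list(paths)
    texts = await asyncio.gather(*(ocr(path) for path in path_list))
    return dict(zip(path_list, texts))


async def fake_ocr(path: str) -> str:
    """Hypothetical stand-in for an OCR coroutine; performs no real OCR."""
    await asyncio.sleep(0)  # yield to the event loop like real I/O would
    return f"text from {path}"


if __name__ == "__main__":
    # asyncio.run creates the event loop that the awaited calls require.
    results = asyncio.run(ocr_many(fake_ocr, ["a.png", "b.png"]))
    print(results)
```

With the library installed, `aiopytesseract.image_to_string` could presumably be passed in place of `fake_ocr`, assuming it accepts a path string as in the usage block above.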
|
||
## Examples | ||
|
||
If you want to try **aiopytesseract** quickly, you can use one of the following options: | ||
|
||
- docker | ||
- docker-compose | ||
- [streamlit](https://streamlit.io) | ||
|
||
### Docker | ||
|
||
Just copy and paste the following line. | ||
|
||
```bash | ||
docker run --rm --name aiopytesseract -p 8501:8501 amenezes/aiopytesseract | ||
``` | ||
|
||
### docker-compose | ||
|
||
After cloning this repo, run the command below: | ||
|
||
```bash | ||
docker-compose up -d | ||
``` | ||
|
||
### streamlit app | ||
|
||
This option requires `aiopytesseract` and `streamlit` to be installed first. Then run: | ||
|
||
```bash | ||
streamlit run https://github.com/amenezes/aiopytesseract/blob/master/examples/streamlit/app.py | ||
``` | ||
|
||
> Note: the streamlit example needs **Python >= 3.10**. | ||
|
||
## Links | ||
|
||
- License: [Apache License](https://choosealicense.com/licenses/apache-2.0/) | ||
- Code: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract) | ||
- Issue tracker: [https://github.com/amenezes/aiopytesseract/issues](https://github.com/amenezes/aiopytesseract/issues) | ||
- Docs: [https://aiopytesseract.amenezes.net](https://aiopytesseract.amenezes.net) |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
site_name: aiopytesseract | ||
repo_url: https://github.com/amenezes/aiopytesseract | ||
repo_name: amenezes/aiopytesseract | ||
theme: | ||
name: material | ||
features: | ||
- navigation.instant | ||
- navigation.top | ||
- navigation.prune | ||
- toc.integrate | ||
- search.highlight | ||
- search.suggest | ||
- search.share | ||
- content.code.annotate | ||
- content.tooltips | ||
- toc.follow | ||
palette: | ||
- scheme: default | ||
primary: blue grey | ||
accent: indigo | ||
toggle: | ||
icon: material/lightbulb-on | ||
name: Switch to dark mode | ||
- scheme: slate | ||
primary: blue grey | ||
accent: indigo | ||
toggle: | ||
icon: material/lightbulb | ||
name: Switch to light mode | ||
icon: | ||
repo: fontawesome/brands/github-alt | ||
extra: | ||
social: | ||
- icon: fontawesome/brands/github | ||
link: https://github.com/amenezes/aiopytesseract | ||
- icon: fontawesome/solid/bug | ||
link: https://github.com/amenezes/aiopytesseract/issues | ||
- icon: fontawesome/solid/envelope | ||
link: mailto:[email protected] | ||
nav: | ||
- Best practices: best-practices.md | ||
- Development: development.md |