
Commit

Updated docs + Better typing hint support
amenezes committed Dec 6, 2023
1 parent 4021d06 commit cc7ba3c
Showing 13 changed files with 109 additions and 88 deletions.
11 changes: 10 additions & 1 deletion .github/workflows/ci.yml
@@ -6,7 +6,16 @@ jobs:
tests:
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12', 'pypy-3.8', 'pypy-3.9', 'pypy-3.10']
python-version: [
'3.8',
'3.9',
'3.10',
'3.11',
'3.12',
'pypy-3.8',
'pypy-3.9',
'pypy-3.10'
]
os: [ubuntu]
fail-fast: true
runs-on: ${{ matrix.os }}-latest
74 changes: 40 additions & 34 deletions .pre-commit-config.yaml
@@ -5,38 +5,44 @@ exclude: >
)
fail_fast: false
repos:
- repo: local
hooks:
- id: black
name: black
entry: black
language: system
types: [python]
- id: isort
name: isort
entry: isort
language: system
types: [python]
args: ["--profile", "black"]
- id: flake8
name: flake8
entry: flake8
language: system
- repo: local
hooks:
- id: black
name: black
entry: black
language: system
types: [python]
- id: isort
name: isort
entry: isort
language: system
types: [python]
args: ["--profile", "black"]
- id: flake8
name: flake8
entry: flake8
language: system
types: [ python ]
- id: mypy
name: mypy
entry: mypy
language: system
types: [ python ]
- id: mypy
name: mypy
entry: mypy
language: system
types: [ python ]
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.3.0
hooks:
- id: forbid-crlf
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: check-case-conflict
- id: check-merge-conflict
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.3.0
hooks:
- id: forbid-crlf
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: check-case-conflict
- id: check-merge-conflict
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/pycqa/flake8
rev: 3.8.4
hooks:
- id: flake8
additional_dependencies:
- flake8-encodings==0.5.1
2 changes: 1 addition & 1 deletion Makefile
@@ -24,7 +24,7 @@ tests:
docs:
@echo "> generate project documentation..."
@cp README.md docs/index.md
mkdocs serve
mkdocs serve -a 0.0.0.0:8000

install-deps:
@echo "> installing dependencies..."
25 changes: 12 additions & 13 deletions README.md
@@ -89,23 +89,16 @@ async with aiopytesseract.run(
alto_file, tsv_file, txt_file = resp
```

For more details on Tesseract best practices and the aiopytesseract, see the folder: `docs`.

## Examples

If you want to test **aiopytesseract** easily, can you use some options like:

- docker
- docker-compose
- docker/docker-compose
- [streamlit](https://streamlit.io)

### Docker

Just copy and paste the following line.

```bash
docker run --rm --name aiopytesseract -p 8501:8501 amenezes/aiopytesseract
```

### docker-compose
### Docker / docker-compose

After clone this repo run the command below:

@@ -117,15 +110,21 @@ docker-compose up -d

For this option it's necessary first install `aiopytesseract` and `streamlit`, after execute:

```python
``` py
# remote option:
streamlit run https://github.com/amenezes/aiopytesseract/blob/master/examples/streamlit/app.py
```

``` py
# local option:
streamlit run examples/streamlit/app.py
```

> note: The streamlit example need **python >= 3.10**

## Links

- License: [Apache License](https://choosealicense.com/licenses/apache-2.0/)
- Code: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
- Issue tracker: [https://github.com/amenezes/aiopytesseract/issues](https://github.com/amenezes/aiopytesseract/issues)
- Docs: [https://aiopytesseract.amenezes.net](https://github.com/amenezes/aiopytesseract)
- Docs: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
2 changes: 1 addition & 1 deletion aiopytesseract/__init__.py
@@ -16,7 +16,7 @@
)
from .models import OSD, Box, Data, Parameter

__version__ = "0.12.0"
__version__ = "0.13.0"
__all__ = [
"__version__",
"OSD",
2 changes: 1 addition & 1 deletion aiopytesseract/base_command.py
@@ -168,7 +168,7 @@ async def execute_multi_output_cmd(
if proc.returncode != ReturnCode.SUCCESS:
raise TesseractRuntimeError(stderr.decode(encoding))
return tuple(
[f"{output_file}{OUTPUT_FILE_EXTENSIONS[ext]}" for ext in output_format.split()] # type: ignore
[f"{output_file}{OUTPUT_FILE_EXTENSIONS[ext]}" for ext in output_format.split()]
)


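This commit drops the `# type: ignore` on the output-path comprehension, presumably because `OUTPUT_FILE_EXTENSIONS` is now annotated (see the `constants.py` changes) so that indexing it with the plain `str` tokens from `output_format.split()` type-checks. A minimal, self-contained sketch of that lookup follows. It assumes `FileFormat` members are plain string constants (the real `aiopytesseract.file_format.FileFormat` may be defined differently, which would explain the commit's `Union[str, FileFormat]` key type), assumes the `.tsv` and `.txt` extensions, which this diff does not show, and uses a hypothetical helper name, `build_output_paths`.

```python
from typing import Dict, Tuple


class FileFormat:
    """Hypothetical stand-in: Tesseract output formats as plain strings."""

    ALTO = "alto"
    HOCR = "hocr"
    PDF = "pdf"
    TSV = "tsv"
    TXT = "txt"


# Keys are typed as str, so tokens produced by str.split() can index the
# mapping without a "# type: ignore" comment.
OUTPUT_FILE_EXTENSIONS: Dict[str, str] = {
    FileFormat.ALTO: ".xml",
    FileFormat.HOCR: ".hocr",
    FileFormat.PDF: ".pdf",
    FileFormat.TSV: ".tsv",  # assumed extension
    FileFormat.TXT: ".txt",  # assumed extension
}


def build_output_paths(output_file: str, output_format: str) -> Tuple[str, ...]:
    """Return one path per space-separated format, in the order given."""
    return tuple(
        f"{output_file}{OUTPUT_FILE_EXTENSIONS[ext]}" for ext in output_format.split()
    )


if __name__ == "__main__":
    print(build_output_paths("result", "alto tsv txt"))
    # ('result.xml', 'result.tsv', 'result.txt')
```

Passing `"alto tsv txt"` yields three paths in that order, matching the order the README example unpacks into `alto_file, tsv_file, txt_file`; an unknown format token raises `KeyError`.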
4 changes: 2 additions & 2 deletions aiopytesseract/commands.py
@@ -31,7 +31,7 @@ async def languages(
) -> List[str]:
"""Tesseract available languages.
:param config: config. (valid values: str)
:param config: config. (valid values: str, default: "")
:param encoding: decode bytes to string. (default: utf-8)
"""
proc = await execute_cmd(f"--list-langs {config}")
@@ -49,7 +49,7 @@ async def get_languages(
) -> List[str]:
"""Tesseract available languages.
:param config: config. (valid values: str)
:param config: config. (valid values: str, default: "")
:param encoding: decode bytes to string. (default: utf-8)
"""
langs = await languages(config, encoding=encoding)
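The docstring change documents that `config` defaults to an empty string, so `languages()` runs a bare `--list-langs`. As a hedged illustration of what a caller gets back, the sketch below builds the argument list and parses the command's stdout into a list of language codes. `list_langs_args` and `parse_list_langs` are hypothetical helpers, not aiopytesseract's implementation, and the parser assumes Tesseract prints a header line beginning with `List of available languages` followed by one code per line (the exact header wording varies between Tesseract versions).

```python
import shlex
from typing import List

# Assumed start of the header line printed by `tesseract --list-langs`.
HEADER_PREFIX = "List of available languages"


def list_langs_args(config: str = "") -> List[str]:
    """Build argv for `tesseract --list-langs`; an empty config adds nothing."""
    return ["tesseract", "--list-langs", *shlex.split(config)]


def parse_list_langs(stdout: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode stdout, drop the header and blank lines, keep one code per line."""
    langs: List[str] = []
    for line in stdout.decode(encoding).splitlines():
        line = line.strip()
        if line and not line.startswith(HEADER_PREFIX):
            langs.append(line)
    return langs


if __name__ == "__main__":
    sample = b'List of available languages in "/usr/share/tessdata/" (3):\neng\nosd\npor\n'
    print(list_langs_args())  # ['tesseract', '--list-langs']
    print(parse_list_langs(sample))  # ['eng', 'osd', 'por']
```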
10 changes: 6 additions & 4 deletions aiopytesseract/constants.py
@@ -1,3 +1,5 @@
from typing import Dict, Set, Union

from .file_format import FileFormat

TESSERACT_CMD: str = "tesseract"
@@ -11,7 +13,7 @@
AIOPYTESSERACT_DEFAULT_OEM: int = 3

# https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
TESSERACT_LANGUAGES = {
TESSERACT_LANGUAGES: Set[str] = {
"afr",
"amh",
"ara",
@@ -138,7 +140,7 @@
"yor",
}

PAGE_SEGMENTATION_MODES = {
PAGE_SEGMENTATION_MODES: Dict[int, str] = {
0: "Orientation and script detection (OSD) only.",
1: "Automatic page segmentation with OSD.",
2: "Automatic page segmentation, but no OSD, or OCR. (not implemented)",
@@ -155,14 +157,14 @@
13: "Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.",
}

OCR_ENGINE_MODES = {
OCR_ENGINE_MODES: Dict[int, str] = {
0: "Legacy engine only.",
1: "Neural nets LSTM engine only.",
2: "Legacy + LSTM engines.",
3: "Default, based on what is available.",
}

OUTPUT_FILE_EXTENSIONS = {
OUTPUT_FILE_EXTENSIONS: Dict[Union[str, FileFormat], str] = {
FileFormat.ALTO: ".xml",
FileFormat.HOCR: ".hocr",
FileFormat.PDF: ".pdf",
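The new annotations (`Set[str]`, `Dict[int, str]`) let a type checker such as mypy flag misuse, for example indexing `PAGE_SEGMENTATION_MODES` with a string like `"1"`, and they make runtime validation straightforward. The sketch below uses illustrative subsets of the constants and a hypothetical `build_ocr_args` helper that checks options before formatting Tesseract's `-l`, `--psm` and `--oem` flags; it is not how aiopytesseract itself validates arguments.

```python
from typing import Dict, Set

# Illustrative subsets of the annotated constants in aiopytesseract/constants.py.
TESSERACT_LANGUAGES: Set[str] = {"afr", "amh", "ara", "eng"}
PAGE_SEGMENTATION_MODES: Dict[int, str] = {
    0: "Orientation and script detection (OSD) only.",
    1: "Automatic page segmentation with OSD.",
    2: "Automatic page segmentation, but no OSD, or OCR. (not implemented)",
}
OCR_ENGINE_MODES: Dict[int, str] = {
    0: "Legacy engine only.",
    1: "Neural nets LSTM engine only.",
    2: "Legacy + LSTM engines.",
    3: "Default, based on what is available.",
}


def build_ocr_args(lang: str, psm: int, oem: int) -> str:
    """Hypothetical helper: validate options, then format Tesseract CLI flags."""
    if lang not in TESSERACT_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    if psm not in PAGE_SEGMENTATION_MODES:
        raise ValueError(f"invalid page segmentation mode: {psm}")
    if oem not in OCR_ENGINE_MODES:
        raise ValueError(f"invalid OCR engine mode: {oem}")
    return f"-l {lang} --psm {psm} --oem {oem}"


if __name__ == "__main__":
    print(build_ocr_args("eng", 1, 3))  # -l eng --psm 1 --oem 3
```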
10 changes: 5 additions & 5 deletions docs/best-practices.md
@@ -2,14 +2,14 @@

### Reference

- [https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html)
- [https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/](https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/)
- [Improving the quality of the output](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html)
- [Tesseract Page Segmentation Modes (PSMs) Explained: How to Improve Your OCR Accuracy](https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/)


## Tesseract Command line

### Reference

- [https://tesseract-ocr.github.io/tessdoc/](https://tesseract-ocr.github.io/tessdoc/)
- [https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html)
- [https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc](https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc)
- [Tesseract User Manual](https://tesseract-ocr.github.io/tessdoc/)
- [Command Line Usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html)
- [TESSERACT(1) Manual Page](https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc)
19 changes: 10 additions & 9 deletions docs/development.md
@@ -1,24 +1,25 @@
# Development
## Setting up the development environment

## Install development dependencies
### Installing Development Dependencies

```bash
make install-deps
```

> OR: pip install -r requirements-dev.txt
### Running Lint Checks

## Execute tests
```bash
make lint
```

### Running Tests

```bash
make tests
```

> OR: pytest
## Generating documentation locally.
### Documentation generation

```bash
pip install 'aiopytesseract[docs]'
make docs
make docs # local server
```
25 changes: 12 additions & 13 deletions docs/index.md
@@ -89,23 +89,16 @@ async with aiopytesseract.run(
alto_file, tsv_file, txt_file = resp
```

For more details on Tesseract best practices and the aiopytesseract, see the folder: `docs`.

## Examples

If you want to test **aiopytesseract** easily, can you use some options like:

- docker
- docker-compose
- docker/docker-compose
- [streamlit](https://streamlit.io)

### Docker

Just copy and paste the following line.

```bash
docker run --rm --name aiopytesseract -p 8501:8501 amenezes/aiopytesseract
```

### docker-compose
### Docker / docker-compose

After clone this repo run the command below:

@@ -117,15 +110,21 @@ docker-compose up -d

For this option it's necessary first install `aiopytesseract` and `streamlit`, after execute:

```python
``` py
# remote option:
streamlit run https://github.com/amenezes/aiopytesseract/blob/master/examples/streamlit/app.py
```

``` py
# local option:
streamlit run examples/streamlit/app.py
```

> note: The streamlit example need **python >= 3.10**

## Links

- License: [Apache License](https://choosealicense.com/licenses/apache-2.0/)
- Code: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
- Issue tracker: [https://github.com/amenezes/aiopytesseract/issues](https://github.com/amenezes/aiopytesseract/issues)
- Docs: [https://aiopytesseract.amenezes.net](https://github.com/amenezes/aiopytesseract)
- Docs: [https://github.com/amenezes/aiopytesseract](https://github.com/amenezes/aiopytesseract)
10 changes: 6 additions & 4 deletions examples/streamlit/app.py
@@ -5,10 +5,12 @@
import streamlit as st

import aiopytesseract
from aiopytesseract.constants import (AIOPYTESSERACT_DEFAULT_TIMEOUT,
OCR_ENGINE_MODES,
PAGE_SEGMENTATION_MODES,
TESSERACT_LANGUAGES)
from aiopytesseract.constants import (
AIOPYTESSERACT_DEFAULT_TIMEOUT,
OCR_ENGINE_MODES,
PAGE_SEGMENTATION_MODES,
TESSERACT_LANGUAGES,
)

loop = asyncio.new_event_loop()
loop.set_debug(True)
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -3,3 +3,6 @@ requires = [
"setuptools >= 46.4.0",
]
build-backend = "setuptools.build_meta"

[tool.isort]
profile = "black"
