
texifast

LaTeX and markdown OCR powered by texify, without bloated dependencies like torch or transformers.

Features

  • Minimal dependency graph
  • Compared to Optimum, texifast is faster (~20%) and has a smaller memory footprint (~20%). For details, see the benchmark.
  • Supports ONNX Runtime's IOBinding feature and is optimized for CUDAExecutionProvider.
  • Supports quantized/mixed precision models.

Installation

You must explicitly specify the required extra so the right ONNX Runtime build is installed.

pip install texifast[cpu]
# or if you want to use CUDAExecutionProvider
pip install texifast[gpu]

⚠️⚠️⚠️

Do not install with a bare pip install texifast — that leaves you without an ONNX Runtime execution provider!

Quickstart

This quickstart uses the image in the test folder, but you can use any image you like.

from texifast.model import TxfModel
from texifast.pipeline import TxfPipeline

model = TxfModel(
    encoder_model_path="./encoder_model_quantized.onnx",
    decoder_model_path="./decoder_model_merged_quantized.onnx",
)
texifast = TxfPipeline(model=model, tokenizer="./tokenizer.json")
print(texifast("./latex.png"))

You can download the quantized ONNX model here and the FP16 ONNX model here.

API

The full Python API documentation can be found here.

Credits
