New LaTeX OCR model; block visualizer; better links/references

@VikParuchuri VikParuchuri released this 29 Jan 16:43
· 6 commits to master since this release
9c740b1

Improved LaTeX OCR

We trained a new LaTeX OCR model that works considerably better overall. It reliably outputs KaTeX-compatible math and handles longer sequences than before.

The rendered output is on the right, original document on the left:

image

Block visualization

You can now visualize blocks in the Streamlit app, thanks to @jazzido. Select JSON output and check "show blocks" to get a visualization of how marker parsed the page. Clicking a block shows its HTML.

image
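
The same block structure can be inspected outside the app. Below is a minimal sketch of walking a nested block tree, assuming the JSON output is a tree of dicts with `block_type`, `html`, and `children` keys. Those field names, the `iter_blocks` helper, and the sample page are illustrative assumptions, not taken from this release.

```python
from typing import Iterator


def iter_blocks(block: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, block) for a block and all of its descendants, depth-first."""
    yield depth, block
    # "children" is assumed to be a list, or missing/None for leaf blocks.
    for child in block.get("children") or []:
        yield from iter_blocks(child, depth + 1)


# Hand-written sample standing in for one page of JSON output (hypothetical data).
sample_page = {
    "block_type": "Page",
    "html": "",
    "children": [
        {"block_type": "SectionHeader", "html": "<h1>Title</h1>", "children": None},
        {"block_type": "Text", "html": "<p>Hello</p>", "children": None},
    ],
}

for depth, block in iter_blocks(sample_page):
    # Indent by depth so the nesting is visible in the printed outline.
    print("  " * depth + block["block_type"], block["html"])
```

A generator keeps the traversal separate from what is done with each block, so the same helper could feed a filter by block type or a count per page.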

Links and references

We fixed a bug with links and references; they now render as one block. You can see the extracted references here:

image

Misc bugfixes

  • Fixed some bugs with tables and row splitting
  • Escaped $ inside text and tables so we don't accidentally render things as equations
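
To illustrate the `$` escaping mentioned above, here is a minimal sketch of the general idea. It is not marker's actual implementation, and the `escape_dollars` helper is a hypothetical name used only for this example.

```python
import re


def escape_dollars(text: str) -> str:
    """Backslash-escape every `$` that is not already preceded by a backslash."""
    # The negative lookbehind leaves dollars that are already escaped untouched.
    return re.sub(r"(?<!\\)\$", r"\\$", text)


print(escape_dollars("Tickets cost $5 to $10"))  # prints: Tickets cost \$5 to \$10
```

Without the escaping, a Markdown renderer with math support could treat the text between the two dollar signs as an inline equation. This sketch does not handle a literal backslash that happens to sit directly before a `$`.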

Full Changelog: v1.3.2...v1.3.3