Releases: VikParuchuri/marker

Fix pytorch bug

31 Jan 03:00
dba5b4c

A bug with PyTorch 2.6 and MPS caused errors during inference. This has been fixed.

New LaTeX OCR model; block visualizer; better links/references

29 Jan 16:43
9c740b1

Improved LaTeX OCR

We trained a new LaTeX OCR model that works much better overall. It reliably outputs KaTeX-compatible math and handles longer sequences than before.

The rendered output is on the right, original document on the left:

[Image: side-by-side comparison of the original document and the rendered output]

Block visualization

You can now visualize blocks in the streamlit app, thanks to @jazzido. Select json output and check "show blocks" to see how marker parsed the page. Clicking a block shows its HTML.

[Image: block visualization in the streamlit app]

Links and references

We fixed a bug with links and references; they now render as one block. You can see the extracted references here:

[Image: extracted references]

Misc bugfixes

  • Fixed some bugs with tables and row splitting
  • Escaped $ inside text and tables so we don't accidentally render things as equations
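
The dollar-sign escaping mentioned above can be illustrated with a minimal sketch. This is not marker's actual code: the function name `escape_dollars` is hypothetical, and the rule (backslash-escape every `$` not already preceded by a backslash) is an assumption about what "escaped" means here.

```python
import re


def escape_dollars(text: str) -> str:
    """Backslash-escape each '$' that is not already preceded by a backslash.

    Hypothetical helper for illustration only; it does not distinguish
    an escaped backslash followed by '$' from an escaped '$'.
    """
    # (?<!\\) is a negative lookbehind: skip dollars that already follow a backslash.
    # A function is used as the replacement so no template escaping is involved.
    return re.sub(r"(?<!\\)\$", lambda _m: "\\$", text)


if __name__ == "__main__":
    print(escape_dollars("Costs $5 and $10"))
```

With the dollars escaped, a Markdown or KaTeX renderer should treat them as literal currency symbols rather than as the start of an inline equation.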

Full Changelog: v1.3.2...v1.3.3

Fix table bugs

27 Jan 16:27
228a7ba
  • Fix issue where some blocks near tables were hidden
  • Fix span id issue with --use_llm and tables
  • Fix problem with tables not being OCRed when needed

Improved equations, bugfixes

24 Jan 18:11
9ed906d
  • Equations in tables now render properly with --use_llm
  • Fix how block equations render
  • Fix bug with markdown table rendering and --use_llm
  • Fix bug with convert.py CLI script

Improved tables; links and references

24 Jan 03:34
8a2a845

Table improvements

  • Tables now handle colspans and rowspans properly
  • Improved table model with better accuracy
  • Tables merge across pages if you pass --use_llm
  • New table benchmarks
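
To make the colspan/rowspan point concrete, here is a small sketch of how spanned cells can be expanded into a rectangular grid. It is an illustration only, not marker's table code: the `(text, rowspan, colspan)` cell format and the `expand_spans` name are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell format for this sketch: (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Expand cells with row/column spans into a dense grid of strings."""
    grid: Dict[Tuple[int, int], str] = {}
    n_cols = 0
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the cell text into every slot the span covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
            n_cols = max(n_cols, c)
    n_rows = max((r for r, _ in grid), default=-1) + 1
    # Slots that no cell covers are left as None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    table = [
        [("A", 2, 1), ("B", 1, 2)],  # A spans two rows, B spans two columns
        [("C", 1, 1), ("D", 1, 1)],
    ]
    for line in expand_spans(table):
        print(line)
```

In the example, `A` occupies the first column of both rows and `B` fills the last two columns of the first row, so `C` and `D` land in columns two and three of the second row.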

Links and references

  • Links and references are now extracted from the PDF and are clickable
  • Anchors are placed on elements as targets

Better configuration

  • Any configuration option can now be passed on the CLI

Misc

  • With --use_llm, handwriting is now recognized (if the layout detects it)
  • Better llm mode overall

Full Changelog: v1.2.7...v1.3.0

Remove code from new version

20 Jan 00:33
98dee1b

Remove some code in the CLI scripts that came from the dev branch.

Reorganize imports

19 Jan 17:43
a123541

Fix an issue where server dependencies were required to run other CLI scripts.

Hotfix scripts

19 Jan 17:35
f3cec23

CLI scripts were broken on some systems with 1.2.4. This fixes it.

Fix section header bug

14 Jan 21:19
d154d8d

Fix a bug with nested section headers.

Fix math delimiter issue

03 Jan 21:31
3a20621

Handle mismatched delimiters.
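
The release note does not say which delimiters were affected or how the fix works. As a rough illustration of what detecting mismatched math delimiters can look like, the sketch below checks that `\(`...`\)` and `\[`...`\]` pairs are properly matched. The function name and the choice of delimiters are assumptions for this example, not marker's implementation.

```python
import re

# Opening delimiter -> expected closing delimiter (assumed set for this sketch).
PAIRS = {r"\(": r"\)", r"\[": r"\]"}
# Matches a backslash followed by one of ( ) [ ].
TOKEN = re.compile(r"\\[()\[\]]")


def math_delimiters_balanced(text: str) -> bool:
    """Return True if every opening math delimiter has a matching close."""
    stack = []
    for tok in TOKEN.findall(text):
        if tok in PAIRS:
            # Remember which closing delimiter we expect next.
            stack.append(PAIRS[tok])
        elif not stack or stack.pop() != tok:
            # A close with no open, or the wrong kind of close.
            return False
    # Any leftover entries are opens that were never closed.
    return not stack


if __name__ == "__main__":
    print(math_delimiters_balanced(r"Area is \(\pi r^2\)"))
    print(math_delimiters_balanced(r"Broken \[x + y\)"))
```

A converter could use a check like this to decide whether to emit a span as math or fall back to plain text.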