Releases: VikParuchuri/pdftext

Fix links to be in same span

28 Jan 17:10
92fd696

What's Changed

Full Changelog: v0.5.0...v0.5.1

Table and link extraction support

22 Jan 17:58
0a4f33c

Summary

  • Add table extraction support
  • Add link support for references and external links
  • Bugfixes

What's Changed

Full Changelog: v0.4.1...v0.5.0

Pin pypdfium2

30 Dec 20:44
ea2e9b5

pypdfium2 4.30.1 has a bug that affects text extraction, so this release pins pypdfium2 to the previous version.
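If you manage your own environment, a quick check for the affected version can be sketched as follows. This is only an illustration: the helper name `pypdfium2_is_affected` is hypothetical and not part of pdftext, and the only version it knows about is the 4.30.1 release named above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The pypdfium2 release that these notes report as problematic.
AFFECTED_VERSION = "4.30.1"


def pypdfium2_is_affected(installed: Optional[str] = None) -> bool:
    """Return True if the given (or currently installed) pypdfium2 is 4.30.1.

    Hypothetical helper for illustration; pdftext itself handles this by
    pinning the dependency.
    """
    if installed is None:
        try:
            installed = version("pypdfium2")
        except PackageNotFoundError:
            # pypdfium2 is not installed, so there is nothing to flag.
            return False
    return installed == AFFECTED_VERSION
```

Calling `pypdfium2_is_affected()` with no argument inspects the installed package; passing a version string makes the check easy to test.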

Improved Segmentation with Heuristic-Based Approach

12 Dec 16:12
cd9d41d

pdftext no longer relies on a decision tree to segment spans, lines, and blocks. It now uses simpler heuristics, which make segmentation more efficient and more accurate.
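To give a feel for what heuristic segmentation can look like, here is a minimal sketch that groups characters into lines by vertical overlap. It is not pdftext's actual rule set: the `Char` type, the `group_into_lines` function, and the `min_overlap` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    """A character with its bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float  # top edge
    x1: float
    y1: float  # bottom edge


def group_into_lines(chars: List[Char], min_overlap: float = 0.5) -> List[str]:
    """Start a new line whenever a char barely overlaps the previous one vertically."""
    lines: List[List[Char]] = []
    for ch in chars:
        if lines:
            last = lines[-1][-1]
            # Vertical overlap between this char and the previous one.
            overlap = min(ch.y1, last.y1) - max(ch.y0, last.y0)
            # Normalize by the shorter of the two boxes.
            height = min(ch.y1 - ch.y0, last.y1 - last.y0)
            if height > 0 and overlap / height >= min_overlap:
                lines[-1].append(ch)
                continue
        lines.append([ch])
    return ["".join(c.text for c in line) for line in lines]
```

A rule like this needs no trained model and runs in a single pass over the characters, which is the kind of trade-off a heuristic approach makes.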

Fix loose charbox for quotes

03 Dec 20:39
f26428a

Special characters don't work well with the loose charbox. We'll remove the loose charbox entirely soon, but this is an interim fix for an annoying issue with misplaced quotes.

Fix memory leak warnings

19 Nov 18:32
c065ac0

Close PDF documents properly to avoid warnings and memory leaks.
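The general pattern behind a fix like this is to guarantee that `close()` runs even when processing fails. The sketch below shows that pattern with the standard library only; `count_pages` and the `open_document` parameter are hypothetical names, and the code assumes the opened object supports `len()` and `close()` (as `pypdfium2.PdfDocument` does).

```python
from contextlib import closing
from typing import Callable


def count_pages(path: str, open_document: Callable) -> int:
    """Open a document, read its page count, and always close it.

    `open_document` is any callable that returns an object with
    `__len__` and `close()`; this is an assumption of the sketch.
    """
    # closing() calls doc.close() on exit, even if an exception is raised.
    with closing(open_document(path)) as doc:
        return len(doc)
```

With pypdfium2 installed, this could be called as `count_pages("file.pdf", pypdfium2.PdfDocument)`.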

Fix PDF flattening

25 Oct 17:47
10d979b

Ensure PDFs are flattened when multiprocessing is used.

Better device coordinate extraction

18 Oct 15:41
c88e23c

There were some cases where visual and text coordinates didn't align. This fixes that issue.

Revert extraction changes

17 Oct 19:57
c6a85c6
Merge pull request #14 from VikParuchuri/dev

Revert extraction

Python 3.13 compatibility

17 Oct 18:58
a7cd4fb
Merge pull request #13 from VikParuchuri/dev

Python 3.13 support