VikParuchuri / pdftext Public

Releases: VikParuchuri/pdftext

Fix links to be in same span

28 Jan 17:10

VikParuchuri

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

Latest

What's Changed

Misc bugfixes and improvements by @iammosespaulr in #32
Bump version by @VikParuchuri in #33
Dev by @VikParuchuri in #34

Full Changelog: v0.5.0...v0.5.1

Contributors

VikParuchuri and iammosespaulr

Table and link extraction support

22 Jan 17:58

VikParuchuri

Summary

Add table extraction support
Add link support for references and external links
Bugfixes

What's Changed

fix: bbox sorting error by @simjak in #27
Add table extraction by @VikParuchuri in #25
Add support for PDF links and references by @iammosespaulr in #28
Improved References by @iammosespaulr in #30
Link support by @VikParuchuri in #29

New Contributors

@simjak made their first contribution in #27

Full Changelog: v0.4.1...v0.5.0

Contributors

VikParuchuri, simjak, and iammosespaulr

Pin pypdfium2

30 Dec 20:44

VikParuchuri

pypdfium2 4.30.1 has a bug that affects text extraction, so this release pins pypdfium2 to the previous version.

Improved Segmentation with Heuristic-Based Approach

12 Dec 16:12

pdftext no longer relies on the decision tree to segment spans, lines, and blocks. It now uses simpler heuristics, which make segmentation more efficient and more accurate.
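As a rough illustration of what heuristic segmentation can look like, the sketch below groups characters into lines by vertical overlap. It is not pdftext's actual implementation: the `Char` type, the `group_into_lines` function, and the `min_overlap` threshold are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    """Hypothetical character box; y0 is the top edge, y1 the bottom edge."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_lines(chars: List[Char], min_overlap: float = 0.5) -> List[str]:
    """Group characters (assumed to be in reading order) into lines.

    Assumed heuristic: a character stays on the current line when its
    vertical overlap with the previous character is at least `min_overlap`
    of the shorter of the two heights; otherwise it starts a new line.
    """
    lines: List[List[Char]] = []
    for ch in chars:
        if lines:
            prev = lines[-1][-1]
            overlap = min(prev.y1, ch.y1) - max(prev.y0, ch.y0)
            shorter = min(prev.y1 - prev.y0, ch.y1 - ch.y0)
            if shorter > 0 and overlap / shorter >= min_overlap:
                lines[-1].append(ch)
                continue
        lines.append([ch])
    return ["".join(c.text for c in line) for line in lines]
```

A rule like this needs no trained model and runs in a single pass over the characters, which is the kind of trade-off the release note describes.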

Fix loose charbox for quotes

03 Dec 20:39

VikParuchuri

Special characters don't work well with the loose charbox. The loose charbox will be removed entirely soon; this is an interim fix for an annoying issue with misplaced quotes.

Fix memory leak warnings

19 Nov 18:32

VikParuchuri

PDF documents are now closed properly, which avoids warnings and memory leaks.
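The general pattern behind a fix like this can be sketched with `contextlib.closing` from the Python standard library. The `FakeDocument` class and `count_pages` function below are hypothetical stand-ins for illustration only; they are not pdftext or pypdfium2 APIs.

```python
from contextlib import closing


class FakeDocument:
    """Hypothetical stand-in for a PDF handle that owns native resources."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.closed = False

    def page_count(self) -> int:
        if self.closed:
            raise ValueError("document is closed")
        return 3  # placeholder value for this sketch

    def close(self) -> None:
        self.closed = True


def count_pages(path: str) -> int:
    # closing() calls doc.close() on exit, even if the body raises.
    with closing(FakeDocument(path)) as doc:
        return doc.page_count()
```

Tying the close call to a `with` block means the handle is released on every exit path, including exceptions, rather than relying on garbage collection.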

Fix PDF flattening

25 Oct 17:47

VikParuchuri

Ensure the PDF is flattened when multiprocessing is used.

Better device coordinate extraction

18 Oct 15:41

VikParuchuri

There were some cases where visual and text coordinates didn't align. This fixes that issue.

Revert extraction changes

17 Oct 19:57

VikParuchuri

Merge pull request #14 from VikParuchuri/dev

Revert extraction

Python 3.13 compatibility

17 Oct 18:58

VikParuchuri

Merge pull request #13 from VikParuchuri/dev

Python 3.13 support
